Sunday, March 25, 2007

Breaking up large files for transport to S3

I decided to try to place a copy of my Windows XP virtual image on S3 for backup. I quickly realized that JungleDisk has a file size limitation, and so does S3. That's probably not a bad idea, since an upload from home or the office will most likely be limited to 384k or 768k upstream. For a 30G file, that would take a while. If we break the file into pieces and something happens to the connection, only the interrupted piece has to be re-sent, which makes starting over the next time much less painful.

I looked for an easy way to split the file up and stumbled across split. It takes an input file and breaks it into pieces based on the options you provide; see the man page for details.
Here's the command I used:
split -b 102400k winxp.hdd winxp.hdd.part.

This broke the file into 100M chunks, which I can now upload to S3.
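Since JungleDisk presents the bucket as a mounted drive, the upload itself can be a simple copy loop. This is just a sketch; the /mnt/jungledisk mount point and backup directory are assumptions, not my actual paths.

for part in winxp.hdd.part.*; do
  # skip chunks that already made it up on a previous attempt
  if [ ! -f "/mnt/jungledisk/backup/$part" ]; then
    cp "$part" /mnt/jungledisk/backup/
  fi
done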

If you ever need to reconstruct the file, the cat command works well. You'll need to cat the pieces together in the right order, but that shouldn't be difficult given the way we split them up. For example, to recreate the original image, you would simply type
cat winxp.hdd.part.aa winxp.hdd.part.ab ... winxp.hdd.part.az > winxp.hdd
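Since split's suffixes (aa, ab, ac, ...) already sort in order, a glob does the same job with less typing, assuming all the pieces are in the current directory and nothing else matches the pattern:

cat winxp.hdd.part.* > winxp.hdd

Running md5sum on the result and comparing it against a checksum taken before the split is a cheap way to confirm nothing was lost in transit.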

Could this be any easier?

Monday, March 12, 2007

More EC2 Progress

I was able to get a few more things installed. After finishing up with SWIG and the Ruby bindings so I could get Collaboa running, things started acting funny. It turns out I was running out of space on the main volume. That's when I realized I needed to move some directories to the /mnt volume.

I moved the /usr/local directories and the /var directory, and with it the MySQL data files. I then went back and uninstalled a few of the items that had been installed via yum, since it appears to dump everything into /usr. Since I compiled Apache 2.2 myself, everything was already in /usr/local/apache2.
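For anyone trying the same thing, the moves look roughly like this; stop anything writing to the directories first. The mysqld init script name is an assumption about your distro.

/usr/local/apache2/bin/apachectl stop
/etc/init.d/mysqld stop

# relocate to the big /mnt volume and leave symlinks so existing paths still work
mv /usr/local /mnt/usr-local && ln -s /mnt/usr-local /usr/local
mv /var /mnt/var && ln -s /mnt/var /var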

This also led me to further enhancements to the autorun script. It now does a bit more than simply register the domain name with ZoneEdit: it syncs the /mnt directory to S3 and pulls it back down on boot. That means I need to create some symbolic links so everything resolves correctly, and then restart the services that tried to start before my script ran.
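In rough outline, the boot-time half looks like the following. The pull_from_s3 line is a placeholder for whatever tool actually does the transfer (I'm not showing the real invocation here), and the service commands assume my compiled Apache plus a stock mysqld init script.

#!/bin/sh
# pull the saved /mnt contents back down from S3 (placeholder command)
pull_from_s3 /mnt

# recreate the symlinks so the relocated directories resolve again
[ -L /usr/local ] || ln -s /mnt/usr-local /usr/local
[ -L /var ] || ln -s /mnt/var /var

# restart services that came up before the data was in place
/usr/local/apache2/bin/apachectl restart
/etc/init.d/mysqld restart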

I haven't tried a reboot yet. It shouldn't be too difficult, but it will lengthen the startup time since it's pulling about 200M down from S3.

I think I'll also run ec2-bundle-vol, possibly daily, to keep a hot backup ready. Not sure on this one yet.
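If I go that route, it's just another cron entry; the key and cert paths and the account ID below are placeholders, and an ec2-upload-bundle step would follow to push the bundle into S3.

# re-bundle the root volume into /mnt every night at 3am (paths/ID are placeholders)
0 3 * * * ec2-bundle-vol -d /mnt -k /root/pk.pem -c /root/cert.pem -u 123456789012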

Saturday, March 10, 2007

Editing Cron through a script

This one is pretty easy ...

Basically, copy the current crontab to a temporary file, append the text from some_file, and then have crontab install the edited file as the new crontab.

crontab -l > /tmp/file          # dump the current crontab to a temp file
cat some_file >> /tmp/file      # append the new entries
crontab /tmp/file               # install the combined file as the new crontab
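The same trick works as a one-liner by piping the combined output straight back into crontab, which reads the new table from stdin when given -:

(crontab -l; cat some_file) | crontab -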

EC2 Progress

I finally managed to create a baseline image on EC2. I needed Apache to serve several sites as well as Subversion. That took a while due to my lack of understanding of the Apache config files. Basically, when Apache runs name-based virtual hosts, meaning more than one site on a single IP address, the main server's DocumentRoot and ServerName are ignored for those sites; each VirtualHost block needs its own. That took a while to work through.
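For reference, here's the shape of the config that finally worked, in Apache 2.2 syntax; the server names and paths are made-up examples, not my actual sites.

# httpd.conf: name-based virtual hosting on one IP
NameVirtualHost *:80

<VirtualHost *:80>
    ServerName www.example.com
    DocumentRoot /usr/local/apache2/htdocs/www
</VirtualHost>

<VirtualHost *:80>
    ServerName svn.example.com
    DocumentRoot /usr/local/apache2/htdocs/svn
</VirtualHost>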

The next step will be to have the image connect to S3 on boot and pull down a set of scripts that perform various tasks, such as setting the domain name properly since we don't get a static IP. We'll also need to modify the cron files to make sure all the data is backed up on a regular schedule. It will be set up so that, based on a URL passed in when the instance is started, it can perform different tasks. That way we can use the same mechanism across multiple instances with different needs.
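The domain-name piece could be as simple as the script below. I'm assuming the EC2 instance metadata address and ZoneEdit's dynamic DNS URL here, and the credentials and hostname are placeholders.

#!/bin/sh
# ask the EC2 metadata service for this instance's public IP
IP=$(curl -s http://169.254.169.254/latest/meta-data/public-ipv4)

# point the hostname at it via ZoneEdit's dynamic DNS interface
# (user, password and host are placeholders)
curl -s -u zoneedit_user:zoneedit_pass \
  "https://dynamic.zoneedit.com/auth/dynamic.html?host=www.example.com&dnsto=$IP"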