philippe::niquille | regular niche market thoughts

Amazon S3 triple encrypted true Rsync backup

Jun 15th 2008
See a workaround for whitespace in directory names here.

As a follow-up to my local rsync script compilation, I now present a small Python script that rsyncs your folders to your S3 buckets and takes care of local encryption. I am really fond of the idea of “storing your data in the cloud”.

You will need:

1) An Amazon AWS account with S3 enabled; sign up here. Amazon will charge you 0.15US$ per GB stored per month.

2) An rsync account from s3rsync.com. They built a special rsync setup on Amazon EC2 to allow real partial file syncing. The problem with direct rsync-to-S3 connections is that S3 only allows you to PUT entire objects (files). They charge 19US$ per 380 rsync-hours, which is 0.05US$ per hour. Remember that only transfers after large changes to your data (and, of course, transfers over slow connections) take a lot of time. See their FAQ.
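To get a feeling for the running costs, the two prices quoted above can be combined in a few lines of Python. This is only a rough sketch: the workload figures (20 GB, 10 rsync-hours) are made up for illustration, and the calculation ignores that rsync-hours are bought in packages of 380 and leaves out any other fees.

```python
STORAGE_USD_PER_GB_MONTH = 0.15   # S3 storage price quoted in this post
RSYNC_PACKAGE_USD = 19.0          # s3rsync.com package price quoted in this post
RSYNC_PACKAGE_HOURS = 380         # rsync-hours included in that package


def rsync_hourly_rate():
    """Price of one rsync-hour implied by the package price."""
    return RSYNC_PACKAGE_USD / RSYNC_PACKAGE_HOURS


def monthly_cost(stored_gb, rsync_hours):
    """Storage cost plus rsync time for one month, in US$."""
    return stored_gb * STORAGE_USD_PER_GB_MONTH + rsync_hours * rsync_hourly_rate()


if __name__ == "__main__":
    # Hypothetical workload: 20 GB stored, 10 hours of syncing per month.
    print(f"{rsync_hourly_rate():.2f} US$ per rsync-hour")   # 0.05
    print(f"{monthly_cost(20, 10):.2f} US$ per month")       # 3.50
```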

You will also need to install some tools on your system for the encryption to work properly. The particular problem with encrypting data for rsync is that normal encryption usually changes the entire output when only a small part of the input changes, so the whole file has to be transferred every time. Rsync would not be of much use in such a case. Murk divides a file into small blocks and encrypts each of them separately; this weakens the encryption, but it seems to be the most transfer-efficient solution.
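Why block-wise encryption helps rsync can be shown with a toy example. The sketch below is not Murk's algorithm and not real cryptography: it XORs each block with a SHA-256 derived keystream, once independently per block and once chained to the previous ciphertext block, and then counts how many encrypted blocks change after a one-byte edit. All function names are invented for this illustration.

```python
import hashlib

BLOCK = 32  # bytes; equal to the SHA-256 digest size used as toy keystream


def _xor(data, keystream):
    return bytes(a ^ b for a, b in zip(data, keystream))


def encrypt_independent(plain, key):
    """Each block's keystream depends only on the key and the block offset."""
    out = []
    for i in range(0, len(plain), BLOCK):
        ks = hashlib.sha256(key + i.to_bytes(8, "big")).digest()
        out.append(_xor(plain[i:i + BLOCK], ks))
    return out


def encrypt_chained(plain, key):
    """Each block's keystream also depends on the previous ciphertext block."""
    out, prev = [], b""
    for i in range(0, len(plain), BLOCK):
        ks = hashlib.sha256(key + prev).digest()
        prev = _xor(plain[i:i + BLOCK], ks)
        out.append(prev)
    return out


def changed_blocks(a, b):
    """Number of positions at which two block lists differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


if __name__ == "__main__":
    key = b"demo key"
    original = bytes(range(256))      # 8 blocks of 32 bytes
    edited = bytearray(original)
    edited[70] ^= 0xFF                # flip one byte inside the third block
    edited = bytes(edited)

    print(changed_blocks(encrypt_independent(original, key),
                         encrypt_independent(edited, key)))   # 1
    print(changed_blocks(encrypt_chained(original, key),
                         encrypt_chained(edited, key)))       # 6
```

With independent blocks only the edited block differs, so rsync has little to transfer. With chaining, every block from the edit onwards differs. The price is the weakness mentioned above: the blocks no longer protect each other.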

Be sure to have your developer tools installed and first get the following two source packages:

BZip2, compile with ./configure && make && make install

OpenSSL, compile with ./Configure darwin-i386-cc && make

Then get the Murk sources and compile them, since you now have the required crypto and compression libraries.

Murk, compile with ./configure -L ../openssl-0.9.8h && make && sudo cp ./murk /usr/bin/

Next, create a new cipher key so Murk can encrypt and decrypt in standalone mode. I suggest using AES-256: murk -a <your_key> -c aes-256

You can now compress a folder, e.g. test, and encrypt it in an rsync-friendly manner with:

tar -cf - test | murk -n -c aes-256 -k /Users/fluppel/.murk -v -o rsync.tar.zm <your_key>

However, I learned that piping the tar output to murk produces corrupt files, at least with my setup. Try using a temporary file instead:

tar -cf no_cipher.tar ../test && murk -n -c aes-256 -k /Users/fluppel/.murk -o rsync.tar.zm <your_key> no_cipher.tar && rm no_cipher.tar
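If you want to drive the temporary-file variant from Python, a minimal sketch could look like the following. The murk flags are copied unchanged from the command line in this post and are not checked against the Murk documentation. The function names and default file names are made up for this example.

```python
import subprocess


def build_encrypt_commands(folder, key_name, key_file,
                           archive="rsync.tar.zm", tmp_tar="no_cipher.tar"):
    """Return the tar, murk and rm steps as argument lists."""
    return [
        ["tar", "-cf", tmp_tar, folder],
        ["murk", "-n", "-c", "aes-256", "-k", key_file,
         "-o", archive, key_name, tmp_tar],
        ["rm", tmp_tar],
    ]


def run_all(commands):
    """Run the commands in order and stop at the first failure, like `&&`."""
    for argv in commands:
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Only print the commands here; call run_all(...) to execute them.
    for argv in build_encrypt_commands("../test", "<your_key>",
                                       "/Users/fluppel/.murk"):
        print(" ".join(argv))
```

Passing each command as a list avoids a shell, so folder names containing spaces reach tar as a single argument.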

The *.tar.zm file can be decrypted by using: murk -n -k /Users/fluppel/.murk -d <your_key> rsync.tar.zm

This technique now allows you to encrypt large directories and rsync them to an S3 bucket using the s3rsync.com service. Before issuing the command, be sure to save the supplied SSH private key to a local path of yours. Then issue:

rsync -v --stats -e "ssh -i /Users/fluppel/.ssh/s3rsync_dsa.priv" -az /Users/fluppel/Desktop/Inbox/test_due/ <your_user>@farm.s3rsync.com:%%<your_bucket>%%<your_AWS_id>%%<your_AWS_secret_key>
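The destination string is easy to get wrong by hand, so a script can assemble it. The following is a small sketch that only rebuilds the rsync call shown in this post. The function names are hypothetical, and the `%%` separators and the host name are taken from the command line in this post rather than from the s3rsync.com documentation.

```python
def s3rsync_destination(user, bucket, aws_id, aws_secret,
                        host="farm.s3rsync.com"):
    """Build the user@host:%%bucket%%id%%secret target used in this post."""
    return f"{user}@{host}:%%{bucket}%%{aws_id}%%{aws_secret}"


def build_rsync_command(source_dir, ssh_key_path, destination):
    """Return the rsync call as an argument list (no shell quoting needed)."""
    return ["rsync", "-v", "--stats",
            "-e", f"ssh -i {ssh_key_path}",   # rsync splits this string itself
            "-az", source_dir, destination]


if __name__ == "__main__":
    dest = s3rsync_destination("<your_user>", "<your_bucket>",
                               "<your_AWS_id>", "<your_AWS_secret_key>")
    cmd = build_rsync_command("/Users/fluppel/Desktop/Inbox/test_due/",
                              "/Users/fluppel/.ssh/s3rsync_dsa.priv", dest)
    print(cmd)   # pass the list to subprocess.run(cmd, check=True) to execute
```

Keep in mind that the AWS secret key is part of the command line, so other users on the same machine may be able to see it in the process list while rsync runs.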

The service then stores the transferred contents as a tar file in the corresponding bucket. And wait, before you rush in: you’ll also need to create some buckets in your Amazon S3 account. I used a service such as s3interface.com; just be sure to change your secret AWS key after creating your buckets. You never know who gets hold of your key when you use third-party services.

The nice thing about this solution is that your backed-up directories reside in a bucket somewhere in an Amazon datacenter in the US. You can access the *.tar file at any time using Cyberduck, JungleDisk or s3fs. You can then just download the file, unpack it and decrypt its contents with the murk command described above. Remember that you will now pay transfer fees to Amazon (whereas with the rsync service you do not pay for traffic but for the hours used). I get around 250 - 350 kbit/s downloading from a US S3 bucket to a European location, which is not so bad.
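To put that download rate into perspective, here is a quick back-of-the-envelope calculation. It takes the kbit/s figures above at face value, assumes 1 GB = 10^9 bytes and ignores protocol overhead. The function name is just for this example.

```python
def download_hours(size_gb, kbit_per_s):
    """Hours needed to download size_gb gigabytes at kbit_per_s kilobits/s."""
    bits = size_gb * 8 * 10**9          # 1 GB taken as 10^9 bytes
    seconds = bits / (kbit_per_s * 1000)
    return seconds / 3600


if __name__ == "__main__":
    for rate in (250, 350):
        print(f"1 GB at {rate} kbit/s: {download_hours(1, rate):.1f} h")
    # 1 GB at 250 kbit/s: 8.9 h
    # 1 GB at 350 kbit/s: 6.3 h
```

A full restore of a large archive at these rates takes many hours, which is one more reason to keep archives small or to restore selectively.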

Now this all seems pretty complicated. I therefore put together a small Python script that takes care of encrypting certain parts of your data tree and rsyncing them to your buckets. I personally do not encrypt all my data, for two reasons:

1) Your data is being transferred over an SSH connection and is already being stored in an encrypted bucket by Amazon.

2) Encrypting my whole documents folder and so forth always uses a great amount of CPU and time. I therefore limit encryption to more or less sensitive data. In the end it really all comes down to how much you trust your gateways and hosters.

Feel free to use my Python hacking. I know it’s not perfect, but I think it will do its job. Try calling it from crontab or whatever service you like.

Get v0.1 here.

You will also need growlnotify installed, which should be part of the default Growl installation.

There is one issue with the script though: the rsync exclude pattern does not seem to work properly with folder names containing whitespace.
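One possible workaround, suggested by sehe in the comments, is to replace each space in an exclude pattern with a `?` wildcard. The helper below is a hypothetical sketch of that idea, not part of the v0.1 script. It uses Python's fnmatch only to illustrate the matching, and rsync's own pattern rules may differ in detail.

```python
import fnmatch


def space_safe_pattern(name):
    """Replace each space with '?', which matches any single character."""
    return name.replace(" ", "?")


if __name__ == "__main__":
    pattern = space_safe_pattern("My Documents")
    print(pattern)                                          # My?Documents
    print(fnmatch.fnmatchcase("My Documents", pattern))     # True
    # The wildcard also matches other characters in that position:
    print(fnmatch.fnmatchcase("My_Documents", pattern))     # True
```

As the last line shows, the pattern can exclude more than intended if similarly named folders exist.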

2 Comments

  1. Another option, which gives more flexibility, is to “murk” file by file* into a temporary encrypted directory and then rsync that directory.

    In this case you will be able to restore a single file faster.

    You can restore a single file or a single directory through rsync include/exclude rules using the s3rsync.com service, so you don’t need to download the entire archive.

    * I believe it can be done in a single command by piping “find” to “murk”.

  2. sehe

    I believe the more-or-less accepted common fix for spaces in excludes is to use ‘?’ marks instead of spaces. Of course this implies globbing of file lists, and it might not do what you expect, depending on how clever/confused your directory naming conventions are…
