Hosting data in the cloud has many advantages, but it usually has the distinct disadvantage of leaving you beholden to a single vendor. Outages do happen, even with S3 (see July 2008). Until now, it was a huge pain, and a significant cost, to establish any kind of mirroring between S3 and another major cloud storage vendor. This is because you need to compare the file lists at both locations and then copy whatever files are missing or out of date, which means that you need a server that is at least partly dedicated to this mirroring task.
However, as I have intimated, mirroring between Amazon S3 and Rackspace Cloud Files (“CF”) is now much easier, and very cheap. Read on for the gory details.
To mirror between S3 and CF, one needs a running server, but only for as long as it takes to do the comparing and copying. Also, the server doesn’t need a lot of power, since almost all of what it will do is transfer files from one service to another. In fact, the cheapest Amazon EC2 server, the t1.micro, is perfect for this job, as is spot pricing for instances. This means that the server costs can reliably be as low as 1 cent per hour. So, in line with the philosophy I articulated in my “Servers are Software” article, I have created a model “multi-cloud-mirroring” server that launches periodically, runs until the files are synchronized between an S3 bucket and a CF container, and then self-terminates.
Ordinarily, this type of flexible, auto-launching, configuring, and terminating server would be time-consuming to create and would require another server of mine to launch it, but RightScale's advanced cloud management platform makes it very easy, and removes the requirement of having another server for launch. I have created a public ServerTemplate for fellow RightScale users called the "Multi-Cloud Mirroring Manager". You can use that template in a RightScale autoscaling server array so that the server launches periodically (say, twice per day), and quits once done. See my step-by-step tutorial on setting up the autoscaling array.
In addition to RightScale, I have leaned heavily on Python and both the boto and python-cloudfiles libraries. I wrote a standalone script, called multi-cloud-mirror, to handle mirroring from S3 to CF and vice versa, and I created a separate Google Code page for it. Please feel free to do whatever you would like with the script; I have released it under the MPL.
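To make the compare-and-copy step concrete, here is a minimal sketch of the comparison in plain Python. It is not the multi-cloud-mirror script itself: the function name `plan_mirror` is hypothetical, and it assumes you have already fetched each side's listing (for example via boto and python-cloudfiles) into a dictionary mapping file names to MD5 checksums. Comparing by checksum is my assumption for illustration; a real script might compare sizes or timestamps instead.

```python
def plan_mirror(source, destination):
    """Decide what to copy from source to destination.

    Both arguments are dicts mapping file name -> MD5 hex digest
    (an assumed listing format, built beforehand from each service).
    Returns (to_copy, only_in_destination), both sorted lists of names.
    """
    # Copy anything missing from the destination or whose checksum differs.
    to_copy = sorted(
        name for name, digest in source.items()
        if destination.get(name) != digest
    )
    # Files that exist only at the destination; reported, not deleted.
    only_in_destination = sorted(
        name for name in destination if name not in source
    )
    return to_copy, only_in_destination


if __name__ == "__main__":
    # Hypothetical listings, standing in for an S3 bucket and a CF container.
    s3_listing = {"a.txt": "111", "b.txt": "222", "c.txt": "333"}
    cf_listing = {"a.txt": "111", "b.txt": "999", "old.txt": "444"}
    to_copy, extras = plan_mirror(s3_listing, cf_listing)
    print("copy to CF:", to_copy)
    print("only on CF:", extras)
```

Running the same function with the arguments swapped gives the plan for the opposite direction, CF to S3.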
Finally, let me explain my estimate of $7.00/month. Assume that you are already paying for storage on Amazon S3, and that you will copy 5GB of data each month to Cloud Files, keeping a rotating 6 months' worth of data on Cloud Files, checking twice per day, and running two hours each time at the $0.01/hour spot pricing rate. Based upon an average file size of 10MB, this would result in less than an additional $7.00/month across S3 and CF storage, request, and bandwidth costs.
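The quantities behind that estimate can be worked out from the assumptions above. The sketch below only derives figures that follow directly from those numbers (instance hours, spot cost, steady-state storage, and files copied per month); it deliberately leaves out per-GB and per-request prices, since those are not stated here. The function name `monthly_usage`, the 30-day month, and the 1GB = 1000MB conversion are my assumptions for illustration.

```python
def monthly_usage(checks_per_day=2, hours_per_check=2, spot_rate=0.01,
                  gb_copied_per_month=5, months_retained=6,
                  avg_file_mb=10, days_per_month=30):
    """Derive the usage figures behind the monthly estimate.

    days_per_month=30 and 1 GB = 1000 MB are simplifying assumptions.
    """
    # Hours the mirroring server runs each month.
    instance_hours = checks_per_day * hours_per_check * days_per_month
    # Spot-instance cost in dollars, rounded to cents.
    compute_cost = round(instance_hours * spot_rate, 2)
    # Data held on Cloud Files once the rotation is full.
    stored_gb = gb_copied_per_month * months_retained
    # Approximate number of files copied each month.
    files_per_month = gb_copied_per_month * 1000 // avg_file_mb
    return {
        "instance_hours": instance_hours,
        "compute_cost": compute_cost,
        "stored_gb": stored_gb,
        "files_per_month": files_per_month,
    }


if __name__ == "__main__":
    for key, value in monthly_usage().items():
        print(key, value)
```

With the defaults, the server runs 120 hours a month, which comes to $1.20 at the spot rate, while Cloud Files holds 30GB at steady state and receives roughly 500 new files per month. Multiplying the storage, request, and bandwidth figures by each vendor's current prices fills in the remainder of the estimate.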