Wednesday, November 18, 2009

Offsite Sync and Backup

I have a large amount of music (~30gb) and photo files (~300gb). I back them up to my Time Capsule but that wouldn't protect me if my house burnt down. (Photo files from my Pentax K7 are 20mb each and I might take 10,000 in a year - that's 200gb added per year.)

So for an off-site backup, and so I can access them I keep a "mirror" copy on my computer at work. Currently, I update this mirror manually periodically, by copying new files to a portable hard drive and carrying that to work. But this is an awkward solution, and I don't update as often as I should.

There are a variety of backup and sync products out there, but none of them seem to handle this scenario.

I have been using Dropbox to sync my jSuneido files between home and work and laptop and it works really well. But their biggest account is 100gb.

Google's storage is getting cheaper, but Picasa won't let me store my big DNG (raw) photo files.

Jungle Disk has no limit storage, but at $.15 per gb that's roughly $50 per month, which isn't cheap.

Apart from the cost, the big problem with online storage is that uploading 300gb takes a long time. I signed up for Jungle Disk but it estimated 60 days to upload my files! Obviously, after that I'd only have to upload new files, but even a few thousand photos from a long holiday will take days or weeks to upload. Maybe I need a faster internet connection!

CrashPlan has a really interesting approach of letting you backup to other machines, either your own or your friends. This avoids the cost of storage. The upload speed may be better since the machines are local and aren't servicing other users. But CrashPlan doesn't sync, so I'd have an off-site backup, but I couldn't access the files (without restoring them). Another problem with CrashPlan is it requires both machines to be turned on at the same time. But to be environmentally friendly, I try to turn off my computers when I'm not using them.

Note: Jungle Disk only recently added sync and from their forum it sounds like it has problems.

A Proposed Solution

Here is an idea for a new service.

I don't really need a copy of my files in the cloud. If I could sync between my home and work computers that would be sufficient. I don't really want to be paying $50 per month just to store my files in the cloud.

All I really need to store in the cloud is a "summary" of my files (e.g. file names, dates, sizes, maybe hashes) plus any new or modified files. Once the files have propagated to my computers they can be removed from the cloud. If you used a clever hash scheme you keep even do partial updates of large files. (Although for music and photos this isn't that important since the files don't usually change.)

This would require far less storage than keeping a complete copy in the cloud.

You'd still have the problem of the initial syncing. But that could either be done by a different method e.g. a portable hard drive like I've been using, or by requiring both computers to be running at the same time for the initial sync. This is similar to Amazon allowing you to send them physical media to load data into S3. And if you had a big addition of files (like the photos from a long holiday) you could use an alternate method to move them around, and the sync could recognize that you already had the same files on each computer.

The businesses that make money from selling storage probably wouldn't be crazy about this idea, but it seems like a natural addition to CrashPlan since they aren't charging for storage, and charging for the sync service would be additional revenue. And presumably it could be cheap since the storage and bandwidth needs are minimal. (The actual data would be transferred peer to peer.)

You could even borrow some ideas from Git - their "tree" of hash values would work well for this, and also provides security and error checking.

If I had some spare time it would be a fun project. If anyone out there wants to implement it, you can count me in as your first customer :-)

No comments: