August 31, 2014

rsync for file sync and backup - storing my personal media

You do back up all of your important files, right?

TLDR: Dropbox + NAS + Local Storage + rsync = my backup strategy.

I have a lot of data that is important to me, mostly documents, code, images, music, and videos. I currently have about 1.2TB stored. This is an overview of how I store and back up my data.


About rsync:

rsync is a great file synchronization and file transfer program. Since being announced in 1996, it has become a standard Linux utility, included in all popular Linux distributions (and other Unix-like systems).


I have a Dropbox account with an 11GB quota. This suffices for storing my documents, code, and images. It nicely syncs them to all my computers, and I can easily access this data from outside my LAN on any device I own.

Then I have a NAS as my main storage within my LAN. It is a 2-bay enclosure holding two 2TB hard drives in a RAID configuration. I can easily access all of my media from any computer or device on my home network. I also have a secondary NAS attached to my network, purely for backup/disaster situations.

To complete the storage picture... on my main workstation, I have a 2TB hard disk mounted as a secondary data drive.

Storage Overview:

  • Dropbox
  • 2 NAS servers mounted as drives on my main workstation (using SMB)
  • Main workstation with a 2TB hard disk mounted internally
  • Gigabit Ethernet LAN

So, that's the given hardware setup. I use a shell script to actually run my backup. It is initiated from the main workstation (scheduled from cron).

The script follows this workflow:

  • local Dropbox gets rsync'ed to my primary NAS
  • primary NAS gets rsync'ed to my local workstation's data drive
  • primary NAS gets rsync'ed to my backup NAS
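
A minimal sketch of a script following this workflow might look like the one below. It is an illustration under stated assumptions, not the actual script: the paths, the `sync_dir` and `run_backup` function names, the `dropbox/` subdirectory on the NAS, and the use of `--delete` are all hypothetical choices.

```shell
#!/bin/sh
# Sketch of the three-step workflow. All paths are hypothetical
# placeholders; point them at the real Dropbox folder and mount points.
DROPBOX_DIR="${DROPBOX_DIR:-$HOME/Dropbox/}"
PRIMARY_NAS="${PRIMARY_NAS:-/mnt/nas-primary/}"
BACKUP_NAS="${BACKUP_NAS:-/mnt/nas-backup/}"
LOCAL_DATA="${LOCAL_DATA:-/mnt/data/backup/}"

# Mirror one directory into another.
#   -a        archive mode: recurse and preserve times, permissions, symlinks
#             (some attributes may not survive on an SMB mount)
#   --delete  assumption: also remove files from the destination that no
#             longer exist in the source; drop it to keep deleted files
# A trailing slash on the source copies its contents, not the directory itself.
sync_dir() {
    rsync -a --delete "$1" "$2"
}

# Run the three steps in order, stopping at the first failure.
run_backup() {
    # 1. local Dropbox -> primary NAS (into an assumed dropbox/ subdirectory)
    sync_dir "$DROPBOX_DIR" "${PRIMARY_NAS}dropbox/" || return 1
    # 2. primary NAS -> local workstation's data drive
    sync_dir "$PRIMARY_NAS" "$LOCAL_DATA" || return 1
    # 3. primary NAS -> backup NAS
    sync_dir "$PRIMARY_NAS" "$BACKUP_NAS" || return 1
}

# Only do real work when called with --run, so the file can be sourced
# and inspected without touching any data.
if [ "${1:-}" = "--run" ]; then
    run_backup
fi
```

Saved as, say, backup.sh (a hypothetical name), it could be started from cron with an entry such as `0 2 * * * /path/to/backup.sh --run`, where both the path and the nightly schedule are placeholders.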

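I have ~20k files (~1.2TB) in my archive. The first backup run was slow (several hours), but subsequent runs are extremely fast. This is mostly because rsync skips any file whose size and modification time match the copy at the destination, so only new or changed files get copied. (The delta-transfer algorithm and optional compression help mainly when rsync runs over a network connection rather than between locally mounted paths.) If few files have changed since my last backup/sync, a run takes only a few seconds or minutes.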
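
To check how little a repeat run has to do, a dry run can list the pending changes before anything is copied. This is a minimal sketch: `preview_sync` and the example paths are hypothetical names, while `-n` and `-i` are standard rsync options.

```shell
#!/bin/sh
# Preview what a sync would transfer without copying anything.
#   -a  archive mode, matching the flags used for a real run
#   -n  (--dry-run) report only, make no changes
#   -i  (--itemize-changes) print one line per item that would be updated
preview_sync() {
    rsync -a -n -i "$1" "$2"
}

# Example with hypothetical paths:
#   preview_sync /mnt/nas-primary/ /mnt/nas-backup/
```

A short or empty listing means the next real run has little or nothing to copy.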

Apart from the lack of an offsite copy for disaster recovery, I like this system, and it lets me sleep well knowing my data is safe and recoverable.

Any flaws?