August 31, 2014

rsync for file sync and backup - storing my personal media

You do backup all of your important files, right?

TLDR: Dropbox + NAS + Local Storage + rsync = my backup strategy.

I have a lot of data that is important to me. Mostly: documents, code, images, music, and videos. I currently have about 1.2TB stored. This is an overview of how I store and backup my data.


About rsync:

rsync is a great file synchronization and file transfer program. Since being announced in 1996, it has become a standard Linux utility, included in all popular Linux distributions (and other Unix-like systems).


I have a Dropbox with an 11GB quota. This suffices for storing my documents, code, and images. It syncs them to all my computers, and I can easily access this data from any device I own, even outside my LAN.

Then I have a NAS as my main storage within my LAN. It is a 2-bay enclosure with two 2TB hard drives in a RAID configuration. I can easily access all of my media from any computer or device on my home network. I also have a secondary NAS attached to my network, purely for backup and disaster recovery.

To complete the storage picture... on my main workstation, I have a 2TB hard disk mounted as a secondary data drive.

Storage Overview:

  • Dropbox
  • 2 NAS servers mounted as drives on my main workstation (using SMB)
  • Main workstation with 2TB Hard disk mounted internally
  • Gigabit Ethernet LAN

So, that's the hardware setup. I use a shell script to run the backup; it is started from the main workstation and scheduled with cron.

The script follows this workflow:

  • local Dropbox gets rsync'ed to my primary NAS
  • primary NAS gets rsync'ed to my local workstation's data drive
  • primary NAS gets rsync'ed to my backup NAS
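The same three-step workflow can be sketched in Python (the actual backup tool here is a shell script, which isn't shown). The mount points are hypothetical, and plain `-a` is an assumed choice of rsync flags; the real script's options aren't given. Each step only runs if the previous one succeeded.

```python
import subprocess

# Hypothetical mount points; the real paths are not given in the post.
DROPBOX = "/home/user/Dropbox"
PRIMARY_NAS = "/mnt/nas-primary"
BACKUP_NAS = "/mnt/nas-backup"
LOCAL_DATA = "/mnt/data"


def rsync_cmd(src, dst):
    """Build one rsync command. -a (archive) recurses and preserves
    permissions, timestamps, and symlinks. The trailing slash on src
    copies the directory's contents into dst instead of nesting it."""
    return ["rsync", "-a", src.rstrip("/") + "/", dst]


def build_steps(dropbox, primary_nas, local_data, backup_nas):
    """Return the three sync steps in the order described above."""
    return [
        rsync_cmd(dropbox, primary_nas + "/Dropbox"),  # hypothetical target folder
        rsync_cmd(primary_nas, local_data),
        rsync_cmd(primary_nas, backup_nas),
    ]


def run_backup(steps):
    """Run each step in order. check=True raises CalledProcessError when
    rsync exits non-zero, which stops the chain so later steps never run
    after a failed copy to the primary NAS."""
    for cmd in steps:
        subprocess.run(cmd, check=True)
```

A script calling `run_backup(build_steps(DROPBOX, PRIMARY_NAS, LOCAL_DATA, BACKUP_NAS))` could then be scheduled with a crontab line such as `0 3 * * * python3 /path/to/backup.py` (hypothetical path). Adding rsync's `--delete` flag would make each destination an exact mirror, but it would also propagate accidental deletions, which is a real trade-off for a backup.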

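I have ~20k files (~1.2TB) in my archive. The first backup run was slow (several hours), but subsequent runs are extremely fast. Most of that speed comes from rsync's quick check: it compares each file's size and modification time and skips files that haven't changed, so only new or modified files get copied. (rsync's delta-transfer algorithm and compression matter mainly over a network connection; when both paths are local, including SMB shares mounted as drives, rsync copies changed files whole by default.) If few files have changed since the last run, a backup takes only seconds or minutes.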
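The skip-unchanged-files behavior can be approximated in a few lines of Python. This is only an illustration: `needs_transfer` and `changed_files` are hypothetical helpers, not part of rsync, and comparing timestamps to whole seconds is a simplification of rsync's real check.

```python
import os


def needs_transfer(src_path, dst_path):
    """Approximate rsync's default quick check: a file is (re)sent only if
    the destination copy is missing or differs in size or modification time."""
    if not os.path.exists(dst_path):
        return True
    src, dst = os.stat(src_path), os.stat(dst_path)
    # Whole-second mtime comparison is a simplification.
    return src.st_size != dst.st_size or int(src.st_mtime) != int(dst.st_mtime)


def changed_files(src_root, dst_root):
    """Yield paths, relative to src_root, that a sync would need to copy."""
    for dirpath, _dirs, files in os.walk(src_root):
        for name in files:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, src_root)
            if needs_transfer(src, os.path.join(dst_root, rel)):
                yield rel
```

For an authoritative answer, `rsync -a --dry-run --itemize-changes SRC/ DST` lists what a real run would transfer without copying anything.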

Apart from lacking an offsite copy for disaster recovery, I like this system; I sleep well knowing my data is safe and recoverable.

Any flaws?

January 11, 2014

Python - Fixing My Photo Library Dates (Exif Metadata)

I have a large image library of photos I've taken or downloaded over the years. They are from various cameras and sources, many with missing or incomplete Exif metadata.

This is problematic because some image viewing programs and galleries use metadata to sort images into timelines. For example, when I view my library in the Dropbox Photos timeline, images with missing Exif date tags are not displayed.

To remedy this, I wrote a Python script to fix the dates in my photo library. It uses gexiv2, which is a wrapper around the Exiv2 photo metadata library.

The script will:

  • recursively scan a directory tree for jpg and png files
  • get each file's creation time
  • convert it to a timestamp string
  • set Exif.Image.DateTime tag to timestamp
  • set Exif.Photo.DateTimeDigitized tag to timestamp
  • set Exif.Photo.DateTimeOriginal tag to timestamp
  • save file with modified metadata
  • set file access and modified times to file creation time

* Note: it modifies files in-place.
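The steps listed above can be sketched in Python roughly as follows. This is an illustration, not the original script: the helper names are hypothetical, and it assumes gexiv2's GObject-introspection bindings (`gi.repository.GExiv2`, typelib version 0.10) are installed. Like the original, it rewrites files in place, so try it on a copy of the library first.

```python
import os
from datetime import datetime

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")
DATE_TAGS = (
    "Exif.Image.DateTime",
    "Exif.Photo.DateTimeDigitized",
    "Exif.Photo.DateTimeOriginal",
)


def find_images(root):
    """Recursively yield paths of JPEG and PNG files under root."""
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name.lower().endswith(IMAGE_EXTENSIONS):
                yield os.path.join(dirpath, name)


def file_creation_time(path):
    """Best-effort creation time. st_birthtime exists on macOS/BSD (and on
    Windows in Python 3.12+); otherwise fall back to st_ctime, which on
    Linux is the inode change time rather than the creation time."""
    st = os.stat(path)
    return getattr(st, "st_birthtime", st.st_ctime)


def exif_timestamp(ts):
    """Format a POSIX timestamp in Exif's 'YYYY:MM:DD HH:MM:SS' layout."""
    return datetime.fromtimestamp(ts).strftime("%Y:%m:%d %H:%M:%S")


def fix_dates(path):
    # Imported lazily so the helpers above work without PyGObject installed.
    import gi
    gi.require_version("GExiv2", "0.10")
    from gi.repository import GExiv2

    created = file_creation_time(path)  # read first: saving changes ctime
    stamp = exif_timestamp(created)
    metadata = GExiv2.Metadata()
    metadata.open_path(path)
    for tag in DATE_TAGS:
        metadata.set_tag_string(tag, stamp)
    metadata.save_file(path)  # writes the modified metadata in place
    os.utime(path, (created, created))  # access and modified time = creation time


def fix_library(root):
    for path in find_images(root):
        fix_dates(path)
```

Calling `fix_library("/path/to/photos")` (hypothetical path) processes the whole tree. The timestamp is captured before saving because writing the file updates its ctime on Linux.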

The Code: