April 27, 2015

Linux - Simulating Degraded Network Conditions

When testing an application or service, it can be very useful to simulate degraded network conditions. This allows you to see how it might perform for real users. Testing on a controlled LAN is not always realistic, since a LAN has very different network properties than a real user would experience over a WAN.

I will give examples using 2 tools that alter a network interface's properties. The tools are: NetEm and Wondershaper.

NetEm is an enhancement of the Linux traffic control facilities that allows you to add delay, packet loss, duplication, corruption, reordering, and more to outgoing packets from a specified network interface. NetEm is controlled by the command line tool tc, which is part of the iproute2 package and included in most Linux distros.

Wondershaper is a traffic shaping script that allows you to throttle network transfers on a specified network interface. This is useful for testing latency when bandwidth is limited.


add fixed 250ms delay to all outgoing packets on Ethernet interface eth1:
  $ sudo tc qdisc add dev eth1 root netem delay 250ms
turn it off:
  $ sudo tc qdisc del dev eth1 root netem

limit bandwidth of eth1 interface to 256kbps download and 128kbps upload (wondershaper takes the downlink rate first, then the uplink rate):
  $ sudo wondershaper eth1 256 128
turn it off:
  $ sudo wondershaper clear eth1
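
If you drive these commands from a test harness, it can help to build the tc argument lists in one place. Here is a minimal Python sketch of that idea. The helper names (netem_add_cmd, netem_del_cmd) are made up for illustration; the tc arguments mirror the NetEm commands above, plus netem's loss option. Nothing is executed here, it only builds the command.

```python
def netem_add_cmd(dev, delay_ms=None, loss_pct=None):
    """Return the argv list that adds a netem qdisc to interface `dev`."""
    cmd = ["tc", "qdisc", "add", "dev", dev, "root", "netem"]
    if delay_ms is not None:
        # fixed delay applied to every outgoing packet
        cmd += ["delay", "%dms" % delay_ms]
    if loss_pct is not None:
        # percentage of outgoing packets to drop at random
        cmd += ["loss", "%g%%" % loss_pct]
    return cmd


def netem_del_cmd(dev):
    """Return the argv list that removes the netem qdisc from `dev`."""
    return ["tc", "qdisc", "del", "dev", dev, "root", "netem"]


if __name__ == "__main__":
    # prints the same command used in the 250ms example above
    print(" ".join(netem_add_cmd("eth1", delay_ms=250)))
```

To actually apply a command, you could pass the list (prefixed with sudo) to subprocess.check_call.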

August 31, 2014

rsync for file sync and backup - storing my personal media

You do backup all of your important files, right?

TLDR: Dropbox + NAS + Local Storage + rsync = my backup strategy.

I have a lot of data that is important to me. Mostly: documents, code, images, music, and videos. I currently have about 1.2TB stored. This is an overview of how I store and backup my data.

About rsync:

rsync is a great file synchronization and file transfer program. Since being announced in 1996, it has become a standard Linux utility, included in all popular Linux distributions (and other Unix-like systems).

I have a Dropbox with an 11GB quota. This suffices for storing my documents, code, and images. It nicely syncs them to all my computers. I can easily access this data from outside my LAN on any device I own.

Then I have a NAS as my main storage within my LAN. It is a 2-bay enclosure holding two 2TB hard drives in a RAID configuration. I can easily access all of my media from any computer or device on my home network. I also have a secondary NAS attached to my network, purely for backup/disaster situations.

To complete the storage picture... on my main workstation, I have a 2TB hard disk mounted as a secondary data drive.

Storage Overview:

  • Dropbox
  • 2 NAS servers mounted as drives on my main workstation (using SMB)
  • Main workstation with 2TB Hard disk mounted internally
  • Gigabit Ethernet LAN

So, that's the given hardware setup. I use a shell script to actually run my backup. It is initiated from the main workstation (scheduled from cron).

The script follows this workflow:

  • local Dropbox gets rsync'ed to my primary NAS
  • primary NAS gets rsync'ed to my local workstation's data drive
  • primary NAS gets rsync'ed to my backup NAS
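
The three steps above can be sketched roughly like this. It is a Python illustration rather than the actual shell script. All of the paths are hypothetical placeholders, and the -a (archive) flag is an assumption, since the flags used by the real script aren't listed here.

```python
import subprocess

# Hypothetical mount points; substitute your own.
DROPBOX = "/home/me/Dropbox/"
PRIMARY_NAS = "/mnt/nas1/"
LOCAL_DATA = "/mnt/data/"
BACKUP_NAS = "/mnt/nas2/"


def rsync_cmd(src, dest):
    # -a (archive mode) recurses and preserves times and permissions.
    # Assumption: the real script may use different flags.
    return ["rsync", "-a", src, dest]


def backup_plan():
    """Return the rsync commands in the order the workflow runs them."""
    return [
        rsync_cmd(DROPBOX, PRIMARY_NAS + "Dropbox/"),  # Dropbox -> primary NAS
        rsync_cmd(PRIMARY_NAS, LOCAL_DATA),            # primary NAS -> local data drive
        rsync_cmd(PRIMARY_NAS, BACKUP_NAS),            # primary NAS -> backup NAS
    ]


def run_backup(dry_run=True):
    """Print each command, or execute it when dry_run is False."""
    for cmd in backup_plan():
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.check_call(cmd)
```

The trailing slash on each source path tells rsync to copy the directory's contents rather than the directory itself.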

I have ~20k files (~1.2TB) in my archive. The first time running the backup job was slow (several hours), but subsequent backups are extremely fast. This is due to the rsync algorithm, delta encoding, and compression. If not many files changed since my last backup/sync, differential backups take only a few seconds or minutes.

Besides not having an offsite copy for disaster recovery, I like this system, and it makes me sleep well knowing my data is safe and recoverable.

Any flaws?

January 11, 2014

Python - Fixing My Photo Library Dates (Exif Metadata)

I have a large image library of photos I've taken or downloaded over the years. They are from various cameras and sources, many with missing or incomplete Exif metadata.

This is problematic because some image viewing programs and galleries use metadata to sort images into timelines. For example, when I view my library in Dropbox Photos timeline, images with missing Exif date tags are not displayed.

To remedy this, I wrote a Python script to fix the dates in my photo library. It uses gexiv2, which is a wrapper around the Exiv2 photo metadata library.

The script will:

  • recursively scan a directory tree for jpg and png files
  • get each file's creation time
  • convert it to a timestamp string
  • set Exif.Image.DateTime tag to timestamp
  • set Exif.Photo.DateTimeDigitized tag to timestamp
  • set Exif.Photo.DateTimeOriginal tag to timestamp
  • save file with modified metadata
  • set file access and modified times to file creation time

* Note: it does modifications in-place.
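
As a rough illustration of the steps listed above, here is a standard-library-only sketch of the scan, timestamp, and file-time parts. It is not the original script. The Exif write is left to a caller-supplied function, write_tags, which is a hypothetical hook where a gexiv2-based writer would go. Also note that os.path.getctime returns the metadata-change time on Linux, so treating it as "creation time" is an approximation.

```python
import os
import time

EXIF_DATE_TAGS = (
    "Exif.Image.DateTime",
    "Exif.Photo.DateTimeDigitized",
    "Exif.Photo.DateTimeOriginal",
)
IMAGE_EXTENSIONS = (".jpg", ".png")


def find_images(root):
    """Yield paths of jpg and png files under root, recursively."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.lower().endswith(IMAGE_EXTENSIONS):
                yield os.path.join(dirpath, name)


def exif_timestamp(epoch_seconds):
    """Format epoch seconds as an Exif date string (YYYY:MM:DD HH:MM:SS)."""
    return time.strftime("%Y:%m:%d %H:%M:%S", time.localtime(epoch_seconds))


def fix_dates(root, write_tags):
    """Run the workflow on every image under root (modifies files in place).

    write_tags(path, tags) is a hypothetical callback that saves the
    {tag_name: value} dict into the image file's metadata.
    """
    for path in find_images(root):
        ctime = os.path.getctime(path)
        stamp = exif_timestamp(ctime)
        write_tags(path, dict.fromkeys(EXIF_DATE_TAGS, stamp))
        # set access and modified times to the time captured above
        os.utime(path, (ctime, ctime))
```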

The Code:

November 17, 2013

Gource Visualization of Ubuntu Touch Core Apps Development

TLDR: I made a cool version control visualization of all the Ubuntu Touch Core Apps.

The video: https://www.youtube.com/watch?v=nAmKAgRS0tw

* Warning: abrasive techno music
* To be watched in HD, preferably at maximum volume

Making Gource visualizations of complex software projects is awesome. I love seeing a VCS commit log come to life as blooming trees and swarming workers. Normally, I do a visualization video of a single repository, but in this case I used a bash script to create a visualization of multiple source code repositories. I wanted to see the progress of the entire stack of Ubuntu Touch Core Apps, an umbrella project for all 17 of the core apps that are available in Ubuntu on mobile devices.

The Ubuntu Touch Core Apps:

  • Dropping Letters
  • Evernote Online Accounts plugin
  • QtDeclarative bindings for the Grilo media scanner
  • Stock Ticker App
  • Sudoku App
  • Ubuntu Calculator App
  • Ubuntu Calendar App
  • Ubuntu Clock App
  • Ubuntu Document Viewer App
  • Ubuntu E-mail App
  • Ubuntu Facebook App
  • Ubuntu File Manager App
  • Ubuntu Music App
  • Ubuntu Phone Commons
  • Ubuntu RSS Feed Reader App
  • Ubuntu Terminal App
  • Ubuntu Weather App

Making the visualization:

Assuming you have a bunch of source code repositories already branched/cloned locally, here is a general version of the script to generate visualization videos of multiple projects/repositories: https://gist.github.com/cgoldberg/7488521

The script I used to create the Ubuntu Touch Core Apps video: https://gist.github.com/cgoldberg/7516510
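
One common way to visualize several repositories at once is to export a Gource custom log per repository (gource --output-custom-log), prefix each file path with its project name, and sort the combined entries by timestamp. Here is a small Python sketch of that merge step. It is an illustration under those assumptions, not a port of the bash scripts linked above, and the function names are made up.

```python
def prefix_log(lines, project):
    """Prefix the file path of each custom-log entry with /project.

    Gource custom log entries look like: timestamp|username|type|file
    (with an optional trailing colour field).
    """
    out = []
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        fields[3] = "/" + project + "/" + fields[3].lstrip("/")
        out.append("|".join(fields))
    return out


def merge_logs(logs_by_project):
    """Combine per-project logs into one list ordered by timestamp."""
    merged = []
    for project, lines in logs_by_project.items():
        merged.extend(prefix_log(lines, project))
    merged.sort(key=lambda entry: int(entry.split("|", 1)[0]))
    return merged
```

The merged lines can be written to a file and handed to gource as a single log, so each project shows up as its own branch of the tree.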

October 22, 2013

deadsnakes - Using Old Versions of Python on Ubuntu

How do you install an older version of Python on Ubuntu without building it yourself?

The Python packages in the official Ubuntu archives generally don't go back all that far, but people might still need to develop and test against these old Python interpreters. Felix Krull maintains a PPA (package archive) of older Python versions that are easy to install on Ubuntu.

see: https://launchpad.net/~fkrull/+archive/deadsnakes

Currently supported Python releases: 2.4, 2.5, 2.6, 2.7, 3.1, 3.2, 3.3


Add the deadsnakes repository:

$ sudo add-apt-repository ppa:fkrull/deadsnakes

Run Update:

$ sudo apt-get update

Install an older version of Python:

$ sudo apt-get install python2.6 python2.6-dev

June 22, 2013

Generating Audio Spectrograms in Python

A spectrogram is a visual representation of the spectrum of frequencies in a sound sample.

more info: wikipedia spectrogram

Spectrogram code in Python, using Matplotlib:
(source on GitHub)

"""Generate a Spectrogram image for a given WAV audio sample.

A spectrogram, or sonogram, is a visual representation of the spectrum
of frequencies in a sound.  Horizontal axis represents time, Vertical axis
represents frequency, and color represents amplitude.
"""

import wave

import numpy
import pylab

def graph_spectrogram(wav_file):
    sound_info, frame_rate = get_wav_info(wav_file)
    pylab.figure(num=None, figsize=(19, 12))
    pylab.title('spectrogram of %r' % wav_file)
    pylab.specgram(sound_info, Fs=frame_rate)
    # output filename is an assumption; change it to suit
    pylab.savefig('spectrogram.png')

def get_wav_info(wav_file):
    # assumes a mono WAV file with 16-bit samples
    wav = wave.open(wav_file, 'r')
    frames = wav.readframes(-1)
    sound_info = numpy.frombuffer(frames, dtype=numpy.int16)
    frame_rate = wav.getframerate()
    wav.close()
    return sound_info, frame_rate

if __name__ == '__main__':
    wav_file = 'sample.wav'
    graph_spectrogram(wav_file)
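
If you don't have a WAV file handy to try this on, a test tone is easy to generate with the standard library. The sketch below is only a helper for experimenting. The function name and default values are arbitrary choices, and it writes a mono 16-bit file.

```python
import math
import struct
import wave


def write_sine_wav(path, freq_hz=440.0, seconds=1.0, frame_rate=44100):
    """Write a mono 16-bit WAV file containing a single sine tone.

    Returns the number of frames written.
    """
    n_frames = int(seconds * frame_rate)
    # half of full scale, so the tone is well clear of clipping
    samples = (
        int(32767 * 0.5 * math.sin(2 * math.pi * freq_hz * i / frame_rate))
        for i in range(n_frames)
    )
    # little-endian signed 16-bit samples, as WAV expects
    data = b"".join(struct.pack("<h", s) for s in samples)
    wav = wave.open(path, "wb")
    try:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(frame_rate)
        wav.writeframes(data)
    finally:
        wav.close()
    return n_frames
```

Calling write_sine_wav('sample.wav') produces a one second 440 Hz tone, which should show up in a spectrogram as a single horizontal band.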

Spectrogram code in Python, using timeside:
(source on GitHub)

"""Generate a Spectrogram image for a given audio sample.

Compatible with several audio formats: wav, flac, mp3, etc.
Requires: https://code.google.com/p/timeside/

A spectrogram, or sonogram, is a visual representation of the spectrum
of frequencies in a sound.  Horizontal axis represents time, Vertical axis
represents frequency, and color represents amplitude.
"""

import timeside

audio_file = 'sample.wav'

decoder = timeside.decoder.FileDecoder(audio_file)
grapher = timeside.grapher.Spectrogram(width=1920, height=1080)
(decoder | grapher).run()

happy audio hacking.

June 10, 2013

Python - concurrencytest: Running Concurrent Tests

Add parallel testing to your unit test framework.

In my previous post, I described running concurrent tests using nose as a loader and runner.

On a similar note, let's look at building concurrency into your own test framework built on Python's unittest.

Have a look at this module: concurrencytest

(Thanks to bits and concepts taken from testtools and bzrlib)

An Example:

Say you have a 'TestSuite' of tests loaded. You could run them with the standard 'TextTestRunner' like this:

runner = unittest.TextTestRunner()
runner.run(suite)

That would run the tests in your suite sequentially in a single process.

By adding the concurrencytest module, you can use a 'ConcurrentTestSuite' instead, by adding:

from concurrencytest import ConcurrentTestSuite, fork_for_tests

concurrent_suite = ConcurrentTestSuite(suite, fork_for_tests(4))
runner.run(concurrent_suite)

That would run the same tests split across 4 processes (workers).

Note: this relies on 'os.fork()' which only works on Unix systems.
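
Putting those pieces together, a minimal end-to-end sketch might look like the following. The SleepyTestCase class, its delay value, and the helper function names are made up for illustration. The concurrencytest import lives inside run_concurrent so the rest of the module still works if the package isn't installed.

```python
import time
import unittest


class SleepyTestCase(unittest.TestCase):
    """A contrived test that just waits, standing in for IO-bound work."""

    delay = 0.01  # seconds; arbitrary value for illustration

    def runTest(self):
        time.sleep(self.delay)


def build_suite(count):
    """Return a TestSuite holding `count` sleepy test cases."""
    suite = unittest.TestSuite()
    suite.addTests(SleepyTestCase() for _ in range(count))
    return suite


def run_sequential(suite):
    """Run the suite in a single process."""
    return unittest.TextTestRunner().run(suite)


def run_concurrent(suite, workers):
    """Run the suite split across `workers` forked processes (Unix only)."""
    from concurrencytest import ConcurrentTestSuite, fork_for_tests
    concurrent_suite = ConcurrentTestSuite(suite, fork_for_tests(workers))
    return unittest.TextTestRunner().run(concurrent_suite)
```

To compare timings, build a fresh suite for each run, for example run_sequential(build_suite(50)) followed by run_concurrent(build_suite(50), 4).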

There's no way to understand this better than looking at some contrived examples!

This first example is totally unrealistic, but shows off concurrency perfectly. The test cases it loads each sleep for 0.5 seconds and then exit.

The Code:


Loaded 50 test cases...

Run tests sequentially:
Ran 50 tests in 25.031s


Run same tests across 50 processes:
Ran 50 tests in 0.525s



Now another example that shows concurrency with CPU-bound test cases. The test cases it loads each calculate Fibonacci of 31 (recursively!) and then exit. We can see how it performs on my machine with 8 logical cores (quad-core i7, hyperthreaded).
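
For reference, a CPU-bound test case along those lines could be sketched like this. It is an illustration, not the original code. It assumes the usual definition with fib(0) = 0 and fib(1) = 1, and the names are hypothetical.

```python
import unittest


def fib(n):
    """Naive recursive Fibonacci, deliberately slow to keep a CPU busy."""
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)


class FibTestCase(unittest.TestCase):
    def runTest(self):
        # fib(31) is 1346269 when fib(0) = 0 and fib(1) = 1
        self.assertEqual(fib(31), 1346269)


def build_fib_suite(count=50):
    """Return a TestSuite holding `count` CPU-bound test cases."""
    return unittest.TestSuite(FibTestCase() for _ in range(count))
```

A suite built this way can be wrapped in a ConcurrentTestSuite like any other.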

The Code:


Loaded 50 test cases...

Run tests sequentially:
Ran 50 tests in 21.941s


Run same tests with 2 processes:
Ran 50 tests in 11.081s


Run same tests with 4 processes:
Ran 50 tests in 5.862s


Run same tests with 8 processes:
Ran 50 tests in 4.743s


happy hacking.

June 9, 2013

Python - Nose: Running Concurrent Tests

To enable multiprocessing with N workers,
run nose with:

$ nosetests --processes=N

When writing tests in Python, I start with test cases derived from unittest.TestCase, and standard test discovery. When I need more complex test discovery/loading or output reports, I often use nose and its assortment of plugins as my test loader/runner.

One nice feature of nose is the multiprocess plugin. It allows you to run your test suites concurrently rather than sequentially, spread across a number of worker processes. Running tests in parallel like this can potentially give you a large speedup in your test run times.

from the nose multiprocess docs:

"You can parallelize a test run across a configurable number of worker processes. While this can speed up CPU-bound test runs, it is mainly useful for IO-bound tests that spend most of their time waiting for data to arrive from someplace else and can benefit from parallelization."

Normally, you run tests from nose with:

$ nosetests

To run the same tests split across 4 processes (workers), you would just do:

$ nosetests --processes=4

Assuming your tests are properly isolated, everything should run normally, and you can benefit from a speedup on a multiprocessor machine.
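
As one illustration of what "properly isolated" can mean in practice, the sketch below gives every test its own scratch directory, so tests running in different worker processes never touch the same files. The class and test names are made up; only the standard library is used.

```python
import os
import shutil
import tempfile
import unittest


class IsolatedFileTest(unittest.TestCase):
    def setUp(self):
        # Each test gets a private directory, removed again after the test.
        self.workdir = tempfile.mkdtemp()
        self.addCleanup(shutil.rmtree, self.workdir)

    def test_write_then_read(self):
        path = os.path.join(self.workdir, "data.txt")
        with open(path, "w") as f:
            f.write("hello")
        with open(path) as f:
            self.assertEqual(f.read(), "hello")

    def test_directory_starts_empty(self):
        # Passes regardless of test order or which worker runs it.
        self.assertEqual(os.listdir(self.workdir), [])
```

Tests written this way don't depend on execution order or on leftovers from other tests, which is what makes them safe to spread across processes.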

However, beware. The nose docs also warn:

"Not all test suites will benefit from, or even operate correctly using, this plugin. For example, CPU-bound tests will run more slowly if you don't have multiple processors."
"But the biggest issue you will face is probably concurrency. Unless you have kept your tests as religiously pure unit tests, with no side-effects, no ordering issues, and no external dependencies, chances are you will experience odd, intermittent and unexplainable failures and errors when using this plugin. This doesn't necessarily mean the plugin is broken; it may mean that your test suite is not safe for concurrency."