April 27, 2015
I will give examples using two tools that alter a network interface's properties: NetEm and Wondershaper.
NetEm is an enhancement of the Linux traffic control facilities that allows you to add delay, packet loss, duplication, corruption, reordering, and more to outgoing packets from a specified network interface. NetEm is controlled by the command line tool tc, which is part of the iproute2 package and included in most Linux distros.
Wondershaper is a traffic shaping script that allows you to throttle network transfers on a specified network interface. This is useful for testing latency when bandwidth is limited.
Add a fixed 250ms delay to all outgoing packets on Ethernet interface eth1:
$ sudo tc qdisc add dev eth1 root netem delay 250ms
Turn it off:
$ sudo tc qdisc del dev eth1 root netem
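NetEm can emulate more than fixed delay. A few more settings I find useful for testing, covering the loss, corruption, and reordering mentioned above (the values and the eth1 interface name are illustrative; apply one at a time, deleting any existing qdisc first):

```shell
# drop 1% of outgoing packets
$ sudo tc qdisc add dev eth1 root netem loss 1%

# 100ms delay with +/-20ms of random variation (jitter)
$ sudo tc qdisc add dev eth1 root netem delay 100ms 20ms

# corrupt 0.1% of packets (single-bit errors)
$ sudo tc qdisc add dev eth1 root netem corrupt 0.1%

# delay packets 10ms, but send 25% of them immediately,
# which forces reordering
$ sudo tc qdisc add dev eth1 root netem delay 10ms reorder 25% 50%
```

These are network configuration commands (they require root and a real interface), so treat them as a reference rather than a script to run verbatim.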
Limit bandwidth of the eth1 interface to 256kbps download and 128kbps upload:
$ sudo wondershaper eth1 256 128
Turn it off:
$ sudo wondershaper clear eth1
August 31, 2014
You do backup all of your important files, right?
TLDR: Dropbox + NAS + Local Storage + rsync = my backup strategy.
I have a lot of data that is important to me. Mostly: documents, code, images, music, and videos. I currently have about 1.2TB stored. This is an overview of how I store and back up my data.
rsync is a great file synchronization and file transfer program. Since being announced in 1996, it has become a standard Linux utility, included in all popular Linux distributions (and other Unix-like systems).
I have a Dropbox with an 11GB quota. This suffices for storing my documents, code, and images. It nicely syncs them to all my computers. I can easily access this data from outside my LAN on any device I own.
Then I have a NAS as my main storage within my LAN. It is a 2-bay enclosure with two 2TB hard drives in a RAID configuration. I can easily access all of my media from any computer or device on my home network. I also have a secondary NAS attached to my network, purely for backup/disaster situations.
To complete the storage picture... on my main workstation, I have a 2TB hard disk mounted as a secondary data drive.
- 2 NAS servers mounted as drives on my main workstation (using SMB)
- Main workstation with 2TB Hard disk mounted internally
- Gigabit Ethernet LAN
So, that's the given hardware setup. I use a shell script to actually run my backup. It is initiated from the main workstation (scheduled from cron).
The script follows this workflow:
- local Dropbox gets rsync'ed to my primary NAS
- primary NAS gets rsync'ed to my local workstation's data drive
- primary NAS gets rsync'ed to my backup NAS
I have ~20k files (~1.2TB) in my archive. The first backup run was slow (several hours), but subsequent runs are extremely fast, thanks to rsync's delta-transfer algorithm and compression. If not many files changed since the last backup, an incremental run takes only a few seconds or minutes.
Aside from not having an offsite copy for disaster recovery, I like this system, and it lets me sleep well knowing my data is safe and recoverable.
January 11, 2014
I have a large image library of photos I've taken or downloaded over the years. They are from various cameras and sources, many with missing or incomplete Exif metadata.
This is problematic because some image viewing programs and galleries use metadata to sort images into timelines. For example, when I view my library in Dropbox Photos timeline, images with missing Exif date tags are not displayed.
The script will:
- recursively scan a directory tree for jpg and png files
- get each file's creation time
- convert it to a timestamp string
- set Exif.Image.DateTime tag to timestamp
- set Exif.Photo.DateTimeDigitized tag to timestamp
- set Exif.Photo.DateTimeOriginal tag to timestamp
- save file with modified metadata
- set file access and modified times to file creation time
* Note: it does modifications in-place.
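Those steps can be sketched in Python. Here I'm assuming the pyexiv2 library, since it uses the Exif.Image.DateTime-style tag names listed above (the library choice and helper names are my assumptions; the original script isn't shown):

```python
import datetime
import os


def creation_time(path):
    """Return the file's creation time (ctime) as a datetime."""
    return datetime.datetime.fromtimestamp(os.path.getctime(path))


def retag(path):
    """Write the file's creation time into its Exif date tags, in place."""
    import pyexiv2  # assumed Exif tagging library
    dt = creation_time(path)
    metadata = pyexiv2.ImageMetadata(path)
    metadata.read()
    for tag in ('Exif.Image.DateTime',
                'Exif.Photo.DateTimeDigitized',
                'Exif.Photo.DateTimeOriginal'):
        metadata[tag] = dt
    metadata.write()
    # set file access and modified times to the creation time
    ctime = os.path.getctime(path)
    os.utime(path, (ctime, ctime))
```

Wrapping `retag()` in an `os.walk()` loop that filters on .jpg/.png extensions covers the recursive directory scan.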
November 17, 2013
TLDR: I made a cool version control visualization of all the Ubuntu Touch Core Apps.
The video: https://www.youtube.com/watch?v=nAmKAgRS0tw
* Warning: abrasive techno music
* To be watched in HD, preferably at maximum volume
Making Gource visualizations of complex software projects is awesome. I love seeing a VCS commit log come to life as blooming trees and swarming workers. Normally, I do a visualization video of a single repository. But in this case, I used a bash script to create a visualization of multiple source code repositories. I wanted to see the progress of the entire stack of Ubuntu Touch Core Apps (17 projects). Ubuntu Touch Core Apps is an umbrella project for all of the core apps that are available in Ubuntu on mobile devices.
The Ubuntu Touch Core Apps:
- Dropping Letters
- Evernote Online Accounts plugin
- QtDeclarative bindings for the Grilo media scanner
- Stock Ticker App
- Sudoku App
- Ubuntu Calculator App
- Ubuntu Calendar App
- Ubuntu Clock App
- Ubuntu Document Viewer App
- Ubuntu E-mail App
- Ubuntu Facebook App
- Ubuntu File Manager App
- Ubuntu Music App
- Ubuntu Phone Commons
- Ubuntu RSS Feed Reader App
- Ubuntu Terminal App
- Ubuntu Weather App
Making the visualization:
Assuming you have a bunch of source code repositories already branched/cloned locally, here is a general version of the script to generate visualization videos of multiple projects/repositories: https://gist.github.com/cgoldberg/7488521
The script I used to create the Ubuntu Touch Core Apps video: https://gist.github.com/cgoldberg/7516510
October 22, 2013
How do you install an older version of Python on Ubuntu without building it yourself?
The Python packages in the official Ubuntu archives generally don't go back all that far, but people might still need to develop and test against old Python interpreters. Felix Krull maintains a PPA (Personal Package Archive) of older Python versions that are easy to install on Ubuntu.
Currently supported Python releases: 2.4, 2.5, 2.6, 2.7, 3.1, 3.2, 3.3
Add the deadsnakes repository:
$ sudo add-apt-repository ppa:fkrull/deadsnakes
$ sudo apt-get update
Install an older version of Python:
$ sudo apt-get install python2.6 python2.6-dev
June 22, 2013
A spectrogram is a visual representation of the spectrum of frequencies in a sound sample.
More info: the Wikipedia article on spectrograms.
Spectrogram code in Python, using Matplotlib: (source on GitHub)
"""Generate a Spectrogram image for a given WAV audio sample.

A spectrogram, or sonogram, is a visual representation of the spectrum
of frequencies in a sound. Horizontal axis represents time, vertical
axis represents frequency, and color represents amplitude.
"""

import wave

import pylab


def graph_spectrogram(wav_file):
    sound_info, frame_rate = get_wav_info(wav_file)
    pylab.figure(num=None, figsize=(19, 12))
    pylab.subplot(111)
    pylab.title('spectrogram of %r' % wav_file)
    pylab.specgram(sound_info, Fs=frame_rate)
    pylab.savefig('spectrogram.png')


def get_wav_info(wav_file):
    wav = wave.open(wav_file, 'r')
    frames = wav.readframes(-1)
    sound_info = pylab.fromstring(frames, 'Int16')
    frame_rate = wav.getframerate()
    wav.close()
    return sound_info, frame_rate


if __name__ == '__main__':
    wav_file = 'sample.wav'
    graph_spectrogram(wav_file)
"""Generate a Spectrogram image for a given audio sample.

Compatible with several audio formats: wav, flac, mp3, etc.
Requires: https://code.google.com/p/timeside/

A spectrogram, or sonogram, is a visual representation of the spectrum
of frequencies in a sound. Horizontal axis represents time, vertical
axis represents frequency, and color represents amplitude.
"""

import timeside

audio_file = 'sample.wav'
decoder = timeside.decoder.FileDecoder(audio_file)
grapher = timeside.grapher.Spectrogram(width=1920, height=1080)
(decoder | grapher).run()
grapher.render('spectrogram.png')
happy audio hacking.
June 10, 2013
Add parallel testing to your unit test framework.
On a similar note, let's look at building concurrency into your own test framework built on Python's unittest.
Have a look at this module: concurrencytest
Say you have a 'TestSuite' of tests loaded. You could run them with the standard 'TextTestRunner' like this:
runner = unittest.TextTestRunner()
runner.run(suite)
That would run the tests in your suite sequentially in a single process.
By adding the concurrencytest module, you can use a 'ConcurrentTestSuite' instead, by adding:
from concurrencytest import ConcurrentTestSuite, fork_for_tests

concurrent_suite = ConcurrentTestSuite(suite, fork_for_tests(4))
runner.run(concurrent_suite)
That would run the same tests split across 4 processes (workers).
Note: this relies on 'os.fork()' which only works on Unix systems.
There's no way to understand this better than looking at some contrived examples!
This first example is totally unrealistic, but shows off concurrency perfectly. The test cases it loads each sleep for 0.5 seconds and then exit.
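Such a test case and suite can be sketched with the standard unittest module (the class and helper names here are my own, not from the original example code):

```python
import time
import unittest


class SleepTest(unittest.TestCase):
    """Contrived test case: sleeps for half a second, then passes."""

    def test_sleep(self):
        time.sleep(0.5)


def load_suite(num_tests=50):
    """Build a suite containing num_tests copies of the sleeping test."""
    suite = unittest.TestSuite()
    for _ in range(num_tests):
        suite.addTest(SleepTest('test_sleep'))
    return suite
```

Run sequentially, 50 of these take ~25 seconds; forked across 50 workers, they finish in roughly the time of a single test.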
Loaded 50 test cases...

Run tests sequentially:
..................................................
----------------------------------------------------------------------
Ran 50 tests in 25.031s

OK

Run same tests across 50 processes:
..................................................
----------------------------------------------------------------------
Ran 50 tests in 0.525s

OK
Now another example that shows concurrency with CPU-bound test cases. The test cases it loads each calculate fibonacci of 31 (recursively!) and then exit. We can see how it performs on my 8-core machine (Core2 i7 quad, hyperthreaded).
Loaded 50 test cases...

Run tests sequentially:
..................................................
----------------------------------------------------------------------
Ran 50 tests in 21.941s

OK

Run same tests with 2 processes:
..................................................
----------------------------------------------------------------------
Ran 50 tests in 11.081s

OK

Run same tests with 4 processes:
..................................................
----------------------------------------------------------------------
Ran 50 tests in 5.862s

OK

Run same tests with 8 processes:
..................................................
----------------------------------------------------------------------
Ran 50 tests in 4.743s

OK
June 9, 2013
To enable multiprocessing with N workers, run nose with:
$ nosetests --processes=N
When writing tests in Python, I start with test cases derived from unittest.TestCase, and standard test discovery. When I need more complex test discovery/loading or output reports, I often use nose and its assortment of plugins as my test loader/runner.
One nice feature of nose is the multiprocess plugin. It allows you to run your test suites concurrently rather than sequentially, spread across a number of worker processes. Running tests in parallel like this can potentially give you a large speedup in your test run times.
from the nose multiprocess docs:
"You can parallelize a test run across a configurable number of worker processes. While this can speed up CPU-bound test runs, it is mainly useful for IO-bound tests that spend most of their time waiting for data to arrive from someplace else and can benefit from parallelization."
Normally, you run tests from nose with:

$ nosetests
To run the same tests split across 4 processes (workers), you would just do:
$ nosetests --processes=4
Assuming your tests are properly isolated, everything should run normally, and you can benefit from a speedup on a multiprocessor machine.
"Not all test suites will benefit from, or even operate correctly using, this plugin. For example, CPU-bound tests will run more slowly if you don't have multiple processors."
"But the biggest issue you will face is probably concurrency. Unless you have kept your tests as religiously pure unit tests, with no side-effects, no ordering issues, and no external dependencies, chances are you will experience odd, intermittent and unexplainable failures and errors when using this plugin. This doesn't necessarily mean the plugin is broken; it may mean that your test suite is not safe for concurrency."
April 1, 2013
Use Squeezebox, without buying a Squeezebox...
Recently, Logitech discontinued most Squeezebox streaming music players. However, the media server is Open Source, so it looks like some form of Logitech Media Server (LMS) will live on, no matter what Logitech eventually does with it.
I've been a user of the Squeezebox network music player since it was released by Slim Devices (SliMP3/SlimServer), and through the transfer to Logitech. I've owned 3 Squeezebox models over the years... currently enjoying the Squeezebox Touch, with music streamed from Logitech Media Server.
It works flawlessly for streaming my own music collection (FLAC/MP3/etc), and streaming radio (Pandora/Slacker/Sirius/etc), to my HiFi. I use the digital (S/PDIF) outputs, and sometimes the DAC/analog (RCA) outputs.
Now... with the release of Squeezelite, you can build your own Squeezebox, or use an existing computer/laptop with digital output as a Squeezebox.
Squeezelite is a cross-platform, headless, LMS client that supports playback synchronization, gapless playback, direct streaming, and playback at various sampling rates. It runs on Linux using ALSA audio output and other platforms using PortAudio. It is aimed at supporting high quality audio.
I gave Squeezelite 1.0 a try on Ubuntu 12.04, with S/PDIF optical output to my DAC. It worked like a charm!
Squeezelite download (precompiled binaries for x86/amd64/arm):
Enjoy the music.
March 28, 2013
I had a bunch of FLAC (.flac) audio files together in a directory. They were from various sources, and their metadata (tags) was somewhat incomplete or incorrect.
I managed to manually get all of the files standardized in "%Artist% - %Title%.flac" file name format. However, what I really wanted was to clear their metadata and save only "Artist" and "Title" tags, pulled from the file names.
I looked at a few audio tagging tools in the Ubuntu repos, and came up short finding something simple that covered my needs. (I use Audio Tag Tool for MP3's, but it has no FLAC file support.)
So, I figured the easiest way to get this done was a quick Python script.
I grabbed Mutagen, a Python module to handle audio metadata with FLAC support.
This is essentially the task I was looking to do:
#!/usr/bin/env python

import glob
import os

from mutagen.flac import FLAC

for filename in glob.glob('*.flac'):
    # parse the "%Artist% - %Title%.flac" file name format
    artist, title = os.path.splitext(filename)[0].split(' - ', 1)
    audio = FLAC(filename)
    audio.clear()  # wipe existing tags
    audio['artist'] = artist
    audio['title'] = title
    audio.save()
It iterates over .flac files in the current directory, clearing the metadata and rewriting only the artist/title tags based on each file name.
I created a repository with a slightly more full-featured version, used to re-tag single FLAC files: