Corey Goldberg

January 25, 2011

Going to PyCon 2011!

This year I am attending my first PyCon (the annual Python community conference).

I will be in Atlanta: March 10-13.

If anyone is interested in meeting up or collaborating while I'm there, get in touch:

Twitter: @cgoldberg
Homepage/Info: goldb.org

December 1, 2010

Python - Search a Local or Remote Splunk Server

Some basic instructions for searching Splunk from Python...

First, you must install Splunk on the machine you will run the Python script from. Splunk installs its own Python interpreter that you can use to run your code. I am using Splunk 4.14, which includes Python 2.6.

(It looks like you can set some environment variables and install a few Python dependencies along with the Python SDK to get this going "outside" of Splunk, but the easiest option is just to run on their interpreter.)

To run your own Python scripts on Splunk's interpreter:
- save script into Splunk's "bin" directory
(usually "/opt/splunk/bin" or "C:\Program Files\Splunk\bin")
- go to the "bin" directory and run:
splunk cmd python your_script.py

So...
What goes in your Python code?

First, import the modules you will need:

 
import time
import splunk.auth
import splunk.search

Next, authenticate and get a session key.

For the local splunk host:

 
key = splunk.auth.getSessionKey('user', 'password')

If you are going to search a remote splunk host, you must authenticate against it by adding the "hostPath" parameter:

 
key = splunk.auth.getSessionKey('user', 'password', hostPath='https://mysplunk:8089')

Tips:
- use https, even if you are not using SSL in your Splunk web interface
- the 'admin' user doesn't seem to work; use a normal user/password.

Next, submit a search job.

For a local search:

 
job = splunk.search.dispatch('search index="os" *', earliest_time='-15m')

For a remote search, use the "hostPath" parameter again:

 
job = splunk.search.dispatch('search index="os" *', earliest_time='-15m', hostPath='https://mysplunk:8089')

Print the job details:

 
print job

Wait for the results:

 
while not job.isDone:
    time.sleep(.25)

Print the results:

 
for result in job.results:
    print result

Altogether in a Python script:

 
#!/usr/bin/env python
# Corey Goldberg - 2010
# 
#  search a remote splunk server
#
#  instructions:
#   - save script into splunk's "bin" directory
#     (usually "/opt/splunk/bin" or "C:\Program Files\Splunk\bin")
#   - go to the "bin" directory and run: 
#     $ splunk cmd python my_script.py
#



import time
import splunk.auth
import splunk.search



SPLUNK_SERVER = '192.168.12.173'
USER_NAME = 'foo'
PASSWORD = 'secret'
SEARCH_STRING = 'search index="os"'
EARLIEST_TIME = '-15m'



def main():
    # authenticate
    key = splunk.auth.getSessionKey(USER_NAME, PASSWORD, hostPath='https://%s:8089' % SPLUNK_SERVER)
    print 'auth key:\n%s' % key
    
    # submit a search job
    job = splunk.search.dispatch(SEARCH_STRING, earliest_time=EARLIEST_TIME, hostPath='https://%s:8089' % SPLUNK_SERVER)
    print 'job details:\n%s' % job

    # wait for results
    while not job.isDone:
        time.sleep(.25)
    
    print 'results:'    
    for result in job.results:
        print result
          
          

if __name__ == '__main__':
    main()
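
The "while not job.isDone" loop in the script above keeps polling if a job never completes. Here is a minimal sketch of a bounded wait, written for Python 3. The names wait_until_done and FakeJob are hypothetical and exist only for this illustration (they are not part of Splunk's modules); the only thing assumed about the job object is the isDone attribute used above.

```python
import time


def wait_until_done(job, timeout=60.0, interval=0.25):
    """Poll job.isDone until it is true or `timeout` seconds pass.

    Returns True if the job finished, False if we gave up waiting.
    """
    deadline = time.time() + timeout
    while not job.isDone:
        if time.time() >= deadline:
            return False
        time.sleep(interval)
    return True


class FakeJob(object):
    """Hypothetical stand-in for a search job: not done for `polls` checks."""

    def __init__(self, polls):
        self._remaining = polls

    @property
    def isDone(self):
        self._remaining -= 1
        return self._remaining < 0


if __name__ == '__main__':
    # finishes after three "not done" polls, well inside the timeout
    print(wait_until_done(FakeJob(polls=3), timeout=5.0, interval=0.01))
```

In the real script you would pass the object returned by splunk.search.dispatch() and decide what to do when the function returns False.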

October 8, 2010

Python - Shorten a URL Using Google's Shortening Service (goo.gl)

Using Python to shorten a URL with Google's shortening service (goo.gl):

 
#!/usr/bin/python 
#  Corey Goldberg - 2010

import json
import urllib
import urllib2


def shorten(url):
    gurl = 'http://goo.gl/api/url?url=%s' % urllib.quote(url)
    req = urllib2.Request(gurl, data='')
    req.add_header('User-Agent', 'toolbar')
    results = json.load(urllib2.urlopen(req))
    return results['short_url']


if __name__ == '__main__':
    print shorten('http://www.goldb.org/')
    print shorten('www.yahoo.com')

You give it a URL to shorten: shorten('http://www.goldb.org/long_url')

... and it returns a shortened URL for you: 'http://goo.gl/jh4W'
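
The quoting step is what keeps the long URL from being misread as part of the request URL. Here is a small sketch of just that step, written for Python 3, where urllib.quote lives at urllib.parse.quote. The helper name build_request_url is made up for this example, and nothing is sent over the network.

```python
from urllib.parse import quote


def build_request_url(url, safe='/'):
    """Return the API URL with `url` percent-encoded as the query value."""
    return 'http://goo.gl/api/url?url=%s' % quote(url, safe=safe)


if __name__ == '__main__':
    # default behaviour: ':' is escaped, '/' is left alone
    print(build_request_url('http://www.goldb.org/'))
    # safe='' escapes '/' as well as '?', '=' and '&'
    print(build_request_url('http://www.goldb.org/?a=1&b=2', safe=''))
```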

September 24, 2010

Python - Linux: Parse Network Stats From ifconfig

I needed to get some Linux networking stats in my Python program today. Specifically, I needed 'bytes sent' and 'bytes received' counts since last reboot from the local machine.

ifconfig is a network configuration utility for Linux that you run from the command line:

corey@studio17:~$ ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:22:19:e5:07:31  
          inet addr:10.0.0.5  Bcast:10.0.0.255  Mask:255.255.255.0
          inet6 addr: fe80::222:19ff:fee5:731/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:3353822 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3052408 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:3476310326 (3.4 GB)  TX bytes:256706611 (256.7 MB)
          Interrupt:17

The following function parses output from ifconfig to get the network stats I was after:

import re
import subprocess

def get_network_bytes(interface):
    output = subprocess.Popen(['ifconfig', interface], stdout=subprocess.PIPE).communicate()[0]
    rx_bytes = re.findall('RX bytes:([0-9]*) ', output)[0]
    tx_bytes = re.findall('TX bytes:([0-9]*) ', output)[0]
    return (rx_bytes, tx_bytes)

Example usage:

import re
import subprocess

def main():
    rx_bytes, tx_bytes = get_network_bytes('eth0')
    print '%s bytes received' % rx_bytes
    print '%s bytes sent' % tx_bytes
      
def get_network_bytes(interface):
    output = subprocess.Popen(['ifconfig', interface], stdout=subprocess.PIPE).communicate()[0]
    rx_bytes = re.findall('RX bytes:([0-9]*) ', output)[0]
    tx_bytes = re.findall('TX bytes:([0-9]*) ', output)[0]
    return (rx_bytes, tx_bytes)

if __name__ == '__main__':
    main()
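
If you want to check the parsing without running ifconfig, you can separate it from the subprocess call. Here is a sketch for Python 3 that works on a captured string (the two lines are copied from the sample output above). The function name parse_network_bytes is made up for this example; it returns integers, and None when the counters are not found.

```python
import re

# two lines copied from the ifconfig output shown earlier in the post
SAMPLE = (
    '          collisions:0 txqueuelen:1000 \n'
    '          RX bytes:3476310326 (3.4 GB)  TX bytes:256706611 (256.7 MB)\n'
)


def parse_network_bytes(output):
    """Return (rx_bytes, tx_bytes) as ints, or None if a counter is missing."""
    rx = re.search(r'RX bytes:(\d+)', output)
    tx = re.search(r'TX bytes:(\d+)', output)
    if rx is None or tx is None:
        return None
    return int(rx.group(1)), int(tx.group(1))


if __name__ == '__main__':
    print(parse_network_bytes(SAMPLE))
```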

Update: someone left an anonymous comment and mentioned you can just read from /proc/net/dev rather than using ifconfig. I modified the code sample from that comment and came up with this:

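def get_network_bytes(interface):
    for line in open('/proc/net/dev', 'r'):
        # match the exact interface name, so 'eth0' does not also match 'veth0'
        name, sep, counters = line.partition(':')
        if sep and name.strip() == interface:
            data = counters.split()
            # field 0 is bytes received, field 8 is bytes transmitted
            rx_bytes, tx_bytes = (data[0], data[8])
            return (rx_bytes, tx_bytes)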

Example Usage:

def main():
    rx_bytes, tx_bytes = get_network_bytes('eth0')
    print '%s bytes received' % rx_bytes
    print '%s bytes sent' % tx_bytes
      
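def get_network_bytes(interface):
    for line in open('/proc/net/dev', 'r'):
        # match the exact interface name, so 'eth0' does not also match 'veth0'
        name, sep, counters = line.partition(':')
        if sep and name.strip() == interface:
            data = counters.split()
            # field 0 is bytes received, field 8 is bytes transmitted
            rx_bytes, tx_bytes = (data[0], data[8])
            return (rx_bytes, tx_bytes)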

if __name__ == '__main__':
    main()
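
To see which columns are being read, here is a sketch for Python 3 that parses a string laid out like /proc/net/dev. The counter values in SAMPLE are made up (the eth0 row reuses the numbers from the ifconfig output above), and parse_proc_net_dev is a hypothetical name for this example. It compares the whole interface name, so asking for eth0 does not pick up the veth0 row.

```python
# made-up sample in the /proc/net/dev layout: 8 receive fields, then 8 transmit fields
SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:    4096      32    0    0    0     0          0         0     4096      32    0    0    0     0       0          0
 veth0:     100       1    0    0    0     0          0         0      200       2    0    0    0     0       0          0
  eth0: 3476310326 3353822    0    0    0     0          0         0 256706611 3052408    0    0    0     0       0          0
"""


def parse_proc_net_dev(text, interface):
    """Return (rx_bytes, tx_bytes) as ints for `interface`, or None if absent."""
    for line in text.splitlines():
        name, sep, counters = line.partition(':')
        if not sep or name.strip() != interface:
            continue  # header line, or a different interface
        fields = counters.split()
        return int(fields[0]), int(fields[8])
    return None


if __name__ == '__main__':
    print(parse_proc_net_dev(SAMPLE, 'eth0'))
```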

September 23, 2010

My Chrome Users Now Outnumber Internet Explorer (Traffic Stats)

I really like to analyze the traffic to my various websites (homepage, blog, project pages, open source code).

Since my visitors must be somewhat interested in what I post, I like to think they are *just* *like* *me*.

So... of course, my own personal web metrics are better than anything else out there :)
(to you they might be worthless, but to me they are gold)

I get a pretty steady stream of about 40k pageviews per month. In the past year, I've served a paltry half million pageviews. But hey, it's a good enough sample size to work with, and pretty flattering to have attracted that many eyeballs.

Look what happened this past month:

(stats from aug_22_2010 - sept_21_2010)

Chrome visitors overtook Internet Explorer in site usage.

Here is how it has been playing out in the past year and a half:

(Notice my Firefox visitors are waning a little also.)

September 22, 2010

Python - Multi-Mechanize Script With HTTP Profiling Timers

Multi-Mechanize is an open source framework for web performance and load testing. It allows you to run simultaneous Python scripts to generate load (synthetic transactions) against a web site or web service.

The default response timer wraps the entire Transaction(), so it will time everything included. For more granular timing, you need to instrument your script code with custom timers.

Multi-Mechanize is pure Python and you have access to all of Python's standard library in your scripts. For example, you can use httplib to write a virtual user agent script and get detailed HTTP profiling times (TTFB, TTLB, etc). http://docs.python.org/library/httplib.html

 
import httplib
import time


class Transaction(object):
    def __init__(self):
        self.custom_timers = {}
    
    def run(self):
        conn = httplib.HTTPConnection('www.example.com')
        start = time.time()
        conn.request('GET', '/')
        request_time = time.time()
        resp = conn.getresponse()
        response_time = time.time()
        resp.read()  # read the body so the last timer includes the content transfer
        conn.close()
        transfer_time = time.time()
        
        self.custom_timers['request sent'] = request_time - start
        self.custom_timers['response received'] = response_time - start
        self.custom_timers['content transferred'] = transfer_time - start
        
        assert (resp.status == 200), 'Bad HTTP Response'

To test it out, you can add this to the bottom of the script and run it from the command line:

 
if __name__ == '__main__':
    trans = Transaction()
    trans.run()
    
    for timer in ('request sent', 'response received', 'content transferred'):
        print '%s: %.5f secs' % (timer, trans.custom_timers[timer])

Output:

 
request sent: 0.14429 secs
response received: 0.25995 secs
content transferred: 0.26007 secs

* If you are running MS Windows, replace the time.time() calls with time.clock() for better timer accuracy. On all other operating systems, use time.time().
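
The scripts in this post are Python 2. On Python 3.3 and later there is also time.perf_counter(), and time.clock() was removed in Python 3.8. As a rough sketch (not part of Multi-Mechanize), a small context manager can fill in custom_timers without the repeated start/stop bookkeeping. The timed helper and the 'simulated work' timer name are made up for this example, and the sleep stands in for the request you would actually measure.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(timers, name):
    """Store the elapsed seconds for the enclosed block in timers[name]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timers[name] = time.perf_counter() - start


class Transaction(object):
    def __init__(self):
        self.custom_timers = {}

    def run(self):
        with timed(self.custom_timers, 'simulated work'):
            time.sleep(0.05)  # placeholder for the request being measured


if __name__ == '__main__':
    trans = Transaction()
    trans.run()
    for timer, secs in trans.custom_timers.items():
        print('%s: %.5f secs' % (timer, secs))
```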

September 20, 2010

CSharp/.NET - Use Stopwatch() Instead of DateTime() for Accurate High Precision Timing

(originally posted at coreygoldberg.blogspot.com)

In C# (.NET), the DateTime() class is not accurate for high precision timing. Instead, use the Stopwatch() class if you need a timer. Most hardware and operating systems support a high-resolution performance counter that can be accessed through System.Diagnostics.Stopwatch.

Don't use DateTime() like this if you need accuracy:

 
using System;

class Program
{
    public static void Main()
    {
        DateTime start = DateTime.Now;

            // do timed work here

        DateTime stop = DateTime.Now;

        // don't do this. you won't get accurate timing
        Console.WriteLine("{0} ms", (stop - start).TotalMilliseconds);

        // definitely don't do this. you won't get accurate timing or full timer resolution
        Console.WriteLine("{0} ms", (stop - start).Milliseconds);
    }
}

Stopwatch() uses the operating system's high-resolution performance counter:

 
using System;
using System.Diagnostics;

class Program
{
    public static void Main()
    {
        Stopwatch stopWatch = Stopwatch.StartNew();

            // do timed work here

        stopWatch.Stop();

        // don't do this. you won't get full timer resolution
        Console.WriteLine("{0} ms", stopWatch.ElapsedMilliseconds);

        // do this to get accurate high precision timing
        Console.WriteLine("{0} ms", stopWatch.Elapsed.TotalMilliseconds);
    }
}

The Stopwatch class is in the System.Diagnostics namespace:

 
using System.Diagnostics;

Stopwatch measures elapsed time by counting timer ticks in the underlying timer mechanism. If the installed hardware and operating system support a high-resolution performance counter, then the Stopwatch class uses that counter to measure elapsed time. Otherwise, the Stopwatch class uses the system timer (DateTime class) to measure elapsed time.

To see if your system supports a high-resolution performance counter, check the Stopwatch.IsHighResolution property:

 
if (Stopwatch.IsHighResolution)
    Console.WriteLine("Using the system's high-resolution performance counter.");
else 
    Console.WriteLine("Using the DateTime class.");

To check the timing accuracy, use the Stopwatch.Frequency property:

 
long frequency = Stopwatch.Frequency;

Console.WriteLine("Timer frequency in ticks per second: {0}", frequency);

long nanosecPerTick = (1000L*1000L*1000L) / frequency;

Console.WriteLine("Timer is accurate within {0} nanoseconds", nanosecPerTick);
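
Note that the nanoseconds-per-tick calculation above is integer division, so the result is truncated. Here is the same arithmetic as a small Python sketch; the frequency values are made-up examples, not measurements from any particular machine.

```python
def nanoseconds_per_tick(frequency):
    """Whole nanoseconds per tick, truncated like the C# long division above."""
    return (1000 * 1000 * 1000) // frequency


if __name__ == '__main__':
    # example frequencies in ticks per second (illustrative values only)
    for frequency in (10000000, 3579545):
        print('%d ticks/sec -> %d ns per tick' % (frequency, nanoseconds_per_tick(frequency)))
```

A frequency above one billion ticks per second would truncate to 0 here, so treat the printed figure as a rough indication.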

August 1, 2010

Load Testing Web Services with Python and Multi-Mechanize

(originally posted at coreygoldberg.blogspot.com)

I had to test and benchmark a SOAP web service recently, and figured I'd write up some instructions for load testing with Multi-Mechanize.

Multi-Mechanize is a performance and load testing framework that enables you to write plugins (in Python) to create virtual user scripts that run in parallel against your service.

Project Home: multimechanize.com
Virtual User Script Development Guide: DevelopingScripts

Since Multi-Mechanize scripts are pure Python, you can use any Python module inside them. In the case of a web service that uses SOAP, you have a few options. You can use any 3rd party module, so something like Suds (a lightweight SOAP client lib) would be a reasonable choice. However, I decided to stick with the Python standard library and build my scripts with urllib2. After all, SOAP is just slinging a bunch of XML back-and-forth over HTTP.

So before I get started with Multi-Mechanize and performance/load testing...

Here is a Python (2.x) script using urllib2 to interact with a SOAP web service over HTTP:

import urllib2

with open('soap.xml') as f:
    soap_body = f.read()
    
req = urllib2.Request(url='http://www.foo.com/service', data=soap_body)
req.add_header('Content-Type', 'text/xml')
resp = urllib2.urlopen(req)
content = resp.read()

(The script above assumes you have a file named 'soap.xml' in the local directory that contains your payload (SOAP XML message). It will send an HTTP POST request containing your payload in the body.)
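
If you don't have a payload file handy, here is a sketch of what the request looks like when it is built but not sent. It is written for Python 3, where urllib2.Request became urllib.request.Request and the body must be bytes. The envelope uses the SOAP 1.1 namespace; the GetStatus element and the build_soap_request helper are made up for this example, so substitute whatever your service expects.

```python
import urllib.request

# minimal SOAP 1.1 envelope; the GetStatus element is a made-up placeholder
SOAP_BODY = b"""<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <GetStatus xmlns="http://www.foo.com/service"/>
  </soap:Body>
</soap:Envelope>
"""


def build_soap_request(url, body):
    """Build (but do not send) a request carrying a SOAP payload."""
    req = urllib.request.Request(url=url, data=body)
    req.add_header('Content-Type', 'text/xml; charset=utf-8')
    return req


if __name__ == '__main__':
    req = build_soap_request('http://www.foo.com/service', SOAP_BODY)
    # supplying a body is what turns the request into a POST
    print(req.get_method())
    print(req.get_header('Content-type'))
```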

After the initial script is created in Python, the next step is to convert it into a Multi-Mechanize script. To do this, you create a Transaction class with a run() method and add your code there.

As a Multi-Mechanize script, the same thing would be done like this:

import urllib2


class Transaction(object):
    def __init__(self):
        with open('soap.xml') as f:
            self.soap_body = f.read()
    
    def run(self):
        req = urllib2.Request(url='http://www.foo.com/service', data=self.soap_body)
        req.add_header('Content-Type', 'text/xml')
        resp = urllib2.urlopen(req)
        content = resp.read()
        
        assert ('Example SOAP Response' in content), 'Failed Content Verification'

(Notice I also added an assert statement to verify the content returned from the service.)
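
Before handing a script to Multi-Mechanize, it can help to call run() a few times by hand and look at the timings. The sketch below is a tiny local harness for Python 3, not Multi-Mechanize's own runner. The run_locally function is a made-up name, and the Transaction here is a stand-in that just sleeps; swap in your real class.

```python
import time


class Transaction(object):
    """Stand-in transaction; replace run() with the real request code."""

    def __init__(self):
        self.custom_timers = {}

    def run(self):
        time.sleep(0.01)  # placeholder for the SOAP request and assert


def run_locally(transaction_class, iterations=5):
    """Call run() repeatedly on one instance; return each elapsed time in seconds."""
    trans = transaction_class()
    elapsed = []
    for _ in range(iterations):
        start = time.time()
        trans.run()  # a failed assert inside run() will raise here
        elapsed.append(time.time() - start)
    return elapsed


if __name__ == '__main__':
    times = run_locally(Transaction)
    print('runs: %d, slowest: %.3f secs' % (len(times), max(times)))
```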

Now that you have a script created, it's time to configure Multi-Mechanize and run some tests.

You can download Multi-Mechanize from: code.google.com
It requires Python 2.x (2.6 or 2.7), and if you want it to generate graphs, you must also install Matplotlib and its dependencies. See the FAQ for help.

Once you have Multi-Mechanize downloaded, unzip it and go to the "/projects" directory. You can create a new project directory here. You can call it "soap_project". Inside this directory, you will need 2 things: a config file ("config.cfg"), and a "test_scripts" directory (containing the script you previously created, which you can call "soap_client.py"). Since the script is looking for a data file named "soap.xml", make sure you have one created in the main multi-mechanize directory.

The directory and file layout will look like this:

/multi-mechanize
    /lib
    /projects
        /soap_project
            /test_scripts
                soap_client.py
            config.cfg
    multi-mechanize.py
    soap.xml

(You can see the "default_project" for an example of how it should be set up.)

To begin with, you can use a simple config.cfg file like this:

[global]
run_time: 30
rampup: 0
console_logging: on
results_ts_interval: 10

[user_group-1]
threads: 1
script: soap_client.py

This will just run a single thread of your virtual user script for 30 seconds; good enough for testing and getting things going.
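
A quick way to sanity-check a config file before a run is to read it back with Python's standard configparser module (Python 3 name), which handles the "key: value" layout shown above. This is only a sketch for checking your own file, not a description of how Multi-Mechanize loads it. The summarize function is a made-up name.

```python
import configparser

# same contents as the simple config.cfg shown above
CONFIG_TEXT = """\
[global]
run_time: 30
rampup: 0
console_logging: on
results_ts_interval: 10

[user_group-1]
threads: 1
script: soap_client.py
"""


def summarize(config_text):
    """Parse the config text and return the settings as a plain dict."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    groups = [s for s in parser.sections() if s.startswith('user_group')]
    return {
        'run_time': parser.getint('global', 'run_time'),
        'rampup': parser.getint('global', 'rampup'),
        'console_logging': parser.getboolean('global', 'console_logging'),
        'total_threads': sum(parser.getint(g, 'threads') for g in groups),
        'scripts': [parser.get(g, 'script') for g in groups],
    }


if __name__ == '__main__':
    print(summarize(CONFIG_TEXT))
```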

To run a test, go to the topmost multi-mechanize directory and run:

$ python multi-mechanize.py soap_project

(You should see timer output in the console.)

Once you have things running well, you can turn off "console_logging" and increase the workload. It will take some adjustment and plenty of trials to get the load dialed in correctly for your service. You can also get a lot more sophisticated with your workload model and create multiple virtual user scripts doing different transactions. In this case, I'll keep it simple and just test one web service request type in isolation.

Once I ran plenty of iterations and got a feel for how the service responded and what its limits were, I settled on a config file like this:

[global]
run_time: 900
rampup: 900
console_logging: off
results_ts_interval: 90

[user_group-1]
threads: 30
script: soap_client.py

(30 threads, increasing load over 15 mins)

After each test run you do, a new results directory is created containing your test results. Look for "results.html" and view it in your browser to see the output report.

Have questions about Multi-Mechanize?
Post to the discussion group: groups.google.com/group/multi-mechanize
