This year I am attending my first PyCon (the annual Python community conference).
I will be in Atlanta: March 10-13.
If anyone is interested in meeting up or collaborating while I'm there, get in touch:
- Twitter: @cgoldberg
- Homepage/Info: goldb.org
Some basic instructions for searching Splunk from Python...
First, you must install Splunk on the machine you will run the Python script from. Splunk installs its own Python interpreter that you can use to run your code. I am using Splunk 4.14, which includes Python 2.6.
(It looks like you can set some environment variables and install a few Python dependencies along with the Python SDK and get this going "outside" of Splunk. But the easiest option is just to run on their interpreter).
To run your own Python scripts on Splunk's interpreter:
- save script into Splunk's "bin" directory
(usually "/opt/splunk/bin" or "C:\Program Files\Splunk\bin")
- go to the "bin" directory and run:
splunk cmd python your_script.py
So...
What goes in your Python code?
First, import the modules you will need:
import time
import splunk.auth
import splunk.search
Next, authenticate and get a session key.
For the local splunk host:
key = splunk.auth.getSessionKey('user', 'password')
If you are going to search a remote splunk host, you must authenticate against it by adding the "hostPath" parameter:
key = splunk.auth.getSessionKey('user', 'password', hostPath='https://mysplunk:8089')
Tips:
- use https, even if you are not using ssl in your splunk web interface
- the 'admin' user doesn't seem to work. use a normal user/password.
Next, submit a search job.
For a local search:
job = splunk.search.dispatch('search index="os" *', earliest_time='-15m')
For a remote search, use the "hostPath" parameter again:
job = splunk.search.dispatch('search index="os" *', earliest_time='-15m', hostPath='https://mysplunk:8089')
print the job details:
print job
wait for the results:
while not job.isDone:
    time.sleep(.25)
print the results:
for result in job.results:
    print result
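The wait loop above will spin forever if the job never finishes. Here is a minimal sketch of the same polling idea with a timeout added. The only thing it assumes about the job object is the "isDone" attribute used above; "wait_for_job" and "FakeJob" are names I made up for illustration (FakeJob stands in for a real search job so the snippet runs without Splunk):

```python
import time


def wait_for_job(job, timeout=60.0, poll_interval=0.25):
    """Poll job.isDone until it is true or `timeout` seconds have passed.

    Returns True if the job finished, False if the wait timed out.
    """
    deadline = time.time() + timeout
    while not job.isDone:
        if time.time() >= deadline:
            return False
        time.sleep(poll_interval)
    return True


class FakeJob(object):
    """Hypothetical stand-in for a search job: done after a few polls."""

    def __init__(self, polls_until_done):
        self._remaining = polls_until_done

    @property
    def isDone(self):
        self._remaining -= 1
        return self._remaining <= 0


finished = wait_for_job(FakeJob(3), timeout=5.0, poll_interval=0.01)
print('job finished: %s' % finished)
```

With a real job you would pass the object returned by the dispatch call instead of FakeJob, and decide what to do when the function returns False.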
Altogether in a Python script:
#!/usr/bin/env python
# Corey Goldberg - 2010
#
# search a remote splunk server
#
# instructions:
#  - save script into splunk's "bin" directory
#    (usually "/opt/splunk/bin" or "C:\Program Files\Splunk\bin")
#  - go to the "bin" directory and run:
#    $ splunk cmd python my_script.py
#

import time
import splunk.auth
import splunk.search

SPLUNK_SERVER = '192.168.12.173'
USER_NAME = 'foo'
PASSWORD = 'secret'
SEARCH_STRING = 'search index="os"'
EARLIEST_TIME = '-15m'


def main():
    # authenticate
    key = splunk.auth.getSessionKey(USER_NAME, PASSWORD, hostPath='https://%s:8089' % SPLUNK_SERVER)
    print 'auth key:\n%s' % key

    # submit a search job
    job = splunk.search.dispatch(SEARCH_STRING, earliest_time=EARLIEST_TIME, hostPath='https://%s:8089' % SPLUNK_SERVER)
    print 'job details:\n%s' % job

    # wait for results
    while not job.isDone:
        time.sleep(.25)

    print 'results:'
    for result in job.results:
        print result


if __name__ == '__main__':
    main()
Using Python to shorten a URL with Google's shortening service (goo.gl):
#!/usr/bin/python
# Corey Goldberg - 2010
import json
import urllib
import urllib2
def shorten(url):
    gurl = 'http://goo.gl/api/url?url=%s' % urllib.quote(url)
    req = urllib2.Request(gurl, data='')
    req.add_header('User-Agent', 'toolbar')
    results = json.load(urllib2.urlopen(req))
    return results['short_url']

if __name__ == '__main__':
    print shorten('http://www.goldb.org/')
    print shorten('www.yahoo.com')
You give it a URL to shorten: shorten('http://www.goldb.org/long_url')
... and it returns a shortened URL for you: 'http://goo.gl/jh4W'
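To see what the quoting step in shorten() does to the URL before it gets sent, here is a small sketch that only builds the request URL and makes no network call. The endpoint string is copied from the script above; "build_request_url" is a helper name I made up, and the import fallback is only there so it runs on both Python 2 and 3:

```python
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote  # Python 2

# endpoint copied from the shorten() function above
API_ENDPOINT = 'http://goo.gl/api/url'


def build_request_url(long_url):
    """Percent-encode the long URL and append it as the 'url' parameter."""
    return '%s?url=%s' % (API_ENDPOINT, quote(long_url))


print(build_request_url('http://www.goldb.org/'))
# prints: http://goo.gl/api/url?url=http%3A//www.goldb.org/
```

By default, quote() leaves "/" alone and encodes ":" as "%3A", which is why only the colon changes here.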
I needed to get some Linux networking stats in my Python program today. Specifically, I needed 'bytes sent' and 'bytes received' counts since last reboot from the local machine.
ifconfig is a network configuration utility for Linux that you run from the command line:
corey@studio17:~$ ifconfig eth0
eth0 Link encap:Ethernet HWaddr 00:22:19:e5:07:31
inet addr:10.0.0.5 Bcast:10.0.0.255 Mask:255.255.255.0
inet6 addr: fe80::222:19ff:fee5:731/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:3353822 errors:0 dropped:0 overruns:0 frame:0
TX packets:3052408 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:3476310326 (3.4 GB) TX bytes:256706611 (256.7 MB)
Interrupt:17
The following function parses output from ifconfig to get the network stats I was after:
import re
import subprocess
def get_network_bytes(interface):
    output = subprocess.Popen(['ifconfig', interface], stdout=subprocess.PIPE).communicate()[0]
    rx_bytes = re.findall('RX bytes:([0-9]*) ', output)[0]
    tx_bytes = re.findall('TX bytes:([0-9]*) ', output)[0]
    return (rx_bytes, tx_bytes)
Example usage:
import re
import subprocess
def main():
    rx_bytes, tx_bytes = get_network_bytes('eth0')
    print '%s bytes received' % rx_bytes
    print '%s bytes sent' % tx_bytes

def get_network_bytes(interface):
    output = subprocess.Popen(['ifconfig', interface], stdout=subprocess.PIPE).communicate()[0]
    rx_bytes = re.findall('RX bytes:([0-9]*) ', output)[0]
    tx_bytes = re.findall('TX bytes:([0-9]*) ', output)[0]
    return (rx_bytes, tx_bytes)

if __name__ == '__main__':
    main()
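If you want to check the parsing without actually running ifconfig, you can split it out into its own function and feed it text. This is just a sketch: "parse_network_bytes" is a name I made up, the sample text is trimmed from the ifconfig output shown earlier, and unlike the function above it returns integers and raises an error when the counters are missing:

```python
import re

# a few lines trimmed from the ifconfig output shown earlier
SAMPLE_OUTPUT = """\
eth0      Link encap:Ethernet  HWaddr 00:22:19:e5:07:31
          RX packets:3353822 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3052408 errors:0 dropped:0 overruns:0 carrier:0
          RX bytes:3476310326 (3.4 GB)  TX bytes:256706611 (256.7 MB)
"""


def parse_network_bytes(ifconfig_output):
    """Return (rx_bytes, tx_bytes) as integers from ifconfig-style text."""
    rx = re.search(r'RX bytes:(\d+)', ifconfig_output)
    tx = re.search(r'TX bytes:(\d+)', ifconfig_output)
    if rx is None or tx is None:
        raise ValueError('RX/TX byte counters not found in ifconfig output')
    return int(rx.group(1)), int(tx.group(1))


rx_bytes, tx_bytes = parse_network_bytes(SAMPLE_OUTPUT)
print('%d bytes received' % rx_bytes)
print('%d bytes sent' % tx_bytes)
```

Keep in mind the "RX bytes:" layout is whatever your version of ifconfig prints; if the format differs on your system, the regular expressions will need adjusting.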
Update: someone left an anonymous comment and mentioned you can just read from /proc/net/dev rather than using ifconfig. I modified his code sample and came up with this:
def get_network_bytes(interface):
    for line in open('/proc/net/dev', 'r'):
        if interface in line:
            data = line.split('%s:' % interface)[1].split()
            rx_bytes, tx_bytes = (data[0], data[8])
            return (rx_bytes, tx_bytes)
Example Usage:
def main():
    rx_bytes, tx_bytes = get_network_bytes('eth0')
    print '%s bytes received' % rx_bytes
    print '%s bytes sent' % tx_bytes

def get_network_bytes(interface):
    for line in open('/proc/net/dev', 'r'):
        if interface in line:
            data = line.split('%s:' % interface)[1].split()
            rx_bytes, tx_bytes = (data[0], data[8])
            return (rx_bytes, tx_bytes)

if __name__ == '__main__':
    main()
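One thing to watch with the substring check ("if interface in line"): asking for 'eth0' would also match a line for an interface like 'veth0'. Here is a sketch of a stricter variant that compares the exact interface name. It parses text instead of opening /proc/net/dev so it can run anywhere; "parse_proc_net_dev" is a made-up name, and the sample data is invented for illustration (the eth0 counters reuse the numbers from the ifconfig output earlier):

```python
# invented sample in the /proc/net/dev layout, for illustration only
SAMPLE_PROC_NET_DEV = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:     1000      10    0    0    0     0          0         0     1000      10    0    0    0     0       0          0
 veth0:      500       5    0    0    0     0          0         0      700       7    0    0    0     0       0          0
  eth0: 3476310326 3353822    0    0    0     0          0         0 256706611 3052408    0    0    0     0       0          0
"""


def parse_proc_net_dev(text, interface):
    """Return (rx_bytes, tx_bytes) for an exact interface name, or None."""
    for line in text.splitlines():
        if ':' not in line:
            continue  # the two header lines have no 'name:' prefix
        name, data = line.split(':', 1)
        if name.strip() == interface:
            fields = data.split()
            # field 0 is bytes received, field 8 is bytes transmitted
            return int(fields[0]), int(fields[8])
    return None


print(parse_proc_net_dev(SAMPLE_PROC_NET_DEV, 'eth0'))
```

On a real Linux box you would pass in open('/proc/net/dev').read() instead of the sample string.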
I really like to analyze the traffic to my various websites (homepage, blog, project pages, open source code).
Since my visitors must be somewhat interested in what I post, I like to think they are *just* *like* *me*.
So... of course, my own personal web metrics are better than anything else out there :)
(to you they might be worthless, but to me they are gold)
I get a pretty steady stream of about 40k pageviews per month. In the past year, I've served a paltry half million pageviews. But hey, it's a good enough sample size to work with, and pretty flattering to have attracted that many eyeballs.
Look what happened this past month:
Chrome visitors overtook Internet Explorer in site usage.
Here is how it has been playing out in the past year and a half:
(Notice my Firefox visitors are waning a little also.)
Multi-Mechanize is an open source framework for web performance and load testing. It allows you to run simultaneous python scripts to generate load (synthetic transactions) against a web site or web service.
The default response timer wraps the entire Transaction(), so it will time everything included. For more granular timing, you need to instrument your script code with custom timers.
Multi-Mechanize is pure Python and you have access to all of Python's standard library in your scripts. For example, you can use httplib to write a virtual user agent script and get detailed HTTP profiling times (TTFB, TTLB, etc). http://docs.python.org/library/httplib.html
import httplib
import time
class Transaction(object):
    def __init__(self):
        self.custom_timers = {}

    def run(self):
        conn = httplib.HTTPConnection('www.example.com')
        start = time.time()
        conn.request('GET', '/')
        request_time = time.time()
        resp = conn.getresponse()
        response_time = time.time()
        conn.close()
        transfer_time = time.time()

        self.custom_timers['request sent'] = request_time - start
        self.custom_timers['response received'] = response_time - start
        self.custom_timers['content transferred'] = transfer_time - start

        assert (resp.status == 200), 'Bad HTTP Response'
To test it out, you can add this to the bottom of the script and run it from the command line:
if __name__ == '__main__':
    trans = Transaction()
    trans.run()

    for timer in ('request sent', 'response received', 'content transferred'):
        print '%s: %.5f secs' % (timer, trans.custom_timers[timer])
Output:
request sent: 0.14429 secs
response received: 0.25995 secs
content transferred: 0.26007 secs
* if you are running MS Windows, replace the time.time() calls with time.clock() for better timer accuracy. On all other operating systems, use time.time()
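If you end up sprinkling a lot of custom timers through a script, a small context manager can cut down on the bookkeeping. This is only a sketch and not part of Multi-Mechanize: "Timer" is a helper I made up, and it writes into a plain dict the same way custom_timers is used above. It picks time.perf_counter() when the interpreter has it (Python 3.3 and later; note that time.clock() was removed in Python 3.8) and falls back to time.time() otherwise:

```python
import time

try:
    default_clock = time.perf_counter  # Python 3.3+
except AttributeError:
    default_clock = time.time  # older interpreters


class Timer(object):
    """Record the elapsed time of a 'with' block into a timers dict."""

    def __init__(self, timers, name, clock=default_clock):
        self.timers = timers
        self.name = name
        self.clock = clock

    def __enter__(self):
        self.start = self.clock()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.timers[self.name] = self.clock() - self.start
        return False  # don't swallow exceptions raised inside the block


custom_timers = {}
with Timer(custom_timers, 'example step'):
    time.sleep(0.05)  # stand-in for the work being timed

print('example step: %.5f secs' % custom_timers['example step'])
```

Each timer here measures its own block, whereas the timers in the script above are all measured from the same start time, so the numbers are not directly comparable.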
(originally posted at coreygoldberg.blogspot.com)
In C# (.NET), the DateTime() class is not accurate for high precision timing. Instead, use the Stopwatch() class if you need a timer. Most hardware and operating systems support a high-resolution performance counter that can be accessed through System.Diagnostics.Stopwatch.
Don't use DateTime() like this if you need accuracy:
using System;
class Program
{
    public static void Main()
    {
        DateTime start = DateTime.Now;
        // do timed work here
        DateTime stop = DateTime.Now;

        // don't do this. you won't get accurate timing
        Console.WriteLine("{0} ms", (stop - start).TotalMilliseconds);

        // definitely don't do this. you won't get accurate timing or full timer resolution
        Console.WriteLine("{0} ms", (stop - start).Milliseconds);
    }
}
Stopwatch() uses operating system's high-resolution performance counter:
using System;
using System.Diagnostics;
class Program
{
    public static void Main()
    {
        Stopwatch stopWatch = Stopwatch.StartNew();
        // do timed work here
        stopWatch.Stop();

        // don't do this. you won't get full timer resolution
        Console.WriteLine("{0} ms", stopWatch.ElapsedMilliseconds);

        // do this to get accurate high precision timing
        Console.WriteLine("{0} ms", stopWatch.Elapsed.TotalMilliseconds);
    }
}
The Stopwatch class is in the System.Diagnostics namespace:
using System.Diagnostics;
Stopwatch measures elapsed time by counting timer ticks in the underlying timer mechanism. If the installed hardware and operating system support a high-resolution performance counter, then the Stopwatch class uses that counter to measure elapsed time. Otherwise, the Stopwatch class uses the system timer (DateTime class) to measure elapsed time.
To see if your system supports a high-resolution performance counter, check the Stopwatch.IsHighResolution property:
if (Stopwatch.IsHighResolution)
    Console.WriteLine("Using the system's high-resolution performance counter.");
else
    Console.WriteLine("Using the DateTime class.");
To check the timing accuracy, use the Stopwatch.Frequency property:
long frequency = Stopwatch.Frequency;
Console.WriteLine("Timer frequency in ticks per second: {0}", frequency);
long nanosecPerTick = (1000L*1000L*1000L) / frequency;
Console.WriteLine("Timer is accurate within {0} nanoseconds", nanosecPerTick);
(originally posted at coreygoldberg.blogspot.com)
I had to test and benchmark a SOAP web service recently, and figured I'd write up some instructions for load testing with Multi-Mechanize.
Multi-Mechanize is a performance and load testing framework that enables you to write plugins (in Python) to create virtual user scripts that run in parallel against your service.
Since Multi-Mechanize scripts are pure Python, you can use any Python module inside them. In the case of a web service that uses SOAP, you have a few options. You can use any 3rd party module, so something like Suds (a lightweight SOAP client lib) would be a reasonable choice. However, I decided to stick with the Python standard library and build my scripts with urllib2. After all, SOAP is just slinging a bunch of XML back-and-forth over HTTP.
So before I get started with Multi-Mechanize and performance/load testing...
Here is a Python (2.x) script using urllib2 to interact with a SOAP web service over HTTP:
import urllib2
with open('soap.xml') as f:
    soap_body = f.read()
req = urllib2.Request(url='http://www.foo.com/service', data=soap_body)
req.add_header('Content-Type', 'text/xml')
resp = urllib2.urlopen(req)
content = resp.read()
(The script above assumes you have a file named 'soap.xml' in the local directory that contains your payload (SOAP XML message). It will send an HTTP POST request containing your payload in the body.)
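If you don't have a payload handy, here is a rough idea of what 'soap.xml' might contain. Everything inside the Body element is hypothetical (the "GetExample" operation, its namespace, and the "Id" field are made up); only the envelope namespace is the standard SOAP 1.1 one. The sketch also parses the text with the standard library, which is a cheap way to catch a malformed payload before you point a load test at it:

```python
import xml.etree.ElementTree as ET

# standard SOAP 1.1 envelope namespace
SOAP_NS = 'http://schemas.xmlsoap.org/soap/envelope/'

# hypothetical payload: the operation and its fields are invented
soap_body = """<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <GetExample xmlns="http://www.foo.com/service">
      <Id>42</Id>
    </GetExample>
  </soap:Body>
</soap:Envelope>
"""

# parsing raises an error if the XML is not well-formed
root = ET.fromstring(soap_body.encode('utf-8'))
body = root.find('{%s}Body' % SOAP_NS)

print('root element: %s' % root.tag)
print('body has %d child element(s)' % len(list(body)))
```

Your real service will define its own operations and namespaces (usually described in its WSDL), so treat this purely as a template for the outer structure.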
After the initial script is created in Python, the next step is to convert it into a Multi-Mechanize script. To do this, you create a Transaction class with a run() method and add your code there.
As a Multi-Mechanize script, the same thing would be done like this:
import urllib2
class Transaction(object):
    def __init__(self):
        with open('soap.xml') as f:
            self.soap_body = f.read()

    def run(self):
        req = urllib2.Request(url='http://www.foo.com/service', data=self.soap_body)
        req.add_header('Content-Type', 'text/xml')
        resp = urllib2.urlopen(req)
        content = resp.read()
        assert ('Example SOAP Response' in content), 'Failed Content Verification'
(Notice I also added an assert statement to verify the content returned from the service.)
Now that you have a script created, it's time to configure Multi-Mechanize and run some tests.
You can download Multi-Mechanize from: code.google.com
It requires Python 2.x (2.6 or 2.7), and if you want it to generate graphs, you must also install Matplotlib and its dependencies. See the FAQ for help.
Once you have Multi-Mechanize downloaded, unzip it and go to the "/projects" directory. You can create a new project directory here. You can call it "soap_project". Inside this directory, you will need 2 things: a config file ("config.cfg"), and a "test_scripts" directory (containing the script you previously created, which you can call "soap_client.py"). Since the script is looking for a data file named "soap.xml", make sure you have one created in the main multi-mechanize directory.
The directory and file layout will look like this:
/multi-mechanize
    /lib
    /projects
        /soap_project
            /test_scripts
                soap_client.py
            config.cfg
    multi-mechanize.py
    soap.xml
(you can see the "default_project" for an example of how it should be setup.)
To begin with, you can use a simple config.cfg file like this:
[global]
run_time: 30
rampup: 0
console_logging: on
results_ts_interval: 10

[user_group-1]
threads: 1
script: soap_client.py
This will just run a single thread of your virtual user script for 30 seconds; good enough for testing and getting things going.
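If you want to sanity check a config file before kicking off a run, the settings above can be read back with Python's standard configparser module. This is a sketch written for Python 3, and it says nothing about how Multi-Mechanize itself loads the file; it only shows that the section and option names from the config above parse cleanly:

```python
import configparser

# same settings as the simple config.cfg shown above
CONFIG_TEXT = """\
[global]
run_time: 30
rampup: 0
console_logging: on
results_ts_interval: 10

[user_group-1]
threads: 1
script: soap_client.py
"""

parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)

run_time = parser.getint('global', 'run_time')
console_logging = parser.getboolean('global', 'console_logging')
threads = parser.getint('user_group-1', 'threads')
script = parser.get('user_group-1', 'script')

print('run_time=%d console_logging=%s' % (run_time, console_logging))
print('user_group-1: %d thread(s) of %s' % (threads, script))
```

To check a file on disk, call parser.read('config.cfg') instead of read_string().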
To run a test, go to the topmost multi-mechanize directory and run:
$ python multi-mechanize.py soap_project
(You should see timer output in the console.)
Once you have things running well, you can turn off "console_logging" and increase the workload. It will take some adjustment and plenty of trials to get the load dialed in correctly for your service. You can also get a lot more sophisticated with your workload model and create multiple virtual user scripts doing different transactions. In this case, I'll keep it simple and just test one web service request type in isolation.
Once I ran plenty of iterations and got a feel for how the service responded and what its limits were, I settled with a config file like this:
[global]
run_time: 900
rampup: 900
console_logging: off
results_ts_interval: 90

[user_group-1]
threads: 30
script: soap_client.py
(30 threads, increasing load over 15 mins)
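To get a feel for what that ramp-up means, here is the arithmetic as a quick sketch. It assumes the threads are started at evenly spaced intervals across the ramp-up period, which is my assumption for illustration rather than something spelled out above; "thread_start_offsets" is a made-up helper:

```python
def thread_start_offsets(threads, rampup):
    """Start time (in seconds) of each thread, assuming even spacing."""
    spacing = float(rampup) / threads
    return [i * spacing for i in range(threads)]


offsets = thread_start_offsets(threads=30, rampup=900)

print('spacing between thread starts: %.1f secs' % (offsets[1] - offsets[0]))
print('last thread starts at: %.1f secs' % offsets[-1])
```

Under that assumption a new virtual user joins roughly every 30 seconds, and since run_time equals rampup, the full 30 threads are only running together near the very end of the test.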
After each test run you do, a new results directory is created containing your test results. Look for "results.html" and view it in your browser to see the output report.
Have questions about Multi-Mechanize?
Post to the discussion group: groups.google.com/group/multi-mechanize