Thursday, January 14, 2010

Python - Web Load Tester - Multiple Processes and Threads

Here is my latest HTTP load generator in Python (web performance test). You just give it a URL and some runtime parameters and it will hammer a resource with HTTP requests. You could easily adapt this to run against a web service or set of links. It is useful for quickly loading a web resource with synthetic transactions for performance testing or tuning purposes.

I have built lots of different load testing tools in Python in the past (see: Pylot), but they all suffered from a similar problem. Their concurrency model was based on threads. Because of this threaded design, combined with Python's GIL implementation, my tools were unable to fully utilize multiple cores or processors.

Load generators shouldn't really suffer from processor contention because they are inherently IO-bound, not CPU-bound. However, if you add some client-side processing (response parsing, content verification, etc) and SSL, you could quickly run into a situation where you need more CPU horsepower.

The addition of multiprocessing in Python 2.6 gives me a whole new set of ideas for distributing load over multiple OS processes. It allows me to sidestep the GIL limitation of using a purely threaded model for concurrency. So now I can spawn multiple processes (to scale across processors/cores), with each one spawning multiple threads (for non-blocking i/o). This combination of processes and threads makes the basis for a very scalable and powerful load generating tool.
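As a rough illustration of the processes-plus-threads idea (not the actual script), here is a minimal sketch written for Python 3. The function names are hypothetical, and send_request() is a stand-in that only sleeps instead of making a real HTTP request.

```python
import multiprocessing
import threading
import time


def send_request():
    """Hypothetical stand-in for a real HTTP request: it only waits briefly."""
    time.sleep(0.01)
    return 200


def agent(requests_per_thread, results):
    """One thread: issue requests and record (response_time, status) tuples."""
    for _ in range(requests_per_thread):
        start = time.time()
        status = send_request()
        results.put((time.time() - start, status))


def run_process(num_threads, requests_per_thread, results):
    """One OS process: start several agent threads and wait for them."""
    threads = [
        threading.Thread(target=agent, args=(requests_per_thread, results))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def run_load(num_processes, num_threads, requests_per_thread):
    """Spawn several processes, each running its own group of threads."""
    results = multiprocessing.Queue()
    procs = [
        multiprocessing.Process(
            target=run_process,
            args=(num_threads, requests_per_thread, results),
        )
        for _ in range(num_processes)
    ]
    for p in procs:
        p.start()
    expected = num_processes * num_threads * requests_per_thread
    # drain the queue before joining so the child processes can exit cleanly
    timings = [results.get() for _ in range(expected)]
    for p in procs:
        p.join()
    return timings
```

Calling run_load(4, 10, 5) from inside an if __name__ == '__main__': guard would start 4 processes with 10 threads each. The guard matters on platforms where multiprocessing starts children by re-importing the main module.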


The Script:
http_load_multiprocess_multithread.py

In the code, you can define the following constants:

URL = 'http://www.example.com/foo?q=bar'
PROCESSES = 4
PROCESS_THREADS = 10
INTERVAL = 2  # secs
RUN_TIME = 60  # secs
RAMPUP = 60  # secs

It is a single Python script with no dependencies. I tested it on a dual quad-core system and it scaled nicely across all 8 cores. I hope to use something like this as a synthetic transaction engine at the core of a new load testing tool.

The output of the script is a single file named 'results.csv' with raw timing data in the following CSV format:

elapsed time, response time, http status

It looks like this:

0.562,0.396,200
0.562,0.319,200
0.578,0.405,200
0.578,0.329,200
...

Here is what the raw data looks like as a scatter plot:

To get more useful information, you need to do some post-processing on the results data. Here is a small script that will crunch some of the data from the results file. It breaks the data up into a time-series and calculates throughput and average response time per interval.

http_load_multiprocess_multithread_results_parse.py
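For a sense of what that post-processing involves, here is a minimal sketch (not the script linked above), written for Python 3. It assumes the three-column format shown earlier. The function name summarize and the default interval of 5 seconds are made up for illustration.

```python
import csv
from collections import defaultdict


def summarize(csv_file, interval=5):
    """Group rows into fixed-width time buckets.

    Returns a list of (bucket_start, throughput, avg_response_time) tuples,
    where throughput is requests per second within that bucket.
    """
    buckets = defaultdict(list)
    for row in csv.reader(csv_file):
        if not row:
            continue  # skip blank lines
        elapsed, response_time = float(row[0]), float(row[1])
        bucket_start = int(elapsed // interval) * interval
        buckets[bucket_start].append(response_time)

    summary = []
    for bucket_start in sorted(buckets):
        times = buckets[bucket_start]
        throughput = len(times) / interval
        summary.append((bucket_start, throughput, sum(times) / len(times)))
    return summary
```

To use it on real output, open the file and pass the file object in: with open('results.csv') as f: rows = summarize(f, interval=5).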

Once you have this derived data, you can graph it to get a more useful view of how the test and system performed.

Wednesday, January 13, 2010

Python - Command Line Progress Bar With Percentage and Elapsed Time Display

Here is a Python module which produces an ascii command line progress bar with percentage and elapsed time display.

code:
progress_bar.py

to use:

from progress_bar import ProgressBar

p = ProgressBar(60)
    
p.update_time(15)
print p
    
p.fill_char = '='
p.update_time(40)
print p

results:

[##########       25%                  ]  15s/60s
[=================67%=====             ]  40s/60s

Sunday, January 10, 2010

Python - Lookup Google PageRank Score

Here is a Python module for getting the PageRank of a site from Google. It uses the Google Toolbar 3.0.x/4.0.x Pagerank Checksum Algorithm.

The code is here:

pagerank.py

You can import it as a module and use its get_pagerank(url) method to lookup the PageRank of a URL from your Python code.

For example:

#!/usr/bin/env python

import pagerank

rank = pagerank.get_pagerank('http://www.google.com')
print rank

* note: only use this within Google's terms of service

Wednesday, December 30, 2009

New Year’s Python Meme

What’s the coolest Python application, framework or library you have discovered in 2009?

Multiprocessing, which was included in the standard library starting with Python 2.6. It uses the same API as the threading module, but spawns separate OS processes rather than threads. You can use it as a basis to build some very powerful concurrent programs that fully utilize multi-core systems.


What new programming technique did you learn in 2009?

Better concurrency with threads and processes. Up until a few years ago, I mostly dealt with uniprocessor systems. With 1 processor and 1 core, concurrency was simple, since you didn't have to think about it beyond using threads for non-blocking IO. Now everything is multi-way and multi-core and you have to think about true concurrency. Python gives you some primitives to deal with that, and I have been poking around and trying to learn what I can so I can build concurrent and scalable systems.

I also started dabbling in GUI programming with the new ttk (Tk Themed Widgets) in Python 2.7/3.1.


What’s the name of the open source project you contributed the most in 2009? What did you do?

www.pylot.org

Pylot is a performance/load testing tool for http and web services. I am the author and maintainer.

www.goldb.org
corey-projects

I have also been releasing snippets of open source code at my new Google code repository and on my web site.


What was the Python blog or website you read the most in 2009?

Planet Python gets you most everything in the Python world. But if I had to pick, I'd say my favorite Python blogs are from Jesse Noller and Michael Foord (Voidspace). Both of them write about topics that I am very interested in. (They are both Python Core maintainers).

I also like the regular geek sites: Reddit Programming, Slashdot, Digg. Then there is a big crew of Python developers on Twitter that are fun to follow (follow me: @cgoldberg).

I also really got into StackOverflow this year. It is a really useful question/answer site. The wealth of Python information on there is staggering. I mean, where else can you ask a Python question and have people like Alex Martelli answering it 5 mins later? ... with others fighting to provide even better answers. It's great!


What are the three top things you want to learn in 2010?

- Selenium/WebDriver 2.0 - the new WebDriver browser driver and API. The Selenium guys have been doing a lot of great work and this is becoming the industry standard tool for UI testing of browser based apps. 2.0 is a big change from 1.x and is the merger of 2 projects: Selenium and WebDriver.

- Python 3.1 - I have already been writing some code in Py3k, and to be honest it's barely any different than 2.x. It feels right at home, with some minor syntactical changes and reorganization of some standard library modules.

- Better copy writing - I produce lots of code in the form of snippets or larger projects. Most of these often get lost because I don't have the words to go along with them. If I was a better writer and could express myself better, I would be able to post more often and get my ideas out there.

Thursday, December 17, 2009

Python - PyQt4 - Hello World

basic "Hello World" GUI application using Python and Qt (PyQt4)


#!/usr/bin/env python
 
import sys
from PyQt4 import Qt

app = Qt.QApplication(sys.argv)
lbl = Qt.QLabel('Hello World')
lbl.show()
app.exec_()

Wednesday, December 16, 2009

Python - Send Email With smtplib

This is mostly just for my own reference


sending an email using Python's smtplib:


import smtplib
from email.MIMEText import MIMEText

smtp_server = 'exchangeserver.foo.com'
recipients = ['corey@foo.com',]
sender = 'corey@foo.com'
subject = 'subject goes here'
msg_text = 'message body goes here'


msg = MIMEText(msg_text)
msg['Subject'] = subject
s = smtplib.SMTP()
s.connect(smtp_server)
s.sendmail(sender, recipients, msg.as_string())
s.close()

Book Sample: Matplotlib for Python Developers - Plotting Data


Today's post is a sample from the new book: "Matplotlib for Python Developers" by Sandro Tosi. I will review the book in an upcoming post.

-Corey



Disclosure: Packt Publishing sent me a free copy of this book to review.




Matplotlib for Python Developers


http://www.packtpub.com/matplotlib-python-development/book







In this two-part article by Sandro Tosi, you will see several examples of Matplotlib usage in real-world situations, showing some of the common cases where we can use Matplotlib to draw a plot of some values.

There is a common workflow for this kind of job:

  1. Identify the best data source for the information we want to plot
  2. Extract the data of interest and elaborate it for plotting
  3. Plot the data

Usually, the hardest part is to extract and prepare the data for plotting. Due to this, we are going to show several examples of the first two steps.


The examples are:

  • Plotting data from a database
  • Plotting data from a web page
  • Plotting the data extracted by parsing an Apache log file
  • Plotting the data read from a comma-separated values (CSV) file
  • Plotting extrapolated data using curve fitting
  • Third-party tools using Matplotlib (NetworkX and mpmath)

Let's begin

Plotting data from a database

Databases often tend to collect much more information than we can simply extract and watch in a tabular format (let's call it the "Excel sheet" report style).

Databases not only use efficient techniques to store and retrieve data, but they are also very good at aggregating it.

One suggestion we can give is to let the database do the work. For example, if we need to sum up a column, let's make the database sum the data, and not sum it up in the code. In this way, the whole process is much more efficient because:

  • There is a smaller memory footprint for the Python code, since only the aggregate value is returned, not the whole result set to generate it
  • The database has to read all the rows in any case. However, if it's smart enough, then it can sum values up as they are read
  • The database can efficiently perform such an operation on more than one column at a time
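To make the suggestion concrete, here is a minimal sketch that compares summing in Python with summing in the database. It uses an in-memory SQLite database from the standard library as a stand-in for PostgreSQL, and the table and values are made up for illustration.

```python
import sqlite3

# in-memory SQLite database, used here only as a stand-in for PostgreSQL
conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('create table uploads (hour integer, size integer)')
cur.executemany(
    'insert into uploads values (?, ?)',
    [(9, 100), (9, 250), (21, 400)],
)

# summing in the code: every row travels to Python first
cur.execute('select size from uploads')
total_in_python = sum(row[0] for row in cur.fetchall())

# summing in the database: only the aggregate value is returned
cur.execute('select sum(size) from uploads')
total_in_db = cur.fetchone()[0]

cur.close()
conn.close()
```

Both totals are the same, but the second query returns a single row no matter how many rows the table holds.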

The data source we're going to query is from an open source project: the Debian distribution. Debian has an interesting project called UDD (Ultimate Debian Database), which is a relational database where a lot of information (either historical or current) about the distribution is collected and can be analyzed.

On the project website http://udd.debian.org/, we can find a full dump of the database (quite big, honestly) that can be downloaded and imported into a local PostgreSQL instance (refer to http://wiki.debian.org/UltimateDebianDatabase/CreateLocalReplica for import instructions).

Now that we have a local replica of UDD, we can start querying it:

# module to access PostgreSQL databases
import psycopg2
# matplotlib pyplot module
import matplotlib.pyplot as plt

Since UDD is stored in a PostgreSQL database, we need psycopg2 to access it. psycopg2 is a third-party module available at http://initd.org/projects/psycopg

# connect to UDD database
conn = psycopg2.connect(database="udd")
# prepare a cursor
cur = conn.cursor()

We will now connect to the database server to access the udd database instance, and then open a cursor on the connection just created.

# this is the query we'll be making
query = """
select to_char(date AT TIME ZONE 'UTC', 'HH24'), count(*)
from upload_history
where to_char(date, 'YYYY') = '2008'
group by 1
order by 1"""


We have prepared the select statement to be executed on UDD. What we wish to do here is extract the number of packages uploaded to the Debian archive (per hour) in the whole year of 2008.

  • date AT TIME ZONE 'UTC': As date field is of the type timestamp with time zone, it also contains time zone information, while we want something independent from the local time. This is the way to get a date in UTC time zone.
  • group by 1: This is what we have encouraged earlier, that is, let the database do the work. We let the query return the already aggregated data, instead of coding it into the program.

# execute the query
cur.execute(query)
# retrieve the whole result set
data = cur.fetchall()

We execute the query and fetch the whole result set from it.

# close cursor and connection
cur.close()
conn.close()

Remember to always close the resources that we've acquired in order to avoid memory or resource leakage and reduce the load on the server (removing connections that aren't needed anymore).

# unpack data in hours (first column) and
# uploads (second column)
hours, uploads = zip(*data)

The query result is a list of tuples (in this case, hour and number of uploads), but we need two separate sequences: one for the hours and another with the corresponding number of uploads. zip() solves this: with *data, we unpack the list, passing each tuple as a separate argument to zip(), which in turn groups the elements in the same position of each argument into separate tuples. Consider the following example:

In [1]: zip(['a1', 'a2'], ['b1', 'b2'])
Out[1]: [('a1', 'b1'), ('a2', 'b2')]

To complete the code:

# graph code
plt.plot(hours, uploads)
# set the x limits to the 'hours' limit
plt.xlim(0, 23)
# set the X ticks every 2 hours
plt.xticks(range(0, 23, 2))
# draw a grid
plt.grid()
# set title, X/Y labels
plt.title("Debian packages uploads per hour in 2008")
plt.xlabel("Hour (in UTC)")
plt.ylabel("No. of uploads")

The previous code snippet is the standard plotting code, which results in the following screenshot:

From this graph we can see that in 2008, the main part of Debian packages uploads came from European contributors. In fact, uploads were made mainly in the evening hours (European time), after the working days are over (as we can expect from a voluntary project).


Plotting data from the Web

Often, the information we need is not distributed in an easy-to-use format such as XML or a database export, but is only available on web sites.

More and more often we find interesting data on a web page, and in that case we have to parse it to extract that information: this is called web scraping.

In this example, we will parse a Wikipedia article to extract some data to plot. The article is at http://it.wikipedia.org/wiki/Demografia_d'Italia and contains lots of information about Italian demography (it's in Italian because the English version lacks a lot of data); in particular, we are interested in the population evolution over the years.

Probably the best known Python module for web scraping is BeautifulSoup (http://www.crummy.com/software/BeautifulSoup/). It's a really nice library that gets the job done quickly, but there are situations (in particular with JavaScript embedded in the web page, such as for Wikipedia) that prevent it from working.

As an alternative, we find lxml quite productive (http://codespeak.net/lxml/). It's a library mainly used to work with XML (as the name suggests), but it can also be used with HTML (given their quite similar structures), and it is powerful and easy-to-use.

Let's dig into the code now:

# to get the web pages
import urllib2
# lxml submodule for html parsing
from lxml.html import parse
# regular expression module
import re
# Matplotlib module
import matplotlib.pyplot as plt

Along with the Matplotlib module, we need the following modules:

  • urllib2: This is the module (from the standard library) that is used to access resources through URL (we will download the webpage with this).
  • lxml: This is the parsing library.
  • re: Regular expressions are needed to parse the returned data to extract the information we need. re is a module from the standard library, so we don't need to install a third-party module to use it.

# general urllib2 config
user_agent = 'Mozilla/5.0 (compatible; MSIE 5.5; Windows NT)'
headers = { 'User-Agent' : user_agent }
url = "http://it.wikipedia.org/wiki/Demografia_d'Italia"

Here, we prepare some configuration for urllib2: the User-Agent header that is used to access Wikipedia, and the URL of the page.

# prepare the request and open the url
req = urllib2.Request(url, headers=headers)
response = urllib2.urlopen(req)

Then we make a request for the URL and get the HTML back.

# we parse the webpage, getroot() returns the document root
doc = parse(response).getroot()

We parse the HTML using the parse() function of lxml.html and then we get the root element. XML can be seen as a tree, with a root element (the node at the top of the tree from where every other node descends), and a hierarchical structure of elements.

# find the data table, using css elements
table = doc.cssselect('table.wikitable')[0]

We leverage the structure of the HTML, accessing the first table element of class wikitable, because that's the table we're interested in.

# prepare data structures, will contain actual data
years = []
people = []

Preparing the lists that will contain the parsed data.

# iterate over the rows of the table, except first and last ones
for row in table.cssselect('tr')[1:-1]:

We can start parsing the table. Since there is a header and a footer in the table, we skip the first and the last line from the lines (selected by the tr tag) to loop over.

    # get the row cell (we will use only the first two)
    data = row.cssselect('td')

We get the element with the td tag that stands for table data: those are the cells in an HTML table.

    # the first cell is the year
    tmp_years = data[0].text_content()
    # cleanup for cases like 'YYYY[N]' (date + footnote link)
    tmp_years = re.sub(r'\[.\]', '', tmp_years)

We take the first cell that contains the year, but we need to remove the additional characters (used by Wikipedia to link to footnotes).

    # the second cell is the population count
    tmp_people = data[1].text_content()
    # cleanup from '.', used as separator
    tmp_people = tmp_people.replace('.', '')

We also take the second cell that contains the population for a given year. It's quite common in Italy to separate thousands in numbers with a '.' character: we have to remove them to have an appropriate value.

    # append current data to data lists, converting to integers
    years.append(int(tmp_years))
    people.append(int(tmp_people))

We append the parsed values to the data lists, explicitly converting them to integer values.
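Here is a small sketch of the two cleanup steps above, run on made-up cell strings. The pattern \[.\] is an assumption about the footnote markup (a single character inside square brackets), and the helper names are hypothetical.

```python
import re


def clean_year(text):
    """Strip a footnote marker such as '[1]' and convert the year to int."""
    # matches exactly one character between literal square brackets
    return int(re.sub(r'\[.\]', '', text))


def clean_population(text):
    """Remove '.' thousands separators and convert the count to int."""
    return int(text.replace('.', ''))


year = clean_year('1861[1]')
population = clean_population('22.182.377')
```

Note that a footnote marker with more than one character inside the brackets (for example '[12]') would not be removed by this pattern, and int() would then raise a ValueError.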

# plot data
plt.plot(years,people)
# ticks every 10 years
plt.xticks(range(min(years), max(years), 10))
plt.grid()
# add a note for 2001 Census
plt.annotate("2001 Census", xy=(2001, people[years.index(2001)]),
xytext=(1986, 54.5*10**6),
arrowprops=dict(arrowstyle='fancy'))

Running the example results in the following screenshot that clearly shows why the annotation is needed:

In 2001, we had a national census in Italy, and that's the reason for the drop in that year: the values released from the National Institute for Statistics (and reported in the Wikipedia article) are just an estimation of the population. However, with a census, we have a precise count of the people living in Italy.




Plotting data by parsing an Apache log file

Plotting data from a log file can be seen as the art of extracting information from it.

Every service has a log format different from the others. There are some exceptions of similar or same format (for example, for services that come from the same development teams) but then they may be customized and we're back at the beginning.

The main differences in log files are:

  • Fields orders: Some have time information at the beginning, others in the middle of the line, and so on
  • Fields types: We can find several different data types such as integers, strings, and so on
  • Fields meanings: For example, log levels can have very different meanings

From all the data contained in the log file, we need to extract the information we are interested in from the surrounding data that we don't need (and hence we skip).

In our example, we're going to analyze the log file of one of the most common services: Apache. In particular, we will parse the access.log file to extract the total number of hits and amount of data transferred per day.

Apache is highly configurable, and so is the log format. Our Apache configuration, contained in the httpd.conf file, has this log format:

"%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i""

In this way, we extract the day of the request along with the response size from every line.
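The code that follows calls apa_line.match(line) and expects two groups back, the day and the response size. The definition of apa_line is not shown in this excerpt, so the pattern below is only an assumed sketch that fits the log format above. The sample line is the usual combined-format illustration, not real data.

```python
import re

# assumed pattern for the log format above; not taken from the original text
apa_line = re.compile(
    r'\S+ \S+ \S+ '          # %h %l %u
    r'\[([^:]+):[^\]]+\] '   # %t, capturing only the day part
    r'"[^"]*" '              # "%r"
    r'\d{3} '                # %>s
    r'(\d+)'                 # %b, the response size in bytes
)

sample = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
          '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
          '"http://www.example.com/start.html" '
          '"Mozilla/4.08 [en] (Win98; I ;Nav)"')

m = apa_line.match(sample)
day, call_size = m.groups()
```

Apache writes '-' for %b when no bytes are sent. This pattern does not match such lines, so match() would return None for them and they would need to be skipped.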

# prepare dictionaries to contain the data
day_hits = {}
day_txn = {}

These dictionaries will store the parsed data.

# we open the file
with open('<location of the Apache>/access.log') as f:
# and for every line in it
for line in f:

We open the file (we have to select the proper location of access.log, as it differs between operating systems and/or installations), and for every line in it:

        # we pass the line to regular expression
        m = apa_line.match(line)
        # and we get the 2 values matched back
        day, call_size = m.groups()

We parse the line and take the resulting values.

        # if the current day is already present
        if day in day_hits:
            # we add the call and the size of the request
            day_hits[day] += 1
            day_txn[day] += int(call_size)

If the current day is already present in the dictionaries, then we add the hit and the size to the respective dictionaries.

        else:
            # else we initialize the dictionaries
            day_hits[day] = 1
            day_txn[day] = int(call_size)

If the current day is not present, then we need to initialize the dictionaries for a new day.
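Since the loop is split across several fragments above, here is the same accumulation logic gathered into one small function. This is only a sketch: the function name tally is hypothetical, and the input is a made-up list of (day, size) pairs rather than a real log file.

```python
def tally(records):
    """Count hits and sum transferred bytes per day.

    records is an iterable of (day, call_size) pairs, where call_size
    is a string of digits as it would come out of the regular expression.
    """
    day_hits = {}
    day_txn = {}
    for day, call_size in records:
        if day in day_hits:
            # the day is already known: add the hit and the size
            day_hits[day] += 1
            day_txn[day] += int(call_size)
        else:
            # first time we see this day: initialize both dictionaries
            day_hits[day] = 1
            day_txn[day] = int(call_size)
    return day_hits, day_txn


hits, txn = tally([
    ('10/Oct/2000', '2326'),
    ('10/Oct/2000', '100'),
    ('11/Oct/2000', '512'),
])
```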

# prepare a list of the keys (days)
keys = sorted(day_hits.keys())

We prepare a sorted list of dictionary keys, since we need to access it several times.

# prepare a figure and an Axes in it
fig = plt.figure()
ax1 = fig.add_subplot(111)

We prepare a Figure and an Axes in it.

# bar width
width = .4

We define the bar width.

# for each key (day) and its position
for i, k in enumerate(keys):

Then we enumerate the items in keys so that we have a progressive number associated with each key.

    # we plot a bar
    ax1.bar(i - width/2, day_hits[k], width=width, color='y')

Now we can plot a bar at position i (but shifted by width/2 to center the tick) with its height proportional to the number of hits, and width set to width. We set the bar color to yellow.

# for each label for the X ticks
for label in ax1.get_xticklabels():
    # we hide it
    label.set_visible(False)

We hide the labels on the X-axis to avoid their superimposition with the other Axes' labels (for transfer size).

# add a label to the Y axis (for the first plot)
ax1.set_ylabel('Total hits')

We set a label for the Y-axis.

# create another Axes instance, twin of the previous one
ax2 = ax1.twinx()

We now create a second Axes, sharing the X-axis with the previous one.

# plot the total requests size
ax2.plot([day_txn[k] for k in keys], 'k', linewidth=2)

We plot a line for the transferred size, using the black color and with a bigger width.

# set the Y axis to start from 0
ax2.set_ylim(ymin=0)

We let the Y-axis start from 0 so that it can be congruent with the other Y-axis.

# set the X ticks for each element of keys (days)
ax2.set_xticks(range(len(keys)))
# set the label for them to keys, rotating and align to the right
ax2.set_xticklabels(keys, rotation=25, ha='right')

We set the ticks for the X-axis (that will be shared between this and the bar plot) and the labels, rotating them by 25 degrees and aligning them to the right to better fit the plot.

# set the formatter for Y ticks labels
ax2.yaxis.set_major_formatter(FuncFormatter(megabytes))
# add a label to Y axis (for the second plot)
ax2.set_ylabel('Total transferred data (in MB)')

Then we set the formatter for the Y-axis so that the labels are shown in megabytes (instead of bytes).
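The snippet above relies on two names that are not defined in this excerpt: FuncFormatter, which lives in matplotlib.ticker (from matplotlib.ticker import FuncFormatter), and a megabytes function. FuncFormatter calls the function it wraps with two arguments, the tick value and the tick position, and uses the returned string as the label. The version of megabytes below is an assumed sketch, not the original one.

```python
def megabytes(x, pos):
    """Format a byte count as megabytes for a tick label.

    x is the tick value in bytes; pos is the tick position, which
    FuncFormatter passes in but this function does not need.
    """
    return '%.1f' % (x / (1024.0 * 1024.0))


label = megabytes(5 * 1024 * 1024, 0)
```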

# add a title to the whole plot
plt.title('Apache hits and transferred data by day')

Finally, we set the plot title.

On executing the preceding code snippet, the following screenshot is displayed:

The preceding screenshot tells us that it is not a very busy server, but it still shows what's going on.


Continue Reading Plotting data using Matplotlib: Part 2

Monday, December 14, 2009

http_rrd_profiler - HTTP Request Profiler with RRD Storage and Graphing

http_rrd_profiler is an HTTP testing/monitoring tool written in Python. It uses Python's httplib to send a GET request to a web resource and adds performance timing instrumentation between each step of the request/response. RRDTool is used for data storage and graph generation. RRDtool is an open source, high performance logging and graphing system for time-series data.

The RRD graphs it produces:



The code is located in my repository. You will need the following files:
http_rrd_profiler.py
rrd.py


Usage:

http_rrd_profiler.py <host> <interval in secs>

You invoke it from the command line like this:

> python http_rrd_profiler.py www.goldb.org 5

This will send an HTTP GET to http://www.goldb.org/ every 5 seconds.

Timing results are printed to the console and stored in a Round Robin Database (RRD). During each interval, a graph (png image) is generated.

Console output:

----------------
216 request sent
337 response received
456 content transferred (8044 bytes)
----------------
208 request sent
332 response received
453 content transferred (8044 bytes)
----------------
212 request sent
332 response received
452 content transferred (8044 bytes)
----------------
210 request sent
329 response received
448 content transferred (8044 bytes)
----------------
214 request sent
333 response received
452 content transferred (8044 bytes)

* tested on Ubuntu Karmic and WinXP with Python 2.6

Wednesday, December 9, 2009

Python 3 - tkinter (Tk Widgets) - BusyBar Busy Indicator

Here is another example GUI application using Python 3.1.

For this example, I used BusyBar.py, a module for creating a "busy indicator" (like the knight-rider light). To do this, I first had to port BusyBar to Python 3.1 from 2.x. The new code for this module can be found here:

BusyBar.py

It renders on Ubuntu (with Gnome) like this:

Code:

#!/usr/bin/env python
# Python 3


import tkinter
from tkinter import ttk
import BusyBar


class Application:
    def __init__(self, root):
        self.root = root
        self.root.title('BusyBar Demo')
        ttk.Frame(self.root, width=300, height=100).pack()
        
        bb = BusyBar.BusyBar(self.root, width=200)
        bb.place(x=40, y=20)
        bb.on()
    

if __name__ == '__main__':
    root = tkinter.Tk()
    Application(root)
    root.mainloop()

Tuesday, December 8, 2009

Python 3 - tkinter ttk (Tk Themed Widgets) - Blocking Command Example

Here is another example GUI application with ttk in Python 3.1.

This example shows usage of the Button, Text, and Scrollbar widgets, along with some basic layout management using grid().

To use it, enter a URL and click the button. It will send an HTTP GET request to the URL and insert the response headers into the Text widget. You will notice that the GUI is blocked (frozen) while the URL is fetched. This is because all of the work is done in the main event thread. I will show how to use threads for non-blocking events in a future post.

It renders on Ubuntu (with Gnome) like this:

Code:

#!/usr/bin/env python
# Python 3


import tkinter
from tkinter import ttk
import urllib.request



class Application:
    def __init__(self, root):
        self.root = root
        self.root.title('Blocking Command Demo')
        
        self.init_widgets()
            
            
    def init_widgets(self):
        self.btn = ttk.Button(self.root, command=self.get_url, text='Get Url', width=8)
        self.btn.grid(column=0, row=0, sticky='w')
        
        self.entry = ttk.Entry(self.root, width=60)
        self.entry.grid(column=0, row=0, sticky='e')
        
        self.txt = tkinter.Text(self.root, width=80, height=20)
        self.txt.grid(column=0, row=1, sticky='nwes')
        sb = ttk.Scrollbar(command=self.txt.yview, orient='vertical')
        sb.grid(column=1, row=1, sticky='ns')
        self.txt['yscrollcommand'] = sb.set
        

    def get_url(self):
        url = self.entry.get()
        headers = urllib.request.urlopen(url).info()
        self.txt.insert(tkinter.INSERT, headers)
    
    

if __name__ == '__main__':
    root = tkinter.Tk()
    Application(root)
    root.mainloop()

Monday, December 7, 2009

Python 3 - tkinter ttk (Tk Themed Widgets) - Button/Text Example

Here is some boilerplate code for setting up a basic GUI application with ttk in Python 3.1.

This example shows usage of the Frame, Button, and Text widgets. When you click the button, "Hello World" is inserted into the Text widget.

#!/usr/bin/env python
# Python 3


import tkinter
from tkinter import ttk


class Application:
    def __init__(self, root):
        self.root = root
        self.root.title('Button Demo')
        ttk.Frame(self.root, width=250, height=100).pack()
        
        self.init_widgets()
            
    def init_widgets(self):
        ttk.Button(self.root, command=self.insert_txt, text='Click Me', width='10').place(x=10, y=10)
        self.txt = tkinter.Text(self.root, width='15', height='2')
        self.txt.place(x=10, y=50)
        
    def insert_txt(self):
        self.txt.insert(tkinter.INSERT, 'Hello World\n')


if __name__ == '__main__':
    root = tkinter.Tk()
    Application(root)
    root.mainloop()

It renders on Ubuntu (with Gnome) like this:

Sunday, December 6, 2009

Python 3 - tkinter ttk (Tk Themed Widgets) - GUI Hello World

Here is some boilerplate code for setting up a basic GUI application with ttk in Python 3.1:

#!/usr/bin/env python
# Python 3


import tkinter
from tkinter import ttk

class Application:
    def __init__(self, root):
        self.root = root
        self.root.title('Hello World')
        ttk.Frame(self.root, width=200, height=100).pack()
        ttk.Label(self.root, text='Hello World').place(x=10, y=10)

if __name__ == '__main__':
    root = tkinter.Tk()
    Application(root)
    root.mainloop()

This example shows usage of the Frame and Label widgets, and some basic layout management using pack() and place(). It renders on Ubuntu (with Gnome) like this:

Thursday, December 3, 2009

Python - Accurate Cross Platform Timers - time.time() vs. time.clock()

In Python, you can add simple timers around any action using the clock() or time() functions from the time module. time.clock() gives the best timer accuracy on Windows, while the time.time() function gives the best accuracy on Unix/Linux. You should use whichever one is more appropriate for your system.

Inside the timeit module's source code (in Python's standard library), you can see an example of this. The timer type is specified by the platform:

if sys.platform == "win32":
    # On Windows, the best timer is time.clock()
    default_timer = time.clock
else:
    # On most other platforms, the best timer is time.time()
    default_timer = time.time

This seems like a nice way to create a timer that is accurate on either platform.

Here is an example:

import sys
import time

# choose timer to use
if sys.platform.startswith('win'):
    default_timer = time.clock
else:
    default_timer = time.time

start = default_timer()
# do something
finish = default_timer()
elapsed = (finish - start)

Friday, October 23, 2009

Linus on Evolution

Here is a great rant from Linus Torvalds back in 2002 on biological and software evolution.
... from a thread on LKML, summarized here: http://kerneltrap.org/node/11


You know what the most complex piece of engineering known to man in the whole solar system is?

Guess what - it's not Linux, it's not Solaris, and it's not your car.

It's you. And me.

And think about how you and me actually came about - not through any complex design.

Right. "sheer luck".

Well, sheer luck, AND:

  • free availability and _crosspollination_ through sharing of "source code", although biologists call it DNA.
  • a rather unforgiving user environment, that happily replaces bad versions of us with better working versions and thus culls the herd (biologists often call this "survival of the fittest")
  • massive undirected parallel development ("trial and error")

I'm deadly serious: we humans have _never_ been able to replicate something more complicated than what we ourselves are, yet natural selection did it without even thinking.

Don't underestimate the power of survival of the fittest.

And don't EVER make the mistake that you can design something better than what you get from ruthless massively parallel trial-and-error with a feedback cycle. That's giving your intelligence _much_ too much credit.



Linus

Monday, October 19, 2009

Python - URL Timer - Web Monitor Utility

In a previous post, I showed how to insert timers between steps in an http request to do some basic profiling. Peter Bengtsson contacted me with some enhancements to the original code. I took his enhancements and added some more of my own to create a neat little web monitoring utility.

It uses Python's httplib to send GET requests to a URL at regular intervals and adds timing instrumentation between each step of the request/response. Results are printed to the console.

get it here:
http://code.google.com/p/corey-projects/source/browse/trunk/python2/url_timer.py

to use, give it a site and interval:

> python url_timer.py www.goldb.org 5

... or an https/ssl url and an interval:

> python url_timer.py https://ssl.sonic.net/ 5

sample output:

>python url_timer.py www.goldb.org 5
request sent         response received    content transferred  size
------------         -----------------    -------------------  ----
0.0040 (0.0040)      0.2830 (0.2830)      0.2846 (0.2846)      9810 bytes (0.010 MB total)
0.0009 (0.0024)      0.2790 (0.2810)      0.2803 (0.2825)      9810 bytes (0.020 MB total)
0.0009 (0.0019)      0.2841 (0.2821)      0.2854 (0.2834)      9810 bytes (0.029 MB total)
0.0010 (0.0017)      0.2775 (0.2809)      0.2789 (0.2823)      9810 bytes (0.039 MB total)
0.0008 (0.0015)      0.2794 (0.2806)      0.2820 (0.2822)      9810 bytes (0.049 MB total)
0.0009 (0.0014)      0.2845 (0.2813)      0.2859 (0.2828)      9810 bytes (0.059 MB total)
0.0010 (0.0013)      0.2796 (0.2810)      0.2809 (0.2826)      9810 bytes (0.069 MB total)
0.0009 (0.0013)      0.2798 (0.2809)      0.2808 (0.2823)      9810 bytes (0.078 MB total)
0.0009 (0.0012)      0.2791 (0.2807)      0.2803 (0.2821)      9810 bytes (0.088 MB total)
...
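
Reading the sample output, the value in parentheses appears to be a running average of that column, and the MB figure a cumulative total of bytes transferred (with 1 MB taken as 10**6 bytes). The sketch below reproduces that arithmetic under those assumptions; the function names are hypothetical and are not taken from url_timer.py.

```python
def running_stats(samples):
    """Yield (value, running_average) for each timing sample, in order."""
    total = 0.0
    for count, value in enumerate(samples, start=1):
        total += value
        yield value, total / count


def cumulative_mb(sizes):
    """Yield the running total of transferred bytes, in MB (10**6 bytes)."""
    total_bytes = 0
    for size in sizes:
        total_bytes += size
        yield total_bytes / 1e6


# first three "request sent" timings and response sizes from the sample output
request_times = [0.0040, 0.0009, 0.0009]
sizes = [9810, 9810, 9810]

rows = zip(request_times, running_stats(request_times), sizes, cumulative_mb(sizes))
for _, (value, average), size, mb in rows:
    print('%.4f (%.4f)      %i bytes (%.3f MB total)' % (value, average, size, mb))
```

Keeping a running sum and a count avoids storing every sample, which matters for a monitor that runs indefinitely.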

Monday, October 5, 2009

Automated Web/HTTP Profiler with Selenium-RC and Python

A small new open source project I am working on:

Selenium-profiler is a web/http profiler built with Selenium-RC and Python. It profiles page load time and network traffic for a web page. The profiler uses Selenium-RC to automate site navigation (via browser), proxy traffic, and sniff the proxy for network traffic stats as requests pass through during a page load.

It is useful to answer questions like:

  • how many http requests does that page make?
  • how fast are the http responses coming back?
  • which http status codes are returned?
  • how many of each object type are requested?
  • what is the total page load time?

Get it here:
http://selenium-profiler.googlecode.com/

Sample Output:

--------------------------------
results for http://www.google.com/

content size: 38.06 kb

http requests: 9
status 200: 6
status 204: 1
status 403: 2

profiler timing:
2.063 secs (page load)
2.047 secs (network: end last request)
0.111 secs (network: end first request)

file extensions: (count, size)
gif: 1, 8.558 kb
ico: 2, 2.488 kb
js: 2, 18.046 kb
png: 1, 5.401 kb
unknown: 3, 3.567 kb

http timing detail:
403, GET, /favicon.ico, 111 ms
200, GET, /newkey, 738 ms
200, GET, /, 332 ms
403, GET, /favicon.ico, 3 ms
200, GET, /logo.gif, 255 ms
200, GET, /d9n_Nh4I09g.js, 411 ms
200, GET, /2cca7b2e99206b9c.js, 65 ms
204, GET, /generate_204, 88 ms
200, GET, /nav_logo7.png, 49 ms
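
The summary counts in this report can be derived from the per-request detail lines. Here is a small sketch of that aggregation, using the "http timing detail" lines above as input. It is an illustration only, not the profiler's actual code, and it assumes the file extension is taken from the request path, with extensionless paths counted as "unknown".

```python
import posixpath
from collections import Counter

# the "http timing detail" lines from the sample output
detail = """\
403, GET, /favicon.ico, 111 ms
200, GET, /newkey, 738 ms
200, GET, /, 332 ms
403, GET, /favicon.ico, 3 ms
200, GET, /logo.gif, 255 ms
200, GET, /d9n_Nh4I09g.js, 411 ms
200, GET, /2cca7b2e99206b9c.js, 65 ms
204, GET, /generate_204, 88 ms
200, GET, /nav_logo7.png, 49 ms
"""

status_counts = Counter()
extension_counts = Counter()

for line in detail.splitlines():
    status, method, path, timing = [field.strip() for field in line.split(',')]
    status_counts[int(status)] += 1
    # paths without an extension are grouped under 'unknown'
    extension = posixpath.splitext(path)[1].lstrip('.') or 'unknown'
    extension_counts[extension] += 1

print('http requests: %i' % sum(status_counts.values()))
for status in sorted(status_counts):
    print('status %i: %i' % (status, status_counts[status]))
for extension in sorted(extension_counts):
    print('%s: %i' % (extension, extension_counts[extension]))
```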

Wednesday, September 23, 2009

Selenium RC with Python in 30 Seconds

Selenium is a suite of tools to automate web app testing across many platforms. It has various pieces (Core, RC, IDE, etc), and I struggled trying to figure out how everything fits together and works. At the end of the day, all I wanted to do was use Selenium from my Python code to drive a browser session.

Selenium ships with full tests and some good sample code, but the driver examples all contain test frameworks/runners (JUnit, unittest, etc). All I wanted was a simple way to integrate with Python. To do this, the piece I need is Selenium RC (which includes Selenium Core).

So here is the beginners' 30-second guide to getting the Python client driver working with Selenium-RC:

  1. Install Python (and add it to your path)
  2. Install Java (and add it to your path)
  3. Download Selenium RC
  4. Unzip Selenium RC and search for 'selenium-server.jar' and 'selenium.py'
  5. Copy them to a directory and run 'java -jar selenium-server.jar' to start the server
  6. Start writing your Python driver code! (import selenium)

So what's actually going on when you run a driver script?

Selenium-server is the core program, which also contains an integrated web server. You send HTTP requests to the server to instruct it how to drive your browser. The 'selenium.py' module is just a wrapper around 'httplib' that provides a Python API for interacting with the Selenium-server. You need to import the 'selenium.py' module from your script and then you are ready to go.

Now let's test it out and see if it works. Try the following Python script. It should open Firefox and navigate to www.google.com.

#!/usr/bin/env python

from selenium import selenium

sel = selenium('localhost', 4444, '*firefox', 'http://www.google.com/')
sel.start()
sel.open('/')
sel.wait_for_page_to_load(10000)
sel.stop()
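
To make the "wrapper around httplib" point concrete, here is a rough sketch of how a driver command could be turned into an HTTP request body. It assumes the Selenium-RC wire format in which each command is a form-encoded body with a cmd parameter, numbered arguments, and an optional sessionId, sent to the server's /selenium-server/driver/ path. The helper name build_command_body is hypothetical, the sketch is written for Python 3, and it only builds the body; it does not contact a server.

```python
from urllib.parse import quote_plus


def build_command_body(verb, args, session_id=None):
    """Build the form-encoded body for one Selenium-RC command (hypothetical helper)."""
    body = 'cmd=' + quote_plus(verb)
    # arguments are passed as numbered parameters: 1=..., 2=..., and so on
    for position, arg in enumerate(args, start=1):
        body += '&%i=%s' % (position, quote_plus(str(arg)))
    # once a browser session exists, its id accompanies every command
    if session_id is not None:
        body += '&sessionId=' + quote_plus(str(session_id))
    return body


# the kind of request sel.start() is assumed to make for the script above
print(build_command_body('getNewBrowserSession', ['*firefox', 'http://www.google.com/']))
```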

Friday, September 11, 2009

Python - HTTP Request Profiler

Here is a script in Python that profiles an HTTP request to a web server.

to use, give it a url:


> python http_profiler.py www.goldb.org

output:


0.12454 request sent
0.24967 response received
0.49155 content transferred (9589 bytes)

It uses Python's httplib to send a GET request to the URL and adds some timing instrumentation between each step of the request/response.

Python Code:


#!/usr/bin/env python
# Corey Goldberg - September 2009


import httplib
import sys
import time



if len(sys.argv) != 2:
    print 'usage:\nhttp_profiler.py <url>\n(do not include http://)'
    sys.exit(1)

# get host and path names from url
location = sys.argv[1]
if '/' in location:
    parts = location.split('/')
    host = parts[0]
    path = '/' + '/'.join(parts[1:])
else:
    host = location
    path = '/'

# select most accurate timer based on platform
if sys.platform.startswith('win'):
    default_timer = time.clock
else:
    default_timer = time.time

# profiled http request
conn = httplib.HTTPConnection(host)
start = default_timer()
conn.request('GET', path)
request_time = default_timer()
resp = conn.getresponse()
response_time = default_timer()
size = len(resp.read())
conn.close()
transfer_time = default_timer()

# output
print '%.5f request sent' % (request_time - start)
print '%.5f response received' % (response_time - start)
print '%.5f content transferred (%i bytes)' % ((transfer_time - start), size)
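
The script above targets Python 2 (httplib, print statements). For reference, here is a sketch of the same measurement on Python 3, assuming http.client as the replacement for httplib and time.perf_counter as the timer. The function names split_location and profile_request are made up for this example.

```python
import http.client
import time


def split_location(location):
    """Split 'host/path' (no scheme) into (host, path); the path defaults to '/'."""
    host, _, rest = location.partition('/')
    return host, '/' + rest


def profile_request(host, path='/'):
    """Send one GET and return (request_secs, response_secs, transfer_secs, size_bytes).

    All three timings are measured from the same start point, as in the script above.
    """
    conn = http.client.HTTPConnection(host)
    start = time.perf_counter()
    conn.request('GET', path)          # connects and sends the request
    request_time = time.perf_counter()
    resp = conn.getresponse()          # status line and headers received
    response_time = time.perf_counter()
    size = len(resp.read())            # body fully transferred
    conn.close()
    transfer_time = time.perf_counter()
    return (request_time - start, response_time - start, transfer_time - start, size)
```

To use it, call profile_request(*split_location('www.goldb.org')) and print the four returned values with the same format strings as the original script.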

Thursday, September 10, 2009

Web Page Profilers for Windows/Internet Explorer

A "Web Page Profiler" is a tool that is used to measure and analyze web page performance. It is usually implemented as a browser plugin or add-on, and lets you see performance of web pages/objects as they are transferred/loaded/executed/rendered.

For Firefox, choosing a profiler is a no-brainer. Firebug is an excellent developer tool that includes profiling capabilities.

Unfortunately, Firebug does not work with Microsoft's Internet Explorer. There is Firebug Lite, which "can" work with IE, but it is limited in functionality and requires you to install some server side code (which is not always feasible).

There are some web profilers specifically for IE, but none of them live up to the functionality or stability of Firebug. Some to look at are:

... also, Internet Explorer 8 has built-in "Developer Tools" (press F12) that include a basic performance profiler for JavaScript execution.


Does anyone know of any other profilers/performance tools that work with IE? Thoughts?

Wednesday, August 5, 2009

Segue/Borland/MicroFocus SilkPerformer - Who Owns It?

SilkPerformer is a popular performance and load testing tool. It has a confusing history and has bounced around between companies quite a bit. Here is where it came from and the current state of who owns it.

The history AFAIK:

  • Segue buys a product from some (Scandinavian?) company and names it Segue SilkPerformer.
  • Borland buys Segue and names it Borland SilkPerformer.
  • Micro Focus buys Borland and names it Micro Focus SilkPerformer.

Right now it is a little confusing because Borland still lists it as a product. The acquisition of Borland just went through a few days ago and now Micro Focus has it listed on its site also. I would expect the Borland site to eventually disappear.

To make it even more confusing, Micro Focus also bought Compuware's testing business, which had its own performance/load testing application, QALoad. So now Micro Focus is marketing QALoad as well as SilkPerformer, which overlap in most functionality.

It will be interesting to see where this goes and what happens to the product lines.