September 30, 2008

Python - StackOverflow Fight

I've been having fun using the new StackOverflow site for answering technical questions.

Here is a Python script I call 'StackOverflow Fight' (like Google Fight). It takes a list of tags and gets the count of questions that are tagged with each one.

As an example, here is a StackOverflow fight between some popular programming languages:

#!/usr/bin/env python

import urllib
import re

langs = ('python', 'ruby', 'perl', 'java')

url_stem = 'http://stackoverflow.com/questions/tagged/'

counts = {}
for lang in langs:
    resp = urllib.urlopen(url_stem + lang).read()
    m = re.search('summarycount.*>(.*)<', resp)
    count = int(m.group(1).replace(',', ''))
    counts[lang] = count
    print lang, ':', count
sorted_counts = sorted(counts.items(), key=lambda(k,v):(v,k))
sorted_counts.reverse()
print sorted_counts[0][0], 'wins with', sorted_counts[0][1]

Output:

python : 733
ruby : 391
perl : 167
java : 1440
java wins with 1440
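
The parsing and ranking steps can be tried offline. This is a minimal sketch in Python 3 syntax (the script above is Python 2); the HTML fragments are hypothetical stand-ins for the fetched pages, and parse_count is an illustrative helper name, not part of the original script:

```python
import re

# Hypothetical page fragments; the real markup of the tag pages is assumed here.
sample_pages = {
    'python': '<div class="summarycount">733</div>',
    'ruby': '<div class="summarycount">391</div>',
    'perl': '<div class="summarycount">167</div>',
    'java': '<div class="summarycount">1,440</div>',
}

def parse_count(html):
    """Extract the question count, dropping any thousands separators."""
    m = re.search(r'summarycount[^>]*>([\d,]+)<', html)
    if m is None:
        raise ValueError('no summarycount element found')
    return int(m.group(1).replace(',', ''))

counts = {tag: parse_count(html) for tag, html in sample_pages.items()}

# max() with a key picks the winner without sorting and reversing.
winner = max(counts, key=counts.get)
print(winner, 'wins with', counts[winner])
```

Checking for a failed match before calling group() gives a clearer error if the page layout changes.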

September 25, 2008

Python - Thread Synchronization and Thread-safe Operations

This is the best article on thread synchronization in Python that I have come across: Thread Synchronization Mechanisms in Python

It is very clearly explained and just helped me solve a major concurrency issue I was having.

Especially interesting is the list of thread safe operations in Python.


Here are some thread-safe operations:

  • reading or replacing a single instance attribute
  • reading or replacing a single global variable
  • fetching an item from a list
  • modifying a list in place (e.g. adding an item using append)
  • fetching an item from a dictionary
  • modifying a dictionary in place (e.g. adding an item, or calling the clear method)

Here is an explanation of which global value mutations are thread-safe:

http://effbot.org/pyfaq/what-kinds-of-global-value-mutation-are-thread-safe.htm
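
A minimal sketch of the distinction, in Python 3 syntax. It assumes CPython, where the operations listed above happen as single steps. Appending to a shared list needs no lock, but `counter += 1` is a read, an add, and a store, so it is guarded with a Lock here. The names worker, items, and counter are illustrative only:

```python
import threading

items = []                       # shared list; append is one in-place operation
counter = 0                      # shared integer; += is read, add, then store
counter_lock = threading.Lock()

def worker(n):
    global counter
    for i in range(n):
        items.append(i)          # no lock needed for a single append
        with counter_lock:       # lock makes the read-modify-write safe
            counter += 1

threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(items), counter)
```

Replacing a global with a new value is a single step, but updating it based on its old value is not, which is why the counter gets a lock and the list does not.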


Update: Now I am confused. Doesn't this example in the article contradict bullet point 2 above?

Update Update: See comments below for better explanation and info.

September 22, 2008

Stack Overflow Needs More Pythonistas

Stack Overflow is a new community site for answering programming questions. It is very impressive to me, but there seems to be a very Microsoft (.NET, C#, etc) slant to the questions being asked.

It would be great to see more Python questions and answers over there. The one Python-related question I did ask was answered fabulously, so there are Python-heads reading and answering, but not posting very many questions.

Fellow Pythonistas, hop to it! Let's get a bigger Python presence over there.

http://stackoverflow.com/tags/python

September 19, 2008

GUI Test Automation Considered Harmful

I have been saying this for years, and Alan Page finally posted about it:

GUI Schmooey
"The point is that I think testers, in general, spend too much time trying to automate GUIs. I’m not against automation – just write automation starting at a level below the GUI and move down from there."

The GUI tends to be the most fragile layer of an application. It also tends to be the layer where a lot of automated test infrastructure is built. I don't like this style of "black box automation". It is tempting to automate a regular user's UI experience, and indeed this can be useful. However, in most cases you would be more productive taking a manual/unscripted approach to your GUI functional testing.

Manual and exploratory testing seem like a good fit for GUI testing, while lower layers can benefit most from test automation.

September 17, 2008

Python Timing - time.clock() vs. time.time()

Dear Lazyweb,

Which is better to use for timing in Python? time.clock() or time.time()? Which one provides more accuracy?

For example:

import time

start = time.clock()
# ... do something
elapsed = time.clock() - start

vs.

import time

start = time.time()
# ... do something
elapsed = time.time() - start

Update:
From the comments I received and others I have talked with, it looks like the answer is: use time() on Unix/Linux systems and clock() on Windows systems. However, the 'timeit' module gives you more options and is accurate across platforms.
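
A minimal sketch of the timeit approach, in Python 3 syntax. The work function is a hypothetical stand-in for whatever is being timed. timeit.default_timer picks a suitable clock for the platform, so the same code can be used on Windows and Unix:

```python
import timeit

def work():
    """Stand-in workload (hypothetical); replace with the code to be timed."""
    return sum(i * i for i in range(10000))

# One-off measurement using the platform-appropriate clock.
start = timeit.default_timer()
work()
elapsed = timeit.default_timer() - start
print('single run: %.6f seconds' % elapsed)

# Repeat the measurement and keep the fastest run to reduce noise.
best = min(timeit.repeat(work, number=100, repeat=3))
print('best of 3 (100 calls each): %.6f seconds' % best)
```

Taking the minimum of several repeats is the usual way to filter out interference from other processes.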

September 11, 2008

Python vs. Java - HTTP GET Request Complexity

Wow.. talk about programming with a ball and chain on! Some languages just take so much code to do a simple thing.

The following 2 programs each do an HTTP GET and print the contents of the URL.

In Java:

import java.net.*;
import java.io.*;

public class JGet {
    public static void main (String[] args) throws IOException {
        try {
            URL url = new URL("http://www.google.com");
    
            BufferedReader in = 
                new BufferedReader(new InputStreamReader(url.openStream()));
            String str;

            while ((str = in.readLine()) != null) {
                System.out.println(str);
            }

            in.close();
        } 
        catch (MalformedURLException e) {} 
        catch (IOException e) {}
    }
}

Equivalent in Python:

import urllib
print urllib.urlopen('http://www.google.com').read()
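
On Python 3 the same call lives in urllib.request and returns bytes. A minimal sketch, where fetch is a hypothetical helper name and the UTF-8 fallback is an assumption for pages that declare no charset:

```python
from urllib.request import urlopen

def fetch(url):
    """Return the body at `url` as text, using the declared charset if any."""
    with urlopen(url) as resp:
        # Assumption: fall back to UTF-8 when the response names no charset.
        charset = resp.headers.get_content_charset() or 'utf-8'
        return resp.read().decode(charset, errors='replace')
```

Calling print(fetch('http://www.google.com')) would then print the page, much like the two-liner above.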

Python Ruined Me

I've been programming almost exclusively in Python for the past 4 years or so. Occasionally I have to write code in C# or Perl.

After using Python:

  • Perl seems *so much* more confusing than I used to think it was.
  • C# with its static typing and forced OO feels like programming with a ball & chain on.

I love me some Python... but dammit... you ruined me.

Test Tools - Open Source Feedback and Participation

Here is a story and some observations I have made as an Open Source developer:

I had a specific need for a tool to test web services a few years ago ('02-'03). I had been rolling my own tools in various languages for quite a while and had some stuff that I thought others might be interested in, so I set up a SourceForge project and released it in Jan '04. The tool is called WebInject. Not many people used it at first. I just kept adding features and releasing. I was scratching my own itches and using my tool internally at a company I was working for. Eventually I got a few bites and some people started to download it and try it out. As of today, the tool has been downloaded over 60,000 times. I've had feedback from all corners of the world.

In the early days, *any* bug that was reported, I would dive into immediately. I was turning around fixes and new releases in crazy time. This personal attention sparked a lot of lengthy email conversations where I was starting to get real feedback. It wasn't the case where I could just sit back and expect quality feedback. I really engaged the users and asked questions. Pretty soon I was getting more feedback than I could reasonably handle. Around this time I started getting some patches sent to me. This accelerated development as I started to get various ideas from other people. A few silly bugs were also cleaned up; stuff I never would have caught on my own. Encouraging patches is huge. Once someone gets a piece of their own code into your code base, they have an intimate connection.

I don't maintain WebInject much any more. I think it is cool and useful, but it's a rat's nest of Perl code that needs some serious love. There is some stuff that bothers me so much about it that I barely want to touch it sometimes :) I spend all my time on newer tools these days, but I still keep on top of it enough to facilitate others using it and posting patches/updates to it (though I haven't released in a long time). Oddly, it has become somewhat entrenched in monitoring systems. Most of the users lately seem to be people running Nagios (open source monitoring system) who need an intelligent web plugin/agent.


Some basic thoughts on open source tool adoption and getting feedback:

  • A tool without documentation doesn't exist.. without quality documentation, don't even bother.. nobody will touch it.

  • If it takes someone more than 10 minutes to install, configure, run, and see the results from a simple test case.. nobody will use it. Once someone sees something work, it is tangible and they become engaged. I achieved this with my tool. If you download and run it, you are presented with a GUI. Pressing "run" executes a preconfigured sample test case that hits my site. I wanted the most clueless of people to be able to immediately see a green "PASS", and say "hey, cool it actually did something". This satisfaction must come immediately or the tool will be abandoned. (unless of course you already have a user base that has made it past this hurdle and can encourage others to put in the effort to get it working)

  • Make your tool easy to get. The online presence of your tool should allow someone to visit the website and immediately download it. I have seen tools released that take 5 levels of navigation before you get to the download options, or don't even have a web site or project page somewhere.

  • Announce your tools.. release early, release often.. who cares.. blog it, forum it, email it, digg it, usenet it, comment about it.. see if anyone has interest. Don't annoyingly spam it, but let people know it exists. If you announce your tool once only, buried deep in a technical mailing list, don't expect people to come rushing.

  • Encourage participation. Go out of your way to let people know that you encourage and appreciate feedback, bug reports, offers for collaboration, patches, etc.

  • You need a forum or a mailing list with a web accessible archive. It has to be super simple to talk to you in public and let others see answers to questions and begin to participate on their own and to offer advice. Using a many-to-one scheme where you just tell people to email you isn't gonna work. It is too private.. people want to know what is going on and hear from each other and offer tips. I have had several thousand posts to my forums for this little tool.

September 8, 2008

SMU using WebInject in Advanced Java Course

My web testing tool (WebInject) is now being used as part of the Advanced Java class being taught at SMU's Engineering school.

They will be using it for testing Servlets.

very cool!

http://engr.smu.edu/~coyle/cse7345/

List of Programming Languages I Know

This is an ordered list of programming languages I have learned in the past 15 years:

  1. BASIC
  2. LISP (Scheme)
  3. C++
  4. Shell
  5. Perl
  6. JavaScript
  7. Java
  8. Python
  9. C#

What does your list look like? (add it to the comments)

September 5, 2008

Performance Testing - Load Balancer SSL Stress Testing

I am in the process of stress testing a new load balancer (F5 Big-IP 9.x). One concern with the device is its ability to handle concurrent SSL connections and terminations, and what the impact is on performance and resources. The nature of SSL makes it very CPU intensive (encryption). This is basically a torture test of HTTP GETs using SSL. What I need is a program that can keep HTTP requests active at line-speed so there are always concurrent requests being served.

The goal I am looking to reach is 50k concurrent active SSL connections going through the load balancer to a web farm.

First off, I need a load generator program. If I were to use a commercial load testing tool (LoadRunner, SilkPerformer, etc), I would most likely be unable to do a test like this due to Virtual User licensing costs. So the best way for me to achieve this is with a custom homebrewed tool. So of course I reached for Python.

The load generating machines I was given have dual 2GHz CPUs and 2GB memory, and run Windows XP. The real challenge is seeing how many concurrent requests I can send from a Windows box!

First I started with a Python script that launches threads to do the HTTP (SSL) requests.

The script looks like this:

#!/usr/bin/env python
# ssl_load.py - Corey Goldberg - 2008

import httplib
from threading import Thread

threads = 250
host = '192.168.1.14'
file = '/foo.html'

def main():
    for i in range(threads):
        agent = Agent()
        agent.start()
            
class Agent(Thread):
    def __init__(self):
        Thread.__init__(self)
        
    def run(self):
        while True:
            conn = httplib.HTTPSConnection(host)
            conn.request('GET', file)
            resp = conn.getresponse()

if __name__ == '__main__':
    main()

This works great for small workloads, but Windows stops me from creating threads at around 800 per process (sometimes it was flaky and crashed after 300), so that won't work. The next step was to break the load into several processes and launch threads from each process. Rather than try to wrap this into a single script, I just use a starter script to call many instances of my load test script. I used a starter script like this:

#!/usr/bin/env python

import subprocess

processes = 60
for i in range(processes):
    # a list of arguments works on both Windows and Unix without a shell
    subprocess.Popen(['python', 'ssl_load.py'])

250 threads per process seemed about right and didn't crash my system. Using this, I was able to launch 60 processes, which gave me a total of 15k threads running.

I didn't reach the 50k goal, but on that hardware I doubt anyone could. With my 15k threads per machine, I was able to use 4 load generating machines to achieve 60k concurrent SSL connections. Not bad!
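
The arithmetic behind those numbers, as a small sketch in Python 3 syntax. It assumes each thread holds one connection at a time, which is how the script above behaves; the variable names are illustrative:

```python
import math

threads_per_process = 250      # from the load script
processes_per_machine = 60     # from the starter script
target_connections = 50000     # the stated goal

# Assumption: one active connection per thread.
per_machine = threads_per_process * processes_per_machine
machines_needed = math.ceil(target_connections / per_machine)
total = machines_needed * per_machine

print(per_machine, machines_needed, total)
```

Three machines would give 45k connections, just short of the goal, so a fourth is needed and takes the total to 60k.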