Corey Goldberg: 2008

December 30, 2008

Pylot - 6 Tools You Can Use To Analyze Your Web Site's Performance

Over at "Inside the Nerdery" (love the name), Jodi Chromey did a short writeup on web performance and analysis tools. Pylot made the list :)

6 tools you can use to analyze your Web site’s performance

"Everyone wants a Web site that’s running at optimum speed, but for a lot of people they have no idea what the heck that is. That’s why the Nerds have created this list of six Web performance analysis tools."

December 29, 2008

When Will I Officially Switch To Python 3.0?

I've been writing code in Python 3.0 (3000) this week. It feels really nice. Basically the same old Python, with some corners cleaned up and reorganized.

I have tons of legacy code to support that is written in 2.4/2.5, so I still need to keep my hands in Python 2.x for quite a long time. But when will I switch over all new projects so they use Python 3.0?

Well the answer is: as soon as my 5 favorite libraries work with Python 3.0. Here is what I'm waiting on:

I have no idea when these will be ported to work with 3.0. Hopefully I'm not waiting *years* for this.

When are you switching or migrating?

December 28, 2008

WebInject - Web Monitoring - Getting It Working

My monitoring tool, WebInject, is a popular plugin in the Nagios community. A Nagios user (Felipe Ferreira) just wrote a short how-to on setting up WebInject (though the official manual is much more thorough):

Webinject How to

"When monitoring websites, many times we will need more than just checking if the site is up. We may need to see if the internals of the website are working. A good example is making a user login check. Using Nagios alone it can not be done, but thanks to Corey Goldberg, it is possible using his script webinject.pl. [snip...] And yes it works with HTTPS and redirections."

Thanks Felipe!

December 26, 2008

Python - Windows - Reboot a Remote Server

Here is a Python script to reboot a remote Windows server. It requires the Python Win32 Extensions to be installed.

# Reboot a remote server

import win32security
import win32api
import time
from ntsecuritycon import *


def RebootServer(message='Server Rebooting', timeout=30, bForce=0, bReboot=1, host=None):
    # host=None targets the local machine; pass a host name to reboot a remote
    # server (the account running this needs shutdown rights on that server).
    AdjustPrivilege(SE_SHUTDOWN_NAME)
    try:
        win32api.InitiateSystemShutdown(host, message, timeout, bForce, bReboot)
    finally:
        # Remove the privilege we just added.
        AdjustPrivilege(SE_SHUTDOWN_NAME, 0)


def AbortReboot(host=None):
    AdjustPrivilege(SE_SHUTDOWN_NAME)
    try:
        win32api.AbortSystemShutdown(host)
    finally:
        # Remove the privilege we just added.
        AdjustPrivilege(SE_SHUTDOWN_NAME, 0)


def AdjustPrivilege(priv, enable=1):
    # Get the process token.
    flags = TOKEN_ADJUST_PRIVILEGES | TOKEN_QUERY
    htoken = win32security.OpenProcessToken(win32api.GetCurrentProcess(), flags)
    # Get the ID for the requested privilege (e.g. the system shutdown privilege).
    priv_id = win32security.LookupPrivilegeValue(None, priv)
    # Build the list of privileges to enable (or disable).
    if enable:
        newPrivileges = [(priv_id, SE_PRIVILEGE_ENABLED)]
    else:
        newPrivileges = [(priv_id, 0)]
    # Make the adjustment.
    win32security.AdjustTokenPrivileges(htoken, 0, newPrivileges)


if __name__ == '__main__':
    message = 'This server is pretending to reboot\nThe shutdown will stop in 10 seconds'
    RebootServer(message)
    print 'Sleeping for 10 seconds'
    time.sleep(10)
    print 'Aborting shutdown'
    AbortReboot()

Note: This code came from somewhere on the net and was not attributed to anyone. If you wrote it, let me know.

December 25, 2008

New Book - Programming in Python 3

Holiday reading just arrived...

Finally got my hands on the brand new "Programming in Python 3 - A Complete Introduction to the Python Language". As far as I know, this is the first print book covering Python 3.0 (Python 3000). A quick skim looked promising.

Update: I've read a good chunk of this book now, and it is excellent. It's very clearly written, and all code examples are clean and follow nice conventions. It covers a wide variety of topics, including all the important stuff in the language and standard library. This book would make a good mid-level introduction to the language, or a good reference/refresher for a more experienced Python hacker. I highly recommend picking this up.

December 24, 2008

REST Simplifies Performance and Load Testing

I've finally realized...

REST is a godsend for Performance and Load Testing. It's not just a steaming pile of crap tunneled over HTTP.

December 17, 2008

Dropbox - Great File Synching and Backup

Dropbox is a file synching and backup tool that I began using a few months back. It is now one of my favorite tools.

You install a service, designate a folder as your "dropbox", and then register the machine with your online account. You can do this on as many machines as you want. When you add content to the folder on one machine, it is synched to the same folder on all the other registered machines.

So I have one common "code" folder on my 2 home laptops and work PC that I keep in synch. It also stores the data in a secure web repository and has version control.

Dropbox is useful for synching multiple machines or collaborating with a group of people that have synched dropboxes to the same account. It works over the web on port 80 so you don't have to worry about VPNs or corporate firewalls or anything. It also works on almost all platforms and is free (up to 2GB storage).

December 16, 2008

Wordle - Word Cloud Of My Twitter Tweets

Wordle generates "word clouds" from text that you provide. Here is a Wordle mashup showing the word density in my Tweets from Twitter:

December 15, 2008

Python - Get Remote Windows Metrics Script

In a recent blog post, I introduced a script for remotely monitoring Windows machines. It is used to retrieve performance/health metrics. It uses WMI (Windows Management Instrumentation) to interact with the Windows machine and its Win32 classes.

Now I will show you an example of using the module (which I named "machine_stats.py"). I call it from a multithreaded controller to poll multiple machines in parallel.

The script below reads a list of host names (or IPs) from a file named "machines.txt". It then spawns a thread for each machine to be monitored, and each thread grabs that machine's health metrics. Results are printed to STDOUT in CSV format, so you can redirect them to a file for analysis. You can call this script at regular intervals to continuously monitor a set of machines.

#!/usr/bin/env python
# (c) Corey Goldberg (corey@goldb.org) 2008


import time
import threading
import pythoncom
import machine_stats


# set these to the Windows login name with admin privileges
user_name = 'corey'
pwd = 'secret'


def main():
    # one host name or IP per line; blank lines are skipped
    host_names = [line.strip() for line in open('machines.txt') if line.strip()]
    metric_store = MetricStore()

    # gather metrics in threads
    probes = []
    for host in host_names:
        mp = MetricsProbe(host, metric_store)
        mp.start()
        probes.append(mp)
    for mp in probes:  # wait for threads to finish
        mp.join()


class MetricsProbe(threading.Thread):
    def __init__(self, host, metric_store):
        threading.Thread.__init__(self)
        self.host = host
        self.metric_store = metric_store

    def run(self):
        pythoncom.CoInitialize()  # need this for multithreading COM/WMI
        try:
            uptime = machine_stats.get_uptime(self.host, user_name, pwd)
            cpu = machine_stats.get_cpu(self.host, user_name, pwd)
            mem_mbyte = machine_stats.get_mem_mbytes(self.host, user_name, pwd)
            mem_pct = machine_stats.get_mem_pct(self.host, user_name, pwd)
        except Exception:
            uptime = 0
            cpu = 0
            mem_mbyte = 0
            mem_pct = 0
            print 'error getting stats from host: ' + self.host
        self.metric_store.uptimes[self.host] = uptime
        self.metric_store.cpus[self.host] = cpu
        self.metric_store.mem_mbytes[self.host] = mem_mbyte
        self.metric_store.mem_pcts[self.host] = mem_pct
        # timestamp each result line, e.g. 12/15/08 08:10:33
        timestamp = time.strftime('%m/%d/%y %H:%M:%S')
        print '%s,%s,uptime:%d,cpu:%d,mem_mbyte:%d,mem_pct:%d' \
            % (timestamp, self.host, uptime, cpu, mem_mbyte, mem_pct)


class MetricStore(object):
    def __init__(self):
        self.uptimes = {}
        self.cpus = {}
        self.mem_mbytes = {}
        self.mem_pcts = {}


if __name__ == '__main__':
    main()

Output is one line per host, like this:

12/15/08 08:10:33,server2,uptime:158,cpu:5,mem_mbyte:1125,mem_pct:29

Usage Instructions:

Download the Code from here: monitor_win_machines.zip
Edit "machines.txt" and add your own Hosts/IPs
Edit the username/password in "monitor_machines.py"
Run the "monitor_machines.py" script from the command line.

You can append the output to a text file like this, building up your results file over repeated runs:

$> python monitor_machines.py >> output.txt
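To analyze the accumulated results file, a small parser for the line format shown above helps. This is a minimal Python 3 sketch; the function names parse_line and average_cpu are hypothetical, and it assumes the monitoring script's own "error getting stats from host" lines (which also go to STDOUT) should simply be skipped.

```python
import time
from collections import defaultdict


def parse_line(line):
    """Split one result line into (timestamp, host, {metric_name: int})."""
    fields = line.strip().split(',')
    timestamp = time.strptime(fields[0], '%m/%d/%y %H:%M:%S')
    host = fields[1]
    metrics = {}
    for field in fields[2:]:
        name, value = field.split(':')
        metrics[name] = int(value)
    return timestamp, host, metrics


def average_cpu(lines):
    """Average the cpu metric per host, ignoring lines that are not results."""
    samples = defaultdict(list)
    for line in lines:
        try:
            _, host, metrics = parse_line(line)
        except (ValueError, IndexError):
            continue  # e.g. "error getting stats from host: ..." or a blank line
        samples[host].append(metrics['cpu'])
    return {host: sum(vals) / len(vals) for host, vals in samples.items()}
```

Usage: `with open('output.txt') as f: print(average_cpu(f))`. The same loop could average any of the other metrics by changing the key.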

December 12, 2008

Python - Monitor Windows Remotely With WMI

Below is a simple Python module for remotely monitoring Windows machines. It is used to retrieve performance/health metrics.

I only implemented 5 functions:

Uptime
CPU Utilization
Available Memory
Memory Used
Ping

You can get an idea of how to interact with Windows via WMI (Windows Management Instrumentation). I use it from a multithreaded manager so I can concurrently poll a distributed farm of servers.

To run this, you will first need to install the WMI module. Follow the instructions and get the download from here: http://timgolden.me.uk/python/wmi.html (hint: it is just a single script you can drop in the same directory your script runs from)

import re
import wmi
from subprocess import Popen, PIPE



def get_uptime(computer, user, password):
    c = wmi.WMI(computer=computer, user=user, password=password, find_classes=False)
    secs_up = int([uptime.SystemUpTime for uptime in c.Win32_PerfFormattedData_PerfOS_System()][0])
    hours_up = secs_up // 3600  # whole hours
    return hours_up


def get_cpu(computer, user, password):
    c = wmi.WMI(computer=computer, user=user, password=password, find_classes=False)
    utilizations = [cpu.LoadPercentage for cpu in c.Win32_Processor()]
    utilization = int(sum(utilizations) / len(utilizations))  # avg all cores/processors
    return utilization


def get_mem_mbytes(computer, user, password):
    c = wmi.WMI(computer=computer, user=user, password=password, find_classes=False)
    available_mbytes = int([mem.AvailableMBytes for mem in c.Win32_PerfFormattedData_PerfOS_Memory()][0])
    return available_mbytes


def get_mem_pct(computer, user, password):
    c = wmi.WMI(computer=computer, user=user, password=password, find_classes=False)
    pct_in_use = int([mem.PercentCommittedBytesInUse for mem in c.Win32_PerfFormattedData_PerfOS_Memory()][0])
    return pct_in_use


def ping(host_name):
    # Windows ping syntax: send a single echo request
    p = Popen('ping -n 1 ' + host_name, stdout=PIPE)
    output = p.communicate()[0]  # read all output and wait for ping to exit
    m = re.search('Average = (.*)ms', output)
    if m:
        return True
    else:
        raise Exception('no ping reply from host: ' + host_name)
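The ping() function above scrapes the English "Average =" line from Windows ping output, so it depends on both the platform and the OS language. A hedged Python 3 alternative relies on ping's exit status instead, assuming that status 0 means a reply came back; exit-status behavior varies between ping implementations, so check it on your target platform. The helper name ping_command is hypothetical; `-n` (Windows) and `-c` (Linux/macOS) are the standard count flags.

```python
import platform
import subprocess


def ping_command(host_name, system=None):
    """Build a single-echo ping command for the current (or given) OS."""
    system = system or platform.system()
    count_flag = '-n' if system == 'Windows' else '-c'
    return ['ping', count_flag, '1', host_name]


def ping(host_name):
    """Return True if ping exits with status 0 (a reply), else False."""
    result = subprocess.run(ping_command(host_name),
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode == 0
```

Returning False instead of raising keeps the caller's check to a simple `if ping(host):`.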

I am building some home-brewed monitoring tools that use these functions as a basis for plugins.

How are other people monitoring Windows from Python?
Suggestions?
Improvements?

Windows - NagiosPluginsNT - Remote WMI Monitoring

Note to self:

Here is a really good collection of remote WMI monitoring scripts for Windows:

NagiosPluginsNT

It is written in C# and looks nicely done.

December 11, 2008

Big Push 2009 - Free Software Foundation Appeal

About 6 years ago, I spent a good chunk of time as a webmaster for the GNU Project and the Free Software Foundation. Volunteering for the FSF and GNU was one way to express my ideals and somehow give back to the community. This is also what prompts me to release most of my own software under the GPL. The ethics and technical merits of Free Software have appealed to me since I was first introduced to the concept about a decade ago.

Here is the latest newsletter from the FSF regarding their current campaigns. It is a great summary of what is going on in the world of Free Software right now:

The Big Push 2009 -- Free Software Foundation Appeal

"As advocates for free software, we can challenge the status quo and so-called convenience of using the invasive tools of proprietary software companies, because the opportunities for change have never been better."

-Peter Brown
Executive Director, Free Software Foundation

Best Sticker Ever?


December 9, 2008

Work Computer Setup

Current Rig - Dec 2008 (AKA "The Matrix"):

... life begins at 3 monitors

Python - Trying Out NetBeans IDE

I normally don't use an IDE for Python development. The majority of my day is spent inside my SciTE editor.

However, Sun recently released NetBeans IDE 6.5 for Python, and I figured I would give it a try. So far I really like it. The Python support works well. It has syntax highlighting, code completion, a debugger, an integrated console, etc. Overall, it is a comprehensive and nicely designed IDE.

The IDE is written in Java so I thought it would be dog slow, but it's pretty snappy to work with.

I'll try it for a week and then decide whether to make it my main dev environment.

Anyone else using NetBeans for Python development?

December 3, 2008

Development Setup: Tools I Can't Live Without

At my current gig, these are the essential tools I use for almost all development:

IDE/Editors:

SciTE
Visual Studio
Eclipse

Language Compilers/Interpreters:

Python
C#
Java
Perl

Utilities/Etc:

Firefox
FileZilla
DropBox
Cygwin
WinMerge

December 2, 2008

"Wolf Fence" Debugging

I heard this term for the first time today, and realized it is a habit I usually follow rather than firing up a debugger and stepping through my code.

Defined by Dennis Lee Bieber here:
http://www.velocityreviews.com/forums/t356093-how-to-debug-python-code.html

"There's one wolf in Alaska, how do you find it? First build a fence down the middle of the state, wait for the wolf to howl, determine which side of the fence it is on. Repeat process on that side only, until you get to the point where you can see the wolf). In other words; put in a few "print" statements until you find the statement that is failing (then maybe work backwords from the "tracks" to find out where the wolf/bug comes from)."

The original term comes from "The Wolf Fence algorithm for debugging" (Edward J. Gauss, ACM - 1982)
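The wolf fence is a binary search over the execution: check the state at the midpoint, keep the half where the "howl" (the bad state) shows up, and repeat. A Python 3 sketch under two assumptions: the program can be modeled as a list of steps applied to a state, and once the state goes bad it stays bad. The names find_first_bad_step, steps, and is_ok are hypothetical.

```python
def find_first_bad_step(steps, initial, is_ok):
    """Return the index of the step that first breaks is_ok(state),
    or None if the state is still OK after every step.
    Assumes is_ok(initial) is True and a bad state stays bad."""
    def state_after(n):
        # Replay the first n steps from scratch, like rerunning the
        # program with the print statement (the fence) moved.
        state = initial
        for step in steps[:n]:
            state = step(state)
        return state

    if is_ok(state_after(len(steps))):
        return None  # no wolf anywhere
    lo, hi = 0, len(steps)  # invariant: OK after lo steps, bad after hi steps
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_ok(state_after(mid)):
            lo = mid  # wolf is on the far side of the fence
        else:
            hi = mid  # wolf is on the near side
    return hi - 1
```

For example, with steps `x+1`, `x*2`, `x-100`, `x+3`, a start value of 1, and the check `x >= 0`, the search reports step index 2 after looking at only two intermediate states.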

December 1, 2008

Home Computer Setup

Current Rig (Nov 2008):

November 14, 2008

Does Pylot Run on Macs? Yup.

I had previously tested Pylot on various machines and OSes, including Windows XP, Vista, Cygwin, Ubuntu, and an Eee PC.

Recently I got news that it also runs on Macs, and the GUI looks pretty damn good:

November 12, 2008

Get CPU Utilization from Remote Windows Machines with WMI

This is a little script that gets the CPU Utilization metric from a remote Windows machine. On multi-way or multi-core machines, it will return an average of all processors/cores.

To run this, you will first need to install the WMI module. Follow the instructions and get the download from here: http://timgolden.me.uk/python/wmi.html

import wmi

def get_cpu(computer, user, password):
    c = wmi.WMI(find_classes=False, computer=computer, user=user, password=password)
    utilizations = [cpu.LoadPercentage for cpu in c.Win32_Processor()]
    utilization = int(sum(utilizations) / len(utilizations))
    return utilization

November 7, 2008

Pylot - 15 Tools to Help You Develop Faster Web Pages

15 Tools to Help You Develop Faster Web Pages

"Response times, availability, and stability are vital factors to bear in mind when creating and maintaining a web application. If you’re concerned about your web pages’ speed or want to make sure you’re in tip-top shape before starting or launching a project, here’s a few useful, free tools to help you create and sustain high-performance web applications."

Pylot makes the list :)

October 30, 2008

Pylot and WebInject - Chosen as "Top web application load and stress test tools"

Pylot (my web performance test tool) and WebInject (my web functional test tool) were both just named on the "List of top web application load and stress test tools" from WarePrise.

I am really glad people are finding Pylot useful. It still needs a lot of work and functionality to get it to where I want it to be, but so far adoption is good for such a simple tool. It is written in Python and I use it regularly for my own testing.

WebInject is a more advanced tool that has been widely adopted for a few years as a testing tool and monitoring plugin. Lately it has taken on a life of its own and keeps popping up everywhere. Who knew my ugly ball of Perl code would get that popular?

Thanks for the props!

Python - Job Paradox

Paul Graham has some great essays online that made up the content of his book: Hackers & Painters. One of my favorite posts is The Python Paradox, which presents something rather counterintuitive at first read:

"I'll call it the Python paradox: if a company chooses to write its software in a comparatively esoteric language, they'll be able to hire better programmers, because they'll attract only those who cared enough to learn it. And for programmers the paradox is even more pronounced: the language to learn, if you want to get a good job, is a language that people don't learn merely to get a job."

This concept may be a little scary to some. Learning an esoteric language improves your chances at getting a "good" job? (presumably "good" meaning one you like) But what about all the recruiters salivating for Java/JEE and .NET programmers?

"People don't learn Python because it will get them a job; they learn it because they genuinely like to program and aren't satisfied with the languages they already know."

...and therein lies the paradox. If you don't already know an esoteric language (one not considered mainstream Enterprise), you can't learn one just to land a job. You must already enjoy programming enough to seek it out on your own.

October 24, 2008

SQAForums - 3000th Post

I just made my 3000th comment/post over at SQAForums. I've been a member and have enjoyed being part of that community since 2001. 8 years and 3000 posts.. wow.

I spend a lot of time posting in the "Performance & Load Testing" and "Automated Testing" forums. Good stuff.

October 20, 2008

Pylot - Use Case: Solid-State Drive Web Server Test

Someone did a nice job of using my tool (Pylot) for some webserver benchmarking:

"This article shows how response times improved after upgrading to a new dedicated web server with solid-state drives and faster processors."

http://www.websiteoptimization.com/speed/tweak/solid-state-drive/

October 6, 2008

Python - HTTPConnection State Machine

This state machine diagram from the Python source code (client.py in Python 3, httplib.py in Python 2) is useful for understanding the different states of a client/server connection via HTTP:

http://svn.python.org/view/python/branches/py3k/Lib/http/

HTTPConnection goes through a number of "states", which define when a client
may legally make another request or fetch the response for a particular
request. This diagram details these state transitions:

    (null)
      |
      | HTTPConnection()
      v
    Idle
      |
      | putrequest()
      v
    Request-started
      |
      | ( putheader() )*  endheaders()
      v
    Request-sent
      |
      | response = getresponse()
      v
    Unread-response   [Response-headers-read]
      |\____________________
      |                     |
      | response.read()     | putrequest()
      v                     v
    Idle                  Req-started-unread-response
                     ______/|
                   /        |
   response.read() |        | ( putheader() )*  endheaders()
                   v        v
       Request-started    Req-sent-unread-response
                            |
                            | response.read()
                            v
                          Request-sent

This diagram presents the following rules:
  -- a second request may not be started until {response-headers-read}
  -- a response [object] cannot be retrieved until {request-sent}
  -- there is no differentiation between an unread response body and a
     partially read response body
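The first two rules can be observed without touching the network, because putrequest() only buffers the request line and headers until endheaders() sends them. A small Python 3 sketch using http.client; 'example.invalid' is a placeholder host that is never contacted, and demo_state_rules is a hypothetical name.

```python
import http.client


def demo_state_rules():
    """Trip two HTTPConnection state checks while in Request-started."""
    conn = http.client.HTTPConnection('example.invalid')  # no socket opened yet
    conn.putrequest('GET', '/')  # Idle -> Request-started (buffered only)
    errors = []
    try:
        conn.getresponse()  # a response cannot be retrieved until request-sent
    except http.client.ResponseNotReady:
        errors.append('ResponseNotReady')
    try:
        conn.putrequest('GET', '/other')  # only putheader()/endheaders() allowed here
    except http.client.CannotSendRequest:
        errors.append('CannotSendRequest')
    conn.close()  # back to Idle; nothing was ever sent
    return errors


print(demo_state_rules())
```

Both exceptions subclass http.client.ImproperConnectionState, which is the family of errors the diagram's rules map to.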

October 2, 2008

Python 2.6 is Released - Thanks!

Python 2.6 is official:
http://www.python.org/download/releases/2.6/

What's new in Python 2.6:
http://docs.python.org/whatsnew/2.6.html

Thanks to all the devs and contributors.. you make my programming life great.

A good writeup here:
http://jessenoller.com/2008/10/02/python-26-is-released/

October 1, 2008

Python - Python 3.0 Early Books and Documentation

I have lots of references and books for Python 2.x. I am looking forward to upgrading them and doing lots of reading for the upcoming Python 3.0 release.

As usual, the official Python 3.0 docs are great:
http://docs.python.org/dev/3.0/

There is already a Python 3.0 book online by Swaroop C H:
A Byte Of Python

I also just pre-ordered Mark Summerfield's printed book on Python 3.0:
Programming in Python 3: A Complete Introduction to the Python Language

Hopefully lots more books and docs on Python 3.0 are coming soon.

September 30, 2008

Python - StackOverflow Fight

I've been having fun using the new StackOverflow site for answering technical questions.

Here is a Python script I call 'StackOverflow Fight' (like Google Fight). It takes a list of tags and gets the count of questions tagged with each one.

As an example, here is StackOverflow fight between some popular programming languages:

#!/usr/bin/env python

import urllib
import re

langs = ('python', 'ruby', 'perl', 'java')

url_stem = 'http://stackoverflow.com/questions/tagged/'

counts = {}
for lang in langs:
    resp = urllib.urlopen(url_stem + lang).read()
    m = re.search('summarycount.*>(.*)<', resp)
    count = int(m.group(1).replace(',', ''))
    counts[lang] = count
    print lang, ':', count
winner, top_count = max(counts.items(), key=lambda item: (item[1], item[0]))
print winner, 'wins with', top_count

Output:

python : 733
ruby : 391
perl : 167
java : 1440
java wins with 1440

September 25, 2008

Python - Thread Synchronization and Thread-safe Operations

This is the best article on thread synchronization in Python that I have come across: Thread Synchronization Mechanisms in Python

It is very clearly explained and just helped me solve a major concurrency issue I was having.

Especially interesting is the list of thread safe operations in Python.

Here are some thread-safe operations:

reading or replacing a single instance attribute

reading or replacing a single global variable

fetching an item from a list

modifying a list in place (e.g. adding an item using append)

fetching an item from a dictionary

modifying a dictionary in place (e.g. adding an item, or calling the clear method)

Here is an explanation of which kinds of global value mutation are thread-safe:

http://effbot.org/pyfaq/what-kinds-of-global-value-mutation-are-thread-safe.htm
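A small Python 3 sketch of the distinction, assuming CPython: appending to a shared list from several threads is one of the safe in-place operations listed above, while `counter += 1` on a global is a read, an add, and a rebind, so concurrent threads can lose updates unless it is guarded with a threading.Lock. The thread and iteration counts are arbitrary.

```python
import threading

N_THREADS = 8
N_ITEMS = 10000

shared_list = []
counter = 0
counter_lock = threading.Lock()


def append_items():
    for i in range(N_ITEMS):
        shared_list.append(i)  # single in-place list modification


def increment_counter():
    global counter
    for _ in range(N_ITEMS):
        with counter_lock:  # read-modify-write needs a lock
            counter += 1


def run_threads(target):
    threads = [threading.Thread(target=target) for _ in range(N_THREADS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


run_threads(append_items)
run_threads(increment_counter)
print(len(shared_list), counter)  # 80000 80000
```

Dropping the lock does not guarantee a wrong count on every run; it only removes the guarantee of a right one, which is what makes such bugs hard to catch.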

Update: Now I am confused. Doesn't this example in the article contradict bullet point 2 above?

Update Update: See comments below for better explanation and info.

September 22, 2008

Stack Overflow Needs More Pythonistas

Stack Overflow is a new community site for answering programming questions. It is very impressive to me, but there seems to be a very Microsoft (.NET, C#, etc) slant to the questions being asked.

It would be great to see more Python questions and answers over there. The one Python-related question I did ask was answered fabulously... so there are Python-heads reading and answering, but not posting many questions.

Fellow Pythonistas, hop to it! Let's get a bigger Python presence over there.

http://stackoverflow.com/tags/python

September 19, 2008

GUI Test Automation Considered Harmful

I have been saying this for years, and Alan Page finally posted about it:

GUI Schmooey

"The point is that I think testers, in general, spend too much time trying to automate GUIs. I’m not against automation – just write automation starting at a level below the GUI and move down from there."

The GUI tends to be the most fragile layer of an application. It also tends to be the layer where a lot of automated test infrastructure is built. I don't like this style of "black box automation". It is tempting to automate a regular user's UI experience, and indeed this can be useful. However, in most cases you would be more productive taking a manual/unscripted approach to your GUI functional testing.

Manual and Exploratory testing seems like a good fit for GUI testing, while lower layers can benefit most from test automation.

September 17, 2008

Python Timing - time.clock() vs. time.time()

Dear Lazyweb,

Which is better to use for timing in Python? time.clock() or time.time()? Which one provides more accuracy?

For example:

start = time.clock()
# ... do something
elapsed = (time.clock() - start)

vs.

start = time.time()
# ... do something
elapsed = (time.time() - start)

Update:
From the comments I received and others I have talked with, it looks like the answer is: use time() on Unix/Linux systems and clock() on Windows systems. However, the 'timeit' module gives you more options and is accurate across platforms.
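For readers on newer versions: time.clock() was deprecated in Python 3.3 and removed in Python 3.8. There, time.perf_counter() is the portable high-resolution timer, and it is what timeit.default_timer uses. A minimal Python 3 sketch of both approaches, with a hypothetical do_something() standing in for the code being timed:

```python
import time
import timeit


def do_something():
    # hypothetical workload standing in for "... do something"
    return sum(i * i for i in range(10000))


def time_once(func):
    """Time a single call with the high-resolution performance counter."""
    start = time.perf_counter()
    func()
    return time.perf_counter() - start


def time_best(func, number=100, repeat=3):
    """Best per-call time over several timeit runs (less noisy than one sample)."""
    return min(timeit.repeat(func, number=number, repeat=repeat)) / number


print('single call:   %.6f s' % time_once(do_something))
print('best per call: %.6f s' % time_best(do_something))
```

Taking the minimum of several repeats is the approach the timeit documentation suggests, since higher values are usually caused by other processes interfering rather than by the code itself.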

September 11, 2008

Python vs. Java - HTTP GET Request Complexity

Wow.. talk about programming with a ball and chain on! Some languages just take so much code to do a simple thing.

The following 2 programs each do an HTTP GET and print the contents of the URL.

In Java:

import java.net.*;
import java.io.*;

public class JGet {
    public static void main (String[] args) throws IOException {
        try {
            URL url = new URL("http://www.google.com");
    
            BufferedReader in = 
                new BufferedReader(new InputStreamReader(url.openStream()));
            String str;

            while ((str = in.readLine()) != null) {
                System.out.println(str);
            }

            in.close();
        } 
        catch (MalformedURLException e) {} 
        catch (IOException e) {}
    }
}

Equivalent in Python:

import urllib
print urllib.urlopen('http://www.google.com').read()
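In Python 3 the same one-liner idea still holds, but urlopen lives in urllib.request and returns bytes, so the body has to be decoded. A minimal sketch; the http_get name is illustrative, and falling back to UTF-8 when the server sends no charset is an assumption:

```python
from urllib.request import urlopen


def http_get(url):
    """Fetch url and return the body as text."""
    with urlopen(url) as resp:
        charset = resp.headers.get_content_charset() or 'utf-8'
        return resp.read().decode(charset)
```

Usage: `print(http_get('http://www.google.com'))`.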

Python Ruined Me

I've been programming almost exclusively in Python for the past 4 years or so. Occasionally I have to write code in C# or Perl.

After using Python:

Perl seems *so much* more confusing than I used to think it was.
C# with its static typing and forced OO feels like programming with a ball & chain on.

I love me some Python... but dammit... you ruined me.

Test Tools - Open Source Feedback and Participation

Here is a story and some observations I have made as an Open Source developer:

I had a specific need for a tool to test web services a few years ago ('02-'03). I had been rolling my own tools in various languages for quite a while and had some stuff that I thought others might be interested in, so I set up a SourceForge project and released it in Jan '04. The tool is called WebInject. Not many people used it at first. I just kept adding features and releasing. I was scratching my own itches and using my tool internally at a company I was working for. Eventually I got a few bites and some people started to download it and try it out. As of today, the tool has been downloaded over 60,000 times. I've had feedback from all corners of the world.

In the early days, *any* bug that was reported, I would dive into immediately. I was turning around fixes and new releases in crazy time. This personal attention sparked a lot of lengthy email conversations where I was starting to get real feedback. It wasn't the case where I could just sit back and expect quality feedback. I really engaged the users and asked questions. Pretty soon I was getting more feedback than I could reasonably handle. Around this time I started getting some patches sent to me. This accelerated development as I started to get various ideas from other people. A few silly bugs were also cleaned up; stuff I never would have caught on my own. Encouraging patches is huge. Once someone gets a piece of their own code into your code base, they have an intimate connection.

I don't maintain WebInject much any more. I think it is cool and useful, but it's a rat's nest of Perl code that needs some serious love. There is some stuff that bothers me so much about it that I barely want to touch it sometimes :) I spend all my time on newer tools these days, but I still keep on top of it enough to facilitate others using it and posting patches/updates to it (though I haven't released in a long time). Oddly, it has become somewhat entrenched in monitoring systems. Most of the users lately seem to be people running Nagios (open source monitoring system) that need an intelligent web plugin/agent.

Some basic thoughts on open source tool adoption and getting feedback:

A tool without documentation doesn't exist.. without quality documentation, don't even bother.. nobody will touch it.

If it takes someone more than 10 minutes to install, configure, run, and see the results from a simple test case.. nobody will use it. Once someone sees something work, it is tangible and they become engaged. I achieved this with my tool. If you download and run it, you are presented with a GUI. Pressing "run" executes a preconfigured sample test case that hits my site. I wanted the most clueless of people to be able to immediately see a green "PASS", and say "hey, cool it actually did something". This satisfaction must come immediately or it will be abandoned. (unless of course you already have a user base that has made it past this hurdle and can encourage others to put in the effort to get it working)

Make your tool easy to get. The online presence of your tool should allow someone to visit the website and immediately download it. I have seen tools released that take 5 levels of navigation before you get to the download options, or don't even have a web site or project page somewhere.

Announce your tools.. release early, release often.. who cares.. blog it, forum it, email it, digg it, usenet it, comment about it.. see if anyone has interest. Don't annoyingly spam it, but let people know it exists. If you announce your tool once only, buried deep in a technical mailing list, don't expect people to come rushing.

Encourage participation. Go out of your way to let people know that you encourage and appreciate feedback, bug reports, offers for collaboration, patches, etc.

You need a forum or a mailing list with a web accessible archive. It has to be super simple to talk to you in public and let others see answers to questions and begin to participate on their own and to offer advice. Using a many-to-one scheme where you just tell people to email you isn't gonna work. It is too private.. people want to know what is going on and hear from each other and offer tips. I have had several thousand posts to my forums for this little tool.

September 8, 2008

SMU using WebInject in Advanced Java Course

My web testing tool (WebInject) is now being used as part of the Advanced Java class being taught at SMU's Engineering school.

They will be using it for testing Servlets.

very cool!

http://engr.smu.edu/~coyle/cse7345/

List of Programming Languages I Know

This is an ordered list of programming languages I have learned in the past 15 years:

BASIC

LISP (Scheme)

C++

Shell

Perl

JavaScript

Java

Python

C#

What does your list look like? (add it to the comments)

September 5, 2008

Performance Testing - Load Balancer SSL Stress Testing

I am in the process of stress testing a new load balancer (F5 Big-IP 9.x). One concern with the device is its ability to handle concurrent SSL connections and terminations, and what the impact is on performance and resources. The nature of SSL makes it very CPU-intensive (encryption). This is basically a torture test of HTTP GETs using SSL. What I need is a program that can keep HTTP requests active at line-speed so there are always concurrent requests being served.

The goal I am looking to reach is 50k concurrent active SSL connections going through the load balancer to a web farm.

First off, I need a load generator program. If I were to use a commercial load testing tool (LoadRunner, SilkPerformer, etc), I would most likely be unable to do a test like this due to Virtual User licensing costs. So the best way for me to achieve this is with a custom homebrewed tool. So of course I reached for Python.

The load generating machines I was given have dual 2GHz CPUs and 2GB of memory, running Windows XP. The real challenge is seeing how many concurrent requests I can send from a Windows box!

First I started with a Python script that launches threads to do the HTTP (SSL) requests.

The script looks like this:

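#!/usr/bin/env python
# ssl_load.py - Corey Goldberg - 2008

import httplib
from threading import Thread

threads = 250
host = '192.168.1.14'
path = '/foo.html'  # renamed from 'file' so the builtin isn't shadowed

def main():
    # start one agent thread per concurrent connection
    for i in range(threads):
        agent = Agent()
        agent.start()

class Agent(Thread):
    def __init__(self):
        Thread.__init__(self)

    def run(self):
        # send HTTPS GET requests back to back, forever
        while True:
            conn = httplib.HTTPSConnection(host)
            conn.request('GET', path)
            resp = conn.getresponse()
            resp.read()   # drain the response body
            conn.close()  # release the socket before opening the next one

if __name__ == '__main__':
    main()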

This works great for small workloads, but Windows stops me from creating threads at around 800 per process (sometimes it was flaky and crashed after 300), so that won't work. The next step was to break the load into several processes and launch threads from each process. Rather than try to wrap this into a single script, I just use a starter script to call many instances of my load test script. I used a starter script like this:

#!/usr/bin/env python

import subprocess

processes = 60
for i in range(processes):
    # a list of arguments works on Windows and on Linux/Unix
    subprocess.Popen(['python', 'ssl_load.py'])

250 threads per process seemed about right and didn't crash my system. Using this, I was able to launch 60 processes which gave me a total of 15k threads running.
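A slightly more portable starter, sketched in Python 3 under a couple of assumptions: it launches the copies with `sys.executable` (the interpreter running the starter) instead of whatever `python` is on the PATH, and it keeps the Popen handles so the starter can wait for the load processes. The `launch` helper name is hypothetical, not part of the original scripts.

```python
import subprocess
import sys


def launch(script_args, processes):
    """Start `processes` copies of a Python command; return their Popen handles.

    sys.executable runs every copy with the same interpreter as this script,
    and passing a list (not a single string) works on Windows and POSIX alike.
    """
    return [subprocess.Popen([sys.executable] + script_args)
            for _ in range(processes)]
```

For the test above, that would be `procs = launch(['ssl_load.py'], 60)` followed by `p.wait()` on each handle, so the starter stays alive until every load process exits.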

I couldn't reach the 50k goal from a single machine, and on that hardware I doubt anyone could. But at 15k threads per machine, I was able to use 4 load generating machines to achieve 60k concurrent SSL connections. Not bad!
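For anyone porting the agent loop from ssl_load.py to Python 3, here is a minimal sketch. It is not the original script: it adds a fixed run time and a count of completed requests so a test run ends on its own and reports throughput. The names `https_get` and `run_agents` are hypothetical, and it does nothing about the per-process thread limit on Windows, so the load still has to be split across processes as described above.

```python
import http.client
import threading
import time


def https_get(host, path, context=None):
    """Send one HTTPS GET and drain the body (Python 3 http.client).

    Python 3 verifies certificates by default; for a test host with a
    self-signed certificate, pass a suitable ssl.SSLContext as `context`.
    """
    conn = http.client.HTTPSConnection(host, timeout=30, context=context)
    try:
        conn.request('GET', path)
        conn.getresponse().read()
    finally:
        conn.close()


def run_agents(request, num_threads, duration):
    """Call request() in a loop on num_threads threads for `duration` seconds.

    Returns the number of calls that completed without raising.
    """
    stop = threading.Event()
    counts = [0] * num_threads  # one slot per thread, so no lock is needed

    def agent(slot):
        while not stop.is_set():
            try:
                request()
                counts[slot] += 1
            except Exception:
                pass  # a failed request is simply not counted

    workers = [threading.Thread(target=agent, args=(i,))
               for i in range(num_threads)]
    for w in workers:
        w.start()
    time.sleep(duration)
    stop.set()
    for w in workers:
        w.join()  # each thread finishes its in-flight request, then exits
    return sum(counts)
```

With the values from ssl_load.py, a 60 second run would be `run_agents(lambda: https_get('192.168.1.14', '/foo.html'), 250, 60)`.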

September 3, 2008

Benchmarking JavaScript in Chrome, FF3, and IE 7

Just ran some JavaScript benchmarks from here:
http://celtickane.com/webdesign/jsspeed.php

Firefox 3: 207 ms
Chrome: 200 ms
IE 7: 1766 ms

Wow.. Chrome and Firefox are FAST (at least on Windows XP)

Happy 25th Anniversary to the GNU Project

August 27, 2008

Pylot 1.11 Released - Web Performance Tool

I just did a new release of Pylot (Web Performance Tool). No new features; just some internal fixes and a small change to the console output.

Grab it here: http://pylt.googlecode.com/files/pylot_1.11.zip

Pylot Home: www.pylot.org

August 25, 2008

Net Neutrality Defined by Tim Berners-Lee:

Net Neutrality:

"If I pay to connect to the Net with a given quality of service, and you pay to connect to the Net with the same or higher quality of service; Then you and I can communicate across the net with that quality of service. That's all."

Fundamental Social Basis of the Internet:

"Freedom of connection with any application to any party."

Simple, right?

August 21, 2008

Using WebInject With GroundWork Monitor

Wow.. I love seeing my Open Source code propagate through the community.

I am the author of WebInject, which is a popular web application/service test tool that also works as a Nagios plugin. GroundWork has an Open Source monitoring system based on Nagios.

I just stumbled across an awesome screencast tutorial from GroundWork called "Using WebInject To Monitor Web Applications And Services". It shows how to install and setup WebInject as a monitoring agent.

Go check it out!

You can also download the screencast and PDF without registering by going here: http://www.groundworkopensource.com/resources/webcasts/monitoring-web-applications-thanks.html

August 7, 2008

Editra - Nice Python Editor

I just saw this post about Editra from Flávio Codeço Coelho:
Editra: a Great New Python Editor

I downloaded it and was VERY impressed. For Python programming, I don't use a full blown IDE. I prefer a cross-platform text editor with syntax highlighting and integrated shell that works well on Windows and Linux.

I have been using SciTE as my programmer's editor for several years now; going back to my days as a Perl programmer. I have never seen an editor that rivals SciTE (shut up VIM freaks :)

Well, Editra just might be that editor. It is only in an Alpha stage, but is very useful and packs most of the features I want including autocomplete (which SciTE lacks).

I will definitely be keeping an eye on this editor as its development progresses.

http://editra.org/

August 6, 2008

Top Screen Resolutions - My Web Visitors

This is a breakdown of screen resolutions from visitors to my site and blog during the past 30 days.

Python - Downloading Binary Files

Here is a quick example showing how to download and save a binary file (image, zip, etc) to disk using HTTP. This will grab the image file from a website and save it locally:

import urllib

url = 'http://www.goldb.org/images/goldb.png'
f = urllib.urlopen(url)
fh = open('goldb.png', 'wb')  # 'wb' = binary mode, so bytes are written unchanged
fh.write(f.read())
fh.close()
f.close()  # close the network connection too
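The example above uses Python 2's `urllib.urlopen`. A Python 3 sketch of the same download, assuming only the standard library, uses `urllib.request.urlopen` and `shutil.copyfileobj`, which copies the response to disk in chunks instead of reading the whole file into memory first. The `download` function name is hypothetical.

```python
import shutil
import urllib.request


def download(url, filename):
    """Save the body of `url` to `filename` in binary mode, streaming in chunks."""
    with urllib.request.urlopen(url) as resp, open(filename, 'wb') as fh:
        shutil.copyfileobj(resp, fh)  # read/write fixed-size chunks until EOF
```

Usage would be `download('http://www.goldb.org/images/goldb.png', 'goldb.png')`.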

August 5, 2008

Default Languages on Linux, Mac, Windows

Mark Damon Hughes:

"All Linux distributions ship with: Python, Perl, C, and C++"

"A standard install of Mac OS X Leopard has: Python, Perl, Java, Ruby, AppleScript"

"MS Windows ships with nothing. No BASIC. No C compiler. You're trapped, stuck playing with Solitaire and MS Paint"

July 25, 2008

Python - Counting Items In A List

Counting items in a list...

Let's say you have a list like this:

foo = [1, 2, 1, 2, 1, 2, 1, 1]

... and you want to count the items in the list, ending up with a dictionary full of items (dict keys) and counts (dict values), like this:

{1: 5, 2: 3}

Here is an easy way to do it:

d = {}
for i in set(foo):
    d[i] = foo.count(i)

got anything more Pythonic?
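One possible answer: the set/count version calls `count()` once per distinct item, and each call scans the whole list. The sketch below counts in a single pass with `dict.get`. Later Python versions (2.7 and 3.1+) also ship `collections.Counter`, which does the same job in one call. The `count_items` name is just for illustration.

```python
from collections import Counter


def count_items(items):
    """Count occurrences with one pass over the list, using a plain dict."""
    counts = {}
    for item in items:
        counts[item] = counts.get(item, 0) + 1
    return counts


foo = [1, 2, 1, 2, 1, 2, 1, 1]
print(count_items(foo))    # {1: 5, 2: 3}
print(dict(Counter(foo)))  # {1: 5, 2: 3}
```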

July 3, 2008

Searching For Your Open Source Code

One cool thing about developing free open source software is seeing where your code ends up. It is satisfying to know that bits of code you wrote end up in other projects and code bases.

There are several online services that index source code from various places like websites and online svn/cvs repositories. It makes searching for my code a breeze. This is also useful for looking up examples of code I have written in the past that I want to copy from.

Google Code Search:
Search public source code for: "corey goldberg"

Koders Search:
Search public source code for: "corey goldberg"

Krugle Code Search:
Search public source code for: "corey goldberg"

July 2, 2008

Twitter API vs. Yahoo Web Services - Performance and Reliability

I've noticed the Twitter API is very slow and unreliable. Requests often take 10+ seconds to return. Sometimes they timeout with no response at all.

I put together some monitoring tools to see exactly how slow and unreliable the API really is (when it is even available!). This is obviously not a very scientific test; just a ballpark idea of performance. The monitoring tools are built with Python and RRDTool. To start with, I used a modified version of my rrdpy suite of tools (http://code.google.com/p/rrdpy).

I ran a test for 2 hours, hitting the API's Test Method every 30 seconds. The tools generated the following time-series graph of server response times:

The average response was 2.53 seconds, and we see a *lot* of variance in the data samples.

In comparison, here is the same test run against the Yahoo Search REST API:

Notice the small variance and the average server response time of 0.186 secs!
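To put a number on "a lot of variance", here is a minimal Python 3 sketch for timing requests and summarizing the samples. It is not the rrdpy-based tooling used for the graphs: the helper names are hypothetical, it times a plain GET with `urllib.request`, and it records a failed or timed-out request as `None` so failures are counted separately from the response times.

```python
import statistics
import time
import urllib.request


def time_request(url, timeout=30):
    """Seconds taken to GET `url` and read the body, or None on failure."""
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
    except OSError:  # URLError, HTTPError and socket timeouts subclass OSError
        return None
    return time.perf_counter() - start


def summarize(samples):
    """Count, failures, mean, and sample standard deviation of the timings."""
    ok = [s for s in samples if s is not None]
    return {
        'count': len(samples),
        'failures': len(samples) - len(ok),
        'mean': statistics.mean(ok) if ok else None,
        'stdev': statistics.stdev(ok) if len(ok) > 1 else 0.0,
    }
```

Polling every 30 seconds for 2 hours, as in the test above, means 240 calls to `time_request()`; the resulting list goes to `summarize()`.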

I would be interested in running these tests again in a few months to see if Twitter's API has become faster and more reliable.

July 1, 2008

My Website Visitors By Operating System

Some interesting stats from my web properties.

Here is the breakdown of Operating Systems for visitors in the past 30 days:

June 27, 2008

Back Bay Lightning Strike

The building across the street from me (Back Bay, Boston, MA) just got struck by lightning during a thunderstorm:

Python - Sort A Dictionary By Values

Python dictionaries are unordered, but you can extract the keys and values and sort them yourself in a list.

Let's say you have this dictionary:

foo = {'a': 2,
       'b': 4,
       'c': 3,
       'd': 1}

.. and you want to convert it into a list, sorted by value (not key), like this:

[('d', 1), 
 ('a', 2), 
 ('c', 3), 
 ('b', 4)]

There is a Python Recipe for it with various implementations discussed in the comments.

My favorite one is this:

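sorted(foo.items(), key=lambda kv: (kv[1], kv[0]))

(The original form, lambda (k,v): (v,k), relies on tuple parameter unpacking, which works in Python 2 but was removed in Python 3.)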
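Another option, sketched here with the standard `operator` module: `itemgetter(1, 0)` turns each `(key, value)` pair into `(value, key)`, which gives the same value-then-key ordering as the lambda without defining a function.

```python
from operator import itemgetter

foo = {'a': 2, 'b': 4, 'c': 3, 'd': 1}

# itemgetter(1, 0) maps (key, value) -> (value, key): sort by value, ties by key
by_value = sorted(foo.items(), key=itemgetter(1, 0))
print(by_value)  # [('d', 1), ('a', 2), ('c', 3), ('b', 4)]

# largest values first
print(sorted(foo.items(), key=itemgetter(1, 0), reverse=True))
# [('b', 4), ('c', 3), ('a', 2), ('d', 1)]
```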

June 20, 2008

MS SQL Server - Trace Information and Stopping Traces

Server-side tracing is the process of having your SQL Server machine save events to a physical file on that machine without using the Profiler client tool. Server-side tracing is enabled and controlled by using SQL Server system-supplied stored procedures and functions. With these system-supplied processes, you can identify what to trace, when to start and stop tracing, what traces are running, and view trace information stored in the trace file.

Here is how you view the number of traces currently running:

SELECT count(*) FROM :: fn_trace_getinfo(default) WHERE property = 5 and value = 1

Here is how you can find more detail about the running traces:

SELECT * FROM :: fn_trace_getinfo(default)

You can terminate a trace with the 'sp_trace_setstatus' stored procedure, passing the trace's ID (1 in this example) as the first argument:

EXEC sp_trace_setstatus 1, @status = 0
EXEC sp_trace_setstatus 1, @status = 2

Setting the status to 0 stops the trace; setting it to 2 closes the trace and deletes its definition from the server.

June 19, 2008

List of Free Online Python Books

I found a link to a very good collection of legal and free online books and documentation for Python programming. Nice list and great content.. go check it out.

June 12, 2008

Tools to Develop Faster Web Pages

Jacob Gube just published "15 Tools to Help You Develop Faster Web Pages" at sixrevisions.com. It gives a nice list and high level overview of free tools that are useful for web performance. It includes such tools as load generators, profilers, and debugging proxies.

Pylot (my tool) made his list.. so go have a look!

His description:

"Pylot is an open-source performance and scalability testing tool. It uses HTTP load tests so that you can plan, benchmark, analyze and tweak performance. Pylot requires that you have Python installed on the server - but you don’t need to know the language, you use XML to create your testing scenarios."

June 5, 2008

Python - Generating Sparkline Graphs For Stock Pricing

In a previous post, I introduced the new historical stock pricing function in the ystockquote.py module.

Here is an interesting use of the module to create sparklines of historical stock pricing for a given stock.

To generate the sparklines, I use Grig Gheorghiu's Sparkplot Python module (which uses Matplotlib for graphing).

This module expects a file named data.txt, which contains lines of data to be graphed. All you need to do is generate a data file with the pricing data, and the Sparkplot module will take care of the rest.

To generate the data, I use a script like this:

import ystockquote

ticker = 'GOOG'
start = '20080101'
end = '20080523'

data = ystockquote.get_historical_prices(ticker, start, end)

closing_prices = [x[4] for x in data][1:]
closing_prices.reverse()

fh = open('data.txt', 'w')
for closing_price in closing_prices:
    fh.write(closing_price + '\n')
fh.close()
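The same step can be wrapped in a small function. This is a sketch, not part of ystockquote or Sparkplot, and `write_closing_prices` is a hypothetical name. It assumes the nested-list layout that `get_historical_prices()` returns (a header row, then one row per day, newest first). It finds the Close column by its header name instead of the fixed index 4 and writes the prices oldest first, which is the order a sparkline reads left to right.

```python
def write_closing_prices(rows, filename='data.txt'):
    """Write the Close column of ystockquote-style rows to `filename`, oldest first."""
    header, days = rows[0], rows[1:]
    close_col = header.index('Close')  # 4 in the module's current output
    with open(filename, 'w') as fh:
        for day in reversed(days):  # input is newest first; sparkline wants oldest first
            fh.write(day[close_col] + '\n')
```

Usage would be `write_closing_prices(ystockquote.get_historical_prices(ticker, start, end))`, followed by the Sparkplot command below.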

Now that you have the data.txt file set up, you can call the Sparkplot script via the command line to generate the sparkline image:

python sparkplot.py --label_first --label_last

*Note: The Sparkplot module needs Matplotlib installed. The sparkline output is an image (png) like this:

June 4, 2008

Java - Run a System Command and Return Output

Run a system command and return output:

import java.io.BufferedReader;
import java.io.InputStreamReader;

public class CmdExec {

    // Run a system command and return its standard output as a String
    public String run(String cmdLine) {
        StringBuilder output = new StringBuilder();
        try {
            Process p = Runtime.getRuntime().exec(cmdLine);
            BufferedReader input = new BufferedReader(
                new InputStreamReader(p.getInputStream()));
            String line;
            while ((line = input.readLine()) != null) {
                output.append(line).append('\n');
            }
            input.close();
            p.waitFor();  // wait for the command to finish
        } catch (Exception ex) {
            ex.printStackTrace();
        }
        return output.toString();
    }

    // Sample usage
    public static void main(String[] args) {
        CmdExec cmd = new CmdExec();
        System.out.println(cmd.run("ls -a"));
    }
}

May 30, 2008

Python - rrdpy - Round Robin Databases (RRDTool)

(all scripts and code referenced in this post can be found here: http://code.google.com/p/rrdpy/source/browse/trunk)

"RRDtool is the Open Source industry standard, high performance data logging and graphing system for time series data. It stores the data in a very compact way that will not expand over time, and it can create beautiful graphs."

If you are developing tools that need a data repository and graphing capabilities, RRDTool provides you both. You create an RRD and then you insert data values at regular intervals. You then call the graphing API to have a graph displayed. The cool thing about this data storage is its “round robin” nature. You define various time spans and the granularity at which you want them stored. A fixed binary file is created, and this never grows in size over time. As you insert more data, it is inserted into each span. As results are collected, they are averaged and rolled into successive time spans. It makes a much more efficient system than using your own complex data structures, relational databases, or file system storage.
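To make the "round robin" idea concrete, here is a toy model in Python. It is only an illustration of the concept and not how RRDTool or rrdpy store data on disk: each archive keeps a fixed number of rows, each row is the average of `steps` incoming values, and once an archive is full the oldest row is dropped, so storage never grows. The class names are hypothetical.

```python
from collections import deque


class RoundRobinArchive:
    """A fixed number of rows, each the average of `steps` incoming values."""

    def __init__(self, steps, rows):
        self.steps = steps
        self.rows = deque(maxlen=rows)  # never holds more than `rows` entries
        self._pending = []

    def add(self, value):
        self._pending.append(value)
        if len(self._pending) == self.steps:
            # consolidate the pending values into one averaged row
            self.rows.append(sum(self._pending) / self.steps)
            self._pending = []


class ToyRRD:
    """Feeds every update into several archives of increasing coarseness."""

    def __init__(self, archives):
        # archives: list of (steps, rows) pairs, finest first
        self.archives = [RoundRobinArchive(s, r) for s, r in archives]

    def update(self, value):
        for archive in self.archives:
            archive.add(value)
```

For example, `ToyRRD([(1, 360), (6, 1440)])` with one update every 10 seconds would keep one hour of raw values plus one day of 1-minute averages, and its size stays fixed however long it runs.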

Recently, I started a small project on Google Code (rrdpy) to create a set of Python tools to make dealing with Round Robin Databases (RRD) less painful. Setting up RRDs can be tough if you don't know what you are doing.

Below are some example ways to use these tools.

First, I will create a Round Robin Database (RRD) using the rrd_maker.py script, which uses my rrd.py class. This script creates an RRD named test.rrd, which is expecting to be updated every 10 seconds:

import rrd

interval = 10
rrd_file = 'test.rrd'

my_rrd = rrd.RRD(rrd_file, vertical_label='value')
my_rrd.create_rrd(interval)

Now that I have my RRD created, what can I do with it? For a quick example, I can use the rrd_feeder_rand.py script to feed in random numbers and generate a graph every 10 seconds. The graph generated will show the past 60 mins of data. The code for that would look like:

import rrd
import random
import time


interval = 10
rrd_file = 'test.rrd'
           
my_rrd = rrd.RRD(rrd_file)

while True:
    rand = random.randint(1, 100)
    my_rrd.update(rand)
    my_rrd.graph(60)
    time.sleep(interval)

OK, pretty boring. How about an HTTP website monitor? I created the rrd_feeder_http.py script to show how this is done. This script will send HTTP GET requests to a specified url. A request is sent every 10 seconds and a graph of response times for the past hour is generated.

import rrd
import time
import httplib


host = 'www.python.org'
path = '/'
use_ssl = False

interval = 10
rrd_file = 'test.rrd'
            
            
def main():
    my_rrd = rrd.RRD(rrd_file, 'Response Time')
    while True:
        start_time = time.time()
        if send(host, path):
            end_time = time.time()
            raw_latency = end_time - start_time
            expire_time = interval - raw_latency
            latency = '%.3f' % raw_latency
            my_rrd.update(latency)
            my_rrd.graph(60)  # graph the past 60 minutes of data
            print latency
        else:
            expire_time = interval
        # sleep for the rest of the interval so requests stay ~10 seconds apart
        if expire_time > 0:
            time.sleep(expire_time)


def send(host, path):
    if use_ssl:
        conn = httplib.HTTPSConnection(host)
    else:
        conn = httplib.HTTPConnection(host)
    try:
        conn.request('GET', path)
        conn.getresponse().read()  # read the whole body so it counts toward latency
        conn.close()
        return True
    except Exception:  # catch errors, but not Ctrl+C (Python 2.5+)
        print 'Failed request'
        conn.close()
        return False
        
        
if __name__ == '__main__':
    main()

The output from running the HTTP website monitor script looks like this:

May 29, 2008

Python - Yahoo Stock Quotes - Historical Pricing

I recently received a patch to my ystockquote.py module for retrieving historical stock prices. It takes a ticker symbol and a start and end date (YYYYMMDD), and returns pricing data (a nested list) for the time period specified. For example:

import ystockquote

ticker = 'GOOG'
start = '20080520'
end = '20080523'

data = ystockquote.get_historical_prices(ticker, start, end)

for dat in data:
    print dat

Output:

['Date', 'Open', 'High', 'Low', 'Close', 'Volume', 'Adj Clos']
['2008-05-23', '546.96', '553.00', '537.81', '544.62', '4431500', '544.6']
['2008-05-22', '551.95', '554.21', '540.25', '549.46', '5076300', '549.4']
['2008-05-21', '578.52', '581.41', '547.89', '549.99', '6468100', '549.9']
['2008-05-20', '574.63', '582.48', '572.91', '578.60', '3313600', '578.6']
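The first row of that nested list is a header and every value is a string. A minimal sketch for turning it into something easier to compute with, assuming that layout: each day becomes a dict keyed by the header names as the module returns them (including 'Adj Clos'), with prices converted to float and Volume to int. The `rows_to_records` name is hypothetical.

```python
def rows_to_records(rows):
    """Convert ystockquote-style rows into dicts keyed by the header row."""
    header = rows[0]
    records = []
    for row in rows[1:]:
        rec = dict(zip(header, row))
        for key in header:
            if key == 'Date':
                continue  # keep dates as 'YYYY-MM-DD' strings
            rec[key] = int(rec[key]) if key == 'Volume' else float(rec[key])
        records.append(rec)
    return records
```

With the output above, `rows_to_records(data)[0]['Close']` would be the closing price for 2008-05-23 as a float.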

New Blog - Switched To Blogger - First Post

I was sick of the hassles of hosting my own blog, so I just switched to Blogger. All of my old feeds should remain the same... so if you are already subscribed, you are all set.

The old blog will remain up in read-only mode, so its info is still indexed and searchable.

Coming soon:
more Python tricks, more performance testing tips, etc.

Visit the new blog here: http://coreygoldberg.blogspot.com or Subscribe here: http://feeds.feedburner.com/goldblog