October 5, 2009

Automated Web/HTTP Profiler with Selenium-RC and Python

A small new open source project I am working on:

Selenium-profiler is a web/http profiler built with Selenium-RC and Python. It profiles page load time and network traffic for a web page. The profiler uses Selenium-RC to automate site navigation (via browser), proxy traffic, and sniff the proxy for network traffic stats as requests pass through during a page load.

It is useful for answering questions like:

  • how many http requests does that page make?
  • how fast are the http responses coming back?
  • which http status codes are returned?
  • how many of each object type are requested?
  • what is the total page load time?
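As a rough illustration of how most of these questions reduce to simple counting once the per-request data is in hand, here is a minimal Python 3 sketch. It is not the profiler's actual code: the record layout, the sample values, and the `summarize` helper are all assumptions made up for this example. Total page load time is left out because it has to be measured around the browser's page load, not derived from the request list.

```python
import os
from collections import Counter
from urllib.parse import urlparse

# Hypothetical records: (status_code, method, url, elapsed_ms).
# The real profiler gets its data from Selenium-RC's captured traffic.
SAMPLE_REQUESTS = [
    (200, "GET", "http://www.google.com/", 332),
    (200, "GET", "http://www.google.com/logo.gif", 255),
    (403, "GET", "http://www.google.com/favicon.ico", 111),
    (204, "GET", "http://www.google.com/generate_204", 88),
]


def summarize(requests):
    """Count requests, status codes, and file extensions."""
    status_counts = Counter(status for status, _, _, _ in requests)
    ext_counts = Counter()
    for _, _, url, _ in requests:
        # Take the extension from the URL path; paths without one
        # are grouped under "unknown".
        ext = os.path.splitext(urlparse(url).path)[1].lstrip(".")
        ext_counts[ext or "unknown"] += 1
    return {
        "http_requests": len(requests),
        "status_counts": dict(status_counts),
        "extension_counts": dict(ext_counts),
        "slowest_ms": max((ms for *_, ms in requests), default=0),
    }
```

Calling `summarize(SAMPLE_REQUESTS)` returns a dictionary that could be printed in a layout similar to the sample output further down.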

Get it here:
http://selenium-profiler.googlecode.com/

Sample Output:

--------------------------------
results for http://www.google.com/

content size: 38.06 kb

http requests: 9
status 200: 6
status 204: 1
status 403: 2

profiler timing:
2.063 secs (page load)
2.047 secs (network: end last request)
0.111 secs (network: end first request)

file extensions: (count, size)
gif: 1, 8.558 kb
ico: 2, 2.488 kb
js: 2, 18.046 kb
png: 1, 5.401 kb
unknown: 3, 3.567 kb

http timing detail:
403, GET, /favicon.ico, 111 ms
200, GET, /newkey, 738 ms
200, GET, /, 332 ms
403, GET, /favicon.ico, 3 ms
200, GET, /logo.gif, 255 ms
200, GET, /d9n_Nh4I09g.js, 411 ms
200, GET, /2cca7b2e99206b9c.js, 65 ms
204, GET, /generate_204, 88 ms
200, GET, /nav_logo7.png, 49 ms

28 comments:

Milan said...

Why would you use the http profiler with Selenium-RC?

You can get the same data from Fiddler or Firebug Net Panel?

Corey Goldberg said...

Milan,
the point is that it is an *automated* profiler. It runs the browser for you and can step through complex web applications without manually browsing.

Fiddler and Firebug can't drive a browser.

I use something like this in a setup where it launches itself every 30 seconds and records timings.. it works as a real user monitor.

Milan said...

Corey,
Makes sense. Sorry I did not pay attention to the Automated Part.

Because automated testing through a browser is fragile, I would rather not use this option; I would prefer a single-user load testing tool for that. I am sure there are tools that also give you a breakdown of the timings; I know for sure that commercial tools can. And of course you can schedule it at regular intervals.

Also, the stats for 200s, 304s, etc. won't change unless your application is deployed again. The only thing that would change from your previous runs is the response time.

Corey Goldberg said...

well.. by "load testing tools", I suppose you mean sending HTTP requests directly, rather than going through the browser.

Without going through the browser, you are missing the "real interaction" (I've been using load testing tools for over a decade and love them, so this isn't a knock against that). to get a "page load time", you need to actually render a page, not just request individual objects over HTTP.

Most load test tools also use a proxy based recorder, so think of this as running a recording session in your load tool.

as to your other points, you are correct.. all that would change in subsequent runs is the timing.

I haven't seen any other tool that automates a real browser and also captures the type of timing data that this does.

CharekChen said...

Nice work. Any plans to write a Java version? This is something I have been looking for: Selenium instrumented with page analysis/metrics during a functional test. Now if it could create a Gantt chart like Firebug's net monitor :-)

Corey Goldberg said...

CharekChen,

that is exactly what I was going for. an instrumented page load that gives you metrics.

I'm looking at doing a waterfall chart if I have time, or some sort of useful graph.

Unfortunately, the Selenium Core API doesn't have very granular instrumentation... so the stats in my report are about all you can get out of it. I might try to help the selenium folks extend that at some point if anyone is interested. I want more detail like this gives you: http://coreygoldberg.blogspot.com/2009/09/python-http-request-profiler.html

maybe this really belongs in selenium core so all the RC API's can benefit.

I wonder if anyone else would be interested.

Marc said...

Corey,

This tool came at the right time for me. I spent the last hour playing with it and it looks like something I can use. Some suggestions to make it more useful:

1) Pass in arguments to specify
SITE
PATH
BROWSER

2) right after startup, increase the server timeout (some of my test servers are VERY slow)

sel.start()
# increase timeout to 5 minutes
sel.set_timeout(300000)
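
Both suggestions could be combined in something like the following Python 3 sketch. The flag names, the defaults, and the `parse_args` helper are hypothetical and chosen only for illustration; they are not the options the profiler actually ships with. The timeout is taken in seconds and converted to the milliseconds that the `set_timeout` call above is given.

```python
import argparse


def parse_args(argv=None):
    """Parse SITE, PATH, BROWSER and a timeout from the command line."""
    # Flag names and default values are assumptions for this example.
    parser = argparse.ArgumentParser(description="profile a page load")
    parser.add_argument("--site", default="http://www.google.com/")
    parser.add_argument("--path", default="/")
    parser.add_argument("--browser", default="*firefox")
    parser.add_argument("--timeout-secs", type=int, default=300)
    args = parser.parse_args(argv)
    # The snippet above passes milliseconds (300000 for 5 minutes),
    # so convert the value once here.
    args.timeout_ms = args.timeout_secs * 1000
    return args
```

The resulting `args.timeout_ms` would then be passed to `sel.set_timeout(...)` right after `sel.start()`.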

Corey Goldberg said...

Marc,

thanks. those features sound easy enough to add. I'll update it soon.



j pimmel said...

I have just been playing with it and it's neat. This also has come at a useful time for me..

It would be really cool if you could somehow play selenium scripts from web_profiler.py - I am going ahead and coding my selenium commands directly in there, but it would be neat if I could point to a directory and tell it to run a suite etc

Corey Goldberg said...

that's a great idea to be able to feed it a directory of selenium scripts. I think I might hack together a new version with some other features I'm thinking of also.. I can add all that and probably put on a command line interface.

in the case of a directory of scripts, how should the output look? ... a directory of text reports? html? one huge report? a summary report with detail section? ... just wondering what is the most useful way to present results.

if you think of anything else, let me know.

-Corey

Corey Goldberg said...

I just checked in a version that accepts command line args for SITE/PATH/BROWSER. I also bumped up the timeout to 60 secs.

-Corey

Unknown said...

Please show the same steps in Java instead of Python (Selenium).

Corey Goldberg said...

jaya,
sorry, I just wrote a Python version. Porting it to Java is an exercise for you :)

Marc said...

Corey,

I've been playing with this a bit more to get statistics on page loading. It works on IE8, FF3.0, and Safari 3 (under Windows XP).
Is there a way to programmatically force the web browser cache to be cleared before each run, to guarantee that I get ALL data from the web server each time? I want to avoid manually clearing the cache.

Also, how does your tool compare to using JMeter to measure page loading statistics?

Thanks,
Marc

Corey Goldberg said...

Marc,
I think on FF, a new profile is created each time, so you start with an empty cache. Not sure if the same is true for IE or other browsers. I'll look for a way to programmatically start a clean session.

Benjamin said...

It is a very nice tool you developed! I played around with it and love it. Thanks. I came across your script while searching for usage of captureNetworkTraffic(). I want to capture the responses from Ajax calls, but no luck yet. I am new to Selenium RC. I would like to parameterize my Python functional test scripts for test automation, so I need to parse the data returned by subsequent Ajax calls from the page. captureNetworkTraffic() should return all responses from the page, but it seems it does not; at least it does not capture Ajax responses (maybe they happen at a different level or time?). Or I may be missing something. I even manually added "print traffic_xml" (the variable you defined in your web_profiler.py), but I still did not see the Ajax data I was looking for. Do you have any idea why that is?

Corey Goldberg said...

benjamin,
captureNetworkTraffic() should return all requests/responses that are sent/received. This includes requests initiated by Ajax or otherwise.
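
One way to check whether a particular Ajax request made it into the captured traffic is to filter the returned XML by URL. The Python 3 sketch below assumes the traffic comes back as `<entry>` elements carrying `statusCode`, `method`, `url`, and `timeInMillis` attributes; that layout, the sample data, and the `find_entries` helper are assumptions for illustration, and the real output may differ between Selenium server versions.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; real attribute names may differ by server version.
SAMPLE_TRAFFIC_XML = """<traffic>
  <entry statusCode="200" method="GET" url="http://example.com/" timeInMillis="332"/>
  <entry statusCode="200" method="POST" url="http://example.com/ajax/search" timeInMillis="88"/>
</traffic>"""


def find_entries(traffic_xml, url_fragment):
    """Return the attributes of every entry whose URL contains url_fragment."""
    root = ET.fromstring(traffic_xml)
    return [
        dict(entry.attrib)
        for entry in root.iter("entry")
        if url_fragment in entry.get("url", "")
    ]
```

For example, `find_entries(SAMPLE_TRAFFIC_XML, "/ajax/")` picks out only the second entry. If the expected request never shows up, it may have been sent after the traffic was captured.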

Blake said...

repost with more details:

Corey,

Just found your blog and this script - thanks.

Keep getting the 'ERROR - did not receive traffic stats. try manually setting your browser proxy'

Details: FFX 3.5.9, PY 2.6, WinXP, latest selenium rc (1.0.3).

Tried various connection settings in FFX but no luck. Any thoughts appreciated.

What actually happens: script runs, FFX launches briefly, goes to google, then closes right away. See error message noted above in console.

Enjoying the blog since I'm new to selenium and python.

Regards,

Blake

Blake said...

Ignore previous. I thought I had swapped out the selenium.py files, but had not. Got the right one in there and it worked fine.

Blake

Leonid Barinshteyn said...

Hi guys,

I am using selenium.captureNetworkTraffic() but for some reason it returns an empty string... I am getting a set of empty tags if I'm using XML, or [] if I'm using JSON.
Any advice would be extremely appreciated.

Leonid Barinshteyn said...

I just realized that it did not work because I was using IE... works fine with FF.

Unknown said...

Very frustrated: this plugin is *great* but I keep getting ConcurrentModificationExceptions during the captureNetworkTraffic.

I've tried 2.0a5 but get empty tags back.

I can't find the 1.0.3 source to apply the patch here to. Their source repository is a mess if you're not looking for 2.0.

Anyone have any idea?

Unknown said...

Using 2.0a5 I managed to get it working by massaging the XML returned from the selenium-server - it was malformed such that

method=""GET""

was returned. A little replacement work and joy!

This also points out how to change the timeout for connections that need to be longer than 30s.
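
The quoting fix described in this comment could look something like the Python 3 sketch below. It only repairs the doubled quotes around the `method` attribute, since that is the one case reported here; whether other attributes are affected is not known. The sample string and the `fix_doubled_quotes` helper are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical malformed response with the doubled quotes described above.
BROKEN_TRAFFIC_XML = (
    '<traffic><entry statusCode="200" method=""GET"" '
    'url="http://example.com/"/></traffic>'
)


def fix_doubled_quotes(traffic_xml):
    """Rewrite method=""GET"" as method="GET" so the XML parses."""
    # Deliberately narrow: touch only the method attribute so that
    # legitimately empty attributes (attr="") are left alone.
    return re.sub(r'method=""([A-Za-z]+)""', r'method="\1"', traffic_xml)
```

After the replacement, `ET.fromstring(fix_doubled_quotes(BROKEN_TRAFFIC_XML))` parses normally.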

RS said...

Corey, I have been looking for this functionality for a while now (and needed it so much so that I'm investing the time to learn Python...:)).
I'm running into an issue that's been previously posted:
ERROR - did not receive traffic stats. try manually setting your browser proxy

What would cause this problem, and how do I solve it? Thanks!

Corey Goldberg said...

@RS

> ERROR - did not receive traffic
> stats. try manually setting your
> browser proxy
> What would cause this problem,
> and how do I solve it? Thanks!

hmm.. that means the profiler is not getting any stats returned from the selenium bindings/API.

back when I originally wrote this, the python binding didn't support this, so I hacked the start() method by adding 'captureNetworkTraffic=true'. see the selenium.py in my repo, which is the old bindings containing that hack: http://code.google.com/p/selenium-profiler/source/browse/trunk/selenium.py

If you are using newer bindings, then you have to figure out how to enable captureNetworkTraffic in the API. then the profiler should pick it up.

(It's been a while since i've touched this stuff.)

hope that helps.

-Corey

RS said...

Thanks so much Corey. Using your selenium.py version did the trick. I appreciate the prompt response

Imran Khan said...

This is great stuff. Thanks for posting.