(All scripts and code referenced in this post can be found here: http://code.google.com/p/rrdpy/source/browse/trunk)
"RRDtool is the Open Source industry standard, high performance data logging and graphing system for time series data. It stores the data in a very compact way that will not expand over time, and it can create beautiful graphs."
If you are developing tools that need a data repository and graphing capabilities, RRDtool provides both. You create an RRD and then insert data values at regular intervals. You then call the graphing API to render a graph. The cool thing about this data storage is its “round robin” nature. You define various time spans and the granularity at which each one is stored. A fixed-size binary file is created, and it never grows over time. Each new value is fed into every span; as results are collected, they are averaged and rolled into the coarser spans. This is much more efficient than building your own complex data structures or using a relational database or file system storage.
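To make the "round robin" idea concrete, here is a toy sketch in plain Python. It is not how RRDtool is implemented internally; the class name RoundRobinArchive and its steps/rows parameters are invented for this illustration. It only shows the two properties described above: storage stays a fixed size, and coarser spans hold averages of the raw values.

```python
from collections import deque


class RoundRobinArchive(object):
    """Toy fixed-size archive: averages every `steps` raw values into one row.

    Hypothetical illustration only, not RRDtool's real on-disk format.
    """

    def __init__(self, steps, rows):
        self.steps = steps              # raw samples per consolidated row
        self.rows = deque(maxlen=rows)  # oldest row is dropped once full
        self._pending = []              # raw samples awaiting consolidation

    def update(self, value):
        self._pending.append(float(value))
        if len(self._pending) == self.steps:
            # Consolidate: store the average and start a new window.
            self.rows.append(sum(self._pending) / self.steps)
            self._pending = []


# Two assumed spans: every sample for the last 6 points,
# and 3-sample averages for the last 4 points.
fine = RoundRobinArchive(steps=1, rows=6)
coarse = RoundRobinArchive(steps=3, rows=4)

for sample in range(1, 13):  # feed the values 1..12
    fine.update(sample)
    coarse.update(sample)

print(list(fine.rows))    # [7.0, 8.0, 9.0, 10.0, 11.0, 12.0]
print(list(coarse.rows))  # [2.0, 5.0, 8.0, 11.0]
```

No matter how many more samples are fed in, neither archive ever holds more rows than it was created with, which is why the file on disk never grows.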
Recently, I started a small project on Google Code (rrdpy) to create a set of Python tools that make dealing with Round Robin Databases (RRDs) less painful. Setting up RRDs can be tough if you don't know what you are doing.
Below are some example ways to use these tools.
First, I will create a Round Robin Database (RRD) using the rrd_maker.py script, which uses my rrd.py class.
This script creates an RRD named test.rrd, which expects to be updated every 10 seconds:
import rrd
interval = 10
rrd_file = 'test.rrd'
my_rrd = rrd.RRD(rrd_file, vertical_label='value')
my_rrd.create_rrd(interval)
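For readers who want to see roughly what a create call like this corresponds to on the rrdtool command line, here is a hedged sketch that only builds the argument list. The helper name build_create_args, the data source name 'value', the heartbeat of twice the interval, and the single one-hour AVERAGE archive are all assumptions made for this example; the actual rrd.py class may choose different values.

```python
def build_create_args(rrd_file, interval, ds_name='value', rows=360):
    """Return an `rrdtool create` command line as a list of strings.

    Hypothetical helper: the defaults are illustrative, not taken from rrd.py.
    """
    heartbeat = interval * 2  # assumed: tolerate one missed update
    return [
        'rrdtool', 'create', rrd_file,
        '--step', str(interval),                      # expected update interval
        'DS:%s:GAUGE:%d:U:U' % (ds_name, heartbeat),  # one gauge, no min/max
        'RRA:AVERAGE:0.5:1:%d' % rows,                # keep `rows` raw points
    ]


args = build_create_args('test.rrd', 10)
print(' '.join(args))
# rrdtool create test.rrd --step 10 DS:value:GAUGE:20:U:U RRA:AVERAGE:0.5:1:360
```

With a 10 second step, 360 rows cover exactly one hour. The list could be handed to subprocess.call on a machine with rrdtool installed.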
Now that I have my RRD created, what can I do with it?
For a quick example, I can use the rrd_feeder_rand.py script to feed in random numbers and generate a graph every 10 seconds. The generated graph shows the past 60 minutes of data. The code for that looks like:
import rrd
import random
import time
interval = 10
rrd_file = 'test.rrd'
my_rrd = rrd.RRD(rrd_file)
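while True:
    rand = random.randint(1, 100)  # random sample between 1 and 100
    my_rrd.update(rand)            # store the sample in the RRD
    my_rrd.graph(60)               # redraw the graph of the past 60 minutes
    time.sleep(interval)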
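As a rough guide to what the update and graph steps might look like as raw rrdtool commands, here is another sketch that only builds argument lists. The helper names, the output file name test.png, the data source name 'value', and the line color are assumptions for illustration; they are not taken from rrd.py.

```python
def build_update_args(rrd_file, value):
    """Return an `rrdtool update` command line (hypothetical helper)."""
    # 'N' asks rrdtool to timestamp the sample with the current time.
    return ['rrdtool', 'update', rrd_file, 'N:%s' % value]


def build_graph_args(rrd_file, png_file, minutes, ds_name='value'):
    """Return an `rrdtool graph` command line (hypothetical helper)."""
    return [
        'rrdtool', 'graph', png_file,
        '--start', '-%d' % (minutes * 60),            # seconds before now
        'DEF:v=%s:%s:AVERAGE' % (rrd_file, ds_name),  # read the averaged data
        'LINE1:v#0000FF:%s' % ds_name,                # draw it as a blue line
    ]


update_args = build_update_args('test.rrd', 42)
graph_args = build_graph_args('test.rrd', 'test.png', 60)
print(' '.join(update_args))  # rrdtool update test.rrd N:42
print(' '.join(graph_args))
```

A graph window of 60 minutes becomes a relative start time of -3600 seconds.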
OK, pretty boring. How about an HTTP website monitor?
I created the rrd_feeder_http.py script to show how this is done. This script sends HTTP GET requests to a specified URL. A request is sent every 10 seconds, and a graph of response times for the past hour is generated.
import rrd
import time
import httplib
host = 'www.python.org'
path = '/'
use_ssl = False
interval = 10
rrd_file = 'test.rrd'

def main():
    my_rrd = rrd.RRD(rrd_file, 'Response Time')
    while True:
        start_time = time.time()
        if send(host):
            end_time = time.time()
            raw_latency = end_time - start_time
            # sleep only for what is left of the interval
            expire_time = (interval - raw_latency)
            latency = ('%.3f' % raw_latency)
            my_rrd.update(latency)
            my_rrd.graph(60)
            print latency
        else:
            expire_time = interval
        if expire_time > 0:
            time.sleep(expire_time)


def send(host):
    """Send one GET request; return True on success, False on failure."""
    if use_ssl:
        conn = httplib.HTTPSConnection(host)
    else:
        conn = httplib.HTTPConnection(host)
    try:
        conn.request('GET', path)
        # read the full body so the timing covers the whole response
        body = conn.getresponse().read()
        return True
    except Exception:
        print 'Failed request'
        return False
    finally:
        conn.close()


if __name__ == '__main__':
    main()
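The pacing logic in the monitor (time the request, then sleep for whatever is left of the interval) can be pulled out into two small helpers. This is only a sketch; the names timed_call and remaining_sleep are made up for this example and are not part of rrdpy. It is written so that it runs unchanged on Python 2 and Python 3.

```python
import time


def timed_call(func, clock=time.time):
    """Call func() and return (result, elapsed_seconds).

    `clock` is injectable so the timing can be checked without real delays.
    """
    start = clock()
    result = func()
    return result, clock() - start


def remaining_sleep(interval, elapsed):
    """Seconds left in the interval after `elapsed`; never negative."""
    return max(0.0, interval - elapsed)


# Example with a fake clock: the "request" appears to take 0.25 seconds.
ticks = iter([100.0, 100.25])
ok, elapsed = timed_call(lambda: True, clock=lambda: next(ticks))
print(ok, elapsed, remaining_sleep(10, elapsed))  # True 0.25 9.75
```

One design difference worth noting: the script above sleeps for the full interval after a failed request, while a helper like remaining_sleep would subtract the time the failed attempt took, keeping requests closer to a fixed 10 second cadence.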
The output from running the HTTP website monitor script looks like this: