May 30, 2008

Python - rrdpy - Round Robin Databases (RRDTool)

(all scripts and code referenced in this post can be found here: http://code.google.com/p/rrdpy/source/browse/trunk)
"RRDtool is the Open Source industry standard, high performance data logging and graphing system for time series data. It stores the data in a very compact way that will not expand over time, and it can create beautiful graphs."
If you are developing tools that need a data repository and graphing capabilities, RRDtool provides both. You create an RRD, insert data values at regular intervals, and then call the graphing API to render a graph. The nice thing about this storage is its “round robin” nature. You define several time spans and the granularity at which each one is stored. RRDtool creates a fixed-size binary file that never grows over time. Each value you insert is fed into every span. As results are collected, they are averaged and rolled up into the coarser spans, and the oldest entries in each span are overwritten. This is much more efficient than building your own complex data structures or storing the data in a relational database or flat files.
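To make the round-robin idea concrete, here is a minimal pure-Python sketch of two fixed-size archives fed by the same samples. It is not how RRDtool is implemented internally. One archive keeps every sample, and the other stores the average of every three. The class names `RoundRobinArchive` and `TinyRRD` are hypothetical, and the sketch ignores timestamps, heartbeats, and RRDtool's xff (unknown-data) rules.

```python
from collections import deque
from typing import Deque, Iterable, List


class RoundRobinArchive:
    """A fixed number of rows; once full, each new row overwrites the oldest."""

    def __init__(self, steps_per_row: int, rows: int) -> None:
        self.steps_per_row = steps_per_row        # samples averaged into one row
        self.rows: Deque[float] = deque(maxlen=rows)
        self._pending: List[float] = []           # samples not yet consolidated

    def add_sample(self, value: float) -> None:
        self._pending.append(value)
        if len(self._pending) == self.steps_per_row:
            # consolidate with AVERAGE (RRDtool also offers MIN, MAX and LAST)
            self.rows.append(sum(self._pending) / len(self._pending))
            self._pending.clear()


class TinyRRD:
    """Feeds every sample into every archive, like an RRD with several RRAs."""

    def __init__(self, archives: Iterable[RoundRobinArchive]) -> None:
        self.archives = list(archives)

    def update(self, value: float) -> None:
        for archive in self.archives:
            archive.add_sample(value)


if __name__ == "__main__":
    fine = RoundRobinArchive(steps_per_row=1, rows=6)
    coarse = RoundRobinArchive(steps_per_row=3, rows=4)
    db = TinyRRD([fine, coarse])
    for v in range(1, 16):
        db.update(v)
    print(list(fine.rows))    # [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
    print(list(coarse.rows))  # [5.0, 8.0, 11.0, 14.0]
```

After 15 updates, each archive still holds only its fixed number of rows. That fixed row count is why an RRD file does not grow.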

Recently, I started a small project on Google Code (rrdpy) to create a set of Python tools that make working with Round Robin Databases (RRDs) less painful. Setting up RRDs can be tough if you don't know what you are doing.

Below are some example ways to use these tools.


First, I will create a Round Robin Database (RRD) using the rrd_maker.py script, which uses the RRD class from my rrd.py module. This script creates an RRD named test.rrd that expects to be updated every 10 seconds:
import rrd

interval = 10  # seconds between expected updates
rrd_file = 'test.rrd'

my_rrd = rrd.RRD(rrd_file, vertical_label='value')
my_rrd.create_rrd(interval)
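The post doesn't show what create_rrd() does internally. One of the comments below notes that the wrapper calls the rrdtool binary through popen, so it presumably builds an `rrdtool create` command. Here is a Python 3 sketch of such a command. The helper name `build_create_command`, the data source name `ds`, the GAUGE type, the heartbeat of two intervals, and the archive layout are all assumptions, not rrd.py's actual choices. The `--step`, `DS:` and `RRA:` syntax follows rrdtool's documented create format.

```python
import shlex
from typing import List, Sequence, Tuple


def build_create_command(rrd_file: str, interval: int,
                         archives: Sequence[Tuple[int, int]],
                         ds_name: str = "ds") -> List[str]:
    """Return an `rrdtool create` argument list (hypothetical helper).

    archives is a list of (steps_per_row, rows) pairs, one AVERAGE RRA each.
    """
    heartbeat = interval * 2  # data becomes unknown after 2 intervals without an update
    cmd = ["rrdtool", "create", rrd_file, "--step", str(interval),
           f"DS:{ds_name}:GAUGE:{heartbeat}:U:U"]   # U:U = no min/max limits
    for steps, rows in archives:
        # xff 0.5: a row stays known as long as at most half its samples are unknown
        cmd.append(f"RRA:AVERAGE:0.5:{steps}:{rows}")
    return cmd


if __name__ == "__main__":
    # 1 hour at 10 s resolution (360 x 10 s), 1 day at 1 min resolution (1440 x 60 s)
    cmd = build_create_command("test.rrd", 10, [(1, 360), (6, 1440)])
    print(" ".join(shlex.quote(part) for part in cmd))
    # with rrdtool installed, this list could be run via subprocess.run(cmd, check=True)
```

Returning an argument list instead of a shell string avoids quoting problems when the list is later passed to `subprocess`.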
Now that I have my RRD created, what can I do with it? As a quick example, I can use the rrd_feeder_rand.py script to feed in random numbers and generate a graph every 10 seconds. The generated graph shows the past 60 minutes of data. The code looks like this:
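import rrd
import random
import time

interval = 10
rrd_file = 'test.rrd'

my_rrd = rrd.RRD(rrd_file)

while True:
    rand = random.randint(1, 100)  # random value between 1 and 100
    my_rrd.update(rand)            # store the value in the RRD
    my_rrd.graph(60)               # redraw the graph of the last 60 minutes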
    time.sleep(interval)
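As with create_rrd(), the post doesn't show how update() and graph() talk to rrdtool. The Python 3 sketch below shows equivalent `rrdtool update` and `rrdtool graph` commands, including several time spans such as a day and a week. The helper names, the `SPANS` table, the data source name `ds`, and the line color are hypothetical. A longer span only shows useful data if the RRD has an archive covering it.

```python
from typing import Dict, List, Union

# hypothetical span names -> seconds
SPANS: Dict[str, int] = {
    "hour": 3600,
    "day": 86400,
    "week": 7 * 86400,
}


def build_update_command(rrd_file: str, value: Union[int, float, str]) -> List[str]:
    # "N" tells rrdtool to timestamp the value with the current time
    return ["rrdtool", "update", rrd_file, f"N:{value}"]


def build_graph_command(rrd_file: str, png_file: str, span_seconds: int,
                        ds_name: str = "ds", label: str = "value") -> List[str]:
    return [
        "rrdtool", "graph", png_file,
        "--start", f"-{span_seconds}",          # span_seconds before now
        "--vertical-label", label,
        f"DEF:v={rrd_file}:{ds_name}:AVERAGE",  # load the averaged data source as 'v'
        f"LINE1:v#0000FF:{label}",              # draw 'v' as a 1-pixel blue line
    ]


if __name__ == "__main__":
    print(" ".join(build_update_command("test.rrd", 42)))
    for name, seconds in SPANS.items():
        print(" ".join(build_graph_command("test.rrd", f"test_{name}.png", seconds)))
```

Each span produces its own PNG. Generating several graphs from one RRD is a matter of running one graph command per span.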
OK, pretty boring. How about an HTTP website monitor? I created the rrd_feeder_http.py script to show how this is done. It sends an HTTP GET request to a specified URL every 10 seconds and generates a graph of the response times for the past hour.
import rrd
import socket
import time
import httplib


host = 'www.python.org'
path = '/'
use_ssl = False

interval = 10
rrd_file = 'test.rrd'

# give up on any request that takes longer than one polling interval
socket.setdefaulttimeout(interval)


def main():
    my_rrd = rrd.RRD(rrd_file, 'Response Time')
    while True:
        start_time = time.time()
        if send(host, path):
            raw_latency = time.time() - start_time
            # sleep only for what is left of this interval
            expire_time = interval - raw_latency
            latency = '%.3f' % raw_latency
            my_rrd.update(latency)
            my_rrd.graph(60)
            print latency
        else:
            expire_time = interval
        if expire_time > 0:
            time.sleep(expire_time)


def send(host, path):
    """Send a GET request; return True if a full response was read."""
    if use_ssl:
        conn = httplib.HTTPSConnection(host)
    else:
        conn = httplib.HTTPConnection(host)
    try:
        conn.request('GET', path)
        conn.getresponse().read()
        return True
    except (httplib.HTTPException, socket.error):
        print 'Failed request'
        return False
    finally:
        conn.close()


if __name__ == '__main__':
    main()
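In this loop, expire_time is computed before my_rrd.update() and my_rrd.graph() run. The time spent updating and graphing therefore gets added to every cycle, and the samples slowly drift off the 10-second grid. One way to avoid that is to schedule against absolute deadlines on a monotonic clock. Here is a Python 3 sketch. The function name `run_every` is hypothetical, and the `clock` and `sleep` parameters exist only so the scheduler can be tested with a fake clock.

```python
import time
from typing import Callable, Optional


def run_every(interval: float,
              task: Callable[[], None],
              iterations: Optional[int] = None,
              clock: Callable[[], float] = time.monotonic,
              sleep: Callable[[float], None] = time.sleep) -> int:
    """Call task() once per interval, measured from a fixed start time.

    Runs forever when iterations is None; returns the number of calls made.
    """
    next_deadline = clock()
    count = 0
    while iterations is None or count < iterations:
        task()
        count += 1
        next_deadline += interval
        delay = next_deadline - clock()
        if delay > 0:
            sleep(delay)             # wait out the rest of this interval
        else:
            next_deadline = clock()  # fell behind: run again now and restart the grid
    return count
```

In the monitor, the body of the `while True` loop (send, update, graph) would become `task`. Because deadlines advance by exactly `interval`, a slow request or graph shortens the following sleep instead of pushing every later sample back.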
The output from running the HTTP website monitor script looks like this:

4 comments:

Unknown said...

RRDTOOL is one of my favorite tools. I use it to log process memory usage to help identify potential memory leaks on our applications. Having the history available to see when memory jumps also helps figure out what steps may be triggering memory leaks.

Anonymous said...

This is a nice convenience wrapper but why not wrap the much faster http://www.nongnu.org/py-rrdtool/ than popen the rrdtool binary? It would be great to see some updates to py-rrdtool to handle the "Aberrant Behavior Detection" additions or even a ctypes implementation.

Skyler Call said...

Thank you for sharing this code. I have made some modifications for my own project making it only check my website every two minutes and altered the graph to show the last 24 hours. How can I get it to create four graphs instead of just one? I want a graph for 24 hours, 1 week, 1 month, and 1 year.

Unknown said...

Hi Sir,

Very nice and helpful article.
Do you have any documentation for it? I want to create custom graphs, e.g. bandwidth usage, so I need some documentation to understand how to use this rrd module.
I made some changes for my requirements and it works, but I can't understand how it works and I want to customize it more, so please point me to some documentation.

import commands
import re
import time
import rrd

interval = 10

rrd_file = 'test.rrd'

def transfer_bytes():
    # parses the old net-tools ifconfig line "RX bytes:... TX bytes:..."
    interface = commands.getstatusoutput("ifconfig eth0 | grep bytes")[1]
    rx = re.split(':| ', interface)[12]
    tx = re.split(':| ', interface)[18]
    # total kilobytes received so far (use 1024.0 to avoid integer division)
    return '%.3f' % (int(rx) / 1024.0)


def main():
    my_rrd = rrd.RRD(rrd_file, 'Transfer Speed in Kbps')
    while True:
        raw_data = transfer_bytes()
        time.sleep(interval)
        my_rrd.update(raw_data)
        my_rrd.graph(60)
        print raw_data

if __name__ == '__main__':
    main()
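The value stored by the code above is the total number of kilobytes received on the interface, not a transfer speed, so the graph would climb steadily rather than show Kbps. A rate needs two readings: the change in the byte counter divided by the elapsed time. Below is a Python 3 sketch that assumes Linux and reads the counters from /proc/net/dev instead of parsing ifconfig output. The helper names are hypothetical. RRDtool's COUNTER and DERIVE data source types can also compute rates on their own, but the post doesn't show whether rrd.py exposes them.

```python
from typing import Dict, Optional, Tuple


def parse_proc_net_dev(text: str) -> Dict[str, Tuple[int, int]]:
    """Map interface name -> (rx_bytes, tx_bytes) from /proc/net/dev text."""
    counters: Dict[str, Tuple[int, int]] = {}
    for line in text.splitlines()[2:]:        # skip the two header lines
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        # field 0 is received bytes, field 8 is transmitted bytes
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def kbits_per_second(prev_bytes: int, curr_bytes: int,
                     elapsed: float) -> Optional[float]:
    """Kilobits per second between two counter readings, or None on a reset."""
    if elapsed <= 0:
        raise ValueError("elapsed must be positive")
    delta = curr_bytes - prev_bytes
    if delta < 0:          # counter wrapped or the interface was reset
        return None
    return delta * 8 / 1000 / elapsed


def read_counters(path: str = "/proc/net/dev") -> Dict[str, Tuple[int, int]]:
    with open(path) as f:
        return parse_proc_net_dev(f.read())
```

In a loop like the one above, one would read `read_counters()["eth0"][0]` before and after `time.sleep(interval)`, pass both readings and the measured elapsed time to `kbits_per_second`, and store the result with `my_rrd.update()`. Skipping `None` results avoids graphing a bogus spike after a counter reset.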