September 30, 2008

Python - StackOverflow Fight

I've been having fun using the new StackOverflow site for answering technical questions.

Here is a Python script I call 'StackOverflow Fight' (like Google Fight). It takes a list of tags and gets the count of questions that are tagged with each one.

As an example, here is a StackOverflow Fight between some popular programming languages:

#!/usr/bin/env python

import urllib
import re

langs = ('python', 'ruby', 'perl', 'java')

url_stem = 'http://stackoverflow.com/questions/tagged/'

counts = {}
for lang in langs:
    # Fetch the tag page and grab the text that follows 'summarycount'
    resp = urllib.urlopen(url_stem + lang).read()
    m = re.search('summarycount.*>(.*)<', resp)
    # Strip thousands separators before converting, e.g. '1,440' -> 1440
    count = int(m.group(1).replace(',', ''))
    counts[lang] = count
    print lang, ':', count
# Rank by count, highest first
sorted_counts = sorted(counts.items(), key=lambda(k,v):(v,k), reverse=True)
print sorted_counts[0][0], 'wins with', sorted_counts[0][1]

Output:

python : 733
ruby : 391
perl : 167
java : 1440
java wins with 1440
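
For anyone on Python 3, the parsing and ranking steps might be sketched roughly as below. This is only a sketch under assumptions: the HTML fragments in `SAMPLE_PAGES` are made up to stand in for fetched tag pages (the real markup may well differ), `parse_count` and `fight` are hypothetical helper names, and the regex is a slightly tighter variant of the one in the script above. The counts for python and java are the ones from the output above.

```python
import re

# Hypothetical HTML fragments standing in for fetched tag pages.
SAMPLE_PAGES = {
    'python': '<div class="summarycount">733</div>',
    'java': '<div class="summarycount">1,440</div>',
    'missing-tag': '<div>no count here</div>',
}

def parse_count(html):
    """Return the question count found in html, or None if nothing matches."""
    m = re.search(r'summarycount[^>]*>([\d,]+)<', html)
    if m is None:
        return None
    # Drop thousands separators before converting to int.
    return int(m.group(1).replace(',', ''))

def fight(pages):
    """Rank tags by count, highest first, skipping pages with no count."""
    counts = {}
    for tag, html in pages.items():
        count = parse_count(html)
        if count is not None:
            counts[tag] = count
    return sorted(counts.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)

ranking = fight(SAMPLE_PAGES)
for tag, count in ranking:
    print(tag, ':', count)
print(ranking[0][0], 'wins with', ranking[0][1])
```

Keeping the parsing separate from the fetching makes it easy to try the logic on saved pages without hitting the site.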

8 comments:

  1. Add C# to the mix. Guaranteed it crushes Java.

  2. I was correct -- C# hauls in 2209 questions.

  3. Does this officially mean that C# is the most difficult language on StackOverflow? ;-)

  4. Yes, it is absolutely true. C# wins. Change the 4th line to

    langs = ('python', 'ruby', 'perl', 'java','c%23')

  5. And PHP ;) It's close to Java.

    shon@ubuntu:~/python$ date
    Tue Dec 8 18:28:56 IST 2009

    shon@ubuntu:~/python$ python stack_fight.py
    python : 14226
    ruby : 5932
    perl : 3147
    java : 27568
    c%23 : 51962
    php : 21305
    c%23 wins with 51962

  6. Cool post. I thought it would be fun to see how things have changed over the last 3 1/2 years since the last post on this. I also added other languages for fun:

    Java has 255997 hits
    PHP has 235259 hits
    Javascript has 220234 hits
    C# has 220234 hits
    C++ has 130371 hits
    .NET has 119156 hits
    Python has 112801 hits
    Objective-C has 87613 hits
    SQL has 83174 hits
    C has 61612 hits
    Ruby has 47578 hits
    Perl has 18683 hits
    delphi has 15376 hits
    Groovy has 4237 hits
    Lisp has 1937 hits
    Go has 891 hits
    Pascal has 480 hits
    Ada has 324 hits
    Basic has 73 hits
    Logo has 23 hits
    NXT-G has 1 hits

    Java wins with 255997

    So Java is still on top, and StackOverflow is Overflowing with questions and answers!

    I also modified your code a bit to a) print the languages after sorting the list and
    b) add a check for the None result I got a couple of times when playing around with your fun script.

    Here is the updated code:

    import urllib
    import re

    langs = ('Python', 'Ruby', 'Perl', 'Java', 'C++', 'PHP', 'C', 'Go',
             'Javascript', 'C#', 'Groovy', 'Objective-C', 'Basic', 'SQL',
             'delphi', '.NET', 'Lisp', 'Pascal', 'Ada',
             'Logo', 'NXT-G', 'Visual Basic')

    url_stem = 'http://stackoverflow.com/questions/tagged/'

    counts = {}
    for lang in langs:
        resp = urllib.urlopen(url_stem + lang).read()
        m = re.search('summarycount.*>(.*)<', resp)
        if m is None:
            # No count found on the page: record 0 instead of reusing
            # the previous language's count
            counts[lang] = 0
        else:
            count = int(m.group(1).replace(',', ''))
            counts[lang] = count
            #print lang, ':', count
    sorted_counts = sorted(counts.items(), key=lambda(k,v):(v,k))
    sorted_counts.reverse()

    for name, hcount in sorted_counts:
        print name, "has", hcount, "hits"
    print ''
    print sorted_counts[0][0], 'wins with', sorted_counts[0][1]

  7. Great idea Corey, let's see where things stand in June 2012:

    Java has 255997 hits
    PHP has 235259 hits
    Javascript has 220234 hits
    C# has 220234 hits
    C++ has 130371 hits
    .NET has 119156 hits
    Python has 112801 hits
    Objective-C has 87613 hits
    SQL has 83174 hits
    C has 61612 hits
    Ruby has 47578 hits
    Perl has 18683 hits
    delphi has 15376 hits
    Groovy has 4237 hits
    Lisp has 1937 hits
    Go has 891 hits
    Pascal has 480 hits
    Ada has 324 hits
    Basic has 73 hits
    Logo has 23 hits
    NXT-G has 1 hits

    Java wins with 255997

    So Java is still 'on top', and wow, what a lot of questions and answers in 2 1/2 years! StackOverflow is overflowing for sure. ;)

    I also made a few small changes to your script: a) added more languages, b) printed the languages in order of hit count, and c) added a check for a None I got twice when playing around with the language list (not sure why, though). Here is the update; hopefully it displays OK:

    import urllib
    import re

    langs = ('Python', 'Ruby', 'Perl', 'Java', 'C++', 'PHP', 'C', 'Go',
             'Javascript', 'C#', 'Groovy', 'Objective-C', 'Basic', 'SQL',
             'delphi', '.NET', 'Lisp', 'Pascal', 'Ada',
             'Logo', 'NXT-G', 'Visual Basic')

    url_stem = 'http://stackoverflow.com/questions/tagged/'

    counts = {}
    for lang in langs:
        resp = urllib.urlopen(url_stem + lang).read()
        m = re.search('summarycount.*>(.*)<', resp)
        if m is None:
            # No count found on the page: record 0 instead of reusing
            # the previous language's count
            counts[lang] = 0
        else:
            count = int(m.group(1).replace(',', ''))
            counts[lang] = count

    sorted_counts = sorted(counts.items(), key=lambda(k,v):(v,k))
    sorted_counts.reverse()

    for name, hcount in sorted_counts:
        print name, "has", hcount, "hits"
    print ''
    print sorted_counts[0][0], 'wins with', sorted_counts[0][1]

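
A note on the 'c%23' trick from the comments: it is URL percent-encoding. A literal '#' in a URL starts the fragment part, so the tag has to be encoded before it is appended to the URL stem. A minimal Python 3 sketch of doing that automatically is below; `tag_url` is a hypothetical helper name, and whether the site resolves every encoded tag is not checked here.

```python
from urllib.parse import quote

URL_STEM = 'http://stackoverflow.com/questions/tagged/'

def tag_url(tag):
    """Build a tag-page URL with the tag percent-encoded."""
    # safe='' so that reserved characters such as '#', '+' and '/' are all encoded
    return URL_STEM + quote(tag, safe='')

for tag in ('python', 'c#', 'c++'):
    print(tag, '->', tag_url(tag))
```

With this, the language list can hold readable names like 'c#' and the encoding happens in one place.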