Mainly Tech projects on Python and Electronic Design Automation.

Sunday, February 22, 2009

Vanity Search on Rosetta Code

A vanity search is usually when you look for your own name on Google to
show how popular you are (in those terms, no personal slight intended).

I wanted to know how many new pages on Rosetta Code I had started, so I
wrote the following script. It starts from a user's first page of
contributions and downloads all further pages, searching for new page
creations, which are specially marked in the HTML and show as a bold N
in the table.

The code:

'''
Rosetta Code Vanity search:
  How many new pages has someone created?
'''

import urllib, re

user = 'Paddy3118'

site = ''
nextpage = site + '/wiki/Special:Contributions/' + user
nextpage_re = re.compile(
    r'<a href="([^"]+)" title="[^"]+" rel="next">older ')

newpages = []
pagecount = 0
while nextpage:
    page = urllib.urlopen(nextpage)
    pagecount +=1
    nextpage = ''
    for line in page:
        if not nextpage:
            # Search for URL to next page of results for download
            nextpage_match = re.search(nextpage_re, line)
            if nextpage_match:
                nextpage = (site + nextpage_match.groups()[0]).replace('&amp;', '&')
                #print nextpage
        if '<span class="newpage">' in line:
            # extract N page name from title
            newpages.append(line.partition(' title="')[2].partition('"')[0])

nontalk = [p for p in newpages if not p.startswith('Talk:')]

print "User: %s has created %i new pages of which %i were not Talk: pages, from approx %i edits" % (
    user, len(newpages), len(nontalk), pagecount*50 )
print "New pages created, in order, are:\n ",
print "\n  ".join(nontalk[::-1])
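
To see the title extraction on its own, here is a small sketch run against a made-up table row. The sample HTML is an assumption for illustration rather than a line copied from Rosetta Code, and extract_title is a hypothetical helper name that does not appear in the script above.

```python
def extract_title(line):
    """Return the text of the first title="..." attribute in line, or ''."""
    # Same double partition as the script: keep what follows ' title="',
    # then cut at the next double quote.
    return line.partition(' title="')[2].partition('"')[0]

# Hypothetical sample row; the real wiki markup may differ.
sample = ('<li><span class="newpage">N</span> '
          '<a href="/wiki/Knapsack_Problem" title="Knapsack Problem">'
          'Knapsack Problem</a></li>')

# Only rows carrying the new-page marker are of interest.
if '<span class="newpage">' in sample:
    print(extract_title(sample))
```

Because str.partition returns empty strings when the separator is missing, a line without a title attribute simply yields an empty string instead of raising an error.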

What I have created on RC

The output of the program shows all the pages I created, in order of creation:

User: Paddy3118 has created 31 new pages of which 20 were not Talk: pages, from approx 300 edits
New pages created, in order, are:
Monty Hall simulation
Web Scraping
Sequence of Non-squares
User talk:Lupus
Max Licenses In Use
One dimensional cellular automata
Conway's Game of Life
Village Pump:Home/Foldable output
Data Munging
Data Munging 2
Column Aligner
Probabilistic Choice
Knapsack Problem
Yuletide Holiday
Common number base conversions
Integer literals
Command Line Interpreter

I have added links to show when I blogged about a task as well as
starting the RC page. I can see that I have a 'User talk:' page that
should also be filtered out.
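
One possible way to filter those out too is to look at the namespace prefix before the first colon rather than testing for 'Talk:' alone. This is only a sketch: is_talk_page is a hypothetical helper, 'Talk:Data Munging' is a made-up sample title, and it assumes every talk namespace is either 'Talk' or ends in ' talk'.

```python
def is_talk_page(title):
    """True if the title's namespace prefix is 'Talk' or ends in ' talk'."""
    namespace, colon, _ = title.partition(':')
    if not colon:
        return False  # no namespace prefix at all
    namespace = namespace.lower()
    return namespace == 'talk' or namespace.endswith(' talk')

# Sample titles; 'Talk:Data Munging' is invented for the example.
titles = ['Talk:Data Munging', 'User talk:Lupus', 'Knapsack Problem',
          'Village Pump:Home/Foldable output']

nontalk = [t for t in titles if not is_talk_page(t)]
print(nontalk)
```

A page whose ordinary title happens to start with something like "Small talk:" would be dropped by mistake, so the rule is a heuristic and not a full namespace check.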

I was always writing small examples that I thought might be useful for
a Python training course. I was looking for a public home for them and,
after stumbling across RC, initially thought that it would be a good
one. I was only partially right, but I have found the discipline of
writing for RC to be enjoyable in itself, so I continue to contribute.

I quite enjoyed the challenge of creating an RC task beginning with the
letter K and then one beginning with Y, so they could complete their
full alphabet of named tasks. Yuletide Holiday was created around Xmas
2008 and is really about Y2K errors - but they seem to have stuck with
my name :-)

I need to re-visit Data Munging and add extra clarification to the task
as RC needs a good task description, whereas a lot of data munging tasks
If you are interested in language comparison sites then you might want
to take a look at RC too!

- Paddy.
