Go deh!

Mainly Tech projects on Python and Electronic Design Automation.

Saturday, July 24, 2010

Python and EDA" at EuroPython

I presented on Thursday at EuroPython. from the feedback at the end it seemed to go well.

I've been told that the video will go up on the EuroPython web site by the end of next week which should be "interesting", as it would be the first time that I've watched myself present, (and I don't do many conference presentations).

What I found different was that I am used to presenting to others within the industry who know about RTL, VHDL, digital simulation, coverage, ... To address this Python-savvy audience I re-wrote a presentation on coverage ranking of Verilog/VHDL and cast it in terms of getting coverage of a Python program and ranking the coverage of three Python tests of the Python program. I then walked them through the ranking algorithm for this much simplified case, then tried to show some of the extra work that needs to be done for the real app.

I decided to not show a more polished version of the script. I showed a version where the main functionality could easily be split into more functions for readability; a psyco import does effectively nothing ... The reason was that what I showed was what I did to convince myself that the algorithm was correct and the speed was fast enough. I would have chucked it at this stage if it was slow. (The need was speed of execution as well as an acceptable algorithm for ranking).

The finish was a "winding down" example of just what you can do when you decide to add documentation for a file format in with the program.

I would like to thank my audience for making someone from an industry they may not have known, feel welcome; and the EuroPython organisers, especially John Pinner, for making it happen.

Sunday, April 25, 2010

More on binary search

I wrote this post: Am I one of the 10% of programmers who can write a binary search? as a reply to: Are you one of the 10% of programmers who can write a binary search? by Mike Taylor.

Since then, Mike has followed up with this: Testing is not a substitute for thinking which starts as a summary of the responses to the earlier post then moves on to his own views on testing.

There is mention of some great work by Darius Bacon who tests around twenty solutions gathered, painstakingly, from the comments. Darius' program is great work, but I thought that my testing strategy might be better so modified his code to:
  1. Add my attempt as function: paddy3118
  2. Graft on a version of my test routine to his list of function variants
The delta code was added to the end and looks like the following:


def paddy3118(data, item):
if not data:
return -1
lo, hi = 0, len(data)-1
mid = (hi + lo) // 2
#print(lo,mid,hi)
while item != data[mid] and lo < hi:
lo, mid, hi = ( (lo, (lo + mid) // 2, mid)
if item < data[mid] else
(mid+1, (hi + mid + 1) // 2, hi) )
#print(lo,mid,hi)
#0/0
return mid if item == data[mid] else -1
# PASSES test()
#test(paddy3118)

#print sorted(k for k,v in globals().items() if type(v)==type(dan) and 'test' not in k)
tests = (Max, aaron_maxwell, ashish_yadav, ben_gutierrez, clark,
dan, dave_r, ed_marshall, guilherme_melo, jasper, martin,
michael_fogleman, paddy3118, patrick_shields, paul,
rodrigo_b, scott, si, tomas_henriquez,
travis_wrapped, yc)

if __name__ == '__main__':
totpass = 0
for t in tests:
fails =0
for dlimit in range(5):
data = list(range(dlimit))
for item in data:
try:
assert t(data, item) != -1
except:
fails += 1
print(" ERROR: %s FAIL when looking for %i in %r" %
(t.func_name, item, data))
break
for item in list(range(dlimit+1)):
try:
assert t(data, item - 0.5) == -1
except:
fails += 1
print(" ERROR: %s FAIL as %3.1f is found in %r" %
(t.func_name, item - 0.5, data))
break
if fails:
break
if not fails:
totpass += 1
print('PASS: %s' % t.func_name)
print("\n%i/%i = %3.1f%% of functions pass" %
(totpass, len(tests), 100 * totpass / float(len(tests))))
'''
PASS: Max
ERROR: aaron_maxwell FAIL as -0.5 is found in []
PASS: ashish_yadav
PASS: ben_gutierrez
ERROR: clark FAIL when looking for 0 in [0]
ERROR: dan FAIL as -0.5 is found in []
ERROR: dave_r FAIL as -0.5 is found in []
ERROR: ed_marshall FAIL as -0.5 is found in []
ERROR: guilherme_melo FAIL as -0.5 is found in []
ERROR: jasper FAIL as -0.5 is found in []
ERROR: martin FAIL as -0.5 is found in []
PASS: michael_fogleman
PASS: paddy3118
ERROR: patrick_shields FAIL as -0.5 is found in []
PASS: paul
ERROR: rodrigo_b FAIL as -0.5 is found in []
ERROR: scott FAIL as 1.5 is found in [0, 1, 2]
PASS: si
ERROR: tomas_henriquez FAIL when looking for 0 in [0, 1]
ERROR: tomas_henriquez FAIL as 1.5 is found in [0, 1]
ERROR: travis_wrapped FAIL when looking for 0 in [0]
PASS: yc

8/21 = 38.1% of functions pass
'''
I find that i cannot agree with a lot of Mikes writing on test driven development as he seems to take an extreme position by separating test from development too much.
When I was creating my function, i new that there were likely to be issues with off-by-ones when modifying range indices, and with handling of an empty data list. I chose to leave a 'mental place-holder' on the off-by-one issues as I thought i could easily write tests to exercise those corners and fill them in. The alternative would be to mentally interpret those corner cases to write them down in 'development', then still have to thoroughly test for them anyway.

When writing my tests:
  • I thought about the division of the range of the indices- and thought that I should use data of both odd and even lengths, and of sufficient length so that a range might be divided more than one.
  • I thought about all the comparisons - and decided to test for every item that is in the data.
  • I thought about the significant ways that an item might not be in the data - and decided to test for items between each and every data item, an items below the smallest and above the largest data items.
My paddy3118 function passed the Darius tests. My test routine failed the guilherme_melo test that was passed by Darius.

The only conclusion I can bring, is that you need to do some white box testing, i.e. tests based on knowledge of the code too. Mike Taylor needs to recognise that imposing rigid categories might well lead to extremes. I guess, their is no substitute for experience, but the unexperienced need to be given rules until they can learn enough to break them - Do we teach testing up front? Or not? (Remembering that given half a chance, testing by the novice will be inadequate).

- Paddy.

Tuesday, April 20, 2010

Am I one of the 10% of programmers who can write a binary search?

I just read this post where the blogger read a passage in a book that stated that only 10% of programmers could write a binary search from the description. I read up to the description of the binary search algorithm given in the book then thought I would try and implement it without finishing the reading of the rest of his blog entry.

I took around half an hour to come up with the following:

'''
from: http://reprog.wordpress.com/2010/04/19/are-you-one-of-the-10-percent/
"Only 10% of programmers can write a binary search"

Binary search solves the problem [of searching within a pre-sorted
array] by keeping track of a range within the array in which T
[i.e. the sought value] must be if it is anywhere in the array.
Initially, the range is the entire array. The range is shrunk by
comparing its middle element to T and discarding half the range.
The process continues until T is discovered in the array, or until
the range in which it must lie is known to be empty. In an
N-element table, the search uses roughly log(2) N comparisons.

'''

def binsearch(data, item):
if not data:
return False
lo, hi = 0, len(data)-1
mid = (hi + lo) // 2
#print(lo,mid,hi)
while item != data[mid] and lo < hi:
lo, mid, hi = ( (lo, (lo + mid) // 2, mid)
if item < data[mid] else
(mid+1, (hi + mid + 1) // 2, hi) )
#print(lo,mid,hi)
#0/0
return item == data[mid]

if __name__ == '__main__':
for dlimit in range(5):
data = list(range(dlimit))
print(data)
for item in data:
assert binsearch(data, item)
for item in list(range(dlimit+1)):
print(item)
assert not binsearch(data, item - 0.5)

What happens with an empty list, and the little plus ones were added after I put in the assert statements, and I was satisfied enough to go on and complete the reading of the blog.

Extra Restrictions


The blog goes on to state that the rules of test are that no testing is to be done. A compiler can be used to ensure that their are no mechanical errors however.

Whoa! I was shocked and then secretly pleased. I use Python, and automatically added the tests as part of my development process. In no way did I think that I could do the task without passing some tests. I am still not saying that my implementation is correct, but I have tested for a lot of corner-cases such as:
  • Lists of length zero, one and more than one.
  • Items in every position of the list.
  • Items outside the list: less than, greater than and between all members of the list.
I of course assume that items are comparable, and that data is sorted and subscript-able.

All I can conclude is that if given the test and access to a compiler/interpreter of your choice then one should argue convincingly for testing any submission as you would never deign to submit any code as complete unless tested. If the testers remain unconvinced then it may be time to stick to your guns as you don't gain by lowering your standards, especially in an interview situation when you know that testing is right!

P.S. I have since had a look at the code of the Python bisect module and the algorithm given on Wikipedia, and they both seem more elegant than mine.

Tuesday, March 30, 2010

Why Python?

Choose for yourself. Have a look at the examples on the Rosetta Code site. You might like to follow the links to a few pages that I did the task descriptions for as they tend to be meatier tasks, but the examples in different languages should allow you to compare how different languages are used to solve the same problems.

Unfortunately it cannot eliminate the issue of the competence of the individual programmers, but it might provide some useful info.

Have fun :-)

Tuesday, March 16, 2010

Expressions to Truth-Tables

I had a need for the data in the truth tables for a standard cell library, rather like that shown here if you look at say page 35 you will see that for for every Cell it gives the cell name; the cell function as a boolean expression, and the truth-table for the cell.

My cell library was an Infineon one and I had the pdf version of the databook. A quick squirt through pdftotext scrambled the truth table, but did preserve the boolean expression of each cells' function, and the cell name.

I wrote a Python script that used a multi-line regexp to extract each cells info including the boolean equation.

The equation from the Infineon databook uses +,*, and ! for boolean or, and and not operators, but also used lots of parentheses so that operator precedence did not affect the value of the infix expression. I realised that I could substitute for the Python bitwise operators, and form a python expression that could then be evaluated with appropriate values for any input names to create a truth table.

I have left out the parsing and present a function to create a truth-table from a boolean expression:

from itertools import product

def truthtable(expressionstring):
comp = compile(expressionstring.strip(), '-', 'eval')
inputs = sorted(comp.co_names)
table = {}
for values in product( *([(0,1)] * len(inputs)) ):
valuedict = dict(zip(inputs, values))
answer = eval(comp, valuedict)
table[values] = answer
return expressionstring.strip(), inputs, table

def pptable(ex, inputs, table):
print("\nTRUTH-TABLE FOR EXPRESSION: %r\n" % ex)
print(' ' + ' '.join('%2s' % inp for inp in inputs) + ': OUTPUT')
fmt = ' %i' * len(inputs) + ': %i'
for inp, ans in sorted(table.items()):
print(fmt % tuple(list(inp) + [ans]))

ex, inputs, table = truthtable('(A & (~C)) | D')
pptable(ex, inputs, table)


Going through function truthtable:
  • compile() gives access to all the names in the compiled expression via attribute .co_names - this is all the names used in the expression.
  • product() allows you to quickly generate the 1/0 value combinations for the input names.
  • valuedict assigns a value to each name used in the expression.
  • eval evaluates the expression with the given values on each name
The output of the program is:

TRUTH-TABLE FOR EXPRESSION: '(A & (~C)) | D'

A C D: OUTPUT
0 0 0: 0
0 0 1: 1
0 1 0: 0
0 1 1: 1
1 0 0: 1
1 0 1: 1
1 1 0: 0
1 1 1: 1


- No separate parser required!

Thursday, March 04, 2010

NPR 2002 Puzzle

A puzzle I copied from here via reddit:
1 2 3 4 5 6 7 8 9 = 2002
Put any combination of plus, times, and spaces between the digits on the left to make the identify true.
The answer was easy and was quick to compute.

I realised that you could evaluate all expressions where the numbers 1 through nine are separated by the permutations of either a null string, '', '+' or '*'

The answer came from the following:

>>> from itertools import product
>>> eqn = '%s'.join(list('123456789')) + '==2002'
>>> eqn
'1%s2%s3%s4%s5%s6%s7%s8%s9==2002'
>>> for ops in product(*[['', '+', '*']]*8):
if eval(eqn % ops):
print(eqn % ops)


1*23+45*6*7+89==2002
1*2+34*56+7+89==2002
>>>

So, there are two solutions.

Thursday, December 03, 2009

What is a first-class function?

There is a debate about it on Wikipedia, and I am curious:
  1. Does a language have first-class functions if it can use its functions to solve particular problems in a certain manner, or
  2. Does a language have first-class functions if functions are just as flexible as other types in the language itself?
It probably means 1. above, as the second definition would omit most static languages, as they might be able, for example, to take textual input and change it into an internal representation of an integer, but not do the same for a function with similar effort. Someone who exclusively thinks in terms of static languages may not have realized a difference.

Saturday, November 28, 2009

Hamming Numbers (and Perl: Why I just ignored it)

I had read the following post: "Perl: Love it, or hate it, but don’t ignore
it."
, by Phillip Smith, in which the author wishes Perl
solutions were credited as much as solutions in other languages such as
Python and Ruby.



I agreed that Perl wasn't mentioned as much as it was, and a day or so
later I came across an example of why....



I monitor whats happening in Python, and came across a new recipe added
to Activestate code called: "Recipe 576961: Technique for cyclical
iteration"
, by Raymond Hettinger. On first, second, and third
readings, i just couldn't understand the need for the deferred_output
function; the whole recipe remained opaque:



Raymond's
code:



from itertools import tee, chain, islice
from heapq import merge

def hamming_numbers():
# Generate "5-smooth" numbers, also called "Hamming numbers"
# or "Regular numbers". See: http://en.wikipedia.org/wiki/Regular_number
# Finds solutions to 2**i * 3**j * 5**k for some integers i, j, and k.

def no_repeats(it):
last = None
for i in it:
if i is not last:
yield i
last = i

def deferred_output():
for i in output:
yield i

result, p2, p3, p5 = tee(deferred_output(), 4)
m2 = (2*x for x in p2) # multiples of 2
m3 = (3*x for x in p3) # multiples of 3
m5 = (5*x for x in p5) # multiples of 5
merged = merge(m2, m3, m5)
combined = chain([1], merged) # prepend a starting point
output = no_repeats(combined) # eliminate duplicates

return result

if __name__ == '__main__':
print(list(islice(hamming_numbers(), 1000))) # Show first 1000 hamming numbers




I was hooked by the series however and so went looking for more
information.



Perl


I came across a mention to the following page: "Infinite
lists in Perl"
by Mark Dominus which is copyright 1997 by the
Perl Journal. Normally I wouldn't mention the article, as it spends
more time creating an implementation of generators in Perl as it does
on addressing the computation of Hamming numbers. It also uses the term
"stream" instead of generators. but maybe this is down to the age of
the article, i.e. 1997?



Haskell


Looking a bit further, I found this Dr Dobbs Article, and this
commented Haskell
solution
:


hamming = 1 : map (2*) hamming `merge` map (3*) hamming `merge` map (5*) hamming
where merge (x:xs) (y:ys)
| x < y = x : xs `merge` (y:ys)
| x > y = y : (x:xs) `merge` ys
| otherwise = x : xs `merge` ys


The Haskell solution lead me to think that maybe Python has an issue
with defining recursive generators, and some experimentation lead me to
believe that yep, maybe that was why Raymond has the deferred_output
function.



My Python


Since I have trouble understanding Raymond s I went away and wrote my
own solution. It works. It can generate very large values of the
sequence quite quickly. But it does need memory to store
multiples:

from itertools import islice
from pprint import pprint as pp


def regularnumbergen(start = 1, multipliers=(2, 3, 5)):
'''\
Generates the Regular Number sequence defined here:
http://en.wikipedia.org/wiki/Regular_number
The sequence starts with 1 and consists of all multiples by 2, 3, or 5
of any number in the sequence in non-repeating, increasing order.
'''
multiples = [[] for i in multipliers]
this = start
while True:
yield this
# get multiples
for multiple, multiplier in zip(multiples, multipliers):
multiple.append(this * multiplier)
# get next
this = min(m[0] for m in multiples)
# remove this from multiples
for multiple in multiples:
if multiple[0] == this:
del multiple[0]

if __name__ == '__main__':
print(list(islice(regularnumbergen(), 10)))

Generating the following large values of the sequence took only seconds:


>>> print(list(islice(regularnumbergen(), 99999, 100001)))
[290142196707511001929482240000000000000, 290237644800000000000000000000000000000]
>>>




Wrapup


Well, I've learnt a bit more about generators, the limitatins in their
Python implementation w.r.t. Haskel, and I've put more meat on why I
tend to not quote much Perl In this case, i couldn't see the wood for
the trees! (But I do quote Perl).

About Me

Followers

Subscribe Now: google

Add to Google Reader or Homepage

Go deh too!

whos.amung.us

Blog Archive