I presented on Thursday at EuroPython. from the feedback at the end it seemed to go well.
I've been told that the video will go up on the EuroPython web site by the end of next week which should be "interesting", as it would be the first time that I've watched myself present, (and I don't do many conference presentations).
What I found different was that I am used to presenting to others within the industry who know about RTL, VHDL, digital simulation, coverage, ... To address this Python-savvy audience I re-wrote a presentation on coverage ranking of Verilog/VHDL and cast it in terms of getting coverage of a Python program and ranking the coverage of three Python tests of the Python program. I then walked them through the ranking algorithm for this much simplified case, then tried to show some of the extra work that needs to be done for the real app.
I decided to not show a more polished version of the script. I showed a version where the main functionality could easily be split into more functions for readability; a psyco import does effectively nothing ... The reason was that what I showed was what I did to convince myself that the algorithm was correct and the speed was fast enough. I would have chucked it at this stage if it was slow. (The need was speed of execution as well as an acceptable algorithm for ranking).
The finish was a "winding down" example of just what you can do when you decide to add documentation for a file format in with the program.
I would like to thank my audience for making someone from an industry they may not have known, feel welcome; and the EuroPython organisers, especially John Pinner, for making it happen.
Go deh!
Mainly Tech projects on Python and Electronic Design Automation.
Saturday, July 24, 2010
Sunday, April 25, 2010
More on binary search
I wrote this post: Am I one of the 10% of programmers who can write a binary search? as a reply to: Are you one of the 10% of programmers who can write a binary search? by Mike Taylor.
Since then, Mike has followed up with this: Testing is not a substitute for thinking which starts as a summary of the responses to the earlier post then moves on to his own views on testing.
There is mention of some great work by Darius Bacon who tests around twenty solutions gathered, painstakingly, from the comments. Darius' program is great work, but I thought that my testing strategy might be better so modified his code to:
When I was creating my function, i new that there were likely to be issues with off-by-ones when modifying range indices, and with handling of an empty data list. I chose to leave a 'mental place-holder' on the off-by-one issues as I thought i could easily write tests to exercise those corners and fill them in. The alternative would be to mentally interpret those corner cases to write them down in 'development', then still have to thoroughly test for them anyway.
When writing my tests:
The only conclusion I can bring, is that you need to do some white box testing, i.e. tests based on knowledge of the code too. Mike Taylor needs to recognise that imposing rigid categories might well lead to extremes. I guess, their is no substitute for experience, but the unexperienced need to be given rules until they can learn enough to break them - Do we teach testing up front? Or not? (Remembering that given half a chance, testing by the novice will be inadequate).
- Paddy.
Since then, Mike has followed up with this: Testing is not a substitute for thinking which starts as a summary of the responses to the earlier post then moves on to his own views on testing.
There is mention of some great work by Darius Bacon who tests around twenty solutions gathered, painstakingly, from the comments. Darius' program is great work, but I thought that my testing strategy might be better so modified his code to:
- Add my attempt as function: paddy3118
- Graft on a version of my test routine to his list of function variants
I find that i cannot agree with a lot of Mikes writing on test driven development as he seems to take an extreme position by separating test from development too much.
def paddy3118(data, item):
if not data:
return -1
lo, hi = 0, len(data)-1
mid = (hi + lo) // 2
#print(lo,mid,hi)
while item != data[mid] and lo < hi:
lo, mid, hi = ( (lo, (lo + mid) // 2, mid)
if item < data[mid] else
(mid+1, (hi + mid + 1) // 2, hi) )
#print(lo,mid,hi)
#0/0
return mid if item == data[mid] else -1
# PASSES test()
#test(paddy3118)
#print sorted(k for k,v in globals().items() if type(v)==type(dan) and 'test' not in k)
tests = (Max, aaron_maxwell, ashish_yadav, ben_gutierrez, clark,
dan, dave_r, ed_marshall, guilherme_melo, jasper, martin,
michael_fogleman, paddy3118, patrick_shields, paul,
rodrigo_b, scott, si, tomas_henriquez,
travis_wrapped, yc)
if __name__ == '__main__':
totpass = 0
for t in tests:
fails =0
for dlimit in range(5):
data = list(range(dlimit))
for item in data:
try:
assert t(data, item) != -1
except:
fails += 1
print(" ERROR: %s FAIL when looking for %i in %r" %
(t.func_name, item, data))
break
for item in list(range(dlimit+1)):
try:
assert t(data, item - 0.5) == -1
except:
fails += 1
print(" ERROR: %s FAIL as %3.1f is found in %r" %
(t.func_name, item - 0.5, data))
break
if fails:
break
if not fails:
totpass += 1
print('PASS: %s' % t.func_name)
print("\n%i/%i = %3.1f%% of functions pass" %
(totpass, len(tests), 100 * totpass / float(len(tests))))
'''
PASS: Max
ERROR: aaron_maxwell FAIL as -0.5 is found in []
PASS: ashish_yadav
PASS: ben_gutierrez
ERROR: clark FAIL when looking for 0 in [0]
ERROR: dan FAIL as -0.5 is found in []
ERROR: dave_r FAIL as -0.5 is found in []
ERROR: ed_marshall FAIL as -0.5 is found in []
ERROR: guilherme_melo FAIL as -0.5 is found in []
ERROR: jasper FAIL as -0.5 is found in []
ERROR: martin FAIL as -0.5 is found in []
PASS: michael_fogleman
PASS: paddy3118
ERROR: patrick_shields FAIL as -0.5 is found in []
PASS: paul
ERROR: rodrigo_b FAIL as -0.5 is found in []
ERROR: scott FAIL as 1.5 is found in [0, 1, 2]
PASS: si
ERROR: tomas_henriquez FAIL when looking for 0 in [0, 1]
ERROR: tomas_henriquez FAIL as 1.5 is found in [0, 1]
ERROR: travis_wrapped FAIL when looking for 0 in [0]
PASS: yc
8/21 = 38.1% of functions pass
'''
When I was creating my function, i new that there were likely to be issues with off-by-ones when modifying range indices, and with handling of an empty data list. I chose to leave a 'mental place-holder' on the off-by-one issues as I thought i could easily write tests to exercise those corners and fill them in. The alternative would be to mentally interpret those corner cases to write them down in 'development', then still have to thoroughly test for them anyway.
When writing my tests:
- I thought about the division of the range of the indices- and thought that I should use data of both odd and even lengths, and of sufficient length so that a range might be divided more than one.
- I thought about all the comparisons - and decided to test for every item that is in the data.
- I thought about the significant ways that an item might not be in the data - and decided to test for items between each and every data item, an items below the smallest and above the largest data items.
The only conclusion I can bring, is that you need to do some white box testing, i.e. tests based on knowledge of the code too. Mike Taylor needs to recognise that imposing rigid categories might well lead to extremes. I guess, their is no substitute for experience, but the unexperienced need to be given rules until they can learn enough to break them - Do we teach testing up front? Or not? (Remembering that given half a chance, testing by the novice will be inadequate).
- Paddy.
Tuesday, April 20, 2010
Am I one of the 10% of programmers who can write a binary search?
I just read this post where the blogger read a passage in a book that stated that only 10% of programmers could write a binary search from the description. I read up to the description of the binary search algorithm given in the book then thought I would try and implement it without finishing the reading of the rest of his blog entry.
I took around half an hour to come up with the following:
What happens with an empty list, and the little plus ones were added after I put in the assert statements, and I was satisfied enough to go on and complete the reading of the blog.
The blog goes on to state that the rules of test are that no testing is to be done. A compiler can be used to ensure that their are no mechanical errors however.
Whoa! I was shocked and then secretly pleased. I use Python, and automatically added the tests as part of my development process. In no way did I think that I could do the task without passing some tests. I am still not saying that my implementation is correct, but I have tested for a lot of corner-cases such as:
All I can conclude is that if given the test and access to a compiler/interpreter of your choice then one should argue convincingly for testing any submission as you would never deign to submit any code as complete unless tested. If the testers remain unconvinced then it may be time to stick to your guns as you don't gain by lowering your standards, especially in an interview situation when you know that testing is right!
P.S. I have since had a look at the code of the Python bisect module and the algorithm given on Wikipedia, and they both seem more elegant than mine.
I took around half an hour to come up with the following:
'''
from: http://reprog.wordpress.com/2010/04/19/are-you-one-of-the-10-percent/
"Only 10% of programmers can write a binary search"
Binary search solves the problem [of searching within a pre-sorted
array] by keeping track of a range within the array in which T
[i.e. the sought value] must be if it is anywhere in the array.
Initially, the range is the entire array. The range is shrunk by
comparing its middle element to T and discarding half the range.
The process continues until T is discovered in the array, or until
the range in which it must lie is known to be empty. In an
N-element table, the search uses roughly log(2) N comparisons.
'''
def binsearch(data, item):
if not data:
return False
lo, hi = 0, len(data)-1
mid = (hi + lo) // 2
#print(lo,mid,hi)
while item != data[mid] and lo < hi:
lo, mid, hi = ( (lo, (lo + mid) // 2, mid)
if item < data[mid] else
(mid+1, (hi + mid + 1) // 2, hi) )
#print(lo,mid,hi)
#0/0
return item == data[mid]
if __name__ == '__main__':
for dlimit in range(5):
data = list(range(dlimit))
print(data)
for item in data:
assert binsearch(data, item)
for item in list(range(dlimit+1)):
print(item)
assert not binsearch(data, item - 0.5)
What happens with an empty list, and the little plus ones were added after I put in the assert statements, and I was satisfied enough to go on and complete the reading of the blog.
Extra Restrictions
The blog goes on to state that the rules of test are that no testing is to be done. A compiler can be used to ensure that their are no mechanical errors however.
Whoa! I was shocked and then secretly pleased. I use Python, and automatically added the tests as part of my development process. In no way did I think that I could do the task without passing some tests. I am still not saying that my implementation is correct, but I have tested for a lot of corner-cases such as:
- Lists of length zero, one and more than one.
- Items in every position of the list.
- Items outside the list: less than, greater than and between all members of the list.
All I can conclude is that if given the test and access to a compiler/interpreter of your choice then one should argue convincingly for testing any submission as you would never deign to submit any code as complete unless tested. If the testers remain unconvinced then it may be time to stick to your guns as you don't gain by lowering your standards, especially in an interview situation when you know that testing is right!
P.S. I have since had a look at the code of the Python bisect module and the algorithm given on Wikipedia, and they both seem more elegant than mine.
Friday, April 16, 2010
Saturday, April 10, 2010
Tuesday, March 30, 2010
Why Python?
Choose for yourself. Have a look at the examples on the Rosetta Code site. You might like to follow the links to a few pages that I did the task descriptions for as they tend to be meatier tasks, but the examples in different languages should allow you to compare how different languages are used to solve the same problems.
Unfortunately it cannot eliminate the issue of the competence of the individual programmers, but it might provide some useful info.
Have fun :-)
Unfortunately it cannot eliminate the issue of the competence of the individual programmers, but it might provide some useful info.
Have fun :-)
Tuesday, March 16, 2010
Expressions to Truth-Tables
I had a need for the data in the truth tables for a standard cell library, rather like that shown here if you look at say page 35 you will see that for for every Cell it gives the cell name; the cell function as a boolean expression, and the truth-table for the cell.
My cell library was an Infineon one and I had the pdf version of the databook. A quick squirt through pdftotext scrambled the truth table, but did preserve the boolean expression of each cells' function, and the cell name.
I wrote a Python script that used a multi-line regexp to extract each cells info including the boolean equation.
The equation from the Infineon databook uses +,*, and ! for boolean or, and and not operators, but also used lots of parentheses so that operator precedence did not affect the value of the infix expression. I realised that I could substitute for the Python bitwise operators, and form a python expression that could then be evaluated with appropriate values for any input names to create a truth table.
I have left out the parsing and present a function to create a truth-table from a boolean expression:
Going through function truthtable:
- No separate parser required!
My cell library was an Infineon one and I had the pdf version of the databook. A quick squirt through pdftotext scrambled the truth table, but did preserve the boolean expression of each cells' function, and the cell name.
I wrote a Python script that used a multi-line regexp to extract each cells info including the boolean equation.
The equation from the Infineon databook uses +,*, and ! for boolean or, and and not operators, but also used lots of parentheses so that operator precedence did not affect the value of the infix expression. I realised that I could substitute for the Python bitwise operators, and form a python expression that could then be evaluated with appropriate values for any input names to create a truth table.
I have left out the parsing and present a function to create a truth-table from a boolean expression:
from itertools import product
def truthtable(expressionstring):
comp = compile(expressionstring.strip(), '-', 'eval')
inputs = sorted(comp.co_names)
table = {}
for values in product( *([(0,1)] * len(inputs)) ):
valuedict = dict(zip(inputs, values))
answer = eval(comp, valuedict)
table[values] = answer
return expressionstring.strip(), inputs, table
def pptable(ex, inputs, table):
print("\nTRUTH-TABLE FOR EXPRESSION: %r\n" % ex)
print(' ' + ' '.join('%2s' % inp for inp in inputs) + ': OUTPUT')
fmt = ' %i' * len(inputs) + ': %i'
for inp, ans in sorted(table.items()):
print(fmt % tuple(list(inp) + [ans]))
ex, inputs, table = truthtable('(A & (~C)) | D')
pptable(ex, inputs, table)
Going through function truthtable:
- compile() gives access to all the names in the compiled expression via attribute .co_names - this is all the names used in the expression.
- product() allows you to quickly generate the 1/0 value combinations for the input names.
- valuedict assigns a value to each name used in the expression.
- eval evaluates the expression with the given values on each name
TRUTH-TABLE FOR EXPRESSION: '(A & (~C)) | D'
A C D: OUTPUT
0 0 0: 0
0 0 1: 1
0 1 0: 0
0 1 1: 1
1 0 0: 1
1 0 1: 1
1 1 0: 0
1 1 1: 1
- No separate parser required!
Thursday, March 04, 2010
NPR 2002 Puzzle
A puzzle I copied from here via reddit:
1 2 3 4 5 6 7 8 9 = 2002
Put any combination of plus, times, and spaces between the digits on the left to make the identify true.
The answer was easy and was quick to compute.
I realised that you could evaluate all expressions where the numbers 1 through nine are separated by the permutations of either a null string, '', '+' or '*'
The answer came from the following:
So, there are two solutions.
1 2 3 4 5 6 7 8 9 = 2002
Put any combination of plus, times, and spaces between the digits on the left to make the identify true.
The answer was easy and was quick to compute.
I realised that you could evaluate all expressions where the numbers 1 through nine are separated by the permutations of either a null string, '', '+' or '*'
The answer came from the following:
>>> from itertools import product
>>> eqn = '%s'.join(list('123456789')) + '==2002'
>>> eqn
'1%s2%s3%s4%s5%s6%s7%s8%s9==2002'
>>> for ops in product(*[['', '+', '*']]*8):
if eval(eqn % ops):
print(eqn % ops)
1*23+45*6*7+89==2002
1*2+34*56+7+89==2002
>>>
So, there are two solutions.
Thursday, December 03, 2009
What is a first-class function?
There is a debate about it on Wikipedia, and I am curious:
- Does a language have first-class functions if it can use its functions to solve particular problems in a certain manner, or
- Does a language have first-class functions if functions are just as flexible as other types in the language itself?
Saturday, November 28, 2009
Hamming Numbers (and Perl: Why I just ignored it)
I had read the following post: "Perl: Love it, or hate it, but don’t ignore
it.", by Phillip Smith, in which the author wishes Perl
solutions were credited as much as solutions in other languages such as
Python and Ruby.
I agreed that Perl wasn't mentioned as much as it was, and a day or so
later I came across an example of why....
I monitor whats happening in Python, and came across a new recipe added
to Activestate code called: "Recipe 576961: Technique for cyclical
iteration", by Raymond Hettinger. On first, second, and third
readings, i just couldn't understand the need for the deferred_output
function; the whole recipe remained opaque:
Raymond's
code:
I was hooked by the series however and so went looking for more
information.
I came across a mention to the following page: "Infinite
lists in Perl" by Mark Dominus which is copyright 1997 by the
Perl Journal. Normally I wouldn't mention the article, as it spends
more time creating an implementation of generators in Perl as it does
on addressing the computation of Hamming numbers. It also uses the term
"stream" instead of generators. but maybe this is down to the age of
the article, i.e. 1997?
Looking a bit further, I found this Dr Dobbs Article, and this
commented Haskell
solution:
The Haskell solution lead me to think that maybe Python has an issue
with defining recursive generators, and some experimentation lead me to
believe that yep, maybe that was why Raymond has the deferred_output
function.
Since I have trouble understanding Raymond s I went away and wrote my
own solution. It works. It can generate very large values of the
sequence quite quickly. But it does need memory to store
multiples:
Generating the following large values of the sequence took only seconds:
Well, I've learnt a bit more about generators, the limitatins in their
Python implementation w.r.t. Haskel, and I've put more meat on why I
tend to not quote much Perl In this case, i couldn't see the wood for
the trees! (But I do quote Perl).
it.", by Phillip Smith, in which the author wishes Perl
solutions were credited as much as solutions in other languages such as
Python and Ruby.
I agreed that Perl wasn't mentioned as much as it was, and a day or so
later I came across an example of why....
I monitor whats happening in Python, and came across a new recipe added
to Activestate code called: "Recipe 576961: Technique for cyclical
iteration", by Raymond Hettinger. On first, second, and third
readings, i just couldn't understand the need for the deferred_output
function; the whole recipe remained opaque:
Raymond's
code:
from itertools import tee, chain, islice
from heapq import merge
def hamming_numbers():
# Generate "5-smooth" numbers, also called "Hamming numbers"
# or "Regular numbers". See: http://en.wikipedia.org/wiki/Regular_number
# Finds solutions to 2**i * 3**j * 5**k for some integers i, j, and k.
def no_repeats(it):
last = None
for i in it:
if i is not last:
yield i
last = i
def deferred_output():
for i in output:
yield i
result, p2, p3, p5 = tee(deferred_output(), 4)
m2 = (2*x for x in p2) # multiples of 2
m3 = (3*x for x in p3) # multiples of 3
m5 = (5*x for x in p5) # multiples of 5
merged = merge(m2, m3, m5)
combined = chain([1], merged) # prepend a starting point
output = no_repeats(combined) # eliminate duplicates
return result
if __name__ == '__main__':
print(list(islice(hamming_numbers(), 1000))) # Show first 1000 hamming numbers
I was hooked by the series however and so went looking for more
information.
Perl
I came across a mention to the following page: "Infinite
lists in Perl" by Mark Dominus which is copyright 1997 by the
Perl Journal. Normally I wouldn't mention the article, as it spends
more time creating an implementation of generators in Perl as it does
on addressing the computation of Hamming numbers. It also uses the term
"stream" instead of generators. but maybe this is down to the age of
the article, i.e. 1997?
Haskell
Looking a bit further, I found this Dr Dobbs Article, and this
commented Haskell
solution:
hamming = 1 : map (2*) hamming `merge` map (3*) hamming `merge` map (5*) hamming
where merge (x:xs) (y:ys)
| x < y = x : xs `merge` (y:ys)
| x > y = y : (x:xs) `merge` ys
| otherwise = x : xs `merge` ys
The Haskell solution lead me to think that maybe Python has an issue
with defining recursive generators, and some experimentation lead me to
believe that yep, maybe that was why Raymond has the deferred_output
function.
My Python
Since I have trouble understanding Raymond s I went away and wrote my
own solution. It works. It can generate very large values of the
sequence quite quickly. But it does need memory to store
multiples:
from itertools import islice
from pprint import pprint as pp
def regularnumbergen(start = 1, multipliers=(2, 3, 5)):
'''\
Generates the Regular Number sequence defined here:
http://en.wikipedia.org/wiki/Regular_number
The sequence starts with 1 and consists of all multiples by 2, 3, or 5
of any number in the sequence in non-repeating, increasing order.
'''
multiples = [[] for i in multipliers]
this = start
while True:
yield this
# get multiples
for multiple, multiplier in zip(multiples, multipliers):
multiple.append(this * multiplier)
# get next
this = min(m[0] for m in multiples)
# remove this from multiples
for multiple in multiples:
if multiple[0] == this:
del multiple[0]
if __name__ == '__main__':
print(list(islice(regularnumbergen(), 10)))
Generating the following large values of the sequence took only seconds:
>>> print(list(islice(regularnumbergen(), 99999, 100001)))
[290142196707511001929482240000000000000, 290237644800000000000000000000000000000]
>>>
Wrapup
Well, I've learnt a bit more about generators, the limitatins in their
Python implementation w.r.t. Haskel, and I've put more meat on why I
tend to not quote much Perl In this case, i couldn't see the wood for
the trees! (But I do quote Perl).
Subscribe to:
Posts (Atom)