Go deh!: 2010

Tuesday, December 07, 2010

Stupid macro languages

In the Python web world, the debate on whether templating languages should offer programming language type abilities has reared its head again. from "Stupid templating languages", to "Not So Stupid Template Languages", "In Response to "Stupid Template Languages"", and discussed more on reddit here.

In the world of Electronic Design Automation and integrating complex EDA software tools, the EDA companies,, on the whole, have embraced TCL as their scripting tool, but sometimes I have inherited a complex design flow, where, to add control, and later flexibility within that control, proprietary initialization file formats and their parsers have grown by adding their own macro language type features and grown to need the power of something like the bash shell. Rather than ditching their proprietary lashed-together, one-off languages for something that is tried and tested and used/debugged by many - such as bash/python/Tcl/Lua they persevere by adding more capability to an in-house, one project language.

In the case of web templating the main argument is that given an intelligent web templating language, people are apt to migrate what should be business logic into the templates thereby muddying the separation of the business logic from the presentation layer. In the EDA field the problem is often the maintenance of the proprietary macro language and the proliferation of languages with which one needs to be familiar with in order to master a design environment.

I won't comment more on the web templating debate, but when it comes to initialization file formats and or initialization scripts then you should stick to pre-existing standards such as .ini files or .csv files in the simpler cases, or use pre-existing scripting languages when you need more power such as bash scripts, Python, or Tcl.

Monday, October 04, 2010

Ipod Tablet Opportunity?

The ipad is close to £600 in the UK. The ipod touch is less than £200. Given that there are tablet computers on sale for less than £200, and inspired by news of a future ipod touch add-on that gives it phone capabilities; how about designing a ten-inch screen, extra battery, and phone dongle that would take a (partially disassembled), ipod touch inside it and turn it into a ten inch ipad-alike?

One problem I see is that you would probably have to provide a service to take the users ipod touch and disassemble/assemble it inside the ten-inch "dock"; but Apple seems to provide a lot of monetary difference to try and make a profit.

Just a thought!

Saturday, September 18, 2010

Regression runners: Whot! No Job Arrays?

I went from running regressions consisting of greater than 50K simulations nightly using Mentors QuestaSim, where I had written the regression framework and results preparation myself - to my first real Cadece vManager controlled regression project.

I should explain to my readers who usually see Python focused blog entries, that this is about the running of large regressions on compute farms for verifying ASICs (although my regression runner used Python extensively).

I did like the more minimal, HTML-based GUI on the tool - I've spent too much time creating work-arounds for flashy GUI's that don't work outside a narrow area, and so know they can be a double-edged sword. The tool needs a lot of configuration behind the scenes, and they have chosen to do this using their own vsif file format which seems to me to be a design fault. If they had gone for XML (gosh is that me advocating the use of XML), or YAML, then the format would be as readable and more easily used in other tools - no tool is an island in a design flow.

The main problem with vManager, is with its use of LSF. LSF is a separate tool used to harness a network of computers into a compute cluster. LSF runs jobs on the cluster, managing the cluster to give maximum job throughput. vManager can turn each test that needs to run on the design into a possible series of jobs for LSF, but it fails to use job arrays! Job arrays allow for many similar jobs to be submitted and controlled as one array of jobs where the indexing of the individual job in the array is used to introduce a variation in the test.

In my regression framework I created arrays of over 100K jobs, that took several weeks to run, and all the time that the job array at the heart of my regression was running, IT could change the parameters of my job array to control its use of resources according to changing priorities; scale back the maximum number of concurrent simulations during the 9-till-5; ramp up during weekends, (or stop then resume for a 'planned' license outage). IT had all the information they need to show the full extent of the regression and LSF will give the throughput of the array from which I can estimate completion times. Another engineer wishing to start a regression has immediate feedback of what else is running rather than a huge list of hundreds of jobs that it is difficult to make sense of. In short, ASIC verification regressions map naturally to Job Arrays.

The other problem I have with vManager is a missing feature: Randomization of the order that tests are run. If you have 100 tests to run on five areas of the design then it is natural to compose the vsif file that describes each test in some sort of orderly progression through the tests to run. When the regression is running and you are monitoring partial results then it is easier to get a sense of the overall result if the individual tests are run in a random order. In my regression framework I generated tests in an order, then randomized the set of tests based on Pythons excellent Mersenne Twister implementation. By monitoring results during previous runs I could judge how much of a regression had to run if we needed results in a specified time.

A bit of background (re?)search showed that Mentor have their own regression running framework, that also doesn't use job arrays; and that the Grid Engine cluster management tool supports job arrays.

Monday, August 23, 2010

McCarthy's amb operator in Python

From what I can gather after some reading [1], [2], [3]; McCarthy's amb operator, would allow you to give the possible values for variables; then give functions that constrain the values; and finally extract values for those variables that satisfy the constraints just by calling amb without any arguments.

I decided to work backwards from how I thought it should be used to an implementation.

Given the problem of finding Pythagorean triples for integers <=10, i.e define three variables x, y, and z that can have values 1..10 subject to the constraint that x*x +y*y == z*z. What are the possible values of x, y and z that satisfy the constraint? For an amb-type solution adapted a little to python, being amb-like is all in the way that the problem is expressed. I would like to write something like:


print("\nSmall Pythagorean triples problem")
# Set rage of global variables
x = amb(range(1,11))
y = amb(range(1,11))
z = amb(range(1,11))

# Extract solutions to global variables and print
for _dummy in amb( lambda x, y, z: x*x + y*y == z*z ):
print x, y, z

I took a little liberty in the way I had seen other implementations work, as I had seen examples where the call of amb with an argument of a function just set things up and 'registered' the function, then subsequent calls of amb() without any arguments, would set the global variables used in the lambda function to successive values that satisfy the constraint. I reasoned that calls to amb after registering the constraint function should follow the Python iterator protocol ( with an __iter__ and a next method), so all the values, if any, could be accessed in a loop. I liked the idea of automatically setting global variables so decided to give that a go (but I was prepared to jettison this setting global variables if it proved long-winded).

The solution follows. To modify global names, I noted that functions have f.func_globals which is the globals environment in which the function was created. I get this from the lambda expression. I also cheat a little in requiring the parameter names of the constraint function to be the global names that were previously assigned value ranges.


import itertools as _itertools

class Amb(object):
 def __init__(self):
     self._names2values   = {}       # set of values for each global name
     self._func           = None     # Boolean constraint function
     self._valueiterator  = None     # itertools.product of names values
     self._funcargnames   = None     # Constraint parameter names

 def __call__(self, arg=None):
     if hasattr(arg, 'func_globals'):
         ##
         ## Called with a constraint function. 
         ##
         globls = arg.func_globals
         # Names used in constraint
         argv = arg.__code__.co_varnames[:arg.__code__.co_argcount]
         for name in argv:
             if name not in self._names2values:
                 assert name in globls, \
                        "Global name %s not found in function globals" % name
                 self._names2values[name] = globls[name]
         # Gather the range of values of all names used in the constraint
         valuesets = [self._names2values[name] for name in argv]
         self._valueiterator = _itertools.product(*valuesets)
         self._func = arg
         self._funcargnames = argv
         return self
     elif arg is not None:
         ##
         ## Assume called with an iterable set of values
         ##
         arg = frozenset(arg)
         return arg
     else:
         ##
         ## blank call tries to return next solution
         ##
         return self._nextinsearch()

 def _nextinsearch(self):
     arg = self._func
     globls = arg.func_globals
     argv = self._funcargnames
     found = False
     for values in self._valueiterator:
         if arg(*values):
             # Set globals.
             found = True
             for n, v in zip(argv, values):
                 globls[n] = v
             break
     if not found: raise StopIteration
     return values

 def __iter__(self):
     return self

 def next(self):
     return self()

if __name__ == '__main__':
 if True:
     amb = Amb()

     print("\nSmall Pythagorean triples problem:")
     x = amb(range(1,11))
     y = amb(range(1,11))
     z = amb(range(1,11))

     for _dummy in amb( lambda x, y, z: x*x + y*y == z*z ):
         print x, y, z


 if True:
     amb = Amb()

     print("\nRosetta Code Amb problem:")
     w1 = amb(["the", "that", "a"])
     w2 = amb(["frog", "elephant", "thing"])
     w3 = amb(["walked", "treaded", "grows"])
     w4 = amb(["slowly", "quickly"])

     for _dummy in amb( lambda w1, w2, w3, w4: \
                          w1[-1] == w2[0] and \
                          w2[-1] == w3[0] and \
                          w3[-1] == w4[0] ):
         print w1, w2, w3, w4

 if True:
     amb = Amb()

     print("\nAmb problem from "
         "http://www.randomhacks.net/articles/2005/10/11/amb-operator:")
     x = amb([1, 2, 3])
     y = amb([4, 5, 6])

     for _dummy in amb( lambda x, y: x * y != 8 ):
         print x, y

Program output looks like this:


Small Pythagorean triples problem:
3 4 5
4 3 5
6 8 10
8 6 10

Rosetta Code Amb problem:
that thing grows slowly

Amb problem from http://www.randomhacks.net/articles/2005/10/11/amb-operator:
1 4
1 5
1 6
2 5
2 6
3 4
3 5
3 6

So, what you end up with is a declarative style of programming that is good as an exercise, but the solving algorithm is just a brute-force search over all permutations of possible inputs.

Saturday, July 24, 2010

Python and EDA" at EuroPython

I presented on Thursday at EuroPython. from the feedback at the end it seemed to go well.

I've been told that the video will go up on the EuroPython web site by the end of next week which should be "interesting", as it would be the first time that I've watched myself present, (and I don't do many conference presentations).

What I found different was that I am used to presenting to others within the industry who know about RTL, VHDL, digital simulation, coverage, ... To address this Python-savvy audience I re-wrote a presentation on coverage ranking of Verilog/VHDL and cast it in terms of getting coverage of a Python program and ranking the coverage of three Python tests of the Python program. I then walked them through the ranking algorithm for this much simplified case, then tried to show some of the extra work that needs to be done for the real app.

I decided to not show a more polished version of the script. I showed a version where the main functionality could easily be split into more functions for readability; a psyco import does effectively nothing ... The reason was that what I showed was what I did to convince myself that the algorithm was correct and the speed was fast enough. I would have chucked it at this stage if it was slow. (The need was speed of execution as well as an acceptable algorithm for ranking).

The finish was a "winding down" example of just what you can do when you decide to add documentation for a file format in with the program.

I would like to thank my audience for making someone from an industry they may not have known, feel welcome; and the EuroPython organisers, especially John Pinner, for making it happen.

Sunday, April 25, 2010

More on binary search

I wrote this post: Am I one of the 10% of programmers who can write a binary search? as a reply to: Are you one of the 10% of programmers who can write a binary search? by Mike Taylor.

Since then, Mike has followed up with this: Testing is not a substitute for thinking which starts as a summary of the responses to the earlier post then moves on to his own views on testing.

There is mention of some great work by Darius Bacon who tests around twenty solutions gathered, painstakingly, from the comments. Darius' program is great work, but I thought that my testing strategy might be better so modified his code to:

Add my attempt as function: paddy3118
Graft on a version of my test routine to his list of function variants

The delta code was added to the end and looks like the following:


def paddy3118(data, item):
   if not data:
       return -1
   lo, hi = 0, len(data)-1
   mid = (hi + lo) // 2
   #print(lo,mid,hi)
   while item != data[mid] and lo < hi:
       lo, mid, hi = ( (lo, (lo + mid) // 2, mid)
                       if item < data[mid] else
                       (mid+1, (hi + mid + 1) // 2, hi) )
       #print(lo,mid,hi)
   #0/0
   return mid if item == data[mid] else -1
# PASSES test()
#test(paddy3118)

#print sorted(k for k,v in globals().items() if type(v)==type(dan) and 'test' not in k)
tests = (Max, aaron_maxwell, ashish_yadav, ben_gutierrez, clark,
        dan, dave_r, ed_marshall, guilherme_melo, jasper, martin,
        michael_fogleman, paddy3118, patrick_shields, paul,
        rodrigo_b, scott, si, tomas_henriquez,
        travis_wrapped, yc)

if __name__ == '__main__':
   totpass = 0
   for t in tests:
       fails =0
       for dlimit in range(5):
           data = list(range(dlimit))
           for item in data:
               try:
                   assert t(data, item) != -1
               except:
                   fails += 1
                   print("  ERROR: %s FAIL when looking for %i in %r" %
                         (t.func_name, item, data))
                   break
           for item in list(range(dlimit+1)):
               try:
                   assert t(data, item - 0.5) == -1
               except:
                   fails += 1
                   print("  ERROR: %s FAIL as %3.1f is found in %r" %
                         (t.func_name, item - 0.5, data))
                   break
           if fails:
               break
       if not fails:
           totpass += 1
           print('PASS: %s' % t.func_name)
   print("\n%i/%i = %3.1f%% of functions pass" %
         (totpass, len(tests), 100 * totpass / float(len(tests))))
   '''
    PASS: Max
      ERROR: aaron_maxwell FAIL as -0.5 is found in []
    PASS: ashish_yadav
    PASS: ben_gutierrez
      ERROR: clark FAIL when looking for 0 in [0]
      ERROR: dan FAIL as -0.5 is found in []
      ERROR: dave_r FAIL as -0.5 is found in []
      ERROR: ed_marshall FAIL as -0.5 is found in []
      ERROR: guilherme_melo FAIL as -0.5 is found in []
      ERROR: jasper FAIL as -0.5 is found in []
      ERROR: martin FAIL as -0.5 is found in []
    PASS: michael_fogleman
    PASS: paddy3118
      ERROR: patrick_shields FAIL as -0.5 is found in []
    PASS: paul
      ERROR: rodrigo_b FAIL as -0.5 is found in []
      ERROR: scott FAIL as 1.5 is found in [0, 1, 2]
    PASS: si
      ERROR: tomas_henriquez FAIL when looking for 0 in [0, 1]
      ERROR: tomas_henriquez FAIL as 1.5 is found in [0, 1]
      ERROR: travis_wrapped FAIL when looking for 0 in [0]
    PASS: yc

    8/21 = 38.1% of functions pass
    '''

I find that i cannot agree with a lot of Mikes writing on test driven development as he seems to take an extreme position by separating test from development too much.
When I was creating my function, i new that there were likely to be issues with off-by-ones when modifying range indices, and with handling of an empty data list. I chose to leave a 'mental place-holder' on the off-by-one issues as I thought i could easily write tests to exercise those corners and fill them in. The alternative would be to mentally interpret those corner cases to write them down in 'development', then still have to thoroughly test for them anyway.

When writing my tests:

I thought about the division of the range of the indices- and thought that I should use data of both odd and even lengths, and of sufficient length so that a range might be divided more than one.
I thought about all the comparisons - and decided to test for every item that is in the data.
I thought about the significant ways that an item might not be in the data - and decided to test for items between each and every data item, an items below the smallest and above the largest data items.

My paddy3118 function passed the Darius tests. My test routine failed the guilherme_melo test that was passed by Darius.

The only conclusion I can bring, is that you need to do some white box testing, i.e. tests based on knowledge of the code too. Mike Taylor needs to recognise that imposing rigid categories might well lead to extremes. I guess, their is no substitute for experience, but the unexperienced need to be given rules until they can learn enough to break them - Do we teach testing up front? Or not? (Remembering that given half a chance, testing by the novice will be inadequate).

- Paddy.

Tuesday, April 20, 2010

Am I one of the 10% of programmers who can write a binary search?

I just read this post where the blogger read a passage in a book that stated that only 10% of programmers could write a binary search from the description. I read up to the description of the binary search algorithm given in the book then thought I would try and implement it without finishing the reading of the rest of his blog entry.

I took around half an hour to come up with the following:

'''
 from: http://reprog.wordpress.com/2010/04/19/are-you-one-of-the-10-percent/
 "Only 10% of programmers can write a binary search"

    Binary search solves the problem [of searching within a pre-sorted
    array] by keeping track of a range within the array in which T
    [i.e. the sought value] must be if it is anywhere in the array.
    Initially, the range is the entire array.  The range is shrunk by
    comparing its middle element to T and discarding half the range.
    The process continues until T is discovered in the array, or until
    the range in which it must lie is known to be empty.  In an
    N-element table, the search uses roughly log(2) N comparisons.

'''

def binsearch(data, item):
 if not data:
     return False
 lo, hi = 0, len(data)-1
 mid = (hi + lo) // 2
 #print(lo,mid,hi)
 while item != data[mid] and lo < hi:
     lo, mid, hi = ( (lo, (lo + mid) // 2, mid)
                     if item < data[mid] else
                     (mid+1, (hi + mid + 1) // 2, hi) )
     #print(lo,mid,hi)
 #0/0
 return item == data[mid]

if __name__ == '__main__':
 for dlimit in range(5):
     data = list(range(dlimit))
     print(data)
     for item in data:
         assert binsearch(data, item)
     for item in list(range(dlimit+1)):
         print(item)
         assert not binsearch(data, item - 0.5)

What happens with an empty list, and the little plus ones were added after I put in the assert statements, and I was satisfied enough to go on and complete the reading of the blog.

Extra Restrictions

The blog goes on to state that the rules of test are that no testing is to be done. A compiler can be used to ensure that their are no mechanical errors however.

Whoa! I was shocked and then secretly pleased. I use Python, and automatically added the tests as part of my development process. In no way did I think that I could do the task without passing some tests. I am still not saying that my implementation is correct, but I have tested for a lot of corner-cases such as:

Lists of length zero, one and more than one.
Items in every position of the list.
Items outside the list: less than, greater than and between all members of the list.

I of course assume that items are comparable, and that data is sorted and subscript-able.

All I can conclude is that if given the test and access to a compiler/interpreter of your choice then one should argue convincingly for testing any submission as you would never deign to submit any code as complete unless tested. If the testers remain unconvinced then it may be time to stick to your guns as you don't gain by lowering your standards, especially in an interview situation when you know that testing is right!

P.S. I have since had a look at the code of the Python bisect module and the algorithm given on Wikipedia, and they both seem more elegant than mine.

Friday, April 16, 2010

The Securities and Exchange Commission is to mandate the use of Python?

Read on

Saturday, April 10, 2010

Smartphone Programming, Compare and Contrast

Apple: Steve Jobs Has Just Gone Mad
Android: Android Scripting Environment brings scripting languages to Android.
Nokia: Python for S60

Do you still want that Ipad?

Tuesday, March 30, 2010

Why Python?

Choose for yourself. Have a look at the examples on the Rosetta Code site. You might like to follow the links to a few pages that I did the task descriptions for as they tend to be meatier tasks, but the examples in different languages should allow you to compare how different languages are used to solve the same problems.

Unfortunately it cannot eliminate the issue of the competence of the individual programmers, but it might provide some useful info.

Have fun :-)

Tuesday, March 16, 2010

Expressions to Truth-Tables

I had a need for the data in the truth tables for a standard cell library, rather like that shown here if you look at say page 35 you will see that for for every Cell it gives the cell name; the cell function as a boolean expression, and the truth-table for the cell.

My cell library was an Infineon one and I had the pdf version of the databook. A quick squirt through pdftotext scrambled the truth table, but did preserve the boolean expression of each cells' function, and the cell name.

I wrote a Python script that used a multi-line regexp to extract each cells info including the boolean equation.

The equation from the Infineon databook uses +,*, and ! for boolean or, and and not operators, but also used lots of parentheses so that operator precedence did not affect the value of the infix expression. I realised that I could substitute for the Python bitwise operators, and form a python expression that could then be evaluated with appropriate values for any input names to create a truth table.

I have left out the parsing and present a function to create a truth-table from a boolean expression:


from itertools import product

def truthtable(expressionstring):
  comp = compile(expressionstring.strip(), '-', 'eval')
  inputs = sorted(comp.co_names)
  table = {}
  for values in product( *([(0,1)] * len(inputs)) ):
      valuedict = dict(zip(inputs, values))
      answer = eval(comp, valuedict)
      table[values] = answer
  return expressionstring.strip(), inputs, table

def pptable(ex, inputs, table):
  print("\nTRUTH-TABLE FOR EXPRESSION: %r\n" % ex)
  print(' ' + ' '.join('%2s' % inp for inp in inputs) + ': OUTPUT')
  fmt = '  %i' * len(inputs) + ':  %i'
  for inp, ans in sorted(table.items()):
      print(fmt % tuple(list(inp) + [ans]))

ex, inputs, table = truthtable('(A & (~C)) | D')
pptable(ex, inputs, table)

Going through function truthtable:

compile() gives access to all the names in the compiled expression via attribute .co_names - this is all the names used in the expression.
product() allows you to quickly generate the 1/0 value combinations for the input names.
valuedict assigns a value to each name used in the expression.
eval evaluates the expression with the given values on each name

The output of the program is:


TRUTH-TABLE FOR EXPRESSION: '(A & (~C)) | D'

  A  C  D: OUTPUT
  0  0  0:  0
  0  0  1:  1
  0  1  0:  0
  0  1  1:  1
  1  0  0:  1
  1  0  1:  1
  1  1  0:  0
  1  1  1:  1

- No separate parser required!

Thursday, March 04, 2010

NPR 2002 Puzzle

A puzzle I copied from here via reddit:
1 2 3 4 5 6 7 8 9 = 2002
Put any combination of plus, times, and spaces between the digits on the left to make the identify true.
The answer was easy and was quick to compute.

I realised that you could evaluate all expressions where the numbers 1 through nine are separated by the permutations of either a null string, '', '+' or '*'

The answer came from the following:


>>> from itertools import product
>>> eqn = '%s'.join(list('123456789')) + '==2002'
>>> eqn
'1%s2%s3%s4%s5%s6%s7%s8%s9==2002'
>>> for ops in product(*[['', '+', '*']]*8):
    if eval(eqn % ops):
        print(eqn % ops)

        
1*23+45*6*7+89==2002
1*2+34*56+7+89==2002
>>>

So, there are two solutions.

Go deh!