Go deh!: 2008

Monday, December 29, 2008

Experiences converting a C lex/yacc parser for Verilog to pure Python

href="http://en.wiktionary.org/wiki/back_in_the_day">Back in
the day, simulators only had point delays, so all gate
libraries had buffers on their I/O. Now, we tend to use path, or
pin-to-pin delays in gate level models things have been made more
difficult for me.

My task was to randomly inject
some stuck-at faults in a large gate level Verilog design for
simulation. Individual gates of the design are connected by nets and
although I can force a net I cannot force a node as all nodes
on a net are affected by a force on the net, and you cannot get at
individual nodes by looking to force a net inside a
particular gate model as their are no input buffers. The
other problem with forcing nets/nodes using interactive commands of the
simulator is that you have to compile the design sub-optimally as all
the nets have to be accessible which stops the compiler/elaborator from
performing certain speed optimizations on what will be an execution
speed critical set of simulations.

Brainstorming
with a colleague we came up with the concept of modifying the netlist
to insert logic to either pass-through or force its out put to either 1
or 0 from an external control.

So off I went to href="http://embedded.eecs.berkeley.edu/Alumni/pinhong/scriptEDA/">ScriptEDA,
(great page - awful backdrop), and installed his href="http://www-cad.eecs.berkeley.edu/%7Epinhong/scriptEDA/pythongnd-0.35.tar.gz">Python
Interface for Gate-level Netlist Engine., which I then tried
on my gate level netlist.. Unfortunately our gate-level netlist failed
to parse because it used concatenations of nets which the parser
couldn't handle. I would need to extend the parser to support
net concatenations.

Looking at the
source of the parser it was a C program using href="http://en.wikipedia.org/wiki/Flex_lexical_analyser">lex
and yacc
creating C data structures which were then interfaced to Python via a href="http://www.swig.org/">SWIG generated
wrapper. I don't like to maintain C programs , and I needed
to flex my parsing abilities so decided to spend a short time
researching a pure Python solution.

PLY
The
obvious choice of Python parsing tool seemed to be PLY as it
stands for Python Lex Yacc, so a download later and poking around in
the install, I came across href="http://code.google.com/p/ply/source/browse/trunk/example/yply/yply.py?r=43">yply.py:
style="margin-left: 40px;">

yply.py

This example implements a program yply.py that converts a UNIX-yacc
specification file into a PLY-compatible program.  To use, simply
run it like this:

   % python yply.py [-nocode] inputfile.y >myparser.py

The output of this program is Python code. In the output,
any C code in the original file is included, but is commented out.
If you use the -nocode option, then all of the C code in the
original file is just discarded.

To use the resulting grammer with PLY, you'll need to edit the
myparser.py file. Within this file, some stub code is included that
can be used to test the construction of the parsing tables. However,
you'll need to do more editing to make a workable parser.

Disclaimer:  This just an example I threw together in an afternoon.
It might have some bugs.  However, it worked when I tried it on
a yacc-specified C++ parser containing 442 rules and 855 parsing
states.

yply.py
worked on converting the ScriptEDA yacc source (after an edit to remove
some comments in the .y file that it could not handle). It saved me a
lot of work and saved me from transcription errors. I then
wrote the Python lexer based on the C, and created Python
datastructures to represent the parsed netlist. After
getting everything parsing I experimented until I got the net
concatenation for module ports and instance pins working. All the
classes used for the pares datastructure new how to print themselves in
valid Verilog so I could now read in a gate-level Verilog netlist
then write it out in my 'standard' form.

I
then documented the parsed datastructures and worked out how to
randomly insert my 'tie' gate in an instance port connection for any
input or output instance port by selective modifications of the parsed
netlist. By using the Python standard difflib module I
generated the standard form of the unmodified netlist and compared it
to the modified netlist as an aid when debugging .

After
a weeks work I now have a completely Python based tool to
programmatically modify our gatelevel netlists. Their is no C level
maintenance envisaged and the tool should be flexible enough
to meet future needs - in the past, I may have not contemplated netlist
modification solutions to problems because of a lack of a suitable
parser. PLY turned out to be straight-forward to use

Thursday, December 04, 2008

Debugging vs inserting print statements

After reading this blog entry from Corey Goldberg, you got some people saying use debuggers instead of print statements.

So why can't you put that same print statement in the debugger and have it show up as an inserted print statement in the source? Why can't you do that for any code to insert? Rather than a breakpoint appearing as a blob to the side, you should have the option of it appearing as inserted code, maybe made visually distinct via shading and containing the logic inherent in the breakpoint such as coded indication that it counts down and only fires on the 20'th pass and then executes this code without pausing the run (or not).

A debugger breakpoint that doesn't stop the simulation but prints some text and the value of a few variables could then look like the insertion of a print statement in your source.

If you could make the interface work the other-way round too, i.e. stick the IDE into Add Debug Text mode, then code added would automagically be turned into equivalent statements executed by the debugger, then you could really fight those pesky inserters of print statements for debugging purposes by giving them a debugger whose interface works that way anyway ;-)

</only_slightly_tongue_in_cheek>

Yet Another Blog Entry On Doctests.

Doctests are a current hot Python blog topic. I thought I would point out that many of the bloggers, especially those pointing out its problems, seem to be seasoned Pythoneers, aware of alternatives and so able to compare doctesting with other forms of testing..

I've found that there is a significant number of programmers out there who just don't write reusable and maintainable tests at all. They may test to ensure their program 'works' when they wrote it, but that is all - job done, finito!

Doctests perform the very important task of lowering the entry point to the subject of testing. Novice pythoneers ready for exposure to testing are likely to have used the python shell to "try things out"; it is a small cognitive step to point out that by cutting and pasting those explorations from the shell into a docstring and adding some "magic sauce" to invoke it, you get a test. A trainer can then move on to the "why" of tests with the student thinking "I do most of it anyway", rather than thinking that it is yet more to learn

I am not saying that doctest is without flaws. I am saying that it is still a good tool, without being perfection.

Tuesday, December 02, 2008

Knapsack Problem

The Rosetta Code Blog this week mentioned that entries starting with a K were missing so I reviewed my sources and added a knapsack type problem to RC together with the python solution.

It's not the first time that I've remembered that itertools.product can do arbitrary nested for loops. Before, I would have had to employ recursive function calls.

- Paddy.

Sunday, November 30, 2008

New year. New PC

I'm thinking of retiring my trusty 17" laptop as my main PC and going quad core.I want to do it on the cheap, so was thinking of building from parts for hopefully arounf £400 and adding this monitor, or this TV/monitor. The screens are made by the same company and look very much the same except the TV version has extra inputs and a TV tuner. I wonder if anyone has this TV connected to a PC and can comment?

I don't play graphic-intensive games so can get away with a basic graphics card.

I want the quad cores to try out more parallel programming. I have a VCD to toggle count program that reads a large VCD file . I would like to work out how to accurately do this on the four cores and see how close to linear speed-up I can get. Our regression simulations at work leave behind approximately 20K simulation logs and coverage databases - I'm starting to think of ways of data-mining them which would be aided by parallel processing (most of works compute farm PC's have between 8 and 16 cores).

Priority would go to getting a little X-at-a-time script moved from bash to Python. Given a list of jobs to do - one per line in a file; the original bash script will execute X of them in the background, starting more, when others finish.

So, if you have any ideas of cheap bundles for creating quad-core PC's (In the UK), fire away :-)

- Paddy,

Saturday, November 15, 2008

Smearing the Static vs Dynamic Typing Dichotomy

CPython
needs to ship with a compiler!
There, I've said it. But what
are my reasons?

I truly believe that your
programming language influences the way you go about solving a problem,
and their is a mindset to programming in a dynamic language like
Python/Perl/TCL/AWK/... that can lead to different types solutions (and
even perceived problems), than programmers in say
Java/C/C++/C#. (OK some of the differences are also down to other
factors such as IDE's and OS which also affect how you solve an issue).

Lately
I have seen href="http://www.25hoursaday.com/weblog/2008/01/02/DoesC30BeatDynamicLanguagesAtTheirOwnGame.aspx">blog
posts commenting on the 4.0 release of C# will be
adding dynamic type resolution capabilities via a keyword that will
allow C# to finally do Duck Typing. The href="http://boo.codehaus.org/">Boo language has
done this trick from its outset. Thinking about it though, I don't
think it will make C# programmers style change to mimic that of
dynamic language programmers as they would have to stick their keyword
on every definition and leave behind their perceived safety of run-time
type checking. They will, I think,rather use it as another resource
that they can dip into, forming rules/design patterns on where it fits
best and use it from overwhelmingly statically typed source code.

When
I look critically at Python, (yep - I do do that from time to time),
Python 3.0 has optional type decorations - but no compiler,
and Python distributions such as Enthoughts, include external modules
to help in creating compiled code to link to Python,, but we need to
have a complete solution out of the box i.e. CPython should be shipped
with a defined method of compiling 'Python-like' source code into
machine code as part of its standard library and enshrined in the
reference guide. That would allow Python programmers to do the opposite
of what the C# programmers can do: write a mainly dynamic program -
with dynamic language idioms, but be able to seamlessly have part of
their recognisable Python code compiled . Maybe all it would take is
the addition of a new static
keyword to apply to classes/functions/modules.

You
would need to think of static/dynamic as a continuum where instead of
having languages bunched at either end, you would have a smearing out
towards the middle.

Existing languages have too much
static vs dynamic baggage, but maybe a new language with type
inferencing and where you were forced to first define the default type
checking regime as either dynamic or static, then be given the option
of declaring particular 'things' as being at either run-time or
compile-time would lead to a language that might breed new
programmers with more interesting ways to solve problems.

-
Paddy.

Sunday, November 02, 2008

Comments on a "Python Tutorial"

Hi,
I thought I'd go through your tutorial on Python and make some notes if thats OK.

"No ++ / --":

I think this was the same kind of decision as Python not allowing assignments in if conditionals. There is a a high instance of problems caused by confusion over the misuse of increment/decrement operators in languages like C. They can make it harder to read a program.

"Slow in comparison to compiled languages" - (Here comes the dynamic language supporters usual reply): Yes, but often getting something right is much more important and often never requires the speep that Python gives you. People have prototyped in Python and shipped the prototype as it fits all quality metrics in a far shorter development time. I recently had a case of that myself here.

"Whitespace for indentation": Arguably a positive feature. It should read "whitespace for block delineation" or something. Run a pretty printer on say a C source file and it will try and indent it so that the visible level of indentation mirrors the logical structure of the code. Code is hard to read if the indentation does not follow the logic. Python recognises this and goes further in saying that the common use of block start and end delimeters such as {/} or begin/end are superfluous and only help in writing misleading code.

You seem to have missed the importance of docstrings in Python.

range "(good for indexing)": True, but in idiomatic Python you are much more likely to iterate over the members of a collection object rather than create an index using range:

for x in my_list:

rather than:

for i in range(len(my_list)):

....x = my_list[i]

"Functions work in python much like they do any other language": those with only Java and C experience might miss out on the extended argument passing of Python functions if not made aware.

"Classes are more basic than what you will be use to from java": In what way? There maybe less to learn to become equally proficient but I see this as a good thing. Note that Python is dynamic, meaning that, if you wanted to, you can modify classes and instances at run-time. Its not necessarily a X language does classes better than Y. Python does things differently and if you try and constrain Python to how you use OO in Java, you won't get the best from it.

"from os import *": It's a bad Python habit. You want to warn students off this, as maintenance can suffer.

"We can raise our own exceptions": It could be but shouldn't be a string. best to use an exception type.

"Numbers": There are decimals too, which are good for financial calculations.

"Tuples": If each position in a sequence has explicit meaning then you should use a tuple, e.g. a point on a surface might be the tuple (x,y). coordinates of a car would more likely be a list of points.

"Strings": Misses triple quoted strings.

Filter and map, although present, have declined in use due to the later addition of the powerful list comprehension and generator comprehension features.

One-liners in Python: Difficult to do. Usually people write multi-line scripts in an editor. In a shell such as bash which has multi-line string delimeters you might be able to use a multi-line single quote for bash and embed a multi-line python program using Pythons -c argument and restricting your Python to double quotes if needed.

Have fun with Python.

- Paddy.; November 2, 2008 5:26 AM
Paddy3118 said...: Ouch!

My notes are longer than your blog entry :-)

Thursday, October 30, 2008

Dear Americans

Please vote for Obama.

Whoever you vote for - our British Prime Minister has to ~~suck-up to~~ support, and it has been painful, as a Brit, watching what our elected government has had to do these past few years. Unlike the French or the Germans, we have had to go along with so many of your current administrations schemes, and so have been seen as no more than a lickspittle.

It is time to make America great again! (We Brits can't stand the reflected ridicule).

P.S. The obligatory Python angle: If you are going to vote electronically, then use Open Source tools such as pvote.

Tuesday, October 21, 2008

On Regression Test Optimization/ It's My Data

When regression testing a large ASIC using constrained random test techniques (or directed tests for that matter), we collect coverage metrics on each seeded test and routinely rank, (or optimize), the tests to find out what randomly generated tests are 'significant'.

The usual meaning of significant is "what is the smallest set of tests that would give the same overall coverage". The tools don't try every combination of tests, (I have used a simple greedy algorithm). This got me thinking of what other ways I might want to calculate the test rankings.

I came up with two other ways of ranking tests that might be useful:

File centred ranking: Using coverage results expressed on a per-source-file basis, extract the ranking of tests for full coverage of each source file individually; finally rank just the tests contributing to individual files on how they contribute to all files of the design.
Instance centred ranking: Using coverage results expressed on a per-instance basis, extract the ranking of tests for full coverage of each instance individually; finally rank just the tests contributing to individual instances on how they contribute to all instances in the design.

(Lets call what we normally do, "Design centred ranking", as the ranking is looking at all the tests coverage of the whole design).

File and instance centred ranking leave behind useful information for the designers, what tests target parts of the design they are working on. I guess (because this is another thought experiment), that FCR and ICR would never give smaller ranked test lists than DCR.

All three of the above methods of ranking tests could do with flagging which tests provide sole coverage for a cover point. Some tests may cover points that other tests cover, other tests may cover points that only that test covers it might be useful to be able to show this information.

Its my Data

If the tool vendors supported an interface to Python, then data mining within your coverage results would be a lot simpler, and get us part way to the kind of environment Scientists use in SciPy. Expressing test coverage as Python sets, you could roll your own coverage processing.

Friday, October 03, 2008

Max Licenses In Use

I just created the task, an example log file and Python solution for this new programming task on Rosetta Code.

The task is a simplified version of something I solved at work when I thought we were being overcharged.

Tuesday, September 30, 2008

"Frequently Forgotten Fundamental Facts about Software Engineering"

This has good rules of thumb about software production.

Thursday, September 25, 2008

How Nerdy are you

I am nerdier than 93% of all people. Are you a nerd? Click here to find out!

They obviously have it wrong. I have a family, and, and ...

Tuesday, September 23, 2008

Python Fractions issue.

There
seems to be a problem/difference in calculating with the new fractions
module when comparing Python 26rc2 and 30rc1

I was
reading the paper " href="http://conference.scipy.org/proceedings/SciPy2008/paper_3/full_text.pdf">Interval
Arithmetic: Python Implementation and Applications" and
thought to try the first example function f(x,y), which needs more
precision than Python floating point provides (on my Windows PC), to
calculate a sensible answer.

Ahah, I thought, lets
try this with Fractions - so I first installed Python 30rc1, typed in
the Function - saw the rubbish result, then used Fractions and got the
'right' result.
I then wondered if fractions had been
back-ported to Python 2.6rc2, found that it had, but got different
results.

As you can see from the table, Python
2.6rc2 has a problem calculating t3 = F(11,2) * y**8 + x/(2*y), where F
is Fraction.

I wonder where the problem
lies?

style="text-align: left; margin-left: auto; margin-right: auto; font-family: monospace;"
border="4" cellpadding="2" cellspacing="2"> style="white-space: nowrap; text-align: left; vertical-align: top;">Python
3.0rc1 (r30rc1:66507, Sep 18 2008, 14:47:08) [MSC v.1500 32 bit (Intel)]
on
win32
Type "help", "copyright", "credits" or "license" for
more information.
>>> from fractions
import Fraction as F
>>>
def f(x,y):
...     return
(
...
(333.75 - x**2)* y**6 + x**2 *
...
(11* x**2 * y**2 - 121 * y**4 - 2)
...
+ 5.5 * y**8 + x/(2*y))
...
>>>
f(77617.0, 33096.0)
1.1726039400531787
>>>
def f2(x,y):
...
return (
...
(333 + F(3,4) - x**2)* y**6 + x**2 *
...
(11* x**2 * y**2 - 121 * y**4 - 2)
...
+ F(11,2) * y**8 + x/(2*y))
...
>>>
f2(77617.0, 33096.0)
1.1726039400531787
>>>
f2(77617, 33096)
-0.82739605994682131
>>>
x,y = 77617, 33096
>>> t1 = (333 + F(3,4)
- x**2)* y**6 + x**2
>>> t1
Fraction(-7917110903377385049079188231255750815,
1)
>>> t2 = (11* x**2 * y**2 - 121 * y**4
- 2)
>>> t2
-72586759091903445314
>>>
t3 = F(11,2) * y**8 + x/(2*y)
>>> t3
7.9171113406689609e+36
>>> style="white-space: nowrap; text-align: left; vertical-align: top; width: 50%;">Python
2.6rc2 (r26rc2:66507, Sep 18 2008, 14:27:33) [MSC v.1500 32 bit
(Intel)] on win32
Type "copyright", "credits" or "license()"
for more information.
IDLE
2.6rc2
>>>
from fractions import Fraction as F
>>>
def f(x,y):
    return (

    (333.75 - x**2)* y**6 + x**2 *

    (11* x**2 * y**2 - 121 * y**4 - 2)

    + 5.5 * y**8 + x/(2*y))

>>>
f(77617.0, 33096.0)
1.1726039400531787
>>>
def f2(x,y):
    return (

    (333 + F(3,4) - x**2)* y**6 + x**2 *

    (11* x**2 * y**2 - 121 * y**4 - 2)

    + F(11,2) * y**8 + x/(2*y))

>>>
f2(77617.0, 33096.0)
1.1726039400531787
>>>
f2(77617, 33096)
style="background-color: rgb(255, 102, 102); font-weight: bold;">Fraction(-1,
1)
>>> x,y = 77617, 33096
>>>
t1 = (333 + F(3,4) - x**2)* y**6 + x**2
>>>
t1
Fraction(-7917110903377385049079188231255750815, 1)
>>>
t2 = (11* x**2 * y**2 - 121 * y**4 - 2)
>>>
t2
-72586759091903445314L
>>> t3
= F(11,2) * y**8 + x/(2*y)
>>> t3
style="background-color: rgb(255, 102, 102); font-weight: bold;">Fraction(7917111340668961361101134701524942849,
1)
>>>

Sunday, August 10, 2008

Monty Hall Problem Simulations

I
had started a href="http://www.rosettacode.org/wiki/Monty_Hall_simulation">new
topic on Rosetta Code on the Monty Hall problem, using an
example I had laying around that I had written in Python to simulate
the effects of different player strategies. The game goes like this:

The
contestant in in front of three doors that he cannot see
behind..
The three doors conceal one
prize and the rest being booby prizes, arranged randomly.
The
Host asks the contestant to choose a door.
The host
then goes behind the doors where only he can see what is concealed,
then always
opens one
door, out of the other s not chosen by the contestant, that
must reveal a booby prize to the contestant.
The
host then asks the contestant if he would like either to stick with his
previous choice, or switch and choose the other remaining closed door.

It
turns out that if the contestant follows a strategy of always switching
when asked, then he will maximise his chances of winning..

I
then re-wrote the simulator in AWK (gawk), so there are now
at least two implementations of the simulator on Rosetta Code.,

The
simulator results show that:

A strategy of
never switching wins 1/3rd of the time.
A strategy
of randomly switching wins 1/2 of the time.
A
strategy of always switching wins 2/3rds of the time.

style="margin-left: 40px;">

Tuesday, August 05, 2008

Spiral

This
exercise I thought up myself after doing href="http://paddy3118.blogspot.com/2008/08/zig-zag.html">ZigZag.
"
Produce a spiral array.. A spiral array is a square arrangement of the
first N² integers, where the numbers increase
sequentially as you go around the edges of the array
spiralling inwards.

For example, given 5, produce
this array:

style="font-family: monospace;"> 0
1 2 3 4 style="font-family: monospace;"> style="font-family: monospace;">15 16 17 18 5 style="font-family: monospace;"> style="font-family: monospace;">14 23 24 19 6 style="font-family: monospace;"> style="font-family: monospace;">13 22 21 20 7 style="font-family: monospace;"> style="font-family: monospace;">12 11 10
9 8

The
Algorithm

Again I choose to define the x axis as the column
numbers increasing linearly left to right; and the y axis being the row
numbers increasing linearly top to bottom; and with the top left corner
being (0,0).
If you think of the outer part of the spiral, it
can be defined as how x and y change, i.e. deltas. First along the top
row dx,dy =
(1,0) meaning that for successive places x changes by 1
and y does not.

Similarly
for the next straight run , down the RHS: style="font-family: monospace;"> dx,dy = (0,1).
Along
the bottom: dx,dy =
(-1,0).
And back to the top along the LHS: style="font-family: monospace;">dx,dy = (0,-1).

The
next time around the loop, dx,dy take on those same values.

When
to stop going in a straight line and change direction?
style="margin-left: 40px;">

When you
would go off the edge of the array
If you are about
to 'step' on a position that is already filled.

When
to stop?

When
you have placed the last integer.

The Program

I have two ways of generating successive dx,dy
values; a formula in lines 19 and 34, and a commented out use of
itertools.cycle in lines 14,17,18 and 33 which could be used instead.
They generate the same sequence.
When to change direction is
done by initializing myarray with None values in line 22 and the
conditional statement at line 29
When to stop is handled by
the outer while loop at line 24.

I added a flag at
line 15 that allows you to trace the calculation

 1 '''
 color="#804040"> 2 Author: Donald 'Paddy' McCarthy
 color="#804040"> 3 Email: paddy3118@gmail.com
 color="#804040"> 4 File: spiral.py
 color="#804040"> 5 
 color="#804040"> 6 Routine creates a square array filled with numbers that increase
 color="#804040"> 7 in a spiral pattern into the center.
 color="#804040"> 8 
 color="#804040"> 9 Note: for spiralling out just swap array[x,y] for n**2-1-array[x,y]
 color="#804040">10 Note: Vim colourizing plus 
 color="#804040">11        :%s/\(&nbsp;&nbsp;\)\?  / color="#6a5acd">\1\&nbsp;\&nbsp;/g
 color="#804040">12 '''
 color="#804040">13 
 color="#804040">14 #from itertools import cycle
 color="#804040">15 explain = True
 color="#804040">16 def  color="#008080">spiral(n):
 color="#804040">17      color="#0000ff">#deltas = cycle( ((1,0),(0,1),(-1,0),(0,-1)) )
 color="#804040">18      color="#0000ff">#dx,dy = deltas.next() # Starting increments
 color="#804040">19     dx,dy = 1,0             color="#0000ff"># Starting increments
 color="#804040">20     x,y = 0,0               color="#0000ff"># Starting location
 color="#804040">21     i = 0                   color="#0000ff"># starting value
 color="#804040">22     myarray = [[None]* n  color="#804040">for j  color="#804040">in range(n)]
 color="#804040">23      color="#804040">if explain:  color="#804040">print " color="#ff00ff">dx,dy ->", dx,dy
 color="#804040">24      color="#804040">while i < n**2:
 color="#804040">25         myarray[x][y] = i
 color="#804040">26          color="#804040">if explain:  color="#804040">print " color="#ff00ff">x,y->[x,y]", x,y, myarray[x][y]
 color="#804040">27         i += 1
 color="#804040">28         nx,ny = x+dx, y+dy
 color="#804040">29          color="#804040">if 0<=nx<n  color="#804040">and 0<=ny<n  color="#804040">and myarray[nx][ny] == None:
 color="#804040">30             x,y = nx,ny
 color="#804040">31              color="#804040">continue
 color="#804040">32          color="#804040">else:
 color="#804040">33              color="#0000ff">#dx,dy = deltas.next()
 color="#804040">34             dx,dy = (-dy,dx)  color="#804040">if dy  color="#804040">else (dy,dx)
 color="#804040">35              color="#804040">if explain:  color="#804040">print " color="#ff00ff">dx,dy ->", dx,dy
 color="#804040">36             x,y = x+dx, y+dy
 color="#804040">37      color="#804040">return myarray
 color="#804040">38 def  color="#008080">printspiral(myarray):
 color="#804040">39     n = range(len(myarray))
 color="#804040">40      color="#804040">for y  color="#804040">in n:
 color="#804040">41          color="#804040">for x  color="#804040">in n:
 color="#804040">42              color="#804040">print " color="#ff00ff">%2i" % myarray[x][y],
 color="#804040">43          color="#804040">print
 color="#804040">44 
 color="#804040">45 printspiral(spiral(6))

Sample
output:

style="font-family: monospace;">>>>
printspiral(spiral(4)) style="font-family: monospace;"> style="font-family: monospace;">dx,dy -> 1 0 style="font-family: monospace;"> style="font-family: monospace;">x,y->[x,y] 0 0 0 style="font-family: monospace;"> style="font-family: monospace;">x,y->[x,y] 1 0 1 style="font-family: monospace;"> style="font-family: monospace;">x,y->[x,y] 2 0 2 style="font-family: monospace;"> style="font-family: monospace;">x,y->[x,y] 3 0 3 style="font-family: monospace;"> style="font-family: monospace;">dx,dy -> 0 1 style="font-family: monospace;"> style="font-family: monospace;">x,y->[x,y] 3 1 4 style="font-family: monospace;"> style="font-family: monospace;">x,y->[x,y] 3 2 5 style="font-family: monospace;"> style="font-family: monospace;">x,y->[x,y] 3 3 6 style="font-family: monospace;"> style="font-family: monospace;">dx,dy -> -1 0 style="font-family: monospace;"> style="font-family: monospace;">x,y->[x,y] 2 3 7 style="font-family: monospace;"> style="font-family: monospace;">x,y->[x,y] 1 3 8 style="font-family: monospace;"> style="font-family: monospace;">x,y->[x,y] 0 3 9 style="font-family: monospace;"> style="font-family: monospace;">dx,dy -> 0 -1 style="font-family: monospace;"> style="font-family: monospace;">x,y->[x,y] 0 2 10 style="font-family: monospace;"> style="font-family: monospace;">x,y->[x,y] 0 1 11 style="font-family: monospace;"> style="font-family: monospace;">dx,dy -> 1 0 style="font-family: monospace;"> style="font-family: monospace;">x,y->[x,y] 1 1 12 style="font-family: monospace;"> style="font-family: monospace;">x,y->[x,y] 2 1 13 style="font-family: monospace;"> style="font-family: monospace;">dx,dy -> 0 1 style="font-family: monospace;"> style="font-family: monospace;">x,y->[x,y] 2 2 14 style="font-family: monospace;"> style="font-family: monospace;">dx,dy -> -1 0 style="font-family: monospace;"> style="font-family: monospace;">x,y->[x,y] 1 2 15 style="font-family: monospace;"> style="font-family: monospace;">dx,dy -> 0 -1 style="font-family: monospace;"> style="font-family: monospace;"> 0
1 2 3 style="font-family: monospace;"> style="font-family: monospace;">11 12 13 4 style="font-family: monospace;"> style="font-family: monospace;">10 15 14 5 style="font-family: monospace;"> style="font-family: monospace;"> 9
8 7 6 style="font-family: monospace;"> style="font-family: monospace;">>>>

I have an interesting problem at work which has been taxing me before
the Easter break. One of the less traditional ways forward is to write
a toggle count utility - something to take a simulation of a
hardware design and count which nets transition both to a zero and a
one in the simulation. For various reasons I could not get the design
to simulate in a modern version of a simulator with in-built toggle
counting without having to rebuild a C-based testbench for a later
version of an OS and a simulator that a major part of the TB is not
certified on.

I decided to spend my Easter Monday writing a VCD
to toggle-count
utility.

I started by searching my C: drive for a non-trivial vcd file and found
a ~200k/300 net sample generated from the Icarus Verilog program. I
then googled for the file format and found the IEEE 1394-2001 document
and went for that. I googled for ready-made utilities too and found an
interesting document on mining VCD files and parsing RTL to generate
all sorts of coverage metrics, but the VCD readers I found were in C
and, e.g. GTKWave
, and didn't seem to clarify how to write a vcd reader. (OK,
specifically - the liberal sprinkling of goto's in GTKWave sources put
me off, there - I've said it :-).

I moved on to read the IEEE spec, peruse my example vcd files, reject
writing an evcd (extended VCD), parser as I could
later extend a VCD parser, and I think the works simulator has enough
controls to tailor the VCD generated to be what I want.

The VCD files I will eventually have to work on may be a gigabyte in
size so I want to step through the file one token at a time gathering
stats as I go, and my reading of the VCD file format seemed to show
that that was possible, so the fist line of Python I wrote was this
line:

175   tokeniser = (word for line in f for word in line.split() if word)

The above generator
comprehension, just gives successive words from the VCD file
without storing a huge chunk of the file in memory. tokeniser worked
originally with :

154 keyword2handler = {
155   # declaration_keyword ::=
156   "$comment":        drop_declaration,
157   "$date":           vcd_date,
158   "$enddefinitions": vcd_enddefinitions,
159   "$scope":          vcd_scope,
160   "$timescale":      vcd_timescale,
161   "$upscope":        vcd_upscope,
162   "$var":            vcd_var,
163   "$version":        vcd_version,
164   # simulation_keyword ::=
165   "$dumpall":        vcd_dumpall,
166   "$dumpoff":        vcd_dumpoff,
167   "$dumpon":         vcd_dumpon,
168   "$dumpvars":       vcd_dumpvars,
169   "$end":            vcd_end,
170   }

and the 'switch statement':

176   for count,token in enumerate(tokeniser):
181       keyword2handler[token](tokeniser, token)

... to form the guts of the program but when I had fleshed out all the
declaration keyword handlers beyond mere stubs, I decided to
add the second half of the parse routine as I realised that I would
skip all the simulation keywords and only needed to concentrate on the
first character of a token to determine what to do. This led to the
finished form of function vcd_toggle_count

I had thought that the rules for left-expanding a VCD value to a larger
word size might lead to a compact Python solution and i was pleased
with my result, which I tested in the Python interpreter as:

>>> for
number in "10 X10 ZX0 0X10".split():
...
extend = size-len(number)
...
print "%-4s -> %s" % (number,
('0' if number[0]=='1' else number[0])*extend + number)
...
10
-> 0010
X10
-> XX10
ZX0
-> ZZX0
0X10 -> 0X10

.This became:

 81       extend = stats.size - len(number)
 82       if extend:
 83         number = ('0' if number[0]=='1' else number[0])*extend + number

Well, enough prattling on about how I wrote the program, the full
source is below, and I'm off to work to to try it for real. It should
work, and since I have done no optimisations for speed as yet , I am
confident that I could get acceptable run-times out of its
variants to handle the gigabyte VCD file if I get one.

  1 #!python
  2 '''
  3  Extract toggle count from vcd file
  4 
  5   Refer to IEEE standard 1364 2001
  6   (http://inst.eecs.berkeley.edu/~cs150/ProtectedDocs/verilog-ieee.pdf)
  7 
  8   Author Donald 'Paddy' McCarthy (C) 24 March 2008
  9 '''
 10 
 11 from __future__ import with_statement
 12 from itertools import dropwhile, takewhile, izip
 13 from collections import defaultdict
 14 from pprint import pprint as pp
 15 
 16 vcdfile = r"C:\cygwin\home\HP DV8025EA\tmp\ivtest_v1.0\test_div16.vcd"
 17 
 18 class VCD(object):
 19   def __init__(self):
 20     self.scope = []
 21     self.idcode2references = defaultdict(list)
 22     self.reference2idcode = dict()
 23     self.enddefinitions = False
 24     self.id2stats = dict()  # Maps id to its accumulated statistics
 25   def textstats(self):
 26     total, updown, uponly, downonly = 0,0,0,0
 27     out = []
 28     for ref in sorted(self.reference2idcode.keys()):
 29       id = self.reference2idcode[ref]
 30       stats = self.id2stats[id]
 31       if stats.size == 1:
 32         total +=1
 33         if stats.zero2one and stats.one2zero:
 34           updown +=1
 35           covered = 'PASS'
 36         elif stats.zero2one:
 37           uponly +=1
 38           covered = 'FAIL0'
 39         elif stats.one2zero:
 40           downonly +=1
 41           covered = 'FAIL1'
 42         else:
 43           covered = 'FAIL10'
 44         out.append( "  %-50s %s" % ( '"'+".".join(x[1] for x in ref)+'":', (covered, stats.zero2one, stats.one2zero)) )
 45       else:
 46         total += stats.size
 47         for count, (one2zero, zero2one) in enumerate(izip(stats.one2zero, stats.zero2one)):
 48           if zero2one and one2zero:
 49             updown +=1
 50             covered = 'PASS'
 51           elif zero2one:
 52             uponly +=1
 53             covered = 'FAIL0'
 54           elif stats.one2zero:
 55             downonly +=1
 56             covered = 'FAIL1'
 57           else:
 58             covered = 'FAIL10'
 59           name = ".".join( x[1] for x in (ref+(('BIT:','<'+str(count)+'>'),)) )
 60           out.append( "  %-50s %s" % ( '"'+name+'":', (covered, zero2one, one2zero)) )
 61     header = "# TOGGLE REPORT: %g %%, %i / %i covered. %i up-only, %i down-only." % (
 62       updown/1.0/total*100, updown, total, uponly, downonly )
 63     body = "toggle={\n" + "\n".join(out) + '\n  }'
 64     return header, body
 65 
 66   def scaler_value_change(self, value, id):
 67     if value in '01' :
 68       stats = self.id2stats[id]
 69       if not stats.value:
 70         stats.value = value
 71       elif stats.value != value:
 72         stats.value = value
 73         if value == '0':
 74           stats.one2zero +=1
 75         else:
 76           stats.zero2one +=1
 77 
 78   def vector_value_change(self, format, number, id):
 79     if format == 'b':
 80       stats = self.id2stats[id]
 81       extend = stats.size - len(number)
 82       if extend:
 83         number = ('0' if number[0]=='1' else number[0])*extend + number
 84       newdigit, newone2zero, newzero2one = [],[],[]
 85       for digit, olddigit, one2zero, zero2one in izip(number, stats.value, stats.one2zero, stats.zero2one):
 86         if digit in '01' and olddigit and olddigit != digit:
 87           if digit == '0':
 88             one2zero +=1
 89           else:
 90             zero2one +=1
 91         elif digit not in '01':
 92           digit = olddigit
 93         newdigit.append(digit)
 94         newone2zero.append(one2zero)
 95         newzero2one.append(zero2one)
 96       stats.value, stats.one2zero, stats.zero2one = newdigit, newone2zero, newzero2one
 97 
 98 
 99 class IdStats(object):
100   def __init__(self, size):
101     size = int(size)
102     self.size = size
103     if size ==1:
104       self.value = ''
105       self.zero2one = 0
106       self.one2zero = 0
107     else:
108       # stats for each bit
109       self.value       = ['' for x in range(size)]
110       self.zero2one = [0 for x in range(size)]
111       self.one2zero = [0 for x in range(size)]
112   def __repr__(self):
113     return "<IdStats: " + repr((self.size, self.value, self.zero2one, self.one2zero)) + ">"
114 
115 
116 vcd = VCD()
117 
118 def parse_error(tokeniser, keyword):
119   raise "Don't understand keyword: " + keyword
120 
121 def drop_declaration(tokeniser, keyword):
122   dropwhile(lambda x: x != "$end", tokeniser).next()
123 
124 def save_declaration(tokeniser, keyword):
125   vcd.__setattr__(keyword.lstrip('$'),
126                   " ".join( takewhile(lambda x: x != "$end", tokeniser)) )
127 vcd_date      = save_declaration
128 vcd_timescale = save_declaration
129 vcd_version   = save_declaration
130 
131 def vcd_enddefinitions(tokeniser, keyword):
132   vcd.enddefinitions = True
133   drop_declaration(tokeniser, keyword)
134 def vcd_scope(tokeniser, keyword):
135   vcd.scope.append( tuple(takewhile(lambda x: x != "$end", tokeniser)))
136 def vcd_upscope(tokeniser, keyword):
137   vcd.scope.pop()
138   tokeniser.next()
139 def vcd_var(tokeniser, keyword):
140   var_type, size, identifier_code, reference = tuple(takewhile(lambda x: x != "$end", tokeniser))
141   reference = vcd.scope + [('var', reference)]
142   vcd.idcode2references[identifier_code].append( (var_type, size, reference))
143   vcd.reference2idcode[tuple(reference)] = identifier_code
144   vcd.id2stats[identifier_code] = IdStats(size)
145 def vcd_dumpall(tokeniser, keyword): pass
146 def vcd_dumpoff(tokeniser, keyword): pass
147 def vcd_dumpon(tokeniser, keyword): pass
148 def vcd_dumpvars(tokeniser, keyword): pass
149 def vcd_end(tokeniser, keyword):
150   if not vcd.enddefinitions:
151     parse_error(tokeniser, keyword)
152 
153 
154 keyword2handler = {
155   # declaration_keyword ::=
156   "$comment":        drop_declaration,
157   "$date":           vcd_date,
158   "$enddefinitions": vcd_enddefinitions,
159   "$scope":          vcd_scope,
160   "$timescale":      vcd_timescale,
161   "$upscope":        vcd_upscope,
162   "$var":            vcd_var,
163   "$version":        vcd_version,
164   # simulation_keyword ::=
165   "$dumpall":        vcd_dumpall,
166   "$dumpoff":        vcd_dumpoff,
167   "$dumpon":         vcd_dumpon,
168   "$dumpvars":       vcd_dumpvars,
169   "$end":            vcd_end,
170   }
171 keyword2handler = defaultdict(parse_error, keyword2handler)
172 
173 def vcd_toggle_count(vcdfile):
174   f = open(vcdfile)
175   tokeniser = (word for line in f for word in line.split() if word)
176   for count,token in enumerate(tokeniser):
177     if not vcd.enddefinitions:
178       # definition section
179       if token != '$var':
180         print token
181       keyword2handler[token](tokeniser, token)
182     else:
183       if count % 10000 == 0:
184         print count, "\r",
185       c, rest = token[0], token[1:]
186       if c == '$':
187         # skip $dump* tokens and $end tokens in sim section
188         continue
189       elif c == '#':
190         vcd.now = rest
191       elif c in '01xXzZ':
192         vcd.scaler_value_change(value=c, id=rest)
193       elif c in 'bBrR':
194         vcd.vector_value_change(format=c.lower(), number=rest, id=tokeniser.next())
195       else:
196         raise "Don't understand: %s After %i words" % (token, count)
197   print count
198   f.close()
199 
200 vcd_toggle_count(vcdfile)
201 header, body = vcd.textstats()
202 print '\n'+header+'\n\n'+body+'\n'

STOP PRESS!
I'm back from work and the program worked with minor changes:

I used the fileinput module to allow greater flexibility in specifying the input VCD file.
Works, simulator had a slightly different interpretation of the spec around the definition of $var. (The spec needs to explicitely mark where spaces can/must occur).
I missed adding commas to separate the output lines which should form a valid Python dict.

With an unchanged core algorithm the program churned through 200Mbytes of VCD file in 3 minutes. 2 gigs in 30 minutes is fine for me.

Monday, December 29, 2008

Thursday, December 04, 2008

Tuesday, December 02, 2008

Sunday, November 30, 2008

Saturday, November 15, 2008

Sunday, November 02, 2008

Thursday, October 30, 2008

Tuesday, October 21, 2008

Its my Data

Friday, October 03, 2008

Tuesday, September 30, 2008

Thursday, September 25, 2008

Tuesday, September 23, 2008

Sunday, August 10, 2008

Tuesday, August 05, 2008

TheAlgorithm

The Program

Sampleoutput:

Monday, August 04, 2008

The code:

Sunday, August 03, 2008

Saturday, July 26, 2008

The Task

The Problem

Initial Thoughts

Scratching the Itch

So Fast. Must Check.

Endpoint

Friday, July 04, 2008

Saturday, June 21, 2008

Friday, June 13, 2008

Saturday, May 24, 2008

His point 1:

His point 2:

His point 3:

His point 4:

Saturday, April 26, 2008

Friday, April 11, 2008

Tuesday, March 25, 2008

About Me

Followers

Subscribe Now: google

Go deh too!

whos.amung.us

Blog Archive

The
Algorithm

Sample
output: