I have an interesting problem at work which has been taxing me since
before the Easter break. One of the less traditional ways forward is to
write a toggle count utility - something that takes the output of a
simulation of a hardware design and reports which nets transition both
to a zero and to a one during the simulation. For various reasons I
could not get the design to simulate in a modern version of a simulator
with in-built toggle counting without rebuilding a C-based testbench
for a later version of an OS and a simulator that a major part of the
testbench (TB) is not certified on.
I decided to spend my Easter Monday writing a VCD to toggle-count
utility.
I started by searching my C: drive for a non-trivial VCD file and found
a ~200k/300 net sample generated by the Icarus Verilog program. I then
googled for the file format, found the IEEE 1364-2001 document, and
went with that. I googled for ready-made utilities too and found an
interesting document on mining VCD files and parsing RTL to generate
all sorts of coverage metrics, but the VCD readers I found, e.g. the
one in GTKWave, were in C and didn't seem to clarify how to write a VCD
reader. (OK, specifically - the liberal sprinkling of gotos in the
GTKWave sources put me off. There - I've said it :-)
I moved on to read the IEEE spec and peruse my example VCD files. I
rejected writing an EVCD (extended VCD) parser, as I could later extend
a VCD parser, and I think the works simulator has enough controls to
tailor the generated VCD to be what I want.
The VCD files I will eventually have to work on may be a gigabyte in
size, so I want to step through the file one token at a time, gathering
stats as I go. My reading of the VCD file format seemed to show that
this was possible, so the first line of Python I wrote was this:
    tokeniser = (word for line in f for word in line.split() if word)
The above generator comprehension just gives successive words from the
VCD file without storing a huge chunk of the file in memory. tokeniser
worked originally with:
    keyword2handler = {
        # declaration_keyword ::=
        "$comment": drop_declaration,
        "$date": vcd_date,
        "$enddefinitions": vcd_enddefinitions,
        "$scope": vcd_scope,
        "$timescale": vcd_timescale,
        "$upscope": vcd_upscope,
        "$var": vcd_var,
        "$version": vcd_version,
        # simulation_keyword ::=
        "$dumpall": vcd_dumpall,
        "$dumpoff": vcd_dumpoff,
        "$dumpon": vcd_dumpon,
        "$dumpvars": vcd_dumpvars,
        "$end": vcd_end,
    }
and the 'switch statement':
    for count,token in enumerate(tokeniser):
        keyword2handler[token](tokeniser, token)
... to form the guts of the program. But when I had fleshed out all the
declaration keyword handlers beyond mere stubs, I decided to add the
second half of the parse routine, as I realised that I could skip all
the simulation keywords and only needed to concentrate on the first
character of a token to determine what to do. This led to the finished
form of function vcd_toggle_count.
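The key trick in the declaration half is that each handler is passed the same generator and pulls from it whatever tokens belong to its keyword, so the main loop only ever sees keywords. Here is a minimal sketch of that idea, written for Python 3 (the post's own code is Python 2). The handler names, the `state` dict and the sample text are made up for illustration and are not the ones used in the real program.

```python
import io

def tokenise(lines):
    """Give successive whitespace-separated words from an iterable of lines."""
    return (word for line in lines for word in line.split())

def skip_to_end(tokens, keyword, state):
    """Discard tokens up to and including the closing $end."""
    for word in tokens:
        if word == "$end":
            break

def save_text(tokens, keyword, state):
    """Store the words between the keyword and $end under the keyword's name."""
    words = []
    for word in tokens:
        if word == "$end":
            break
        words.append(word)
    state[keyword.lstrip("$")] = " ".join(words)

# Hypothetical, much-reduced handler table (illustration only).
HANDLERS = {
    "$comment": skip_to_end,
    "$date": save_text,
    "$timescale": save_text,
}

def parse_header(lines):
    state = {}
    tokens = tokenise(lines)
    for token in tokens:
        handler = HANDLERS.get(token)
        if handler is None:
            raise ValueError("Don't understand keyword: " + token)
        # The handler consumes its own tokens from the shared generator.
        handler(tokens, token, state)
    return state

sample = io.StringIO(
    "$date Mon Mar 24 2008 $end\n"
    "$comment ignore\nme $end\n"
    "$timescale 1 ns $end\n"
)
header = parse_header(sample)
print(header)
```

Because only one word is held at a time, the same loop should cope with a file far larger than memory.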
I had thought that the rules for left-expanding a VCD value to a larger
word size might lead to a compact Python solution, and I was pleased
with my result, which I tested in the Python interpreter (with size set
to 4) as:
    >>> size = 4
    >>> for number in "10 X10 ZX0 0X10".split():
    ...     extend = size-len(number)
    ...     print "%-4s -> %s" % (number,
    ...         ('0' if number[0]=='1' else number[0])*extend + number)
    ...
    10   -> 0010
    X10  -> XX10
    ZX0  -> ZZX0
    0X10 -> 0X10
This became:
    extend = stats.size - len(number)
    if extend:
        number = ('0' if number[0]=='1' else number[0])*extend + number
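For anyone wanting to try the left-extension rule in isolation, here is the same expression wrapped in a small function, as a sketch in Python 3 syntax. The name `left_extend` is mine and does not appear in the program.

```python
def left_extend(number, size):
    """Pad a VCD binary value string on the left up to `size` digits.

    A leading '1' is extended with '0'; a leading '0', 'x' or 'z'
    is extended with itself.
    """
    extend = size - len(number)
    if extend > 0:
        fill = '0' if number[0] == '1' else number[0]
        number = fill * extend + number
    return number

for value in "10 X10 ZX0 0X10".split():
    print("%-4s -> %s" % (value, left_extend(value, 4)))
```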
Well, enough prattling on about how I wrote the program. The full
source is below, and I'm off to work to try it for real. It should
work, and since I have done no optimisations for speed as yet, I am
confident that I could get acceptable run-times out of its variants to
handle the gigabyte VCD file if I get one.
    #!python
    '''
    Extract toggle count from vcd file

    Refer to IEEE standard 1364 2001
    (http://inst.eecs.berkeley.edu/~cs150/ProtectedDocs/verilog-ieee.pdf)

    Author Donald 'Paddy' McCarthy (C) 24 March 2008
    '''

    from __future__ import with_statement
    from itertools import dropwhile, takewhile, izip
    from collections import defaultdict
    from pprint import pprint as pp

    vcdfile = r"C:\cygwin\home\HP DV8025EA\tmp\ivtest_v1.0\test_div16.vcd"

    class VCD(object):
        def __init__(self):
            self.scope = []
            self.idcode2references = defaultdict(list)
            self.reference2idcode = dict()
            self.enddefinitions = False
            self.id2stats = dict() # Maps id to its accumulated statistics
        def textstats(self):
            total, updown, uponly, downonly = 0,0,0,0
            out = []
            for ref in sorted(self.reference2idcode.keys()):
                id = self.reference2idcode[ref]
                stats = self.id2stats[id]
                if stats.size == 1:
                    total +=1
                    if stats.zero2one and stats.one2zero:
                        updown +=1
                        covered = 'PASS'
                    elif stats.zero2one:
                        uponly +=1
                        covered = 'FAIL0'
                    elif stats.one2zero:
                        downonly +=1
                        covered = 'FAIL1'
                    else:
                        covered = 'FAIL10'
                    out.append( " %-50s %s" % ( '"'+".".join(x[1] for x in ref)+'":', (covered, stats.zero2one, stats.one2zero)) )
                else:
                    total += stats.size
                    for count, (one2zero, zero2one) in enumerate(izip(stats.one2zero, stats.zero2one)):
                        if zero2one and one2zero:
                            updown +=1
                            covered = 'PASS'
                        elif zero2one:
                            uponly +=1
                            covered = 'FAIL0'
                        elif one2zero:
                            # per-bit count; stats.one2zero is the whole list here
                            downonly +=1
                            covered = 'FAIL1'
                        else:
                            covered = 'FAIL10'
                        name = ".".join( x[1] for x in (ref+(('BIT:','<'+str(count)+'>'),)) )
                        out.append( " %-50s %s" % ( '"'+name+'":', (covered, zero2one, one2zero)) )
            header = "# TOGGLE REPORT: %g %%, %i / %i covered. %i up-only, %i down-only." % (
                updown/1.0/total*100, updown, total, uponly, downonly )
            body = "toggle={\n" + "\n".join(out) + '\n }'
            return header, body

        def scaler_value_change(self, value, id):
            if value in '01' :
                stats = self.id2stats[id]
                if not stats.value:
                    # first known value: nothing to compare against yet
                    stats.value = value
                elif stats.value != value:
                    stats.value = value
                    if value == '0':
                        stats.one2zero +=1
                    else:
                        stats.zero2one +=1

        def vector_value_change(self, format, number, id):
            if format == 'b':
                stats = self.id2stats[id]
                extend = stats.size - len(number)
                if extend:
                    number = ('0' if number[0]=='1' else number[0])*extend + number
                newdigit, newone2zero, newzero2one = [],[],[]
                for digit, olddigit, one2zero, zero2one in izip(number, stats.value, stats.one2zero, stats.zero2one):
                    if digit in '01' and olddigit and olddigit != digit:
                        if digit == '0':
                            one2zero +=1
                        else:
                            zero2one +=1
                    elif digit not in '01':
                        # x and z do not count: keep the last known 0/1
                        digit = olddigit
                    newdigit.append(digit)
                    newone2zero.append(one2zero)
                    newzero2one.append(zero2one)
                stats.value, stats.one2zero, stats.zero2one = newdigit, newone2zero, newzero2one


    class IdStats(object):
        def __init__(self, size):
            size = int(size)
            self.size = size
            if size ==1:
                self.value = ''
                self.zero2one = 0
                self.one2zero = 0
            else:
                # stats for each bit
                self.value = ['' for x in range(size)]
                self.zero2one = [0 for x in range(size)]
                self.one2zero = [0 for x in range(size)]
        def __repr__(self):
            return "<IdStats: " + repr((self.size, self.value, self.zero2one, self.one2zero)) + ">"


    vcd = VCD()

    def parse_error(tokeniser, keyword):
        # an exception type rather than a string exception
        raise ValueError("Don't understand keyword: " + keyword)

    def drop_declaration(tokeniser, keyword):
        dropwhile(lambda x: x != "$end", tokeniser).next()

    def save_declaration(tokeniser, keyword):
        vcd.__setattr__(keyword.lstrip('$'),
                        " ".join( takewhile(lambda x: x != "$end", tokeniser)) )
    vcd_date = save_declaration
    vcd_timescale = save_declaration
    vcd_version = save_declaration

    def vcd_enddefinitions(tokeniser, keyword):
        vcd.enddefinitions = True
        drop_declaration(tokeniser, keyword)
    def vcd_scope(tokeniser, keyword):
        vcd.scope.append( tuple(takewhile(lambda x: x != "$end", tokeniser)))
    def vcd_upscope(tokeniser, keyword):
        vcd.scope.pop()
        tokeniser.next()
    def vcd_var(tokeniser, keyword):
        var_type, size, identifier_code, reference = tuple(takewhile(lambda x: x != "$end", tokeniser))
        reference = vcd.scope + [('var', reference)]
        vcd.idcode2references[identifier_code].append( (var_type, size, reference))
        vcd.reference2idcode[tuple(reference)] = identifier_code
        vcd.id2stats[identifier_code] = IdStats(size)
    def vcd_dumpall(tokeniser, keyword): pass
    def vcd_dumpoff(tokeniser, keyword): pass
    def vcd_dumpon(tokeniser, keyword): pass
    def vcd_dumpvars(tokeniser, keyword): pass
    def vcd_end(tokeniser, keyword):
        if not vcd.enddefinitions:
            parse_error(tokeniser, keyword)


    keyword2handler = {
        # declaration_keyword ::=
        "$comment": drop_declaration,
        "$date": vcd_date,
        "$enddefinitions": vcd_enddefinitions,
        "$scope": vcd_scope,
        "$timescale": vcd_timescale,
        "$upscope": vcd_upscope,
        "$var": vcd_var,
        "$version": vcd_version,
        # simulation_keyword ::=
        "$dumpall": vcd_dumpall,
        "$dumpoff": vcd_dumpoff,
        "$dumpon": vcd_dumpon,
        "$dumpvars": vcd_dumpvars,
        "$end": vcd_end,
    }

    def vcd_toggle_count(vcdfile):
        f = open(vcdfile)
        tokeniser = (word for line in f for word in line.split() if word)
        for count,token in enumerate(tokeniser):
            if not vcd.enddefinitions:
                # definition section
                if token != '$var':
                    print token
                # unknown keywords go to parse_error; a defaultdict factory
                # is called without arguments so could not do this job
                keyword2handler.get(token, parse_error)(tokeniser, token)
            else:
                if count % 10000 == 0:
                    print count, "\r",
                c, rest = token[0], token[1:]
                if c == '$':
                    # skip $dump* tokens and $end tokens in sim section
                    continue
                elif c == '#':
                    vcd.now = rest
                elif c in '01xXzZ':
                    vcd.scaler_value_change(value=c, id=rest)
                elif c in 'bBrR':
                    vcd.vector_value_change(format=c.lower(), number=rest, id=tokeniser.next())
                else:
                    raise ValueError("Don't understand: %s After %i words" % (token, count))
        print count
        f.close()

    vcd_toggle_count(vcdfile)
    header, body = vcd.textstats()
    print '\n'+header+'\n\n'+body+'\n'
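The scalar half of the simulation-section handling can be sketched on its own, which may make the "look only at the first character" idea easier to see. This is a cut-down illustration in Python 3, not the program itself: vectors are left out, and the function name `count_scalar_toggles`, the identifier codes and the sample tokens are all made up.

```python
from collections import defaultdict

def count_scalar_toggles(tokens):
    """Count 0->1 and 1->0 transitions per identifier code (scalars only)."""
    last = {}  # identifier code -> last known '0' or '1'
    toggles = defaultdict(lambda: {"zero2one": 0, "one2zero": 0})
    for token in tokens:
        c, rest = token[0], token[1:]
        if c in "$#":
            continue  # simulation keywords and timestamps
        if c in "01":
            previous = last.get(rest)
            if previous is not None and previous != c:
                key = "zero2one" if c == "1" else "one2zero"
                toggles[rest][key] += 1
            last[rest] = c
        # anything else (x, z, ...) is ignored: the last known 0/1 is kept
    return dict(toggles)

sim = '#0 $dumpvars 0! x% $end #5 1! #10 0! 1% #15 x! #20 1!'.split()
result = count_scalar_toggles(sim)
print(result)
```

Note that a net going 0, x, 1 is still counted as a 0 to 1 toggle, which matches how scaler_value_change treats x and z in the listing.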
STOP PRESS! I'm back from work and the program worked with minor changes:
- I used the fileinput module to allow greater flexibility in specifying the input VCD file.
- The works simulator had a slightly different interpretation of the spec around the definition of $var. (The spec needs to explicitly mark where spaces can/must occur.)
- I missed adding commas to separate the output lines which should form a valid Python dict.
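On the last point, the fix amounts to joining the report lines with ",\n" rather than "\n". A small sketch in Python 3, with invented net names and counts, that checks the result really does parse as a dict:

```python
import ast

# Hypothetical report entries: (net name, (verdict, 0->1 count, 1->0 count))
entries = [
    ("top.clk", ("PASS", 12, 12)),
    ("top.reset", ("FAIL0", 1, 0)),
]

lines = ['  %-20s %r' % ('"%s":' % name, stats) for name, stats in entries]
body = "toggle={\n" + ",\n".join(lines) + "\n  }"
print(body)

# Everything after "toggle=" should now be a valid Python dict literal.
parsed = ast.literal_eval(body.split("=", 1)[1])
```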
With an unchanged core algorithm the program churned through 200 Mbytes of VCD file in 3 minutes. At that rate, 2 gigs in 30 minutes is fine for me.
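For the fileinput change mentioned in the list above, something along these lines would do. It is a sketch in Python 3 rather than the code I actually ran at work; the function name `tokenise_files` is made up, and the throwaway file is only there to make the example self-contained.

```python
import fileinput
import os
import tempfile

def tokenise_files(paths):
    """Give successive words from each named file in turn."""
    with fileinput.input(files=paths) as lines:
        for line in lines:
            for word in line.split():
                yield word

# Demonstration with a temporary file whose content is invented.
with tempfile.NamedTemporaryFile("w", suffix=".vcd", delete=False) as tmp:
    tmp.write("$timescale 1 ns $end\n#0 1!\n")
tokens = list(tokenise_files([tmp.name]))
os.remove(tmp.name)
print(tokens)
```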
Hi,
I thought I'd go through your tutorial on Python and make some notes, if that's OK.
"No ++ / --":
I think this was the same kind of decision as Python not allowing assignments in if conditionals. There is a high incidence of problems caused by confusion over, and misuse of, increment/decrement operators in languages like C. They can make a program harder to read.
"Slow in comparison to compiled languages" - (Here comes the dynamic language supporter's usual reply): Yes, but often getting something right is much more important, and the job often never requires more speed than Python gives you. People have prototyped in Python and shipped the prototype, as it met all quality metrics in a far shorter development time. I recently had a case of that myself here.
"Whitespace for indentation": Arguably a positive feature. It should read "whitespace for block delineation" or something. Run a pretty printer on, say, a C source file and it will try to indent it so that the visible level of indentation mirrors the logical structure of the code. Code is hard to read if the indentation does not follow the logic. Python recognises this and goes further in saying that the common use of block start and end delimiters such as {/} or begin/end is superfluous and only helps in writing misleading code.
You seem to have missed the importance of docstrings in Python.
range "(good for indexing)": True, but in idiomatic Python you are much more likely to iterate over the members of a collection object rather than create an index using range:
    for x in my_list:
rather than:
    for i in range(len(my_list)):
        x = my_list[i]
"Functions work in python much like they do any other language": those with only Java and C experience might miss out on the extended argument passing of Python functions if not made aware.
"Classes are more basic than what you will be use to from java": In what way? There maybe less to learn to become equally proficient but I see this as a good thing. Note that Python is dynamic, meaning that, if you wanted to, you can modify classes and instances at run-time. Its not necessarily a X language does classes better than Y. Python does things differently and if you try and constrain Python to how you use OO in Java, you won't get the best from it.
"from os import *": It's a bad Python habit. You want to warn students off this, as maintenance can suffer.
"We can raise our own exceptions": It could be but shouldn't be a string. best to use an exception type.
"Numbers": There are decimals too, which are good for financial calculations.
"Tuples": If each position in a sequence has explicit meaning then you should use a tuple, e.g. a point on a surface might be the tuple (x,y). coordinates of a car would more likely be a list of points.
"Strings": Misses triple quoted strings.
Filter and map, although present, have declined in use due to the later addition of the powerful list comprehension and generator comprehension features.
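For comparison, here is the same calculation written both ways (Python 3 syntax, arbitrary numbers):

```python
numbers = [1, 2, 3, 4, 5, 6]

# Squares of the even numbers using map and filter ...
with_map_filter = list(map(lambda n: n * n,
                           filter(lambda n: n % 2 == 0, numbers)))

# ... and using a list comprehension.
with_comprehension = [n * n for n in numbers if n % 2 == 0]

# A generator comprehension feeds sum() without building a list first.
lazy_total = sum(n * n for n in numbers if n % 2 == 0)
```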
One-liners in Python: Difficult to do. Usually people write multi-line scripts in an editor. In a shell such as bash, which has multi-line string delimiters, you might be able to use a multi-line single-quoted string for bash and embed a multi-line Python program using Python's -c argument, restricting your Python to double quotes if needed.
Have fun with Python.
- Paddy.
Ouch!
My notes are longer than your blog entry :-)