Mainly Tech projects on Python and Electronic Design Automation.

Tuesday, March 25, 2008

Writing a VCD to toggle-count generator in Python

I have an interesting problem at work which had been taxing me before
the Easter break. One of the less traditional ways forward is to write
a toggle count utility: something that takes a simulation of a
hardware design and counts which nets transition both to a zero and to
a one during the simulation. For various reasons I could not get the
design to simulate in a modern version of a simulator with built-in
toggle counting without having to rebuild a C-based testbench for a
later version of the OS, and for a simulator version that a major part
of the testbench is not certified on.
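To make the definition concrete, here is a minimal sketch for a single 1-bit net. The function toggle_status is hypothetical and not part of the program below; it assumes the net's dumped values are already available as a sequence, ignores x and z values (keeping the last known 0/1 value), and borrows the PASS/FAIL0/FAIL1/FAIL10 labels used in the program's report.

```python
def toggle_status(values):
    """Classify a 1-bit net from the sequence of values dumped for it.

    Only '0' and '1' count; x/z (either case) are skipped, so a
    0 -> x -> 1 sequence still counts as one 0 -> 1 toggle.
    Returns (status, zero2one, one2zero).
    """
    zero2one = one2zero = 0
    last = None                      # no known 0/1 value yet
    for v in values:
        if v not in ('0', '1'):
            continue                 # x/z: keep the last known value
        if last is not None and v != last:
            if v == '1':
                zero2one += 1
            else:
                one2zero += 1
        last = v
    if zero2one and one2zero:
        status = 'PASS'              # toggled both ways
    elif zero2one:
        status = 'FAIL0'             # rose but never fell back to 0
    elif one2zero:
        status = 'FAIL1'             # fell but never rose back to 1
    else:
        status = 'FAIL10'            # never toggled at all
    return status, zero2one, one2zero
```

For example, toggle_status("0x1") gives ('FAIL0', 1, 0): the net rose once but was never seen to fall.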



I decided to spend my Easter Monday writing a VCD to toggle-count
utility.



I started by searching my C: drive for a non-trivial VCD file and found
a ~200k, 300-net sample generated by Icarus Verilog. I then googled for
the file format, found the IEEE 1364-2001 document and went with that.
I googled for ready-made utilities too and found an interesting
document on mining VCD files and parsing RTL to generate all sorts of
coverage metrics, but the VCD readers I found were in C, e.g. GTKWave,
and didn't make it clear how to write a VCD reader. (OK, specifically:
the liberal sprinkling of gotos in the GTKWave sources put me off.
There, I've said it :-)



I moved on to reading the IEEE spec and perusing my example VCD files.
I decided against writing an EVCD (extended VCD) parser, as I could
always extend a VCD parser later, and I think the simulator at work has
enough controls to tailor the generated VCD to what I want.



The VCD files I will eventually have to work on may be a gigabyte in
size, so I want to step through the file one token at a time,
gathering stats as I go. My reading of the VCD file format suggested
that this was possible, so the first line of Python I wrote was this:



tokeniser = (word for line in f for word in line.split() if word)


The generator expression above just yields successive words from the
VCD file without storing a huge chunk of the file in memory. Originally,
tokeniser worked with this dispatch table:

keyword2handler = {
    # declaration_keyword ::=
    "$comment": drop_declaration,
    "$date": vcd_date,
    "$enddefinitions": vcd_enddefinitions,
    "$scope": vcd_scope,
    "$timescale": vcd_timescale,
    "$upscope": vcd_upscope,
    "$var": vcd_var,
    "$version": vcd_version,
    # simulation_keyword ::=
    "$dumpall": vcd_dumpall,
    "$dumpoff": vcd_dumpoff,
    "$dumpon": vcd_dumpon,
    "$dumpvars": vcd_dumpvars,
    "$end": vcd_end,
}

and the 'switch statement':

for count,token in enumerate(tokeniser):
    keyword2handler[token](tokeniser, token)

... to form the guts of the program. But once I had fleshed out all the
declaration keyword handlers beyond mere stubs, I decided to add a
second half to the parse routine: I realised that after the
definitions I could skip all the simulation keywords and only needed to
look at the first character of each token to decide what to do. This
led to the finished form of the function vcd_toggle_count.
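A minimal sketch of that two-phase structure, run on a tiny hand-written VCD, follows. The SAMPLE text and the parse function are hypothetical and not taken from the program; to keep it short, it records (time, id, value) tuples instead of toggle counts, and it simply skips any declaration other than $var and $enddefinitions up to its $end. Note that itertools.takewhile also consumes the $end token that stops it, which is what keeps the shared tokeniser in step.

```python
from itertools import takewhile

SAMPLE = '''$date today $end
$timescale 1ns $end
$scope module top $end
$var wire 1 ! clk $end
$var wire 4 % count [3:0] $end
$upscope $end
$enddefinitions $end
#0
$dumpvars
0!
b0 %
$end
#5
1!
b101 %
'''

def parse(lines):
    """Toy two-phase walk over VCD text: declarations, then value changes."""
    # Lazily yield whitespace-separated words; nothing else is buffered.
    tok = (word for line in lines for word in line.split())
    idcodes = {}   # identifier code -> (size, reference text)
    changes = []   # (time, identifier code, value) in file order
    now = None

    def upto_end():
        # takewhile also consumes the '$end' that stops it.
        return list(takewhile(lambda w: w != '$end', tok))

    # Phase 1: declarations, dispatched on the whole keyword.
    for word in tok:
        if word == '$var':
            fields = upto_end()      # type, size, id, reference [, range]
            idcodes[fields[2]] = (int(fields[1]), ' '.join(fields[3:]))
        elif word == '$enddefinitions':
            upto_end()
            break
        else:
            upto_end()               # $date, $scope, $upscope, ... skipped

    # Phase 2: value changes, dispatched on the first character only.
    for word in tok:
        c, rest = word[0], word[1:]
        if c == '$':
            continue                 # $dumpvars, $end, ... carry no values
        elif c == '#':
            now = int(rest)
        elif c in '01xXzZ':
            changes.append((now, rest, c))          # scalar: value and id in one token
        elif c in 'bBrR':
            changes.append((now, next(tok), rest))  # vector: id is the next token
        else:
            raise ValueError('unexpected token %r' % word)
    return idcodes, changes
```

Calling parse(SAMPLE.splitlines()) returns the two $var entries and four value changes. Breaking out of the first for loop does not close the generator, so the second loop carries on from the token after $enddefinitions $end.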



I had thought that the rules for left-extending a VCD value to a larger
word size (a leading 0 or 1 is padded with 0, a leading X with X, and a
leading Z with Z) might lead to a compact Python solution, and I was
pleased with the result, which I tested in the Python interpreter as:

>>> size = 4
>>> for number in "10 X10 ZX0 0X10".split():
...     extend = size-len(number)
...     print("%-4s -> %s" % (number, ('0' if number[0]=='1' else number[0])*extend + number))
...
10   -> 0010
X10  -> XX10
ZX0  -> ZZX0
0X10 -> 0X10


This became:


extend = stats.size - len(number)
if extend:
    number = ('0' if number[0]=='1' else number[0])*extend + number
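The same rule can be pulled out as a standalone function and checked against the interpreter session above. left_extend is a hypothetical name (the program does this inline in vector_value_change); it is a sketch of the rule as described here, not a library API.

```python
def left_extend(number, size):
    """Left-extend a binary VCD value string to `size` digits.

    A leading 0 or 1 is padded with '0'; a leading x or z is padded
    with copies of itself (case is preserved).
    """
    extend = size - len(number)
    if extend <= 0:
        return number                # already full width
    pad = '0' if number[0] == '1' else number[0]
    return pad * extend + number
```

The explicit extend <= 0 check is only for readability: in the inline version, multiplying a string by zero or a negative count already gives an empty string.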




Well, enough prattling on about how I wrote the program. The full
source is below, and I'm off to work to try it for real. It should
work, and since I have not yet done any optimisation for speed, I am
confident that I could get acceptable run-times out of variants of it
to handle a gigabyte VCD file if I get one.



#!python
'''
Extract toggle count from vcd file

Refer to IEEE standard 1364 2001
(http://inst.eecs.berkeley.edu/~cs150/ProtectedDocs/verilog-ieee.pdf)

Author Donald 'Paddy' McCarthy (C) 24 March 2008
'''

from __future__ import with_statement, print_function
from itertools import dropwhile, takewhile
from collections import defaultdict
from pprint import pprint as pp
try:
    from itertools import izip      # Python 2
except ImportError:
    izip = zip                      # Python 3: zip is already lazy

vcdfile = r"C:\cygwin\home\HP DV8025EA\tmp\ivtest_v1.0\test_div16.vcd"

class VCD(object):
    def __init__(self):
        self.scope = []
        self.idcode2references = defaultdict(list)
        self.reference2idcode = dict()
        self.enddefinitions = False
        self.id2stats = dict()   # Maps id to its accumulated statistics
    def textstats(self):
        total, updown, uponly, downonly = 0,0,0,0
        out = []
        for ref in sorted(self.reference2idcode.keys()):
            id = self.reference2idcode[ref]
            stats = self.id2stats[id]
            if stats.size == 1:
                total +=1
                if stats.zero2one and stats.one2zero:
                    updown +=1
                    covered = 'PASS'
                elif stats.zero2one:
                    uponly +=1
                    covered = 'FAIL0'
                elif stats.one2zero:
                    downonly +=1
                    covered = 'FAIL1'
                else:
                    covered = 'FAIL10'
                out.append( "  %-50s %s" % ( '"'+".".join(x[1] for x in ref)+'":', (covered, stats.zero2one, stats.one2zero)) )
            else:
                total += stats.size
                # count 0 is the leftmost (most significant) digit of the dumped value
                for count, (one2zero, zero2one) in enumerate(izip(stats.one2zero, stats.zero2one)):
                    if zero2one and one2zero:
                        updown +=1
                        covered = 'PASS'
                    elif zero2one:
                        uponly +=1
                        covered = 'FAIL0'
                    elif one2zero:      # per-bit count, not the whole (always true) list
                        downonly +=1
                        covered = 'FAIL1'
                    else:
                        covered = 'FAIL10'
                    name = ".".join( x[1] for x in (ref+(('BIT:','<'+str(count)+'>'),)) )
                    out.append( "  %-50s %s" % ( '"'+name+'":', (covered, zero2one, one2zero)) )
        header = "# TOGGLE REPORT: %g %%, %i / %i covered. %i up-only, %i down-only." % (
            100.0*updown/total if total else 0.0, updown, total, uponly, downonly )
        # Comma-separate the entries so the body is a valid Python dict literal
        body = "toggle={\n" + ",\n".join(out) + '\n }'
        return header, body

    def scaler_value_change(self, value, id):
        if value in '01' :
            stats = self.id2stats[id]
            if not stats.value:
                stats.value = value
            elif stats.value != value:
                stats.value = value
                if value == '0':
                    stats.one2zero +=1
                else:
                    stats.zero2one +=1

    def vector_value_change(self, format, number, id):
        if format == 'b':
            stats = self.id2stats[id]
            if stats.size == 1:
                # A 1-bit var dumped in vector form: treat it as a scalar
                self.scaler_value_change(number[-1], id)
                return
            extend = stats.size - len(number)
            if extend:
                number = ('0' if number[0]=='1' else number[0])*extend + number
            newdigit, newone2zero, newzero2one = [],[],[]
            for digit, olddigit, one2zero, zero2one in izip(number, stats.value, stats.one2zero, stats.zero2one):
                if digit in '01' and olddigit and olddigit != digit:
                    if digit == '0':
                        one2zero +=1
                    else:
                        zero2one +=1
                elif digit not in '01':
                    digit = olddigit    # x/z: keep the last known 0/1 value
                newdigit.append(digit)
                newone2zero.append(one2zero)
                newzero2one.append(zero2one)
            stats.value, stats.one2zero, stats.zero2one = newdigit, newone2zero, newzero2one


class IdStats(object):
    def __init__(self, size):
        size = int(size)
        self.size = size
        if size ==1:
            self.value = ''
            self.zero2one = 0
            self.one2zero = 0
        else:
            # stats for each bit
            self.value = ['' for x in range(size)]
            self.zero2one = [0 for x in range(size)]
            self.one2zero = [0 for x in range(size)]
    def __repr__(self):
        return "<IdStats: " + repr((self.size, self.value, self.zero2one, self.one2zero)) + ">"


vcd = VCD()

def parse_error(tokeniser, keyword):
    raise ValueError("Don't understand keyword: " + keyword)

def drop_declaration(tokeniser, keyword):
    # Skip every token up to and including the next $end
    next(dropwhile(lambda x: x != "$end", tokeniser))

def save_declaration(tokeniser, keyword):
    setattr(vcd, keyword.lstrip('$'),
            " ".join( takewhile(lambda x: x != "$end", tokeniser)) )
vcd_date = save_declaration
vcd_timescale = save_declaration
vcd_version = save_declaration

def vcd_enddefinitions(tokeniser, keyword):
    vcd.enddefinitions = True
    drop_declaration(tokeniser, keyword)
def vcd_scope(tokeniser, keyword):
    vcd.scope.append( tuple(takewhile(lambda x: x != "$end", tokeniser)))
def vcd_upscope(tokeniser, keyword):
    vcd.scope.pop()
    next(tokeniser)
def vcd_var(tokeniser, keyword):
    fields = tuple(takewhile(lambda x: x != "$end", tokeniser))
    var_type, size, identifier_code = fields[:3]
    # The reference may be followed by a separate bit-select/range token,
    # e.g. "data [7:0]", so join whatever remains
    reference = "".join(fields[3:])
    reference = vcd.scope + [('var', reference)]
    vcd.idcode2references[identifier_code].append( (var_type, size, reference))
    vcd.reference2idcode[tuple(reference)] = identifier_code
    vcd.id2stats[identifier_code] = IdStats(size)
def vcd_dumpall(tokeniser, keyword): pass
def vcd_dumpoff(tokeniser, keyword): pass
def vcd_dumpon(tokeniser, keyword): pass
def vcd_dumpvars(tokeniser, keyword): pass
def vcd_end(tokeniser, keyword):
    if not vcd.enddefinitions:
        parse_error(tokeniser, keyword)


keyword2handler = {
    # declaration_keyword ::=
    "$comment": drop_declaration,
    "$date": vcd_date,
    "$enddefinitions": vcd_enddefinitions,
    "$scope": vcd_scope,
    "$timescale": vcd_timescale,
    "$upscope": vcd_upscope,
    "$var": vcd_var,
    "$version": vcd_version,
    # simulation_keyword ::=
    "$dumpall": vcd_dumpall,
    "$dumpoff": vcd_dumpoff,
    "$dumpon": vcd_dumpon,
    "$dumpvars": vcd_dumpvars,
    "$end": vcd_end,
}
# Unknown keywords go to parse_error. defaultdict calls its factory with
# no arguments, so the factory must *return* the handler, not be it.
keyword2handler = defaultdict(lambda: parse_error, keyword2handler)

def vcd_toggle_count(vcdfile):
    with open(vcdfile) as f:
        tokeniser = (word for line in f for word in line.split() if word)
        for count,token in enumerate(tokeniser):
            if not vcd.enddefinitions:
                # definition section
                if token != '$var':
                    print(token)
                keyword2handler[token](tokeniser, token)
            else:
                if count % 10000 == 0:
                    print(count, end="\r")
                c, rest = token[0], token[1:]
                if c == '$':
                    # skip $dump* tokens and $end tokens in sim section
                    continue
                elif c == '#':
                    vcd.now = rest
                elif c in '01xXzZ':
                    vcd.scaler_value_change(value=c, id=rest)
                elif c in 'bBrR':
                    vcd.vector_value_change(format=c.lower(), number=rest, id=next(tokeniser))
                else:
                    raise ValueError("Don't understand: %s After %i words" % (token, count))
        print(count)

vcd_toggle_count(vcdfile)
header, body = vcd.textstats()
print('\n'+header+'\n\n'+body+'\n')

STOP PRESS!
I'm back from work, and the program worked with minor changes:
  1. I used the fileinput module to allow greater flexibility in specifying the input VCD file.
  2. The simulator at work had a slightly different interpretation of the spec around the definition of $var. (The spec needs to mark explicitly where spaces can or must occur.)
  3. I had forgotten to add commas to separate the output lines, which should form a valid Python dict.
With an unchanged core algorithm, the program churned through 200 MB of VCD file in 3 minutes, so 2 GB in about 30 minutes is fine for me.
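Change 1 might look something like the sketch below. This is an assumption about how fileinput could be wired in, not the author's actual change, and vcd_tokens is a hypothetical name. With files=None, fileinput takes file names from sys.argv[1:] and falls back to standard input, so the script could then be run with the VCD file name on the command line or fed through a pipe.

```python
import fileinput

def vcd_tokens(files=None):
    """Yield whitespace-separated tokens from one or more VCD files.

    files=None makes fileinput read the file names given in sys.argv[1:],
    or standard input when there are none; '-' also means standard input.
    """
    # FileInput chains the files together and reads one line at a time,
    # so memory use stays flat however large the dump is.
    lines = fileinput.FileInput(files)
    try:
        for line in lines:
            for word in line.split():
                yield word
    finally:
        lines.close()
```

The generator is a drop-in replacement for the tokeniser expression in vcd_toggle_count: tokeniser = vcd_tokens().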

5 comments:

  1. For fun, I tried this VCD reader on a large VCD file (2 GB, 650k $vars), and it choked.
    The code goes out of its way with iterators to avoid reading the whole original file into memory, but it then builds internal data structures (idcode2references, reference2idcode, id2stats) that take up more memory than the original file (I gave up at 2.5 GB).
    So this needs some other approach for very large files. I don't actually need the toggle count for my app; I just want to read and apply the VCD values in a testbench, so I will just turn off the toggle count part (so thanks for contributing the rest!). But if I were going to adapt it, I would slurp the whole VCD header into a single data structure and just append the needed toggle data to that.

  2. Thanks a lot for developing this utility. Can you give an example of how to run this file? For me it shows syntax errors when I copy the script and run it.

  3. I tried taking this a bit further and making it more generic/ reusable. Thanks for sharing your initial parser https://github.com/GordonMcGregor/vcd_parser

