Go deh!: Writing a VCD to toggle-count generator in Python

I have an interesting problem at work that has been taxing me since
before the Easter break. One of the less traditional ways forward is to
write a toggle-count utility: something that takes a simulation of a
hardware design and reports which nets transition both to a zero and to
a one during the simulation. Modern simulators have toggle counting
built in, but I could not get the design to simulate in one without
rebuilding a C-based testbench (TB) for a later version of the OS and
of the simulator, and a major part of the TB is not certified on
either.

I decided to spend my Easter Monday writing a VCD to toggle-count
utility.

I started by searching my C: drive for a non-trivial VCD file and found
a ~200k/300-net sample generated by Icarus Verilog. I then googled for
the file format, found the IEEE 1364-2001 document, and went for that.
I googled for ready-made utilities too and found an interesting
document on mining VCD files and parsing RTL to generate all sorts of
coverage metrics, but the VCD readers I found were in C (e.g. GTKWave)
and didn't seem to clarify how to write a VCD reader. (OK,
specifically: the liberal sprinkling of gotos in the GTKWave sources
put me off. There, I've said it :-)

I moved on to read the IEEE spec and peruse my example VCD files. I
rejected writing an EVCD (extended VCD) parser, as I could later extend
a VCD parser, and I think the works simulator has enough controls to
tailor the generated VCD to what I want.

The VCD files I will eventually have to work on may be a gigabyte in
size, so I want to step through the file one token at a time, gathering
stats as I go. My reading of the VCD file format seemed to show that
this was possible, so the first line of Python I wrote was this one:

  tokeniser = (word for line in f for word in line.split() if word)

The above generator expression just yields successive words from the
VCD file without storing a huge chunk of the file in memory. tokeniser
originally worked with:

keyword2handler = {
  # declaration_keyword ::=
  "$comment":        drop_declaration,
  "$date":           vcd_date,
  "$enddefinitions": vcd_enddefinitions,
  "$scope":          vcd_scope,
  "$timescale":      vcd_timescale,
  "$upscope":        vcd_upscope,
  "$var":            vcd_var,
  "$version":        vcd_version,
  # simulation_keyword ::=
  "$dumpall":        vcd_dumpall,
  "$dumpoff":        vcd_dumpoff,
  "$dumpon":         vcd_dumpon,
  "$dumpvars":       vcd_dumpvars,
  "$end":            vcd_end,
  }

and the 'switch statement':

  for count,token in enumerate(tokeniser):
    keyword2handler[token](tokeniser, token)

... to form the guts of the program. But once I had fleshed out all
the declaration keyword handlers beyond mere stubs, I realised that I
could skip all the simulation keywords and only needed to look at the
first character of a token to determine what to do, so I added the
second half of the parse routine. This led to the finished form of the
function vcd_toggle_count.
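
To make the two-phase parse concrete, here is a minimal sketch in Python 3 syntax (the full listing in this post is Python 2). It is not the program itself: the names tokenise, skip_to_end, handle_var, HANDLERS, count_scalar_toggles and SAMPLE are made up for illustration, it only counts single-bit nets, and it assumes every $var has exactly four fields before $end.

```python
import io

def tokenise(lines):
    """Yield whitespace-separated words lazily, one at a time."""
    return (word for line in lines for word in line.split())

def skip_to_end(tokens, keyword, state):
    # Discard everything up to and including the closing $end.
    for word in tokens:
        if word == "$end":
            break

def handle_var(tokens, keyword, state):
    # Assumed layout: $var <type> <size> <id_code> <reference> $end
    fields = []
    for word in tokens:
        if word == "$end":
            break
        fields.append(word)
    var_type, size, id_code, reference = fields[:4]
    state["names"][id_code] = reference
    state["toggles"][id_code] = {"last": None, "0->1": 0, "1->0": 0}

def handle_enddefinitions(tokens, keyword, state):
    skip_to_end(tokens, keyword, state)
    state["in_definitions"] = False

# Dictionary dispatch for the definition section (hypothetical handlers).
HANDLERS = {"$var": handle_var, "$enddefinitions": handle_enddefinitions}

def count_scalar_toggles(lines):
    state = {"in_definitions": True, "names": {}, "toggles": {}}
    tokens = tokenise(lines)
    for token in tokens:
        if state["in_definitions"]:
            # Any keyword without a handler is simply skipped to its $end.
            HANDLERS.get(token, skip_to_end)(tokens, token, state)
        else:
            # Simulation section: the first character decides what to do.
            first, rest = token[0], token[1:]
            if first in "01":
                stats = state["toggles"][rest]
                if stats["last"] is not None and stats["last"] != first:
                    stats["0->1" if first == "1" else "1->0"] += 1
                stats["last"] = first
            elif first in "bBrR":
                next(tokens)  # vector/real: skip the identifier code
            # '$' keywords, '#' timestamps and x/z values are ignored here
    return {state["names"][code]: (s["0->1"], s["1->0"])
            for code, s in state["toggles"].items()}

SAMPLE = '''\
$timescale 1ns $end
$scope module top $end
$var wire 1 ! clk $end
$var wire 1 % rst $end
$upscope $end
$enddefinitions $end
#0
$dumpvars
0!
1%
$end
#5
1!
#10
0!
0%
'''

result = count_scalar_toggles(io.StringIO(SAMPLE))
print(result)
```

Each net maps to a (zero-to-one, one-to-zero) pair of counts. The handlers share the tokeniser with the main loop, so a handler that consumes words up to $end leaves the loop positioned at the next keyword.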

I had thought that the rules for left-expanding a VCD value to a larger
word size might lead to a compact Python solution and I was pleased
with my result, which I tested in the Python interpreter as:

>>> for number in "10 X10 ZX0 0X10".split():
...     extend = size-len(number)
...     print "%-4s -> %s" % (number, ('0' if number[0]=='1' else number[0])*extend + number)
...
10   -> 0010
X10  -> XX10
ZX0  -> ZZX0
0X10 -> 0X10

This became:

      extend = stats.size - len(number)
      if extend:
        number = ('0' if number[0]=='1' else number[0])*extend + number
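
The same rule can be wrapped in a small function. This is a sketch in Python 3 syntax; the name left_extend is made up for illustration, and the guard uses extend > 0 where the listing uses a plain truth test (a negative count would produce an empty pad anyway).

```python
def left_extend(number, size):
    """Pad a VCD binary value string on the left up to `size` digits.

    A leading '1' is padded with '0'; any other leading character
    ('0', 'x', 'X', 'z', 'Z') is padded with copies of itself.
    """
    extend = size - len(number)
    if extend > 0:
        fill = "0" if number[0] == "1" else number[0]
        number = fill * extend + number
    return number

for value in "10 X10 ZX0 0X10".split():
    print("%-4s -> %s" % (value, left_extend(value, 4)))
```

Values that are already as wide as the target size come back unchanged.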

Well, enough prattling on about how I wrote the program. The full
source is below, and I'm off to work to try it for real. It should
work, and since I have done no optimisations for speed as yet, I am
confident that I could get acceptable run-times out of variants of it
to handle the gigabyte VCD file, if I get one.

#!python
'''
 Extract toggle count from vcd file

  Refer to IEEE standard 1364 2001
  (http://inst.eecs.berkeley.edu/~cs150/ProtectedDocs/verilog-ieee.pdf)

  Author Donald 'Paddy' McCarthy (C) 24 March 2008
'''
# Note: Python 2 source (print statements, izip, generator .next()).

from __future__ import with_statement
from itertools import dropwhile, takewhile, izip
from collections import defaultdict
from pprint import pprint as pp

vcdfile = r"C:\cygwin\home\HP DV8025EA\tmp\ivtest_v1.0\test_div16.vcd"

class VCD(object):
  def __init__(self):
    self.scope = []
    self.idcode2references = defaultdict(list)
    self.reference2idcode = dict()
    self.enddefinitions = False
    self.id2stats = dict()  # Maps id to its accumulated statistics
  def textstats(self):
    total, updown, uponly, downonly = 0,0,0,0
    out = []
    for ref in sorted(self.reference2idcode.keys()):
      id = self.reference2idcode[ref]
      stats = self.id2stats[id]
      if stats.size == 1:
        # single-bit net: counters are plain integers
        total +=1
        if stats.zero2one and stats.one2zero:
          updown +=1
          covered = 'PASS'
        elif stats.zero2one:
          uponly +=1
          covered = 'FAIL0'
        elif stats.one2zero:
          downonly +=1
          covered = 'FAIL1'
        else:
          covered = 'FAIL10'
        out.append( "  %-50s %s" % ( '"'+".".join(x[1] for x in ref)+'":', (covered, stats.zero2one, stats.one2zero)) )
      else:
        # vector: one report line per bit
        total += stats.size
        for count, (one2zero, zero2one) in enumerate(izip(stats.one2zero, stats.zero2one)):
          if zero2one and one2zero:
            updown +=1
            covered = 'PASS'
          elif zero2one:
            uponly +=1
            covered = 'FAIL0'
          elif one2zero:  # was stats.one2zero: a list, so always true for vectors
            downonly +=1
            covered = 'FAIL1'
          else:
            covered = 'FAIL10'
          name = ".".join( x[1] for x in (ref+(('BIT:','<'+str(count)+'>'),)) )
          out.append( "  %-50s %s" % ( '"'+name+'":', (covered, zero2one, one2zero)) )
    header = "# TOGGLE REPORT: %g %%, %i / %i covered. %i up-only, %i down-only." % (
      updown/1.0/total*100, updown, total, uponly, downonly )
    body = "toggle={\n" + "\n".join(out) + '\n  }'
    return header, body

  def scaler_value_change(self, value, id):
    # Only 0 and 1 count; x and z leave the remembered value alone
    if value in '01' :
      stats = self.id2stats[id]
      if not stats.value:
        stats.value = value
      elif stats.value != value:
        stats.value = value
        if value == '0':
          stats.one2zero +=1
        else:
          stats.zero2one +=1

  def vector_value_change(self, format, number, id):
    # Only binary vectors are counted; real values are ignored
    if format == 'b':
      stats = self.id2stats[id]
      # Left-extend the value to the declared width
      extend = stats.size - len(number)
      if extend:
        number = ('0' if number[0]=='1' else number[0])*extend + number
      newdigit, newone2zero, newzero2one = [],[],[]
      for digit, olddigit, one2zero, zero2one in izip(number, stats.value, stats.one2zero, stats.zero2one):
        if digit in '01' and olddigit and olddigit != digit:
          if digit == '0':
            one2zero +=1
          else:
            zero2one +=1
        elif digit not in '01':
          digit = olddigit
        newdigit.append(digit)
        newone2zero.append(one2zero)
        newzero2one.append(zero2one)
      stats.value, stats.one2zero, stats.zero2one = newdigit, newone2zero, newzero2one


class IdStats(object):
  def __init__(self, size):
    size = int(size)
    self.size = size
    if size ==1:
      self.value = ''
      self.zero2one = 0
      self.one2zero = 0
    else:
      # stats for each bit
      self.value       = ['' for x in range(size)]
      self.zero2one = [0 for x in range(size)]
      self.one2zero = [0 for x in range(size)]
  def __repr__(self):
    return "<IdStats: " + repr((self.size, self.value, self.zero2one, self.one2zero)) + ">"


vcd = VCD()

def parse_error(tokeniser, keyword):
  # String exceptions are not valid in later Pythons; raise a real exception
  raise ValueError("Don't understand keyword: " + keyword)

def drop_declaration(tokeniser, keyword):
  # Consume tokens up to and including the closing $end
  dropwhile(lambda x: x != "$end", tokeniser).next()

def save_declaration(tokeniser, keyword):
  vcd.__setattr__(keyword.lstrip('$'),
                  " ".join( takewhile(lambda x: x != "$end", tokeniser)) )
vcd_date      = save_declaration
vcd_timescale = save_declaration
vcd_version   = save_declaration

def vcd_enddefinitions(tokeniser, keyword):
  vcd.enddefinitions = True
  drop_declaration(tokeniser, keyword)
def vcd_scope(tokeniser, keyword):
  vcd.scope.append( tuple(takewhile(lambda x: x != "$end", tokeniser)))
def vcd_upscope(tokeniser, keyword):
  vcd.scope.pop()
  tokeniser.next()
def vcd_var(tokeniser, keyword):
  var_type, size, identifier_code, reference = tuple(takewhile(lambda x: x != "$end", tokeniser))
  reference = vcd.scope + [('var', reference)]
  vcd.idcode2references[identifier_code].append( (var_type, size, reference))
  vcd.reference2idcode[tuple(reference)] = identifier_code
  vcd.id2stats[identifier_code] = IdStats(size)
def vcd_dumpall(tokeniser, keyword): pass
def vcd_dumpoff(tokeniser, keyword): pass
def vcd_dumpon(tokeniser, keyword): pass
def vcd_dumpvars(tokeniser, keyword): pass
def vcd_end(tokeniser, keyword):
  if not vcd.enddefinitions:
    parse_error(tokeniser, keyword)


keyword2handler = {
  # declaration_keyword ::=
  "$comment":        drop_declaration,
  "$date":           vcd_date,
  "$enddefinitions": vcd_enddefinitions,
  "$scope":          vcd_scope,
  "$timescale":      vcd_timescale,
  "$upscope":        vcd_upscope,
  "$var":            vcd_var,
  "$version":        vcd_version,
  # simulation_keyword ::=
  "$dumpall":        vcd_dumpall,
  "$dumpoff":        vcd_dumpoff,
  "$dumpon":         vcd_dumpon,
  "$dumpvars":       vcd_dumpvars,
  "$end":            vcd_end,
  }
# The default factory is called with no arguments, so it must return the handler
keyword2handler = defaultdict(lambda: parse_error, keyword2handler)

def vcd_toggle_count(vcdfile):
  with open(vcdfile) as f:
    # Lazily yield one whitespace-separated word at a time
    tokeniser = (word for line in f for word in line.split() if word)
    for count,token in enumerate(tokeniser):
      if not vcd.enddefinitions:
        # definition section
        if token != '$var':
          print token
        keyword2handler[token](tokeniser, token)
      else:
        if count % 10000 == 0:
          print count, "\r",
        c, rest = token[0], token[1:]
        if c == '$':
          # skip $dump* tokens and $end tokens in sim section
          continue
        elif c == '#':
          vcd.now = rest
        elif c in '01xXzZ':
          vcd.scaler_value_change(value=c, id=rest)
        elif c in 'bBrR':
          vcd.vector_value_change(format=c.lower(), number=rest, id=tokeniser.next())
        else:
          raise ValueError("Don't understand: %s After %i words" % (token, count))
  print count

vcd_toggle_count(vcdfile)
header, body = vcd.textstats()
print '\n'+header+'\n\n'+body+'\n'

STOP PRESS!
I'm back from work and the program worked with minor changes:

I used the fileinput module to allow greater flexibility in specifying the input VCD file.
The works simulator had a slightly different interpretation of the spec around the definition of $var. (The spec needs to explicitly mark where spaces can/must occur.)
I had missed the commas needed to separate the output lines, which should form a valid Python dict.
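
Two of these changes are easy to sketch in isolation; the $var change depends on what the simulator actually emitted, so it is not guessed at here. The following is a Python 3 illustration only, not the code run at work: tokenise_files and format_report are hypothetical names, and the report is built as a bare dict literal (without the "toggle=" prefix) so that ast.literal_eval can check it.

```python
import ast
import fileinput
import os
import tempfile

def tokenise_files(paths):
    """Yield words from each named file in turn, reading line by line."""
    # With an empty list, fileinput falls back to sys.argv[1:] or stdin.
    with fileinput.input(files=paths) as lines:
        for line in lines:
            for word in line.split():
                yield word

def format_report(rows):
    """rows: iterable of (name, (status, zero2one, one2zero)) pairs."""
    out = ['  %-20s %r' % ('"%s":' % name, entry) for name, entry in rows]
    # Joining with ",\n" is what turns the lines into a valid dict literal.
    return "{\n" + ",\n".join(out) + "\n}"

# Write a tiny stand-in VCD fragment to a temporary file and tokenise it.
with tempfile.NamedTemporaryFile("w", suffix=".vcd", delete=False) as tmp:
    tmp.write("$date today $end\n#0\n1!\n")
try:
    words = list(tokenise_files([tmp.name]))
finally:
    os.remove(tmp.name)

report = format_report([("top.clk", ("PASS", 1, 1)),
                        ("top.rst", ("FAIL1", 0, 1))])
parsed = ast.literal_eval(report)
print(words)
print(parsed["top.rst"])
```

Because the report parses as a dict, a later script can load it and query individual nets instead of scraping text.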

With an unchanged core algorithm the program churned through 200 Mbytes of VCD file in 3 minutes. At that rate, 2 gigs in 30 minutes is fine for me.

6 comments:

Paddy3118, Wed Mar 26, 04:42:00 am
Some more comments on comp.lang.verilog

tom loftus, Fri Sept 19, 10:21:00 pm
For fun, I tried this VCD reader on a large VCD file (2 GBytes, 650k $vars), and it choked.
The code does go out of its way with iterators to avoid reading the whole original file into memory, but it then builds internal data structures (idcode2references, reference2idcode, id2stats) that take up more memory than the original file (I gave up at 2.5 GBytes).
So this needs some other approach for very large files. I don't actually need the toggle count for my app, I just want to read and apply the VCD values in a testbench, so I will just turn off the toggle count part (so thanks for contributing the rest!). But if I was going to adapt it, I would slurp the whole VCD header into a single data structure, and just append the needed toggle data to that.

MR, Thu Jun 04, 08:35:00 am
Thanks a lot for developing this utility. Can you give an example of how to run this file? For me it shows syntax errors when I copy the script and run it.

document finder, Tue Apr 17, 07:20:00 am
Expert! It's useful to me, thanks.

Anonymous, Wed Jul 17, 05:17:00 am
I tried taking this a bit further and making it more generic/reusable. Thanks for sharing your initial parser. https://github.com/GordonMcGregor/vcd_parser

Unknown, Fri Sept 21, 02:46:00 pm
Thanks for sharing your script and your experience. When I ran the Python file, the following result appeared for all the values, which shows that the script does not work well for my VCD files: each wire has lots of zero2one and one2zero transitions, but the covered value is FAIL10. Could you please tell me what the exact problem is?
"TOP.chip_top.Rocket.ioNetwork.ClientTileLinkNetworkPort.io_network_grant_bits_payload_manager_xact_id": ('FAIL10', 0, 0)

Tuesday, March 25, 2008
