Sunday, October 20, 2019

Indent datastructure for trees

I was browsing StackOverflow and came across a question that mentioned a new-to-me format of datastructure for holding a tree of data. I am well used to the (name, list_of_children) style of interconnected node datastructures, but this one is built as follows:
  1. Create an empty list.
  2. Then, for each node, starting at the root which has a depth of zero:
    1. Add the (depth, name) tuple of the node to the list.
    2. Visit all of this node's child nodes.
It is a preorder traversal of the conceptual tree, aggregating (depth, name) tuples into a list to form what I am calling the indent tree datastructure. It captures all the information of the tree, just in a different datastructure than normal; it can be extended to allow data at each node, and it might be a useful alternative for DB storage of trees.
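
A minimal sketch of that construction in Python, starting from the conventional nested (name, children) representation (tree2indent is my own name for it, not from the question):

def tree2indent(node, depth=0, out=None):
    "Preorder walk of a (name, children) tree, collecting (depth, name) tuples."
    if out is None:
        out = []
    name, children = node
    out.append((depth, name))
    for child in children:
        tree2indent(child, depth + 1, out)
    return out

# The example tree below, written as nested (name, children) tuples:
nested = ('R', [('a', [('c', [('d', []), ('h', [])])]),
                ('b', [('e', [('f', [('g', [])])]), ('i', [])])])
# tree2indent(nested) gives the indent format list shown below.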

An example tree:

[graphviz drawing of the example tree]

Its indent format:

[(0, 'R'),
 (1, 'a'),
 (2, 'c'),
 (3, 'd'),
 (3, 'h'),
 (1, 'b'),
 (2, 'e'),
 (3, 'f'),
 (4, 'g'),
 (2, 'i')]

Indented representation:

If you print out successive names from the indent format list above, one per line, each indented from the left by its indent value, then you get a nice textual representation of the tree, expanded left-to-right rather than top-down as in the graphic:


R
  a
    c
      d
      h
  b
    e
      f
        g
    i
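
Going back the other way is just as easy, which shows the indent format really does capture the whole tree. A rough sketch of a converter back to nested (name, children) nodes (again, my own helper, not from the original question):

def indent2tree(lst):
    "Rebuild a nested (name, children) tree from an indent format list."
    # Assumes the list is a valid preorder traversal starting at depth 0.
    root = None
    last_at_depth = {}              # depth -> most recently seen node at that depth
    for depth, name in lst:
        node = (name, [])
        if depth == 0:
            root = node
        else:
            last_at_depth[depth - 1][1].append(node)
        last_at_depth[depth] = node
    return root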


Code

I wrote some code to manipulate and traverse this kind of tree datastructure, as well as to use graphviz to draw graphical representations.


import graphviz as gv
from collections import defaultdict
import re
from pprint import pprint as pp


#%%
node_attr = {at: val for at, val in 
             (line.strip().split() for line in '''
shape      record
fontsize   12
height     0.1
width      0.1
rankdir    LR
ranksep    0.25
nodesep    0.25
            '''.strip().split('\n'))}
edge_attr = {at: val for at, val in 
             (line.strip().split() for line in '''
arrowhead  none
minlen     1
            '''.strip().split('\n'))}
root_attr = {at: val for at, val in 
             (line.strip().split() for line in '''
fontcolor  green
color      brown
fontname   bold
            '''.strip().split('\n'))}
#%%


def _pc2indent(pc, root=None, indent=None, children=None):
    "parent-child dependency dict to indent format conversion"
    if root is None:
        parents = set(pc)
        kids = set(sum(pc.values(), []))
        root = parents - kids
        assert len(root) == 1, f"Need exactly one root: {root}"
        root = root.pop()
    if indent is None:
        indent = 0
    if children is None:
        children = []
    children.append((indent, root))
    if root in pc:
        for child in pc[root]:
            _pc2indent(pc, child, indent+1, children)
    return children

def dot2indent(tree):
    "simple dot format to indent format translator"
    depends = defaultdict(list)
    for matchobj in re.finditer(r'\s+(\S+)\s*->\s*(\S+)\s', tree.source):
        parent, child = matchobj.groups()
        depends[parent].append(child)
    parent2child = dict(depends)
    return _pc2indent(parent2child)

#%%
def _highlight(txt):
    return f'#{txt}#'

def pp_indent(data, hlight=None, indent='  '):
    "simple printout of indent format tree with optional node highlighting"
    if hlight is None:
        hlight = set()
    for level, name in data:
        print(indent * level 
              + (_highlight(name) if name in hlight else name))

#%%
def indent2dot(lst):
    "indent format list to graphviz Digraph for drawing"
    tree = gv.Digraph(name='indent2dot', node_attr=node_attr)
    tree.edge_attr.update(**edge_attr)
    levelparent = {}
    for level, name in lst:
        levelparent[level] = name
        if level - 1 in levelparent:
            tree.edge(levelparent[level-1], name)
        else:
            tree.node(name, _attributes=root_attr)
    return tree

#%%
def printwithspace(i):
    print(i, end=' ')

def preorder(tree, visitor=printwithspace):
    "preorder traversal: the indent format list is already stored in preorder"
    for indent, name in tree:
        visitor(name)

def levelorder(tree, reverse_depth=False, reverse_in_level=False,
               visitor=printwithspace):
    "levelorder traversal: visit names grouped by depth, with optional reversals"
    if not tree:
        return
    indent2name = defaultdict(list)
    mx = -1
    for indent, name in tree:
        if indent > mx:
            mx = indent
        indent2name[indent].append(name)
    if reverse_in_level:
        for names in indent2name.values():
            names.reverse()
    if not reverse_depth:
        for indent in range(0, mx + 1):
            for name in indent2name[indent]:
                visitor(name)
    else:
        for indent in range(mx, -1, -1):
            for name in indent2name[indent]:
                visitor(name)
    
    
#%%
# Example tree
ex1    = [(0, '1'),
          (1, '2'), 
          (2, '4'), 
          (3, '7'), 
          (2, '5'), 
          (1, '3'),
          (2, '6'),
          (3, '8'), 
          (3, '9'),
          (3, '10'),
          ]

#%%
if __name__ == '__main__':
    print('A tree in indent datastructure format:')
    pp(ex1)
    print('\nSame tree, printed as indented list:')
    pp_indent(ex1)
    print('\nSame tree, drawn by graphviz:')
    display(indent2dot(ex1))  # display works in spyder/Jupyter

    print('\nSame tree, preorder traversal:')
    preorder(ex1)
    print()
    print('Same tree, levelorder traversal:')
    levelorder(ex1)
    print()
    print('Same tree, reverse_depth levelorder traversal:')
    levelorder(ex1, True)
    print()
    print('Same tree, reverse_depth, reverse_in_level levelorder traversal:')
    levelorder(ex1, True, True)
    print()
    print('Same tree, depth_first, reverse_in_level levelorder traversal:')
    levelorder(ex1, False, True)
    print()

Output:



A tree in indent datastructure format:
[(0, '1'),
 (1, '2'),
 (2, '4'),
 (3, '7'),
 (2, '5'),
 (1, '3'),
 (2, '6'),
 (3, '8'),
 (3, '9'),
 (3, '10')]

Same tree, printed as indented list:

1
    2
        4
            7
        5
    3
        6
            8
            9
            10

Same tree, drawn by graphviz:

[graphviz drawing of the tree]

Same tree, preorder traversal:
1 2 4 7 5 3 6 8 9 10
Same tree, levelorder traversal:
1 2 3 4 5 6 7 8 9 10
Same tree, reverse_depth levelorder traversal:
7 8 9 10 4 5 6 2 3 1
Same tree, reverse_depth, reverse_in_level levelorder traversal:
10 9 8 7 6 5 4 3 2 1
Same tree, depth_first, reverse_in_level levelorder traversal:
1 3 2 6 5 4 10 9 8 7
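
The dot2indent converter is not exercised in the __main__ block above; a quick round-trip check might look like this (it relies on the simple, unquoted node names of ex1 appearing verbatim in the dot source):

# Draw the indent format tree with graphviz, scrape the dot source back,
# and check that nothing was lost in the round trip; then print it with
# node 7 highlighted.
dot = indent2dot(ex1)
assert dot2indent(dot) == ex1
pp_indent(dot2indent(dot), hlight={'7'})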




Sunday, October 13, 2019

ISO 26262: Some thoughts on specs.

It seems as if there are many engineers who can do the work, but who struggle with writing the documentation.
In safety flows, everything comes from the specification: the designer designs to the spec., and the verification team verifies that the spec. is correctly implemented.
  • The spec. is central.
  • The spec. is a document written by engineers.
  • In modern system-on-chip designs, the spec. is complex.
There's an aphorism, yanked from the Zen of Python, that states "complex is better than complicated", which I take to mean that you need to look deeper for the structure in multi-faceted, intricate systems. How you find and present/document that structure can make a world of difference to how well, and how quickly, the system is understood.

Structured data

Let's call those who develop specs Concept Engineers. Concept engineers will have their modelling tools, scripts, spreadsheets, etc. that they use to converge on the correct design, which they then write up as the spec. Those tools often create a wealth of structured data such as:
  • Register maps
  • Register field descriptions
  • Memory maps
  • Top level bus maps
  • ...
The concept engineers will/can produce structured data in many formats, such as XML, JSON, SQL databases, CSV files, and REST-accessible web databases.

When writing the spec: parts can be automated by using these concept-stage, pre-spec data sources to generate sections of the document.

When implementing and verifying the design: parts can be automated by using these same data sources.

What is often left out is this: ISO 26262 mandates that all data come from the spec.

What is the spec?

I assume that the specification must be print-friendly and understandable. You need to convince auditors that what is delivered is an expression of the spec. I am assuming that expressing all of the concept-stage structured data in textual form and appending reams of XML as an appendix is unacceptable. You need register tables, state machine diagrams, truth tables, ...
We have standard ways of expressing our technical concepts that are expected to be used. In the world of safety, they need to be built upon.

Round tripping

If an item is auto-generated from structured data for use by the design verification team, then:
  1. The spec. should contain that data.
  2. Create a script that can scrape the final spec. format (usually a PDF) and regenerate the structured data, identical enough that a simple textual diff will show they are the same. (A sketch of such a check follows.)
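
A minimal sketch of that check, assuming both the concept-stage data and the scraped data can be dumped to JSON (the names here are illustrative, not from any real flow):

import difflib
import json

def normalised(data):
    "Canonical, diff-friendly text: sorted keys, fixed indent, one line per element."
    return json.dumps(data, indent=2, sort_keys=True).splitlines(keepends=True)

def roundtrip_diff(concept_data, scraped_data):
    "Unified diff between concept-stage data and data scraped back out of the spec."
    return ''.join(difflib.unified_diff(
        normalised(concept_data), normalised(scraped_data),
        fromfile='concept_data.json', tofile='scraped_from_spec.json'))

# An empty diff is the pass criterion:
#   print(roundtrip_diff(concept_registers, scraped_registers) or 'IDENTICAL')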

Scraping aids:

(I.e. aids to scraping data from a specification, much as web scraping does for HTML.)
  • I usually take the spec in PDF as the format for reading. Luckily, if you check with your PDF generation tools, there are PDF readers that can convert PDFs to spreadsheets. In my current case, PDF text lines appear as spreadsheet rows with the text all in the leftmost column, PDF table columns appear in multiple cells of a row, and each PDF page is a separate sheet of the workbook.
    Python, and other languages, have libraries that can read spreadsheets (a sketch of reading such a workbook, together with pre-tag extraction, follows this list).
  • The "original" structured data generated from concept engineering tools  should be "pretty printed" and ordered  before use in downstraem flows to allow easier textual comparison by diff'ing or other simple means.
  • When generating sections of text for a spec from structured data, pre-tag that section.
    Pre-tagging means adding a recognised word or sequence of words immediately before the generated spec data that denotes the format of that data chunk. For example, it could be a new line starting REGISTER:: that must always start a register definition, with its fields arranged, in order, inside a horizontal table annotated 31-to-16 then 15-to-0; then a table with headers of maybe "field, bitrange, type, comment"; defined text specifying register features; ...
    That pre-tag format should be used throughout the document and should not detract from how it reads. I use the example of a word followed by double colons above; a hash, '#', followed by a word (a hashtag) would work too, but choose a format and stick to it.
    Making the tag immediately precede what is tagged, and having data fields with the same tag expressed in the same format, aids scraping enormously (and reading too).
  • Show the structure in the items: if a pre-tagged item, such as a register name, has a range of values then show a parameterised name with a named index, and show how the index links to the properties of the register (in this case) that are designed to vary with the index, e.g. the register offset, any of the register's bitfields, reset values, ...
    Don't expand the index in the spec., as valuable information may be lost. It may seem easier, if the index has only two values, to add two separate "expanded" entries, but then their inter-relationships, and the very fact that they are related, must be inferred rather than being given.
    Different, separate, parameterised items may then share the same named index and index range to show extra information.
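
A rough sketch of the first and third aids above, assuming the PDF has been converted to an .xlsx workbook as described and that openpyxl is available (the tag names other than REGISTER:: are invented for illustration):

import re
import openpyxl

TAG = re.compile(r'^(REGISTER|MEMMAP|FIELDTABLE)::\s*(\S+)')   # illustrative tag set

def spec_lines(xlsx_path):
    "Yield the text lines of the spec: one sheet per PDF page, cells joined per row."
    wb = openpyxl.load_workbook(xlsx_path, read_only=True)
    for sheet in wb.worksheets:
        for row in sheet.iter_rows(values_only=True):
            cells = [str(c) for c in row if c is not None]
            if cells:
                yield '\t'.join(cells)

def tagged_items(xlsx_path):
    "Map (tag, name) -> the lines that follow it, up to the next tag."
    items, current = {}, None
    for line in spec_lines(xlsx_path):
        m = TAG.match(line)
        if m:
            current = (m.group(1), m.group(2))
            items[current] = []
        elif current is not None:
            items[current].append(line)
    return items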

Scraping benefits

  • Data duplication in specs: explaining a topic might need the mention of a tagged item before items of that type are all shown, for example specific registers before the section where all registers are shown as part of the register map. When the spec is scraped, the scraper can make sure the multiple definitions are equivalent.
  • Scraped data can be used to regenerate the concept data used in design and verification flows before the spec was finalised, to ensure the spec is correct (or those tools can be rerun on the spec's scraped data).
  • By thinking of scraping needs, you are forced to think about finding the patterns in the data, and about ensuring the spec is complete.

Diagrams

I find they are too much bother to try and scrape. I'd cut down on expressing concept data used to generate verification and design parts as only diagrams in the spec. Most diagrams do have textual counterparts, such as netlists. Creative layout of textual, parsable "code" may suffice, or be used in addition to the diagram that shows the same thing; scraping might then just point out an area for manual checking.

When you just can't get away from that long list

Let's say you have 1024 registers, identical in all but name and "power domain". You could use a parameterised name for the register type, with an alias for each indexed instance of that register type. The data format for registers needs to expand to have a field pointing to a named, pre-tagged data table that would map each index to its alias and power domain. (Of course, if the table is never very wide, then it could be doubled-up on the page to save space and make the spec. more presentable; someone will read it.)
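
A sketch of what that might look like as data, with invented names and only two of the 1024 entries shown:

# One parameterised register entry plus the pre-tagged index table it points at.
PARAM_REG = {
    'name': 'GPIO_CTRL[i]',          # parameterised register name, index i
    'base_offset': 0x1000,
    'stride': 0x4,                   # offset of instance i is base_offset + i*stride
    'fields': [('EN', '0:0'), ('MODE', '3:1')],
    'index_table': 'GPIO_CTRL_INSTANCES',
}

GPIO_CTRL_INSTANCES = {              # i -> (alias, power domain)
    0: ('GPIO_CTRL_NORTH', 'PD_AON'),
    1: ('GPIO_CTRL_SOUTH', 'PD_CORE'),
    # ... the real table carries all 1024 entries
}

def expand(reg, table):
    "Expand the parameterised entry into per-instance records for tooling use."
    for i, (alias, domain) in table.items():
        yield {'index': i, 'alias': alias, 'power_domain': domain,
               'offset': reg['base_offset'] + i * reg['stride'],
               'fields': reg['fields']}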

END.

(There are other aspects of spec writing that I need to write about in further posts.)