# Go deh!

Mainly Tech projects on Python and Electronic Design Automation.

## Saturday, March 28, 2009

### Huffman Encoding in Python

Huffman encoding came up on Rosetta Code.
Huffman encoding is a way to assign binary codes to symbols that reduces the overall number of bits used to encode a typical string of of those symbols.
For example, if you use letters as symbols and have details of the frequency of occurence of those letters in typical strings, then you could just encode each letter with a fixed number of bits, such as in ASCII codes. You can do better than this by encoding more frequently occurring letters such as e and a, with smaller bit strings; and less frequently occurring letters such as q and x with longer bit strings.
Any string of letters will be encoded as a string of bits that are no-longer of the same length per letter. To successfully decode such as string, the smaller codes assigned to letters such as 'e' cannot occur as a prefix in the larger codes such as that for 'x'.
If you were to assign a code 01 for 'e' and code 011 for 'x', then if the bits to decode started as 011... then you would not know iif you should decode an 'e' or an 'x'.
The Huffman coding scheme takes each symbol and its weight (or frequency of occurrence), and generates proper encodings for each symbol taking account of the weights of each symbol, so that higher weighted symbols have less bits in their encoding. (See the WP article for more information).
A Huffman encoding can be computed by first creating a tree of nodes:
 Create a leaf node for each symbol and add it to the priority queue. While there is more than one node in the queue: Remove the node of highest priority (lowest probability) twice to get two nodes. Create a new internal node with these two nodes as children and with probability equal to the sum of the two nodes' probabilities. Add the new node to the queue. The remaining node is the root node and the tree is complete. Traverse the constructed binary tree from root to leaves assigning and accumulating a '0' for one branch and a '1' for the other at each node. The accumulated zeroes and ones at each leaf constitute a Huffman encoding for those symbols and weights.

## In Python

I orginally gave an an example that matched a definition that was later found to be insufficient, so substituted my own definition above.. My first Python solution  on RC to the wrong definition, did have the advantage, (as I saw it), of not having to traverse a tree.
My 'true' Huffman code creator assembles each symbol and its weight into the following structure initially (the leaf structure):
[ weight, [ symbol, []]]
The weight applies to every (in this case only one), of the [symbol, []] pairs after it in the same list.
The empty list is used to accumulate the Huffman code for the symbol as we manipulate the heap, without having to walk a constructed tree structure.

There are two types of input to the program that I am running examples with:

1. A string of space separated symbol, weight pairs, as used in small examples.
2. A sample of text for which letters and letter frequencies are extracted.
The if statement at line 23  allows me to switch between the two types of input whilst exploring the algorithm.
The tutor argument to the encode function shows what is happening in the loop around the heap pops
``` 1
2 from heapq import heappush, heappop, heapify
3
4 def codecreate(symbol2weights, tutor= False):
5     ''' Huffman encode the given dict mapping symbols to weights '''
6     heap = [ [float(wt), [sym, []]] for sym, wt in symbol2weights.iteritems() ]
7     heapify(heap)
8     if tutor: print "ENCODING:", sorted(symbol2weights.iteritems())
9     while len(heap) >1:
10         lo = heappop(heap)
11         hi = heappop(heap)
12         if tutor: print "  COMBINING:", lo, '\n        AND:', hi
13         for i in lo[1:]: i[1].insert(0, '0')
14         for i in hi[1:]: i[1].insert(0, '1')
15         lohi = [ lo[0] + hi[0] ] + lo[1:] + hi[1:]
16         if tutor: print "  PRODUCING:", lohi, '\n'
17         heappush(heap, lohi)
18     codes = heappop(heap)[1:]
19     for i in codes: i[1] = ''.join(i[1])
20     return sorted(codes, key=lambda x: (len(x[-1]), x))
21
22 # Input types
23 if 1:
24     readin = "B 25   C 2.5 D  12.5 A 5 \n"
25     #readin = "a .1 b .15 c .3 d .16 e .29" # Wikipedia sample
26     #readin = "a1 .4 a2 .35 a3 .2 a4 .05" # Wikipedia sample
27     #readin = "A 50 B 25 C 12.5 D 12.5" # RC example
28
30     symbol2weights = dict((symbol, wt)
31                          for symbol, wt in zip(cleaned[0::2], cleaned[1::2]) )
32 else:
33     astring = "this is an example for huffman encoding"
34     symbol2weights = dict((ch, astring.count(ch)) for ch in set(astring)) # for astring
35
36 huff = codecreate(symbol2weights, True)
37 print "\nSYMBOL\tWEIGHT\tHUFFMAN CODE"
38 for h in huff:
39     print "%s\t%s\t%s" % (h[0], symbol2weights[h[0]], h[1])```

A run, with the tutor enabled gives the following output:
ENCODING: [('A', '5'), ('B', '25'), ('C', '2.5'), ('D', '12.5')]
COMBINING: [2.5, ['C', []]]
AND: [5.0, ['A', []]]
PRODUCING: [7.5, ['C', ['0']], ['A', ['1']]]

COMBINING: [7.5, ['C', ['0']], ['A', ['1']]]
AND: [12.5, ['D', []]]
PRODUCING: [20.0, ['C', ['0', '0']], ['A', ['0', '1']], ['D', ['1']]]

COMBINING: [20.0, ['C', ['0', '0']], ['A', ['0', '1']], ['D', ['1']]]
AND: [25.0, ['B', []]]
PRODUCING: [45.0, ['C', ['0', '0', '0']], ['A', ['0', '0', '1']], ['D', ['0', '1']], ['B', ['1']]]

SYMBOL    WEIGHT    HUFFMAN CODE
B         25        1
D         12.5      01
A         5         001
C         2.5       000

## Encode/Decode Round-tripping

I realised that I could use a method similar to how I accumulate the codes in the heap loop, to generate a single function that can recognise a single symbol from the beginning of the encoded symbols. By using the function in a loop, I could regenerate the symbol list.

In the codecreate function, the leaf structure is modified:
[weight, [ [symbol, []] ], repr(sym)]
Their is an extra level of list around the [symbol, code accumulation list pair]  as well as a new item: 'repr(sym)' it is the third item in the outer list and will always be the  function to generate the item as accumulated so far.. This function starts off by returning just the symbol and an outer if/then/else expression is added as we go around the heap loop (see line 43)

After the outer expression is accumulated, it is turned into a lambda expression, the string eval'd, and assigned to a global variable (see line 47)

I use function probchoice (from earlier RC work), to create an arbitrary sequence of symbols to in the given weighting then encode and decode it as well as giving some stats on space saving.

```  1
2 from heapq import heappush, heappop, heapify
3 import random, bisect
4
5
6 # Helper routine for generating test sequences
7 def probchoice(items, probs):
8   '''\
9   Splits the interval 0.0-1.0 in proportion to probs
10   then finds where each random.random() choice lies
11   (This routine, probchoice, was released under the
12   GNU Free Documentation License 1.2)
13   '''
14
15   prob_accumulator = 0
16   accumulator = []
17   for p in probs:
18     prob_accumulator += p
19     accumulator.append(prob_accumulator)
20
21   while True:
22     r = random.random()
23     yield items[bisect.bisect(accumulator, r)]
24
25
26 # placeholder
27 decode = lambda : None
28
29 def codecreate(symbol2weights, tutor= False):
30     ''' Huffman encode the given dict mapping symbols to weights '''
31     global decode
32
33     heap = [ [float(wt), [[sym, []]], repr(sym)] for sym, wt in symbol2weights.iteritems() ]
34     heapify(heap)
35     if tutor: print "ENCODING:", sorted(symbol2weights.iteritems())
36     while len(heap) >1:
37         lo = heappop(heap)
38         hi = heappop(heap)
39         if tutor: print "  COMBINING:", lo, '\n        AND:', hi
40         for i in lo[1]: i[1].insert(0, '0')
41         for i in hi[1]: i[1].insert(0, '1')
42         lohi = [ lo[0] + hi[0] ] + [lo[1] + hi[1]]
43         lohi.append('(%s if nextbit() else %s)' % (hi[2], lo[2]))
44         if tutor: print "  PRODUCING:", lohi, '\n'
45         heappush(heap, lohi)
46     wt, codes, decoder = heappop(heap)
47     decode = eval('lambda :' + decoder, globals())
48     decode.__doc__ = decoder
49     for i in codes: i[1] = ''.join(i[1])
50     #for i in codes: i[::] = i[:2]
51     return sorted(codes, key=lambda x: (len(x[-1]), x))
52
53 # Input types
54 if 1:
55     tutor = True
56     sequencecount = 50
57     readin = "B 25   C 2.5 D  12.5 A 5 \n"
58     #readin = "a .1 b .15 c .3 d .16 e .29" # Wikipedia sample
59     #readin = "a1 .4 a2 .35 a3 .2 a4 .05" # Wikipedia sample
60     #readin = "A 50 B 25 C 12.5 D 12.5" # RC example
61
63     symbol2weights = dict((symbol, wt)
64                          for symbol, wt in zip(cleaned[0::2], cleaned[1::2]) )
65 else:
66     tutor = False
67     sequencecount = 500
68     astring = "this is an example for huffman encoding"
69     symbol2weights = dict((ch, astring.count(ch)) for ch in set(astring)) # for astring
70
71 huff = codecreate(symbol2weights, tutor= tutor)
72 print "\nSYMBOL\tWEIGHT\tHUFFMAN CODE"
73 for h in huff:
74     print "%s\t%s\t%s" % (h[0], symbol2weights[h[0]], h[1])
75
76 ##
77 ## encode-decode check
78 ##
79 symbol2code = dict(huff)
80 symbols, weights = zip(*symbol2weights.iteritems())
81 # normalize weights
82 weights = [float(wt) for wt in weights]
83 tot = sum(weights)
84 weights = [wt/tot for wt in weights]
85 # Generate a sequence
86 nxt = probchoice(symbols, weights).next
87 symbolsequence = [nxt() for i in range(sequencecount)]
88 # encode it
89 bitsequence = ''.join(symbol2code[sym] for sym in symbolsequence)
90
91 sslen, slen, blen = len(symbolsequence), len(symbols), len(bitsequence)
92 countlen = len(bin(slen-1)[2:])
93 print '''
94
95
96 ROUND-TRIPPING
97 ==============
98 I have generated a random sequence of %i symbols to the given weights.
99 If I use a binary count to encode each of the %i symbols I would need
100 %i * %i = %i bits to encode the sequence.
101 Using the Huffman code, I need only %i bits.
102 ''' % (sslen, slen, sslen, countlen, sslen * countlen, blen )
103
104 ## decoding
105 nextbit = (bit=='1' for bit in bitsequence).next
106
107 decoded = []
108 try:
109     while 1:
110         decoded.append(decode())
111 except StopIteration:
112     pass
113
114 print "Comparing the decoded sequence with the original I get:", decoded == symbolsequence```

This short run is in tutor mode, so you can track the accumulation of the decode function:
ENCODING: [('A', '5'), ('B', '25'), ('C', '2.5'), ('D', '12.5')]
COMBINING: [2.5, [['C', []]], "'C'"]
AND: [5.0, [['A', []]], "'A'"]
PRODUCING: [7.5, [['C', ['0']], ['A', ['1']]], "('A' if nextbit() else 'C')"]

COMBINING: [7.5, [['C', ['0']], ['A', ['1']]], "('A' if nextbit() else 'C')"]
AND: [12.5, [['D', []]], "'D'"]
PRODUCING: [20.0, [['C', ['0', '0']], ['A', ['0', '1']], ['D', ['1']]], "('D' if nextbit() else ('A' if nextbit() else 'C'))"]

COMBINING: [20.0, [['C', ['0', '0']], ['A', ['0', '1']], ['D', ['1']]], "('D' if nextbit() else ('A' if nextbit() else 'C'))"]
AND: [25.0, [['B', []]], "'B'"]
PRODUCING: [45.0, [['C', ['0', '0', '0']], ['A', ['0', '0', '1']], ['D', ['0', '1']], ['B', ['1']]], "('B' if nextbit() else ('D' if nextbit() else ('A' if nextbit() else 'C')))"]

SYMBOL    WEIGHT    HUFFMAN CODE
B         25        1
D         12.5      01
A         5         001
C         2.5       000

ROUND-TRIPPING
==============
I have generated a random sequence of 50 symbols to the given weights.
If I use a binary count to encode each of the 4 symbols I would need
50 * 2 = 100 bits to encode the sequence.
Using the Huffman code, I need only 90 bits.

Comparing the decoded sequence with the original I get: True

And if I change line 54 to be False, I get the following:
SYMBOL    WEIGHT    HUFFMAN CODE
6    101
n    4    010
a    3    1001
e    3    1100
f    3    1101
h    2    0001
i    3    1110
m    2    0010
o    2    0011
s    2    0111
g    1    00000
l    1    00001
p    1    01100
r    1    01101
t    1    10000
u    1    10001
x    1    11110
c    1    111110
d    1    111111

ROUND-TRIPPING
==============
I have generated a random sequence of 500 symbols to the given weights.
If I use a binary count to encode each of the 19 symbols I would need
500 * 5 = 2500 bits to encode the sequence.
Using the Huffman code, I need only 2012 bits.

Comparing the decoded sequence with the original I get: True

Note: The purpose of the program is to teach me more about Huffman coding and is not an exercise in speed!

I am the author of all the Python on this page. The diagram is from Wikipedia. Please refrain from passing-off my code as your own (that one is mainly for students).

## Saturday, March 21, 2009

### Batch Process Runner in bash shell - Forgotten Enhancements

post caused me to dig out this enhancement to my original
batch process runner.

What you do is create a file of one-liner commands that you would enter
at a command prompt, for example, this is file style="font-weight: bold;">process_list.txt :

`1  color="#0000ff"># (Comments have '#' at the left margin) color="#804040">2 sleep 4; echo slept for 4 at line 2; csdf_sdf_sd; exit 19 color="#804040">3 sleep 1; cause-an-error; echo slept for 1 at line 3 color="#804040">4 sleep 3; echo  color="#ff00ff">'slept for 3 at line 4' color="#804040">5 sleep 7; echo  color="#ff00ff">"slept for 7 at line 5" color="#804040">6 # Whee! color="#804040">7 sleep 9; another-erro; echo slept for 9 at line 7 color="#804040">8 sleep 5; echo  color="#ff00ff">'slept "for 5" at line 8'`

A bit of warning, bash returns the exit code of this laast command it
executes, earlier exit codes won't be seen.

Give my script the number of processes to run in parallel, N, followed
by the above file, and it will create N '.job'  files out of
the non-comment lines of process_list.txt and execute them as
background jobs, with their output sent to '.n' files.:

`bash\$ ./process_list_runner.sh 2 process_list.txt## STARTING 6 Processes from file: process_list.txt, 2 at a time with id plr_HPDV8025EA_2009-03-21-05_36_19_PADDYS-HPLAPTOP# 2 Jobs in background. 6/6 started. 2009-03-21-05:36:36## FINISHED, (stats in plr_HPDV8025EA_2009-03-21-05_36_19_PADDYS-HPLAPTOP.csv)bash\$  style="font-weight: bold;">ls -1 plr_HPDV8025EA_2009-03-21-05_36_19_PADDYS-HPLAPTOP.*plr_HPDV8025EA_2009-03-21-05_36_19_PADDYS-HPLAPTOP.1plr_HPDV8025EA_2009-03-21-05_36_19_PADDYS-HPLAPTOP.2plr_HPDV8025EA_2009-03-21-05_36_19_PADDYS-HPLAPTOP.3plr_HPDV8025EA_2009-03-21-05_36_19_PADDYS-HPLAPTOP.4plr_HPDV8025EA_2009-03-21-05_36_19_PADDYS-HPLAPTOP.5plr_HPDV8025EA_2009-03-21-05_36_19_PADDYS-HPLAPTOP.6plr_HPDV8025EA_2009-03-21-05_36_19_PADDYS-HPLAPTOP.csvplr_HPDV8025EA_2009-03-21-05_36_19_PADDYS-HPLAPTOP.job_0plr_HPDV8025EA_2009-03-21-05_36_19_PADDYS-HPLAPTOP.job_1bash\$ `

The script does what it can to create a csv file of run stats, which,
with a bit of tidying up in calc produces the following:

frame="void" rules="none">
width="340"> width="110"> width="115"> width="73">

 width="1409">##STARTING 6 Processes from file: process_list.txt size="4">2 at a time with idplr_HPDV8025EA_2009-03-21-05_36_19_PADDYS-HPLAPTOP align="center" bgcolor="#ccffff" height="123" valign="middle">JOB align="right" bgcolor="#ccffff" valign="middle">LINE align="center" bgcolor="#ccffff" valign="middle">Commandbeing timed align="center" bgcolor="#ccffff" valign="middle">Usertime (seconds) align="center" bgcolor="#ccffff" valign="middle">Systemtime (seconds) align="center" bgcolor="#ccffff" valign="middle">Percentof CPU this job got align="center" bgcolor="#ccffff" valign="middle">Elapsed(wall clock) time (h:mm:ss or m:ss) align="center" bgcolor="#ccffff" valign="middle">Maximumresident set size (kbytes) align="center" bgcolor="#ccffff" valign="middle">Major(requiring I/O) page faults align="center" bgcolor="#ccffff" valign="middle">Signalsdelivered align="center" bgcolor="#ccffff" valign="middle">Pagesize (bytes) align="center" bgcolor="#ccffff" valign="middle">Exitstatus sdnum="2057;" align="center" bgcolor="#ffffcc" height="18">1 sdnum="2057;" align="center" bgcolor="#ffffcc">2 bgcolor="#ffffcc">sleep 4; echo slept for 4 at line 2;csdf_sdf_sd; exit 19 sdnum="2057;" align="center" bgcolor="#ffffcc">0.13 sdnum="2057;" align="center" bgcolor="#ffffcc">0.15 sdnum="2057;0;0.00%" align="center" bgcolor="#ffffcc">6.00% sdval="0.0000503472222222222" sdnum="2057;0;MM:SS.00" align="center" bgcolor="#ffffcc">00:04.35 sdval="760832" sdnum="2057;" align="center" bgcolor="#ffffcc">760832 sdnum="2057;" align="center" bgcolor="#ffffcc">3277 sdnum="2057;" align="center" bgcolor="#ffffcc">6 sdval="65536" sdnum="2057;" align="center" bgcolor="#ffffcc">65536 sdnum="2057;" align="center" bgcolor="#ffffcc">19 sdnum="2057;" align="center" bgcolor="#ccccff" height="17">2 sdnum="2057;" align="center" bgcolor="#ccccff">3 bgcolor="#ccccff">sleep 1; cause-an-error; echo slept for 1at line 3 sdnum="2057;" align="center" bgcolor="#ccccff">0.12 sdnum="2057;" align="center" bgcolor="#ccccff">0.09 sdnum="2057;0;0.00%" align="center" bgcolor="#ccccff">16.00% sdval="0.0000152777777777778" sdnum="2057;0;MM:SS.00" align="center" bgcolor="#ccccff">00:01.32 sdval="761088" sdnum="2057;" align="center" bgcolor="#ccccff">761088 sdnum="2057;" align="center" bgcolor="#ccccff">3273 sdnum="2057;" align="center" bgcolor="#ccccff">6 sdval="65536" sdnum="2057;" align="center" bgcolor="#ccccff">65536 sdnum="2057;" align="center" bgcolor="#ccccff">0 sdnum="2057;" align="center" bgcolor="#ffffcc" height="17">3 sdnum="2057;" align="center" bgcolor="#ffffcc">4 bgcolor="#ffffcc">sleep 3; echo 'slept for 3 at line 4' sdnum="2057;" align="center" bgcolor="#ffffcc">0.07 sdnum="2057;" align="center" bgcolor="#ffffcc">0.06 sdnum="2057;0;0.00%" align="center" bgcolor="#ffffcc">4.00% sdval="0.0000377314814814815" sdnum="2057;0;MM:SS.00" align="center" bgcolor="#ffffcc">00:03.26 sdval="617216" sdnum="2057;" align="center" bgcolor="#ffffcc">617216 sdnum="2057;" align="center" bgcolor="#ffffcc">2639 sdnum="2057;" align="center" bgcolor="#ffffcc">3 sdval="65536" sdnum="2057;" align="center" bgcolor="#ffffcc">65536 sdnum="2057;" align="center" bgcolor="#ffffcc">0 sdnum="2057;" align="center" bgcolor="#ccccff" height="17">4 sdnum="2057;" align="center" bgcolor="#ccccff">5 bgcolor="#ccccff">sleep 7; echo "slept for 7 at line 5" sdnum="2057;" align="center" bgcolor="#ccccff">0.07 sdnum="2057;" align="center" bgcolor="#ccccff">0.07 sdnum="2057;0;0.00%" align="center" bgcolor="#ccccff">2.00% sdval="0.000084837962962963" sdnum="2057;0;MM:SS.00" align="center" bgcolor="#ccccff">00:07.33 sdval="617216" sdnum="2057;" align="center" bgcolor="#ccccff">617216 sdnum="2057;" align="center" bgcolor="#ccccff">2644 sdnum="2057;" align="center" bgcolor="#ccccff">3 sdval="65536" sdnum="2057;" align="center" bgcolor="#ccccff">65536 sdnum="2057;" align="center" bgcolor="#ccccff">0 sdnum="2057;" align="center" bgcolor="#ffffcc" height="18">5 sdnum="2057;" align="center" bgcolor="#ffffcc">7 bgcolor="#ffffcc">sleep 9; another-erro; echo slept for 9 atline 7 sdnum="2057;" align="center" bgcolor="#ffffcc">0.09 sdnum="2057;" align="center" bgcolor="#ffffcc">0.13 sdnum="2057;0;0.00%" align="center" bgcolor="#ffffcc">2.00% sdval="0.000106597222222222" sdnum="2057;0;MM:SS.00" align="center" bgcolor="#ffffcc">00:09.21 sdval="760064" sdnum="2057;" align="center" bgcolor="#ffffcc">760064 sdnum="2057;" align="center" bgcolor="#ffffcc">3274 sdnum="2057;" align="center" bgcolor="#ffffcc">6 sdval="65536" sdnum="2057;" align="center" bgcolor="#ffffcc">65536 sdnum="2057;" align="center" bgcolor="#ffffcc">0 sdnum="2057;" align="center" bgcolor="#ccccff" height="17">6 sdnum="2057;" align="center" bgcolor="#ccccff">8 bgcolor="#ccccff">sleep 5; echo 'slept "for 5" at line 8' sdnum="2057;" align="center" bgcolor="#ccccff">0.1 sdnum="2057;" align="center" bgcolor="#ccccff">0.06 sdnum="2057;" align="center" bgcolor="#ccccff">0.03 sdval="0.0000622685185185185" sdnum="2057;0;MM:SS.00" align="center" bgcolor="#ccccff">00:05.38 sdval="617216" sdnum="2057;" align="center" bgcolor="#ccccff">617216 sdnum="2057;" align="center" bgcolor="#ccccff">2639 sdnum="2057;" align="center" bgcolor="#ccccff">3 sdval="65536" sdnum="2057;" align="center" bgcolor="#ccccff">65536 sdnum="2057;" align="center" bgcolor="#ccccff">0

The script itself is:

` 1  color="#0000ff">#!/bin/bash  color="#804040"> 2 #!/opt/TWWfsw/bin/bash - color="#804040"> 3  color="#804040"> 4 set  color="#008080">-u color="#804040"> 5  color="#804040"> 6 ## color="#804040"> 7 ## process_runner.sh <concurrent> <total procs> color="#804040"> 8 ## color="#804040"> 9 ## Example script given the maximum number of processes to color="#804040"> 10 ## run concurrently, c, and the total number of processes, n color="#804040"> 11 ## runs at most c, processes in the background until all n color="#804040"> 12 ## Have been run. color="#804040"> 13 ## color="#804040"> 14 ## Author Donald 'Paddy' McCarthy Dec. 17 2007 color="#804040"> 15 ## color="#804040"> 16  color="#804040"> 17 # how many processes to run in parallel color="#804040"> 18 concurrent= color="#a020f0">\$1 color="#804040"> 19 # File of commands color="#804040"> 20 proclist= color="#804040">" color="#a020f0">\$2" color="#804040"> 21  color="#804040"> 22 ## color="#804040"> 23 ## color="#804040"> 24  color="#804040"> 25 # main loop wait time between checking background procs. color="#804040"> 26 tick= color="#ff00ff">1 color="#804040"> 27 # Unique_id for process files of this run color="#804040"> 28 unique_id=plr_ color="#6a5acd">`date +\${ color="#a020f0">USER//  color="#804040">/ color="#a020f0">}_%F-%T_ color="#a020f0">\${HOSTNAME color="#804040">//  color="#804040">/ color="#a020f0">}` color="#804040"> 29 unique_id= color="#a020f0">\${unique_id color="#804040">//: color="#804040">/_ color="#a020f0">} color="#804040"> 30  color="#804040"> 31 function read_proclistfile  color="#6a5acd">{ color="#804040"> 32  # read processes from file ignoring comment lines color="#804040"> 33  maxprocs= color="#ff00ff">0 color="#804040"> 34  local -i  color="#008080">linenumber= color="#ff00ff">0 color="#804040"> 35  while color="#804040">  color="#804040">read color="#804040">; color="#804040">  color="#804040">do color="#804040"> 36  (( color="#008080">linenumber+= color="#ff00ff">1)) color="#804040"> 37  # echo [ "\${REPLY:0:1}" == "#" ] \$maxprocs \$REPLY color="#804040"> 38  if  color="#804040">[  color="#804040">" color="#a020f0">\${REPLY color="#804040">: color="#ff00ff">0: color="#ff00ff">1} color="#804040">"  color="#804040">!=  color="#804040">" color="#ff00ff">#"  color="#804040">]  color="#804040">;  color="#804040">then color="#804040"> 39  ((maxprocs+ color="#804040">= color="#ff00ff">1)) color="#804040"> 40  allprocs color="#804040">[ color="#a020f0">\$maxprocs color="#804040">] color="#804040">= color="#804040">" color="#a020f0">\$REPLY" color="#804040"> 41  allprocline color="#804040">[ color="#a020f0">\$maxprocs color="#804040">] color="#804040">= color="#a020f0">\$linenumber color="#804040"> 42  # echo \$linenumber \$maxprocs \$REPLY color="#804040"> 43  fi color="#804040"> 44  done  color="#804040"><  color="#a020f0">\$proclist color="#804040"> 45 } color="#804040"> 46  color="#804040"> 47 read_proclistfile color="#804040"> 48  color="#804040"> 49 printf  color="#804040">" color="#6a5acd">\n## STARTING %i Processes from file: %s, %i at a time with id %s color="#6a5acd">\n\n" \ color="#804040"> 50  \$maxprocs  color="#a020f0">\$proclist  color="#a020f0">\$concurrent  color="#a020f0">\$unique_id color="#804040"> 51  color="#804040"> 52  color="#804040"> 53  color="#804040"> 54 function assemble_job  color="#6a5acd">{ color="#804040"> 55  # color="#804040"> 56  local  color="#008080">plr_unique_id= color="#804040">" color="#a020f0">\$1"; color="#804040">local  color="#008080">plr_ran=" color="#a020f0">\$2"; color="#804040">local  color="#008080">plr_proc= color="#804040">" color="#a020f0">\$3"; color="#804040">local  color="#008080">plr_procline= color="#804040">" color="#a020f0">\$4"; color="#804040">local  color="#008080">plr_proclist= color="#804040">" color="#a020f0">\$5" color="#804040"> 57  cat  color="#804040"><<! color="#804040"> 58  color="#804040"> 59 #  color="#a020f0">\$plr_unique_id color="#ff00ff"> \$plr_ran color="#ff00ff">  color="#804040"> 60 ` color="#804040">local color="#6a5acd">` color="#804040"> 61  color="#804040"> 62 trap 'error= color="#6a5acd">\\$?;printf " color="#6a5acd">\n\n##STATISTICS For Job %i color="#6a5acd">\n"  color="#a020f0">\$plr_ran; printf "# Line %i of file %s color="#6a5acd">\n\n"  color="#a020f0">\$plr_procline color="#ff00ff"> \$plr_proclist color="#ff00ff">; exit \\$ color="#ff00ff">error' EXIT color="#804040"> 63  color="#804040"> 64  color="#804040"> 65 \$plr_proc color="#804040"> 66  color="#804040"> 67 ! color="#804040"> 68 } color="#804040"> 69  color="#804040"> 70 function print_runstats  color="#6a5acd">{ color="#804040"> 71  printf  color="#804040">' color="#ff00ff"># %i Jobs in background. %i/%i started. %s\r color="#804040">' \ color="#804040"> 72  ` color="#804040">jobs color="#6a5acd"> -r color="#804040">| color="#6a5acd">wc -l` \$ran  color="#a020f0">\$maxprocs  color="#804040">" color="#6a5acd">`date +%F-%T` color="#804040">" color="#804040"> 73 } color="#804040"> 74  color="#804040"> 75 # Bash array running keeps track of the background process numbers color="#804040"> 76 # Start with nothing running (sentinel value will not be a process number color="#804040"> 77 for  color="#6a5acd">((i= color="#ff00ff">0; i color="#804040">< color="#a020f0">\$concurrent color="#804040">; i+ color="#804040">= color="#ff00ff">1 )) color="#804040">;  color="#804040">do running color="#804040">[ color="#a020f0">\$i] color="#804040">= color="#ff00ff">123456789 color="#804040">;  color="#804040">done color="#804040"> 78  color="#804040"> 79 ran= color="#ff00ff">0 color="#804040"> 80 until color="#804040"> 81  while  color="#804040">[  color="#a020f0">\$ran -lt  color="#a020f0">\$maxprocs  color="#804040">] color="#804040">;  color="#804040">do color="#804040"> 82  for  color="#6a5acd">((p= color="#ff00ff">0; p color="#804040">< color="#a020f0">\$concurrent color="#804040">; p+ color="#804040">= color="#ff00ff">1 )) color="#804040">;  color="#804040">do color="#804040"> 83  proc= color="#a020f0">\${running color="#a020f0">[\$p color="#a020f0">]} color="#804040"> 84  # Over all running processes... color="#804040"> 85  # \$proc still running? color="#804040"> 86  ps -p  color="#a020f0">\$proc |  color="#804040">fgrep  color="#a020f0">\$proc >/dev/null color="#804040"> 87  if  color="#804040">[  color="#a020f0">\$? -ne  color="#804040">' color="#ff00ff">0'  color="#804040">]  color="#804040">;  color="#804040">then color="#804040"> 88  # Not found i.e. its finished color="#804040"> 89  # So start the next color="#804040"> 90  ((ran+ color="#804040">= color="#ff00ff">1)) color="#804040"> 91  #(assemble_job ) >\$unique_id.job_\$p color="#804040"> 92  ( assemble_job  color="#804040">" color="#a020f0">\$unique_id color="#804040">"  color="#804040">" color="#a020f0">\$ran"  color="#804040">" color="#a020f0">\${allprocs color="#a020f0">[\$ran color="#a020f0">]} color="#804040">" \ color="#804040"> 93  \${ color="#a020f0">allprocline color="#a020f0">[\$ran color="#a020f0">]}  color="#a020f0">\$proclist  color="#804040">)  color="#804040">>  color="#a020f0">\$unique_id.job_ color="#a020f0">\$p color="#804040"> 94  chmod +x  color="#a020f0">\$unique_id.job_ color="#a020f0">\$p color="#804040"> 95  # run one job in background collecting run stats color="#804040"> 96  (/bin/ color="#804040">time -v -a -o  color="#a020f0">\$unique_id. color="#a020f0">\$ran < /dev/null  color="#a020f0">\$unique_id.job_ color="#a020f0">\$p 2>&1  color="#804040">)  color="#804040">>  color="#a020f0">\$unique_id. color="#a020f0">\$ran & color="#804040"> 97  running color="#804040">[ color="#a020f0">\$p] color="#804040">= color="#a020f0">\$! color="#804040"> 98  runningprocnum color="#804040">[ color="#a020f0">\$p] color="#804040">= color="#a020f0">\$ran color="#804040"> 99  if  color="#804040">[  color="#a020f0">\$ran -ge  color="#a020f0">\$maxprocs  color="#804040">] color="#804040">;  color="#804040">then  color="#804040">break  color="#ff00ff">1;  color="#804040">fi color="#804040">100  #exit color="#804040">101  fi color="#804040">102  done color="#804040">103  sleep  color="#a020f0">\$tick color="#804040">104  # Status color="#804040">105  print_runstats color="#804040">106  done color="#804040">107  color="#804040">108  ## break 1 gets us here! color="#804040">109  color="#804040">110  # Keep on printing status while there are background processes  color="#804040">111  # even though there are no more to start. color="#804040">112  sleep  color="#a020f0">\$tick color="#804040">113  # Status color="#804040">114  print_runstats color="#804040">115 do  color="#804040">[  color="#6a5acd">`jobs color="#6a5acd"> -r color="#804040">| color="#6a5acd">wc -l` -eq  color="#ff00ff">0 ] color="#804040">116 done color="#804040">117 wait color="#804040">118  color="#804040">119 function stats2csv  color="#6a5acd">{ color="#804040">120  # Gather overall stats in csv file color="#804040">121  color="#804040">122  # CSV Heading color="#804040">123  printf  color="#804040">" color="#6a5acd">\n## STARTING %i Processes from file: %s color="#6a5acd">\n%i at a time with id %s color="#6a5acd">\n\n" \ color="#804040">124  \$maxprocs  color="#a020f0">\$proclist  color="#a020f0">\$concurrent  color="#a020f0">\$unique_id  color="#804040">>  color="#a020f0">\$unique_id.csv color="#804040">125  # CSV Column labels  color="#804040">126  gawk  color="#804040">' color="#ff00ff">/^##STATISTICS For Job/{s++} color="#804040">127  s&&/^# Line .*of file/{printf "JOB,LINE";next} color="#804040">128  s&&/: /{l=\$0;sub(/: .*/,"");\$1=\$1;printf",%s",\$0} color="#804040">129  s&&l~/Exit status:/{s=0;print"";l="";exit} color="#804040">130   color="#804040">'  color="#a020f0">\$unique_id. color="#804040">[ color="#ff00ff">0-9]*  color="#804040">>>  color="#a020f0">\$unique_id.csv color="#804040">131  color="#804040">132  # Stick allprocs into env for later gawk substitution of command color="#804040">133  for i  color="#804040">in  color="#a020f0">\${!allprocs color="#a020f0">[@] color="#a020f0">};  color="#804040">do color="#804040">134  declare  color="#008080">-x _ap color="#a020f0">\$i= color="#804040">" color="#a020f0">\${allprocs color="#a020f0">[\$i color="#a020f0">]} color="#804040">" color="#804040">135  done color="#804040">136  #env|grep '_ap.=' ; debug color="#804040">137  # CSV Rows color="#804040">138  gawk  color="#804040">' color="#ff00ff">/^##STATISTICS For Job/{s++;j=\$NF} color="#804040">139  s&&/^# Line .*of file/{printf "%s,%s",j,\$3;next} color="#804040">140  s&&/Command [^:]*: /{printf",%s",ENVIRON["_ap" j]; next} color="#804040">141  s&&/: /{l=\$0;sub(/.*: /,"");printf",%s",\$0} color="#804040">142  s&&l~/Exit status:/{s=0;print"";l="";nextfile} color="#804040">143   color="#804040">'  color="#a020f0">\$unique_id. color="#804040">[ color="#ff00ff">0-9]*  color="#804040">>>  color="#a020f0">\$unique_id.csv color="#804040">144 } color="#804040">145  color="#804040">146 stats2csv color="#804040">147  color="#804040">148 printf  color="#804040">" color="#6a5acd">\n## FINISHED, (stats in %s) color="#6a5acd">\n\n"  color="#a020f0">\$unique_id.csv color="#804040">149  color="#804040">150 exit  color="#ff00ff">0 color="#804040">151  color="#804040">152  color="#804040">153  color="#804040">154 `

I remember writing this with the help of the advanced bash tutorial
which is great, and looking back at this program, I would need the
tutorials help if I needed to extend it!

## Why not in Python?

Of course I thought of doing it in Python, but bash's job
control features were used to flesh out the initial idea and it grew to
become the finished article. Looking back at this program as it's over
a year old, I find that I like bash's job control, and gawk one-liners

src="http://kompozer.sourceforge.net/images/kompozer_80x15.png"
border="0">

## Wednesday, March 04, 2009

### Psion Netbook

I was just reading about Intel's claim that the term Netbook was generic. It seems to be bullying to me.

So, in support of 'our Psion' I got out my old Psion Series 3 for nostalgias sake, and again marvelled at its design.

P.S. Offers of a complementary Netbook would be gratefully received :-)