Mainly Tech projects on Python and Electronic Design Automation.

Saturday, March 28, 2009

Huffman Encoding in Python



Huffman encoding came up on href="http://www.rosettacode.org/wiki/Huffman_codes">Rosetta
Code.

Huffman encoding is a way to assign binary codes to symbols
that
reduces the overall number of bits used to encode a typical string of
of those symbols.


For example, if you use letters as symbols and have details of
the frequency of occurence of those letters in typical strings, then
you could just encode each letter with a fixed number of bits, such as
in ASCII codes. You can do better than this by encoding more frequently
occurring letters such as e and a, with smaller bit strings; and less
frequently occurring letters such as q and x with longer bit strings.


Any string of letters will be encoded as a string of bits that
are no-longer of the same length per letter. To successfully decode
such as string, the smaller codes assigned to letters such as 'e'
cannot occur as a prefix in the larger codes such as that for 'x'.



If you were to assign a code 01 for 'e' and code 011 for
'x',
then if the bits to decode started as 011... then you would not know
iif you should decode an 'e' or an 'x'.


The Huffman coding scheme takes each symbol and its weight (or
frequency of occurrence), and generates proper encodings for each
symbol taking account of the weights of each symbol, so that higher
weighted symbols have less bits in their encoding. (See the href="http://en.wikipedia.org/wiki/Huffman_coding" class="extiw"
title="wp:Huffman_coding">WP article for more
information).


A Huffman encoding can be computed by first creating a tree of
nodes:


cellpadding="1" cellspacing="1">









  1. Create a leaf node for each symbol and add it to the
    priority queue.

  2. While there is more than one node in the queue:

    1. Remove the node of highest priority (lowest
      probability) twice to get two nodes.

    2. Create a new internal node with these two nodes
      as children
      and with probability equal to the sum of the two nodes' probabilities.

    3. Add the new node to the queue.



  3. The remaining node is the root node and the tree is
    complete.


Traverse the constructed binary tree from root to leaves
assigning
and accumulating a '0' for one branch and a '1' for the other at each
node. The accumulated zeroes and ones at each leaf constitute a Huffman
encoding for those symbols and weights.


href="http://www.rosettacode.org/wiki/Image:Huffman_coding_example.jpg"
class="image" title="Huffman coding example.jpg"> alt=""
src="http://www.rosettacode.org/w/images/thumb/8/8b/Huffman_coding_example.jpg/250px-Huffman_coding_example.jpg"
border="0" height="120" width="250">


In Python


I orginally gave an an example that matched a definition that
was later found to be insufficient, so substituted my own definition
above.. My href="http://www.rosettacode.org/w/index.php?title=Huffman_codes&oldid=26720#Python">first
Python solution  on RC to the wrong definition, did
have the advantage, (as I saw it), of not having to traverse a tree.


My 'true' Huffman code creator assembles each symbol and its
weight into the following structure initially (the style="font-weight: bold;">leaf structure):

style="font-family: monospace;">[ weight, [ symbol, []]]

The weight applies to every (in this case only one), of the style="font-family: monospace;"> [symbol, []]
pairs after it in the same list.

The empty list is used to accumulate the Huffman code for the symbol as
we manipulate the heap, without having to walk a constructed tree
structure.



There are two types of input to the program that I am running examples
with:



  1. A string of space separated symbol, weight pairs, as used
    in small examples.

  2. A sample of text for which letters and letter frequencies
    are extracted.


The if statement at line 23
 allows me to switch between the two types of input whilst
exploring the algorithm.

The tutor argument to the encode function shows what is happening in
the loop around the heap pops


 1 
color="#804040"> 2 from heapq color="#a020f0">import heappush, heappop, heapify
color="#804040"> 3
color="#804040"> 4 def color="#008080">codecreate(symbol2weights, tutor= False):
color="#804040"> 5 ''' color="#ff00ff"> Huffman encode the given dict mapping symbols to weights '''
color="#804040"> 6 heap = [ [float(wt), [sym, []]] color="#804040">for sym, wt color="#804040">in symbol2weights.iteritems() ]
color="#804040"> 7 heapify(heap)
color="#804040"> 8 if tutor: color="#804040">print " color="#ff00ff">ENCODING:", sorted(symbol2weights.iteritems())
color="#804040"> 9 while len(heap) >1:
color="#804040">10 lo = heappop(heap)
color="#804040">11 hi = heappop(heap)
color="#804040">12 color="#804040">if tutor: color="#804040">print " color="#ff00ff"> COMBINING:", lo, ' color="#6a5acd">\n AND:', hi
color="#804040">13 color="#804040">for i color="#804040">in lo[1:]: i[1].insert(0, ' color="#ff00ff">0')
color="#804040">14 color="#804040">for i color="#804040">in hi[1:]: i[1].insert(0, ' color="#ff00ff">1')
color="#804040">15 lohi = [ lo[0] + hi[0] ] + lo[1:] + hi[1:]
color="#804040">16 color="#804040">if tutor: color="#804040">print " color="#ff00ff"> PRODUCING:", lohi, ' color="#6a5acd">\n'
color="#804040">17 heappush(heap, lohi)
color="#804040">18 codes = heappop(heap)[1:]
color="#804040">19 for i color="#804040">in codes: i[1] = ''.join(i[1])
color="#804040">20 return sorted(codes, key= color="#804040">lambda x: (len(x[-1]), x))
color="#804040">21
color="#804040">22 # Input types
color="#804040"> style="font-weight: bold; background-color: rgb(255, 255, 102);">23 color="#804040">if 1:
color="#804040">24 readin = " color="#ff00ff">B 25 C 2.5 D 12.5 A 5 color="#6a5acd">\n"
color="#804040">25 #readin = "a .1 b .15 c .3 d .16 e .29" # Wikipedia sample
color="#804040">26 #readin = "a1 .4 a2 .35 a3 .2 a4 .05" # Wikipedia sample
color="#804040">27 #readin = "A 50 B 25 C 12.5 D 12.5" # RC example
color="#804040">28
color="#804040">29 cleaned = readin.strip().split()
color="#804040">30 symbol2weights = dict((symbol, wt)
color="#804040">31 color="#804040">for symbol, wt color="#804040">in zip(cleaned[0::2], cleaned[1::2]) )
color="#804040">32 else:
color="#804040">33 astring = " color="#ff00ff">this is an example for huffman encoding"
color="#804040">34 symbol2weights = dict((ch, astring.count(ch)) color="#804040">for ch color="#804040">in set(astring)) color="#0000ff"># for astring
color="#804040">35
color="#804040">36 huff = codecreate(symbol2weights, True)
color="#804040">37 print " color="#6a5acd">\nSYMBOL color="#6a5acd">\tWEIGHT color="#6a5acd">\tHUFFMAN CODE"
color="#804040">38 for h color="#804040">in huff:
color="#804040">39 print " color="#ff00ff">%s\t color="#ff00ff">%s\t color="#ff00ff">%s" % (h[0], symbol2weights[h[0]], h[1])




A run, with the tutor enabled gives the following output:

style="font-family: monospace;">ENCODING: [('A', '5'), ('B',
'25'), ('C', '2.5'), ('D', '12.5')] style="font-family: monospace;">
 
COMBINING: [2.5, ['C', []]]
style="font-family: monospace;">
       
AND: [5.0, ['A', []]]


 
PRODUCING: [7.5, ['C', ['0']], ['A', ['1']]]
style="font-family: monospace;">


 
COMBINING: [7.5, ['C', ['0']], ['A', ['1']]]
style="font-family: monospace;">
       
AND: [12.5, ['D', []]]
style="font-family: monospace;">
 
PRODUCING: [20.0, ['C', ['0', '0']], ['A', ['0', '1']], ['D', ['1']]]
style="font-family: monospace;">


 
COMBINING: [20.0, ['C', ['0', '0']], ['A', ['0', '1']], ['D', ['1']]]
style="font-family: monospace;">
       
AND: [25.0, ['B', []]]
style="font-family: monospace;">
 
PRODUCING: [45.0, ['C', ['0', '0', '0']], ['A', ['0', '0', '1']], ['D',
['0', '1']], ['B', ['1']]]
style="font-family: monospace;">




SYMBOL   
WEIGHT    HUFFMAN CODE
style="font-family: monospace;">
B   
     25   
    1
style="font-family: monospace;">
D   
     12.5      01
style="font-family: monospace;">
A   
     5   
     001
style="font-family: monospace;">
C   
     2.5   
   000





Encode/Decode Round-tripping


I realised that I could use a method similar to how I accumulate the
codes in the heap loop, to generate a single function that can
recognise a single symbol from the beginning of the encoded symbols. By
using the function in a loop, I could regenerate the symbol list.



In the codecreate function, the leaf
structure
is modified:

style="font-family: monospace;">[weight, [ [symbol, []] ],
repr(sym)]

Their is an extra level of list around the [symbol, code accumulation
list pair]  as well as a new item: 'repr(sym)' it is the third
item in the outer list and will always be the  function to
generate the item as accumulated so far.. This function starts off by
returning just the symbol and an outer if/then/else expression is added
as we go around the heap loop (see line style="font-weight: bold;">43)



After the outer expression is accumulated, it is turned into a lambda
expression, the string eval'd, and assigned to a global variable (see
line 47)



I use function probchoice (from earlier RC work), to create an
arbitrary sequence of symbols to in the given weighting then encode and
decode it as well as giving some stats on space saving.




  1 
color="#804040"> 2 from heapq color="#a020f0">import heappush, heappop, heapify
color="#804040"> 3 import random, bisect
color="#804040"> 4
color="#804040"> 5
color="#804040"> 6 # Helper routine for generating test sequences
color="#804040"> style="background-color: rgb(255, 255, 102);"> 7 color="#804040">def color="#008080">probchoice(items, probs):
color="#804040"> 8 ''' color="#6a5acd">\
color="#804040"> 9 Splits the interval 0.0-1.0 in proportion to probs
color="#804040"> 10 then finds where each random.random() choice lies
color="#804040"> 11 (This routine, probchoice, was released under the
color="#804040"> 12 GNU Free Documentation License 1.2)
color="#804040"> 13 '''
color="#804040"> 14
color="#804040"> 15 prob_accumulator = 0
color="#804040"> 16 accumulator = []
color="#804040"> 17 for p color="#804040">in probs:
color="#804040"> 18 prob_accumulator += p
color="#804040"> 19 accumulator.append(prob_accumulator)
color="#804040"> 20
color="#804040"> 21 while True:
color="#804040"> 22 r = random.random()
color="#804040"> 23 yield items[bisect.bisect(accumulator, r)]
color="#804040"> 24
color="#804040"> 25
color="#804040"> 26 # placeholder
color="#804040"> 27 decode = color="#804040">lambda : None
color="#804040"> 28
color="#804040"> 29 def color="#008080">codecreate(symbol2weights, tutor= False):
color="#804040"> 30 ''' color="#ff00ff"> Huffman encode the given dict mapping symbols to weights '''
color="#804040"> 31 global decode
color="#804040"> 32
color="#804040"> 33 heap = [ [float(wt), [[sym, []]], repr(sym)] color="#804040">for sym, wt color="#804040">in symbol2weights.iteritems() ]
color="#804040"> 34 heapify(heap)
color="#804040"> 35 if tutor: color="#804040">print " color="#ff00ff">ENCODING:", sorted(symbol2weights.iteritems())
color="#804040"> 36 while len(heap) >1:
color="#804040"> 37 lo = heappop(heap)
color="#804040"> 38 hi = heappop(heap)
color="#804040"> 39 color="#804040">if tutor: color="#804040">print " color="#ff00ff"> COMBINING:", lo, ' color="#6a5acd">\n AND:', hi
color="#804040"> 40 color="#804040">for i color="#804040">in lo[1]: i[1].insert(0, ' color="#ff00ff">0')
color="#804040"> 41 color="#804040">for i color="#804040">in hi[1]: i[1].insert(0, ' color="#ff00ff">1')
color="#804040"> 42 lohi = [ lo[0] + hi[0] ] + [lo[1] + hi[1]]
color="#804040"> style="background-color: rgb(255, 255, 102);"> 43 lohi.append(' color="#ff00ff">(%s if nextbit() else %s)' % (hi[2], lo[2]))
color="#804040"> 44 color="#804040">if tutor: color="#804040">print " color="#ff00ff"> PRODUCING:", lohi, ' color="#6a5acd">\n'
color="#804040"> 45 heappush(heap, lohi)
color="#804040"> 46 wt, codes, decoder = heappop(heap)
color="#804040"> style="background-color: rgb(255, 255, 102);"> 47 decode = eval(' color="#ff00ff">lambda :' + decoder, globals())
color="#804040"> 48 decode.__doc__ = decoder
color="#804040"> 49 for i color="#804040">in codes: i[1] = ''.join(i[1])
color="#804040"> 50 #for i in codes: i[::] = i[:2]
color="#804040"> 51 return sorted(codes, key= color="#804040">lambda x: (len(x[-1]), x))
color="#804040"> 52
color="#804040"> 53 # Input types
color="#804040"> style="background-color: rgb(255, 255, 102);"> 54 color="#804040">if 1:
color="#804040"> 55 tutor = True
color="#804040"> 56 sequencecount = 50
color="#804040"> 57 readin = " color="#ff00ff">B 25 C 2.5 D 12.5 A 5 color="#6a5acd">\n"
color="#804040"> 58 #readin = "a .1 b .15 c .3 d .16 e .29" # Wikipedia sample
color="#804040"> 59 #readin = "a1 .4 a2 .35 a3 .2 a4 .05" # Wikipedia sample
color="#804040"> 60 #readin = "A 50 B 25 C 12.5 D 12.5" # RC example
color="#804040"> 61
color="#804040"> 62 cleaned = readin.strip().split()
color="#804040"> 63 symbol2weights = dict((symbol, wt)
color="#804040"> 64 color="#804040">for symbol, wt color="#804040">in zip(cleaned[0::2], cleaned[1::2]) )
color="#804040"> 65 else:
color="#804040"> 66 tutor = False
color="#804040"> 67 sequencecount = 500
color="#804040"> 68 astring = " color="#ff00ff">this is an example for huffman encoding"
color="#804040"> 69 symbol2weights = dict((ch, astring.count(ch)) color="#804040">for ch color="#804040">in set(astring)) color="#0000ff"># for astring
color="#804040"> 70
color="#804040"> 71 huff = codecreate(symbol2weights, tutor= tutor)
color="#804040"> 72 print " color="#6a5acd">\nSYMBOL color="#6a5acd">\tWEIGHT color="#6a5acd">\tHUFFMAN CODE"
color="#804040"> 73 for h color="#804040">in huff:
color="#804040"> 74 print " color="#ff00ff">%s\t color="#ff00ff">%s\t color="#ff00ff">%s" % (h[0], symbol2weights[h[0]], h[1])
color="#804040"> 75
color="#804040"> 76 ##
color="#804040"> 77 ## encode-decode check
color="#804040"> 78 ##
color="#804040"> 79 symbol2code = dict(huff)
color="#804040"> 80 symbols, weights = zip(*symbol2weights.iteritems())
color="#804040"> 81 # normalize weights
color="#804040"> 82 weights = [float(wt) color="#804040">for wt color="#804040">in weights]
color="#804040"> 83 tot = sum(weights)
color="#804040"> 84 weights = [wt/tot color="#804040">for wt color="#804040">in weights]
color="#804040"> 85 # Generate a sequence
color="#804040"> 86 nxt = probchoice(symbols, weights).next
color="#804040"> 87 symbolsequence = [nxt() color="#804040">for i color="#804040">in range(sequencecount)]
color="#804040"> 88 # encode it
color="#804040"> 89 bitsequence = ''.join(symbol2code[sym] color="#804040">for sym color="#804040">in symbolsequence)
color="#804040"> 90
color="#804040"> 91 sslen, slen, blen = len(symbolsequence), len(symbols), len(bitsequence)
color="#804040"> 92 countlen = len(bin(slen-1)[2:])
color="#804040"> 93 print '''
color="#804040"> 94
color="#804040"> 95
color="#804040"> 96 ROUND-TRIPPING
color="#804040"> 97 ==============
color="#804040"> 98 I have generated a random sequence of %i symbols to the given weights.
color="#804040"> 99 If I use a binary count to encode each of the %i symbols I would need
color="#804040">100 %i * %i = %i bits to encode the sequence.
color="#804040">101 Using the Huffman code, I need only %i bits.
color="#804040">102 ''' % (sslen, slen, sslen, countlen, sslen * countlen, blen )
color="#804040">103
color="#804040">104 ## decoding
color="#804040">105 nextbit = (bit==' color="#ff00ff">1' for bit color="#804040">in bitsequence).next
color="#804040">106
color="#804040">107 decoded = []
color="#804040">108 try:
color="#804040">109 while 1:
color="#804040">110 decoded.append(decode())
color="#804040">111 except StopIteration:
color="#804040">112 pass
color="#804040">113
color="#804040">114 print " color="#ff00ff">Comparing the decoded sequence with the original I get:", decoded == symbolsequence




This short run is in tutor mode, so you can track the accumulation of
the decode function:

style="font-family: monospace;">ENCODING: [('A', '5'), ('B',
'25'), ('C', '2.5'), ('D', '12.5')] style="font-family: monospace;">
 
COMBINING: [2.5, [['C', []]], "'C'"]


       
AND: [5.0, [['A', []]], "'A'"]
style="font-family: monospace;">
 
PRODUCING: [7.5, [['C', ['0']], ['A', ['1']]], style="font-weight: bold;">"('A' if nextbit() else 'C')"
]




 
COMBINING: [7.5, [['C', ['0']], ['A', ['1']]], "('A' if nextbit() else
'C')"]


       
AND: [12.5, [['D', []]], "'D'"]
style="font-family: monospace;">
 
PRODUCING: [20.0, [['C', ['0', '0']], ['A', ['0', '1']], ['D', ['1']]],
"('D' if nextbit() else
('A' if nextbit() else 'C'))"
]
style="font-family: monospace;">


 
COMBINING: [20.0, [['C', ['0', '0']], ['A', ['0', '1']], ['D', ['1']]],
"('D' if nextbit() else ('A' if nextbit() else 'C'))"]
style="font-family: monospace;">
       
AND: [25.0, [['B', []]], "'B'"]
style="font-family: monospace;">
 
PRODUCING: [45.0, [['C', ['0', '0', '0']], ['A', ['0', '0', '1']],
['D', ['0', '1']], ['B', ['1']]], "('B'
if nextbit() else ('D' if nextbit() else ('A' if nextbit() else 'C')))"
]






SYMBOL   
WEIGHT    HUFFMAN CODE
style="font-family: monospace;">
B   
     25   
style="font-family: monospace;"> 
   style="font-family: monospace;">1 style="font-family: monospace;">
D   
 
   
style="font-family: monospace;">12.5  
   01 style="font-family: monospace;">
A   
 
   
style="font-family: monospace;">5   
     001 style="font-family: monospace;">
C   
 
   
style="font-family: monospace;">2.5   
   000 style="font-family: monospace;">






ROUND-TRIPPING style="font-family: monospace;">
============== style="font-family: monospace;">
I have generated a
random sequence of 50 symbols to the given weights.
style="font-family: monospace;">
If I use a binary
count to encode each of the 4 symbols I would need
style="font-family: monospace;">
50 * 2 = style="font-weight: bold;">100 bits to encode
the sequence.

Using the Huffman
code, I need only 90
bits.




Comparing the
decoded sequence with the original I get: style="font-weight: bold;">True



And if I change line 54
to be False, I get the following:

style="font-family: monospace;">SYMBOL   
WEIGHT    HUFFMAN CODE style="font-family: monospace;">
    
6    101
style="font-family: monospace;">
n   
4    010
style="font-family: monospace;">
a   
3    1001
style="font-family: monospace;">
e   
3    1100
style="font-family: monospace;">
f   
3    1101
style="font-family: monospace;">
h   
2    0001
style="font-family: monospace;">
i   
3    1110
style="font-family: monospace;">
m   
2    0010
style="font-family: monospace;">
o   
2    0011
style="font-family: monospace;">
s   
2    0111
style="font-family: monospace;">
g   
1    00000
style="font-family: monospace;">
l   
1    00001
style="font-family: monospace;">
p   
1    01100
style="font-family: monospace;">
r   
1    01101
style="font-family: monospace;">
t   
1    10000
style="font-family: monospace;">
u   
1    10001
style="font-family: monospace;">
x   
1    11110
style="font-family: monospace;">
c   
1    111110
style="font-family: monospace;">
d   
1    111111
style="font-family: monospace;">






ROUND-TRIPPING style="font-family: monospace;">
============== style="font-family: monospace;">
I have generated a
random sequence of 500 symbols to the given weights.
style="font-family: monospace;">
If I use a binary
count to encode each of the 19 symbols I would need
style="font-family: monospace;">
500 * 5 = 2500 bits
to encode the sequence.
style="font-family: monospace;">
Using the Huffman
code, I need only 2012 bits.
style="font-family: monospace;">


Comparing the
decoded sequence with the original I get: True
style="font-family: monospace;">





Note: The purpose of the program is to teach me more about Huffman
coding and is not an exercise in speed!



I am the author of all the Python on this page. The diagram is from
Wikipedia. Please refrain from passing-off my code as your own (that
one is mainly for students).

Saturday, March 21, 2009

Batch Process Runner in bash shell - Forgotten Enhancements

A comment on an href="http://paddy3118.blogspot.com/2007/12/batch-process-runner-in-bash-shell.html">earlier
post caused me to dig out this enhancement to my original
batch process runner.



What you do is create a file of one-liner commands that you would enter
at a command prompt, for example, this is file style="font-weight: bold;">process_list.txt :


1  color="#0000ff"># (Comments have '#' at the left margin)
color="#804040">2 sleep 4; echo slept for 4 at line 2; csdf_sdf_sd; exit 19
color="#804040">3 sleep 1; cause-an-error; echo slept for 1 at line 3
color="#804040">4 sleep 3; echo color="#ff00ff">'slept for 3 at line 4'
color="#804040">5 sleep 7; echo color="#ff00ff">"slept for 7 at line 5"
color="#804040">6 # Whee!
color="#804040">7 sleep 9; another-erro; echo slept for 9 at line 7
color="#804040">8 sleep 5; echo color="#ff00ff">'slept "for 5" at line 8'


A bit of warning, bash returns the exit code of this laast command it
executes, earlier exit codes won't be seen.



Give my script the number of processes to run in parallel, N, followed
by the above file, and it will create N '.job'  files out of
the non-comment lines of process_list.txt and execute them as
background jobs, with their output sent to '.n' files.:


bash$ ./process_list_runner.sh 2 process_list.txt

## STARTING 6 Processes from file: process_list.txt, 2 at a time with id plr_HPDV8025EA_2009-03-21-05_36_19_PADDYS-HPLAPTOP

# 2 Jobs in background. 6/6 started. 2009-03-21-05:36:36

## FINISHED, (stats in plr_HPDV8025EA_2009-03-21-05_36_19_PADDYS-HPLAPTOP.csv)

bash$ style="font-weight: bold;">ls -1 plr_HPDV8025EA_2009-03-21-05_36_19_PADDYS-HPLAPTOP.*
plr_HPDV8025EA_2009-03-21-05_36_19_PADDYS-HPLAPTOP.1
plr_HPDV8025EA_2009-03-21-05_36_19_PADDYS-HPLAPTOP.2
plr_HPDV8025EA_2009-03-21-05_36_19_PADDYS-HPLAPTOP.3
plr_HPDV8025EA_2009-03-21-05_36_19_PADDYS-HPLAPTOP.4
plr_HPDV8025EA_2009-03-21-05_36_19_PADDYS-HPLAPTOP.5
plr_HPDV8025EA_2009-03-21-05_36_19_PADDYS-HPLAPTOP.6
plr_HPDV8025EA_2009-03-21-05_36_19_PADDYS-HPLAPTOP.csv
plr_HPDV8025EA_2009-03-21-05_36_19_PADDYS-HPLAPTOP.job_0
plr_HPDV8025EA_2009-03-21-05_36_19_PADDYS-HPLAPTOP.job_1
bash$




The script does what it can to create a csv file of run stats, which,
with a bit of tidying up in calc produces the following:




frame="void" rules="none">
width="340"> width="110"> width="115"> width="73">























































































































width="1409">##
STARTING 6 Processes from file: process_list.txt
size="4">2 at a time with id
plr_HPDV8025EA_2009-03-21-05_36_19_PADDYS-HPLAPTOP
























align="center" bgcolor="#ccffff" height="123"
valign="middle">JOB
align="right" bgcolor="#ccffff" valign="middle">LINE align="center" bgcolor="#ccffff" valign="middle">Command
being timed
align="center" bgcolor="#ccffff" valign="middle">User
time (seconds)
align="center" bgcolor="#ccffff" valign="middle">System
time (seconds)
align="center" bgcolor="#ccffff" valign="middle">Percent
of CPU this job got
align="center" bgcolor="#ccffff" valign="middle">Elapsed
(wall clock) time (h:mm:ss or m:ss)
align="center" bgcolor="#ccffff" valign="middle">Maximum
resident set size (kbytes)
align="center" bgcolor="#ccffff" valign="middle">Major
(requiring I/O) page faults
align="center" bgcolor="#ccffff" valign="middle">Signals
delivered
align="center" bgcolor="#ccffff" valign="middle">Page
size (bytes)
align="center" bgcolor="#ccffff" valign="middle">Exit
status
sdnum="2057;" align="center" bgcolor="#ffffcc"
height="18">1
sdnum="2057;" align="center" bgcolor="#ffffcc">2 bgcolor="#ffffcc">sleep 4; echo slept for 4 at line 2;
csdf_sdf_sd; exit 19
sdnum="2057;" align="center" bgcolor="#ffffcc">0.13 sdnum="2057;" align="center" bgcolor="#ffffcc">0.15 sdnum="2057;0;0.00%" align="center" bgcolor="#ffffcc">6.00% sdval="0.0000503472222222222" sdnum="2057;0;MM:SS.00"
align="center" bgcolor="#ffffcc">00:04.35
sdval="760832" sdnum="2057;" align="center"
bgcolor="#ffffcc">760832
sdnum="2057;" align="center" bgcolor="#ffffcc">3277 sdnum="2057;" align="center" bgcolor="#ffffcc">6 sdval="65536" sdnum="2057;" align="center"
bgcolor="#ffffcc">65536
sdnum="2057;" align="center" bgcolor="#ffffcc">19
sdnum="2057;" align="center" bgcolor="#ccccff"
height="17">2
sdnum="2057;" align="center" bgcolor="#ccccff">3 bgcolor="#ccccff">sleep 1; cause-an-error; echo slept for 1
at line 3
sdnum="2057;" align="center" bgcolor="#ccccff">0.12 sdnum="2057;" align="center" bgcolor="#ccccff">0.09 sdnum="2057;0;0.00%" align="center" bgcolor="#ccccff">16.00% sdval="0.0000152777777777778" sdnum="2057;0;MM:SS.00"
align="center" bgcolor="#ccccff">00:01.32
sdval="761088" sdnum="2057;" align="center"
bgcolor="#ccccff">761088
sdnum="2057;" align="center" bgcolor="#ccccff">3273 sdnum="2057;" align="center" bgcolor="#ccccff">6 sdval="65536" sdnum="2057;" align="center"
bgcolor="#ccccff">65536
sdnum="2057;" align="center" bgcolor="#ccccff">0
sdnum="2057;" align="center" bgcolor="#ffffcc"
height="17">3
sdnum="2057;" align="center" bgcolor="#ffffcc">4 bgcolor="#ffffcc">sleep 3; echo 'slept for 3 at line 4' sdnum="2057;" align="center" bgcolor="#ffffcc">0.07 sdnum="2057;" align="center" bgcolor="#ffffcc">0.06 sdnum="2057;0;0.00%" align="center" bgcolor="#ffffcc">4.00% sdval="0.0000377314814814815" sdnum="2057;0;MM:SS.00"
align="center" bgcolor="#ffffcc">00:03.26
sdval="617216" sdnum="2057;" align="center"
bgcolor="#ffffcc">617216
sdnum="2057;" align="center" bgcolor="#ffffcc">2639 sdnum="2057;" align="center" bgcolor="#ffffcc">3 sdval="65536" sdnum="2057;" align="center"
bgcolor="#ffffcc">65536
sdnum="2057;" align="center" bgcolor="#ffffcc">0
sdnum="2057;" align="center" bgcolor="#ccccff"
height="17">4
sdnum="2057;" align="center" bgcolor="#ccccff">5 bgcolor="#ccccff">sleep 7; echo "slept for 7 at line 5" sdnum="2057;" align="center" bgcolor="#ccccff">0.07 sdnum="2057;" align="center" bgcolor="#ccccff">0.07 sdnum="2057;0;0.00%" align="center" bgcolor="#ccccff">2.00% sdval="0.000084837962962963" sdnum="2057;0;MM:SS.00"
align="center" bgcolor="#ccccff">00:07.33
sdval="617216" sdnum="2057;" align="center"
bgcolor="#ccccff">617216
sdnum="2057;" align="center" bgcolor="#ccccff">2644 sdnum="2057;" align="center" bgcolor="#ccccff">3 sdval="65536" sdnum="2057;" align="center"
bgcolor="#ccccff">65536
sdnum="2057;" align="center" bgcolor="#ccccff">0
sdnum="2057;" align="center" bgcolor="#ffffcc"
height="18">5
sdnum="2057;" align="center" bgcolor="#ffffcc">7 bgcolor="#ffffcc">sleep 9; another-erro; echo slept for 9 at
line 7
sdnum="2057;" align="center" bgcolor="#ffffcc">0.09 sdnum="2057;" align="center" bgcolor="#ffffcc">0.13 sdnum="2057;0;0.00%" align="center" bgcolor="#ffffcc">2.00% sdval="0.000106597222222222" sdnum="2057;0;MM:SS.00"
align="center" bgcolor="#ffffcc">00:09.21
sdval="760064" sdnum="2057;" align="center"
bgcolor="#ffffcc">760064
sdnum="2057;" align="center" bgcolor="#ffffcc">3274 sdnum="2057;" align="center" bgcolor="#ffffcc">6 sdval="65536" sdnum="2057;" align="center"
bgcolor="#ffffcc">65536
sdnum="2057;" align="center" bgcolor="#ffffcc">0
sdnum="2057;" align="center" bgcolor="#ccccff"
height="17">6
sdnum="2057;" align="center" bgcolor="#ccccff">8 bgcolor="#ccccff">sleep 5; echo 'slept "for 5" at line 8' sdnum="2057;" align="center" bgcolor="#ccccff">0.1 sdnum="2057;" align="center" bgcolor="#ccccff">0.06 sdnum="2057;" align="center" bgcolor="#ccccff">0.03 sdval="0.0000622685185185185" sdnum="2057;0;MM:SS.00"
align="center" bgcolor="#ccccff">00:05.38
sdval="617216" sdnum="2057;" align="center"
bgcolor="#ccccff">617216
sdnum="2057;" align="center" bgcolor="#ccccff">2639 sdnum="2057;" align="center" bgcolor="#ccccff">3 sdval="65536" sdnum="2057;" align="center"
bgcolor="#ccccff">65536
sdnum="2057;" align="center" bgcolor="#ccccff">0




The script itself is:


 1  color="#0000ff">#!/bin/bash 
color="#804040"> 2 #!/opt/TWWfsw/bin/bash -
color="#804040"> 3
color="#804040"> 4 set color="#008080">-u
color="#804040"> 5
color="#804040"> 6 ##
color="#804040"> 7 ## process_runner.sh <concurrent> <total procs>
color="#804040"> 8 ##
color="#804040"> 9 ## Example script given the maximum number of processes to
color="#804040"> 10 ## run concurrently, c, and the total number of processes, n
color="#804040"> 11 ## runs at most c, processes in the background until all n
color="#804040"> 12 ## Have been run.
color="#804040"> 13 ##
color="#804040"> 14 ## Author Donald 'Paddy' McCarthy Dec. 17 2007
color="#804040"> 15 ##
color="#804040"> 16
color="#804040"> 17 # how many processes to run in parallel
color="#804040"> 18 concurrent= color="#a020f0">$1
color="#804040"> 19 # File of commands
color="#804040"> 20 proclist= color="#804040">" color="#a020f0">$2"
color="#804040"> 21
color="#804040"> 22 ##
color="#804040"> 23 ##
color="#804040"> 24
color="#804040"> 25 # main loop wait time between checking background procs.
color="#804040"> 26 tick= color="#ff00ff">1
color="#804040"> 27 # Unique_id for process files of this run
color="#804040"> 28 unique_id=plr_ color="#6a5acd">`date +${ color="#a020f0">USER// color="#804040">/ color="#a020f0">}_%F-%T_ color="#a020f0">${HOSTNAME color="#804040">// color="#804040">/ color="#a020f0">}`
color="#804040"> 29 unique_id= color="#a020f0">${unique_id color="#804040">//: color="#804040">/_ color="#a020f0">}
color="#804040"> 30
color="#804040"> 31 function read_proclistfile color="#6a5acd">{
color="#804040"> 32 # read processes from file ignoring comment lines
color="#804040"> 33 maxprocs= color="#ff00ff">0
color="#804040"> 34 local -i color="#008080">linenumber= color="#ff00ff">0
color="#804040"> 35 while color="#804040"> color="#804040">read color="#804040">; color="#804040"> color="#804040">do
color="#804040"> 36 (( color="#008080">linenumber+= color="#ff00ff">1))
color="#804040"> 37 # echo [ "${REPLY:0:1}" == "#" ] $maxprocs $REPLY
color="#804040"> 38 if color="#804040">[ color="#804040">" color="#a020f0">${REPLY color="#804040">: color="#ff00ff">0: color="#ff00ff">1} color="#804040">" color="#804040">!= color="#804040">" color="#ff00ff">#" color="#804040">] color="#804040">; color="#804040">then
color="#804040"> 39 ((maxprocs+ color="#804040">= color="#ff00ff">1))
color="#804040"> 40 allprocs color="#804040">[ color="#a020f0">$maxprocs color="#804040">] color="#804040">= color="#804040">" color="#a020f0">$REPLY"
color="#804040"> 41 allprocline color="#804040">[ color="#a020f0">$maxprocs color="#804040">] color="#804040">= color="#a020f0">$linenumber
color="#804040"> 42 # echo $linenumber $maxprocs $REPLY
color="#804040"> 43 fi
color="#804040"> 44 done color="#804040">< color="#a020f0">$proclist
color="#804040"> 45 }
color="#804040"> 46
color="#804040"> 47 read_proclistfile
color="#804040"> 48
color="#804040"> 49 printf color="#804040">" color="#6a5acd">\n## STARTING %i Processes from file: %s, %i at a time with id %s color="#6a5acd">\n\n" \
color="#804040"> 50 $maxprocs color="#a020f0">$proclist color="#a020f0">$concurrent color="#a020f0">$unique_id
color="#804040"> 51
color="#804040"> 52
color="#804040"> 53
color="#804040"> 54 function assemble_job color="#6a5acd">{
color="#804040"> 55 #
color="#804040"> 56 local color="#008080">plr_unique_id= color="#804040">" color="#a020f0">$1"; color="#804040">local color="#008080">plr_ran=" color="#a020f0">$2"; color="#804040">local color="#008080">plr_proc= color="#804040">" color="#a020f0">$3"; color="#804040">local color="#008080">plr_procline= color="#804040">" color="#a020f0">$4"; color="#804040">local color="#008080">plr_proclist= color="#804040">" color="#a020f0">$5"
color="#804040"> 57 cat color="#804040"><<!
color="#804040"> 58
color="#804040"> 59 # color="#a020f0">$plr_unique_id color="#ff00ff"> $plr_ran color="#ff00ff">
color="#804040"> 60 ` color="#804040">local color="#6a5acd">`
color="#804040"> 61
color="#804040"> 62 trap 'error= color="#6a5acd">\$?;printf " color="#6a5acd">\n\n##STATISTICS For Job %i color="#6a5acd">\n" color="#a020f0">$plr_ran; printf "# Line %i of file %s color="#6a5acd">\n\n" color="#a020f0">$plr_procline color="#ff00ff"> $plr_proclist color="#ff00ff">; exit \$ color="#ff00ff">error' EXIT
color="#804040"> 63
color="#804040"> 64
color="#804040"> 65 $plr_proc
color="#804040"> 66
color="#804040"> 67 !
color="#804040"> 68 }
color="#804040"> 69
color="#804040"> 70 function print_runstats color="#6a5acd">{
color="#804040"> 71 printf color="#804040">' color="#ff00ff"># %i Jobs in background. %i/%i started. %s\r color="#804040">' \
color="#804040"> 72 ` color="#804040">jobs color="#6a5acd"> -r color="#804040">| color="#6a5acd">wc -l` $ran color="#a020f0">$maxprocs color="#804040">" color="#6a5acd">`date +%F-%T` color="#804040">"
color="#804040"> 73 }
color="#804040"> 74
color="#804040"> 75 # Bash array running keeps track of the background process numbers
color="#804040"> 76 # Start with nothing running (sentinel value will not be a process number
color="#804040"> 77 for color="#6a5acd">((i= color="#ff00ff">0; i color="#804040">< color="#a020f0">$concurrent color="#804040">; i+ color="#804040">= color="#ff00ff">1 )) color="#804040">; color="#804040">do running color="#804040">[ color="#a020f0">$i] color="#804040">= color="#ff00ff">123456789 color="#804040">; color="#804040">done
color="#804040"> 78
color="#804040"> 79 ran= color="#ff00ff">0
color="#804040"> 80 until
color="#804040"> 81 while color="#804040">[ color="#a020f0">$ran -lt color="#a020f0">$maxprocs color="#804040">] color="#804040">; color="#804040">do
color="#804040"> 82 for color="#6a5acd">((p= color="#ff00ff">0; p color="#804040">< color="#a020f0">$concurrent color="#804040">; p+ color="#804040">= color="#ff00ff">1 )) color="#804040">; color="#804040">do
color="#804040"> 83 proc= color="#a020f0">${running color="#a020f0">[$p color="#a020f0">]}
color="#804040"> 84 # Over all running processes...
color="#804040"> 85 # $proc still running?
color="#804040"> 86 ps -p color="#a020f0">$proc | color="#804040">fgrep color="#a020f0">$proc >/dev/null
color="#804040"> 87 if color="#804040">[ color="#a020f0">$? -ne color="#804040">' color="#ff00ff">0' color="#804040">] color="#804040">; color="#804040">then
color="#804040"> 88 # Not found i.e. its finished
color="#804040"> 89 # So start the next
color="#804040"> 90 ((ran+ color="#804040">= color="#ff00ff">1))
color="#804040"> 91 #(assemble_job ) >$unique_id.job_$p
color="#804040"> 92 ( assemble_job color="#804040">" color="#a020f0">$unique_id color="#804040">" color="#804040">" color="#a020f0">$ran" color="#804040">" color="#a020f0">${allprocs color="#a020f0">[$ran color="#a020f0">]} color="#804040">" \
color="#804040"> 93 ${ color="#a020f0">allprocline color="#a020f0">[$ran color="#a020f0">]} color="#a020f0">$proclist color="#804040">) color="#804040">> color="#a020f0">$unique_id.job_ color="#a020f0">$p
color="#804040"> 94 chmod +x color="#a020f0">$unique_id.job_ color="#a020f0">$p
color="#804040"> 95 # run one job in background collecting run stats
color="#804040"> 96 (/bin/ color="#804040">time -v -a -o color="#a020f0">$unique_id. color="#a020f0">$ran < /dev/null color="#a020f0">$unique_id.job_ color="#a020f0">$p 2>&1 color="#804040">) color="#804040">> color="#a020f0">$unique_id. color="#a020f0">$ran &
color="#804040"> 97 running color="#804040">[ color="#a020f0">$p] color="#804040">= color="#a020f0">$!
color="#804040"> 98 runningprocnum color="#804040">[ color="#a020f0">$p] color="#804040">= color="#a020f0">$ran
color="#804040"> 99 if color="#804040">[ color="#a020f0">$ran -ge color="#a020f0">$maxprocs color="#804040">] color="#804040">; color="#804040">then color="#804040">break color="#ff00ff">1; color="#804040">fi
color="#804040">100 #exit
color="#804040">101 fi
color="#804040">102 done
color="#804040">103 sleep color="#a020f0">$tick
color="#804040">104 # Status
color="#804040">105 print_runstats
color="#804040">106 done
color="#804040">107
color="#804040">108 ## break 1 gets us here!
color="#804040">109
color="#804040">110 # Keep on printing status while there are background processes
color="#804040">111 # even though there are no more to start.
color="#804040">112 sleep color="#a020f0">$tick
color="#804040">113 # Status
color="#804040">114 print_runstats
color="#804040">115 do color="#804040">[ color="#6a5acd">`jobs color="#6a5acd"> -r color="#804040">| color="#6a5acd">wc -l` -eq color="#ff00ff">0 ]
color="#804040">116 done
color="#804040">117 wait
color="#804040">118
color="#804040">119 function stats2csv color="#6a5acd">{
color="#804040">120 # Gather overall stats in csv file
color="#804040">121
color="#804040">122 # CSV Heading
color="#804040">123 printf color="#804040">" color="#6a5acd">\n## STARTING %i Processes from file: %s color="#6a5acd">\n%i at a time with id %s color="#6a5acd">\n\n" \
color="#804040">124 $maxprocs color="#a020f0">$proclist color="#a020f0">$concurrent color="#a020f0">$unique_id color="#804040">> color="#a020f0">$unique_id.csv
color="#804040">125 # CSV Column labels
color="#804040">126 gawk color="#804040">' color="#ff00ff">/^##STATISTICS For Job/{s++}
color="#804040">127 s&&/^# Line .*of file/{printf "JOB,LINE";next}
color="#804040">128 s&&/: /{l=$0;sub(/: .*/,"");$1=$1;printf",%s",$0}
color="#804040">129 s&&l~/Exit status:/{s=0;print"";l="";exit}
color="#804040">130 color="#804040">' color="#a020f0">$unique_id. color="#804040">[ color="#ff00ff">0-9]* color="#804040">>> color="#a020f0">$unique_id.csv
color="#804040">131
color="#804040">132 # Stick allprocs into env for later gawk substitution of command
color="#804040">133 for i color="#804040">in color="#a020f0">${!allprocs color="#a020f0">[@] color="#a020f0">}; color="#804040">do
color="#804040">134 declare color="#008080">-x _ap color="#a020f0">$i= color="#804040">" color="#a020f0">${allprocs color="#a020f0">[$i color="#a020f0">]} color="#804040">"
color="#804040">135 done
color="#804040">136 #env|grep '_ap.=' ; debug
color="#804040">137 # CSV Rows
color="#804040">138 gawk color="#804040">' color="#ff00ff">/^##STATISTICS For Job/{s++;j=$NF}
color="#804040">139 s&&/^# Line .*of file/{printf "%s,%s",j,$3;next}
color="#804040">140 s&&/Command [^:]*: /{printf",%s",ENVIRON["_ap" j]; next}
color="#804040">141 s&&/: /{l=$0;sub(/.*: /,"");printf",%s",$0}
color="#804040">142 s&&l~/Exit status:/{s=0;print"";l="";nextfile}
color="#804040">143 color="#804040">' color="#a020f0">$unique_id. color="#804040">[ color="#ff00ff">0-9]* color="#804040">>> color="#a020f0">$unique_id.csv
color="#804040">144 }
color="#804040">145
color="#804040">146 stats2csv
color="#804040">147
color="#804040">148 printf color="#804040">" color="#6a5acd">\n## FINISHED, (stats in %s) color="#6a5acd">\n\n" color="#a020f0">$unique_id.csv
color="#804040">149
color="#804040">150 exit color="#ff00ff">0
color="#804040">151
color="#804040">152
color="#804040">153
color="#804040">154


I remember writing this with the help of the advanced bash tutorial
which is great, and looking back at this program, I would need the
tutorials help if I needed to extend it!



Why not in Python?


 Of course I thought of doing it in Python, but bash's job
control features were used to flesh out the initial idea and it grew to
become the finished article. Looking back at this program as it's over
a year old, I find that I like bash's job control, and gawk one-liners
 (but not the readability) :-)



- Paddy.



href="http://kompozer.net/"> alt="Document made with KompoZer"
src="http://kompozer.sourceforge.net/images/kompozer_80x15.png"
border="0">

Wednesday, March 04, 2009

Psion Netbook

I was just reading about Intel's claim that the term Netbook was generic. It seems to be bullying to me.



So, in support of 'our Psion' I got out my old Psion Series 3 for nostalgias sake, and again marvelled at its design.



P.S. Offers of a complementary Netbook would be gratefully received :-)



- Paddy.

Followers

Subscribe Now: google

Add to Google Reader or Homepage

Go deh too!

whos.amung.us

Blog Archive