Go deh!

Sunday, May 31, 2009

Zed and community

Who is Zed?

Zed it seems knows about his blogging style: asking questions
as statements;and ranting.but
it seems that he has written good code for the Rails community in the
past and has earned some peoples respect. Now he is a new member of the
Python community I would hope that current members realize his style
and try not to emulate it.

Zed may be here for a long stay, or gone tomorrow, but I wouldn't want
the lasting effect to be a rise in the abrasiveness of the Python
community, or an unthinking
abandonment of community practices. Even if Zed does produce
Pythons Holy Grail, we have to remember that that is software.
Community matters, and we have to work with the
Zeds, the Alex's, the Xah's, the Tim's without being seen to sink to
the lowest common denominator.

Saturday, May 23, 2009

Pipe Fitting with Python Generators

I was investigating the Hofstadter-Conway href="http://mathworld.wolfram.com/Hofstadter-Conway10000-DollarSequence.html"
target="_blank">$10,000 sequence which, for
mathematics, has a very href="http://www.nytimes.com/1988/08/30/science/intellectual-duel-brash-challenge-swift-response.html"
target="_blank"> interesting story. To
'follow the money', you have to generate the sequence of values and
calculate maxima and minima of the sequence values so I decided to
create a generator of the sequence values and pass it through filters
that would emit local maxima/minima. I ended up with something like:

high_generator =
maxima(sequence_generator())

highs = [high_generator.next() for i in range(1000)]

Now I just hated the nested call of generators. If it were a Unix
shell, I would have hoped to have written:

sequence_generator |
maxima |head 1000 > highs

Notice how the data flows from source at the left to sink at the right.
I like that.

I spent some time trying to use classes and redefine bitwise or for
class instances, but it was looking messy and my heart wasn't really in
a class based solution.

I left it for a week. And in that week I remembered seeing a function
called pipe being used to pipe things so decided to write my own pipe
function for combining generators in a more natural way.

Pipe

What I wanted was a function called pipe, that when called with several
generators, returned a nested call of all its arguments, with the
leftmost argument being in the center of the nest. For example, a call
of:

pipe(a,b,c,d)

Would return the equivalent of:

d(c(b(a)))

The eventual definition turned out to be quite straight-forward:

 color="#804040">def  color="#008080">pipe(*cmds):
    ''' color="#ff00ff"> pipe(a,b,c,d, ...) -> yield from ...d(c(b(a())))
 color="#ff00ff">    '''
    gen = cmds[0]
     color="#804040">for cmd  color="#804040">in cmds[1:]:
        gen = cmd(gen)
     color="#804040">for x  color="#804040">in gen:
         color="#804040">yield x

Pipe in use

Lets create some simple generators; style="font-family: monospace;">count is just
itertools.count; sqr
yields the square of items on its input iterator

from itertools  color="#a020f0">import count

 color="#804040">def  color="#008080">sqr(x):
 color="#804040">    for val  color="#804040">in x:
 color="#804040">        yield val*val

I can now generate the squares of integers:

>>> p = pipe(count(), sqr)
>>> [p.next()  color="#804040">for i  color="#804040">in range(5)]
[0, 1, 4, 9, 16]

But I need a more complex definition for head as it needs to have a
parameter of the maximum number of items to pass through.

def  color="#008080">head(n):
    ''' color="#ff00ff"> return a generator that passes through at most n items
 color="#ff00ff">    '''
     color="#804040">def  color="#008080">head(x):
        i = n
         color="#804040">for val  color="#804040">in x:
             color="#804040">if i>0:
                i -= 1
                 color="#804040">yield val
             color="#804040">else:
                 color="#804040">raise StopIteration
     color="#804040">return head

And now with head I can more naturally express nested generator
functions as I can stop an infinite generator stream easily for example:

>>> list(pipe(count(), sqr, head(4)))
[0, 1, 4, 9]
>>>  color="#804040">for i  color="#804040">in pipe(count(), sqr, head(10)):
     color="#804040">print i,

0 1 4 9 16 25 36 49 64 81

Generalized function generator and filter

I thought it might be handy to have a function that skipped an input if
the prefilter(input) was false; applied a given function, fn(input) to
the input stream and passed on fn(input) to the output if
postfilter(fn(input)) was true. Think of a stream of data. You can
filter it, apply a function to it, or filter after applying the
function - all via this pipe filter. I called it fnf for function
filter:

def  color="#008080">fnf(fn, prefilter=None, postfilter=None):
    ''' color="#ff00ff"> functionfilter:
 color="#ff00ff">            for s in x:
 color="#ff00ff">                yield fn(s) subject to prefilter(s), postfilter(fn(s)) returning True if present,
 color="#ff00ff">                or skip this s
 color="#ff00ff">    '''
     color="#804040">def  color="#008080">_fnf(x):
         color="#804040">for s  color="#804040">in x:
             color="#804040">if prefilter  color="#804040">is None  color="#804040">or prefilter(s):
                fns = fn(s)
                 color="#804040">if postfilter  color="#804040">is None  color="#804040">or postfilter(fns):
                     color="#804040">yield fns
    _fnf.__doc__ = ' color="#ff00ff">_fnf = fnf(%s, %s, %s)' %(fn, prefilter, postfilter)
     color="#804040">return _fnf

I'll show fnf in action.

>>> # turns the input stream of numbers, x into a stream of tuples x,x/2.0
>>>  color="#804040">def  color="#008080">n_nby2(x):  color="#804040">return x, x/2.0

>>> list(pipe( count(), sqr, head(5),  style="font-weight: bold; text-decoration: underline;">fnf(n_nby2) ))
[(0, 0.0), (1, 0.5), (4, 2.0), (9, 4.5), (16, 8.0)]
>>>  color="#0000ff"># Lets create a prefilter
>>>  color="#804040">def  color="#008080">even(x):  color="#804040">return x%2 == 0

>>> list(pipe( count(), sqr, head(5), fnf(n_nby2, style="font-weight: bold; text-decoration: underline;">even) ))
[(0, 0.0), (4, 2.0), (16, 8.0)]
>>>  color="#0000ff"># And now a postfilter which must work on the result of applying fn
>>>  color="#804040">def  color="#008080">frac(x):  color="#804040">return int(x[1]) != x[1]

>>> list(pipe( count(), sqr, head(5), fnf(n_nby2, style="font-style: italic; font-weight: bold;">None, style="font-weight: bold; text-decoration: underline;">frac) ))
[(1, 0.5), (9, 4.5)]
>>>  color="#0000ff"># Note the presence of  style="font-weight: bold; font-style: italic;">None in the above call as we don't use a prefilter

Thats it for now with the pipe fitting. I do have a tee function for
printing intermediate results, but it's style="text-decoration: line-through;">late
early and I'm tired....

- Paddy.

Wednesday, May 13, 2009

Extended unpacking in Python 3.0

After reading about href="http://www.rosettacode.org/wiki/Pattern_Matching"
target="_blank">pattern matching on Rosetta Code,
I wondered, (and still am wondering), if it has anything to do with
list unpacking in Python, which lead me to do some simple explorations
of the target="_blank">extended iterable unpacking that
is new in Python 3.0.

Introducing a star

The feature builds on the pre-existing capability of being able to
unpack a list or tuple into multiple names in Python 2.x, so in Python
2.x you
could do:

>>> import platform
>>> platform.python_version()
' color="#ff00ff">2.6.1'
>>> lst = [1,2,3]
>>> a,b,c = lst
>>> print a,b,c
1 2 3
>>> tpl = (1,2,3)
>>> a,b,c = tpl
>>> print a,b,c
1 2 3
>>> iter = (x for x in range(4))
>>> iter
<generator object <genexpr> at 0x01300468>
>>> a,b,c,d = iter
>>> print a,b,c,d
0 1 2 3
>>>  color="#0000ff"># You could also do the (fancy), but non-* stuff too
>>> a,b,(c,d) = (1,2,(3,4))
>>>  color="#804040">print a,b,c,d
1 2 3 4
>>>  color="#0000ff"># But not
>>> a,*b = (1,2,3,4)
SyntaxError: invalid syntax
>>>

In Python 3.0 you get all of the above plus the ability to 'soak up'
the rest of the values using style="font-weight: bold;">* (which I pronounce
as 'star' in this
context).

It is similar to how you can use *args in a function definition to
assign extra positional arguments to a name. In an assignment however
it means 'take
whatever's left of the what is being assigned and assign it to me as a
list'.

Here are some examples:

>>> import platform
>>> platform.python_version()
' color="#ff00ff">3.0.1'
>>> a,*b = [1,2,3,4]
>>>  color="#804040">print(a,b, sep=' color="#ff00ff">    ')
1    [2, 3, 4]
>>> a = [1,2,3,4]
>>>  color="#0000ff"># Note the comma after the *b to make it a tuple
>>> *b, = [1,2,3,4]
>>> a == b
True
>>>  color="#0000ff"># Only one * per iterable (or sub-iterable) being assigned
>>> *a,*b = [1,2,3,4]
 style="color: red;">SyntaxError: two starred expressions  style="color: red;" color="#804040">in style="color: red;"> assignment (<pyshell style="color: red;" color="#0000ff">#87>, line 1)
>>>
>>>  color="#0000ff"># but note:
>>> a,*b,(c,*d) = [1,2,3,[4,5,6]]
>>>  color="#804040">print(a,b,c,d, sep=' color="#ff00ff">    ')
1    [2, 3]    4    [5, 6]
>>>  color="#0000ff"># Notice that  style="font-weight: bold;">*  style="text-decoration: underline;">always creates a list
>>> *a, = (1,2,3)
>>> a
 style="font-weight: bold; color: rgb(0, 153, 0);">[1, 2, 3 style="font-weight: bold; color: rgb(0, 153, 0);">]
>>> a,b,(*c,) = (x  color="#804040">if x<2  color="#804040">else range(5)  color="#804040">for x  color="#804040">in range(3))
>>>  color="#804040">print(a,b,c, sep=' color="#ff00ff">    ')
0    1    [0, 1, 2, 3, 4]
>>>
>>>  color="#0000ff"># Although you can have only one  style="font-weight: bold;">* per (sub)iterable, its position is flexible
>>> a,*b,c = [1,2,3,4]
>>> b
[2, 3]
>>> *b,a,c = [1,2,3,4]
>>> b
[1, 2]
>>> a,c,*b = [1,2,3,4]
>>>
>>>  color="#0000ff"># And look!
>>> a,c,*b,d,e = [1,2,3,4]
>>> b
[]
>>> a,b,c,d,e
(1, [], 2, 3, 4)
>>>  color="#0000ff"># So * can accumulate  style="text-decoration: underline;">no values in some cases

I'm looking forward to using this feature in 3.1.

Tuesday, May 12, 2009

Critique of pseudocode explanations of the Closest Pair Algorithm

I watched this
task on Rosetta Code from its inception on the 6^th of
May, but had problems understanding the pseudo-code given, and some
of the stuff I googled too. after a couple of days though I was more
familiar with attempts at O(nlog(n)) divide-n-conquer
algorithms having seen a few and happened to find this
explanation (.ppt file) where it seemed to click for me.

After problems with the other references I thought I might go
through and identify what I found problematic with them.

The Rosetta Code Pseudo-code:

closestPair of P(1), P(2), ... P(N)
if N ≤ 3 then
  return closest points of P using brute-force algorithm
else
  xP ← P ordered by the x coordinate, in ascending order
  P_L ← points of xP from 1 to ⌈N/2⌉
  P_R ← points of xP from ⌈N/2⌉+1 to N
  (d_L, pair_L) ← closestPair of P_L

  (d_R, pair_R) ← closestPair of P_R
  (d_min, pair_min) ← (d_R, pair_R)
  if d_L < d_R then

    (d_min, pair_min) ← (d_L, pair_L)
  endif
  x_M ← xP(⌈N/2⌉)_x

  S ← { p ∈ xP : |x_M - p_x| < d_min }
  yP ← S ordered by the y coordinate, in ascending order
  nP ← number of points in yP
  (closest, closestPair) ← (d_min, pair_min)
  for i from 1 to nP - 1
    k ← i + 1
    while k ≤ nP and yP(k)_y - yP(k)_y < d_min

      if |yP(k) - yP(i)| < closest then
        (closest, closestPair) ← (|yP(k) - yP(i)|, {yP(k), yP(i)})
      endif
      k ← k + 1
    endwhile
  return closest, closestPair

endif

The author had noted that the above could have problems so I was
warned, but still started by trying to follow the above. I was
blindly following the pseudo-code until I got the the statement: x_M
← xP(⌈N/2⌉)_x which I couldn't understand.

This prompted my wider search for a better explanation. After
finding it (linked in the first paragraph), I modified my Python
program to follow the new explanation.

The RC references

There were several references quoted.

The Wikipedia article
Did not give pseudo-code
Closest
Pair (McGill)
Section 3.4 on improving their previous
O(nlog²n) into an O(nlogn)
algorithm I found to be very vague. Just how do you “returning
the points in each set in sorted order by y-coordinate “
without doing an nlogn sort leading again to an O(nlog²n)
solution?
Closest
Pair (UCBS)
This seems to concur with my preferred reference
but is given at a higher level than pseudo-code, and so would be
harder to implement from the description given.
Closest
pair (WUStL) by Dr. Jeremy Buhler
Their seems to be a
problem with having N, the number of points as an argument of the
function. What if N is odd? The pseudo-code seems to rely on other
material probably presented in a class and is at a higher level than
my preferred solution as it seems to be set as an exercise for
students.

My Python Solution

The only tests done is to generate random points and compare the
brute force algorithms solution with that of the divide and conquer.

'''

  Compute nearest pair of points using two algorithms
  
  First algorithm is 'brute force' comparison of every possible pair.
  Second, 'divide and conquer', is based on:
    www.cs.iupui.edu/~xkzou/teaching/CS580/Divide-and-conquer-closestPair.ppt 
'''

from random import randint

infinity = float('inf')


# Note the use of complex numbers to represent 2D points making distance == abs(P1-P2)

def bruteForceClosestPair(point):
    numPoints = len(point)
    if numPoints < 2:
        return infinity, (None, None)
    return min( ((abs(point[i] - point[j]), (point[i], point[j]))
                 for i in range(numPoints-1)
                 for j in range(i+1,numPoints)),
                key=lambda x: x[0])


def closestPair(point):
    xP = sorted(point, key= lambda p: p.real)
    yP = sorted(point, key= lambda p: p.imag)
    return _closestPair(xP, yP)

def _closestPair(xP, yP):
    numPoints = len(xP)
    if numPoints <= 3:
        return bruteForceClosestPair(xP)
    Pl = xP[:numPoints/2]
    Pr = xP[numPoints/2:]
    Yl, Yr = [], []
    xDivider = Pl[-1].real
    for p in yP:
        if p.real <= xDivider:
            Yl.append(p)
        else:
            Yr.append(p)
    dl, pairl = _closestPair(Pl, Yl)
    dr, pairr = _closestPair(Pr, Yr)
    dm, pairm = (dl, pairl) if dl < dr else (dr, pairr)
    # Points within dm of xDivider sorted by Y coord

    closeY = [p for p in yP  if abs(p.real - xDivider) < dm]
    numCloseY = len(closeY)
    if numCloseY > 1:
        # There is a proof that you only need compare a max of 7 other points

        closestY = min( ((abs(closeY[i] - closeY[j]), (closeY[i], closeY[j]))
                         for i in range(numCloseY-1)
                         for j in range(i+1,min(i+8, numCloseY))),
                        key=lambda x: x[0])
        return (dm, pairm) if dm <= closestY[0] else closestY
    else:
        return dm, pairm


def times():
    ''' Time the different functions
    '''
    import timeit

    functions = [bruteForceClosestPair, closestPair]
    for f in functions:
        print 'Time for', f.__name__, timeit.Timer(
            '%s(pointList)' % f.__name__,
            'from closestpair import %s, pointList' % f.__name__).timeit(number=1)



pointList = [randint(0,1000)+1j*randint(0,1000) for i in range(2000)]


if __name__ == '__main__':
    pointList = [(5+9j), (9+3j), (2+0j), (8+4j), (7+4j), (9+10j), (1+9j), (8+2j), 10j, (9+6j)]
    print pointList
    print '  bruteForceClosestPair:', bruteForceClosestPair(pointList)
    print '            closestPair:', closestPair(pointList)
    for i in range(10):
        pointList = [randint(0,10)+1j*randint(0,10) for i in range(10)]
        print '\n', pointList
        print ' bruteForceClosestPair:', bruteForceClosestPair(pointList)
        print '           closestPair:', closestPair(pointList)
    print '\n'
    times()
    times()
    times()

Sample output:

[(5+9j), (9+3j), (2+0j), (8+4j), (7+4j), (9+10j), (1+9j), (8+2j), 10j, (9+6j)]
  bruteForceClosestPair: (1.0, ((8+4j), (7+4j)))
            closestPair: (1.0, ((8+4j), (7+4j)))

[(6+8j), (4+1j), 7j, (4+10j), (10+6j), (1+9j), (10+10j), (5+0j), (3+1j), (6+4j)]
 bruteForceClosestPair: (1.0, ((4+1j), (3+1j)))
           closestPair: (1.0, ((3+1j), (4+1j)))

[(5+5j), (10+9j), 3j, (6+1j), (9+5j), 1j, (8+0j), 8j, (3+2j), (10+4j)]
 bruteForceClosestPair: (1.4142135623730951, ((9+5j), (10+4j)))
           closestPair: (1.4142135623730951, ((9+5j), (10+4j)))

[(9+2j), (1+2j), 5j, (10+0j), (6+6j), (6+4j), (5+1j), (2+5j), (8+6j), (3+2j)]
 bruteForceClosestPair: (2.0, ((1+2j), (3+2j)))
           closestPair: (2.0, ((6+6j), (6+4j)))

[(2+3j), (9+0j), (9+7j), 8j, (4+5j), (6+7j), (6+2j), (3+3j), (3+10j), (7+5j)]
 bruteForceClosestPair: (1.0, ((2+3j), (3+3j)))
           closestPair: (1.0, ((2+3j), (3+3j)))

[(7+2j), (2+10j), (3+7j), (8+6j), (2+1j), 3j, (8+5j), (6+9j), (1+0j), 0j]
 bruteForceClosestPair: (1.0, ((8+6j), (8+5j)))
           closestPair: (1.0, ((8+6j), (8+5j)))

[(10+7j), (6+3j), (2+7j), (3+6j), (9+3j), (8+5j), (5+7j), (3+10j), 5j, (3+7j)]
 bruteForceClosestPair: (1.0, ((2+7j), (3+7j)))
           closestPair: (1.0, ((3+6j), (3+7j)))

[(4+6j), (5+4j), 2j, (7+0j), 4j, (6+5j), (7+5j), (8+9j), (10+5j), (8+2j)]
 bruteForceClosestPair: (1.0, ((6+5j), (7+5j)))
           closestPair: (1.0, ((6+5j), (7+5j)))

[(4+0j), (7+9j), 7j, 8j, (7+1j), (2+5j), (7+3j), (1+3j), (3+9j), (10+7j)]
 bruteForceClosestPair: (1.0, (7j, 8j))
           closestPair: (1.0, (7j, 8j))

[(9+1j), (5+0j), (8+3j), (6+9j), (3+7j), (4+6j), (7+6j), (8+6j), (10+8j), (2+5j)]
 bruteForceClosestPair: (1.0, ((7+6j), (8+6j)))
           closestPair: (1.0, ((7+6j), (8+6j)))

[(2+2j), (10+8j), (3+0j), (8+0j), (1+1j), (6+5j), 10j, (6+8j), (10+4j), (4+10j)]
 bruteForceClosestPair: (1.4142135623730951, ((2+2j), (1+1j)))
           closestPair: (1.4142135623730951, ((1+1j), (2+2j)))


Time for bruteForceClosestPair 4.59288093878
Time for closestPair 0.120782948671
Time for bruteForceClosestPair 5.00096353028
Time for closestPair 0.120047939054
Time for bruteForceClosestPair 4.86709875138
Time for closestPair 0.123363723602

Note how much slower the brute force algorithm is for only 2000
points.

(Oh, I guess I should add that the above is all my own work –
students might want to say the same).

Sunday, May 10, 2009

On "RailsConf: What killed Smalltalk could kill Ruby, too"

Browsing Reddit introduced me to RailsConf: What killed Smalltalk could kill Ruby, too The keynote speech by Robert Martin. I found it to be quite interesting as I had been monitoring odd posts on the Rails community and its general attitude.

Some of the points made by Robert on why Smalltalk contracted in popularity included:

It was too easy to make a mess.
And the mess was found late in the project timeline, leading to failed projects.
Smalltakers didn't want to deal with the boring 'corporate' tasks.
Leading to the rise of programmers using other languages to earn a corporate living.
When Smalltalk was 'riding high', a tendency to look down on other languages.
This would alienate the general population of programmers.

Robert put in a big, recurring, plug for both TDD and refactoring IDEs as ways to keep Ruby alive.

One comment he made was that he thought that in the dynamic vs static typing, language wars: Dynamic won and cited as evidence that most mainstream 'static' languages such as Java, C++ and C# have ways to do dynamic type checking – although badly. I wouldn't go that far, as I suspect that little of the mindset of programming in a dynamic language such as Python would be adopted by a C# or Java programmer. I suspect that they will use the dynamism available to them as a last resort as part of a design pattern to solve a particular sub-problem.

Sunday, May 03, 2009

Tech Issues For The Euro Elections

Elections for members of the European Parliament are href="http://news.bbc.co.uk/1/hi/world/europe/7819889.stm"
target="_blank">coming up soon. At the moment, it
seems certain companies are lobbying to have href="http://www.theregister.co.uk/2009/05/02/microsoft_open_source_eu_patents/">software
patents extended into Europe which is something I
don't agree with, as on algorithms - how can you distinguish between
the maths and the code? On something like look-and-feel of a GUI - how
can you distinguish between the idea and its implementation? If you see
that person A used a pinching motion of the fingers to
zoom-out centered on the pinch then I don't think that should stop
person B from coding his own implementation using different code. I do
think that the screen technology that allows a particular
implementation of multi-touch should be patentable however.

There is another tech issue that I need to talk to prospective
candidates about: their stance on href="http://www.theregister.co.uk/2008/04/04/ooxml_ec_investigation_iso/"
target="_blank"> ODF vs OOX ML, and more widely:
their stance on the promotion of the use of open style="text-decoration: underline;">standards
within government.

Can't think of any more tech issues at the moment but if you can think
of any...

P.S. Happy birthday to me. Happy birthday to me,...

Wednesday, April 29, 2009

Fight for the community you want

Someone took offence at a puerile and sexist presentation given at what is supposed to be a technical event in the Rails community. Good on them. The community is what you make of it.

"Using sex to sell spanners" doesn't make me think they are good spanners.

- Paddy.

Tuesday, April 21, 2009

Monitoring a linux process as it reads files

I am processing 20 thousand files with some proprietary software and
needed to monitor how far it got in reading the files. In my own Python
version of the utility the reading of data was ten times faster than
the subsequent processing and i wanted to find out if this proprietary
solution, which was havinfg performance problems , was equally spending
most of its time reading data.

The proprietary program took a list of 20000 files to process as its
first argument and I remembered that on Linux, the /proc
directory had info on running processes and sure enough , the
/proc/<process id>/fd directory had info on all the file
descriptors currently open by the process as links. So by opening the
list of files to in my editor and searching within it for the file name
shown on one of the file descriptors, I could gauge how many
files been read so far.

I decided to automate the checking and wrote a shell script using
cat/fgrep/gawk/... that then told me what line in the list of files to
process the program was currently at.

Now I've had time to refine things to use mainly python but to
demonstrate its use I also have to generate a test environment

First create some test files to process

style="font-family: monospace;">bash$ style="font-weight: bold;">mkdir -p /tmp/test style="font-family: monospace;">
bash$ style="font-weight: bold;">for ((i=0; i < 100; i++))
do touch /tmp/test/file$i ;done style="font-family: monospace;">
bash$ style="font-weight: bold;">/bin/ls /tmp/test/file* >
/tmp/test/all_files.lst style="font-family: monospace;">
bash$ style="font-weight: bold;">head /tmp/test/all_files.lst style="font-family: monospace;">
/tmp/test/file0 style="font-family: monospace;">
/tmp/test/file1 style="font-family: monospace;">
/tmp/test/file10 style="font-family: monospace;">
/tmp/test/file11 style="font-family: monospace;">
/tmp/test/file12 style="font-family: monospace;">
/tmp/test/file13 style="font-family: monospace;">
/tmp/test/file14 style="font-family: monospace;">
/tmp/test/file15 style="font-family: monospace;">
/tmp/test/file16 style="font-family: monospace;">
/tmp/test/file17 style="font-family: monospace;">
bash$

Now lets create a test executable to monitor

This script just holds each file open for reading for twenty seconds
before closing the file.

style="font-family: monospace;">bash$ style="font-weight: bold;">python -c ' style="color: rgb(0, 0, 153);">import sys,time style="font-family: monospace; color: rgb(0, 0, 153);">
for
name in file(sys.argv[1]): style="font-family: monospace; color: rgb(0, 0, 153);">

f = file(name.strip()) style="font-family: monospace; color: rgb(0, 0, 153);">

time.sleep(45) style="font-family: monospace; color: rgb(0, 0, 153);">

f.close()

' style="font-weight: bold;">/tmp/test/all_files.lst
& style="font-family: monospace;">
[2] style="font-weight: bold;"> style="font-family: monospace; color: rgb(0, 0, 153);"> style="font-weight: bold; color: red;">7984 style="font-family: monospace;">
bash$

here is what the fd directory looks like

style="font-family: monospace;">bash$ ls -l /proc/ style="font-family: monospace; color: rgb(0, 0, 153);"> style="font-weight: bold; color: red;">7984 style="font-family: monospace;">/fd style="font-family: monospace;">
total 0 style="font-family: monospace;">
lrwxrwxrwx 1 HP
DV8025EA None 0 Apr 21 22:17 0 -> /dev/tty1 style="font-family: monospace;">
lrwxrwxrwx 1 HP
DV8025EA None 0 Apr 21 22:17 1 -> /dev/tty1 style="font-family: monospace;">
lrwxrwxrwx 1 HP
DV8025EA None 0 Apr 21 22:17 2 -> /dev/tty1 style="font-family: monospace;">
lrwxrwxrwx 1 HP
DV8025EA None 0 Apr 21 22:17 3 -> /tmp/test/all_files.lst style="font-family: monospace;">
lrwxrwxrwx 1 HP
DV8025EA None 0 Apr 21 22:17 4 -> /tmp/test/file0 style="font-family: monospace;">
bash$

And here is a python script to monitor that fd directories
link number 4 periodically

style="font-family: monospace;">bash$ style="font-weight: bold;">python -c ' style="color: rgb(0, 0, 153);">import sys,time,os,datetime style="color: rgb(0, 0, 153);">
name2index =
dict((name.strip(), index) for index,name in
enumerate(file(sys.argv[1]))) style="color: rgb(0, 0, 153);">
all = len(name2index) style="color: rgb(0, 0, 153);">
while True: style="color: rgb(0, 0, 153);">
path =
os.path.realpath("/proc/7984/fd/ style="font-weight: bold; color: red;">4").strip() style="color: rgb(0, 0, 153);">
print
name2index[path],"/",all, path, datetime.datetime.now().isoformat() style="color: rgb(0, 0, 153);">

time.sleep(30)

' /tmp/test/all_files.lst

22 / 100 /tmp/test/file29 2009-04-21T22:34:07.817750

23 / 100 /tmp/test/file3 2009-04-21T22:34:37.820750

24 / 100 /tmp/test/file30 2009-04-21T22:35:07.825750

24 / 100 /tmp/test/file30 2009-04-21T22:35:37.834750

I watched the monitor output over the next couple of hours and found
out when file reading ended and processing of read data started.

END.

Thursday, April 16, 2009

Bimap. Bi-directional mapping/dictionary for Python?

I had a chance encounter with the boost C++ library documentation and
came across a description of their href="http://cablemodem.fibertel.com.ar/mcape/boost/libs/bimap/index.html">bimap
(that is bimap without a 't') .

It seems to be a bi-directional mapping.

In a Python dict, keys map to values. In a bimap, values would also map
to keys. It seems they put keys/values on an equal footing by renaming
them as left and right members, and having methods that work on either
the right to left or the left to right mapping.

I had just re-visited a program of mine that dealt with what I would
consider large sets of values. I was mapping test names to test
coverage where their were tens of trhousands of test names, each of
~100 characters long and for each file, I had many thousands of code
coverage points, which themselves were strings of ~150 characters each.
I new I had to work with a dict mapping test_names to sets of covered
points for each test and do a lot of set arithmatic to rank the tests
in order of contribution to the overall coverage figure, so I decided
to use indices; replacing files by negative numbers and coverage points
by positive numbers. My program works, and is very fast but I note that
I create both a testname2index mapping dict and the inverse
index2testname mapping, (and coverpoint2index and index2coverpoint
mapping dicts).

What I had been constructing, unawares, was a bimap by using two dicts.

I then spent time googling for Python bimaps to no avail, (although I
think there are haskel and Java implementations out there - and Google
'helps' by sugesting you meant bitmap with a 't').

So my questions are:

Is their a Python bimap implementation out there?

Would anyone want one? (I already code around its
absense).

What would its methods be?

Methods

On my third question, I guess you could have all the normal methods of
a dict but preceeded by either
left_ or right_
with calling of a method without those prefixes raising an exception,
as it seems important to not favour one mapping direction over another.

This would give:

style="font-family: monospace; font-weight: bold; text-decoration: underline;">
DICT      BIMOD
(LEFT) BIMOD (RIGHT) style="font-family: monospace; font-weight: bold;">
style="font-family: monospace;">
clear
left_clear right_clear


copy
left_copy right_copy


fromkeys
left_fromkeys right_fromkeys style="font-family: monospace;">

get
left_get
right_get      style="font-family: monospace;">

has_key
left_has_key right_has_key style="font-family: monospace;">

items
left_items right_items

iteritems    left_iteritems
right_iteritems


iterkeys
left_iterkeys right_iterkeys style="font-family: monospace;">
itervalues
left_itervalues right_itervalues style="font-family: monospace;">

keys
left_keys right_keys


pop
left_pop
right_pop      style="font-family: monospace;">

popitem
left_popitem right_popitem style="font-family: monospace;">
setdefault
left_setdefault right_setdefault style="font-family: monospace;">

update
left_update right_update   style="font-family: monospace;">

values
left_values right_values

The above couldn't be extended for indexing however, unless you did
something like:

style="font-family: monospace;">bimod_instance.left[leftindex]
and bimod_instance..right[rightindex]

but then wouldn't it be better to use the dotted form for all the
methods and have:

style="font-family: monospace;">
bimod_instance.left.haskey # (with a dot after left), ...

I note that their are redundancies above, for example, left_clear ==
right_clear ...

Strewth, I've only scratched the surface :-)

- Paddy.

Friday, April 10, 2009

Knapsack solution by OO Calc

I had previously solved the href="http://www.rosettacode.org/wiki/Knapsack_Problem"
target="_blank">knapsack problem in Python. I came
across the Solver function of OpenOffice Calc and so tried to use it to
solve knapsack.

A quick recap of the problem

A traveller gets diverted and has to make an unscheduled stop
in
what turns out to be Shangri La. Opting to leave, he is allowed to take
as much as he likes of the following items, so long as it will fit in
his knapsack, and he can carry it.
He knows that he can carry no more than 25 'weights' in total; and that
the capacity of his knapsack is 0.25 'cubic lengths'.

Looking just above the bar codes on the items he finds their
weights and volumes. He digs out his recent copy of a financial paper
and gets the value of each item.

cellpadding="2" cellspacing="2">

nowrap="nowrap" valign="middle">Item	nowrap="nowrap" valign="middle">Explanation	nowrap="nowrap" valign="middle">Value (each)	nowrap="nowrap" valign="middle">weight	nowrap="nowrap" valign="middle">Volume (each)
panacea (vials of)	Incredible healing properties	3000	0.3	0.025
ichor (ampules of)	Vampires blood	1800	0.2	0.015
gold (bars)	Shiney shiney	2500	2.0	0.002
align="left" nowrap="nowrap" valign="middle">Knapsack	align="left" nowrap="nowrap" valign="middle">For the carrying of	align="left" nowrap="nowrap" valign="middle">-	align="left" nowrap="nowrap" valign="middle"><=25	align="left" nowrap="nowrap" valign="middle"><=0.25

He can only take whole units of any item,, but there is much
more of any item than he could ever carry

How many of each item does he take to maximise the
value of items he is carrying away with him?

Note:

There are four solutions that maximise the value taken.
Only one need be given.

OO Calc setup

I am using version 3.0.1. I opened a new spreadsheat and literally
cut-n-pasted the above table into the sheet. The solver needs to be
able to change some cells (the variables) that affect one cell
containing the value to optimise.. I added the table in blue, at the
right of the figure below to calculate the value, weight and
volume of what is in the knapsack based on the amount of each item
chosen, so you can change what is in the style="font-weight: bold;">Number column and
get new totals calculated in the Totals
row

style="width: 914px; height: 444px;"
alt="Knapsack problem formulae" title="(Showing cell formulae)"
src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjSTTwTk_nEMgY1qLVv_QXSamy1aaWjm-H73uDgKHNjbvVB0QeTQR-EVT5tCoZtaWPjmLw3RwEbFD5kBXUJ31NSYXgM-JeYFub6uyU1ZIjFKXj1e6ltYRnG3beA0o7jgjED6D3i/">

I show the formulae in the cells in the image above.

The Solver

Menu Tools-> Solver shows a form for filling in. from top to
bottom:

The target cell is the cell I want to maximise, which is
the total value of all items in cell H8

I want to Maximise.

I want to change the Number
of each item i.e. vary cells G4:G6

My limiting conditions are:

The weight is <= 25

The volume is <= 0.25

The Number of each item cannot be negative.

style="width: 455px; height: 373px;" alt="Solver Menu"
title="Solver menu"
src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg_hFwnFTntRnmBiKs5Be81nQN6-LVoVgsi94Yl6FTgi1eM3a2IijSDPcmUOdjbh2ME_jNSpWqkScBW-9KHrl952p7MXhYT8lV_csPjRzeciF1i3Qij3K3m5JAgXQQF4by_bGoH/">

You also need to click the Solvers Options
button and ensure the options are set like this:

style="width: 431px; height: 286px;" alt="Solver Options"
title="Solver Options"
src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjYg4BmKl0Ye99EgQ7bGvj4D_xEkf-oUNIug2hvpp9PAmjsa048nuFPSSdtMsYysNvh98NAMMpphy7jWQHRZhXWHg8fgvcUY_iH2WnsDAc5cUITezPm93HXGvNmQpx1Fj99Gxk-/">

Ok the Options, then hit Solve on the Solver window and eventually you
will get the following result:

style="width: 914px; height: 444px;"
alt="Result of Knapsack constraints problem"
title="Result of Knapsack constraints problem"
src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgoq_CC0HfR9EzsFU6E9dXcxmLk1zXhqZxkuu2tCe1DWK-_REuVKnTHBGO9KhzvQ-cIJfskAb24DzRT8bMv2Wb5lshQw7ec5YGT6IN4VEgvR8MlCZNBM0wRJDOAgXqzn-tSZu2J/">

The solver has correctly determined that carrying no panacea, fifteen
ampules of ichor, and eleven gold bars will maximise the value, subject
to the given constraints..

Comment

Their are actually other values that give the same total value under
the same constraints but I can't find an easy way for the solver to
give them all.

END.