# Go deh!

Mainly Tech projects on Python and Electronic Design Automation.

## Tuesday, May 12, 2009

### Critique of pseudocode explanations of the Closest Pair Algorithm

I watched this
on Rosetta Code from its inception on the 6th of
May, but had problems understanding the pseudo-code given, and some
of the stuff I googled too. after a couple of days though I was more
familiar with attempts at O(nlog(n)) divide-n-conquer
algorithms having seen a few and happened to find this
explanation (.ppt file) where it seemed to click for me.

After problems with the other references I thought I might go
through and identify what I found problematic with them.

### The Rosetta Code Pseudo-code:

`closestPair of P(1), P(2), ... P(N)if N ≤ 3 then  return closest points of P using brute-force algorithmelse  xP ← P ordered by the x coordinate, in ascending order  PL ← points of xP from 1 to ⌈N/2⌉  PR ← points of xP from ⌈N/2⌉+1 to N  (dL, pairL) ← closestPair of PL  (dR, pairR) ← closestPair of PR  (dmin, pairmin) ← (dR, pairR)  if dL < dR then    (dmin, pairmin) ← (dL, pairL)  endif  xM ← xP(⌈N/2⌉)x  S ← { p ∈ xP : |xM - px| < dmin }  yP ← S ordered by the y coordinate, in ascending order  nP ← number of points in yP  (closest, closestPair) ← (dmin, pairmin)  for i from 1 to nP - 1    k ← i + 1    while k ≤ nP and yP(k)y - yP(k)y < dmin      if |yP(k) - yP(i)| < closest then        (closest, closestPair) ← (|yP(k) - yP(i)|, {yP(k), yP(i)})      endif      k ← k + 1    endwhile  return closest, closestPairendif`

The author had noted that the above could have problems so I was
warned, but still started by trying to follow the above. I was
blindly following the pseudo-code until I got the the statement: xM
← xP(⌈N/2⌉)x which I couldn't understand.

This prompted my wider search for a better explanation. After
finding it (linked in the first paragraph), I modified my Python
program to follow the new explanation.

### The RC references

There were several references quoted.

• The Wikipedia article
Did not give pseudo-code

• Closest
Pair (McGill)

Section 3.4 on improving their previous
O(nlog2n) into an O(nlogn)
algorithm I found to be very vague. Just how do you “returning
the points in each set in sorted order by y-coordinate “
without doing an nlogn sort leading again to an O(nlog2n)
solution?

• Closest
Pair (UCBS)

This seems to concur with my preferred reference
but is given at a higher level than pseudo-code, and so would be
harder to implement from the description given.

• Closest
pair (WUStL)
by Dr. Jeremy Buhler
Their seems to be a
problem with having N, the number of points as an argument of the
function. What if N is odd? The pseudo-code seems to rely on other
material probably presented in a class and is at a higher level than
my preferred solution as it seems to be set as an exercise for
students.

### My Python Solution

The only tests done is to generate random points and compare the
brute force algorithms solution with that of the divide and conquer.

`'''  Compute nearest pair of points using two algorithms    First algorithm is 'brute force' comparison of every possible pair.  Second, 'divide and conquer', is based on:    www.cs.iupui.edu/~xkzou/teaching/CS580/Divide-and-conquer-closestPair.ppt '''from random import randintinfinity = float('inf')# Note the use of complex numbers to represent 2D points making distance == abs(P1-P2)def bruteForceClosestPair(point):    numPoints = len(point)    if numPoints < 2:        return infinity, (None, None)    return min( ((abs(point[i] - point[j]), (point[i], point[j]))                 for i in range(numPoints-1)                 for j in range(i+1,numPoints)),                key=lambda x: x[0])def closestPair(point):    xP = sorted(point, key= lambda p: p.real)    yP = sorted(point, key= lambda p: p.imag)    return _closestPair(xP, yP)def _closestPair(xP, yP):    numPoints = len(xP)    if numPoints <= 3:        return bruteForceClosestPair(xP)    Pl = xP[:numPoints/2]    Pr = xP[numPoints/2:]    Yl, Yr = [], []    xDivider = Pl[-1].real    for p in yP:        if p.real <= xDivider:            Yl.append(p)        else:            Yr.append(p)    dl, pairl = _closestPair(Pl, Yl)    dr, pairr = _closestPair(Pr, Yr)    dm, pairm = (dl, pairl) if dl < dr else (dr, pairr)    # Points within dm of xDivider sorted by Y coord    closeY = [p for p in yP  if abs(p.real - xDivider) < dm]    numCloseY = len(closeY)    if numCloseY > 1:        # There is a proof that you only need compare a max of 7 other points        closestY = min( ((abs(closeY[i] - closeY[j]), (closeY[i], closeY[j]))                         for i in range(numCloseY-1)                         for j in range(i+1,min(i+8, numCloseY))),                        key=lambda x: x[0])        return (dm, pairm) if dm <= closestY[0] else closestY    else:        return dm, pairmdef times():    ''' Time the different functions    '''    import timeit    functions = [bruteForceClosestPair, closestPair]    for f in functions:        print 'Time for', f.__name__, timeit.Timer(            '%s(pointList)' % f.__name__,            'from closestpair import %s, pointList' % f.__name__).timeit(number=1)pointList = [randint(0,1000)+1j*randint(0,1000) for i in range(2000)]if __name__ == '__main__':    pointList = [(5+9j), (9+3j), (2+0j), (8+4j), (7+4j), (9+10j), (1+9j), (8+2j), 10j, (9+6j)]    print pointList    print '  bruteForceClosestPair:', bruteForceClosestPair(pointList)    print '            closestPair:', closestPair(pointList)    for i in range(10):        pointList = [randint(0,10)+1j*randint(0,10) for i in range(10)]        print '\n', pointList        print ' bruteForceClosestPair:', bruteForceClosestPair(pointList)        print '           closestPair:', closestPair(pointList)    print '\n'    times()    times()    times()`

#### Sample output:

`[(5+9j), (9+3j), (2+0j), (8+4j), (7+4j), (9+10j), (1+9j), (8+2j), 10j, (9+6j)]  bruteForceClosestPair: (1.0, ((8+4j), (7+4j)))            closestPair: (1.0, ((8+4j), (7+4j)))[(6+8j), (4+1j), 7j, (4+10j), (10+6j), (1+9j), (10+10j), (5+0j), (3+1j), (6+4j)] bruteForceClosestPair: (1.0, ((4+1j), (3+1j)))           closestPair: (1.0, ((3+1j), (4+1j)))[(5+5j), (10+9j), 3j, (6+1j), (9+5j), 1j, (8+0j), 8j, (3+2j), (10+4j)] bruteForceClosestPair: (1.4142135623730951, ((9+5j), (10+4j)))           closestPair: (1.4142135623730951, ((9+5j), (10+4j)))[(9+2j), (1+2j), 5j, (10+0j), (6+6j), (6+4j), (5+1j), (2+5j), (8+6j), (3+2j)] bruteForceClosestPair: (2.0, ((1+2j), (3+2j)))           closestPair: (2.0, ((6+6j), (6+4j)))[(2+3j), (9+0j), (9+7j), 8j, (4+5j), (6+7j), (6+2j), (3+3j), (3+10j), (7+5j)] bruteForceClosestPair: (1.0, ((2+3j), (3+3j)))           closestPair: (1.0, ((2+3j), (3+3j)))[(7+2j), (2+10j), (3+7j), (8+6j), (2+1j), 3j, (8+5j), (6+9j), (1+0j), 0j] bruteForceClosestPair: (1.0, ((8+6j), (8+5j)))           closestPair: (1.0, ((8+6j), (8+5j)))[(10+7j), (6+3j), (2+7j), (3+6j), (9+3j), (8+5j), (5+7j), (3+10j), 5j, (3+7j)] bruteForceClosestPair: (1.0, ((2+7j), (3+7j)))           closestPair: (1.0, ((3+6j), (3+7j)))[(4+6j), (5+4j), 2j, (7+0j), 4j, (6+5j), (7+5j), (8+9j), (10+5j), (8+2j)] bruteForceClosestPair: (1.0, ((6+5j), (7+5j)))           closestPair: (1.0, ((6+5j), (7+5j)))[(4+0j), (7+9j), 7j, 8j, (7+1j), (2+5j), (7+3j), (1+3j), (3+9j), (10+7j)] bruteForceClosestPair: (1.0, (7j, 8j))           closestPair: (1.0, (7j, 8j))[(9+1j), (5+0j), (8+3j), (6+9j), (3+7j), (4+6j), (7+6j), (8+6j), (10+8j), (2+5j)] bruteForceClosestPair: (1.0, ((7+6j), (8+6j)))           closestPair: (1.0, ((7+6j), (8+6j)))[(2+2j), (10+8j), (3+0j), (8+0j), (1+1j), (6+5j), 10j, (6+8j), (10+4j), (4+10j)] bruteForceClosestPair: (1.4142135623730951, ((2+2j), (1+1j)))           closestPair: (1.4142135623730951, ((1+1j), (2+2j)))Time for bruteForceClosestPair 4.59288093878Time for closestPair 0.120782948671Time for bruteForceClosestPair 5.00096353028Time for closestPair 0.120047939054Time for bruteForceClosestPair 4.86709875138Time for closestPair 0.123363723602`

Note how much slower the brute force algorithm is for only 2000
points.

(Oh, I guess I should add that the above is all my own work –
students might want to say the same).

1. about «There is a proof that you only need compare a max of 7 other points» (I've found 6 indeed, but it is not essential). The "a max" (at most?) is very important, and not exploited in your code: you test all the 7 points you need to test (as I can understand the python code, maybe I'm wrong); indeed, most of the time, you *don't* need to test them all: when you reach the first beyond "dmin", you can stop! and this is the "why" of the second condition in the while loop in the pseudocode of WUStL; that code, most of the time (depending on the distribution of the points), tests less than 7 points; exactly it tests only the points that need to be tested: no one more. So it is slightly more efficient. (then there's a proof that the number of points that can live "inside" the dmin limit is 6or7, no more... but the important fact for the implementation is to test when the limit is reached)

2. "xM ← xP(⌈N/2⌉)x"

I would translate this into the following python code:
xM = xP[math.ceil(N/2.0)].x