task on Rosetta Code from its inception on the 6

^{th}of

May, but had problems understanding the pseudo-code given, and some

of the stuff I googled too. after a couple of days though I was more

familiar with attempts at

*O(nlog(n))*divide-n-conquer

algorithms having seen a few and happened to find this

explanation (.ppt file) where it seemed to click for me.

After problems with the other references I thought I might go

through and identify what I found problematic with them.

### The Rosetta Code Pseudo-code:

closestPairof P(1), P(2), ... P(N)ifN ≤ 3then

returnclosest points of P using brute-force algorithmelse

xP ← P ordered by the x coordinate, in ascending order

P_{L}← points of xP from 1 to ⌈N/2⌉

P_{R}← points of xP from ⌈N/2⌉+1 to N

(d_{L}, pair_{L}) ←closestPairof P_{L}

(d_{R}, pair_{R}) ←closestPairof P_{R}

(d_{min}, pair_{min}) ← (d_{R}, pair_{R})

ifd_{L}< d_{R}then

(d_{min}, pair_{min}) ← (d_{L}, pair_{L})

endif

x_{M}← xP(⌈N/2⌉)_{x}

S ← { p ∈ xP : |x_{M}- p_{x}| < d_{min}}

yP ← S ordered by the y coordinate, in ascending order

nP ← number of points in yP

(closest, closestPair) ← (d_{min}, pair_{min})

forifrom1tonP - 1

k ← i + 1

whilek ≤ nPandyP(k)_{y}- yP(k)_{y}< d_{min}

if|yP(k) - yP(i)| < closestthen

(closest, closestPair) ← (|yP(k) - yP(i)|, {yP(k), yP(i)})

endif

k ← k + 1

endwhile

returnclosest, closestPairendif

The author had noted that the above could have problems so I was

warned, but still started by trying to follow the above. I was

blindly following the pseudo-code until I got the the statement: x_{M}

← xP(⌈N/2⌉)_{x} which I couldn't understand.

This prompted my wider search for a better explanation. After

finding it (linked in the first paragraph), I modified my Python

program to follow the new explanation.

### The RC references

There were several references quoted.

**The Wikipedia article**

Did not give pseudo-code

Closest

Pair (McGill)

Section 3.4 on improving their previous

O(*n*log^{2}*n*) into an O(*n*log*n*)

algorithm I found to be very vague. Just how do you “returning

the points in each set in sorted order by*y*-coordinate “

without doing an nlogn sort leading again to an O(*n*log^{2}*n*)

solution?

Closest

Pair (UCBS)

This seems to concur with my preferred reference

but is given at a higher level than pseudo-code, and so would be

harder to implement from the description given.

Closest

pair (WUStL) by Dr. Jeremy Buhler

Their seems to be a

problem with having N, the number of points as an argument of the

function. What if N is odd? The pseudo-code seems to rely on other

material probably presented in a class and is at a higher level than

my preferred solution as it seems to be set as an exercise for

students.

### My Python Solution

The only tests done is to generate random points and compare the

brute force algorithms solution with that of the divide and conquer.

'''

Compute nearest pair of points using two algorithms

First algorithm is 'brute force' comparison of every possible pair.

Second, 'divide and conquer', is based on:

www.cs.iupui.edu/~xkzou/teaching/CS580/Divide-and-conquer-closestPair.ppt

'''

from random import randint

infinity = float('inf')

# Note the use of complex numbers to represent 2D points making distance == abs(P1-P2)defbruteForceClosestPair(point):

numPoints = len(point)

ifnumPoints < 2:

returninfinity, (None, None)

returnmin( ((abs(point[i] - point[j]), (point[i], point[j]))

foriinrange(numPoints-1)

forjinrange(i+1,numPoints)),

key=lambdax: x[0])defclosestPair(point):

xP = sorted(point, key=lambdap: p.real)

yP = sorted(point, key=lambdap: p.imag)

return_closestPair(xP, yP)def_closestPair(xP, yP):

numPoints = len(xP)

ifnumPoints <= 3:

returnbruteForceClosestPair(xP)

Pl = xP[:numPoints/2]

Pr = xP[numPoints/2:]

Yl, Yr = [], []

xDivider = Pl[-1].real

forpinyP:

ifp.real <= xDivider:

Yl.append(p)

else:

Yr.append(p)

dl, pairl = _closestPair(Pl, Yl)

dr, pairr = _closestPair(Pr, Yr)

dm, pairm = (dl, pairl)ifdl < drelse(dr, pairr)

# Points within dm of xDivider sorted by Y coord

closeY = [pforpinyPifabs(p.real - xDivider) < dm]

numCloseY = len(closeY)

ifnumCloseY > 1:

# There is a proof that you only need compare a max of 7 other points

closestY = min( ((abs(closeY[i] - closeY[j]), (closeY[i], closeY[j]))

foriinrange(numCloseY-1)

forjinrange(i+1,min(i+8, numCloseY))),

key=lambdax: x[0])

return(dm, pairm)ifdm <= closestY[0]elseclosestY

else:

returndm, pairmdeftimes():

''' Time the different functions

'''

import timeit

functions = [bruteForceClosestPair, closestPair]

forfinfunctions:

'%s(pointList)' % f.__name__,

'from closestpair import %s, pointList' % f.__name__).timeit(number=1)

pointList = [randint(0,1000)+1j*randint(0,1000)foriinrange(2000)]if__name__ == '__main__':

pointList = [(5+9j), (9+3j), (2+0j), (8+4j), (7+4j), (9+10j), (1+9j), (8+2j), 10j, (9+6j)]

foriinrange(10):

pointList = [randint(0,10)+1j*randint(0,10)foriinrange(10)]

times()

times()

times()

Sample output:

[(5+9j), (9+3j), (2+0j), (8+4j), (7+4j), (9+10j), (1+9j), (8+2j), 10j, (9+6j)]

bruteForceClosestPair: (1.0, ((8+4j), (7+4j)))

closestPair: (1.0, ((8+4j), (7+4j)))

[(6+8j), (4+1j), 7j, (4+10j), (10+6j), (1+9j), (10+10j), (5+0j), (3+1j), (6+4j)]

bruteForceClosestPair: (1.0, ((4+1j), (3+1j)))

closestPair: (1.0, ((3+1j), (4+1j)))

[(5+5j), (10+9j), 3j, (6+1j), (9+5j), 1j, (8+0j), 8j, (3+2j), (10+4j)]

bruteForceClosestPair: (1.4142135623730951, ((9+5j), (10+4j)))

closestPair: (1.4142135623730951, ((9+5j), (10+4j)))

[(9+2j), (1+2j), 5j, (10+0j), (6+6j), (6+4j), (5+1j), (2+5j), (8+6j), (3+2j)]

bruteForceClosestPair: (2.0, ((1+2j), (3+2j)))

closestPair: (2.0, ((6+6j), (6+4j)))

[(2+3j), (9+0j), (9+7j), 8j, (4+5j), (6+7j), (6+2j), (3+3j), (3+10j), (7+5j)]

bruteForceClosestPair: (1.0, ((2+3j), (3+3j)))

closestPair: (1.0, ((2+3j), (3+3j)))

[(7+2j), (2+10j), (3+7j), (8+6j), (2+1j), 3j, (8+5j), (6+9j), (1+0j), 0j]

bruteForceClosestPair: (1.0, ((8+6j), (8+5j)))

closestPair: (1.0, ((8+6j), (8+5j)))

[(10+7j), (6+3j), (2+7j), (3+6j), (9+3j), (8+5j), (5+7j), (3+10j), 5j, (3+7j)]

bruteForceClosestPair: (1.0, ((2+7j), (3+7j)))

closestPair: (1.0, ((3+6j), (3+7j)))

[(4+6j), (5+4j), 2j, (7+0j), 4j, (6+5j), (7+5j), (8+9j), (10+5j), (8+2j)]

bruteForceClosestPair: (1.0, ((6+5j), (7+5j)))

closestPair: (1.0, ((6+5j), (7+5j)))

[(4+0j), (7+9j), 7j, 8j, (7+1j), (2+5j), (7+3j), (1+3j), (3+9j), (10+7j)]

bruteForceClosestPair: (1.0, (7j, 8j))

closestPair: (1.0, (7j, 8j))

[(9+1j), (5+0j), (8+3j), (6+9j), (3+7j), (4+6j), (7+6j), (8+6j), (10+8j), (2+5j)]

bruteForceClosestPair: (1.0, ((7+6j), (8+6j)))

closestPair: (1.0, ((7+6j), (8+6j)))

[(2+2j), (10+8j), (3+0j), (8+0j), (1+1j), (6+5j), 10j, (6+8j), (10+4j), (4+10j)]

bruteForceClosestPair: (1.4142135623730951, ((2+2j), (1+1j)))

closestPair: (1.4142135623730951, ((1+1j), (2+2j)))

Time for bruteForceClosestPair 4.59288093878

Time for closestPair 0.120782948671

Time for bruteForceClosestPair 5.00096353028

Time for closestPair 0.120047939054

Time for bruteForceClosestPair 4.86709875138

Time for closestPair 0.123363723602

Note how much slower the brute force algorithm is for only 2000

points.

(Oh, I guess I should add that the above is all my own work –

students might want to say the same).

about «There is a proof that you only need compare a max of 7 other points» (I've found 6 indeed, but it is not essential). The "a max" (at most?) is very important, and not exploited in your code: you test all the 7 points you need to test (as I can understand the python code, maybe I'm wrong); indeed, most of the time, you *don't* need to test them all: when you reach the first beyond "dmin", you can stop! and this is the "why" of the second condition in the while loop in the pseudocode of WUStL; that code, most of the time (depending on the distribution of the points), tests less than 7 points; exactly it tests only the points that need to be tested: no one more. So it is slightly more efficient. (then there's a proof that the number of points that can live "inside" the dmin limit is 6or7, no more... but the important fact for the implementation is to test when the limit is reached)

ReplyDelete"xM ← xP(⌈N/2⌉)x"

ReplyDeleteI would translate this into the following python code:

xM = xP[math.ceil(N/2.0)].x