I watched
this
task on Rosetta Code from its inception on the 6
th of
May, but had problems understanding the pseudo-code given, and some
of the stuff I googled too. after a couple of days though I was more
familiar with attempts at
O(nlog(n)) divide-n-conquer
algorithms having seen a few and happened to find
this
explanation (.ppt file) where it seemed to click for me.
After problems with the other references I thought I might go
through and identify what I found problematic with them.
The Rosetta Code Pseudo-code:
closestPair of P(1), P(2), ... P(N)
if N ≤ 3 then
return closest points of P using brute-force algorithm
else
xP ← P ordered by the x coordinate, in ascending order
PL ← points of xP from 1 to ⌈N/2⌉
PR ← points of xP from ⌈N/2⌉+1 to N
(dL, pairL) ← closestPair of PL
(dR, pairR) ← closestPair of PR
(dmin, pairmin) ← (dR, pairR)
if dL < dR then
(dmin, pairmin) ← (dL, pairL)
endif
xM ← xP(⌈N/2⌉)x
S ← { p ∈ xP : |xM - px| < dmin }
yP ← S ordered by the y coordinate, in ascending order
nP ← number of points in yP
(closest, closestPair) ← (dmin, pairmin)
for i from 1 to nP - 1
k ← i + 1
while k ≤ nP and yP(k)y - yP(k)y < dmin
if |yP(k) - yP(i)| < closest then
(closest, closestPair) ← (|yP(k) - yP(i)|, {yP(k), yP(i)})
endif
k ← k + 1
endwhile
return closest, closestPair
endif
The author had noted that the above could have problems so I was
warned, but still started by trying to follow the above. I was
blindly following the pseudo-code until I got the the statement: xM
← xP(⌈N/2⌉)x which I couldn't understand.
This prompted my wider search for a better explanation. After
finding it (linked in the first paragraph), I modified my Python
program to follow the new explanation.
The RC references
There were several references quoted.
The Wikipedia article
Did not give pseudo-code
Closest
Pair (McGill)
Section 3.4 on improving their previous
O(nlog2n) into an O(nlogn)
algorithm I found to be very vague. Just how do you “returning
the points in each set in sorted order by y-coordinate “
without doing an nlogn sort leading again to an O(nlog2n)
solution?
Closest
Pair (UCBS)
This seems to concur with my preferred reference
but is given at a higher level than pseudo-code, and so would be
harder to implement from the description given.
Closest
pair (WUStL) by Dr. Jeremy Buhler
Their seems to be a
problem with having N, the number of points as an argument of the
function. What if N is odd? The pseudo-code seems to rely on other
material probably presented in a class and is at a higher level than
my preferred solution as it seems to be set as an exercise for
students.
My Python Solution
The only tests done is to generate random points and compare the
brute force algorithms solution with that of the divide and conquer.
'''
Compute nearest pair of points using two algorithms
First algorithm is 'brute force' comparison of every possible pair.
Second, 'divide and conquer', is based on:
www.cs.iupui.edu/~xkzou/teaching/CS580/Divide-and-conquer-closestPair.ppt
'''
from random import randint
infinity = float('inf')
# Note the use of complex numbers to represent 2D points making distance == abs(P1-P2)
def bruteForceClosestPair(point):
numPoints = len(point)
if numPoints < 2:
return infinity, (None, None)
return min( ((abs(point[i] - point[j]), (point[i], point[j]))
for i in range(numPoints-1)
for j in range(i+1,numPoints)),
key=lambda x: x[0])
def closestPair(point):
xP = sorted(point, key= lambda p: p.real)
yP = sorted(point, key= lambda p: p.imag)
return _closestPair(xP, yP)
def _closestPair(xP, yP):
numPoints = len(xP)
if numPoints <= 3:
return bruteForceClosestPair(xP)
Pl = xP[:numPoints/2]
Pr = xP[numPoints/2:]
Yl, Yr = [], []
xDivider = Pl[-1].real
for p in yP:
if p.real <= xDivider:
Yl.append(p)
else:
Yr.append(p)
dl, pairl = _closestPair(Pl, Yl)
dr, pairr = _closestPair(Pr, Yr)
dm, pairm = (dl, pairl) if dl < dr else (dr, pairr)
# Points within dm of xDivider sorted by Y coord
closeY = [p for p in yP if abs(p.real - xDivider) < dm]
numCloseY = len(closeY)
if numCloseY > 1:
# There is a proof that you only need compare a max of 7 other points
closestY = min( ((abs(closeY[i] - closeY[j]), (closeY[i], closeY[j]))
for i in range(numCloseY-1)
for j in range(i+1,min(i+8, numCloseY))),
key=lambda x: x[0])
return (dm, pairm) if dm <= closestY[0] else closestY
else:
return dm, pairm
def times():
''' Time the different functions
'''
import timeit
functions = [bruteForceClosestPair, closestPair]
for f in functions:
print 'Time for', f.__name__, timeit.Timer(
'%s(pointList)' % f.__name__,
'from closestpair import %s, pointList' % f.__name__).timeit(number=1)
pointList = [randint(0,1000)+1j*randint(0,1000) for i in range(2000)]
if __name__ == '__main__':
pointList = [(5+9j), (9+3j), (2+0j), (8+4j), (7+4j), (9+10j), (1+9j), (8+2j), 10j, (9+6j)]
print pointList
print ' bruteForceClosestPair:', bruteForceClosestPair(pointList)
print ' closestPair:', closestPair(pointList)
for i in range(10):
pointList = [randint(0,10)+1j*randint(0,10) for i in range(10)]
print '\n', pointList
print ' bruteForceClosestPair:', bruteForceClosestPair(pointList)
print ' closestPair:', closestPair(pointList)
print '\n'
times()
times()
times()
Sample output:
[(5+9j), (9+3j), (2+0j), (8+4j), (7+4j), (9+10j), (1+9j), (8+2j), 10j, (9+6j)]
bruteForceClosestPair: (1.0, ((8+4j), (7+4j)))
closestPair: (1.0, ((8+4j), (7+4j)))
[(6+8j), (4+1j), 7j, (4+10j), (10+6j), (1+9j), (10+10j), (5+0j), (3+1j), (6+4j)]
bruteForceClosestPair: (1.0, ((4+1j), (3+1j)))
closestPair: (1.0, ((3+1j), (4+1j)))
[(5+5j), (10+9j), 3j, (6+1j), (9+5j), 1j, (8+0j), 8j, (3+2j), (10+4j)]
bruteForceClosestPair: (1.4142135623730951, ((9+5j), (10+4j)))
closestPair: (1.4142135623730951, ((9+5j), (10+4j)))
[(9+2j), (1+2j), 5j, (10+0j), (6+6j), (6+4j), (5+1j), (2+5j), (8+6j), (3+2j)]
bruteForceClosestPair: (2.0, ((1+2j), (3+2j)))
closestPair: (2.0, ((6+6j), (6+4j)))
[(2+3j), (9+0j), (9+7j), 8j, (4+5j), (6+7j), (6+2j), (3+3j), (3+10j), (7+5j)]
bruteForceClosestPair: (1.0, ((2+3j), (3+3j)))
closestPair: (1.0, ((2+3j), (3+3j)))
[(7+2j), (2+10j), (3+7j), (8+6j), (2+1j), 3j, (8+5j), (6+9j), (1+0j), 0j]
bruteForceClosestPair: (1.0, ((8+6j), (8+5j)))
closestPair: (1.0, ((8+6j), (8+5j)))
[(10+7j), (6+3j), (2+7j), (3+6j), (9+3j), (8+5j), (5+7j), (3+10j), 5j, (3+7j)]
bruteForceClosestPair: (1.0, ((2+7j), (3+7j)))
closestPair: (1.0, ((3+6j), (3+7j)))
[(4+6j), (5+4j), 2j, (7+0j), 4j, (6+5j), (7+5j), (8+9j), (10+5j), (8+2j)]
bruteForceClosestPair: (1.0, ((6+5j), (7+5j)))
closestPair: (1.0, ((6+5j), (7+5j)))
[(4+0j), (7+9j), 7j, 8j, (7+1j), (2+5j), (7+3j), (1+3j), (3+9j), (10+7j)]
bruteForceClosestPair: (1.0, (7j, 8j))
closestPair: (1.0, (7j, 8j))
[(9+1j), (5+0j), (8+3j), (6+9j), (3+7j), (4+6j), (7+6j), (8+6j), (10+8j), (2+5j)]
bruteForceClosestPair: (1.0, ((7+6j), (8+6j)))
closestPair: (1.0, ((7+6j), (8+6j)))
[(2+2j), (10+8j), (3+0j), (8+0j), (1+1j), (6+5j), 10j, (6+8j), (10+4j), (4+10j)]
bruteForceClosestPair: (1.4142135623730951, ((2+2j), (1+1j)))
closestPair: (1.4142135623730951, ((1+1j), (2+2j)))
Time for bruteForceClosestPair 4.59288093878
Time for closestPair 0.120782948671
Time for bruteForceClosestPair 5.00096353028
Time for closestPair 0.120047939054
Time for bruteForceClosestPair 4.86709875138
Time for closestPair 0.123363723602
Note how much slower the brute force algorithm is for only 2000
points.
(Oh, I guess I should add that the above is all my own work –
students might want to say the same).