task on Rosetta Code from its inception on the 6th of
May, but had problems understanding the pseudo-code given, and some
of the stuff I googled too. after a couple of days though I was more
familiar with attempts at O(nlog(n)) divide-n-conquer
algorithms having seen a few and happened to find this
explanation (.ppt file) where it seemed to click for me.
After problems with the other references I thought I might go
through and identify what I found problematic with them.
The Rosetta Code Pseudo-code:
closestPair of P(1), P(2), ... P(N)
if N ≤ 3 then
return closest points of P using brute-force algorithm
else
xP ← P ordered by the x coordinate, in ascending order
PL ← points of xP from 1 to ⌈N/2⌉
PR ← points of xP from ⌈N/2⌉+1 to N
(dL, pairL) ← closestPair of PL
(dR, pairR) ← closestPair of PR
(dmin, pairmin) ← (dR, pairR)
if dL < dR then
(dmin, pairmin) ← (dL, pairL)
endif
xM ← xP(⌈N/2⌉)x
S ← { p ∈ xP : |xM - px| < dmin }
yP ← S ordered by the y coordinate, in ascending order
nP ← number of points in yP
(closest, closestPair) ← (dmin, pairmin)
for i from 1 to nP - 1
k ← i + 1
while k ≤ nP and yP(k)y - yP(k)y < dmin
if |yP(k) - yP(i)| < closest then
(closest, closestPair) ← (|yP(k) - yP(i)|, {yP(k), yP(i)})
endif
k ← k + 1
endwhile
return closest, closestPair
endif
The author had noted that the above could have problems so I was
warned, but still started by trying to follow the above. I was
blindly following the pseudo-code until I got the the statement: xM
← xP(⌈N/2⌉)x which I couldn't understand.
This prompted my wider search for a better explanation. After
finding it (linked in the first paragraph), I modified my Python
program to follow the new explanation.
The RC references
There were several references quoted.
The Wikipedia article
Did not give pseudo-code
Closest
Pair (McGill)
Section 3.4 on improving their previous
O(nlog2n) into an O(nlogn)
algorithm I found to be very vague. Just how do you “returning
the points in each set in sorted order by y-coordinate “
without doing an nlogn sort leading again to an O(nlog2n)
solution?
Closest
Pair (UCBS)
This seems to concur with my preferred reference
but is given at a higher level than pseudo-code, and so would be
harder to implement from the description given.
Closest
pair (WUStL) by Dr. Jeremy Buhler
Their seems to be a
problem with having N, the number of points as an argument of the
function. What if N is odd? The pseudo-code seems to rely on other
material probably presented in a class and is at a higher level than
my preferred solution as it seems to be set as an exercise for
students.
My Python Solution
The only tests done is to generate random points and compare the
brute force algorithms solution with that of the divide and conquer.
'''
Compute nearest pair of points using two algorithms
First algorithm is 'brute force' comparison of every possible pair.
Second, 'divide and conquer', is based on:
www.cs.iupui.edu/~xkzou/teaching/CS580/Divide-and-conquer-closestPair.ppt
'''
from random import randint
infinity = float('inf')
# Note the use of complex numbers to represent 2D points making distance == abs(P1-P2)
def bruteForceClosestPair(point):
numPoints = len(point)
if numPoints < 2:
return infinity, (None, None)
return min( ((abs(point[i] - point[j]), (point[i], point[j]))
for i in range(numPoints-1)
for j in range(i+1,numPoints)),
key=lambda x: x[0])
def closestPair(point):
xP = sorted(point, key= lambda p: p.real)
yP = sorted(point, key= lambda p: p.imag)
return _closestPair(xP, yP)
def _closestPair(xP, yP):
numPoints = len(xP)
if numPoints <= 3:
return bruteForceClosestPair(xP)
Pl = xP[:numPoints/2]
Pr = xP[numPoints/2:]
Yl, Yr = [], []
xDivider = Pl[-1].real
for p in yP:
if p.real <= xDivider:
Yl.append(p)
else:
Yr.append(p)
dl, pairl = _closestPair(Pl, Yl)
dr, pairr = _closestPair(Pr, Yr)
dm, pairm = (dl, pairl) if dl < dr else (dr, pairr)
# Points within dm of xDivider sorted by Y coord
closeY = [p for p in yP if abs(p.real - xDivider) < dm]
numCloseY = len(closeY)
if numCloseY > 1:
# There is a proof that you only need compare a max of 7 other points
closestY = min( ((abs(closeY[i] - closeY[j]), (closeY[i], closeY[j]))
for i in range(numCloseY-1)
for j in range(i+1,min(i+8, numCloseY))),
key=lambda x: x[0])
return (dm, pairm) if dm <= closestY[0] else closestY
else:
return dm, pairm
def times():
''' Time the different functions
'''
import timeit
functions = [bruteForceClosestPair, closestPair]
for f in functions:
print 'Time for', f.__name__, timeit.Timer(
'%s(pointList)' % f.__name__,
'from closestpair import %s, pointList' % f.__name__).timeit(number=1)
pointList = [randint(0,1000)+1j*randint(0,1000) for i in range(2000)]
if __name__ == '__main__':
pointList = [(5+9j), (9+3j), (2+0j), (8+4j), (7+4j), (9+10j), (1+9j), (8+2j), 10j, (9+6j)]
print pointList
print ' bruteForceClosestPair:', bruteForceClosestPair(pointList)
print ' closestPair:', closestPair(pointList)
for i in range(10):
pointList = [randint(0,10)+1j*randint(0,10) for i in range(10)]
print '\n', pointList
print ' bruteForceClosestPair:', bruteForceClosestPair(pointList)
print ' closestPair:', closestPair(pointList)
print '\n'
times()
times()
times()
Sample output:
[(5+9j), (9+3j), (2+0j), (8+4j), (7+4j), (9+10j), (1+9j), (8+2j), 10j, (9+6j)]
bruteForceClosestPair: (1.0, ((8+4j), (7+4j)))
closestPair: (1.0, ((8+4j), (7+4j)))
[(6+8j), (4+1j), 7j, (4+10j), (10+6j), (1+9j), (10+10j), (5+0j), (3+1j), (6+4j)]
bruteForceClosestPair: (1.0, ((4+1j), (3+1j)))
closestPair: (1.0, ((3+1j), (4+1j)))
[(5+5j), (10+9j), 3j, (6+1j), (9+5j), 1j, (8+0j), 8j, (3+2j), (10+4j)]
bruteForceClosestPair: (1.4142135623730951, ((9+5j), (10+4j)))
closestPair: (1.4142135623730951, ((9+5j), (10+4j)))
[(9+2j), (1+2j), 5j, (10+0j), (6+6j), (6+4j), (5+1j), (2+5j), (8+6j), (3+2j)]
bruteForceClosestPair: (2.0, ((1+2j), (3+2j)))
closestPair: (2.0, ((6+6j), (6+4j)))
[(2+3j), (9+0j), (9+7j), 8j, (4+5j), (6+7j), (6+2j), (3+3j), (3+10j), (7+5j)]
bruteForceClosestPair: (1.0, ((2+3j), (3+3j)))
closestPair: (1.0, ((2+3j), (3+3j)))
[(7+2j), (2+10j), (3+7j), (8+6j), (2+1j), 3j, (8+5j), (6+9j), (1+0j), 0j]
bruteForceClosestPair: (1.0, ((8+6j), (8+5j)))
closestPair: (1.0, ((8+6j), (8+5j)))
[(10+7j), (6+3j), (2+7j), (3+6j), (9+3j), (8+5j), (5+7j), (3+10j), 5j, (3+7j)]
bruteForceClosestPair: (1.0, ((2+7j), (3+7j)))
closestPair: (1.0, ((3+6j), (3+7j)))
[(4+6j), (5+4j), 2j, (7+0j), 4j, (6+5j), (7+5j), (8+9j), (10+5j), (8+2j)]
bruteForceClosestPair: (1.0, ((6+5j), (7+5j)))
closestPair: (1.0, ((6+5j), (7+5j)))
[(4+0j), (7+9j), 7j, 8j, (7+1j), (2+5j), (7+3j), (1+3j), (3+9j), (10+7j)]
bruteForceClosestPair: (1.0, (7j, 8j))
closestPair: (1.0, (7j, 8j))
[(9+1j), (5+0j), (8+3j), (6+9j), (3+7j), (4+6j), (7+6j), (8+6j), (10+8j), (2+5j)]
bruteForceClosestPair: (1.0, ((7+6j), (8+6j)))
closestPair: (1.0, ((7+6j), (8+6j)))
[(2+2j), (10+8j), (3+0j), (8+0j), (1+1j), (6+5j), 10j, (6+8j), (10+4j), (4+10j)]
bruteForceClosestPair: (1.4142135623730951, ((2+2j), (1+1j)))
closestPair: (1.4142135623730951, ((1+1j), (2+2j)))
Time for bruteForceClosestPair 4.59288093878
Time for closestPair 0.120782948671
Time for bruteForceClosestPair 5.00096353028
Time for closestPair 0.120047939054
Time for bruteForceClosestPair 4.86709875138
Time for closestPair 0.123363723602
Note how much slower the brute force algorithm is for only 2000
points.
(Oh, I guess I should add that the above is all my own work –
students might want to say the same).
about «There is a proof that you only need compare a max of 7 other points» (I've found 6 indeed, but it is not essential). The "a max" (at most?) is very important, and not exploited in your code: you test all the 7 points you need to test (as I can understand the python code, maybe I'm wrong); indeed, most of the time, you *don't* need to test them all: when you reach the first beyond "dmin", you can stop! and this is the "why" of the second condition in the while loop in the pseudocode of WUStL; that code, most of the time (depending on the distribution of the points), tests less than 7 points; exactly it tests only the points that need to be tested: no one more. So it is slightly more efficient. (then there's a proof that the number of points that can live "inside" the dmin limit is 6or7, no more... but the important fact for the implementation is to test when the limit is reached)
ReplyDelete"xM ← xP(⌈N/2⌉)x"
ReplyDeleteI would translate this into the following python code:
xM = xP[math.ceil(N/2.0)].x