(Best viewed on more than a 7" screen)
I helped someone on stackoverflow and quite enjoyed the task so thought I'd blog about it.
Task description
I'll take liberties with the original description but hopefully without changing the challenge.
Imagine you have a groove in a slight incline with a stop at one end. A number of coloured balls are arranged along the groove, and the stop stops them falling off the left-side lowest end. The balls can be any of ten different colours
Given an initial arrangement of balls, scanning from the left, if three or more balls of the
same colour are adjacent to each other then
all those balls (three or more of them), are discarded. Discarding balls
means that theballs still in the groove move down to fill the gap, and
the groove can be searched again for three or more adjacent balls of the
same colour.
This "filling the gap" might itself bring together balls of the same
colour that used to be separated forming a larger number of adjacent
balls of the same colour that should also be considered for discarding
under the same rule mentioned above.
Colours of balls are to be represented by the integers 0..9.
The initial arrangement of balls is represented as a list of colour-numbers
The challenge is to create a function that given a list of an initial arrangement of balls, returns the total number of balls that can be discarded.
Helper graphics
I have written a module that can display the coloured balls on the groove as a picture so lets import it
from ball_graphic import ball_graphic as bg
Lets show all the different ball colours and there mapping to numbers
bg(range(10), title="ALL BALL COLOURS", show_num=True, show_colour_name=True, show_list=True)
The above is the final configuration. A total of seven balls were discarded so any function should return 7
Example1
Given the initial ball configuration of [0, 0, 3, 3, 3, 0, 9]
The above is the final configuration. A total of six balls were discarded so any function should return 6
Example2
Given the initial ball configuration of [5, 5, 5, 5, 5, 5, 5]
The above is the final configuration of no balls. A total of seven balls were discarded so any function should return 7
Note that all the seven adjacent green balls are discarded at once, to leave none!
Example3
Given the initial ball configuration of [2, 4, 5, 7, 8, 8]
The above is the final configuration. A total of zero balls were discarded so any function should return 0
Note that there are no runs of three or more adjacent balls of the same colour/colour-number.
Do it like it says
This first solution tries to be as straight-forward as possible.
The function uses two pointers lo and hi scanning runs of adjacent
balls until the hi pointer hits a different ball colour number. The
length of the last run of adjacent balls of the same colour number is
then checked.
If greater than or equal to three, that run of adjacent balls is deleted
from the list and the scanning re-started from the beginning.
def discard_simple(a):
"Simple interpretation of the spec."
a = a.copy() # Lets not alter the argument
d = lo = hi = 0 # delete count, lo & hi pointers
lo_c = hi_c = a[0] # Colours at pointer positions
while hi +1 < len(a):
hi += 1; hi_c = a[hi] # Advance the hi pointer
if lo_c != hi_c:
if hi - lo > 2:
d += hi - lo # more discarded
del a[lo: hi]
lo, hi, lo_c, hi_c = 0, 0, a[0], a[0] # Re-scan from start
else:
lo, lo_c = hi, hi_c
return d
ex0 = [0, 0, 3, 3, 3, 0, 0]
discard_simple(ex0)
3
Whoa this should give 7 !
A bit of debugging shows that the runs of adjacent balls are only found when the next ball to the right is different to the ball before it, causing the run of previous adjacent balls to be calculated.
At the end of the list of balls the above function has no trigger to
calculate and possibly discard the last run of adjacent balls.
Although a check could be added after the while loop, a better way is
to add a sentinel "ball number" at the end of the list that is not of a
real ball colour-number so will always trigger the checking of that
last run of true balls in the list.
(It is better because the the if hi - lo > 2: ...
block logic does not have to be repeated after the while block.
def discard_simple(a):
"Simple interpretation of the spec."
a = a.copy() + [-1] # ADD TERMINATING SENTINEL
d = lo = hi = 0 # delete count, lo & hi pointers
lo_c = hi_c = a[0] # Colours at pointer positions
while hi +1 < len(a):
hi += 1; hi_c = a[hi]
if lo_c != hi_c:
if hi - lo > 2:
d += hi - lo
del a[lo: hi]
lo, hi, lo_c, hi_c = 0, 0, a[0], a[0]
else:
lo, lo_c = hi, hi_c
return d
discard_simple(ex0)
7
Looks good. let's quickly check the other examples
ex1 = [0, 0, 3, 3, 3, 0, 9]
discard_simple(ex1)
6
ex2 = [5, 5, 5, 5, 5, 5, 5]
discard_simple(ex2)
7
ex3 = [2, 4, 5, 7, 8, 8]
discard_simple(ex3)
0
All test exercises pass!
A Simple functionality test
Lets crate a function that given a function, exercises and answers; will test the function against all the exercises
import sys
exercises = [
# (Balls, answer),
([0, 0, 3, 3, 3, 0, 0], 7), # ex0
([0, 0, 3, 3, 3, 0, 9], 6), # ex1
([5, 5, 5, 5, 5, 5, 5], 7), # ex2
([2, 4, 5, 7, 8, 8], 0), # ex3
]
def test(f, in_outs=exercises):
print(f"START Testing {f.__name__}: '{f.__doc__}'")
for arg, ans in in_outs:
arg_txt = arg if len(arg) <= 8 else f"[<{len(arg)} terms>]"
try:
out = f(arg.copy())
except:
out = f'<Exception thrown! {sys.exc_info()[0]}>'
if out != ans:
print(f" {f.__name__}({arg_txt}) != {ans} # instead gives: {out}")
else:
print(f" {f.__name__}({arg_txt}) == {out}")
print(f"STOP Testing {f.__name__}")
test(discard_simple, exercises)
START Testing discard_simple: 'Simple interpretation of the spec.' discard_simple([0, 0, 3, 3, 3, 0, 0]) == 7 discard_simple([0, 0, 3, 3, 3, 0, 9]) == 6 discard_simple([5, 5, 5, 5, 5, 5, 5]) == 7 discard_simple([2, 4, 5, 7, 8, 8]) == 0 STOP Testing discard_simple
A better function: RLE
There are two glaring problems that are addressed by this next solution.
- I can remove the need of stepping through each ball by first run-length-encoding the list of balls to create a list of
[[ball, adjacency-count], ...]
then traversing that. - In RLE format, you only need to examine the previous
[ball, adjacency-count]
pair to see if a discard leads to a merging of now adjacent balls.
This solution should be faster.
from itertools import groupby
def discard_RLE(a):
"using run-length encoding"
a = a.copy() + [-1] # Add terminator
a = [[key, len(list(group))] for key, group in groupby(a)] # RLE
d = pt = 0 # delete count, pointer
while pt +1 < len(a):
i0, n0 = a[pt] # item, count at pt
if n0 > 2: # trigger initial discard
d += n0
del a[pt]
if pt > 0: # Check if previous run of is of same colour
if a[pt - 1][0] == a[pt][0]: # consolidate
a[pt - 1][1] += a[pt][1]
del a[pt]
pt -= 1 # Go back one run of balls.
continue
else:
pt += 1
return d
test(discard_RLE, exercises)
START Testing discard_RLE: 'using run-length encoding' discard_RLE([0, 0, 3, 3, 3, 0, 0]) == 7 discard_RLE([0, 0, 3, 3, 3, 0, 9]) == 6 discard_RLE([5, 5, 5, 5, 5, 5, 5]) == 7 discard_RLE([2, 4, 5, 7, 8, 8]) == 0 STOP Testing discard_RLE
It works!
My best function: No array deletions
Looking at the RLE example above, del
is used to delete multiple parts of the a
to discard runs of balls. This will be slow on large inputs.
The discard_LL
function below uses a doubly linked list datastructure instead of a list (but created within a list), to make deletions as simple as de-linking.
This solution should, (hopefully), be the fastest.
def discard_LL(a):
"using a linked-list and run-length encoding"
a = a.copy() + [-1] # Add terminator
# a is list of [[item, reps, prev_pointer, next_pointer], ...]
a = [[key, len(list(group)), i - 1, i + 1]
for i, (key, group) in enumerate(groupby(a))] # linked-list RLE
a[0][-2] = None # No previous item at start
a[-1][-1] = None # No next item at end
d = pt = 0 # delete count, pointer
while pt is not None:
i0, n0, pre_pt, nxt_pt = a[pt] # item, count, next at pt
if n0 > 2:
d += n0
deleted_pt = pt
if pre_pt is not None:
a[pre_pt][-1] = pt = nxt_pt # del a[pt] & pt to next
if a[pre_pt][0] == a[pt][0]: # consolidate same items in pre
a[pre_pt][1] += a[pt][1]
a[pre_pt][-1] = a[pt][-1] # del a[pt]
pt = pre_pt # ... & pt to previous
else:
pt = nxt_pt
else:
pt = nxt_pt
return d
test(discard_LL, exercises)
START Testing discard_LL: 'using a linked-list and run-length encoding' discard_LL([0, 0, 3, 3, 3, 0, 0]) == 7 discard_LL([0, 0, 3, 3, 3, 0, 9]) == 6 discard_LL([5, 5, 5, 5, 5, 5, 5]) == 7 discard_LL([2, 4, 5, 7, 8, 8]) == 0 STOP Testing discard_LL
BIG DATA AND TIMINGS
I need to generate rows of thousands of balls with differing discard characteristics
Here are some generators:
def gen_all_same(n, ball=0):
"n of the same ball"
return [ball] * n
def gen_x_of_each(n, balls=[2, 3], x=3):
"x of each ball ,in turn, until n balls returned"
each_by_x = sum(([b] * x for b in balls), start=[])
return (each_by_x * (n // len(each_by_x) + 1))[:n]
gen_all_same(15, 7)
[7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7]
out = gen_x_of_each(23, [2, 5], 3)
(len(out), out)
(23, [2, 2, 2, 5, 5, 5, 2, 2, 2, 5, 5, 5, 2, 2, 2, 5, 5, 5, 2, 2, 2, 5, 5])
out = gen_x_of_each(23, [2, 3], 3)
(len(out), out)
(23, [2, 2, 2, 3, 3, 3, 2, 2, 2, 3, 3, 3, 2, 2, 2, 3, 3, 3, 2, 2, 2, 3, 3])
def gen_all_downto_zero(n):
"0,1,... generator that then reduces to zero balls for n a mult of 3"
n //= 3
if n == 0:
return []
x = ([0, 1] * ((n+1)//2))[:n] # 0,1 ... of length n
e = x[-1] # end item
ne = 0 if e == 1 else 1 # not end item
r = ([e, e, ne, ne] * n)[:2*n]
return x + r
out = gen_all_downto_zero(24)
(len(out), out)
(24, [0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0])
generators = [gen_all_same, gen_x_of_each, gen_all_downto_zero]
solvers = [discard_simple, discard_RLE, discard_LL]
import time
import pandas as pd
import gc
def time_a_generator(n_list, gen=gen_all_same, solvers=solvers, plot=True):
t = {key: [] for key in (f.__name__ for f in solvers)}
axis = dict(title=f"GENERATOR {gen.__name__}(N) TIMINGS", x="N", y="seconds")
index = []
for n in n_list:
arg = gen(n)
index.append(n)
arg_txt = arg if len(arg) <= 8 else f"[<{len(arg)} terms>]"
for f in solvers:
argc = arg.copy()
try:
gc.disable()
t0 = time.time()
out = f(arg.copy())
t1 = time.time()
gc.enable()
t[f.__name__].append(t1 - t0)
except:
out = f'<Exception thrown! {sys.exc_info()[0]}>'
t[f.__name__].append(None)
gc.enable()
df = pd.DataFrame(t, index=index)
if plot:
plot = df.plot.line(title=axis['title'])
plot.set_xlabel(axis["x"])
plot.set_ylabel(axis["y"])
return df #, axis
Timings for: All balls the same
The generator gen_all_same
is equivalent to thousands of the same balls in a row. They can all be
discarded in one fell swoop so timings should show the difference
between stepping through each individual ball in an explicit Python loop
in discard_simple versus the groupby
used in both discard_RLE and discard_LL. (there is trivial list/linked list traversal in discard_RLE/LL).
N, the number of balls goes up to approximately a million balls. (If N is a multiple of three then all the ball generator algorithms generate exactly N balls)
t_gen = time_a_generator(range(99_999, 999_999, 99_999), gen=gen_all_same);
discard_simple is seen to be much slower than discard_RLE/LL
Timings for: Serial collapsing of three adjacent balls
The generator gen_x_of_each
alternates between three of one ball and three of another... Each group of three can be discarded as they are found.
- discard_simple will be hampered by the slower for-loop traversal of each ball again; and needing to del the balls found from the array of balls.
- discard_RLE has the faster traversal of the RLE encoding of balls; but still needs to del the balls found from the array of balls.
- discard_LL has the faster traversal of the RLE encoding of balls; but this time unlinks the discarded balls found from the doubly-linked list of balls. (Is it fastest)?
t_gen = time_a_generator(range(3_333, 33_333, 3_333), gen=gen_x_of_each);
For this type of input discard_simple is always slower than discard_LL and discard_RLE. < br> discard_LL is close, to then becomes faster than discard_RLE.
Timings for: Delayed collapsing of alternating balls
The generator gen_all_downto_zero
alternates between one of one ball and one of another... then appends
two of the last ball to collapse it then two of the alternate ball to
collapse the next to last of the alternating, then two... all the way to
appending the last two balls of the fist colour, finally collapsing the
first ball to leave nothing.
Here is a small sample of the kind of rows of balls it generates:
bg(gen_all_downto_zero(12), title="gen_all_downto_zero(12)", show_num=True, show_list=True)
Notice that the first third, (4), of the balls alternate and then there is appended alternating two balls of each colour, leading to one section of three yellow balls that when discarded put three of the alternate balls together which are discarded... and so on down to discarding all the balls.
Discarding starts from a third of the way in and progresses to the left.
- discard_simple will be hampered by going back to the searching from the left after each discarding.
- discard_RLE is coded to only go back to the previous block of adjacent RLE encoded balls; but still needs to del the balls found from the array of balls.
- discard_LL is also coded to only go back to the previous block of adjacent RLE encoded balls, (why its a doubly linked list with a back pointer); but this time unlinks the discarded balls found from the doubly-linked list of balls.
t_gen = time_a_generator(range(999, 9_999, 999), gen=gen_all_downto_zero);
For this type of input discard_simple is way, waay slower than discard_LL and discard_RLE for this range of N. Let's remove discard_simple and run discard_RLE & LL for larger N.
t_gen = time_a_generator(range(33_333, 333_333, 33_333), gen=gen_all_downto_zero,
solvers=[discard_RLE, discard_LL]);
For this type of input discard_RLE is itself many times slower than discard_LL for this range of N. The doubly linked list deletion stamps its superiority.
Conclusion
The "Balls in a row" challenge has interesting nuances, both in its method of solution and in the creation of sequences of balls to exercise those solutions. My discard_LL solution that uses both a doubly linked list as well as run-length encoding is generally the fastest.
Do you know of a fundamental change to my best algorithm that improves on my timings?
> The challenge is to create a function that given a list of an initial arrangement of balls, returns the total number of balls that can be discarded.
ReplyDeleteIf this is interpreted to mean "the most number of balls that can be discarded", then all of the solutions presented have a bug, because they're all greedy algorithms.
E.g. consider the sequence
[0, 1, 1, 1, 0, 0, 0, 1, 1]
If it starts from the left, it can discard 3x1 and then 4x0 for a total of seven. But if it starts from the right it can discard 3x0 and then 5x1 for a total of eight. An algorithm to find the *maximum* number of balls to discard must consider different discard orders. (I'm not sure how difficult that is, but it smells NP-complete.)
Updated the description to explicitly state "from the left" - thanks.
ReplyDelete