Sunday, November 17, 2024

There's the easy way...

 

Best seen on a larger than landscape phone

Someone blogged about a particular problem:

From: https://theweeklychallenge.org/blog/perl-weekly-challenge-294/#TASK1

Given an unsorted array of integers, `ints`
Write a script to return the length of the longest consecutive elements sequence.
Return -1 if none found. *The algorithm must run in O(n) time.*

The solution they blogged used a sort which meant it could not be O(n) in time, but the problem looked good so I gave it some thought.

Sets! sets are O(1) in Python and are good for looking things up.

What if when looking at the inputted numbers, one at a time, you also looked for other ints in the input that would extend the int you have to form a longer  range?  Keep tab of the longest range so far and if you remove ints from the pool as they form ranges, when the pool is empty, you should know the longest range.

I added the printout of the longest range too.

My code

def consec_seq(ints) -> tuple[int, int, int]:
    "Extract longest_seq_length, its_min, its_max"
    pool = set(ints)
    longest, longest_mn, longest_mx = 0, 1, 0
    while pool:
        this = start = pool.pop()
        ln = 1
        # check down
        while (this:=(this - 1)) in pool:
            ln += 1
            pool.remove(this)
        mn = this + 1
        # check up
        this = start
        while (this:=(this + 1)) in pool:
            ln += 1
            pool.remove(this)
        mx = this - 1
        # check longest
        if ln > longest:
            longest, longest_mn, longest_mx = ln, mn, mx

    return longest,longest_mn,longest_mx

def _test():
    for ints in[(),
            (69,),
            (-20, 78, 79, 1, 100),
            (10, 4, 20, 1, 3, 2),
            (0, 6, 1, 8, 5, 2, 4, 3, 0, 7),
            (10, 30, 20),
            (2,4,3,1,0, 10,12,11,8,9),  # two runs of five
            (10,12,11,8,9, 2,4,3,1,0),  # two runs of five - reversed
            (2,4,3,1,0,-1, 10,12,11,8,9),  # runs of 6 and 5
            (2,4,3,1,0, 10,12,11,8,9,7),   # runs of 5 and 6
            ]:
        print(f"Input {ints = }")
        longest, longest_mn, longest_mx = consec_seq(ints)

        if longest <2:
            print("  -1")
        else:
            print(f"  The/A longest sequence has {longest} elements {longest_mn}..{longest_mx}")


# %%
if __name__ == '__main__':
    _test()

Sample output

Input ints = ()
  -1
Input ints = (69,)
  -1
Input ints = (-20, 78, 79, 1, 100)
  The/A longest sequence has 2 elements 78..79
Input ints = (10, 4, 20, 1, 3, 2)
  The/A longest sequence has 4 elements 1..4
Input ints = (0, 6, 1, 8, 5, 2, 4, 3, 0, 7)
  The/A longest sequence has 9 elements 0..8
Input ints = (10, 30, 20)
  -1
Input ints = (2, 4, 3, 1, 0, 10, 12, 11, 8, 9)
  The/A longest sequence has 5 elements 0..4
Input ints = (10, 12, 11, 8, 9, 2, 4, 3, 1, 0)
  The/A longest sequence has 5 elements 0..4
Input ints = (2, 4, 3, 1, 0, -1, 10, 12, 11, 8, 9)
  The/A longest sequence has 6 elements -1..4
Input ints = (2, 4, 3, 1, 0, 10, 12, 11, 8, 9, 7)
  The/A longest sequence has 6 elements 7..12

Another Algorithm

What if, you kept and extended ranges untill you amassed all ranges then chose the longest? I need to keep the hash lookup. dict key lookup should also be O(1).  What to look up? Look up ints that would extend a range!

If you have an existing (integer) range, say 1..3 inclusive of end points then finding 0 would extend the range to 0..3 or finding one more than the range maximum, 4 would extend the original range to 1..4 

So if you have ranges then they could be extended by finding rangemin - 1 or rangemax +1. I call then extends

If you do find that the next int from the input ints is also an extends value then you need to find the range that it extends, (by lookup), so you can modify that range. - use a dict to map extends to their range and checking if an int is in the extends dict keys should also take O(1) time.

I took that sketch of an algorithm and started to code. It took two evenings to finally get something that worked and I had to work out several details that were trying. The main problem was what about coalescing ranges? if you have ranges 1..2 and 4..5 what happens when you see a 3? the resultant is the single range 1..5. It took particular test cases and extensive debugging to work out that the extends2range mapping should map to potentially more than one range and that you need to combine ranges if two of them are present for any extend value being hit.


So for 1..2 the extends being looked for are 0 and 3. For 4..5 the extends being looked for are 3, again, and 6. The extends2ranges data structure for just this should look like:

{0: [[1, 2]],
 3: [[1, 2], [4, 5]],
 6: [[4, 5]]}

 

The Code #2

from collections import defaultdict


def combine_ranges(min1, max1, min2, max2):
    "Combine two overlapping ranges return the new range as [min, max], and a set of limits unused in the result"
    assert (min1 <= max1 and min2 <= max2          # Well formed
            and ( min1 <= max2 and min2 <= max1 )) # and ranges touch or overlap
    range_limits = set([min1, max1, min2, max2])
    new_mnmx = [min(range_limits), max(range_limits)]
    unused_limits = range_limits - set(new_mnmx)

    return new_mnmx, unused_limits

def consec_seq2(ints) -> tuple[int, int, int]:
    "Extract longest_seq_length, its_min, its_max"
    if not ints:
        return -1, 1, -1
    seen = set()  # numbers seen so far
    extends2ranges = defaultdict(list)  # map extends to its ranges
    for this in ints:
        if this in seen:
            continue
        else:
            seen.add(this)

        if this not in extends2ranges:
            # Start new range
            mnmx = [this, this]    # Range of one int
            # add in the extend points
            extends2ranges[this + 1].append(mnmx)
            extends2ranges[this - 1].append(mnmx)
        else:
            # Extend an existing range
            ranges = extends2ranges[this]  # The range(s) that could be extended by this
            if len(ranges) == 2:
                # this joins the two ranges
                extend_and_join_ranges(extends2ranges, this, ranges)
            else:
                # extend one range, copied
                extend_and_join_ranges(extends2ranges, this, [ranges[0], ranges[0].copy()])

    all_ranges = sum(extends2ranges.values(), start=[])
    longest_mn, longest_mx = max(all_ranges, key=lambda mnmx: mnmx[1] - mnmx[0])

    return (longest_mx - longest_mn + 1), longest_mn, longest_mx

def extend_and_join_ranges(extends2ranges, this, ranges):
    mnmx, mnmx2 = ranges
    mnmx_orig, mnmx2_orig = mnmx.copy(), mnmx2.copy() # keep copy of originals
    mn, mx = mnmx
    mn2, mx2 = mnmx2
    if this == mn - 1:
        mnmx[0] = mn = this  # Extend lower limit of the range
    if this == mn2 - 1:
        mnmx2[0] = mn2 = this  # Extend lower limit of the range
    if this == mx + 1:
        mnmx[1] = mx = this  # Extend upper limit of the range
    if this == mx2 + 1:
        mnmx2[1] = mx2 = this  # Extend lower limit of the range
    new_mnmx, _unused_limits = combine_ranges(mn, mx, mn2, mx2)

    remove_merged_from_extends(extends2ranges, this, mnmx, mnmx2)
    add_combined_range_to_extends(extends2ranges, new_mnmx)


def add_combined_range_to_extends(extends2ranges, new_mnmx):
    "Add in the combined of two ranges's extends"
    new_mn, new_mx = new_mnmx
    for extend in (new_mn - 1, new_mx + 1):
        r = extends2ranges[extend]  # ranges at new limit extension
        if new_mnmx not in r:
            r.append(new_mnmx)

def remove_merged_from_extends(extends2ranges, this, mnmx, mnmx2):
    "Remove original ranges that were merged from extends"
    for lohi in (mnmx, mnmx2):
        lo, hi = lohi
        for extend in (lo - 1, hi + 1):
            if extend in extends2ranges:
                r = extends2ranges[extend]
                for r_old in (mnmx, mnmx2):
                    if r_old in r:
                        r.remove(r_old)
                if not r:
                    del extends2ranges[extend]
    # remove joining extend, this
    del extends2ranges[this]


def _test():
    for ints in[
            (),
            (69,),
            (-20, 78, 79, 1, 100),
            (4, 1, 3, 2),
            (10, 4, 20, 1, 3, 2),
            (0, 6, 1, 8, 5, 2, 4, 3, 0, 7),
            (10, 30, 20),
            (2,4,3,1,0, 10,12,11,8,9),  # two runs of five
            (10,12,11,8,9, 2,4,3,1,0),  # two runs of five - reversed
            (2,4,3,1,0,-1, 10,12,11,8,9),  # runs of 6 and 5
            (2,4,3,1,0, 10,12,11,8,9,7),   # runs of 5 and 6
            ]:
        print(f"Input {ints = }")
        longest, longest_mn, longest_mx = consec_seq2(ints)

        if longest <2:
            print("  -1")
        else:
            print(f"  The/A longest sequence has {longest} elements {longest_mn}..{longest_mx}")


# %%
if __name__ == '__main__':
    _test()


Its Output

 Input ints = ()
  -1
Input ints = (69,)
  -1
Input ints = (-20, 78, 79, 1, 100)
  The/A longest sequence has 2 elements 78..79
Input ints = (4, 1, 3, 2)
  The/A longest sequence has 4 elements 1..4
Input ints = (10, 4, 20, 1, 3, 2)
  The/A longest sequence has 4 elements 1..4
Input ints = (0, 6, 1, 8, 5, 2, 4, 3, 0, 7)
  The/A longest sequence has 9 elements 0..8
Input ints = (10, 30, 20)
  -1
Input ints = (2, 4, 3, 1, 0, 10, 12, 11, 8, 9)
  The/A longest sequence has 5 elements 0..4
Input ints = (10, 12, 11, 8, 9, 2, 4, 3, 1, 0)
  The/A longest sequence has 5 elements 8..12
Input ints = (2, 4, 3, 1, 0, -1, 10, 12, 11, 8, 9)
  The/A longest sequence has 6 elements -1..4
Input ints = (2, 4, 3, 1, 0, 10, 12, 11, 8, 9, 7)
  The/A longest sequence has 6 elements 7..12


This second algorithm gives correct results but is harder to develop and explain. It's a testament to my stubbornness as I thought there was a solution there, and debugging was me flexing my skills to keep them honed.


END.