Sunday, March 31, 2024

Finding a sub-list within a list, in Python

    a left-handed Jamaican woman finding a sublist in a list at a beach-front home-office

 

Existing?

 As part of a larger project, I thought I might need to search for a sub-list within a given list, and because I am lazy i did a quick google and did not like the answers I found.I started with the thought that the best algorithm for me would be to start searching from the index of the first item in the sublist and so on, but none of the googled answers used list.index.

I decided then to create my own 

My version

Well I want to use list.index. If the item is not in the list then it raises an error, so I'll need a try-except block too.

I look for successive first item from the sub-list in the list and if found, accumulate the index in the answer and move on to search for the next match.

It seemed easy to add flags to:

  1. Stop after finding a first index of the sub-list in the list.
  2. Allow overlapping matches  or not. [1,0,1] is found twice in [1,0,1,0,1] at indices 0 and 2, but only once if overlapping is not allowed
#!/bin/env python3
#%%
from typing import Any


"""
Find instance of a sub-list in a list
"""

def index_sublist(lst: list[Any],
                  sublst: list[Any],
                  only_first: bool=False,
                  non_overlapping=False,
                  ) -> list[int]:
    "Find instance of a (non-empty), sub-list in a list"
    if not sublst:
        raise ValueError("Empty sub-list")
    if not lst:
        return []
   
    first, ln = sublst[0], len(sublst)
   
    if len(lst) < ln:  # Reddit: check sizes
        return []
   
    ans, i = [], 0
    while True:
        try:
            i = lst.index(first, i)
        except ValueError:
            break
        if lst[i: i+ln] == sublst:
            ans.append(i)
            if only_first:  #Reddit: Only_first calc now fixed
                break
        i += ln if non_overlapping else 1
   
    return ans

#%%
def test():
    assert index_sublist([], [1], only_first=False) == []
    assert index_sublist([1], [1], only_first=False) == [0]
    assert index_sublist([1,0,1], [1], only_first=False) == [0, 2]
    assert index_sublist([2,1,0,1], [1], only_first=True) == [1]
    assert index_sublist([2,1,0,1], [1, 3], only_first=False) == []
   
    assert index_sublist([1,0,1,0,1], [1,0,1],
                         only_first=False,
                         non_overlapping=False) == [0, 2]
    assert index_sublist([1,0,1,0,1], [1,0,1],
                         only_first=False,
                         non_overlapping=True) == [0]

    # Reddit: Extra checks
    assert index_sublist([2,1,0,1,2,1,2], [1, 2],
                         only_first=True) == [3]
    assert index_sublist([2,1,0,1,2,1,2], [1, 2],
                         only_first=False) == [3,5]


#%%
if __name__ == '__main__':
    test()

# %%

STOP PRESS!

A Reddit commenter spotted a bug that I fixed and added a couple of extra tests for.

End.