Following this question on reddit, I decided to create a script to index a sublist within a list similar to the way index works for Python strings.
Python Strings
Strings are a concatenation of characters, (although there is no separate character type in Python - a one element string suffices). You can find a substring of characters within a string using the str.index method and the substring can be more than one character. If the substring is not present then ValueError is raised.
In [21]:
help(str.index)
In [22]:
s = 'abcabcd'
s.index('b')
Out[22]:
In [23]:
s.index('ca')
Out[23]:
In [24]:
s.index('x')
Python Lists
Single, individual values may be searched for within a list using the list.index method, but a sequence, or sublist of items cannot be searched for:
In [25]:
help(list.index)
In [26]:
x = [1, 2, 3, 4, 5]
x.index(3)
Out[26]:
Try looking for value 3 followed by value 4
In [27]:
x.index([3, 4])
But note:
In [28]:
y = [1, 2, [3, 4], 5]
y.index([3, 4])
Out[28]:
Note: the sublist [3,4] exists as a top-level element of y and so in index-able, but the value 3 is not
In [29]:
y.index(3)
The index-ing function
In [30]:
def indexsublist(sub, lst):
"""
Find a sublist in a list
"""
if type(sub) == list:
sublen = len(sub)
first = sub[0] if sub else []
else:
raise ValueError
indx = -1
while True:
try:
indx = lst.index(first, indx + 1)
except ValueError:
break
if sub == lst[indx: indx + sublen]:
return indx
raise ValueError
Give the function a sub-list together with a list to find it in and for every occurrence of the first item of the sub-list in the list it checks to see if the next items of the list are also the same from that position of the first values match. The while loop tries all matches of the first value of the sub-list until they are all tried or a match is found.
Tests
In [31]:
for x in [[1, 2, 3, 4, [], [4], 3, 4, 5, [1, 2], 3, 5, [1, 2]], ]:
for sublist in [[5, [1, 2]], [4, 5, [1, 2]], [3, 5, [1, 2]],
[[1, 2]], [[1, 2], 3], [[1, 2], 4], [], 4, [4], [[4]]]:
try:
ans = indexsublist(sublist, x)
except ValueError:
ans = '{ValueError}'
print('%12s == indexsublist(%14s, %s)' % (ans, sublist, x))
A Find all indices function
There isn't a method on str to find all indices where a substring occurs in a string, but is is a part of the re.module and I decided to augment the above function to have that functionality for lists.
In [32]:
def findall(sub, lst, overlap=True):
"""
Find all positions of sublist in a list
"""
if type(sub) == list:
sublen = len(sub)
first = sub[0] if sub else []
else:
raise ValueError
indices, indx = [], -1
while True:
try:
indx = lst.index(first, indx + 1)
except ValueError:
break
if sub == lst[indx: indx + sublen]:
indices.append(indx)
if not overlap:
indx += sublen - 1
return indices
List
indices
in function findall
just accumulates all the indices as it tries along the lst. Argument overlap
chooses between overlapping and nonoverlapping searches. The difference is seen in the last two test outputs for searches for [4, 4]
.Tests
In [33]:
for x in [[3, 4, [], [4], 3, 4, 5, [1, 2], 3, 5, [1, 2], 4, 4, 4, 4], ]:
for sublist in [[5, [1, 2]], [4, 5, [1, 2]], [3, 5, [1, 2]],
[[1, 2]], [[1, 2], 3], [[1, 2], 4], [], 4, [[4]], [4, 4]]:
try:
ans = findall(sublist, x)
except ValueError:
ans = ''
print('%12s == findall(%14s, %s)' % (ans, sublist, x))
for x in [[3, 4, [], [4], 3, 4, 5, [1, 2], 3, 5, [1, 2], 4, 4, 4, 4], ]:
for sublist in [[4, 4], ]:
try:
ans = findall(sublist, x, overlap=False)
except ValueError:
ans = '{ValueError}'
print('%12s == findall(%14s, %s, overlap=False)' % (ans, sublist, x))
Thanks for the thorough answer!
ReplyDelete