Mainly Tech projects on Python and Electronic Design Automation.

Friday, June 03, 2022

Python: Yield the first item of consecutive series of equal items, from an iterable.

 I was reading the source to the more_itertools.unique_unseen function. The function takes an iterator that may have regions where successive items are the same, and yields the item if next item is not the same as it.

itertools.groupby

Since the problem involves grouping, my first thought is to think of ways to apply itertools.groupby; but that is used in the more_itertools function reproduced below:

def unique_justseen(iterable, key=None):
    """Yields elements in order, ignoring serial duplicates

    >>> list(unique_justseen('AAAABBBCCDAABBB'))
    ['A', 'B', 'C', 'D', 'A', 'B']
    >>> list(unique_justseen('ABBCcAD', str.lower))
    ['A', 'B', 'C', 'A', 'D']

    """
    return map(next, map(operator.itemgetter(1), groupby(iterable, key)))

It groups, then extracts the first of each group, and looks to be a good, tight implementation. 

I decided to find another way.

Successive items method

I thought a bit then decided:

  • The first item of the iterable is always returned.
  • Other items are returned if they are not equal to the item before it.

You need the item and the item next to it to determine if that item is yielded. I quiclky toyed with thoughts of teeing and dropping one item from a tee'd branch then comparing items from the zip of the tees; but discarded that. I thought that I could use the walrus operator in a comprehension to keep track of the previous item, (called this below), and doodled this code:

it = iter('AAAABBBCCDAABBB')
print(this := next(it), end=' ')
print([(this := item) for item in it if item != this])
# Outputs: A ['B', 'C', 'D', 'A', 'B']


It seemed to work, and I set about creating my function.

My Function first_of_series

Looking at the more_itertools.unique_justseen function above, you should note that it has a key argument. The key argument is a function used to transmute the items of the iterable prior to comparison, much like its use in the sorted function. I decided to add that functionality too.


My final function is as follows:

def first_of_series(iterable, key=None):
    """
    Yield the first item of consecutive series of equal items from an iterable.
    Items have the key function applied for the comparison, if given.

    >>> list(first_of_series('AAAABBBCCDAABBB'))
    ['A', 'B', 'C', 'D', 'A', 'B']
    >>> list(first_of_series('ABBCcAD'))
    ['A', 'B', 'C', 'c', 'A', 'D']
    >>> list(first_of_series('ABBCcAD', key=str.lower))
    ['A', 'B', 'C', 'A', 'D']
    >>> list(first_of_series('ABBcCAD', key=str.lower))
    ['A', 'B', 'c', 'A', 'D']

    """
    it = iter(iterable) # Change iterable into an iterator!
    yield (this := next(it))
    if key is None:
        yield from (item for item in it if this != (this := item))
    else:
        this = key(this)
        yield from (item for item in it if this != (this := key(item)))

It works like this:

  1. Form an iterator from any input iterable so you can call next() on it.
  2. Set this to the first item from the iterable and also yield it.
  3. If there is no key:
    1. yield from a comprehension yielding items from the iterable but only if the last i.e this is not equal to the current, item from the list. 
    2. The walrus assignment updates this.
  4. Else if there is a key function:
    1. this is now always the value of the key function applied to items from the iterable.
    2. The comparison in the comprehension is always of values of the key function applied to items from the iterable; although the plain items are yielded.

I have to rely on the left-to-right order of evaluation for the if condition in the comprehensions to get the updating of variable this correct; and it's more verbose than the groupby version.


I did have fun creating it though, (and it reinforces my rule of using groupby when grouping).

No comments:

Post a Comment

Followers

Subscribe Now: google

Add to Google Reader or Homepage

Go deh too!

whos.amung.us

Blog Archive