Thursday, June 02, 2022

Python: My nwise compared to more_itertools.sliding_window

In my last post I evolved two n-wise functions from the itertools.pairwise code to create functions that will generate a sliding window of arbitrary width from an input iterator.

Somone commented on reddit about my blog stating:

"Is there a reason why not just do it as they do it in more-itertools?"

Then gave the source from the more itertools site that I extract below:

from collections import deque, Counter
from itertools import islice
from timeit import Timer
import pandas as pd

def sliding_window(iterable, n):
    """Return a sliding window of width *n* over *iterable*.

        >>> list(sliding_window(range(6), 4))
        [(0, 1, 2, 3), (1, 2, 3, 4), (2, 3, 4, 5)]

    If *iterable* has fewer than *n* items, then nothing is yielded:

        >>> list(sliding_window(range(3), 4))
        []

    For a variant with more features, see :func:`windowed`.
    """
    it = iter(iterable)
    window = deque(islice(it, n), maxlen=n)
    if len(window) == n:
        yield tuple(window)
    for x in it:
        window.append(x)
        yield tuple(window)

Visual Comparison

I'll show my nwise function for comparison:

def nwise(iterable, width):
    tees = tee(iterable, width)
    skewed = [([next(t) for _ in range(position)], t)[1]
              for position, t in enumerate(tees)]
    return zip(*skewed)

 

sliding_window is nicely commented as it is in a public library; nwise is written to be read as part of its original blog post.

It's up to the reader to judge which code is more understandable. more_itertools does not have a blog post to help understand the code. nwise isn't packaged as a module - its a coding example.

Timing Comparison

 Yes, well, the timing methodology is set to compare run times over a range of inputs iterable sizes and output window widths. A particular application might benefit from more specific timings.

I added the following timing code:

data = dict(items=[], window=[], func=[], time=[])
fastest = []
for n in range(1,7,2):
    items = range(10**n)
    print()
    reps = None
    for w in range(9):
        wdth  = 2**w
        if wdth <= 10**n:
            print()
            func_speeds = []
            for func in (nwise, nwise_squashed, sliding_window):
                data['items'].append(len(items))
                data['window'].append(wdth)
                data['func'].append(func.__name__)
                cmd = f"list({func.__name__:<15}({str(items):<12}, {wdth:10_}))"
                print(cmd, end=' # ')
                timer = Timer(cmd, globals=globals())
                if reps is None:
                    reps, secs = timer.autorange()
                    if reps <3:
                        reps = 3
                secs = min(timer.repeat(3, reps)) # min of 3 runs of timeit(number=reps)
                data['time'].append(secs)
                print("%1.2fs, %i reps" % (secs, reps))
                func_speeds.append((secs, func.__name__))

            fastest.append(min(func_speeds)[1])

print("\nFunction most fast for a particular set of arguments:")
print("   ", Counter(fastest).most_common())



Generated Timings:

list(nwise          (range(0, 10),          1)) # 0.20s, 100000 reps
list(nwise_squashed (range(0, 10),          1)) # 0.22s, 100000 reps
list(sliding_window (range(0, 10),          1)) # 0.30s, 100000 reps

list(nwise          (range(0, 10),          2)) # 0.27s, 100000 reps
list(nwise_squashed (range(0, 10),          2)) # 0.29s, 100000 reps
list(sliding_window (range(0, 10),          2)) # 0.29s, 100000 reps

list(nwise          (range(0, 10),          4)) # 0.40s, 100000 reps
list(nwise_squashed (range(0, 10),          4)) # 0.43s, 100000 reps
list(sliding_window (range(0, 10),          4)) # 0.25s, 100000 reps

list(nwise          (range(0, 10),          8)) # 0.76s, 100000 reps
list(nwise_squashed (range(0, 10),          8)) # 0.78s, 100000 reps
list(sliding_window (range(0, 10),          8)) # 0.20s, 100000 reps


list(nwise          (range(0, 1000),          1)) # 0.21s, 5000 reps
list(nwise_squashed (range(0, 1000),          1)) # 0.21s, 5000 reps
list(sliding_window (range(0, 1000),          1)) # 0.90s, 5000 reps

list(nwise          (range(0, 1000),          2)) # 0.23s, 5000 reps
list(nwise_squashed (range(0, 1000),          2)) # 0.23s, 5000 reps
list(sliding_window (range(0, 1000),          2)) # 0.94s, 5000 reps

list(nwise          (range(0, 1000),          4)) # 0.26s, 5000 reps
list(nwise_squashed (range(0, 1000),          4)) # 0.28s, 5000 reps
list(sliding_window (range(0, 1000),          4)) # 0.96s, 5000 reps

list(nwise          (range(0, 1000),          8)) # 0.35s, 5000 reps
list(nwise_squashed (range(0, 1000),          8)) # 0.36s, 5000 reps
list(sliding_window (range(0, 1000),          8)) # 1.03s, 5000 reps

list(nwise          (range(0, 1000),         16)) # 0.56s, 5000 reps
list(nwise_squashed (range(0, 1000),         16)) # 0.56s, 5000 reps
list(sliding_window (range(0, 1000),         16)) # 1.23s, 5000 reps

list(nwise          (range(0, 1000),         32)) # 1.19s, 5000 reps
list(nwise_squashed (range(0, 1000),         32)) # 1.20s, 5000 reps
list(sliding_window (range(0, 1000),         32)) # 1.57s, 5000 reps

list(nwise          (range(0, 1000),         64)) # 2.69s, 5000 reps
list(nwise_squashed (range(0, 1000),         64)) # 2.85s, 5000 reps
list(sliding_window (range(0, 1000),         64)) # 2.41s, 5000 reps

list(nwise          (range(0, 1000),        128)) # 5.74s, 5000 reps
list(nwise_squashed (range(0, 1000),        128)) # 5.80s, 5000 reps
list(sliding_window (range(0, 1000),        128)) # 3.35s, 5000 reps

list(nwise          (range(0, 1000),        256)) # 15.83s, 5000 reps
list(nwise_squashed (range(0, 1000),        256)) # 15.87s, 5000 reps
list(sliding_window (range(0, 1000),        256)) # 4.58s, 5000 reps


list(nwise          (range(0, 100000),          1)) # 0.30s, 50 reps
list(nwise_squashed (range(0, 100000),          1)) # 0.30s, 50 reps
list(sliding_window (range(0, 100000),          1)) # 1.04s, 50 reps

list(nwise          (range(0, 100000),          2)) # 0.34s, 50 reps
list(nwise_squashed (range(0, 100000),          2)) # 0.33s, 50 reps
list(sliding_window (range(0, 100000),          2)) # 1.05s, 50 reps

list(nwise          (range(0, 100000),          4)) # 0.36s, 50 reps
list(nwise_squashed (range(0, 100000),          4)) # 0.37s, 50 reps
list(sliding_window (range(0, 100000),          4)) # 1.09s, 50 reps

list(nwise          (range(0, 100000),          8)) # 0.44s, 50 reps
list(nwise_squashed (range(0, 100000),          8)) # 0.46s, 50 reps
list(sliding_window (range(0, 100000),          8)) # 1.22s, 50 reps

list(nwise          (range(0, 100000),         16)) # 0.64s, 50 reps
list(nwise_squashed (range(0, 100000),         16)) # 0.64s, 50 reps
list(sliding_window (range(0, 100000),         16)) # 1.39s, 50 reps

list(nwise          (range(0, 100000),         32)) # 1.13s, 50 reps
list(nwise_squashed (range(0, 100000),         32)) # 1.13s, 50 reps
list(sliding_window (range(0, 100000),         32)) # 1.83s, 50 reps

list(nwise          (range(0, 100000),         64)) # 2.67s, 50 reps
list(nwise_squashed (range(0, 100000),         64)) # 2.64s, 50 reps
list(sliding_window (range(0, 100000),         64)) # 3.26s, 50 reps

list(nwise          (range(0, 100000),        128)) # 4.47s, 50 reps
list(nwise_squashed (range(0, 100000),        128)) # 4.38s, 50 reps
list(sliding_window (range(0, 100000),        128)) # 4.88s, 50 reps

list(nwise          (range(0, 100000),        256)) # 8.70s, 50 reps
list(nwise_squashed (range(0, 100000),        256)) # 8.57s, 50 reps
list(sliding_window (range(0, 100000),        256)) # 8.55s, 50 reps
 
Function most fast for a particular set of arguments:
    [('nwise', 11), ('sliding_window', 6), ('nwise_squashed', 5)]

  • sliding_window is only fastest 6 out of the 22 runs of the groups of three functions running with the same arguments. 
  • Either the nwise, or nwise_squashed algorithms are fastest most often, and with similar times between these two.


END.






No comments:

Post a Comment