Go deh!: That grouping by zip/iter trick explained

Wednesday, December 05, 2012

That grouping by zip/iter trick explained

In Python 2.7 you can get the following neat chunking of items in a list through application of zip and iter:

Python 2.7.3 (default, Apr 10 2012, 23:24:47) [MSC v.1500 64 bit (AMD64)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> items, chunk = [1,2,3,4,5,6,7,8,9], 3
>>> zip(*[iter(items)]*chunk)
[(1, 2, 3), (4, 5, 6), (7, 8, 9)]
>>>

It is not as neat in Python 3.X as you have to wrap the expression in list(..) to see the final list.

I thought I had to think hard about what was going on here so here is my annottated Python 3.x session that explains what is going on:

Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:57:17) [MSC v.1600 64 bit (AMD64)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> from pprint import pprint as pp
>>> items, chunk = [1,2,3,4,5,6,7,8,9], 3
>>> items
[1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> chunk
3
>>> # A quick recap of what an iterator does 
>>> i = iter(items)
>>> i.__next__()
1
>>> i.__next__()
2
>>> i.__next__()
3
>>> i.__next__()
4
>>> i.__next__()
5
>>> i.__next__()
6
>>> i.__next__()
7
>>> i.__next__()
8
>>> i.__next__()
9
>>> i.__next__()
Traceback (most recent call last):
  File "", line 1, in 
    i.__next__()
StopIteration
>>>
>>> # Three copies of the same iterator
>>> threei = [iter(items)] * chunk
>>> pp(threei)
[object at 0x0000000003259D30>,
 object at 0x0000000003259D30>,
 object at 0x0000000003259D30>]
>>> # (notice that the addresses above are the same)
>>>
>>> # Lets break-out the three copies
>>> i, j, k = threei
>>> i
object at 0x0000000003259D30>
>>> j
object at 0x0000000003259D30>
>>> k
object at 0x0000000003259D30>
>>> # i,j and k are the same iterator
>>> i.__next__()
1
>>> j.__next__()
2
>>> k.__next__()
3
>>> # ...
>>>
>>> # Lets try that again with zip:
>>> i, j, k = threei = [iter(items)] * chunk
>>> list(zip(i, j, k))
[(1, 2, 3), (4, 5, 6), (7, 8, 9)]
>>> # zip pulls from i, j, k once to form its first tuple of the list
>>> # then pulls from i, j, k again to form its second tuple of the list
>>> # and so on ...
>>> # Because i,j, and k are names for the same iterator we get the
>>> # given grouping action as output.
>>>
>>> # We could use *threei instead of i,j,k as the arguments to zip:
>>> i, j, k = threei = [iter(items)] * chunk
>>> list(zip(*threei))
[(1, 2, 3), (4, 5, 6), (7, 8, 9)]
>>>
>>> # Or we could use the expression generating threei:
>>> i, j, k = threei = [iter(items)] * chunk
>>> list(zip(*[iter(items)] * chunk))
[(1, 2, 3), (4, 5, 6), (7, 8, 9)]
>>> # But note that we are relying on the unary '*' operator
>>> # binding less strongly than binary '*'
>>>

For me it remains a curiosity as I doubt I will have a use for its functionality. (List-size must be a multiple of chunk-size; its functionality is obscure).

12 comments:

UnknownWed Dec 05, 11:15:00 am
You could do the same thing with the numpy.reshape function:

items = [1, 2, 3, 4, 5, 6, 7, 8, 9]
chunk = 3
numpy.reshape(items, (len(items)/chunk, chunk))

ReplyDelete
Replies
UnknownWed Dec 05, 12:11:00 pm
As leonardo implies by his chunk size of 3 there are many datasets where this is quite useful. Take 3D coordinates for example:

coords = [1,2,3,4,5,6,7,8,9]
for x,y,z in zip(*[iter(coords)]*3):
print x,y,z

1 2 3
4 5 6
7 8 9
ReplyDelete
Replies
AnonymousWed Dec 05, 06:34:00 pm
You can also make make this a generator (consuming an iterator instead of a list) by using the itertools module:

from itertools import izip_longest

def grouper(n, iterable, fillvalue=None):
"Collect data into fixed-length chunks or blocks"
# grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx
args = [iter(iterable)] * n
return izip_longest(fillvalue=fillvalue, *args)

This also handles the case where the iterable isn't a multiple of n. There are a number of recipes in the itertools module that will really stretch your mind. :)
ReplyDelete
Replies
Paddy3118Wed Dec 05, 07:41:00 pm
Hi Brian, Naveen, Leonardo; I still think that this is no more than a trick, or at best it may become a design pattern but if it is of use then it should be replaced by an itertools function that has the functionality without having to worry about the awful code within it.It is mech worse than, say, when we had to use dict keys as sets; or the explicit decorate-sort-undecorate pattern.
ReplyDelete
Replies
AnonymousThu Dec 06, 05:55:00 pm
I enjoyed reading about this way of using iterators with zip. Thanks!

I thought it would be worth mentioning that for practical use, there are other worthy solutions to the chunking problem. There are two downsides to the iter/zip approach. First, it's hard to understand, and second, if the number of items is not a multiple of the chunk size, not all items appear in the output. (The latter will be fine in some situations, and undesirable in others.)

This solution is significantly easier to understand, and outputs every input item. When the items are not multiple of the chunk size, the output groups are as similar in size as possible:

>>> seq = [1,2,3,4,5,6,7,8,9,10]
>>> [seq[i::num] for i in range(num)]
[[1, 5, 9], [2, 6, 10], [3, 7], [4, 8]]

There's a major discussion of this on my blog at http://www.garyrobinson.net/2008/04/splitting-a-pyt.html. I'm going to update that post to also mention the zip/iter solution.

ReplyDelete
Replies
UnknownThu Dec 06, 10:14:00 pm
Lol that's an absolutely retarded way of doing it.

>>> items, chunk = [1,2,3,4,5,6,7,8,9], 3
>>> items, chunk = [1,2,3,4,5,6,7,8,9,10], 3
>>> zip(*[iter(items)]*chunk)
[(1, 2, 3), (4, 5, 6), (7, 8, 9)]

You should be using this instead;

>>> split_list = lambda l, n: [l[x: x+n] for x in xrange(0, len(l), n)]
>>> split_list(items, chunk)
[[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]]
>>>
ReplyDelete
Replies

Add comment

Go deh!

Wednesday, December 05, 2012

That grouping by zip/iter trick explained

12 comments:

About Me

Followers

Subscribe Now: google

Go deh too!

whos.amung.us

Blog Archive