unzip is straightforward to calculate because zip(*...) undoes zip:

>>> t1 = (0,1,2,3)

>>> t2 = (7,6,5,4)

>>> [t1,t2] == zip(*zip(t1,t2))

True
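The session above is Python 2, where zip returns a list. As a minimal sketch, assuming Python 3 (where zip returns a lazy iterator), the same check needs an explicit list() around each call before comparing:

```python
# Python 3 sketch: zip() is lazy, so materialise it with list() before comparing.
t1 = (0, 1, 2, 3)
t2 = (7, 6, 5, 4)

zipped = list(zip(t1, t2))       # pairs: (0, 7), (1, 6), (2, 5), (3, 4)
unzipped = list(zip(*zipped))    # the original two tuples again

print(unzipped == [t1, t2])
```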

### Explanation

In answer to a commenter, I have written a (large) program to explain the above.

unzip_explained.py:

'''

Explanation of unzip expression zip(*zip(A,B))

References:

1: Unpacking argument lists

http://www.network-theory.co.uk/docs/pytut/UnpackingArgumentLists.html

2: Zip

>>> help(zip)

Help on built-in function zip in module __builtin__:

zip(...)

zip(seq1 [, seq2 [...]]) -> [(seq1[0], seq2[0] ...), (...)]

Return a list of tuples, where each tuple contains the i-th element

from each of the argument sequences. The returned list is truncated

in length to the length of the shortest argument sequence.

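'''

def show_args(*positional, **kwargs):
    "Straight-forward function to show its arguments"
    n = 0
    for p in positional:
        # print format inferred from the program output below
        print "positional argument", n, "is", p
        n += 1
    for k, v in sorted(kwargs.items()):
        # assumed format; no keyword arguments appear in the output below
        print "keyword argument", k, "is", v

A = tuple( "A%i" % n for n in range(3) )

B = tuple( "B%i" % n for n in range(3) )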

And here is the program output:

Tuple A is:

('A0', 'A1', 'A2')

Tuple B is:

('B0', 'B1', 'B2')

Lets go slowly through the expression: [A,B] == zip(*zip(A,B))

List [A,B] is:

[('A0', 'A1', 'A2'), ('B0', 'B1', 'B2')]

zip(A,B) has arguments:

positional argument 0 is ('A0', 'A1', 'A2')

positional argument 1 is ('B0', 'B1', 'B2')

zip(A,B) returns:

[('A0', 'B0'), ('A1', 'B1'), ('A2', 'B2')]

The leftmost zip in zip(*zip(A,B)), due

to the 'list unpacking' of the previous

value has arguments of:

positional argument 0 is ('A0', 'B0')

positional argument 1 is ('A1', 'B1')

positional argument 2 is ('A2', 'B2')

The outer zip therefore returns:

[('A0', 'A1', 'A2'), ('B0', 'B1', 'B2')]

Which is the same as [A,B]
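The step the walkthrough turns on is the `*` in the call, which spreads a list into separate positional arguments. A small sketch, assuming Python 3; `count_args` is a hypothetical helper that is not part of the original program:

```python
def count_args(*positional):
    """Return how many positional arguments were received (illustrative helper)."""
    return len(positional)

pairs = [('A0', 'B0'), ('A1', 'B1'), ('A2', 'B2')]

without_star = count_args(pairs)    # the whole list arrives as one argument
with_star = count_args(*pairs)      # each pair arrives as its own argument
print(without_star, with_star)

# zip(*pairs) is therefore the same call as zip(pairs[0], pairs[1], pairs[2]).
same = list(zip(*pairs)) == list(zip(pairs[0], pairs[1], pairs[2]))
print(same)
```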

What happens in the third line? Can you explain that in more detail?

Cool!

@repei

>>> zip(*zip(A,B)) == [A, B]

True

>>> zip(*zip(*zip(A,B))) == zip(A, B)

True

zip is unzip is zip. This is a bit weird to think about.

Lisp hacks seem to consider this obvious, but us mere mortals sometimes need a little help. Thanks for a fantastic post!

I considered having this as one of the problems for the Python Lab at PyCon, but decided it was too much of a 'trick' problem.

This property of the zip function in Python is only true if you zip tuples and not lists.

>>> A = [1,2,3]

>>> B = [4,5,6]

>>> zip(*zip(A,B)) == [A,B]

False

When I give zip a tuple of lists, I want a tuple of lists back from unzip:

>>> def unzip(a):
...     return tuple(map(list,zip(*a)))

>>> unzip(zip(A,B)) == (A,B)

True

>>> zip(*unzip(zip(A,B))) == zip(A,B)

True
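The same point can be sketched for Python 3, where zip is lazy. This is an illustration of the commenter's idea under that assumption, not a tested port of their session:

```python
def unzip(pairs):
    """Transpose an iterable of tuples into a tuple of lists."""
    return tuple(map(list, zip(*pairs)))

A = [1, 2, 3]
B = [4, 5, 6]

# Plain zip(*...) yields tuples, so the result does not compare equal to the lists.
plain = list(zip(*zip(A, B)))

# Converting each column back to a list restores the original containers.
restored = unzip(zip(A, B))

print(plain == [A, B], restored == (A, B))
```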

Now that you mention it, it's obvious! Even more so if you think of zip as (a truncating) matrix transpose.
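The "truncating" part matters when the rows have different lengths. A short sketch, assuming Python 3; the `rows` data is made up for illustration, and `itertools.zip_longest` is shown only as the non-truncating contrast:

```python
from itertools import zip_longest

rows = [(1, 2, 3), (4, 5), (6, 7, 8)]

truncated = list(zip(*rows))                        # stops at the shortest row
padded = list(zip_longest(*rows, fillvalue=None))   # pads the short row instead

print(truncated)   # [(1, 4, 6), (2, 5, 7)]
print(padded)      # [(1, 4, 6), (2, 5, 7), (3, None, 8)]

# Because of the truncation, transposing twice does not give back ragged input.
print(list(zip(*truncated)) == rows)   # False
```

So zip is its own inverse only when all the zipped sequences have the same length.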

I just didn't know about unpacking function arguments by '*'. It all became clear for me now. Thank you.

That is really funny... how could anyone use Python's zip() function and NOT realize that it is its own inverse???

Good to see another Python/EDA hacker out there. I'm in the same boat... Pythonista and a bit of a hardware hacker. I've got some random utilities up at http://tonquil.homeip.net/~dlenski

LOL. That's awesome. I've never thought about it. <3 self-composable functions in python library.

Unfortunately there is quite a low limit on how many values you can unpack, so this method is only applicable to small lists.

Uncloak and state this limit you found.

nice

I wrote a simple unzip in Python using list comprehensions.

def unzip(lst):
    if lst == []:
        return ()
    else:
        return ([x[0] for x in lst], [x[1] for x in lst])

Slightly inefficient because of the two loops over the same list, but I like the elegance.

Hi anonymous, That may be good for two lists, but what about three or more?

>>> t1 = (0,1,2,3)

>>> t2 = (7,6,5,4)

>>> t3 = (2,4,6,8)

>>> t4 = (7,5,3,1)

>>> z1,z2,z3,z4 = zip(*zip(t1,t2,t3,t4))

>>> (t1,t2,t3,t4) == (z1,z2,z3,z4)

True

- Paddy.
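For comparison, the list-comprehension approach can be stretched to any number of columns. This is only a sketch, assuming Python 3 and rows of equal length; `unzip_columns` is a hypothetical name, and zip(*rows) remains the shorter route:

```python
def unzip_columns(rows):
    """Split equal-length rows into a tuple of column lists (illustrative)."""
    rows = list(rows)
    if not rows:
        return ()
    width = len(rows[0])   # assumes every row is as long as the first
    return tuple([row[i] for row in rows] for i in range(width))

rows = list(zip((0, 1, 2, 3), (7, 6, 5, 4), (2, 4, 6, 8)))
columns = unzip_columns(rows)

# Same columns as zip(*rows), except that each column is a list, not a tuple.
print(columns == tuple(map(list, zip(*rows))))
```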

Very enlightening. Thanks!

My initial thought was that this wouldn't work for large lists as well, but size doesn't seem to be a problem.

import time

def check(x):
    # Python 2: range() and zip() both return lists here
    a = range(0,x,1)
    b = range(0,0-x,-1)
    t1 = time.time()
    c = zip(a,b)
    t2 = time.time()
    a2,b2 = zip(*c)
    t3 = time.time()
    assert(a == list(a2))
    assert(b == list(b2))
    print "zip:%0.2fs unzip:%0.2fs"%(t2-t1, t3-t2)

check( 100000) = zip:0.02s unzip:0.02s

check( 1000000) = zip:0.17s unzip:0.83s

check( 10000000) = zip:2.11s unzip:13.63s

check(100000000) = MemoryError
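A rough Python 3 equivalent of this timing check is sketched below. It assumes `time.perf_counter` as the clock and returns the timings rather than printing them inside the function; actual numbers will depend on the machine, so none are shown:

```python
import time

def check(n):
    """Time zipping and unzipping two sequences of length n (Python 3 sketch)."""
    a = list(range(0, n, 1))
    b = list(range(0, -n, -1))
    t1 = time.perf_counter()
    c = list(zip(a, b))     # force the lazy zip so the work is actually timed
    t2 = time.perf_counter()
    a2, b2 = zip(*c)        # unpacks n tuples as n positional arguments
    t3 = time.perf_counter()
    assert a == list(a2)
    assert b == list(b2)
    return t2 - t1, t3 - t2

zip_s, unzip_s = check(100000)
print("zip:%0.2fs unzip:%0.2fs" % (zip_s, unzip_s))
```

Passing 100,000 tuples through `*` in one call also speaks to the earlier worry about a low limit on unpacking.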

Doing zip(*somelistoftuples) is not an inverse of zip. For example, if it were then the following would be true:

somelist = zip(range(5), "word1 word2 word3 word4 word5".split())

somelist == zip(*zip(somelist))

And it most definitely is not.

Oh good. 'Cos that's not what I state in the original t1 and t2 example of the article.

:-)
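The difference between the two expressions in this exchange can be sketched as follows, assuming Python 3 (hence the list() calls). zip(somelist) without the star treats the whole list as a single argument, which is not the zip(*zip(t1,t2)) form from the article:

```python
somelist = list(zip(range(5), "word1 word2 word3 word4 word5".split()))

# Without '*': one argument, so each pair is wrapped in a 1-tuple ...
wrapped = list(zip(somelist))
# ... and unpacking those 1-tuples yields a single tuple holding all the pairs.
round_trip = list(zip(*wrapped))
print(round_trip == somelist)   # False

# With '*': the pairs are split into the two original sequences,
# and zipping those sequences again rebuilds somelist.
columns = list(zip(*somelist))
rezipped = list(zip(*columns))
print(rezipped == somelist)     # True
```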