Friday, February 09, 2007

unzip un-needed in Python

Someone blogged about Python not having an unzip function to go with zip().
A separate unzip is not needed: it is straightforward to get from zip itself, because:

>>> t1 = (0,1,2,3)
>>> t2 = (7,6,5,4)
>>> [t1,t2] == zip(*zip(t1,t2))
True
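
The session above is Python 2, where zip() returns a list. As a hedged aside, on Python 3 zip() returns a lazy iterator, so the comparison needs list() around the outer call. A minimal sketch, assuming Python 3:

```python
# Assumes Python 3, where zip() returns an iterator rather than a list.
t1 = (0, 1, 2, 3)
t2 = (7, 6, 5, 4)

zipped = list(zip(t1, t2))       # pairs: (0, 7), (1, 6), (2, 5), (3, 4)
unzipped = list(zip(*zipped))    # star-unpack the pairs back into zip

print(unzipped == [t1, t2])      # True
```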


Explanation


In answer to a commenter, I have written a (large) program to explain the above.
unzip_explained.py:

'''
Explanation of unzip expression zip(*zip(A,B))

References:
1: Unpacking argument lists
http://www.network-theory.co.uk/docs/pytut/UnpackingArgumentLists.html
2: Zip
>>> help(zip)
Help on built-in function zip in module __builtin__:

zip(...)
zip(seq1 [, seq2 [...]]) -> [(seq1[0], seq2[0] ...), (...)]

Return a list of tuples, where each tuple contains the i-th element
from each of the argument sequences. The returned list is truncated
in length to the length of the shortest argument sequence.


'''

def show_args(*positional, **kwargs):
    "Straight-forward function to show its arguments"
    n = 0
    for p in positional:
        print " positional argument", n, "is", p
        n += 1
    for k,v in sorted(kwargs.items()):
        print " keyword argument", k, "is", v

A = tuple( "A%i" % n for n in range(3) )
print "\n\nTuple A is:"; print " ", A
B = tuple( "B%i" % n for n in range(3) )
print "Tuple B is:"; print " ", B

print "\nLets go slowly through the expression: [A,B] == zip(*zip(A,B))\n"

print "List [A,B] is:"
print " ", [A,B]
print "zip(A,B) has arguments:"; show_args(A,B)
print "zip(A,B) returns:"
print " ", zip(A,B)
print "The leftmost zip in zip(*zip(A,B)), due"
print " to the 'list unpacking' of the previous"
print " value has arguments of:"; show_args(*zip(A,B))
print "The outer zip therefore returns:"
print " ", zip(*zip(A,B))
print "Which is the same as [A,B]\n"

And here is the program output:

Tuple A is:
('A0', 'A1', 'A2')
Tuple B is:
('B0', 'B1', 'B2')

Lets go slowly through the expression: [A,B] == zip(*zip(A,B))

List [A,B] is:
[('A0', 'A1', 'A2'), ('B0', 'B1', 'B2')]
zip(A,B) has arguments:
positional argument 0 is ('A0', 'A1', 'A2')
positional argument 1 is ('B0', 'B1', 'B2')
zip(A,B) returns:
[('A0', 'B0'), ('A1', 'B1'), ('A2', 'B2')]
The leftmost zip in zip(*zip(A,B)), due
to the 'list unpacking' of the previous
value has arguments of:
positional argument 0 is ('A0', 'B0')
positional argument 1 is ('A1', 'B1')
positional argument 2 is ('A2', 'B2')
The outer zip therefore returns:
[('A0', 'A1', 'A2'), ('B0', 'B1', 'B2')]
Which is the same as [A,B]
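
One caveat follows from the help text quoted in the program: zip truncates to the shortest argument, so the round trip only gives back the inputs when they have equal length. A small sketch, assuming Python 3 (hence the list() calls) and a hypothetical helper named unzip, which is not part of the standard library:

```python
def unzip(pairs):
    """Hypothetical helper: transpose an iterable of tuples via zip(*...)."""
    return list(zip(*pairs))

A = ("A0", "A1", "A2")
B = ("B0", "B1")                 # one element shorter than A

zipped = list(zip(A, B))         # truncated to the length of B
print(zipped)                    # [('A0', 'B0'), ('A1', 'B1')]
print(unzip(zipped))             # [('A0', 'A1'), ('B0', 'B1')], 'A2' is lost
print(unzip(zipped) == [A, B])   # False
```

With equal-length inputs, as in the A and B of the program above, the same helper returns [A, B] unchanged.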

16 comments:

  1. What happens in the third line? Can you explain that in more detail?

  2. @repei

    >>> zip(*zip(A,B)) == [A, B]
    True
    >>> zip(*zip(*zip(A,B))) == zip(A, B)
    True

    zip is unzip is zip. This is a bit weird to think about.

    Lisp hacks seem to consider this obvious, but us mere mortals sometimes need a little help. Thanks for a fantastic post!

    I considered having this as one of the problems for the Python Lab at PyCon, but decided it was too much of a 'trick' problem.

  3. This property of the zip function in Python is only true if you zip tuples and not lists.

    >>> A = [1,2,3]
    >>> B = [4,5,6]
    >>> zip(*zip(A,B)) == [A,B]
    False

    When I give zip a tuple of lists, I want a tuple of lists back from unzip:

    >>> def unzip(a):
    ...     return tuple(map(list, zip(*a)))
    ...
    >>> unzip(zip(A,B)) == (A,B)
    True
    >>> zip(*unzip(zip(A,B))) == zip(A,B)
    True

  4. Now that you mention it, it's obvious! Even more so if you think of zip as (a truncating) matrix transpose.

    ReplyDelete
  5. I just didn't know about unpacking function arguments by '*'. It all became clear for me now. Thank you.

  6. That is really funny... how could anyone use Python's zip() function and NOT realize that it is its own inverse???

    Good to see another Python/EDA hacker out there. I'm in the same boat... Pythonista and a bit of a hardware hacker. I've got some random utilities up at http://tonquil.homeip.net/~dlenski

  7. LOL. That's awesome. I've never thought about it. <3 self-composable functions in python library.

    ReplyDelete
  8. Unfortunately there is quite a low limit on how many values you can unpack, so this method is only applicable to small lists.

  9. Uncloak and state this limit you found.

  10. I wrote a simple unzip in Python using list comprehensions.

    def unzip(lst):
        if lst == []:
            return ()
        else:
            return ([x[0] for x in lst], [x[1] for x in lst])

    Slightly inefficient because of the two loops over the same list, but I like the elegance.

  11. Hi anonymous, that may be good for two lists, but what about three or more?

    >>> t1 = (0,1,2,3)
    >>> t2 = (7,6,5,4)
    >>> t3 = (2,4,6,8)
    >>> t4 = (7,5,3,1)
    >>> z1,z2,z3,z4 = zip(*zip(t1,t2,t3,t4))
    >>> (t1,t2,t3,t4) == (z1,z2,z3,z4)
    True

    - Paddy.

  12. Very enlightening. Thanks!

    My initial thought was that this wouldn't work for large lists as well, but size doesn't seem to be a problem.

    import time

    def check(x):
        # Python 2: range() returns a list, so a and b are lists
        a = range(0,x,1)
        b = range(0,0-x,-1)
        t1 = time.time()
        c = zip(a,b)
        t2 = time.time()
        a2,b2 = zip(*c)
        t3 = time.time()
        assert(a == list(a2))
        assert(b == list(b2))
        print "zip:%0.2fs unzip:%0.2fs"%(t2-t1, t3-t2)

    check( 100000) = zip:0.02s unzip:0.02s
    check( 1000000) = zip:0.17s unzip:0.83s
    check( 10000000) = zip:2.11s unzip:13.63s
    check(100000000) = MemoryError

  13. Doing zip(*somelistoftuples) is not an inverse of zip. For example, if it were then the following would be true:

    somelist = zip(range(5), "word1 word2 word3 word4 word5".split())
    somelist == zip(*zip(somelist))

    And it most definitely is not.

    Replies
    1. Oh good. 'Cos that's not what I state in the original t1 and t2 example of the article.
      :-)
