Wednesday, June 09, 2021

Python hack: Creating local variables in a comprehension

Let's say you have some data and want to create a list of its running sums. For example, if the data is [1, 2, 3] then the running sums are [1, 3, 6].

Doing this outside a comprehension:

data = [1, 2, 3]

s = 0  # accumulator
sums = []

for x in data:
    s += x
    sums.append(s)
print(sums)  # [1, 3, 6]

This needs the accumulator s, initialised to zero before the loop. A comprehension, however, has its own local scope, its syntax does not allow a plain assignment statement inside it, and the output expression for each item is written first, before any of the for clauses.
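As a quick check of that restriction, the sketch below asks Python to compile a comprehension containing a plain assignment. The source string is a made-up, deliberately invalid example, and `rejected` is just an illustrative name.

```python
# A hypothetical comprehension that tries to use an assignment statement.
source = "[s = s + x for x in data]"

try:
    # Only compile the text; nothing is evaluated.
    compile(source, "<example>", "eval")
except SyntaxError:
    rejected = True
else:
    rejected = False

print(rejected)  # True: the assignment is not valid comprehension syntax
```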


The Hack

I'll just show the hack then work through it afterwards.

In [1]: data = [1, 2, 3]

In [2]: [s for s in [0] for x in data for s in [s + x]]
Out[2]: [1, 3, 6]


When converting a comprehension into the equivalent nested for statements, think of the output expression at the start of the comprehension as moving inside the rightmost for or if clause, so we get:

# Comprehension over many lines
[s                          # output expression
 for s in [0]               # For clauses (nested)
     for x in data
         for s in [s + x]]

# Is similar to...
for s in [0]:                   # For clauses (nested)
    for x in data:
        for s in [s + x]:
            print(s, end=' ')   # output expression: 1 3 6


Explanation

In the comprehension, the initial

[s for s in [0] ...

says:

  • Individual items of the comprehension will be the value of the expression s.
    (Remember that the output expression is stated first, but is evaluated in the environment set up by the for clauses to its right.)
  • In the comprehension's local scope, the one-entry outer for loop sets the local s to zero.

The middle for loop of the comprehension just iterates over the data.

The final for loop of the comprehension is special:

... for s in [s + x]]

s is set to itself plus the next item of data, x, by iterating over the one-element list [s + x]:

  • For the first x, s was initialised to zero in the local scope via the outermost for.
  • s becomes 0 + data[0] in the inner loop and becomes the first output expression value, 1.
  • For the second iteration of the middle loop, x = data[1], so s then becomes 0 + data[0] + data[1], giving the second value of the output expression, 3.
  • And so on...
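To watch those steps happen, here is a small sketch of the same hack that also records x and the value of s just before the innermost loop rebinds it. The extra name prev (and the one-element loop that sets it) is my addition for tracing only; it is not part of the original hack.

```python
data = [1, 2, 3]

# Same structure as the hack, plus a one-element loop that captures
# the old value of s in `prev` (a name added only for this trace).
trace = [(x, prev, s)
         for s in [0]            # initialise s
         for x in data           # iterate over the data
         for prev in [s]         # remember s before it changes
         for s in [s + x]]       # rebind s to the running sum

for x, prev, s in trace:
    print(f"x={x}: s was {prev}, s becomes {s}")
# x=1: s was 0, s becomes 1
# x=2: s was 1, s becomes 3
# x=3: s was 3, s becomes 6
```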

Multiple local variables

We can generalise this. Here we generate running sums and running sums of the squares, which needs two local variables, s and s2:

In [3]: data = [1, 2, 3]

In [4]: [(s, s2) for s, s2 in [(0, 0)] for x in data for s, s2 in [(s + x, s2 + x**2)]]
Out[4]: [(1, 1), (3, 5), (6, 14)]
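As a sanity check, and not something the hack itself needs, the same pairs can be rebuilt with itertools.accumulate from the standard library and compared. The names pairs and expected are just for this sketch.

```python
from itertools import accumulate

data = [1, 2, 3]

# The two-variable hack from above.
pairs = [(s, s2)
         for s, s2 in [(0, 0)]
         for x in data
         for s, s2 in [(s + x, s2 + x**2)]]

# Running sums and running sums of squares, computed independently.
expected = list(zip(accumulate(data), accumulate(x**2 for x in data)))

print(pairs == expected)  # True
```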

Summary

  1. You can satisfy the need for local variables in comprehensions.
  2. It's a hard-to-understand hack!
 

UPDATE: Added Walrus:

The walrus operator, := (available since Python 3.8), can now be used to give a more readable equivalent.
The variable is initialised outside the comprehension, and the walrus operator updates it from inside the comprehension while keeping the running sums.

In [5]: # We had:

In [6]: data = [1, 2, 3]

In [7]: [s for s in [0] for x in data for s in [s + x]]
Out[7]: [1, 3, 6]

In [9]: # With :=

In [10]: s = 0

In [11]: [s := (s + x) for x in data]
Out[11]: [1, 3, 6]

In [12]: 

In [13]: # We then had:

In [14]: del s

In [15]: [(s, s2) for s, s2 in [(0, 0)] for x in data for s, s2 in [(s + x, s2 + x**2)]]
Out[15]: [(1, 1), (3, 5), (6, 14)]

In [16]: # Which becomes:

In [17]: s = s2 = 0

In [18]: [(s := s + x, s2 := s2 + x**2) for x in data]
Out[18]: [(1, 1), (3, 5), (6, 14)]
(Those external variables s and s2 are updated to the last running sum).
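The scoping difference between the two forms can be sketched as below. The function names with_hack and with_walrus, and the starting value of 100, are made up for the illustration. The walrus form needs Python 3.8 or later.

```python
def with_hack(data):
    s = 100  # an outer s, deliberately unrelated to the running sum
    sums = [s for s in [0] for x in data for s in [s + x]]
    return sums, s  # the comprehension's own s never touches the outer s

def with_walrus(data):
    s = 0  # must be initialised outside the comprehension
    sums = [s := s + x for x in data]
    return sums, s  # := rebinds the outer s on every item

print(with_hack([1, 2, 3]))    # ([1, 3, 6], 100)
print(with_walrus([1, 2, 3]))  # ([1, 3, 6], 6)
```

So the for-clause hack keeps its variables private to the comprehension, while the walrus version reads and writes a name in the enclosing scope.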

Source

This all came about because I re-read the "What's New In Python 3.9" doc after upgrading Anaconda and came across code I couldn't initially fathom.

End.