Mainly Tech projects on Python and Electronic Design Automation.

Tuesday, April 14, 2015

Pythonic matrix manipulation?

I was scanning Rosetta Code and came across a Python-2 example of Cholesky decomposition, a 2-D matrix manipulation that I thought could do with converting to Python 3.

On further investigation I noted that it was using "for i in range(len(some_list))" which is a code smell for me which prompts me to see if I can iterate over the elements rather than using an index.

Here is the original Python-2 code with minimal changes just to make it run under Python-3 but without changing any of the indexing and looping:

from pprint import pprint
from math import sqrt
def cholesky(A):
    L = [[0.0] * len(A) for _ in range(len(A))]
    for i in range(len(A)):
        for j in range(i+1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = sqrt(A[i][i] - s) if (i == j) else \
                      (1.0 / L[j][j] * (A[i][j] - s))
    return L

The i and j loop indices are used in non-simple ways in the inner loops but I noticed that I might be able to factor out accesses to A[i], L[i], and L[j] by creating Ai, Li and Lj respectively, but would still need indices.

I came up with the following where I use enumerate to provide the indices, but I also iterate over the matrices A and L directly:

def cholesky2(A):
    L = [[0.0] * len(A) for _ in range(len(A))]
    for i, (Ai, Li) in enumerate(zip(A, L)):
        for j, Lj in enumerate(L[:i+1]):
            s = sum(Li[k] * Lj[k] for k in range(j))
            Li[j] = sqrt(Ai[i] - s) if (i == j) else \
                      (1.0 / Lj[j] * (Ai[j] - s))
    return L

Now cholesky2 is slightly faster and gives the same results, but I don't have any larger examples to test it on than are given on the RC page. If we say their execution speeds are comparable I would then ask myself which is more pythonic and which is easier to read?

Now for this matrix manipulation task I think that although the second uses more Python idioms I would probably think of the first as being more pythonic as it is likely to more closely follow any pseudo-code you might find for the algorithm i.e. having C/Fortran style indexing, and so be more maintainable.

How say you?

P.S. Although there is probably a numpy or whatever function for this but I want to consider Python code solutions only.

Addition at 14:04:2015 21:00: Further changes

A conversation with commenter Florian Philipp on Google plus lead to suggestions for additional changes  including the removal of more uses of range and removal of the conditional assignment by recognising that its if cause can be done after the inner loop and the inner loop changed to only need the else clause. The resultant is the following code:
from math import fsum
def cholesky3(A):
    L = [[0.0] * len(A) for _ in range(len(A))]
    for i, (Ai, Li) in enumerate(zip(A, L)):
        for j, Lj in enumerate(L[:i]):
            s = fsum(Lik * Ljk for Lik, Ljk in zip(Li, Lj[:j]))
            Li[j] = (1.0 / Lj[j] * (Ai[j] - s))
        s = fsum(Lik * Lik for Lik in Li[:i])
        Li[i] = sqrt(Ai[i] - s)
    return L

This third version might be the fastest, but by only a very small margin when calculating m3 so, again, you might want to assume that run time differences between the three are negligible and think of which you, as a seasoned Python programmer, think is more Pythonic.

No comments:

Post a Comment


Subscribe Now: google

Add to Google Reader or Homepage

Go deh too!

Blog Archive