Tuesday, April 14, 2015

Pythonic matrix manipulation?

I was scanning Rosetta Code and came across a Python-2 example of Cholesky decomposition, a 2-D matrix manipulation that I thought could do with converting to Python 3.

On further investigation I noted that it was using "for i in range(len(some_list))" which is a code smell for me which prompts me to see if I can iterate over the elements rather than using an index.

Here is the original Python-2 code with minimal changes just to make it run under Python-3 but without changing any of the indexing and looping:

from pprint import pprint
from math import sqrt
 
def cholesky(A):
    L = [[0.0] * len(A) for _ in range(len(A))]
    for i in range(len(A)):
        for j in range(i+1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = sqrt(A[i][i] - s) if (i == j) else \
                      (1.0 / L[j][j] * (A[i][j] - s))
    return L

The i and j loop indices are used in non-simple ways in the inner loops but I noticed that I might be able to factor out accesses to A[i], L[i], and L[j] by creating Ai, Li and Lj respectively, but would still need indices.

I came up with the following where I use enumerate to provide the indices, but I also iterate over the matrices A and L directly:

def cholesky2(A):
    L = [[0.0] * len(A) for _ in range(len(A))]
    for i, (Ai, Li) in enumerate(zip(A, L)):
        for j, Lj in enumerate(L[:i+1]):
            s = sum(Li[k] * Lj[k] for k in range(j))
            Li[j] = sqrt(Ai[i] - s) if (i == j) else \
                      (1.0 / Lj[j] * (Ai[j] - s))
    return L


Now cholesky2 is slightly faster and gives the same results, but I don't have any larger examples to test it on than are given on the RC page. If we say their execution speeds are comparable I would then ask myself which is more pythonic and which is easier to read?

Now for this matrix manipulation task I think that although the second uses more Python idioms I would probably think of the first as being more pythonic as it is likely to more closely follow any pseudo-code you might find for the algorithm i.e. having C/Fortran style indexing, and so be more maintainable.

How say you?


P.S. Although there is probably a numpy or whatever function for this but I want to consider Python code solutions only.

Addition at 14:04:2015 21:00: Further changes

A conversation with commenter Florian Philipp on Google plus lead to suggestions for additional changes  including the removal of more uses of range and removal of the conditional assignment by recognising that its if cause can be done after the inner loop and the inner loop changed to only need the else clause. The resultant is the following code:
from math import fsum
 
def cholesky3(A):
    L = [[0.0] * len(A) for _ in range(len(A))]
    for i, (Ai, Li) in enumerate(zip(A, L)):
        for j, Lj in enumerate(L[:i]):
            s = fsum(Lik * Ljk for Lik, Ljk in zip(Li, Lj[:j]))
            Li[j] = (1.0 / Lj[j] * (Ai[j] - s))
        s = fsum(Lik * Lik for Lik in Li[:i])
        Li[i] = sqrt(Ai[i] - s)
    return L

This third version might be the fastest, but by only a very small margin when calculating m3 so, again, you might want to assume that run time differences between the three are negligible and think of which you, as a seasoned Python programmer, think is more Pythonic.

Tuesday, April 07, 2015

Wholistic options processing.

I have installed, configured and used many large software packages over time and have come to appreciate ideas on options handling and configuration from a mix of vendors that I think should become more widespread and supported by language libraries.

The best packages make their options settable in multiple ways and have a defined hierarchy and visibility of options - by visibility I mean whether an option is editable in some way.


  1. When installing the package some checks are done/questions answered that become options and values in an options file in the install area. Ideally this should also have package default values.
    I have only ever seen these types of files best done in windows .ini format, but I would think that YAML could also be used as it too is human readable/editable and supports comments (unlike JSON).
    Comments should precede every entry and section with enough detail to be understood by someone who has been on the packages introductory course and read the documentation or marked for advance use only.

    (Options should be of  a single case although their expression as upper/lower case may depend on context)
  2. The first time a program from the package is run it creates a $HOME/.package file of all user specific and user changeable options with their site specific defaults (copied over at that time and with comments). This will be a subset of the site options.
  3. When an interactive program from the package is run from within a run directory a subset of options will be editable , set and reset within the program. On good program exit a run directory .package file should be created with both the normal program options used but augmented with aspects of the programs state so that on re-running the program from the same directory things like GUI window positions or database connections that were filled in on the previous invocation might start from those previous values.
    (Warn when a program is run from the $HOME directory and ask before overwriting the $HOME/.package file).
  4. Some options will be able to be read from environment variables. Environment variable names should consist of a one-word prefix of the packages nick-name or name followed by the option name separated by a comma.
    When a package contains a large sub-package then that sub-package nick-name may form the prefix instead.
    Comments in the $HOME/.package file should state the environment variable name to use if an option is settable in the environment
  5. When invoking a program from the package the programs help should give help on all the options that are settable on the command line. In addition, those options settable on the command line should also appear in the $HOME/.package file and run directory .package file with the posix long format argument mentioned in the help text.
    One should allow for  a program argument short-form (single letter argument), that needs to appear in the programs help, but not necessarily in any .package file comments allowing for other package programs to reuse single character short options in what for them may be more idiomatic fashion.
    Default values need to reflect values already set in the options processing mentioned in 6.1 to 6.3.
  6. There should be a clear hierarchy of options processing:
    1. Process the site .package file
    2. Process the run directory .package file if it exists, else if it does not exist then process the home directory .package file.
    3. Process environment variable options.
    4. Process command line options.
    5. Process options changes within a program.
  7. Options processing should also check to ensure that given options are allowed to be changed/viewed where stated.
  8. Options processing should impose a hierarchy whereby an option set later overrides values set earlier in the 6.1 to 6.5 stages mentioned.
I don't know of open source options handling libraries that are this complete as most seem to handle command line processing and a few might integrate handling a single .package file.