Mainly Tech projects on Python and Electronic Design Automation.

Sunday, February 25, 2007

Python's Function Nested Parameter Definitions

(And I don't mean nested function definitions).

Whilst reading PEP 3107 on what's coming this year in Python 3000, I came across a section that mentions nested parameters in function definitions.
I could not remember this feature so googled a bit, and still could not find it.
It was time to suck it and see, so using Python 2.5 I did:


>>> def x ((p0, p1), p2):
...     return p0,p1,p2
...
>>> x(('Does', 'this'), 'work')
('Does', 'this', 'work')
>>>

My current skunk-works project extracts data from VHDL source files as a list of nested tuples. I use each outer tuple as a record but unpack the record manually at each function call. If I used the above then the function parameter definition would reflect the record structure.
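For example (a made-up record shape for illustration, not my actual VHDL data), a function can unpack a two-level record directly in its def:

>>> def show_port((name, mode), width):
...     return "%s: %s port, %i bits wide" % (name, mode, width)
...
>>> record = (('clk', 'in'), 1)
>>> show_port(*record)
'clk: in port, 1 bits wide'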

On closer inspection of the Python function definition grammar, I find that nested parameters are represented by the sublist token.

Neat!

- Paddy.

Friday, February 09, 2007

unzip unneeded in Python

Someone blogged about Python not having an unzip function to go with zip().
unzip is straightforward to calculate because:

>>> t1 = (0,1,2,3)
>>> t2 = (7,6,5,4)
>>> [t1,t2] == zip(*zip(t1,t2))
True


Explanation


In answer to a commentator, I have written a (large) program to explain the above.
unzip_explained.py:

'''
Explanation of unzip expression zip(*zip(A,B))

References:
1: Unpacking argument lists
http://www.network-theory.co.uk/docs/pytut/UnpackingArgumentLists.html
2: Zip
>>> help(zip)
Help on built-in function zip in module __builtin__:

zip(...)
    zip(seq1 [, seq2 [...]]) -> [(seq1[0], seq2[0] ...), (...)]

    Return a list of tuples, where each tuple contains the i-th element
    from each of the argument sequences. The returned list is truncated
    in length to the length of the shortest argument sequence.


'''

def show_args(*positional, **kwargs):
    "Straight-forward function to show its arguments"
    n = 0
    for p in positional:
        print " positional argument", n, "is", p
        n += 1
    for k,v in sorted(kwargs.items()):
        print " keyword argument", k, "is", v

A = tuple( "A%i" % n for n in range(3) )
print "\n\nTuple A is:"; print " ", A
B = tuple( "B%i" % n for n in range(3) )
print "Tuple B is:"; print " ", B

print "\nLet's go slowly through the expression: [A,B] == zip(*zip(A,B))\n"

print "List [A,B] is:"
print " ", [A,B]
print "zip(A,B) has arguments:"; show_args(A,B)
print "zip(A,B) returns:"
print " ", zip(A,B)
print "The leftmost zip in zip(*zip(A,B)), due"
print " to the 'list unpacking' of the previous"
print " value has arguments of:"; show_args(*zip(A,B))
print "The outer zip therefore returns:"
print " ", zip(*zip(A,B))
print "Which is the same as [A,B]\n"

And here is the program output:

Tuple A is:
('A0', 'A1', 'A2')
Tuple B is:
('B0', 'B1', 'B2')

Let's go slowly through the expression: [A,B] == zip(*zip(A,B))

List [A,B] is:
[('A0', 'A1', 'A2'), ('B0', 'B1', 'B2')]
zip(A,B) has arguments:
positional argument 0 is ('A0', 'A1', 'A2')
positional argument 1 is ('B0', 'B1', 'B2')
zip(A,B) returns:
[('A0', 'B0'), ('A1', 'B1'), ('A2', 'B2')]
The leftmost zip in zip(*zip(A,B)), due
to the 'list unpacking' of the previous
value has arguments of:
positional argument 0 is ('A0', 'B0')
positional argument 1 is ('A1', 'B1')
positional argument 2 is ('A2', 'B2')
The outer zip therefore returns:
[('A0', 'A1', 'A2'), ('B0', 'B1', 'B2')]
Which is the same as [A,B]
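As a footnote, zip(*m) is also the usual Python one-liner for transposing a matrix held as a list of rows, which is another way of seeing why applying it twice gets you back where you started:

>>> m = [(1, 2, 3),
...      (4, 5, 6)]
>>> zip(*m)            # transpose: rows become columns
[(1, 4), (2, 5), (3, 6)]
>>> zip(*zip(*m)) == m # transposing twice is the identity
True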

Saturday, January 20, 2007

Python example: from 1997 to now.

I stumbled on an old site: My Programming Language Crisis of ~1997.
There is a data-mining example, complete with sample input data and 'golden output' called invert:
Stdin consists of lines of two tab-separated fields. Call the first field A and the second field B.
The program must gather like values of B together, collecting the A's that go with them.
The output of the program must be lines consisting of an arbitrary number of tab-separated fields; the first field of each line is a unique B; all subsequent fields of that line are A's that were associated with that B in the input. There will be as many output lines as there were unique B's in the input. The output lines must be sorted on the first field (B's) and all subsequent fields of each line (each B's As) must be sorted.

...For example, suppose you wanted to gather together all the unique URLs in a set of web pages in order to validate them efficiently. Your report needs to be able to associate dead URLs with the files they were originally found in. Thus the A's in the input are filenames and the B's are URLs.
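To make the spec concrete, here is a tiny made-up input (fields are tab-separated):

page1.html	http://a.example/
page2.html	http://b.example/
page2.html	http://a.example/

for which the required output is:

http://a.example/	page1.html	page2.html
http://b.example/	page2.html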
The Python program of its day brought back memories, and I of course had to re-write it using some of the later additions to the language.

Things I used include:
  • The use of the fileinput module to do looping over lines of input.
  • split as a string method.
  • The default split action of splitting on white space.
    (A little dicey but input fields don't include spaces).
  • The setdefault dict method for providing an empty list when necessary, so the try...except block can go.
  • The new sorted function that returns a sorted value.
  • list unpacking of kv into k,v in the output for loop.
  • The join method to insert tab separators into the ordered list of output fields.
Overall I'd say that Python has become more clear over the years.

The prog:

# invert benchmark in Python
# see http://www.lib.uchicago.edu/keith/crisis/benchmarks/invert/
# This version by Donald 'Paddy' McCarthy

import fileinput

B = {}
for line in fileinput.input():
    fields = line.split()
    B.setdefault(fields[1], []).append(fields[0])

# In-memory data sort
# in-place sorting of values in key-value pairs
kv = sorted(B.items())
for k,v in kv:
    v.sort()
    print "\t".join([k]+v)


'''

## Here follows the original (1997?) program

#! /depot/python/arch/bin/python
# invert benchmark in Python
# see <url:http://www.lib.uchicago.edu/keith/crisis/benchmarks/invert/
# Original by Keith Waclena
# Optimized by Tom Carroll

from sys import stdin
from string import split, join

B = {}

while 1:
    line = stdin.readline()
    if not line: break
    fields = split(line[0:-1], "\t")
    try: #assume this key is already present
        B[fields[1]].append(fields[0])
    except: #it's not!? well then, we best put it in...
        B[fields[1]] = [fields[0],]

keys = B.keys()
keys.sort()

for key in keys:
    values = B[key]
    values.sort()
    print key + "\t" + join(values, "\t")
'''

Tuesday, January 02, 2007

Data Mining: in three languages

I answered this
post
with a reply written in AWK, then wrote versions in Perl and
Python. You should be aware that I have written more awk, perl and
python (in that order); but I think I know more awk, python then perl
(in that order).


The awk example:


# Author Donald 'Paddy' McCarthy Jan 01 2007

BEGIN{
    nodata = 0;         # Current run of consecutive flags<0 in lines of file
    nodata_max=-1;      # Max consecutive flags<0 in lines of file
    nodata_maxline="!"; # ... and line number(s) where it occurs
}
FNR==1 {
    # Accumulate input file names
    if(infiles){
        infiles = infiles "," FILENAME
    } else {
        infiles = FILENAME
    }
}
{
    tot_line=0; # sum of line data
    num_line=0; # number of line data items with flag>0

    # extract field info, skipping initial date field
    for(field=2; field<=NF; field+=2){
        datum=$field;
        flag=$(field+1);
        if(flag<1){
            nodata++
        }else{
            # check run of data-absent fields
            if(nodata_max==nodata && (nodata>0)){
                nodata_maxline=nodata_maxline ", " $1
            }
            if(nodata_max<nodata && (nodata>0)){
                nodata_max=nodata
                nodata_maxline=$1
            }
            # re-initialise run of nodata counter
            nodata=0;
            # gather values for averaging
            tot_line+=datum
            num_line++;
        }
    }

    # totals for the file so far
    tot_file += tot_line
    num_file += num_line

    printf "Line: %11s Reject: %2i Accept: %2i Line_tot: %10.3f Line_avg: %10.3f\n", \
        $1, ((NF -1)/2) -num_line, num_line, tot_line, (num_line>0)? tot_line/num_line: 0

    # debug prints of original data plus some of the computed values
    #printf "%s %15.3g %4i\n", $0, tot_line, num_line
    #printf "%s\n %15.3f %4i %4i %4i %s\n", $0, tot_line, num_line, nodata, nodata_max, nodata_maxline
}

END{
    printf "\n"
    printf "File(s) = %s\n", infiles
    printf "Total = %10.3f\n", tot_file
    printf "Readings = %6i\n", num_file
    printf "Average = %10.3f\n", tot_file / num_file

    printf "\nMaximum run(s) of %i consecutive false readings ends at line starting with date(s): %s\n", nodata_max, nodata_maxline
}


The same functionality in perl is very similar to the awk program:


# Author Donald 'Paddy' McCarthy Jan 01 2007

BEGIN {
    $nodata = 0;          # Current run of consecutive flags<0 in lines of file
    $nodata_max=-1;       # Max consecutive flags<0 in lines of file
    $nodata_maxline="!";  # ... and line number(s) where it occurs
}
foreach (@ARGV) {
    # Accumulate input file names
    if($infiles ne ""){
        $infiles = "$infiles, $_";
    } else {
        $infiles = $_;
    }
}

while (<>){
    $tot_line=0; # sum of line data
    $num_line=0; # number of line data items with flag>0

    # extract field info, skipping initial date field
    chomp;
    @fields = split(/\s+/);
    $nf = @fields;
    $date = $fields[0];
    for($field=1; $field<$nf; $field+=2){
        $datum = $fields[$field] +0.0;
        $flag = $fields[$field+1] +0;
        if(($flag+1<2)){
            $nodata++;
        }else{
            # check run of data-absent fields
            if($nodata_max==$nodata and ($nodata>0)){
                $nodata_maxline = "$nodata_maxline, $fields[0]";
            }
            if($nodata_max<$nodata and ($nodata>0)){
                $nodata_max = $nodata;
                $nodata_maxline=$fields[0];
            }
            # re-initialise run of nodata counter
            $nodata = 0;
            # gather values for averaging
            $tot_line += $datum;
            $num_line++;
        }
    }

    # totals for the file so far
    $tot_file += $tot_line;
    $num_file += $num_line;

    printf "Line: %11s Reject: %2i Accept: %2i Line_tot: %10.3f Line_avg: %10.3f\n",
        $date, (($nf -1)/2) -$num_line, $num_line, $tot_line, ($num_line>0)? $tot_line/$num_line: 0;

}

printf "\n";
printf "File(s) = %s\n", $infiles;
printf "Total = %10.3f\n", $tot_file;
printf "Readings = %6i\n", $num_file;
printf "Average = %10.3f\n", $tot_file / $num_file;

printf "\nMaximum run(s) of %i consecutive false readings ends at line starting with date(s): %s\n",
    $nodata_max, $nodata_maxline;


The Python program, however, splits the fields in the line slightly
differently (although it could use the method used in the perl and
awk programs too):


# Author Donald 'Paddy' McCarthy Jan 01 2007

import fileinput
import sys

nodata = 0         # Current run of consecutive flags<0 in lines of file
nodata_max = -1    # Max consecutive flags<0 in lines of file
nodata_maxline = [] # ... and line number(s) where it occurs

tot_file = 0 # Sum of file data
num_file = 0 # Number of file data items with flag>0

infiles = sys.argv[1:]

for line in fileinput.input():
    tot_line=0 # sum of line data
    num_line=0 # number of line data items with flag>0

    # extract field info
    field = line.split()
    date = field[0]
    data = [float(f) for f in field[1::2]]
    flags = [int(f) for f in field[2::2]]

    for datum, flag in zip(data, flags):
        if flag<1:
            nodata += 1
        else:
            # check run of data-absent fields
            if nodata_max==nodata and nodata>0:
                nodata_maxline.append(date)
            if nodata_max<nodata and nodata>0:
                nodata_max=nodata
                nodata_maxline=[date]
            # re-initialise run of nodata counter
            nodata=0
            # gather values for averaging
            tot_line += datum
            num_line += 1

    # totals for the file so far
    tot_file += tot_line
    num_file += num_line

    print "Line: %11s Reject: %2i Accept: %2i Line_tot: %10.3f Line_avg: %10.3f" % (
        date,
        len(data) -num_line,
        num_line, tot_line,
        tot_line/num_line if (num_line>0) else 0)

print ""
print "File(s) = %s" % (", ".join(infiles),)
print "Total = %10.3f" % (tot_file,)
print "Readings = %6i" % (num_file,)
print "Average = %10.3f" % (tot_file / num_file,)

print "\nMaximum run(s) of %i consecutive false readings ends at line starting with date(s): %s" % (
    nodata_max, ", ".join(nodata_maxline))







Timings:


$ time gawk -f readings.awk readingsx.txt readings.txt|tail
Line: 2004-12-29 Reject: 1 Accept: 23 Line_tot: 56.300 Line_avg: 2.448
Line: 2004-12-30 Reject: 1 Accept: 23 Line_tot: 65.300 Line_avg: 2.839
Line: 2004-12-31 Reject: 1 Accept: 23 Line_tot: 47.300 Line_avg: 2.057

File(s) = readingsx.txt,readings.txt
Total = 1361259.300
Readings = 129579
Average = 10.505

Maximum run(s) of 589 consecutive false readings ends at line starting with date(s): 1993-03-05

real 0m1.069s
user 0m0.904s
sys 0m0.061s

$ time perl readings.pl readingsx.txt readings.txt|tail
Line: 2004-12-29 Reject: 1 Accept: 23 Line_tot: 56.300 Line_avg: 2.448
Line: 2004-12-30 Reject: 1 Accept: 23 Line_tot: 65.300 Line_avg: 2.839
Line: 2004-12-31 Reject: 1 Accept: 23 Line_tot: 47.300 Line_avg: 2.057

File(s) = readingsx.txt, readings.txt
Total = 1361259.300
Readings = 129579
Average = 10.505

Maximum run(s) of 589 consecutive false readings ends at line starting with date(s): 1993-03-05

real 0m2.450s
user 0m1.639s
sys 0m0.015s

$ time /cygdrive/c/Python25/python readings.py readingsx.txt readings.txt|tail
Line: 2004-12-29 Reject: 1 Accept: 23 Line_tot: 56.300 Line_avg: 2.448
Line: 2004-12-30 Reject: 1 Accept: 23 Line_tot: 65.300 Line_avg: 2.839
Line: 2004-12-31 Reject: 1 Accept: 23 Line_tot: 47.300 Line_avg: 2.057

File(s) = readingsx.txt, readings.txt
Total = 1361259.300
Readings = 129579
Average = 10.505

Maximum run(s) of 589 consecutive false readings ends at line starting with date(s): 1993-03-05

real 0m1.138s
user 0m0.061s
sys 0m0.030s

$


The differences in the Python prog. are not done as an
optimisation. The nifty list indexing of [1::2] and the zip just flow
naturally (to me) from the data format.





The data format consists of single-line records of this format:

<string:date> [ <float:data-n> <int:flag-n> ]*24

e.g.

1991-03-31      10.000  1       10.000  1       ... 20.000      1       35.000  1
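To show how the slicing falls out of that format, here is a cut-down record with just three readings (made-up values, flag<1 meaning 'reject'):

>>> line = "1991-03-31  10.000  1  5.000  -2  20.000  1"
>>> field = line.split()
>>> field[0]
'1991-03-31'
>>> [float(f) for f in field[1::2]]   # the readings
[10.0, 5.0, 20.0]
>>> [int(f) for f in field[2::2]]     # the flags
[1, -2, 1]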




Wednesday, December 13, 2006

merits of Lisp vs Python

Excerpts from my posts to the comp.lang.python thread about Lisp vs Python:



Date: Fri, Dec 8 2006 10:05 pm

Mark Tarver wrote:
> How do you compare Python to Lisp? What specific advantages do you
> think that one has over the other?
>
> Note I'm not a Python person and I have no axes to grind here. This is
> just a question for my general education.
>
> Mark



I've never programmed in Lisp but I have programmed in Cadence Skill, a Lisp-inspired language with infix notation as an option. I found Skill to be a very powerful language. At the time I knew only AWK, C, Pascal, Forth, PostScript, Assembler and Basic. Skill was superior and I came to love it.
But that was a decade ago. Now, I'd rather a company integrate Python into their product as I find Python to be less 'arcane' than Skill; with more accessible power, and a great community.


Analogy time!
You need Pure Maths, but more mathematicians will be working applying maths to real-world problems. You need research physicists, but more physicists will be applying physics in the real world. It seems to me that Lisp and its basis in maths makes people research and develop a lot of new techniques in Lisp, but when it comes to applying those techniques in the real world - switch to Python!



Lisp has a role to play, but maybe a language tuned to research and with its user base would naturally find it hard to compete in the roles in which dynamic languages such as Python are strongest.



- Paddy.



Date: Fri, Dec 8 2006 10:22 pm



What is it about Lisp that, despite doing everything first, way before any other language, people don't stop using anything else and automatically turn to Lisp? Maybe there is more to this everything than the Lisp community comprehends.
Maybe Lisp is to science, as Python is to engineering - with a slight blurring round the edges?

- Paddy.



Date: Sun, Dec 10 2006 1:53 am



NOBODY expects the Lispers' Inquisition!
Our chief weapon is age... age and macros...
...macros and age.
Our two weapons are age and macros....
And mathematical rigour...
Our THREE weapons are age, macros, and mathematical rigour...
...And an almost fanatical belief in Lisp's superiority.
Our *four* ...no.
AMONGST our weapons...
Amongst our weaponry...
...Are such elements as fear, surprise.... I'll come in again.

Python is fun to use.
Easy to read.
Simple and powerful.
Easy to test.
Easy to maintain.
Fast. Very fast!

- Paddy.



Date: Sun, Dec 10 2006 1:20 am



Paul Rubin wrote:
> "mystilleef" writes:
> > Slow for users who aren't familiar with Psyco, Pyrex and C extensions,
> > sure.
>
> Anyway it's pretty lousy advocacy for a language to say "well if the
> language is too slow, don't use it, use another language like C instead".



Python can be used as a glue language. It is not solely a glue language.
A lot of people find using Python to script libraries written in other languages a way to get things done. Ask the scipy guys or the biopython guys. The Python community actively encourages groups writing useful libraries to maintain a Python port, or Python users might wrap libraries themselves.

You don't always wrap a module in Python for reasons of speed of execution. Software testing may well be easier to do in Python than in the native language of the wrapped library. The library itself may be better used in the dynamic environment of Python's command line; or used together with other libraries already wrapped for/accessible from Python.



Date: Mon, Dec 11 2006 7:22 am



Unlike Lisp, Python does not have a ubiquitous compiler. It is therefore made to interface nicely with compiled languages. Other compiled-language users see the need for dynamic interpreted languages like Python and maintain links to Python, such as the Boost.Python C++ wrapper, IronPython for .NET, and Jython for Java.

Lisp is its own interpreter and compiler, which should be a great advantage, but only if you don't make the mistake of ignoring the wealth of code out there that is written in other languages.



> > Talk to these guys:
> > http://en.wikipedia.org/wiki/PyPy they have an interesting take on
>
> No, actually maybe you should talk to them since you seem to think that
> making Python run fast is dangerous, or at least unnecessary.
>
> > Python has this unsung module called doctest that neatly shows some of
> > the strengths of python: http://en.wikipedia.org/wiki/Doctest
>
> Now I'm *certain* that you're just pulling my leg: You guys document
> all your random ten-line hacks in Wikipedia?!?! What a brilliant idea!



Python is newbie-friendly. Part of that is being accessible.
Doctest is about a novel way of using a feature shared by Lisp, that is, docstrings. Testing is important, usually not done enough, and doctests are a way to get people to write more tests by making it easier. Does Lisp have similar?

> Hey, you even have dead vaporware projects like uuu documented in
> Wikipedia! Cool! (Actually, I don't know that doctest is ten lines in
> Python, but it'd be about ten lines of Lisp, if that, so I'm just
> guessing here.)

Does Lisp have a doctest-like module as part of its standard distribution? Or are you saying that if you ever needed it, then it would be trivial to implement in Lisp, and you would 'roll your own'? There are advantages to doctest being one of Python's standard modules.
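For anyone who has not met it, a minimal doctest is just a captured interpreter session pasted into a docstring; doctest.testmod() re-runs the session and checks the output (a small sketch, mean() being a made-up example function):

def mean(items):
    '''Arithmetic mean of a sequence of numbers.

    >>> mean([1, 2, 3, 4])
    2.5
    '''
    return sum(items) / float(len(items))

if __name__ == '__main__':
    import doctest
    doctest.testmod()   # silent when all examples pass; -v lists each one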



- Paddy.



Date: Wed, Dec 13 2006 2:03 am



Oh, you don't like Wikipedia.
There are a lot of people that use Wikipedia. I think some of them might want to learn to program. I make it easier for them to find Python by helping to maintain Python within Wikipedia.
If I am researching anything then I like to cross-check with information from multiple sites. That's just good practice.
Some people dislike Wikipedia, which is fine. Some people dislike Wikipedia and deliberately sabotage it, which is vandalism.



-Paddy.



Date: Wed, Dec 13 2006 1:15 am



Jesús Carrete Montaña wrote:
> > Fast. Very fast!
> > - Paddy.
>
> Well, Python certainly is faster than most people doing floating-point
> arithmetic by hand, but I don't think this is the correct argument to use
> against Lisp :-P.

Why not!
Lispers can indeed roll-their-own anything; many, it seems, do just that. But others look at the *time saving* libraries available to users of Python and think hmm...

-Paddy.


Wednesday, July 12, 2006

Python Functions: Assignments And Scope

Explaining why this works:

n = [0]
def list_access():
    n[0] = n[0] + 1
    return n

try:
    print "\nlist_access:", list_access()
except UnboundLocalError, inst:
    print " ERROR:\n", inst

And this throws the exception:

m = 0
def int_access():
    m = m + 1
    return m

try:
    print "\nint_access:", int_access()
except UnboundLocalError, inst:
    print " ERROR:\n", inst

To execute a source program, the Python compiler compiles your
original source into 'byte codes' – a form of your program that
is easier for the Python interpreter to later run. In generating this
byte code, the byte code compiler will determine which variable names
in a function are local to that function (so allowing it to optimise
accesses to such local names).

The rule for determining if a variable is local to a function is:

  • If there is a global statement for the name in the function
    then the name is accessed from the global scope.
  • If there is no global statement for the name, and if there
    are assignments to the 'bare' name within the function then the name
    is of local scope.
    ( A bare name assignment means assignment to a name, where the
    name occurs without attribute references, subscripts, or
    slicings; just the bare name).
  • Otherwise the name will be looked up in reverse order of all
    enclosing scopes, until it is found.

In the second example, function int_access, the name m is flagged as
local by the byte code compiler as the bare name is being assigned
to. The interpreter therefore looks for a value of m to increment
only in the local scope, cannot find a value, and so raises the
UnboundLocalError exception.
In function list_access, the bare
name n is not assigned to, so n is found when looking back
through enclosing scopes.
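For completeness, rule one gives the fix for the failing example: declare the name global so no local slot is created (a minimal sketch of the fix, whether or not it is good style):

m = 0
def int_access():
    global m   # m now names the module-level variable
    m = m + 1
    return m

print "\nint_access:", int_access()   # prints 1; no UnboundLocalError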

References

  1. http://groups.google.com/group/comp.lang.python/browse_frm/thread/db9955da70c4e0ca

  2. http://pyref.infogami.com/assignments

  3. http://pyref.infogami.com/naming-and-binding

  4. http://www.python.org/doc/2.4/ref/global.html


END.

Saturday, June 17, 2006

Psyco re-run

I don't normally use Psyco, and did not have it installed with my Python 2.4 installation. Whilst searching for something else however, I came across this simple language speed comparison: Ruby, Io, PHP, Python, Lua, Java, Haskell, and Plain C Fractal Benchmark by Erik Wrenholt and wondered...

So, twenty minutes later I had installed Psyco and, with the addition of an inserted line three to Erik's Mandelbrot.py example of:

import psyco; psyco.full()

The run time went down from 4.6 seconds to 1 second.

The exercise was really to remind me that Psyco is available.
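As Psyco is an optional extra, a common defensive form of that inserted line three is to degrade gracefully when Psyco is absent (a sketch using only the psyco.full() call shown above):

try:
    import psyco
    psyco.full()   # specialising JIT: compile everything it can
except ImportError:
    pass           # no Psyco installed: run at normal CPython speed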

Sunday, May 21, 2006

Function Attributes assigned by decorator

It appears to be straightforward to initialize function attributes by a decorator (the setup function).
The new function factory looks a lot like the class based version of my previous post, and performs similarly.


def make_standard_deviator2():
    '''Generate functions that return running standard deviations
    Uses function attributes applied by decorator
    '''

    def setup(func):
        func.N, func.X, func.X2 = [0.0]*3
        return func

    @setup
    def sd(x):
        '''returns standard deviation.
        Uses function attributes holding running sums
        '''
        from math import sqrt

        sd.N += 1     # Num values
        sd.X += x     # sum values
        sd.X2 += x**2 # sum squares

        if sd.N<2: return 0.0
        return sqrt((sd.N*sd.X2 - sd.X**2)
                    /(sd.N*(sd.N-1)))

    return sd
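Either version is used the same way: feed in readings one at a time and the returned value is the running sample standard deviation. For example (hand-picked values):

sd = make_standard_deviator2()
for x in (2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0):
    running = sd(x)
print running   # ~2.138, the sample standard deviation of the eight values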

Python function attributes

Prompted by a presentation where someone said Python doesn't have full closures (noodle.odp), I looked at the example given and thought first that I could do that with function attributes, and then it hit me: I know of Python's function attributes but had never used them.

I wrote a function generator that, when called, returns a function that, when called with successive numbers, calculates and returns their standard deviation so far. The returned function stores accumulated data between calls as attributes.

The more usual Python way of doing this is to use a Class instance. I created a Class based version too, for comparison.

In speed terms, the Class based solution is only (but consistently) just less than two percent faster than the function generator based solution. In terms of maintainability, though both are readable, I expect most people to be trained in the Class based solution and so be more comfortable with it.
The function generator solution has the initializer section after the inner function definition, which might be a minus; I wonder if a decorator could put the initializer 'up-front'.


#=== file: fn_attributes.py ===

def make_standard_deviator():
    '''Generate functions that return running standard deviations
    Uses function attributes
    '''
    def sd(x):
        '''returns standard deviation.
        Uses function attributes holding running sums
        '''
        from math import sqrt

        sd.N += 1     # Num values
        sd.X += x     # sum values
        sd.X2 += x**2 # sum squares

        if sd.N<2: return 0.0
        return sqrt((sd.N*sd.X2 - sd.X**2)
                    /(sd.N*(sd.N-1)))

    # Initialize attributes
    sd.N, sd.X, sd.X2 = [0.0]*3

    return sd

class Make_standard_deviator(object):
    '''Return running standard deviations
    when instance called as a function
    '''
    def __init__(self):
        self.N, self.X, self.X2 = [0.0]*3

    def __call__(self, x):
        '''Returns standard deviation.
        Uses instance attributes holding running sums
        '''
        from math import sqrt

        self.N += 1     # Num values
        self.X += x     # sum values
        self.X2 += x**2 # sum squares

        if self.N < 2: return 0.0
        return sqrt((self.N*self.X2 - self.X**2)
                    /(self.N*(self.N-1)))

if __name__ == '__main__':
    import timeit

    print "function:",(
        timeit.Timer('[sd(x) for x in xrange(100000)][-1]',
            "from fn_attributes import *; sd = make_standard_deviator()").timeit(number=5)
        )

    print " Class:",(
        timeit.Timer('[sd(x) for x in xrange(100000)][-1]',
            "from fn_attributes import *; sd = Make_standard_deviator()").timeit(number=5)
        )



Saturday, March 18, 2006

What's wrong with Perl

It can be a refreshing tonic of a language when you move to perl
from something like Basic, C or Java.
Perl can do what tens of Unix tools such as sed and awk can do from
the shell, but with more flexibility and higher-order data structures
such as the perl 'hash' or 'associative array'.
You can sit and evolve a perl one-liner of X hundred characters
that will find and process millions of lines from thousands of files
from within hundreds of directories.
When you want to know how to do something else there is The Camel,
either the book or that human store of Perl Knowledge on the next
floor, a few cubicles over.
Before long, you have amassed a large personal perl knowledge
base, and can tackle most of your tasks without having to consult
The Camel. You become more confident in perl and start to code for
others in your team, and write scripts in files rather than
one-liners.

WAKE UP!

Languages have moved on.

Do you use perl references? Happy with them? Do you know that you
can get all that power without all the reference-gymnastics? Indeed,
you can get more power as you are freed from the mundane “how
do I create and use a hash of hashes”, to “O.K. The data
naturally starts as a hash of hashes so...”

Perl subroutines. Even AWK, one of the Unix stalwart languages
that perl was created to supplant, has named arguments to its
functions. The better scripting languages can do a lot better,
allowing you to pass two lists into a function as easily as passing
in two integers, or to return three hashes just as simply, as the
sketch below shows.
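In Python, for instance, that is literal; two lists in and three hashes (dicts) out is just a tuple return (a throwaway sketch with made-up names):

def tally(names, scores):
    "Take two lists; give back three dicts."
    by_name = dict(zip(names, scores))
    evens = dict((n, s) for n, s in by_name.items() if s % 2 == 0)
    odds  = dict((n, s) for n, s in by_name.items() if s % 2 == 1)
    return by_name, evens, odds

all_scores, evens, odds = tally(['ann', 'bob', 'cal'], [3, 4, 7])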

The perl slogan “There is more than one way to do it”
is often cited as one of perl's strengths, and used by many
perl-mongers as their reason to stick with it. But how many of you
actually mean the extended form: “There is more than one way to
do it, and mine is the best”, or, “There is more than one
way to do it, but you will use mine!”, or more likely: “There
is more than one way to do it, and I don't know yours”, or
“There is more than one way to do it, and you will learn
mine!”, or “There is more than one way to do it, but I
don't know how to find any of them in the Camel's index”?

Perl still has goto! (and in its most evil of forms too, the
computed goto: where you have to interpret the program in your head
to work out where the goto might jump to from the listing, because
the goto target is the result of an expression).

So much of perl documentation reads like 1001 recipes for using
1001 features, without any central theme to shape them.

Being an engineer, I am trained in finding patterns and
extrapolation. It is much easier for me to learn a small set of
powerful rules with a straightforward way of combining them. Perl,
unfortunately has a large number of 'gotchas' that bite when you try
to venture out from the recipes and examples given.

Conclusion

If you know perl, and have a need for programs greater than a few
tens of lines of code, then you should invest a week or two of your
time learning a different dynamic language. In that time, put aside
your perl knowledge and try to not just use the other language's
syntax, but learn how things are naturally done in that language.
Nothing may beat the 'perl -p -i -e' one-liner, but proper function
parameters and stricter typing may be better for your longer scripts.

If you are thinking of learning Perl then think twice. Perl's
'sweet-spot' is much diminished as new dynamic languages have emerged
with less tacky support for new methodologies and standards such as
object-oriented programming, program maintenance, functional
programming, XML, multi-language programming, and programming on
multiple frameworks such as .NET and the Java Virtual Machine.

END.



