Go deh!

Mainly Tech projects on Python and Electronic Design Automation.

Tuesday, April 27, 2021

Read PEP 636 on Structural Pattern Matching: The Tutorial

 Hi, I am just reading through PEP 636 -- Structural Pattern Matching: Tutorial. I'll give my views as I go along...

I really liked the choice of example - a text adventure. I am old enough to relate to it, and I guess a quick google would inform others.

Section "Matching sequences" serves as a gentle introduction to a match statement with one case

Subsequent sections expand on aspects of match: literals seem to match literally; names, (also called variables), when matched are also bind-to for use in the stastements, (plueral), of the case block.

skipping a few sections to something that is stated, that is nevertheless a departure from previous uses, that is the treatment of the underscore.

Underscores

In section "Adding a wildcard" underscore becomes more than just a valid name, as it is outside the match statement. 

Underscore always matches an object - just like any other identifier name; but does not get assigned the object it matches - like any other name.

Their example:

match command.split():
    case ["quit"]: ... # Code omitted for brevity
    case ["go", direction]: ...
    case ["drop", *objects]: ...
    ... # Other cases
    case _:
        print(f"Sorry, I couldn't understand {command!r}")

You cannot write the following

match command.split():
    case ["quit"]: ... # Code omitted for brevity
    case ["go", direction]: ...
    case ["drop", *objects]: ...
    ... # Other cases
    case _:
        print(f"Sorry, I couldn't understand {_!r}")

As _ is not assigned. Is it de-assigned in this block or will an earlier assignment outside the match statement still exist?


I am curious as to the decision for this extra non-assignment behaviour for underscore?

Testing Scope and Assignment

Lets expand on their example assigning names globally and trying different commands to match different case clauses:


_, direction, objects = "_ global", "direction global", "objects global"
print(f"{(_, direction, objects)=}")

command = "Nothing to match"
for command in ["quit", "quit me", "go south", "drop anvil arrows"]:
    print(f"\n## COMMAND IS: {command!r}\n")
    match command.split():
        case ["quit"]:
            print("quit")
            print(f"{(_, direction, objects)=}")
        case ["go", direction]:
            print("go", direction)
            print(f"{(_, direction, objects)=}")
        case ["drop", *objects]:
            print("drop", objects)
            print(f"{(_, direction, objects)=}")
        case _:
            print(f"Sorry, I couldn't understand {command!r}")
            print(f"{(_, direction, objects)=}")

Output:

xx

Python 3.10.0a7 (tags/v3.10.0a7:53e5529, Apr  6 2021, 10:20:47) [MSC v.1928 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license()" for more information.
>>> 
======== RESTART: C:/Users/paddy/Google Drive/Code/python_3_10_match.py ========
(_, direction, objects)=('_ global', 'direction global', 'objects global')

## COMMAND IS: 'quit'

quit
(_, direction, objects)=('_ global', 'direction global', 'objects global')

## COMMAND IS: 'quit me'

Sorry, I couldn't understand 'quit me'
(_, direction, objects)=('_ global', 'direction global', 'objects global')

## COMMAND IS: 'go south'

go south
(_, direction, objects)=('_ global', 'south', 'objects global')

## COMMAND IS: 'drop anvil arrows'

drop ['anvil', 'arrows']
(_, direction, objects)=('_ global', 'south', ['anvil', 'arrows'])
>>> 

 Concentrating on underscore, we see that it always has the initially assigned global value.

Once direction is changed in the case statement, however, it is this global value that is changed and the next iteration of the loop that does not match to it shows its new changed value.


The special treatment of underscore seems odd to me? Why not assign to it like normal and have only the special treatment of checking that if it is the only pattern then that pattern is the last pattern?

As

Reading section "Capturing matched sub-patterns" I kind of thought ahead and was thinking "another use for the walrus operator", until the use of the as keyword revealed itself :-)

"as" reads well, I like it.


Type and attribute matching

Yay! Duck type friendly matching of attribute names.

Outside of a match statement a call of MyObject might require arguments, but it seems that MyObject() is valid in a match statement, regardless - something to remember.


END?

I'll stop here for now although there is more of the doc to read.





Tuesday, April 06, 2021

Wednesday, February 17, 2021

Longest Common Substring: An investigation using Python.

 (Best browsed on a larger than phone screen)

Saturday, February 13, 2021

Python: Counting without Counter

Normally, if you are counting things then you need to first think of collections.Counter. It's in the standard library and I use it a lot myself, but lately, I've had two instances where I was counting, and Counter was not the best fit!

1. When the keys to count are known, adjacent, integers. 

Or can be easily manipulated to be

In this case you can replace counter by an array of zeroes and increment array[key] whenever you see key.

I used this in code to bin numbers. The bin limits are set and sorted, so you know exactly how many bins are needed. numbers to bin are first compared with the limits to work out which bin they occupy and then that bin is incremented:

from bisect import bisect_right
 
def bin_it(limits: list, data: list) -> list:
    "Bin data according to (ascending) limits."
    bins = [0] * (len(limits) + 1)      # adds under/over range bins too
    for d in data:
        bins[bisect_right(limits, d)] += 1
    return bins

I have replaced hash lookup in Counter with array indexing.

2. When defaultdict(int) is better

Another way to count arbitrary objects is to use defaultdict(int) and just increment item counts with the defauldict automatically starting the count at zero from its initialising call of int().

I was splitting words in a book then working out the frequencies of what words follow others with an aim to randomly create sentencies where the following word would by a weighted random choice of the words in the book that followed a word.

So if the book were just two sentences it would be broken down as:

>>> book_text = 'The cat sat on the mat. The mat is that upon which the cat sat'
>>> words = remove_punctuation(book_text).split()
>>> words
['the', 'cat', 'sat', 'on', 'the', 'mat', '.', 'the', 'mat', 'is', 'that', 'upon', 'which', 'the', 'cat', 'sat', '.']
>>> 

I need to count how many times 'cat' comes after 'the'; 'sat' comes after 'cat', ... I also want to be able to to access 'the' and find all the words that it precedes, together with a count of how many times that word directly succeeds 'the'. 

I want a dictionary with key 'word' that returns a dictionary who's keys are all the immediate successor words, and who's values are the counts of how many times the successor word is found to the original word.

Something like the following nested dict:

{'the': {'cat': 2, 'mat': 2},
 'cat': {'sat': 2},
 'sat': {'on': 1, '.': 1},
 'on': {'the': 1},
 'mat': {'.': 1, 'is': 1},
 '.': {'the': 1},
 'is': {'that': 1},
 'that': {'upon': 1},
 'upon': {'which': 1},
 'which': {'the': 1}}

 

See how it shows that 'the' is followed twice by 'cat' and once by'mat'; 'sat' is followed once by 'on' and once by a full stop '.', etc.

The data structure needs to be an outer dict that will automatically create an inner counter dict. I use lowercase counter because that inner dict can be either a collections.Counter or a defaultdict(int) used as a counter. Since I need a defaultdict at the outer level, and don't need any of the Counter special methods, I chose to use defaultdict for the nested dicts too. 

That leads to the following definition and population of the datastructure:

>>> word2next = defaultdict(lambda :defaultdict(int))
>>> for lh, rh in zip(words, words[1:]):
    word2next[lh][rh] += 1

    
>>> assert word2next == {'the': {'cat': 2, 'mat': 2},
                     'cat': {'sat': 2},
                     'sat': {'on': 1, '.': 1},
                     'on': {'the': 1},
                     'mat': {'.': 1, 'is': 1},
                     '.': {'the': 1},
                     'is': {'that': 1},
                     'that': {'upon': 1},
                     'upon': {'which': 1},
                     'which': {'the': 1}}
>>> 
 
zip(*word2next[some_word].items()) then returns a list of successor words and a list of successor_word_counts that can then be used as arguments to random.choices() to randomly select the next word in a Markov Chain sentence generator based on the book text.  

Conclusion

When counting odd stuff in Python then collections.Counter should always be considered, but it isn't the only way to count; arrays initialised to zero, and  sometimes defaultdict(int), might also be worth considering.


End.




Followers

Subscribe Now: google

Add to Google Reader or Homepage

Go deh too!

whos.amung.us

Blog archive