Thursday, October 29, 2020

On Euler's pentagonal reduction of the partition function.

I watched a great Mathologer video on YouTube which examined the number of integer partitions of natural numbers. Here's the video:

[Embedded Mathologer video]
A partition of a natural number is a way of writing the number as a sum of positive integers; so for 4:

4 = 4
4 = 1 + 3
4 = 2 + 2
4 = 1 + 1 + 2
4 = 3 + 1
4 = 1 + 2 + 1
4 = 2 + 1 + 1
4 = 1 + 1 + 1 + 1

Counting shows there are 8 "partitions". Normally, duplicates when order is disregarded are removed, so that, for example, 3 + 1 is a duplicate of 1 + 3. These are the actual partitions that the video then concentrates on. Again, for four, the actual partitions are:

4 = 4
4 = 2 + 2
4 = 1 + 3
4 = 1 + 1 + 2
4 = 1 + 1 + 1 + 1

giving a count of the partitions as 5.


Partition finder.

The video shows that by considering the n - 1 "gaps" between n blocks as being either open or closed, you can generate all the ordered partitions of n things (i.e. including duplicates).

This led to me writing the following code:

def part_by(count, n):
    " Opening/closing n - 1 gaps between n blocks according to binary count"
    p, num = [], 1
    for _ in range(n - 1):
        if count & 1:
            num += 1          # bit set: gap closed, the current part grows
        else:
            p.append(num)     # bit clear: gap open, finish the current part
            num = 1
        count //= 2
    p.append(num)             # the final part
    return p

def partitions_with_dups(n):
    "All ways of opening/closing n - 1 gaps between n blocks"
    return [part_by(i, n) for i in range(2**(n - 1) if n else 1)]

def partitions(n):
    "partitions removing duplications in order of terms"
    return sorted(set(tuple(sorted(part_by(i, n))) 
                      for i in range(2**(n - 1) if n else 1)))

def partition(n):
    "Count of different partitions irrespective of order"
    return len(partitions(n))


if __name__ == '__main__':
    print("partition(n) for n from 0... = ", 
          [partition(n) for n in range(11)])

Running the code produces the correct sequence of partition(n):

partition(n) for n from 0... =  [1, 1, 2, 3, 5, 7, 11, 15, 22, 30, 42]
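
For example, a quick interactive check of the intermediate functions for n = 3:

>>> partitions_with_dups(3)
[[1, 1, 1], [2, 1], [1, 2], [3]]
>>> partitions(3)
[(1, 1, 1), (1, 2), (3,)]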

Euler's reduction/formula

Chapter 2 of the video develops a modified Fibonacci-like algorithm to generate the sequence. I coded the algorithm like this, and submitted it to a Rosetta Code task written by someone else who was also inspired by the video.
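
The recurrence it implements, from Euler's pentagonal number theorem, is:

p(n) = p(n-1) + p(n-2) - p(n-5) - p(n-7) + p(n-12) + p(n-15) - ...

where the offsets 1, 2, 5, 7, 12, 15, ... are the generalized pentagonal numbers k(3k - 1)/2 for k = 1, -1, 2, -2, ..., the signs repeat in the pattern +, +, -, -, and p(0) = 1.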

from itertools import islice


def posd():
    "diff between position numbers. 1, 2, 3... interleaved with  3, 5, 7..."
    count, odd = 1, 3
    while True:
        yield count
        yield odd
        count, odd = count + 1, odd + 2

def pos_gen():
    "position numbers. 1 3 2 5 7 4 9 ..."
    val = 1
    diff = posd()
    while True:
        yield val
        val += next(diff)
                
def plus_minus():
    "yield (list_offset, sign) or zero for Partition calc"
    n, sign = 0, [1, 1]
    p_gen = pos_gen()
    out_on = next(p_gen)
    while True:
        n += 1
        if n == out_on:
            next_sign = sign.pop(0)
            if not sign:
                sign = [-next_sign] * 2
            yield -n, next_sign
            out_on = next(p_gen)
        else:
            yield 0
            
def part(n):
    "Partition numbers"
    p = [1]
    p_m = plus_minus()
    mods = []
    for _ in range(n):
        next_plus_minus = next(p_m)
        if next_plus_minus:
            mods.append(next_plus_minus)
        p.append(sum(p[offset] * sign for offset, sign in mods))
    return p[-1]

        
print("(Intermediaries):")
print("    posd:", list(islice(posd(), 10)))
print("    pos_gen:", list(islice(pos_gen(), 10)))
print("    plus_minus:", list(islice(plus_minus(), 15)))
print("\nPartitions:", [part(x) for x in range(15)])

The narrator goes on to show how a slightly modified machine can be used to generate prime numbers - the modification makes it compute the sum of the divisors of n, and n is prime exactly when that sum is n + 1 - which I coded as the following addition to the last chunk of code:

def par_primes():
    "Prime number generator from the partition machine"
    p = [1]
    p_m = plus_minus()
    mods = []
    n = 0
    while True:
        n += 1
        next_plus_minus = next(p_m)
        if next_plus_minus:
            mods.append(next_plus_minus)
        p.append(sum(p[offset] * sign for offset, sign in mods))
        if p[0] + 1 == p[-1]:  # sum of divisors of n is n + 1, so n is prime
            yield p[0]
        p[0] += 1  # p[0] doubles as the current n fed into the machine

print("\nPrimes:", list(islice(par_primes(), 15)))

The output from the previous two code sections is this: 

(Intermediaries):
    posd: [1, 3, 2, 5, 3, 7, 4, 9, 5, 11]
    pos_gen: [1, 2, 5, 7, 12, 15, 22, 26, 35, 40]
    plus_minus: [(-1, 1), (-2, 1), 0, 0, (-5, -1), 0, (-7, -1), 0, 0, 0, 0, (-12, 1), 0, 0, (-15, 1)]

Partitions: [1, 1, 2, 3, 5, 7, 11, 15, 22, 30, 42, 56, 77, 101, 135]

Primes: [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]

This Euler algorithm is very fast, as the old one needed to generate all 2**(n - 1) ordered partitions.
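
As a rough, machine-dependent comparison - a minimal sketch that assumes part() from the Euler code and partition() from the gap-based code are both in scope:

import timeit

# Rough timing comparison; absolute numbers vary by machine.
# part() uses Euler's recurrence; partition() generates 2**(n - 1) lists.
for fn in (part, partition):
    print(f"{fn.__name__}(20):", timeit.timeit(lambda: fn(20), number=1))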

Other (recursive) method.

I had originally thought to solve the Rosetta Code task by searching for an easier algorithm and implementing that. 

I found this Stack Overflow answer, which gave a recursive formula that seemed more straightforward.

I did find one issue with the explanation but coded the algorithm up as follows:

from functools import lru_cache
import sys

sys.setrecursionlimit(10_000_000)
    

@lru_cache(maxsize=None)
def _p(n, m):
    if n < 0: 
        return 0
    elif n == 0: 
        return 1
    elif n / 2 < m < n:
        return 1
    else:
        return sum(_p(n - i, i) for i in range(m, n+1))
        

def p(n):
    return _p(n, 1)

#%%
for n in [666] + list(range(1000, 6667, 50)) + [6666]:
    print(n, p(n))

Ooh such short and succinct code!
You might be able to tell, though, that it too has efficiency issues. It will calculate p(666), but the Rosetta Code task asks for the calculation of p(6666) - with four sixes, not three. If I just ask for p(6666) then Python crashes, silently! (Raising the recursion limit lets Python recurse past the point where the underlying C stack overflows.)

If I gradually jump up to p(6666) then it eventually gives the answer of 193655306161707661080005073394486091998480950338405932486880600467114423441282418165863. I think the cache needs to be grown gradually to limit the recursion depth. This routine took hours to get to p(6666), where Euler's algorithm took milliseconds.

Note the following cache information after calculating p(6666):

>>> _p.cache_info()
CacheInfo(hits=21_921_546_630, misses=22_221_112, maxsize=None, currsize=22_221_112)
>>> 

END.


Sunday, October 11, 2020

Python dataclass __hash__ generation

I was reading a blog explaining dataclasses and was confused by its description of the unsafe_hash parameter.

Reading the docs was also confusing, so I decided to write a program that generates a truth table of all the parameters said to govern __hash__ generation for dataclasses, and checks the __hash__ generated in each case.

The code

# -*- coding: utf-8 -*-
"""

__hash__ generation for @dataclass()

    Documentation could be better w.r.t. unsafe_hash:
      https://docs.python.org/3/library/dataclasses.html?highlight=unsafe_hash
    
    Parameters:
      eq:           default True,  equality checks as a tuple of values.
      frozen:       default False, stops field re-assignment.
      unsafe_hash   default False, Forces the gen of a __hash__ method 
                                   irrespective of eq and frozen values.
      
    __hash__ methods on a dataclass:
      Ref: https://docs.python.org/3/reference/datamodel.html#object.__hash__
      
      If set to None:   class is unhashable.
      If present:       class is hashable
      If absent:        class is unhashable; eq *should* be false too.
      
Created on Sun Oct 11 10:12:17 2020

@author: Paddy3118


"""

from dataclasses import dataclass
from itertools import product


print(__doc__ + """
TABLE: generated by testing the generated dataclass

unsafe_hash, eq, frozen => Dataclass __hash__ state""")
for unsafe_hash, eq, frozen in product((False, True), repeat=3):
    @dataclass(eq=eq, frozen=frozen, unsafe_hash=unsafe_hash)
    class Dc:
        x: int
    
    if '__hash__' not in Dc.__dict__:
        state = 'ABSENT'
    elif Dc.__dict__['__hash__'] is None:
        state = 'SET_NONE (Unhashable)'
    else:
        state = 'PRESENT'
    print(f"{unsafe_hash!s:5}, {eq!s:5}, {frozen!s:5} => {state}")
    del Dc
    
    

The output:

__hash__ generation for @dataclass()
    
    Documentation could be better w.r.t. unsafe_hash:
      https://docs.python.org/3/library/dataclasses.html?highlight=unsafe_hash
    
    Parameters:
      eq:           default True,  equality checks as a tuple of values.
      frozen:       default False, stops field re-assignment.
      unsafe_hash   default False, Forces the gen of a __hash__ method 
                                   irrespective of eq and frozen values.
    
    __hash__ methods on a dataclass:
      Ref: https://docs.python.org/3/reference/datamodel.html#object.__hash__
      
      If set to None:   class is unhashable.
      If present:       class is hashable
      If absent:        class is unhashable; eq *should* be false too.

Created on Sun Oct 11 10:12:17 2020

@author: Paddy3118



TABLE: generated by testing the generated dataclass

unsafe_hash, eq, frozen => Dataclass __hash__ state
False, False, False => ABSENT
False, False, True  => ABSENT
False, True , False => SET_NONE (Unhashable)
False, True , True  => PRESENT
True , False, False => PRESENT
True , False, True  => PRESENT
True , True , False => PRESENT
True , True , True  => PRESENT


Conclusion:

Set eq and frozen as appropriate and the __hash__ method is likely to be what you want. Only set unsafe_hash if you know what you are doing.
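
As a quick check - a minimal sketch, with Point being just an illustrative class of my own - eq=True with frozen=True gives hashable instances that work in sets and as dict keys:

from dataclasses import dataclass

@dataclass(eq=True, frozen=True)
class Point:
    x: int
    y: int

# eq=True and frozen=True generate a __hash__ from the field values,
# so the two equal points collapse to a single set member.
assert len({Point(1, 2), Point(1, 2), Point(3, 4)}) == 2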


End.

Tuesday, October 06, 2020

Python 3.9 dict union op: keys at odds with values

In short: keys preserve the first while values preserve the last, w.r.t. duplicates.

Given two dicts with keys that compare equal, under union the first key is kept.

Given two dicts with keys that compare equal, under union the second value is kept.

Where "equal" here means comparing equal with ==, we have the following:

Python 3.9.0 (tags/v3.9.0:9cf6752, Oct  5 2020, 15:34:40) [MSC v.1927 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license()" for more information.
>>> 1.0 == 1 == 1.0 + 0.0j
True
>>> 

The number one expressed as an int, float, or complex number compares equal in each case despite the type differences - which is fine.

 Values of duplicated keys  

When creating the union of dicts that have the "same" keys, PEP 584 states which values should be kept: the value in the dict to the right of the "|" operator. Unfortunately, this is contrary to what happens to the keys.
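
For example, the value from the right-hand dict wins:

>>> {'a': 1} | {'a': 2}
{'a': 2}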

dict keys are now ordered

With the recently introduced insertion ordering for dict keys, the first occurrence of a dict key is naturally the one that is kept, and this is indeed the case when using keys that compare equal but are of different types:

>>> m, n, o = {1:0, 2:0, 3:0, 4:0}, {2.0: 0, 3.0: 0}, {3.0+0.0j: 0}
>>> m | n | o
{1: 0, 2: 0, 3: 0, 4: 0}
>>> o | n | m
{(3+0j): 0, 2.0: 0, 1: 0, 4: 0}
>>> 

It's shown in the keys above. 

In unions, which keys and which values are kept differ!

You have the strange case that a given key-value pair from a union may not have appeared in any of the input dicts (if types are also compared).


>>> p, q, r = {1:1, 2:2, 3:3, 4:4}, {2.0: 20, 3.0: 30}, {3.0+0.0j: 300}
>>> p | q | r
{1: 1, 2: 20, 3: 300, 4: 4}
>>> r | q | p
{(3+0j): 3, 2.0: 2, 1: 1, 4: 4}
>>> 

Note, for example, in the last output above, the key-value pair of float 2.0 and int 2 does not appear in any of the input dicts p, q, or r.

Can we do better?

Not sure. The PEP gives valid reasons for the last value being kept. Insertion ordering is a good reason for the first-seen keys being kept - as is the fact that the resulting key ordering matches that of sets under the same operator:

>>> s, t, u = {1, 2, 3, 4}, {2.0, 3.0}, {3.0+0.0j}
>>> s | t | u
{1, 2, 3, 4}
>>> u | t | s
{1, 2.0, (3+0j), 4}
>>> 
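
If you did want the key, as well as the value, of the last duplicate to win, one workaround - a minimal sketch, with merge_keep_last being my own name rather than anything from the library - is to remove any equal-comparing key before re-inserting:

def merge_keep_last(*dicts):
    "Union where both the key and the value of the last duplicate win."
    merged = {}
    for d in dicts:
        for k, v in d.items():
            merged.pop(k, None)  # drop any equal-comparing key already present
            merged[k] = v        # re-insert, so this key object and value win
    return merged

For the dicts p, q, and r above:

>>> merge_keep_last(p, q, r)
{1: 1, 4: 4, 2.0: 20, (3+0j): 300}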

It is only a mild niggle for me, but I do need to be aware of this quirk.