Mainly Tech projects on Python and Electronic Design Automation.

Sunday, March 23, 2025

Incremental combinations without caching

 

Irie server room

Someone had a problem: they received initial data d1 and processed all r-combinations of it, but by the time they had finished, they checked and found there was now extra data d2. In total, they needed to process the r-combinations of all the data, d1+d2.

They didn't want to process any combination twice, and the solutions given seemed to generate and store the combinations of d1, then generate the combinations of d1+d2, testing and rejecting any combination that had been seen before.
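For comparison, the store-and-reject approach described above might look something like this minimal sketch. The function name cached_extra is hypothetical, and it assumes d1 and d2 are sequences of the same type (so d1 + d2 works) with distinct items. The cost is that seen has to hold every r-combination of d1.

```python
from itertools import combinations

def cached_extra(d1, d2, r):
    """Return the r-combinations of d1+d2 that were not already made from d1.

    Store-and-reject: every r-combination of d1 is kept in a set, and each
    combination of d1+d2 is checked against it.
    """
    seen = set(combinations(d1, r))            # cache of the earlier work
    return [c for c in combinations(d1 + d2, r) if c not in seen]

extra = cached_extra('abcd', 'ef', 3)
print(len(extra))  # 6C3 - 4C3 = 20 - 4 = 16 new combinations
```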

It seems like a straightforward answer that is easy to follow, but I thought: is there a way to generate just the extra combinations, without storing all the combinations from before?

My methods

Out comes VS Code as my IDE. I'll be playing with a lot of code that will end up deleted, modified and rerun. I could use a Jupyter notebook, but I can't publish notebooks to my blog satisfactorily, so I develop a .py file with cells instead: a line comment of # %% visually splits the file into cells in the IDE, adding buttons and editor commands to execute cells and selected code, in any order, on a restartable kernel in an interactive window that also runs IPython.

When doodling like this, I often create long lines, spaced to highlight comparisons with other lines. I refactor names to be concise at the time, as what you can take in at a glance helps you find patterns. Because of that, this blog post is not written to be read on the small screens of phones.

So, it's combinations: binomials, nCr.
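A quick reminder of the two standard-library tools involved: itertools.combinations yields each r-element selection once, keeping the input order within each tuple, and math.comb (Python 3.8+) gives how many there are.

```python
from itertools import combinations
from math import comb

n, r = 5, 3
combos = list(combinations(range(1, n + 1), r))
print(combos[:3])                # [(1, 2, 3), (1, 2, 4), (1, 2, 5)]
print(len(combos), comb(n, r))   # 10 10
```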

A start

  • The first thing was to try to see patterns in the combinations of d1+d2 minus those of d1 alone.
  • I'm dealing with sets of values. Sets are unordered, so at some point I will need a function that prints them in sorted order to aid pattern finding.
  • Combinations can be large, so use combinations of ints first, then later work with any type.
  • I started with combinations of the ints 0 to n-1 but later found them more awkward to reason about, so I changed to combinations of the ints 1 to n. Extending to n+x then adds the ints n+1 to n+x to the combinations.

#!/usr/bin/env python3
"""
Check patterns in extending combinations nCr
"""

# %%
from functools import reduce
from itertools import combinations
from pprint import pformat
from typing import Any, Iterable

# %%
def nCr(n, r):
    return combinations(range(1, n+1), r=r)

def pf_set(s: set) -> str:
    "Format set with sorted elements for printing"
    return f"{{{pformat(sorted(s), width=120, compact=True)[1:-1]}}}"

Diffs

In the following cell I create the combinations for r = 3 and each n in n_range, storing them in the dict c, then print them along with the differences between combinations whose n differs by 1 and by 2.

# %%

r = 3
n_range = range(3, 7)
print(f"\n# Investigate combinations nCr for {r=}, and {n_range=}\n")

# generate some combinations
c = {n: set(nCr(n, r)) for n in n_range}

print("c = {")
for key, val in c.items():
    print(f" {key:2}: {pf_set(val)},")
print("    }")


def pp_diffsby(c, n_range, r, delta_n):
    "Print nCr in c diffs by delta_n"
    print(f"\nDiffs by {delta_n}")
    all_n = list(n_range)
    n1 = all_n[0]
    print(f"      {n1}C{r} = {pf_set(c[n1])}")
    for n1, n2 in zip(all_n, all_n[delta_n:]):
        print(f"{n2}C{r} - {n1}C{r} = {pf_set(c[n2] - c[n1])}")


pp_diffsby(c, n_range, r, 1)
pp_diffsby(c, n_range, r, 2)

Cell output:

# Investigate combinations nCr for r=3, and n_range=range(3, 7)

c = {
  3: {(1, 2, 3)},
  4: {(1, 2, 3), (1, 2, 4), (1, 3, 4), (2, 3, 4)},
  5: {(1, 2, 3), (1, 2, 4), (1, 2, 5), (1, 3, 4), (1, 3, 5), (1, 4, 5), (2, 3, 4), (2, 3, 5), (2, 4, 5), (3, 4, 5)},
  6: {(1, 2, 3), (1, 2, 4), (1, 2, 5), (1, 2, 6), (1, 3, 4), (1, 3, 5), (1, 3, 6), (1, 4, 5), (1, 4, 6), (1, 5, 6),
 (2, 3, 4), (2, 3, 5), (2, 3, 6), (2, 4, 5), (2, 4, 6), (2, 5, 6), (3, 4, 5), (3, 4, 6), (3, 5, 6), (4, 5, 6)},
    }

Diffs by 1
      3C3 = {(1, 2, 3)}
4C3 - 3C3 = {(1, 2, 4), (1, 3, 4), (2, 3, 4)}
5C3 - 4C3 = {(1, 2, 5), (1, 3, 5), (1, 4, 5), (2, 3, 5), (2, 4, 5), (3, 4, 5)}
6C3 - 5C3 = {(1, 2, 6), (1, 3, 6), (1, 4, 6), (1, 5, 6), (2, 3, 6), (2, 4, 6), (2, 5, 6), (3, 4, 6), (3, 5, 6), (4, 5, 6)}

Diffs by 2
      3C3 = {(1, 2, 3)}
5C3 - 3C3 = {(1, 2, 4), (1, 2, 5), (1, 3, 4), (1, 3, 5), (1, 4, 5), (2, 3, 4), (2, 3, 5), (2, 4, 5), (3, 4, 5)}
6C3 - 4C3 = {(1, 2, 5), (1, 2, 6), (1, 3, 5), (1, 3, 6), (1, 4, 5), (1, 4, 6), (1, 5, 6), (2, 3, 5), (2, 3, 6), (2, 4, 5),
 (2, 4, 6), (2, 5, 6), (3, 4, 5), (3, 4, 6), (3, 5, 6), (4, 5, 6)}

Patterns

Looking at the diff 4C3 - 3C3, it is as if someone took 3C2 = {(1,2), (1,3), (2,3)} and tagged the extra 4 onto every inner tuple.
Let's call this modification extending.

5C3 - 4C3 seems to follow the same pattern: it is 4C2 with 5 tagged onto every tuple.
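One quick consistency check on that reading of the table: if (n+1)C3 - nC3 is nC2 with n+1 tagged on, the diff sizes 3, 6 and 10 shown above should equal 3C2, 4C2 and 5C2. A small sketch using plain set comprehensions, so it does not depend on any function defined in this post:

```python
from itertools import combinations
from math import comb

r = 3
sizes = []
for n in range(3, 6):
    old = set(combinations(range(1, n + 1), r))   # nCr
    new = set(combinations(range(1, n + 2), r))   # (n+1)Cr
    diff = new - old
    # Claimed pattern: nC(r-1) with n+1 tagged onto every tuple
    tagged = {(*t, n + 1) for t in combinations(range(1, n + 1), r - 1)}
    assert diff == tagged
    sizes.append(len(diff))
    print(f"{n + 1}C{r} - {n}C{r}: {len(diff)} tuples, {n}C{r - 1} = {comb(n, r - 1)}")
```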

Function extend

# %%

def extend(s: set[tuple[Any, ...]], value: Any) -> set[tuple[Any, ...]]:
    """
    Returns set of tuples of s with each tuple extended by value
    """
    return set((*tpl, value) for tpl in s)

s = {(0, 1), (1, 2), (0, 2)}
print(f"{s = }; {extend(s, 3) = }")
assert extend(s, 3) == {(0, 1, 3), (1, 2, 3), (0, 2, 3)}

Rename nCr to bino and check extend works

nCr originally worked with 0..n-1 and bino with 1..n. Now they both use 1..n.

# %%

# binomial combinations of ints 1..

def bino(n: int, r: int) -> set[tuple[int, ...]]:
    """
    All combinations of 1..n ints taken r at a time

    bino(4, 3) == {(1, 2, 3), (1, 2, 4), (1, 3, 4), (2, 3, 4)}
    """
    return set(combinations(range(1, n+1), r))


print(f"{(bino(4, 3) == (bino(3, 3) | extend(bino(3, 2), value=4)))  =  }")

Cell output:

(bino(4, 3) == (bino(3, 3) | extend(bino(3, 2), value=4)))  =  True

Pascal

After finding that pattern I went searching for it using Gemini AI. My question was:
show that comb(n+1, r) = comb(n, r) + (n+1)* comb(n, r-1)
The answer said I had got my text prompt wrong, and mentioned Pascal's rule.
I only scanned the page, as I had no time for how things were expressed, but it seemed reasonable that I had the algorithm right, and that there were relations of some kind for bigger differences in n.
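For reference, here is the counting form of Pascal's rule. The second term has no (n+1) factor: the r-combinations of 1..n+1 either leave out n+1 (exactly the r-combinations of 1..n) or contain it (an (r-1)-combination of 1..n with n+1 appended). Appending n+1 adds one combination per (r-1)-combination rather than multiplying the count, which matches the set-level pattern found above.

```latex
\binom{n+1}{r} = \binom{n}{r} + \binom{n}{r-1}
\qquad\text{e.g.}\qquad
\binom{4}{3} = \binom{3}{3} + \binom{3}{2} = 1 + 3 = 4
```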

Pascal's rule checker

I wrote a function to do the check, then exercised it a few times (those calls are now deleted).

# %%
def pascals_rule(n: int, r: int) -> bool:
    "check C(n+1, r) == C(n, r) | extend(C(n, r-1), n + 1)"
    return bino(n + 1, r) == bino(n, r) | extend(bino(n, r - 1), n + 1)

assert pascals_rule(6, 3)

Diff by 1 extra item is "done"; now to attempt diff by 2.

Looking back at the diffs by 2 table for patterns, I thought I might need different types of extension function, each modifying the tuples within a set in a different way. It seemed "mathematical", so...

# %%

# Some functions that may be needed

def extend_mul(s: set[tuple[Any, ...]], e: Iterable) -> set[tuple[Any, ...]]:
    """
    set where each tuple of s is extended in turn, by every item in e

    s = {(1, 2, 3), (1, 2, 4), (1, 3, 4), (2, 3, 4)}
    extend_mul(s, (5, 6)) == {(1, 2, 3, 5), (1, 2, 3, 6),
                              (1, 2, 4, 5), (1, 2, 4, 6),
                              (1, 3, 4, 5), (1, 3, 4, 6),
                              (2, 3, 4, 5), (2, 3, 4, 6)}
    """
    items = tuple(e)  # e is reused for every tuple of s, so save a one-shot iterator first
    return {(*t, item) for t in s for item in items}

def extend_muli(s: set[tuple[Any, ...]], e: Iterable) -> set[tuple[Any, ...]]:
    """
    set where each tuple of s is extended in turn, by every *item in e

    s = {(1, 2, 3), (1, 2, 4), (1, 3, 4), (2, 3, 4)}
    extend_muli(s, ((5,), (6,))) == {(1, 2, 3, 5), (1, 2, 3, 6),
                                     (1, 2, 4, 5), (1, 2, 4, 6),
                                     (1, 3, 4, 5), (1, 3, 4, 6),
                                     (2, 3, 4, 5), (2, 3, 4, 6)}
    """
    items = tuple(e)  # e is reused for every tuple of s
    return {(*t, *item) for t in s for item in items}

def extend_add(s: set[tuple[Any, ...]], e: Iterable) -> set[tuple[Any, ...]]:
    """
    set where each tuple of s is extended once, by all the items of e

    s = {(1, 2, 3), (1, 2, 4), (1, 3, 4), (2, 3, 4)}
    extend_add(s, (5, 6)) == {(1, 2, 3, 5, 6), (1, 2, 4, 5, 6),
                              (1, 3, 4, 5, 6), (2, 3, 4, 5, 6)}
    """
    items = tuple(e)  # e is reused for every tuple of s
    return {(*t, *items) for t in s}
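The three helpers overlap. Going by the definitions above, extend_mul is extend_muli with every item of e wrapped in a 1-tuple, and extend_add is extend_muli with a single tuple holding all of e. The sketch below repeats minimal versions of the helpers so it runs on its own, and checks those two identities on the docstring example.

```python
def extend_mul(s, e):
    return {(*t, item) for t in s for item in e}

def extend_muli(s, e):
    return {(*t, *item) for t in s for item in e}

def extend_add(s, e):
    return {(*t, *e) for t in s}

s = {(1, 2, 3), (1, 2, 4), (1, 3, 4), (2, 3, 4)}
e = (5, 6)

# extend_mul: each item of e becomes a 1-tuple for extend_muli
assert extend_mul(s, e) == extend_muli(s, tuple((item,) for item in e))
# extend_add: all of e becomes the single tuple used by extend_muli
assert extend_add(s, e) == extend_muli(s, (tuple(e),))
print(sorted(extend_add(s, e)))
# [(1, 2, 3, 5, 6), (1, 2, 4, 5, 6), (1, 3, 4, 5, 6), (2, 3, 4, 5, 6)]
```

That wrapping is also what tuple(combinations(extra_n, 1)) supplies in pascal_rule_3: combinations(..., 1) yields 1-tuples, so extend_muli with it behaves like extend_mul.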


Diff by 2 pattern finding

It was incremental: find a pattern in 3C? that covers part of 5C3 - 3C3 and subtract it, then find a pattern in 3C? that covers part of the remainder; repeat (where ? <= 3).

I also shortened the function name pf_set to pf so that it takes less space when printing self-documenting expressions in f-strings.
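The {expr = } form used in these f-strings is Python's self-documenting expression syntax (Python 3.8 and later). It prints the expression text, then =, then the repr of its value, keeping any spaces around the =. That repr is why the pf(...) results in the cell output appear in quotes. A short illustration:

```python
x = 3
s = {3, 1, 2}

line1 = f"{x = }"            # spaces around '=' are kept
line2 = f"{x+1=}"            # no spaces
line3 = f"{sorted(s) = }"    # any expression works
line4 = f"{'abc' = }"        # the value is shown using repr(), hence the quotes
for line in (line1, line2, line3, line4):
    print(line)
# x = 3
# x+1=4
# sorted(s) = [1, 2, 3]
# 'abc' = 'abc'
```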

# %%

print(f"To simplify formatted printing of sets: {(pf:=pf_set) = }\n")

print("# Looking again at the diffs by 2 i.e. `5C3 - 3C3`")
n, r, x = 3, 3, 2

print(f"5C3 - 3C3 = {pf(bino(5, 3) - bino(3, 3))}")
print(f"\n  There's {pf(bino(3, 3-1)) = } in there with each tuple extended by 4 and by 5 ")
print(f"\n  {pf(extend_mul(bino(3, 3-1), (3+1, 3+2))) = }")
print(f"\n  Whats left: {pf(tmp1 := (bino(5, 3) - bino(3, 3) - extend_mul(bino(3, 3-1), (3+1, 3+2)))) = }")
print(f"\n    Now {pf(bino(3, 3-2)) = }")
print(f"\n    So {pf(tmp2 := (extend_add(bino(3, 3-2), (3+1, 3+2)))) = }")
print(f"\n  Finally: {pf(tmp1 - tmp2) = }")

Cell output:

To simplify formatted printing of sets: (pf:=pf_set) = <function pf_set at 0x7f1bd45365c0>

# Looking again at the diffs by 2 i.e. `5C3 - 3C3`
5C3 - 3C3 = {(1, 2, 4), (1, 2, 5), (1, 3, 4), (1, 3, 5), (1, 4, 5), (2, 3, 4), (2, 3, 5), (2, 4, 5), (3, 4, 5)}

  There's pf(bino(3, 3-1)) = '{(1, 2), (1, 3), (2, 3)}' in there with each tuple extended by 4 and by 5

  pf(extend_mul(bino(3, 3-1), (3+1, 3+2))) = '{(1, 2, 4), (1, 2, 5), (1, 3, 4), (1, 3, 5), (2, 3, 4), (2, 3, 5)}'

  Whats left: pf(tmp1 := (bino(5, 3) - bino(3, 3) - extend_mul(bino(3, 3-1), (3+1, 3+2)))) = '{(1, 4, 5), (2, 4, 5), (3, 4, 5)}'

    Now pf(bino(3, 3-2)) = '{(1,), (2,), (3,)}'

    So pf(tmp2 := (extend_add(bino(3, 3-2), (3+1, 3+2)))) = '{(1, 4, 5), (2, 4, 5), (3, 4, 5)}'

  Finally: pf(tmp1 - tmp2) = '{}'

Behold Diff by 2

# %%
n, r, x = None, None, None

print(f"\n# lets set some variables and use those for the diffs by 2")
print(f"  {(n:=3), (r:=3) = }")
print(f"  {len(bino(n+2, r)) = }")
print(f"  {bino(n+2, r) == (bino(n, r) | extend_mul(bino(n, r-1), (n+1, n+2)) | extend_add(bino(n, r-2), (n+1, n+2))) = }")

Cell output:

# lets set some variables and use those for the diffs by 2
  (n:=3), (r:=3) = (3, 3)
  len(bino(n+2, r)) = 10
  bino(n+2, r) == (bino(n, r) | extend_mul(bino(n, r-1), (n+1, n+2)) | extend_add(bino(n, r-2), (n+1, n+2))) = True

Pascal's rules for increasing diffs

I followed the same method for diffs of three and ended up with these three functions:

# %%

# By similar observation and much checking:
def pascal_rule_1(n, r) -> bool:
    """
    Checks bino(n+1, r) == (bino(n, r)
                            | extend_add(bino(n, r-1), (n+1,)))
    """
    return bino(n+1, r) == (bino(n, r)
                            | extend_add(bino(n, r-1), (n+1,)))


def pascal_rule_2(n, r) -> bool:
    """
    Checks bino(n+2, r) == (bino(n, r)
                            | extend_mul(bino(n, r-1), (n+1, n+2))
                            | extend_add(bino(n, r-2), (n+1, n+2)))
    """
    return bino(n+2, r) == (bino(n, r)
                            | extend_mul(bino(n, r-1), (n+1, n+2))
                            | extend_add(bino(n, r-2), (n+1, n+2)))

def pascal_rule_3(n, r) -> bool:
    """
    Checks bino(n+3, r) == (bino(n, r)
                            | extend_muli(bino(n, r-1), tuple(combinations((n+1, n+2, n+3), 1)))
                            | extend_muli(bino(n, r-2), tuple(combinations((n+1, n+2, n+3), 2)))
                            | extend_muli(bino(n, r-3), tuple(combinations((n+1, n+2, n+3), 3)))
                            )
    """
    extra_n = tuple(range(n+1, n+4))  # n+1..n+3 inclusive
    return bino(n+3, r) == (bino(n, r)
                            | extend_muli(bino(n, r-1), tuple(combinations(extra_n, 1)))
                            | extend_muli(bino(n, r-2), tuple(combinations(extra_n, 2)))
                            | extend_muli(bino(n, r-3), tuple(combinations(extra_n, 3)))
                            )

# %%
# Simple Checks
assert pascal_rule_1(7, 4)
assert pascal_rule_2(9, 4)
assert pascal_rule_3(11, 4)
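Counting the members of each union gives a quick sanity check on these three rules without building any sets. The terms in each rule are disjoint (their tuples contain different numbers of the new items), so their sizes should add up. The number of ways to choose i of the x new items is xCi, giving coefficients 1 1, 1 2 1 and 1 3 3 1. A sketch using math.comb; counts_match is a hypothetical name, and r >= 3 keeps every comb() argument non-negative:

```python
from math import comb

def counts_match(n: int, r: int) -> bool:
    """Size check mirroring pascal_rule_1, pascal_rule_2 and pascal_rule_3 (needs r >= 3)."""
    rule_1 = comb(n, r) + comb(n, r-1)
    # extend_mul(bino(n, r-1), (n+1, n+2)) has 2 * comb(n, r-1) members
    rule_2 = comb(n, r) + 2*comb(n, r-1) + comb(n, r-2)
    # the 3 extra items taken 1, 2 and 3 at a time: 3, 3 and 1 ways
    rule_3 = comb(n, r) + 3*comb(n, r-1) + 3*comb(n, r-2) + comb(n, r-3)
    return (comb(n+1, r) == rule_1
            and comb(n+2, r) == rule_2
            and comb(n+3, r) == rule_3)

assert all(counts_match(n, r) for n in range(12) for r in range(3, 9))
print(counts_match(11, 4))  # True
```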

Generalised Pascal's rule

What can I say? I looked for patterns across the Pascal's rule functions for each discrete diff, and looked deeper into identities between the extend functions.

I finally found the following function, which passed my tests (many not shown).

# %%
# from pascal_rule_3 to pascal_rule_x

def pascal_rule_x(n, r, x) -> bool:
    """
    Checks bino(n+x, r) == union, for i in 0..min(x, r), of
           extend_muli(bino(n, r-i), combinations(n+1..n+x, i))

    i.e. if bino(n, r) has already been used and along come x more items, the i > 0
    terms give the rest of bino(n+x, r) without needing bino(n, r) to be stored

    """
    extra_n = tuple(range(n+1, n+x+1))  # n+1..n+x inclusive
    n_r_terms = (extend_muli((bino(n, r-i) if r-i > 0 else {()}),   # extend this
                             tuple(combinations(extra_n, i)))       # by this
                for i in range(min(x, r) + 1))
    reduction = reduce(set.union, n_r_terms, set())
    return bino(n+x, r) == reduction

assert pascal_rule_x(11, 4, 3)  # n, r, x
assert pascal_rule_x(11, 5, 4)
assert pascal_rule_x(3, 2, 3)
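Counting the members on each side of pascal_rule_x gives the identity usually called Vandermonde's identity. Every r-combination of the n+x items uses some number i of the x new items, chosen in xCi ways, together with r-i of the original n items. The i = 0 term is the original nCr. For the last assert above, 6C2 = 3 + 9 + 3 = 15.

```latex
\binom{n+x}{r} = \sum_{i=0}^{\min(x,\,r)} \binom{n}{r-i}\binom{x}{i}
```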

I don't like the if r-i > 0 else {()} bit as it doesn't seem elegant. There is probably some identity to be found that would make it disappear but, you know.
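As far as I can tell, the conditional can simply go. itertools.combinations(iterable, 0) yields a single empty tuple, so bino(n, 0) is already {()}, and because i never exceeds min(x, r), r - i is never negative. Below is a sketch of pascal_rule_x without the conditional. bino and extend_muli are repeated so it runs on its own, and pascal_rule_x_plain is a hypothetical name.

```python
from functools import reduce
from itertools import combinations

def bino(n, r):
    return set(combinations(range(1, n+1), r))

def extend_muli(s, e):
    return {(*t, *item) for t in s for item in e}

def pascal_rule_x_plain(n, r, x):
    extra_n = tuple(range(n+1, n+x+1))          # the new items n+1..n+x
    # bino(n, 0) == {()}, so the i == r term needs no special case
    terms = (extend_muli(bino(n, r-i), tuple(combinations(extra_n, i)))
             for i in range(min(x, r) + 1))
    return bino(n+x, r) == reduce(set.union, terms, set())

print(bino(4, 0))                   # {()}
assert all(pascal_rule_x_plain(n, r, x)
           for n in range(8) for r in range(6) for x in range(5))
```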

Back to the original problem

If comb(d1, r) has been processed and we then find d2 extra items, we want to process extra_combs(d1, r, d2), where extra_combs neither includes nor saves the combinations of d1.

We just need to exclude the i = 0 term, bino(n, r), from the reduction in function pascal_rule_x.

Eventually I arrived at:

# %%

first = list(combinations('abcd', 3))
first
# %%
all_combs = list(combinations('abcdef', 3))  # not `all`, which would shadow the built-in
# %%
extra = sorted(set(all_combs) - set(first))
extra
# %%

def extra_combs(orig='abcd', r=3, delta='ef'):
    C = combinations  # Less typing
    extra_n = tuple(delta)  # the extra items
    n = tuple(orig)
    n_r_terms = (extend_muli((C(n, r-i) if r-i > 0 else {()}),   # extend this
                             tuple(C(extra_n, i)))               # by this
                for i in range(1, min(len(extra_n), r) + 1))     # skip i = 0, the C(n, r) term
    reduction = reduce(set.union, n_r_terms, set())
    # set(C(n+extra_n, r)) - reduction == set(C(n, r))
    return reduction

n, r, delta = 'abcd', 3, 'efg'
assert set(combinations(n+delta, r)) \
         == set(combinations(n, r)).union(extra_combs(n, r, delta))
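extra_combs still builds a set of all the new combinations before returning it. If even that is too much to hold, the same terms can be yielded one at a time. Here is a minimal sketch, assuming the items of orig and delta are all distinct; iter_extra_combs is a hypothetical name. Only the i-combinations of delta are held, one i at a time. The tuples are the same as those extra_combs produces, but they come out in a different order from combinations(orig + delta, r).

```python
from itertools import combinations
from typing import Iterator, Sequence

def iter_extra_combs(orig: Sequence, r: int, delta: Sequence) -> Iterator[tuple]:
    """Yield each r-combination of orig+delta that uses at least one item of delta.

    For each count i of delta items (1..min(len(delta), r)), pair every
    (r-i)-combination of orig with every i-combination of delta.
    No combination of orig alone is ever produced or stored.
    """
    for i in range(1, min(len(delta), r) + 1):
        tails = list(combinations(delta, i))   # combinations of the new items only
        for head in combinations(orig, r - i):
            for tail in tails:
                yield (*head, *tail)

orig, r, delta = 'abcd', 3, 'efg'
extra = list(iter_extra_combs(orig, r, delta))
assert set(extra) == set(combinations(orig + delta, r)) - set(combinations(orig, r))
assert len(extra) == len(set(extra))  # no combination is produced twice
print(len(extra))  # 7C3 - 4C3 = 35 - 4 = 31
```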

Tests

# %%

# Test comb(n+x, r) == comb(n, r) | extra_combs(n, r, x)

n, r, delta_ = 'abcdefg', 3, 'hijkl'

for r in range(len(n) + 1):
    for delta in (delta_[:i] for i in range(len(delta_) + 1)):
        combnx = set(combinations(n+delta, r))
        combn = set(combinations(n, r))
        extra = extra_combs(n, r, delta)
        assert combnx == (combn | extra), f"Whoops! For  {(n, r, delta) = }"
        # checks that extra does not generate any of combn
        assert not extra.intersection(combn), f"Whoops! For  {(n, r, delta) = }"


END.

