Go deh!

Mainly Tech projects on Python and Electronic Design Automation.

Saturday, August 30, 2014

Shuffling to create unique Permutations.

"The chances that anyone has ever shuffled a pack of cards in the same way twice in the history of the world are infinitesimally small, statistically speaking."
From QI: http://qi.com/infocloud/playing-cards
Given the task of creating a random sample size of N unique permutations of K cards then for quite small values of K, and N in the millions for example the total number of permutations of K cards is much greater than N. This means that S random shuffles of the cards, where S is "quite close" to N is likely to generate the N unique permutations.

Factorials get large quickly!

The number of permutations of K cards is given by K!, K-factorial.
Here is a table of factorials:
In [6]:
from math import factorial as fact

for k in range(1,10) + range(10, 41, 5):
    print('K = %2i, K! = %10e (= %i)' % (k, fact(k), fact(k)))
K =  1, K! = 1.000000e+00 (= 1)
K =  2, K! = 2.000000e+00 (= 2)
K =  3, K! = 6.000000e+00 (= 6)
K =  4, K! = 2.400000e+01 (= 24)
K =  5, K! = 1.200000e+02 (= 120)
K =  6, K! = 7.200000e+02 (= 720)
K =  7, K! = 5.040000e+03 (= 5040)
K =  8, K! = 4.032000e+04 (= 40320)
K =  9, K! = 3.628800e+05 (= 362880)
K = 10, K! = 3.628800e+06 (= 3628800)
K = 15, K! = 1.307674e+12 (= 1307674368000)
K = 20, K! = 2.432902e+18 (= 2432902008176640000)
K = 25, K! = 1.551121e+25 (= 15511210043330985984000000)
K = 30, K! = 2.652529e+32 (= 265252859812191058636308480000000)
K = 35, K! = 1.033315e+40 (= 10333147966386144929666651337523200000000)
K = 40, K! = 8.159153e+47 (= 815915283247897734345611269596115894272000000000)

Random sample counts for different N and K

Python uses the random modules Mersenne Twister algorithm producing 53-bit precision floats having a period of 2**19937-1 for its pseudorandom number generator and the unbiased Fischer-Yates (a.k.a Knuth) shuffle algorithm.
In [14]:
from random import shuffle

for n in (1000, 10000, 100000, 1000000):
    kstart = [x for x in range(20) if fact(x) >= n][0]
    for k in range(kstart, kstart + 6):
        perms = set()
        perm = list(range(k))
        shuffles = 0
        while len(perms) < n:
            for i in range(n - len(perms)):
                shuffle(perm)
                shuffles += 1
                perms.add(tuple(perm))
        print('N=%7i, K=%2i, shuffles=%7i i.e %5.1f%% more shuffles than N, K!/N= %i' 
              % (n, k, shuffles, shuffles * 100. / n - 100, fact(k) // n))
    print('')
N=   1000, K= 7, shuffles=   1099 i.e   9.9% more shuffles than N, K!/N= 5
N=   1000, K= 8, shuffles=   1020 i.e   2.0% more shuffles than N, K!/N= 40
N=   1000, K= 9, shuffles=   1000 i.e   0.0% more shuffles than N, K!/N= 362
N=   1000, K=10, shuffles=   1000 i.e   0.0% more shuffles than N, K!/N= 3628
N=   1000, K=11, shuffles=   1000 i.e   0.0% more shuffles than N, K!/N= 39916
N=   1000, K=12, shuffles=   1000 i.e   0.0% more shuffles than N, K!/N= 479001

N=  10000, K= 8, shuffles=  11508 i.e  15.1% more shuffles than N, K!/N= 4
N=  10000, K= 9, shuffles=  10131 i.e   1.3% more shuffles than N, K!/N= 36
N=  10000, K=10, shuffles=  10016 i.e   0.2% more shuffles than N, K!/N= 362
N=  10000, K=11, shuffles=  10001 i.e   0.0% more shuffles than N, K!/N= 3991
N=  10000, K=12, shuffles=  10000 i.e   0.0% more shuffles than N, K!/N= 47900
N=  10000, K=13, shuffles=  10000 i.e   0.0% more shuffles than N, K!/N= 622702

N= 100000, K= 9, shuffles= 116985 i.e  17.0% more shuffles than N, K!/N= 3
N= 100000, K=10, shuffles= 101308 i.e   1.3% more shuffles than N, K!/N= 36
N= 100000, K=11, shuffles= 100136 i.e   0.1% more shuffles than N, K!/N= 399
N= 100000, K=12, shuffles= 100015 i.e   0.0% more shuffles than N, K!/N= 4790
N= 100000, K=13, shuffles= 100000 i.e   0.0% more shuffles than N, K!/N= 62270
N= 100000, K=14, shuffles= 100000 i.e   0.0% more shuffles than N, K!/N= 871782

N=1000000, K=10, shuffles=1170059 i.e  17.0% more shuffles than N, K!/N= 3
N=1000000, K=11, shuffles=1012763 i.e   1.3% more shuffles than N, K!/N= 39
N=1000000, K=12, shuffles=1001070 i.e   0.1% more shuffles than N, K!/N= 479
N=1000000, K=13, shuffles=1000075 i.e   0.0% more shuffles than N, K!/N= 6227
N=1000000, K=14, shuffles=1000007 i.e   0.0% more shuffles than N, K!/N= 87178
N=1000000, K=15, shuffles=1000000 i.e   0.0% more shuffles than N, K!/N= 1307674


That last line of the table above means that if you want a million perms of just fifteen cards then you are likely to get them by just shuffling the cards and writing what you get down a million times!

End.

Saturday, August 02, 2014

Permutations: an exercise. In Python

Following on from this post by Harry Garrood. This is an implementation in Python.
I don't read Haskell very well so took Harry's description of the method as well as his description of how to extract the cycle length of a permutation and then did my own Python thing.
Harrys's description of the task had a lovely animation to keep me on the right path but missed out one crucial step for me.
Harry states:
  1. Take one card from the top of the deck and discard it into a second pile.
  2. Take another card from the top of the deck, and put it at the bottom of the deck.
  3. Repeat these two steps, putting all discarded cards from step 1 into the same pile, until the original deck is all gone and the second pile has all the cards in it.
The above constitutes one shuffle but the next shuffle involves swappling the second for the first pile and then repeating by first discarding then "bottoming" for the next shuffle.
I had a variable set up to alternate between "discarding" and "bottoming" and initially did not re-initialize it on starting the next shuffle, so beware.

Algorithm in action

This was actually the last routine that I wrote, as I needed it to explain the algorithm with nice output:
In [16]:
def shuffleperms_tutor(cards='A2345'):
    start, shuffles = list(cards), 0
    deck2 = start[::]
    fmt = '{!s:=^26} {!s:=^27} {!s:=^19}'
    fm2 = fmt.replace('=^', '').replace('s', 'r', 2)
    print(fmt.format('DECK', 'DECK2', 'COMMENT'))
    while True:
        deck, deck2, discard = deck2, [], True
        print(fm2.format(deck, deck2, 'Shuffle {}'.format(shuffles)))
        while deck:
            if discard:
                deck2.insert(0, deck.pop(0))
            else:
                deck.append(deck.pop(0))
            print(fm2.format(deck, deck2,
                             'Post Discard' if discard else 'Post bottomming'))
            discard = not discard
        shuffles += 1
        if deck2 == start:
            break
    return shuffles
In [17]:
shuffleperms_tutor(cards='A234')
===========DECK=========== ===========DECK2=========== ======COMMENT======
['A', '2', '3', '4']       []                          Shuffle 0          
['2', '3', '4']            ['A']                       Post Discard       
['3', '4', '2']            ['A']                       Post bottomming    
['4', '2']                 ['3', 'A']                  Post Discard       
['2', '4']                 ['3', 'A']                  Post bottomming    
['4']                      ['2', '3', 'A']             Post Discard       
['4']                      ['2', '3', 'A']             Post bottomming    
[]                         ['4', '2', '3', 'A']        Post Discard       
['4', '2', '3', 'A']       []                          Shuffle 1          
['2', '3', 'A']            ['4']                       Post Discard       
['3', 'A', '2']            ['4']                       Post bottomming    
['A', '2']                 ['3', '4']                  Post Discard       
['2', 'A']                 ['3', '4']                  Post bottomming    
['A']                      ['2', '3', '4']             Post Discard       
['A']                      ['2', '3', '4']             Post bottomming    
[]                         ['A', '2', '3', '4']        Post Discard       

Out[17]:
2
If you run shuffleperms_tutor(cards='A2345') you get an equivalent to Harry's first example that begins
===========DECK=========== ===========DECK2=========== ======COMMENT======`

['A', '2', '3', '4', '5']  []                          Shuffle 0          

['2', '3', '4', '5']       ['A']                       Post Discard       

['3', '4', '5', '2']       ['A']                       Post bottomming    

['4', '5', '2']            ['3', 'A']                  Post Discard       

['5', '2', '4']            ['3', 'A']                  Post bottomming    

['2', '4']                 ['5', '3', 'A']             Post Discard       

['4', '2']                 ['5', '3', 'A']             Post bottomming    

['2']                      ['4', '5', '3', 'A']        Post Discard       

['2']                      ['4', '5', '3', 'A']        Post bottomming    

[]                         ['2', '4', '5', '3', 'A']   Post Discard       

['2', '4', '5', '3', 'A']  []                          Shuffle 1          

...

No tutor

If you strip out the print statements to give the tutorial on algorithm progress; and switch to using a range of integers as the cards in the deck you get the following
In [18]:
def shuffleperms(cards):
    start, shuffles = list(range(cards)), 0
    deck2 = start[::]
    while True:
        deck, deck2, discard = deck2, [], True
        while deck:
            if discard:
                deck2.insert(0, deck.pop(0))
            else:
                deck.append(deck.pop(0))
            discard = not discard
        shuffles += 1
        if deck2 == start:
            break
    return shuffles
In [19]:
print([shuffleperms(i) for i in (4, 5, 52, 53, 100)])
[2, 5, 510, 53, 120]

Issues with naming

After learning what was happening I thoght of an improved(?) way to state the process that does not use the term "discard" and explicitely states what happens when going from one shuffle to another:

Alternate definition

Bruteforce algorithm:
  1. Start with an empty "deck" and a second deck of N different cards.
  2. Record the starting order of the second deck.
  3. If the first deck is empty then swap the decks and initialise the action on the first deck to be a "Transfer"
  4. Whilst there are cards in the first deck:
  5. Transfer the top card of the deck to the top of second deck
  6. Move the the resultant top card of the deck to the bottom of the first deck.
  7. Continue alternating steps 4.A and 4.B until all cards are eventually transferred to the second deck.
  8. This constitutes one shuffle of the deck back to the second deck
  9. If the order of cards in the second deck equals the starting order then return the number of shuffles; otherwise repeat from step 3.

Bruteforce it

This was my first implementation.
I had realised that the alternate transferring of a card then moving a card could be done by indexing of the list that is the deck of cards and wrote:
In [20]:
def shuffleperms_bruteforce(cards):
    start, shuffles = list(range(cards)), 0
    deck2 = start[::]
    while True:
        deck, deck2, transfer = deck2, [], True
        while deck:
            if transfer:
                transfered, kept = deck[0::2], deck[1::2]
            else:    
                transfered, kept = deck[1::2], deck[0::2]
            transfer = transfer if len(deck) % 2 == 0 else not transfer
            deck, deck2 = kept, transfered[::-1] + deck2
        shuffles += 1
        if deck2 == start:
            break
    return shuffles

print([shuffleperms_bruteforce(i) for i in (4, 5, 52, 53, 100)])
[2, 5, 510, 53, 120]

Cycle counting of a permutation.

Thanks must be given to Harry for a very enjoyable section explaining this method
  1. Do the first shuffle - which we now know produces the first permutation
  2. Extract the cycles mapping initial positions to items positions in the next permutation.
  3. Calculate the result as the lowest common multiple of the lengths of all the cycles.

An lcm routine

I actually had one hanging around from prior Rosetta Code tasks. You could use this:
In [21]:
import fractions
def lcm(a, b): 
    return int(abs(a * b) / fractions.gcd(a, b)) if a and b else 0

lcm(3, 7)
Out[21]:
21

The main cycle finding routine

In [22]:
def shuffleperms_permcycles(cards):
    start, perm = _firstperm(cards)
    permedto = [perm.index(i) for i in start]
    cycles = _extract_cycles(start, permedto)
    shuffles = reduce(lcm, (len(c) for c in cycles))
    return shuffles

def _firstperm(cards):
    deck, deck2, transfer = list(range(cards)), [], True
    start = deck[::]
    while deck:
        if transfer:
            transfered, kept = deck[0::2], deck[1::2]
        else:    
            transfered, kept = deck[1::2], deck[0::2]
        transfer = transfer if len(deck) % 2 == 0 else not transfer
        deck, deck2 = kept, transfered[::-1] + deck2
    return start, deck2

def _extract_cycles(start, permedto):
    remains, cycles = start[::], []
    while remains:
        cycles.append([])
        item, thiscycle = remains[0], cycles[-1]
        while item in remains:
            remains.remove(item)
            thiscycle.append(item)
            item = permedto[item]
    return cycles

print([shuffleperms_permcycles(i) for i in (4, 5, 52, 53, 100)])
[2, 5, 510, 53, 120]

OEIS

The On-Line Encyclopedia of Integer Sequences doesn't seem to have this sequence of values:
In [23]:
[shuffleperms_permcycles(i) for i in range(1,20)]
Out[23]:
[1, 2, 3, 2, 5, 6, 5, 4, 6, 6, 15, 12, 12, 30, 15, 4, 17, 18, 10]
I would love to add it. It needs adding. My problem is that this task doesn't seem to have a proper name! It would be presumptious of me to name it, but if you have a reference to this shuffle that came with a name then please mention it in the comments.

END.

Sunday, July 27, 2014

Does the Californian dressing down, jeans-n-T-shirt attire of most software professionals hurt us?

Earlier generations have said  that "You are what you wear" but that doesn't seem to apply in the software industry wear billionaires are seen in the media with hardly a jacket, let alone a tailored suit and tie.

Everybody thinks that they too can be just as successful and, of course, you want to fit in with peers, so most dress down in comparison with other professions.

But what is the effect on the individual?  We can't all be Steve Jobs. Most of us will end up in the median of the profession. Could we  do better if we valued ourselves higher and signalled that to our employers by a mass change in our attire?

We have already seen that large tech companies have been found guilty of colluding to hold back our wages; and that Software outsourcing is not the panacea it was wished to be. Managers know that it can be difficult to recruit good developers too.

So, in this capitalist environment we find ourselves in, maybe we should follow the legal profession and recognise that, in some ways, we are what we wear, and to our employers - currently, they are happy for us to wear jeans.

Thursday, July 10, 2014

Unique numbers in multiplication tables

 I answered a Stack overflow question that asked for the number of unique numbers in an n-by-n multiplication table and got it horribly wrong.
I decided to look into the maths of it a little further last night and now know a bit more about the question but still have no formula for generating the number of distinct integers in an n-by-n multiplication table.

Table generator

In [1]:
def ptable(n):
    width = len(str(n*n))
    nrange = range(1, n+1)
    for i in nrange:
        print(' '.join('%*i' % (width, i * j) for j in nrange))

for n in range(1, 4):
    n2 = 2**n
    print('\n%i-by-%i times table' % (n2, n2))
    ptable(n2)

    
2-by-2 times table
1 2
2 4

4-by-4 times table
 1  2  3  4
 2  4  6  8
 3  6  9 12
 4  8 12 16

8-by-8 times table
 1  2  3  4  5  6  7  8
 2  4  6  8 10 12 14 16
 3  6  9 12 15 18 21 24
 4  8 12 16 20 24 28 32
 5 10 15 20 25 30 35 40
 6 12 18 24 30 36 42 48
 7 14 21 28 35 42 49 56
 8 16 24 32 40 48 56 64

Unique number counter

The only way i could think of counting the number of unique integers is to generate the table then get the size of the set of the tables numbers:
In [2]:
def mult_table_numbs(n):
    nrange = range(1, n+1)
    return len(set(n0 * n1 for n0 in nrange for n1 in nrange))

for n in range(1,11):
    print('%2i: %i' % (n, mult_table_numbs(n)))
 1: 1
 2: 3
 3: 6
 4: 9
 5: 14
 6: 18
 7: 25
 8: 30
 9: 36
10: 42

So for n of 4 there are 9 distinct numbers counted.
Stick the numbers 1,3,6,9,14,18,25,30,36,42 into OEIS and the integer sequence search engine indeed recognises it as A027424
I simplified. I actually first used Pythons itertools.product and reduce. for product I had to use product(range(1,n+1), repeat=2). The repeat=2 is because it is for a two-dimensional multiplication table.
That got me thinking and I quickly realised that I could create a k-dimensional multiplication table unique number counter by setting the repeat to a new argument k with default value 2.

The k-dimensional multiplication table unique number counter

In [3]:
from itertools import product
from functools import reduce

mul = int.__mul__

def mult_table_numbers(n, k=2):
    return len(set(reduce(mul, p, 1) 
                   for p in product(range(1,n+1), repeat=k)))

Dimension hopping

OEIS mentioned no formulae for generating the unique number counts except by direct counting so I thought "what if I explore the dimensions k as well as varying n"?
print('n,k=2.. matrix')

for n in range(2,13):

    print('%2i: %s' % (n, repr([mult_table_numbers(n, k)

                                for k in range(2,10)]).replace(' ','')))
The above takes several tens of minutes to run but finally gives the following results:
n,k=2.. matrix

 2: [3,4,5,6,7,8,9,10]

 3: [6,10,15,21,28,36,45,55]

 4: [9,16,25,36,49,64,81,100]

 5: [14,30,55,91,140,204,285,385]

 6: [18,40,75,126,196,288,405,550]

 7: [25,65,140,266,462,750,1155,1705]

 8: [30,80,175,336,588,960,1485,2200]

 9: [36,100,225,441,784,1296,2025,3025]

10: [42,120,275,546,980,1632,2565,3850]
The 2: [3,4,5,6,7,8,9,10] line reads that for n=2: a 2-dimensional multiplication table has 3 distinct numbers; a 3D table 4; a 5D table 6 distinct numbers, and so on...
Now each line with its series of distinct numbers for different k- dimensions is itself a sequence. The n=2 line is the sequence k+1 for example. I could look up the sequences in OEIS and extract the formulas for each row. If I could then link the formulas for different rows I might be able to finally generate a formula for the unique count, lets call it m(n, k) for dimension k=2.

OEIS extraction

Time on OEIS revealed the following information:
m(n,k)PolynomialOEIS Description
m(2,k)(k+1)Positive integers A000027
m(3,k)(k+1)*(k+2)/2Triangular numbers A000217
m(4,k)(k+1)*(k+1)Squares A000290
m(5,k)(k+1)(k+2)(2*(k+1)+1)/6Square pyramidal numbers A000330
m(6,k)(k+1)*2(k+2)/2Pentagonal pyramidal numbers A002411
m(7,k)(k+1)(2+k)(3+k)(1+3(k+1))/244-dimensional pyramidal numbers A001296
m(8,k)(k+1)^2(k+2)(k+3)/64-dimensional figurate numbers: A002417
m(9,k)((k+1)*(k+2)/2)^2Sum of first n cubes; or n-th triangular number squared A000537
m(10,k)(k+1)*2(k+2)(2k+3)/6A108678
Whilst playing around with the polynomials using SymPy Gamma and the Wolfram sites I rearranged them to help find patterns

Constant multipliers for the polynomials above (when expanded)

I could find no patterns in this:
nk**0k**1k**2k**3k**4
211---
313/21/2--
4121--
5113/63/21/3-
615/221/2-
7131/1219/811/121/8
8117/617/67/61/6
91313/43/21/4
10119/611/311/61/3


nrearranged polynomial for m(n,k)
1(1/(1)) * (1+0k)
2(1/(1)) * (1+k)
3(1/(1*2)) * (1+k)(2+k)
4(1/(1*1)) * (1+k)(1+k)
5(1/(1*2*3)) * (1+k)(2+k)(3+2k)
6(1/(1*2*1)) * (1+k)(2+k)(1+k)
7(1/(1*2*3*4)) * (1+k)(2+k)(3+k)(4+3k)
8(1/(1*2*1*3)) * (1+k)(2+k)(1+k)(3+k)
9(1/(1*2*1*2)) * (1+k)(2+k)(1+k)(2+k)
10(1/(1*2*1*3)) * (1+k)(2+k)(1+k)(3+2k)
There are some patterns in the above. the divisor is always all the constant terms (x+yk) of the polynomial multiplied together.
I then saw that some later polynomials were multiples of earlier ones so decided on using the shorthand @n for m(n,k) to produce the following:

shortformfactored polynomialAlternate factorisation
@1(1+0k)
@2(1+k)
@3@2*(2+k) / 2
@4@2*@2
@5@3*(3+2k) / 3
@6@2*@3
@7@3*(3+k)(4+3k) / 6
@8(@6 = @2*@3)*(3+k) / 3@4*(2+k)(3+k) / 6
@9(@6 = @2*@3)*(2+k) / 2@3*@3
@10(@6 = @2*@3)*(3+2k) / 3@2*@5

Unfortunately I can see no pervasive patterns here either.

Conclusion

I've looked. I've enjoyed the journey, but I've still to find what I'm looking for!

Followers

Subscribe Now: google

Add to Google Reader or Homepage

Go deh too!

whos.amung.us

Blog Archive