Mainly Tech projects on Python and Electronic Design Automation.

Saturday, August 30, 2014

Shuffling to create unique Permutations.

"The chances that anyone has ever shuffled a pack of cards in the same way twice in the history of the world are infinitesimally small, statistically speaking."
From QI: http://qi.com/infocloud/playing-cards
Given the task of creating a random sample size of N unique permutations of K cards then for quite small values of K, and N in the millions for example the total number of permutations of K cards is much greater than N. This means that S random shuffles of the cards, where S is "quite close" to N is likely to generate the N unique permutations.

Factorials get large quickly!

The number of permutations of K cards is given by K!, K-factorial.
Here is a table of factorials:
In [6]:
from math import factorial as fact

for k in range(1,10) + range(10, 41, 5):
    print('K = %2i, K! = %10e (= %i)' % (k, fact(k), fact(k)))
K =  1, K! = 1.000000e+00 (= 1)
K =  2, K! = 2.000000e+00 (= 2)
K =  3, K! = 6.000000e+00 (= 6)
K =  4, K! = 2.400000e+01 (= 24)
K =  5, K! = 1.200000e+02 (= 120)
K =  6, K! = 7.200000e+02 (= 720)
K =  7, K! = 5.040000e+03 (= 5040)
K =  8, K! = 4.032000e+04 (= 40320)
K =  9, K! = 3.628800e+05 (= 362880)
K = 10, K! = 3.628800e+06 (= 3628800)
K = 15, K! = 1.307674e+12 (= 1307674368000)
K = 20, K! = 2.432902e+18 (= 2432902008176640000)
K = 25, K! = 1.551121e+25 (= 15511210043330985984000000)
K = 30, K! = 2.652529e+32 (= 265252859812191058636308480000000)
K = 35, K! = 1.033315e+40 (= 10333147966386144929666651337523200000000)
K = 40, K! = 8.159153e+47 (= 815915283247897734345611269596115894272000000000)

Random sample counts for different N and K

Python uses the random modules Mersenne Twister algorithm producing 53-bit precision floats having a period of 2**19937-1 for its pseudorandom number generator and the unbiased Fischer-Yates (a.k.a Knuth) shuffle algorithm.
In [14]:
from random import shuffle

for n in (1000, 10000, 100000, 1000000):
    kstart = [x for x in range(20) if fact(x) >= n][0]
    for k in range(kstart, kstart + 6):
        perms = set()
        perm = list(range(k))
        shuffles = 0
        while len(perms) < n:
            for i in range(n - len(perms)):
                shuffle(perm)
                shuffles += 1
                perms.add(tuple(perm))
        print('N=%7i, K=%2i, shuffles=%7i i.e %5.1f%% more shuffles than N, K!/N= %i' 
              % (n, k, shuffles, shuffles * 100. / n - 100, fact(k) // n))
    print('')
N=   1000, K= 7, shuffles=   1099 i.e   9.9% more shuffles than N, K!/N= 5
N=   1000, K= 8, shuffles=   1020 i.e   2.0% more shuffles than N, K!/N= 40
N=   1000, K= 9, shuffles=   1000 i.e   0.0% more shuffles than N, K!/N= 362
N=   1000, K=10, shuffles=   1000 i.e   0.0% more shuffles than N, K!/N= 3628
N=   1000, K=11, shuffles=   1000 i.e   0.0% more shuffles than N, K!/N= 39916
N=   1000, K=12, shuffles=   1000 i.e   0.0% more shuffles than N, K!/N= 479001

N=  10000, K= 8, shuffles=  11508 i.e  15.1% more shuffles than N, K!/N= 4
N=  10000, K= 9, shuffles=  10131 i.e   1.3% more shuffles than N, K!/N= 36
N=  10000, K=10, shuffles=  10016 i.e   0.2% more shuffles than N, K!/N= 362
N=  10000, K=11, shuffles=  10001 i.e   0.0% more shuffles than N, K!/N= 3991
N=  10000, K=12, shuffles=  10000 i.e   0.0% more shuffles than N, K!/N= 47900
N=  10000, K=13, shuffles=  10000 i.e   0.0% more shuffles than N, K!/N= 622702

N= 100000, K= 9, shuffles= 116985 i.e  17.0% more shuffles than N, K!/N= 3
N= 100000, K=10, shuffles= 101308 i.e   1.3% more shuffles than N, K!/N= 36
N= 100000, K=11, shuffles= 100136 i.e   0.1% more shuffles than N, K!/N= 399
N= 100000, K=12, shuffles= 100015 i.e   0.0% more shuffles than N, K!/N= 4790
N= 100000, K=13, shuffles= 100000 i.e   0.0% more shuffles than N, K!/N= 62270
N= 100000, K=14, shuffles= 100000 i.e   0.0% more shuffles than N, K!/N= 871782

N=1000000, K=10, shuffles=1170059 i.e  17.0% more shuffles than N, K!/N= 3
N=1000000, K=11, shuffles=1012763 i.e   1.3% more shuffles than N, K!/N= 39
N=1000000, K=12, shuffles=1001070 i.e   0.1% more shuffles than N, K!/N= 479
N=1000000, K=13, shuffles=1000075 i.e   0.0% more shuffles than N, K!/N= 6227
N=1000000, K=14, shuffles=1000007 i.e   0.0% more shuffles than N, K!/N= 87178
N=1000000, K=15, shuffles=1000000 i.e   0.0% more shuffles than N, K!/N= 1307674


That last line of the table above means that if you want a million perms of just fifteen cards then you are likely to get them by just shuffling the cards and writing what you get down a million times!

End.

Saturday, August 02, 2014

Permutations: an exercise. In Python

Following on from this post by Harry Garrood. This is an implementation in Python.
I don't read Haskell very well so took Harry's description of the method as well as his description of how to extract the cycle length of a permutation and then did my own Python thing.
Harrys's description of the task had a lovely animation to keep me on the right path but missed out one crucial step for me.
Harry states:
  1. Take one card from the top of the deck and discard it into a second pile.
  2. Take another card from the top of the deck, and put it at the bottom of the deck.
  3. Repeat these two steps, putting all discarded cards from step 1 into the same pile, until the original deck is all gone and the second pile has all the cards in it.
The above constitutes one shuffle but the next shuffle involves swappling the second for the first pile and then repeating by first discarding then "bottoming" for the next shuffle.
I had a variable set up to alternate between "discarding" and "bottoming" and initially did not re-initialize it on starting the next shuffle, so beware.

Algorithm in action

This was actually the last routine that I wrote, as I needed it to explain the algorithm with nice output:
In [16]:
def shuffleperms_tutor(cards='A2345'):
    start, shuffles = list(cards), 0
    deck2 = start[::]
    fmt = '{!s:=^26} {!s:=^27} {!s:=^19}'
    fm2 = fmt.replace('=^', '').replace('s', 'r', 2)
    print(fmt.format('DECK', 'DECK2', 'COMMENT'))
    while True:
        deck, deck2, discard = deck2, [], True
        print(fm2.format(deck, deck2, 'Shuffle {}'.format(shuffles)))
        while deck:
            if discard:
                deck2.insert(0, deck.pop(0))
            else:
                deck.append(deck.pop(0))
            print(fm2.format(deck, deck2,
                             'Post Discard' if discard else 'Post bottomming'))
            discard = not discard
        shuffles += 1
        if deck2 == start:
            break
    return shuffles
In [17]:
shuffleperms_tutor(cards='A234')
===========DECK=========== ===========DECK2=========== ======COMMENT======
['A', '2', '3', '4']       []                          Shuffle 0          
['2', '3', '4']            ['A']                       Post Discard       
['3', '4', '2']            ['A']                       Post bottomming    
['4', '2']                 ['3', 'A']                  Post Discard       
['2', '4']                 ['3', 'A']                  Post bottomming    
['4']                      ['2', '3', 'A']             Post Discard       
['4']                      ['2', '3', 'A']             Post bottomming    
[]                         ['4', '2', '3', 'A']        Post Discard       
['4', '2', '3', 'A']       []                          Shuffle 1          
['2', '3', 'A']            ['4']                       Post Discard       
['3', 'A', '2']            ['4']                       Post bottomming    
['A', '2']                 ['3', '4']                  Post Discard       
['2', 'A']                 ['3', '4']                  Post bottomming    
['A']                      ['2', '3', '4']             Post Discard       
['A']                      ['2', '3', '4']             Post bottomming    
[]                         ['A', '2', '3', '4']        Post Discard       

Out[17]:
2
If you run shuffleperms_tutor(cards='A2345') you get an equivalent to Harry's first example that begins
===========DECK=========== ===========DECK2=========== ======COMMENT======`

['A', '2', '3', '4', '5']  []                          Shuffle 0          

['2', '3', '4', '5']       ['A']                       Post Discard       

['3', '4', '5', '2']       ['A']                       Post bottomming    

['4', '5', '2']            ['3', 'A']                  Post Discard       

['5', '2', '4']            ['3', 'A']                  Post bottomming    

['2', '4']                 ['5', '3', 'A']             Post Discard       

['4', '2']                 ['5', '3', 'A']             Post bottomming    

['2']                      ['4', '5', '3', 'A']        Post Discard       

['2']                      ['4', '5', '3', 'A']        Post bottomming    

[]                         ['2', '4', '5', '3', 'A']   Post Discard       

['2', '4', '5', '3', 'A']  []                          Shuffle 1          

...

No tutor

If you strip out the print statements to give the tutorial on algorithm progress; and switch to using a range of integers as the cards in the deck you get the following
In [18]:
def shuffleperms(cards):
    start, shuffles = list(range(cards)), 0
    deck2 = start[::]
    while True:
        deck, deck2, discard = deck2, [], True
        while deck:
            if discard:
                deck2.insert(0, deck.pop(0))
            else:
                deck.append(deck.pop(0))
            discard = not discard
        shuffles += 1
        if deck2 == start:
            break
    return shuffles
In [19]:
print([shuffleperms(i) for i in (4, 5, 52, 53, 100)])
[2, 5, 510, 53, 120]

Issues with naming

After learning what was happening I thoght of an improved(?) way to state the process that does not use the term "discard" and explicitely states what happens when going from one shuffle to another:

Alternate definition

Bruteforce algorithm:
  1. Start with an empty "deck" and a second deck of N different cards.
  2. Record the starting order of the second deck.
  3. If the first deck is empty then swap the decks and initialise the action on the first deck to be a "Transfer"
  4. Whilst there are cards in the first deck:
  5. Transfer the top card of the deck to the top of second deck
  6. Move the the resultant top card of the deck to the bottom of the first deck.
  7. Continue alternating steps 4.A and 4.B until all cards are eventually transferred to the second deck.
  8. This constitutes one shuffle of the deck back to the second deck
  9. If the order of cards in the second deck equals the starting order then return the number of shuffles; otherwise repeat from step 3.

Bruteforce it

This was my first implementation.
I had realised that the alternate transferring of a card then moving a card could be done by indexing of the list that is the deck of cards and wrote:
In [20]:
def shuffleperms_bruteforce(cards):
    start, shuffles = list(range(cards)), 0
    deck2 = start[::]
    while True:
        deck, deck2, transfer = deck2, [], True
        while deck:
            if transfer:
                transfered, kept = deck[0::2], deck[1::2]
            else:    
                transfered, kept = deck[1::2], deck[0::2]
            transfer = transfer if len(deck) % 2 == 0 else not transfer
            deck, deck2 = kept, transfered[::-1] + deck2
        shuffles += 1
        if deck2 == start:
            break
    return shuffles

print([shuffleperms_bruteforce(i) for i in (4, 5, 52, 53, 100)])
[2, 5, 510, 53, 120]

Cycle counting of a permutation.

Thanks must be given to Harry for a very enjoyable section explaining this method
  1. Do the first shuffle - which we now know produces the first permutation
  2. Extract the cycles mapping initial positions to items positions in the next permutation.
  3. Calculate the result as the lowest common multiple of the lengths of all the cycles.

An lcm routine

I actually had one hanging around from prior Rosetta Code tasks. You could use this:
In [21]:
import fractions
def lcm(a, b): 
    return int(abs(a * b) / fractions.gcd(a, b)) if a and b else 0

lcm(3, 7)
Out[21]:
21

The main cycle finding routine

In [22]:
def shuffleperms_permcycles(cards):
    start, perm = _firstperm(cards)
    permedto = [perm.index(i) for i in start]
    cycles = _extract_cycles(start, permedto)
    shuffles = reduce(lcm, (len(c) for c in cycles))
    return shuffles

def _firstperm(cards):
    deck, deck2, transfer = list(range(cards)), [], True
    start = deck[::]
    while deck:
        if transfer:
            transfered, kept = deck[0::2], deck[1::2]
        else:    
            transfered, kept = deck[1::2], deck[0::2]
        transfer = transfer if len(deck) % 2 == 0 else not transfer
        deck, deck2 = kept, transfered[::-1] + deck2
    return start, deck2

def _extract_cycles(start, permedto):
    remains, cycles = start[::], []
    while remains:
        cycles.append([])
        item, thiscycle = remains[0], cycles[-1]
        while item in remains:
            remains.remove(item)
            thiscycle.append(item)
            item = permedto[item]
    return cycles

print([shuffleperms_permcycles(i) for i in (4, 5, 52, 53, 100)])
[2, 5, 510, 53, 120]

OEIS

The On-Line Encyclopedia of Integer Sequences doesn't seem to have this sequence of values:
In [23]:
[shuffleperms_permcycles(i) for i in range(1,20)]
Out[23]:
[1, 2, 3, 2, 5, 6, 5, 4, 6, 6, 15, 12, 12, 30, 15, 4, 17, 18, 10]
I would love to add it. It needs adding. My problem is that this task doesn't seem to have a proper name! It would be presumptious of me to name it, but if you have a reference to this shuffle that came with a name then please mention it in the comments.

END.

Followers

Subscribe Now: google

Add to Google Reader or Homepage

Go deh too!

whos.amung.us

Blog Archive