Saturday, August 30, 2014

Shuffling to create unique Permutations.

"The chances that anyone has ever shuffled a pack of cards in the same way twice in the history of the world are infinitesimally small, statistically speaking."
From QI: http://qi.com/infocloud/playing-cards
Given the task of creating a random sample size of N unique permutations of K cards then for quite small values of K, and N in the millions for example the total number of permutations of K cards is much greater than N. This means that S random shuffles of the cards, where S is "quite close" to N is likely to generate the N unique permutations.

Factorials get large quickly!

The number of permutations of K cards is given by K!, K-factorial.
Here is a table of factorials:
In [6]:
from math import factorial as fact

for k in range(1,10) + range(10, 41, 5):
    print('K = %2i, K! = %10e (= %i)' % (k, fact(k), fact(k)))
K =  1, K! = 1.000000e+00 (= 1)
K =  2, K! = 2.000000e+00 (= 2)
K =  3, K! = 6.000000e+00 (= 6)
K =  4, K! = 2.400000e+01 (= 24)
K =  5, K! = 1.200000e+02 (= 120)
K =  6, K! = 7.200000e+02 (= 720)
K =  7, K! = 5.040000e+03 (= 5040)
K =  8, K! = 4.032000e+04 (= 40320)
K =  9, K! = 3.628800e+05 (= 362880)
K = 10, K! = 3.628800e+06 (= 3628800)
K = 15, K! = 1.307674e+12 (= 1307674368000)
K = 20, K! = 2.432902e+18 (= 2432902008176640000)
K = 25, K! = 1.551121e+25 (= 15511210043330985984000000)
K = 30, K! = 2.652529e+32 (= 265252859812191058636308480000000)
K = 35, K! = 1.033315e+40 (= 10333147966386144929666651337523200000000)
K = 40, K! = 8.159153e+47 (= 815915283247897734345611269596115894272000000000)

Random sample counts for different N and K

Python uses the random modules Mersenne Twister algorithm producing 53-bit precision floats having a period of 2**19937-1 for its pseudorandom number generator and the unbiased Fischer-Yates (a.k.a Knuth) shuffle algorithm.
In [14]:
from random import shuffle

for n in (1000, 10000, 100000, 1000000):
    kstart = [x for x in range(20) if fact(x) >= n][0]
    for k in range(kstart, kstart + 6):
        perms = set()
        perm = list(range(k))
        shuffles = 0
        while len(perms) < n:
            for i in range(n - len(perms)):
                shuffle(perm)
                shuffles += 1
                perms.add(tuple(perm))
        print('N=%7i, K=%2i, shuffles=%7i i.e %5.1f%% more shuffles than N, K!/N= %i' 
              % (n, k, shuffles, shuffles * 100. / n - 100, fact(k) // n))
    print('')
N=   1000, K= 7, shuffles=   1099 i.e   9.9% more shuffles than N, K!/N= 5
N=   1000, K= 8, shuffles=   1020 i.e   2.0% more shuffles than N, K!/N= 40
N=   1000, K= 9, shuffles=   1000 i.e   0.0% more shuffles than N, K!/N= 362
N=   1000, K=10, shuffles=   1000 i.e   0.0% more shuffles than N, K!/N= 3628
N=   1000, K=11, shuffles=   1000 i.e   0.0% more shuffles than N, K!/N= 39916
N=   1000, K=12, shuffles=   1000 i.e   0.0% more shuffles than N, K!/N= 479001

N=  10000, K= 8, shuffles=  11508 i.e  15.1% more shuffles than N, K!/N= 4
N=  10000, K= 9, shuffles=  10131 i.e   1.3% more shuffles than N, K!/N= 36
N=  10000, K=10, shuffles=  10016 i.e   0.2% more shuffles than N, K!/N= 362
N=  10000, K=11, shuffles=  10001 i.e   0.0% more shuffles than N, K!/N= 3991
N=  10000, K=12, shuffles=  10000 i.e   0.0% more shuffles than N, K!/N= 47900
N=  10000, K=13, shuffles=  10000 i.e   0.0% more shuffles than N, K!/N= 622702

N= 100000, K= 9, shuffles= 116985 i.e  17.0% more shuffles than N, K!/N= 3
N= 100000, K=10, shuffles= 101308 i.e   1.3% more shuffles than N, K!/N= 36
N= 100000, K=11, shuffles= 100136 i.e   0.1% more shuffles than N, K!/N= 399
N= 100000, K=12, shuffles= 100015 i.e   0.0% more shuffles than N, K!/N= 4790
N= 100000, K=13, shuffles= 100000 i.e   0.0% more shuffles than N, K!/N= 62270
N= 100000, K=14, shuffles= 100000 i.e   0.0% more shuffles than N, K!/N= 871782

N=1000000, K=10, shuffles=1170059 i.e  17.0% more shuffles than N, K!/N= 3
N=1000000, K=11, shuffles=1012763 i.e   1.3% more shuffles than N, K!/N= 39
N=1000000, K=12, shuffles=1001070 i.e   0.1% more shuffles than N, K!/N= 479
N=1000000, K=13, shuffles=1000075 i.e   0.0% more shuffles than N, K!/N= 6227
N=1000000, K=14, shuffles=1000007 i.e   0.0% more shuffles than N, K!/N= 87178
N=1000000, K=15, shuffles=1000000 i.e   0.0% more shuffles than N, K!/N= 1307674


That last line of the table above means that if you want a million perms of just fifteen cards then you are likely to get them by just shuffling the cards and writing what you get down a million times!

End.

No comments:

Post a Comment