In which I revisit low-discrepancy series, Hunt for an understanding of the Sobol sequence, then translate and optimise a C++ program into Python.
Re-visit VdC
I had left a mental note to get back to the Van der Corput sequence after creating the Rosetta Code task some years ago. Van der Corput is a type of low-discrepancy or quasi random sequence often used in Financial simulations; and with ever larger digital systems on automotive System on Chips, I keep a look-out for Verification techniques to add to the toolbox.I decided to investigate the Sobol Sequence, understand the maths and write some Python!
Sobol Maths
I followed references; I googled; I read; and I just could not gain an understanding of what is happening, enough to write some code. What code I initially browsed gave me only snippets - Those snippets were bit twiddling so I coded those up for fun.Fun bit twiddles
Highest 1 bit in binary number
def binhi1(b): "highest 1 bit of binary b. (== ceil(log2(b)))" pos = 0 while b > 0: pos, b = pos + 1, b // 2 return pos
Example 1
for n in range(18): print(f"{n:>5b} {binhi1(n)}") 0 0 1 1 10 2 11 2 100 3 101 3 110 3 111 3 1000 4 1001 4 1010 4 1011 4 1100 4 1101 4 1110 4 1111 4 10000 5 10001 5
Lowest 0 bit in binary number
def binlo0(b): "lowest 0 bit from right of binary b. Always >=1" pos, b2 = 1, b // 2 while b2 * 2 != b: pos, b, b2 = pos + 1, b2, b2 // 2 return pos
Example 2
for n in range(18): print(f"{n:<05b binlo0="" font="" n="">) 00000 1 00001 2 00010 1 00011 3 00100 1 00101 2 00110 1 00111 4 01000 1 01001 2 01010 1 01011 3 01100 1 01101 2 01110 1 01111 5 10000 1 10001 2 05b>
Code to translate
One of the Sobol references was this one by S. Joe and F. Y. Kuo. They had written academic papers on Sobol and had generated constants for the use of the sequence in up to 21201 dimensions. They also had a C++ program with BSD-style license and, crucially, sample outputs from their C++ code that could aid me in writing Python conversion of their code.I downloaded their C++ and their latest set of direction numbers and took a look. Eager to code something, I coded up a reader for their directions file:
Initial dirctions file reader
def direction_file_reader(fname="Sobol_new-joe-kuo-6.21201"): """\ Reads direction file of format: d s a m_i 2 1 0 1 3 2 1 1 3 4 3 1 1 3 1 5 3 2 1 1 1 6 4 1 1 1 3 3 ... """ with open(fname) as f: headers = f.readline().strip().split() assert headers == ['d', 's', 'a', 'm_i'] directions = [] for line in f: d, s, a, *m = [int(cell) for cell in line.strip().split()] directions.append((d, s, a, tuple([0] + m))) return directions
The direction file was three single columns then all the other columns made up the list m.
C++ code direct translation.
I decided to translate the C++ code near verbatim to Python, then run the example and ensure that the Python gave the same output as stated for the C++ program.I cut-n-pasted sections of the C++ code as comments/unused triple-quoted strings into my Python then slowly translated from C++.
The C++ used several single-character, uppercase, variable names. I took them to be references to their papers or maybe earlier Fortran programs and either kept them, or made note of their re-naming.
I finally got my first working Python code which is below, but without the C++ code comments I followed:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 | def sobol_points(N, D): """From: C++ Source of Frances Y. Kuo "sobol.cc" """ pow2_32 = 2**32 L = binhi1(N) C = [binlo0(i) for i in range(N)] points= [[0 for j in range(D)] for i in range(N)] V = [1 << (32- i) for i in range(L+1)] X = [0] for i in range(1, N): X.append(X[i-1] ^ V[C[i-1]]) points[i][0] = X[i]/pow2_32; del V, X directions = direction_file_reader() for j in range(1, D): d, s, a, m = directions[j-1] #print(d, s, a, m) if L <= s: V = [0] + [m[i] << (32 - i) for i in range(1, L+1)] else: V = [0] + [m[i] << (32 - i) for i in range(1, s+1)] for i in range(s+1, L+1): V.append(V[i-s] ^ (V[i-s] >> s)) for k in range(1, s): V[i] ^= (((a >> (s-1-k)) & 1) * V[i-k]) X = [0] for i in range(1, N): X.append(X[i-1] ^ V[C[i-1]]) points[i][j] = X[i]/pow2_32 return points |
Issues with the directly translated Python/C++:
- The calculation ov V[] in lines 21 to 28 I thought should become a function.
- All points are computed and returned at once. It would be better as a generator function.
- The calculation of X[] in lines 29 to 31 seems like a very inefficient loop within a loop. Furthermore, if turned into a generator; Iwould need to unscramble the loops to try and recalculate X for each point without another loop.
Final Python generator function version
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 | # -*- coding: utf-8 -*- """sobol_point_gen.py Usage: sobol_point_gen.py N D <dimensions_file> sobol_point_gen.py -h | --help Options: -h --help Show this screen. Arguments: N Number of points to generate (<2**32) D Number of dimensions for each point. (D <= max D in dimensions_file) dimensions_file See: https://web.maths.unsw.edu.au/~fkuo/sobol/ Created from an original c++ source by Frances Y. Kuo and Stephen Joe Created on Fri Nov 1 09:13:12 2019 @author: Paddy3118 """ from functools import lru_cache #%% def binhi1(b): "highest 1 bit of binary b. (== ceil(log2(b)))" pos = 0 while b > 0: pos, b = pos + 1, b // 2 return pos def binlo0(b): "lowest 0 bit from right of binary b. Always >=1" pos, b2 = 1, b // 2 while b2 * 2 != b: pos, b, b2 = pos + 1, b2, b2 // 2 return pos #%% def direction_file_reader(d, fname="Sobol_new-joe-kuo-6.21201"): """\ Reads d+header lines from direction file of format: d s a m_i 2 1 0 1 3 2 1 1 3 4 3 1 1 3 1 5 3 2 1 1 1 6 4 1 1 1 3 3 7 4 4 1 3 5 13 8 5 2 1 1 5 5 17 9 5 4 1 1 5 5 5 10 5 7 1 1 7 11 19 11 5 11 1 1 5 1 1 12 5 13 1 1 1 3 11 13 5 14 1 3 5 5 31 14 6 1 1 3 3 9 7 49 15 6 13 1 1 1 15 21 21 16 6 16 1 3 1 13 27 49 17 6 19 1 1 1 15 7 5 18 6 22 1 3 1 15 13 25 19 6 25 1 1 5 5 19 61 20 7 1 1 3 7 11 23 15 103 21 7 4 1 3 7 13 13 15 69 22 7 7 1 1 3 13 7 35 63 23 7 8 1 3 5 9 1 25 53 24 7 14 1 3 1 13 9 35 107 25 7 19 1 3 1 5 27 61 31 26 7 21 1 1 5 11 19 41 61 27 7 28 1 3 5 3 3 13 69 28 7 31 1 1 7 13 1 19 1 29 7 32 1 3 7 5 13 19 59 30 7 37 1 1 3 9 25 29 41 31 7 41 1 3 5 13 23 1 55 32 7 42 1 3 7 3 13 59 17 ... """ with open(fname) as f: headers = f.readline().strip().split() assert headers == ['d', 's', 'a', 'm_i'] directions = [] for i, line in zip(range(d), f): d, s, a, *m = [int(cell) for cell in line.strip().split()] directions.append((d, s, a, tuple([0] + m))) return directions #%% @lru_cache(128) def calc_dnum(s, a, m, mxbits): # V, direction numbers if mxbits <= s: dirnum = [0] + [m[i] << (32 - i) for i in range(1, mxbits+1)] else: dirnum = [0] + [m[i] << (32 - i) for i in range(1, s+1)] for i in range(s+1, mxbits+1): dirnum.append(dirnum[i-s] ^ (dirnum[i-s] >> s)) for k in range(1, s): dirnum[i] ^= (((a >> (s-1-k)) & 1) * dirnum[i-k]) return dirnum def sobol_point_gen(N, D, fname="Sobol_new-joe-kuo-6.21201"): pow2_32 = 2**32 mxbits = binhi1(N) # L, max bits needed dnum = [1 << (32- i) for i in range(mxbits+1)] # V, direction numbers directions = direction_file_reader(D, fname) xd = [0 for j in range(D)] # X pt = [0 for j in range(D)] # POINTS yield pt for n in range(1, N): nfirst0 = binlo0(n-1) # C pt = [0 for j in range(D)] xd[0] = xd[0] ^ dnum[nfirst0] pt[0] = xd[0] / pow2_32 for jj in range(1, D): _, s, a, m = directions[jj-1] dnum2 = calc_dnum(s, a, m, mxbits) # V, direction numbers xd[jj] = xd[jj] ^ dnum2[nfirst0] pt[jj] = xd[jj]/pow2_32 yield pt #%% if __name__ == '__main__': first_sobol_3D_points = [ [0, 0, 0], [0.5, 0.5, 0.5], [0.75, 0.25, 0.25], [0.25, 0.75, 0.75], [0.375, 0.375, 0.625], [0.875, 0.875, 0.125], [0.625, 0.125, 0.875], [0.125, 0.625, 0.375], [0.1875, 0.3125, 0.9375], [0.6875, 0.8125, 0.4375]] generated = list(sobol_point_gen(10, 3)) assert generated == first_sobol_3D_points |
Function def direction_file_reader is changed to only read as much of thedirections file as is necessary.
Function def sobol_point_gen is now a generator function in which successive calls to next give successive points. The loops are swapped inside with an outer loop yielding a point every iteration, and an inner loop over dimensions > 1.
Function def calc_dnum is cached. Initial suspicions of a lot of recalculation proved true:
calc_dnum cache stats
In [1]: g = list(sobol_point_gen(10000, 3)) In [2]: calc_dnum.cache_info() Out[2]: CacheInfo(hits=20012, misses=4, maxsize=128, currsize=4)
That's a wonderful work. I am reading it in 2021
ReplyDelete