Successive Approximation
Given a named variable and the range in which its value must exist, generates successive binary digits which can be used to converge on the value successive binary digits approximating the value to ever greater precision.
Binary search for a value.
Starting with the original lower and upper range boundaries for the named variable, the value is compared to the range midpoint: if its lower than the midpoint then the range maximum is updated to become the midpoint as the value is in this half; otherwise the range minimum is updated with the midpoint to narrow therange containing the value by a half on each round. If the lower half of the range is used then generate a zero otherwise generate a one for this "round" in the binary search.
def bisect1(value, rmin, rmax):
rmid = (rmin + rmax) / 2
if value <= rmid:
if TUTOR:
print(" is <=", rmid)
bit = 0 # push 0
rmax = rmid # range lower half
else:
if TUTOR:
print(" is >", rmid)
bit = 1 # push 1
rmin = rmid # range upper half
return rmin, rmax, bit
def encode1(value, rangemin, rangemax, bitcount):
if TUTOR:
print(f"Successively encoding {value} between [{rangemin}, {rangemax}]:")
assert rangemin <= value <= rangemax, "Out of range!"
rmin, rmax = rangemin, rangemax # start at extreme
result = []
for i in range(bitcount):
rmin, rmax, bit = bisect1(value, rmin, rmax)
result.append(bit)
if TUTOR:
print(f" Resulting bit stream: {result}.")
return result, rmin, rmax
There is a TUTOR
mode to turn on and before giving the encoder a whirl.
TUTOR = True
encode1(2/3, -2, 4, 9)
Successively encoding 0.6666666666666666 between [-2, 4]:
is <= 1.0
is > -0.5
is > 0.25
is > 0.625
is <= 0.8125
is <= 0.71875
is <= 0.671875
is > 0.6484375
is > 0.66015625
Resulting bit stream: [0, 1, 1, 1, 0, 0, 0, 1, 1].
([0, 1, 1, 1, 0, 0, 0, 1, 1], 0.66015625, 0.671875)
You should be able to see how the encoder successively approaches the value.
Lets try some corner cases:
encode1(0.5, 0, 1, 8)
Successively encoding 0.5 between [0, 1]:
is <= 0.5
is > 0.25
is > 0.375
is > 0.4375
is > 0.46875
is > 0.484375
is > 0.4921875
is > 0.49609375
Resulting bit stream: [0, 1, 1, 1, 1, 1, 1, 1].
([0, 1, 1, 1, 1, 1, 1, 1], 0.49609375, 0.5)
encode1(0, 0, 1, 8)
Successively encoding 0 between [0, 1]:
is <= 0.5
is <= 0.25
is <= 0.125
is <= 0.0625
is <= 0.03125
is <= 0.015625
is <= 0.0078125
is <= 0.00390625
Resulting bit stream: [0, 0, 0, 0, 0, 0, 0, 0].
([0, 0, 0, 0, 0, 0, 0, 0], 0, 0.00390625)
encode1(1, 0, 1, 8)
Successively encoding 1 between [0, 1]:
is > 0.5
is > 0.75
is > 0.875
is > 0.9375
is > 0.96875
is > 0.984375
is > 0.9921875
is > 0.99609375
Resulting bit stream: [1, 1, 1, 1, 1, 1, 1, 1].
([1, 1, 1, 1, 1, 1, 1, 1], 0.99609375, 1)
Bitstream decoding
With the encoders bitstream and theinitial range, the initial value can be recovered to ever greater accuracy as more bits are processed. The decoder takes the same decisions as the encoder when successively bisecting the initial range.
def decode1(bits, rangemin, rangemax):
if TUTOR:
print(f"Successively decoding {bits} between [{rangemin}, {rangemax}]:")
rmin, rmax = rangemin, rangemax # start at extremes
for bit in bits:
rmid = (rmin + rmax) / 2
if bit == 0:
rmax = rmid # range lower half
else:
rmin = rmid # range upper half
if TUTOR:
print(f" in range [{rmin}, {rmax}]")
value = (rmin + rmax) / 2 # Midrange
delta = abs(value - rmin)
if TUTOR:
print(f" Resulting value: {value} ±{delta}.")
return value, delta
Lets encode a value then decode the whole of its bitstream
bits, _, _ = encode1(2/3, -2, 4, 15)
print()
value = decode1(bits, -2, 4) # Note use of the same initial range!
Successively encoding 0.6666666666666666 between [-2, 4]:
is <= 1.0
is > -0.5
is > 0.25
is > 0.625
is <= 0.8125
is <= 0.71875
is <= 0.671875
is > 0.6484375
is > 0.66015625
is > 0.666015625
is <= 0.6689453125
is <= 0.66748046875
is <= 0.666748046875
is > 0.6663818359375
is > 0.66656494140625
Resulting bit stream: [0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1].
Successively decoding [0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1] between [-2, 4]:
in range [-2, 1.0]
in range [-0.5, 1.0]
in range [0.25, 1.0]
in range [0.625, 1.0]
in range [0.625, 0.8125]
in range [0.625, 0.71875]
in range [0.625, 0.671875]
in range [0.6484375, 0.671875]
in range [0.66015625, 0.671875]
in range [0.666015625, 0.671875]
in range [0.666015625, 0.6689453125]
in range [0.666015625, 0.66748046875]
in range [0.666015625, 0.666748046875]
in range [0.6663818359375, 0.666748046875]
in range [0.66656494140625, 0.666748046875]
Resulting value: 0.666656494140625 ±9.1552734375e-05.
Truncating binary digits
The full 15 bits of the encoding above give a result that is correct when rounded to 5 decimal digits. If a smaller number of bits (from the left), are decoded then the value should be reproduced to a lesser level of precision. Lets try:
print("# Decode the first 10 bits.")
value = decode1(bits[:10], -2, 4)
print("\n# Decode only the first 5 bits.")
value = decode1(bits[:5], -2, 4)
# Decode the first 10 bits.
Successively decoding [0, 1, 1, 1, 0, 0, 0, 1, 1, 1] between [-2, 4]:
in range [-2, 1.0]
in range [-0.5, 1.0]
in range [0.25, 1.0]
in range [0.625, 1.0]
in range [0.625, 0.8125]
in range [0.625, 0.71875]
in range [0.625, 0.671875]
in range [0.6484375, 0.671875]
in range [0.66015625, 0.671875]
in range [0.666015625, 0.671875]
Resulting value: 0.6689453125 ±0.0029296875.
# Decode only the first 5 bits.
Successively decoding [0, 1, 1, 1, 0] between [-2, 4]:
in range [-2, 1.0]
in range [-0.5, 1.0]
in range [0.25, 1.0]
in range [0.625, 1.0]
in range [0.625, 0.8125]
Resulting value: 0.71875 ±0.09375.
Higher dimensions
What if we have more than one named variables values to encode in a similar fashion - preserving the ability to truncate on the right, (or generate more bits), to affect precision. Well lets give each variable, which we can think of as a "dimension" a name, a range, and all the dimensions a constant ordering.
If we assume the same precision for each dimension and interleave the bitstream for the dimensions in encoding, then the combined interleaved bitstream could be truncated as before ; de-interleaved, then individually decoded back to generate values for each dimension.
from collections import namedtuple
Dimension = Dim = namedtuple('Dim', 'name, rangemin, rangemax')
dimensions = (Dim('s', -4, 4), # In order s then t
Dim('t', -5, 5))
values = ( -1.5, 2.7) # In order s then t
def bisect(value, rmin, rmax):
rmid = (rmin + rmax) / 2
if value <= rmid:
bit = 0 # push 0
rmax = rmid # range lower half
else:
bit = 1 # push 1
rmin = rmid # range upper half
return rmin, rmax, bit
def encode2(values, dimensions, bitcount_combined):
for value, (name, rangemin, rangemax) in zip(values, dimensions):
assert rangemin <= value <= rangemax, f"{name} value is out of range!"
search = [[value, rmin, rmax] # start at extreme
for value, (_, rmin, rmax)
in zip(values, dimensions)]
dcount = len(dimensions)
result = [] # One result for all dimensions
for i in range(bitcount_combined):
d = search[i % dcount] # dimension to interleave
d[1], d[2], bit = bisect(*d)
result.append(bit)
return result
def decode2(bits, dimensions):
search = [[rmin, rmax] # start at extreme
for _, rmin, rmax in dimensions]
dcount = len(dimensions)
for i, bit in enumerate(bits):
rmin, rmax = search[i % dcount] # dimension to decode
rmid = (rmin + rmax) / 2
if bit == 0:
rmax = rmid # range lower half
else:
rmin = rmid # range upper half
search[i % dcount] = [rmin, rmax]
values = [((rmin + rmax) / 2, abs((rmin - rmax) / 2))
for rmin, rmax in search]
return values # [(value, delta), ...]
print("Encode then decode",values)
for val, delta in decode2(encode2(values, dimensions, 30), dimensions):
print(f" {val} ±{delta:g}")
Encode then decode (-1.5, 2.7)
-1.5001220703125 ±0.00012207
2.700042724609375 ±0.000152588
Again, we can truncate and take the leftmost 14 bits of the encoding to decode to less precision in both values
for val, delta in decode2(encode2(values, dimensions, 30)[:14], dimensions):
print(f" {val} ±{delta:g})")
-1.53125 ±0.03125)
2.6953125 ±0.0390625)
A closer look at interleaving.
Create separate bitstreams for two dimensions and compare to that of the combined.
# Recap
print("Dimensions:", dimensions)
print("Values:", values)
Dimensions: (Dim(name='s', rangemin=-4, rangemax=4), Dim(name='t', rangemin=-5, rangemax=5))
Values: (-1.5, 2.7)
s_bits, *_ = encode1(-1.5, -4, 4, 7)
Successively encoding -1.5 between [-4, 4]:
is <= 0.0
is > -2.0
is <= -1.0
is <= -1.5
is > -1.75
is > -1.625
is > -1.5625
Resulting bit stream: [0, 1, 0, 0, 1, 1, 1].
t_bits, *_ = encode1(2.7, -5, 5, 7)
Successively encoding 2.7 between [-5, 5]:
is > 0.0
is > 2.5
is <= 3.75
is <= 3.125
is <= 2.8125
is > 2.65625
is <= 2.734375
Resulting bit stream: [1, 1, 0, 0, 0, 1, 0].
def alternate(list1, list2):
"zip up the members of the two lists, list1[0] first"
return sum((list(m1m2) for m1m2 in zip(list1, list2)), [])
st_bits = alternate(s_bits, t_bits)
print(st_bits)
[0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0]
st_encode2 = encode2(values, dimensions, 14)
st_encode2
[0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0]
## OR
bits_per_dim = bpd = 7
assert encode2(values, dimensions, bpd * 2) == alternate(encode1(-1.5, -4, 4, bpd)[0],
encode1(2.7, -5, 5, bpd)[0])
print("OK")
Successively encoding -1.5 between [-4, 4]:
is <= 0.0
is > -2.0
is <= -1.0
is <= -1.5
is > -1.75
is > -1.625
is > -1.5625
Resulting bit stream: [0, 1, 0, 0, 1, 1, 1].
Successively encoding 2.7 between [-5, 5]:
is > 0.0
is > 2.5
is <= 3.75
is <= 3.125
is <= 2.8125
is > 2.65625
is <= 2.734375
Resulting bit stream: [1, 1, 0, 0, 0, 1, 0].
OK
On precision.
The code works to the precision of the Python datatypes passed into it. The ranges so far have used Pythons 64 bit floating point datatype for range and value. If decimals or fractions are used then much more precision could be achieved with the same code due to duck typing.
Human readability
st_bits
[0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0]
These strings of binary bits are hard for us to accurately remember and transcribe. Traditionally we have used octal, base-8; and hexadecimal, base 16 number bases to get shorter, more memorable, representations for binary numbers.
I am going to use abase 32
numbering scheme to encode the bits. Every five successive bits will become one of 32 different characters. The characters used are the digits zero to nine then 22 lowercase ASCII letters missing out certain letters, (a, i, l, and o), that could be mistaken for numbers.
Users will more naturally express precision in terms of the number of characters in the representation and so the underlying bit precision will be in multiples of 5-bits.
ch32 = "0123456789bcdefghjkmnpqrstuvwxyz"
def n2bitlist(n, _width=5):
return tuple((1 if x == '1' else 0) for x in f"{n:0{_width}b}")
# Map binary tuple to code character
bool2ch = {n2bitlist(i): ch for i, ch in enumerate(ch32)}
ch2bool = {ch: list(b) for b, ch in bool2ch.items()}
def encode(values, dimensions, code_len):
bits = encode2(values, dimensions, code_len * 5)
# To base 32
b = tuple(bits)
#print('Bits:', b)
code = (bool2ch[b[i * 5: (i + 1) * 5]] for i in range(code_len))
return ''.join(code)
def decode(code, dimensions):
bits = sum((ch2bool[ch] for ch in code), [])
values = decode2(bits, dimensions)
return values
# Recap
print("Dimensions:", dimensions)
print("Values:", values)
Dimensions: (Dim(name='s', rangemin=-4, rangemax=4), Dim(name='t', rangemin=-5, rangemax=5))
Values: (-1.5, 2.7)
st_code = encode(values, dimensions, 3)
print('Values encode to %r' % st_code)
Values encode to 'f2x'
print("Then decode to:")
for val, delta in decode(st_code, dimensions):
print(f" {val} ±{delta:g}")
Then decode to:
-1.515625 ±0.015625
2.6953125 ±0.0390625
Geohashes
Geohashes are a convenient way of encoding geographical coordinates as a character string.
I have used the same textual mapping so that the following dimensional data will reproduce Geohash en/decoding.
geo = (Dim('latitude', -90, 90), # In order latitude then longitude.
Dim('longitude', -180, 180))
Unfortunately Geohashes, whilst specifying coordinates in the order latitude then longitude; alternate bits from longitude and then latidude - i.e. in reverse, so adjustments are needed.
def encode3(values, dimensions, bitcount_combined, alternate_from_last=True):
for value, (name, rangemin, rangemax) in zip(values, dimensions):
assert rangemin <= value <= rangemax, f"{name} value is out of range!"
search = [[value, rmin, rmax] # start at extreme
for value, (_, rmin, rmax)
in zip(values, dimensions)]
dcount = len(dimensions)
result = [] # One result for all dimensions
for i in range(bitcount_combined):
dim = (dcount - 1 -(i % dcount)) if alternate_from_last else i % dcount
d = search[dim] # dimension to interleave
d[1], d[2], bit = bisect(*d)
result.append(bit)
return result
def decode3(bits, dimensions, alternate_from_last=True):
search = [[rmin, rmax] # start at extreme
for _, rmin, rmax in dimensions]
dcount = len(dimensions)
for i, bit in enumerate(bits):
dim = (dcount - 1 - (i % dcount)) if alternate_from_last else i % dcount
rmin, rmax = search[dim] # dimension to decode
rmid = (rmin + rmax) / 2
if bit == 0:
rmax = rmid # range lower half
else:
rmin = rmid # range upper half
search[dim] = [rmin, rmax]
values = [((rmin + rmax) / 2, abs((rmin - rmax) / 2))
for rmin, rmax in search]
return values # [(value, delta), ...]
def geo_encode(values, code_len):
bits = encode3(values, geo, code_len * 5)
# To base 32
b = tuple(bits)
#print('Bits:', b)
code = (bool2ch[b[i * 5: (i + 1) * 5]] for i in range(code_len))
return ''.join(code)
def geo_decode(code):
bits = sum((ch2bool[ch] for ch in code), [])
values = decode3(bits, geo)
return values
coord = (51.433718, -0.214126)
geo_hash = geo_encode(coord, 6)
geo_hash
'gcpue5'
latlong = geo_decode(geo_hash)
print(f"Geohash '{geo_hash}' decodes to:")
for (val, delta), (name, *_) in zip(geo_decode(geo_hash), geo):
print(f" {name:10}: {val} ±{delta:g}")
Geohash 'gcpue5' decodes to:
latitude : 51.43524169921875 ±0.00274658
longitude : -0.2142333984375 ±0.00549316
External sites for testing Geohashes
You can enter the coordinates or the geohash above into this site and find that it points to the "Wimpledon All England Lawn Tennis and Croquet Club".
End Notes.
This blog post was written because the WIkipedia entry and a web search did nt turn up a good description of Geohashes for me to finish the Python Rosetta Code entry initially. After more reading I finally completed the Python entry.
I thought about generalising the method to more than two dimensions and also wanted to highlight the swithing of order in geohashing: lat/long normal coord order vs long/lat used when alternating the bits from the binary search.
Another generalisation might be to use n-ary search for a value producing a base-n "bit"; but I could not think of a need, and this post is already long.
END.
Not saved yet
python3 | launching
No comments:
Post a Comment