Successive Approximation

Given a named variable and the range in which its value must exist, generates successive binary digits which can be used to converge on the value successive binary digits approximating the value to ever greater precision.

Binary search for a value.

Starting with the original lower and upper range boundaries for the named variable, the value is compared to the range midpoint: if its lower than the midpoint then the range maximum is updated to become the midpoint as the value is in this half; otherwise the range minimum is updated with the midpoint to narrow therange containing the value by a half on each round. If the lower half of the range is used then generate a zero otherwise generate a one for this "round" in the binary search.

[1]

​def bisect1(value, rmin, rmax):    rmid = (rmin + rmax) / 2    if value <= rmid:        if TUTOR:            print("  is <=", rmid)        bit  = 0                          # push 0        rmax = rmid                       # range lower half    else:        if TUTOR:            print("  is  >", rmid)        bit  = 1                          # push 1        rmin = rmid                       # range upper half            return rmin, rmax, bit​def encode1(value, rangemin, rangemax, bitcount):    if TUTOR:        print(f"Successively encoding {value} between [{rangemin}, {rangemax}]:")    assert rangemin <= value <= rangemax, "Out of range!"    rmin, rmax = rangemin, rangemax   # start at extreme    result = []    for i in range(bitcount):        rmin, rmax, bit = bisect1(value, rmin, rmax)        result.append(bit)    if TUTOR:        print(f"  Resulting bit stream: {result}.")            return result, rmin, rmax

There is a TUTOR mode to turn on and before giving the encoder a whirl.

[2]

TUTOR = True​encode1(2/3, -2, 4, 9)

Successively encoding 0.6666666666666666 between [-2, 4]:
  is <= 1.0
  is  > -0.5
  is  > 0.25
  is  > 0.625
  is <= 0.8125
  is <= 0.71875
  is <= 0.671875
  is  > 0.6484375
  is  > 0.66015625
  Resulting bit stream: [0, 1, 1, 1, 0, 0, 0, 1, 1].

([0, 1, 1, 1, 0, 0, 0, 1, 1], 0.66015625, 0.671875)

You should be able to see how the encoder successively approaches the value.

Lets try some corner cases:

[3]

encode1(0.5, 0, 1, 8)

Successively encoding 0.5 between [0, 1]:
  is <= 0.5
  is  > 0.25
  is  > 0.375
  is  > 0.4375
  is  > 0.46875
  is  > 0.484375
  is  > 0.4921875
  is  > 0.49609375
  Resulting bit stream: [0, 1, 1, 1, 1, 1, 1, 1].

([0, 1, 1, 1, 1, 1, 1, 1], 0.49609375, 0.5)

[4]

encode1(0, 0, 1, 8)

Successively encoding 0 between [0, 1]:
  is <= 0.5
  is <= 0.25
  is <= 0.125
  is <= 0.0625
  is <= 0.03125
  is <= 0.015625
  is <= 0.0078125
  is <= 0.00390625
  Resulting bit stream: [0, 0, 0, 0, 0, 0, 0, 0].

([0, 0, 0, 0, 0, 0, 0, 0], 0, 0.00390625)

[5]

encode1(1, 0, 1, 8)

Successively encoding 1 between [0, 1]:
  is  > 0.5
  is  > 0.75
  is  > 0.875
  is  > 0.9375
  is  > 0.96875
  is  > 0.984375
  is  > 0.9921875
  is  > 0.99609375
  Resulting bit stream: [1, 1, 1, 1, 1, 1, 1, 1].

([1, 1, 1, 1, 1, 1, 1, 1], 0.99609375, 1)

Bitstream decoding

With the encoders bitstream and theinitial range, the initial value can be recovered to ever greater accuracy as more bits are processed. The decoder takes the same decisions as the encoder when successively bisecting the initial range.

[6]

def decode1(bits, rangemin, rangemax):    if TUTOR:        print(f"Successively decoding {bits} between [{rangemin}, {rangemax}]:")    rmin, rmax = rangemin, rangemax   # start at extremes    for bit in bits:        rmid = (rmin + rmax) / 2        if bit == 0:            rmax = rmid                       # range lower half        else:            rmin = rmid                       # range upper half        if TUTOR:            print(f"  in range [{rmin}, {rmax}]")                value = (rmin + rmax) / 2                  # Midrange    delta = abs(value - rmin)    if TUTOR:        print(f"  Resulting value: {value} ±{delta}.")            return value, delta

Lets encode a value then decode the whole of its bitstream

[7]

bits, _, _ = encode1(2/3, -2, 4, 15)print()value = decode1(bits, -2, 4)  # Note use of the same initial range!​

Successively encoding 0.6666666666666666 between [-2, 4]:
  is <= 1.0
  is  > -0.5
  is  > 0.25
  is  > 0.625
  is <= 0.8125
  is <= 0.71875
  is <= 0.671875
  is  > 0.6484375
  is  > 0.66015625
  is  > 0.666015625
  is <= 0.6689453125
  is <= 0.66748046875
  is <= 0.666748046875
  is  > 0.6663818359375
  is  > 0.66656494140625
  Resulting bit stream: [0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1].

Successively decoding [0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1] between [-2, 4]:
  in range [-2, 1.0]
  in range [-0.5, 1.0]
  in range [0.25, 1.0]
  in range [0.625, 1.0]
  in range [0.625, 0.8125]
  in range [0.625, 0.71875]
  in range [0.625, 0.671875]
  in range [0.6484375, 0.671875]
  in range [0.66015625, 0.671875]
  in range [0.666015625, 0.671875]
  in range [0.666015625, 0.6689453125]
  in range [0.666015625, 0.66748046875]
  in range [0.666015625, 0.666748046875]
  in range [0.6663818359375, 0.666748046875]
  in range [0.66656494140625, 0.666748046875]
  Resulting value: 0.666656494140625 ±9.1552734375e-05.

Truncating binary digits

The full 15 bits of the encoding above give a result that is correct when rounded to 5 decimal digits. If a smaller number of bits (from the left), are decoded then the value should be reproduced to a lesser level of precision. Lets try:

[8]

print("# Decode the first 10 bits.")value = decode1(bits[:10], -2, 4)print("\n# Decode only the first 5 bits.")value = decode1(bits[:5], -2, 4)​

# Decode the first 10 bits.
Successively decoding [0, 1, 1, 1, 0, 0, 0, 1, 1, 1] between [-2, 4]:
  in range [-2, 1.0]
  in range [-0.5, 1.0]
  in range [0.25, 1.0]
  in range [0.625, 1.0]
  in range [0.625, 0.8125]
  in range [0.625, 0.71875]
  in range [0.625, 0.671875]
  in range [0.6484375, 0.671875]
  in range [0.66015625, 0.671875]
  in range [0.666015625, 0.671875]
  Resulting value: 0.6689453125 ±0.0029296875.

# Decode only the first 5 bits.
Successively decoding [0, 1, 1, 1, 0] between [-2, 4]:
  in range [-2, 1.0]
  in range [-0.5, 1.0]
  in range [0.25, 1.0]
  in range [0.625, 1.0]
  in range [0.625, 0.8125]
  Resulting value: 0.71875 ±0.09375.

Higher dimensions

What if we have more than one named variables values to encode in a similar fashion - preserving the ability to truncate on the right, (or generate more bits), to affect precision. Well lets give each variable, which we can think of as a "dimension" a name, a range, and all the dimensions a constant ordering.

If we assume the same precision for each dimension and interleave the bitstream for the dimensions in encoding, then the combined interleaved bitstream could be truncated as before ; de-interleaved, then individually decoded back to generate values for each dimension.

[9]

from collections import namedtuple​Dimension = Dim = namedtuple('Dim', 'name, rangemin, rangemax')​dimensions = (Dim('s', -4, 4),  # In order s then t              Dim('t', -5, 5))values = ( -1.5, 2.7)           # In order s then t​def bisect(value, rmin, rmax):    rmid = (rmin + rmax) / 2    if value <= rmid:        bit  = 0                          # push 0        rmax = rmid                       # range lower half    else:        bit  = 1                          # push 1        rmin = rmid                       # range upper half         return rmin, rmax, bit​def encode2(values, dimensions, bitcount_combined):    for value, (name, rangemin, rangemax) in zip(values, dimensions):        assert rangemin <= value <= rangemax, f"{name} value is out of range!"    search = [[value, rmin, rmax]        # start at extreme              for value, (_, rmin, rmax)               in zip(values, dimensions)]    dcount = len(dimensions)    result = []                          # One result for all dimensions    for i in range(bitcount_combined):        d = search[i % dcount]           # dimension to interleave        d[1], d[2], bit = bisect(*d)        result.append(bit)    return result​def decode2(bits, dimensions):    search = [[rmin, rmax]               # start at extreme              for _, rmin, rmax in dimensions]    dcount = len(dimensions)    for i, bit in enumerate(bits):        rmin, rmax = search[i % dcount]  # dimension to decode        rmid = (rmin + rmax) / 2        if bit == 0:            rmax = rmid                       # range lower half        else:            rmin = rmid                       # range upper half        search[i % dcount] = [rmin, rmax]    values = [((rmin + rmax) / 2, abs((rmin - rmax) / 2))              for rmin, rmax in search]    return values  #  [(value, delta), ...]​

[10]

print("Encode then decode",values)for val, delta in decode2(encode2(values, dimensions, 30), dimensions):    print(f"  {val} ±{delta:g}")    

Encode then decode (-1.5, 2.7)
  -1.5001220703125 ±0.00012207
  2.700042724609375 ±0.000152588

Again, we can truncate and take the leftmost 14 bits of the encoding to decode to less precision in both values

[11]

for val, delta in decode2(encode2(values, dimensions, 30)[:14], dimensions):    print(f"  {val} ±{delta:g})")

  -1.53125 ±0.03125)
  2.6953125 ±0.0390625)

A closer look at interleaving.

Create separate bitstreams for two dimensions and compare to that of the combined.

[12]

# Recapprint("Dimensions:", dimensions)print("Values:", values)

Dimensions: (Dim(name='s', rangemin=-4, rangemax=4), Dim(name='t', rangemin=-5, rangemax=5))
Values: (-1.5, 2.7)

[13]

s_bits, *_ = encode1(-1.5, -4, 4, 7)

Successively encoding -1.5 between [-4, 4]:
  is <= 0.0
  is  > -2.0
  is <= -1.0
  is <= -1.5
  is  > -1.75
  is  > -1.625
  is  > -1.5625
  Resulting bit stream: [0, 1, 0, 0, 1, 1, 1].

[14]

t_bits, *_ = encode1(2.7, -5, 5, 7)

Successively encoding 2.7 between [-5, 5]:
  is  > 0.0
  is  > 2.5
  is <= 3.75
  is <= 3.125
  is <= 2.8125
  is  > 2.65625
  is <= 2.734375
  Resulting bit stream: [1, 1, 0, 0, 0, 1, 0].

[15]

def alternate(list1, list2):    "zip up the members of the two lists, list1[0] first"    return sum((list(m1m2) for m1m2 in zip(list1, list2)), [])

[16]

st_bits = alternate(s_bits, t_bits)print(st_bits)

[0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0]

[17]

st_encode2 = encode2(values, dimensions, 14)st_encode2

[0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0]

[18]

## ORbits_per_dim = bpd = 7assert encode2(values, dimensions, bpd * 2) == alternate(encode1(-1.5, -4, 4, bpd)[0],                                                          encode1(2.7, -5, 5, bpd)[0])print("OK")

Successively encoding -1.5 between [-4, 4]:
  is <= 0.0
  is  > -2.0
  is <= -1.0
  is <= -1.5
  is  > -1.75
  is  > -1.625
  is  > -1.5625
  Resulting bit stream: [0, 1, 0, 0, 1, 1, 1].
Successively encoding 2.7 between [-5, 5]:
  is  > 0.0
  is  > 2.5
  is <= 3.75
  is <= 3.125
  is <= 2.8125
  is  > 2.65625
  is <= 2.734375
  Resulting bit stream: [1, 1, 0, 0, 0, 1, 0].
OK

On precision.

The code works to the precision of the Python datatypes passed into it. The ranges so far have used Pythons 64 bit floating point datatype for range and value. If decimals or fractions are used then much more precision could be achieved with the same code due to duck typing.

Human readability

[19]

st_bits

[0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0]

These strings of binary bits are hard for us to accurately remember and transcribe. Traditionally we have used octal, base-8; and hexadecimal, base 16 number bases to get shorter, more memorable, representations for binary numbers.

I am going to use abase 32 numbering scheme to encode the bits. Every five successive bits will become one of 32 different characters. The characters used are the digits zero to nine then 22 lowercase ASCII letters missing out certain letters, (a, i, l, and o), that could be mistaken for numbers.

Users will more naturally express precision in terms of the number of characters in the representation and so the underlying bit precision will be in multiples of 5-bits.

[20]

ch32 = "0123456789bcdefghjkmnpqrstuvwxyz"​def n2bitlist(n, _width=5):    return tuple((1 if x == '1' else 0) for x in f"{n:0{_width}b}")​# Map binary tuple to code characterbool2ch = {n2bitlist(i): ch for i, ch in enumerate(ch32)} ch2bool = {ch: list(b) for b, ch in bool2ch.items()}

[21]

​def encode(values, dimensions, code_len):    bits = encode2(values, dimensions, code_len * 5)    # To base 32    b = tuple(bits)    #print('Bits:', b)    code =  (bool2ch[b[i * 5: (i + 1) * 5]] for i in range(code_len))      return ''.join(code)​def decode(code, dimensions):    bits = sum((ch2bool[ch] for ch in code), [])    values = decode2(bits, dimensions)    return values

[22]

# Recapprint("Dimensions:", dimensions)print("Values:", values)

Dimensions: (Dim(name='s', rangemin=-4, rangemax=4), Dim(name='t', rangemin=-5, rangemax=5))
Values: (-1.5, 2.7)

[23]

st_code = encode(values, dimensions, 3)print('Values encode to %r' % st_code)

Values encode to 'f2x'

[24]

print("Then decode to:")for val, delta in decode(st_code, dimensions):    print(f"  {val} ±{delta:g}")

Then decode to:
  -1.515625 ±0.015625
  2.6953125 ±0.0390625

Geohashes

Geohashes are a convenient way of encoding geographical coordinates as a character string.

I have used the same textual mapping so that the following dimensional data will reproduce Geohash en/decoding.

[25]

geo = (Dim('latitude',   -90,  90),    # In order latitude then longitude.       Dim('longitude', -180, 180))​

Unfortunately Geohashes, whilst specifying coordinates in the order latitude then longitude; alternate bits from longitude and then latidude - i.e. in reverse, so adjustments are needed.

[26]

def encode3(values, dimensions, bitcount_combined, alternate_from_last=True):    for value, (name, rangemin, rangemax) in zip(values, dimensions):        assert rangemin <= value <= rangemax, f"{name} value is out of range!"    search = [[value, rmin, rmax]        # start at extreme              for value, (_, rmin, rmax)               in zip(values, dimensions)]    dcount = len(dimensions)    result = []                          # One result for all dimensions    for i in range(bitcount_combined):        dim = (dcount - 1 -(i % dcount)) if alternate_from_last else i % dcount        d = search[dim]                  # dimension to interleave        d[1], d[2], bit = bisect(*d)        result.append(bit)    return result​def decode3(bits, dimensions, alternate_from_last=True):    search = [[rmin, rmax]               # start at extreme              for _, rmin, rmax in dimensions]    dcount = len(dimensions)    for i, bit in enumerate(bits):        dim = (dcount - 1 - (i % dcount)) if alternate_from_last else i % dcount        rmin, rmax = search[dim]         # dimension to decode        rmid = (rmin + rmax) / 2        if bit == 0:            rmax = rmid                       # range lower half        else:            rmin = rmid                       # range upper half        search[dim] = [rmin, rmax]    values = [((rmin + rmax) / 2, abs((rmin - rmax) / 2))              for rmin, rmax in search]    return values  #  [(value, delta), ...]​def geo_encode(values, code_len):    bits = encode3(values, geo, code_len * 5)    # To base 32    b = tuple(bits)    #print('Bits:', b)    code =  (bool2ch[b[i * 5: (i + 1) * 5]] for i in range(code_len))      return ''.join(code)​def geo_decode(code):    bits = sum((ch2bool[ch] for ch in code), [])    values = decode3(bits, geo)    return values

[27]

coord = (51.433718, -0.214126)geo_hash = geo_encode(coord, 6)geo_hash

'gcpue5'

[28]

latlong = geo_decode(geo_hash)print(f"Geohash '{geo_hash}' decodes to:")for (val, delta), (name, *_) in zip(geo_decode(geo_hash), geo):    print(f"  {name:10}: {val} ±{delta:g}")

Geohash 'gcpue5' decodes to:
  latitude  : 51.43524169921875 ±0.00274658
  longitude : -0.2142333984375 ±0.00549316

External sites for testing Geohashes

You can enter the coordinates or the geohash above into this site and find that it points to the "Wimpledon All England Lawn Tennis and Croquet Club".

End Notes.

This blog post was written because the WIkipedia entry and a web search did nt turn up a good description of Geohashes for me to finish the Python Rosetta Code entry initially. After more reading I finally completed the Python entry.

I thought about generalising the method to more than two dimensions and also wanted to highlight the swithing of order in geohashing: lat/long normal coord order vs long/lat used when alternating the bits from the binary search.

Another generalisation might be to use n-ary search for a value producing a base-n "bit"; but I could not think of a need, and this post is already long.

END.

Not saved yet

python3 | launching

Go deh!

Sunday, June 21, 2020