Thursday, April 16, 2009

Bimap. Bi-directional mapping/dictionary for Python?

I had a chance encounter with the boost C++ library documentation and
came across a description of their href="http://cablemodem.fibertel.com.ar/mcape/boost/libs/bimap/index.html">bimap
(that is bimap without a 't') .



It seems to be a bi-directional mapping.



In a Python dict, keys map to values. In a bimap, values would also map
to keys. It seems they put keys/values on an equal footing by renaming
them as left and right members, and having methods that work on either
the right to left or the left to right mapping.



I had just re-visited a program of mine that dealt with what I would
consider large sets of values. I was mapping test names to test
coverage where their were tens of trhousands of test names, each of
~100 characters long and for each file, I had many thousands of code
coverage points, which themselves were strings of ~150 characters each.
I new I had to work with a dict mapping test_names to sets of covered
points for each test and do a lot of set arithmatic to rank the tests
in order of contribution to the overall coverage figure, so I decided
to use indices; replacing files by negative numbers and coverage points
by positive numbers. My program works, and is very fast but I note that
I create both a testname2index mapping dict and the inverse
index2testname mapping, (and coverpoint2index and index2coverpoint
mapping dicts).



What I had been constructing, unawares, was a bimap by using two dicts.



I then spent time googling for Python bimaps to no avail, (although I
think there are haskel and Java implementations out there - and Google
'helps' by sugesting you meant bitmap with a 't').



So my questions are:


  1.  Is their a Python bimap implementation out there?

  2.  Would anyone want one? (I already code around its
    absense).

  3.  What would its methods be?




Methods



On my third question, I guess you could have all the normal methods of
a dict but preceeded by either
left_
or right_
with calling of a method without those prefixes raising an exception,
as it seems important to not favour one mapping direction over another.



This would give:

style="font-family: monospace; font-weight: bold; text-decoration: underline;">      
DICT      BIMOD
(LEFT)  BIMOD (RIGHT) style="font-family: monospace; font-weight: bold;">
style="font-family: monospace;">     
clear       
left_clear  right_clear   
 

      
copy        
left_copy  right_copy    
 


  
fromkeys    
left_fromkeys  right_fromkeys  
style="font-family: monospace;">
       
get         
left_get 
right_get       
style="font-family: monospace;">
   
has_key     
left_has_key  right_has_key   
style="font-family: monospace;">
     
items       
left_items  right_items   
 


 
iteritems    left_iteritems 
right_iteritems


  
iterkeys    
left_iterkeys  right_iterkeys  
style="font-family: monospace;">
 itervalues  
left_itervalues  right_itervalues
style="font-family: monospace;">
      
keys        
left_keys  right_keys    
 


       
pop         
left_pop 
right_pop       
style="font-family: monospace;">
   
popitem     
left_popitem  right_popitem   
style="font-family: monospace;">
 setdefault  
left_setdefault  right_setdefault
style="font-family: monospace;">
    
update      
left_update  right_update    
style="font-family: monospace;">
    
values      
left_values  right_values    





The above couldn't be extended for indexing however, unless you did
something like:

  style="font-family: monospace;">bimod_instance.left[leftindex]
and bimod_instance..right[rightindex]


but then wouldn't it be better to use the dotted form for all the
methods and have:

style="font-family: monospace;"> 
bimod_instance.left.haskey  # (with a dot after left), ...




I note that their are redundancies above, for example, left_clear ==
right_clear ...



Strewth, I've only scratched the surface :-)



- Paddy.

12 comments:

  1. Search for bijection/surjection/injection instead of "bimap" and you might do better.

    I've got a dirt-simple Bijection dict subclass at http://www.aminus.net/geniusql/browser/trunk/geniusql/objects.py#L32 -- it only adds a "key_for(value)" method.

    ReplyDelete
  2. You can simplify the methods by adding an intermediate object, like this:

    dict.left.keys() == dict.right.values()

    Of course, this only works for dicts where both keys and values are hashable.

    ReplyDelete
  3. Hi Fumanchu, thanks for the terms. A quick wikipedia search and I now know that the three terms relate to a mapping between a functions argument and its result:

    In an injection, each input maps to one output, however the output set maybe larger than the set of inputs.

    In a surjection, each output maps to one input, however the input set may be larger than the set of outputs.

    In a bijection, each input maps to exactly one output.

    I can see why they called the datatype a bimap now.

    -

    Hi Growl, I was slowly getting to that myself with what I said in the third-from-last paragraph, below the table.

    -

    My search for a Python bimap/bijection map got me a Xah Lee rant, but searching for "bidirectional map" was more interesting, just not in Python.

    - Paddy.

    ReplyDelete
  4. After reading your post I decided that might be a fun day project. I'm sure someone else has done this, but I thought I would share my Class with you.

    http://trydyingtolive.dynalias.com/code/bidict.txt

    ReplyDelete
  5. @tryDyingToLive,
    I took a look at your code - Thanks for taking the time and effort to produce it.

    I took a look at the example at the head of the file and spotted a problem: a bimap maps members of a left set to a right set, and vice-versa, but an item on the left can appear as an item on the right in the general case.

    In my uses so far, I don't think this is the case, but you might conceivably want to do:
    BiDict({1:'a',2:'b','c':'c',3:3})

    - Paddy.

    ReplyDelete
  6. You can actually do that. Its one of the level's of strictness.

    b=BiDict({1:'a',2:'b','c':'c',3:3},'bi-dict')

    The code is set to the strictest as default. Probably because it was late when I finished.

    ReplyDelete
  7. Have you considered any unabashedly relational solutions?

    ReplyDelete
  8. Well, I tend to not grab for a relational DB quickly. How would one help?

    ReplyDelete
  9. So if you delete a key, then does the value go as well ?

    ReplyDelete
  10. For a bimap, it would make sense as by definition, only left/right pairs should exist.

    Maybe the code for a bimap should stress deletion of left/right pairs.

    - Paddy.

    ReplyDelete
  11. This comment has been removed by a blog administrator.

    ReplyDelete
  12. This comment has been removed by a blog administrator.

    ReplyDelete