Well, after needing to split two sets into:
- Items unique to set1
- Items common to both sets
- items unique to set2
exclusively1, common, exclusively2 = set1 - set2, set1 & set2, set2 - set1
was having to traverse each set three times, and, seemed like a useful bit of code to propose that Python add to its libraries.
I dutifully asked just that on comp.lang.python-ideas, and the debate is healthily progressing.
I liked the code. But then I thought, what about more than two sets? What would a partition of more than two sets even look like?
Next morning I read a lot more of the Wikipedia article on Venn diagrams and picked up on them being a visualization of good ol' truth tables. That spark helped me understand that the sets could represent the three columns of input, if I add a binary count below the three then it looks like the LHS of a truth table; but the truth table for three inputs has eight lines not seven?
If the ones in the truth table represent an item being in a set then you get the following relationship between the input sets set0, set1, set2; and the contents of the output bins:
SET
210
===
000 <-> ???->
001 <-> bin_0->
010 <-> bin_1->
011 <-> bin_0&1->
100 <-> bin_2->
101 <-> bin_0&2->
110 <-> bin_1&2->
111 <-> bin_0&1&2->
(Scan across any row. For every one bit set then the set of that column is in the bin)
I also worked out that the initial row of 000 must correspond to a bin for items that are in no set?! Since the items come from a set then this bin will always be empty.
O.K. I have a relationship between the number of input sets, binary counts and output binning, but no code.
I wrote the following few lines, got stumped, and decided to finish reading "The long war" at a Pub on the river Avon whilst sipping Thatchers Gold cider served so cold it comes as a slushy.
I took another hard look at the code I had and eventually thought that I just had to get something down and play around with it until it felt right. I finally came up with the following function that when tested seemed to give the right answers:
Of course, you also need to know the order for the result so I then came up with the following:
i in range(1,2**n)]was having to traverse each set three times, and, seemed like a useful bit of code to propose that Python add to its libraries.
I dutifully asked just that on comp.lang.python-ideas, and the debate is healthily progressing.
The two set case
Oscar Benjamin and Peter Otten hammered out this code for an algorithm to do the split whilst traversing each set only once, that is we are down to two traversals in total.def partition(setx, sety): xonly, xandy, yonly = set(), set(), set() for x in setx: if x in sety: xandy.add(x) else: xonly.add(x) for y in sety: if y not in setx: yonly.add(y) return xonly, xandy, yonly
The N set case
This is where Venn diagrams come in. Given three sets you want to return sets of items corresponding to the seven regions of the three set Venn diagram. Looking at the partition function above, I couldn't work it out and went to bed on it.Next morning I read a lot more of the Wikipedia article on Venn diagrams and picked up on them being a visualization of good ol' truth tables. That spark helped me understand that the sets could represent the three columns of input, if I add a binary count below the three then it looks like the LHS of a truth table; but the truth table for three inputs has eight lines not seven?
If the ones in the truth table represent an item being in a set then you get the following relationship between the input sets set0, set1, set2; and the contents of the output bins:
SET
210
===
000 <-> ???->
001 <-> bin_0->
010 <-> bin_1->
011 <-> bin_0&1->
100 <-> bin_2->
101 <-> bin_0&2->
110 <-> bin_1&2->
111 <-> bin_0&1&2->
(Scan across any row. For every one bit set then the set of that column is in the bin)
I also worked out that the initial row of 000 must correspond to a bin for items that are in no set?! Since the items come from a set then this bin will always be empty.
O.K. I have a relationship between the number of input sets, binary counts and output binning, but no code.
I wrote the following few lines, got stumped, and decided to finish reading "The long war" at a Pub on the river Avon whilst sipping Thatchers Gold cider served so cold it comes as a slushy.
set0 = {1, 3, 5, 7} set1 = {2, 3, 6, 7} set2 = {4, 5, 6, 7}
def venndiv(*sets): bins = tuple(set() for n in range(2**len(sets)) - 1) for i, si in enumerate(sets):
The Slog
Back from the pub; a walk; playing with my youngest son; ...I took another hard look at the code I had and eventually thought that I just had to get something down and play around with it until it felt right. I finally came up with the following function that when tested seemed to give the right answers:
def venndiv(*sets): bins = tuple(set() for n in range(2**len(sets))) for i, si in enumerate(sets): # Over every individual item of each set for item in si: binindex = 0 for j, sj in enumerate(sets): # Test for membership of each set if item in sj: binindex += 2**j # Binary count binning bins[binindex].add(item) return bins[1:] # bins[0] - an item in no set is impossible. print(venndiv({1,2,3}, {3,4,5})) # (set([1, 2]), set([4, 5]), set([3])) assert venndiv(set0, set1, set2) == (set([1]), set([2]), set([3]), set([4]), set([5]), set([6]), set([7])) setA = set('A AB AC ABC'.split()) setB = set('B AB BC ABC'.split()) setC = set('C AC BC ABC'.split()) print(venndiv(setA, setB, setC)) # (set(['A']), set(['B']), set(['AB']), set(['C']), # set(['AC']), set(['BC']), set(['ABC']))
def bin_order(n): return ['bin' + ''.join(chr(65+j) for j in range(n) if i & 1<for
print(bin_order(3)) # ['binA', 'binB', 'binAB', 'binC', 'binAC', 'binBC', 'binABC']
End bit
I am not asking for function venndiv to become part of the Python libraries. It is just an exercise to see if I could expand on the useful two-set case.Cheers :-)
Is there any way to indent code in these comments? I haven’t found one, so instead I put the code I wanted to post here on gist.
ReplyDeleteIt’s a different way to write your venndiv function, more to my personal taste (but perhaps less to yours).
Bosker, Any chance of maybe you doing your own blog entry on how you developed your answer as well as expanding on how it works?
DeleteSure! Thanks for the suggestion. http://bosker.wordpress.com/2013/07/10/venn-diagram-partitioning/
DeleteHi bosker, saw your code and will take a look in the next 12 hour hopefully.
ReplyDeleteAs for the lack of decent editing facilities; It seems Google gets enough ad revenue with this level of support. I and others have asked years ago but not much changes. I wouldn't be surprised if they can blogger in the near future as not being "shiney-shiney" enough.
Hi paddy3118,
ReplyDeleteThast exactly what I need :)
But, there has been a problem in the visualization of the bin_order function, could you paste it in plain text.
Thanks in advance
Peio
Sorry for the late reply.
Deletedef bin_order(n):
....return ['bin' + ''.join(chr(65+j) for j in range(n) if i & 1<<j)
............for i in range(1,2**n)]
Hello,
ReplyDeleteI am wondering if there is a way to pass n-number of sets into this function, instead of specifying one by one?
Thanks.
if you have a list pf sets just unpack it in the call, something like:
Deletevenndiv(*list_of_sets)
Thanks, this helps. One more question: say that I am interested in only finding unique elements to a set rather than computing all possible binary combinations. For example, given A, B and C sets this code returns A, B, AB, C, AC, BC, ABC. I am interested in calculating only A, B and C conditions. How can I modify the code to obtain this result?
Delete