Well, after needing to split two sets into:
- Items unique to set1
- Items common to both sets
- items unique to set2
exclusively1, common, exclusively2 = set1 - set2, set1 & set2, set2 - set1
was having to traverse each set three times, and, seemed like a useful bit of code to propose that Python add to its libraries.
I dutifully asked just that on comp.lang.python-ideas, and the debate is healthily progressing.
I liked the code. But then I thought, what about more than two sets? What would a partition of more than two sets even look like?
Next morning I read a lot more of the Wikipedia article on Venn diagrams and picked up on them being a visualization of good ol' truth tables. That spark helped me understand that the sets could represent the three columns of input, if I add a binary count below the three then it looks like the LHS of a truth table; but the truth table for three inputs has eight lines not seven?
If the ones in the truth table represent an item being in a set then you get the following relationship between the input sets set0, set1, set2; and the contents of the output bins:
SET
210
===
000 <-> ???->
001 <-> bin_0->
010 <-> bin_1->
011 <-> bin_0&1->
100 <-> bin_2->
101 <-> bin_0&2->
110 <-> bin_1&2->
111 <-> bin_0&1&2->
(Scan across any row. For every one bit set then the set of that column is in the bin)
I also worked out that the initial row of 000 must correspond to a bin for items that are in no set?! Since the items come from a set then this bin will always be empty.
O.K. I have a relationship between the number of input sets, binary counts and output binning, but no code.
I wrote the following few lines, got stumped, and decided to finish reading "The long war" at a Pub on the river Avon whilst sipping Thatchers Gold cider served so cold it comes as a slushy.
I took another hard look at the code I had and eventually thought that I just had to get something down and play around with it until it felt right. I finally came up with the following function that when tested seemed to give the right answers:
Of course, you also need to know the order for the result so I then came up with the following:
i in range(1,2**n)]was having to traverse each set three times, and, seemed like a useful bit of code to propose that Python add to its libraries.
I dutifully asked just that on comp.lang.python-ideas, and the debate is healthily progressing.
The two set case
Oscar Benjamin and Peter Otten hammered out this code for an algorithm to do the split whilst traversing each set only once, that is we are down to two traversals in total.def partition(setx, sety): xonly, xandy, yonly = set(), set(), set() for x in setx: if x in sety: xandy.add(x) else: xonly.add(x) for y in sety: if y not in setx: yonly.add(y) return xonly, xandy, yonly
The N set case
This is where Venn diagrams come in. Given three sets you want to return sets of items corresponding to the seven regions of the three set Venn diagram. Looking at the partition function above, I couldn't work it out and went to bed on it.Next morning I read a lot more of the Wikipedia article on Venn diagrams and picked up on them being a visualization of good ol' truth tables. That spark helped me understand that the sets could represent the three columns of input, if I add a binary count below the three then it looks like the LHS of a truth table; but the truth table for three inputs has eight lines not seven?
If the ones in the truth table represent an item being in a set then you get the following relationship between the input sets set0, set1, set2; and the contents of the output bins:
SET
210
===
000 <-> ???->
001 <-> bin_0->
010 <-> bin_1->
011 <-> bin_0&1->
100 <-> bin_2->
101 <-> bin_0&2->
110 <-> bin_1&2->
111 <-> bin_0&1&2->
(Scan across any row. For every one bit set then the set of that column is in the bin)
I also worked out that the initial row of 000 must correspond to a bin for items that are in no set?! Since the items come from a set then this bin will always be empty.
O.K. I have a relationship between the number of input sets, binary counts and output binning, but no code.
I wrote the following few lines, got stumped, and decided to finish reading "The long war" at a Pub on the river Avon whilst sipping Thatchers Gold cider served so cold it comes as a slushy.
set0 = {1, 3, 5, 7} set1 = {2, 3, 6, 7} set2 = {4, 5, 6, 7}
def venndiv(*sets): bins = tuple(set() for n in range(2**len(sets)) - 1) for i, si in enumerate(sets):
The Slog
Back from the pub; a walk; playing with my youngest son; ...I took another hard look at the code I had and eventually thought that I just had to get something down and play around with it until it felt right. I finally came up with the following function that when tested seemed to give the right answers:
def venndiv(*sets): bins = tuple(set() for n in range(2**len(sets))) for i, si in enumerate(sets): # Over every individual item of each set for item in si: binindex = 0 for j, sj in enumerate(sets): # Test for membership of each set if item in sj: binindex += 2**j # Binary count binning bins[binindex].add(item) return bins[1:] # bins[0] - an item in no set is impossible. print(venndiv({1,2,3}, {3,4,5})) # (set([1, 2]), set([4, 5]), set([3])) assert venndiv(set0, set1, set2) == (set([1]), set([2]), set([3]), set([4]), set([5]), set([6]), set([7])) setA = set('A AB AC ABC'.split()) setB = set('B AB BC ABC'.split()) setC = set('C AC BC ABC'.split()) print(venndiv(setA, setB, setC)) # (set(['A']), set(['B']), set(['AB']), set(['C']), # set(['AC']), set(['BC']), set(['ABC']))
def bin_order(n): return ['bin' + ''.join(chr(65+j) for j in range(n) if i & 1<for
print(bin_order(3)) # ['binA', 'binB', 'binAB', 'binC', 'binAC', 'binBC', 'binABC']
End bit
I am not asking for function venndiv to become part of the Python libraries. It is just an exercise to see if I could expand on the useful two-set case.Cheers :-)