I had a look at the TopologicalSorter class and it seems to mix the graph algorithm in with a use case in a less than ideal way.
- If given a graph of dependencies it will give one ordering that can satisfy the dependencies via its static_order() method only.
- It supports one way to add parallelism.
I had written a Topological Sort task on Rosetta Code, a Python implementation, and had been both implementing and using T-sorts to compile Electronics Design Automation libraries for over a decade, (maybe 20+ years).
Below, I make slight alterations to the RC toposort2 function to create toposort3, then compare it to the new graphlib.TopologicalSorter class.
The Modules A, B, C, D example
They give the following input graph for T-sorting:
graph = {"D": {"B", "C"},
"C": {"A"},
"B": {"A"}}
Or more colloquially: D depends on, (or must come after), both B and C; C depends on A and B depends on A.
The static_order ordering given is:
('A', 'C', 'B', 'D')
Now those dependencies have only a "looser" ordering of the set of nodes B and C. Nodes B and C can be swapped and it will still satisfy the dependencies. B and C can be implemented in parallel in fact.
My routine, instead of returning a iterator over individual items, instead iterates over sets of items.
- All items in each set must be completed/ordered before all those of a preceding set from the iterator.
- The items in each set may be ordered/completed in any way or completed in parallel.
My routines result iterator if turned into a tuple would yield:
({'A'}, {'B', 'C'}, {'D'})
If this were a calculated compilation order for Verilog and VHDL libraries then it clearly shows the opportunity for parallelism in compiling B and C and that a maximum of two libraries can be compiled in parallel.
RosettaCode task Example
The RosettaCode task graph shows even more scope for parallelism.
Here is the task data run through the new module and my toposort function:
from functools import reduce from pprint import pprint as pp from collections import defaultdict from graphlib import TopologicalSorter # LIBRARY mapped_to LIBRARY DEPENDENCIES data = { 'des_system_lib': set('std synopsys std_cell_lib des_system_lib dw02 dw01 ramlib ieee'.split()), 'dw01': set('ieee dw01 dware gtech'.split()), 'dw02': set('ieee dw02 dware'.split()), 'dw03': set('std synopsys dware dw03 dw02 dw01 ieee gtech'.split()), 'dw04': set('dw04 ieee dw01 dware gtech'.split()), 'dw05': set('dw05 ieee dware'.split()), 'dw06': set('dw06 ieee dware'.split()), 'dw07': set('ieee dware'.split()), 'dware': set('ieee dware'.split()), 'gtech': set('ieee gtech'.split()), 'ramlib': set('std ieee'.split()), 'std_cell_lib': set('ieee std_cell_lib'.split()), 'synopsys': set(), } for k, v in data.items(): v.discard(k) # Ignore self dependencies # LIBRARY mapped_to LIBRARY PREDECESSORS data_invert = defaultdict(set) for lib, deps in data.items(): for dep in deps: data_invert[dep].add(lib) ts = TopologicalSorter(data) graphlib_order = tuple(ts.static_order()) def toposort3(data): extra_items_in_deps = reduce(set.union, data.values()) - set(data.keys()) data.update({item:set() for item in extra_items_in_deps}) while True: ordered = set(item for item,dep in data.items() if not dep) if not ordered: break yield ordered data = {item: (dep - ordered) for item,dep in data.items() if item not in ordered} assert not data, "A cyclic dependency exists amongst %r" % data paddys_order = tuple(toposort3(data)) print('Python 3.9 graphlib gives a topological ordering:') pp(graphlib_order) print('\nMy topological ordering of sets that must be in tuple order, but each item in an individual set can be executed in any order for that set, or in *parallel*:') pp(paddys_order)
Output:
Python 3.9 graphlib gives a topological ordering: ('des_system_lib', 'dw03', 'dw07', 'dw06', 'dw04', 'dw05', 'ramlib', 'std_cell_lib', 'synopsys', 'dw02', 'dw01', 'std', 'dware', 'gtech', 'ieee') My topological ordering of sets that must be in tuple order, but each item in an individual set can be executed in any order for that set, or in *parallel*: ({'synopsys', 'std', 'ieee'}, {'dware', 'ramlib', 'gtech', 'std_cell_lib'}, {'dw07', 'dw02', 'dw06', 'dw01', 'dw05'}, {'des_system_lib', 'dw04', 'dw03'})
From the output above, my routine doesn't hide what can be run in parallel, and encompasses every possible ordering of items, wheras the graphlib module only shows one ordering, and if the graphlib modules parallelism support is used, it is more difficult to get an idea of the opportunities for parallelism rather than just starting up another task.
I can see that I can use a maximum of five and a minimum of 3 tasks in parallel and that those parallel tasks can be executed in a minimum of four sequential steps.
So, That's why and how I think graphlib could be better.
END.
The output from your library can be reproduced exactly (apart from the display order of set items) via:
ReplyDeletets = TopologicalSorter(data)
ts.prepare()
while ts:
....ready = ts.get_ready()
....print(set(ready))
....ts.done(*ready)
It is better even than that. If initial parallel jobs return at different times then the standard library approach would be used to check for modified availability for parallelisation. I learnt some more :-)
ReplyDelete