I mentioned in an earlier post that I had written my own ranker and thought I'd revisit this with some code.
I verify and ensure the safety of microprocessors for my day job. One way that very complex CPUs are tested is to create another model of the chip that can be used to generate pseudo-random instruction streams to run on the CPU. This so-called instruction stream generator (ISG) can create thousands (millions!) of these tests in very little time, and the ISG is written in such a way that it can be 'tweaked' to give some control or steering over what the instruction streams will exercise on the CPU.
Now simulating these instruction streams and gathering information on just which parts of the CPU are exercised ('covered') by each individual test takes time, and multiple ISG-generated tests may cover the same regions of the CPU. To increase the overall coverage of the CPU we run what is called a regression: all the tests are run, and their coverage and the time they take to simulate are stored. At the end of the regression run you may have several thousand tests that together cover only part of the CPU.
If you take the regression results and rank them, you can find the subset of the tests that gives all the coverage. Usually thousands of pseudo-random tests might be ranked to produce a sub-list of only hundreds of tests that, when run, would give the same coverage. What we then usually do is look at what isn't covered and generate some more tests, by the ISG or other methods, to try to fill the gaps; then run the new regression and rank again, looping until the CPU is fully exercised or some target coverage goal is hit.
Ranking tests is an important part of the regression flow described above, and when it works well you forget about it. Unfortunately I sometimes want to rank other data, for which the stock ranking program from the CAD tool vendors does not fit. So here are the guts of a ranking program that will scale to handling hundreds of thousands of tests and coverage points.
Input
Normally I have to parse my input from text or HTML files of results generated by other CAD programs; it is tedious work that I will skip by providing idealised inputs in the form of a Python dict. (Sometimes the code for parsing input files can be as large as, or larger than, the ranking algorithm.) Let us assume that each ISG test has a name, runs for a certain 'time', and when simulated is shown to 'cover' a set of numbered features of the design. After the parsing, the gathered input data is represented by the results dict in the program.
results = {
    # 'TEST': ( TIME, set([COVERED_POINT ...])),
    'test_00': (  2.08, set([2, 3, 5, 11, 12, 16, 19, 23, 25, 26, 29, 36, 38, 40])),
    'test_01': ( 58.04, set([0, 10, 13, 15, 17, 19, 20, 22, 27, 30, 31, 33, 34])),
    'test_02': ( 34.82, set([3, 4, 6, 12, 15, 21, 23, 25, 26, 33, 34, 40])),
    'test_03': ( 32.74, set([4, 5, 10, 16, 21, 22, 26, 39])),
    'test_04': (100.00, set([0, 1, 4, 6, 7, 8, 9, 11, 12, 18, 26, 27, 31, 36])),
    'test_05': (  4.46, set([1, 2, 6, 11, 14, 16, 17, 21, 22, 23, 30, 31])),
    'test_06': ( 69.57, set([10, 11, 15, 17, 19, 22, 26, 27, 30, 32, 38])),
    'test_07': ( 85.71, set([0, 2, 4, 5, 9, 10, 14, 17, 24, 34, 36, 39])),
    'test_08': (  5.73, set([0, 3, 8, 9, 13, 19, 23, 25, 28, 36, 38])),
    'test_09': ( 15.55, set([7, 15, 17, 25, 26, 30, 31, 33, 36, 38, 39])),
    'test_10': ( 12.05, set([0, 4, 13, 14, 15, 24, 31, 35, 39])),
    'test_11': ( 52.23, set([0, 3, 6, 10, 11, 13, 23, 34, 40])),
    'test_12': ( 26.79, set([0, 1, 4, 5, 7, 8, 10, 12, 13, 31, 32, 40])),
    'test_13': ( 16.07, set([2, 6, 9, 11, 13, 15, 17, 18, 34])),
    'test_14': ( 40.62, set([1, 2, 8, 15, 16, 19, 22, 26, 29, 31, 33, 34, 38])),
    }
Greedy ranking algorithm
The object of the algorithm is to select and order a subset of the tests that:
- Cover as many of the coverage points as possible by at least one test.
- After the above, reduce the number of tests needed to achieve that maximum coverage by as much as is possible.
- Generate a ranking of the tests selected to allow an even smaller set of tests to be selected if necessary.
- With all the above taking precedence, it would also be good to reduce the total 'time' accrued by the ranked tests.
- Of course it needs to work for large sets of tests and points to cover.
If more than one test gives the same incremental additional coverage at any stage, then the test taking the least 'time' is picked, as the sketch below shows.
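To see why this works, note that Python compares tuples element by element, so a (delta-coverage, -cost, name) tuple sorts first by greatest new coverage, then by least cost, then by name as an arbitrary final tie-break. A minimal sketch using the round-1 figures from the sample data below:

    # test_00 and test_04 each add 14 new points, but test_00 is far
    # cheaper, so max() picks it: -2.08 > -100.00.
    candidates = [
        (14,   -2.08, 'test_00'),
        (14, -100.00, 'test_04'),
        (13,  -58.04, 'test_01'),
        ]
    print(max(candidates))   # -> (14, -2.08, 'test_00')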
The following function implements the algorithm:
def greedyranker(results):
    results = results.copy()
    ranked, coveredsofar, costsofar, round = [], set(), 0, 0
    noncontributing = []
    while results:
        round += 1
        # What each test can contribute to the pool of what is covered so far
        contributions = [(len(cover - coveredsofar), -cost, test)
                         for test, (cost, cover) in sorted(results.items())]
        # Greedy ranking by taking the next greatest contributor
        delta_cover, benefit, test = max(contributions)
        if delta_cover > 0:
            ranked.append((test, delta_cover))
            cost, cover = results.pop(test)
            coveredsofar.update(cover)
            costsofar += cost
        for delta_cover, benefit, test in contributions:
            if delta_cover == 0:
                # this test cannot contribute anything
                noncontributing.append((test, round))
                results.pop(test)
    return coveredsofar, ranked, costsofar, noncontributing
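As a quick sanity check, here is the ranker on a tiny, made-up results dict (the test names and numbers are hypothetical):

    covered, ranked, cost, dropped = greedyranker({
        'a': (1.0, set([1, 2])),   # cheap, covers two points
        'b': (2.0, set([2, 3])),   # adds point 3
        'c': (9.0, set([3])),      # adds nothing once 'a' and 'b' are in
        })
    print(ranked)    # [('a', 2), ('b', 1)]
    print(dropped)   # [('c', 3)]  - dropped as non-contributing in round 3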
The function above is a bit dry, so I took a bit of time to annotate it with a tutor capability that, when run, prints out just what it is doing along the way:
The function with tutor
It implements the same thing but does it noisily:

def greedyranker(results, tutor=True):
    results = results.copy()
    ranked, coveredsofar, costsofar, round = [], set(), 0, 0
    noncontributing = []
    while results:
        round += 1
        # What each test can contribute to the pool of what is covered so far
        contributions = [(len(cover - coveredsofar), -cost, test)
                         for test, (cost, cover) in sorted(results.items())]
        if tutor:
            print('\n## Round %i' % round)
            print('  Covered so far: %2i points: ' % len(coveredsofar))
            print('  Ranked so far: ' + repr([t for t, d in ranked]))
            print('  What the remaining tests can contribute, largest contributors first:')
            print('    # DELTA, BENEFIT, TEST')
            deltas = sorted(contributions, reverse=True)
            for delta_cover, benefit, test in deltas:
                print('    %2i, %7.2f, %s' % (delta_cover, benefit, test))
            if len(deltas) >= 2 and deltas[0][0] == deltas[1][0]:
                print('  Note: This time around, more than one test gives the same')
                print('  maximum delta contribution of %i to the coverage so far'
                      % deltas[0][0])
                if deltas[0][1] != deltas[1][1]:
                    print('  we order based on the next field of minimum cost')
                    print('  (equivalent to maximum negative cost).')
                else:
                    print('  the next field of minimum cost is the same so')
                    print('  we arbitrarily order by test name.')
            zeroes = [test for delta_cover, benefit, test in deltas
                      if delta_cover == 0]
            if zeroes:
                print('  The following test(s) cannot contribute more to coverage')
                print('  and will be dropped:')
                print('    ' + ', '.join(zeroes))

        # Greedy ranking by taking the next greatest contributor
        delta_cover, benefit, test = max(contributions)
        if delta_cover > 0:
            ranked.append((test, delta_cover))
            cost, cover = results.pop(test)
            if tutor:
                print('  Ranking %s in round %2i giving extra coverage of: %r'
                      % (test, round, sorted(cover - coveredsofar)))
            coveredsofar.update(cover)
            costsofar += cost

        for delta_cover, benefit, test in contributions:
            if delta_cover == 0:
                # this test cannot contribute anything
                noncontributing.append((test, round))
                results.pop(test)
    if tutor:
        print('\n## ALL TESTS NOW RANKED OR DISCARDED\n')
    return coveredsofar, ranked, costsofar, noncontributing
Sample output
The code to call the ranker and print the results is:

totalcoverage, ranking, totalcost, nonranked = greedyranker(results)
print('''
A total of %i points were covered,
using only %i of the initial %i tests,
and should take %g time units to run.

The tests in order of coverage added:

    TEST  DELTA-COVERAGE'''
      % (len(totalcoverage), len(ranking), len(results), totalcost))
print('\n'.join('  %6s %i' % r for r in ranking))
The output has a lot of stuff from the tutor followed by the result at the end.
For this pseudo-randomly generated test case of 15 tests, it shows that only seven are needed to give the maximum total coverage. (And if you were willing to lose the coverage of the three tests that each add only one point, then 4 of the 15 tests would give 37 of the 40 points, i.e. 92.5% of the maximum coverage possible.)
## Round 1
Covered so far: 0 points:
Ranked so far: []
What the remaining tests can contribute, largest contributors first:
# DELTA, BENEFIT, TEST
14, -2.08, test_00
14, -100.00, test_04
13, -40.62, test_14
13, -58.04, test_01
12, -4.46, test_05
12, -26.79, test_12
12, -34.82, test_02
12, -85.71, test_07
11, -5.73, test_08
11, -15.55, test_09
11, -69.57, test_06
9, -12.05, test_10
9, -16.07, test_13
9, -52.23, test_11
8, -32.74, test_03
Note: This time around, more than one test gives the same
maximum delta contribution of 14 to the coverage so far
we order based on the next field of minimum cost
(equivalent to maximum negative cost).
Ranking test_00 in round 1 giving extra coverage of: [2, 3, 5, 11, 12, 16, 19, 23, 25, 26, 29, 36, 38, 40]
## Round 2
Covered so far: 14 points:
Ranked so far: ['test_00']
What the remaining tests can contribute, largest contributors first:
# DELTA, BENEFIT, TEST
12, -58.04, test_01
10, -100.00, test_04
9, -12.05, test_10
9, -26.79, test_12
9, -85.71, test_07
8, -4.46, test_05
7, -15.55, test_09
7, -16.07, test_13
7, -40.62, test_14
7, -69.57, test_06
6, -34.82, test_02
5, -5.73, test_08
5, -32.74, test_03
5, -52.23, test_11
Ranking test_01 in round 2 giving extra coverage of: [0, 10, 13, 15, 17, 20, 22, 27, 30, 31, 33, 34]
## Round 3
Covered so far: 26 points:
Ranked so far: ['test_00', 'test_01']
What the remaining tests can contribute, largest contributors first:
# DELTA, BENEFIT, TEST
7, -100.00, test_04
5, -12.05, test_10
5, -26.79, test_12
5, -85.71, test_07
4, -4.46, test_05
3, -5.73, test_08
3, -16.07, test_13
3, -32.74, test_03
3, -34.82, test_02
2, -15.55, test_09
2, -40.62, test_14
1, -52.23, test_11
1, -69.57, test_06
Ranking test_04 in round 3 giving extra coverage of: [1, 4, 6, 7, 8, 9, 18]
## Round 4
Covered so far: 33 points:
Ranked so far: ['test_00', 'test_01', 'test_04']
What the remaining tests can contribute, largest contributors first:
# DELTA, BENEFIT, TEST
4, -12.05, test_10
3, -85.71, test_07
2, -4.46, test_05
2, -32.74, test_03
1, -5.73, test_08
1, -15.55, test_09
1, -26.79, test_12
1, -34.82, test_02
1, -69.57, test_06
0, -16.07, test_13
0, -40.62, test_14
0, -52.23, test_11
The following test(s) cannot contribute more to coverage
and will be dropped:
test_13, test_14, test_11
Ranking test_10 in round 4 giving extra coverage of: [14, 24, 35, 39]
## Round 5
Covered so far: 37 points:
Ranked so far: ['test_00', 'test_01', 'test_04', 'test_10']
What the remaining tests can contribute, largest contributors first:
# DELTA, BENEFIT, TEST
1, -4.46, test_05
1, -5.73, test_08
1, -26.79, test_12
1, -32.74, test_03
1, -34.82, test_02
1, -69.57, test_06
0, -15.55, test_09
0, -85.71, test_07
Note: This time around, more than one test gives the same
maximum delta contribution of 1 to the coverage so far
we order based on the next field of minimum cost
(equivalent to maximum negative cost).
The following test(s) cannot contribute more to coverage
and will be dropped:
test_09, test_07
Ranking test_05 in round 5 giving extra coverage of: [21]
## Round 6
Covered so far: 38 points:
Ranked so far: ['test_00', 'test_01', 'test_04', 'test_10', 'test_05']
What the remaining tests can contribute, largest contributors first:
# DELTA, BENEFIT, TEST
1, -5.73, test_08
1, -26.79, test_12
1, -69.57, test_06
0, -32.74, test_03
0, -34.82, test_02
Note: This time around, more than one test gives the same
maximum delta contribution of 1 to the coverage so far
we order based on the next field of minimum cost
(equivalent to maximum negative cost).
The following test(s) cannot contribute more to coverage
and will be dropped:
test_03, test_02
Ranking test_08 in round 6 giving extra coverage of: [28]
## Round 7
Covered so far: 39 points:
Ranked so far: ['test_00', 'test_01', 'test_04', 'test_10', 'test_05', 'test_08']
What the remaining tests can contribute, largest contributors first:
# DELTA, BENEFIT, TEST
1, -26.79, test_12
1, -69.57, test_06
Note: This time around, more than one test gives the same
maximum delta contribution of 1 to the coverage so far
we order based on the next field of minimum cost
(equivalent to maximum negative cost).
Ranking test_12 in round 7 giving extra coverage of: [32]
## Round 8
Covered so far: 40 points:
Ranked so far: ['test_00', 'test_01', 'test_04', 'test_10', 'test_05', 'test_08', 'test_12']
What the remaining tests can contribute, largest contributors first:
# DELTA, BENEFIT, TEST
0, -69.57, test_06
The following test(s) cannot contribute more to coverage
and will be dropped:
test_06
## ALL TESTS NOW RANKED OR DISCARDED
A total of 40 points were covered,
using only 7 of the initial 15 tests,
and should take 209.15 time units to run.
The tests in order of coverage added:
TEST DELTA-COVERAGE
test_00 14
test_01 12
test_04 7
test_10 4
test_05 1
test_08 1
test_12 1
What should come next
There is a new Unified Coverage Interoperability Standard (UCIS) for a database for storing test coverage data. Ideally the greedy ranker would be hooked up to such a UCIS DB to get its inputs via its C interface, or maybe via its XML output, instead of parsing text files.
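For example, if the coverage data were exported as XML, loading it into the results dict might look like the following sketch. The <test> and <point> element and attribute names here are hypothetical placeholders, not the real UCIS schema:

    # Sketch only: assumes a made-up XML layout of
    # <test name="..." time="..."><point id="..."/>...</test>
    import xml.etree.ElementTree as ET

    def results_from_xml(filename):
        results = {}
        for test in ET.parse(filename).getroot().iter('test'):
            name = test.get('name')
            time = float(test.get('time'))
            cover = set(int(p.get('id')) for p in test.iter('point'))
            results[name] = (time, cover)
        return results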
Addendum
Random results dict creator
As used for testing:

def cover_creator(ntests=25, maxcoverpoints=100):
    import random
    results = {}
    coveredrange = (maxcoverpoints * 1 // 6, 1 + maxcoverpoints * 2 // 6)
    print(coveredrange)
    for test in range(ntests):
        name = 'test_%02i' % test
        covered = sorted(set(random.randint(0, maxcoverpoints - 1)
                             for i in range(random.randint(*coveredrange))))
        time = len(covered) * (100 + (random.random() - 0.5) * 40) / 100.0
        results[name] = (float('%6.2f' % time), set(covered))
    return results
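Data of the same shape as the results dict near the top of this post can then be generated and ranked, e.g.:

    results = cover_creator(ntests=15, maxcoverpoints=41)
    totalcoverage, ranking, totalcost, nonranked = greedyranker(results)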
END.
Comments

(Using Python ≥2.7 set notation.) I believe that

    greedyranker({
        'test1':  (1, {1, 2, 3, 4, 5, 6, 7, 8}),
        'test2a': (1, {1, 3, 5, 7, 9, 11, 13}),
        'test2b': (1, {2, 4, 6, 8, 10, 12}),
        'test3a': (1, {9}),
        'test3b': (1, {10}),
        'test3c': (1, {11}),
        'test3d': (1, {12}),
        'test3e': (1, {13}),
        })

should optimally return 'test2a' and 'test2b' as ranked.
Replace the last line of my comment with "should optimally return *only* 'test2a' and 'test2b' as ranked."
Hi ΤΖΩΤΖΙΟΥ, optimally, you are correct. This is a greedy algorithm that can be used when there are thousands of tests and tens of thousands of cover points, when finding the optimum solution would take a lifetime (of a star, or more ;-).
Yes, obviously attempting the full powerset could take a lifetime. :)
ReplyDeleteI would be interested in giving it a shot as a pet project, if you gave me a sample of such input data. If I come up with a modified algorithm that improves results in tolerably increased time, I guess it'll be a win-win.
Hi again ΤΖΩΤΖΙΟΥ, the new addendum has the code I used to create sample test data. Have fun, and tell me how you get on, please. Thanks.
Hi - I got interested enough in the UCIS to mostly wrap it in Python: https://bitbucket.org/verilab/pyucis
I say 'mostly' as I haven't quite gotten the whole SWIG interface completed, but a lot of it is there and working. Might be interesting/useful. Certainly a lot less code to write using Pythonic iterators and file handling than using the C equivalent version of the API - most of the example code is about a third the size of the equivalent sample from the UCIS specification.
Can we, at the end of the test, do the following: for each set, check if it is a subset of any other set and, if so, exclude it from the result?
This _could_ be done, but for my use cases, it is not likely to exclude anything more.
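For the record, such a pre-filter might look like this minimal sketch. drop_subsumed is a hypothetical helper, quadratic in the number of tests; it only removes tests whose cover is a proper subset of another test's, leaving equal sets for the ranker to break by cost:

    def drop_subsumed(results):
        # Keep only tests whose covered set is NOT a proper subset
        # of some other test's covered set (cover < cover is False,
        # so a test never subsumes itself).
        return {name: (cost, cover)
                for name, (cost, cover) in results.items()
                if not any(cover < othercover
                           for _, othercover in results.values())}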