Go deh!

Mainly Tech projects on Python and Electronic Design Automation.

Tuesday, November 22, 2011

A better candidate selection process

I just read an article that had a good go at explaining why Silicon Valley firms are filled with white middle class males that managed to actually do what it set out to do when it explained why it wanted to take a more reasoned approach. They boil it down to everyone having an in-built bias - not necessarily a good or bad thing, it is just how humans operate. The article then comes up with a very useful experiment:

When selecting from resumes for interview, have the names, age, sex, and origin of the candidates blanked out.

The author found that his resulting selections for interview were a lot different to what he had before.

He goes on to discuss how orchestras changed from being all male affairs to the current mix of genders by them adopting a selection process where candidates played unseen, behind a screen, and where selected based on how well they played.

This got me wondering about the current controversy about Oxbridge admissions. Maybe the universities should adopt similar schemes, candidates should be lead to a separate room, unseen, where voice changers would mask ethnicity and accent, The interview would be audio only and taped, with the selection panel in a separate room. After the interview, the candidate should be given a copy of the tape, and another copy kept for independent review. The idea would be to make the process a better meritocracy rather than the obvious selection of "people like me" that goes on at the moment.

Oh, here's the link to the original article.




Tuesday, November 08, 2011

Should you worry about a 2x speedup?

Let's take as context an implementation of a task for Rosetta Code - a site set up to compare how different programming languages are used to implement the same task for over five hundred tasks and over four hundred languages.

My short answer would be it depends! You need to:
  • Read all of the task description,
  • Read some of the solutions in other languages, 
  • And maybe skim the tasks talk page
I.e.  ensure you know what the task is asking for and then verify that both solutions solve the task as stated. It can be very easy to misinterpret what the task is asking for, for example, if a task asks for a particular algorithm, do both the examples you are comparing use that algorithm?

As well as comparing two implementations for speed, you should also compare for readability. How well the code reads can have a large impact on how easy the code is to maintain. It has been known for task descriptions to be modified; someone tracking that modification may need to work out if and how code needs to be updated. If an example is overly complex  and/or unidiomatic then it could cause problems.

Time complexity. If one version of code works better when given 'bigger' data then  you need to know more about when that happens - it could be that the cross-over point in terms of speed of execution is never likely to be met. Maybe the size of data needed to reach cross over is unreasonable to expect, or that other mechanisms come into play that mask predicted gains (in other words you might need to verify using that actual bigger data set to account for things like swapping or caching at the OS and hardware level.

How fast does it need to be? Rosetta code doesn't usually mention absolute speed of execution, but if one example takes ten hours and the other takes five then you might want to take that into account. If one example took 0.2 seconds and the other only 0.1 seconds then I guess there is an unwritten expectation that examples "don't take long to run" where long is related to the expectation and patience of the user.

You need to look at the context. In the case of Rosetta code, it may be best to give a solution using a similar algorithm to other examples, or a solution that shows accepted use of the language.

When you make your considered choice, you might want to squirrel away the losing code with notes on why it wasn't used,  - On Rosetta Code we sometimes add more than one solution to a task with comments contrasting the two if they both have merit.

It seems to me that talk about optimising for speed, and speed comparisons tends to dominate on the web over other optimisations, (usually with no extra info on the accuracy of the result. Actually there might be more cases of a revised result that showed not even the first digit of the original answer was right, but more than two digits of precision were shown in the answers)!

Wednesday, June 08, 2011

Woa - a glitch on Wolfram Mathworld?

I had come across Kaprekar numbers and decided to write a Rosetta Code task around the series. There were issues about the need for extra explanation of why 1 is a member of the series.
I had originally read the wp description and noted that there is a main part to testing if a number is in the series:
 In mathematics, a Kaprekar number for a given base is a non-negative integer, the representation of whose square in that base can be split into two parts that add up to the original number again. For instance, 45 is a Kaprekar number, because 45² = 2025 and 20+25 = 45.
 That is, splitting the ordered digits of the square of the number into two parts that sum to the original number.

Wikipedia then goes on to state that runs of all zeros are not considered positive integers and so 100*100 = 10,000 and 100 + 000 = 100 is disallowed.

What got me was that 1 is considered a member of the series. The only way I could shoe-horn it into the general rule is to allow "splitting" the digits of the square before the first or after the last digit and so producing a number with no digits in it as a partial sum that is allowed and has value zero?!

I widened my references and took a look at the Wolfram Mathworld entry.  Its explanation for the series does not allow for 1 being a member, but shows it as such anyway. It is confused as it describes a series that would exclude numbers 4879 and 5292, i.e. Sloanes A053816; but points to  Sloanes  A006886!

Sloanes A006886 gives another definition for the series:
 Kaprekar numbers: n such that n=q+r and n^2=q*10^m+r, for some m >= 1, q>=0 and 0<=r<10^m, with n != 10^a, a>=1.
 If we try n = 1 then n*n = 1 and there is no m, q, and r that satisfies all the conditions.

I will submit a comment to Mathworld and see what they have to say.

P.S. I could have started this blog entry: "I'm not a mathematician but ..."








Wednesday, June 01, 2011

What http://perl6.org/ gets right

One thing I really like about the perl6 homepage is that they politely ask for people to be nice to each other up-front. The psychology goes a bit further: they use a female "voice", and associate the niceness with a butterfly image that is repeated on other pages.

Now that is brilliant! Finding a problem and directly addressing it in such an appropriate manner is something I really like.


Sunday, May 29, 2011

Good enough solutions can be great!

I've had to imlement the solutions to two problems on Rosetta code lately and they both involve heuristics that get around the prolems of exhaustive search.

The first problem was in finding a way to shuffle the letters of a string so that a minimum of letters end up in the same place. My first solution was to go through all letter permutations and return the best. That takes for ever on longer strings such as the word 'antidisestablishmentarianism'.
The algorithm I then turned to just swaps letters if they swap into positions that put different characters into corresponding locations. That algorithm is very fast and seems to give good answers.

The second problem was in creating knights tours of chess boards of varying sizes. The given Wansdorff's algorithm is much quicker than a full depth first search, and was straight-forward to code . I later found evidence that for larger board sizes the algorithm fails . (Although the paper does suggest simple improvements that work for boards sizes with thousands of squares).

Coding those two, makes me remember that results are what count  normally. You won't achieve much by searching for the theory of relativity if the queen wants Newtons laws of motion so that sailing ships can trade and plunder!






Saturday, March 12, 2011

"That Goes Without Saying (or Does It)". It didn't!

I spent the time to watch a talk from Larry Wall the Perl, and now Perl 6 author; where he is using the Rosetta Code site for what it was built for - comparing programming language design, by examples.

Unfortunately he was preaching to his acolytes so there is no meat to the talk. Larry says 'X' about a language compared to Perl or Perl 6 and the audience accepts without even asking for clarifications. Argument never enters their minds.

What he did do well is to show examples first from the Root mean square page and indicate how anyone can compare and contrast languages using it. Larry then categorized Spiral matrix and Zigzag matrix as tasks whose examples in Perl where "fun to write" which I liked as I remember getting a lot out of Zigzag when I first found RC, so much so that I think starting the new task "Spiral matrix" was probably the first task I wrote for RC. (I thought it would be good to learn by doing a variant of an existing task).

Larry showed extended versions of the zigzag and spiral matrix solutions that used text based turtle graphics to animate the solutions.

In summary, his language comparisons were droopy; his use of RC - spot on!



Tuesday, January 11, 2011

New laptop with an i7-640M processor; and a 32 inch "monitor".

After a lot of hard work, and the car scrappage schemes of last year, I got rewarded with bonus!?
Nothing like a bwankers bonus, but enough to buy some fripperies after the essentials, and I decided to replace my aging 17" desktop-replacement laptop and get me a large 28" monitor.

After much searching in the post-xmas sales, I came across a curious machine at a very good price. The Advent Sienna 710 is a PCWorld/Dixons/Currys own-branded 15" laptop that has what, at the time, is the highest speed core for a laptop processor - an Intel 17-640M for under £500.

I guess the laptop has such a low price as it will not appeal to a gamer as it relies on Intels graphics accelerator and is so basic it doesn't have the usual scroll area at the side of the pad., which isn't too hot but that makes it ideal for me as I wanted something fast and multi-core, but don't indulge in much 3D work and can live with the pad deficiencies.

Our second TV had packed up mid-2010 and we were making-do with a very old 19" LCD. Instead of getting the 28" monitor I decided to buy a 32" Sony 1080P TV and use that as my monitor.

Well, the Advent laptop and Sony TV work beautifully together. I have a huge 32" screen at a decent resolution and with the sound coming through the HDMI connection, I don't need extra laptop speakers as the Sony sound is very good  (better than on our more expensive main TV).

I ran my improved Python multi-processing example on the laptop and was very impressed with the speed, and with all the python instances appearing in the task manager.



from concurrent import futures
from math import floor, sqrt

NUMBERS = [2**n - 1 for n in range(55, 65)]

def lowest_factor(n, _start=3):
    if n % 2 == 0:
        return 2
    search_max = int(floor(sqrt(n))) + 1
    for i in range(_start, search_max, 2):
        if n % i == 0:
            return i
    return n

def prime_factors(n, lowest):
    pf = []
    while n > 1:
        pf.append(lowest)
        n //= lowest
        lowest = lowest_factor(n, max(lowest, 3))
    return pf

def prime_factors_of_number_with_lowest_prime_factor(NUMBERS):
    with futures.ProcessPoolExecutor() as executor:
        low_factor, number = max( (l, f) for l, f in zip(executor.map(lowest_factor, NUMBERS), NUMBERS) )
        all_factors = prime_factors(number, low_factor)
        return number, all_factors


def main():
    print( 'For these numbers:\n  ' + '\n  '.join(str(p) for p in NUMBERS) )
    number, all_factors = prime_factors_of_number_with_lowest_prime_factor(NUMBERS)
    print('    The one with the largest minimum prime factor is %i:' % number)
    print('      All its prime factors in order are: %s' % all_factors)


if __name__ == '__main__':
    main()


END.



Tuesday, December 07, 2010

Stupid macro languages

In the Python web world, the debate on whether templating languages should offer programming language type abilities has reared its head again. from "Stupid templating languages", to "Not So Stupid Template Languages", "In Response to "Stupid Template Languages"", and discussed more on reddit here.

In the world of Electronic Design Automation and integrating complex EDA software tools, the EDA companies,, on the whole, have embraced TCL as their scripting tool, but sometimes I have inherited a complex design flow, where, to add control, and later flexibility within that control, proprietary initialization file formats and their parsers have grown by adding their own macro language type features and grown to need the power of something like the bash shell.  Rather than ditching their proprietary lashed-together, one-off languages for something that is tried and tested and used/debugged by many - such as bash/python/Tcl/Lua they persevere by adding more capability to an in-house, one project language.

In the case of web templating the main argument is that given an intelligent web templating language, people are apt to migrate what should be business logic into the templates thereby muddying the separation of the business logic from the presentation layer. In the EDA field the problem is often the maintenance of the proprietary macro language and the proliferation of languages with which one needs to be familiar with in order to master a design environment.

I won't comment more on the web templating debate, but when it comes to initialization file formats and or initialization scripts then you should stick to pre-existing standards such as .ini files or .csv files in the simpler cases, or use pre-existing scripting languages when you need more power such as bash scripts, Python, or Tcl.


Monday, October 04, 2010

Ipod Tablet Opportunity?

The ipad is close to £600 in the UK. The ipod touch is less than £200. Given that there are tablet computers on sale for less than £200, and inspired by news of a future ipod touch add-on that gives it phone capabilities; how about designing a ten-inch screen,  extra battery, and phone dongle that would take a (partially disassembled), ipod touch inside it and turn it into a ten inch ipad-alike?

One problem I see is that you would probably have to provide a service to take the users ipod touch and disassemble/assemble it inside the ten-inch  "dock"; but Apple seems to provide a lot of monetary difference to try and make a profit.

Just a thought!

Saturday, September 18, 2010

Regression runners: Whot! No Job Arrays?

I went from running regressions consisting of greater than 50K simulations nightly using Mentors QuestaSim, where I had written the regression framework and results preparation myself - to my first real Cadece vManager controlled regression project.

I should explain to my readers who usually see Python focused blog entries, that this is about the running of large regressions on compute farms for verifying ASICs (although my regression runner used Python extensively).

I did like the more minimal, HTML-based GUI on the tool - I've spent too much time creating work-arounds for flashy GUI's that don't work outside a narrow area, and so know they can be a double-edged sword. The tool needs a lot of configuration behind the scenes, and they have chosen to do this using their own vsif file format which seems to me to be a design fault. If they had gone for XML (gosh is that me advocating the use of XML), or YAML, then the format would be as readable and more easily used in other tools - no tool is an island in a design flow.

The main problem with vManager, is with its use of LSF. LSF is a separate tool used to harness a network of computers into a compute cluster. LSF runs jobs on the cluster, managing the cluster to give maximum job throughput. vManager can turn each test that needs to run on the design into a possible series of jobs for LSF, but it fails to use job arrays! Job arrays allow for many similar jobs to be submitted and controlled as one array of jobs where the indexing of the individual job in the array is used to introduce a variation in the test.

In my regression framework I created arrays of over 100K jobs, that took several weeks to run, and all the time that the job array at the heart of my regression was running, IT could change the parameters of my job array to control its use of resources according to changing priorities; scale back the maximum number of concurrent simulations during the 9-till-5; ramp up during weekends, (or stop then resume for a 'planned' license outage). IT had all the information they need to show the full extent of the regression and LSF will give the throughput of the array from which I can estimate completion times. Another engineer wishing to start a regression has immediate feedback of what else is running rather than a huge list of hundreds of jobs that it is difficult to make sense of. In short, ASIC verification regressions map naturally to Job Arrays.

The other problem I have with vManager is a missing feature: Randomization of the order that tests are run. If you have 100 tests to run on five areas of the design then it is natural to compose the vsif file that describes each test in some sort of orderly progression through the tests to run. When the regression is running and you are monitoring partial results then it is easier to get a sense of the overall result if the individual tests are run in a random order. In my regression framework I generated tests in an order, then randomized the set of tests based on Pythons excellent Mersenne Twister implementation. By monitoring results during previous runs I could judge how much of a regression had to run if we needed results in a specified time.

A bit of background (re?)search showed that Mentor have their own regression running framework, that also doesn't use job arrays; and that the Grid Engine cluster management tool supports job arrays.

Followers

Subscribe Now: google

Add to Google Reader or Homepage

Go deh too!

whos.amung.us

Blog Archive