I just read an article that had a good go at explaining why Silicon Valley firms are filled with white middle class males that managed to actually do what it set out to do when it explained why it wanted to take a more reasoned approach. They boil it down to everyone having an in-built bias - not necessarily a good or bad thing, it is just how humans operate. The article then comes up with a very useful experiment:
When selecting from resumes for interview, have the names, age, sex, and origin of the candidates blanked out.
The author found that his resulting selections for interview were a lot different to what he had before.
He goes on to discuss how orchestras changed from being all male affairs to the current mix of genders by them adopting a selection process where candidates played unseen, behind a screen, and where selected based on how well they played.
This got me wondering about the current controversy about Oxbridge admissions. Maybe the universities should adopt similar schemes, candidates should be lead to a separate room, unseen, where voice changers would mask ethnicity and accent, The interview would be audio only and taped, with the selection panel in a separate room. After the interview, the candidate should be given a copy of the tape, and another copy kept for independent review. The idea would be to make the process a better meritocracy rather than the obvious selection of "people like me" that goes on at the moment.
Oh, here's the link to the original article.
Tuesday, November 22, 2011
Tuesday, November 08, 2011
Should you worry about a 2x speedup?
Let's take as context an implementation of a task for Rosetta Code - a site set up to compare how different programming languages are used to implement the same task for over five hundred tasks and over four hundred languages.
My short answer would be it depends! You need to:
As well as comparing two implementations for speed, you should also compare for readability. How well the code reads can have a large impact on how easy the code is to maintain. It has been known for task descriptions to be modified; someone tracking that modification may need to work out if and how code needs to be updated. If an example is overly complex and/or unidiomatic then it could cause problems.
Time complexity. If one version of code works better when given 'bigger' data then you need to know more about when that happens - it could be that the cross-over point in terms of speed of execution is never likely to be met. Maybe the size of data needed to reach cross over is unreasonable to expect, or that other mechanisms come into play that mask predicted gains (in other words you might need to verify using that actual bigger data set to account for things like swapping or caching at the OS and hardware level.
How fast does it need to be? Rosetta code doesn't usually mention absolute speed of execution, but if one example takes ten hours and the other takes five then you might want to take that into account. If one example took 0.2 seconds and the other only 0.1 seconds then I guess there is an unwritten expectation that examples "don't take long to run" where long is related to the expectation and patience of the user.
You need to look at the context. In the case of Rosetta code, it may be best to give a solution using a similar algorithm to other examples, or a solution that shows accepted use of the language.
When you make your considered choice, you might want to squirrel away the losing code with notes on why it wasn't used, - On Rosetta Code we sometimes add more than one solution to a task with comments contrasting the two if they both have merit.
It seems to me that talk about optimising for speed, and speed comparisons tends to dominate on the web over other optimisations, (usually with no extra info on the accuracy of the result. Actually there might be more cases of a revised result that showed not even the first digit of the original answer was right, but more than two digits of precision were shown in the answers)!
My short answer would be it depends! You need to:
- Read all of the task description,
- Read some of the solutions in other languages,
- And maybe skim the tasks talk page
As well as comparing two implementations for speed, you should also compare for readability. How well the code reads can have a large impact on how easy the code is to maintain. It has been known for task descriptions to be modified; someone tracking that modification may need to work out if and how code needs to be updated. If an example is overly complex and/or unidiomatic then it could cause problems.
Time complexity. If one version of code works better when given 'bigger' data then you need to know more about when that happens - it could be that the cross-over point in terms of speed of execution is never likely to be met. Maybe the size of data needed to reach cross over is unreasonable to expect, or that other mechanisms come into play that mask predicted gains (in other words you might need to verify using that actual bigger data set to account for things like swapping or caching at the OS and hardware level.
How fast does it need to be? Rosetta code doesn't usually mention absolute speed of execution, but if one example takes ten hours and the other takes five then you might want to take that into account. If one example took 0.2 seconds and the other only 0.1 seconds then I guess there is an unwritten expectation that examples "don't take long to run" where long is related to the expectation and patience of the user.
You need to look at the context. In the case of Rosetta code, it may be best to give a solution using a similar algorithm to other examples, or a solution that shows accepted use of the language.
When you make your considered choice, you might want to squirrel away the losing code with notes on why it wasn't used, - On Rosetta Code we sometimes add more than one solution to a task with comments contrasting the two if they both have merit.
It seems to me that talk about optimising for speed, and speed comparisons tends to dominate on the web over other optimisations, (usually with no extra info on the accuracy of the result. Actually there might be more cases of a revised result that showed not even the first digit of the original answer was right, but more than two digits of precision were shown in the answers)!
Wednesday, June 08, 2011
Woa - a glitch on Wolfram Mathworld?
I had come across Kaprekar numbers and decided to write a Rosetta Code task around the series. There were issues about the need for extra explanation of why 1 is a member of the series.
I had originally read the wp description and noted that there is a main part to testing if a number is in the series:
Wikipedia then goes on to state that runs of all zeros are not considered positive integers and so 100*100 = 10,000 and 100 + 000 = 100 is disallowed.
What got me was that 1 is considered a member of the series. The only way I could shoe-horn it into the general rule is to allow "splitting" the digits of the square before the first or after the last digit and so producing a number with no digits in it as a partial sum that is allowed and has value zero?!
I widened my references and took a look at the Wolfram Mathworld entry. Its explanation for the series does not allow for 1 being a member, but shows it as such anyway. It is confused as it describes a series that would exclude numbers 4879 and 5292, i.e. Sloanes A053816; but points to Sloanes A006886!
Sloanes A006886 gives another definition for the series:
I will submit a comment to Mathworld and see what they have to say.
P.S. I could have started this blog entry: "I'm not a mathematician but ..."
I had originally read the wp description and noted that there is a main part to testing if a number is in the series:
In mathematics, a Kaprekar number for a given base is a non-negative integer, the representation of whose square in that base can be split into two parts that add up to the original number again. For instance, 45 is a Kaprekar number, because 45² = 2025 and 20+25 = 45.That is, splitting the ordered digits of the square of the number into two parts that sum to the original number.
Wikipedia then goes on to state that runs of all zeros are not considered positive integers and so 100*100 = 10,000 and 100 + 000 = 100 is disallowed.
What got me was that 1 is considered a member of the series. The only way I could shoe-horn it into the general rule is to allow "splitting" the digits of the square before the first or after the last digit and so producing a number with no digits in it as a partial sum that is allowed and has value zero?!
I widened my references and took a look at the Wolfram Mathworld entry. Its explanation for the series does not allow for 1 being a member, but shows it as such anyway. It is confused as it describes a series that would exclude numbers 4879 and 5292, i.e. Sloanes A053816; but points to Sloanes A006886!
Sloanes A006886 gives another definition for the series:
Kaprekar numbers: n such that n=q+r and n^2=q*10^m+r, for some m >= 1, q>=0 and 0<=r<10^m, with n != 10^a, a>=1.If we try n = 1 then n*n = 1 and there is no m, q, and r that satisfies all the conditions.
I will submit a comment to Mathworld and see what they have to say.
P.S. I could have started this blog entry: "I'm not a mathematician but ..."
Wednesday, June 01, 2011
What http://perl6.org/ gets right
One thing I really like about the perl6 homepage is that they politely ask for people to be nice to each other up-front. The psychology goes a bit further: they use a female "voice", and associate the niceness with a butterfly image that is repeated on other pages.
Now that is brilliant! Finding a problem and directly addressing it in such an appropriate manner is something I really like.
Now that is brilliant! Finding a problem and directly addressing it in such an appropriate manner is something I really like.
Sunday, May 29, 2011
Good enough solutions can be great!
I've had to imlement the solutions to two problems on Rosetta code lately and they both involve heuristics that get around the prolems of exhaustive search.
The first problem was in finding a way to shuffle the letters of a string so that a minimum of letters end up in the same place. My first solution was to go through all letter permutations and return the best. That takes for ever on longer strings such as the word 'antidisestablishmentarianism'.
The algorithm I then turned to just swaps letters if they swap into positions that put different characters into corresponding locations. That algorithm is very fast and seems to give good answers.
The second problem was in creating knights tours of chess boards of varying sizes. The given Wansdorff's algorithm is much quicker than a full depth first search, and was straight-forward to code . I later found evidence that for larger board sizes the algorithm fails . (Although the paper does suggest simple improvements that work for boards sizes with thousands of squares).
Coding those two, makes me remember that results are what count normally. You won't achieve much by searching for the theory of relativity if the queen wants Newtons laws of motion so that sailing ships can trade and plunder!
The first problem was in finding a way to shuffle the letters of a string so that a minimum of letters end up in the same place. My first solution was to go through all letter permutations and return the best. That takes for ever on longer strings such as the word 'antidisestablishmentarianism'.
The algorithm I then turned to just swaps letters if they swap into positions that put different characters into corresponding locations. That algorithm is very fast and seems to give good answers.
The second problem was in creating knights tours of chess boards of varying sizes. The given Wansdorff's algorithm is much quicker than a full depth first search, and was straight-forward to code . I later found evidence that for larger board sizes the algorithm fails . (Although the paper does suggest simple improvements that work for boards sizes with thousands of squares).
Coding those two, makes me remember that results are what count normally. You won't achieve much by searching for the theory of relativity if the queen wants Newtons laws of motion so that sailing ships can trade and plunder!
Saturday, March 12, 2011
"That Goes Without Saying (or Does It)". It didn't!
I spent the time to watch a talk from Larry Wall the Perl, and now Perl 6 author; where he is using the Rosetta Code site for what it was built for - comparing programming language design, by examples.
Unfortunately he was preaching to his acolytes so there is no meat to the talk. Larry says 'X' about a language compared to Perl or Perl 6 and the audience accepts without even asking for clarifications. Argument never enters their minds.
What he did do well is to show examples first from the Root mean square page and indicate how anyone can compare and contrast languages using it. Larry then categorized Spiral matrix and Zigzag matrix as tasks whose examples in Perl where "fun to write" which I liked as I remember getting a lot out of Zigzag when I first found RC, so much so that I think starting the new task "Spiral matrix" was probably the first task I wrote for RC. (I thought it would be good to learn by doing a variant of an existing task).
Larry showed extended versions of the zigzag and spiral matrix solutions that used text based turtle graphics to animate the solutions.
In summary, his language comparisons were droopy; his use of RC - spot on!
Unfortunately he was preaching to his acolytes so there is no meat to the talk. Larry says 'X' about a language compared to Perl or Perl 6 and the audience accepts without even asking for clarifications. Argument never enters their minds.
What he did do well is to show examples first from the Root mean square page and indicate how anyone can compare and contrast languages using it. Larry then categorized Spiral matrix and Zigzag matrix as tasks whose examples in Perl where "fun to write" which I liked as I remember getting a lot out of Zigzag when I first found RC, so much so that I think starting the new task "Spiral matrix" was probably the first task I wrote for RC. (I thought it would be good to learn by doing a variant of an existing task).
Larry showed extended versions of the zigzag and spiral matrix solutions that used text based turtle graphics to animate the solutions.
In summary, his language comparisons were droopy; his use of RC - spot on!
Tuesday, January 11, 2011
New laptop with an i7-640M processor; and a 32 inch "monitor".
After a lot of hard work, and the car scrappage schemes of last year, I got rewarded with bonus!?
Nothing like a bwankers bonus, but enough to buy some fripperies after the essentials, and I decided to replace my aging 17" desktop-replacement laptop and get me a large 28" monitor.
After much searching in the post-xmas sales, I came across a curious machine at a very good price. The Advent Sienna 710 is a PCWorld/Dixons/Currys own-branded 15" laptop that has what, at the time, is the highest speed core for a laptop processor - an Intel 17-640M for under £500.
I guess the laptop has such a low price as it will not appeal to a gamer as it relies on Intels graphics accelerator and is so basic it doesn't have the usual scroll area at the side of the pad., which isn't too hot but that makes it ideal for me as I wanted something fast and multi-core, but don't indulge in much 3D work and can live with the pad deficiencies.
Our second TV had packed up mid-2010 and we were making-do with a very old 19" LCD. Instead of getting the 28" monitor I decided to buy a 32" Sony 1080P TV and use that as my monitor.
Well, the Advent laptop and Sony TV work beautifully together. I have a huge 32" screen at a decent resolution and with the sound coming through the HDMI connection, I don't need extra laptop speakers as the Sony sound is very good (better than on our more expensive main TV).
I ran my improved Python multi-processing example on the laptop and was very impressed with the speed, and with all the python instances appearing in the task manager.
END.
Nothing like a b
After much searching in the post-xmas sales, I came across a curious machine at a very good price. The Advent Sienna 710 is a PCWorld/Dixons/Currys own-branded 15" laptop that has what, at the time, is the highest speed core for a laptop processor - an Intel 17-640M for under £500.
I guess the laptop has such a low price as it will not appeal to a gamer as it relies on Intels graphics accelerator and is so basic it doesn't have the usual scroll area at the side of the pad., which isn't too hot but that makes it ideal for me as I wanted something fast and multi-core, but don't indulge in much 3D work and can live with the pad deficiencies.
Our second TV had packed up mid-2010 and we were making-do with a very old 19" LCD. Instead of getting the 28" monitor I decided to buy a 32" Sony 1080P TV and use that as my monitor.
Well, the Advent laptop and Sony TV work beautifully together. I have a huge 32" screen at a decent resolution and with the sound coming through the HDMI connection, I don't need extra laptop speakers as the Sony sound is very good (better than on our more expensive main TV).
I ran my improved Python multi-processing example on the laptop and was very impressed with the speed, and with all the python instances appearing in the task manager.
from concurrent import futures from math import floor, sqrt NUMBERS = [2**n - 1 for n in range(55, 65)] def lowest_factor(n, _start=3): if n % 2 == 0: return 2 search_max = int(floor(sqrt(n))) + 1 for i in range(_start, search_max, 2): if n % i == 0: return i return n def prime_factors(n, lowest): pf = [] while n > 1: pf.append(lowest) n //= lowest lowest = lowest_factor(n, max(lowest, 3)) return pf def prime_factors_of_number_with_lowest_prime_factor(NUMBERS): with futures.ProcessPoolExecutor() as executor: low_factor, number = max( (l, f) for l, f in zip(executor.map(lowest_factor, NUMBERS), NUMBERS) ) all_factors = prime_factors(number, low_factor) return number, all_factors def main(): print( 'For these numbers:\n ' + '\n '.join(str(p) for p in NUMBERS) ) number, all_factors = prime_factors_of_number_with_lowest_prime_factor(NUMBERS) print(' The one with the largest minimum prime factor is %i:' % number) print(' All its prime factors in order are: %s' % all_factors) if __name__ == '__main__': main()
END.