I've run simulations, tens of thousands of them at a time, over and over as we developed chips. In one project I noticed that I could predict the final result after only a small number of results were in, which allowed me to halt the rest of the simulations or make advance preparations for the final result.

I looked it up at the time and, indeed, there is an equation that, given how accurately you want to know the pass rate of a "large" population, will give you the minimum sample size to use.

To some, that's all gobbledygook, so I'll try to explain with some code.

## Explanation

Let's say you have a large randomised population of pass/fails, or ones and zeroes:
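Something like this minimal sketch (the names population, population_size and pass_rate_actual are mine, and the true pass rate comes out near 50%):

```python
import random

# A large population of pass (1) / fail (0) results.
population_size = 100_000
population = [random.choice((0, 1)) for _ in range(population_size)]
pass_rate_actual = sum(population) / population_size  # close to 0.5
```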

And say that we take a sample from it and compute the pass rate of that single, smaller sample:
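Continuing the sketch above (a sample_size of 123 matches the example further down):

```python
# One sample, one pass rate.
sample_size = 123
sample = random.sample(population, sample_size)
pass_rate = sum(sample) / sample_size
```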

Every time we run that pass_rate expression we get a different value. It is random. We need to run that pass_rate calculation many times to get an idea of what the likely pass rate would be at the given sample size:
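For example (runs_per_sample is my name for the repeat count):

```python
# Repeat the sampling many times to see how the sample pass rate varies.
runs_per_sample = 10_000
pass_rates = [sum(random.sample(population, sample_size)) / sample_size
              for _ in range(runs_per_sample)]
```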

I have learnt that the question to ask is not how often the sample pass rate equals the population pass rate exactly, but instead to define an acceptable margin of error (say 5%) and ask how confident we can be that sample pass rates fall within that margin, i.e. what percentage of the time they do:
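A sketch of that check; note that I read the 5% as a fraction of the population pass rate (i.e. ±0.025 around 0.5), which is the reading that reproduces the figure quoted below:

```python
# Fraction of runs whose sample pass rate lands within the margin.
margin = 0.05
hits = sum(abs(r - pass_rate_actual) <= margin * pass_rate_actual
           for r in pass_rates)
confidence = hits / runs_per_sample
print(confidence)  # roughly 0.4 for sample_size = 123
```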

*So for a sample size of 123 we could expect the pass rate of the sample to be within 5% of the actual pass rate of the population, 0.5 or 50%, only 0.4 or 40% of the time!*

## We need more!

What is actually done is we state what we *think* the population pass rate is, **p_hat** (choose closer to 50% if you are unsure); the margin of error around p_hat we want, **epsilon**, usually +/-5% or +/-3%; and the **confidence_level** we require for the pass rate landing within that margin of error.

There are calculators that will then give you n, the size of the sample needed to satisfy those conditions.
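The usual normal-approximation formula behind those calculators is n = z² · p_hat · (1 − p_hat) / epsilon², where z is the two-sided normal quantile for the chosen confidence level. A sketch using only the standard library; here epsilon is an absolute margin (±5 percentage points), which is how such calculators usually define it:

```python
from math import ceil
from statistics import NormalDist

def min_sample_size(p_hat, epsilon, confidence_level):
    # z is the two-sided normal quantile, e.g. 1.96 for 95% confidence.
    z = NormalDist().inv_cdf((1 + confidence_level) / 2)
    return ceil(z * z * p_hat * (1 - p_hat) / epsilon ** 2)

print(min_sample_size(p_hat=0.5, epsilon=0.05, confidence_level=0.95))  # 385
```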

## Doing it myself

I calculated for one specific sample size, above. Obviously, if I calculated pass rates over a range of sample sizes, with increasing runs_per_sample, I could search out the sample size needed.

That is done in my next program. I have to switch to using the numpy library for its speed and sample_size becomes a range.
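A sketch along those lines (the search range and step are mine, and samples are drawn with replacement for speed; the relative margin matches the earlier snippets):

```python
import numpy as np

rng = np.random.default_rng()
population = rng.integers(0, 2, 100_000, dtype=np.int8)  # pass/fail, as before
pass_rate_actual = population.mean()

confidence_level = 0.95
margin = 0.05                         # relative margin, as above
runs_per_sample = 10_000
sample_sizes = range(100, 2_000, 50)  # hypothetical search range

range_hits = []
for n in sample_sizes:
    # runs_per_sample samples of size n, drawn with replacement for speed.
    samples = rng.choice(population, size=(runs_per_sample, n))
    rates = samples.mean(axis=1)
    within = np.abs(rates - pass_rate_actual) <= margin * pass_rate_actual
    range_hits.append(within.mean())
```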

When the pass rate confidence levels are calculated I end up with a list of confidence levels for increasing sample sizes that, due to the randomness, do not increase monotonically, e.g.

range_hits = [...0.94, **0.95**, 0.93, 0.954, ... 0.95, 0.96, **0.95**, 0.96, 0.96, 0.97, ...] # confidence levels

The range of sample_size between the first occurrence of a confidence level >= the requested one and the last occurrence of a confidence level < the requested one is then slightly widened, and runs_per_sample increased, on another iteration to get a better result.
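One way to pick that window, as a sketch (widening by one step either side is my reading of "slightly widened"):

```python
# Find the crossover window in the (noisy) list of confidence levels.
first_ge = next(i for i, c in enumerate(range_hits) if c >= confidence_level)
last_lt = max(i for i, c in enumerate(range_hits) if c < confidence_level)
lo = sample_sizes[max(first_ge - 1, 0)]                   # widened one step down
hi = sample_sizes[min(last_lt + 1, len(range_hits) - 1)]  # widened one step up
# Next iteration: repeat the search over range(lo, hi + 1, smaller_step)
# with a larger runs_per_sample.
```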

Here's a sample of the output I get when searching:

### Sample output

### My Code

**END.**
