## Why do people say there is modulo bias when using a random number generator?

I have seen this question asked a lot but never seen a true concrete answer to it. So I am going to post one here which will hopefully help people understand why exactly there is "modulo bias" when using a random number generator, like `rand()` in C++.

So `rand()` is a pseudo-random number generator that returns an integer between 0 and `RAND_MAX` inclusive, where `RAND_MAX` is a constant defined in `cstdlib` (see this article for a general overview on `rand()`).

Now what happens if you want to generate a random number between, say, 0 and 2? For the sake of explanation, let's say `RAND_MAX` is 10 and I decide that the best way to generate a random number between 0 and 2 is to do `rand()%3`. Assuming `rand()` does generate each number between 0 and 10 with equal probability (this is arguable, but for this post I will assume it does), why would `rand()%3` not produce the numbers between 0 and 2 with equal probability?

When `rand()` returns 0, 3, 6, or 9, `rand()%3 == 0`. When `rand()` returns 1, 4, 7, or 10, `rand()%3 == 1`. When `rand()` returns 2, 5, or 8, `rand()%3 == 2`. So out of 11 equally likely values, the probability of getting a 0 is 4/11 and a 1 is 4/11, but a 2 is only 3/11. This does not generate the numbers between 0 and 2 with equal probability. When n is small compared to `RAND_MAX` the skew is small and might not be the biggest issue, but as n gets closer to `RAND_MAX` it can skew the distribution noticeably, biasing it toward the smaller numbers.
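To double-check the counting by hand, here is a minimal sketch that tallies how many of the equally likely values land on each remainder. The helper name `count_residues` and its parameters are made up for this illustration; `max_value` plays the role of the pretend `RAND_MAX`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not part of any library): for a generator that returns
// each value in 0..max_value equally often, count how many of those values
// map to each possible result of v % n.
std::vector<int> count_residues(int max_value, int n) {
    assert(max_value >= 0 && n > 0);
    std::vector<int> counts(n, 0);
    for (int v = 0; v <= max_value; ++v) {
        ++counts[v % n];  // v contributes to the bucket for its remainder
    }
    return counts;
}
```

Calling `count_residues(10, 3)` gives the counts 4, 4 and 3 out of 11 values, which matches the 4/11, 4/11 and 3/11 worked out above.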

So when does `rand()%n` return each number from 0 to n-1 with equal probability? When `RAND_MAX%n == n - 1`, that is, when the `RAND_MAX + 1` possible return values divide evenly into groups of n. In this case, along with our earlier assumption that `rand()` returns each number between 0 and `RAND_MAX` with equal probability, the residue classes modulo n are also equally likely.
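That condition can be written down as a small check. This is only a sketch: `modulo_is_unbiased` is a name invented here, and the values in the `static_assert` lines are illustrative (10 is the pretend `RAND_MAX` from the example, 32767 is the smallest `RAND_MAX` the standard allows).

```cpp
// Hypothetical helper: true when the values 0..rand_max split into equally
// sized groups under % n, i.e. when (rand_max + 1) is a multiple of n.
// It is written as rand_max % n == n - 1 so that rand_max + 1 is never
// computed, since that would overflow int when rand_max == INT_MAX.
constexpr bool modulo_is_unbiased(int rand_max, int n) {
    return n > 0 && rand_max % n == n - 1;
}

static_assert(!modulo_is_unbiased(10, 3), "11 values cannot split evenly into 3 groups");
static_assert(modulo_is_unbiased(10, 11), "11 values split evenly into 11 groups");
static_assert(modulo_is_unbiased(32767, 8), "32768 is a multiple of 8");
```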

So how do we solve this problem? One way is to keep generating random numbers till you get a number in your desired range:

```cpp
int x;
do
{
    x = rand();  // draw again until the value falls in [0, n)
} while (x >= n);
```
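The loop above is unbiased, but it can be slow when n is small: only n of the `RAND_MAX + 1` possible values are accepted, so on average it takes about `(RAND_MAX + 1) / n` calls to `rand()`. A common variation, sketched below, rejects only the few values at the top of the range that do not fit into a whole group of n and then applies `%`. The function name `uniform_below` is made up for this example, and it still assumes `rand()` itself is uniform.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper (not from the standard library): returns a value in
// [0, n) without modulo bias, assuming rand() is uniform on [0, RAND_MAX].
// Requires 0 < n <= RAND_MAX.
int uniform_below(int n) {
    assert(n > 0 && n <= RAND_MAX);
    // How many values are left over at the top of the range:
    // (RAND_MAX + 1) % n, computed without overflowing int.
    const int excess = (RAND_MAX % n + 1) % n;
    // Largest value we accept; 0..limit holds a whole number of groups of n.
    const int limit = RAND_MAX - excess;
    int x;
    do {
        x = std::rand();
    } while (x > limit);  // retry only for the leftover values
    return x % n;
}
```

With the pretend `RAND_MAX` of 10 and n = 3, this would reject 9 and 10 and keep 0 through 8, so each remainder gets exactly three values. More than half of all draws are always accepted, so the loop rarely runs more than a couple of times.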

Hope that helps everyone!