In software there is often the need to chose randomly between a number of alternatives. Very often it is sufficient to chose between them with equal probability. For this we can simply generate a pseudo random number from a standard uniform random number generator of sufficient resolution, and then truncate the value down to the number of choices that we actually have. Given a random number generator that produces a random 32 bit number, and where R is not bigger than 65535, this macro will generate a number in the range of [0,R), that is between zero and R-1:

   #define randInRange( R ) ((( rand() >> 16 ) * (R)) >> 16 )

Sometimes there is the need to chose between a fixed number of alternatives where each alternative has a different probability, and the sum of all the probabilities is equal to one. For example you might have 4 choices, a - d with the following probabilities:

    a   0.3421
    b   0.1943
    c   0.2501
    d   0.2135

Mathematically this is called choosing a discrete random variable. Discrete refers to the idea that we are choosing from a fixed, countable set of values. ( note: This is part of the more general problem of choosing non-uniform random variables, which can include choosing continuous random variables. For example you may want to chose a series of random numbers such that their distribution approximates a given curve - like a Poisson distribution. )


The search method:

First, generate a random number X in the interval of [0-1). Given a probability vector P, with N entries, and a threshold variable T. Iterate through the array of probabilities, adding each Pi to T and then comparing T to X. As soon as X is smaller than T, chose i. Like so:

// p is an array of probabilities expressed in fixed point. // n is the number of elements in p u32 drvSearch( u32* p, u32 n ) { u32 t = 0; u32 x = rand(); for ( u32 i = 0; i < n; i++ ) { t += p[i]; if ( t > x ) return i; } }

This method works best if the probabilities are sorted with the most probable choices coming first. This way it is more likely to terminate the search early.

This method has several disadvantages. First it only works well if the probability array remains sorted. If the array is sorted, with the most probable elements first, then the time to generate each value may approach constant time. Maintaining a sorted array can be a prohibitive if each probability array will be used only a few times, or if the probabilities change frequently.

The other disadvantage is that this method may be slow for large n if the probability array is fairly uniform. As the probability array approaches perfect uniformity, all options having equal probability, the time to generate each number becomes proportional to O(n). Conversely, a highly non-uniform distribution can be used to generate numbers in nearly constant time.


The table lookup method:

Another method is to create a sufficiently large table, and to fill it with the desired output symbols in the desired proportions and then to use a standard uniform generator to chose a random entry from this table. To generate the distribution given above, for example, create a table with 10000 entries. Fill 3421 entries with a, 1943 entries with b, 2501 entries with c, and 2132 entries with d. Now chose a random number from 0-9999, and look up the result in the table.

The advantage of this method is that a choice can be generated in constant time. The main drawback is that accuracy of the method is proportional to the size of the table. You can only increase the resolution of the probabilities by increasing the size of the table. For any decent resolution you need to create very large tables. Even on modern computers with megabytes of memory this will create cache problems. Setting up these large tables can also be time consuming.

There are methods which can be used to compress the tables to a manageable size. One such method is to represent the probabilities in multiple tables each of which represents the probablities at different orders of magnitude. Given the probability distribution:

    a   0.3421
    b   0.1943
    c   0.2501
    d   0.2135

Build a table representing just the tenths place, where we round down every probability to the next tenth. This would create a table with 8 entries: ( a,a,a,b,c,c,d,d ). The sum of the probabilities in this simplified estimate is 0.8. There is 0.2 of the distribution which is unaccounted for, so we make another table representing the 100th's place, with : ( a,a,a,a,b,b,b,b,b,b,b,b,b,c,c,c,c,c,d ). We can continue this to ever lower decimal places until we have accounted for all of the probabilities in the array. For each table we compute C, which is the sum of the probabilities covered by that table and the previous tables. In the case of the distribution given above we end up with 4 tables like so.

    0.1:       8 elements: ( a,a,a,b,c,c,d,d )                       c: 0.8000
    0.01:     19 elements: ( a,a,a,a,b,b,b,b,b,b,b,b,b,c,c,c,c,c,d ) c: 0.9900
    0.001:     9 elements: ( a,a,b,b,b,b,d,d,d )                     c: 0.9990
    0.0001:   10 elements: ( a,b,b,b,c,d,d,d,d,d )                   c: 1.0000

Now to generate a choice we need to chose a random number X in the range of [0,1). If that number is less than C for the first table, then we just use the tenths digit as an index into the first table. If the generated number is greater or equal, then we subtract C for the first table from X, and then use the first two digits of X to index into the second table. We continue down until we run out of tables. The last table is guaranteed to apply because for any proper set of probabilities the last C is guaranteed to be 1.0.

We used decimal numbers in the description above, but in practice it is easier to implement this method with hexadecimal numbers. In the following code example each table represents one hex place:

struct Table { u32 mTableSize; u32* mTable; u32 mC; }; // p is an array of probabilities expressed in fixed point. // n is the number of elements in p // tables is an array of 8 Table structures. We assume that // a large enough buffer has been allocated in mTable to // accomodate the required number of elements. In this particular // case n*16 elements should always be sufficient. In practice one // might want to allocate the memory dynamically. void drvTableSetup( u32* p, u32 n, Table* tables ) { u32 shift = 28; u32 c = 0; for ( u32 i = 0; i < 8; i++ ) { u32* t = tables[i].mTable; for ( u32 j = 0; j < n; j++ ) { u32 count = (( p[j] >> shift ) & 0xF ); tables[i].mCount += count; c += ( 0x1 << shift ) * count; for ( u32 k = 0; k < count; k++ ) *t++ = j; } tables[i].mC = c; shift -= 4; } } u32 drvTableGenerate( Table* tables ) { u32 x = rand(); for ( u32 i = 0; i < 8; i++ ) if ( tables[i].mC > x ) return tables[i].mTable[( x >> ((7-i)*4)) & 0xF ]; }

The table method is nearly perfect. It requires very small tables. It runs in nearly constant time. The time to run this algorithm scales linearly with the desired accuracy. However this method is not suitable in situations where the probability array changes often because even though the array does not have to be sorted, it does take time to generate new probability tables. This method is also not suitable in situations where the probability array is very long, because the upper bound on the size of the table is n*t*r, where n is the length of the probability array, and t is the scale being used ( t would be 10 in the first example and 16 in the second ), and r is the number of tables ( 4 in the first example and 8 in the second ).


Set of Binary Distributions method:

First we observe that any array of probabilities with n elements ( where n > 1 ), can be expressed as a set of n-1 distributions of two elements, where the probability of each binary distribution is the same. In other words given one distribution:

p( a = 0.3421, b = 0.1943, c = 0.2501, d = 0.2135 )
can be expressed as:
p( 
    p( b = alpha1, a = 1 - alpha1 ) = 1/3
    p( c = alpha2, a = 1 - alpha2 ) = 1/3
    p( d = alpha3, a = 1 - alpha3 ) = 1/3
 )
for some set of alphas, and some arrangement of the output symbols. In this case the three alphas would be 0.1945, 0.2501, and 0.2135.

You can visualize what this means by imagining a probability array as a series of n different sized bottles each of which holds a certain amount of liquid. The amount of liquid corresponds to how probable that element is. The statement above means that you can repackage the liquid into n-1 bottles which are all the same size, and where no bottle holds liquid from more than two of the original set of bottles.

The condition above can be easily proved, but since I am lazy, and the proof is boring, we will leave that as an exercise to the diligent reader.

Given the observation above, if we can find a way to quickly construct the set of n-1 binary distributions, generating a random variable in constant time is easy. First select one of our n-1 binary distributions. Since the binary distributions are equally probable, all we need to do is generate a uniform random integer from [0,n-1), using a method like randInRange given above. Once we have selected a binary distribution, we need to choose one of the two symbols in that distribution according to the probability given by that distribution. This can be easily done by generating another random real number in the interval [0,1), and using that to generate one of the two symbols in the given binary distribution.

So how do you generate an appropriate set of binary distributions from an arbitrary set of n probabilities? First create an array of n-1 binary distributions. Each binary distribution consists of two output symbols, and a single probability p of selecting the first symbol. The remainder, 1-p, is the probability of selecting the second symbol.

A naive way of filling in this array would be to first sort the probability array P. Then for each binary distribution Di, where i is an integer in the range [0,n-1), we pick the symbol a, with the lowest probability Pj and make that the first symbol in distribution Di, and we remove a from P. Since there are n-1 distributions D, we set the probability of choosing a in Di to (n-1)Pj. The remaining probability Di is ( 1 - (n-1)Pj ). We take the symbol b with the largest probability Pk and assign that symbol as the second element in distribution Di. Then we subtract from Pk the amount accounted for by the second part of distribution Di. That is we subtract (1-(n-1)Pj)/(n-1) from Pk. Now we resort the probability array P. We repeat this until we have filled in all n-1 distributions D.

This method will always work, and is straightforward to understand, but it has the downside that it requires that P be sorted multiple times. With a little cleverness you can come up with a set of distributions without having to sort. All that is required is that you find pairs of probabilities, a and b, in P where one element of the pair, a is less probable than 1/(n-1), but where the sum of the pair is more probable than 1/(n-1). To do this, at each iteration, start at the top of the array P, and search until you find elements which are suitable for a and b. It can be done such that you don't need to iterate through the elements of P more than once to fill in all the distributions, and you don't need to sort P. Being lazy I will again leave proving this to the reader and just give you pseudo code:

struct Distribution { u32 mA; u32 mB; u32 mProb; }; struct Table { u32 mNumDists; Distribution* mDist; }; // 0.32 fixed point 1/(n-1) #define DIST_INV( n ) ( 0xFFFFFFFF/( (n) - 1 )) // p is an array of probabilities expressed in fixed point. // n is the number of elements in p // we assume here that mDist in table has already been allocated // as an array of n-1 Distribution structures. We assume the // probabilies sum to 1, and that no element has zero or negative // probabilities. void distTableSetup( u32* p, u32 n, Table* table ) { if ( n == 1 ) { // special case distro table->mNumDists = 1; table->mDist[0].mA = 0; table->mDist[0].mB = 0; table->mDist[0].mProb = 0; } else { Distribution* mDist = table->mDist; u32 thresh = DIST_INV( n ); u32 aIdx = 0; u32 bIdx = 0; u32 i; table->mNumDists = n-1; for ( i = 0; i < n-1; i++ ) { while (( p[aIdx] >= thresh ) || ( p[aIdx] == 0 )) aIdx++; // find a small non-zero prob while (( bIdx == aIdx ) || ((( p[aIdx] + p[bIdx] ) - 1 ) < ( thresh - 1 ))) bIdx++; // find a sum big enough or equal if ( aIdx >= n ) aIdx = n-1; if ( bIdx >= n ) bIdx = n-1; // this distribution consumes the rest of aIdx. computeBiDist( p, n, mDist++, aIdx, bIdx ); if (( bIdx < aIdx ) && ( p[bIdx] < thresh )) aIdx = bIdx; else aIdx++; } } return table; } // i is the distribution, aIdx is an index into P for symbol A, // bIdx is an index into P for symbol B void computeBiDist( u32* p, u32 n, Distribution* dist, u32 aIdx, u32 bIdx ) { dist->mA = aIdx; dist->mB = bIdx; if ( aIdx == bIdx ) { dist->mProb = 0; } else { if ((( p[aIdx] >> 1 ) * ( n - 1 )) >= 0x80000000 ) dist->mProb = 0xFFFFFFFF; else dist->mProb = p[aIdx] * ( n - 1 ); p[bIdx] -= ( DIST_INV( n ) - p[aIdx] ); } p[aIdx] = 0; }


And to generate the random variable:

u32 distTableGenerate( Table* table ) { u32 tableIdx = randInRange( table->mNumDists - 1 ); if ( rand() <= table->mDist[tableIdx].mProb ) return table->mDist[tableIdx].mA; else return table->mDist[tableIdx].mB; }

The advantages of this method are that the random variables can be generated in constant time. The tables required are very small, and scale linearly as O(n) where n is the length of the probability array. Setting up the array is fast. It does not require sorting, and can be done in O(n) time also. If just part of the probability distribution changes, it is straightforward to recalculate just part of the set of binary distributions. Although I don't give the algorithm here, in principle what you do is just remove the binary distributions affected by the partial changes. That is if the probability of symbol A and B change, we will remove all binary distributions that have either A or B in them, and use the new probabilities for A and B to compute new binary distributions in way similar to what was described above.

Often when trying to make a proper distribution, ones starts off with an array of number that are in the right proportion to the distribution we want to model, but are not represented as a series of probabilities that sum to one. For example given the series 20, 30, 135, 15, we would want a distribution 0.10, 0.15, 0.675, 0.075. The following function takes an array of numbers and generates an appropriate distribution in 0.32 fixed point.

static void normProbs( const u32* in, u32* out, u32 n ) { u32 i; u32 scale = 0; u32 err = 0; u32 shift = 0; u32 max = 0; u32 maxIdx = 0; // figure out the scale again: for ( i = 0; i < n; i++ ) scale += (( in[i] << shift ) >> 8 ); if (( scale < 0xFFFF ) && ( shift < 24 )) { shift += 8; goto again; } if ( scale == 0 ) { // degenerate all zero prob array. Can't do anything with // it... just echo it back out - but its surely wrong. assert(0); for ( i = 0; i < n; i++ ) out[i] = in[i]; } scale = 0x10000000 / (( scale + 0x7FF ) >> 12 ); // apply it for ( i = 0; i < n; i++ ) { out[i] = ((( in[i] << shift ) + 0x7FFF ) >> 16 ) * scale; err += out[i]; if ( out[i] > max ) { max = out[i]; maxIdx = i; } } // correct any accumulated error - it should be negligible. // Add it to the largest prob where it will be least noticed. out[maxIdx] -= err; }

You can read more discussion of other methods of from a paper called Fast Generation of Discrete Random Variables.