Nov 9, 2017

All Monte Carlo methods rely on a source of random numbers. Most such sources would more precisely be called pseudorandom numbers, since a deterministic algorithm cannot, by definition, ever produce truly random numbers. If you Google “random numbers” you will encounter a daunting list of articles, including many criticizing the random number generator used by Excel. It’s true: some random number generators are better than others, but it is silly to classify random number generators as “good” or “bad”. In the electronics industry, laser-powered wire strippers are far superior to the old-fashioned mechanical kind. I use the old fashioned kind at home and they work just fine. The only important question is whether a particular random number generator is suitable for the task at hand. If you were concerned with generating cryptographic codes, Excel’s random number generator would not be adequate. If you are modeling how long customers wait at the checkout counter of your retail store, searching for an NSA-quality algorithm would be a waste of time.

We will first take a look at Excel’s RAND( ) function, which produces uniformly distributed random numbers between 0 and 1. The output is never exactly one, and by my experiments is exactly equal to zero about 60 times in one billion (1 X 10^{9}) trials. The RAND( ) function is of critical importance, because if we can generate uniformly distributed random numbers in this range, we can then proceed to generate other random numbers for any probability distribution we wish.

We will not discuss Excel’s RANDBETWEEN( ) function here. It’s very convenient, but if it didn’t exist it would be a simple matter to implement its functionality. For example,

=FLOOR(RAND() * ($B$1-$A$1+1) + $A$1,1)

will yield a uniformly distributed integer between $A$1 and $B$1.

The need for generating random numbers following a normal distribution comes up frequently, so it is great for us that Microsoft provides a built in function for this purpose. It’s called NORM.INV( ) , and all we need do is supply as input a uniformly distributed random number between 0 and 1, e.g, the output of RAND( ), the mean for our new random numbers, and the standard deviation for our new random numbers. The following would generate a random number that might be used to model the weight of 18-year-old men in the United States:

Not all probability distribution functions in Excel come with a corresponding inverse function we can use to generate random numbers.

The “Random Number Generation” tool of the Data Analysis add-in can provide a set of random numbers that follow your choice of probability distribution. It would seem to be ideal, and if you only want a single set of random numbers, it would be fine. In Monte Carlo analysis, however, we want to create sets of random numbers over and over and over again, perhaps thousands or even millions of times. The RNG tool cannot be used directly from Excel formulas, and using it in VBA is awkward at best.

If we want to generate sets of random numbers not conveniently provided by Excel, we can write our own formula, create a VBA macro, or use a VLOOKUP table. We will examine the VLOOKUP table since it always works, indeed, we can make up our own probability distribution if we like (not that making up a distribution is likely to prove uselful). We will illustrate this technique by generating random numbers that follow a Poisson distribution.

The Poisson distribution is useful for modeling the fluctuation in counts of things, perhaps events. The number of cars going through a tollbooth each minute; the number of people waiting on the checkout line when you shop for groceries. In general, we are more interested in the Poisson distribution when the average is small; if the average is large enough using the normal distribution may be good enough.

We start by creating a list of integers starting at zero and continuing to the largest value we deem to be interesting in whatever our model might be. For our simple example, we will regard numbers greater than 23 to be beyond our range of interest. (If the line at the cash register were longer than 23 people you would probably come back later anyway.)

In this example, we create the list of integers in column “D”, since it is easier if we create a list of probability values on the left-hand side.

We will put our choice of Poisson distribution mean, in this example we’ll use 5.3, in cell D2. We will then create a formula to calculate the probability of each of the integers in our list. We’ll put the column of probabilities on the left-hand side of the column of integers to make the use of VLOOKUP more convenient.

We now have two columns. To the left of each integer is the probability of that integer appearing as a Poisson-distributed random number. Everything looks fine. It’s not.

Very often, when we are creating program code or writing database queries it is the boundaries where errors are likely to creep in. When we look at the first line, we realize the entire column is off. We want to look up probability values using VLOOKUP but clearly when we lookup probability values we cannot expect an exact match. When we lookup any random number less than 0.00499 (in this example) we want to find “0”, but VLOOKUP will want a nonexistent number on the previous line. All our probability values need to be moved down one notch. We will need to manually insert the value 0 as the first probability value.

Poisson distributions have “long tails”. which is to say that although large values are unlikely, the do occur and must be dealt with. In our table we must manually put in the maximum probability value of 1.0. (A larger number will also work, but it would just look weird.)

We can now generate as many approximately Poisson-distributed random variates as we want, as often as we want, using VLOOKUP.

Here are our results:

We have illustrated how to use VLOOKUP to generate random numbers approximating a Poisson distribution. There is nothing about this technique that is specific to the Poisson distribution, however, though it can become clumsy if you want more than 3 or 4 decimal places of precision for other distributions. All you need to do is create a list of values representing the cumulative probability density to the left of a column of values from which you would like to select randomly. Using RAND to generate a uniformly distributed variate and then using VLOOKUP to find the corresponding row in the table does the trick!