The following Matlab code was used to investigate the claim that the random number generator randn generates samples of a ZERO-MEAN random variable. The intent here is to give you an idea of how a specific analysis might proceed, and to familiarize you with some useful Matlab commands.
________________________________________________________________________
REPORT ON MATLAB’S NORMAL RANDOM NUMBER GENERATOR
ABSTRACT
The Matlab command randn(m,n) is used to generate an m x n array of samples from a normal (Gaussian) probability distribution whose mean is zero and whose variance is one. The goal of this analysis was to assess the claim that the mean equals zero. The approach was to observe the behavior of the sample mean, or the average, as the number of values averaged increases. Specifically, it was observed that the width of the estimated 3-sigma region for the sample mean does indeed appear to converge to zero, and that the region is centered about the claimed mean. Hence, we can conclude that the claimed mean is correct.
1. INTRODUCTION
Random number generators are instrumental in conducting simulations of random phenomena. To accurately simulate the phenomenon under study, it is important that the generator have the properties claimed. In this report we are interested in one generator that is associated with a common analysis package, Matlab™: the generator that is claimed to produce samples from a normal distribution. Specifically, the Matlab command randn(m,n) is claimed to generate a sample from each of a total of mxn independent and identically distributed (i.i.d.) random variables, each having a normal distribution with mean equal to zero and variance equal to one. Thus, the list of claims includes:
Claim 1: The mean equals zero
Claim 2: The variance equals one
Claim 3: The distribution is normal (or Gaussian), and
Claim 4: The generation of any number is independent of the generation of any other.
In this sample report, we will investigate Claim 1. In the next section we summarize the theory behind our approach. This theory is then applied in Section 3. Finally, we offer some discussion in Section 4. The Matlab code used to conduct the analysis is included in the Appendix.
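As a quick orientation to the command under study, the following is a minimal sketch of how randn(m,n) might be called and how the first two claims could be eyeballed. The sizes m and n and all variable names are arbitrary choices for illustration, not part of the analysis reported here.

```matlab
% Minimal sketch of randn(m,n) usage; m and n are arbitrary illustrative sizes.
m = 200;
n = 500;
x = randn(m, n);            % m x n array of samples
[nrows, ncols] = size(x);   % dimensions of the generated array
xbar_all = mean(x(:));      % average of all m*n samples
s2_all   = var(x(:));       % sample variance of all m*n samples
fprintf('size: %d x %d, mean: %.4f, variance: %.4f\n', ...
        nrows, ncols, xbar_all, s2_all);
```

A single run of this kind only gives one average and one variance estimate; it says nothing about how much those values would change from run to run, which is the question taken up in the next section.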
2. THEORY ASSOCIATED WITH RANDOM NUMBER GENERATION
First of all, talking about random numbers is a potential source of major confusion. Numbers are not random; what is implied is that the mechanism for generating them is. Furthermore, this mechanism must be viewed in relation to the goal of the simulation. For example, these numbers can all be assumed to be samples of a single 1-D random variable, X. We can use these numbers to get an idea of the shape of its probability density function (p.d.f.), but this idea will depend on how many numbers were used. Intuitively, if a very large number were used, then we would have confidence in what we see. The same reasoning applies to Claim 1. If the average of these numbers is close to zero, then we might assume the mean is zero. But here again, we should recognize that how many numbers are averaged is important, in order to quantify what it means to be close to zero. Clearly, were we to compute such an average over and over again, the result would never be the same. Hence, the average is itself a random variable, obtained by averaging mxn random variables. If we assume that they each have the same p.d.f., then they each have the same mean and variance as well. In this case the average, or sample mean, has a mean equal to the common mean. If these mxn random variables are also uncorrelated, then the variance of the sample mean is the common variance divided by mxn. Combining these two facts leads to the following approach to assessing whether Claim 1 is true.
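The two facts about the sample mean can be written out explicitly. The symbols below are introduced here for illustration only and do not appear elsewhere in the report: N = mn is the number of values averaged, and mu and sigma^2 denote the assumed common mean and variance.

```latex
% Sample mean of N = mn random variables X_1, ..., X_N
\bar{X} = \frac{1}{N}\sum_{k=1}^{N} X_k

% Common mean \mu: the sample mean is unbiased
E[\bar{X}] = \frac{1}{N}\sum_{k=1}^{N} E[X_k] = \mu

% Common variance \sigma^2 and uncorrelated X_k: cross terms vanish
\mathrm{Var}(\bar{X}) = \frac{1}{N^2}\sum_{k=1}^{N} \mathrm{Var}(X_k) = \frac{\sigma^2}{N}

% Resulting 3-sigma range for the sample mean
\mu \pm \frac{3\sigma}{\sqrt{N}}
```

Under these assumptions the 3-sigma range stays centered at mu while its half-width shrinks like 1/sqrt(N), so a tenfold increase in N narrows the range by a factor of about 3.16.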
3. METHOD AND RESULTS
Based on the discussion of the last section, our approach for identifying the assumedly common mean of the collection of random variables is the following:
For successively larger sizes, mxn, generate a large number of simulations of the sample mean and, from these, estimate the 3-sigma range for this random variable. Then, if the width of this range appears to approach zero as mxn increases, the value about which the range collapses will be assumed to be the true mean.
The results of this are shown in Figure 1, which suggests that as the number mxn approaches infinity, the variance of the sample mean approaches zero. Since the 3-sigma range is centered around zero, we can conclude that, in the limit, the sample mean would converge to the claimed mean, which is zero.
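One way to make the shrinking of the 3-sigma range quantitative is to compare the estimated standard deviation of the sample mean with the value 1/sqrt(n) expected when averaging n uncorrelated unit-variance variables. The following is a sketch under that assumption; the variable names (nreal, sd_est, sd_theory, ratio) are illustrative and are not part of the Appendix program.

```matlab
% Sketch: compare the estimated spread of the sample mean with the
% theoretical value 1/sqrt(n), assuming uncorrelated unit-variance samples.
nreal = 1000;                   % number of realizations of xbar
nvec  = 10.^(1:3);              % numbers of samples averaged: 10, 100, 1000
sd_est    = zeros(size(nvec));  % estimated std of xbar for each n
sd_theory = 1 ./ sqrt(nvec);    % theoretical std of xbar for each n
for k = 1:numel(nvec)
    xbar = mean(randn(nvec(k), nreal), 1);  % 1 x nreal vector of sample means
    sd_est(k) = std(xbar);
end
ratio = sd_est ./ sd_theory;    % close to 1 if the assumptions hold
disp([nvec; sd_est; sd_theory; ratio]')     % one row per value of n
```

Ratios near one for every n would indicate that the 3-sigma range narrows at the rate predicted by the theory, rather than merely narrowing.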
4. DISCUSSION
The above analysis relied on a number of other claims that we did not address. Specifically, it presumed that each of the mxn random variables had the same mean. The fact that the sample mean converges to zero does not mean that each random variable must have the same mean. Consider, for example, a sequence of random variables with means alternating between plus one and minus one. As more and more are averaged, the sample mean will clearly converge to zero, yet none of them has a mean equal to zero. Another claim not addressed specifically here is Claim 4. Our approach does, however, provide some support for this claim, since in Figure 1 the rate at which the 3-sigma region becomes small is consistent with the rate of variance reduction associated with averaging i.i.d. random variables. But here again, our argument relies on the assumption that all the random variables have the same variance. A better way to address these issues would be to investigate an n-D random vector, but a discussion of that approach is left to the student.
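The alternating-mean counterexample can be simulated directly. The sketch below is hypothetical: it builds unit-variance normal variables with means of plus one and minus one by shifting the output of randn, and all names (mu, row_means, odd_avg, even_avg) are chosen for illustration.

```matlab
% Hypothetical counterexample: unit-variance normal variables whose means
% alternate between +1 and -1. None has zero mean, yet their average does.
n     = 1000;                            % even number of variables averaged
nreal = 1000;                            % number of realizations
mu    = repmat([1; -1], n/2, 1);         % n x 1 vector of alternating means
xmat  = randn(n, nreal) + repmat(mu, 1, nreal);  % shift each row by its mean
xbar  = mean(xmat, 1);                   % 1 x nreal vector of sample means
row_means = mean(xmat, 2);               % per-variable averages over realizations
odd_avg   = mean(row_means(1:2:end));    % variables with mean +1
even_avg  = mean(row_means(2:2:end));    % variables with mean -1
fprintf('mean of xbar: %.4f\n', mean(xbar));
fprintf('odd rows: %.4f, even rows: %.4f\n', odd_avg, even_avg);
```

The sample mean in this sketch is statistically indistinguishable from that of zero-mean data, which is why examining the variables individually (here, row by row) is needed to detect unequal means.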
APPENDIX
% PROGRAM NAME: HW_1
% PURPOSE: To investigate the claim that the randn
% command really does generate a ZERO MEAN random variable
% METHOD: Estimate the 3-sigma range for the estimator
% of the mean using n iid random variables. It should
% become increasingly localized about the true unknown
% mean as n approaches infinity.
% ==============================
xlow=[];   % lower 3-sigma limits, one entry per value of n
xhigh=[];  % upper 3-sigma limits, one entry per value of n
nvec=[];   % numbers of iid r.v.s averaged
for i=1:3
    n=10^i;                 % number of iid r.v.s averaged
    nvec=[nvec,n];
    xmat=randn(n,1000);     % generate 1000 realizations (columns) of n-vector
    xbar=mean(xmat,1);      % compute 1000 measurements of xbar (column means)
    xlow=[xlow,mean(xbar)-3*std(xbar)];   % essential lower limit of range of xbar
    xhigh=[xhigh,mean(xbar)+3*std(xbar)]; % essential upper limit of range of xbar
end
plot(log10(nvec),[xlow;xhigh])
xlabel('Base-10 log of number averaged')
ylabel('3-sigma limits of xbar')
title('Estimated 3-sigma range for xbar(n) versus log(n)')