CPE 619 Testing Random-Number Generators - PowerPoint PPT Presentation


Slides: 55

Provided by: Mil36

Learn more at: http://www.ece.uah.edu


1
CPE 619 Testing Random-Number Generators

Aleksandar Milenkovic
The LaCASA Laboratory
Electrical and Computer Engineering Department
The University of Alabama in Huntsville
http://www.ece.uah.edu/milenka
http://www.ece.uah.edu/lacasa

2
Overview

Chi-square test
Kolmogorov-Smirnov Test
Serial-correlation Test
Two-level tests
K-dimensional uniformity or k-distributivity
Serial Test
Spectral Test

3
Testing Random-Number Generators

Goal: To ensure that the random-number generator produces a random stream
Plot histograms
Plot quantile-quantile plot
Use other tests
Passing a test is necessary but not sufficient
Pass ≠ Good; Fail ⇒ Bad
New tests ⇒ Old generators fail the test
Tests can be adapted for other distributions

4
Chi-Square Test

Most commonly used test
Can be used for any distribution
Prepare a histogram of the observed data
Compare observed frequencies with theoretical frequencies:
$D = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i}$
k = Number of cells
$o_i$ = Observed frequency for the ith cell
$e_i$ = Expected frequency for the ith cell
D = 0 ⇒ Exact fit
D has a chi-square distribution with k-1 degrees of freedom
⇒ Compare D with $\chi^2_{[1-\alpha;\,k-1]}$. Pass at significance level α (i.e., with 100(1-α)% confidence) if D is less

5
Example 27.1

1000 random numbers generated with a seed of $x_0 = 1$
Observed difference D = 10.380
Observed D is less than the table value ⇒ Accept IID U(0, 1)
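The statistic D can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: cells are equal-width on [0, 1), Python's `random.Random` stands in for the generator (the generator of Example 27.1 is not reproduced here), and the sample is split into 10 cells, so D is compared with the 0.90 quantile of the chi-square distribution with 9 degrees of freedom, about 14.684. The function names are hypothetical.

```python
import random

def chi_square_statistic(observed, expected):
    """D = sum over cells of (o_i - e_i)^2 / e_i; D = 0 means an exact fit."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def uniform_cell_counts(sample, k):
    """Histogram of U(0,1) values in k equal-width cells on [0, 1)."""
    counts = [0] * k
    for u in sample:
        counts[min(int(u * k), k - 1)] += 1   # clamp u == 1.0 into the last cell
    return counts

if __name__ == "__main__":
    rng = random.Random(1)      # stand-in generator, not the one used in the slides
    k, n = 10, 1000
    observed = uniform_cell_counts([rng.random() for _ in range(n)], k)
    d = chi_square_statistic(observed, [n / k] * k)
    critical = 14.684           # chi-square 0.90 quantile, k - 1 = 9 degrees of freedom
    print(f"D = {d:.3f}", "pass" if d < critical else "fail")
```

Passing the expected counts explicitly keeps the same function usable for non-uniform distributions, where each cell has its own $e_i$.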

6
Chi-Square for Other Distributions

Errors in cells with a small $e_i$ affect the chi-square statistic more
Best when the $e_i$'s are equal
⇒ Use an equiprobable histogram with variable cell sizes
Combine adjoining cells so that the new cell probabilities are approximately equal
The number of degrees of freedom should be reduced to k-r-1 (in place of k-1), where r is the number of parameters estimated from the sample
Designed for discrete distributions and for large sample sizes only ⇒ Lower significance for finite sample sizes and continuous distributions
If a cell has fewer than 5 observations, combine it with neighboring cells
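One way to combine cells and adjust the degrees of freedom is sketched below. It assumes cells are merged left to right and uses the common rule of thumb of at least 5 expected observations per merged cell; both function names are hypothetical.

```python
def merge_small_cells(observed, expected, min_expected=5.0):
    """Combine adjoining cells, left to right, until every merged cell has an
    expected count of at least min_expected. Returns (observed, expected)."""
    merged_o, merged_e = [], []
    acc_o, acc_e = 0, 0.0
    for o, e in zip(observed, expected):
        acc_o += o
        acc_e += e
        if acc_e >= min_expected:
            merged_o.append(acc_o)
            merged_e.append(acc_e)
            acc_o, acc_e = 0, 0.0
    if acc_e > 0 or acc_o > 0:          # fold any leftover tail into the last cell
        if merged_e:
            merged_o[-1] += acc_o
            merged_e[-1] += acc_e
        else:
            merged_o.append(acc_o)
            merged_e.append(acc_e)
    return merged_o, merged_e

def degrees_of_freedom(num_cells, num_estimated_params):
    """k - r - 1 when r parameters were estimated from the sample."""
    return num_cells - num_estimated_params - 1
```

For example, merging five cells into three and estimating two parameters from the data leaves 3 - 2 - 1 = 0 degrees of freedom, a sign that more data or coarser estimation is needed.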

7
Kolmogorov-Smirnov Test

Developed by A. N. Kolmogorov and N. V. Smirnov
Designed for continuous distributions
Difference between the observed CDF (cumulative distribution function) $F_o(x)$ and the expected CDF $F_e(x)$ should be small

8
Kolmogorov-Smirnov Test

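K+ = maximum observed deviation above the expected CDF: $K^+ = \sqrt{n}\,\max_x [F_o(x) - F_e(x)]$
K- = maximum observed deviation below the expected CDF: $K^- = \sqrt{n}\,\max_x [F_e(x) - F_o(x)]$
$K^+ < K_{[1-\alpha;\,n]}$ and $K^- < K_{[1-\alpha;\,n]}$ ⇒ Pass at α level of significance
Don't use max/min of $F_e(x_i) - F_o(x_i)$
Use $F_e(x_{i+1}) - F_o(x_i)$ for K-
For U(0, 1): $F_e(x) = x$
$F_o(x_j) = j/n$, where $x_1 \le x_2 \le \cdots \le x_n$ are the observations sorted in increasing order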
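For the U(0,1) case the two statistics reduce to simple maxima over the sorted sample. The sketch below is a minimal Python illustration; the function name `ks_uniform` is hypothetical.

```python
import math

def ks_uniform(sample):
    """Return (K+, K-) for a sample tested against U(0,1), where F_e(x) = x.

    With x_1 <= ... <= x_n sorted, F_o(x_j) = j/n, so
        K+ = sqrt(n) * max_j (j/n - x_j)
        K- = sqrt(n) * max_j (x_j - (j - 1)/n)
    The K- term is F_e(x_{i+1}) - F_o(x_i) with the index shifted by one."""
    xs = sorted(sample)
    n = len(xs)
    k_plus = max(j / n - x for j, x in enumerate(xs, start=1))
    k_minus = max(x - (j - 1) / n for j, x in enumerate(xs, start=1))
    return math.sqrt(n) * k_plus, math.sqrt(n) * k_minus
```

Both returned values are then compared with the tabulated $K_{[1-\alpha;\,n]}$; the test passes only if both are smaller.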

9
Example 27.2

30 random numbers using a seed of $x_0 = 15$
The numbers are: 14, 11, 2, 6, 18, 23, 7,
21, 1, 3, 9, 27, 19, 26, 16, 17, 20,
29, 25, 13, 8, 24, 10, 30, 28, 22, 4,
12, 5, 15.

10
Example 27.2 (contd)

The normalized numbers obtained by dividing the
sequence by 31 are: 0.45161, 0.35484, 0.06452,
0.19355, 0.58065, 0.74194, 0.22581, 0.67742,
0.03226, 0.09677, 0.29032, 0.87097, 0.61290,
0.83871, 0.51613, 0.54839, 0.64516, 0.93548,
0.80645, 0.41935, 0.25806, 0.77419, 0.32258,
0.96774, 0.90323, 0.70968, 0.12903, 0.38710,
0.16129, 0.48387.

11
Example 27.2 (contd)

$K_{[0.9;\,n]}$ value for n = 30 and α = 0.1 is 1.0424
Observed < Table ⇒ Pass
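The generator formula for Example 27.2 did not survive extraction, but the listed numbers are exactly those of the multiplicative LCG $x_n = 3x_{n-1} \bmod 31$ started from $x_0 = 15$ (15 × 3 = 45 ≡ 14, 14 × 3 = 42 ≡ 11, ...). The sketch below assumes that generator, regenerates the sequence, and computes K+ and K- with the U(0,1) formulas. Because the sorted normalized values are exactly j/31, both statistics work out to √30/31 ≈ 0.1767, below the table value 1.0424. The helper names are hypothetical.

```python
import math

def lcg_sequence(a, m, seed, count):
    """Multiplicative LCG x_n = a * x_{n-1} mod m; returns x_1 .. x_count."""
    xs, x = [], seed
    for _ in range(count):
        x = (a * x) % m
        xs.append(x)
    return xs

def ks_uniform(sample):
    """(K+, K-) against U(0,1), using F_o(x_j) = j/n on the sorted sample."""
    xs = sorted(sample)
    n = len(xs)
    k_plus = max(j / n - x for j, x in enumerate(xs, start=1))
    k_minus = max(x - (j - 1) / n for j, x in enumerate(xs, start=1))
    return math.sqrt(n) * k_plus, math.sqrt(n) * k_minus

if __name__ == "__main__":
    seq = lcg_sequence(3, 31, 15, 30)   # multiplier 3 inferred from the listed numbers
    k_plus, k_minus = ks_uniform([x / 31 for x in seq])
    table = 1.0424                      # K[0.9; 30] from the slide
    print(f"K+ = {k_plus:.4f}, K- = {k_minus:.4f}",
          "pass" if max(k_plus, k_minus) < table else "fail")
```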

12
Chi-square vs. K-S Test
13
Serial-Correlation Test

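Nonzero covariance ⇒ Dependence. The converse is not true
$R_k$ = Autocovariance at lag k = Cov$[x_n, x_{n+k}]$:
$R_k = \frac{1}{n-k} \sum_{i=1}^{n-k} \left(x_i - \tfrac{1}{2}\right)\left(x_{i+k} - \tfrac{1}{2}\right)$
For large n, $R_k$ is normally distributed with a mean of zero and a variance of $1/[144(n-k)]$
100(1-α)% confidence interval for the autocovariance is $R_k \mp z_{1-\alpha/2} / (12\sqrt{n-k})$
For k ≥ 1: Check if the CI includes zero
For k = 0: $R_0$ = variance of the sequence
Expected to be 1/12 for IID U(0,1)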
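A minimal Python sketch of the serial-correlation check, assuming U(0,1) input and a 90% interval (z = 1.645). `random.Random` stands in for the generator of Example 27.3, and the function names are hypothetical.

```python
import math
import random

def autocovariance(us, k):
    """R_k = 1/(n-k) * sum_{i=1}^{n-k} (x_i - 1/2)(x_{i+k} - 1/2) for U(0,1) data."""
    n = len(us)
    return sum((us[i] - 0.5) * (us[i + k] - 0.5) for i in range(n - k)) / (n - k)

def autocovariance_ci(us, k, z=1.645):
    """100(1-alpha)% CI: R_k -/+ z_{1-alpha/2} / (12 sqrt(n-k)); z = 1.645 gives 90%."""
    r = autocovariance(us, k)
    half_width = z / (12 * math.sqrt(len(us) - k))
    return r - half_width, r + half_width

if __name__ == "__main__":
    rng = random.Random(1)          # stand-in generator
    us = [rng.random() for _ in range(10_000)]
    print(f"R_0 = {autocovariance(us, 0):.5f} (expect about 1/12 = 0.08333)")
    for k in range(1, 6):
        lo, hi = autocovariance_ci(us, k)
        print(k, f"[{lo:.5f}, {hi:.5f}]", "insignificant" if lo <= 0 <= hi else "significant")
```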

14
Example 27.3 Serial Correlation Test

10,000 random numbers generated with $x_0 = 1$

15
Example 27.3 (contd)

All confidence intervals include zero ⇒ All covariances are statistically insignificant at 90% confidence.

16
Two-Level Tests

If the sample size is too small, the test results may apply locally, but not globally to the complete cycle.
Similarly, a global test may not apply locally
Use two-level tests
⇒ Use a chi-square test on n samples of size k each, and then use a chi-square test on the set of n chi-square statistics so obtained
⇒ Chi-square on chi-square test
Similarly, K-S on K-S
Can also be used to find a "nonrandom" segment of an otherwise random sequence.
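A chi-square on chi-square test can be sketched as follows, under several assumptions worth stating. Each first-level statistic is mapped through its own chi-square CDF, which should make it U(0,1); the second-level test then bins these values into equal cells, which is the same as using equiprobable cells of the chi-square distribution. To stay within the Python standard library, the sketch uses 11 first-level cells (10 degrees of freedom) so that the closed-form CDF for even degrees of freedom applies. All names are hypothetical.

```python
import math
import random

def chi_square_stat(sample, cells):
    """Chi-square statistic D for U(0,1) data in equal-width cells."""
    counts = [0] * cells
    for u in sample:
        counts[min(int(u * cells), cells - 1)] += 1
    expected = len(sample) / cells
    return sum((o - expected) ** 2 / expected for o in counts)

def chi_square_cdf_even_df(x, df):
    """Chi-square CDF, closed form valid only for even df:
    P(X <= x) = 1 - exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!"""
    if df % 2:
        raise ValueError("df must be even for this closed form")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return 1.0 - math.exp(-half) * total

def two_level_chi_square(stream, n_samples, sample_size, cells=11, bins=10):
    """First level: n_samples chi-square statistics, each turned into a U(0,1)
    value via its CDF. Second level: chi-square test of those values over
    `bins` equal cells (bins - 1 degrees of freedom)."""
    df = cells - 1
    p_values = []
    for _ in range(n_samples):
        sample = [next(stream) for _ in range(sample_size)]
        p_values.append(chi_square_cdf_even_df(chi_square_stat(sample, cells), df))
    return chi_square_stat(p_values, bins)

if __name__ == "__main__":
    rng = random.Random(42)             # stand-in generator
    stream = iter(rng.random, None)     # endless stream of U(0,1) numbers
    d2 = two_level_chi_square(stream, n_samples=100, sample_size=1000)
    print(f"second-level D = {d2:.3f}; compare with 14.684 (0.90 quantile, 9 df)")
```

Feeding consecutive segments of one long stream is what lets the second level expose a locally nonrandom stretch that a single global test would average away.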

17
k-Distributivity

k-Dimensional Uniformity
Chi-square ⇒ uniformity in one dimension ⇒ Given two real numbers $a_1$ and $b_1$ between 0 and 1 such that $b_1 > a_1$:
$P(a_1 \le u_n < b_1) = b_1 - a_1$
This is known as the 1-distributivity property of $u_n$.
2-distributivity is a generalization of this property to two dimensions:
$P(a_1 \le u_{n-1} < b_1 \text{ and } a_2 \le u_n < b_2) = (b_1 - a_1)(b_2 - a_2)$
for all choices of $a_1, b_1, a_2, b_2$ in [0, 1], with $b_1 > a_1$ and $b_2 > a_2$

18
k-Distributivity (contd)

k-distributed if
$P(a_1 \le u_n < b_1, \ldots, a_k \le u_{n+k-1} < b_k) = (b_1 - a_1) \cdots (b_k - a_k)$
for all choices of $a_i, b_i$ in [0, 1], with $b_i > a_i$, i = 1, 2, ..., k.
A k-distributed sequence is always (k-1)-distributed. The converse is not true.
Two tests:
Serial test
Spectral test
Visual test for 2 dimensions: Plot successive overlapping pairs of numbers

19
Example 27.4

Tausworthe sequence generated by
The sequence is k-distributed for k up to ⌊d/l⌋, that is, k = 1.
In two dimensions: successive overlapping pairs $(x_n, x_{n+1})$

20
Example 27.5

Consider the polynomial
Better 2-distributivity than Example 27.4

21
Serial Test

Goal: To test for uniformity in two dimensions or higher.
In two dimensions, divide the space between 0 and 1 into K² cells of equal area

22
Serial Test (contd)

Given $x_1, x_2, \ldots, x_n$, use n/2 non-overlapping pairs $(x_1, x_2), (x_3, x_4), \ldots$ and count the points in each of the K² cells
Expected n/(2K²) points in each cell
Use a chi-square test to find the deviation of the actual counts from the expected counts
The number of degrees of freedom in this case is K² - 1
For k dimensions, use k-tuples of non-overlapping values
k-tuples must be non-overlapping
Overlapping ⇒ the numbers of points in the cells are not independent ⇒ chi-square test cannot be used
In a visual check, one can use overlapping or non-overlapping tuples
In the spectral test, overlapping tuples are used
Given n numbers, there are n-1 overlapping pairs and n/2 non-overlapping pairs
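The two-dimensional serial test can be sketched as below, assuming U(0,1) input; the name `serial_test_2d` is hypothetical and `random.Random` is only a stand-in source of numbers.

```python
import random

def serial_test_2d(xs, K):
    """2-D serial test on non-overlapping pairs (x1, x2), (x3, x4), ...

    The unit square is split into K*K equal cells, so each pair lands in a
    given cell with probability 1/K^2 and the expected count is n/(2K^2).
    Returns (D, degrees of freedom = K^2 - 1)."""
    counts = [[0] * K for _ in range(K)]
    pairs = list(zip(xs[0::2], xs[1::2]))   # non-overlapping, so pairs are independent
    for a, b in pairs:
        counts[min(int(a * K), K - 1)][min(int(b * K), K - 1)] += 1
    expected = len(pairs) / (K * K)
    d = sum((c - expected) ** 2 / expected for row in counts for c in row)
    return d, K * K - 1

if __name__ == "__main__":
    rng = random.Random(3)                  # stand-in generator
    d, dof = serial_test_2d([rng.random() for _ in range(20_000)], K=10)
    print(f"D = {d:.2f} with {dof} degrees of freedom")
```

Slicing with `xs[0::2]` and `xs[1::2]` is what makes the pairs non-overlapping; using `zip(xs, xs[1:])` instead would give the n-1 overlapping pairs suitable only for a visual check.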

23
Spectral Test

Goal: To determine how densely the k-tuples $\{x_1, x_2, \ldots, x_k\}$ can fill up the k-dimensional hyperspace
The k-tuples from an LCG fall on a finite number of parallel hyper-planes
Successive pairs would lie on a finite number of lines
In three dimensions, successive triplets lie on a finite number of planes

24
Example 27.6 Spectral Test

All points lie on three straight lines.
Or

Plot of overlapping pairs

25
Example 27.6 (contd)

In three dimensions, the points $(x_n, x_{n-1}, x_{n-2})$ for the above generator would lie on five planes given by
Obtained by adding the following to the equation of the lines
Note that $k + k_1$ will be an integer between 0 and 4.

26
Spectral Test (More)

Marsaglia (1968): Successive k-tuples obtained from an LCG fall on, at most, $(k!\,m)^{1/k}$ parallel hyper-planes, where m is the modulus used in the LCG.
Example: $m = 2^{32}$ ⇒ fewer than 2,953 hyper-planes will contain all 3-tuples, fewer than 566 hyper-planes will contain all 4-tuples, and fewer than 41 hyper-planes will contain all 10-tuples. Thus, this is a weakness of LCGs.
Spectral test: Determine the maximum distance between adjacent hyper-planes.
Larger distance ⇒ worse generator
In some cases, it can be done by complete enumeration
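The Marsaglia bound is a one-line formula; the sketch below evaluates it and recovers the three counts quoted for $m = 2^{32}$. The function name is hypothetical.

```python
import math

def marsaglia_bound(k, m):
    """Upper bound (k! * m)^(1/k) on the number of parallel hyper-planes that
    contain all successive k-tuples from an LCG with modulus m."""
    return (math.factorial(k) * m) ** (1.0 / k)

if __name__ == "__main__":
    for k in (3, 4, 10):
        print(k, math.floor(marsaglia_bound(k, 2**32)))
```

The bound grows roughly like $m^{1/k}$, which is why even a 32-bit modulus leaves only a few dozen hyper-planes in ten dimensions.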

27
Example 27.7

Compare the following two generators
Using a seed of $x_0 = 15$, first generator:
Using the same seed in the second generator:

28
Example 27.7 (contd)

Every number between 1 and 30 occurs once and
only once
⇒ Both sequences will pass the chi-square test for uniformity

29
Example 27.7 (contd)

First Generator

30
Example 27.7 (contd)

Three straight lines of positive slope or ten lines of negative slope
Since the distance between the lines of positive slope is larger, consider only the lines with positive slope
Distance between two parallel lines $y = ax + c_1$ and $y = ax + c_2$ is given by $|c_2 - c_1| / \sqrt{1 + a^2}$
The distance between the above lines is 9.80

31
Example 27.7 (contd)

Second Generator

32
Example 27.7 (contd)

All points fall on seven straight lines of positive slope or six straight lines of negative slope.
Considering lines with negative slopes, the distance between lines is 5.76.
The second generator has a smaller maximum distance and, hence, better 2-distributivity
The set with the larger distance may not always be the set with fewer lines
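The complete-enumeration idea can be sketched for two dimensions and a multiplicative LCG $x_n = a x_{n-1} \bmod m$. Every pair satisfies $h_1 x_{n-1} + h_2 x_n \equiv 0 \pmod m$ whenever $h_1 + a h_2 \equiv 0 \pmod m$, so the pairs lie on parallel lines spaced $m/\sqrt{h_1^2 + h_2^2}$ apart, and the widest spacing comes from the shortest such nonzero $(h_1, h_2)$. The generators of Example 27.7 did not survive extraction; assuming they are $x_n = 3x_{n-1} \bmod 31$ (which matches the sequence listed in Example 27.2) and $x_n = 13x_{n-1} \bmod 31$, the sketch returns 31/√10 ≈ 9.80 and 31/√29 ≈ 5.76, matching the distances on the slides. The multipliers are an inference, and the function name is hypothetical.

```python
import math

def spectral_distance_2d(a, m):
    """Maximum distance between adjacent parallel lines covering all pairs
    (x_{n-1}, x_n) of x_n = a * x_{n-1} mod m, by brute-force search for the
    shortest nonzero (h1, h2) with h1 + a*h2 = 0 (mod m). Small m only."""
    best = None
    for h2 in range(-m, m + 1):
        for h1 in range(-m, m + 1):
            if (h1, h2) != (0, 0) and (h1 + a * h2) % m == 0:
                norm = math.hypot(h1, h2)
                if best is None or norm < best:
                    best = norm
    return m / best

if __name__ == "__main__":
    for a in (3, 13):   # assumed multipliers for the two generators of Example 27.7
        print(a, round(spectral_distance_2d(a, 31), 2))
```

For a = 3 the shortest vector is (3, -1), giving the positive-slope lines $x_n = 3x_{n-1} - 31k$; for a = 13 it is (5, 2), giving the negative-slope lines $5x_{n-1} + 2x_n = 31k$.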

33
Example 27.7 (contd)

Either overlapping or non-overlapping k-tuples can be used
With overlapping k-tuples, we have k times as many points, which makes the graph visually more complete. The number of hyper-planes and the distance between them are the same with either choice.
With the serial test, only non-overlapping k-tuples should be used.
For generators with a large m and for higher dimensions, finding the maximum distance becomes quite complex.
See Knuth (1981)

34
Summary

Chi-square test is a one-dimensional test, designed for discrete distributions and large sample sizes
K-S test is designed for continuous variables
Serial correlation test for independence
Two-level tests find local nonuniformity
k-dimensional uniformity = k-distributivity, tested by spectral test or serial test

35
Random Variate Generation
36
Overview

Inverse transformation
Rejection
Composition
Convolution
Characterization

37
Random-Variate Generation

General Techniques
Only a few techniques may apply to a particular
distribution
Look up the distribution in Chapter 29

38
Inverse Transformation

Used when $F^{-1}$ can be determined either analytically or empirically: generate $u \sim U(0,1)$ and return $x = F^{-1}(u)$

39
Proof
40
Example 28.1

For exponential variates with mean a: $u = F(x) = 1 - e^{-x/a} \Rightarrow x = -a \ln(1 - u)$
If u is U(0,1), 1-u is also U(0,1)
Thus, exponential variables can be generated by $x = -a \ln u$
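A minimal Python sketch of the exponential inverse transformation, assuming a is the mean. Since `random.random()` returns values in [0, 1), the sketch feeds `1 - random()` (which lies in (0, 1] and is still U(0,1)) to the logarithm so that it never sees zero. Function names are hypothetical.

```python
import math
import random

def exponential_inverse(u, a):
    """Inverse transformation for an exponential with mean a: x = -a ln(u)."""
    return -a * math.log(u)

def exponential_variate(rng, a):
    # 1 - rng.random() lies in (0, 1], so log() is always finite
    return exponential_inverse(1.0 - rng.random(), a)

if __name__ == "__main__":
    rng = random.Random(3)
    xs = [exponential_variate(rng, 2.0) for _ in range(100_000)]
    print(f"sample mean = {sum(xs) / len(xs):.4f} (expect about 2.0)")
```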

41
Example 28.2

The packet sizes (trimodal) probabilities
The CDF for this distribution is

42
Example 28.2 (contd)

The inverse function is
Note: The CDF is continuous from the right ⇒ the value on the right of the discontinuity is used ⇒ The inverse function is continuous from the left ⇒ u = 0.7 ⇒ x = 64
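Empirical inversion of a step CDF reduces to a search in the cumulative probabilities. In the sketch below, only the 64-byte size with probability 0.7 is consistent with the slide (u = 0.7 ⇒ x = 64); the other two sizes and probabilities are hypothetical placeholders, since the slide's table did not survive. Using `bisect_left` picks the smallest value whose CDF is at least u, which makes the inverse continuous from the left.

```python
import bisect
import itertools
import random

def make_discrete_inverse(values, probs):
    """Inverse of a step CDF: return the smallest value whose CDF is >= u."""
    cdf = list(itertools.accumulate(probs))
    def inverse(u):
        i = bisect.bisect_left(cdf, u)
        return values[min(i, len(values) - 1)]   # guard against round-off near u = 1
    return inverse

if __name__ == "__main__":
    # Hypothetical trimodal packet sizes; only (64, 0.7) is implied by the slide.
    packet_size = make_discrete_inverse([64, 128, 512], [0.7, 0.1, 0.2])
    rng = random.Random(5)
    print([packet_size(rng.random()) for _ in range(10)])
```

The `min` guard matters because floating-point cumulative sums can end slightly below 1.0.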

43
Applications of the Inverse-Transformation
Technique
44
Rejection

Can be used if a pdf g(x) exists such that c g(x) majorizes the pdf f(x) ⇒ c g(x) ≥ f(x) ∀ x
Steps:
1. Generate x with pdf g(x)
2. Generate y uniform on [0, c g(x)]
3. If y ≤ f(x), then output x and return. Otherwise, repeat from step 1 ⇒ Continue rejecting the random variates x and y until y ≤ f(x)
Efficiency: how closely c g(x) envelops f(x)
Large area between c g(x) and f(x) ⇒ Large percentage of the (x, y) pairs generated in steps 1 and 2 are rejected
If generation of g(x) is complex, this method may not be efficient

45
Example 28.3

Beta(2,4) density function: $f(x) = 20x(1-x)^3$, $0 \le x \le 1$
Bounded inside a rectangle of height 2.11 ⇒
Steps:
Generate x uniform on [0, 1]
Generate y uniform on [0, 2.11]
If $y \le 20x(1-x)^3$, then output x and return. Otherwise, repeat from step 1
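The rejection steps for Beta(2,4) translate directly to Python. Here g(x) is U(0,1) and c = 2.11, just above the maximum $f(1/4) = 2.109375$. This is a minimal sketch; the function names are hypothetical.

```python
import random

def beta24_pdf(x):
    """Beta(2,4) density f(x) = 20 x (1 - x)^3 on [0, 1]."""
    return 20.0 * x * (1.0 - x) ** 3

def beta24_rejection(rng, height=2.11):
    """Rejection with g(x) = 1 on [0, 1] and c = height >= max f(x)."""
    while True:
        x = rng.random()                # step 1: x with pdf g(x)
        y = rng.uniform(0.0, height)    # step 2: y uniform on [0, c g(x)]
        if y <= beta24_pdf(x):          # step 3: accept, otherwise repeat
            return x

if __name__ == "__main__":
    rng = random.Random(11)
    xs = [beta24_rejection(rng) for _ in range(50_000)]
    print(f"sample mean = {sum(xs) / len(xs):.4f} (Beta(2,4) mean is 1/3)")
```

Because the area under f(x) is 1 and the rectangle has area 2.11, on average only about 1/2.11 ≈ 47% of the (x, y) pairs are accepted, which illustrates the efficiency point above.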

46
Composition

Can be used if the CDF F(x) is a weighted sum of n other CDFs:
$F(x) = \sum_{i=1}^{n} p_i F_i(x)$
Here, $p_i \ge 0$, $\sum_{i=1}^{n} p_i = 1$, and the $F_i$'s are distribution functions.
n CDFs are composed together to form the desired CDF. Hence, the name of the technique.
The desired CDF is decomposed into several other CDFs ⇒ Also called decomposition
Can also be used if the pdf f(x) is a weighted sum of n other pdfs:
$f(x) = \sum_{i=1}^{n} p_i f_i(x)$

Steps:
Generate a random integer I such that $P(I = i) = p_i$
This can easily be done using the inverse-transformation method.
Generate x with the Ith pdf $f_I(x)$ and return.

48
Example 28.4

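pdf: $f(x) = \frac{1}{2a} e^{-|x|/a}$, $-\infty < x < \infty$
Composition of two exponential pdfs
Generate $u_1 \sim U(0,1)$ and $u_2 \sim U(0,1)$
If $u_1 < 0.5$, return $x = -a \ln u_2$; otherwise return $x = a \ln u_2$.
Inverse transformation is better for Laplace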
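The composition steps for the Laplace pdf can be sketched in Python as follows: $u_1$ selects one of the two equally weighted exponential components, and $u_2$ generates a variate from it by inverse transformation. As before, `1 - random()` keeps the logarithm finite. The function name is hypothetical.

```python
import math
import random

def laplace_composition(rng, a):
    """Laplace f(x) = exp(-|x|/a) / (2a) as 0.5 * (exponential on x > 0)
    + 0.5 * (mirrored exponential on x < 0)."""
    u1 = rng.random()
    u2 = 1.0 - rng.random()         # in (0, 1], so log(u2) is finite
    if u1 < 0.5:                    # component I chosen with P(I = i) = 0.5
        return -a * math.log(u2)    # positive exponential component
    return a * math.log(u2)         # negative (mirrored) component

if __name__ == "__main__":
    rng = random.Random(2)
    xs = [laplace_composition(rng, 1.5) for _ in range(100_000)]
    print(f"mean = {sum(xs) / len(xs):.4f}, mean |x| = {sum(map(abs, xs)) / len(xs):.4f}")
```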

49
Convolution

Sum of n variables: $x = y_1 + y_2 + \cdots + y_n$
Generate n random variates $y_i$ and sum them
For sums of two variables, the pdf of x = convolution of the pdfs of $y_1$ and $y_2$. Hence the name
Although there is no convolution in the generation
If pdf or CDF = Sum ⇒ Composition
Variable x = Sum ⇒ Convolution

50
Convolution Examples

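Erlang-k = $\sum_{i=1}^{k}$ Exponential$_i$
Binomial(n, p) = $\sum_{i=1}^{n}$ Bernoulli(p) ⇒ Generate n U(0,1) numbers, return the number of RNs less than p
$\chi^2(n) = \sum_{i=1}^{n} N(0,1)^2$
$\gamma(a, b_1) + \gamma(a, b_2) = \gamma(a, b_1 + b_2)$ ⇒ Non-integer value of b = integer + fraction
$\sum_{i=1}^{n}$ Any ⇒ Normal ⇒ $\sum$ U(0,1) ≈ Normal
$\sum_{i=1}^{m}$ Geometric = Pascal
$\sum_{i=1}^{2}$ Uniform = Triangular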
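Three of the convolution examples are sketched below in Python, assuming exponentials with mean a as in Example 28.1; the function names are hypothetical. Each one simply generates the component variates and adds them.

```python
import math
import random

def erlang(rng, k, a):
    """Erlang-k with mean k*a as the sum of k exponentials of mean a."""
    return sum(-a * math.log(1.0 - rng.random()) for _ in range(k))

def binomial(rng, n, p):
    """Binomial(n, p) as a sum of n Bernoulli(p): count U(0,1) values below p."""
    return sum(1 for _ in range(n) if rng.random() < p)

def triangular(rng):
    """Sum of two U(0,1) variates: triangular on [0, 2] with mode 1."""
    return rng.random() + rng.random()

if __name__ == "__main__":
    rng = random.Random(4)
    print(erlang(rng, 3, 2.0), binomial(rng, 10, 0.3), triangular(rng))
```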

51
Characterization

Use special characteristics of distributions ⇒ characterization
Exponential inter-arrival times ⇒ Poisson number of arrivals ⇒ Continuously generate exponential variates until their sum exceeds T and return the number of variates whose sum does not exceed T (one less than the number generated) as the Poisson variate.
The ath smallest number in a sequence of a + b - 1 U(0,1) uniform variates has a β(a, b) distribution.
The ratio of two unit normal variates is a Cauchy(0, 1) variate.
A chi-square variate with even degrees of freedom, χ²(n), is the same as a gamma variate γ(2, n/2).
If $x_1$ and $x_2$ are two gamma variates γ(a, b) and γ(a, c), respectively, the ratio $x_1/(x_1 + x_2)$ is a beta variate β(b, c).
If x is a unit normal variate, $e^{\mu + \sigma x}$ is a lognormal(μ, σ) variate.
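Three of the characterizations can be sketched in Python as below. The Poisson sketch assumes exponential inter-arrival times with mean a, so the count of arrivals in [0, T] is Poisson with mean T/a; the variate that overshoots T is not counted. The Beta sketch assumes integer a, b ≥ 1. Function names are hypothetical; `random.Random.gauss` supplies the unit normal variate.

```python
import math
import random

def poisson_by_arrivals(rng, a, T):
    """Poisson variate with mean T/a: count exponential inter-arrival times
    (mean a) whose running sum stays within T, i.e. the arrivals in [0, T]."""
    count, total = 0, 0.0
    while True:
        total += -a * math.log(1.0 - rng.random())   # next inter-arrival time
        if total > T:
            return count        # the variate that overshot T is not counted
        count += 1

def beta_by_order_statistic(rng, a, b):
    """The a-th smallest of a + b - 1 U(0,1) variates is Beta(a, b)."""
    us = sorted(rng.random() for _ in range(a + b - 1))
    return us[a - 1]

def lognormal(rng, mu, sigma):
    """e^(mu + sigma * x) with x a unit normal variate is lognormal(mu, sigma)."""
    return math.exp(mu + sigma * rng.gauss(0.0, 1.0))

if __name__ == "__main__":
    rng = random.Random(9)
    print(poisson_by_arrivals(rng, 0.5, 2.5), beta_by_order_statistic(rng, 2, 4),
          lognormal(rng, 0.0, 1.0))
```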

52
Summary
Is CDF a sum of other CDFs? Yes ⇒ Use composition
Is pdf a sum of other pdfs? Yes ⇒ Use composition

53
Summary (contd)
Is the variate a sum of other variates? Yes ⇒ Use convolution
Is the variate related to other variates? Yes ⇒ Use characterization
Does a majorizing function exist? Yes ⇒ Use rejection; No ⇒ Use empirical inversion

54
Homework 6