Random Numbers in HEP - PowerPoint PPT Presentation

1 / 25
About This Presentation
Title:

Random Numbers in HEP

Description:

Tests of Random Engines in CLHEP. Heinrich's note about RandEngine. Essential ... All engines other than RandEngine and DRand48 pass the entire DIEHARD Suite. ... – PowerPoint PPT presentation

Number of Views:88
Avg rating:3.0/5.0
Slides: 26
Provided by: markfi2
Category:
Tags: hep | engines | numbers | random

less

Transcript and Presenter's Notes

Title: Random Numbers in HEP


1
Random Numbers in HEP
  • Tests of Random Engines in CLHEP
  • Heinrichs note about RandEngine

2
Essential Concepts
  • Instances
  • Use of a random in one module need not affect the
    sequence of randoms seen by another
  • Engines vs Distributions
  • An Engine is a source of uniform (pseudo-)
    randomness
  • A Distribution uses an engine to fire off
    variates
  • Flat()
  • Gaussian()
  • Validity of Randomness
  • How unbiased, ergodic, unpredictable is the
    engine?

3
How Engines Can be Bad
  • Wrong single-number statistics
  • Wrong mean (c.f. the buggy Linux RandEngine)
  • Wrong sigma (or higher moment)
  • Very easy to detect such a major deficiency
  • Non-uniform distribution with right moments
  • Possible! (but mathematically pathological)
  • And certainly possible to an excellent
    approximation
  • Sequence-order dependent flaws
  • Can be harder to detect
  • Can still lead to incorrect physics!

4
An engine with good moments but flawed
distribution
  • Every engine on a real computer!
  • (But if the granularity is small enough this
    flaw becomes moot)

5
Typical Sequence Flaws
  • Tom Nash, doing a thesis in the 70s, found that
    pairs of numbers from the random engine in use
    lied in stripes.
  • He was far from the first to notice this!
  • But it was affecting the physics

6
Testing an Engine
  • The general test
  • Start with a mathematical observation If Sn is
    an ergodic sequence, uniformly distributed on
    (0,1), then for all subsequences T, some property
    P should hold.
  • Property P should hold, to accuracy e(n) and with
    probability p(n), where e(n)?0 and p(n)? 1 for
    large n.
  • Apply the Does P hold test to a bunch of
    sequences generated by the engine.
  • An Engine that passes every possible test based
    on every possible property to perfect precision
    would be a perfect random engine
  • But mere mortals have to pick some set of tests
  • And have to settle for some finite precision on
    each test

7
The DIEHARD Suite
  • Invented (more like collected) by Marsaglia.
  • 13 tests (for 13 properties of the sequence).
  • Each test uses the sequence to generate a group
    of numbers, with mathematically known
    distribution for random sequences.
  • The distribution obtained is compared via a
    Kolmogorov-Smyrnoff test with the proper
    distribution.
  • If any P-value is extremely close to 1, this is
    proof of non-randomness of the sequence.

8
Interpreting the Results of a test suite
  • The longer the sequences used, the more
    discrimination power.
  • If no test fails, that is not proof that the
    sequence is random
  • It is proof that either the tester was not clever
    enough to find and expose non-random properties,
    or that the sequence used was not long enough.
  • But for practical uses, some suite of tests at
    some level of N is good enough.

9
The DIEHARD Tests
  • Birthday Spacings
  • The likelihood of j birthdays being shared among
    m people in an n-day year is Poisson distributed
    with mean m3/(4n).
  • Not order-sensitive.
  • Overlapping 5-permutation Test
  • Each 5-number subsequence forms one of 120
    possible ordering numbers. The count for each
    should be equal and uncorrelated.
  • Binary Rank Test (32x32 and 6x8)
  • The distribution of ranks of 32x32 bit matrices
    is known theoretically. This detects many linear
    realations involving up to 1000 bits of the
    sequence.
  • Bitstream Test
  • For a random stream of 221 bits, the number of
    missing 20-bit words should be normally
    distributed with mean 142,000 and sigma 428.

10
and
  • Overlapping Pairs and Quadruples, Sparse
    Occupancy
  • Missing 2- and 4-letter words in a long
    sequence of letters from a 1024-letter alphabet
  • Not very order sensitive past N20 or 40
  • DNA Test
  • Missing 10-genome sequences in the 4-letter
    genome alphabet
  • Count-the-1-s test on a stream of or specific
    bytes
  • Uses the number of 1s in 8-bit sequences, to
    form pseudo-letters, which are then grouped into
    5-letter words.
  • Runs Test
  • The theoretical distributions and covariances fro
    up and down runs of N numbers are known.

11
and
  • Overlapping Pairs and Quadruples, Sparse
    Occupancy
  • Missing 2- and 4-letter words in a long
    sequence of letters from a 1024-letter alphabet
  • Not very order sensitive past N20 or 40
  • DNA Test
  • Missing 10-genome sequences in the 4-letter
    genome alphabet
  • Count-the-1-s test on a stream of or specific
    bytes
  • Uses the number of 1s in 8-bit sequences, to
    form pseudo-letters, which are then grouped into
    5-letter words.
  • Runs Test
  • The theoretical distributions and covariances fro
    up and down runs of N numbers are known.

12
and
  • Craps test
  • A long sequence of games of craps is simulated.
    The distribution of wins and losses, and of the
    number of dice throws for each game, is known.
  • Parking Lot Test
  • Make a number of attempts to park a circle in a
    random lot of a square of side 100. Re-try if
    there are collisions. After 120,000 tries, the
    number of successes shuld match some
    (experimentally determined) distribution.
  • Overlapping Sums Test
  • The sums of 100 variables should be virtually
    normally distributed.
  • Sensitive to medium-scale imperfections.
  • Squeeze Test
  • Number of iterations needed to reduce 231 down to
    1, by doing k ceiling(krandom()).

13
The Minimum Distance Test
  • The last two tests are the MD test in 2 and 3
    dimensions
  • Use N pairs (triplets) of randoms as coordinates
    of N points in the unit volume.
  • The closest pair of points has some distance d.
  • d has a known theoretical distribution
  • As applied by Marsaglia, most generators
    semi-fail the 2-D Minimum distance test!

14
The Minimum Distance Test
  • The last two tests are the MD test in 2 and 3
    dimensions
  • Use N pairs (triplets) of randoms as coordinates
    of N points in the unit volume.
  • The closest pair of points has some distance d.
  • d has a known theoretical distribution
  • As applied by Marsaglia, most generators
    semi-fail the 2-D Minimum distance test!
  • P-values of .99 are typical.
  • They did not semi-fail when the FORTRAN versions
    of the tests were applied.
  • The f2c version showed this behavior.
  • ?

15
The Minimum Distance Test
  • Marsaglia speculated this was due to some funny
    business (error) with converting array code using
    f2c.
  • Turns out it was a function of being able to do
    more iterations on later processors.
  • The theoretical distribution had been calculated
    only to lowest order. By the time ZOOM did these
    tests, we were exposing the fact that the actual
    distribution differed.
  • See my Technical memo TM2170, where I do the next
    few orders. This resolves the discrepancy.

16
The CLHEP Engines
  • JamesRandom
  • Fred James adaptation of Marsaglias
    add-with-carry generator. Large state.
  • Ranecu
  • Multiplicative Congruential generator using
    formula constants of L'Ecuyer. Large state.
  • Ranecu
  • Multiplicative Congruential generator using
    formula constants of L'Ecuyer. Loarge state.
  • Ranlux and Ranlux64
  • Lueschers folding generator. For luxury
    levels of 2 it is excellent for luxury 4 is
    provably ergodic. Moderate state.
  • TripleRand
  • Used in the Canopy Lattice gauge computations,
    this combines a Hurd primitive polynomial
    generator with two other unrelated generators to
    form a family of trustworthy yet distinct
    sequences. Moderate state.

17
The CLHEP Engines
  • JamesRandom
  • Fred James adaptation of Marsaglias
    add-with-carry generator. Large state.
  • Ranecu
  • Multiplicative Congruential generator using
    formula constants of L'Ecuyer. Large state.
  • Ranecu
  • Multiplicative Congruential generator using
    formula constants of L'Ecuyer. Loarge state.
  • Ranlux and Ranlux64
  • Lueschers folding generator. For luxury
    levels of 2 it is excellent for luxury 4 is
    provably ergodic. Moderate state.
  • TripleRand
  • Used in the Canopy Lattice gauge computations,
    this combines a Hurd primitive polynomial
    generator with two other unrelated generators to
    form a family of trustworthy yet distinct
    sequences. Moderate state.

18
More CLHEP Engines
  • Hurd288 and Hurd 160
  • Based on primitive polynomials over Z(2).
    Provably ergodic (but I dont know how well to
    trust the proof). Used as basis for TripleRand
    and DualRand. Small state.
  • RanshiEngine
  • Modeled on spinning balls. Very large state, but
    fast and excellent.
  • MTwistEngine
  • Based on number-theory considerations involving
    large Mersenne primes. Very Large state. Many
    randomness properties proven, though the math is
    difficult. Fast and excellent (can be made even
    faster)
  • All the above engines have excellent randomness
    properties.

19
More CLHEP Engines
  • Hurd288 and Hurd 160
  • Based on primitive polynomials over Z(2).
    Provably ergodic (but I dont know how well to
    trust the proof). Used as basis for TripleRand
    and DualRand. Small state.
  • RanshiEngine
  • Modeled on spinning balls. Very large state, but
    fast and excellent.
  • MTwistEngine
  • Based on number-theory considerations involving
    large Mersenne primes. Very Large state. Many
    randomness properties proven, though the math is
    difficult. Fast and excellent (can be made even
    faster)
  • All the above engines have excellent randomness
    properties.

20
And some Low-quality CLHEP Engines
  • DRand48Engine
  • Based on a common Unix system random number
    generator. Very small state and fast, but fails
    several randomness tests.
  • RandEngine
  • Directly calls the system function rand(). Not
    portable state not savable. Its claim to fame
    is that the usual implementation of rand() is
    dissected in Knuth and shown to be a terrible
    generator!!
  • RandEngine, when coded properly, fails half of
    the DIEHARD tests
  • NonRandomEngine
  • Lets people test program behavior on a set forced
    sequence of randoms. Requires user to specify a
    sequence. Useful for testing. If you use this to
    try to generate true random numbers, you get what
    you deserve.
  • Why was RandEngine left in CLHEP?
  • According to a web page circa 1997, because
    Geant4 insisted on it.
  • Fortunately, they then proceeded not to use it.

21
How the CLHEP Engines do on DIEHARD
  • DRand48Engine failed the three alphabet
    sequences tests
  • indicating bad correllations at the range of
    5-number subsequences
  • Ranlux at luxury level 0 (which was never
    recommended) failed the Birthday and Craps tests.
  • At luxury level above 1 it is fine

22
How the CLHEP Engines do on DIEHARD
  • Several engines semi-failed the Minimum Distance
    test
  • The more numbers we could afford to generate, the
    more likely the failure
  • We now understand the flaw
  • A summer student in 1998 coded the test in C
    from scratch, with my correction terms in the
    distribution, and all the good engines passed.
  • All engines other than RandEngine and DRand48
    pass the entire DIEHARD Suite.

23
RandEngine
  • On systems until Linux, rand() used a
    standardized 15-bit linear congruence
  • Lame, see Knuth
  • But kind of OK.
  • At some point, an attempt was made in CLHEP to
    combine 2 calls to rand() by a shift and OR to
    get 3 bits of randomness.
  • Still just as lame

24
RandEngine
  • This coding was blown, in that when Linux came
    along with a 32-bit result for rand(), the OR
    caused the average value returned to be 5/8!!!!
  • Results from programs run on Linux which used
    RandEngine must be discarded.
  • Fortunately, it appears nobody trusted RandEngine
    anyway
  • CDF uses Ranecu
  • Geant4 does not have RandENgine anywhere in there
    source code.

25
Recommendations
  • Ranecu is perfectly fine, no need to change.
  • Ranlux on level 2 is about as fast but provably
    reliable.
  • MTwist (if you have only a few instances so can
    afford a very large state), TripleRand, and
    JamesEngine are about twice as fast and very
    probably just as good or better.
  • Ranshi is faster still but has a huge state, and
    Im not 100 convinced it cant have flaws.
Write a Comment
User Comments (0)
About PowerShow.com