Title: Entropy and programs using random numbers
1Entropy and programs using random numbers
- Introduction to entropy
- Entropy and data compression
- Predictability of random number generation
- Entropy and system security
- An easy but predictable method
- Shuffling an array
- Linux random bit generation devices
- A programmed set of randomness tests
- A program to generate random passwords
2Introduction to entropy
- Randomness is the property attributed to
behaviour, activity or a sequence of numbers
which lacks any appearance of order. A system is
deterministic if subsequent behaviour can be
derived from a knowledge of its starting state,
and the physical laws and programmed rules
governing changes of state.
- Programmed computers are inherently
deterministic, because a program is a sequence of
instructions with an intended (i.e.
prespecified) result from given input. Test
planning generally assumes a program which is
deterministic.
3Some quotations
- "God doesn't play dice with the universe"
- Albert Einstein.
- "Random numbers should not be generated with a
method chosen at random." - Donald Knuth.
- "The total entropy of any isolated thermodynamic
system tends to increase over time, approaching a
maximum value." - The second law of thermodynamics.
4Thermodynamic and Shannon entropy
- Thermodynamic entropy is a measure of the amount
of energy in a physical system that cannot be
used to do work.
- Entropy in information systems is related to
entropy in thermodynamic systems, but these are
different concepts. The entropy rate of an
information source (Shannon or information
entropy) is the average number of bits needed to
encode a character from this source.
- If the information is very predictable, it is
also very compressible, so fewer bits are needed.
One way to estimate information entropy is to find
out how far a sequence of symbols can be
compressed into a smaller file.
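- As a worked form of the definition above, the Shannon entropy of a source can be written as follows. The symbols p_i (the probability of the i-th symbol) and n (the number of distinct symbols) are notation introduced here for illustration, not taken from the slides.

```latex
H = -\sum_{i=1}^{n} p_i \log_2 p_i \quad \text{bits per symbol}
```

- For a source of 8-bit bytes in which all 256 values are equally likely, p_i = 1/256 and H = 8 bits per byte, which is the maximum. A source that always emits the same byte has H = 0.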
5Entropy and data compression
- rich@saturn compress$ ls -l
- total 36
- -rwxr-xr-x 1 rich rich 9220 Apr 5 17:17 avltree
- -rw-r--r-- 1 rich rich 6905 Apr 5 17:15 avltree.c
- rich@saturn compress$ gzip
- rich@saturn compress$ ls -l
- total 16
- -rw-r--r-- 1 rich rich 2123 Apr 5 17:15 avltree.c.gz
- -rwxr-xr-x 1 rich rich 4001 Apr 5 17:17 avltree.gz
- In the above example:
- 1. A machine code executable was reduced from
9220 to 4001 bytes, to 43.39% of its original size.
- 2. A C source file used to compile the
executable was reduced from 6905 to 2123 bytes,
to 30.74% of its original size.
6Entropy and system security
- One of the most interesting uses of entropy
within computing systems is for security
purposes, because it is important that passwords
and encryption keys should be unpredictable.
- Some systems used to generate random numbers are
themselves inherently deterministic. Before using
a random number generator to generate keys or
passwords, a security evaluation should be
carried out:
- a. Can an attacker predict anything about future
states of the generator based upon knowledge of
previous system states?
- b. Does an attacker have any ability to influence
these states?
7Does true randomness exist ?
- We don't know. Some systems used to generate
random numbers are inherently deterministic, yet
are good enough for security purposes because
minor changes in the input, which can't be
controlled beyond a known precision, result in
big enough changes in the output. Suppose enough
were known about:
- the exact starting position and velocity of a
six-sided die,
- its aerodynamics and the resistance of the air,
- the weight distribution of the die and other
properties,
- the inclination etc. of the surface on which the
die lands.
- Then it would be theoretically possible to
compute which number would be on top when the die
comes to rest.
8How much entropy is needed ?
- The minimum amount will depend upon the kind of
attacks on the system to be secured. Having much
more than is needed can improve system
lifetime but in some cases reduces usability.
- At one time IBM consultants recommended that
master system keys and passwords be generated by
the system manager using dice. The reason for
this is that using a simple system meant that the
manager could be as certain as possible about the
means by which these keys were created and the
conditions surrounding this event.
- However, the increased performance of computers
now requires more entropy within cryptographic
keys than can easily be generated using dice.
Systems that lock out attackers after a few wrong
tries or which use multiple security factors are
thought to need less entropy.
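- To put a number on the dice limitation, assume a fair six-sided die, so that each roll contributes log2(6) bits. The key sizes of 128 and 256 bits are illustrative assumptions, not figures from the slides.

```latex
\log_2 6 \approx 2.585 \text{ bits per roll}, \qquad
\left\lceil \frac{128}{\log_2 6} \right\rceil = 50 \text{ rolls}, \qquad
\left\lceil \frac{256}{\log_2 6} \right\rceil = 100 \text{ rolls}
```

- Under these assumptions, a single key takes 50 to 100 carefully recorded rolls, which is feasible by hand but slow and error-prone.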
9An easy but predictable method 1
- The POSIX and ANSI C library header <stdlib.h>
defines 2 functions and a constant for this
purpose.
- Function void srand(unsigned int seed) is used
to seed the pseudorandom number generator.
- Function int rand(void) generates a
pseudorandom sequence of numbers in the range
0 to RAND_MAX. RAND_MAX is a system constant,
typically 2^31 - 1 on Linux. The C standard only
requires it to be at least 32767.
10An easy but predictable method 2
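- The listing for this slide is not reproduced in these notes. As a minimal sketch of why the method is predictable (the helper name fill_sequence is hypothetical and not from the original), the function below seeds the library generator and records the values that follow. Two calls with the same seed always yield the same values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Seed the library generator, then record the next n values of rand().
   The same seed always reproduces the same values, which is what makes
   this method predictable to anyone who can guess the seed. */
void fill_sequence(unsigned int seed, int *out, size_t n)
{
    assert(out != NULL || n == 0);
    srand(seed);
    for (size_t i = 0; i < n; i++)
        out[i] = rand();   /* each value lies in the range 0 to RAND_MAX */
}
```

- An attacker who can narrow down the seed, for example because it was taken from the clock, can regenerate every value the program produced.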
11An easy but predictable method 3
- The POSIX 1003.1-2003 standard gives
example implementations of rand() and srand().
From this simple implementation, which keeps its
state in a variable named next, it is clear that
the sequence generated will repeat whenever a
value for next is repeated. As a 32 bit integer
is used for next, the maximum possible sequence
length will be 2^32.
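- The example in the standard is commonly quoted in a form close to the sketch below. This is an adaptation, not the verbatim text: the state is declared as uint32_t here so that it is 32 bits wide on every platform, matching the description above, and the MYRAND_MAX name and the range check are additions for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define MYRAND_MAX 32767          /* this generator returns 15-bit values */

static uint32_t next = 1;         /* generator state, 32 bits wide here */

/* Set the state directly from the seed. */
void mysrand(unsigned int seed)
{
    next = (uint32_t)seed;
}

/* Linear congruential step: the multiplication wraps modulo 2^32,
   and bits 16 to 30 of the new state become the result. */
int myrand(void)
{
    next = next * 1103515245u + 12345u;
    int r = (int)((next / 65536u) % 32768u);
    assert(r >= 0 && r <= MYRAND_MAX);
    return r;
}
```

- Because each result depends only on next, two runs seeded with the same value produce the same sequence, and the sequence must cycle once any state recurs.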
12Shuffling an array 1
- Unbalanced binary tree insertion and simple
quick sort implementations are known to degrade
to O(N^2) operations if the input is already
sorted. But data can be shuffled with O(N)
operations. So the risk of severely reduced
performance can be traded for a small performance
loss by shuffling the data before sorting it. The
program for this section uses rand() and srand()
to swap each element with a pseudo-randomly
selected element. The time in seconds since the
epoch (1 January 1970 for the C library time()
function) is used to seed the pseudo-random
generator.
13Shuffling an array 2
14Shuffling an array 3
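- The original listing is not reproduced in these notes. The sketch below is one way to do the shuffle, not the author's program: it uses the Fisher-Yates method, in which each element is swapped with a pseudo-randomly chosen element at or before it. The function name shuffle_ints is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Fisher-Yates shuffle: walk from the last element down to the second,
   swapping each one with a pseudo-randomly chosen element at or before
   it. This performs n - 1 swaps, so the cost is O(N). */
void shuffle_ints(int *a, size_t n)
{
    assert(a != NULL || n == 0);
    if (n < 2)
        return;
    for (size_t i = n - 1; i > 0; i--) {
        /* rand() % (i + 1) has a slight modulo bias, which is
           acceptable for defeating sorted input but not for security. */
        size_t j = (size_t)rand() % (i + 1);
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
}
```

- A caller would normally seed the generator once with srand((unsigned int)time(NULL)), which needs <time.h>, and then call shuffle_ints() before sorting.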
15Linux random devices 1
- To improve upon the obvious limitations of
predictable pseudorandom number generators, the
Linux operating system kernel provides 2 device
files which supply random bytes. One
of these, /dev/urandom, is fast and less
cautious; the other, /dev/random, is slow and
cautious.
- The faster device reseeds a pseudorandom
sequence from the same entropy pool that feeds
the slower device. The following
description is from Linux documentation.
16Linux random devices 2
- RANDOM(4) Linux Programmer's Manual
- NAME
- random, urandom - kernel random number source
devices
- DESCRIPTION
- The character special files /dev/random and
/dev/urandom (present since Linux 1.3.30) provide
an interface to the kernel's random number
generator. File /dev/random has major device
number 1 and minor device number 8. File
/dev/urandom has major device number 1 and minor
device number 9.
- The random number generator gathers
environmental noise from device drivers and other
sources into an entropy pool. The generator also
keeps an estimate of the number of bits of noise
in the entropy pool. From this entropy pool
random numbers are created.
17Linux random devices 3
- When read, the /dev/random device will only
return random bytes within the estimated number
of bits of noise in the entropy pool. /dev/random
should be suitable for uses that need very high
quality randomness such as one-time pad or key
generation. When the entropy pool is empty, reads
from /dev/random will block until additional
environmental noise is gathered.
- When read, /dev/urandom device will return as
many bytes as are requested. As a result, if
there is not sufficient entropy in the entropy
pool, the returned values are theoretically
vulnerable to a cryptographic attack on the
algorithms used by the driver. Knowledge of how
to do this is not available in the current
non-classified literature, but it is
theoretically possible that such an attack may
exist. If this is a concern in your application,
use /dev/random instead.
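- Both devices are read like ordinary files. The following is a minimal sketch, assuming a Linux system where these device files exist. The function name read_random_bytes is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Read exactly len bytes from a random device file into buf.
   Returns 0 on success, -1 if the device cannot be opened or read. */
int read_random_bytes(const char *device, unsigned char *buf, size_t len)
{
    assert(device != NULL && buf != NULL);
    FILE *f = fopen(device, "rb");
    if (f == NULL)
        return -1;
    /* Turn off stdio buffering so that no more bytes are drawn from
       the device than the caller asked for. */
    setvbuf(f, NULL, _IONBF, 0);
    size_t got = fread(buf, 1, len, f);
    fclose(f);
    return got == len ? 0 : -1;
}
```

- Passing "/dev/urandom" returns at once. Passing "/dev/random" may block, as the manual page describes, until the kernel has gathered enough noise.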
18Linux random devices 4
- rich@saturn random$ cat /dev/random > random
- (CTRL-C was pressed after counting to 30 seconds)
- rich@saturn random$ cat /dev/urandom > urandom
- (CTRL-C was pressed after counting to 30 seconds)
- rich@saturn pwgen$ ls -l
- total 593736
- -rw-r--r-- 1 rich rich 520 Apr 5 18:19 random
- -rw-r--r-- 1 rich rich 94748672 Apr 5 18:19 urandom
- 520 bytes were generated by the cautious entropy
device in about the same time as 94 MBytes were
generated by the fast device.
19A randomness test program 1
- Programs can be downloaded from the Internet
which will test a source of random numbers using
various methods to determine their statistical
properties. The following site provides
information about some of these methods and a
program called ent, which provides a set of these
tests:
- http://www.fourmilab.ch/random/
20A randomness test program 2
- This ent program is also available as an Ubuntu
package, so it was installed using the command:
- sudo aptitude install ent
- About 37 MB of random data was then generated using
- cat /dev/urandom > randata
- and pressing <ctrl> and <c> after about 10
seconds.
21A randomness test program 3
- rich@saturn/devel/ent$ ent randata
- Entropy = 7.999995 bits per byte.
- Optimum compression would reduce the size
of this 37867520 byte file by 0 percent.
- Chi square distribution for 37867520 samples is
259.05, and randomly would exceed this value
50.00 percent of the times.
- Arithmetic mean value of data bytes is 127.5217
(127.5 = random).
- Monte Carlo value for Pi is 3.141374304 (error
0.01 percent).
- Serial correlation coefficient is -0.000110
(totally uncorrelated = 0.0).
22A randomness test program 4
- rich@saturn/devel/ent$ ent /usr/share/dict/words
- Entropy = 4.422962 bits per byte.
- Optimum compression would reduce the size
of this 931467 byte file by 44 percent.
- Chi square distribution for 931467 samples is
13250024.68, and randomly would exceed this value
0.01 percent of the times.
- Arithmetic mean value of data bytes is 95.1313
(127.5 = random).
- Monte Carlo value for Pi is 3.999098194 (error
27.30 percent).
- Serial correlation coefficient is -0.136842
(totally uncorrelated = 0.0).
- Clearly, the spelling dictionary wasn't as random
as the output of /dev/urandom. Source code for a
program which computes the Monte Carlo value for
Pi is available in the older HTML notes.
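- The Monte Carlo test can be sketched as below. This is not the program from the older notes, and it is not ent's source code. It assumes one plausible scheme: each group of six bytes supplies a 24-bit X and a 24-bit Y coordinate, and the fraction of points falling inside a quarter circle estimates Pi/4. The function name monte_carlo_pi is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Estimate Pi from a buffer of supposedly random bytes.
   Each group of 6 bytes gives one point (x, y), with both coordinates
   in the range 0 to 2^24 - 1. A point is a hit when it lies inside
   the quarter circle of radius 2^24 - 1 centred on the origin.
   Returns 4 * hits / points, or -1.0 if there are fewer than 6 bytes. */
double monte_carlo_pi(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    const uint64_t r = 0xFFFFFFu;          /* largest 24-bit coordinate */
    uint64_t points = 0, hits = 0;

    for (size_t i = 0; i + 6 <= len; i += 6) {
        uint64_t x = ((uint64_t)buf[i] << 16)
                   | ((uint64_t)buf[i + 1] << 8) | buf[i + 2];
        uint64_t y = ((uint64_t)buf[i + 3] << 16)
                   | ((uint64_t)buf[i + 4] << 8) | buf[i + 5];
        points++;
        if (x * x + y * y <= r * r)        /* integer test, no sqrt needed */
            hits++;
    }
    if (points == 0)
        return -1.0;
    return 4.0 * (double)hits / (double)points;
}
```

- With this scheme, input made up only of ASCII characters puts every point in the lower left of the square, well inside the quarter circle, so the estimate comes out at 4. That is consistent with the value close to 4 that ent reported for the dictionary file.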
23Program generated passwords 1
- Passwords chosen by humans often have too little
entropy. When an individual is required to choose
a password, one is often selected which an
attacker would find very easy to guess. The
advantage of getting the user to choose a
password is that there is a better chance that
the individual won't have to write it down. If
the risk mitigated by use of a password is from a
remote system attacker rather than a local one,
it is better for a strong randomly-generated
password to be written down than for a weak
password to be used.
24Program generated passwords 2
25Program generated passwords 3
26Program generated passwords 4
27Program generated passwords 5
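- The original program listing is not reproduced in these notes. The following is a minimal sketch of one way to generate such passwords, not the author's program. The function name generate_password, the 62-character alphabet and the choice of /dev/urandom as the byte source are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Assumed alphabet: upper case, lower case and digits (62 symbols). */
static const char ALPHABET[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

/* Fill out with len characters chosen using bytes read from src,
   followed by a terminating '\0' (out must hold len + 1 chars).
   Returns 0 on success, -1 if src runs out of bytes. */
int generate_password(FILE *src, char *out, size_t len)
{
    assert(src != NULL && out != NULL);
    const size_t n = sizeof ALPHABET - 1;          /* 62 symbols */
    const int limit = 256 - (256 % (int)n);        /* 248 */
    size_t i = 0;

    while (i < len) {
        int byte = fgetc(src);
        if (byte == EOF)
            return -1;
        /* Rejection sampling: discard bytes 248 to 255 so that every
           symbol is equally likely (no modulo bias). */
        if (byte >= limit)
            continue;
        out[i++] = ALPHABET[(size_t)byte % n];
    }
    out[len] = '\0';
    return 0;
}
```

- A caller would open the device with fopen("/dev/urandom", "rb") and call generate_password() once per password. With 62 equally likely symbols each character carries about 5.95 bits, so an 8 character password carries roughly 47.6 bits of entropy.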
28Program output
- How many passwords ?
- 8
- Gdh6sWH3
- cpUXETpc
- zVpj6an8
- fjQhfM9S
- VagGz3rF
- tJhAgJGm
- 6XVqReQ2
- WxQA5mxx