Title: 010.141 Engineering Mathematics II Lecture 3 Distributions
1 010.141 Engineering Mathematics II, Lecture 3: Distributions and Random Variables
- Bob McKay
- School of Computer Science and Engineering
- College of Engineering
- Seoul National University
- Partly based on Sheldon Ross, A First Course in Probability
2 Outline
- Random Variables
- Cumulative Distribution Functions
- Discrete Distributions
- Bernoulli Distribution
- Binomial Distribution
- Poisson Distribution
- Other Discrete Distributions
- Geometric
- Negative Binomial
- Hypergeometric
- Zeta (Zipf)
3 A Note
- The notion of random variables is a source of major confusion for students
- Mainly because, strictly, a random variable isn't a variable
- That is, the logical notion of a variable can't be directly generalised to a random variable
- In fact, a random variable is a function
- This leads many textbooks into extremely confusing explanations
- To try to avoid this, we will give more careful definitions than in the textbook
4 σ-Algebra
- A σ-algebra over a set $\Omega$ is a collection $\Sigma$ of subsets of $\Omega$ satisfying
- If $E$ is in $\Sigma$, then so is $\Omega \setminus E$
- The union of countably many sets in $\Sigma$ is also in $\Sigma$
- $\emptyset \in \Sigma$
- (which implies that $\Omega \in \Sigma$)
- That is, $\Sigma$ is a non-empty subset of the power set $\mathcal{P}(\Omega)$, closed under complementation and countable union, and containing $\emptyset$
- Note that $\mathcal{P}(\Omega)$ is itself a σ-algebra, but it is not the only one (or even the most important one)
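5 Measure Space
- A measure space $(\Omega, \Sigma, \mu)$ is a σ-algebra $\Sigma$ over a set $\Omega$, together with a measure $\mu : \Sigma \to [0, \infty]$ satisfying
- $\mu(\emptyset) = 0$
- $\mu\left(\bigcup_{E \in \mathcal{E}} E\right) = \sum_{E \in \mathcal{E}} \mu(E)$ for any countable set $\mathcal{E}$ of pairwise disjoint sets from $\Sigma$
- That is, a measure space is a σ-algebra with a measure, which maps $\emptyset$ to zero, and is countably additive
- Note that $\mathcal{P}(\Omega)$ itself may not be measurable: a measure with the properties we want may not extend to every subset of $\Omega$
- This typically happens when $\Omega$ is uncountable, which is why we have to go to all this trouble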
6 Probability Space
- A probability space $(\Omega, \Sigma, P)$ is a measure space with
- $P(\Omega) = 1$
- Note that for probabilities, we usually write $P$ rather than $\mu$
- When $\Omega$ is a finite set, with $\Sigma$ being the power set $\mathcal{P}(\Omega)$, this gives us our original probability axioms
7 Random Variables
- Given a probability space $(\Omega, \Sigma, P)$, a random variable is a measurable function $X : \Omega \to S$ for some set $S$
- Usually, $S$ is the real numbers
- We usually write
- $X > 0$ for $\{\omega \in \Omega : X(\omega) > 0\}$
- $P(X > 0)$ for $P(\{X > 0\})$
- $P(X = x)$ for $P(\{X = x\})$
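8 Random Variable Example (Ross)
- Three balls are to be randomly selected (without replacement) from an urn containing 20 balls numbered 1 to 20
- If we bet that at least one of the drawn balls has a number at least 17, what is the probability of winning?
- Let $X$ be the largest number drawn; we win when $X \ge 17$
- $P\{X = 20\} = \binom{19}{2} / \binom{20}{3} \approx 0.150$
- $P\{X = 19\} = \binom{18}{2} / \binom{20}{3} \approx 0.134$
- $P\{X = 18\} = \binom{17}{2} / \binom{20}{3} \approx 0.119$
- $P\{X = 17\} = \binom{16}{2} / \binom{20}{3} \approx 0.105$
- Hence $P\{X \ge 17\} \approx 0.509$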
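The four probabilities above can be checked numerically. The following is a minimal sketch, assuming (as in Ross's treatment) that X denotes the largest number among the three drawn balls; the function name `prob_max_equals` is chosen here for illustration only.

```python
from math import comb

def prob_max_equals(i, n_balls=20, n_drawn=3):
    """P{X = i}, where X is the largest number among the drawn balls."""
    # The largest ball is i when ball i is drawn and the remaining
    # n_drawn - 1 balls all come from the i - 1 smaller numbers.
    return comb(i - 1, n_drawn - 1) / comb(n_balls, n_drawn)

# Winning the bet means the largest drawn number is at least 17.
p_win = sum(prob_max_equals(i) for i in range(17, 21))
print(round(p_win, 3))  # 0.509
```

The same value follows from the complement: the bet is lost only when all three balls come from the 16 balls numbered below 17, so the winning probability is 1 - C(16,3)/C(20,3).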
9 Cumulative Distribution Functions
- Given a real-valued random variable $X$ over a probability space $(\Omega, \Sigma, P)$, the Cumulative Distribution Function (cdf)
- $F_X : \mathbb{R} \to \mathbb{R}$
- can be defined as
- $F_X(b) = P\{X \le b\}$
- We usually just write $F$ instead of $F_X$
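10 Properties of the cdf
- $a < b \Rightarrow F_X(a) \le F_X(b)$
- $\lim_{b \to \infty} F_X(b) = 1$
- $\lim_{b \to -\infty} F_X(b) = 0$
- $F_X$ is right-continuous
- If
- $\lim_{m \to \infty} b_m = b$
- $b_{m+1} \le b_m$ for all $m$
- Then
- $\lim_{m \to \infty} F(b_m) = F(b)$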
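These properties can be seen on a small discrete example. The sketch below assumes a hypothetical random variable (one roll of a fair six-sided die, not taken from the slides) and uses exact fractions so that the comparisons are not blurred by rounding.

```python
from fractions import Fraction

# Hypothetical example: X is the outcome of one roll of a fair six-sided die.
pmf = {k: Fraction(1, 6) for k in range(1, 7)}

def cdf(b):
    """F(b) = P{X <= b}: add up the mass at every value not exceeding b."""
    return sum((p for x, p in pmf.items() if x <= b), Fraction(0))

print(cdf(0), cdf(3), cdf(6))          # 0 1/2 1
print(cdf(2.999), cdf(3), cdf(3.001))  # 1/3 1/2 1/2
```

The second line shows the right-continuity: approaching 3 from above gives F(3), while approaching from below gives a smaller value, because the jump at 3 is included in F(3).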
11 Discrete Random Variables
- A discrete random variable is one which can take at most countably many values
- For a discrete random variable, we can define the probability mass function
- $p_X(a) = P\{X = a\}$
- The probability mass function $p_X(a)$ can be positive for only countably many values of $a$
- Hence we can enumerate them $x_1, x_2, \ldots$
- We also have
- $\sum_{i=1}^{\infty} p_X(x_i) = 1$
- We usually just write $p(a)$ rather than $p_X(a)$
12 Bernoulli Random Variables
- A Bernoulli random variable is one which can take only one of two values (say 0 or 1), so we have
- $p(1) = P\{X = 1\} = p$
- $p(0) = P\{X = 0\} = 1 - p$
13 Binomial Random Variables
- Suppose we conduct $n$ independent trials of a Bernoulli random variable with success probability $p$
- The probability mass function of $i$ successes is then given by the $(n, p)$ Binomial Distribution
- $p(i) = \binom{n}{i} p^i (1 - p)^{n-i}$
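The binomial probability mass function translates directly into code. This is a minimal sketch; the parameter values n = 10 and p = 0.3 are arbitrary choices for illustration.

```python
from math import comb

def binomial_pmf(i, n, p):
    """p(i) = C(n, i) p^i (1 - p)^(n - i): probability of i successes in n trials."""
    return comb(n, i) * p**i * (1 - p) ** (n - i)

n, p = 10, 0.3  # hypothetical parameters
pmf = [binomial_pmf(i, n, p) for i in range(n + 1)]
print(pmf)
print(sum(pmf))  # close to 1, up to floating-point rounding
```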
14 Binomial Example (Ross)
- Suppose an airplane engine will fail, in flight, with probability $1 - p$, independently between engines. Suppose that the flight will crash only if more than 50% of the engines fail. When should you prefer 4 engines to 2?
- For four engines
- $p_4(\text{OK}) = \binom{4}{2} p^2 (1 - p)^2 + \binom{4}{3} p^3 (1 - p) + \binom{4}{4} p^4 = 6p^2(1 - p)^2 + 4p^3(1 - p) + p^4$
- For two engines
- $p_2(\text{OK}) = \binom{2}{1} p (1 - p) + \binom{2}{2} p^2 = 2p(1 - p) + p^2$
- $p_4(\text{OK}) \ge p_2(\text{OK})$ then reduces to
- $p \ge 2/3$
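The threshold can be checked numerically. In this sketch, `prob_ok` is a hypothetical helper that sums the binomial probabilities of having at least half of the engines still working, with p the probability that a single engine does not fail.

```python
from math import comb

def prob_ok(n_engines, p):
    """Probability of a safe flight: at least half of the engines keep working."""
    # The flight crashes only if more than half fail, so we need at
    # least ceil(n_engines / 2) working engines.
    need = (n_engines + 1) // 2
    return sum(
        comb(n_engines, k) * p**k * (1 - p) ** (n_engines - k)
        for k in range(need, n_engines + 1)
    )

for p in (0.5, 2 / 3, 0.9):
    print(p, prob_ok(4, p), prob_ok(2, p))
```

Below p = 2/3 the two-engine plane is safer, at p = 2/3 the two agree (both equal 8/9), and above it the four-engine plane is safer.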
15 Binomial Random Variable Properties
- If $X$ is an $(n, p)$ binomial random variable, then as $k$ ranges from 0 to $n$, $p(k)$ first increases monotonically, then decreases monotonically
- The largest value is when $k$ is the largest integer less than or equal to $(n + 1)p$
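A brute-force search over k gives a quick check of this property. The sketch below uses a few arbitrary (n, p) pairs chosen so that (n + 1)p is not an integer; the helper names are illustrative.

```python
from math import comb, floor

def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def mode_by_search(n, p):
    """Value of k in 0..n with the largest p(k), found by direct comparison."""
    return max(range(n + 1), key=lambda k: binomial_pmf(k, n, p))

# Hypothetical parameter choices; the two printed modes should agree.
for n, p in [(10, 0.3), (20, 0.5), (7, 0.9)]:
    print(n, p, mode_by_search(n, p), floor((n + 1) * p))
```

When (n + 1)p happens to be an integer, p(k) takes the same largest value at k = (n + 1)p and at k = (n + 1)p - 1, so a search may report either one.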
16 Poisson Random Variables
- The Binomial Distribution isn't always easy to work with
- Either mathematically or practically
- We may know that a distribution is binomial, but not know (or even care about) $n$
- Fortunately, for large $n$, and for values of $p$ small enough that $np$ is moderate, it may be approximated
- Set $\lambda = np$
- Then $p(i) \approx e^{-\lambda} \lambda^i / i!$
- The distribution $p(i) = e^{-\lambda} \lambda^i / i!$ is known as the Poisson
- It is a probability distribution, because
- $\sum_{i=0}^{\infty} p(i) = e^{-\lambda} \sum_{i=0}^{\infty} \lambda^i / i! = e^{-\lambda} e^{\lambda} = 1$
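The quality of the approximation can be inspected by tabulating both distributions side by side. This is a sketch under assumed values n = 1000 and p = 0.003 (so that np = 3); these numbers are not from the slides.

```python
from math import comb, exp, factorial

def binomial_pmf(i, n, p):
    return comb(n, i) * p**i * (1 - p) ** (n - i)

def poisson_pmf(i, lam):
    """p(i) = e^(-lam) lam^i / i!"""
    return exp(-lam) * lam**i / factorial(i)

n, p = 1000, 0.003  # hypothetical: large n, small p, moderate np
lam = n * p
for i in range(6):
    print(i, binomial_pmf(i, n, p), poisson_pmf(i, lam))

# The Poisson probabilities sum to 1 (the tail beyond 50 is negligible here).
print(sum(poisson_pmf(i, lam) for i in range(51)))
```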
17 Poisson Example (Ross)
- Suppose we are counting the number of α-particles given off per second from 1 gram of radioactive material.
- We know that, on average, there are 3.2 α-particles per second
- What is the probability, in any given second, that there are at most 2 α-particles?
- The gram of material contains a huge number of particles, of the order of Avogadro's number, $6 \times 10^{23}$
- The probabilities of disintegration of the particles are independent
- Hence the number of α-particles emitted follows a binomial distribution, which may be accurately approximated by a Poisson distribution with $\lambda = 3.2$
- $P\{X \le 2\} = e^{-3.2} + 3.2\, e^{-3.2} + \frac{(3.2)^2}{2} e^{-3.2} \approx 0.380$
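The three Poisson terms can be evaluated directly. This is a minimal sketch of the arithmetic, with the rate of 3.2 particles per second taken from the example.

```python
from math import exp, factorial

lam = 3.2  # average number of alpha-particles per second, from the example

# P{X <= 2} = p(0) + p(1) + p(2) for a Poisson distribution with rate lam
p_at_most_2 = sum(exp(-lam) * lam**i / factorial(i) for i in range(3))
print(round(p_at_most_2, 4))  # 0.3799
```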
18 Geometric Random Variables
- Suppose we perform independent trials, each with success probability $p$, until one success is achieved
- If we let $X$ be the number of trials required, it has the form
- $P\{X = n\} = (1 - p)^{n-1} p$
- This is known as the geometric distribution
19 Negative Binomial Random Variables
- In the same way as we generalised the Bernoulli distribution to the binomial, we can generalise the geometric distribution by performing trials until $r$ successes are achieved
- This is known as the negative binomial distribution
- It has the form
- $P\{X = n\} = \binom{n-1}{r-1} p^r (1 - p)^{n-r}$
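A short sketch can confirm that the negative binomial with r = 1 is the geometric distribution, and that the probabilities sum to one. It uses the standard form C(n-1, r-1) p^r (1-p)^(n-r), in which the first n - 1 trials contain exactly r - 1 successes and trial n is a success; the value p = 0.4 is an arbitrary choice.

```python
from math import comb

def negative_binomial_pmf(n, r, p):
    """P{X = n}: the r-th success occurs on trial n (requires n >= r)."""
    # First n - 1 trials hold exactly r - 1 successes; trial n is a success.
    return comb(n - 1, r - 1) * p**r * (1 - p) ** (n - r)

def geometric_pmf(n, p):
    """P{X = n}: the first success occurs on trial n."""
    return (1 - p) ** (n - 1) * p

p = 0.4  # hypothetical success probability
same = all(
    abs(negative_binomial_pmf(n, 1, p) - geometric_pmf(n, p)) < 1e-15
    for n in range(1, 20)
)
print(same)
print(sum(negative_binomial_pmf(n, 3, p) for n in range(3, 400)))  # close to 1
```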
20 Hypergeometric Random Variables
- We can modify the binomial distribution in a different way, by assuming that the trials are not independent
- Instead, suppose we have to choose a sample of size $n$ by random sampling (without replacement) from an urn originally containing $Np$ white balls and $N(1 - p)$ black
- If we let $X$ be the total number of white balls selected, this generates the hypergeometric distribution
- It has the form
- $P\{X = i\} = \binom{Np}{i} \binom{N - Np}{n - i} / \binom{N}{n}$
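The hypergeometric probabilities can be computed from binomial coefficients. In this sketch the urn is described by explicit counts of white and black balls (playing the roles of Np and N(1 - p)); the specific numbers are hypothetical.

```python
from math import comb

def hypergeometric_pmf(i, n, n_white, n_black):
    """P{X = i}: i white balls in a sample of n drawn without replacement."""
    # Choose i of the white balls and n - i of the black balls,
    # out of all ways to choose n balls from the urn.
    return comb(n_white, i) * comb(n_black, n - i) / comb(n_white + n_black, n)

# Hypothetical urn: N = 20 balls, 6 white (p = 0.3) and 14 black; sample of n = 5.
n, n_white, n_black = 5, 6, 14
pmf = [hypergeometric_pmf(i, n, n_white, n_black) for i in range(n + 1)]
print(pmf)
print(sum(pmf))  # close to 1, up to floating-point rounding
```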
21 Hypergeometric Application
- One important application of the hypergeometric distribution is in catch-recatch statistics
- Suppose we want to estimate the number of fish of a particular species living in a lake
- We catch, say, $r = 50$ fish, then tag and release them
- We assume fish don't learn
- i.e. all fish are equally likely to be caught again
- Now we catch another, say, $n = 40$ fish
- Assuming there are $N$ fish in the lake, the number $i$ which are tagged should follow the hypergeometric distribution
- But that's the wrong way round
- We know $i$ (say 4), we want to know $N$
- For any given $N$, we can compute $P_i(N)$, the probability of observing $i$ tagged fish
- We assume that the appropriate $N$ is that which maximises $P_i(N)$
- In this case, we find $N = 500$
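The maximisation can be carried out by direct search. This sketch assumes P_i(N) = C(r, i) C(N - r, n - i) / C(N, n), the hypergeometric probability of seeing i tagged fish among n caught when r of the N fish are tagged, and searches an arbitrary range of candidate N. Exact fractions are used so that ties are detected reliably.

```python
from fractions import Fraction
from math import comb

def prob_tagged(i, N, r=50, n=40):
    """P_i(N): probability that i of the n recaught fish are tagged."""
    return Fraction(comb(r, i) * comb(N - r, n - i), comb(N, n))

i = 4
# N must be at least r + (n - i) = 86; the upper limit is an arbitrary search bound.
candidates = range(86, 2001)
best = max(prob_tagged(i, N) for N in candidates)
maximisers = [N for N in candidates if prob_tagged(i, N) == best]
print(maximisers)  # [499, 500]
```

The ratio P_i(N) / P_i(N - 1) is at least 1 exactly when N <= rn/i, so the likelihood rises up to rn/i = 500 and falls afterwards. Because rn/i is an integer here, N = 499 and N = 500 are equally likely; the estimate quoted is the larger of the two.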
22 The Zipf Distribution
- For many problems, it's reasonable to assume that the distribution falls off as a power of a parameter $k$
- That is, $P\{X = k\} = C / k^{\alpha+1}$ for $k = 1, 2, \ldots$
- For the distribution to be a probability distribution, the probabilities must sum to one
- This implies that
- $C = 1 / \sum_{k=1}^{\infty} (1/k)^{\alpha+1}$
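The normalising constant has no simple closed form in general, but it can be approximated by truncating the series. The sketch below assumes the exponent has the form α + 1 with α > 0, and the truncation point of 100,000 terms is an arbitrary choice. For α = 1 the series is the well-known sum of 1/k², which equals π²/6, giving a value to compare against.

```python
from math import pi

def zipf_constant(alpha, terms=100_000):
    """Approximate C = 1 / sum_{k>=1} (1/k)^(alpha+1) by truncating the series."""
    total = sum(k ** -(alpha + 1) for k in range(1, terms + 1))
    return 1 / total

# For alpha = 1 the exact constant is 6 / pi^2.
print(zipf_constant(1.0), 6 / pi**2)
```

Smaller values of α make the series converge more slowly, so more terms (or an analytic tail estimate) would be needed for the same accuracy.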
23 Zipf Distribution Examples
- Examples of known occurrence of Zipf distributions include
- Popularity of websites
- Linkage of networks
- Wealth of individuals
- Popularity of names
- Frequency of words in documents
- Financial market volatility
- Phase transitions in physical systems
- Events in self-organised critical systems
24 Summary
- σ-algebras and Measures
- Random Variables
- Discrete Distributions
- Bernoulli Distribution
- Binomial Distribution
- Poisson Distribution
- Other Discrete Distributions
- Geometric
- Negative Binomial
- Hypergeometric
- Zeta (Zipf)