Title: Probability: Studying Randomness
1. Chapter 4
- Probability: Studying Randomness
2. Randomness and Probability
- Random Process: A process where the outcome in a particular trial is not known in advance, although a distribution of outcomes may be known for a long series of repetitions
- Probability: The proportion of times a particular outcome will occur in a long series of repetitions of a random process
- Independence: When the outcome of one trial does not affect the probabilities of outcomes of subsequent trials
3. Probability Models
- Probability Model:
  - Listing of possible outcomes
  - Probability corresponding to each outcome
- Sample Space (S): Set of all possible outcomes of a random process
- Event: Outcome or set of outcomes of a random process (subset of S)
- Venn Diagram: Graphic description of a sample space and events
4. Rules of Probability
- The probability of an event A, denoted P(A), must lie between 0 and 1 (0 ≤ P(A) ≤ 1)
- For the sample space S, P(S) = 1
- Disjoint events have no common outcomes. For 2 disjoint events A and B, P(A or B) = P(A) + P(B)
- The complement of an event A is the event that A does not occur, denoted A^c. P(A) + P(A^c) = 1
- The probability of any event A is the sum of the probabilities of the individual outcomes that make up the event when the sample space is finite
5. Assigning Probabilities to Events
- Assign probabilities to each individual outcome and add up the probabilities of all outcomes comprising the event
- When each outcome is equally likely, count the number of outcomes corresponding to the event and divide by the total number of outcomes
- Multiplication Rule: A and B are independent events if knowledge that one occurred does not affect the probability that the other has occurred. If A and B are independent, then P(A and B) = P(A)P(B)
- The multiplication rule extends to any finite number of independent events
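As a rough illustration of the counting approach and the multiplication rule, the sketch below uses two fair six-sided dice. The dice, the events A and B, and the helper name `prob` are assumptions made for this example; they do not come from the slides.

```python
from fractions import Fraction
from itertools import product

# Sample space for rolling two fair dice: 36 equally likely outcomes.
sample_space = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(event) = (# outcomes in the event) / (total # outcomes)."""
    favorable = [o for o in sample_space if event(o)]
    return Fraction(len(favorable), len(sample_space))

# Hypothetical events: A = first die shows 6, B = second die is even.
p_a = prob(lambda o: o[0] == 6)
p_b = prob(lambda o: o[1] % 2 == 0)
p_a_and_b = prob(lambda o: o[0] == 6 and o[1] % 2 == 0)

# Multiplication rule for independent events: P(A and B) = P(A)P(B).
is_independent = (p_a_and_b == p_a * p_b)
print(p_a, p_b, p_a_and_b, is_independent)
```

Exact fractions are used here so that the equality check is not affected by floating-point rounding.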
6. Example - Casualties at Gettysburg
- Results from Battle of Gettysburg (tables of counts and proportions not reproduced here)
- Killed, Wounded, and Captured/Missing are all considered casualties. What is the probability that a randomly selected Northern soldier was a casualty? A Southern soldier? Obtain the distribution across armies.
7. Random Variables
- Random Variable (RV): Variable that takes on the value of a numeric outcome of a random process
- Discrete RV: Can take on a finite (or countably infinite) set of possible outcomes
- Probability Distribution: List of values a random variable can take on and their corresponding probabilities
  - Individual probabilities must lie between 0 and 1
  - Probabilities sum to 1
- Notation:
  - Random variable: X
  - Values X can take on: x1, x2, ..., xk
  - Probabilities: P(X = x1) = p1, ..., P(X = xk) = pk
8. Example: Wars Begun by Year (1482-1939)
- Distribution of number of wars started by year
- X = # of wars started in a randomly selected year
- Levels: x1 = 0, x2 = 1, x3 = 2, x4 = 3, x5 = 4
- Probability Distribution (table not reproduced here)
9. Masters Golf Tournament 1st Round Scores
10. Continuous Random Variables
- Variable can take on any value along a continuous range of numbers (interval)
- Probability distribution is described by a smooth density curve
- Probabilities of ranges of values for X correspond to areas under the density curve
  - Curve must lie on or above the horizontal axis
  - Total area under the curve is 1
- Special case: Normal distributions
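The area interpretation can be checked numerically. The following is a minimal sketch, assuming a standard normal density and a simple midpoint-rule integrator; the function names and the integration limits of -10 to 10 are choices made for this example only.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density curve of a normal distribution with mean mu and SD sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def area_under_curve(f, lo, hi, n=100_000):
    """Midpoint-rule approximation to the area under f between lo and hi."""
    width = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * width) for i in range(n)) * width

# Total area (the tails beyond +/-10 SDs are negligible) should be about 1.
total_area = area_under_curve(normal_pdf, -10.0, 10.0)

# P(-1 < X < 1) is the area under the curve between -1 and 1 (about 0.68).
p_within_one_sd = area_under_curve(normal_pdf, -1.0, 1.0)
print(round(total_area, 4), round(p_within_one_sd, 4))
```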
11. Means and Variances of Random Variables
- Mean: Long-run average value a random variable will take on (also the balance point of the probability distribution)
- Expected Value is another term for the mean; however, we do not really expect that a realization of X will necessarily be close to its mean. Notation: E(X)
- Mean of a discrete random variable: μ = E(X) = x1p1 + x2p2 + ... + xkpk
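A small sketch of the mean of a discrete random variable follows. The distribution used (one roll of a fair die) is a hypothetical stand-in chosen for this example, not data from the slides. It also shows why "expected value" can mislead: the mean need not be a value X can actually take.

```python
from fractions import Fraction

# Hypothetical distribution (not from the slides): one roll of a fair die.
values = [1, 2, 3, 4, 5, 6]
probs = [Fraction(1, 6)] * 6

# A valid distribution has probabilities in [0, 1] that sum to 1.
assert all(0 <= p <= 1 for p in probs) and sum(probs) == 1

# Mean of a discrete random variable: sum of (value * probability).
mean = sum(x * p for x, p in zip(values, probs))
print(mean)            # 7/2
print(mean in values)  # False: no single roll ever equals the mean
```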
12. Examples - Wars and Masters Golf
- Wars: μ = 0.67
- Masters Golf (Round 1): μ = 73.54
13. Statistical Estimation/Law of Large Numbers
- In practice we won't know μ but will want to estimate it
- We can select a sample of individuals and observe the sample mean
- By selecting a large enough sample size we can be very confident that our sample mean will be arbitrarily close to the true parameter value
- Margin of error gives an upper bound (with a high level of confidence) on our sampling error. It decreases as the sample size increases
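The law of large numbers can be seen in a quick simulation. This sketch assumes a fair six-sided die (true mean 3.5) as the random process; the die, the seed, and the sample sizes are illustrative choices, not part of the slides.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of n simulated rolls of a fair six-sided die (true mean 3.5)."""
    return sum(random.randint(1, 6) for _ in range(n)) / n

# Sample means for increasing sample sizes.
results = {n: sample_mean(n) for n in (10, 100, 10_000, 100_000)}
for n, xbar in results.items():
    print(n, round(xbar, 3), "error:", round(abs(xbar - 3.5), 3))
```

The errors for small samples can be sizable, while the largest samples should land very close to 3.5.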
14. Rules for Means
- Linear Transformations: a + bX (where a and b are constants): E(a + bX) = μ_(a+bX) = a + bμ_X
- Sums of random variables: X + Y (where X and Y are random variables): E(X + Y) = μ_(X+Y) = μ_X + μ_Y
- Linear Functions of Random Variables:
  - E(a1X1 + ... + anXn) = a1μ1 + ... + anμn, where E(Xi) = μi
15. Example: Masters Golf Tournament
- Mean by Round (Note ordering):
  - μ1 = 73.54, μ2 = 73.07, μ3 = 73.76, μ4 = 73.91
- Mean Score per hole (18) for round 1:
  - E((1/18)X1) = (1/18)μ1 = (1/18)73.54 = 4.09
- Mean Score versus par (72) for round 1:
  - E(X1 - 72) = μ_(X1-72) = 73.54 - 72 = 1.54 (1.54 over par)
- Mean Difference (Round 1 - Round 4):
  - E(X1 - X4) = μ1 - μ4 = 73.54 - 73.91 = -0.37
- Mean Total Score:
  - E(X1 + X2 + X3 + X4) = μ1 + μ2 + μ3 + μ4
  - = 73.54 + 73.07 + 73.76 + 73.91 = 294.28 (6.28 over par)
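The arithmetic on this slide can be reproduced with a few lines of code. The round means and par of 72 come from the slide; the variable names are made up for this sketch.

```python
# Round means from the slide (Rounds 1-4) and par per round.
mu = {1: 73.54, 2: 73.07, 3: 73.76, 4: 73.91}
PAR = 72

per_hole_r1 = mu[1] / 18       # E((1/18)X1) = (1/18)mu1
vs_par_r1 = mu[1] - PAR        # E(X1 - 72) = mu1 - 72
diff_r1_r4 = mu[1] - mu[4]     # E(X1 - X4) = mu1 - mu4
total = sum(mu.values())       # E(X1 + X2 + X3 + X4)
total_vs_par = total - 4 * PAR # total relative to par over 4 rounds

print(round(per_hole_r1, 2), round(vs_par_r1, 2),
      round(diff_r1_r4, 2), round(total, 2), round(total_vs_par, 2))
```

Note that none of these calculations needs the rounds to be independent; the rules for means hold regardless.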
16. Variance of a Random Variable
- Variance: Measure of the spread of the probability distribution. Average squared deviation from the mean
- Standard Deviation: (Positive) square root of the variance
Rules for Variances (X, Y RVs; a, b constants; ρ = correlation between X and Y)
- V(a + bX) = b²σ_X²
- V(aX + bY) = a²σ_X² + b²σ_Y² + 2abρσ_Xσ_Y
17. Variance of a Random Variable
- Special Cases:
  - X and Y are independent (outcome of one does not alter the distribution of the other): ρ = 0, last term drops out
  - a = b = 1 and ρ = 0: V(X + Y) = σ_X² + σ_Y²
  - a = 1, b = -1 and ρ = 0: V(X - Y) = σ_X² + σ_Y²
  - a = b = 1 and ρ ≠ 0: V(X + Y) = σ_X² + σ_Y² + 2ρσ_Xσ_Y
  - a = 1, b = -1 and ρ ≠ 0: V(X - Y) = σ_X² + σ_Y² - 2ρσ_Xσ_Y
18. Wars and Masters (Round 1) Golf Scores
- Wars: σ² = .7362, σ = .8580
- Masters (Round 1): σ² = 9.47, σ = 3.08
19. Masters Scores (Rounds 1 and 4)
- μ1 = 73.54, μ4 = 73.91, σ1² = 9.48, σ4² = 11.95, ρ = 0.24
- Variance of Round 1 scores vs Par: V(X1 - 72) = σ1² = 9.48
- Variance of Sum and Difference of Round 1 and Round 4 Scores
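The variances of the sum and the difference can be worked out from the values on this slide. The sketch below assumes the standard rule V(aX + bY) = a²V(X) + b²V(Y) + 2abρ·SD(X)·SD(Y); the helper name `var_linear_combo` is made up for this example.

```python
import math

# Values from the slide: Round 1 and Round 4 variances and their correlation.
var1, var4, rho = 9.48, 11.95, 0.24

def var_linear_combo(a, b, var_x, var_y, rho):
    """V(aX + bY) = a^2 V(X) + b^2 V(Y) + 2ab*rho*SD(X)*SD(Y)."""
    cov_term = 2 * a * b * rho * math.sqrt(var_x) * math.sqrt(var_y)
    return a**2 * var_x + b**2 * var_y + cov_term

var_sum = var_linear_combo(1, 1, var1, var4, rho)    # V(X1 + X4)
var_diff = var_linear_combo(1, -1, var1, var4, rho)  # V(X1 - X4)
print(round(var_sum, 2), round(var_diff, 2))
```

Because the correlation is positive, the sum is more variable than it would be under independence and the difference is less variable, with values of roughly 26.54 and 16.32.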
20. General Rules of Probability
- Union of a set of events: Event that any (at least one) of the events occurs
- Disjoint events: Events that share no common sample points. If A, B, and C are pairwise disjoint, the probability of their union is P(A) + P(B) + P(C)
- Intersection of two (or more) events: The event that both (all) events occur
- Addition Rule: P(A or B) = P(A) + P(B) - P(A and B)
- Conditional Probability: The probability B occurs given A has occurred: P(B|A)
- Multiplication Rule (generalized to conditional prob):
  - P(A and B) = P(A)P(B|A) = P(B)P(A|B)
21. Conditional Probability
- Generally interested in the case where one event precedes another in time (but this is not necessary)
- When P(A) > 0 (otherwise trivial): P(B|A) = P(A and B) / P(A)
- Contingency Table: Table that cross-classifies individuals or probabilities across 2 or more event classifications
- Tree Diagram: Graphical description of cross-classification of 2 or more events
22. John Snow London Cholera Death Study
- 2 Water Companies (Let D be the event of death)
  - Southwark & Vauxhall (S): 264913 customers, 3702 deaths
  - Lambeth (L): 171363 customers, 407 deaths
  - Overall: 436276 customers, 4109 deaths
- P(D|S) = 3702/264913 = .0140, P(D|L) = 407/171363 = .0024, P(D) = 4109/436276 = .0094
- Note that the probability of death is almost 6 times higher for SV customers than for Lambeth customers (this was important in showing how cholera spread)
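The conditional probabilities and the "almost 6 times" comparison follow directly from the counts on this slide. A minimal sketch, with dictionary keys and variable names chosen for illustration:

```python
# Counts from the slide: (customers, deaths) for each water company.
counts = {"SV": (264913, 3702), "Lambeth": (171363, 407)}

# P(D | company) = deaths / customers within that company.
p_death_given = {c: d / n for c, (n, d) in counts.items()}

total_customers = sum(n for n, _ in counts.values())
total_deaths = sum(d for _, d in counts.values())
p_death = total_deaths / total_customers  # overall P(D)

# How many times higher the death probability is for SV customers.
risk_ratio = p_death_given["SV"] / p_death_given["Lambeth"]
print({c: round(p, 4) for c, p in p_death_given.items()},
      round(p_death, 4), round(risk_ratio, 2))
```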
23. John Snow London Cholera Death Study
Contingency Table with joint probabilities (in body of table) and marginal probabilities (on edge of table)
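24. John Snow London Cholera Death Study
Tree Diagram obtaining joint probabilities by multiplication rule
Water User
- Company SV (.6072)
  - Death D (.0140): joint probability .0085
  - Death D^c (.9860): joint probability .5987
- Company L (.3928)
  - Death D (.0024): joint probability .0009
  - Death D^c (.9976): joint probability .3919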
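The joint probabilities at the ends of the tree branches can be recomputed with the multiplication rule, P(company and D) = P(company)P(D|company). The sketch below takes the rounded branch probabilities from the tree as inputs; the dictionary layout is an assumption of this example.

```python
# Branch probabilities read off the tree diagram.
p_company = {"SV": 0.6072, "L": 0.3928}
p_death_given = {"SV": 0.0140, "L": 0.0024}

joint = {}
for company, p_c in p_company.items():
    p_d = p_death_given[company]
    joint[(company, "D")] = p_c * p_d         # P(company and D)
    joint[(company, "Dc")] = p_c * (1 - p_d)  # P(company and D^c)

# Marginal P(D): add the joint probabilities over the two companies.
p_death = joint[("SV", "D")] + joint[("L", "D")]

for key, p in joint.items():
    print(key, round(p, 4))
print("P(D) =", round(p_death, 4))
```

The four joint probabilities sum to 1, which is a useful check on any tree diagram or contingency table.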
25. Example: Florida Lotto
- You select 6 distinct numbers from 1 to 53 (no replacement)
- State randomly draws 6 numbers from 1 to 53
- Probability you match all 6 numbers:
  - First state draw: P(match 1st) = 6/53
  - Given you match 1st, you have 5 left and state has 52 left: P(match 2nd given matched 1st) = 5/52
  - Process continues: P(match 3rd given matched 1st and 2nd) = 4/51
  - P(match 4th given matched 1st-3rd) = 3/50
  - P(match 5th given matched 1st-4th) = 2/49
  - P(match 6th given matched 1st-5th) = 1/48
  - By the multiplication rule, P(match all 6) = (6/53)(5/52)(4/51)(3/50)(2/49)(1/48) = 1/22957480
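The chain of conditional probabilities can be multiplied out exactly. This sketch uses exact fractions and, as a cross-check, compares the result with the number of ways to choose 6 numbers from 53; the variable names are illustrative.

```python
from fractions import Fraction
import math

p_match_all = Fraction(1)
for k in range(6):
    # On state draw k+1 you still hold 6-k unmatched numbers,
    # and 53-k numbers remain to be drawn from.
    p_match_all *= Fraction(6 - k, 53 - k)

print(p_match_all)       # 1/22957480
print(math.comb(53, 6))  # 22957480 equally likely sets of 6 numbers
```

The two approaches agree: matching all 6 numbers is one outcome out of all equally likely 6-number combinations.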
26. Bayes's Rule - Updating Probabilities
- Let A1,...,Ak be a set of events that partition a sample space (mutually exclusive and exhaustive) such that:
  - each set has known P(Ai) > 0 (each event can occur)
  - for any 2 sets Ai and Aj, P(Ai and Aj) = 0 (events are disjoint)
  - P(A1) + ... + P(Ak) = 1 (each outcome belongs to one of the events)
- If C is an event such that:
  - 0 < P(C) < 1 (C can occur, but will not necessarily occur)
  - We know the probability C will occur given each event Ai: P(C|Ai)
- Then we can compute the probability of Ai given C occurred: P(Ai|C) = P(C|Ai)P(Ai) / [P(C|A1)P(A1) + ... + P(C|Ak)P(Ak)]
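A short sketch of this updating step follows. It reuses the cholera counts from the John Snow slides as the partition (SV versus Lambeth customers) with death as the event C; applying Bayes's rule to that example, and the function name `bayes_posterior`, are choices made here for illustration rather than something shown in the slides.

```python
def bayes_posterior(priors, likelihoods):
    """Return P(Ai | C) for each i, given P(Ai) and P(C | Ai)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(Ai and C)
    p_c = sum(joint)                                      # P(C)
    return [j / p_c for j in joint]

# Cholera study: A1 = SV customer, A2 = Lambeth customer, C = death.
priors = [264913 / 436276, 171363 / 436276]   # P(A1), P(A2)
likelihoods = [3702 / 264913, 407 / 171363]   # P(C|A1), P(C|A2)

posterior = bayes_posterior(priors, likelihoods)
print([round(p, 4) for p in posterior])
```

Although SV customers make up about 61% of water users, they account for about 90% of the deaths (3702 of 4109), which is the updated probability P(A1|C).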
27. Northern Army at Gettysburg
- Regiments: partition of soldiers (A1,...,A9). Casualty event: C
- P(Ai) = (size of regiment) / (total soldiers) = (Column 3)/95369
- P(C|Ai) = (# casualties) / (regiment size) = (Col 4)/(Col 3)
- P(C|Ai)P(Ai) = P(Ai and C) = (Col 5)(Col 6)
- P(C) = sum(Col 7)
- P(Ai|C) = P(Ai and C) / P(C) = (Col 7)/.2416
28. Independent Events
- Two events A and B are independent if P(B|A) = P(B) and P(A|B) = P(A); otherwise they are dependent (not independent)
- Cholera Example:
  - P(D) = .0094, P(D|S) = .0140, P(D|L) = .0024
  - Not independent (which firm would you prefer?)
- Union Army Example:
  - P(C) = .2416, P(C|A1) = .6046, P(C|A5) = .0156
  - Not independent: almost 40 times higher risk for A1