Title: Correlations Revisited
1Correlations Revisited
2Probability
- I think you're begging the question, said
Haydock, and I can see looming ahead one of those
terrible exercises in probability where six men
have white hats and six men have black hats and
you have to work it out by mathematics how likely
it is that the hats will get mixed up and in what
proportion. If you start thinking about things
like that, you would go round the bend. Let me
assure you of that! - Agatha Christie, The Mirror Crack'd
3- Misunderstanding of probability may be the
greatest of all impediments to scientific
literacy. - Stephen Jay Gould
4The Personal Probability Interpretation
Personal probability of an event: the degree
to which a given individual believes the event
will happen. Sometimes called subjective probability,
because the degree of belief may be
different for each individual.
- Restrictions on personal probabilities
  - Must fall between 0 and 1 (or between 0% and 100%).
  - Must be coherent.
5Probability Definitions and Relationships
Sample space: all the possible outcomes that can occur.
Simple event: one outcome in the sample space; a possible outcome of a random circumstance.
Event: a collection of one or more simple events in the sample space; often written as A, B, C, and so on.
6Assigning Probabilities
- A probability is a value between 0 and 1 and is
written either as a fraction or as a proportion.
- A probability is simply a number between 0 and 1
that is assigned to a possible outcome of a
random circumstance.
- For the complete set of distinct possible
outcomes of a random circumstance, the total of
the assigned probabilities must equal 1.
7Classical Approach
- A mathematical index of the relative frequency or
likelihood of the occurrence of a specific event.
- Based on games of chance.
- The specific conditions of the game are known.
8Determining the probability of an Outcome
(Classical)
A Simple Lottery
Choose a three-digit number between 000 and 999. The player wins if his or her
three-digit number is chosen. Suppose the 1000
possible three-digit numbers (000, 001, 002, ..., 999) are
equally likely.
In the long run, a player should win
about 1 out of 1000 times. Probability of winning = 0.001.
This does not mean a player will win
exactly once in every thousand plays.
9Example Probability of Simple Events
Random circumstance: a three-digit winning lottery number is selected.
Sample space: {000, 001, 002, 003, ..., 997, 998, 999}. There are 1000 simple events.
Probabilities for simple events: the probability that any specific three-digit
number is a winner is 1/1000, assuming all
three-digit numbers are equally likely.
Event A = last digit is a 9 = {009, 019, ..., 999}.
Since one out of ten numbers is in the set, P(A) = 1/10.
Event B = three digits are all the same
= {000, 111, 222, 333, 444, 555, 666, 777, 888, 999}.
Since event B contains 10 simple events,
P(B) = 10/1000 = 1/100.
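The two probabilities above can be checked by listing the sample space and counting. This is a minimal sketch, assuming the 1000 numbers are equally likely as the example states; the helper name `probability` is made up for illustration.

```python
from fractions import Fraction

# Sample space: the strings "000" .. "999", assumed equally likely.
sample_space = [f"{n:03d}" for n in range(1000)]

def probability(event):
    """Classical probability: favourable simple events / all simple events."""
    favourable = [s for s in sample_space if event(s)]
    return Fraction(len(favourable), len(sample_space))

p_a = probability(lambda s: s[-1] == "9")      # Event A: last digit is a 9
p_b = probability(lambda s: len(set(s)) == 1)  # Event B: all three digits equal

print(p_a, p_b)  # prints: 1/10 1/100
```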
10Estimating Probabilities from Observed
Categorical Data - Empirical Approach
Assuming data are representative, the probability
of a particular outcome is estimated to be the
relative frequency (proportion) with which that
outcome was observed.
11Methods of sampling
- Simple random selection
  - Every member of the population has an equal chance of being selected.
- Systematic
  - Every Xth person.
- Stratified
  - Random sampling by subgroup.
- Why?
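The three sampling methods can be sketched in a few lines. This is only an illustration: the population of 100 ID numbers, the function names, and the even/odd subgroups are all hypothetical, and the systematic sample uses a random starting point, which is a common convention rather than something stated on the slide.

```python
import random

def simple_random_sample(population, k, rng):
    # Every member has the same chance of being selected.
    return rng.sample(population, k)

def systematic_sample(population, step, rng):
    # Random starting point, then every `step`-th member.
    start = rng.randrange(step)
    return population[start::step]

def stratified_sample(population, stratum_of, k_per_stratum, rng):
    # Split into subgroups (strata), then sample randomly within each one.
    strata = {}
    for member in population:
        strata.setdefault(stratum_of(member), []).append(member)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, k_per_stratum))
    return sample

rng = random.Random(0)          # fixed seed so the run is repeatable
population = list(range(100))   # hypothetical population of 100 ID numbers

srs = simple_random_sample(population, 10, rng)
sys_sample = systematic_sample(population, 10, rng)
strat = stratified_sample(population, lambda m: m % 2, 5, rng)  # strata: even/odd IDs

print(srs)
print(sys_sample)
print(strat)
```

Stratifying guarantees that every subgroup appears in the sample, which simple random selection does not.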
12Determining the probability of an Outcome
Empirical Approach
Observe the relative frequency of random circumstances.
The Probability of Lost Luggage
1 in 176 passengers on U.S. airline carriers will
temporarily lose their luggage. This number is
based on data collected over the long run. So the
probability that a randomly selected passenger on
a U.S. carrier will temporarily lose luggage is
1/176, or about 0.006.
13Proportions and Percentages as Probabilities
- The proportion of passengers who lose their
luggage is 1/176, or about 0.006 (6 out of 1000).
- About 0.6% of passengers lose their luggage.
- The probability that a randomly selected
passenger will lose his/her luggage is about
0.006.
- The probability that you will lose your luggage
is about 0.006.
The last statement is not exactly correct: your
probability depends on other factors (how late
you arrive at the airport, etc.).
14Example Probability of Male versus Female Births
- Long-run relative frequency of males born in the
United States is about 0.512 (512 boys born per
1000 births)
The table provides the results of a simulation: the
proportion is far from 0.512 over the first few
weeks, but in the long run it settles down around
0.512.
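A simulation of this kind is easy to reproduce. The sketch below is not the one behind the slide's table. It assumes each birth is an independent draw with probability 0.512 of being a boy, and the seed and checkpoints are arbitrary.

```python
import random

P_BOY = 0.512  # long-run relative frequency quoted above

def running_proportion(n_births, rng):
    """Return the proportion of boys after each of the first n_births births."""
    boys = 0
    proportions = []
    for i in range(1, n_births + 1):
        if rng.random() < P_BOY:
            boys += 1
        proportions.append(boys / i)
    return proportions

rng = random.Random(1)  # arbitrary seed, only for repeatability
props = running_proportion(100_000, rng)

# Early proportions jump around; later ones should sit close to 0.512.
for n in (10, 100, 1_000, 100_000):
    print(n, round(props[n - 1], 3))
```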
15Nightlights and Myopia
Assuming these data are representative of a
larger population, what is the approximate
probability that someone from that population who
sleeps with a nightlight in early childhood will
develop some degree of myopia?
Note: 72 + 7 = 79 of the 232 nightlight users
developed some degree of myopia. So we estimate
the probability to be 79/232 ≈ 0.34.
16Complementary Events
One event is the complement of another event if
the two events do not contain any of the same
simple events and together they cover the entire
sample space. Notation: A^C represents the
complement of A.
Note: P(A) + P(A^C) = 1.
Example: A Simple Lottery (cont.)
A = player buying a single ticket wins. A^C = player does not
win. P(A) = 1/1000, so P(A^C) = 999/1000.
17Mutually Exclusive Events
Two events are mutually exclusive if they do not
contain any of the same simple events (outcomes).
Example: A Simple Lottery
A = all three digits are the same. B = the first and last digits are
different. The events A and B are mutually
exclusive.
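Mutual exclusivity can be checked mechanically: two events are mutually exclusive exactly when their sets of simple events have an empty intersection. The sketch below does this for the lottery example; the variable names are illustrative only.

```python
# Sample space of the simple lottery: "000" .. "999".
sample_space = {f"{n:03d}" for n in range(1000)}

all_same = {s for s in sample_space if len(set(s)) == 1}  # event A
ends_differ = {s for s in sample_space if s[0] != s[-1]}  # event B

# Simple events shared by A and B; empty means mutually exclusive.
shared = all_same & ends_differ
print(len(shared))  # prints: 0

# The complement of A is everything in the sample space that is not in A.
complement_of_a = sample_space - all_same
print(len(all_same), len(complement_of_a))  # prints: 10 990
```

A and B here are mutually exclusive but not complementary: numbers such as 121 belong to neither event.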
18Independent and Dependent Events
- Two events are independent of each other if
knowing that one will occur (or has occurred)
does not change the probability that the other
occurs.
- Two events are dependent if knowing that one will
occur (or has occurred) changes the probability
that the other occurs.
19Example Independent Events
- Customers put a business card in a restaurant's glass
bowl.
- A drawing is held once a week for a free lunch.
- You and Vanessa each put in a card in two consecutive weeks.
Event A = you win in week 1. Event B = Vanessa
wins in week 2.
- Events A and B refer to two different random
circumstances and are independent.
20Example Dependent Events
Event A = Alicia is selected to answer Question
1. Event B = Alicia is selected to answer
Question 2.
Events A and B refer to different random
circumstances, but are A and B independent
events?
- P(A) = 1/50.
- If event A occurs, her name is no longer in the
bag, so P(B) = 0.
- If event A does not occur, there are 49 names in
the bag (including Alicia's name), so P(B) =
1/49.
Knowing whether A occurred changes P(B). Thus,
the events A and B are not independent.
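The arithmetic can be laid out with exact fractions. This sketch assumes 50 names in the bag (implied by P(A) = 1/50) and that a drawn name is not returned. Combining the two cases with the law of total probability is an added step, not something the slide does.

```python
from fractions import Fraction

N = 50  # names in the bag, implied by P(A) = 1/50

p_a = Fraction(1, N)                  # Alicia drawn for Question 1
p_b_given_a = Fraction(0)             # her name is gone, so she cannot be drawn again
p_b_given_not_a = Fraction(1, N - 1)  # 49 names left, hers among them

# Law of total probability: overall chance Alicia answers Question 2.
p_b = p_a * p_b_given_a + (1 - p_a) * p_b_given_not_a

print(p_b, p_b_given_a, p_b_given_not_a)  # prints: 1/50 0 1/49
```

Before anything is drawn, Alicia's chance of answering Question 2 is 1/50, but it shifts to 0 or 1/49 once the first draw is known. That shift is what makes the events dependent.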
21Joint and Marginal Probabilities
- These probabilities refer to the proportion of an
event as a fraction of the total.
22Unions and intersections
- P(A ∪ B) ≠ P(A) + P(B) because A and B do overlap.
- P(A ∪ B) = P(A) + P(B) - P(A ∩ B).
- A ∩ B is the intersection of A and B; it includes
everything that is in both A and B, and is
counted twice if we add P(A) and P(B).
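The A and B on this slide come from a figure that is not reproduced here. As an assumption, the sketch below borrows two overlapping events from the earlier lottery example (last digit is 9; all three digits equal) to show the double counting and the correction.

```python
from fractions import Fraction

sample_space = [f"{n:03d}" for n in range(1000)]

# Illustrative events borrowed from the lottery example (an assumption).
A = {s for s in sample_space if s[-1] == "9"}      # last digit is 9
B = {s for s in sample_space if len(set(s)) == 1}  # all digits equal

def p(event):
    """Probability of an event under equally likely outcomes."""
    return Fraction(len(event), len(sample_space))

naive = p(A) + p(B)                # counts the overlap {"999"} twice
corrected = p(A) + p(B) - p(A & B) # addition rule
direct = p(A | B)                  # count the union directly

print(naive, corrected, direct)  # prints: 11/100 109/1000 109/1000
```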
24Conditional Probability
- Consider two events A and B.
- What is the probability of A, given the
information that B occurred? P(A | B) = ?
- Example
  - What is the probability that a woman is married,
given that she is 18-29 years old?
25Probability Problems
- P(Married | 18-29) = 7,842/22,512 ≈ 0.348
26 Conditional probability and independence
- If we know that one event has occurred, it may
change our view of the probability of another
event. Let
  - A = rain today, B = rain tomorrow, C = rain
in 90 days' time.
- It is likely that knowledge that A has occurred
will change your view of the probability that B
will occur, but not of the probability that C
will occur.
- We write P(B | A) ≠ P(B), P(C | A) = P(C). P(B | A)
denotes the conditional probability of B, given
A.
- We say that A and C are independent, but A and B
are not.
- Note that for independent events P(A ∩ C) =
P(A)P(C).
27Conditional probability - tornado forecasting
- Consider the classic data set on the next slide,
consisting of forecasts and observations of
tornados (Finley, 1884).
- Let
  - F = tornado forecast
  - T = tornado observed
- Use the frequencies in the table to estimate
probabilities; it's a large sample, so the estimates
should not be too bad.
28Forecasts of tornados
29Conditional probability - tornado forecasting
- P(T) = 51/2803 ≈ 0.0182
- P(T ∩ F) = 28/2803
- P(T | F) = 28/100 = 0.2800
- P(T | F^C) = 23/2703 ≈ 0.0085
- Knowledge of the forecast changes P(T). F and T
are not independent.
- P(F | T) = 28/51 ≈ 0.5490
- P(T | F) and P(F | T) are often confused, but they are
different quantities and can take very different
values.
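The quantities on this slide all follow from four counts. The sketch below takes those counts from the fractions quoted above (2803 occasions, 100 forecasts, 51 tornados, 28 with both) and applies the definition P(X | Y) = P(X ∩ Y) / P(Y). The variable names are illustrative.

```python
from fractions import Fraction

# Counts read off the fractions quoted on the slide.
n_total = 2803     # all forecast occasions
n_forecast = 100   # tornado forecast (F)
n_tornado = 51     # tornado observed (T)
n_both = 28        # forecast and observed (T and F)

p_t = Fraction(n_tornado, n_total)
p_f = Fraction(n_forecast, n_total)
p_t_and_f = Fraction(n_both, n_total)

p_t_given_f = p_t_and_f / p_f   # tornado, given a forecast
p_f_given_t = p_t_and_f / p_t   # forecast, given a tornado
p_t_given_not_f = Fraction(n_tornado - n_both, n_total - n_forecast)

# Independence would require P(T and F) == P(T) * P(F).
independent = p_t_and_f == p_t * p_f

for name, value in [("P(T)", p_t), ("P(T|F)", p_t_given_f),
                    ("P(T|not F)", p_t_given_not_f), ("P(F|T)", p_f_given_t)]:
    print(f"{name} = {float(value):.4f}")
print("independent:", independent)
```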
30Continuous and discrete random variables
- A continuous random variable is one which can (in
theory) take any value in some range, for example
crop yield or maximum temperature.
- A discrete variable has a countable set of
values. They may be
  - counts, such as numbers of accidents
  - categories, such as much above average, above
average, near average, below average, much below
average
  - binary variables, such as dropout/no dropout
31Probability distributions
- If we measure a random variable many times, we
can build up a distribution of the values it can
take.
- Imagine an underlying distribution of values
which we would get if it were possible to take
more and more measurements under the same
conditions.
- This gives the probability distribution for the
variable.
32Continuous probability distributions
- Because continuous random variables can take all
values in a range, it is not possible to assign
probabilities to individual values.
- Instead we have a continuous curve, called a
probability density function, which allows us to
calculate the probability of a value falling within any
interval.
- This probability is calculated as the area under
the curve between the values of interest. The
total area under the curve must equal 1.
33Normal (Gaussian) distributions
- Normal (also known as Gaussian) distributions are
by far the most commonly used family of
continuous distributions.
- They are bell-shaped and are indexed by two
parameters
  - The mean μ: the distribution is symmetric about
this value.
  - The standard deviation σ: this determines the
spread of the distribution. Roughly 2/3 of the
distribution lies within 1 standard deviation of
the mean, and about 95% within 2 standard deviations.
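The "roughly 2/3" and "95" figures can be checked as areas under the normal curve. This sketch uses `NormalDist` from Python's standard `statistics` module and a standard normal (mean 0, sd 1); the proportions are the same for any mean and standard deviation.

```python
from statistics import NormalDist

z = NormalDist(mu=0.0, sigma=1.0)  # standard normal

# Area under the curve between two values = difference of the CDF.
within_1_sd = z.cdf(1) - z.cdf(-1)
within_2_sd = z.cdf(2) - z.cdf(-2)

print(round(within_1_sd, 4))  # about 0.68, i.e. roughly 2/3
print(round(within_2_sd, 4))  # about 0.95
```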
34The probability of continuous variables
- IQ test
  - Mean = 100 and sd = 15
- What is the probability of randomly selecting an
individual with a test score of 130 or greater?
- P(X < 95)?
- P(X > 112)?
- P(X < 95 or X > 112)?
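These questions can be worked as tail areas of a normal distribution with mean 100 and sd 15. The sketch takes the questions to be P(X ≥ 130), P(X < 95), P(X > 112), and the probability of either of the last two. Reading the 95 case as "below 95" is an assumption; "greater than 112" matches the wording of the next slide.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)

p_130_or_more = 1 - iq.cdf(130)  # upper tail, 2 sd above the mean
p_below_95 = iq.cdf(95)          # lower tail (assumed direction)
p_above_112 = 1 - iq.cdf(112)    # upper tail

# "Below 95" and "above 112" cannot both happen (mutually exclusive),
# so the probability of one or the other is the plain sum.
p_either = p_below_95 + p_above_112

print(round(p_130_or_more, 4))
print(round(p_below_95, 4))
print(round(p_above_112, 4))
print(round(p_either, 4))
```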
35The probability of continuous variables (cont.)
- What is the probability of randomly selecting
three people with a test score greater than 112?
- Remember the multiplication rule for independent
events.
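Under the multiplication rule, the probability that three independently selected people all score above 112 is the single-person probability cubed. The sketch assumes the three selections are independent, with scores normal with mean 100 and sd 15.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)

p_one = 1 - iq.cdf(112)  # one person scores above 112
p_three = p_one ** 3     # all three do, assuming independent selections

print(round(p_one, 4), round(p_three, 4))
```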