Nuts and Bolts A Review of Probability Theory - PowerPoint PPT Presentation

1
Nuts and Bolts: A Review of Probability Theory
  • Review classical, frequentist, and subjective
    interpretations of probability
  • Probability Axioms and Definitions
  • Conditional Probability
  • Bayes' Theorem

2
The frequency interpretation of probability
  • The frequency interpretation: The probability
    that some specific outcome of a process will be
    obtained can be interpreted as the relative
    frequency with which that outcome would be
    obtained if the process were repeated a large
    number of times under similar conditions.
  • e.g. the probability of obtaining a head in a
    fair coin toss is ½ because the relative
    frequency of heads should be ½ if I were to flip
    a coin many times.
  • How do p-values relate to the frequency
    interpretation of probability?
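The frequency interpretation can be illustrated with a minimal simulation sketch (the function name and seed are illustrative, not from the slides): the relative frequency of heads settles near ½ as the number of flips grows.

```python
import random

def relative_frequency_of_heads(n_flips, seed=0):
    """Flip a simulated fair coin n_flips times; return the share of heads."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

# The relative frequency settles near 1/2 as the number of flips grows.
for n in (100, 10_000, 1_000_000):
    print(n, relative_frequency_of_heads(n))
```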

3
The frequency interpretation and the sampling
distribution
  • When we make statistical inferences from the
    frequentist perspective, we assume that our data
    is a sample from an entire population.
  • - The population is described by the population
    mean and the population variance, which are
    unknown.
  • - The sample is described by the sample mean and
    the sample variance.
  • The sample mean and variance provide estimates
    about the mean and variance of the entire
    population.
  • Importantly, these estimates are known only with
    some uncertainty.
  • Our uncertainty about a statistic like the mean
    is summarized by its sampling distribution.

4
The sampling distribution
  • The sampling distribution is a hypothetical
    distribution of all possible values of a
    statistic of interest for samples of size N that
    could be formed from a given population.
  • The observed sample mean is just one realization
    from this distribution.
  • Needless to say, this is a theoretical construct:
    with a large population there are billions or
    even trillions of unique samples, and it would be
    easier to simply survey the entire population
    than to enumerate them all.
  • P-values refer to the proportion of hypothetical
    draws from the sampling distribution that are
    consistent with the null hypothesis.
  • → If p-values are based on the concept of a
    sampling distribution, do they make sense if your
    data contain the entire population?
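The sampling distribution can be approximated by simulation. A minimal sketch, with a hypothetical normal population (mean 50, SD 10) and samples of size N = 25; all names and numbers are illustrative:

```python
import random
import statistics

# A minimal sketch (all numbers hypothetical): approximate the sampling
# distribution of the mean by repeatedly drawing samples of size N from
# a simulated population.
rng = random.Random(1)
population = [rng.gauss(50, 10) for _ in range(100_000)]

N = 25
sample_means = [statistics.mean(rng.sample(population, N)) for _ in range(2_000)]

# The sampling distribution is centered on the population mean, and its
# spread (the standard error) is roughly sigma / sqrt(N) = 10 / 5 = 2.
print(statistics.mean(sample_means), statistics.stdev(sample_means))
```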

5
The Classical Interpretation of Probability
  • The classical interpretation is based on the
    concept of equally likely outcomes.
  • If the outcome of some process must be one of n
    different outcomes, and if these outcomes are
    equally likely to occur, then the probability of
    each outcome is 1/n.
  • e.g. If I were to flip a fair coin, the
    probability of a head would be ½ because heads
    and tails are equally likely outcomes.

6
The appeal of the classical approach
  • The classical approach offers an appealing
    summary of uncertainty in a one-shot situation.
  • The Figure below defines the sample space for a
    hypothetical experiment where outcomes are either
    a success or a failure. The probability of a
    success is the size of the accept region over the
    size of the entire sample space.
  • (Note: the success and failure regions are each
    composed of a series of equally likely outcomes,
    e.g., success = rolling a 1 or 2 on a 6-sided
    die.)

[Figure: A Hypothetical Sample Space, divided into
a series of equal-sized regions grouped into a
Success region and a Failure region]
7
Problems with the Classical Interpretation
  • The drawback of the classical interpretation is
    that the concept of equally likely outcomes is
    itself probabilistic.
  • → In a sense, this makes the classical
    definition of probability circular.
  • → Furthermore, the concept begins to break down
    in contexts other than gambling, where events are
    not equally likely.
  • The classical response is
  • Laplace's Rule of Insufficient Reason: in the
    absence of compelling evidence to the contrary,
    we should assume that events are equally likely.
  • This response is actually more useful to
    Bayesians when defending their priors than to
    classicists.

8
The Subjective Interpretation of Probability
  • The probability that a person assigns to a
    possible outcome of some process represents his
    or her own judgment of the likelihood that the
    outcome will be obtained.
  • In contrast to the classical and frequentist
    interpretations of probability, this means that
    different individuals could have different
    probability judgments.
  • e.g. If I were to flip a fair coin, the
    probability of a head could be 3/4 because, for
    some reason, I think that God wants it to be a
    head.

9
Is subjective probability theory really that ad
hoc?
  • Not necessarily. Good Bayesians elicit priors in
    a manner that ensures coherence.
  • Consider the following definition: the
    probability that an individual attributes to an
    event E is the number p such that, for an
    arbitrary positive or negative stake S, the
    individual would be willing to exchange the
    certain quantity of money pS for a lottery in
    which she receives S if E occurs and zero
    otherwise.
  • "This being granted, once an individual has
    evaluated the probabilities of certain events,
    two cases present themselves: either it is
    possible to bet with him in such a way as to be
    assured of winning, or else this possibility does
    not exist. In the first case, one should say that
    the evaluation of probabilities given by this
    individual contains an incoherence, an intrinsic
    contradiction; in the other, we say the individual
    is coherent. It is precisely this condition of
    coherence which constitutes the sole principle
    from which one can deduce the whole calculus of
    probability" (de Finetti, Chapter 1).
  • Extensions of de Finetti's axioms form the basis
    of subjective expected utility theory. Later
    chapters of the book introduce the concept of
    exchangeability, which we won't talk about today,
    but which is rather important to probability
    theory.

10
The Axiomatic Definition of Probability
  • Suppose that for experimental model M, the sample
    space S of possible outcomes contains events A1,
    …, An ⊆ S.
  • Let Pr(Ai) = the probability of an event Ai in
    the sample space S.
  • A probability distribution on a sample space S is
    a specification of numbers Pr(Ai) which satisfy
    axioms A1, A2, and A3.
  • A1. For any event Ai, Pr(Ai) ≥ 0.
  • A2. Pr(S) = 1.
  • A3. For any infinite sequence of disjoint events
    A1, A2, …
  • Pr(∪i=1 to ∞ Ai) = Σi=1 to ∞ Pr(Ai)
  • Note: it turns out that each of these three
    axioms can be justified using the coherence
    criterion.
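For a finite sample space, the axioms can be checked mechanically. A sketch using a fair six-sided die (the events chosen are illustrative); note that for a finite space, A3 reduces to finite additivity over disjoint events:

```python
from fractions import Fraction

# A sketch (illustrative events) checking the axioms for a fair six-sided die.
# For a finite sample space, axiom A3 reduces to finite additivity.
probs = {outcome: Fraction(1, 6) for outcome in range(1, 7)}

def pr(event):
    """Probability of an event (a set of outcomes)."""
    return sum(probs[o] for o in event)

assert all(p >= 0 for p in probs.values())        # A1: Pr(Ai) >= 0
assert pr(set(probs)) == 1                        # A2: Pr(S) = 1
evens, odds = {2, 4, 6}, {1, 3, 5}                # disjoint events
assert pr(evens | odds) == pr(evens) + pr(odds)   # A3 (finite form)
print("axioms hold for the fair die")
```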

11
Some Theorems Based on the Definition of
Probability and a Few Proofs
  • Theorem 1. Pr(∅) = 0
  • Proof
  • By definition, Aj and Ak are disjoint if Aj ∩ Ak
    = ∅.
  • Further, it is obvious that ∅ ∩ ∅ = ∅.
  • Thus, if Aj = ∅ and Ak = ∅, then Aj and Ak are
    disjoint.
  • Let A1, A2, … define the infinite sequence of
    events such that each Aj = ∅.
  • By the above definitions, it follows that the
    events Aj are disjoint.
  • Since the Aj are disjoint, we can exploit A3 such
    that
  • Pr(∅) = Pr(∪i Ai) = Σi Pr(Ai) = Σi Pr(∅)
  • In order that Pr(∅) equal the infinite sum
    Σi Pr(∅), Pr(∅) must equal 0.

12
Some Theorems cont.
  • Theorem 2. For any sequence of n disjoint events
    A1, …, An,
  • Pr(∪i=1 to n Ai) = Σi=1 to n Pr(Ai)
  • Proof
  • Let A1, …, An define the n disjoint events and let
    Ak = ∅ for k = n+1, n+2, …
  • By the definition of disjoint events, we now have
    an infinite sequence of disjoint events.
  • By A3 and Theorem 1, which states that Pr(∅) = 0,
  • Pr(∪i=1 to n Ai) = Pr(∪i=1 to ∞ Ai) = Σi=1 to ∞
    Pr(Ai)
  • = Σi=1 to n Pr(Ai) + Σi=n+1 to ∞ Pr(Ai)
  • = Σi=1 to n Pr(Ai) + 0
  • = Σi=1 to n Pr(Ai)

13
Some Theorems cont.
  • Theorem 3. For any event A, Pr(A^C) = 1 − Pr(A)
  • Theorem 4. For any event A, 0 ≤ Pr(A) ≤ 1
  • Proof by contradiction in two parts
  • Part 1. Suppose Pr(A) < 0. That would
    violate axiom A1, a contradiction.
  • Part 2. Suppose Pr(A) > 1. Then by Theorem 3,
    Pr(A^C) < 0, which also contradicts A1.
  • Thus, 0 ≤ Pr(A) ≤ 1.

14
Some Theorems cont.
  • Theorem 5. For any two events A and B,
  • Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B)
  • Proof
  • A ∪ B = (A ∩ B^C) ∪ (A ∩ B) ∪ (A^C ∩ B)
  • Since all three sets on the right-hand side are
    disjoint, Theorem 2 implies
  • Pr(A ∪ B) = Pr(A ∩ B^C) + Pr(A ∩ B) + Pr(A^C ∩ B)
  • = Pr(A ∩ B^C) + Pr(A ∩ B) + Pr(A^C ∩ B) + Pr(A ∩
    B) − Pr(A ∩ B)
  • Further, we know that Pr(A) = Pr(A ∩ B^C) + Pr(A ∩
    B)
  • and that Pr(B) = Pr(A ∩ B) + Pr(A^C ∩ B)
  • Thus, Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B)
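Theorem 5 can also be verified by direct enumeration on a small sample space. A sketch using a six-sided die, with illustrative events A and B (not from the slides):

```python
from fractions import Fraction

# A sketch verifying Theorem 5 by enumeration on a six-sided die;
# the events A and B below are illustrative choices, not from the slides.
sample_space = {1, 2, 3, 4, 5, 6}
A = {1, 2, 3}   # roll is at most 3
B = {2, 4, 6}   # roll is even

def pr(event):
    """Classical probability: equally likely outcomes."""
    return Fraction(len(event), len(sample_space))

# Pr(A ∪ B) = Pr(A) + Pr(B) - Pr(A ∩ B)
assert pr(A | B) == pr(A) + pr(B) - pr(A & B)
print(pr(A | B))  # 5/6
```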

15
Independent Events
  • Intuitively, we define independence as
  • Two events A and B are independent if the
    occurrence or non-occurrence of one of the events
    has no influence on the occurrence or
    non-occurrence of the other event.
  • Mathematically, we define independence as
  • Two events A and B are independent if Pr(A ∩ B) =
    Pr(A)Pr(B).

16
Example of Independence
  • Are party ID and vote choice independent in
    presidential elections?
  • Suppose Pr(Rep. ID) = .4, Pr(Rep. Vote) = .5, and
    Pr(Rep. ID ∩ Rep. Vote) = .35
  • To test for independence, we ask whether
  • Pr(Rep. ID) Pr(Rep. Vote) = .35 ?
  • Substituting into the equation, we find that
  • Pr(Rep. ID) Pr(Rep. Vote) = .4 × .5 = .2 ≠ .35,
  • so the events are not independent.
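The check above can be sketched in a few lines using the slide's numbers:

```python
# The slide's numbers; independence requires Pr(A ∩ B) = Pr(A) Pr(B).
pr_rep_id = 0.4
pr_rep_vote = 0.5
pr_joint = 0.35

independent = abs(pr_joint - pr_rep_id * pr_rep_vote) < 1e-12
print(pr_rep_id * pr_rep_vote, independent)  # 0.2 False
```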

17
Independence of Several Events
  • The events A1, …, An are independent if
  • Pr(A1 ∩ A2 ∩ … ∩ An) = Pr(A1)Pr(A2)⋯Pr(An)
  • And, this identity must hold for any subset of the
    events.

18
Conditional Probability
  • Conditional probabilities allow us to understand
    how the probability of an event A changes after
    it has been learned that some other event B has
    occurred.
  • The key concept for thinking about conditional
    probabilities is that the occurrence of B
    reshapes the sample space for subsequent events.
  • - That is, we begin with a sample space S
  • - A and B ⊆ S
  • - The conditional probability of A given B looks
    just at the subset of the sample space where B
    occurs.

The conditional probability of A given B is
denoted Pr(A | B). - Importantly, according to
Bayesian orthodoxy, all probability distributions
are implicitly or explicitly conditioned on the
model.
[Figure: Venn diagram of sample space S with
events A and B; Pr(A | B) restricts attention to
the region B]
19
Conditional Probability Cont.
  • By definition: If A and B are two events such
    that Pr(B) > 0, then
  • Pr(A | B) = Pr(A ∩ B) / Pr(B)

Example: What is Pr(Republican Vote |
Republican Identifier)? Pr(Rep. Vote ∩ Rep. ID)
= .35 and Pr(Rep. ID) = .4. Thus, Pr(Republican
Vote | Republican Identifier) = .35 / .4 = .875
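The same computation, sketched with an illustrative helper function and exact fractions so the division is exact:

```python
from fractions import Fraction

def conditional(pr_joint, pr_b):
    """Pr(A | B) = Pr(A ∩ B) / Pr(B); defined only when Pr(B) > 0."""
    if pr_b <= 0:
        raise ValueError("Pr(B) must be positive")
    return pr_joint / pr_b

# Slide's numbers: Pr(Rep. Vote ∩ Rep. ID) = .35, Pr(Rep. ID) = .4
p = conditional(Fraction(35, 100), Fraction(40, 100))
print(p)  # 7/8, i.e., .875
```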
20
Useful Properties of Conditional Probabilities
  • Property 1. The Conditional Probability for
    Independent Events
  • If A and B are independent events, then
  • Pr(A | B) = Pr(A)

Property 2. The Multiplication Rule for
Conditional Probabilities: In an experiment
involving two non-independent events A and B, the
probability that both A and B occur can be found
in the following two ways:
Pr(A ∩ B) = Pr(B) Pr(A | B) = Pr(A) Pr(B | A)
21
Conditional Probability and Partitions of a
Sample Space
  • The set of disjoint events A1, …, Ak form a
    partition of a sample space S if ∪i=1 to k Ai = S.
  • If the events A1, …, Ak partition S and if B is
    any other event in S (note that Ai ∩ B may equal ∅
    for some i), then the events A1 ∩ B,
    A2 ∩ B, …, Ak ∩ B will form a partition of B.
  • Thus, B = (A1 ∩ B) ∪ (A2 ∩ B) ∪ … ∪ (Ak ∩ B)
  • Pr(B) = Σi=1 to k Pr(Ai ∩ B)
  • Finally, if Pr(Ai) > 0 for all i, then
  • Pr(B) = Σi=1 to k Pr(B | Ai) Pr(Ai)

22
Example of conditional probability and partitions
of a sample space
  • Pr(B) = Σi=1 to k Pr(B | Ai) Pr(Ai)
  • Example. What is the probability of a Republican
    vote?
  • Pr(Rep. Vote) = Pr(Rep. Vote | Rep. ID) Pr(Rep.
    ID)
  • + Pr(Rep. Vote | Ind. ID) Pr(Ind. ID)
  • + Pr(Rep. Vote | Dem. ID) Pr(Dem. ID)
  • Note: the expression for Pr(B) above provides the
    denominator for Bayes' Theorem.
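The slide gives no numbers, so here is a sketch of the partition formula with hypothetical values for Pr(ID) and Pr(Rep. Vote | ID):

```python
# Law of total probability over the party-ID partition. The slide gives no
# numbers, so these priors and conditional probabilities are hypothetical.
priors = {"Rep": 0.4, "Ind": 0.2, "Dem": 0.4}           # Pr(ID)
vote_given_id = {"Rep": 0.875, "Ind": 0.5, "Dem": 0.1}  # Pr(Rep. Vote | ID)

pr_rep_vote = sum(vote_given_id[k] * priors[k] for k in priors)
print(pr_rep_vote)
```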

23
Bayes' Theorem (Rule, Law)
  • Bayes' Theorem: Let events A1, …, Ak form a
    partition of the space S such that Pr(Aj) > 0 for
    all j, and let B be any event such that Pr(B) > 0.
    Then for i = 1, …, k
  • Pr(Ai | B) = Pr(B | Ai) Pr(Ai) / Σj=1 to k
    Pr(B | Aj) Pr(Aj)

Proof
Pr(Ai | B) = Pr(Ai ∩ B) / Pr(B). By the
multiplication rule, the numerator equals
Pr(B | Ai) Pr(Ai); by the partition formula from
the previous slide, the denominator equals
Σj=1 to k Pr(B | Aj) Pr(Aj).
Bayes' Theorem is just a simple rule for
computing the conditional probability of events
Ai given B from the conditional probability of B
given each event Ai and the unconditional
probability of each Ai.
24
Interpretation of Bayes' Theorem
Pr(Ai) = The prior distribution for the Ai. It
summarizes your beliefs about the probability of
event Ai before Ai or B are observed.
Pr(B | Ai) = The conditional probability of B
given Ai. It summarizes the likelihood of event
B given Ai.
Σk Pr(Ak) Pr(B | Ak) = The normalizing
constant. This is equal to the sum of the
quantities in the numerator for all events Ak.
Thus, Pr(Ai | B) represents the likelihood of
event Ai relative to all other elements of the
partition of the sample space.
Pr(Ai | B) = The posterior distribution of Ai
given B. It represents the probability of event
Ai after B has been observed.
25
Example of Bayes Theorem
  • What is the probability in a survey that someone
    is black given that they respond that they are
    black when asked?
  • - Suppose that 10% of the population is black, so
    Pr(B) = .10.
  • - Suppose that 95% of blacks respond Yes when
    asked if they are black, so Pr(Y1 | B) = .95.
  • - Suppose that 5% of non-blacks respond Yes when
    asked if they are black, so Pr(Y1 | B^C) = .05.

Applying Bayes' Theorem,
Pr(B | Y1) = (.95)(.10) / [(.95)(.10) + (.05)(.90)]
= .095 / .14 ≈ .679.
We reach the surprising conclusion that even if
95% of black and non-black respondents correctly
classify themselves according to race, the
probability that someone is black given that they
say they are black is less than .7.
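The slide's calculation can be sketched directly; the helper function is an illustrative two-event special case of Bayes' Theorem:

```python
# The slide's numbers; the helper function is an illustrative two-event
# (B vs. B^C) special case of Bayes' Theorem.
def bayes(prior, like_b, like_not_b):
    """Posterior Pr(B | Y) from prior Pr(B), Pr(Y | B), and Pr(Y | B^C)."""
    num = like_b * prior
    return num / (num + like_not_b * (1 - prior))

posterior = bayes(0.10, 0.95, 0.05)
print(round(posterior, 3))  # 0.679
```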
26
Combining Data
  • When applying Bayes' Theorem, the order in which
    you collect the data doesn't matter.
  • It also doesn't matter whether you peek at the
    data halfway through an experiment.

27
Example cont.
  • Continuing the last example, suppose that the
    interviewer also makes an estimate of the
    respondent's race. Let's say the interviewer
    correctly classifies 90 percent of respondents,
    and her classification is independent of the
    self-classification.
  • Thus, Pr(Y2 | B) = 0.9 and Pr(Y2 | B^C) = 0.1.

One way to incorporate the new information is to
recalculate our estimates from scratch.
Alternatively, we can simply update our last set
of results, treating the posterior from the first
observation as the prior for the second.
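A sketch of sequential updating for this example (the helper function is illustrative): because Y1 and Y2 are independent conditional on race, updating on Y1 and then on Y2 gives the same posterior as updating on both observations at once.

```python
# A sketch of sequential updating (helper is illustrative). Given that Y1
# and Y2 are independent conditional on race, updating on Y1 and then Y2
# gives the same posterior as updating on both observations at once.
def update(prior, like_b, like_not_b):
    num = like_b * prior
    return num / (num + like_not_b * (1 - prior))

p = 0.10                    # prior Pr(B)
p = update(p, 0.95, 0.05)   # after the self-report: Pr(B | Y1)
p = update(p, 0.90, 0.10)   # after the interviewer: Pr(B | Y1, Y2)

# All-at-once: the likelihoods multiply under conditional independence.
p_joint = (0.95 * 0.90 * 0.10) / (0.95 * 0.90 * 0.10 + 0.05 * 0.10 * 0.90)
print(round(p, 3), round(p_joint, 3))
```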