Probability Theory Longin Jan Latecki Temple University - PowerPoint PPT Presentation

Description:

Title: Introduction to Bayesian Learning Author: A. Guy Incognito Last modified by: CIS Computer Labs Created Date: 6/18/2004 2:35:19 AM Document presentation format – PowerPoint PPT presentation

Slides: 76
Provided by: A1067
Learn more at: https://cis.temple.edu

Transcript and Presenter's Notes


1
Probability Theory
Longin Jan Latecki
Temple University
  • Slides based on slides by
  • Aaron Hertzmann, Michael P. Frank, and
    Christopher Bishop

2
What is reasoning?
  • How do we infer properties of the world?
  • How should computers do it?

3
Aristotelian logic
  • If A is true, then B is true
  • A is true
  • Therefore, B is true

A = My car was stolen
B = My car isn't where I left it
4
The real world is uncertain
  • Problems with pure logic
  • Don't have perfect information
  • Don't really know the model
  • Model is non-deterministic

So let's build a logic of uncertainty!
5
Beliefs
  • Let B(A) = belief A is true
  • B(¬A) = belief A is false
  • e.g., A = my car was stolen
  • B(A) = belief my car was stolen

6
Reasoning with beliefs
  • Cox Axioms [Cox 1946]
  • Ordering exists
  • e.g., B(A) > B(B) > B(C)
  • Negation function exists
  • B(¬A) = f(B(A))
  • Product function exists
  • B(A ∧ Y) = g(B(A|Y), B(Y))

This is all we need!
7
  • The Cox Axioms uniquely define a complete system
    of reasoning: this is probability theory!

8
Principle 1
  • Probability theory is nothing more than common
    sense reduced to calculation.
  • - Pierre-Simon Laplace, 1814

9
Definitions
  • P(A) = probability A is true
  • = B(A) = belief A is true
  • P(A) ∈ [0, 1]
  • P(A) = 1 iff A is true
  • P(A) = 0 iff A is false
  • P(A|B) = prob. of A if we knew B
  • P(A, B) = prob. A and B

10
Examples
  • A = my car was stolen
  • B = I can't find my car
  • P(A) = .1
  • P(A) = .5
  • P(B | A) = .99
  • P(A | B) = .3

11
Basic rules
  • Sum rule
  • P(A) + P(¬A) = 1

Example: A = it will rain today, p(A) = .9,
p(¬A) = .1
12
Basic rules
  • Sum rule
  • Σ_i P(A_i) = 1

when exactly one of the A_i must be true
13
Basic rules
  • Product rule
  • P(A,B) = P(A|B) P(B)
  • = P(B|A) P(A)

14
Basic rules
  • Conditioning

Product Rule
P(A,B) = P(A|B) P(B)
P(A,B|C) = P(A|B,C) P(B|C)   (conditioned on C)
Sum Rule
Σ_i P(A_i) = 1
Σ_i P(A_i|B) = 1   (conditioned on B)
15
Summary
  • Product rule: P(A,B) = P(A|B) P(B)
  • Sum rule: Σ_i P(A_i) = 1
  • All derivable from Cox axioms; must obey rules of
    common sense
  • Now we can derive new rules
16
Example
  • A = you eat a good meal tonight
  • B = you go to a highly-recommended restaurant
  • ¬B = you go to an unknown restaurant
  • Model: P(B) = .7, P(A|B) = .8, P(A|¬B) = .5
  • What is P(A)?

17
Example, continued
  • Model: P(B) = .7, P(A|B) = .8, P(A|¬B) = .5
  • 1 = P(B) + P(¬B)   (sum rule)
  • 1 = P(B|A) + P(¬B|A)   (conditioning)
  • P(A) = P(B|A)P(A) + P(¬B|A)P(A)
  • = P(A,B) + P(A,¬B)   (product rule)
  • = P(A|B)P(B) + P(A|¬B)P(¬B)   (product rule)
  • = .8 × .7 + .5 × (1 − .7) = .71
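
The restaurant calculation can be checked numerically. The following is a minimal sketch using the numbers from the slide; the function and variable names are illustrative and not part of the original deck.

```python
# Numbers from the restaurant example; names are illustrative assumptions.
p_b = 0.7              # P(B): you go to the highly-recommended restaurant
p_a_given_b = 0.8      # P(A|B): good meal at the recommended restaurant
p_a_given_not_b = 0.5  # P(A|not B): good meal at the unknown restaurant


def total_probability(p_b, p_a_given_b, p_a_given_not_b):
    """Return P(A) = P(A|B) P(B) + P(A|not B) P(not B)."""
    return p_a_given_b * p_b + p_a_given_not_b * (1.0 - p_b)


p_a = total_probability(p_b, p_a_given_b, p_a_given_not_b)
print(round(p_a, 2))  # 0.71
```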
18
Basic rules
  • Marginalizing

P(A) = Σ_i P(A, B_i)
for mutually-exclusive (and exhaustive) B_i
e.g., p(A) = p(A,B) + p(A, ¬B)
19
Syllogism revisited
  • A → B
  • A
  • Therefore B
  • P(B|A) = 1
  • P(A) = 1
  • P(B) = P(B,A) + P(B, ¬A)
  • = P(B|A)P(A) + P(B|¬A)P(¬A)
  • = 1

20
  • Knowing P(A,B,C) is equivalent to knowing
    P(A|B,C), P(B|C), P(C), P(A), P(A|C), P(C|A), etc.

21
Principle 2
  • Given a complete model, we can derive any other
    probability
22
Inference
  • Model: P(B) = .7, P(A|B) = .8, P(A|¬B) = .5
  • If we know A, what is P(B|A)?
  • (Inference)

P(A,B) = P(A|B) P(B) = P(B|A) P(A)
P(B|A) = P(A|B) P(B) / P(A) = .8 × .7 / .71 ≈ .79   (Bayes' Rule)
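
As a quick check of this inference step, here is a small sketch of Bayes' Rule for a binary hypothesis. The helper name `bayes_posterior` and its parameter names are assumptions made for illustration; the numbers are the ones from the slide.

```python
def bayes_posterior(prior, likelihood, likelihood_given_not):
    """Return P(B|A) from P(B), P(A|B) and P(A|not B).

    The evidence P(A) is obtained by marginalizing over B and not B.
    """
    evidence = likelihood * prior + likelihood_given_not * (1.0 - prior)
    return likelihood * prior / evidence


# Restaurant model: P(B) = .7, P(A|B) = .8, P(A|not B) = .5
p_b_given_a = bayes_posterior(prior=0.7, likelihood=0.8, likelihood_given_not=0.5)
print(round(p_b_given_a, 2))  # 0.79
```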
23
Inference
  • Bayes Rule

P(M|D) = P(D|M) P(M) / P(D)
Posterior: P(M|D)
Likelihood: P(D|M)
Prior: P(M)
24
Principle 3
  • Describe your model of the world, and then
    compute the probabilities of the unknowns given
    the observations
25
Principle 3a
  • Use Bayes Rule to infer unknown model variables
    from observed data

P(M|D) = P(D|M) P(M) / P(D)
Posterior: P(M|D)
Likelihood: P(D|M)
Prior: P(M)
26
Bayes Theorem
Rev. Thomas Bayes (1702-1761)
posterior ∝ likelihood × prior
27
(No Transcript)
28
(No Transcript)
29
(No Transcript)
30
Example diagnosis
  • Jo takes a test for a nasty disease. The result
    of the test is either positive (T) or
    negative (¬T). The test is 95% reliable. 1% of
    people of Jo's age/background have the disease.
  • If the test result is positive, does Jo have
    the disease? [MacKay 2003]

31
Example diagnosis
  • Model: P(D) = .01, P(T|D) = .95, P(¬T|¬D) = .95

P(D|T) = P(T|D) P(D) / P(T) ≈ .16
Using P(T) = P(T|D)P(D) + P(T|¬D)P(¬D)
= .95 × .01 + (1 − .95) × .99 = .059
32
Example diagnosis
  • What if we tried a different test?
  • 99.9% reliable test → P(D|T2) = 91%
  • 70% reliable test → P(D|T3) = 2.3%
  • The posterior merges information (could use
    multiple tests, e.g., P(D|T2, T3))
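
The three test scenarios can be reproduced with one small function. This is a sketch that assumes, as the slides do, that "reliable" means P(T|D) = P(¬T|¬D); the function name `posterior_disease` is hypothetical.

```python
def posterior_disease(reliability, prevalence=0.01):
    """Return P(D|T) for a test with P(T|D) = P(not T|not D) = reliability."""
    true_positive = reliability * prevalence
    false_positive = (1.0 - reliability) * (1.0 - prevalence)
    return true_positive / (true_positive + false_positive)


for reliability in (0.95, 0.999, 0.70):
    posterior = posterior_disease(reliability)
    print(f"{reliability:.1%} reliable test: P(D|T) = {posterior:.1%}")
```

The printed posteriors should agree with the rounded values on the slides (about 16%, 91%, and 2.3%).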

33
Independence
  • Definition
  • A and B are independent iff

P(A,B) = P(A) P(B)
34
Example
Suppose a red die and a blue die are rolled. The
sample space: [Figure: the 36 ordered pairs (red, blue)]
Are the events "sum is 7" and "the blue die is 3"
independent?
35
The events "sum is 7" and "the blue die is 3" are
independent:
|S| = 36
p(sum is 7 and blue die is 3) = 1/36
p(sum is 7) p(blue die is 3) = 6/36 × 6/36 = 1/36
Thus, p((sum is 7) and (blue die is 3)) = p(sum is 7) p(blue die is 3)
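
The dice argument can be verified by enumerating the sample space. This is a minimal sketch using exact fractions; the helper `prob` and the event names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes, written as (red, blue).
outcomes = list(product(range(1, 7), repeat=2))


def prob(event):
    """Probability of an event given as a predicate over outcomes."""
    favourable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favourable, len(outcomes))


def sum_is_7(outcome):
    return outcome[0] + outcome[1] == 7


def blue_is_3(outcome):
    return outcome[1] == 3


p_joint = prob(lambda o: sum_is_7(o) and blue_is_3(o))
p_product = prob(sum_is_7) * prob(blue_is_3)
print(p_joint, p_product, p_joint == p_product)  # 1/36 1/36 True
```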
36
Conditional Probability
  • Let E, F be any events such that Pr(F) > 0.
  • Then, the conditional probability of E given F,
    written Pr(E|F), is defined as
    Pr(E|F) = Pr(E∩F)/Pr(F).
  • This is what our probability that E would turn
    out to occur should be, if we are given only the
    information that F occurs.
  • If E and F are independent then Pr(E|F) = Pr(E).
  • Pr(E|F) = Pr(E∩F)/Pr(F)
  • = Pr(E)Pr(F)/Pr(F) = Pr(E)

37
Visualizing Conditional Probability
  • If we are given that event F occurs, then
  • Our attention gets restricted to the subspace F.
  • Our posterior probability for E (after seeing F)
    corresponds to the fraction of F where E occurs
    also.
  • Thus, p′(E) = p(E∩F)/p(F).

[Figure: Venn diagram of the entire sample space S, showing event F, event E, and event E∩F]
38
Conditional Probability Example
  • Suppose I choose a single letter out of the
    26-letter English alphabet, totally at random.
  • Use the Laplacian assumption on the sample space
    {a, b, …, z}.
  • What is the (prior) probability that the letter
    is a vowel?
  • Pr[Vowel] = __ / __ .
  • Now, suppose I tell you that the letter chosen
    happened to be in the first 9 letters of the
    alphabet.
  • Now, what is the conditional (or posterior)
    probability that the letter is a vowel, given
    this information?
  • Pr[Vowel | First9] = ___ / ___ .

[Figure: Venn diagram of the sample space S (the 26 letters), with the sets "1st 9 letters" and "vowels"]
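
One way to fill in the blanks is to count letters directly. The sketch below assumes the vowels are a, e, i, o, u (treating "y" as a consonant is an assumption, not something the slide states); variable names are illustrative.

```python
from fractions import Fraction
import string

letters = set(string.ascii_lowercase)      # Laplacian assumption: 26 equally likely letters
vowels = set("aeiou")                      # ASSUMPTION: 'y' is not counted as a vowel
first9 = set(string.ascii_lowercase[:9])   # a through i

# Prior: fraction of the whole sample space that is a vowel.
p_vowel = Fraction(len(vowels & letters), len(letters))

# Posterior: fraction of the restricted space First9 that is a vowel.
p_vowel_given_first9 = Fraction(len(vowels & first9), len(first9))

print(p_vowel, p_vowel_given_first9)  # 5/26 1/3
```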
39
Example
  • What is the probability that, if we flip a coin
    three times, we get an odd number of tails
    (event E), given that the event F, "the
    first flip comes up tails", occurs?
  • {(TTT), (TTH), (THH), (HTT), (HHT),
    (HHH), (THT), (HTH)}
  • Given F, only (TTT), (TTH), (THH), (THT) remain,
    and each has conditional probability 1/4
  • p(E|F) = 1/4 + 1/4 = ½, where E = odd number of
    tails
  • or p(E|F) = p(E∩F)/p(F) = (2/8)/(4/8) = 2/4 = ½
  • For comparison p(E) = 4/8 = ½
  • E and F are independent, since p(E|F) = p(E).
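
The same conditional probability can be obtained by brute-force enumeration. This is a small sketch; the helper `p` and the predicates `E` and `F` are names chosen here for illustration.

```python
from fractions import Fraction
from itertools import product

# The 8 equally likely outcomes of three fair coin flips, e.g. "HTT".
flips = ["".join(f) for f in product("HT", repeat=3)]


def p(event):
    """Probability of an event given as a predicate over outcomes."""
    return Fraction(sum(1 for f in flips if event(f)), len(flips))


def E(f):
    return f.count("T") % 2 == 1   # odd number of tails


def F(f):
    return f[0] == "T"             # first flip comes up tails


p_e = p(E)
p_e_given_f = p(lambda f: E(f) and F(f)) / p(F)
print(p_e, p_e_given_f)  # 1/2 1/2
```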

40
Example: Two boxes with balls
  • Two boxes: the first has 2 blue and 7 red balls;
    the second has 4 blue and 3 red balls
  • Bob selects a ball by first choosing one of the
    two boxes, and then one ball from this box.
  • If Bob has selected a red ball, what is the
    probability that he selected a ball from the
    first box?
  • An event E: Bob has chosen a red ball.
  • An event F: Bob has chosen a ball from the first
    box.
  • We want to find p(F|E)
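
The slide stops at the question. A possible way to finish it with Bayes' Rule is sketched below, under the assumption (not stated on the slide) that Bob picks each box with probability 1/2 and each ball in the chosen box uniformly at random.

```python
from fractions import Fraction

# ASSUMPTION: both boxes are equally likely to be chosen.
p_first = Fraction(1, 2)                 # p(F)
p_red_given_first = Fraction(7, 9)       # p(E|F): 7 red out of 9 balls
p_red_given_second = Fraction(3, 7)      # p(E|not F): 3 red out of 7 balls

# Marginalize over the box to get p(E), then apply Bayes' Rule.
p_red = p_red_given_first * p_first + p_red_given_second * (1 - p_first)
p_first_given_red = p_red_given_first * p_first / p_red

print(p_first_given_red)  # 49/76
```

Under that assumption p(F|E) = 49/76, roughly 0.645; a different box-selection probability would change the result.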

41
What's behind door number three?
  • The Monty Hall problem (paradox)
  • Consider a game show where a prize (a car) is
    behind one of three doors
  • The other two doors do not have prizes (goats
    instead)
  • After you pick one of the doors, the host (Monty
    Hall) opens a different door to show you that the
    door he opened does not hide the prize
  • Do you change your decision?
  • Your initial probability to win (i.e. pick the
    right door) is 1/3
  • What is your chance of winning if you change your
    choice after Monty opens a wrong door?
  • After Monty opens a wrong door, if you change
    your choice, your chance of winning is 2/3
  • Thus, your chance of winning doubles if you
    change
  • Huh?

42
(No Transcript)
43
Monty Hall Problem
C_i: the car is behind Door i, for i equal to 1,
2 or 3. H_ij: the host opens Door j after the
player has picked Door i, for i and j equal to
1, 2 or 3. Without loss of generality, assume,
by re-numbering the doors if necessary, that the
player picks Door 1, and that the host then
opens Door 3, revealing a goat. In other words,
the host makes proposition H_13 true. Then the
posterior probability of winning by not switching
doors is P(C_1|H_13).
44
The probability of winning by switching is
P(C_2|H_13), since under our assumption switching
means switching the selection to Door 2, and
P(C_3|H_13) = 0 (the host will never open the door
with the car).
The posterior probability of winning by not
switching doors is P(C_1|H_13) = 1/3, so
P(C_2|H_13) = 1 − 1/3 = 2/3.
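
The posterior values can be computed exactly from Bayes' Rule. The sketch below assumes the standard host behaviour: the host never opens the player's door or the car's door, and chooses uniformly at random when two doors are allowed. The function name `p_host_opens` is hypothetical.

```python
from fractions import Fraction

doors = (1, 2, 3)
prior = {car: Fraction(1, 3) for car in doors}   # P(C_i): car equally likely anywhere


def p_host_opens(opened, pick, car):
    """P(H_ij | C_car): host opens door `opened` after the player picked `pick`.

    ASSUMPTION: the host picks uniformly among the doors that are neither
    the player's pick nor the car.
    """
    if opened == pick or opened == car:
        return Fraction(0)
    allowed = [d for d in doors if d != pick and d != car]
    return Fraction(1, len(allowed))


pick, opened = 1, 3                               # the proposition H_13
evidence = sum(p_host_opens(opened, pick, car) * prior[car] for car in doors)
posterior = {
    car: p_host_opens(opened, pick, car) * prior[car] / evidence for car in doors
}
print(posterior[1], posterior[2], posterior[3])   # 1/3 2/3 0
```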
45
Discrete random variables
  • Probabilities over discrete variables

C ∈ {Heads, Tails}, P(C=Heads) = .5
P(C=Heads) + P(C=Tails) = 1
Possible values (outcomes) are discrete, e.g.,
natural numbers (0, 1, 2, 3, etc.)
46
Terminology
  • A (stochastic) experiment is a procedure that
    yields one of a given set of possible outcomes
  • The sample space S of the experiment is the set
    of possible outcomes.
  • An event is a subset of the sample space.
  • A random variable is a function that assigns a
    real value to each outcome of an experiment

Normally, a probability is related to an
experiment or a trial.
Take flipping a coin, for example: what are
the possible outcomes?
Heads or tails (the front or back side of the coin)
will be shown facing up.
After a sufficient number of tosses, we can
statistically conclude that the probability of
heads is 0.5.
In rolling a die, there are 6 outcomes. Suppose
we want to calculate the probability of the event
that an odd number is rolled. What is that probability?
47
Random Variables
  • A random variable V is any variable whose value
    is unknown, or whose value depends on the precise
    situation.
  • E.g., the number of students in class today
  • Whether it will rain tonight (Boolean variable)
  • The proposition V=v_i may have an uncertain truth
    value, and may be assigned a probability.

48
Example
  • A fair coin is flipped 3 times. Let S be the
    sample space of 8 possible outcomes, and let X be
    a random variable that assigns to an outcome
    the number of heads in this outcome.
  • Random variable X is a function X: S → X(S),
    where X(S) = {0, 1, 2, 3} is the range of X, which
    is the number of heads, and S = {(TTT), (TTH),
    (THH), (HTT), (HHT), (HHH), (THT), (HTH)}
  • X(TTT) = 0; X(TTH) = X(HTT) = X(THT) = 1;
    X(HHT) = X(THH) = X(HTH) = 2; X(HHH) = 3
  • The probability distribution (pdf) of random
    variable X is given by P(X=3) = 1/8, P(X=2) =
    3/8, P(X=1) = 3/8, P(X=0) = 1/8.
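
The distribution of X can be derived mechanically from the sample space. This is a minimal sketch; the names `sample_space`, `X`, and `pmf` mirror the slide's notation but are otherwise illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space S: the 8 equally likely outcomes of three fair coin flips.
sample_space = ["".join(outcome) for outcome in product("HT", repeat=3)]


def X(outcome):
    """Random variable: maps an outcome to its number of heads."""
    return outcome.count("H")


# Count how many outcomes map to each value, then normalize by |S|.
counts = Counter(X(outcome) for outcome in sample_space)
pmf = {x: Fraction(n, len(sample_space)) for x, n in sorted(counts.items())}

for x, probability in pmf.items():
    print(f"P(X={x}) = {probability}")
```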

49
Experiments & Sample Spaces
  • A (stochastic) experiment is any process by which
    a given random variable V gets assigned some
    particular value, and where this value is not
    necessarily known in advance.
  • We call it the actual value of the variable, as
    determined by that particular experiment.
  • The sample space S of the experiment is just the
    domain of the random variable, S = dom[V].
  • The outcome of the experiment is the specific
    value vi of the random variable that is selected.

50
Events
  • An event E is any set of possible outcomes in S
  • That is, E ⊆ S = dom[V].
  • E.g., the event that "less than 50 people show up
    for our next class" is represented as the set {1,
    2, …, 49} of values of the variable V (= # of
    people here next class).
  • We say that event E occurs when the actual value
    of V is in E, which may be written V∈E.
  • Note that V∈E denotes the proposition (of
    uncertain truth) asserting that the actual
    outcome (value of V) will be one of the outcomes
    in the set E.

51
Probability of an event E
  • The probability of an event E is the sum of the
    probabilities of the outcomes in E. That is,
    p(E) = Σ_{s∈E} p(s)
  • Note that, if there are n outcomes in the event
    E, that is, if E = {a_1, a_2, …, a_n}, then
    p(E) = p(a_1) + p(a_2) + … + p(a_n)

52
Example
  • What is the probability that, if we flip a coin
    three times, that we get an odd number of tails?
  • {(TTT), (TTH), (THH), (HTT), (HHT), (HHH), (THT),
    (HTH)}
  • Each outcome has probability 1/8,
  • p(odd number of tails) = 1/8 + 1/8 + 1/8 + 1/8 = ½

53
Venn Diagram
Experiment: Toss 2 Coins. Note Faces.
[Figure: Venn diagram of the sample space S, with each outcome (HH, HT, TH, TT) marked and the event "Tail" outlined]
Sample Space: S = {HH, HT, TH, TT}
54
Discrete Probability Distribution (also called
probability mass function (pmf))
  • 1. List of All possible [x, p(x)] pairs
  • x = Value of Random Variable (Outcome)
  • p(x) = Probability Associated with Value
  • 2. Mutually Exclusive (No Overlap)
  • 3. Collectively Exhaustive (Nothing Left Out)
  • 4. 0 ≤ p(x) ≤ 1
  • 5. Σ p(x) = 1

55
Visualizing Discrete Probability Distributions
Listing
  • {(0, .25), (1, .50), (2, .25)}

Table
# Tails   f(x) Count   p(x)
0         1            .25
1         2            .50
2         1            .25

Graph
[Figure: bar chart of p(x) against x = 0, 1, 2, with vertical axis ticks at .00, .25, .50]

Equation
p(x) = n! / (x! (n − x)!) · p^x (1 − p)^(n − x)
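
The equation form reproduces the listing and table for two coin tosses. The sketch below assumes n = 2 tosses of a fair coin (p = 0.5), which is what the table values imply; the function name `binomial_pmf` is illustrative.

```python
from math import comb


def binomial_pmf(x, n, p):
    """p(x) = n! / (x! (n - x)!) * p**x * (1 - p)**(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Number of tails in n = 2 tosses of a fair coin.
pmf = [binomial_pmf(x, n=2, p=0.5) for x in range(3)]
print(pmf)  # [0.25, 0.5, 0.25]
```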
56
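N is the total number of trials and n_ij is the
number of instances where X = x_i and Y = y_j
  • Joint Probability: p(X = x_i, Y = y_j) = n_ij / N
  • Marginal Probability: p(X = x_i) = Σ_j n_ij / N
  • Conditional Probability: p(Y = y_j | X = x_i) = n_ij / Σ_k n_ik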
57
  • Sum Rule

Product Rule
58
The Rules of Probability
  • Sum Rule: p(X) = Σ_Y p(X, Y)
  • Product Rule: p(X, Y) = p(Y|X) p(X)
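
Both rules can be checked on a small table of counts. The counts below are hypothetical and chosen only for illustration; joint, marginal, and conditional probabilities are built from them as relative frequencies.

```python
from fractions import Fraction

# HYPOTHETICAL counts n[i][j]: number of trials with X = x_i and Y = y_j.
n = [[3, 1],
     [2, 4]]
N = sum(sum(row) for row in n)   # total number of trials

# Joint probability p(X = x_i, Y = y_j) = n_ij / N.
joint = [[Fraction(n_ij, N) for n_ij in row] for row in n]

# Sum rule: p(X = x_i) is the joint summed over all values of Y.
marginal_x = [sum(row) for row in joint]

# Conditional probability p(Y = y_j | X = x_i) = n_ij / (row total).
cond_y_given_x = [[Fraction(n_ij, sum(row)) for n_ij in row] for row in n]

# Product rule: p(X, Y) = p(Y|X) p(X) holds for every cell.
for i, row in enumerate(joint):
    for j, p_xy in enumerate(row):
        assert p_xy == cond_y_given_x[i][j] * marginal_x[i]

print(marginal_x)
```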

59
Continuous variables
  • Probability Distribution Function (PDF)
  • a.k.a. marginal probability

P(a ≤ x ≤ b) = ∫_a^b p(x) dx
[Figure: plot of p(x) against x]
Notation: P(x) is prob, p(x) is PDF
60
Continuous variables
  • Probability Distribution Function (PDF)
  • Let x ∈ ℝ
  • p(x) can be any function s.t.
  • ∫_{−∞}^{∞} p(x) dx = 1
  • p(x) ≥ 0
  • Define P(a ≤ x ≤ b) = ∫_a^b p(x) dx

61
Continuous Prob. Density Function
  • 1. Mathematical Formula
  • 2. Shows All Values, x, and Frequencies, f(x)
  • f(x) Is Not Probability
  • 3. Properties
  • ∫_{all x} f(x) dx = 1 (Area Under Curve)
  • f(x) ≥ 0, a ≤ x ≤ b

[Figure: curve of f(x) (Frequency) against x (Value) between a and b, with a point on the curve labeled (Value, Frequency)]
62
Continuous Random Variable Probability
P(c ≤ x ≤ d) = ∫_c^d f(x) dx
Probability Is Area Under Curve!
[Figure: curve of f(x) against X, with c and d marked on the horizontal axis]
63
Probability mass function
In probability theory, a probability mass
function (pmf) is a function that gives the
probability that a discrete random variable is
exactly equal to some value. A pmf differs from
a probability density function (pdf) in that the
values of a pdf, defined only for continuous
random variables, are not probabilities as such.
Instead, the integral of a pdf over a range of
possible values (a, b] gives the probability of
the random variable falling within that range.
[Figure: example graph of a pmf.] All the values of a pmf
must be non-negative and sum up to 1.
[Figure: the pmf of a fair die.] (All the numbers on the
die have an equal chance of appearing on top
when the die is rolled.)
64
Suppose that X is a discrete random variable,
taking values on some countable sample space S
⊆ ℝ. Then the probability mass function f_X(x)
for X is given by
f_X(x) = Pr(X = x) if x ∈ S, and f_X(x) = 0 if x ∈ ℝ \ S.
Note that this explicitly
defines f_X(x) for all real numbers, including
all values in ℝ that X could never take; indeed,
it assigns such values a probability of
zero.
Example. Suppose that X is the outcome of
a single coin toss, assigning 0 to tails and 1
to heads. The probability that X = x is 0.5 on
the state space {0, 1} (this is a Bernoulli
random variable), and hence the probability mass
function is
f_X(x) = 1/2 if x ∈ {0, 1}, and f_X(x) = 0 if x ∈ ℝ \ {0, 1}.
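
A pmf that is defined for every real number, and is zero off its support, can be sketched as a lookup with a default. The helper name `make_pmf` is an assumption made for this illustration.

```python
from fractions import Fraction


def make_pmf(support_probs):
    """Build f_X from a dict {value: probability}; f_X(x) = 0 off the support."""
    if sum(support_probs.values()) != 1:
        raise ValueError("probabilities on the support must sum to 1")

    def f(x):
        return support_probs.get(x, Fraction(0))

    return f


# Coin toss: 0 for tails, 1 for heads (a Bernoulli random variable).
coin = make_pmf({0: Fraction(1, 2), 1: Fraction(1, 2)})
# Fair die: each face 1..6 has probability 1/6.
die = make_pmf({face: Fraction(1, 6) for face in range(1, 7)})

print(coin(1), coin(0.5), die(6), die(7))  # 1/2 0 1/6 0
```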
65
Uniform Distribution
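  • 1. Equally Likely Outcomes
  • 2. Probability Density: f(x) = 1 / (d − c) for c ≤ x ≤ d
  • 3. Mean & Standard Deviation: μ = (c + d) / 2, σ = (d − c) / √12

[Figure: flat density f(x) between c and d, with Mean = Median marked]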
66
Uniform Distribution Example
  • You're the production manager of a soft drink
    bottling company. You believe that when a
    machine is set to dispense 12 oz., it really
    dispenses 11.5 to 12.5 oz. inclusive.
  • Suppose the amount dispensed has a uniform
    distribution.
  • What is the probability that less than 11.8 oz.
    is dispensed?

67
Uniform Distribution Solution
[Figure: uniform density f(x) of height 1.0 on 11.5 ≤ x ≤ 12.5, with x = 11.8 marked]
  • P(11.5 ≤ x ≤ 11.8) = (Base)(Height)
  • = (11.8 − 11.5)(1) = 0.30
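
The base-times-height computation is the uniform cumulative distribution function evaluated at 11.8. A minimal sketch follows; the function name `uniform_cdf` is illustrative.

```python
def uniform_cdf(x, c, d):
    """P(X <= x) for X uniform on [c, d]: area of a rectangle of height 1/(d - c)."""
    if x <= c:
        return 0.0
    if x >= d:
        return 1.0
    return (x - c) / (d - c)


# Machine dispenses between 11.5 and 12.5 oz.; probability of less than 11.8 oz.
p_less_than_11_8 = uniform_cdf(11.8, c=11.5, d=12.5)
print(round(p_less_than_11_8, 2))  # 0.3
```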

68
Normal Distribution
  • 1. Describes Many Random Processes or Continuous
    Phenomena
  • 2. Can Be Used to Approximate Discrete
    Probability Distributions
  • Example: Binomial
  • 3. Basis for Classical Statistical Inference
  • A.k.a. Gaussian distribution

69
Normal Distribution
  • 1. Bell-Shaped & Symmetrical
  • 2. Mean, Median, Mode Are Equal
  • 4. Random Variable Has Infinite Range

Mean
light-tailed distribution
70
Probability Density Function
  • f(x) = (1 / (σ √(2π))) · e^(−(x − μ)² / (2σ²))
  • f(x) = Frequency of Random Variable x
  • σ = Population Standard Deviation
  • π = 3.14159; e = 2.71828
  • x = Value of Random Variable (−∞ < x < ∞)
  • μ = Population Mean
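
The symbols listed on this slide are those of the standard normal (Gaussian) density. Below is a small sketch of that density, with a crude numerical check that the area under the curve is 1; the function name, the integration range of ±8 standard deviations, and the step size are choices made for this illustration.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density: exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# Riemann-sum approximation of the area under the standard normal curve
# on [-8, 8]; the tails beyond that range are negligible.
step = 0.001
area = sum(normal_pdf(-8.0 + k * step) * step for k in range(16000))

print(round(normal_pdf(0.0), 4), round(area, 6))
```

The density value at the mean is a height, not a probability; only areas under the curve are probabilities.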

71
(No Transcript)
72
Effect of Varying Parameters (μ & σ)
73
Normal Distribution Probability
Probability is area under curve!
74
Infinite Number of Tables
Normal distributions differ by mean & standard
deviation.
Each distribution would require its own table.
That's an infinite number!
75
Standardize the Normal Distribution
Z = (X − μ) / σ
Normal Distribution
Standardized Normal Distribution
One table!
76
Intuitions on Standardizing
  • Subtracting μ from each value X just moves the
    curve around, so values are centered on 0 instead
    of on μ
  • Once the curve is centered, dividing each value
    by σ > 1 moves all values toward 0, compressing the
    curve
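
The two steps (shift by the mean, then scale by the standard deviation) can be sketched as follows. The values μ = 5, σ = 10, and X = 6.2 are hypothetical and chosen only for illustration; the standard normal CDF is computed with `math.erf` in place of a printed table.

```python
import math


def standardize(x, mu, sigma):
    """Z = (X - mu) / sigma: center on 0, then rescale to unit spread."""
    return (x - mu) / sigma


def standard_normal_cdf(z):
    """P(Z <= z) for the standardized normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# HYPOTHETICAL example values, not taken from the slides.
mu, sigma = 5.0, 10.0
z = standardize(6.2, mu, sigma)
p_below = standard_normal_cdf(z)

print(round(z, 2), round(p_below, 4))
```

Because every normal distribution maps to the same standardized curve, one CDF function (or one table) is enough.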

77
Standardizing Example
Normal Distribution
78
Standardizing Example
Normal Distribution
Standardized Normal Distribution
79
Why use Gaussians?
  • Convenient analytic properties
  • Central Limit Theorem
  • Works well
  • Not for everything, but a good building block
  • For more reasons, see Bishop 1995, Jaynes
    2003

80
Rules for continuous PDFs
  • Same intuitions and rules apply
  • Sum rule: ∫_{−∞}^{∞} p(x) dx = 1
  • Product rule: p(x,y) = p(x|y) p(y)
  • Marginalizing: p(x) = ∫ p(x,y) dy
  • Bayes' Rule, conditioning, etc.

81
Multivariate distributions
Uniform: x ~ U(dom)
Gaussian: x ~ N(μ, Σ)