Title: A Quick Overview of Probability
1A Quick Overview of Probability
- Tom Mitchell
- Machine Learning 10-601
- Jan 21 2009
- a significant amount of this material is pilfered from Andrew Moore's slides and William Cohen's slides
- www.cs.cmu.edu/~awm/tutorials
- http://www.cs.cmu.edu/~tom/10601_sp08/slides/probability-1-23-2008.ppt
2The Problem of Induction
- David Hume (1711-1776) pointed out:
- (1) Empirically, induction seems to work.
- (2) Statement (1) is an application of induction.
- This stumped people for about 200 years.
3A Second Problem of Induction
- A black crow seems to support the hypothesis "all crows are black."
- A pink highlighter supports the hypothesis "all non-black things are non-crows."
- Thus, a pink highlighter supports the hypothesis "all crows are black."
4Probability Theory
- Events
- discrete random variables, continuous random variables, compound events
- Axioms of probability
- What defines a reasonable theory of uncertainty
- Independent events
- Conditional probabilities
- Bayes rule and beliefs
- Joint probability distribution
5Random Variables
- Informally, A is a random variable if
- A denotes something about which we are uncertain
- perhaps the outcome of a randomized experiment
- Examples
- A = True if a randomly drawn person from our class is female
- A = The hometown of a randomly drawn person from our class
- A = True if two randomly drawn persons from our class have the same birthday
- A = True if the 1,000,000,000,000th digit of pi is 7
- Define P(A) as "the fraction of possible worlds in which A is true"
- the set of possible worlds is called the sample space, S
- A random variable A is a function defined over S
- $A : S \to \{0, 1\}$
6A little formalism
- More formally, we have
- a sample space S (e.g., set of students in our class)
- aka the set of possible worlds
- a random variable is a function defined over the sample space
- $\text{Gender} : S \to \{m, f\}$
- $\text{Weight} : S \to \text{Reals}$
- an event is a subset of S
- e.g., the subset of S for which Gender = f
- e.g., the subset of S for which (Gender = m) AND (nationality = US)
- we're often interested in probabilities of specific events
- and specific events conditioned on other specific events
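The definitions on this slide can be sketched in a few lines of Python. This is only an illustration: the four students, their attributes, and the helper names `event` and `prob` are made up, and the sketch assumes every student is equally likely to be drawn.

```python
from fractions import Fraction

# Hypothetical sample space: each "world" is one student in a made-up class.
S = [
    {"name": "Ann", "gender": "f", "nationality": "US"},
    {"name": "Bob", "gender": "m", "nationality": "US"},
    {"name": "Chen", "gender": "m", "nationality": "CN"},
    {"name": "Dana", "gender": "f", "nationality": "CA"},
]

# A random variable is a function defined over the sample space.
def gender(world):
    return world["gender"]

def nationality(world):
    return world["nationality"]

# An event is a subset of S, picked out here by a predicate.
def event(predicate):
    return [w for w in S if predicate(w)]

# P(event) is the fraction of possible worlds that lie in the event
# (assumption: all worlds are equally likely).
def prob(ev):
    return Fraction(len(ev), len(S))

female = event(lambda w: gender(w) == "f")
male_and_us = event(lambda w: gender(w) == "m" and nationality(w) == "US")

print(prob(female))       # 1/2
print(prob(male_and_us))  # 1/4
```

Using `Fraction` keeps the "fraction of possible worlds" reading exact instead of relying on floating-point arithmetic.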
7Visualizing A
[Figure: the sample space of all possible worlds, with total area 1. A reddish oval marks the worlds in which A is true; the rest are worlds in which A is false. P(A) = area of the reddish oval.]
8The Axioms of Probability
- $0 \le P(A) \le 1$
- $P(\text{True}) = 1$
- $P(\text{False}) = 0$
- $P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$
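As a sanity check, the four axioms can be verified on a small finite sample space where P is the fraction of worlds in an event. The fair die and the events `A` and `B` below are assumptions chosen for illustration, not part of the original slides.

```python
from fractions import Fraction

# Hypothetical sample space: the six faces of a fair die, equally likely.
S = frozenset(range(1, 7))

def P(event):
    """Probability of an event (a subset of S) as a fraction of worlds."""
    return Fraction(len(event & S), len(S))

A = frozenset({2, 4, 6})  # "the roll is even"
B = frozenset({4, 5, 6})  # "the roll is at least 4"

assert 0 <= P(A) <= 1
assert P(S) == 1            # P(True): the event that holds in every world
assert P(frozenset()) == 0  # P(False): the event that holds in no world
assert P(A | B) == P(A) + P(B) - P(A & B)  # set union is "or", intersection is "and"

print(P(A | B))  # 2/3
```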
9(This is Andrew's joke)
The Axioms Of Probability
10These Axioms are Not to be Trifled With
- There have been many, many other approaches to understanding uncertainty:
- Fuzzy Logic, three-valued logic, Dempster-Shafer, non-monotonic reasoning, ...
- 25 years ago people in AI argued about these; now they mostly don't
- Any scheme for combining uncertain information, uncertain "beliefs", etc., really should obey these axioms
- If you gamble based on "uncertain beliefs", then you can be exploited by an opponent if and only if your uncertainty formalism violates the axioms - de Finetti 1931 (the "Dutch book argument")
11Interpreting the axioms
- $0 \le P(A) \le 1$
- $P(\text{True}) = 1$
- $P(\text{False}) = 0$
- $P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$
The area of A can't get any smaller than 0. And a zero area would mean no world could ever have A true.
12Interpreting the axioms
- $0 \le P(A) \le 1$
- $P(\text{True}) = 1$
- $P(\text{False}) = 0$
- $P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$
The area of A can't get any bigger than 1. And an area of 1 would mean all worlds will have A true.
15Theorems from the Axioms
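- $0 \le P(A) \le 1$, $P(\text{True}) = 1$, $P(\text{False}) = 0$
- $P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$
- $\Rightarrow P(\text{not } A) = P(\lnot A) = 1 - P(A)$
$P(A \text{ or } \lnot A) = 1$ and $P(A \text{ and } \lnot A) = 0$
$P(A \text{ or } \lnot A) = P(A) + P(\lnot A) - P(A \text{ and } \lnot A)$
$1 = P(A) + P(\lnot A) - 0$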
16Elementary Probability in Pictures
[Figure: the sample space split into the region A and its complement, not A.]
17Another useful theorem
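- $0 \le P(A) \le 1$, $P(\text{True}) = 1$, $P(\text{False}) = 0$
- $P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$
- $\Rightarrow P(A) = P(A \land B) + P(A \land \lnot B)$
$A = A \text{ and } (B \text{ or } \lnot B) = (A \text{ and } B) \text{ or } (A \text{ and } \lnot B)$
$P(A) = P(A \text{ and } B) + P(A \text{ and } \lnot B) - P((A \text{ and } B) \text{ and } (A \text{ and } \lnot B))$
$P(A) = P(A \text{ and } B) + P(A \text{ and } \lnot B) - P(A \text{ and } A \text{ and } B \text{ and } \lnot B)$
The last term is $P(\text{False}) = 0$, which gives the theorem.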
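Both theorems derived from the axioms, the complement rule and the split of P(A) over B and not B, can be checked numerically. The sketch below assumes a fair six-sided die and two arbitrary events; these choices are illustrative and do not come from the slides.

```python
from fractions import Fraction

# Hypothetical sample space: a fair six-sided die.
S = frozenset(range(1, 7))

def P(event):
    """Fraction of equally likely worlds in which the event holds."""
    return Fraction(len(event), len(S))

A = frozenset({1, 2, 3, 4})
B = frozenset({2, 4, 6})
not_A = S - A  # complement of A within the sample space
not_B = S - B

# Theorem 1: P(not A) = 1 - P(A)
p_not_a = P(not_A)

# Theorem 2: P(A) = P(A and B) + P(A and not B)
p_a_split = P(A & B) + P(A & not_B)

print(p_not_a, 1 - P(A))  # 1/3 1/3
print(p_a_split, P(A))    # 2/3 2/3
```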
18Elementary Probability in Pictures
[Figure: Venn diagram in which B and not B split the region A into the two parts "A and B" and "A and not B".]
19Multivalued Discrete Random Variables
- Suppose A can take on more than 2 values
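- A is a random variable with arity k if it can take on exactly one value out of $\{v_1, v_2, \ldots, v_k\}$
- Thus $P(A = v_i \land A = v_j) = 0$ if $i \ne j$, and $P(A = v_1 \lor A = v_2 \lor \cdots \lor A = v_k) = 1$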
20Elementary Probability in Pictures
[Figure: the sample space partitioned into five regions, A=1 through A=5.]
21More about Multivalued Random Variables
- Using the axioms of probability
- $0 \le P(A) \le 1$, $P(\text{True}) = 1$, $P(\text{False}) = 0$
- $P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$
- And assuming that A obeys
22More about Multivalued Random Variables
- Using the axioms of probability and assuming that A obeys
23Definition of Conditional Probability
$$P(A \mid B) = \frac{P(A \land B)}{P(B)}$$
Corollary: The Chain Rule
$$P(A \land B) = P(A \mid B)\,P(B)$$
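The definition and the chain rule can be illustrated by counting worlds. The sketch below assumes a sample space of two fair die rolls; the events and the helper names `P` and `P_given` are hypothetical and chosen only for this example.

```python
from fractions import Fraction

# Hypothetical sample space: ordered outcomes of two fair die rolls.
S = [(i, j) for i in range(1, 7) for j in range(1, 7)]

def P(pred):
    """Fraction of equally likely worlds satisfying the predicate."""
    return Fraction(sum(1 for w in S if pred(w)), len(S))

def P_given(pred_a, pred_b):
    """P(A | B) = P(A and B) / P(B); undefined when P(B) = 0."""
    p_b = P(pred_b)
    if p_b == 0:
        raise ValueError("P(B) is zero, so P(A | B) is undefined")
    return P(lambda w: pred_a(w) and pred_b(w)) / p_b

def sum_is_8(w):
    return w[0] + w[1] == 8

def first_is_even(w):
    return w[0] % 2 == 0

p_cond = P_given(sum_is_8, first_is_even)
p_joint = P(lambda w: sum_is_8(w) and first_is_even(w))

print(p_cond)  # 1/6
# Chain rule: P(A and B) = P(A | B) * P(B)
print(p_joint == p_cond * P(first_is_even))  # True
```

Conditioning on B amounts to shrinking the sample space to the worlds where B holds and then measuring A inside it.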
24Conditional Probability in Pictures
[Figure: the sample space partitioned into regions A=1 through A=5, picturing $P(B \mid A = 2)$.]
25Independent Events
- Definition: two events A and B are independent if $\Pr(A \text{ and } B) = \Pr(A) \cdot \Pr(B)$.
- Intuition: the outcome of A has no effect on the outcome of B (and vice versa).
- We need to assume the different rolls are independent to solve the problem.
- You almost always need to assume independence of something to solve any learning problem.
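The product definition can be tested directly on a finite sample space. The sketch below assumes two fair die rolls with all 36 ordered outcomes equally likely, which is itself the independence assumption the slide mentions; the specific events and the helper `independent` are illustrative.

```python
from fractions import Fraction

# Hypothetical sample space: two fair die rolls, all 36 outcomes equally likely.
S = [(i, j) for i in range(1, 7) for j in range(1, 7)]

def P(pred):
    return Fraction(sum(1 for w in S if pred(w)), len(S))

def independent(pred_a, pred_b):
    """True when P(A and B) equals P(A) * P(B)."""
    p_both = P(lambda w: pred_a(w) and pred_b(w))
    return p_both == P(pred_a) * P(pred_b)

def first_is_6(w):
    return w[0] == 6

def second_is_6(w):
    return w[1] == 6

def sum_at_least_11(w):
    return w[0] + w[1] >= 11

print(independent(first_is_6, second_is_6))      # True
print(independent(first_is_6, sum_at_least_11))  # False
```

The second pair fails the test because knowing the first roll is a 6 makes a large sum more likely.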
26Picture A independent of B
27Bayes rule
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
Here $P(A)$ is the prior and $P(A \mid B)$ is the posterior.
Bayes, Thomas (1763). An essay towards solving a problem in the doctrine of chances. Philosophical Transactions of the Royal Society of London, 53:370-418.
"... by no means merely a curious speculation in the doctrine of chances, but necessary to be solved in order to a sure foundation for all our reasonings concerning past facts, and what is likely to be hereafter ... necessary to be considered by any that would give a clear account of the strength of analogical or inductive reasoning ..."
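Bayes rule, P(H | E) = P(E | H) P(H) / P(E), follows from the definition of conditional probability on slide 23. A minimal sketch of a prior-to-posterior update is given here; the hypothesis, the evidence, and all three input numbers are invented for illustration and are not taken from the slides.

```python
from fractions import Fraction

def posterior(prior, likelihood, likelihood_if_not):
    """P(H | E) by Bayes rule.

    prior             = P(H)
    likelihood        = P(E | H)
    likelihood_if_not = P(E | not H)
    P(E) is expanded as P(E | H) P(H) + P(E | not H) P(not H).
    """
    evidence = likelihood * prior + likelihood_if_not * (1 - prior)
    if evidence == 0:
        raise ValueError("P(E) is zero, so P(H | E) is undefined")
    return likelihood * prior / evidence

# Made-up numbers: a rare hypothesis and a fairly unspecific symptom.
p_h = Fraction(1, 100)             # prior P(H)
p_e_given_h = Fraction(4, 5)       # P(E | H)
p_e_given_not_h = Fraction(1, 10)  # P(E | not H)

post = posterior(p_h, p_e_given_h, p_e_given_not_h)
print(post)         # 8/107
print(float(post))
```

Even with a likelihood of 0.8, the posterior stays small because the prior is small and the evidence is common among worlds where H is false.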
28More General Forms of Bayes Rule
37The Joint Distribution
Example: Boolean variables A, B, C
A B C Prob
0 0 0 0.30
0 0 1 0.05
0 1 0 0.10
0 1 1 0.05
1 0 0 0.05
1 0 1 0.10
1 1 0 0.25
1 1 1 0.10
- Recipe for making a joint distribution of M variables:
- Make a truth table listing all combinations of values of your variables (if there are M Boolean variables then the table will have $2^M$ rows).
- For each combination of values, say how probable it is.
- If you subscribe to the axioms of probability, those numbers must sum to 1.
[Figure: Venn diagram of A, B, and C with each region labeled by its probability from the table.]
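The three-step recipe can be sketched in Python using the probabilities from the A, B, C table on this slide. The dictionary representation keyed by (A, B, C) tuples is an assumption made for illustration.

```python
from fractions import Fraction
from itertools import product

# Probabilities copied from the A, B, C table, in truth-table order.
probs = ["0.30", "0.05", "0.10", "0.05", "0.05", "0.10", "0.25", "0.10"]

# Step 1: truth table of all 2**M combinations for M = 3 Boolean variables.
M = 3
rows = list(product([0, 1], repeat=M))

# Step 2: say how probable each combination is.
joint = {row: Fraction(p) for row, p in zip(rows, probs)}

# Step 3: under the axioms of probability, the numbers must sum to 1.
assert len(joint) == 2 ** M
assert sum(joint.values()) == 1

print(joint[(1, 1, 0)])  # 1/4
```

`Fraction` parses the decimal strings exactly, so the sum-to-1 check does not depend on floating-point rounding.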
38Using the Joint
Once you have the joint distribution (JD), you can ask for the probability of any logical expression involving your attributes.
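One way to read this: the probability of an expression is the sum of the probabilities of the rows that satisfy it. The sketch below applies that to the A, B, C table from slide 37 (the census-style data behind the Poor/Male numbers is not in this transcript); the helper name `prob` and the example expressions are illustrative.

```python
from fractions import Fraction

# Joint distribution over (A, B, C), copied from the slide 37 table.
joint = {
    (0, 0, 0): Fraction("0.30"), (0, 0, 1): Fraction("0.05"),
    (0, 1, 0): Fraction("0.10"), (0, 1, 1): Fraction("0.05"),
    (1, 0, 0): Fraction("0.05"), (1, 0, 1): Fraction("0.10"),
    (1, 1, 0): Fraction("0.25"), (1, 1, 1): Fraction("0.10"),
}

def prob(expr):
    """Sum the probabilities of all rows for which expr(a, b, c) is true."""
    return sum((p for (a, b, c), p in joint.items() if expr(a, b, c)),
               Fraction(0))

p_a = prob(lambda a, b, c: a == 1)
p_a_or_b = prob(lambda a, b, c: a == 1 or b == 1)
p_a_and_not_c = prob(lambda a, b, c: a == 1 and c == 0)

print(p_a)            # 1/2
print(p_a_or_b)       # 13/20
print(p_a_and_not_c)  # 3/10
```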
39Using the Joint
$P(\text{Poor} \land \text{Male}) = 0.4654$
40Using the Joint
$P(\text{Poor}) = 0.7604$
41Inference with the Joint
42Inference with the Joint
$P(\text{Male} \mid \text{Poor}) = 0.4654 / 0.7604 = 0.612$
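Inference with the joint combines the two previous ideas: a conditional probability is the ratio of two row sums. Since the table behind the Poor/Male figures is not in this transcript, the sketch below uses the A, B, C table from slide 37 and only re-checks the slide's arithmetic at the end; the helper names are hypothetical.

```python
from fractions import Fraction

# Joint distribution over (A, B, C), copied from the slide 37 table.
joint = {
    (0, 0, 0): Fraction("0.30"), (0, 0, 1): Fraction("0.05"),
    (0, 1, 0): Fraction("0.10"), (0, 1, 1): Fraction("0.05"),
    (1, 0, 0): Fraction("0.05"), (1, 0, 1): Fraction("0.10"),
    (1, 1, 0): Fraction("0.25"), (1, 1, 1): Fraction("0.10"),
}

def prob(expr):
    """Sum of the probabilities of the rows matching expr."""
    return sum((p for row, p in joint.items() if expr(*row)), Fraction(0))

def prob_given(e1, e2):
    """P(E1 | E2) = P(E1 and E2) / P(E2), both computed from the joint."""
    p_e2 = prob(e2)
    if p_e2 == 0:
        raise ValueError("conditioning event has probability zero")
    return prob(lambda a, b, c: e1(a, b, c) and e2(a, b, c)) / p_e2

p_a_given_b = prob_given(lambda a, b, c: a == 1, lambda a, b, c: b == 1)
print(p_a_given_b)  # 7/10

# The slide's own numbers: P(Male | Poor) = P(Poor and Male) / P(Poor).
print(round(0.4654 / 0.7604, 3))  # 0.612
```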
51Inference is a big deal
- I've got this evidence. What's the chance that this conclusion is true?
- I've got a sore neck: how likely am I to have meningitis?
- I see my lights are out and it's 9pm. What's the chance my spouse is already asleep?