Title: Basic Concepts and Definitions in Probability-Based Reasoning
1. Basic Concepts and Definitions in Probability-Based Reasoning
- Robert J. Mislevy
- University of Maryland
- September 13, 2004
2. A quote from Glenn Shafer
- "Probability is not really about numbers; it is about the structure of reasoning."
- Glenn Shafer, quoted in Pearl, 1988, p. 77
3. Views of Probability
- Two conceptions of probability
- Aleatory (chance)
- Long-run frequencies, mechanisms
- Probability is a property of the world
- Degree of belief (subjective)
- Probability is a property of Your state of knowledge (de Finetti) and Your model of the situation
- Same formal definitions and machinery
- Aleatory paradigms as an analogical basis for degree of belief (Glenn Shafer)
4. Frames of discernment
- A frame of discernment is the set of all possible combinations of values of the variables you are working with (Shafer, 1976).
- Discern: to detect, recognize, distinguish
- A property of you as much as a property of the world
- Depends on what you know and what your purpose is
- A frame of discernment can evolve over time
- Medical diagnosis
- Document literacy example (more information)
5. Frames of Discernment in Assessment
- In the Student Model: determining which aspects of skill and knowledge to use as explicit SM variables (psychological perspective, grain size, reporting requirements)
- In the Evidence Model: evidence identification (task scoring); evaluation rules map from a unique work product to common observed variables
- In the Task Model: which aspects of situations are important to keep track of and manipulate in task design, to achieve the assessment's purpose?
- Features vs. values of variables
6. (Random) Variables
- We will start with variables that have a finite number of possible values.
- Denote a random variable by upper case, say X.
- Denote particular values and generic values by lower case, x.
- Y is the outcome of a coin flip: y ∈ {h, t}.
- Xi is the answer to Item i: xi ∈ {0, 1}.
7. Finite Probability Distributions
- Finite set of possible values x1, …, xn
- Prob(X = xj), P(X = xj), or more simply p(xj), is the probability that X takes the value xj.
- 0 ≤ p(xj) ≤ 1.
- Σj p(xj) = 1.
- P(X = xj or X = xm) = p(xj) + p(xm) for distinct values xj, xm (see the sketch below).
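A minimal sketch (not from the slides) of these three properties in Python, using a hypothetical fair six-sided die as the finite variable X:

```python
# Hypothetical pmf for a fair six-sided die: p(x) for x = 1..6.
pmf = {x: 1/6 for x in range(1, 7)}

# Each probability lies between 0 and 1.
assert all(0 <= p <= 1 for p in pmf.values())

# The probabilities sum to 1 (up to floating-point rounding).
assert abs(sum(pmf.values()) - 1) < 1e-9

# Additivity for distinct values: P(X = 1 or X = 2) = p(1) + p(2).
print(pmf[1] + pmf[2])  # 0.333...
```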
8. Continuous Probability Distributions
- Infinitely many possible values, e.g., {x : x ∈ [0,1]}, {x : x ∈ (−∞, ∞)}
- Events A1, …, Am are sets of possible values
- e.g., A1 = {x : x < 0}, A2 = {x : x ∈ (0,1)}, A3 = {x : x > 0}, …
- P(Aj) is the probability that X takes a value in Aj
- 0 ≤ P(Aj) ≤ 1.
- If A1, …, Am are disjoint events that exhaust all possible values of x, then P(A1) + … + P(Am) = 1.
- If Aj and Ak are disjoint events, P(Aj ∪ Ak) = P(Aj) + P(Ak) (see the sketch below).
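A sketch of the same ideas for a continuous variable, under the assumption (not in the slides) that X is uniform on (0, 1), so the probability of an interval event is just its length:

```python
# P(X in (lo, hi)) for X ~ Uniform(0, 1): the interval's length,
# clipped to (0, 1).
def p_interval(lo, hi):
    return max(0.0, min(hi, 1.0) - max(lo, 0.0))

A2 = (0.0, 0.5)
A3 = (0.5, 1.0)

# A2 and A3 are disjoint and exhaust (0, 1), so their probabilities
# add up to P((0, 1)) = 1.
assert p_interval(*A2) + p_interval(*A3) == p_interval(0.0, 1.0) == 1.0
```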
9. Jensen's Icy Road Example
"Police Inspector Smith is impatiently awaiting the arrival of Mr. Holmes and Dr. Watson. They are late, and Inspector Smith has another important appointment (lunch). Looking out the window he wonders whether the roads are icy. Both are notoriously bad drivers, so if the roads are icy they are likely to crash. His secretary enters and tells him that Dr. Watson has had a car accident. 'Watson? OK. It could be worse: icy roads! Then Holmes has most probably crashed too. I'll go for lunch now.' 'Icy roads?' the secretary replies. 'It is far from being that cold, and furthermore all of the roads are salted.' Inspector Smith is relieved. 'Bad luck for Watson. Let us give Holmes ten minutes more.'" (Jensen, 1996, p. 7)
Jensen, F.V. (1996). An introduction to Bayesian networks. New York: Springer-Verlag.
10. From the Icy Road Example
- Ice: Is there an icy road?
- Values: Yes, No
- Initial probabilities: (.7, .3)
- (Note the choice of values for the variable Icy Road.)
11. Icy Road Probabilities

Ice    P(Ice)
Yes    .7      (P(Ice = yes))
No     .3      (P(Ice = no))
12. Graph representation
[Figure: a single node X, the variable]
13. Hypergraph representation
[Figure: the variable node X with an attached function node p(x), its probability distribution]
14. Joint probability distributions
- Two random variables, X and Y
- P(X = xj, Y = yk), or p(xj, yk), is the probability that X takes the value xj and Y takes the value yk.
- 0 ≤ p(xj, yk) ≤ 1.
- Σj Σk p(xj, yk) = 1 (see the sketch below).
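A small sketch of a joint distribution as a table, with made-up numbers for two hypothetical binary variables X and Y:

```python
# Joint distribution p(x, y) stored as a dict keyed by (x, y) pairs.
joint = {
    (0, 0): 0.3, (0, 1): 0.2,
    (1, 0): 0.1, (1, 1): 0.4,
}

# Every entry is in [0, 1], and the entries sum to 1 over all pairs.
assert all(0 <= p <= 1 for p in joint.values())
assert abs(sum(joint.values()) - 1) < 1e-9
```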
15. Marginal probability distributions (1)
- Two discrete random variables, X and Y
- Recall P(X = xj, Y = yk), or p(xj, yk), is the probability that X takes the value xj and Y takes the value yk.
- The marginal probability of a value xj of X is the sum over all the possible joint probabilities p(xj, yk) with that value of X:
- p(xj) = Σk p(xj, yk) (see the sketch below).
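Continuing with the same made-up joint table, marginalizing X means summing p(x, y) over all values of y:

```python
joint = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}

# Marginal p(x): sum the joint probabilities over all values of y.
def marginal_x(x):
    return sum(p for (xi, yi), p in joint.items() if xi == x)

print(marginal_x(0))  # 0.3 + 0.2 = 0.5
print(marginal_x(1))  # 0.1 + 0.4 = 0.5
```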
16. Conditional probability distributions
- Two random variables, X and Y
- P(X = xj | Y = yk), or p(xj | yk), is the probability that X takes the value xj given that Y takes the value yk.
- This is how we express relationships among real-world phenomena:
- Coin flip: p(heads) vs. p(heads | BobReport)
- P(heart attack | age, family history, blood pressure)
- P(February 10 high temperature | geographical location, February 9 high temperature)
- IRT: P(Xj = 1) vs. P(Xj = 1 | θ)
17. Conditional probability distributions
- Two discrete random variables, X and Y
- P(X = xj | Y = yk), or p(xj | yk), is the probability that X takes the value xj given that Y takes the value yk.
- 0 ≤ p(xj | yk) ≤ 1 for each given yk.
- Σj p(xj | yk) = 1 for each given yk.
- P(X = xj or X = xm | Y = yk) = p(xj | yk) + p(xm | yk) (see the sketch below).
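With the same made-up joint table as above, a conditional probability is a joint probability divided by the marginal of the conditioning value; the sketch also checks that, for each fixed y, the conditional probabilities sum to 1:

```python
joint = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}

def p_y(y):  # marginal of Y
    return sum(p for (xi, yi), p in joint.items() if yi == y)

def p_x_given_y(x, y):  # p(x | y) = p(x, y) / p(y)
    return joint[(x, y)] / p_y(y)

print(p_x_given_y(0, 0))  # 0.3 / 0.4 = 0.75
# For each fixed y, the conditional probabilities sum to 1.
assert abs(p_x_given_y(0, 0) + p_x_given_y(1, 0) - 1) < 1e-9
```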
18. A relationship between joint and conditional probability distributions
- p(xj, yk) = p(xj | yk) p(yk)
            = p(yk | xj) p(xj).
- Basis of Bayes' theorem:
- p(yk | xj) p(xj) = p(xj | yk) p(yk)
- ⇒ p(yk | xj) = p(xj | yk) p(yk) / p(xj) (verified numerically below).
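A quick numeric check of both factorizations and the resulting Bayes' theorem, again with the made-up joint table:

```python
joint = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}
p_x = lambda x: sum(p for (xi, _), p in joint.items() if xi == x)
p_y = lambda y: sum(p for (_, yi), p in joint.items() if yi == y)

x, y = 0, 1
p_x_given_y = joint[(x, y)] / p_y(y)
p_y_given_x = joint[(x, y)] / p_x(x)

# Both factorizations recover the joint probability ...
assert abs(p_x_given_y * p_y(y) - joint[(x, y)]) < 1e-9
assert abs(p_y_given_x * p_x(x) - joint[(x, y)]) < 1e-9
# ... hence p(y | x) = p(x | y) p(y) / p(x).
assert abs(p_y_given_x - p_x_given_y * p_y(y) / p_x(x)) < 1e-9
```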
19. Graph representation
[Figure: an arrow from node X, the parent variable, to node Y, the child variable]
20. Hypergraph representation
[Figure: parent variable X with its marginal probability distribution p(x); child variable Y with its conditional probability distribution p(y|x) given the parent]
21. Hypergraph representation
[Figure: parent variables X and Z with marginal probability distributions p(x) and p(z); child variable Y with conditional probability distribution p(y|x,z) given its parents]
22. Marginal probability distributions (2)
- Two discrete random variables, X and Y
- Recall p(xj | yk) is the probability that X = xj given Y = yk.
- The marginal probability of a value of X is the sum of its conditional probabilities given all possible values of Y, each weighted by the probability of that value of Y:
- p(xj) = Σk p(xj | yk) p(yk) (see the sketch below).
23. Bayes' Theorem
- The setup, with two random variables, X and Y:
- You know the conditional probabilities p(xj | yk), which tell you what to believe about X if you knew the value of Y.
- You learn X = x; what should you believe about Y?
- You combine two things:
- Relative conditional probabilities (the likelihood)
- Previous probabilities of the Y values (the prior)
- posterior ∝ likelihood × prior
24. From the Icy Road Example
- Ice: Is there an icy road?
- Values: Yes, No
- Initial probabilities: (.7, .3)
- Watson: Does Watson have a car crash?
- Values: Yes, No
- Probabilities conditional on Icy Road:
- (.8, .2) if Ice = Yes; (.1, .9) if Ice = No.
25. Icy Road Conditional Probabilities

p(Watson | Ice):

             Watson = Yes   Watson = No
Ice = Yes        .8             .2
Ice = No         .1             .9

e.g., p(Watson = yes | Ice = yes) = .8 and p(Watson = no | Ice = yes) = .2.
27. Icy Road Likelihoods

If Watson = no, the likelihood for each value of Ice is read from the Watson = No column:

             Watson = Yes   Watson = No
Ice = Yes        .8            [.2]   p(Watson = no | Ice = yes)
Ice = No         .1            [.9]   p(Watson = no | Ice = no)

Note the 2/9 ratio.
28. Icy Road Likelihoods

If Watson = yes, the likelihood for each value of Ice comes from the Watson = Yes column:

             Watson = Yes   Watson = No
Ice = Yes       [.8]            .2    p(Watson = yes | Ice = yes)
Ice = No        [.1]            .9    p(Watson = yes | Ice = no)

Note the 8/1 ratio.
29. Icy Road Bayes' Theorem: If Watson = yes

Prior × Likelihood ∝ Posterior

Ice    Prior   Likelihood
Yes     .7        .8
No      .3        .1
30. Icy Road Bayes' Theorem: If Watson = yes

Prior × Likelihood ∝ Posterior

Ice    Prior   Likelihood   Prior × Likelihood
Yes     .7        .8              .56
No      .3        .1              .03

Note: the products sum to .59, not 1.00. These aren't probabilities.
31. Icy Road Bayes' Theorem: If Watson = yes

Prior × Likelihood ∝ Posterior

Ice    Prior   Likelihood   Prior × Likelihood   Posterior
Yes     .7        .8              .56               .95
No      .3        .1              .03               .05

Divide through by the normalizing constant .59 to get the posterior probabilities (see the computation below).
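The whole update as a computation, using only numbers given on the slides:

```python
prior = {"yes": 0.7, "no": 0.3}       # p(Ice)
likelihood = {"yes": 0.8, "no": 0.1}  # p(Watson = yes | Ice)

# Prior times likelihood, value by value.
product = {z: prior[z] * likelihood[z] for z in prior}
print(product)    # ~{'yes': 0.56, 'no': 0.03}; not probabilities

# Normalize by the sum (.59) to get posterior probabilities.
const = sum(product.values())
posterior = {z: product[z] / const for z in prior}
print(posterior)  # ~{'yes': 0.949, 'no': 0.051}
```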
32. Independence
- Independence:
- The probability of the joint occurrence of values of two variables is always equal to the product of their individual probabilities:
- P(X = x, Y = y) = P(X = x) P(Y = y).
- Equivalent to saying that learning the value of one of the variables does not change your belief about the other (see the sketch below).
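A sketch of checking independence directly from a joint table, with made-up numbers chosen so that the check passes:

```python
joint = {(0, 0): 0.06, (0, 1): 0.24, (1, 0): 0.14, (1, 1): 0.56}
p_x = lambda x: sum(p for (xi, _), p in joint.items() if xi == x)
p_y = lambda y: sum(p for (_, yi), p in joint.items() if yi == y)

# X and Y are independent iff p(x, y) = p(x) p(y) for every cell.
independent = all(abs(joint[(x, y)] - p_x(x) * p_y(y)) < 1e-9
                  for (x, y) in joint)
print(independent)  # True
```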
33. Conditional independence
- Conditional independence:
- The conditional probability of the joint occurrence, given the value of another variable, is always equal to the product of the conditional probabilities:
- P(X = x, Y = y | Z = z) = P(X = x | Z = z) P(Y = y | Z = z) (see the sketch below).
34. Conditional independence
- "Conditional independence is not a grace of nature for which we must wait passively, but rather a psychological necessity which we satisfy actively by organizing our knowledge in a specific way.
- An important tool in such organization is the identification of intermediate variables that induce conditional independence among observables; if such variables are not in our vocabulary, we create them.
- In medical diagnosis, for instance, when some symptoms directly influence one another, the medical profession invents a name for that interaction (e.g., 'syndrome', 'complication', 'pathological state') and treats it as a new auxiliary variable that induces conditional independence; dependency between any two interacting systems is fully attributed to the dependencies of each on the auxiliary variable." (Pearl, 1988, p. 44)
35. Example: Icy Road