Discrete Bayes Nets (Presentation Transcript)

1
Discrete Bayes Nets
  • Robert J. Mislevy
  • University of Maryland
  • September 13, 2004

2
Independence
  • If two discrete random variables are independent,
    the probability of the joint occurrence of values
    of the two variables is equal to the product of
    the individual probabilities:
  • P(X=x, Y=y) = P(X=x) P(Y=y).
  • Example: Flip a dime, flip a penny.
  • Also, P(X=x | Y=y) = P(X=x): learning the value of
    Y does not influence your belief about X.

[Figure: two unconnected nodes, Dime and Penny]
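
A minimal Python sketch (illustrative, not from the slides) of
the dime/penny case: the joint is built under independence, and
P(X=x | Y=y) comes out equal to P(X=x).

    # Hypothetical sketch: independence of two fair coin flips (dime and penny).
    from itertools import product

    p_dime = {"h": 0.5, "t": 0.5}    # P(X = x)
    p_penny = {"h": 0.5, "t": 0.5}   # P(Y = y)

    # Joint distribution built under the independence assumption:
    # P(X = x, Y = y) = P(X = x) P(Y = y).
    joint = {(x, y): p_dime[x] * p_penny[y] for x, y in product(p_dime, p_penny)}

    # Learning Y does not change beliefs about X: P(X = x | Y = y) = P(X = x).
    p_y_h = sum(p for (x, y), p in joint.items() if y == "h")
    p_x_h_given_y_h = joint[("h", "h")] / p_y_h
    print(p_x_h_given_y_h)  # 0.5, the same as p_dime["h"]
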
3
Conditional independence
  • If two variables are conditionally independent,
    the conditional probability of their joint
    occurrence given the value of another variable is
    equal to the product of the conditional
    probabilities:
  • P(X=x, Y=y | Z=z) = P(X=x | Z=z) P(Y=y | Z=z).
  • Also, P(X=x | Y=y, Z=z) = P(X=x | Z=z). It may be
    that learning Z will influence what you believe
    about X and about Y, but if you know the value of
    Z, learning the value of Y does not influence
    your belief about X.

4
Example of conditional independence: Two flips of
the same biased coin
  • Two coins, with the probability of heads being .2
    for Coin 1 and .7 for Coin 2.
  • One of these coins is selected--say with 50-50
    probability--and flipped twice.
  • Flips are independent IF you know which coin was
    flipped, but dependent if you don't...

[Figure: node Coin (which coin) with arrows to Flip1 and Flip2]
5
Two flips of the same biased coin
1. Initial status.
2. Observe Flip1 = h: the prob for Flip2 = h
   increases. Why?
3. Status if you know Coin = 1.
4. Observe Flip1 = h when you know Coin = 1: no
   change in the prob of Flip2 = h. Why not?
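
A small Python sketch (the .2, .7, and 50-50 numbers come from
the slide; the code itself is only illustrative) makes the
contrast concrete: marginally the two flips are dependent, but
given the coin they are independent.

    # Two biased coins, one chosen at random and flipped twice.
    p_coin = {1: 0.5, 2: 0.5}        # P(Coin = c)
    p_heads = {1: 0.2, 2: 0.7}       # P(Flip = h | Coin = c)

    # Marginally, the flips are dependent:
    p_f2_h = sum(p_coin[c] * p_heads[c] for c in p_coin)          # P(Flip2 = h) = .45
    p_f1h_f2h = sum(p_coin[c] * p_heads[c] ** 2 for c in p_coin)  # P(Flip1 = h, Flip2 = h) = .265
    p_f2h_given_f1h = p_f1h_f2h / p_f2_h   # P(Flip1 = h) also = .45; result is about .589 > .45

    # Conditionally on the coin, they are independent:
    p_f2h_given_f1h_coin1 = p_heads[1]   # knowing Flip1 adds nothing once Coin is known
    print(p_f2_h, p_f2h_given_f1h, p_f2h_given_f1h_coin1)
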
6
The heart of measurement models: Pearl on
conditional independence
  • "Conditional independence is not a grace of
    nature for which we must wait passively, but
    rather a psychological necessity which we satisfy
    actively by organizing our knowledge in a
    specific way.
  • An important tool in such organization is the
    identification of intermediate variables that
    induce conditional independence among
    observables; if such variables are not in our
    vocabulary, we create them.
  • In medical diagnosis, for instance, when some
    symptoms directly influence one another, the
    medical profession invents a name for that
    interaction (e.g., syndrome, complication,
    pathological state) and treats it as a new
    auxiliary variable that induces conditional
    independence; dependency between any two
    interacting systems is fully attributed to the
    dependencies of each on the auxiliary variable."
    (Pearl, 1988, p. 44)

7
Building up complex networks
  • Interrelationships among many variables modeled
    in terms of important relationships among smaller
    subsets of variables
  • (sometimes unobservable ones).

8
Building up complex networks
  • Recursive representation of probability
    distributions
  • All orderings are equally correct, but some are
    more beneficial because they capitalize on
    causal, dependence, time-order, or theoretical
    relationships that we posit
  • Terms simplify when there is conditional
    independence.

9
Jensen's Wet Lawn Example
  • Mr Holmes now lives in Los Angeles. One morning
    when Holmes leaves his house, he realizes that
    his lawn is wet. Is it due to rain (R), or has
    he forgotten to turn off his sprinkler (S)? His
    belief in both events increases.
  • Next he notices that the grass of his neighbor,
    Dr. Watson, is also wet. Elementary! Holmes is
    almost certain that it has been raining. (p. 8)

Jensen, F.V. (1996). An introduction to Bayesian
networks. New York: Springer-Verlag.
10
Jensen's Wet Lawn Example
  • p(holmes, watson, rain, sprinkler)
  • = p(holm | wat, rn, sprnk) x p(wat | rn, sprnk) x
    p(rn | sprnk) x p(sprnk)
  • = p(holm | rn, sprnk) x p(wat | rn) x
    p(rn) x p(sprnk)
  • whereas...

11
Jensen's Wet Lawn Example
  • p(rain, sprinkler, watson, holmes)
  • = p(rn | sprnk, wat, holm) x p(sprnk | wat, holm) x
    p(wat | holm) x p(holm)
  • This doesn't simplify. You get the same
    answers, but less efficiently.

12
Building up complex networks
  • Acyclic directed graphs (DAGs)
  • Nodes are variables, edges are arrows; no
    directed cycles (loops) are allowed.
  • The relationship between recursive
    representations and acyclic directed graphs:
  • Edges (arrows) represent explicit dependence
    relationships.
  • No edge means no explicit dependence, although
    there can be dependence through relationships
    with other variables.
  • There can be conditional independence
    relationships that are not revealed in a DAG
    (e.g., the inefficient Wet Lawn representation).

13
Computation in Bayes nets
  • Concepts and basics of the computing strategy
  • Chapter 5 of Almond, Mislevy, Steinberg,
    Williamson, & Yan (in progress), Bayesian
    networks in educational assessment.
  • For more detail, see:
  • Jensen, F.V. (1996). An introduction to Bayesian
    networks. New York: Springer-Verlag.
  • Lauritzen, S.L., & Spiegelhalter, D.J. (1988).
    Local computations with probabilities on
    graphical structures and their application to
    expert systems (with discussion). Journal of the
    Royal Statistical Society, Series B, 50, 157-224.

14
Why does Bayes' Theorem work?
  • The setup, with two random variables, X and Y.
  • Joint probabilities:

          Y=y1        Y=y2        Total
X=x1      p(x1,y1)    p(x1,y2)    p(x1)
X=x2      p(x2,y1)    p(x2,y2)    p(x2)
X=x3      p(x3,y1)    p(x3,y2)    p(x3)
Total     p(y1)       p(y2)
15
Why does Bayes' Theorem work?
  • The setup, with two random variables, X and Y.
  • Joint probabilities:

          Y=y1        Y=y2        Total
X=x1      p(x1,y1)    p(x1,y2)    p(x1)
X=x2      p(x2,y1)    p(x2,y2)    p(x2)
X=x3      p(x3,y1)    p(x3,y2)    p(x3)
Total     p(y1)       p(y2)

These are the cells in which Y=y2: p(xj, y2) =
p(y2 | xj) p(xj). Divide each by the total of the
column, or p(y2). The result is the proportion
each cell represents in that column, or p(xj | y2).
16
Why does Bayes' Theorem work?
  • The setup, with two random variables, X and Y.
  • Joint probabilities:

          Y=y1        Y=y2        Total
X=x1      p(x1,y1)    p(x1,y2)    p(x1)
X=x2      p(x2,y1)    p(x2,y2)    p(x2)
X=x3      p(x3,y1)    p(x3,y2)    p(x3)
Total     p(y1)       p(y2)

These are the cells in which X=x1: p(x1, yk) =
p(x1 | yk) p(yk). Divide each by the total of the
row, or p(x1). The result is the proportion
each cell represents in that row, or p(yk | x1).
17
Bayes' Theorem with 2 Variables
  • The setup, with two random variables, X and Y.
  • You know conditional probabilities, p(xj | yk),
    which tell you what to believe about X if you
    knew the value of Y.
  • You learn X = x; what should you believe about Y?
  • You combine two things:
  • Relative conditional probabilities (the
    likelihood),
  • i.e., p(x | yk) as a function of yk with X
    fixed at x.
  • Prior probabilities of the Y values, p(yk).

posterior ∝ likelihood × prior
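
A minimal sketch of this combination for a discrete Y; the
values and numbers below are hypothetical, not from the slides.

    # posterior ∝ likelihood × prior, then normalize by p(x).
    prior = {"y1": 0.6, "y2": 0.4}        # p(y_k): hypothetical prior beliefs about Y
    likelihood = {"y1": 0.3, "y2": 0.8}   # p(x | y_k) with X fixed at the observed x

    unnormalized = {y: likelihood[y] * prior[y] for y in prior}
    p_x = sum(unnormalized.values())                            # p(x), the normalizing constant
    posterior = {y: u / p_x for y, u in unnormalized.items()}   # p(y_k | x)
    print(posterior)   # {'y1': 0.36, 'y2': 0.64}
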
18
Inference in a chain
Recursive representation:
p(u,v,x,y,z) = p(z|y,x,v,u) p(y|x,v,u) p(x|v,u) p(v|u) p(u)
             = p(z|y) p(y|x) p(x|v) p(v|u) p(u).

[Chain graph: U -> V -> X -> Y -> Z, with edges
labeled p(v|u), p(x|v), p(y|x), p(z|y)]
19
Inference in a chain
Suppose we learn the value of X.
Start here, by revising belief about X.

[Chain graph: U -> V -> X -> Y -> Z]
20
Inference in a chain
Propagate information down the chain using
conditional probabilities:
From updated belief about X, use conditional
probability to revise belief about Y.

[Chain graph: U -> V -> X -> Y -> Z]
21
Inference in a chain
Propagate information down the chain using
conditional probabilities:
From updated belief about Y, use conditional
probability to revise belief about Z.

[Chain graph: U -> V -> X -> Y -> Z]
22
Inference in a chain
Propagate information up the chain using Bayes'
Theorem:
From updated belief about X, use Bayes' Theorem to
revise belief about V.

[Chain graph: U -> V -> X -> Y -> Z]
23
Inference in a chain
Propagate information up the chain using Bayes'
Theorem:
From updated belief about V, use Bayes' Theorem to
revise belief about U.

[Chain graph: U -> V -> X -> Y -> Z]
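
The chain-propagation steps on slides 18-23 can be sketched in
Python as follows; the variables are binary and the conditional
probability tables are made up for illustration.

    # Chain U -> V -> X -> Y -> Z with hypothetical binary CPTs.
    p_u = {0: 0.5, 1: 0.5}
    p_v_given_u = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}   # p(v | u)
    p_x_given_v = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}}   # p(x | v)
    p_y_given_x = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}   # p(y | x)
    p_z_given_y = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.1, 1: 0.9}}   # p(z | y)

    x_obs = 1   # we learn X = 1

    # Down the chain, by conditional probability:
    p_y = p_y_given_x[x_obs]                                                 # p(y | x_obs)
    p_z = {z: sum(p_z_given_y[y][z] * p_y[y] for y in p_y) for z in (0, 1)}  # p(z | x_obs)

    # Up the chain, by Bayes' theorem: p(v | x_obs) ∝ p(x_obs | v) p(v).
    p_v = {v: sum(p_v_given_u[u][v] * p_u[u] for u in p_u) for v in (0, 1)}
    post_v = {v: p_x_given_v[v][x_obs] * p_v[v] for v in (0, 1)}
    norm = sum(post_v.values())
    post_v = {v: p / norm for v, p in post_v.items()}

    # One more step up: p(u | x_obs) = sum_v p(u | v) p(v | x_obs),
    # since U is conditionally independent of X given V.
    post_u = {u: sum(p_v_given_u[u][v] * p_u[u] / p_v[v] * post_v[v] for v in (0, 1))
              for u in p_u}
    print(post_v, post_u, p_y, p_z)
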
24
Inference in singly-connected nets
Singly connected: there is never more than one
path from one variable to another variable.
Chains and trees are singly connected. One can
use repeated applications of Bayes' theorem and
conditional probability to propagate
evidence. (Pearl, early 1980s)

[Figure: a singly connected graph over U, V, X, Y, Z]
25
Inference in multiply-connected nets
In a multiply-connected graph, in at least one
instance there is more than one path from one
variable to another variable. Repeated
application of Bayes' theorem and conditional
probability at the level of individual variables
doesn't work.

[Figure: a multiply connected graph over U, V, W, X, Y, Z]
26
Inference in multiply-connected nets
  • Key idea: Group variables into subsets
    (cliques) such that the subsets form a tree.

[Figure: the graph over U, V, W, X, Y, Z with the
clique U,V,W identified]
27
Inference in multiply-connected nets
  • Key idea: Group variables into subsets
    (cliques) such that the subsets form a tree.

[Figure: cliques U,V,W and U,V,X identified]
28
Inference in multiply-connected nets
  • Key idea: Group variables into subsets
    (cliques) such that the subsets form a tree.

[Figure: cliques U,V,W; U,V,X; and U,X,Y identified]
29
Inference in multiply-connected nets
  • Key idea: Group variables into subsets
    (cliques) such that the subsets form a tree.
  • Can then update cliques with a generalized
    version of the scheme for updating individual
    variables in chains.

[Figure: cliques U,V,W; U,V,X; U,X,Y; and X,Z
identified, forming a tree]
30
The Lauritzen-Spiegelhalter algorithm
  • 1. Recursive representation of the joint
    distribution of variables.
  • 2. Directed graph representation of (1).
  • 3. Moralized, undirected, triangulated graph.
  • 4. Determination of cliques and clique
    intersections.
  • 5. Join tree representation.
  • 6. Potential tables.
  • 7. Updating scheme.

31
Example from Andreassen, Jensen, & Olesen
  • Two possible diseases: flu and throat infection
    (FLU and THRINF)
  • Two possible symptoms: fever and sore throat (FEV
    and SORTHR).
  • The diseases are modeled as independent,
  • the symptoms as conditionally independent given
    disease states.

32
Example from Andreassen, Jensen, & Olesen
  • Aside: Medical diagnosis with observable
    symptoms of latent disease states has many
    parallels to measurement modeling in assessment:
  • State is a construct, inferred from theory and
    experience, proposed to organize our knowledge
  • Conditional independence of observations given
    (possibly complex) state
  • Persistent interest in the underlying state
  • Observations mainly of transitory interest
  • States and relationships are meant to aid thinking
    about unique cases, but are surely oversimplified
  • State is the level at which treatment and
    prognosis are discussed,
  • although there is often therapeutic/educational
    value in addressing specifics from the
    observational setting

33
1) Recursive representation of joint distribution
P(FEV, SORTHR, FLU, THRINF)
  = P(FEV | SORTHR, FLU, THRINF) P(SORTHR | FLU, THRINF)
    P(FLU | THRINF) P(THRINF)
  = P(FEV | FLU, THRINF) P(SORTHR | FLU, THRINF)
    P(FLU) P(THRINF).
34
2) DAG representation
35
2) DAG representation
Aside: A look ahead toward cognitive diagnosis
Good differential diagnosis value for neither
vs. at least one of the two
Good differential diagnosis value for throat
infection vs. no throat infection
36
2) DAG representation
Aside: A look ahead toward cognitive diagnosis
No differential diagnosis value for which of
the two?
Good differential diagnosis value for which of
the two?
37
3a) Moralized graph
Marry parents: Look at the set of parents of
each variable. If they are not already
connected, connect them. Direction doesn't
matter, since we'll drop it in the next step.
Rationale: If variables are all parents of the
same variable, then even if they were otherwise
independent, learning the value of their common
child generally introduces dependence among them
(think Holmes & Watson on the icy road, or the
coin/penny flipping example). We will need to
include this possibility in our computational
machinery.
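
A short Python sketch of the marry-parents step, assuming the
DAG is stored as a mapping from each child to its list of
parents; the graph used is the FLU/THRINF example that follows.

    # Moralization: connect ("marry") all parents of each child, then drop direction.
    from itertools import combinations

    parents = {               # DAG as child -> list of parents
        "FEVER":  ["FLU", "THRINF"],
        "SORTHR": ["FLU", "THRINF"],
        "FLU":    [],
        "THRINF": [],
    }

    edges = set()
    for child, pars in parents.items():
        for p in pars:                        # keep the original edges, now undirected
            edges.add(frozenset((p, child)))
        for p1, p2 in combinations(pars, 2):  # marry the parents of the same child
            edges.add(frozenset((p1, p2)))

    print(sorted(tuple(sorted(e)) for e in edges))
    # The FLU-THRINF edge is the "married parents" edge added by moralization.
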
38
3b) Undirected graph
[Figure: the DAG over FLU, THRINF, FEVER, SORTHR,
and the same graph with directions dropped]
Drop the directionality of the edges. Although
the conditional probability directions were
important for constructing the graph and will be
important for building potential tables, we want
a structure for computing that can go in any
direction.
39
3c) Triangulated graph
[Figure: an example graph on nodes A1-A5, before
and after adding a triangulating edge]

Triangulation means looking at the undirected
graph for cycles from a variable to itself going
through a sequence of other variables. There
should be no (chordless) cycle with length greater
than three. Whenever there is, add undirected
edges so that there are no such cycles. The
Flu/Throat-infection moral graph is already
triangulated, so it is not an issue here. A
different example is shown above. Why do we do
this? It is essential for producing cliques of
variables that form a tree. There can be many
ways to do this; finding the best one is NP-hard.
People develop heuristic approaches.
40
4) Determine cliques and clique intersections
[Figure: the triangulated graph over FLU, THRINF,
FEVER, SORTHR, with its two cliques highlighted]

From the triangulated graph, one determines
cliques: subsets of variables that are all
linked pairwise to one another. Cliques overlap,
and the sets of overlapping variables are called
clique intersections. The two cliques here are
{FEVER, FLU, THRINF} and {FLU, THRINF, SORTHR}.
The clique intersection is {FLU, THRINF}.
41
4) Determine cliques and clique intersections
  • Cliques and intersections are the structure for
    local updating.
  • Can be multiple ways to define cliques from a
    triangulated graph. Finding the best is
    NP-hard. Heuristics developed.
  • The amount of computation grows roughly
    geometrically with clique size, as measured by
    the number of possible configurations of all
    values of all variables in a clique.
  • A clique representation with many small cliques
    is therefore preferred to a representation with a
    few larger cliques.
  • Strategies for increased efficiency include
    defining collector variables, adding variables
    to break loops, and dropping associations when
    the consequences are benign.

42
5) Join tree representation
A join-tree representation depicts the
singly-connected structure of cliques and clique
intersections. A join tree has the running
intersection property: if a variable appears in
two cliques, it appears in all cliques and clique
intersections on the single path connecting them.

[Join tree: {FEVER, FLU, THRINF} -- {FLU, THRINF}
-- {FLU, THRINF, SORTHR}]
43
6) Potential tables
  • Local calculation is carried out with tables that
    convey the joint distributions of variables
    within cliques, or potential tables.
  • Similar tables for clique intersections are used
    to pass updating information from one clique to
    another.

44
6) Potential tables
For each clique, determine the joint
probabilities for all the possible combinations
of values of all variables. For convenience, we
have written them as matrices. These potential
tables indicate the initial status of the network
in our example--before specific knowledge of a
particular individual's symptoms or disease
states is known.
45
6) Potential tables
The potential table for the clique intersection
is the marginal distribution of FLU and THRINF.

[Table: FLU, THRINF, PROB -- the marginal probs
for FLU & THRINF]
46
6) Potential tables
The potential table for Clique 1 is calculated
using the prior probabilities of .11 for both flu
and throat infection, the assumption that they
are independent, and the conditional
probabilities of fever for each
flu/throat-infection combination.

Marginal probs for FLU & THRINF:

FLU   THRINF   PROB
yes   yes      .012
yes   no       .098
no    yes      .098
no    no       .792

These are multiplied by the conditional probs for
FEVER given FLU & THRINF.
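
A sketch of this calculation in Python. The .11 priors come from
the slide, but the P(FEVER | FLU, THRINF) values below are
hypothetical placeholders, since the transcript does not list them.

    from itertools import product

    p_flu, p_thr = 0.11, 0.11
    # Marginal probs for FLU & THRINF under independence: .012, .098, .098, .792 (rounded).
    marg = {(f, t): (p_flu if f == "yes" else 1 - p_flu) *
                    (p_thr if t == "yes" else 1 - p_thr)
            for f, t in product(("yes", "no"), repeat=2)}

    # Hypothetical P(FEVER = yes | FLU, THRINF); not from the slides.
    p_fever = {("yes", "yes"): 0.95, ("yes", "no"): 0.80,
               ("no", "yes"): 0.85, ("no", "no"): 0.05}

    # Clique 1 potential table: joint over (FLU, THRINF, FEVER) = marginal x conditional.
    potential1 = {}
    for (f, t), m in marg.items():
        potential1[(f, t, "yes")] = m * p_fever[(f, t)]
        potential1[(f, t, "no")] = m * (1 - p_fever[(f, t)])
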
47
6) Potential tables
Similar calculation for the other clique:
marginal probabilities of FLU & THRINF, times the
conditionals for SORTHR. Note that the implied
distributions for FLU & THRINF are consistent
across both clique potential tables and the
clique intersection table. From these, we can
reconstruct a coherent joint distribution for the
entire set of variables.
48
7) Updating scheme
  • Absorbing new evidence about a single variable is
    effected by re-adjusting the appropriate margin
    in a potential table that contains that variable,
    then propagating the resulting change from that
    clique to other cliques via the clique
    intersections.
  • This process continues outward from the clique
    where the process began, until all cliques have
    been updated.
  • The single-connectedness and running intersection
    properties of the join tree assure that coherent
    probabilities result.

49
7) Updating scheme
Suppose we learn FEVER = yes. Go to any clique
where FEVER appears (actually there's just one in
this example). Zero out the entries for
FEVER = no. The remaining values express our new
beliefs about the proportional chances that the
other variables in that clique take their
respective joint values.
50
7) Updating scheme
Propagate the new beliefs about FLU & THRINF to
the clique intersection. You could normalize
these if you wanted to, but the proportional
information is what matters.

[Tables: OLD and NEW clique-intersection weights]
51
7) Updating scheme
Propagate the new beliefs about FLU & THRINF to
the next clique. Divide each row by the old
weight for that combination of clique-intersection
variables and multiply it by the new one. That is,
the adjustment factor for each row is New Weight /
Old Weight.

[Tables: OLD and NEW clique-intersection weights]
52
7) Updating scheme
Clique 2: Apply the adjustment factor for each row,
then renormalize with respect to all values.
53
7) Updating scheme
Clique 2: Apply the adjustment factor for each row,
then renormalize with respect to all values.
This yields the predictive distribution for SORTHR.
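
A compact Python sketch of the updating scheme on the two
cliques of this example; it assumes the clique potentials are
dictionaries keyed by (FLU, THRINF, child), built as in the
earlier sketch, so the numbers it produces are illustrative
rather than the slides' own.

    def margin_flu_thr(clique):
        """Marginal over (FLU, THRINF): the clique-intersection table."""
        out = {}
        for (f, t, _), p in clique.items():
            out[(f, t)] = out.get((f, t), 0.0) + p
        return out

    def absorb_fever_yes(clique1, clique2):
        """clique1 is over (FLU, THRINF, FEVER); clique2 over (FLU, THRINF, SORTHR)."""
        old = margin_flu_thr(clique1)
        # 1. Zero out the FEVER = no entries in the clique containing FEVER.
        upd1 = {k: (p if k[2] == "yes" else 0.0) for k, p in clique1.items()}
        new = margin_flu_thr(upd1)
        # 2. Pass the change through the clique intersection:
        #    scale each row of Clique 2 by New Weight / Old Weight.
        upd2 = {k: p * new[(k[0], k[1])] / old[(k[0], k[1])] for k, p in clique2.items()}
        # 3. Renormalize, so each table is again a probability distribution.
        z1, z2 = sum(upd1.values()), sum(upd2.values())
        return ({k: p / z1 for k, p in upd1.items()},
                {k: p / z2 for k, p in upd2.items()})
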
54
Comments
  • Triangulation and clique determination are
    NP-hard--need heuristics
  • HUGIN has multiple options and tells you the
    cliques
  • Computation depends on the largest clique size
    (large potential tables)
  • More conditional independence is generally better

55
Some Favorable & Unfavorable Structures
  • Multiple children are good (think IRT)
  • Multiple parents are not good. Why?

56
Some Favorable & Unfavorable Structures
  • Multiple children are good (think IRT)
  • Multiple simple cliques
  • Multiple parents are not good. Why?
  • Moralization forces a clique containing all
    parents and the child.

57
Key points for measurement models
  • Student model (SM) variables are of transcending
    interest
  • They characterize student knowledge, skill,
    strategies
  • Cannot be directly observed
  • Observable variables are means of getting
    evidence about SM variables
  • They characterize salient aspects of performance
  • Observable variables from performances are
    modeled as conditionally independent across (but
    not necessarily within) tasks, given SM variables.