Title: Intelligent Systems 2II40 C8
1 Intelligent Systems (2II40) C8
October 2003
2 VI. Uncertainty
- VI.2. Probabilistic reasoning
- Conditional independence
- Bayesian networks: syntax and semantics
- Exact inference
- Approximate inference
3 Hybrid (discrete + continuous) networks
- Discrete (Subsidy? and Buys?)
- Continuous (Harvest and Cost)
4 Probability density functions
- Instead of probability distributions
- For continuous variables
- Ex.: let X denote tomorrow's maximum temperature in the summer in Eindhoven
- Belief that X is distributed uniformly between 18 and 26 degrees Celsius: P(X = x) = U[18,26](x)
- P(X = 20.5) = U[18,26](20.5) = 0.125/C
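As a quick check of that number (not on the slide), the density of a uniform distribution on [18, 26] is 1/(26 − 18) = 0.125 per degree Celsius; a minimal Python sketch, assuming SciPy is available:

from scipy.stats import uniform

# U[18, 26]: loc is the lower bound, scale is the width 26 - 18 = 8
temperature = uniform(loc=18, scale=8)
print(temperature.pdf(20.5))   # 1/8 = 0.125 per degree Celsius
print(temperature.pdf(30.0))   # 0.0 outside the interval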
8 Hybrid (discrete + continuous) networks
- Discrete (Subsidy? and Buys?)
- Continuous (Harvest and Cost)
9 Hybrid (discrete + continuous) networks
- Discrete (Subsidy? and Buys?)
- Continuous (Harvest and Cost)
- Option 1: discretization
- possibly large errors, large CPTs
- Option 2: finitely parameterized canonical families
- Continuous variable, discrete + continuous parents (e.g., Cost)
- Discrete variable, continuous parents (e.g., Buys?)
10 a) Continuous child variables
- Need one conditional density function for the child variable given continuous parents, for each possible assignment to the discrete parents
- Most common is the linear Gaussian model, e.g., for Cost given Harvest and Subsidy? (see the sketch after this slide)
- Mean Cost varies linearly w. Harvest, variance is fixed
- Linear variation is unreasonable over the full range, but works OK if the likely range of Harvest is narrow
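A minimal sketch of such a linear Gaussian conditional density; the parameters a_t, b_t and sigma_t are hypothetical stand-ins for the (unspecified) Subsidy? = true case.

from scipy.stats import norm

a_t, b_t, sigma_t = -0.5, 10.0, 1.0   # hypothetical parameters for Subsidy? = true

def cost_density(c, h):
    # P(Cost = c | Harvest = h, Subsidy? = true) = N(a_t*h + b_t, sigma_t^2)(c)
    return norm(loc=a_t * h + b_t, scale=sigma_t).pdf(c)

print(cost_density(7.0, 5.0))   # density of Cost = 7 when Harvest = 5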
11 Continuous child variables ex.
- All-continuous network w. LG distributions ⇒ full joint is a multivariate Gaussian
- Discrete + continuous LG network is a conditional Gaussian network, i.e., a multivariate Gaussian over all continuous variables for each combination of discrete variable values
12 b) Discrete child, continuous parent
- P(Buys | Cost = c) = Φ((−c + μ) / σ)
- with μ: the threshold for buying
- Probit distribution
- Φ: the integral of the standard normal distribution
- Logit distribution
- Uses the sigmoid function
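A small sketch of both options for P(Buys = true | Cost = c); the threshold mu and spread sigma are hypothetical, and the logit version shown is one common parameterization chosen to resemble the probit curve.

import math
from scipy.stats import norm

mu, sigma = 6.0, 1.0   # hypothetical buying threshold and spread

def p_buys_probit(c):
    # Probit: P(Buys = true | Cost = c) = Phi((-c + mu) / sigma)
    return norm.cdf((-c + mu) / sigma)

def p_buys_logit(c):
    # Logit: sigmoid of a scaled version of the same argument
    return 1.0 / (1.0 + math.exp(-2.0 * (-c + mu) / sigma))

print(p_buys_probit(5.0), p_buys_logit(5.0))   # cheap item: both well above 0.5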
13 VI.2. Probabilistic reasoning
- Conditional independence
- Bayesian networks: syntax and semantics
- Exact inference
- Exact inference by enumeration
- Exact inference by variable elimination
- Approximate inference
14 VI.2.D. Approximate inference
- Sampling from an empty network
- Rejection sampling: reject samples disagreeing w. evidence
- Likelihood weighting: use evidence to weight samples
- MCMC: sample from a stochastic process whose stationary distribution is the true posterior
15 i. Sampling from an empty network cont.
- Probability that PRIOR-SAMPLE generates a particular event:
- S_PS(x1, ..., xn) = ∏_{i=1..n} P(xi | Parents(Xi)) = P(x1, ..., xn)
- N_PS(Y = y): no. of samples generated for which Y = y, for any set of variables Y
- Then P(Y = y) ≈ N_PS(Y = y)/N, and
- lim_{N→∞} N_PS(Y = y)/N = Σ_h S_PS(Y = y, H = h)
- = Σ_h P(Y = y, H = h)
- = P(Y = y)
- ⇒ estimates derived from PRIOR-SAMPLE are consistent
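A runnable sketch of PRIOR-SAMPLE on the Cloudy/Sprinkler/Rain/WetGrass network used in the MCMC example below, with the usual textbook CPT values; the fraction of samples with Rain = true approaches P(Rain = true), illustrating consistency.

import random

def prior_sample():
    # Sample each variable in topological order from P(Xi | Parents(Xi))
    cloudy = random.random() < 0.5
    sprinkler = random.random() < (0.1 if cloudy else 0.5)
    rain = random.random() < (0.8 if cloudy else 0.2)
    p_wet = {(True, True): 0.99, (True, False): 0.90,
             (False, True): 0.90, (False, False): 0.0}[(sprinkler, rain)]
    wet = random.random() < p_wet
    return {"Cloudy": cloudy, "Sprinkler": sprinkler, "Rain": rain, "WetGrass": wet}

N = 10000
print(sum(prior_sample()["Rain"] for _ in range(N)) / N)   # ~ 0.5*0.8 + 0.5*0.2 = 0.5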
16 iii. Likelihood weighting analysis
- Sampling probability for WEIGHTED-SAMPLE is
- S_WS(y, e) = ∏_{i=1..l} P(yi | Parents(Yi))
- Note: pays attention to evidence in ancestors only ⇒ somewhere in between prior and posterior distribution
- Weight for a given sample y, e is
- w(y, e) = ∏_{i=1..m} P(ei | Parents(Ei))
- Weighted sampling probability is
- S_WS(y, e) · w(y, e) = ∏_{i=1..l} P(yi | Parents(Yi)) · ∏_{i=1..m} P(ei | Parents(Ei)) = P(y, e), by the standard global semantics of the network
- Hence, likelihood weighting is consistent
- But performance still degrades w. many evidence variables
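A sketch of likelihood weighting for the query P(Rain | Sprinkler = true, WetGrass = true) on the same network as above: non-evidence variables are sampled, evidence variables contribute a factor P(e | parents) to the weight.

import random

def weighted_sample():
    w = 1.0
    cloudy = random.random() < 0.5                      # non-evidence: sample
    w *= 0.1 if cloudy else 0.5                         # evidence Sprinkler = true
    rain = random.random() < (0.8 if cloudy else 0.2)   # non-evidence: sample
    w *= 0.99 if rain else 0.90                         # evidence WetGrass = true (given Sprinkler = true)
    return rain, w

totals = {True: 0.0, False: 0.0}
for _ in range(100000):
    rain, w = weighted_sample()
    totals[rain] += w
print(totals[True] / (totals[True] + totals[False]))    # ~ 0.32, cf. the MCMC estimate below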
17 iv. MCMC Example
- Estimate P(Rain | Sprinkler = true, WetGrass = true)
- Sample Cloudy, then Rain; repeat.
- Markov blanket of Cloudy is Sprinkler and Rain.
- Markov blanket of Rain is Cloudy, Sprinkler and WetGrass.
18 iv. MCMC Example cont.
- 0. Random initial state: Cloudy = true and Rain = false
- P(Cloudy | MB(Cloudy)) = P(Cloudy | Sprinkler, ¬Rain); sample ⇒ false
- P(Rain | MB(Rain)) = P(Rain | ¬Cloudy, Sprinkler, WetGrass); sample ⇒ true
- Visit 100 states
- 31 have Rain = true, 69 have Rain = false
- P(Rain | Sprinkler = true, WetGrass = true) = NORMALIZE(⟨31, 69⟩) = ⟨0.31, 0.69⟩
19 Probability of x, given MB(x)
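The figure on this slide is not in the transcript; presumably it shows the standard Markov-blanket formula (Russell & Norvig): P(x | MB(X)) = α · P(x | Parents(X)) · ∏_{Yj ∈ Children(X)} P(yj | Parents(Yj)).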
20 MCMC algorithm
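A runnable sketch of the MCMC (Gibbs sampling) procedure for the example above, resampling each non-evidence variable from P(X | MB(X)) ∝ P(X | Parents(X)) · ∏ P(child | its parents); the CPT values are the usual textbook ones, included only to make the sketch self-contained.

import random

P_C = 0.5                                   # P(Cloudy = true)
P_S = {True: 0.1, False: 0.5}               # P(Sprinkler = true | Cloudy)
P_R = {True: 0.8, False: 0.2}               # P(Rain = true | Cloudy)
P_W = {(True, True): 0.99, (True, False): 0.90,
       (False, True): 0.90, (False, False): 0.0}   # P(WetGrass = true | Sprinkler, Rain)

def gibbs(n_steps=100000):
    cloudy, rain = True, False              # initial state; evidence: Sprinkler = WetGrass = true
    rain_true = 0
    for _ in range(n_steps):
        # Resample Cloudy from P(Cloudy | MB(Cloudy)) = P(Cloudy | Sprinkler, Rain)
        p_t = P_C * P_S[True] * (P_R[True] if rain else 1 - P_R[True])
        p_f = (1 - P_C) * P_S[False] * (P_R[False] if rain else 1 - P_R[False])
        cloudy = random.random() < p_t / (p_t + p_f)
        # Resample Rain from P(Rain | MB(Rain)) = P(Rain | Cloudy, Sprinkler, WetGrass)
        p_t = P_R[cloudy] * P_W[(True, True)]
        p_f = (1 - P_R[cloudy]) * P_W[(True, False)]
        rain = random.random() < p_t / (p_t + p_f)
        rain_true += rain
    return rain_true / n_steps

print(gibbs())                              # converges to about 0.32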
21 Performance of statistical algorithms
- Polytime approximation
- Stochastic approximation techniques such as likelihood weighting and MCMC
- can give reasonable estimates of true posterior probabilities in a network, and
- can cope with much larger networks
22 Summary: uncertainty
- Bayesian networks (BN) are DAGs with random variables as nodes; each node has a conditional distribution for the node, given its parents
- BN specify a full joint distribution
- Most widely used BN: the Printer Wizard in Microsoft Windows and the Office Assistant in Microsoft Office (Horvitz, 98)
- Possibility theory (Zadeh, 78) simulates probability theory for Fuzzy Logic (Zadeh, 65)
23 VII. Learning
24 VII. Learning
- Determinants
- Neural Networks
25 - Agents that can improve their behavior through diligent study of their own experiences.
- Russell & Norvig
26 VII.1. Determinants of learning
- Components
- Feedback
- Representation
- Prior knowledge
- Methods
27 A. Components of learning agents
- Direct mapping from conditions on the current state to actions.
- Means to infer relevant properties of the world from the percept sequence.
- Info about how the world evolves and about the results of possible actions the agent can take.
- Utility info indicating the desirability of world states.
- Action-value info indicating the desirability of actions.
- Goals that describe classes of states whose achievement maximizes the agent's utility.
28 B. Type of feedback
- Supervised
- Learning a function from example I-O pairs
- Unsupervised
- Learning patterns from the input alone
- Reinforcement
- Learning from rewards and penalties
29 C. Representation of learned info
- Ex.:
- Linear weighted polynomials for utility functions: e.g., game-playing agents
- PL, FOL: logical agents
- Probabilistic descriptions (Bayesian networks): decision-theoretic agents
30 D. Prior knowledge
- Most AI learning algorithms learn from scratch.
- Humans usually have a lot of diverse prior
knowledge.
31 E. Methods of learning
- In PL
- Inductive learning, decision trees, etc.
- In FOL
- Inductive Logic Programming (ILP), prior knowledge as attributes or relations, deductive methods, etc.
- In Bayesian networks
- Learning hidden Markov models, etc.
- In Neural Networks
- Support Vector Machines (kernels)
- Vapnik: very effective, fashionable!
32 VII.2. Neural Networks
- Introduction to NNs
- Discrete neuron: the Perceptron
- Perceptron learning
33 VII.2.A. Introduction to NNs
34 Applications
Why NNs?
35 Applications
Why NNs?
36 Man-machine hardware comparison
37 Man-machine information processing
38 What are humans good at and machines not?
- Humans
- Pattern recognition
- Reasoning with incomplete knowledge
- Computers
- Precise computing
- Number crunching
39 Purkinje cell
40 The Biological Neuron
41 The Artificial Neuron
- Functions: inside, z (synapse); outside, f (threshold)
42 (very small) Biological NN
43 An ANN
[Figure: input feeding Layer 1 of the network]
44 An ANN
[Figure: the same network, with everything past the input shown as a Black Box]
45 Feedforward NNs
46 Recurrent NNs
47 - Let's look in the Black Box!
48 NEURON LINK
[Figure: neuron 1 connected to neuron 2 by a link with weight w12]
49 Neuron computation
[Figure: inputs y1, y2, ..., yn, weighted by w1k, w2k, ..., wnk, are summed in neuron k to produce output O]
50 Typical I-O external activation f
- 1. Standard sigmoid function: f(z) = 1/(1 + e^(−z))
- 2. Discrete neuron: fires at max. speed, or does not fire
- xi ∈ {0, 1}; f(z) = 1 if z ≥ 0, 0 if z < 0
[Plots of f against z for the sigmoid and the step function]
51 Other I-O external activation f
- 3. Linear neuron: f(z) = z
- output xi = zi + bi
- 4. Stochastic neuron: xi ∈ {0, 1}, output 0 or 1
- input zi = Σj wij xj + bi
- probability that the neuron fires: f(zi)
- probability that it doesn't fire: 1 − f(zi)
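A minimal Python sketch of the activation functions just listed, together with the weighted input z from the neuron-computation slide; the names and the bias handling are illustrative only.

import math
import random

def weighted_input(weights, inputs, bias):
    # z_k = sum_j w_jk * y_j + b_k  (the "inside" function of the neuron)
    return sum(w * y for w, y in zip(weights, inputs)) + bias

def sigmoid(z):      # 1. standard sigmoid
    return 1.0 / (1.0 + math.exp(-z))

def step(z):         # 2. discrete neuron: fires (1) iff z >= 0
    return 1 if z >= 0 else 0

def linear(z):       # 3. linear neuron
    return z

def stochastic(z):   # 4. stochastic neuron: fires with probability sigmoid(z)
    return 1 if random.random() < sigmoid(z) else 0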
52 Perceptron
- simple case
- no hidden layers
- only one neuron
- get rid of the threshold: it (= b) becomes the weight w0
- f: Boolean function; z ≥ 0: fires, z < 0: doesn't fire
- discrete neuron
53 What can this perceptron do?
- (w0 = −t = −1)
[Figure: perceptron with inputs feeding a summation Σ and threshold function f]
54 f = A or B
55 f = A and B
57 Learning weight computation
- Constraints for f = A and B:
- (A=1, B=1): w1·1 + w2·1 ≥ t (= 1)
- (A=0, B=1): w1·0 + w2·1 < t (= 1)
- (A=1, B=0): w1·1 + w2·0 < t (= 1)
- (A=0, B=0): w1·0 + w2·0 < t (= 1)
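For instance (a worked example, not on the slide), w1 = w2 = 0.6 with t = 1 satisfies all four constraints, since 0.6 + 0.6 = 1.2 ≥ 1 while 0.6 < 1 and 0 < 1, so the perceptron then computes f = A and B.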
58 Perceptron Learning Rule (incremental version)
ROSENBLATT (1962)
- FOR i := 0 TO n DO wi := random initial value ENDFOR
- REPEAT
- select a pair (x, t) in X
- (each pair must have a positive probability of being selected)
- IF wT·x' > 0 THEN y := 1 ELSE y := 0 ENDIF
- IF y ≠ t THEN
- FOR i := 0 TO n DO wi := wi + η·(t − y)·xi' ENDFOR ENDIF
- UNTIL X is correctly classified
59 f = or(x1, and(x2, x3))
- w1 = 1, w2 = w3 = 0.5
60 f = or(and(x1, ..., xk), and(xk+1, ..., xn))
- Is this correct?
- w1 = ... = wk = 1/k; wk+1 = ... = wn = 1/(n − k)
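A quick check of the question (not worked out on the slide), assuming the same threshold of 1 as before: setting only x1 = 1 and xk+1 = 1 already gives 1/k + 1/(n − k), which for n = 4, k = 2 equals 0.5 + 0.5 = 1 and so fires, even though neither AND-group is satisfied; hence the construction is not correct in general.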
61 Homework 8
- STEP 8, the first veto: each student should also evaluate the other students in her/his group and give a passing/not passing grade. This is to avoid having some students not work at all. Send this information to me in an e-mail (a.i.cristea_at_tue.nl), as a list of names from your group, with a word next to each name saying passing or not passing. E.g.,
- John Doe: passing
- Note: this is a warning only, so you should be very strict and severe, signaling in time if somebody is not cooperating (so they have a chance to change). The final veto is the one that counts.
- (This was homework 8!!)
- Continue to steps 9 and 12 (!) with your project.