1
Intelligent Systems (2II40) C8
  • Alexandra I. Cristea

October 2003
2
VI. Uncertainty
  • VI.2. Probabilistic reasoning
  • Conditional independence
  • Bayesian networks: syntax and semantics
  • Exact inference
  • Approximate inference

3
Hybrid (discrete + continuous) networks
  • Discrete (Subsidy? and Buys?)
  • Continuous (Harvest and Cost)
  • How to deal with this?

4
Probability density functions
  • Instead of probability distributions
  • For continuous variables
  • Ex.: let X denote tomorrow's maximum temperature in the summer in Eindhoven
  • Belief that X is distributed uniformly between 18 and 26 degrees Celsius
  • P(X = x) = U[18,26](x)
  • P(X = 20.5) = U[18,26](20.5) = 0.125/°C
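A quick numeric check of this density, as a minimal sketch using SciPy's uniform distribution (the loc/scale parameterization is SciPy's convention, not the slides'):

```python
from scipy.stats import uniform

# U[18,26]: uniform density on [18, 26] -> loc=18, scale=26-18=8
temp = uniform(loc=18, scale=8)

print(temp.pdf(20.5))  # 0.125 (per degree Celsius)
print(temp.pdf(30.0))  # 0.0, outside the support
```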

9
Hybrid (discrete + continuous) networks
  • Discrete (Subsidy? and Buys?)
  • Continuous (Harvest and Cost)
  • Option 1: discretization
  • possibly large errors, large CPTs
  • Option 2: finitely parameterized canonical families
  • Continuous variable, discrete + continuous parents (e.g., Cost)
  • Discrete variable, continuous parents (e.g., Buys?)

10
a) Continuous child variables
  • Need one conditional density function for child
    variable given continuous parents, for each
    possible assignment to discrete parents
  • Most common is the linear Gaussian model, e.g.
  • P(Cost = c | Harvest = h, Subsidy) = N(aₜh + bₜ, σₜ²)(c)
  • Mean Cost varies linearly w. Harvest; variance is fixed
  • Linear variation is unreasonable over the full
    range, but works OK if the likely range of
    Harvest is narrow
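A minimal sketch of sampling a continuous child from such a linear Gaussian CPD; the coefficients per Subsidy value are made-up illustration numbers, not values from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_cost(harvest: float, subsidy: bool) -> float:
    # One (slope, intercept, std) triple per assignment of the
    # discrete parent Subsidy (illustrative values only).
    a, b, sigma = (-0.5, 10.0, 1.0) if subsidy else (-1.0, 12.0, 1.0)
    # Mean varies linearly with Harvest; variance is fixed.
    return rng.normal(a * harvest + b, sigma)

print(sample_cost(harvest=5.0, subsidy=True))
```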

11
Continuous child variables ex.
  • All-continuous network w. LG distributions ⇒ full joint is a multivariate Gaussian
  • Discrete + continuous LG network is a conditional Gaussian network, i.e., a multivariate Gaussian over all continuous variables for each combination of discrete variable values

12
b) Discrete child, continuous parent
  • P(Buys = true | Cost = c) = Φ((−c + μ)/σ)
  • with μ: the threshold for buying
  • Probit distribution:
  • Φ: the integral of the standard normal distribution
  • Logit distribution:
  • Uses the sigmoid function
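A sketch comparing the two soft-threshold models side by side; μ and σ are illustrative values, and the logit variant here simply feeds the same standardized distance through the sigmoid:

```python
from math import exp
from scipy.stats import norm

mu, sigma = 6.0, 1.0  # illustrative buying threshold and spread

def p_buys_probit(c: float) -> float:
    # Probit: standard normal CDF (the integral Phi) at (-c + mu) / sigma
    return norm.cdf((-c + mu) / sigma)

def p_buys_logit(c: float) -> float:
    # Logit: sigmoid applied to the same standardized distance
    z = (-c + mu) / sigma
    return 1.0 / (1.0 + exp(-z))

for c in (4.0, 6.0, 8.0):
    print(c, round(p_buys_probit(c), 3), round(p_buys_logit(c), 3))
```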

13
VI.2. Probabilistic reasoning
  • Conditional independence
  • Bayesian networks: syntax and semantics
  • Exact inference
  • Exact inference by enumeration
  • Exact inference by variable elimination
  • Approximate inference

14
VI.2. D. Approximate inference
  • Sampling from an empty network
  • Rejection sampling: reject samples disagreeing w. evidence
  • Likelihood weighting: use evidence to weight samples
  • MCMC: sample from a stochastic process whose stationary distribution is the true posterior

15
i. Sampling from an empty network cont.
  • Probability that PRIOR-SAMPLE generates a particular event:
  • S_PS(x₁, …, xₙ) = ∏ᵢ₌₁ⁿ P(xᵢ | Parents(Xᵢ)) = P(x₁, …, xₙ)
  • N_PS(Y = y) = no. of samples generated for which Y = y, for any set of variables Y
  • Then the estimate P̂(Y = y) = N_PS(Y = y)/N, and
  • lim N→∞ P̂(Y = y) = Σₕ S_PS(Y = y, H = h) = Σₕ P(Y = y, H = h) = P(Y = y)
  • ⇒ estimates derived from PRIOR-SAMPLE are consistent
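A runnable PRIOR-SAMPLE sketch on the standard sprinkler network (the CPT numbers below are the usual textbook values, used here only to illustrate consistency):

```python
import random

def prior_sample():
    # Topological order: Cloudy -> Sprinkler, Rain -> WetGrass
    cloudy = random.random() < 0.5
    sprinkler = random.random() < (0.1 if cloudy else 0.5)
    rain = random.random() < (0.8 if cloudy else 0.2)
    p_wet = {(True, True): 0.99, (True, False): 0.90,
             (False, True): 0.90, (False, False): 0.0}[(sprinkler, rain)]
    wet = random.random() < p_wet
    return cloudy, sprinkler, rain, wet

# N_PS(Rain=true)/N should converge to P(Rain=true) = 0.5*0.8 + 0.5*0.2 = 0.5
N = 100_000
print(sum(s[2] for s in (prior_sample() for _ in range(N))) / N)
```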

16
iii. Likelihood weighting analysis
  • Sampling probability for WEIGHTED-SAMPLE is
  • S_WS(y, e) = ∏ᵢ₌₁ˡ P(yᵢ | Parents(Yᵢ))
  • Note: pays attention to evidence in ancestors only ⇒ somewhere in between prior and posterior distribution
  • Weight for a given sample (y, e) is
  • w(y, e) = ∏ᵢ₌₁ᵐ P(eᵢ | Parents(Eᵢ))
  • Weighted sampling probability is
  • S_WS(y, e) · w(y, e) = ∏ᵢ₌₁ˡ P(yᵢ | Parents(Yᵢ)) · ∏ᵢ₌₁ᵐ P(eᵢ | Parents(Eᵢ)) = P(y, e), by standard global semantics of the network
  • Hence, likelihood weighting is consistent
  • But performance still degrades w. many evidence variables
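A likelihood-weighting sketch for the sprinkler network, estimating P(Rain | Sprinkler = true, WetGrass = true); evidence variables are fixed and contribute to the weight rather than being sampled:

```python
import random

def weighted_sample():
    w = 1.0
    cloudy = random.random() < 0.5
    w *= 0.1 if cloudy else 0.5      # evidence Sprinkler=true: multiply weight
    rain = random.random() < (0.8 if cloudy else 0.2)
    w *= 0.99 if rain else 0.90      # evidence WetGrass=true given Sprinkler=true
    return rain, w

samples = [weighted_sample() for _ in range(100_000)]
total = sum(w for _, w in samples)
print(sum(w for rain, w in samples if rain) / total)   # ~0.30
```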

17
iv. MCMC Example
  • Estimate P(Rain | Sprinkler = true, WetGrass = true)
  • Sample Cloudy then Rain, repeat.
  • Markov blanket of Cloudy is Sprinkler and Rain.
  • Markov blanket of Rain is Cloudy, Sprinkler and
    WetGrass.

18
iv. MCMC Example cont.
  • 0. Random initial state: Cloudy = true and Rain = false
  • P(Cloudy | MB(Cloudy)) = P(Cloudy | Sprinkler, ¬Rain)
  • sample ⇒ false
  • P(Rain | MB(Rain)) = P(Rain | ¬Cloudy, Sprinkler, WetGrass)
  • sample ⇒ true
  • Visit 100 states:
  • 31 have Rain = true, 69 have Rain = false
  • P(Rain | Sprinkler = true, WetGrass = true) = NORMALIZE(⟨31, 69⟩) = ⟨0.31, 0.69⟩
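The same query as a runnable Gibbs/MCMC sketch: the evidence stays clamped, and Cloudy and Rain are resampled from their Markov-blanket distributions derived from the textbook CPTs:

```python
import random

P_C = 0.5
P_S = {True: 0.1, False: 0.5}                       # P(Sprinkler=t | Cloudy)
P_R = {True: 0.8, False: 0.2}                       # P(Rain=t | Cloudy)
P_W = {(True, True): 0.99, (True, False): 0.90,
       (False, True): 0.90, (False, False): 0.0}    # P(WetGrass=t | S, R)

def sample_cloudy(rain):
    # P(Cloudy | MB) ∝ P(Cloudy) P(Sprinkler=t | Cloudy) P(rain | Cloudy)
    t = P_C * P_S[True] * (P_R[True] if rain else 1 - P_R[True])
    f = (1 - P_C) * P_S[False] * (P_R[False] if rain else 1 - P_R[False])
    return random.random() < t / (t + f)

def sample_rain(cloudy):
    # P(Rain | MB) ∝ P(Rain | cloudy) P(WetGrass=t | Sprinkler=t, Rain)
    t = P_R[cloudy] * P_W[(True, True)]
    f = (1 - P_R[cloudy]) * P_W[(True, False)]
    return random.random() < t / (t + f)

cloudy, rain, hits = True, False, 0
N = 100_000
for _ in range(N):
    cloudy = sample_cloudy(rain)
    rain = sample_rain(cloudy)
    hits += rain
print(hits / N)   # ~0.30
```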

19
Probability of x, given MB(x)
  • P(xᵢ | MB(Xᵢ)) = α P(xᵢ | Parents(Xᵢ)) ∏ P(yⱼ | Parents(Yⱼ)) over the children Yⱼ of Xᵢ
20
MCMC algorithm
21
Performance of statistical algorithms
  • Polytime approximation:
  • Stochastic approximation techniques such as
    likelihood weighting and MCMC
  • can give reasonable estimates of true posterior
    probabilities in a network, and
  • can cope with much larger networks

22
Summary uncertainty
  • Bayesian networks (BN) are DAGs w. random variables as nodes; each node has a conditional distribution for the node, given its parents
  • BNs specify a full joint distribution
  • Most widely used BNs: Printer Wizard in Microsoft Windows and the Office Assistant in Microsoft Office (Horvitz, '98)
  • Possibility theory (Zadeh, '78) simulates probability theory for Fuzzy Logic (Zadeh, '65)

23
VII. Learning
24
VII. Learning
  • Determinants
  • Neural Networks

25
  • Agents that can improve their behavior through diligent study of their own experiences.
  • Russell, Norvig

26
VII. 1. Determinants of learning
  • Components
  • Feedback
  • Representation
  • Prior knowledge
  • Methods

27
A. Components of learning agents
  • Direct mapping from conditions on the current
    state to actions.
  • Means to infer relevant properties of the world
    from the percept sequence.
  • Info about how the world evolves and the results of possible actions the agent can take.
  • Utility info indicating the desirability of world
    states.
  • Action-value info indicating the desirability of
    actions.
  • Goals that describe classes of states whose achievement maximizes the agent's utility.

28
B. Type of feedback
  • Supervised:
  • Learning a function from example I-O pairs
  • Unsupervised:
  • Learning patterns from input alone
  • Reinforcement:
  • Learning from rewards and penalties

29
C. Representation of learned info
  • Ex.:
  • linear weighted polynomials for utility functions: e.g., game-playing agents
  • PL, FOL: logical agents
  • Probabilistic descriptions (Bayesian networks): decision-theoretic agents

30
D. Prior knowledge
  • Most AI learning algorithms learn from scratch.
  • Humans usually have a lot of diverse prior
    knowledge.

31
E. Methods of learning
  • In PL:
  • Inductive learning, decision trees, etc.
  • In FOL:
  • Inductive Logic Programming (ILP), prior knowledge as attributes or relations, deductive methods, etc.
  • In Bayesian networks:
  • Learning hidden Markov models, etc.
  • In Neural Networks:
  • Support Vector Machines (kernel methods)
  • Vapnik: very effective, fashionable!!

32
VII. 2. Neural Networks
  • Introduction NN
  • Discrete Neuron Perceptron
  • Perceptron learning

33
VII.2.A. Introduction NN
34
Applications
Why NNs?
36
Man-machine hardware comparison
37
Man-machine information processing
38
What are humans good at and machines not?
  • Humans:
  • pattern recognition
  • reasoning with incomplete knowledge
  • Computers:
  • precise computing
  • number crunching

39
Purkinje cell
40
The Biological Neuron
41
The Artificial Neuron
Functions: z inside (synapse), f outside (threshold)
42
(very small) Biological NN
43
An ANN
(figure: input units feeding Layer 1)
44
An ANN
(figure: input units feeding a black box)
45
Feedforward NNs
46
Recurrent NNs
47
  • Let's look in the Black Box!

48
NEURON LINK
(figure: neuron 1 connected to neuron 2 via weight w₁₂)
49
Neuron computation
(figure: neuron k receives inputs y₁, y₂, …, yₙ through weights w₁ₖ, w₂ₖ, …, wₙₖ and produces output O)
50
Typical I-O (external activation) f
  • Standard sigmoid function: f(z) = 1/(1 + e⁻ᶻ)
  • Discrete neuron: fires at max. speed, or does not fire
  • xᵢ ∈ {0,1}; f(z) = 1 if z ≥ 0, 0 if z < 0
(plots: the sigmoid and step activation functions f over z)
51
Other I-O external activation f
  • 3. Linear neuron: f(z) = z
  • output xᵢ = zᵢ + bᵢ
  • 4. Stochastic neuron: xᵢ ∈ {0,1}, output 0 or 1
  • input zᵢ = Σⱼ wᵢⱼ xⱼ + bᵢ
  • probability that neuron fires: f(zᵢ)
  • probability that it doesn't fire: 1 − f(zᵢ)
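The activation functions from these two slides as a minimal sketch; treating the sigmoid as the stochastic neuron's firing probability is an assumption made here for illustration:

```python
import math, random

def sigmoid(z):      # standard sigmoid: f(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + math.exp(-z))

def step(z):         # discrete neuron: fires iff z >= 0
    return 1 if z >= 0 else 0

def linear(z):       # linear neuron: f(z) = z
    return z

def stochastic(z):   # stochastic neuron: output 1 with probability f(z)
    return 1 if random.random() < sigmoid(z) else 0

for z in (-2.0, 0.0, 2.0):
    print(z, round(sigmoid(z), 3), step(z), linear(z), stochastic(z))
```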

52
Perceptron
  • simple case:
  • no hidden layers
  • only one neuron
  • get rid of threshold t: it becomes w₀ (= −t) on a fixed extra input
  • f Boolean function: ≥ 0 fires, < 0 doesn't fire
  • discrete neuron

53
What can this perceptron do?
(w₀ = −t = −1)
(figure: weighted inputs summed (Σ) and passed through threshold function f)
54
f = A or B
55
f = A and B
56
  • What is learning?

57
Learning = weight computation
  • For f = A and B, the weights must satisfy (a numeric check follows below):
  • w₁·1 + w₂·1 ≥ t (= 1)
  • w₁·0 + w₂·1 < t (= 1)
  • w₁·1 + w₂·0 < t (= 1)
  • w₁·0 + w₂·0 < t (= 1)
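A numeric check of those four constraints; the candidate pair w₁ = w₂ = 0.6 is an illustrative solution, not one given on the slides:

```python
w1, w2, t = 0.6, 0.6, 1.0   # candidate weights and threshold (illustrative)

for a in (0, 1):
    for b in (0, 1):
        fires = w1 * a + w2 * b >= t
        print(a, b, fires)   # fires only for a = b = 1, i.e. f = A and B
```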

58
Perceptron Learning Rule (incremental version)
ROSENBLATT (1962)
  • FOR i := 0 TO n DO wᵢ := random initial value ENDFOR
  • REPEAT
  • select a pair (x, t) in X
  • (each pair must have a positive probability of being selected)
  • IF wᵀx′ > 0 THEN y := 1 ELSE y := 0 ENDIF
  • IF y ≠ t THEN
  • FOR i := 0 TO n DO wᵢ := wᵢ + η·(t − y)·xᵢ′ ENDFOR ENDIF
  • UNTIL X is correctly classified
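A runnable sketch of Rosenblatt's rule; unlike the incremental version above it sweeps the whole training set each epoch, and η and the epoch cap are illustrative choices:

```python
import random

def train_perceptron(examples, eta=0.1, max_epochs=1000):
    n = len(examples[0][0])
    # w[0] plays the role of the absorbed threshold (fixed input x0' = 1)
    w = [random.uniform(-1, 1) for _ in range(n + 1)]
    for _ in range(max_epochs):
        clean = True
        for x, t in examples:
            xp = [1] + list(x)                                   # augmented x'
            y = 1 if sum(wi * xi for wi, xi in zip(w, xp)) > 0 else 0
            if y != t:
                clean = False
                w = [wi + eta * (t - y) * xi for wi, xi in zip(w, xp)]
        if clean:            # X is correctly classified
            break
    return w

# Learn f = A and B from its truth table
print(train_perceptron([((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]))
```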

59
f = or(x₁, and(x₂, x₃))
w₁ = 1, w₂ = w₃ = 0.5
60
f = or(and(x₁, …, xₖ), and(xₖ₊₁, …, xₙ))
Is this correct?
w₁ = … = wₖ = 1/k, wₖ₊₁ = … = wₙ = 1/(n−k)
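A quick exhaustive check of the proposed weights with threshold 1 (k = 2, n = 4 chosen for illustration); it turns up a counterexample, which is presumably the point of the slide's question:

```python
from itertools import product

k, n = 2, 4                                     # illustrative sizes
w = [1 / k] * k + [1 / (n - k)] * (n - k)       # proposed weights

for x in product((0, 1), repeat=n):
    target = all(x[:k]) or all(x[k:])           # or(and(...), and(...))
    fires = sum(wi * xi for wi, xi in zip(w, x)) >= 1
    if fires != target:
        print("counterexample:", x)             # e.g. (1, 0, 1, 0) fires, f = 0
```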
61
Homework 8
  • STEP 8, the first veto: each student should also evaluate the other students in her/his group and give a passing/not passing grade. This is to avoid some students not working at all. Send this information to me in an e-mail (a.i.cristea_at_tue.nl), as a list of names from your group, and a word next to each name saying passing or not passing. E.g.,
  • John Doe: passing
  • Note: this is a warning only, so you should be very strict and severe, signaling in time if somebody is not cooperating (so they have a chance to change). The final veto is the one that counts.
  • (This was homework 8!!)
  • Continue till steps 9 and 12 (!) with your project.