1
Intelligent Systems (2II40) C8
  • Alexandra I. Cristea

October 2003
2
VI. Uncertainty
  • VI.2. Probabilistic reasoning
  • Conditional independence
  • Bayesian networks: syntax and semantics
  • Exact inference
  • Approximate inference

3
Hybrid (discrete + continuous) networks
  • Discrete (Subsidy? and Buys?)
  • Continuous (Harvest and Cost)
  • How to deal with this?

4
Probability density functions
  • Instead of probability distributions
  • For continuous variables
  • Ex.: let X denote tomorrow's maximum temperature in the summer in Eindhoven
  • Belief that X is distributed uniformly between 18 and 26 degrees Celsius
  • P(X = x) = U[18,26](x)
  • P(X = 20.5) = U[18,26](20.5) = 0.125/°C
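A quick numeric check of this density, as a minimal sketch using SciPy's uniform distribution (the loc/scale parameterization is SciPy's convention, not the slides'):

```python
from scipy.stats import uniform

# U[18,26]: uniform density on [18, 26] -> loc=18, scale=26-18=8
temp = uniform(loc=18, scale=8)

print(temp.pdf(20.5))  # 0.125 (per degree Celsius)
print(temp.pdf(30.0))  # 0.0, outside the support
```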

9
Hybrid (discrete + continuous) networks
  • Discrete (Subsidy? and Buys?)
  • Continuous (Harvest and Cost)
  • Option 1: discretization
  • possibly large errors, large CPTs
  • Option 2: finitely parameterized canonical families
  • Continuous variable, discrete + continuous parents (e.g., Cost)
  • Discrete variable, continuous parents (e.g., Buys?)

10
a) Continuous child variables
  • Need one conditional density function for child
    variable given continuous parents, for each
    possible assignment to discrete parents
  • Most common is the linear Gaussian model, e.g.
  • P(Cost = c | Harvest = h, Subsidy) = N(aₜh + bₜ, σₜ²)(c)
  • Mean Cost varies linearly w. Harvest; variance is fixed
  • Linear variation is unreasonable over the full
    range, but works OK if the likely range of
    Harvest is narrow
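A minimal sketch of sampling a continuous child from such a linear Gaussian CPD; the coefficients per Subsidy value are made-up illustration numbers, not values from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_cost(harvest: float, subsidy: bool) -> float:
    # One (slope, intercept, std) triple per assignment of the
    # discrete parent Subsidy (illustrative values only).
    a, b, sigma = (-0.5, 10.0, 1.0) if subsidy else (-1.0, 12.0, 1.0)
    # Mean varies linearly with Harvest; variance is fixed.
    return rng.normal(a * harvest + b, sigma)

print(sample_cost(harvest=5.0, subsidy=True))
```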

11
Continuous child variables ex.
  • All-continuous network w. LG distributions ⇒ full joint is a multivariate Gaussian
  • Discrete + continuous LG network is a conditional Gaussian network, i.e., a multivariate Gaussian over all continuous variables for each combination of discrete variable values

12
b) Discrete child, continuous parent
  • P(Buys = true | Cost = c) = Φ((−c + μ)/σ)
  • with μ: the threshold for buying
  • Probit distribution:
  • Φ: the integral of the standard normal distribution
  • Logit distribution:
  • Uses the sigmoid function
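A sketch comparing the two soft-threshold models side by side; μ and σ are illustrative values, and the logit variant here simply feeds the same standardized distance through the sigmoid:

```python
from math import exp
from scipy.stats import norm

mu, sigma = 6.0, 1.0  # illustrative buying threshold and spread

def p_buys_probit(c: float) -> float:
    # Probit: standard normal CDF (the integral Phi) at (-c + mu) / sigma
    return norm.cdf((-c + mu) / sigma)

def p_buys_logit(c: float) -> float:
    # Logit: sigmoid applied to the same standardized distance
    z = (-c + mu) / sigma
    return 1.0 / (1.0 + exp(-z))

for c in (4.0, 6.0, 8.0):
    print(c, round(p_buys_probit(c), 3), round(p_buys_logit(c), 3))
```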

13
VI.2. Probabilistic reasoning
  • Conditional independence
  • Bayesian networks: syntax and semantics
  • Exact inference
  • Exact inference by enumeration
  • Exact inference by variable elimination
  • Approximate inference

14
VI.2. D. Approximate inference
  • Sampling from an empty network
  • Rejection sampling: reject samples disagreeing w. evidence
  • Likelihood weighting: use evidence to weight samples
  • MCMC: sample from a stochastic process whose stationary distribution is the true posterior

15
i. Sampling from an empty network cont.
  • Probability that PRIOR-SAMPLE generates a particular event:
  • S_PS(x₁, …, xₙ) = ∏ᵢ₌₁ⁿ P(xᵢ | Parents(Xᵢ)) = P(x₁, …, xₙ)
  • N_PS(Y = y) = no. of samples generated for which Y = y, for any set of variables Y
  • Then the estimate P̂(Y = y) = N_PS(Y = y)/N, and
  • lim N→∞ P̂(Y = y) = Σₕ S_PS(Y = y, H = h) = Σₕ P(Y = y, H = h) = P(Y = y)
  • ⇒ estimates derived from PRIOR-SAMPLE are consistent
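A runnable PRIOR-SAMPLE sketch on the standard sprinkler network (the CPT numbers below are the usual textbook values, used here only to illustrate consistency):

```python
import random

def prior_sample():
    # Topological order: Cloudy -> Sprinkler, Rain -> WetGrass
    cloudy = random.random() < 0.5
    sprinkler = random.random() < (0.1 if cloudy else 0.5)
    rain = random.random() < (0.8 if cloudy else 0.2)
    p_wet = {(True, True): 0.99, (True, False): 0.90,
             (False, True): 0.90, (False, False): 0.0}[(sprinkler, rain)]
    wet = random.random() < p_wet
    return cloudy, sprinkler, rain, wet

# N_PS(Rain=true)/N should converge to P(Rain=true) = 0.5*0.8 + 0.5*0.2 = 0.5
N = 100_000
print(sum(s[2] for s in (prior_sample() for _ in range(N))) / N)
```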

16
iii. Likelihood weighting analysis
  • Sampling probability for WEIGHTED-SAMPLE is
  • S_WS(y, e) = ∏ᵢ₌₁ˡ P(yᵢ | Parents(Yᵢ))
  • Note: pays attention to evidence in ancestors only ⇒ somewhere in between prior and posterior distribution
  • Weight for a given sample (y, e) is
  • w(y, e) = ∏ᵢ₌₁ᵐ P(eᵢ | Parents(Eᵢ))
  • Weighted sampling probability is
  • S_WS(y, e) · w(y, e) = ∏ᵢ₌₁ˡ P(yᵢ | Parents(Yᵢ)) · ∏ᵢ₌₁ᵐ P(eᵢ | Parents(Eᵢ)) = P(y, e), by standard global semantics of the network
  • Hence, likelihood weighting is consistent
  • But performance still degrades w. many evidence variables
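A likelihood-weighting sketch for the sprinkler network, estimating P(Rain | Sprinkler = true, WetGrass = true); evidence variables are fixed and contribute to the weight rather than being sampled:

```python
import random

def weighted_sample():
    w = 1.0
    cloudy = random.random() < 0.5
    w *= 0.1 if cloudy else 0.5      # evidence Sprinkler=true: multiply weight
    rain = random.random() < (0.8 if cloudy else 0.2)
    w *= 0.99 if rain else 0.90      # evidence WetGrass=true given Sprinkler=true
    return rain, w

samples = [weighted_sample() for _ in range(100_000)]
total = sum(w for _, w in samples)
print(sum(w for rain, w in samples if rain) / total)   # ~0.30
```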

17
iv. MCMC Example
  • Estimate P(Rain | Sprinkler = true, WetGrass = true)
  • Sample Cloudy then Rain, repeat.
  • Markov blanket of Cloudy is Sprinkler and Rain.
  • Markov blanket of Rain is Cloudy, Sprinkler and
    WetGrass.

18
iv. MCMC Example cont.
  • 0. Random initial state: Cloudy = true and Rain = false
  • P(Cloudy | MB(Cloudy)) = P(Cloudy | Sprinkler, ¬Rain)
  • sample ⇒ false
  • P(Rain | MB(Rain)) = P(Rain | ¬Cloudy, Sprinkler, WetGrass)
  • sample ⇒ true
  • Visit 100 states:
  • 31 have Rain = true, 69 have Rain = false
  • P(Rain | Sprinkler = true, WetGrass = true) = NORMALIZE(⟨31, 69⟩) = ⟨0.31, 0.69⟩
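The same query as a runnable Gibbs/MCMC sketch: the evidence stays clamped, and Cloudy and Rain are resampled from their Markov-blanket distributions derived from the textbook CPTs:

```python
import random

P_C = 0.5
P_S = {True: 0.1, False: 0.5}                       # P(Sprinkler=t | Cloudy)
P_R = {True: 0.8, False: 0.2}                       # P(Rain=t | Cloudy)
P_W = {(True, True): 0.99, (True, False): 0.90,
       (False, True): 0.90, (False, False): 0.0}    # P(WetGrass=t | S, R)

def sample_cloudy(rain):
    # P(Cloudy | MB) ∝ P(Cloudy) P(Sprinkler=t | Cloudy) P(rain | Cloudy)
    t = P_C * P_S[True] * (P_R[True] if rain else 1 - P_R[True])
    f = (1 - P_C) * P_S[False] * (P_R[False] if rain else 1 - P_R[False])
    return random.random() < t / (t + f)

def sample_rain(cloudy):
    # P(Rain | MB) ∝ P(Rain | cloudy) P(WetGrass=t | Sprinkler=t, Rain)
    t = P_R[cloudy] * P_W[(True, True)]
    f = (1 - P_R[cloudy]) * P_W[(True, False)]
    return random.random() < t / (t + f)

cloudy, rain, hits = True, False, 0
N = 100_000
for _ in range(N):
    cloudy = sample_cloudy(rain)
    rain = sample_rain(cloudy)
    hits += rain
print(hits / N)   # ~0.30
```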

19
Probability of x, given MB(x)
  • P(xᵢ | MB(Xᵢ)) = α P(xᵢ | Parents(Xᵢ)) ∏ P(yⱼ | Parents(Yⱼ)) over the children Yⱼ of Xᵢ
20
MCMC algorithm
21
Performance of statistical algorithms
  • Polytime approximation:
  • Stochastic approximation techniques such as
    likelihood weighting and MCMC
  • can give reasonable estimates of true posterior
    probabilities in a network, and
  • can cope with much larger networks

22
Summary uncertainty
  • Bayesian networks (BN) are DAGs w. random variables as nodes; each node has a conditional distribution for the node, given its parents
  • BNs specify a full joint distribution
  • Most widely used BNs: Printer Wizard in Microsoft Windows and the Office Assistant in Microsoft Office (Horvitz, '98)
  • Possibility theory (Zadeh, '78) simulates probability theory for Fuzzy Logic (Zadeh, '65)

23
VII. Learning
24
VII. Learning
  • Determinants
  • Neural Networks

25
  • Agents that can improve their behavior through diligent study of their own experiences.
  • Russell, Norvig

26
VII. 1. Determinants of learning
  • Components
  • Feedback
  • Representation
  • Prior knowledge
  • Methods

27
A. Components of learning agents
  • Direct mapping from conditions on the current
    state to actions.
  • Means to infer relevant properties of the world
    from the percept sequence.
  • Info about how the world evolves and the results of possible actions the agent can take.
  • Utility info indicating the desirability of world
    states.
  • Action-value info indicating the desirability of
    actions.
  • Goals that describe classes of states whose achievement maximizes the agent's utility.

28
B. Type of feedback
  • Supervised:
  • Learning a function from example I-O pairs
  • Unsupervised:
  • Learning patterns from input alone
  • Reinforcement:
  • Learning from rewards and penalties

29
C. Representation of learned info
  • Ex.:
  • linear weighted polynomials for utility functions: e.g., game-playing agents
  • PL, FOL: logical agents
  • Probabilistic descriptions (Bayesian networks): decision-theoretic agents

30
D. Prior knowledge
  • Most AI learning algorithms learn from scratch.
  • Humans usually have a lot of diverse prior
    knowledge.

31
E. Methods of learning
  • In PL:
  • Inductive learning, decision trees, etc.
  • In FOL:
  • Inductive Logic Programming (ILP), prior knowledge as attributes or relations, deductive methods, etc.
  • In Bayesian networks:
  • Learning hidden Markov models, etc.
  • In Neural Networks:
  • Support Vector Machines (kernel methods)
  • Vapnik: very effective, fashionable!!

32
VII. 2. Neural Networks
  • Introduction NN
  • Discrete Neuron Perceptron
  • Perceptron learning

33
VII.2.A. Introduction NN
34
Applications
Why NNs?
36
Man-machine hardware comparison
37
Man-machine information processing
38
What are humans good at and machines not?
  • Humans:
  • pattern recognition
  • reasoning with incomplete knowledge
  • Computers:
  • precise computing
  • number crunching

39
Purkinje cell
40
The Biological Neuron
41
The Artificial Neuron
Functions: z inside (synapse), f outside (threshold)
42
(very small) Biological NN
43
An ANN
(figure: input units feeding Layer 1)
44
An ANN
(figure: input units feeding a black box)
45
Feedforward NNs
46
Recurrent NNs
47
  • Let's look in the Black Box!

48
NEURON LINK
(figure: neuron 1 connected to neuron 2 via weight w₁₂)
49
Neuron computation
(figure: neuron k receives inputs y₁, y₂, …, yₙ through weights w₁ₖ, w₂ₖ, …, wₙₖ and produces output O)
50
Typical I-O (external activation) f
  • Standard sigmoid function: f(z) = 1/(1 + e⁻ᶻ)
  • Discrete neuron: fires at max. speed, or does not fire
  • xᵢ ∈ {0,1}; f(z) = 1 if z ≥ 0, 0 if z < 0
(plots: the sigmoid and step activation functions f over z)
51
Other I-O external activation f
  • 3. Linear neuron: f(z) = z
  • output xᵢ = zᵢ + bᵢ
  • 4. Stochastic neuron: xᵢ ∈ {0,1}, output 0 or 1
  • input zᵢ = Σⱼ wᵢⱼ xⱼ + bᵢ
  • probability that neuron fires: f(zᵢ)
  • probability that it doesn't fire: 1 − f(zᵢ)
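The activation functions from these two slides as a minimal sketch; treating the sigmoid as the stochastic neuron's firing probability is an assumption made here for illustration:

```python
import math, random

def sigmoid(z):      # standard sigmoid: f(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + math.exp(-z))

def step(z):         # discrete neuron: fires iff z >= 0
    return 1 if z >= 0 else 0

def linear(z):       # linear neuron: f(z) = z
    return z

def stochastic(z):   # stochastic neuron: output 1 with probability f(z)
    return 1 if random.random() < sigmoid(z) else 0

for z in (-2.0, 0.0, 2.0):
    print(z, round(sigmoid(z), 3), step(z), linear(z), stochastic(z))
```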

52
Perceptron
  • simple case:
  • no hidden layers
  • only one neuron
  • get rid of threshold t: it becomes w₀ (= −t) on a fixed extra input
  • f Boolean function: ≥ 0 fires, < 0 doesn't fire
  • discrete neuron

53
What can this perceptron do?
(w₀ = −t = −1)
(figure: weighted inputs summed (Σ) and passed through threshold function f)
54
f = A or B
55
f = A and B
56
  • What is learning?

57
Learning = weight computation
  • For f = A and B, the weights must satisfy (a numeric check follows below):
  • w₁·1 + w₂·1 ≥ t (= 1)
  • w₁·0 + w₂·1 < t (= 1)
  • w₁·1 + w₂·0 < t (= 1)
  • w₁·0 + w₂·0 < t (= 1)
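A numeric check of those four constraints; the candidate pair w₁ = w₂ = 0.6 is an illustrative solution, not one given on the slides:

```python
w1, w2, t = 0.6, 0.6, 1.0   # candidate weights and threshold (illustrative)

for a in (0, 1):
    for b in (0, 1):
        fires = w1 * a + w2 * b >= t
        print(a, b, fires)   # fires only for a = b = 1, i.e. f = A and B
```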

58
Perceptron Learning Rule (incremental version)
ROSENBLATT (1962)
  • FOR i := 0 TO n DO wᵢ := random initial value ENDFOR
  • REPEAT
  • select a pair (x, t) in X
  • (each pair must have a positive probability of being selected)
  • IF wᵀx′ > 0 THEN y := 1 ELSE y := 0 ENDIF
  • IF y ≠ t THEN
  • FOR i := 0 TO n DO wᵢ := wᵢ + η·(t − y)·xᵢ′ ENDFOR ENDIF
  • UNTIL X is correctly classified
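A runnable sketch of Rosenblatt's rule; unlike the incremental version above it sweeps the whole training set each epoch, and η and the epoch cap are illustrative choices:

```python
import random

def train_perceptron(examples, eta=0.1, max_epochs=1000):
    n = len(examples[0][0])
    # w[0] plays the role of the absorbed threshold (fixed input x0' = 1)
    w = [random.uniform(-1, 1) for _ in range(n + 1)]
    for _ in range(max_epochs):
        clean = True
        for x, t in examples:
            xp = [1] + list(x)                                   # augmented x'
            y = 1 if sum(wi * xi for wi, xi in zip(w, xp)) > 0 else 0
            if y != t:
                clean = False
                w = [wi + eta * (t - y) * xi for wi, xi in zip(w, xp)]
        if clean:            # X is correctly classified
            break
    return w

# Learn f = A and B from its truth table
print(train_perceptron([((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]))
```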

59
f = or(x₁, and(x₂, x₃))
w₁ = 1, w₂ = w₃ = 0.5
60
f = or(and(x₁, …, xₖ), and(xₖ₊₁, …, xₙ))
Is this correct?
w₁ = … = wₖ = 1/k, wₖ₊₁ = … = wₙ = 1/(n−k)
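A quick exhaustive check of the proposed weights with threshold 1 (k = 2, n = 4 chosen for illustration); it turns up a counterexample, which is presumably the point of the slide's question:

```python
from itertools import product

k, n = 2, 4                                     # illustrative sizes
w = [1 / k] * k + [1 / (n - k)] * (n - k)       # proposed weights

for x in product((0, 1), repeat=n):
    target = all(x[:k]) or all(x[k:])           # or(and(...), and(...))
    fires = sum(wi * xi for wi, xi in zip(w, x)) >= 1
    if fires != target:
        print("counterexample:", x)             # e.g. (1, 0, 1, 0) fires, f = 0
```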
61
Homework 8
  • STEP 8, the first veto: each student should also evaluate the other students in her/his group and give a passing/not passing grade. This is to avoid some students not working at all. Send this information to me in an e-mail (a.i.cristea_at_tue.nl), as a list of names from your group, and a word next to each name saying passing or not passing. E.g.,
  • John Doe: passing
  • Note: this is a warning only, so you should be very strict and severe, signaling in time if somebody is not cooperating (so they have a chance to change). The final veto is the one that counts.
  • (This was homework 8!!)
  • Continue till steps 9 and 12 (!) with your project.