Title: CIS732-Lecture-23-20070308
1. Lecture 23 of 42
Bayesian Networks: Midterm Review 1 of 2
Thursday, 08 March 2007
William H. Hsu
Department of Computing and Information Sciences, KSU
http://www.kddresearch.org/Courses/Spring-2007/CIS732
Readings: Chapters 1-7, Mitchell; Chapters 14-15, 18, Russell and Norvig
2. Case Study: BOC and Gibbs Classifier for ANNs (1)
3. Case Study: BOC and Gibbs Classifier for ANNs (2)
4. BOC and Gibbs Sampling
- Gibbs Sampling: Approximating the BOC
- Collect many Gibbs samples
- Interleave the update of parameters and hyperparameters
- e.g., train ANN weights using Gibbs sampling
- Accept a candidate Δw if it improves error or if rand() falls below the current threshold
- After every few thousand such transitions, sample hyperparameters
- Convergence: lower the current threshold slowly (see the sketch after this list)
- Hypothesis: return model (e.g., network weights)
- Intuitive idea: sample models (e.g., ANN snapshots) according to likelihood
- How Close to Bayes Optimality Can Gibbs Sampling Get?
- Depends on how many samples are taken (how slowly the current threshold is lowered)
- Simulated annealing terminology: annealing schedule
- More on this when we get to genetic algorithms
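A minimal sketch (not from the original slides) of the accept-and-anneal loop described above; the proposal function, error function, and cooling rate are illustrative assumptions:

```python
import random

def anneal_weights(weights, error_fn, perturb_fn, threshold=1.0,
                   cooling=0.999, steps=10_000):
    """Illustrative annealed acceptance loop for candidate weight updates.

    Accept a candidate if it improves error, or (to escape local minima)
    if a uniform draw falls below the current threshold; the threshold is
    lowered slowly -- the 'annealing schedule' named on the slide.
    """
    current_error = error_fn(weights)
    for _ in range(steps):
        candidate = perturb_fn(weights)      # propose a small change (a candidate Δw)
        candidate_error = error_fn(candidate)
        if candidate_error < current_error or random.random() < threshold:
            weights, current_error = candidate, candidate_error
        threshold *= cooling                 # lower the current threshold slowly
    return weights
```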
5. Graphical Models of Probability Distributions
- Idea
- Want a model that can be used to perform inference
- Desired properties
- Ability to represent functional, logical, and stochastic relationships
- Express uncertainty
- Observe the laws of probability
- Tractable inference when possible
- Can be learned from data
- Additional Desiderata
- Ability to incorporate knowledge
- Knowledge acquisition and elicitation in a format familiar to domain experts
- Language of subjective probabilities and relative probabilities
- Support decision making
- Represent utilities (cost or value of information, state)
- Probability theory + utility theory = decision theory
- Ability to reason over time (temporal models)
6. Using Graphical Models
- A Graphical View of Simple (Naïve) Bayes
- xi ∈ {0, 1} for each i ∈ {1, 2, …, n}; y ∈ {0, 1}
- Given: P(xi | y) for each i ∈ {1, 2, …, n}, and P(y)
- Assume conditional independence
- ∀ i ∈ {1, 2, …, n}: P(xi | x≠i, y) = P(xi | x1, x2, …, xi-1, xi+1, xi+2, …, xn, y) = P(xi | y)
- NB: this assumption entails the Naïve Bayes assumption
- Why?
- Can compute P(y | x) given this information (see the sketch after this list)
- Can also compute the joint pdf over all n + 1 variables
- Inference Problem for a (Simple) Bayesian Network
- Use the above model to compute the probability of any conditional event
- Exercise: P(x1, x2, y | x3, x4)
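A minimal sketch, assuming the CPT entries P(xi | y) and the prior P(y) are supplied as Python dicts (the data structures and names are illustrative, not from the slides), of how the conditional-independence assumption makes P(y | x) computable:

```python
def naive_bayes_posterior(x, p_y, p_x_given_y):
    """Return P(y | x) for y in {0, 1} under the Naïve Bayes assumption.

    P(y | x) is proportional to P(y) * product_i P(xi | y), normalized over y.
    x           : tuple of n binary attribute values
    p_y         : dict {y: P(y)}
    p_x_given_y : dict {(i, xi, y): P(xi | y)}
    """
    unnormalized = {}
    for y in (0, 1):
        prob = p_y[y]
        for i, xi in enumerate(x):
            prob *= p_x_given_y[(i, xi, y)]   # conditional independence step
        unnormalized[y] = prob
    evidence = sum(unnormalized.values())     # P(x)
    return {y: p / evidence for y, p in unnormalized.items()}
```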
7. In-Class Exercise: Probabilistic Inference
8. Unsupervised Learning and Conditional Independence
9. Bayesian Belief Networks (BBNs): Definition
P(Summer, Off, Drizzle, Wet, Not-Slippery) = P(S) · P(O | S) · P(D | S) · P(W | O, D) · P(N | W)
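A minimal sketch of evaluating this factorization; the CPT numbers below are made-up placeholders for illustration, not values from the lecture:

```python
# Each factor is one CPT lookup: a node's probability given its parents' values.
p_summer = 0.25                                   # P(S): P(Season = Summer)
p_off_given_s = {True: 0.6, False: 0.3}           # P(O | S)
p_drizzle_given_s = {True: 0.1, False: 0.4}       # P(D | S)
p_wet_given_od = {(True, True): 0.10, (True, False): 0.05,
                  (False, True): 0.90, (False, False): 0.01}   # P(W | O, D)
p_notslip_given_w = {True: 0.2, False: 0.95}      # P(N | W)

joint = (p_summer
         * p_off_given_s[True]                    # P(O | S = Summer)
         * p_drizzle_given_s[True]                # P(D | S = Summer)
         * p_wet_given_od[(True, True)]           # P(W | O = Off, D = Drizzle)
         * p_notslip_given_w[True])               # P(N | W = Wet)
print(joint)   # P(Summer, Off, Drizzle, Wet, Not-Slippery)
```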
10. Bayesian Belief Networks: Properties
- Conditional Independence
- Variable (node) is conditionally independent of non-descendants given its parents
- Example
- Result: chain rule for probabilistic inference (see the formula after this list)
- Bayesian Network: Probabilistic Semantics
- Node: variable
- Edge: one axis of a conditional probability table (CPT)
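The chain rule referenced above, written out (standard Bayesian-network semantics, consistent with the slide 9 example):

```latex
P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Parents}(X_i)\bigr)
```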
11. Topic 0: A Brief Overview of Machine Learning
- Overview: Topics, Applications, Motivation
- Learning: Improving with Experience at Some Task
- Improve over task T,
- with respect to performance measure P,
- based on experience E.
- Brief Tour of Machine Learning
- A case study
- A taxonomy of learning
- Intelligent systems engineering: specification of learning problems
- Issues in Machine Learning
- Design choices
- The performance element: intelligent systems
- Some Applications of Learning
- Database mining, reasoning (inference/decision support), acting
- Industrial usage of intelligent systems
12. Topic 1: Concept Learning and Version Spaces
- Concept Learning as Search through H
- Hypothesis space H as a state space
- Learning: finding the correct hypothesis
- General-to-Specific Ordering over H
- Partially-ordered set: Less-Specific-Than (More-General-Than) relation
- Upper and lower bounds in H
- Version Space: Candidate Elimination Algorithm
- S and G boundaries characterize the learner's uncertainty
- Version space can be used to make predictions over unseen cases
- Learner Can Generate Useful Queries
- Next Lecture: When and Why Are Inductive Leaps Possible?
13. Topic 2: Inductive Bias and PAC Learning
- Inductive Leaps Possible Only if Learner Is Biased
- Futility of learning without bias
- Strength of inductive bias proportional to restrictions on hypotheses
- Modeling Inductive Learners with Equivalent Deductive Systems
- Representing inductive learning as theorem proving
- Equivalent learning and inference problems
- Syntactic Restrictions
- Example: m-of-n concept
- Views of Learning and Strategies
- Removing uncertainty (data compression)
- Role of knowledge
- Introduction to Computational Learning Theory (COLT)
- Things COLT attempts to measure
- Probably-Approximately-Correct (PAC) learning framework
- Next: Occam's Razor, VC Dimension, and Error Bounds
14. Topic 3: PAC, VC-Dimension, and Mistake Bounds
- COLT Framework: Analyzing Learning Environments
- Sample complexity of C (what is m? see the bound after this list)
- Computational complexity of L
- Required expressive power of H
- Error and confidence bounds (PAC: 0 < ε < 1/2, 0 < δ < 1/2)
- What PAC Prescribes
- Whether to try to learn C with a known H
- Whether to try to reformulate H (apply a change of representation)
- Vapnik-Chervonenkis (VC) Dimension
- A formal measure of the complexity of H (besides |H|)
- Based on X and a worst-case labeling game
- Mistake Bounds
- How many could L incur?
- Another way to measure the cost of learning
- Next: Decision Trees
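One standard answer to "what is m?" for a finite hypothesis space and a consistent learner (the PAC sample-complexity bound from Mitchell, Chapter 7):

```latex
m \geq \frac{1}{\epsilon}\left(\ln |H| + \ln \frac{1}{\delta}\right)
```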
15. Topic 4: Decision Trees
- Decision Trees (DTs)
- Can be boolean (c(x) ∈ {+, -}) or range over multiple classes
- When to use DT-based models
- Generic Algorithm Build-DT: Top-Down Induction
- Calculating the best attribute upon which to split
- Recursive partitioning
- Entropy and Information Gain
- Goal: measure the uncertainty removed by splitting on a candidate attribute A
- Calculating information gain (change in entropy; see the sketch after this list)
- Using information gain in construction of the tree
- ID3 ≡ Build-DT using Gain()
- ID3 as Hypothesis Space Search (in the State Space of Decision Trees)
- Heuristic Search and Inductive Bias
- Data Mining using MLC++ (Machine Learning Library in C++)
- Next: More Biases (Occam's Razor); Managing DT Induction
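A minimal sketch of the entropy and information-gain calculation named above, assuming examples are (attribute-dict, label) pairs (that representation is an assumption for illustration):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """H(S) = -sum_c p_c * log2(p_c) over the class labels in S."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(examples, attribute):
    """Gain(S, A) = H(S) - sum_v (|S_v| / |S|) * H(S_v).

    examples: list of (features: dict, label) pairs.
    """
    labels = [label for _, label in examples]
    gain = entropy(labels)
    partitions = {}
    for features, label in examples:
        partitions.setdefault(features[attribute], []).append(label)
    for subset in partitions.values():
        gain -= (len(subset) / len(examples)) * entropy(subset)
    return gain
```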
16. Topic 5: DTs, Occam's Razor, and Overfitting
- Occam's Razor and Decision Trees
- Preference biases versus language biases
- Two issues regarding Occam algorithms
- Why prefer smaller trees? (less chance of coincidence)
- Is Occam's Razor well defined? (yes, under certain assumptions)
- MDL principle and Occam's Razor: more to come
- Overfitting
- Problem: fitting training data too closely
- General definition of overfitting
- Why it happens
- Overfitting prevention, avoidance, and recovery techniques
- Other Ways to Make Decision Tree Induction More Robust
- Next: Perceptrons, Neural Nets (Multi-Layer Perceptrons), Winnow
17. Topic 6: Perceptrons and Winnow
- Neural Networks: Parallel, Distributed Processing Systems
- Biological and artificial (ANN) types
- Perceptron (LTU, LTG): model neuron
- Single-Layer Networks
- Variety of update rules
- Multiplicative (Hebbian, Winnow), additive (gradient: Perceptron, Delta Rule); see the sketch after this list
- Batch versus incremental mode
- Various convergence and efficiency conditions
- Other ways to learn linear functions
- Linear programming (general-purpose)
- Probabilistic classifiers (some assumptions)
- Advantages and Disadvantages
- Disadvantage (tradeoff): simple and restrictive
- Advantage: perform well on many realistic problems (e.g., some text learning)
- Next: Multi-Layer Perceptrons, Backpropagation, ANN Applications
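A minimal sketch contrasting the additive and multiplicative update rules named above, for a single linear threshold unit over binary inputs (the learning-rate and promotion-factor values are illustrative assumptions):

```python
def perceptron_update(w, x, target, predicted, eta=0.1):
    """Additive rule: w_i <- w_i + eta * (t - o) * x_i."""
    return [wi + eta * (target - predicted) * xi for wi, xi in zip(w, x)]

def winnow_update(w, x, target, predicted, alpha=2.0):
    """Multiplicative rule: on a mistake, promote (* alpha) or demote (/ alpha)
    the weights of the active inputs (x_i = 1)."""
    if predicted == target:
        return w
    factor = alpha if target == 1 else 1.0 / alpha   # promote on false negative, demote on false positive
    return [wi * factor if xi == 1 else wi for wi, xi in zip(w, x)]
```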
18. Topic 7: MLPs and Backpropagation
- Multi-Layer ANNs
- Focused on feedforward MLPs
- Backpropagation of error distributes the penalty (loss) function throughout the network
- Gradient learning takes the derivative of the error surface with respect to the weights
- Error is based on the difference between desired output (t) and actual output (o)
- Actual output (o) is based on the activation function
- Must take the partial derivative of the activation function σ, so choose one that is easy to differentiate
- Two definitions of σ: sigmoid (aka logistic) and hyperbolic tangent (tanh); see the sketch after this list
- Overfitting in ANNs
- Prevention: attribute subset selection
- Avoidance: cross-validation, weight decay
- ANN Applications: Face Recognition, Text-to-Speech
- Open Problems
- Recurrent ANNs Can Express Temporal Depth (Non-Markovity)
- Next: Statistical Foundations and Evaluation, Bayesian Learning Intro
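A minimal sketch of the two activation choices and the derivative identities that make gradient learning convenient:

```python
from math import exp, tanh

def sigmoid(net):
    """Logistic activation: o = 1 / (1 + e^(-net))."""
    return 1.0 / (1.0 + exp(-net))

def sigmoid_derivative(net):
    """d(sigmoid)/d(net) = o * (1 - o), computable from the output alone."""
    o = sigmoid(net)
    return o * (1.0 - o)

def tanh_derivative(net):
    """d(tanh)/d(net) = 1 - tanh(net)^2."""
    return 1.0 - tanh(net) ** 2
```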
19. Topic 8: Statistical Evaluation of Hypotheses
- Statistical Evaluation Methods for Learning: Three Questions
- Generalization quality
- How well does observed accuracy estimate generalization accuracy?
- Estimation bias and variance
- Confidence intervals (see the sketch after this list)
- Comparing generalization quality
- How certain are we that h1 is better than h2?
- Confidence intervals for paired tests
- Learning and statistical evaluation
- What is the best way to make the most of limited data?
- k-fold CV
- Tradeoffs: Bias versus Variance
- Next: Sections 6.1-6.5, Mitchell (Bayes's Theorem, ML, MAP)
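A minimal sketch of the two-sided confidence interval for sample error from Mitchell, Chapter 5 (the small z-value table covers a few common confidence levels):

```python
from math import sqrt

# Two-sided z-values for common confidence levels (Mitchell, Ch. 5).
Z_VALUES = {0.90: 1.64, 0.95: 1.96, 0.98: 2.33, 0.99: 2.58}

def error_confidence_interval(misclassified, n, confidence=0.95):
    """Interval: error_S(h) +/- z * sqrt(error_S(h) * (1 - error_S(h)) / n).

    misclassified: number of test examples h gets wrong; n: test-set size.
    The normal approximation is reasonable for n >= ~30.
    """
    e = misclassified / n
    margin = Z_VALUES[confidence] * sqrt(e * (1.0 - e) / n)
    return e - margin, e + margin
```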
20. Topic 9: Bayes's Theorem, MAP, MLE
- Introduction to Bayesian Learning
- Framework: using probabilistic criteria to search H
- Probability foundations
- Definitions: subjectivist, objectivist; Bayesian, frequentist, logicist
- Kolmogorov axioms
- Bayes's Theorem
- Definition of conditional (posterior) probability
- Product rule
- Maximum A Posteriori (MAP) and Maximum Likelihood (ML) Hypotheses (see the definitions after this list)
- Bayes's Rule and MAP
- Uniform priors allow use of MLE to generate MAP hypotheses
- Relation to version spaces, candidate elimination
- Next: Sections 6.6-6.10, Mitchell; Chapters 14-15, Russell and Norvig; Roth
- More Bayesian learning: MDL, BOC, Gibbs, Simple (Naïve) Bayes
- Learning over text
21. Topic 10: Bayesian Classifiers: MDL, BOC, and Gibbs
- Minimum Description Length (MDL) Revisited
- Bayesian Information Criterion (BIC): justification for Occam's Razor
- Bayes Optimal Classifier (BOC)
- Using BOC as a gold standard (see the formula after this list)
- Gibbs Classifier
- Ratio bound
- Simple (Naïve) Bayes
- Rationale for assumption; pitfalls
- Practical Inference using MDL, BOC, Gibbs, Naïve Bayes
- MCMC methods (Gibbs sampling)
- Glossary: http://www.media.mit.edu/tpminka/statlearn/glossary/glossary.html
- To learn more: http://bulky.aecom.yu.edu/users/kknuth/bse.html
- Next: Sections 6.9-6.10, Mitchell
- More on simple (naïve) Bayes
- Application to learning over text
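The Bayes optimal classification referenced above, in its standard form (Mitchell, Section 6.7):

```latex
v_{BOC} = \arg\max_{v_j \in V} \sum_{h_i \in H} P(v_j \mid h_i)\, P(h_i \mid D)
```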
22. Meta-Summary
- Machine Learning Formalisms
- Theory of computation: PAC, mistake bounds
- Statistical, probabilistic: PAC, confidence intervals
- Machine Learning Techniques
- Models: version space, decision tree, perceptron, winnow, ANN, BBN
- Algorithms: candidate elimination, ID3, backprop, MLE, Naïve Bayes, K2, EM
- Midterm Study Guide
- Know
- Definitions (terminology)
- How to solve problems from Homework 1 (problem set)
- How the algorithms in Homework 2 (machine problem) work
- Practice
- Sample exam problems (handout)
- Example runs of algorithms in Mitchell and the lecture notes
- Don't panic!
23. Learning Distributions: Objectives
- Learning the Target Distribution
- What is the target distribution?
- Can't use the target distribution
- Case in point: suppose the target distribution were P1 (collected over 20 examples)
- Using Naïve Bayes would not produce an h close to the MAP/ML estimate
- Relaxing CI assumptions: expensive
- MLE becomes intractable; the BOC approximation, highly intractable
- Instead, should make judicious CI assumptions
- As before, the goal is generalization
- Given D (e.g., {1011, 1001, 0100})
- Would like to know P(1111) or P(11**) ≡ P(x1 = 1, x2 = 1)
- Several Variants
- Known or unknown structure
- Training examples may have missing values
- Known structure and no missing values: as easy as training Naïve Bayes (see the sketch after this list)
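A minimal sketch of the counting involved: with a known structure and no missing values, maximum-likelihood estimation of the CPTs reduces to relative frequencies. The fully independent structure below is an illustrative assumption, not the slide's model:

```python
D = ["1011", "1001", "0100"]   # the example dataset from the slide

# ML estimates P(x_i = 1) by relative frequency at each bit position.
n = len(D)
p_one = [sum(int(example[i]) for example in D) / n for i in range(4)]

# Under the assumed independent structure, the marginal P(x1 = 1, x2 = 1)
# is just the product of the two single-bit estimates.
p_x1_x2 = p_one[0] * p_one[1]
print(p_one, p_x1_x2)
```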
24. Summary Points
- Graphical Models of Probability
- Bayesian networks: introduction
- Definition and basic principles
- Conditional independence (causal Markovity) assumptions, tradeoffs
- Inference and learning using Bayesian networks
- Acquiring and applying CPTs
- Searching the space of trees: max likelihood
- Examples: Sprinkler, Cancer, Forest-Fire, generic tree learning
- CPT Learning: Gradient Algorithm Train-BN
- Structure Learning in Trees: MWST Algorithm Learn-Tree-Structure
- Reasoning under Uncertainty: Applications and Augmented Models
- Some Material From: http://robotics.Stanford.EDU/~koller
- Next Lecture: Read Heckerman Tutorial