1
CMSC 671, Fall 2001
  • Class #25-26: Tuesday, November 27 / Thursday,
    November 29

2
Today's class
  • Neural networks
  • Bayesian learning

3
Machine Learning: Neural and Bayesian
  • Chapter 19

Some material adapted from lecture notes by Lise
Getoor and Ron Parr
4
Neural function
  • Brain function (thought) occurs as the result of
    the firing of neurons
  • Neurons connect to each other through synapses,
    which propagate action potential (electrical
    impulses) by releasing neurotransmitters
  • Synapses can be excitatory (potential-increasing)
    or inhibitory (potential-decreasing), and have
    varying activation thresholds
  • Learning occurs as a result of the synapses'
    plasticity: they exhibit long-term changes in
    connection strength
  • There are about 10^11 neurons and about 10^14
    synapses in the human brain

5
Biology of a neuron
6
Brain structure
  • Different areas of the brain have different
    functions
  • Some areas seem to have the same function in all
    humans (e.g., Broca's region); the overall layout
    is generally consistent
  • Some areas are more plastic, and vary in their
    function; also, the lower-level structure and
    function vary greatly
  • We don't know how different functions are
    assigned or acquired
  • Partly the result of the physical layout /
    connection to inputs (sensors) and outputs
    (effectors)
  • Partly the result of experience (learning)
  • We really don't understand how this neural
    structure leads to what we perceive as
    consciousness or thought
  • Our neural networks are not nearly as complex or
    intricate as the actual brain structure

7
Comparison of computing power
  • Computers are way faster than neurons
  • But there are a lot more neurons than we can
    reasonably model in modern digital computers, and
    they all fire in parallel
  • Neural networks are designed to be massively
    parallel
  • The brain is effectively a billion times faster

8
Neural networks
  • Neural networks are made up of nodes or units,
    connected by links
  • Each link has an associated weight and activation
    level
  • Each node has an input function (typically
    summing over weighted inputs), an activation
    function, and an output

9
Layered feed-forward network
[Diagram: layered feed-forward network with input units feeding hidden units, which feed output units]
10
Neural unit
11
Executing neural networks
  • Input units are set by some exterior function
    (think of these as sensors), which causes their
    output links to be activated at the specified
    level
  • Working forward through the network, the input
    function of each unit is applied to compute the
    input value
  • Usually this is just the weighted sum of the
    activation on the links feeding into this node
  • The activation function transforms this input
    function into a final value
  • Typically this is a nonlinear function, often a
    sigmoid function corresponding to the threshold
    of that node
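
Not from the slides: a minimal Python sketch of this forward pass, assuming a weighted-sum input function and a sigmoid activation function; the (bias, weights) representation of a unit is just an illustrative choice.

import math

def sigmoid(x):
    """Logistic activation function: squashes its input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, layers):
    """Propagate activations forward through a layered feed-forward network.

    `layers` is a list of layers; each layer is a list of units, and each
    unit is a (bias, weights) pair, where weights[i] applies to the i-th
    activation of the previous layer.
    """
    activations = list(inputs)  # input units are set by the "sensors"
    for layer in layers:
        next_activations = []
        for bias, weights in layer:
            # Input function: weighted sum of the incoming activations
            net = bias + sum(w * a for w, a in zip(weights, activations))
            # Activation function: nonlinear sigmoid
            next_activations.append(sigmoid(net))
        activations = next_activations
    return activations

# Tiny example: 2 input units -> 2 hidden units -> 1 output unit
hidden_layer = [(0.0, [0.5, -0.6]), (0.1, [0.3, 0.8])]
output_layer = [(-0.2, [1.0, -1.0])]
print(forward([1.0, 0.0], [hidden_layer, output_layer]))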

12
Learning neural networks
  • Backpropagation (sketched below)
  • Cascade correlation: adding hidden units

Take it away, Chih-Yun!
Next up Sohel
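
The details of backpropagation are left to the student presentations; purely as an illustration, here is the core weight update (the delta rule) for a single sigmoid unit trained on squared error. Backpropagation applies the same kind of update layer by layer, passing the error term backward through the hidden units; the function name and learning rate below are illustrative choices, not the course's notation.

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def delta_rule_update(bias, weights, inputs, target, learning_rate=0.1):
    """One gradient-descent step for a single sigmoid unit on squared error."""
    net = bias + sum(w * a for w, a in zip(weights, inputs))
    out = sigmoid(net)
    # Error term: d(error)/d(net) for squared error, using
    # sigmoid'(net) = out * (1 - out)
    delta = (target - out) * out * (1.0 - out)
    new_weights = [w + learning_rate * delta * a
                   for w, a in zip(weights, inputs)]
    new_bias = bias + learning_rate * delta
    return new_bias, new_weights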
13
Learning Bayesian networks
  • Given a training set D
  • Find the network B that best matches D
  • model selection
  • parameter estimation

[Diagram: Data D → Inducer → network B]
14
Parameter estimation
  • Assume known structure
  • Goal: estimate BN parameters θ
  • entries in the local probability models,
    P(X | Parents(X))
  • A parameterization θ is good if it is likely to
    generate the observed data
  • Maximum Likelihood Estimation (MLE) Principle:
    Choose θ so as to maximize L(θ : D)

For i.i.d. samples, the likelihood is L(θ : D) = ∏_m P(x[m] | θ)
15
Parameter estimation in BNs
  • The likelihood decomposes according to the
    structure of the network
  • ⇒ we get a separate estimation task for each
    parameter
  • The MLE (maximum likelihood estimate) solution:
  • for each value x of a node X
  • and each instantiation u of Parents(X)
  • Just need to collect the counts for every
    combination of parents and children observed in
    the data
  • MLE is equivalent to an assumption of a uniform
    prior over parameter values

The MLE is θ_{x|u} = N(x, u) / N(u); the counts N(x, u) are the sufficient statistics
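
As an illustrative sketch only (the data layout and function name are assumptions, not from the slides), here is how those counts turn into MLE parameters for one CPT, given fully observed records stored as dicts:

from collections import Counter

def mle_cpt(records, child, parents):
    """Estimate P(child | parents) by maximum likelihood from counts."""
    joint = Counter()   # N(x, u): child value together with its parent values
    marg = Counter()    # N(u): parent values alone
    for r in records:
        u = tuple(r[p] for p in parents)
        joint[(u, r[child])] += 1
        marg[u] += 1
    # theta_{x|u} = N(x, u) / N(u)
    return {(u, x): n / marg[u] for (u, x), n in joint.items()}

# Example with the alarm-style variables used later in these slides
data = [
    {"Burglary": True,  "Earthquake": False, "Alarm": True},
    {"Burglary": False, "Earthquake": False, "Alarm": False},
    {"Burglary": False, "Earthquake": True,  "Alarm": True},
    {"Burglary": False, "Earthquake": False, "Alarm": False},
]
print(mle_cpt(data, "Alarm", ["Burglary", "Earthquake"]))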
16
Sufficient statistics: Example
  • Why are the counts sufficient?

[Diagram: the example network, with nodes Moon-phase, Light-level, Earthquake, Burglary, and Alarm]
17
Model selection
  • Goal: Select the best network structure, given
    the data
  • Input:
  • Training data
  • Scoring function
  • Output:
  • A network that maximizes the score

18
Structure selection: Scoring
  • Bayesian prior over parameters and structure
  • get balance between model complexity and fit to
    data as a byproduct
  • Score(G : D) = log P(G | D) ∝ log [P(D | G) P(G)]
  • The marginal likelihood P(D | G) just comes from
    our parameter estimates
  • The prior on structure P(G) can be any measure
    we want; typically a function of the network
    complexity

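One common concrete scoring function of this form, not named on the slide but consistent with its complexity/fit trade-off, is the BIC score, which approximates the log marginal likelihood by penalizing the maximized log-likelihood (M is the number of training instances, dim(G) the number of independent parameters):

\mathrm{Score}_{\mathrm{BIC}}(G : D) = \ell(\hat{\theta}_G : D) - \frac{\log M}{2}\,\dim(G)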
19
Heuristic search
20
Exploiting decomposability
21
Variations on a theme
  • Known structure, fully observable: only need to
    do parameter estimation
  • Unknown structure, fully observable: do heuristic
    search through structure space, then parameter
    estimation (see the sketch after this list)
  • Known structure, missing values: use expectation
    maximization (EM) to estimate parameters
  • Known structure, hidden variables: apply adaptive
    probabilistic network (APN) techniques
  • Unknown structure, hidden variables: too hard to
    solve!
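
For the "unknown structure, fully observable" case, here is a rough sketch of what heuristic search through structure space can look like: greedy hill-climbing over single-edge changes using a decomposable, BIC-style score. Everything below (function names, the first-improvement strategy, the penalty) is an illustrative assumption, not the course's specific algorithm.

import math
from collections import Counter

def family_score(records, child, parents):
    """Log-likelihood of one family (child given its parents), computed
    from counts, minus a simple BIC-style penalty for its parameters."""
    joint, marg = Counter(), Counter()
    child_vals, parent_configs = set(), set()
    for r in records:
        u = tuple(r[p] for p in parents)
        joint[(u, r[child])] += 1
        marg[u] += 1
        child_vals.add(r[child])
        parent_configs.add(u)
    loglik = sum(n * math.log(n / marg[u]) for (u, x), n in joint.items())
    n_params = (len(child_vals) - 1) * max(len(parent_configs), 1)
    return loglik - 0.5 * math.log(len(records)) * n_params

def creates_cycle(parents_of, child, new_parent):
    """Would adding the edge new_parent -> child create a directed cycle?"""
    stack, seen = [new_parent], set()
    while stack:  # a cycle appears iff child is already an ancestor of new_parent
        node = stack.pop()
        if node == child:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(parents_of[node])
    return False

def hill_climb(records, variables):
    """Greedy first-improvement search: toggle single edges while the total
    score improves.  Because the score decomposes by family, a real
    implementation would rescore only the one family that changed."""
    parents_of = {v: set() for v in variables}
    def total():
        return sum(family_score(records, v, sorted(parents_of[v]))
                   for v in variables)
    best, improved = total(), True
    while improved:
        improved = False
        for child in variables:
            for other in variables:
                if other == child:
                    continue
                if other in parents_of[child]:
                    parents_of[child].remove(other)      # try deleting the edge
                elif not creates_cycle(parents_of, child, other):
                    parents_of[child].add(other)         # try adding the edge
                else:
                    continue
                score = total()
                if score > best + 1e-9:
                    best, improved = score, True         # keep the change
                elif other in parents_of[child]:
                    parents_of[child].remove(other)      # undo an addition
                else:
                    parents_of[child].add(other)         # undo a deletion
    return parents_of, best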

22
Handling missing data
  • Suppose that in some cases, we observe
    earthquake, alarm, light-level, and moon-phase,
    but not burglary
  • Should we throw that data away??
  • Idea: Guess the missing values based on the other
    data

[Diagram: the example network with nodes Moon-phase, Light-level, Earthquake, Burglary, and Alarm]
23
EM (expectation maximization)
  • Guess probabilities for nodes with missing values
    (e.g., based on other observations)
  • Compute the probability distribution over the
    missing values, given our guess
  • Update the probabilities based on the guessed
    values
  • Repeat until convergence
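
A toy sketch of these EM iterations for the example on the next slide: Burglary is sometimes unobserved, Earthquake and Alarm are always observed, and the usual structure with Burglary and Earthquake as parents of Alarm is assumed. The representation, names, and example data are illustrative assumptions, not the course's code.

from collections import Counter

def em_burglary(records, n_iters=20):
    """Toy EM for P(Burglary) and P(Alarm | Burglary, Earthquake) when
    Burglary is missing from some records (marked with None).  No smoothing."""
    p_b = 0.5                        # initial guess for P(B = True)
    p_a = {(b, e): 0.5               # initial guess for P(A = True | b, e)
           for b in (True, False) for e in (True, False)}
    for _ in range(n_iters):
        # E-step: expected counts, weighting each completion of Burglary
        # by its posterior probability given the observed Alarm/Earthquake
        exp_b = 0.0          # expected number of records with B = True
        alarm = Counter()    # expected count of (b, e) records with A = True
        config = Counter()   # expected count of (b, e) records
        for r in records:
            a, e = r["Alarm"], r["Earthquake"]
            if r["Burglary"] is not None:
                weights = {r["Burglary"]: 1.0}
            else:
                # P(b | a, e) is proportional to P(b) * P(a | b, e)
                lik = {b: (p_b if b else 1 - p_b) *
                          (p_a[(b, e)] if a else 1 - p_a[(b, e)])
                       for b in (True, False)}
                z = lik[True] + lik[False]
                weights = {b: lik[b] / z for b in (True, False)}
            for b, w in weights.items():
                exp_b += w if b else 0.0
                config[(b, e)] += w
                if a:
                    alarm[(b, e)] += w
        # M-step: re-estimate the parameters from the expected counts
        p_b = exp_b / len(records)
        p_a = {k: alarm[k] / config[k] for k in config}
    return p_b, p_a

# One record has Burglary unobserved
data = [
    {"Burglary": True,  "Earthquake": False, "Alarm": True},
    {"Burglary": False, "Earthquake": False, "Alarm": False},
    {"Burglary": None,  "Earthquake": False, "Alarm": True},
    {"Burglary": False, "Earthquake": True,  "Alarm": True},
]
print(em_burglary(data))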

24
EM example
  • Suppose we have observed Earthquake and Alarm but
    not Burglary for an observation on November 27
  • We estimate the CPTs based on the rest of the
    data
  • We then estimate P(Burglary) for November 27 from
    those CPTs
  • Now we recompute the CPTs as if that estimated
    value had been observed
  • Repeat until convergence!

[Diagram: network fragment with nodes Earthquake, Burglary, and Alarm]