Transcript and Presenter's Notes

Title: CMSC 671 Fall 2001


1
CMSC 671Fall 2001
  • Class 25-26: Tuesday, November 27 / Thursday,
    November 29

2
Today's class
  • Neural networks
  • Bayesian learning

3
Machine Learning: Neural and Bayesian
  • Chapter 19

Some material adapted from lecture notes by Lise
Getoor and Ron Parr
4
Neural function
  • Brain function (thought) occurs as the result of
    the firing of neurons
  • Neurons connect to each other through synapses,
    which propagate action potential (electrical
    impulses) by releasing neurotransmitters
  • Synapses can be excitatory (potential-increasing)
    or inhibitory (potential-decreasing), and have
    varying activation thresholds
  • Learning occurs as a result of the synapses'
    plasticity: they exhibit long-term changes in
    connection strength
  • There are about 10^11 neurons and about 10^14
    synapses in the human brain

5
Biology of a neuron
6
Brain structure
  • Different areas of the brain have different
    functions
  • Some areas seem to have the same function in all
    humans (e.g., Broca's area); the overall layout
    is generally consistent
  • Some areas are more plastic and vary in their
    function; also, the lower-level structure and
    function vary greatly
  • We don't know how different functions are
    assigned or acquired
  • Partly the result of the physical layout /
    connection to inputs (sensors) and outputs
    (effectors)
  • Partly the result of experience (learning)
  • We really don't understand how this neural
    structure leads to what we perceive as
    consciousness or thought
  • Our neural networks are not nearly as complex or
    intricate as the actual brain structure

7
Comparison of computing power
  • Computers are way faster than neurons
  • But there are a lot more neurons than we can
    reasonably model in modern digital computers, and
    they all fire in parallel
  • Neural networks are designed to be massively
    parallel
  • The brain is effectively a billion times faster

8
Neural networks
  • Neural networks are made up of nodes or units,
    connected by links
  • Each link has an associated weight and activation
    level
  • Each node has an input function (typically
    summing over weighted inputs), an activation
    function, and an output
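To make these definitions concrete, here is a minimal Python sketch of a single unit (not from the course materials); the names sigmoid and unit_output, and the example weights, are illustrative assumptions.

```python
import math

def sigmoid(x):
    """A common choice of activation function: maps any input to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def unit_output(inputs, weights, bias):
    """One unit: the input function is a weighted sum of the incoming
    link activations; the activation function then produces the output."""
    total = bias + sum(w * a for w, a in zip(weights, inputs))
    return sigmoid(total)

# Example: a unit with two incoming links
print(unit_output([1.0, 0.0], weights=[0.5, -0.3], bias=-0.1))
```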

9
Layered feed-forward network
(diagram: input units → hidden units → output units)
10
Neural unit
11
Executing neural networks
  • Input units are set by some exterior function
    (think of these as sensors), which causes their
    output links to be activated at the specified
    level
  • Working forward through the network, the input
    function of each unit is applied to compute the
    input value
  • Usually this is just the weighted sum of the
    activation on the links feeding into this node
  • The activation function transforms this input
    function into a final value
  • Typically this is a nonlinear function, often a
    sigmoid function corresponding to the threshold
    of that node
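A minimal sketch of this forward pass for the layered feed-forward network of slide 9, assuming sigmoid activations; the layer sizes and weights below are made up for illustration, not course code.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(activations, weight_matrix, biases):
    """One layer: each unit computes the weighted sum of the previous
    layer's activations, then applies the activation function."""
    outputs = []
    for weights, b in zip(weight_matrix, biases):
        total = b + sum(w * a for w, a in zip(weights, activations))
        outputs.append(sigmoid(total))
    return outputs

def forward(inputs, layers):
    """Work forward through the network, layer by layer."""
    activations = inputs          # input units set by some external source
    for weight_matrix, biases in layers:
        activations = layer_forward(activations, weight_matrix, biases)
    return activations            # activations of the output units

# Tiny 2-input, 2-hidden-unit, 1-output network (illustrative weights)
layers = [
    ([[0.4, -0.6], [0.7, 0.1]], [0.0, -0.2]),   # hidden layer
    ([[1.2, -0.8]],             [0.3]),         # output layer
]
print(forward([1.0, 0.0], layers))
```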

12
Learning neural networks
  • Backpropagation
  • Cascade correlation: adding hidden units
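As a rough sketch of what backpropagation does (a generic illustration, not the presenters' material): one gradient-descent update for a single-hidden-layer sigmoid network with squared-error loss. The weights, training example, and learning rate are invented for the demo.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def backprop_step(x, y, w_hidden, w_out, alpha=0.1):
    """One backpropagation update on a single example (x, y).
    Each weight list ends with a bias weight whose input is fixed at 1."""
    # Forward pass
    h = [sigmoid(sum(w * xi for w, xi in zip(ws, x + [1.0]))) for ws in w_hidden]
    o = [sigmoid(sum(w * hi for w, hi in zip(ws, h + [1.0]))) for ws in w_out]
    # Backward pass: error terms (deltas) for output and hidden units
    d_out = [(yi - oi) * oi * (1 - oi) for yi, oi in zip(y, o)]
    d_hid = [hj * (1 - hj) * sum(w_out[i][j] * d_out[i] for i in range(len(o)))
             for j, hj in enumerate(h)]
    # Gradient-descent weight updates
    for i, ws in enumerate(w_out):
        for j, hj in enumerate(h + [1.0]):
            ws[j] += alpha * d_out[i] * hj
    for j, ws in enumerate(w_hidden):
        for k, xk in enumerate(x + [1.0]):
            ws[k] += alpha * d_hid[j] * xk
    return o

# Illustrative network: 2 inputs, 2 hidden units, 1 output (weights made up)
w_hidden = [[0.2, -0.4, 0.1], [-0.3, 0.5, 0.0]]
w_out = [[0.6, -0.2, 0.1]]
for _ in range(1000):
    backprop_step([1.0, 0.0], [1.0], w_hidden, w_out)
print(backprop_step([1.0, 0.0], [1.0], w_hidden, w_out))  # output approaches 1.0
```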

Take it away, Chih-Yun!
Next up: Sohel
13
Learning Bayesian networks
  • Given a training set D
  • Find the network B that best matches D
  • model selection
  • parameter estimation

(diagram: Data D → Inducer → network B)
14
Parameter estimation
  • Assume known structure
  • Goal: estimate the BN parameters θ
  • (the entries in the local probability models,
    P(X | Parents(X)))
  • A parameterization θ is good if it is likely to
    generate the observed data
  • Maximum Likelihood Estimation (MLE) principle:
    choose θ so as to maximize the likelihood
    L(θ : D) = P(D | θ) = ∏_m P(x[m] | θ)
    (the product form holds because D consists of
    i.i.d. samples x[1], ..., x[M])
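A tiny numeric sketch of the MLE principle for a single binary variable, assuming i.i.d. Bernoulli samples; the data and the grid search over θ are made up for illustration.

```python
from math import prod

# Hypothetical data: 10 i.i.d. samples of one binary variable (7 ones, 3 zeros)
data = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]

def likelihood(theta, data):
    """L(theta : D) = prod_m P(x[m] | theta) for i.i.d. Bernoulli samples."""
    return prod(theta if x == 1 else (1 - theta) for x in data)

# Search a grid of candidate parameters: the maximum sits at the
# empirical frequency, theta = 7/10
best_theta = max((t / 100 for t in range(1, 100)), key=lambda t: likelihood(t, data))
print(best_theta)   # 0.7
```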
15
Parameter estimation in BNs
  • The likelihood decomposes according to the
    structure of the network
  • ⇒ we get a separate estimation task for each
    parameter
  • The MLE (maximum likelihood estimate) solution:
    for each value x of a node X and each
    instantiation u of Parents(X),
    θ*(x | u) = N(x, u) / N(u)
  • Just need to collect the counts N(x, u) and N(u)
    for every combination of parents and children
    observed in the data; these counts are the
    sufficient statistics
  • MLE is equivalent to an assumption of a uniform
    prior over parameter values
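A minimal sketch of MLE by counting for one local model, P(Alarm | Burglary, Earthquake), assuming fully observed data; the data tuples are invented for illustration.

```python
from collections import Counter

# Hypothetical fully observed records: (burglary, earthquake, alarm)
data = [
    (0, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1),
    (1, 0, 1), (1, 0, 1), (1, 1, 1), (0, 1, 0),
]

# Sufficient statistics: N(x, u) and N(u) for Alarm and its parents
n_xu = Counter((b, e, a) for b, e, a in data)   # N(alarm value, parent values)
n_u = Counter((b, e) for b, e, _ in data)       # N(parent values)

# MLE: theta*(alarm = a | b, e) = N(a, b, e) / N(b, e)
for (b, e, a), n in sorted(n_xu.items()):
    print(f"P(Alarm={a} | Burglary={b}, Earthquake={e})"
          f" = {n}/{n_u[(b, e)]} = {n / n_u[(b, e)]:.2f}")
```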
16
Sufficient statistics: Example
  • Why are the counts sufficient?

(diagram: Bayesian network over Moon-phase,
Light-level, Earthquake, Burglary, and Alarm)
17
Model selection
  • Goal: select the best network structure, given
    the data
  • Input: training data and a scoring function
  • Output: a network that maximizes the score

18
Structure selection: Scoring
  • Bayesian prior over parameters and structure
  • get balance between model complexity and fit to
    data as a byproduct
  • Score(G : D) = log P(G | D) ∝ log [P(D | G) P(G)]
    (P(D | G) is the marginal likelihood; P(G) is the
    prior on structures)
  • Marginal likelihood just comes from our parameter
    estimates
  • Prior on structure can be any measure we want;
    typically a function of the network complexity
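One common concrete choice for such a score (my assumption here, not necessarily the scoring function used in the course) approximates the marginal likelihood with the BIC penalty, trading fit against network complexity:

```python
import math

def bic_score(log_likelihood, num_params, num_samples, log_prior=0.0):
    """Score(G : D) ≈ log P(D | G) + log P(G), with the marginal likelihood
    approximated as loglik(theta*; D) - (log M / 2) * dim(G)  (BIC)."""
    return log_likelihood - 0.5 * math.log(num_samples) * num_params + log_prior

# Two candidate structures: the denser one fits slightly better but pays
# a larger complexity penalty (all numbers are made up)
print(bic_score(log_likelihood=-420.0, num_params=9, num_samples=1000))
print(bic_score(log_likelihood=-415.0, num_params=21, num_samples=1000))
```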
19
Heuristic search
20
Exploiting decomposability
21
Variations on a theme
  • Known structure, fully observable: only need to
    do parameter estimation
  • Unknown structure, fully observable: do heuristic
    search through structure space, then parameter
    estimation
  • Known structure, missing values: use expectation
    maximization (EM) to estimate parameters
  • Known structure, hidden variables: apply adaptive
    probabilistic network (APN) techniques
  • Unknown structure, hidden variables: too hard to
    solve!

22
Handling missing data
  • Suppose that in some cases, we observe
    earthquake, alarm, light-level, and moon-phase,
    but not burglary
  • Should we throw that data away??
  • Idea: Guess the missing values based on the
    other data

(diagram: the same network over Moon-phase,
Light-level, Earthquake, Burglary, and Alarm)
23
EM (expectation maximization)
  • Guess probabilities for nodes with missing values
    (e.g., based on other observations)
  • Compute the probability distribution over the
    missing values, given our guess
  • Update the probabilities based on the guessed
    values
  • Repeat until convergence
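A hedged sketch of these EM steps for a two-node fragment Burglary → Alarm in which Burglary is sometimes missing; the structure, initial parameters, and data are assumptions made for illustration, not the course's example.

```python
# Records of (burglary, alarm); None marks a missing Burglary value
data = [(1, 1), (0, 0), (None, 1), (0, 0), (None, 0), (1, 1), (None, 1), (0, 1)]

p_b = 0.5                       # initial guess for P(B = 1)
p_a_b = {0: 0.5, 1: 0.5}        # initial guesses for P(A = 1 | B = b)

for _ in range(50):             # "repeat until convergence" (fixed iterations here)
    # E-step: expected value of B for every record, under current parameters
    exp_b = []
    for b, a in data:
        if b is not None:
            exp_b.append(float(b))
        else:
            # P(B = 1 | A = a) by Bayes' rule with the current parameters
            num = p_b * (p_a_b[1] if a else 1 - p_a_b[1])
            den = num + (1 - p_b) * (p_a_b[0] if a else 1 - p_a_b[0])
            exp_b.append(num / den)
    # M-step: re-estimate the parameters from the expected counts
    n1 = sum(exp_b)
    n0 = len(data) - n1
    p_b = n1 / len(data)
    p_a_b[1] = sum(w * a for w, (_, a) in zip(exp_b, data)) / n1
    p_a_b[0] = sum((1 - w) * a for w, (_, a) in zip(exp_b, data)) / n0

print(round(p_b, 3), {b: round(p, 3) for b, p in p_a_b.items()})
```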

24
EM example
  • Suppose we have observed Earthquake and Alarm but
    not Burglary for an observation on November 27
  • We estimate the CPTs based on the rest of the
    data
  • We then estimate P(Burglary) for November 27 from
    those CPTs
  • Now we recompute the CPTs as if that estimated
    value had been observed
  • Repeat until convergence!

(diagram: the Earthquake / Burglary / Alarm fragment
of the network)