Statistical Learning Methods

Transcript and Presenter's Notes

1
Statistical Learning Methods
  • Marco Loog

2
Introduction
  • Agents can handle uncertainty by using the
    methods of probability and decision theory
    (decision theory = probability theory + utility
    theory)
  • But first they must learn their probabilistic
    theories of the world from experience...

3
Key Concepts
  • Data: evidence, i.e., instantiation of one or
    more random variables describing the domain
  • Hypotheses: probabilistic theories of how the
    domain works

4
Outline
  • Bayesian learning
  • Maximum a posteriori and maximum likelihood
    learning
  • Instance-based learning
  • Intro neural networks

5
Bayesian Learning
  • Let D be all data, with observed value d; then
    the probability of a hypothesis hi follows from
    Bayes' rule: P(hi|d) = α P(d|hi) P(hi)
  • For prediction about quantity X: P(X|d) =
    Σi P(X|d,hi) P(hi|d) = Σi P(X|hi) P(hi|d)

6
Bayesian Learning
  • For prediction about quantity X: P(X|d) =
    Σi P(X|d,hi) P(hi|d) = Σi P(X|hi) P(hi|d)
  • No single best-guess hypothesis; all hypotheses
    are involved

7
Bayesian Learning
  • Simply calculate the probability of each
    hypothesis given the data, and make predictions
    based on this
  • I.e., predictions are based on all hypotheses,
    weighted by their probabilities, rather than on
    a single best hypothesis

8
Candy
  • Suppose five kinds of bags of candies
  • 10% are h1: 100% cherry candies
  • 20% are h2: 75% cherry candies + 25% lime candies
  • 40% are h3: 50% cherry candies + 50% lime candies
  • 20% are h4: 25% cherry candies + 75% lime candies
  • 10% are h5: 100% lime candies
  • We observe candies drawn from some bag

9
More Candy
  • We observe candies drawn from some bag
  • Assume observations are i.i.d., e.g. because
    there are many candies in the bag
  • Assume we don't like the green lime candies
  • Important questions:
  • What kind of bag is it? h1, h2, ..., h5?
  • What flavor will the next candy be?

10
Posterior Probability of Hypotheses
11
Posterior Probability of Hypotheses
  • The true hypothesis will eventually dominate the
    Bayesian prediction; the prior has no influence
    in the long run
  • More importantly (maybe not for us?): the
    Bayesian prediction is optimal (a sketch of the
    update follows below)
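A minimal sketch of this Bayesian update for the candy example above, with the priors and per-candy lime probabilities taken from slide 8; the observation sequence itself is made up for illustration:

# Bayesian learning for the candy example: update P(h_i | d) after each
# observed candy and predict the flavor of the next one.

priors = [0.1, 0.2, 0.4, 0.2, 0.1]       # P(h_i) for h1 .. h5
p_lime = [0.0, 0.25, 0.5, 0.75, 1.0]     # P(lime | h_i)

def update(posterior, observation):
    """One Bayes-rule step: P(h_i | d) ∝ P(d | h_i) P(h_i)."""
    likelihood = [p if observation == "lime" else 1 - p for p in p_lime]
    unnorm = [lik * post for lik, post in zip(likelihood, posterior)]
    alpha = 1.0 / sum(unnorm)
    return [alpha * u for u in unnorm]

def predict_lime(posterior):
    """P(next = lime | d) = sum_i P(lime | h_i) P(h_i | d)."""
    return sum(p * post for p, post in zip(p_lime, posterior))

posterior = priors[:]
for candy in ["lime", "lime", "lime"]:   # hypothetical observations
    posterior = update(posterior, candy)
    print([round(p, 3) for p in posterior], round(predict_lime(posterior), 3))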

12
The Price for Being Optimal
  • For real learning problems the hypothesis space
    is large, possibly infinite
  • Summation / integration over hypotheses cannot
    be carried out
  • Resort to approximate or simplified methods

13
Maximum A Posteriori
  • Common approximation method: make predictions
    based on the single most probable hypothesis
  • I.e., take the hi that maximizes P(hi|d)
  • Such a MAP hypothesis is approximately Bayesian,
    i.e., P(X|d) ≈ P(X|hMAP); the more evidence, the
    better the approximation (see the sketch below)
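Continuing the candy sketch above (it reuses posterior and p_lime from that code), MAP prediction keeps only the hypothesis with the highest posterior:

# MAP: keep only the hypothesis with the highest posterior and predict
# with it alone, i.e. P(X|d) ≈ P(X|h_MAP).
map_index = max(range(len(posterior)), key=lambda i: posterior[i])
print(f"MAP hypothesis: h{map_index + 1}, P(next = lime) ≈ {p_lime[map_index]}")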

14
Hypothesis Prior
  • Both in Bayesian learning and in MAP learning,
    the hypothesis prior plays an important role
  • If the hypothesis space is too expressive,
    overfitting can occur (cf. Chapter 18)
  • The prior is used to penalize complexity instead
    of explicitly limiting the space: the more
    complex the hypothesis, the lower its prior
    probability
  • If enough evidence is available, a complex
    hypothesis is eventually chosen if necessary

15
Maximum Likelihood Approximation
  • For enough data, the prior becomes irrelevant
  • Maximum likelihood (ML) learning: choose the h
    that maximizes P(d|hi)
  • I.e., simply get the best fit to the data
  • Identical to MAP for a uniform prior P(hi)
  • Also reasonable if all hypotheses are of the same
    complexity
  • ML is the standard non-Bayesian / classical
    statistical learning method

16
E.g.
  • Bag from a new manufacturer; fraction θ of
    cherry candies; any θ is possible
  • Suppose we unwrap N candies: c cherries and
    ℓ = N - c limes
  • Likelihood: P(d|hθ) = θ^c (1 - θ)^ℓ
  • Maximize for θ using the log likelihood
    L(θ) = c log θ + ℓ log(1 - θ); setting
    dL/dθ = 0 gives θ = c/N (a numerical check
    follows below)
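A small numerical check of the θ = c/N result; the counts are made up for illustration:

import math

# Maximum likelihood for the candy fraction theta:
# log L(theta) = c*log(theta) + l*log(1 - theta), maximized at theta = c/N.
c, N = 37, 50                       # hypothetical counts: 37 cherries out of 50
l = N - c

def log_likelihood(theta):
    return c * math.log(theta) + l * math.log(1 - theta)

theta_ml = c / N                    # closed-form maximizer
# crude grid search to confirm no other theta scores higher
grid_best = max((i / 1000 for i in range(1, 1000)), key=log_likelihood)
print(theta_ml, round(grid_best, 3))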

17
E.g. 2
  • Gaussian model often denoted by N(µ,?)
  • Log likelihood is given by
  • If ? is known, find maximum likelihood for µ
  • If µ is known, find maximum likelihood for ?
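A sketch of the standard closed-form answers (sample mean for µ, average squared deviation for σ²), checked on synthetic data:

import random

# Maximum-likelihood estimates for a Gaussian N(mu, sigma^2):
#   mu_hat     = (1/N) * sum_j x_j                 (sample mean)
#   sigma2_hat = (1/N) * sum_j (x_j - mu_hat)^2    (average squared deviation)
random.seed(0)
data = [random.gauss(5.0, 2.0) for _ in range(1000)]    # synthetic sample

mu_hat = sum(data) / len(data)
sigma2_hat = sum((x - mu_hat) ** 2 for x in data) / len(data)
print(round(mu_hat, 2), round(sigma2_hat ** 0.5, 2))    # close to 5.0 and 2.0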

18
Halfway Summary and Additional Remarks
  • Full Bayesian learning gives the best possible
    predictions but is intractable
  • MAP selects the single best hypothesis; the
    prior is still used
  • Maximum likelihood assumes a uniform prior; OK
    for large data sets
  • Choose a parameterized family of models to
    describe the data
  • Write down the likelihood of the data as a
    function of the parameters
  • Write down the derivative of the log likelihood
    w.r.t. each parameter
  • Find parameter values such that the derivatives
    are zero
  • ML estimation may be hard / impossible; modern
    optimization techniques help
  • In games, data often becomes available
    sequentially; it is not necessary to train in
    one go (see the sketch below)
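As an illustration of the last point, a running-mean update refreshes the ML estimate of a Gaussian mean one sample at a time; the data values here are made up:

# Incremental ML estimate of a Gaussian mean: after the n-th sample x_n,
#   mu_n = mu_{n-1} + (x_n - mu_{n-1}) / n
mu, n = 0.0, 0
for x in [4.2, 5.1, 4.8, 5.5]:      # hypothetical data arriving one at a time
    n += 1
    mu += (x - mu) / n
    print(n, round(mu, 3))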

19
Outline
  • Bayesian learning ✓
  • Maximum a posteriori and maximum likelihood
    learning ✓
  • Instance-based learning
  • Intro neural networks

20
Instance-Based Learning
  • We saw statistical learning as parameter
    learning, i.e., given a specific
    parameter-dependent family of probability models,
    fit it to the data by tweaking the parameters
  • Often simple and effective
  • Fixed complexity
  • Maybe good for very little data

21
Instance-Based Learning
  • We saw statistical learning as parameter learning
  • Nonparametric learning methods allow hypothesis
    complexity to grow with the data
  • The more data we have, the wigglier the
    hypothesis can be

22
Nearest-Neighbor Method
  • Key idea: properties of an input point x are
    likely to be similar to those of points in the
    neighborhood of x
  • E.g. classification: estimate the unknown class
    of x using the classes of neighboring points
  • Simple, but how does one define what a
    neighborhood is?
  • One solution: find the k nearest neighbors
  • But now the problem is how to decide what
    "nearest" is...

23
k Nearest-Neighbor Classification
  • Check the class / output label of your k
    neighbors and simply take, for example,
    (# of neighbors having class label x) / k as the
    posterior probability of having class label x
  • When assigning a single label, take the MAP!
    (see the sketch below)
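A minimal kNN classification sketch along these lines, assuming Euclidean distance as the notion of "nearest"; the training points are made up:

from collections import Counter
import math

def knn_classify(train, query, k=3):
    """k-nearest-neighbor classification with Euclidean distance.
    train is a list of (point, label) pairs; returns (MAP label, class posteriors)."""
    neighbors = sorted(train, key=lambda pl: math.dist(pl[0], query))[:k]
    counts = Counter(label for _, label in neighbors)
    posteriors = {label: c / k for label, c in counts.items()}  # (# neighbors with label) / k
    map_label = counts.most_common(1)[0][0]
    return map_label, posteriors

train = [((0, 0), "lime"), ((0, 1), "lime"), ((1, 0), "lime"),
         ((5, 5), "cherry"), ((5, 6), "cherry"), ((6, 5), "cherry")]
print(knn_classify(train, (1, 1), k=3))    # ('lime', {'lime': 1.0})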

24
kNN Probability Density Estimation
25
Kernel Models
  • Idea: put a little density function (a kernel)
    on every data point and take the normalized sum
    of these
  • Somewhat similar to kNN
  • Often provides comparable performance (see the
    sketch below)
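A minimal sketch of a 1-D kernel density estimate with Gaussian kernels; the bandwidth and the data are made up:

import math

def kde(data, x, bandwidth=0.5):
    """1-D kernel density estimate: normalized sum of Gaussian kernels
    centered on the data points."""
    norm = 1.0 / (bandwidth * math.sqrt(2 * math.pi))
    return sum(norm * math.exp(-0.5 * ((x - xi) / bandwidth) ** 2)
               for xi in data) / len(data)

data = [1.0, 1.2, 1.1, 3.0, 3.1]           # hypothetical 1-D sample
for x in (1.0, 2.0, 3.0):
    print(x, round(kde(data, x), 3))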

26
Probability Density Estimation
27
Outline
  • Bayesian learning v
  • Maximum a posteriori and maximum likelihood
    learning v
  • Instance-based learning v
  • Intro neural networks

28-34
Neural Networks and Games
  • (image-only slides; figures not reproduced in the transcript)
35
So First... Neural Networks
  • According to Robert Hecht-Nielsen, a neural
    network is simply "a computing system made up of
    a number of simple, highly interconnected
    processing elements, which process information by
    their dynamic state response to external inputs"
  • Simply...
  • We skip the biology for now
  • And provide the bare basics