Classification and Regression (Presentation Transcript)

1
Classification and Regression
  • What is classification? What is prediction?
  • Issues regarding classification and prediction
  • Classification by decision tree induction
  • Classification by Neural Networks
  • Classification by Support Vector Machines (SVM)
  • Bayesian Classification
  • Instance Based methods
  • Regression
  • Classification accuracy
  • Summary

2
Classification
  • Classification
  • predicts categorical class labels
  • Typical Applications
  • {credit history, salary} -> credit approval
    (Yes/No)
  • {Temp, Humidity} -> Rain (Yes/No)

3
Linear Classification
  • Binary classification problem
  • Earlier known as the linear discriminant
  • The data above the green line belongs to class x
  • The data below the green line belongs to class o
  • Examples: SVM, Perceptron, probabilistic
    classifiers

[Figure: scatter plot of points labeled x and o, separated by a
green line]
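
A minimal sketch of the decision rule these classifiers share:
classify by which side of a hyperplane w · x + b = 0 a point falls
on. The weights and points below are illustrative assumptions, not
from the slides:

    import numpy as np

    def linear_classify(x, w, b):
        # Points on the positive side of the line get class 'x',
        # points on the negative side get class 'o'.
        return 'x' if np.dot(w, x) + b > 0 else 'o'

    # Illustrative separating line x2 = x1, i.e. w = (-1, 1), b = 0
    w, b = np.array([-1.0, 1.0]), 0.0
    print(linear_classify(np.array([0.0, 2.0]), w, b))  # above -> 'x'
    print(linear_classify(np.array([2.0, 0.0]), w, b))  # below -> 'o'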
4
Fisher's Linear Discriminant
  • From statistics.
  • Try to maximize
  • J = (m+ - m-)^2 / (s+^2 + s-^2)
  • where m+ and s+ (m- and s-) are the mean and
    standard deviation of the positive (negative)
    partition.
  • It tries to cut the data into two partitions such
    that their means are as far apart as possible and,
    within each partition, the variance is as small as
    possible.
  • Skip details
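
A sketch of what the criterion measures, evaluating J on 1-D
projected values; the data here is an illustrative assumption:

    import numpy as np

    def fisher_criterion(pos, neg):
        # pos, neg: 1-D arrays of projected values for each class.
        # J is large when the class means are far apart and each
        # class has small spread.
        m_pos, m_neg = pos.mean(), neg.mean()
        s_pos, s_neg = pos.std(), neg.std()
        return (m_pos - m_neg) ** 2 / (s_pos ** 2 + s_neg ** 2)

    pos = np.array([2.9, 3.1, 3.0])
    neg = np.array([-1.1, -0.9, -1.0])
    print(fisher_criterion(pos, neg))  # large J: well separated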

5
Neural Networks
  • Analogy to biological systems (indeed, a great
    example of a good learning system)
  • Massive parallelism allows for computational
    efficiency
  • One of the first learning algorithms was the
    perceptron (Rosenblatt, 1958): if a target output
    value is provided for a single neuron with fixed
    inputs, one can incrementally change the weights
    to learn to produce that output using the
    perceptron learning rule

6
A Neuron
  • The n-dimensional input vector x is mapped to the
    variable y by means of a scalar product and a
    nonlinear function mapping: y = f(w · x - θ),
    where w is the weight vector and θ the bias
    threshold
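
A minimal sketch of this computation, assuming a step activation;
the weights and threshold are illustrative:

    import numpy as np

    def neuron(x, w, theta):
        # Scalar product of weights and inputs, minus the bias
        # threshold, passed through a step nonlinearity.
        return 1 if np.dot(w, x) - theta > 0 else 0

    print(neuron(np.array([1.0, 0.5]), np.array([0.6, 0.8]), 0.5))  # 1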

7
A Neuron
[Figure: diagram of a neuron with weighted inputs, a summation,
and a nonlinear activation producing the output y]
8
A Neuron
  • Need to learn the weight vector and the bias θ
    (drawn in the figure as a constant input 1 entering
    with weight -θ)
9
Perceptron Update Rule
  • How to get the weight vector?
  • w ← w + η (y - ŷ) x
  • where ŷ is the current output, y is the actual
    class, and η is the learning rate (typically 0.1)
  • One can show that this converges when the training
    data is linearly separable and the learning rate is
    small
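
A runnable sketch of the update rule; the AND data below is an
illustrative assumption:

    import numpy as np

    def train_perceptron(X, y, eta=0.1, epochs=100):
        # Append a constant 1 input so the bias is learned as a weight.
        Xa = np.hstack([X, np.ones((len(X), 1))])
        w = np.zeros(Xa.shape[1])
        for _ in range(epochs):
            for xi, yi in zip(Xa, y):
                y_hat = 1 if np.dot(w, xi) > 0 else 0
                w += eta * (yi - y_hat) * xi  # the perceptron update rule
        return w

    # AND is linearly separable, so the rule converges
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([0, 0, 0, 1])
    print(train_perceptron(X, y))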
10
[Figure: scatter plot of x and o training points for the
perceptron]
11
Sigmoid Activation Function
  • Instead of a step function, use the sigmoid
    function σ(z) = 1 / (1 + e^-z) as the activation
    function
  • differentiable 8-)
  • Use it to construct more complex neural networks
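
A quick sketch of the sigmoid and its derivative; the closed form
of the derivative is what makes gradient-based training convenient:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def sigmoid_prime(z):
        # The derivative is expressible in terms of the output
        # itself, which back propagation exploits.
        s = sigmoid(z)
        return s * (1.0 - s)

    print(sigmoid(0.0), sigmoid_prime(0.0))  # 0.5 0.25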

12
Multi-Layer Perceptron
[Figure: feed-forward network: input vector xi feeding input
nodes, a hidden layer, and output nodes producing the output
vector, with weights wij on the connections]
13
Back Propagation Rule
  • Forward phase: feed the input through the network,
    layer by layer, computing each unit's output
  • Backward phase: propagate the error from the output
    back through the layers, updating the weights by
    gradient descent
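
A compact sketch of both phases for a two-layer sigmoid network;
the XOR data, layer sizes, learning rate, and seed are
illustrative assumptions:

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # XOR is not linearly separable, so a hidden layer is needed
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)

    W1, b1 = rng.normal(0, 1, (2, 4)), np.zeros(4)   # input -> hidden
    W2, b2 = rng.normal(0, 1, (4, 1)), np.zeros(1)   # hidden -> output
    eta = 0.5

    for _ in range(10000):
        # Forward phase: compute activations layer by layer
        h = sigmoid(X @ W1 + b1)
        out = sigmoid(h @ W2 + b2)
        # Backward phase: propagate the error back, scaled at each
        # layer by the sigmoid derivative s * (1 - s)
        d_out = (out - y) * out * (1 - out)
        d_h = (d_out @ W2.T) * h * (1 - h)
        W2 -= eta * h.T @ d_out
        b2 -= eta * d_out.sum(axis=0)
        W1 -= eta * X.T @ d_h
        b1 -= eta * d_h.sum(axis=0)

    # Should approach [[0], [1], [1], [0]]; gradient descent can get
    # stuck in a local minimum, so another seed may be needed
    print(out.round(2))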
14
Points to Be Aware Of
  • Can further generalize to more layers.
  • But more layers can be bad. Typically two layers
    are good enough.
  • The idea of back propagation is based on gradient
    descent (covered in greater detail in a machine
    learning course, I believe).
  • Most of the time, we only get to a local minimum

[Figure: training error plotted over weight space, showing a
local minimum]
15
Network Training
  • The ultimate objective of training
  • obtain a set of weights that classifies almost all
    the tuples in the training data correctly
  • Steps (see the sketch after this list)
  • Initialize the weights with random values
  • Feed the input tuples into the network one by one
  • For each unit
  • Compute the net input to the unit as a linear
    combination of all the inputs to the unit
  • Compute the output value using the activation
    function
  • Compute the error
  • Update the weights and the bias
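
The same steps packaged in a library call, a sketch assuming
scikit-learn is available; the XOR data and parameters are
illustrative:

    import numpy as np
    from sklearn.neural_network import MLPClassifier

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    y = np.array([0, 1, 1, 0])

    # Weights start random and are updated until (almost) all
    # training tuples are classified correctly; a different
    # random_state may be needed if training stalls.
    clf = MLPClassifier(hidden_layer_sizes=(8,), activation='logistic',
                        solver='lbfgs', max_iter=1000, random_state=0)
    clf.fit(X, y)
    print(clf.predict(X))  # ideally [0 1 1 0]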

16
Network Pruning and Rule Extraction
  • Network pruning
  • A fully connected network is hard to articulate
  • N input nodes, h hidden nodes, and m output nodes
    lead to h(m+N) weights
  • Pruning: remove some of the links without
    affecting the classification accuracy of the
    network
  • Extracting rules from a trained network
  • Discretize activation values: replace each
    individual activation value by its cluster average,
    maintaining the network accuracy
  • Enumerate the outputs from the discretized
    activation values to find rules between activation
    values and outputs
  • Find the relationship between the inputs and
    activation values
  • Combine the above two to obtain rules relating the
    output to the input
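
A minimal sketch of the pruning step; magnitude thresholding is an
assumption, since the slide does not name a criterion:

    import numpy as np

    def prune(W, threshold=0.1):
        # Zero out links whose weights are near zero; such links
        # contribute little, so accuracy should be largely unaffected
        # (verify on held-out data before keeping the pruned network).
        return np.where(np.abs(W) < threshold, 0.0, W)

    W = np.array([[0.8, -0.05], [0.02, -1.2]])
    print(prune(W))  # [[ 0.8  0. ] [ 0.  -1.2]]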

17
Discriminative Classifiers
  • Advantages
  • prediction accuracy is generally high
  • robust: works even when training examples contain
    errors
  • fast evaluation of the learned target function
  • Criticism
  • long training time
  • difficult to understand the learned function
    (weights), whereas decision trees can be converted
    to a set of rules
  • not easy to incorporate domain knowledge
