1
Classification
  • A task of induction to find patterns

2
Outline
  • Data and its format
  • Problem of Classification
  • Learning a classifier
  • Different approaches
  • Key issues

3
Data and its format
  • Data
  • attribute-value pairs
  • with/without class
  • Data type
  • continuous/discrete
  • nominal
  • Data format
  • Flat
  • If not flat, what should we do?

4
Sample data
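The sample-data table on this slide was an image and is not preserved in the transcript. As a stand-in, here is a minimal sketch of the flat attribute-value format, assuming the classic golf/weather data that later slides refer to (the attribute names, values, and the "play" class label are illustrative):

    # A few illustrative records in flat attribute-value form;
    # each record is a set of attribute-value pairs plus a class label ("play").
    golf_sample = [
        {"outlook": "sunny",    "temperature": "hot",  "humidity": "high",   "wind": "weak",   "play": "no"},
        {"outlook": "overcast", "temperature": "hot",  "humidity": "high",   "wind": "weak",   "play": "yes"},
        {"outlook": "rain",     "temperature": "mild", "humidity": "high",   "wind": "weak",   "play": "yes"},
        {"outlook": "rain",     "temperature": "cool", "humidity": "normal", "wind": "strong", "play": "no"},
    ]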
5
Induction from databases
  • Inferring knowledge from data
  • In contrast, the task of deduction is to
  • infer information that is a logical consequence
    of querying a database
  • Who conducted this class before?
  • Which courses are attended by Mary?
  • Deductive databases extending the RDBMS

6
Classification
  • It is one type of induction
  • data with class labels
  • Examples -
  • If weather is rainy then no golf
  • If ...
  • If ...

7
Different approaches
  • There exist many techniques
  • Decision trees
  • Neural networks
  • K-nearest neighbors
  • Naïve Bayesian classifiers
  • Support Vector Machines
  • Ensemble methods
  • Semi-supervised
  • and many more ...

8
A decision tree
9
Inducing a decision tree
  • There are many possible trees
  • let's try it on the golfing data
  • How to find the most compact one
  • that is consistent with the data?
  • Why the most compact?
  • Occam's razor principle
  • Issue of efficiency w.r.t. optimality

10
Information gain and entropy
  • Entropy - H(S) = - sum_i p_i log2 p_i, where p_i is the
    fraction of instances in the node that belong to class i
  • Information gain - the difference between the entropy of the
    node before splitting and the weighted entropy of its children
    after splitting (see the sketch below)
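A minimal sketch of these two quantities in Python; the function names and the record format (dicts with a "play" class label, as in the sample-data sketch) are assumptions, not from the slides:

    import math
    from collections import Counter

    def entropy(labels):
        """Entropy of a list of class labels: -sum_i p_i * log2(p_i)."""
        total = len(labels)
        return -sum((n / total) * math.log2(n / total)
                    for n in Counter(labels).values())

    def information_gain(records, attribute, class_key="play"):
        """Entropy of the node minus the weighted entropy of the child
        nodes obtained by splitting on `attribute`."""
        labels = [r[class_key] for r in records]
        before = entropy(labels)
        after = 0.0
        for value in set(r[attribute] for r in records):
            subset = [r[class_key] for r in records if r[attribute] == value]
            after += (len(subset) / len(records)) * entropy(subset)
        return before - after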

11
Building a compact tree
  • The key to building a decision tree - which
    attribute to choose in order to branch.
  • The heuristic is to choose the attribute with the
    maximum IG.
  • Another explanation is to reduce uncertainty as
    much as possible.
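Continuing the sketch above, the branching heuristic amounts to a single argmax over the candidate attributes (this reuses information_gain() and golf_sample from the earlier sketches):

    # Branch on the attribute with the maximum information gain at this node.
    attributes = ["outlook", "temperature", "humidity", "wind"]
    best = max(attributes, key=lambda a: information_gain(golf_sample, a))
    # On the full 14-instance golf data, "outlook" has the largest gain.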

12
Learn a decision tree
Outlook
  sunny -> Humidity
    high -> NO
    normal -> YES
  overcast -> YES
  rain -> Wind
    strong -> NO
    weak -> YES
13
Issues of Decision Trees
  • Number of values of an attribute
  • Your solution?
  • When to stop
  • Data fragmentation problem
  • Any solution?
  • Mixed data types
  • Scalability

14
Rules and Tree stumps
  • Generating rules from decision trees
  • One path is a rule
  • We can do better. Why?
  • Tree stumps and 1R
  • For each attribute value, determine a default
    class (the number of rules equals the number of values)
  • Calculate the number of errors for each rule
  • Find the total number of errors for that attribute's rule set
  • Choose the rule set that has the fewest errors (see the sketch below)
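A minimal 1R sketch over attribute-value records; the names are illustrative and the record format follows the sample-data sketch above:

    from collections import Counter, defaultdict

    def one_r(records, attributes, class_key="play"):
        """For each attribute, build one rule per value (predict the majority
        class seen with that value), count the rule set's errors, and keep the
        attribute whose rule set makes the fewest errors."""
        best_attr, best_rules, best_errors = None, None, None
        for attr in attributes:
            by_value = defaultdict(list)
            for r in records:
                by_value[r[attr]].append(r[class_key])
            rules, errors = {}, 0
            for value, labels in by_value.items():
                majority, count = Counter(labels).most_common(1)[0]
                rules[value] = majority
                errors += len(labels) - count
            if best_errors is None or errors < best_errors:
                best_attr, best_rules, best_errors = attr, rules, errors
        return best_attr, best_rules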

15
K-Nearest Neighbor
  • One of the most intuitive classification
    algorithms
  • An unseen instance's class is determined by its
    nearest neighbor
  • The problem is that it is sensitive to noise
  • Instead of using one neighbor, we can use k
    neighbors (see the sketch below)
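A minimal k-NN sketch for numeric feature vectors, using Euclidean distance and a majority vote (the names and the toy data are illustrative):

    import math
    from collections import Counter

    def knn_predict(train, query, k=3):
        """train: list of (feature_vector, label) pairs; query: a feature vector.
        Returns the majority label among the k nearest training points."""
        neighbors = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
        return Counter(label for _, label in neighbors).most_common(1)[0][0]

    # k = 1 can be swayed by a single noisy neighbor; a larger k votes over more points.
    train = [((1.0, 1.0), "yes"), ((1.2, 0.9), "yes"), ((5.0, 5.0), "no"), ((5.1, 4.8), "no")]
    print(knn_predict(train, (1.1, 1.1), k=3))   # -> "yes"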

16
K-NN
  • New problems
  • How large should k be?
  • lazy learning - does it learn?
  • large storage
  • A toy example (noise, majority)
  • How good is k-NN?
  • How to compare
  • Speed
  • Accuracy

17
Naïve Bayes Classifier
  • This is a direct application of Bayes' rule
  • P(C|X) = P(X|C)P(C)/P(X)
  • X - a vector of x1, x2, ..., xn
  • That's the best classifier we can build
  • But, there are problems
  • There are only a limited number of instances
  • How to estimate P(X|C)?
  • Your suggestions?

18
NBC (2)
  • Assume conditional independence between the xi's
  • We have
  • P(C|X) = P(x1|C) ... P(xi|C) ... P(xn|C) P(C)
  • What's missing? Is it really correct?
  • An example (see the sketch below)
  • How good is it in reality?
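A minimal sketch of the counting estimates for categorical attributes. The add-one (Laplace) smoothing is an assumption added here so that an unseen attribute value does not zero out the whole product; the names and record format follow the earlier sketches:

    import math
    from collections import Counter, defaultdict

    def train_nb(records, attributes, class_key="play"):
        class_counts = Counter(r[class_key] for r in records)
        value_counts = defaultdict(Counter)            # (class, attribute) -> value counts
        for r in records:
            for a in attributes:
                value_counts[(r[class_key], a)][r[a]] += 1
        return class_counts, value_counts

    def predict_nb(class_counts, value_counts, attributes, x):
        total = sum(class_counts.values())
        best, best_score = None, float("-inf")
        for c, n_c in class_counts.items():
            score = math.log(n_c / total)              # log P(C)
            for a in attributes:
                counts = value_counts[(c, a)]
                # rough add-one smoothed estimate of P(x_a | C)
                score += math.log((counts[x[a]] + 1) / (n_c + len(counts) + 1))
            if score > best_score:
                best, best_score = c, score
        return best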

19
No Free Lunch
  • If the goal is to obtain good generalization
    performance, there are no context-independent or
    usage-independent reasons to favor one learning
    or classification method over another.
  • http://en.wikipedia.org/wiki/No-Free-Lunch_theorems
  • What does it indicate?
  • Or is it easy to choose a good classifier for
    your application?
  • Again, there is no off-the-shelf solution for a
    reasonably challenging application.

20
Ensemble Methods
  • Motivation
  • Stability
  • Model generation
  • Bagging (Bootstrap Aggregating)
  • Boosting
  • Model combination
  • Majority voting
  • Meta learning
  • Stacking (using different types of classifiers)
  • Examples (classify-ensemble.ppt)
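A minimal bagging sketch with majority voting, using scikit-learn decision trees as the base learner; the library, the number of models, and the integer label encoding are assumptions, not from the slides:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def bagging_fit(X, y, n_models=25, seed=0):
        """Train each model on a bootstrap sample (drawn with replacement)."""
        rng = np.random.default_rng(seed)
        models = []
        for _ in range(n_models):
            idx = rng.integers(0, len(X), size=len(X))
            models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
        return models

    def bagging_predict(models, X):
        """Majority vote over the individual models' predictions
        (assumes integer-encoded class labels)."""
        votes = np.array([m.predict(X) for m in models])
        return np.array([np.bincount(col).argmax() for col in votes.T])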

21
AdaBoost.M1 (from Weka Book)
Model generation
  • Assign equal weight to each training instance
  • For t iterations
  • Apply learning algorithm to weighted dataset,
  • store resulting model
  • Compute model's error e on weighted dataset
  • If e = 0 or e > 0.5
  • Terminate model generation
  • For each instance in dataset
  • If classified correctly by model
  • Multiply instance's weight by e/(1-e)
  • Normalize weights of all instances

Classification
Assign weight 0 to all classes
For each of the t models (or fewer)
  For the class this model predicts, add -log(e/(1-e)) to this class's weight
Return class with highest weight
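A rough Python rendering of the pseudocode above, using decision stumps as the weighted base learner. The scikit-learn calls and the stump choice are assumptions; the e/(1-e) weight update and the -log(e/(1-e)) vote follow the pseudocode:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def adaboost_m1_fit(X, y, t=10):
        n = len(X)
        w = np.full(n, 1.0 / n)                  # equal weight to each training instance
        models, betas = [], []
        for _ in range(t):
            m = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
            wrong = m.predict(X) != y
            e = w[wrong].sum() / w.sum()         # model's error on the weighted dataset
            if e == 0 or e > 0.5:
                break                            # terminate model generation
            beta = e / (1 - e)
            models.append(m)
            betas.append(beta)
            w[~wrong] *= beta                    # downweight correctly classified instances
            w /= w.sum()                         # normalize weights
        return models, betas

    def adaboost_m1_predict(models, betas, X, classes):
        scores = {c: np.zeros(len(X)) for c in classes}
        for m, beta in zip(models, betas):
            pred = m.predict(X)
            for c in classes:
                scores[c][pred == c] += -np.log(beta)   # add -log(e/(1-e)) to the predicted class
        stacked = np.vstack([scores[c] for c in classes])
        return np.array(classes)[stacked.argmax(axis=0)]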
22
Using many different classifiers
  • We have learned some basic and often used
    classifiers
  • There are many more out there.
  • Regression
  • Discriminant analysis
  • Neural networks
  • Support vector machines
  • Pick the most suitable one for an application
  • Where to find all these classifiers?
  • Don't reinvent a wheel that is not as round
  • We will likely come back to classification and
    discuss support vector machines as requested

23
Assignment 3
  1. Pick one of your favorite software packages (feel
    free to use any at your disposal, as we discussed
    in class)
  2. Use the mushroom dataset from the UC Irvine
    Machine Learning Repository
  3. Run a decision tree induction algorithm to get
    the following
  4. Use resubstitution error to measure
  5. Use 10-fold cross validation to measure
  6. Show the confusion matrix for the above two error
    measures
  7. Summarize and report your observations and
    conjectures, if any
  8. Submit a hardcopy report on Wednesday 3/1/06
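Not part of the assignment text, but for reference, a minimal sketch of steps 3-6 with scikit-learn; the file name, the column layout (class label in the first column), and the ordinal encoding of the categorical attributes are assumptions about the UCI mushroom data:

    import pandas as pd
    from sklearn.preprocessing import OrdinalEncoder
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.model_selection import cross_val_predict
    from sklearn.metrics import accuracy_score, confusion_matrix

    df = pd.read_csv("agaricus-lepiota.data", header=None)   # assumed local copy of the dataset
    y = df[0]                                                 # class label: edible / poisonous
    X = OrdinalEncoder().fit_transform(df.drop(columns=0))

    # Resubstitution error: train and evaluate on the same data.
    resub = DecisionTreeClassifier().fit(X, y).predict(X)
    print("resubstitution accuracy:", accuracy_score(y, resub))
    print(confusion_matrix(y, resub))

    # 10-fold cross-validation.
    cv = cross_val_predict(DecisionTreeClassifier(), X, y, cv=10)
    print("10-fold CV accuracy:", accuracy_score(y, cv))
    print(confusion_matrix(y, cv))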

24
Classification via Neural Networks
[Figure: a perceptron - weighted inputs are summed and passed through a
squashing function to produce the output]
25
What can a perceptron do?
  • Neuron as a computing device
  • To separate linearly separable points
  • Nice things about a perceptron
  • distributed representation
  • local learning
  • weight adjusting

26
Linear threshold unit
  • Basic concepts: projection, thresholding

[Figure: a weight vector W and an input vector L in the plane; inputs whose
projection onto W exceeds the threshold evoke output 1 (values from the
slide: W = (.11, .6), L = (.7, .7), threshold .5)]
27
E.g. 1: Solution region for the AND problem
  • Find a weight vector that satisfies all the
    constraints

AND problem: (0,0) -> 0, (0,1) -> 0, (1,0) -> 0, (1,1) -> 1
28
E.g. 2: Solution region for the XOR problem?
XOR problem: (0,0) -> 0, (0,1) -> 1, (1,0) -> 1, (1,1) -> 0
29
Learning by error reduction
  • Perceptron learning algorithm (sketched in code below)
  • If the activation level of the output unit is 1
    when it should be 0, reduce the weight on the
    link to the ith input unit by r*Li, where Li is
    the ith input value and r is the learning rate
  • If the activation level of the output unit is 0
    when it should be 1, increase the weight on the
    link to the ith input unit by r*Li
  • Otherwise, do nothing
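A minimal sketch of this update rule, trained on the AND problem from the earlier slide; the bias handling (an always-on extra input) and the parameter values are assumptions:

    def train_perceptron(examples, r=0.1, epochs=20):
        """examples: list of (inputs, target) pairs with 0/1 targets.
        The weight vector includes a bias weight for an always-1 input."""
        n = len(examples[0][0])
        w = [0.0] * (n + 1)
        for _ in range(epochs):
            for x, target in examples:
                xb = list(x) + [1.0]                           # append the bias input
                out = 1 if sum(wi * xi for wi, xi in zip(w, xb)) > 0 else 0
                if out == 1 and target == 0:                   # output too high: reduce weights by r * input
                    w = [wi - r * xi for wi, xi in zip(w, xb)]
                elif out == 0 and target == 1:                 # output too low: increase weights by r * input
                    w = [wi + r * xi for wi, xi in zip(w, xb)]
        return w

    AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
    w = train_perceptron(AND)   # converges because AND is linearly separable; XOR would not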

30
Multi-layer perceptrons
  • Using the chain rule, we can back-propagate the
    errors for a multi-layer perceptron (see the sketch below).

[Figure: a multi-layer perceptron with an input layer, a hidden layer, and an
output layer]
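A minimal back-propagation sketch for one hidden layer, trained on the XOR problem that a single perceptron cannot separate; the layer sizes, learning rate, squared-error loss, and sigmoid activations are assumptions:

    import numpy as np

    rng = np.random.default_rng(0)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)            # XOR targets

    def sigmoid(z):
        return 1 / (1 + np.exp(-z))

    W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)              # input -> hidden
    W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)              # hidden -> output
    r = 0.5                                                    # learning rate

    for _ in range(10000):
        h = sigmoid(X @ W1 + b1)                               # forward pass
        out = sigmoid(h @ W2 + b2)
        d_out = (out - y) * out * (1 - out)                    # chain rule at the output layer
        d_h = (d_out @ W2.T) * h * (1 - h)                     # error back-propagated to the hidden layer
        W2 -= r * (h.T @ d_out); b2 -= r * d_out.sum(axis=0)   # gradient steps
        W1 -= r * (X.T @ d_h);   b1 -= r * d_h.sum(axis=0)

    print(out.round(2))   # typically approaches [[0], [1], [1], [0]]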