1
Classification
  • A task of induction to find patterns

2
Outline
  • Data and its format
  • Problem of Classification
  • Learning a classifier
  • Different approaches
  • Key issues

3
Data and its format
  • Data
  • attribute-value pairs
  • with/without class
  • Data type
  • continuous/discrete
  • nominal
  • Data format
  • Flat
  • If not flat, what should we do?

4
Sample data
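The sample-data table on this slide was an image and is not preserved in the transcript. As a stand-in, here is a minimal sketch of the flat attribute-value format, assuming the classic golf/weather data that later slides refer to (the attribute names, values, and the "play" class label are illustrative):

    # A few illustrative records in flat attribute-value form;
    # each record is a set of attribute-value pairs plus a class label ("play").
    golf_sample = [
        {"outlook": "sunny",    "temperature": "hot",  "humidity": "high",   "wind": "weak",   "play": "no"},
        {"outlook": "overcast", "temperature": "hot",  "humidity": "high",   "wind": "weak",   "play": "yes"},
        {"outlook": "rain",     "temperature": "mild", "humidity": "high",   "wind": "weak",   "play": "yes"},
        {"outlook": "rain",     "temperature": "cool", "humidity": "normal", "wind": "strong", "play": "no"},
    ]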
5
Induction from databases
  • Inferring knowledge from data
  • In contrast, the task of deduction is to
  • infer information that is a logical consequence
    of querying a database
  • Who conducted this class before?
  • Which courses are attended by Mary?
  • Deductive databases extending the RDBMS

6
Classification
  • It is one type of induction
  • data with class labels
  • Examples -
  • If weather is rainy then no golf
  • If ...
  • If ...

7
Different approaches
  • There exist many techniques
  • Decision trees
  • Neural networks
  • K-nearest neighbors
  • Naïve Bayesian classifiers
  • Support Vector Machines
  • Ensemble methods
  • Semi-supervised
  • and many more ...

8
A decision tree
9
Inducing a decision tree
  • There are many possible trees
  • let's try it on the golfing data
  • How to find the most compact one
  • that is consistent with the data?
  • Why the most compact?
  • Occam's razor principle
  • Issue of efficiency w.r.t. optimality

10
Information gain and entropy
  • Entropy - H(S) = - sum_i p_i log2 p_i, where p_i is the
    fraction of instances in the node that belong to class i
  • Information gain - the difference between the entropy of the
    node before splitting and the weighted entropy of its children
    after splitting (see the sketch below)
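A minimal sketch of these two quantities in Python; the function names and the record format (dicts with a "play" class label, as in the sample-data sketch) are assumptions, not from the slides:

    import math
    from collections import Counter

    def entropy(labels):
        """Entropy of a list of class labels: -sum_i p_i * log2(p_i)."""
        total = len(labels)
        return -sum((n / total) * math.log2(n / total)
                    for n in Counter(labels).values())

    def information_gain(records, attribute, class_key="play"):
        """Entropy of the node minus the weighted entropy of the child
        nodes obtained by splitting on `attribute`."""
        labels = [r[class_key] for r in records]
        before = entropy(labels)
        after = 0.0
        for value in set(r[attribute] for r in records):
            subset = [r[class_key] for r in records if r[attribute] == value]
            after += (len(subset) / len(records)) * entropy(subset)
        return before - after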

11
Building a compact tree
  • The key to building a decision tree - which
    attribute to choose in order to branch.
  • The heuristic is to choose the attribute with the
    maximum IG.
  • Another explanation is to reduce uncertainty as
    much as possible.
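Continuing the sketch above, the branching heuristic amounts to a single argmax over the candidate attributes (this reuses information_gain() and golf_sample from the earlier sketches):

    # Branch on the attribute with the maximum information gain at this node.
    attributes = ["outlook", "temperature", "humidity", "wind"]
    best = max(attributes, key=lambda a: information_gain(golf_sample, a))
    # On the full 14-instance golf data, "outlook" has the largest gain.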

12
Learn a decision tree
Outlook
  sunny -> Humidity
    high -> NO
    normal -> YES
  overcast -> YES
  rain -> Wind
    strong -> NO
    weak -> YES
13
Issues of Decision Trees
  • Number of values of an attribute
  • Your solution?
  • When to stop
  • Data fragmentation problem
  • Any solution?
  • Mixed data types
  • Scalability

14
Rules and Tree stumps
  • Generating rules from decision trees
  • One path is a rule
  • We can do better. Why?
  • Tree stumps and 1R
  • For each attribute value, determine a default
    class (the number of rules equals the number of values)
  • Calculate the number of errors for each rule
  • Find the total number of errors for that attribute's rule set
  • Choose the rule set that has the fewest errors (see the sketch below)
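A minimal 1R sketch over attribute-value records; the names are illustrative and the record format follows the sample-data sketch above:

    from collections import Counter, defaultdict

    def one_r(records, attributes, class_key="play"):
        """For each attribute, build one rule per value (predict the majority
        class seen with that value), count the rule set's errors, and keep the
        attribute whose rule set makes the fewest errors."""
        best_attr, best_rules, best_errors = None, None, None
        for attr in attributes:
            by_value = defaultdict(list)
            for r in records:
                by_value[r[attr]].append(r[class_key])
            rules, errors = {}, 0
            for value, labels in by_value.items():
                majority, count = Counter(labels).most_common(1)[0]
                rules[value] = majority
                errors += len(labels) - count
            if best_errors is None or errors < best_errors:
                best_attr, best_rules, best_errors = attr, rules, errors
        return best_attr, best_rules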

15
K-Nearest Neighbor
  • One of the most intuitive classification
    algorithms
  • An unseen instance's class is determined by its
    nearest neighbor
  • The problem is that it is sensitive to noise
  • Instead of using one neighbor, we can use k
    neighbors (see the sketch below)
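A minimal k-NN sketch for numeric feature vectors, using Euclidean distance and a majority vote (the names and the toy data are illustrative):

    import math
    from collections import Counter

    def knn_predict(train, query, k=3):
        """train: list of (feature_vector, label) pairs; query: a feature vector.
        Returns the majority label among the k nearest training points."""
        neighbors = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
        return Counter(label for _, label in neighbors).most_common(1)[0][0]

    # k = 1 can be swayed by a single noisy neighbor; a larger k votes over more points.
    train = [((1.0, 1.0), "yes"), ((1.2, 0.9), "yes"), ((5.0, 5.0), "no"), ((5.1, 4.8), "no")]
    print(knn_predict(train, (1.1, 1.1), k=3))   # -> "yes"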

16
K-NN
  • New problems
  • How large should k be?
  • lazy learning - does it learn?
  • large storage
  • A toy example (noise, majority)
  • How good is k-NN?
  • How to compare
  • Speed
  • Accuracy

17
Naïve Bayes Classifier
  • This is a direct application of Bayes' rule
  • P(C|X) = P(X|C)P(C)/P(X)
  • X - a vector of x1, x2, ..., xn
  • That's the best classifier we can build
  • But, there are problems
  • There are only a limited number of instances
  • How to estimate P(X|C)?
  • Your suggestions?

18
NBC (2)
  • Assume conditional independence between the xi's
  • We have
  • P(C|X) = P(x1|C) ... P(xi|C) ... P(xn|C) P(C)
  • What's missing? Is it really correct?
  • An example (see the sketch below)
  • How good is it in reality?
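A minimal sketch of the counting estimates for categorical attributes. The add-one (Laplace) smoothing is an assumption added here so that an unseen attribute value does not zero out the whole product; the names and record format follow the earlier sketches:

    import math
    from collections import Counter, defaultdict

    def train_nb(records, attributes, class_key="play"):
        class_counts = Counter(r[class_key] for r in records)
        value_counts = defaultdict(Counter)            # (class, attribute) -> value counts
        for r in records:
            for a in attributes:
                value_counts[(r[class_key], a)][r[a]] += 1
        return class_counts, value_counts

    def predict_nb(class_counts, value_counts, attributes, x):
        total = sum(class_counts.values())
        best, best_score = None, float("-inf")
        for c, n_c in class_counts.items():
            score = math.log(n_c / total)              # log P(C)
            for a in attributes:
                counts = value_counts[(c, a)]
                # rough add-one smoothed estimate of P(x_a | C)
                score += math.log((counts[x[a]] + 1) / (n_c + len(counts) + 1))
            if score > best_score:
                best, best_score = c, score
        return best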

19
No Free Lunch
  • If the goal is to obtain good generalization
    performance, there are no context-independent or
    usage-independent reasons to favor one learning
    or classification method over another.
  • http://en.wikipedia.org/wiki/No-Free-Lunch_theorems
  • What does it indicate?
  • Or is it easy to choose a good classifier for
    your application?
  • Again, there is no off-the-shelf solution for a
    reasonably challenging application.

20
Ensemble Methods
  • Motivation
  • Stability
  • Model generation
  • Bagging (Bootstrap Aggregating)
  • Boosting
  • Model combination
  • Majority voting
  • Meta learning
  • Stacking (using different types of classifiers)
  • Examples (classify-ensemble.ppt)
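A minimal bagging sketch with majority voting, using scikit-learn decision trees as the base learner; the library, the number of models, and the integer label encoding are assumptions, not from the slides:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def bagging_fit(X, y, n_models=25, seed=0):
        """Train each model on a bootstrap sample (drawn with replacement)."""
        rng = np.random.default_rng(seed)
        models = []
        for _ in range(n_models):
            idx = rng.integers(0, len(X), size=len(X))
            models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
        return models

    def bagging_predict(models, X):
        """Majority vote over the individual models' predictions
        (assumes integer-encoded class labels)."""
        votes = np.array([m.predict(X) for m in models])
        return np.array([np.bincount(col).argmax() for col in votes.T])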

21
AdaBoost.M1 (from Weka Book)
Model generation
  • Assign equal weight to each training instance
  • For t iterations
  • Apply learning algorithm to weighted dataset,
  • store resulting model
  • Compute model's error e on weighted dataset
  • If e = 0 or e > 0.5
  • Terminate model generation
  • For each instance in dataset
  • If classified correctly by model
  • Multiply instance's weight by e/(1-e)
  • Normalize weights of all instances

Classification
Assign weight 0 to all classes
For each of the t models (or fewer)
  For the class this model predicts, add -log(e/(1-e)) to this class's weight
Return class with highest weight
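A rough Python rendering of the pseudocode above, using decision stumps as the weighted base learner. The scikit-learn calls and the stump choice are assumptions; the e/(1-e) weight update and the -log(e/(1-e)) vote follow the pseudocode:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def adaboost_m1_fit(X, y, t=10):
        n = len(X)
        w = np.full(n, 1.0 / n)                  # equal weight to each training instance
        models, betas = [], []
        for _ in range(t):
            m = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
            wrong = m.predict(X) != y
            e = w[wrong].sum() / w.sum()         # model's error on the weighted dataset
            if e == 0 or e > 0.5:
                break                            # terminate model generation
            beta = e / (1 - e)
            models.append(m)
            betas.append(beta)
            w[~wrong] *= beta                    # downweight correctly classified instances
            w /= w.sum()                         # normalize weights
        return models, betas

    def adaboost_m1_predict(models, betas, X, classes):
        scores = {c: np.zeros(len(X)) for c in classes}
        for m, beta in zip(models, betas):
            pred = m.predict(X)
            for c in classes:
                scores[c][pred == c] += -np.log(beta)   # add -log(e/(1-e)) to the predicted class
        stacked = np.vstack([scores[c] for c in classes])
        return np.array(classes)[stacked.argmax(axis=0)]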
22
Using many different classifiers
  • We have learned some basic and often used
    classifiers
  • There are many more out there.
  • Regression
  • Discriminant analysis
  • Neural networks
  • Support vector machines
  • Pick the most suitable one for an application
  • Where to find all these classifiers?
  • Don't reinvent a wheel that is not as round
  • We will likely come back to classification and
    discuss support vector machines as requested

23
Assignment 3
  1. Pick one of your favorite software packages (feel
    free to use any at your disposal, as we discussed
    in class)
  2. Use the mushroom dataset from the UC Irvine
    Machine Learning Repository
  3. Run a decision tree induction algorithm to get
    the following
  4. Use resubstitution error to measure
  5. Use 10-fold cross validation to measure
  6. Show the confusion matrix for the above two error
    measures
  7. Summarize and report your observations and
    conjectures, if any
  8. Submit a hardcopy report on Wednesday 3/1/06
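Not part of the assignment text, but for reference, a minimal sketch of steps 3-6 with scikit-learn; the file name, the column layout (class label in the first column), and the ordinal encoding of the categorical attributes are assumptions about the UCI mushroom data:

    import pandas as pd
    from sklearn.preprocessing import OrdinalEncoder
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.model_selection import cross_val_predict
    from sklearn.metrics import accuracy_score, confusion_matrix

    df = pd.read_csv("agaricus-lepiota.data", header=None)   # assumed local copy of the dataset
    y = df[0]                                                 # class label: edible / poisonous
    X = OrdinalEncoder().fit_transform(df.drop(columns=0))

    # Resubstitution error: train and evaluate on the same data.
    resub = DecisionTreeClassifier().fit(X, y).predict(X)
    print("resubstitution accuracy:", accuracy_score(y, resub))
    print(confusion_matrix(y, resub))

    # 10-fold cross-validation.
    cv = cross_val_predict(DecisionTreeClassifier(), X, y, cv=10)
    print("10-fold CV accuracy:", accuracy_score(y, cv))
    print(confusion_matrix(y, cv))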

24
Classification via Neural Networks
[Figure: a perceptron - weighted inputs are summed and passed through a
squashing function to produce the output]
25
What can a perceptron do?
  • Neuron as a computing device
  • To separate linearly separable points
  • Nice things about a perceptron
  • distributed representation
  • local learning
  • weight adjusting

26
Linear threshold unit
  • Basic concepts: projection, thresholding

[Figure: a weight vector W and an input vector L in the plane; inputs whose
projection onto W exceeds the threshold evoke output 1 (values from the
slide: W = (.11, .6), L = (.7, .7), threshold .5)]
27
E.g. 1: Solution region for the AND problem
  • Find a weight vector that satisfies all the
    constraints

AND problem: (0,0) -> 0, (0,1) -> 0, (1,0) -> 0, (1,1) -> 1
28
E.g. 2: Solution region for the XOR problem?
XOR problem: (0,0) -> 0, (0,1) -> 1, (1,0) -> 1, (1,1) -> 0
29
Learning by error reduction
  • Perceptron learning algorithm (sketched in code below)
  • If the activation level of the output unit is 1
    when it should be 0, reduce the weight on the
    link to the ith input unit by r*Li, where Li is
    the ith input value and r is the learning rate
  • If the activation level of the output unit is 0
    when it should be 1, increase the weight on the
    link to the ith input unit by r*Li
  • Otherwise, do nothing
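A minimal sketch of this update rule, trained on the AND problem from the earlier slide; the bias handling (an always-on extra input) and the parameter values are assumptions:

    def train_perceptron(examples, r=0.1, epochs=20):
        """examples: list of (inputs, target) pairs with 0/1 targets.
        The weight vector includes a bias weight for an always-1 input."""
        n = len(examples[0][0])
        w = [0.0] * (n + 1)
        for _ in range(epochs):
            for x, target in examples:
                xb = list(x) + [1.0]                           # append the bias input
                out = 1 if sum(wi * xi for wi, xi in zip(w, xb)) > 0 else 0
                if out == 1 and target == 0:                   # output too high: reduce weights by r * input
                    w = [wi - r * xi for wi, xi in zip(w, xb)]
                elif out == 0 and target == 1:                 # output too low: increase weights by r * input
                    w = [wi + r * xi for wi, xi in zip(w, xb)]
        return w

    AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
    w = train_perceptron(AND)   # converges because AND is linearly separable; XOR would not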

30
Multi-layer perceptrons
  • Using the chain rule, we can back-propagate the
    errors for a multi-layer perceptron (see the sketch below).

[Figure: a multi-layer perceptron with an input layer, a hidden layer, and an
output layer]
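A minimal back-propagation sketch for one hidden layer, trained on the XOR problem that a single perceptron cannot separate; the layer sizes, learning rate, squared-error loss, and sigmoid activations are assumptions:

    import numpy as np

    rng = np.random.default_rng(0)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)            # XOR targets

    def sigmoid(z):
        return 1 / (1 + np.exp(-z))

    W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)              # input -> hidden
    W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)              # hidden -> output
    r = 0.5                                                    # learning rate

    for _ in range(10000):
        h = sigmoid(X @ W1 + b1)                               # forward pass
        out = sigmoid(h @ W2 + b2)
        d_out = (out - y) * out * (1 - out)                    # chain rule at the output layer
        d_h = (d_out @ W2.T) * h * (1 - h)                     # error back-propagated to the hidden layer
        W2 -= r * (h.T @ d_out); b2 -= r * d_out.sum(axis=0)   # gradient steps
        W1 -= r * (X.T @ d_h);   b1 -= r * d_h.sum(axis=0)

    print(out.round(2))   # typically approaches [[0], [1], [1], [0]]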