1
CSE 980: Data Mining
  • Lecture 7: Alternative Classification Techniques

2
Artificial Neural Networks (ANN)
Output Y is 1 if at least two of the three inputs
are equal to 1.
3
Artificial Neural Networks (ANN)
4
Artificial Neural Networks (ANN)
  • The model is an assembly of interconnected nodes and weighted links
  • The output node sums its input values, weighted by the strengths of its links
  • The output sum is compared against a threshold t

Perceptron Model:
$Y = I\left(\sum_i w_i X_i - t > 0\right)$  or  $Y = \mathrm{sign}\left(\sum_i w_i X_i - t\right)$
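
A minimal sketch of the perceptron model, using assumed weights of 0.3 per link and an assumed threshold t = 0.4 (values chosen to realize the 2-of-3 function from slide 2; the original figure's exact values are not preserved in this transcript):

```python
from itertools import product

# A perceptron computing Y = I(sum_i w_i * X_i - t > 0).
# Weights of 0.3 and threshold 0.4 are assumed values that realize
# "Y = 1 iff at least two of the three binary inputs are 1".
def perceptron(x, w=(0.3, 0.3, 0.3), t=0.4):
    s = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if s - t > 0 else 0

for x in product([0, 1], repeat=3):
    print(x, "->", perceptron(x))   # fires exactly when >= 2 inputs are 1
```
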
5
General Structure of ANN
Training an ANN means learning the weights of its neurons
6
Algorithm for learning ANN
  • Initialize the weights $(w_0, w_1, \dots, w_k)$
  • Adjust the weights so that the output of the ANN is consistent with the class labels of the training examples
  • Objective function: $E = \sum_i \left[ Y_i - f(\mathbf{w}, \mathbf{X}_i) \right]^2$
  • Find the weights $w_i$ that minimize the above objective function
  • e.g., the backpropagation algorithm (see lecture notes)
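
The slides point to backpropagation for multi-layer networks; for a single perceptron, the weight adjustment reduces to the perceptron learning rule $w \leftarrow w + \lambda (y - \hat{y}) x$. A minimal sketch, with an assumed learning rate and epoch count, trained on the 2-of-3 target from slide 2:

```python
from itertools import product
import numpy as np

# Perceptron weight learning: adjust weights until the output is
# consistent with the class labels (update w <- w + lam*(y - y_hat)*x).
# The learning rate lam = 0.1 and epoch count are assumed values.
def train_perceptron(X, y, lam=0.1, epochs=50):
    X = np.hstack([X, np.ones((len(X), 1))])  # fold threshold t into a bias weight
    w = np.zeros(X.shape[1])                  # initialize the weights
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            y_hat = 1 if xi @ w > 0 else 0
            w += lam * (yi - y_hat) * xi      # nonzero only on mistakes
    return w

X = np.array(list(product([0, 1], repeat=3)))
y = (X.sum(axis=1) >= 2).astype(int)          # the 2-of-3 target from slide 2
w = train_perceptron(X, y)
preds = (np.hstack([X, np.ones((len(X), 1))]) @ w > 0).astype(int)
print("learned weights:", w, " consistent:", (preds == y).all())
```
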

7
Support Vector Machines
  • Find a linear hyperplane (decision boundary) that
    will separate the data

8
Support Vector Machines
  • One possible solution

9
Support Vector Machines
  • Another possible solution

10
Support Vector Machines
  • Other possible solutions

11
Support Vector Machines
  • Which one is better? B1 or B2?
  • How do you define better?

12
Support Vector Machines
  • Find the hyperplane that maximizes the margin ⇒ B1 is better than B2

13
Support Vector Machines
14
Support Vector Machines
  • We want to maximize the margin: $\text{Margin} = \frac{2}{\|\mathbf{w}\|}$
  • Which is equivalent to minimizing: $L(\mathbf{w}) = \frac{\|\mathbf{w}\|^2}{2}$
  • But subject to the following constraints: $y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1$ for every training record $(\mathbf{x}_i, y_i)$, $y_i \in \{-1, +1\}$
  • This is a constrained optimization problem
  • Numerical approaches solve it (e.g., quadratic programming)
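
A minimal sketch of solving this optimization with an off-the-shelf solver: scikit-learn's SVC with a linear kernel handles the quadratic program internally. The toy data is an assumed, linearly separable example, and the very large C approximates the hard-margin formulation above:

```python
import numpy as np
from sklearn.svm import SVC

# SVC with a linear kernel solves the quadratic program
# min ||w||^2 / 2  subject to  y_i (w . x_i + b) >= 1.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, (20, 2)), rng.normal(2, 0.5, (20, 2))])
y = np.array([-1] * 20 + [1] * 20)

clf = SVC(kernel="linear", C=1e6).fit(X, y)       # large C ~ hard margin
w, b = clf.coef_[0], clf.intercept_[0]
print("w =", w, " b =", b)
print("margin width =", 2 / np.linalg.norm(w))    # the quantity being maximized
print("number of support vectors:", len(clf.support_vectors_))
```
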

15
Support Vector Machines
  • What if the problem is not linearly separable?

16
Support Vector Machines
  • What if the problem is not linearly separable?
  • Introduce slack variables $\xi_i \ge 0$, one per record, measuring the margin violation
  • Need to minimize: $L(\mathbf{w}) = \frac{\|\mathbf{w}\|^2}{2} + C \sum_{i=1}^{N} \xi_i$
  • Subject to: $y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1 - \xi_i$
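
A minimal sketch of the soft-margin objective: at the optimum the slack is $\xi_i = \max(0, 1 - y_i(\mathbf{w} \cdot \mathbf{x}_i + b))$, so minimizing the objective reduces to hinge-loss subgradient descent with no explicit slack variables. C, the step size, the epoch count, and the toy data are assumed values:

```python
import numpy as np

# Soft-margin SVM via subgradient descent on ||w||^2/2 + C * sum(hinge).
def soft_margin_svm(X, y, C=1.0, lr=0.01, epochs=200):
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        viol = y * (X @ w + b) < 1                 # records with nonzero slack
        grad_w = w - C * (y[viol, None] * X[viol]).sum(axis=0)
        grad_b = -C * y[viol].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

X = np.array([[-2., -1.], [-1., -2.], [1., 2.], [2., 1.], [0.2, 0.1]])
y = np.array([-1, -1, 1, 1, -1])                    # last record strays near the boundary
w, b = soft_margin_svm(X, y)
print("w =", w, " b =", b, " slacks =", np.maximum(0, 1 - y * (X @ w + b)))
```
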

17
Nonlinear Support Vector Machines
  • What if decision boundary is not linear?

18
Nonlinear Support Vector Machines
  • Transform the data into a higher-dimensional space
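
A minimal sketch of one common transformation (an assumption; the slide's own mapping is not preserved in this transcript): under the quadratic map $\phi(\mathbf{x}) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$, which corresponds to the degree-2 polynomial kernel, data separated by a circle becomes linearly separable:

```python
import numpy as np
from sklearn.svm import SVC

# Data separable by the circle x1^2 + x2^2 = 1 is not linearly separable
# in 2-D, but becomes linearly separable under the assumed quadratic map.
rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, (200, 2))
y = np.where((X ** 2).sum(axis=1) < 1.0, 1, -1)   # inside vs. outside the circle

phi = np.column_stack([X[:, 0] ** 2,
                       np.sqrt(2) * X[:, 0] * X[:, 1],
                       X[:, 1] ** 2])
clf = SVC(kernel="linear").fit(phi, y)            # a linear boundary in phi-space
print("training accuracy in phi-space:", clf.score(phi, y))
```
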

19
Ensemble Methods
  • Construct a set of classifiers from the training
    data
  • Predict the class label of previously unseen records by aggregating the predictions made by multiple classifiers

20
General Idea
21
Why does it work?
  • Suppose there are 25 base classifiers
  • Each classifier has error rate $\varepsilon = 0.35$
  • Assume the classifiers are independent
  • Probability that the ensemble (majority-vote) classifier makes a wrong prediction, i.e., that at least 13 of the 25 classifiers err: $\sum_{i=13}^{25} \binom{25}{i} \varepsilon^i (1 - \varepsilon)^{25 - i} = 0.06$
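
A worked computation of the 0.06 figure:

```python
from math import comb

# The majority vote of 25 errs only if at least 13 base classifiers err
# simultaneously; with independent errors this count is binomial.
eps, n = 0.35, 25
p_wrong = sum(comb(n, i) * eps**i * (1 - eps)**(n - i) for i in range(13, n + 1))
print(f"P(ensemble wrong) = {p_wrong:.2f}")   # 0.06, vs. 0.35 for a single classifier
```
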

22
Examples of Ensemble Methods
  • How to generate an ensemble of classifiers?
  • Bagging
  • Boosting

23
Bagging
  • Sampling with replacement
  • Build a classifier on each bootstrap sample
  • Each record has probability $1 - (1 - 1/n)^n \approx 0.632$ of being selected for a given bootstrap sample of size $n$
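
A minimal bagging sketch on assumed toy data, with decision trees as the (assumed) base classifiers; it also evaluates the selection probability above:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Bagging: draw bootstrap samples (sampling with replacement), build a
# classifier on each, and aggregate by majority vote.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

models = []
for _ in range(25):
    idx = rng.integers(0, len(X), size=len(X))    # with replacement
    models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

votes = np.array([m.predict(X) for m in models])
y_pred = (votes.mean(axis=0) > 0.5).astype(int)   # majority vote
print("training accuracy:", (y_pred == y).mean())

n = len(X)
print("P(record selected) =", 1 - (1 - 1 / n) ** n)   # ~0.634 for n = 100
```
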

24
Boosting
  • An iterative procedure to adaptively change the distribution of the training data by focusing more on previously misclassified records
  • Initially, all N records are assigned equal weights
  • Unlike bagging, the weights may change at the end of each boosting round

25
Boosting
  • Records that are wrongly classified will have
    their weights increased
  • Records that are classified correctly will have
    their weights decreased
  • Example 4 is hard to classify
  • Its weight is increased; therefore, it is more likely to be chosen again in subsequent rounds

26
Example: AdaBoost
  • Base classifiers: $C_1, C_2, \dots, C_T$
  • Error rate: $\varepsilon_t = \frac{1}{N} \sum_{j=1}^{N} w_j \, \delta\!\left(C_t(\mathbf{x}_j) \ne y_j\right)$
  • Importance of a classifier: $\alpha_t = \frac{1}{2} \ln\!\left(\frac{1 - \varepsilon_t}{\varepsilon_t}\right)$

27
Example: AdaBoost
  • Weight update: $w_j^{(t+1)} = \frac{w_j^{(t)}}{Z_t} \times \begin{cases} e^{-\alpha_t} & \text{if } C_t(\mathbf{x}_j) = y_j \\ e^{\alpha_t} & \text{if } C_t(\mathbf{x}_j) \ne y_j \end{cases}$, where $Z_t$ is a normalization factor
  • If any intermediate round produces an error rate higher than 50%, the weights are reverted to $1/n$ and the resampling procedure is repeated
  • Classification: $C^*(\mathbf{x}) = \arg\max_y \sum_{t=1}^{T} \alpha_t \, \delta\!\left(C_t(\mathbf{x}) = y\right)$
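
A minimal AdaBoost sketch implementing the formulas above, with depth-1 decision stumps as the (assumed) base classifiers and assumed toy data:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# AdaBoost following the update rules above. Labels are in {-1, +1}.
def adaboost(X, y, T=20):
    n = len(X)
    w = np.full(n, 1 / n)                          # equal initial weights
    models, alphas = [], []
    for _ in range(T):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        eps = w[pred != y].sum()                   # weighted error rate
        if eps > 0.5:                              # revert weights to 1/n
            w = np.full(n, 1 / n)
            continue
        alpha = 0.5 * np.log((1 - eps) / max(eps, 1e-10))
        w = w * np.exp(-alpha * y * pred)          # raise wrong, lower right
        w = w / w.sum()                            # normalize (the Z_t factor)
        models.append(stump)
        alphas.append(alpha)
    return models, alphas

def classify(models, alphas, X):                   # weighted majority vote
    return np.sign(sum(a * m.predict(X) for m, a in zip(models, alphas)))

X = np.random.default_rng(0).normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)         # assumed toy target
models, alphas = adaboost(X, y)
print("training accuracy:", (classify(models, alphas, X) == y).mean())
```
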

28
Illustrating AdaBoost
29
Illustrating AdaBoost