1
CSE 980: Data Mining
  • Lecture 7: Alternative Classification Techniques

2
Artificial Neural Networks (ANN)
Output Y is 1 if at least two of the three inputs
are equal to 1.
3
Artificial Neural Networks (ANN)
4
Artificial Neural Networks (ANN)
  • The model is an assembly of interconnected nodes and weighted links
  • The output node sums its input values, weighted by the strengths of its links
  • The output sum is compared against a threshold t

Perceptron Model:
$Y = I\left(\sum_i w_i X_i - t > 0\right)$  or  $Y = \mathrm{sign}\left(\sum_i w_i X_i - t\right)$
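
A minimal sketch of the perceptron model, using assumed weights of 0.3 per link and an assumed threshold t = 0.4 (values chosen to realize the 2-of-3 function from slide 2; the original figure's exact values are not preserved in this transcript):

```python
from itertools import product

# A perceptron computing Y = I(sum_i w_i * X_i - t > 0).
# Weights of 0.3 and threshold 0.4 are assumed values that realize
# "Y = 1 iff at least two of the three binary inputs are 1".
def perceptron(x, w=(0.3, 0.3, 0.3), t=0.4):
    s = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if s - t > 0 else 0

for x in product([0, 1], repeat=3):
    print(x, "->", perceptron(x))   # fires exactly when >= 2 inputs are 1
```
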
5
General Structure of ANN
Training an ANN means learning the weights of its neurons
6
Algorithm for learning ANN
  • Initialize the weights $(w_0, w_1, \dots, w_k)$
  • Adjust the weights so that the output of the ANN is consistent with the class labels of the training examples
  • Objective function: $E = \sum_i \left[ Y_i - f(\mathbf{w}, \mathbf{X}_i) \right]^2$
  • Find the weights $w_i$ that minimize the above objective function
  • e.g., the backpropagation algorithm (see lecture notes)
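
The slides point to backpropagation for multi-layer networks; for a single perceptron, the weight adjustment reduces to the perceptron learning rule $w \leftarrow w + \lambda (y - \hat{y}) x$. A minimal sketch, with an assumed learning rate and epoch count, trained on the 2-of-3 target from slide 2:

```python
from itertools import product
import numpy as np

# Perceptron weight learning: adjust weights until the output is
# consistent with the class labels (update w <- w + lam*(y - y_hat)*x).
# The learning rate lam = 0.1 and epoch count are assumed values.
def train_perceptron(X, y, lam=0.1, epochs=50):
    X = np.hstack([X, np.ones((len(X), 1))])  # fold threshold t into a bias weight
    w = np.zeros(X.shape[1])                  # initialize the weights
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            y_hat = 1 if xi @ w > 0 else 0
            w += lam * (yi - y_hat) * xi      # nonzero only on mistakes
    return w

X = np.array(list(product([0, 1], repeat=3)))
y = (X.sum(axis=1) >= 2).astype(int)          # the 2-of-3 target from slide 2
w = train_perceptron(X, y)
preds = (np.hstack([X, np.ones((len(X), 1))]) @ w > 0).astype(int)
print("learned weights:", w, " consistent:", (preds == y).all())
```
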

7
Support Vector Machines
  • Find a linear hyperplane (decision boundary) that
    will separate the data

8
Support Vector Machines
  • One possible solution

9
Support Vector Machines
  • Another possible solution

10
Support Vector Machines
  • Other possible solutions

11
Support Vector Machines
  • Which one is better? B1 or B2?
  • How do you define better?

12
Support Vector Machines
  • Find the hyperplane that maximizes the margin ⇒ B1 is better than B2

13
Support Vector Machines
14
Support Vector Machines
  • We want to maximize the margin: $\text{Margin} = \frac{2}{\|\mathbf{w}\|}$
  • Which is equivalent to minimizing: $L(\mathbf{w}) = \frac{\|\mathbf{w}\|^2}{2}$
  • But subject to the following constraints: $y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1$ for every training record $(\mathbf{x}_i, y_i)$, $y_i \in \{-1, +1\}$
  • This is a constrained optimization problem
  • Numerical approaches solve it (e.g., quadratic programming)
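
A minimal sketch of solving this optimization with an off-the-shelf solver: scikit-learn's SVC with a linear kernel handles the quadratic program internally. The toy data is an assumed, linearly separable example, and the very large C approximates the hard-margin formulation above:

```python
import numpy as np
from sklearn.svm import SVC

# SVC with a linear kernel solves the quadratic program
# min ||w||^2 / 2  subject to  y_i (w . x_i + b) >= 1.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, (20, 2)), rng.normal(2, 0.5, (20, 2))])
y = np.array([-1] * 20 + [1] * 20)

clf = SVC(kernel="linear", C=1e6).fit(X, y)       # large C ~ hard margin
w, b = clf.coef_[0], clf.intercept_[0]
print("w =", w, " b =", b)
print("margin width =", 2 / np.linalg.norm(w))    # the quantity being maximized
print("number of support vectors:", len(clf.support_vectors_))
```
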

15
Support Vector Machines
  • What if the problem is not linearly separable?

16
Support Vector Machines
  • What if the problem is not linearly separable?
  • Introduce slack variables $\xi_i \ge 0$, one per record, measuring the margin violation
  • Need to minimize: $L(\mathbf{w}) = \frac{\|\mathbf{w}\|^2}{2} + C \sum_{i=1}^{N} \xi_i$
  • Subject to: $y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1 - \xi_i$
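
A minimal sketch of the soft-margin objective: at the optimum the slack is $\xi_i = \max(0, 1 - y_i(\mathbf{w} \cdot \mathbf{x}_i + b))$, so minimizing the objective reduces to hinge-loss subgradient descent with no explicit slack variables. C, the step size, the epoch count, and the toy data are assumed values:

```python
import numpy as np

# Soft-margin SVM via subgradient descent on ||w||^2/2 + C * sum(hinge).
def soft_margin_svm(X, y, C=1.0, lr=0.01, epochs=200):
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        viol = y * (X @ w + b) < 1                 # records with nonzero slack
        grad_w = w - C * (y[viol, None] * X[viol]).sum(axis=0)
        grad_b = -C * y[viol].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

X = np.array([[-2., -1.], [-1., -2.], [1., 2.], [2., 1.], [0.2, 0.1]])
y = np.array([-1, -1, 1, 1, -1])                    # last record strays near the boundary
w, b = soft_margin_svm(X, y)
print("w =", w, " b =", b, " slacks =", np.maximum(0, 1 - y * (X @ w + b)))
```
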

17
Nonlinear Support Vector Machines
  • What if decision boundary is not linear?

18
Nonlinear Support Vector Machines
  • Transform the data into a higher-dimensional space
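
A minimal sketch of one common transformation (an assumption; the slide's own mapping is not preserved in this transcript): under the quadratic map $\phi(\mathbf{x}) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$, which corresponds to the degree-2 polynomial kernel, data separated by a circle becomes linearly separable:

```python
import numpy as np
from sklearn.svm import SVC

# Data separable by the circle x1^2 + x2^2 = 1 is not linearly separable
# in 2-D, but becomes linearly separable under the assumed quadratic map.
rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, (200, 2))
y = np.where((X ** 2).sum(axis=1) < 1.0, 1, -1)   # inside vs. outside the circle

phi = np.column_stack([X[:, 0] ** 2,
                       np.sqrt(2) * X[:, 0] * X[:, 1],
                       X[:, 1] ** 2])
clf = SVC(kernel="linear").fit(phi, y)            # a linear boundary in phi-space
print("training accuracy in phi-space:", clf.score(phi, y))
```
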

19
Ensemble Methods
  • Construct a set of classifiers from the training
    data
  • Predict the class label of previously unseen records by aggregating the predictions made by multiple classifiers

20
General Idea
21
Why does it work?
  • Suppose there are 25 base classifiers
  • Each classifier has error rate $\varepsilon = 0.35$
  • Assume the classifiers are independent
  • Probability that the ensemble (majority-vote) classifier makes a wrong prediction, i.e., that at least 13 of the 25 classifiers err: $\sum_{i=13}^{25} \binom{25}{i} \varepsilon^i (1 - \varepsilon)^{25 - i} = 0.06$
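
A worked computation of the 0.06 figure:

```python
from math import comb

# The majority vote of 25 errs only if at least 13 base classifiers err
# simultaneously; with independent errors this count is binomial.
eps, n = 0.35, 25
p_wrong = sum(comb(n, i) * eps**i * (1 - eps)**(n - i) for i in range(13, n + 1))
print(f"P(ensemble wrong) = {p_wrong:.2f}")   # 0.06, vs. 0.35 for a single classifier
```
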

22
Examples of Ensemble Methods
  • How to generate an ensemble of classifiers?
  • Bagging
  • Boosting

23
Bagging
  • Sampling with replacement
  • Build a classifier on each bootstrap sample
  • Each record has probability $1 - (1 - 1/n)^n \approx 0.632$ of being selected for a given bootstrap sample of size $n$
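
A minimal bagging sketch on assumed toy data, with decision trees as the (assumed) base classifiers; it also evaluates the selection probability above:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Bagging: draw bootstrap samples (sampling with replacement), build a
# classifier on each, and aggregate by majority vote.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

models = []
for _ in range(25):
    idx = rng.integers(0, len(X), size=len(X))    # with replacement
    models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

votes = np.array([m.predict(X) for m in models])
y_pred = (votes.mean(axis=0) > 0.5).astype(int)   # majority vote
print("training accuracy:", (y_pred == y).mean())

n = len(X)
print("P(record selected) =", 1 - (1 - 1 / n) ** n)   # ~0.634 for n = 100
```
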

24
Boosting
  • An iterative procedure to adaptively change the distribution of the training data by focusing more on previously misclassified records
  • Initially, all N records are assigned equal weights
  • Unlike bagging, the weights may change at the end of each boosting round

25
Boosting
  • Records that are wrongly classified will have
    their weights increased
  • Records that are classified correctly will have
    their weights decreased
  • Example 4 is hard to classify
  • Its weight is increased; therefore, it is more likely to be chosen again in subsequent rounds

26
Example: AdaBoost
  • Base classifiers: $C_1, C_2, \dots, C_T$
  • Error rate: $\varepsilon_t = \frac{1}{N} \sum_{j=1}^{N} w_j \, \delta\!\left(C_t(\mathbf{x}_j) \ne y_j\right)$
  • Importance of a classifier: $\alpha_t = \frac{1}{2} \ln\!\left(\frac{1 - \varepsilon_t}{\varepsilon_t}\right)$

27
Example: AdaBoost
  • Weight update: $w_j^{(t+1)} = \frac{w_j^{(t)}}{Z_t} \times \begin{cases} e^{-\alpha_t} & \text{if } C_t(\mathbf{x}_j) = y_j \\ e^{\alpha_t} & \text{if } C_t(\mathbf{x}_j) \ne y_j \end{cases}$, where $Z_t$ is a normalization factor
  • If any intermediate round produces an error rate higher than 50%, the weights are reverted to $1/n$ and the resampling procedure is repeated
  • Classification: $C^*(\mathbf{x}) = \arg\max_y \sum_{t=1}^{T} \alpha_t \, \delta\!\left(C_t(\mathbf{x}) = y\right)$
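
A minimal AdaBoost sketch implementing the formulas above, with depth-1 decision stumps as the (assumed) base classifiers and assumed toy data:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# AdaBoost following the update rules above. Labels are in {-1, +1}.
def adaboost(X, y, T=20):
    n = len(X)
    w = np.full(n, 1 / n)                          # equal initial weights
    models, alphas = [], []
    for _ in range(T):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        eps = w[pred != y].sum()                   # weighted error rate
        if eps > 0.5:                              # revert weights to 1/n
            w = np.full(n, 1 / n)
            continue
        alpha = 0.5 * np.log((1 - eps) / max(eps, 1e-10))
        w = w * np.exp(-alpha * y * pred)          # raise wrong, lower right
        w = w / w.sum()                            # normalize (the Z_t factor)
        models.append(stump)
        alphas.append(alpha)
    return models, alphas

def classify(models, alphas, X):                   # weighted majority vote
    return np.sign(sum(a * m.predict(X) for m, a in zip(models, alphas)))

X = np.random.default_rng(0).normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)         # assumed toy target
models, alphas = adaboost(X, y)
print("training accuracy:", (classify(models, alphas, X) == y).mean())
```
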

28
Illustrating AdaBoost
29
Illustrating AdaBoost