Lecture 3: Perceptron

Slides: 26
Provided by: SanjoyD5

1
Lecture 3: Perceptron
2
Recap: Perceptron algorithm
  • Datapoints $(x_1, y_1), (x_2, y_2), \ldots$ with $x_t \in \mathbb{R}^d$, $y_t \in \{1, -1\}$, are separable by a hyperplane through
    the origin
  • $w = 0$
  • for $t = 1, 2, \ldots$
  •     if $y_t (w \cdot x_t) \le 0$
  •         $w = w + y_t x_t$
  • Claim: Suppose
  •     $\|x_t\| \le R$ for all $t$
  •     There is some unit vector $u \in \mathbb{R}^d$ and some
        margin $\gamma > 0$ such that $y_t (u \cdot x_t) \ge \gamma$ for all
        $t$
  • Then Perceptron makes at most $(R/\gamma)^2$
    mistakes/updates.
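
The recap above can be sketched in a few lines of plain Python. This is a minimal illustration, not the lecture's own code: the toy points, the `max_passes` safeguard, and the choice to cycle over a finite dataset to simulate the stream `t = 1, 2, ...` are all assumptions made for the example.

```python
import math

def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

def perceptron(points, max_passes=1000):
    """Cycle over (x, y) pairs; update w on every mistake.

    Stops after the first full pass with no update (or after max_passes,
    a safeguard that is an assumption of this sketch).
    """
    w = [0.0] * len(points[0][0])
    updates = 0
    for _ in range(max_passes):
        changed = False
        for x, y in points:
            if y * dot(w, x) <= 0:  # misclassified, or exactly on the boundary
                w = [wi + y * xi for wi, xi in zip(w, x)]
                updates += 1
                changed = True
        if not changed:
            break
    return w, updates

# Hypothetical toy data, separable through the origin by the unit vector u = (1, 0).
points = [((1.0, 2.0), 1), ((2.0, -1.0), 1), ((-1.0, 1.5), -1), ((-3.0, -0.5), -1)]
u = (1.0, 0.0)

w, updates = perceptron(points)
R = max(math.sqrt(dot(x, x)) for x, _ in points)   # radius of the data
gamma = min(y * dot(u, x) for x, y in points)      # margin achieved by u
bound = (R / gamma) ** 2
print(updates, "updates; bound (R/gamma)^2 =", bound)
```

The number of updates printed should never exceed the bound, whatever order the points are presented in.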

3
Preprocessing step
  • Points $(x, y)$ where $x \in \mathbb{R}^d$, $y \in \{1, -1\}$
  • Add an extra feature to $x$, and set it to 1
  • $x' = (x, 1) \in \mathbb{R}^{d+1}$
  • Then points $(x, y)$ are linearly separable if and only if points
    $(x', y)$ are linearly separable by a hyperplane
    through the origin
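
A small sketch of this preprocessing step, using made-up one-dimensional data. The helper name `add_bias_feature` and the hand-picked separator `w_aug` are hypothetical, chosen only to show that a threshold away from the origin becomes a hyperplane through the origin once the constant feature is appended.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

def add_bias_feature(x):
    """Map x in R^d to (x, 1) in R^(d+1)."""
    return tuple(x) + (1.0,)

# Hypothetical 1-D data, separable by the threshold x > 2.5 (not through the origin).
points = [((1.0,), -1), ((2.0,), -1), ((3.0,), 1), ((4.0,), 1)]
augmented = [(add_bias_feature(x), y) for x, y in points]

# In the augmented space the separator (w, b) = (1, -2.5) passes through the origin.
w_aug = (1.0, -2.5)
separated_aug = all(y * dot(w_aug, x) > 0 for x, y in augmented)

# In the original 1-D space, a separator through the origin is just a sign for w,
# and neither sign classifies all four points correctly.
separated_raw = any(
    all(y * w * x[0] > 0 for x, y in points) for w in (1.0, -1.0)
)
print(separated_aug, separated_raw)
```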

5
Fisher's IRIS data
Four features:
  • sepal length
  • sepal width
  • petal length
  • petal width
Three classes (species of iris):
  • setosa
  • versicolor
  • virginica
50 instances of each
8
Features 1 and 2 (sepal width/length)
9
Features 3 and 4 (petal width/length)
10
Features 1 and 2; goal: separate setosa from the
other two
1500 updates (with a different permutation: 900)
11
Features 3 and 4; goal: separate setosa from the
other two
[Figure labels: point 51; points 1, 2]
Iteration 1: 1, 51. Iteration 2: 1, 2. Iteration 3:

14
Linear separator vs nearest neighbor
Linear separators: parametric model (fixed number
of parameters to learn)
Nearest neighbor: nonparametric (prediction on test point
$x$ depends only on training data near $x$, not on
the rest of the training data)
Advantages of linear separators:
  • compact
  • fast convergence
  • potentially meaningful
15
Nonseparable data
What if data is not linearly separable?
In this case the data is almost separable: how will the
perceptron perform?
16
Online perceptron
Data comes in an endless stream, so convergence is
not an issue. But how many mistakes does it
make?
Suppose that for all $t \ge 0$ there is some
$u \in \mathbb{R}^d$ and some $k_t \ge 0$ such that for all but $k_t$
of the first $t$ datapoints $(x, y)$, $y (u \cdot x) \ge \gamma$.
Then for all $t \ge 0$ the perceptron algorithm
makes at most $(R/\gamma)^2 + k_t (1 + 2R/\gamma)$ updates
(i.e. mistakes) up to time $t$.
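
As a rough feel for the size of this bound, here is the arithmetic for one set of made-up values. It reads the bound as $(R/\gamma)^2 + k_t(1 + 2R/\gamma)$; the plus signs are an assumption, since operators were lost when the slide was transcribed. The numbers $R = 1$, $\gamma = 0.1$, $k_t = 5$ are hypothetical.

```latex
\left(\frac{R}{\gamma}\right)^2 + k_t\left(1 + \frac{2R}{\gamma}\right)
  = \left(\frac{1}{0.1}\right)^2 + 5\left(1 + \frac{2}{0.1}\right)
  = 100 + 5 \cdot 21
  = 205
```

Under this reading, each datapoint that violates the margin costs about $2R/\gamma$ extra updates on top of the separable-case bound $(R/\gamma)^2$.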
17
Batch perceptron
Batch algorithm:
    $w = 0$
    while some $(x_i, y_i)$ is misclassified
        $w = w + y_i x_i$
Nonseparable data: this will never converge. How can this be fixed?
Dream: somehow find the separator that
misclassifies the fewest points. But this is
NP-hard (in fact, even NP-hard to approximately
solve).
18
Fixing the batch perceptron
Idea one: only go through the data once, or a
fixed number of times
    $w = 0$
    for $k = 1$ to $K$
        for $i = 1$ to $m$
            if $(x_i, y_i)$ is misclassified
                $w = w + y_i x_i$
At least this stops!
Problem: the final $w$ might not be good.
E.g. right before terminating, the algorithm might
perform an update on a total outlier.
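
The outlier problem is easy to reproduce. The sketch below runs the fixed-number-of-passes loop on five hypothetical points (the second coordinate is the constant feature 1); the last point is an outlier whose label contradicts the others, and it is also the last point visited in each pass. The data and the function names are assumptions made for illustration.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

def perceptron_fixed_passes(points, K):
    """Run exactly K passes over the data, updating on each mistake."""
    w = [0.0] * len(points[0][0])
    updates = 0
    for _ in range(K):
        for x, y in points:
            if y * dot(w, x) <= 0:  # misclassified
                w = [wi + y * xi for wi, xi in zip(w, x)]
                updates += 1
    return w, updates

def count_errors(w, points):
    """Number of training points that w misclassifies."""
    return sum(1 for x, y in points if y * dot(w, x) <= 0)

# Hypothetical data: positives on the right, negatives on the left,
# plus one mislabeled outlier at x = 3 that makes the set nonseparable.
points = [
    ((1.0, 1.0), 1), ((2.0, 1.0), 1),
    ((-1.0, 1.0), -1), ((-2.0, 1.0), -1),
    ((3.0, 1.0), -1),  # outlier
]

w, updates = perceptron_fixed_passes(points, K=3)
print("updates:", updates, "final w:", w, "errors:", count_errors(w, points))
```

On this toy data the loop does stop, but its last update is on the outlier, and the final w misclassifies four of the five points.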
19
Voted-perceptron
Idea two: keep around intermediate hypotheses,
and have them vote [Freund and Schapire, 1998]
    $n = 1$, $w_1 = 0$, $c_1 = 0$
    for $k = 1$ to $K$
        for $i = 1$ to $m$
            if $(x_i, y_i)$ is misclassified
                $w_{n+1} = w_n + y_i x_i$
                $c_{n+1} = 1$
                $n = n + 1$
            else
                $c_n = c_n + 1$
At the end, we have a collection of linear separators $w_0, w_1,
w_2, \ldots$, along with survival times: $c_n$ = amount of
time that $w_n$ survived.
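
A minimal sketch of this training loop in plain Python, assuming the pseudocode above is read with the usual lost operators restored (`w_{n+1} = w_n + y_i x_i`, `c_{n+1} = 1`, `c_n = c_n + 1`). The function name, the list-of-pairs return format, and the toy data are assumptions of the example.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

def voted_perceptron_train(points, K):
    """Return [(w_1, c_1), (w_2, c_2), ...]: each hypothesis with its survival time."""
    w = [0.0] * len(points[0][0])
    c = 0
    hypotheses = []
    for _ in range(K):
        for x, y in points:
            if y * dot(w, x) <= 0:
                hypotheses.append((w, c))          # retire the current hypothesis
                w = [wi + y * xi for wi, xi in zip(w, x)]
                c = 1                              # the new hypothesis starts at 1
            else:
                c += 1                             # current hypothesis survives
    hypotheses.append((w, c))                      # keep the last hypothesis too
    return hypotheses

# Hypothetical nonseparable data (second coordinate is the constant feature).
points = [
    ((1.0, 1.0), 1), ((2.0, 1.0), 1),
    ((-1.0, 1.0), -1), ((-2.0, 1.0), -1),
    ((3.0, 1.0), -1),
]

hypotheses = voted_perceptron_train(points, K=3)
survival_times = [c for _, c in hypotheses]
print(len(hypotheses), "hypotheses; survival times:", survival_times)
```

Every processed example adds exactly one to some survival time, so the survival times always sum to K times the number of points.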
20
Voted-perceptron, contd
Idea two: keep around intermediate hypotheses,
and have them vote [Freund and Schapire, 1998]
At the end, we have a collection of linear
separators $w_0, w_1, w_2, \ldots$, along with survival
times: $c_n$ = amount of time that $w_n$
survived.
This $c_n$ is a good measure of the
reliability of $w_n$.
To classify a test point $x$,
use a weighted majority vote.
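
The vote formula itself did not survive in this transcript. In Freund and Schapire's paper the rule is: each $w_n$ casts the vote $\mathrm{sign}(w_n \cdot x)$ with weight $c_n$, and the prediction is the sign of the weighted total. A minimal sketch follows; the three hypotheses are hand-written, hypothetical values, and breaking ties toward -1 is an assumption of this example.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

def sign(v):
    """Return +1 for positive values, -1 otherwise (tie-breaking is an assumption)."""
    return 1 if v > 0 else -1

def voted_predict(hypotheses, x):
    """Weighted majority vote: sign of sum_n c_n * sign(w_n . x)."""
    total = sum(c * sign(dot(w, x)) for w, c in hypotheses)
    return sign(total)

# Hypothetical (w_n, c_n) pairs, not the output of any run in the lecture.
hypotheses = [((1.0, 0.0), 5), ((0.0, 1.0), 2), ((-1.0, 0.0), 1)]

print(voted_predict(hypotheses, (2.0, -1.0)))   # votes: +5, -2, -1
print(voted_predict(hypotheses, (-1.0, 3.0)))   # votes: -5, +2, +1
```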
21
Voted-perceptron, contd
  • Problem: need to keep around a lot of $w_n$ vectors
  • Solutions:
  •     Find representatives
  •     Alternative prediction rule: $w_{avg}$
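
The definition of the averaged vector is not preserved in this transcript. A common choice, and the one described as the "average" prediction by Freund and Schapire, is the survival-time-weighted sum $w_{avg} = \sum_n c_n w_n$, with prediction $\mathrm{sign}(w_{avg} \cdot x)$; only one vector then needs to be stored. The sketch below assumes that definition and uses hypothetical hypotheses.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

def average_vector(hypotheses):
    """Survival-time-weighted sum of the hypotheses: sum_n c_n * w_n."""
    d = len(hypotheses[0][0])
    w_avg = [0.0] * d
    for w, c in hypotheses:
        for j in range(d):
            w_avg[j] += c * w[j]
    return w_avg

def averaged_predict(w_avg, x):
    """Predict with the single averaged vector."""
    return 1 if dot(w_avg, x) > 0 else -1

# Hypothetical (w_n, c_n) pairs, not the output of any run in the lecture.
hypotheses = [((1.0, 0.0), 5), ((0.0, 1.0), 2), ((-1.0, 0.0), 1)]

w_avg = average_vector(hypotheses)
print(w_avg)
print(averaged_predict(w_avg, (2.0, -1.0)))
print(averaged_predict(w_avg, (-1.0, 3.0)))
```

Averaging and voting are different rules and need not agree on every test point, which is consistent with the slightly different error counts reported for the two on the IRIS experiments.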
22
IRIS features 3 and 4; goal: separate setosa
(circle) from the rest
Corrupted setosa
Run Voted-Perc for five rounds
$c_n$: 0, 1, 2, 3, 1, 1, 5, 117, 2, 41, 2, 13, 1, 3,
8, 222, 2, 173, 3, 95, 3, 52
(these sum to 750 = 5 rounds × 150 points)
Final hypothesis: 1 wrong (either voting or averaging)
23
IRIS features 3 and 4; goal: separate one class from the other two (o/x)
100 rounds, 1595 updates (5 errors)
Final hypothesis: 5 errors for voting, 6 for averaging
24
Postscript: multiclass
What if there are $k$ classes?
Reduce to binary: all-vs-one
Not always easy to do
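
One way to carry out such a reduction is sketched below: train one binary perceptron per class (that class against everything else) and predict the class whose separator gives the largest score. The slide only says "reduce to binary", so the argmax rule, the fixed number of passes, and the toy data are assumptions of this example. The toy classes were chosen so that each one is linearly separable from the rest, which, as the slide notes, is not always the case.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

def train_binary(points, passes):
    """Fixed-pass perceptron on (x, y) pairs with y in {+1, -1}."""
    w = [0.0] * len(points[0][0])
    for _ in range(passes):
        for x, y in points:
            if y * dot(w, x) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
    return w

def one_vs_rest_train(points, classes, passes=20):
    """Train one separator per class: that class is +1, all others are -1."""
    return {
        k: train_binary([(x, 1 if y == k else -1) for x, y in points], passes)
        for k in classes
    }

def one_vs_rest_predict(models, x):
    """Predict the class whose separator scores x highest (an assumed rule)."""
    return max(models, key=lambda k: dot(models[k], x))

# Hypothetical three-class data in the plane.
points = [((2.0, 0.0), "a"), ((-1.0, 2.0), "b"), ((-1.0, -2.0), "c")]

models = one_vs_rest_train(points, classes=["a", "b", "c"])
print(models)
print(one_vs_rest_predict(models, (3.0, 0.5)))
```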
25
Some open problems
Modify the (voted) perceptron algorithm to:
  1. Find a linear separator with large margin
  2. Give up on troublesome points after a while