1
Linear Separators
2
Bankruptcy example
  • R is the ratio of earnings to expenses
  • L is the number of late payments on credit cards
    over the past year.
  • We would like to draw a linear separator here,
    and so obtain a classifier.

3
1-Nearest Neighbor Boundary
  • The decision boundary will be the boundary
    between cells defined by points of different
    classes, as illustrated by the bold line shown
    here.

4
Decision Tree Boundary
  • Similarly, a decision tree also defines a
    decision boundary in the feature space.

Although both 1-NN and decision trees agree on
all the training points, they disagree on the
precise decision boundary and so will classify
some query points differently. This is the
essential difference between different learning
algorithms.
5
Linear Boundary
  • Linear separators are characterized by a single
    linear decision boundary in the space.
  • The bankruptcy data can be successfully separated
    in that manner.
  • But there is no guarantee that a single linear
    separator can successfully classify every set of
    training data.

6
Linear Hypothesis Class
  • Line equation (assume 2D first):
    w2 x2 + w1 x1 + b = 0
  • Fact 1: All the points (x1, x2) lying on the line
    make the equation true.
  • Fact 2: The line separates the plane into two
    half-planes.
  • Fact 3: The points (x1, x2) in one half-plane give
    us an inequality with respect to 0, which has the
    same direction for each of the points in the
    half-plane.
  • Fact 4: The points (x1, x2) in the other
    half-plane give us the reverse inequality with
    respect to 0 (a small numeric check of Facts 1-4
    follows below).
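The half-plane facts can be checked numerically. The sketch below uses
hypothetical weights w1 = 1, w2 = 2, b = -4 (chosen only for illustration;
they are not from the slides) and evaluates the line expression at a point
on the line and at points in each half-plane.

```python
# Numeric check of Facts 1-4 with hypothetical weights (not from the slides):
# w1 = 1.0, w2 = 2.0, b = -4.0, so the line is 2*x2 + 1*x1 - 4 = 0.

def line_value(x1, x2, w1=1.0, w2=2.0, b=-4.0):
    """Evaluate w2*x2 + w1*x1 + b: zero on the line, signed off the line."""
    return w2 * x2 + w1 * x1 + b

print(line_value(2.0, 1.0))  #  0.0 -> on the line (Fact 1)
print(line_value(0.0, 0.0))  # -4.0 -> one half-plane: negative (Fact 3)
print(line_value(1.0, 0.5))  # -2.0 -> same half-plane: same sign
print(line_value(4.0, 3.0))  #  6.0 -> other half-plane: reversed sign (Fact 4)
```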

7
Fact 3 proof
  • w2 x2 + w1 x1 + b = 0
  • We can write it as x2 = -(w1 x1 + b) / w2
    (assume w2 > 0).

Take a point (p, q) in the half-plane below the line,
and let (p, r) be the point on the line directly above
it (same first coordinate).
(p, r) is on the line, so r = -(w1 p + b) / w2.
But q < r, so we get q < -(w1 p + b) / w2,
i.e. w2 q + w1 p + b < 0.
Since (p, q) was an arbitrary point in the
half-plane, the same direction of inequality holds
for any other point of the half-plane.
8
Fact 4 proof
  • w2 x2 + w1 x1 + b = 0
  • We can write it as x2 = -(w1 x1 + b) / w2
    (assume w2 > 0).

Take a point (p, s) in the half-plane above the line,
and let (p, r) be the point on the line directly below
it (same first coordinate).
(p, r) is on the line, so r = -(w1 p + b) / w2.
But s > r, so we get s > -(w1 p + b) / w2,
i.e. w2 s + w1 p + b > 0.
Since (p, s) was an arbitrary point in the (other)
half-plane, the same direction of inequality holds
for any other point of that half-plane.
9
Corollary
  • Depending on the direction (slope) of the line,
    the inequalities might alternate between the two
    half-planes.
  • However, the direction will be the same among the
    points belonging to the same half-plane.
  • What's an easy way to determine the direction of
    the inequalities for each half-plane?
  • In order to determine the inequality direction,
    try it for the point (0, 0), and determine the
    direction for the half-plane where (0, 0) belongs.
  • The points of the other half-plane will have the
    opposite inequality direction.
  • How much bigger (or smaller) than zero
    w2 q + w1 p + b is, is proportional to the distance
    of the point (p, q) from the line.
  • The same can be said for an n-dimensional space.
    Simply, we don't talk about half-planes but
    half-spaces (the line is now a hyperplane
    creating two half-spaces).
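To make the proportionality precise: the signed perpendicular distance of a
point (p, q) from the line is the line expression divided by the length of
the weight vector (a standard fact, stated here for reference):

\[
\mathrm{dist}\big((p,q),\ \text{line}\big)
  \;=\; \frac{w_2\,q + w_1\,p + b}{\sqrt{w_1^{2} + w_2^{2}}}
\]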

10
Linear classifier
  • We can now exploit the sign of this distance to
    define a linear classifier, one whose decision
    boundary is a hyperplane.
  • Instead of using 0 and 1 as the class labels
    (which was an arbitrary choice anyway) we use the
    sign of the distance, either +1 or -1, as the
    labels (that is, the values of the yi's).

The classifier is h(x) = sign(w · x + b),
which outputs +1 or -1.
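As a small sketch, the classifier is a one-liner in code (w and b are
assumed to come from some training procedure, e.g. the perceptron algorithm
introduced later):

```python
import numpy as np

def classify(x, w, b):
    """Linear classifier: return +1 or -1 depending on which side of
    the hyperplane w . x + b = 0 the point x falls on."""
    return 1 if np.dot(w, x) + b >= 0 else -1
```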
11
Margin
  • A variant of the signed distance of a training
    point to a hyperplane is the margin of the point.
  • The margin (gamma) is the product of w · xi + b for
    the training point xi and the known sign of the
    class, yi.
  • If they agree (the training point is correctly
    classified), then the margin is positive.
  • If they disagree (the classification is in
    error), then the margin is negative.

margin: γi = yi (w · xi + b); it is proportional to the
perpendicular distance of point xi to the line
(hyperplane).
γi > 0: the point is correctly classified (sign of
distance = yi).
γi < 0: the point is incorrectly classified (sign of
distance ≠ yi).
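A short sketch of this margin computation, with hypothetical weights and
points (none of these numbers come from the slides):

```python
import numpy as np

# Hypothetical separator and training data, for illustration only.
w, b = np.array([1.0, 2.0]), -4.0
X = np.array([[0.0, 0.0],      # training points, one per row
              [4.0, 3.0]])
y = np.array([-1, +1])         # their class labels

margins = y * (X @ w + b)      # gamma_i = yi * (w . xi + b)
print(margins)                 # [4. 6.] -> both positive: both correct
```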
12
Perceptron algorithm
  • How do we find a linear separator?
  • The perceptron algorithm was developed by
    Rosenblatt in the mid-1950s.
  • This is a greedy, "mistake-driven" algorithm.
  • We will be using the extended form of the weight
    and data-point vectors in this algorithm. The
    extended form is in fact a trick (see below).
  • This will simplify the presentation a bit.
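The extended form is presumably the standard augmentation trick: append a
constant 1 to every data point so that b can be folded into the weight
vector and the offset disappears from the formulas.

\[
\bar{x} = (x_1, \dots, x_n, 1), \qquad
\bar{w} = (w_1, \dots, w_n, b), \qquad
\bar{w} \cdot \bar{x} = w \cdot x + b .
\]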

13
Perceptron algorithm
  • Pick an initial weight vector (including b), e.g.
    (.1, ..., .1).
  • Repeat until all points get correctly classified:
    • Repeat for each point xi:
      • Calculate the margin yi · (w · xi) (this is a
        number).
      • If the margin > 0, point xi is correctly
        classified.
      • Else, change the weights to increase the
        margin:
      • change the weights proportionally to yi · xi.
  • Note that, if yi = +1:
    • If xi,j (the j-th component of xi) > 0 then wj
      increases (the margin increases).
    • If xi,j < 0 then wj decreases (the margin again
      increases).
  • Similarly, for yi = -1, the margin always
    increases.
  • (A runnable sketch of this loop follows below.)
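A minimal runnable sketch of this loop, using the extended vectors from
slide 12 (a 1 is appended to each point so the last weight plays the role
of b). The rate value 0.1 matches what the slides mention later; the
function name, the pass limit, and the tiny example data are choices made
here.

```python
import numpy as np

def perceptron(X, y, rate=0.1, max_passes=1000):
    """Perceptron on extended vectors; returns the extended weights
    (the last entry plays the role of b)."""
    Xe = np.hstack([X, np.ones((len(X), 1))])  # append 1 to every point
    w = np.full(Xe.shape[1], 0.1)              # initial weights (.1, ..., .1)
    for _ in range(max_passes):                # outer loop
        mistakes = 0
        for xi, yi in zip(Xe, y):              # inner loop over the points
            if yi * np.dot(w, xi) <= 0:        # margin not positive: a mistake
                w += rate * yi * xi            # move w in the direction yi * xi
                mistakes += 1
        if mistakes == 0:                      # every point correctly classified
            return w
    return w                                   # data may not be separable

# Tiny made-up example (two separable points; not the bankruptcy data):
X = np.array([[2.0, 2.0], [0.0, 0.0]])
y = np.array([+1, -1])
print(perceptron(X, y))
```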

14
Perceptron algorithm (explanations)
  • The first step is to start with an initial value
    of the weight vector, usually all zeros.
  • Then we repeat the inner loop until all the
    points are correctly classified using the current
    weight vector.
  • The inner loop is to consider each point.
  • If the point's margin is positive then it is
    correctly classified and we do nothing.
  • Otherwise, if it is negative or zero, we have a
    mistake and we want to change the weights so as
    to increase the margin (so that it ultimately
    becomes positive).
  • The trick is how to change the weights. It turns
    out that using a value proportional to yi · xi is
    the right thing. We'll see why, formally, later.

15
Perceptron algorithm
  • So, each change of w increases the margin on a
    particular point.
  • However, the changes for the different points
    interfere with each other, that is, different
    points might change the weights in opposing
    directions.
  • So, it will not be the case that one pass through
    the points will produce a correct weight vector.
  • In general, we will have to go around multiple
    times.
  • The remarkable fact is that the algorithm is
    guaranteed to terminate with the weights for a
    separating hyperplane as long as the data is
    linearly separable.
  • The proof of this fact is beyond our scope.
  • Notice that if the data is not separable, then
    this algorithm loops forever.
  • It turns out that it is a good idea to keep track
    of the best separator we've seen so far (the one
    that makes the fewest mistakes) and, after we get
    tired of going around the loop, return that one
    (a sketch of this variant follows below).
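A sketch of that variant (commonly known as the pocket algorithm, a name
the slides do not use): after each pass, count the mistakes of the current
weights on the whole training set and keep the best weights seen so far.

```python
import numpy as np

def perceptron_with_pocket(X, y, rate=0.1, max_passes=1000):
    """Perceptron that also remembers the weights making the fewest
    mistakes, so something useful is returned even when the data is
    not linearly separable.  (Names and defaults chosen here.)"""
    Xe = np.hstack([X, np.ones((len(X), 1))])
    w = np.full(Xe.shape[1], 0.1)
    best_w, best_errors = w.copy(), len(Xe) + 1
    for _ in range(max_passes):
        for xi, yi in zip(Xe, y):                # the usual perceptron pass
            if yi * np.dot(w, xi) <= 0:
                w += rate * yi * xi
        errors = int(np.sum(y * (Xe @ w) <= 0))  # mistakes of the current w
        if errors < best_errors:                 # best separator so far
            best_w, best_errors = w.copy(), errors
        if errors == 0:
            break
    return best_w
```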

16
Perceptron algorithm Bankruptcy data
  • This shows a trace of the perceptron algorithm on
    the bankruptcy data.
  • Here it took 49 iterations through the data (the
    outer loop) for the algorithm to stop.
  • The separator at the end of the loop is
    (0.4, 0.94, -2.2).
  • We usually pick some small "rate" constant to
    scale the change to w.
  • A rate of .1 is used here, but other small values
    also work well.

17
Gradient Ascent/Descent
  • Why pick yi · xi as the increment to the weights?
  • The margin is a function of several input
    variables.
  • The variables are w2, w1, w0 (or, in general,
    wn, ..., w0).
  • In order to reach the maximum of this function,
    it is good to change the variables in the
    direction of the slope of the function.
  • The slope is represented by the gradient of the
    function.
  • The gradient is the vector of first (partial)
    derivatives of the function with respect to each
    of the input variables.
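Spelling out the step the slides leave implicit: with extended vectors the
margin of a single point is γi(w) = yi (w · xi), and its gradient with
respect to w is exactly the increment used by the algorithm.

\[
\gamma_i(w) = y_i\,(w \cdot x_i)
\quad\Longrightarrow\quad
\frac{\partial \gamma_i}{\partial w_j} = y_i\,x_{i,j},
\qquad\text{i.e.}\qquad
\nabla_w\,\gamma_i = y_i\,x_i .
\]

So updating w by a small multiple of yi · xi is gradient ascent on that
point's margin.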