1
Linear Separators
2
Bankruptcy example
  • R is the ratio of earnings to expenses
  • L is the number of late payments on credit cards
    over the past year.
  • We would like to draw a linear separator here,
    and so obtain a classifier.

3
1-Nearest Neighbor Boundary
  • The decision boundary will be the boundary
    between cells defined by points of different
    classes, as illustrated by the bold line shown
    here.

4
Decision Tree Boundary
  • Similarly, a decision tree also defines a
    decision boundary in the feature space.

Although both 1-NN and decision trees agree on
all the training points, they disagree on the
precise decision boundary and so will classify
some query points differently. This is the
essential difference between different learning
algorithms.
5
Linear Boundary
  • Linear separators are characterized by a single
    linear decision boundary in the space.
  • The bankruptcy data can be successfully separated
    in that manner.
  • But there is no guarantee that a single linear
    separator can successfully classify every set of
    training data.

6
Linear Hypothesis Class
  • Line equation (assume 2D first):
    w2 x2 + w1 x1 + b = 0
  • Fact 1: All the points (x1, x2) lying on the line
    make the equation true.
  • Fact 2: The line separates the plane into two
    half-planes.
  • Fact 3: The points (x1, x2) in one half-plane give
    us an inequality with respect to 0, which has the
    same direction for each of the points in the
    half-plane.
  • Fact 4: The points (x1, x2) in the other
    half-plane give us the reverse inequality with
    respect to 0 (a small numeric check of Facts 1-4
    follows below).
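The half-plane facts can be checked numerically. The sketch below uses
hypothetical weights w1 = 1, w2 = 2, b = -4 (chosen only for illustration;
they are not from the slides) and evaluates the line expression at a point
on the line and at points in each half-plane.

```python
# Numeric check of Facts 1-4 with hypothetical weights (not from the slides):
# w1 = 1.0, w2 = 2.0, b = -4.0, so the line is 2*x2 + 1*x1 - 4 = 0.

def line_value(x1, x2, w1=1.0, w2=2.0, b=-4.0):
    """Evaluate w2*x2 + w1*x1 + b: zero on the line, signed off the line."""
    return w2 * x2 + w1 * x1 + b

print(line_value(2.0, 1.0))  #  0.0 -> on the line (Fact 1)
print(line_value(0.0, 0.0))  # -4.0 -> one half-plane: negative (Fact 3)
print(line_value(1.0, 0.5))  # -2.0 -> same half-plane: same sign
print(line_value(4.0, 3.0))  #  6.0 -> other half-plane: reversed sign (Fact 4)
```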

7
Fact 3 proof
  • w2 x2 + w1 x1 + b = 0
  • We can write it as x2 = -(w1 x1 + b) / w2
    (assume w2 > 0).

Take a point (p, q) in the half-plane below the line,
and let (p, r) be the point on the line directly above
it (same first coordinate).
(p, r) is on the line, so r = -(w1 p + b) / w2.
But q < r, so we get q < -(w1 p + b) / w2,
i.e. w2 q + w1 p + b < 0.
Since (p, q) was an arbitrary point in the
half-plane, the same direction of inequality holds
for any other point of the half-plane.
8
Fact 4 proof
  • w2 x2 + w1 x1 + b = 0
  • We can write it as x2 = -(w1 x1 + b) / w2
    (assume w2 > 0).

Take a point (p, s) in the half-plane above the line,
and let (p, r) be the point on the line directly below
it (same first coordinate).
(p, r) is on the line, so r = -(w1 p + b) / w2.
But s > r, so we get s > -(w1 p + b) / w2,
i.e. w2 s + w1 p + b > 0.
Since (p, s) was an arbitrary point in the (other)
half-plane, the same direction of inequality holds
for any other point of that half-plane.
9
Corollary
  • Depending on the direction (slope) of the line,
    the inequalities might alternate between the two
    half-planes.
  • However, the direction will be the same among the
    points belonging to the same half-plane.
  • What's an easy way to determine the direction of
    the inequalities for each half-plane?
  • In order to determine the inequality direction,
    try it for the point (0, 0), and determine the
    direction for the half-plane where (0, 0) belongs.
  • The points of the other half-plane will have the
    opposite inequality direction.
  • How much bigger (or smaller) than zero
    w2 q + w1 p + b is, is proportional to the distance
    of the point (p, q) from the line.
  • The same can be said for an n-dimensional space.
    Simply, we don't talk about half-planes but
    half-spaces (the line is now a hyperplane
    creating two half-spaces).
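To make the proportionality precise: the signed perpendicular distance of a
point (p, q) from the line is the line expression divided by the length of
the weight vector (a standard fact, stated here for reference):

\[
\mathrm{dist}\big((p,q),\ \text{line}\big)
  \;=\; \frac{w_2\,q + w_1\,p + b}{\sqrt{w_1^{2} + w_2^{2}}}
\]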

10
Linear classifier
  • We can now exploit the sign of this distance to
    define a linear classifier, one whose decision
    boundary is a hyperplane.
  • Instead of using 0 and 1 as the class labels
    (which was an arbitrary choice anyway) we use the
    sign of the distance, either +1 or -1, as the
    labels (that is, the values of the yi's).

The classifier is h(x) = sign(w · x + b),
which outputs +1 or -1.
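As a small sketch, the classifier is a one-liner in code (w and b are
assumed to come from some training procedure, e.g. the perceptron algorithm
introduced later):

```python
import numpy as np

def classify(x, w, b):
    """Linear classifier: return +1 or -1 depending on which side of
    the hyperplane w . x + b = 0 the point x falls on."""
    return 1 if np.dot(w, x) + b >= 0 else -1
```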
11
Margin
  • A variant of the signed distance of a training
    point to a hyperplane is the margin of the point.
  • The margin (gamma) is the product of w · xi + b for
    the training point xi and the known sign of the
    class, yi.
  • If they agree (the training point is correctly
    classified), then the margin is positive.
  • If they disagree (the classification is in
    error), then the margin is negative.

margin: γi = yi (w · xi + b); it is proportional to the
perpendicular distance of point xi to the line
(hyperplane).
γi > 0: the point is correctly classified (sign of
distance = yi).
γi < 0: the point is incorrectly classified (sign of
distance ≠ yi).
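A short sketch of this margin computation, with hypothetical weights and
points (none of these numbers come from the slides):

```python
import numpy as np

# Hypothetical separator and training data, for illustration only.
w, b = np.array([1.0, 2.0]), -4.0
X = np.array([[0.0, 0.0],      # training points, one per row
              [4.0, 3.0]])
y = np.array([-1, +1])         # their class labels

margins = y * (X @ w + b)      # gamma_i = yi * (w . xi + b)
print(margins)                 # [4. 6.] -> both positive: both correct
```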
12
Perceptron algorithm
  • How do we find a linear separator?
  • The perceptron algorithm was developed by
    Rosenblatt in the mid-1950s.
  • This is a greedy, "mistake-driven" algorithm.
  • We will be using the extended form of the weight
    and data-point vectors in this algorithm. The
    extended form is in fact a trick (see below).
  • This will simplify the presentation a bit.
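The extended form is presumably the standard augmentation trick: append a
constant 1 to every data point so that b can be folded into the weight
vector and the offset disappears from the formulas.

\[
\bar{x} = (x_1, \dots, x_n, 1), \qquad
\bar{w} = (w_1, \dots, w_n, b), \qquad
\bar{w} \cdot \bar{x} = w \cdot x + b .
\]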

13
Perceptron algorithm
  • Pick an initial weight vector (including b), e.g.
    (.1, ..., .1).
  • Repeat until all points get correctly classified:
    • Repeat for each point xi:
      • Calculate the margin yi · (w · xi) (this is a
        number).
      • If the margin > 0, point xi is correctly
        classified.
      • Else, change the weights to increase the
        margin:
      • change the weights proportionally to yi · xi.
  • Note that, if yi = +1:
    • If xi,j (the j-th component of xi) > 0 then wj
      increases (the margin increases).
    • If xi,j < 0 then wj decreases (the margin again
      increases).
  • Similarly, for yi = -1, the margin always
    increases.
  • (A runnable sketch of this loop follows below.)
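A minimal runnable sketch of this loop, using the extended vectors from
slide 12 (a 1 is appended to each point so the last weight plays the role
of b). The rate value 0.1 matches what the slides mention later; the
function name, the pass limit, and the tiny example data are choices made
here.

```python
import numpy as np

def perceptron(X, y, rate=0.1, max_passes=1000):
    """Perceptron on extended vectors; returns the extended weights
    (the last entry plays the role of b)."""
    Xe = np.hstack([X, np.ones((len(X), 1))])  # append 1 to every point
    w = np.full(Xe.shape[1], 0.1)              # initial weights (.1, ..., .1)
    for _ in range(max_passes):                # outer loop
        mistakes = 0
        for xi, yi in zip(Xe, y):              # inner loop over the points
            if yi * np.dot(w, xi) <= 0:        # margin not positive: a mistake
                w += rate * yi * xi            # move w in the direction yi * xi
                mistakes += 1
        if mistakes == 0:                      # every point correctly classified
            return w
    return w                                   # data may not be separable

# Tiny made-up example (two separable points; not the bankruptcy data):
X = np.array([[2.0, 2.0], [0.0, 0.0]])
y = np.array([+1, -1])
print(perceptron(X, y))
```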

14
Perceptron algorithm (explanations)
  • The first step is to start with an initial value
    of the weight vector, usually all zeros.
  • Then we repeat the inner loop until all the
    points are correctly classified using the current
    weight vector.
  • The inner loop is to consider each point.
  • If the point's margin is positive then it is
    correctly classified and we do nothing.
  • Otherwise, if it is negative or zero, we have a
    mistake and we want to change the weights so as
    to increase the margin (so that it ultimately
    becomes positive).
  • The trick is how to change the weights. It turns
    out that using a value proportional to yi · xi is
    the right thing. We'll see why, formally, later.

15
Perceptron algorithm
  • So, each change of w increases the margin on a
    particular point.
  • However, the changes for the different points
    interfere with each other, that is, different
    points might change the weights in opposing
    directions.
  • So, it will not be the case that one pass through
    the points will produce a correct weight vector.
  • In general, we will have to go around multiple
    times.
  • The remarkable fact is that the algorithm is
    guaranteed to terminate with the weights for a
    separating hyperplane as long as the data is
    linearly separable.
  • The proof of this fact is beyond our scope.
  • Notice that if the data is not separable, then
    this algorithm loops forever.
  • It turns out that it is a good idea to keep track
    of the best separator we've seen so far (the one
    that makes the fewest mistakes) and, after we get
    tired of going around the loop, return that one
    (a sketch of this variant follows below).
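A sketch of that variant (commonly known as the pocket algorithm, a name
the slides do not use): after each pass, count the mistakes of the current
weights on the whole training set and keep the best weights seen so far.

```python
import numpy as np

def perceptron_with_pocket(X, y, rate=0.1, max_passes=1000):
    """Perceptron that also remembers the weights making the fewest
    mistakes, so something useful is returned even when the data is
    not linearly separable.  (Names and defaults chosen here.)"""
    Xe = np.hstack([X, np.ones((len(X), 1))])
    w = np.full(Xe.shape[1], 0.1)
    best_w, best_errors = w.copy(), len(Xe) + 1
    for _ in range(max_passes):
        for xi, yi in zip(Xe, y):                # the usual perceptron pass
            if yi * np.dot(w, xi) <= 0:
                w += rate * yi * xi
        errors = int(np.sum(y * (Xe @ w) <= 0))  # mistakes of the current w
        if errors < best_errors:                 # best separator so far
            best_w, best_errors = w.copy(), errors
        if errors == 0:
            break
    return best_w
```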

16
Perceptron algorithm Bankruptcy data
  • This shows a trace of the perceptron algorithm on
    the bankruptcy data.
  • Here it took 49 iterations through the data (the
    outer loop) for the algorithm to stop.
  • The separator at the end of the loop is
    (0.4, 0.94, -2.2).
  • We usually pick some small "rate" constant to
    scale the change to w.
  • A rate of .1 is used here, but other small values
    also work well.

17
Gradient Ascent/Descent
  • Why pick yi · xi as the increment to the weights?
  • The margin is a function of several input
    variables.
  • The variables are w2, w1, w0 (or, in general,
    wn, ..., w0).
  • In order to reach the maximum of this function,
    it is good to change the variables in the
    direction of the slope of the function.
  • The slope is represented by the gradient of the
    function.
  • The gradient is the vector of first (partial)
    derivatives of the function with respect to each
    of the input variables.
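Spelling out the step the slides leave implicit: with extended vectors the
margin of a single point is γi(w) = yi (w · xi), and its gradient with
respect to w is exactly the increment used by the algorithm.

\[
\gamma_i(w) = y_i\,(w \cdot x_i)
\quad\Longrightarrow\quad
\frac{\partial \gamma_i}{\partial w_j} = y_i\,x_{i,j},
\qquad\text{i.e.}\qquad
\nabla_w\,\gamma_i = y_i\,x_i .
\]

So updating w by a small multiple of yi · xi is gradient ascent on that
point's margin.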