Analysis of perceptron-based active learning

Title: Slide 1. Author: MoreMusic. Last modified by: Claire. Created: 5/2/2005 9:47:44 PM. Presentation format: On-screen Show. Company: CSAIL.

Slides: 23
Provided by: MoreMusic
Transcript and Presenter's Notes


1
  • Analysis of perceptron-based active learning
  • Sanjoy Dasgupta, UCSD
  • Adam Tauman Kalai, TTI-Chicago
  • Claire Monteleoni, MIT

2
Selective sampling, online constraints
  • Selective sampling framework:
  • Unlabeled examples, x_t, are received one at a
    time.
  • Learner makes a prediction at each time-step.
  • A noiseless oracle for the label y_t can be
    queried at a cost.
  • Goal: minimize number of labels to reach error ε.
  • ε is the error rate (w.r.t. the target) on the
    sampling distribution.
  • Online constraints:
  • Space: Learner cannot store all previously seen
    examples (and then perform batch learning).
  • Time: Running time of learner's belief update
    step should not scale with number of seen
    examples/mistakes.

3
AC Milan v. Inter Milan
4
Problem framework
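  • Target: u
  • Current hypothesis: v_t
  • Error region: ξ_t
  • Assumptions:
  • Separability
  • u is through origin
  • x ~ Uniform on S
  • Error rate: ε_t = θ_t/π
[Figure: target u and hypothesis v_t at angle θ_t, with error region ξ_t]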
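Under the assumptions on this slide (u and v_t both through the origin, x uniform on the unit sphere S), the probability that the two halfspaces disagree is the angle between them divided by π. A minimal Monte Carlo sketch of that identity in plain Python follows; the helper names, the dimension, and the sample size are illustrative assumptions and not part of the talk.

```python
import math
import random

def random_unit_vector(d, rng):
    """Draw a point uniformly from the unit sphere in R^d (normalized Gaussian)."""
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(c * c for c in g))
        if norm > 0.0:
            return [c / norm for c in g]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def angle(a, b):
    """Angle between two unit vectors, clamped for floating-point safety."""
    return math.acos(max(-1.0, min(1.0, dot(a, b))))

def empirical_error(u, v, n, rng):
    """Fraction of uniform samples on which sign(u.x) and sign(v.x) disagree."""
    d = len(u)
    disagreements = 0
    for _ in range(n):
        x = random_unit_vector(d, rng)
        if (dot(u, x) >= 0) != (dot(v, x) >= 0):
            disagreements += 1
    return disagreements / n

rng = random.Random(0)               # fixed seed, chosen arbitrarily
u = random_unit_vector(5, rng)       # hypothetical target
v = random_unit_vector(5, rng)       # hypothetical current hypothesis
theta = angle(u, v)
predicted = theta / math.pi          # error rate from the angle
estimate = empirical_error(u, v, 20000, rng)
print(f"theta/pi = {predicted:.4f}, Monte Carlo estimate = {estimate:.4f}")
```

The two printed numbers should agree up to sampling noise, which is why the later slides can state bounds on the angle θ_t and on the error ε_t interchangeably.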
5
Related work
  • Analysis, under the selective sampling model, of the
    Query By Committee algorithm [Seung, Opper &
    Sompolinsky 92]
  • Theorem [Freund, Seung, Shamir & Tishby 97]: Under
    selective sampling from the uniform, QBC can
    learn a half-space through the origin to
    generalization error ε, using Õ(d log 1/ε)
    labels.
  • → BUT space required, and time complexity of the
    update both scale with number of seen mistakes!

6
Related work
  • Perceptron: a simple online algorithm
  • If y_t ≠ SGN(v_t · x_t), then: (filtering rule)
  • v_{t+1} = v_t + y_t x_t (update step)
  • Distribution-free mistake bound O(1/γ²), if
    a margin γ exists.
  • Theorem [Baum 89]: Perceptron, given sequential
    labeled examples from the uniform distribution,
    can converge to generalization error ε after
    Õ(d/ε²) mistakes.
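The filtering rule and update step above can be sketched in a few lines of plain Python. This is a minimal illustration, assuming a start at the zero vector and the convention SGN(0) = +1 (the slide specifies neither); the function names and the simulated uniform stream are hypothetical.

```python
import math
import random

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def sgn(z):
    """Sign with the convention SGN(0) = +1 (an assumption, not from the slide)."""
    return 1 if z >= 0 else -1

def random_unit_vector(d, rng):
    """Uniform point on the unit sphere in R^d (normalized Gaussian)."""
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(c * c for c in g))
        if norm > 0.0:
            return [c / norm for c in g]

def perceptron_step(v, x, y):
    """One online step: return (new_v, made_mistake)."""
    if sgn(dot(v, x)) != y:                                   # filtering rule
        return [vi + y * xi for vi, xi in zip(v, x)], True    # update step
    return v, False

def run_perceptron(u, num_examples, rng):
    """Feed labeled examples from the uniform distribution, one at a time."""
    v = [0.0] * len(u)            # assumed starting hypothesis
    mistakes = 0
    for _ in range(num_examples):
        x = random_unit_vector(len(u), rng)
        y = sgn(dot(u, x))        # noiseless label from the target u
        v, mistake = perceptron_step(v, x, y)
        mistakes += mistake
    return v, mistakes

rng = random.Random(1)
u = random_unit_vector(5, rng)    # hypothetical target
v_final, num_mistakes = run_perceptron(u, 2000, rng)
print(num_mistakes, dot(v_final, v_final))
```

Only the current vector v is stored, so the space and time per step do not grow with the number of examples seen, which is the online constraint stated earlier in the talk.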

7
Our contributions
  • A lower bound for Perceptron in active learning
    context of Ω(1/ε²) labels.
  • A modified Perceptron update with a Õ(d log 1/ε)
    mistake bound.
  • An active learning rule and a label bound of Õ(d
    log 1/ε).
  • A bound of Õ(d log 1/ε) on total errors (labeled
    or not).

8
Perceptron
  • Perceptron update: v_{t+1} = v_t + y_t x_t
  • Error does not decrease monotonically.

[Figure: Perceptron update, showing u, v_t, x_t, and v_{t+1}]
9
Lower bound on labels for Perceptron
  • Theorem 1: The Perceptron algorithm, using any
    active learning rule, requires Ω(1/ε²) labels to
    reach generalization error ε w.r.t. the uniform
    distribution.
  • Proof idea: Lemma: For small θ_t, the Perceptron
    update will increase θ_t unless ‖v_t‖ is large:
    Ω(1/sin θ_t).
  • But ‖v_t‖ grows at rate at most √t.
  • So need t ≥ 1/sin²θ_t.
  • Under uniform,
  • ε_t ∝ θ_t ≥ sin θ_t.
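The chain of inequalities in the proof idea can be written out as a short worked derivation. This is a sketch under stated assumptions: the Perceptron starts at v_0 = 0, t counts updates, ‖x_t‖ = 1, ε_t = θ_t/π under the uniform distribution, and c is a hypothetical constant standing in for the one hidden in Ω(1/sin θ_t).

```latex
\begin{align*}
\|v_{t+1}\|^2 &= \|v_t\|^2 + 2\,y_t\,(v_t \cdot x_t) + \|x_t\|^2 \;\le\; \|v_t\|^2 + 1
  \quad \text{(on a mistake, } y_t (v_t \cdot x_t) \le 0\text{)}, \\
\|v_t\|^2 &\le t \quad\Longrightarrow\quad \|v_t\| \le \sqrt{t}, \\
\frac{c}{\sin\theta_t} &\le \|v_t\| \le \sqrt{t}
  \quad\Longrightarrow\quad t \ge \frac{c^2}{\sin^2\theta_t}, \\
\sin\theta_t &\le \theta_t = \pi\,\varepsilon_t
  \quad\Longrightarrow\quad t \ge \frac{c^2}{\pi^2\,\varepsilon_t^2} = \Omega\!\left(1/\varepsilon_t^2\right).
\end{align*}
```

Each update consumes one label, so the bound on the number of updates is also a bound on the number of labels.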

[Figure: Perceptron update, showing u, v_t, x_t, and v_{t+1}]
10
A modified Perceptron update
  • Standard Perceptron update:
  • v_{t+1} = v_t + y_t x_t
  • Instead, weight the update by confidence w.r.t.
    current hypothesis v_t:
  • v_{t+1} = v_t + 2 y_t |v_t · x_t| x_t    (v_1 = y_0 x_0)
  • (similar to update in [Blum et al. 96] for
    noise-tolerant learning)
  • Unlike Perceptron:
  • Error decreases monotonically:
  • cos(θ_{t+1}) = u · v_{t+1} = u · v_t + 2 |v_t · x_t| |u · x_t|
  • ≥ u · v_t = cos(θ_t)
  • ‖v_t‖ = 1 (due to factor of 2)
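The two properties claimed for the modified update (unit norm is preserved, and u · v_t never decreases) can be checked numerically. The sketch below is an illustration in plain Python, not the authors' code; it assumes unit-norm examples drawn uniformly from the sphere, the convention SGN(0) = +1, and hypothetical helper names.

```python
import math
import random

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def sgn(z):
    """Sign with the convention SGN(0) = +1 (an assumption)."""
    return 1 if z >= 0 else -1

def random_unit_vector(d, rng):
    """Uniform point on the unit sphere in R^d (normalized Gaussian)."""
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(c * c for c in g))
        if norm > 0.0:
            return [c / norm for c in g]

def modified_step(v, x, y):
    """Modified update v + 2*y*|v.x|*x, applied only on a mistake."""
    vx = dot(v, x)
    if sgn(vx) == y:
        return v, False
    scale = 2.0 * y * abs(vx)
    return [vi + scale * xi for vi, xi in zip(v, x)], True

rng = random.Random(2)
d = 5
u = random_unit_vector(d, rng)          # hypothetical target
x0 = random_unit_vector(d, rng)
y0 = sgn(dot(u, x0))
v = [y0 * c for c in x0]                # initialization v_1 = y_0 x_0

cosines = [dot(u, v)]                   # u.v after each update
max_norm_drift = 0.0                    # largest observed | ||v|| - 1 |
for _ in range(2000):
    x = random_unit_vector(d, rng)
    y = sgn(dot(u, x))
    v, updated = modified_step(v, x, y)
    if updated:
        cosines.append(dot(u, v))
        max_norm_drift = max(max_norm_drift, abs(math.sqrt(dot(v, v)) - 1.0))

print(len(cosines) - 1, cosines[0], cosines[-1], max_norm_drift)
```

On a mistake the update equals the reflection v_t − 2 (v_t · x_t) x_t, which is why the norm stays at 1 when ‖x_t‖ = 1.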

11
A modified Perceptron update
  • Perceptron update: v_{t+1} = v_t + y_t x_t
  • Modified Perceptron update: v_{t+1} = v_t + 2 y_t |v_t · x_t| x_t

[Figure: comparison of the two updates, showing u, v_t, x_t, and v_{t+1}]
12
Mistake bound
  • Theorem 2: In the supervised setting, the
    modified Perceptron converges to generalization
    error ε after Õ(d log 1/ε) mistakes.
  • Proof idea: The exponential convergence follows
    from a multiplicative decrease in θ_t.
  • On an update, cos(θ_{t+1}) = cos(θ_t) + 2 |v_t · x_t| |u · x_t|.
  • → We lower bound 2 |v_t · x_t| |u · x_t|, with high
    probability, using our distributional assumption.

13
Mistake bound
  • Theorem 2: In the supervised setting, the
    modified Perceptron converges to generalization
    error ε after Õ(d log 1/ε) mistakes.
  • Lemma (band): For any fixed a with ‖a‖ = 1, γ ≤ 1, and
    for x ~ U on S:
  • Apply to |v_t · x| and |u · x| ⇒ 2 |v_t · x_t| |u · x_t| is
    large enough in expectation (using size of ξ_t).

[Figure: band {x : |a · x| ≤ k} around the hyperplane with normal a]
14
Active learning rule
  • Goal: Filter to label just those points in the
    error region.
  • → but θ_t, and thus ξ_t, unknown!
  • Define labeling region L: the band |v_t · x| ≤ s_t
    around the hyperplane of v_t.
  • Tradeoff in choosing threshold s_t:
  • If too high, may wait too long for an error.
  • If too low, resulting update is too small.
  • … makes … constant.
  • → But θ_t unknown! Choose s_t adaptively:
  • Start high. Halve, if no error in R consecutive
    labels.
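The adaptive rule above (start with a high threshold, halve it after R consecutive labels without an error) can be sketched as follows. Several details are assumptions made for illustration and are not taken from the slide: the labeling region is taken to be |v_t · x| ≤ s_t, the starting threshold is 1/√d, R = 20, and all function and variable names are hypothetical.

```python
import math
import random

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def sgn(z):
    """Sign with the convention SGN(0) = +1 (an assumption)."""
    return 1 if z >= 0 else -1

def random_unit_vector(d, rng):
    """Uniform point on the unit sphere in R^d (normalized Gaussian)."""
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(c * c for c in g))
        if norm > 0.0:
            return [c / norm for c in g]

def active_modified_perceptron(u, num_examples, R, rng):
    """Selective sampling with an adaptively halved threshold s."""
    d = len(u)
    x0 = random_unit_vector(d, rng)
    v = [sgn(dot(u, x0)) * c for c in x0]   # v_1 = y_0 x_0 costs one label
    s = 1.0 / math.sqrt(d)                  # assumed "high" starting threshold
    labels, mistakes, streak = 1, 0, 0
    for _ in range(num_examples):
        x = random_unit_vector(d, rng)
        vx = dot(v, x)
        if abs(vx) > s:
            continue                        # outside the labeling region: no query
        y = sgn(dot(u, x))                  # query the noiseless oracle
        labels += 1
        if sgn(vx) != y:
            # On a mistake, v + 2*y*|v.x|*x equals v - 2*(v.x)*x.
            v = [vi - 2.0 * vx * xi for vi, xi in zip(v, x)]
            mistakes += 1
            streak = 0
        else:
            streak += 1
            if streak == R:                 # R consecutive labels without an error
                s /= 2.0
                streak = 0
    return v, s, labels, mistakes

rng = random.Random(3)
u = random_unit_vector(5, rng)              # hypothetical target
v_final, s_final, num_labels, num_mistakes = active_modified_perceptron(u, 5000, 20, rng)
print(num_labels, num_mistakes, s_final, dot(u, v_final))
```

The state kept between examples is just v, s, and the streak counter, so the rule respects the online constraints; the value of R here is arbitrary, whereas the talk chooses it as part of the analysis.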

[Figure: v_t, u, threshold s_t, and labeling region L]
15
Label bound
  • Theorem 3: In the active learning setting, the
    modified Perceptron, using the adaptive filtering
    rule, will converge to generalization error
    ε after Õ(d log 1/ε) labels.
  • Corollary: The total errors (labeled and
    unlabeled) will be Õ(d log 1/ε).

16
Proof technique
  • Proof outline: We show the following lemmas hold
    with sufficient probability:
  • Lemma 1. s_t does not decrease too quickly.
  • Lemma 2. We query labels on a constant fraction
    of ξ_t.
  • Lemma 3. With constant probability the update
    is good.
  • By algorithm, a 1/R fraction of labels are
    mistakes. ∃ R = Õ(1).
  • ⇒ Can thus bound labels and total errors by
    mistakes.

17
Proof technique
  • Lemma 1. s_t is large enough:
  • Proof: (By contradiction) Let t be first time
  • Then
  • A halving event means we saw R labels with no
    mistakes, so
  • Lemma 1a: For any particular i, this event
    happens w.p. 3/4

18
Proof technique

Lemma 1a. Proof idea: Using this value of s_t, the
band lemma in R^(d-1) gives constant probability
of x′ falling in an appropriately defined band
w.r.t. u′, where x′ = component of x
orthogonal to v_t, and u′ = component of u
orthogonal to v_t.
[Figure: v_t, u, and threshold s_t]
19
Proof technique
  • Lemma 2. We query labels on a constant fraction
    of ξ_t.
  • Proof: Assume Lemma 1 for lower bound on s_t.
    Apply Lemma 1a and band lemma ⇒
  • Lemma 3. With constant probability the update is
    good.
  • Proof: Assuming Lemma 1, by Lemma 2, each error
    is labeled with constant probability. From the
    mistake bound proof, each update is good
    (multiplicative decrease in error) with constant
    probability.
  • Finally, solve for R: Every R labels there is at
    least 1 update or we halve s_t, so
  • There exists R = Õ(1) s.t.

20
Summary of contributions
                                   samples             mistakes            labels            total errors      online?
PAC complexity [Long03, Long95]    Õ(d/ε), Ω(d/ε)
Perceptron [Baum97]                Õ(d/ε³), Ω(1/ε²)    Õ(d/ε²), Ω(1/ε²)    Ω(1/ε²)
QBC [FSST97]                       Õ(d/ε · log 1/ε)    Õ(d · log 1/ε)      Õ(d · log 1/ε)
[DKM05]                            Õ(d/ε · log 1/ε)    Õ(d · log 1/ε)      Õ(d · log 1/ε)    Õ(d · log 1/ε)
21
Conclusions and open problems
  • Achieve optimal label-complexity for this problem.
  • Unlike QBC, a fully online algorithm.
  • Matching bound on total errors (labeled and
    unlabeled).
  • Future work:
  • Relax distributional assumptions:
  • Uniform is sufficient but not necessary for
    proof.
  • Note: this bound is not possible under
    arbitrary distributions [Dasgupta 04].
  • Relax separability assumption:
  • Allow margin of tolerated error.
  • Analyze margin version:
  • for exponential convergence, without d
    dependence.

22
  • Thank you!