1
On-line learning with mistake bounds
(Reminder) Algorithms for learning linear threshold functions.
Model of learning:
  • X = {0,1}^n — the instance space
  • C = {c | c : X → {−1,1}} — the class of concepts; each concept classifies an instance x as negative (false) or positive (true)
The goal of concept learning is to discover the unknown target concept c from labeled instances. The target concept can be described by a Boolean function, which in turn is described by a weight vector w. The goal of the learner is to make few mistakes.
2
On-line learning with mistake bounds
An algorithm's learning behavior is evaluated by counting the worst-case number of mistakes (its mistake bound) that it makes while learning a worst-case function from a specified class of functions, on a worst-case sequence of examples.
3
Algorithms for learning linear threshold functions
  • General algorithm (a generic sketch in code follows this list)
  • Initialize the vector w_1; set t = 1
  • On round t (given the vector x_t and y_t, the label for x_t):
  • predict ŷ_t = sgn(w_t · x_t − θ)
  • if ŷ_t ≠ y_t, update w_{t+1} using w_t, x_t, y_t
  • Difference between the Perceptron and Winnow algorithms:
  • How should we initialize the vector w_1?
  • How should we update the vector w_t?
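The general algorithm can be written as a short mistake-driven loop. Below is a minimal sketch (not from the slides): it assumes examples arrive as (x_t, y_t) pairs with y_t ∈ {−1, +1}, and the names online_learn and update_rule are illustrative.

```python
import numpy as np

def online_learn(stream, w_init, theta, update_rule):
    """Generic mistake-driven online learner: predict sgn(w.x - theta);
    on a mistake, apply the supplied update rule and count it."""
    w = np.asarray(w_init, dtype=float).copy()
    mistakes = 0
    for x_t, y_t in stream:                                # round t
        y_hat = 1 if np.dot(w, x_t) - theta >= 0 else -1   # predict
        if y_hat != y_t:                                   # mistake
            w = update_rule(w, x_t, y_t)                   # Perceptron or Winnow step
            mistakes += 1
    return w, mistakes
```

Perceptron and Winnow then differ only in the choice of w_init and update_rule, which is exactly the distinction the two questions above draw.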

4
Winnow algorithm
  • Input: vectors x_t and labels y_t
  • Goal: find a vector w = (w_1, w_2, ..., w_n)
  • Each w_i is a non-negative real number
  • Parameters:
  • θ — the threshold
  • α — the multiplier for weight changes
5
Winnow algorithm
  • Special case: θ = n, α = 2
  • This is an algorithm for learning monotone disjunctions (disjunctions in which no literal appears negated), that is, functions of the form
  • f(x_1, x_2, ..., x_n) = x_{i1} ∨ x_{i2} ∨ ... ∨ x_{ik}
  • A monotone disjunction is linearly separable:
  • for all x: c(x) = 1 ⇔ w · x ≥ θ
  • For example, f = x_1 ∨ x_3 with n = 4 is separated by w = (n, 0, n, 0): w · x ≥ n exactly when x_1 = 1 or x_3 = 1.
6
Winnow algorithm
  • Initialize: θ = n, α = 2, w_{1,i} = 1 for i = 1 to n
  • For each data point x_t:
  • predict ŷ_t = sgn(w_t · x_t − θ)
  • if ŷ_t < 0 and y_t = 1 (false negative),
  • then for each x_{t,i} = 1: w_{t,i} ← w_{t,i} · α
  • else if ŷ_t ≥ 0 and y_t = −1 (false positive),
  • then for each x_{t,i} = 1: w_{t,i} ← w_{t,i} / α

Perceptron, for comparison: if y_t ≠ sgn(w_t · x_t − θ), then w_{t,i} ← w_{t,i} + y_t · x_{t,i} (an additive rather than multiplicative update). A runnable sketch of Winnow follows below.
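A minimal runnable sketch of the special case above (θ = n, α = 2); the function name winnow and the NumPy 0/1 encoding of x_t are assumptions, not part of the slides.

```python
import numpy as np

def winnow(stream, n, alpha=2.0):
    """Winnow for monotone disjunctions: theta = n, all weights start at 1.
    Doubles the active weights on a false negative, halves them on a false positive."""
    theta = float(n)
    w = np.ones(n)
    mistakes = 0
    for x_t, y_t in stream:                    # x_t: 0/1 NumPy array, y_t in {-1, +1}
        y_hat = 1 if np.dot(w, x_t) - theta >= 0 else -1
        if y_hat == -1 and y_t == 1:           # false negative: promote active weights
            w[x_t == 1] *= alpha
            mistakes += 1
        elif y_hat == 1 and y_t == -1:         # false positive: demote active weights
            w[x_t == 1] /= alpha
            mistakes += 1
    return w, mistakes
```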
7
Winnow algorithm
  • Example: target f = x_2 ∨ x_3, n = 4 (a runnable version of this example is sketched below)
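The example can be run through the winnow sketch above. The example order below (all 16 instances of {0,1}^4, in lexicographic order) is an assumption; the slide's own sequence is not preserved in the transcript, and the exact mistake count depends on that order.

```python
import itertools
import numpy as np

# Target f = x2 v x3 (1-indexed as on the slide), n = 4.
examples = [np.array(x) for x in itertools.product([0, 1], repeat=4)]
stream = [(x, 1 if x[1] or x[2] else -1) for x in examples]

w, mistakes = winnow(stream, n=4)   # winnow() from the sketch above
print(w, mistakes)                  # with k = 2, n = 4 the bound gives u + v <= 20
```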
8
Winnow algorithm
  • Theorem: the Winnow algorithm makes at most O(k lg n) mistakes.
  • k — number of variables in the target disjunction
  • n — number of attributes
  • Proof:
  • Let u be the number of rounds on which weights were doubled (i.e. false negatives) and v the number of rounds on which weights were halved (i.e. false positives).
  • If attribute i is part of the target function, call w_i a relevant weight.
  • 1) Number of mistakes on positive examples (false negatives):
  • Relevant weights never decrease: on a false positive the example is negative, so every relevant x_{t,i} = 0 and no relevant weight is halved.
  • Each weight is ≤ 2n: a weight is doubled only when w_t · x_t < θ = n, so it is < n before doubling.
  • Conclusion:
  •   - no relevant weight needs to be doubled more than 1 + log₂(n) times (after that it exceeds n, and any positive example containing it is classified correctly)
  •   - each false negative doubles at least one relevant weight, and there are at most k relevant weights ⇒ u ≤ k(1 + log₂(n))

9
Winnow algorithm
  • 2) Number of mistakes on negative examples (false positives):
  • Let T = total weight. Initially T = n, and T > 0 always.
  • Each false-negative mistake adds at most n to T (the doubled weights sum to less than θ = n before doubling).
  • T ≤ n + u·n ≤ n·k(1 + log₂(n)) + n
  • Each false-positive mistake subtracts at least n/2 from T (the halved weights sum to at least θ = n before halving).
  • v ≤ T / (n/2) ≤ (n·k(1 + log₂(n)) + n) / (n/2)
  • v ≤ 2k(1 + log₂(n)) + 2
  • The total number of mistakes is at most
  • u + v ≤ 2 + 3k(1 + log₂(n)), or O(k(1 + log n)) (an empirical check is sketched below)
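The bound can be checked empirically against the winnow sketch above; the sparse random data distribution here is an arbitrary illustration, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 64, 3
relevant = rng.choice(n, size=k, replace=False)   # hidden k-literal target disjunction

stream = []
for _ in range(5000):
    x = (rng.random(n) < 0.1).astype(int)         # sparse random 0/1 instance
    y = 1 if x[relevant].any() else -1            # labeled by the target disjunction
    stream.append((x, y))

_, mistakes = winnow(stream, n)                   # winnow() from the earlier sketch
bound = 3 * k * (1 + np.log2(n)) + 2              # u + v <= 3k(1 + log2 n) + 2
print(mistakes, "<=", bound)
```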

10
Winnow algorithm
  • If not all examples are consistent with the target function:
  • Define m_c = number of mistakes made by concept c
  • A_c = number of attribute errors in the data for concept c
  • For each example x_t:
  • if x_t is labeled positive but has no relevant variables of c:
  • A_c ← A_c + 1
  • if x_t is labeled negative but satisfies r relevant variables of c:
  • A_c ← A_c + r
  • Conclusion: if c is a disjunction of k variables, then m_c ≤ A_c ≤ k·m_c
  • It may be shown that, for any sequence of examples and any disjunction c, the number of mistakes made by Winnow is O(A_c + k log n). (A computation of A_c is sketched below.)
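A direct transcription of the slide's attribute-error count (the function name attribute_errors and the index-set representation of c are assumptions):

```python
def attribute_errors(stream, relevant):
    """A_c for a disjunction c over the index set `relevant`:
    +1 for a positive example that sets none of c's variables,
    +r for a negative example that sets r of c's variables."""
    A_c = 0
    for x_t, y_t in stream:
        r = sum(int(x_t[i]) for i in relevant)
        if y_t == 1 and r == 0:        # c would wrongly predict negative
            A_c += 1
        elif y_t == -1 and r > 0:      # c would wrongly predict positive
            A_c += r
    return A_c
```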

11
Perceptron vs. Winnow
  • Perceptron: number of mistakes can be as large as O(nk) (a runnable sketch follows below)
  • Margin argument: y_t(u · x_t) ≥ σ for all t, where
  • u = (1/√k)(0, 1, 0, ..., 1)
  • x_t = (1/√n)(1, −1, −1, ..., 1)
  • (u · x_t) ≠ 0 for all x_t ⇒ y_t(u · x_t) ≥ 1/√(nk)
  • Suppose σ = 1/√(nk) ⇒ the number of mistakes is at most 1/σ² ⇒
  • number of mistakes ≤ nk
  • Winnow: number of mistakes O(k log n)
  • The mistake bound does not depend on the order of the examples or on the specific examples
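For a side-by-side comparison with the winnow sketch, here is a threshold Perceptron in the same style; again a minimal illustrative sketch, not code from the slides.

```python
import numpy as np

def perceptron(stream, n, theta=0.0):
    """Mistake-driven Perceptron: additive update w <- w + y_t * x_t on each mistake."""
    w = np.zeros(n)
    mistakes = 0
    for x_t, y_t in stream:
        y_hat = 1 if np.dot(w, x_t) - theta >= 0 else -1
        if y_hat != y_t:               # mistake: additive, not multiplicative, update
            w = w + y_t * x_t
            mistakes += 1
    return w, mistakes
```

On k-literal disjunctions with many irrelevant attributes, its mistake count grows roughly with n·k while winnow's grows with k·log n, which is the contrast the slide draws.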

12
Perceptron vs. Winnow
  • Winnow
  • Online: can adjust to a changing target over time
  • Advantages:
  • Simple
  • Guaranteed to learn a linearly separable problem
  • Well suited to problems with many irrelevant attributes
  • Limitations:
  • only linear separations
  • only converges for linearly separable data
  • not really efficient with many features
  • Perceptron
  • Online: can adjust to a changing target over time
  • Advantages:
  • Simple
  • Guaranteed to learn a linearly separable problem
  • Limitations:
  • only linear separations
  • only converges for linearly separable data
  • not really efficient with many features