Title: Online learning with mistake bounds
1. On-line learning with mistake bounds
(Reminder) Algorithms for learning linear threshold functions
Model of learning:
- X = {0,1}^n, the instance space
- C = {c | c: X → {−1, 1}}, the class of concepts; each concept classifies an instance x as negative (false) or positive (true)
- The goal of concept learning: discover an unknown target concept c from labeled instances. The target concept can be described by a Boolean function, and this function is described by a weight vector w.
- The goal of the learner is to make few mistakes.
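To make the model concrete, here is a minimal sketch (my own illustration, not from the slides) in which the hidden target is a monotone disjunction over {0,1}^n:

```python
# Hypothetical illustration of the learning model: instances are
# bit vectors in {0,1}^n; the hidden target concept is x2 OR x3.
n = 4
relevant = {1, 2}                  # 0-based indices of x2 and x3

def target_concept(x):
    """Label an instance +1 (positive) or -1 (negative)."""
    return 1 if any(x[i] == 1 for i in relevant) else -1

instances = [(0, 0, 0, 0), (0, 1, 0, 0), (1, 0, 0, 1)]
labels = [target_concept(x) for x in instances]   # [-1, 1, -1]
```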
2. On-line learning with mistake bounds
An algorithm's learning behavior is evaluated by counting the worst-case number of mistakes (its mistake bound) that it makes while learning a worst-case function from a specified class of functions, on a worst-case sequence of examples.
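One standard way to formalize this (my phrasing; the slide states it only in words) is as a max over targets and example sequences:

```latex
% Mistake bound of algorithm A on concept class C: the worst case is
% taken over both the target concept and the example sequence.
\[
  M_A(C) \;=\; \max_{c \in C}\;\max_{(x_1, x_2, \dots)}
  \bigl|\{\, t : \hat{y}_t \neq c(x_t) \,\}\bigr|
\]
```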
3. Algorithms for learning linear threshold functions
- General algorithm (see the sketch after this list):
  - Initialize the vector w_1 at t = 1.
  - On round t (given a vector x_t and y_t, the label for x_t): predict ŷ_t = sgn(w_t · x_t − θ).
  - If ŷ_t ≠ y_t, update w_{t+1} using w_t, x_t, y_t.
- Difference between the Perceptron and Winnow algorithms:
  - How should we initialize the vector w_1?
  - How should we update the vector w_t?
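A minimal Python sketch of this loop (the names online_learn and update are mine; a concrete algorithm supplies the initialization and the update rule):

```python
def online_learn(examples, w, theta, update):
    """Generic online linear-threshold learner.

    examples: iterable of (x_t, y_t) pairs, y_t in {-1, +1}
    w:        initial weight vector w_1 (algorithm-specific)
    theta:    decision threshold
    update:   rule producing w_{t+1} from (w_t, x_t, y_t)
    Returns the number of mistakes made on the sequence.
    """
    mistakes = 0
    for x, y in examples:
        score = sum(wi * xi for wi, xi in zip(w, x))
        y_hat = 1 if score - theta > 0 else -1   # sgn(w.x - theta); ties -> -1
        if y_hat != y:                           # mistake-driven: update only on errors
            w = update(w, x, y)
            mistakes += 1
    return mistakes
```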
4. Winnow algorithm
- Input: vectors x_t and labels y_t
- Goal: find a vector w = (w_1, w_2, ..., w_n)
- Each w_i is a non-negative real number
- Parameters:
  - θ: the threshold
  - α: the parameter for the weight change
5. Winnow algorithm
- Special case: θ = n, α = 2
- This is an algorithm for learning monotone disjunctions (disjunctions in which no literal appears negated), that is, functions of the form f(x_1, x_2, ..., x_n) = x_{i1} ∨ x_{i2} ∨ ... ∨ x_{ik}
- A monotone disjunction is linearly separable:
  - there is a weight vector w such that, for all x: c(x) = 1 ⟺ w · x ≥ θ
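To see why, one separating weight vector (my choice; the slide only asserts separability) puts weight θ on each relevant variable:

```latex
% Take w_i = \theta for i \in \{i_1, \dots, i_k\} and w_i = 0 otherwise.
% If c(x) = 1, some relevant x_{i_j} = 1, so w \cdot x \ge \theta;
% if c(x) = 0, all relevant x_{i_j} = 0, so w \cdot x = 0 < \theta.
\[
  w \cdot x \;=\; \theta \sum_{j=1}^{k} x_{i_j}
  \quad\Longrightarrow\quad
  c(x) = 1 \;\Longleftrightarrow\; w \cdot x \ge \theta .
\]
```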
6. Winnow algorithm
- Initialize: θ = n, α = 2, w_{1,i} = 1 for i = 1 to n
- For each data point x_t:
  - predict ŷ_t = sgn(w_t · x_t − θ)
  - if ŷ_t < 0 and y_t = 1 (false negative):
    - then for each x_{t,i} = 1: w_{t,i} ← w_{t,i} · α (promotion)
  - else if ŷ_t > 0 and y_t = −1 (false positive):
    - then for each x_{t,i} = 1: w_{t,i} ← w_{t,i} / α (demotion)
Perceptron, for comparison, updates additively on a mistake: w_{t,i} ← w_{t,i} − sgn(w_t · x_t − θ) · x_{t,i}, i.e. w_{t,i} ← w_{t,i} + y_t · x_{t,i}.
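A runnable sketch of this special case (θ = n, α = 2; the function name winnow_fit is mine, and ties w · x = θ are predicted positive):

```python
def winnow_fit(examples, n, alpha=2.0):
    """Winnow with theta = n: multiplicative promotion/demotion.

    examples: iterable of (x, y), x a 0/1 vector of length n, y in {-1, +1}.
    Returns (final weights, number of mistakes).
    """
    theta = float(n)
    w = [1.0] * n                        # w_{1,i} = 1 for all i
    mistakes = 0
    for x, y in examples:
        score = sum(wi * xi for wi, xi in zip(w, x))
        y_hat = 1 if score >= theta else -1
        if y_hat != y:
            mistakes += 1
            for i in range(n):
                if x[i] == 1:
                    if y == 1:           # false negative: promote (double)
                        w[i] *= alpha
                    else:                # false positive: demote (halve)
                        w[i] /= alpha
    return w, mistakes
```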
7. Winnow algorithm
Example trace (shown as a figure in the original slides): target f = x_2 ∨ x_3, n = 4.
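Running the sketch above on this target gives the flavor of the slide's example (the instance sequence below is mine):

```python
# Target f = x2 OR x3 (0-based indices 1 and 2), n = 4.
f = lambda x: 1 if (x[1] == 1 or x[2] == 1) else -1
xs = [(1, 0, 0, 1), (0, 1, 0, 0), (1, 0, 1, 0),
      (0, 0, 0, 1), (0, 1, 1, 0)]
w, m = winnow_fit([(x, f(x)) for x in xs], n=4)
print(w, m)   # -> [2.0, 2.0, 2.0, 1.0] 2: two promotions, no demotions
```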
8. Winnow algorithm
- Theorem: the Winnow algorithm makes at most O(k log n) errors, where
  - k = number of variables in the target disjunction,
  - n = number of attributes (parameters).
- Proof:
  - Let u be the number of cases in which weights were doubled (i.e., false negatives) and v the number of cases in which weights were halved (i.e., false positives).
  - If attribute i is part of the target function, call w_i a relevant weight.
  - 1) Number of mistakes on positive examples (false negatives):
    - Relevant weights never decrease: weights are halved only on false positives, and on a negative example every relevant variable is 0.
    - Each weight stays below 2n: a weight is doubled only when w_t · x_t < θ = n, so it is below n just before doubling.
    - Conclusion:
      - no relevant weight needs to be doubled more than 1 + log2(n) times;
      - each false negative doubles at least one relevant weight, and there are at most k of them, so u ≤ k(1 + log2(n)).
9. Winnow algorithm
- 2) Number of mistakes on negative examples (false positives):
  - Let T = total weight = Σ_i w_i. Initially T = n, and T > 0 always.
  - Each false-negative mistake adds at most n to T (the doubled weights sum to w_t · x_t < θ = n).
  - Hence T < n + u·n ≤ n·k(1 + log2(n)) + n.
  - Each false-positive mistake subtracts at least n/2 from T (the halved weights sum to w_t · x_t ≥ θ = n).
  - Therefore v ≤ (n·k(1 + log2(n)) + n) / (n/2), i.e.
  - v ≤ 2k(1 + log2(n)) + 2.
- The total number of mistakes is at most
  - u + v ≤ 2 + 3k(1 + log2(n)), i.e. O(k log n).
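As a sanity check (my own experiment, not from the slides), one can run winnow_fit from the sketch above on random examples and compare the mistake count with 2 + 3k(1 + log2(n)):

```python
import math, random

random.seed(0)
n = 64
relevant = [3, 10, 40]                   # k = 3 relevant variables
k = len(relevant)
f = lambda x: 1 if any(x[i] for i in relevant) else -1

xs = [tuple(random.randint(0, 1) for _ in range(n)) for _ in range(2000)]
_, m = winnow_fit([(x, f(x)) for x in xs], n=n)

bound = 2 + 3 * k * (1 + math.log2(n))   # = 65 for these values
print(m, "<=", bound)                    # mistakes never exceed the bound
```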
10. Winnow algorithm
- If not all examples are consistent with the target function:
  - Define m_c = number of mistakes made by concept c.
  - Define A_c = number of attribute errors in the data for concept c (a small computation of A_c follows this slide). For each example x_t:
    - if x_t is labeled positive but contains no relevant variables of c: A_c ← A_c + 1
    - if x_t is labeled negative but satisfies r relevant variables of c: A_c ← A_c + r
  - Conclusion: if c is a disjunction of k variables, then m_c ≤ A_c ≤ k·m_c.
- It may be shown that, for any sequence of examples and any disjunction c, the number of mistakes made by Winnow is O(A_c + k log n).
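A small sketch of the attribute-error count A_c (the helper name attribute_errors is mine; relevant is the index set of c's variables):

```python
def attribute_errors(examples, relevant):
    """A_c: total number of attribute flips needed to make every
    example consistent with the disjunction c over `relevant`."""
    a_c = 0
    for x, y in examples:
        r = sum(x[i] for i in relevant)   # satisfied relevant variables
        if y == 1 and r == 0:
            a_c += 1                      # turn one relevant bit on
        elif y == -1 and r > 0:
            a_c += r                      # turn r relevant bits off
    return a_c
```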
11. Perceptron vs. Winnow
Example: committees of experts.
- Perceptron: number of mistakes O(nk)
  - Margin assumption: y_t(u · x_t) ≥ δ for all t, with
  - u = (1/√k)(0, 1, 0, ..., 1) (k nonzero entries, so ‖u‖ = 1)
  - x_t = (1/√n)(1, −1, −1, ..., 1) (so ‖x_t‖ = 1)
  - (u · x_t) ≠ 0 for all x_t ⟹ y_t(u · x_t) ≥ 1/√(nk)
  - Suppose δ = 1/√(nk); since the number of mistakes is at most 1/δ², it can reach nk (worked out below)
- Winnow: number of mistakes O(k log n)
- The mistake bound does not depend on the order of the examples or on the specific examples
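The arithmetic behind the nk figure, written out using the standard Perceptron margin bound (mistakes ≤ (R/δ)² with R = max_t ‖x_t‖, here R = 1):

```latex
% u . x_t is a sum of k terms of magnitude 1/\sqrt{nk}, so whenever it
% is nonzero its magnitude is at least 1/\sqrt{nk}; taking this as the
% margin \delta gives the O(nk) mistake bound.
\[
  \delta = \frac{1}{\sqrt{nk}}, \qquad
  \text{mistakes} \;\le\; \frac{R^2}{\delta^2}
  \;=\; \bigl(\sqrt{nk}\bigr)^{2} \;=\; nk .
\]
```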
12. Perceptron vs. Winnow
- Winnow
  - Online: can adjust to a changing target over time
  - Advantages:
    - Simple
    - Guaranteed to learn a linearly separable problem
    - Suitable for problems with many irrelevant attributes
  - Limitations:
    - only linear separations
    - only converges for linearly separable data
    - not really efficient with many features
- Perceptron
  - Online: can adjust to a changing target over time
  - Advantages:
    - Simple
    - Guaranteed to learn a linearly separable problem
  - Limitations:
    - only linear separations
    - only converges for linearly separable data
    - not really efficient with many features