The Online model (N. Littlestone) - PowerPoint PPT Presentation

Transcript and Presenter's Notes

1
The Online model (N. Littlestone)
Note that the teacher can be an adversary or a
friend, and that this may affect the learning
process.
2
The Online model, remarks
  • 1. There is no separation between the learning
    phase and the testing phase (unlike, e.g., PAC).
  • 2. The hypothesis Pi runs in polynomial time.
  • 3. The goal is to achieve Pi = f after a
    polynomial number of mistakes.

3
Linear Threshold Functions
  • Input: X = (x1, x2, ..., xn)
  • Output:
    if w1x1 + w2x2 + ... + wnxn > k then f(x) = true,
    else f(x) = false
  • where (w1, w2, ..., wn) is a vector of constants,
    and k is a constant. (A code sketch follows below.)
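As a quick illustration of the definition above, here is a minimal Python sketch; the function name ltf and the sample numbers are mine, not from the slides:

    # f(x) = true iff w1*x1 + w2*x2 + ... + wn*xn > k
    def ltf(w, x, k):
        return sum(wi * xi for wi, xi in zip(w, x)) > k

    # Example with w = (2, -1) and k = 1:
    print(ltf((2, -1), (1, 0), 1))   # 2*1 + (-1)*0 =  2 > 1 -> True
    print(ltf((2, -1), (0, 2), 1))   # 2*0 + (-1)*2 = -2 > 1 -> False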

4
Linear Threshold Functions
  • When learning an LTF, we can work with k = 0, because
    we can always add a dummy input x0 = 1 with weight w0 = -k.
  • The function we'll work with is of the form
    w1x1 + w2x2 + ... + wnxn > 0

What is the geometric meaning? (A sketch of the reduction follows below.)
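A minimal sketch of the reduction above, assuming the ltf helper from the previous sketch; it checks that appending x0 = 1 with weight w0 = -k gives the same answers with the threshold moved to 0 (all names and numbers are illustrative):

    w, k = (2.0, -1.0), 1.0
    x = (1.0, 3.0)

    # Augment with a dummy coordinate: x0 = 1, w0 = -k
    w_aug = (-k,) + w      # (w0, w1, ..., wn)
    x_aug = (1.0,) + x     # (x0, x1, ..., xn)

    # w.x > k  is equivalent to  w_aug.x_aug > 0
    assert ltf(w, x, k) == ltf(w_aug, x_aug, 0)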
5
Learning LTF Online
  • k is known.
  • The teacher sends us vectors x^i = (x0^i, x1^i, ..., xn^i).
  • Our goal is to find (w1, w2, ..., wn).
  • For simplicity we will assume that |x| = 1,
  • and also that the target function's weight vector w* satisfies |w*| = 1.

What's the geometric meaning of this?
6
The Perceptron Algorithm, Inspiration
7
The Perceptron Algorithm, Inspiration
8
The Perceptron Algorithm
  • 1. Initialization: (w1, w2, ..., wn) = (0, 0, ..., 0)
  • 2. The teacher sends us x; we predict
    f(x) = true iff w·x > 0
  • 3. On a mistake, update our hypothesis
    as follows:
    Mistake on a positive example: w ← w + x
    Mistake on a negative example: w ← w - x

How does the line w·x = 0 improve on a positive
mistake? (A sketch of the algorithm follows below.)
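A minimal Python sketch of the algorithm on this slide; the function names (perceptron_predict, perceptron_update, perceptron_online) and the True/False labels are my own choices, not part of the slides:

    def perceptron_predict(w, x):
        # Predict f(x) = true iff w.x > 0
        return sum(wi * xi for wi, xi in zip(w, x)) > 0

    def perceptron_update(w, x, label):
        # Called only after a mistake.
        if label:   # mistake on a positive example: w <- w + x
            return [wi + xi for wi, xi in zip(w, x)]
        else:       # mistake on a negative example: w <- w - x
            return [wi - xi for wi, xi in zip(w, x)]

    def perceptron_online(examples, n):
        w = [0.0] * n                 # 1. initialization to the zero vector
        mistakes = 0
        for x, label in examples:     # 2. the teacher sends us x
            if perceptron_predict(w, x) != label:
                w = perceptron_update(w, x, label)   # 3. update on a mistake
                mistakes += 1
        return w, mistakes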
9
The Perceptron Alg. - Example
  • w* = (1, 0), k = 0

For simplicity, in the example x will not be a
normalized vector. (A worked run follows below.)
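A worked run of the sketch above on this slide's target w* = (1, 0), k = 0, i.e. f(x) = true iff x1 > 0; the four data points are made up for illustration:

    examples = [
        (( 2.0,  1.0), True),
        ((-1.0,  3.0), False),
        (( 0.5, -2.0), True),
        ((-3.0, -1.0), False),
    ]
    # Replay the stream a few times; the algorithm stops making mistakes
    # once w classifies every point correctly.
    w, mistakes = perceptron_online(examples * 3, n=2)
    print(w, mistakes)                # [3.0, -2.0] after 2 mistakes
    print([perceptron_predict(w, x) for x, _ in examples])   # [True, False, True, False]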
10
The Perceptron convergence theorem (Rosenblatt
1962)
The Perceptron algorithm will learn to classify
any linearly separable set of inputs.
11
Bounding the number of mistakes
  • Define
    s = min_x |w*·x| / |x|
  • where w* is the target weight vector, and x is
    one of the possible inputs for the target function (TF).
  • Theorem: Learning w*, the Perceptron algorithm
    will make at most 1/s² mistakes. (A sketch computing s follows below.)
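A small sketch of the quantity s and the bound, reusing perceptron_online from the earlier sketch; w_star is the unit-length target vector and the points are the same illustrative data as before (the slides additionally assume |x| = 1, so the inputs are normalized before running):

    import math

    def margin(w_star, points):
        # s = min over inputs x of |w_star . x| / |x|
        return min(abs(sum(wi * xi for wi, xi in zip(w_star, x)))
                   / math.sqrt(sum(xi * xi for xi in x))
                   for x in points)

    w_star = (1.0, 0.0)
    points = [(2.0, 1.0), (-1.0, 3.0), (0.5, -2.0), (-3.0, -1.0)]
    labels = [sum(wi * xi for wi, xi in zip(w_star, x)) > 0 for x in points]

    s = margin(w_star, points)
    normed = [tuple(xi / math.sqrt(sum(c * c for c in x)) for xi in x) for x in points]
    _, mistakes = perceptron_online(list(zip(normed, labels)) * 10, n=2)
    print(mistakes, "<=", 1 / s ** 2)   # the theorem guarantees mistakes <= 1/s^2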

12
Bounding the number of mistakes
  • Proof of the Theorem
  • Claim 1:
  • Let w be the vector which the Perceptron
    algorithm holds, and w* the target it learns.
  • Every time it makes a mistake, the quantity w·w*
    increases by at least s.

13
Bounding the number of mistakes
  • Proof of Claim 1:
  • If x was a positive example,
    (w + x)·w* = w·w* + x·w* ≥ w·w* + s
  • If x was a negative example,
    (w - x)·w* = w·w* - x·w*
    = w·w* + (-x)·w* ≥ w·w* + s

14
Bounding the number of mistakes
  • Claim 2:
  • Every time the Perceptron algorithm makes a
    mistake, the quantity |w|² increases by at most 1.
  • Proof of Claim 2:
  • If x is a positive example, then (w + x)·(w + x)
    = 1 + 2w·x + w·w < 1 + |w|²  (using |x| = 1)
  • The last inequality holds since w·x < 0
    (we predicted false on a positive example).

15
Bounding the number of mistakes
  • Proof of Claim 2, continued:
  • If x is a negative example, then
    (w - x)·(w - x) = |w|² - 2w·x + 1 < |w|² + 1
  • The last inequality holds since w·x > 0

16
Bounding the number of mistakes
  • Using the two Claims to prove the mistake bound:
  • Let M be the number of mistakes.
  • According to Claim 2, after M mistakes
    it holds that |w|² ≤ M, or |w| ≤ √M
  • (recall that at initialization w = 0).

17
Bounding the number of mistakes
  • We have shown that |w| ≤ √M, where M
    is the number of mistakes.
  • Now, according to Claim 1, w·w* goes up
    by at least s with every mistake we make.
  • But since we assumed w* is a unit vector,
    w·w* ≤ |w| ≤ √M  ⇒  0 + M·s ≤ √M
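Written out as one chain (in LaTeX), the two claims combine as follows; the middle step uses the Cauchy-Schwarz inequality together with the assumption |w*| = 1:

    % After M mistakes:
    %   Claim 1 gives  w \cdot w^* \ge M s   (starting from w \cdot w^* = 0)
    %   Claim 2 gives  \|w\|^2 \le M
    M s \;\le\; w \cdot w^{*} \;\le\; \|w\|\,\|w^{*}\| \;=\; \|w\| \;\le\; \sqrt{M}
    \quad\Longrightarrow\quad M \le \frac{1}{s^{2}}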

18
Bounding the number of mistakes
  • s = min_x |w*·x| / |x|  ⇒  M ≤ 1/s²
  • Let n be the number of
    possible data vectors.
  • What does the bound that we found mean?
  • If s = 2^-n, then the maximum number of mistakes is
    exponential in n.
  • Now, note that s = |w*·x| / |x| is actually the
    distance of the closest point to the ideal line
    which separates the examples.
  • So, if the data is well separated, there exists a
    line that separates the positive and negative
    examples and is far enough from any example.