Title: The Online Model (N. Littlestone)
1. The Online model (N. Littlestone)
Note that the teacher can be an adversary or a friend, and that this may affect the learning process.
2. The Online model, remarks
- 1. There is no separation between the learning phase and the testing phase (as there is in, e.g., PAC learning).
- 2. The learner's hypothesis Pi runs in polynomial time.
- 3. The goal is to achieve Pi = f after a polynomial number of mistakes.
3. Linear Threshold Functions
- Input: X = (x1, x2, ..., xn)
- Output:
  - if w1·x1 + w2·x2 + ... + wn·xn > k then f(x) = true
  - else f(x) = false
- where (w1, w2, ..., wn) is a vector of constants, and k is a constant.
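As a concrete illustration, here is a minimal Python sketch of evaluating such a function; the weights, input, and threshold are made-up values, not taken from the slides.

```python
# Evaluate a linear threshold function: f(x) = true iff w1*x1 + ... + wn*xn > k.
def ltf(w, x, k):
    return sum(wi * xi for wi, xi in zip(w, x)) > k

# Illustrative (hypothetical) values: fire when 2*x1 + 1*x2 exceeds 3.
print(ltf([2.0, 1.0], [1.0, 0.5], 3.0))   # False: 2.5 <= 3
print(ltf([2.0, 1.0], [1.5, 1.0], 3.0))   # True:  4.0 >  3
```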
4. Linear Threshold Functions
- When learning an LTF, we can work with k = 0, because we can always add a dummy input x0 = 1 with weight w0 = -k (a sketch of this reduction appears below).
- The function we'll work with is of the form
  w1·x1 + w2·x2 + ... + wn·xn > 0
  (with the dummy coordinate folded into x and w).
What is the geometric meaning?
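A minimal Python sketch of this reduction, using made-up weights and threshold; the helper names `augment` and `reduce_weights` are mine, not from the slides.

```python
# Reduce an LTF with threshold k to one with threshold 0 by adding a
# dummy coordinate x0 = 1 with weight w0 = -k.
def augment(x):
    return [1.0] + list(x)        # prepend the dummy input x0 = 1

def reduce_weights(w, k):
    return [-k] + list(w)         # prepend the dummy weight w0 = -k

w, k = [2.0, 1.0], 3.0            # illustrative values
x = [1.5, 1.0]
original = sum(wi * xi for wi, xi in zip(w, x)) > k
reduced = sum(wi * xi for wi, xi in zip(reduce_weights(w, k), augment(x))) > 0
print(original, reduced)          # both True: the two forms agree on every x
```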
5. Learning LTF Online
- k is known.
- The teacher sends us vectors x^i = (x0^i, x1^i, ..., xn^i), one example per round.
- Our goal is to find (w1, w2, ..., wn).
- For simplicity we will assume that ‖x‖ = 1 (the examples are normalized; see the snippet below),
- and also that the target function's weight vector w* satisfies ‖w*‖ = 1.
What is the geometric meaning of this?
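A small Python sketch of the normalization assumed here; the vectors are illustrative. Note that rescaling an example x never changes the sign of w*·x, so normalizing the data does not change any label.

```python
import math

# Rescale a vector to unit length, as assumed for the examples and for w*.
def normalize(v):
    n = math.sqrt(sum(vi * vi for vi in v))
    return tuple(vi / n for vi in v)

w_star = normalize((3.0, 4.0))    # a hypothetical target vector, becomes (0.6, 0.8)
x = normalize((2.0, -1.0))        # a hypothetical example, now with ||x|| = 1
print(w_star, x)
```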
6. The Perceptron Algorithm, Inspiration
7. The Perceptron Algorithm, Inspiration
8. The Perceptron Algorithm
- 1. Initialization: (w1, w2, ..., wn) = (0, 0, ..., 0)
- 2. The teacher sends us x; we predict f(x) = true iff w·x > 0
- 3. On a mistake, update our hypothesis as follows (see the Python sketch below):
  - Mistake on a positive example: w ← w + x
  - Mistake on a negative example: w ← w − x
How does the line w·x = 0 improve on a positive mistake?
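A minimal Python sketch of the algorithm above, assuming boolean labels and list-of-floats inputs; the helper names are mine.

```python
# Online Perceptron: predict true iff w·x > 0; on a mistake, add x for a
# positive example or subtract x for a negative example.
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def perceptron_step(w, x, label):
    """Process one example; return the (possibly updated) weights and a mistake flag."""
    prediction = dot(w, x) > 0
    if prediction == label:
        return w, False                                    # correct: no update
    if label:                                              # mistake on a positive: w <- w + x
        return [wi + xi for wi, xi in zip(w, x)], True
    return [wi - xi for wi, xi in zip(w, x)], True         # mistake on a negative: w <- w - x

# Illustrative run, starting from the all-zeros weight vector.
w = [0.0, 0.0]
for x, label in [([0.6, 0.8], True), ([-0.8, 0.6], False), ([0.6, 0.8], True)]:
    w, mistake = perceptron_step(w, x, label)
    print(w, mistake)
```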
9. The Perceptron Alg. - Example
For simplicity, in the example x will not be a normalized vector.
10. The Perceptron convergence theorem (Rosenblatt, 1962)
The Perceptron algorithm will learn to classify any linearly separable set of inputs.
11. Bounding the number of mistakes
- Define
  s = min_x |w*·x| / ‖x‖
  where w* is the target weight vector and x ranges over the possible inputs of the target function (a numeric example follows below).
- Theorem: When learning w*, the Perceptron algorithm will make at most 1/s² mistakes.
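A small numeric example (with made-up, linearly separable data) of computing the margin s and the resulting mistake bound 1/s²:

```python
import math

# s = min over examples of |w*·x| / ||x||; the mistake bound is then 1/s^2.
def margin(w_star, examples):
    return min(abs(sum(wi * xi for wi, xi in zip(w_star, x)))
               / math.sqrt(sum(xi * xi for xi in x))
               for x in examples)

w_star = [0.6, 0.8]                                   # a unit-length target vector
examples = [[1.0, 0.0], [0.0, 1.0], [-0.5, -0.5]]     # illustrative inputs
s = margin(w_star, examples)
print(f"s = {s:.3f}, mistake bound 1/s^2 = {1 / s ** 2:.1f}")   # s = 0.600, bound ~ 2.8
```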
12. Bounding the number of mistakes
- Proof of the Theorem
- Claim 1:
  Let w be the vector which the Perceptron algorithm currently holds, and w* the target vector it learns.
  Every time the algorithm makes a mistake, the quantity w·w* increases by at least s.
13. Bounding the number of mistakes
- Proof of Claim 1:
- If x was a positive example,
  (w + x)·w* = w·w* + x·w* ≥ w·w* + s
- If x was a negative example,
  (w − x)·w* = w·w* − x·w*
  = w·w* + (−x)·w* ≥ w·w* + s
14. Bounding the number of mistakes
- Claim 2:
  Every time the Perceptron algorithm makes a mistake, the quantity ‖w‖² increases by at most 1.
- Proof of Claim 2:
- If x is a positive example, then (x + w)·(x + w)
  = 1 + 2w·x + w·w ≤ 1 + ‖w‖²
- The last inequality holds since w·x ≤ 0 when we make a mistake on a positive example (and ‖x‖² = 1).
15. Bounding the number of mistakes
- Proof of Claim 2, continued:
- If x is a negative example, then
  (w − x)·(w − x) = ‖w‖² − 2w·x + 1 < ‖w‖² + 1
- The last inequality holds since w·x > 0 when we make a mistake on a negative example.
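A quick numeric sanity check of both claims for a single mistaken update, using made-up vectors (every x normalized, w* a unit vector):

```python
# Check Claims 1 and 2 on one update triggered by a mistake.
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

w_star = (0.6, 0.8)                                   # target, ||w*|| = 1
examples = {(0.8, 0.6): True, (0.6, -0.8): False}     # normalized examples with labels
s = min(abs(dot(w_star, x)) for x in examples)        # margin of this tiny dataset

w = (-0.2, 0.1)                                       # current hypothesis
x = (0.8, 0.6)                                        # positive example, but w·x <= 0: a mistake
w_new = tuple(wi + xi for wi, xi in zip(w, x))        # update rule for a positive mistake

print(dot(w_new, w_star) - dot(w, w_star) >= s)       # Claim 1: w·w* grew by at least s   -> True
print(dot(w_new, w_new) - dot(w, w) <= 1.0)           # Claim 2: ||w||^2 grew by at most 1 -> True
```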
16. Bounding the number of mistakes
- Using the two Claims to prove the mistake bound:
- Let M be the number of mistakes.
- According to Claim 2, after M mistakes it holds that ‖w‖² ≤ M, or ‖w‖ ≤ √M
- (recall that at initialization w = 0).
17. Bounding the number of mistakes
- We have shown that ‖w‖ ≤ √M, where M is the number of mistakes.
- Now, according to Claim 1, w·w* goes up by at least s with every mistake we make.
- But since we assumed w* is a unit vector, w·w* ≤ ‖w‖·‖w*‖ = ‖w‖ ≤ √M.
- Since w·w* starts at 0, after M mistakes: M·s ≤ w·w* ≤ √M.
18. Bounding the number of mistakes
- s = min_x |w*·x| / ‖x‖, so M ≤ 1/s².
- Let n be the number of possible data vectors.
- What does the bound that we found mean?
- If s = 2^(−n), then the maximum number of mistakes is exponential in n.
- Now, note that s = |w*·x| / ‖x‖ is actually the distance of the closest point to the ideal line which separates the examples.
- So, if the data is well separated, there exists a line that separates the positive and negative examples and is far enough from any example (a small simulation follows below).
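To tie the bound together, here is a sketch in Python that runs the online Perceptron on a small randomly generated, linearly separable dataset and compares the observed number of mistakes with 1/s²; the data and the target vector are made up for illustration.

```python
import math, random

random.seed(0)
w_star = (0.6, 0.8)                                    # unit-length target vector

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def normalize(v):
    n = math.sqrt(sum(vi * vi for vi in v))
    return tuple(vi / n for vi in v)

# Sample normalized points and label them with the target hyperplane w*·x > 0.
data = [normalize((random.uniform(-1, 1), random.uniform(-1, 1))) for _ in range(200)]
labels = [dot(w_star, x) > 0 for x in data]
s = min(abs(dot(w_star, x)) for x in data)             # empirical margin of this dataset

mistakes, w = 0, (0.0, 0.0)
for _ in range(50):                                    # sweep the data several times
    for x, y in zip(data, labels):
        if (dot(w, x) > 0) != y:                       # prediction mistake: update w
            mistakes += 1
            w = tuple(wi + xi if y else wi - xi for wi, xi in zip(w, x))

print(f"mistakes = {mistakes}, bound 1/s^2 = {1 / s ** 2:.1f}")   # mistakes never exceed the bound
```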