1
Experts and Boosting Algorithms
2
Experts: Motivation
  • Given a set of experts
  • No prior information
  • No consistent behavior
  • Goal: predict as well as the best expert
  • Model: online
  • Input: historical results

3
Experts: Model
  • N strategies (experts)
  • At time t:
  • Learner A chooses a distribution over the N experts.
  • Let p_t(i) be the probability of the i-th expert.
  • Clearly Σ_i p_t(i) = 1
  • The learner then receives a loss vector l_t
  • Loss at time t: Σ_i p_t(i) l_t(i)
  • Assume bounded losses: l_t(i) in [0,1]

4
Experts: Goal
  • Match the loss of the best expert.
  • Loss of the algorithm: L_A = Σ_t Σ_i p_t(i) l_t(i)
  • Loss of expert i: L_i = Σ_t l_t(i)
  • Can we hope to do better?

5
Example: Guessing letters
  • Setting:
  • Alphabet Σ of k letters
  • Loss:
  • 1 for an incorrect guess
  • 0 for a correct guess
  • Experts:
  • Each expert always guesses a certain letter.
  • Game: guess the most popular letter online.

6
Example 2: Rock-Paper-Scissors
  • Two-player game.
  • Each player chooses Rock, Paper, or Scissors.
  • Loss matrix (row = our move, column = opponent's move):
  • Goal: play as well as we can against the opponent.

            Rock   Paper  Scissors
  Rock      1/2    1      0
  Paper     0      1/2    1
  Scissors  1      0      1/2
7
Example 3: Placing a point
  • Action: choosing a point d.
  • Loss (given the true location y): |d - y|.
  • Experts: one for each point.
  • Important: the loss is convex.
  • Goal: find a center.

8
Experts Algorithm: Greedy
  • For each expert i, define its cumulative loss L_i^t = Σ_{s≤t} l_s(i)
  • Greedy: at time t choose the expert with minimum cumulative
    loss so far, namely arg min_i L_i^t  (a sketch follows this slide)

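As a concrete illustration of the greedy rule above, here is a minimal Python sketch (not from the slides; the list-of-loss-vectors interface is an illustrative assumption):

```python
import numpy as np

def greedy_expert_predictions(loss_vectors):
    """Follow-the-leader: at each step play the expert with the smallest
    cumulative loss seen so far (ties broken by lowest index)."""
    n_experts = len(loss_vectors[0])
    cumulative = np.zeros(n_experts)          # L_i^{t-1} for each expert i
    total_loss = 0.0
    for l_t in loss_vectors:                  # l_t[i] in [0, 1]
        choice = int(np.argmin(cumulative))   # arg min_i L_i^{t-1}
        total_loss += l_t[choice]
        cumulative += l_t                     # update the cumulative losses
    return total_loss, cumulative
```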
9
Greedy: Analysis
  • Theorem: Let L_G^T be the loss of Greedy at time T;
    it can be bounded in terms of the loss of the best expert.
  • Proof!

10
Better Expert Algorithms
  • We would like to bound the regret: L_A - min_i L_i

11
Expert Algorithm: Hedge(β)
  • Maintains a weight vector w_t
  • Probabilities: p_t(k) = w_t(k) / Σ_j w_t(j)
  • Initialization: w_1(i) = 1/N
  • Update (see the sketch after this slide):
  • w_{t+1}(k) = w_t(k) · U_β(l_t(k))
  • where β in [0,1] and
  • β^r ≤ U_β(r) ≤ 1 - (1-β)r

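Below is a minimal Python sketch of Hedge(β) as described on the slide, using the admissible choice U_β(r) = β^r (which satisfies β^r ≤ 1 - (1-β)r for r in [0,1]); the sequence-of-loss-vectors interface is an illustrative assumption, not part of the slides.

```python
import numpy as np

def hedge(loss_vectors, beta=0.5):
    """Hedge(beta): multiplicative weights over N experts."""
    n_experts = len(loss_vectors[0])
    w = np.ones(n_experts) / n_experts       # w_1(i) = 1/N
    total_loss = 0.0
    for l_t in np.asarray(loss_vectors, dtype=float):
        p_t = w / w.sum()                    # p_t(k) = w_t(k) / sum_j w_t(j)
        total_loss += float(p_t @ l_t)       # expected loss at time t
        w = w * beta ** l_t                  # w_{t+1}(k) = w_t(k) * U_beta(l_t(k))
    return total_loss, w
```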
12
Hedge: Analysis
  • Lemma: For any sequence of losses,
    Σ_i w_{T+1}(i) ≤ exp( -(1-β) L_Hedge )
  • Proof!
  • Corollary: L_Hedge ≤ -ln( Σ_i w_{T+1}(i) ) / (1-β)

13
Hedge: Properties
  • Bounding the weights: w_{T+1}(i) ≥ w_1(i) · β^{L_i} = (1/N) β^{L_i}
  • Similarly for a subset of experts.

14
Hedge: Performance
  • Let k be the expert with minimal loss.
  • Therefore L_Hedge ≤ ( L_k ln(1/β) + ln N ) / (1 - β)

15
Hedge: Optimizing β
  • For β = 1/2 we have L_Hedge ≤ 2 ln(2) · L_k + 2 ln N
  • Better selection of β (as a function of an upper bound on L_k)

16
Occam Razor
17
Occam Razor
  • Finding the shortest consistent hypothesis.
  • Definition: (α,β)-Occam algorithm
  • α > 0 and β < 1
  • Input: a sample S of size m
  • Output: a hypothesis h
  • for every (x,b) in S: h(x) = b  (h is consistent with S)
  • size(h) ≤ size(c_t)^α · m^β
  • Efficiency.

18
Occam algorithm and compression
  • [Diagram: A holds the labeled sample S = {(x_i, b_i)}; B holds only the
    points x_1, ..., x_m; A must communicate the labels to B.]
19
Compression
  • Option 1:
  • A sends B the values b_1, ..., b_m
  • m bits of information
  • Option 2:
  • A sends B the hypothesis h
  • Occam: for large enough m, size(h) < m
  • Option 3 (MDL):
  • A sends B a hypothesis h and corrections
  • complexity: size(h) + size(errors)

20
Occam Razor Theorem
  • A: an (α,β)-Occam algorithm for C using H
  • D: a distribution over the inputs X
  • c_t in C: the target function
  • Sample size: m as derived on the next slide
  • With probability 1-δ, A(S) = h has error(h) < ε

21
Occam Razor Theorem
  • Use the bound for a finite hypothesis class.
  • Effective hypothesis class size: 2^size(h)
  • size(h) ≤ n^α · m^β
  • Sample size (see the counting sketch below)

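The sample-size formula missing from the last two slides is presumably the standard counting bound for a finite hypothesis class; the following LaTeX reconstruction sketches that argument (the constants are indicative, not taken from the slides).

```latex
% Counting argument for the Occam Razor theorem (reconstruction; constants indicative).
% The algorithm outputs one of at most 2^{size(h)} \le 2^{n^\alpha m^\beta} hypotheses, so
\Pr\bigl[\exists\, h \text{ consistent with } S,\ \mathrm{error}(h) > \epsilon\bigr]
   \;\le\; 2^{\,n^{\alpha} m^{\beta}} (1-\epsilon)^{m}
   \;\le\; 2^{\,n^{\alpha} m^{\beta}} e^{-\epsilon m} \;\le\; \delta .
% Taking logarithms: \epsilon m \ge n^{\alpha} m^{\beta} \ln 2 + \ln(1/\delta),
% which holds for
m \;=\; O\!\left(\frac{1}{\epsilon}\ln\frac{1}{\delta}
       \;+\; \left(\frac{n^{\alpha}\ln 2}{\epsilon}\right)^{1/(1-\beta)}\right).
```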
22
Weak and Strong Learning
23
PAC Learning model
  • There exists a distribution D over the domain X
  • Examples: ⟨x, c(x)⟩
  • (use c for the target function, rather than c_t)
  • Goal:
  • with high probability (1-δ)
  • find h in H such that
  • error(h,c) = Pr_{x~D}[ h(x) ≠ c(x) ] < ε
  • ε arbitrarily small.

24
Weak Learning Model
  • Goal: error(h,c) < 1/2 - γ
  • The parameter γ is small:
  • a constant, or
  • 1/poly
  • Intuitively: a much easier task
  • Question:
  • assume C is weakly learnable;
  • is C then PAC (strongly) learnable?

25
Majority Algorithm
  • Hypothesis: h_M(x) = MAJ( h_1(x), ..., h_T(x) )
  • size(h_M) ≤ T · size(h_t)
  • Using Occam Razor

26
Majority: Outline
  • Sample m examples
  • Start with a uniform distribution, 1/m per example.
  • Modify the distribution and get h_t
  • The hypothesis is the majority vote
  • Terminate when the sample is classified perfectly

27
Majority Algorithm
  • Use the Hedge algorithm.
  • The experts are associated with the sample points.
  • A point incurs loss when it is classified correctly:
  • l_t(i) = 1 - |h_t(x_i) - c(x_i)|
  • Setting β = 1 - γ
  • h_M(x) = MAJORITY( h_1(x), ..., h_T(x) )
  • Q: How do we set T?  (a sketch of the loop follows this slide)

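A minimal Python sketch of this boosting-by-majority loop (Hedge run over the m sample points with β = 1 - γ); the weak_learner(X, y, p) interface returning a 0/1 classifier is an assumption for illustration, not part of the slides.

```python
import numpy as np

def boost_by_majority(X, y, weak_learner, T, gamma):
    """Boosting via Hedge over examples: a point's weight shrinks by
    beta = 1 - gamma each time it is classified correctly, so hard
    points receive more attention in later rounds."""
    m = len(y)
    beta = 1.0 - gamma
    w = np.ones(m) / m                        # uniform start: 1/m per example
    hypotheses = []
    for _ in range(T):
        p = w / w.sum()                       # distribution given to the weak learner
        h = weak_learner(X, y, p)
        hypotheses.append(h)
        correct = np.array([h(x) == y_i for x, y_i in zip(X, y)])
        w = w * np.where(correct, beta, 1.0)  # loss 1 iff classified correctly

    def h_M(x):                               # majority vote of h_1, ..., h_T
        votes = sum(h_t(x) for h_t in hypotheses)
        return 1 if votes > len(hypotheses) / 2 else 0
    return h_M
```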
28
Majority: Analysis
  • Consider the set of errors S:
  • S = { i : h_M(x_i) ≠ c(x_i) }
  • For every i in S:
  • L_i / T < 1/2   (Proof!)
  • From the Hedge properties.

29
MAJORITY: Correctness
  • Error probability
  • Number of rounds
  • Terminate when the error is less than 1/m

30
AdaBoost: Dynamic Boosting
  • Better bounds on the error
  • No need to know γ
  • Each round uses a different β,
  • chosen as a function of the error

31
AdaBoost: Input
  • A sample of size m: ⟨x_i, c(x_i)⟩
  • A distribution D over the examples
  • We will use D(x_i) = 1/m
  • A weak learning algorithm
  • A constant T (the number of iterations)

32
AdaBoost: Algorithm
  • Initialization: w_1(i) = D(x_i)
  • For t = 1 to T DO
  • p_t(i) = w_t(i) / Σ_j w_t(j)
  • Call the Weak Learner with p_t
  • Receive h_t
  • Compute the error ε_t of h_t on p_t
  • Set β_t = ε_t / (1 - ε_t)
  • w_{t+1}(i) = w_t(i) (β_t)^e, where e = 1 - |h_t(x_i) - c(x_i)|
  • Output: h_A(x) = 1 iff Σ_t ln(1/β_t) h_t(x) ≥ (1/2) Σ_t ln(1/β_t)   (a sketch follows this slide)

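A minimal Python sketch of the loop above; the weak_learner(X, y, p) interface and the early-exit guards are assumptions for illustration, not part of the slides.

```python
import numpy as np

def adaboost(X, y, weak_learner, T):
    """AdaBoost as on the slide: maintain weights over the m examples,
    set beta_t = eps_t / (1 - eps_t), and down-weight correctly
    classified points.  Labels are 0/1; weak_learner(X, y, p) must
    return a 0/1 classifier h(x) trained against the distribution p."""
    y = np.asarray(y)
    m = len(y)
    w = np.ones(m) / m                       # w_1(i) = D(x_i) = 1/m
    hypotheses, log_inv_betas = [], []
    for _ in range(T):
        p = w / w.sum()                      # p_t(i) = w_t(i) / sum_j w_t(j)
        h = weak_learner(X, y, p)
        preds = np.array([h(x) for x in X])
        eps = float(p[preds != y].sum())     # error of h_t under p_t
        if eps >= 0.5:                       # weak learner failed to beat 1/2
            break
        eps = max(eps, 1e-12)                # avoid division by zero when eps = 0
        beta = eps / (1.0 - eps)             # beta_t = eps_t / (1 - eps_t)
        hypotheses.append(h)
        log_inv_betas.append(np.log(1.0 / beta))
        e = 1.0 - np.abs(preds - y)          # e_i = 1 - |h_t(x_i) - c(x_i)|
        w = w * beta ** e                    # correctly classified -> multiply by beta_t

    def h_A(x):                              # weighted majority vote of the h_t's
        score = sum(a * h_t(x) for a, h_t in zip(log_inv_betas, hypotheses))
        return 1 if score >= 0.5 * sum(log_inv_betas) else 0
    return h_A
```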
33
AdaBoost: Analysis
  • Theorem:
  • Given ε_1, ..., ε_T,
  • the error ε of h_A is bounded by ε ≤ 2^T Π_t √( ε_t (1 - ε_t) )

34
AdaBoost: Proof
  • Let l_t(i) = 1 - |h_t(x_i) - c(x_i)|
  • By definition: p_t · l_t = 1 - ε_t
  • Upper bounding the sum of weights
  • From the Hedge Analysis.
  • An error on x_i occurs only if w_{T+1}(i) ≥ w_1(i) ( Π_t β_t )^{1/2}

35
AdaBoost: Analysis (cont.)
  • Bounding the weight of a point
  • Bounding the sum of weights
  • Final bound as a function of β_t
  • Optimizing β_t:
  • β_t = ε_t / (1 - ε_t)

36
AdaBoost: Fixed bias
  • Assume ε_t = 1/2 - γ.
  • We bound (see the reconstruction below):

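The bound referred to above follows, in the standard analysis, by plugging ε_t = 1/2 - γ into the product bound of the theorem; the LaTeX below is a reconstruction of that calculation, not the original slide formula.

```latex
% Fixed bias: every round has eps_t = 1/2 - gamma.
\epsilon \;\le\; 2^{T}\prod_{t=1}^{T}\sqrt{\epsilon_t(1-\epsilon_t)}
  \;=\; \prod_{t=1}^{T} 2\sqrt{\left(\tfrac12-\gamma\right)\left(\tfrac12+\gamma\right)}
  \;=\; \left(1-4\gamma^{2}\right)^{T/2}
  \;\le\; e^{-2\gamma^{2}T},
% so T = O(\gamma^{-2}\ln(1/\epsilon)) rounds drive the training error below any
% target \epsilon (for example below 1/m, as required by the Majority analysis).
```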
37
Learning OR with few attributes
  • Target function: an OR of k literals
  • Goal: learn in time
  • polynomial in k and log n,
  • with ε and δ constant
  • ELIM makes slow progress:
  • it may disqualify only one literal per round
  • and may remain with O(n) literals

38
Set Cover - Definition
  • Input: S_1, ..., S_t with S_i ⊆ U
  • Output: S_{i_1}, ..., S_{i_k} with ∪_j S_{i_j} = U
  • Question: Are there k sets that cover U?
  • NP-complete

39
Set Cover: Greedy algorithm
  • j = 0;  U_0 = U;  C = ∅
  • While U_j ≠ ∅:
  • let S_i be arg max_i |S_i ∩ U_j|
  • add S_i to C
  • let U_{j+1} = U_j \ S_i
  • j = j + 1   (a Python sketch follows this slide)

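A minimal Python sketch of the greedy cover above; the list-of-Python-sets interface is an illustrative assumption. For example, greedy_set_cover([{1, 2}, {2, 3}, {3, 4}], {1, 2, 3, 4}) returns [0, 2], a cover of size 2.

```python
def greedy_set_cover(sets, universe):
    """Greedy set cover: repeatedly pick the set covering the most
    still-uncovered elements until the universe is covered.
    Returns the indices of the chosen sets."""
    uncovered = set(universe)                 # U_0 = U
    chosen = []                               # C
    while uncovered:                          # while U_j is non-empty
        # arg max_i |S_i ∩ U_j|
        best = max(range(len(sets)), key=lambda i: len(sets[i] & uncovered))
        if not sets[best] & uncovered:        # nothing new can be covered: no cover exists
            raise ValueError("the given sets do not cover the universe")
        chosen.append(best)
        uncovered -= sets[best]               # U_{j+1} = U_j \ S_i
    return chosen
```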
40
Set Cover: Greedy Analysis
  • At termination, C is a cover.
  • Assume there is a cover C* of size k.
  • C* is a cover for every U_j
  • Some S in C* covers at least |U_j| / k elements of U_j
  • Therefore |U_{j+1}| ≤ |U_j| - |U_j| / k = |U_j| (1 - 1/k)
  • Solving the recursion: |U_j| ≤ |U| (1 - 1/k)^j ≤ |U| e^{-j/k}
  • Number of sets: j ≤ k ln |U|

41
Building an Occam algorithm
  • Given a sample S of size m:
  • run ELIM on S
  • let LIT be the set of surviving literals
  • There exist k literals in LIT that classify
    all of S correctly
  • Negative examples:
  • any subset of LIT classifies them correctly

42
Building an Occam algorithm
  • Positive examples:
  • search for a small subset of LIT
  • which classifies S correctly
  • For a literal z, build T_z = { x : z satisfies x }
  • There are k sets T_z that cover the positive examples of S
  • Find at most k ln m sets that cover them (greedy set cover)
  • Output h: the OR of these k ln m literals
  • size(h) ≤ k ln m · log(2n)
  • Sample size: m = O( k log n · log(k log n) )   (a combined sketch follows)
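Putting the last two slides together, here is a minimal Python sketch of the whole reduction (ELIM, then greedy set cover over the surviving literals); the boolean-tuple encoding of examples and the helper names are assumptions for illustration, not part of the slides.

```python
def literal_value(x, lit):
    """lit = (index, polarity): the literal x_index if polarity, else its negation."""
    idx, positive = lit
    return bool(x[idx]) if positive else not bool(x[idx])

def learn_small_or(samples):
    """ELIM followed by greedy set cover, as in the slides.
    `samples` is a list of (x, label) pairs with x a tuple of 0/1 values
    and label in {0, 1}; assumes the sample is consistent with some OR of literals."""
    n = len(samples[0][0])
    literals = [(i, pol) for i in range(n) for pol in (True, False)]

    # ELIM: a literal survives only if no negative example satisfies it, so any
    # OR over surviving literals classifies every negative example correctly.
    lit_set = [z for z in literals
               if not any(literal_value(x, z) for x, label in samples if label == 0)]

    # For each surviving literal z, T_z = {positive examples satisfied by z}.
    positives = [x for x, label in samples if label == 1]
    covers = [{j for j, x in enumerate(positives) if literal_value(x, z)}
              for z in lit_set]

    # Greedy set cover over the positive examples: at most k ln m literals
    # are chosen if the target OR has k literals.
    uncovered, chosen = set(range(len(positives))), []
    while uncovered:
        best = max(range(len(covers)), key=lambda i: len(covers[i] & uncovered))
        if not covers[best] & uncovered:      # cannot happen on a realizable sample
            break
        chosen.append(lit_set[best])
        uncovered -= covers[best]

    def h(x):                                 # hypothesis: OR of the chosen literals
        return int(any(literal_value(x, z) for z in chosen))
    return h
```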