1
The Informational Complexity of Interactive Machine Learning
  • Steve Hanneke

2
Passive Learning
Data Source → Raw Unlabeled Data → Expert/Oracle → Labeled Examples → Learning Algorithm
Algorithm outputs a classifier.
3
Learning by Interaction: The Big Picture
Data Source → Raw Unlabeled Data → Learning Algorithm ⇄ Expert/Oracle
Repeat: the learner asks a question about the data; the expert answers it.
Algorithm outputs a classifier.
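A minimal Python sketch of this protocol; the Learner/Oracle interfaces here are illustrative assumptions, not part of the talk:

def interactive_learning(unlabeled_data, learner, oracle, budget):
    # Generic interaction loop: the learner poses questions about the
    # data, the expert/oracle answers, until the budget runs out or
    # the learner has no further questions.
    for _ in range(budget):
        question = learner.next_question(unlabeled_data)
        if question is None:
            break
        answer = oracle.answer(question)
        learner.receive_answer(question, answer)
    return learner.output_classifier()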
4
Interactive Learning: A Manifesto
  • Machine learning is a collaborative effort
    between human and machine.
  • In passive learning, there is often a bottleneck
    on the human side (data annotation).
  • Conclusion:
  • Passive algorithms are lazy collaborators.
  • Interactive algorithms may only require the human
    to expend effort providing relevant details,
    minimizing unnecessary redundancy.

5
The Value of Interaction
  • But how much improvement can we expect for any
    particular learning problem?
  • How much interaction is necessary and sufficient
    for learning?

6
Outline
  • Active learning with label requests
  • Disagreement Coefficient (Hanneke, ICML 2007)
  • Teaching Dimension (Hanneke, COLT 2007)
  • Class-conditional queries
  • Arbitrary Sample-based queries

7
Active Learning with Label Requests
8
Active Learning with Label Requests
  • This is clearly an upper bound on the label
    complexity of active learning.
  • Apart from the noise rate, the VC dimension
    summarizes the sample complexity.
  • The algorithm achieving this is ERM, which often
    must be approximated in practice.

9
Outline
  • Active learning with label requests
  • Disagreement Coefficient (Hanneke, ICML 2007)
  • Teaching Dimension (Hanneke, COLT 2007)
  • Class-conditional queries
  • Arbitrary Sample-based queries

10
Reducing Uncertainty
  • "Real knowledge is to know the extent of one's
    ignorance."
  • -- Confucius
  • "As we know, there are known knowns; there are
    things we know we know. We also know there are
    known unknowns; that is to say, we know there
    are some things we do not know. But there are
    also unknown unknowns, the ones we don't know
    we don't know."
  • -- Donald Rumsfeld, Feb. 12, 2002, Department of
    Defense news briefing

11
Reducing Uncertainty
[Figure: a concept h and the ball B(h,r) of concepts around it; the
region of disagreement DIS(B(h,r)) is where concepts in B(h,r) can
disagree with h.]
12
Reducing Uncertainty: The A2 Algorithm
Version Space-based Passive Learning
Repeat:
  • Sample an example x from the distribution D.
  • Request its label y from the Expert.
  • Add the labeled example (x, y) to the data set.
  • Discard concepts we are statistically confident
    are suboptimal.
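A minimal Python sketch of this loop for a finite concept class; the confidence test is an illustrative placeholder for the deviation bounds the real algorithm uses:

import math

def confidence_slack(n):
    # Placeholder deviation bound; the real analysis uses VC-based
    # uniform convergence bounds here.
    return math.sqrt(math.log(max(n, 2)) / n)

def empirical_errors(version_space, data):
    # Number of labeled examples each surviving concept misclassifies.
    return [sum(h(x) != y for x, y in data) for h in version_space]

def version_space_passive(concepts, sample_from_D, expert_label, n_rounds):
    data, version_space = [], list(concepts)
    for _ in range(n_rounds):
        x = sample_from_D()           # sample an example from D
        y = expert_label(x)           # request its label from the Expert
        data.append((x, y))           # add the labeled example
        # Discard concepts we are statistically confident are suboptimal.
        errs = empirical_errors(version_space, data)
        cutoff = min(errs) + len(data) * confidence_slack(len(data))
        version_space = [h for h, e in zip(version_space, errs) if e <= cutoff]
    return version_space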
13
Reducing Uncertainty: The A2 Algorithm
  • A2 (Balcan, Beygelzimer & Langford, 2006)

14
Reducing Uncertainty: The A2 Algorithm
  • A2 [BBL06] (slightly oversimplified
    explanation)

Version Space-based Agnostic Active Learning
Repeat:
  • Sample an example x from the distribution D.
  • If x is not in the region of disagreement,
    ignore it (move on to the next sample).
  • If it is in the region of disagreement, request
    its label y from the Expert.
  • Add the labeled example (x, y) to the data set.
  • Discard concepts we are statistically confident
    are suboptimal (w.r.t. the filtered distribution).
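The same sketch with the disagreement filter added, reusing the helpers from the passive sketch above; the only change is that labels are requested only inside the region of disagreement (a simplified stand-in, matching the slide's "slightly oversimplified" caveat):

def in_region_of_disagreement(x, version_space):
    # x is informative iff two surviving concepts disagree on its label.
    return len({h(x) for h in version_space}) > 1

def version_space_active(concepts, sample_from_D, expert_label, n_rounds):
    data, version_space = [], list(concepts)
    for _ in range(n_rounds):
        x = sample_from_D()
        if not in_region_of_disagreement(x, version_space):
            continue                  # ignore it; move on to the next sample
        y = expert_label(x)           # request its label from the Expert
        data.append((x, y))
        # Discard suboptimal concepts (w.r.t. the filtered distribution).
        errs = empirical_errors(version_space, data)
        cutoff = min(errs) + len(data) * confidence_slack(len(data))
        version_space = [h for h, e in zip(version_space, errs) if e <= cutoff]
    return version_space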
15
Reducing Uncertainty
16
Outline
  • Active learning with label requests
  • Disagreement Coefficient (Hanneke, ICML 2007)
  • Teaching Dimension (Hanneke, COLT 2007)
  • Class-conditional queries
  • Arbitrary Sample-based queries

17
Exact Learning: The Halving Algorithm
  • Suppose we can hand the teacher a concept, and
    ask for an example that contradicts it, if one
    exists. (Equivalence queries)
  • The Halving algorithm (Littlestone, '88):
  • Let hmaj be the majority-vote concept of C.
  • Ask for an example (X, Y) where hmaj is wrong.
  • If no such example exists, return hmaj.
  • Else remove from C any h with h(X) ≠ Y.
  • The Halving algorithm needs at most log₂|C|
    queries to identify any target function in C.
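A direct Python sketch of the algorithm as stated on the slide, assuming binary {0, 1} labels and an equivalence-query oracle that returns a counterexample (X, Y) or None:

def halving(concepts, equivalence_query):
    # Each counterexample refutes the majority vote, hence at least half
    # of the surviving concepts, so at most log2(|C|) queries are needed.
    C = list(concepts)
    while True:
        def hmaj(x, C=tuple(C)):
            # Majority-vote concept of the surviving class.
            return sum(h(x) for h in C) * 2 >= len(C)
        counterexample = equivalence_query(hmaj)
        if counterexample is None:
            return hmaj                   # no contradicting example exists
        X, Y = counterexample
        C = [h for h in C if h(X) == Y]   # remove any h with h(X) != Y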

18
Exact Learning: Membership Queries
  • Suppose, instead of equivalence queries, we can
    request the label of any example in X.
  • We still want to run the Halving algorithm.
  • How many label requests does it take to build an
    equivalence query?
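Hegedüs's answer (next slide) is to query the labels of a specifying set for hmaj: a set of examples on which at most one concept in C agrees with hmaj. A hedged sketch of that reduction, assuming the specifying set is already in hand:

def simulate_equivalence_query(hmaj, specifying_set, membership_query, C):
    # Request the label of each example in a specifying set for hmaj.
    answers = []
    for X in specifying_set:
        Y = membership_query(X)
        if Y != hmaj(X):
            return ('counterexample', (X, Y))   # feed back into Halving
        answers.append((X, Y))
    # All answers agree with hmaj: the unique concept in C consistent
    # with them must be the target, and the search terminates.
    consistent = [h for h in C if all(h(X) == Y for X, Y in answers)]
    return ('identified', consistent[0])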

19
Teaching Dimension (Hegedüs, '95)
20
Teaching Dimension for PAC
Say V is the class of linear separators.
Sample U from D.
A specifying set uniquely identifies (at most)
one labeling in V|U (the labelings of U realizable by V).
As an example, take f to be the colored region shown in the slide's figure.
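A minimal greedy sketch for finding a small specifying set over a finite sample U; the dict-based representation of labelings is an illustrative assumption:

def greedy_specifying_set(U, labelings, f):
    # U: list of sampled points; labelings: the distinct labelings in
    # V|U (each a dict point -> label); f: the labeling to specify.
    # Greedily pick the point that eliminates the most labelings still
    # agreeing with f, until at most one remains.
    survivors = list(labelings)
    S = []
    while len(survivors) > 1:
        x = max(U, key=lambda u: sum(g[u] != f[u] for g in survivors))
        if all(g[x] == f[x] for g in survivors):
            break                     # survivors already match f on all of U
        S.append(x)
        survivors = [g for g in survivors if g[x] == f[x]]
    return S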
21
XTD and Label Complexity
22
XTD and Label Complexity
Conjecture: a bound of this form is valid, even
with no knowledge of the noise rate (i.e., for
agnostic learning).
23
Outline
  • Active learning with label requests
  • Disagreement Coefficient (Hanneke, ICML 2007)
  • Teaching Dimension (Hanneke, COLT 2007)
  • Class-conditional queries
  • Arbitrary Sample-based queries

24
What about other types of queries?
  • Ask the question you want answered.

For example, consider multiclass image
classification. Perhaps learning would be easier
if only the algorithm had an image of a car.
"What's this a picture of?"
Horse / Planet / Person / Car
25
Class-Conditional Queries
  • Ask the question you want answered.

For example, consider multiclass image
classification. Perhaps learning would be easier
if only the algorithm had an image of a car.
"Click on a picture of a car, if there is one."
We can do this for each class individually (except
perhaps the "other" class).
26
Class-Conditional Queries
  • A concrete example: conjunctions (without
    noise); see the sketch below.
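The slide's worked example did not survive extraction; what follows is a hedged reconstruction of the standard noise-free approach for monotone conjunctions, where each positive example returned by a class-conditional query prunes the literals it falsifies:

def learn_monotone_conjunction(pool, positive_query, n_vars):
    # Start from the conjunction of all variables; positive examples
    # can only remove literals (no noise assumed).
    literals = set(range(n_vars))
    def h(x):
        return all(x[i] for i in literals)
    while True:
        # Class-conditional query (assumed interface): return a true
        # positive among the examples h currently labels negative, or
        # None if there is no such example.
        x = positive_query([z for z in pool if not h(z)])
        if x is None:
            return h                  # h is consistent with the whole pool
        # Each query removes at least one literal, so at most n_vars
        # queries are ever made.
        literals = {i for i in literals if x[i]}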

27
Outline
  • Active learning with label requests
  • Disagreement Coefficient (Hanneke, ICML 2007)
  • Teaching Dimension (Hanneke, COLT 2007)
  • Class-conditional queries
  • Arbitrary Sample-based queries

28
Arbitrary Example-based Queries
  • Suppose we let the algorithm ask any question it
    wants about the data labels.

29
Cost Complexity
30
Questions? (Cost-free?)
31
Open Problems for Label Queries
  • The value of having more unlabeled data?
    (Especially for agnostic learning.)
  • Optimal agnostic active learning algorithm?

32
Open Problems
  • Unknown cost functions
  • E.g., maybe examples near the separator are more
    expensive to label.
  • Other types of queries
  • E.g., give me a rule/explanation you used to
    decide the label of this example.

33
Definition of GIC
  • Say the teacher gets drunk, and doesn't
    necessarily answer accurately. But she manages
    to scribble her answers to every question on a
    piece of paper.
  • We have a spy who steals the paper and
    photocopies it.
  • The spy tells us exactly which questions to ask
    so that, at minimum cost, there is at most one
    concept in C consistent with the answers.
  • Define GIC(C, c) as the worst-case cost of this
    game.