Concept Learning
1
  • Concept Learning
  • Machine Learning by T. Mitchell (McGraw-Hill)
  • Chp. 2

2
  • A note about this chapter
  • Concept learning, as presented in this chapter
    should be taken as a toy problem that represents
    some important concepts
  • size of the hypothesis space
  • general-to-specific ordering of hypotheses
  • decision boundaries of some hypothesis classes
  • ...
  • as well as a mind exercise.
  • As a real problem, it is rather simple compared
    to what can be done in ML at the present.

3
  • Much of learning involves acquiring general
    concepts from specific training examples
  • e.g. what is a bird? what is a chair?
  • Concept learning: inferring a boolean-valued function from training examples of its inputs and outputs.

4
A Concept Learning Task: Example
  • Target concept:
  • "Days on which my friend Aldo enjoys his favorite water sport"
  • (you may find it more intuitive to think of the concept "days on which the beach will be crowded")
  • Task:
  • Learn to predict the value of EnjoySport/Crowded for an arbitrary day
  • Training examples for the target concept:
  • 6 nominal-valued (symbolic) attributes:
  • Sky (SUNNY, RAINY, CLOUDY), Temp (WARM, COLD), Humidity (NORMAL, HIGH),
  • Wind (STRONG, WEAK), Water (WARM, COOL), Forecast (SAME, CHANGE)

5
A Learning Problem
[Diagram: an unknown function takes inputs x1, x2, x3, x4 and produces the output y = f(x1, x2, x3, x4)]
Hypothesis Space (H): the set of all possible hypotheses that the learner may consider during learning of the target concept. How many are there?
6
Hypothesis Space: Unrestricted Case
  • |A → B| = |B|^|A|
  • |H4| = |{0,1} × {0,1} × {0,1} × {0,1} → {0,1}| = 2^(2^4) = 65536 possible functions
  • After 7 examples, there are still 2^9 = 512 possibilities (out of 65536) for f (checked in the sketch below)
  • Is learning possible without any assumptions?
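A quick way to check these counts (a minimal Python sketch, not part of the original slides; the variable names are illustrative):

```python
# With 4 boolean inputs there are 2**4 = 16 distinct input vectors,
# so there are 2**16 = 65536 boolean functions over them. Seven labelled
# examples fix 7 of the 16 rows of the truth table, leaving 2**9 = 512
# functions that still agree with everything observed.
rows = 2 ** 4                        # 16 possible input vectors
all_functions = 2 ** rows            # 65536 candidate target functions
still_consistent = 2 ** (rows - 7)   # 512 after seeing 7 examples
print(rows, all_functions, still_consistent)   # 16 65536 512
```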

7
A Concept Learning Task
  • Hypothesis h: a conjunction of constraints on attributes
  • Constraint values:
  • a specific value (e.g., Water = Warm)
  • all values allowed for that attribute (e.g., Water = ?)
  • no value allowed for that attribute (e.g., Water = Ø)
  • Hypothesis representation (illustrated in the sketch below)
  • Example hypothesis for EnjoySport:
  • "Aldo enjoys his favorite sport only on sunny days with strong wind"
  • Sky, AirTemp, Humidity, Wind, Water, Forecast:
  • <Sunny, ?, ?, Strong, ?, ?>
  • The most general hypothesis (every day is a positive example):
  • <?, ?, ?, ?, ?, ?>
  • The most specific possible hypothesis (no day is a positive example):
  • <Ø, Ø, Ø, Ø, Ø, Ø>
  • Is this hypothesis consistent with the training examples?
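As a concrete illustration (my own sketch, not from the book), a conjunctive hypothesis can be stored as a tuple of per-attribute constraints, with '?' accepting any value and 'Ø' accepting none:

```python
def classify(hypothesis, instance):
    """Return 1 if the instance satisfies every attribute constraint, else 0."""
    for constraint, value in zip(hypothesis, instance):
        if constraint == 'Ø' or (constraint != '?' and constraint != value):
            return 0
    return 1

# "Aldo enjoys his sport only on sunny days with strong wind"
h = ('Sunny', '?', '?', 'Strong', '?', '?')
x = ('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same')
print(classify(h, x))   # 1: this day is classified as a positive example
```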

8
A Concept Learning Task (2)
  • The instance space X (the book uses "set of instances"):
  • all possible days, represented by the attributes Sky, AirTemp, ...
  • Target concept c:
  • any boolean-valued function defined over the instance space X
  • c : X → {0, 1} (i.e., if EnjoySport = Yes, then c(x) = 1)
  • Training examples (denoted D): ordered pairs <x, c(x)>
  • Positive example: a member of the target concept, c(x) = 1
  • Negative example: a nonmember of the target concept, c(x) = 0
  • Assumptions: no missing attribute values in X
  • no noise in the values of c (no contradictory labels)
  • Hypothesis space H:
  • often picked by the designer
  • H is the set of boolean-valued functions defined over X,
  • or you may narrow it down to conjunctions of constraints on attributes

9
A Concept Learning Task(3)
  • Although the learning task is to determine a
    hypothesis h identical to c, over the entire set
    of instances X, the only information available
    about c is its value over the training instances
    D.
  • Inductive Learning Hypothesis
  • Any hypothesis found to approximate the target
    function well over a sufficiently large set of
    training examples will also approximate the
    target function well over other unobserved
    examples.

10
Concept Learning As Search
  • Concept learning can be viewed as the task of searching through a large space of hypotheses implicitly defined by the hypothesis representation.
  • The goal of this search is to find the hypothesis that (best) fits the training examples.
  • Sky, AirTemp, Humidity, Wind, Water, Forecast:
  • <Sunny/Rainy/Cloudy, Warm/Cold, Normal/High, Weak/Strong, Warm/Cool, Change/Same>
  • EnjoySport learning task (the counts below are reproduced in the sketch that follows):
  • Size of the instance space X:
  • 3 × 2 × 2 × 2 × 2 × 2 = 96
  • Syntactically distinct hypotheses (including ? and Ø):
  • 5 × 4 × 4 × 4 × 4 × 4 = 5120
  • Semantically distinct hypotheses (a Ø anywhere denotes the empty set of instances and classifies every possible instance as a negative example):
  • 1 + (4 × 3 × 3 × 3 × 3 × 3) = 973
  • Hypothesis spaces are often much larger, and sometimes infinite
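The counts above can be reproduced with a few lines of Python (an illustrative sketch, not part of the slides):

```python
instance_space = 3 * 2 * 2 * 2 * 2 * 2     # 96 distinct days
syntactic = 5 * 4 * 4 * 4 * 4 * 4          # 5120: each attribute also allows '?' and 'Ø'
semantic = 1 + 4 * 3 * 3 * 3 * 3 * 3       # 973: one empty concept plus all Ø-free hypotheses
print(instance_space, syntactic, semantic)  # 96 5120 973
```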

12
Concept Learning As Search (2)
  • How to (efficiently) search the hypothesis space?
  • General-to-Specific Ordering of Hypotheses
  • a very useful structure over the hypothesis space H for any concept learning problem,
  • obtained without explicit enumeration
  • Let hj and hk be boolean-valued functions defined over X.
  • hj is more_general_than_or_equal_to hk  // hj accepts at least the instances hk accepts
  • hj ≥g hk
  • if and only if (∀x ∈ X) [(hk(x) = 1) → (hj(x) = 1)]
  • hj is more_general_than hk
  • hj >g hk
  • if and only if (hj ≥g hk) ∧ ¬(hk ≥g hj)
  • hj is more_specific_than hk when hk is more_general_than hj
  • The relation ≥g is independent of the target concept (the sketch below makes the test concrete for conjunctive hypotheses)
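For the conjunctive representation the ≥g test reduces to an attribute-wise check; the sketch below (my own helper names, assuming the tuple representation used earlier) makes this concrete:

```python
def constraint_subsumes(a, b):
    """True if constraint a admits every value admitted by constraint b."""
    return a == '?' or b == 'Ø' or a == b

def more_general_or_equal(hj, hk):
    """hj >=g hk: every instance accepted by hk is also accepted by hj."""
    return all(constraint_subsumes(a, b) for a, b in zip(hj, hk))

h1 = ('Sunny', '?', '?', 'Strong', '?', '?')
h2 = ('Sunny', '?', '?', '?', '?', '?')
print(more_general_or_equal(h2, h1))   # True: h2 is more general than h1
print(more_general_or_equal(h1, h2))   # False
```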

13
Concept Learning As Search (3)
  • h1 = <Sunny, ?, ?, Strong, ?, ?>
  • h2 = <Sunny, ?, ?, ?, ?, ?>
  • h3 = <Sunny, ?, ?, ?, Cool, ?>
  • h1 versus h2:
  • h2 imposes fewer constraints
  • h2 classifies more instances as positive:
  • any instance classified as positive by h1 is also classified as positive by h2,
  • so h2 is more general than h1
  • How about h3?
  • Partial ordering:
  • the structure imposed by this partial ordering on the hypothesis space H can be exploited to explore H efficiently.

14
Instances, Hypotheses, and the Partial Ordering
Less-Specific-Than
[Figure: instances in X and hypotheses in H, arranged from specific to general]
h1 = <Sunny, ?, ?, Strong, ?, ?>, h2 = <Sunny, ?, ?, ?, ?, ?>, h3 = <Sunny, ?, ?, ?, Cool, ?>
x1 = <Sunny, Warm, High, Strong, Cool, Same>, x2 = <Sunny, Warm, High, Light, Warm, Same>
h2 ≤P h1, h2 ≤P h3
≤P ≡ Less-Specific-Than ≡ More-General-Than
15
  • Idea: exploit the partial order to effectively search the space of hypotheses
  • Find-S:
  • finds the maximally specific hypothesis h
  • Candidate-Elimination:
  • finds the version space containing all consistent hypotheses, efficiently
  • List-Then-Eliminate algorithm:
  • a dummy algorithm that checks all possible hypotheses
  • mentioned along with Candidate-Elimination as a bad alternative

16
Find-S: Finding a Maximally Specific Hypothesis
  • Method:
  • begin with the most specific possible hypothesis in H
  • generalize this hypothesis each time it fails to cover an observed positive training example
  • Algorithm (a runnable sketch follows below):
  • Initialize h to the most specific hypothesis in H
  • For each positive training instance x:
  • For each attribute constraint ai in h:
  • If the constraint ai is NOT satisfied by x:
  • Replace ai in h by the next more general constraint that is satisfied by x
  • Output hypothesis h
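A minimal runnable sketch of Find-S for the conjunctive representation (assumed helper names; '?' = any value, 'Ø' = no value allowed):

```python
def find_s(examples):
    """examples: list of (instance_tuple, label) pairs with label 1 (+) or 0 (-)."""
    n = len(examples[0][0])
    h = ['Ø'] * n                          # start with the most specific hypothesis
    for x, label in examples:
        if label != 1:
            continue                       # Find-S simply ignores negative examples
        for i, value in enumerate(x):
            if h[i] == 'Ø':
                h[i] = value               # first positive example: copy the attribute value
            elif h[i] != value:
                h[i] = '?'                 # conflicting values: generalize to the wildcard
    return tuple(h)

D = [(('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'),   1),
     (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'),   1),
     (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), 0),
     (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), 1)]
print(find_s(D))   # ('Sunny', 'Warm', '?', 'Strong', '?', '?'), as in the trace on the next slide
```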

17
Hypothesis Space Search by Find-S
[Figure: instances in X and hypotheses in H; the Find-S trace is listed below]
h0 = <Ø, Ø, Ø, Ø, Ø, Ø>
x1 = <Sunny, Warm, Normal, Strong, Warm, Same>, +  →  h1 = <Sunny, Warm, Normal, Strong, Warm, Same>
x2 = <Sunny, Warm, High, Strong, Warm, Same>, +  →  h2 = <Sunny, Warm, ?, Strong, Warm, Same>
x3 = <Rainy, Cold, High, Strong, Warm, Change>, -  →  h3 = <Sunny, Warm, ?, Strong, Warm, Same>
x4 = <Sunny, Warm, High, Strong, Cool, Change>, +  →  h4 = <Sunny, Warm, ?, Strong, ?, ?>
18
Hypothesis Space Search by Find-S
Question asked in class: what about a different set of data (red indicating the change from the previous one)? Don't we need an update after x3, which is accepted by the current hypothesis? Find-S says to ignore negative examples. What explains this situation?
h0 = <Ø, Ø, Ø, Ø, Ø, Ø>
x1 = <Sunny, Warm, Normal, Strong, Warm, Same>, +  →  h1 = <Sunny, Warm, Normal, Strong, Warm, Same>
x2 = <Cloudy, Warm, High, Strong, Warm, Same>, +  →  h2 = <?, Warm, ?, Strong, Warm, Same>
x3 = <Rainy, Warm, High, Strong, Warm, Same>, -  →  h3 = <?, Warm, ?, Strong, Warm, Same>
19
Find-S (2)
  • The Find-S algorithm simply ignores every negative example!
  • The current hypothesis h is already consistent with each new negative example,
  • so no revision is needed.
  • This rests on the assumptions that:
  • H contains a hypothesis describing the true concept c
  • the data contain no errors
  • Formal argument that h does not need revision in response to a negative example:
  • let h be the current hypothesis and c the target concept, assumed to be in H
  • c is more_general_than_or_equal_to h (the current hypothesis),
  • since c covers all of the positive examples while h is the most specific hypothesis covering them
  • c never covers a negative instance,
  • since c is the target concept and the data are noise-free
  • hence neither does h,
  • by the definition of more_general_than
  • Alternative: a proof by contradiction; assume h accepts a negative instance

20
Find-S (3): Shortcomings
  • The algorithm finds one hypothesis, but it cannot tell whether it has found the only hypothesis consistent with the data or whether there are more such hypotheses
  • Why prefer the most specific hypothesis?
  • there may be multiple hypotheses consistent with the training examples;
  • Find-S will find the most specific one
  • Are the training examples consistent?
  • in practice the training examples may contain errors or noise;
  • such inconsistent sets of training examples can mislead Find-S
  • What if there are several maximally specific consistent hypotheses?
  • there may be several maximally specific hypotheses consistent with the data,
  • or no maximally specific consistent hypothesis at all

21
Definitions
  • Consistent:
  • A hypothesis h is consistent with a set of training examples D
  • if and only if h(x) = c(x) for each example <x, c(x)> in D:
  • Consistent(h, D) ≡ ∀<x, c(x)> ∈ D : h(x) = c(x)  (transcribed into code below)
  • Related definitions:
  • x satisfies the constraints of hypothesis h when h(x) = 1
  • h covers a positive training example x if it correctly classifies x as positive
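The definition translates almost literally into code (an illustrative sketch; the classifier is passed in so the snippet stays representation-independent):

```python
def consistent(h, D, classify):
    """Consistent(h, D): h(x) == c(x) for every example <x, c(x)> in D."""
    return all(classify(h, x) == c_x for x, c_x in D)
```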

22
Version Space
  • Version space:
  • The version space, denoted VS_H,D, with respect to hypothesis space H and training examples D, is the subset of hypotheses from H that are consistent with the training examples in D:
  • VS_H,D ≡ {h ∈ H | Consistent(h, D)}

23
List-Then-Eliminate
  • The List-Then-Eliminate algorithm (a brute-force sketch follows below):
  • VersionSpace ← a list containing every hypothesis in H
  • For each training example <x, c(x)>:
  • remove from VersionSpace any hypothesis h for which h(x) ≠ c(x)
  • Output the list of hypotheses in VersionSpace
  • Guaranteed to output all hypotheses consistent with the training data
  • Can be applied whenever the hypothesis space H is finite
  • but it requires exhaustively enumerating all hypotheses in H,
  • which is not realistic
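A brute-force List-Then-Eliminate sketch for the EnjoySport space, feasible only because this toy space has just 5120 syntactic hypotheses (the helper names and data layout are my own, not Mitchell's code):

```python
from itertools import product

DOMAINS = [['Sunny', 'Rainy', 'Cloudy'], ['Warm', 'Cold'], ['Normal', 'High'],
           ['Strong', 'Weak'], ['Warm', 'Cool'], ['Same', 'Change']]

def classify(h, x):
    # a conjunctive hypothesis accepts x iff every constraint is '?' or matches; 'Ø' never matches
    return int(all(c == '?' or c == v for c, v in zip(h, x)))

def consistent(h, D):
    return all(classify(h, x) == label for x, label in D)

def list_then_eliminate(D):
    candidates = product(*[dom + ['?', 'Ø'] for dom in DOMAINS])   # every syntactic hypothesis
    return [h for h in candidates if consistent(h, D)]             # keep only the consistent ones

D = [(('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'),   1),
     (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'),   1),
     (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), 0),
     (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), 1)]
print(len(list_then_eliminate(D)))   # 6 hypotheses survive for these four examples
```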

24
Candidate-Elimination
  • Candidate-Elimination algorithm outputs the set
    of all hypotheses consistent with the training
    examples
  • Without enumerating all hypotheses

25
Version Space
This version space, containing all 6 consistent hypotheses, can be compactly represented by its most specific (S) and most general (G) sets. How can every h in the VS be generated, given G and S?
x1 = <Sunny, Warm, Normal, Strong, Warm, Same>, +
x2 = <Sunny, Warm, High, Strong, Warm, Same>, +
x3 = <Rainy, Cold, High, Strong, Warm, Change>, -
x4 = <Sunny, Warm, High, Strong, Cool, Change>, +
26
Version Space and Candidate-Elimination (3)
  • The specific boundary S:
  • with respect to hypothesis space H and training data D,
  • S is the set of minimally general (i.e., maximally specific) members of H consistent with D:
  • S ≡ {s ∈ H | Consistent(s, D) ∧ ¬(∃s' ∈ H)[(s >g s') ∧ Consistent(s', D)]}
  • most specific ≡ maximal elements of VS_H,D
  • ≡ a set of sufficient conditions
  • The general boundary G:
  • with respect to hypothesis space H and training data D, G is the set of maximally general members of H consistent with D:
  • G ≡ {g ∈ H | Consistent(g, D) ∧ ¬(∃g' ∈ H)[(g' >g g) ∧ Consistent(g', D)]}
  • most general ≡ minimal elements of VS_H,D
  • ≡ a set of necessary conditions

27
Version Space and Candidate-Elimination (4)
  • The version space is the set of hypotheses contained
  • in G,
  • plus those contained in S,
  • plus those that lie between G and S in the partially ordered hypothesis space.
  • Version space representation theorem:
  • Let X be an arbitrary set of instances and let H be a set of boolean-valued hypotheses defined over X.
  • Let c : X → {0, 1} be an arbitrary target concept defined over X,
  • and let D be an arbitrary set of training examples <x, c(x)>.
  • For all X, H, c, and D such that S and G are well defined:
  • VS_H,D = {h ∈ H | (∃s ∈ S)(∃g ∈ G)(g ≥g h ≥g s)}
  • Proof: show that every h in the VS (as defined before) satisfies the right-hand-side condition, and that every h satisfying the right-hand side is in the VS (exercise 2.6 in Mitchell).

28
Representing Version Spaces: Another Take on the Same Definitions
  • Hypothesis space:
  • a finite semilattice under the partial ordering Less-Specific-Than (≤P);
  • every pair of hypotheses has a greatest lower bound (GLB)
  • VS_H,D ≡ the consistent poset (partially ordered subset of H)
  • Definition (general boundary):
  • the general boundary G of version space VS_H,D is the set of its most general members
  • most general ≡ minimal elements of VS_H,D ≡ a set of necessary conditions
  • Definition (specific boundary):
  • the specific boundary S of version space VS_H,D is the set of its most specific members
  • most specific ≡ maximal elements of VS_H,D ≡ a set of sufficient conditions
  • Version space:
  • every member of the version space lies between S and G:
  • VS_H,D ≡ {h ∈ H | ∃s ∈ S . ∃g ∈ G . g ≤P h ≤P s}, where ≤P ≡ Less-Specific-Than

29
Version Space and Candidate-Elimination (4)
  • The Candidate-Elimination algorithm works on the same principle as List-Then-Eliminate, but uses a more compact representation of the version space:
  • the version space is represented by its most general and least general (most specific) members.
  • Candidate-Elimination learning algorithm:
  • Initialize G to the set of maximally general hypotheses in H
  • Initialize S to the set of maximally specific hypotheses in H
  • G0 ← {<?, ?, ?, ?, ?, ?>}
  • S0 ← {<Ø, Ø, Ø, Ø, Ø, Ø>}
  • ...

30
Candidate-Elimination(5)
  • Candidate-Elimination learning algorithm (cont.; a runnable sketch follows this list):
  • For each training example d, do:
  • If d is a negative example
  • //Specialize G...
  • For each hypothesis g in G that is not consistent
    with d
  • Remove g from G
  • Add to G all minimal specializations h of g such
    that
  • h is consistent with d and some member of S is
    more specific than h
  • Remove from G any hypothesis that is less general
    than another h in G
  • Remove from S any hypothesis inconsistent with d
  • If d is a positive example
  • //Generalize S...
  • For each hypothesis s in S that is not consistent
    with d
  • Remove s from S
  • Add to S all minimal generalizations h of s such
    that
  • h is consistent with d and some member of G is
    more general than h
  • Remove from S any hypothesis that is more general
    than another h in S
  • Remove from G any hypothesis inconsistent with d
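A compact, runnable sketch of Candidate-Elimination for the conjunctive EnjoySport representation (the helper names, the DOMAINS table, and the singleton-S simplification are my own; this illustrates the boundary updates listed above rather than reproducing Mitchell's code):

```python
DOMAINS = [['Sunny', 'Rainy', 'Cloudy'], ['Warm', 'Cold'], ['Normal', 'High'],
           ['Strong', 'Weak'], ['Warm', 'Cool'], ['Same', 'Change']]

def covers(h, x):
    return all(c == '?' or c == v for c, v in zip(h, x))        # 'Ø' covers nothing

def more_general_or_equal(hj, hk):
    return all(a == '?' or b == 'Ø' or a == b for a, b in zip(hj, hk))

def min_generalization(s, x):
    """Minimal generalization of s that covers the positive instance x."""
    return tuple(v if c == 'Ø' else (c if c == v else '?') for c, v in zip(s, x))

def min_specializations(g, x):
    """Minimal specializations of g that exclude the negative instance x."""
    out = []
    for i, c in enumerate(g):
        if c == '?':
            for value in DOMAINS[i]:
                if value != x[i]:
                    out.append(g[:i] + (value,) + g[i + 1:])
    return out

def candidate_elimination(examples):
    n = len(examples[0][0])
    S, G = [('Ø',) * n], [('?',) * n]
    for x, label in examples:
        if label == 1:                                           # positive example: generalize S
            G = [g for g in G if covers(g, x)]                   # drop inconsistent members of G
            S = [min_generalization(s, x) for s in S]
            S = [s for s in S if any(more_general_or_equal(g, s) for g in G)]
            # (in this conjunctive space S stays a singleton, so the step that removes
            #  members of S more general than another member of S is omitted)
        else:                                                    # negative example: specialize G
            S = [s for s in S if not covers(s, x)]               # drop inconsistent members of S
            new_G = []
            for g in G:
                if not covers(g, x):
                    new_G.append(g)                              # g already excludes x
                    continue
                for h in min_specializations(g, x):              # replace g by its specializations
                    if any(more_general_or_equal(h, s) for s in S):
                        new_G.append(h)
            new_G = list(dict.fromkeys(new_G))                   # drop duplicates
            G = [g for g in new_G                                # drop non-maximal members of G
                 if not any(g2 != g and more_general_or_equal(g2, g) for g2 in new_G)]
    return S, G
```

Run on the four examples d1 to d4 of the trace that follows, the sketch returns S = [('Sunny', 'Warm', '?', 'Strong', '?', '?')] and G = [('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')], matching the S4 and G4 boundary sets.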

31
Candidate-Elimination (6)
  • The Candidate-Elimination algorithm works by
  • computing minimal generalizations and specializations, and
  • identifying non-minimal and non-maximal hypotheses
  • The algorithm can be applied to any concept learning task and hypothesis space for which these operations are well defined

32
Candidate-Elimination: Example Trace
d1 = <Sunny, Warm, Normal, Strong, Warm, Same>, Yes
d2 = <Sunny, Warm, High, Strong, Warm, Same>, Yes
d3 = <Rainy, Cold, High, Strong, Warm, Change>, No
d4 = <Sunny, Warm, High, Strong, Cool, Change>, Yes
G4: the last element of G3 is inconsistent with d4 and must be removed.
G3: what about <?, ?, Normal, ?, ?, ?> or <Cloudy, ?, ?, ?, ?, ?>? They are inconsistent with previous positive examples, which S2 summarizes.
33
  • // S summarizes all past positive examples:
  • any hypothesis h more general than S is guaranteed to be consistent with all the previous positive examples
  • let h be a generalization of some s in S
  • h covers more instances than s, since it is more general
  • in particular, h covers all instances covered by s
  • since s is consistent with all positive examples, so is h
  • // G summarizes all past negative examples:
  • any hypothesis h more specific than G is guaranteed to be consistent with all the previous negative examples
  • let h be a specialization of some g in G
  • h covers fewer instances than g
  • in particular, h rejects all negative examples rejected by g
  • since g is consistent with all negative examples, so is h
  • The learned version space is independent of the order in which the training examples are presented
  • after all, the VS contains exactly the consistent hypotheses
  • The S and G boundaries move closer together as more examples arrive, up to convergence

34
Remarks on Version Spaces and Candidate-Elimination
  • The version space converges to the correct hypothesis provided that
  • there are no errors in the training examples
  • What if the data contain errors?
  • the correct target concept is removed from the VS, since every h inconsistent with the training data is removed
  • this would be detected as the set of hypotheses becoming empty
  • there is some hypothesis in H that correctly describes the target concept
  • What if the target concept is not in H?
  • e.g., the target concept is a disjunction of feature attributes while the hypothesis space supports only conjunctive descriptions
  • The target concept is exactly learned when the S and G boundary sets converge to a single, identical hypothesis

35
What Next Training Example?
36
What Next Training Example?
  • What training example should the learner request next?
  • e.g., <Sunny, Warm, Normal, Light, Warm, Same>
  • <Sunny, Warm, Normal, Strong, Cool, Change>
  • <Rainy, Cold, Normal, Light, Warm, Same>
  • The optimal query strategy for a concept learner is to generate instances that satisfy exactly half the hypotheses in the current version space
  • If the size of the VS is reduced by half with each new example, the correct target concept can be found with only ⌈log2 |VS|⌉ experiments (see the small check below)
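For instance (illustrative arithmetic only), with the 6-hypothesis version space of the running example an ideal halving strategy needs ⌈log2 6⌉ queries:

```python
import math
print(math.ceil(math.log2(6)))   # 3 perfectly chosen experiments suffice in the ideal case
```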

37
Summary Points: Terminology
  • Supervised Learning
  • Concept - a function from observations to categories (so far, boolean-valued, +/-)
  • Target (function) - the true function f
  • Hypothesis - a proposed function h believed to be similar to f
  • Hypothesis space - the space of all hypotheses that can be generated by the learning system
  • Example - a tuple of the form <x, f(x)>
  • Instance space (aka example space) - the space of all possible examples
  • Classifier - a discrete-valued function whose range is a set of class labels
  • The Version Space Algorithm
  • Algorithms: Find-S, List-Then-Eliminate, Candidate-Elimination
  • Consistent hypothesis - one that correctly predicts the observed examples
  • Version space - the space of all currently consistent (or satisfiable) hypotheses
  • Inductive Learning
  • Inductive generalization - the process of generating hypotheses that describe cases not yet observed
  • The inductive learning hypothesis

38
Summary Points
  • Concept learning as search through H
  • hypothesis space H as a state space
  • learning: finding the correct hypothesis
  • General-to-specific ordering over H
  • a partially ordered set under the Less-Specific-Than (More-General-Than) relation
  • upper and lower bounds in H
  • Version space and the Candidate-Elimination algorithm
  • the S and G boundaries characterize the learner's uncertainty
  • the version space can be used to make predictions over unseen cases
  • The learner can generate useful queries
  • Next lecture: When and Why Are Inductive Leaps Possible?

39
Remarks on Version Spaces and Candidate-Elimination
  • How can partially learned concepts be used?
  • no additional training examples, multiple remaining hypotheses
  • Example: Table 2.6 (p. 39)
  • Instance A satisfies every member of S:
  • no need to look further, it will satisfy every h in the VS
  • classify it as a positive example
  • Instance B satisfies none of the members of G:
  • no need to look further, it will not satisfy any h in the VS
  • classify it as a negative example
  • Instance C: half of the VS classifies it as positive and half as negative
  • the most ambiguous case, and the most informative new query for refining the version space
  • Instance D: classified as positive by two hypotheses in the VS and as negative by the others
  • output the majority vote, with a confidence rating

40
  • Inductive Bias
  • Mitchell-Chp. 2

41
What Justifies This Inductive Leap?
  • Example: inductive generalization
  • Positive example: <Sunny, Warm, Normal, Strong, Cool, Change>, Yes
  • Positive example: <Sunny, Warm, Normal, Light, Warm, Same>, Yes
  • Induced S: <Sunny, Warm, Normal, ?, ?, ?>
  • Why believe we can classify the unseen?
  • e.g., <Sunny, Warm, Normal, Strong, Warm, Same>

42
Inductive Bias
  • A biased hypothesis space
  • EnjoySport example:
  • restriction: only conjunctions of attribute values
  • no representation for a disjunctive target concept such as
  • Sky = Sunny or Wind = Weak
  • Potential problem:
  • we biased the learner (inductive bias) to consider only conjunctive hypotheses,
  • but the concept may require a more expressive hypothesis space

43
Unbiased Learner
  • An unbiased learner
  • Obvious solution: provide a hypothesis space capable of representing every teachable concept, i.e., every possible subset of the instance space X
  • The set of all subsets of a set X is called the power set of X
  • EnjoySport example:
  • |instance space| = 96
  • |power set of X| = 2^96 = 79228162514264337593543950336 (verified in the one-liner below)
  • |conjunctive hypothesis space| = 973
  • A very biased hypothesis space indeed!
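The size of the power set quoted above is easy to check (an illustrative one-liner):

```python
print(2 ** 96)   # 79228162514264337593543950336 possible target concepts over 96 instances
```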

44
Need for Inductive Bias
  • An unbiased learner:
  • reformulate the EnjoySport learning task in an unbiased way
  • by defining a new hypothesis space H that can represent every subset of X:
  • allow arbitrary disjunctions, conjunctions, and negations
  • Example: Sky = Sunny or Wind = Weak
  • <Sunny, ?, ?, ?, ?, ?> ∨ <?, ?, ?, Weak, ?, ?>
  • New problem: the learner is completely unable to generalize beyond the observed examples! Intuition?
  • What are the S and G boundaries?
  • the S boundary of the VS will contain just the disjunction of the positive examples:
  • three positive examples (x1, x2, x3): S = {(x1 ∨ x2 ∨ x3)}
  • the G boundary of the VS will consist of the hypothesis that rules out only the observed negative examples:
  • two negative examples (x4, x5): G = {¬(x4 ∨ x5)}
  • In order to converge to a single final concept,
  • we would have to present every single instance in X as a training example.

45
Need for Inductive Bias
  • How about taking a vote among the consistent hypotheses in the VS?
  • For unseen instances, taking a vote is futile:
  • half of the hypotheses in the VS will vote positive,
  • half of the hypotheses in the VS will vote negative
  • Assume a previously unseen instance x:
  • for any hypothesis h that covers x as positive, there will be another hypothesis h' that is identical to h except for its classification of x
  • if h is in the VS, so will be h', because it agrees with h on all the observed training examples
  • The problem is not specific to the Candidate-Elimination algorithm

46
Need For Inductive Bias
  • Fundamental property of inductive inference
  • A learner that makes no a priori assumptions
    regarding the identity of the target concept has
    no rational basis for classifying any unseen
    instances

47
Inductive Bias: Definition
  • Consider:
  • a concept learning algorithm L, instance space X, target concept c
  • training examples Dc = {<x, c(x)>}
  • let L(xi, Dc) denote the classification of xi by L after training on Dc
  • The label L(xi, Dc) need not be correct.
  • What assumptions should we make so that it follows deductively?
  • Definition:
  • the inductive bias of L is any minimal set of assertions B such that
  • for any target concept c and corresponding training examples Dc, the assumptions in B justify its inductive inferences as deductive inferences:
  • (∀xi ∈ X) [(B ∧ Dc ∧ xi) ⊢ L(xi, Dc)]
  • where y ⊢ z means that z follows deductively from y

48
Inductive Bias: Candidate-Elimination
  • Inductive bias of the Candidate-Elimination algorithm, assuming that CE classifies a new instance x only if the vote is unanimous (1):
  • Inductive bias: the target concept c is contained in the given hypothesis space H
  • if c is in H, it is also in the VS
  • if all h in the VS vote unanimously, it must be that c(xi) = L(xi, Dc)
  • (1) Note that the hypotheses in the VS may classify a new instance x differently if we do not assume unanimous voting

49
Three Learners with Different Biases
  • Rote learner:
  • stores each observed training example in memory
  • classifies x if and only if it matches a previously observed example
  • weakest bias: no bias (the classification follows deductively from the training examples)
  • Candidate-Elimination algorithm:
  • stores the extremal generalizations and specializations
  • classifies x if and only if all members of the VS agree on the classification
  • stronger bias: the target concept is contained in the given hypothesis space H
  • Find-S:
  • stores the most specific hypothesis
  • classifies all subsequent instances
  • even stronger bias: the target concept is contained in the given hypothesis space H,
  • and all instances are negative unless the opposite is entailed by its maximally specific hypothesis

50
  • It is useful to characterize different learning approaches by the inductive bias they employ
  • More strongly biased methods make a larger inductive jump/leap:
  • they classify (i.e., do not reject) a greater proportion of unseen instances
  • whether the classifications are correct is another issue!
  • Types of biases:
  • categorical: assumptions that completely rule out certain concepts
  • preferential biases
  • implicit/unchangeable by the learner, or not

51
Summary
  • Concept learning can be cast as a problem of searching through a large predefined space of potential hypotheses
  • The general-to-specific partial ordering of hypotheses provides a useful structure for organizing the search through the hypothesis space
  • Find-S algorithm and Candidate-Elimination algorithm (for noise-free data)
  • The S and G sets delimit the entire set of hypotheses consistent with the data
  • Inductive learning algorithms are able to classify unseen examples only because of their inductive bias
  • Using every possible subset of the instances (the power set of the instances) as the hypothesis space
  • removes any inductive bias from the Candidate-Elimination algorithm,
  • but also removes the ability to classify any instance beyond the observed training examples:
  • an unbiased learner cannot make inductive leaps to classify unseen examples.