Concept Learning
1
  • Concept Learning
  • Machine Learning by T. Mitchell (McGraw-Hill)
  • Chp. 2

2
  • A note about this chapter
  • Concept learning, as presented in this chapter
    should be taken as a toy problem that represents
    some important concepts
  • size of the hypothesis space
  • general-to-specific ordering of hypotheses
  • decision boundaries of some hypothesis classes
  • ...
  • as well as a mind exercise.
  • As a real problem, it is rather simple compared
    to what can be done in ML at the present.

3
  • Much of learning involves acquiring general
    concepts from specific training examples
  • e.g. what is a bird? what is a chair?
  • Concept learning: inferring a boolean-valued function from training examples of its inputs and outputs.

4
A Concept Learning Task: Example
  • Target concept:
  • "Days on which my friend Aldo enjoys his favorite water sport"
  • (you may find it more intuitive to think of the concept "days on which the beach will be crowded")
  • Task:
  • Learn to predict the value of EnjoySport/Crowded for an arbitrary day
  • Training examples for the target concept:
  • 6 nominal-valued (symbolic) attributes:
  • Sky (SUNNY, RAINY, CLOUDY), Temp (WARM, COLD), Humidity (NORMAL, HIGH),
  • Wind (STRONG, WEAK), Water (WARM, COOL), Forecast (SAME, CHANGE)

5
A Learning Problem
[Diagram: an unknown function takes inputs x1, x2, x3, x4 and produces the output y = f(x1, x2, x3, x4)]
Hypothesis Space (H): the set of all possible hypotheses that the learner may consider during learning of the target concept. How many are there?
6
Hypothesis Space: Unrestricted Case
  • |A → B| = |B|^|A|
  • |H4| = |{0,1} × {0,1} × {0,1} × {0,1} → {0,1}| = 2^(2^4) = 65536 possible functions
  • After 7 examples, there are still 2^9 = 512 possibilities (out of 65536) for f (checked in the sketch below)
  • Is learning possible without any assumptions?
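A quick way to check these counts (a minimal Python sketch, not part of the original slides; the variable names are illustrative):

```python
# With 4 boolean inputs there are 2**4 = 16 distinct input vectors,
# so there are 2**16 = 65536 boolean functions over them. Seven labelled
# examples fix 7 of the 16 rows of the truth table, leaving 2**9 = 512
# functions that still agree with everything observed.
rows = 2 ** 4                        # 16 possible input vectors
all_functions = 2 ** rows            # 65536 candidate target functions
still_consistent = 2 ** (rows - 7)   # 512 after seeing 7 examples
print(rows, all_functions, still_consistent)   # 16 65536 512
```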

7
A Concept Learning Task
  • Hypothesis h: a conjunction of constraints on attributes
  • Constraint values:
  • a specific value (e.g., Water = Warm)
  • all values allowed for that attribute (e.g., Water = ?)
  • no value allowed for that attribute (e.g., Water = Ø)
  • Hypothesis representation (illustrated in the sketch below)
  • Example hypothesis for EnjoySport:
  • "Aldo enjoys his favorite sport only on sunny days with strong wind"
  • Sky, AirTemp, Humidity, Wind, Water, Forecast:
  • <Sunny, ?, ?, Strong, ?, ?>
  • The most general hypothesis (every day is a positive example):
  • <?, ?, ?, ?, ?, ?>
  • The most specific possible hypothesis (no day is a positive example):
  • <Ø, Ø, Ø, Ø, Ø, Ø>
  • Is this hypothesis consistent with the training examples?
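As a concrete illustration (my own sketch, not from the book), a conjunctive hypothesis can be stored as a tuple of per-attribute constraints, with '?' accepting any value and 'Ø' accepting none:

```python
def classify(hypothesis, instance):
    """Return 1 if the instance satisfies every attribute constraint, else 0."""
    for constraint, value in zip(hypothesis, instance):
        if constraint == 'Ø' or (constraint != '?' and constraint != value):
            return 0
    return 1

# "Aldo enjoys his sport only on sunny days with strong wind"
h = ('Sunny', '?', '?', 'Strong', '?', '?')
x = ('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same')
print(classify(h, x))   # 1: this day is classified as a positive example
```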

8
A Concept Learning Task (2)
  • The instance space X (the book uses "set of instances"):
  • all possible days, represented by the attributes Sky, AirTemp, ...
  • Target concept c:
  • any boolean-valued function defined over the instance space X
  • c : X → {0, 1} (i.e., if EnjoySport = Yes, then c(x) = 1)
  • Training examples (denoted D): ordered pairs <x, c(x)>
  • Positive example: a member of the target concept, c(x) = 1
  • Negative example: a nonmember of the target concept, c(x) = 0
  • Assumptions: no missing attribute values in X
  • no noise in the values of c (no contradictory labels)
  • Hypothesis space H:
  • often picked by the designer
  • H is the set of boolean-valued functions defined over X,
  • or you may narrow it down to conjunctions of constraints on attributes

9
A Concept Learning Task(3)
  • Although the learning task is to determine a
    hypothesis h identical to c, over the entire set
    of instances X, the only information available
    about c is its value over the training instances
    D.
  • Inductive Learning Hypothesis
  • Any hypothesis found to approximate the target
    function well over a sufficiently large set of
    training examples will also approximate the
    target function well over other unobserved
    examples.

10
Concept Learning As Search
  • Concept learning can be viewed as the task of searching through a large space of hypotheses implicitly defined by the hypothesis representation.
  • The goal of this search is to find the hypothesis that (best) fits the training examples.
  • Sky, AirTemp, Humidity, Wind, Water, Forecast:
  • <Sunny/Rainy/Cloudy, Warm/Cold, Normal/High, Weak/Strong, Warm/Cool, Change/Same>
  • EnjoySport learning task (the counts below are reproduced in the sketch that follows):
  • Size of the instance space X:
  • 3 × 2 × 2 × 2 × 2 × 2 = 96
  • Syntactically distinct hypotheses (including ? and Ø):
  • 5 × 4 × 4 × 4 × 4 × 4 = 5120
  • Semantically distinct hypotheses (a Ø anywhere denotes the empty set of instances and classifies every possible instance as a negative example):
  • 1 + (4 × 3 × 3 × 3 × 3 × 3) = 973
  • Hypothesis spaces are often much larger, and sometimes infinite
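The counts above can be reproduced with a few lines of Python (an illustrative sketch, not part of the slides):

```python
instance_space = 3 * 2 * 2 * 2 * 2 * 2     # 96 distinct days
syntactic = 5 * 4 * 4 * 4 * 4 * 4          # 5120: each attribute also allows '?' and 'Ø'
semantic = 1 + 4 * 3 * 3 * 3 * 3 * 3       # 973: one empty concept plus all Ø-free hypotheses
print(instance_space, syntactic, semantic)  # 96 5120 973
```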

12
Concept Learning As Search (2)
  • How to (efficiently) search the hypothesis space?
  • General-to-Specific Ordering of Hypotheses
  • a very useful structure over the hypothesis space H for any concept learning problem,
  • obtained without explicit enumeration
  • Let hj and hk be boolean-valued functions defined over X.
  • hj is more_general_than_or_equal_to hk  // hj accepts at least the instances hk accepts
  • hj ≥g hk
  • if and only if (∀x ∈ X) [(hk(x) = 1) → (hj(x) = 1)]
  • hj is more_general_than hk
  • hj >g hk
  • if and only if (hj ≥g hk) ∧ ¬(hk ≥g hj)
  • hj is more_specific_than hk when hk is more_general_than hj
  • The relation ≥g is independent of the target concept (the sketch below makes the test concrete for conjunctive hypotheses)
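For the conjunctive representation the ≥g test reduces to an attribute-wise check; the sketch below (my own helper names, assuming the tuple representation used earlier) makes this concrete:

```python
def constraint_subsumes(a, b):
    """True if constraint a admits every value admitted by constraint b."""
    return a == '?' or b == 'Ø' or a == b

def more_general_or_equal(hj, hk):
    """hj >=g hk: every instance accepted by hk is also accepted by hj."""
    return all(constraint_subsumes(a, b) for a, b in zip(hj, hk))

h1 = ('Sunny', '?', '?', 'Strong', '?', '?')
h2 = ('Sunny', '?', '?', '?', '?', '?')
print(more_general_or_equal(h2, h1))   # True: h2 is more general than h1
print(more_general_or_equal(h1, h2))   # False
```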

13
Concept Learning As Search (3)
  • h1 = <Sunny, ?, ?, Strong, ?, ?>
  • h2 = <Sunny, ?, ?, ?, ?, ?>
  • h3 = <Sunny, ?, ?, ?, Cool, ?>
  • h1 versus h2:
  • h2 imposes fewer constraints
  • h2 classifies more instances as positive:
  • any instance classified as positive by h1 is also classified as positive by h2,
  • so h2 is more general than h1
  • How about h3?
  • Partial ordering:
  • the structure imposed by this partial ordering on the hypothesis space H can be exploited to explore H efficiently.

14
Instances, Hypotheses, and the Partial Ordering
Less-Specific-Than
[Figure: instances in X and hypotheses in H, arranged from specific to general]
h1 = <Sunny, ?, ?, Strong, ?, ?>, h2 = <Sunny, ?, ?, ?, ?, ?>, h3 = <Sunny, ?, ?, ?, Cool, ?>
x1 = <Sunny, Warm, High, Strong, Cool, Same>, x2 = <Sunny, Warm, High, Light, Warm, Same>
h2 ≤P h1, h2 ≤P h3
≤P ≡ Less-Specific-Than ≡ More-General-Than
15
  • Idea: exploit the partial order to effectively search the space of hypotheses
  • Find-S:
  • finds the maximally specific hypothesis h
  • Candidate-Elimination:
  • finds the version space containing all consistent hypotheses, efficiently
  • List-Then-Eliminate algorithm:
  • a dummy algorithm that checks all possible hypotheses
  • mentioned along with Candidate-Elimination as a bad alternative

16
Find-S: Finding a Maximally Specific Hypothesis
  • Method:
  • begin with the most specific possible hypothesis in H
  • generalize this hypothesis each time it fails to cover an observed positive training example
  • Algorithm (a runnable sketch follows below):
  • Initialize h to the most specific hypothesis in H
  • For each positive training instance x:
  • For each attribute constraint ai in h:
  • If the constraint ai is NOT satisfied by x:
  • Replace ai in h by the next more general constraint that is satisfied by x
  • Output hypothesis h
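A minimal runnable sketch of Find-S for the conjunctive representation (assumed helper names; '?' = any value, 'Ø' = no value allowed):

```python
def find_s(examples):
    """examples: list of (instance_tuple, label) pairs with label 1 (+) or 0 (-)."""
    n = len(examples[0][0])
    h = ['Ø'] * n                          # start with the most specific hypothesis
    for x, label in examples:
        if label != 1:
            continue                       # Find-S simply ignores negative examples
        for i, value in enumerate(x):
            if h[i] == 'Ø':
                h[i] = value               # first positive example: copy the attribute value
            elif h[i] != value:
                h[i] = '?'                 # conflicting values: generalize to the wildcard
    return tuple(h)

D = [(('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'),   1),
     (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'),   1),
     (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), 0),
     (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), 1)]
print(find_s(D))   # ('Sunny', 'Warm', '?', 'Strong', '?', '?'), as in the trace on the next slide
```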

17
Hypothesis Space Search by Find-S
[Figure: instances in X and hypotheses in H; the Find-S trace is listed below]
h0 = <Ø, Ø, Ø, Ø, Ø, Ø>
x1 = <Sunny, Warm, Normal, Strong, Warm, Same>, +  →  h1 = <Sunny, Warm, Normal, Strong, Warm, Same>
x2 = <Sunny, Warm, High, Strong, Warm, Same>, +  →  h2 = <Sunny, Warm, ?, Strong, Warm, Same>
x3 = <Rainy, Cold, High, Strong, Warm, Change>, -  →  h3 = <Sunny, Warm, ?, Strong, Warm, Same>
x4 = <Sunny, Warm, High, Strong, Cool, Change>, +  →  h4 = <Sunny, Warm, ?, Strong, ?, ?>
18
Hypothesis Space Search by Find-S
Question asked in class: what about a different set of data (red indicating the change from the previous one)? Don't we need an update after x3, which is accepted by the current hypothesis? Find-S says to ignore negative examples. What explains this situation?
h0 = <Ø, Ø, Ø, Ø, Ø, Ø>
x1 = <Sunny, Warm, Normal, Strong, Warm, Same>, +  →  h1 = <Sunny, Warm, Normal, Strong, Warm, Same>
x2 = <Cloudy, Warm, High, Strong, Warm, Same>, +  →  h2 = <?, Warm, ?, Strong, Warm, Same>
x3 = <Rainy, Warm, High, Strong, Warm, Same>, -  →  h3 = <?, Warm, ?, Strong, Warm, Same>
19
Find-S (2)
  • The Find-S algorithm simply ignores every negative example!
  • The current hypothesis h is already consistent with each new negative example,
  • so no revision is needed.
  • This rests on the assumptions that:
  • H contains a hypothesis describing the true concept c
  • the data contain no errors
  • Formal argument that h does not need revision in response to a negative example:
  • let h be the current hypothesis and c the target concept, assumed to be in H
  • c is more_general_than_or_equal_to h (the current hypothesis),
  • since c covers all of the positive examples while h is the most specific hypothesis covering them
  • c never covers a negative instance,
  • since c is the target concept and the data are noise-free
  • hence neither does h,
  • by the definition of more_general_than
  • Alternative: a proof by contradiction; assume h accepts a negative instance

20
Find-S (3): Shortcomings
  • The algorithm finds one hypothesis, but it cannot tell whether it has found the only hypothesis consistent with the data or whether there are more such hypotheses
  • Why prefer the most specific hypothesis?
  • there may be multiple hypotheses consistent with the training examples;
  • Find-S will find the most specific one
  • Are the training examples consistent?
  • in practice the training examples may contain errors or noise;
  • such inconsistent sets of training examples can mislead Find-S
  • What if there are several maximally specific consistent hypotheses?
  • there may be several maximally specific hypotheses consistent with the data,
  • or no maximally specific consistent hypothesis at all

21
Definitions
  • Consistent:
  • A hypothesis h is consistent with a set of training examples D
  • if and only if h(x) = c(x) for each example <x, c(x)> in D:
  • Consistent(h, D) ≡ ∀<x, c(x)> ∈ D : h(x) = c(x)  (transcribed into code below)
  • Related definitions:
  • x satisfies the constraints of hypothesis h when h(x) = 1
  • h covers a positive training example x if it correctly classifies x as positive
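The definition translates almost literally into code (an illustrative sketch; the classifier is passed in so the snippet stays representation-independent):

```python
def consistent(h, D, classify):
    """Consistent(h, D): h(x) == c(x) for every example <x, c(x)> in D."""
    return all(classify(h, x) == c_x for x, c_x in D)
```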

22
Version Space
  • Version space:
  • The version space, denoted VS_H,D, with respect to hypothesis space H and training examples D, is the subset of hypotheses from H that are consistent with the training examples in D:
  • VS_H,D ≡ {h ∈ H | Consistent(h, D)}

23
List-Then-Eliminate
  • The List-Then-Eliminate algorithm (a brute-force sketch follows below):
  • VersionSpace ← a list containing every hypothesis in H
  • For each training example <x, c(x)>:
  • remove from VersionSpace any hypothesis h for which h(x) ≠ c(x)
  • Output the list of hypotheses in VersionSpace
  • Guaranteed to output all hypotheses consistent with the training data
  • Can be applied whenever the hypothesis space H is finite
  • but it requires exhaustively enumerating all hypotheses in H,
  • which is not realistic
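A brute-force List-Then-Eliminate sketch for the EnjoySport space, feasible only because this toy space has just 5120 syntactic hypotheses (the helper names and data layout are my own, not Mitchell's code):

```python
from itertools import product

DOMAINS = [['Sunny', 'Rainy', 'Cloudy'], ['Warm', 'Cold'], ['Normal', 'High'],
           ['Strong', 'Weak'], ['Warm', 'Cool'], ['Same', 'Change']]

def classify(h, x):
    # a conjunctive hypothesis accepts x iff every constraint is '?' or matches; 'Ø' never matches
    return int(all(c == '?' or c == v for c, v in zip(h, x)))

def consistent(h, D):
    return all(classify(h, x) == label for x, label in D)

def list_then_eliminate(D):
    candidates = product(*[dom + ['?', 'Ø'] for dom in DOMAINS])   # every syntactic hypothesis
    return [h for h in candidates if consistent(h, D)]             # keep only the consistent ones

D = [(('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'),   1),
     (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'),   1),
     (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), 0),
     (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), 1)]
print(len(list_then_eliminate(D)))   # 6 hypotheses survive for these four examples
```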

24
Candidate-Elimination
  • Candidate-Elimination algorithm outputs the set
    of all hypotheses consistent with the training
    examples
  • Without enumerating all hypotheses

25
Version Space
This version space, containing all 6 consistent hypotheses, can be compactly represented by its most specific (S) and most general (G) sets. How can every h in the VS be generated, given G and S?
x1 = <Sunny, Warm, Normal, Strong, Warm, Same>, +
x2 = <Sunny, Warm, High, Strong, Warm, Same>, +
x3 = <Rainy, Cold, High, Strong, Warm, Change>, -
x4 = <Sunny, Warm, High, Strong, Cool, Change>, +
26
Version Space and Candidate-Elimination (3)
  • The specific boundary S:
  • with respect to hypothesis space H and training data D,
  • S is the set of minimally general (i.e., maximally specific) members of H consistent with D:
  • S ≡ {s ∈ H | Consistent(s, D) ∧ ¬(∃s' ∈ H)[(s >g s') ∧ Consistent(s', D)]}
  • most specific ≡ maximal elements of VS_H,D
  • ≡ a set of sufficient conditions
  • The general boundary G:
  • with respect to hypothesis space H and training data D, G is the set of maximally general members of H consistent with D:
  • G ≡ {g ∈ H | Consistent(g, D) ∧ ¬(∃g' ∈ H)[(g' >g g) ∧ Consistent(g', D)]}
  • most general ≡ minimal elements of VS_H,D
  • ≡ a set of necessary conditions

27
Version Space and Candidate-Elimination (4)
  • The version space is the set of hypotheses contained
  • in G,
  • plus those contained in S,
  • plus those that lie between G and S in the partially ordered hypothesis space.
  • Version space representation theorem:
  • Let X be an arbitrary set of instances and let H be a set of boolean-valued hypotheses defined over X.
  • Let c : X → {0, 1} be an arbitrary target concept defined over X,
  • and let D be an arbitrary set of training examples <x, c(x)>.
  • For all X, H, c, and D such that S and G are well defined:
  • VS_H,D = {h ∈ H | (∃s ∈ S)(∃g ∈ G)(g ≥g h ≥g s)}
  • Proof: show that every h in the VS (as defined before) satisfies the right-hand-side condition, and that every h satisfying the right-hand side is in the VS (exercise 2.6 in Mitchell).

28
Representing Version Spaces: Another Take on the Same Definitions
  • Hypothesis space:
  • a finite semilattice under the partial ordering Less-Specific-Than (≤P);
  • every pair of hypotheses has a greatest lower bound (GLB)
  • VS_H,D ≡ the consistent poset (partially ordered subset of H)
  • Definition (general boundary):
  • the general boundary G of version space VS_H,D is the set of its most general members
  • most general ≡ minimal elements of VS_H,D ≡ a set of necessary conditions
  • Definition (specific boundary):
  • the specific boundary S of version space VS_H,D is the set of its most specific members
  • most specific ≡ maximal elements of VS_H,D ≡ a set of sufficient conditions
  • Version space:
  • every member of the version space lies between S and G:
  • VS_H,D ≡ {h ∈ H | ∃s ∈ S . ∃g ∈ G . g ≤P h ≤P s}, where ≤P ≡ Less-Specific-Than

29
Version Space and Candidate-Elimination (4)
  • The Candidate-Elimination algorithm works on the same principle as List-Then-Eliminate, but uses a more compact representation of the version space:
  • the version space is represented by its most general and least general (most specific) members.
  • Candidate-Elimination learning algorithm:
  • Initialize G to the set of maximally general hypotheses in H
  • Initialize S to the set of maximally specific hypotheses in H
  • G0 ← {<?, ?, ?, ?, ?, ?>}
  • S0 ← {<Ø, Ø, Ø, Ø, Ø, Ø>}
  • ...

30
Candidate-Elimination(5)
  • Candidate-Elimination learning algorithm (cont.; a runnable sketch follows this list):
  • For each training example d, do:
  • If d is a negative example
  • //Specialize G...
  • For each hypothesis g in G that is not consistent
    with d
  • Remove g from G
  • Add to G all minimal specializations h of g such
    that
  • h is consistent with d and some member of S is
    more specific than h
  • Remove from G any hypothesis that is less general
    than another h in G
  • Remove from S any hypothesis inconsistent with d
  • If d is a positive example
  • //Generalize S...
  • For each hypothesis s in S that is not consistent
    with d
  • Remove s from S
  • Add to S all minimal generalizations h of s such
    that
  • h is consistent with d and some member of G is
    more general than h
  • Remove from S any hypothesis that is more general
    than another h in S
  • Remove from G any hypothesis inconsistent with d
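A compact, runnable sketch of Candidate-Elimination for the conjunctive EnjoySport representation (the helper names, the DOMAINS table, and the singleton-S simplification are my own; this illustrates the boundary updates listed above rather than reproducing Mitchell's code):

```python
DOMAINS = [['Sunny', 'Rainy', 'Cloudy'], ['Warm', 'Cold'], ['Normal', 'High'],
           ['Strong', 'Weak'], ['Warm', 'Cool'], ['Same', 'Change']]

def covers(h, x):
    return all(c == '?' or c == v for c, v in zip(h, x))        # 'Ø' covers nothing

def more_general_or_equal(hj, hk):
    return all(a == '?' or b == 'Ø' or a == b for a, b in zip(hj, hk))

def min_generalization(s, x):
    """Minimal generalization of s that covers the positive instance x."""
    return tuple(v if c == 'Ø' else (c if c == v else '?') for c, v in zip(s, x))

def min_specializations(g, x):
    """Minimal specializations of g that exclude the negative instance x."""
    out = []
    for i, c in enumerate(g):
        if c == '?':
            for value in DOMAINS[i]:
                if value != x[i]:
                    out.append(g[:i] + (value,) + g[i + 1:])
    return out

def candidate_elimination(examples):
    n = len(examples[0][0])
    S, G = [('Ø',) * n], [('?',) * n]
    for x, label in examples:
        if label == 1:                                           # positive example: generalize S
            G = [g for g in G if covers(g, x)]                   # drop inconsistent members of G
            S = [min_generalization(s, x) for s in S]
            S = [s for s in S if any(more_general_or_equal(g, s) for g in G)]
            # (in this conjunctive space S stays a singleton, so the step that removes
            #  members of S more general than another member of S is omitted)
        else:                                                    # negative example: specialize G
            S = [s for s in S if not covers(s, x)]               # drop inconsistent members of S
            new_G = []
            for g in G:
                if not covers(g, x):
                    new_G.append(g)                              # g already excludes x
                    continue
                for h in min_specializations(g, x):              # replace g by its specializations
                    if any(more_general_or_equal(h, s) for s in S):
                        new_G.append(h)
            new_G = list(dict.fromkeys(new_G))                   # drop duplicates
            G = [g for g in new_G                                # drop non-maximal members of G
                 if not any(g2 != g and more_general_or_equal(g2, g) for g2 in new_G)]
    return S, G
```

Run on the four examples d1 to d4 of the trace that follows, the sketch returns S = [('Sunny', 'Warm', '?', 'Strong', '?', '?')] and G = [('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')], matching the S4 and G4 boundary sets.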

31
Candidate-Elimination (6)
  • The Candidate-Elimination algorithm works by
  • computing minimal generalizations and specializations, and
  • identifying non-minimal and non-maximal hypotheses
  • The algorithm can be applied to any concept learning task and hypothesis space for which these operations are well defined

32
Candidate-Elimination: Example Trace
d1 = <Sunny, Warm, Normal, Strong, Warm, Same>, Yes
d2 = <Sunny, Warm, High, Strong, Warm, Same>, Yes
d3 = <Rainy, Cold, High, Strong, Warm, Change>, No
d4 = <Sunny, Warm, High, Strong, Cool, Change>, Yes
G4: the last element of G3 is inconsistent with d4 and must be removed.
G3: what about <?, ?, Normal, ?, ?, ?> or <Cloudy, ?, ?, ?, ?, ?>? They are inconsistent with previous positive examples, which S2 summarizes.
33
  • // S summarizes all past positive examples:
  • any hypothesis h more general than S is guaranteed to be consistent with all the previous positive examples
  • let h be a generalization of some s in S
  • h covers more instances than s, since it is more general
  • in particular, h covers all instances covered by s
  • since s is consistent with all positive examples, so is h
  • // G summarizes all past negative examples:
  • any hypothesis h more specific than G is guaranteed to be consistent with all the previous negative examples
  • let h be a specialization of some g in G
  • h covers fewer instances than g
  • in particular, h rejects all negative examples rejected by g
  • since g is consistent with all negative examples, so is h
  • The learned version space is independent of the order in which the training examples are presented
  • after all, the VS contains exactly the consistent hypotheses
  • The S and G boundaries move closer together as more examples arrive, up to convergence

34
Remarks on Version Spaces and Candidate-Elimination
  • The version space converges to the correct hypothesis provided that
  • there are no errors in the training examples
  • What if the data contain errors?
  • the correct target concept is removed from the VS, since every h inconsistent with the training data is removed
  • this would be detected as the set of hypotheses becoming empty
  • there is some hypothesis in H that correctly describes the target concept
  • What if the target concept is not in H?
  • e.g., the target concept is a disjunction of feature attributes while the hypothesis space supports only conjunctive descriptions
  • The target concept is exactly learned when the S and G boundary sets converge to a single, identical hypothesis

35
What Next Training Example?
36
What Next Training Example?
  • What training example should the learner request next?
  • e.g., <Sunny, Warm, Normal, Light, Warm, Same>
  • <Sunny, Warm, Normal, Strong, Cool, Change>
  • <Rainy, Cold, Normal, Light, Warm, Same>
  • The optimal query strategy for a concept learner is to generate instances that satisfy exactly half the hypotheses in the current version space
  • If the size of the VS is reduced by half with each new example, the correct target concept can be found with only ⌈log2 |VS|⌉ experiments (see the small check below)
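For instance (illustrative arithmetic only), with the 6-hypothesis version space of the running example an ideal halving strategy needs ⌈log2 6⌉ queries:

```python
import math
print(math.ceil(math.log2(6)))   # 3 perfectly chosen experiments suffice in the ideal case
```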

37
Summary Points: Terminology
  • Supervised Learning
  • Concept - a function from observations to categories (so far, boolean-valued, +/-)
  • Target (function) - the true function f
  • Hypothesis - a proposed function h believed to be similar to f
  • Hypothesis space - the space of all hypotheses that can be generated by the learning system
  • Example - a tuple of the form <x, f(x)>
  • Instance space (aka example space) - the space of all possible examples
  • Classifier - a discrete-valued function whose range is a set of class labels
  • The Version Space Algorithm
  • Algorithms: Find-S, List-Then-Eliminate, Candidate-Elimination
  • Consistent hypothesis - one that correctly predicts the observed examples
  • Version space - the space of all currently consistent (or satisfiable) hypotheses
  • Inductive Learning
  • Inductive generalization - the process of generating hypotheses that describe cases not yet observed
  • The inductive learning hypothesis

38
Summary Points
  • Concept learning as search through H
  • hypothesis space H as a state space
  • learning: finding the correct hypothesis
  • General-to-specific ordering over H
  • a partially ordered set under the Less-Specific-Than (More-General-Than) relation
  • upper and lower bounds in H
  • Version space and the Candidate-Elimination algorithm
  • the S and G boundaries characterize the learner's uncertainty
  • the version space can be used to make predictions over unseen cases
  • The learner can generate useful queries
  • Next lecture: When and Why Are Inductive Leaps Possible?

39
Remarks on Version Spaces and Candidate-Elimination
  • How can partially learned concepts be used?
  • no additional training examples, multiple remaining hypotheses
  • Example: Table 2.6 (p. 39)
  • Instance A satisfies every member of S:
  • no need to look further, it will satisfy every h in the VS
  • classify it as a positive example
  • Instance B satisfies none of the members of G:
  • no need to look further, it will not satisfy any h in the VS
  • classify it as a negative example
  • Instance C: half of the VS classifies it as positive and half as negative
  • the most ambiguous case, and the most informative new query for refining the version space
  • Instance D: classified as positive by two hypotheses in the VS and as negative by the others
  • output the majority vote, with a confidence rating

40
  • Inductive Bias
  • Mitchell-Chp. 2

41
What Justifies This Inductive Leap?
  • Example: inductive generalization
  • Positive example: <Sunny, Warm, Normal, Strong, Cool, Change>, Yes
  • Positive example: <Sunny, Warm, Normal, Light, Warm, Same>, Yes
  • Induced S: <Sunny, Warm, Normal, ?, ?, ?>
  • Why believe we can classify the unseen?
  • e.g., <Sunny, Warm, Normal, Strong, Warm, Same>

42
Inductive Bias
  • A biased hypothesis space
  • EnjoySport example:
  • restriction: only conjunctions of attribute values
  • no representation for a disjunctive target concept such as
  • Sky = Sunny or Wind = Weak
  • Potential problem:
  • we biased the learner (inductive bias) to consider only conjunctive hypotheses,
  • but the concept may require a more expressive hypothesis space

43
Unbiased Learner
  • An unbiased learner
  • Obvious solution: provide a hypothesis space capable of representing every teachable concept, i.e., every possible subset of the instance space X
  • The set of all subsets of a set X is called the power set of X
  • EnjoySport example:
  • |instance space| = 96
  • |power set of X| = 2^96 = 79228162514264337593543950336 (verified in the one-liner below)
  • |conjunctive hypothesis space| = 973
  • A very biased hypothesis space indeed!
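The size of the power set quoted above is easy to check (an illustrative one-liner):

```python
print(2 ** 96)   # 79228162514264337593543950336 possible target concepts over 96 instances
```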

44
Need for Inductive Bias
  • An unbiased learner:
  • reformulate the EnjoySport learning task in an unbiased way
  • by defining a new hypothesis space H that can represent every subset of X:
  • allow arbitrary disjunctions, conjunctions, and negations
  • Example: Sky = Sunny or Wind = Weak
  • <Sunny, ?, ?, ?, ?, ?> ∨ <?, ?, ?, Weak, ?, ?>
  • New problem: the learner is completely unable to generalize beyond the observed examples! Intuition?
  • What are the S and G boundaries?
  • the S boundary of the VS will contain just the disjunction of the positive examples:
  • three positive examples (x1, x2, x3): S = {(x1 ∨ x2 ∨ x3)}
  • the G boundary of the VS will consist of the hypothesis that rules out only the observed negative examples:
  • two negative examples (x4, x5): G = {¬(x4 ∨ x5)}
  • In order to converge to a single final concept,
  • we would have to present every single instance in X as a training example.

45
Need for Inductive Bias
  • How about taking a vote among the consistent hypotheses in the VS?
  • For unseen instances, taking a vote is futile:
  • half of the hypotheses in the VS will vote positive,
  • half of the hypotheses in the VS will vote negative
  • Assume a previously unseen instance x:
  • for any hypothesis h that covers x as positive, there will be another hypothesis h' that is identical to h except for its classification of x
  • if h is in the VS, so will be h', because it agrees with h on all the observed training examples
  • The problem is not specific to the Candidate-Elimination algorithm

46
Need For Inductive Bias
  • Fundamental property of inductive inference
  • A learner that makes no a priori assumptions
    regarding the identity of the target concept has
    no rational basis for classifying any unseen
    instances

47
Inductive Bias: Definition
  • Consider:
  • a concept learning algorithm L, instance space X, target concept c
  • training examples Dc = {<x, c(x)>}
  • let L(xi, Dc) denote the classification of xi by L after training on Dc
  • The label L(xi, Dc) need not be correct.
  • What assumptions should we make so that it follows deductively?
  • Definition:
  • the inductive bias of L is any minimal set of assertions B such that
  • for any target concept c and corresponding training examples Dc, the assumptions in B justify its inductive inferences as deductive inferences:
  • (∀xi ∈ X) [(B ∧ Dc ∧ xi) ⊢ L(xi, Dc)]
  • where y ⊢ z means that z follows deductively from y

48
Inductive Bias: Candidate-Elimination
  • Inductive bias of the Candidate-Elimination algorithm, assuming that CE classifies a new instance x only if the vote is unanimous (1):
  • Inductive bias: the target concept c is contained in the given hypothesis space H
  • if c is in H, it is also in the VS
  • if all h in the VS vote unanimously, it must be that c(xi) = L(xi, Dc)
  • (1) Note that the hypotheses in the VS may classify a new instance x differently if we do not assume unanimous voting

49
Three Learners with Different Biases
  • Rote learner:
  • stores each observed training example in memory
  • classifies x if and only if it matches a previously observed example
  • weakest bias: no bias (the classification follows deductively from the training examples)
  • Candidate-Elimination algorithm:
  • stores the extremal generalizations and specializations
  • classifies x if and only if all members of the VS agree on the classification
  • stronger bias: the target concept is contained in the given hypothesis space H
  • Find-S:
  • stores the most specific hypothesis
  • classifies all subsequent instances
  • even stronger bias: the target concept is contained in the given hypothesis space H,
  • and all instances are negative unless the opposite is entailed by its maximally specific hypothesis

50
  • It is useful to characterize different learning approaches by the inductive bias they employ
  • More strongly biased methods make a larger inductive jump/leap:
  • they classify (i.e., do not reject) a greater proportion of unseen instances
  • whether the classifications are correct is another issue!
  • Types of biases:
  • categorical: assumptions that completely rule out certain concepts
  • preferential biases
  • implicit/unchangeable by the learner, or not

51
Summary
  • Concept learning can be cast as a problem of searching through a large predefined space of potential hypotheses
  • The general-to-specific partial ordering of hypotheses provides a useful structure for organizing the search through the hypothesis space
  • Find-S algorithm and Candidate-Elimination algorithm (for noise-free data)
  • The S and G sets delimit the entire set of hypotheses consistent with the data
  • Inductive learning algorithms are able to classify unseen examples only because of their inductive bias
  • Using every possible subset of the instances (the power set of the instances) as the hypothesis space
  • removes any inductive bias from the Candidate-Elimination algorithm,
  • but also removes the ability to classify any instance beyond the observed training examples:
  • an unbiased learner cannot make inductive leaps to classify unseen examples.