Title: RDM Chapter 3: Intro to Learning and Search
- prepared for COMP422/522-2008, Bernhard Pfahringer

3.1 Representing Hypotheses and Instances
- language Le to represent examples
- language Lh to represent hypotheses
- a hypothesis h ∈ Lh is a function h: Le → Y, e.g. Y = {0,1}
- cover relation c over Lh × Le: c(h,e) is true if and only if h(e) = 1 (see Figures 3.1 and 3.2)

3.2 Boolean data
- simplify:
- item-sets, true/false variable assignments, Herbrand interpretations, e.g. {sausage, beer, mustard, win}
- e ∈ Le is an interpretation I ⊆ {b, m, s, w}
- Le = Lh (single representation trick)

Machine Learning point of view
- given Le, Lh, and an unknown target function f: Le → Y
- examples E = {(e1, f(e1)), ...}
- loss(h,E) measures the quality of h w.r.t. E
- find h = argmin_h loss(h,E)
- zero-one loss (empirical risk): loss(h,E) = 1/|E| · Σ_{(e,f(e)) ∈ E} |f(e) − h(e)|
- regression: squared loss
- probabilistic settings: log-likelihood
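
The zero-one loss as a one-liner, a minimal sketch (names are illustrative; E is assumed to be a list of (example, label) pairs and h a callable):

    def zero_one_loss(h, E):
        """Empirical risk: the fraction of examples that h labels incorrectly."""
        return sum(h(e) != y for e, y in E) / len(E)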

Data Mining point of view
- given Le, Lh, and data D ⊆ Le
- quality criterion Q(h,D); find the set
- Th(Q,D,Lh) = {h ∈ Lh | Q(h,D) holds}
- Q can be local, global, or heuristic
- e.g. freq(h,D) = |c(h,D)| (the number of examples in D covered by h) or
- rfreq(h,D) = |c(h,D)| / |D|
- local: rfreq(h,D) > y
- acc(h,P,N) = freq(h,P) / (freq(h,P) + freq(h,N))
- global: find h = argmax_h acc(h,P,N)
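
These criteria are easy to make concrete for the Boolean data of Section 3.2. A minimal sketch, assuming hypotheses and examples are both represented as Python frozensets of items (the single representation trick); the function names are illustrative, not from the book:

    # An item-set hypothesis h covers an example e iff h is a subset of e.
    def covers(h, e):
        return h <= e                       # the cover relation c(h, e)

    def freq(h, D):
        return sum(covers(h, e) for e in D)    # |c(h, D)|

    def rfreq(h, D):
        return freq(h, D) / len(D)             # relative frequency

    def acc(h, P, N):
        tp, fp = freq(h, P), freq(h, N)        # covered positives/negatives
        return tp / (tp + fp) if tp + fp else 0.0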

Generate-and-test
- FOR ALL h ∈ Lh DO
-   IF Q(h,D) = true THEN output h
- Lh must be enumerable
- naïve and inefficient, but complete
- see Example 3.3 and the sketch below
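
A runnable sketch of generate-and-test for item-sets, enumerating Lh as the powerset of a small alphabet and reusing the helpers above (same assumptions):

    from itertools import combinations

    def all_hypotheses(items):
        """Enumerate Lh: every item-set over the given alphabet."""
        return [frozenset(c) for r in range(len(items) + 1)
                for c in combinations(items, r)]

    def generate_and_test(items, D, Q):
        """FOR ALL h in Lh: IF Q(h, D) THEN output h."""
        return [h for h in all_hypotheses(items) if Q(h, D)]

    # e.g. all item-sets covering at least half of D:
    # Th = generate_and_test('bmsw', D, lambda h, d: rfreq(h, d) >= 0.5)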

3.6 Search space structure
- h1 is more general than h2, written h1 ⪯ h2, if c(h2) ⊆ c(h1); a proper generalization if the inclusion is proper
- ⪯ is reflexive and transitive
- but syntactic variants (distinct hypotheses with identical covers) are problematic
- (canonical forms restore a partial order)
- see Example 3.5 and Fig. 3.5: a Hasse diagram with top element ⊤ and bottom element ⊥; a generality test for item-sets is sketched below
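
For item-sets, generality reduces to the subset relation, and the frozenset representation is already canonical, so syntactic variants do not arise. A sketch under the same assumptions:

    def more_general(h1, h2):
        """h1 is more general than h2 (h1 ⪯ h2): everything covered by h2
        is covered by h1; for item-sets this is just h1 ⊆ h2."""
        return h1 <= h2

    def properly_more_general(h1, h2):
        return h1 < h2                      # proper subset = proper generalisation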

Monotonicity
- Q is monotonic (truth is preserved for all specialisations):
  ∀s,g ∈ Lh, ∀D ⊆ Le: g ⪯ s ∧ Q(g,D) → Q(s,D)
- Q is anti-monotonic (truth is preserved for all generalisations):
  ∀s,g ∈ Lh, ∀D ⊆ Le: g ⪯ s ∧ Q(s,D) → Q(g,D)

Examples
- freq(h,D) ≥ x (minFreq) is anti-monotonic
- freq(h,D) ≤ x (maxFreq) is monotonic
- "a specific example e is covered" (e ∈ c(h)) is anti-monotonic
- "a specific example e is not covered" (e ∉ c(h)) is monotonic
- acc(h,P,N) ≥ x is neither
- do Exercises 3.6, 3.7, 3.8; a brute-force check of the first claim follows
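
A brute-force check of the minFreq claim on a toy dataset, reusing the sketches above (illustrative only): anti-monotonicity says that whenever a specialisation s satisfies minFreq, every generalisation g of it does too.

    D = [frozenset('bms'), frozenset('bmw'), frozenset('bw')]
    hypotheses = all_hypotheses('bmsw')

    def min_freq(h, x=2):
        return freq(h, D) >= x

    assert all(min_freq(g) or not min_freq(s)      # Q(s,D) implies Q(g,D)
               for g in hypotheses for s in hypotheses
               if more_general(g, s))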

Pruning
- if a monotonic Q is false for h, then Q is also false for all generalisations of h
- if an anti-monotonic Q is false for h, then Q is false for all specialisations of h
- see Examples 3.11/3.12, Figures 3.6/3.7

Min/max
- max(T) = {h ∈ T | ¬∃t ∈ T: h ≺ t}: maximal elements are the most specific ones
- min(T) = {h ∈ T | ¬∃t ∈ T: t ≺ h}: minimal elements are the most general ones
- if Lh is infinite, min/max might not exist
- Examples 3.13, 3.14
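
For item-sets these definitions translate directly (a sketch under the same frozenset assumptions; proper subset = properly more general):

    def max_set(T):
        """Maximal = most specific: nothing in T is a proper specialisation."""
        return [h for h in T if not any(h < t for t in T)]

    def min_set(T):
        """Minimal = most general: nothing in T is a proper generalisation."""
        return [h for h in T if not any(t < h for t in T)]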

Borders
- S: the border of maximally specific hypotheses for which Q holds:
- S(Th(Q,D,Lh)) = max(Th(Q,D,Lh))
- similarly G: the maximally general ones:
- G(Th(Q,D,Lh)) = min(Th(Q,D,Lh))
- Example 3.15

Border properties
- the borders fully specify the set of all solutions:
- anti-monotonic Q: Th(Q,D,Lh) = {h ∈ Lh | ∃s ∈ S(Th(Q,D,Lh)): h ⪯ s}
- monotonic Q: Th(Q,D,Lh) = {h ∈ Lh | ∃g ∈ G(Th(Q,D,Lh)): g ⪯ h}

Version space
- if Q is a conjunction M ∧ A of two criteria, one monotonic (M) and one anti-monotonic (A), then Th is a version space:
- Th = {h ∈ Lh | ∃s ∈ S(Th), ∃g ∈ G(Th): g ⪯ h ⪯ s}
- S and G are condensed representations, often much smaller than Th
- Example 3.18 / Figure 3.8, Example 3.20; a brute-force border computation is sketched below
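
Putting the pieces together by brute force (illustrative only; practical algorithms never materialize Th): enumerate Th for a conjunctive Q and read the borders off with max_set/min_set from above.

    def theory(items, Q):
        """Th(Q, D, Lh): all hypotheses satisfying Q."""
        return [h for h in all_hypotheses(items) if Q(h)]

    # e.g. a version space from minFreq on positives (anti-monotonic) and
    # zero frequency on negatives (monotonic), with P and N assumed given:
    # Th = theory('bmsw', lambda h: freq(h, P) >= 2 and freq(h, N) == 0)
    # S, G = max_set(Th), min_set(Th)   # every h in Th lies between G and S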

Negative borders
- the elements just outside the (positive) borders
- S⁻(Th) = min(Lh − {h ∈ Lh | ∃s ∈ S(Th): h ⪯ s})
- G⁻(Th) = max(Lh − {h ∈ Lh | ∃g ∈ G(Th): g ⪯ h})
- Example 3.21
- border sets can be large: for item-sets, G can be exponentially large in the number of items N

Refinement operators
- generalisation operator ρg: Lh → 2^Lh with ∀h ∈ Lh: ρg(h) ⊆ {c ∈ Lh | c ⪯ h}
- specialisation operator ρs: Lh → 2^Lh with ∀h ∈ Lh: ρs(h) ⊆ {c ∈ Lh | h ⪯ c}
- both can be applied repeatedly

Ideal refinement operator
- ideal specialisation: ∀h ∈ Lh: ρs(h) = min({h' ∈ Lh | h ≺ h'})
- returns exactly all children of a node in the Hasse diagram
- used in heuristic search (e.g. hill-climbing); see the sketch after the next slide

Optimal operator
- no hypothesis is generated twice → efficient
- used in complete search
- see Example 3.22
- optimal operators define a canonical form, and vice versa
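
For item-sets the two operators are easy to contrast: the ideal one returns all children in the Hasse diagram, while an optimal one uses a fixed item ordering as its canonical form and only adds items beyond the largest one present, so each set is generated exactly once. A sketch (ITEMS is an assumed global alphabet):

    ITEMS = 'bmsw'                          # fixed ordering = canonical form

    def rho_ideal(h):
        """Ideal specialisation: every child of h in the Hasse diagram."""
        return [h | {i} for i in ITEMS if i not in h]

    def rho_optimal(h):
        """Optimal specialisation: only extend past max(h) in the ordering."""
        start = max((ITEMS.index(i) for i in h), default=-1) + 1
        return [h | {i} for i in ITEMS[start:]]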

MGG: minimally general generalisations
- mgg(h1,h2) = max({h ∈ Lh | h ⪯ h1 ∧ h ⪯ h2}), i.e. the most specific common generalisations
- if unique, also called lgg (least general generalisation) or lub (least upper bound)

MGS: maximally general specialisations
- mgs(h1,h2) = min({h ∈ Lh | h1 ⪯ h ∧ h2 ⪯ h}), i.e. the most general common specialisations
- if unique, also called glb (greatest lower bound)
- if lub and glb exist for all pairs h1,h2, then Lh forms a lattice (item-sets do); Example 3.23, Exercises 3.24/3.25
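
For item-sets both operations are unique, so the lattice is concrete: the lgg is the intersection and the glb is the union. A sketch:

    def lgg(h1, h2):
        """Least general generalisation: most specific common generalisation."""
        return h1 & h2                      # intersection of item-sets

    def glb(h1, h2):
        """Greatest lower bound: most general common specialisation."""
        return h1 | h2                      # union of item-sets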

Generic learning algorithm
- Queue := Init
- Th := ∅
- WHILE not Stop DO
-   Delete h from Queue
-   IF Q(h,D) = true THEN
-     add h to Th
-   ELSE Queue := Queue ∪ ρ(h)
-   Queue := prune(Queue)
- return Th
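
A direct Python transcription of this skeleton, with the slide's parameters left pluggable (the function and parameter names are mine, mirroring the slide, not an official API):

    def generic_search(init, Q, D, rho, delete, stop, prune):
        queue, th = list(init), []
        while queue and not stop(queue, th):
            h = delete(queue)               # search strategy: FIFO/LIFO/best
            if Q(h, D):
                th.append(h)
            else:
                queue.extend(rho(h))        # refine h and continue searching
            queue[:] = prune(queue)
        return th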

Generic algorithm continued
- lots of parameters:
- Init defines the starting point
- Delete defines the search strategy:
-   first-in-first-out (queue) → breadth-first
-   last-in-first-out (stack) → depth-first
-   best first → best-first search
- Stop: e.g. Queue = ∅ → finds all solutions
- Prune: heuristic or sound

Complete general-to-specific
- Queue := {⊤}; Q is anti-monotonic
- Th := ∅
- WHILE Queue ≠ ∅ DO
-   Delete h from Queue
-   IF Q(h,D) = true THEN
-     add h to Th
-     Queue := Queue ∪ ρo(h)
- return Th
- only solutions are refined: if Q fails for h, anti-monotonicity prunes all its specialisations
- see Examples 3.26, 3.27 and the sketch below
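
A runnable instance of this search for the anti-monotonic criterion freq(h,D) ≥ x, reusing freq and rho_optimal from the sketches above (an Apriori-style, breadth-first enumeration; names are mine):

    def complete_g2s(D, x=2):
        queue, th = [frozenset()], []       # start at ⊤, the most general set
        while queue:
            h = queue.pop(0)                # FIFO: levelwise search
            if freq(h, D) >= x:             # anti-monotonic Q holds for h
                th.append(h)
                queue.extend(rho_optimal(h))   # only solutions are refined
        return th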

Heuristic general-to-specific
- Queue := {⊤}
- Th := ∅
- WHILE Th = ∅ DO
-   Delete best h from Queue
-   IF Q(h,D) = true THEN
-     add h to Th
-   ELSE Queue := Queue ∪ ρi(h)
-   Queue := prune(Queue)
- return Th
- useful when a single good solution suffices
- works for general Q if prune only keeps the k best → beam search; see also Example 3.28
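
A minimal prune for the beam-search variant (score is an assumed heuristic, e.g. accuracy on P and N):

    import heapq

    def beam_prune(queue, score, k=5):
        """Keep only the k highest-scoring hypotheses."""
        return heapq.nlargest(k, queue, key=score)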

Branch-and-bound
- assume a bound b(h) exists with
- ∀h' ∈ Lh: h ⪯ h' → b(h) ≥ f(h'), i.e. b(h) bounds the score of all specialisations of h
- then, given the current best value v, we can prune all h with b(h) ≤ v
- can be viewed as a combination of complete and heuristic search
- see Example 3.29
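
The pruning test itself is a one-liner (b, f, and v assumed as above):

    def bb_prune(queue, b, v):
        """Drop hypotheses whose optimistic bound cannot beat the best so far."""
        return [h for h in queue if b(h) > v]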

(Cautious) specific-to-general
- Queue := {⊥}
- Th := ∅
- WHILE Queue ≠ ∅ DO
-   Delete some h from Queue
-   IF Q(h,D) = true THEN
-     add h to Th
-   ELSE select a d ∈ D such that ¬c(h,d)
-     Queue := Queue ∪ {lgg(h,d)}
- return Th
- see Example 3.31; can be seen as computing S for the (anti-monotonic) criterion rfreq(h,D) = 1, as in the sketch below
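
For item-sets this cautious search collapses to something very simple, because lgg is intersection: repeatedly generalising against uncovered examples ends at the single S element for rfreq(h,D) = 1. A sketch reusing lgg from above:

    from functools import reduce

    def s_border_full_coverage(D):
        """S for rfreq(h, D) = 1: the most specific h covering all of D."""
        return reduce(lgg, D)               # iterated lgg = intersection of D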

Computing the G border general-to-specific
- Queue := {⊤}
- Th := ∅
- WHILE Queue ≠ ∅ DO
-   Delete h from Queue
-   IF Q(h,D) = true AND h ∈ G (no proper generalisation of h satisfies Q) THEN
-     add h to Th
-   ELSE IF Q(h,D) = false THEN
-     Queue := Queue ∪ ρo(h)
- return Th
- a similar algorithm computes S; when computing both S and G, more pruning is possible (see Example 3.34)

Computing S and G incrementally
- incrementally update a version space (S,G), e.g. when finding all correct h (rfreq(h,P) = 1 ∧ rfreq(h,N) = 0)
- needs an msg(g,e) operation that excludes e from g, i.e. minimally specialises g so that it no longer covers e (Example 3.35); sketched below for item-sets
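
For item-sets, msg(g,e) has a simple form: add one item that e lacks, so the specialisation no longer covers e (a sketch; ITEMS as assumed above):

    def msg(g, e):
        """Minimal specialisations of g that exclude example e."""
        return [g | {i} for i in ITEMS if i not in e and i not in g]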

Mitchell's candidate elimination
- S := {⊥}; G := {⊤}
- FOR ALL examples e DO
-   IF e ∈ N THEN
-     process negative example
-   ELSE
-     process positive example

Process negative example
- S := S − {s ∈ S | e ∈ c(s)}
- FOR ALL g ∈ G with e ∈ c(g) DO
-   Δg := {g' ∈ msg(g,e) | ∃s ∈ S: g' ⪯ s}
-   G := (G − {g}) ∪ Δg
- G := min(G)

Process positive example
- G := G − {g ∈ G | e ∉ c(g)}
- FOR ALL s ∈ S with e ∉ c(s) DO
-   Δs := {s' ∈ lgg(s,e) | ∃g ∈ G: g ⪯ s'}
-   S := (S − {s}) ∪ Δs
- S := max(S)
- see Example 3.36, Exercise 3.37; a compact item-set implementation follows
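
A compact candidate-elimination sketch for item-sets, wiring together covers, lgg, and msg from the earlier sketches (an illustration under the same assumptions, not the book's code; note that for item-sets g ⪯ s is just g ⊆ s):

    def candidate_elimination(P, N, items=ITEMS):
        S = {frozenset(items)}              # bottom: most specific hypothesis
        G = {frozenset()}                   # top: most general hypothesis
        for e, pos in [(e, True) for e in P] + [(e, False) for e in N]:
            if pos:                         # positive: e must be covered
                G = {g for g in G if covers(g, e)}
                S = {lgg(s, e) for s in S}  # minimal generalisation
                S = {s for s in S if any(g <= s for g in G)}
                S = {s for s in S if not any(s < t for t in S)}   # max(S)
            else:                           # negative: e must be excluded
                S = {s for s in S if not covers(s, e)}
                G = {g2 for g in G
                     for g2 in (msg(g, e) if covers(g, e) else [g])}
                G = {g for g in G if any(g <= s for s in S)}
                G = {g for g in G if not any(t < g for t in G)}   # min(G)
        return S, G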

Interesting properties
- S and G contain only one, identical h → converged on a single solution
- S or G empty → no solution exists
- S and G can determine whether any given h is still possible
- S and G can already predict the labels of some e, i.e. such e carry no additional information
- try Exercise 3.39

Intersection of version spaces
- two version spaces can be intersected by computing the new S as lgg(s1,s2) for all pairs of elements from S1 and S2, and the new G as glb(g1,g2) for all pairs of elements from G1 and G2
- can use this to compute a separate VS for every single positive example against all negative examples,
- then incrementally intersect these VSs; a sketch follows