RDM Chapter 3: Intro to Learning and Search
1
RDM Chapter 3: Intro to Learning and Search
  • prepared for COMP422/522-2008, Bernhard Pfahringer

2
3.1 Representing Hypotheses and Instances
  • language Le to represent examples
  • language Lh to represent hypotheses
  • h ∈ Lh, h: Le → Y, e.g. Y = {0,1}
  • cover relation over Lh × Le: c(h,e) = true if and
    only if h(e) = 1 (see Figures 3.1 and 3.2)

3
3.2 Boolean data
  • simplify
  • item-sets, true/false variable assignments,
    Herbrand interpretations, {sausage, beer, mustard,
    wine}
  • Le = {I | I ⊆ {b,m,s,w}}
  • Le = Lh (single representation trick)
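A minimal Python sketch of this item-set setting (the names are ours, not the book's): with the single representation trick, the cover relation is just the subset test.

ALPHABET = frozenset("bmsw")  # beer, mustard, sausage, wine

def covers(h: frozenset, e: frozenset) -> bool:
    """Cover relation c(h, e): true iff every item of h occurs in e."""
    return h <= e

print(covers(frozenset("bs"), frozenset("bms")))  # True
print(covers(frozenset("bw"), frozenset("bms")))  # False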

4
Machine Learning point of view
  • given Le, Lh, unknown f: Le → Y
  • examples E = {(e1, f(e1)), ...}
  • loss(h,E) measures quality of h wrt E
  • find h = argmin_h loss(h,E)
  • zero-one loss (empirical risk):
  • loss(h,E) = 1/|E| Σ_e |f(e) - h(e)|
  • regression: squared loss
  • probabilistic settings: log-likelihood
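A toy Python sketch of this view, under the assumption that a hypothesis is a callable e → {0,1} (names are ours):

def zero_one_loss(h, E):
    # loss(h, E) = 1/|E| * sum of |f(e) - h(e)| over labelled examples
    return sum(h(e) != y for e, y in E) / len(E)

def erm(hypotheses, E):
    # empirical risk minimisation: pick h with the smallest loss on E
    return min(hypotheses, key=lambda h: zero_one_loss(h, E))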

5
Data Mining POV
  • given Le, Lh, data D ⊆ Le
  • quality criterion Q(h,D); find the set
  • Th(Q,D,Lh) = {h ∈ Lh | Q(h,D) holds}
  • Q can be local, global, or heuristic
  • e.g. freq(h,D) = |c(h,D)| or
  • rfreq(h,D) = |c(h,D)| / |D|
  • local: rfreq(h,D) > y
  • acc(h,P,N) = freq(h,P) / (freq(h,P) + freq(h,N))
  • global: argmax_h acc(h,P,N)
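For the item-set sketch above, these criteria translate directly (a sketch; acc assumes h covers at least one example):

def freq(h, D):
    # absolute frequency |c(h, D)|: number of examples in D covered by h
    return sum(covers(h, e) for e in D)

def rfreq(h, D):
    # relative frequency |c(h, D)| / |D|
    return freq(h, D) / len(D)

def acc(h, P, N):
    # accuracy of h on positives P versus negatives N
    return freq(h, P) / (freq(h, P) + freq(h, N))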

6
Generate-and-test
  • FORALL h ∈ Lh DO
  • IF Q(h,D) = true THEN output h
  • Lh must be enumerable
  • naïve, inefficient, but complete
  • see Example 3.3
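For item-sets, Lh is finite and enumerable, so generate-and-test is short (a sketch; enumeration runs general to specific):

from itertools import combinations

def all_hypotheses(alphabet=ALPHABET):
    # enumerate Lh: every subset of the alphabet, by increasing size
    items = sorted(alphabet)
    for r in range(len(items) + 1):
        for c in combinations(items, r):
            yield frozenset(c)

def generate_and_test(Q, D):
    # naive but complete: output every h in Lh for which Q(h, D) holds
    return [h for h in all_hypotheses() if Q(h, D)]

# e.g. all hypotheses covering at least half of D:
# generate_and_test(lambda h, D: rfreq(h, D) >= 0.5, D)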

7
3.6 Search space structure
  • h1 is more general than h2, written h1 ⪯ h2, if
    c(h2) ⊆ c(h1)
  • a proper generalization if the inclusion is strict
  • reflexive and transitive
  • but syntactic variants → problematic
  • (canonical forms, partial order, ...)
  • see Example 3.5, Fig. 3.5 Hasse diagram, top ⊤ and
    bottom ⊥ element
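For item-sets the generality order is again just the subset relation (a sketch):

def more_general(h1, h2):
    # h1 ⪯ h2: requiring fewer items means covering more, so c(h2) ⊆ c(h1)
    return h1 <= h2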

8
Monotonicity
  • Q is monotonic (true for all specialisations):
    ∀s,g ∈ Lh, ∀D ⊆ Le: g ⪯ s ∧ Q(g,D) → Q(s,D)
  • Q is anti-monotonic (true for all
    generalisations): ∀s,g ∈ Lh, ∀D ⊆ Le: g ⪯ s ∧
    Q(s,D) → Q(g,D)

9
Examples
  • freq(h,D) ≥ x (minFreq) is anti-monotonic
  • freq(h,D) ≤ x (maxFreq) is monotonic
  • "a specific example e is covered" (e ∈ c(h)) is
    anti-monotonic
  • "a specific example e is not covered" (e ∉ c(h))
    is monotonic
  • acc(h,P,N) ≥ x is neither
  • do Exercises 3.6, 3.7, 3.8

10
Pruning
  • if monotonic Q is false for h, then Q is also
    false for all generalisations of h
  • if anti-monotonic Q is false for h, then Q is
    false for all specialisations of h
  • see examples 3.11/3.12, figures 3.6/3.7
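A sketch of how anti-monotonic pruning pays off for the minFreq criterion, in a levelwise, Apriori-like style (names are ours):

def frequent_hypotheses(D, min_freq, alphabet=ALPHABET):
    # levelwise general-to-specific search with anti-monotone pruning
    level, solutions = {frozenset()}, []
    while level:
        next_level = set()
        for h in level:
            if freq(h, D) >= min_freq:       # Q holds: keep and expand
                solutions.append(h)
                next_level |= {h | {i} for i in alphabet - h}
            # else: prune -- Q also fails for every specialisation of h
        level = next_level
    return solutions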

11
Min/max
  • max(T) = {h ∈ T | ¬∃t ∈ T: h ≺ t}; maximal
    elements are most specific
  • min(T) = {h ∈ T | ¬∃t ∈ T: t ≺ h}; minimal
    elements are most general
  • if Lh is infinite, these might not exist
  • examples 3.13, 3.14
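For finite sets of item-sets, max and min are easy to compute (a sketch; ≺ becomes the proper-subset test):

def max_specific(T):
    # max(T): keep h with no strictly more specific (superset) element in T
    return [h for h in T if not any(h < t for t in T)]

def min_general(T):
    # min(T): keep h with no strictly more general (subset) element in T
    return [h for h in T if not any(t < h for t in T)]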

12
Borders
  • S: the border of maximally specific hypotheses
    for which Q holds
  • S(Th(Q,D,Lh)) = max(Th(Q,D,Lh))
  • similarly G: the maximally general ones
  • G(Th(Q,D,Lh)) = min(Th(Q,D,Lh))
  • example 3.15
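With the helpers above, the borders of a computed solution set are one call each (a sketch):

def borders(Th):
    # S = max(Th): most specific solutions; G = min(Th): most general
    return max_specific(Th), min_general(Th)

# e.g. S, G = borders(frequent_hypotheses(D, 2))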

13
Border properties
  • borders fully specify all solutions:
  • anti-monotonic Q: Th(Q,D,Lh) = {h ∈ Lh | ∃s ∈
    S(Th(Q,D,Lh)): h ⪯ s}
  • monotonic Q: Th(Q,D,Lh) = {h ∈ Lh | ∃g ∈
    G(Th(Q,D,Lh)): g ⪯ h}

14
Version space
  • if Q is a conjunction M ∧ A of two criteria, one
    monotonic (M) and one anti-monotonic (A), then Th
    is a version space:
  • T = {h ∈ Lh | ∃s ∈ S(T), ∃g ∈ G(T): g ⪯ h ⪯ s}
  • S and G are condensed representations, often
    much smaller
  • example 3.18 / figure 3.8, example 3.20

15
Negative borders
  • the elements just outside the (positive) borders
  • S⁻(Th) = min(Lh − {h ∈ Lh | ∃s ∈ S(Th): h ⪯ s})
  • G⁻(Th) = max(Lh − {h ∈ Lh | ∃g ∈ G(Th): g ⪯ h})
  • example 3.21
  • border sets can be large: for item-sets, G can be
    exponentially large in N

16
Refinement operators
  • generalization operator ρg: Lh → 2^Lh with
    ∀h ∈ Lh: ρg(h) ⊆ {c ∈ Lh | c ⪯ h}
  • specialisation operator ρs: Lh → 2^Lh with
    ∀h ∈ Lh: ρs(h) ⊆ {c ∈ Lh | h ⪯ c}
  • can be applied repeatedly

17
Ideal refinement operator
  • ideal specialisation: ∀h ∈ Lh: ρs(h) =
    min({h' ∈ Lh | h ≺ h'})
  • returns exactly all children of a node in the
    Hasse diagram
  • used in heuristic search (e.g. hill-climbing)

18
Optimal operator
  • no hypothesis is generated twice ⇒ efficient
  • used in complete search
  • see example 3.22
  • optimal operators define a canonical form and
    vice versa
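Both operators are easy to state for item-sets (a sketch; for the optimal variant we impose alphabetical order as the canonical form):

def rho_ideal(h, alphabet=ALPHABET):
    # ideal specialisation: exactly the children of h in the Hasse diagram
    return {h | {i} for i in alphabet - h}

def rho_optimal(h, alphabet=ALPHABET):
    # optimal specialisation: only add items beyond the largest item in h,
    # so every item-set is generated exactly once
    bound = max(h) if h else ""
    return {h | {i} for i in alphabet if i > bound}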

19
MGG: minimally general generalizations
  • mgg(h1,h2) = the minimally general elements of
    {h ∈ Lh | h ⪯ h1 ∧ h ⪯ h2}
  • if unique, then also called lgg (least general
    generalization) and lub (least upper bound)

20
MGS: maximally general specializations
  • mgs(h1,h2) = the maximally general elements of
    {h ∈ Lh | h1 ⪯ h ∧ h2 ⪯ h}
  • if unique, then also called glb (greatest lower
    bound)
  • if lub and glb exist for all h1,h2, then they form
    a lattice (e.g. item-sets do); example 3.23,
    exercises 3.24/3.25
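For item-sets both operations are unique, which is why they form a lattice (a sketch):

def lgg(h1, h2):
    # mgg/lgg: the intersection is the least general common generalization
    return h1 & h2

def glb(h1, h2):
    # mgs/glb: the union is the maximally general common specialization
    return h1 | h2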

21
Generic learning algorithm
  • Queue := Init
  • Th := ∅
  • WHILE not Stop DO
  • Delete h from Queue
  • IF Q(h,D) = true THEN
  • add h to Th
  • ELSE Queue := Queue ∪ ρ(h)
  • Queue := prune(Queue)
  • return Th
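A direct Python rendering of this skeleton (a sketch; Init, Stop, Delete, ρ and prune are the pluggable parameters of the next slide):

def generic_search(init, stop, delete, Q, D, rho, prune):
    queue, th = list(init), []
    while queue and not stop(queue, th):
        h = delete(queue)              # search strategy
        if Q(h, D):
            th.append(h)               # record solution
        else:
            queue.extend(rho(h))       # refine non-solutions
        queue = prune(queue)
    return th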

22
Generic algorithm continued
  • lots of parameters
  • Init defines the start point
  • Delete defines the search strategy
  • first-in-first-out (queue) ⇒ breadth-first
  • last-in-first-out (stack) ⇒ depth-first
  • best ⇒ best-first search
  • Stop: Queue = ∅ ⇒ all solutions found
  • Prune: heuristic or sound

23
Complete general-to-specific
  • Queue := {⊤}; Q is anti-monotonic
  • Th := ∅
  • WHILE Queue ≠ ∅ DO
  • Delete h from Queue
  • IF Q(h,D) = true THEN
  • add h to Th
  • Queue := Queue ∪ ρo(h)
  • return Th
  • see example 3.26, 3.27
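Instantiated for item-sets with the optimal operator (a sketch): solutions are recorded and expanded, failures are pruned, and anti-monotonicity guarantees completeness.

def complete_g2s(Q, D, alphabet=ALPHABET):
    queue, th = [frozenset()], []      # Queue := {⊤}, the most general h
    while queue:
        h = queue.pop()
        if Q(h, D):                    # anti-monotonic Q holds
            th.append(h)
            queue.extend(rho_optimal(h, alphabet))
    return th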

24
Heuristic general-to-specific
  • Queue := {⊤}
  • Th := ∅
  • WHILE Th = ∅ DO
  • Delete best h from Queue
  • IF Q(h,D) = true THEN
  • add h to Th
  • ELSE Queue := Queue ∪ ρi(h)
  • Queue := prune(Queue)
  • return Th
  • useful when a single good solution suffices; works
    for general Q; if prune only keeps the k best
    ⇒ beam search; see also example 3.28
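A beam-search sketch along these lines (the scoring function is an assumption, e.g. rfreq):

def beam_search(Q, D, score, k=5):
    queue, th = [frozenset()], []
    while queue and not th:
        h = max(queue, key=lambda x: score(x, D))   # delete best h
        queue.remove(h)
        if Q(h, D):
            th.append(h)
        else:
            queue.extend(rho_ideal(h))
        # prune: keep only the k best candidates
        queue = sorted(queue, key=lambda x: score(x, D), reverse=True)[:k]
    return th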

25
Branch-and-bound
  • assume a bound b(h) exists with
  • ∀h' ∈ Lh: h ⪯ h' → f(h') ≤ b(h)
  • then, given the current best value v, we can prune
    all h with v ≥ b(h)
  • can be viewed as a kind of combination of
    complete and heuristic search
  • see example 3.29

26
(Cautious) Specific-to-general
  • Queue := {⊥}
  • Th := ∅
  • WHILE Queue ≠ ∅ DO
  • Delete some h from Queue
  • IF Q(h,D) = true THEN
  • add h to Th
  • ELSE select a d ∈ D such that ¬(h ⪯ d)
  • Queue := Queue ∪ lgg(h,d)
  • return Th
  • see example 3.31; can be seen as computing S for
    (anti-monotonic) rfreq(h,D) ≥ 1
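For item-sets, lgg is unique, so the queue degenerates to a single hypothesis and the search becomes one pass over D (a sketch, starting from a first example rather than ⊥):

def specific_to_general(D):
    it = iter(D)
    h = next(it)                 # a most specific start: the first example
    for d in it:
        if not covers(h, d):     # Q fails: h does not cover d
            h = lgg(h, d)        # generalise minimally
    return h                     # covers all of D, maximally specific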

27
Computing the G border general-to-specific
  • Queue := {⊤}
  • Th := ∅
  • WHILE Queue ≠ ∅ DO
  • Delete h from Queue
  • IF Q(h,D) = true AND h ∈ G THEN
  • add h to Th
  • ELSE IF Q(h,D) = false THEN
  • Queue := Queue ∪ ρo(h)
  • return Th
  • similar for S; when computing both S and G, more
    pruning is possible (see example 3.34)

28
Computing S and G incrementally
  • incrementally update a version space (S, G), e.g.
    when finding all correct h (rfreq(h,P) = 1 ∧
    rfreq(h,N) = 0)
  • needs an msg(g,e) operation, which excludes e from
    g, i.e. minimally specialises g so that it no
    longer covers e (example 3.35)

29
Mitchell's candidate elimination
  • S := {⊥}; G := {⊤}
  • FOR ALL examples e DO
  • IF e ∈ N THEN
  • process negative example
  • ELSE
  • process positive example

30
Process negative example
  • S := S − {s ∈ S | e ∈ c(s)}
  • FOR ALL g ∈ G with e ∈ c(g) DO
  • Δg := {g' ∈ msg(g,e) | ∃s ∈ S: g' ⪯ s}
  • G := (G − {g}) ∪ Δg
  • G := min(G)

31
Process positive example
  • G := G − {g ∈ G | e ∉ c(g)}
  • FOR ALL s ∈ S with e ∉ c(s) DO
  • Δs := {s' ∈ lgg(s,e) | ∃g ∈ G: g ⪯ s'}
  • S := (S − {s}) ∪ Δs
  • S := max(S)
  • see example 3.36, exercise 3.37
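The two update steps combine into a compact sketch for item-sets (⊥ is the full alphabet, ⊤ the empty set, msg(g,e) adds one item missing from e; for item-sets S stays a single chain):

def candidate_elimination(examples, alphabet=ALPHABET):
    # examples: iterable of (e, label), label True for positive examples
    S, G = [alphabet], [frozenset()]
    for e, positive in examples:
        if positive:
            G = [g for g in G if covers(g, e)]       # g must cover e
            S = [s & e for s in S]                   # lgg(s, e)
            S = [s for s in S if any(g <= s for g in G)]
        else:
            S = [s for s in S if not covers(s, e)]   # drop covering s
            new_G = set()
            for g in G:
                if not covers(g, e):
                    new_G.add(g)
                else:                                # msg(g, e)
                    new_G |= {g | {i} for i in alphabet - e
                              if any(g | {i} <= s for s in S)}
            G = [g for g in new_G if not any(h < g for h in new_G)]
    return S, G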

32
Interesting properties
  • S and G contain only a single, identical h ⇒
    converged on a single solution
  • S or G empty ⇒ no solution exists
  • S and G can determine whether any given h is still
    possible
  • S and G can predict some e, i.e. these e carry no
    additional information
  • try exercise 3.39

33
Intersection of version spaces
  • two version spaces can be intersected by
    computing the new S as lgg(s1,s2) for all pairs
    of elements from S1 and S2, and by computing the
    new G as glb(g1,g2) for all pairs of elements
    from G1 and G2
  • one can use this to compute a separate VS for each
    positive example against all negative examples,
  • and then incrementally intersect these VSs
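For item-sets this is a few lines, reusing lgg, glb and the border helpers from above (a sketch):

def intersect_vs(S1, G1, S2, G2):
    S = [lgg(s1, s2) for s1 in S1 for s2 in S2]   # pairwise lgg
    G = [glb(g1, g2) for g1 in G1 for g2 in G2]   # pairwise glb
    return max_specific(S), min_general(G)        # re-minimise the borders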