1
Lecture 33 of 42
Intro to Rule Learning
Wednesday, 11 April 2007
William H. Hsu
Department of Computing and Information Sciences, KSU
http://www.cis.ksu.edu/~bhsu
Readings: Chapter 10, Mitchell
2
Lecture Outline
  • Readings: Sections 10.6-10.8, Mitchell; Section 21.4, Russell and Norvig
  • Suggested Exercises: 10.5, Mitchell
  • Induction as Inverse of Deduction
  • Problem of inductive learning revisited
  • Operators for automated deductive inference
  • Resolution rule for deduction
  • First-order predicate calculus (FOPC) and
    resolution theorem proving
  • Inverting resolution
  • Propositional case
  • First-order case
  • Inductive Logic Programming (ILP)
  • Cigol
  • Progol

3
Induction as Inverted Deduction: Design Principles
4
Induction as Inverted Deduction: Example
  • Deductive Query
  • Pairs ⟨u, v⟩ of people such that u is a child of v
  • Relations (predicates)
  • Child (target predicate)
  • Father, Mother, Parent, Male, Female
  • Learning Problem
  • Formulation
  • Concept learning: target function f is Boolean-valued
  • i.e., target predicate
  • Components
  • Target function: f(xi) ≡ Child (Bob, Sharon)
  • xi: Male (Bob), Female (Sharon), Father (Sharon, Bob)
  • B: Parent (x, y) ← Father (x, y). Parent (x, y) ← Mother (x, y).
  • What h satisfies ∀ ⟨xi, f(xi)⟩ ∈ D . (B ∧ h ∧ xi) ⊢ f(xi)?
  • h1: Child (u, v) ← Father (v, u). (doesn't use B)
  • h2: Child (u, v) ← Parent (v, u). (uses B)

5
Perspectives on Learning and Inference
  • Jevons (1874)
  • First published insight that induction can be
    interpreted as inverted deduction
  • "Induction is, in fact, the inverse operation of deduction, and cannot be conceived to exist without the corresponding operation, so that the question of relative importance cannot arise. Who thinks of asking whether addition or subtraction is the more important process in arithmetic? But at the same time much difference in difficulty may exist between a direct and inverse operation; it must be allowed that inductive investigations are of a far higher degree of difficulty and complexity than any questions of deduction."
  • Aristotle (circa 330 B.C.)
  • Early views on learning from observations
    (examples) and interplay between induction and
    deduction
  • "Scientific knowledge through demonstration [i.e., deduction] is impossible unless a man knows the primary immediate premises ... we must get to know the primary premises by induction; for the method by which even sense-perception implants the universal is inductive."

6
Induction as Inverted Deduction: Operators
  • Deductive Operators
  • Have mechanical operators (F) for finding logically entailed conclusions (C)
  • F(A, B) = C, where A ∧ B ⊢ C
  • A, B, C: logical formulas
  • F: deduction algorithm
  • Intuitive idea: apply deductive inference (aka sequent) rules to A, B to generate C
  • Inductive Operators
  • Need operators O to find inductively inferred hypotheses (h, primary premises)
  • O(B, D) = h, where ∀ ⟨xi, f(xi)⟩ ∈ D . (B ∧ h ∧ xi) ⊢ f(xi)
  • B, D, h: logical formulas describing observations
  • O: induction algorithm

7
Induction as Inverted Deduction: Advantages and Disadvantages
  • Advantages (Pros)
  • Subsumes earlier idea of finding h that fits training data
  • Domain theory B helps define the meaning of fitting the data: B ∧ h ∧ xi ⊢ f(xi)
  • Suggests algorithms that search H guided by B
  • Theory-guided constructive induction [Donoho and Rendell, 1995]
  • aka knowledge-guided constructive induction [Donoho, 1996]
  • Disadvantages (Cons)
  • Doesn't allow for noisy data
  • Q: Why not?
  • A: Consider what ∀ ⟨xi, f(xi)⟩ ∈ D . (B ∧ h ∧ xi) ⊢ f(xi) stipulates
  • First-order logic gives a huge hypothesis space H
  • Overfitting
  • Intractability of calculating all acceptable h's

8
Deduction: Resolution Rule
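(The body of this slide was a graphic in the original deck. The rule it presented is standard propositional resolution, as in Mitchell, Chapter 10: given clauses C1: P ∨ L and C2: ¬L ∨ R containing the complementary literals L and ¬L, conclude the resolvent C: P ∨ R. The next slide traces an example.)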
9
Inverting Resolution: Example
C1: Pass-Exam ∨ ¬Know-Material
C2: Know-Material ∨ ¬Study
Resolution (C1 and C2 resolve on Know-Material):
C: Pass-Exam ∨ ¬Study
Inverse Resolution (given C and C1, recover the second parent):
C2: Know-Material ∨ ¬Study
10
Inverted Resolution: Propositional Logic
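(This slide's content was also graphical. In outline, the propositional inverse resolution operator, per Mitchell: given resolvent C and parent clause C1, choose a literal L ∈ C1 that does not occur in C, and form C2 = (C − (C1 − {L})) ∪ {¬L}; in general C2 is not unique. A minimal runnable sketch of this operator follows, with clauses as sets of string literals; the function names are mine, not from the lecture.)

```python
def negate(literal):
    """Negate a string literal, e.g. 'Study' <-> '~Study'."""
    return literal[1:] if literal.startswith("~") else "~" + literal

def inverse_resolve(c, c1):
    """Yield candidate clauses C2 such that resolving C1 with C2 yields C."""
    for l in c1:
        if l in c:
            continue  # L must be resolved away, so it cannot survive into C
        yield (c - (c1 - {l})) | {negate(l)}

# The Pass-Exam example from the preceding slide:
c1 = {"Pass-Exam", "~Know-Material"}
c = {"Pass-Exam", "~Study"}
for c2 in inverse_resolve(c, c1):
    print(sorted(c2))  # ['Know-Material', '~Study']
```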
11
Quick Review: First-Order Predicate Calculus (FOPC)
  • Components of FOPC Formulas: Quick Intro to Terminology
  • Constants: e.g., John, Kansas, 42
  • Variables: e.g., Name, State, x
  • Predicates: e.g., Father-Of, Greater-Than
  • Functions: e.g., age, cosine
  • Term: constant, variable, or function(term)
  • Literals (atoms): Predicate(term) or its negation (e.g., ¬Greater-Than (age (John), 42))
  • Clause: disjunction of literals with implicit universal quantification
  • Horn clause: at most one positive literal (H ∨ ¬L1 ∨ ¬L2 ∨ … ∨ ¬Ln)
  • FOPC: Representation Language for First-Order Resolution
  • aka First-Order Logic (FOL)
  • Applications
  • Resolution using Horn clauses: logic programming (Prolog)
  • Automated deduction (deductive inference), theorem proving
  • Goal: learn first-order rules by inverting first-order resolution

12
First-Order Resolution
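(Slide body was a graphic. The rule it presented, per Mitchell, Chapter 10, is first-order resolution with unification: (1) find a literal L1 from clause C1, a literal L2 from clause C2, and a substitution θ such that L1θ = ¬L2θ; (2) form the resolvent C = (C1 − {L1})θ ∪ (C2 − {L2})θ.)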
13
Inverted Resolution: First-Order Logic
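(Also graphical in the original. Inverting the rule above: factor the unifier as θ = θ1θ2, with θ1 covering the variables of C1; then, per Mitchell, C2 = (C − (C1 − {L1})θ1)θ2⁻¹ ∪ {¬L1θ1θ2⁻¹}.)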
14
Inverse Resolution Algorithm (Cigol): Example
Given the training example GrandChild (Bob, Shannon) and background facts Father (Shannon, Tom) and Father (Tom, Bob), repeated inverse resolution steps (shown as a trace in the original slide) induce the rule GrandChild (x, y) ← Father (y, z) ∧ Father (z, x).
15
Progol
  • Problem: Searching Resolution Space Results in Combinatorial Explosion
  • Solution Approach
  • Reduce explosion by generating most specific acceptable h
  • Conduct general-to-specific search (cf. Find-G; CN2 / Learn-One-Rule)
  • Procedure
  • 1. User specifies H by stating predicates, functions, and forms of arguments allowed for each
  • 2. Progol uses sequential covering algorithm:
  • FOR each ⟨xi, f(xi)⟩ DO
  • Find most specific hypothesis hi such that B ∧ hi ∧ xi ⊢ f(xi)
  • Actually, considers only entailment within k steps
  • 3. Conduct general-to-specific search bounded by specific hypothesis hi, choosing hypothesis with minimum description length

16
Learning First-Order Rules: Numerical versus Symbolic Approaches
  • Numerical Approaches
  • Method 1: learning classifiers and extracting rules
  • Simultaneous covering: decision trees, ANNs
  • NB: extraction methods may not be simple enumeration of model
  • Method 2: learning rules directly using numerical criteria
  • Sequential covering algorithms and search
  • Criteria: MDL (information gain), accuracy, m-estimate, other heuristic evaluation functions
  • Symbolic Approaches
  • Invert forward inference (deduction) operators
  • Resolution rule
  • Propositional and first-order variants
  • Issues
  • Need to control search
  • Ability to tolerate noise (contradictions): paraconsistent reasoning

17
Learning Disjunctive Sets of Rules
  • Method 1: Rule Extraction from Trees
  • Learn decision tree
  • Convert to rules
  • One rule per root-to-leaf path
  • Recall: can post-prune rules (drop preconditions to improve validation set accuracy)
  • Method 2: Sequential Covering
  • Idea: greedily (sequentially) find rules that apply to (cover) instances in D
  • Algorithm
  • Learn one rule with high accuracy, any coverage
  • Remove positive examples (of target attribute) covered by this rule
  • Repeat

18
Sequential Covering: Algorithm
  • Algorithm Sequential-Covering (Target-Attribute, Attributes, D, Threshold)
  • Learned-Rules ← {}
  • New-Rule ← Learn-One-Rule (Target-Attribute, Attributes, D)
  • WHILE Performance (New-Rule, D) > Threshold DO
  • Learned-Rules.Add-Rule (New-Rule) // add new rule to set
  • D.Remove-Covered-By (New-Rule) // remove examples covered by New-Rule
  • New-Rule ← Learn-One-Rule (Target-Attribute, Attributes, D)
  • Sort-By-Performance (Learned-Rules, Target-Attribute, D)
  • RETURN Learned-Rules
  • What Does Sequential-Covering Do?
  • Learns one rule, New-Rule
  • Takes out every example in D to which New-Rule applies (every covered example)
  • A runnable sketch of this loop follows below

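(A minimal runnable sketch of Sequential-Covering with a greedy, accuracy-guided Learn-One-Rule, assuming rules are conjunctions of attribute = value tests; the data representation and function names here are illustrative, not the lecture's.)

```python
def covers(rule, example):
    """A rule is a dict of attribute -> required value (its preconditions)."""
    return all(example.get(a) == v for a, v in rule.items())

def accuracy(rule, data, target):
    """Fraction of examples covered by the rule whose label equals target."""
    matched = [label for ex, label in data if covers(rule, ex)]
    return matched.count(target) / len(matched) if matched else 0.0

def learn_one_rule(attributes, data, target):
    """Greedy general-to-specific search: add the best single test at a time."""
    rule = {}  # most general rule: empty precondition
    while True:
        candidates = [dict(rule, **{a: v})
                      for a in attributes if a not in rule
                      for v in {ex[a] for ex, _ in data}]
        if not candidates:
            return rule
        best = max(candidates, key=lambda r: accuracy(r, data, target))
        if accuracy(best, data, target) <= accuracy(rule, data, target):
            return rule  # no specialization improves performance
        rule = best

def sequential_covering(attributes, data, target, threshold=0.9):
    learned, remaining = [], list(data)
    while remaining:
        rule = learn_one_rule(attributes, remaining, target)
        if accuracy(rule, remaining, target) <= threshold:
            break
        learned.append(rule)
        # remove the examples covered by the new rule, keep the rest
        remaining = [(ex, l) for ex, l in remaining if not covers(rule, ex)]
    return learned

data = [({"Outlook": "Sunny", "Wind": "Weak"}, "Yes"),
        ({"Outlook": "Sunny", "Wind": "Strong"}, "No"),
        ({"Outlook": "Rain", "Wind": "Weak"}, "No")]
# prints [{'Outlook': 'Sunny', 'Wind': 'Weak'}]
print(sequential_covering(["Outlook", "Wind"], data, "Yes"))
```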
19
Learn-One-Rule: (Beam) Search for Preconditions
(Figure: beam search over candidate preconditions, starting from the most general rule IF {} THEN Play-Tennis = Yes)
20
Learn-One-Rule: Algorithm
  • Algorithm Sequential-Covering (Target-Attribute, Attributes, D)
  • Pos ← D.Positive-Examples()
  • Neg ← D.Negative-Examples()
  • WHILE NOT Pos.Empty() DO // learn new rule
  • New-Rule ← Learn-One-Rule (Target-Attribute, Attributes, D)
  • Learned-Rules.Add-Rule (New-Rule)
  • Pos.Remove-Covered-By (New-Rule)
  • RETURN (Learned-Rules)
  • Algorithm Learn-One-Rule (Target-Attribute, Attributes, D)
  • New-Rule ← most general rule possible
  • New-Rule-Neg ← Neg
  • WHILE NOT New-Rule-Neg.Empty() DO // specialize New-Rule
  • 1. Candidate-Literals ← Generate-Candidates() // NB: rank by Performance()
  • 2. Best-Literal ← argmax over L ∈ Candidate-Literals of Performance (Specialize-Rule (New-Rule, L), Target-Attribute, D) // all possible new constraints
  • 3. New-Rule.Add-Precondition (Best-Literal) // add the best one
  • 4. New-Rule-Neg ← New-Rule-Neg.Filter-By (New-Rule)
  • RETURN (New-Rule)

21
Learn-One-Rule: Subtle Issues
  • How Does Learn-One-Rule Implement Search?
  • Effective approach: Learn-One-Rule organizes H in same general fashion as ID3
  • Difference
  • Follows only most promising branch in tree at each step
  • Only one attribute-value pair (versus splitting on all possible values)
  • General-to-specific search (depicted in figure)
  • Problem: greedy depth-first search susceptible to local optima
  • Solution approach: beam search (rank by performance, always expand k best)
  • Easily generalizes to multi-valued target functions (how?)
  • Designing Evaluation Function to Guide Search
  • Performance (Rule, Target-Attribute, D)
  • Possible choices (sketched in code below)
  • Entropy (i.e., information gain) as for ID3
  • Sample accuracy: nc / n ≡ correct rule predictions / total predictions
  • m-estimate: (nc + mp) / (n + m), where m ≡ weight, p ≡ prior of rule RHS

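(A small illustrative sketch of the three candidate Performance functions above, assuming a rule that covers n examples of which nc are predicted correctly and binary labels; the function names and the default m are mine.)

```python
from math import log2

def sample_accuracy(n_correct, n):
    """n_c / n: fraction of covered examples the rule predicts correctly."""
    return n_correct / n if n else 0.0

def m_estimate(n_correct, n, p, m=2.0):
    """(n_c + m*p) / (n + m): accuracy smoothed toward the prior p of the
    rule's RHS; m weights the prior like a virtual sample size."""
    return (n_correct + m * p) / (n + m)

def neg_entropy(n_correct, n):
    """Negative entropy of the covered examples; higher is better, as in ID3."""
    if n == 0 or n_correct in (0, n):
        return 0.0
    q = n_correct / n
    return q * log2(q) + (1 - q) * log2(1 - q)

print(sample_accuracy(8, 10), m_estimate(8, 10, p=0.5), neg_entropy(8, 10))
```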
22
Variants of Rule Learning Programs
  • Sequential or Simultaneous Covering of Data?
  • Sequential: isolate components of hypothesis (e.g., search for one rule at a time)
  • Simultaneous: whole hypothesis at once (e.g., search for whole tree at a time)
  • General-to-Specific or Specific-to-General?
  • General-to-specific: add preconditions; Find-G
  • Specific-to-general: drop preconditions; Find-S
  • Generate-and-Test or Example-Driven?
  • Generate-and-test: search through syntactically legal hypotheses
  • Example-driven: Find-S, Candidate-Elimination, Cigol (next time)
  • Post-Pruning of Rules?
  • Recall (Lecture 5): very popular overfitting recovery method
  • What Statistical Evaluation Method?
  • Entropy
  • Sample accuracy (aka relative frequency)
  • m-estimate of accuracy

23
First-Order Rules
  • What Are First-Order Rules?
  • Well-formed formulas (WFFs) of first-order predicate calculus (FOPC)
  • Sentences of first-order logic (FOL)
  • Example (recursive)
  • Ancestor (x, y) ← Parent (x, y).
  • Ancestor (x, y) ← Parent (x, z) ∧ Ancestor (z, y).
  • Components of FOPC Formulas: Quick Intro to Terminology
  • Constants: e.g., John, Kansas, 42
  • Variables: e.g., Name, State, x
  • Predicates: e.g., Father-Of, Greater-Than
  • Functions: e.g., age, cosine
  • Term: constant, variable, or function(term)
  • Literals (atoms): Predicate(term) or its negation (e.g., ¬Greater-Than (age (John), 42))
  • Clause: disjunction of literals with implicit universal quantification
  • Horn clause: at most one positive literal (H ∨ ¬L1 ∨ ¬L2 ∨ … ∨ ¬Ln)

24
Learning First-Order Rules
  • Why Do That?
  • Can learn sets of rules such as
  • Ancestor (x, y) ← Parent (x, y).
  • Ancestor (x, y) ← Parent (x, z) ∧ Ancestor (z, y).
  • General-purpose (Turing-complete) programming language: PROLOG
  • Programs are such sets of rules (Horn clauses)
  • Inductive logic programming (next time): kind of program synthesis
  • Caveat
  • Arbitrary inference using first-order rules is semi-decidable
  • Recursively enumerable but not recursive (reduction from the halting problem LH)
  • Compare: resolution theorem-proving; arbitrary queries in Prolog
  • Generally, may have to restrict power
  • Inferential completeness
  • Expressive power of Horn clauses
  • Learning part

25
First-Order Rule: Example
  • Prolog (FOPC) Rule for Classifying Web Pages
  • [Slattery, 1997]
  • Course (A) ←
  •   Has-Word (A, instructor),
  •   not Has-Word (A, good),
  •   Link-From (A, B),
  •   Has-Word (B, assign),
  •   not Link-From (B, C).
  • Train: 31/31, test: 31/34
  • How Are Such Rules Used?
  • Implement search-based (inferential) programs
  • References
  • Chapters 1-10, Russell and Norvig
  • Online resources at http://archive.comlab.ox.ac.uk/logic-prog.html

26
First-Order Inductive Learning (FOIL): Algorithm
  • Algorithm FOIL (Target-Predicate, Predicates, D)
  • Pos ← D.Filter-By (Target-Predicate) // examples for which it is true
  • Neg ← D.Filter-By (Not (Target-Predicate)) // examples for which it is false
  • WHILE NOT Pos.Empty() DO // learn new rule
  • New-Rule ← Learn-One-First-Order-Rule (Target-Predicate, Predicates, D)
  • Learned-Rules.Add-Rule (New-Rule)
  • Pos.Remove-Covered-By (New-Rule)
  • RETURN (Learned-Rules)
  • Algorithm Learn-One-First-Order-Rule (Target-Predicate, Predicates, D)
  • New-Rule ← the rule that predicts Target-Predicate with no preconditions
  • New-Rule-Neg ← Neg
  • WHILE NOT New-Rule-Neg.Empty() DO // specialize New-Rule
  • 1. Candidate-Literals ← Generate-Candidates() // based on Predicates
  • 2. Best-Literal ← argmax over L ∈ Candidate-Literals of FOIL-Gain (L, New-Rule, Target-Predicate, D) // all possible new literals
  • 3. New-Rule.Add-Precondition (Best-Literal) // add the best one
  • 4. New-Rule-Neg ← New-Rule-Neg.Filter-By (New-Rule)
  • RETURN (New-Rule)

27
Specializing Rules in FOIL
  • Learning Rule: P(x1, x2, …, xk) ← L1 ∧ L2 ∧ … ∧ Ln.
  • Candidate Specializations (a generator sketch follows below)
  • Add new literal to get more specific Horn clause
  • Form of literal
  • Q(v1, v2, …, vr), where at least one of the vi in the created literal must already exist as a variable in the rule
  • Equal(xj, xk), where xj and xk are variables already present in the rule
  • The negation of either of the above forms of literals

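(An illustrative generator for these candidate literal forms; the function and variable names are mine, and a real FOIL implementation also prunes and type-checks candidates.)

```python
from itertools import product

def candidate_literals(predicates, rule_vars, max_new_vars=1):
    """Yield FOIL-style candidate literals as (name, args) tuples, negations
    wrapped as ('not', literal). predicates maps predicate name -> arity."""
    new_vars = [f"v{i}" for i in range(max_new_vars)]
    pool = list(rule_vars) + new_vars
    for q, arity in predicates.items():
        for args in product(pool, repeat=arity):
            # at least one variable must already occur in the rule
            if not any(a in rule_vars for a in args):
                continue
            yield (q, args)
            yield ("not", (q, args))
    for x, y in product(rule_vars, repeat=2):
        if x < y:  # each unordered pair once
            yield ("Equal", (x, y))
            yield ("not", ("Equal", (x, y)))

# e.g., candidates for specializing a rule over variables x, y with Father/2:
for lit in candidate_literals({"Father": 2}, ["x", "y"]):
    print(lit)
```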
28
Information Gain in FOIL
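(The body of this slide was a graphic. The formula it presented, per Mitchell, Section 10.6, is
FOIL-Gain (L, R) ≡ t (log2 (p1 / (p1 + n1)) - log2 (p0 / (p0 + n0)))
where p0 and n0 are the numbers of positive and negative bindings of rule R, p1 and n1 are those of the specialized rule R' = R + L, and t is the number of positive bindings of R still covered after adding L.)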
29
FOIL: Learning Recursive Rule Sets
  • Recursive Rules
  • So far: ignored possibility of recursive WFFs
  • New literals added to rule body could refer to target predicate itself
  • i.e., predicate occurs in rule head
  • Example
  • Ancestor (x, y) ← Parent (x, z) ∧ Ancestor (z, y).
  • Rule: IF Parent (x, z) ∧ Ancestor (z, y) THEN Ancestor (x, y)
  • Learning Recursive Rules from Relations
  • Given appropriate set of training examples
  • Can learn using FOIL-based search
  • Requirement: Ancestor ∈ Predicates (symbol is member of candidate set)
  • Recursive rules still have to outscore competing candidates at FOIL-Gain
  • NB: how to ensure termination? (well-founded ordering, i.e., no infinite recursion)
  • [Quinlan, 1990; Cameron-Jones and Quinlan, 1993]

30
FOIL: Summary
  • Extends Sequential-Covering Algorithm
  • Handles case of learning first-order rules similar to Horn clauses
  • Result: more powerful rules for performance element (automated reasoning)
  • General-to-Specific Search
  • Adds literals (predicates and negations over functions, variables, constants)
  • Can learn sets of recursive rules
  • Caveat: might learn infinitely recursive rule sets
  • Has been shown to successfully induce recursive rules in some cases
  • Overfitting
  • If no noise, might keep adding new literals until rule covers no negative examples
  • Solution approach: tradeoff (heuristic evaluation function on rules)
  • Accuracy, coverage, complexity
  • FOIL-Gain: an MDL function
  • Overfitting recovery in FOIL: post-pruning

31
Terminology
  • Induction and Deduction
  • Induction: finding h such that ∀ ⟨xi, f(xi)⟩ ∈ D . (B ∧ h ∧ xi) ⊢ f(xi)
  • Inductive learning: B ≡ background knowledge (inductive bias, etc.)
  • Developing inverse deduction operators
  • Deduction: finding entailed logical statements; F(A, B) = C, where A ∧ B ⊢ C
  • Inverse deduction: finding hypotheses; O(B, D) = h, where ∀ ⟨xi, f(xi)⟩ ∈ D . (B ∧ h ∧ xi) ⊢ f(xi)
  • Resolution rule: deductive inference rule (P ∨ L, ¬L ∨ R ⊢ P ∨ R)
  • Propositional logic: Boolean terms, connectives (∧, ∨, ¬, ⇒)
  • First-order predicate calculus (FOPC): well-formed formulas (WFFs), aka clauses (defined over literals, connectives, implicit quantifiers)
  • Inverse entailment: inverse of resolution operator
  • Inductive Logic Programming (ILP)
  • Cigol: ILP algorithm that uses inverse entailment
  • Progol: sequential covering (general-to-specific search) algorithm for ILP

32
Summary Points
  • Induction as Inverse of Deduction
  • Problem of induction revisited
  • Definition of induction
  • Inductive learning as specific case
  • Role of induction, deduction in automated
    reasoning
  • Operators for automated deductive inference
  • Resolution rule (and operator) for deduction
  • First-order predicate calculus (FOPC) and
    resolution theorem proving
  • Inverting resolution
  • Propositional case
  • First-order case (inverse entailment operator)
  • Inductive Logic Programming (ILP)
  • Cigol: inverse entailment (very susceptible to combinatorial explosion)
  • Progol: sequential covering, general-to-specific search using inverse entailment
  • Next Week: Knowledge Discovery in Databases (KDD), Final Review