Title: Monday, January 22, 2001
1Lecture 3
Data Mining Basics
Monday, January 22, 2001 William H.
Hsu Department of Computing and Information
Sciences, KSU http//www.cis.ksu.edu/bhsu Readin
gs Chapter 1-2, Witten and Frank Sections
2.7-2.8, Mitchell
2Lecture Outline
- Read Chapters 1-2, Witten and Frank 2.7-2.8,
Mitchell - Homework 1 Due Friday, February 2, 2001 (before
12 AM CST) - Paper Commentary 1 Due This Friday (in class)
- U. Fayyad, From Data Mining to Knowledge
Discovery - See guidelines in course notes
- Supervised Learning (continued)
- Version spaces
- Candidate elimination algorithm
- Derivation
- Examples
- The Need for Inductive Bias
- Representations (hypothesis languages) a
worst-case scenario - Change of representation
- Computational Learning Theory
3Representing Version Spaces
- Hypothesis Space
- A finite meet semilattice (partial ordering
Less-Specific-Than ? ? all ?) - Every pair of hypotheses has a greatest lower
bound (GLB) - VSH,D ? the consistent poset (partially-ordered
subset of H) - Definition General Boundary
- General boundary G of version space VSH,D set
of most general members - Most general ? minimal elements of VSH,D ? set
of necessary conditions - Definition Specific Boundary
- Specific boundary S of version space VSH,D set
of most specific members - Most specific ? maximal elements of VSH,D ? set
of sufficient conditions - Version Space
- Every member of the version space lies between S
and G - VSH,D ? h ? H ? s ? S . ? g ? G . g ?P h ?P
s where ?P ? Less-Specific-Than
4Candidate Elimination Algorithm 1
1. Initialization G ? (singleton) set containing
most general hypothesis in H, denoted lt?, ,
?gt S ? set of most specific hypotheses in H,
denoted ltØ, , Øgt 2. For each training example
d If d is a positive example (Update-S) Remove
from G any hypotheses inconsistent with d For
each hypothesis s in S that is not consistent
with d Remove s from S Add to S all minimal
generalizations h of s such that 1. h is
consistent with d 2. Some member of G is more
general than h (These are the greatest lower
bounds, or meets, s ? d, in VSH,D) Remove from S
any hypothesis that is more general than another
hypothesis in S (remove any dominated elements)
5Candidate Elimination Algorithm 2
(continued) If d is a negative example
(Update-G) Remove from S any hypotheses
inconsistent with d For each hypothesis g in G
that is not consistent with d Remove g from G Add
to G all minimal specializations h of g such
that 1. h is consistent with d 2. Some member
of S is more specific than h (These are the least
upper bounds, or joins, g ? d, in VSH,D) Remove
from G any hypothesis that is less general than
another hypothesis in G (remove any dominating
elements)
6Example Trace
d1 ltSunny, Warm, Normal, Strong, Warm, Same, Yesgt
d2 ltSunny, Warm, High, Strong, Warm, Same, Yesgt
d3 ltRainy, Cold, High, Strong, Warm, Change, Nogt
d4 ltSunny, Warm, High, Strong, Cool, Change, Yesgt
7What Next Training Example?
- What Query Should The Learner Make Next?
- How Should These Be Classified?
- ltSunny, Warm, Normal, Strong, Cool, Changegt
- ltRainy, Cold, Normal, Light, Warm, Samegt
- ltSunny, Warm, Normal, Light, Warm, Samegt
8What Justifies This Inductive Leap?
- Example Inductive Generalization
- Positive example ltSunny, Warm, Normal, Strong,
Cool, Change, Yesgt - Positive example ltSunny, Warm, Normal, Light,
Warm, Same, Yesgt - Induced S ltSunny, Warm, Normal, ?, ?, ?gt
- Why Believe We Can Classify The Unseen?
- e.g., ltSunny, Warm, Normal, Strong, Warm, Samegt
- When is there enough information (in a new case)
to make a prediction?
9An Unbiased Learner
- Example of A Biased H
- Conjunctive concepts with dont cares
- What concepts can H not express? (Hint what
are its syntactic limitations?) - Idea
- Choose H that expresses every teachable concept
- i.e., H is the power set of X
- Recall A ? B B A (A X B
labels H A ? B) - Rainy, Sunny ? Warm, Cold ? Normal, High ?
None, Mild, Strong ? Cool, Warm ? Same,
Change ? 0, 1 - An Exhaustive Hypothesis Language
- Consider H disjunctions (?), conjunctions
(?), negations () over previous H - H 2(2 2 2 3 2 2) 296 H
1 (3 3 3 4 3 3) 973 - What Are S, G For The Hypothesis Language H?
- S ? disjunction of all positive examples
- G ? conjunction of all negated negative examples
10Inductive Bias
- Components of An Inductive Bias Definition
- Concept learning algorithm L
- Instances X, target concept c
- Training examples Dc ltx, c(x)gt
- L(xi, Dc) classification assigned to instance
xi by L after training on Dc - Definition
- The inductive bias of L is any minimal set of
assertions B such that, for any target concept c
and corresponding training examples Dc, ? xi
? X . (B ? Dc ? xi) ? L(xi, Dc) where A ? B
means A logically entails B - Informal idea preference for (i.e., restriction
to) certain hypotheses by structural (syntactic)
means - Rationale
- Prior assumptions regarding target concept
- Basis for inductive generalization
11Inductive Systemsand Equivalent Deductive Systems
12Three Learners with Different Biases
- Rote Learner
- Weakest bias anything seen before, i.e., no bias
- Store examples
- Classify x if and only if it matches previously
observed example - Version Space Candidate Elimination Algorithm
- Stronger bias concepts belonging to conjunctive
H - Store extremal generalizations and
specializations - Classify x if and only if it falls within S and
G boundaries (all members agree) - Find-S
- Even stronger bias most specific hypothesis
- Prior assumption any instance not observed to be
positive is negative - Classify x based on S set
13Hypothesis SpaceA Syntactic Restriction
- Recall 4-Variable Concept Learning Problem
- Bias Simple Conjunctive Rules
- Only 16 simple conjunctive rules of the form y
xi ? xj ? xk - y Ø, x1, , x4, x1 ? x2, , x3 ? x4, x1 ? x2 ?
x3, , x2 ? x3 ? x4, x1 ? x2 ? x3 ? x4 - Example above no simple rule explains the data
(counterexamples?) - Similarly for simple clauses (conjunction and
disjunction allowed)
14Hypothesis Spacem-of-n Rules
- m-of-n Rules
- 32 possible rules of the form y 1 iff
at least m of the following n variables are 1 - Found A Consistent Hypothesis!
15Views of Learning
- Removal of (Remaining) Uncertainty
- Suppose unknown function was known to be m-of-n
Boolean function - Could use training data to infer the function
- Learning and Hypothesis Languages
- Possible approach to guess a good, small
hypothesis language - Start with a very small language
- Enlarge until it contains a hypothesis that fits
the data - Inductive bias
- Preference for certain languages
- Analogous to data compression (removal of
redundancy) - Later coding the model versus coding the
uncertainty (error) - We Could Be Wrong!
- Prior knowledge could be wrong (e.g., y x4 ?
one-of (x1, x3) also consistent) - If guessed language was wrong, errors will occur
on new cases
16Two Strategies for Machine Learning
- Develop Ways to Express Prior Knowledge
- Role of prior knowledge guides search for
hypotheses / hypothesis languages - Expression languages for prior knowledge
- Rule grammars stochastic models etc.
- Restrictions on computational models other
(formal) specification methods - Develop Flexible Hypothesis Spaces
- Structured collections of hypotheses
- Agglomeration nested collections (hierarchies)
- Partitioning decision trees, lists, rules
- Neural networks cases, etc.
- Hypothesis spaces of adaptive size
- Either Case Develop Algorithms for Finding A
Hypothesis That Fits Well - Ideally, will generalize well
- Later Bias Optimization (Meta-Learning, Wrappers)
17Terminology
- The Version Space Algorithm
- Version space constructive definition
- S and G boundaries characterize learners
uncertainty - Version space can be used to make predictions
over unseen cases - Algorithms Find-S, List-Then-Eliminate,
candidate elimination - Consistent hypothesis - one that correctly
predicts observed examples - Version space - space of all currently consistent
(or satisfiable) hypotheses - Inductive Bias
- Strength of inductive bias how few hypotheses?
- Specific biases based on specific languages
- Hypothesis Language
- Searchable subset of the space of possible
descriptors - m-of-n, conjunctive, disjunctive, clauses
- Ability to represent a concept
18Summary Points
- Introduction to Supervised Concept Learning
- Inductive Leaps Possible Only if Learner Is
Biased - Futility of learning without bias
- Strength of inductive bias proportional to
restrictions on hypotheses - Modeling Inductive Learners with Equivalent
Deductive Systems - Representing inductive learning as theorem
proving - Equivalent learning and inference problems
- Syntactic Restrictions
- Example m-of-n concept
- Other examples?
- Views of Learning and Strategies
- Removing uncertainty (data compression)
- Role of knowledge
- Next Lecture More on Knowledge Discovery in
Databases (KDD)