Title: Introducing Semantics into Machine Learning
1. Knowledge-Based Discovery: Using Semantics in Machine Learning
Bruce Buchanan, Joe Phillips
University of Pittsburgh
buchanan _at_ cs.pitt.edu, josephp _at_ cs.pitt.edu
2. Intelligent Systems Laboratory
- Faculty: Bruce Buchanan (P.I.), John Aronis
- Collaborators: John Rosenberg (Biol. Sci.), Greg Cooper (Medicine), Bob Ferrell (Genetics), Janyce Wiebe (CS), Lou Penrod (Rehab. Med.), Rich Simpson (Rehab. Sci.), Russ Altman (Stanford MIS)
- Research Associates: Joe Phillips, Paul Hodor, Vanathi Gopalakrishnan, Wendy Chapman
- Ph.D. Students: Gary Livingston, Dan Hennessy, Venkat Kolluri, Will Bridewell, Lili Ma
- M.S. Students: Karl Gossett
3. GOALS
(A) Learn understandable, interesting rules from data.
(B) Construct an understandable, coherent model from the rules.
METHOD: Use background knowledge to search for
- simple rules with familiar predicates
- interesting and novel rules
- coherent models
4. Rules or Models: Understandable and Interesting
- Familiar syntax (conditional rules)
- Syntactically simple
- Semantically simple
- Familiar predicates
- Accurate predictions
- Meaningful rules
- Relevant to the question
- Novel
- Cost-effective
- Coherent model
5. The RL Program
[Diagram: Training Examples, an Explicit Bias, and a Partial Domain Model are inputs to RL, which produces RULES; HAMB assembles the rules into a MODEL; a Performance Program applies the model to New Cases to make Predictions.]
6. (A) Individual Rules
- J. Phillips
- Rehabilitation Medicine data
7. Simple Single Rules
- Syntactic simplicity
  - Fewer terms on the LHS
  - Explicitly stated constraints (rules with no more than N terms)
  - Tagged attributes (e.g., a rule must have at least one control attribute to be interesting)
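As a sketch, these syntactic constraints amount to a post-filter over candidate rules. The `Rule` encoding, the `max_terms` cutoff, and the control-attribute tags below are illustrative assumptions, not RL's actual data structures:

```python
# Sketch of syntactic-simplicity filtering for single rules.
# The rule encoding (a list of (attribute, op, value) LHS conditions)
# and the attribute names are illustrative assumptions.

def is_simple(rule_lhs, max_terms=3):
    """Keep rules with no more than max_terms conjuncts on the LHS."""
    return len(rule_lhs) <= max_terms

def is_interesting(rule_lhs, control_attrs):
    """Require at least one tagged 'control' attribute on the LHS."""
    return any(attr in control_attrs for attr, _op, _val in rule_lhs)

rules = [
    [("age", ">", 65), ("therapy_rate", ">", 2)],            # 2 terms, 1 control
    [("age", ">", 65), ("race", "=", "X"), ("sex", "=", "F"),
     ("admit_time", "<", 10)],                               # 4 terms: too long
]
control = {"therapy_rate", "admit_time"}

kept = [r for r in rules if is_simple(r) and is_interesting(r, control)]
```

Only the first rule survives: it is short enough and mentions a control attribute.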
8. Simple Sets of Rules
- Syntactic simplicity
  - Fewer rules
  - Independent rules
- E.g., in physics: U(x) = U_gravity(x) + U_electronic(x) + U_magnetic(x)
- HAMB removes highly similar terms from the feature set
- There is less independence when there's feedback (e.g., in medicine)
9. Interestingness
- Given, controlled, and observed attributes
  - Explicitly state observed attributes as the interesting target
- Temporal
  - Future (or distant-past) predictions are interesting
- Influence diagrams (e.g., Bayes nets)
  - Strong but more indirect influences are interesting
10. Using Typed-Attribute Background Knowledge
- Organize terms into given, controlled, and observed
  - E.g., in a medical domain: demographics, intervention, and outcome
- Benefits
  - Categorization of rules by whether they use givens (default), controls (controllable), or both (conditionally controllable)
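A minimal sketch of this categorization, assuming a hypothetical attribute-to-type table (the attribute names are illustrative, not taken from the rehabilitation dataset):

```python
# Sketch: categorize rules by the types of attributes on their LHS.
# The given/controlled/observed typing is supplied as background
# knowledge; this table and the attribute names are illustrative.

ATTR_TYPE = {
    "age": "given", "race": "given", "sex": "given",
    "therapy_rate": "controlled", "admit_time": "controlled",
    "improvement": "observed",
}

def categorize(lhs_attrs):
    types = {ATTR_TYPE[a] for a in lhs_attrs}
    if types == {"given"}:
        return "default"                    # uses only givens
    if types == {"controlled"}:
        return "controllable"               # uses only controls
    return "conditionally controllable"     # mixes givens and controls

print(categorize(["age", "sex"]))           # default
print(categorize(["therapy_rate"]))         # controllable
print(categorize(["age", "therapy_rate"]))  # conditionally controllable
```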
11. Typed-Attribute Example
- Rehab. data (RL; Phillips, Buchanan, Penrod)
- > 2000 records
[Diagram: attributes organized by type. Given: demographic (age, race, sex) and medical. Controlled: temporal (admit time, rate; absolute vs. normalized) and medical. Observed: medical (general_condition, specific_condition).]
12. Example Interestingness
- Group rules by whether they predict from medical attributes, demographic attributes, or both
- By medical:
  - Left_Body_Stroke => poor improvement (interesting, expected)
- By demographic:
  - High_age => poor improvement (interesting, expected)
  - Race = X => poor improvement (interesting, NOT expected)
13. Using Temporal Background Knowledge
- Organize data by time
  - Utility may or may not extend to other metric spaces (e.g., space, mass)
- Benefits
  - Predictions parameterized by time: f(t)
  - Future or distant past may be interesting
  - Cyclical patterns
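As an illustration of a prediction parameterized by time, here is a minimal least-squares fit of f(t) = m*t + b. This is a generic sketch of time-parameterized prediction, not the RL or Scienceomatic fitting procedure, and the data are made up:

```python
# Sketch: learn a time-parameterized prediction f(t) from timestamped
# data via a simple least-squares line (illustrative data and method).

def fit_linear(ts, ys):
    """Return slope m and intercept b minimizing squared error."""
    n = len(ts)
    mt, my = sum(ts) / n, sum(ys) / n
    m = (sum((t - mt) * (y - my) for t, y in zip(ts, ys))
         / sum((t - mt) ** 2 for t in ts))
    b = my - m * mt
    return m, b

ts = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]     # synthetic data lying on y = 2t + 1
m, b = fit_linear(ts, ys)

def f(t):
    """Prediction parameterized by time."""
    return m * t + b
```

Because f(t) carries the time parameter explicitly, it can be queried at future (or distant-past) times, which is where the interesting predictions live.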
14. Temporal Example
- Geophysics (Scienceomatic; Phillips 2000)
- Subduction-zone discoveries of the form
  d(q_after) = d(q_main) + m * [t(q_after) - t(q_main)] + b
- NOTE: This is not an accurate prediction!
  - Interesting nonetheless, since quakes generally can't be predicted
15. Using Influence-Diagram Background Knowledge
- This is future work!
- Organize terms to follow a pre-existing influence diagram
  - E.g., Bayesian nets, though conditional probabilities are not needed
- Benefits
  - Suggest hidden variables and new influences
  - f(x) => f(x, y)
16. Interestingness Summary
- How different types of background knowledge help us achieve interestingness:
  - Explicitly stated observed attributes
  - Implicitly stated parameterized equations with interesting parameters
  - Learned new influence factors
17. (B) Coherent Models
18. EXAMPLE: Predicting Ca Binding Sites (G. Livingston)
Given: 3-D descriptions of 16 sites in proteins that bind calcium ions, and 100 other sites that do not.
Find: a model that predicts whether a proposed new site will bind Ca, in terms of a subset of 63 attributes.
19. Ca Binding Sites in Proteins
SOME ATTRIBUTES:
ATOM-NAME-IS-C
ATOM-NAME-IS-O
CHARGE
CHARGE-WITH-HIS
HYDROPHOBICITY
MOBILITY
RESIDUE-CLASS1-IS-CHARGED
RESIDUE-CLASS1-IS-HYDROPHOBIC
RESIDUE-CLASS2-IS-ACIDIC
RESIDUE-CLASS2-IS-NONPOLAR
RESIDUE-CLASS2-IS-UNKNOWN
RESIDUE-NAME-IS-ASP
RESIDUE-NAME-IS-GLU
RESIDUE-NAME-IS-HOH
RESIDUE-NAME-IS-LEU
RESIDUE-NAME-IS-VAL
RING-SYSTEM
SECONDARY-STRUCTURE1-IS-4-HELIX
SECONDARY-STRUCTURE1-IS-BEND
SECONDARY-STRUCTURE1-IS-HET
SECONDARY-STRUCTURE1-IS-TURN
SECONDARY-STRUCTURE2-IS-BETA
SECONDARY-STRUCTURE2-IS-HET
VDW-VOLUME
20. Predicting Ca Binding Sites: Semantic Types of Attributes
E.g.:
- Physical: solvent accessibility, VDW volume, mobility
- Chemical: charge, heteroatom, oxygen, carbonyl, ASN
- Structural: helix, beta-turn, ring-system
21. Coherent Model
A subset of locally acceptable rules that:
- explains as much of the data as possible
- uses entrenched predicates [Goodman]
- uses predicates of the same semantic type
- uses predicates of the same grain size
- uses classes AND their complements
- avoids rules that are "too similar": identical, subsuming, or semantically close
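The "too similar" criterion can be sketched as a pruning pass over the rule set. The rule encoding below is hypothetical, and semantic closeness (which would need the attribute network) is omitted:

```python
# Sketch: prune identical and subsumed rules when assembling a coherent
# model. A rule's LHS is a frozenset of conditions; rule A subsumes rule
# B if A's conditions are a strict subset of B's and both predict the
# same class. The encoding is illustrative, not RL's.

def subsumes(a, b):
    return (a["then"] == b["then"]
            and a["if"] <= b["if"]      # frozenset subset test
            and a["if"] != b["if"])

def prune(rules):
    kept = []
    for r in rules:
        if any(k["if"] == r["if"] and k["then"] == r["then"] for k in kept):
            continue                    # identical to a kept rule
        if any(subsumes(k, r) for k in kept):
            continue                    # a more general kept rule covers it
        kept.append(r)
    return kept

rules = [
    {"if": frozenset({("oxygen", ">", 6.5)}), "then": "SITE"},
    {"if": frozenset({("oxygen", ">", 6.5), ("charge", ">", 18.5)}),
     "then": "SITE"},                   # subsumed by the first rule
    {"if": frozenset({("oxygen", ">", 6.5)}), "then": "SITE"},  # duplicate
]
kept = prune(rules)
```

This greedy pass assumes more general rules arrive first; a fuller version would first sort candidates by generality.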
22. EXAMPLE: Predict Ca Binding Sites in Proteins
158 rules found independently. E.g.:
R1: IF a site (a) has charge > 18.5 AND (b) no. of CO > 18.75, THEN it binds calcium.
R2: IF a site (a) has charge > 18.5 AND (b) no. of ASN > 15, THEN it binds calcium.
23. Predicting Ca Binding Sites: Semantic Network of Attributes
[Diagram: heteroatoms (Sulfur, Oxygen, Nitrogen, ...) linked to functional groups and their residues: SH to CYS; "Hydroxyl" (OH) to SER, THR, TYR; Carbonyl to ASP, GLU; Amide to ASN, GLN; Amine to ... PRO.]
24. Ca Binding Sites in Proteins
58 rules above threshold (threshold: at least 80% TP AND no more than 20% FP).
42 rules predict SITE; 16 rules predict NON-SITE.
Average accuracy over five 5-fold cross-validations: 100% for the redundant model with 58 rules.
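A sketch of the thresholding step, with hypothetical per-rule TP/FP rates (only the 80%/20% cutoffs come from the slide):

```python
# Sketch: keep only rules above the performance threshold
# (at least 80% true positives AND no more than 20% false positives).
# The candidate rules and their rates are made up for illustration.

def above_threshold(tp_rate, fp_rate, min_tp=0.80, max_fp=0.20):
    return tp_rate >= min_tp and fp_rate <= max_fp

candidates = [
    ("R1", 1.00, 0.00),   # a perfect rule
    ("R2", 0.85, 0.15),
    ("R3", 0.60, 0.05),   # too few true positives
    ("R4", 0.90, 0.35),   # too many false positives
]
kept = [name for name, tp, fp in candidates if above_threshold(tp, fp)]
```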
25. Predicting Ca Binding Sites
Prefer complementary rules, e.g.:
R59: IF, within 5 Å of a site, oxygens > 6.5, THEN it binds calcium.
R101: IF, within 5 Å of a site, oxygens < 6.5, THEN it does NOT bind calcium.
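One way to sketch the preference for complementary rules: look for the same attribute split at the same cutoff with opposite conclusions, like R59/R101 above. The tuple encoding is illustrative, and the complement of > is taken as <=, matching the LE/GT rules of the 5 Å model:

```python
# Sketch: detect complementary rule pairs -- same attribute, same
# cutoff, opposite conclusions. The (attr, op, value, class) tuple
# encoding is an illustrative assumption.

def complementary(r1, r2):
    (a1, op1, v1, c1), (a2, op2, v2, c2) = r1, r2
    return (a1 == a2 and v1 == v2 and c1 != c2
            and {op1, op2} == {">", "<="})

r59 = ("oxygens", ">", 6.5, "SITE")
r101 = ("oxygens", "<=", 6.5, "NON-SITE")
r_other = ("charge", ">", 18.5, "SITE")

print(complementary(r59, r101))     # True: a complementary pair
print(complementary(r59, r_other))  # False: different attributes
```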
26. 5 Å Radius Model
Five perfect rules:
R1: Oxygen <= 6.5 --> NON-SITE
R2: Hydrophobicity > -8.429 --> NON-SITE
R3: Oxygen > 6.5 --> SITE
R4: Hydrophobicity <= -8.429 --> SITE
R5: Carbonyl > 4.5 AND Peptide <= 10.5 --> SITE
(100% of TPs and 0 FPs)
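Read as a classifier, the five rules above can be sketched as follows. The first-match conflict resolution and the feature-dict encoding are assumptions for illustration, not necessarily how the model is applied:

```python
# Sketch: the five-rule 5 Å model applied as a simple classifier.
# Cutoffs come from the slide; first-match-wins ordering and the
# example site are illustrative assumptions.

RULES = [
    (lambda s: s["oxygen"] <= 6.5, "NON-SITE"),             # R1
    (lambda s: s["hydrophobicity"] > -8.429, "NON-SITE"),   # R2
    (lambda s: s["oxygen"] > 6.5, "SITE"),                  # R3
    (lambda s: s["hydrophobicity"] <= -8.429, "SITE"),      # R4
    (lambda s: s["carbonyl"] > 4.5 and s["peptide"] <= 10.5,
     "SITE"),                                               # R5
]

def classify(site):
    for cond, label in RULES:
        if cond(site):
            return label
    return "UNKNOWN"

site = {"oxygen": 8, "hydrophobicity": -9.0, "carbonyl": 5, "peptide": 9}
print(classify(site))
```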
27. Final Result: Ca Binding Sites in Proteins
Model with 5 rules:
- same accuracy
- no unique predicates
- no subsumed or very similar rules
- more general rules for SITES (prior prob. < 0.01)
- more specific rules for NON-SITES (prior prob. > 0.99)
28. Predicting Ca Binding Sites: Attribute Hierarchies
RESIDUE CLASS 1:
- POLAR (ASN, CYS, GLN, HIS, SER, THR, TYR, TRP, GLY)
- CHARGED (ARG, ASP, GLU, LYS)
- HYDROPHOBIC (ALA, ILE, LEU, MET, PHE, PRO, VAL)
29. Attribute Hierarchies
RESIDUE CLASS 2:
- POLAR (ASN, CYS, GLN, SER, THR, TYR, GLY)
- CHARGED
  - ACIDIC (ASP, GLU)
  - BASIC (ARG, LYS, HIS)
- NONPOLAR (ALA, ILE, LEU, MET, PHE, PRO, VAL, TRP)
30. CONCLUSION
Induction systems can be augmented with semantic criteria to provide:
(A) interesting, understandable rules: syntactically simple, meaningful
(B) coherent models: equally predictive, closer to a theory
31. CONCLUSION
We have shown:
- how specific types of background knowledge might be incorporated into the rule-discovery process
- possible benefits of incorporating those types of knowledge:
  - more coherent models
  - more understandable models
  - more accurate models