Title: Introducing Semantics into Machine Learning
1. Knowledge-Based Discovery: Using Semantics in Machine Learning
Bruce Buchanan, Joe Phillips
University of Pittsburgh
buchanan _at_ cs.pitt.edu, josephp _at_ cs.pitt.edu
2. Intelligent Systems Laboratory
- Faculty: Bruce Buchanan (P.I.), John Aronis
- Collaborators: John Rosenberg (Biol. Sci.), Greg Cooper (Medicine), Bob Ferrell (Genetics), Janyce Wiebe (CS), Lou Penrod (Rehab. Med.), Rich Simpson (Rehab. Sci.), Russ Altman (Stanford MIS)
- Research Associates: Joe Phillips, Paul Hodor, Vanathi Gopalakrishnan, Wendy Chapman
- Ph.D. Students: Gary Livingston, Dan Hennessy, Venkat Kolluri, Will Bridewell, Lili Ma
- M.S. Students: Karl Gossett
3. GOALS
(A) Learn understandable, interesting rules from data.
(B) Construct an understandable, coherent model from the rules.
METHOD: Use background knowledge to search for
- simple rules with familiar predicates
- interesting and novel rules
- coherent models
4. Rules or Models: Understandable and Interesting
- Familiar syntax (conditional rules)
- Syntactically simple
- Semantically simple
- Familiar predicates
- Accurate predictions
- Meaningful rules
- Relevant to the question
- Novel
- Cost-effective
- Coherent model
5. The RL Program
[Diagram: Training Examples, an Explicit Bias, and a Partial Domain Model are inputs to RL, which produces RULES; HAMB assembles the rules into a MODEL; a Performance Program applies the model to New Cases to make Predictions.]
6. (A) Individual Rules
- J. Phillips
- Rehabilitation Medicine data
7. Simple Single Rules
- Syntactic simplicity
  - Fewer terms on the LHS
  - Explicitly stated constraints (rules with no more than N terms)
  - Tagged attributes (e.g., a rule must have at least one control attribute to be interesting)
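As a sketch, these syntactic constraints amount to a post-filter over candidate rules. The `Rule` encoding, the `max_terms` cutoff, and the control-attribute tags below are illustrative assumptions, not RL's actual data structures:

```python
# Sketch of syntactic-simplicity filtering for single rules.
# The rule encoding (a list of (attribute, op, value) LHS conditions)
# and the attribute names are illustrative assumptions.

def is_simple(rule_lhs, max_terms=3):
    """Keep rules with no more than max_terms conjuncts on the LHS."""
    return len(rule_lhs) <= max_terms

def is_interesting(rule_lhs, control_attrs):
    """Require at least one tagged 'control' attribute on the LHS."""
    return any(attr in control_attrs for attr, _op, _val in rule_lhs)

rules = [
    [("age", ">", 65), ("therapy_rate", ">", 2)],            # 2 terms, 1 control
    [("age", ">", 65), ("race", "=", "X"), ("sex", "=", "F"),
     ("admit_time", "<", 10)],                               # 4 terms: too long
]
control = {"therapy_rate", "admit_time"}

kept = [r for r in rules if is_simple(r) and is_interesting(r, control)]
```

Only the first rule survives: it is short enough and mentions a control attribute.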
8. Simple Sets of Rules
- Syntactic simplicity
  - Fewer rules
  - Independent rules
- E.g., in physics: U(x) = U_gravity(x) + U_electronic(x) + U_magnetic(x)
- HAMB removes highly similar terms from the feature set
- There is less independence when there's feedback (e.g., in medicine)
9. Interestingness
- Given, controlled, and observed attributes
  - Explicitly state observed attributes as the interesting target
- Temporal
  - Future (or distant-past) predictions are interesting
- Influence diagrams (e.g., Bayes nets)
  - Strong but more indirect influences are interesting
10. Using Typed-Attribute Background Knowledge
- Organize terms into given, controlled, and observed
  - E.g., in a medical domain: demographics, intervention, and outcome
- Benefits
  - Categorization of rules by whether they use givens (default), controls (controllable), or both (conditionally controllable)
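A minimal sketch of this categorization, assuming a hypothetical attribute-to-type table (the attribute names are illustrative, not taken from the rehabilitation dataset):

```python
# Sketch: categorize rules by the types of attributes on their LHS.
# The given/controlled/observed typing is supplied as background
# knowledge; this table and the attribute names are illustrative.

ATTR_TYPE = {
    "age": "given", "race": "given", "sex": "given",
    "therapy_rate": "controlled", "admit_time": "controlled",
    "improvement": "observed",
}

def categorize(lhs_attrs):
    types = {ATTR_TYPE[a] for a in lhs_attrs}
    if types == {"given"}:
        return "default"                    # uses only givens
    if types == {"controlled"}:
        return "controllable"               # uses only controls
    return "conditionally controllable"     # mixes givens and controls

print(categorize(["age", "sex"]))           # default
print(categorize(["therapy_rate"]))         # controllable
print(categorize(["age", "therapy_rate"]))  # conditionally controllable
```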
11. Typed-Attribute Example
- Rehab. data (RL; Phillips, Buchanan, Penrod)
- > 2000 records
[Diagram: attributes organized by type. Given: demographic (age, race, sex) and medical. Controlled: temporal (admit time, rate; absolute vs. normalized) and medical. Observed: medical (general_condition, specific_condition).]
12. Example Interestingness
- Group rules by whether they predict from medical attributes, demographic attributes, or both
- By medical:
  - Left_Body_Stroke => poor improvement (interesting, expected)
- By demographic:
  - High_age => poor improvement (interesting, expected)
  - Race = X => poor improvement (interesting, NOT expected)
13. Using Temporal Background Knowledge
- Organize data by time
  - Utility may or may not extend to other metric spaces (e.g., space, mass)
- Benefits
  - Predictions parameterized by time: f(t)
  - Future or distant past may be interesting
  - Cyclical patterns
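As an illustration of a prediction parameterized by time, here is a minimal least-squares fit of f(t) = m*t + b. This is a generic sketch of time-parameterized prediction, not the RL or Scienceomatic fitting procedure, and the data are made up:

```python
# Sketch: learn a time-parameterized prediction f(t) from timestamped
# data via a simple least-squares line (illustrative data and method).

def fit_linear(ts, ys):
    """Return slope m and intercept b minimizing squared error."""
    n = len(ts)
    mt, my = sum(ts) / n, sum(ys) / n
    m = (sum((t - mt) * (y - my) for t, y in zip(ts, ys))
         / sum((t - mt) ** 2 for t in ts))
    b = my - m * mt
    return m, b

ts = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]     # synthetic data lying on y = 2t + 1
m, b = fit_linear(ts, ys)

def f(t):
    """Prediction parameterized by time."""
    return m * t + b
```

Because f(t) carries the time parameter explicitly, it can be queried at future (or distant-past) times, which is where the interesting predictions live.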
14. Temporal Example
- Geophysics (Scienceomatic; Phillips 2000)
- Subduction-zone discoveries of the form
  d(q_after) = d(q_main) + m * [t(q_after) - t(q_main)] + b
- NOTE: This is not an accurate prediction!
  - Interesting nonetheless, since quakes generally can't be predicted
15. Using Influence-Diagram Background Knowledge
- This is future work!
- Organize terms to follow a pre-existing influence diagram
  - E.g., Bayesian nets, though conditional probabilities are not needed
- Benefits
  - Suggest hidden variables and new influences
  - f(x) => f(x, y)
16. Interestingness Summary
- How different types of background knowledge help us achieve interestingness:
  - Explicitly stated observed attributes
  - Implicitly stated parameterized equations with interesting parameters
  - Learned new influence factors
17. (B) Coherent Models
18. EXAMPLE: Predicting Ca Binding Sites (G. Livingston)
Given: 3-D descriptions of 16 sites in proteins that bind calcium ions, and 100 other sites that do not.
Find: a model that predicts whether a proposed new site will bind Ca, in terms of a subset of 63 attributes.
19. Ca Binding Sites in Proteins
SOME ATTRIBUTES:
ATOM-NAME-IS-C
ATOM-NAME-IS-O
CHARGE
CHARGE-WITH-HIS
HYDROPHOBICITY
MOBILITY
RESIDUE-CLASS1-IS-CHARGED
RESIDUE-CLASS1-IS-HYDROPHOBIC
RESIDUE-CLASS2-IS-ACIDIC
RESIDUE-CLASS2-IS-NONPOLAR
RESIDUE-CLASS2-IS-UNKNOWN
RESIDUE-NAME-IS-ASP
RESIDUE-NAME-IS-GLU
RESIDUE-NAME-IS-HOH
RESIDUE-NAME-IS-LEU
RESIDUE-NAME-IS-VAL
RING-SYSTEM
SECONDARY-STRUCTURE1-IS-4-HELIX
SECONDARY-STRUCTURE1-IS-BEND
SECONDARY-STRUCTURE1-IS-HET
SECONDARY-STRUCTURE1-IS-TURN
SECONDARY-STRUCTURE2-IS-BETA
SECONDARY-STRUCTURE2-IS-HET
VDW-VOLUME
20. Predicting Ca Binding Sites: Semantic Types of Attributes
E.g.:
- Physical: solvent accessibility, VDW volume, mobility
- Chemical: charge, heteroatom, oxygen, carbonyl, ASN
- Structural: helix, beta-turn, ring-system
21. Coherent Model
A subset of locally acceptable rules that:
- explains as much of the data as possible
- uses entrenched predicates [Goodman]
- uses predicates of the same semantic type
- uses predicates of the same grain size
- uses classes AND their complements
- avoids rules that are "too similar": identical, subsuming, or semantically close
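The "too similar" criterion can be sketched as a pruning pass over the rule set. The rule encoding below is hypothetical, and semantic closeness (which would need the attribute network) is omitted:

```python
# Sketch: prune identical and subsumed rules when assembling a coherent
# model. A rule's LHS is a frozenset of conditions; rule A subsumes rule
# B if A's conditions are a strict subset of B's and both predict the
# same class. The encoding is illustrative, not RL's.

def subsumes(a, b):
    return (a["then"] == b["then"]
            and a["if"] <= b["if"]      # frozenset subset test
            and a["if"] != b["if"])

def prune(rules):
    kept = []
    for r in rules:
        if any(k["if"] == r["if"] and k["then"] == r["then"] for k in kept):
            continue                    # identical to a kept rule
        if any(subsumes(k, r) for k in kept):
            continue                    # a more general kept rule covers it
        kept.append(r)
    return kept

rules = [
    {"if": frozenset({("oxygen", ">", 6.5)}), "then": "SITE"},
    {"if": frozenset({("oxygen", ">", 6.5), ("charge", ">", 18.5)}),
     "then": "SITE"},                   # subsumed by the first rule
    {"if": frozenset({("oxygen", ">", 6.5)}), "then": "SITE"},  # duplicate
]
kept = prune(rules)
```

This greedy pass assumes more general rules arrive first; a fuller version would first sort candidates by generality.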
22. EXAMPLE: Predict Ca Binding Sites in Proteins
158 rules found independently. E.g.:
R1: IF a site (a) has charge > 18.5 AND (b) no. of CO > 18.75, THEN it binds calcium.
R2: IF a site (a) has charge > 18.5 AND (b) no. of ASN > 15, THEN it binds calcium.
23. Predicting Ca Binding Sites: Semantic Network of Attributes
[Diagram: heteroatoms (Sulfur, Oxygen, Nitrogen, ...) linked to functional groups and their residues: SH to CYS; "Hydroxyl" (OH) to SER, THR, TYR; Carbonyl to ASP, GLU; Amide to ASN, GLN; Amine to ... PRO.]
24. Ca Binding Sites in Proteins
58 rules above threshold (threshold: at least 80% TP AND no more than 20% FP).
42 rules predict SITE; 16 rules predict NON-SITE.
Average accuracy over five 5-fold cross-validations: 100% for the redundant model with 58 rules.
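A sketch of the thresholding step, with hypothetical per-rule TP/FP rates (only the 80%/20% cutoffs come from the slide):

```python
# Sketch: keep only rules above the performance threshold
# (at least 80% true positives AND no more than 20% false positives).
# The candidate rules and their rates are made up for illustration.

def above_threshold(tp_rate, fp_rate, min_tp=0.80, max_fp=0.20):
    return tp_rate >= min_tp and fp_rate <= max_fp

candidates = [
    ("R1", 1.00, 0.00),   # a perfect rule
    ("R2", 0.85, 0.15),
    ("R3", 0.60, 0.05),   # too few true positives
    ("R4", 0.90, 0.35),   # too many false positives
]
kept = [name for name, tp, fp in candidates if above_threshold(tp, fp)]
```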
25. Predicting Ca Binding Sites
Prefer complementary rules, e.g.:
R59: IF, within 5 Å of a site, oxygens > 6.5, THEN it binds calcium.
R101: IF, within 5 Å of a site, oxygens < 6.5, THEN it does NOT bind calcium.
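One way to sketch the preference for complementary rules: look for the same attribute split at the same cutoff with opposite conclusions, like R59/R101 above. The tuple encoding is illustrative, and the complement of > is taken as <=, matching the LE/GT rules of the 5 Å model:

```python
# Sketch: detect complementary rule pairs -- same attribute, same
# cutoff, opposite conclusions. The (attr, op, value, class) tuple
# encoding is an illustrative assumption.

def complementary(r1, r2):
    (a1, op1, v1, c1), (a2, op2, v2, c2) = r1, r2
    return (a1 == a2 and v1 == v2 and c1 != c2
            and {op1, op2} == {">", "<="})

r59 = ("oxygens", ">", 6.5, "SITE")
r101 = ("oxygens", "<=", 6.5, "NON-SITE")
r_other = ("charge", ">", 18.5, "SITE")

print(complementary(r59, r101))     # True: a complementary pair
print(complementary(r59, r_other))  # False: different attributes
```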
26. 5 Å Radius Model
Five perfect rules:
R1: Oxygen <= 6.5 --> NON-SITE
R2: Hydrophobicity > -8.429 --> NON-SITE
R3: Oxygen > 6.5 --> SITE
R4: Hydrophobicity <= -8.429 --> SITE
R5: Carbonyl > 4.5 AND Peptide <= 10.5 --> SITE
(100% of TPs and 0 FPs)
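Read as a classifier, the five rules above can be sketched as follows. The first-match conflict resolution and the feature-dict encoding are assumptions for illustration, not necessarily how the model is applied:

```python
# Sketch: the five-rule 5 Å model applied as a simple classifier.
# Cutoffs come from the slide; first-match-wins ordering and the
# example site are illustrative assumptions.

RULES = [
    (lambda s: s["oxygen"] <= 6.5, "NON-SITE"),             # R1
    (lambda s: s["hydrophobicity"] > -8.429, "NON-SITE"),   # R2
    (lambda s: s["oxygen"] > 6.5, "SITE"),                  # R3
    (lambda s: s["hydrophobicity"] <= -8.429, "SITE"),      # R4
    (lambda s: s["carbonyl"] > 4.5 and s["peptide"] <= 10.5,
     "SITE"),                                               # R5
]

def classify(site):
    for cond, label in RULES:
        if cond(site):
            return label
    return "UNKNOWN"

site = {"oxygen": 8, "hydrophobicity": -9.0, "carbonyl": 5, "peptide": 9}
print(classify(site))
```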
27. Final Result: Ca Binding Sites in Proteins
Model with 5 rules:
- same accuracy
- no unique predicates
- no subsumed or very similar rules
- more general rules for SITES (prior prob. < 0.01)
- more specific rules for NON-SITES (prior prob. > 0.99)
28. Predicting Ca Binding Sites: Attribute Hierarchies
RESIDUE CLASS 1:
- POLAR (ASN, CYS, GLN, HIS, SER, THR, TYR, TRP, GLY)
- CHARGED (ARG, ASP, GLU, LYS)
- HYDROPHOBIC (ALA, ILE, LEU, MET, PHE, PRO, VAL)
29. Attribute Hierarchies
RESIDUE CLASS 2:
- POLAR (ASN, CYS, GLN, SER, THR, TYR, GLY)
- CHARGED
  - ACIDIC (ASP, GLU)
  - BASIC (ARG, LYS, HIS)
- NONPOLAR (ALA, ILE, LEU, MET, PHE, PRO, VAL, TRP)
30. CONCLUSION
Induction systems can be augmented with semantic criteria to provide:
(A) interesting, understandable rules: syntactically simple, meaningful
(B) coherent models: equally predictive, closer to a theory
31. CONCLUSION
We have shown:
- how specific types of background knowledge might be incorporated into the rule-discovery process
- possible benefits of incorporating those types of knowledge:
  - more coherent models
  - more understandable models
  - more accurate models