Introducing Semantics into Machine Learning - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Introducing Semantics into Machine Learning
1
Knowledge-Based Discovery: Using Semantics in Machine Learning
Bruce Buchanan, Joe Phillips
University of Pittsburgh
buchanan@cs.pitt.edu, josephp@cs.pitt.edu
2
Intelligent Systems Laboratory
  • Faculty: Bruce Buchanan (P.I.), John Aronis
  • Collaborators: John Rosenberg (Biol. Sci.), Greg Cooper (Medicine), Bob Ferrell (Genetics), Janyce Wiebe (CS), Lou Penrod (Rehab. Med.), Rich Simpson (Rehab. Sci.), Russ Altman (Stanford MIS)
  • Research Associates: Joe Phillips, Paul Hodor, Vanathi Gopalakrishnan, Wendy Chapman
  • Ph.D. Students: Gary Livingston, Dan Hennessy, Venkat Kolluri, Will Bridewell, Lili Ma
  • M.S. Students: Karl Gossett

3
GOALS
(A) Learn understandable, interesting rules from data
(B) Construct an understandable, coherent model from rules
METHOD: use background knowledge to search for
  simple rules with familiar predicates
  interesting and novel rules
  coherent models
4
Rules or Models: Understandable, Interesting
  • Familiar Syntax
  • (conditional rules)
  • Syntactically Simple
  • Semantically Simple
  • Familiar Predicates
  • Accurate Predictions
  • Meaningful Rules
  • Relevant to Question
  • Novel
  • Cost-Effective
  • Coherent Model

5
The RL Program
[Diagram: the RL and HAMB programs; inputs: Explicit Bias, Partial Domain Model, Training Examples; outputs: RULES and a MODEL; a Performance Program applies them to New Cases to produce Predictions]
6
(A) Individual Rules
  • J. Phillips
  • Rehabilitation Medicine Data

7
Simple single rules
  • Syntactic Simplicity
  • Fewer terms on the LHS
  • Explicitly stated constraints (rules with no more
    than N terms)
  • Tagged attributes (e.g. must have at least one
    control attribute to be interesting)
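The syntactic-simplicity constraints above can be sketched as a rule filter. This is a minimal illustration, not RL's actual data structures: the rule representation, the attribute names, and the tagged control-attribute set are all invented.

```python
# Hedged sketch: filter candidate rules by the slide's syntactic constraints.
# A rule is (lhs_terms, prediction); each LHS term is (attribute, op, value).

MAX_TERMS = 3  # "rules with no more than N terms" (N chosen arbitrarily here)
CONTROL_ATTRS = {"therapy_rate", "therapy_time"}  # hypothetical tagged attributes

def is_simple(rule):
    """Syntactic simplicity: fewer terms on the LHS."""
    lhs, _ = rule
    return len(lhs) <= MAX_TERMS

def is_interesting(rule):
    """Must mention at least one tagged control attribute."""
    lhs, _ = rule
    return any(attr in CONTROL_ATTRS for attr, _, _ in lhs)

rules = [
    ([("age", ">", 70), ("therapy_rate", "<", 2.0)], "poor_improvement"),
    ([("age", ">", 70), ("sex", "=", "M"),
      ("race", "=", "X"), ("admit", "=", "late")], "poor_improvement"),
]
kept = [r for r in rules if is_simple(r) and is_interesting(r)]
```

Only the first rule survives: the second has four LHS terms and no control attribute.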

8
Simple sets of rules
  • Syntactic simplicity
  • Fewer rules
  • independent rules
  • E.g. in physics:
  • U(x) = U_gravity(x) + U_electronic(x) + U_magnetic(x)
  • HAMB removes highly similar terms from the feature set
  • less independence when there's feedback
  • e.g. medicine
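The slide says HAMB removes highly similar terms from the feature set. One plausible mechanism, sketched here under assumption (this is not HAMB's published algorithm, and the data are invented), is to drop any feature that is near-perfectly correlated with one already kept.

```python
# Hedged sketch: prune a feature set by dropping features highly correlated
# with an earlier-kept feature. Threshold and data are illustrative only.

def prune_similar(features, data, threshold=0.95):
    """Keep a feature only if no already-kept feature nearly duplicates it."""
    def corr(xs, ys):
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        vx = sum((x - mx) ** 2 for x in xs) ** 0.5
        vy = sum((y - my) ** 2 for y in ys) ** 0.5
        return cov / (vx * vy) if vx and vy else 0.0

    kept = []
    for f in features:
        if all(abs(corr(data[f], data[g])) < threshold for g in kept):
            kept.append(f)
    return kept

data = {
    "charge": [1.0, 2.0, 3.0, 4.0],
    "charge_with_his": [1.1, 2.1, 3.1, 4.1],  # nearly identical to "charge"
    "mobility": [4.0, 1.0, 3.0, 2.0],
}
kept = prune_similar(["charge", "charge_with_his", "mobility"], data)
```

Here "charge_with_his" is a shifted copy of "charge" (correlation 1.0), so it is pruned while "mobility" survives.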

9
Interestingness
  • Given, controlled and observed
  • explicitly state observed attributes as
    interesting target
  • Temporal
  • future (or distant past) predictions are
    interesting
  • Influence diagram (e.g. Bayes net)
  • strong but more indirect influences are
    interesting

10
Using typed attribute background knowledge
  • Organize terms into given, controlled and
    observed
  • E.g. in a medical domain: demographics, intervention and outcome
  • Benefits
  • Categorization of rules by whether they use
    givens (default), controls (controllable) or both
    (conditionally controllable)
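The categorization described above can be made concrete with a small lookup. Attribute names below are hypothetical; the three category labels (default / controllable / conditionally controllable) are the slide's.

```python
# Hedged sketch: categorize rules by the types of attributes on their LHS
# (given / controlled / observed), per the slide's typed-attribute scheme.

ATTR_TYPE = {
    "age": "given", "race": "given", "sex": "given",          # demographics
    "therapy_rate": "controlled", "therapy_time": "controlled",  # intervention
    "improvement": "observed",                                 # outcome
}

def categorize(lhs_attrs):
    types = {ATTR_TYPE[a] for a in lhs_attrs}
    if types == {"given"}:
        return "default"
    if types == {"controlled"}:
        return "controllable"
    return "conditionally controllable"
```

For example, a rule over age and sex alone is "default", while one mixing age with therapy_rate is "conditionally controllable".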

11
Typed attribute example
  • Rehab. (RL; Phillips, Buchanan, Penrod)
  • > 2000 records

[Diagram: attribute hierarchy grouping attributes into observed, given and controlled types; labels include temporal, medical, demographic, admit, general_condition, time, rate, age, race, sex, specific_condition, absolute, normalize]
12
Example: interestingness
  • Group rules by whether they predict by medical, demographic or both
  • by medical:
  • Left_Body_Stroke => poor improvement (interesting, expected)
  • by demographic:
  • High_age => poor improvement (interesting, expected)
  • (Race = X) => poor improvement (interesting, NOT expected)

13
Using temporal background knowledge
  • Organize data by time
  • Utility may or may not extend to other metric
    spaces (e.g. space, mass)
  • Benefits
  • Predictions parameterized by time f(t)
  • Future or distant past may be interesting
  • Cyclical patterns

14
Temporal example
  • Geophysics (Scienceomatic; Phillips 2000)
  • Subduction-zone discoveries of the type
  • d(q_after) = d(q_main) + m * [t(q_after) - t(q_main)] + b
  • NOTE: This is not an accurate prediction!
  • interesting, since generally quakes can't be predicted
[Figure: plot of d vs. X]
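The slide's relation d(q_after) = d(q_main) + m*(t(q_after) - t(q_main)) + b is linear in the time gap, so m and b can be estimated by ordinary least squares. The (time-gap, depth-change) pairs below are invented purely for illustration.

```python
# Hedged sketch: least-squares fit of the slope m and intercept b in the
# linear temporal relation between main shock and aftershock. Data invented.

pairs = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.1), (4.0, 8.0)]  # (dt, dd) pairs

n = len(pairs)
sx = sum(dt for dt, _ in pairs)
sy = sum(dd for _, dd in pairs)
sxx = sum(dt * dt for dt, _ in pairs)
sxy = sum(dt * dd for dt, dd in pairs)

# Standard closed-form simple-regression estimates.
m = (n * sxy - sx * sy) / (n * sxx - sx * sx)
b = (sy - m * sx) / n
```

On these four points the fit gives m close to 2 and b close to 0, i.e. depth change grows roughly linearly with the time gap.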
15
Using influence diagram background knowledge
  • This is future work!
  • Organize terms to follow pre-existing influence
    diagram
  • E.g. Bayesian nets, but do not need conditional
    probabilities
  • Benefits
  • Suggest hidden variables, new influences
  • f(x) => f(x, y)

16
Interestingness summary
  • How different types of background knowledge help
    us achieve interestingness
  • Explicitly stated observed attributes
  • Implicitly stated parameterized equations with
    interesting parameters
  • Learned new influence factors

17
(B) Coherent Models
  • B.Buchanan
  • Protein Data

18
EXAMPLE: Predicting Ca Binding Sites (G. Livingston)
Given: 3-D descriptions of 16 sites in proteins that bind calcium ions and 100 other sites that do not.
Find: a model that allows predicting whether a proposed new site will bind Ca, in terms of a subset of 63 attributes.
19
Ca binding sites in proteins
SOME ATTRIBUTES:
ATOM-NAME-IS-C
ATOM-NAME-IS-O
CHARGE CHARGE-WITH-HIS
HYDROPHOBICITY
MOBILITY
RESIDUE-CLASS1-IS-CHARGED
RESIDUE-CLASS1-IS-HYDROPHOBIC
RESIDUE-CLASS2-IS-ACIDIC
RESIDUE-CLASS2-IS-NONPOLAR
RESIDUE-CLASS2-IS-UNKNOWN

RESIDUE-NAME-IS-ASP
RESIDUE-NAME-IS-GLU
RESIDUE-NAME-IS-HOH
RESIDUE-NAME-IS-LEU
RESIDUE-NAME-IS-VAL
RING-SYSTEM
SECONDARY-STRUCTURE1-IS-4-HELIX
SECONDARY-STRUCTURE1-IS-BEND
SECONDARY-STRUCTURE1-IS-HET
SECONDARY-STRUCTURE1-IS-TURN
SECONDARY-STRUCTURE2-IS-BETA
SECONDARY-STRUCTURE2-IS-HET
VDW-VOLUME
20
Predicting Ca Binding Sites
Semantic types of attributes, e.g. Physical, Chemical, Structural, covering attributes such as solvent accessibility, charge, VDW volume, heteroatom, oxygen, carbonyl, ASN, helix, beta-turn, ring-system, mobility.
21
Coherent Model
A subset of locally acceptable rules that:
  explains as much of the data as possible
  uses entrenched predicates (Goodman)
  uses predicates of the same semantic type
  uses predicates of the same grain size
  uses classes AND their complements
  avoids rules that are "too similar" (identical, subsuming, semantically close)
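One of the coherence criteria above, avoiding subsumed rules while preferring general ones, can be sketched as a greedy filter. The rule encoding (frozensets of condition strings) and the example rules are illustrative assumptions, not the actual program.

```python
# Hedged sketch: assemble a coherent rule subset by keeping only rules not
# subsumed by an already-kept rule, visiting more general rules first.

def subsumes(r1, r2):
    """r1 subsumes r2 if r1 makes the same prediction with a subset of r2's conditions."""
    return r1[1] == r2[1] and r1[0] <= r2[0]

def coherent_subset(rules):
    kept = []
    # Visiting shorter (more general) rules first mirrors the preference
    # for general SITE rules on the later slides.
    for rule in sorted(rules, key=lambda r: len(r[0])):
        if not any(subsumes(k, rule) for k in kept):
            kept.append(rule)
    return kept

rules = [
    (frozenset({"oxygens>6.5"}), "SITE"),
    (frozenset({"oxygens>6.5", "charge>18.5"}), "SITE"),   # subsumed by the first
    (frozenset({"oxygens<=6.5"}), "NON-SITE"),
]
model = coherent_subset(rules)
```

The two-condition SITE rule is dropped because the one-condition SITE rule already subsumes it, leaving a two-rule model.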
22
EXAMPLE: predict Ca binding sites in proteins
158 rules found independently. E.g.,
R1: IF a site (a) is charged > 18.5 AND (b) no. of CO > 18.75 THEN it binds calcium
R2: IF a site (a) is charged > 18.5 AND (b) no. of ASN > 15 THEN it binds calcium
23
Predicting Ca Binding Sites
semantic network of attributes
[Semantic network: Heteroatoms at the root, branching into Sulfur, Oxygen, Nitrogen, ...; intermediate nodes "Hydroxyl", Carbonyl, Amide, Amine; leaves include SH (CYS), OH (SER, THR, TYR), ASP, GLU, ASN, GLN, ..., PRO]
24
Ca binding sites in proteins
58 rules above threshold (threshold: at least 80% TP AND no more than 20% FP)
42 rules predict SITE, 16 rules predict NON-SITE
Average accuracy over five 5-fold cross-validations: 100% for the redundant model with 58 rules
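The threshold above can be written as a small predicate. Interpreting "80 TP / 20 FP" as percentage coverage rates is an assumption here, and the rule statistics below are invented (using the slide's 16 sites / 100 non-sites as totals).

```python
# Hedged sketch of the rule-quality threshold: keep a rule only if it covers
# at least 80% of the true positives and at most 20% false positives.

def above_threshold(tp, fp, pos_total, neg_total, min_tp=0.80, max_fp=0.20):
    return tp / pos_total >= min_tp and fp / neg_total <= max_fp

# Hypothetical rule stats over 16 binding sites and 100 non-sites:
good_rule = above_threshold(14, 5, 16, 100)   # 87.5% TP, 5% FP -> keep
weak_rule = above_threshold(10, 30, 16, 100)  # 62.5% TP, 30% FP -> reject
```
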
25
Predicting Ca Binding Sites
Prefer complementary rules, e.g.,
R59: IF, within 5 Å of a site, oxygens > 6.5 THEN it binds calcium
R101: IF, within 5 Å of a site, oxygens < 6.5 THEN it does NOT bind calcium
26
5 Å Radius Model
Five perfect rules:
R1. Oxygen LE 6.5 --> NON-SITE
R2. Hydrophobicity GT -8.429 --> NON-SITE
R3. Oxygen GT 6.5 --> SITE
R4. Hydrophobicity LE -8.429 --> SITE
R5. Carbonyl GT 4.5 AND Peptide LE 10.5 --> SITE
(100% of TPs and 0% FPs)
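The five-rule model can be applied directly as a set of threshold tests; the thresholds are the slide's, while the feature values for the example site are invented.

```python
# Hedged sketch: apply the five-rule 5 Å radius model to one candidate site.
# Returns the list of (rule, prediction) pairs that fire.

def classify(site):
    fired = []
    if site["oxygen"] <= 6.5:
        fired.append(("R1", "NON-SITE"))
    if site["hydrophobicity"] > -8.429:
        fired.append(("R2", "NON-SITE"))
    if site["oxygen"] > 6.5:
        fired.append(("R3", "SITE"))
    if site["hydrophobicity"] <= -8.429:
        fired.append(("R4", "SITE"))
    if site["carbonyl"] > 4.5 and site["peptide"] <= 10.5:
        fired.append(("R5", "SITE"))
    return fired

# Hypothetical oxygen-rich, hydrophilic candidate site:
site = {"oxygen": 8.0, "hydrophobicity": -9.1, "carbonyl": 5.0, "peptide": 9.0}
fired = classify(site)
```

For this site, R3, R4 and R5 all fire and agree on SITE; on the slide's data the rules never conflict, which is what makes the model "perfect" there.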
27
Final Result: Ca binding sites in proteins
Model with 5 rules:
  same accuracy
  no unique predicates
  no subsumed or very similar rules
  more general rules for SITES (prior prob. < 0.01)
  more specific rules for NON-SITES (prior prob. > 0.99)
28
Predicting Ca Binding Sites
Attribute Hierarchies
RESIDUE CLASS 1:
  POLAR (ASN, CYS, GLN, HIS, SER, THR, TYR, TRP, GLY)
  CHARGED (ARG, ASP, GLU, LYS)
  HYDROPHOBIC (ALA, ILE, LEU, MET, PHE, PRO, VAL)
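An attribute hierarchy like RESIDUE CLASS 1 is naturally encoded as a lookup table, so rules phrased over a class name can be tested against individual residues. The table below transcribes the slide; the lookup function is an illustrative sketch.

```python
# Hedged sketch: RESIDUE CLASS 1 from the slide as a class -> members table.

RESIDUE_CLASS1 = {
    "POLAR": {"ASN", "CYS", "GLN", "HIS", "SER", "THR", "TYR", "TRP", "GLY"},
    "CHARGED": {"ARG", "ASP", "GLU", "LYS"},
    "HYDROPHOBIC": {"ALA", "ILE", "LEU", "MET", "PHE", "PRO", "VAL"},
}

def class_of(residue):
    """Map a residue name to its RESIDUE CLASS 1 category, or None if unknown."""
    for cls, members in RESIDUE_CLASS1.items():
        if residue in members:
            return cls
    return None
```

With this table, a rule predicate like RESIDUE-CLASS1-IS-CHARGED reduces to a set-membership test.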
29
Attribute Hierarchies
RESIDUE CLASS 2:
  POLAR (ASN, CYS, GLN, HIS, SER, THR, TYR, TRP, GLY)
  CHARGED:
    ACIDIC (ARG, ASP, GLU)
    BASIC (LYS)
  NONPOLAR (ALA, ILE, LEU, MET, PHE, PRO, VAL)
  HIS
  TRP
30
CONCLUSION
Induction systems can be augmented with semantic criteria to provide:
(A) interesting, understandable rules: syntactically simple, meaningful
(B) coherent models: equally predictive, closer to a theory
31
CONCLUSION
  • We have shown
  • how specific types of background knowledge might
    be incorporated in the rule discovery process
  • possible benefits of incorporating those types of
    knowledge
  • more coherent models
  • more understandable models
  • more accurate models