Title: Learning Probabilistic Relational Models
1. Learning Probabilistic Relational Models
2. Learning from Relational Data
- Data sources
- relational and object-oriented databases
- frame-based knowledge bases
- World Wide Web
- Traditional approaches
- work well with flat representations
- fixed length attribute-value vectors
- assume IID samples
- Problem
- must fix attributes in advance, so can represent only some limited set of structures
- IID assumption may not hold
3. Our Approach
- Probabilistic Relational Models (PRMs)
- rich representation language that models
- relational dependencies
- probabilistic dependencies
- Learning PRMs
- parameter estimation
- model selection
- from data stored in relational databases
4. Outline
- Motivation
- Probabilistic relational models
- Probabilistic Logic Programming [Poole, 1993; Ngo & Haddawy, 1994]
- Probabilistic object-oriented knowledge [Koller & Pfeffer, 1997, 1998; Koller, Levy & Pfeffer, 1997]
- Learning PRMs
- Experimental results
- Conclusions
5. Probabilistic Relational Models
- Combine advantages of predicate logic and BNs
- natural domain modeling: objects, properties, relations
- generalization over a variety of situations
- compact, natural probability models
- Integrate uncertainty with relational model
- properties of domain entities can depend on properties of related entities
- uncertainty over relational structure of domain
6. Relational Schema
[Figure: schema diagram with relations labeled Take, Teach, and In]
- Describes the types of objects and relations in the database
7. Example instance I
Professor Prof. Gump: Popularity high, Teaching Ability medium, Stress-Level low
Student John Doe: Intelligence high, Performance average
Student Jane Doe: Intelligence high, Performance average
Course Phil142: Difficulty low, Rating high
Course Phil101: Difficulty low, Rating high
Reg 5639: Grade A, Satisfaction 3
Reg 5639: Grade A, Satisfaction 3
Reg 5639: Grade A, Satisfaction 3
8. What's Uncertain?
[Figure: the example instance I from the previous slide, repeated with all attribute values shown]
9. Attribute Uncertainty
Professor Prof. Gump: Popularity ???, Teaching Ability ???, Stress-Level ???
Student John Doe: Intelligence ???, Performance ???
Student Jane Doe: Intelligence ???, Performance ???
Course Phil142: Difficulty ???, Rating ???
Course Phil101: Difficulty ???, Rating ???
Reg 5639: Grade A, Satisfaction 3
Reg 5639: Grade A, Satisfaction 3
Reg 5639: Grade ???, Satisfaction ???
- Fixed skeleton σ
- set of objects in each class
- relations between them
- Uncertainty
- over assignments of values to attributes
10. PRM Dependencies
[Figure: dependency diagram over the classes Professor, Student (Intelligence, Performance), Course (Difficulty, Rating), and Reg (Grade, Satisfaction)]
11. PRM Dependencies (cont.)
[Figure: the example instance I again, with the Grade of one Reg object shown as ?]
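12. PRM aggregate dependencies
[Figure: Reg with attribute Grade]
13. PRM aggregate dependencies
[Figure: dependency diagram over Professor, Student (Intelligence, Performance), Course (Difficulty, Rating), and Reg (Grade, Satisfaction), with edges labeled by the aggregates count and avg]
- Aggregates: sum, min, max, avg, mode, count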
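An aggregate dependency lets an attribute depend on a summary of a whole set of related values, for example a student's attribute depending on the average grade over that student's registrations. The following is a minimal sketch, not the authors' implementation. The toy rows, the numeric grade points, and the helper name `aggregate_parent` are all assumptions made for illustration.

```python
from statistics import mean, mode

# Hypothetical toy data: one row per registration, grades as grade points.
registrations = [
    {"student": "John Doe", "course": "Phil101", "grade": 4.0},
    {"student": "John Doe", "course": "Phil142", "grade": 3.0},
    {"student": "Jane Doe", "course": "Phil101", "grade": 4.0},
]

# The aggregates listed on the slide; each maps a collection of values to one value.
AGGREGATES = {
    "sum": sum,
    "min": min,
    "max": max,
    "avg": mean,
    "mode": mode,
    "count": len,
}

def aggregate_parent(rows, key_field, key, value_field, agg):
    """Summarize value_field over all rows related to `key` using aggregate `agg`."""
    values = [r[value_field] for r in rows if r[key_field] == key]
    return AGGREGATES[agg](values)

avg_grade = aggregate_parent(registrations, "student", "John Doe", "grade", "avg")
n_regs = aggregate_parent(registrations, "student", "John Doe", "grade", "count")
```

The aggregated value then plays the role of a single parent in the local probability model, so the model size does not grow with the number of related objects.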
14. PRM Summary
- A PRM specifies
- a probabilistic dependency structure S
- a set of parents for each attribute X.A
- a set of local probability models θ
- Given a skeleton structure σ, a PRM specifies a probability distribution over instances I
- over attribute values of all objects in σ
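As with a Bayesian network, this distribution can be written as a product of local probability models, one factor per attribute of each object. The formula below is a sketch under assumed notation that does not appear on the slides: σ(X) for the set of objects of class X in the skeleton, A(X) for the attributes of class X, and Pa(x.A) for the parents of x.A under S.

```latex
P(I \mid \sigma, S, \theta)
  = \prod_{X} \; \prod_{A \in \mathcal{A}(X)} \; \prod_{x \in \sigma(X)}
    P\bigl(I_{x.A} \mid I_{\mathrm{Pa}(x.A)}\bigr)
```

All objects of the same class share the same local model for a given attribute, which is what keeps the representation compact.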
15. Learning PRMs
[Figure: learning takes a database (instance I) and its relational schema, over the classes Reg, Course, and Student, and outputs a PRM]
16. Parameter estimation in PRMs
- Assume known dependency structure S
- Goal: estimate PRM parameters θ
- entries in local probability models
- A parameterization θ is good if it is likely to generate the observed data, instance I
- MLE principle: choose θ so as to maximize the likelihood l
- Crucial property: decomposition, with separate terms for different X.A
17. ML parameter estimation
[Figure: Reg with attributes Grade and Satisfaction; the sufficient statistics are obtained with a Count query]
- DB technology is well suited to the computation of sufficient statistics
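To illustrate how a database can supply the sufficient statistics, here is a minimal sketch using Python's built-in `sqlite3` module. It assumes, purely for illustration, that Reg.Satisfaction depends on Reg.Grade. The table layout and the toy rows are hypothetical and do not come from the slides.

```python
import sqlite3
from collections import defaultdict

# Hypothetical in-memory table of registrations.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Reg (id INTEGER PRIMARY KEY, grade TEXT, satisfaction INTEGER)"
)
conn.executemany(
    "INSERT INTO Reg (grade, satisfaction) VALUES (?, ?)",
    [("A", 3), ("A", 3), ("A", 2), ("B", 2), ("B", 1)],
)

# Sufficient statistics: joint counts of (parent value, child value),
# computed by the database with a single GROUP BY / COUNT query.
counts = conn.execute(
    "SELECT grade, satisfaction, COUNT(*) FROM Reg GROUP BY grade, satisfaction"
).fetchall()

# Maximum likelihood estimate: normalize the counts within each parent value.
totals = defaultdict(int)
for grade, _, n in counts:
    totals[grade] += n
cpd = {(grade, sat): n / totals[grade] for grade, sat, n in counts}
```

Here `cpd[("A", 3)]` is the estimated probability of Satisfaction 3 given Grade A. For dependencies that cross tables, the same idea would apply with a join in the query.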
18. Model Selection
- Idea
- define scoring function
- do local search over legal structures
- Key Components
- scoring models
- legal models
- searching model space
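The three components fit together in a simple greedy local search, sketched below. This is a generic hill-climbing loop under stated assumptions, not the specific procedure used in the talk. The toy attribute names, the toy score, and the legality test (no pair of opposite edges) are hypothetical stand-ins for a real scoring function and a real coherence check.

```python
def hill_climb(initial, neighbors, score, is_legal):
    """Greedy local search: repeatedly take the best legal score-improving move."""
    current, current_score = initial, score(initial)
    while True:
        best, best_score = None, current_score
        for candidate in neighbors(current):
            if not is_legal(candidate):
                continue
            s = score(candidate)
            if s > best_score:
                best, best_score = candidate, s
        if best is None:  # no legal neighbor improves the score
            return current, current_score
        current, current_score = best, best_score

# Hypothetical toy problem: structures are sets of directed edges between attributes.
ATTRS = ["S.I", "R.G", "R.S"]
ALL_EDGES = [(a, b) for a in ATTRS for b in ATTRS if a != b]
TARGET = {("S.I", "R.G"), ("R.G", "R.S")}

def neighbors(structure):
    for edge in ALL_EDGES:
        yield structure ^ {edge}  # add the edge if absent, delete it if present

def toy_score(structure):
    # Reward edges in the (made-up) target, penalize all others.
    return len(structure & TARGET) - len(structure - TARGET)

def is_legal(structure):
    # Toy legality test: forbid pairs of opposite edges.
    return all((b, a) not in structure for a, b in structure)

best_structure, best_value = hill_climb(frozenset(), neighbors, toy_score, is_legal)
```

Passing the score and the legality test in as functions keeps the search loop independent of how models are scored or checked.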
19. Scoring Models
- Bayesian approach
- closed form solution
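The slide does not spell out the closed form. A common choice in Bayesian network learning, assumed here, is multinomial local models with Dirichlet priors, for which the marginal likelihood of the data is a ratio of Gamma functions. The sketch below computes that quantity for a single attribute. The function name, the symmetric prior, and the input layout are assumptions made for illustration.

```python
from math import lgamma, log, isclose

def log_marginal_likelihood(counts, alpha=1.0):
    """Log marginal likelihood of one local model under symmetric Dirichlet priors.

    counts maps each parent configuration to a list of child-value counts
    (the sufficient statistics); alpha is the prior pseudo-count per child value.
    """
    total = 0.0
    for child_counts in counts.values():
        a = alpha * len(child_counts)   # total pseudo-count for this configuration
        n = sum(child_counts)           # total observed count for this configuration
        total += lgamma(a) - lgamma(a + n)
        total += sum(lgamma(alpha + c) - lgamma(alpha) for c in child_counts)
    return total

# Two observations with different values under a uniform prior over a binary attribute.
example = log_marginal_likelihood({"A": [1, 1]})
```

Because the result is a sum over parent configurations, and the full score is a sum over attributes, a local change to the structure only requires rescoring the attributes whose parents changed.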
20. Legal Models
- Dependency ordering over attributes
- PRM defines a coherent probability model over skeleton σ if the dependency ordering it induces on σ is acyclic
21. Guaranteeing Acyclicity
How do we guarantee that a PRM is acyclic for every skeleton?
22. Limitation of stratification
[Figure: Person objects linked by the Father and Mother relations]
23. Guaranteed acyclic relations
[Figure: Person objects linked by the Father and Mother relations]
- Prior knowledge: the Father-of relation is acyclic
- dependence of Person.A on Person.Father.B cannot induce cycles
24. Guaranteeing acyclicity
- With guaranteed acyclic relations, some cycles in the dependency graph are guaranteed to be safe.
- We color the edges in the dependency graph
- yellow: within single object (between X.A and X.B)
- green: via guaranteed acyclic (g.a.) relation (between X.A and Y.B)
- red: via other relations (between X.A and Y.B)
- A cycle is safe if
- it has a green edge
- it has no red edge
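The safety condition can be checked on the whole colored graph without enumerating cycles. A red edge u -> v lies on some cycle exactly when u is reachable from v, and every cycle contains a green edge exactly when the graph with green edges removed is acyclic. The sketch below implements that check. The edge representation and function names are assumptions, and this is one straightforward way to test the condition rather than necessarily the authors' algorithm.

```python
def reachable(adj, start):
    """Return the set of nodes reachable from start by following one or more edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def all_cycles_safe(edges):
    """edges: iterable of (source, target, color), color in {'yellow', 'green', 'red'}.

    Returns True if every cycle has at least one green edge and no red edge.
    """
    edges = list(edges)
    full, non_green = {}, {}
    for u, v, color in edges:
        full.setdefault(u, []).append(v)
        if color != "green":
            non_green.setdefault(u, []).append(v)
    # A red edge u -> v lies on a cycle exactly when u is reachable from v.
    for u, v, color in edges:
        if color == "red" and u in reachable(full, v):
            return False
    # Every cycle has a green edge iff no node can reach itself
    # using only non-green edges.
    nodes = {n for u, v, _ in edges for n in (u, v)}
    return not any(n in reachable(non_green, n) for n in nodes)

# Hypothetical example: an attribute depending on the same attribute of the father.
genetics = [("Person.A", "Person.A", "green")]
```

For the `genetics` example the only cycle is a green self-loop, so the check passes. Recoloring that edge red, or closing a cycle with yellow edges only, makes it fail.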
25. Searching Model Space
Phase 0: consider only dependencies within a class
[Figure: candidate structures over Course, Student, and Reg; the move "Delete S.I → S.P" is evaluated by its Δscore]
26. Phased structure search
Phase 1: consider dependencies from neighboring classes, via schema relations
[Figure: candidate structures over Course, Student, and Reg; the move "Add S.I → R.C" is evaluated by its Δscore]
27. Phased structure search
Phase 2: consider dependencies from further classes, via relation chains
[Figure: candidate structures over Course, Student, and Reg; the move "Add S.I → C.B" is evaluated by its Δscore]
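One way to read the phases is that phase k allows parents from classes reachable through relation chains of length at most k. The sketch below enumerates those classes with a breadth-first search. The schema adjacency shown is a hypothetical guess at how the example classes are linked, and the function name is an assumption.

```python
from collections import deque

# Hypothetical schema: which classes are directly linked by a relation.
SCHEMA = {
    "Student": ["Reg"],
    "Reg": ["Student", "Course"],
    "Course": ["Reg", "Professor"],
    "Professor": ["Course"],
}

def classes_within(schema, start, max_chain_length):
    """Classes reachable from start through relation chains of bounded length."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cls = queue.popleft()
        if dist[cls] == max_chain_length:
            continue  # do not extend chains beyond the current phase
        for nxt in schema.get(cls, ()):
            if nxt not in dist:
                dist[nxt] = dist[cls] + 1
                queue.append(nxt)
    return set(dist)

phase0 = classes_within(SCHEMA, "Student", 0)
phase1 = classes_within(SCHEMA, "Student", 1)
phase2 = classes_within(SCHEMA, "Student", 2)
```

Under this schema, phase 0 for Student contains only Student, phase 1 adds Reg, and phase 2 adds Course. Growing the candidate set gradually keeps the number of moves scored in each phase small.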
28. Experimental Results: Movie Domain (real data)
11,000 movies, 7,000 actors
[Figure: Movie class with attributes Process, Decade, and Genre]
source: http://www-db.stanford.edu/movies/doc.html
29. Genetics domain (synthetic data)
[Figure: Person objects linked by the Father and Mother relations]
30. Experimental Results
[Figure: Score (y-axis, -32000 to -18000) versus Dataset Size (x-axis, 200 to 800); series: Median Likelihood, Gold Standard]
31. Benefits
- Summarization
- PRM provides compact model
- Anomaly detection
- identify change and deviation
- Interpretability
- graphical representation of dependencies
- Dependency modeling
- relational and statistical
32. Future directions
- Learning in complex real-world domains
- drug treatment regimes
- collaborative filtering
- Missing data
- Learning with structural uncertainty
- Discovery
- hidden variables
- causal structure
- class hierarchy
33. Conclusions
- PRMs: natural extension of BNs
- well-founded (probabilistic) semantics
- compact representation of complex models
- Powerful learning techniques
- builds on BN learning techniques
- can learn directly from relational data
- Parameter estimation
- efficient, effective exploitation of DB technology
- Structure identification
- builds on well understood theory
- major issues
- guaranteeing coherence
- search heuristics