Title: Statistical Relational Learning
1. Statistical Relational Learning
2. Acknowledgements
- Lise Getoor, Nir Friedman, Daphne Koller, Ben Taskar, Avi Pfeffer, David Jensen, Pedro Domingos, Indrajit Bhattacharya and many others
3. Why SRL?
- Probabilistic graphical models (particularly Bayesian networks) have been shown to be a useful way of representing statistical patterns in real-world domains
- Probabilistic relational models (PRMs) are a recent development that extends the standard attribute-based Bayesian network representation to incorporate a much richer relational structure
- They allow the specification of a probability model for classes of objects rather than simple attributes
- They allow properties of an entity to depend probabilistically on properties of other related entities
4. An example
- A simple model of the performance of a student in a course
- Random variables
- Course difficulty: high, medium, low
- Student intelligence: high, low
- Understands material: yes, no
- Good test taker: yes, no
- Homework grade: A, B, C, D, E
- Exam grade: A, B, C, D, E
5. Complete joint probability distribution
- Must specify a probability for each of the exponentially many different instantiations of the variable set
- P(I,D,G,U,E,H) would consider all possible assignments of values to these variables: 2 × 2 × 2 × 3 × 5 × 5 = 600 joint assignments (worked out below)
- The naïve representation of the joint distribution is infeasible
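Spelling the count out (a restatement of the numbers above in LaTeX):

```latex
% six discrete variables: I, G, U binary; D ternary; E, H five-valued
|V(I)|\cdot|V(G)|\cdot|V(U)|\cdot|V(D)|\cdot|V(E)|\cdot|V(H)|
   = 2 \cdot 2 \cdot 2 \cdot 3 \cdot 5 \cdot 5 = 600
```

A full joint table therefore needs 600 - 1 = 599 independent probabilities.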
6. Bayesian networks
- Key insight: each variable is directly influenced by only a few others
- Probabilistic conditional independence: each node is conditionally independent of its non-descendants given values for its parents
- Associate with each node a conditional probability distribution (CPD), which specifies for each node X the probability distribution over the values of X given each combination of values of its parents, denoted Pa(X)
- The joint distribution can be factorized into a product of the CPDs of all the variables, as shown below
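As a sketch, the general factorization and one plausible structure for the student example; the specific parent sets below are an illustrative assumption, not given in the slides:

```latex
% General Bayesian-network factorization
P(X_1,\dots,X_n) = \prod_{i=1}^{n} P\big(X_i \mid \mathrm{Pa}(X_i)\big)

% One plausible factorization of the student example (assumed structure)
P(I,D,U,G,H,E) = P(I)\,P(D)\,P(U \mid I,D)\,P(G \mid I)\,P(H \mid U)\,P(E \mid U,G)
```

Under this assumed structure the model needs roughly 35 independent parameters instead of the 599 required by the full joint table.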
7. Bayesian networks
- Bayesian networks provide a compact representation of complex joint distributions
8. However
- Bayesian networks are often inadequate for properly modeling aspects of complex relational domains
- A Bayesian network for a given domain involves a pre-specified set of random variables, whose relationship to each other is fixed in advance
- They cannot deal with domains where we may encounter several entities in a variety of configurations, because they lack the concept of an object (or domain entity)
If we treat the circles as random variables, then how do we handle the AVG dependence?
9. Introduction to PRMs
- PRMs extend Bayesian networks with the concepts of individuals, their properties, and relations between them
- The relational framework is motivated primarily by the concepts of relational databases
- Relational database vs. PRM: schema and instance
10. PRMs definition: Relational schema
- A schema for a relational model describes a set of classes X1, X2, …, Xn
- The domain entities of a class are called objects
- Each class is associated with a set of descriptive attributes and a set of reference slots
- Each class corresponds to a single table
- Descriptive attributes correspond to standard attributes in the table
- Reference slots correspond to attributes that are foreign keys
11. A more complex school example
- The rectangles are classes
- The underlined attributes are reference slots and the others are descriptive attributes
Each single table is a class
Foreign keys are reference slots
Standard attributes are descriptive attributes
12. Another movie example
Each single table is a class
Foreign keys are reference slots
Standard attributes are descriptive attributes
13. Descriptive attributes
- The set of descriptive attributes of a class X is denoted A(X); attribute A of class X is denoted X.A, and its set of values is V(X.A)
- Examples
- A(Student) = {Intelligence, Ranking}
- V(Student.Intelligence) = {high, low}
- V(Actor.Gender) = {male, female}
14. Reference slots
- The set of reference slots of a class X is denoted R(X); we use X.ρ to denote the reference slot ρ of X
- The domain type: Dom[ρ] = X
- The range type: Range[ρ] = Y, where Y is some class in the schema
- Examples
- R(Registration) = {Course, Student}
- Range[Course.Instructor] = Professor
- R(Role) = {Actor, Movie}
- Range[Role.Actor] = Actor
15. Inverse slot, slot chain
- For each reference slot ρ, we can define an inverse slot ρ⁻¹, interpreted as the inverse function of ρ
- We define a slot chain τ = ρ1, …, ρk to be a sequence of slots such that for all i, Range[ρi] = Dom[ρi+1]. The slot chain allows us to compose slots, defining functions from objects to other, indirectly related objects
- Examples (see the sketch below)
- The inverse slot for the Student slot of Registration is called Registered-In
- Student.Registered-In.Course.Instructor can be used to denote a student's set of instructors
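As a rough illustration of reference slots, inverse slots, and slot chains, here is a minimal Python sketch of the school example. The object representation, attribute defaults, and the slot_chain helper are illustrative assumptions, not part of the PRM formalism.

```python
# Minimal sketch of the school schema: classes with descriptive attributes,
# reference slots (foreign keys), and the inverse slot Registered-In.
class Professor:
    def __init__(self, name):
        self.name = name

class Course:
    def __init__(self, name, instructor):
        self.name = name
        self.instructor = instructor        # reference slot -> Professor
        self.difficulty = None              # descriptive attribute

class Student:
    def __init__(self, name):
        self.name = name
        self.intelligence = None            # descriptive attributes
        self.ranking = None
        self.registered_in = []             # inverse slot of Registration.Student

class Registration:
    def __init__(self, student, course):
        self.student = student              # reference slots
        self.course = course
        self.grade = None                   # descriptive attribute
        student.registered_in.append(self)  # maintain the inverse slot

def slot_chain(obj, chain):
    """Follow a chain of slot names and return the multi-set of reached objects."""
    frontier = [obj]
    for slot in chain:
        nxt = []
        for o in frontier:
            value = getattr(o, slot)
            nxt.extend(value if isinstance(value, list) else [value])
        frontier = nxt
    return frontier

# Student.Registered-In.Course.Instructor: a student's set of instructors
prof = Professor("Prof. Smith")
cs101 = Course("CS101", prof)
jane = Student("Jane Doe")
Registration(jane, cs101)
print([p.name for p in slot_chain(jane, ["registered_in", "course", "instructor"])])
```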
16. Schema Instance
- An instance I of a schema specifies
- A set of objects x, partitioned into classes
- A value for each descriptive attribute x.A
- A value for each reference slot x.ρ, which is an object of the appropriate range type
- A complete instantiation I is a set of objects with no missing values and no dangling references
- School example
- One Professor
- Two Classes
- Three Registrations
- Two Students
17. PRMs definition: relational skeleton
- A relational skeleton σ of a relational schema is a partial specification of an instance of the schema
- It specifies the set of objects for each class and the relations (dependency structure) that hold between objects (similar to a Bayesian network structure)
- It leaves the values of the attributes unspecified
- A PRM specifies a probability distribution over all complete instantiations that extend the skeleton
18. PRMs definition: relational skeleton
- The dependency structure is defined by associating with each attribute X.A a set of formal parents Pa(X.A)
- X.A can depend on another probabilistic attribute B of X
- X.A can also depend on attributes of related objects, X.τ.C, where τ is a slot chain
- The class-level dependencies are instantiated according to the relational skeleton to define the object-level dependencies
- We use σ(X) to refer to the set of objects of class X
- Let x be some object in σ(X): for a formal parent X.B, the actual parent of x.A is x.B; for a formal parent X.τ.C, the actual parents of x.A are the attributes y.C of the objects y ∈ x.τ
19. Differences to Bayesian networks
- The PRM defines the dependency model at the class level, allowing it to be used for any object in the class. The class dependency model is universally quantified and instantiated for every object in the class domain
- The PRM explicitly uses the relational structure of the model, in that it allows the probabilistic model of an attribute of an object to also depend on attributes of related objects. The specific set of related objects can vary with the relational skeleton σ
20. Definition of PRM
- A probabilistic relational model (PRM) Π for a relational schema S defines, for each class X and each descriptive attribute A ∈ A(X), a set of formal parents Pa(X.A) and a conditional probability distribution (CPD) that represents P(X.A | Pa(X.A))
- A PRM consists of two components: the qualitative dependency structure S and the set of parameters θS associated with it. For the basic PRM with attribute uncertainty, we assume that the relational skeleton σr is given
21. Qualitative structure
- The qualitative structure of the network is defined via an instance dependency graph Gσ, whose nodes correspond to descriptive attributes x.A of objects in the skeleton, and whose edges correspond to the direct attribute dependencies and the slot-chain dependencies
- Note that the slot chain x.τ might be multi-valued, so we must specify the probabilistic dependence of x.A on the multi-set {y.B : y ∈ x.τ}
- It is impractical to provide a dependency model for each of the unboundedly many possible multi-set sizes
- Instead, we use an aggregate function and define a dependence on the computed aggregate value
22. Aggregate function
- The dependence of x.A on x.τ.B is interpreted as a probabilistic dependence of x.A on some aggregate property of this multi-set
- Many notions of aggregate are natural and useful: mean, median, maximum, cardinality, etc.
- We allow X.A to have a parent γ(X.τ.B). The semantics is that for any x ∈ X, x.A will depend on the value of γ(x.τ.B). We define V(γ(X.τ.B)) to be the set of possible values of this aggregate (see the sketch below)
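A minimal sketch of one aggregate γ, assuming the multi-set of grades is summarized by its mean and then discretized; the grade-to-points mapping and the three bins are illustrative choices, not fixed by the PRM.

```python
# gamma(x.tau.B): aggregate the multi-set of grades reached through a slot
# chain (e.g. Student.Registered-In.Grade) into a single discrete value.
GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "E": 0}

def avg_grade(grades):
    """Mean of a multi-set of letter grades, discretized into three bins."""
    points = [GRADE_POINTS[g] for g in grades]
    if not points:                 # empty multi-set gets its own default value
        return "unknown"
    mean = sum(points) / len(points)
    if mean >= 3.0:
        return "high"
    if mean >= 1.5:
        return "medium"
    return "low"

# V(gamma(X.tau.B)) = {"high", "medium", "low", "unknown"}; a CPD such as
# P(Student.Ranking | avg_grade) is then indexed by these values.
print(avg_grade(["A", "B", "A"]))   # -> "high"
```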
23. Parameters
- A PRM associates a CPD with each attribute of each class. As for dependencies, we assume that the parameters are shared by all objects in the class
- The school example (sketched below)
- P(Grade | Difficulty, Intelligence)
- P(Ranking | AVG(Grade)) (an aggregate is used here)
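A minimal sketch of what these class-level, shared CPDs could look like as plain tables; the probability values and the Ranking domain below are made-up illustrations.

```python
# P(Registration.Grade | Course.Difficulty, Student.Intelligence):
# one table shared by every Registration object.
P_GRADE = {
    ("high", "high"): {"A": 0.50, "B": 0.30, "C": 0.10, "D": 0.07, "E": 0.03},
    ("high", "low"):  {"A": 0.10, "B": 0.20, "C": 0.30, "D": 0.25, "E": 0.15},
    # ... one row for each (difficulty, intelligence) combination
}

# P(Student.Ranking | AVG(Registered-In.Grade)): the parent is an aggregate.
P_RANKING = {
    "high":   {"top": 0.70, "middle": 0.25, "bottom": 0.05},
    "medium": {"top": 0.20, "middle": 0.60, "bottom": 0.20},
    "low":    {"top": 0.05, "middle": 0.35, "bottom": 0.60},
}
# The parameters are defined once per class, never per object.
```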
24. PRM semantics
- Given a skeleton σr, we have a set of random variables of interest. The set of random variables for σr is the set of attributes of the form x.A, where x ∈ σ(Xi) and A ∈ A(Xi) for some class Xi
- The PRM specifies a probability distribution over the possible joint assignments of values to these random variables. It basically defines a ground Bayesian network
- This ground Bayesian network leads to the following chain rule, which defines a distribution over the instantiations compatible with the skeleton σr
P(I | σr, S, θS) = ∏Xi ∏x∈σr(Xi) ∏A∈A(Xi) P(x.A | Pa(x.A))
(the three products range over each class, each object of that class, and each attribute)
25. Differences to Bayesian networks
- Our random variables are the attributes of a set of objects
- The set of parents of a random variable can vary according to the relational context of the object, i.e., the set of objects to which it is related
- The school example
The attribute Grade of the class Registration depends on the attribute Intelligence of the class Student.
The Registration object 5639 references Jane-Doe.Intelligence, but other Registration objects might reference Bob.Intelligence or Tony.Intelligence.
26. Coherent probability distribution
- We have to ensure that the resulting function from instances to numbers does indeed define a coherent probability distribution, i.e., that the probabilities of all instances sum to 1
- In a Bayesian network, this requirement is satisfied if the dependency graph is acyclic: a variable is not an ancestor of itself
- We need to check whether a dependency structure S is acyclic relative to a fixed skeleton σ
27. Dependency graph
- A stronger guarantee: an acyclic class dependency graph
- The class dependency graph has an edge from Y.B to X.A if either X = Y and X.B is a parent of X.A, or γ(X.τ.B) is a parent of X.A and Range[X.τ] = Y
- It is clear that if the class dependency graph is acyclic, we can never have that x.A depends on itself. The school example (a cycle check is sketched below)
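A minimal sketch of the class-level acyclicity check, assuming the class dependency graph is given as a list of (parent, child) attribute pairs; the edge list for the school example is an illustrative reconstruction.

```python
# Check whether the directed class dependency graph contains a cycle,
# using a standard three-colour depth-first search.
from collections import defaultdict

def has_cycle(edges):
    graph = defaultdict(list)
    for parent, child in edges:
        graph[parent].append(child)

    WHITE, GREY, BLACK = 0, 1, 2
    colour = defaultdict(int)

    def dfs(node):
        colour[node] = GREY
        for nxt in graph[node]:
            if colour[nxt] == GREY:          # back edge -> cycle
                return True
            if colour[nxt] == WHITE and dfs(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour[n] == WHITE and dfs(n) for n in list(graph))

edges = [
    ("Course.Difficulty", "Registration.Grade"),
    ("Student.Intelligence", "Registration.Grade"),
    ("Registration.Grade", "Student.Ranking"),   # via AVG over Registered-In
]
print(has_cycle(edges))   # False: the class dependency graph is acyclic
```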
28. However: another example
- Blood test
- A cycle in the class dependency graph does not imply that all skeletons induce cyclic instance dependencies
- Although the model appears to be cyclic at the class level, we know that the cyclicity is always resolved at the level of individual objects
29. Learning PRMs
- Input
- A relational schema, which specifies the basic vocabulary of the domain: the set of classes, the attributes associated with the different classes, and the possible types of relations between objects in the different classes
- Training data consisting of a fully specified instance of that schema
- Learning tasks
- Parameter estimation
- Structure learning
30. Parameter estimation
- We assume that the qualitative dependency structure S is known
- The key ingredient in parameter estimation is the likelihood function: the probability of the data given the model
- We then perform maximum likelihood estimation (MLE) to find the parameter setting θS that maximizes the likelihood L(θS | I, σ, S) for given I, σ, and S. The maximum likelihood model is the model that best predicts the training data (see the sketch below)
- We can also take a Bayesian approach to parameter estimation by incorporating parameter priors
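As a hedged sketch in the notation above: the likelihood decomposes class by class and attribute by attribute (following the ground-network chain rule), and because parameters are shared across all objects of a class, the MLE of each CPD entry is a ratio of counts. The count notation C[·] is ours, for exposition only.

```latex
L(\theta_S \mid \mathcal{I},\sigma_r,S)
   = P(\mathcal{I} \mid \sigma_r, S, \theta_S)
   = \prod_{X_i}\ \prod_{A \in \mathcal{A}(X_i)}\ \prod_{x \in \sigma_r(X_i)}
        P\big(x.A \mid \mathrm{Pa}(x.A)\big)

\hat{\theta}_{v \mid u}
   = \hat{P}\big(X.A = v \mid \mathrm{Pa}(X.A) = u\big)
   = \frac{C[X.A = v,\ \mathrm{Pa}(X.A) = u]}{C[\mathrm{Pa}(X.A) = u]}
```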
31. Structure learning
- Hypothesis space
- Specify which structures are legal candidate hypotheses: those with an acyclic dependency graph
- Scoring structures
- Bayesian score: the posterior probability of the structure given the data, which combines the prior probability of the structure with the probability of the data given that structure
- Structure search
- Hill-climbing search
- A heuristic, phased search algorithm (illustrated on the next slides and sketched after them)
32. A heuristic search algorithm
Phase 0: consider only dependencies within a class
(diagram: the Author, Review, and Paper classes, each with only within-class dependencies)
33. A heuristic search algorithm
Phase 1: consider dependencies from neighboring classes, via schema relations
(diagram: the Author, Review, and Paper classes; e.g., add a dependency between P.A and R.M, which improves the score)
34. A heuristic search algorithm
Phase 2: consider dependencies from further classes, via relation chains
(diagram: the Author, Review, and Paper classes; e.g., add a dependency between R.M and A.W, which improves the score)
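A minimal sketch of the phased greedy search from slides 31-34, under heavy assumptions: candidate edges are supplied per phase, score stands in for the Bayesian score, and is_legal stands in for the acyclicity check; the toy usage at the bottom is fabricated purely to make the sketch runnable.

```python
# Greedy hill-climbing over dependency structures, widening the candidate
# set phase by phase (within-class, neighbouring classes, longer chains).
def hill_climb(phased_candidates, score, is_legal):
    """Repeatedly add the single best-scoring legal edge until no edge helps."""
    structure = frozenset()
    for candidates in phased_candidates:           # phase 0, 1, 2, ...
        improved = True
        while improved:
            improved = False
            best, best_score = None, score(structure)
            for edge in candidates:
                trial = structure | {edge}
                if edge in structure or not is_legal(trial):
                    continue
                s = score(trial)
                if s > best_score:
                    best, best_score = trial, s
            if best is not None:
                structure, improved = best, True
    return structure

# Toy usage with a fabricated score that rewards two particular edges.
phases = [
    [("P.Quality", "P.Accepted")],                         # phase 0: within-class
    [("R.Mood", "P.Accepted"), ("P.Accepted", "R.Mood")],  # phase 1: neighbours
]
wanted = {("P.Quality", "P.Accepted"), ("R.Mood", "P.Accepted")}
score = lambda s: len(s & wanted) - 0.1 * len(s)
is_legal = lambda s: not (("R.Mood", "P.Accepted") in s and ("P.Accepted", "R.Mood") in s)
print(hill_climb(phases, score, is_legal))
```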
35. Limitations of BNs
- In a BN, each instance has its own dependency model; the model cannot generalize over instances
- If John tends to like sitcoms, he will probably like next season's offerings
- Whether a person enjoys sitcom reruns depends on whether they watch primetime sitcoms
- A BN can only model relationships between at most one class of instances at a time
- In the previous model, we cannot model relationships between people
- If my roommate watches Seinfeld, I am more likely to join in
36. PRM Summary
- PRMs inherit key advantages of probabilistic graphical models
- Coherent probabilistic semantics
- Exploit structure of local interactions
- Relational models are inherently more expressive
- Web of influence: use multiple sources of information to reach conclusions
- Exploit both relational information and the power of probabilistic reasoning
37. Probabilistic Relational Models (PRMs)
- Developed by Daphne Koller's group at Stanford
- Representation: Avi Pfeffer
- Builds on work in KBMC (knowledge-based model construction) by Haddawy, Poole, Wellman and others
- Object Oriented Bayesian Networks
- Relational Probability Models
- Learning: Lise Getoor, Nir Friedman, Avi Pfeffer
- Attribute Uncertainty
- Structural Uncertainty
- Class Uncertainty
- Identity Uncertainty
- Undirected models: Ben Taskar
- Reference
- Learning Probabilistic Models of Link Structure. Lise Getoor, Nir Friedman, Daphne Koller, Benjamin Taskar. Journal of Machine Learning Research, 3:679-707, 2002.
38. Families of SRL Approaches
- Frame-based Probabilistic Models
- Probabilistic Relational Models (PRMs)
- Probabilistic Entity Relation Models (PERs)
- Object Oriented Bayesian Networks (OOBNs)
- First Order Probabilistic Logic (FOPL)
- BLOGs
- Relational Markov Logic (RML)
- Stochastic Functional Programs
- PRISM
- Stochastic Logic Programs (SLPs)
- IBAL
39. Conclusion
- Statistical Relational Learning
- Supports multi-relational, heterogeneous domains
- Supports noisy, uncertain, non-IID data
- aka, real-world data!
- Different approaches
- rule-based vs. frame-based
- directed vs. undirected
- Many common issues
- Need for collective classification and consolidation
- Need for aggregation and combining rules
- Need to handle labeled and unlabeled data
- Need to handle structural uncertainty
- etc.
- Great opportunity for combining machine learning for hierarchical statistical models with probabilistic databases, which can efficiently store, query, and update models
40. Recent SRL Activities
- Dagstuhl Workshop on Probabilistic, Logical and Relational Learning - Towards a Synthesis: http://www.dagstuhl.de/05051/
- ICML 2004 workshop on Statistical Relational Learning and its Connections to Other Fields: http://www.cs.umd.edu/projects/srl2004/
- IJCAI 2003 workshop on Statistical Relational Learning: http://kdl.cs.umass.edu/srl2003/
- AAAI 2000 workshop on Statistical Relational Learning: http://robotics.stanford.edu/srl
- Several related workshops
- KDD MRDM workshops
- http://www-ai.ijs.si/SasoDzeroski/MRDM2004/
- http://www-ai.ijs.si/SasoDzeroski/MRDM2003/
- http://www-ai.ijs.si/SasoDzeroski/MRDM2002/