Title: Learning Structure from Relational Observations
1. Learning Structure from Relational Observations
Tina Eliassi-Rad, Computation Directorate, Lawrence Livermore National Laboratory
Work performed under the auspices of the U.S. Department of Energy by University of California, Lawrence Livermore National Laboratory under Contract W-7405-Eng-48.
2. Relational observations are ubiquitous
- Relational observations
  - Can be binary, integer, or real-valued
  - Can be of different types and have observable attributes
  - Can contain objects of different types with their own (observable) attributes
  - Are often represented as dynamic heterogeneous networks
3. Structure in observations can be represented in various forms
Zachary's Karate Club, 1977
[Figure: two groupings of the karate club network, one based on the maximum likelihood configuration and one averaged over samples from the posterior; annotated with posterior ∝ likelihood × prior.]
4. Why do we want to find hidden structure?
- Discover groups of objects (a.k.a. communities)
- Predict missing links and attributes
- Understand the structures within the
observations
Results based on maximum likelihood configuration
5. We use a non-parametric Bayesian approach to find hidden structure in graphs
Given an adjacency matrix of observables where R_{i,j} ∈ {0, 1}, find the hidden/latent group assignment θ_i.
Probabilistic (Bayesian) formulation
Non-parametric models can automatically infer an
adequate model complexity from data, without
needing to explicitly do Bayesian model
comparison.
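To make the nonparametric ingredient concrete, here is a minimal sketch of a Chinese Restaurant Process draw, the kind of prior that lets the number of groups grow with the data rather than being fixed in advance; the function name, the concentration parameter alpha, and the sequential-seating interface are our illustrative choices, not the talk's implementation.

```python
import numpy as np

def crp_sample_assignments(n_items, alpha, rng):
    """Assign items to clusters under a Chinese Restaurant Process prior:
    item m joins an existing cluster k with probability proportional to
    its size n_k, or opens a new cluster with probability proportional
    to alpha. The number of clusters is not fixed in advance."""
    assignments = np.zeros(n_items, dtype=int)
    counts = []                               # customers per table (n_k)
    for m in range(n_items):
        probs = np.array(counts + [alpha], dtype=float)
        k = rng.choice(len(probs), p=probs / probs.sum())
        if k == len(counts):                  # open a new table/cluster
            counts.append(0)
        counts[k] += 1
        assignments[m] = k
    return assignments

rng = np.random.default_rng(0)
print(crp_sample_assignments(20, alpha=1.0, rng=rng))
```

Conditionals of this rich-get-richer form are what a Gibbs sampler cycles over when it resamples one node's latent assignment given all the others.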
6. Relational observations tend to be large, noisy, and missing some attribute values
- We need algorithms that
  - are robust to noisy data,
  - can cope with, and accurately fill in, missing values,
  - scale linearly in the size of the data graph (|E|), and
  - can incrementally update their results.
7. We want a mixed membership model with hierarchical representation
- A mixed membership model accounts for a different mixture of identities (i.e., groups) for each entity, drawn from the set of identities common to the whole graph.
- We want to learn not only the groups in the population but also all the existing mixtures of them.
- Existing approaches assume each entity has a
single identity based on all the relations that
it participates in.
- A model with hierarchical representation can
express more complex tree-like patterns (e.g., an
organizational hierarchy).
[Figure panels: Hierarchical Representation; Mixed Membership]
8. We extend nonparametric Bayesian approaches to model each relationship
- Our prior is defined by the Chinese Restaurant Franchise [Y.W. Teh et al., JASA 2006].
- The likelihood of a link R^m_{i,j} depends only on the identities of the participating entities for that link, θ_{i,m} and θ_{j,m}.
- Properties
- Number of clusters can be learned from the data
- It has a few more parameters, θ_i, but also higher expressivity
- Inference with Gibbs sampling can be based on the
conditionals above
Notation:
- n_{i,t}: # of customers already at table t in restaurant i
- t_{i,m}: table assignment for customer m at restaurant i
- s_k: # of tables already eating dish k
- d_{i,t}: dish assignment for table t in restaurant i
- M: # of tables
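Using the notation above, a single generative draw from the Chinese Restaurant Franchise can be sketched as follows; this is a hypothetical illustration of the conditionals (mutable Python lists standing in for n_{i,t}, d_{i,t}, and s_k), not the talk's code.

```python
import numpy as np

def crf_seat_customer(n_it, d_it, s_k, alpha, gamma, rng):
    """One Chinese Restaurant Franchise draw for restaurant i: the
    customer joins table t with probability proportional to n_{i,t},
    or a new table with probability proportional to alpha; a new table
    picks dish k with probability proportional to s_k (tables already
    eating dish k across the franchise), or a new dish prop. to gamma."""
    w = np.array(n_it + [alpha], dtype=float)
    t = rng.choice(len(w), p=w / w.sum())
    if t < len(n_it):                      # sit at an existing table
        n_it[t] += 1
        return t, d_it[t]
    dish_w = np.array(s_k + [gamma], dtype=float)
    k = rng.choice(len(dish_w), p=dish_w / dish_w.sum())
    if k == len(s_k):                      # brand-new dish (group identity)
        s_k.append(0)
    s_k[k] += 1
    n_it.append(1)                         # open the new table
    d_it.append(k)
    return t, k

rng = np.random.default_rng(0)
n_it, d_it, s_k = [], [], []               # state for one restaurant
for _ in range(8):
    print(crf_seat_customer(n_it, d_it, s_k, alpha=1.0, gamma=1.0, rng=rng))
```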
9. We extend nonparametric Bayesian approaches to model each relationship
- Posterior ∝ Likelihood × Prior
- Pr(θ^V | R^V) ∝ Pr(R^V | θ^V) Pr(θ^V) = ( ∏_m Pr(R_{i,j} | θ_{i,m}, θ_{j,m}) ) Pr(θ^V)
- Advantages
- Models mixed memberships
- Automatically infers an adequate model size from
the data
- Produces a set of possible groupings/configurations
- Each grouping has its own posterior
distribution
- Can use posterior distributions to learn about
the structure of the observations
Monk data set [Sampson 1969] (O = Outcasts, Y = Young Turks, L = Loyal Opposition, W = Waverers)
Averaged over samples from the posterior
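To make the Posterior ∝ Likelihood × Prior line concrete, here is a minimal sketch of the unnormalized log posterior for the single-membership special case, with a Bernoulli link probability eta[a, b] for each pair of groups; eta, z, and log_prior_z are illustrative stand-ins for the model's actual quantities.

```python
import numpy as np

def log_unnormalized_posterior(R, z, eta, log_prior_z):
    """log Pr(z | R) up to an additive constant: Bernoulli log-likelihood
    of every observed link, where eta[a, b] in (0, 1) is the link
    probability between groups a and b, plus the log prior of the
    assignment z (e.g., a CRP/CRF term computed elsewhere)."""
    p = eta[z[:, None], z[None, :]]        # p[i, j] = eta[z[i], z[j]]
    loglik = (R * np.log(p) + (1 - R) * np.log(1 - p)).sum()
    return loglik + log_prior_z

R = np.array([[0, 1], [1, 0]])             # toy 2-node adjacency matrix
z = np.array([0, 1])                       # one group per node
eta = np.array([[0.1, 0.9], [0.9, 0.1]])
print(log_unnormalized_posterior(R, z, eta, log_prior_z=0.0))
```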
10. We can handle relational observations between two domains
- Two domains: animals and features
- Animals form two groups: birds and four-legged mammals
11. Can we make accurate predictions about missing links?
Results based on maximum likelihood configuration
- Eagle hunts and is not small.
- Dog does not hunt.
- Cat does not run.
12. Our models are slightly more accurate than IRMs (infinite relational models)
- Full-sample log-score (LSFS) averaged over all observable links R_{i,j}
- M.-H. Chen, Q.-M. Shao, and J.G. Ibrahim. Monte Carlo Methods in Bayesian Computation. Springer-Verlag, January 2000.
- A higher LSFS score denotes a more accurate model
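A sketch of how such a score can be computed from posterior samples; the (S, N, N) array of per-sample link probabilities and the function name are our assumptions about the interface, not the cited book's code.

```python
import numpy as np

def full_sample_log_score(R, probs_per_sample):
    """LSFS sketch: average the per-sample link probabilities over the S
    posterior samples to get the posterior predictive, then average the
    log-probability assigned to each observed link value R[i, j].
    Higher is better."""
    p = probs_per_sample.mean(axis=0)          # posterior predictive
    logp = np.where(R == 1, np.log(p), np.log(1 - p))
    return logp.mean()                         # averaged over all links

rng = np.random.default_rng(0)
S, N = 4, 3
samples = rng.uniform(0.2, 0.8, size=(S, N, N))
R = rng.integers(0, 2, size=(N, N))
print(full_sample_log_score(R, samples))
```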
13. We learn hierarchies with a nonparametric prior on trees
- We want to simultaneously find the mixtures of identities and order them in a tree.
- Each hierarchy node is a different
group/identity.
- The most general/common groups are at the top of
the tree.
- The most specific/idiosyncratic groups are at the
bottom of the tree.
- We use the nested hierarchical Chinese Restaurant
Process as our prior.
- The user specifies the number of levels in the tree, which sets the degree of detail he/she wants.
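A minimal sketch of drawing one root-to-leaf path of fixed depth from a nested CRP prior, the structure the bullets above describe; the dict-of-counts tree representation and the parameter name gamma are our illustrative choices.

```python
import numpy as np

def ncrp_sample_path(tree, levels, gamma, rng):
    """Draw one root-to-leaf path from a nested Chinese Restaurant
    Process of user-specified depth. At each level the entity follows
    an existing branch with probability proportional to how many earlier
    entities took it, or opens a new branch prop. to gamma; tree maps a
    path prefix (tuple) to the visit counts of its children."""
    path = ()
    for _ in range(levels):
        counts = tree.setdefault(path, [])
        w = np.array(counts + [gamma], dtype=float)
        c = rng.choice(len(w), p=w / w.sum())
        if c == len(counts):                  # new child node in the tree
            counts.append(0)
        counts[c] += 1
        path = path + (c,)
    return path

rng = np.random.default_rng(1)
tree = {}
print([ncrp_sample_path(tree, levels=3, gamma=1.0, rng=rng) for _ in range(10)])
```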
14. We use our nonparametric Bayesian model to find hierarchies of groups
- 43 liberals, 49 conservatives, 13 neutrals
- Links imply frequent co-purchasing by the same
buyers (Amazon.com)
[Figure: learned hierarchy; each hierarchy node is a different group, most general at the top, most specific at the bottom.]
15. We use our nonparametric Bayesian approach to find hierarchies of groups
MIT Reality Mining proximity graph: 98 people (nodes) and 1766 links
[Figure: learned hierarchy with the distribution of titles per group; each hierarchy node is a different group, most general at the top, most specific at the bottom.]
16. We used our nonparametric Bayesian model to find hierarchies of groups
Enron email network: 159 people, 900 emails
[Figure: learned hierarchy; each hierarchy node is a different group, most general at the top, most specific at the bottom.]
17. Nonparametric models with flexible priors easily adapt to relational data
- Relational data contain significant information about group structure
- Bayesian models allow the analyst to make inferences about groups of interest while quantifying the level of confidence, even when a significant proportion of the data is missing
- Our IMMM and hIMMM allow for increased flexibility and provide additional information about objects that simultaneously belong to several groups
- There are other approaches to structure discovery
  - Compression/MDL-based
  - Graph-theoretic
18. Compression-based approaches to structure discovery rely on MDL
- Compressed form of the adjacency matrix
- Minimize total encoding cost
- [D. Chakrabarti, PKDD 2004]
- [J. Sun et al., KDD 2007]
MIT Reality Mining Proximity Data
- p_{i,1} = n_{i,1} / (n_{i,1} + n_{i,0})
- Code cost: Σ_i (n_{i,1} + n_{i,0}) · H(p_{i,1})
- Description cost: cost of describing n_{i,1}, n_{i,0}, and the groups
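A small sketch of the code-cost term above, using the binary entropy H; finding the blocks and the description cost are omitted, and the function names are ours.

```python
import numpy as np

def binary_entropy(p):
    """H(p) in bits; H(0) = H(1) = 0 by convention."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def code_cost(blocks):
    """MDL code cost sketch: each block i with n1 ones and n0 zeros costs
    (n1 + n0) * H(p_{i,1}) bits, where p_{i,1} = n1 / (n1 + n0). The cost
    of describing n_{i,1}, n_{i,0}, and the groups is added on top when
    minimizing the total encoding cost."""
    return sum((n1 + n0) * binary_entropy(n1 / (n1 + n0)) for n1, n0 in blocks)

# Two nearly homogeneous blocks compress well; one mixed block does not.
print(code_cost([(90, 10), (5, 95)]))      # ~75 bits
print(code_cost([(95, 105)]))              # ~200 bits
```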
19. Graph-theoretic methods find groups based on higher-than-average density of links within them
Zachary's Karate Club, 1977
- Maximize modularity [M.E.J. Newman]
  - The difference between the fraction of links that fall within communities and the fraction expected if links were placed at random
- Generates dendrograms
- Find eigenvalues and eigenvectors of the
modularity matrix
Shades depict the values of the elements in the leading eigenvector of the modularity matrix
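A minimal sketch of the spectral step described above: build the modularity matrix B = A - k k^T / (2m), take its leading eigenvector, and split nodes by sign; np.linalg.eigh is used because B is symmetric. The function name and the toy graph are ours.

```python
import numpy as np

def leading_eigenvector_split(A):
    """Spectral bisection for modularity: B[i, j] = A[i, j] - k_i k_j / 2m,
    where k is the degree vector and m the number of links. Nodes are
    split by the sign of their entry in B's leading eigenvector (the
    quantity the shading in the figure depicts)."""
    k = A.sum(axis=1)
    B = A - np.outer(k, k) / k.sum()          # k.sum() equals 2m
    vals, vecs = np.linalg.eigh(B)            # eigh: B is symmetric
    v = vecs[:, np.argmax(vals)]              # leading eigenvector
    return v >= 0                             # community labels by sign

# Toy graph: two triangles joined by a single edge.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
print(leading_eigenvector_split(A))           # separates the two triangles
```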
20. Validation in structure discovery is often anecdotal
- Validation in unsupervised approaches is hard
since no oracle exists
- Probabilistic approaches have an advantage here
- Just remove some links R_{i,j} from the adjacency matrix (a.k.a. a held-out test set)
- Calculate p(R_{i,j} | R, I) using the posterior predictive distribution, which can be computed easily by averaging over the posterior samples
- Compare the outcome with the held-out test set to generate the confusion matrix and its derivatives
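A sketch of this held-out evaluation; the (S, N, N) array of per-sample link probabilities, the boolean mask interface, and the 0.5 threshold are illustrative assumptions, not a prescribed protocol.

```python
import numpy as np

def held_out_confusion(R, held_out_mask, probs_per_sample, threshold=0.5):
    """Average p(R_ij | R, I) over the S posterior samples (the posterior
    predictive), threshold it on the held-out entries, and compare with
    the true held-out links to form a confusion matrix."""
    p = probs_per_sample.mean(axis=0)
    pred = p[held_out_mask] >= threshold
    true = R[held_out_mask] == 1
    return {"tp": int((pred & true).sum()),
            "tn": int((~pred & ~true).sum()),
            "fp": int((pred & ~true).sum()),
            "fn": int((~pred & true).sum())}

rng = np.random.default_rng(0)
N, S = 5, 10
R = rng.integers(0, 2, size=(N, N))
mask = rng.random((N, N)) < 0.2                # hold out ~20% of entries
samples = rng.uniform(0.1, 0.9, size=(S, N, N))
print(held_out_confusion(R, mask, samples))
```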
21. The future: hybrid methods for efficient inference and structure discovery in temporal relational data
- We need efficient inference procedures for
non-parametric Bayesian relational models.
- Need to go beyond Markov chain Monte Carlo
methods and variational inference
- Should consider hybrid algorithms that
incorporate graph-theoretic and/or
compression-based approaches
- We need models with nonparametric priors that
capture temporal sensitivity.
- Want to reformulate the model so that it learns
joint distributions over groups and time steps
- Start with dependent Dirichlet processes [J.E. Griffin and M.F.J. Steel, 2006]
22. Thank you
- Contact information
  - eliassi_at_llnl.gov
  - http://www.llnl.gov/comp/bio.php/eliassirad1
- This is joint work with P.S. (Steve) Koutsourelakis at Cornell University (formerly at LLNL).