Title: Learning Structure from Relational Observations
1. Learning Structure from Relational Observations
Tina Eliassi-Rad, Computation Directorate, Lawrence Livermore National Laboratory
Work performed under the auspices of the U.S. Department of Energy by University of California, Lawrence Livermore National Laboratory under Contract W-7405-Eng-48.
2. Relational observations are ubiquitous
- Relational observations
  - Can be binary, integer, or real-valued
  - Can be of different types and have observable attributes
  - Can contain objects of different types with their own (observable) attributes
  - Are often represented as dynamic heterogeneous networks
3. Structure in observations can be represented in various forms
Zachary's Karate Club, 1977
[Figure: two groupings of the karate club network, one based on the maximum likelihood configuration and one averaged over samples from the posterior; annotated with posterior ∝ likelihood × prior.]
4. Why do we want to find hidden structure?
- Discover groups of objects (a.k.a. communities)
- Predict missing links and attributes
- Understand the structures within the
observations
Results based on maximum likelihood configuration
5. We use a non-parametric Bayesian approach to find hidden structure in graphs
Given an adjacency matrix of observables where R_{i,j} ∈ {0, 1}, find the hidden/latent group assignment θ_i.
Probabilistic (Bayesian) formulation
Non-parametric models can automatically infer an
adequate model complexity from data, without
needing to explicitly do Bayesian model
comparison.
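To make the nonparametric ingredient concrete, here is a minimal sketch of a Chinese Restaurant Process draw, the kind of prior that lets the number of groups grow with the data rather than being fixed in advance; the function name, the concentration parameter alpha, and the sequential-seating interface are our illustrative choices, not the talk's implementation.

```python
import numpy as np

def crp_sample_assignments(n_items, alpha, rng):
    """Assign items to clusters under a Chinese Restaurant Process prior:
    item m joins an existing cluster k with probability proportional to
    its size n_k, or opens a new cluster with probability proportional
    to alpha. The number of clusters is not fixed in advance."""
    assignments = np.zeros(n_items, dtype=int)
    counts = []                               # customers per table (n_k)
    for m in range(n_items):
        probs = np.array(counts + [alpha], dtype=float)
        k = rng.choice(len(probs), p=probs / probs.sum())
        if k == len(counts):                  # open a new table/cluster
            counts.append(0)
        counts[k] += 1
        assignments[m] = k
    return assignments

rng = np.random.default_rng(0)
print(crp_sample_assignments(20, alpha=1.0, rng=rng))
```

Conditionals of this rich-get-richer form are what a Gibbs sampler cycles over when it resamples one node's latent assignment given all the others.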
6. Relational observations tend to be large, noisy, and missing some attribute values
- We need algorithms that
  - are robust to noisy data,
  - can cope with, and accurately fill in, missing values,
  - scale linearly in the size of the data graph (|E|), and
  - can incrementally update their results.
7. We want a mixed membership model with hierarchical representation
- A mixed membership model accounts for a different mixture of identities (i.e., groups) for each entity, drawn from the set of identities common to the whole graph.
- We want to learn not only the groups in the population but also all the existing mixtures of them.
- Existing approaches assume each entity has a
single identity based on all the relations that
it participates in.
- A model with hierarchical representation can
express more complex tree-like patterns (e.g., an
organizational hierarchy).
[Figure panels: Hierarchical Representation; Mixed Membership]
8. We extend nonparametric Bayesian approaches to model each relationship
- Our prior is defined by the Chinese Restaurant Franchise [Y.W. Teh et al., JASA 2006].
- The likelihood of a link R^m_{i,j} depends only on the identities of the participating entities for that link, θ_{i,m} and θ_{j,m}.
- Properties
- Number of clusters can be learned from the data
- It has a few more parameters, θ_i, but also higher expressivity
- Inference with Gibbs sampling can be based on the
conditionals above
Notation:
- n_{i,t}: # of customers already at table t in restaurant i
- t_{i,m}: table assignment for customer m at restaurant i
- s_k: # of tables already eating dish k
- d_{i,t}: dish assignment for table t in restaurant i
- M: # of tables
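Using the notation above, a single generative draw from the Chinese Restaurant Franchise can be sketched as follows; this is a hypothetical illustration of the conditionals (mutable Python lists standing in for n_{i,t}, d_{i,t}, and s_k), not the talk's code.

```python
import numpy as np

def crf_seat_customer(n_it, d_it, s_k, alpha, gamma, rng):
    """One Chinese Restaurant Franchise draw for restaurant i: the
    customer joins table t with probability proportional to n_{i,t},
    or a new table with probability proportional to alpha; a new table
    picks dish k with probability proportional to s_k (tables already
    eating dish k across the franchise), or a new dish prop. to gamma."""
    w = np.array(n_it + [alpha], dtype=float)
    t = rng.choice(len(w), p=w / w.sum())
    if t < len(n_it):                      # sit at an existing table
        n_it[t] += 1
        return t, d_it[t]
    dish_w = np.array(s_k + [gamma], dtype=float)
    k = rng.choice(len(dish_w), p=dish_w / dish_w.sum())
    if k == len(s_k):                      # brand-new dish (group identity)
        s_k.append(0)
    s_k[k] += 1
    n_it.append(1)                         # open the new table
    d_it.append(k)
    return t, k

rng = np.random.default_rng(0)
n_it, d_it, s_k = [], [], []               # state for one restaurant
for _ in range(8):
    print(crf_seat_customer(n_it, d_it, s_k, alpha=1.0, gamma=1.0, rng=rng))
```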
9. We extend nonparametric Bayesian approaches to model each relationship
- Posterior ∝ Likelihood × Prior
- Pr(θ^V | R^V) ∝ Pr(R^V | θ^V) Pr(θ^V) = ( ∏_m Pr(R_{i,j} | θ_{i,m}, θ_{j,m}) ) Pr(θ^V)
- Advantages
- Models mixed memberships
- Automatically infers an adequate model size from
the data
- Produces a set of possible groupings/configurations
- Each grouping has its own posterior
distribution
- Can use posterior distributions to learn about
the structure of the observations
Monk data set [Sampson 1969] (O = Outcasts, Y = Young Turks, L = Loyal Opposition, W = Waverers)
Averaged over samples from the posterior
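To make the Posterior ∝ Likelihood × Prior line concrete, here is a minimal sketch of the unnormalized log posterior for the single-membership special case, with a Bernoulli link probability eta[a, b] for each pair of groups; eta, z, and log_prior_z are illustrative stand-ins for the model's actual quantities.

```python
import numpy as np

def log_unnormalized_posterior(R, z, eta, log_prior_z):
    """log Pr(z | R) up to an additive constant: Bernoulli log-likelihood
    of every observed link, where eta[a, b] in (0, 1) is the link
    probability between groups a and b, plus the log prior of the
    assignment z (e.g., a CRP/CRF term computed elsewhere)."""
    p = eta[z[:, None], z[None, :]]        # p[i, j] = eta[z[i], z[j]]
    loglik = (R * np.log(p) + (1 - R) * np.log(1 - p)).sum()
    return loglik + log_prior_z

R = np.array([[0, 1], [1, 0]])             # toy 2-node adjacency matrix
z = np.array([0, 1])                       # one group per node
eta = np.array([[0.1, 0.9], [0.9, 0.1]])
print(log_unnormalized_posterior(R, z, eta, log_prior_z=0.0))
```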
10. We can handle relational observations between two domains
- Two domains: animals and features
- Animals form two groups: birds and four-legged mammals
11. Can we make accurate predictions about missing links?
Results based on maximum likelihood configuration
- Eagle hunts and is not small.
- Dog does not hunt.
- Cat does not run.
12. Our models are slightly more accurate than IRMs (infinite relational models)
- Full-sample log-score (LSFS) averaged over all observable links R_{i,j}
- M.-H. Chen, Q.-M. Shao, and J.G. Ibrahim. Monte Carlo Methods in Bayesian Computation. Springer-Verlag, January 2000.
- A higher LSFS score denotes a more accurate model
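A sketch of how such a score can be computed from posterior samples; the (S, N, N) array of per-sample link probabilities and the function name are our assumptions about the interface, not the cited book's code.

```python
import numpy as np

def full_sample_log_score(R, probs_per_sample):
    """LSFS sketch: average the per-sample link probabilities over the S
    posterior samples to get the posterior predictive, then average the
    log-probability assigned to each observed link value R[i, j].
    Higher is better."""
    p = probs_per_sample.mean(axis=0)          # posterior predictive
    logp = np.where(R == 1, np.log(p), np.log(1 - p))
    return logp.mean()                         # averaged over all links

rng = np.random.default_rng(0)
S, N = 4, 3
samples = rng.uniform(0.2, 0.8, size=(S, N, N))
R = rng.integers(0, 2, size=(N, N))
print(full_sample_log_score(R, samples))
```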
13. We learn hierarchies with a nonparametric prior on trees
- We want to simultaneously find the mixtures of identities and order them in a tree.
- Each hierarchy node is a different
group/identity.
- The most general/common groups are at the top of
the tree.
- The most specific/idiosyncratic groups are at the
bottom of the tree.
- We use the nested hierarchical Chinese Restaurant
Process as our prior.
- The user specifies the number of levels in the tree, which sets the degree of detail he/she wants.
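A minimal sketch of drawing one root-to-leaf path of fixed depth from a nested CRP prior, the structure the bullets above describe; the dict-of-counts tree representation and the parameter name gamma are our illustrative choices.

```python
import numpy as np

def ncrp_sample_path(tree, levels, gamma, rng):
    """Draw one root-to-leaf path from a nested Chinese Restaurant
    Process of user-specified depth. At each level the entity follows
    an existing branch with probability proportional to how many earlier
    entities took it, or opens a new branch prop. to gamma; tree maps a
    path prefix (tuple) to the visit counts of its children."""
    path = ()
    for _ in range(levels):
        counts = tree.setdefault(path, [])
        w = np.array(counts + [gamma], dtype=float)
        c = rng.choice(len(w), p=w / w.sum())
        if c == len(counts):                  # new child node in the tree
            counts.append(0)
        counts[c] += 1
        path = path + (c,)
    return path

rng = np.random.default_rng(1)
tree = {}
print([ncrp_sample_path(tree, levels=3, gamma=1.0, rng=rng) for _ in range(10)])
```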
14. We use our nonparametric Bayesian model to find hierarchies of groups
- 43 liberals, 49 conservatives, 13 neutrals
- Links imply frequent co-purchasing by the same
buyers (Amazon.com)
[Figure: learned hierarchy; each hierarchy node is a different group, most general at the top, most specific at the bottom.]
15. We use our nonparametric Bayesian approach to find hierarchies of groups
MIT Reality Mining proximity graph: 98 people (nodes) and 1766 links
[Figure: learned hierarchy with the distribution of titles per group; each hierarchy node is a different group, most general at the top, most specific at the bottom.]
16. We used our nonparametric Bayesian model to find hierarchies of groups
Enron email network: 159 people, 900 emails
[Figure: learned hierarchy; each hierarchy node is a different group, most general at the top, most specific at the bottom.]
17. Nonparametric models with flexible priors easily adapt to relational data
- Relational data contain significant information about group structure
- Bayesian models allow the analyst to make inferences about groups of interest while quantifying the level of confidence, even when a significant proportion of the data is missing
- Our IMMM and hIMMM allow for increased flexibility and provide additional information about objects that simultaneously belong to several groups
- There are other approaches to structure discovery
  - Compression/MDL-based
  - Graph-theoretic
18. Compression-based approaches to structure discovery rely on MDL
- Compressed form of the adjacency matrix
- Minimize total encoding cost
- [D. Chakrabarti, PKDD 2004]
- [J. Sun et al., KDD 2007]
MIT Reality Mining Proximity Data
- p_{i,1} = n_{i,1} / (n_{i,1} + n_{i,0})
- Code cost: Σ_i (n_{i,1} + n_{i,0}) · H(p_{i,1})
- Description cost: cost of describing n_{i,1}, n_{i,0}, and the groups
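A small sketch of the code-cost term above, using the binary entropy H; finding the blocks and the description cost are omitted, and the function names are ours.

```python
import numpy as np

def binary_entropy(p):
    """H(p) in bits; H(0) = H(1) = 0 by convention."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def code_cost(blocks):
    """MDL code cost sketch: each block i with n1 ones and n0 zeros costs
    (n1 + n0) * H(p_{i,1}) bits, where p_{i,1} = n1 / (n1 + n0). The cost
    of describing n_{i,1}, n_{i,0}, and the groups is added on top when
    minimizing the total encoding cost."""
    return sum((n1 + n0) * binary_entropy(n1 / (n1 + n0)) for n1, n0 in blocks)

# Two nearly homogeneous blocks compress well; one mixed block does not.
print(code_cost([(90, 10), (5, 95)]))      # ~75 bits
print(code_cost([(95, 105)]))              # ~200 bits
```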
19. Graph-theoretic methods find groups based on higher-than-average density of links within them
Zachary's Karate Club, 1977
- Maximize modularity [M.E.J. Newman]
  - The difference between the fraction of links that fall within communities and the fraction expected if links were placed at random
- Generates dendrograms
- Find eigenvalues and eigenvectors of the
modularity matrix
Shades depict the values of the elements in the leading eigenvector of the modularity matrix
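A minimal sketch of the spectral step described above: build the modularity matrix B = A - k k^T / (2m), take its leading eigenvector, and split nodes by sign; np.linalg.eigh is used because B is symmetric. The function name and the toy graph are ours.

```python
import numpy as np

def leading_eigenvector_split(A):
    """Spectral bisection for modularity: B[i, j] = A[i, j] - k_i k_j / 2m,
    where k is the degree vector and m the number of links. Nodes are
    split by the sign of their entry in B's leading eigenvector (the
    quantity the shading in the figure depicts)."""
    k = A.sum(axis=1)
    B = A - np.outer(k, k) / k.sum()          # k.sum() equals 2m
    vals, vecs = np.linalg.eigh(B)            # eigh: B is symmetric
    v = vecs[:, np.argmax(vals)]              # leading eigenvector
    return v >= 0                             # community labels by sign

# Toy graph: two triangles joined by a single edge.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
print(leading_eigenvector_split(A))           # separates the two triangles
```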
20. Validation in structure discovery is often anecdotal
- Validation in unsupervised approaches is hard
since no oracle exists
- Probabilistic approaches have an advantage here
- Just remove some links R_{i,j} from the adjacency matrix (a.k.a. a held-out test set)
- Calculate p(R_{i,j} | R, I) using the posterior predictive distribution, which can be computed easily by averaging over the posterior samples
- Compare the outcome with the held-out test set to generate the confusion matrix and its derivatives
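A sketch of this held-out evaluation; the (S, N, N) array of per-sample link probabilities, the boolean mask interface, and the 0.5 threshold are illustrative assumptions, not a prescribed protocol.

```python
import numpy as np

def held_out_confusion(R, held_out_mask, probs_per_sample, threshold=0.5):
    """Average p(R_ij | R, I) over the S posterior samples (the posterior
    predictive), threshold it on the held-out entries, and compare with
    the true held-out links to form a confusion matrix."""
    p = probs_per_sample.mean(axis=0)
    pred = p[held_out_mask] >= threshold
    true = R[held_out_mask] == 1
    return {"tp": int((pred & true).sum()),
            "tn": int((~pred & ~true).sum()),
            "fp": int((pred & ~true).sum()),
            "fn": int((~pred & true).sum())}

rng = np.random.default_rng(0)
N, S = 5, 10
R = rng.integers(0, 2, size=(N, N))
mask = rng.random((N, N)) < 0.2                # hold out ~20% of entries
samples = rng.uniform(0.1, 0.9, size=(S, N, N))
print(held_out_confusion(R, mask, samples))
```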
21. The future: hybrid methods for efficient inference and structure discovery in temporal relational data
- We need efficient inference procedures for
non-parametric Bayesian relational models.
- Need to go beyond Markov chain Monte Carlo
methods and variational inference
- Should consider hybrid algorithms that
incorporate graph-theoretic and/or
compression-based approaches
- We need models with nonparametric priors that
capture temporal sensitivity.
- Want to reformulate the model so that it learns
joint distributions over groups and time steps
- Start with dependent Dirichlet processes [J.E. Griffin and M.F.J. Steel, 2006]
22. Thank you
- Contact information
  - eliassi_at_llnl.gov
  - http://www.llnl.gov/comp/bio.php/eliassirad1
- This is joint work with P.S. (Steve) Koutsourelakis at Cornell University (formerly at LLNL).