Learning Structure from Relational Observations - PowerPoint PPT Presentation

Transcript and Presenter's Notes

1
Learning Structure from Relational Observations
Tina Eliassi-Rad
Computation Directorate
Lawrence Livermore National Laboratory
Work performed under the auspices of the U.S.
Department of Energy by University of California
Lawrence Livermore National Laboratory under
Contract W-7405-Eng-48.
2
Relational observations are ubiquitous
  • Relational observations
  • Can be binary, integer, or real-valued
  • Can be of different types and have observable
    attributes
  • Can contain objects of different types with their
    own (observable) attributes
  • Are often represented as dynamic heterogeneous
    networks

3
Structure in observations can be represented in
various forms
Zachary's Karate Club, 1977
Based on the maximum likelihood configuration
posterior ∝ likelihood × prior
Averaged over samples from the posterior
4
Why do we want to find hidden structure?
  • Discover groups of objects (a.k.a. communities)
  • Predict missing links and attributes
  • Understand the structures within the
    observations

Results based on maximum likelihood configuration
5
We use a non-parametric Bayesian approach to find
hidden structure in graphs
Given an adjacency matrix R where R_{i,j} ∈ {0, 1}
(observables), find the hidden/latent group
assignment z_i.
Probabilistic - Bayesian Formulation
Non-parametric models can automatically infer an
adequate model complexity from data, without
needing to explicitly do Bayesian model
comparison.
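As a toy illustration of this Bayesian formulation (a sketch, not the talk's actual model): for a tiny graph with an assumed two-block likelihood, where links appear with probability p_in inside a group and p_out across groups, the posterior over group assignments can be computed by brute-force enumeration.

```python
import itertools
import math

# Hypothetical toy data: a triangle among nodes 0-2 plus an isolated node 3.
R = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 0],
     [0, 0, 0, 0]]
p_in, p_out = 0.9, 0.1  # assumed within/between link probabilities

def log_likelihood(z):
    """Log-likelihood of the observed links under assignment z."""
    ll = 0.0
    n = len(R)
    for i in range(n):
        for j in range(i + 1, n):
            p = p_in if z[i] == z[j] else p_out
            ll += math.log(p if R[i][j] else 1 - p)
    return ll

# Uniform prior over assignments, so posterior ∝ likelihood.
assignments = list(itertools.product([0, 1], repeat=4))
weights = [math.exp(log_likelihood(z)) for z in assignments]
total = sum(weights)
posterior = {z: w / total for z, w in zip(assignments, weights)}
best = max(posterior, key=posterior.get)
```

The maximum-posterior configuration places the triangle nodes in one group and the isolated node in the other, matching the intuition that links reveal groups.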
6
Relational observations tend to be large, noisy,
and missing some attribute values
  • We need algorithms that
  • are robust to noisy data,
  • can cope with and accurately fill in missing
    values,
  • scale linearly in the size of the data graph (|E|),
  • can incrementally update their results.

7
We want a mixed membership model with
hierarchical representation
  • A mixed membership model accounts for a different
    mixture of identities (i.e., groups) for each
    entity, drawn from the set of identities that is
    common to the whole graph.
  • We want to learn both the groups in the
    population and all the existing mixtures of them.
  • Existing approaches assume each entity has a
    single identity based on all the relations that
    it participates in.
  • A model with hierarchical representation can
    express more complex tree-like patterns (e.g., an
    organizational hierarchy).

Hierarchical Representation
Mixed Membership
8
We extend nonparametric Bayesian approaches to
model each relationship
  • Our prior is defined by the Chinese Restaurant
    Franchise [Y.W. Teh et al., JASA 2006].
  • The likelihood of a link R^m_{ij} depends only on
    the identities of the participating entities for
    that link, z_{i,m} and z_{j,m}.
  • Properties
  • Number of clusters can be learned from the data
  • It has a few more parameters, but also higher
    expressivity
  • Inference with Gibbs sampling can be based on the
    conditionals above

n_{i,t} = # of customers already at table t in restaurant i
t_{i,m} = table assignment for customer m at restaurant i
s_k = # of tables already eating dish k
d_{i,t} = dish assignment for table t in restaurant i
M = # of tables
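The Chinese Restaurant Franchise builds on the basic Chinese Restaurant Process. A minimal sketch of the single-restaurant building block (with an assumed concentration parameter alpha): customer m joins an existing table with probability proportional to its occupancy, or starts a new table with probability proportional to alpha.

```python
import random

def crp_sample(num_customers, alpha, rng):
    """Sample table assignments from a Chinese Restaurant Process."""
    tables = []   # tables[t] = number of customers seated at table t
    seating = []  # seating[m] = table assigned to customer m
    for m in range(num_customers):
        weights = tables + [alpha]  # existing tables, plus a new one
        r = rng.random() * sum(weights)
        for t, w in enumerate(weights):
            r -= w
            if r < 0:
                break
        if t == len(tables):
            tables.append(1)  # open a new table
        else:
            tables[t] += 1
        seating.append(t)
    return seating, tables

rng = random.Random(0)
seating, tables = crp_sample(50, alpha=1.0, rng=rng)
```

The number of occupied tables grows with the data (roughly logarithmically), which is exactly the "model complexity inferred from data" property the slides describe.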
9
We extend nonparametric Bayesian approaches to
model each relationship
  • Posterior ∝ Likelihood × Prior
  • Pr(z_V | R_V) ∝ Pr(R_V | z_V) Pr(z_V) =
    ( Π_m Pr(R_{ij} | z_{i,m}, z_{j,m}) ) Pr(z_V)
  • Advantages
  • Models mixed memberships
  • Automatically infers an adequate model size from
    the data
  • Produces a set of possible
    groupings/configurations
  • Each grouping has its own posterior
    distribution
  • Can use posterior distributions to learn about
    the structure of the observations

Monk Data Set [Sampson 1969] (O = Outcasts, Y = Young
Turks, L = Loyal Opposition, and W = Waverers)
Averaged over samples from the posterior
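Inference with Gibbs sampling, as mentioned above, can be sketched for a simplified finite two-group blockmodel (assumed fixed link probabilities p_in/p_out stand in for the talk's learned parameters): each node's assignment is resampled from its full conditional while the others are held fixed.

```python
import math
import random

# Hypothetical stand-in data: two triangles with no cross links.
R = [[0, 1, 1, 0, 0, 0],
     [1, 0, 1, 0, 0, 0],
     [1, 1, 0, 0, 0, 0],
     [0, 0, 0, 0, 1, 1],
     [0, 0, 0, 1, 0, 1],
     [0, 0, 0, 1, 1, 0]]
p_in, p_out = 0.9, 0.1  # assumed within/between link probabilities
n = len(R)

def cond_loglik(z, i, g):
    """Log-likelihood of node i's links if it were assigned to group g."""
    ll = 0.0
    for j in range(n):
        if j == i:
            continue
        p = p_in if g == z[j] else p_out
        ll += math.log(p if R[i][j] else 1 - p)
    return ll

rng = random.Random(0)
z = [rng.randrange(2) for _ in range(n)]  # random initial assignment
for sweep in range(50):                   # Gibbs sweeps
    for i in range(n):
        logw = [cond_loglik(z, i, g) for g in (0, 1)]
        m = max(logw)
        w = [math.exp(x - m) for x in logw]  # numerically stabilized
        z[i] = 0 if rng.random() * (w[0] + w[1]) < w[0] else 1
```

With such sharply peaked conditionals the sampler settles quickly into the two-triangle grouping; averaging many such samples yields the posterior summaries shown on the slide.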
10
We can handle relational observations between two
domains
  • Two domains: animals and features
  • Animals form two groups: birds and 4-legged
    mammals

11
Can we make accurate predictions about missing
links?
Results based on maximum likelihood configuration
  • Eagle hunts and is not small.
  • Dog does not hunt.
  • Cat does not run.

12
Our models are slightly more accurate than IRMs
  • Full-sample log-score (LSFS), averaged over all
    observable links R_{i,j}
  • M.-H. Chen, Q.-M. Shao, and J.G. Ibrahim. Monte
    Carlo Methods in Bayesian Computation.
    Springer-Verlag, January 2000.
  • A higher LSFS score denotes a more accurate model
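As a hedged illustration (invented numbers, not results from the talk), an LSFS-style score can be computed by averaging each link's predicted probability over posterior samples, then taking the mean log-probability of the observed values:

```python
import math

# Hypothetical observed links (1 = present, 0 = absent) and three
# made-up posterior samples of predicted link probabilities.
observed = {(0, 1): 1, (0, 2): 1, (1, 3): 0, (2, 3): 0}
posterior_samples = [
    {(0, 1): 0.90, (0, 2): 0.80, (1, 3): 0.20, (2, 3): 0.10},
    {(0, 1): 0.85, (0, 2): 0.90, (1, 3): 0.15, (2, 3): 0.20},
    {(0, 1): 0.95, (0, 2): 0.70, (1, 3): 0.10, (2, 3): 0.15},
]

def log_score(links, samples):
    """Mean log posterior-predictive probability of the observed links."""
    total = 0.0
    for pair, r in links.items():
        # Posterior predictive: average link probability across samples.
        p = sum(s[pair] for s in samples) / len(samples)
        total += math.log(p if r == 1 else 1 - p)
    return total / len(links)

lsfs = log_score(observed, posterior_samples)
```

A score closer to 0 means the model assigns higher probability to what was actually observed, which is the sense in which "a higher LSFS denotes a more accurate model."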

13
We learn hierarchies with a nonparametric prior
on trees
  • We want to simultaneously find mixtures of
    identities and order them in a tree.
  • Each hierarchy node is a different
    group/identity.
  • The most general/common groups are at the top of
    the tree.
  • The most specific/idiosyncratic groups are at the
    bottom of the tree.
  • We use the nested hierarchical Chinese Restaurant
    Process as our prior.
  • The user specifies the number of levels in the
    tree, which sets the desired degree of detail.

14
We use our nonparametric Bayesian model to find
hierarchies of groups
  • 43 liberals, 49 conservatives, 13 neutrals
  • Links imply frequent co-purchasing by the same
    buyers (Amazon.com)

Each hierarchy node is a different group.
most general
most specific
15
We use our nonparametric Bayesian approach to
find hierarchies of groups
most general
MIT Reality Mining Proximity Graph
98 people (nodes) and 1766 links
most specific
Distribution of Titles
Each hierarchy node is a different group.
16
We used our nonparametric Bayesian model to find
hierarchies of groups
Enron: 159 people, 900 emails
most general
most specific
Each hierarchy node is a different group.
17
Nonparametric models with flexible priors easily
adapt to relational data
  • Relational data contain significant information
    about group structure
  • Bayesian models allow the analyst to make
    inferences about groups of interest while
    quantifying the level of confidence, even when a
    significant proportion of the data is missing
  • Our IMMM and hIMMM allow for increased
    flexibility and provide additional information
    about objects that simultaneously belong to
    several groups
  • There are other approaches to structure
    discovery
  • Compression/MDL based
  • Graph-theoretic based

18
Compression-based approaches to structure
discovery rely on MDL
  • Compressed form of the adjacency matrix
  • Minimize total encoding cost
  • Chakrabarti, PKDD 2004
  • J. Sun et al., KDD 2007

MIT Reality Mining Proximity Data
Total cost = Code Cost + Description Cost
Code Cost = Σ_i (n_i^1 + n_i^0) H(p_i^1),
where p_i^1 = n_i^1 / (n_i^1 + n_i^0)
Description Cost = Σ_i (cost of describing n_i^1,
n_i^0, and the groups)
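The code-cost term above can be sketched directly (hypothetical block counts, not the Reality Mining numbers): each block of the partitioned adjacency matrix with n1 ones and n0 zeros costs (n1 + n0) times the binary entropy of its density.

```python
import math

def H(p):
    """Binary entropy in bits; 0 by convention at p in {0, 1}."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Hypothetical partition: (n1, n0) counts for three blocks.
# Dense or empty blocks are cheap; a half-full block costs 1 bit/cell.
blocks = [(90, 10), (5, 95), (50, 50)]
code_cost = sum((n1 + n0) * H(n1 / (n1 + n0)) for n1, n0 in blocks)
```

Minimizing total cost trades this code cost against the description cost of the block structure itself, which is what drives MDL-based group discovery.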
19
Graph-theoretic methods find groups based on
higher-than-average density of links within them
Zachary's Karate Club, 1977
  • Maximize modularity [M.E.J. Newman]
  • Modularity = difference between the total
    fraction of links within communities and the
    fraction expected if links were placed at random
  • Generates dendrograms
  • Find eigenvalues and eigenvectors of the
    modularity matrix

Shades depict the values of the elements in the
leading eigenvector of the modularity matrix
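A sketch of the eigenvector step (a toy graph of two triangles joined by one edge, not the karate-club data): the leading eigenvector of the modularity matrix B[i][j] = A[i][j] - k_i k_j / (2m), found here by plain power iteration, splits the nodes into two communities by sign.

```python
# Hypothetical toy graph: two triangles {0,1,2} and {3,4,5} joined by edge 2-3.
A = [[0, 1, 1, 0, 0, 0],
     [1, 0, 1, 0, 0, 0],
     [1, 1, 0, 1, 0, 0],
     [0, 0, 1, 0, 1, 1],
     [0, 0, 0, 1, 0, 1],
     [0, 0, 0, 1, 1, 0]]
n = len(A)
k = [sum(row) for row in A]  # degrees
two_m = sum(k)               # 2m = total degree

def B(i, j):
    """Modularity matrix entry: observed minus expected link."""
    return A[i][j] - k[i] * k[j] / two_m

# Shift by 2m so all eigenvalues of B + shift*I are positive
# (a loose Gershgorin-style bound); power iteration then converges
# to the eigenvector of B's largest eigenvalue.
shift = float(two_m)
v = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]  # initial guess
for _ in range(200):
    w = [sum(B(i, j) * v[j] for j in range(n)) + shift * v[i]
         for i in range(n)]
    norm = max(abs(x) for x in w)
    v = [x / norm for x in w]

# Signs of the leading eigenvector give the two-way split.
groups = [0 if x > 0 else 1 for x in v]
```

The sign pattern recovers the two triangles; iterating the split on each part is what produces the dendrograms mentioned above.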
20
Validation in structure discovery is often
anecdotal
  • Validation in unsupervised approaches is hard
    since no oracle exists
  • Probabilistic approaches have an advantage here
  • Remove some links R_{i,j} from the adjacency
    matrix (a.k.a. the held-out test set)
  • Calculate p(R_{i,j} | R, I) using the posterior
    predictive distribution, which is easily
    computed by averaging over the posterior samples
  • Compare the outcome with the held-out test set to
    generate the confusion matrix and its derivatives
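A minimal sketch of the last two steps (made-up held-out links and predictive probabilities, with a hypothetical 0.5 decision threshold):

```python
# Hypothetical held-out links (1 = present, 0 = absent) and
# posterior-predictive probabilities p(R_ij | R, I) for each pair.
held_out   = {(0, 1): 1, (0, 2): 1, (1, 2): 0, (1, 3): 0, (2, 3): 1}
predictive = {(0, 1): 0.92, (0, 2): 0.78, (1, 2): 0.30,
              (1, 3): 0.65, (2, 3): 0.40}

# Threshold the predictive probabilities and tally the confusion matrix.
tp = fp = fn = tn = 0
for pair, actual in held_out.items():
    predicted = 1 if predictive[pair] >= 0.5 else 0
    if predicted == 1 and actual == 1:
        tp += 1
    elif predicted == 1 and actual == 0:
        fp += 1
    elif predicted == 0 and actual == 1:
        fn += 1
    else:
        tn += 1

# Standard derivatives of the confusion matrix.
precision = tp / (tp + fp)
recall = tp / (tp + fn)
```

Because the probabilistic model gives calibrated probabilities rather than hard guesses, the threshold can be swept to trade precision against recall, which is the advantage over purely anecdotal validation.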

21
The future: hybrid methods for efficient
inference; structure discovery in temporal
relational data
  • We need efficient inference procedures for
    non-parametric Bayesian relational models.
  • Need to go beyond Markov chain Monte Carlo
    methods and variational inference
  • Should consider hybrid algorithms that
    incorporate graph-theoretic and/or
    compression-based approaches
  • We need models with nonparametric priors that
    capture temporal sensitivity.
  • Want to reformulate the model so that it learns
    joint distributions over groups and time steps
  • Start with dependent Dirichlet processes
    [Griffin and M.F.J. Steel, 2006]

22
Thank you
  • Contact information
  • eliassi@llnl.gov
  • http://www.llnl.gov/comp/bio.php/eliassirad1
  • This is joint work with P.S. (Steve)
    Koutsourelakis at Cornell University (formerly at
    LLNL).