Title: Statistical Learning from Relational Data
1. Statistical Learning from Relational Data
- Daphne Koller
- Stanford University
- Joint work with many, many people
2. Relational Data is Everywhere
- The web
  - Webpages (and the entities they represent), hyperlinks
- Social networks
  - People, institutions, friendship links
- Biological data
  - Genes, proteins, interactions, regulation
- Bibliometrics
  - Papers, authors, journals, citations
- Corporate databases
  - Customers, products, transactions
3. Relational Data is Different
- Data instances are not independent
  - Topics of linked webpages are correlated
- Data instances are not identically distributed
  - Heterogeneous instances (papers, authors)
No IID assumption! This is a good thing!
4. New Learning Tasks
- Collective classification of related instances
  - Labeling an entire website of related webpages
- Relational clustering
  - Finding coherent clusters in the genome
- Link prediction / classification
  - Predicting when two people are likely to be friends
- Pattern detection in networks of related objects
  - Finding groups (research groups, terrorist groups)
5. Probabilistic Models
- Uncertainty model:
  - Space of possible worlds
  - Probability distribution over this space
- Worlds often defined via a set of state variables
  - Medical diagnosis: diseases, symptoms, findings, ...
  - Each world: an assignment of values to variables
- Number of worlds is exponential in the number of variables
  - 2^n if we have n binary variables
6. Outline
- Relational Bayesian networks
- Relational Markov networks
- Collective Classification
- Relational clustering
with Avi Pfeffer, Nir Friedman, Lise Getoor
7. Bayesian Networks
[Figure: example network over Difficulty, Intelligence, Grade, SAT, Letter, Job]
- Nodes: variables; edges: direct influence
- Graph structure encodes independence assumptions: Letter is conditionally independent of Intelligence given Grade
8. BN Semantics
conditional independencies in BN structure + local probability models = full joint distribution over domain
- Compact, natural representation
  - If nodes have ≤ k parents, need about 2^k·n params vs. 2^n
  - Parameters natural and easy to elicit
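The parameter-count comparison on this slide can be checked with a short sketch (illustrative Python, not part of the original deck):

```python
# Parameter counts for n binary variables: a full joint table
# vs. a Bayesian network whose nodes each have at most k parents.

def full_joint_params(n):
    # One probability per assignment, minus 1 for normalization.
    return 2 ** n - 1

def bn_params(n, k):
    # Each of the n CPDs has 2^k rows (one per parent assignment),
    # with one free parameter per row for a binary variable.
    return n * (2 ** k)

print(full_joint_params(20))   # 1048575
print(bn_params(20, 3))        # 160
```

Even for 20 variables the factored representation is four orders of magnitude smaller.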
9. Reasoning using BNs
[Figure: the network over Difficulty, Intelligence, Grade, SAT, Letter, with evidence on Letter and SAT]
Full joint distribution specifies the answer to any query: P(variable | evidence about others)
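A toy illustration of answering P(variable | evidence) by summing the full joint; the two-node network (Intelligence → SAT) and its CPD numbers are invented for the sketch:

```python
# Inference by enumeration: the full joint distribution
# answers any query P(variable | evidence).

# P(I) and P(S | I) for binary Intelligence (I) and SAT (S).
p_I = {0: 0.7, 1: 0.3}
p_S_given_I = {0: {0: 0.95, 1: 0.05}, 1: {0: 0.2, 1: 0.8}}

def joint(i, s):
    # Chain rule: P(I, S) = P(I) P(S | I).
    return p_I[i] * p_S_given_I[i][s]

# Query: P(I = 1 | S = 1) by summing the joint.
num = joint(1, 1)
den = sum(joint(i, 1) for i in (0, 1))
print(round(num / den, 3))   # 0.873
```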
10. Bayesian Networks: Problem
- Bayesian nets use a propositional representation
- Real world has objects, related to each other
[Figure: Intelligence, Difficulty, and Grade variables replicated for students A and C]
These instances are not independent
11. Relational Schema
- Specifies the types of objects in the domain, the attributes of each type of object, and the types of relations between objects
[Figure: classes Student (attribute Intelligence), Professor (attribute Teaching-Ability), Course (attribute Difficulty); relations Teach, Take, In]
12. St. Nordaf University
[Figure: an example world: Prof. Smith and Prof. Jones teach courses, including "Welcome to CS101"; George and Jane are registered in courses, with Grade and Satisfaction attributes on each registration]
13. Relational Bayesian Networks
- Universals: probabilistic patterns hold for all objects in a class
- Locality: represent direct probabilistic dependencies
- Links define potential interactions
K., Pfeffer; Poole; Ngo, Haddawy
14. RBN Semantics
- Ground model:
  - Variables: attributes of all objects
  - Dependencies: determined by relational links and the template model
[Figure: ground network over Prof. Smith, Prof. Jones, George, Jane, and the course "Welcome to CS101"]
15. The Web of Influence
[Figure: beliefs about course difficulty (easy / hard) and student intelligence (low / high) propagate through the ground network]
16. Likelihood Function
- Likelihood of a BN with shared parameters
- Joint likelihood is a product of likelihood terms
  - One for each attribute X.A and its family
- For each X.A, the likelihood function aggregates counts from all occurrences x.A in world ω
Friedman, Getoor, K., Pfeffer, 1999
17. Likelihood Function: Multinomials
[Equations: the log-likelihood decomposes into one term per parameter, weighted by the aggregated counts (sufficient statistics)]
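A minimal sketch of why shared parameters make the log-likelihood depend only on aggregated counts: the same class-level CPD is reused by every object, so occurrences can be pooled into sufficient statistics (the attribute values and parameters below are invented):

```python
# Shared-parameter multinomial log-likelihood: one CPD per
# class-level attribute X.A, reused by every object x, so the
# likelihood only needs aggregated counts N[parent_config, value].
import math
from collections import Counter

def log_likelihood(theta, observations):
    # observations: one (parent_config, value) pair per
    # occurrence x.A in the world.
    counts = Counter(observations)           # sufficient statistics
    return sum(n * math.log(theta[u][v]) for (u, v), n in counts.items())

theta = {"easy": {"A": 0.7, "B": 0.3}, "hard": {"A": 0.2, "B": 0.8}}
obs = [("easy", "A"), ("easy", "A"), ("hard", "B")]
print(round(log_likelihood(theta, obs), 3))   # -0.936
```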
18. RBN Parameter Estimation
- MLE parameters: computed from aggregated sufficient statistics
- Bayesian estimation:
  - Prior for each attribute X.A
  - Posterior uses aggregated sufficient statistics
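A sketch of both estimators from aggregated counts (the counts and the symmetric Dirichlet prior below are invented for illustration):

```python
# MLE vs. Bayesian estimation of a multinomial CPD row from
# aggregated sufficient statistics.

def mle(counts):
    # Maximum-likelihood: normalized counts.
    total = sum(counts.values())
    return {v: n / total for v, n in counts.items()}

def bayes(counts, alpha=1.0):
    # Posterior mean under a symmetric Dirichlet(alpha) prior.
    total = sum(counts.values()) + alpha * len(counts)
    return {v: (n + alpha) / total for v, n in counts.items()}

counts = {"A": 8, "B": 2}       # aggregated over all objects
print(mle(counts))              # {'A': 0.8, 'B': 0.2}
print(bayes(counts))            # {'A': 0.75, 'B': 0.25}
```

The prior smooths the estimate toward uniform, which matters when some attribute values have few occurrences.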
19. Learning RBN Structure
- Define set of legal RBN structures
  - Ones with legal class dependency graphs
- Define scoring function: Bayesian score
  - Product of family scores
    - One for each X.A
    - Uses aggregated sufficient statistics
- Search for high-scoring legal structure
Friedman, Getoor, K., Pfeffer, 1999
20. Learning RBN Structure
- All operations done at class level
  - Dependency structure: parents for X.A
  - Acyclicity checked using class dependency graph
  - Score computed at class level
- Individual objects only contribute to sufficient statistics
  - Can be obtained efficiently using standard DB queries
21. Outline
- Relational Bayesian networks
- Relational Markov networks
- Collective Classification
- Relational clustering
with Avi Pfeffer, Nir Friedman, Lise Getoor
with Ben Taskar, Pieter Abbeel
22. Why Undirected Models?
- Symmetric, non-causal interactions
  - E.g., web: categories of linked pages are correlated
  - Cannot introduce directed edges because of cycles
- Patterns involving multiple entities
  - E.g., web: triangle patterns
  - Directed edges not appropriate
- "Solution": impose arbitrary direction
  - Not clear how to parameterize CPD for variables involved in multiple interactions
  - Very difficult within a class-based parameterization
Taskar, Abbeel, K. 2001
23. Markov Networks
- A Markov network is an undirected graph over some set of variables V
- The graph is associated with a set of potentials φ_i
  - Each potential is a factor over a subset V_i
  - Variables in V_i must be a (sub)clique in the network
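A minimal sketch of the Markov-network joint distribution as a normalized product of potentials, here with a single clique {X, Y} and invented potential values:

```python
# Markov network joint: P(x) = (1/Z) * product of clique
# potentials; tiny two-variable example with one potential.
import itertools

phi = {(0, 0): 5.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 5.0}

# Partition function: sum of the unnormalized product
# over all joint assignments.
Z = sum(phi[xy] for xy in itertools.product((0, 1), repeat=2))

def p(x, y):
    return phi[(x, y)] / Z

print(p(0, 0))   # 5/12: agreeing values are more likely
```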
24. Markov Networks
[Figure: example network over people: James, Mary, Kyle, Noah, Laura]
25. Relational Markov Networks
- Universals: probabilistic patterns hold for all groups of objects
- Locality: represent local probabilistic dependencies
- Sets of links give us possible interactions
26. RMN Semantics
[Figure: ground network: the Intelligence and Grade variables of George, Jane, and Jill, coupled through the Geo and CS study groups and the course "Welcome to CS101"]
27. Outline
- Relational Bayesian Networks
- Relational Markov Networks
- Collective Classification
- Discriminative training
- Web page classification
- Link prediction
- Relational clustering
with Ben Taskar, Carlos Guestrin, Ming Fai
Wong, Pieter Abbeel
28. Collective Classification
[Diagram: training data (features D.x, labels D.y) + probabilistic relational model → learning → model structure; new data (features D.x) + inference → conclusions (labels D.y)]
Example:
- Train on one year of student intelligence, course difficulty, and grades
- Given only grades in the following year, predict all students' intelligence
29. Learning RMN Parameters
Parameterize potentials as a log-linear model
[Equation: template potential shared across all groundings]
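A sketch of a log-linear template potential, the exponential of a weighted feature sum, with invented feature names and weights:

```python
# Log-linear template potential: phi = exp(w . f), where the
# shared weight vector w is applied to the feature counts f
# of every grounding of the template.
import math

def potential(w, f):
    # w, f: dicts mapping feature name -> weight / count.
    return math.exp(sum(w[k] * f.get(k, 0.0) for k in w))

w = {"same_label": 1.2, "word_match": 0.5}
f = {"same_label": 1.0, "word_match": 2.0}
print(round(potential(w, f), 3))   # exp(2.2) = 9.025
```

Because the weights are shared, learning adjusts one parameter vector that affects every grounding of the template at once.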
30. Max Likelihood Estimation
We don't care about the joint distribution P(D.x, D.y)!
[Equations: estimation maximizes the likelihood over w; classification computes argmax_y]
31. Web → KB
[Figure: WebKB domain of university webpages]
Craven et al.
32. Web Classification Experiments
- WebKB dataset
- Four CS department websites
- Bag of words on each page
- Links between pages
- Anchor text for links
- Experimental setup
- Trained on three universities
- Tested on fourth
- Repeated for all four combinations
33. Standard Classification
[Figure: a page classified from its own words]
- Categories: faculty, course, project, student, other
- Word features: professor, department, extract, information, computer, science, machine, learning
34. Standard Classification
[Figure: test-set error; anchor text such as "working with Tom Mitchell" provides additional features]
- Discriminatively trained naïve Markov = logistic regression
- 4-fold CV: trained on 3 universities, tested on the 4th
35. Power of Context
[Figure: a linked page: Professor? Student? Post-doc?]
36. Collective Classification
37. Collective Classification
Classify all pages collectively, maximizing the joint label probability
[Figure: test-set error]
Taskar, Abbeel, K., 2002
38. More Complex Structure
39. More Complex Structure
40. Collective Classification Results
- 35.4% error reduction over logistic regression
[Figure: test-set error for Logistic, Links, Section, and Link+Section models]
Taskar, Abbeel, K., 2002
41. Max Conditional Likelihood
We don't care about the conditional distribution P(D.y | D.x)!
[Equations: estimation maximizes the conditional likelihood over w; classification computes argmax_y]
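The gap between fitting a distribution and just getting labels right shows up even in the single-node case. A sketch of conditional-likelihood training by gradient ascent on a toy logistic model (data and step size invented):

```python
# Conditional-likelihood training for a one-feature binary
# logistic model P(y=1 | x, w) = sigmoid(w * x): the simplest
# analogue of discriminative (conditional) RMN training.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def grad(w, data):
    # Gradient of sum_i log P(y_i | x_i, w).
    return sum(x * (y - sigmoid(w * x)) for x, y in data)

data = [(1.0, 1), (2.0, 1), (-1.0, 0)]
w = 0.0
for _ in range(100):
    w += 0.1 * grad(w, data)
print(w > 0)   # the weight moves toward separating the classes
```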
42. Max Margin Estimation
What we really want: correct class labels
[Quadratic program: maximize the margin, scaled by the number of labeling mistakes in y]
- Exponentially many constraints!
Taskar, Guestrin, K., 2003 (see also Collins, 2002; Hofmann, 2003)
43. Max Margin Markov Networks
- We use the structure of the Markov network to provide an equivalent formulation of the QP
  - Exponential only in the tree width of the network
  - Complexity = max-likelihood classification
- Can solve approximately in networks where induced width is too large
  - Analogous to loopy belief propagation
- Can use kernel-based features!
  - SVMs meet graphical models
Taskar, Guestrin, K., 2003
44. WebKB Revisited
16.1% relative reduction in error over cond. likelihood RMNs
45. Predicting Relationships
[Figure: Tom Mitchell (Professor) and Sean Slattery (Student), linked through the WebKB Project]
- Even more interesting: relationships between objects
46. Predicting Relations
- Introduce an exists/type attribute for each potential link
- Learn a discriminative model for this attribute
- Collectively predict its value in a new world
- 72.9% error reduction over flat
[Figure: model linking From-Page and To-Page (Category, Word1 ... WordN) to the Relation's Exists/Type attribute via LinkWord1 ... LinkWordN]
Taskar, Wong, Abbeel, K., 2003
47. Outline
- Relational Bayesian Networks
- Relational Markov Networks
- Collective Classification
- Relational clustering
- Movie data
- Biological data
with Ben Taskar, Eran Segal
with Eran Segal, Nir Friedman, Aviv Regev, Dana Pe'er, Haidong Wang, Micha Shapira, David Botstein
48. Relational Clustering
[Diagram: unlabeled relational data + probabilistic relational model → learning → model structure and clustering of instances]
Example:
- Given only students' grades, cluster similar students
49. Learning w. Missing Data: EM
- EM algorithm applies essentially unchanged
  - E-step computes expected sufficient statistics, aggregated over all objects in class
  - M-step uses ML (or MAP) parameter estimation
- Key difference:
  - In general, the hidden variables are not independent
  - Computation of expected sufficient statistics requires inference over the entire network
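A sketch of one E-step/M-step round for a two-cluster multinomial mixture, the simplest instance of "expected sufficient statistics, then ML estimation" (data and starting parameters invented; the relational case differs in that the E-step needs joint inference over the whole network):

```python
# One EM iteration for a two-cluster multinomial mixture.

def e_step(data, pi, theta):
    # Responsibilities: posterior cluster probabilities per item.
    resp = []
    for x in data:
        w = [pi[c] * theta[c][x] for c in (0, 1)]
        z = sum(w)
        resp.append([wi / z for wi in w])
    return resp

def m_step(data, resp):
    # ML estimates from the expected sufficient statistics.
    pi = [sum(r[c] for r in resp) / len(data) for c in (0, 1)]
    theta = []
    for c in (0, 1):
        tot = sum(r[c] for r in resp)
        theta.append({v: sum(r[c] for x, r in zip(data, resp) if x == v) / tot
                      for v in set(data)})
    return pi, theta

data = ["A", "A", "B"]
resp = e_step(data, [0.5, 0.5],
              [{"A": 0.9, "B": 0.1}, {"A": 0.2, "B": 0.8}])
pi, theta = m_step(data, resp)
print([round(p, 3) for p in pi])
```

Here each item's responsibilities are computed independently; in an RBN/RMN the hidden variables are coupled, so this step becomes a global inference problem.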
50. Learning w. Missing Data: EM
Dempster et al. '77
[Figure: beliefs about course difficulty (easy / hard) and student intelligence (low / high) updated across EM iterations]
51. Movie Data
Internet Movie Database: http://www.imdb.com
52. Discovering Hidden Types
Learn model using EM
[Figure: hidden Type attribute added to each class]
Taskar, Segal, K., 2001
53. Discovering Hidden Types
Taskar, Segal, K., 2001
54. Biology 101: Gene Expression
[Figure: the transcription factor Swi5 binding DNA]
Cells express different subsets of their genes in different tissues and under different conditions.
55. Gene Expression Microarrays
- Measure mRNA level for all genes in one condition
- Hundreds of experiments
- Highly noisy
[Figure: genes × experiments matrix; each entry is the expression of gene i in experiment j (induced vs. repressed)]
56. Standard Analysis
- Cluster genes by similarity of expression profiles
- Manually examine clusters to understand what's common to genes in a cluster
57. General Approach
- Expression level is a function of gene properties and experiment properties
- Learn the model that best explains the data
  - Observed properties: gene sequence, array condition, ...
  - Hidden properties: gene cluster
- Assignment to hidden variables (e.g., module assignment)
- Expression level as function of properties
58. Clustering as a PRM
[Figure: PRM with classes Gene (Cluster), Experiment (ID), and Expression (Level)]
59. Modular Regulation
- Learn functional modules
- Clusters of genes that are similarly controlled
- Learn control program for modules
- Expression as function of control genes
60. Module Network PRM
[Figure: PRM with Gene (Cluster), Experiment (Control1, Control2, ..., Controlk: activity levels of control genes in the experiment), and Expression (Level)]
Segal, Regev, Pe'er, Koller, Friedman, 2003
61. Experimental Results
- Yeast Stress Data (Gasch et al.)
  - 2355 genes that showed activity
  - 173 experiments (microarrays)
  - Diverse environmental stress conditions (e.g. heat shock)
- Learned module network with 50 modules
  - Cluster assignments are hidden variables
  - Structure of dependency trees unknown
  - Learned model using structural EM algorithm
Segal et al., Nature Genetics, 2003
62. Biological Evaluation
- Find sets of co-regulated genes (regulatory modules): 46/50
- Find the regulators of each module: 30/50
Segal et al., Nature Genetics, 2003
63. Experimental Results
- Hypothesis: regulator X regulates process Y
- Experiment: knock out X and rerun the experiment
Segal et al., Nature Genetics, 2003
64. Differentially Expressed Genes
Segal et al., Nature Genetics, 2003
65. Biological Experiments Validation
- Were the differentially expressed genes predicted as targets?
- Rank modules by enrichment for diff. expressed genes
Segal et al., Nature Genetics, 2003
66. Biology 102: Pathways
- Pathways are sets of genes that act together to achieve a common function
67. Finding Pathways: Attempt I
- Use protein-protein interaction data
68. Finding Pathways: Attempt I
- Use protein-protein interaction data
69. Finding Pathways: Attempt I
- Use protein-protein interaction data
- Problems:
  - Data is very noisy
  - Structure is lost
  - Large connected component in interaction graph (3527/3589 genes)
70. Finding Pathways: Attempt II
- Use expression microarray clusters
- Problems:
  - Expression is only a weak indicator of interaction
  - Interacting pathways are not separable
[Figure: Pathway I, Pathway II]
71. Finding Pathways: Our Approach
- Use both types of data to find pathways
  - Find active interactions using gene expression
  - Find pathway-related co-expression using interactions
[Figure: Pathways I-IV]
Segal, Wang, K., 2003
72. Probabilistic Model
[Figure: each Gene has a Pathway assignment and expression levels Exp1 ... ExpN in N arrays; an Interacts relation (protein product interaction) couples the Pathway assignments of interacting genes via a compatibility potential]
Cluster all genes collectively, maximizing the joint model likelihood
Segal, Wang, K., 2003
73. Capturing Protein Complexes
- Independent data set of interacting proteins
[Chart: number of complexes vs. complex coverage (%), comparing our method to standard expression clustering]
- 124 complexes covered at 50% for our method
- 46 complexes covered at 50% for clustering
Segal, Wang, K., 2003
74. RNAse Complex Pathway
[Figure: pathway containing YHR081W, RRP40, RRP42, MTR3, RRP45, RRP4, RRP43, DIS3, TRM7, SKI6, RRP46, CSL4]
- Includes all 10 known pathway genes
- Only 5 genes found by clustering
Segal, Wang, K., 2003
75. Interaction Clustering
- RNAse complex found by interaction clustering as part of a cluster with 138 genes
Segal, Wang, K., 2003
76. Truth in Advertising
- Huge graphical models
  - 3,000-50,000 hidden variables
  - Hundreds of thousands of observed nodes
  - Very densely connected
- Learning
  - Multiple iterations of model updates
  - Each requires running inference on the model
- Inference
  - Exact inference is intractable
  - Use belief propagation
  - Single inference iteration: 1-6 hours
- Algorithmic ideas key to scaling
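A minimal sum-product belief-propagation sketch on a 3-node chain, where BP is exact and can be checked against brute-force enumeration (the potentials are invented; real models here are loopy, so BP is run iteratively and approximately):

```python
# Sum-product belief propagation on a binary chain X1 - X2 - X3,
# compared against exact marginalization of the joint.
import itertools

# One pairwise potential shared by both edges.
psi = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}

def exact_marginal_x2():
    # Brute force: sum the unnormalized joint over x1, x3.
    scores = [0.0, 0.0]
    for x1, x2, x3 in itertools.product((0, 1), repeat=3):
        scores[x2] += psi[(x1, x2)] * psi[(x2, x3)]
    z = sum(scores)
    return [s / z for s in scores]

def bp_marginal_x2():
    # Messages from the leaves X1 and X3 into X2.
    m1 = [sum(psi[(x1, x2)] for x1 in (0, 1)) for x2 in (0, 1)]
    m3 = [sum(psi[(x2, x3)] for x3 in (0, 1)) for x2 in (0, 1)]
    belief = [m1[x2] * m3[x2] for x2 in (0, 1)]
    z = sum(belief)
    return [b / z for b in belief]

print(exact_marginal_x2())   # matches the BP beliefs on a tree
print(bp_marginal_x2())
```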
77. Relational Data: A New Challenge and Opportunity
- Data consists of different types of instances
- Instances are related in complex networks
- Instances are not independent
- New tasks for machine learning
  - Collective classification
  - Relational clustering
  - Link prediction
  - Group detection
78. Thank You!
http://robotics.stanford.edu/koller/