Title: Probabilistic Models of Relational Data
Slide 1: Probabilistic Models of Relational Data
- Daphne Koller
- Stanford University
- Joint work with:
- Ben Taskar
- Lise Getoor
- Eran Segal
- Pieter Abbeel
- Ming-Fai Wong
- Avi Pfeffer
- Nir Friedman
Slide 2: Why Relational?
- The real world is composed of objects that have properties and are related to each other
- Natural language is all about objects and how they relate to each other
- "George got an A in Geography 101"
Slide 3: Attribute-Based Worlds
"Smart students get As in easy classes"
- World = assignment of values to attributes / truth values to propositional symbols
Slide 4: Object-Relational Worlds
∀x,y (Smart(x) ∧ Easy(y) ∧ Take(x,y) → Grade(A,x,y))
- World = relational interpretation:
- Objects in the domain
- Properties of these objects
- Relations (links) between objects
Slide 5: Why Probabilities?
- All universals are false
- "Smart students get As in easy classes"
- True universals are rarely useful
- "Smart students get either A, B, C, D, or F" (almost)
"The actual science of logic is conversant at present only with things either certain, impossible, or entirely doubtful … Therefore the true logic for this world is the calculus of probabilities."
— James Clerk Maxwell
Slide 6: Probable Worlds
- Probabilistic semantics
- A set of possible worlds
- Each world associated with a probability
[Figure: twelve possible worlds — every combination of course difficulty (easy/hard), student ability (smart/weak), and grade (A/B/C) — each assigned a probability.]
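The possible-worlds semantics above can be made concrete in a few lines of Python. This is a toy sketch: the twelve worlds match the slide, but the world weights and the query are invented for illustration.

```python
from itertools import product

# Twelve possible worlds: difficulty x ability x grade (as on the slide).
worlds = {}
for w in product(["easy", "hard"], ["smart", "weak"], "ABC"):
    worlds[w] = 1.0  # start with uniform weights (invented)

# Reweight one world: smart students in easy classes tend to get As.
worlds[("easy", "smart", "A")] = 6.0

# Normalize so the probabilities of all worlds sum to 1.
total = sum(worlds.values())
worlds = {w: p / total for w, p in worlds.items()}

# Any query is a sum over worlds, e.g. P(grade = A | ability = smart):
num = sum(p for (d, a, g), p in worlds.items() if a == "smart" and g == "A")
den = sum(p for (d, a, g), p in worlds.items() if a == "smart")
print(num / den)  # 7/11, higher than the 1/3 a uniform model would give
```

Every query reduces to summing world probabilities; the representations that follow exist to avoid enumerating worlds explicitly.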
Slide 7: Representation Design Axes
[Figure: design axes for representations. One axis runs over world state (attributes → objects), the other over sequences. Logical representations: propositional logic & CSPs, automata & grammars, first-order logic & relational databases. Probabilistic counterparts: Bayesian nets & Markov nets, n-gram models & HMMs, probabilistic CFGs.]
Slide 8: Outline
- Bayesian Networks
- Representation & Semantics
- Reasoning
- Probabilistic Relational Models
- Collective Classification
- Undirected discriminative models
- Collective Classification Revisited
- PRMs for NLP
Slide 9: Bayesian Networks
[Figure: BN with nodes Difficulty, Intelligence, Grade, SAT, Letter]
nodes = variables; edges = direct influence
Graph structure encodes independence assumptions: Letter is conditionally independent of Intelligence given Grade
Slide 10: BN Semantics
conditional independencies in BN structure + local probability models = full joint distribution over domain
- Compact & natural representation:
- nodes have ≤ k parents ⇒ O(2^k · n) vs. O(2^n) params
- parameters natural and easy to elicit
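A minimal sketch of this factorization on the student network, with invented CPD entries (SAT omitted for brevity). The point is that a few small tables determine the full joint, which still sums to one without ever being built explicitly.

```python
# Hypothetical CPDs for the student network (all numbers invented).
P_D = {"easy": 0.6, "hard": 0.4}            # P(Difficulty)
P_I = {"low": 0.7, "high": 0.3}             # P(Intelligence)
P_G = {                                     # P(Grade | Difficulty, Intelligence)
    ("easy", "low"):  {"A": 0.3,  "B": 0.4,  "C": 0.3},
    ("easy", "high"): {"A": 0.9,  "B": 0.08, "C": 0.02},
    ("hard", "low"):  {"A": 0.05, "B": 0.25, "C": 0.7},
    ("hard", "high"): {"A": 0.5,  "B": 0.3,  "C": 0.2},
}
P_L = {                                     # P(Letter | Grade): depends only on Grade
    "A": {"yes": 0.9, "no": 0.1},
    "B": {"yes": 0.6, "no": 0.4},
    "C": {"yes": 0.1, "no": 0.9},
}

def joint(d, i, g, s):
    """Chain rule: P(D, I, G, L) = P(D) P(I) P(G | D, I) P(L | G)."""
    return P_D[d] * P_I[i] * P_G[(d, i)][g] * P_L[g][s]

# The full joint sums to 1 even though we never stored a 2x2x3x2 table.
total = sum(joint(d, i, g, s)
            for d in P_D for i in P_I for g in "ABC" for s in ("yes", "no"))
print(total)
```

The local tables hold 2 + 2 + 12 + 6 numbers here, versus 24 entries for the explicit joint; the gap widens exponentially as variables are added.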
Slide 11: Reasoning Using BNs
"Probability theory is nothing but common sense reduced to calculation." — Pierre-Simon Laplace
[Figure: the student BN (Difficulty, Intelligence, Grade, SAT, Letter), with queries and evidence on Letter and SAT]
Full joint distribution specifies the answer to any query: P(variable | evidence about others)
Slide 12: BN Inference
- BN inference is NP-hard
- But: can use graph structure
- Graph separation ⇒ conditional independence
- Do separate inference in parts
- Results combined over interface
- Complexity exponential in largest separator
- Structured BNs allow effective inference
- Exact inference in dense BNs is intractable
Slide 13: Approximate BN Inference
- Belief propagation is an iterative message-passing algorithm for approximate inference in BNs
- Each iteration (until convergence):
- Nodes pass beliefs as messages to neighboring nodes
- Cons:
- Limited theoretical guarantees
- Might not converge
- Pros:
- Linear time per iteration
- Works very well in practice, even for dense networks
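The message-passing loop described above can be sketched generically. This is a toy sum-product sketch on an invented three-node pairwise model (not the networks from the talk); the potentials, graph, and iteration count are all made up for illustration.

```python
from math import prod

# Toy pairwise model: three binary nodes in a cycle (invented).
nodes = [0, 1, 2]
edge_list = [(0, 1), (1, 2), (2, 0)]
unary = {0: [0.9, 0.1], 1: [0.5, 0.5], 2: [0.5, 0.5]}  # node "evidence"
pair = [[2.0, 1.0], [1.0, 2.0]]  # shared potential favoring agreement

# Directed messages m[i -> j], initialized uniform.
msgs = {(i, j): [1.0, 1.0] for i, j in edge_list}
msgs.update({(j, i): [1.0, 1.0] for i, j in edge_list})

def neighbors(i):
    return [j for j in nodes if (j, i) in msgs]

for _ in range(50):  # iterate until (hopefully) convergence
    new = {}
    for (i, j) in msgs:
        # m[i -> j](xj) = sum_xi unary(xi) * pair(xi, xj) * incoming msgs
        m = [sum(unary[i][xi] * pair[xi][xj]
                 * prod(msgs[(k, i)][xi] for k in neighbors(i) if k != j)
                 for xi in (0, 1))
             for xj in (0, 1)]
        z = sum(m)
        new[(i, j)] = [v / z for v in m]
    msgs = new  # synchronous update: all messages use last iteration's values

def belief(i):
    """Node belief: unary potential times incoming messages, normalized."""
    b = [unary[i][x] * prod(msgs[(k, i)][x] for k in neighbors(i))
         for x in (0, 1)]
    z = sum(b)
    return [v / z for v in b]

print(belief(1))  # node 0's evidence pulls its neighbors toward state 0
```

Each iteration touches every directed edge once, which is the linear-time-per-iteration property noted on the slide.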
Slide 14: Outline
- Bayesian Networks
- Probabilistic Relational Models
- Language & Semantics
- Web of Influence
- Collective Classification
- Undirected discriminative models
- Collective Classification Revisited
- PRMs for NLP
Slide 15: Bayesian Networks: The Problem
- Bayesian nets use propositional representation
- Real world has objects, related to each other
[Figure: several (Intelligence, Difficulty, Grade) instances, with grades A and C]
These instances are not independent
Slide 16: Probabilistic Relational Models
- Combine advantages of relational logic & BNs:
- Natural domain modeling: objects, properties, relations
- Generalization over a variety of situations
- Compact, natural probability models
- Integrate uncertainty with relational model:
- Properties of domain entities can depend on properties of related entities
- Uncertainty over relational structure of domain
Slide 17: St. Nordaf University
[Figure: Prof. Smith and Prof. Jones each Teach a course; George and Jane are Registered In-course, with a Grade and a Satisfaction attribute on each registration.]
Slide 18: Relational Schema
- Specifies types of objects in domain, attributes of each type of object, & types of relations between objects
[Figure: schema with classes Professor (Teaching-Ability), Student (Intelligence), and Course (Difficulty), and relations Teach, Take, In.]
Slide 19: Probabilistic Relational Models
- Universals: probabilistic patterns hold for all objects in a class
- Locality: represent direct probabilistic dependencies
- Links define potential interactions
[K. & Pfeffer; Poole; Ngo & Haddawy]
Slide 20: PRM Semantics
- Instantiated PRM ⇒ BN
- variables: attributes of all objects
- dependencies: determined by links & PRM
[Figure: ground BN over the attributes of Prof. Smith, Prof. Jones, George, and Jane.]
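The instantiation step can be sketched directly. Here the relational skeleton (the object names and registrations) is invented, and the class-level dependency "a registration's Grade depends on the course's Difficulty and the student's Intelligence" is unrolled into ground variables with shared structure.

```python
# Invented relational skeleton for illustration.
students = ["George", "Jane"]
courses = ["Geo101", "CS101"]
registrations = [("George", "Geo101"), ("George", "CS101"), ("Jane", "CS101")]

# One ground variable per attribute of each object.
ground_vars = (
    [f"Difficulty({c})" for c in courses]
    + [f"Intelligence({s})" for s in students]
    + [f"Grade({s},{c})" for s, c in registrations]
)

# Parents come from the relational structure: each grade variable's
# parents are the attributes of the objects its registration links.
parents = {f"Grade({s},{c})": [f"Difficulty({c})", f"Intelligence({s})"]
           for s, c in registrations}

print(len(ground_vars))  # 7 ground variables in this skeleton
```

All grade variables share one class-level CPD, so the ground BN grows with the skeleton while the model itself stays fixed.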
Slide 21: The Web of Influence
[Figure: evidence about one course (easy/hard) and one student (low/high) propagates through the network of registrations.]
Slide 22: Outline
- Bayesian Networks
- Probabilistic Relational Models
- Collective Classification & Clustering
- Learning models from data
- Collective classification of webpages
- Undirected discriminative models
- Collective Classification Revisited
- PRMs for NLP
Slide 23: Learning PRMs
[Figure: a learner combines a relational database with expert knowledge to produce a PRM.]
[Friedman, Getoor, K., Pfeffer]
Slide 24: Learning PRMs
- Parameter estimation:
- Probabilistic model with shared parameters
- Grades for all students share same model
- Can use standard techniques for max-likelihood or Bayesian parameter estimation
- Structure learning:
- Define scoring function over structures
- Use combinatorial search to find high-scoring structure
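Parameter sharing makes max-likelihood estimation a matter of pooled counting, as in this toy sketch (data invented): every registration contributes to the same CPD, no matter which student or course it involves.

```python
from collections import Counter

# One row per registration: (difficulty, intelligence, grade). Invented.
data = [
    ("easy", "high", "A"), ("easy", "high", "A"), ("easy", "low", "B"),
    ("hard", "low", "C"), ("hard", "high", "A"), ("hard", "low", "C"),
]

# Pool counts across all objects: joint (parents, grade) and parent contexts.
joint = Counter(((d, i), g) for d, i, g in data)
ctx = Counter((d, i) for d, i, _ in data)

# Max-likelihood estimate of the shared CPD P(Grade | Difficulty, Intelligence).
cpd = {key: count / ctx[key[0]] for key, count in joint.items()}
print(cpd[(("easy", "high"), "A")])
```

A Bayesian variant would simply add pseudo-counts before dividing; structure learning then scores alternative parent sets using the same pooled statistics.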
Slide 25: Web → KB
[Craven et al.]
Slide 26: Web Classification Experiments
- WebKB dataset
- Four CS department websites
- Bag of words on each page
- Links between pages
- Anchor text for links
- Experimental setup
- Trained on three universities
- Tested on fourth
- Repeated for all four combinations
Slide 27: Standard Classification
Categories: faculty, course, project, student, other
[Example page text: "Professor … department … extract information … computer science … machine learning"]
[Bar chart: test error (0 to 0.35) for the words-only classifier]
Slide 28: Exploiting Links
[Figure: a page linked to with the anchor text "working with Tom Mitchell"]
[Bar chart: test error (0 to 0.35) — words only vs. link words]
Slide 29: Collective Classification
Classify all pages collectively, maximizing the joint label probability
Approx. inference: belief propagation
[Bar chart: test error (0 to 0.35) — words only vs. link words vs. collective]
[Getoor, Segal, Taskar, Koller]
Slide 30: Learning with Missing Data: EM
[Dempster et al. '77]
[Figure: EM iterates over hidden course difficulty (easy/hard) and student intelligence (low/high).]
Slide 31: Discovering Hidden Types
Internet Movie Database: http://www.imdb.com
Slide 32: Discovering Hidden Types
[Figure: actors, movies, and directors each assigned a hidden Type variable.]
[Taskar, Segal, Koller]
Slide 33: Discovering Hidden Types
Slide 34: Outline
- Bayesian Networks
- Probabilistic Relational Models
- Collective Classification & Clustering
- Undirected Discriminative Models
- Markov Networks
- Relational Markov Networks
- Collective Classification Revisited
- PRMs for NLP
Slide 35: Directed Models: Limitations / Solution: Undirected Models
- Acyclicity constraint limits expressive power (two objects linked to by a student are probably not both professors) → allow arbitrary patterns over sets of objects & links
- Acyclicity forces modeling of all potential links: network size O(N²), quadratic inference → let influence flow over existing links, exploiting link-graph sparsity: network size O(N)
- Generative training fits all of the data, not maximal accuracy → allow discriminative training: maximize P(labels | observations)
[Lafferty, McCallum, Pereira]
Slide 36: Markov Networks
[Figure: Markov network over Alice, Betty, Chris, Dave, and Eve]
Graph structure encodes independence assumptions: Chris is conditionally independent of Eve given Alice & Dave
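A Markov network defines the joint as a normalized product of potentials, and graph separation then gives exactly the kind of independence stated above. A brute-force sketch on an assumed five-node graph (edge set and potentials invented) in which every Chris–Eve path passes through Alice or Dave:

```python
from itertools import product
from math import prod

nodes = ["Alice", "Betty", "Chris", "Dave", "Eve"]
edges = [("Alice", "Betty"), ("Betty", "Chris"), ("Alice", "Chris"),
         ("Chris", "Dave"), ("Alice", "Eve"), ("Dave", "Eve")]
phi = [[3.0, 1.0], [1.0, 3.0]]  # shared agreement potential (invented)

assigns = [dict(zip(nodes, vals)) for vals in product((0, 1), repeat=5)]

def unnorm(a):
    """Unnormalized probability: product of the pairwise potentials."""
    return prod(phi[a[u]][a[v]] for u, v in edges)

Z = sum(unnorm(a) for a in assigns)  # partition function

def cond_chris(given):
    """P(Chris = 1 | given), by brute-force enumeration."""
    num = den = 0.0
    for a in assigns:
        if all(a[k] == v for k, v in given.items()):
            den += unnorm(a) / Z
            if a["Chris"] == 1:
                num += unnorm(a) / Z
    return num / den

# Conditioning on Alice and Dave separates Chris from Eve in this graph,
# so additionally observing Eve changes nothing:
p1 = cond_chris({"Alice": 1, "Dave": 0})
p2 = cond_chris({"Alice": 1, "Dave": 0, "Eve": 1})
print(p1, p2)
```

The enumeration over 2^5 assignments is only feasible because the example is tiny; it is the semantics, not an inference algorithm.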
Slide 37: Relational Markov Networks
- Universals: probabilistic patterns hold for all groups of objects
- Locality: represent local probabilistic dependencies
- Sets of links give us possible interactions
[Taskar, Abbeel, Koller '02]
Slide 38: RMN Semantics
- Instantiated RMN ⇒ MN
- variables: attributes of all objects
- dependencies: determined by links & RMN
[Figure: ground MN over George, Jane, and Jill, linked through the Geo study group, the CS study group, and the "Welcome to CS101" page.]
Slide 39: Outline
- Bayesian Networks
- Probabilistic Relational Models
- Collective Classification & Clustering
- Undirected Discriminative Models
- Collective Classification Revisited
- Discriminative training of RMNs
- Webpage classification
- Link prediction
- PRMs for NLP
Slide 40: Learning RMNs
- Parameter estimation is not closed form
- Convex problem ⇒ unique global maximum
- Maximize L = log P(Grades, Intelligence | Difficulty)
[Figure: ground network over grades (A/B/C), course difficulties (easy/hard), and student intelligences (low/high).]
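The discriminative objective above can be sketched with plain gradient ascent on a tiny log-linear (logistic) model; the data and learning rate are invented. The gradient takes the classic "empirical minus expected feature counts" form, and concavity of the log conditional likelihood gives the unique global maximum the slide mentions.

```python
from math import exp, log

# Toy labeled data: (features, label). All numbers invented.
data = [((1.0, 0.0), 1), ((0.0, 1.0), 0), ((1.0, 1.0), 1), ((0.5, 0.0), 0)]
w = [0.0, 0.0]

def p_y1(x):
    """P(y = 1 | x) under the log-linear model."""
    s = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + exp(-s))

for _ in range(1000):
    grad = [0.0, 0.0]
    for x, y in data:
        err = y - p_y1(x)  # empirical minus expected feature count
        for j in range(2):
            grad[j] += err * x[j]
    w = [wi + 0.1 * g for wi, g in zip(w, grad)]  # gradient ascent step

# Conditional log-likelihood L = sum_m log P(y_m | x_m)
loglik = sum(log(p_y1(x)) if y == 1 else log(1.0 - p_y1(x)) for x, y in data)
print(loglik)
```

In an RMN the expected counts inside the gradient are themselves computed by inference (e.g. belief propagation) over the ground network, which is what makes training costlier than the counting used for directed PRMs.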
Slide 41: Flat Models
Logistic regression: P(Category | Words)
Slide 42: Exploiting Links
42.1% relative reduction in error relative to the generative approach
Slide 43: More Complex Structure
Slide 44: Collective Classification Results
35.4% relative reduction in error relative to the strong flat approach
Slide 45: Scalability
[Chart: training and classification time, directed vs. undirected models]
- WebKB data set size: 1300 entities, 180K attributes, 5800 links
- Network size per school:
- Directed model: 200,000 variables, 360,000 edges
- Undirected model: 40,000 variables, 44,000 edges
- Difference in training time decreases substantially when:
- some training data is unobserved
- we want to model with hidden variables
Slide 46: Predicting Relationships
[Figure: "Tom Mitchell, Professor" — WebKB Project — "Sean Slattery, Student", connected by hyperlinks]
- Even more interesting are the relationships between objects
- e.g., verbs are almost always relationships
Slide 47: Flat Model
[Figure: flat model — a From-Page and a To-Page, each with words Word1 … WordN; the link carries words LinkWord1 … LinkWordN and a relation variable Rel with type NONE, advisor, instructor, TA, member, or project-of.]
Slide 48: Flat Model
Slide 49: Collective Classification & Links
[Figure: link model — the From-Page and To-Page now also carry Category variables, connected to the page words, the link words, and the Rel variable.]
Slide 50: Link Model
Slide 51: Triad Model
[Figure: triad — a Professor is Advisor of a Student; both are Members of the same Group.]
Slide 52: Triad Model
[Figure: triad — a Professor is Advisor of a Student; the professor is Instructor and the student is TA of the same Course.]
Slide 53: Triad Model
Slide 54: WebKB
- Four new department web sites:
- Berkeley, CMU, MIT, Stanford
- Labeled page type (8 types):
- faculty, student, research scientist, staff, research group, research project, course, organization
- Labeled hyperlinks and virtual links (6 types):
- advisor, instructor, TA, member, project-of, NONE
- Data set size:
- 11K pages
- 110K links
- 2 million words
Slide 55: Link Prediction Results
72.9% relative reduction in error relative to the strong flat approach
- Error measured over links predicted to be present
- Link presence cutoff is at the precision/recall break-even point (~30% for all models)
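The break-even criterion above can be computed by sweeping a threshold down the ranked link scores; the scores and labels here are invented stand-ins for model output.

```python
# Invented link scores and gold labels (1 = link truly present).
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0, 0]

ranked = sorted(zip(scores, labels), reverse=True)
total_pos = sum(labels)

best = None
tp = 0
for k, (score, y) in enumerate(ranked, start=1):
    tp += y  # true positives among the top-k predicted links
    precision, recall = tp / k, tp / total_pos
    # Keep the threshold where precision and recall are closest.
    if best is None or abs(precision - recall) < abs(best[0] - best[1]):
        best = (precision, recall, score)

precision, recall, threshold = best
print(f"break-even precision = recall = {precision} at threshold {threshold}")
```

At the break-even point the number of predicted links roughly equals the number of true links, which makes error rates comparable across models with different score scales.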
Slide 56: Summary
- PRMs inherit key advantages of probabilistic graphical models:
- Coherent probabilistic semantics
- Exploit structure of local interactions
- Relational models are inherently more expressive
- Web of influence: use all available information to reach powerful conclusions
- Exploit both relational information and the power of probabilistic reasoning
Slide 57: Outline
- Bayesian Networks
- Probabilistic Relational Models
- Collective Classification & Clustering
- Undirected Discriminative Models
- Collective Classification Revisited
- PRMs for NLP
- Word-Sense Disambiguation
- Relation Extraction
- Natural Language Understanding (?)
or: Why Should I Care? An outsider's perspective
Slide 58: Word Sense Disambiguation
Her advisor gave her feedback about the draft.
- Neighboring words alone may not provide enough information to disambiguate
- We can gain insight by considering compatibility between senses of related words
Slide 59: Collective Disambiguation
Her advisor gave her feedback about the draft.
Can we infer grammatical structure and disambiguate word senses simultaneously rather than sequentially?
- Objects: words in text
- Attributes: sense, gender, number, pos, …
- Links:
- Grammatical relations (subject-object, modifier, …)
- Close semantic relations (is-a, cause-of, …)
- Same word in different sentences (one sense per discourse)
- Compatibility parameters:
- Learned from tagged data
- Based on prior knowledge (e.g., WordNet, FrameNet)
Can we integrate inter-word relationships directly into our probabilistic model?
Slide 60: Relation Extraction
"ACME's board of directors began a search for a new CEO after the departure of current CEO, James Jackson, following allegations of creative accounting practices at ACME." (6/01)
"In an attempt to improve the company's image, ACME is considering former judge Mary Miller for the job." (7/01)
"As her first act in her new position, Miller announced that ACME will be doing a stock buyback." (9/01)
[Figure: extracted relation graph with edges labeled Candidate, Departs, Hired?, Of, Made, Concerns]
Slide 61: Understanding Language
Professor Sarah met Jane. She explained the hole
in her proof.
Most likely interpretation:
[Figure: interpretation identifying Professor Sarah and student Jane]
Slide 62: Resolving Ambiguity
Professor Sarah met Jane. She explained the hole
in her proof.
- Professors often meet with students → Jane is probably a student
- Professors like to explain → "She" is probably Prof. Sarah
Attribute values, link types, object identity: probabilistic reasoning about objects, their attributes, and the relationships between them
[Goldman & Charniak; Pasula & Russell]
Slide 63: Acquiring Semantic Models
- Statistical NLP reveals patterns
- Standard models learn patterns at the word level
- But word patterns are only implicit surrogates for underlying semantic patterns
- "Teacher" objects tend to participate in certain relationships
- Can use this pattern for objects not explicitly labeled as a teacher
[Figure: co-occurrence statistics for "teacher" with verbs such as train, be, hire, pay, fire, and serenade, each with an associated weight (e.g., 24, 3, 1.5, 1.4, 0.3).]
Slide 64: Competing Approaches → Complementary Approaches
[Table: desiderata — semantic understanding, scaling up (via learning), robustness to noise & ambiguity — compared across logical approaches, statistical approaches, and PRMs.]
Slide 65: Statistics: from Words to Semantics
- Represent statistical patterns at the semantic level
- What types of objects participate in what types of relationships
- Learn statistical models of semantics from text
- Reason using the models to obtain a global semantic understanding of the text
[Image: Georgia O'Keeffe, "Ladder to the Moon"]