1
Probabilistic Models of Relational Data
  • Daphne Koller
  • Stanford University
  • Joint work with

Ben Taskar
Lise Getoor
Eran Segal
Pieter Abbeel
Ming-Fai Wong
Avi Pfeffer
Nir Friedman
2
Why Relational?
  • The real world is composed of objects that have
    properties and are related to each other
  • Natural language is all about objects and how
    they relate to each other
  • George got an A in Geography 101

3
Attribute-Based Worlds
Smart students get A's in easy classes
  • World = assignment of values to attributes /
    truth values to propositional symbols

4
Object-Relational Worlds
∀x,y (Smart(x) ∧ Easy(y) ∧ Take(x,y)
⇒ Grade(A,x,y))
  • World = relational interpretation:
  • Objects in the domain
  • Properties of these objects
  • Relations (links) between objects

5
Why Probabilities?
  • All universals are false (almost)
  • Smart students get A's in easy classes
  • True universals are rarely useful
  • Smart students get either A, B, C, D, or F

"The actual science of logic is conversant at
present only with things either certain,
impossible, or entirely doubtful ...
Therefore the true logic for this world is the
calculus of probabilities."
(James Clerk Maxwell)
6
Probable Worlds
  • Probabilistic semantics
  • A set of possible worlds
  • Each world associated with a probability

[Figure: twelve possible worlds, the combinations of Difficulty ∈ {easy, hard}, Intelligence ∈ {smart, weak}, and Grade ∈ {A, B, C}, each assigned a probability.]
7
Representation Design Axes
Axes: type of world state (attributes, sequences, objects) × logical vs. probabilistic representation
  • Attributes: propositional logic, CSPs (logical); Bayesian nets, Markov nets (probabilistic)
  • Sequences: automata, grammars (logical); n-gram models, HMMs, prob. CFGs (probabilistic)
  • Objects: first-order logic, relational databases (logical)
8
Outline
  • Bayesian Networks
  • Representation & Semantics
  • Reasoning
  • Probabilistic Relational Models
  • Collective Classification
  • Undirected discriminative models
  • Collective Classification Revisited
  • PRMs for NLP

9
Bayesian Networks
[Figure: the student network. Difficulty and Intelligence are parents of Grade; Intelligence is also a parent of SAT; Grade is a parent of Letter.]
nodes = variables; edges = direct influence
Graph structure encodes independence
assumptions: Letter is conditionally independent
of Intelligence given Grade
10
BN semantics
conditional independencies in BN structure
+ local probability models
= full joint distribution over the domain

  • Compact & natural representation:
  • nodes with at most k parents ⇒ O(2^k · n) vs. O(2^n) params
  • parameters are natural and easy to elicit
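To make these semantics concrete, here is a minimal Python sketch of the student network's joint distribution; the CPD numbers are hypothetical, chosen only so the example runs.

```python
# A minimal sketch of the student network's joint distribution.
# All CPD numbers are hypothetical, chosen only to make it runnable.

P_D = {"easy": 0.6, "hard": 0.4}                   # P(Difficulty)
P_I = {"weak": 0.7, "smart": 0.3}                  # P(Intelligence)
P_G = {                                            # P(Grade | D, I)
    ("easy", "weak"):  {"A": 0.3,  "B": 0.4,  "C": 0.3},
    ("easy", "smart"): {"A": 0.9,  "B": 0.08, "C": 0.02},
    ("hard", "weak"):  {"A": 0.05, "B": 0.25, "C": 0.7},
    ("hard", "smart"): {"A": 0.5,  "B": 0.3,  "C": 0.2},
}
P_S = {"weak":  {"low": 0.95, "high": 0.05},       # P(SAT | Intelligence)
       "smart": {"low": 0.2,  "high": 0.8}}
P_L = {"A": {"yes": 0.9, "no": 0.1},               # P(Letter | Grade)
       "B": {"yes": 0.6, "no": 0.4},
       "C": {"yes": 0.1, "no": 0.9}}

def joint(d, i, g, s, l):
    """Chain rule: the local models multiply into the full joint."""
    return P_D[d] * P_I[i] * P_G[(d, i)][g] * P_S[i][s] * P_L[g][l]

print(joint("easy", "smart", "A", "high", "yes"))  # one world's probability
```

Five small local tables (2 + 2 + 12 + 4 + 6 = 26 numbers) stand in for the 2·2·3·2·2 = 48-entry full joint, which is the compactness the slide points to.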

11
Reasoning using BNs
"Probability theory is nothing but common sense
reduced to calculation." (Pierre Simon Laplace)
[Figure: the student network; Letter and SAT are observed as evidence.]
Full joint distribution specifies the answer to any
query: P(variable | evidence about others)
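Continuing the sketch above, any such query can be answered by brute-force summation over the joint; the query below, P(Intelligence | Letter = yes), is a hypothetical example.

```python
# Answer a query by summing the full joint from the previous sketch:
# P(Intelligence | Letter = "yes").
from itertools import product

def p_intelligence_given_letter(letter="yes"):
    scores = {}
    for i in P_I:
        scores[i] = sum(joint(d, i, g, s, letter)
                        for d, g, s in product(P_D, "ABC", ("low", "high")))
    z = sum(scores.values())                # normalize over Intelligence
    return {i: v / z for i, v in scores.items()}

print(p_intelligence_given_letter())
```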
12
BN Inference
  • BN inference is NP-hard
  • But: inference can exploit graph structure
  • Graph separation ⇒ conditional independence
  • Do separate inference in parts
  • Results combined over the interface
  • Complexity is exponential in the largest separator
  • Structured BNs allow effective inference
  • Exact inference in dense BNs is intractable

13
Approximate BN Inference
  • Belief propagation is an iterative message
    passing algorithm for approximate inference in
    BNs
  • Each iteration (until convergence)
  • Nodes pass beliefs as messages to neighboring
    nodes
  • Cons
  • Limited theoretical guarantees
  • Might not converge
  • Pros
  • Linear time per iteration
  • Works very well in practice, even for dense
    networks
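A toy loopy belief-propagation sketch on a hypothetical three-node cycle; real implementations add damping and smarter message schedules, but the sum-product update below is the core of each iteration.

```python
# Minimal loopy BP on a hypothetical pairwise model over a 3-node cycle.
import numpy as np

edges = [(0, 1), (1, 2), (2, 0)]
unary = [np.array([0.7, 0.3]), np.array([0.4, 0.6]), np.array([0.5, 0.5])]
pair = np.array([[1.2, 0.8], [0.8, 1.2]])          # favors agreement

# messages[(i, j)] is the message node i sends to node j
messages = {(i, j): np.ones(2) for a, b in edges for (i, j) in ((a, b), (b, a))}

for _ in range(50):                                # iterate until (hopefully) converged
    new = {}
    for (i, j) in messages:
        incoming = unary[i].copy()                 # unary belief at node i ...
        for (k, l) in messages:
            if l == i and k != j:                  # ... times all messages except j's
                incoming *= messages[(k, l)]
        m = pair.T @ incoming                      # sum-product over node i's states
        new[(i, j)] = m / m.sum()                  # normalize for stability
    messages = new

for node in range(3):                              # belief = unary x incoming messages
    b = unary[node].copy()
    for (k, l) in messages:
        if l == node:
            b *= messages[(k, l)]
    print(node, b / b.sum())
```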

14
Outline
  • Bayesian Networks
  • Probabilistic Relational Models
  • Language & Semantics
  • Web of Influence
  • Collective Classification
  • Undirected discriminative models
  • Collective Classification Revisited
  • PRMs for NLP

15
Bayesian Networks Problem
  • Bayesian nets use propositional representation
  • Real world has objects, related to each other

[Figure: separate ground BN fragments, one per registration; each Grade (an A, a C) has its own Difficulty and Intelligence parents. These instances are not independent.]
16
Probabilistic Relational Models
  • Combine advantages of relational logic & BNs:
  • Natural domain modeling: objects, properties,
    relations
  • Generalization over a variety of situations
  • Compact, natural probability models
  • Integrate uncertainty with the relational model:
  • Properties of domain entities can depend on
    properties of related entities
  • Uncertainty over the relational structure of the domain

17
St. Nordaf University
[Figure: a PRM instance. Prof. Smith and Prof. Jones each Teach a course; George and Jane are Registered in courses, and each registration carries a Grade and a Satisfaction value.]
18
Relational Schema
  • Specifies the types of objects in the domain, the
    attributes of each type of object, and the types of
    relations between objects

Classes: Student, Professor, Course
Attributes: Intelligence (Student), Teaching-Ability (Professor), Difficulty (Course)
Relations: Teach, Take, In
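The schema maps naturally onto code. Below is a sketch using Python dataclasses; the class, attribute, and relation names follow the slide, while the field types and defaults are illustrative assumptions.

```python
# Sketch: the relational schema as plain Python dataclasses.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Professor:
    teaching_ability: str                               # attribute

@dataclass
class Course:
    difficulty: str                                     # attribute
    taught_by: Optional[Professor] = None               # Teach relation

@dataclass
class Student:
    intelligence: str                                   # attribute
    takes: List[Course] = field(default_factory=list)   # Take relation

cs101 = Course("easy", taught_by=Professor("high"))
jane = Student("smart", takes=[cs101])
print(jane)
```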
19
Probabilistic Relational Models
  • Universals: probabilistic patterns hold for all
    objects in a class
  • Locality: represent direct probabilistic
    dependencies
  • Links define potential interactions

Koller & Pfeffer; Poole; Ngo & Haddawy
20
PRM Semantics
  • Instantiated PRM ⇒ BN
  • variables = attributes of all objects
  • dependencies determined by
    links & PRM

[Figure: the ground BN over the attributes of Prof. Smith, Prof. Jones, George, and Jane.]
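A sketch of that instantiation step: one CPD template ("a registration's Grade depends on the course's Difficulty and the student's Intelligence") unrolled over a hypothetical skeleton of objects and links.

```python
# Sketch of PRM instantiation: unrolling one CPD template over a
# hypothetical skeleton of objects and links to get the ground BN.
registrations = [("George", "Geo101"), ("George", "CS101"),
                 ("Jane", "CS101")]

parents = {}
for student, course in registrations:
    # every Grade variable gets the same template: its parents are the
    # linked course's Difficulty and the linked student's Intelligence
    parents[f"Grade({student},{course})"] = [
        f"Difficulty({course})", f"Intelligence({student})"]

for var, pa in parents.items():
    print(var, "<-", pa)
```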
21
The Web of Influence
[Figure: evidence flows through shared registrations, shifting beliefs about course difficulty (easy/hard) and student intelligence (low/high) across the whole network.]
22
Outline
  • Bayesian Networks
  • Probabilistic Relational Models
  • Collective Classification & Clustering
  • Learning models from data
  • Collective classification of webpages
  • Undirected discriminative models
  • Collective Classification Revisited
  • PRMs for NLP

23
Learning PRMs
[Figure: a learner takes a relational database and expert knowledge as input and produces a PRM.]
Friedman, Getoor, Koller, Pfeffer
24
Learning PRMs
  • Parameter estimation:
  • Probabilistic model with shared parameters
  • Grades for all students share the same model
  • Can use standard techniques for max-likelihood or
    Bayesian parameter estimation (sketched below)
  • Structure learning:
  • Define a scoring function over structures
  • Use combinatorial search to find a high-scoring
    structure
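A sketch of the shared-parameter idea: every registration, whichever student and course it involves, updates the same Grade CPD, so max-likelihood estimation reduces to pooled counting. The data tuples are hypothetical.

```python
# Max-likelihood estimation with shared parameters: all registrations
# update one Grade CPD. The data tuples are hypothetical.
from collections import Counter, defaultdict

data = [("easy", "smart", "A"), ("easy", "weak", "B"),
        ("hard", "smart", "B"), ("hard", "weak", "C"),
        ("easy", "smart", "A")]          # (difficulty, intelligence, grade)

counts = defaultdict(Counter)
for d, i, g in data:
    counts[(d, i)][g] += 1               # pooled counts across all objects

cpd = {ctx: {g: n / sum(c.values()) for g, n in c.items()}
       for ctx, c in counts.items()}
print(cpd[("easy", "smart")])            # {'A': 1.0}
```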

25
Web → KB
Craven et al.
26
Web Classification Experiments
  • WebKB dataset:
  • Four CS department websites
  • Bag of words on each page
  • Links between pages
  • Anchor text for links
  • Experimental setup:
  • Trained on three universities
  • Tested on the fourth
  • Repeated for all four combinations

27
Standard Classification
Categories: faculty, course, project, student, other
Example page words: professor, department, extract, information, computer science, machine learning
[Bar chart: test error of the words-only classifier (y-axis 0 to 0.35).]
28
Exploiting Links
[Figure: a page with an incoming link whose anchor text reads "working with Tom Mitchell ..."]
[Bar chart: test error of words-only vs. link-words models (y-axis 0 to 0.35).]
29
Collective Classification
[Figure: collective model over pages and their links (To-Page, Exists variables).]
Classify all pages collectively, maximizing the
joint label probability
Approx. inference: belief propagation
Getoor, Segal, Taskar, Koller
[Bar chart: test error of words-only, link-words, and collective models (y-axis 0 to 0.35).]
30
Learning w. Missing Data: EM
Dempster et al., 1977
[Figure: EM alternates between inferring the hidden easy/hard and low/high labels and re-estimating the parameters.]
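A minimal EM sketch in the spirit of the slide, with each registration's course difficulty hidden and its grade observed; all numbers are hypothetical.

```python
# Minimal EM sketch: course difficulty (easy/hard) is hidden, grades are
# observed. Starting parameters and data are hypothetical.
grades = ["A", "A", "B", "C", "C", "A", "B"]

p_easy = 0.5                                    # P(Difficulty = easy)
p_g = {"easy": {"A": 0.5, "B": 0.3, "C": 0.2},  # P(Grade | Difficulty)
       "hard": {"A": 0.2, "B": 0.3, "C": 0.5}}

for _ in range(20):
    # E-step: posterior over the hidden difficulty for each observation
    post = []
    for g in grades:
        e = p_easy * p_g["easy"][g]
        h = (1 - p_easy) * p_g["hard"][g]
        post.append(e / (e + h))
    # M-step: re-estimate parameters from expected counts
    p_easy = sum(post) / len(post)
    for d, w in (("easy", post), ("hard", [1 - q for q in post])):
        total = sum(w)
        p_g[d] = {g0: sum(wi for wi, gi in zip(w, grades) if gi == g0) / total
                  for g0 in "ABC"}

print(round(p_easy, 3), p_g["easy"])
```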
31
Discovering Hidden Types
Internet Movie Database: http://www.imdb.com
32
Discovering Hidden Types
[Figure: a hidden Type attribute is added to each class of objects.]
Taskar, Segal, Koller
33
Discovering Hidden Types
[Figure: discovered clusters, e.g. action actors such as Bruce Willis, Harrison Ford, Steven Seagal, Kurt Russell, Kevin Costner, Jean-Claude Van Damme.]
34
Outline
  • Bayesian Networks
  • Probabilistic Relational Models
  • Collective Classification & Clustering
  • Undirected Discriminative Models
  • Markov Networks
  • Relational Markov Networks
  • Collective Classification Revisited
  • PRMs for NLP

35
Directed Models: Limitations
Solution: Undirected Models
  • Acyclicity constraint limits expressive power
    (two objects linked to by a student are probably not
    both professors)
    → allow arbitrary patterns over sets of objects & links
  • Acyclicity forces modeling of all potential links:
    network size O(N²), inference is quadratic
    → influence flows over existing links, exploiting
    link-graph sparsity: network size O(N)
  • Generative training fits all of the data, not
    classification accuracy
    → allow discriminative training:
    max P(labels | observations)

Lafferty, McCallum, Pereira
36
Markov Networks
[Figure: a Markov network over five people in a ring: Alice, Betty, Chris, Dave, Eve.]
Graph structure encodes independence
assumptions: Chris is conditionally independent of
Eve given Alice & Dave
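A sketch of Markov network semantics on that ring: the joint is a product of edge potentials, divided by the partition function Z. The "friends tend to agree" potentials are hypothetical.

```python
# A Markov network's joint is a normalized product of potentials.
# The five-person ring and the potentials are hypothetical.
from itertools import product

edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]    # Alice..Eve in a ring
phi = {(True, True): 2.0, (True, False): 0.5,       # "friends tend to agree"
       (False, True): 0.5, (False, False): 2.0}

def score(x):                                       # unnormalized product
    s = 1.0
    for i, j in edges:
        s *= phi[(x[i], x[j])]
    return s

Z = sum(score(x) for x in product([True, False], repeat=5))
print(score((True,) * 5) / Z)                       # P(everyone agrees)
```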
37
Relational Markov Networks
  • Universals: probabilistic patterns hold for all
    groups of objects
  • Locality: represent local probabilistic
    dependencies
  • Sets of links give us possible interactions

Taskar, Abbeel, Koller 02
38
RMN Semantics
  • Instantiated RMN ⇒ MN
  • variables = attributes of all objects
  • dependencies determined by links & RMN

[Figure: the ground Markov network connecting George, Jane, and Jill through the Geo study group, the CS study group, and the "Welcome to CS101" course page.]
39
Outline
  • Bayesian Networks
  • Probabilistic Relational Models
  • Collective Classification & Clustering
  • Undirected Discriminative Models
  • Collective Classification Revisited
  • Discriminative training of RMNs
  • Webpage classification
  • Link prediction
  • PRMs for NLP

40
Learning RMNs
  • Parameter estimation is not closed-form
  • Convex problem ⇒ unique global maximum

Maximize L = log P(Grades, Intelligence | Difficulty)
[Figure: the ground network over the courses' Difficulty (easy/hard), the students' Intelligence (low/high), and their Grades (A/B/C).]
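A sketch of what discriminative training looks like for a tiny log-linear model: gradient ascent on the conditional log-likelihood, whose gradient is observed minus expected feature counts. The objective is concave, matching the slide's unique global maximum; the data and feature set are hypothetical.

```python
# Gradient ascent on log P(labels | observations) for a small log-linear
# model. One shared weight per (difficulty, grade) feature mirrors RMN
# parameter tying; data and model are hypothetical.
import math

data = [("easy", "A"), ("hard", "B"), ("easy", "A")]
labels = ["A", "B", "C"]
w = {(d, y): 0.0 for d in ("easy", "hard") for y in labels}

for _ in range(200):
    grad = {k: 0.0 for k in w}
    for d, g in data:
        z = sum(math.exp(w[(d, y)]) for y in labels)
        grad[(d, g)] += 1.0                           # observed feature count
        for y in labels:
            grad[(d, y)] -= math.exp(w[(d, y)]) / z   # expected feature count
    for k in w:
        w[k] += 0.1 * grad[k]                         # ascend the concave objective

print(max(labels, key=lambda y: w[("easy", y)]))      # -> 'A'
```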
41
Flat Models
Logistic Regression
P(Category | Words)
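A sketch of this words-only baseline with scikit-learn; the toy pages and labels are hypothetical.

```python
# The flat words-only baseline: logistic regression over a bag-of-words.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

pages = ["professor department machine learning publications",
         "course syllabus homework exam lecture",
         "student homepage resume coursework"]
labels = ["faculty", "course", "student"]

vec = CountVectorizer()
X = vec.fit_transform(pages)                  # bag-of-words features
clf = LogisticRegression(max_iter=1000).fit(X, labels)

print(clf.predict(vec.transform(["machine learning professor"])))
```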
42
Exploiting Links
42.1% relative reduction in error compared to the
generative approach
43
More Complex Structure
44
Collective Classification Results
35.4% relative reduction in error compared to the
strong flat approach
45
Scalability
  • WebKB data set size:
  • 1300 entities
  • 180K attributes
  • 5800 links
  • Network size per school:
  • Directed model:
  • 200,000 variables
  • 360,000 edges
  • Undirected model:
  • 40,000 variables
  • 44,000 edges
  • The difference in training time decreases
    substantially when
  • some training data is unobserved
  • we want to model hidden variables
[Charts: training and classification times for directed vs. undirected models.]
46
Predicting Relationships
[Figure: pages for Tom Mitchell (Professor) and Sean Slattery (Student), connected through the WebKB Project page.]
  • Even more interesting are the relationships
    between objects
  • e.g., verbs are almost always relationships

47
Flat Model
[Figure: flat model. A relation variable Rel (NONE, advisor, instructor, TA, member, project-of) depends on the From-Page words (Word1 ... WordN), the To-Page words (Word1 ... WordN), the link words (LinkWord1 ... LinkWordN), and the link Type.]
48
Flat Model
49
Collective Classification Links
[Figure: as in the flat model, but the From-Page and To-Page each get a Category variable; page categories and the relation variables Rel are classified jointly.]
50
Link Model
51
Triad Model
[Figure: triad template. A Professor and a Student who are both Members of the same Group are likely related by Advisor.]
52
Triad Model
[Figure: triad template. A Student who is a TA for a Course whose Instructor is a Professor is likely related to that Professor by Advisor.]
53
Triad Model
54
WebKB
  • Four new department web sites:
  • Berkeley, CMU, MIT, Stanford
  • Labeled page type (8 types):
  • faculty, student, research scientist, staff,
    research group, research project, course,
    organization
  • Labeled hyperlinks and virtual links (6 types):
  • advisor, instructor, TA, member, project-of, NONE
  • Data set size:
  • 11K pages
  • 110K links
  • 2 million words

55
Link Prediction Results
72.9% relative reduction in error compared to the
strong flat approach
  • Error measured over links predicted to be present
  • Link-presence cutoff is at the precision/recall
    break-even point (≈30% for all models); choosing
    such a cutoff is sketched below
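A sketch of one way to pick that cutoff: sweep the score threshold and stop where precision and recall cross. The scored candidate links are hypothetical.

```python
# Pick the link-presence cutoff at the precision/recall break-even point.
def break_even(scored):                 # list of (score, is_true_link)
    scored = sorted(scored, reverse=True)
    total_true = sum(t for _, t in scored)
    tp = 0
    for k, (score, t) in enumerate(scored, start=1):
        tp += t
        precision, recall = tp / k, tp / total_true
        if precision <= recall:         # the two curves cross here
            return score, precision
    return scored[-1][0], tp / len(scored)

print(break_even([(0.9, 1), (0.8, 1), (0.7, 0), (0.6, 1), (0.5, 0)]))
```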

56
Summary
  • PRMs inherit key advantages of probabilistic
    graphical models:
  • Coherent probabilistic semantics
  • Exploit structure of local interactions
  • Relational models are inherently more expressive
  • Web of influence: use all available information
    to reach powerful conclusions
  • Exploit both relational information and the power of
    probabilistic reasoning

57
Outline
  • Bayesian Networks
  • Probabilistic Relational Models
  • Collective Classification & Clustering
  • Undirected Discriminative Models
  • Collective Classification Revisited
  • PRMs for NLP
  • Word-Sense Disambiguation
  • Relation Extraction
  • Natural Language Understanding (?)

or: Why Should I Care?
An outsider's perspective
58
Word Sense Disambiguation
Her advisor gave her feedback about the draft.
  • Neighboring words alone may not provide enough
    information to disambiguate
  • We can gain insight by considering compatibility
    between senses of related words

59
Collective Disambiguation
Her advisor gave her feedback about the draft.
Can we infer grammatical structure and
disambiguate word senses simultaneously rather
than sequentially?
  • Objects: words in the text
  • Attributes: sense, gender, number, POS, ...
  • Links:
  • Grammatical relations (subject-object,
    modifier, ...)
  • Close semantic relations (is-a, cause-of, ...)
  • Same word in different sentences
    (one-sense-per-discourse)
  • Compatibility parameters:
  • Learned from tagged data
  • Based on prior knowledge (e.g., WordNet, FrameNet)

Can we integrate inter-word relationships
directly into our probabilistic model?
60
Relation Extraction
"ACME's board of directors began a search for a
new CEO after the departure of current CEO, James
Jackson, following allegations of creative
accounting practices at ACME." (6/01)
"In an attempt to improve the company's image,
ACME is considering former judge Mary Miller for
the job." (7/01)
"As her first act in her new position, Miller
announced that ACME will be doing a stock
buyback." (9/01)
[Figure: extracted entity-relation graph over ACME, James Jackson, and Mary Miller, with relations such as Departs, Candidate, Hired??, Made, Of, Concerns.]
61
Understanding Language
Professor Sarah met Jane. She explained the hole
in her proof.
Most likely interpretation
[Figure: the pronouns resolve against Professor Sarah and Student Jane.]
62
Resolving Ambiguity
Professor Sarah met Jane. She explained the hole
in her proof.
  • Professors often meet with students
  • Jane is probably a student
  • Professors like to explain
  • She is probably Prof. Sarah

Attribute values
Link types
Object identity
Probabilistic reasoning about objects, their
attributes, and the relationships between them
Goldman & Charniak; Pasula & Russell
63
Acquiring Semantic Models
  • Statistical NLP reveals patterns:
  • Standard models learn patterns at the word level
  • But word patterns are only implicit surrogates
    for underlying semantic patterns:
  • "Teacher" objects tend to participate in certain
    relationships
  • Can use this pattern for objects not explicitly
    labeled as a teacher

[Figure: a "teacher" node linked to verbs (train, be, hire, pay, fire, serenade) with relative weights (24, 3, 3, 1.5, 1.4, 0.3).]
64
Competing Approaches → Complementary Approaches
Desiderata:
  • Semantic understanding
  • Scaling up (via learning)
  • Robustness to noise & ambiguity
[Table: logical approaches, statistical approaches, and PRMs compared against these desiderata.]
65
Statistics from Words to Semantics
  • Represent statistical patterns at the semantic level:
  • what types of objects participate in what types
    of relationships
  • Learn statistical models of semantics from text
  • Reason using the models to obtain a global semantic
    understanding of the text

[Image: Georgia O'Keeffe, "Ladder to the Moon".]