Title: Markov Logic: A Representation Language for Natural Language Semantics
1 Markov Logic: A Representation Language for Natural Language Semantics
- Pedro Domingos
- Dept. of Computer Science & Engineering
- University of Washington
- (Based on joint work with Stanley Kok, Matt Richardson and Parag Singla)
2 Overview
- Motivation
- Background
- Representation
- Inference
- Learning
- Applications
- Discussion
3 Motivation
- Natural language is characterized by
- Complex relational structure
- High uncertainty (ambiguity, imperfect knowledge)
- First-order logic handles relational structure
- Probability handles uncertainty
- Let's combine the two
4 Markov Logic [Richardson & Domingos, 2006]
- Syntax: First-order logic + weights
- Semantics: Templates for Markov nets
- Inference: Weighted satisfiability + MCMC
- Learning: Voted perceptron + ILP
5 Overview
- Motivation
- Background
- Representation
- Inference
- Learning
- Applications
- Discussion
6 Markov Networks
- Undirected graphical models
[Figure: example network over four variables A, B, C, D]
- Potential functions defined over cliques
7 Markov Networks
- Undirected graphical models
[Figure: same example network over A, B, C, D]
- Potential functions defined over cliques; in log-linear form:
  P(x) = (1/Z) exp( Σ_i w_i f_i(x) )
  where w_i is the weight of feature i and f_i(x) is feature i
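To make the log-linear form concrete, here is a small illustrative Python sketch; the two features, their weights, and the four-variable network are assumptions for the example, not taken from the slides.

```python
import itertools
import math

# Hypothetical binary features over a tiny network with variables A, B, C, D.
# Each feature maps a full assignment (a dict) to 0 or 1.
features = [
    lambda x: 1.0 if x["A"] == x["B"] else 0.0,   # A and B agree
    lambda x: 1.0 if x["C"] == x["D"] else 0.0,   # C and D agree
]
weights = [1.5, 0.8]  # illustrative weights

def score(x):
    """Unnormalized log-probability: sum_i w_i * f_i(x)."""
    return sum(w * f(x) for w, f in zip(weights, features))

# Partition function Z sums exp(score) over all 2^4 assignments.
states = [dict(zip("ABCD", bits)) for bits in itertools.product([0, 1], repeat=4)]
Z = sum(math.exp(score(x)) for x in states)

def prob(x):
    """P(x) = exp(sum_i w_i f_i(x)) / Z."""
    return math.exp(score(x)) / Z

print(prob({"A": 1, "B": 1, "C": 0, "D": 1}))
```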
8 First-Order Logic
- Constants, variables, functions, predicates. E.g.: Anna, X, mother_of(X), friends(X, Y)
- Grounding: Replace all variables by constants. E.g.: friends(Anna, Bob)
- World (model, interpretation): Assignment of truth values to all ground predicates
9 Overview
- Motivation
- Background
- Representation
- Inference
- Learning
- Applications
- Discussion
10 Markov Logic Networks
- A logical KB is a set of hard constraints on the set of possible worlds
- Let's make them soft constraints: when a world violates a formula, it becomes less probable, not impossible
- Give each formula a weight (higher weight => stronger constraint)
11 Definition
- A Markov Logic Network (MLN) is a set of pairs (F, w) where
  - F is a formula in first-order logic
  - w is a real number
- Together with a set of constants, it defines a Markov network with
  - One node for each grounding of each predicate in the MLN
  - One feature for each grounding of each formula F in the MLN, with the corresponding weight w
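For reference, the ground Markov network so defined gives the standard MLN joint distribution (Richardson & Domingos, 2006):

```latex
P(X = x) \;=\; \frac{1}{Z} \exp\Big( \sum_i w_i \, n_i(x) \Big),
\qquad
Z \;=\; \sum_{x'} \exp\Big( \sum_i w_i \, n_i(x') \Big)
```

where n_i(x) is the number of true groundings of formula F_i in world x.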
12 Example: Friends & Smokers
Suppose we have two constants: Anna (A) and Bob (B)
[Figure: ground atoms Smokes(A), Smokes(B), Cancer(A), Cancer(B)]
13 Example: Friends & Smokers
Suppose we have two constants: Anna (A) and Bob (B)
[Figure: ground atoms Friends(A,A), Friends(A,B), Friends(B,A), Friends(B,B), Smokes(A), Smokes(B), Cancer(A), Cancer(B)]
14 Example: Friends & Smokers
Suppose we have two constants: Anna (A) and Bob (B)
[Figure: same ground atoms, now connected by edges corresponding to formula groundings]
15 Example: Friends & Smokers
Suppose we have two constants: Anna (A) and Bob (B)
[Figure: the complete ground Markov network, with one feature per formula grounding]
16 More on MLNs
- MLN is a template for ground Markov nets
- Typed variables and constants greatly reduce the size of the ground Markov net
- Functions, existential quantifiers, etc.
- MLN without variables = Markov network (subsumes graphical models)
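To make the template-to-network step concrete, here is an illustrative Python sketch of grounding; the predicate names, the toy formula Smokes(x) => Cancer(x), and its weight are assumptions for the example, not taken from the slides.

```python
from itertools import product

constants = ["Anna", "Bob"]
predicates = {"Smokes": 1, "Cancer": 1, "Friends": 2}  # name -> arity

# One node per grounding of each predicate.
ground_atoms = [
    f"{pred}({','.join(args)})"
    for pred, arity in predicates.items()
    for args in product(constants, repeat=arity)
]
print(ground_atoms)
# ['Smokes(Anna)', 'Smokes(Bob)', 'Cancer(Anna)', 'Cancer(Bob)', 'Friends(Anna,Anna)', ...]

# One feature per grounding of each formula; the formula below is a toy example.
def smokes_implies_cancer(world, x):
    """Truth value of the grounding Smokes(x) => Cancer(x) in a given world."""
    return (not world[f"Smokes({x})"]) or world[f"Cancer({x})"]

weight = 1.5  # illustrative weight shared by every grounding of this formula
ground_features = [(lambda w, x=x: smokes_implies_cancer(w, x), weight) for x in constants]

# Example world: assign truth values to all ground atoms.
world = {atom: False for atom in ground_atoms}
world["Smokes(Anna)"] = True
print(sum(wgt for f, wgt in ground_features if f(world)))  # weighted count of satisfied groundings
```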
17 Relation to First-Order Logic
- Infinite weights => first-order logic
- Satisfiable KB, positive weights => satisfying assignments = modes of distribution
- MLNs allow contradictions between formulas
18 Overview
- Motivation
- Background
- Representation
- Inference
- Learning
- Applications
- Discussion
19 MPE/MAP Inference
- Find most likely truth values of non-evidence ground atoms given evidence
- Apply weighted satisfiability solver (maximizes sum of weights of satisfied clauses)
- MaxWalkSat algorithm [Kautz et al., 1997]
  - Start with random truth assignment
  - With probability p, flip the atom that maximizes the weight sum; else flip a random atom in an unsatisfied clause
  - Repeat n times
  - Restart m times
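A compact sketch of a MaxWalkSat-style search consistent with the steps above; the clause representation (weight plus signed literals) is an assumption, and this is an illustration rather than the reference implementation of Kautz et al.

```python
import random

def maxwalksat(atoms, clauses, p=0.5, max_flips=1000, max_restarts=10):
    """atoms: iterable of atom names; clauses: list of (weight, [(atom, positive_bool), ...])."""
    def clause_sat(assign, lits):
        return any(assign[a] == pos for a, pos in lits)

    def total_weight(assign):
        return sum(w for w, lits in clauses if clause_sat(assign, lits))

    best_assign, best_w = None, float("-inf")
    for _ in range(max_restarts):
        assign = {a: random.random() < 0.5 for a in atoms}         # random start
        for _ in range(max_flips):
            unsat = [c for c in clauses if not clause_sat(assign, c[1])]
            if not unsat:
                break
            _, lits = random.choice(unsat)                          # pick an unsatisfied clause
            if random.random() < p:
                # Greedy move: flip the clause atom giving the highest total weight.
                def gain(a):
                    assign[a] = not assign[a]
                    w = total_weight(assign)
                    assign[a] = not assign[a]
                    return w
                atom = max((a for a, _ in lits), key=gain)
            else:
                atom = random.choice(lits)[0]                       # random-walk move
            assign[atom] = not assign[atom]
        w = total_weight(assign)
        if w > best_w:
            best_assign, best_w = dict(assign), w
    return best_assign, best_w
```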
20 Conditional Inference
- P(Formula | MLN, C) = ?
  - MCMC: Sample worlds, check whether formula holds
- P(Formula1 | Formula2, MLN, C) = ?
  - If Formula2 = conjunction of ground atoms
    - First construct the minimal subset of the network necessary to answer the query (generalization of KBMC)
    - Then apply MCMC (or other inference)
21 Ground Network Construction
- Initialize Markov net to contain all query predicates
- For each node in network
  - Add the node's Markov blanket to the network
  - Remove any evidence nodes
- Repeat until done
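A sketch of this construction, assuming a hypothetical helper markov_blanket(atom) that returns the ground atoms sharing some ground formula with it.

```python
def build_query_network(query_atoms, evidence, markov_blanket):
    """Construct the minimal set of ground atoms needed to answer the query.

    query_atoms:    iterable of ground atoms we want marginals for
    evidence:       dict mapping observed ground atoms to truth values
    markov_blanket: hypothetical helper, atom -> iterable of neighboring ground atoms
    """
    network = set(query_atoms)          # start from the query predicates
    frontier = list(query_atoms)
    while frontier:                     # repeat until no new nodes are added
        atom = frontier.pop()
        for neighbor in markov_blanket(atom):
            if neighbor in evidence:    # evidence nodes are conditioned on, not expanded
                continue
            if neighbor not in network:
                network.add(neighbor)
                frontier.append(neighbor)
    return network
```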
22 Probabilistic Inference
- Recall the joint distribution P(X = x) = (1/Z) exp( Σ_i w_i n_i(x) )
- Exact inference is #P-complete
- Conditioning on the Markov blanket is easy
- Gibbs sampling exploits this
23 Markov Chain Monte Carlo
- Gibbs Sampler
  1. Start with an initial assignment to nodes
  2. One node at a time, sample node given others
  3. Repeat
  4. Use samples to compute P(X)
- Apply to ground network
- Initialization: MaxWalkSat
- Can use multiple chains
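A minimal Gibbs sampler over ground atoms, reusing the same assumed clause representation as the MaxWalkSat sketch above; it estimates the marginal probability that each non-evidence atom is true.

```python
import math
import random

def gibbs(atoms, clauses, evidence, num_samples=5000, burn_in=500):
    """Estimate P(atom = True | evidence) for every non-evidence ground atom."""
    def weight_sum(assign):
        return sum(w for w, lits in clauses
                   if any(assign[a] == pos for a, pos in lits))

    assign = {a: random.random() < 0.5 for a in atoms}
    assign.update(evidence)                           # clamp evidence nodes
    free = [a for a in atoms if a not in evidence]
    counts = {a: 0 for a in free}

    for step in range(num_samples + burn_in):
        for a in free:
            # Conditional of one atom given all the others (its Markov blanket):
            # P(a=True | rest) = e^{S(a=True)} / (e^{S(a=True)} + e^{S(a=False)})
            assign[a] = True;  s_true = weight_sum(assign)
            assign[a] = False; s_false = weight_sum(assign)
            p_true = math.exp(s_true) / (math.exp(s_true) + math.exp(s_false))
            assign[a] = random.random() < p_true
        if step >= burn_in:
            for a in free:
                counts[a] += assign[a]
    return {a: counts[a] / num_samples for a in free}
```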
24 Overview
- Motivation
- Background
- Representation
- Inference
- Learning
- Applications
- Discussion
25 Learning
- Data is a relational database
  - Closed world assumption (if not EM)
- Learning parameters (weights)
  - Generatively: Pseudo-likelihood
  - Discriminatively: Voted perceptron + MaxWalkSat
- Learning structure
  - Generalization of feature induction in Markov nets
  - Learn and/or modify clauses
  - Inductive logic programming with pseudo-likelihood as the objective function
26 Generative Weight Learning
- Maximize likelihood (or posterior)
- Use gradient ascent
- Requires inference at each step (slow!)
  ∂/∂w_i log P_w(x) = n_i(x) - E_w[n_i(x)]
  (n_i(x): feature count according to data; E_w[n_i(x)]: feature count according to model)
27 Pseudo-Likelihood [Besag, 1975]
- Likelihood of each variable given its Markov blanket in the data
- Does not require inference at each step
- Widely used
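In symbols (the standard definition, consistent with the optimization slide that follows):

```latex
PL_w(X = x) \;=\; \prod_{l=1}^{n} P_w\big( X_l = x_l \mid MB_x(X_l) \big)
```

where MB_x(X_l) is the state of X_l's Markov blanket in the data.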
28 Optimization
- Parameter tying: the same weight is shared over all groundings of the same clause
- Maximize using L-BFGS [Liu & Nocedal, 1989]; the gradient of the pseudo-log-likelihood is
  ∂/∂w_i log PL_w(X=x) = Σ_l [ nsat_i(x) - P_w(X_l=0 | MB_x(X_l)) nsat_i(x[X_l=0]) - P_w(X_l=1 | MB_x(X_l)) nsat_i(x[X_l=1]) ]
  where nsat_i(x[X_l=v]) is the number of satisfied groundings of clause i in the training data when X_l takes value v
- Most terms not affected by changes in weights
- After initial setup, each iteration takes O(# ground predicates x # first-order clauses)
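As a sketch of how such weight optimization could be wired to an off-the-shelf L-BFGS routine (scipy here), assuming the negative pseudo-log-likelihood and its gradient are supplied by the caller; the function names are placeholders.

```python
import numpy as np
from scipy.optimize import minimize

def fit_weights(w0, neg_pll, neg_pll_grad):
    """Maximize pseudo-log-likelihood by minimizing its negation with L-BFGS.

    w0:           initial weight vector (one weight per first-order clause)
    neg_pll:      function w -> negative pseudo-log-likelihood of the training DB
    neg_pll_grad: function w -> gradient of neg_pll (same shape as w)
    """
    result = minimize(neg_pll, np.asarray(w0, dtype=float),
                      jac=neg_pll_grad, method="L-BFGS-B")
    return result.x  # learned clause weights
```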
29 Discriminative Weight Learning
- Gradient of the conditional log-likelihood:
  ∂/∂w_i log P_w(y | x) = n_i(x, y) - E_w[n_i(x, y)]
  (n_i(x, y): # of true groundings of formula i in the DB; E_w[n_i(x, y)]: expected # of true groundings, which is slow to compute)
- Approximate the expected count by the MAP count
30 Voted Perceptron [Collins, 2002]
- Used for discriminative training of HMMs
- Expected count in gradient approximated by count in MAP state
- MAP state found using Viterbi algorithm
- Weights averaged over all iterations
  - initialize w_i = 0
  - for t = 1 to T do
    - find the MAP configuration using Viterbi
    - Δw_i = η (training count - MAP count)
  - end for
31 Voted Perceptron for MLNs [Singla & Domingos, 2004]
- HMM is a special case of MLN
- Expected count in gradient approximated by count in MAP state
- MAP state found using MaxWalkSat
- Weights averaged over all iterations
  - initialize w_i = 0
  - for t = 1 to T do
    - find the MAP configuration using MaxWalkSat
    - Δw_i = η (training count - MAP count)
  - end for
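An illustrative sketch of this loop, assuming hypothetical helpers true_ground_counts(db) and map_ground_counts(weights, db) that return, per clause, the number of true groundings in the training data and in the MaxWalkSat MAP state respectively.

```python
import numpy as np

def voted_perceptron(db, num_clauses, true_ground_counts, map_ground_counts,
                     T=100, eta=0.1):
    """Discriminative weight learning: return the average of per-iteration weights."""
    w = np.zeros(num_clauses)
    w_sum = np.zeros(num_clauses)
    target = true_ground_counts(db)                 # clause counts in the data
    for _ in range(T):
        predicted = map_ground_counts(w, db)        # clause counts in the MAP state (MaxWalkSat)
        w = w + eta * (target - predicted)          # perceptron-style update
        w_sum += w
    return w_sum / T                                # weights averaged over all iterations
```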
32 Overview
- Motivation
- Background
- Representation
- Inference
- Learning
- Applications
- Discussion
33 Applications to Date
- Entity resolution (Cora, BibServ)
- Information extraction for biology (won LLL-2005 competition)
- Probabilistic Cyc
- Link prediction
- Topic propagation in scientific communities
- Etc.
34 Entity Resolution
- Most logical systems make the unique names assumption
- What if we don't?
  - Equality predicate: Same(A,B), or A = B
  - Equality axioms
    - Reflexivity, symmetry, transitivity
    - For every unary predicate P: x1 = x2 => (P(x1) <=> P(x2))
    - For every binary predicate R: x1 = x2 ∧ y1 = y2 => (R(x1,y1) <=> R(x2,y2))
    - Etc.
- But in Markov logic these are soft and learnable
- Can also introduce the reverse direction: R(x1,y1) ∧ R(x2,y2) ∧ x1 = x2 => y1 = y2
- Surprisingly, this is all that's needed
35 Example: Citation Matching
36 Markov Logic Formulation: Predicates
- Are two bibliography records the same? SameBib(b1,b2)
- Are two field values the same? SameAuthor(a1,a2), SameTitle(t1,t2), SameVenue(v1,v2)
- How similar are two field strings? Predicates for ranges of cosine TF-IDF score:
  - TitleTFIDF.0(t1,t2) is true iff TF-IDF(t1,t2) = 0
  - TitleTFIDF.2(t1,t2) is true iff 0 < TF-IDF(t1,t2) < 0.2
  - Etc.
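One way such similarity predicates could be computed, sketched with scikit-learn's TfidfVectorizer; the exact bucket boundaries and predicate naming are assumptions for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def tfidf_score(s1, s2):
    """Cosine TF-IDF similarity between two field strings."""
    vectors = TfidfVectorizer().fit_transform([s1, s2])
    return cosine_similarity(vectors[0], vectors[1])[0, 0]

def similarity_predicates(prefix, s1, s2, uppers=(0.2, 0.4, 0.6, 0.8, 1.0)):
    """Turn a score into range predicates: PrefixTFIDF.0 means score = 0,
    PrefixTFIDF.2 means 0 < score <= 0.2, and so on (boundaries assumed)."""
    score = tfidf_score(s1, s2)
    preds = {f"{prefix}TFIDF.0": score == 0.0}
    lo = 0.0
    for up in uppers:
        preds[f"{prefix}TFIDF.{int(round(up * 10))}"] = lo < score <= up
        lo = up
    return preds

print(similarity_predicates("Title",
                            "Object Identification using CRFs",
                            "Object Identification using CRFs"))
```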
37 Markov Logic Formulation: Formulas
- Unit clauses (defaults): ¬SameBib(b1,b2)
- Two fields are the same => corresponding bib. records are the same:
  Author(b1,a1) ∧ Author(b2,a2) ∧ SameAuthor(a1,a2) => SameBib(b1,b2)
- Two bib. records are the same => corresponding fields are the same:
  Author(b1,a1) ∧ Author(b2,a2) ∧ SameBib(b1,b2) => SameAuthor(a1,a2)
- High similarity score => two fields are the same:
  TitleTFIDF.8(t1,t2) => SameTitle(t1,t2)
- Transitive closure (not incorporated in experiments):
  SameBib(b1,b2) ∧ SameBib(b2,b3) => SameBib(b1,b3)
- 25 predicates, 46 first-order clauses
38 What Does This Buy You?
- Objects are matched collectively
- Multiple types matched simultaneously
- Constraints are soft, and strengths can be learned from data
- Easy to add further knowledge
- Constraints can be refined from data
- Standard approach still embedded
39 Example
Record | Title                            | Author          | Venue
B1     | Object Identification using CRFs | Linda Stewart   | PKDD 04
B2     | Object Identification using CRFs | Linda Stewart   | 8th PKDD
B3     | Learning Boolean Formulas        | Bill Johnson    | PKDD 04
B4     | Learning of Boolean Formulas     | William Johnson | 8th PKDD
Subset of a Bibliography Database
40 Standard Approach [Fellegi & Sunter, 1969]
[Figure: one record-match node per record pair (b1=b2?, b3=b4?), each connected to its own field-similarity evidence nodes for Title, Author and Venue, e.g. Sim("Object Identification using CRFs", "Object Identification using CRFs"), Sim("Linda Stewart", "Linda Stewart"), Sim("PKDD 04", "8th PKDD") for b1=b2?, and Sim("Learning Boolean Formulas", "Learning of Boolean Expressions"), Sim("Bill Johnson", "William Johnson"), Sim("PKDD 04", "8th PKDD") for b3=b4?]
41 What's Missing?
[Figure: same network as above]
If from b1=b2 you infer that "PKDD 04" is the same as "8th PKDD", how can you use that to help figure out whether b3=b4?
42 Merging the Evidence Nodes
[Figure: the venue-similarity evidence node Sim("PKDD 04", "8th PKDD") is now shared by the b1=b2? and b3=b4? record-match nodes]
Still does not solve the problem. Why?
43 Introducing Field-Match Nodes
[Figure: each record-match node (b1=b2?, b3=b4?) is connected to field-match nodes (b1.T=b2.T?, b1.A=b2.A?, b1.V=b2.V?, b3.T=b4.T?, b3.A=b4.A?, b3.V=b4.V?), which in turn connect to the field-similarity evidence nodes]
Full representation in the Collective Model
44 Flow of Information
[Figure: inference propagates through the shared field-match nodes, e.g. the venue match inferred from b1=b2 ("PKDD 04" = "8th PKDD") helps support b3=b4]
45 Flow of Information
[Figure: same network, next step of the propagation]
46 Flow of Information
[Figure: same network, next step of the propagation]
47 Flow of Information
[Figure: same network, next step of the propagation]
48 Flow of Information
[Figure: same network, final step of the propagation]
49 Experiments
- Databases
  - Cora [McCallum et al., IRJ, 2000]: 1295 records, 132 papers
  - BibServ.org [Richardson & Domingos, ISWC-03]: 21,805 records, unknown # of papers
- Goal: De-duplicate bib. records, authors and venues
- Pre-processing: Form canopies [McCallum et al., KDD-00]
- Compared with naïve Bayes (standard method), etc.
- Measured area under precision-recall curve (AUC)
- Our approach wins across the board
50 Results: Matching Venues on Cora
51 Overview
- Motivation
- Background
- Representation
- Inference
- Learning
- Applications
- Discussion
52 Relation to Other Approaches
Representation | Logical language    | Probabilistic language
Markov logic   | First-order logic   | Markov nets
RMNs           | Conjunctive queries | Markov nets
PRMs           | Frame systems       | Bayes nets
KBMC           | Horn clauses        | Bayes nets
SLPs           | Horn clauses        | Bayes nets
53 Going Further
- First-order logic is not enough
- We can "Markovize" other representations in the same way
- Lots to do!
54 Summary
- NLP involves relational structure, uncertainty
- Markov logic combines first-order logic and probabilistic graphical models
  - Syntax: First-order logic + weights
  - Semantics: Templates for Markov networks
  - Inference: MaxWalkSat + KBMC + MCMC
  - Learning: Voted perceptron + PL + ILP
- Applications to date: Entity resolution, IE, etc.
- Software: Alchemy, http://www.cs.washington.edu/ai/alchemy