Title: Markov Logic: A Representation Language for Natural Language Semantics
1 Markov Logic: A Representation Language for Natural Language Semantics
- Pedro Domingos
- Dept. of Computer Science & Engineering
- University of Washington
- (Based on joint work with Stanley Kok, Matt Richardson and Parag Singla)
2 Overview
- Motivation
- Background
- Representation
- Inference
- Learning
- Applications
- Discussion
3 Motivation
- Natural language is characterized by
- Complex relational structure
- High uncertainty (ambiguity, imperfect knowledge)
- First-order logic handles relational structure
- Probability handles uncertainty
- Let's combine the two
4 Markov Logic [Richardson & Domingos, 2006]
- Syntax: First-order logic + weights
- Semantics: Templates for Markov nets
- Inference: Weighted satisfiability + MCMC
- Learning: Voted perceptron + ILP
5 Overview
- Motivation
- Background
- Representation
- Inference
- Learning
- Applications
- Discussion
6 Markov Networks
- Undirected graphical models
[Figure: example network over four variables A, B, C, D]
- Potential functions defined over cliques
7 Markov Networks
- Undirected graphical models
[Figure: same example network over A, B, C, D]
- Potential functions defined over cliques; in log-linear form:
  P(x) = (1/Z) exp( Σ_i w_i f_i(x) )
  where w_i is the weight of feature i and f_i(x) is feature i
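To make the log-linear form concrete, here is a small illustrative Python sketch; the two features, their weights, and the four-variable network are assumptions for the example, not taken from the slides.

```python
import itertools
import math

# Hypothetical binary features over a tiny network with variables A, B, C, D.
# Each feature maps a full assignment (a dict) to 0 or 1.
features = [
    lambda x: 1.0 if x["A"] == x["B"] else 0.0,   # A and B agree
    lambda x: 1.0 if x["C"] == x["D"] else 0.0,   # C and D agree
]
weights = [1.5, 0.8]  # illustrative weights

def score(x):
    """Unnormalized log-probability: sum_i w_i * f_i(x)."""
    return sum(w * f(x) for w, f in zip(weights, features))

# Partition function Z sums exp(score) over all 2^4 assignments.
states = [dict(zip("ABCD", bits)) for bits in itertools.product([0, 1], repeat=4)]
Z = sum(math.exp(score(x)) for x in states)

def prob(x):
    """P(x) = exp(sum_i w_i f_i(x)) / Z."""
    return math.exp(score(x)) / Z

print(prob({"A": 1, "B": 1, "C": 0, "D": 1}))
```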
8 First-Order Logic
- Constants, variables, functions, predicates. E.g.: Anna, X, mother_of(X), friends(X, Y)
- Grounding: Replace all variables by constants. E.g.: friends(Anna, Bob)
- World (model, interpretation): Assignment of truth values to all ground predicates
9 Overview
- Motivation
- Background
- Representation
- Inference
- Learning
- Applications
- Discussion
10 Markov Logic Networks
- A logical KB is a set of hard constraints on the set of possible worlds
- Let's make them soft constraints: when a world violates a formula, it becomes less probable, not impossible
- Give each formula a weight (higher weight => stronger constraint)
11 Definition
- A Markov Logic Network (MLN) is a set of pairs (F, w) where
  - F is a formula in first-order logic
  - w is a real number
- Together with a set of constants, it defines a Markov network with
  - One node for each grounding of each predicate in the MLN
  - One feature for each grounding of each formula F in the MLN, with the corresponding weight w
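For reference, the ground Markov network so defined gives the standard MLN joint distribution (Richardson & Domingos, 2006):

```latex
P(X = x) \;=\; \frac{1}{Z} \exp\Big( \sum_i w_i \, n_i(x) \Big),
\qquad
Z \;=\; \sum_{x'} \exp\Big( \sum_i w_i \, n_i(x') \Big)
```

where n_i(x) is the number of true groundings of formula F_i in world x.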
12 Example: Friends & Smokers
Suppose we have two constants: Anna (A) and Bob (B)
[Figure: ground atoms Smokes(A), Smokes(B), Cancer(A), Cancer(B)]
13 Example: Friends & Smokers
Suppose we have two constants: Anna (A) and Bob (B)
[Figure: ground atoms Friends(A,A), Friends(A,B), Friends(B,A), Friends(B,B), Smokes(A), Smokes(B), Cancer(A), Cancer(B)]
14 Example: Friends & Smokers
Suppose we have two constants: Anna (A) and Bob (B)
[Figure: same ground atoms, now connected by edges corresponding to formula groundings]
15 Example: Friends & Smokers
Suppose we have two constants: Anna (A) and Bob (B)
[Figure: the complete ground Markov network, with one feature per formula grounding]
16 More on MLNs
- MLN is a template for ground Markov nets
- Typed variables and constants greatly reduce the size of the ground Markov net
- Functions, existential quantifiers, etc.
- MLN without variables = Markov network (subsumes graphical models)
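To make the template-to-network step concrete, here is an illustrative Python sketch of grounding; the predicate names, the toy formula Smokes(x) => Cancer(x), and its weight are assumptions for the example, not taken from the slides.

```python
from itertools import product

constants = ["Anna", "Bob"]
predicates = {"Smokes": 1, "Cancer": 1, "Friends": 2}  # name -> arity

# One node per grounding of each predicate.
ground_atoms = [
    f"{pred}({','.join(args)})"
    for pred, arity in predicates.items()
    for args in product(constants, repeat=arity)
]
print(ground_atoms)
# ['Smokes(Anna)', 'Smokes(Bob)', 'Cancer(Anna)', 'Cancer(Bob)', 'Friends(Anna,Anna)', ...]

# One feature per grounding of each formula; the formula below is a toy example.
def smokes_implies_cancer(world, x):
    """Truth value of the grounding Smokes(x) => Cancer(x) in a given world."""
    return (not world[f"Smokes({x})"]) or world[f"Cancer({x})"]

weight = 1.5  # illustrative weight shared by every grounding of this formula
ground_features = [(lambda w, x=x: smokes_implies_cancer(w, x), weight) for x in constants]

# Example world: assign truth values to all ground atoms.
world = {atom: False for atom in ground_atoms}
world["Smokes(Anna)"] = True
print(sum(wgt for f, wgt in ground_features if f(world)))  # weighted count of satisfied groundings
```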
17 Relation to First-Order Logic
- Infinite weights => first-order logic
- Satisfiable KB, positive weights => satisfying assignments = modes of distribution
- MLNs allow contradictions between formulas
18 Overview
- Motivation
- Background
- Representation
- Inference
- Learning
- Applications
- Discussion
19 MPE/MAP Inference
- Find most likely truth values of non-evidence ground atoms given evidence
- Apply weighted satisfiability solver (maximizes sum of weights of satisfied clauses)
- MaxWalkSat algorithm [Kautz et al., 1997]
  - Start with random truth assignment
  - With probability p, flip the atom that maximizes the weight sum; else flip a random atom in an unsatisfied clause
  - Repeat n times
  - Restart m times
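A compact sketch of a MaxWalkSat-style search consistent with the steps above; the clause representation (weight plus signed literals) is an assumption, and this is an illustration rather than the reference implementation of Kautz et al.

```python
import random

def maxwalksat(atoms, clauses, p=0.5, max_flips=1000, max_restarts=10):
    """atoms: iterable of atom names; clauses: list of (weight, [(atom, positive_bool), ...])."""
    def clause_sat(assign, lits):
        return any(assign[a] == pos for a, pos in lits)

    def total_weight(assign):
        return sum(w for w, lits in clauses if clause_sat(assign, lits))

    best_assign, best_w = None, float("-inf")
    for _ in range(max_restarts):
        assign = {a: random.random() < 0.5 for a in atoms}         # random start
        for _ in range(max_flips):
            unsat = [c for c in clauses if not clause_sat(assign, c[1])]
            if not unsat:
                break
            _, lits = random.choice(unsat)                          # pick an unsatisfied clause
            if random.random() < p:
                # Greedy move: flip the clause atom giving the highest total weight.
                def gain(a):
                    assign[a] = not assign[a]
                    w = total_weight(assign)
                    assign[a] = not assign[a]
                    return w
                atom = max((a for a, _ in lits), key=gain)
            else:
                atom = random.choice(lits)[0]                       # random-walk move
            assign[atom] = not assign[atom]
        w = total_weight(assign)
        if w > best_w:
            best_assign, best_w = dict(assign), w
    return best_assign, best_w
```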
20 Conditional Inference
- P(Formula | MLN, C) = ?
  - MCMC: Sample worlds, check whether formula holds
- P(Formula1 | Formula2, MLN, C) = ?
  - If Formula2 = conjunction of ground atoms
    - First construct the minimal subset of the network necessary to answer the query (generalization of KBMC)
    - Then apply MCMC (or other inference)
21 Ground Network Construction
- Initialize Markov net to contain all query predicates
- For each node in network
  - Add the node's Markov blanket to the network
  - Remove any evidence nodes
- Repeat until done
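A sketch of this construction, assuming a hypothetical helper markov_blanket(atom) that returns the ground atoms sharing some ground formula with it.

```python
def build_query_network(query_atoms, evidence, markov_blanket):
    """Construct the minimal set of ground atoms needed to answer the query.

    query_atoms:    iterable of ground atoms we want marginals for
    evidence:       dict mapping observed ground atoms to truth values
    markov_blanket: hypothetical helper, atom -> iterable of neighboring ground atoms
    """
    network = set(query_atoms)          # start from the query predicates
    frontier = list(query_atoms)
    while frontier:                     # repeat until no new nodes are added
        atom = frontier.pop()
        for neighbor in markov_blanket(atom):
            if neighbor in evidence:    # evidence nodes are conditioned on, not expanded
                continue
            if neighbor not in network:
                network.add(neighbor)
                frontier.append(neighbor)
    return network
```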
22 Probabilistic Inference
- Recall the joint distribution P(X = x) = (1/Z) exp( Σ_i w_i n_i(x) )
- Exact inference is #P-complete
- Conditioning on the Markov blanket is easy
- Gibbs sampling exploits this
23 Markov Chain Monte Carlo
- Gibbs Sampler
  1. Start with an initial assignment to nodes
  2. One node at a time, sample node given others
  3. Repeat
  4. Use samples to compute P(X)
- Apply to ground network
- Initialization: MaxWalkSat
- Can use multiple chains
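A minimal Gibbs sampler over ground atoms, reusing the same assumed clause representation as the MaxWalkSat sketch above; it estimates the marginal probability that each non-evidence atom is true.

```python
import math
import random

def gibbs(atoms, clauses, evidence, num_samples=5000, burn_in=500):
    """Estimate P(atom = True | evidence) for every non-evidence ground atom."""
    def weight_sum(assign):
        return sum(w for w, lits in clauses
                   if any(assign[a] == pos for a, pos in lits))

    assign = {a: random.random() < 0.5 for a in atoms}
    assign.update(evidence)                           # clamp evidence nodes
    free = [a for a in atoms if a not in evidence]
    counts = {a: 0 for a in free}

    for step in range(num_samples + burn_in):
        for a in free:
            # Conditional of one atom given all the others (its Markov blanket):
            # P(a=True | rest) = e^{S(a=True)} / (e^{S(a=True)} + e^{S(a=False)})
            assign[a] = True;  s_true = weight_sum(assign)
            assign[a] = False; s_false = weight_sum(assign)
            p_true = math.exp(s_true) / (math.exp(s_true) + math.exp(s_false))
            assign[a] = random.random() < p_true
        if step >= burn_in:
            for a in free:
                counts[a] += assign[a]
    return {a: counts[a] / num_samples for a in free}
```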
24 Overview
- Motivation
- Background
- Representation
- Inference
- Learning
- Applications
- Discussion
25 Learning
- Data is a relational database
  - Closed world assumption (if not EM)
- Learning parameters (weights)
  - Generatively: Pseudo-likelihood
  - Discriminatively: Voted perceptron + MaxWalkSat
- Learning structure
  - Generalization of feature induction in Markov nets
  - Learn and/or modify clauses
  - Inductive logic programming with pseudo-likelihood as the objective function
26 Generative Weight Learning
- Maximize likelihood (or posterior)
- Use gradient ascent
- Requires inference at each step (slow!)
  ∂/∂w_i log P_w(x) = n_i(x) - E_w[n_i(x)]
  (n_i(x): feature count according to data; E_w[n_i(x)]: feature count according to model)
27 Pseudo-Likelihood [Besag, 1975]
- Likelihood of each variable given its Markov blanket in the data
- Does not require inference at each step
- Widely used
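In symbols (the standard definition, consistent with the optimization slide that follows):

```latex
PL_w(X = x) \;=\; \prod_{l=1}^{n} P_w\big( X_l = x_l \mid MB_x(X_l) \big)
```

where MB_x(X_l) is the state of X_l's Markov blanket in the data.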
28 Optimization
- Parameter tying: the same weight is shared over all groundings of the same clause
- Maximize using L-BFGS [Liu & Nocedal, 1989]; the gradient of the pseudo-log-likelihood is
  ∂/∂w_i log PL_w(X=x) = Σ_l [ nsat_i(x) - P_w(X_l=0 | MB_x(X_l)) nsat_i(x[X_l=0]) - P_w(X_l=1 | MB_x(X_l)) nsat_i(x[X_l=1]) ]
  where nsat_i(x[X_l=v]) is the number of satisfied groundings of clause i in the training data when X_l takes value v
- Most terms not affected by changes in weights
- After initial setup, each iteration takes O(# ground predicates x # first-order clauses)
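As a sketch of how such weight optimization could be wired to an off-the-shelf L-BFGS routine (scipy here), assuming the negative pseudo-log-likelihood and its gradient are supplied by the caller; the function names are placeholders.

```python
import numpy as np
from scipy.optimize import minimize

def fit_weights(w0, neg_pll, neg_pll_grad):
    """Maximize pseudo-log-likelihood by minimizing its negation with L-BFGS.

    w0:           initial weight vector (one weight per first-order clause)
    neg_pll:      function w -> negative pseudo-log-likelihood of the training DB
    neg_pll_grad: function w -> gradient of neg_pll (same shape as w)
    """
    result = minimize(neg_pll, np.asarray(w0, dtype=float),
                      jac=neg_pll_grad, method="L-BFGS-B")
    return result.x  # learned clause weights
```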
29 Discriminative Weight Learning
- Gradient of the conditional log-likelihood:
  ∂/∂w_i log P_w(y | x) = n_i(x, y) - E_w[n_i(x, y)]
  (n_i(x, y): # of true groundings of formula i in the DB; E_w[n_i(x, y)]: expected # of true groundings, which is slow to compute)
- Approximate the expected count by the MAP count
30 Voted Perceptron [Collins, 2002]
- Used for discriminative training of HMMs
- Expected count in gradient approximated by count in MAP state
- MAP state found using Viterbi algorithm
- Weights averaged over all iterations
  - initialize w_i = 0
  - for t = 1 to T do
    - find the MAP configuration using Viterbi
    - Δw_i = η (training count - MAP count)
  - end for
31 Voted Perceptron for MLNs [Singla & Domingos, 2004]
- HMM is a special case of MLN
- Expected count in gradient approximated by count in MAP state
- MAP state found using MaxWalkSat
- Weights averaged over all iterations
  - initialize w_i = 0
  - for t = 1 to T do
    - find the MAP configuration using MaxWalkSat
    - Δw_i = η (training count - MAP count)
  - end for
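An illustrative sketch of this loop, assuming hypothetical helpers true_ground_counts(db) and map_ground_counts(weights, db) that return, per clause, the number of true groundings in the training data and in the MaxWalkSat MAP state respectively.

```python
import numpy as np

def voted_perceptron(db, num_clauses, true_ground_counts, map_ground_counts,
                     T=100, eta=0.1):
    """Discriminative weight learning: return the average of per-iteration weights."""
    w = np.zeros(num_clauses)
    w_sum = np.zeros(num_clauses)
    target = true_ground_counts(db)                 # clause counts in the data
    for _ in range(T):
        predicted = map_ground_counts(w, db)        # clause counts in the MAP state (MaxWalkSat)
        w = w + eta * (target - predicted)          # perceptron-style update
        w_sum += w
    return w_sum / T                                # weights averaged over all iterations
```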
32 Overview
- Motivation
- Background
- Representation
- Inference
- Learning
- Applications
- Discussion
33 Applications to Date
- Entity resolution (Cora, BibServ)
- Information extraction for biology (won LLL-2005 competition)
- Probabilistic Cyc
- Link prediction
- Topic propagation in scientific communities
- Etc.
34 Entity Resolution
- Most logical systems make the unique names assumption
- What if we don't?
  - Equality predicate: Same(A,B), or A = B
  - Equality axioms
    - Reflexivity, symmetry, transitivity
    - For every unary predicate P: x1 = x2 => (P(x1) <=> P(x2))
    - For every binary predicate R: x1 = x2 ∧ y1 = y2 => (R(x1,y1) <=> R(x2,y2))
    - Etc.
- But in Markov logic these are soft and learnable
- Can also introduce the reverse direction: R(x1,y1) ∧ R(x2,y2) ∧ x1 = x2 => y1 = y2
- Surprisingly, this is all that's needed
35 Example: Citation Matching
36 Markov Logic Formulation: Predicates
- Are two bibliography records the same? SameBib(b1,b2)
- Are two field values the same? SameAuthor(a1,a2), SameTitle(t1,t2), SameVenue(v1,v2)
- How similar are two field strings? Predicates for ranges of cosine TF-IDF score:
  - TitleTFIDF.0(t1,t2) is true iff TF-IDF(t1,t2) = 0
  - TitleTFIDF.2(t1,t2) is true iff 0 < TF-IDF(t1,t2) < 0.2
  - Etc.
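One way such similarity predicates could be computed, sketched with scikit-learn's TfidfVectorizer; the exact bucket boundaries and predicate naming are assumptions for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def tfidf_score(s1, s2):
    """Cosine TF-IDF similarity between two field strings."""
    vectors = TfidfVectorizer().fit_transform([s1, s2])
    return cosine_similarity(vectors[0], vectors[1])[0, 0]

def similarity_predicates(prefix, s1, s2, uppers=(0.2, 0.4, 0.6, 0.8, 1.0)):
    """Turn a score into range predicates: PrefixTFIDF.0 means score = 0,
    PrefixTFIDF.2 means 0 < score <= 0.2, and so on (boundaries assumed)."""
    score = tfidf_score(s1, s2)
    preds = {f"{prefix}TFIDF.0": score == 0.0}
    lo = 0.0
    for up in uppers:
        preds[f"{prefix}TFIDF.{int(round(up * 10))}"] = lo < score <= up
        lo = up
    return preds

print(similarity_predicates("Title",
                            "Object Identification using CRFs",
                            "Object Identification using CRFs"))
```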
37 Markov Logic Formulation: Formulas
- Unit clauses (defaults): ¬SameBib(b1,b2)
- Two fields are the same => corresponding bib. records are the same:
  Author(b1,a1) ∧ Author(b2,a2) ∧ SameAuthor(a1,a2) => SameBib(b1,b2)
- Two bib. records are the same => corresponding fields are the same:
  Author(b1,a1) ∧ Author(b2,a2) ∧ SameBib(b1,b2) => SameAuthor(a1,a2)
- High similarity score => two fields are the same:
  TitleTFIDF.8(t1,t2) => SameTitle(t1,t2)
- Transitive closure (not incorporated in experiments):
  SameBib(b1,b2) ∧ SameBib(b2,b3) => SameBib(b1,b3)
- 25 predicates, 46 first-order clauses
38 What Does This Buy You?
- Objects are matched collectively
- Multiple types matched simultaneously
- Constraints are soft, and strengths can be learned from data
- Easy to add further knowledge
- Constraints can be refined from data
- Standard approach still embedded
39 Example
Record | Title                            | Author          | Venue
B1     | Object Identification using CRFs | Linda Stewart   | PKDD 04
B2     | Object Identification using CRFs | Linda Stewart   | 8th PKDD
B3     | Learning Boolean Formulas        | Bill Johnson    | PKDD 04
B4     | Learning of Boolean Formulas     | William Johnson | 8th PKDD
Subset of a Bibliography Database
40 Standard Approach [Fellegi & Sunter, 1969]
[Figure: one record-match node per record pair (b1=b2?, b3=b4?), each connected to its own field-similarity evidence nodes for Title, Author and Venue, e.g. Sim("Object Identification using CRFs", "Object Identification using CRFs"), Sim("Linda Stewart", "Linda Stewart"), Sim("PKDD 04", "8th PKDD") for b1=b2?, and Sim("Learning Boolean Formulas", "Learning of Boolean Expressions"), Sim("Bill Johnson", "William Johnson"), Sim("PKDD 04", "8th PKDD") for b3=b4?]
41 What's Missing?
[Figure: same network as above]
If from b1=b2 you infer that "PKDD 04" is the same as "8th PKDD", how can you use that to help figure out whether b3=b4?
42 Merging the Evidence Nodes
[Figure: the venue-similarity evidence node Sim("PKDD 04", "8th PKDD") is now shared by the b1=b2? and b3=b4? record-match nodes]
Still does not solve the problem. Why?
43 Introducing Field-Match Nodes
[Figure: each record-match node (b1=b2?, b3=b4?) is connected to field-match nodes (b1.T=b2.T?, b1.A=b2.A?, b1.V=b2.V?, b3.T=b4.T?, b3.A=b4.A?, b3.V=b4.V?), which in turn connect to the field-similarity evidence nodes]
Full representation in the Collective Model
44 Flow of Information
[Figure: inference propagates through the shared field-match nodes, e.g. the venue match inferred from b1=b2 ("PKDD 04" = "8th PKDD") helps support b3=b4]
45 Flow of Information
[Figure: same network, next step of the propagation]
46 Flow of Information
[Figure: same network, next step of the propagation]
47 Flow of Information
[Figure: same network, next step of the propagation]
48 Flow of Information
[Figure: same network, final step of the propagation]
49 Experiments
- Databases
  - Cora [McCallum et al., IRJ, 2000]: 1295 records, 132 papers
  - BibServ.org [Richardson & Domingos, ISWC-03]: 21,805 records, unknown # of papers
- Goal: De-duplicate bib. records, authors and venues
- Pre-processing: Form canopies [McCallum et al., KDD-00]
- Compared with naïve Bayes (standard method), etc.
- Measured area under precision-recall curve (AUC)
- Our approach wins across the board
50 Results: Matching Venues on Cora
51 Overview
- Motivation
- Background
- Representation
- Inference
- Learning
- Applications
- Discussion
52 Relation to Other Approaches
Representation | Logical language    | Probabilistic language
Markov logic   | First-order logic   | Markov nets
RMNs           | Conjunctive queries | Markov nets
PRMs           | Frame systems       | Bayes nets
KBMC           | Horn clauses        | Bayes nets
SLPs           | Horn clauses        | Bayes nets
53 Going Further
- First-order logic is not enough
- We can "Markovize" other representations in the same way
- Lots to do!
54 Summary
- NLP involves relational structure, uncertainty
- Markov logic combines first-order logic and probabilistic graphical models
  - Syntax: First-order logic + weights
  - Semantics: Templates for Markov networks
  - Inference: MaxWalkSat + KBMC + MCMC
  - Learning: Voted perceptron + PL + ILP
- Applications to date: Entity resolution, IE, etc.
- Software: Alchemy, http://www.cs.washington.edu/ai/alchemy