Markov Logic: A Representation Language for Natural Language Semantics


1
Markov Logic: A Representation Language for Natural Language Semantics
  • Pedro Domingos
  • Dept. of Computer Science & Eng.
  • University of Washington
  • (Based on joint work with Stanley Kok, Matt Richardson, and Parag Singla)

2
Overview
  • Motivation
  • Background
  • Representation
  • Inference
  • Learning
  • Applications
  • Discussion

3
Motivation
  • Natural language is characterized by
  • Complex relational structure
  • High uncertainty (ambiguity, imperfect knowledge)
  • First-order logic handles relational structure
  • Probability handles uncertainty
  • Let's combine the two

4
Markov Logic [Richardson & Domingos, 2006]
  • Syntax: First-order logic + Weights
  • Semantics: Templates for Markov nets
  • Inference: Weighted satisfiability + MCMC
  • Learning: Voted perceptron + ILP

5
Overview
  • Motivation
  • Background
  • Representation
  • Inference
  • Learning
  • Applications
  • Discussion

6
Markov Networks
  • Undirected graphical models

[Figure: undirected graph over nodes A, B, C, D]
  • Potential functions defined over cliques
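
The joint distribution (the equation was an image on the original slide; this is the standard Markov network form):

$$P(x) = \frac{1}{Z}\prod_c \phi_c(x_c)$$

where the product runs over the cliques c of the graph and Z is the partition function.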

7
Markov Networks
  • Undirected graphical models

[Figure: undirected graph over nodes A, B, C, D]
  • Potential functions defined over cliques

Equivalently, in log-linear form:

$$P(x) = \frac{1}{Z}\exp\Big(\sum_i w_i f_i(x)\Big)$$

(w_i: weight of feature i; f_i(x): feature i)
8
First-Order Logic
  • Constants, variables, functions, predicates. E.g.: Anna, X, mother_of(X), friends(X, Y)
  • Grounding: replace all variables by constants. E.g.: friends(Anna, Bob)
  • World (model, interpretation): assignment of truth values to all ground predicates

9
Overview
  • Motivation
  • Background
  • Representation
  • Inference
  • Learning
  • Applications
  • Discussion

10
Markov Logic Networks
  • A logical KB is a set of hard constraints on the set of possible worlds
  • Let's make them soft constraints: when a world violates a formula, it becomes less probable, not impossible
  • Give each formula a weight (higher weight ⇒ stronger constraint)

11
Definition
  • A Markov Logic Network (MLN) is a set of pairs
    (F, w) where
  • F is a formula in first-order logic
  • w is a real number
  • Together with a set of constants, it defines a Markov network with:
  • One node for each grounding of each predicate in
    the MLN
  • One feature for each grounding of each formula F
    in the MLN, with the corresponding weight w
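
The slide's equation image did not survive extraction; the distribution it showed is the standard MLN probability (Richardson & Domingos, 2006):

$$P(X = x) = \frac{1}{Z}\exp\Big(\sum_i w_i\, n_i(x)\Big)$$

where n_i(x) is the number of true groundings of formula F_i in world x, and Z normalizes over all worlds.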

12
Example: Friends & Smokers
Suppose we have two constants: Anna (A) and Bob (B)
[Figure: ground atoms Smokes(A), Smokes(B), Cancer(A), Cancer(B)]
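
The example's formulas were images on the original slides; the standard version of this example (from Richardson & Domingos, 2006) uses two weighted formulas, with illustrative weights:

    1.5   ∀x  Smokes(x) ⇒ Cancer(x)                      (smoking causes cancer)
    1.1   ∀x,y  Friends(x,y) ⇒ (Smokes(x) ⇔ Smokes(y))   (friends have similar smoking habits)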
13
Example: Friends & Smokers
Suppose we have two constants: Anna (A) and Bob (B)
[Figure: ground atoms Friends(A,A), Friends(A,B), Friends(B,A), Friends(B,B) added alongside Smokes(A), Smokes(B), Cancer(A), Cancer(B)]
14-15
Example: Friends & Smokers
Suppose we have two constants: Anna (A) and Bob (B)
[Figures: the same ground atoms, with edges added for the groundings of each formula, yielding the full ground Markov network]
16
More on MLNs
  • MLN is a template for ground Markov nets
  • Typed variables and constants greatly reduce the size of the ground Markov net
  • Functions, existential quantifiers, etc. can also be handled
  • MLN without variables = Markov network (subsumes graphical models)

17
Relation to First-Order Logic
  • Infinite weights ⇒ first-order logic
  • Satisfiable KB, positive weights ⇒ satisfying assignments = modes of distribution
  • MLNs allow contradictions between formulas

18
Overview
  • Motivation
  • Background
  • Representation
  • Inference
  • Learning
  • Applications
  • Discussion

19
MPE/MAP Inference
  • Find most likely truth values of non-evidence ground atoms given evidence
  • Apply weighted satisfiability solver (maximizes sum of weights of satisfied clauses)
  • MaxWalkSat algorithm [Kautz et al., 1997] (sketched below):
  • Start with a random truth assignment
  • With probability p, flip the atom that maximizes the weight sum; else flip a random atom in an unsatisfied clause
  • Repeat n times
  • Restart m times
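
A minimal Python sketch of the loop just described (illustrative only: the clause encoding and parameter defaults are assumptions, not part of the talk):

    import random

    def maxwalksat(clauses, weights, atoms, p=0.5, max_flips=1000, max_tries=10):
        # clauses: list of clauses, each a list of (atom, sign) literals;
        # a clause is satisfied if some literal's atom has truth value == sign.
        def satisfied(clause, a):
            return any(a[atom] == sign for atom, sign in clause)

        def total_weight(a):
            return sum(w for cl, w in zip(clauses, weights) if satisfied(cl, a))

        best, best_w = None, float("-inf")
        for _ in range(max_tries):                        # restart m times
            a = {atom: random.random() < 0.5 for atom in atoms}
            for _ in range(max_flips):                    # repeat n times
                unsat = [cl for cl in clauses if not satisfied(cl, a)]
                if not unsat:
                    break
                clause = random.choice(unsat)
                if random.random() < p:                   # greedy flip
                    def weight_if_flipped(atom):
                        a[atom] = not a[atom]
                        w = total_weight(a)
                        a[atom] = not a[atom]
                        return w
                    atom = max((atom for atom, _ in clause), key=weight_if_flipped)
                else:                                     # random-walk flip
                    atom = random.choice(clause)[0]
                a[atom] = not a[atom]
            if total_weight(a) > best_w:
                best, best_w = dict(a), total_weight(a)
        return best, best_w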

20
Conditional Inference
  • P(Formula | MLN, C) = ?
  • MCMC: sample worlds, check if formula holds
  • P(Formula1 | Formula2, MLN, C) = ?
  • If Formula2 = conjunction of ground atoms:
  • First construct the minimal subset of the network necessary to answer the query (a generalization of KBMC)
  • Then apply MCMC (or other inference)

21
Ground Network Construction
  • Initialize Markov net to contain all query predicates
  • For each node in the network:
  • Add the node's Markov blanket to the network
  • Remove any evidence nodes
  • Repeat until done (see the sketch below)
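
A small Python sketch of this construction (markov_blanket(atom) is an assumed helper returning a ground atom's neighbors in the full ground network):

    def construct_ground_network(query_atoms, markov_blanket, evidence):
        # Grow the minimal subnetwork needed to answer the query: start from
        # the query atoms, repeatedly add Markov blankets, and stop expansion
        # at evidence nodes (which are removed from the network to sample).
        network = set(query_atoms)
        frontier = list(query_atoms)
        while frontier:                       # repeat until done
            atom = frontier.pop()
            for neighbor in markov_blanket(atom):
                if neighbor in evidence:
                    continue
                if neighbor not in network:
                    network.add(neighbor)
                    frontier.append(neighbor)
        return network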

22
Probabilistic Inference
  • Recall:
  • Exact inference is #P-complete
  • Conditioning on the Markov blanket is easy
  • Gibbs sampling exploits this

23
Markov Chain Monte Carlo
  • Gibbs sampler:
  • 1. Start with an initial assignment to nodes
  • 2. One node at a time, sample each node given the others
  • 3. Repeat
  • 4. Use samples to compute P(X)
  • Apply to ground network (see the sketch below)
  • Initialization: MaxWalkSat
  • Can use multiple chains
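
A minimal Python sketch of the sampler described above; cond_prob(atom, state), the probability that an atom is true given its Markov blanket, is an assumed helper derived from the ground network's weighted features:

    import random

    def gibbs_marginals(atoms, cond_prob, num_samples=10000, burn_in=1000,
                        init=None):
        # Estimate P(atom = True) for every ground atom by Gibbs sampling.
        # As the slide suggests, init can be the MaxWalkSat assignment.
        state = dict(init) if init else {a: random.random() < 0.5 for a in atoms}
        counts = {a: 0 for a in atoms}
        for t in range(burn_in + num_samples):
            for a in atoms:                   # one node at a time
                state[a] = random.random() < cond_prob(a, state)
            if t >= burn_in:
                for a in atoms:
                    counts[a] += state[a]
        return {a: counts[a] / num_samples for a in atoms}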

24
Overview
  • Motivation
  • Background
  • Representation
  • Inference
  • Learning
  • Applications
  • Discussion

25
Learning
  • Data is a relational database
  • Closed world assumption (if not: EM)
  • Learning parameters (weights):
  • Generatively: pseudo-likelihood
  • Discriminatively: voted perceptron + MaxWalkSat
  • Learning structure:
  • Generalization of feature induction in Markov nets
  • Learn and/or modify clauses
  • Inductive logic programming with pseudo-likelihood as the objective function

26
Generative Weight Learning
  • Maximize likelihood (or posterior)
  • Use gradient ascent
  • Requires inference at each step (slow!)

$$\frac{\partial}{\partial w_i} \log P_w(x) = n_i(x) - E_w\big[n_i(x)\big]$$

(n_i(x): feature count according to the data; E_w[n_i(x)]: feature count according to the model)
27
Pseudo-Likelihood [Besag, 1975]
  • Likelihood of each variable given its Markov
    blanket in the data
  • Does not require inference at each step
  • Widely used
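
The slide's formula image is lost; pseudo-likelihood is standardly defined as the product, over all variables, of each variable's conditional probability given its Markov blanket:

$$PL_w(x) = \prod_{l=1}^{n} P_w\big(X_l = x_l \mid MB_x(X_l)\big)$$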

28
Optimization
  • Parameter tying over groundings of same clause
  • Maximize using L-BFGS [Liu & Nocedal, 1989]

$$P_w\big(X_l = x_l \mid MB_x(X_l)\big) = \frac{\exp\big(\sum_i w_i\,\mathrm{nsat}_i(x_l = x_l)\big)}{\sum_{v \in \{0,1\}} \exp\big(\sum_i w_i\,\mathrm{nsat}_i(x_l = v)\big)}$$

where nsat_i(x_l = v) is the number of satisfied groundings of clause i in the training data when x_l takes value v
  • Most terms are not affected by changes in the weights
  • After initial setup, each iteration takes O(# ground predicates × # first-order clauses)
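
For concreteness, a minimal sketch of this step with SciPy's L-BFGS implementation; neg_pll and neg_pll_grad are assumed helpers computing the negated pseudo-log-likelihood and its gradient from the formula above (they are not part of the original slides):

    import numpy as np
    from scipy.optimize import minimize

    def learn_weights(neg_pll, neg_pll_grad, num_clauses):
        # Maximizing pseudo-likelihood = minimizing its negation with L-BFGS.
        w0 = np.zeros(num_clauses)  # one tied weight per first-order clause
        result = minimize(neg_pll, w0, jac=neg_pll_grad, method="L-BFGS-B")
        return result.x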

29
Discriminative Weight Learning
Gradient of the conditional log-likelihood:

$$\frac{\partial}{\partial w_i} \log P_w(y \mid x) = n_i(x, y) - E_w\big[n_i(x, y)\big]$$

(n_i(x, y): number of true groundings of formula i in the DB; E_w[n_i(x, y)]: expected number of true groundings, slow to compute)
Approximate the expected count by the MAP count
30
Voted Perceptron [Collins, 2002]
  • Used for discriminative training of HMMs
  • Expected count in gradient approximated by count in MAP state
  • MAP state found using Viterbi algorithm
  • Weights averaged over all iterations

    initialize w_i = 0
    for t = 1 to T do
        find the MAP configuration using Viterbi
        ∆w_{i,t} ← η × (training count − MAP count)
    end for

31
Voted Perceptron for MLNs [Singla & Domingos, 2004]
  • HMM is a special case of an MLN
  • Expected count in gradient approximated by count in MAP state
  • MAP state found using MaxWalkSat
  • Weights averaged over all iterations

    initialize w_i = 0
    for t = 1 to T do
        find the MAP configuration using MaxWalkSat
        ∆w_{i,t} ← η × (training count − MAP count)
    end for
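
A Python sketch of this training loop; true_counts (clause-grounding counts in the training DB) and map_counts (counts in the MaxWalkSat MAP state under the current weights) are assumed helpers, and eta is a learning rate:

    import numpy as np

    def voted_perceptron_mln(true_counts, map_counts, num_clauses, T=100, eta=1.0):
        # true_counts: np.array of n_i in the training DB (fixed);
        # map_counts(w): np.array of n_i in the MAP state found by MaxWalkSat.
        w = np.zeros(num_clauses)
        w_sum = np.zeros(num_clauses)
        for _ in range(T):
            n_map = map_counts(w)                 # MAP approximation of E_w[n_i]
            w = w + eta * (true_counts - n_map)   # gradient step
            w_sum += w
        return w_sum / T                          # average weights over iterations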

32
Overview
  • Motivation
  • Background
  • Representation
  • Inference
  • Learning
  • Applications
  • Discussion

33
Applications to Date
  • Entity resolution (Cora, BibServ)
  • Information extraction for biology (won the LLL-2005 competition)
  • Probabilistic Cyc
  • Link prediction
  • Topic propagation in scientific communities
  • Etc.

34
Entity Resolution
  • Most logical systems make the unique names assumption
  • What if we don't?
  • Equality predicate: Same(A,B), or A = B
  • Equality axioms:
  • Reflexivity, symmetry, transitivity
  • For every unary predicate P: ∀x1,x2  x1 = x2 ⇒ (P(x1) ⇔ P(x2))
  • For every binary predicate R: ∀x1,x2,y1,y2  x1 = x2 ∧ y1 = y2 ⇒ (R(x1,y1) ⇔ R(x2,y2))
  • Etc.
  • But in Markov logic these are soft and learnable
  • Can also introduce the reverse direction: R(x1,y1) ∧ R(x2,y2) ∧ x1 = x2 ⇒ y1 = y2
  • Surprisingly, this is all that's needed

35
Example Citation Matching
36
Markov Logic Formulation: Predicates
  • Are two bibliography records the same? SameBib(b1,b2)
  • Are two field values the same? SameAuthor(a1,a2), SameTitle(t1,t2), SameVenue(v1,v2)
  • How similar are two field strings? Predicates for ranges of cosine TF-IDF score: TitleTFIDF.0(t1,t2) is true iff TF-IDF(t1,t2) = 0; TitleTFIDF.2(t1,t2) is true iff 0 < TF-IDF(t1,t2) < 0.2; etc.
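
A small Python sketch of how field-string pairs could be mapped to these similarity-bucket predicates (illustrative only: bucket boundaries and names beyond the ones stated above are assumptions):

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def similarity_predicate(s1, s2, field="Title"):
        # Return the name of the evidence predicate that holds for this pair.
        vec = TfidfVectorizer().fit([s1, s2])
        score = cosine_similarity(vec.transform([s1]), vec.transform([s2]))[0, 0]
        if score == 0.0:
            return field + "TFIDF.0"
        for upper, label in [(0.2, "2"), (0.4, "4"), (0.6, "6"), (0.8, "8")]:
            if score < upper:             # assumed: buckets of width 0.2
                return field + "TFIDF." + label
        return field + "TFIDF.10"         # assumed name for the top bucket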

37
Markov Logic Formulation: Formulas
  • Unit clauses (defaults): ¬SameBib(b1,b2)
  • Two fields are the same ⇒ corresponding bib. records are the same: Author(b1,a1) ∧ Author(b2,a2) ∧ SameAuthor(a1,a2) ⇒ SameBib(b1,b2)
  • Two bib. records are the same ⇒ corresponding fields are the same: Author(b1,a1) ∧ Author(b2,a2) ∧ SameBib(b1,b2) ⇒ SameAuthor(a1,a2)
  • High similarity score ⇒ two fields are the same: TitleTFIDF.8(t1,t2) ⇒ SameTitle(t1,t2)
  • Transitive closure (not incorporated in experiments): SameBib(b1,b2) ∧ SameBib(b2,b3) ⇒ SameBib(b1,b3)
  • 25 predicates, 46 first-order clauses

38
What Does This Buy You?
  • Objects are matched collectively
  • Multiple types matched simultaneously
  • Constraints are soft, and strengths can be
    learned from data
  • Easy to add further knowledge
  • Constraints can be refined from data
  • Standard approach still embedded

39
Example
Record  Title                              Author           Venue
B1      Object Identification using CRFs   Linda Stewart    PKDD 04
B2      Object Identification using CRFs   Linda Stewart    8th PKDD
B3      Learning Boolean Formulas           Bill Johnson     PKDD 04
B4      Learning of Boolean Formulas        William Johnson  8th PKDD
Subset of a Bibliography Database
40
Standard Approach [Fellegi & Sunter, 1969]
[Figure: each record-match node (b1=b2?, b3=b4?) connects only to its own field-similarity evidence nodes, e.g. Sim(Object Identification using CRFs, Object Identification using CRFs), Sim(Linda Stewart, Linda Stewart), Sim(PKDD 04, 8th PKDD) for b1=b2, and the corresponding Title/Author/Venue similarity nodes for b3=b4. Legend: record-match node; field-similarity node (evidence node).]
41
What's Missing?
[Figure: same network as before, with the two identical venue-similarity nodes Sim(PKDD 04, 8th PKDD) highlighted.]
If from b1 = b2 you infer that "PKDD 04" is the same as "8th PKDD", how can you use that to help figure out whether b3 = b4?
42
Merging the Evidence Nodes
[Figure: the two venue-similarity nodes are merged into a single Sim(PKDD 04, 8th PKDD) evidence node shared by b1=b2? and b3=b4?.]
Still does not solve the problem. Why?
43
Introducing Field-Match Nodes
[Figure: field-match nodes (b1.T=b2.T?, b1.A=b2.A?, b1.V=b2.V?, b3.T=b4.T?, b3.A=b4.A?, b3.V=b4.V?) now sit between the similarity evidence nodes and the record-match nodes; the venue field-match nodes share the single Sim(PKDD 04, 8th PKDD) evidence node.]
Full representation in the Collective Model
44
Flow of Information
[Figures (slides 44-48 animate the same network): evidence that the titles and authors of b1 and b2 match drives b1=b2 toward true; b1=b2 in turn drives the venue field-match node b1.V=b2.V toward true; because b3.V=b4.V involves the same venue pair (PKDD 04, 8th PKDD), the inference carries over to it, raising the probability of b3=b4.]
49
Experiments
  • Databases:
  • Cora [McCallum et al., IRJ, 2000]: 1295 records, 132 papers
  • BibServ.org [Richardson & Domingos, ISWC-03]: 21,805 records, unknown # of papers
  • Goal: de-duplicate bib. records, authors, and venues
  • Pre-processing: form canopies [McCallum et al., KDD-00]
  • Compared with naïve Bayes (standard method), etc.
  • Measured area under precision-recall curve (AUC)
  • Our approach wins across the board

50
Results: Matching Venues on Cora
[Figure: results chart for venue matching on Cora]
51
Overview
  • Motivation
  • Background
  • Representation
  • Inference
  • Learning
  • Applications
  • Discussion

52
Relation to Other Approaches
Representation   Logical language      Probabilistic language
Markov logic     First-order logic     Markov nets
RMNs             Conjunctive queries   Markov nets
PRMs             Frame systems         Bayes nets
KBMC             Horn clauses          Bayes nets
SLPs             Horn clauses          Bayes nets
53
Going Further
  • First-order logic is not enough
  • We can "Markovize" other representations in the same way
  • Lots to do!

54
Summary
  • NLP involves relational structure and uncertainty
  • Markov logic combines first-order logic and probabilistic graphical models
  • Syntax: First-order logic + Weights
  • Semantics: Templates for Markov networks
  • Inference: MaxWalkSat + KBMC + MCMC
  • Learning: Voted perceptron + PL + ILP
  • Applications to date: entity resolution, IE, etc.
  • Software: Alchemy, http://www.cs.washington.edu/ai/alchemy