Title: Natural Language Semantics using Probabilistic Logic

1. Natural Language Semantics using Probabilistic Logic
- Islam Beltagy
- Doctoral Dissertation Proposal
- Supervising Professors: Raymond J. Mooney, Katrin Erk
2.
- Q: Who is the second president of the US? A: John Adams
- Q: Who is the president that came after the first US president? A: ?
- Semantic representation: how the meaning of natural text is represented
- Inference: how to draw conclusions from that semantic representation
3. Objective
- Find a semantic representation that is:
  - Expressive
  - Supports automated inference
- Why? To support more NLP applications more effectively:
  - Question Answering, Automated Grading, Machine Translation, Summarization
4. Outline
- Introduction
  - Semantic representations
  - Probabilistic logic
  - Evaluation tasks
- Completed research
  - Parsing and task representation
  - Knowledge base construction
  - Inference
  - Evaluation
- Future work
5. Outline
- Introduction
  - Semantic representations
  - Probabilistic logic
  - Evaluation tasks
- Completed research
  - Parsing and task representation
  - Knowledge base construction
  - Inference
  - Evaluation
- Future work
6. Semantic Representations: Formal Semantics
- Mapping natural language to some formal language (e.g., first-order logic) [Montague, 1970]
  - "John is driving a car"
  - ∃x,y,z. john(x) ∧ agent(y, x) ∧ drive(y) ∧ patient(y, z) ∧ car(z)
- Pros
  - Deep representation: relations, negations, disjunctions, quantifiers, ...
  - Supports automated inference
- Cons
  - Unable to handle uncertain knowledge. Why does this matter? Consider graded word pairs like (pickle, cucumber) or (cut, slice)
7. Semantic Representations: Distributional Semantics
- Similar words and phrases occur in similar contexts
- Use context to represent meaning
- Meanings are vectors in high-dimensional spaces
- Words and phrases have a similarity measure
  - e.g., similarity(water, bathtub) = cosine(water, bathtub)
- Pros: robust probabilistic model that captures a graded notion of similarity
- Cons: shallow representation of the semantics
8. Proposed Semantic Representation
- Proposed semantic representation: probabilistic logic
- Combines the advantages of:
  - Formal semantics (expressivity, automated inference)
  - Distributional semantics (gradedness)
9. Outline
- Introduction
  - Semantic representations
  - Probabilistic logic
  - Evaluation tasks
- Completed research
  - Parsing and task representation
  - Knowledge base construction
  - Inference
  - Evaluation
- Future work
10. Probabilistic Logic
- Statistical Relational Learning [Getoor and Taskar, 2007]
  - Combines logical and statistical knowledge
  - Provides a mechanism for probabilistic inference
- Uses weighted first-order logic rules
  - Weighted rules are soft rules (compared to hard logical constraints)
- Compactly encodes complex probabilistic graphical models
- Inference: P(Q | E, KB)
- Markov Logic Networks (MLN) [Richardson and Domingos, 2006]
- Probabilistic Soft Logic (PSL) [Kimmig et al., NIPS 2012]
11. Markov Logic Networks [Richardson and Domingos, 2006]
- ∀x. smoke(x) → cancer(x) | w = 1.5
- ∀x,y. friend(x,y) → (smoke(x) ↔ smoke(y)) | w = 1.1
- Two constants: Anna (A) and Bob (B)
- P(Cancer(Anna) | Friends(Anna,Bob), Smokes(Bob))
- Figure: ground Markov network over the atoms Friends(A,A), Friends(A,B), Friends(B,A), Friends(B,B), Smokes(A), Smokes(B), Cancer(A), Cancer(B)
12. Markov Logic Networks [Richardson and Domingos, 2006]
- Probability mass function (PMF):
  P(X = x) = (1/Z) exp( Σ_i w_i n_i(x) )
  where x is a possible truth assignment of the set of all ground atoms, n_i(x) is the number of true groundings of formula i in x, w_i is the weight of formula i, and Z is the normalization constant
- Inference: calculate the probability of query atoms given the evidence set (a brute-force sketch follows below)
  - P(Cancer(Anna) | Friends(Anna,Bob), Smokes(Bob))
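To make the PMF concrete, here is a minimal brute-force sketch, not Alchemy's algorithm: it grounds the two weighted rules from the previous slide over the constants A and B and computes P(cancer(A) | friends(A,B), smokes(B)) by enumerating all 256 truth assignments.

```python
import itertools
import math

CONSTANTS = ["A", "B"]
ATOMS = ([f"smokes({x})" for x in CONSTANTS]
         + [f"cancer({x})" for x in CONSTANTS]
         + [f"friends({x},{y})" for x in CONSTANTS for y in CONSTANTS])

def world_weight(world):
    """exp(sum_i w_i * n_i(x)): sum weights over true rule groundings."""
    total = 0.0
    for x in CONSTANTS:  # w = 1.5: smokes(x) -> cancer(x)
        if (not world[f"smokes({x})"]) or world[f"cancer({x})"]:
            total += 1.5
    for x in CONSTANTS:  # w = 1.1: friends(x,y) -> (smokes(x) <-> smokes(y))
        for y in CONSTANTS:
            if (not world[f"friends({x},{y})"]) or \
               (world[f"smokes({x})"] == world[f"smokes({y})"]):
                total += 1.1
    return math.exp(total)

evidence = {"friends(A,B)": True, "smokes(B)": True}
numer = denom = 0.0
for values in itertools.product([False, True], repeat=len(ATOMS)):
    world = dict(zip(ATOMS, values))
    if any(world[a] != v for a, v in evidence.items()):
        continue  # keep only worlds consistent with the evidence
    w = world_weight(world)
    denom += w
    if world["cancer(A)"]:
        numer += w

print(f"P(cancer(A) | evidence) = {numer / denom:.3f}")
```

Real MLN inference avoids this exponential enumeration; the point here is only the semantics of the PMF.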
13. PSL: Probabilistic Soft Logic [Kimmig et al., NIPS 2012]
- Probabilistic logic framework designed with efficient inference in mind
- Atoms have continuous truth values in the interval [0,1] (vs. Boolean atoms in MLN)
- Łukasiewicz relaxation of AND, OR, NOT (a direct transcription follows below):
  - I(l1 ∧ l2) = max{0, I(l1) + I(l2) − 1}
  - I(l1 ∨ l2) = min{1, I(l1) + I(l2)}
  - I(¬l1) = 1 − I(l1)
- Inference is a linear program (vs. a combinatorial counting problem in MLN)
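A direct transcription of the three relaxed connectives; on Boolean inputs (0 or 1) they reduce to classical AND, OR, NOT.

```python
def luk_and(a: float, b: float) -> float:
    """Lukasiewicz t-norm: I(l1 AND l2) = max{0, a + b - 1}."""
    return max(0.0, a + b - 1.0)

def luk_or(a: float, b: float) -> float:
    """Lukasiewicz t-conorm: I(l1 OR l2) = min{1, a + b}."""
    return min(1.0, a + b)

def luk_not(a: float) -> float:
    """I(NOT l1) = 1 - a."""
    return 1.0 - a

print(luk_and(0.75, 0.5), luk_or(0.75, 0.5), luk_not(0.75))  # 0.25 1.0 0.25
```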
14. PSL: Probabilistic Soft Logic [Kimmig et al., NIPS 2012]
- Probability density function (PDF):
  p(I) = (1/Z) exp( − Σ_{r ∈ R} w_r d_r(I) )
  where I is a possible continuous truth assignment, d_r(I) is the distance to satisfaction of rule r (for a rule body → head, d_r(I) = max{0, I(body) − I(head)}), w_r is the weight of rule r, Z is the normalization constant, and the sum ranges over all rules R
- Inference: Most Probable Explanation (MPE)
  - Solved as a linear program (a toy sketch follows below)
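MPE inference minimizes the weighted sum of distances to satisfaction, which is a linear program. A minimal sketch using scipy, not the PSL implementation: two hypothetical rules a → b (w = 2, evidence I(a) = 0.9) and b → c (w = 1, evidence I(c) = 0.3) pull I(b) in opposite directions, and the heavier rule wins.

```python
from scipy.optimize import linprog

# Variables: [I(b), d1, d2]; evidence I(a) = 0.9 and I(c) = 0.3 are fixed.
# d1 >= I(a) - I(b)   (distance to satisfaction of a -> b, weight 2)
# d2 >= I(b) - I(c)   (distance to satisfaction of b -> c, weight 1)
c = [0.0, 2.0, 1.0]                 # minimize 2*d1 + 1*d2
A_ub = [[-1.0, -1.0, 0.0],          # -I(b) - d1 <= -0.9
        [1.0, 0.0, -1.0]]           #  I(b) - d2 <=  0.3
b_ub = [-0.9, 0.3]
bounds = [(0.0, 1.0), (0.0, None), (0.0, None)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
print(f"I(b) = {res.x[0]:.2f}, objective = {res.fun:.2f}")  # I(b)=0.90, obj=0.60
```

Setting I(b) = 0.9 satisfies the heavier rule exactly (d1 = 0) at the cost of d2 = 0.6; any lower value of I(b) trades one unit of d2 for two units of weighted d1 cost.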
15. Outline
- Introduction
  - Semantic representations
  - Probabilistic logic
  - Evaluation tasks
- Completed research
  - Parsing and task representation
  - Knowledge base construction
  - Inference
  - Evaluation
- Future work
16. Evaluation Tasks
- Two tasks that require deep semantic understanding to do well on
- 1) Recognizing Textual Entailment (RTE) [Dagan et al., 2013]
  - Given two sentences T and H, determine whether T Entails, Contradicts, or is unrelated to (Neutral) H
  - Entailment: T: A man is walking through the woods. H: A man is walking through a wooded area.
  - Contradiction: T: A man is jumping into an empty pool. H: A man is jumping into a full pool.
  - Neutral: T: A young girl is dancing. H: A young girl is standing on one leg.
17. Evaluation Tasks
- Two tasks that require deep semantic understanding to do well on
- 2) Semantic Textual Similarity (STS) [Agirre et al., 2012]
  - Given two sentences S1 and S2, judge their semantic similarity on a scale from 0 to 5
  - S1: A man is playing a guitar. S2: A woman is playing the guitar. (score: 2.75)
  - S1: A car is parking. S2: A cat is playing. (score: 0.00)
18. Outline
- Introduction
  - Semantic representations
  - Probabilistic logic
  - Evaluation tasks
- Completed research
  - Parsing and task representation
  - Knowledge base construction
  - Inference
  - Evaluation
- Future work
19. System Architecture [Beltagy et al., *SEM 2013]
- Pipeline (figure): the input pair T/S1 and H/S2 is parsed into logical forms LF1 and LF2; knowledge base construction produces the KB; the task representation (RTE/STS) builds the inference problem; inference computes P(Q | E, KB) using MLN or PSL to produce the result (RTE/STS)
- One advantage of using logic: modularity
20. Outline
- Introduction
  - Semantic representations
  - Probabilistic logic
  - Evaluation tasks
- Completed research
  - Parsing and task representation
  - Knowledge base construction
  - Inference
  - Evaluation
- Future work
21. Parsing
- Maps input sentences to logical form
- Uses Boxer, a rule-based system on top of a CCG parser [Bos, 2008]
  - "John is driving a car"
  - ∃x,y,z. john(x) ∧ agent(y, x) ∧ drive(y) ∧ patient(y, z) ∧ car(z)
22. Task Representation [Beltagy et al., SemEval 2014]
- Represent all tasks as inferences of the form P(Q | E, KB)
- RTE
  - Two inferences: P(H | T, KB) and P(H | ¬T, KB)
  - Use a classifier to map the two probabilities to an RTE class (see the sketch below)
- STS
  - Two inferences: P(S1 | S2, KB) and P(S2 | S1, KB)
  - Use regression to map the probabilities to an overall similarity score
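A toy sketch of the RTE mapping step using scikit-learn. The two features per pair are the inference probabilities above; the training rows here are hypothetical placeholder values, not SICK data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row: [P(H | T, KB), P(H | not T, KB)] — hypothetical values.
X_train = np.array([[0.95, 0.60],   # entailment: H likely given T
                    [0.05, 0.90],   # contradiction: H likely only when T is false
                    [0.40, 0.45]])  # neutral: T barely changes H
y_train = ["entailment", "contradiction", "neutral"]

clf = LogisticRegression().fit(X_train, y_train)
print(clf.predict([[0.90, 0.55]]))  # -> likely 'entailment'
```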
23. Domain Closure Assumption (DCA)
- There are no objects in the universe other than the named constants
- Constants need to be explicitly added
- Universal quantifiers do not behave as expected because of the finite domain
  - e.g., "Tweety is a bird and it flies" incorrectly entails "All birds fly"
- Handling quantifiers in P(Q | E, KB):

  | Quantifier | In E | In Q |
  |---|---|---|
  | ∃ | Skolemization | none |
  | ∀ | "All birds fly" ⇏ "Some birds fly" (fix: Existence) | "Tweety is a bird. It flies" ⇒ "All birds fly" (fix: add a constant) |
  | ¬ | none | future work |
24. DCA: Existentials in E
- Handling ∃ in E: Skolemization
  - E: ∃x,y. john(x) ∧ agent(y, x) ∧ eat(y)
  - Skolemized E: john(J) ∧ agent(T, J) ∧ eat(T)
- Embedded existentials
  - E: ∀x. bird(x) → ∃y. agent(y, x) ∧ fly(y)
  - Skolemized E: ∀x. bird(x) → agent(f(x), x) ∧ fly(f(x))
- Simulate skolem functions (a small sketch follows below)
  - ∀x. bird(x) → ∃y. skolem_f(x, y) ∧ agent(y, x) ∧ fly(y)
  - Evidence: skolem_f(B1, C1), skolem_f(B2, C2)
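A small sketch of top-level Skolemization: each existentially quantified variable is replaced by a fresh constant. The names C0, C1, ... and the (predicate, args) encoding are arbitrary choices for illustration; Boxer's actual output format differs.

```python
import itertools

_fresh = itertools.count()

def skolemize(exist_vars, literals):
    """Replace top-level existential variables with fresh constants.
    literals: list of (predicate, args) tuples."""
    mapping = {v: f"C{next(_fresh)}" for v in exist_vars}
    return [(pred, tuple(mapping.get(a, a) for a in args))
            for pred, args in literals]

# E: exists x,y. john(x) & agent(y, x) & eat(y)
print(skolemize(["x", "y"],
                [("john", ("x",)), ("agent", ("y", "x")), ("eat", ("y",))]))
# [('john', ('C0',)), ('agent', ('C1', 'C0')), ('eat', ('C1',))]
```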
25. DCA: Universals in E
- E: ∀x. bird(x) → ∃y. agent(y, x) ∧ fly(y)
- Q: ∃x,y. bird(x) ∧ agent(y, x) ∧ fly(y) (comes out false: no bird constants exist)
- Solution: introduce additional evidence bird(B)
  - Pragmatically, birds exist (Existence)
- Negated existential
  - E: ¬∃x,y. bird(x) ∧ agent(y, x) ∧ fly(y)
  - No additional constants needed
26. DCA: Universals in Q
- E: bird(B) ∧ agent(F, B) ∧ fly(F)
- Q: ∀x. bird(x) → ∃y. agent(y, x) ∧ fly(y) (comes out true)
- Universal quantifiers range only over the constants of the given finite domain
- Solution: add an extra bird constant to the domain
- If the new bird can be shown to fly, then there is an explicit universal quantification in E
27. Outline
- Introduction
  - Semantic representations
  - Probabilistic logic
  - Evaluation tasks
- Completed research
  - Parsing and task representation
  - Knowledge base construction
  - Inference
  - Evaluation
- Future work
28. Knowledge Base Construction
- Represent background knowledge as weighted inference rules
- 1) WordNet rules (a sketch of reading such rules off WordNet follows below)
  - WordNet: a lexical database of words and their semantic relations
  - Synonyms: ∀x. man(x) ↔ guy(x) | w = ∞
  - Hyponyms: ∀x. car(x) → vehicle(x) | w = ∞
  - Antonyms: ∀x. tall(x) → ¬short(x) | w = ∞
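A sketch of how such rules could be read off WordNet with NLTK (requires nltk.download('wordnet')). Taking only the first synset is a simplification for illustration; the proposal's future work covers proper sense disambiguation.

```python
from nltk.corpus import wordnet as wn  # requires: nltk.download('wordnet')

def hypernym_rules(word):
    """Emit hard hyponym rules word(x) -> hypernym(x) for the first noun synset."""
    rules = []
    for syn in wn.synsets(word, pos=wn.NOUN)[:1]:  # naive: first sense only
        for hyper in syn.hypernyms():
            head = hyper.lemma_names()[0]
            rules.append(f"forall x. {word}(x) -> {head}(x) | w = inf")
    return rules

print(hypernym_rules("car"))
# e.g. ['forall x. car(x) -> motor_vehicle(x) | w = inf']
```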
29. Knowledge Base Construction
- Represent background knowledge as weighted inference rules
- 2) Distributional rules (on-the-fly rules)
  - For all pairs of words (a, b) where a ∈ T/S1 and b ∈ H/S2, generate the rule:
    ∀x. a(x) → b(x) | f(w)
  - w = cosine(a, b)
  - f(w) = log(w / (1 − w)) (sketched below)
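The mapping f is the log-odds (logit) function: cosine similarities near 1 become strongly positive rule weights, and similarities below 0.5 become negative. A minimal sketch; the eps-clamping is our addition to avoid infinities at 0 and 1, not something stated on the slide.

```python
import math

def rule_weight(cos_sim: float, eps: float = 1e-6) -> float:
    """f(w) = log(w / (1 - w)); eps-clamping added to keep the value finite."""
    w = min(max(cos_sim, eps), 1.0 - eps)
    return math.log(w / (1.0 - w))

for sim in (0.9, 0.5, 0.2):
    print(f"cosine {sim} -> weight {rule_weight(sim):+.2f}")
# cosine 0.9 -> +2.20, cosine 0.5 -> +0.00, cosine 0.2 -> -1.39
```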
30. Outline
- Introduction
- Completed research
  - Parsing and task representation
  - Knowledge base construction
  - Inference
    - RTE using MLNs
    - STS using MLNs
    - STS using PSL
  - Evaluation
- Future work
31. Inference
- Inference problem: P(Q | E, KB)
- Solve it using MLN and PSL for RTE and STS:
  - RTE using MLNs
  - STS using MLNs
  - STS using PSL
32. Outline
- Introduction
- Completed research
  - Parsing and task representation
  - Knowledge base construction
  - Inference
    - RTE using MLNs
    - STS using MLNs
    - STS using PSL
  - Evaluation
- Future work
33. MLNs for RTE: Query Formula (QF) [Beltagy and Mooney, StarAI 2014]
- Alchemy (an MLN implementation) calculates only probabilities of ground atoms
- Our inference algorithm supports query formulas
  - P(Q | R) = Z(Q ∪ R) / Z(R) [Gogate and Domingos, 2011]
  - Z: normalization constant of the probability distribution
- Estimate the partition function Z using SampleSearch [Gogate and Dechter, 2011]
  - SampleSearch estimates the partition function Z of mixed graphical models (probabilistic and deterministic)
34. MLNs for RTE: Modified Closed-World (MCW) [Beltagy and Mooney, StarAI 2014]
- MLN grounding generates very large graphical models
- Q has O(c^v) ground clauses
  - v: number of variables in Q
  - c: number of constants in the domain
35. MLNs for RTE: Modified Closed-World (MCW) [Beltagy and Mooney, StarAI 2014]
- Low priors: by default, ground atoms have very low probabilities, unless shown otherwise through inference
- Example
  - E: man(M) ∧ agent(D, M) ∧ drive(D)
  - Priors: ∀x. man(x) | w = −2, ∀x. guy(x) | w = −2, ∀x. drive(x) | w = −2
  - KB: ∀x. man(x) → guy(x) | w = 1.8
  - Q: ∃x,y. guy(x) ∧ agent(y, x) ∧ drive(y)
  - Ground atoms: man(M), man(D), guy(M), guy(D), drive(M), drive(D)
- Solution: a modified closed-world assumption to eliminate unimportant ground atoms
  - Atoms not reachable from the evidence are dropped (evidence propagation; see the toy sketch below)
  - A strict version of the low priors
  - Dramatically reduces the size of the problem
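A toy propositional sketch of evidence propagation for unary predicates: starting from the evidence atoms, repeatedly fire KB rules and keep only the ground atoms reached; everything else is closed-world and can be pruned. This illustrates the idea, not the actual MCW algorithm.

```python
def propagate(evidence, rules):
    """evidence: set of (pred, const); rules: list of (body_pred, head_pred).
    Returns all ground atoms reachable from the evidence."""
    active = set(evidence)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            for pred, const in list(active):
                if pred == body and (head, const) not in active:
                    active.add((head, const))
                    changed = True
    return active

# E: man(M), drive(D); KB: man(x) -> guy(x)
print(propagate({("man", "M"), ("drive", "D")}, [("man", "guy")]))
# {('man', 'M'), ('drive', 'D'), ('guy', 'M')}: guy(D), man(D), drive(M) are pruned
```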
36. Outline
- Introduction
- Completed research
  - Parsing and task representation
  - Knowledge base construction
  - Inference
    - RTE using MLNs
    - STS using MLNs
    - STS using PSL
  - Evaluation
- Future work
37. MLNs for STS [Beltagy et al., *SEM 2013]
- Strict conjunction in Q does not fit STS
  - E: "A man is driving": ∃x,y. man(x) ∧ drive(y) ∧ agent(y, x)
  - Q: "A man is driving a bus": ∃x,y,z. man(x) ∧ drive(y) ∧ agent(y, x) ∧ bus(z) ∧ patient(y, z)
- Break Q into mini-clauses, then combine their evidence using an averaging combiner [Natarajan et al., 2010]
  - ∀x,y,z. man(x) ∧ agent(y, x) → result(x,y,z) | w
  - ∀x,y,z. drive(y) ∧ agent(y, x) → result(x,y,z) | w
  - ∀x,y,z. drive(y) ∧ patient(y, z) → result(x,y,z) | w
  - ∀x,y,z. bus(z) ∧ patient(y, z) → result(x,y,z) | w
38. Outline
- Introduction
- Completed research
  - Parsing and task representation
  - Knowledge base construction
  - Inference
    - RTE using MLNs
    - STS using MLNs
    - STS using PSL
  - Evaluation
- Future work
39. PSL for STS [Beltagy, Erk, and Mooney, ACL 2014]
- As in MLN, strict conjunction in PSL does not fit STS
- Replace conjunctions in Q with an average (illustrated below):
  - I(l1 ∧ ... ∧ ln) = avg(I(l1), ..., I(ln))
- Inference
  - The average is a linear function
  - No changes to the optimization problem
  - Heuristic grounding (details omitted)
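The contrast is easy to see numerically: the Łukasiewicz conjunction of several literals saturates at 0 as soon as a few truth values dip, while the average degrades gracefully, which is what a similarity score needs. A one-line sketch:

```python
def avg_conj(values):
    """I(l1 & ... & ln) replaced by the average, as in the STS formulation."""
    return sum(values) / len(values)

vals = [1.0, 1.0, 0.0, 1.0]                      # one unsupported literal
luk = max(0.0, sum(vals) - (len(vals) - 1))      # n-ary Lukasiewicz AND
print(luk, avg_conj(vals))                       # 0.0 vs 0.75
```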
40. Outline
- Introduction
  - Semantic representations
  - Probabilistic logic
  - Evaluation tasks
- Completed research
  - Parsing and task representation
  - Knowledge base construction
  - Inference
  - Evaluation
    - Knowledge Base
    - Inference
- Future work
41. Evaluation: Datasets
- SICK (RTE and STS) [SemEval 2014]
  - Sentences Involving Compositional Knowledge
  - 10,000 pairs of sentences
- msr-vid (STS) [SemEval 2012]
  - Microsoft video description corpus
  - 1,500 pairs of short video descriptions
- msr-par (STS) [SemEval 2012]
  - Microsoft paraphrase corpus
  - 1,500 pairs of long news sentences
42. Outline
- Introduction
  - Semantic representations
  - Probabilistic logic
  - Evaluation tasks
- Completed research
  - Parsing and task representation
  - Knowledge base construction
  - Inference
  - Evaluation
    - Knowledge Base
    - Inference
- Future work
43. Evaluation: Knowledge Base
- logic+kb is better than logic alone and better than dist
- PSL does much better than MLN on the STS task
44. Evaluation: Error Analysis of RTE
- Our system's accuracy: 77.72%
- The remaining 22.28% break down as:
  - Entailment pairs classified as Neutral: 15.32%
  - Contradiction pairs classified as Neutral: 6.12%
  - Other: 0.84%
- System precision: 98.9%, recall: 78.56%
  - High precision and low recall is the typical behavior of logic-based systems
- Fixes (future work)
  - Larger knowledge base
  - Fix some limitations in the detection of contradictions
45. Outline
- Introduction
  - Semantic representations
  - Probabilistic logic
  - Evaluation tasks
- Completed research
  - Parsing and task representation
  - Knowledge base construction
  - Inference
  - Evaluation
    - Knowledge Base
    - Inference
- Future work
46. Evaluation: Inference (RTE) [Beltagy and Mooney, StarAI 2014]
- Dataset: SICK (from SemEval 2014)
- Systems compared:
  - mln: Alchemy out of the box
  - mln+qf: our algorithm for calculating the probability of a query formula
  - mln+mcw: mln with our modified closed-world assumption
  - mln+qf+mcw: both components

  | System | Accuracy (%) | CPU Time | Timeouts (%, 30 min limit) |
  |---|---|---|---|
  | mln | 57 | 2 min 27 sec | 96 |
  | mln+qf | 69 | 1 min 51 sec | 30 |
  | mln+mcw | 66 | 10 sec | 2.5 |
  | mln+qf+mcw | 72 | 7 sec | 2.1 |
47. Evaluation: Inference (STS) [Beltagy, Erk, and Mooney, ACL 2014]
- Compare MLN with PSL on the STS task

  | Dataset | PSL time | MLN time | MLN timeouts (%, 10 min limit) |
  |---|---|---|---|
  | msr-vid | 8s | 1m 31s | 9 |
  | msr-par | 30s | 11m 49s | 97 |
  | SICK | 10s | 4m 24s | 36 |

- MCW is applied to MLN for a fairer comparison, because PSL already has lazy grounding
48. Outline
- Introduction
  - Semantic representations
  - Probabilistic logic
  - Evaluation tasks
- Completed research
  - Parsing and task representation
  - Knowledge base construction
  - Inference
  - Evaluation
- Future work
  - Short Term
  - Long Term
49. Future Work: RTE Task Formulation
- 1) Better detection of contradictions
- Example where the current P(H | ¬T) fails:
  - T: No man is playing a flute
  - H: A man is playing a large flute
- Detection of contradiction:
  - T ∧ H ⇒ false (from the logic point of view)
  - ⇔ T ⇒ ¬H; probabilistically: P(¬H | T) = 1 − P(H | T) (useless)
  - ⇔ H ⇒ ¬T; probabilistically: P(¬T | H)
50. Future Work: RTE Task Formulation
- 2) Using ratios
  - P(H | T) / P(H)
  - P(¬T | H) / P(¬T)
51. Future Work: DCA (Negation in Q)
- Q: ¬∃x,y. bird(x) ∧ agent(y, x) ∧ fly(y)
- Because of the closed world, Q comes out true regardless of E
- We need Q to be true only when it is explicitly stated in E
- Solution:
  - Add the negation of Q to the MLN with a high weight (not infinity)
  - R: bird(B) ∧ agent(F, B) ∧ fly(F) | w = 5
  - P(Q | R) ≈ 0
  - P(Q | E, R) ≈ 0 unless E negates R
52. Future Work: Knowledge Base Construction
- 1) Precompiled rules from paraphrase collections like PPDB [Ganitkevitch et al., NAACL 2013]
  - e.g., "solves" → "finds a solution to"
    ∀e,x. solve(e) ∧ patient(e,x) → ∃s. find(e) ∧ patient(e,s) ∧ solution(s) ∧ to(s,x) | w
  - Variable binding is not trivial
  - Templates
  - Difference between the logical expressions of the sentence with and without the rule applied
53. Future Work: Knowledge Base Construction
- 2) Phrasal distributional rules
- Use linguistically motivated templates like:
  - Noun phrase → noun phrase
    ∀x. little(x) ∧ kid(x) → smart(x) ∧ boy(x) | w
  - Subject-verb-object → subject-verb-object
    ∀x,y,z. man(x) ∧ agent(y,x) ∧ drive(y) ∧ patient(y,z) ∧ car(z) → guy(x) ∧ agent(y,x) ∧ ride(y) ∧ patient(y,z) ∧ bike(z) | w
54. Future Work: Inference
- 1) Better MLN inference with query formulas
- Currently, we estimate Z(Q ∪ R) and Z(R) in two separate runs
- Combine both runs into one that exploits:
  - The similarities between Q ∪ R and R
  - The fact that we only need the ratio, not the absolute values
55. Future Work: Inference
- 2) Generalized modified closed-world assumption
- It is not clear how to propagate evidence through rules like:
  - ∀x. dead(x) → ¬live(x)
- MCW needs to be generalized to arbitrary MLNs
- Find and eliminate ground atoms whose marginal probability ≈ prior probability
56. Future Work: Weight Learning
- One of the following:
  - Weight learning for inference rules
    - Learn a better mapping from the weights we have on resources to MLN weights
    - Learn how to weight rules from different resources differently
  - Weight learning for the STS task
    - Weight different parts of the sentence differently
    - e.g., "black dog" is more similar to "white dog" than to "black cat"
57. Outline
- Introduction
- Completed research
  - Parsing and task representation
  - Knowledge base construction
  - Inference
  - Evaluation
- Future work
  - Short Term
  - Long Term
58. Long-term Future Work
- 1) Question Answering
  - Our semantic representation is general and expressive, so apply it to more tasks
  - Given a query, find an answer for it in a large corpus of unstructured text
  - Inference finds the best filling for the existentially quantified query
  - Efficient inference is the bottleneck
- 2) Generalized Quantifiers
  - Few, most, and many are not natively supported in first-order logic
  - Add support for them by:
    - Checking monotonicity
    - Representing "few" and "most" as weighted universally quantified rules
59. Long-term Future Work
- 3) Contextualize WordNet Rules
  - Use word sense disambiguation, then generate weighted inference rules from WordNet
- 4) Other Languages
  - Theoretically, this is a language-independent semantic representation
  - Practically, resources are not available, especially CCGBanks to train parsers and Boxer
- 5) Inference Inspector
  - Visualize the inference process and highlight the most influential rules
  - Not trivial in MLN because all rules affect the final result to some extent
60. Conclusion
- Probabilistic logic for semantic representation
  - Expressivity, automated inference, and gradedness
- Evaluation on RTE and STS
  - Formulating the tasks as probabilistic logic inferences
  - Building a knowledge base
  - Performing inference efficiently based on the task
- In the short term, we will enhance the formulation of the RTE task, build a bigger knowledge base from more resources, generalize the modified closed-world assumption, enhance our MLN inference algorithm, and use weight learning
- In the long term, we will apply our semantic representation to the question answering task, support generalized quantifiers, contextualize WordNet rules, apply our semantic representation to other languages, and implement a probabilistic logic inference inspector
61. Thank You