Title: Natural Language Semantics using Probabilistic Logic

1. Natural Language Semantics using Probabilistic Logic
- Islam Beltagy
- Doctoral Dissertation Proposal
- Supervising Professors: Raymond J. Mooney, Katrin Erk
2.
- Q: Who is the second president of the US? A: John Adams
- Q: Who is the president that came after the first US president? A: ?
- Semantic representation: how the meaning of natural text is represented
- Inference: how to draw conclusions from that semantic representation
3. Objective
- Find a semantic representation that is:
  - Expressive
  - Supports automated inference
- Why? To support more NLP applications more effectively:
  - Question Answering, Automated Grading, Machine Translation, Summarization
4. Outline
- Introduction
  - Semantic representations
  - Probabilistic logic
  - Evaluation tasks
- Completed research
  - Parsing and task representation
  - Knowledge base construction
  - Inference
  - Evaluation
- Future work
5. Outline
- Introduction
  - Semantic representations
  - Probabilistic logic
  - Evaluation tasks
- Completed research
  - Parsing and task representation
  - Knowledge base construction
  - Inference
  - Evaluation
- Future work
6. Semantic Representations: Formal Semantics
- Mapping natural language to some formal language (e.g., first-order logic) [Montague, 1970]
  - "John is driving a car"
  - ∃x,y,z. john(x) ∧ agent(y, x) ∧ drive(y) ∧ patient(y, z) ∧ car(z)
- Pros
  - Deep representation: relations, negations, disjunctions, quantifiers, ...
  - Supports automated inference
- Cons
  - Unable to handle uncertain knowledge. Why does this matter? Consider graded word pairs like (pickle, cucumber) or (cut, slice)
7. Semantic Representations: Distributional Semantics
- Similar words and phrases occur in similar contexts
- Use context to represent meaning
- Meanings are vectors in high-dimensional spaces
- Words and phrases have a similarity measure
  - e.g., similarity(water, bathtub) = cosine(water, bathtub)
- Pros: robust probabilistic model that captures a graded notion of similarity
- Cons: shallow representation of the semantics
8. Proposed Semantic Representation
- Proposed semantic representation: probabilistic logic
- Combines the advantages of:
  - Formal semantics (expressivity, automated inference)
  - Distributional semantics (gradedness)
9. Outline
- Introduction
  - Semantic representations
  - Probabilistic logic
  - Evaluation tasks
- Completed research
  - Parsing and task representation
  - Knowledge base construction
  - Inference
  - Evaluation
- Future work
10. Probabilistic Logic
- Statistical Relational Learning [Getoor and Taskar, 2007]
  - Combines logical and statistical knowledge
  - Provides a mechanism for probabilistic inference
- Uses weighted first-order logic rules
  - Weighted rules are soft rules (compared to hard logical constraints)
- Compactly encodes complex probabilistic graphical models
- Inference: P(Q | E, KB)
- Markov Logic Networks (MLN) [Richardson and Domingos, 2006]
- Probabilistic Soft Logic (PSL) [Kimmig et al., NIPS 2012]
11. Markov Logic Networks [Richardson and Domingos, 2006]
- ∀x. smoke(x) → cancer(x) | w = 1.5
- ∀x,y. friend(x,y) → (smoke(x) ↔ smoke(y)) | w = 1.1
- Two constants: Anna (A) and Bob (B)
- P(Cancer(Anna) | Friends(Anna,Bob), Smokes(Bob))
- Figure: ground Markov network over the atoms Friends(A,A), Friends(A,B), Friends(B,A), Friends(B,B), Smokes(A), Smokes(B), Cancer(A), Cancer(B)
12. Markov Logic Networks [Richardson and Domingos, 2006]
- Probability mass function (PMF):
  P(X = x) = (1/Z) exp( Σ_i w_i n_i(x) )
  where x is a possible truth assignment of the set of all ground atoms, n_i(x) is the number of true groundings of formula i in x, w_i is the weight of formula i, and Z is the normalization constant
- Inference: calculate the probability of query atoms given the evidence set (a brute-force sketch follows below)
  - P(Cancer(Anna) | Friends(Anna,Bob), Smokes(Bob))
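To make the PMF concrete, here is a minimal brute-force sketch, not Alchemy's algorithm: it grounds the two weighted rules from the previous slide over the constants A and B and computes P(cancer(A) | friends(A,B), smokes(B)) by enumerating all 256 truth assignments.

```python
import itertools
import math

CONSTANTS = ["A", "B"]
ATOMS = ([f"smokes({x})" for x in CONSTANTS]
         + [f"cancer({x})" for x in CONSTANTS]
         + [f"friends({x},{y})" for x in CONSTANTS for y in CONSTANTS])

def world_weight(world):
    """exp(sum_i w_i * n_i(x)): sum weights over true rule groundings."""
    total = 0.0
    for x in CONSTANTS:  # w = 1.5: smokes(x) -> cancer(x)
        if (not world[f"smokes({x})"]) or world[f"cancer({x})"]:
            total += 1.5
    for x in CONSTANTS:  # w = 1.1: friends(x,y) -> (smokes(x) <-> smokes(y))
        for y in CONSTANTS:
            if (not world[f"friends({x},{y})"]) or \
               (world[f"smokes({x})"] == world[f"smokes({y})"]):
                total += 1.1
    return math.exp(total)

evidence = {"friends(A,B)": True, "smokes(B)": True}
numer = denom = 0.0
for values in itertools.product([False, True], repeat=len(ATOMS)):
    world = dict(zip(ATOMS, values))
    if any(world[a] != v for a, v in evidence.items()):
        continue  # keep only worlds consistent with the evidence
    w = world_weight(world)
    denom += w
    if world["cancer(A)"]:
        numer += w

print(f"P(cancer(A) | evidence) = {numer / denom:.3f}")
```

Real MLN inference avoids this exponential enumeration; the point here is only the semantics of the PMF.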
13. PSL: Probabilistic Soft Logic [Kimmig et al., NIPS 2012]
- Probabilistic logic framework designed with efficient inference in mind
- Atoms have continuous truth values in the interval [0,1] (vs. Boolean atoms in MLN)
- Łukasiewicz relaxation of AND, OR, NOT (a direct transcription follows below):
  - I(l1 ∧ l2) = max{0, I(l1) + I(l2) − 1}
  - I(l1 ∨ l2) = min{1, I(l1) + I(l2)}
  - I(¬l1) = 1 − I(l1)
- Inference is a linear program (vs. a combinatorial counting problem in MLN)
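A direct transcription of the three relaxed connectives; on Boolean inputs (0 or 1) they reduce to classical AND, OR, NOT.

```python
def luk_and(a: float, b: float) -> float:
    """Lukasiewicz t-norm: I(l1 AND l2) = max{0, a + b - 1}."""
    return max(0.0, a + b - 1.0)

def luk_or(a: float, b: float) -> float:
    """Lukasiewicz t-conorm: I(l1 OR l2) = min{1, a + b}."""
    return min(1.0, a + b)

def luk_not(a: float) -> float:
    """I(NOT l1) = 1 - a."""
    return 1.0 - a

print(luk_and(0.75, 0.5), luk_or(0.75, 0.5), luk_not(0.75))  # 0.25 1.0 0.25
```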
14. PSL: Probabilistic Soft Logic [Kimmig et al., NIPS 2012]
- Probability density function (PDF):
  p(I) = (1/Z) exp( − Σ_{r ∈ R} w_r d_r(I) )
  where I is a possible continuous truth assignment, d_r(I) is the distance to satisfaction of rule r (for a rule body → head, d_r(I) = max{0, I(body) − I(head)}), w_r is the weight of rule r, Z is the normalization constant, and the sum ranges over all rules R
- Inference: Most Probable Explanation (MPE)
  - Solved as a linear program (a toy sketch follows below)
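MPE inference minimizes the weighted sum of distances to satisfaction, which is a linear program. A minimal sketch using scipy, not the PSL implementation: two hypothetical rules a → b (w = 2, evidence I(a) = 0.9) and b → c (w = 1, evidence I(c) = 0.3) pull I(b) in opposite directions, and the heavier rule wins.

```python
from scipy.optimize import linprog

# Variables: [I(b), d1, d2]; evidence I(a) = 0.9 and I(c) = 0.3 are fixed.
# d1 >= I(a) - I(b)   (distance to satisfaction of a -> b, weight 2)
# d2 >= I(b) - I(c)   (distance to satisfaction of b -> c, weight 1)
c = [0.0, 2.0, 1.0]                 # minimize 2*d1 + 1*d2
A_ub = [[-1.0, -1.0, 0.0],          # -I(b) - d1 <= -0.9
        [1.0, 0.0, -1.0]]           #  I(b) - d2 <=  0.3
b_ub = [-0.9, 0.3]
bounds = [(0.0, 1.0), (0.0, None), (0.0, None)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
print(f"I(b) = {res.x[0]:.2f}, objective = {res.fun:.2f}")  # I(b)=0.90, obj=0.60
```

Setting I(b) = 0.9 satisfies the heavier rule exactly (d1 = 0) at the cost of d2 = 0.6; any lower value of I(b) trades one unit of d2 for two units of weighted d1 cost.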
15. Outline
- Introduction
  - Semantic representations
  - Probabilistic logic
  - Evaluation tasks
- Completed research
  - Parsing and task representation
  - Knowledge base construction
  - Inference
  - Evaluation
- Future work
16. Evaluation Tasks
- Two tasks that require deep semantic understanding to do well on
- 1) Recognizing Textual Entailment (RTE) [Dagan et al., 2013]
  - Given two sentences T and H, determine whether T Entails, Contradicts, or is unrelated to (Neutral) H
  - Entailment: T: A man is walking through the woods. H: A man is walking through a wooded area.
  - Contradiction: T: A man is jumping into an empty pool. H: A man is jumping into a full pool.
  - Neutral: T: A young girl is dancing. H: A young girl is standing on one leg.
17. Evaluation Tasks
- Two tasks that require deep semantic understanding to do well on
- 2) Semantic Textual Similarity (STS) [Agirre et al., 2012]
  - Given two sentences S1 and S2, judge their semantic similarity on a scale from 0 to 5
  - S1: A man is playing a guitar. S2: A woman is playing the guitar. (score: 2.75)
  - S1: A car is parking. S2: A cat is playing. (score: 0.00)
18. Outline
- Introduction
  - Semantic representations
  - Probabilistic logic
  - Evaluation tasks
- Completed research
  - Parsing and task representation
  - Knowledge base construction
  - Inference
  - Evaluation
- Future work
19. System Architecture [Beltagy et al., *SEM 2013]
- Pipeline (figure): the input pair T/S1 and H/S2 is parsed into logical forms LF1 and LF2; knowledge base construction produces the KB; the task representation (RTE/STS) builds the inference problem; inference computes P(Q | E, KB) using MLN or PSL to produce the result (RTE/STS)
- One advantage of using logic: modularity
20. Outline
- Introduction
  - Semantic representations
  - Probabilistic logic
  - Evaluation tasks
- Completed research
  - Parsing and task representation
  - Knowledge base construction
  - Inference
  - Evaluation
- Future work
21. Parsing
- Maps input sentences to logical form
- Uses Boxer, a rule-based system on top of a CCG parser [Bos, 2008]
  - "John is driving a car"
  - ∃x,y,z. john(x) ∧ agent(y, x) ∧ drive(y) ∧ patient(y, z) ∧ car(z)
22. Task Representation [Beltagy et al., SemEval 2014]
- Represent all tasks as inferences of the form P(Q | E, KB)
- RTE
  - Two inferences: P(H | T, KB) and P(H | ¬T, KB)
  - Use a classifier to map the two probabilities to an RTE class (see the sketch below)
- STS
  - Two inferences: P(S1 | S2, KB) and P(S2 | S1, KB)
  - Use regression to map the probabilities to an overall similarity score
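A toy sketch of the RTE mapping step using scikit-learn. The two features per pair are the inference probabilities above; the training rows here are hypothetical placeholder values, not SICK data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row: [P(H | T, KB), P(H | not T, KB)] — hypothetical values.
X_train = np.array([[0.95, 0.60],   # entailment: H likely given T
                    [0.05, 0.90],   # contradiction: H likely only when T is false
                    [0.40, 0.45]])  # neutral: T barely changes H
y_train = ["entailment", "contradiction", "neutral"]

clf = LogisticRegression().fit(X_train, y_train)
print(clf.predict([[0.90, 0.55]]))  # -> likely 'entailment'
```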
23. Domain Closure Assumption (DCA)
- There are no objects in the universe other than the named constants
- Constants need to be explicitly added
- Universal quantifiers do not behave as expected because of the finite domain
  - e.g., "Tweety is a bird and it flies" incorrectly entails "All birds fly"
- Handling quantifiers in P(Q | E, KB):

  | Quantifier | In E | In Q |
  |---|---|---|
  | ∃ | Skolemization | none |
  | ∀ | "All birds fly" ⇏ "Some birds fly" (fix: Existence) | "Tweety is a bird. It flies" ⇒ "All birds fly" (fix: add a constant) |
  | ¬ | none | future work |
24. DCA: Existentials in E
- Handling ∃ in E: Skolemization
  - E: ∃x,y. john(x) ∧ agent(y, x) ∧ eat(y)
  - Skolemized E: john(J) ∧ agent(T, J) ∧ eat(T)
- Embedded existentials
  - E: ∀x. bird(x) → ∃y. agent(y, x) ∧ fly(y)
  - Skolemized E: ∀x. bird(x) → agent(f(x), x) ∧ fly(f(x))
- Simulate skolem functions (a small sketch follows below)
  - ∀x. bird(x) → ∃y. skolem_f(x, y) ∧ agent(y, x) ∧ fly(y)
  - Evidence: skolem_f(B1, C1), skolem_f(B2, C2)
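A small sketch of top-level Skolemization: each existentially quantified variable is replaced by a fresh constant. The names C0, C1, ... and the (predicate, args) encoding are arbitrary choices for illustration; Boxer's actual output format differs.

```python
import itertools

_fresh = itertools.count()

def skolemize(exist_vars, literals):
    """Replace top-level existential variables with fresh constants.
    literals: list of (predicate, args) tuples."""
    mapping = {v: f"C{next(_fresh)}" for v in exist_vars}
    return [(pred, tuple(mapping.get(a, a) for a in args))
            for pred, args in literals]

# E: exists x,y. john(x) & agent(y, x) & eat(y)
print(skolemize(["x", "y"],
                [("john", ("x",)), ("agent", ("y", "x")), ("eat", ("y",))]))
# [('john', ('C0',)), ('agent', ('C1', 'C0')), ('eat', ('C1',))]
```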
25. DCA: Universals in E
- E: ∀x. bird(x) → ∃y. agent(y, x) ∧ fly(y)
- Q: ∃x,y. bird(x) ∧ agent(y, x) ∧ fly(y) (comes out false: no bird constants exist)
- Solution: introduce additional evidence bird(B)
  - Pragmatically, birds exist (Existence)
- Negated existential
  - E: ¬∃x,y. bird(x) ∧ agent(y, x) ∧ fly(y)
  - No additional constants needed
26. DCA: Universals in Q
- E: bird(B) ∧ agent(F, B) ∧ fly(F)
- Q: ∀x. bird(x) → ∃y. agent(y, x) ∧ fly(y) (comes out true)
- Universal quantifiers range only over the constants of the given finite domain
- Solution: add an extra bird constant to the domain
- If the new bird can be shown to fly, then there is an explicit universal quantification in E
27. Outline
- Introduction
  - Semantic representations
  - Probabilistic logic
  - Evaluation tasks
- Completed research
  - Parsing and task representation
  - Knowledge base construction
  - Inference
  - Evaluation
- Future work
28. Knowledge Base Construction
- Represent background knowledge as weighted inference rules
- 1) WordNet rules (a sketch of reading such rules off WordNet follows below)
  - WordNet: a lexical database of words and their semantic relations
  - Synonyms: ∀x. man(x) ↔ guy(x) | w = ∞
  - Hyponyms: ∀x. car(x) → vehicle(x) | w = ∞
  - Antonyms: ∀x. tall(x) → ¬short(x) | w = ∞
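A sketch of how such rules could be read off WordNet with NLTK (requires nltk.download('wordnet')). Taking only the first synset is a simplification for illustration; the proposal's future work covers proper sense disambiguation.

```python
from nltk.corpus import wordnet as wn  # requires: nltk.download('wordnet')

def hypernym_rules(word):
    """Emit hard hyponym rules word(x) -> hypernym(x) for the first noun synset."""
    rules = []
    for syn in wn.synsets(word, pos=wn.NOUN)[:1]:  # naive: first sense only
        for hyper in syn.hypernyms():
            head = hyper.lemma_names()[0]
            rules.append(f"forall x. {word}(x) -> {head}(x) | w = inf")
    return rules

print(hypernym_rules("car"))
# e.g. ['forall x. car(x) -> motor_vehicle(x) | w = inf']
```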
29. Knowledge Base Construction
- Represent background knowledge as weighted inference rules
- 2) Distributional rules (on-the-fly rules)
  - For all pairs of words (a, b) where a ∈ T/S1 and b ∈ H/S2, generate the rule:
    ∀x. a(x) → b(x) | f(w)
  - w = cosine(a, b)
  - f(w) = log(w / (1 − w)) (sketched below)
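The mapping f is the log-odds (logit) function: cosine similarities near 1 become strongly positive rule weights, and similarities below 0.5 become negative. A minimal sketch; the eps-clamping is our addition to avoid infinities at 0 and 1, not something stated on the slide.

```python
import math

def rule_weight(cos_sim: float, eps: float = 1e-6) -> float:
    """f(w) = log(w / (1 - w)); eps-clamping added to keep the value finite."""
    w = min(max(cos_sim, eps), 1.0 - eps)
    return math.log(w / (1.0 - w))

for sim in (0.9, 0.5, 0.2):
    print(f"cosine {sim} -> weight {rule_weight(sim):+.2f}")
# cosine 0.9 -> +2.20, cosine 0.5 -> +0.00, cosine 0.2 -> -1.39
```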
30. Outline
- Introduction
- Completed research
  - Parsing and task representation
  - Knowledge base construction
  - Inference
    - RTE using MLNs
    - STS using MLNs
    - STS using PSL
  - Evaluation
- Future work
31. Inference
- Inference problem: P(Q | E, KB)
- Solve it using MLN and PSL for RTE and STS:
  - RTE using MLNs
  - STS using MLNs
  - STS using PSL
32. Outline
- Introduction
- Completed research
  - Parsing and task representation
  - Knowledge base construction
  - Inference
    - RTE using MLNs
    - STS using MLNs
    - STS using PSL
  - Evaluation
- Future work
33. MLNs for RTE: Query Formula (QF) [Beltagy and Mooney, StarAI 2014]
- Alchemy (an MLN implementation) calculates only probabilities of ground atoms
- Our inference algorithm supports query formulas
  - P(Q | R) = Z(Q ∪ R) / Z(R) [Gogate and Domingos, 2011]
  - Z: normalization constant of the probability distribution
- Estimate the partition function Z using SampleSearch [Gogate and Dechter, 2011]
  - SampleSearch estimates the partition function Z of mixed graphical models (probabilistic and deterministic)
34. MLNs for RTE: Modified Closed-World (MCW) [Beltagy and Mooney, StarAI 2014]
- MLN grounding generates very large graphical models
- Q has O(c^v) ground clauses
  - v: number of variables in Q
  - c: number of constants in the domain
35. MLNs for RTE: Modified Closed-World (MCW) [Beltagy and Mooney, StarAI 2014]
- Low priors: by default, ground atoms have very low probabilities, unless shown otherwise through inference
- Example
  - E: man(M) ∧ agent(D, M) ∧ drive(D)
  - Priors: ∀x. man(x) | w = −2, ∀x. guy(x) | w = −2, ∀x. drive(x) | w = −2
  - KB: ∀x. man(x) → guy(x) | w = 1.8
  - Q: ∃x,y. guy(x) ∧ agent(y, x) ∧ drive(y)
  - Ground atoms: man(M), man(D), guy(M), guy(D), drive(M), drive(D)
- Solution: a modified closed-world assumption to eliminate unimportant ground atoms
  - Atoms not reachable from the evidence are dropped (evidence propagation; see the toy sketch below)
  - A strict version of the low priors
  - Dramatically reduces the size of the problem
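A toy propositional sketch of evidence propagation for unary predicates: starting from the evidence atoms, repeatedly fire KB rules and keep only the ground atoms reached; everything else is closed-world and can be pruned. This illustrates the idea, not the actual MCW algorithm.

```python
def propagate(evidence, rules):
    """evidence: set of (pred, const); rules: list of (body_pred, head_pred).
    Returns all ground atoms reachable from the evidence."""
    active = set(evidence)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            for pred, const in list(active):
                if pred == body and (head, const) not in active:
                    active.add((head, const))
                    changed = True
    return active

# E: man(M), drive(D); KB: man(x) -> guy(x)
print(propagate({("man", "M"), ("drive", "D")}, [("man", "guy")]))
# {('man', 'M'), ('drive', 'D'), ('guy', 'M')}: guy(D), man(D), drive(M) are pruned
```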
36. Outline
- Introduction
- Completed research
  - Parsing and task representation
  - Knowledge base construction
  - Inference
    - RTE using MLNs
    - STS using MLNs
    - STS using PSL
  - Evaluation
- Future work
37. MLNs for STS [Beltagy et al., *SEM 2013]
- Strict conjunction in Q does not fit STS
  - E: "A man is driving": ∃x,y. man(x) ∧ drive(y) ∧ agent(y, x)
  - Q: "A man is driving a bus": ∃x,y,z. man(x) ∧ drive(y) ∧ agent(y, x) ∧ bus(z) ∧ patient(y, z)
- Break Q into mini-clauses, then combine their evidence using an averaging combiner [Natarajan et al., 2010]
  - ∀x,y,z. man(x) ∧ agent(y, x) → result(x,y,z) | w
  - ∀x,y,z. drive(y) ∧ agent(y, x) → result(x,y,z) | w
  - ∀x,y,z. drive(y) ∧ patient(y, z) → result(x,y,z) | w
  - ∀x,y,z. bus(z) ∧ patient(y, z) → result(x,y,z) | w
38. Outline
- Introduction
- Completed research
  - Parsing and task representation
  - Knowledge base construction
  - Inference
    - RTE using MLNs
    - STS using MLNs
    - STS using PSL
  - Evaluation
- Future work
39. PSL for STS [Beltagy, Erk, and Mooney, ACL 2014]
- As in MLN, strict conjunction in PSL does not fit STS
- Replace conjunctions in Q with an average (illustrated below):
  - I(l1 ∧ ... ∧ ln) = avg(I(l1), ..., I(ln))
- Inference
  - The average is a linear function
  - No changes to the optimization problem
  - Heuristic grounding (details omitted)
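The contrast is easy to see numerically: the Łukasiewicz conjunction of several literals saturates at 0 as soon as a few truth values dip, while the average degrades gracefully, which is what a similarity score needs. A one-line sketch:

```python
def avg_conj(values):
    """I(l1 & ... & ln) replaced by the average, as in the STS formulation."""
    return sum(values) / len(values)

vals = [1.0, 1.0, 0.0, 1.0]                      # one unsupported literal
luk = max(0.0, sum(vals) - (len(vals) - 1))      # n-ary Lukasiewicz AND
print(luk, avg_conj(vals))                       # 0.0 vs 0.75
```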
40. Outline
- Introduction
  - Semantic representations
  - Probabilistic logic
  - Evaluation tasks
- Completed research
  - Parsing and task representation
  - Knowledge base construction
  - Inference
  - Evaluation
    - Knowledge Base
    - Inference
- Future work
41. Evaluation: Datasets
- SICK (RTE and STS) [SemEval 2014]
  - Sentences Involving Compositional Knowledge
  - 10,000 pairs of sentences
- msr-vid (STS) [SemEval 2012]
  - Microsoft video description corpus
  - 1,500 pairs of short video descriptions
- msr-par (STS) [SemEval 2012]
  - Microsoft paraphrase corpus
  - 1,500 pairs of long news sentences
42. Outline
- Introduction
  - Semantic representations
  - Probabilistic logic
  - Evaluation tasks
- Completed research
  - Parsing and task representation
  - Knowledge base construction
  - Inference
  - Evaluation
    - Knowledge Base
    - Inference
- Future work
43. Evaluation: Knowledge Base
- logic+kb is better than logic alone and better than dist
- PSL does much better than MLN on the STS task
44. Evaluation: Error Analysis of RTE
- Our system's accuracy: 77.72%
- The remaining 22.28% break down as:
  - Entailment pairs classified as Neutral: 15.32%
  - Contradiction pairs classified as Neutral: 6.12%
  - Other: 0.84%
- System precision: 98.9%, recall: 78.56%
  - High precision and low recall is the typical behavior of logic-based systems
- Fixes (future work)
  - Larger knowledge base
  - Fix some limitations in the detection of contradictions
45. Outline
- Introduction
  - Semantic representations
  - Probabilistic logic
  - Evaluation tasks
- Completed research
  - Parsing and task representation
  - Knowledge base construction
  - Inference
  - Evaluation
    - Knowledge Base
    - Inference
- Future work
46. Evaluation: Inference (RTE) [Beltagy and Mooney, StarAI 2014]
- Dataset: SICK (from SemEval 2014)
- Systems compared:
  - mln: Alchemy out of the box
  - mln+qf: our algorithm for calculating the probability of a query formula
  - mln+mcw: mln with our modified closed-world assumption
  - mln+qf+mcw: both components

  | System | Accuracy (%) | CPU Time | Timeouts (%, 30 min limit) |
  |---|---|---|---|
  | mln | 57 | 2 min 27 sec | 96 |
  | mln+qf | 69 | 1 min 51 sec | 30 |
  | mln+mcw | 66 | 10 sec | 2.5 |
  | mln+qf+mcw | 72 | 7 sec | 2.1 |
47. Evaluation: Inference (STS) [Beltagy, Erk, and Mooney, ACL 2014]
- Compare MLN with PSL on the STS task

  | Dataset | PSL time | MLN time | MLN timeouts (%, 10 min limit) |
  |---|---|---|---|
  | msr-vid | 8s | 1m 31s | 9 |
  | msr-par | 30s | 11m 49s | 97 |
  | SICK | 10s | 4m 24s | 36 |

- MCW is applied to MLN for a fairer comparison, because PSL already has lazy grounding
48. Outline
- Introduction
  - Semantic representations
  - Probabilistic logic
  - Evaluation tasks
- Completed research
  - Parsing and task representation
  - Knowledge base construction
  - Inference
  - Evaluation
- Future work
  - Short Term
  - Long Term
49. Future Work: RTE Task Formulation
- 1) Better detection of contradictions
- Example where the current P(H | ¬T) fails:
  - T: No man is playing a flute
  - H: A man is playing a large flute
- Detection of contradiction:
  - T ∧ H ⇒ false (from the logic point of view)
  - ⇔ T ⇒ ¬H; probabilistically: P(¬H | T) = 1 − P(H | T) (useless)
  - ⇔ H ⇒ ¬T; probabilistically: P(¬T | H)
50. Future Work: RTE Task Formulation
- 2) Using ratios
  - P(H | T) / P(H)
  - P(¬T | H) / P(¬T)
51. Future Work: DCA (Negation in Q)
- Q: ¬∃x,y. bird(x) ∧ agent(y, x) ∧ fly(y)
- Because of the closed world, Q comes out true regardless of E
- We need Q to be true only when it is explicitly stated in E
- Solution:
  - Add the negation of Q to the MLN with a high weight (not infinity)
  - R: bird(B) ∧ agent(F, B) ∧ fly(F) | w = 5
  - P(Q | R) ≈ 0
  - P(Q | E, R) ≈ 0 unless E negates R
52. Future Work: Knowledge Base Construction
- 1) Precompiled rules from paraphrase collections like PPDB [Ganitkevitch et al., NAACL 2013]
  - e.g., "solves" → "finds a solution to"
    ∀e,x. solve(e) ∧ patient(e,x) → ∃s. find(e) ∧ patient(e,s) ∧ solution(s) ∧ to(s,x) | w
  - Variable binding is not trivial
  - Templates
  - Difference between the logical expressions of the sentence with and without the rule applied
53. Future Work: Knowledge Base Construction
- 2) Phrasal distributional rules
- Use linguistically motivated templates like:
  - Noun phrase → noun phrase
    ∀x. little(x) ∧ kid(x) → smart(x) ∧ boy(x) | w
  - Subject-verb-object → subject-verb-object
    ∀x,y,z. man(x) ∧ agent(y,x) ∧ drive(y) ∧ patient(y,z) ∧ car(z) → guy(x) ∧ agent(y,x) ∧ ride(y) ∧ patient(y,z) ∧ bike(z) | w
54. Future Work: Inference
- 1) Better MLN inference with query formulas
- Currently, we estimate Z(Q ∪ R) and Z(R) in two separate runs
- Combine both runs into one that exploits:
  - The similarities between Q ∪ R and R
  - The fact that we only need the ratio, not the absolute values
55. Future Work: Inference
- 2) Generalized modified closed-world assumption
- It is not clear how to propagate evidence through rules like:
  - ∀x. dead(x) → ¬live(x)
- MCW needs to be generalized to arbitrary MLNs
- Find and eliminate ground atoms whose marginal probability ≈ prior probability
56. Future Work: Weight Learning
- One of the following:
  - Weight learning for inference rules
    - Learn a better mapping from the weights we have on resources to MLN weights
    - Learn how to weight rules from different resources differently
  - Weight learning for the STS task
    - Weight different parts of the sentence differently
    - e.g., "black dog" is more similar to "white dog" than to "black cat"
57. Outline
- Introduction
- Completed research
  - Parsing and task representation
  - Knowledge base construction
  - Inference
  - Evaluation
- Future work
  - Short Term
  - Long Term
58. Long-term Future Work
- 1) Question Answering
  - Our semantic representation is general and expressive, so apply it to more tasks
  - Given a query, find an answer for it in a large corpus of unstructured text
  - Inference finds the best filling for the existentially quantified query
  - Efficient inference is the bottleneck
- 2) Generalized Quantifiers
  - Few, most, and many are not natively supported in first-order logic
  - Add support for them by:
    - Checking monotonicity
    - Representing "few" and "most" as weighted universally quantified rules
59. Long-term Future Work
- 3) Contextualize WordNet Rules
  - Use word sense disambiguation, then generate weighted inference rules from WordNet
- 4) Other Languages
  - Theoretically, this is a language-independent semantic representation
  - Practically, resources are not available, especially CCGBanks to train parsers and Boxer
- 5) Inference Inspector
  - Visualize the inference process and highlight the most influential rules
  - Not trivial in MLN because all rules affect the final result to some extent
60. Conclusion
- Probabilistic logic for semantic representation
  - Expressivity, automated inference, and gradedness
- Evaluation on RTE and STS
  - Formulating the tasks as probabilistic logic inferences
  - Building a knowledge base
  - Performing inference efficiently based on the task
- In the short term, we will enhance the formulation of the RTE task, build a bigger knowledge base from more resources, generalize the modified closed-world assumption, enhance our MLN inference algorithm, and use weight learning
- In the long term, we will apply our semantic representation to the question answering task, support generalized quantifiers, contextualize WordNet rules, apply our semantic representation to other languages, and implement a probabilistic logic inference inspector
61. Thank You