Title: Natural Language Semantics using Probabilistic Logic


1
Natural Language Semantics using Probabilistic
Logic
  • Islam Beltagy
  • Doctoral Dissertation Proposal
  • Supervising Professors: Raymond J. Mooney, Katrin Erk

2
  • Q: Who is the second president of the US?
  • A: John Adams
  • Q: Who is the president that came after the first US president?
  • A: ?
  • Semantic Representation: how the meaning of natural text is represented
  • Inference: how to draw conclusions from that semantic representation

2
3
Objective
  • Find a semantic representation that is
  • Expressive
  • Supports automated inference
  • Why? To support more NLP applications more effectively
  • Question Answering, Automated Grading, Machine Translation, Summarization

3
4
Outline
  • Introduction
  • Semantic representations
  • Probabilistic logic
  • Evaluation tasks
  • Completed research
  • Parsing and task representation
  • Knowledge base construction
  • Inference
  • Evaluation
  • Future work

4
5
Outline
  • Introduction
  • Semantic representations
  • Probabilistic logic
  • Evaluation tasks
  • Completed research
  • Parsing and task representation
  • Knowledge base construction
  • Inference
  • Evaluation
  • Future work

5
6
Semantic Representations - Formal Semantics
  • Mapping natural language to some formal language (e.g., first-order logic) [Montague, 1970]
  • John is driving a car
  • ∃x,y,z. john(x) ∧ agent(y, x) ∧ drive(y) ∧ patient(y, z) ∧ car(z)
  • Pros
  • Deep representation: relations, negations, disjunctions, quantifiers, ...
  • Supports automated inference
  • Cons: unable to handle uncertain knowledge. Why is this important? e.g., (pickle, cucumber), (cut, slice)

6
7
Semantic Representations - Distributional Semantics
  • Similar words and phrases occur in similar contexts
  • Use context to represent meaning
  • Meanings are vectors in high-dimensional spaces
  • Word and phrase similarity measure (sketched below)
  • e.g., similarity(water, bathtub) = cosine(water, bathtub)
  • Pros: robust probabilistic model that captures a graded notion of similarity
  • Cons: shallow representation of the semantics
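
For illustration, a minimal sketch of the cosine similarity between two context vectors, assuming toy co-occurrence counts (the vectors here are hypothetical):

    import numpy as np

    def cosine(a, b):
        # cosine of the angle between two context vectors
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    # hypothetical co-occurrence counts over four contexts
    water = np.array([3.0, 8.0, 6.0, 4.0])
    bathtub = np.array([9.0, 1.0, 1.0, 7.0])
    print(cosine(water, bathtub))  # graded similarity, here about 0.54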

7
8
Proposed Semantic Representation
  • Proposed semantic representation: Probabilistic Logic
  • Combines the advantages of
  • Formal Semantics (expressivity + automated inference)
  • Distributional Semantics (gradedness)

8
9
Outline
  • Introduction
  • Semantic representations
  • Probabilistic logic
  • Evaluation tasks
  • Completed research
  • Parsing and task representation
  • Knowledge base construction
  • Inference
  • Evaluation
  • Future work

9
10
Probabilistic Logic
  • Statistical Relational Learning [Getoor and Taskar, 2007]
  • Combine logical and statistical knowledge
  • Provide a mechanism for probabilistic inference
  • Use weighted first-order logic rules
  • Weighted rules are soft rules (compared to hard logical constraints)
  • Compactly encode complex probabilistic graphical models
  • Inference: P(Q | E, KB)
  • Markov Logic Networks (MLNs) [Richardson and Domingos, 2006]
  • Probabilistic Soft Logic (PSL) [Kimmig et al., NIPS 2012]

10
11
Markov Logic Networks [Richardson and Domingos, 2006]
  • ∀x. smoke(x) → cancer(x) | 1.5
  • ∀x,y. friend(x,y) → (smoke(x) ↔ smoke(y)) | 1.1
  • Two constants: Anna (A) and Bob (B)
  • P(Cancer(Anna) | Friends(Anna,Bob), Smokes(Bob))

(Figure: the ground Markov network over the atoms Friends(A,A), Friends(A,B), Friends(B,A), Friends(B,B), Smokes(A), Smokes(B), Cancer(A), Cancer(B))
11
12
Markov Logic Networks [Richardson and Domingos, 2006]
  • Probability Mass Function (PMF)
  • P(X = x) = (1/Z) exp( Σᵢ wᵢ nᵢ(x) )
  • x: a possible truth assignment over the set of all ground atoms
  • nᵢ(x): number of true groundings of formula i in x
  • wᵢ: weight of formula i
  • Z: normalization constant
  • Inference: calculate the probability of atoms given an evidence set
  • P(Cancer(Anna) | Friends(Anna,Bob), Smokes(Bob)) (brute-force sketch below)
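
As a concrete (if naive) illustration of this PMF, a brute-force sketch for the smokers example over a reduced, hand-ground atom set; real MLN engines such as Alchemy avoid this exponential enumeration:

    import itertools, math

    ATOMS = ["S(A)", "S(B)", "C(A)", "C(B)", "F(A,B)"]  # reduced ground-atom set

    def weighted_true_groundings(x):
        # sum_i w_i * n_i(x) for the two weighted formulas, ground by hand
        n_smoke_cancer = sum((not x[f"S({p})"]) or x[f"C({p})"] for p in "AB")
        n_friends_alike = int((not x["F(A,B)"]) or (x["S(A)"] == x["S(B)"]))
        return 1.5 * n_smoke_cancer + 1.1 * n_friends_alike

    worlds = [dict(zip(ATOMS, v))
              for v in itertools.product([True, False], repeat=len(ATOMS))]

    # P(C(A) | F(A,B), S(B)): keep worlds consistent with the evidence, normalize
    consistent = [x for x in worlds if x["F(A,B)"] and x["S(B)"]]
    num = sum(math.exp(weighted_true_groundings(x)) for x in consistent if x["C(A)"])
    den = sum(math.exp(weighted_true_groundings(x)) for x in consistent)
    print(num / den)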
12
13
PSL: Probabilistic Soft Logic [Kimmig et al., NIPS 2012]
  • Probabilistic logic framework designed with efficient inference in mind
  • Atoms have continuous truth values in the interval [0,1] (vs. Boolean atoms in MLNs)
  • Łukasiewicz relaxation of AND, OR, NOT (sketched below)
  • I(l1 ∧ l2) = max{0, I(l1) + I(l2) − 1}
  • I(l1 ∨ l2) = min{1, I(l1) + I(l2)}
  • I(¬l1) = 1 − I(l1)
  • Inference is a linear program (vs. a combinatorial counting problem in MLNs)
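
A minimal sketch of the Łukasiewicz operators above; the function names are ours, not PSL's API:

    def luk_and(a, b):
        # Łukasiewicz conjunction: max{0, a + b - 1}
        return max(0.0, a + b - 1.0)

    def luk_or(a, b):
        # Łukasiewicz disjunction: min{1, a + b}
        return min(1.0, a + b)

    def luk_not(a):
        # Łukasiewicz negation: 1 - a
        return 1.0 - a

    # two half-true literals: jointly false under AND, fully true under OR
    print(luk_and(0.5, 0.5), luk_or(0.5, 0.5), luk_not(0.3))  # 0.0 1.0 0.7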

13
14
PSL: Probabilistic Soft Logic [Kimmig et al., NIPS 2012]
  • PDF
  • p(I) = (1/Z) exp( −Σ_{r∈R} λᵣ · dᵣ(I)^p )
  • I: a possible continuous truth assignment
  • R: the set of all rules
  • dᵣ(I): distance to satisfaction of rule r (sketched below)
  • λᵣ: weight of rule r
  • Z: normalization constant
  • Inference: Most Probable Explanation (MPE)
  • Linear program
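
A sketch of the distance to satisfaction for a rule body → head, using the standard PSL definition dᵣ(I) = max{0, I(body) − I(head)}; the code is illustrative, not taken from the PSL implementation:

    def distance_to_satisfaction(i_body, i_head):
        # a rule body -> head is satisfied when I(head) >= I(body);
        # otherwise the distance grows linearly with the gap
        return max(0.0, i_body - i_head)

    # rule man(x) -> guy(x), with I(man(M)) = 0.9 and I(guy(M)) = 0.6
    print(distance_to_satisfaction(0.9, 0.6))  # 0.3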
14
15
Outline
  • Introduction
  • Semantic representations
  • Probabilistic logic
  • Evaluation tasks
  • Completed research
  • Parsing and task representation
  • Knowledge base construction
  • Inference
  • Evaluation
  • Future work

15
16
Evaluation Tasks
  • Two tasks that require deep semantic understanding to do well on
  • 1) Recognizing Textual Entailment (RTE) [Dagan et al., 2013]
  • Given two sentences T and H, decide whether T entails, contradicts, or is unrelated (neutral) to H
  • Entailment: T: A man is walking through the woods. H: A man is walking through a wooded area.
  • Contradiction: T: A man is jumping into an empty pool. H: A man is jumping into a full pool.
  • Neutral: T: A young girl is dancing. H: A young girl is standing on one leg.

16
17
Evaluation Tasks
  • Two tasks that require deep semantic understanding to do well on
  • 2) Semantic Textual Similarity (STS) [Agirre et al., 2012]
  • Given two sentences S1 and S2, judge their semantic similarity on a scale from 0 to 5
  • S1: A man is playing a guitar. S2: A woman is playing the guitar. (score: 2.75)
  • S1: A car is parking. S2: A cat is playing. (score: 0.00)

17
18
Outline
  • Introduction
  • Semantic representations
  • Probabilistic logic
  • Evaluation tasks
  • Completed research
  • Parsing and task representation
  • Knowledge base construction
  • Inference
  • Evaluation
  • Future work

18
19
System Architecture [Beltagy et al., *SEM 2013]
(Figure: pipeline. The inputs T/S1 and H/S2 are parsed into logical forms LF1 and LF2; Knowledge Base Construction produces the KB; the Task Representation module (RTE/STS) poses the inference problem; Inference computes P(Q | E, KB) using MLN/PSL; the output is the Result (RTE/STS).)
  • One advantage of using logic: modularity
19
20
Outline
  • Introduction
  • Semantic representations
  • Probabilistic logic
  • Evaluation tasks
  • Completed research
  • Parsing and task representation
  • Knowledge base construction
  • Inference
  • Evaluation
  • Future work

20
21
Parsing
  • Mapping input sentences to logical form
  • Using Boxer, a rule-based system on top of a CCG parser [Bos, 2008]
  • John is driving a car
  • ∃x,y,z. john(x) ∧ agent(y, x) ∧ drive(y) ∧ patient(y, z) ∧ car(z)

21
22
Task Representation [Beltagy et al., SemEval 2014]
  • Represent all tasks as inferences of the form P(Q | E, KB)
  • RTE
  • Two inferences: P(H | T, KB) and P(H | ¬T, KB)
  • Use a classifier to map the two probabilities to an RTE class (sketched below)
  • STS
  • Two inferences: P(S1 | S2, KB) and P(S2 | S1, KB)
  • Use regression to map the two probabilities to an overall similarity score
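
For illustration, a sketch of this final mapping step, assuming the two inference probabilities per pair are already computed; the toy data and the specific classifier/regressor choices are ours:

    from sklearn.linear_model import LinearRegression, LogisticRegression

    # toy training data: one row of two inference probabilities per sentence pair
    rte_X = [[0.95, 0.40], [0.10, 0.90], [0.30, 0.35]]
    rte_y = ["entail", "contradict", "neutral"]
    rte_clf = LogisticRegression().fit(rte_X, rte_y)

    sts_X = [[0.90, 0.85], [0.20, 0.15]]
    sts_y = [4.5, 0.5]  # gold similarity scores
    sts_reg = LinearRegression().fit(sts_X, sts_y)

    print(rte_clf.predict([[0.9, 0.3]]), sts_reg.predict([[0.6, 0.5]]))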

22
23
Domain Closure Assumption (DCA)
  • There are no objects in the universe other than the named constants
  • Constants need to be explicitly added
  • Universal quantifiers do not behave as expected because of the finite domain
  • e.g., Tweety is a bird and it flies ⇒ All birds fly (when Tweety is the only bird in the domain)

  P(Q | E, KB) | E                              | Q
  ∃            | Skolemization                  | none
  ∀            | All birds fly ⇒ Some birds fly | Tweety is a bird. It flies ⇒ All birds fly
  ¬∃ (∀¬)      | none                           | future work
23
24
  P(Q | E, KB) | E                              | Q
  ∃            | Skolemization                  | none
  ∀            | All birds fly ⇒ Some birds fly | Tweety is a bird. It flies ⇒ All birds fly
  ¬∃ (∀¬)      | none                           | future work

  • E: ∃x,y. john(x) ∧ agent(y, x) ∧ eat(y)
  • Skolemized E: john(J) ∧ agent(T, J) ∧ eat(T)
  • Embedded existentials
  • E: ∀x. bird(x) → ∃y. agent(y, x) ∧ fly(y)
  • Skolemized E: ∀x. bird(x) → agent(f(x), x) ∧ fly(f(x))
  • Simulate skolem functions (sketched below)
  • ∀x. bird(x) → ∃y. skolem_f(x,y) ∧ agent(y, x) ∧ fly(y)
  • skolem_f(B1, C1), skolem_f(B2, C2)
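
A sketch of how the skolem-predicate evidence could be generated over a finite domain; predicate and constant names are illustrative:

    def simulate_skolem_function(bird_constants):
        # for each constant Bi satisfying the antecedent, mint a fresh
        # constant Ci and assert skolem_f(Bi, Ci) as evidence
        evidence = []
        for i, b in enumerate(bird_constants, start=1):
            c = f"C{i}"  # fresh constant standing in for f(Bi)
            evidence.append(f"skolem_f({b}, {c})")
        return evidence

    print(simulate_skolem_function(["B1", "B2"]))
    # ['skolem_f(B1, C1)', 'skolem_f(B2, C2)']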

24
25
  P(Q | E, KB) | E                              | Q
  ∃            | Skolemization                  | none
  ∀            | All birds fly ⇒ Some birds fly | Tweety is a bird. It flies ⇒ All birds fly
  ¬∃ (∀¬)      | none                           | future work

  • E: ∀x. bird(x) → ∃y. agent(y, x) ∧ fly(y)
  • Q: ∃x,y. bird(x) ∧ agent(y, x) ∧ fly(y)   (false)
  • Solution: introduce additional evidence bird(B)
  • Pragmatically, birds exist (Existence)
  • Negated existential
  • E: ¬∃x,y. bird(x) ∧ agent(y, x) ∧ fly(y)
  • No additional constants needed

25
26
  P(Q | E, KB) | E                              | Q
  ∃            | Skolemization                  | none
  ∀            | All birds fly ⇒ Some birds fly | Tweety is a bird. It flies ⇒ All birds fly
  ¬∃ (∀¬)      | none                           | future work

  • E: bird(B) ∧ agent(F, B) ∧ fly(F)
  • Q: ∀x. bird(x) → ∃y. agent(y, x) ∧ fly(y)   (true)
  • Universal quantifiers work only on the constants of the given finite domain
  • Solution: add an extra bird constant to the domain
  • If the new bird can be shown to fly, then there is an explicit universal quantification in E

26
27
Outline
  • Introduction
  • Semantic representations
  • Probabilistic logic
  • Evaluation tasks
  • Completed research
  • Parsing and task representation
  • Knowledge base construction
  • Inference
  • Evaluation
  • Future work

27
28
Knowledge Base Construction
  • Represent background knowledge as weighted
    inference rules
  • 1) WordNet rules (generation sketched below)
  • WordNet: a lexical database of words and their semantic relations
  • Synonyms: ∀x. man(x) ↔ guy(x) | w = ∞
  • Hyponyms: ∀x. car(x) → vehicle(x) | w = ∞
  • Antonyms: ∀x. tall(x) → ¬short(x) | w = ∞
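
A sketch of how such rules could be generated from WordNet via NLTK; the rule-string format and the naive first-sense choice are our simplifications:

    from nltk.corpus import wordnet as wn  # requires the NLTK 'wordnet' corpus

    def wordnet_rules(word):
        rules = []
        synset = wn.synsets(word)[0]  # naive: take the first sense only
        for lemma in synset.lemma_names():  # synonyms -> biconditionals
            if lemma != word:
                rules.append(f"forall x. {word}(x) <-> {lemma}(x) | w = inf")
        for hyper in synset.hypernyms():  # hypernyms -> implications
            rules.append(f"forall x. {word}(x) -> {hyper.lemma_names()[0]}(x) | w = inf")
        for ant in synset.lemmas()[0].antonyms():  # antonyms -> negated implications
            rules.append(f"forall x. {word}(x) -> not {ant.name()}(x) | w = inf")
        return rules

    print(wordnet_rules("car"))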

28
29
Knowledge Base Construction
  • Represent background knowledge as weighted
    inference rules
  • 2) Distributional rules (on-the-fly rules, sketched below)
  • For all pairs of words (a, b) where a ∈ T/S1 and b ∈ H/S2, generate the rule
  • ∀x. a(x) → b(x) | f(w)
  • w = cosine(a, b)
  • f(w) = log(w / (1 − w))
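
A sketch of the on-the-fly rule generation, assuming precomputed distributional vectors and 0 < w < 1; all names are illustrative:

    import math
    import numpy as np

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    def distributional_rules(t_words, h_words, vectors):
        # one weighted rule per pair (a, b), a from T/S1 and b from H/S2
        rules = []
        for a in t_words:
            for b in h_words:
                w = cosine(vectors[a], vectors[b])  # assumes 0 < w < 1
                f_w = math.log(w / (1.0 - w))  # logit mapping to a rule weight
                rules.append((f"forall x. {a}(x) -> {b}(x)", f_w))
        return rules

    # hypothetical distributional vectors
    vecs = {"man": np.array([0.9, 0.1]), "guy": np.array([0.8, 0.2])}
    print(distributional_rules(["man"], ["guy"], vecs))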

29
30
Outline
  • Introduction
  • Completed research
  • Parsing and task representation
  • Knowledge base construction
  • Inference
  • RTE using MLNs
  • STS using MLNs
  • STS using PSL
  • Evaluation
  • Future work

30
31
Inference
  • Inference problem: P(Q | E, KB)
  • Solve it using MLNs and PSL for RTE and STS
  • RTE using MLNs
  • STS using MLNs
  • STS using PSL

31
32
Outline
  • Introduction
  • Completed research
  • Parsing and task representation
  • Knowledge base construction
  • Inference
  • RTE using MLNs
  • STS using MLNs
  • STS using PSL
  • Evaluation
  • Future work

32
33
MLNs for RTE - Query Formula (QF) [Beltagy and Mooney, StarAI 2014]
  • Alchemy (an MLN implementation) calculates only probabilities of ground atoms
  • We need an inference algorithm that supports query formulas
  • P(Q | R) = Z(Q ∪ R) / Z(R) [Gogate and Domingos, 2011]
  • Z: normalization constant of the probability distribution
  • Estimate the partition function Z using SampleSearch [Gogate and Dechter, 2011]
  • SampleSearch is an algorithm to estimate the partition function Z of mixed graphical models (probabilistic and deterministic)

33
34
MLNs for RTE - Modified Closed-world (MCW) [Beltagy and Mooney, StarAI 2014]
  • MLN grounding generates very large graphical models
  • Q has O(c^v) ground clauses
  • v: number of variables in Q
  • c: number of constants in the domain
  • e.g., c = 10 constants and v = 5 variables already yield 10^5 ground clauses

34
35
MLNs for RTE - Modified Closed-world (MCW) [Beltagy and Mooney, StarAI 2014]
  • Low priors: by default, ground atoms have very low probabilities, unless shown otherwise through inference
  • Example
  • E: man(M) ∧ agent(D, M) ∧ drive(D)
  • Priors: ∀x. man(x) | −2,  ∀x. guy(x) | −2,  ∀x. drive(x) | −2
  • KB: ∀x. man(x) → guy(x) | 1.8
  • Q: ∃x,y. guy(x) ∧ agent(y, x) ∧ drive(y)
  • Ground atoms: man(M), man(D), guy(M), guy(D), drive(M), drive(D)
  • Solution: an MCW assumption to eliminate unimportant ground atoms, i.e., those not reachable from the evidence (evidence propagation; sketched below)
  • A strict version of low priors
  • Dramatically reduces the size of the problem
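
A much-simplified sketch of evidence propagation at the predicate level (the actual MCW reasons per ground atom); the data structures are illustrative:

    from collections import deque

    def reachable_predicates(evidence_preds, rules):
        # BFS from the evidence predicates through KB rules (lhs -> rhs);
        # ground atoms over unreached predicates keep their low prior
        # and can be eliminated from the ground network
        reached = set(evidence_preds)
        queue = deque(reached)
        while queue:
            p = queue.popleft()
            for lhs, rhs in rules:
                if lhs == p and rhs not in reached:
                    reached.add(rhs)
                    queue.append(rhs)
        return reached

    # E: man(M) ^ agent(D, M) ^ drive(D);  KB: man -> guy
    print(reachable_predicates({"man", "agent", "drive"}, [("man", "guy")]))
    # {'man', 'agent', 'drive', 'guy'}: atoms of any other predicate are dropped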

35
36
Outline
  • Introduction
  • Completed research
  • Parsing and task representation
  • Knowledge base construction
  • Inference
  • RTE using MLNs
  • STS using MLNs
  • STS using PSL
  • Evaluation
  • Future work

36
37
MLNs for STS [Beltagy et al., *SEM 2013]
  • Strict conjunction in Q does not fit STS
  • E: A man is driving: ∃x,y. man(x) ∧ drive(y) ∧ agent(y, x)
  • Q: A man is driving a bus: ∃x,y,z. man(x) ∧ drive(y) ∧ agent(y, x) ∧ bus(z) ∧ patient(y,z)
  • Break Q into mini-clauses, then combine their evidence using an averaging combiner [Natarajan et al., 2010] (sketched below)
  • ∀x,y,z. man(x) ∧ agent(y, x) → result(x,y,z) | w
  • ∀x,y,z. drive(y) ∧ agent(y, x) → result(x,y,z) | w
  • ∀x,y,z. drive(y) ∧ patient(y, z) → result(x,y,z) | w
  • ∀x,y,z. bus(z) ∧ patient(y, z) → result(x,y,z) | w
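
A sketch of generating the mini-clauses from (content-word, linking-role) pairs; here the pairing is hand-supplied rather than derived from the parse:

    def mini_clauses(pairs, weight):
        # one rule per (content atom, linking role atom) pair, all sharing
        # the same result(x,y,z) head so the combiner can average them
        return [f"forall x,y,z. {content} ^ {role} -> result(x,y,z) | {weight}"
                for content, role in pairs]

    pairs = [("man(x)", "agent(y,x)"), ("drive(y)", "agent(y,x)"),
             ("drive(y)", "patient(y,z)"), ("bus(z)", "patient(y,z)")]
    for rule in mini_clauses(pairs, 1.0):
        print(rule)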

37
38
Outline
  • Introduction
  • Completed research
  • Parsing and task representation
  • Knowledge base construction
  • Inference
  • RTE using MLNs
  • STS using MLNs
  • STS using PSL
  • Evaluation
  • Future work

38
39
PSL for STS [Beltagy, Erk, and Mooney, ACL 2014]
  • As in MLNs, strict conjunction in PSL does not fit STS
  • Replace conjunctions in Q with an average (sketched below)
  • I(l1 ∧ … ∧ ln) = avg(I(l1), …, I(ln))
  • Inference
  • average is a linear function
  • no changes to the optimization problem (it stays a linear program)
  • Heuristic grounding (details omitted)
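
The averaging relaxation in one line, next to the n-ary Łukasiewicz conjunction it replaces; function names are ours:

    def luk_and_n(vals):
        # n-ary Łukasiewicz conjunction: max{0, sum - (n - 1)}
        return max(0.0, sum(vals) - (len(vals) - 1))

    def avg_and(vals):
        # averaging relaxation used for STS: still linear in the truth values
        return sum(vals) / len(vals)

    vals = [0.9, 0.8, 0.0]  # one completely unmatched literal
    print(luk_and_n(vals), avg_and(vals))  # 0.0 vs 0.5666...: avg stays graded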

39
40
Outline
  • Introduction
  • Semantic representations
  • Probabilistic logic
  • Evaluation tasks
  • Completed research
  • Parsing and task representation
  • Knowledge base construction
  • Inference
  • Evaluation
  • Knowledge Base
  • Inference
  • Future work

40
41
Evaluation - Datasets
  • SICK (RTE and STS) [SemEval 2014]
  • Sentences Involving Compositional Knowledge
  • 10,000 pairs of sentences
  • msr-vid (STS) [SemEval 2012]
  • Microsoft video description corpus
  • 1,500 pairs of short video descriptions
  • msr-par (STS) [SemEval 2012]
  • Microsoft paraphrase corpus
  • 1,500 pairs of long news sentences

41
42
Outline
  • Introduction
  • Semantic representations
  • Probabilistic logic
  • Evaluation tasks
  • Completed research
  • Parsing and task representation
  • Knowledge base construction
  • Inference
  • Evaluation
  • Knowledge Base
  • Inference
  • Future work

42
43
Evaluation: Knowledge Base
  • logic + KB is better than logic alone and better than dist alone
  • PSL does much better than MLNs on the STS task

43
44
Evaluation: Error Analysis of RTE
  • Our system's accuracy: 77.72%
  • The remaining 22.28% are
  • Entailment pairs classified as Neutral: 15.32%
  • Contradiction pairs classified as Neutral: 6.12%
  • Other: 0.84%
  • System precision: 98.9%, recall: 78.56%
  • High precision with low recall is the typical behavior of logic-based systems
  • Fixes (future work)
  • Larger knowledge base
  • Fix some limitations in the detection of contradictions

44
45
Outline
  • Introduction
  • Semantic representations
  • Probabilistic logic
  • Evaluation tasks
  • Completed research
  • Parsing and task representation
  • Knowledge base construction
  • Inference
  • Evaluation
  • Knowledge Base
  • Inference
  • Future work

45
46
Evaluation: Inference (RTE) [Beltagy and Mooney, StarAI 2014]
  • Dataset: SICK (from SemEval 2014)
  • Systems compared
  • mln: Alchemy out-of-the-box
  • mln+qf: our algorithm to calculate the probability of a query formula
  • mln+mcw: mln with our modified closed-world assumption
  • mln+qf+mcw: both components

  System     | Accuracy | CPU Time   | Timeouts (30 min)
  mln        | 57%      | 2min 27sec | 96%
  mln+qf     | 69%      | 1min 51sec | 30%
  mln+mcw    | 66%      | 10sec      | 2.5%
  mln+qf+mcw | 72%      | 7sec       | 2.1%

46
47
Evaluation: Inference (STS) [Beltagy, Erk, and Mooney, ACL 2014]
  • Compare MLNs with PSL on the STS task

  Dataset | PSL time | MLN time | MLN timeouts (10 min)
  msr-vid | 8s       | 1m 31s   | 9%
  msr-par | 30s      | 11m 49s  | 97%
  SICK    | 10s      | 4m 24s   | 36%

  • MCW is applied to the MLNs for a fairer comparison, because PSL already uses lazy grounding

47
48
Outline
  • Introduction
  • Semantic representations
  • Probabilistic logic
  • Evaluation tasks
  • Completed research
  • Parsing and task representation
  • Knowledge base construction
  • Inference
  • Evaluation
  • Future work
  • Short Term
  • Long Term

48
49
Future Work: RTE Task Formulation
  • 1) Better detection of contradictions
  • Example where the current P(H | ¬T) fails
  • T: No man is playing a flute
  • H: A man is playing a large flute
  • Detection of contradiction
  • T ∧ H ⇒ false (from the logic point of view)
  • T ⇒ ¬H, probabilistically: P(¬H | T) = 1 − P(H | T) (useless)
  • H ⇒ ¬T, probabilistically: P(¬T | H)

49
50
Future Work: RTE Task Formulation
  • 2) Using ratios
  • P(H | T) / P(H)
  • P(¬T | H) / P(¬T)

50
51
Future Work: DCA

  P(Q | E, KB) | E                              | Q
  ∃            | Skolemization                  | none
  ∀            | All birds fly ⇒ Some birds fly | Tweety is a bird. It flies ⇒ All birds fly
  ¬∃ (∀¬)      | none                           | future work

  • Q: ¬∃x,y. bird(x) ∧ agent(y, x) ∧ fly(y)
  • Because of the closed-world assumption, Q comes out true regardless of E
  • We need Q to be true only when it is explicitly stated in E
  • Solution
  • Add the negation of Q to the MLN with a high weight, not infinity
  • R: bird(B) ∧ agent(F, B) ∧ fly(F) | w = 5
  • P(Q | R) ≈ 0
  • P(Q | E, R) ≈ 0 unless E negates R

51
52
Future Work: Knowledge Base Construction
  • 1) Precompiled rules from paraphrase collections like PPDB [Ganitkevitch et al., NAACL 2013]
  • e.g., "solves" → "finds a solution to"
  • ∀e,x. solve(e) ∧ patient(e,x) → ∃s. find(e) ∧ patient(e,s) ∧ solution(s) ∧ to(s,x) | w
  • Variable binding is not trivial
  • Templates
  • Difference between the logical expressions of the sentence with and without the rule applied

52
53
Future Work: Knowledge Base Construction
  • 2) Phrasal distributional rules
  • Use linguistically motivated templates like
  • Noun phrase → noun phrase
  • ∀x. little(x) ∧ kid(x) → smart(x) ∧ boy(x) | w
  • Subject-verb-object → subject-verb-object
  • ∀x,y,z. man(x) ∧ agent(y,x) ∧ drive(y) ∧ patient(y,z) ∧ car(z) → guy(x) ∧ agent(y,x) ∧ ride(y) ∧ patient(y,z) ∧ bike(z) | w

53
54
Future Work: Inference
  • 1) Better MLN inference with query formulas
  • Currently, we estimate Z(Q ∪ R) and Z(R) separately
  • Combine both runs into one that exploits
  • the similarities between Q ∪ R and R
  • the fact that we only need the ratio, not the absolute values

54
55
Future Work: Inference
  • 2) Generalized modified closed-world assumption
  • It is not clear how to propagate evidence through rules like
  • ∀x. dead(x) → ¬live(x)
  • MCW needs to be generalized to arbitrary MLNs
  • Find and eliminate ground atoms whose marginal probability equals their prior probability

55
56
Future Work: Weight Learning
  • One of the following
  • Weight learning for inference rules
  • Learn a better mapping from the weights we have from resources to MLN weights
  • Learn how to weight rules from different resources differently
  • Weight learning for the STS task
  • Weight different parts of the sentence differently
  • e.g., "black dog" is more similar to "white dog" than to "black cat"

56
57
Outline
  • Introduction
  • Completed research
  • Parsing and task representation
  • Knowledge base construction
  • Inference
  • Evaluation
  • Future work
  • Short Term
  • Long Term

57
58
Long-term Future Work
  • 1) Question Answering
  • Our semantic representation is general and expressive, so apply it to more tasks
  • Given a query, find an answer for it in a large corpus of unstructured text
  • Inference finds the best filler for the existentially quantified query variable
  • Efficient inference is the bottleneck
  • 2) Generalized Quantifiers
  • Few, most, many, ... are not natively supported in first-order logic
  • Add support for them by
  • checking monotonicity
  • representing Few and Most as weighted universally quantified rules

58
59
Long-term Future Work
  • 3) Contextualize WordNet Rules
  • Use word sense disambiguation, then generate weighted inference rules from WordNet
  • 4) Other Languages
  • Theoretically, this is a language-independent semantic representation
  • Practically, the resources are not available, especially CCGBanks to train parsers and Boxer
  • 5) Inference Inspector
  • Visualize the inference process and highlight the most effective rules
  • Not trivial in MLNs, because all rules affect the final result to some extent

59
60
Conclusion
  • Probabilistic logic for semantic representation
  • expressivity, automated inference, and gradedness
  • Evaluation on RTE and STS
  • Formulating the tasks as probabilistic logic inferences
  • Building a knowledge base
  • Performing inference efficiently based on the task
  • In the short term, we will
  • enhance the formulation of the RTE task, build a bigger knowledge base from more resources, generalize the modified closed-world assumption, enhance our MLN inference algorithm, and use weight learning
  • In the long term, we will
  • apply our semantic representation to the question answering task, support generalized quantifiers, contextualize WordNet rules, apply our semantic representation to other languages, and implement a probabilistic logic inference inspector

60
61
Thank You