Title: Learning with Latent Alignment Structures
1. Learning with Latent Alignment Structures
- Quasi-synchronous Grammar and Tree-edit CRFs for Question Answering and Textual Entailment
- Mengqiu Wang
- Joint work with Chris Manning, Noah Smith
2. Task definition
- At a high level: learning the syntactic and semantic relations between two pieces of text
- Application-specific definition of the relations
- Question Answering
  - Q: Who is the leader of France?
  - A: Bush later met with French President Jacques Chirac.
- Machine Translation
  - C: [Chinese source sentence; characters lost in encoding]
  - E: Premier Wen Jiabao met with Japanese Prime Minister Shinzo Abe yesterday.
- Summarization
  - T: US rounds up 400 Saddam diehards as group claims anti-US attacks in Iraq.
  - S: US rounded up 400 people in Iraq.
- Textual Entailment (IE, IR, QA, SUM)
  - Txt: Responding to Scheuer's comments in La Repubblica, the prime minister's office said the analysts' allegations, "beyond being false, are also absolutely incompatible with the contents of the conversation between Prime Minister Silvio Berlusconi and U.S. Ambassador to Rome Mel Sembler."
  - Hyp: Mel Sembler represents the U.S.
3. The Challenges
- Latent alignment structure
- QA
  - Who is the leader of France?
  - Bush later met with French President Jacques Chirac.
- MT
  - [Chinese source sentence; characters lost in encoding]
  - Premier Wen Jiabao met with Japanese Prime Minister Shinzo Abe yesterday.
- Sum
  - US rounds up 400 Saddam diehards as group claims anti-US attacks in Iraq.
  - US rounded up 400 people in Iraq.
- RTE
  - Responding to the conversation between Prime Minister Silvio Berlusconi and U.S. Ambassador to Rome Mel Sembler.
  - Mel Sembler represents the U.S.
4. Other modeling challenges
- Question: Who is the leader of France?
- Answer ranking:
  1. Bush later met with French president Jacques Chirac.
  2. Henri Hadjenberg, who is the leader of France's Jewish community, ...
  3. ...
5. Semantic Transformations
- Q: Who is the leader of France?
- A: Bush later met with French president Jacques Chirac.
6. Syntactic Transformations
[Figure: dependency trees for "Who is the leader of France?" and "Bush met with French president Jacques Chirac", with mod dependency labels marked.]
7. Syntactic Variations
[Figure: dependency trees for "Who is the leader of France?" and "Henri Hadjenberg, who is the leader of France's Jewish community", with mod dependency labels marked.]
8. What's been done?
- The latent alignment problem
  - Instead of treating alignment as a latent variable, treat it as a separate task: first find the best alignment, then proceed with the rest of the task.
  - Pros: usually simple and efficient.
  - Cons: not very robust; no way to correct alignment errors in later steps.
- Modeling syntax and semantics
  - Extract features from syntactic parse trees and semantic resources, then throw them into a linear classifier. Syntax and semantics enrich the feature space, but there is no principled way to make use of the syntax.
  - Pros: no need to worry about trees too much.
  - Cons: ad hoc.
9. What I think an ideal model should do
- Carry alignment uncertainty into the final task
  - Treat alignment as a latent variable and jointly learn the proper alignment structure and the overall task.
  - In other words, model the distribution over alignments and sum out all possible alignments at decoding time (sketched below).
- Syntax-based and feature-rich models
  - Directly model syntax.
  - Enable the use of rich semantic features and features from other world-knowledge resources.
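A minimal sketch of this marginalization (notation mine, not from the slides), with sentence pair x, task label y (e.g., answers / does-not-answer), and latent alignment a:

```latex
p(y \mid x) = \sum_{a} p(y, a \mid x),
\qquad
\hat{y} = \operatorname*{argmax}_{y} \sum_{a} p(y, a \mid x)
```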
10. Road map
- Present two models that address the issues raised
  - (1) A model based on Quasi-synchronous Grammar (EMNLP 2007)
    - Experiments on the Question Answering task
  - (2) A tree-edit CRF model (current work)
    - Experiments on RTE
- Discuss and compare the two models
  - Modeling power
  - Pros and cons
- Future work
11. Switching gears
- Quasi-synchronous Grammar for Question Answering
12. Tree-edit CRFs for RTE
- Extension of McCallum et al.'s (UAI 2005) work on CRFs for finite-state string edit distance
- Key attractions
  - Models the transformation of dependency parse trees (thus directly models syntax), unlike McCallum et al. (2005), which only models word strings
  - Discriminatively trained (not a generative model, unlike QG)
  - Trained on both positive and negative instances of sentence pairs (QG is trained only on positive Q/A pairs)
  - CRFs: the underlying graphical model is undirected (QG is basically a Bayes Net, directed)
  - Joint model over alignments (vs. local alignment models in QG)
  - Feature-rich
13. TE-CRFs model in detail
- First of all, let's look at the correspondence between alignment (with constraints) and edit operations; the next slide gives an example, and a small code sketch follows it.
14. Example
[Figure: the dependency trees of the Q/A pair, aligned by edit operations. Substitutions link corresponding nodes (root/root, "is VB"/"met VBD", with a "fancy substitute" between "France NNP location" and "French JJ location"); unmatched nodes are handled by an insert and a delete. Nodes carry word, POS, and NE labels, e.g., "Bush NNP person", "Jacques Chirac NNP person", "who WP qword", "leader NN", "president NN", "the DT".]
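A minimal sketch of this correspondence (function and data-structure names are mine, not the talk's): aligned node pairs become substitutions, unaligned source nodes become deletions, and unaligned target nodes become insertions.

```python
def edits_from_alignment(src_nodes, tgt_nodes, alignment):
    """Derive an edit-operation sequence from a node alignment.

    src_nodes, tgt_nodes: node labels of the two dependency trees.
    alignment: dict mapping a source node to its aligned target node.
    """
    ops = []
    aligned_targets = set(alignment.values())
    for s in src_nodes:
        if s in alignment:
            ops.append(("substitute", s, alignment[s]))  # aligned pair
        else:
            ops.append(("delete", s, None))              # unmatched source node
    for t in tgt_nodes:
        if t not in aligned_targets:
            ops.append(("insert", None, t))              # unmatched target node
    return ops

# Toy usage on word-labeled nodes from the example above:
print(edits_from_alignment(
    ["is", "leader", "the"],
    ["met", "president", "Chirac"],
    {"is": "met", "leader": "president"}))
# [('substitute', 'is', 'met'), ('substitute', 'leader', 'president'),
#  ('delete', 'the', None), ('insert', None, 'Chirac')]
```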
15. TE-CRFs model in detail
- Each valid tree-edit operation sequence that transforms one tree into the other corresponds to an alignment. A tree-edit operation sequence is modeled as a transition sequence among a set of states in an FSM (a toy path-scoring sketch follows below).
[Figure: a three-state FSM (S1, S2, S3) whose transitions are labeled with edit operations (D = delete, S = substitute, I = insert); one example path emits substitute, insert, substitute, delete, substitute, substitute.]
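A minimal sketch of this view, assuming a weight table over (state, edit-op, next-state) transitions (all names and weights mine, for illustration): the score of one edit sequence is the product of the transition weights along its FSM path.

```python
import math

# Hypothetical log-weights for (state, op, next_state) transitions.
log_w = {
    ("S1", "substitute", "S2"): -0.1,
    ("S2", "insert", "S2"): -1.2,
    ("S2", "substitute", "S3"): -0.3,
    ("S3", "delete", "S3"): -0.9,
    ("S3", "substitute", "S3"): -0.2,
}

def path_log_score(start, steps):
    """Sum transition log-weights along one FSM path."""
    total, state = 0.0, start
    for op, next_state in steps:
        total += log_w[(state, op, next_state)]
        state = next_state
    return total

# One edit sequence as a path: substitute, insert, substitute, delete, substitute.
print(math.exp(path_log_score("S1", [
    ("substitute", "S2"), ("insert", "S2"), ("substitute", "S3"),
    ("delete", "S3"), ("substitute", "S3"),
])))
```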
16. FSM
- This is for one edit operation sequence: insert, substitute, substitute, delete, substitute, substitute.
- There are many other valid edit sequences, each corresponding to a different path through the FSM.
[Figure: several alternative edit-operation sequences shown as FSM paths.]
17. FSM cont.
[Figure: overall FSM layout. Epsilon transitions lead from Start into a Positive State Set and a Negative State Set, and from each set to Stop.]
18. FSM transitions
[Figure: transitions within the Positive and Negative State Sets. Each set contains states S1, S2, S3 with the same pattern of transitions among them, connected to Start and Stop.]
19. Parameterization
[Figure: a substitute transition from state S1 to S2; some parameters are specific to the positive or negative state set, while others are shared across positive and negative.]
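A sketch of the log-linear parameterization this points at, in the style of McCallum et al. (2005) (notation mine): each transition from state s to s' on edit operation e carries a feature-based potential, and the probability of entailment sums over all edit sequences routed through the positive state set.

```latex
\psi(s, e, s') = \exp\big(\theta \cdot f(s, e, s')\big),
\qquad
p(\mathrm{pos} \mid T_1, T_2)
  = \frac{\sum_{\mathbf{e} \in \mathcal{E}_{\mathrm{pos}}} \prod_{t} \psi(s_{t-1}, e_t, s_t)}
         {\sum_{\mathbf{e} \in \mathcal{E}_{\mathrm{pos}} \cup \mathcal{E}_{\mathrm{neg}}} \prod_{t} \psi(s_{t-1}, e_t, s_t)}
```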
20. Training using EM
- Jensen's inequality gives a lower bound on the log-likelihood.
- E-step: compute the posterior over latent edit sequences.
- M-step: using L-BFGS.
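A sketch of the bound (the slide's actual equations were lost in extraction; notation mine), with latent edit sequences e and auxiliary distribution q:

```latex
\log p_{\theta}(y \mid T_1, T_2)
  = \log \sum_{\mathbf{e}} p_{\theta}(y, \mathbf{e} \mid T_1, T_2)
  \;\ge\; \sum_{\mathbf{e}} q(\mathbf{e}) \log \frac{p_{\theta}(y, \mathbf{e} \mid T_1, T_2)}{q(\mathbf{e})}
```

The E-step sets q(e) to the posterior p_theta(e | y, T1, T2), making the bound tight; the M-step maximizes the resulting expected complete-data log-likelihood over theta with L-BFGS.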
21. Features for RTE
- Substitution
  - Same: Word/WordWithNE/Lemma/NETag/Verb/Noun/Adj/Adv/Other
  - Sub/MisSub: Punct/Stopword/ModalWord
  - Antonym/Hypernym/Synonym/Nombank/Country
  - Different NE/POS
  - Unrelated words
- Delete
  - Stopword/Punct/NE/Other/Polarity/Quantifier/Likelihood/Conditional/If
- Insert
  - Stopword/Punct/NE/Other/Polarity/Quantifier/Likelihood/Conditional/If
- Tree
  - RootAligned/RootAlignedSameWord
  - (Parent, Child, DepRel) triple match/mismatch
- Date/Time/Numerical
  - DateMismatch, hasNumDetMismatch, normalizedFormMismatch
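A minimal sketch of how substitution features of this kind might be computed for one aligned word pair (the lexical resources are stubbed out and the feature names abbreviated; this is not the talk's actual implementation):

```python
STOPWORDS = {"the", "of", "is", "a"}      # stub resources for illustration
SYNONYMS = {("leader", "president")}      # e.g., drawn from WordNet

def substitution_features(src, tgt):
    """Binary features for substituting src with tgt.

    src, tgt: dicts with 'word', 'lemma', 'ne' (NE tag), and 'pos'.
    """
    f = {}
    f["same_word"] = src["word"] == tgt["word"]
    f["same_lemma"] = src["lemma"] == tgt["lemma"]
    f["same_ne_tag"] = src["ne"] == tgt["ne"]
    f["sub_stopword"] = src["word"] in STOPWORDS and tgt["word"] in STOPWORDS
    pair = (src["lemma"], tgt["lemma"])
    f["synonym"] = pair in SYNONYMS or pair[::-1] in SYNONYMS
    f["different_pos"] = src["pos"] != tgt["pos"]
    return f

print(substitution_features(
    {"word": "leader", "lemma": "leader", "ne": "O", "pos": "NN"},
    {"word": "president", "lemma": "president", "ne": "O", "pos": "NN"}))
```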
22. Tree-edit CRFs for Textual Entailment
- Preliminary results
  - Trained on RTE2 dev, tested on RTE2 test
  - Model taken after 50 EM iterations
  - acc = 0.6275, MAP = 0.6407
- RTE2 official results
  - Hickl (LCC): acc = 0.7538, MAP = 0.8082
  - Tatu (LCC): acc = 0.7375, MAP = 0.7133
  - Zanzotto (Milan & Rome): acc = 0.6388, MAP = 0.6441
  - Adams (Dallas): acc = 0.6262, MAP = 0.6282
23. Comparison: QG vs. TE-CRFs
- QG
  - Generative
  - Directed (Bayes Net), local
  - Allows arbitrary swapping in alignment
  - Allows limited use of semantic features (lexical-semantic log-linear model in a mixture model)
  - Computationally cheaper
- TE-CRFs
  - Discriminative
  - Undirected (CRF), global
  - No swapping: can't do substitutions that involve swapping (can be extended; see future work)
  - Allows arbitrary semantic features
  - Computationally more expensive
24. Future work
- QG
  - Generative: train discriminatively using Noah's Contrastive Estimation
  - Directed (Bayes Net), local: higher-order Markovization
  - Allows arbitrary swapping in alignment
  - Allows limited use of semantic features (lexical-semantic log-linear model in a mixture model)
  - Computationally cheaper
  - Run RTE experiments
- TE-CRFs
  - Discriminative
  - Undirected (CRF), global
  - No swapping: constrained unordered trees; fancy edit operations (e.g., substitute sub-trees)
  - Allows arbitrary semantic features
  - More expensive
  - Run QA and MT alignment experiments
25. Thank you!