Title: Learning with Latent Alignment Structures
1. Learning with Latent Alignment Structures
- Quasi-synchronous Grammar and Tree-edit CRFs for Question Answering and Textual Entailment
- Mengqiu Wang
- Joint work with Chris Manning, Noah Smith
2. Task definition
- At a high level: learning the syntactic and semantic relations between two pieces of text
- Application-specific definition of the relations
- Question Answering
  - Q: Who is the leader of France?
  - A: Bush later met with French President Jacques Chirac.
- Machine Translation
  - C: [Chinese source sentence; characters lost in encoding]
  - E: Premier Wen Jiabao met with Japanese Prime Minister Shinzo Abe yesterday.
- Summarization
  - T: US rounds up 400 Saddam diehards as group claims anti-US attacks in Iraq.
  - S: US rounded up 400 people in Iraq.
- Textual Entailment (IE, IR, QA, SUM)
  - Txt: Responding to Scheuer's comments in La Repubblica, the prime minister's office said the analysts' allegations, "beyond being false, are also absolutely incompatible with the contents of the conversation between Prime Minister Silvio Berlusconi and U.S. Ambassador to Rome Mel Sembler."
  - Hyp: Mel Sembler represents the U.S.
3. The Challenges
- Latent alignment structure
- QA
  - Who is the leader of France?
  - Bush later met with French President Jacques Chirac.
- MT
  - [Chinese source sentence; characters lost in encoding]
  - Premier Wen Jiabao met with Japanese Prime Minister Shinzo Abe yesterday.
- Sum
  - US rounds up 400 Saddam diehards as group claims anti-US attacks in Iraq.
  - US rounded up 400 people in Iraq.
- RTE
  - Responding to the conversation between Prime Minister Silvio Berlusconi and U.S. Ambassador to Rome Mel Sembler.
  - Mel Sembler represents the U.S.
4. Other modeling challenges
- Question: Who is the leader of France?
- Answer ranking:
  1. Bush later met with French president Jacques Chirac.
  2. Henri Hadjenberg, who is the leader of France's Jewish community, ...
  3. ...
5. Semantic Transformations
- Q: Who is the leader of France?
- A: Bush later met with French president Jacques Chirac.
6. Syntactic Transformations
[Figure: dependency trees for "Who is the leader of France?" and "Bush met with French president Jacques Chirac", with mod dependency labels marked.]
7. Syntactic Variations
[Figure: dependency trees for "Who is the leader of France?" and "Henri Hadjenberg, who is the leader of France's Jewish community", with mod dependency labels marked.]
8. What's been done?
- The latent alignment problem
  - Instead of treating alignment as a latent variable, treat it as a separate task: first find the best alignment, then proceed with the rest of the task.
  - Pros: usually simple and efficient.
  - Cons: not very robust; no way to correct alignment errors in later steps.
- Modeling syntax and semantics
  - Extract features from syntactic parse trees and semantic resources, then throw them into a linear classifier. Syntax and semantics enrich the feature space, but there is no principled way to make use of the syntax.
  - Pros: no need to worry about trees too much.
  - Cons: ad hoc.
9. What I think an ideal model should do
- Carry alignment uncertainty into the final task
  - Treat alignment as a latent variable and jointly learn the proper alignment structure and the overall task.
  - In other words, model the distribution over alignments and sum out all possible alignments at decoding time (sketched below).
- Syntax-based and feature-rich models
  - Directly model syntax.
  - Enable the use of rich semantic features and features from other world-knowledge resources.
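A minimal sketch of this marginalization (notation mine, not from the slides), with sentence pair x, task label y (e.g., answers / does-not-answer), and latent alignment a:

```latex
p(y \mid x) = \sum_{a} p(y, a \mid x),
\qquad
\hat{y} = \operatorname*{argmax}_{y} \sum_{a} p(y, a \mid x)
```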
10. Road map
- Present two models that address the issues raised
  - (1) A model based on Quasi-synchronous Grammar (EMNLP 2007)
    - Experiments on the Question Answering task
  - (2) A tree-edit CRF model (current work)
    - Experiments on RTE
- Discuss and compare the two models
  - Modeling power
  - Pros and cons
- Future work
11. Switching gears
- Quasi-synchronous Grammar for Question Answering
12. Tree-edit CRFs for RTE
- Extension of McCallum et al.'s (UAI 2005) work on CRFs for finite-state string edit distance
- Key attractions
  - Models the transformation of dependency parse trees (thus directly models syntax), unlike McCallum et al. (2005), which only models word strings
  - Discriminatively trained (not a generative model, unlike QG)
  - Trained on both positive and negative instances of sentence pairs (QG is trained only on positive Q/A pairs)
  - CRFs: the underlying graphical model is undirected (QG is basically a Bayes Net, directed)
  - Joint model over alignments (vs. local alignment models in QG)
  - Feature-rich
13. TE-CRFs model in detail
- First of all, let's look at the correspondence between alignment (with constraints) and edit operations; the next slide gives an example, and a small code sketch follows it.
14. Example
[Figure: the dependency trees of the Q/A pair, aligned by edit operations. Substitutions link corresponding nodes (root/root, "is VB"/"met VBD", with a "fancy substitute" between "France NNP location" and "French JJ location"); unmatched nodes are handled by an insert and a delete. Nodes carry word, POS, and NE labels, e.g., "Bush NNP person", "Jacques Chirac NNP person", "who WP qword", "leader NN", "president NN", "the DT".]
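A minimal sketch of this correspondence (function and data-structure names are mine, not the talk's): aligned node pairs become substitutions, unaligned source nodes become deletions, and unaligned target nodes become insertions.

```python
def edits_from_alignment(src_nodes, tgt_nodes, alignment):
    """Derive an edit-operation sequence from a node alignment.

    src_nodes, tgt_nodes: node labels of the two dependency trees.
    alignment: dict mapping a source node to its aligned target node.
    """
    ops = []
    aligned_targets = set(alignment.values())
    for s in src_nodes:
        if s in alignment:
            ops.append(("substitute", s, alignment[s]))  # aligned pair
        else:
            ops.append(("delete", s, None))              # unmatched source node
    for t in tgt_nodes:
        if t not in aligned_targets:
            ops.append(("insert", None, t))              # unmatched target node
    return ops

# Toy usage on word-labeled nodes from the example above:
print(edits_from_alignment(
    ["is", "leader", "the"],
    ["met", "president", "Chirac"],
    {"is": "met", "leader": "president"}))
# [('substitute', 'is', 'met'), ('substitute', 'leader', 'president'),
#  ('delete', 'the', None), ('insert', None, 'Chirac')]
```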
15. TE-CRFs model in detail
- Each valid tree-edit operation sequence that transforms one tree into the other corresponds to an alignment. A tree-edit operation sequence is modeled as a transition sequence among a set of states in an FSM (a toy path-scoring sketch follows below).
[Figure: a three-state FSM (S1, S2, S3) whose transitions are labeled with edit operations (D = delete, S = substitute, I = insert); one example path emits substitute, insert, substitute, delete, substitute, substitute.]
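A minimal sketch of this view, assuming a weight table over (state, edit-op, next-state) transitions (all names and weights mine, for illustration): the score of one edit sequence is the product of the transition weights along its FSM path.

```python
import math

# Hypothetical log-weights for (state, op, next_state) transitions.
log_w = {
    ("S1", "substitute", "S2"): -0.1,
    ("S2", "insert", "S2"): -1.2,
    ("S2", "substitute", "S3"): -0.3,
    ("S3", "delete", "S3"): -0.9,
    ("S3", "substitute", "S3"): -0.2,
}

def path_log_score(start, steps):
    """Sum transition log-weights along one FSM path."""
    total, state = 0.0, start
    for op, next_state in steps:
        total += log_w[(state, op, next_state)]
        state = next_state
    return total

# One edit sequence as a path: substitute, insert, substitute, delete, substitute.
print(math.exp(path_log_score("S1", [
    ("substitute", "S2"), ("insert", "S2"), ("substitute", "S3"),
    ("delete", "S3"), ("substitute", "S3"),
])))
```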
16. FSM
- This is for one edit operation sequence: insert, substitute, substitute, delete, substitute, substitute.
- There are many other valid edit sequences, each corresponding to a different path through the FSM.
[Figure: several alternative edit-operation sequences shown as FSM paths.]
17. FSM cont.
[Figure: overall FSM layout. Epsilon transitions lead from Start into a Positive State Set and a Negative State Set, and from each set to Stop.]
18. FSM transitions
[Figure: transitions within the Positive and Negative State Sets. Each set contains states S1, S2, S3 with the same pattern of transitions among them, connected to Start and Stop.]
19. Parameterization
[Figure: a substitute transition from state S1 to S2; some parameters are specific to the positive or negative state set, while others are shared across positive and negative.]
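A sketch of the log-linear parameterization this points at, in the style of McCallum et al. (2005) (notation mine): each transition from state s to s' on edit operation e carries a feature-based potential, and the probability of entailment sums over all edit sequences routed through the positive state set.

```latex
\psi(s, e, s') = \exp\big(\theta \cdot f(s, e, s')\big),
\qquad
p(\mathrm{pos} \mid T_1, T_2)
  = \frac{\sum_{\mathbf{e} \in \mathcal{E}_{\mathrm{pos}}} \prod_{t} \psi(s_{t-1}, e_t, s_t)}
         {\sum_{\mathbf{e} \in \mathcal{E}_{\mathrm{pos}} \cup \mathcal{E}_{\mathrm{neg}}} \prod_{t} \psi(s_{t-1}, e_t, s_t)}
```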
20. Training using EM
- Jensen's inequality gives a lower bound on the log-likelihood.
- E-step: compute the posterior over latent edit sequences.
- M-step: using L-BFGS.
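A sketch of the bound (the slide's actual equations were lost in extraction; notation mine), with latent edit sequences e and auxiliary distribution q:

```latex
\log p_{\theta}(y \mid T_1, T_2)
  = \log \sum_{\mathbf{e}} p_{\theta}(y, \mathbf{e} \mid T_1, T_2)
  \;\ge\; \sum_{\mathbf{e}} q(\mathbf{e}) \log \frac{p_{\theta}(y, \mathbf{e} \mid T_1, T_2)}{q(\mathbf{e})}
```

The E-step sets q(e) to the posterior p_theta(e | y, T1, T2), making the bound tight; the M-step maximizes the resulting expected complete-data log-likelihood over theta with L-BFGS.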
21. Features for RTE
- Substitution
  - Same: Word/WordWithNE/Lemma/NETag/Verb/Noun/Adj/Adv/Other
  - Sub/MisSub: Punct/Stopword/ModalWord
  - Antonym/Hypernym/Synonym/Nombank/Country
  - Different NE/POS
  - Unrelated words
- Delete
  - Stopword/Punct/NE/Other/Polarity/Quantifier/Likelihood/Conditional/If
- Insert
  - Stopword/Punct/NE/Other/Polarity/Quantifier/Likelihood/Conditional/If
- Tree
  - RootAligned/RootAlignedSameWord
  - (Parent, Child, DepRel) triple match/mismatch
- Date/Time/Numerical
  - DateMismatch, hasNumDetMismatch, normalizedFormMismatch
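A minimal sketch of how substitution features of this kind might be computed for one aligned word pair (the lexical resources are stubbed out and the feature names abbreviated; this is not the talk's actual implementation):

```python
STOPWORDS = {"the", "of", "is", "a"}      # stub resources for illustration
SYNONYMS = {("leader", "president")}      # e.g., drawn from WordNet

def substitution_features(src, tgt):
    """Binary features for substituting src with tgt.

    src, tgt: dicts with 'word', 'lemma', 'ne' (NE tag), and 'pos'.
    """
    f = {}
    f["same_word"] = src["word"] == tgt["word"]
    f["same_lemma"] = src["lemma"] == tgt["lemma"]
    f["same_ne_tag"] = src["ne"] == tgt["ne"]
    f["sub_stopword"] = src["word"] in STOPWORDS and tgt["word"] in STOPWORDS
    pair = (src["lemma"], tgt["lemma"])
    f["synonym"] = pair in SYNONYMS or pair[::-1] in SYNONYMS
    f["different_pos"] = src["pos"] != tgt["pos"]
    return f

print(substitution_features(
    {"word": "leader", "lemma": "leader", "ne": "O", "pos": "NN"},
    {"word": "president", "lemma": "president", "ne": "O", "pos": "NN"}))
```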
22. Tree-edit CRFs for Textual Entailment
- Preliminary results
  - Trained on RTE2 dev, tested on RTE2 test
  - Model taken after 50 EM iterations
  - acc = 0.6275, MAP = 0.6407
- RTE2 official results
  - Hickl (LCC): acc = 0.7538, MAP = 0.8082
  - Tatu (LCC): acc = 0.7375, MAP = 0.7133
  - Zanzotto (Milan & Rome): acc = 0.6388, MAP = 0.6441
  - Adams (Dallas): acc = 0.6262, MAP = 0.6282
23. Comparison: QG vs. TE-CRFs
- QG
  - Generative
  - Directed (Bayes Net), local
  - Allows arbitrary swapping in alignment
  - Allows limited use of semantic features (lexical-semantic log-linear model in a mixture model)
  - Computationally cheaper
- TE-CRFs
  - Discriminative
  - Undirected (CRF), global
  - No swapping: can't do substitutions that involve swapping (can be extended; see future work)
  - Allows arbitrary semantic features
  - Computationally more expensive
24. Future work
- QG
  - Generative: train discriminatively using Noah's Contrastive Estimation
  - Directed (Bayes Net), local: higher-order Markovization
  - Allows arbitrary swapping in alignment
  - Allows limited use of semantic features (lexical-semantic log-linear model in a mixture model)
  - Computationally cheaper
  - Run RTE experiments
- TE-CRFs
  - Discriminative
  - Undirected (CRF), global
  - No swapping: constrained unordered trees; fancy edit operations (e.g., substitute sub-trees)
  - Allows arbitrary semantic features
  - More expensive
  - Run QA and MT alignment experiments
25. Thank you!