1
Learning with Latent Alignment Structures
  • Quasi-synchronous Grammar and Tree-edit CRFs for
    Question Answering and Textual Entailment
  • Mengqiu Wang
  • Joint work with Chris Manning, Noah Smith

2
Task definition
  • At a high level
  • Learning the syntactic and semantic relations
    between two pieces of text
  • Application-specific definition of the relations
  • Question Answering
  • Q: Who is the leader of France?
  • A: Bush later met with French President Jacques Chirac.
  • Machine Translation
  • C: [Chinese source sentence; lost in the transcript encoding]
  • E: Premier Wen Jiabao met with Japanese Prime Minister Shinzo Abe yesterday.
  • Summarization
  • T: US rounds up 400 Saddam diehards as group claims anti-US attacks in Iraq.
  • S: US rounded up 400 people in Iraq.
  • Textual Entailment (IE, IR, QA, SUM)
  • Txt: Responding to Scheuer's comments in La Repubblica, the prime minister's office said the analysts' allegations, "beyond being false, are also absolutely incompatible with the contents of the conversation between Prime Minister Silvio Berlusconi and U.S. Ambassador to Rome Mel Sembler."
  • Hyp: Mel Sembler represents the U.S.

3
The Challenges
  • Latent alignment structure
  • QA: Who is the leader of France?
  • Bush later met with French President Jacques Chirac.
  • MT: [Chinese source sentence; lost in the transcript encoding]
  • Premier Wen Jiabao met with Japanese Prime Minister Shinzo Abe yesterday.
  • Sum: US rounds up 400 Saddam diehards as group claims anti-US attacks in Iraq.
  • US rounded up 400 people in Iraq.
  • RTE: Responding to the conversation between Prime Minister Silvio Berlusconi and U.S. Ambassador to Rome Mel Sembler.
  • Mel Sembler represents the U.S.

4
Other modeling challenges
Question: Who is the leader of France?
Candidate answers, to be ranked:
  1. Bush later met with French president Jacques Chirac.
  2. Henri Hadjenberg, who is the leader of France's Jewish community, ...
  3. ...
5
Semantic Transformations
  • Q: Who is the leader of France?
  • A: Bush later met with French president Jacques Chirac.

6
Syntactic Transformations
[Figure: dependency parse trees of the question "Who is the leader of France?" and the answer "Bush met with French president Jacques Chirac"; only node and mod-edge labels survived the transcript.]
7
Syntactic Variations
[Figure: dependency parse trees of the question and of "Henri Hadjenberg, who is the leader of France's Jewish community"; only node and mod-edge labels survived the transcript.]
8
What's been done?
  • The latent alignment problem
  • Instead of treating alignment as a latent variable, treat it as a separate task: first find the best alignment, then proceed with the rest of the task.
  • Pros: Usually simple and efficient.
  • Cons: Not very robust; no way to correct alignment errors in later steps.
  • Modeling syntax and semantics
  • Extract features from syntactic parse trees and semantic resources, then feed them into a linear classifier. Syntax and semantics enrich the feature space, but there is no principled way to make use of syntax.
  • Pros: No need to worry about trees too much.
  • Cons: Ad hoc.

9
What I think an ideal model should do
  • Carry alignment uncertainty into final task
  • Treat alignment as latent variables and jointly learn the alignment structure and the overall task
  • In other words, model the distribution over alignments and sum out all possible alignments at decoding time (see the sketch after this list)
  • Syntax-based and feature-rich models
  • Directly model syntax
  • Enable the use of rich semantic features and
    features from other world-knowledge resources.
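
In symbols (notation mine, not from the slides): writing x for the observed text pair, y for the task decision, and a for the latent alignment, a pipeline commits to a single alignment, $\hat{a} = \arg\max_a p(a \mid x)$, and then scores $p(y \mid x, \hat{a})$; the latent-variable model instead marginalizes,

$$ p(y \mid x) \;=\; \sum_a p(y, a \mid x), $$

so alignment uncertainty is carried through to the final decision.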

10
Road map
  • Present two models that address the issues raised
  • 1. A model based on Quasi-synchronous Grammar (EMNLP 07)
  • Experiments on the Question Answering task
  • 2. A tree-edit CRFs model (current work)
  • Experiments on RTE
  • Discuss and compare these two models
  • Modeling power
  • Pros and cons
  • Future work

11
Switching gears
  • Quasi-synchronous Grammar for Question Answering

12
Tree-edit CRFs for RTE
  • Extension of McCallum et al.'s UAI 2005 work on CRFs for finite-state string edit distance
  • Key attractions
  • Models the transformation of dependency parse trees (thus directly models syntax), unlike McCallum et al. 05, which only models word strings
  • Discriminatively trained (not a generative model, unlike QG)
  • Trained on both the positive and negative instances of sentence pairs (QG is trained only on positive Q/A pairs)
  • CRFs: the underlying graphical model is undirected (QG is essentially a directed Bayes net)
  • Joint model over alignments, vs. local alignment models in QG (see the sketch after this list)
  • Feature-rich
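
A sketch of the form such a model takes (notation mine, lifting McCallum et al.'s string-edit CRF to trees): for source and target dependency trees (S, T), label y, and E_y the set of edit-operation sequences that pass through the state set for y,

$$ p(y \mid S, T; \theta) \;=\; \frac{1}{Z(S, T)} \sum_{e \in E_y} \prod_t \exp\!\big( \theta^\top f(e_t, s_{t-1}, s_t, S, T) \big), $$

where Z(S, T) sums the same quantity over both state sets. The inner sum over e is what makes it a joint model over alignments.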

13
TE-CRFs model in detail
  • First of all, let's look at the correspondence between alignment (with constraints) and edit operations

14
[Figure: the Q and A dependency trees (roots over "is VB" and "met VBD", with subj/obj/det/nmod edges) aligned by edit operations: root↔root and "is"↔"met" substitutes; "with" inserted; a "fancy substitute" linking "leader NN" to "president NN"; "the DT" deleted; "France NNP (location)"↔"French JJ (location)" substituted; nodes "who WP (qword)", "Bush NNP (person)", and "Jacques Chirac NNP (person)" also appear.]
15
TE-CRFs model in detail
  • Each valid tree edit operation sequence that transforms one tree into the other corresponds to an alignment. A tree edit operation sequence is modeled as a transition sequence among a set of states in an FSM (a minimal sketch in code follows).
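
Below is a minimal sketch in Python (types and names are mine, not the authors' code) of this correspondence: one valid edit sequence for the Q/A example above, with the induced alignment read off the substitute operations.

from dataclasses import dataclass
from typing import Optional

@dataclass
class EditOp:
    kind: str                  # "delete" | "substitute" | "insert"
    src: Optional[str] = None  # node in the source (Q) tree, if any
    tgt: Optional[str] = None  # node in the target (A) tree, if any

# One valid edit sequence transforming the Q tree into the A tree; each
# operation is emitted while traversing the FSM (e.g. S1 -> S2 -> ...).
edit_sequence = [
    EditOp("substitute", src="is", tgt="met"),
    EditOp("substitute", src="who", tgt="Jacques Chirac"),  # illustrative alignment of the question word
    EditOp("insert", tgt="with"),
    EditOp("substitute", src="leader", tgt="president"),
    EditOp("delete", src="the"),
    EditOp("substitute", src="France", tgt="French"),
]

# The alignment induced by this sequence is the set of substituted pairs.
alignment = {(op.src, op.tgt) for op in edit_sequence if op.kind == "substitute"}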

[Figure: a three-state FSM (states S1, S2, S3); each state has transitions labeled with delete (D), substitute (S), and insert (I) operations. One path through the FSM emits the sequence substitute, insert, substitute, delete, substitute, substitute.]
16
FSM
[Figure: one path through the FSM emitting insert, substitute, substitute, delete, substitute, substitute.]
This is for one edit operation sequence. There are many other valid edit sequences, e.g.:
  substitute, delete, substitute, substitute, substitute, insert
  insert, substitute, delete, substitute, substitute, substitute
  substitute, substitute, delete, substitute, insert, substitute
17
FSM cont.
[Figure: Start and Stop states connected by ε-transitions to two parallel sub-automata, a Positive State Set and a Negative State Set.]
18
FSM transitions
[Figure: the full transition structure among states S1, S2, S3 within the Positive State Set and the Negative State Set, between Start and Stop; only the state labels survived the transcript.]
19
Parameterization
[Figure: a substitute transition from state S1 to S2, with the potential parameterized by whether the states lie in the positive or the negative state set.]
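
A sketch of what the figure parameterizes (my notation): each FSM transition, e.g. a substitute edit taken while moving from S1 to S2, receives a log-linear score

$$ \psi(e_t, s_{t-1} \to s_t) \;=\; \exp\!\big( \theta^\top f(e_t, s_{t-1}, s_t, S, T) \big), $$

so the same edit operation can be weighted differently depending on whether the transition lies in the positive or the negative state set.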
20
Training using EM
Jensen's inequality (slide equations lost in the transcript)
E-step
M-step: using L-BFGS
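
A reconstruction of the derivation these labels refer to (my notation; the slide's equations did not survive the transcript): with the edit sequence e treated as latent, Jensen's inequality lower-bounds the log-likelihood,

$$ \log p(y \mid S, T; \theta) = \log \sum_e p(y, e \mid S, T; \theta) \;\ge\; \sum_e q(e) \log \frac{p(y, e \mid S, T; \theta)}{q(e)}, $$

with equality when $q(e) = p(e \mid y, S, T; \theta)$. E-step: set q to this posterior under the current parameters. M-step: maximize the resulting expected complete-data log-likelihood in θ, a smooth problem solved with L-BFGS.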
21
Features for RTE
  • Substitution
  • Same -- Word/WordWithNE/Lemma/NETag/Verb/Noun/Adj/Adv/Other
  • Sub/MisSub -- Punct/Stopword/ModalWord
  • Antonym/Hypernym/Synonym/Nombank/Country
  • Different NE/POS
  • Unrelated words
  • Delete
  • Stopword/Punct/NE/Other/Polarity/Quantifier/Likelihood/Conditional/If
  • Insert
  • Stopword/Punct/NE/Other/Polarity/Quantifier/Likelihood/Conditional/If
  • Tree
  • RootAligned/RootAlignedSameWord
  • Parent, Child, DepRel triple match/mismatch
  • Date/Time/Numerical
  • DateMismatch, hasNumDetMismatch, normalizedFormMismatch
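
To make the substitution features concrete, here is a hedged sketch (the helper interfaces `thesaurus` and `ne_tag` are hypothetical stand-ins for WordNet-style and NE-tagger lookups, not the authors' code):

def substitution_features(src_tok, tgt_tok, thesaurus, ne_tag):
    """Binary features for a substitute edit aligning src_tok to tgt_tok."""
    feats = {}
    if src_tok.lower() == tgt_tok.lower():
        feats["SameWord"] = 1.0        # Same -- Word
    if thesaurus.synonyms(src_tok, tgt_tok):
        feats["Synonym"] = 1.0         # hypothetical WordNet-style lookup
    if thesaurus.antonyms(src_tok, tgt_tok):
        feats["Antonym"] = 1.0
    if ne_tag(src_tok) != ne_tag(tgt_tok):
        feats["DifferentNE"] = 1.0     # Different NE
    if not feats:
        feats["UnrelatedWords"] = 1.0  # Unrelated words
    return feats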

22
Tree-edit CRFs for Textual Entailment
  • Preliminary results
  • Trained on RTE2 dev, tested on RTE2 test
  • Model taken after 50 EM iterations
  • acc = 0.6275, MAP = 0.6407
  • RTE2 official results
  • Hickl (LCC): acc = 0.7538, MAP = 0.8082
  • Tatu (LCC): acc = 0.7375, MAP = 0.7133
  • Zanzotto (Milan & Rome): acc = 0.6388, MAP = 0.6441
  • Adams (Dallas): acc = 0.6262, MAP = 0.6282

23
Comparison QG vs. TE-CRFs
QG
  • Generative
  • Directed, Bayes net, local
  • Allows arbitrary swapping in alignment
  • Allows limited use of semantic features (lexical-semantic log-linear model in the mixture model)
  • Computationally cheaper
TE-CRFs
  • Discriminative
  • Undirected, CRFs, global
  • No swapping: can't do substitutions that involve swapping (can be extended; see future work)
  • Allows arbitrary semantic features
  • Computationally more expensive

24
Future work
QG
  • Generative → train discriminatively using Noah's contrastive estimation
  • Directed, Bayes net, local → higher-order Markovization
  • Allows arbitrary swapping in alignment
  • Allows limited use of semantic features (lexical-semantic log-linear model in the mixture model)
  • Computationally cheaper
  • Run RTE experiments
TE-CRFs
  • Discriminative
  • Undirected, CRFs, global
  • No swapping → constrained unordered trees; fancy edit operations (e.g. substitute sub-trees)
  • Allows arbitrary semantic features
  • More expensive
  • Run QA and MT alignment experiments

25
Thank you!
  • Questions?