Title: Human Judgements in Parallel Treebank Alignment
1Human Judgements in Parallel Treebank Alignment
- Martin Volk, Torsten Marek, Yvonne Samuelsson
- University of Zurich and Stockholm University
- volk_at_cl.uzh.ch
2English Syntax Tree
5DE-EN Alignment
6SMULTRON
- Stockholm MULtilingual TReebank
- 1000 sentences in 3 languages (DE-EN-SV)
- 500 from Jostein Gaarder's Sophie's World (7 500 tokens, 14 tokens/sentence) and
- 500 from Economy texts (11 000 tokens, 22 tokens/sentence)
- ABB Quarterly report
- Rainforest Alliance Banana Certification Program
- SEB Annual report
- Released January 2008: www.ling.su.se/dali/research/smultron/index.htm
7German Annotation
8German sentence flat annotation
9German sentence deepened
10English Annotation
11English Syntax Tree
12English annotation
- Follows the Penn Treebank guidelines
- Slower annotation because of
- insertion of traces
- secondary edges
- deeper trees
14Tree Alignment
15- Sentence alignment
- Word alignment
- input for Statistical MT
- Phrase alignment
- linguistically motivated phrases
- input for Example-based MT
16Alignment Example
17Tools for Parallel Treebanks
- creating and editing trees
- from mono-lingual treebanks
- PoS-taggers, chunkers, editor, tree-enricher
- aligning phrases
- use of word alignment tools
- tree alignment editor → Stockholm TreeAligner
- searching across languages
- TIGER-Search for parallel treebanks → Stockholm TreeAligner
18Guidelines for Alignment
- Align words and phrases that represent the same meaning and could serve as translation units in an MT system.
- Align as many words and phrases as possible.
- Distinguish between exact and approximate alignments.
- 1:n word / phrase alignments are allowed, but not m:n word / phrase alignments.
- m:n sentence alignments are allowed.
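The 1:n versus m:n restriction can be checked mechanically. The following is a minimal sketch, assuming that the alignment links of one tree pair are available as (source node, target node) pairs; the node identifiers and the function name are hypothetical and not taken from the Stockholm TreeAligner.

```python
from collections import Counter


def find_mn_links(links):
    """Return the links that take part in an m:n configuration.

    `links` is an iterable of (source_id, target_id) pairs for one
    tree pair. A link is flagged when its source node has several
    partners AND its target node has several partners.
    """
    links = set(links)
    src_degree = Counter(s for s, _ in links)
    tgt_degree = Counter(t for _, t in links)
    return sorted(
        (s, t) for s, t in links
        if src_degree[s] > 1 and tgt_degree[t] > 1
    )


# 1:n is allowed: one German node linked to two English nodes.
ok_links = [("de_1", "en_1"), ("de_1", "en_2")]
# m:n is not: de_1 and en_1 both have more than one partner.
bad_links = [("de_1", "en_1"), ("de_1", "en_2"), ("de_2", "en_1")]

print(find_mn_links(ok_links))
print(find_mn_links(bad_links))
```

Any connected group of links with several nodes on both sides contains at least one link whose two end points both have more than one partner, so an empty result means that only 1:1 and 1:n groups are present.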
19Examples
- Do not align
- die Verwunderung über das Leben
- their astonishment at the world
- Do align
- was für eine seltsame Welt
- what an extraordinary world
20Specific rules
- a pronoun in one language shall never be aligned with a full noun in the other
- names are aligned regardless of spelling, unless the name is changed (fiction)
- ignore number/case but not voice
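The pronoun rule lends itself to an automatic check on word alignments. This is a sketch only, assuming that German tokens carry STTS tags and English tokens carry Penn Treebank tags; the tag sets below are illustrative subsets chosen for this example, and the function name is hypothetical.

```python
# Illustrative subsets of the tag sets, not complete lists.
DE_PRONOUN_TAGS = {"PPER", "PDS", "PIS", "PRELS"}  # STTS
DE_NOUN_TAGS = {"NN", "NE"}                        # STTS
EN_PRONOUN_TAGS = {"PRP", "WP"}                    # Penn Treebank
EN_NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}        # Penn Treebank


def violates_pronoun_rule(de_tag, en_tag):
    """True if a pronoun on one side is aligned with a full noun on the other."""
    de_pron_en_noun = de_tag in DE_PRONOUN_TAGS and en_tag in EN_NOUN_TAGS
    de_noun_en_pron = de_tag in DE_NOUN_TAGS and en_tag in EN_PRONOUN_TAGS
    return de_pron_en_noun or de_noun_en_pron


# Hypothetical word alignments given as (German tag, English tag).
print(violates_pronoun_rule("PPER", "NNP"))  # pronoun vs. name
print(violates_pronoun_rule("NN", "NN"))     # noun vs. noun
```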
21Exact vs approximate alignment
- best vs. second-best translation
- an acronym in one language shall be aligned as approximate (fuzzy) with a spelled-out term in the other
- PT vs. Power Technologies
- difficult distinctions
- einer der ersten Tage im Mai vs. early May
22Related Research
- Blinker project (Melamed)
- Prague Czech-English Treebank
- Example-based MT in Dublin
- Linköping English-Swedish Treebank
23Experiment
- 12 students were asked to align 20 tree pairs (DE-EN)
- 10 tree pairs from Sophie's World
- 10 tree pairs from Economy text
- advanced CL students
- received
- short introduction
- the written guidelines
24Gold Standard Alignment (DE-EN)
Text             word-word  word-word  phrase-phrase  phrase-phrase
                 exact      approx.    exact          approx.
10 sent. Sophie  75         3          46             12
  (sum)          78                    58
10 sent. Econ    159        19         62             9
  (sum)          178                   71
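The second row for each text is the sum of the exact and approximate counts. The short sketch below recomputes these sums and also derives the share of approximate alignments, which the slide does not state; the dictionary layout and the function name are assumptions made for illustration.

```python
# Counts copied from the gold standard table: (exact, approximate).
GOLD = {
    ("Sophie", "word-word"): (75, 3),
    ("Sophie", "phrase-phrase"): (46, 12),
    ("Econ", "word-word"): (159, 19),
    ("Econ", "phrase-phrase"): (62, 9),
}


def summarize(counts):
    """Map each (text, level) to (total alignments, share of approximate ones)."""
    summary = {}
    for key, (exact, approx) in counts.items():
        total = exact + approx
        summary[key] = (total, approx / total)
    return summary


for (text, level), (total, approx_share) in summarize(GOLD).items():
    print(f"{text:6} {level:13} total={total:3d} approx={approx_share:.1%}")
```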
25Experiment Results
- The students created
- a huge variety in number of alignments
- Sophie part from 47 to 125 (ø 94.3)
- Econ part from 62 to 259 (ø 186.9)
- → the 3 students with the lowest numbers were non-native speakers of German
- → 1 student had misunderstood the task
26Experiment Results
- The remaining 8 students had a high overlap with the gold standard (Recall)
- Sophie part from 48 to 81 (ø 68.7)
- Econ part from 66 to 89 (ø 75.5)
- Precision
- Sophie part from 81 to 97 (ø 89.1)
- Econ part from 78 to 94 (ø 88.2)
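Recall and precision against a gold standard can be computed from sets of alignment links. The following is a minimal sketch, assuming that each link is a (source node, target node) pair and that the alignment type (exact or approximate) is ignored when matching; the slides do not say how the type was treated in the actual evaluation, and the example links are invented.

```python
def precision_recall(student_links, gold_links):
    """Compare one student's links with the gold standard links.

    precision = correct links / all links the student created
    recall    = correct links / all links in the gold standard
    """
    student, gold = set(student_links), set(gold_links)
    correct = len(student & gold)
    precision = correct / len(student) if student else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Invented toy data: 4 gold links, 3 student links, 2 of them correct.
gold = [("de_1", "en_1"), ("de_2", "en_2"), ("de_3", "en_3"), ("de_4", "en_4")]
student = [("de_1", "en_1"), ("de_2", "en_2"), ("de_3", "en_5")]

p, r = precision_recall(student, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

A student who creates few but safe links scores high on precision and low on recall, which matches the pattern reported above.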
27Discrepancies
- students sometimes aligned a word (or some words) with a node.
- e.g. the word natürlich to the phrase of course
- students sometimes aligned a German verb group with a single verb form in English
- e.g. ist zurückzuführen vs. reflecting
28Discrepancies
- based on different grammatical forms
- a definite singular NP in German with an indefinite plural NP in English
- der Umsatz vs. revenues
- a German genitive NP with a PP in English
- der beiden Divisionen vs. of the two divisions
29Missed by all students
- alignment of German word to empty token in English
- wenn sie die Hand ausstreckte vs.
- herself shaking hands
31Conclusions
- Our alignment guidelines are sufficient for a core of clear alignment decisions.
- Needed
- Better alignment rules with concrete examples.
- Better support tools (consistency checking).
- The distinction between exact alignment and approximate alignment is very tricky.
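One possible form of consistency checking is to flag expression pairs that were aligned as exact in one place and as approximate in another. This is a sketch under assumptions: the triple format, the type labels and the example annotations are invented for illustration and do not describe an existing tool.

```python
from collections import defaultdict


def inconsistent_pairs(alignments):
    """Return (German, English) pairs that occur with more than one alignment type.

    `alignments` is an iterable of (de_text, en_text, kind) triples,
    where kind is "exact" or "approx". Case is ignored when comparing.
    """
    kinds = defaultdict(set)
    for de_text, en_text, kind in alignments:
        kinds[(de_text.lower(), en_text.lower())].add(kind)
    return sorted(pair for pair, seen in kinds.items() if len(seen) > 1)


# Invented annotations; the types assigned here are not from the gold standard.
annotations = [
    ("der Umsatz", "revenues", "approx"),
    ("Der Umsatz", "revenues", "exact"),
    ("natürlich", "of course", "exact"),
]

print(inconsistent_pairs(annotations))
```

Such a check cannot decide which type is right, but it points the annotator to the places where the exact versus approximate distinction was applied unevenly.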
32Thank You for Your Attention!
33Applications of Parallel Treebanks
- For the Translator
- corpus for translation studies
- search tools needed
- For the Computational Linguist
- input for Example-based Machine Translation
- evaluation corpus for word, phrase or clause alignment
- training corpus for transfer rules
34Alignment Example
35Parallel Treebanking
- DE sentence → ANNOTATE: PoS tagger (STTS), Chunker (TIGER) → flat DE tree → Deepening → DE tree
- SV sentence → PoS tagger (SUC) → STTS conversion → ANNOTATE: Chunker (SWE-TIGER) → flat SV tree → Deepening, back conversion → SV tree
- DE tree and SV tree → phrase alignment