Title: Results
1. Results (see proceedings for the full table)
      Ar   Ch   Cz   Da   Du   Ge   Ja   Po   Sl   Sp   Sw   Tu   Tot  SD   Bu
McD  66.9 85.9 80.2 84.8 79.2 87.3 90.7 86.8 73.4 82.3 82.6 63.2 80.3  8.4 87.6
Niv  66.7 86.9 78.4 84.8 78.6 85.8 91.7 87.6 70.3 81.3 84.6 65.7 80.2  8.5 87.4
ON   66.7 86.7 76.6 82.8 77.5 85.4 90.6 84.7 71.1 79.8 81.8 57.5 78.4  9.4 85.2
Rie  66.7 90.0 67.4 83.6 78.6 86.2 90.5 84.4 71.2 77.4 80.7 58.6 77.9 10.1  0.0
Sag  62.7 84.7 75.2 81.6 76.6 84.9 90.4 86.0 69.1 77.7 82.0 63.2 77.8  9.0  0.0
Che  65.2 84.3 76.2 81.7 71.8 84.1 89.9 85.1 71.4 80.5 81.1 61.2 77.7  8.7 86.3
Cor  63.5 79.9 74.5 81.7 71.4 83.5 90.0 84.6 72.4 80.4 79.7 61.7 76.9  8.5 83.4
Av   59.9 78.3 67.2 78.3 70.7 78.6 85.9 80.6 65.2 73.5 76.4 56.0    -    - 80.0
SD    6.5  8.8  8.9  5.5  6.7  7.5  7.1  5.8  6.8  8.4  6.5  7.7    -    -  6.3
(LAS in %. Languages: Ar=Arabic, Ch=Chinese, Cz=Czech, Da=Danish, Du=Dutch, Ge=German, Ja=Japanese, Po=Portuguese, Sl=Slovene, Sp=Spanish, Sw=Swedish, Tu=Turkish, Bu=Bulgarian; Tot=average over the twelve left-hand languages, SD=standard deviation across them. Rows: top-scoring systems abbreviated by first author, e.g. McD=McDonald et al., Niv=Nivre et al.; Av/SD=per-language mean and standard deviation over all participating systems.)
2. Results (continued)
- Good parsers are good on all languages
- Best overall scores achieved by two very different approaches
- Little difference in ranking (mostly just +/-1) when using UAS or label accuracy instead of LAS, except:
  - two groups with special emphasis on DEPREL values gain 2-3 ranks for label accuracy
  - one group with a bug in HEAD assignment drops 4 ranks for UAS
- Very little difference in scores as well as rankings when scoring all tokens (i.e. including punctuation)
- But some outliers in ranking for individual languages:
  - Turkish: Johansson and Nugues +7, Yuret +7, Riedel et al. -5
  - Dutch: Canisius et al. +6, Schiehlen and Spranger +8
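The metric differences discussed above (LAS vs. UAS vs. label accuracy) can be made concrete with a small scorer. This is a sketch, not the official evaluation script: it assumes the 10-column CoNLL-X format (HEAD in column 7, DEPREL in column 8, blank line between sentences) and scores all tokens, whereas the official scorer excluded punctuation (which, as noted above, made little difference). File paths and function names are illustrative.

```python
def read_rows(path):
    """Yield a (head, deprel) pair for every token line in a CoNLL-X file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:          # blank line = sentence boundary
                continue
            cols = line.split("\t")
            yield cols[6], cols[7]   # HEAD, DEPREL

def score(gold_path, pred_path):
    """Compute LAS, UAS and label accuracy over all tokens."""
    total = uas = las = lacc = 0
    for (gh, gd), (ph, pd) in zip(read_rows(gold_path), read_rows(pred_path)):
        total += 1
        if gh == ph:              # correct head
            uas += 1
            if gd == pd:          # correct head and label
                las += 1
        if gd == pd:              # correct label, head ignored
            lacc += 1
    return {"LAS": las / total, "UAS": uas / total, "label accuracy": lacc / total}
```

A parser with good labels but buggy heads keeps its label accuracy while losing LAS and UAS, which is exactly the kind of rank shift described above.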
3. Analysis: easy data sets
            Ar    Ch    Cz    Da    Du    Ge    Ja    Po    Sl    Sp    Sw    Tu    Bu
Top score 66.9  90.0  80.2  84.8  79.2  87.3  91.7  87.6  73.4  82.3  84.6  65.7  87.6
Av. score 59.9  78.3  67.2  78.3  70.7  78.6  85.9  80.6  65.2  73.5  76.4  56.0  80.0
Tokens(k)   54   337  1249    94   195   700   151   207    29    89   191    58   190
Tok./tree 37.2   5.9  17.2  18.2  14.6  17.8   8.9  22.8  18.7  27.0  17.3  11.5  14.8
DEP./CPOS  1.9   6.3   6.5   5.2   2     .88   .35   3.7   2.3   1.4   1.5   1.8   1.6
DEP./POS   1.4   .28   1.2   2.2   .09   .88   .09   2.6   .89   .55   1.5   .83   .34
H. prec.  82.9  24.8  50.9  75.0  46.5  50.9   8.9  60.3  47.2  60.8  52.8   6.2  62.9
H. foll.  11.6  58.2  42.4  18.6  44.6  42.7  72.5  34.6  46.9  35.1  40.7  80.4  29.2
np trees  11.2   0.0  23.2  15.6  36.4  27.8   5.3  18.9  22.2   1.7   9.8  11.6   5.4
new FOR.  17.3   9.3   5.2  18.1  20.7   6.5   .96  11.6  22.0  14.7  18.0  41.4  14.5
new LEM.   4.3   n/a   1.8   n/a  15.9   n/a   n/a   7.8   9.9   9.7   n/a  13.2   n/a
(Tokens(k)=training-set size in thousands of tokens; Tok./tree=average tokens per tree; DEP./CPOS and DEP./POS=presumably the ratio of distinct DEPREL values to distinct CPOSTAG/POSTAG values; H. prec./H. foll.=% of tokens whose head precedes/follows them; np trees=% of non-projective trees; new FOR./new LEM.=% of test-set forms/lemmas unseen in training; n/a=no lemma annotation.)
4. Analysis: difficult data sets
(same table as slide 3)
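A minimal sketch of how some of the data-set statistics above could be computed from a CoNLL-X treebank. It assumes the reading of the table given in the legend (H. prec./H. foll. as the percentage of tokens whose head precedes/follows them, np trees as the percentage of non-projective trees); function names are illustrative, and the percentages are taken over all tokens, so head-precedes and head-follows do not sum to 100 because root-attached tokens are excluded.

```python
def read_sentences(path):
    """Yield one list of HEAD indices per sentence from a CoNLL-X file."""
    heads = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                if heads:
                    yield heads
                heads = []
            else:
                heads.append(int(line.split("\t")[6]))
    if heads:
        yield heads

def is_projective(heads):
    """heads[i] is the head of token i+1; 0 is the artificial root.
    A tree is non-projective iff two of its arcs cross."""
    spans = [(min(i + 1, h), max(i + 1, h)) for i, h in enumerate(heads)]
    return not any(a < c < b < d for a, b in spans for c, d in spans)

def stats(sentences):
    """Tokens per tree, head direction and non-projectivity percentages."""
    tokens = prec = foll = np_trees = trees = 0
    for heads in sentences:
        trees += 1
        if not is_projective(heads):
            np_trees += 1
        for i, h in enumerate(heads, start=1):
            tokens += 1
            if h == 0:
                continue      # root-attached tokens have no lexical head
            if h < i:
                prec += 1     # head precedes the dependent
            else:
                foll += 1     # head follows the dependent
    return {
        "tok/tree": tokens / trees,
        "head precedes %": 100 * prec / tokens,
        "head follows %": 100 * foll / tokens,
        "np trees %": 100 * np_trees / trees,
    }
```

For example, `stats(read_sentences("train.conll"))` (hypothetical file name) would reproduce the Tok./tree, H. prec., H. foll. and np trees rows for one treebank.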
5. The future: Using the resources: Parsed test sets
- Parser combination
- Check test cases where the parser majority disagrees with the treebank
- Possible reasons:
  - Challenge for current parser technology
  - Treebank annotation wrong
  - Conversion wrong
  - (Non-sentence)
- For the German test data, we checked cases where 17 out of 18 parsers assigned DEPREL X but the gold standard had DEPREL Y (11 cases):
  - 4: the challenge of distinguishing PP complements from adjuncts
  - 1: treebank PoS tag wrong
  - 1: non-sentence
  - 5: either treebank or conversion wrong (to be investigated)
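The majority-vs-gold check described above can be sketched as follows. This is an illustration of the idea, not the organizers' actual script; the data layout (one label list per parser) and the threshold are hypothetical.

```python
from collections import Counter

def majority_disagreements(gold_deprels, parser_deprels, threshold):
    """Find tokens where many parsers agree on a DEPREL that differs from gold.

    gold_deprels: list of gold-standard labels, one per token.
    parser_deprels: one list of predicted labels per parser (same token order).
    Returns (token_index, majority_label, gold_label, vote_count) tuples
    for tokens where >= threshold parsers chose the same label != gold.
    """
    hits = []
    for i, gold in enumerate(gold_deprels):
        votes = Counter(preds[i] for preds in parser_deprels)
        label, count = votes.most_common(1)[0]
        if label != gold and count >= threshold:
            hits.append((i, label, gold, count))
    return hits
```

With 18 parsers and `threshold=17`, this flags exactly the kind of case examined for German, where the disagreement often points to an annotation or conversion problem rather than a parser error.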
6. The future: Using the resources: Parsers
- Collaborate with treebank providers or other treebank experts
  - to semi-automatically enlarge or improve the treebanks
  - to use automatically parsed texts as input for other NLP projects
- Please let us know if you are willing to help
  - Arabic: Otakar Smrž, Jan Hajič (also general feedback)
  - Bulgarian: Petya Osenova, Kiril Simov
  - Czech: Jan Hajič
  - German: Martin Forst, Michael Schiehlen, Kristina Spranger
  - Portuguese: Eckhard Bick
  - Slovene: Tomaž Erjavec
  - Spanish: Toni Martí, Roser Morante
  - Swedish: Joakim Nivre
  - Turkish: Gülşen Eryiğit
  - Gertjan van Noord (Dutch Alpino treebank) is interested in general feedback
7. The future: Improving the resources
- http://nextens.uvt.nl/conll/ is a static web page
- But hopefully many other people will continue this line of research
- They need a platform to exchange (information about):
  - experience with / bug reports about / patches for treebanks
  - treebank conversion and validation scripts (esp. head tables!)
  - other new/improved tools (e.g. visualization, analysis)
  - details of experiments on new treebanks (e.g. training/test split)
  - predictions on test sets by new/improved parsers
  - ...
- SIGNLL agreed to provide such a platform (hosted at Tilburg University)
- http://depparse.uvt.nl/: a Wiki site where everybody is welcome to contribute!
8. Acknowledgements: Many thanks to
- Jan Hajič for granting the PDT/PADT temporary licenses for CoNLL-X and talking to LDC about it
- Christopher Cieri for arranging distribution through LDC and Tony Castelletto for handling the distribution
- Otakar Smrž for valuable help during the conversion of PADT
- the SDT people for granting the special license for CoNLL-X and Tomaž Erjavec for converting the SDT for us
- Matthias Trautner Kromann and assistants for creating the DDT and releasing it under the GNU General Public License
- Joakim Nivre, Johan Hall and Jens Nilsson for the conversion of DDT to Malt-XML, for the conversion of the original Talbanken to Talbanken05, and for making it freely available for research purposes
- Joakim Nivre again for prompt and proper responses to all our questions
- Bilge Say and Kemal Oflazer for granting the Metu-Sabanci license for CoNLL-X and answering questions
- Gülşen Eryiğit for making many corrections to Metu-Sabanci and discussing some aspects of the conversion
9. Acknowledgements (continued)
- the TIGER team (esp. Martin Forst) for allowing us to use the treebank for the shared task
- Yasuhiro Kawata, Julia Bartels and colleagues from Tübingen University for the construction of the Japanese Verbmobil treebank
- Sandra Kübler for providing the Japanese Verbmobil data and granting the special license for CoNLL-X
- Diana Santos, Eckhard Bick and other Floresta sintá(c)tica project members for creating the treebank and making it publicly available, for answering many questions about the treebank (Diana and Eckhard), for correcting problems and making new releases (Diana), and for sharing scripts and explaining the head rules implemented in them (Eckhard)
- Jason Baldridge for useful discussions and Ben Wing for independently reporting problems which Diana then fixed
- Gertjan van Noord and the other people at Groningen University for creating the Alpino Treebank and releasing it for free
- Gertjan van Noord for answering all our questions and for providing extra test material
- Antal van den Bosch for help with the memory-based tagger
10. Acknowledgements (continued)
- Academia Sinica for granting the temporary license for the Sinica treebank for CoNLL-X
- Keh-Jiann Chen for answering our questions about the Sinica treebank
- Montserrat Civit and Toni Martí for allowing us to use Cast3LB for CoNLL-X and supplying the head table and function mapping
- Kiril Simov and Petya Osenova for allowing us to use the BulTreeBank for CoNLL-X
- Svetoslav Marinov, Atanas Chanev, Kiril Simov and Petya Osenova for converting the BulTreeBank
- Dan Bikel for making the Randomized Parsing Evaluation Comparator
- SIGNLL for having a shared task
- Erik Tjong Kim Sang for posting the Call for Papers
- Lluís Màrquez and Xavier Carreras for sharing their experience from organizing the two previous shared tasks
- Lluís Màrquez for being a very helpful CoNLL organizer
- All participants, including those who could not submit results or cannot be here today
- It has been a pleasure working with you!
11. Future research
- More result analysis:
  - baseline
  - correlation between parsing approach and types of errors
  - importance of individual features and algorithm details
- Repeat experiments on improved data:
  - POSTAG and FEATS for Talbanken05, LEMMA for Talbanken05 (DDT)
  - LEMMA and FEATS for the new version of TIGER, POSTAG for TIGER
  - better DEPREL for Cast3LB
  - larger treebanks: having several good parsers facilitates annotation of more text!
  - better quality: check cases where parsers and treebank disagree!
12. Future research (continued)
- Repeat experiments with other parsers:
  - http://nextens.uvt.nl/conll/post_task_data.html
- Repeat experiments with additional external information:
  - large-scale distributional data harvested from the internet
- Similar experiments but including secondary relations
- Similar experiments on other data:
  - Hebrew treebank
  - SUSANNE treebank (English)
  - Kyoto University Corpus, ATR corpus (Japanese)
  - Szeged Corpus (Hungarian)
  - see the list in the Wikipedia article on treebanks (please contribute!)
- Integrate automatic tokenization, morphological analysis and POS tagging