Title: Developing a German grammar for analysis and generation using OpenCCG
1Developing a German grammar for analysis and
generation using OpenCCG
- Ciprian Gerstenberger
- University of Saarland
- IGK Colloquium January 13th 2005
2Outline
- NLP environments: a comparison
- The choice: OpenCCG
- The formalism: MMCCG
- The German grammar
- Future work
3Dialogue systems
- building dialogue systems → linguistic resources
- linguistic resources → tools for developing and maintaining them
- wide range of different NLP environments
- Which is the most appropriate environment for our purposes?
4NLP environments for dialogue systems
- General requirements
- both for analysis and generation
- multi-lingual
- easy domain reconfigurability
- Requirements for NLG
- realization of contextually sensitive utterances
- linguistically motivated control over flexible
sentence realization
5NLP environments for dialogue systems
- Technical requirements
- freely available
- well documented
- offering support when needed
- freely available resources (for German)
- efficient
- platform independent
6NLP environments
- KPML (Lisp): Systemic-Functional Grammar (SFG)
- OpenCCG (Java): Multi-Modal Combinatory Categorial Grammar (MMCCG)
- Babel (Prolog): Head-Driven Phrase Structure Grammar (HPSG)
- LKB (Lisp): Head-Driven Phrase Structure Grammar (HPSG)
- XLE (C): Lexical Functional Grammar (LFG)
- XTAG (Lisp): Tree Adjoining Grammar (TAG)
- XDG (Oz): Topological Dependency Grammar (TDG)
7NLP environments Babel
- Babel-System (S. Müller)
- implementing HPSG
- Prolog
- only analysis, no generation
- multi-lingual (?)
- resources for German grammar with good coverage
- freely available
- documentation
- support (?)
8NLP environments LKB
- LKB
- implementing HPSG
- Lisp
- multi-lingual
- both analysis and generation
- but resources for German not usable for generation
- freely available
- documentation
- support (?)
9NLP environments XTAG
- XTAG
- implementing TAG
- Lisp
- both analysis and generation
- multi-lingual
- resources for German (DFKI ?)
- freely available
- documentation
- support (?)
10NLP environments XDG
- XDG
- implementing TDG
- Oz
- only analysis (generation as dependency parsing using TAGs)
- multi-lingual (?)
- resources for German (toy grammars)
- freely available
- documentation (?)
- support
11NLP environments KPML
- KOMET-Penman Multilingual Linguistic resource development
- implementing Systemic-Functional Grammar (SFG)
- Lisp
- multi-lingual
- flexible generation
- good sentence realization control
- only for generation, no parsing
- resources for German grammar with good coverage
- freely available
- documentation and support
12NLP environments XLE
- Xerox Linguistic Environment
- implementing LFG
- C and Tcl/Tk
- multi-lingual
- both analysis and generation
- resources for German (not freely available)
- documentation
- support
- not freely available
13NLP environments OpenCCG
- OpenCCG
- implementing Multi-Modal Combinatory Categorial Grammar (MMCCG)
- open source Java-based NLP library
- both analysis and generation
- multi-lingual
- no resources for German, but grammars for English
- freely available
- documentation
- support
14NLP environments The Choice
- OpenCCG
- Java-based NLP library → platform independent
- analysis and generation → uniform grammar resources
- multi-lingual → extendable
- used in several other projects: FLIGHTS, COMIC, COSY
- supporting output format for TTS (e.g. APML)
- optimized sentence realization
- flexible generation
- sentence realization control
15Basic formalism CCG
- Combinatory Categorial Grammar
- lexicalized grammar formalism
- lexical items are assigned syntactic categories
- combinatory rules
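The two most basic combinatory rules, forward and backward application, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Slash` class, the toy lexicon and the English-style category assignments are hypothetical and are not taken from the grammar presented here or from the OpenCCG API. Case and agreement are ignored.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Slash:
    """A complex category: result/arg (looks right) or result\\arg (looks left)."""
    result: "Cat"
    direction: str  # "/" or "\\"
    arg: "Cat"


Cat = Union[str, Slash]  # atomic categories are plain strings such as "np"


def forward_apply(left: Cat, right: Cat) -> Optional[Cat]:
    """Forward application: X/Y  Y  =>  X"""
    if isinstance(left, Slash) and left.direction == "/" and left.arg == right:
        return left.result
    return None


def backward_apply(left: Cat, right: Cat) -> Optional[Cat]:
    """Backward application: Y  X\\Y  =>  X"""
    if isinstance(right, Slash) and right.direction == "\\" and right.arg == left:
        return right.result
    return None


def combine(left: Cat, right: Cat) -> Optional[Cat]:
    """Try forward application first, then backward application."""
    result = forward_apply(left, right)
    return result if result is not None else backward_apply(left, right)


# Hypothetical toy lexicon: every word is assigned a syntactic category.
LEXICON = {
    "der": Slash("np", "/", "n"),
    "die": Slash("np", "/", "n"),
    "Hund": "n",
    "Katze": "n",
    "sieht": Slash(Slash("s", "\\", "np"), "/", "np"),  # (s\np)/np
}

# Derivation of "Der Hund sieht die Katze."
subject = combine(LEXICON["der"], LEXICON["Hund"])     # np/n  n        => np
obj = combine(LEXICON["die"], LEXICON["Katze"])        # np/n  n        => np
verb_phrase = combine(LEXICON["sieht"], obj)           # (s\np)/np  np  => s\np
sentence = combine(subject, verb_phrase)               # np  s\np       => s
print(sentence)
```

Because all combinatory information sits in the lexical categories, the rule inventory stays tiny; that is what "lexicalized" means in this sketch.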
16MMCCG
- Multi-Modal Combinatory Categorial Grammar
- refining CCG by introducing means of controlling the application of combinatory rules
- specifying modes on category forming operators (slashes)
- making application of rules dependent on the slash mode
- four basic modes governing different levels of associativity and permutativity
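How slash modes gate rule application can be sketched as follows. This is an illustrative Python sketch, not OpenCCG code: the `Slash` class and function names are hypothetical, the mode symbols `*`, `^`, `x` and `.` follow the usual ASCII notation for the four basic modes, and letting the result slash inherit the mode of the secondary functor is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Slash:
    """A complex category whose slash carries a mode."""
    result: "Cat"
    direction: str  # "/" or "\\"
    mode: str       # "*", "^", "x" or "."
    arg: "Cat"


Cat = Union[str, Slash]

# "*": application only; "^": also associative (harmonic composition);
# "x": also permutative (crossed composition); ".": all rules.
HARMONIC_MODES = {"^", "."}
CROSSED_MODES = {"x", "."}


def forward_apply(left: Cat, right: Cat) -> Optional[Cat]:
    """X/Y  Y  =>  X, available with every slash mode."""
    if isinstance(left, Slash) and left.direction == "/" and left.arg == right:
        return left.result
    return None


def forward_compose(left: Cat, right: Cat) -> Optional[Cat]:
    """X/Y  Y|Z  =>  X|Z, licensed only if both slash modes allow it."""
    if not (isinstance(left, Slash) and isinstance(right, Slash)):
        return None
    if left.direction != "/" or left.arg != right.result:
        return None
    # Same slash direction: harmonic composition; otherwise crossed composition.
    harmonic = right.direction == "/"
    allowed = HARMONIC_MODES if harmonic else CROSSED_MODES
    if left.mode not in allowed or right.mode not in allowed:
        return None
    # Assumption of this sketch: the result keeps the secondary functor's mode.
    return Slash(left.result, right.direction, right.mode, right.arg)


# Application-only slashes block composition ...
blocked = forward_compose(Slash("s", "/", "*", "vp"), Slash("vp", "/", "*", "np"))
# ... while associative slashes license it.
licensed = forward_compose(Slash("s", "/", "^", "vp"), Slash("vp", "/", "^", "np"))
```

A lexicon writer can therefore decide per lexical item how much word-order freedom it contributes, instead of switching a rule on or off for the whole grammar.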
17Example
- Der Hund sieht die Katze.
18Example (cont.)
- Der Hund sieht die Katze.
19Developing a German Grammar
- joint work with Magdalena Wolska (DIALOG Project)
- Desiderata
- uniform resources for analysis and generation
- covering all phenomena in our domains
- achieving more generality in the grammar than is required by the phenomena encountered in our (relatively small) corpora
20Phenomena
- Some phenomena in German
- agreement
- position of the finite verb
- Topological Fields: controlling the Vorfeld
- complex sentences
- ambiguity
- controlling sentence realization
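Agreement, the first phenomenon in the list above, is commonly handled by unifying feature structures. The following Python sketch shows the idea for determiner-noun agreement. The flat dictionaries, the feature names and the deliberately incomplete lexical entries (one reading per determiner) are assumptions for illustration, not the feature inventory of the grammar presented here.

```python
from typing import Dict, Optional

Features = Dict[str, str]


def unify(a: Features, b: Features) -> Optional[Features]:
    """Merge two flat feature structures; return None on a value clash."""
    merged = dict(a)
    for key, value in b.items():
        if key in merged and merged[key] != value:
            return None  # conflicting values: agreement fails
        merged[key] = value
    return merged


# Hypothetical, simplified entries: only one reading per determiner is listed.
DETERMINERS = {
    "der": {"case": "nom", "gender": "masc", "number": "sg"},
    "den": {"case": "acc", "gender": "masc", "number": "sg"},
    "die": {"gender": "fem", "number": "sg"},  # case left underspecified
}
NOUNS = {
    "Hund": {"gender": "masc", "number": "sg"},
    "Katze": {"gender": "fem", "number": "sg"},
}

der_hund = unify(DETERMINERS["der"], NOUNS["Hund"])     # features are compatible
die_katze = unify(DETERMINERS["die"], NOUNS["Katze"])   # compatible, case stays open
den_katze = unify(DETERMINERS["den"], NOUNS["Katze"])   # gender clash
```

Leaving a feature out of an entry, as with the case of "die", keeps the entry compatible with several contexts, so one entry can serve both analysis and generation.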
21Lexical forms
22Agreement
23Agreement (cont.)
24Agreement/Complex sentences
25Clause types
- Verb-initial clauses
- yes/no questions
- Soll ich den Titel zu der Liste hinzufügen?
- alternative questions
- Möchtest Du Mozart oder Bach hören?
- imperatives
- Wähle das Album Californication von den Red Hot
Chili Peppers!
26Clause types (cont.)
- Verb-second clauses
- main declarative
- Der Titel wurde hinzugefügt.
- wh-question
- Welcher Künstler spielt Missunderstood?
27Clause types (cont.)
- Verb-final clauses
- subordinate clause
- Wenn Sie möchten, kann ich We Just Cant Get Enough CCG abspielen.
- relative clause
- Ich nehme aus den ersten vier Alben, die du hast, jeweils den ersten Song.
- complement clause
- Ich glaube, daß das Album Dangerously In Love heißt.
28Topological Fields
Controlling the Vorfeld occupation using flags
29Topological Fields (cont.)
Controlling the Vorfeld occupation using flags
30Analysis Ambiguities
Der Hund von dem traurigen Mann den ich sah
rennt.
31Analysis Ambiguities (cont.)
Das Kind rennt wenn der Hund rennt weil die Katze
rennt.
32Generation
- Sentence realization without control
33Generation (cont.)
- Sentence realization with control: fronted subject
34Generation (cont.)
- Sentence realization with control: fronted object
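The difference between uncontrolled and controlled realization can be sketched in Python for a verb-second clause. The function `realize_v2`, the role names and the `front` parameter are hypothetical stand-ins for the Vorfeld flags mentioned on the Topological Fields slides; they do not reproduce the grammar's actual mechanism or the OpenCCG realizer.

```python
from typing import Dict, List


def realize_v2(finite_verb: str, constituents: Dict[str, str], front: str) -> str:
    """Linearize a verb-second clause: Vorfeld, finite verb, Mittelfeld."""
    if front not in constituents:
        raise KeyError(f"unknown constituent role: {front}")
    vorfeld = constituents[front]
    # Everything not fronted stays in the Mittelfeld, in its given order.
    mittelfeld = [phrase for role, phrase in constituents.items() if role != front]
    text = " ".join([vorfeld, finite_verb, *mittelfeld])
    return text[0].upper() + text[1:] + "."


def realize_all(finite_verb: str, constituents: Dict[str, str]) -> List[str]:
    """Without control, every constituent is a candidate for the Vorfeld."""
    return [realize_v2(finite_verb, constituents, role) for role in constituents]


content = {"subject": "der Hund", "object": "die Katze"}

uncontrolled = realize_all("sieht", content)
fronted_subject = realize_v2("sieht", content, front="subject")
fronted_object = realize_v2("sieht", content, front="object")
print(fronted_subject)  # Der Hund sieht die Katze.
print(fronted_object)   # Die Katze sieht der Hund.
```

Without a flag the realizer has to return every word order the grammar licenses; with a flag the caller, for example a dialogue manager tracking what is given and what is new, selects exactly one.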
35Future Work (1)
- extending the grammar wrt the two domains currently modelled (MP3 and maths tutorial)
- (AP, NP, sentence, etc.) coordination
- complex NP (e.g. postmodifications)
- control and raising verbs
- particle verbs (Ich spiele den Song ab vs. Ich möchte den Song abspielen)
- Topological Fields: scrambling in the Mittelfeld
36Future Work (2)
- analysis: coping with partial input, ill-formed utterances
- generation: realizing elliptical output
- using a dynamic morphological module
- development of an ontology