Developing a German grammar for analysis and generation using OpenCCG

1
Developing a German grammar for analysis and
generation using OpenCCG
  • Ciprian Gerstenberger
  • University of Saarland
  • IGK Colloquium, January 13th 2005

2
Outline
  1. NLP environments: a comparison
  2. The choice: OpenCCG
  3. The formalism: MMCCG
  4. The German grammar
  5. Future work

3
Dialogue systems
  • building dialogue systems → linguistic resources
  • linguistic resources → tools for developing and
    maintaining
  • wide range of different NLP environments
  • → Which is the most appropriate environment for
    our purposes?

4
NLP environments for dialogue systems
  • General requirements
    • both for analysis and generation
    • multi-lingual
    • easy domain reconfigurability
  • Requirements for NLG
    • realization of contextually sensitive utterances
    • linguistically motivated control over flexible
      sentence realization

5
NLP environments for dialogue systems
  • Technical requirements
    • freely available
    • well documented
    • offering support when needed
    • freely available resources (for German)
    • efficient
    • platform independent

6
NLP environments
  • KPML (Lisp): Systemic-Functional Grammar (SFG)
  • OpenCCG (Java): Multi-Modal Combinatory
    Categorial Grammar (MMCCG)
  • Babel (Prolog): Head-Driven Phrase Structure
    Grammar (HPSG)
  • LKB (Lisp): Head-Driven Phrase Structure Grammar
    (HPSG)
  • XLE (C): Lexical Functional Grammar (LFG)
  • XTAG (Lisp): Tree Adjoining Grammar (TAG)
  • XDG (Oz): Topological Dependency Grammar (TDG)

7
NLP environments: Babel
  • Babel-System (S. Müller)
  • implementing HPSG
  • Prolog
  • only analysis, no generation
  • multi-lingual (?)
  • resources for German grammar with good coverage
  • freely available
  • documentation
  • support (?)

8
NLP environments: LKB
  • LKB
  • implementing HPSG
  • Lisp
  • multi-lingual
  • both analysis and generation
  • but resources for German not usable for
    generation
  • freely available
  • documentation
  • support (?)

9
NLP environments: XTAG
  • XTAG
  • implementing TAG
  • Lisp
  • both analysis and generation
  • multi-lingual
  • resources for German (DFKI ?)
  • freely available
  • documentation
  • support (?)

10
NLP environments: XDG
  • XDG
  • implementing TDG
  • Oz
  • only analysis (generation as dependency parsing
    using TAGs)
  • multi-lingual (?)
  • resources for German (toy grammars)
  • freely available
  • documentation (?)
  • support

11
NLP environments: KPML
  • KOMET-Penman Multilingual Linguistic resource
    development
  • implementing Systemic-Functional Grammar (SFG)
  • Lisp
  • multi-lingual
  • flexible generation
  • good sentence realization control
  • only for generation, no parsing
  • resources for German grammar with good coverage
  • freely available
  • documentation and support

12
NLP environments: XLE
  • Xerox Linguistic Environment
  • implementing LFG
  • C and Tcl/Tk
  • multi-lingual
  • both analysis and generation
  • resources for German (not freely available)
  • documentation
  • support
  • not freely available

13
NLP environments: OpenCCG
  • OpenCCG
  • implementing Multi-Modal Combinatory Categorial
    Grammar (MMCCG)
  • open source Java-based NLP library
  • both analysis and generation
  • multi-lingual
  • no resources for German, but grammars for English
  • freely available
  • documentation
  • support

14
NLP environments: The Choice
  • OpenCCG
  • Java-based NLP library → platform independent
  • analysis and generation → uniform grammar
    resources
  • multi-lingual → extendable
  • used and in use in several other projects:
    FLIGHTS, COMIC, COSY
  • supporting output format for TTS (e.g. APML)
  • optimized sentence realization
  • flexible generation
  • sentence realization control

15
Basic formalism: CCG
  • Combinatory Categorial Grammar
  • lexicalized grammar formalism
  • lexical items are assigned syntactic categories
  • combinatory rules

16
MMCCG
  • Multi-Modal Combinatory Categorial Grammar
  • refining CCG by introducing means of controlling
    the application of combinatory rules
  • specifying modes on category forming operators
    (slashes)
  • making application of rules dependent on the
    slash mode
  • four basic modes governing different levels of
    associativity and permutativity

17
Example
  • Der Hund sieht die Katze.

18
Example (cont.)
  • Der Hund sieht die Katze.

19
Developing a German Grammar
  • joint work with Magdalena Wolska (DIALOG Project)
  • Desiderata
    • uniform resources for analysis and generation
    • covering all phenomena in our domains
    • making the grammar more general than required
      by the phenomena encountered in our (relatively
      small) corpora

20
Phenomena
  • Some phenomena in German
    • agreement
    • position of the finite verb
    • Topological Fields: controlling the Vorfeld
    • complex sentences
    • ambiguity
    • controlling sentence realization

21
Lexical forms

22
Agreement

23
Agreement (cont.)

24
Agreement/Complex sentences

25
Clause types
  • Verb-initial clauses
    • yes/no questions
      • Soll ich den Titel zu der Liste hinzufügen?
    • alternative questions
      • Möchtest Du Mozart oder Bach hören?
    • imperatives
      • Wähle das Album Californication von den Red Hot
        Chili Peppers!

26
Clause types (cont.)
  • Verb-second clauses
    • main declarative
      • Der Titel wurde hinzugefügt.
    • wh-question
      • Welcher Künstler spielt Missunderstood?

27
Clause types (cont.)
  • Verb-final clauses
    • subordinate clause
      • Wenn Sie möchten, kann ich We Just Cant Get
        Enough CCG abspielen.
    • relative clause
      • Ich nehme aus den ersten vier Alben, die du hast,
        jeweils den ersten Song.
    • complement clause
      • Ich glaube, daß das Album Dangerously In Love
        heißt.
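
The three clause types on the preceding slides differ only in where the finite verb sits. A minimal Python sketch of that classification follows; it assumes the clause has already been split into constituents by hand (the function name and the chunking are hypothetical), since verb-second means "second constituent", not "second word".

```python
from typing import List


def clause_type(constituents: List[str], finite_verb_index: int) -> str:
    """Classify a clause by the position of its finite verb.

    `constituents` is a hand-chunked clause; `finite_verb_index` points at
    the finite verb. Verb-second is checked before verb-final, so a
    two-constituent clause such as ["Der Hund", "rennt"] counts as verb-second.
    """
    if finite_verb_index == 0:
        return "verb-initial"
    if finite_verb_index == 1:
        return "verb-second"
    if finite_verb_index == len(constituents) - 1:
        return "verb-final"
    return "unknown"


# Examples taken from the slides, chunked by hand.
print(clause_type(["Möchtest", "Du", "Mozart oder Bach", "hören"], 0))
print(clause_type(["Der Titel", "wurde", "hinzugefügt"], 1))
print(clause_type(["daß", "das Album", "Dangerously In Love", "heißt"], 3))
```

In a categorial grammar this distinction is not computed after the fact but encoded in the verb's lexical categories, with one category per clause type.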

28
Topological Fields
Controlling the Vorfeld occupation using flags

29
Topological Fields (cont.)
Controlling the Vorfeld occupation using flags

30
Analysis: Ambiguities
Der Hund von dem traurigen Mann den ich sah
rennt.

31
Analysis: Ambiguities (cont.)
Das Kind rennt wenn der Hund rennt weil die Katze
rennt.

32
Generation
  • Sentence realization without control

33
Generation (cont.)
  • Sentence realization with control: fronted
    subject

34
Generation (cont.)
  • Sentence realization with control: fronted object
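
The difference between realization without control and realization with a fronted subject or object can be sketched in a few lines of Python. This is a toy stand-in for the realizer, not OpenCCG's API: the `realize` function and its `fronted` flag are hypothetical, and the noun phrases are passed in already inflected.

```python
from typing import List, Optional


def capitalize_first(text: str) -> str:
    """Upper-case only the first character, leaving nouns untouched."""
    return text[0].upper() + text[1:]


def realize(subject: str, verb: str, obj: str,
            fronted: Optional[str] = None) -> List[str]:
    """Return verb-second realizations of a simple transitive clause.

    With fronted=None every Vorfeld choice is produced (no control);
    with fronted="subject" or "object" only that order is produced.
    """
    orders = {
        "subject": [subject, verb, obj],
        "object": [obj, verb, subject],
    }
    chosen = list(orders.values()) if fronted is None else [orders[fronted]]
    return [capitalize_first(" ".join(words)) + "." for words in chosen]


print(realize("der Hund", "sieht", "die Katze"))
print(realize("der Hund", "sieht", "die Katze", fronted="subject"))
print(realize("der Hund", "sieht", "die Katze", fronted="object"))
```

Without the flag the caller gets both "Der Hund sieht die Katze." and "Die Katze sieht der Hund."; with it, exactly one constituent is allowed into the Vorfeld, which is the kind of control a dialogue system needs to produce contextually appropriate output.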

35
Future Work (1)
  • extending the grammar wrt the two domains
    currently modelled (MP3 and maths tutorial)
  • (AP, NP, sentence, etc.) coordination
  • complex NP (e.g. postmodifications)
  • control and raising verbs
  • particle verbs (Ich spiele den Song ab vs. Ich
    möchte den Song abspielen)
  • Topological Fields: scrambling in the Mittelfeld

36
Future Work (2)
  • analysis: coping with partial input, ill-formed
    utterances
  • generation: realizing elliptical output
  • using a dynamic morphological module
  • development of an ontology