Title: Grammar Development Platform
Grammar Development
What is a Grammar Development Platform good for?
- Information Retrieval/Extraction
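[Figure: XLE used for machine translation (MT). The parser maps English "Anna sees the man." to an English c-structure and f-structure; MT maps the English f-structure to a German f-structure; the generator produces German "Anna sieht den Mann."]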
A Sample Development Platform
XLE (Xerox Linguistic Environment)
- Main Developer: John Maxwell (PARC)
- Software (Shareware): Emacs, Tcl/Tk
- Platforms: Unix (Solaris), Linux, Mac OS X
- Linguistic Theory: LFG (Lexical-Functional Grammar), originally developed by Ronald M. Kaplan (PARC) and Joan Bresnan (Stanford)
- Parser: Bottom-Up, Left-to-Right
- Performance: Worst-case exponential, but polynomial in practice, which makes broad-coverage grammars feasible
The ParGram Project
- Palo Alto Research Center (PARC): English Grammar
- IMS, University of Stuttgart: German Grammar
- Fuji Xerox: Japanese Grammar
- University of Bergen: Norwegian Bokmål and Nynorsk
- UMIST: Urdu Grammar
- XRCE Grenoble: French Grammar
ParGram
Possible Applications
- Machine Translation (French, English)
- Tree Banking (English, German)
- Smart Text Annotation (German)
- Robust Parsing (English, German, French)
- Information Extraction (English)
Grammar Components
Each Grammar Contains:
- Phrase Structure Rules (S → NP VP)
- Lexicon (verb stems and functional elements)
- Finite-State Morphological Analyzer
No Semantics
Phrase Structure Rules
The formulation used today goes back to Chomsky (1957).
Sample Set for English:
S → NP VP
VP → V NP
NP → D (ADJ) N
Why these kinds of rules?
- Natural Language is recursive and potentially
infinite.
- Constituency, X-bar Theory
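The parentheses in NP → D (ADJ) N mark an optional category, so the rule is shorthand for two plain context-free rules. A minimal sketch, assuming rules are stored as (left-hand side, right-hand side) pairs and "(X)" marks an optional symbol; the function name expand_optionals is hypothetical, not part of XLE:

```python
from itertools import product

# The sample rules; "(X)" marks an optional category, as in NP -> D (ADJ) N.
RULES = [
    ("S", ["NP", "VP"]),
    ("VP", ["V", "NP"]),
    ("NP", ["D", "(ADJ)", "N"]),
]


def expand_optionals(rules):
    """Rewrite rules containing optional categories as plain context-free rules."""
    plain = []
    for lhs, rhs in rules:
        # Each symbol offers one choice, or two if optional (present / absent).
        choices = [
            [sym[1:-1], None] if sym.startswith("(") and sym.endswith(")") else [sym]
            for sym in rhs
        ]
        for combo in product(*choices):
            plain.append((lhs, [sym for sym in combo if sym is not None]))
    return plain


for lhs, rhs in expand_optionals(RULES):
    print(lhs, "->", " ".join(rhs))
# S -> NP VP
# VP -> V NP
# NP -> D ADJ N
# NP -> D N
```

A rule with n optional categories expands into 2^n plain rules, which is why grammar formalisms keep the compact notation.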
The syntax of natural languages is context-free.
Colorless green ideas sleep furiously.
However, we must also deal with context-sensitive information, such as subject-verb agreement:
The monkey sleeps.
*The monkey sleep.
*The monkeys sleeps.
Features and Unification
Context-Sensitivity can be achieved in many ways.
XLE and LFG (like many other theories/platforms) use phrase-structure annotation via attribute-value pairs.
S  →  NP             VP
      (↑SUBJ)=↓      (↑SUBJ NUM)=(↓NUM)
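Features are checked via unification: if two annotations assign conflicting values to the same feature (e.g., NUM sg vs. NUM pl), unification fails and the analysis is ruled out.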
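Unification of attribute-value structures can be sketched with nested dictionaries. This is a simplified illustration, assuming atomic values and no reentrancy (structure sharing), which real LFG f-structures allow; the feature names follow the agreement example, and the verb entries are hypothetical (the entry for "sleep" is simplified to plural only):

```python
def unify(f, g):
    """Unify two feature structures given as nested dicts with atomic values.

    Returns a new merged structure, or None if the two structures assign
    clashing values to the same attribute. Simplification: no reentrancy.
    """
    result = dict(f)
    for attr, g_val in g.items():
        if attr not in result:
            result[attr] = g_val
            continue
        f_val = result[attr]
        if isinstance(f_val, dict) and isinstance(g_val, dict):
            merged = unify(f_val, g_val)  # unify embedded structures recursively
            if merged is None:
                return None
            result[attr] = merged
        elif f_val != g_val:
            return None  # feature clash, e.g. NUM sg vs. NUM pl
    return result


# (↑SUBJ)=↓ on the NP: the NP's features become the sentence's SUBJ.
the_monkey = {"NUM": "sg", "PERS": 3}
sleeps = {"TENSE": "pres", "SUBJ": {"NUM": "sg", "PERS": 3}}
sleep = {"TENSE": "pres", "SUBJ": {"NUM": "pl"}}  # simplified: plural only

ok = unify({"SUBJ": the_monkey}, sleeps)
bad = unify({"SUBJ": the_monkey}, sleep)
print(ok)   # {'SUBJ': {'NUM': 'sg', 'PERS': 3}, 'TENSE': 'pres'}
print(bad)  # None
```

"The monkey sleeps." unifies, while "*The monkey sleep." fails on the NUM clash, with no need to split NP and VP into separate singular and plural categories.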
The Ambiguity Problem
PP-Attachment
The girl saw the monkey with the telescope.
Categorial Ambiguity
Flying planes can be dangerous.
Time flies like an arrow.
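Both kinds of ambiguity show up when a chart is filled bottom-up and left-to-right. The sketch below is a plain CKY-style counter for a hypothetical context-free toy grammar, not XLE's parser (which works on LFG grammars and also handles f-structure constraints); the grammar and lexicon are assumptions chosen only to reproduce the two readings of each example:

```python
from collections import defaultdict

# Toy grammar, hypothetical and for illustration only: binary rules A -> B C.
BINARY = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "V", "PP"),
    ("VP", "VP", "PP"),  # PP attached to the verb phrase
    ("NP", "D", "N"),
    ("NP", "N", "N"),    # noun compound, e.g. "time flies"
    ("NP", "NP", "PP"),  # PP attached to the noun phrase
    ("PP", "P", "NP"),
]
# Unary rules A -> B; this sketch assumes they never chain.
UNARY = [("NP", "N")]

# Categorial ambiguity: a word simply lists every category it can have.
LEXICON = {
    "the": ["D"], "an": ["D"], "girl": ["N"], "monkey": ["N"],
    "telescope": ["N"], "saw": ["V"], "with": ["P"], "arrow": ["N"],
    "time": ["N", "V"], "flies": ["N", "V"], "like": ["P", "V"],
}


def count_parses(words, goal="S"):
    """Count the trees of category `goal` spanning `words` without listing them."""
    n = len(words)
    # chart[i][j][cat] = number of distinct trees of category cat over words[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for j in range(1, n + 1):            # left to right: word j-1 closes column j
        for cat in LEXICON[words[j - 1].lower()]:
            chart[j - 1][j][cat] += 1
        for i in range(j - 1, -1, -1):   # bottom up: shorter spans ending at j first
            cell = chart[i][j]
            for k in range(i + 1, j):
                for parent, left, right in BINARY:
                    cell[parent] += chart[i][k][left] * chart[k][j][right]
            for parent, child in UNARY:
                cell[parent] += cell[child]
    return chart[0][n][goal]


print(count_parses("The girl saw the monkey with the telescope".split()))  # 2
print(count_parses("Time flies like an arrow".split()))                    # 2
```

Because each cell stores a count rather than a list of trees, shared subanalyses are computed once; for context-free rules this packing keeps the chart polynomial even when the number of trees grows quickly.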
Lexicons
Typically Contain
- Category Information (Terminal Node in Tree)
- Context Sensitive Featural Information
- Subcategorization Information
- Semantics (sometimes)
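A lexicon entry bundling category, features, and subcategorization can be sketched as a small record. Everything here is a hypothetical simplification (the LexEntry class, the entries, and the two-function GOVERNABLE set); the check mirrors LFG's completeness and coherence conditions in a much reduced form:

```python
from dataclasses import dataclass, field

GOVERNABLE = {"SUBJ", "OBJ"}  # simplified set of subcategorizable functions


@dataclass
class LexEntry:
    category: str                                  # terminal node in the tree
    features: dict = field(default_factory=dict)   # context-sensitive features
    subcat: frozenset = frozenset()                # functions the word requires


LEXICON = {
    "sleeps": LexEntry("V", {"SUBJ": {"NUM": "sg"}}, frozenset({"SUBJ"})),
    "sees": LexEntry("V", {"SUBJ": {"NUM": "sg"}}, frozenset({"SUBJ", "OBJ"})),
}


def subcat_satisfied(entry, fstructure):
    """True if the f-structure has exactly the governable functions the entry
    subcategorizes for (completeness and coherence, much simplified)."""
    present = {attr for attr in fstructure if attr in GOVERNABLE}
    return present == entry.subcat


print(subcat_satisfied(LEXICON["sleeps"], {"SUBJ": {"NUM": "sg"}}))  # True
print(subcat_satisfied(LEXICON["sees"], {"SUBJ": {"NUM": "sg"}}))    # False: no OBJ
```

The same entry thus constrains both the tree (via its category) and the f-structure (via its features and subcategorization frame).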
Ambiguity in Large Grammars
Ambiguity is a serious problem even in simple sentences
- Subject/Object Ambiguities (German)
Within XLE, various techniques have been invented to cut down on the explosion of parses.
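How quickly parses explode can be estimated for stacked PPs. Assume a grammar in which each PP may attach to the verb or to any noun to its left, with no crossing attachments; then a verb with an object NP and k PPs has C(k+1) analyses, where C(n) = (2n choose n)/(n+1) is the n-th Catalan number (a standard combinatorial result). For k = 1 this gives the two readings of the telescope sentence. The function names are illustrative:

```python
from math import comb


def catalan(n):
    """n-th Catalan number: C(n) = (2n choose n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)


def pp_attachment_parses(num_pps):
    """Analyses of 'V NP PP1 ... PPk' under the non-crossing attachment assumption."""
    return catalan(num_pps + 1)


for k in range(1, 6):
    print(k, pp_attachment_parses(k))
# 1 2
# 2 5
# 3 14
# 4 42
# 5 132
```

Five PPs already yield 132 analyses from attachment alone, before any lexical or categorial ambiguity multiplies them further.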
Morphologies and Tokenizers
Beyond the Word: Writing and Adding in Morphological Analysis and Tokenization
Parallel Analyses
Languages Differ on the Surface (c-structure):
- English: Yassin was seen.
- German: Yassin wurde gesehen.
- Urdu: yassin dekha gaya
ParGram Goal: The same underlying f-structures for all languages (modulo lexical semantics).
The Parallel in ParGram
Analyses at the level of f-structure are held as
parallel as possible across languages
(crosslinguistic invariance).
- Theoretical Advantage: This models the idea of Universal Grammar (UG).
- Applicational Advantage: Machine translation is made easier.
Analyses at the level of c-structure are allowed
to differ much more (variance across languages).
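Why parallel f-structures ease machine translation can be shown with a toy transfer step: if the English and German f-structures share their grammatical features, translation at this level reduces to swapping PRED values. This is an illustrative sketch, not how XLE's translation components are implemented; the f-structure is simplified (DEF is a hypothetical stand-in for determiner information) and the bilingual lexicon is hypothetical:

```python
def transfer(fstr, bilingual):
    """Copy an f-structure, translating PRED values via a bilingual lexicon.

    Grammatical features (TENSE, NUM, ...) are copied unchanged, which only
    works because source and target f-structures are assumed to be parallel.
    """
    out = {}
    for attr, val in fstr.items():
        if isinstance(val, dict):
            out[attr] = transfer(val, bilingual)   # embedded f-structure
        elif attr == "PRED":
            out[attr] = bilingual.get(val, val)     # names like "Anna" pass through
        else:
            out[attr] = val
    return out


# Simplified, hypothetical f-structure for "Anna sees the man."
english = {
    "PRED": "see<SUBJ,OBJ>",
    "TENSE": "pres",
    "SUBJ": {"PRED": "Anna", "NUM": "sg"},
    "OBJ": {"PRED": "man", "NUM": "sg", "DEF": "+"},
}
bilingual = {"see<SUBJ,OBJ>": "sehen<SUBJ,OBJ>", "man": "Mann"}

german = transfer(english, bilingual)
print(german["PRED"], german["OBJ"]["PRED"])  # sehen<SUBJ,OBJ> Mann
```

The German generator would still have to choose word order and case marking (den Mann); that work belongs to c-structure, which is allowed to differ across languages.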
FST Morphological Analyzers
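Kaplan and Butt (2002): This LFG morphology-syntax interface is natural.
[Figure: the surface form calana 'to drive' (M.Sg) is related to the analysis drive+Verb+Inf+M+Sg (Sequence Relation); the tags are related to feature descriptions (Lexical Relation), which are satisfied by an f-structure (m-structure) containing NUM sg, VFORM inf, GEND masc (Satisfaction Relation).]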
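The step from tags to features can be sketched as a lookup table. This assumes the analyzer writes its output as drive+Verb+Inf+M+Sg with "+" separating tags; the TAG_FEATURES table and the function name are hypothetical, since a real grammar states this mapping in its own lexicon:

```python
# Hypothetical tag-to-feature table for the calana example.
TAG_FEATURES = {
    "+Inf": {"VFORM": "inf"},
    "+M": {"GEND": "masc"},
    "+Sg": {"NUM": "sg"},
}


def analysis_to_features(analysis):
    """Split an FST analysis such as 'drive+Verb+Inf+M+Sg' into stem, tags,
    and the f-structure features the tags contribute."""
    stem, *rest = analysis.split("+")
    tags = ["+" + tag for tag in rest]
    features = {}
    for tag in tags:
        # Tags missing from the table (here +Verb) contribute no feature.
        features.update(TAG_FEATURES.get(tag, {}))
    return stem, tags, features


stem, tags, features = analysis_to_features("drive+Verb+Inf+M+Sg")
print(stem, tags)  # drive ['+Verb', '+Inf', '+M', '+Sg']
print(features)    # {'VFORM': 'inf', 'GEND': 'masc', 'NUM': 'sg'}
```

Keeping the mapping in a table means the FST can be developed independently of the syntax: only the table has to agree with the tags the analyzer emits.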