Title: Generation
1. Generation
2. Aims of this talk
- Discuss MRS and LKB generation
- Describe the larger research programme: modular generation
- Mention some interactions with other work in progress
  - RMRS
  - SEM-I
3. Outline of talk
- Towards modular generation
- Why MRS?
- MRS and chart generation
- Data-driven techniques
- SEM-I and documentation
4. Modular architecture
[Diagram: language-independent component → meaning representation → language-dependent realization → string or speech output]
5. Desiderata for a portable realization module
- Application independent
- Any well-formed input should be accepted
- No grammar-specific/conventional information should be essential in the input
- Output should be idiomatic
6. Architecture (preview)
[Diagram: External LF → SEM-I → Internal LF → specialization modules → Chart generator (with control modules) → String]
7. Why MRS?
- Flat structures
  - independence of syntax: conventional LFs partially mirror tree structure
  - manipulation of individual components can ignore scope structure etc.
- Lexicalised generation
  - composition by accumulation of EPs; robust composition
- Underspecification
8. An excursion: Robust MRS
- Deep Thought: integration of deep and shallow processing via compatible semantics
- All components construct RMRSs
- Principled way of building robustness into deep processing
- Requirements for consistency etc. help human users too
9. Extreme flattening of deep output
[Figure: scoped logical forms for "every cat chases some dog" (with every/some in either scope), flattened into the RMRS below]
lb1:every_q(x), RSTR(lb1,h9), BODY(lb1,h6),
lb2:cat_n(x), lb5:dog_n_1(y), lb4:some_q(y),
RSTR(lb4,h8), BODY(lb4,h7), lb3:chase_v(e), ARG1(lb3,x),
ARG2(lb3,y), h9 qeq lb2, h8 qeq lb5
10. Extreme Underspecification
- Factorize the deep representation into minimal units
- Only represent what you know
- Robust MRS (a data-structure sketch follows this list)
  - separating relations
  - separate arguments
  - explicit equalities
  - conventions for predicate names and sense distinctions
  - hierarchy of sorts on variables
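As a concrete illustration of these minimal units, here is a small sketch (Python, not DELPH-IN code) that holds the slide-9 RMRS as separated relations, argument facts, and qeq constraints; the class and field names are invented for this example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rel:          # a labelled relation, e.g. lb1:every_q(x)
    label: str
    pred: str
    arg0: str

@dataclass(frozen=True)
class Arg:          # a separate argument fact, e.g. ARG1(lb3, x)
    role: str
    label: str
    value: str

@dataclass(frozen=True)
class Qeq:          # a scope constraint, e.g. h9 qeq lb2
    hole: str
    label: str

# The slide-9 RMRS, factorized into independently assertable units:
rmrs = [
    Rel("lb1", "every_q", "x"), Arg("RSTR", "lb1", "h9"), Arg("BODY", "lb1", "h6"),
    Rel("lb2", "cat_n", "x"),
    Rel("lb4", "some_q", "y"), Arg("RSTR", "lb4", "h8"), Arg("BODY", "lb4", "h7"),
    Rel("lb5", "dog_n_1", "y"),
    Rel("lb3", "chase_v", "e"), Arg("ARG1", "lb3", "x"), Arg("ARG2", "lb3", "y"),
    Qeq("h9", "lb2"), Qeq("h8", "lb5"),
]

# A shallow component that only recognizes the verb can assert just
# Rel("lb3", "chase_v", "e"): only represent what you know.
```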
11. Chart generation with the LKB
- Determine lexical signs from the MRS
- Determine possible rules contributing EPs (construction semantics, compound rule, etc.)
- Instantiate signs (lexical and rule) according to variable equivalences
- Apply lexical rules
- Instantiate chart
- Generate by parsing without string position
- Check output against input
12. Lexical lookup for generation
- _like_v_1(e,x,y): returns the lexical entry for sense 1 of the verb "like" (a toy lookup index is sketched after this list)
- temp_loc_rel(e,x,y): returns multiple lexical entries
- multiple relations in one lexical entry, e.g., who, where
- entries with null semantics: heuristics
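A minimal sketch of such a lookup, with invented entry names and relations, indexing lexical entries by the relations they contribute:

```python
from collections import defaultdict

# Illustrative entries: (entry name, relations it contributes)
LEXICON = [
    ("like_v_1",  ["_like_v_1"]),                  # one sense of "like"
    ("on_temp",   ["temp_loc_rel"]),               # several entries share this relation
    ("at_temp",   ["temp_loc_rel"]),
    ("who",       ["person_rel", "which_q_rel"]),  # several relations in one entry
    ("that_comp", []),                             # null semantics: needs heuristics
]

INDEX = defaultdict(list)
for entry, rels in LEXICON:
    for rel in rels:
        INDEX[rel].append(entry)

def lookup(pred):
    """Return the candidate lexical entries for one input predicate."""
    return INDEX.get(pred, [])

print(lookup("_like_v_1"))     # -> ['like_v_1']
print(lookup("temp_loc_rel"))  # -> ['on_temp', 'at_temp']
```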
13. Instantiation of entries
- _like_v_1(e,x,y), named(x,Kim), named(y,Sandy)
- find the locations corresponding to x in all FSs
- replace all occurrences of x with a constant
- repeat for y etc.
- Also for rules contributing construction semantics
- Skolemization (misleading name ...); sketched below
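A minimal sketch of this instantiation step, treating a sign's feature structure as a nested dict (a simplification of typed feature structures):

```python
def skolemize(fs, var, constant):
    """Replace every occurrence of an input variable in a (nested dict)
    feature structure with a unique constant, so later unification
    cannot re-equate distinct input variables."""
    if fs == var:
        return constant
    if isinstance(fs, dict):
        return {feat: skolemize(val, var, constant) for feat, val in fs.items()}
    return fs

sign = {"SYNSEM": {"INDEX": "x", "SUBJ": {"INDEX": "x"}, "OBJ": {"INDEX": "y"}}}
sign = skolemize(sign, "x", "x_kim")     # named(x, Kim)
sign = skolemize(sign, "y", "y_sandy")   # named(y, Sandy)
print(sign)
```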
14. Lexical rule application
- Lexical rules that contribute EPs are only used if the EP is in the input
- Inflectional rules will only apply if the variable has the correct sort
- Lexical rule application does morphological generation (e.g., liked, bought); a toy illustration follows
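A toy illustration of the sort check on inflectional rules: the past-tense rule fires only if the verb's event variable has a past sort. The morphology here (regular -ed plus one listed irregular) is deliberately simplistic and not the LKB's mechanism.

```python
IRREGULAR_PAST = {"buy": "bought"}

def past_tense_rule(lemma, event_sort):
    """Apply a past-tense inflectional rule only if the event variable
    has the right sort; otherwise the rule simply does not fire."""
    if event_sort != "e_past":
        return None
    if lemma in IRREGULAR_PAST:
        return IRREGULAR_PAST[lemma]
    return lemma + "d" if lemma.endswith("e") else lemma + "ed"

print(past_tense_rule("like", "e_past"))  # -> liked
print(past_tense_rule("buy", "e_past"))   # -> bought
print(past_tense_rule("like", "e_pres"))  # -> None (wrong sort)
```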
15. Chart generation proper
- Possible lexical signs are added to a chart structure
- Currently no indexing of chart edges
  - chart generation can use semantic indices, but current results suggest this doesn't help
- Rules applied as for chart parsing; edges checked for compatibility with the input semantics (bag of EPs, as sketched below)
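A minimal sketch of the "bag of EPs" compatibility check, treating EPs simply as predicate names:

```python
from collections import Counter

def compatible(edge_eps, input_eps):
    """An edge survives only if its EPs form a sub-multiset of the input EPs."""
    return not (Counter(edge_eps) - Counter(input_eps))

input_eps = ["every_q", "cat_n", "some_q", "dog_n_1", "chase_v"]
print(compatible(["cat_n", "every_q"], input_eps))  # -> True
print(compatible(["cat_n", "cat_n"], input_eps))    # -> False (EP used twice)
```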
16. Root conditions
- Complete structures must consume all the EPs in the input MRS (see the coverage check below)
- Should check for compatibility of scopes
  - precise qeq matching is (probably) too strict
  - requiring exactly the same scopes is (probably) unrealistic and too slow
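The corresponding root condition, in the same toy representation (scope compatibility would be a separate, looser check, as noted above):

```python
from collections import Counter

def covers_input(edge_eps, input_eps):
    """A complete realization must consume exactly the input bag of EPs."""
    return Counter(edge_eps) == Counter(input_eps)

input_eps = ["every_q", "cat_n", "some_q", "dog_n_1", "chase_v"]
print(covers_input(["chase_v", "cat_n", "every_q", "dog_n_1", "some_q"], input_eps))  # -> True
print(covers_input(["cat_n", "every_q"], input_eps))  # -> False: EPs left over
```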
17. Generation failures due to MRS issues
- Well-formedness check prior to input to the generator (optional)
- Lexical lookup failure: predicate doesn't match an entry, wrong arity, wrong variable types
- Unwanted instantiations of variables
- Missing EPs in the input: syntax (e.g., no noun), lexical selection
- Too many EPs in the input, e.g., two verbs and no coordination
18. Improving generation via corpus-based techniques
- CONTROL, e.g., intersective modifier order
  - logical representation does not determine order
  - wet(x), weather(x), cold(x)
- UNDERSPECIFIED INPUT, e.g.,
  - determiners: none/a/the/
  - prepositions: in/on/at
19. Constraining generation for idiomatic output
- Intersective modifier order, e.g., adjectives, prepositional phrases
- Logical representation does not determine order
  - wet(x), weather(x), cold(x)
20. Adjective ordering
- Constraints / preferences
  - big red car
  - red big car (dispreferred)
  - cold wet weather
  - wet cold weather (OK, but dispreferred)
- Difficult to encode in a symbolic grammar
21. Corpus-derived adjective ordering
- n-grams perform poorly
- Thater: direct evidence plus clustering
- positional probability (a toy sketch follows this list)
- Malouf (2000): memory-based learning plus positional probability, 92% on the BNC
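A toy sketch of the positional-probability idea, loosely in the spirit of the cited work: estimate, from adjective-adjective bigrams, how often each adjective occurs first, and order a new pair accordingly. The counts fed in below are made-up placeholders, not BNC figures.

```python
from collections import Counter

first, total = Counter(), Counter()

def observe(bigrams):
    """Record corpus adjective-adjective bigrams (first_adj, second_adj)."""
    for a, b in bigrams:
        first[a] += 1
        total[a] += 1
        total[b] += 1

def order(a, b):
    """Prefer the order whose initial adjective is more often corpus-initial."""
    pa = first[a] / total[a] if total[a] else 0.5
    pb = first[b] / total[b] if total[b] else 0.5
    return (a, b) if pa >= pb else (b, a)

observe([("big", "red"), ("big", "old"), ("little", "red"), ("cold", "wet")])
print(order("red", "big"))   # -> ('big', 'red')
print(order("wet", "cold"))  # -> ('cold', 'wet')
```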
22. Underspecified input to generation
- "We bought a car on Friday"
- Accept
  - pron(x), a_quant(y,h1,h2), car(y), buy(e_past,x,y), on(e,z), named(z,Friday)
- and
  - pron(x), general_q(y,h1,h2), car(y), buy(e_past,x,y), temp_loc(e,z), named(z,Friday)
- And maybe
  - pron(x_1pl), car(y), buy(e_past,x,y), temp_loc(e,z), named(z,Friday)
23. Guess the determiner
- We went climbing in _ Andes
- _ president of _ United States
- I tore _ pyjamas
- I tore _ duvet
- George doesn't like _ vegetables
- We bought _ new car yesterday
24. Determining determiners
- Determiners are partly conventionalized, often predictable from local context (a toy baseline is sketched below)
- Translation from Japanese etc.; speech prosthesis application
- More meaning-rich determiners assumed to be specified in the input
- Minnen et al.: 85% on the WSJ (using TiMBL)
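A toy baseline for determiner guessing: pick the determiner most often seen with the head noun. The counts below are placeholders, and the cited work uses richer local-context features and memory-based learning rather than this majority heuristic.

```python
from collections import Counter, defaultdict

seen = defaultdict(Counter)
# Placeholder (noun, determiner) observations, not corpus data:
for noun, det in [("Andes", "the"), ("president", "the"), ("vegetables", "none"),
                  ("car", "a"), ("car", "the"), ("car", "a")]:
    seen[noun][det] += 1

def guess_determiner(noun, default="the"):
    """Return the most frequent determiner for this noun, or a fallback."""
    return seen[noun].most_common(1)[0][0] if seen[noun] else default

print(guess_determiner("car"))    # -> a
print(guess_determiner("duvet"))  # -> the (unseen noun, fallback)
```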
25. Preposition guessing
- Choice between temporal in/on/at
  - in the morning
  - in July
  - on Wednesday
  - on Wednesday morning
  - at three o'clock
  - at New Year
- ERG uses hand-coded rules and lexical categories (a rule-based sketch follows this list)
- A machine learning approach gives very high precision and recall on the WSJ, and good results on a balanced corpus (Lin Mei, 2004, Cambridge MPhil thesis)
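A hand-coded sketch in the spirit of such rules for temporal in/on/at, covering only the example categories on this slide; the category labels are invented here, and a real module would use the grammar's lexical categories or a learned model.

```python
def temporal_preposition(expr_type):
    """Choose a temporal preposition from a coarse category of the time expression."""
    rules = {
        "part_of_day": "in",    # in the morning
        "month": "in",          # in July
        "day": "on",            # on Wednesday
        "day_plus_part": "on",  # on Wednesday morning
        "clock_time": "at",     # at three o'clock
        "holiday": "at",        # at New Year
    }
    return rules.get(expr_type)

print(temporal_preposition("day"))         # -> on
print(temporal_preposition("clock_time"))  # -> at
```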
26. SEM-I: semantic interface
- Meta-level: manually specified grammar relations (constructions and closed-class)
- Object-level: linked to the lexical database for deep grammars
- Definitional, e.g., lemma_POS_sense
- Linked test suites, examples, documentation
27. SEM-I development
- SEM-I eventually forms the API: stable, with changes negotiated
- SEM-I vs Verbmobil SEMDB
  - technical limitations of SEMDB
  - too painful!
- Munging rules: external vs internal
- SEM-I development must be incremental
28. Role of SEM-I in the architecture
- Offline
  - definition of correct (R)MRS for developers
  - documentation
  - checking of test-suites
- Online
  - in the unifier/selector: reject invalid RMRSs (a validity check is sketched below)
  - patching up input to generation
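A minimal sketch of the online role: validate an (R)MRS predicate and its arity against a declared interface before generation. The table entries are illustrative, not the actual ERG SEM-I.

```python
# Illustrative SEM-I fragment: predicate -> expected argument roles
SEM_I = {
    "_like_v_1": ("ARG0", "ARG1", "ARG2"),
    "_car_n_1": ("ARG0",),
}

def check_ep(pred, args):
    """Return None if the EP is valid against the interface, else a reason."""
    if pred not in SEM_I:
        return f"unknown predicate {pred}"
    if len(args) != len(SEM_I[pred]):
        return f"wrong arity for {pred}: expected {len(SEM_I[pred])}"
    return None

print(check_ep("_like_v_1", ["e2", "x4", "x9"]))  # -> None (accepted)
print(check_ep("_like_v_2", ["e2", "x4"]))        # -> unknown predicate _like_v_2
```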
29. Goal: semi-automated documentation
[Diagram: documentation workflow linking [incr tsdb()] and a semantic test-suite, the Lex DB, ERG documentation strings, and the object-level and meta-level SEM-I; examples are auto-generated on demand (semi-automatic documentation) and an appendix is auto-generated from the meta-level SEM-I]
30. Robust generation
- SEM-I is an important preliminary
  - check whether generator input is semantically compatible with the grammar
- Eventually a hierarchy of relations outside the grammars, allowing underspecification
- Fill-in of underspecified RMRS
  - exploit work on determiner guessing etc.
31. Architecture (again)
[Diagram: External LF → SEM-I → Internal LF → specialization modules → Chart generator (with control modules) → String]
32. Interface
- External representation
  - public, documented
  - reasonably stable
- Internal representation
  - syntax/semantics interface
  - convenient for analysis
- External/internal conversion via the SEM-I
33. Guaranteed generation?
- Given a well-formed input MRS/RMRS, with elementary predications found in the SEM-I (and dependencies)
- Can we generate a string? With input fix-up? Negotiation?
- Semantically bleached lexical items: which, one, piece, do, make
- Defective paradigms, negative polarity, anti-collocations, etc.?
34. Next stages
- SEM-I development
- Documentation and test suite integration
- Generation from RMRSs produced by a shallower parser (or a deep/shallow combination)
- Partially fixed text in generation (cogeneration)
- Further statistical modules, e.g., locational prepositions, other modifiers
- More underspecification
- Gradually increase the flexibility of the interface to generation