Title: Generation
Provided by: anncop

Transcript and Presenter's Notes
1
Generation
2
Aims of this talk
  • Discuss MRS and LKB generation
  • Describe the larger research programme: modular
    generation
  • Mention some interactions with other work in
    progress:
  • RMRS
  • SEM-I

3
Outline of talk
  • Towards modular generation
  • Why MRS?
  • MRS and chart generation
  • Data-driven techniques
  • SEM-I and documentation

4
Modular architecture
Language-independent component
  → meaning representation →
Language-dependent realization
  → string or speech output
5
Desiderata for a portable realization module
  • Application independent
  • Any well-formed input should be accepted
  • No grammar-specific/conventional information
    should be essential in the input
  • Output should be idiomatic

6
Architecture (preview)
External LF → (SEM-I) → Internal LF → specialization
modules → Chart generator (+ control modules) → String
7
Why MRS?
  • Flat structures
  • independence of syntax: conventional LFs
    partially mirror tree structure
  • manipulation of individual components can ignore
    scope structure etc.
  • Lexicalised generation
  • composition by accumulation of EPs enables robust
    composition
  • Underspecification

8
An excursion: Robust MRS
  • Deep Thought: integration of deep and shallow
    processing via compatible semantics
  • All components construct RMRSs
  • Principled way of building robustness into deep
    processing
  • Requirements for consistency etc. help human users
    too

9
Extreme flattening of deep output
[Diagram: two scoped logical-form trees for "every cat
chases some dog" (every/cat/x, some/dog_1/y, chase/e),
flattened into the RMRS below]

lb1:every_q(x), RSTR(lb1,h9), BODY(lb1,h6),
lb2:cat_n(x), lb5:dog_n_1(y), lb4:some_q(y),
RSTR(lb4,h8), BODY(lb4,h7), lb3:chase_v(e),
ARG1(lb3,x), ARG2(lb3,y), h9 qeq lb2, h8 qeq lb5
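This flat style can be modeled as a bag of labelled elementary predications plus separately stored argument relations and qeq constraints. A minimal sketch (the data-structure names here are my own, not the LKB's):

```python
from collections import namedtuple

# The three separated components of a flattened (R)MRS:
EP = namedtuple("EP", "label pred args")      # e.g. lb3:chase_v(e)
Arg = namedtuple("Arg", "role label value")   # e.g. ARG1(lb3, x)
Qeq = namedtuple("Qeq", "hole label")         # e.g. h9 qeq lb2

rmrs = {
    "eps": [
        EP("lb1", "every_q", ("x",)),
        EP("lb2", "cat_n", ("x",)),
        EP("lb3", "chase_v", ("e",)),
        EP("lb4", "some_q", ("y",)),
        EP("lb5", "dog_n_1", ("y",)),
    ],
    "args": [
        Arg("RSTR", "lb1", "h9"), Arg("BODY", "lb1", "h6"),
        Arg("RSTR", "lb4", "h8"), Arg("BODY", "lb4", "h7"),
        Arg("ARG1", "lb3", "x"), Arg("ARG2", "lb3", "y"),
    ],
    "qeqs": [Qeq("h9", "lb2"), Qeq("h8", "lb5")],
}

def args_of(rmrs, label):
    """Collect the separately stored arguments of one EP."""
    return {a.role: a.value for a in rmrs["args"] if a.label == label}

print(args_of(rmrs, "lb3"))  # {'ARG1': 'x', 'ARG2': 'y'}
```

Because relations and arguments are separate units, a shallow processor can contribute only the EPs it knows about and omit the argument relations entirely, which is the point of the "only represent what you know" principle.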
10
Extreme Underspecification
  • Factorize deep representation into minimal units
  • Only represent what you know
  • Robust MRS:
  • separate relations
  • separate arguments
  • explicit equalities
  • conventions for predicate names and sense
    distinctions
  • hierarchy of sorts on variables

11
Chart generation with the LKB
  1. Determine lexical signs from MRS
  2. Determine possible rules contributing EPs
    (construction semantics: compound rule etc.)
  3. Instantiate signs (lexical and rule) according to
    variable equivalences
  4. Apply lexical rules
  5. Instantiate chart
  6. Generate by parsing without string position
  7. Check output against input
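The steps above can be sketched at toy scale. This hypothetical code keeps only the skeleton (lexical lookup from EPs, then exhaustive combination of edges without string positions, then a coverage check); the lexicon and the absence of any real grammar are invented simplifications, so word order is unconstrained here:

```python
# Toy chart generation: look up lexical signs from EPs, combine edges
# freely, and accept only results consuming exactly the input bag of EPs.
LEXICON = {
    "named(Kim)": "Kim", "named(Sandy)": "Sandy",
    "_like_v_1": "likes",
}

def generate(input_eps):
    # Step 1: determine lexical signs from the MRS.
    edges = [({ep}, LEXICON[ep]) for ep in input_eps]
    agenda = list(edges)
    # Steps 5-6: generate by parsing without string position.
    while agenda:
        eps1, s1 = agenda.pop()
        for eps2, s2 in list(edges):
            if eps1 & eps2:
                continue  # each input EP may be consumed only once
            for new in ((eps1 | eps2, s1 + " " + s2),
                        (eps1 | eps2, s2 + " " + s1)):
                if new not in edges:
                    edges.append(new)
                    agenda.append(new)
    # Step 7: check output against input (all EPs consumed).
    return [s for eps, s in edges if eps == set(input_eps)]

out = generate(["named(Kim)", "_like_v_1", "named(Sandy)"])
# 'Kim likes Sandy' is among the results; a real grammar (steps 2-4)
# would filter out the other orderings.
```

Without the grammar's syntactic constraints every permutation survives, which is why the real system interleaves rule application with the coverage check rather than combining edges freely.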

12
Lexical lookup for generation
  • _like_v_1(e,x,y) returns the lexical entry for
    sense 1 of the verb like
  • temp_loc_rel(e,x,y) returns multiple lexical
    entries
  • multiple relations in one lexical entry, e.g.,
    who, where
  • entries with null semantics: heuristics

13
Instantiation of entries
  • _like_v_1(e,x,y), named(x,Kim),
    named(y,Sandy)
  • find locations corresponding to xs in all FSs
  • replace all xs with constant
  • repeat for ys etc
  • Also for rules contributing construction
    semantics
  • Skolemization (misleading name ...)
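"Skolemization" here just means replacing each MRS variable with a distinct constant, consistently across all entries, so that later unification cannot accidentally identify two variables that the input keeps apart. A hypothetical sketch:

```python
import itertools

def skolemize(eps):
    """Replace each variable with a unique constant, consistently
    across all EPs (so the x in _like_v_1 and the x in named(x,Kim)
    remain linked, while x and y can never be conflated)."""
    counter = itertools.count()
    constants = {}
    def const(v):
        if v not in constants:
            constants[v] = f"SK{next(counter)}"
        return constants[v]
    return [(pred, tuple(const(v) for v in args)) for pred, args in eps]

eps = [("_like_v_1", ("e", "x", "y")),
       ("named_Kim", ("x",)),
       ("named_Sandy", ("y",))]
print(skolemize(eps))
# [('_like_v_1', ('SK0', 'SK1', 'SK2')),
#  ('named_Kim', ('SK1',)), ('named_Sandy', ('SK2',))]
```

The name is misleading (as the slide notes) because nothing is being existentially quantified away; the constants are purely an implementation device for instantiating feature structures.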

14
Lexical rule application
  • Lexical rules that contribute EPs only used if EP
    is in input
  • Inflectional rules will only apply if variable
    has the correct sort
  • Lexical rule application does morphological
    generation (e.g., liked, bought)

15
Chart generation proper
  • Possible lexical signs added to a chart structure
  • Currently no indexing of chart edges
  • chart generation can use semantic indices, but
    current results suggest this doesn't help
  • Rules applied as for chart parsing edges checked
    for compatibility with input semantics (bag of
    EPs)

16
Root conditions
  • Complete structures must consume all the EPs in
    the input MRS
  • Should check for compatibility of scopes
  • precise qeq matching is (probably) too strict
  • exactly same scopes is (probably) unrealistic and
    too slow
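The completeness condition amounts to a multiset comparison: the EPs covered by a candidate structure must equal the input bag exactly. A set test would be too weak, since the same EP can legitimately occur twice. A minimal sketch:

```python
from collections import Counter

def complete(input_eps, covered_eps):
    """Root condition: every input EP consumed, none left over,
    respecting multiplicity (a bag of EPs, not a set)."""
    return Counter(input_eps) == Counter(covered_eps)

# Duplicate EPs must each be consumed once:
print(complete(["dog(x)", "big(x)", "big(x)"],
               ["big(x)", "dog(x)", "big(x)"]))   # True
print(complete(["dog(x)", "big(x)", "big(x)"],
               ["dog(x)", "big(x)"]))             # False
```

The scope-compatibility side of the root check has no equally clean formulation, which is what the slide's hedges about qeq matching are pointing at.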

17
Generation failures due to MRS issues
  • Well-formedness check prior to input to generator
    (optional)
  • Lexical lookup failure: predicate doesn't match
    entry, wrong arity, wrong variable types
  • Unwanted instantiations of variables
  • Missing EPs in input: syntax (e.g., no noun),
    lexical selection
  • Too many EPs in input, e.g., two verbs and no
    coordination

18
Improving generation via corpus-based techniques
  • CONTROL, e.g., intersective modifier order
  • Logical representation does not determine order:
  • wet(x), weather(x), cold(x)
  • UNDERSPECIFIED INPUT, e.g.,
  • Determiners: none/a/the/
  • Prepositions: in/on/at

19
Constraining generation for idiomatic output
  • Intersective modifier order e.g., adjectives,
    prepositional phrases
  • Logical representation does not determine order:
  • wet(x), weather(x), cold(x)

20
Adjective ordering
  • Constraints / preferences
  • big red car
  • red big car (dispreferred)
  • cold wet weather
  • wet cold weather (OK, but dispreferred)
  • Difficult to encode in symbolic grammar

21
Corpus-derived adjective ordering
  • n-grams perform poorly
  • Thater: direct evidence plus clustering
  • positional probability
  • Malouf (2000): memory-based learning plus
    positional probability, 92% on BNC
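Positional probability can be estimated from corpus counts of pairwise orders: how often each adjective occurs first in observed adjective bigrams. Modifiers are then ordered by that estimate. A toy sketch with invented counts (not Malouf's actual model or data):

```python
def positional_prob(pairs):
    """pairs: observed (first, second) adjective bigrams.
    Returns P(adjective occurs in first position) per adjective."""
    first, total = {}, {}
    for a, b in pairs:
        first[a] = first.get(a, 0) + 1
        for w in (a, b):
            total[w] = total.get(w, 0) + 1
    return {w: first.get(w, 0) / total[w] for w in total}

corpus = [("big", "red"), ("big", "old"), ("cold", "wet"),
          ("old", "red"), ("cold", "wet")]
probs = positional_prob(corpus)

def order(adjs):
    # The most first-position-prone adjective goes first;
    # unseen adjectives get a neutral 0.5.
    return sorted(adjs, key=lambda w: -probs.get(w, 0.5))

print(order(["red", "big"]))   # ['big', 'red']
print(order(["wet", "cold"]))  # ['cold', 'wet']
```

This captures the "difficult to encode symbolically" point: the ordering preference lives in graded corpus statistics rather than in grammar rules.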

22
Underspecified input to generation
  • We bought a car on Friday
  • Accept
  • pron(x), a_quant(y,h1,h2), car(y),
    buy(e_past,x,y), on(e,z), named(z,Friday)
  • and
  • pron(x), general_q(y,h1,h2), car(y),
    buy(e_past,x,y), temp_loc(e,z), named(z,Friday)
  • And maybe
  • pron(x_1pl), car(y), buy(e_past,x,y),
    temp_loc(e,z), named(z,Friday)

23
Guess the determiner
  • We went climbing in _ Andes
  • _ president of _ United States
  • I tore _ pyjamas
  • I tore _ duvet
  • George doesn't like _ vegetables
  • We bought _ new car yesterday

24
Determining determiners
  • Determiners are partly conventionalized, often
    predictable from local context
  • Translation from Japanese etc., speech prosthesis
    applications
  • More meaning-rich determiners assumed to be
    specified in the input
  • Minnen et al.: 85% on WSJ (using TiMBL)
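Memory-based learning of the TiMBL kind stores training instances outright and classifies a new context by its nearest stored neighbour under a simple feature-overlap metric. This toy version illustrates the idea; the feature triple and the training data are invented, not Minnen et al.'s actual feature set:

```python
# Each stored instance: (features, determiner), with a hypothetical
# (head_noun, is_proper, is_plural) feature triple.
MEMORY = [
    (("Andes", True, True), "the"),
    (("president", False, False), "the"),
    (("pyjamas", False, True), "my"),
    (("vegetables", False, True), ""),      # zero determiner
    (("car", False, False), "a"),
]

def guess_determiner(features):
    """1-nearest-neighbour: count matching feature values."""
    def overlap(stored):
        return sum(f == g for f, g in zip(features, stored))
    stored, det = max(MEMORY, key=lambda inst: overlap(inst[0]))
    return det

print(guess_determiner(("car", False, False)))      # 'a'
print(guess_determiner(("mountains", True, True)))  # 'the' (nearest: Andes)
```

The appeal for this task is that memory-based methods retain exceptional, conventionalized cases ("the Andes", "my pyjamas") instead of smoothing them away.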

25
Preposition guessing
  • Choice between temporal in/on/at
  • in the morning
  • in July
  • on Wednesday
  • on Wednesday morning
  • at three o'clock
  • at New Year
  • ERG uses hand-coded rules and lexical categories
  • Machine learning approach gives very high
    precision and recall on WSJ, good results on
    balanced corpus (Lin Mei, 2004, Cambridge MPhil
    thesis)
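The hand-coded side of this choice can be written as a small decision list keyed on a coarse lexical category of the time expression. The categories and rules below are illustrative only, not the ERG's actual encoding:

```python
# Illustrative decision list for temporal in/on/at.
CLOCK_LIKE = {"three o'clock", "noon", "midnight", "New Year"}
WEEKDAYS = {"Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"}

def temporal_preposition(expr):
    if expr in CLOCK_LIKE:
        return "at"                      # at three o'clock, at New Year
    if expr.split()[0] in WEEKDAYS:
        return "on"                      # on Wednesday, on Wednesday morning
    return "in"                          # in July, in the morning

print(temporal_preposition("Wednesday morning"))  # 'on'
print(temporal_preposition("July"))               # 'in'
print(temporal_preposition("New Year"))           # 'at'
```

Note how "morning" takes "in" on its own but "on" after a weekday, which is why the category must be assigned to the whole expression, not the head noun; the machine-learned classifier reported above learns such interactions from context features instead.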

26
SEM-I: semantic interface
  • Meta-level: manually specified grammar
    relations (constructions and closed-class)
  • Object-level: linked to lexical database for deep
    grammars
  • Definitional, e.g., lemma + POS + sense
  • Linked test suites, examples, documentation

27
SEM-I development
  • SEM-I eventually forms the API: stable, with
    changes negotiated
  • SEM-I vs Verbmobil SEMDB
  • Technical limitations of SEMDB
  • Too painful!
  • Munging rules: external vs internal
  • SEM-I development must be incremental

28
Role of SEM-I in architecture
  • Offline
  • Definition of correct (R)MRS for developers
  • Documentation
  • Checking of test-suites
  • Online
  • In unifier/selector: reject invalid RMRSs
  • Patching up input to generation

29
Goal: semi-automated documentation
[incr tsdb()] and semantic test-suite + Lex DB +
ERG documentation strings
  → (semi-automatic) Object-level SEM-I
  → auto-generated examples → documentation
    (examples autogenerated on demand)
Meta-level SEM-I → autogenerated appendix
30
Robust generation
  • SEM-I is an important preliminary:
  • check whether generator input is semantically
    compatible with the grammar
  • Eventually: hierarchy of relations outside the
    grammars, allowing underspecification
  • fill-in of underspecified RMRS
  • exploit work on determiner guessing etc.

31
Architecture (again)
External LF → (SEM-I) → Internal LF → specialization
modules → Chart generator (+ control modules) → String
32
Interface
  • External representation
  • public, documented
  • reasonably stable
  • Internal representation
  • syntax/semantics interface
  • convenient for analysis
  • External/Internal conversion via SEM-I

33
Guaranteed generation?
  • Given a well-formed input MRS/RMRS, with
    elementary predications found in SEM-I (and
    dependencies)
  • Can we generate a string? with input fix up?
    negotiation?
  • Semantically bleached lexical items: which, one,
    piece, do, make
  • Defective paradigms, negative polarity,
    anti-collocations etc?

34
Next stages
  • SEM-I development
  • Documentation and test suite integration
  • Generation from RMRSs produced by shallower
    parser (or deep/shallow combination)
  • Partially fixed text in generation (cogeneration)
  • Further statistical modules e.g., locational
    prepositions, other modifiers
  • More underspecification
  • Gradually increase flexibility of interface to
    generation