Title: Discourse Structure and Anaphoric Accessibility
1 Discourse Structure and Anaphoric Accessibility
- Massimo Poesio and Barbara Di Eugenio
- with help from Gerard Keohane
2 Content
- Empirical Investigations of Discourse Structure
- Grosz and Sidner's theory of the Global Focus
- Relational Discourse Analysis
- How we used RDA to study G&S
- Results
- Discussion
3 Empirical Investigations of Discourse Structure: A new opportunity
- Original proposals concerning the effect of discourse structure on accessibility (Reichman, 1985; Fox, 1987; Grosz and Sidner, 1986) were based on unsystematic analysis of data
- These days we know more about reliable studies of discourse phenomena (Passonneau and Litman, 1993; Carletta et al., 1997)
- These new resources have already been used to propose new theories of anaphora and discourse structure, such as Veins Theory (Cristea, Ide, Marcu, et al., 1998, 1999, 2000)
- The goal of this project: use a reliably annotated corpus (the Sherlock corpus from the University of Pittsburgh; Moser and Moore, 1996; Di Eugenio et al., 1997) to study the claims of G&S
4 Grosz and Sidner's Theory of the Global Focus
- The structure of a discourse is determined by the intentions utterances are meant to convey (DISCOURSE SEGMENT PURPOSES)
- INTENTIONAL STRUCTURE: DOMINANCE and SAT-PRECEDES relations between DSPs
- ATTENTIONAL STRUCTURE: a stack of FOCUS SPACES
  - Focus spaces on the stack contain accessible discourse entities
  - Presence on the stack reflects intentional structure
- The problem: how to identify DSPs in a discourse
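The stack discipline described above can be sketched in a few lines of Python. This is a minimal illustration, not Grosz and Sidner's own formalization: the names `FocusSpace`, `FocusStack`, `mention`, and `is_accessible`, as well as the entity strings, are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class FocusSpace:
    """Entities made salient by one discourse segment (one DSP)."""
    dsp: str
    entities: set = field(default_factory=set)


class FocusStack:
    """Attentional state: a stack of focus spaces, top = current segment."""

    def __init__(self):
        self.spaces = []

    def push(self, dsp):
        # Opening a segment pushes a fresh focus space for its DSP.
        space = FocusSpace(dsp)
        self.spaces.append(space)
        return space

    def pop(self):
        # Closing a segment removes its focus space and its entities.
        return self.spaces.pop()

    def mention(self, entity):
        # New discourse entities go into the topmost (current) space.
        self.spaces[-1].entities.add(entity)

    def is_accessible(self, entity):
        # An entity is accessible if any space still on the stack holds it.
        return any(entity in space.entities for space in self.spaces)


# Illustrative entities only; the segment assignment is hypothetical.
stack = FocusStack()
stack.push("DSP1")
stack.mention("test station")
stack.push("DSP2")
stack.mention("test package")
both_open = stack.is_accessible("test package")
stack.pop()  # DSP2 is closed, so its focus space leaves the stack
after_pop = stack.is_accessible("test package")
outer_still = stack.is_accessible("test station")
```

Once the embedded segment is popped, its entities are no longer accessible, while those of the dominating segment remain so.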
5 Relational Discourse Analysis (RDA)
- Moore and Pollack, 1992; Moser and Moore, 1996
- Combines ideas from RST and Grosz and Sidner's theory
- From Grosz and Sidner: discourse structure is determined by intentional structure
  - RDA-SEGMENT: a segment expressing an intentional relation
- From RST: segments have internal structure
  - CORE (cf. NUCLEUS)
  - CONTRIBUTOR (cf. SATELLITE)
- Both INTENTIONAL and INFORMATIONAL relations
- A fixed number of intentional relations
- Has been shown to be usable for reliable analysis
6 RDA Analysis of an excerpt from a tutorial
- 1.1 Before troubleshooting inside the test station,
- 1.2 It's always best to eliminate both the UUT and the TP
- 2.1 Since the test package is moved frequently
- 2.2 It is prone to damage
- 3.1 Also, testing the test package is much easier and faster
- 3.2 than opening up test station drawers.
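[Figure: RDA tree for the excerpt. Intentional relations: CONVINCE (twice), ENABLE. Informational relations: Prescribed-act:Wrong-act, Cause:effect, step1:step2. Leaves: 1.1, 1.2, 2.1, 2.2, 3.1, 3.2.]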
7 Moser and Moore: mapping between RST relations and G&S
- Basic principles
  - Every DSP must be associated with a core
  - Constituents of the RDA structure that do not include cores, such as clusters, do not introduce DSPs
- Consequences for attentional state
  - A new focus space is only pushed when a segment is opened
  - Informational relations do not affect the attentional state
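As a rough illustration of these consequences, the sketch below walks a constituent tree and records push and pop events only for nodes labelled with an intentional relation. The `Constituent` class, the `trace_focus_ops` function, and the toy tree are hypothetical; the tree is not the analysis shown on the slides, and the set of intentional relation names is the one listed for the Sherlock annotation.

```python
from dataclasses import dataclass, field

# Intentional relation names as listed for the Sherlock annotation.
INTENTIONAL = {"CONCEDE", "CONVINCE", "ENABLE", "JOINT"}


@dataclass
class Constituent:
    label: str  # relation name, or clause id for a leaf
    children: list = field(default_factory=list)


def trace_focus_ops(node, ops=None):
    """Depth-first walk recording push/pop events for intentional nodes only."""
    if ops is None:
        ops = []
    opens_segment = node.label in INTENTIONAL
    if opens_segment:
        ops.append(("push", node.label))
    for child in node.children:
        trace_focus_ops(child, ops)
    if opens_segment:
        ops.append(("pop", node.label))
    return ops


# Hypothetical toy tree: one segment embedding another segment
# and one purely informational constituent.
tree = Constituent("CONVINCE", [
    Constituent("ENABLE", [Constituent("1.1"), Constituent("1.2")]),
    Constituent("cause:effect", [Constituent("2.1"), Constituent("2.2")]),
])
ops = trace_focus_ops(tree)
```

The informational `cause:effect` node leaves no trace in `ops`, which is the sense in which informational relations do not affect the attentional state.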
8 Mapping RDA into Attentional State
- 1.1 Before troubleshooting inside the test station,
- 1.2 It's always best to eliminate both the UUT and the TP
- 2.1 Since the test package is moved frequently
- 2.2 It is prone to damage
- 3.1 Also, testing the test package is much easier and faster
- 3.2 than opening up test station drawers.
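[Figure: the same RDA tree with DSP 1 and DSP 2 marked. Intentional relations: CONVINCE (twice), ENABLE. Informational relations: Prescribed-act:Wrong-act, Cause:effect, step1:step2. Leaves: 1.1, 1.2, 2.1, 2.2, 3.1, 3.2.]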
9 Using an RDA-annotated corpus to study anaphoric accessibility
- The data: the SHERLOCK corpus, already annotated according to RDA instructions (Moser, 1996)
- Added anaphoric annotation according to GNOME instructions (Poesio, 2000), derived from the MATE scheme (Poesio, Bruneseaux and Romary, 1999)
- Use RDA analysis to drive focus space construction
- Measure
  - Accessibility
  - Perplexity
10 The Data: the SHERLOCK corpus
- 17 tutorial dialogues collected within the Sherlock project (Lesgold et al., 1992)
- Students solve electronic troubleshooting problems
- 313 turns, 1333 clauses
- RDA annotation: Moser and Moore, 1996
- Reliability verified at different levels
- Intentional relations: CONCEDE, CONVINCE, ENABLE, JOINT
11 An example of Sherlock dialogue
- STUDENT
- 1.1 Why isn't measurement signal path green during good test readings (steps)?
- TUTOR
- 2.1 For each step that passed,
- 2.2 you know the measurement path is good.
- 2.3 You also know that one of the measurement paths is bad.
- 2.4 Showing the UUT, Test Package, and measurement section as unknown is correct
- 2.5 because, you know when you get your fail that something was wrong,
- 2.6 but you didn't know exactly what.
- 2.7 The DMM is green
- 2.8 because it has been working all along.
- 2.9 The stimulus section is green
- 2.10 because it was not used
- 2.11 and is assumed to be good.
12 Anaphoric Annotation
- The GNOME scheme (Poesio, 2000)
- Mark up all NPs as NE elements, with a variety of attributes
  - About 3000 NEs
- Use a separate ANTE element to mark up anaphoric relations (including bridges)
  - In this annotation, only direct anaphoric relations
  - (About 1500 total)
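To make the standoff idea concrete, here is a small sketch that reads NE and ANTE elements from an XML fragment and pairs each anaphor with its antecedent. The fragment, the `unit` wrapper, and the attribute names (`id`, `current`, `antecedent`, and the nested `anchor` element) are assumptions for illustration and may not match the actual GNOME markup; the sentence is taken from the example dialogue.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: element and attribute names are illustrative only.
SAMPLE = """
<unit id="u1">
  <ne id="ne1">The DMM</ne> is green because
  <ne id="ne2">it</ne> has been working all along.
  <ante current="ne2"><anchor antecedent="ne1"/></ante>
</unit>
"""

root = ET.fromstring(SAMPLE)

# Map each NE id to its surface text.
mentions = {
    ne.get("id"): "".join(ne.itertext()).strip()
    for ne in root.iter("ne")
}

# Each ANTE element links the current NE to one or more antecedents.
links = [
    (ante.get("current"), anchor.get("antecedent"))
    for ante in root.iter("ante")
    for anchor in ante.iter("anchor")
]

resolved = [(mentions[cur], mentions[ant]) for cur, ant in links]
```

Keeping the relations in separate elements means the NE markup does not have to change when anaphoric links are added or revised.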
13 Evaluation
- A Perl script simulates focus space construction and computes accessibility and perplexity
  - Accessibility: whether the antecedent is in the focus stack
  - Perplexity: $\sum_i \frac{1}{d(x_i)}\, m(x_i)$, where $m(x_i) = 1$ if $x_i$ matches the anaphor and 0 otherwise
- Parameters for focus space construction
  - PUSHING
    - Whenever a relation is encountered (either informational or intentional)
    - Only when intentional
  - POPPING
    - As soon as the associated constituent is completed
    - Immediate popping of contributors, delayed popping of cores
    - Delayed popping of contributors
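The two measures can be sketched as follows. This is not the Perl script used in the study. The slides do not define $d$, so the sketch assumes that $d(x_i)$ is the 1-based depth of the focus space holding $x_i$, counted from the top of the stack; the function names, the `(id, head noun)` entity encoding, and the head-noun matching test are likewise hypothetical.

```python
def accessible(stack, antecedent):
    """True if the antecedent is in any focus space on the stack."""
    return any(antecedent in space for space in stack)


def perplexity(stack, matches):
    """Sum of m(x) / d(x) over all entities x on the stack.

    Assumption: d(x) is the 1-based depth of x's focus space,
    counted from the top of the stack.
    """
    total = 0.0
    for depth, space in enumerate(reversed(stack), start=1):
        for entity in space:
            if matches(entity):  # m(x) = 1, otherwise the term is 0
                total += 1.0 / depth
    return total


# Focus spaces listed bottom to top; entities are (id, head noun) pairs.
stack = [
    [("e1", "station"), ("e2", "drawer")],
    [("e3", "package")],
    [("e4", "station")],
]


def matches_station(entity):
    # Hypothetical matching test: same head noun as the anaphor.
    return entity[1] == "station"


score = perplexity(stack, matches_station)
found = accessible(stack, ("e1", "station"))
missing = accessible(stack, ("e9", "pin"))
```

Under this reading, a matching entity in the top focus space contributes 1 and one three spaces down contributes 1/3, so keeping more material on the stack tends to raise perplexity.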
14 Evaluation I: Intentional vs Informational
Accessibility
                                   OK    NO    Out of AP    PN
All                                199   74    63           158
Intentional (immediate popping)    280   20    63           131
Perplexity: All 0.83, Intentional 1.23
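One way to compare the two pushing policies is the share of OK cases among OK plus NO. The sketch below assumes that OK and NO count anaphors whose antecedent was or was not found on the stack, and it sets aside the other two columns, whose meaning is not spelled out here. The function name is hypothetical.

```python
def accessibility_rate(ok, no):
    """Share of OK cases among OK + NO."""
    return ok / (ok + no)


# (OK, NO) counts copied from the table above.
counts = {
    "All": (199, 74),
    "Intentional (immediate popping)": (280, 20),
}

rates = {
    name: round(accessibility_rate(ok, no), 3)
    for name, (ok, no) in counts.items()
}
```

On these counts the rate is about 0.729 when every relation pushes a focus space and about 0.933 when only intentional relations do.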
15 Complications
- 24.13a Since S52 puts a return (0 VDC) on its outputs
- 24.13b when they are active,
- 24.14 the inactive state must be some other voltage.
- 24.15 So even though you may not know what the other voltage is,
- 24.16 You can test to ensure that
- 24.17a the active pins are 0 VDC
- 24.17b and all the inactive pins are not 0 VDC.
[Figure: RDA tree for the excerpt, with DSP 1 marked. Intentional relations: ENABLE (twice), CONCEDE. Informational relations: Effect:cause, Contrast1:contrast2. Leaves: 24.13a, 24.13b, 24.14, 24.15, 24.16, 24.17a, 24.17b.]
17 Evaluation II: Delayed Popping
Accessibility
                             OK    NO
Immediate popping            280   20
Delay pop of cores           287   16
Delay pop of contributors    310   8
Perplexity
- Average perplexity with immediate popping: 1.23
- Delayed popping of cores: 1.3
- Delayed popping of contributors: 1.33
18 Discussion
- Accessibility
  - Intentional vs. informational distinction makes sense
    - Cf. Fox
  - Want to keep contributors as well as cores on stack
    - Cf. Veins Theory
- An evaluation of Grosz and Sidner's framework
  - The most direct implementation makes quite a few discourse entities inaccessible
  - Difficult to interpret more complex operations in terms of intentional structure
- Alternative: a cache model (cf. Guindon 1985; Walker 1996, 1998)
  - Version 1 (conservative): cache of focus spaces
  - Version 2: cache of forward-looking centers
19 Cache-based global focus: a conservative proposal
- Cache elements are FOCUS SPACES
- Cache elements are RANKED
  - Current focus space < other constituents of same segment < dominating segments < focus spaces of contributors to closed spaces (cf. Reichman 85)
- Search algorithm: follow ranking
- Cache replacement algorithm
  - Opening RDA segment: open new focus space, replace lowest-ranked element of cache, assign it highest rank
  - Closing RDA segment: assign lowest rank to embedded contributors
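The replacement and search rules above can be sketched as a small ranked cache. This is only one possible reading of the proposal: the class `FocusCache`, its method names, and the fixed `capacity` are assumptions (no cache size is given here), and the sketch models only the open and close operations, not the full four-level ranking.

```python
class FocusCache:
    """Fixed-size cache of focus spaces kept in rank order (index 0 = highest)."""

    def __init__(self, capacity):
        self.capacity = capacity  # hypothetical parameter
        self.ranked = []          # list of (segment_id, set of entities)

    def open_segment(self, segment_id):
        # If the cache is full, the lowest-ranked focus space is replaced.
        if len(self.ranked) >= self.capacity:
            self.ranked.pop()
        # The new focus space is assigned the highest rank.
        self.ranked.insert(0, (segment_id, set()))

    def add_entity(self, segment_id, entity):
        for sid, entities in self.ranked:
            if sid == segment_id:
                entities.add(entity)
                return
        raise KeyError(segment_id)

    def close_segment(self, contributor_ids):
        # Embedded contributors drop to the lowest ranks,
        # keeping their relative order.
        kept = [fs for fs in self.ranked if fs[0] not in contributor_ids]
        demoted = [fs for fs in self.ranked if fs[0] in contributor_ids]
        self.ranked = kept + demoted

    def search(self, entity):
        # Search follows the ranking; return the first segment holding the entity.
        for sid, entities in self.ranked:
            if entity in entities:
                return sid
        return None


cache = FocusCache(capacity=2)
cache.open_segment("seg1")
cache.add_entity("seg1", "S52")
cache.open_segment("seg2")
cache.add_entity("seg2", "inactive state")
cache.close_segment({"seg2"})  # seg2 treated as a closed embedded contributor
order_after_close = [sid for sid, _ in cache.ranked]
cache.open_segment("seg3")     # cache is full: seg2 is replaced
order_after_open = [sid for sid, _ in cache.ranked]
hit = cache.search("S52")
miss = cache.search("inactive state")
```

Unlike a strict stack, a closed contributor stays searchable at low rank until a newly opened segment needs its slot.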