Prosodic Cues to Discourse Segment Boundaries in Human-Computer Dialogue


1
Prosodic Cues to Discourse Segment Boundaries in
Human-Computer Dialogue
  • SIGDial 2004
  • Gina-Anne Levow
  • April 30, 2004

2
Roadmap
  • Motivation
  • Data Collection
  • Segment Boundary Selection
  • Feature Extraction and Analysis
  • Cues to Segment Boundaries
  • Preliminary Classification Study
  • Conclusion

3
Why Segment?
  • Enables language understanding tasks
  • Reference resolution
  • Anaphors typically refer to entities in current
    segment
  • Summarization
  • Identify and represent range of topics
  • Conversational understanding
  • Constrain recognition
  • Different interpretations in different contexts

4
Approaches to Segmentation
  • Monologue
  • Text similarity
  • Vector space, language model, cue phrases
  • (Hearst 1994, Beeferman et al 1999, Marcu 2000)
  • Prosodic cues (with text)
  • Pitch, amplitude, duration, pause
  • (Nakatani et al 1995, Swerts 1997, Tur et al
    2001)
  • Human dialogue
  • Dialogue act classification (Shriberg et al,
    1998; Taylor et al, 1998)
  • Text: language models; prosody: contour, accent
    type
  • Multi-party segmentation
  • Text and silence (Galley et al, 2003)

5
Prosody in Human-Computer Dialogue
  • Errors in speech recognition
  • Prosody provides additional source of evidence
  • Topic change can be expensive
  • Possible contrasts to human-human dialogue
  • More stilted speaking style
  • Slow conversation

6
Data Collection
  • System
  • SpeechActs (Sun Microsystems, 1993-1996)
  • Voice-only interface to desktop applications
  • Email, calendar, weather, stock quotes, time,
    currency
  • Data
  • 60 hours, collected during field trial
  • 19 subjects: 4 expert, 14 novice, guest
  • Recorded at 8 kHz, 8-bit μ-law; logged
  • Manually transcribed
  • > 7500 user utterances

7
Discourse Segment Boundary Data
  • Focus
  • High-level discourse segment boundaries
  • Not fine-grained subtopic analysis (future work)
  • More reliably coded and extracted
  • (Swerts, 1997; Nakatani et al 1995)
  • Task-based correspondence
  • Align with changes from application to
    application
  • Reliably extractable from current data set

8
Data Set
  • Paired data set
  • Discourse segment-final and segment-initial pairs
  • User utterances
  • Last command in current application, and
    application change command
  • U: What's the price for Sun? (Segment-final)
  • S:
  • U: Switch to mail. (Segment-initial)
  • 473 pairs
  • Extracted automatically
  • Alignment, content verified manually
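
The pairing rule above can be sketched in a few lines. This is only an illustration under stated assumptions: the `(speaker, text)` turn format and the "switch to" prefix test are hypothetical stand-ins for the SpeechActs logs and command grammar, which the slides do not describe.

```python
from typing import List, Tuple

# Hypothetical log format: (speaker, text) tuples in dialogue order,
# with "U" for user turns and "S" for system turns.
Turn = Tuple[str, str]


def is_change_command(text: str) -> bool:
    # Assumption: application changes look like "Switch to <application>".
    return text.strip().lower().startswith("switch to")


def extract_boundary_pairs(turns: List[Turn]) -> List[Tuple[str, str]]:
    """Return (segment-final, segment-initial) user utterance pairs."""
    user_texts = [text for speaker, text in turns if speaker == "U"]
    pairs = []
    for prev, cur in zip(user_texts, user_texts[1:]):
        # Keep the pair only when the second utterance changes application
        # and the first is an ordinary command (an assumption of this sketch).
        if is_change_command(cur) and not is_change_command(prev):
            pairs.append((prev, cur))
    return pairs
```

Applied to the example exchange on this slide, the sketch yields one pair: the stock quote request as segment-final and "Switch to mail." as segment-initial. Manual verification, as noted above, would still be needed.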

9
Acoustic Analysis
  • Features
  • Pitch and intensity
  • Extracted automatically (Praat)
  • 5-point median smoothed
  • Normalized per-speaker/call
  • Scalar measures
  • Maximum, minimum, mean
  • Full utterance
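
A minimal sketch of the processing steps listed above, assuming the pitch or intensity track has already been exported from Praat as a list of values for voiced frames only. The function names are hypothetical, and z-score normalization is an assumption: the slides say "normalized per-speaker/call" without naming the method.

```python
from statistics import mean, median
from typing import Dict, List


def median_smooth(track: List[float], window: int = 5) -> List[float]:
    """Replace each frame by the median of a centered window.

    At the edges the window is truncated rather than padded.
    """
    half = window // 2
    return [
        median(track[max(0, i - half): i + half + 1])
        for i in range(len(track))
    ]


def normalize(track: List[float], ref_mean: float, ref_sd: float) -> List[float]:
    # Assumption: z-score against the speaker's (or call's) mean and
    # standard deviation; the original normalization may differ.
    return [(value - ref_mean) / ref_sd for value in track]


def scalar_measures(track: List[float]) -> Dict[str, float]:
    """Maximum, minimum, and mean over the full utterance."""
    return {"max": max(track), "min": min(track), "mean": mean(track)}
```

The median filter is the usual choice for removing isolated pitch-tracking spikes such as octave errors, since a single outlier inside a 5-frame window cannot move the median.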

10
Acoustic Contrasts
  • Pitch
  • Segment-initial vs segment-final
  • Maximum, minimum, and mean significantly higher
  • Lower final fall in segment-final
  • Intensity
  • Segment-initial vs segment-final
  • Mean intensity significantly higher
  • No other measures significant
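
The slides do not say which significance test was used. Because the data are matched segment-final and segment-initial pairs, a paired t statistic is one natural option; the sketch below is an assumption for illustration, not the reported analysis, and `paired_t` is a hypothetical name.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence


def paired_t(initial: Sequence[float], final: Sequence[float]) -> float:
    """Paired t statistic for segment-initial minus segment-final values.

    Each position holds one measure (for example normalized mean pitch)
    for the two utterances of the same pair.
    """
    diffs = [a - b for a, b in zip(initial, final)]
    n = len(diffs)
    # t = mean difference divided by its standard error, with n - 1
    # degrees of freedom.
    return mean(diffs) / (stdev(diffs) / sqrt(n))
```

A positive value means the measure tends to be higher in the segment-initial utterance; the statistic would then be compared against a t distribution with n - 1 degrees of freedom.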

11
Acoustic Contrasts

12
Discussion
  • Segment-initial utterances
  • Significantly higher in pitch and intensity
  • Largest contrast
  • Dramatically lower pitch in segment-final
  • Low pitch as cue to topic finality
  • Robust cues to discourse segment boundaries

13
Classification: Preliminary Experiments
  • Automatic prosody-based identification of segment
    boundaries
  • Question: Does a pair of utterances span a
    segment boundary?
  • Data
  • Ordered utterance pairs
  • Half segment-final / segment-initial (boundary) pairs
  • Half non-boundary

14
Classifier and Features
  • Decision tree classifier (C4.5)
  • Features
  • Pitch and intensity
  • Maximum, minimum, mean
  • Values for each utterance
  • Differences across pair
  • Preliminary classification results
  • 70-80% accuracy
  • Key features
  • Minimum pitch, average intensity
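
The feature layout described above (per-utterance values plus differences across the pair) can be sketched as follows. The measure names and the `pair_features` helper are hypothetical, and the sign convention for the differences (second minus first) is an assumption; the sketch builds the feature row only and does not implement C4.5.

```python
from typing import Dict

# Hypothetical names for the six scalar measures of one utterance.
MEASURES = (
    "pitch_max", "pitch_min", "pitch_mean",
    "intensity_max", "intensity_min", "intensity_mean",
)


def pair_features(first: Dict[str, float], second: Dict[str, float]) -> Dict[str, float]:
    """Build one feature row for an ordered utterance pair."""
    features = {}
    for name in MEASURES:
        features["first_" + name] = first[name]
        features["second_" + name] = second[name]
        # Assumed convention: positive when the second utterance is higher.
        features["diff_" + name] = second[name] - first[name]
    return features
```

Each pair yields 18 values (6 measures for each utterance plus 6 differences), which could be passed, with a boundary or non-boundary label, to any decision tree learner.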

15
Classifier Tree
[Figure: classifier decision tree]

16
Conclusions and Future Work
  • Discourse segment boundaries in HCI
  • Segment-initial utterances
  • Significant increases in pitch and intensity
  • Relative to segment-final
  • Robust contrastive use of pitch and intensity
  • Preliminary classification efforts: 70-80%
  • Difference in pitch minimum, intensity
  • Extend to subdialogue structure
  • Richer feature set, data set
