Recognizing Structure: Sentence and Topic Segmentation

Slides: 35
Provided by: juliahir

1
Recognizing Structure: Sentence and Topic Segmentation
  • Julia Hirschberg
  • CS 4706

2
Types of Discourse Structure in Spoken Corpora
  • Domain independent
    • Sentence/utterance boundaries
    • Speaker turn segmentation
    • Topic/story segmentation
  • Domain dependent
    • Broadcast news
    • Meetings
    • Telephone conversations

3
Today
  • Theoretical studies of discourse structure
  • Text-based approaches and cues
  • Speech-based approaches and cues
  • Practical goals for speech segmentation and
    state-of-the-art

4
Hierarchical Structure in Discourse?
  • Welcome to word processing.
  • That's using a computer to type letters and
    reports. Make a typo?
  • No problem.
  • Just back up, type over the mistake, and it's
    gone.
  • And, it eliminates retyping.

5
Structures of Discourse Structure (Grosz & Sidner 86)
  • Leading theory of discourse structure
  • Three components
    • Linguistic structure
      • What is said
      • Assumption: divided into appropriate units of
        analysis (Discourse Segments)
    • Intentional structure
      • How are segments structured into topics,
        subtopics
      • Relations: satisfaction-precedence, dominance
    • Attentional structure

6
How are these relations recognized in a discourse?
  • Linguistic markers
    • tense and aspect
    • cue phrases: now, well
  • Inference of S's intentions
  • Inference from task structure
  • Intonational information

7
Structural Information in Text
  • Example: reviews of iPod nano
  • Cue phrases: now, well, first
  • Pronominal reference
  • Orthography and formatting (in text)
  • Lexical information (Hearst 94, Reynar 98,
    Beeferman et al 99)
  • Domain dependent
  • Domain independent

8
Methods of Text Segmentation
  • Lexical cohesion methods vs. multiple-source methods
  • Vocabulary similarity indicates topic cohesion
  • Intuition from Halliday & Hasan 76
  • Features to capture
  • Stem repetition
  • Entity repetition
  • Word frequency
  • Context vectors
  • Semantic similarity
  • Word distance
  • Methods
  • Sliding window
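
A minimal sketch of the sliding-window method listed above, under stated assumptions: words are extracted with a simple regex and are not stemmed (raw lowercase words stand in for the stems the slide mentions), and the window size of 2 sentences is an arbitrary illustrative choice. The function names are hypothetical and are not taken from Hearst 94 or any other cited system.

```python
import math
import re
from collections import Counter

def bag_of_words(sentences):
    """Lowercased word counts for a block of sentences (no stemming here)."""
    words = []
    for s in sentences:
        words.extend(re.findall(r"[a-z']+", s.lower()))
    return Counter(words)

def cosine(a, b):
    """Cosine similarity between two Counter vectors; 0.0 if either is empty."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def gap_scores(sentences, window=2):
    """Similarity of the `window` sentences before and after each sentence gap."""
    scores = []
    for gap in range(1, len(sentences)):
        left = bag_of_words(sentences[max(0, gap - window):gap])
        right = bag_of_words(sentences[gap:gap + window])
        scores.append(cosine(left, right))
    return scores

def best_boundary(sentences, window=2):
    """Index of the sentence that opens the new segment at the least-similar gap."""
    scores = gap_scores(sentences, window)
    return scores.index(min(scores)) + 1

sentences = [
    "cats chase mice",
    "mice fear cats",
    "stocks fell sharply",
    "investors sold stocks",
]
boundary = best_boundary(sentences)
```

Low similarity across a gap suggests a vocabulary shift and hence a candidate topic boundary; a real system would smooth the scores and pick several local minima rather than a single global one.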

9
  • Lexical chains
  • Clustering
  • Combine lexical cohesion with other cues
  • Features
  • Cue phrases
  • Reference (e.g. pronouns)
  • Syntactic features
  • Methods
  • Machine Learning from labeled corpora

10
Choi 2000: An Example
  • Implements leading methods and compares new
    algorithm to them on corpus of 700 concatenated
    documents
  • Comparison algorithms
  • Baselines
  • No boundaries
  • All boundaries
  • Regular partition
  • Random # of random partitions
  • Actual # of random partitions

11
  • TextTiling algorithm (Hearst 94)
  • DotPlot algorithms (Reynar 98)
  • Segmenter (Kan et al 98)
  • Choi 00 proposal
  • Cosine similarity measure on stems
  • Same sentence: 1; no overlap: 0
  • Similarity matrix → rank matrix
  • Replace each cell with # of lower-valued neighbor
    cells, normalized by # of neighboring cells
  • How likely is this sentence to be a boundary,
    compared to other sentences?
  • Minimize effect of outliers
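
The similarity-to-rank step can be sketched as follows. This is an illustration only, assuming each sentence is already a vector of stem counts and using a small neighborhood radius of 1; the neighborhood size, the function names, and the toy vectors are assumptions, not details taken from the slide.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length count vectors; 0.0 if either is all zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def similarity_matrix(vectors):
    """Sentence-by-sentence cosine similarity matrix."""
    return [[cosine(u, v) for v in vectors] for u in vectors]

def rank_matrix(sim, radius=1):
    """Replace each cell by (# of lower-valued neighbors) / (# of neighbors)."""
    n = len(sim)
    rank = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            lower = total = 0
            for r in range(max(0, i - radius), min(n, i + radius + 1)):
                for c in range(max(0, j - radius), min(n, j + radius + 1)):
                    if (r, c) == (i, j):
                        continue  # the cell itself is not its own neighbor
                    total += 1
                    lower += sim[r][c] < sim[i][j]
            rank[i][j] = lower / total if total else 0.0
    return rank

# Toy input: two sentences about one topic, then two about another.
vectors = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
sim = similarity_matrix(vectors)
rank = rank_matrix(sim)
```

Ranking makes each cell a local, relative judgment, which is why it dampens the effect of outlier similarity values.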

12
  • Divisive clustering based on
  • D(n) = (sum of rank values s_{i,j} of segment n) /
    (inside area of segment n, (j - i + 1)^2), for i, j the
    sentences at the beginning and end of segment n
  • Homogeneous segments have large rank values
    within a small area of the matrix
  • Keep dividing the corpus until
    ΔD(n) = D(n) - D(n-1) shows little change
  • Choi's algorithm performs best (9-12% error)
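
A rough sketch of the divisive clustering loop, assuming a rank matrix is already available as a list of lists. The density is computed over all current segments (total rank sum over total area), and the stopping rule is simplified to a fixed `min_gain` threshold on the change in density; that threshold, its value, and the function names are assumptions made for illustration.

```python
def block_sum(rank, i, j):
    """Sum of rank values in the square block covering sentences i..j inclusive."""
    return sum(rank[r][c] for r in range(i, j + 1) for c in range(i, j + 1))

def density(rank, bounds):
    """Inside density: total rank sum over total area of all current segments.

    `bounds` holds sorted segment start indices followed by the end index n.
    """
    total = area = 0
    for start, nxt in zip(bounds, bounds[1:]):
        end = nxt - 1
        total += block_sum(rank, start, end)
        area += (end - start + 1) ** 2
    return total / area

def divisive_segment(rank, min_gain=0.05):
    """Greedily add the boundary that most raises density until the gain is small."""
    n = len(rank)
    if n == 0:
        return []
    bounds = [0, n]
    current = density(rank, bounds)
    while len(bounds) - 1 < n:  # stop once every sentence is its own segment
        candidates = [b for b in range(1, n) if b not in bounds]
        best = max(candidates, key=lambda b: density(rank, sorted(bounds + [b])))
        new = density(rank, sorted(bounds + [best]))
        if new - current < min_gain:  # change in density shows little change
            break
        bounds = sorted(bounds + [best])
        current = new
    return bounds[1:-1]

# Toy block-diagonal rank matrix: sentences 0-1 cohere, sentences 2-3 cohere.
toy_rank = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
boundaries = divisive_segment(toy_rank)
```

On the toy matrix the first split raises the density from 0.5 to 1.0, and no further split improves it, so the loop stops with a single boundary.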

13
Acoustic and Prosodic Cues to Segmentation
  • Intuition
  • Speakers vary acoustic and prosodic cues to
    convey variation in discourse structure
  • Systematic? In read or spontaneous speech?
  • Evidence
  • Observations from recorded corpora
  • Laboratory experiments
  • Machine learning of discourse structure from
    acoustic/prosodic features

14
Spoken Cues to Discourse/Topic Structure
  • Pitch range
    • Lehiste 75, Brown et al 83, Silverman 86,
      Avesani & Vayra 88, Ayers 92, Swerts et al 92,
      Grosz & Hirschberg 92, Swerts & Ostendorf 95,
      Hirschberg & Nakatani 96
  • Preceding pause
    • Lehiste 79, Chafe 80, Brown et al 83,
      Silverman 86, Woodbury 87, Avesani & Vayra 88,
      Grosz & Hirschberg 92, Passonneau & Litman 93,
      Hirschberg & Nakatani 96

15
  • Rate
    • Butterworth 75, Lehiste 80, Grosz &
      Hirschberg 92, Hirschberg & Nakatani 96
  • Amplitude
    • Brown et al 83, Grosz & Hirschberg 92,
      Hirschberg & Nakatani 96
  • Contour
    • Brown et al 83, Woodbury 87, Swerts et al 92

16
A Practical Problem: Finding Sentence and
Topic/Story Boundaries in ASR Transcripts
  • Motivation
    • Finding sentences critical to further
      syntactic/semantic analysis
    • Topic/story ID important to identify common
      regions for Q/A, extraction
  • Features
    • Lexical cues
      • Domain dependent
      • Sensitive to ASR performance
    • Acoustic/prosodic cues
      • Domain independent
      • Sensitive to speaker identity
  • Statistical, machine learning approaches with
    large segmented corpora (e.g. Broadcast News)

17
ASR Transcription
  • aides tonight in boston in depth the truth squad
    for special series until election day tonight the
    truth about the budget surplus of the candidates
    are promising the two international flash points
    getting worse while the middle east and a new
    power play by milosevic and a lifelong a family
    tries to say one child life by having another
    amazing breakthrough the u s was was told local
    own boss good evening uh from the university of
    massachusetts in boston the site of the widely
    anticipated first of eight between vice president
    al gore and governor george w bush with the
    election now just five weeks away this is the
    beginning of a sprint to the finish and a strong
    start here tonight is important this is the stage
    for the two candidates will appear before a
    national television audience taking questions
    from jim lehrer of p b s n b cs david gregory is
    here with governor bush claire shipman is
    covering the vice president claire you begin
    tonight please

18
Speaker segmentation (Diarization)
  • Speaker 0 - aides tonight in boston in depth the
    truth squad for special series until election day
    tonight the truth about the budget surplus of the
    candidates are promising the two international
    flash points getting worse while the middle east
    and a new power play by milosevic and a lifelong
    a family tries to say one child life by having
    another amazing breakthrough the u s was was told
    local own boss good evening uh from the
    university of massachusetts in boston
  • Speaker 1 - the site of the widely anticipated
    first of eight between vice president al gore and
    governor george w bush with the election now
    just five weeks away this is the beginning of a
    sprint to the finish and a strong start here
    tonight is important this is the stage for the
    two candidates will appear before a national
    television audience taking questions from jim
    lehrer of p b s n b cs david gregory is here
    with governor bush claire shipman is covering the
    vice president claire you begin tonight please

19
Sentence detection, punctuation
  • Speaker Anchor - Aides tonight in boston. In
    depth the truth squad for special series until
    election day. Tonight the truth about the budget
    surplus of the candidates are promising. The two
    international flash points getting worse. While
    the middle east. And a new power play by
    milosevic and a lifelong a family tries to say
    one child life by having another amazing
    breakthrough the u. s. was was told local own
    boss. Good evening uh from the university of
    massachusetts in boston.
  • Speaker Reporter - The site of the widely
    anticipated first of eight between vice president
    al gore and governor george w. bush. With the
    election now just five weeks away. This is the
    beginning of a sprint to the finish. And a strong
    start here tonight is important. This is the
    stage for the two candidates will appear before a
    national television audience taking questions
    from jim lehrer of p. b. s. n. b. c.'s david
    gregory is here with governor bush. Claire
    shipman is covering the vice president claire you
    begin tonight please.

20
Story boundary detection
  • Speaker Anchor - Aides tonight in boston. In
    depth the truth squad for special series until
    election day. Tonight the truth about the budget
    surplus of the candidates are promising. The two
    international flash points getting worse. While
    the middle east. And a new power play by
    milosevic and a lifelong a family tries to say
    one child life by having another amazing
    breakthrough the u. s. was was told local own
    boss. Good evening uh from the university of
    massachusetts in boston.
  • Speaker Reporter - The site of the widely
    anticipated first of eight between vice president
    al gore and governor george w. bush. With the
    election now just five weeks away. This is the
    beginning of a sprint to the finish. And a strong
    start here tonight is important. This is the
    stage for the two candidates will appear before a
    national television audience taking questions
    from jim lehrer of p. b. s. n. b. c.'s david
    gregory is here with governor bush. Claire
    shipman is covering the vice president claire you
    begin tonight please.

21
Prosodic Cues (Shriberg et al 00)
  • Text-based segmentation is fine, if you have
    reliable text
  • Could prosodic cues perform as well or better at
    sentence and topic segmentation in ASR
    transcripts? More robust? More general?
  • Goal: identify sentence and topic boundaries at
    ASR-defined word boundaries
  • CART decision trees and LM
  • HMM combined prosodic and LM results

22
  • Features, for each potential boundary location:
    • Pause at boundary (raw and normalized by speaker)
    • Pause at word before boundary (is this a new
      turn or part of continuous speech segment?)
    • Phone and rhyme duration (normalized by inherent
      duration) (phrase-final lengthening?)
    • F0 (smoothed and stylized): reset, range
      (topline, baseline), slope and continuity
    • Voice quality (halving/doubling estimates as
      correlates of creak or glottalization)
    • Speaker change, time from start of turn, turns
      in conversation and gender
  • Trained/tested on Switchboard and Broadcast News
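
As an illustration of the pause and speaker-change features above, here is a small sketch that derives them from word-level timings. It assumes the ASR output gives a speaker label plus start and end times per word, and it normalizes each pause by the preceding speaker's mean pause; that normalization scheme, the field names, and the function name are assumptions, not the procedure of Shriberg et al 00.

```python
from statistics import mean

def pause_features(words):
    """words: time-ordered list of (speaker, start_sec, end_sec).

    Returns one feature dict per boundary between consecutive words.
    """
    feats = []
    for prev, nxt in zip(words, words[1:]):
        feats.append({
            "speaker": prev[0],
            "pause": max(0.0, nxt[1] - prev[2]),   # raw pause at the boundary
            "speaker_change": prev[0] != nxt[0],
        })
    # Speaker normalization (assumed scheme): divide by that speaker's mean pause.
    by_speaker = {}
    for f in feats:
        by_speaker.setdefault(f["speaker"], []).append(f["pause"])
    means = {spk: mean(pauses) for spk, pauses in by_speaker.items()}
    for f in feats:
        m = means[f["speaker"]]
        f["pause_norm"] = f["pause"] / m if m > 0 else 0.0
    return feats

timed_words = [("A", 0.0, 0.5), ("A", 0.6, 1.0), ("A", 1.5, 2.0), ("B", 2.2, 2.6)]
features = pause_features(timed_words)
```

Each resulting dict would be one row of input to a classifier such as the decision trees mentioned on the previous slide.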

23
Sentence segmentation results
  • Prosodic-only model
    • Better than LM for Broadcast News (BN)
    • Worse (on hand transcription) and same (for ASR
      transcript) on Switchboard (SB)
    • Slightly improves LM on SB
  • Useful features for BN
    • Pause at boundary, turn change/no turn change, F0
      diff across boundary, rhyme duration
  • Useful features for SB
    • Phone/rhyme duration before boundary, pause at
      boundary, turn/no turn, pause at preceding word
      boundary, time in turn

24
Topic segmentation results (BN only)
  • Useful features
    • Pause at boundary, F0 range, turn/no turn,
      gender, time in turn
  • Prosody alone better than LM
  • Combined model improves significantly

25
Recent Work on Story Segmentation (Rosenberg et al 07)
  • Story segmentation goal: divide each show into
    homogeneous regions, each about a single topic
  • Task: focused Q/A
  • Issue: what unit of analysis should we use in
    assessing potential boundaries?

26
TDT-4 Corpus
  • English: 312.5 hours, 250 broadcasts, 6 shows
  • Arabic: 88.5 hours, 109 broadcasts, 2 shows
  • Mandarin: 109 hours, 134 broadcasts, 3 shows
  • Manually annotated story boundaries
  • ASR Hypotheses
  • Speaker Diarization Hypotheses

27
Approach
  • Identify set of segments which define
    • Unit of analysis
    • Candidate boundaries
  • Classify each candidate boundary based on
    features extracted from segments
    • C4.5 decision tree
  • Model each show-type separately
    • E.g. CNN Headline News and ABC World News
      Tonight have distinct models
  • Evaluate using WindowDiff with k = 100
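
For reference, a minimal sketch of the WindowDiff metric: slide a window of k units across the reference and hypothesized segmentations and count the windows in which the two disagree on the number of boundaries. The encoding used here (a list of 0/1 flags, where 1 means "boundary after this unit") and the function name are assumptions chosen for the example; the toy call uses k = 2 rather than the k = 100 used in the evaluation.

```python
def window_diff(reference, hypothesis, k):
    """Fraction of length-k windows whose boundary counts differ (0 is best)."""
    if len(reference) != len(hypothesis):
        raise ValueError("segmentations must have the same length")
    n = len(reference)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < len(reference)")
    errors = 0
    for i in range(n - k):
        # Compare the number of boundaries inside the window starting at i.
        if sum(reference[i:i + k]) != sum(hypothesis[i:i + k]):
            errors += 1
    return errors / (n - k)

ref = [0, 0, 1, 0, 0, 0, 1, 0]
no_boundaries = [0] * 8
score_perfect = window_diff(ref, ref, k=2)
score_empty = window_diff(ref, no_boundaries, k=2)
```

Unlike exact-match precision and recall, this gives partial credit for near misses, since a boundary placed a few units off still agrees with the reference in most windows.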

28
Segment Boundary Modeling Features
  • Acoustic
    • Pitch & Intensity
      • speaker normalized
      • min, mean, max, stdev, slope
    • Speaking Rate
      • vowels/sec, voiced frames/sec
    • Final Vowel, Rhyme Length
    • Pause Length
  • Lexical
    • TextTiling scores
    • LCSeg scores
    • Story beginning and ending keywords
  • Structural
    • Position in show
    • Speaker participation
    • First or last speaker turn?

29
Input Segmentations
  • ASR word boundaries
    • No segmentation baseline
  • Hypothesized Sentence Units
    • Boundaries with 0.5, 0.3 and 0.1 confidence
      thresholds
  • Pause-based Segmentation
    • Boundaries at pauses over 500ms and 250ms
  • Hypothesized Intonational Phrases
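
The pause-based input segmentation is simple enough to sketch directly. The example assumes word-level start and end times in seconds from the ASR output; the tuple layout, function name, and toy timings are hypothetical.

```python
def pause_segments(words, min_pause):
    """words: time-ordered list of (token, start_sec, end_sec).

    Starts a new segment whenever the silence before a word exceeds min_pause.
    """
    segments, current = [], []
    for idx, (token, start, _end) in enumerate(words):
        if current and start - words[idx - 1][2] > min_pause:
            segments.append(current)
            current = []
        current.append(token)
    if current:
        segments.append(current)
    return segments

timed = [
    ("good", 0.0, 0.3), ("evening", 0.35, 0.8),
    ("from", 1.1, 1.3), ("boston", 1.35, 1.8),
    ("the", 2.4, 2.5), ("site", 2.55, 2.9),
]
long_pause_segments = pause_segments(timed, min_pause=0.5)    # 500ms threshold
short_pause_segments = pause_segments(timed, min_pause=0.25)  # 250ms threshold
```

Lowering the threshold yields more and shorter segments, and therefore more candidate story boundaries for the classifier to consider.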

30
Hypothesizing Intonational Phrases
  • 30 minutes of manually annotated ASR BN from a
    reserved TDT-4 CNN show
  • Phrase was marked if a phrase boundary occurred
    since the previous word boundary
  • C4.5 decision tree
    • Pitch, energy and duration features
    • Normalized by hypothesized speaker ID and
      surrounding context
  • 66.5 F-measure (p = .683, r = .647)
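
The reported F-measure can be checked against the precision and recall figures, assuming the balanced (F1) form, which the slide does not state explicitly:

```latex
F = \frac{2pr}{p + r}
  = \frac{2 \times 0.683 \times 0.647}{0.683 + 0.647}
  = \frac{0.8838}{1.330}
  \approx 0.6645
```

This agrees with the reported 66.5 after rounding.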

31
Story Segmentation Results

32
Input Segmentation Statistics
  • Exact Story Boundary Coverage (pct.)
  • Mean Distance to Nearest True Boundary (words)
  • Segment to True Boundary Ratio

33
Conclusions and Future Work
  • Best performance
    • Low threshold (0.1) sentences
    • Short pause (250ms) segmentation
  • Hyp. IPs perform better than sentences.
  • Would increased SU, IP accuracy improve story
    segmentation?
  • External evaluation: impact on IR and MT
    performance.

34
Next Class
  • Spoken Dialogue Systems