Title: Use of Boundary Metadata in Parsing
1. Use of Boundary Metadata in Parsing and Language Modeling
- Mari Ostendorf
- J. Kahn, D. Hillard, D. Wong, W. McNeill
- University of Washington
- Work partially supported by NSF
- Thanks to SRI-ICSI colleagues for advice, N-best lists, and MDE models.
2. Introduction: What are Boundary Events?
- Between-word event, typically marking constituent edge
- In EARS MDE: speaker change, sentence-like unit (SU), interruption point (IP)
- Beyond EARS: prosodic phrases, discourse segments
yeah yeah I mean oh it's you know we're about to do like the the uh fiesta bowl there
3. Boundary Metadata (cont.)
- Why are boundary events important?
  - Large body of work showing that these are important for human language processing
  - These are the analog of punctuation in written text, which is used in most NLP systems
Raw transcript: yeah yeah I mean oh it's you know we're about to do like the the uh fiesta bowl there
After cleanup: Yeah. Yeah. We're about to do the fiesta bowl there.
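The cleanup step above can be sketched in a few lines, assuming the metadata is already available as per-word labels. The labels below (which words count as fillers, which as edits, and where SUs end) are hand-assigned for this illustration and are not taken from the slides; a real MDE system would predict them. The transcript is written with its contractions spelled out ("it's", "we're").

```python
# Hypothetical hand-assigned labels for the slide's example.
# Each item is (word, label, su_end): label is "keep", "filler", or "edit",
# and su_end marks a sentence-like unit (SU) boundary after the word.
ANNOTATED = [
    ("yeah", "keep", True),
    ("yeah", "keep", True),
    ("I", "filler", False), ("mean", "filler", False),
    ("oh", "filler", False),
    ("it's", "edit", False),
    ("you", "filler", False), ("know", "filler", False),
    ("we're", "keep", False), ("about", "keep", False),
    ("to", "keep", False), ("do", "keep", False),
    ("like", "filler", False),
    ("the", "edit", False),
    ("the", "keep", False),
    ("uh", "filler", False),
    ("fiesta", "keep", False), ("bowl", "keep", False),
    ("there", "keep", True),
]


def cleanup(annotated):
    """Drop fillers and edits, then close each SU with a period."""
    sentences, current = [], []
    for word, label, su_end in annotated:
        if label == "keep":
            current.append(word)
        if su_end and current:
            text = " ".join(current)
            sentences.append(text[0].upper() + text[1:] + ".")
            current = []
    return " ".join(sentences)


print(cleanup(ANNOTATED))
```

With these assumed labels the function reproduces the cleaned-up sentence on the slide, which shows how SU boundaries play the role of punctuation while filler and edit labels remove disfluencies.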
4. MDE and Parsing
- Prior work on prosody and parsing
  - Shows accuracy gains and parser speed-ups
  - BUT focus is on isolated utterances, mostly in human-computer dialog systems
  - Key problem for CTS and BN: sentence segmentation (plus speaker segmentation for BN)
- Why bother parsing CTS or BN?
  - Probably useful for question answering, translation
  - May be useful for IE with conversational speech because of high rate of pronouns
  - Evidence that parsing helps with edit detection
  - Improvements to POS tagging are useful even if the full parse is not
5. MDE and Parsing: Experiment Design
- Targeting effects of SU (and IP) information on parsing Switchboard conversations
- Compare performance of multiple parsers with different kinds of SU-segmented input
  - Automatic segmentation: Kim et al. (2004), 35% SU SER
  - Naïve segmentation: pause duration only, 68% SU SER
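As a rough illustration of the naïve condition, the sketch below places an SU boundary wherever the pause after a word exceeds a threshold, then scores the result with a boundary-level slot error rate (missed plus false-alarm boundaries, divided by the number of reference boundaries). The 0.5 s threshold, the function names, and the toy data are assumptions for this example; the slides do not state the threshold or scoring details actually used.

```python
def segment_by_pause(pauses, threshold=0.5):
    """Return the word indices after which an SU boundary is hypothesized.

    pauses[i] is the silence (in seconds) following word i.
    The 0.5 s default is an assumed value, not taken from the slides.
    """
    return {i for i, pause in enumerate(pauses) if pause >= threshold}


def slot_error_rate(reference, hypothesis):
    """(missed + false-alarm boundaries) / number of reference boundaries."""
    if not reference:
        raise ValueError("reference must contain at least one boundary")
    misses = len(reference - hypothesis)
    false_alarms = len(hypothesis - reference)
    return (misses + false_alarms) / len(reference)


# Toy data (invented): pause after each of 8 words, and reference SU ends.
pauses = [0.1, 0.7, 0.05, 0.6, 0.0, 0.2, 0.1, 0.9]
reference = {1, 5, 7}

hypothesis = segment_by_pause(pauses)
ser = slot_error_rate(reference, hypothesis)
print(sorted(hypothesis), f"SER = {ser:.0%}")
```

Because false alarms count as errors, SER can exceed 100%, which is why a pause-only segmenter can look much worse than its recall alone would suggest.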
6. SU Detection and Parser Performance
[Figure: % of parse F-score obtained relative to oracle SU case for that parser]
- F-score: F-measure of bracket precision and recall
- Better segmentation helps all parsers
- Best absolute is Bikel, but not most robust to SU noise
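The two quantities on this slide can be sketched as follows: a bracket F-score computed from sets of labeled spans, and the percentage of the oracle-SU F-score that a parser retains with noisier segmentation. Representing brackets as `(label, start, end)` tuples and ignoring duplicate brackets are simplifications assumed here; the toy trees are invented, and standard scoring tools apply further conventions not modeled in this sketch.

```python
def bracket_f_score(gold, predicted):
    """Harmonic mean of labeled-bracket precision and recall.

    Brackets are (label, start, end) tuples; duplicates are ignored,
    a simplification relative to standard scoring tools.
    """
    gold, predicted = set(gold), set(predicted)
    matched = len(gold & predicted)
    if matched == 0:
        return 0.0
    precision = matched / len(predicted)
    recall = matched / len(gold)
    return 2 * precision * recall / (precision + recall)


def percent_of_oracle(f_system, f_oracle):
    """Parse F-score as a percentage of the oracle-segmentation F-score."""
    return 100.0 * f_system / f_oracle


# Toy brackets (invented) for one five-word sentence.
gold = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)]
predicted = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("PP", 3, 5)]

f_auto = bracket_f_score(gold, predicted)
f_oracle = bracket_f_score(gold, gold)
print(f_auto, percent_of_oracle(f_auto, f_oracle))
```

Normalizing by each parser's own oracle score separates robustness to SU noise from absolute accuracy, which is the distinction the last bullet draws for the Bikel parser.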
7. Metadata: Rich enough?
- SUs and IPs are impoverished representations. Complete?
  - No. Adding punctuation improves performance
  - But punctuation is also incomplete for speech: IPs carry complementary information
8. MDE and Language Modeling
- Encouraging/motivating prior work
  - In language modeling
    - Early Stolcke work using linguistic vs. acoustic segments, more recent pause-conditioned LM
    - Heeman, Hasegawa-Johnson work on prosody and LMs
  - In parsing: SUs improve parsing in the SLM, which has also been used as an LM in ASR
- Major problems for CTS/BN
  - Hand annotating all speech data used in LMs with SUs and IPs is not feasible (much too costly)
  - No way to perceptually annotate text data
9. The Case for a Weakly Supervised Approach
- From prosodic phrase break modeling work
  - Given small amount of hand-labeled data plus large amount of syntactically marked data
  - Use EM to leverage unlabeled data in training intonation phrase and hesitation detection models
  - Reduce break detection errors by 15% relative
- Error of current SU detection systems
  - Even on reference transcripts is rather high: 38% and 50% SER on CTS and BN, respectively
  - But, statistical model can characterize noise of annotations (as in MDE work with edits and IPs)
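To make the EM idea concrete, here is a minimal sketch of semi-supervised EM with a small labeled set and a larger unlabeled set. It is not the model from the phrase-break work: it assumes a single scalar feature (think of pause duration), one Gaussian per class, and invented toy data. Labeled points keep their known class throughout; only unlabeled points receive soft posteriors in the E-step.

```python
import math


def gaussian(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def boundary_posterior(x, prior, means, variances):
    """P(boundary | x) under the two-class Gaussian model."""
    p1 = prior * gaussian(x, means[1], variances[1])
    p0 = (1 - prior) * gaussian(x, means[0], variances[0])
    total = p1 + p0
    return p1 / total if total > 0 else 0.5


def semi_supervised_em(labeled, unlabeled, iterations=50, min_var=1e-4):
    """Fit the model from (feature, label) pairs plus unlabeled features.

    Returns (prior_of_boundary, (mean0, mean1), (var0, var1)).
    """
    if {y for _, y in labeled} != {0, 1}:
        raise ValueError("labeled data must contain both classes")
    xs = [x for x, _ in labeled] + list(unlabeled)
    # Responsibilities P(class 1 | x): fixed for labeled points,
    # initialized to 0.5 for unlabeled points.
    resp = [float(y) for _, y in labeled] + [0.5] * len(unlabeled)
    for _ in range(iterations):
        # M-step: re-estimate parameters from all (soft) assignments.
        n1 = sum(resp)
        n0 = len(xs) - n1
        prior = n1 / len(xs)
        m1 = sum(r * x for r, x in zip(resp, xs)) / n1
        m0 = sum((1 - r) * x for r, x in zip(resp, xs)) / n0
        v1 = max(sum(r * (x - m1) ** 2 for r, x in zip(resp, xs)) / n1, min_var)
        v0 = max(sum((1 - r) * (x - m0) ** 2 for r, x in zip(resp, xs)) / n0, min_var)
        # E-step: update posteriors for the unlabeled points only.
        for i, x in enumerate(unlabeled):
            resp[len(labeled) + i] = boundary_posterior(x, prior, (m0, m1), (v0, v1))
    return prior, (m0, m1), (v0, v1)


# Toy data (invented): feature values in seconds, label 1 = boundary.
labeled = [(0.05, 0), (0.10, 0), (0.80, 1), (0.90, 1)]
unlabeled = [0.02, 0.08, 0.12, 0.15, 0.70, 0.85, 1.00, 0.06]

prior, means, variances = semi_supervised_em(labeled, unlabeled)
print(means, boundary_posterior(0.75, prior, means, variances))
```

The soft posteriors on unlabeled data are also one way a statistical model can carry annotation uncertainty forward instead of committing to hard, possibly wrong, labels.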
10. MDE-Informed LMs: Approach for CTS
- Annotate Fisher and Switchboard using low cost MDE models (no F0 features)
- Train different LMs with detected boundary events
  - As words included in word sequence, modeled in variable n-gram
  - As words or head-like conditioning events in SLM
  - Augmented with confidence in HMM-like LM
- Integrate into STT as separate knowledge source in last stage of N-best rescoring
- Experiments are in progress.
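Two of the steps above lend themselves to a short sketch: treating detected boundaries as extra words in the LM training stream, and adding an MDE-informed LM score as one more knowledge source when rescoring an N-best list. The `<SU>` token name, the posterior threshold, the interpolation weight, and the stand-in scoring function are all assumptions for illustration; the slides do not specify them, and the experiments were still in progress.

```python
SU_TOKEN = "<SU>"  # hypothetical token name for a detected SU boundary


def insert_boundary_tokens(words, boundary_posteriors, threshold=0.5):
    """Insert SU_TOKEN after word i when its boundary posterior >= threshold."""
    out = []
    for word, posterior in zip(words, boundary_posteriors):
        out.append(word)
        if posterior >= threshold:
            out.append(SU_TOKEN)
    return out


def rescore_nbest(nbest, mde_lm_score, weight=0.5):
    """Return the hypothesis maximizing base score + weight * MDE-LM score.

    nbest: list of (words, base_log_score) pairs
    mde_lm_score: callable mapping a word list to a log score
    """
    return max(nbest, key=lambda hyp: hyp[1] + weight * mde_lm_score(hyp[0]))


def toy_lm_score(words):
    """Stand-in for a trained LM: penalize immediately repeated words."""
    return -float(sum(1 for a, b in zip(words, words[1:]) if a == b))


# Boundary events as words in the training stream.
stream = insert_boundary_tokens(["yeah", "yeah", "we're"], [0.9, 0.8, 0.1])
print(stream)

# N-best rescoring with the extra knowledge source (toy scores, invented).
nbest = [(["the", "the", "bowl"], -10.0), (["the", "bowl"], -10.2)]
best_words, _ = rescore_nbest(nbest, toy_lm_score)
print(best_words)
```

Keeping the MDE-informed LM as a separately weighted score, rather than folding it into the first-pass LM, means the weight can be tuned without retraining the rest of the STT system.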
11. Conclusions
- Findings
  - Metadata extraction is useful for parsing
  - Improvements in both MDE and parsers are needed
  - Weakly supervised learning looks promising for leveraging large corpora
- Next steps
  - New parsing models for integrating metadata using boundary posteriors as features in reranking (with Johnson and Charniak)
  - STT experiments with MDE-informed LM
  - Explore use of sub-SU events (e.g. prosodic breaks)