Title: Highlights of the SRI/ICSI/UW MDE Systems
1. Highlights of the SRI/ICSI/UW MDE Systems
- Barbara Peskin
- reporting on work of
  - Diarization: James Fung, Chuck Wooters, Xavier Anguera
  - Structural MDE: Yang Liu, Liz Shriberg, Andreas Stolcke, Jeremy Ang, Dustin Hillard, Mari Ostendorf
2. Overview
The SRI/ICSI/UW team submitted results for all MDE tasks, domains, and conditions.
- Diarization
  - Starting point: IDIAP's speaker seg/clust system
  - New features and explorations
  - System pluses and minuses
- Structural MDE
  - Starting point: Hidden event modeling via HMMs
  - Additional model types: MaxEnt and CRF
  - Some novel extensions, combinations
- Summary
3. Diarization: The Basic System
- Based on segmentation and clustering algorithm developed by IDIAP (J. Ajmera et al.)
  - BIC-like approach but without tweak factor
  - Self-regulating by holding total number of model parameters constant
- Main steps
  1) Generate initial segmentation (currently just equal-length chopping)
  2) Train GMM model for each segment
  3) Decode data using this set of models
  4) Re-train models using label assignments from decoding
  5) Select closest pair of clusters, x and y, using BIC metric: maximizing $\Delta\mathrm{BIC} = \log P(x \cup y \mid \lambda_{xy}) - \log P(x \mid \lambda_x) - \log P(y \mid \lambda_y)$
  6) Train merged model for xy, with comp(xy) = comp(x) + comp(y)
  7) Iterate 3) -> 6), stopping when no merge produces ΔBIC > 0
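The merge-and-stop logic in steps 5 to 7 can be sketched as follows. This is a minimal illustration, not the IDIAP or SRI/ICSI/UW implementation: the decode and re-train steps (3 and 4) are omitted, and `toy_score` and `toy_merge` are hypothetical stand-ins for GMM-based scoring and model merging.

```python
from itertools import combinations
from statistics import mean


def delta_bic(ll_merged, ll_x, ll_y):
    """Merge score: log P(x+y | l_xy) - log P(x | l_x) - log P(y | l_y)."""
    return ll_merged - ll_x - ll_y


def best_merge(clusters, score):
    """Step 5: index pair with the largest score, or None if no pair scores > 0."""
    best_pair, best_score = None, 0.0
    for i, j in combinations(range(len(clusters)), 2):
        s = score(clusters[i], clusters[j])
        if s > best_score:
            best_pair, best_score = (i, j), s
    return best_pair


def agglomerate(clusters, score, merge):
    """Steps 5-7: merge the closest pair until no merge scores above 0."""
    clusters = list(clusters)
    while True:
        pair = best_merge(clusters, score)
        if pair is None:
            return clusters
        i, j = pair
        merged = merge(clusters[i], clusters[j])  # step 6
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)


# Toy stand-ins (assumptions): clusters are tuples of 1-D values and the
# score is positive only when the cluster means are close. A real system
# would plug GMM log-likelihoods into delta_bic() instead.
def toy_score(x, y):
    return 1.0 - abs(mean(x) - mean(y))


def toy_merge(x, y):
    return x + y


result = agglomerate([(0.0, 0.1), (0.2,), (5.0, 5.1), (5.2,)], toy_score, toy_merge)
```

With the toy inputs, the four initial segments collapse into two clusters and merging then stops, because the only remaining pair scores below 0.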
4. Areas of Exploration
- Very limited development, few changes since last year's evals
- Main improvement: introduced simple speech/non-speech filter
  - No time to train our own models
  - So just used S/NS filter from start of STT system's pre-processing
  - Simple two-class detector: nonspeech includes music and silence
Results on Eval04 test set, but using not-quite-eval system (table not preserved). Numbers in ( )s should be 0, probably indicating a bug in our frame <-> time conversion.
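As an illustration of how a two-class speech/non-speech decision feeds the diarization stage, the sketch below turns per-frame labels into speech segments in seconds. The frame rate of 100 frames per second and the function name are assumptions for the example; the slides do not state the actual frame step or how the STT front end exposes its labels.

```python
FRAMES_PER_SECOND = 100  # assumed frame rate (10 ms step); not given in the slides


def speech_segments(labels, frames_per_second=FRAMES_PER_SECOND):
    """Convert per-frame labels (True = speech, False = non-speech)
    into a list of (start_sec, end_sec) speech segments."""
    segments = []
    start = None  # frame index where the current speech run began
    for i, is_speech in enumerate(labels):
        if is_speech and start is None:
            start = i
        elif not is_speech and start is not None:
            segments.append((start / frames_per_second, i / frames_per_second))
            start = None
    if start is not None:  # speech run extends to the end of the input
        segments.append((start / frames_per_second, len(labels) / frames_per_second))
    return segments


labels = [False, False, True, True, True, False, True, True]
segments = speech_segments(labels)
```

Only the frames inside the returned segments would be passed on to segmentation and clustering; music and silence frames are dropped. Keeping times as frame index divided by an integer frame rate is one way to keep the frame-to-time mapping consistent in both directions.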
5. Explorations (cont'd)
- Another (post-eval) improvement: modified stopping criterion
  - Rather than stopping when ΔBIC falls below a threshold of 0, instead stop when the decoding stage achieves its best score
  - Keeps system threshold-free but now directly captures max likelihood of model ensemble
- Other areas of exploration (details tomorrow) include
  - Initialization schemes
  - Speaker/cluster model types
  - Computational efficiencies, pruning
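One possible reading of the modified stopping criterion is sketched below: record the decoding score after every merge iteration and keep the clustering whose score is highest. Whether the actual system stops at the first drop or searches over all iterations is not stated in the slides, so this is an assumption, and the scores in the example are made up purely for illustration.

```python
def pick_best_clustering(history):
    """history: list of (clustering, decode_score) pairs, one per merge
    iteration. Returns the clustering with the highest decoding score."""
    best_clustering, _ = max(history, key=lambda item: item[1])
    return best_clustering


# Illustrative values only (not results from the evaluation).
history = [
    (["s1", "s2", "s3", "s4"], -1250.0),
    (["s1+s2", "s3", "s4"], -1190.0),
    (["s1+s2", "s3+s4"], -1175.0),
    (["s1+s2+s3+s4"], -1300.0),
]
chosen = pick_best_clustering(history)
```

No threshold appears anywhere in this selection; the decision depends only on the likelihood that the model ensemble assigns to the data during decoding.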
6. System Strengths & Weaknesses
- This is a very simple, serviceable, portable system
  - No thresholds, penalties, etc. that need to be re-tuned
  - No external training data: just uses what's in test segments
  - Unlike most other systems, this did not break when moved to expanded BN space
Out-of-the-box performance (table not preserved)
- Of course, simplicity is both a strength and a weakness
  - Does not take advantage of known properties of shows (anchors, segment lengths, adverts, ...)
  - Does not take advantage of STT models, information, integration
  - Does not (yet!) rival best systems' performance
7. Structural MDE Framework
- Starting point: unified approach through Hidden Event modeling work (pioneered by Shriberg, Stolcke, et al.)
  - Language model: incl. N-grams, class-based, POS-based, repetition-string models
  - Prosody model: decision tree classifier based on 100 prosodic features
  - Model combination via HMM framework
- This year's eval efforts focused on exploring additional model types
  - Maximum Entropy models
  - Conditional Random Fields
- Significant effort also devoted to dealing with diverse data sources
  - v5 vs v6 vs mapped-v6 data from LDC
  - additional unannotated STT sources
- and to combinations and interactions (model types, knowledge sources, and cross-task: Diarization + Structural, STT + MDE)
8. The Model Types
- Hidden Markov Model (HMM)
  - Good way to optimize at the sequence level (via Forward-Backward algorithm)
  - But our interpolated LMs do not do a good job of modeling inter-dependent features, and our models are generative rather than discriminative
- Maximum Entropy Model (MaxEnt)
  - Accommodates a grab-bag of overlapping/non-independent features in a principled way
  - But makes strictly local decisions (modulo contextual info in feature set)
- Conditional Random Field (CRF)
  - Combines many of the advantages of HMM and MaxEnt
  - Can accommodate same mix of overlapping features as MaxEnt
  - While still modeling sequential, rather than strictly local, information (optimizes globally over whole sequence)
  - Basic ref for CRFs: Lafferty, McCallum, Pereira, Proc. ICML 2001
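For reference, the generic textbook forms of the three model families are given below. The notation is an assumption and is not taken from the slides: $E = e_1 \dots e_n$ is the event sequence (e.g. SU or no-event at each word boundary), $O = o_1 \dots o_n$ the observations (words and prosodic features), and $f_k$ are feature functions with weights $w_k$. The HMM line is the plain first-order form, not the specific hidden-event language model used in the system.

```latex
\begin{align*}
\text{HMM (generative, sequence-level):}\quad
  & P(E, O) = \prod_{i=1}^{n} P(e_i \mid e_{i-1})\, P(o_i \mid e_i) \\
\text{MaxEnt (discriminative, local):}\quad
  & P(e_i \mid o_i) = \frac{1}{Z(o_i)} \exp\Big( \sum_k w_k f_k(e_i, o_i) \Big) \\
\text{Linear-chain CRF (discriminative, sequence-level):}\quad
  & P(E \mid O) = \frac{1}{Z(O)} \exp\Big( \sum_{i=1}^{n} \sum_k w_k f_k(e_{i-1}, e_i, O, i) \Big)
\end{align*}
```

The MaxEnt normalizer $Z(o_i)$ sums over the events at a single position, while the CRF normalizer $Z(O)$ sums over entire event sequences, which is what lets the CRF keep MaxEnt-style overlapping features and still optimize over the whole sequence.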
9. Model Types (cont'd)
- MANY tasks, domains, conditions! Submitted systems for
  - (SUBD, EWD, FWD, IPD) x (BN, CTS) x (ref, 2 STT inputs)
- Sample performance for SUBD task; bracketed error ignores subtype (table not preserved)
- Too many systems and combos to report all! More details in tomorrow's talk
10. Other System Novelties
- Found that using turn info from diarization greatly improves SUBD performance over using info from STT's auto seg/clust
- Evaluated effect of improved STT by comparing best STT output (superEARS for BN, IBM+SRI for CTS), SRI output, and reference words (example table not preserved)
- UW contrast system, examining joint optimization of STT and MDE (next slide)
Sample result using BN ref condition, HMM model only
(again, all scores use mdeval-v17)
11. Integrating ASR and MDE Search
- Why? Combined MDE scores based on multiple STT hypotheses provide better MDE evidence
- How?
  - Detect metadata for multiple ASR hypotheses
  - Merge hypotheses (with probability mass) into a confusion network
- Results so far
  - For earlier HMM-only model: error reduced in CTS boundary event detection (1% and 3% absolute reduction in slot error rate for SUs and IPs, respectively)
  - Combining MaxEnt and CRF models, confusion networks have less benefit
[Figure: example confusion network with word arcs ("president", "of", "war", "at") and boundary-event arcs ("SU", "no-event"), each weighted by probability mass]
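The idea of pooling MDE evidence across several ASR hypotheses can be sketched as a posterior-weighted average at one boundary slot. This is a simplification and an assumption about the mechanics: it presumes the hypotheses have already been aligned so that they share the slot, which is the job the confusion network does in the real system. The function name and the example probabilities are hypothetical.

```python
def combine_event_posteriors(hypotheses):
    """hypotheses: list of (hyp_weight, {event: prob}) for one aligned
    boundary slot. Returns the weight-normalized event distribution."""
    total = sum(weight for weight, _ in hypotheses)
    combined = {}
    for weight, event_probs in hypotheses:
        for event, prob in event_probs.items():
            combined[event] = combined.get(event, 0.0) + (weight / total) * prob
    return combined


# Illustrative values only: three hypotheses with their probability mass
# and the per-hypothesis event probabilities at the same boundary.
slot = [
    (0.5, {"SU": 0.8, "no-event": 0.2}),
    (0.25, {"SU": 0.4, "no-event": 0.6}),
    (0.25, {"SU": 0.0, "no-event": 1.0}),
]
combined = combine_event_posteriors(slot)
```

In this example the top hypothesis alone favors an SU boundary, but once the other hypotheses are weighed in, the two events come out about equally likely.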
12. Summary
- The RT-04F Metadata evaluation was a very demanding but very interesting exercise: 5 tasks x 2 domains x numerous inputs
- For Diarization
  - Demonstrated good basic performance (and surprisingly good robustness to expanded BN space)
  - We have just begun exploring the assumptions of our original system and making associated improvements
    - Speech / nonspeech filtering
    - Stopping criterion
    - Initializations, training iterations
- For Structural MDE
  - Explored relative merits (alone and in combo) of multiple model types and knowledge sources, yielding significant reductions in error rates
  - Examined effect of STT quality and diarization / turn decisions
  - Just beginning to reap benefits of joint STT + MDE optimizations, modeling