Highlights of the SRI ICSI UW MDE Systems

Transcript and Presenter's Notes

Title: Highlights of the SRI ICSI UW MDE Systems

1
Highlights of the SRI ICSI UW MDE Systems

Barbara Peskin
reporting on work of:
Diarization: James Fung, Chuck Wooters, Xavier Anguera
Structural MDE: Yang Liu, Liz Shriberg, Andreas Stolcke, Jeremy Ang, Dustin Hillard, Mari Ostendorf

2
Overview
The SRI ICSI UW team submitted results for all MDE tasks, domains, and conditions

Diarization
Starting point: IDIAP's speaker segmentation and clustering system
New features and explorations
System pluses and minuses
Structural MDE
Starting point: hidden event modeling via HMMs
Additional model types: MaxEnt and CRF
Some novel extensions and combinations
Summary

3
Diarization: The Basic System

Based on segmentation and clustering algorithm developed by IDIAP (J. Ajmera et al.)
BIC-like approach, but without a tweak factor
Self-regulating by holding the total number of model parameters constant
Main steps:
1) Generate initial segmentation (currently just equal-length chopping)
2) Train a GMM for each segment
3) Decode data using this set of models
4) Re-train models using label assignments from decoding
5) Select closest pair of clusters, x and y, using the BIC metric, maximizing $\Delta\mathrm{BIC} = \log P(xy \mid \lambda_{xy}) - \log P(x \mid \lambda_x) - \log P(y \mid \lambda_y)$
6) Train merged model for xy, with $\mathrm{comp}(xy) = \mathrm{comp}(x) + \mathrm{comp}(y)$
7) Iterate 3) -> 6), stopping when no merge produces $\Delta\mathrm{BIC} > 0$
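The pair-selection and stopping rule in steps 5) to 7) can be sketched as follows. This is a minimal illustration under stated assumptions, not the team's implementation. The function names and the toy log-likelihood values are hypothetical. The log-likelihoods are assumed to come from already-trained GMMs in which the merged model has comp(x) + comp(y) components, so no extra penalty term is applied.

```python
from itertools import combinations


def delta_bic(ll_xy, ll_x, ll_y):
    """Merge score: log P(xy | model_xy) - log P(x | model_x) - log P(y | model_y)."""
    return ll_xy - ll_x - ll_y


def best_merge(cluster_ll, merged_ll):
    """Return (score, x, y) for the pair with the largest positive merge score.

    cluster_ll: dict mapping cluster id -> log-likelihood of its own data
    merged_ll:  dict mapping frozenset({x, y}) -> log-likelihood of the pooled
                data under the merged model (hypothetical precomputed inputs)
    Returns None when no pair has a positive score, i.e. clustering stops.
    """
    best = None
    for x, y in combinations(sorted(cluster_ll), 2):
        score = delta_bic(merged_ll[frozenset((x, y))],
                          cluster_ll[x], cluster_ll[y])
        if score > 0 and (best is None or score > best[0]):
            best = (score, x, y)
    return best


# Toy values, invented for illustration only.
cluster_ll = {"a": -100.0, "b": -120.0, "c": -90.0}
merged_ll = {
    frozenset(("a", "b")): -215.0,  # score +5: merging helps
    frozenset(("a", "c")): -200.0,  # score -10
    frozenset(("b", "c")): -212.0,  # score -2
}
print(best_merge(cluster_ll, merged_ll))
```

In a full system the two dictionaries would be recomputed after every merge and re-decoding pass. Here they are fixed so the selection logic stays visible.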

4
Areas of Exploration

Very limited development, few changes since last year's evals
Main improvement: introduced a simple speech/non-speech filter
No time to train our own models
So just used the S/NS filter from the start of the STT system's pre-processing
Simple two-class detector: nonspeech includes music and silence

Results on Eval04 test set, but using not-quite-eval system. Numbers in ( )s should be 0, probably indicating a bug in our frame <-> time conversion.

5
Explorations (cont'd)

Another (post-eval) improvement: modified stopping criterion
Rather than stopping when $\Delta\mathrm{BIC} < \text{thresh} = 0$, instead stop when the best score from the decoding stage is achieved
Keeps system threshold-free, but now directly captures max likelihood of model ensemble
Other areas of exploration (details tomorrow) include:
Initialization schemes
Speaker/cluster model types
Computational efficiencies, pruning
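The modified stopping criterion on this slide can be sketched as below. This is one possible reading, not the system's actual code: it assumes merging continues down to a single cluster while the clustering with the highest decoding score is remembered. The helper names (`merge_first_two`, `toy_decode_score`) and the score table are hypothetical stand-ins for the real merge and decoding stages.

```python
def cluster_with_best_decode_score(initial, merge_once, decode_score):
    """Merge repeatedly and return the clustering with the best decoding score.

    initial:      starting clustering (any sized container of clusters)
    merge_once:   function taking a clustering and returning one with one
                  fewer cluster (stand-in for select + merge + re-train)
    decode_score: function returning the decoding-stage score of a clustering
    """
    current = initial
    best, best_score = current, decode_score(current)
    while len(current) > 1:
        current = merge_once(current)
        score = decode_score(current)
        if score > best_score:
            best, best_score = current, score
    return best, best_score


# Toy stand-ins, invented for illustration only.
def merge_first_two(clusters):
    """Merge the first two clusters of a tuple-of-tuples clustering."""
    return (clusters[0] + clusters[1],) + clusters[2:]


TOY_SCORES = {4: -50.0, 3: -40.0, 2: -45.0, 1: -80.0}


def toy_decode_score(clusters):
    """Look up a made-up decoding score by number of clusters."""
    return TOY_SCORES[len(clusters)]


initial = (("s1",), ("s2",), ("s3",), ("s4",))
best, best_score = cluster_with_best_decode_score(
    initial, merge_first_two, toy_decode_score)
print(len(best), best_score)
```

No threshold appears anywhere in the loop: the choice of stopping point comes entirely from comparing decoding scores across stages.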

6
System Strengths & Weaknesses

This is a very simple, serviceable, portable system
No thresholds, penalties, etc. that need to be re-tuned
No external training data: just uses what's in the test segments
Unlike most other systems, this did not break when moved to expanded BN space
Out-of-the-box performance
Of course, simplicity is both a strength and a weakness
Does not take advantage of known properties of shows (anchors, segment lengths, adverts, ...)
Does not take advantage of STT models, information, integration
Does not (yet!) rival the best systems' performance

7
Structural MDE Framework

Starting point: unified approach through Hidden Event modeling work (pioneered by Shriberg, Stolcke, et al.)
Language model: incl. N-grams, class-based, POS-based, repetition-string models
Prosody model: decision tree classifier based on ~100 prosodic features
Model combination via HMM framework
This year's eval efforts focused on exploring additional model types
Maximum Entropy models
Conditional Random Fields
Significant effort also devoted to dealing with diverse data sources
v5 vs v6 vs mapped-v6 data from LDC
additional unannotated STT sources
and to combinations and interactions (model types, knowledge sources, and cross-task: Diarization+Structural, STT+MDE)

8
The Model Types

Hidden Markov Model (HMM)
Good way to optimize at the sequence level (via Forward-Backward algorithm)
But our interpolated LMs do not do a good job of modeling inter-dependent features, and our models are generative rather than discriminative
Maximum Entropy Model (MaxEnt)
Accommodates a grab-bag of overlapping/non-independent features in a principled way
But makes strictly local decisions (modulo contextual info in feature set)
Conditional Random Field (CRF)
Combines many of the advantages of HMM and MaxEnt
Can accommodate the same mix of overlapping features as MaxEnt
While still modeling sequential, rather than strictly local, information (optimizes globally over whole sequence)
Basic ref for CRFs: Lafferty, McCallum, Pereira, Proc. ICML 2001
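To make the "sequence level" point about the HMM concrete, here is a minimal sketch of the Forward-Backward algorithm computing per-boundary event posteriors. It is a simplification and an assumption, not the system described in the slides: the real hidden event model uses N-gram LMs over words and events, whereas this toy uses a two-state chain (no-event, SU) with invented transition and observation numbers.

```python
def forward_backward(init, trans, obs):
    """Posterior P(state_t | all observations) for a small discrete HMM.

    init:  init[s]     = P(state_0 = s)
    trans: trans[r][s] = P(state_t = s | state_{t-1} = r)
    obs:   obs[t][s]   = likelihood of the observation at position t given s
    No scaling is applied, so this is only suitable for short sequences.
    """
    n, k = len(obs), len(init)
    fwd = [[0.0] * k for _ in range(n)]
    bwd = [[1.0] * k for _ in range(n)]

    # Forward pass: probability of the prefix ending in each state.
    for s in range(k):
        fwd[0][s] = init[s] * obs[0][s]
    for t in range(1, n):
        for s in range(k):
            fwd[t][s] = obs[t][s] * sum(
                fwd[t - 1][r] * trans[r][s] for r in range(k))

    # Backward pass: probability of the suffix given each state.
    for t in range(n - 2, -1, -1):
        for r in range(k):
            bwd[t][r] = sum(
                trans[r][s] * obs[t + 1][s] * bwd[t + 1][s] for s in range(k))

    # Combine and normalize to get posteriors at every position.
    posteriors = []
    for t in range(n):
        joint = [fwd[t][s] * bwd[t][s] for s in range(k)]
        z = sum(joint)
        posteriors.append([j / z for j in joint])
    return posteriors


# Toy numbers, invented for illustration. State 0 = no-event, state 1 = SU.
init = [0.9, 0.1]
trans = [[0.8, 0.2], [0.9, 0.1]]
obs = [[0.7, 0.3], [0.4, 0.6], [0.1, 0.9]]
for t, (p_none, p_su) in enumerate(forward_backward(init, trans, obs)):
    print(t, round(p_none, 3), round(p_su, 3))
```

Each posterior depends on the whole sequence through the forward and backward terms, which is the contrast with the strictly local decisions of a MaxEnt classifier.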

9
Model Types (cont'd)

MANY tasks, domains, conditions! Submitted systems for
(SUBD, EWD, FWD, IPD) x (BN, CTS) x (ref, 2 STT inputs)
Sample performance for SUBD task (bracketed error ignores subtype)
Too many systems and combos to report all!
More details in tomorrow's talk
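The size of the structural MDE submission grid on this slide can be checked by enumeration. This is only an arithmetic sketch. The labels `stt_1` and `stt_2` are placeholder names for the two STT inputs, which the slide does not name at this point.

```python
from itertools import product

tasks = ["SUBD", "EWD", "FWD", "IPD"]
domains = ["BN", "CTS"]
inputs = ["ref", "stt_1", "stt_2"]  # placeholder names for the two STT inputs

# Every (task, domain, input) combination is one submitted condition.
conditions = list(product(tasks, domains, inputs))
print(len(conditions))  # 4 x 2 x 3 = 24
```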

10
Other System Novelties

Found that using turn info from diarization greatly improves SUBD performance over using info from STT's auto segmentation and clustering
Evaluated effect of improved STT by comparing best STT output (superEARS for BN, IBM+SRI for CTS), SRI output, and reference words. E.g.
UW contrast system, examining joint optimization of STT and MDE: next slide

Sample result using BN ref condition, HMM model only
(again, all scores use mdeval-v17)

11
Integrating ASR and MDE Search

Why? Combined MDE scores based on multiple STT hypotheses provide better MDE evidence
How?
Detect metadata for multiple ASR hypotheses
Merge hypotheses (with probability mass) into a confusion network
Results so far
For earlier HMM-only model: error reduced in CTS boundary event detection (1% and 3% absolute reduction in slot error rate for SUs and IPs, respectively)
Combining MaxEnt and CRF models, confusion networks have less benefit
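The "merge hypotheses (with probability mass)" step can be sketched as below. This is a simplified illustration, not the UW system: it assumes the ASR hypotheses have already been aligned to the same sequence of word-boundary slots, which a real confusion network construction would have to do itself. The function name and the toy posteriors are hypothetical.

```python
from collections import defaultdict


def merge_event_hypotheses(hyps):
    """Accumulate event probability mass per boundary slot.

    hyps: list of (posterior_weight, events), where events is a list with one
          metadata label per slot (e.g. "SU" or "no-event"). All hypotheses
          are assumed to be pre-aligned to the same number of slots.
    Returns one dict per slot mapping event label -> normalized mass.
    """
    n_slots = len(hyps[0][1])
    total = sum(weight for weight, _ in hyps)
    slots = [defaultdict(float) for _ in range(n_slots)]
    for weight, events in hyps:
        if len(events) != n_slots:
            raise ValueError("hypotheses must be aligned to the same slots")
        for i, event in enumerate(events):
            slots[i][event] += weight / total
    return [dict(slot) for slot in slots]


# Toy N-best list, invented for illustration only.
hyps = [
    (0.5, ["no-event", "SU"]),
    (0.3, ["no-event", "no-event"]),
    (0.2, ["SU", "SU"]),
]
merged = merge_event_hypotheses(hyps)
for i, slot in enumerate(merged):
    print(i, max(slot, key=slot.get), slot)
```

The final decision at each slot can then use evidence pooled from all hypotheses rather than from the single best one.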

[Figure: confusion network over word hypotheses (e.g. "president", "of", "war", "at") with SU / no-event labels at word boundaries and probabilities on the arcs]

12
Summary

The RT-04F Metadata evaluation was a very demanding but very interesting exercise: 5 tasks x 2 domains x numerous inputs
For Diarization
Demonstrated good basic performance (and surprisingly good robustness to expanded BN space)
We have just begun exploring the assumptions of our original system and making associated improvements
Speech / nonspeech filtering
Stopping criterion
Initializations, training iterations
For Structural MDE
Explored relative merits (alone and in combo) of multiple model types and knowledge sources, yielding significant reductions in error rates
Examined effect of STT quality and diarization / turn decisions
Just beginning to reap benefits of joint STT+MDE optimizations, modeling