Highlights of the SRI/ICSI/UW MDE Systems (PowerPoint presentation transcript)

1
Highlights of the SRI/ICSI/UW MDE Systems
  • Barbara Peskin
  • reporting on the work of
  • Diarization: James Fung, Chuck Wooters, Xavier
    Anguera
  • Structural MDE: Yang Liu, Liz Shriberg, Andreas
    Stolcke, Jeremy Ang, Dustin Hillard, Mari
    Ostendorf

2
Overview
The SRI/ICSI/UW team submitted results for all
MDE tasks, domains, and conditions
  • Diarization
  • Starting point: IDIAP's speaker segmentation/clustering system
  • New features and explorations
  • System pluses and minuses
  • Structural MDE
  • Starting point: hidden-event modeling via HMMs
  • Additional model types: MaxEnt and CRF
  • Some novel extensions, combinations
  • Summary

3
Diarization: The Basic System
  • Based on the segmentation and clustering algorithm
    developed by IDIAP (J. Ajmera et al.)
  • BIC-like approach, but without the tweak factor
  • Self-regulating by holding the total number of model
    parameters constant
  • Main steps
  • 1) Generate initial segmentation (currently just
    equal-length chopping)
  • 2) Train a GMM model for each segment
  • 3) Decode the data using this set of models
  • 4) Re-train models using label assignments from
    decoding
  • 5) Select the closest pair of clusters, x and y, using
    the BIC metric: maximize ΔBIC = log P(x∪y | λ_xy)
    − log P(x | λ_x) − log P(y | λ_y)
  • 6) Train a merged model for x∪y, with comp(x∪y) =
    comp(x) + comp(y)
  • Iterate steps 3) → 6), stopping when no merge produces
    ΔBIC ≥ 0
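The merge criterion above can be sketched in Python. This is a toy illustration, not the actual system: it uses diagonal-covariance GMMs fit by a hand-rolled EM (the function names `gmm_loglik` and `delta_bic` are my own), and it keeps the total component count constant across a merge, comp(x∪y) = comp(x) + comp(y), which is why no penalty or tweak factor appears in the ΔBIC comparison.

```python
import numpy as np

def _log_gauss(x, mu, var):
    # Per-point log density under a diagonal-covariance Gaussian.
    return -0.5 * (np.log(2 * np.pi * var) + (x - mu) ** 2 / var).sum(axis=1)

def gmm_loglik(x, n_comp, iters=30, seed=0):
    """Fit a diagonal-covariance GMM with n_comp components by EM and
    return the total log-likelihood of x under the fitted model."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    mu = x[rng.choice(n, n_comp, replace=False)]          # init means at data points
    var = np.tile(x.var(axis=0) + 1e-6, (n_comp, 1))      # broad initial variances
    w = np.full(n_comp, 1.0 / n_comp)
    for _ in range(iters):
        # E-step: posterior responsibilities (shifted exp for stability).
        logp = np.stack([np.log(w[j]) + _log_gauss(x, mu[j], var[j])
                         for j in range(n_comp)], axis=1)
        resp = np.exp(logp - logp.max(axis=1, keepdims=True))
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, variances.
        nk = resp.sum(axis=0) + 1e-12
        w = nk / n
        mu = (resp.T @ x) / nk[:, None]
        var = (resp.T @ (x ** 2)) / nk[:, None] - mu ** 2 + 1e-6
    logp = np.stack([np.log(w[j]) + _log_gauss(x, mu[j], var[j])
                     for j in range(n_comp)], axis=1)
    m = logp.max(axis=1)
    return float((m + np.log(np.exp(logp - m[:, None]).sum(axis=1))).sum())

def delta_bic(x, kx, y, ky):
    # ΔBIC = log P(x∪y | λ_xy) − log P(x | λ_x) − log P(y | λ_y),
    # where the merged model gets kx + ky components so the total
    # model complexity is unchanged by the merge.
    merged = gmm_loglik(np.vstack([x, y]), kx + ky)
    return merged - gmm_loglik(x, kx) - gmm_loglik(y, ky)
```

Well-separated clusters give a strongly negative ΔBIC (do not merge), while two samples from the same source score much closer to zero, which is what drives the threshold-free stopping rule.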

4
Areas of Exploration
  • Very limited development, few changes since last
    year's evals
  • Main improvement introduced: a simple
    speech/non-speech filter
  • No time to train our own models
  • So we just used the S/NS filter from the start of the STT
    system's pre-processing
  • Simple two-class detector: nonspeech includes
    music and silence

Results on the Eval04 test set, but using a
not-quite-eval system. Numbers in ( )s should be
0, probably indicating a bug in our frame ↔ time
conversion
5
Explorations (cont'd)
  • Another (post-eval) improvement: a modified
    stopping criterion
  • Rather than stopping when ΔBIC drops below the
    threshold of 0, instead stop when we achieve the best
    score from the decoding stage
  • Keeps the system threshold-free, but now directly
    captures the maximum likelihood of the model ensemble
  • Other areas of exploration (details tomorrow)
    include
  • Initialization schemes
  • Speaker/cluster model types
  • Computational efficiencies, pruning

6
System Strengths and Weaknesses
  • This is a very simple, serviceable, portable
    system
  • No thresholds, penalties, etc. that need to be
    re-tuned
  • No external training data; just uses what's in the
    test segments
  • Unlike most other systems, this did not break
    when moved to the expanded BN space:
    out-of-the-box performance
  • Of course, simplicity is both a strength and a
    weakness
  • Does not take advantage of known properties of
    shows (anchors, segment lengths, adverts, ...)
  • Does not take advantage of STT models,
    information, integration
  • Does not (yet!) rival the best systems' performance

7
Structural MDE Framework
  • Starting point: a unified approach through hidden-
    event modeling work (pioneered by Shriberg,
    Stolcke, et al.)
  • Language model: incl. N-grams, class-based,
    POS-based, repetition-string models
  • Prosody model: decision tree classifier based on
    ~100 prosodic features
  • Model combination via HMM framework
  • This year's eval efforts focused on exploring
    additional model types
  • Maximum Entropy models
  • Conditional Random Fields
  • Significant effort also devoted to dealing with
    diverse data sources
  • v5 vs. v6 vs. mapped-v6 data from LDC
  • additional unannotated STT sources
  • and to combinations and interactions (model
    types, knowledge sources, and cross-task:
    Diarization/Structural and STT/MDE)

8
The Model Types
  • Hidden Markov Model (HMM)
  • Good way to optimize at the sequence level (via
    Forward-Backward algorithm)
  • But our interpolated LMs do not do a good job of
    modeling inter-dependent features, and our models
    are generative rather than discriminative
  • Maximum Entropy Model (MaxEnt)
  • Accommodates a grab-bag of overlapping/non-independent
    features in a principled way
  • But makes strictly local decisions (modulo
    contextual info in feature set)
  • Conditional Random Field (CRF)
  • Combines many of the advantages of HMM and MaxEnt
  • Can accommodate same mix of overlapping features
    as MaxEnt
  • While still modeling sequential, rather than
    strictly local, information (optimizes globally
    over whole sequence)
  • Basic ref for CRFs: Lafferty, McCallum &
    Pereira, Proc. ICML 2001
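As a concrete illustration of the HMM's sequence-level optimization, here is a minimal forward-backward pass over boundary events (the two-state event/no-event case). The transition matrix standing in for the LM and the per-boundary prosody posteriors are toy values of my own; the real system's models and feature sets are far richer.

```python
import numpy as np

def boundary_posteriors(prosody_post, trans, prior):
    """Forward-backward over a two-state hidden-event chain.
    prosody_post: (T, 2) prosody-model scores for [no-event, event] at
    each word boundary, used as scaled likelihoods; trans: (2, 2)
    LM-derived transition probabilities; prior: (2,) initial prior.
    Returns (T, 2) smoothed per-boundary posteriors."""
    T = len(prosody_post)
    alpha = np.zeros((T, 2))
    beta = np.zeros((T, 2))
    # Forward pass (normalized at each step to avoid underflow).
    alpha[0] = prior * prosody_post[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ trans) * prosody_post[t]
        alpha[t] /= alpha[t].sum()
    # Backward pass.
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = trans @ (prosody_post[t + 1] * beta[t + 1])
        beta[t] /= beta[t].sum()
    # Combine and renormalize per boundary.
    post = alpha * beta
    return post / post.sum(axis=1, keepdims=True)
```

Because the smoothing is global, a single boundary with strong prosodic evidence can be labeled an event even when the transition model favors no-event everywhere, which is exactly the sequence-level behavior the local MaxEnt classifier lacks.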

9
Model Types (cont'd)
  • MANY tasks, domains, conditions! Submitted
    systems for
  • (SUBD, EWD, FWD, IPD) x (BN, CTS) x (ref and 2
    STT inputs)
  • Sample performance for the SUBD task (bracketed error
    ignores subtype)
  • Too many systems and combos to report all!
    More details in tomorrow's talk

10
Other System Novelties
  • Found that using turn info from diarization
    greatly improves SUBD performance over using info
    from the STT system's automatic segmentation/clustering
  • Evaluated the effect of improved STT by comparing the
    best STT output (superEARS for BN, IBM+SRI for
    CTS), SRI output, and reference words. E.g.
  • UW contrast system, examining joint optimization
    of STT and MDE (next slide)

Sample result using the BN ref condition, HMM model
only
(again, all scores use mdeval-v17)
11
Integrating ASR and MDE Search
  • Why? Combined MDE scores based on multiple STT
    hypotheses provide better MDE evidence
  • How?
  • Detect metadata for multiple ASR hypotheses
  • Merge the hypotheses (with their probability mass) into a
    confusion network
  • Results so far
  • For the earlier HMM-only model: error reduced in CTS
    boundary event detection (1 and 3 point absolute
    reductions in slot error rate for SUs and IPs,
    respectively)
  • Combining MaxEnt and CRF models, confusion
    networks have less benefit
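The "merge with probability mass" step can be caricatured as a weighted combination of per-hypothesis event scores at aligned confusion-network positions. This is a deliberately simplified sketch: the slot alignment, the input layout, and the function name `combine_event_scores` are my own assumptions, not the system's actual interface.

```python
def combine_event_scores(hypotheses):
    """Toy combination of metadata evidence across ASR hypotheses.
    hypotheses: list of (weight, {slot: P(event at slot)}) pairs, where
    weight is the hypothesis's posterior probability mass and slots
    stand in for aligned confusion-network positions.
    Returns the combined P(event) per slot; slots a hypothesis does
    not hypothesize an event for contribute probability 0."""
    slots = set()
    for _, events in hypotheses:
        slots |= events.keys()
    total = sum(w for w, _ in hypotheses)
    return {s: sum(w * ev.get(s, 0.0) for w, ev in hypotheses) / total
            for s in slots}
```

For example, if a boundary is an SU in the top hypothesis (mass 0.6) but only weakly so in the runner-up (mass 0.4, score 0.5), the combined score 0.8 reflects evidence from both word sequences rather than committing to the 1-best.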


[Figure: word confusion network ("president" → "of"/"at" → "war") with competing SU / no-event arcs carrying probability mass (e.g. 1, .2, .12)]
12
Summary
  • The RT-04F Metadata evaluation was a very
    demanding but very interesting exercise: 5
    tasks x 2 domains x numerous inputs
  • For Diarization
  • Demonstrated good basic performance (and
    surprisingly good robustness to the expanded BN
    space)
  • We have just begun exploring the assumptions of
    our original system and making associated
    improvements
  • Speech / nonspeech filtering
  • Stopping criterion
  • Initializations and training iterations
  • For Structural MDE
  • Explored the relative merits (alone and in combo) of
    multiple model types and knowledge sources,
    yielding significant reductions in error rates
  • Examined the effect of STT quality and diarization /
    turn decisions
  • Just beginning to reap the benefits of joint STT +
    MDE optimizations and modeling