Linguistic Resources for Meeting Recognition - Meghan Glenn, Stephanie Strassel, Linguistic Data Consortium

Description:

RT-05 Meeting Recognition Workshop, MLMI - Edinburgh, July 13 ... Meghan Glenn, Stephanie Strassel. Linguistic Data Consortium {mlglenn, strassel}@ldc.upenn.edu ...

Transcript and Presenter's Notes

1
Linguistic Resources for Meeting Recognition
Meghan Glenn, Stephanie Strassel
Linguistic Data Consortium
{mlglenn, strassel}@ldc.upenn.edu
http://www.ldc.upenn.edu/Projects/NISTMeet
2
Scope of Work
  • Training data
  • (Pre-publication) distribution
  • Conference room test data
  • Transcription
  • Careful
  • Quick
  • Comparison and analysis
  • Infrastructure
  • XTrans Toolkit
  • Features for meetings

3
RT-05S Training Data distributed by LDC
  • (Pre-publication) distribution via e-corpus to
    RT-05 participants
  • All available from www.ldc.upenn.edu/Catalog

4
RT-05S Evaluation Data transcribed by LDC
  • Conference room data
  • Ten meeting sessions, 12 minutes each
  • Contributed by five sites
  • Multiple recording conditions for each session
  • Primarily business meeting content
  • Transcribers report it was faster, easier and
    more interesting to transcribe than RT-04 meeting
    eval data
  • All data carefully transcribed (CTR)
  • Half of data quickly transcribed (QTR) for
    contrastive study

5
CTR Process
  • Using IHM channels
  • One exception: participant on speakerphone
  • 1st pass: manual segmentation
  • Turns → breath groups
  • 3-8 seconds per segment, designed for ease of
    transcription only
  • 10 ms padding around each segment boundary
  • No segmentation or transcription of isolated
    speaker noise
  • 2nd pass: initial verbatim transcription
  • No time limit
  • Goal is to get everything right
  • 3rd pass: verify existing transcription and
    timestamps, add additional markup
  • Indicate proper names, filled pauses, noise, etc.
  • Revisit difficult sections
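
The first-pass segmentation rules above can be illustrated with a minimal sketch. It assumes segments are stored as start and end times in seconds; the `Segment` class and both helper functions are hypothetical names for illustration, not part of any LDC tool. Only the 10 ms padding and the 3-8 second target come from the slide.

```python
from dataclasses import dataclass

PAD_SEC = 0.010  # 10 ms padding around each segment boundary (from the slide)


@dataclass
class Segment:
    """Hypothetical segment record: one speaker, times in seconds."""
    speaker: str
    start: float
    end: float


def pad_segment(seg, audio_duration, pad=PAD_SEC):
    """Widen a segment by `pad` seconds per side, clamped to the recording."""
    return Segment(
        seg.speaker,
        max(0.0, seg.start - pad),
        min(audio_duration, seg.end + pad),
    )


def in_target_range(seg, lo=3.0, hi=8.0):
    """True if the (unpadded) segment length is within the 3-8 second target."""
    return lo <= seg.end - seg.start <= hi


if __name__ == "__main__":
    # A 12-minute session is 720 seconds long.
    seg = Segment("spk1", 1.0, 5.0)
    print(pad_segment(seg, audio_duration=720.0), in_target_range(seg))
```

Whether the 3-8 second guideline is measured before or after padding is not stated on the slide; the sketch checks the unpadded length.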

6
CTR Quality Control
  • Additional QC pass by lead transcriber
  • Using mixed IHM recordings and/or SDM
  • Merge individual transcripts
  • Speaker assignment
  • Transcription accuracy, completeness
  • Markup consistency
  • Spell check
  • Syntax (format) check
  • Check consistency and accuracy of names,
    acronyms, terminology
  • Check silence (untranscribed) regions for missed
    speech using customized tool
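
The final QC step, checking untranscribed regions for missed speech, could be approximated as below. This is only a sketch of the general idea, not the customized LDC tool: it assumes segments are `(start, end)` pairs in seconds, and the `min_gap` threshold is a hypothetical parameter.

```python
def untranscribed_regions(segments, audio_duration, min_gap=0.5):
    """Return (start, end) gaps not covered by any transcribed segment.

    `segments` is an iterable of (start, end) times in seconds; they may
    overlap. Gaps shorter than `min_gap` seconds (an assumed threshold)
    are ignored.
    """
    gaps = []
    cursor = 0.0  # end of the transcribed material seen so far
    for start, end in sorted(segments):
        if start - cursor >= min_gap:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if audio_duration - cursor >= min_gap:
        gaps.append((cursor, audio_duration))
    return gaps


if __name__ == "__main__":
    segs = [(1.0, 4.0), (3.5, 6.0), (10.0, 12.0)]
    for gap in untranscribed_regions(segs, audio_duration=15.0):
        print("listen for missed speech in", gap)
```

A reviewer would then listen to each reported gap rather than to the whole recording.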

7
QTR Process
  • 0th pass: automatic audio segmentation
  • Pause detection algorithm
  • No manual correction
  • 1st pass: verbatim transcription
  • Limited to five times real time
  • Goal is to get the words right only
  • No special markup, orthography or capitalization
  • No extra time spent on difficult sections (e.g.,
    disfluencies)
  • QC pass: minimal, semi-automated
  • Spell check
  • Format check
  • No check of transcript content, consistency of
    names/terms, etc.
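
The slide does not say which pause detection algorithm is used for the 0th pass. As a rough illustration only, a simple energy-threshold segmenter might look like the sketch below; the frame size, threshold, and minimum pause length are all assumed values, and `qtr_time_budget` simply applies the stated limit of five times real time.

```python
def segment_by_pauses(frame_energies, frame_sec=0.01, threshold=0.1,
                      min_pause_sec=0.3):
    """Split per-frame energies into speech segments at long pauses.

    A frame counts as speech when its energy is >= `threshold`. A segment
    is closed once at least `min_pause_sec` of non-speech has followed it.
    All parameter defaults are illustrative assumptions.
    """
    min_pause_frames = round(min_pause_sec / frame_sec)
    segments = []
    start = None        # first frame of the current speech segment
    last_speech = None  # most recent speech frame
    for i, energy in enumerate(frame_energies):
        if energy >= threshold:
            if start is None:
                start = i
            last_speech = i
        elif start is not None and i - last_speech >= min_pause_frames:
            segments.append((start * frame_sec, (last_speech + 1) * frame_sec))
            start = None
    if start is not None:
        segments.append((start * frame_sec, (last_speech + 1) * frame_sec))
    return segments


def qtr_time_budget(audio_minutes, real_time_factor=5):
    """Transcription time allowed under the 'five times real time' limit."""
    return audio_minutes * real_time_factor


if __name__ == "__main__":
    energies = [0, 1, 1, 0, 1, 0, 0, 0, 1, 1]
    print(segment_by_pauses(energies, frame_sec=1.0, threshold=0.5,
                            min_pause_sec=2.0))
    print(qtr_time_budget(12), "minutes for one 12-minute session")
```

Because QTR applies no manual correction to the segmentation, any errors from a step like this pass straight through to the transcript.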

8
CTR vs. QTR
9
Example
10
Unique Challenges
  • Many speakers: takes longer to transcribe!
  • Impact of overlapping speech, even using IHM
    audio
  • Varying levels of speaker participation
  • Often no speech but other speaker/background
    noise
  • Meeting content
  • All over the map, from games to technical
    meetings
  • Lack of customized transcription tools
  • Existing tools optimized for:
  • 1-channel, multispeaker per channel (BN)
  • 2-channel, one speaker per channel (CTS)
  • Needed a tool that merges features of each
  • Arbitrary number of channels, speakers
  • Easily move between mixed and individual signal
    playback
  • Access to video would also help disambiguate
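
To make the overlapping-speech challenge concrete, here is a small sketch that lists where two different speakers talk at the same time. It assumes per-speaker segments given as `(speaker, start, end)` tuples in seconds; the function name and data layout are hypothetical and not taken from any tool named in the slides.

```python
from itertools import combinations


def pairwise_overlaps(segments):
    """Return (speaker_a, speaker_b, start, end) for each overlapping pair.

    `segments` is a list of (speaker, start, end) tuples in seconds.
    Pairs of segments from the same speaker are skipped.
    """
    overlaps = []
    for (spk_a, s_a, e_a), (spk_b, s_b, e_b) in combinations(segments, 2):
        if spk_a == spk_b:
            continue
        start, end = max(s_a, s_b), min(e_a, e_b)
        if start < end:  # non-empty intersection
            overlaps.append((spk_a, spk_b, start, end))
    return overlaps


if __name__ == "__main__":
    segs = [("A", 0.0, 4.0), ("B", 3.0, 6.0), ("C", 7.0, 9.0), ("A", 5.5, 8.0)]
    for overlap in pairwise_overlaps(segs):
        print(overlap)
```

The pairwise comparison is quadratic in the number of segments, which is acceptable for a 12-minute session but would need a sweep-line approach for long recordings.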

11
XTrans
  • Multipurpose speech annotation tool
  • Multilingual, multi-platform
  • Written in Python
  • AGTK infrastructure
  • Customized task modules
  • Careful transcription
  • Specialized QC functions
  • Quick transcription
  • Timed mode
  • Metadata annotation
  • Structural features, speaker diarization
  • Correction mode
  • e.g., correct automatic transcript or QTR → CTR
  • Comparison and adjudication of multiple
    transcripts
  • Allows video input
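
Comparing two transcripts of the same audio, such as a quick and a careful version, can be sketched with Python's standard `difflib` module. This is not how XTrans implements comparison or adjudication; it is a minimal word-level diff under the assumption that both transcripts are plain strings, and it lowercases the text because QTR uses no special capitalization.

```python
import difflib


def word_differences(reference, hypothesis):
    """Return word-level differences between two transcripts.

    Each item is (tag, reference_words, hypothesis_words), where tag is
    'replace', 'delete', or 'insert' as reported by difflib.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    matcher = difflib.SequenceMatcher(a=ref, b=hyp, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            diffs.append((tag, ref[i1:i2], hyp[j1:j2]))
    return diffs


if __name__ == "__main__":
    careful = "so we should uh meet on Tuesday"   # invented example text
    quick = "so we should meet on Thursday"       # invented example text
    for diff in word_differences(careful, quick):
        print(diff)
```

Each reported difference is a candidate for a human adjudicator to resolve by listening to the audio.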

12
One Channel View
13
One Speaker, MultiChannel
14
MultiSpeaker, MultiChannel
15
PanelView
16
Adjudication Mode