The CUED Speech Group - PowerPoint PPT Presentation

The CUED Speech Group Dr Mark Gales Machine Intelligence Laboratory Cambridge University Engineering Department – PowerPoint PPT presentation

1
The CUED Speech Group
  • Dr Mark Gales
  • Machine Intelligence Laboratory
  • Cambridge University Engineering Department

2
1. CUED Organisation
  • Academic Staff: 130
  • Undergrads: 1100
  • Postgrads: 450
CUED: 6 Divisions
A. ThermoFluids
B. Electrical Eng
C. Mechanics
D. Structures
E. Management
F. Information Engineering Division
3
2. Speech Group Overview
  • Primary research interests in speech processing
  • 4 members of Academic Staff
  • 9 Research Assistants/Associates
  • 12 PhD students

4
Principal Staff and Research Interests
  • Dr Bill Byrne
      • Statistical machine translation
      • Automatic speech recognition
      • Cross-lingual adaptation and synthesis
  • Dr Mark Gales
      • Large vocabulary speech recognition
      • Speaker and environment adaptation
      • Kernel methods for speech processing
  • Professor Phil Woodland
      • Large vocabulary speech recognition/meta-data
        extraction
      • Information retrieval from audio
      • ASR and SMT integration
  • Professor Steve Young
      • Statistical dialogue modelling
      • Voice conversion

5
Research Interests
  • data driven semantic processing
  • statistical modelling

Dialogue
6
Example Current and Recent Projects
  • GALE: Global Autonomous Language Exploitation
      • DARPA GALE funded (collab with BBN, LIMSI,
        ISI, ...)
  • HTK Rich Audio Transcription Project (finished
    2004)
      • DARPA EARS funded
  • CLASSiC: Computational Learning in Adaptive
    Systems for Spoken Conversation
      • EU (collab with Edinburgh, France Telecom, ...)
  • EMIME: Effective Multilingual Interaction in
    Mobile Environments
      • EU (collab with Edinburgh, IDIAP, Nagoya
        Institute of Technology, ...)
  • R2EAP: Rapid and Reliable Environment Aware
    Processing
      • TREL funded

Also active collaborations with IBM, Google,
Microsoft, ...
7
3. Rich Audio Transcription Project
[Diagram: Natural Speech (English/Mandarin) to Rich Transcript]
  • DARPA-funded project
  • Effective Affordable Reusable Speech-to-text
    (EARS) program
  • Transform natural speech into human readable form
  • Need to add meta-data to the ASR output
  • For example, mark speaker turns and handle disfluencies

8
Rich Text Transcription
ASR Output
okay carl uh do you exercise yeah actually um i
belong to a gym down here gold's gym and uh i try
to exercise five days a week um and now and
then i'll i'll get it interrupted by work or just
full of crazy hours you know

Meta-Data Extraction (MDE) Markup
Speaker1 / okay carl F uh do you exercise /
Speaker2 / DM yeah actually F um i belong
to a gym down here / / gold's gym / / and F
uh i try to exercise five days a week F um /
/ and now and then REP i'll i'll get it
interrupted by work or just full of crazy
hours DM you know /

Final Text
Speaker1: Okay Carl do you exercise?
Speaker2: I belong to a gym down here, Gold's Gym,
and I try to exercise five days a week and now and
then I'll get it interrupted by work or just full
of crazy hours.
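
The step from MDE markup to final text can be sketched in a few lines of Python. This is only an illustration, not the EARS system. It assumes the MDE stage has already labelled every word, and it reads the slide's F as a filler, DM as a discourse marker, REP as the first copy of a repeated word, and / as a sentence-unit boundary. The Token class, the tag constants and the render function are hypothetical names, and truecasing of names (carl to Carl) is left out.

```python
from dataclasses import dataclass

# Tag names follow the labels on the slide; their meanings are assumed.
WORD = "W"                # ordinary word, kept in the final text
FILLER = "F"              # filled pause such as "uh" or "um"
DISCOURSE_MARKER = "DM"   # e.g. "you know"
REPEAT = "REP"            # first copy of a repeated word, to be dropped


@dataclass
class Token:
    text: str
    tag: str = WORD


def render(units):
    """Turn tagged sentence units into readable lines.

    Each unit is (speaker, tokens, end_punctuation). Only tokens tagged
    WORD are kept; the first letter is capitalised and the punctuation
    mark is appended.
    """
    lines = []
    for speaker, tokens, end in units:
        words = [t.text for t in tokens if t.tag == WORD]
        if not words:
            continue  # the unit contained nothing but disfluencies
        sentence = " ".join(words)
        lines.append(f"{speaker}: {sentence[0].upper()}{sentence[1:]}{end}")
    return lines


units = [
    ("Speaker1",
     [Token("okay"), Token("carl"), Token("uh", FILLER),
      Token("do"), Token("you"), Token("exercise")],
     "?"),
    ("Speaker2",
     [Token("now"), Token("and"), Token("then"),
      Token("i'll", REPEAT), Token("i'll"), Token("get"), Token("it"),
      Token("interrupted"), Token("by"), Token("work"),
      Token("you know", DISCOURSE_MARKER)],
     "."),
]

for line in render(units):
    print(line)
```

Keeping the tags on the tokens, rather than deleting words early, means the same markup can still be used to produce a verbatim transcript when one is needed.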
9
4. Statistical Machine Translation
  • Aim is to translate from one language to another
  • For example translate text from Chinese to
    English
  • Process involves collecting parallel (bitext)
    corpora
  • Align at document/sentence/word level
  • Use statistical approaches to obtain most
    probable translation
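
As an illustration of word-level alignment from sentence-aligned bitext, the sketch below trains IBM Model 1 with EM. The slides do not say which alignment model the group uses, so this is a textbook stand-in rather than their method. The function name train_ibm_model1 and the three-sentence German-English toy corpus are assumptions made for the example, and the usual NULL word is omitted for brevity.

```python
from collections import defaultdict


def train_ibm_model1(bitext, iterations=10):
    """Estimate word translation probabilities t[(f, e)] = P(f | e) with EM.

    bitext is a list of (source_words, target_words) sentence pairs.
    """
    source_vocab = {f for source, _ in bitext for f in source}
    uniform = 1.0 / len(source_vocab)
    t = defaultdict(lambda: uniform)  # start from a uniform table

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of f aligned to e
        total = defaultdict(float)  # expected count of e being aligned
        # E-step: share each source word among the target words
        for source, target in bitext:
            for f in source:
                norm = sum(t[(f, e)] for e in target)
                for e in target:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: renormalise the expected counts
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t


# Toy sentence-aligned bitext (illustrative data only).
bitext = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]
t = train_ibm_model1(bitext)
best = max(["das", "haus", "buch"], key=lambda f: t[(f, "the")])
print(best)  # the source word most strongly tied to "the"
```

No word-level annotation is supplied: the co-occurrence pattern across sentence pairs is enough for EM to tie "das" to "the" and "buch" to "book".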

10
GALE: Integrated ASR and SMT
  • Member of the AGILE team (led by BBN)
  • The DARPA Global Autonomous Language
    Exploitation (GALE) program has the aim of
    developing speech and language processing
    technologies to recognise, analyse, and
    translate speech and text into readable English.
  • Primary languages for STT/SMT: Chinese and Arabic

11
5. Statistical Dialogue Modelling
  • Use a statistical framework for all stages

12
CLASSiC Project Architecture
Legend
  • ASR: Automatic Speech Recognition
  • NLU: Natural Language Understanding
  • DM: Dialogue Management
  • NLG: Natural Language Generation
  • TTS: Text To Speech
  • s_t: Input Sound Signal
  • u_t: Utterance Hypotheses
  • h_t: Conceptual Interpretation Hypotheses
  • a_t: Action Hypotheses
  • w_t: Word String Hypotheses
  • r_t: Speech Synthesis Hypotheses
  • X: possible elimination of hypotheses
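
The legend describes lists of weighted hypotheses passed from stage to stage, with some hypotheses eliminated along the way. A minimal sketch of one such hand-over, from utterance hypotheses u_t to interpretation hypotheses h_t, is given below. It is not the CLASSiC implementation: prune, run_stage, toy_nlu, the beam size and all probabilities are hypothetical and chosen only to show the idea.

```python
def prune(hyps, beam):
    """Keep the `beam` most probable hypotheses and renormalise them."""
    kept = sorted(hyps, key=lambda h: h[1], reverse=True)[:beam]
    mass = sum(p for _, p in kept)
    return [(h, p / mass) for h, p in kept]


def run_stage(stage, hyps, beam=2):
    """Push a weighted hypothesis list through one pipeline stage.

    `stage` maps one input hypothesis to a list of (output, probability)
    pairs. Outputs reached from several inputs have their mass summed,
    then the list is pruned (the X in the legend).
    """
    merged = {}
    for hyp, p_in in hyps:
        for out, p_out in stage(hyp):
            merged[out] = merged.get(out, 0.0) + p_in * p_out
    return prune(list(merged.items()), beam)


def toy_nlu(utterance):
    """Hypothetical NLU stage: utterance -> interpretation hypotheses."""
    if "cheap" in utterance:
        return [("inform(price=cheap)", 1.0)]
    return [("inform(price=cheap)", 0.5), ("null()", 0.5)]


# u_t: two competing ASR hypotheses for the same user turn
u_t = [("i want a cheap hotel", 0.6), ("i want a sheep hotel", 0.4)]
h_t = run_stage(toy_nlu, u_t)
for interpretation, prob in h_t:
    print(f"{interpretation}: {prob:.2f}")
```

Because the second ASR hypothesis still lends some support to the same interpretation, carrying both forward gives that interpretation more weight than the best ASR hypothesis alone would.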
13
6. EMIME: Speech-to-Speech Translation
  • Personalised speech-to-speech translation
      • Learn characteristics of a user's speech
      • Reproduce user's speech in synthesis
  • Cross-lingual capability
      • Map speaker characteristics across languages
  • Unified approach for recognition and synthesis
      • Common statistical model: hidden Markov models
      • Simplifies adaptation (common to both synthesis
        and recognition)
      • Improve understanding of recognition/synthesis

14
7. R2EAP: Robust Speech Recognition
  • Current ASR performance degrades with changing
    noise
  • Major limitation on deploying speech recognition
    systems

15
Project Overview
  • Aims of the project
      • To develop techniques that allow ASR systems to
        rapidly respond to changing acoustic conditions
      • While maintaining high levels of recognition
        accuracy over a wide range of conditions
      • And be flexible so they are applicable to a wide
        range of tasks and computational requirements.
  • Project started in January 2008, 3 year duration
  • Close collaboration with TREL Cambridge Lab.
  • Common development code-base: extended HTK
  • Common evaluation sets
  • Builds on current (and previous) PhD studentships
  • Monthly joint meetings

16
Approach: Model Compensation
  • Model compensation schemes highly effective BUT
  • Slow compared to feature compensation schemes
  • Need schemes to improve speed while maintaining
    performance
  • Also automatically detect/track changing noise
    conditions
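
The speed difference can be seen in a small sketch. The slides do not name a specific scheme, so the code assumes log-spectral features, purely additive noise with mismatch function y = log(exp(x) + exp(n)), and the simple log-add approximation for model means (variances are left unchanged). The function names and numbers are hypothetical, and this is not presented as the project's method.

```python
import math


def compensate_mean(clean_mean, noise_mean):
    """Model compensation: log-add approximation for one Gaussian mean."""
    return [math.log(math.exp(x) + math.exp(n))
            for x, n in zip(clean_mean, noise_mean)]


def compensate_models(clean_means, noise_mean):
    """Update every Gaussian mean in the acoustic model.

    The work grows with the number of Gaussians and must be redone
    whenever the noise estimate changes.
    """
    return [compensate_mean(mean, noise_mean) for mean in clean_means]


def enhance_frame(noisy_frame, noise_mean, floor=1e-3):
    """Feature compensation: remove the noise estimate from one frame.

    The work is one pass per frame, independent of model size. The floor
    keeps the argument of the logarithm positive.
    """
    return [math.log(max(math.exp(y) - math.exp(n), floor))
            for y, n in zip(noisy_frame, noise_mean)]


noise = [0.0, 0.0]                       # assumed noise mean
clean_means = [[0.0, 1.0], [2.0, 0.5]]   # two Gaussians standing in for a full model
noisy_means = compensate_models(clean_means, noise)
print(noisy_means)
print(enhance_frame(noisy_means[0], noise))  # approximately recovers [0.0, 1.0]
```

A large vocabulary system has many Gaussians, so the loop in compensate_models is the expensive part, which is why faster compensation and tracking of when the noise has actually changed both matter.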

17
8. Toshiba-CUED PhD Collaborations
  • To date 5 Research studentships (partly) funded
    by Toshiba
  • Shared software - code transfer both directions
  • Shared data sets - both (emotional) synthesis and
    ASR
  • 6-monthly reports and review meetings
  • Students and topics
      • Hank Liao (2003-2007): Uncertainty decoding for
        Noise Robust ASR
      • Catherine Breslin (2004-2008): Complementary
        System Generation and Combination
      • Zeynep Inanoglu (2004-2008): Recognition and
        Synthesis of Emotion
      • Rogier van Dalen (2007-2010): Noise Robust ASR
      • Stuart Moore (2007-2010): Number Sense
        Disambiguation
  • Very useful and successful collaboration

18
9. HTK Version 3.0 Development
  • HTK is a free software toolkit for developing
    HMM-based systems
  • 1000s of users worldwide
  • widely used for research by universities and
    industry

1989 - 1992: V1.0 - 1.4, initial development at CUED
1993 - 1999: V1.5 - 2.3, commercial development by Entropic
2000 - date: V3.0 - V3.4, academic development at CUED
  • Development partly funded by Microsoft and
    DARPA EARS Project
  • Primary dissemination route for CU research
    output

2004 - date: the ATK real-time HTK-based
recognition system
19
10. Summary
  • Speech Group works on many aspects of speech
    processing
      • Large vocabulary speech recognition
      • Statistical machine translation
      • Statistical dialogue systems
      • Speech synthesis and voice conversion
  • Statistical machine learning approach to all
    applications
  • World-wide reputation for research
      • CUED systems have defined state-of-the-art for
        the past decade
      • Developed a number of techniques widely used by
        industry
  • Hidden Markov Model Toolkit (HTK)
      • Freely-available software, 1000s of users
        worldwide
      • State-of-the-art features (discriminative
        training, adaptation, ...)
      • HMM Synthesis extension (HTS) from Nagoya
        Institute of Technology
