Title: The CUED Speech Group
The CUED Speech Group
- Dr Mark Gales
- Machine Intelligence Laboratory
- Cambridge University Engineering Department
1. CUED Organisation
- Academic Staff: 130
- Undergraduates: 1100
- Postgraduates: 450
CUED has 6 Divisions:
A. ThermoFluids
B. Electrical Eng
C. Mechanics
D. Structures
E. Management
F. Information Engineering Division
2. Speech Group Overview
- Primary research interests in speech processing
- 4 members of Academic Staff
- 9 Research Assistants/Associates
- 12 PhD students
Principal Staff and Research Interests
- Dr Bill Byrne
- Statistical machine translation
- Automatic speech recognition
- Cross-lingual adaptation and synthesis
- Dr Mark Gales
- Large vocabulary speech recognition
- Speaker and environment adaptation
- Kernel methods for speech processing
- Professor Phil Woodland
- Large vocabulary speech recognition/meta-data extraction
- Information retrieval from audio
- ASR and SMT integration
- Professor Steve Young
- Statistical dialogue modelling
- Voice conversion
Research Interests
- Data-driven semantic processing
- Statistical modelling
[Figure: diagram of the group's research areas, including dialogue]
Example Current and Recent Projects
- Global Autonomous Language Exploitation (GALE)
- DARPA GALE funded (collaboration with BBN, LIMSI, ISI)
- HTK Rich Audio Transcription Project (finished 2004)
- DARPA EARS funded
- CLASSIC: Computational Learning in Adaptive Systems for Spoken Conversation
- EU funded (collaboration with Edinburgh, France Telecom, ...)
- EMIME: Effective Multilingual Interaction in Mobile Environments
- EU funded (collaboration with Edinburgh, IDIAP, Nagoya Institute of Technology)
- R2EAP: Rapid and Reliable Environment Aware Processing
- TREL funded
- Also active collaborations with IBM, Google, Microsoft, ...
3. Rich Audio Transcription Project
[Figure: natural speech (English/Mandarin) transformed into a rich transcript]
- DARPA-funded project
- Effective Affordable Reusable Speech-to-text (EARS) program
- Transform natural speech into human-readable form
- Need to add meta-data to the ASR output
- For example, mark speaker turns and handle disfluencies
Rich Text Transcription
ASR Output
okay carl uh do you exercise yeah actually um i
belong to a gym down here golds gym and uh i try
to exercise five days a week um and now and
then ill ill get it interrupted by work or just
full of crazy hours you know
Meta-Data Extraction (MDE) Markup
Speaker1 / okay carl F uh do you exercise
/ Speaker2 / DM yeah actually F um i belong
to a gym down here / / golds gym / / and F
uh i try to exercise five days a week F um /
/ and now and then REP ill ill get it
interrupted by work or just full of crazy
hours DM you know /
Final Text
Speaker1: Okay Carl, do you exercise?
Speaker2: I belong to a gym down here, Gold's Gym, and I try to exercise five days a week, and now and then I'll get it interrupted by work or just full of crazy hours.
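Producing the final text from the MDE markup amounts to stripping fillers, discourse markers and repetitions, and starting a new output line at each speaker label. A minimal Python sketch of this idea, assuming a simplified, hypothetical version of the markup in which each F/DM/REP tag applies only to the single word that follows it (the real EARS MDE specification is much richer):

    import re

    # Simplified, hypothetical MDE conventions (the real spec is richer):
    #   "/"          sentence-like unit boundary (produces no text)
    #   "F <word>"   filler (e.g. "F uh")        -- drop the tagged word
    #   "DM <word>"  discourse marker            -- drop the tagged word
    #   "REP <word>" repeated word               -- drop it (a later copy remains)
    #   "SpeakerN"   speaker label               -- start a new output line
    EDIT_TAGS = {"F", "DM", "REP"}

    def mde_to_clean_text(markup: str) -> str:
        """Strip simplified MDE markup down to readable text."""
        lines, current, skip_next = [], [], False
        for tok in markup.split():
            if skip_next:
                skip_next = False
            elif re.fullmatch(r"Speaker\d+", tok):
                if current:
                    lines.append(" ".join(current))
                current = [tok + ":"]
            elif tok in EDIT_TAGS:
                skip_next = True      # drop the word the tag points at
            elif tok != "/":          # boundaries produce no text
                current.append(tok)
        if current:
            lines.append(" ".join(current))
        return "\n".join(lines)

    if __name__ == "__main__":
        example = ("Speaker1 / okay carl F uh do you exercise / "
                   "Speaker2 / DM yeah i belong to a gym down here / golds gym /")
        print(mde_to_clean_text(example))

Capitalisation and punctuation restoration, as in the final text above, would be handled by separate steps.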
4. Statistical Machine Translation
- Aim is to translate from one language to another
- For example, translate text from Chinese to English
- Process involves collecting parallel (bitext) corpora
- Align at document/sentence/word level
- Use statistical approaches to obtain the most probable translation (see the formulation below)
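The "most probable translation" criterion is commonly written as a Bayes decision rule; the standard source-channel formulation is shown here for orientation rather than as the exact CUED system objective:

    \hat{e} = \operatorname*{argmax}_{e} \, p(e \mid f)
            = \operatorname*{argmax}_{e} \, p(f \mid e)\, p(e)

where f is the source-language sentence, p(f | e) is a translation model estimated from the aligned bitext, and p(e) is an English language model.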
GALE Integrated ASR and SMT
- Member of the AGILE team (led by BBN)
- The DARPA Global Autonomous Language Exploitation (GALE) program aims to develop speech and language processing technologies to recognise, analyse, and translate speech and text into readable English
- Primary languages for STT/SMT: Chinese and Arabic
5. Statistical Dialogue Modelling
- Use a statistical framework for all stages
CLASSiC Project Architecture
Legend:
- ASR: Automatic Speech Recognition
- NLU: Natural Language Understanding
- DM: Dialogue Management
- NLG: Natural Language Generation
- TTS: Text To Speech
Signals and hypothesis sets:
- st: input sound signal
- ut: utterance hypotheses
- ht: conceptual interpretation hypotheses
- at: action hypotheses
- wt: word string hypotheses
- rt: speech synthesis hypotheses
- X: possible elimination of hypotheses
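As a rough illustration of how scored hypothesis sets flow through such a pipeline, here is a hypothetical Python sketch; the Hypothesis type, the toy NLU stage and the pruning rule are illustrative only, not the CLASSiC implementation:

    from dataclasses import dataclass
    from typing import Callable, List

    @dataclass
    class Hypothesis:
        content: str   # e.g. a word string (wt), an interpretation (ht) or an action (at)
        score: float   # probability/confidence assigned by the producing module

    # Each stage (ASR, NLU, DM, NLG, TTS) maps one hypothesis set to the next;
    # pruning corresponds to the "possible elimination of hypotheses" in the diagram.
    Stage = Callable[[List[Hypothesis]], List[Hypothesis]]

    def prune(hyps: List[Hypothesis], beam: int = 3) -> List[Hypothesis]:
        return sorted(hyps, key=lambda h: h.score, reverse=True)[:beam]

    def run_pipeline(stages: List[Stage], initial: List[Hypothesis]) -> List[Hypothesis]:
        hyps = initial
        for stage in stages:
            hyps = prune(stage(hyps))
        return hyps

    # Toy NLU stage: maps word-string hypotheses (wt) to conceptual hypotheses (ht).
    def toy_nlu(word_hyps: List[Hypothesis]) -> List[Hypothesis]:
        return [Hypothesis(f"inform(text='{h.content}')", h.score * 0.9)
                for h in word_hyps]

    if __name__ == "__main__":
        wt = [Hypothesis("i belong to a gym", 0.7), Hypothesis("i belong to a jim", 0.2)]
        print(run_pipeline([toy_nlu], wt))

Keeping multiple scored hypotheses at every stage, rather than committing to a single best result early, is what allows the statistical framework to be applied uniformly across the pipeline.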
6. EMIME Speech-to-Speech Translation
- Personalised speech-to-speech translation
- Learn the characteristics of a user's speech
- Reproduce the user's speech in synthesis
- Cross-lingual capability
- Map speaker characteristics across languages
- Unified approach for recognition and synthesis
- Common statistical model: hidden Markov models
- Simplifies adaptation, which is common to both synthesis and recognition (sketched below)
- Improve understanding of recognition/synthesis
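One reason a shared HMM framework simplifies adaptation is that the same style of speaker adaptation can be used on both sides. As an illustrative example (not necessarily the EMIME recipe), MLLR-type adaptation applies a linear transform to the Gaussian means of the HMM set:

    \hat{\mu} = A\mu + b

where \mu is a mean of the speaker-independent model and the transform (A, b) is estimated from a small amount of the target speaker's data and shared across many Gaussians; the same kind of transform can adapt recognition models and HMM-based synthesis models.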
7. R2EAP Robust Speech Recognition
- Current ASR performance degrades with changing noise
- Major limitation on deploying speech recognition systems
Project Overview
- Aims of the project
- To develop techniques that allow ASR systems to rapidly respond to changing acoustic conditions
- While maintaining high levels of recognition accuracy over a wide range of conditions
- And be flexible, so they are applicable to a wide range of tasks and computational requirements
- Project started in January 2008; 3 year duration
- Close collaboration with TREL Cambridge Lab
- Common development code-base: extended HTK
- Common evaluation sets
- Builds on current (and previous) PhD studentships
- Monthly joint meetings
Approach: Model Compensation
- Model compensation schemes are highly effective BUT
- Slow compared to feature compensation schemes
- Need schemes that improve speed while maintaining performance (a sketch of the compensation step follows this list)
- Also automatically detect/track changing noise conditions
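For context, model compensation adjusts the acoustic model parameters to match the noisy environment rather than cleaning the incoming features. Below is a minimal Python sketch of one widely used approximation for the static cepstral means (the log-add approximation), shown only to illustrate why compensating every Gaussian is costly; this is not the R2EAP algorithm and it assumes purely additive noise:

    import numpy as np

    def dct_matrix(n: int) -> np.ndarray:
        """Orthonormal DCT-II matrix, as used to map log filterbank energies to cepstra."""
        C = np.cos(np.pi / n * np.outer(np.arange(n), np.arange(n) + 0.5))
        C *= np.sqrt(2.0 / n)
        C[0] *= np.sqrt(0.5)
        return C

    def log_add_compensate(mu_x: np.ndarray, mu_n: np.ndarray) -> np.ndarray:
        """Compensate a clean static cepstral mean (mu_x) for additive noise
        with cepstral mean mu_n, using the log-add approximation."""
        C = dct_matrix(mu_x.shape[0])
        Cinv = np.linalg.inv(C)            # equals C.T for the orthonormal DCT
        log_x = Cinv @ mu_x                # cepstra -> log filterbank energies
        log_n = Cinv @ mu_n
        # Combine speech and noise in the linear spectral domain, return to cepstra.
        return C @ np.log(np.exp(log_x) + np.exp(log_n))

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        mu_clean, mu_noise = rng.normal(size=24), rng.normal(size=24)
        print(log_add_compensate(mu_clean, mu_noise)[:4])

Applying a compensation like this to every Gaussian in a large-vocabulary system each time the noise changes is what makes model compensation slower than per-frame feature compensation.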
8. Toshiba-CUED PhD Collaborations
- To date, 5 research studentships (partly) funded by Toshiba
- Shared software: code transfer in both directions
- Shared data sets: both (emotional) synthesis and ASR
- 6-monthly reports and review meetings
- Students and topics:
- Hank Liao (2003-2007): Uncertainty Decoding for Noise Robust ASR
- Catherine Breslin (2004-2008): Complementary System Generation and Combination
- Zeynep Inanoglu (2004-2008): Recognition and Synthesis of Emotion
- Rogier van Dalen (2007-2010): Noise Robust ASR
- Stuart Moore (2007-2010): Number Sense Disambiguation
- Very useful and successful collaboration
9. HTK Version 3.0 Development
- HTK is a free software toolkit for developing HMM-based systems
- 1000s of users worldwide
- Widely used for research by universities and industry
Timeline:
- 1989-1992: V1.0 to V1.4 - initial development at CUED
- 1993-1999: V1.5 to V2.3 - commercial development by Entropic
- 2000-date: V3.0 to V3.4 - academic development at CUED
- Development partly funded by Microsoft and the DARPA EARS project
- Primary dissemination route for CU research output
- 2004-date: ATK, a real-time HTK-based recognition system
10. Summary
- Speech Group works on many aspects of speech processing
- Large vocabulary speech recognition
- Statistical machine translation
- Statistical dialogue systems
- Speech synthesis and voice conversion
- Statistical machine learning approach to all applications
- World-wide reputation for research
- CUED systems have defined the state-of-the-art for the past decade
- Developed a number of techniques widely used by industry
- Hidden Markov Model Toolkit (HTK)
- Freely-available software, 1000s of users worldwide
- State-of-the-art features (discriminative training, adaptation, ...)
- HMM synthesis extension (HTS) from Nagoya Institute of Technology