Title: The CUED Speech Group
The CUED Speech Group
- Dr Mark Gales
- Machine Intelligence Laboratory
- Cambridge University Engineering Department
1. CUED Organisation
- Academic Staff: 130
- Undergraduates: 1100
- Postgraduates: 450
CUED has 6 Divisions:
A. ThermoFluids
B. Electrical Eng
C. Mechanics
D. Structures
E. Management
F. Information Engineering Division
2. Speech Group Overview
- Primary research interests in speech processing
- 4 members of Academic Staff
- 9 Research Assistants/Associates
- 12 PhD students
Principal Staff and Research Interests
- Dr Bill Byrne
- Statistical machine translation
- Automatic speech recognition
- Cross-lingual adaptation and synthesis
- Dr Mark Gales
- Large vocabulary speech recognition
- Speaker and environment adaptation
- Kernel methods for speech processing
- Professor Phil Woodland
- Large vocabulary speech recognition/meta-data extraction
- Information retrieval from audio
- ASR and SMT integration
- Professor Steve Young
- Statistical dialogue modelling
- Voice conversion
Research Interests
- Data-driven semantic processing
- Statistical modelling
[Figure: diagram of the group's research areas, including dialogue]
Example Current and Recent Projects
- Global Autonomous Language Exploitation (GALE)
- DARPA GALE funded (collaboration with BBN, LIMSI, ISI)
- HTK Rich Audio Transcription Project (finished 2004)
- DARPA EARS funded
- CLASSIC: Computational Learning in Adaptive Systems for Spoken Conversation
- EU funded (collaboration with Edinburgh, France Telecom, ...)
- EMIME: Effective Multilingual Interaction in Mobile Environments
- EU funded (collaboration with Edinburgh, IDIAP, Nagoya Institute of Technology)
- R2EAP: Rapid and Reliable Environment Aware Processing
- TREL funded
- Also active collaborations with IBM, Google, Microsoft, ...
3. Rich Audio Transcription Project
[Figure: natural speech (English/Mandarin) transformed into a rich transcript]
- DARPA-funded project
- Effective Affordable Reusable Speech-to-text (EARS) program
- Transform natural speech into human-readable form
- Need to add meta-data to the ASR output
- For example, mark speaker turns and handle disfluencies
Rich Text Transcription
ASR Output
okay carl uh do you exercise yeah actually um i
belong to a gym down here golds gym and uh i try
to exercise five days a week um and now and
then ill ill get it interrupted by work or just
full of crazy hours you know
Meta-Data Extraction (MDE) Markup
Speaker1 / okay carl F uh do you exercise
/ Speaker2 / DM yeah actually F um i belong
to a gym down here / / golds gym / / and F
uh i try to exercise five days a week F um /
/ and now and then REP ill ill get it
interrupted by work or just full of crazy
hours DM you know /
Final Text
Speaker1: Okay Carl, do you exercise?
Speaker2: I belong to a gym down here, Gold's Gym, and I try to exercise five days a week, and now and then I'll get it interrupted by work or just full of crazy hours.
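Producing the final text from the MDE markup amounts to stripping fillers, discourse markers and repetitions, and starting a new output line at each speaker label. A minimal Python sketch of this idea, assuming a simplified, hypothetical version of the markup in which each F/DM/REP tag applies only to the single word that follows it (the real EARS MDE specification is much richer):

    import re

    # Simplified, hypothetical MDE conventions (the real spec is richer):
    #   "/"          sentence-like unit boundary (produces no text)
    #   "F <word>"   filler (e.g. "F uh")        -- drop the tagged word
    #   "DM <word>"  discourse marker            -- drop the tagged word
    #   "REP <word>" repeated word               -- drop it (a later copy remains)
    #   "SpeakerN"   speaker label               -- start a new output line
    EDIT_TAGS = {"F", "DM", "REP"}

    def mde_to_clean_text(markup: str) -> str:
        """Strip simplified MDE markup down to readable text."""
        lines, current, skip_next = [], [], False
        for tok in markup.split():
            if skip_next:
                skip_next = False
            elif re.fullmatch(r"Speaker\d+", tok):
                if current:
                    lines.append(" ".join(current))
                current = [tok + ":"]
            elif tok in EDIT_TAGS:
                skip_next = True      # drop the word the tag points at
            elif tok != "/":          # boundaries produce no text
                current.append(tok)
        if current:
            lines.append(" ".join(current))
        return "\n".join(lines)

    if __name__ == "__main__":
        example = ("Speaker1 / okay carl F uh do you exercise / "
                   "Speaker2 / DM yeah i belong to a gym down here / golds gym /")
        print(mde_to_clean_text(example))

Capitalisation and punctuation restoration, as in the final text above, would be handled by separate steps.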
4. Statistical Machine Translation
- Aim is to translate from one language to another
- For example, translate text from Chinese to English
- Process involves collecting parallel (bitext) corpora
- Align at document/sentence/word level
- Use statistical approaches to obtain the most probable translation (see the formulation below)
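The "most probable translation" criterion is commonly written as a Bayes decision rule; the standard source-channel formulation is shown here for orientation rather than as the exact CUED system objective:

    \hat{e} = \operatorname*{argmax}_{e} \, p(e \mid f)
            = \operatorname*{argmax}_{e} \, p(f \mid e)\, p(e)

where f is the source-language sentence, p(f | e) is a translation model estimated from the aligned bitext, and p(e) is an English language model.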
GALE Integrated ASR and SMT
- Member of the AGILE team (led by BBN)
- The DARPA Global Autonomous Language Exploitation (GALE) program aims to develop speech and language processing technologies to recognise, analyse, and translate speech and text into readable English
- Primary languages for STT/SMT: Chinese and Arabic
5. Statistical Dialogue Modelling
- Use a statistical framework for all stages
CLASSiC Project Architecture
Legend:
- ASR: Automatic Speech Recognition
- NLU: Natural Language Understanding
- DM: Dialogue Management
- NLG: Natural Language Generation
- TTS: Text To Speech
Signals and hypothesis sets:
- st: input sound signal
- ut: utterance hypotheses
- ht: conceptual interpretation hypotheses
- at: action hypotheses
- wt: word string hypotheses
- rt: speech synthesis hypotheses
- X: possible elimination of hypotheses
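As a rough illustration of how scored hypothesis sets flow through such a pipeline, here is a hypothetical Python sketch; the Hypothesis type, the toy NLU stage and the pruning rule are illustrative only, not the CLASSiC implementation:

    from dataclasses import dataclass
    from typing import Callable, List

    @dataclass
    class Hypothesis:
        content: str   # e.g. a word string (wt), an interpretation (ht) or an action (at)
        score: float   # probability/confidence assigned by the producing module

    # Each stage (ASR, NLU, DM, NLG, TTS) maps one hypothesis set to the next;
    # pruning corresponds to the "possible elimination of hypotheses" in the diagram.
    Stage = Callable[[List[Hypothesis]], List[Hypothesis]]

    def prune(hyps: List[Hypothesis], beam: int = 3) -> List[Hypothesis]:
        return sorted(hyps, key=lambda h: h.score, reverse=True)[:beam]

    def run_pipeline(stages: List[Stage], initial: List[Hypothesis]) -> List[Hypothesis]:
        hyps = initial
        for stage in stages:
            hyps = prune(stage(hyps))
        return hyps

    # Toy NLU stage: maps word-string hypotheses (wt) to conceptual hypotheses (ht).
    def toy_nlu(word_hyps: List[Hypothesis]) -> List[Hypothesis]:
        return [Hypothesis(f"inform(text='{h.content}')", h.score * 0.9)
                for h in word_hyps]

    if __name__ == "__main__":
        wt = [Hypothesis("i belong to a gym", 0.7), Hypothesis("i belong to a jim", 0.2)]
        print(run_pipeline([toy_nlu], wt))

Keeping multiple scored hypotheses at every stage, rather than committing to a single best result early, is what allows the statistical framework to be applied uniformly across the pipeline.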
6. EMIME Speech-to-Speech Translation
- Personalised speech-to-speech translation
- Learn the characteristics of a user's speech
- Reproduce the user's speech in synthesis
- Cross-lingual capability
- Map speaker characteristics across languages
- Unified approach for recognition and synthesis
- Common statistical model: hidden Markov models
- Simplifies adaptation, which is common to both synthesis and recognition (sketched below)
- Improve understanding of recognition/synthesis
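One reason a shared HMM framework simplifies adaptation is that the same style of speaker adaptation can be used on both sides. As an illustrative example (not necessarily the EMIME recipe), MLLR-type adaptation applies a linear transform to the Gaussian means of the HMM set:

    \hat{\mu} = A\mu + b

where \mu is a mean of the speaker-independent model and the transform (A, b) is estimated from a small amount of the target speaker's data and shared across many Gaussians; the same kind of transform can adapt recognition models and HMM-based synthesis models.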
7. R2EAP Robust Speech Recognition
- Current ASR performance degrades with changing noise
- Major limitation on deploying speech recognition systems
Project Overview
- Aims of the project
- To develop techniques that allow ASR systems to rapidly respond to changing acoustic conditions
- While maintaining high levels of recognition accuracy over a wide range of conditions
- And be flexible, so they are applicable to a wide range of tasks and computational requirements
- Project started in January 2008; 3 year duration
- Close collaboration with TREL Cambridge Lab
- Common development code-base: extended HTK
- Common evaluation sets
- Builds on current (and previous) PhD studentships
- Monthly joint meetings
Approach: Model Compensation
- Model compensation schemes are highly effective BUT
- Slow compared to feature compensation schemes
- Need schemes that improve speed while maintaining performance (a sketch of the compensation step follows this list)
- Also automatically detect/track changing noise conditions
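For context, model compensation adjusts the acoustic model parameters to match the noisy environment rather than cleaning the incoming features. Below is a minimal Python sketch of one widely used approximation for the static cepstral means (the log-add approximation), shown only to illustrate why compensating every Gaussian is costly; this is not the R2EAP algorithm and it assumes purely additive noise:

    import numpy as np

    def dct_matrix(n: int) -> np.ndarray:
        """Orthonormal DCT-II matrix, as used to map log filterbank energies to cepstra."""
        C = np.cos(np.pi / n * np.outer(np.arange(n), np.arange(n) + 0.5))
        C *= np.sqrt(2.0 / n)
        C[0] *= np.sqrt(0.5)
        return C

    def log_add_compensate(mu_x: np.ndarray, mu_n: np.ndarray) -> np.ndarray:
        """Compensate a clean static cepstral mean (mu_x) for additive noise
        with cepstral mean mu_n, using the log-add approximation."""
        C = dct_matrix(mu_x.shape[0])
        Cinv = np.linalg.inv(C)            # equals C.T for the orthonormal DCT
        log_x = Cinv @ mu_x                # cepstra -> log filterbank energies
        log_n = Cinv @ mu_n
        # Combine speech and noise in the linear spectral domain, return to cepstra.
        return C @ np.log(np.exp(log_x) + np.exp(log_n))

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        mu_clean, mu_noise = rng.normal(size=24), rng.normal(size=24)
        print(log_add_compensate(mu_clean, mu_noise)[:4])

Applying a compensation like this to every Gaussian in a large-vocabulary system each time the noise changes is what makes model compensation slower than per-frame feature compensation.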
8. Toshiba-CUED PhD Collaborations
- To date, 5 research studentships (partly) funded by Toshiba
- Shared software: code transfer in both directions
- Shared data sets: both (emotional) synthesis and ASR
- 6-monthly reports and review meetings
- Students and topics:
- Hank Liao (2003-2007): Uncertainty Decoding for Noise Robust ASR
- Catherine Breslin (2004-2008): Complementary System Generation and Combination
- Zeynep Inanoglu (2004-2008): Recognition and Synthesis of Emotion
- Rogier van Dalen (2007-2010): Noise Robust ASR
- Stuart Moore (2007-2010): Number Sense Disambiguation
- Very useful and successful collaboration
9. HTK Version 3.0 Development
- HTK is a free software toolkit for developing HMM-based systems
- 1000s of users worldwide
- Widely used for research by universities and industry
Timeline:
- 1989-1992: V1.0 to V1.4 - initial development at CUED
- 1993-1999: V1.5 to V2.3 - commercial development by Entropic
- 2000-date: V3.0 to V3.4 - academic development at CUED
- Development partly funded by Microsoft and the DARPA EARS project
- Primary dissemination route for CU research output
- 2004-date: ATK, a real-time HTK-based recognition system
10. Summary
- Speech Group works on many aspects of speech processing
- Large vocabulary speech recognition
- Statistical machine translation
- Statistical dialogue systems
- Speech synthesis and voice conversion
- Statistical machine learning approach to all applications
- World-wide reputation for research
- CUED systems have defined the state-of-the-art for the past decade
- Developed a number of techniques widely used by industry
- Hidden Markov Model Toolkit (HTK)
- Freely-available software, 1000s of users worldwide
- State-of-the-art features (discriminative training, adaptation, ...)
- HMM synthesis extension (HTS) from Nagoya Institute of Technology