Title: Agenda
1 Agenda
Monday
Tuesday
Wednesday
9:00-12:00 planning next steps (esp. review)
9:00-12:00 ITC, UPC demo, SonyUKA
WP10 Towards Lab/Museum Demo
13:00-15:30 suggestions/discussion on demonstrator type(s) and location(s)
14:00-15:00 project status and administrative items
15:30-18:30 reports on work done (from all sites)
16:00-18:00 discussion / decision on demonstrator type(s) and location(s)
19:00 dinner
2 Status of the Project
3 Deliverable Status (now 23)
Deliverable | Description | Due
9.1 | Catalan Wordnet | 12
5.1 | Software for robust speech recognition and speaker localisation / ID under reverberant/noisy conditions | 24
10.2 | Initial Dialog/Discourse Model Software Component | 15
6.3 | Software to adapt acoustic and language model to current context | 24
6.2 | Conversational speech recognition optimised with data from distant-speaking microphones | 18
7.2 | Description of the interface to the conversation context model program for selecting appropriate mode of presentation | 24
3.2 | Additional Development Test Data Package and update of test-bed software | 24
4.2 | Automatic selection of video presentation based on communication activity | 24
2.2, 10.3 | | 30
7.3, 9.2 | | 32
4.3, 11.2 | | 35
8.2 | | 39
4 Project Review
- probably end of October (week of 27th)
- as part of greater review event together with
other projects - same style as last year
- probably same location as last year
- same reviewers will be asked to do the review
again (Jean-Marc Langé, Christian Wellekens)
5 Options for Demonstrator
Place/Event | Duration | Domain
a) Forum 2004 | 6 weeks | Forum Visitor Service
b) Seminar at ACL | 1-x days | any
c) UPC 2003 | 1-x days | any
d) Museum Grenoble | ? | Museum Visitor Service
e) Museum KA (ZKM) | 1 week | Museum Visitor Service
f) Lab INPG and/or UJF | 1 day | Lab Visitor Service
g) Lab Karlsruhe | 1 day | Lab Visitor Service
h) Lab Sony | 1 day | Lab Visitor Service
preferably Barcelona
6 Options: Pros and Cons
Place/Event, Pro, Con
a) Forum 2004. Pro: Visibility, Contract. Con: Duration, Forum, Data, Cost
b) Seminar at ACL: at Forum, Languages, Duration, Domain (any)
c) UPC 2003. Pro: Languages, Duration, Visibility. Con: Domain, Organisation, Cost
d) Museum Grenoble: Experience, Data
e) Museum KA (ZKM). Pro: Cooperation. Con: Languages
f) g) h) Labs at INPG, UJF, UKA, Sony. Pro: Organisation. Con: Languages, Visibility
7 Personnel at Karlsruhe
- Ivica Rogina will leave by the end of September (due to legislation)
- core team will be
- Petra Gieselmann (Dialog)
- Matthias Wölfel (Speech, Integration)
- Hartwig Holzapfel (Dialog, Integration)
- Tobias Kluge (Room, Integration)
- others
- Alex Waibel
8 OAA Integration Status < May 2003
[Diagram: OAA component integration. Components shown: Voice Input Close, Binaural Head, Janus English Close N-gram, Janus Spanish Close, Camera Man, Close CFG, Janus Cat. Close, Testimony Tracker, Focus of Attention, Augmented Table, Information Retrieval, Dialog, Translation, Room. Legend: available and integrated; available, not integrated; in development; partially done.]
9 OAA Integration
[Diagram: OAA component integration with numeric labels. Components shown: Voice Input Distant, Voice Input Close, Binaural Head, Janus English Distant, Janus English Close N-gram, Janus Spanish Close, Camera Man, Janus Spa. Dist., Close CFG, Janus Cat. Close, Janus Cat. Dist., Testimony Tracker, Focus of Attention, Topic Detection, Augmented Table, Information Retrieval, Dialog, Translation, Room. Numeric labels: 1, 3, 4, 5, 6a, 6b, 7a, 7b, 8, 9a, 9b, 10, 11, 12, 13a, 13b, 14, 15, 17a, 17b, 18. Other labels: Agent, Message, Topics, Testimonies, Data.]
10 OAA Integration Status May 2003
[Diagram: OAA component integration with numeric labels. Components shown: Voice Input Distant, Voice Input Close, Binaural Head, Janus English Distant, Janus English Close N-gram, Janus Spanish Close, Camera Man, Janus Spa. Dist., Close CFG, Janus Cat. Close, Janus Cat. Dist., Testimony Tracker, Focus of Attention, Topic Detection, Augmented Table, Information Retrieval, Dialog, Translation, Room. Numeric labels: 1, 3, 4, 5, 6a, 6b, 7a, 7b, 8, 9a, 9b, 10, 11, 12, 13a, 13b, 14, 15, 17a, 17b, 18. Other labels: Topics, Testimonies. Legend: available and integrated; available, not integrated; in development; partially done; planned.]
11 OAA Integration Status Sept. 2003
[Diagram: OAA component integration with numeric labels. Components shown: Voice Input Distant, Voice Input Close, Binaural Head, Janus English Distant, Janus English Close N-gram, Janus Spanish Close, Camera Man, Janus Spa. Dist., Close CFG, Janus Cat. Close, Janus Cat. Dist., Testimony Tracker, Focus of Attention, Topic Detection, Augmented Table, Information Retrieval, Dialog, Translation, Room. Numeric labels: 1, 3, 4, 5, 6a, 6b, 7a, 7b, 8, 9a, 9b, 10, 11, 12, 13a, 13b, 14, 15, 17a, 17b, 18. Other labels: Topics, Testimonies. Legend: available and integrated; available, not integrated; in development; partially done; planned.]
12 Topic Spotting: Experimental Environment
[Diagram labels: Audio Stream, Segmenter, Distant English Recogniser, Topic Detector, OAA; distance 1-1½ m]
- seven topics (from Switchboard evaluation) were offered
- fully spontaneous multi-human conversation
13 Speech Recognizer
- As presented at ARPA RT-03 Workshop, Boston
- trained on 265h Switchboard and Call-Home-English data
- no model adaptation (no MLLR or VTLN)
- linear feature space adaptation, semi-tied full covariances
- fully continuous models (10k codebooks, 50k mixtures)
- vocabulary: 41k from Switchboard, Broadcast News, CNN
- language model: 3-gram SWB, 5-gram class SWB, 4-gram BN, 4-gram CNN
- performance on SWB evaluation 2003: 23.4% WER
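For readers less familiar with the WER figure quoted above, here is a minimal sketch of how word error rate is commonly computed: word-level edit distance divided by the number of reference words. The function name and the two example sentences are illustrative assumptions, not taken from the evaluation.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(prev[j] + 1,          # deletion of a reference word
                          curr[j - 1] + 1,      # insertion of a hypothesis word
                          prev[j - 1] + cost)   # match or substitution
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up sentences for illustration only.
    ref = "i watched the matrix last saturday"
    hyp = "i watch matrix last tuesday"
    print(f"WER: {word_error_rate(ref, hyp):.1%}")
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.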
14 First Experimental Results
correctly recognized / incorrectly inserted
lassie movies movie pavillion people probably
volume old stuff try system see puts lassie pets
movies nice dinosaurs count pets permit movies
source ten talk years cut taxes cinema watching
matrix mexico matrix old movie rather drugs two
matrix metrics two came year movie theater
tuesday promotion mad fold matrix three saturday
watch wheel rider little legalize bit bizarre
matrix computer game screen saver played
played mall seen sprint playing game play like
movie screen thing first rest person shooter
right three around kill all exactly smoking guns
right actually read statistics many germans
jumpes smoke almost forty percent germans jobs
smoke decreasing days cigarette tax went idea
suppose case especially kids girls still
human percentage men higher among olders mobiles
switch topics terrorists smoke persons world
trade center towers technicalities looked like
smoking cigarettes sponsored stuff lucky strike
trying funny show terrorism alzheimers
politically incorrect topic exhibit like pictures
exhibitions parallels construct warhol
museum mothers desctruction cook photograps
paintings certainly paintings warhol warhol topic
talking health help fitness exercise regularly
went going gym right true actually playing
volleyball really healthy fitness fun fun phone
fitness studio swimming sitting thomas schaaf
sharks around show sure education
depiction computers education asking question
computers harm morning improve education heard
c.m.u. developed system helps children read truth
good really fun learn lessons read specifically
insist agree american style reading claim tested
high schools three old children high school
children johanna oceans never sure right probably
testing see kindergarten kindergarten right
remember legitimizing ended guy wall arpa
awful conference coincidence said end america
like five percent analphabets cause closed damage
like means hundred million dollars guys per year
educational program reduce beans used damage
hundred billion dollars probably hundred swimming
pool internet like anybody hungry restaurant know
couple restaurants days italian need chinese
mensa mensa already morning italian restaurant
sparkling chinese better mexican food quite right
requirement true two weeks really read frequent
good spicy spicy
15 Statistics on First Experiment
Test set: 729 words (plus many noises)
Content words: 262 words
Topic indicators: 124 words
Correctly recognised: 54 content words (20.6%)
Missed: 208 content words (79.4%)
Inserted: 56 content words
Correctly recognised: 29 topic indicators (23.4%)
Missed: 95 topic indicators (76.6%)
Inserted: 9 topic indicators
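The percentages on this slide can be rechecked from the raw counts. The sketch below does that and also derives a precision figure, which is not on the slide: it assumes every inserted word counts as a false positive. The names `rate` and `summarise` are illustrative.

```python
def rate(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Counts as reported on the slide.
content_words = {"correct": 54, "missed": 208, "inserted": 56}
topic_indicators = {"correct": 29, "missed": 95, "inserted": 9}


def summarise(counts):
    reference_total = counts["correct"] + counts["missed"]
    return {
        "reference_total": reference_total,
        "recognised_pct": rate(counts["correct"], reference_total),
        "missed_pct": rate(counts["missed"], reference_total),
        # Assumption: precision = correct / (correct + inserted); not reported on the slide.
        "precision_pct": rate(counts["correct"], counts["correct"] + counts["inserted"]),
    }


if __name__ == "__main__":
    print("content words   :", summarise(content_words))
    print("topic indicators:", summarise(topic_indicators))
```

The totals (262 and 124) and the recognised/missed percentages match the slide. Under the stated assumption, topic indicators are rarely inserted falsely, even though most of them are missed.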
16 Problems Identified
- wider range of signal energy => difficulties with segmentation of speech
- some utterances were missed completely
- laughter and other noises of close people can have much more power than actual speech from distant people => difficulties with signal adaptation
- so far very poor recognition accuracy
- when talking about topic A, many triggers for topic B are uttered: smoking → (cigarette) tax, Lassie movies → pets, etc.
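The last problem can be illustrated with a simple trigger-word voting scheme. This is a sketch only: the slides do not describe how the topic detector works internally, and the topic names and trigger lists below are hypothetical.

```python
from collections import Counter

# Hypothetical trigger lists; the actual detector's word lists are not given.
TRIGGERS = {
    "movies": {"movie", "movies", "cinema", "lassie", "matrix"},
    "pets": {"pets", "dog", "lassie"},
    "smoking": {"smoke", "smoking", "cigarette", "cigarettes"},
    "taxes": {"tax", "taxes"},
}


def topic_votes(words):
    """Count one vote per recognised word for every topic that lists it as a trigger."""
    votes = Counter()
    for word in words:
        for topic, triggers in TRIGGERS.items():
            if word in triggers:
                votes[topic] += 1
    return votes


if __name__ == "__main__":
    # A stretch of conversation about smoking that also mentions a tax.
    segment = "cigarette tax went smoke decreasing".split()
    print(topic_votes(segment).most_common())
```

With a recognition rate as low as the one reported, a single stray trigger such as "tax" or "lassie" can easily outweigh the few correctly recognised words of the actual topic.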
17 Room for Improvements
- statistics on phrases
- WordNet: create synonyms
- tf-idf statistics on available topics
- quality of speech recognizer (model and signal adaptation)
- adaptive speech segmenter
- improved signal quality (distant signal, binaural processing)
- adapt vocabulary and language model to set of topics
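As one reading of the tf-idf item above, the sketch below weights each word by how frequent it is within a topic and how few topics contain it. It is a minimal illustration with made-up word lists, and the function name `tfidf_by_topic` is an assumption, not part of the project software.

```python
import math
from collections import Counter


def tfidf_by_topic(topic_docs):
    """topic_docs maps topic -> list of words; returns topic -> {word: tf-idf weight}."""
    n_topics = len(topic_docs)
    doc_freq = Counter()
    for words in topic_docs.values():
        doc_freq.update(set(words))  # count each word once per topic
    weights = {}
    for topic, words in topic_docs.items():
        counts = Counter(words)
        total = len(words)
        weights[topic] = {
            word: (count / total) * math.log(n_topics / doc_freq[word])
            for word, count in counts.items()
        }
    return weights


if __name__ == "__main__":
    # Toy data for illustration only.
    docs = {
        "movies": "like lassie movie matrix movie cinema".split(),
        "pets": "like lassie pets dog pets".split(),
        "smoking": "like smoke cigarette tax smoke".split(),
        "taxes": "like tax cut taxes tax".split(),
    }
    for topic, scores in tfidf_by_topic(docs).items():
        best = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:2]
        print(topic, best)
```

Words shared by several topics ("lassie", "tax") receive lower weights than topic-specific ones, and a word that occurs in every topic gets weight zero, which may help with the cross-topic trigger problem.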
18 Improvements to the Lecture Tracker
- optimised speed to 0.2 × realtime (P4 2GHz)
- optimised automatic segmenter (min/max segment sizes)
- added laser-pointer interaction functionality
- stable pointing (projected finger)
- highlighting of selected presentation items
- drawing into presentation
- operating pull-down menus within presentation
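The slide does not say how the segmenter applies its min/max segment sizes. One possible approach, sketched here purely as an assumption, merges segments shorter than the minimum into their predecessor and cuts segments longer than the maximum into fixed-size pieces. The function name and the thresholds are hypothetical.

```python
def constrain_segments(segments, min_len, max_len):
    """segments: time-ordered list of (start, end) in seconds.

    Short segments are merged into the preceding one; long segments are
    split into pieces of at most max_len seconds.
    """
    merged = []
    for start, end in segments:
        if merged and (end - start) < min_len:
            prev_start, _ = merged[-1]
            merged[-1] = (prev_start, end)  # absorb short segment (and the gap before it)
        else:
            merged.append((start, end))

    result = []
    for start, end in merged:
        while end - start > max_len:
            result.append((start, start + max_len))
            start += max_len
        result.append((start, end))
    return result


if __name__ == "__main__":
    raw = [(0.0, 4.0), (4.5, 5.0), (6.0, 30.0)]
    print(constrain_segments(raw, min_len=1.0, max_len=10.0))
```

This simple version leaves a short first segment untouched, and the last piece of a split segment may fall below the minimum. A real segmenter would more likely cut at pauses than at fixed offsets.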
19 The ISL Lecture-Talk-Presentation Corpus
41 lectures, talks, presentations (29 hours of recorded audio)
- recorded with lavaliere close microphone
- plus video if possible
- plus slides if available
- Transcription on word level, with disfluencies and spontaneous events
- Topics: Speech Technology (ASR, IR, NLP, TTS, Multimodality)
- Native speakers of English
- Non-native speakers of English with accent
20 Lecture-Talk-Presentation Corpus
7 Lectures: CMU faculty; ave duration 73 min, stdev 9 min
23 Presentations: student presentations; ave duration 29 min, stdev 19 min
11 Talks: guest speakers and invited talks; ave duration 52 min, stdev 20 min
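As a quick consistency check, the per-category counts and average durations above can be combined and compared with the corpus totals quoted earlier (41 recordings, 29 hours). The variable names are illustrative.

```python
# (category, number of recordings, average duration in minutes), as listed on the slide
corpus = [
    ("Lectures", 7, 73),
    ("Presentations", 23, 29),
    ("Talks", 11, 52),
]

total_items = sum(count for _, count, _ in corpus)
total_minutes = sum(count * avg_min for _, count, avg_min in corpus)
total_hours = total_minutes / 60

if __name__ == "__main__":
    print(f"{total_items} recordings, {total_minutes} min = {total_hours:.1f} h")
```

The totals come to 41 recordings and roughly 29 hours, in line with the corpus overview.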
21 Lecture-Talk-Presentation Corpus
- Transcription: first pass, second pass, final check
- segmented in turns or logical paragraphs
- contributions from the audience are transcribed as long as they were understandable
- transcription conventions similar to Verbmobil conventions
Susi Burger
Send us your data: CMU/ISL offers to transcribe English data recorded at partner locations