Title: Can speech technology be useful for people with dysarthria?

1 Can speech technology be useful for people with dysarthria? Speech technology & pathology
- Helmer Strik
- Language & Speech
- Dept. of Linguistics
- Radboud University
- Nijmegen
2 Outline
- Speech technology & pathology
- Applications: existing & possible
- In practice
- Target groups
- Speech technology & dysarthria
- Introduction
- Speech recognition for dysarthric speech
- Conclusions
3 Applications
- AAC (Augmentative & Alternative Communication)
- Improve communication
- Interactive tools
- Training, reading, listening
- Assessment
- Diagnosis, monitoring
- Therapy
4 AAC
- Speaking problems
- Speech generation
- Speech manipulation
- Speech recognition (of the disabled speaker)
- Output (text, speech, talking head, etc.)
- Hearing problems
- Hearing aids, cochlear implants, etc.
- Speech recognition (of others)
- Output (text, sign language, talking head, etc.)
5 ASR & output channel
[Diagram: ASR → text → speech synthesis]
6 Interactive tools
- Speech generation
- Reading tools: screen readers, reading pen, text processors, etc.
- Writing tools: word prediction, TTS, (dedicated) spell checking
- Analysis, manipulation, training
- Delayed Auditory Feedback (DAF) and Frequency Altered Feedback (FAF), for stutterers
- CAFET: Computer-Aided Fluency Establishment Training
- CAPT: Computer Assisted Pronunciation Training
7 Delayed Auditory Feedback (DAF) & Frequency Altered Feedback (FAF)
8 Assessment, therapy
- Assessment: diagnosis, monitoring
- Therapy
- Clinical setting, with expert
- Speech analysis: visualization, categorization, etc.
- IBM Speech Viewer
- Research
9 Applications
- The number of applications differs (from most to fewest):
- speech generation
- speech analysis, manipulation, etc.
- speech recognition
10 In practice
- Many existing applications
- Many more are possible
- However, relatively little use
- Why?
11 In practice
- However, relatively little use. Why?
- Needed:
- Tailor-made, flexible applications
- Tailor-made: taking into account the capabilities & desires of the user & environment
- Flexible: the capabilities & desires often change
- More user tests & adequacy evaluation
- instead of technology improvement & performance evaluation
12 Target groups
- International Classification of Functioning, Disability and Health (ICF)
- Mental functions: aphasia, dyslexia, mental disabilities
- Sensory functions: blindness, deafness, both
- Voice & speech functions: dysarthria, anarthria, mutism, stuttering
- Motorial functions: dyspraxia, apraxia, RSI / UEMSD (Upper Extremity Musculoskeletal Disorders)
13 Speech technology & dysarthria
- Dysarthria: a speech disorder caused by dysfunctioning of nerves and muscles
- There are many different kinds of dysarthria
14 Can speech technology be useful for people with dysarthria?
- Yes!
- AAC
- Interactive tools
- Assessment
- Therapy
15 Can speech technology be useful for people with dysarthria?
- Speech generation
- Prefer voice similar to their (old) voice
- Preferably own voice
- AAC
- Manipulation
- Speech recognition & output channel
- Pronunciation training
- Speech recognition, analysis, feedback, etc.
16 Speech technology & dysarthria: ASR for dysarthric speech
- Questions:
- How well can dysarthric speech be recognized by a standard (non-dysarthric) speech recognizer?
- Will the recognition results improve if we train the recognizer on speech of dysarthric speakers?
17 Experimental setup: Speakers
- Dysarthric: 2 Dutch males, DYS1 & DYS2
- Reference: 2 Dutch males, REF1 & REF2
- Total duration of the speech material (minutes)
- DYS2 speaks more slowly

       DYS1      DYS2       REF1      REF2
       8.5 min.  12.8 min.  9.1 min.  7.9 min.
18 Experimental setup: Speech tasks
- All four speakers read the same list of items, consisting of four different tasks:
- 1. NUM: numbers 0-12, spoken in isolation
- 2. PFU: from Polyphone, the 50 most Frequent Utterances
- 3. PMS: 130 Plomp-Mimpen Sentences (semantically unpredictable)
- 4. PRS: 10 Phonetically Rich Sentences
19 Experimental setup: Speech tasks
- Number of utterances & words per task
- The NUM and PRS tasks were both read three times.

       NUM  PFU  PMS  PRS
utt.   39   50   130  30
words  39   91   809  336
20 Experimental setup: Speech recognizer
- General specifications
- Standard phone-based recognizer
- 37 context-independent phones
- 3-state HMMs
- 14 cepstral coefficients + deltas, from a mel filterbank (350-3400 Hz)
- 16 ms Hamming window, 10 ms step
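The front end above can be sketched roughly as follows. This is an illustrative reconstruction, not the original code: the sample rate (8 kHz telephone speech), the FFT size and the number of mel filters (24) are assumptions, and the delta coefficients are omitted.

```python
import numpy as np

def melbank_cepstra(signal, sr=8000, n_ceps=14, n_filters=24,
                    fmin=350.0, fmax=3400.0, win_ms=16, step_ms=10):
    """Sketch of the described front end: 16 ms Hamming windows with a 10 ms
    step, a mel filterbank restricted to 350-3400 Hz, log filterbank energies,
    and a DCT yielding 14 cepstral coefficients per frame.
    (sr, n_filters and the FFT size are illustrative assumptions.)"""
    win, step = int(sr * win_ms / 1000), int(sr * step_ms / 1000)
    n_fft = 256
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv_mel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    # Triangular filters equally spaced on the mel scale between fmin and fmax
    edges = inv_mel(np.linspace(mel(fmin), mel(fmax), n_filters + 2))
    bins = np.floor((n_fft + 1) * edges / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    # Frame the signal and apply the Hamming window
    frames = np.array([signal[s:s + win] * np.hamming(win)
                       for s in range(0, len(signal) - win + 1, step)])
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    logmel = np.log(power @ fbank.T + 1e-10)
    # DCT-II over filterbank channels -> cepstral coefficients
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (n + 0.5)) / n_filters)
    return logmel @ dct.T  # shape: (number of frames, n_ceps)
```

In a real system, deltas would be appended per frame by differencing these coefficients over neighbouring frames.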
21 Experimental setup: Experiments
- Lexicon & language model (uni- and bigram)
- Based on all words in the 4 tasks
- Task-specific, the same for all speakers
- Perplexity per task:

NUM  PFU  PMS  PRS
13   15   8    2
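Perplexity measures how predictable the next word is under the language model (lower means an easier recognition task, which is why the tight task-specific models help). A minimal sketch for an unsmoothed maximum-likelihood bigram model, usable here because the lexicon and LM are built from the same closed word list:

```python
import math
from collections import Counter

def bigram_perplexity(train_sents, test_sents):
    """Perplexity of a maximum-likelihood bigram model (no smoothing,
    so every test bigram must also occur in the training sentences)."""
    bigrams, contexts = Counter(), Counter()
    for sent in train_sents:
        words = ["<s>"] + sent.split()
        contexts.update(words[:-1])                 # count each context word
        bigrams.update(zip(words[:-1], words[1:]))  # count word pairs
    log_prob, n = 0.0, 0
    for sent in test_sents:
        words = ["<s>"] + sent.split()
        for prev, cur in zip(words[:-1], words[1:]):
            log_prob += math.log(bigrams[(prev, cur)] / contexts[prev])
            n += 1
    return math.exp(-log_prob / n)
```

A fully predictable text has perplexity 1; a task like NUM, where any of the 13 numbers can follow the start of an utterance, lands near 13.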
22 Experimental setup: Speaker Independent & Dependent
- SI (Speaker Independent) training material:
- Polyphone (5000-speaker Dutch telephone database)
- 4022 connected digit strings
- 3702 Polyphone most frequent items
- 20,110 phonetically rich sentences
- SD (Speaker Dependent) training material:
- the speaker's own speech
23 Speaker Independent (SI): Results
[Figure: Word Error Rates (WERs) for SI recognition]
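WER, the metric used throughout these results, is the word-level edit distance (substitutions + insertions + deletions) divided by the number of reference words. A minimal sketch:

```python
def wer(reference, hypothesis):
    """Word Error Rate: Levenshtein distance over words, divided by the
    number of words in the reference transcription."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, recognizing "the cat sat" as "the hat" counts one substitution and one deletion: WER = 2/3.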
24 Speaker Independent (SI): Conclusions
- REF better than DYS
- DYS1 better than DYS2 in short utterances, because of speaking rate (Table 1)
- Results for DYS quite reasonable (especially for sentences), because of the tight language model
25 Speaker Dependent (SD)
- Models (also) trained on the speakers' own speech
- Jackknife procedure: a semi-randomly selected test set; the remainder forms the training set
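The jackknife procedure can be sketched as rotating held-out folds over one speaker's utterances; the fold count and random seed below are illustrative assumptions:

```python
import random

def jackknife_splits(utterances, n_folds=5, seed=0):
    """Shuffle a speaker's utterances into n_folds folds; each fold serves
    once as the test set while the remaining folds form the training set."""
    utts = list(utterances)
    random.Random(seed).shuffle(utts)          # "semi-random" selection
    folds = [utts[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test = folds[k]
        train = [u for i, f in enumerate(folds) if i != k for u in f]
        yield train, test
```

This way every utterance is tested exactly once without ever appearing in its own training set, which matters when only 8-13 minutes of speech per speaker are available.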
26 Speaker Dependent (SD): Results
- Word Error Rates (WERs) for the whole test set
- for different numbers of Gaussians (2^N)

2^N    0     2     4     8     16    32    64
DYS1   14.3  12.0  9.5   9.7   10.3  11.7  15.1
DYS2   7.5   4.1   2.9   2.4   3.0   3.8   5.3
REF1   3.4   2.2   1.8   2.6   3.5   4.0   4.2
REF2   3.6   2.4   2.8   3.0   3.3   3.9   4.4
27 Speaker Dependent (SD): Results
[Figure: Word Error Rates (WERs) for SD recognition]
28 Speaker Dependent (SD): Results
- Word Error Rates (WERs) for SD / SI recognition

       DYS1         DYS2         REF1        REF2
NUM    2.6 / 15.4   0.0 / 41.0   0.0 / 0.0   0.0 / 0.0
PFU    9.9 / 19.8   5.5 / 22.0   1.1 / 1.1   2.2 / 1.1
PMS    12.2 / 30.3  3.3 / 15.2   2.2 / 2.1   3.6 / 1.7
PRS    3.6 / 7.4    1.5 / 4.5    1.2 / 1.2   1.2 / 0.0
29 Speaker Dependent (SD): Conclusions
- For REF, SD results are equal to or worse than SI results (a trade-off: the models match the speaker, but there is less training material)
- For DYS, SD results are much better than SI results
- DYS2 better than DYS1, almost as good as REF
30 Conclusions: ASR for dysarthric speech
- Results for DYS2 are remarkable
- SI: high WERs, especially for NUM & PFU
- SD: sometimes better than REF
- Low speaking rate!
- Automatic recognition of dysarthric speech is possible. Better results with:
- a lower speaking rate
- speaker-dependent models
- Even better: also a speaker-dependent lexicon
31 Conclusions: Speech technology & pathology
- Applications
- Many already exist
- Many more are possible
- Needed:
- Tailor-made, flexible applications
- User tests, adequacy evaluation
32 References
- http://lands.let.ru.nl/TSpublic/strik/pres/p97-SPACE.ppt
- E. Sanders, M. Ruiter, L. Beijer & H. Strik (2002). Automatic recognition of Dutch dysarthric speech: a pilot study. Proc. ICSLP-2002, Denver, USA, pp. 661-664.
- T. Rietveld & I. Stolte (2005). Taal- en spraaktechnologie en communicatieve beperkingen [Language and speech technology and communicative impairments].
33 END