Title: Can speech technology be useful for people with dysarthria?

1 Can speech technology be useful for people with dysarthria? Speech technology & pathology
- Helmer Strik
- Language & Speech
- Dept. of Linguistics
- Radboud University
- Nijmegen
2 Outline
- Speech technology & pathology
- Applications: existing & possible
- In practice
- Target groups
- Speech technology & dysarthria
- Introduction
- Speech recognition for dysarthric speech
- Conclusions
3 Applications
- AAC (Augmentative & Alternative Communication)
- Improve communication
- Interactive tools
- Training, reading, listening
- Assessment
- Diagnosis, monitoring
- Therapy
4 AAC
- Speaking problems
- Speech generation
- Speech manipulation
- Speech recognition (of the disabled speaker)
- Output (text, speech, talking head, etc.)
- Hearing problems
- Hearing aids, cochlear implants, etc.
- Speech recognition (of others)
- Output (text, sign language, talking head, etc.)
5 ASR & output channel
[Diagram: ASR → text → speech synthesis]
6 Interactive tools
- Speech generation
- Reading tools: screen readers, reading pen, text processors, etc.
- Writing tools: word prediction, TTS, (dedicated) spell checking
- Analysis, manipulation, training
- Delayed Auditory Feedback (DAF) and Frequency Altered Feedback (FAF), for stutterers
- CAFET: Computer-Aided Fluency Establishment Training
- CAPT: Computer Assisted Pronunciation Training
7 Delayed Auditory Feedback (DAF) & Frequency Altered Feedback (FAF)
8 Assessment, therapy
- Assessment: diagnosis, monitoring
- Therapy
- Clinical setting, with expert
- Speech analysis: visualization, categorization, etc.
- IBM Speech Viewer
- Research
9 Applications
- The number of applications differs (from most to fewest):
- speech generation
- speech analysis, manipulation, etc.
- speech recognition
10 In practice
- Many existing applications
- Many more are possible
- However, relatively little use
- Why?
11 In practice
- However, relatively little use. Why?
- Needed:
- Tailor-made, flexible applications
- Tailor-made: taking into account the capabilities & desires of the user & environment
- Flexible: the capabilities & desires often change
- More user tests & adequacy evaluation
- instead of technology improvement & performance evaluation
12 Target groups
- International Classification of Functioning, Disability and Health (ICF)
- Mental functions: aphasia, dyslexia, mental disabilities
- Sensory functions: blindness, deafness, both
- Voice & speech functions: dysarthria, anarthria, mutism, stuttering
- Motorial functions: dyspraxia, apraxia, RSI / UEMSD (Upper Extremity Musculoskeletal Disorders)
13 Speech technology & dysarthria
- Dysarthria: a speech disorder caused by dysfunctioning of nerves and muscles
- There are many different kinds of dysarthria
14 Can speech technology be useful for people with dysarthria?
- Yes!
- AAC
- Interactive tools
- Assessment
- Therapy
15 Can speech technology be useful for people with dysarthria?
- Speech generation
- Prefer voice similar to their (old) voice
- Preferably own voice
- AAC
- Manipulation
- Speech recognition & output channel
- Pronunciation training
- Speech recognition, analysis, feedback, etc.
16 Speech technology & dysarthria: ASR for dysarthric speech
- Questions:
- How well can dysarthric speech be recognized by a standard (non-dysarthric) speech recognizer?
- Will the recognition results improve if we train the recognizer on speech of dysarthric speakers?
17 Experimental setup: Speakers
- Dysarthric: 2 Dutch males, DYS1 & DYS2
- Reference: 2 Dutch males, REF1 & REF2
- Total duration of the speech material (minutes)
- DYS2 speaks more slowly

       DYS1      DYS2       REF1      REF2
       8.5 min.  12.8 min.  9.1 min.  7.9 min.
18 Experimental setup: Speech tasks
- All four speakers read the same list of items, consisting of four different tasks:
- 1. NUM: numbers 0-12, spoken in isolation
- 2. PFU: from Polyphone, the 50 most Frequent Utterances
- 3. PMS: 130 Plomp-Mimpen Sentences (semantically unpredictable)
- 4. PRS: 10 Phonetically Rich Sentences
19 Experimental setup: Speech tasks
- Number of utterances & words per task
- The NUM and PRS tasks were both read three times.

       NUM  PFU  PMS  PRS
utt.   39   50   130  30
words  39   91   809  336
20 Experimental setup: Speech recognizer
- General specifications
- Standard phone-based recognizer
- 37 context-independent phones
- 3-state HMMs
- 14 cepstral coefficients + deltas, from a mel filterbank (350-3400 Hz)
- 16 ms Hamming window, 10 ms step
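The front end above can be sketched roughly as follows. This is an illustrative reconstruction, not the original code: the sample rate (8 kHz telephone speech), the FFT size and the number of mel filters (24) are assumptions, and the delta coefficients are omitted.

```python
import numpy as np

def melbank_cepstra(signal, sr=8000, n_ceps=14, n_filters=24,
                    fmin=350.0, fmax=3400.0, win_ms=16, step_ms=10):
    """Sketch of the described front end: 16 ms Hamming windows with a 10 ms
    step, a mel filterbank restricted to 350-3400 Hz, log filterbank energies,
    and a DCT yielding 14 cepstral coefficients per frame.
    (sr, n_filters and the FFT size are illustrative assumptions.)"""
    win, step = int(sr * win_ms / 1000), int(sr * step_ms / 1000)
    n_fft = 256
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv_mel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    # Triangular filters equally spaced on the mel scale between fmin and fmax
    edges = inv_mel(np.linspace(mel(fmin), mel(fmax), n_filters + 2))
    bins = np.floor((n_fft + 1) * edges / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    # Frame the signal and apply the Hamming window
    frames = np.array([signal[s:s + win] * np.hamming(win)
                       for s in range(0, len(signal) - win + 1, step)])
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    logmel = np.log(power @ fbank.T + 1e-10)
    # DCT-II over filterbank channels -> cepstral coefficients
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (n + 0.5)) / n_filters)
    return logmel @ dct.T  # shape: (number of frames, n_ceps)
```

In a real system, deltas would be appended per frame by differencing these coefficients over neighbouring frames.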
21 Experimental setup: Experiments
- Lexicon & language model (uni- and bigram)
- Based on all words in the 4 tasks
- Task-specific, the same for all speakers
- Perplexity per task:

NUM  PFU  PMS  PRS
13   15   8    2
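Perplexity measures how predictable the next word is under the language model (lower means an easier recognition task, which is why the tight task-specific models help). A minimal sketch for an unsmoothed maximum-likelihood bigram model, usable here because the lexicon and LM are built from the same closed word list:

```python
import math
from collections import Counter

def bigram_perplexity(train_sents, test_sents):
    """Perplexity of a maximum-likelihood bigram model (no smoothing,
    so every test bigram must also occur in the training sentences)."""
    bigrams, contexts = Counter(), Counter()
    for sent in train_sents:
        words = ["<s>"] + sent.split()
        contexts.update(words[:-1])                 # count each context word
        bigrams.update(zip(words[:-1], words[1:]))  # count word pairs
    log_prob, n = 0.0, 0
    for sent in test_sents:
        words = ["<s>"] + sent.split()
        for prev, cur in zip(words[:-1], words[1:]):
            log_prob += math.log(bigrams[(prev, cur)] / contexts[prev])
            n += 1
    return math.exp(-log_prob / n)
```

A fully predictable text has perplexity 1; a task like NUM, where any of the 13 numbers can follow the start of an utterance, lands near 13.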
22 Experimental setup: Speaker Independent & Dependent
- SI (Speaker Independent) training material:
- Polyphone (5000-speaker Dutch telephone database)
- 4022 connected digit strings
- 3702 Polyphone most frequent items
- 20,110 phonetically rich sentences
- SD (Speaker Dependent) training material:
- the speaker's own speech
23 Speaker Independent (SI): Results
[Figure: Word Error Rates (WERs) for SI recognition]
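WER, the metric used throughout these results, is the word-level edit distance (substitutions + insertions + deletions) divided by the number of reference words. A minimal sketch:

```python
def wer(reference, hypothesis):
    """Word Error Rate: Levenshtein distance over words, divided by the
    number of words in the reference transcription."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, recognizing "the cat sat" as "the hat" counts one substitution and one deletion: WER = 2/3.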
24 Speaker Independent (SI): Conclusions
- REF better than DYS
- DYS1 better than DYS2 in short utterances, because of speaking rate (Table 1)
- Results for DYS quite reasonable (especially for sentences), because of the tight language model
25 Speaker Dependent (SD)
- Models (also) trained on the speakers' own speech
- Jackknife procedure: a semi-randomly selected test set; the remainder forms the training set
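The jackknife procedure can be sketched as rotating held-out folds over one speaker's utterances; the fold count and random seed below are illustrative assumptions:

```python
import random

def jackknife_splits(utterances, n_folds=5, seed=0):
    """Shuffle a speaker's utterances into n_folds folds; each fold serves
    once as the test set while the remaining folds form the training set."""
    utts = list(utterances)
    random.Random(seed).shuffle(utts)          # "semi-random" selection
    folds = [utts[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test = folds[k]
        train = [u for i, f in enumerate(folds) if i != k for u in f]
        yield train, test
```

This way every utterance is tested exactly once without ever appearing in its own training set, which matters when only 8-13 minutes of speech per speaker are available.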
26 Speaker Dependent (SD): Results
- Word Error Rates (WERs) for the whole test set
- for different numbers of Gaussians (2^N)

2^N    0     2     4     8     16    32    64
DYS1   14.3  12.0  9.5   9.7   10.3  11.7  15.1
DYS2   7.5   4.1   2.9   2.4   3.0   3.8   5.3
REF1   3.4   2.2   1.8   2.6   3.5   4.0   4.2
REF2   3.6   2.4   2.8   3.0   3.3   3.9   4.4
27 Speaker Dependent (SD): Results
[Figure: Word Error Rates (WERs) for SD recognition]
28 Speaker Dependent (SD): Results
- Word Error Rates (WERs) for SD / SI recognition

       DYS1         DYS2         REF1        REF2
NUM    2.6 / 15.4   0.0 / 41.0   0.0 / 0.0   0.0 / 0.0
PFU    9.9 / 19.8   5.5 / 22.0   1.1 / 1.1   2.2 / 1.1
PMS    12.2 / 30.3  3.3 / 15.2   2.2 / 2.1   3.6 / 1.7
PRS    3.6 / 7.4    1.5 / 4.5    1.2 / 1.2   1.2 / 0.0
29 Speaker Dependent (SD): Conclusions
- For REF, SD results are equal to or worse than SI results (a trade-off: the models match the speaker, but there is less training material)
- For DYS, SD results are much better than SI results
- DYS2 better than DYS1, almost as good as REF
30 Conclusions: ASR for dysarthric speech
- Results for DYS2 are remarkable
- SI: high WERs, especially for NUM & PFU
- SD: sometimes better than REF
- Low speaking rate!
- Automatic recognition of dysarthric speech is possible. Better results with:
- a lower speaking rate
- speaker-dependent models
- Even better: also a speaker-dependent lexicon
31 Conclusions: Speech technology & pathology
- Applications
- Many already exist
- Many more are possible
- Needed:
- Tailor-made, flexible applications
- User tests, adequacy evaluation
32 References
- http://lands.let.ru.nl/TSpublic/strik/pres/p97-SPACE.ppt
- E. Sanders, M. Ruiter, L. Beijer & H. Strik (2002). Automatic recognition of Dutch dysarthric speech: a pilot study. Proc. ICSLP-2002, Denver, USA, pp. 661-664.
- T. Rietveld & I. Stolte (2005). Taal- en spraaktechnologie en communicatieve beperkingen [Language and speech technology and communicative impairments].
33 END