Can speech technology be useful for people with dysarthria? Speech technology & pathology

Transcript and Presenter's Notes
1
Can speech technology be useful for people with
dysarthria? Speech technology & pathology
  • Helmer Strik
  • Language & Speech
  • Dept. of Linguistics
  • Radboud University
  • Nijmegen

2
Outline
  • Speech technology & pathology
  • Applications: existing and possible
  • In practice
  • Target groups
  • Speech technology & dysarthria
  • Introduction
  • Speech recognition for dysarthric speech
  • Conclusions

3
Applications
  • AAC (Augmentative & Alternative Communication)
  • Improve communication
  • Interactive tools
  • Training, reading, listening
  • Assessment
  • Diagnosis, monitoring
  • Therapy

4
AAC
  • Speaking problems
  • Speech generation
  • Speech manipulation
  • Speech recognition (of the user's own speech)
  • Output: text, speech, talking head, etc.
  • Hearing problems
  • Hearing aids, cochlear implants, etc.
  • Speech recognition (of others)
  • Output: text, sign language, talking head, etc.

5
ASR output channel
 
[Diagram: speech input → ASR → text → speech synthesis]
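The diagram above can be read as a small pipeline: the user's speech goes into a recognizer, and the resulting text is displayed and/or passed on to a synthesizer. Below is a minimal sketch of that flow; recognize() and synthesize() are hypothetical placeholders for whatever ASR and TTS engines would be used, not components named in the presentation.

```python
def recognize(audio_samples):
    """Hypothetical ASR call: returns the recognized text."""
    raise NotImplementedError  # plug in an actual recognizer here

def synthesize(text):
    """Hypothetical TTS call: returns audio for the given text."""
    raise NotImplementedError  # plug in an actual synthesizer here

def aac_output_channel(audio_samples):
    # Step 1: ASR turns the speaker's utterance into text.
    text = recognize(audio_samples)
    # Step 2: the text itself is one possible output (e.g. on screen) ...
    print(text)
    # Step 3: ... or it is passed to speech synthesis as a clearer spoken output.
    return synthesize(text)
```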
6
Interactive tools
  • Speech generation
  • Reading tools: screen readers, reading pens, text
    processors, etc.
  • Writing tools: word prediction, TTS, (dedicated)
    spell checking
  • Analysis, manipulation, training
  • Delayed Auditory Feedback (DAF) and Frequency
    Altered Feedback (FAF), for people who stutter
  • CAFET: Computer-Aided Fluency Establishment
    Training
  • CAPT: Computer-Assisted Pronunciation Training

7
Delayed Auditory Feedback (DAF) & Frequency Altered
Feedback (FAF)
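As a rough illustration of the two feedback techniques named on this slide, the sketch below delays a signal (DAF) and shifts its frequencies by crude resampling (FAF). The 100 ms default delay and the resampling-based frequency shift are illustrative assumptions, not the parameters of any particular device.

```python
import numpy as np

def delayed_auditory_feedback(samples, sample_rate, delay_ms=100.0):
    """Return the input signal delayed by delay_ms milliseconds
    (zero-padded at the start), i.e. the signal that would be
    played back to the speaker in a DAF setup."""
    delay_samples = int(round(sample_rate * delay_ms / 1000.0))
    return np.concatenate([np.zeros(delay_samples, dtype=samples.dtype), samples])

def frequency_altered_feedback(samples, shift_factor=1.05):
    """Very crude frequency alteration by resampling: reading the
    signal at a slightly different rate shifts all frequencies by
    shift_factor. Real FAF devices use proper pitch shifting."""
    n = len(samples)
    idx = np.arange(0, n - 1, shift_factor)       # fractional read positions
    return np.interp(idx, np.arange(n), samples)  # linear interpolation
```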
8
Assessment & therapy
  • Assessment: diagnosis, monitoring
  • Therapy
  • Clinical setting, with expert
  • Speech analysis: visualization, categorization,
    etc.
  • IBM SpeechViewer
  • Research

9
Applications
  • The number of existing applications differs per
    technology (from most to fewest)
  • speech generation
  • speech analysis, manipulation, etc.
  • speech recognition

10
In practice
  • Many applications already exist
  • Many more are possible
  • However, they see relatively little use
  • Why?

11
In practice
  • However, relatively little use. Why?
  • Needed:
  • Tailor-made, flexible applications
  • Tailor-made: taking into account the capabilities
    & desires of the user & environment
  • Flexible: the capabilities & desires often change
  • More user tests & adequacy evaluation
  • instead of technology improvement & performance
    evaluation

12
Target groups
  • International Classification of Functioning,
    Disability and Health (ICF)
  • Mental functions: aphasia, dyslexia, mental
    disabilities
  • Sensory functions: blindness, deafness, or both
  • Voice & speech functions: dysarthria, anarthria,
    mutism, stuttering
  • Motor functions: dyspraxia, apraxia, RSI /
    UEMSD (Upper Extremity Musculoskeletal Disorders)

13
Speech technology & dysarthria
  • Dysarthria: a speech disorder caused by
    dysfunction of nerves and muscles
  • Many different kinds of dysarthria

14
Can speech technology be useful for people with
dysarthria?
  • Yes!
  • AAC
  • Interactive tools
  • Assessment
  • Therapy

15
Can speech technology be useful for people with
dysarthria?
  • Speech generation
  • Users prefer a voice similar to their (old) voice
  • Preferably their own voice
  • AAC
  • Manipulation
  • Speech recognition & output channel
  • Pronunciation training
  • Speech recognition, analysis, feedback, etc.

16
Speech technology & dysarthria: ASR for
dysarthric speech
  • Questions
  • How well can dysarthric speech be recognized by a
    standard (non-dysarthric) speech recognizer?
  • Will the recognition results improve if we train
    the recognizer on speech of dysarthric speakers?

17
Experimental setup: Speakers
  • Dysarthric: 2 Dutch males, DYS1 & DYS2
  • Reference: 2 Dutch males, REF1 & REF2
  • Total duration of the speech material (minutes):
  • DYS2 speaks more slowly

        DYS1       DYS2       REF1       REF2
        8.5 min.   12.8 min.  9.1 min.   7.9 min.
18
Experimental setup: Speech tasks
  • All four speakers read the same list of items,
    consisting of four different tasks
  • 1. NUM: the numbers 0-12, spoken in isolation
  • 2. PFU: the 50 most frequent utterances from
    Polyphone
  • 3. PMS: 130 Plomp-Mimpen sentences (semantically
    unpredictable)
  • 4. PRS: 10 phonetically rich sentences

19
Experimental setup: Speech tasks
  • Number of utterances & words per task:
  • The NUM and PRS tasks were both read three times.

         NUM   PFU   PMS   PRS
utt.      39    50   130    30
words     39    91   809   336
20
Experimental setup: Speech recognizer
  • General specifications:
  • Standard phone-based recognizer
  • 37 context-independent phones
  • 3-state HMMs
  • 14 cepstral coefficients & deltas from a Mel
    filterbank, frequency range 350-3400 Hz
  • 16 ms Hamming window, 10 ms step (a comparable
    front end is sketched below)
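The sketch below approximates a front end with the parameters listed on this slide using librosa. The 8 kHz sampling rate (telephone bandwidth, matching the Polyphone material) and the FFT size are assumptions; this approximates, but is not, the original recognizer's feature extraction.

```python
import librosa
import numpy as np

def extract_features(wav_path, sr=8000):
    # Load (and, if needed, resample) the audio at telephone bandwidth.
    y, sr = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(
        y=y, sr=sr,
        n_mfcc=14,                       # 14 cepstral coefficients
        n_fft=256,                       # FFT size (>= window length)
        win_length=int(0.016 * sr),      # 16 ms analysis window
        hop_length=int(0.010 * sr),      # 10 ms frame step
        window="hamming",
        fmin=350.0, fmax=3400.0,         # Mel filterbank limited to 350-3400 Hz
    )
    deltas = librosa.feature.delta(mfcc)  # first-order time derivatives
    return np.vstack([mfcc, deltas])      # 28-dimensional feature vectors
```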

21
Experimental setup Experiments
  • Lexicon language model (uni- and bigram)
  • Based on all words in 4 tasks
  • Task specific same for all speakers
  • Perplexity

NUM PFU PMS PRS
13 15 8 2
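For illustration, a bigram perplexity of the kind reported above can be computed as sketched below. The add-one smoothing is an assumption made here for simplicity; the presentation does not specify how the language models were estimated.

```python
import math
from collections import Counter

def bigram_perplexity(train_sents, test_sents):
    """train_sents/test_sents: lists of word lists. Returns the perplexity
    of an add-one-smoothed bigram model on the test sentences."""
    vocab = {w for s in train_sents for w in s} | {"<s>", "</s>"}
    uni, bi = Counter(), Counter()
    for s in train_sents:
        toks = ["<s>"] + s + ["</s>"]
        uni.update(toks[:-1])                 # histories
        bi.update(zip(toks[:-1], toks[1:]))   # bigrams

    log_prob, n_tokens = 0.0, 0
    for s in test_sents:
        toks = ["<s>"] + s + ["</s>"]
        for prev, cur in zip(toks[:-1], toks[1:]):
            # add-one smoothed bigram probability P(cur | prev)
            p = (bi[(prev, cur)] + 1) / (uni[prev] + len(vocab))
            log_prob += math.log(p)
            n_tokens += 1
    return math.exp(-log_prob / n_tokens)
```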
22
Experimental setup: Speaker Independent & Dependent
  • SI: Speaker-Independent training material
  • Polyphone (a 5000-speaker Dutch telephone
    database)
  • 4022 connected digit strings
  • 3702 Polyphone most frequent items
  • 20,110 phonetically rich sentences
  • SD: Speaker-Dependent training material
  • The speaker's own speech

23
Speaker Independent (SI): Results
[Figure: Word Error Rates (WERs) for SI recognition]
24
Speaker Independent (SI): Conclusions
  • REF better than DYS
  • DYS1 better than DYS2 in short utterances because
    of speaking rate (table 1)
  • Results for DYS are quite reasonable (especially
    for sentences) because of the tight language model

25
Speaker Dependent (SD)
  • Models (also) trained on the speech of the
    speakers themselves
  • Jackknife procedure: a semi-randomly selected test
    set, the rest used as training set (see the sketch
    below)
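A jackknife-style evaluation of this kind can be sketched as follows. The number of folds and the caller-supplied train_fn and score_fn are hypothetical placeholders for illustration, not details given in the presentation.

```python
from sklearn.model_selection import KFold

def jackknife_wer(utterances, train_fn, score_fn, n_splits=5, seed=0):
    """Repeatedly hold out part of a speaker's utterances as a test set
    and train on the rest. train_fn(train_utts) returns a model;
    score_fn(model, test_utts) returns a WER. Returns the mean WER."""
    wers = []
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, test_idx in kf.split(utterances):
        train = [utterances[i] for i in train_idx]   # rest -> training set
        test = [utterances[i] for i in test_idx]     # held-out test set
        model = train_fn(train)
        wers.append(score_fn(model, test))
    return sum(wers) / len(wers)
```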
26
Speaker Dependent (SD): Results
  • Word Error Rates (WERs) for the whole test set
  • for different numbers of Gaussians (2^N); a
    minimal WER computation is sketched after the table

2^N      0     2     4     8    16    32    64
DYS1  14.3  12.0   9.5   9.7  10.3  11.7  15.1
DYS2   7.5   4.1   2.9   2.4   3.0   3.8   5.3
REF1   3.4   2.2   1.8   2.6   3.5   4.0   4.2
REF2   3.6   2.4   2.8   3.0   3.3   3.9   4.4
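Word Error Rate itself is the word-level edit distance (substitutions + deletions + insertions) between the reference transcription and the recognizer output, divided by the number of reference words. A minimal dynamic-programming sketch:

```python
def word_error_rate(reference, hypothesis):
    """Return the WER (in percent) of a hypothesis string against a
    reference string, both treated as whitespace-separated words."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return 100.0 * d[len(ref)][len(hyp)] / len(ref)
```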
27
Speaker Dependent (SD): Results
[Figure: Word Error Rates (WERs) for SD recognition]
28
Speaker Dependent (SD): Results
  • Word Error Rates (WERs)
  • for SD / SI recognition

       DYS1          DYS2          REF1        REF2
NUM    2.6 / 15.4    0.0 / 41.0    0.0 / 0.0   0.0 / 0.0
PFU    9.9 / 19.8    5.5 / 22.0    1.1 / 1.1   2.2 / 1.1
PMS   12.2 / 30.3    3.3 / 15.2    2.2 / 2.1   3.6 / 1.7
PRS    3.6 /  7.4    1.5 /  4.5    1.2 / 1.2   1.2 / 0.0
29
Speaker Dependent (SD): Conclusions
  • For REF, SD results are equal to or worse than SI
    results (a trade-off: the speaker's own models,
    but less training material)
  • For DYS, SD results are much better than SI results
  • DYS2 better than DYS1, almost as good as REF

30
Conclusions: ASR for dysarthric speech
  • Results for DYS2 are remarkable
  • SI: high WERs, especially for NUM & PFU
  • SD: sometimes better than REF
  • Low speaking rate!
  • Automatic recognition of dysarthric speech is
    possible. Better results with:
  • Lower speaking rate
  • Speaker-dependent models
  • Even better: also a speaker-dependent lexicon

31
Conclusions: Speech technology & pathology
  • Applications
  • Many already exist
  • Many more are possible
  • Needed:
  • Tailor-made, flexible applications
  • User tests, adequacy evaluation

32
References
  • http://lands.let.ru.nl/TSpublic/strik/pres/
  • p97-SPACE.ppt
  • E. Sanders, M. Ruiter, L. Beijer & H. Strik (2002).
    Automatic recognition of Dutch dysarthric speech:
    A pilot study. Proc. ICSLP-2002, Denver, USA,
    pp. 661-664.
  • T. Rietveld & I. Stolte (2005).
    Taal- en spraaktechnologie en communicatieve
    beperkingen [Language and speech technology and
    communicative impairments]

33
END