Title: Building a corpus of pathological speech
Catherine Middag, Jean-Pierre Martens, Gwen Van Nuffelen, Marc De Bodt
Dutch Corpus of Pathological and Normal Speech
Speaker group                 N
Normal (N)                  119
Dysarthria (D)              102
Hearing impairment (H)       47
Laryngectomy (L)             45
Cleft (C)                    39
Articulation disorders (A)   16
Voice disorder (VD)           8
Glossectomy (G)               1
Total                       377
Dysarthria: disturbed muscular control due to damage to the nervous system, resulting in weak, slow, imprecise, uncoordinated movements
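Laryngectomy:
- TL (total laryngectomy): surgical removal of the larynx and separation of the trachea from the mouth, nose, and esophagus; speech via TE (tracheoesophageal) speech, E (esophageal) speech, or an electrolarynx (Servox)
- PL (partial laryngectomy): partial removal of laryngeal structures, such as the vocal folds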
Speakers
- Native speakers of Dutch
- Adequate language, cognitive, visual and hearing abilities
Recordings
- Natural, quiet environment (clinical setting)
- No sound-treated booth
- MiniDisc recorder (Sony MZ-R700)
- Microphones
  - Sony (mouth-to-microphone distance 30 cm)
  - Shure headset
- Transferred to a notebook as WAV files (mono, 44 kHz), then downsampled to 16 kHz
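The slides do not say which tool converted the recordings from 44 kHz to 16 kHz. As a minimal sketch, assuming the nominal 44.1 kHz MiniDisc rate and samples held in a plain Python list, band-limited (windowed-sinc) resampling lowers the anti-aliasing cutoff to the new Nyquist frequency (8 kHz) while computing each new sample. The function `resample` and its `half_width` parameter are hypothetical; a real pipeline would more likely rely on an established audio library.

```python
import math


def resample(x, fs_in, fs_out, half_width=16):
    """Band-limited (windowed-sinc) resampling of a mono signal.

    Hypothetical helper for illustration; x is a list of float samples.
    half_width is the filter half-length measured in output sample periods.
    """
    # Low-pass cutoff relative to the input Nyquist frequency; when
    # downsampling it falls to the output Nyquist (fs_out / 2) to prevent aliasing.
    ratio = min(1.0, fs_out / fs_in)
    span = int(math.ceil(half_width / ratio))  # filter half-length in input samples
    n_out = math.floor(len(x) * fs_out / fs_in)
    y = []
    for n in range(n_out):
        t = n * fs_in / fs_out  # output instant expressed in input-sample units
        centre = int(math.floor(t))
        acc = 0.0
        # Taps within span + 1 input samples of t, clipped to the signal boundaries.
        for k in range(max(0, centre - span), min(len(x), centre + span + 2)):
            d = t - k
            arg = ratio * d
            sinc = 1.0 if arg == 0 else math.sin(math.pi * arg) / (math.pi * arg)
            # Hann window that tapers the sinc to zero at |d| = span + 1.
            window = 0.5 + 0.5 * math.cos(math.pi * d / (span + 1))
            acc += x[k] * ratio * sinc * window
        y.append(acc)
    return y


# Usage: 0.1 s of a 1 kHz tone, 44.1 kHz -> 16 kHz.
fs_in, fs_out = 44100, 16000
tone = [math.sin(2 * math.pi * 1000 * k / fs_in) for k in range(4410)]
down = resample(tone, fs_in, fs_out)
print(len(down))  # 1600
```

The direct-sum form keeps the anti-aliasing step explicit; it is slow (every output sample touches about 90 input samples) and is meant only to show what "44 kHz to 16 kHz" involves.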
Type of samples
Sample type                         N
Dutch Intelligibility Assessment  357
Articulation assessment            21
Sentences                         211
Text                              172
Text Marloes                      221
Spontaneous speech                 39
Semi-spontaneous speech           136
Sustained vowel                   216
Diadochokinetic rate              214
Formant transition                212
Dutch Intelligibility Assessment (DIA)
- Intelligibility at phoneme level
- 50 consonant-vowel-consonant (CVC) words
- 3 subtests
  - A: initial consonants (19 words)
  - B: final consonants (15 words)
  - C: medial vowels/diphthongs (16 words)
- Balanced mix of existing and non-existing (but easily pronounceable) words
- Large pool of test items: 25 lists per subtest, giving 25 × 25 × 25 = 15,625 different tests
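The number of distinct tests follows from choosing one list per subtest independently: 25 × 25 × 25 = 15,625. A minimal sketch, assuming each list is identified by its subtest letter plus a number from 1 to 25 (as in "List A3"); the helper names and the random draw are hypothetical illustrations, not the authors' test-assembly procedure.

```python
import itertools
import random

# Words per list in each DIA subtest, as given on the slide.
SUBTESTS = {"A": 19, "B": 15, "C": 16}
LISTS_PER_SUBTEST = 25


def list_ids(subtest):
    """Hypothetical list identifiers for one subtest, e.g. 'A1' ... 'A25'."""
    return [f"{subtest}{i}" for i in range(1, LISTS_PER_SUBTEST + 1)]


def all_tests():
    """Every DIA test obtainable by taking one list from each subtest."""
    return itertools.product(*(list_ids(s) for s in SUBTESTS))


def random_test(rng):
    """Draw one test: an independently chosen list for each subtest."""
    return tuple(rng.choice(list_ids(s)) for s in SUBTESTS)


n_tests = sum(1 for _ in all_tests())
print(n_tests)                         # 15625
print(sum(SUBTESTS.values()))          # 50
print(random_test(random.Random(42)))  # a tuple of three list IDs, one per subtest
```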
DIA
List A3: 1. vop 2. ziep 3. fuis 4. deek 5. koen 6. hom 7. dar 8. paam 9. mil 10. boos 11. son 12. geur 13. nee 14. taf 15. oes 16. loon 17. ruk 18. joef 19. wout
List B22: 1. geen 2. diem 3. zoem 4. daai 5. jog 6. peef 7. zaar 8. paat 9. tik 10. vang 11. boop 12. lieuw 13. roos 14. toe 15. riel
List C11: 1. gul 2. zuut 3. det 4. wok 5. waan 6. heun 7. nout 8. vees 9. meul 10. wiel 11. sas 12. tuik 13. oet 14. rood 15. min 16. deil
16-year-old girl, stroke, dysarthria, PI = 40
79-year-old man, TL, TE speech, PI = 68
DIA
List A10
Intelligibility: percentage of phonemes correctly understood
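The intelligibility score is a simple proportion. A minimal sketch, assuming the listener's responses have already been reduced to one perceived phoneme per target phoneme; the function name and the toy responses below are hypothetical, not corpus data.

```python
def intelligibility(pairs):
    """Percentage of target phonemes that were perceived correctly.

    pairs: list of (target_phoneme, perceived_phoneme) tuples.
    """
    if not pairs:
        raise ValueError("no items to score")
    correct = sum(1 for target, perceived in pairs if target == perceived)
    return 100.0 * correct / len(pairs)


# Toy example: initial consonants of the first five List A3 words,
# with one (made-up) confusion z -> s.
responses = [("v", "v"), ("z", "s"), ("f", "f"), ("d", "d"), ("k", "k")]
print(intelligibility(responses))  # 80.0
```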
Annotations DIA
- Praat
- 2 tiers
  - Tier 1: target word
  - Tier 2: perceived phoneme within a fixed frame
    - .VC (subtest A, initial consonant)
    - CV. (subtest B, final consonant)
    - C.C (subtest C, medial vowel)
- Orthographic transcriptions
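Praat stores annotations as TextGrid files. As a sketch, assuming one interval per tier spanning the whole recording and assuming Tier 2 holds the word as heard within the fixed frame, the two DIA tiers could be written in Praat's long text TextGrid format as below. The tier names `target` and `perceived`, the helper `dia_textgrid`, and the example strings are hypothetical.

```python
def dia_textgrid(duration, target_word, perceived):
    """Return a two-tier Praat TextGrid (long text format) as a string."""

    def quote(text):
        # Praat escapes a double quote inside a string by doubling it.
        return '"' + text.replace('"', '""') + '"'

    tiers = [("target", target_word), ("perceived", perceived)]
    lines = [
        'File type = "ooTextFile"',
        'Object class = "TextGrid"',
        "",
        "xmin = 0",
        f"xmax = {duration}",
        "tiers? <exists>",
        f"size = {len(tiers)}",
        "item []:",
    ]
    for i, (name, text) in enumerate(tiers, start=1):
        lines += [
            f"    item [{i}]:",
            '        class = "IntervalTier"',
            f"        name = {quote(name)}",
            "        xmin = 0",
            f"        xmax = {duration}",
            "        intervals: size = 1",
            "        intervals [1]:",
            "            xmin = 0",
            f"            xmax = {duration}",
            f"            text = {quote(text)}",
        ]
    return "\n".join(lines) + "\n"


# Usage: a 0.8 s recording of target "vop" heard as "fop".
print(dia_textgrid(0.8, "vop", "fop"))
```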
List A
Target phoneme: initial consonant
Fixed frame: .VC
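With a fixed frame, only the slot marked by the dot needs to be compared with the target. A minimal sketch, assuming Tier 2 holds the word as heard with the frame positions copied from the target (e.g. target `vop`, annotation `fop`), and splitting the orthographic target at its vowel letters into onset, vowel and coda. The split rule and helper names are hypothetical simplifications of Dutch orthography, not the corpus's annotation tooling.

```python
VOWEL_LETTERS = set("aeiou")  # simplification: "ie", "eu", "aai" stay together as one vowel


def split_cvc(word):
    """Split an orthographic CVC word into (onset, vowel, coda) at the vowel letters."""
    first = next(i for i, ch in enumerate(word) if ch in VOWEL_LETTERS)
    last = max(i for i, ch in enumerate(word) if ch in VOWEL_LETTERS)
    return word[:first], word[first:last + 1], word[last + 1:]


# Which part of the word each subtest frame leaves open ('.').
FRAMES = {"A": ".VC", "B": "CV.", "C": "C.C"}


def slot(target, annotation, subtest):
    """Return (target_part, perceived_part) for the open slot of the frame."""
    onset, vowel, coda = split_cvc(target)
    if subtest == "A":    # .VC: initial consonant open
        prefix, suffix, expected = "", vowel + coda, onset
    elif subtest == "B":  # CV.: final consonant open
        prefix, suffix, expected = onset + vowel, "", coda
    elif subtest == "C":  # C.C: medial vowel/diphthong open
        prefix, suffix, expected = onset, coda, vowel
    else:
        raise ValueError(f"unknown subtest {subtest!r}")
    fits = (len(annotation) >= len(prefix) + len(suffix)
            and annotation.startswith(prefix) and annotation.endswith(suffix))
    if not fits:
        raise ValueError(f"{annotation!r} does not fit frame {FRAMES[subtest]} of {target!r}")
    perceived = annotation[len(prefix):len(annotation) - len(suffix)]
    return expected, perceived


print(slot("vop", "fop", "A"))    # ('v', 'f')
print(slot("geen", "geem", "B"))  # ('n', 'm')
print(slot("heun", "huin", "C"))  # ('eu', 'ui')
```

The pairs returned by `slot` are exactly what a phoneme-level intelligibility score needs: one target unit and one perceived unit per word.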
Articulation assessment
- Children with insufficient reading skills
- Logo-Art (Baarda et al., 2001)
- Picture naming test
- Annotations
  - Orthographic
  - Tier 1: target
  - Tier 2: perceived utterance (no fixed frame)
Sentences
- Motor Speech Profile (Kay Elemetrics)
  - Wil je liever de thee of de borrel? ("Would you rather have the tea or the drink?")
  - Na nieuwjaar was hij weeral hier ("After New Year he was here again")
- N = 211
- Orthographic transcriptions
  - Tier 1 and Tier 2, without word boundaries
man, no speech pathology
18-year-old male, congenital dysarthria
Text Marloes and Text
- Text Marloes: "Papa en Marloes" ("Daddy and Marloes")
  - Standardized text
  - Balanced representation of Dutch phonemes
  - Often used in clinical practice
- Text
  - Different texts with the same reading level
- Orthographic transcriptions
  - 2 tiers
  - Boundaries between sentences
(Semi-)spontaneous speech
- Spontaneous
- Semi-spontaneous: based on a randomly selected sequence of pictures
- No annotations available
Future
- Gradually increase the number of samples
- DIA → validation of the SPACE intelligibility assessment
- DIA at sentence level: > 200 control speakers, 36 sentences
- Annotations of the pathological samples