Title: Objective intelligibility assessment of pathological speakers
1 Objective intelligibility assessment of pathological speakers
- Catherine Middag, Gwen Van Nuffelen, Jean-Pierre Martens, Marc De Bodt
2 Introduction
- Intelligibility: popular measure for pathological speech assessment
- Perceptual assessment is affected by non-speech information:
- familiarity with speaker and type of disorder
- usage of linguistic context
- Word intelligibility tests are designed to eliminate bias due to linguistic context
- Replacing the human listener by an automatic speech recognizer (ASR) can solve the other problems, but is the ASR sufficiently reliable?
- test case: automation of the Dutch Intelligibility Assessment (DIA)
3 Dutch Intelligibility Assessment (DIA)
- 50 isolated CVC words
- intelligibility = percentage of phonemes correct
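As a rough illustration of the DIA score, the sketch below computes the percentage of phonemes correct over a list of CVC words. The function name `percent_phonemes_correct`, the tuple representation of a word, and the position-by-position comparison are assumptions made for illustration; the slides do not say how a listener's transcription is scored in detail.

```python
def percent_phonemes_correct(targets, responses):
    """Return the percentage of target phonemes reproduced in the responses.

    Each word is a 3-tuple of phoneme symbols (C, V, C). Comparing phonemes
    position by position is an assumption, not the official DIA procedure.
    """
    if len(targets) != len(responses):
        raise ValueError("targets and responses must have the same length")
    total = 0
    correct = 0
    for target, response in zip(targets, responses):
        for target_phoneme, response_phoneme in zip(target, response):
            total += 1
            correct += (target_phoneme == response_phoneme)
    return 100.0 * correct / total if total else 0.0


# Two made-up CVC words; the listener misheard the first consonant of the second.
targets = [("p", "o", "l"), ("b", "a", "k")]
responses = [("p", "o", "l"), ("p", "a", "k")]
score = percent_phonemes_correct(targets, responses)
```

Here five of the six phonemes match, so the score is five sixths of 100.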
4 How to apply ASR in the DIA?
- Two approaches:
- let ASR recognize the words and count the percentage of correct decisions
- let ASR check how well the acoustics match the phonetic transcription of the target word (alignment)
- Our experience:
- intelligibility emerging from the first approach is insufficiently reliable
- therefore we developed a system based on alignment
5 System architecture (flow chart)
Speech aligner → speaker features → Intelligibility Prediction Model → objective score
6 System architecture (flow chart)
Speech aligner
- Two systems:
- complex state-of-the-art HMM-based system (ASR-ESAT)
- simple system with phonological layer (ASR-ELIS), which points more directly to articulatory problems
7 System architecture (flow chart)
Speech aligner → speaker features → Intelligibility Prediction Model → objective score
- Two feature sets:
- Phonemic features (patient has trouble pronouncing a certain phoneme)
- Phonological features (patient has problems with voicing, manner or place of articulation)
8 Extraction of phonemic features (PMF)
Speech aligner: ASR-ESAT
Phonemic features:
- (0.7 + 0.5 + 0.3) / 3
- /p/: (0.4 + 0.8) / 2
- /o/: (0.6 + 0.8) / 2
- /l/: 0.6
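The averaging shown on this slide can be sketched as follows: each phonemic feature is the mean of the aligner's scores over all occurrences of that phoneme. The function name `phonemic_features` and the `(phoneme, score)` input format are assumptions for illustration, not the ASR-ESAT interface. The example reuses the slide's numbers for /p/, /o/ and /l/; the first average is left out because its phoneme symbol is missing from the slide.

```python
from collections import defaultdict


def phonemic_features(aligned_segments):
    """Average the per-segment scores of each phoneme over all its occurrences."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for phoneme, score in aligned_segments:
        sums[phoneme] += score
        counts[phoneme] += 1
    return {phoneme: sums[phoneme] / counts[phoneme] for phoneme in sums}


# (phoneme, score) pairs as a hypothetical aligner might return them.
segments = [("p", 0.4), ("o", 0.6), ("l", 0.6), ("p", 0.8), ("o", 0.8)]
pmf = phonemic_features(segments)
```

The result holds one value per phoneme, which is what makes the feature vector comparable across speakers who read different word lists.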
9 Extraction of phonological features (PLF)
Speech aligner: ASR-ELIS
Phonological features:
- Burst: 0.6
- Back: (0.7 + 0.9) / 2
- Voiced: (0.8 + 0.6 + 0.5) / 3
10 Extraction of phonological features (PLF)
Speech aligner: ASR-ELIS
Phonological features:
- Not burst: (0.2 + 0.1 ...
- Not back: (0.1 + 0.1 ...
- Not voiced: (0.1 + 0.1 ...
11 Extraction of phonological features (PLF)
Speech aligner: ASR-ELIS
Phonological features:
- Irrelevant features for these phones
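Slides 9 to 11 can be read as three cases per feature and phone: the feature should be present, should be absent, or is irrelevant. The sketch below averages the feature outputs separately for the first two cases and skips the third. This reading, the function name `phonological_features`, the `None` marker for irrelevant features, and all numbers in the toy alignment are assumptions for illustration; they are not taken from ASR-ELIS.

```python
from collections import defaultdict


def phonological_features(phones):
    """Average feature outputs separately where a feature should be on or off.

    `phones` is a list of (canonical, outputs) pairs. `canonical` maps a
    feature name to True (should be present), False (should be absent) or
    None (irrelevant for this phone); `outputs` maps it to a value in [0, 1].
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for canonical, outputs in phones:
        for feature, expected in canonical.items():
            if expected is None:  # irrelevant for this phone: not counted
                continue
            key = feature if expected else "not " + feature
            sums[key] += outputs[feature]
            counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Toy alignment of three phones; every value here is made up.
phones = [
    ({"voiced": False, "burst": True, "back": None},
     {"voiced": 0.1, "burst": 0.6, "back": 0.5}),
    ({"voiced": True, "burst": False, "back": True},
     {"voiced": 0.8, "burst": 0.2, "back": 0.7}),
    ({"voiced": True, "burst": False, "back": None},
     {"voiced": 0.6, "burst": 0.1, "back": 0.4}),
]
plf = phonological_features(phones)
```

In this toy case no phone requires "back" to be absent, so no "not back" feature is produced.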
12 System architecture (flow chart)
Speech aligner → speaker features → Intelligibility Prediction Model → objective score
13 Intelligibility prediction model (IPM)
- Objective:
- map speaker features (PMF, PLF or combinations) to speaker intelligibility score
- Model training:
- train on DIA recordings
- pathological speakers (+ some normal control speakers)
- Model type and size:
- limited number of pathological speakers
- high number of features
- → linear regression model
- → feature selection
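A linear regression model with feature selection could look roughly like the sketch below. The slides do not name the selection procedure, so greedy forward selection on the training error is an assumption here, as are the function names, the tiny ridge term, and the toy data. In practice the selection criterion would more likely be evaluated on held-out speakers.

```python
def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit(rows, targets, columns):
    """Least-squares weights (bias first) using only the chosen feature columns."""
    design = [[1.0] + [row[c] for c in columns] for row in rows]
    k = len(columns) + 1
    gram = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    for i in range(k):
        gram[i][i] += 1e-8  # tiny ridge term (assumption) keeps the system solvable
    rhs = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(k)]
    return solve(gram, rhs)


def predict(weights, row, columns):
    """Predicted intelligibility score for one speaker."""
    return weights[0] + sum(w * row[c] for w, c in zip(weights[1:], columns))


def squared_error(rows, targets, columns):
    """Sum of squared training errors of a model restricted to `columns`."""
    weights = fit(rows, targets, columns)
    return sum((predict(weights, r, columns) - t) ** 2 for r, t in zip(rows, targets))


def forward_selection(rows, targets, max_features):
    """Greedily add the feature that most reduces the squared training error."""
    selected = []
    remaining = list(range(len(rows[0])))
    while remaining and len(selected) < max_features:
        best = min(remaining, key=lambda c: squared_error(rows, targets, selected + [c]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: six speakers, three features; the score depends only on feature 1.
rows = [[0.2, 0.1, 0.9], [0.5, 0.4, 0.3], [0.1, 0.8, 0.5],
        [0.9, 0.6, 0.2], [0.4, 0.9, 0.7], [0.7, 0.3, 0.6]]
targets = [50.0 + 40.0 * row[1] for row in rows]
selected = forward_selection(rows, targets, max_features=1)
weights = fit(rows, targets, selected)
```

Capping `max_features` is one simple way to keep the model small when there are few speakers and many features.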
14 Reference material (DIA)
- 211 speakers
- 51 normals
- 60 dysarthric
- 12 clefts
- 42 hearing impaired
- 37 with laryngectomy
- 7 with dysphonia
- 2 others
- Pathological speakers: mean intelligibility of 78.7%
- Normals: mean intelligibility of 93.3%
- Few with very low score
15 Results: individual systems
- Based on five-fold cross validation
- Measure: Pearson Correlation Coefficient (PCC)
ELIS (PLF): PCC = 0.78
ESAT (PMF): PCC = 0.80
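The evaluation protocol can be sketched as below: every speaker is predicted by a model trained on the other folds, and the predictions are then correlated with the perceptual scores. Whether the PCC was computed on pooled predictions or averaged per fold is not stated on the slide; pooling is an assumption here, as are the single-feature model, the fold assignment by index, and the made-up data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def fit_line(xs, ys):
    """Least-squares slope and intercept for a single feature."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov / sum((x - mean_x) ** 2 for x in xs)
    return slope, mean_y - slope * mean_x


def cross_validated_pcc(xs, ys, folds=5):
    """Predict each speaker from a model trained on the other folds, then correlate."""
    predictions = [0.0] * len(xs)
    for k in range(folds):
        test = [i for i in range(len(xs)) if i % folds == k]
        train = [i for i in range(len(xs)) if i % folds != k]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in test:
            predictions[i] = intercept + slope * xs[i]
    return pearson(predictions, ys)


# Made-up data: one feature value and one perceptual score per speaker.
feature = [0.30, 0.45, 0.50, 0.62, 0.70, 0.74, 0.80, 0.85, 0.90, 0.95]
score = [42.0, 55.0, 61.0, 70.0, 78.0, 80.0, 88.0, 90.0, 94.0, 97.0]
pcc = cross_validated_pcc(feature, score)
```

Because no speaker is ever predicted by a model that saw that speaker during training, the resulting PCC is a less optimistic estimate than a correlation computed on the training data.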
16 Results: combined system
17 Results: pathology-specific IPM
- Instead of creating one general IPM, one can create IPMs for specific pathologies
- still trained on all speakers (enough speakers)
- model selection based on performance on speakers of that pathology (importance of features depends on type of disorder)
18 Results: pathology-specific IPM
- Dysarthria: PCC = 0.94 (red circles)
- Dispersion of other speakers is increased
- Largest deviations in the low-intelligibility area
- scarce data in that area
- can be solved by giving more weight to patients with very low intelligibility
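Giving more weight to patients with very low intelligibility can be sketched as a weighted least-squares fit. The slide does not say how the weights would be chosen, so the threshold of 60, the boost factor of 5, the function names, and the toy data are all assumptions for illustration.

```python
def weighted_fit_line(xs, ys, weights):
    """Weighted least-squares slope and intercept for a single feature."""
    total = sum(weights)
    mean_x = sum(w * x for w, x in zip(weights, xs)) / total
    mean_y = sum(w * y for w, y in zip(weights, ys)) / total
    cov = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(weights, xs, ys))
    var = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, xs))
    slope = cov / var
    return slope, mean_y - slope * mean_x


def low_score_weights(scores, threshold=60.0, boost=5.0):
    """Give speakers scoring below `threshold` a weight of `boost`, all others 1."""
    return [boost if s < threshold else 1.0 for s in scores]


# Made-up data with a single low-intelligibility speaker (the first one).
feature = [0.2, 0.6, 0.7, 0.8, 0.9]
score = [30.0, 75.0, 80.0, 88.0, 95.0]

plain_slope, plain_intercept = weighted_fit_line(feature, score, [1.0] * len(score))
boosted = low_score_weights(score)
boosted_slope, boosted_intercept = weighted_fit_line(feature, score, boosted)

# Absolute prediction error for the low-intelligibility speaker under each fit.
plain_error = abs(plain_intercept + plain_slope * feature[0] - score[0])
boosted_error = abs(boosted_intercept + boosted_slope * feature[0] - score[0])
```

With the boosted weights the fitted line is pulled towards the low-intelligibility speaker, so that speaker's prediction error shrinks at the cost of a slightly worse fit elsewhere.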
19 Development of DIA tool
- PMF and PLF can predict intelligibility of pathological speech
- Combining PMF and PLF yields high PCCs:
- 0.86 for general model
- over 0.91 for pathology-specific model
- PCCs for specific pathologies compete with subjective inter-rater agreements (0.91)
- This opens up possibilities for development of an automated version of the DIA (see demonstration later) based on PLF + PMF
20 New feature set: context-dependent phonological features (CD-PLF)
- Until now:
- PMF: Does the patient have trouble pronouncing a certain phoneme?
- PLF: Does the patient have problems with voicing, manner or place of articulation?
- New: Does the patient have problems with a desired change of voicing, manner or place of articulation?
- → CD-PLFs: how well is a change in PLF realized?
21 Extraction of context-dependent phonological features (CD-PLF)
Speech aligner: ASR-ELIS
CD-PLF features
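The slide gives no extraction details, so the following is only one possible reading of "how well is a change in PLF realized": the output of a feature is averaged separately for each combination of its canonical value in the previous phone and in the current phone. The function name `context_dependent_features`, the restriction to the preceding phone as context, and the voicing values are assumptions for illustration.

```python
from collections import defaultdict


def context_dependent_features(phones):
    """Average a feature's output per (previous, current) canonical value pair.

    `phones` is a time-ordered list of (canonical_value, output) pairs for one
    feature, where canonical_value is True (expected) or False (not expected).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for (previous_value, _), (value, output) in zip(phones, phones[1:]):
        key = (previous_value, value)
        sums[key] += output
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Made-up voicing track for five consecutive phones.
voicing = [(False, 0.1), (True, 0.6), (True, 0.9), (False, 0.3), (True, 0.5)]
cd_plf = context_dependent_features(voicing)
```

In this toy track, voicing reaches a lower value right after an unvoiced phone than after a voiced one, which is the kind of difference a context-dependent feature can expose and a plain PLF average would hide.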
22 Results for CD-PLF
- CD-PLFs alone compete with previous best (PLF + PMF: 0.86)
- CD-PLF + PMF: 0.90 → new best!
- Pathology-specific results for CD-PLF + PMF
23 Conclusions and future work
- PMF, PLF and CD-PLF can predict intelligibility of pathological speech
- CD-PLFs seem to play an important role:
- CD-PLF: PCC = 0.87
- CD-PLF + PMF: PCC = 0.90
- → not the articulation pattern but the change in the articulation pattern matters?
- More research is needed before adding this feature set to the tool
- High PCCs open up new possibilities for:
- more profound articulatory assessment, which is directly related to determination of appropriate therapy
- monitoring of effectiveness of chosen therapy → tool
- using more natural speech (words, phrases) in tests