Development of the SPACE intelligibility assessment method

1

Development of the SPACE intelligibility
assessment method
Catherine Middag, Gwen Van Nuffelen,
Jean-Pierre Martens, Marc De Bodt

2
Introduction

Intelligibility = popular measure for pathological speech assessment
Perceptual assessment is affected by non-speech information
  familiarity of the listener with the speaker and the type of disorder
    → hard to eliminate this subjective bias
  guessing on the basis of linguistic context
    → test material design must eliminate this bias
Replacing the human listener by an automatic speech recognizer (ASR) can solve both problems, but is the ASR sufficiently reliable?
  test case: automation of the Dutch Intelligibility Assessment (DIA)

3
Dutch Intelligibility Assessment (DIA)

50 isolated (nonsense) words
intelligibility = percentage of phonemes correct

4
How to apply ASR in the DIA?

Two approaches
  let the ASR recognize the words and count the percentage of correct decisions
  let the ASR check how well, on average, the acoustics support the phonetic transcription of the target word (alignment)
Our experience
  the intelligibility score emerging from the first approach is insufficiently reliable
  therefore we developed a system based on alignment

5
System architecture flow chart
Speech aligner → speaker features → Intelligibility Prediction Model → objective score
6
System architecture flow chart: speech aligner

Two systems
  complex state-of-the-art HMM-based system (ASR-ESAT)
  simple system with a phonological layer (ASR-ELIS), which points more directly to articulatory problems
Acoustic models trained on speech of normal adult speakers

7
ASR - ESAT

Acoustic models
  state-of-the-art semi-continuous HMM
  triphone models trained on normal speech
  states tied using decision trees with phonological questions
Output
  each frame t is assigned to a state s_t
  per frame: s_t and P(s_t | X_t)

8
ASR - ELIS
24 binary phonological features concerning
  voicing
  manner of articulation
  place of articulation

Diagram elements
  input: acoustic frame X_t and the target speech transcription
  PLF extractor: P(K_1 | X_t), ..., P(K_24 | X_t)
  probability product model: P(S_1 | X_t), ..., P(S_n | X_t)
  Viterbi decoder
  output per frame: s_t, P(s_t | X_t) and P(K_1 | X_t), ..., P(K_24 | X_t)
9
System architecture flow chart
Speech aligner → speaker features → Intelligibility Prediction Model → objective score

Three feature sets
  Phonemic features (patient has trouble pronouncing a certain phoneme)
  Phonological features (patient has problems with voicing, manner or place of articulation)
  NEW: context-dependent features (patient has problems with a desired change of voicing, manner or place of articulation)

10
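Extraction of phonemic features (PMF)
Speech aligner: ASR-ESAT

Frame  Phoneme  P(s_t | X_t)
1      -        0.7
2      -        0.5
3      /p/      0.4
4      /p/      0.8
5      /o/      0.6
6      /o/      0.8
7      /l/      0.6
8      -        0.3

Phonemic features (mean of P(s_t | X_t) over the frames of each phoneme)
-    (0.7 + 0.5 + 0.3) / 3 = 0.5   (frames without a phoneme label)
/p/  (0.4 + 0.8) / 2 = 0.6
/o/  (0.6 + 0.8) / 2 = 0.7
/l/  0.6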
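
The averaging on this slide can be sketched in a few lines of Python. This is only an illustration of the arithmetic shown above, not the authors' implementation; the function name `phonemic_features` is hypothetical, and `None` stands for the frames that carry no phoneme label in the table.

```python
from collections import defaultdict

# Frame-level alignment copied from the slide: (phoneme label, P(s_t | X_t)).
# None marks the frames that have no phoneme label in the table.
frames = [
    (None, 0.7), (None, 0.5), ("p", 0.4), ("p", 0.8),
    ("o", 0.6), ("o", 0.8), ("l", 0.6), (None, 0.3),
]


def phonemic_features(frames):
    """Average the frame posteriors separately for each phoneme label."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for label, prob in frames:
        sums[label] += prob
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in sums}


pmf = phonemic_features(frames)
for label, value in pmf.items():
    print(label, round(value, 2))
```

In a real test the averages would presumably be taken over all 50 words of a speaker rather than over a single word; that extension is an assumption and is not shown here.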

11
Extraction of phonological features (PLF)
Speech aligner: ASR-ELIS

Frame  Phone  voiced P(K_1 | X_t)  back P(K_2 | X_t)  burst P(K_3 | X_t)
1      -      0.1                  0.1                0.2
2      -      0.1                  0.1                0.1
3      /pcl/  0.2                  0.1                0.1
4      /p/    0.2                  0.2                0.6
5      /o/    0.8                  0.7                0.2
6      /o/    0.6                  0.9                0.0
7      /l/    0.5                  0.5                0.1
8      -      0.1                  0.1                0.0

Phonological features
  Burst   0.6
  Back    (0.7 + 0.9) / 2 = 0.8
  Voiced  (0.8 + 0.6 + 0.5) / 3 ≈ 0.63
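
A minimal sketch of how the three "feature present" values on this slide can be reproduced. The mapping `PRESENT` (which phones should carry which feature) is an assumption read off the slide's own results, and the function name `positive_plf` is hypothetical; the real system uses 24 features and its own canonical feature table.

```python
# Frame table copied from the slide: (phone, P(voiced), P(back), P(burst)).
# None marks frames without a phone label.
frames = [
    (None, 0.1, 0.1, 0.2),
    (None, 0.1, 0.1, 0.1),
    ("pcl", 0.2, 0.1, 0.1),
    ("p", 0.2, 0.2, 0.6),
    ("o", 0.8, 0.7, 0.2),
    ("o", 0.6, 0.9, 0.0),
    ("l", 0.5, 0.5, 0.1),
    (None, 0.1, 0.1, 0.0),
]

FEATURES = ("voiced", "back", "burst")

# Assumption: phones for which each feature is expected to be present,
# inferred from the averages shown on the slide.
PRESENT = {"voiced": {"o", "l"}, "back": {"o"}, "burst": {"p"}}


def positive_plf(frames):
    """Mean posterior of each feature over frames where it should be present."""
    result = {}
    for column, name in enumerate(FEATURES, start=1):
        probs = [row[column] for row in frames if row[0] in PRESENT[name]]
        result[name] = sum(probs) / len(probs)
    return result


plf = positive_plf(frames)
print({name: round(value, 2) for name, value in plf.items()})
```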
12
Extraction of phonological features (PLF)
Speech aligner: ASR-ELIS
(same frame table as on slide 11)

Phonological features
  Not burst   (0.2 + 0.1 + ...
  Not back    (0.1 + 0.1 + ...
  Not voiced  (0.1 + 0.1 + ...
13
Extraction of phonological features (PLF)
Speech aligner: ASR-ELIS
(same frame table as on slide 11)

Phonological features
  Irrelevant features for these phones
14
Extraction of context-dependent phonological features (CD-PLF)

How well is a change in a PLF realized?
  use the PLF target in the preceding/succeeding phone as context
  binary features → two values for the target (present/absent)
  binary features → restricted number of left and right contexts
Left or right context can be
  present, absent, not relevant, silence
Model selection (preliminary)
  maximum 4 × 2 × 4 = 32 CD-PLFs per PLF
    → 24 × 32 = 768 in total
  select only those CD-PLFs occurring at least twice in every test
    → 123 in total
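
The counting argument above (4 left contexts × 2 target values × 4 right contexts, for each of 24 features) can be checked by enumeration. The sketch below also shows one possible reading of the "at least twice in every test" rule; the function `select_frequent` and the toy counts are hypothetical and only illustrate the filter, they are not data from the study.

```python
from itertools import product

CONTEXTS = ("present", "absent", "not relevant", "silence")
TARGETS = ("present", "absent")
N_PLF = 24  # number of binary phonological features

# Every (left context, target, right context) combination for one PLF.
cd_plfs_per_plf = list(product(CONTEXTS, TARGETS, CONTEXTS))
total = N_PLF * len(cd_plfs_per_plf)
print(len(cd_plfs_per_plf), total)


def select_frequent(counts_per_test, min_count=2):
    """Keep combinations seen at least min_count times in every test."""
    return [
        combo
        for combo in cd_plfs_per_plf
        if all(counts.get(combo, 0) >= min_count for counts in counts_per_test)
    ]


# Toy occurrence counts for two tests (made up for illustration).
a = ("present", "present", "present")
b = ("absent", "present", "absent")
toy_counts = [{a: 2, b: 1}, {a: 3, b: 5}]
kept = select_frequent(toy_counts)
```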

15
Extraction of context-dependent phonological features (CD-PLF)
Speech aligner: ASR-ELIS

Segment  Phone  voiced  burst
2        -      0.1     0.2
3        /pcl/  0.2     0.2
4        /p/    0.2     0.6
6        /o/    0.6     0.1
7        /s/    0.4     0.3
8        -      0.2     0.1
9        /m/    0.7     0.3
10       /A/    0.8     0.0
11       /l/    0.6     0.1
12       -      0.1     0.1

CD-PLF features (left context, target, right context)
  Voicing  off, on, off  0.6
  Voicing  on, on, on    0.8
  Burst    yes, no, no   0.1
  Burst    no, no, no    0.0
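
The two voicing values on this slide can be reproduced by grouping each segment's voicing posterior by the voicing of its neighbours. This is a sketch under stated assumptions: the set `VOICED` is an assumed canonical voicing table, unlabeled segments are treated as silence, and `cd_plf` is a hypothetical name. How the original system scores targets where the feature should be absent is not stated in the slides, so the sketch simply averages the raw posterior.

```python
from collections import defaultdict

# (segment, phone, P(voiced)) copied from the slide; None = unlabeled segment.
segments = [
    (2, None, 0.1), (3, "pcl", 0.2), (4, "p", 0.2), (6, "o", 0.6),
    (7, "s", 0.4), (8, None, 0.2), (9, "m", 0.7), (10, "A", 0.8),
    (11, "l", 0.6), (12, None, 0.1),
]

# Assumption: phones that are canonically voiced.
VOICED = {"o", "m", "A", "l"}


def state(phone):
    """Map a phone to its canonical voicing state."""
    if phone is None:
        return "silence"
    return "on" if phone in VOICED else "off"


def cd_plf(segments):
    """Mean voicing posterior per (left, target, right) voicing context."""
    groups = defaultdict(list)
    for i in range(1, len(segments) - 1):
        _, phone, prob = segments[i]
        if phone is None:
            continue  # unlabeled segments are contexts only, never targets
        key = (state(segments[i - 1][1]), state(phone), state(segments[i + 1][1]))
        groups[key].append(prob)
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


features = cd_plf(segments)
print(features[("off", "on", "off")], features[("on", "on", "on")])
```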
16
System architecture flow chart
Speech aligner → speaker features → Intelligibility Prediction Model → objective score
17
Intelligibility prediction model (IPM)

Objective
  map speaker features (PMF, PLF, CD-PLF or combinations) to a speaker intelligibility score
Model training
  train on DIA recordings
  pathological speakers (+ some normal control speakers)
Model type and size
  limited number of pathological speakers
  high number of features
    → linear regression model
    → feature selection
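
A minimal sketch of a linear regression model combined with feature selection, assuming NumPy is available. The slides do not say which selection procedure was used, so greedy forward selection on the training error is swapped in here purely for illustration; the function names and the synthetic data are made up and are not taken from the study.

```python
import numpy as np


def fit_linear(X, y):
    """Least-squares fit of y ≈ X w + b; the bias b is the last weight."""
    A = np.hstack([X, np.ones((X.shape[0], 1))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w


def predict(X, w):
    return X @ w[:-1] + w[-1]


def forward_select(X, y, max_features):
    """Greedily add the feature that most reduces the training RMSE."""
    chosen = []
    remaining = list(range(X.shape[1]))

    def rmse_with(j):
        cols = chosen + [j]
        w = fit_linear(X[:, cols], y)
        return np.sqrt(np.mean((predict(X[:, cols], w) - y) ** 2))

    while remaining and len(chosen) < max_features:
        best = min(remaining, key=rmse_with)
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Synthetic data (made up): 40 "speakers", 6 features,
# score depends only on features 1 and 4.
rng = np.random.default_rng(0)
X = rng.random((40, 6))
y = 50 + 30 * X[:, 1] + 15 * X[:, 4]

selected = forward_select(X, y, max_features=2)
weights = fit_linear(X[:, selected], y)
print(selected, np.round(weights, 2))
```

Selecting on a held-out error instead of the training error would be the natural refinement when the number of speakers is small.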

18
Reference material (DIA)

211 speakers
  51 normals
  60 dysarthric
  12 clefts (children)
  42 hearing impaired
  37 with laryngectomy
  7 with dysphonia
  2 others
Pathological speakers: mean of 78.7 %
Normals: mean of 93.3 %
Few with a very low score

19
Solving microphone issues

Two microphones were used.
The difference can be found in the cepstral means (→ cepstral mean subtraction was performed)
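
Cepstral mean subtraction removes the per-coefficient average over a recording, so a fixed channel such as a microphone, which shows up as a constant offset in the cepstral domain, cancels out. The sketch below uses made-up two-dimensional "cepstra" and a hypothetical function name; it only illustrates the principle and is not the preprocessing code of the system.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean over all frames of one recording."""
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    means = [sum(f[d] for f in frames) / n_frames for d in range(n_coeffs)]
    return [[f[d] - means[d] for d in range(n_coeffs)] for f in frames]


# Made-up example: the same cepstra seen through two microphones that
# differ by a constant offset per coefficient.
mic_a = [[1.0, 2.0], [3.0, 0.0], [2.0, 4.0]]
offset = [0.5, -1.0]
mic_b = [[c + o for c, o in zip(frame, offset)] for frame in mic_a]

norm_a = cepstral_mean_subtraction(mic_a)
norm_b = cepstral_mean_subtraction(mic_b)
print(norm_a)
print(norm_b)
```

After normalization the two recordings are identical, which is why the microphone difference no longer influences the features.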

20
Training / validation

Models chosen with five-fold cross-validation
Measure: standard deviation (STD); in case of normality, 67 % of the computed scores lie in an interval of ± STD around the perceptual score
More features → more chance of overfitting
  rule of thumb: take 1 feature for every 10 training examples
    → restrict the number of features to a maximum of 15
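
A small sketch of the evaluation ingredients named on this slide. It assumes that STD is the root-mean-square difference between computed and perceptual scores, which the slides do not spell out; the fold assignment, the function names and the example numbers are hypothetical.

```python
import math


def std_of_errors(computed, perceptual):
    """RMS difference between computed and perceptual scores (assumed STD)."""
    diffs = [c - p for c, p in zip(computed, perceptual)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def five_fold_indices(n_items, n_folds=5):
    """Yield (train, validation) index lists; item i is validated in fold i % n_folds."""
    for fold in range(n_folds):
        val = [i for i in range(n_items) if i % n_folds == fold]
        train = [i for i in range(n_items) if i % n_folds != fold]
        yield train, val


def max_features(n_training_examples, examples_per_feature=10):
    """Rule of thumb from the slide: one feature per ten training examples."""
    return n_training_examples // examples_per_feature


# Made-up scores for two speakers, only to show the call.
example_std = std_of_errors([80.0, 90.0], [77.0, 94.0])
folds = list(five_fold_indices(20))
print(round(example_std, 2), [len(val) for _, val in folds], max_features(150))
```

For instance, 150 training examples would allow 15 features under this rule of thumb.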

21
Results: individual systems
  PMFelis: STD 9.52
  PMFesat: STD 8.57
22
Results: individual systems
  PLF (ELIS): STD 9.35
  CD-PLF (ELIS): STD 8.48
23
Results: all systems

New models with CD-PLF outperform the old PLF models
CD-PLFs form the best system with one feature set
PMFesat + CD-PLF: best system with combined feature sets
Using the three ELIS feature sets yields the next best result and needs only one recognizer (the simplest one)
  → less complex system

Model                   STD   N
PMFesat                 8.57  15
PMFelis                 9.52  15
PLF                     9.35  15
CD-PLF                  8.48  15
PMFelis + PLF           8.20  15
PMFesat + PLF           8.00  13
PMFelis + CD-PLF        7.63  15
PLF + CD-PLF            8.04  15
PMFesat + CD-PLF        7.34  15
PMFelis + PLF + CD-PLF  7.48  15
24
Results: combined system

CD-PLF + PMFesat: STD 7.34

25
Results: pathology-specific IPM

Instead of creating one general IPM, one can create IPMs for specific pathologies
  trained on all speakers (to have enough speakers)
  model selection based on performance on the speakers of that pathology (the importance of features depends on the type of disorder)

26
Results: pathology-specific IPM (2)

STD per pathology (DYS = dysarthria, LAR = laryngectomy, HEAR = hearing impaired)

Model                   DYS   LAR   HEAR
PMFesat                 8.44  8.32  7.48
PMFelis                 8.10  5.88  9.73
PLF                     8.27  7.17  8.05
CD-PLF                  6.49  5.70  6.87
PMFelis + PLF           6.97  5.14  6.63
PMFesat + PLF           6.87  6.49  6.20
PMFelis + CD-PLF        6.50  3.54  6.05
PLF + CD-PLF            6.32  5.82  6.17
PMFesat + CD-PLF        6.69  4.86  5.27
PMFelis + PLF + CD-PLF  6.32  3.68  5.73

Very good match when CD-PLFs are involved
New models with CD-PLF outperform the old PLF models
CD-PLFs form the best system with one feature set
Using the three ELIS feature sets yields (almost) the best result and needs only one recognizer (the simplest one)
  → less complex system

27
Results: pathology-specific IPM

Dysarthria: STD 6.32 (red circles)
Dispersion of the other speakers is increased
Largest deviations in the low-intelligibility area
  scarce data in that area
  can be solved by adding more weight to patients with very low intelligibility

28
Conclusions and future work

PMF, PLF and CD-PLF can predict the intelligibility of pathological speech
CD-PLFs seem to play an important role
  STD 7.34 for the general model combining CD-PLF and PMFesat
  STDs of at most 6.32 for the pathology-specific models using the three ELIS feature sets
    → not the articulation pattern but the change in the articulation pattern matters?
  more research is needed before adding this feature set to the tool
Results on the validation set compete with human inter-rater agreement.
Future work
  more profound articulatory assessment, which is directly related to the determination of an appropriate therapy
  monitoring of the effectiveness of the chosen therapy
  using more natural speech (words, phrases) in tests