AN ACOUSTIC PROFILE OF SPEECH EFFICIENCY presentation

1
AN ACOUSTIC PROFILE OF SPEECH EFFICIENCY

R.J.J.H. van Son, Barbertje M. Streefkerk, and
Louis C.W. Pols

Institute of Phonetic Sciences / ACLC, University
of Amsterdam, Herengracht 338, 1016 CG Amsterdam,
The Netherlands
tel: +31 20 5252183, fax: +31 20 5252197
email: Rob.van.Son_at_hum.uva.nl
ICSLP2000, Beijing, China, Oct. 20, 2000
2
INTRODUCTION

Speech is "efficient":
Important components are emphasized
Less important ones are de-emphasized
Two mechanisms:
1) Prosody: Lexical Stress and Sentence Accent (Prominence)
2) Predictability: Frequency of Occurrence (tested) and Context (not tested)

3
MECHANISMS FOR EFFICIENT SPEECH

Speech emphasis should mirror importance, which largely corresponds to unpredictability
Prosodic structure distributes emphasis according to importance (lexical stress, sentence accent / prominence)
Speakers can (de-)emphasize according to supposed (un)importance
Speech production mechanisms can facilitate redundant speech or hamper unpredictable speech

4
QUESTIONS

Can the distribution of emphasis or reduction be completely explained by Prosody (Lexical stress and Sentence Accent / Prominence)?
If not, can we identify a speech production mechanism that would assist efficiency in speech, e.g. preprogrammed articulation of redundant and/or high-frequency syllable-like segments?

5
SPEECH MATERIAL (DUTCH)

Single Male Speaker: Vowels and Consonants
Matched Informal and Read speech, 791 matched VCV pairs
Polyphone: Vowels only
273 speakers (out of 5000), telephone speech, 1244 read sentences
Segmented with a modified HMM recognizer (Xue Wang)

Corpora sizes: number of realizations of vowels and consonants,
split by lexical stress and, within each stress category, by accent

Corpus                       Unstressed      Stressed       Total
Single Speaker consonants     550    180      569    283     1582
Single Speaker vowels         812    461      528    224     2025
Polyphone vowels             4435   4942     9603   3516    22496

Accent: Sentence accent / Prominence
Stressed/Unstressed: Lexical stress

6
METHODS: SPEECH PREPARATION

Single speaker corpus:
All 2 x 791 VCV segments hand-labeled
Sentence accent also determined by hand
22 native listeners identified consonants from this corpus
Polyphone corpus:
Automatically labeled using a pronunciation lexicon and a modified HMM recognizer
10 judges marked prominent words (prominence 1-10)
Word and syllable frequencies, expressed as -log2(frequency), were determined for both corpora from Dutch CELEX
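
The -log2(frequency) measure can be illustrated with a minimal sketch. It assumes the frequency is a relative frequency (count divided by corpus size); the slides do not say how the CELEX counts were normalized, and the words and counts below are hypothetical, not CELEX values.

```python
import math

def neg_log2_frequency(count, total):
    """Return -log2(count / total) in bits; rarer items get larger values."""
    if count <= 0 or total <= 0:
        raise ValueError("count and total must be positive")
    return -math.log2(count / total)

# Hypothetical counts for illustration only (not taken from CELEX).
counts = {"de": 512, "fonetiek": 1}
total = 1024
info = {word: neg_log2_frequency(c, total) for word, c in counts.items()}
```

On this scale a frequent item has a low value and a rare item a high one, so a positive correlation with duration would mean that rarer items are longer.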

7
METHODS: ANALYSIS
Single Speaker Corpus: Consonants and Vowels

Duration: in ms (vowels and consonants)
Contrast (vowels only): F1 / F2 distance to (300, 1450) Hz in semitones
Spectral Center of Gravity (CoG) (V and C): weighted mean frequency in semitones at the point of maximum energy
Log2(Perplexity) from consonant identification: calculated from confusion matrices
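
Two of these measures can be sketched in a few lines. The slides give only the reference point (300, 1450) Hz and the unit, so both definitions below are assumptions: contrast is taken as the Euclidean distance in the (F1, F2) plane after converting each formant to semitones relative to its reference, and log2(perplexity) is taken as the entropy in bits of the listeners' responses to one stimulus (one row of a confusion matrix). The function names are illustrative.

```python
import math

REF_F1_HZ, REF_F2_HZ = 300.0, 1450.0  # reference point from the slide

def semitones(f_hz, ref_hz):
    """Interval between f_hz and ref_hz in semitones (12 per octave)."""
    return 12.0 * math.log2(f_hz / ref_hz)

def vowel_contrast(f1_hz, f2_hz):
    """Assumed definition: Euclidean distance to the reference point
    in the (F1, F2) semitone plane."""
    return math.hypot(semitones(f1_hz, REF_F1_HZ),
                      semitones(f2_hz, REF_F2_HZ))

def log2_perplexity(response_counts):
    """Assumed definition: entropy (bits) of the response distribution
    for one stimulus, i.e. one row of a confusion matrix."""
    total = sum(response_counts)
    if total <= 0:
        raise ValueError("row must contain at least one response")
    return -sum((n / total) * math.log2(n / total)
                for n in response_counts if n > 0)
```

Under these assumptions, a vowel whose F1 is one octave above the reference and whose F2 equals the reference has a contrast of 12 semitones, and a consonant that all listeners identify the same way has a log2(perplexity) of 0 bits.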

8
METHODS: ANALYSIS
Polyphone Corpus: Vowels only

Loudness: in sone
Spectral Center of Gravity (CoG): weighted mean frequency in semitones, averaged over the segment
Prominence (1-10): the number of 'PROMINENT' listener judgements
0-5 is considered Unaccented
6-10 is considered Accented
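
A minimal sketch of the CoG and the accent split follows. It assumes the CoG is a power-weighted mean over the bins of a single spectrum, converted to semitones afterwards; the slides do not state the weighting, the order of averaging and conversion, or the semitone reference, so the 100 Hz reference and the function names are hypothetical. The accent threshold (0 to 5 votes Unaccented, 6 to 10 Accented) is taken from the slide.

```python
import math

def center_of_gravity_hz(freqs_hz, power):
    """Power-weighted mean frequency of one spectrum, in Hz."""
    total = sum(power)
    if total <= 0:
        raise ValueError("spectrum has no energy")
    return sum(f * p for f, p in zip(freqs_hz, power)) / total

def hz_to_semitones(f_hz, ref_hz=100.0):
    """Convert Hz to semitones; the 100 Hz reference is an assumption."""
    return 12.0 * math.log2(f_hz / ref_hz)

def accent_label(prominent_votes):
    """Map the number of 'PROMINENT' judgements (0-10) to an accent class."""
    if not 0 <= prominent_votes <= 10:
        raise ValueError("expected between 0 and 10 judgements")
    return "Accented" if prominent_votes >= 6 else "Unaccented"

# Toy two-bin spectrum with equal power at 100 Hz and 300 Hz.
cog_st = hz_to_semitones(center_of_gravity_hz([100.0, 300.0], [1.0, 1.0]))
```

For a whole segment, one option would be to compute this value per analysis frame and average the results.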

9
CONSISTENCY OF MEASUREMENTS
Correlation coefficients between factors

[Figure: correlation coefficients; panels: Single Speaker, Polyphone; filled symbols: p < 0.01]

Duration in ms; Loudness in sones
CoG: Spectral Center of Gravity (semitones)
Px: log2(Perplexity), plotted is R
Contrast: F1 / F2 distance to (300, 1450) Hz
(semitones)

10
CONSONANT REDUCTION VERSUS FREQUENCY OF
OCCURRENCE (correlation coefficients)
Single speaker corpus (n = 1582)
[Figure: correlation coefficients; filled symbols: p < 0.01]

CoG: Spectral Center of Gravity (semitones)
Perplexity: log2(Perplexity), plotted is R.
Syllable and word frequencies were correlated
(R = 0.230, p0.01)

11
VOWEL REDUCTION VERSUS FREQUENCY OF
OCCURRENCE (correlation coefficients)
Single speaker corpus (n = 2025)
[Figure: correlation coefficients; filled symbols: p < 0.01]

Duration in ms
Contrast: F1 / F2 distance to (300, 1450) Hz
(semitones)
CoG: Spectral Center of Gravity (semitones)
Syllable and word frequencies were correlated
(R = 0.280, p < 0.01)

12
DISCUSSION OF SINGLE SPEAKER DATA

There are consistent correlations between
frequency of occurrence and acoustic reduction
(duration, CoG and contrast), but not for
consonant identification (perplexity)
Correlations for syllable frequencies tend to be
larger than those for word frequencies (p?0.01)
Correlations were found after accounting for
Phoneme identity, Lexical Stress and Sentence
Accent
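
The slides do not say how phoneme identity, stress and accent were accounted for. One common approach, shown here purely as an assumption, is to subtract the mean of each (phoneme, stress, accent) group from every observation and correlate the residuals. The data and names below are hypothetical toy values.

```python
import math
from collections import defaultdict

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def center_within_groups(values, groups):
    """Subtract each group's mean, removing between-group differences."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]

# Toy data: two groups standing in for (phoneme, stress, accent) cells.
groups = ["A", "A", "A", "B", "B", "B"]
freq_bits = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]       # -log2(frequency)
duration = [30.0, 20.0, 10.0, 130.0, 120.0, 110.0]  # ms

r_raw = pearson_r(freq_bits, duration)
r_within = pearson_r(center_within_groups(freq_bits, groups),
                     center_within_groups(duration, groups))
```

In this toy example the raw correlation is positive only because group B is both rarer and longer overall; within each group the relation is perfectly negative. This illustrates why the grouping factors need to be removed before interpreting a frequency effect.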

13
PROMINENCE VERSUS VOWEL REDUCTION AND FREQUENCY
OF OCCURRENCE (correlation coefficients)
Polyphone corpus (n = 22496)
[Figure: correlations of prominence with Loudness, CoG, Syllable freq. and Word freq.; filled symbols: p < 0.01]

Loudness (sone)
CoG: Spectral Center of Gravity (semitones)
Syllable and word frequencies (-log2(freq))

14
VOWEL REDUCTION VERSUS FREQUENCY OF
OCCURRENCE (correlation coefficients)
Polyphone corpus (n = 22496)
[Figure: correlation coefficients; filled symbols: p < 0.01]
Accent: Prom > 5, Prom < 5

Loudness (sone)
CoG: Spectral Center of Gravity (semitones)
Syllable and word frequencies were correlated
(R = 0.316, p < 0.01)

15
DISCUSSION OF POLYPHONE DATA

Perceived prominence correlates with acoustic
vowel reduction (loudness, CoG) and frequency of
occurrence (syllable and word)
There are small but consistent correlations
between acoustic vowel reduction and frequency
of occurrence
Correlations were found after accounting for
Vowel identity, Lexical Stress and Prominence

16
CONCLUSIONS

LEXICAL STRESS and SENTENCE ACCENT / PROMINENCE cannot explain all of the efficiency of speech
FREQUENCY OF OCCURRENCE and possibly CONTEXT in general are needed for a full account
A SYLLABARY which speeds up (and reduces) the articulation of stored, high-frequency syllables with respect to computed, rare syllables might explain at least part of our data

17
SPOKEN LANGUAGE CORPUS
How Efficient is Speech

8-10 speakers, 60 minutes of speech each
(fixed and variable materials)
Informal story telling and retold stories: 15 min
Reading continuous texts: 15 min
Reading isolated (pseudo-)sentences: 20 min
Word lists: 5 min
Syllable lists: 5 min

18
MEASURING SPEECH EFFICIENCY

Speaking Style differences
(Informal, Retold, Read, Sentences, Lists)
Predictability
Frequency of Occurrence (words and syllables)
In Context (language models)
Cloze-tests
Shadowing (RT or delay)
Acoustic Reduction
Segment identification
Duration
Spectral reduction
