Title: SPEECH IS MORE THAN ONLY ITS LINGUISTIC CONTENT
1SPEECH IS MORE THAN ONLY ITS LINGUISTIC CONTENT
Institute of Informatics of the Slovak Academy of
Sciences
Rusko Milan
- Institute of Informatics of the Slovak Academy of
Sciences - Dubravska cesta 9, 847 05 Bratislava, Slovakia
- Milan.Rusko_at_savba.sk
2Expressive speech
- Expressive speech designates the whole vocal display of a speaker.
- It consists of:
- Linguistic information: the part of the information that can be encoded in a general written text message
- Various additional information on the speaker:
- age, cultural background, education, sex, attempt, relation to the listener, individuality, etc.
- (The expression "individuality" is used here to denote the personality, mood (attitude) and emotions of a speaker.)
3Expressive speech
SPEAKER (transmitter)
Linguistic information L1
Age A1
Cultural background C1
Education E1
Sex S1
Attempt AT1
Relation to the listener R1
Individuality (personality, mood, emotions) I1
Other X1 ... Xi
Expression -> CODING -> SPEECH -> DECODING -> Impression
LISTENER (receiver)
Linguistic information L2
Age A2
Cultural background C2
Education E2
Sex S2
Attempt AT2
Relation to the listener R2
Individuality (personality, mood, emotions) I2
Other Y1 ... Yk
4Personality (and temperament)
Personality is considered to be a set of constant
features of an individual. Temperament is that
aspect of personality that is genetically based,
inborn.
- Ancient Greeks: 2 dimensions of temperament -> 4 types of temperament
- sanguine type (cheerful and optimistic, pleasant to be with)
- choleric type (quick, hot temper, often an aggressive nature)
- phlegmatic type (characterized by slowness, laziness, and dullness)
- melancholy type (sad, even depressed, pessimistic view of the world)
5Generalized model of personality
Personality p has n dimensions, and so it can be
represented by an n-dimensional vector (Egges, A.,
Kshirsagar, S., Magnenat-Thalmann, N. [2]).
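A possible way to write the personality vector described above is sketched here. The symbol names (alpha) and the [0, 1] range for each dimension are assumptions taken from the usual presentation of this generalized model; they are not stated in the slide text.

```latex
p^{T} = [\alpha_1, \ldots, \alpha_n], \qquad \forall i \in [1, n]:\ \alpha_i \in [0, 1]
```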
6The OCEAN model: The Big Five model of
personality
- Five dimensions are enough to express the personality.
- The Big Five model, also known as the OCEAN model, takes into account the following five dimensions of personality:
- Openness
- Conscientiousness
- Extraversion
- Agreeableness
- Neuroticism
- (Digman, J. M. [3]; McCrae, R. R., John, O. P. [4])
7Traditional psychological classification of
personality dimensions: Five Factor Model
(Digman 1990; McCrae, John 1992)
8Mood and Emotion
Mood (attitude) can be defined as a rather static
state of being that is less static than
personality and less fluent than emotions. Mood
can be defined as one-dimensional (e.g. good or
bad mood) or perhaps multi-dimensional (feeling
in love, being paranoid, etc.) (Kshirsagar,
Magnenat-Thalmann [5]).
9Generalized model of emotion
An emotional state has a structure similar to that of
personality, but it changes over time. It is defined
as an m-dimensional vector, where each of the m emotion
intensities is represented by a value in the
interval [0, 1].
The actual emotional state depends on the
preceding evolution of emotions, so emotions need to be
modeled with respect to their previous
trends (history). An emotional state history ω_t
is defined that contains all emotional states
until e_t.
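The emotional state and its history described above can be written compactly as follows. This is a sketch: the symbol names (beta) and the all-zero state at t = 0 are assumptions based on the usual presentation of this generalized model, not on the slide text.

```latex
e_t^{T} =
\begin{cases}
[\beta_1, \ldots, \beta_m], \quad \forall i \in [1, m]:\ \beta_i \in [0, 1] & \text{if } t > 0 \\
\mathbf{0} & \text{if } t = 0
\end{cases}
\qquad
\omega_t = \langle e_0, e_1, \ldots, e_t \rangle
```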
10Generalized model of mood
Egges continues by defining the individual I_t as
a triple (p, m_t, e_t), where m_t represents the
mood of the individual at time t. Each mood
dimension is defined as a value in the interval
[-1, 1]. With k mood dimensions, the mood can be
described as a k-dimensional vector of such values.
The mood and emotional values change over
time, so both have to be updated regularly.
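The individual and the mood vector described above can be sketched in the same notation. The symbol names (gamma) are an assumption; the [-1, 1] range and the triple come from the text.

```latex
I_t = (p, m_t, e_t), \qquad
m_t^{T} = [\gamma_1, \ldots, \gamma_k], \quad \forall i \in [1, k]:\ \gamma_i \in [-1, 1]
```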
11Basic emotions
There are many theories of emotions, and many
different classifications exist. This table,
taken from Ortony, A., Turner, T. J. [6], gives a
short overview of the basic emotion sets used by
different authors.
12Placement on emotion dimensions
Pleasure
- Happy <-> Unhappy
- Pleased <-> Annoyed
- Satisfied <-> Unsatisfied
- Contented <-> Melancholic
- Hopeful <-> Despairing
- Relaxed <-> Bored
Arousal
- Stimulated <-> Relaxed
- Excited <-> Calm
- Frenzied <-> Sluggish
- Jittery <-> Dull
- Wide-awake <-> Sleepy
- Aroused <-> Unaroused
Dominance
- Controlling <-> Controlled
- Influential <-> Influenced
- In control <-> Cared-for
- Important <-> Awed
- Dominant <-> Submissive
- Autonomous <-> Guided
Semantic differential scales are often used for
measuring emotion dimensions. This set of
dimensions was proposed by Mehrabian and Russell
(1974, Appendix B, p. 216) [7]. It is evident
that the authors have included moods and
personality dimensions in this system too.
13Acoustic correlates of emotions
Problem: the speech parameters involved in the expression
of personality, moods and emotions are shared by
all the components of expressivity. Decoding the
expressive speech code is very subjective.
Nevertheless, a general set of speech
parameters responsible for the expression of
emotion can be constructed. There are three main
categories of speech correlates of emotion:
- Pitch contour
- Timing
- Voice quality
It is believed that value combinations of
these speech parameters are used to express vocal
emotion (Schröder, M. [8]).
14Pitch contour
Pitch contour is a representation of the
intonation of an utterance, which describes the
nature of accents and the overall pitch range of
the utterance. Pitch is expressed as fundamental
frequency (F0). One of the most frequently used
methods for F0 measurement uses the
autocorrelation function of the linear prediction (LP)
residual.
Parameters include average pitch, pitch
range, contour slope, and final lowering.
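The F0 measurement mentioned above (autocorrelation of the LP residual) can be sketched as follows. This is a minimal illustration, not the implementation used in the talk: the function names, the LP order of 12, the 60-400 Hz search range and the synthetic test signal are all assumptions made for the example.

```python
import numpy as np


def lpc_inverse_filter(frame, order):
    """Autocorrelation-method LP analysis; returns A(z) coefficients, a[0] == 1."""
    n = len(frame)
    r = np.correlate(frame, frame, mode="full")[n - 1 : n + order]
    # Toeplitz normal equations R a = -r[1:], lightly regularized for stability.
    idx = np.abs(np.arange(order)[:, None] - np.arange(order)[None, :])
    R = r[idx] + 1e-9 * r[0] * np.eye(order)
    a = np.linalg.solve(R, -r[1 : order + 1])
    return np.concatenate(([1.0], a))


def estimate_f0(frame, fs, order=12, fmin=60.0, fmax=400.0):
    """Estimate F0 (Hz) of one voiced frame from the LP residual autocorrelation."""
    windowed = frame * np.hamming(len(frame))
    a = lpc_inverse_filter(windowed, order)
    # Inverse filtering: the residual is the frame filtered by A(z).
    residual = np.convolve(windowed, a)[: len(frame)]
    ac = np.correlate(residual, residual, mode="full")[len(frame) - 1 :]
    # Search for the strongest autocorrelation peak in the allowed lag range.
    lo = int(fs / fmax)
    hi = min(int(fs / fmin), len(frame) - 1)
    lag = lo + int(np.argmax(ac[lo : hi + 1]))
    return fs / lag


def synthetic_voiced_frame(fs=8000, f0=100.0, n=512):
    """Toy test signal: impulse train passed through one two-pole resonance."""
    x = np.zeros(n)
    x[:: int(round(fs / f0))] = 1.0
    r, theta = 0.97, 2.0 * np.pi * 700.0 / fs
    y = np.zeros(n)
    for i in range(n):
        y[i] = x[i]
        if i >= 1:
            y[i] += 2.0 * r * np.cos(theta) * y[i - 1]
        if i >= 2:
            y[i] -= r * r * y[i - 2]
    return y


f0 = estimate_f0(synthetic_voiced_frame(), fs=8000)
```

Whitening the frame with the inverse filter removes most of the formant structure, so the autocorrelation peak reflects the pitch period rather than a vocal tract resonance. A practical tracker would add a voicing decision and smoothing across frames.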
15Intonation contour
- Models of intonation: two main categories
- Phonetic
- Phonological
- The phonetic models (e.g. Fujisaki model, Tilt model, MOMEL and many others) model the intonation curve.
- The phonological model (e.g. ToBI) is used to model the speaker's concept of the distribution of accents in the intonational phrase.
16Automatic intonation contour analysis in Fujisaki
editor
17Pitch contour analysis in PRAAT with ToBI labels
18Timing
- Timing
- Speed at which an utterance is spoken
- Rhythm
- Duration of emphasized syllables
- The results of measurements of syllable and phoneme lengths are often given in the form of z-scores (the instantaneous value is normalized by the mean value and standard deviation of the same elements in the whole database).
- Parameters: speech rate, hesitation pauses, exaggeration...
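The z-score normalization of segment lengths described above can be sketched as follows. The function name, the input layout (a list of label and duration pairs) and the example durations are hypothetical; the sketch uses the population standard deviation of each label over the whole list as a stand-in for "the whole database".

```python
from collections import defaultdict
from statistics import mean, pstdev


def duration_z_scores(segments):
    """Return one z-score per (label, duration) pair, normalized per label."""
    by_label = defaultdict(list)
    for label, duration in segments:
        by_label[label].append(duration)
    # Mean and standard deviation of each phoneme/syllable label in the data.
    stats = {lab: (mean(d), pstdev(d)) for lab, d in by_label.items()}
    scores = []
    for label, duration in segments:
        mu, sigma = stats[label]
        # A label with no variation carries no duration information.
        scores.append((duration - mu) / sigma if sigma > 0 else 0.0)
    return scores


# Hypothetical durations in milliseconds.
segments = [("a", 80), ("a", 100), ("a", 120), ("t", 50)]
z = duration_z_scores(segments)
```

A positive score marks a segment that is longer than usual for its label, which is what makes lengthening under emphasis comparable across different phonemes.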
19Voice quality
- Voice quality denotes the overall character of the voice, which includes effects such as whispering, hoarseness, breathiness, and intensity.
- The voice quality is influenced mainly by:
- function of the glottis
- function of the vocal tract
- A detailed classification scheme was published by Laver [9].
21Analysis of the glottal function
- The analysis of the glottal function is generally done using the source-filter model of speech production [10].
- The glottal function is obtained from the speech signal by inverse filtering. One of the most efficient inverse filtering methods uses Discrete Linear Prediction, DLP (El-Jaroudi A., Makhoul J. [11]), to obtain the inverse filter coefficients and to filter the speech signal.
- The resultant DLP residual function is considered a representative of the derivative of the glottal volume velocity function.
22Time and spectral domain characteristics of the
glottal function
- Time characteristics
- OQ, Open Quotient: ratio of the open phase of the glottal waveform to the period of the pulse. OQ predicts the amplitudes of the lower harmonics (an increased value of OQ is correlated with an increase in the amplitude of the lower harmonics in the voice spectrum).
- CQ, Closing Quotient: ratio of the closing phase of the glottal pulse to the period of the pulse.
- These characteristics have recently often been replaced by AQ (Amplitude Quotient) and NAQ (Normalized Amplitude Quotient) (Alku [12]).
- EE, Excitation Strength: amplitude of the negative peak, calculated after the positive peak. EE is correlated with the overall intensity of the signal. A decrease in EE is correlated with a breathy voice.
- RK, Glottal Symmetry/Skew: ratio of the closing phase to the opening phase of the differentiated glottal pulse. RK affects mainly the lower harmonics: the more symmetrical the pulse, the greater their amplitude.
- Spectral characteristics
- H1-H2: the amplitude of the first harmonic (H1) compared to the amplitude of the second harmonic (H2). An indicator of the relative length of the opening phase of the glottal pulse (Hanson 1997).
- H1-A1: the amplitude of the first harmonic (H1) compared to the strongest harmonic in the first formant (A1). Reflects the first formant bandwidth and spectral tilt. Expected to be large and positive for breathy voices and small and/or negative for creaky voices.
- H1-A2: the amplitude of the first harmonic (H1) compared to the amplitude of the strongest harmonic in the second formant (A2). An indicator of spectral tilt at the mid formant frequencies. Large and positive for breathy voices and small and/or negative for creaky voices.
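As an illustration of the spectral measures listed above, H1-H2 can be read off a magnitude spectrum once F0 is known. The sketch below is an assumption-laden simplification: the function names and the 20 Hz search band are hypothetical, F0 is taken as given, and no correction for formant influence is applied.

```python
import numpy as np


def harmonic_level_db(frame, fs, freq, search_hz=20.0):
    """Peak spectral level in dB within +/- search_hz of the given frequency."""
    windowed = frame * np.hanning(len(frame))
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    band = (freqs >= freq - search_hz) & (freqs <= freq + search_hz)
    return 20.0 * np.log10(np.max(spectrum[band]) + 1e-12)


def h1_minus_h2(frame, fs, f0):
    """Level of the first harmonic minus level of the second harmonic, in dB."""
    return harmonic_level_db(frame, fs, f0) - harmonic_level_db(frame, fs, 2.0 * f0)


# Toy signal: second harmonic at half the amplitude of the first.
fs = 8000
t = np.arange(800) / fs
frame = 1.0 * np.sin(2 * np.pi * 100 * t) + 0.5 * np.sin(2 * np.pi * 200 * t)
difference_db = h1_minus_h2(frame, fs, 100.0)  # about 6 dB for this toy signal
```

H1-A1 and H1-A2 follow the same pattern, with the second term replaced by the strongest harmonic near the first or second formant frequency, which would have to be estimated separately.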
23Glottal pulse analysis in APARAT
24Analysis of the vocal tract
- Methods of vocal tract shape estimation include x-ray, computed tomography and magnetic resonance methods.
- stationary sound production only
- A cheaper and quicker method: computing the vocal tract shape from the speech signal
- complementary to glottal pulse analysis from the speech signal (e.g. vocal tract shape computation from LPC-derived reflection coefficients)
- allows for analysis of the dynamic behavior of the articulators
- Similar information can be obtained by formant analysis using homomorphic deconvolution (cepstrum) or LPC spectrum analysis.
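The computation of a vocal tract shape from LPC-derived reflection coefficients mentioned above can be sketched with the Levinson-Durbin recursion and a lossless-tube area recursion. This is only a sketch under stated assumptions: the function names are hypothetical, the sign convention of the reflection coefficients and the end of the tube at which the recursion starts differ between textbooks, and the resulting areas are relative, not absolute.

```python
import numpy as np


def levinson_durbin(r, order):
    """Return (a, k): inverse-filter coefficients (a[0] == 1) and reflection coefficients."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    k = np.zeros(order)
    err = r[0]
    for i in range(1, order + 1):
        # Correlation between the current prediction error and the next lag.
        acc = r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])
        k_i = -acc / err
        prev = a.copy()
        a[1:i] = prev[1:i] + k_i * prev[i - 1 : 0 : -1]
        a[i] = k_i
        k[i - 1] = k_i
        err *= 1.0 - k_i * k_i
    return a, k


def tube_areas(k, start_area=1.0):
    """Relative cross-section areas of a lossless tube model (one assumed convention)."""
    areas = [start_area]
    for k_i in k:
        areas.append(areas[-1] * (1.0 - k_i) / (1.0 + k_i))
    return np.array(areas)


# Autocorrelation of a first-order process: only the first section changes area.
r = np.array([1.0, 0.5, 0.25])
a, k = levinson_durbin(r, order=2)
areas = tube_areas(k)
```

Running this frame by frame on real speech gives a time-varying area function, which is what allows the dynamic behavior of the articulators to be followed.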
25Static analysis by synthesis using articulatory
synthesizer
(TRACTSYN)
26Dynamic analysis by synthesis (articulatory
synth. TRACTSYN)
27Acoustic correlates of emotions applied in speech
synthesis
28Vision: Speech Sound Mining
- Aim: to extract information from supra-segmental and extra-linguistic layers
- Where to look for information:
- time domain: a) quantity (lengths of segments), b) rhythm
- frequency domain: a) long-term characteristics, b) short-term characteristics
- model-based characteristics: a) glottal excitation function, b) articulatory model
29Vision: Speech Sound Mining
- How to define a set of speech sound objects (SSO)?
- Objective methods of analysis (pattern recognition)
- Subjective methods (impression of the listener)
- Possible objects:
- Speech sound event
- Speech sound act
- Speech sound gesture
- Speech sound characteristic
- Speech sound characteristic change
30Vision: Speech Sound Mining
- First steps to be accomplished:
- Speech corpus building
- Annotation of SSO
- Boundary markers
- Frequencies of occurrence of SSO
- Concordances of SSO
- Correlation among different sets of objects (pitch SSO, accent SSO, rhythmic SSO, timbre SSO, etc.)
- Semantic representation of SSO
- Cross-cultural semantic analysis
31Vision: Speech Sound Mining
- Traditional methods used in NLP and data mining will be applicable:
- Bag of words -> Bag of SSO
- WordNet -> SSO semantic net
- etc.
- Research on the relation between linguistic and paralinguistic/extralinguistic information.
- Creation of a complex (holistic) model of the speech signal as an information carrier in communication.
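The "bag of words -> bag of SSO" analogy above can be sketched in a few lines. Everything here is hypothetical: the label names, the idea that an utterance is annotated as a flat sequence of SSO labels, and the use of cosine similarity as the comparison measure are assumptions made only to show how a standard text-mining representation would carry over.

```python
from collections import Counter
from math import sqrt


def bag_of_sso(labels):
    """Count how often each speech sound object label occurs in an utterance."""
    return Counter(labels)


def cosine_similarity(bag_a, bag_b):
    """Cosine similarity between two bags of SSO counts (0.0 if either is empty)."""
    dot = sum(count * bag_b[label] for label, count in bag_a.items())
    norm_a = sqrt(sum(c * c for c in bag_a.values()))
    norm_b = sqrt(sum(c * c for c in bag_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical annotations of two utterances.
utterance_1 = bag_of_sso(["pitch_rise", "pause", "pitch_rise", "creak"])
utterance_2 = bag_of_sso(["pitch_fall", "pause", "pitch_fall"])
similarity = cosine_similarity(utterance_1, utterance_2)
```

The same counts directly give the frequencies of occurrence of SSO, and keeping the label order instead of discarding it would be the starting point for concordances.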
32Thank you for your attention
Milan Rusko Institute of Informatics Slovak
Academy of Sciences Milan.Rusko_at_savba.sk