SPEECH IS MORE THAN ONLY ITS LINGUISTIC CONTENT
WIKT 2006

1
SPEECH IS MORE THAN ONLY ITS LINGUISTIC CONTENT
Milan Rusko
Institute of Informatics of the Slovak Academy of Sciences
Dubravska cesta 9, 847 05 Bratislava, Slovakia
Milan.Rusko_at_savba.sk

2
Expressive speech

Expressive speech designates the whole vocal display of a speaker.
It consists of:
Linguistic information: the part of the information that can be encoded in a general written text message
Various additional information on the speaker: age, cultural background, education, sex, attempt, relation to the listener, individuality, etc.
(The expression "individuality" is used here to denote the personality, mood (attitude) and emotions of a speaker.)

3
Expressive speech
SPEAKER (transmitter)
Linguistic information L1
Age A1
Cultural background C1
Education E1
Sex S1
Attempt AT1
Relation to the listener R1
Individuality - personality - mood - emotions I1
Other X1 ... Xi
LISTENER (receiver)
Linguistic information L2
Age A2
Cultural background C2
Education E2
Sex S2
Attempt AT2
Relation to the listener R2
Individuality - personality - mood - emotions I2
Other Y1 ... Yk
Expression -> CODING --> SPEECH --> DECODING -> Impression
4
Personality (and temperament)
Personality is considered to be a set of constant
features of an individual. Temperament is that
aspect of personality that is genetically based,
inborn.

Ancient Greeks: 2 dimensions of temperament -> 4 types of temperament
sanguine type (cheerful and optimistic, pleasant to be with)
choleric type (quick, hot temper, often an aggressive nature)
phlegmatic type (characterized by slowness, laziness, and dullness)
melancholy type (sad, even depressed, pessimistic view of the world)

5
Generalized model of personality
A personality p has n dimensions, so it can be represented by a vector of n values (Egges, A., Kshirsagar, S., Magnenat-Thalmann, N. [2]).
6
The OCEAN model: the Big Five model of personality

Five dimensions are enough to express the personality.
The Big Five model, also known as the OCEAN model, takes into account the following five dimensions of personality:
Openness
Conscientiousness
Extraversion
Agreeableness
Neuroticism
(Digman, J. M. [3]; McCrae, R. R., John, O. P. [4])

7
Traditional psychological classification of personality dimensions: Five Factor Model (Digman 1990; McCrae, John 1992)
8
Mood and Emotion
Mood (attitude) can be defined as a rather static state of being that is less static than personality and less fluent than emotions. Mood can be defined as one-dimensional (e.g. good or bad mood) or perhaps multi-dimensional (feeling in love, being paranoid, etc.) (Kshirsagar, Magnenat-Thalmann [5]).
9
Generalized model of emotion
An emotional state has a structure similar to that of personality, but it changes over time. It is defined as an m-dimensional vector e_t, where each of the m emotion intensities is represented by a value in the interval [0, 1].
The actual emotional state depends on the previous development of the emotions, so the emotions need to be modelled with respect to their previous trends (history). An emotional state history ω_t is defined that contains all emotional states up to e_t, thus ω_t = (e_0, e_1, ..., e_t).
10
Generalized model of mood
Egges continues by defining the individual I_t as a triple (p, m_t, e_t), where m_t represents the mood of the individual at time t. Each mood dimension is defined as a value in the interval [-1, 1]. With k mood dimensions, the mood m_t can be described as a vector of k such values.
The mood and the emotional values change over time, so both have to be updated regularly.
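
The structure of the individual described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the names Individual and update are hypothetical and do not come from Egges et al., the range of the personality values is not stated in this transcript and is therefore not checked, and the sketch does not include the actual rules by which mood and emotion are updated (the new values are simply supplied by the caller).

```python
from dataclasses import dataclass, field
from typing import List


def _check_range(values, low, high, name):
    """Raise ValueError if any value lies outside [low, high]."""
    for v in values:
        if not low <= v <= high:
            raise ValueError(f"{name} value {v} outside [{low}, {high}]")


@dataclass
class Individual:
    """Individual I_t = (p, m_t, e_t) plus the emotional state history."""
    personality: List[float]   # p: n values, constant over time (range not checked)
    mood: List[float]          # m_t: k values, each in [-1, 1]
    emotion: List[float]       # e_t: m intensities, each in [0, 1]
    history: List[List[float]] = field(default_factory=list)  # all states up to e_t

    def __post_init__(self):
        _check_range(self.mood, -1.0, 1.0, "mood")
        _check_range(self.emotion, 0.0, 1.0, "emotion")
        self.history.append(list(self.emotion))

    def update(self, new_mood, new_emotion):
        """Advance one time step; the personality is left untouched."""
        _check_range(new_mood, -1.0, 1.0, "mood")
        _check_range(new_emotion, 0.0, 1.0, "emotion")
        self.mood = list(new_mood)
        self.emotion = list(new_emotion)
        self.history.append(list(new_emotion))
```

Keeping the history inside the object reflects the point made on the emotion slide: a new emotional state is meant to be computed with respect to the previous ones, so they have to stay available.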
11
Basic emotions
There are many theories of emotions and many different classifications exist. This table, taken from Ortony, A., Turner, T. J. [6], gives a short overview of basic emotion sets used by different authors.
12
Placement on emotion dimensions
Pleasure
Happy <-> Unhappy
Pleased <-> Annoyed
Satisfied <-> Unsatisfied
Contented <-> Melancholic
Hopeful <-> Despairing
Relaxed <-> Bored
Arousal
Stimulated <-> Relaxed
Excited <-> Calm
Frenzied <-> Sluggish
Jittery <-> Dull
Wide-awake <-> Sleepy
Aroused <-> Unaroused
Dominance
Controlling <-> Controlled
Influential <-> Influenced
In control <-> Cared-for
Important <-> Awed
Dominant <-> Submissive
Autonomous <-> Guided
Semantic differential scales are often used for measuring emotion dimensions. The set of dimensions above was proposed by Mehrabian and Russell (1974, Appendix B, p. 216) [7]. It is evident that the authors have included moods and personality dimensions in this system too.
13
Acoustic correlates of emotions
Problem: the speech parameters involved in the expression of personality, moods and emotions are shared by all the components of expressivity. Decoding the expressive speech code is very subjective.
Nevertheless, a general set of the speech parameters responsible for the expression of emotion can be constructed. There are three main categories of speech correlates of emotion:
Pitch contour
Timing
Voice quality
It is believed that value combinations of these speech parameters are used to express vocal emotion (Schröder, M. [8]).
14
Pitch contour
Pitch contour is a representation of the intonation of an utterance, which describes the nature of accents and the overall pitch range of the utterance. Pitch is expressed as fundamental frequency (F0). One of the most frequently used methods for F0 measurement is based on the autocorrelation function of the linear prediction (LP) residual.
Parameters include average pitch, pitch range, contour slope, and final lowering.
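
The F0 measurement mentioned above can be sketched for a single frame in plain Python. This is a minimal illustration, not the implementation used by the author: the function names, the LP order of 10 and the 60-400 Hz search range are assumptions, the frame is not windowed, and no voiced/unvoiced decision is made.

```python
def autocorr(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of the sequence x."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def lpc(x, order):
    """Prediction-error filter A(z) = 1 + a1 z^-1 + ... (Levinson-Durbin)."""
    r = autocorr(x, order)
    r[0] *= 1.0 + 1e-9             # tiny white-noise correction for stability
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err             # reflection coefficient of stage i
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a


def lp_residual(x, a):
    """Inverse-filter x with A(z); samples before the frame count as zero."""
    p = len(a) - 1
    return [sum(a[k] * x[n - k] for k in range(min(p, n) + 1))
            for n in range(len(x))]


def estimate_f0(frame, fs, f_min=60.0, f_max=400.0, order=10):
    """Return F0 in Hz from the autocorrelation peak of the LP residual."""
    if not any(frame):
        return None                # silent frame: nothing to measure
    res = lp_residual(frame, lpc(frame, order))
    lag_lo = int(fs / f_max)
    lag_hi = min(int(fs / f_min), len(frame) - 1)
    r = autocorr(res, lag_hi)
    best = max(range(lag_lo, lag_hi + 1), key=lambda lag: r[lag])
    return fs / best
```

Working on the residual rather than on the speech signal itself removes most of the formant structure, so the autocorrelation peak is dominated by the excitation period. Parameters such as average pitch and pitch range would then be computed from the sequence of per-frame F0 values.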
15
Intonation contour

Models of intonation - two main categories
Phonetic
Phonological
The phonetic models (e.g. Fujisaki model, Tilt
model, MOMEL and many others) model the
intonation curve.
The phonological model (e.g. ToBI) is used to
model the speaker's concept of distribution of
accents in the intonational phrase.

16
Automatic intonation contour analysis in Fujisaki
editor
17
Pitch contour analysis in PRAAT with ToBI labels
18
Timing

Speed at which an utterance is spoken
Rhythm
Duration of emphasized syllables
The results of measurement of syllable and phoneme lengths are often given in the form of z-scores (the instantaneous value is normalized with respect to the mean value, and the standard deviation, of the same elements in the whole database).
Parameters: speech rate, hesitation pauses, exaggeration...
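
The z-score normalization of segment durations can be sketched as follows. The function name and the input format (a list of label/duration pairs) are assumptions made for the illustration; the statistics are computed per label over whatever list is passed in, which stands in for the whole database.

```python
from collections import defaultdict
from statistics import mean, pstdev


def duration_z_scores(segments):
    """segments: list of (label, duration_in_seconds).

    Returns a list of (label, z), where z compares each duration with the
    mean and standard deviation of all segments carrying the same label.
    """
    by_label = defaultdict(list)
    for label, dur in segments:
        by_label[label].append(dur)
    stats = {lab: (mean(d), pstdev(d)) for lab, d in by_label.items()}
    out = []
    for label, dur in segments:
        mu, sigma = stats[label]
        # A label whose durations never vary carries no timing information.
        out.append((label, (dur - mu) / sigma if sigma > 0 else 0.0))
    return out
```

A positive z-score marks a segment that is longer than usual for its phoneme or syllable type, which makes lengthening comparable across units with very different typical durations.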

19
Voice quality

Voice quality denotes the overall character of the voice, which includes effects such as whispering, hoarseness, breathiness, and intensity.
The voice quality is influenced mainly by
the function of the glottis
the function of the vocal tract
A detailed classification scheme was published by Laver [9].

21
Analysis of the glottal function

The analysis of the glottal function is generally done using the source-filter model of speech production [10].
The glottal function is obtained from the speech signal by inverse filtering. One of the most efficient inverse filtering methods uses Discrete Linear Prediction (DLP) (El-Jaroudi, A., Makhoul, J. [11]) to obtain the inverse filter coefficients and to filter the speech signal.
The resultant DLP residual function is considered to represent the derivative of the glottal volume velocity function.

22
Time and spectral domain characteristics of the
glottal function

Time characteristics
OQ, Open Quotient: ratio of the open phase of the glottal waveform to the period of the pulse. OQ predicts the amplitudes of the lower harmonics (an increased value of OQ is correlated with an increase in the amplitude of the lower harmonics in the voice spectrum).
CQ, Closing Quotient: ratio of the closing phase of the glottal pulse to the period of the pulse.
These characteristics have recently often been replaced by AQ (amplitude quotient) and NAQ (normalized amplitude quotient) (Alku [12]).
EE, Excitation Strength: amplitude of the negative peak, calculated after the positive peak. EE is correlated with the overall intensity of the signal. A decrease in EE is correlated with a breathy voice.
RK, Glottal Symmetry/Skew: ratio of the closing phase to the opening phase of the differentiated glottal pulse. RK affects mainly the lower harmonics: the more symmetrical the pulse, the greater their amplitude.
Spectral characteristics
H1-H2: the amplitude of the first harmonic (H1) compared to the amplitude of the second harmonic (H2). An indicator of the relative length of the opening phase of the glottal pulse (Hanson 1997).
H1-A1: the amplitude of the first harmonic (H1) compared to the amplitude of the strongest harmonic in the first formant (A1). Reflects the first formant bandwidth.
Spectral tilt: expected to be large and positive for breathy voices and small and/or negative for creaky voices.
H1-A2: the amplitude of the first harmonic (H1) compared to the amplitude of the strongest harmonic in the second formant (A2). An indicator of spectral tilt at the mid formant frequencies. Large and positive for breathy voices and small and/or negative for creaky voices.
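
Some of the characteristics above reduce to simple ratios and level differences, as the following Python sketch shows. It is an illustration under assumptions: the function names are hypothetical, the glottal opening, flow-peak and closure instants are taken as already known, F0 is taken as known, and the harmonic levels are read directly from the signal without any correction for formant influence.

```python
import math


def open_quotient(t_open, t_close, period):
    """OQ: duration of the open phase divided by the pulse period."""
    return (t_close - t_open) / period


def closing_quotient(t_peak, t_close, period):
    """CQ: duration of the closing phase (flow peak to closure) / period."""
    return (t_close - t_peak) / period


def harmonic_level_db(x, fs, freq):
    """Level in dB of the sinusoidal component of x at the given frequency."""
    re = sum(v * math.cos(2 * math.pi * freq * n / fs) for n, v in enumerate(x))
    im = sum(v * math.sin(2 * math.pi * freq * n / fs) for n, v in enumerate(x))
    amplitude = 2.0 * math.hypot(re, im) / len(x)
    return 20.0 * math.log10(max(amplitude, 1e-12))


def h1_minus_h2(x, fs, f0):
    """H1-H2: level of the first harmonic minus level of the second, in dB."""
    return harmonic_level_db(x, fs, f0) - harmonic_level_db(x, fs, 2.0 * f0)
```

The level estimate is most reliable when the analysed frame spans a whole number of pitch periods; H1-A1 and H1-A2 would follow the same pattern, with the second term measured at the strongest harmonic near the respective formant.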

23
Glottal pulse analysis in APARAT
24
Analysis of the vocal tract

Methods of vocal tract shape estimation include x-ray, computed tomography and magnetic resonance methods.
- stationary sound production only
Cheaper and quicker method: computing the vocal tract shape from the speech signal, complementary to glottal pulse analysis from the speech signal (e.g. vocal tract shape computation from LPC-derived reflection coefficients).
- allows for analysis of the dynamic behavior of the articulators
Similar information can be obtained by formant analysis using homomorphic deconvolution (cepstrum) or LPC spectrum analysis.
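
The computation of a vocal tract shape from reflection coefficients can be sketched with the lossless tube model, in which each coefficient fixes the ratio of the cross-sectional areas of two neighbouring tube sections. The sketch below is an assumption-laden illustration: the function name is hypothetical, and the sign convention and the direction in which the sections are numbered differ between textbooks, so the area ratio may need to be inverted for coefficients coming from a particular LPC routine. Only relative areas are recovered.

```python
def area_function(reflection, first_area=1.0):
    """Relative areas of the tube sections of a lossless tube model.

    reflection: reflection coefficients k_1..k_p, each strictly inside (-1, 1).
    Convention assumed here: A[i+1] = A[i] * (1 - k_i) / (1 + k_i).
    """
    areas = [first_area]
    for k in reflection:
        if not -1.0 < k < 1.0:
            raise ValueError(f"reflection coefficient {k} outside (-1, 1)")
        areas.append(areas[-1] * (1.0 - k) / (1.0 + k))
    return areas
```

Repeating the computation frame by frame gives a sequence of area functions, which is what makes the dynamic behavior of the articulators observable from the speech signal alone.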

25
Static analysis by synthesis using articulatory
synthesizer
(TRACTSYN)
26
Dynamic analysis by synthesis (articulatory
synth. TRACTSYN)
27
Acoustic correlates of emotions applied in speech
synthesis
28
Vision: Speech Sound Mining

Aim: to extract information from supra-segmental and extra-linguistic layers
Where to look for information
time domain
a) quantity (lengths of segments)
b) rhythm
frequency domain
a) long term characteristics
b) short term characteristics
model based characteristics
a) glottal excitation function
b) articulatory model

29
Vision: Speech Sound Mining

How to define a set of speech sound objects?
Objective methods of analysis (pattern
recognition)
Subjective methods (impression of the listener)
Possible objects
Speech sound event
Speech sound act
Speech sound gesture
Speech sound characteristic
Speech sound characteristic change

30
Vision: Speech Sound Mining

First steps to be accomplished
Speech corpus building
Annotation of speech sound objects (SSO)
Boundary markers
Frequencies of occurrence of SSO
Concordances of SSO
Correlation among different sets of objects (pitch SSO, accent SSO, rhythmic SSO, timbre SSO, etc.)
Semantic representation of SSO
Cross-cultural semantic analysis
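
Two of the steps listed above, frequencies of occurrence and concordances, can be sketched in Python once an utterance is available as a sequence of SSO labels. The function names and the label strings used in the usage note are hypothetical; how the labels are obtained (annotation, pattern recognition) is outside the sketch.

```python
from collections import Counter


def sso_frequencies(labels):
    """Count how often each SSO label occurs in the annotated sequence."""
    return Counter(labels)


def concordance(labels, target, width=2):
    """List every occurrence of target with up to `width` labels of context.

    Returns tuples (left_context, target, right_context).
    """
    hits = []
    for i, lab in enumerate(labels):
        if lab == target:
            left = labels[max(0, i - width):i]
            right = labels[i + 1:i + 1 + width]
            hits.append((left, lab, right))
    return hits
```

For a label sequence such as ["rise", "pause", "fall", "rise", "creak", "fall"], sso_frequencies gives the counts per label and concordance(labels, "rise", width=1) lists each "rise" together with its immediate neighbours, in the same way a word concordance is built in text corpora.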

31
Vision: Speech Sound Mining

Traditional methods used in NLP and data mining will be applicable
Bag of words -> Bag of SSO
WordNet -> SSO semantic net
etc.
Research on the relation between linguistic and paralinguistic/extralinguistic information.
Creation of a complex (holistic) model of the speech signal as an information carrier in communication.

32
Thank you for your attention
Milan Rusko
Institute of Informatics, Slovak Academy of Sciences
Milan.Rusko_at_savba.sk
