SPEECH IS MORE THAN ONLY ITS LINGUISTIC CONTENT

Description:

... is a representation of the intonation of an utterance, which describes the ... Automatic intonation contour analysis in Fujisaki editor. WIKT 2006 ...



1
SPEECH IS MORE THAN ONLY ITS LINGUISTIC CONTENT
Milan Rusko
  • Institute of Informatics of the Slovak Academy of
    Sciences
  • Dubravska cesta 9, 847 05 Bratislava, Slovakia
  • Milan.Rusko_at_savba.sk

2
Expressive speech
  • Expressive speech designates the whole vocal
    display of a speaker.
  • It consists of:
  • Linguistic information: the part of the information
    that can be encoded in a general written text message
  • Various additional information on the speaker:
  • age, cultural background, education, sex,
    attempt, relation to the listener, individuality,
    etc.
  • (The term "individuality" is used here to
    denote the personality, mood (attitude) and emotions
    of a speaker.)

3
Expressive speech
SPEAKER (transmitter)
  • Linguistic information L1
  • Age A1
  • Cultural background C1
  • Education E1
  • Sex S1
  • Attempt AT1
  • Relation to the listener R1
  • Individuality (personality, mood, emotions) I1
  • Other X1 ... Xi
Expression -> CODING --> SPEECH --> DECODING -> Impression
LISTENER (receiver)
  • Linguistic information L2
  • Age A2
  • Cultural background C2
  • Education E2
  • Sex S2
  • Attempt AT2
  • Relation to the listener R2
  • Individuality (personality, mood, emotions) I2
  • Other Y1 ... Yk
4
Personality (and temperament)
Personality is considered to be a set of constant
features of an individual. Temperament is that
aspect of personality that is genetically based,
inborn.
  • Ancient Greeks: 2 dimensions of temperament -> 4
    types of temperament
  • sanguine type (cheerful and optimistic, pleasant
    to be with)
  • choleric type (quick, hot temper, often an
    aggressive nature)
  • phlegmatic type (characterized by slowness,
    laziness, and dullness)
  • melancholy type (sad, even depressed,
    pessimistic view of world)

5
Generalized model of personality
A personality p has n dimensions, so it can be
represented as an n-dimensional vector (Egges, A.,
Kshirsagar, S., Magnenat-Thalmann, N. [2]).
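The vector itself is not reproduced in this transcript. A reconstruction consistent with the text could look as follows; the symbol names (alpha_i) and the value range [0, 1] for each personality dimension are assumptions, chosen to mirror the emotion intensities defined later in the talk.

```latex
p^{T} = [\alpha_1, \ldots, \alpha_n], \qquad \forall i \in [1, n] : \alpha_i \in [0, 1]
```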
6
The OCEAN model: the Big Five model of
personality
  • Five dimensions are enough to express the
    personality.
  • The Big Five model, also known as the OCEAN model,
    takes into account the following five dimensions
    of personality:
  • Openness
  • Conscientiousness
  • Extraversion
  • Agreeableness
  • Neuroticism
  • (Digman, J. M. [3]; McCrae, R. R., John, O. P. [4])

7
Traditional psychological classification of
personality dimensions: the Five Factor Model
(Digman 1990; McCrae, John 1992)
8
Mood and Emotion
Mood (attitude) can be defined as a rather static
state of being that is less static than
personality and less fluid than emotions. Mood
can be defined as one-dimensional (e.g. good or
bad mood) or perhaps multi-dimensional (feeling
in love, being paranoid, etc.) (Kshirsagar,
Magnenat-Thalmann [5]).
9
Generalized model of emotion
An emotional state has a similar structure to
personality, but it changes over time. It is defined
as an m-dimensional vector in which all m emotion
intensities are represented by a value in the
interval [0, 1].
The actual emotional state depends on the
previous development of emotions, so emotions need
to be modelled with respect to their previous
trends (history). An emotional state history ω_t
is therefore defined, which contains all emotional
states up to e_t.
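The formulas for this slide are missing from the transcript. A reconstruction that follows the wording above might read as below; the symbol names beta_i and omega_t are assumptions, since the original Greek letters did not survive extraction.

```latex
e_t^{T} = [\beta_1, \ldots, \beta_m], \qquad \forall i \in [1, m] : \beta_i \in [0, 1]

\omega_t = \langle e_0, e_1, \ldots, e_t \rangle
```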
10
Generalized model of mood
Egges continues by defining the individual I_t as
a triple (p, m_t, e_t), where m_t represents the
mood of the individual at time t. Each mood
dimension is defined as a value in the interval
[-1, 1]. With k mood dimensions, the mood can be
described as a k-dimensional vector.
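The mood vector is not reproduced in the transcript either. Written in the same style as the other definitions, it could look like this; gamma_i is an assumed symbol name.

```latex
m_t^{T} = [\gamma_1, \ldots, \gamma_k], \qquad \forall i \in [1, k] : \gamma_i \in [-1, 1]

I_t = (p, m_t, e_t)
```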
The mood and emotional values change over
time -> both have to be updated regularly.
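A minimal sketch of how the triple (p, m_t, e_t) and its regular update could be held in code. The class name, the `update` method and the additive deltas are hypothetical; the talk does not say how mood and emotion are actually updated, only that the values stay in [-1, 1] and [0, 1] respectively.

```python
from dataclasses import dataclass, field
from typing import List


def clamp(value: float, low: float, high: float) -> float:
    """Keep a value inside the closed interval [low, high]."""
    return max(low, min(high, value))


@dataclass
class Individual:
    personality: List[float]   # p: n values, constant over time
    mood: List[float]          # m_t: k values in [-1, 1]
    emotion: List[float]       # e_t: m values in [0, 1]
    history: List[List[float]] = field(default_factory=list)  # e_0 ... e_t

    def __post_init__(self) -> None:
        # The history starts with the initial emotional state e_0.
        if not self.history:
            self.history.append(list(self.emotion))

    def update(self, mood_delta: List[float], emotion_delta: List[float]) -> None:
        """Advance one time step (additive deltas are an assumed update rule)."""
        self.mood = [clamp(m + d, -1.0, 1.0) for m, d in zip(self.mood, mood_delta)]
        self.emotion = [clamp(e + d, 0.0, 1.0) for e, d in zip(self.emotion, emotion_delta)]
        self.history.append(list(self.emotion))


person = Individual(personality=[0.8, 0.4, 0.6, 0.7, 0.2], mood=[0.0], emotion=[0.0, 0.0])
person.update(mood_delta=[0.5], emotion_delta=[0.7, 1.4])
person.update(mood_delta=[0.9], emotion_delta=[-0.2, 0.0])
print(person.mood, person.emotion, len(person.history))
```

Clamping keeps every component inside the interval given on the slides, and the personality vector is never touched by `update`, which reflects its role as the constant part of the individual.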
11
Basic emotions
There are many theories of emotions and many
different classifications exist. A table
taken from Ortony, A., Turner, T. J. [6] gives a
short overview of the basic emotion sets used by
different authors.
12
Placement on emotion dimensions
Pleasure
  • Happy <-> Unhappy
  • Pleased <-> Annoyed
  • Satisfied <-> Unsatisfied
  • Contented <-> Melancholic
  • Hopeful <-> Despairing
  • Relaxed <-> Bored
Arousal
  • Stimulated <-> Relaxed
  • Excited <-> Calm
  • Frenzied <-> Sluggish
  • Jittery <-> Dull
  • Wide-awake <-> Sleepy
  • Aroused <-> Unaroused
Dominance
  • Controlling <-> Controlled
  • Influential <-> Influenced
  • In control <-> Cared-for
  • Important <-> Awed
  • Dominant <-> Submissive
  • Autonomous <-> Guided
Semantic differential scales are often used for
measuring emotion dimensions. The set of
dimensions shown here was proposed by Mehrabian
and Russell (1974, Appendix B, p. 216) [7]. The
authors evidently included mood and
personality dimensions in this system too.
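As an illustration of how such semantic differential ratings could be turned into a point in the pleasure/arousal/dominance space, here is a small sketch. The rating range of -4 to +4 (positive values favouring the left-hand adjective), the item keys and the function name are assumptions for this example, not details taken from the talk.

```python
from statistics import mean

# Bipolar adjective pairs from the scales above, grouped by dimension.
PAD_SCALES = {
    "pleasure": ["happy-unhappy", "pleased-annoyed", "satisfied-unsatisfied",
                 "contented-melancholic", "hopeful-despairing", "relaxed-bored"],
    "arousal": ["stimulated-relaxed", "excited-calm", "frenzied-sluggish",
                "jittery-dull", "wide-awake-sleepy", "aroused-unaroused"],
    "dominance": ["controlling-controlled", "influential-influenced",
                  "in control-cared-for", "important-awed",
                  "dominant-submissive", "autonomous-guided"],
}


def pad_scores(ratings, scale_max=4):
    """Average the answered items of each dimension and rescale to [-1, 1]."""
    scores = {}
    for dimension, items in PAD_SCALES.items():
        answered = [ratings[item] for item in items if item in ratings]
        scores[dimension] = mean(answered) / scale_max if answered else None
    return scores


# Hypothetical listener ratings for one utterance.
example = {"happy-unhappy": 4, "pleased-annoyed": 2,
           "excited-calm": -2, "dominant-submissive": 0}
scores = pad_scores(example)
print(scores)
```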
13
Acoustic correlates of emotions
Problem: the speech parameters involved in the
expression of personality, moods and emotions are
shared among all the components of expressivity.
Decoding the expressive speech code is very
subjective. Nevertheless, a general set of the
speech parameters responsible for the expression of
emotion can be constructed. There are three main
categories of speech correlates of emotion:
  • Pitch contour
  • Timing
  • Voice quality
It is believed that value combinations of
these speech parameters are used to express vocal
emotion (Schröder, M. [8]).
14
Pitch contour
Pitch contour is a representation of the
intonation of an utterance, which describes the
nature of accents and the overall pitch range of
the utterance. Pitch is expressed as fundamental
frequency (F0). One of the most frequently used
methods for F0 measurement uses the
autocorrelation function of the LP (linear
prediction) residual. Parameters include average
pitch, pitch range, contour slope, and final lowering.
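A rough sketch of autocorrelation-based F0 estimation and of three of the contour parameters named above. For brevity the autocorrelation is taken on the raw frame, not on the LP residual mentioned on the slide, and final lowering is not computed. The search range of 60 to 400 Hz, the function names and the test values are assumptions.

```python
import math


def estimate_f0(frame, sample_rate, f0_min=60.0, f0_max=400.0):
    """Estimate F0 of one voiced frame from the highest autocorrelation peak."""
    lag_min = int(sample_rate / f0_max)
    lag_max = min(int(sample_rate / f0_min), len(frame) - 1)
    best_lag, best_value = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        value = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag if best_lag else None


def contour_parameters(times, f0_values):
    """Average pitch, pitch range and least-squares slope (Hz per second)."""
    n = len(f0_values)
    mean_t = sum(times) / n
    mean_f = sum(f0_values) / n
    covariance = sum((t - mean_t) * (f - mean_f) for t, f in zip(times, f0_values))
    variance = sum((t - mean_t) ** 2 for t in times)
    return {
        "average": mean_f,
        "range": max(f0_values) - min(f0_values),
        "slope": covariance / variance,
    }


# Synthetic 200 Hz tone, 40 ms at 8 kHz, standing in for a voiced frame.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(320)]
f0 = estimate_f0(frame, sample_rate)

# Hypothetical F0 track (seconds, Hz) with a falling contour.
params = contour_parameters([0.0, 0.1, 0.2, 0.3], [120.0, 115.0, 110.0, 105.0])
print(f0, params)
```

On real speech the raw-signal autocorrelation is easily misled by formant structure, which is one reason the LP residual is preferred in the method the slide describes.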
15
Intonation contour
  • Models of intonation - two main categories
  • Phonetic
  • Phonological
  • The phonetic models (e.g. Fujisaki model, Tilt
    model, MOMEL and many others) model the
    intonation curve.
  • The phonological model (e.g. ToBI) is used to
    model the speaker's concept of distribution of
    accents in the intonational phrase.

16
Automatic intonation contour analysis in Fujisaki
editor
17
Pitch contour analysis in PRAAT with ToBI labels
18
Timing
  • Speed at which an utterance is spoken
  • Rhythm
  • Duration of emphasized syllables
  • The results of measurements of syllable and
    phoneme lengths are often given in the form of
    z-scores
  • (the instantaneous value is normalized by the
    mean value and standard deviation of the same
    elements in the whole database).
  • Parameters: speech rate, hesitation pauses,
    exaggeration...
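The z-score normalization of segment durations can be sketched as follows. The function name and the sample durations are made up for illustration; in practice the reference list would hold all occurrences of the same phoneme or syllable type in the database.

```python
from statistics import mean, pstdev


def duration_z_scores(durations_ms, reference_ms):
    """Normalize durations by the mean and standard deviation of a reference set."""
    mu = mean(reference_ms)
    sigma = pstdev(reference_ms)
    return [(d - mu) / sigma for d in durations_ms]


# Hypothetical durations (ms) of one vowel type across the whole database.
reference = [60.0, 80.0, 100.0, 120.0, 140.0]
z = duration_z_scores([100.0, 140.0], reference)
print(z)
```

A z-score of 0 means the segment has the typical length for its type, while positive values indicate lengthening, for example on emphasized syllables.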

19
Voice quality
  • Voice quality denotes the overall character of
    the voice, which includes effects such as
    whispering, hoarseness, breathiness, and
    intensity.
  • Voice quality is influenced mainly by:
  • the function of the glottis
  • the function of the vocal tract
  • A detailed classification scheme was published by
    Laver [9].

21
Analysis of the glottal function
  • The analysis of the glottal function is generally
    done using the source-filter model of speech
    production [10].
  • The glottal function is obtained from the speech
    signal by inverse filtering. One of the most
    efficient inverse filtering methods uses Discrete
    Linear Prediction (DLP) (El-Jaroudi, A., Makhoul,
    J. [11]) to obtain the inverse filter
    coefficients and to filter the speech signal.
  • The resulting DLP residual is considered
    to represent the derivative of the glottal
    volume velocity function.
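To make the idea of inverse filtering concrete, the sketch below uses ordinary linear prediction (autocorrelation method with the Levinson-Durbin recursion), not the DLP method named on the slide. It estimates the predictor from a frame and applies the inverse filter A(z) to obtain the residual. All function names and the synthetic signal are assumptions for illustration.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation values r[0..max_lag] of a finite frame."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations; returns (a, reflection) with a[0] == 1."""
    a = [1.0] + [0.0] * order
    error = r[0]
    reflection = []
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / error
        reflection.append(k)
        previous = a[:]
        for j in range(1, i):
            a[j] = previous[j] + k * previous[i - j]
        a[i] = k
        error *= 1.0 - k * k
    return a, reflection


def inverse_filter(x, a):
    """Apply the inverse filter A(z): e[n] = sum_k a[k] * x[n - k]."""
    return [
        sum(a[k] * x[n - k] for k in range(len(a)) if n - k >= 0)
        for n in range(len(x))
    ]


# Synthetic test signal: impulse response of a one-pole filter with pole 0.9.
signal = [0.9 ** n for n in range(200)]
coefficients, reflection = levinson_durbin(autocorrelation(signal, 2), order=2)
residual = inverse_filter(signal, coefficients)
print(coefficients, residual[:3])
```

For this toy signal the inverse filter recovers the single excitation impulse. On voiced speech the residual would instead show one main excitation peak per glottal cycle.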

22
Time and spectral domain characteristics of the
glottal function
  • Time characteristics:
  • OQ, Open Quotient: ratio of the open phase of
    the glottal waveform to the period of the pulse.
  • OQ predicts the values for the amplitudes of the
    lower harmonics (an increased value of OQ is
    correlated with an increase in the amplitude of
    the lower harmonics in the voice spectrum).
  • CQ, Closing Quotient: ratio of the closing phase
    of the glottal pulse to the period of the pulse.
  • These characteristics have recently often been
    replaced by AQ (amplitude quotient) and
    NAQ (normalized amplitude quotient) (Alku [12]).
  • EE, Excitation Strength: amplitude of the
    negative peak, calculated after the positive
    peak. EE is correlated with the overall intensity
    of the signal. A decrease in EE is correlated
    with a breathy voice.
  • RK, Glottal Symmetry/Skew: ratio of the closing
    phase to the opening phase of the differentiated
    glottal pulse. RK affects mainly the lower
    harmonics: the more symmetrical the pulse, the
    greater their amplitude.
  • Spectral characteristics:
  • H1-H2: the amplitude of the first harmonic (H1)
    compared to the amplitude of the second harmonic
    (H2). An indicator of the relative length of the
    opening phase of the glottal pulse (Hanson 1997).
  • H1-A1: the amplitude of the first harmonic (H1)
    compared to the strongest harmonic in the first
    formant (A1). Reflects the first formant
    bandwidth.
  • Spectral tilt: expected to be large and positive
    for breathy voices and small and/or negative for
    creaky voices.
  • H1-A2: the amplitude of the first harmonic (H1)
    compared to the amplitude of the strongest
    harmonic in the second formant (A2). An indicator
    of spectral tilt at the mid formant frequencies.
    Large and positive for breathy voices and small
    and/or negative for creaky voices.
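The quotients and the H1-H2 measure can be illustrated with a short sketch. It assumes that the opening and closing phase durations and the F0 of the frame are already known (for example from a glottal pulse analysis tool), that the open phase is the sum of the opening and closing phases, and that H1-H2 is expressed in dB. Function names and values are hypothetical.

```python
import math


def time_quotients(opening_ms, closing_ms, period_ms):
    """OQ, CQ and RK from the phase durations of one glottal pulse."""
    open_phase = opening_ms + closing_ms
    return {
        "OQ": open_phase / period_ms,
        "CQ": closing_ms / period_ms,
        "RK": closing_ms / opening_ms,
    }


def harmonic_amplitude(signal, sample_rate, frequency):
    """Amplitude of one sinusoidal component via a single-frequency DFT."""
    n = len(signal)
    re = sum(x * math.cos(2 * math.pi * frequency * i / sample_rate)
             for i, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * frequency * i / sample_rate)
             for i, x in enumerate(signal))
    return 2.0 * math.hypot(re, im) / n


def h1_minus_h2(signal, sample_rate, f0):
    """Level difference in dB between the first and the second harmonic."""
    h1 = harmonic_amplitude(signal, sample_rate, f0)
    h2 = harmonic_amplitude(signal, sample_rate, 2 * f0)
    return 20.0 * math.log10(h1 / h2)


quotients = time_quotients(opening_ms=3.0, closing_ms=2.0, period_ms=10.0)

# Synthetic frame: 100 Hz fundamental plus a second harmonic at half amplitude.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 100 * n / sample_rate)
         + 0.5 * math.sin(2 * math.pi * 200 * n / sample_rate)
         for n in range(800)]
difference_db = h1_minus_h2(frame, sample_rate, f0=100.0)
print(quotients, difference_db)
```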

23
Glottal pulse analysis in APARAT
24
Analysis of the vocal tract
  • Methods of vocal tract shape estimation include
    x-ray, computer tomography and magnetic resonance
    methods.
  • These are suitable for stationary sound
    production only.
  • A cheaper and quicker method: computing the
    vocal tract shape from the speech signal.
  • This is complementary to glottal pulse analysis
    from the speech signal (e.g. vocal tract shape
    computation from LPC-derived reflection
    coefficients).
  • It allows for analysis of the dynamic behavior of
    the articulators. Similar information can be
    obtained by formant analysis using homomorphic
    deconvolution (cepstrum) or LPC spectrum analysis.
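As a hedged illustration of the reflection-coefficient approach, the sketch below converts a list of reflection coefficients into relative cross-sectional areas of a lossless tube model. The area ratio formula depends on the sign convention and on the direction in which the sections are numbered; the convention used here (next area = current area * (1 - k) / (1 + k)) is an assumption, and the coefficients are invented.

```python
def area_function(reflection, first_area=1.0):
    """Relative tube areas from reflection coefficients (assumed sign convention)."""
    areas = [first_area]
    for k in reflection:
        if not -1.0 < k < 1.0:
            raise ValueError("reflection coefficients must lie strictly between -1 and 1")
        areas.append(areas[-1] * (1.0 - k) / (1.0 + k))
    return areas


# Hypothetical reflection coefficients for a four-section tube.
areas = area_function([0.0, 0.5, -0.5])
print(areas)
```

Only area ratios are recovered this way, so the result is a relative shape that has to be scaled by a known or assumed reference area.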

25
Static analysis by synthesis using articulatory
synthesizer
(TRACTSYN)
26
Dynamic analysis by synthesis (articulatory
synth. TRACTSYN)
27
Acoustic correlates of emotions applied in speech
synthesis
28
Vision: Speech Sound Mining
  • Aim: to extract information from supra-segmental
    and extra-linguistic layers
  • Where to look for information:
  • time domain: a) quantity (lengths of segments),
    b) rhythm
  • frequency domain: a) long-term characteristics,
    b) short-term characteristics
  • model-based characteristics: a) glottal
    excitation function, b) articulatory model

29
Vision: Speech Sound Mining
  • How to define a set of speech sound objects (SSO)?
  • Objective methods of analysis (pattern
    recognition)
  • Subjective methods (impression of the listener)
  • Possible objects
  • Speech sound event
  • Speech sound act
  • Speech sound gesture
  • Speech sound characteristic
  • Speech sound characteristic change

30
Vision: Speech Sound Mining
  • First steps to be accomplished
  • Speech corpus building
  • Annotation of SSO
  • Boundary markers
  • Frequencies of occurrence of SSO
  • Concordances of SSO
  • Correlation among different sets of objects
    (pitch SSO, accent SSO, rhythmic SSO, timbre SSO,
    etc.)
  • Semantic representation of SSO
  • Cross cultural semantic analysis
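Two of the first steps listed above, frequencies of occurrence and concordances, could be computed from an annotated label sequence roughly like this. The SSO labels ("rise", "fall", "pause", "creak") and the function names are invented for the example.

```python
from collections import Counter


def sso_frequencies(labels):
    """Count how often each speech sound object label occurs."""
    return Counter(labels)


def concordance(labels, target, width=2):
    """List every occurrence of target with its left and right context."""
    lines = []
    for i, label in enumerate(labels):
        if label == target:
            left = labels[max(0, i - width):i]
            right = labels[i + 1:i + 1 + width]
            lines.append((left, label, right))
    return lines


# Hypothetical SSO annotation of one utterance.
annotation = ["rise", "pause", "fall", "rise", "creak", "fall", "pause"]
frequencies = sso_frequencies(annotation)
contexts = concordance(annotation, "fall", width=1)
print(frequencies, contexts)
```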

31
Vision: Speech Sound Mining
  • Traditional methods used in NLP and
    data mining will be applicable
  • Bag of words -> Bag of SSO
  • WordNet -> SSO semantic net
  • etc.
  • Research on the relation between linguistic and
    paralinguistic/extralinguistic information.
  • Creation of a complex (holistic) model of the
    speech signal as an information carrier in
    communication.
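The analogy between a bag of words and a bag of SSO can be sketched as follows: each utterance is reduced to label counts, and two utterances are compared by cosine similarity, as is commonly done with text. The labels and function names are hypothetical.

```python
import math
from collections import Counter


def bag_of_sso(labels):
    """Represent an utterance by its SSO label counts, ignoring order."""
    return Counter(labels)


def cosine_similarity(bag_a, bag_b):
    """Cosine of the angle between two count vectors."""
    shared = set(bag_a) & set(bag_b)
    dot = sum(bag_a[label] * bag_b[label] for label in shared)
    norm_a = math.sqrt(sum(v * v for v in bag_a.values()))
    norm_b = math.sqrt(sum(v * v for v in bag_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


utterance_a = bag_of_sso(["rise", "rise", "pause"])
utterance_b = bag_of_sso(["rise", "pause", "pause"])
similarity = cosine_similarity(utterance_a, utterance_b)
print(similarity)
```

As with text, the bag representation discards ordering, so it would complement the concordance-style analysis and not replace it.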

32
Thank you for your attention
Milan Rusko, Institute of Informatics, Slovak
Academy of Sciences, Milan.Rusko_at_savba.sk