Analysis, Modelling and Synthesis of British, Australian and American Accents (PowerPoint presentation)



1
Analysis, Modelling and Synthesis of British,
Australian and American Accents
Supported by EPSRC
Qin Yan, Saeed Vaseghi
Multimedia Communication Signal Processing Lab, Department of
Electronic and Computer Engineering, Brunel
University
2
Content
  • 1- Introduction to Phonetics and Acoustics of
    Accents
  • 2- Research Issues in Modelling Acoustics of
    Accents of English
  • 3- Current Research Problems
  • 4- Accent Analysis and Models
  • 5- Accent Morphing
  • 6- Audio Demo

3
1. Introduction to Phonetics and Acoustics of
Accents
  • 1.1 Background
  • Accents are acoustic manifestations of
    differences in pronunciation and intonation among
    a community of people from a national, regional or
    socio-economic grouping.
  • Accents are dynamic processes: they evolve
    over time, influenced by large-scale immigration,
    socio-economic changes and cultural trends.
  • Applications of accent models include:
    - speech recognition,
    - text-to-speech synthesis,
    - voice editing,
    - accent morphing in broadcasting and films,
    - toys and computer games,
    - accent coaching and education.

4
  • 1.2 Basic Structure of Accents
  • Generally, the structural differences between
    accents can be divided into two broad parts:
  • (a) Differences in phonetic transcriptions.
  • (b) Differences in the acoustic correlates and
    intonation of accents.
  • The importance of an accent feature depends on
    its distance from that of the standard or
    Received Pronunciation and on the frequency with
    which that feature occurs in speech.

5
  • 1.3 Phonetics of Accents
  • A dominant aspect of accents is the
    difference in pronunciation as transcribed by a
    phonetic dictionary.
  • The differences in phonetic transcription can be
    categorized into two classes:
  • a) Differences in the number and identity of the
    phonemes.
  • For example, British English as transcribed by
    Cambridge University's BEEP dictionary [2] has five
    extra vowels /ax (ə), ea (ɛə), ia (iə), ua (uə), ah (?)/
    compared with American English as transcribed by the Carnegie
    Mellon University (CMU) dictionary. /iə ɛə uə/ are
    allophones of /i ɛ u/. American /ɒ/ is merged
    with /a/, unlike in the British accent.
  • The American transcription has three different
    levels of stress for vowels and diphthongs. Also,
    Australian English has distinctive vowels such as
    /æi/ instead of /ei/ and /æ?/ for /au/.
  • b) Differences in phonetic realization: phoneme
    substitution, deletion and insertion.
  • For example, JOHN is pronounced as /dʒɑn/ in
    American English but as /dʒɒn/ in British and Australian
    English. The word SAY is pronounced as /sei/ in
    British and American English but as
    /sæi/ in Australian English.
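The substitution, deletion and insertion classes above can be counted by aligning two transcriptions of the same word with a minimum edit distance. The sketch below makes its own assumptions: unit-cost Levenshtein alignment and plain phone-label strings. `align_counts` is a hypothetical helper and is not the method used in this work.

```python
from typing import List, Tuple


def align_counts(ref: List[str], hyp: List[str]) -> Tuple[int, int, int]:
    """Return (substitutions, deletions, insertions) needed to turn the
    phone sequence `ref` into `hyp`, using unit-cost Levenshtein alignment."""
    n, m = len(ref), len(hyp)
    # cost[i][j]: minimum edit cost aligning ref[:i] with hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + int(ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Backtrace from the bottom-right cell, classifying each step.
    subs = dels = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + int(ref[i - 1] != hyp[j - 1]):
            subs += int(ref[i - 1] != hyp[j - 1])  # match (0) or substitution (1)
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            dels += 1  # phone present in ref but missing from hyp
            i -= 1
        else:
            ins += 1  # extra phone in hyp
            j -= 1
    return subs, dels, ins


# SAY: British/American /s ei/ against Australian /s æi/
print(align_counts(["s", "ei"], ["s", "æi"]))  # (1, 0, 0)
```

The same helper classifies rhoticity as an insertion. Non-rhotic British "car" `["k", "aa"]` aligned against rhotic American `["k", "aa", "r"]` gives `(0, 0, 1)`.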

6
  • 1.4 Acoustics of Accents
  • Perceived acoustic differences between accents are
    due to differences, during the production of
    sound, in the configuration, positioning,
    tension and movement of the laryngeal and
    supra-laryngeal articulators, namely the
    vocal folds, vocal tract, tongue and lips.
  • Four aspects of the acoustic correlates of accents
    are considered essential for accent models and
    accent synthesis. These are:
  • (a) Formant (i.e. vocal tract resonance frequency)
    correlates of accents, including
  • (i) Formant trajectories $F_{kj}(t)$, where k is
    the formant index and j is the phoneme index.
  • (ii) Timing and magnitude of the
    formant target point(s) in formant space for
    each phonetic unit.

7
  • (b) Pitch (prosody) correlates of accents,
    including
  • (i) Pitch trajectory in various linguistic
    contexts and positions, e.g. pitch rise at the
    beginning of a voiced group or phrase and pitch
    fall at the end of a phrase.
  • (ii) Pitch nucleus, i.e. the timing and
    magnitude of the most prominent pitch event in
    a voiced group.
  • (c) Duration and timing correlates of accents:
  • (i) Duration of vowels and diphthongs.
  • (ii) Relative duration and timing of the two
    constituent vowels of diphthongs.
  • (d) Laryngeal (glottal) correlates of accents,
    i.e. the voice quality of speech segments in
    certain contexts as a function of accent.
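The sketch below shows one simple way to quantify the pitch nucleus and pitch slope of a single voiced group from an F0 contour. It rests on three assumptions: the nucleus is taken as the F0 maximum, the slope as a least-squares straight-line fit, and the frame shift as 10 ms. The presentation does not say how these quantities are measured, and `pitch_nucleus_and_slope` is a hypothetical helper.

```python
import numpy as np


def pitch_nucleus_and_slope(f0_hz, frame_shift_s=0.01):
    """Return (nucleus_time_s, nucleus_f0_hz, slope_hz_per_s) for one voiced group.

    f0_hz holds one F0 value per frame, with 0 marking unvoiced frames.
    Assumptions: the nucleus is the F0 maximum and the slope is the
    least-squares straight-line fit over the voiced frames.
    """
    f0 = np.asarray(f0_hz, dtype=float)
    voiced = np.flatnonzero(f0 > 0)
    if voiced.size < 2:
        raise ValueError("need at least two voiced frames")
    times = voiced * frame_shift_s
    values = f0[voiced]
    peak = int(np.argmax(values))
    slope = np.polyfit(times, values, 1)[0]  # highest-degree coefficient first
    return times[peak], values[peak], slope


t, hz, slope = pitch_nucleus_and_slope([0, 100, 110, 120, 130, 0])
print(t, hz, slope)
```

With a real pitch tracker, the contour would first be split into voiced groups at unvoiced gaps. The function would then be applied to each group.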
8
  • 2. Research Issues in Modelling Acoustics of
    Accents of English
  • Definition of an accent feature set composed
    of formant trajectories, formant target
    points, pitch trajectory, power trajectory and
    duration.
  • Separation, normalisation or averaging out of
    speaker characteristics from accent
    characteristics; this is required for modelling
    the parameters of an accent.
  • Modelling the formants of vowels and diphthongs,
    the latter being composed of two connected
    elementary sounds.
  • Modelling the duration of vowels and diphthongs
    and the relative duration of the two halves of
    diphthongs.
  • Modelling pitch trajectory in different
    phonetic/linguistic positions and contexts.
  • Modelling voice quality correlates of an accent
    in different phonetic/linguistic positions and
    contexts.
  • Integration of all accent features within a
    coherent generative model.
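For the speaker-normalisation issue, one concrete option is Lobanov (per-speaker z-score) normalisation of formant measurements, sketched below. This is a commonly used technique substituted here for illustration; the presentation does not say which normalisation it uses.

```python
import numpy as np


def lobanov_normalise(formants_by_speaker):
    """Per-speaker z-score normalisation of formant measurements.

    formants_by_speaker maps a speaker id to an array of shape
    (n_tokens, n_formants), e.g. columns F1, F2 in Hz.
    """
    normalised = {}
    for speaker, values in formants_by_speaker.items():
        f = np.asarray(values, dtype=float)
        mean = f.mean(axis=0)  # speaker's mean per formant
        std = f.std(axis=0)  # speaker's spread per formant
        normalised[speaker] = (f - mean) / std
    return normalised


tokens = np.array([[300, 2200], [700, 1200], [500, 1700]])
out = lobanov_normalise({"A": tokens, "B": 1.2 * tokens})
print(np.allclose(out["A"], out["B"]))
```

In the usage line, speaker B's formants are uniformly 20% higher than speaker A's, a stand-in for a shorter vocal tract. After normalisation both map to the same values, so this kind of speaker difference no longer masks accent differences.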

9
(No Transcript)
10
Speech Accent Feature Analysis Method
Figure: Block diagram illustration of the processes involved in accent analysis
(blocks: Input Speech, HMM Training, Labeling/Segmentation, Speaking Rate/Durations,
Formant Trajectories, Pitch Marker, Pitch Contour Tracker, F0 Range/Mean,
Pitch Accents, Tone Nucleus Features, Accent Profile).
  • The basic processes involved in accent analysis
    include:
  • Phonetic labelling and boundary segmentation of
    speech using HMMs
  • Pitch trajectory and pitch nucleus estimation
  • Formant models and formant track estimation
  • Duration and power trajectory analysis
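Once HMM labelling and segmentation have produced phone boundaries, the duration statistics reduce to simple bookkeeping. The sketch below assumes HTK-style label lines (`start end label`, times in units of 100 ns). `mean_phone_durations` is a hypothetical helper.

```python
from collections import defaultdict

HTK_UNITS_PER_SECOND = 10_000_000  # HTK label times are in units of 100 ns


def mean_phone_durations(label_text):
    """Mean duration in seconds of each label in HTK-style 'start end label' lines."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for line in label_text.strip().splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        start, end, label = int(fields[0]), int(fields[1]), fields[2]
        totals[label] += (end - start) / HTK_UNITS_PER_SECOND
        counts[label] += 1
    return {label: totals[label] / counts[label] for label in totals}


labels = """0 1200000 sil
1200000 2000000 s
2000000 3500000 ey
3500000 4500000 ey"""
print(mean_phone_durations(labels))
```

To build per-accent duration profiles like the vowel comparison that follows, these averages would be pooled over all utterances of one accent.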

11
Analysis of Duration Correlate of AU, US and UK
Accent Speech
Figure: Comparison of speaking rates of British, Australian and American accents.
Figure: Comparison of phoneme durations (sec, axis 0 to 0.2) of British,
Australian and American accents for the vowels aa, ae, ah, ao, aw, ay, eh,
er, ey, ih, iy, ow, oy, uh and uw.
12
Comparison of speaking rates of British, American
and Australian accents:
  • The Australian speaking (word) rate is 23% slower
    than the British.
  • The American speaking (word) rate is 15% slower than
    the British.

Table: Word error rate (%) of speech recognition across British,
American and Australian accents
Input accent   British model   American model   Australian model
British        12.8            29.3             34.9
American       30.6            8.8              29.94
Australian     33.1            27.3             7.28
  • There is an apparent correlation between
    speech recognition accuracy and speaking rate.
  • Australian, with the slowest speaking rate, obtains
    the best recognition result with a matched model
    (7.28), followed by American (8.8) and British (12.8).
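The correlation claim can be checked directly against the numbers on this slide. The sketch reads "23 slower" and "15 slower" as 23% and 15% slower (an assumption). With only three accents, the coefficient is illustrative rather than statistically meaningful.

```python
import numpy as np

# Relative word rates quoted on the slide (British = 1.0), reading "23 slower"
# and "15 slower" as 23% and 15% slower (an assumption).
relative_rate = {"British": 1.00, "American": 0.85, "Australian": 0.77}

# Word error rates from the table: wer[input accent][model accent].
wer = {
    "British": {"British": 12.8, "American": 29.3, "Australian": 34.9},
    "American": {"British": 30.6, "American": 8.8, "Australian": 29.94},
    "Australian": {"British": 33.1, "American": 27.3, "Australian": 7.28},
}

accents = list(relative_rate)
rates = np.array([relative_rate[a] for a in accents])
matched = np.array([wer[a][a] for a in accents])  # diagonal: matched model
mismatched = np.array(
    [np.mean([wer[a][m] for m in accents if m != a]) for a in accents]
)
r = np.corrcoef(rates, matched)[0, 1]  # Pearson correlation over 3 points

for a, m, mm in zip(accents, matched, mismatched):
    print(f"{a:<11} matched WER {m:5.2f}  mean mismatched WER {mm:5.2f}")
print(f"correlation(rate, matched WER) = {r:.3f}")
```

The coefficient comes out at about 0.997: the slower the speaking rate, the lower the matched-model word error. The mismatched columns also show that every model degrades to roughly 27% to 35% WER on a foreign accent.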

13
Formant Estimation with 2D-HMM
  • Formant feature extraction, as illustrated, consists
    of three main functions:
  • (1) an LP model,
  • (2) a polynomial root finder, and
  • (3) a contour trend estimator.
  • Consider the z-transfer function of an LP model
    with K real poles, I complex pole pairs and a
    gain factor G:
    $$H(z) = \frac{G}{\prod_{k=1}^{K}\left(1 - A_k z^{-1}\right)\prod_{i=1}^{I}\left(1 - 2A_i\cos\left(2\pi F_i/F_s\right)z^{-1} + A_i^{2}z^{-2}\right)}$$
  • where $A_k$ and $A_i$ are the pole radii, $F_i$ the pole
    frequency and $F_s$ the sampling frequency.
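The first two functions, the LP model and the polynomial root finder, are sketched below. LP coefficients come from the autocorrelation method (Levinson-Durbin recursion). Each complex pole pair is then turned into a formant candidate using the pole radius and pole frequency named on this slide: frequency $F = \angle z \cdot F_s/2\pi$, and the usual bandwidth approximation $-\ln|z| \cdot F_s/\pi$. The helper names, the LP order and the windowing are assumptions, not the authors' settings.

```python
import numpy as np


def lpc(frame, order):
    """LP coefficients [1, a1, ..., a_order] by the autocorrelation method
    (Levinson-Durbin recursion), so that A(z) = 1 + sum_j a_j z^-j."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    r = np.correlate(frame, frame, mode="full")[n - 1 : n + order]  # lags 0..order
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])
        k = -acc / err  # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1 : 0 : -1]
        a[i] = k
        err *= 1.0 - k * k  # updated prediction-error power
    return a


def formant_candidates(a, fs):
    """Return (frequency_hz, bandwidth_hz) of each complex pole pair of 1/A(z),
    sorted by frequency."""
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]  # keep one root of each conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)  # pole frequency F_i
    bandwidths = -np.log(np.abs(roots)) * fs / np.pi  # from pole radius A_i
    idx = np.argsort(freqs)
    return freqs[idx], bandwidths[idx]
```

For a windowed speech frame, a typical call would be `formant_candidates(lpc(frame * np.hamming(len(frame)), order), fs)`, with the order chosen at roughly two poles per kHz of bandwidth plus a few extra. Not every candidate is a formant. Deciding which candidates are formants is left to the later stages, such as the HMMs on the next slide.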

Figure: LP-based formant-candidate feature extraction method (blocks: Speech,
Segmentation Window, LPC Model, Polynomial Roots, Frequency/Bandwidth/Intensity
Calculation, Formant Candidate Feature Vector, D-estimator).
14
Figure: Illustration of the LP spectrum (frequency in Hz against time in s) and
the modelling of 6 complex pole pairs of a speech segment with an HMM composed
of 4 formant states.
  • 2D HMMs span time and frequency dimensions
  • Left-right HMM states across frequency model
    formants such that the first state models the
    first formant, the second state the second
    formant and so on
  • The distribution of formants in each state is
    modelled by a Gaussian mixture density.
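Below is a heavily simplified sketch of the frequency-axis half of the 2D-HMM idea. Within a single frame, left-to-right formant states consume LP formant candidates in increasing frequency order, and a dynamic program picks the assignment with the highest log-likelihood. It simplifies in three ways: one Gaussian per state instead of a mixture, candidates not claimed by a state are simply skipped, and the time dimension is ignored. `assign_formants` and the state parameters in the usage line are hypothetical.

```python
import numpy as np


def assign_formants(candidates_hz, means_hz, stds_hz):
    """Assign one LP formant candidate to each formant state, left to right in
    frequency, maximising the total Gaussian log-likelihood."""
    c = np.sort(np.asarray(candidates_hz, dtype=float))
    mu = np.asarray(means_hz, dtype=float)
    sd = np.asarray(stds_hz, dtype=float)
    K, M = len(mu), len(c)
    if M < K:
        raise ValueError("need at least as many candidates as formant states")
    # ll[k, m]: log N(c[m]; mu[k], sd[k]^2)
    ll = (-0.5 * ((c[None, :] - mu[:, None]) / sd[:, None]) ** 2
          - np.log(sd[:, None] * np.sqrt(2.0 * np.pi)))
    # best[k, m]: best score using states 1..k and candidates 1..m
    best = np.full((K + 1, M + 1), -np.inf)
    best[0, :] = 0.0
    took = np.zeros((K + 1, M + 1), dtype=bool)
    for k in range(1, K + 1):
        for m in range(1, M + 1):
            skip = best[k, m - 1]  # candidate m is not formant k
            use = best[k - 1, m - 1] + ll[k - 1, m - 1]  # candidate m is formant k
            if use > skip:
                best[k, m], took[k, m] = use, True
            else:
                best[k, m] = skip
    # Backtrack from (K, M) to recover the chosen candidates.
    chosen, k, m = [], K, M
    while k > 0:
        if took[k, m]:
            chosen.append(c[m - 1])
            k -= 1
        m -= 1
    return np.array(chosen[::-1])


print(assign_formants([300, 550, 1200, 1700, 2600, 3500],
                      means_hz=[500, 1500, 2500, 3500],
                      stds_hz=[150, 300, 300, 300]))
```

With these made-up state parameters, the 300 Hz and 1200 Hz candidates are rejected, and 550, 1700, 2600 and 3500 Hz are labelled F1 to F4. The full model would also link states across frames in time.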

15
Figure: Three spectrogram examples of formant tracks superimposed on the LPC
spectrum of speech.
16
Figure: Comparison of histograms (thin solid line) and Gaussian HMMs (bold
dashed line) of the formants of Australian English. X axis: frequency (Hz);
Y axis: probability. The figures show that the HMMs closely fit the
distribution of the formants.
17
Comparison of Formants Spaces of American,
Australian and British Accents
F1 vs F2 space of British, Australian and
American English.
  • Note the following features:
  • Raising of the vowels /ae/ and /eh/ in Australian.
  • Fronting of the open vowel /aa/ and the high vowel
    /uw/ in Australian.
  • Fronting and raising of the vowel /er/ in
    Australian.
  • The vowels /iy/, /eh/ and /ae/ in Australian
    are closer.

18
Figure: Comparison of formant trajectories and target times of British,
Australian and American accents.
19
Figure: Comparison of formants of Australian, British and American (female)
speakers.
Formant Ranking using a Normalised Distance
Accent pair              Formant ranking order
                         1     2     3     4
British / Australian     1st   2nd   4th   3rd
British / American       2nd   1st   3rd   4th
Australian / American    2nd   1st   3rd   4th
  • The 2nd formant has the widest frequency range and
    is the most sensitive to accent (ranked first for
    two of the three accent pairs).

20
Accent Morphing Method
Figure: Diagram of a voice morphing system used for accent conversion
(blocks: Source Speech, Speech Labeling/Segmentation, HMM Training/Adaptation,
Accent Model, Formant Estimation, Formant Mapping, Pitch Tracker, Prosody
Modification, Accent Synthesised Speech).
  • Formant mapping: transformation of the formants of
    the source towards those of the target accent is
    based on non-uniform frequency warping of the
    linear prediction (LP) model.
  • Prosody modification: based on the time-domain
    pitch-synchronous overlap-and-add (TD-PSOLA)
    method.
  • Prosody modification covers pitch slope,
    duration and power trajectory.
  • Applications: text-to-speech synthesis,
    broadcasting (e.g. accent modification in
    films), education software such as language teaching,
    speech interfaces in mobile phones, call centres and other
    electronic products.

21
Formant Transformation via Non-Uniform LP
Frequency Warping
Figure: Illustration of a non-uniform frequency warping using the LP model
frequency response (magnitude in dB against frequency; formants F, bandwidths
BW and band labels F01, F12, F23, F34, F45 marked). The spectrum is divided into
a number of bands centered on the formants and a different set of warping
parameters is applied to each band.
Figure: Illustration of the modification of the spectrum towards the formants of
the target accent (blocks: Speech, Linear Prediction Model, Polynomial Roots /
Pole Estimation, Formant Estimation, Formant HMMs, Formant Transformation
Ratios, LP Spectrum Mapping, Accent Modified Spectrum).
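One route from the estimated poles to an accent-modified LP model is "pole rotation", sketched below. Each complex LP pole is moved to a new frequency while its radius is kept, which roughly preserves its bandwidth. `rotate_poles` and the `warp` callable are hypothetical, and the system's actual rule for choosing target frequencies is not reproduced here. The caller must keep warped frequencies inside (0, Fs/2).

```python
import numpy as np


def rotate_poles(a, fs, warp):
    """Move each complex pole of 1/A(z) to warp(frequency), keeping its radius.

    a: LP polynomial coefficients [1, a1, ..., ap].
    warp: callable mapping a pole frequency in Hz to a new frequency in Hz.
    Returns the coefficients of the modified (real-valued) LP polynomial.
    """
    new_roots = []
    for z in np.roots(a):
        if np.imag(z) > 0:  # one member of a conjugate pair
            f_new = warp(np.angle(z) * fs / (2 * np.pi))
            z_new = np.abs(z) * np.exp(2j * np.pi * f_new / fs)
            new_roots.extend([z_new, np.conj(z_new)])  # keep coefficients real
        elif np.imag(z) == 0:
            new_roots.append(z)  # real poles are left unchanged
        # roots with negative imaginary part are rebuilt from their partner
    return np.real(np.poly(new_roots))


fs = 8000
a = np.convolve([1.0, -2 * 0.95 * np.cos(2 * np.pi * 500 / fs), 0.95 ** 2], [1.0, -0.5])
b = rotate_poles(a, fs, lambda f: 1.1 * f)  # raise every formant by 10%
print(np.roots(b))
```

Rebuilding each conjugate pair from the rotated upper-half-plane root keeps the new polynomial real. A speech signal can then be resynthesised by filtering the LP residual through 1/B(z).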
22
The frequency bands $F_{01}, F_{12}, F_{23}, F_{34}, F_{45}$ of the source speaker
are mapped to the target accent using a set of warping ratios derived from
the differences in the formants of phonetic segments of speech across accents:
$$\alpha_i = \frac{f_i^T - f_{i-1}^T}{f_i^S - f_{i-1}^S}$$
where $f_i^T$ and $f_i^S$ are the $i$th formants of the target and source
accents respectively. The frequency mapping can be expressed as
$$\hat{f} = f_{i-1}^T + \alpha_i\left(f - f_{i-1}^S\right), \qquad f_{i-1}^S \le f < f_i^S$$
Figure: Illustration of warped (solid line) and original (dash-dot line) formant
trajectories of /aa/ in accent conversion from Australian to British.
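The warping ratios $\alpha_i$ and the band-wise mapping on this slide amount to a piecewise-linear warping function, sketched below. It assumes the band edges are 0 Hz, the four formants and the Nyquist frequency, so that 0 Hz and Fs/2 map to themselves. The helper names and the formant values in the test are hypothetical.

```python
import numpy as np


def warping_ratios(src_edges, tgt_edges):
    """alpha_i = (f_i^T - f_{i-1}^T) / (f_i^S - f_{i-1}^S) for each band."""
    return np.diff(np.asarray(tgt_edges, float)) / np.diff(np.asarray(src_edges, float))


def warp_frequency(f, src_edges, tgt_edges):
    """Map source frequencies f (Hz) band by band onto the target accent."""
    src = np.asarray(src_edges, dtype=float)
    tgt = np.asarray(tgt_edges, dtype=float)
    alpha = warping_ratios(src, tgt)
    f = np.atleast_1d(np.asarray(f, dtype=float))
    # band index i such that src[i-1] <= f < src[i] (clipped to the outer bands)
    i = np.clip(np.searchsorted(src, f, side="right"), 1, len(src) - 1)
    return tgt[i - 1] + alpha[i - 1] * (f - src[i - 1])


src = [0, 700, 1200, 2500, 3500, 4000]  # hypothetical source band edges (Hz)
tgt = [0, 650, 1300, 2450, 3500, 4000]  # hypothetical target band edges (Hz)
print(warp_frequency([700, 950, 3000], src, tgt))
```

Inside the outer edges, this gives the same result as `np.interp(f, src, tgt)`. The explicit ratios are kept because they mirror the slide's $\alpha_i$ and can be inspected per band.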
23
Pitch Modification Using Time Domain PSOLA
(TD-PSOLA)

Figure: Illustration of the mapping of the pitch periods of a source speech
signal to a target (source speech pitch marks mapped to target speech pitch
marks).
  • TD-PSOLA is applied to each corresponding
    voiced speech segment to modify the pitch slope
    and duration of the segment.
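Below is a minimal TD-PSOLA pitch-scaling sketch. It assumes the analysis pitch marks are already available from the pitch marker and that a single constant factor `beta` scales F0 over the segment. The system described here applies segment-specific modifications, including duration changes, which this sketch omits. Each synthesis mark copies a two-period, Hann-windowed grain centred on the nearest analysis mark. `td_psola_pitch` is a hypothetical helper.

```python
import numpy as np


def td_psola_pitch(x, marks, beta):
    """Scale the pitch of a voiced segment by beta (beta > 1 raises F0).

    x: speech samples; marks: increasing analysis pitch-mark sample indices
    (at least two). Duration is kept roughly unchanged.
    """
    x = np.asarray(x, dtype=float)
    marks = np.asarray(marks, dtype=int)
    periods = np.diff(marks)  # local pitch periods in samples
    y = np.zeros_like(x)
    t = float(marks[0])  # first synthesis mark
    while t < marks[-1]:
        j = int(np.argmin(np.abs(marks - t)))  # nearest analysis mark
        p = int(periods[min(j, len(periods) - 1)])
        window = np.hanning(2 * p + 1)  # two-period Hann window
        centre = int(round(t))
        for n in range(-p, p + 1):  # overlap-add the windowed grain
            src, dst = marks[j] + n, centre + n
            if 0 <= src < len(x) and 0 <= dst < len(y):
                y[dst] += window[n + p] * x[src]
        t += p / beta  # synthesis period = analysis period / beta
    return y


x = np.zeros(8000)
marks = np.arange(0, 8000, 80)  # 100 Hz pulse train at 8 kHz
x[marks] = 1.0
y = td_psola_pitch(x, marks, 1.25)
print(np.diff(np.flatnonzero(y > 0.5))[:5])
```

On the pulse-train example, the 80-sample period becomes 64 samples, so F0 rises from 100 Hz to 125 Hz while the signal length is unchanged. A pitch slope could be imposed by letting `beta` vary from one synthesis mark to the next.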

24
Examples of changes in accent/duration modulation
of pitch
Figure panels: (a) "article" in Australian; (b) Australian-accent "article"
transformed to British accent; (c) "asked" in Australian; (d) Australian-accent
"asked" transformed to British accent.
25
Figure: An Outline of Voice-Morph, a system for voice and accent conversion
(blocks: Source Speech, Target Speech, LP/LPC Model Estimation, Formant
Tracking, Source Speaker Formant Trajectory HMM, Target Speaker Formant
Trajectory HMM, Formant Mapping Model, Warping Factors, Spectrum Warping / Pole
Rotation, Speech Reconstruction).
An example of voice conversion: American male; Transformed (AM m->f);
American female.
26
Accent Conversion Demonstration
Source accent   Target accent   Spoken words (original and transformed)
Australian      British         Article, Claim, Beige
British         American        Cooperation, Boston, Opposition, The occupied
27
The End