1
CSE 551: Structure of Spoken Language
Lecture 9: Syllable Structure, Vowel Neutralization, and Coarticulation
John-Paul Hosom, Fall 2004
2
  • NOTE
  • There's a tutorial on the web that allows you to hear the effect of different formant values:
    http://www.asel.udel.edu/speech/tutorials/synthesis/ceevees.html
  • You can enter start time, end time, amplitude, and formant values for the beginning, middle, and end of a syllable, then generate a waveform and hear the result.

3
  • Syllables
  • Words are composed of phonetic clusters: syllables
  • Each syllable has a nucleus; typically the nucleus is a vowel or diphthong, sometimes a syllabic nasal or lateral ('button', 'bottle') or retroflex ('bird')
  • The nucleus is a syllabic nasal or lateral only when following an alveolar consonant in the previous syllable of a word
  • Syllable boundaries are sometimes ambiguous:
    tasty: tas/ty, tast/y, ta/sty
    bottling: bott/l/ing, bott/ling
  • A syllable can be broken into components: a syllable contains an onset and a rhyme; the rhyme contains a nucleus and a coda; onset and coda are consonants, the nucleus is a vowel.

4
Syllables
Limitations on consonant clusters: not all CCC combinations are possible in syllable-initial position.
(graphic from http://www.arts.uwa.edu.au/LingWWW/LIN101-102)
5
  • Syllables
  • Sonority corresponds roughly to the degree of constriction along the vocal and/or nasal tract (less constriction means higher sonority)
  • Ordering of sonority, from most to least sonorous: vowels, glides (/w/, /y/), liquids (/l/, /r/), nasals, fricatives, affricates, stops
  • Fricatives, affricates, and stops may be clustered into one category, obstruents, for purposes of sonority
  • Syllabification can be done according to the sonority principle: the sonority must rise and then fall within a syllable
  • Also, there's the Maximal Onset Principle: put a consonant in the onset rather than the coda when possible
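
The rise-and-fall principle above can be sketched in Python. This is a minimal illustration, not the lecture's own code. The phoneme symbols follow the ARPAbet-like notation used in these slides, the numeric sonority levels are arbitrary ranks chosen for the example, and fricatives, affricates, and stops are collapsed into a single obstruent level as the slide suggests. Treating every unlisted symbol as an obstruent is an assumption.

```python
# Hypothetical symbol sets in the ARPAbet-like notation of these slides.
VOWELS = {"iy", "ih", "eh", "ae", "aa", "ah", "ao", "uh", "uw",
          "er", "ow", "aw", "ay", "ey", "oy", "ax"}
GLIDES = {"w", "y"}
LIQUIDS = {"l", "r"}
NASALS = {"m", "n", "ng"}


def sonority(phone):
    """Return an illustrative sonority rank (higher = more sonorous)."""
    if phone in VOWELS:
        return 5
    if phone in GLIDES:
        return 4
    if phone in LIQUIDS:
        return 3
    if phone in NASALS:
        return 2
    return 1  # assumption: anything else is an obstruent


def rises_then_falls(phones):
    """True if sonority never drops before the peak and never rises after it."""
    if not phones:
        return False
    levels = [sonority(p) for p in phones]
    peak = levels.index(max(levels))
    rising = all(a <= b for a, b in zip(levels[:peak + 1], levels[1:peak + 1]))
    falling = all(a >= b for a, b in zip(levels[peak:], levels[peak + 1:]))
    return rising and falling


print(rises_then_falls("s t r iy k".split()))  # streak
print(rises_then_falls("r t iy".split()))      # liquid before obstruent in onset
```

The comparison is deliberately non-strict, so that neighbors at the same level (for example the two obstruents in /s t/) are accepted.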

6
  • Syllables
  • Because of the rise and fall of sonority in syllables, the following restrictions occur: (a) a glide (/w/, /y/) must be immediately adjacent to a vowel, (b) /r/ and then /l/ are the next closest consonant(s) to the vowel, (c) a nasal is next closest, (d) an obstruent is farthest from the vowel (but there may be more than one obstruent in the onset or coda)
  • Obstruents in a cluster must have the same voicing
  • In a series of obstruents between two vowels, voicing can change only once, at the syllable boundary.
  • English allows up to 3 consonants in syllable-initial position and 4 consonants in syllable-final position
  • Examples: sphere /s f iy r/, streak /s t r iy k/, texts /t eh k s t s/, helms /h eh l m z/; but not /t l iy/ or /p w iy/
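
The two voicing restrictions above can be checked mechanically. The sketch below is only an illustration under stated assumptions: the voiced and voiceless obstruent sets are a hypothetical inventory written in the notation of these slides, and non-obstruents are simply ignored.

```python
# Hypothetical obstruent inventory (ARPAbet-like symbols); an assumption.
VOICED_OBSTRUENTS = {"b", "d", "g", "v", "dh", "z", "zh", "jh"}
VOICELESS_OBSTRUENTS = {"p", "t", "k", "f", "th", "s", "sh", "ch", "h"}


def obstruent_voicing(phones):
    """Return a list of booleans (True = voiced) for the obstruents only."""
    return [p in VOICED_OBSTRUENTS for p in phones
            if p in VOICED_OBSTRUENTS or p in VOICELESS_OBSTRUENTS]


def same_voicing(cluster):
    """True if all obstruents in one onset or coda cluster agree in voicing."""
    return len(set(obstruent_voicing(cluster))) <= 1


def voicing_changes(obstruents):
    """Count voicing changes along a series of obstruents between two vowels."""
    voicing = obstruent_voicing(obstruents)
    return sum(1 for a, b in zip(voicing, voicing[1:]) if a != b)


print(same_voicing(["k", "s", "t", "s"]))     # coda of 'texts'
print(voicing_changes(["k", "s", "t", "b"]))  # e.g. across a syllable boundary
```

A series that passes `voicing_changes(...) <= 1` is consistent with the rule, and the position of the single change is then a candidate syllable boundary.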

7
  • Syllables
  • The ordering of glides and liquids doesn't matter for our purposes (applying sonority to syllabification), because glides and liquids cannot occur sequentially within the same syllable.
  • Fricatives and stops can occur in any order within a syllable, e.g. Senator Paul 'Tsongas', 'tsunami', 'stone' (although in English most burst-fricative pairs are represented as distinct phonemes (/ch/, /jh/)).

8
  • Vowel Neutralization
  • When speech is uttered very quickly (or is not well enunciated), the formants tend to shift toward those of a neutral vowel

(from Daniloff, p. 320)
(from van Bergem 1993 p. 8)
9
  • Vowel Neutralization
  • Target undershoot

(Figure labels: /m ih pc ph ih eh/)
10
  • Vowel Neutralization

(Figure labels: /m ih pc ph ih eh/)
Target undershoot: /ih/ extracted and concatenated from 'mip'
11
  • Vowel Neutralization
  • However, neutralization is not always so simple: sometimes vowel formants shift away from the neutral position, depending on their context, and vowels tend toward slightly different neutral targets.
  • Neutralization is to some extent an artifact of averaging over speakers and contexts (van Bergem 1993)

12
  • Coarticulation
  • Coarticulation is the blending of adjacent speech sounds, due to gradual movement of the articulators.
  • Coarticulation makes automatic speech recognition and text-to-speech synthesis difficult, but humans use coarticulation to conserve effort while speaking and to provide robustness during recognition.
  • There is Right-to-Left (RL), or anticipatory, coarticulation and Left-to-Right (LR), or carry-over, coarticulation
  • Models of coarticulation and syllabification:
      • Locus Theory
      • Modified Locus Theory (Klatt)
      • Öhman's Theory
      • Kozhevnikov-Chistovich (KC) Theory
      • Wickelgren's Theory, etc.

13
Coarticulation
RL coarticulation occurs due to high-level planning of phonetic sequences.
Example: 'spoon' /s p uw n/ (figure: lip rounding in isolation vs. rounding in context).
RL coarticulation is more observable if the neighboring sounds are not specified with respect to the potentially coarticulated feature; e.g., /s/, /p/, /n/ are not specified with respect to lip rounding.
(from Daniloff, pp. 323-324)
14
Coarticulation: Locus Theory
Locus Theory (Delattre, Liberman, and Cooper, 1955): there are, for each consonant, characteristic frequency positions, or loci, at which the formant transitions begin, or to which they may be assumed to point. On this basis, the transitions may be regarded simply as movements of the formants from their respective loci to the frequency levels appropriate for the next phone. The spectrographic patterns, which produce /d/ before /iy/, /aa/, and /ow/, show how these transitions seem to be pointing to an F2 locus in the vicinity of 1800 Hz.
  • Each consonant has target frequencies independent of the neighboring vowels.
  • Formants transition from these target frequencies to the vowel target frequencies.
15
  • Coarticulation: Locus Theory
  • Locus Theory
  • Consonants and vowels both have targets of articulator positions and therefore of formant frequency locations
  • Given sufficient duration of a syllable, all phonemes reach their targets
  • The slope of the formants during a transition from a consonant to a vowel is relatively constant until reaching the target
  • If the syllable duration doesn't allow enough time for the formants to reach their targets, target undershoot occurs and the formants change direction before fully realizing the intended vowel
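
The constant-slope transition and the resulting undershoot can be illustrated with a few lines of Python. This is a sketch under stated assumptions, not a model taken from the lecture. The 1800 Hz F2 locus for /d/ comes from the Locus Theory slide, while the vowel target of 2300 Hz, the slope of 10 Hz/ms, and the function names are hypothetical values chosen only to make the arithmetic easy to follow.

```python
import math


def formant_at(t_ms, locus_hz, target_hz, slope_hz_per_ms):
    """Formant frequency t_ms after consonant release.

    The formant moves from the consonant locus toward the vowel target
    at a constant slope and stops once the target is reached.
    """
    distance = target_hz - locus_hz
    travelled = min(abs(distance), slope_hz_per_ms * t_ms)
    return locus_hz + math.copysign(travelled, distance)


def undershoot(duration_ms, locus_hz, target_hz, slope_hz_per_ms):
    """Distance (Hz) between the vowel target and the value actually reached."""
    reached = formant_at(duration_ms, locus_hz, target_hz, slope_hz_per_ms)
    return abs(target_hz - reached)


# Assumed values: F2 locus 1800 Hz (/d/), vowel target 2300 Hz, slope 10 Hz/ms.
print(undershoot(80, 1800, 2300, 10))  # long transition: target reached
print(undershoot(30, 1800, 2300, 10))  # short transition: target undershoot
```

With these assumed numbers the target is 500 Hz from the locus, so any transition shorter than 50 ms leaves the formant short of the vowel target.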

16
  • Coarticulation: Locus Theory
  • Locus Theory

(From Klatt 1987, p. 753)
17
  • Coarticulation: Modified Locus Theory
  • Problems with Locus Theory
  • A transition may have both rapid and slow components: rapid release of the obstruction via the tongue tip, followed by slow movement of the tongue body.
  • The preceding vowel can influence the F2 onset of a CV transition (Öhman, 1966)
  • F2 may be insensitive to oral constrictions (obstruents) if the tongue position is toward the front of the mouth (as in /iy/)
  • (as reported by Fant 1973, Klatt 1987)

18
  • Coarticulation: Modified Locus Theory
  • Modified Locus Theory
  • Klatt hypothesized that the main effects of the vowel on the articulation of consonants are front/back position and lip rounding
  • Vowels are divided into three sets: +front; +round; -front, -round (because there are no rounded front vowels in English, sets 1 and 2 are mutually exclusive)
  • +front: /iy ih eh ae/
  • +round: /uw ao ow er/
  • -front, -round: /uh ah aa aw/
  • Predicted F_onset from F_target for these 3 classes (locus theory)
  • Achieved 95% intelligibility for CVC nonsense syllables

19
  • Coarticulation: Locus Theory
  • Modified Locus Theory

(Figure panels: -front, -round; +front; +round)
(From Klatt 1987, p. 754)
20
Coarticulation: Öhman's Theory
Öhman (1965) found that the loci of consonants are NOT independent of neighboring vowels, and that for /g/ more than one locus is required.
Conclusion: consonant gestures are superimposed on vowel gestures that are present during the consonant; even while the consonant is being uttered in a VCV sequence, both vowels have an effect on the consonant.
21
Coarticulation: Öhman's Theory
Öhman (1966) proposed a model of coarticulation based on vocal-tract shape evolving over time. The model assumes that vocal-tract shapes can be mapped to formant frequencies.
For VCV utterances:
    s(x,t) = v(x) + k(t) [c(x) - v(x)] wc(x)
where s(x,t) is the vocal tract shape at position x and time t, v(x) is the vocal tract shape at position x for a given vowel, c(x) is the vocal tract shape of the consonant, k(t) is an interpolation value (from 0 to 1), and wc(x) describes the degree to which c(x) resists coarticulation.
v(x) describes the vowel shape of the vocal tract, which may be a combination of two vowels if V1 ≠ V2 (v(x) will then vary over time from V1 to V2).
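
The way the quantities defined above combine can be sketched in Python. The sketch assumes the form commonly cited for Öhman's model, s(x,t) = v(x) + k(t)[c(x) - v(x)]wc(x). The function names, the sampled shapes, and the linear blend between V1 and V2 are illustrative assumptions, not values or code from the lecture.

```python
def blend_vowels(v1, v2, alpha):
    """Vowel shape part-way from V1 (alpha = 0) to V2 (alpha = 1).

    Linear blending is an assumption made for this sketch.
    """
    return [a + alpha * (b - a) for a, b in zip(v1, v2)]


def ohman_shape(v, c, w_c, k):
    """Vocal-tract shape s(x) at one instant.

    v   : vowel shape sampled at positions x
    c   : consonant target shape at the same positions
    w_c : per-position weight (0..1), how strongly the consonant imposes c(x)
    k   : interpolation value (0 = pure vowel, 1 = full consonant gesture)
    """
    return [vi + k * (ci - vi) * wi for vi, ci, wi in zip(v, c, w_c)]


# Hypothetical three-point "shapes" (arbitrary units).
vowel = [1.0, 2.0, 3.0]
consonant = [0.0, 0.0, 0.0]
weight = [1.0, 0.5, 0.0]  # consonant dominates at x0, has no say at x2

print(ohman_shape(vowel, consonant, weight, 0.0))  # vowel shape unchanged
print(ohman_shape(vowel, consonant, weight, 1.0))  # consonant imposed where w_c > 0
```

Where wc(x) is 0 the vowel shape passes through unchanged even at k = 1, which is how the model lets the vowel gesture remain present during the consonant.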
22
  • Coarticulation: Kozhevnikov-Chistovich (KC) Theory
  • (1) Syllabification using the CnV pattern: CV, CCV, CCCV, ...
  • phrase: 'give true answers'
        g ih   v t r uw   ae   n s er   z
        ----   --------   --   ------   -
         S1       S2      S3     S4     S5
  • (2) Measured relative durations of words, syllables, vowels
  • relative duration of vowel: Dvow / Dsyll; of syllable: Dsyll / Dword; of word: Dword / Dphrase
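
The CnV grouping shown for 'give true answers' can be reproduced with a short Python sketch. It assumes a hypothetical vowel inventory written in the slides' phoneme notation, closes a unit at every vowel, and places any consonants left over at the end of the phrase in a final unit of their own (as with the /z/ in S5).

```python
# Hypothetical vowel inventory (ARPAbet-like symbols); an assumption.
VOWELS = {"iy", "ih", "eh", "ae", "aa", "ah", "ao", "uh", "uw",
          "er", "ow", "aw", "ay", "ey", "oy", "ax"}


def cnv_units(phones):
    """Group a phoneme sequence into CnV units (zero or more C, then one V)."""
    units, current = [], []
    for phone in phones:
        current.append(phone)
        if phone in VOWELS:      # a vowel closes the current unit
            units.append(current)
            current = []
    if current:                  # trailing consonants with no following vowel
        units.append(current)
    return units


print(cnv_units("g ih v t r uw ae n s er z".split()))
```

Note that these units ignore word boundaries: the /v/ of 'give' is grouped with 'true', which is what the S1 to S5 segmentation on the slide shows.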

23
Coarticulation: Kozhevnikov-Chistovich (KC) Theory
Found coarticulation within a syllable but not across syllables: C1 V1 C2 C3 V2
  • articulatory gestures for consonant(s) and vowel begin nearly simultaneously with the onset of the initial consonant in the syllable
  • Example: lip rounding for /uw/ begins with /v/ in 'give true answers', but nasalization of /ae/ does not occur.
  • assumes little or no LR coarticulation
  • assumes motor programming of speech is discontinuous at the VC boundary
  • counter-examples showing LR coarticulation (Moll and Daniloff 1971; Kent, Carney, and Severeid 1974; Öhman 1966)

24
Coarticulation: Wickelgren's Theory
Speech units are mentally coded as context-sensitive units: in the phonetic string /X Y Z/, Y is encoded as XYZ (Y in the context of a preceding X and a following Z).
"By assuming (context-sensitive) allophones to be the basic unit of articulation, it is trivial to account for how the same phoneme in different phonemic environments can be different in some respects at all levels of the speech process" (Wickelgren 1969, p. 11)
However, coarticulation can spread over more than one phone (up to seven phones distance).
Other criticisms: MacNeilage 1970, Whitaker 1970, Halwes and Jenkins 1971.
Allophonic richness may only beget strategic poverty (Kent and Minifie 1977).
However, Wickelgren's is the only model currently used in ASR and concatenative text-to-speech (exceptions: Wouters 2001, Wrede 2001).
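
The context-sensitive coding described above can be illustrated by turning a phoneme string into (left, center, right) units. This is a minimal sketch: the function name and the '#' boundary symbol for utterance edges are assumptions made for the example, not part of Wickelgren's proposal as quoted here.

```python
def context_sensitive_units(phones, boundary="#"):
    """Encode each phone Y in /X Y Z/ as the triple (X, Y, Z).

    The boundary symbol marks the start and end of the string (an assumption).
    """
    padded = [boundary] + list(phones) + [boundary]
    return [(padded[i - 1], padded[i], padded[i + 1])
            for i in range(1, len(padded) - 1)]


for unit in context_sensitive_units(["s", "p", "uw", "n"]):
    print(unit)
```

Each unit sees only one neighbor on either side, so effects that spread over several phones, as noted in the criticism above, are not captured by this encoding.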
25
  • Coarticulation: Gay's Theory
  • Gay, 1977: The syllabic unit of motor organization is the CV unit
  • Based on X-ray motion pictures of VCV utterances
  • anticipatory tongue movements for V2 in a V1CV2 sequence don't begin until closure of C has been attained
  • movement toward V2 occurs during closure of C, having a large effect on the position and shape of the tongue during release of closure
  • V1 has little effect on the position of the tongue at the moment of closure
  • supports KC theory; conflicts with Öhman's findings

26
  • Coarticulation
  • Other models: MacNeilage, Henke, Benguerel and Cowan, Moll and Daniloff, Liberman, Tatham, etc.
  • Some are feature-based, in that each phonetic segment is assigned distinctive features which can then be modified in regular ways
  • Some are hierarchical models, with several levels of organization and complex interaction between levels
  • However, coarticulatory patterns are not explained adequately by any theories or models (Kent and Minifie, 1977)
  • Conflicting evidence (Öhman and Kent & Moll vs. KC and Gay)