Speech Perception - PowerPoint PPT Presentation

Slides: 35
Provided by: Heid124

Transcript and Presenter's Notes

1
Speech Perception
recognize speech / wreck a nice beach
2
The Major Questions in Speech Perception
  1. How do we identify the sounds we hear?
  2. What about the lack of invariance in the speech
    signal?
  3. What about degraded signals?

3
How do we identify sounds?
  • Speech occurs at an alarming rate
  • (estimates vary between 120-180 wpm)
  • 10-15 or 25-30 phonetic segments/second!
  • The speech signal is continuous: there are no
    easily identifiable boundaries between words
  • The speech signal to the right is segmented
    into "how are you?"

4
How do we deal with the lack of invariance in a
speech signal?
  • Lack of Invariance comes from
  • Coarticulation effects (Allophonic variation)
  • Tom Burton tried to steal a butter plate
  • Speaker variation
  • No exact repetition
  • Reduction / deletion of segments

5
Acoustic Cues
  • No single acoustic cue is reliably present for
    any given phoneme
  • For /di/ and /du/, the /d/ is acoustically very
    different, but speakers will indicate that it's
    still the same segment
  • Each phoneme has more than one acoustic cue
  • voice-onset-time (VOT)
  • energy in the burst
  • onset frequency of the first formant
  • placement in syllable

6
Voice Onset Time (VOT)
  • Measure of time between the burst of air and
    beginning of vocal-fold vibration of the adjacent
    vowel
  • Best single cue for distinguishing between
    voiced/voiceless consonants in many languages:
    English, Dutch, Spanish, Hungarian, Tamil,
    Cantonese, Thai, and Eastern Armenian (Lisker &
    Abramson, 1964)
  • BUT we can still interpret whispered speech!
    (practically all voiceless)
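
The VOT measure defined above is a difference between two time points. A minimal sketch, assuming hand-labelled landmark times in milliseconds (the function name and the example values are illustrative only and are not taken from Lisker & Abramson):

```python
def voice_onset_time_ms(burst_ms: float, voicing_onset_ms: float) -> float:
    """VOT = time of voicing onset minus time of the stop burst, in ms."""
    return voicing_onset_ms - burst_ms


# Hypothetical landmark times for two tokens (illustrative values only).
short_lag = voice_onset_time_ms(burst_ms=100.0, voicing_onset_ms=110.0)
long_lag = voice_onset_time_ms(burst_ms=100.0, voicing_onset_ms=165.0)
# Voicing that starts before the burst (prevoicing) gives a negative VOT.
prevoiced = voice_onset_time_ms(burst_ms=100.0, voicing_onset_ms=80.0)

print(short_lag, long_lag, prevoiced)
```

Where the voiced/voiceless boundary falls on this scale depends on the language, so the sketch deliberately stops at the measurement and does not classify.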

7
(No Transcript)
8
Categorical Perception (chunking of speech
signals)
  • Although speech is non-discrete, we perceive it
    discretely!
  • Task: Identify the sound
  • VOT (ms): 0------10------20------30------40------50------60
  • Identification: 100% /d/ at the low end, 50% at the boundary, 100% /t/ at the high end

9
Categorical Perception: Yeni-Komshian and
LaFontaine (1983)
  • 7 stimuli, between /di/ and /ti/ (VOT 0-60 ms)
  • 0----10----20----30----40----50----60
  • Pair types: same, 1-step, 2-steps
  • Task: Discriminate between these sounds
  • (2 steps apart, so a 20 ms difference in VOT)
  • 0/20 ms: 100% same; 40/60 ms: 100% same
  • 10/30 ms: 50% same; 30/50 ms: 50% same
  • 20/40 ms: 100% different
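
The discrimination pattern above is what one would expect if listeners compare only category labels. A minimal sketch, under two loudly stated assumptions that are not from the study itself: an idealized step-shaped identification function with its boundary at 30 ms, and independent labelling of the two stimuli in a pair. The names `prob_t` and `prob_same` are hypothetical.

```python
def prob_t(vot_ms: float, boundary_ms: float = 30.0) -> float:
    """Probability of labelling a stimulus /t/ (idealized step function).

    Assumption: below the boundary everything is heard as /d/, above it
    as /t/, and exactly at the boundary the label is a coin flip.
    """
    if vot_ms < boundary_ms:
        return 0.0
    if vot_ms > boundary_ms:
        return 1.0
    return 0.5


def prob_same(vot_a: float, vot_b: float) -> float:
    """Probability that two stimuli get the same label (independent labels)."""
    pa, pb = prob_t(vot_a), prob_t(vot_b)
    return pa * pb + (1.0 - pa) * (1.0 - pb)


# All pairs that are two steps (20 ms) apart on the 0-60 ms continuum.
for a in range(0, 50, 10):
    b = a + 20
    print(f"{a}/{b} ms: {100 * prob_same(a, b):.0f}% same")
```

Under these assumptions the within-category pairs come out "same", the pair that straddles the boundary comes out "different", and pairs that include the boundary stimulus sit at chance.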

10
What about bilinguals?
  • VOT boundaries vary between languages
  • Perception studies show compromise-effects
  • Canadian French-English bilinguals
  • (Caramazza, Yeni-Komshian, Zurif, & Carbone,
    1973)
  • Spanish-English bilinguals
  • (Williams, 1977, 1980)
  • Bilinguals seem to have developed a single
    perceptual system!

11
Coarticulation Effects
  • Phonemes are influenced by the sounds around
    them!
  • Take naturally recorded speech
  • Remove vowel
  • Guess the vowel
  • Example: "see" /si/, remove the vowel
  • Play 150 ms of /s/
  • Can identify removed vowel (for most vowels)

12
How is speech perceived under less than ideal
conditions?
Top-down -> UNDERSTANDING <- Bottom-up
  • Semantic context (top-down)
  • Syntactic structure (top-down)
  • Acoustic information (bottom-up)

13
A demonstration
  • The McGurk Effect
  • We use visual AND auditory cues to determine what
    segments we're hearing!

14
Top-Down Processing (using semantic and syntactic
information to decode individual words in fluent
speech)
language, speech recognition, talk
recognize speech
Bottom-Up Processing (using acoustic information to
encode the speech signal)
15
Phoneme Restoration Effect (Warren, 1970)
  • Replaced sounds with a cough
  • Word presented in a sentence
  • The bill was sent to the legi_lature.
  • Where does the cough occur?
  • Participants thought whole word was present. The
    /s/ was mentally restored!
  • It was found that the _eel was on the orange.
  • It was found that the _eel was on the shoe.

16
Semantic Influences (Garnes and Bond, 1976)
  • 16 tokens, spanning the spectrum of
    bait-date-gate
  • 3 carrier sentences
  • Here's the fishing gear and the ______.
  • Check the time and the _______.
  • Paint the fence and the _______.
  • If the token is unambiguous, listeners report it
    even when the sentence is semantically
    implausible (Paint the fence and the bait.)
  • If the token is ambiguous (near a phoneme
    boundary), semantic context decides what is heard

17
Slurred Speech
  • Syntactic and semantic cues help!
  • Words (with noise) are perceived more accurately
    in sentences than in isolation
  • Pollack & Pickett (1964) recorded
    conversations and excised individual words.
    When the words were presented to listeners in
    isolation, only half of them were correctly
    recognized.

18
Rules of Rapid Speech: "han me the thim book"
  • Often can drop the las consonan
  • Consonants in clusters may be modified to have
    the same blace of articulation/voicing
  • thimbook, thingcarpet, Istambul
  • NOT thingbook, thim slice
  • Almost all vowels can be shortened
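
The place-assimilation pattern in the examples above (thimbook and thingcarpet, but not thingbook or thim slice) can be sketched as a rule on a word-final n. This is a rough illustration that works on spelling rather than on phonemes; the function name and the letter sets are assumptions made for the example.

```python
LABIAL_LETTERS = set("pbm")   # spelled onsets treated as labial
VELAR_LETTERS = set("kg")     # spelled onsets treated as velar


def assimilate_final_n(word: str, next_word: str) -> str:
    """Rapid-speech rendering of `word next_word` when `word` ends in n.

    Orthographic approximation: final n becomes m before a labial and is
    written ng before a velar; otherwise the words are left unchanged.
    """
    if not word.endswith("n") or not next_word:
        return f"{word} {next_word}".strip()
    first = next_word[0].lower()
    # Treat "c" as velar unless it is followed by e, i, y, or h (rough guess).
    hard_c = first == "c" and next_word[1:2].lower() not in ("e", "i", "y", "h")
    if first in LABIAL_LETTERS:
        return word[:-1] + "m" + next_word
    if first in VELAR_LETTERS or hard_c:
        return word + "g" + next_word
    return f"{word} {next_word}"


print(assimilate_final_n("thin", "book"))    # labial onset
print(assimilate_final_n("thin", "carpet"))  # velar onset
print(assimilate_final_n("thin", "slice"))   # no assimilation expected
```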

19
Listening for Mispronunciation (Cole, 1973; Cole,
Jakimik, & Cooper, 1978; Cole & Jakimik, 1980)
  • 20-minute story. Press a button whenever you hear
    a mispronunciation.
  • Listeners notice voicing errors most often for stops
  • 70% for stops (boot to poot)
  • 64% for affricates (chance to jance)
  • 38% for fricatives (fin to vin)
  • Listeners notice almost all place changes (80-90%)
  • (take to pake)
  • no higher percentage if voicing also changed
    (take to gake)
  • Listeners notice more errors at beginnings of words
  • 72% for word-initial segments (dish to tish)
  • 33% for word-final segments (split to splid)
  • Conclusion: we DO use bottom-up information!

20
Mad Gab
21
Mad Gab
  • Lit Told Hid High No

22
Mad Gab
  • Ate Whole Freak Haul

23
Mad Gab
  • Hike Air Rub Ouch Hue

24
Mad Gab
  • Huff Ink Earn Elf Aisle

25
Mad Gab
  • Ale All Heap Hop
  • (A lollipop)

26
Mad Gab
  • Butcher Ed Stew Gather
  • (Put your heads together)

27
Mad Gab
  • Lease Hummer Reap Wrest Lee
  • (Lisa Marie Presley)

28
Mad Gab
  • Bill Spare Reed Oh-boy!
  • (Pillsbury Dough Boy)

29
Models of Speech Perception
  • Motor Theory of Speech Perception
  • Speech signals interpreted by reference to motor
    speech movements
  • Cohort Model
  • TRACE Model

30
Models of Speech Perception
  • Motor Theory of Speech Perception
  • Cohort Model
  • 1) The acoustic information at the beginning of a
    word activates a cohort of possible words
  • 2) Syntax and semantics influence the selection
    of the target word from the cohort
  • TRACE Model

31
Cohort size
  • Standard dictionary
  • after 50 ms, 115 nouns share the same sounds
  • after 100 ms, 43 nouns
  • after 200 ms, 11 nouns
  • after 300 ms, 5 nouns
  • (Average word length, depending on speech rate,
    for one-, two-, and three-syllable words is
    between 550 and 830 ms)
  • Word recognition occurs before the isolation
    point! (the point at which only one word is
    possible)
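
The shrinking cohort can be illustrated with a toy lexicon. This is a minimal sketch, assuming spelling as a stand-in for the incoming sounds and a made-up seven-word lexicon; it shows only the bottom-up narrowing, not the syntactic and semantic selection that lets listeners recognize words before the isolation point.

```python
# Hypothetical mini-lexicon; letters stand in for phonetic segments.
LEXICON = ["trombone", "trouble", "trumpet", "trace", "tree", "bank", "band"]


def cohort_after(prefix: str, lexicon=LEXICON) -> list:
    """All words still compatible with the input heard so far."""
    return [w for w in lexicon if w.startswith(prefix)]


def isolation_point(word: str, lexicon=LEXICON):
    """Number of initial segments after which `word` is the only candidate.

    Returns None if the word never becomes unique in this lexicon.
    """
    for n in range(1, len(word) + 1):
        if cohort_after(word[:n], lexicon) == [word]:
            return n
    return None


# Watch the cohort shrink as more of "trombone" arrives.
for n in range(1, 5):
    prefix = "trombone"[:n]
    print(prefix, cohort_after(prefix))
```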

32
Models of Speech Perception
  • Motor Theory of Speech Perception
  • Cohort Model
  • TRACE Model: Neural Network
  • (Elman and McClelland 1984, 1986)
  • processing occurs through excitatory and
    inhibitory connections in processing units
    called nodes
  • 3 levels of nodes: features, phonemes, and words,
    all highly interconnected
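
The idea of excitatory and inhibitory connections between nodes can be illustrated with a toy network. This is not the TRACE implementation: it is a sketch with a single layer of word nodes, a hypothetical three-word lexicon, and arbitrary parameter values, in which each node is excited by how well it matches the input and inhibited by its competitors.

```python
def update(acts, bottom_up, excite=0.2, inhibit=0.1, decay=0.1):
    """One update cycle for a layer of competing word nodes.

    acts      -- current activation of each word node (0..1)
    bottom_up -- how well each word matches the input (0..1), assumed given
    """
    total = sum(acts.values())
    new_acts = {}
    for word, a in acts.items():
        # Excitation from the input minus inhibition from all other words.
        net = excite * bottom_up[word] - inhibit * (total - a)
        # Positive net input pushes activation toward 1, negative toward 0.
        delta = net * (1.0 - a) if net > 0 else net * a
        new_acts[word] = min(1.0, max(0.0, a + delta - decay * a))
    return new_acts


# Hypothetical match scores for an input that sounds most like "bank".
bottom_up = {"bank": 1.0, "band": 0.6, "tank": 0.3}
acts = {word: 0.0 for word in bottom_up}
for _ in range(20):
    acts = update(acts, bottom_up)

print({word: round(a, 3) for word, a in acts.items()})
```

In this toy setting the best-matching word ends up with the highest activation while its competitors are held down by inhibition. The full model adds feature and phoneme layers and feedback between levels.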

33
Evidence for the TRACE model (or other
interactive models)
  • We activate all possible words from the phonology
    regardless of semantic fit
  • "He swam across to the far side of the river and
    scrambled up the bank before running off" primes
    "bank" as financial institution!
  • parts of words cause priming
  • "trombone" primes "rib" just as well as "bone" does
  • word boundaries don't interfere with phonological
    retrieval
  • "nudist" is primed by the phrase "new distance"
  • BUT we eliminate all the irrelevant words within
    a few syllables

34
For more information
  • b-d-g continuum
  • http://www.phonetik.uni-muenchen.de/Lehre/Skripten/Haskins/Haskins/MISC/PP/bdg/bdgau.html
  • Resources on phonetics and phonology
  • http://faculty.washington.edu/dillon/PhonResources/
  • Why we need prosody and lexical access
  • http://emsah.uq.edu.au/linguistics/book/flant.htm