Title: Theoretical Perspectives to Speech Perception
1Theoretical Perspectives to Speech Perception
- Mark C. Flynn
- The University of Canterbury
2Question to be answered
- How are we able to perceive the acoustic speech
wave and transform it into a linguistically coded
message in our brain?
3Basic Problem is
- Human communication is not just a transfer of
information like two fax machines connected with
a wire. It is a series of alternating displays of
behaviour by sensitive, scheming, second
guessing, social animals.
- Steven Pinker
4Spectrogram
5The speech input consists of
- Frequency range 50-5600 Hz
- Critical band filters
- Dynamic range 50 dB
- Temporal resolution of 10 ms
- Smallest detectable change in F0 2 Hz
- Smallest change in F1 40 Hz
- Smallest change in F2 100 Hz
- Smallest change in F3 150 Hz
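The figures above can be turned into a rough check of whether a given change in the signal would be noticed. The sketch below is illustrative only: it assumes the listed values act as hard cutoffs, which is a simplification, and the names `JND_HZ`, `is_detectable` and `in_speech_band` are hypothetical, not taken from the slides.

```python
# Smallest detectable changes listed on the slide, in Hz.
# Assumption: each value is treated as a fixed threshold.
JND_HZ = {"F0": 2.0, "F1": 40.0, "F2": 100.0, "F3": 150.0}

# Frequency range of the speech input listed on the slide, in Hz.
SPEECH_BAND_HZ = (50.0, 5600.0)


def is_detectable(cue: str, old_hz: float, new_hz: float) -> bool:
    """Return True if the change in `cue` meets or exceeds its threshold."""
    return abs(new_hz - old_hz) >= JND_HZ[cue]


def in_speech_band(freq_hz: float) -> bool:
    """Return True if `freq_hz` lies inside the listed frequency range."""
    low_hz, high_hz = SPEECH_BAND_HZ
    return low_hz <= freq_hz <= high_hz
```

For example, `is_detectable("F1", 500, 545)` is true under these assumptions, while a 60 Hz shift in F2 falls below the listed 100 Hz threshold.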
6Issues in models of speech perception
- Bottom-up vs top-down processing
- Acoustic phonetic invariance
- Segmentation of the signal into phonetic units
- Time normalization
- Talker normalization
- Lexical representations for optimal search
- Phonological recoding of words in sentences
- Dealing with errors in the initial representation
- Interpretation of prosodic cues
7Bottom-up processing
- Peripheral processing
- Acoustic property detectors
- Phonetic feature detectors
- Segmental analysis
- Lexical search
- Syntactic and semantic analysis
8Top-down processing
- Higher level processing
- Lexical cues
- Contextual cues
- World knowledge
- Cognitive skills
9Acoustic phonetic non-invariance
- Intra-speaker variability
- Co-articulation (phonetic context)
- Which acoustic cues are used?
- How are the acoustic cues combined?
10Segmentation into phonetic units
- Segments overlap (co-articulation)
- Segments are not always well defined acoustically
- Acoustic segments do not always match phonetic
segments
- Errors are difficult to recover from
11Overlapping speech segments
12Effects of consonants on vowel transitions
13Time normalization
- Duration of segments can vary by a factor of two or three
- Speaking rate
- Syllable stress
- Syntactic boundaries
- Co-articulation
- Duration is a phonetic cue that interacts with
the other factors above
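One simple way to picture time normalization is to express each segment's duration as a proportion of the whole utterance, so that a uniformly faster or slower rendition gives the same pattern. This is a minimal sketch under that assumption, not a method proposed in the slides; `normalize_durations` is a hypothetical helper, and uniform scaling ignores the stress, boundary and co-articulation effects listed above.

```python
def normalize_durations(durations_ms):
    """Rescale segment durations so that they sum to 1.0.

    Assumption: speaking rate stretches every segment by the same factor,
    which real speech does not strictly do.
    """
    total_ms = sum(durations_ms)
    if total_ms <= 0:
        raise ValueError("total duration must be positive")
    return [d / total_ms for d in durations_ms]


# The same three segments spoken slowly and at twice the rate.
slow = [100, 200, 100]
fast = [50, 100, 50]
print(normalize_durations(slow) == normalize_durations(fast))
```

Relative durations keep the phonetic cue (which segment is long relative to its neighbours) while discarding the overall rate.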
14Talker normalization
- Formant frequencies depend on anatomical size
- Dialects vary
- Moderate amounts of noise and distortion are
common, but do not affect the accuracy of speech
perception in normally hearing listeners
- Indexical communication
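To illustrate how the dependence of formant frequencies on anatomical size might be factored out, the sketch below z-scores one talker's formant values against that talker's own mean and spread. This per-talker z-scoring is a swapped-in technique chosen for illustration, not one named in the slides, and `zscore_formants` is a hypothetical helper.

```python
from statistics import mean, pstdev


def zscore_formants(values_hz):
    """Z-score one talker's values for a single formant.

    Assumption: talker differences act as a shift and a scaling of the
    formant values, which z-scoring removes.
    """
    centre = mean(values_hz)
    spread = pstdev(values_hz)
    if spread == 0:
        raise ValueError("formant values must not all be identical")
    return [(v - centre) / spread for v in values_hz]


# Two hypothetical talkers whose F1 values differ only by a scale factor.
talker_a = [300.0, 500.0, 700.0]
talker_b = [360.0, 600.0, 840.0]
print(zscore_formants(talker_a))
print(zscore_formants(talker_b))
```

After normalization the two talkers' vowels occupy matching positions, although this kind of transform also discards indexical information about the talker.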
15Lexical normalization
- Perceptual units?
- features
- phonemes
- syllables
- words
- beyond?
16Phonological recoding
- The phonetic representation of a word depends on
the sentence/word context.
- For example, /p/ is normally aspirated but in
some contexts it is unreleased as in /apt/ or
unaspirated as in /spIt/ (Kess, 1992).
17Interpretation of prosodic cues
- Prosodic cues (F0, segmental durations, intensity
contour) provide syntactic, semantic and
pragmatic information
- The prosodic cues modify the acoustic phonetic
cues
- How is the prosodic information preserved and
incorporated?
18Models of Speech Perception
- Motor Theory (Liberman et al., 1967; Liberman &
Mattingly, 1985).
- TRACE (McClelland & Elman, 1986; Elman, 1989).
- Neighborhood Activation Model (Luce, 1986; Luce,
Pisoni, & Goldinger, 1989).
- Logogen Theory (Morton, 1969, 1979).
21The Speech Chain
22Why perception is not all phonetic!
- Woman: I'm leaving you
- Man: Who is he?
- Blundy gets chair for wife
- Isle of view
23Predicted vs Actual CNC words.
24Effect of Context (at 5 dB SNR).
- Participants shown in order of PTA
25Distortion and Audibility.