SPEECH RECOGNITION - PowerPoint PPT Presentation

Slides: 19
Provided by: brendan9

Transcript and Presenter's Notes

1
SPEECH RECOGNITION
  • CORE READING
  • Parkin, A. (2000). Essential Cognitive
    Psychology. Psychology Press, Chap 10.
  • SUPPLEMENTARY READING
  • Campbell, R. (1999). Seeing speech in unexpected
    places - Mouths, machines and minds.
    Psychologist, 12(9), 446-449.
  • McGurk, H., & MacDonald, J. (1976). Hearing lips
    and seeing voices. Nature, 264, 746-748.
  • AIMS & OBJECTIVES
  • The aim of this lecture is to review models of
    hearing and speech recognition.
  • At the end of the lecture you will have learned:
  • The definition of language
  • Temporal processing of speech sounds
  • Categorical knowledge in speech recognition
  • The role of context in speech recognition

2
32 calls: grunts, barks, screams, hoots
Auditory, visual and tactile signals
3
What is language?
  • Language: signalling that enables concepts to be
    communicated between members of the same species
  • humans, dolphins, chimpanzees...
  • Signalling systems are symbolic
  • speech (phonemes)
  • writing (orthography)
  • sign (gesture)
  • Associations between stimulus and response.
  • Abstract concepts representing actions.

4
Broad language
Sound
Meaning
Speech stream: words, vowels and consonants
Print: alphabetic and non-alphabetic
Specific language
5
Chimp language
  • Primates can be taught to use manual signals such
    as American Sign Language (ASL).
  • They show an ability to manipulate symbols in
    order to communicate simple concepts
  • e.g. "I want a banana"
  • Spontaneous signs, e.g. watermelon signed as "drink fruit"
  • Transmit signs to young
  • Sign in sleep
  • True language: a communication system whose
    elements can be combined to form concepts and
    meanings; it is generative via rules
  • Humans do this with speech (Chomsky, 1959).
  • Chimps have no speech and so cannot create new
    concepts using speech but may use sign

6
Speech
  • First form of language that humans acquire.
  • Acquired relatively early in development.
  • Acquisition is effortless.
  • Is a basis for acquisition of other language
    skills including reading ability.
  • Human speech is known to be processed by specific
    brain areas.
  • Wernicke's area
  • temporal lobe
  • Broca's area
  • frontal lobe

7
Broca's area: expressive
Wernicke's area: receptive
8
Speech perception
  • When listening to a spoken word it is necessary
    to perform three operations:
  • process a series of sound waves
  • make fine distinctions between similar patterns
    of sounds
  • extract meaning from the utterance.
  • Segmentation problem
  • speech is a continuously changing pattern of
    sound
  • however, human speech perception is categorical.
  • e.g. b versus d.

9
Phonemes
  • Phonemes are sound units of language.
  • They have no meaning, e.g. DA and PA.
  • Two types of sounds: vowels and consonants
  • Humans use phonemes to construct words.
  • We make distinctions between phonemes according
    to temporal duration of the sound.
  • Vowels and consonants can be distinguished
    according to sound wave frequency.
  • Rhythm: synchronous proximity in duration of
    sound wave frequency.

10
Sound waves
Higher: louder
Frequency (shorter: speech)
11
Voice onset time (VOT)
  • The time elapsing between the onset of a sound
    from the mouth and the vibration of the vocal
    cords is called voice onset time.
  • The same acoustic signal with a different VOT
    results in perception of two different phonemes.
  • At a VOT of between 0 and 20 milliseconds all
    sounds are identified as DA.
  • Between 20 and 30 milliseconds there is
    uncertainty.
  • Above 30 milliseconds (> 30 ms) sounds are identified as PA.
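
The three VOT ranges above can be read as a simple lookup from a continuous value to a discrete percept. The sketch below is only an illustration of that idea: the function name `classify_vot` is hypothetical, and the handling of the exact boundary values (20 ms counted as DA, 30 ms counted as uncertain) is an assumption, since the slide does not say which side they fall on.

```python
def classify_vot(vot_ms: float) -> str:
    """Map a voice onset time (in milliseconds) to the percept listed on the slide.

    Assumption: exactly 20 ms is treated as DA and exactly 30 ms as
    uncertain; the slide leaves these boundary cases open.
    """
    if vot_ms < 0:
        raise ValueError("this sketch only covers non-negative VOT values")
    if vot_ms <= 20:
        return "DA"
    if vot_ms <= 30:
        return "uncertain"
    return "PA"


if __name__ == "__main__":
    # A continuum of equally spaced VOT values collapses into discrete labels.
    for vot in (0, 10, 20, 25, 30, 40, 60):
        print(f"{vot:>3} ms -> {classify_vot(vot)}")
```

Equal 10 ms steps along the continuum do not produce equal perceptual steps: 0 and 10 ms receive the same label, while 20 and 40 ms receive different ones.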

12
Category boundary
Hear PA
Hear DA
13
Categorical perception
  • Similar sounds can easily be confused when they
    are presented in isolation for example DA and PA.
  • Speech is continuous; there are no natural breaks.
  • We hallucinate word boundaries.
  • Speech perception is categorical.
  • The speech recognition system must be able to
    modify fuzzy input.
  • This gives the listener the correct sound using
    top-down knowledge.

14
Effects on speech recognition
  • Massaro's model can explain top-down effects on
    speech recognition quite well.
  • McGurk effect.
  • Phonemic restoration effect.
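
The slide does not spell out Massaro's model. Assuming it refers to his fuzzy logical model of perception (FLMP), the two-alternative case can be sketched as below: each source (ear, eye) gives a degree of support between 0 and 1 for one candidate phoneme, and the supports are multiplied and then normalised against the competing candidate. The function and parameter names are hypothetical, and the input values in the usage lines are made-up illustrations, not data.

```python
def flmp_probability(auditory_support: float, visual_support: float) -> float:
    """Two-alternative FLMP-style integration (sketch, assumed form).

    Each argument is the degree (0..1) to which one source supports
    candidate A; (1 - value) is taken as its support for candidate B.
    Returns the predicted probability of reporting candidate A.
    """
    for value in (auditory_support, visual_support):
        if not 0.0 <= value <= 1.0:
            raise ValueError("support values must lie between 0 and 1")
    match = auditory_support * visual_support
    mismatch = (1.0 - auditory_support) * (1.0 - visual_support)
    if match + mismatch == 0.0:
        raise ValueError("sources fully contradict each other; ratio undefined")
    return match / (match + mismatch)


if __name__ == "__main__":
    # Ambiguous sound (0.5) plus clear lip movement (0.9): vision dominates.
    print(flmp_probability(0.5, 0.9))
    # Clear sound (0.9) plus conflicting lips (0.2): the percept is pulled away.
    print(flmp_probability(0.9, 0.2))
```

In this form an ambiguous source contributes little, so the clearer source drives the percept. That is one way to read visual influences on heard speech such as the McGurk effect.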

15
McGurk effect
  • All participants are presented with a single
    phoneme e.g. VA and asked to identify it.
  • Participants are simultaneously shown a video
    with a mouth articulating a phoneme e.g., BA, DA,
    VA.

16
Chimps actively monitor facial expression and
eye movements in conversation
17
Language environment
  • Phoneme perception depends on language
    environment.
  • Neonates in any culture can discriminate easily
    between the sounds r and l.
  • American adults can make this discrimination but
    Japanese adults cannot.
  • Language environment shapes perception.

18
Summary
  • Speech recognition requires recognising the
    temporal duration of sounds (VOT).
  • Listeners make use of context including lips and
    facial expression in speech perception.

DA or PA?