Title: CS 551/651: Structure of Spoken Language
Lecture 10: Overview of Sound Perception
John-Paul Hosom, Fall 2008
2. Anatomy
3. Anatomy
- The Outer Ear
  - composed of pinna (auricle) and external auditory meatus (ear canal)
  - functions
    - irregular shape of pinna directs high-frequency sounds into ear canal
    - shape of pinna helps with determining location of sound
    - ear canal acts as resonator (2.7 cm long), with broad resonance between 3 and 5 kHz
  - implications
    - smaller animals better at hearing high-frequency sounds
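The resonance figure can be sanity-checked with a quarter-wavelength estimate. This is only a sketch: it assumes the ear canal behaves like a uniform tube closed at the ear drum and a speed of sound of 343 m/s, neither of which is stated on the slide. The function name is illustrative.

```python
def quarter_wave_resonance_hz(length_m, speed_of_sound_m_s=343.0):
    """First resonance of a tube closed at one end: f = c / (4 * L)."""
    return speed_of_sound_m_s / (4.0 * length_m)


canal_length_m = 0.027  # 2.7 cm, the ear canal length given on the slide
f1 = quarter_wave_resonance_hz(canal_length_m)
print(f"estimated first resonance: {f1:.0f} Hz")
```

Under these assumptions the estimate falls near the lower edge of the 3 to 5 kHz range. A real ear canal is not a uniform tube, which is one reason the resonance is broad.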
4. Anatomy
- The Middle Ear
  - composed of
    - chamber (tympanic cavity) containing ossicular chain: malleus (hammer), incus (anvil), stapes (stirrup)
    - middle ear also contains epitympanic recess. (The ossicles are lodged in the epitympanic recess.)
    - tympanic membrane (ear drum) is partition between ear canal (outer ear) and middle ear
    - sound transferred from tympanic membrane to cochlea (inner ear) via ossicular chain
    - stapes connects to footplate, which connects to oval window, which is membrane of inner ear
  - functions
    - matches acoustic vibration of air to that of fluid in cochlea (if air directly hits oval window (water), there's a 30 dB drop in energy)
    - has low-pass filter effect
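The 30 dB figure is consistent with a simple impedance-mismatch calculation. The sketch below assumes normal incidence on a flat air/water boundary and uses approximate textbook values for the characteristic impedances of air and water. These values and the function name are assumptions, not taken from the slides.

```python
import math


def transmitted_power_fraction(z1, z2):
    """Fraction of incident sound power transmitted across a flat boundary
    between media with characteristic impedances z1 and z2 (normal incidence)."""
    return 4.0 * z1 * z2 / (z1 + z2) ** 2


Z_AIR = 415.0      # Pa*s/m, approximate value (assumed)
Z_WATER = 1.48e6   # Pa*s/m, approximate value (assumed)

fraction = transmitted_power_fraction(Z_AIR, Z_WATER)
loss_db = -10.0 * math.log10(fraction)
print(f"transmitted fraction: {fraction:.5f}, loss: {loss_db:.1f} dB")
```

With these values only about 0.1% of the power crosses the boundary, which is roughly a 30 dB loss.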
5. Anatomy
- The Inner Ear
  - composed of
    - cochlea, semi-circular canals
  - function
    - simply speaking, the inner ear performs a frequency analysis of the incoming sound, which is transmitted via the VIIIth cranial nerve to the CNS.
  - cochlea
    - spiral in shape
    - 35 mm long, wound in 2 ¾ turns
    - filled with incompressible water-like fluid
    - separated into 3 parts by two membranes, the Basilar Membrane (BM) and Vestibular (Reissner's) Membrane
    - thousands of hair cells are attached to BM; these cells are connected to neurons that fire in response to sound
6. Anatomy
- The Inner Ear
  - cochlea
    - Sound from ear canal is amplified by middle ear
    - Vibration of bone against oval window is received by cochlea (not just at oval window, but by entire cochlea; no standing waves)
    - Entire cochlea vibrates at the same frequency as the stimulus
    - Different locations of the cochlea respond better to particular frequencies
    - Higher frequencies respond more near base of cochlea.
    - Cochlea and VIIIth nerve have tonotopic organization: direct mapping between place and frequency
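One way to make the place-to-frequency mapping concrete is Greenwood's place-frequency function. It is not mentioned in the slides and is swapped in here purely for illustration. The sketch uses the commonly quoted human constants (A = 165.4, a = 2.1, k = 0.88, with position given as a fraction of cochlear length measured from the apex) and the 35 mm cochlear length from the lecture.

```python
def greenwood_cf_hz(x_from_apex):
    """Greenwood place-frequency map for the human cochlea.

    x_from_apex is the position as a fraction of cochlear length:
    0.0 at the apex (tip), 1.0 at the base (near the oval window).
    """
    if not 0.0 <= x_from_apex <= 1.0:
        raise ValueError("x_from_apex must be between 0 and 1")
    return 165.4 * (10.0 ** (2.1 * x_from_apex) - 0.88)


COCHLEA_LENGTH_MM = 35.0  # length given in the lecture

for mm_from_apex in (0.0, 8.75, 17.5, 26.25, 35.0):
    cf = greenwood_cf_hz(mm_from_apex / COCHLEA_LENGTH_MM)
    print(f"{mm_from_apex:5.2f} mm from apex -> CF of roughly {cf:.0f} Hz")
```

The output rises from about 20 Hz at the apex to about 20 kHz at the base, in line with higher frequencies responding nearer the base.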
7. Anatomy
A schematic of the cochlea unrolled (middle)
and basilar membrane (bottom). The top figure
indicates the tonotopic organization. (from J.D.
Durrant and J.H. Lovrinic, Bases of Hearing
Science, 1977, in Daniloff p. 395)
8. Anatomy
This figure shows instantaneous displacement of
the BM for two instants in time, in response to
200-Hz sine wave, and the envelope of amplitude
peaks for this wave. Each point on BM vibrates
at a frequency equal to the input frequency (200
Hz).
9. Anatomy
- Tonotopic organization
  - BM varies in tautness and shape along its length, creating different frequency responses
  - Tautness at base responds well to high-frequency sounds; compliance at apex (tip) responds well to low-frequency sounds.
  - Each point in BM has a characteristic frequency (CF) at which the frequency response is maximum
  - The bandpass shape of a CF filter has constant ratio of frequency to bandwidth, implying better resolution (lower bandwidth) at lower frequencies
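The constant frequency-to-bandwidth ratio (constant Q) can be illustrated numerically. The value Q = 8 below is purely hypothetical and chosen for readability. The slides do not give a value.

```python
def bandwidth_hz(cf_hz, q):
    """Bandwidth of a filter whose ratio of center frequency to bandwidth is q."""
    return cf_hz / q


Q = 8.0  # hypothetical ratio of frequency to bandwidth, not from the slides

for cf in (200.0, 1000.0, 4000.0):
    print(f"CF {cf:6.0f} Hz -> bandwidth {bandwidth_hz(cf, Q):6.1f} Hz")
```

Because the ratio is fixed, a filter with a low CF has a narrow absolute bandwidth (fine frequency resolution), while a filter with a high CF has a wide one.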
10. Anatomy
- Transduction
  - Between BM and tectorial membrane (a thin, responsive, gelatinous membrane) are hair cells: about 25,000-30,000 outer hair cells and 3500-5000 inner hair cells in humans. (Tunnel of Corti separates outer from inner.) Each hair cell has 30-300 hair-like projections called stereocilia protruding from the surface into the fluid-filled cavity in a bundle.
  - When BM vibrates up and down, it creates a shearing motion between tectorial membrane and stereocilia. This motion causes tips of stereocilia to be displaced, causing electrical action potentials in a hair cell; the electrical signal is then transmitted down auditory nerve.
11. Anatomy
- Transduction
  - Most (95%) neurons connecting cochlea to higher levels in auditory system connect to inner hair cells
  - Function of outer hair cells less clear: provides amplification, sharp tuning (partially under the control of higher levels).
  - Hair cells connect to neurons: about 30,000 neurons in one human auditory nerve.
12. Anatomy
Organ of Corti contains hair cells and neurons; three parallel rows
1 - Inner hair cell
2 - Outer hair cells
3 - Tunnel of Corti
4 - Basilar membrane
5 - Habenula perforata
6 - Tectorial membrane
7 - Deiters' cells
8 - Space of Nuel
9 - Hensen's cells
10 - Inner spiral sulcus
11 - nerves
from http://www.iurc.montp.inserm.fr/cric/audition/english/corti/corti.htm
13. Anatomy
The same picture from Gray's Anatomy (1918)
14. Neural Response
Each neuron in the auditory nerve responds to certain frequencies; the response to each frequency can be plotted by stimulating a neuron with a particular frequency and measuring the response rate (firing rate) of the neuron.
The most sensitive frequency is the Characteristic Frequency (CF).
15. Neural Response
- Auditory firings processed by two types of neurons
  - ones extracting precise temporal features (onset chopper units),
  - others for spectral features (transient chopper units). (O'Shaughnessy p. 113)
- Each neuron has spontaneous rate of firing; this rate depends on the sensitivity of the neuron (high spontaneous rates associated with low threshold of firing).
- 3 groups of spontaneous rates
  - high rate (61%, 18 to 250 spikes/sec),
  - medium rate (23%, 0.5 to 18 spikes/sec),
  - low rate (16%, < 0.5 spikes/sec).
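The three spontaneous-rate groups amount to a simple threshold rule. The sketch below uses the boundaries listed above. How a rate exactly on a boundary (0.5 or 18 spikes/sec) is assigned is an assumption, since the slide does not say.

```python
def spontaneous_rate_group(spikes_per_sec):
    """Assign an auditory-nerve fiber to a spontaneous-rate group.

    Boundaries follow the lecture: low < 0.5, medium 0.5 to 18, high 18 to 250.
    Placing exact boundary values in the higher group is an assumption.
    """
    if spikes_per_sec < 0:
        raise ValueError("firing rate cannot be negative")
    if spikes_per_sec < 0.5:
        return "low"
    if spikes_per_sec < 18.0:
        return "medium"
    return "high"


for rate in (0.1, 5.0, 60.0):
    print(f"{rate:5.1f} spikes/sec -> {spontaneous_rate_group(rate)} rate")
```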
16. Neural Response
The firing rate of a neuron to a given stimulus can be plotted
[Figure: firing rate vs. stimulus level; labels: audio visual detection level, threshold]
- Firing rate has a dynamic range; if intensity is below or above this range, firing rate won't change.
- Typical range of 20 dB for low-threshold fibers, 40-50 dB for high-threshold fibers
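The idea of a limited dynamic range can be sketched as a firing rate that stays at the spontaneous rate below threshold, rises across the dynamic range, and then saturates. The straight-line rise and all numeric values except the 20 dB and 40 dB ranges are simplifying assumptions. Measured rate-level curves are smoother.

```python
def firing_rate(level_db, threshold_db, dynamic_range_db, spont_rate, max_rate):
    """Piecewise-linear rate-level curve: flat below threshold, flat above saturation."""
    fraction = (level_db - threshold_db) / dynamic_range_db
    fraction = min(1.0, max(0.0, fraction))  # clip outside the dynamic range
    return spont_rate + fraction * (max_rate - spont_rate)


# Hypothetical fibers: thresholds, spontaneous rates and maximum rates are illustrative.
low_threshold_fiber = dict(threshold_db=0.0, dynamic_range_db=20.0,
                           spont_rate=60.0, max_rate=250.0)
high_threshold_fiber = dict(threshold_db=20.0, dynamic_range_db=40.0,
                            spont_rate=0.2, max_rate=200.0)

for level in (-10, 10, 30, 50, 70):
    low = firing_rate(level, **low_threshold_fiber)
    high = firing_rate(level, **high_threshold_fiber)
    print(f"{level:4d} dB: low-threshold {low:6.1f}, high-threshold {high:6.1f} spikes/sec")
```

In this toy setup the low-threshold fiber saturates at 20 dB while the high-threshold fiber keeps changing up to 60 dB, so the pair together covers a wider range of levels than either alone.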
17. Neural Response
With three groups of neurons with different thresholds and firing rates, can cover wide range of signal levels at a given frequency
[Figure labels: low rate; high rate]
18. Neural Response
- Phase Locking
  - In addition to encoding frequency according to place along the BM, information is encoded in the rate of neuron firing
  - Upper limit of 4 to 5 kHz for phase locking
[Figure: count of firings vs. time (msec) for three tones; firings occur in groups spaced 1.0 msec apart for 1000 Hz, 1.18 msec apart for 850 Hz, and 2.45 msec apart for 408 Hz]
This figure shows the number of neuron firings over time in response to three different tones; the timing of the firings is related to the frequency of the tone
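The spacing between groups of firings in the figure matches the period of each tone, which is easy to verify. The function name is illustrative.

```python
def period_msec(frequency_hz):
    """Period of a tone in milliseconds."""
    return 1000.0 / frequency_hz


for tone_hz in (1000, 850, 408):
    print(f"{tone_hz:4d} Hz -> {period_msec(tone_hz):.2f} msec per cycle")
```

This prints 1.00, 1.18 and 2.45 msec, the same values as the msec/group labels in the figure, so the firings are locked to a particular phase of each cycle.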
19. Neural Response
Neural Recruitment: Another method for increasing dynamic range is for multiple neurons to fire in response to the same stimulus. If stimulus is low in energy, a small number of neurons, located near the CF, fire. More intense stimuli cause more neurons, located farther from the CF, to fire.
[Figure labels: strong stimulus; weak stimulus; 1 line = 50-100 fibers (same frequency)]
20. Neural Response
- Adaptation
  - If stimulus remains, neurons adapt to it, decreasing the firing rate with an exponential rate of decay (time constant τ ≈ 40 msec).
  - Most of decay occurs within 15-20 msec of stimulus onset.
  - When stimulus removed, firing rate falls to near zero and then exponentially increases back to spontaneous rate.
  - There may be two classes of neurons
    - neurons that respond to steady-state sounds,
    - neurons that respond to changes in frequency, with frequency sensitivity greatest at levels near human speech (O'Shaughnessy p. 119)
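Exponential adaptation can be written as a single decaying term added to a steady-state rate. This is a minimal sketch of a single-exponential model. The onset and steady-state rates are hypothetical, and the time constant is left as a parameter.

```python
import math


def adapted_rate(t_msec, onset_rate, steady_rate, tau_msec):
    """Firing rate t_msec after stimulus onset, decaying exponentially
    from onset_rate toward steady_rate with time constant tau_msec."""
    return steady_rate + (onset_rate - steady_rate) * math.exp(-t_msec / tau_msec)


ONSET_RATE = 300.0   # spikes/sec at stimulus onset (hypothetical)
STEADY_RATE = 120.0  # spikes/sec after adaptation (hypothetical)
TAU_MSEC = 40.0      # time constant, set here to the value quoted in the lecture

for t in (0, 20, 40, 80, 160):
    rate = adapted_rate(t, ONSET_RATE, STEADY_RATE, TAU_MSEC)
    print(f"t = {t:3d} msec -> {rate:6.1f} spikes/sec")
```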
21. Hearing Threshold
This figure shows the absolute thresholds of
hearing, as a function of frequency
22. JND
- Just Noticeable Difference: measure of ability to perceive a difference
- JND tests
  - Two stimuli differ along one dimension, otherwise identical
  - Subjects asked if two sounds are the same or different (AX test, "is X = A?")
  - Or subjects are asked which of two sounds most resembles third (ABX or AXB test, "is X = A or B?")
  - The JND occurs when 75% of responses are "different" (AX) or correctly identified (ABX)
- People are able to discriminate between 100 Hz and 101 Hz, but can't identify if a tone is 100, 101, ..., 109 Hz without making pairwise comparisons
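The 75% criterion can be applied to test results by finding where the proportion of "different" (AX) or correct (ABX) responses crosses 0.75. The sketch below uses linear interpolation between tested differences, which is one simple choice rather than a standard the slides prescribe. The response data are invented for illustration.

```python
def jnd_from_responses(differences, proportions, criterion=0.75):
    """Estimate the JND as the stimulus difference at which the response
    proportion first crosses the criterion, by linear interpolation.

    differences must be sorted in increasing order. Returns None if the
    proportions never cross the criterion between two tested points.
    """
    points = list(zip(differences, proportions))
    for (d0, p0), (d1, p1) in zip(points, points[1:]):
        if p0 < criterion <= p1:
            return d0 + (criterion - p0) * (d1 - d0) / (p1 - p0)
    return None


# Hypothetical ABX results: frequency difference (Hz) vs. proportion correct.
differences_hz = [0.5, 1.0, 2.0, 3.0, 4.0]
proportion_correct = [0.52, 0.60, 0.70, 0.80, 0.95]

jnd_hz = jnd_from_responses(differences_hz, proportion_correct)
print(f"estimated JND: {jnd_hz:.2f} Hz")
```

For this made-up data the crossing lies midway between the 2 Hz and 3 Hz conditions, giving an estimate of 2.5 Hz.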
23. JND
JND Trivia:
- JND is greater for louder sounds, sounds with duration ? 250 msec
- Sounds of equal intensity increase in loudness up to 200 msec
- Below 1 kHz, two tones must be different by 1 to 3 Hz to be perceived as different
- At higher frequencies, JND is larger (approx. 8 kHz tones require a 100 Hz separation to be heard as different)
- Entire frequency range has 1600 distinguishable frequencies and 350 intensities, or about 300,000 tones of frequency and intensity that can be identified in pairwise combination (for durations > 200 msec)
- For duration < 250 msec, there are 850 frequency levels; for duration < 10 msec only 120 levels and 170 intensities
- Identification of frequencies in isolation yields far fewer tones.
24. JND
JND Trivia, Timing Information:
- Onsets of two signals must differ by at least 2 msec to be heard as separate sounds
- To identify order of two signals, about 17 msec gap is required and sounds must be 125-200 msec long
- However, people use rise and fall of amplitude to segment speech: can not identify order of 4 vowels of 200 msec duration in repeating sequence, but can identify much shorter vowels if there are amplitude onsets and offsets as well as 50 msec gap between vowels.
- Sounds with energy onset < 20 msec heard as "plucks"; otherwise, heard as "bow"