Title: Visual Aid For the Hearing Impaired
1. Visual Aid For the Hearing Impaired
- Katherine Andrade
- Carlos Castillo
- Frank Taranto, Jr.
- Jason Vieira
2. How The Ear Works
- Sounds create vibrations
- Outer ear collects the vibrations
- Sound waves strike the eardrum
- Vibrations travel through the bones of the middle ear (ossicles)
- Inner ear fluid is set in motion
- Nerve cells are excited and the impulses reach
the brain
3. What Causes Hearing Loss?
- Anything that completely blocks the ear canal can cause hearing loss
- Blockage with earwax (also called cerumen) is common
- Infections with swelling that shuts the ear canal
- Foreign bodies in the ear
- Neural Problems
- Noise
- Birth defects
- A growth in the ear canal
- Ear Infections
- Tumors
4. Who Is Affected?
- Hearing loss affects more American families than any other chronic health condition
- Ironically, hearing loss is also the most preventable chronic health condition
- More than 40 million Americans have hearing loss
- About 3 out of every 1,000 children in the United States are born deaf or hard-of-hearing
- Hearing loss affects approximately 17 in 1,000 children under age 18
- About 15% of college graduates have some level of hearing loss
5. Existing Solutions
- The first step in treating hearing loss is an accurate diagnosis: finding out exactly what is causing the hearing loss
- Treating the underlying disease if one is found, such as hypothyroidism, or using antibiotics for an infection
- A hearing aid to amplify sound (for most people with sensorineural hearing loss, amplification is the best or only option)
- Surgery for mechanical causes, such as chronic ear infections, or to place a cochlear implant
- A videophone, which combines video and audio over phone lines
6. Problems With These Solutions
- Prescription Drugs
  - Routine
  - Side Effects
  - Costly
- Hearing Aid
  - Background Noise
  - Evaluation Cost
  - Maintenance
- Surgery
  - Invasive
  - Risk
  - Costly
- Videophone
  - Video and Audio appear out of sync
  - Expensive
  - Both parties must have similar technology
7. Applications of our Design
- Help a hearing-impaired person better understand speech over the phone by watching a pair of artificial lips on a screen
- On television, whenever the speaker's lips are not visible, a hearing-impaired person can watch a generated set
- Because the sound is processed at the receiver, the delay between video and audio is eliminated
- Lip information complementing speech improves intelligibility in environments with a low signal-to-noise ratio (SNR)
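To make "low SNR" concrete, here is a small sketch. It computes SNR in decibels as `10 * log10(P_signal / P_noise)` and mixes white noise into a signal at a chosen SNR. A negative SNR means the noise carries more power than the signal. The 440 Hz test tone, the -5 dB target, and the helper names `snr_db` and `add_noise_at_snr` are hypothetical illustrations, not part of the design.

```python
import numpy as np


def snr_db(signal: np.ndarray, noise: np.ndarray) -> float:
    """Signal-to-noise ratio in decibels: 10 * log10(P_signal / P_noise)."""
    p_signal = np.mean(signal ** 2)
    p_noise = np.mean(noise ** 2)
    return float(10.0 * np.log10(p_signal / p_noise))


def add_noise_at_snr(signal: np.ndarray, target_db: float, rng: np.random.Generator):
    """Return (noisy_signal, noise) with white Gaussian noise scaled to target_db."""
    noise = rng.standard_normal(signal.shape)
    # Noise power required so that P_signal / P_noise == 10 ** (target_db / 10)
    p_wanted = np.mean(signal ** 2) / 10.0 ** (target_db / 10.0)
    noise *= np.sqrt(p_wanted / np.mean(noise ** 2))
    return signal + noise, noise


fs = 22050                                 # sampling rate quoted later in the deck
t = np.arange(fs) / fs                     # one second of time stamps
tone = np.sin(2 * np.pi * 440.0 * t)       # hypothetical stand-in for speech
rng = np.random.default_rng(0)
noisy, noise = add_noise_at_snr(tone, target_db=-5.0, rng=rng)
print(round(snr_db(tone, noise), 2))       # -5.0: noise power exceeds signal power
```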
8. Intelligibility Pattern of Integrated A/V
9. Procedure
- Record all 42 English phonemes spoken by several subjects
- Photograph different subjects as they pronounce all the phonemes, and measure lip shape parameters
- Use the COLEA toolbox to obtain LPC coefficients, which describe the state of the vocal tract
- Use neural networks to associate the LPC coefficients with their respective lip shapes
- Fit parabolas to approximate phoneme lip shapes, and animate coherent speech using MATLAB
10. Phoneme Acquisition
11. Measuring Lip Parameters
- Measure inner height of upper lip
- Measure inner height of lower lip
- Measure outer height of upper lip
- Measure outer height of lower lip
12. Organs of Speech and Linear Predictive Coding (LPC)
- A technique for modeling the vocal tract
- Well suited to pitch and formant detection; it also determines the area functions of the vocal tract
- Two methods for LPC analysis:
  - Autocorrelation method
  - Covariance method
- Cepstral coefficients are another way of representing a sound, and they are more stable than LPC coefficients
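Cepstral coefficients can be computed directly from an existing set of LPC coefficients, so switching representations would not require re-analyzing the audio. The standard recursion is $c_m = \alpha_m + \sum_{k=1}^{m-1} \frac{k}{m} c_k \alpha_{m-k}$, with predictor coefficients $\alpha_k = -a_k$ and $\alpha_i = 0$ for $i > p$.

The sketch below makes three assumptions:
- the coefficients arrive in prediction-error form (`a[0] == 1`);
- the gain term $c_0$ is skipped;
- the name `lpc_to_cepstrum` is hypothetical.

It is not presented as how COLEA computes cepstra.

```python
import numpy as np


def lpc_to_cepstrum(a, n_ceps: int) -> np.ndarray:
    """Cepstral coefficients c_1..c_n_ceps of the all-pole model 1 / A(z).

    `a` is in prediction-error form: a[0] == 1 and
    e(j) = a[0] s(j) + a[1] s(j-1) + ... + a[p] s(j-p).
    The gain term c_0 is not computed.
    """
    alpha = -np.asarray(a, dtype=float)[1:]   # predictor form: alpha_k = -a_k
    p = len(alpha)
    c = np.zeros(n_ceps + 1)                  # c[0] is left unused
    for m in range(1, n_ceps + 1):
        acc = alpha[m - 1] if m <= p else 0.0
        # c_m = alpha_m + sum_k (k/m) c_k alpha_{m-k}, with alpha_i = 0 for i > p
        for k in range(max(1, m - p), m):
            acc += (k / m) * c[k] * alpha[m - k - 1]
        c[m] = acc
    return c[1:]


# First-order check: for 1 / (1 - 0.9 z^-1) the cepstrum is 0.9**m / m,
# i.e. approximately [0.9, 0.405, 0.243]
print(lpc_to_cepstrum([1.0, -0.9], 3))
```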
13. Extracting LPC coefficients
- Linear Predictive Coding is a means to compress a continuous signal
- LPC coefficients are derived from previous sample values
- $a_0 s(j) + a_1 s(j-1) + \cdots + a_n s(j-n)$ (the prediction error, with $a_0 = 1$)
- $a_k$: coefficients
- $s(j)$: present sample
- The procedure is repeated over successive blocks of samples
- The optimum number of coefficients is between 10 and 20
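Here is a sketch of the autocorrelation method from slide 12. It computes $a_0 \dots a_n$ for one block of samples using the Levinson-Durbin recursion, which minimizes the energy of the prediction error defined above. Several details are assumptions for illustration: the Hamming window, the function name `lpc_autocorrelation`, and the synthetic test signal. The project itself used the COLEA toolbox, whose internals are not reproduced here.

```python
import numpy as np


def lpc_autocorrelation(frame, order: int):
    """LPC coefficients of one block of samples by the autocorrelation method.

    Returns (a, err): a[0] == 1 and a[1..order] minimise the energy of
    e(j) = a[0] s(j) + a[1] s(j-1) + ... + a[order] s(j-order);
    err is the remaining prediction-error energy.
    """
    x = np.asarray(frame, dtype=float)
    x = x * np.hamming(len(x))                 # taper the block edges (assumption)
    r = np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    if err == 0.0:                             # silent block: nothing to predict
        return a, 0.0
    for i in range(1, order + 1):              # Levinson-Durbin recursion
        k = -np.dot(a[:i], r[i:0:-1]) / err    # reflection coefficient of stage i
        a[: i + 1] = a[: i + 1] + k * a[i::-1]
        err *= 1.0 - k * k
        if err <= 0.0:                         # perfectly predictable: stop early
            break
    return a, err


rng = np.random.default_rng(1)
w = rng.standard_normal(20000)
s = np.zeros_like(w)
for j in range(2, len(s)):
    s[j] = 1.3 * s[j - 1] - 0.4 * s[j - 2] + w[j]   # known second-order process
a, err = lpc_autocorrelation(s, order=2)
print(np.round(a, 2))                               # close to [1, -1.3, 0.4]
```

The example uses a second-order model only because its true coefficients are known in advance. Speech blocks would use the 10 to 20 coefficients recommended above.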
14. Using Parabolas To Simulate Lip Shapes
- We begin with a parabola because its shape resembles a lip
- By varying d1 and d2 we obtain various configurations of the parabola $y = ax^2 + c$
- Four parabolas are used to give a 2-D effect
[Figs. 1-3: parabola configurations for different values of d1 and d2]
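The slide does not define d1 and d2, so the sketch below makes the following assumptions:
- d1 is the vertex height $c$.
- d2 is the half-width at which the curve meets the mouth corners, giving $a = -c/d_2^2$.
- The four parabolas are the inner and outer contours of the upper and lower lips measured on slide 11. Each contributes $(a, c)$, for eight coefficients in total.

The heights and width are made-up values in arbitrary units, and `parabola_coeffs` is a hypothetical name.

```python
import numpy as np


def parabola_coeffs(height: float, half_width: float):
    """(a, c) for y = a*x**2 + c with vertex (0, height) and roots at +/-half_width."""
    a = -height / half_width ** 2
    return a, height


# Hypothetical measurements (arbitrary units): positive = above the mouth
# corners, negative = below. Keys mirror the four heights measured on slide 11.
heights = {
    "upper_outer": 1.0,
    "upper_inner": 0.4,
    "lower_inner": -0.3,
    "lower_outer": -1.2,
}
half_width = 2.5                               # assumed shared mouth half-width (d2)

coeffs = {name: parabola_coeffs(h, half_width) for name, h in heights.items()}
output_vector = np.array([v for ac in coeffs.values() for v in ac])  # 8 numbers

x = np.linspace(-half_width, half_width, 51)   # points to draw for one frame
curves = {name: a * x ** 2 + c for name, (a, c) in coeffs.items()}
print(output_vector.shape)                     # (8,)
```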
15. Long a, /A/, as in Fonzie's Greeting
16. Other Phonemes
Short a, /a/, as in flat
b sound, /b/, as in ball
17. Using Neural Networks
- Neural networks complement the measurement of LPC coefficients
- The network is supplied with a training set of data (LPC coefficients as inputs, parabola coefficients as outputs) and performs a least-squares fit over varying sets
- A robust method for determining lip shapes for people with different pitches and vocal tract elasticity
18. Fundamentals of Neural Networks
- Units known as perceptrons contain a weight for each of their inputs and a bias
- Weights are adjusted during training, decreasing the error over several epochs
- Three options of transfer function:
  - Linear
  - Logarithmic sigmoid
  - Tangent sigmoid
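The sketch below writes out the three transfer functions and trains a single tangent-sigmoid unit, showing how the weights and bias are adjusted epoch by epoch. The MATLAB names in the comments (`purelin`, `logsig`, `tansig`) are the toolbox's names for these functions. The training loop itself is a generic gradient-descent illustration, not the toolbox's training algorithm. The data, learning rate and epoch count are hypothetical.

```python
import numpy as np


def linear(n):            # "Linear" (MATLAB: purelin)
    return n


def log_sigmoid(n):       # "Logarithmic sigmoid" (MATLAB: logsig), output in (0, 1)
    return 1.0 / (1.0 + np.exp(-n))


def tan_sigmoid(n):       # "Tangent sigmoid" (MATLAB: tansig), output in (-1, 1)
    return np.tanh(n)


def train_unit(X, t, lr=0.5, epochs=300, seed=0):
    """Train one tan-sigmoid unit y = tanh(w . x + b) by gradient descent.

    Each epoch moves every weight and the bias against the gradient of the
    squared error; the mean squared error before each update is recorded.
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.1, size=X.shape[1])   # one weight per input
    b = 0.0                                      # one bias per unit
    history = []
    for _ in range(epochs):
        y = tan_sigmoid(X @ w + b)
        e = y - t
        history.append(float(np.mean(e ** 2)))
        delta = e * (1.0 - y ** 2)               # error times derivative of tanh
        w -= lr * X.T @ delta / len(t)
        b -= lr * float(delta.mean())
    return w, b, history


rng = np.random.default_rng(1)
X = rng.standard_normal((200, 2))
t = tan_sigmoid(X @ np.array([0.5, -1.0]) + 0.2)  # targets from a known unit
w, b, history = train_unit(X, t)
print(history[0] > history[-1])                   # True: error falls over the epochs
```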
19. Variability in LPC Values
20. Variability in Cepstral Values
21. Neural Network Training
- For a system that exactly matches a tangent-sigmoid network function, the error will decrease to zero
- Actual systems will have variability
- In our case, we used a normalized training set to reduce the effect of anomalies
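The slides mention three steps: normalizing the training set, combining the mean and standard deviation of two training sets, and back-converting outputs with the stored statistics. A minimal sketch of those steps follows. It assumes column-wise z-score normalization, with both sets stacked before computing the statistics. The exact scheme used in the project is not stated, and the function names and random data are hypothetical.

```python
import numpy as np


def fit_normalizer(*training_sets):
    """Column-wise mean and standard deviation pooled over the given sets
    (rows are examples, columns are coefficients)."""
    stacked = np.vstack(training_sets)
    mean = stacked.mean(axis=0)
    std = stacked.std(axis=0)
    std[std == 0] = 1.0            # leave constant columns unscaled
    return mean, std


def normalize(x, mean, std):
    """Map raw values to zero-mean, unit-variance values."""
    return (x - mean) / std


def denormalize(z, mean, std):
    """Back-convert normalized values (e.g. network outputs) to raw units."""
    return z * std + mean


rng = np.random.default_rng(0)
set_a = rng.normal(loc=3.0, scale=2.0, size=(40, 8))   # hypothetical targets, set 1
set_b = rng.normal(loc=1.0, scale=0.5, size=(40, 8))   # hypothetical targets, set 2
mean, std = fit_normalizer(set_a, set_b)               # stored for later use
z = normalize(np.vstack([set_a, set_b]), mean, std)
print(np.allclose(denormalize(z, mean, std), np.vstack([set_a, set_b])))  # True
```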
22. Architecture of Neural Networks
- LPC coefficients serve as the input layer
- Eight parabola coefficients serve as the output layer
- As a rule of thumb, we selected 22 hidden units (about 50% more units than the input layer)
Architecture for a 2-layer feed-forward backpropagation network
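The architecture above can be sketched as a two-layer feed-forward network trained by batch backpropagation. It has 22 tangent-sigmoid hidden units and 8 linear outputs. Several details are assumptions:
- the input size `N_IN = 15`, since the slides do not state how many LPC coefficients were used;
- the linear output layer;
- the initialization, learning rate and random stand-in data.

This is not the MATLAB toolbox's training routine.

```python
import numpy as np

N_IN = 15       # number of LPC inputs: an assumption, the slides do not state it
N_HIDDEN = 22   # hidden units, from the slide
N_OUT = 8       # parabola coefficients, from the slide


def init_params(rng):
    """Small random weights, zero biases."""
    return {
        "W1": rng.normal(scale=1.0 / np.sqrt(N_IN), size=(N_HIDDEN, N_IN)),
        "b1": np.zeros(N_HIDDEN),
        "W2": rng.normal(scale=1.0 / np.sqrt(N_HIDDEN), size=(N_OUT, N_HIDDEN)),
        "b2": np.zeros(N_OUT),
    }


def forward(p, X):
    """X has shape (samples, N_IN); returns hidden activations and outputs."""
    H = np.tanh(X @ p["W1"].T + p["b1"])   # tan-sigmoid hidden layer
    Y = H @ p["W2"].T + p["b2"]            # linear output layer
    return H, Y


def train(p, X, T, lr=0.05, epochs=500):
    """Batch back-propagation on squared error; returns the MSE per epoch."""
    history = []
    for _ in range(epochs):
        H, Y = forward(p, X)
        E = Y - T
        history.append(float(np.mean(E ** 2)))
        dY = E / len(X)                             # output error signal
        dH = (dY @ p["W2"]) * (1.0 - H ** 2)        # propagated back through tanh
        p["W2"] -= lr * dY.T @ H
        p["b2"] -= lr * dY.sum(axis=0)
        p["W1"] -= lr * dH.T @ X
        p["b1"] -= lr * dH.sum(axis=0)
    return history


rng = np.random.default_rng(0)
X = rng.standard_normal((200, N_IN))                        # stand-in normalized LPC inputs
T = np.tanh(X @ rng.normal(scale=0.3, size=(N_IN, N_OUT)))  # stand-in normalized targets
params = init_params(rng)
history = train(params, X, T)
_, Y = forward(params, X)
print(Y.shape, history[-1] < history[0])                    # (200, 8) True
```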
23. Simulating Neural Network
- Tested the accuracy of the trained network by inputting a random phoneme and comparing the output to the actual lip shape
- Trained the network with 2 training sets of inputs and targets, combining the mean and standard deviation of both
- Consonants were the most difficult to mimic, since each is a short, acute sound intermixed with vowels
24. Implementation of Animation
- Sound sampled at 22050 Hz was partitioned into sections to achieve a frame rate between 15 and 30 fps (1470 down to 735 samples per section)
- LPC coefficients were obtained directly through autocorrelation of the samples; constant noise will not substantially change the LPCC
- An input of LPC coefficients results in an output of 8 normalized parabola coefficients, which are back-converted using the stored mean and standard deviation
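At 22050 Hz, a frame rate of 15 fps corresponds to 22050 / 15 = 1470 samples per section, and 30 fps to 735 samples. The sketch below partitions a signal this way. It assumes non-overlapping sections and drops any trailing partial section. The helper names are hypothetical.

```python
import numpy as np

FS = 22050  # sampling rate from the slide, in Hz


def samples_per_frame(fps: float, fs: int = FS) -> int:
    """Audio samples covered by one animation frame."""
    return int(round(fs / fps))


def split_into_frames(x, fps: float, fs: int = FS):
    """Cut a signal into consecutive, non-overlapping sections, one per frame.
    A trailing partial section is dropped."""
    n = samples_per_frame(fps, fs)
    n_frames = len(x) // n
    return np.asarray(x)[: n_frames * n].reshape(n_frames, n)


print(samples_per_frame(15), samples_per_frame(30))   # 1470 735
audio = np.zeros(2 * FS)                              # two seconds of placeholder audio
frames = split_into_frames(audio, fps=30)
print(frames.shape)                                   # (60, 735)
```

Each row would then go through LPC analysis, the trained network and the mean/standard-deviation back-conversion, giving one set of eight parabola coefficients per animation frame.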
25. Shortcomings of Animation
- No apparent parallel-processing feature in MATLAB allowing simultaneous sound playback and lip animation
- Higher frame rate sacrificed for smoother audio playback
- MATLAB only performs a hard pixel redraw of the lip shapes, yielding a rough, though effective, animation
26. Final Design
- Now, let's look at our animation system.
27. Future Considerations
- Train the neural network with more subjects (male and female)
- Test cepstral coefficients in place of LPC coefficients for greater stability
- Use consonant sounds in different vowel contexts (e.g., Ka, Ke, Ki, Ko, Ku)
- Use an open-source toolbox (Netlab) for greater availability and less restrictive delivery to market
- Test teeth and tongue formations for each phoneme
28. Acknowledgement
- The authors wish to thank Dr. Richard Foulds, Dr. Joel Schesser, Dr. Tara Alvarez, Dr. Sergei Adamovich, Mr. Michael Bergen, and Mr. John Hoinkowski for their advising support