Title: Role of Object Identification in Sonification System for Visually Impaired
1Role of Object Identification in Sonification
System for Visually Impaired
- Presented by Ranjan Bangalore Seetharama
2Agenda
- Introduction
- Hardware of NAVI System
- Object Identification
- Stereo Sound Generation
3Introduction
- The Navigation Assistance for Visually Impaired
(NAVI) system includes:
- a single board processing system (SBPS),
- a vision sensor mounted on headgear, and
- stereo earphones.
- The vision sensor captures the vision information
in front of the blind user.
- The captured image is processed to identify the
object in the image.
- Object identification is achieved by a real-time
image processing methodology using fuzzy
algorithms.
4Fuzzy Algorithms
- Traditional logic has only two possible outcomes,
true or false. Fuzzy logic instead uses a graded
scale with many intermediate values, such as a
number between 0.0 and 1.0. (The range resembles
that of a probability, although a degree of
membership is not itself a probability.)
- A fuzzy algorithm then uses fuzzy logic to
operate on inputs and give a result. Applications
include control logic (controlling engine speed,
for instance, where it can be handy to have some
intermediate values between "full speed" and
"full stop") and edge detection in images.
5- The processed image is mapped onto stereo
acoustic patterns and transferred to the stereo
earphones in the system.
- The vOICe is one of the patented image
sonification systems.
- A video camera is used as the vision sensor, and
dedicated hardware was constructed for image to
sound conversion. The captured image is scanned
from left to right with a sine wave as the
sound generator. The top portion of the image is
transformed into high frequency tones and the
bottom portion into low frequency tones. The
brightness of each pixel is transcoded into
loudness.
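The mapping just described (column to time, row to pitch, brightness to loudness) can be sketched for a single pixel as follows. This is only an illustration of the idea: the function name, the frequency limits `f_low` and `f_high`, and the scan time `scan_seconds` are assumptions, not parameters of The vOICe.

```python
def pixel_to_tone(row, col, brightness, height, width,
                  f_low=500.0, f_high=4000.0, scan_seconds=1.0):
    """Map one pixel to (onset time in s, frequency in Hz, loudness 0..1).

    row 0 is the top of the image; brightness is an 8-bit value.
    f_low, f_high and scan_seconds are illustrative assumptions.
    """
    onset = scan_seconds * col / width           # left-to-right scan
    step = (f_high - f_low) / (height - 1)
    frequency = f_high - row * step              # top rows sound higher
    loudness = brightness / 255.0                # brighter is louder
    return onset, frequency, loudness
```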
6- Background fills more area in the image frame
than the objects, so the sound produced
from the unprocessed image will contain more
information about the background.
- It is also noted that most of the background is
of light colors, so the sound produced from
it will be of high amplitude compared
to that of the objects in the scene.
- Object identification is achieved using
a clustering algorithm. The identified
objects are enhanced. More importance is given to
the objects in the environment than to the
background for sound production. This enables
the blind user to identify obstacles more easily.
7HARDWARE OF NAVI SYSTEM
- Navigation Assistance for Visually Impaired
(NAVI)
- The hardware model constructed for this vision
substitution system has a headgear mounted
with the vision sensor, stereo earphones,
and a Single Board Processing System (SBPS)
in a specially designed vest for this
application.
- The SBPS is placed in a pouch provided at the
back of the vest.
8Source: Fuzzy Learning Vector Quantization in
Intelligent vision Recognition for Blind
Navigation, by R Nagarajan, Yaacob and Sainarayanan
9Object identification
- A digital video camera mounted in the headgear
captures the vision information of the scene in
front of the blind user, and the image is processed
in the SBPS in real time.
- The processed image is mapped to sound patterns.
- Since the processing is done in real time,
the time factor has to be critically
considered.
10Object identification
- In the proposed vision substitutive system, the
nature of the object to be identified is
undefined, uncertain and time varying.
- Important features needed by the
blind user from the image of the environment
are the orientation and size of objects and
obstacles.
- During sonification, the amplitude of the sound
generated from the image depends directly on
the pixel intensity. In any gray image, a white
pixel has the maximum value of 255 and a
black pixel the minimum value of zero.
11- Pixels of light color thus produce
sound of higher amplitude than darker pixels.
- If the image is transferred to sound without
any enhancement, understanding the sound will
be a complex task, which was the
major problem faced in early works.
- The main objective of this work is
to suppress the background and to enhance
the object. For this, the gray levels of
the object and background have to be
identified.
- The image used for processing is of 32x32 pixel size
and has four gray levels, namely black (BL), white
(WH), dark gray (DG) and light gray (LG).
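The slides do not say how an 8-bit image is reduced to the four gray levels. One simple possibility, shown here purely as an assumption, is to split the 0 to 255 range into four equal bands:

```python
LEVELS = ("BL", "DG", "LG", "WH")  # black, dark gray, light gray, white


def quantize(image):
    """Map each 8-bit pixel to one of four gray-level labels.

    Equal-width bands (0-63, 64-127, 128-191, 192-255) are an assumed
    choice; the source does not state its thresholds.
    """
    return [[LEVELS[min(p // 64, 3)] for p in row] for row in image]
```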
12- Feature extraction is the most critical part in
image processing.
- The extracted features should represent the image
with limited data.
- In this work each image will have four
feature vectors, namely
- XBL = (X1, X2, X3, X4),
- XDG = (X1, X2, X3, X4),
- XLG = (X1, X2, X3, X4),
- XWH = (X1, X2, X3, X4)
13- X1 represents the number of pixels of the
respective gray level in the image; this is the
histogram value of that gray level.
- X2 represents the number of pixels of the respective
gray level in the central area of the image.
Generally the object of interest will be in
the center of human vision.
- X3 represents the pixel distribution
gradient. X3 is calculated as the sum of the
gradient values assigned to the pixel locations.
- X4 represents the gray value of the pixel.
Generally most of the background in the real
world is of lighter color than the objects.
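The four features can be sketched in Python as below. Several details are assumptions made only so the example runs: the central area is taken as the middle half of the image, the "gradient value" of a location is taken as a weight that grows toward the center, and the representative gray values in `GRAY_VALUE` are invented. The slides do not specify any of these.

```python
GRAY_VALUE = {"BL": 0, "DG": 85, "LG": 170, "WH": 255}  # assumed representative values


def features(labels, level):
    """Return [X1, X2, X3, X4] for one gray level of a square label image."""
    n = len(labels)
    lo, hi = n // 4, 3 * n // 4          # assumed central window (middle half)
    x1 = x2 = x3 = 0
    for i, row in enumerate(labels):
        for j, label in enumerate(row):
            if label != level:
                continue
            x1 += 1                                    # X1: histogram count
            if lo <= i < hi and lo <= j < hi:
                x2 += 1                                # X2: count in central area
            x3 += min(i, j, n - 1 - i, n - 1 - j) + 1  # X3: assumed weight, larger near center
    return [x1, x2, x3, GRAY_VALUE[level]]             # X4: gray value of the level
```

Calling `features(labels, level)` once for each of BL, DG, LG and WH yields the four feature vectors of one image.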
14FLVQ: Fuzzy Learning Vector Quantization
- Artificial Neural Networks (ANN) play a
major role in pattern classification.
- An ANN has the ability to learn and is fault
tolerant, which makes it a powerful tool for
pattern recognition.
- One form of ANN is the LVQ network.
- The objective of the LVQ network is to identify
the output node that is nearest to the input
vector.
- The weights are updated by competitive learning.
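The nearest-node search and competitive update can be illustrated with the plain (non-fuzzy) LVQ1 rule. This is a generic sketch, not the fuzzy variant used in the NAVI work; the function names and the learning rate are assumptions.

```python
def nearest(prototypes, x):
    """Index of the prototype (output node) closest to input vector x."""
    def sq_dist(w):
        return sum((wi - xi) ** 2 for wi, xi in zip(w, x))
    return min(range(len(prototypes)), key=lambda k: sq_dist(prototypes[k]))


def lvq1_step(prototypes, classes, x, target, rate=0.1):
    """One competitive-learning update: only the winning node moves."""
    k = nearest(prototypes, x)
    # Attract the winner if its class is correct, repel it otherwise.
    sign = 1.0 if classes[k] == target else -1.0
    prototypes[k] = [wi + sign * rate * (xi - wi)
                     for wi, xi in zip(prototypes[k], x)]
    return k
```

In this setting each input vector would be one of the gray-level feature vectors, and the two classes would be object and background.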
15FLVQ: Fuzzy Learning Vector Quantization
- Let Go be the gray level classified to the object
class of the FLVQ network,
- Gb be the gray level classified to the background
class of the FLVQ network, and
- I be the preprocessed image.
- For i, j = 1, 2, ..., 32
- if I(i,j) = Go
- then I(i,j) = K1
- if I(i,j) = Gb
- then I(i,j) = K2 (1)
- End
- I1 = I
- where K1 and K2 are chosen scalar constants,
K1 >> K2 and
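Rule (1) can be sketched in Python as follows. The image is assumed to hold gray-level labels, and the default values of `k1` and `k2` are arbitrary placeholders that merely satisfy K1 >> K2.

```python
def suppress_background(image, object_levels, background_levels,
                        k1=1.0, k2=0.01):
    """Replace object gray levels with k1 and background gray levels with k2.

    k1 and k2 are assumed values chosen only so that k1 >> k2.
    """
    out = []
    for row in image:
        new_row = []
        for p in row:
            if p in object_levels:
                new_row.append(k1)
            elif p in background_levels:
                new_row.append(k2)
            else:
                new_row.append(p)   # unclassified levels are left unchanged
        out.append(new_row)
    return out
```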
16Superimposing And Normalization
- Use any edge detection algorithm to detect edges
in image I. Let the image of edges be I1.
- Let I2 be the background suppressed image of
the previous stage.
- I1 and I2 are superimposed to form an image
matrix.
- Thus, we have a normalized image which is
background suppressed, object enhanced and edge
predominated.
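A rough sketch of this stage is given below. The slides leave the edge detector, the superimposing operation and the normalization open, so the choices here (a neighbour-difference edge map, element-wise addition, and scaling by the peak value) are all assumptions.

```python
def edges(image):
    """Crude edge map: 1.0 where a pixel differs from its right or lower neighbour."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            right = image[i][j + 1] if j + 1 < w else image[i][j]
            below = image[i + 1][j] if i + 1 < h else image[i][j]
            if image[i][j] != right or image[i][j] != below:
                out[i][j] = 1.0
    return out


def superimpose(i1, i2):
    """Add two images element-wise and scale so the peak value is 1.0."""
    total = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(i1, i2)]
    peak = max(max(row) for row in total) or 1.0   # avoid dividing by zero
    return [[v / peak for v in row] for row in total]
```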
20Sonification
- Sonification is defined as the transformation of
data, in relation to perceived associations, into
an acoustic signal for the purpose of facilitating
communication or interpretation.
- The human auditory system can sense frequencies
between 20 Hz and 20,000 Hz.
- From literature and experimentation it is
observed that the system is most sensitive to
frequencies between 20 Hz and 4000 Hz.
- This range is adopted in the proposed
sonification module.
21Sonification
- In order to create variations in pitch in the
sonification module, the frequency of the sine
wave is made inversely related to the pixel's row
position in a column of the image pattern.
- The loudness is made to depend directly on the
pixel value of the processed image.
22Sonification
- The processed image is sonified to stereo
acoustic patterns.
- The image is sonified to stereo sound by proper
mapping of the image: the image data
corresponding to the left side of the blind user
is transferred to the left earphone
and the right-half image data to the right
earphone.
23Sonification
- Let $f_o$ be the fundamental frequency of the sound
generator,
- $G$ be a constant gain, and
- $F_D$ be the frequency difference between adjacent
pixels in the vertical direction.
- The frequency corresponding to the (i, j)th
pixel of the 32x32 image matrix is given by
- $F_i = f_o + F_D$
- where $F_D = G f_o (32 - i)$, $i = 1, 2, 3, \ldots, 32$
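The row-to-frequency rule can be checked numerically with a short Python sketch. The default values are assumptions: `f0` is set to 20 Hz and `gain` is picked so that the top row lands at 4000 Hz, matching the range quoted earlier; the slides do not give the values actually used.

```python
def row_frequency(i, f0=20.0, gain=199.0 / 31.0, rows=32):
    """Frequency F_i = f0 + G * f0 * (rows - i) for row i, counted from 1 at the top.

    f0 and gain are assumed values, not taken from the source.
    """
    if not 1 <= i <= rows:
        raise ValueError("row index out of range")
    return f0 + gain * f0 * (rows - i)
```

With these assumed values the bottom row (i = 32) sounds at the fundamental frequency and the pitch rises linearly toward the top row.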
24Sonification
- The generated sound pattern is hence given by
- $S(j) = \sum_{i=1}^{32} I(i,j) \sin(2\pi F_i t)$
- where S(j) is the sound pattern for column j of
the image,
- t = 0 to D, and D depends on the total duration of
the acoustic information for each column of the
image, and
- $F_i$ is the frequency corresponding to row i.
25Sonification
- The sine wave with the designed frequency is
multiplied by the gray value of each pixel of a
column, and the products are summed to produce
the sound pattern.
- The scanning is performed from the leftmost column
towards the center and from the rightmost column
towards the center.
- The sound pattern to the left earphone is SL = S(1)
to S(n/2), appended from the left side.
- The sound pattern to the right earphone is SR = S(n)
to S(n/2), appended from the right side,
- where n is the total number of columns. In our
case n = 32.
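The column synthesis and the stereo scan can be sketched as below. The sample rate and per-column duration are assumed values, the row frequencies are passed in by the caller, and the right channel is assumed to stop at column n/2 + 1 so that no column is played in both ears.

```python
import math


def column_sound(column, frequencies, duration=0.01, sample_rate=11025):
    """Samples of S(j): sum over rows of gray value times a sine at that row's frequency."""
    count = int(duration * sample_rate)
    return [
        sum(g * math.sin(2 * math.pi * f * t / sample_rate)
            for g, f in zip(column, frequencies))
        for t in range(count)
    ]


def stereo_patterns(image, frequencies, duration=0.01, sample_rate=11025):
    """Return (left, right) sample lists for an image given as a list of rows."""
    n = len(image[0])
    columns = [[row[j] for row in image] for j in range(n)]
    sounds = [column_sound(c, frequencies, duration, sample_rate) for c in columns]
    left, right = [], []
    for j in range(n // 2):                  # leftmost column toward the center
        left.extend(sounds[j])
    for j in range(n - 1, n // 2 - 1, -1):   # rightmost column toward the center
        right.extend(sounds[j])
    return left, right
```

Both channels come out the same length, so the two halves of the image are heard simultaneously as the scan converges on the center.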
26Future Work
- In this research, information regarding the depth of
the object is not considered.
- An object is perceived as bigger, through the
variation in sound pattern, as the blind user moves
nearer to the object.
27References
- Fuzzy Learning Vector Quantization in Intelligent
vision Recognition for Blind Navigation, by R Nagarajan, Yaacob and Sainarayanan
- Role of Object Identification in Sonification
System for Visually Impaired, by R Nagarajan, Yaacob and Sainarayanan
- http://en.wikipedia.org/wiki/Fuzzy_clustering
28?