Title: SIGNAL PROCESSING TOOLS FOR SPEECH RECOGNITION
Presented by Richard Duncan
in collaboration with Hualin Gao, Richard Duncan, Julie A. Baca, and Joseph Picone
Human and Systems Engineering
Center for Advanced Vehicular Systems
Mississippi State University
WHICH TWO ARE THE SAME PHONEME?
We need to extract meaningful information from the signal so that a speech recognition system can model it.
Answer:
- a: ow
- b: aa
- c: ow
Sounds a and c are the same phoneme (ow).
WHAT IS AN ACOUSTIC FRONT-END?
It encapsulates the signal processing of a speech
recognition system. It computes a sequence of
feature vectors from an audio stream. These
vectors are then processed by HMMs, neural
networks, or other classifiers.
- A front-end has many areas of complexity:
  - Run-time efficiency
  - File I/O
  - Data management (framing)
  - DSP algorithm complexity
  - Algorithm re-use
- Our system shields the researcher or student from these mundane issues so that he or she can focus on the algorithms
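[Figure: frame n and frame n+1 with their analysis windows, window n and window n+1; each window covers new data plus data shared with the adjacent window]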
FEATURES OF ISIP FOUNDATION CLASSES
- Efficient memory management and tracking
- System and I/O libraries that abstract details of the operating system
- Math classes that provide basic linear algebra and efficient matrix manipulations
- Generic data structures
- Built-in unit tests to verify component correctness
- A library of standard algorithms provides basic digital signal processing (DSP) functions
- New algorithms can be added without modifying existing classes
- A block diagram tool allows rapid prototyping without programming or recompiling
- The same system is used for offline feature extraction, recognition, and general DSP work
BASIC DIGITAL SIGNAL PROCESSING FUNCTIONS
This example shows how to use the basic digital signal processing functions. It computes the energy of an input vector in dB using the SUM algorithm:

    // declare an Energy object, an input vector, and an output vector
    Energy egy;
    VectorFloat output;
    VectorFloat input(L"0, 1, 2");

    // choose the algorithm
    egy.setAlgorithm(Energy::SUM);

    // choose the implementation
    egy.setImplementation(Energy::DB);

    // compute the energy of the input data
    egy.compute(output, input);
- The interface contract allows extensibility to new algorithms
- All algorithms are classes that implement this interface
- Most have a default implementation
    boolean Energy::init() { ... }

    const String className() const { return CLASS_NAME; }
    int getLeadingPad() const { return 0; }
    int getTrailingPad() const { return 0; }

    bool apply(Vector<AlgorithmData>& output, Vector<AlgorithmData>& input) {
      // determine what channel to operate on
      if (algorithm_d == SUM) {
        computeSum(output(0).makeVectorFloat(), input(0).getVectorFloat());
      }
      ...
    }
    boolean Energy::computeSum(VectorFloat& output_a, const VectorFloat& input_a) {

      // compute the sum of squares
      float e = input_a.sumSquare();

      // scale the energy according to the specified implementation
      float scaled_energy = scale(e, input_a.length());

      // the output vector has length 1 because it contains only the energy
      output_a.setLength(1);

      // assign the energy, bounded below by the floor, to the output
      output_a(0) = Integral::max(floor_d, scaled_energy);

      // exit gracefully
      return true;
    }
- Algorithm
  - Input and output are arrays of floating-point numbers
  - Corresponds to basic DSP principles
- Recipe
  - A collection of algorithms that are run serially: the output of A(n-1) is the input to A(n)
  - Named inputs and outputs
  - Allows reuse of processing blocks between systems
HIERARCHY OF ALGORITHM CLASSES
FRONT-END CONFIGURATION TOOL
- Design a front-end by creating a block diagram
- Allows rapid prototyping of ideas
- New modules can easily be added to the system
- The resulting parameter file is the input to a full speech recognition system
RESPONSIBILITIES OF THE UTILITY
- Parses the file containing the recipe created in the configuration tool
- Synchronizes the different paths along the block flow diagram contained in the recipe
- Prepares input and output data buffers for each algorithm
- Schedules the sequence of required signal processing operations
- Processes data through the recipe
- Manages large collections of data files
- Correctness: the implementation of each algorithm is verified manually or by using other tools such as MATLAB
- Usability: we assessed and enhanced the usability of our tools through extensive user testing conducted over the course of several training sessions
- Speech recognition experiments: the correctness of the tools was also verified by speech recognition experiments
STATE-OF-THE-ART FEATURES
- Mel-frequency cepstral coefficients (MFCCs)
- Cepstral mean subtraction
- Energy normalization
- 1st and 2nd order differential features
- These features are used by most commercial speech
recognition systems.
EXPERIMENTAL RESULTS
- The front-end performs signal processing for speech recognition systems
- The ISIP front-end is built on an extensible library of basic DSP building blocks
- A block diagram interface is used to configure the front-end data flow
- The tools' usability was optimized through multiple training sessions with new users
- The system's correctness was verified through speech recognition experiments