SIGNAL PROCESSING TOOLS FOR SPEECH RECOGNITION

About This Presentation

Title:

SIGNAL PROCESSING TOOLS FOR SPEECH RECOGNITION

Slides: 40

Provided by: jennifer194

Transcript and Presenter's Notes

Title: SIGNAL PROCESSING TOOLS FOR SPEECH RECOGNITION

1
SIGNAL PROCESSING TOOLS FOR SPEECH RECOGNITION
Presented by Richard Duncan, Group??? Lab??? Company???
in collaboration with Hualin Gao, Richard Duncan, Julie A. Baca, and Joseph Picone
Human and Systems Engineering, Center for Advanced Vehicular Systems, Mississippi State University
2
WHICH TWO ARE THE SAME PHONEME?
We need to extract meaningful information from
the signal for a speech recognition system to
model
3
WHICH TWO ARE THE SAME PHONEME?
(a) ow
(b) aa
(c) ow
4
WHAT IS AN ACOUSTIC FRONT-END?
It encapsulates the signal processing of a speech
recognition system. It computes a sequence of
feature vectors from an audio stream. These
vectors are then processed by HMMs, neural
networks, or other classifiers.
5

WHY REINVENT THE WHEEL?

A front-end has many areas of complexity:
Run-time efficiency
File I/O
Data management (framing)
DSP algorithm complexity
Algorithm re-use
Our system shields the researcher or student from
these mundane issues so that he or she can focus on
the algorithms.

DATA FRAMING

[Figure: consecutive overlapping analysis windows. Labels: frame n, frame n+1, window n, window n+1, new data, shared data.]
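
The framing idea in the figure can be sketched as follows. This is a minimal illustration rather than the ISIP implementation: the function name `frameSignal` and its parameters are hypothetical, and it assumes each frame contributes `frame_len` new samples while each window spans `window_len` samples, so that consecutive windows share `window_len - frame_len` samples.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the ISIP API): split a signal into
// overlapping analysis windows. Each step advances by one frame of new
// data; the rest of the window is data shared with the previous window.
std::vector<std::vector<float>> frameSignal(const std::vector<float>& signal,
                                            std::size_t frame_len,
                                            std::size_t window_len) {
  assert(frame_len > 0 && window_len >= frame_len);
  std::vector<std::vector<float>> windows;
  // stop as soon as a full window no longer fits in the signal
  for (std::size_t start = 0; start + window_len <= signal.size();
       start += frame_len) {
    windows.emplace_back(signal.begin() + start,
                         signal.begin() + start + window_len);
  }
  return windows;
}
```

With a 10-sample signal, `frame_len = 2`, and `window_len = 4`, this yields four windows, each sharing two samples with its predecessor. Trailing samples that do not fill a whole window are dropped here; a real front-end might pad them instead.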
7
FEATURES OF ISIP FOUNDATION CLASSES

Efficient memory management and tracking
System and I/O libraries that abstract details of
the operating system
Math classes that provide basic linear algebra
and efficient matrix manipulations
Generic data structures
Built-in unit tests to verify component
correctness.

DESIGN REQUIREMENTS

A library of standard algorithms provides basic
digital signal processing (DSP) functions
New algorithms can be added without modifying
existing classes
A block diagram tool allows rapid prototyping
without programming or recompiling
The same system is used for offline feature
extraction, recognition, and general DSP work.

BASIC DIGITAL PROCESSING FUNCTIONS

This example shows how to use the basic
digital signal processing functions. It computes
the energy of an input vector in dB using the SUM
algorithm:

    // declare an Energy object, input vector, and output vector
    Energy egy;
    VectorFloat output;
    VectorFloat input(L"0, 1, 2");

    // choose algorithm
    egy.setAlgorithm(Energy::SUM);

    // choose implementation
    egy.setImplementation(Energy::DB);

    // compute the energy of input data
    egy.compute(output, input);
10

ADDING NEW ALGORITHMS

Interface contract allows extensibility to new
algorithms
All algorithms are classes that implement this
interface
Most have a default implementation.

ADDING NEW ALGORITHMS

    boolean Energy::init();

    const String& className() const {
      return CLASS_NAME;
    }

    int GetLeadingPad() const {
      return 0;
    }

    int GetTrailingPad() const {
      return 0;
    }

    bool Apply(Vector<AlgorithmData>& output,
               Vector<AlgorithmData>& input) {

      // determine what channel to operate on
      if (algorithm_d == SUM) {
        computeSum(output(0).makeVectorFloat(),
                   input(0).getVectorFloat());
      }

      // exit gracefully
      return true;
    }
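
The interface contract described above can be illustrated with a small stand-alone sketch. It is not the ISIP class declaration: `AlgorithmBase`, `SumOfSquares`, and `energyOf` are hypothetical names, and plain `std::vector<float>` stands in for the library's vector types. The point is only that new algorithms derive from one interface and inherit default implementations for the padding queries.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical interface, loosely modeled on the method names on the slide.
class AlgorithmBase {
 public:
  virtual ~AlgorithmBase() = default;
  virtual std::string className() const = 0;
  // default implementations: no extra context frames are requested
  virtual int getLeadingPad() const { return 0; }
  virtual int getTrailingPad() const { return 0; }
  virtual bool apply(std::vector<float>& output,
                     const std::vector<float>& input) = 0;
};

// A new algorithm is added by deriving a class; no existing class changes.
class SumOfSquares : public AlgorithmBase {
 public:
  std::string className() const override { return "SumOfSquares"; }
  bool apply(std::vector<float>& output,
             const std::vector<float>& input) override {
    float e = 0.0f;
    for (float x : input) e += x * x;
    output.assign(1, e);  // the output holds a single value
    return true;
  }
};

// Calls any algorithm through the interface and returns its single output.
float energyOf(AlgorithmBase& alg, const std::vector<float>& input) {
  std::vector<float> out;
  const bool ok = alg.apply(out, input);
  assert(ok && out.size() == 1);
  (void)ok;  // silence unused-variable warnings when asserts are disabled
  return out[0];
}
```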
12

ADDING NEW ALGORITHMS

    boolean Energy::computeSum(VectorFloat& output_a,
                               const VectorFloat& input_a) {

      // compute the sum of squares
      Float e = input_a.sumSquare();

      // compute the scale factor according to specified implementation
      float scaled_energy = scale(e, input_a.length());

      // the length of the output vector should be 1 as it only contains
      // the energy
      output_a.setLength(1);

      // assign the value of energy to the output
      output_a(0) = Integral::max(floor_d, scaled_energy);

      // exit gracefully
      return true;
    }
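
For readers without the ISIP classes at hand, the same computation can be approximated in standard C++. This is a sketch under stated assumptions: the function name `energyDb`, the default floor of -100 dB, and the absence of any length normalization are illustrative choices, since the slides do not show what `scale()` does.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical stand-alone sum-of-squares energy in dB.
// Assumptions: no normalization by frame length, floor given in dB.
float energyDb(const std::vector<float>& frame, float floor_db = -100.0f) {
  assert(!frame.empty());
  double sum_sq = 0.0;
  for (float x : frame) sum_sq += static_cast<double>(x) * x;
  // log10(0) is -infinity, so an all-zero frame reports the floor instead
  if (sum_sq <= 0.0) return floor_db;
  const float db = static_cast<float>(10.0 * std::log10(sum_sq));
  // clamp from below, mirroring the max(floor, energy) step on the slide
  return std::max(floor_db, db);
}
```

For the input `0, 1, 2` used in the earlier example, the sum of squares is 5, so this sketch returns 10·log10(5), roughly 6.99 dB. The ISIP result may differ if its scaling step normalizes by the vector length.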
13

DEFINITIONS

Algorithm
Input and output are arrays of floating-point
numbers
Corresponds to basic DSP principles
Recipe
Collection of algorithms that are run serially:
the output of A_(n-1) is the input to A_n
Named inputs and outputs
Allows reuse of processing blocks between systems
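
A recipe in this sense can be sketched as a list of named stages applied in order. The types below (`Vec`, `Stage`, `runRecipe`) are hypothetical and only illustrate the serial data flow; the ISIP recipe format itself is not shown on the slides.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

using Vec = std::vector<float>;

// Hypothetical stage: a named function from one float array to another.
struct Stage {
  std::string name;
  std::function<Vec(const Vec&)> run;
};

// Run the stages serially: the output of stage n-1 is the input to stage n.
Vec runRecipe(const std::vector<Stage>& recipe, Vec data) {
  for (const Stage& stage : recipe) {
    assert(stage.run);  // every stage must hold a callable
    data = stage.run(data);
  }
  return data;
}
```

Because each stage only sees a float array, the same stage can be reused in several recipes, which is the reuse property listed above.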

14
HIERARCHY OF ALGORITHM CLASSES
15

FRONT-END CONFIGURATION TOOL

Design a front-end by creating a block diagram
Allows rapid prototyping of ideas.
New modules can easily be added into the system
The resulting parameter file is then the input to
a full speech recognition system

35
RESPONSIBILITIES OF THE UTILITY

Parses the file containing the recipe created in
the configuration tool
Synchronizes different paths along the block flow
diagram contained in the recipe
Prepares input and output data buffers for each
algorithm
Schedules the sequence of required signal
processing operations
Processes data through the recipe
Manages large collections of data files.
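
One common way to schedule the blocks of a flow diagram is a topological sort. The slides do not say which method the utility uses, so the sketch below substitutes Kahn's algorithm as an illustration. The function name `scheduleBlocks` and the adjacency-list representation are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical scheduler: blocks are numbered 0..n-1 and edges[i] lists the
// blocks that consume the output of block i. Returns one valid execution
// order, or an empty vector if the diagram contains a cycle.
std::vector<std::size_t> scheduleBlocks(
    const std::vector<std::vector<std::size_t>>& edges) {
  const std::size_t n = edges.size();
  // count how many inputs each block is still waiting for
  std::vector<std::size_t> pending_inputs(n, 0);
  for (const auto& consumers : edges) {
    for (std::size_t c : consumers) {
      assert(c < n);
      ++pending_inputs[c];
    }
  }
  // blocks with no inputs can run immediately
  std::queue<std::size_t> ready;
  for (std::size_t i = 0; i < n; ++i) {
    if (pending_inputs[i] == 0) ready.push(i);
  }
  std::vector<std::size_t> order;
  while (!ready.empty()) {
    const std::size_t block = ready.front();
    ready.pop();
    order.push_back(block);
    // a consumer becomes ready once all of its inputs have been produced
    for (std::size_t c : edges[block]) {
      if (--pending_inputs[c] == 0) ready.push(c);
    }
  }
  if (order.size() != n) order.clear();  // a cycle left some blocks unscheduled
  return order;
}
```

A block is scheduled only after every block feeding it has run, which is also the point at which parallel paths through the diagram are synchronized.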

VERIFICATION STRATEGY

Correctness: The implementation of each
algorithm is verified manually or by using other
tools such as MATLAB.
Usability: We assessed and enhanced the usability of
our tools through extensive user testing
conducted over the course of several training
sessions.
Speech recognition experiments: The correctness
of the tools was also verified by speech
recognition experiments.

STATE-OF-THE-ART FEATURES

Mel-frequency cepstral coefficients (MFCCs)
Cepstral mean subtraction
Energy normalization
1st and 2nd order differential features
These features are used by most commercial speech
recognition systems.
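
Two of the listed features are simple enough to sketch directly. The code below is an illustration, not the ISIP implementation: the names `Frames`, `cepstralMeanSubtract`, and `deltaFeatures` are hypothetical, and the delta uses a plain central difference with repeated edge frames, whereas production systems often use a regression over a wider window.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Frames = std::vector<std::vector<float>>;  // indexed [time][coefficient]

// Cepstral mean subtraction: remove each coefficient's mean over the utterance.
void cepstralMeanSubtract(Frames& feats) {
  if (feats.empty()) return;
  const std::size_t dim = feats[0].size();
  std::vector<double> mean(dim, 0.0);
  for (const auto& f : feats) {
    assert(f.size() == dim);
    for (std::size_t k = 0; k < dim; ++k) mean[k] += f[k];
  }
  for (double& m : mean) m /= static_cast<double>(feats.size());
  for (auto& f : feats) {
    for (std::size_t k = 0; k < dim; ++k) f[k] -= static_cast<float>(mean[k]);
  }
}

// First-order differential features as a central difference,
// d[t] = (c[t+1] - c[t-1]) / 2, repeating the edge frames at the boundaries.
Frames deltaFeatures(const Frames& feats) {
  Frames deltas(feats.size());
  for (std::size_t t = 0; t < feats.size(); ++t) {
    const auto& prev = feats[t == 0 ? 0 : t - 1];
    const auto& next = feats[t + 1 < feats.size() ? t + 1 : t];
    assert(prev.size() == feats[t].size() && next.size() == feats[t].size());
    deltas[t].resize(feats[t].size());
    for (std::size_t k = 0; k < feats[t].size(); ++k) {
      deltas[t][k] = 0.5f * (next[k] - prev[k]);
    }
  }
  return deltas;
}
```

Second-order differential features follow by applying `deltaFeatures` to its own output.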

38
EXPERIMENTAL RESULTS
39

CONCLUSION

The front-end performs signal processing for
speech recognition systems
The ISIP front-end is built on an
extensible library of basic DSP building blocks
A block diagram interface is used to configure
the front-end data flow
The tools' usability was optimized through
multiple training sessions with new users
The system's correctness was verified through
speech recognition experiments.