Title: Research Experience
1Research Experience
- Wang Jianhong
- CECS Department
- University of Missouri-Columbia
- 201 Engineering Building West
- Columbia, MO 65211
- Phone (573)814-5224
- Email jwcdb_at_missouri.edu
2Digital Acoustic Echo Cancellation(AEC) Algorithm
- The theoretical basis of echo cancellation is the
adaptive filter. There are two basic categories
of algorithms for acoustic echo cancellation.
- Category One: LMS (Least Mean Squares) Algorithm.
The criterion function is the expected value of
the squared error, and the coefficients of the
adaptive filter are updated according to the
stochastic steepest-descent algorithm. This
algorithm is widely used because of its
comparatively low computational requirements and
its robust stability. However, convergence slows
when the input signals are non-stationary or the
adaptive filter has a large number of taps.
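The stochastic steepest-descent update described above can be sketched in a few lines. This is a minimal illustration using the normalized variant (NLMS), not the implementation behind these slides; the function name, the step size `mu`, the regularizer `eps`, and the four-tap echo path are assumptions made only for the demonstration.

```python
import random

def nlms_echo_canceller(far_end, mic, num_taps=8, mu=0.5, eps=1e-6):
    """Return (residual echo samples, final tap weights) of an NLMS filter."""
    w = [0.0] * num_taps        # adaptive filter taps, initialized to zero
    x_buf = [0.0] * num_taps    # most recent far-end samples, newest first
    errors = []
    for x, d in zip(far_end, mic):
        x_buf = [x] + x_buf[:-1]
        y = sum(wi * xi for wi, xi in zip(w, x_buf))   # echo estimate
        e = d - y                                      # residual echo
        norm = eps + sum(xi * xi for xi in x_buf)      # input power
        # Steepest-descent step on the instantaneous squared error,
        # normalized by the input power.
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, x_buf)]
        errors.append(e)
    return errors, w

# Hypothetical echo path and white far-end signal, for demonstration only.
random.seed(0)
echo_path = [0.5, -0.3, 0.2, 0.1]
far = [random.uniform(-1.0, 1.0) for _ in range(2000)]
mic = [sum(h * far[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
       for n in range(len(far))]
errors, taps = nlms_echo_canceller(far, mic)
```

With a white, stationary input as in this toy setup the taps settle quickly; the slowdown noted above appears with non-stationary inputs or a much larger `num_taps`.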
3Digital Acoustic Echo Cancellation(AEC) Algorithm
- Category Two: LS (Least Squares) Algorithm. The
criterion is to minimize the squared error summed
over time. The advantage of this algorithm is fast
convergence, at the cost of an increased
computational requirement. It is also prone to
numerical instability, which impedes its
implementation on fixed-point digital signal
processors. FTF (fast transversal filter) is a
fast algorithm in this category.
- Mixed FTF-NLMS Algorithm. The LS algorithm is
always stable during the starting period, before
all of the filter coefficients have been updated
to non-zero values. The disastrous overflow
happens only after the starting period. We
therefore built a new adaptive method that uses
the LS algorithm during the starting period and
the LMS algorithm afterwards.
4Digital Acoustic Echo Cancellation(AEC) Algorithm
- Brief Description of the Mixed FTF-NLMS Algorithm
- 1. Initialization: set the taps of the adaptive
filter to zero.
- 2. During the starting period, use the FTF
algorithm to perform the acoustic echo cancellation.
- 3. Then switch to the NLMS algorithm. In this stage, a
monitor is embedded in the filter. If the echo
cancellation is no longer effective, the adaptive
filter is switched back to the FTF algorithm.
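The three steps above can be sketched as a small state machine. This is only an illustration under stated assumptions: a conventional RLS recursion stands in for FTF (both minimize the same least-squares criterion, but FTF is the fast version), and the monitor rule, the names `start_len`, `threshold`, and `window`, and the test signals are all hypothetical, since the slides do not specify them.

```python
import random

def mixed_ls_nlms(far_end, mic, num_taps=4, start_len=50, mu=0.5,
                  delta=100.0, eps=1e-6, threshold=0.1, window=32):
    """LS stage for start_len samples, then NLMS with a fallback monitor."""
    def fresh_p():
        return [[delta if i == j else 0.0 for j in range(num_taps)]
                for i in range(num_taps)]

    w = [0.0] * num_taps                   # step 1: taps start at zero
    x_buf = [0.0] * num_taps
    P, mode, ls_left = fresh_p(), "LS", start_len
    err_pow, mic_pow = [], []              # monitor history (NLMS stage)
    errors, modes = [], []
    for x, d in zip(far_end, mic):
        x_buf = [x] + x_buf[:-1]
        e = d - sum(wi * xi for wi, xi in zip(w, x_buf))
        if mode == "LS":                   # step 2: RLS stand-in for FTF
            Px = [sum(p * xj for p, xj in zip(row, x_buf)) for row in P]
            denom = 1.0 + sum(xi * pxi for xi, pxi in zip(x_buf, Px))
            k = [pxi / denom for pxi in Px]
            w = [wi + ki * e for wi, ki in zip(w, k)]
            P = [[P[i][j] - k[i] * Px[j] for j in range(num_taps)]
                 for i in range(num_taps)]
            ls_left -= 1
            if ls_left == 0:
                mode, err_pow, mic_pow = "NLMS", [], []
        else:                              # step 3: NLMS plus monitor
            norm = eps + sum(xi * xi for xi in x_buf)
            w = [wi + mu * e * xi / norm for wi, xi in zip(w, x_buf)]
            err_pow = (err_pow + [e * e])[-window:]
            mic_pow = (mic_pow + [d * d])[-window:]
            # Hypothetical monitor: residual power too large relative
            # to microphone power means cancellation has degraded.
            if (len(err_pow) == window
                    and sum(err_pow) > threshold * sum(mic_pow)):
                P, mode, ls_left = fresh_p(), "LS", start_len
        errors.append(e)
        modes.append(mode)
    return errors, w, modes

# Demonstration only: the echo path changes abruptly at sample 500.
random.seed(0)
far = [random.uniform(-1.0, 1.0) for _ in range(1000)]
path_a, path_b = [0.5, -0.3, 0.2, 0.1], [-0.4, 0.6, 0.1, -0.2]
mic = []
for n in range(len(far)):
    path = path_a if n < 500 else path_b
    mic.append(sum(h * far[n - k] for k, h in enumerate(path) if n - k >= 0))
errors, taps, modes = mixed_ls_nlms(far, mic)
```

In this toy run the filter starts in the LS stage, hands over to NLMS, and the monitor sends it back to the LS stage shortly after the echo path changes.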
5Digital Acoustic Echo Cancellation(AEC) Algorithm
- Simulation Result: comparison between NLMS and
the proposed algorithm.
6Digital Acoustic Echo Cancellation(AEC) Algorithm
- Simulation Result: stability characteristics of
the proposed algorithm.
7The Optimal Lattice Architectures for the
Real-Time Computation of DFT/IDFT
- Introduction: In real-time signal processing
applications such as COFDM, the DFT/IDFT is an
important module. Although we can take advantage
of the fast FFT/IFFT algorithms, VLSI
implementation is still very costly when the
FFT/IFFT block size is large. Several efficient
schemes for VLSI implementation are proposed
below.
8The Optimal Lattice Architectures for the
Real-Time Computation of FFT/IFFT
- One of the schemes is based on the time-recursive
MDCT/MDST. We first decompose the IFFT/FFT into a
combination of a DCT-like and a DST-like
transform. By exploiting the symmetry property
of the Fourier transform as well as the fast
algorithms for the DCT and DST, we derive a new
scheme that involves only real-valued operators
and replaces the complex-domain IFFT/FFT. The
structure computes the transformed data from
sequential inputs in a time-recursive way and
requires only O(N) hardware complexity.
9The Optimal Lattice Architectures for the
Real-Time Computation of FFT/IFFT
- The IFFT/FFT block diagram for this new scheme is
shown below.
10The Optimal Lattice Architectures for the
Real-Time Computation of FFT/IFFT
- At the transmitter side, to ensure that the IFFT
generates only real-valued outputs, the inputs of
the IFFT have to be modified.
- The IFFT of a data sequence of length 2N is
11The Optimal Lattice Architectures for the
Real-Time Computation of FFT/IFFT
- By applying the conjugate-symmetric property of
the input data, we can get
- The computation of the IFFT is decomposed into
two parts built from real-valued operators. One is a
discrete-cosine-transform-like operation; the
other is a discrete-sine-transform-like
operation. We will term these two transforms the
modified DCT (MDCT) and the modified DST (MDST).
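The decomposition can be checked numerically. For a conjugate-symmetric input of length 2N, with X(2N-k) = conj(X(k)) and X(0), X(N) real, the textbook inverse DFT reduces to x(n) = (1/2N) [ X(0) + (-1)^n X(N) + 2 sum_{k=1..N-1} ( Re X(k) cos(pi k n / N) - Im X(k) sin(pi k n / N) ) ]. The sketch below compares this real-valued form with the complex sum. The 1/(2N) normalization and the function names are assumptions; the exact MDCT/MDST kernel definitions used on the slides may differ.

```python
import cmath
import math
import random

def ifft_direct(X):
    """Complex inverse DFT by direct summation (length M = 2N)."""
    M = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / M)
                for k in range(M)) / M
            for n in range(M)]

def ifft_real_kernels(X, N):
    """Same result using only a cosine sum and a sine sum of real data."""
    out = []
    for n in range(2 * N):
        cos_part = sum(X[k].real * math.cos(math.pi * k * n / N)
                       for k in range(1, N))       # DCT-like kernel
        sin_part = sum(X[k].imag * math.sin(math.pi * k * n / N)
                       for k in range(1, N))       # DST-like kernel
        dc_and_nyquist = X[0].real + (-1) ** n * X[N].real
        out.append((dc_and_nyquist + 2.0 * (cos_part - sin_part)) / (2 * N))
    return out

# Build a random conjugate-symmetric spectrum: X[2N - k] = conj(X[k]).
random.seed(1)
N = 8
half = [complex(random.uniform(-1, 1), random.uniform(-1, 1))
        for _ in range(N - 1)]                      # X[1] .. X[N-1]
X = ([complex(random.uniform(-1, 1), 0.0)] + half
     + [complex(random.uniform(-1, 1), 0.0)]
     + [z.conjugate() for z in reversed(half)])

x_complex = ifft_direct(X)
x_real = ifft_real_kernels(X, N)
```

Both routines return the same real sequence, which is the property that lets the complex IFFT be replaced by two real-valued transforms.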
12The Optimal Lattice Architectures for the
Real-Time Computation of FFT/IFFT
- Note that the MDCT and MDST involve only real-valued
computation, which reduces the computational
complexity drastically.
- We employ the time-recursive approach to
implement the IFFT module. It is an efficient
implementation that generates the MDCT and MDST
together.
- We define the MDCT of a sequential input data
stream starting from x(t) and ending with x(t+N-1) as
follows. After the new datum arrives, the MDCT
is updated as
14The Optimal Lattice Architectures for the
Real-Time Computation of FFT/IFFT
- For block processing, the initial states are
zeros.
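The time-recursive idea, including the zero initial state for block processing, can be illustrated with the well-known sliding DFT. This is an analogous recursion chosen for illustration, not the MDCT/MDST lattice of these slides, whose update equations are not reproduced in this transcript; the function names are assumptions.

```python
import cmath
import math

def sliding_dft_block(x):
    """Feed a block sample by sample into a sliding DFT with zero state."""
    N = len(x)
    state = [0j] * N           # one running value per frequency bin
    window = [0.0] * N         # the last N samples, initially all zero
    twiddle = [cmath.exp(2j * math.pi * k / N) for k in range(N)]
    for sample in x:
        oldest = window.pop(0)
        window.append(sample)
        # Each bin is updated from its previous value with one complex
        # multiply and two adds, independent of the block length.
        state = [(s - oldest + sample) * w for s, w in zip(state, twiddle)]
    return state

def dft_direct(x):
    """Reference DFT by direct summation."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N)
                for n in range(N))
            for k in range(N)]

block = [1.0, 2.0, 0.5, -1.0, 3.0, 0.0, -2.0, 1.5]
recursive = sliding_dft_block(block)
reference = dft_direct(block)
```

After the N-th sample arrives, the recursive state equals the block DFT. Because every bin needs a fixed amount of work per sample, the hardware cost grows linearly with N, which matches the O(N) claim made for the time-recursive structure.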
15The Optimal Lattice Architectures for the
Real-Time Computation of FFT/IFFT
- According to the equations above, we can derive
the complete time-recursive lattice module, as
shown in the figure below.
16The Optimal Lattice Architectures for the
Real-Time Computation of FFT/IFFT
- The implementation of the MDST is similar to
that of the MDCT.
- Complexity Analysis
- A direct implementation of the 2N-point FFT/IFFT
using the butterfly structure consists of
log2(2N) stages. Each stage consists of N
multipliers and 2N adders. Note that the input
data are complex: one complex multiplier requires
4 real-valued multipliers and 2 real-valued
adders, and one complex adder requires 2
real-valued adders. Hence, the direct approach
requires a total of 4N log2(2N) real-valued
multipliers and 6N log2(2N) real-valued adders.
17The Optimal Lattice Architectures for the
Real-Time Computation of FFT/IFFT
- In our approach, we decompose the IFFT into the
real-valued transform kernels (MDCT and MDST). As
a result, we need only real operations to realize
the IFFT structure: a total of 4(N-1) real
multipliers and 5(N-1)+2 real-valued adders.
- For the FFT module, we need one SC, which is an
accumulator, and (N-1) lattice modules (Mn). The
overall architecture of the FFT needs a total of
4(N-1) real multipliers and 3(N-1)+1 real adders.
- If N is large, the computational complexity of
our scheme is much smaller than that of the
conventional direct approach.
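A short calculation makes the comparison concrete. The sketch below simply evaluates the operation counts quoted above for a few block sizes. Reading the garbled adder count of the proposed IFFT as 5(N-1)+2 is an assumption, N is assumed to be a power of two, and the function names are illustrative.

```python
def direct_fft_ops(N):
    """(real multipliers, real adders) of a 2N-point butterfly FFT/IFFT."""
    stages = (2 * N).bit_length() - 1   # log2(2N) for a power-of-two N
    return 4 * N * stages, 6 * N * stages

def lattice_ifft_ops(N):
    """(real multipliers, real adders) quoted for the proposed IFFT."""
    return 4 * (N - 1), 5 * (N - 1) + 2

for N in (64, 256, 1024):
    d_mul, d_add = direct_fft_ops(N)
    l_mul, l_add = lattice_ifft_ops(N)
    print(f"N={N}: direct {d_mul} mul / {d_add} add, "
          f"lattice {l_mul} mul / {l_add} add")
```

For N = 1024 the direct count is 45056 multipliers and 67584 adders against 4092 and 5117 for the proposed structure, so the gap widens by the factor log2(2N) as N grows.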
18Developing a MPEG Audio Layer III Codec on
TMS320C549
- Introduction: More and more high-quality digital
stereo media have been introduced to the public,
including the CD, digital audio tape (DAT),
MiniDisc (MD), and digital compact cassette
(DCC). All of these media assume a 20-kHz bandwidth
for the audio signal. The CD and DAT use a 16-bit
pulse-code modulation format and sampling rates
of 44.1 and 48 kHz. Thus bit rates of 1.411 and
1.536 Mbit/s are used, respectively, for
two-channel stereo (sampling rate x 16 bits x 2
channels). In 1992, the international
standard ISO/IEC 11172, "Coding of Moving Pictures
and Associated Audio for Digital Storage Media at
up to about 1.5 Mbit/s", was finalized; it is
also known as the MPEG-1 standard. Its audio part,
which addresses hi-fi audio coding, offers a
choice of three independent layers of compression.
19Developing a MPEG Audio Layer III Codec on
TMS320C549
- Introduction: Layer III is the most complex layer
but offers the best audio quality, particularly at
bit rates around 64 kbit/s per channel. This layer
suits audio transmission over ISDN and the Internet.
An MPEG audio Layer III codec on a general-purpose
DSP chip, the TMS320C54x, is presented below.
- The MPEG audio Layer III codec is a complicated
combination of digital signal processing algorithms
ranging from subband transforms to Huffman coding.
- Because of the comparatively small computational
requirements of the decoder, only a small
general-purpose DSP chip is required for it.
20Developing a MPEG Audio Layer III Codec on
TMS320C549
- The encoder is much more complicated, and the
computational demands of estimating the
parameters of the psycho-acoustic model and
performing the noise allocation are very high.
Because the MPEG standard intentionally leaves
the encoder loosely specified to allow for
competing implementations, we use a simplified
psycho-acoustic model to reduce the computational
requirements and memory consumption while
preserving the coding quality.
21Developing a MPEG Audio Layer III Codec on
TMS320C549
- An overview of the MPEG audio Layer III encoder
algorithms is given in the block diagram below.
22Developing a MPEG Audio Layer III Codec on
TMS320C549
- An overview of the MPEG audio Layer III decoder
algorithms is given in the block diagram below.
23Developing a MPEG Audio Layer III Codec on
TMS320C549
- The MPEG audio reference source code from ISO is
written in C. However, because we want to
implement a real-time system, we chose assembly
language to develop our system.
- Spectrum Digital, Inc. provides the TMS320C54x
Evaluation Module (EVM), a stand-alone card with
the fixed-point TMS320C549 DSP that lets
evaluators examine certain characteristics of the
TMS320C54x digital signal processor (DSP) to
determine whether it meets their application
requirements. Furthermore, the module is an
excellent platform for developing and running
software on the C54x family of processors.
24Developing a MPEG Audio Layer III Codec on
TMS320C549
- The reference C source code for MPEG audio
Layer III was written to aid understanding of the
algorithms and the overall system of the
standard, not for speed. Moreover, its arithmetic
is floating-point. To reduce the code size,
complexity, and memory consumption, we decided to
limit our encoder to 16-bit precision and to
support only dual-channel mode. We discuss some
of the major optimizations we made in the
sections below.
25Developing a MPEG Audio Layer III Codec on
TMS320C549
- Huffman Decoding: We get the necessary
information for Huffman decoding from the
side-information part of every granule. We then
decode the Huffman bitstream in each region by
walking the Huffman tree, which speeds up
decoding compared with a traditional table
search. To avoid excessively large Huffman
tables, the linbits technique is used: extra bits
extend the large values of a normal-size table.
Because tables that differ in their linbits value
share the same contents in most fields, we can
save a good deal of table memory. To look up the
tables quickly, a separate table holds the head
address of every Huffman table. The decoder looks
up this table to determine which Huffman table to
use according to the side information.
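The linbits step can be sketched as follows. In Layer III, a decoded magnitude of 15 acts as an escape value when the selected table has a non-zero linbits count: that many extra bits are read and added to the magnitude, and a sign bit follows any non-zero value. The `BitReader` class and the function name below are illustrative assumptions, not part of the codec described on these slides.

```python
class BitReader:
    """Reads bits most-significant-bit first from a bytes object."""

    def __init__(self, data):
        self.data = data
        self.pos = 0                      # position in bits

    def read(self, nbits):
        value = 0
        for _ in range(nbits):
            byte = self.data[self.pos >> 3]
            bit = (byte >> (7 - (self.pos & 7))) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value

ESCAPE = 15   # largest magnitude stored in the big-value tables

def finish_value(magnitude, linbits, reader):
    """Apply the linbits extension and the sign bit to a decoded magnitude."""
    if linbits and magnitude == ESCAPE:
        magnitude += reader.read(linbits)  # extra bits extend the range
    if magnitude and reader.read(1):       # sign bit only for non-zero values
        return -magnitude
    return magnitude

# Bits 0101 (linbits value 5) followed by sign bit 1, padded with zeros.
reader = BitReader(bytes([0b01011000]))
value = finish_value(ESCAPE, 4, reader)
```

Because the extension happens outside the table, two tables that differ only in their linbits count can point to the same stored entries, which is where the memory saving comes from.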
26Developing a MPEG Audio Layer III Codec on
TMS320C549
- Quantization: The quantization part requires
massive computation, which impedes real-time
encoding, so we simplify it in conjunction with
the change to the psycho-acoustic model. The
three nested loops are reduced to two, which
decreases the computation significantly. The
objective quality of the compressed output
certainly degrades, but the subjective listening
result is fairly good. The significance of this
change is that it makes real-time MP3 encoding
feasible on an 80-MIPS DSP chip.
27Developing a MPEG Audio Layer III Codec on
TMS320C549
- Dequantization: Because a 16-bit fixed-point DSP
chip is used, precision must be considered,
especially in the dequantization part, which
contains an exponential function. The exponent of
the power of 2 combines the global gain, subblock
gain, scalefactors, and so on. We find that the
final exponent always becomes an integer when
multiplied by 4, so the exponential function can
be split into two parts. The first is the integer
part of the exponent; because the base is 2, a
shift instruction realizes it. The second is one
of six constants (2^0.25, 2^0.5, 2^0.75, 2^-0.25,
2^-0.5, and 2^-0.75), by which the result is
multiplied if the fractional part of the exponent
is not zero. This mixed method avoids a large
table for the exponential function.
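The split into a shift and a small constant table can be sketched with integer arithmetic. This is a minimal illustration, assuming the exponent is supplied as an integer number of quarters `q` (so the scale factor is 2^(q/4)) and that the six constants are stored in a Q14 fixed-point format; the names and the Q14 choice are assumptions, not details taken from the slides.

```python
Q = 14  # assumed fixed-point format: constant = round(value * 2**14)

# The six fractional powers of two, indexed by the remainder in quarters.
FRAC_TABLE = {f: round(2.0 ** (f / 4.0) * (1 << Q))
              for f in (-3, -2, -1, 1, 2, 3)}

def scale_by_pow2_quarters(value, q):
    """Approximate value * 2**(q / 4) for a non-negative integer value."""
    ipart = abs(q) // 4            # integer part, truncated toward zero
    if q < 0:
        ipart = -ipart
    frac = q - 4 * ipart           # remaining quarters, in -3 .. 3
    if frac:
        value = (value * FRAC_TABLE[frac]) >> Q   # one multiply
    # The integer part of the exponent is a plain shift.
    return value << ipart if ipart >= 0 else value >> -ipart

examples = {q: scale_by_pow2_quarters(1000, q) for q in (8, -4, 2)}
```

Here `examples` maps 8 to 4000, -4 to 500, and 2 to 1414, the last being 1000 times the square root of 2 truncated to an integer. Only six constants are stored regardless of the range of the exponent.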
28Developing a MPEG Audio Layer III Codec on
TMS320C549
- Subband Synthesis Filters: The subband synthesis
filter bank is the inverse transform of the
subband analysis filter bank. It consists of an
initialization section, an IDCT section, and a
windowing section. The subband filter receives 32
subband samples of one channel and returns 32
consecutive audio samples. In each iteration of
the subband filter loop, the IDCT receives 32
samples and returns 64 samples. The output
samples are written to a FIFO buffer of 1024
elements for the windowing operation. Thus, the
windowing operation is performed on the latest 64
elements together with the results from the
previous IDCTs. A new vector is built from these
elements and multiplied by the windowing
coefficients in the windowing section. The
results are then output as pulse-code modulation
(PCM) samples. In the next iteration, all the
elements are shifted down 64 places in
preparation for the next IDCT output.
29Developing a MPEG Audio Layer III Codec on
TMS320C549
- Subband Synthesis Filters: The subband synthesis
filter demands a large amount of computation, so
we concentrated our efforts on this block in
order to build a real-time codec. A fast
transform was developed for the DCT 32->32 and
then adapted to the MPEG standard. As described
above, MPEG audio decoding uses an IDCT 32->32.
The direct implementation of a 32x32 IDCT requires
1024 multiplications and 992 additions. Therefore,
fast DCT algorithms are used. Conventionally, the
FFT is used to reduce the computational
requirements. However, we find that Lee's FDCT is
faster than the FFT, at the cost of a little
quantization precision. After careful
consideration, we selected the FDCT scheme to
implement this transform.
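The recursive structure of Lee's fast DCT can be sketched as below and checked against the direct sum. This is a floating-point illustration of the unscaled forward DCT-II for power-of-two lengths, not the fixed-point assembly routine or the MPEG-specific variant described on the slides; the function names and test signal are assumptions.

```python
import math

def dct_direct(x):
    """Unscaled DCT-II by direct summation: n*n multiplications."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n)
                for i in range(n))
            for k in range(n)]

def dct_lee(x):
    """Unscaled DCT-II by Lee's recursion; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    half = n // 2
    # Sums feed the even-indexed outputs; scaled differences feed the odd.
    alpha = [x[i] + x[n - 1 - i] for i in range(half)]
    beta = [(x[i] - x[n - 1 - i]) / (2.0 * math.cos((i + 0.5) * math.pi / n))
            for i in range(half)]
    alpha, beta = dct_lee(alpha), dct_lee(beta)
    out = []
    for i in range(half - 1):
        out.append(alpha[i])
        out.append(beta[i] + beta[i + 1])
    out.append(alpha[-1])
    out.append(beta[-1])
    return out

samples = [math.sin(0.3 * i) + 0.1 * i for i in range(32)]
fast = dct_lee(samples)
slow = dct_direct(samples)
```

Each recursion level needs n/2 scalings, which can be stored as precomputed reciprocal constants, so a 32-point transform uses 80 multiplications instead of 1024. Some of those constants are large because the cosines near the end of the range are small, which is one place where fixed-point precision can be lost.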
30Developing a MPEG Audio Layer III Codec on
TMS320C549
- Subband Analysis Filters: The subband analysis
filter consists of an initialization section, a
windowing section, and a DCT section. This block
also requires optimization because it is
time-consuming as well. The MPEG audio standard
uses a DCT 64->32 to divide the time samples into
32 subbands. Owing to the symmetry of the
transform coefficients, we can fold the 64 input
samples into 32 samples, so that the DCT 64->32
becomes a normal DCT 32->32. Using a procedure
similar to the one described for the subband
synthesis filters, we can implement the DCT with
drastically reduced computational requirements
and memory consumption.
31Developing a MPEG Audio Layer III Codec on
TMS320C549
- Psycho-acoustic Model: The psycho-acoustic model
calculates the signal-to-mask ratio (SMR) for
each subband of the input samples. The bit and
noise allocation uses this SMR to determine the
quantization for each subband. Because of its
high computational requirements and memory
consumption, the psycho-acoustic model is a huge
obstacle to real-time implementation.
Furthermore, psycho-acoustic analysis contains
many floating-point calculations that are very
difficult to implement on a fixed-point DSP.
Therefore, our MPEG encoder does not use the
psycho-acoustic model analysis. As a consequence,
no block type is calculated and the SMR is never
set. Not using the SMR also saves considerable
computation time in the iterative loop of the bit
and noise allocation. This saves a great deal of
computation and memory, but in theory trades away
some quality.
32Developing a MPEG Audio Layer III Codec on
TMS320C549
- Decoder Results: Below are some of the decoding
results and technical specifications.
- 1. To test the robustness of our decoder, which
should be able to play any bitstream supported by
the MPEG audio standard, we downloaded a
Layer III test bitstream package from Fraunhofer
IIS that contains 10 unusual bitstreams. All of
them were decoded successfully by our decoder.
- 2. To test the decoding quality of our decoder,
we used a test bitstream, compl.bit, also
downloaded from Fraunhofer IIS
(10Hz-10kHz/-20dB sine sweep, mono, 48 kHz). Our
decoded wave file was compared with the reference
wave file and obtained a score of 93.34dB, while
the required score is only 77dB. We also used the
floating-point C decoding program offered by the
MPEG organization to decode this test bitstream;
its score is 93.35dB, almost the same as our
result.
33Developing a MPEG Audio Layer III Codec on
TMS320C549
- 3. Memory consumption
- 1). ROM: Program 4.75k words, Parameter Table
8.50k words, Total 13.25k words
- 2). RAM: Variables 11.00k words, Total 11.00k
words
- 4. Computational complexity: The computing
complexity is partly determined by the bit rate,
sampling frequency, and stereo mode of the MP3
source bitstream and by the computing precision,
so we measured it using a common test bitstream,
funky.mp3, with a bit rate of 96 kbps, a sampling
frequency of 44100 Hz, and Joint Stereo mode.
With 32-bit computing precision, the computing
complexity is about 52.3 MIPS.
36Developing a MPEG Audio Layer III Codec on
TMS320C549
- Encoder Result
- 1. Memory consumption
- 1). ROM: Program 4.10k words, Parameter Table
16.20k words, Total 20.30k words
- 2). RAM: Variables 10.60k words, Total 10.60k
words
- 2. Computational complexity: The computing
complexity is partly determined by the bit rate,
sampling frequency, and stereo mode of the MP3
bitstream and by the computing precision, so we
measured it using a bitstream with a bit rate of
128 kbps and a sampling frequency of 44100 Hz.
The coding channel mode is dual channel. With
16-bit computing precision, the computing
complexity is about 61.7 MIPS.
37Developing a MPEG Audio Layer III Codec on
TMS320C549
- Encoder Result
- 3. Encoder Quality: Because we lacked software
for testing coding quality, we encoded several
wave files with our encoder. In our informal
listening tests, we could not tell the difference
between the original wave file and the coded
MPEG audio bitstream.
38- That's all.
- Thank you, everyone!