Mel-spectrum computation new_fe_sp.c

Description:

Seminar Speech Recognition Mel-spectrum computation new_fe_sp.c Presentation by Yu Zhang scuyuzh_at_hotmail.com Oct 1st,2003 S[k] is the power spectrum N is the length ... – PowerPoint PPT presentation

Transcript and Presenter's Notes

1
Seminar Speech Recognition
Mel-spectrum computation new_fe_sp.c
Presentation by Yu Zhang (scuyuzh_at_hotmail.com), Oct 1st, 2003
2
Mel-frequency Wrapping
Human hearing perceives pitch on an approximately
linear scale for frequencies below 1 kHz and on a
logarithmic scale for frequencies above 1 kHz.
The mel-frequency scale therefore uses linear
frequency spacing below 1000 Hz and logarithmic
spacing above 1000 Hz. Voice signals have most of
their energy in the low frequencies, so it is
natural to use a mel-spaced filter bank, which
has these characteristics.
3
Mel-frequency Wrapping
Use the following approximate formula to compute
the mels for a given frequency f in Hz:
  mel(f) = 2595 * log10(1 + f / 700)
line 165 of new_fe_sp.c:
  float32 fe_mel(float32 x)
  {
      return (2595.0 * (float32) log10(1.0 + x / 700.0));
  }
  float32 fe_melinv(float32 x)
  {
      return (700.0 * ((float32) pow(10.0, x / 2595.0) - 1.0));
  }
4
The mel-frequency scale is a linear frequency
spacing below 1000 Hz and a logarithmic spacing
above 1000 Hz.
For each tone with an actual frequency, f,
measured in Hz, a subjective pitch is measured
on a scale called the mel scale. The pitch of
a 1 kHz tone, 40 dB above the perceptual hearing
threshold, is defined as 1000 mels.
5
Mel-frequency Wrapping
Figure 1: Power spectrum without mel-frequency wrapping
Figure 2: Mel-frequency wrapping of the power spectrum
Taken as a whole, the image with mel-frequency
wrapping contains less information than the one
without it. Looking at the details, however, the
image with mel-frequency wrapping keeps the low
frequencies and removes some of the remaining
information. To summarize, mel-frequency wrapping
lets us keep only the useful part of the
information.
6
S[k] is the power spectrum
N is the length of the Discrete Fourier Transform
L is the total number of triangular mel weighting filters
Mel spectrum
The Mel spectrum is computed by multiplying the
Power Spectrum by each of the Triangular Mel
Weighting filters and integrating the result.
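The sentence above can be written as a formula. The following is a reconstruction, not the slide's own equation: it assumes that the weights of the l-th triangular filter are written M_l[k] and that the sum runs over the non-redundant half of the power spectrum, k = 0 ... N/2. The symbols \tilde{S} and M_l are introduced here for illustration.

```latex
\tilde{S}[l] = \sum_{k=0}^{N/2} S[k]\, M_l[k], \qquad l = 0, 1, \ldots, L-1
```

Each mel-spectrum value is thus a weighted sum of the power-spectrum bins that fall under one triangle.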
7
Building the Triangular Mel Weighting filters
line 62 in new_fe_sp.c:
  int32 fe_build_melfilters(melfb_t *MEL_FB)
  {
      // estimate filter coefficients
      MEL_FB->filter_coeffs = (float32 **) fe_create_2d(MEL_FB->num_filters,
                                                        MEL_FB->fft_size,
                                                        sizeof(float32));
      MEL_FB->left_apex = (float32 *) calloc(MEL_FB->num_filters, sizeof(float32));
      MEL_FB->width = (int32 *) calloc(MEL_FB->num_filters, sizeof(int32));
      filt_edge = (float32 *) calloc(MEL_FB->num_filters + 2, sizeof(float32));

      melmax = fe_mel(MEL_FB->upper_filt_freq);
      melmin = fe_mel(MEL_FB->lower_filt_freq);

      for (i = 0; i <= MEL_FB->num_filters + 1; ++i)
          filt_edge[i] = fe_melinv(i * dmelbw + melmin);

      for (whichfilt = 0; whichfilt < MEL_FB->num_filters; ++whichfilt) {
          // Building the triangular mel weighting filters
      }
  }
8
Building the Mel spectrum
line 156 in new_fe_sp.c:
  void fe_mel_spec(fe_t *FE, float64 *spec, float64 *mfspec)
  {
      int32 whichfilt, start, i;
      float32 dfreq;

      dfreq = FE->SAMPLING_RATE / (float32) FE->FFT_SIZE;

      for (whichfilt = 0; whichfilt < FE->MEL_FB->num_filters; whichfilt++) {
          start = (int32) (FE->MEL_FB->left_apex[whichfilt] / dfreq) + 1;
          mfspec[whichfilt] = 0;
          for (i = 0; i < FE->MEL_FB->width[whichfilt]; i++)
              mfspec[whichfilt] +=
                  FE->MEL_FB->filter_coeffs[whichfilt][i] * spec[start + i];
      }
  }
  /* FE holds the triangular mel weighting filters (in FE->MEL_FB)
     spec is the power spectrum
     mfspec is the mel spectrum
     variables marked in red are coefficients of the mel weighting filter */
l = 0, 1, ..., L-1
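The loop above can be tried out in isolation with a small, self-contained version. The sketch below is only an illustration under stated assumptions: the mel_bank_t struct and the mel_spectrum function are hypothetical simplifications, and the start bin of each filter is stored directly instead of being derived from left_apex and dfreq as in fe_mel_spec.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, simplified filter bank (not the Sphinx melfb_t):
 * each filter keeps only its nonzero weights plus the index of the
 * first power-spectrum bin those weights apply to.
 */
typedef struct {
    int num_filters;
    const int *start;             /* first spectrum bin of each filter   */
    const int *width;             /* number of nonzero weights per filter */
    const double *const *coeffs;  /* coeffs[l][i], i = 0 .. width[l] - 1  */
} mel_bank_t;

/* mfspec[l] = sum over i of coeffs[l][i] * spec[start[l] + i] */
void mel_spectrum(const mel_bank_t *bank, const double *spec,
                  size_t spec_len, double *mfspec)
{
    for (int l = 0; l < bank->num_filters; ++l) {
        /* Every weight must land on a valid spectrum bin. */
        assert(bank->start[l] >= 0);
        assert((size_t) (bank->start[l] + bank->width[l]) <= spec_len);

        double acc = 0.0;
        for (int i = 0; i < bank->width[l]; ++i)
            acc += bank->coeffs[l][i] * spec[bank->start[l] + i];
        mfspec[l] = acc;
    }
}
```

Storing only the nonzero part of each triangle keeps the inner loop short, which appears to be the purpose of the width and left_apex fields in the original code.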