Automatic Speaker Recognition for Series 60 Mobile Devices

1
Automatic Speaker Recognition for Series 60
Mobile Devices
Specom2004, Sep 20, 2004
Juhani Saastamoinen, Evgeny Karpov, Ville
Hautamäki, and Pasi Fränti

University of Joensuu,
Department of Computer Science

2
Background

Project in National FENIX programme
New Methods and Applications in Speech Technology
7 research institutes
Project partners: NRC, Lingsoft, National Bureau
of Investigation, etc.
Joensuu Speaker Recognition
http://cs.joensuu.fi/pages/pums

3
PUMS project
Juhani Saastamoinen: Project manager
Pasi Fränti: Professor
Evgeny Karpov: Project researcher
Tomi Kinnunen: Researcher
Ismo Kärkkäinen: Clustering algorithms
Ville Hautamäki: Project researcher
4
Application Scenarios
5
Project Goal
Port speaker recognition to Series 60 mobile phone
6
Symbian Phones

Series 60 phone features
16 MB ROM
8 MB RAM
176 x 208 display
ARM-processor
No floating-point unit!!!

Symbian user interface platforms: UIQ, Series 60, Series 80
7
Symbian OS

Defined by Symbian consortium
Based on EPOC
Operating system for mobile phones
Real-time system
Long uptime required
Multitasking, multithreading

8
Problems of Porting

Usual considerations when porting to a phone
GUI, event-driven programming
Platform-specific programming model
Real-time system, exceptions
Application-specific porting problems
Number crunching without a floating-point unit!!!
Signal processing is numerically challenging

9
Identification System
Add speaker profiles during training
Read and use all profiles during recognition
Speaker Profile Database
Decision
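The train/recognize flow above can be sketched as follows. This is a minimal illustration, assuming that each speaker profile is a VQ codebook (the slides mention VQ/GLA later) and that the decision picks the profile with the smallest average quantization distortion. The class and function names are hypothetical, and codebook training (GLA) is left out: the codebooks are simply given.

```python
# Hypothetical sketch: profiles are VQ codebooks; all names are illustrative only.

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def avg_distortion(features, codebook):
    """Average distance from each feature vector to its nearest code vector."""
    return sum(min(sq_dist(f, c) for c in codebook) for f in features) / len(features)

class ProfileDatabase:
    def __init__(self):
        self.profiles = {}  # speaker name -> codebook (list of vectors)

    def add_profile(self, name, codebook):
        """Training step: store one speaker's codebook."""
        self.profiles[name] = codebook

    def identify(self, features):
        """Recognition step: read all profiles and return the best match."""
        return min(self.profiles,
                   key=lambda name: avg_distortion(features, self.profiles[name]))

db = ProfileDatabase()
db.add_profile("alice", [[0.0, 0.0], [1.0, 1.0]])
db.add_profile("bob", [[5.0, 5.0], [6.0, 4.0]])
decision = db.identify([[0.9, 1.1], [0.1, -0.2]])
```

In a real system the feature vectors would be the MFCCs of the test utterance, and the codebooks would come from a clustering step rather than being written by hand.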
10
MFCC Signal Processing

Pre-emphasis coefficient 0.97, Hamming window, 30 triangular
mel filters, base-2 logarithm, output 12 MFCCs
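The first two steps of this chain can be sketched in floating point as a reference. This is only an illustration under assumptions: the slides give the coefficient 0.97 and name the window, but the treatment of the first sample and the exact window formula (the common 0.54/0.46 Hamming definition) are assumed here, and the function names are hypothetical.

```python
import math

def pre_emphasize(signal, coeff=0.97):
    """y[n] = x[n] - coeff * x[n-1]; the first sample is passed through (assumed)."""
    return [signal[0]] + [signal[n] - coeff * signal[n - 1]
                          for n in range(1, len(signal))]

def hamming(length):
    """Common Hamming window: w[n] = 0.54 - 0.46 * cos(2 * pi * n / (N - 1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]

def window_frame(frame):
    """Multiply a frame sample by sample with a Hamming window of the same length."""
    return [s * w for s, w in zip(frame, hamming(len(frame)))]

frame = window_frame(pre_emphasize([1.0, 1.0, 1.0, 1.0, 1.0]))
```

The remaining steps (FFT, mel filter bank, logarithm, DCT) would follow on the windowed frame.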

11
Fixed-Point Implementation

Numerical analysis needed for fixed-point
arithmetic implementation
Truncation and re-scaling to avoid overflows in
the converted algorithm
Minimize information loss caused by computation
in fixed-point arithmetic
Minimize relative error
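The trade-off between overflow and information loss can be illustrated with a small sketch. It assumes a plain scaled-integer representation with a chosen number of fractional bits; the helper names and the example values are hypothetical and not taken from the slides.

```python
def to_fixed(value, frac_bits):
    """Quantize a real value to an integer with frac_bits fractional bits."""
    return int(round(value * (1 << frac_bits)))

def to_float(fixed, frac_bits):
    """Convert a scaled integer back to a real value."""
    return fixed / (1 << frac_bits)

def relative_error(value, frac_bits):
    """Relative error introduced by quantizing value with frac_bits fractional bits."""
    approx = to_float(to_fixed(value, frac_bits), frac_bits)
    return abs(approx - value) / abs(value)

def fits_in_int32(n):
    """True if n is representable as a signed 32-bit integer."""
    return -(1 << 31) <= n < (1 << 31)

err_coarse = relative_error(0.97, 4)    # few fractional bits: large relative error
err_fine = relative_error(0.97, 15)     # more fractional bits: small relative error
too_big = to_fixed(70000.0, 15)         # large value with many fractional bits
```

More fractional bits reduce the relative error, but they also leave less headroom before a value no longer fits in 32 bits, which is why re-scaling has to be analysed step by step.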

12
FFT, Fixed-Point

Frequency spectrum of speech
Biggest source of numerical error
Butterflies have multiplications
Layers repeat truncation errors
Fixed number of bits per element
32, native integer size in many systems
Reference implementation: FFTGEN
http://www.jjj.de/fft/fftgen.tgz

13
FFTGEN (16/16)

Multiplication: the result of a 32 x 32-bit product must fit in
32 bits, so truncate the inputs
FFTGEN: truncate inputs to 16/16 bits

[Diagram: 16/16 multiplication. Labels: FFT layer input (16-bit integer) x FFT twiddle factor (16-bit integer) = 32-bit multiplication result, with 16 used bits and 16 crop-off bits. FFT layer output (part of it); crop-off for next layer: 16 bits!]
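A 16/16 multiplication can be sketched as below. This is not FFTGEN's code; it is a minimal illustration assuming that the signal value is a signed 32-bit integer whose low 16 bits are cropped off before multiplying by a 16-bit twiddle factor. The function names and test values are hypothetical, and how the product is re-scaled afterwards is not modelled.

```python
def mul_16_16(signal, twiddle):
    """Multiply a 32-bit signal value by a 16-bit twiddle factor, 16/16 style.

    The low 16 bits of the signal are cropped off so that the product of
    two 16-bit operands fits in 32 bits.
    """
    cropped = signal >> 16          # keep the 16 most significant bits
    return cropped * twiddle        # at most 16 + 16 = 32 bits

def fits_in_int32(n):
    """True if n is representable as a signed 32-bit integer."""
    return -(1 << 31) <= n < (1 << 31)

TWIDDLE_ONE = (1 << 15) - 1         # largest positive 16-bit twiddle, close to 1.0

large = mul_16_16(0x7FFF0000, TWIDDLE_ONE)
small = mul_16_16(0x0000FFFF, TWIDDLE_ONE)   # all information sits in the low bits
```

The second call shows the cost of the scheme: a signal value that only occupies the low 16 bits is cropped to zero before the multiplication.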
14
Info Preserving FFT (22/10)

Approximate the DFT operator F with G
Allow a larger operator error F - G in order to preserve more signal
information
Minimize the maximum relative error in the scaled sine
values with respect to the scale: 980 is good for FFT
sizes up to 1024
Truncate multiplication inputs to 22/10 bits
(signal/operator)

[Diagram: 22/10 multiplication. Labels: FFT layer input (32-bit integer, 22 bits used; 22 used bits, 10 crop-off bits) x FFT twiddle factor (16-bit integer, 10 bits used) = 32-bit multiplication result. FFT layer output (part of it); crop-off for next layer: 10 bits.]
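The 22/10 idea can be sketched as follows, using the scale 980 and the FFT size 1024 quoted on the slide. Everything else is an assumption made for illustration: the table layout (first quadrant only), the rounding rule, the function names, and the test values. Re-scaling of the product for the next layer is not modelled.

```python
import math

SCALE = 980          # twiddle scale quoted on the slide
FFT_SIZE = 1024      # largest FFT size the slide mentions

def scaled_sines(size, scale):
    """Integer sine table: round(scale * sin(2*pi*k/size)) over the first quadrant."""
    return [round(scale * math.sin(2 * math.pi * k / size))
            for k in range(size // 4 + 1)]

def max_relative_error(size, scale):
    """Largest relative rounding error over the nonzero table entries."""
    table = scaled_sines(size, scale)
    worst = 0.0
    for k in range(1, len(table)):           # skip k = 0, where the sine is exactly 0
        exact = scale * math.sin(2 * math.pi * k / size)
        worst = max(worst, abs(table[k] - exact) / exact)
    return worst

def mul_22_10(signal, twiddle):
    """Crop 10 low bits from the 32-bit signal, then multiply by a 10-bit twiddle."""
    cropped = signal >> 10                   # 22 bits of the signal survive
    return cropped * twiddle                 # at most 22 + 10 = 32 bits

table = scaled_sines(FFT_SIZE, SCALE)
product = mul_22_10(0x7FFFFFFF, table[-1])   # largest signal, largest twiddle
small = mul_22_10(0x0000FFFF, table[-1])     # a value that 16/16 would crop to zero
```

Because every table entry stays below 1024, the twiddle factors need only 10 bits, and the 6 bits saved compared with 16/16 go to the signal.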
15
FFT Spectrum, Fixed-Point

[Figure: x-axis: fixed-point FFT element absolute values; y-axis: correct FFT element absolute values. Panels: 16/16 absolute values and 22/10 absolute values, for the original TIMIT signal and for the TIMIT signal x 4.]
16
Scale of Error in Proposed FFT
17
Magnitude Spectrum, Fixed-Point

Compute complex absolute values using the maximum
coordinate and the coordinate ratio
Suppose |x| > |y| for z = x + iy, then |z| = |x| sqrt(1 + (y/x)^2)
Denote the squared ratio (y/x)^2 by t
Approximate the square root sqrt(1 + t) by a polynomial P(t)
Constant-time algorithm (vs. Newton)
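The structure of this computation can be sketched as below. The slides do not give the polynomial P(t), so the sketch assumes a quadratic that interpolates sqrt(1 + t) at t = 0, 0.5 and 1. It is written in floating point for readability; a fixed-point port would store the coefficients as scaled integers. All names are hypothetical.

```python
import math

# Assumed P(t): the quadratic through sqrt(1 + t) at t = 0, 0.5 and 1.
F0, FH, F1 = 1.0, math.sqrt(1.5), math.sqrt(2.0)
C2 = 2.0 * (F1 - 2.0 * FH + F0)
C1 = F1 - F0 - C2
C0 = F0

def approx_abs(x, y):
    """Approximate |x + iy| as max * P(t), with t = (min / max) ** 2."""
    big, small = max(abs(x), abs(y)), min(abs(x), abs(y))
    if big == 0:
        return 0.0
    ratio = small / big              # coordinate ratio, always in [0, 1]
    t = ratio * ratio
    return big * (C0 + t * (C1 + t * C2))   # Horner evaluation of P(t)

magnitude = approx_abs(3.0, 4.0)
```

The cost is fixed: one division, a few multiplications and additions, and no iteration, which is the contrast with a Newton-style square root.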

18
Logarithm, Fixed-Point

Use base 2 instead of base 10
Changing the base corresponds to multiplying the output by a constant
Standard technique
Reduce the problem to the interval [1, 2)
Use linear interpolation from values stored in a
look-up table
8 bits used for indexing the look-up table values
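A table-based base-2 logarithm along these lines can be sketched as follows. The 8-bit table index comes from the slide; the output format (16 fractional bits), the number of bits used for interpolation, and all names are assumptions made for this illustration.

```python
import math

INDEX_BITS = 8        # table index width given on the slide
FRAC_BITS = 16        # assumed output precision (16 fractional bits)
INTERP_BITS = 16      # assumed bits kept for interpolating between entries

# log2(1 + i/256) with 16 fractional bits for i = 0..256;
# the extra last entry simplifies interpolation.
LOG_TABLE = [round(math.log2(1 + i / (1 << INDEX_BITS)) * (1 << FRAC_BITS))
             for i in range((1 << INDEX_BITS) + 1)]

def log2_fixed(n):
    """Return log2(n), scaled by 2**16, for a positive integer n."""
    if n <= 0:
        raise ValueError("log2 is defined for positive inputs only")
    exponent = n.bit_length() - 1            # n = m * 2**exponent, m in [1, 2)
    total = INDEX_BITS + INTERP_BITS
    # Fractional part of the mantissa (m - 1) as a 24-bit integer.
    if exponent >= total:
        frac = (n >> (exponent - total)) - (1 << total)
    else:
        frac = (n << (total - exponent)) - (1 << total)
    index = frac >> INTERP_BITS              # top 8 bits select the table entry
    rest = frac & ((1 << INTERP_BITS) - 1)   # remaining bits drive interpolation
    low, high = LOG_TABLE[index], LOG_TABLE[index + 1]
    mantissa_log = low + (((high - low) * rest) >> INTERP_BITS)
    return (exponent << FRAC_BITS) + mantissa_log

value = log2_fixed(1000) / (1 << FRAC_BITS)
```

The table is built with floating-point math here only for brevity; on a device without a floating-point unit it would be precomputed and stored as constants.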

19
Rest of System, Fixed-Point

No improvement needed in VQ/GLA
A similar technique as with the FFT should be applied to
other signal processing steps
Pre-emphasis: utilize the full 32 bits
Time windowing: use fewer bits in the windowing
function
Filter bank (FB): use fewer bits in the frequency responses
DCT: use fewer bits for the cosines
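As one example of these steps, pre-emphasis that uses the full 32-bit range could look like the sketch below. It assumes 16-bit input samples and stores the coefficient 0.97 with 15 fractional bits; these choices and the function name are assumptions, not details from the slides.

```python
COEFF_BITS = 15
COEFF = round(0.97 * (1 << COEFF_BITS))   # 0.97 as an integer with 15 fractional bits

def pre_emphasize_fixed(samples):
    """y[n] = x[n] - 0.97 * x[n-1] on 16-bit samples, result scaled by 2**15."""
    out = []
    prev = 0                               # x[-1] is taken as 0 (assumed)
    for x in samples:
        # Shift the current sample to the same scale as COEFF * prev.
        out.append((x << COEFF_BITS) - COEFF * prev)
        prev = x
    return out

worst = pre_emphasize_fixed([-32768, 32767])   # extreme 16-bit inputs
```

With 16-bit inputs, the worst-case output stays just inside the signed 32-bit range, so no bits are wasted and no overflow occurs under these assumptions.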

20
Effect of Signal Processing

TIMIT data sets, varying number of speakers (N)
For each N, repeat (6x, 5x, 2x) train/recognize
cycles (to eliminate GLA initial solution
randomness)
FFTGEN: FFT with 16/16 multiplication
Fixed-point: uses the proposed 22/10 FFT
Mixed: floating-point DSP, fixed-point GLA/VQ

21
Effect of Signal Quality

GSM/PC data: 16 aligned dual recordings
All computations in floating-point arithmetic
Signal recorded with a laptop and PC microphone gives
an average recognition rate of 100 %
Signal recorded with a Nokia 3660 results in
an average recognition rate of 84.9 %

22
Conclusion

Speaker identification was ported to a Symbian
Series 60 mobile phone
22/10 bit usage in multiplication proposed
instead of the standard 16/16
Experiments indicate that recognition accuracy
improves from 68 % to 95 %
