Speaker Recognition

About This Presentation

Description:

Title: Speaker Recognition
Author: Sharat
Last modified by: Sharat
Created Date: 12/2/2003 3:16:50 PM
Document presentation format: On-screen Show

Slides: 21

Provided by: Shar3203

Learn more at: https://cedar.buffalo.edu

Transcript and Presenter's Notes

1
Speaker Recognition

Sharat S. Chikkerur
Center for Unified Biometrics and Sensors
http://www.cubs.buffalo.edu

2
Speech Fundamentals

Characterizing speech
Content (Speech recognition)
Signal representation (Vocoding)
Waveform
Parametric (Excitation, Vocal Tract)
Signal analysis (Gender determination, Speaker recognition)
Terminologies
Phonemes
Basic discrete units of speech.
English has around 42 phonemes.
Language specific
Types of speech
Voiced speech
Unvoiced speech (Fricatives)
Plosives
Formants

3
Speech production
[Figures: speech production mechanism (with a 17 cm length label) and speech production model]
4
Nature of speech
Spectrogram
5
Vocal Tract modeling
[Figure: signal spectrum and smoothed signal spectrum]

The smoothed spectrum indicates the locations of the formants of each user.
The smoothed spectrum is obtained from the cepstral coefficients.
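The smoothing described above can be sketched as follows. This is a minimal illustration, not the presenter's code: it assumes NumPy is available, and the function name `cepstral_smooth`, the Hamming window, and the default `n_keep=20` are all assumptions. It keeps only the low-quefrency cepstral coefficients and transforms them back, which leaves the slowly varying spectral envelope where the formant peaks sit.

```python
import numpy as np

def cepstral_smooth(frame, n_keep=20):
    """Return (log_spectrum, smoothed_log_spectrum) for one speech frame.

    n_keep is the number of low-quefrency cepstral coefficients retained
    (an illustrative choice; it must satisfy 1 <= n_keep <= len(frame) // 2).
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Taper the frame to reduce spectral leakage (assumed, not from the slides).
    windowed = frame * np.hamming(n)
    # Log magnitude spectrum; the small constant avoids log(0).
    log_mag = np.log(np.abs(np.fft.fft(windowed)) + 1e-10)
    # Real cepstrum: inverse FFT of the log magnitude spectrum.
    cepstrum = np.fft.ifft(log_mag).real
    # Low-time lifter: keep coefficients 0..n_keep-1 and their mirror images.
    lifter = np.zeros(n)
    lifter[:n_keep] = 1.0
    lifter[n - n_keep + 1:] = 1.0
    # Back to the frequency domain: the smoothed log spectrum (envelope).
    smoothed = np.fft.fft(cepstrum * lifter).real
    return log_mag, smoothed
```

Peaks of the smoothed curve are candidate formant locations; a smaller `n_keep` gives a smoother envelope.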

6
Parametric Representations: Formants

Formant Frequencies
Characterizes the frequency response of the vocal tract
Used in characterization of vowels
Can be used to determine the gender

7
Parametric Representations: LPC

Linear predictive coefficients
Used in vocoding
Spectral estimation
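As a rough illustration of how linear predictive coefficients can be computed, the sketch below uses the autocorrelation method with the Levinson-Durbin recursion. The slides do not say which method was used, so the method and the names `autocorrelation` and `levinson_durbin` are assumptions. The predictor convention is x[n] ≈ a[1] x[n-1] + ... + a[p] x[n-p].

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one analysis frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LPC normal equations from autocorrelation values r.

    Returns (coefficients a[1..order], final prediction error).
    Assumes r[0] > 0, i.e. the frame is not all zeros.
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for model order i.
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        # Update the lower-order coefficients, then append the new one.
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        # The prediction error shrinks by (1 - k^2) at every order.
        err *= (1.0 - k * k)
    return a[1:], err
```

For a frame `x`, `levinson_durbin(autocorrelation(x, 10), 10)` would give a 10th-order model; the order is a tuning choice, not a value taken from the slides.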

8
Parametric Representations: Cepstrum
9
Speaker Recognition

Definition
It is the method of recognizing a person based on his or her voice
It is one of the forms of biometric identification
Depends on speaker-dependent characteristics

10
Generic Speaker Recognition System
Verification: Speech signal -> Preprocessing -> Analysis Frames -> Feature Extraction -> Feature Vector -> Pattern Matching -> Score
Enrollment: Speech signal -> Preprocessing -> Feature Extraction -> Speaker Model

Pattern matching
Stochastic Models: GMM, HMM
Template Models: DTW
Distance Measures

Feature extraction
LAR
Cepstrum
LPCC
MFCC

Preprocessing
A/D Conversion
End point detection
Pre-emphasis filter
Segmentation

Choice of features
Differentiating factors between speakers include vocal tract shape and behavioral traits
Features should have high inter-speaker and low intra-speaker variation
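One way to put a number on this criterion is an F-ratio style score: the variance of the per-speaker means divided by the average within-speaker variance. The slides do not name a specific measure, so this choice, the function name `f_ratio`, and the toy values below are illustrative assumptions.

```python
from statistics import mean, pvariance

def f_ratio(samples_by_speaker):
    """Score one scalar feature: higher means better speaker separation.

    samples_by_speaker maps a speaker label to a list of feature values.
    Assumes at least one speaker has non-zero within-speaker variance.
    """
    groups = list(samples_by_speaker.values())
    # Inter-speaker variation: spread of the per-speaker means.
    between = pvariance([mean(g) for g in groups])
    # Intra-speaker variation: average spread within each speaker.
    within = mean([pvariance(g) for g in groups])
    return between / within

# Toy values (hypothetical): speakers are well separated by this feature.
good = {"a": [1.0, 1.2, 0.8], "b": [3.0, 3.2, 2.8]}
# Toy values (hypothetical): speakers overlap almost completely.
poor = {"a": [1.0, 3.0], "b": [1.2, 2.8]}
```

Here `f_ratio(good)` is much larger than `f_ratio(poor)`, which matches the intuition that a useful feature varies more between speakers than within one speaker.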

11
Our Approach

Preprocessing
Feature Extraction
Speaker model
Matching

12
Silence Removal

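The transcript records only the name of this step, not how it was done. One common approach is short-time energy thresholding, sketched below. The function name `remove_silence`, the 160-sample frame length, and the 10% threshold are assumptions chosen for illustration.

```python
def remove_silence(signal, frame_len=160, threshold_ratio=0.1):
    """Drop low-energy frames from a list of samples.

    The signal is cut into non-overlapping frames; a frame is kept only
    if its mean energy is at least threshold_ratio times the energy of
    the loudest frame. Both parameters are illustrative defaults.
    """
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal), frame_len)]
    if not frames:
        return []
    # Mean squared amplitude of each frame.
    energies = [sum(x * x for x in f) / len(f) for f in frames]
    limit = threshold_ratio * max(energies)
    kept = []
    for frame, energy in zip(frames, energies):
        # The energy > 0 check discards everything when the input is all zeros.
        if energy >= limit and energy > 0.0:
            kept.extend(frame)
    return kept
```

A relative threshold adapts to the recording level, but it would need tuning on noisy recordings, where the background energy is not close to zero.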

13
Pre-emphasis

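A pre-emphasis filter is usually a first-order high-pass filter of the form y[n] = x[n] - alpha * x[n-1]. The sketch below assumes that form; the coefficient used in the presentation is not recorded, so `alpha=0.95` is only a typical value, and the function name `pre_emphasis` is an assumption.

```python
def pre_emphasis(signal, alpha=0.95):
    """Apply y[n] = x[n] - alpha * x[n-1] to a list of samples.

    The first sample is passed through unchanged because it has no
    predecessor. alpha is an assumed, typical value.
    """
    if not signal:
        return []
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]

# A constant (low-frequency) input is attenuated after the first sample.
print(pre_emphasis([1.0, 1.0, 1.0], alpha=0.5))   # [1.0, 0.5, 0.5]
# An alternating (high-frequency) input is boosted.
print(pre_emphasis([1.0, -1.0, 1.0], alpha=0.5))  # [1.0, -1.5, 1.5]
```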

14
Segmentation

Short time analysis
The speech signal is segmented into overlapping analysis frames.
The speech signal is assumed to be stationary within each frame.
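The segmentation into overlapping frames can be sketched as below. The frame length and hop size used in the presentation are not recorded, so the defaults here (240 samples with an 80-sample hop) and the function name `split_into_frames` are assumptions.

```python
def split_into_frames(signal, frame_len=240, hop=80):
    """Cut a list of samples into overlapping analysis frames.

    Consecutive frames start hop samples apart, so they overlap by
    frame_len - hop samples. Trailing samples that do not fill a whole
    frame are dropped.
    """
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop)]

frames = split_into_frames(list(range(10)), frame_len=4, hop=2)
print(frames)  # [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
```

Each frame is then short enough for the stationarity assumption to be reasonable, and features are extracted frame by frame.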

15
Feature Representation

[Figure: speech signal and spectrum of two users uttering "ONE"]
16
Speaker Model
17
Dynamic Time Warping

The DTW warping path in the n-by-m matrix is the path with the minimum average cumulative cost.
The unmarked area is the constraint region within which the path is allowed to go.
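A minimal sketch of this matching step is given below. Several details are assumptions because the transcript does not record them: the local cost is the Euclidean distance between feature vectors, the constraint region is modeled as a band around the diagonal (`band`), and the "average" is taken as the cumulative cost divided by n + m. The function name `dtw_distance` is also hypothetical. It requires Python 3.8 or later for `math.dist`.

```python
import math

def dtw_distance(a, b, band=None):
    """Normalized DTW cost between two non-empty feature-vector sequences.

    a and b are lists of equal-dimension vectors. Cell (i, j) is only
    evaluated when |i - j| <= band, which restricts the warping path to
    a region around the diagonal. band=None means no restriction.
    """
    n, m = len(a), len(b)
    if band is None:
        band = max(n, m)
    # The band must be wide enough for the path to reach cell (n, m).
    band = max(band, abs(n - m))
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - band), min(m, i + band) + 1):
            local = math.dist(a[i - 1], b[j - 1])
            # Extend the cheapest of the three allowed predecessor cells.
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    # Normalize so that long and short utterances are comparable.
    return cost[n][m] / (n + m)

template = [[0.0], [1.0], [2.0]]
stretched = [[0.0], [1.0], [1.0], [2.0]]
print(dtw_distance(template, stretched))  # 0.0
```

The stretched sequence matches the template at zero cost because DTW absorbs the difference in timing; a narrower band makes the search cheaper but allows less warping.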

18
Results