Speaker Recognition - PowerPoint PPT Presentation

About This Presentation
Title: Speaker Recognition
Author: Sharat
Last modified by: Sharat
Created Date: 12/2/2003 3:16:50 PM
Document presentation format: On-screen Show

Slides: 21
Provided by: Shar3203

Transcript and Presenter's Notes

1
Speaker Recognition
  • Sharat.S.Chikkerur
  • Center for Unified Biometrics and Sensors
  • http://www.cubs.buffalo.edu

2
Speech Fundamentals
  • Characterizing speech
  • Content (Speech recognition)
  • Signal representation (Vocoding)
  • Waveform
  • Parametric (Excitation, Vocal Tract)
  • Signal analysis (Gender determination, Speaker
    recognition)
  • Terminologies
  • Phonemes
  • Basic discrete units of speech.
  • English has around 42 phonemes.
  • Language specific
  • Types of speech
  • Voiced speech
  • Unvoiced speech (Fricatives)
  • Plosives
  • Formants

3
Speech production
[Figures: speech production mechanism (labeled "17 cm"); speech production model]
4
Nature of speech
[Figure: spectrogram]
5
Vocal Tract modeling
[Figure panels: signal spectrum; smoothed signal spectrum]
  • The smoothed spectrum indicates the locations
    of the formants of each user
  • The smoothed spectrum is obtained from the
    cepstral coefficients
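
One common way to get such a smoothed spectrum is to compute the real cepstrum of a frame, keep only its low-order coefficients, and transform back. The sketch below assumes that approach; the slides do not say how the smoothing was done, and the function names and the `n_keep` cutoff are illustrative choices, not taken from the presentation.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform (adequate for short frames)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def real_cepstrum(frame, floor=1e-12):
    """Inverse DFT of the log magnitude spectrum of one real-valued frame."""
    n = len(frame)
    log_mag = [math.log(max(abs(v), floor)) for v in dft(frame)]
    # log_mag is real and symmetric, so its inverse DFT is real.
    return [sum(log_mag[k] * cmath.exp(2j * math.pi * k * q / n)
                for k in range(n)).real / n
            for q in range(n)]


def smoothed_log_spectrum(frame, n_keep=12):
    """Keep low-order cepstral coefficients only, then return to frequency."""
    c = real_cepstrum(frame)
    n = len(c)
    # Keep c[0..n_keep] and the mirrored tail; zero the rest (assumed cutoff).
    liftered = [c[q] if (q <= n_keep or q >= n - n_keep) else 0.0
                for q in range(n)]
    return [v.real for v in dft(liftered)]
```

The low-order coefficients describe the slowly varying spectral envelope (vocal tract), so peaks of the returned log spectrum are candidates for formant locations.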

6
Parametric Representations: Formants
  • Formant Frequencies
  • Characterize the frequency response of the vocal
    tract
  • Used in characterization of vowels
  • Can be used to determine the gender

7
Parametric Representations: LPC
  • Linear predictive coefficients
  • Used in vocoding
  • Spectral estimation
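
Linear predictive coefficients are often estimated with the autocorrelation method and the Levinson-Durbin recursion. The following is a minimal sketch under that assumption; the slides do not state which estimation method or model order was used, and the names `autocorrelation` and `lpc` are illustrative.

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[t] * frame[t + lag] for t in range(n - lag))
            for lag in range(max_lag + 1)]


def lpc(frame, order):
    """Return (a, err) where x[n] is predicted as sum(a[k-1] * x[n-k]),
    k = 1..order, and err is the residual prediction energy."""
    r = autocorrelation(frame, order)
    a = [0.0] * (order + 1)  # a[0] is unused padding
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate frame (for example all zeros)
        # Reflection coefficient for model order i.
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= (1.0 - k * k)
    return a[1:], err
```

For example, a decaying sequence `x[n] = 0.9 * x[n-1]` should give a first coefficient close to 0.9 and higher-order coefficients close to zero.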


8
Parametric Representations: Cepstrum
9
Speaker Recognition
  • Definition
  • It is the method of recognizing a person based on
    their voice
  • It is one of the forms of biometric
    identification
  • Depends on speaker-dependent characteristics.

10
Generic Speaker Recognition System
[Block diagram]
Enrollment: Speech signal -> Preprocessing -> Feature Extraction -> Speaker Model
Verification: Speech signal -> Preprocessing (Analysis Frames) -> Feature Extraction (Feature Vector) -> Pattern Matching -> Score
  • Stochastic Models
  • GMM
  • HMM
  • Template Models
  • DTW
  • Distance Measures
  • LAR
  • Cepstrum
  • LPCC
  • MFCC
  • A/D Conversion
  • End point detection
  • Pre-emphasis filter
  • Segmentation
  • Choice of features
  • Differentiating factors between speakers include
    vocal tract shape and behavioral traits
  • Features should have high inter-speaker and low
    intra-speaker variation
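
The last criterion can be quantified with an F-ratio: the variance of the per-speaker means divided by the average within-speaker variance. This measure is not mentioned in the slides; it is shown here only as one possible way to compare candidate features, with made-up toy numbers in the usage example.

```python
def f_ratio(samples_by_speaker):
    """samples_by_speaker maps a speaker id to a list of scalar feature values.

    Returns inter-speaker variance of the means divided by the mean
    intra-speaker variance (larger is better for discrimination).
    """
    means = {s: sum(v) / len(v) for s, v in samples_by_speaker.items()}
    grand = sum(means.values()) / len(means)
    between = sum((m - grand) ** 2 for m in means.values()) / len(means)
    within = sum(
        sum((x - means[s]) ** 2 for x in v) / len(v)
        for s, v in samples_by_speaker.items()
    ) / len(samples_by_speaker)
    return between / within if within > 0 else float("inf")


# Toy data (hypothetical): the first feature separates the speakers well,
# the second varies more within a speaker than between speakers.
good = {"a": [1.0, 1.1, 0.9], "b": [5.0, 5.1, 4.9]}
poor = {"a": [1.0, 5.0, 3.0], "b": [1.2, 4.8, 3.1]}
print(f_ratio(good) > f_ratio(poor))
```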

11
Our Approach
  • Preprocessing
  • Feature Extraction
  • Speaker model
  • Matching

12
Silence Removal
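
The slide gives no detail on how silence is removed. A common approach is a short-time energy threshold, sketched below; the frame length of 160 samples and the threshold ratio of 0.1 are assumed values, not taken from the presentation.

```python
def remove_silence(signal, frame_len=160, threshold_ratio=0.1):
    """Drop non-overlapping frames whose mean energy is not above a fraction
    of the average frame energy; return the remaining samples in order."""
    frames = [signal[i:i + frame_len] for i in range(0, len(signal), frame_len)]
    energies = [sum(x * x for x in f) / len(f) for f in frames]
    if not energies:
        return []
    threshold = threshold_ratio * (sum(energies) / len(energies))
    kept = []
    for frame, energy in zip(frames, energies):
        if energy > threshold:
            kept.extend(frame)
    return kept
```

A relative threshold adapts to recording level, but it can still fail in noisy recordings, where the background energy is close to the speech energy.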

13
Pre-emphasis
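
Pre-emphasis is usually a first-order high-pass filter, y[n] = x[n] - alpha * x[n-1], that boosts the higher frequencies before analysis. The sketch below assumes that form with alpha = 0.95; the coefficient used in the presentation is not given.

```python
def pre_emphasis(signal, alpha=0.95):
    """Apply y[n] = x[n] - alpha * x[n-1]; the first sample is passed through."""
    if not signal:
        return []
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]
```

A constant (low-frequency) input is strongly attenuated after the first sample, while a rapidly alternating input is amplified.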

14
Segmentation
  • Short time analysis
  • The speech signal is segmented into overlapping
    Analysis Frames
  • The speech signal is assumed to be stationary
    within this frame
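
The segmentation step can be sketched as follows. The frame length, hop size, and the Hamming window are assumptions added for illustration; the slide only states that the frames overlap.

```python
import math


def frame_signal(signal, frame_len=256, hop=128):
    """Split a signal into overlapping frames (overlap = frame_len - hop)
    and taper each frame with a Hamming window. Requires frame_len > 1."""
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    # Trailing samples that do not fill a whole frame are discarded.
    for start in range(0, len(signal) - frame_len + 1, hop):
        frames.append([signal[start + n] * window[n] for n in range(frame_len)])
    return frames
```

With these defaults, consecutive frames share half of their samples, so each part of the signal (away from the edges) is analyzed twice.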

15
Feature Representation

[Figure: speech signal and spectrum of two users uttering "ONE"]
16
Speaker Model
17
Dynamic Time Warping
  • The DTW warping path in the n-by-m matrix is the
    path with the minimum average cumulative cost.
    The unmarked area is the constraint region within
    which the path is allowed to go.
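
A minimal DTW sketch is given below. It assumes Euclidean distance between feature vectors, the basic three-way step pattern, an optional band around the diagonal as the path constraint, and normalization of the cumulative cost by n + m. None of these specifics are stated in the slides, so treat them as illustrative choices.

```python
import math


def dtw_distance(a, b, band=None):
    """Normalized DTW cost between two non-empty sequences of equal-dimension
    feature vectors. `band` limits |i - j| along the warping path."""
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    if band is None:
        band = max(n, m)          # no effective constraint
    band = max(band, abs(n - m))  # the end point must stay reachable
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - band), min(m, i + band) + 1):
            d = math.dist(a[i - 1], b[j - 1])  # local Euclidean distance
            cost[i][j] = d + min(cost[i - 1][j],      # step in a only
                                 cost[i][j - 1],      # step in b only
                                 cost[i - 1][j - 1])  # step in both
    return cost[n][m] / (n + m)
```

Because the step pattern and the band are symmetric, `dtw_distance(a, b)` equals `dtw_distance(b, a)`, and a sequence compared with a time-stretched copy of itself gives zero cost.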

18
Results
  • Distances are normalized w.r.t. length of the
    speech signal
  • Intra-speaker distance is less than inter-speaker
    distance
  • Distance matrix is symmetric

19
Matlab Implementation
20
THANK YOU