Text independent speaker identification in multilingual environments

Provided by: ikerl
Learn more at: http://www.lrec-conf.org

1
Text independent speaker identification in
multilingual environments
  • I. Luengo, E. Navas, I. Sainz, I. Saratxaga,
  • J. Sanchez, I. Odriozola and I. Hernaez

2
Contents
  • Introduction
  • SR in language mismatched conditions
  • Existing solutions
  • Proposed solution
  • Working database
  • Variability measures
  • Experimental results
  • Conclusions

3
Speaker Recognition System
  • TRAIN: Feature Extr. → Train
  • TEST: Feature Extr. → Score → Decision
  • Language mismatch? Accuracy decreases
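
The train/score/decision flow on this slide can be sketched as follows. This is only a minimal illustration: the slides do not say which speaker model is used, so a single diagonal Gaussian per speaker is assumed here, and the speaker names and feature values are made up.

```python
import math
from statistics import fmean, pvariance

def train(frames):
    """Fit one diagonal Gaussian (mean and variance per dimension) to a speaker's frames."""
    dims = list(zip(*frames))  # one tuple of values per feature dimension
    means = [fmean(d) for d in dims]
    variances = [max(pvariance(d), 1e-6) for d in dims]  # floor avoids division by zero
    return means, variances

def score(model, frames):
    """Average per-frame log-likelihood of the test frames under one speaker model."""
    means, variances = model
    total = 0.0
    for frame in frames:
        for x, m, v in zip(frame, means, variances):
            total += -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
    return total / len(frames)

def decide(models, frames):
    """Return the speaker whose model gives the highest score."""
    return max(models, key=lambda spk: score(models[spk], frames))

# Toy 2-D feature vectors for two hypothetical speakers (not data from the talk).
train_data = {
    "spk_a": [[1.0, 2.0], [1.2, 1.8], [0.9, 2.1], [1.1, 2.2]],
    "spk_b": [[3.0, 0.5], [3.2, 0.4], [2.9, 0.6], [3.1, 0.7]],
}
models = {spk: train(frames) for spk, frames in train_data.items()}
test_frames = [[1.05, 1.9], [1.0, 2.05]]
identified = decide(models, test_frames)
```

A language mismatch corresponds to test frames whose distribution has shifted away from the training frames of the true speaker, which lowers that speaker's score relative to the others.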

4
Existing solutions
  • Multi-language training
    • One model trained with various languages (per speaker)
    • Model learns characteristics of different languages
  • Multi-model training
    • One model for each language (per speaker)
    • Language detector

5
Existing solutions: Drawbacks
  • Possible languages must be known in advance for each speaker
  • Not generalizable to languages not seen during training
  • More recording sessions needed for training
    • Time → Money
  • Desired solution: Language independent
    • Suitable for languages not seen during training
    • Capable of single-language training

6
Proposed solution
  • Language-independent features
    • Normalization?
    • New features?
  • Short-term intonation and energy values
    • High speaker discrimination capability
    • Global distribution may change little with language
    • Combinable with MFCC
    • Only in voiced frames (intonation)
    • High session variability
      • MVN (mean and variance normalization) for inter-session normalization
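
As an illustration of the last two points, here is a minimal sketch of per-session mean and variance normalization applied to voiced frames only. The frame layout (F0 first, energy second), the convention that F0 = 0 marks an unvoiced frame, and the numeric values are assumptions made for this example, not details from the slides.

```python
from statistics import fmean, pstdev

def voiced_only(frames, f0_index=0):
    """Keep frames with a non-zero F0 value (hypothetical voicing convention)."""
    return [f for f in frames if f[f0_index] > 0.0]

def mvn(frames):
    """Mean and variance normalization over the frames of one session.

    Each feature dimension ends up with zero mean and unit variance
    within the session.
    """
    dims = list(zip(*frames))  # one tuple of values per feature dimension
    means = [fmean(d) for d in dims]
    stds = [pstdev(d) or 1.0 for d in dims]  # fall back to 1.0 for constant dimensions
    return [
        [(x - m) / s for x, m, s in zip(frame, means, stds)]
        for frame in frames
    ]

# Hypothetical [F0 in Hz, energy in dB] frames from one session.
session = [[120.0, 60.0], [0.0, 35.0], [130.0, 62.0], [125.0, 58.0], [0.0, 30.0]]
normalized = mvn(voiced_only(session))
```

Normalizing each session separately removes session-level offsets and scale differences, at the cost of also removing any speaker information carried by the session mean.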

7
Database
  • Bilingual Spanish-Basque speech database
  • 22 speakers (11 Male, 11 Female)
  • 4 sessions (inter-session variability)
  • 7 numeric sequences (8 digits) per session and
    language

8
Variability measures
  • Adding new features ALWAYS increases separability/variability
    • Speaker separability → discrimination
    • Language variability → model/test mismatch
    • Session variability → model/test mismatch
  • Key issue: Does speaker separability increase more than language/session variability?

9
Variability measures
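  • Kullback-Leibler divergence for variability estimation
  • Interesting measures: ratios of speaker separability to language variability (Spk/Lang) and to session variability (Spk/Ses)
  • Good if new features increase these ratios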
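
The slides do not show how the divergence is computed. One common choice, assumed here purely for illustration, is to model each condition (speaker, language or session) with a diagonal Gaussian and use the closed-form Kullback-Leibler divergence, symmetrized by summing both directions.

```python
import math

def kl_diag_gauss(mean_p, var_p, mean_q, var_q):
    """KL(p || q) for two Gaussians with diagonal covariance."""
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q)
    )

def symmetric_kl(mean_p, var_p, mean_q, var_q):
    """Symmetrized divergence: KL(p || q) + KL(q || p)."""
    return (kl_diag_gauss(mean_p, var_p, mean_q, var_q)
            + kl_diag_gauss(mean_q, var_q, mean_p, var_p))

# Two hypothetical 1-D distributions with unit variance and means 0 and 1.
d = symmetric_kl([0.0], [1.0], [1.0], [1.0])
```

Averaging such a divergence over pairs of speakers, over the two languages of one speaker, or over the sessions of one speaker would give separability and variability figures whose ratios can then be compared.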

10
Variability measures
Measure    Lang.  MFCC  MFCCP  Gain (%)
Lang       -      4.09  4.61    12
Spk        S      6.34  8.25    30
Spk        B      6.82  8.77    29
Ses        S      3.62  4.81    33
Ses        B      3.52  4.64    32
Spk/Lang   S      1.55  1.79    15
Spk/Lang   B      1.67  1.90    14
Spk/Ses    S      1.75  1.72    -2
Spk/Ses    B      1.94  1.89    -3
  • S: Spanish, B: Basque
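
The ratio rows can be recomputed from the Lang, Spk and Ses rows, as the sketch below does. It assumes that the gain column is the relative change from MFCC to MFCCP in percent; the helper names are illustrative. Because the inputs are already rounded to two decimals, a recomputed gain can differ from the printed one by about a point.

```python
# Divergence values copied from the table (S: Spanish, B: Basque).
table = {
    "MFCC":  {"Lang": 4.09, "Spk S": 6.34, "Spk B": 6.82, "Ses S": 3.62, "Ses B": 3.52},
    "MFCCP": {"Lang": 4.61, "Spk S": 8.25, "Spk B": 8.77, "Ses S": 4.81, "Ses B": 4.64},
}

def ratios(values):
    """Speaker separability divided by language and by session variability."""
    out = {}
    for lang in ("S", "B"):
        out[f"Spk/Lang {lang}"] = values[f"Spk {lang}"] / values["Lang"]
        out[f"Spk/Ses {lang}"] = values[f"Spk {lang}"] / values[f"Ses {lang}"]
    return out

def gain_percent(old, new):
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old

mfcc = ratios(table["MFCC"])
mfccp = ratios(table["MFCCP"])
ratio_gains = {name: gain_percent(mfcc[name], mfccp[name]) for name in mfcc}

for name in mfcc:
    print(f"{name}: {mfcc[name]:.2f} -> {mfccp[name]:.2f} ({ratio_gains[name]:+.1f} %)")
```

The Spk/Lang ratio rises while Spk/Ses falls slightly, which matches the pattern in the table: the added features help against language mismatch but not against session mismatch.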

11
Experimental results
              S-S    B-B    S-B    B-S    SB-S   SB-B
MFCC (ref)    98.3   97.3   63.6   67.3   96.8   95.6
MFCC (V)      97.6   96.8   62.6   67.0   96.6   95.6
MFCCP (V)     97.1   96.3   71.0   73.0   96.1   94.4
Gain (V), %   -0.5   -0.5   13.4    9.0   -0.5   -1.3
  • X-Y → Training in X, testing in Y
  • Gain (V): relative change from MFCC (V) to MFCCP (V)

12
Conclusions
  • Short-term intonation and energy values increase language robustness
    • Little accuracy drop in language-matched conditions
    • Very useful if test language is unpredictable
  • Variability measures predict results reasonably well
    • Allows easy selection of features prior to experiments
