Application of Gender Identification to Automatic Speech Recognition

1
Application of Gender Identification to Automatic
Speech Recognition
  • Jen Burge
  • Vonetta Lewis
  • Intelligent Information Engineering Laboratory
  • Department of Computer Science and Engineering
  • Oakland University

2
Voice Scoring System
  • Enormous amount of audio data collected from
    intercepted telephone conversations
  • Use speech recognition to transcribe text
  • Use transcription to decide what data should be
    examined by a human

3
Diagram from Sethi. Scoring Voice-stream for
Homeland Security, Oakland University. 2003.
4
Speech Recognition
  • Recognize speech data (convert to text,
    perform commands, etc.)
  • Applications
  • Use of speech as a form of input (dictation,
    control, etc.)
  • Content based retrieval for audio files

5
  • Feature extraction (LPC or MFCC)
  • Model words or individual phonemes
  • Attempt to match new sounds to models to
    recognize speech
  • Can use knowledge of language to improve accuracy

Diagram from http://www.mor.itesm.mx/omayora/Tutorial/tutorial.html
6
Training Recognizer
  • Speaker-dependent
  • Specific user trains system
  • Speaker-independent
  • Attempt to recognize speech from any speaker
  • Need to train for all possible speakers
  • Less accurate

7
Variability Across Speakers
  • Main obstacle in speaker-independent speech
    recognition systems
  • Gender, accent
  • Approaches to Problem
  • Build models for variability and choose
    appropriate one
  • Adapt to current speaker during recognition

8
Project Goals
  • Improve accuracy of speaker-independent
    recognition by having gender-dependent models
  • Automatically determine gender of speaker
  • Use the gender recognition to select appropriate
    model for transcription
  • Investigate Language Identification

9
Gender-related Acoustic Differences
  • Differences in length of vocal tract and vocal
    folds cause different:
  • Fundamental frequency
  • Formant frequencies
  • Other differences

Diagram from Kent and Read. Acoustic Analysis of
Speech. 2002.
10
Past Work on Gender Problem
  • Recognition by pitch
  • Difficulty accurately measuring fundamental
    frequency
  • Male/Female threshold usually 160 Hz
  • Model-based techniques
  • Hidden Markov Model (HMM)
  • Gaussian Mixture Model (GMM)
  • Other
  • Formant positions

11
Data Collection
  • Used CNN and NPR radio broadcasts
  • 16 kHz, 16 bit mono for gender recognition
    experiments
  • 22 kHz, 16 bit mono for ViaVoice tests

12
Pitch Threshold Experiments
  • Measured fundamental frequency using
    autocorrelation algorithm
  • Averaged frequency over entire clip
  • Compared to 160 Hz threshold
  • Achieved approx. 89% accuracy
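
The pitch-threshold procedure above can be sketched roughly as follows. This is a minimal illustration, not the code used in the experiments: the function names, the 60-400 Hz search range, the frame length, and the synthetic tone helper are all assumptions made for the example. Only the autocorrelation method, the averaging over the clip, and the 160 Hz threshold come from the slide.

```python
import math


def estimate_f0(frame, sample_rate, f_min=60.0, f_max=400.0):
    """Estimate the fundamental frequency of one frame from its autocorrelation peak.

    f_min and f_max bound the lag search; both are assumed values.
    Returns None when no positive autocorrelation peak is found (e.g. silence).
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]
    min_lag = int(sample_rate / f_max)
    max_lag = min(int(sample_rate / f_min), n - 1)
    best_lag, best_val = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        r = sum(x[i] * x[i + lag] for i in range(n - lag))
        if r > best_val:
            best_lag, best_val = lag, r
    return sample_rate / best_lag if best_lag else None


def classify_gender_by_pitch(frames, sample_rate, threshold_hz=160.0):
    """Average the per-frame F0 over the clip and compare it to the threshold."""
    f0s = [f for f in (estimate_f0(fr, sample_rate) for fr in frames) if f is not None]
    if not f0s:
        return None, None
    mean_f0 = sum(f0s) / len(f0s)
    return ("female" if mean_f0 > threshold_hz else "male"), mean_f0


def synth_tone(freq_hz, sample_rate, n_samples):
    """Hypothetical helper: a pure sine tone standing in for a voiced frame."""
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(n_samples)]


# Usage with synthetic 40 ms frames at 16 kHz (the rate used for the gender experiments).
low_voice = [synth_tone(120.0, 16000, 640) for _ in range(2)]
print(classify_gender_by_pitch(low_voice, 16000))
```

Real speech would also need a voicing decision so that unvoiced frames do not pollute the average; that step is left out here.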

13
Gaussian Mixture Model
  • Model data as a weighted sum of Gaussian
    distributions
  • Use Expectation-Maximization algorithm to fit GMM
    to training data
  • New data classified by finding the likelihood it
    was produced by each model
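
As a rough illustration of the three points above, the sketch below fits a small GMM with Expectation-Maximization and classifies new data by comparing likelihoods under each model. It is deliberately simplified: it works on one-dimensional values rather than MFCC vectors, uses only a handful of components, and all function names and default parameters are assumptions for the example rather than details of the experiments.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_gmm(data, n_components=2, n_iter=50, seed=0, min_var=1e-3):
    """Fit a 1-D GMM to a list of floats with EM; returns (weights, means, variances)."""
    rng = random.Random(seed)
    means = rng.sample(data, n_components)  # initialise means from the data
    overall_mean = sum(data) / len(data)
    overall_var = sum((x - overall_mean) ** 2 for x in data) / len(data)
    variances = [max(overall_var, min_var)] * n_components
    weights = [1.0 / n_components] * n_components
    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            p = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weights, means, and variances.
        for k in range(n_components):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # leave an empty component unchanged
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var, min_var)  # floor to avoid variance collapse
    return weights, means, variances


def log_likelihood(data, model):
    """Total log-likelihood of the data under one fitted model."""
    weights, means, variances = model
    total = 0.0
    for x in data:
        p = sum(w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances))
        total += math.log(max(p, 1e-300))
    return total


def classify(data, models):
    """Return the label of the model most likely to have produced the data."""
    return max(models, key=lambda label: log_likelihood(data, models[label]))
```

With one model fitted per gender, `classify(features, {"male": male_model, "female": female_model})` returns the label whose model gives the higher log-likelihood.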

14
GMM Experiments
  • Features: MFCCs (10, 12, 14, 16)
  • GMM components: (32, 64, 128)
  • Trained with 10 speakers per gender, 20-22 second
    samples
  • Tested with 51 utterances (23 male, 28 female)
    of varying length

15
(No Transcript)
16
Observations
  • Slight improvement in accuracy with GMM (14 or 16
    MFCCs)
  • Misclassifications usually people with accents
    not represented in training speakers

17
ViaVoice Testing
  • Attempt to improve accuracy using
    gender-dependent models
  • Trained one male model and one female model
    (trained by specific people)

18
Experiments
  • Transcribed 24 radio clips with both male and
    female models as well as built-in ViaVoice
    speaker-independent model
  • Measure of accuracy:
    (# words correctly recognized / total # of words spoken) × 100

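
A minimal sketch of this accuracy measure is given below. The slide does not say how correctly recognized words were counted, so the alignment step here (longest common subsequence between the reference and the recognizer output, case-insensitive) is an assumption, as are the function names.

```python
def count_correct_words(reference, hypothesis):
    """Count words of the reference that the hypothesis reproduces in order.

    Uses a longest-common-subsequence alignment; this counting rule is an
    assumption, not something stated in the presentation.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    prev = [0] * (len(hyp) + 1)
    for r in ref:
        cur = [0]
        for j, h in enumerate(hyp, start=1):
            cur.append(prev[j - 1] + 1 if r == h else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def word_accuracy(words_correct, words_spoken):
    """Accuracy in percent: correctly recognized words over total words spoken."""
    if words_spoken <= 0:
        raise ValueError("words_spoken must be positive")
    return 100.0 * words_correct / words_spoken


# Usage on a made-up reference/transcription pair.
reference = "the cat sat on the mat"
hypothesis = "the cat sat in the hat"
correct = count_correct_words(reference, hypothesis)
print(word_accuracy(correct, len(reference.split())))
```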
19
(No Transcript)
20
(No Transcript)
21
Observations
  • Accuracy improvement by using same-gender model
    as opposed to opposite gender model 353
  • Average accuracy of generic model: 68.5%

22
Language Identification
  • Goal: automatically determine language spoken in
    order to select appropriate language model for
    transcription
  • Previous methods
  • HMM
  • GMM

23
Experiments
  • Hypothesized that English ViaVoice would
    recognize fewer words for clips of foreign
    languages
  • Compared words recognized per second to voiced
    regions per second
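
The two per-clip quantities compared above could be computed along these lines. This is a sketch under stated assumptions: the presentation does not describe how voiced regions were detected, so the code takes a per-frame voiced/unvoiced decision as given and counts each run of consecutive voiced frames as one region. The function names are hypothetical.

```python
def count_voiced_regions(voiced_flags):
    """Count runs of consecutive voiced frames in a sequence of booleans."""
    regions = 0
    previous = False
    for flag in voiced_flags:
        if flag and not previous:
            regions += 1
        previous = flag
    return regions


def clip_rates(transcript, voiced_flags, duration_sec):
    """Return (voiced regions per second, recognized words per second) for one clip."""
    if duration_sec <= 0:
        raise ValueError("duration_sec must be positive")
    words_per_sec = len(transcript.split()) / duration_sec
    voiced_per_sec = count_voiced_regions(voiced_flags) / duration_sec
    return voiced_per_sec, words_per_sec


# Usage with made-up frame decisions and a made-up transcription.
flags = [False, True, True, False, True, False, False, True]
print(clip_rates("hello world again today", flags, 2.0))
```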

24
Data Collection
  • Used World Radio Network, BBC, and NPR radio
    broadcasts
  • Recorded at 22 kHz, 16 bit mono
  • 13-30 second samples
  • 13 foreign languages, 17 different accents in
    English

25
Sample Transcriptions
  • English sample
  • Persian sample

26
(No Transcript)
27
English/Non-English Classifier
  • Use line (Voiced/Sec − Words/Sec) = 0 as
    classifier
  • Negative: English, Positive: Non-English
  • Tested on 82 clips (65 Non-English, 22 English)
  • Accuracy: 80.5%
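
One way to read this decision rule is as a sign test on the difference between the voiced-region rate and the recognized-word rate. The sketch below assumes exactly that reading; the handling of a score of exactly zero, the function names, and the sample values are assumptions for illustration and are not data from the experiments.

```python
def classify_clip(voiced_per_sec, words_per_sec):
    """Label a clip from the sign of (voiced regions/sec - words/sec).

    Positive scores are labelled non-English, negative scores English.
    A score of exactly zero is treated as English here (an assumption).
    """
    score = voiced_per_sec - words_per_sec
    return "non-english" if score > 0 else "english"


def accuracy_percent(clips):
    """Percentage of (voiced_per_sec, words_per_sec, label) clips classified correctly."""
    correct = sum(1 for v, w, label in clips if classify_clip(v, w) == label)
    return 100.0 * correct / len(clips)


# Usage on hypothetical clips: (voiced regions/sec, words/sec, true label).
clips = [
    (2.0, 2.5, "english"),
    (2.2, 0.8, "non-english"),
    (1.9, 1.5, "english"),
    (2.4, 1.0, "non-english"),
]
print(accuracy_percent(clips))
```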

28
Observations
  • A previous paper achieved 76.8% in an
    English/Japanese decision
  • To distinguish between multiple languages besides
    English/non-English, would need to perform
    similar procedure for all expected languages

29
Conclusions
  • Gender classification with GMM slightly more
    accurate than with pitch
  • ViaVoice model trained by person of same gender
    as speaker more accurate than one trained by
    person of opposite sex
  • Achieved 80% accuracy in English/non-English
    decision based on the number of words produced by
    English ViaVoice

30
Future Work
  • Possibility of accuracy improvement with
    speaker-independent male and female models
  • Need for accent consideration
  • Use of grammatical and semantic information to
    improve English/non-English decision

31
Acknowledgements
  • IIE Lab: Mingkun, Victor, Aiyesha, Dingguo,
    Shuo, Rishi
  • Dr. Sethi
  • Test Subjects
  • Sniffy Fluffy

32
Books
  • Kent, R. and Read, C. Acoustic Analysis of
    Speech, Second Edition. Thomson Learning,
    Albany, NY, 2002.
  • Deller, J., Proakis, J., and Hansen, J.
    Discrete-Time Processing of Speech Signals.
    Macmillan, New York, NY, 1993.
  • Rabiner, L. and Schafer, R. Digital Processing of
    Speech Signals. Prentice Hall, New Jersey, 1978.
  • Proakis, J. and Manolakis, D. Digital Signal
    Processing: Principles, Algorithms, and
    Applications. Prentice Hall, New Jersey, 1996.

33
Papers
  • 1. Parris, E. and Carey, M. Language Independent
    Gender Identification. Proc. ICASSP, 1996.
  • 2. Sethi, I. Scoring Voice-stream for Homeland
    Security. Oakland University, 2003.
  • 3. Abdulla, W. and Kasabov, N. Improving speech
    recognition performance through gender
    separation. Proceedings of the Fifth Biannual
    Conference on Artificial Neural Networks and
    Expert Systems (ANNES2001), pp. 218-222, 2001.
  • 4. Chen, T., Huang, C., Chang, E., and Wang, J.
    Automatic Accent Identification using Gaussian
    Mixture Models. Proceedings IEEE Automatic Speech
    Recognition and Understanding Workshop
    (ASRU2001), Italy, 2001.
  • 5. Vergin, R., Farhat, A., and O'Shaughnessy, D.
    Robust Gender-Dependent Acoustic-Phonetic
    Modeling in Continuous Speech Recognition Based
    On a New Automatic Male/Female Classification.
    Proc. ICASSP, 1996.

34
Papers
  • 6. Gu, Q. and Shibata, T. Speaker and Text
    Independent Language Identification Using
    Predictive Error Histogram. Proc. ICASSP, 2003.