Author: Panikos Heracleous - PowerPoint PPT Presentation

1
AN EFFICIENT KEYWORD SPOTTING TECHNIQUE USING A
COMPLEMENTARY LANGUAGE FOR FILLER MODELS TRAINING
  • Authors: Panikos Heracleous, Tohru Shimizu

Reporter: Chen, Tzan Hwei
2
Reference
  • P. Heracleous, T. Shimizu, "An Efficient Keyword
    Spotting Technique Using a Complementary Language
    for Filler Models Training," Eurospeech 2003.
  • R. C. Rose, D. B. Paul, "A Hidden Markov Model
    Based Keyword Recognition System," ICASSP 1990.

3
Outline
  • Introduction
  • Proposed system
  • Definitions of evaluation measures
  • Experiments
  • Conclusions

4
Introduction
  • The task of keyword spotting is to detect a given
    set of keywords (one or more) in continuous speech.
  • In a keyword spotter, not only the keywords but
    also the non-keyword speech and noise components
    must be explicitly modeled.
  • A set of HMMs (garbage or filler models) is chosen
    to represent the non-keyword intervals.

5
Introduction (cont)
  • The choice of an appropriate garbage model set is
    a critical issue.
  • The most common approaches are as follows:
      • Approach 1: the training corpus for a specific
        task is split into keyword and non-keyword
        (extraneous) data. The main disadvantage of
        such methods is task dependency.
      • Approach 2: the garbage models are selected
        from a set of common acoustic models. The main
        disadvantage of such methods is a high rate of
        false rejections.

6
Proposed system
  • We propose a novel method for modeling the
    non-keyword intervals based on the use of
    bilingual hidden Markov models.
  • The goals are to develop a task-independent
    keyword spotter and to overcome the problem of
    overlapping contexts.

7
Proposed system (cont)
  • The main requirement of our approach is acoustic
    similarity between the target and garbage
    languages.
  • We compared the two languages based on the
    International Phonetic Alphabet (IPA).
  • The IPA, developed by the International Phonetic
    Association, is a set of symbols that represents
    the sounds of spoken language in written form.

8
Proposed system (cont)
  • Based on the IPA comparison, American English
    acoustically covers the sounds of Japanese well.
  • The English HMM garbage models, trained on a
    large speech corpus that is guaranteed to contain
    no keywords, are expected to represent the
    non-keyword intervals without rejecting true
    keyword hits.

9
Proposed system (cont)
Fig. 1. Block diagram of the system
10
Proposed system (cont)
  • The background network is composed of garbage
    models connected to form syllables, following the
    syllable structure of Japanese.
  • Using the background network, we can account for
    the temporal variability of the keyword scores.
  • As a result, the decision separating true keyword
    hits from false alarms is more reliable.
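The slides do not list the garbage-model inventory or the exact syllable grammar. As a minimal sketch, assuming hypothetical phone labels and a simplified Japanese syllable structure of V, CV, or a standalone moraic nasal (`N`), the background network can be viewed as a loop over syllables, and a label sequence is admissible only if it splits completely into such syllables:

```python
from itertools import product

# Hypothetical garbage-model labels (illustrative only, NOT the paper's 28-model set).
CONSONANTS = ["k", "s", "t", "n", "h", "m", "r", "g", "z", "d", "b", "p"]
VOWELS = ["a", "i", "u", "e", "o"]
MORAIC_NASAL = "N"  # assumed label for the Japanese syllable-final nasal


def build_syllable_network(consonants, vowels, nasal):
    """Return the syllables that form the loop entries of the background network.

    Each syllable is a tuple of garbage-model labels: V, CV, or the nasal alone
    (a simplified, assumed Japanese syllable structure).
    """
    syllables = [(v,) for v in vowels]
    syllables += [(c, v) for c, v in product(consonants, vowels)]
    syllables.append((nasal,))
    return syllables


def parses_as_syllables(labels, syllables):
    """Check whether a label sequence is a concatenation of network syllables.

    reachable[i] is True when labels[:i] can be split into syllables, which is
    exactly the set of paths a syllable loop admits.
    """
    syll_set = set(syllables)
    max_len = max(len(s) for s in syllables)
    n = len(labels)
    reachable = [False] * (n + 1)
    reachable[0] = True
    for i in range(n):
        if not reachable[i]:
            continue
        for k in range(1, max_len + 1):
            if i + k <= n and tuple(labels[i:i + k]) in syll_set:
                reachable[i + k] = True
    return reachable[n]


SYLLABLES = build_syllable_network(CONSONANTS, VOWELS, MORAIC_NASAL)

if __name__ == "__main__":
    print(len(SYLLABLES))  # 66
    print(parses_as_syllables(["k", "a", "N", "s", "a", "i"], SYLLABLES))  # True
    print(parses_as_syllables(["k", "s", "a"], SYLLABLES))  # False
```

A real decoder would compile these syllables into an HMM network searched by Viterbi rather than parse label strings; the parse only illustrates which garbage-model sequences the syllable loop allows.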

11
Proposed system (cont)
Fig. 2. Histogram of log likelihood ratio scores
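The slides show the score histogram but not the scoring formula. A minimal sketch, assuming the common form in which the keyword log likelihood is compared with the background-network log likelihood over the same segment and normalized by the segment length in frames; the function names and the threshold value are hypothetical:

```python
def log_likelihood_ratio(keyword_loglik: float, background_loglik: float, num_frames: int) -> float:
    """Duration-normalized LLR between the keyword path and the background path
    over the same speech segment (assumed scoring form, not taken from the slides)."""
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    return (keyword_loglik - background_loglik) / num_frames


def accept_hit(keyword_loglik: float, background_loglik: float, num_frames: int, threshold: float) -> bool:
    """Accept a putative keyword hit if its normalized LLR reaches the threshold."""
    return log_likelihood_ratio(keyword_loglik, background_loglik, num_frames) >= threshold


if __name__ == "__main__":
    THRESHOLD = 0.5  # hypothetical operating point
    print(log_likelihood_ratio(-1200.0, -1260.0, 60))   # 1.0
    print(accept_hit(-1200.0, -1260.0, 60, THRESHOLD))  # True  (likely true hit)
    print(accept_hit(-1300.0, -1270.0, 60, THRESHOLD))  # False (likely false alarm)
```

Subtracting the background score computed over the same interval is intended to cancel segment-specific effects, so that a single threshold can separate the two score distributions in a histogram such as Fig. 2.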
12
Definitions of evaluation measures
  • Recognition Rate (RCR): the percentage of
    keywords detected.
  • Rejection Rate (RJR): the percentage of
    non-keywords rejected.
  • Equal Rate (ER): the operating point at which RCR
    and RJR are equal.
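The three measures can be computed by sweeping a decision threshold over utterance scores. A minimal sketch, assuming one score per utterance (for example, the LLR of the best keyword hypothesis) and ignoring whether the detected keyword identity is correct; the toy scores below are invented for illustration and are not results from the paper:

```python
from typing import List, Tuple

Trial = Tuple[float, bool]  # (utterance score, True if the utterance contains a keyword)


def rates(trials: List[Trial], threshold: float) -> Tuple[float, float]:
    """Return (RCR, RJR) in percent at one decision threshold.

    A keyword utterance counts as detected if its score >= threshold;
    a non-keyword utterance counts as rejected if its score < threshold.
    """
    kw_scores = [s for s, has_kw in trials if has_kw]
    non_scores = [s for s, has_kw in trials if not has_kw]
    if not kw_scores or not non_scores:
        raise ValueError("need both keyword and non-keyword trials")
    rcr = 100.0 * sum(s >= threshold for s in kw_scores) / len(kw_scores)
    rjr = 100.0 * sum(s < threshold for s in non_scores) / len(non_scores)
    return rcr, rjr


def equal_rate(trials: List[Trial]) -> Tuple[float, float]:
    """Sweep thresholds over the observed scores; return (threshold, ER) at the
    point where RCR and RJR are closest, with ER taken as their mean."""
    best = None
    for t in sorted({s for s, _ in trials}):
        rcr, rjr = rates(trials, t)
        gap = abs(rcr - rjr)
        if best is None or gap < best[0]:
            best = (gap, t, (rcr + rjr) / 2.0)
    return best[1], best[2]


# Invented toy scores (not data from the paper).
TRIALS: List[Trial] = [
    (2.0, True), (1.5, True), (0.8, True), (0.3, True),
    (1.0, False), (0.2, False), (-0.5, False), (-1.0, False),
]

if __name__ == "__main__":
    print(rates(TRIALS, 0.8))   # (75.0, 75.0)
    print(equal_rate(TRIALS))   # (0.8, 75.0)
```

Raising the threshold trades recognition rate for rejection rate; the equal rate summarizes that trade-off in a single number.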

13
Experiments
  • The Japanese keywords are represented by
    gender-dependent, context-dependent HMMs.
  • The feature vectors have 38 dimensions: 12 MFCCs,
    12 delta-MFCCs, 12 delta-delta-MFCCs,
    delta-energy, and delta-delta-energy.
  • A set of 28 context-independent, 3-state, single
    Gaussian HMMs trained on the same speech corpus is
    used as the Japanese garbage models for
    comparison.
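The 38-dimensional vector is 12 + 12 + 12 + 1 + 1 coefficients, with no static energy term. A minimal sketch of assembling it, assuming the MFCCs and log energy are already computed per frame and assuming an HTK-style regression formula with a window of 2 frames for the deltas (the slides specify neither); it uses NumPy:

```python
import numpy as np


def deltas(features: np.ndarray, window: int = 2) -> np.ndarray:
    """Regression-based delta coefficients along time for a (frames, dims) array:
    d_t = sum_n n * (c_{t+n} - c_{t-n}) / (2 * sum_n n^2), edges padded by repetition."""
    num_frames = features.shape[0]
    padded = np.pad(features, ((window, window), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, window + 1))
    out = np.zeros_like(features, dtype=float)
    for n in range(1, window + 1):
        ahead = padded[window + n: window + n + num_frames]   # frames t + n
        behind = padded[window - n: window - n + num_frames]  # frames t - n
        out += n * (ahead - behind)
    return out / denom


def build_38dim(mfcc: np.ndarray, energy: np.ndarray, window: int = 2) -> np.ndarray:
    """Stack 12 MFCC + 12 dMFCC + 12 ddMFCC + dE + ddE into (frames, 38)."""
    d_mfcc = deltas(mfcc, window)
    dd_mfcc = deltas(d_mfcc, window)
    e = energy.reshape(-1, 1)
    d_e = deltas(e, window)
    dd_e = deltas(d_e, window)
    return np.hstack([mfcc, d_mfcc, dd_mfcc, d_e, dd_e])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mfcc = rng.standard_normal((100, 12))  # placeholder features, not real speech
    energy = rng.standard_normal(100)
    print(build_38dim(mfcc, energy).shape)  # (100, 38)
```

Dropping static energy while keeping its derivatives is a common choice for telephone speech, since absolute level varies with the channel; whether that was the authors' reason is not stated in the slides.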

14
Experiments (cont)
  • The English garbage models are represented by
    context-independent, 3-state, single Gaussian HMMs.
  • Twenty-eight models trained on the MACROPHONE
    American English telephone speech corpus are
    used.
  • At most one keyword is allowed per utterance.
  • The vocabulary consists of 100 keywords.
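Both garbage model sets use context-independent, 3-state, single Gaussian HMMs. A minimal sketch of scoring a segment with one such model, assuming a left-to-right topology without skips, diagonal covariances, and a hypothetical self-loop probability of 0.6; the exit transition out of the last state is ignored for simplicity:

```python
import numpy as np


def left_to_right_log_trans(num_states: int = 3, self_loop: float = 0.6) -> np.ndarray:
    """Log transition matrix of a left-to-right HMM without skips (assumed topology)."""
    trans = np.zeros((num_states, num_states))
    for s in range(num_states - 1):
        trans[s, s] = self_loop
        trans[s, s + 1] = 1.0 - self_loop
    trans[-1, -1] = 1.0  # last state loops; exit probability ignored in this sketch
    with np.errstate(divide="ignore"):
        return np.log(trans)  # forbidden transitions become -inf


def diag_gaussian_logpdf(x: np.ndarray, mean: np.ndarray, var: np.ndarray) -> np.ndarray:
    """Per-frame log density of one diagonal-covariance Gaussian; x is (T, D)."""
    return -0.5 * (np.sum(np.log(2.0 * np.pi * var)) + np.sum((x - mean) ** 2 / var, axis=1))


def viterbi_loglik(x, means, variances, log_trans) -> float:
    """Best-path log likelihood of frames x (T, D), starting in the first state
    and ending in the last one; means and variances are (S, D)."""
    num_frames, num_states = x.shape[0], means.shape[0]
    log_b = np.stack(
        [diag_gaussian_logpdf(x, means[s], variances[s]) for s in range(num_states)], axis=1
    )  # (T, S) emission log likelihoods
    delta = np.full(num_states, -np.inf)
    delta[0] = log_b[0, 0]
    for t in range(1, num_frames):
        # delta[i] + log a_ij, maximized over the predecessor state i
        delta = np.max(delta[:, None] + log_trans, axis=0) + log_b[t]
    return float(delta[-1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim = 38  # matches the feature vector size on the previous slide
    means, variances = np.zeros((3, dim)), np.ones((3, dim))  # untrained placeholder model
    segment = rng.standard_normal((20, dim))
    print(viterbi_loglik(segment, means, variances, left_to_right_log_trans()))
```

In a full spotter, the 28 garbage models would be connected into the background network and searched jointly; scoring one model in isolation only shows what each HMM contributes.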

16
Experiments (cont)
  • The test set consists of Japanese telephone
    speech and contains 1,548 short utterances, of
    which only 1,133 contain a keyword.
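Assuming the recognition and rejection rates are computed at the utterance level, these counts fix the denominators of the two measures:

```latex
\begin{aligned}
N_{\text{kw}} &= 1133, \qquad N_{\text{non}} = 1548 - 1133 = 415,\\
\mathrm{RCR} &= 100 \cdot \frac{\#\,\text{keyword utterances detected}}{N_{\text{kw}}},\\
\mathrm{RJR} &= 100 \cdot \frac{\#\,\text{non-keyword utterances rejected}}{N_{\text{non}}}.
\end{aligned}
```

Since 1133 / 1548 is about 73.2%, most test utterances contain a keyword, and each rejected non-keyword utterance moves RJR by about 0.24 percentage points.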

Fig. 3. Recognition rates (left) and rejection
rates (right) using clean test data (first pass)
17
Experiments (cont)
Fig. 4. Performance using English garbage models
(clean test data) (2nd pass)
Fig. 5. Performance using Japanese garbage models
(clean test data) (2nd pass)
18
Experiments (cont)
Fig. 6. Recognition rates using noisy test data
(first pass)
Fig. 7. Rejection rates using noisy test data
(first pass)
19
Conclusions
  • The main advantages of this method are its task
    independence and its insensitivity to parameter
    tuning: parameters such as the word insertion
    penalty do not seriously affect performance.
  • In a future study, we plan to evaluate our method
    using larger vocabularies.

20
Utterance verification evaluation
21
  • True Rejection Rate (TRR)
  • False Rejection Rate (FRR)