Development of a Korean Large Vocabulary Continuous Speech Recognition Platform ECHOS

1
Development of a Korean Large Vocabulary
Continuous Speech Recognition Platform (ECHOS)
Oriental COCOSDA 2007, December 4-6, Hanoi,
Vietnam
  • December 5, 2007
  • Oh-Wook Kwon¹, Hoirin Kim², Sukbong Kwon²,
    Sungrack Yun³, Gyucheol Jang³, Yong-Rae Kim¹,
    Bong-Wan Kim⁴, Changdong Yoo³, Yong-Ju Lee⁴
  • ¹ Chungbuk National University, ² ICU, ³ KAIST,
    ⁴ SiTEC, Wonkwang University, Korea

2
Outline
  • 1. Introduction
  • 2. ECHOS
  • 3. Search Algorithm
  • 4. Performance Evaluation
  • 5. Conclusion

3
1. Introduction
  • Motivations
    • It is hard to learn the internals of conventional
      speech recognition platforms (HTK, Sphinx,
      Julius, ISIP)
    • Their source code lacks readability and
      reusability
    • It is hard to modify the source code to implement
      a new idea

Figure: a researcher facing HTK, ISIP, Julius, and Sphinx: "I have an idea, but how should I modify the source code for my idea?"
4
Contributions
  • Developed a speech recognition platform (ECHOS)
    for education and research purposes
    • Easy and compact
    • Object-oriented structure
    • Programmer's manual
    • Application Programming Interface (API)
  • Implemented FSN-based and statistical language
    model (LM)-based search algorithms
    • Lexical tree search
    • Two-pass search
  • Compared its performance with HTK and Julius

5
2. ECHOS
  • Easy: easy to understand, UML-based
    documentation, high-level API
  • Compact: Standard Template Library (STL)
  • Hangeul: Korean processing modules, automatic
    text-to-pronunciation conversion with
    morphological analysis
  • Object-oriented: modular structure, sample code
    for improved reusability
  • Speech recognizer: noise reduction modules,
    decoder only, HTK-compatible acoustic models
  • ECHOS: educational platform, research platform,
    baseline platform
6
Block Diagram
Figure: processing chain from input speech to output word sequence: speech detection, noise reduction, feature extraction, search, post-processing.
Other blocks in the diagram: search tree/search network, acoustic model, language model, dictionary, speaker adaptation, adaptation text.
7
Specifications
  • Input
    • 8/16 kHz, 16-bit PCM
    • Isolated word, continuous speech
    • Speech detection with continuous listening
      capability
  • Output
    • Recognition results: 1-best, N-best, lattice
    • Additional information: word likelihood, state
      segmentation information
  • Tasks
    • Isolated word recognition
    • Continuous speech recognition
      • Finite state network (FSN)
    • Large vocabulary continuous speech recognition
      (LVCSR)
      • Lexical tree
      • 30,000 words

8
Supported Algorithms (1)
9
Supported Algorithms (2)
10
Application Programming Interface
  • Two-level APIs for beginners and experts

11
Documentation
  • Manuals for beginners and experts
    • User's manual
    • Programmer's manual
  • Documentation based on Unified Modeling Language
    (UML)
    • Requirements: use case diagram
    • Design: package diagram, class diagram
    • Implementation: sequence diagram, state-chart
      diagram

12
Package Diagram and Sequence Diagram
Figure labels: Platform, Search module
13
Class Diagram
14
Programmer's Manual
  • Describes the details of the source code
    • Algorithms
    • Implementation: classes, member variables

15
3. Search Algorithm
  • Lexical tree search
    • Combining lexical tree with flat lexicon for
      single-phone words (Fig. a)
    • Incorporating the duration model to handle short
      words (Fig. b)

Fig. (a): lexical tree combined with a flat lexicon (single-phone words). Labels: leaf nodes of single-phone or short-phone size words, null nodes.
Fig. (b): word transitions with short duration are checked with duration models.
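The split between a lexical tree and a flat lexicon can be illustrated with a small sketch. This is not the ECHOS source code: the class `TreeNode`, the functions `build_lexicon` and `count_nodes`, and the romanized toy dictionary are all hypothetical, and the duration model of Fig. (b) is left out.

```python
from dataclasses import dataclass, field


@dataclass
class TreeNode:
    """One phone-labelled node of the lexical (prefix) tree."""
    phone: str
    children: dict = field(default_factory=dict)  # phone -> TreeNode
    words: list = field(default_factory=list)     # words ending at this node


def build_lexicon(pronunciations):
    """Split a pronunciation dictionary into a lexical tree and a flat lexicon.

    Single-phone words go to the flat lexicon; all other words share
    their phone prefixes in the tree.
    """
    root = TreeNode(phone="<root>")
    flat = {}  # word -> phone list, kept outside the tree
    for word, phones in pronunciations.items():
        if len(phones) == 1:
            flat[word] = list(phones)
            continue
        node = root
        for phone in phones:
            node = node.children.setdefault(phone, TreeNode(phone))
        node.words.append(word)
    return root, flat


def count_nodes(node):
    """Number of phone nodes below `node` (the root itself is not counted)."""
    return sum(1 + count_nodes(child) for child in node.children.values())


# Hypothetical toy dictionary, for illustration only.
toy = {
    "i": ["i"],
    "ga": ["g", "a"],
    "gam": ["g", "a", "m"],
    "gang": ["g", "a", "ng"],
    "na": ["n", "a"],
}
root, flat = build_lexicon(toy)
```

In this toy case the four multi-phone words contain 10 phones in total but occupy only 6 tree nodes, because "ga", "gam", and "gang" share the prefix "g a". That sharing is what lets a tree search evaluate fewer states than a flat lexicon.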
16
Search Algorithm
  • Two-pass search
    • Forward: bigram, word graph optimization
      (unfolding, boundary optimization, pruning,
      merging)
    • Backward: stack decoding with trigram

Word graph optimization
  • Unfolding into tree structure
  • Boundary optimization for removal of same word
    sequence
  • Pruning
  • Merging

Figure: two-pass search flow. Blocks: Viterbi beam search (bigram), back pointer table, 1-best back tracking, word graph generation, word graph, word graph optimization, stack decoding (trigram). Inputs: knowledge source 1, knowledge source 2. Outputs: 1-best or N-best results.
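The second pass can be sketched as a best-first search over a word graph, rescored with a trigram. This is a simplified illustration under several assumptions, not the ECHOS implementation: it runs forward rather than backward, uses no look-ahead heuristic, and the function `stack_decode`, the edge format, the toy graph, and all costs are hypothetical. Costs are negative log-probabilities, so lower is better.

```python
import heapq
from itertools import count


def stack_decode(edges, start, final, trigram_cost, n_best=1):
    """Best-first (stack) search over a word graph.

    edges: iterable of (from_node, to_node, word, acoustic_cost)
    trigram_cost(w1, w2, w3): cost of word w3 following w1 w2
    All costs must be non-negative so that complete hypotheses
    are popped in order of increasing total cost.
    """
    outgoing = {}
    for src, dst, word, cost in edges:
        outgoing.setdefault(src, []).append((dst, word, cost))

    tie = count()  # tie-breaker so the heap never compares word tuples
    stack = [(0.0, next(tie), start, ("<s>",))]
    results = []
    while stack and len(results) < n_best:
        cost, _, node, words = heapq.heappop(stack)
        if node == final:
            results.append((cost, list(words[1:])))  # drop the <s> marker
            continue
        for dst, word, acoustic in outgoing.get(node, []):
            w1 = words[-2] if len(words) >= 2 else "<s>"
            lm = trigram_cost(w1, words[-1], word)
            heapq.heappush(
                stack, (cost + acoustic + lm, next(tie), dst, words + (word,))
            )
    return results


# Hypothetical word graph and trigram table, for illustration only.
toy_edges = [
    (0, 1, "speech", 2.0),
    (0, 1, "peach", 1.5),
    (1, 2, "recognition", 3.0),
]
toy_trigrams = {
    ("<s>", "<s>", "speech"): 0.5,
    ("<s>", "<s>", "peach"): 2.0,
    ("<s>", "speech", "recognition"): 0.25,
    ("<s>", "peach", "recognition"): 3.0,
}


def toy_trigram_cost(w1, w2, w3):
    # Unseen trigrams get a fixed penalty instead of a real back-off.
    return toy_trigrams.get((w1, w2, w3), 5.0)


hypotheses = stack_decode(toy_edges, 0, 2, toy_trigram_cost, n_best=2)
```

In the toy graph the acoustic costs alone favor "peach recognition", while the trigram costs move "speech recognition" to the top of the list. Because each stack entry carries a full word history, continuing to pop complete hypotheses yields an N-best list in order of total cost.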
17
4. Performance Evaluation
  • Database
    • 8000-word CSR
    • SiTEC Dict01 Database
    • Test data: 1050 sentences (10 speakers)
  • Feature
    • MFCC, Δ-MFCC, delta energy
  • Acoustic and language models
    • Triphone
    • Bigram
  • Search
    • Lexical tree search
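
The delta features listed above are typically computed by linear regression over neighboring frames. The slides do not say which formula ECHOS uses, so the sketch below assumes the common regression form (the one HTK documents), with edge frames repeated; the function name `delta` and the window size are illustrative choices.

```python
def delta(frames, window=2):
    """Regression-based delta coefficients for a list of feature vectors.

    d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2),
    where the first and last frames are repeated at the edges.
    """
    denom = 2 * sum(n * n for n in range(1, window + 1))
    last = len(frames) - 1
    deltas = []
    for t in range(len(frames)):
        row = []
        for k in range(len(frames[t])):
            acc = 0.0
            for n in range(1, window + 1):
                ahead = frames[min(t + n, last)][k]   # clamp at the last frame
                behind = frames[max(t - n, 0)][k]     # clamp at the first frame
                acc += n * (ahead - behind)
            row.append(acc / denom)
        deltas.append(row)
    return deltas


# One-dimensional ramp as a toy input: the slope is 1 per frame.
ramp = [[0.0], [1.0], [2.0], [3.0], [4.0]]
ramp_deltas = delta(ramp)
```

For the ramp input the middle frame gets a delta of exactly 1.0, matching the slope, while frames near the edges get smaller values because of the repeated boundary frames. The same function applied to a per-frame energy track would give a delta energy feature.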

18
Evaluation Results (1)
  • Testing search algorithms
  • Flat lexicon: similar to HTK
  • Lexical tree: reduced recognition time by 50%,
    with a 40% relative increase in error rate
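
A relative increase is measured against the baseline error rate, not in absolute percentage points. The short sketch below shows the arithmetic; the function name and the two error rates are hypothetical values chosen for illustration, not results from the evaluation.

```python
def relative_increase(baseline, new):
    """Change of `new` relative to `baseline`, as a fraction of the baseline."""
    return (new - baseline) / baseline


# Hypothetical error rates in percent, for illustration only.
flat_error = 10.0
tree_error = 14.0
increase = relative_increase(flat_error, tree_error)
```

With these made-up numbers the error rate rises by 4 percentage points, which is a relative increase of 0.4, that is, 40%.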

19
Evaluation Results (2)
  • Performance of two-pass search
  • Comparison with Julius

20
5. Conclusion
  • Korean speech recognition platform for
    educational and research purposes
    • Signal processing: noise reduction
    • Feature extraction: MFCC, PLP, ETSI
    • Acoustic model: HMM
    • Language model: FSN, bigram, trigram
    • Search: FSN search, lexical tree search
    • Post-processing: lattice-based confidence
  • Recent activities
    • Distributed to 20 Korean universities
    • SiTEC Technical Seminars
  • Thanks a lot