Title: Development of a Korean Large Vocabulary Continuous Speech Recognition Platform ECHOS
Oriental COCOSDA 2007, December 4-6, Hanoi, Vietnam
Presented December 5, 2007
Oh-Wook Kwon (1), Hoirin Kim (2), Sukbong Kwon (2), Sungrack Yun (3), Gyucheol Jang (3), Yong-Rae Kim (1), Bong-Wan Kim (4), Changdong Yoo (3), Yong-Ju Lee (4)
(1) Chungbuk National University, (2) ICU, (3) KAIST, (4) SITEC, Wonkwang University, Korea
Outline
- 1. Introduction
- 2. ECHOS
- 3. Search Algorithm
- 4. Performance Evaluation
- 5. Conclusion
1. Introduction
- Motivations
- Hard to know the details of conventional speech recognition platforms (HTK, Sphinx, Julius, ISIP)
- The source code lacks readability and reusability
- Hard to modify the source code to implement a new idea
[Cartoon: "I have an idea, but how should I modify the source code (HTK, ISIP, Julius, Sphinx) for it?"]
Contributions
- Developed a speech recognition platform (ECHOS) for education and research purposes
- Easy and compact
- Object-oriented structure
- Programmer's manual
- Application Programming Interface (API)
- Implemented FSN-based and statistical language model (LM)-based search algorithms
- Lexical tree search
- Two-pass search
- Compared its performance with HTK and Julius
2. ECHOS
- Easy: easy-to-understand UML-based documentation, high-level API
- Compact: Standard Template Library (STL)
- Hangeul: Korean processing modules, automatic text-to-pronunciation conversion with morphological analysis
- Object-oriented: modular structure, sample code for improved reusability
- Speech recognizer: noise reduction modules, decoder only, HTK-compatible acoustic models
- ECHOS serves as an educational, research, and baseline platform
Block Diagram
[Block diagram: Speech → Speech detection → Noise reduction → Feature extraction → Search → Post-processing → Word sequence. The search module draws on a search tree/search network, acoustic model, language model, and dictionary; a speaker adaptation module uses adaptation text.]
Specifications
- Input
- 8/16 kHz, 16-bit PCM
- Isolated word, continuous speech
- Speech detection with continuous-listening capability
- Output
- Recognition results: 1-best, N-best, lattice
- Additional information: word likelihood, state segmentation information
- Tasks
- Isolated word recognition
- Continuous speech recognition: finite state network (FSN)
- Large vocabulary continuous speech recognition (LVCSR): lexical tree, 30,000 words
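For the FSN-based continuous-speech task above, a finite state network is essentially a word-labelled state graph that accepts exactly the word sequences the grammar allows. A minimal sketch follows; the `FSN` struct and the example grammar are invented for illustration and are not the ECHOS data structures:

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// A toy finite state network: arcs map (state, word) to a next state,
// and a word sequence is accepted if it ends in a final state.
struct FSN {
    std::map<std::pair<int, std::string>, int> arcs;  // (state, word) -> next state
    std::set<int> finals;                             // accepting states

    bool accepts(const std::vector<std::string>& words) const {
        int s = 0;  // state 0 is the start state by convention here
        for (const auto& w : words) {
            auto it = arcs.find({s, w});
            if (it == arcs.end()) return false;       // no arc: sequence rejected
            s = it->second;
        }
        return finals.count(s) > 0;
    }
};
```

For example, a grammar "call (john|mary)" would use arcs (0,"call")→1, (1,"john")→2, (1,"mary")→2 with final state 2; the decoder then only searches word sequences this network accepts.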
Supported Algorithms (1)
Supported Algorithms (2)
Application Programming Interface
- Two-level APIs for beginners and experts
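The slide gives no API details, but a two-level design typically pairs a one-call interface for beginners with stage-by-stage calls for experts. A hypothetical sketch of that pattern; every class and method name here is invented for illustration and is not the actual ECHOS API:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy recognizer illustrating a two-level API (names are hypothetical).
class ToyRecognizer {
public:
    // --- low-level (expert) API: drive each pipeline stage yourself ---
    std::vector<float> extractFeatures(const std::vector<short>& pcm) {
        std::vector<float> feat;
        for (short s : pcm) feat.push_back(static_cast<float>(s) / 32768.0f);
        return feat;
    }
    std::string search(const std::vector<float>& feat) {
        // stand-in for Viterbi search over acoustic and language models
        return feat.empty() ? "" : "hello world";
    }

    // --- high-level (beginner) API: one call that wraps the stages above ---
    std::string recognize(const std::vector<short>& pcm) {
        return search(extractFeatures(pcm));
    }
};
```

The high-level call gives the same result as chaining the low-level calls, but the low-level path lets an expert insert extra processing (e.g. noise reduction) between stages.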
Documentation
- Manuals for beginners and experts
- User's manual
- Programmer's manual
- Documentation based on the Unified Modeling Language (UML)
- Requirements: use case diagram
- Design: package diagram, class diagram
- Implementation: sequence diagram, state-chart diagram
Package Diagram / Sequence Diagram
[Figures: package diagram of the platform; sequence diagram of the search module]
Class Diagram
Programmer's Manual
- Describes the details of the source code
- Algorithms
- Implementation: classes, member variables
3. Search Algorithm
- Lexical tree search
- Combining the lexical tree with a flat lexicon for single-phone words (Fig. a)
- Incorporating a duration model to handle short words (Fig. b): word transitions with short duration are checked against duration models
[Figures: (a) leaf nodes of the lexical tree connect through null nodes to a flat lexicon of single-phone words; (b) duration models check word transitions for single-phone or short words]
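The lexical-tree/flat-lexicon split can be sketched as a phone-prefix trie for multi-phone words plus a flat word list for single-phone words. A minimal illustration; the phone strings and data layout are assumptions, not the ECHOS implementation:

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// One node of the lexical tree: children indexed by phone label.
struct TreeNode {
    std::map<std::string, std::unique_ptr<TreeNode>> children;
    std::string word;  // non-empty at nodes where a word ends
};

struct Lexicon {
    TreeNode root;                  // lexical tree (multi-phone words)
    std::vector<std::string> flat;  // flat lexicon (single-phone words)
    int treeNodes = 0;              // arcs created, to show prefix sharing

    void add(const std::string& word, const std::vector<std::string>& phones) {
        if (phones.size() == 1) {   // single-phone word: keep in flat lexicon
            flat.push_back(word);
            return;
        }
        TreeNode* n = &root;
        for (const auto& p : phones) {
            auto& child = n->children[p];
            if (!child) { child.reset(new TreeNode); ++treeNodes; }
            n = child.get();        // shared prefixes reuse existing nodes
        }
        n->word = word;
    }
};
```

For instance, "speech" (s p iy ch) and "speed" (s p iy d) share the prefix s-p-iy, so the tree needs 5 nodes instead of 8, which is what makes tree search cheaper than a flat lexicon.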
Search Algorithm (continued)
- Two-pass search
- Forward: Viterbi beam search with a bigram (knowledge source 1), then word graph generation and optimization (unfolding into a tree structure, boundary optimization to remove identical word sequences, pruning, merging)
- Backward: stack decoding with a trigram (knowledge source 2)
[Flow: Viterbi beam search (bigram) → back pointer table → 1-best backtracking → 1-best or N-best results; back pointer table → word graph generation → word graph optimization → word graph → stack decoding (trigram) → 1-best or N-best results]
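The backward pass can be sketched as best-first "stack decoding" over the word graph left by the forward pass, adding a trigram score to each extension. A toy version; the graph, the scores, and the trigram function are all invented for illustration:

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <string>
#include <vector>

// One word-graph arc carrying the first-pass (bigram) score.
struct Arc { std::string word; int to; double score; };

// Toy trigram model: a bonus when the last three words match a known triple.
double trigram(const std::vector<std::string>& h) {
    if (h.size() >= 3 && h[h.size() - 3] == "<s>" &&
        h[h.size() - 2] == "this" && h.back() == "is") return 1.0;
    return 0.0;
}

struct Hyp {
    double score;
    int node;
    std::vector<std::string> words;
    bool operator<(const Hyp& o) const { return score < o.score; }  // max-heap order
};

std::vector<std::string> stackDecode(const std::map<int, std::vector<Arc>>& graph,
                                     int start, int goal) {
    std::priority_queue<Hyp> stack;          // the "stack": partial hypotheses, best first
    stack.push({0.0, start, {"<s>"}});
    Hyp best{-1e18, -1, {}};
    while (!stack.empty()) {
        Hyp h = stack.top(); stack.pop();    // expand the best partial hypothesis
        if (h.node == goal) {                // complete hypothesis: keep the best one
            if (h.score > best.score) best = h;
            continue;
        }
        auto it = graph.find(h.node);
        if (it == graph.end()) continue;
        for (const Arc& a : it->second) {
            Hyp n = h;
            n.node = a.to;
            n.words.push_back(a.word);
            n.score += a.score + trigram(n.words);  // first-pass score + trigram rescore
            stack.push(n);
        }
    }
    return best.words;  // exhaustive for this toy; a real decoder prunes with A*-style estimates
}
```

On a graph where the bigram scores alone prefer one path, the trigram bonus can flip the decision, which is exactly the point of rescoring in the second pass.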
4. Performance Evaluation
- Database
- 8,000-word CSR task
- SiTEC Dict01 database
- Test data: 1,050 sentences (10 speakers)
- Features
- MFCC, Δ-MFCC, delta energy
- Acoustic and language models
- Triphone acoustic models
- Bigram language model
- Search
- Lexical tree search
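Δ-MFCC features are commonly computed with the standard regression formula d_t = Σ_{n=1..N} n·(c_{t+n} − c_{t−n}) / (2·Σ n²). The slide does not state the window size or edge handling, so N = 2 and index clamping below are assumptions following common practice (e.g. HTK):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Delta coefficients over one cepstral dimension, regression window N.
std::vector<double> deltas(const std::vector<double>& c, int N = 2) {
    int T = static_cast<int>(c.size());
    double denom = 0.0;
    for (int n = 1; n <= N; ++n) denom += 2.0 * n * n;   // 2 * sum of n^2
    std::vector<double> d(T, 0.0);
    for (int t = 0; t < T; ++t) {
        double num = 0.0;
        for (int n = 1; n <= N; ++n) {
            int hi = std::min(t + n, T - 1);             // clamp at the edges
            int lo = std::max(t - n, 0);
            num += n * (c[hi] - c[lo]);
        }
        d[t] = num / denom;
    }
    return d;
}
```

On a unit ramp (0, 1, 2, ...) the interior deltas come out to exactly 1, which is a quick sanity check: the delta is the local slope of the cepstral trajectory.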
Evaluation Results (1)
- Testing search algorithms
- Flat lexicon: performance similar to HTK
- Lexical tree: recognition time reduced by 50% at the cost of a 40% relative increase in error rate
Evaluation Results (2)
- Performance of two-pass search
- Comparison with Julius
5. Conclusion
- Korean speech recognition platform for educational and research purposes
- Signal processing: noise reduction
- Feature extraction: MFCC, PLP, ETSI
- Acoustic model: HMM
- Language model: FSN, bigram, trigram
- Search: FSN search, lexical tree search
- Post-processing: lattice-based confidence scoring
- Recent activities
- Distributed to 20 Korean universities
- SiTEC technical seminars
- Thank you