Brief Overview of Different Versions of Sphinx

Slides: 30

Provided by: Arthu61

Learn more at: http://www.cs.cmu.edu

1
Brief Overview of Different Versions of Sphinx

Arthur Chan

2
Introduction

The software aspect of the recognizer is very important
Research always requires correct use of the software.
Sphinx II, III, IV and SphinxTrain
100k lines of code
Each of them is fairly complex

3
This presentation (30 pages)

Introduction (3 pages)
History of Sphinx (13 pages)
Sphinx I (2 pages)
Sphinx II (2 pages)
Sphinx III (3 pages)
SphinxTrain (3 pages)
Sphinx IV (3 pages)
How do I get the source code? (4 pages)
Versioning
Three rules for not getting lost among the different recognizers
Where can I get official information? (2 pages)
Outlook for each recognizer (3 pages)
Conclusion

4
Brief history of Sphinx

Largely adapted from
Rita's "The Sphinx Speech Recognition Systems"
www.cs.cmu.edu/~rsingh/
Kevin et al.'s "Speech Recognition Past, Present and Future"
www.cs.cmu.edu/~msiegler/ASR/futureofcmu-final.html

5
Before Sphinx

Dragon
One of the first uses of HMMs in speech recognition
One of the first uses of a purely statistical model in speech
Expresses the knowledge using an HMM network
Harpy
One of the first uses of beam search
Uses phonemes to represent words.

6
Sphinx I

Before Sphinx ...
From AT&T's literature, the concept of speaker independence was proposed in 1979
In 1979-1987, most systems were either
Speaker dependent, or
Speaker independent but in a very small domain (<100 words)
Sphinx I was therefore outstanding
Accuracy was 90% on Resource Management

7
Sphinx I (1987)

By Kai-Fu Lee and Roberto Bisiani
Key developers included Hsiao-Wuen Hon and Fil Alleva
Written in C.
Continuous speech recognizer using discrete HMMs with 3 codebooks of size 256.
Uses a simple word-pair grammar
Generalized triphones
Real-time on a Sun 3 or DEC 3000
Where is the source code? Good antique!
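
To make "discrete HMM with 3 codebooks of size 256" concrete, here is a minimal sketch of the vector-quantization step that turns each frame into three discrete symbols. It is not Sphinx I code: the stream dimension, the random codebooks and the function names are all assumptions made for illustration.

```python
import random

def nearest_codeword(vector, codebook):
    """Return the index of the codeword closest to `vector` (squared Euclidean distance)."""
    best_index, best_dist = 0, float("inf")
    for index, codeword in enumerate(codebook):
        dist = sum((v - c) ** 2 for v, c in zip(vector, codeword))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index

def quantize_frame(streams, codebooks):
    """Map one frame (one vector per feature stream) to one symbol per codebook."""
    return tuple(nearest_codeword(vec, cb) for vec, cb in zip(streams, codebooks))

# Hypothetical setup: 3 feature streams, 256 random codewords each, 4 dimensions per stream.
rng = random.Random(0)
CODEBOOK_SIZE, NUM_CODEBOOKS, DIM = 256, 3, 4
codebooks = [
    [[rng.uniform(-1.0, 1.0) for _ in range(DIM)] for _ in range(CODEBOOK_SIZE)]
    for _ in range(NUM_CODEBOOKS)
]

# A frame built from exact codewords quantizes back to those codeword indices.
frame = [codebooks[0][17], codebooks[1][200], codebooks[2][5]]
symbols = quantize_frame(frame, codebooks)
```

A discrete HMM then models the probability of each symbol in each state directly, instead of evaluating a continuous density on the raw feature vector.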

8
Sphinx II (1992)

By Xuedong Huang
Hardwired to 5-state Bakis topology
3-gram language models
Decision-tree tying of HMMs (by Mei-Yuh Hwang)
90% on the WSJ task (0 or 1?)
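
As a rough illustration of what a 5-state Bakis topology means, the sketch below builds a generic left-to-right transition matrix in which each state may loop, advance by one, or skip one state. The exact arcs and probabilities hardwired into Sphinx II are not given in these slides, so the uniform probabilities, the `max_skip` parameter and the omission of an exit state are assumptions.

```python
def bakis_transitions(num_states=5, max_skip=2):
    """Build a left-to-right transition matrix.

    State i may move to states i .. i + max_skip (self-loop, next, skip),
    clipped at the last state. Allowed arcs get uniform probability here;
    a trained model would estimate them from data.
    """
    matrix = []
    for i in range(num_states):
        last_target = min(i + max_skip, num_states - 1)
        targets = range(i, last_target + 1)
        prob = 1.0 / len(targets)
        matrix.append([prob if j in targets else 0.0 for j in range(num_states)])
    return matrix

transitions = bakis_transitions()
for row in transitions:
    print(" ".join(f"{p:.2f}" for p in row))
```

The zeros below the diagonal are the defining property: the model can never move backwards in time.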

9
Fast Beam Search v. X

FBS-6: flat lexicon decoder
FBS-7: lexicon tree-based.
FBS-8 decoder (written by Ravi Mosur, see his thesis in '96)
Supports multiple types of beam pruning.
Lexical tree
Tricks in GMM computation
Machine optimization: loop unrolling
Predictive codebook computation
Phoneme lookahead
Best path search
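
The beam pruning mentioned above can be sketched in a few lines. This is a generic illustration, not the FBS-8 implementation: the hypothesis names, the log scores and the two beam widths are invented, and a real decoder applies such thresholds inside its Viterbi search at every frame.

```python
def beam_prune(hypotheses, beam_width):
    """Keep hypotheses whose log score is within `beam_width` of the best one.

    `hypotheses` maps a hypothesis id (e.g. an HMM state) to its log score.
    """
    if not hypotheses:
        return {}
    threshold = max(hypotheses.values()) - beam_width
    return {hyp: score for hyp, score in hypotheses.items() if score >= threshold}

# Hypothetical active list at one frame (log scores, higher is better).
active = {"state_a": -10.0, "state_b": -12.5, "state_c": -40.0}

# "Multiple types" of beam can be modelled as different widths applied to
# different hypothesis sets, e.g. a wide beam for states, a narrower one for word exits.
surviving_states = beam_prune(active, beam_width=5.0)
surviving_word_exits = beam_prune(active, beam_width=2.0)
```

A wider beam keeps more hypotheses alive (slower, fewer search errors); a narrower beam is faster but risks pruning the eventual best path.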

10
Other facts about Sphinx II

We licensed it at the beginning (this seems to date back to around '95)
In 2000, it was open-sourced on SourceForge under a Berkeley-style license
You can incorporate Sphinx's source code
You don't need to open your source code. (No recursive legal binding)
Similar to LGPL
In 2001, a major alpha release by Kevin ensured portability across several platforms.

11
Sphinx III flat lexicon decoder (s3, s3flat, s3slow)

Sphinx III (by Ravi Mosur)
Flat lexicon
Supports both CHMM and SCHMM
Poor-man's trigram
Uses only the most likely first word; this avoids the D2 expansion of the word lattice.
Arbitrary topology
Very accurate; used in evaluations of BN and others.
Derivatives of the search include
N-best generator
Aligner
Phone recognizer

12
Sphinx III tree lexicon decoder (s3.x, s3fast, s3inaccurate)

What is s3.x actually?
A spin-off of the Sphinx III flat lexicon's source code
First used in the BN 10x RT evaluation in 1999
From s3.0 -> s3.2
Uses a tree lexicon with unigram lookahead
Lexical tree with approximation to avoid memory problems
One of the first in the world to use sub-vector quantization to speed up GMM computation
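
A rough sketch of the idea behind sub-vector quantization for GMM speed-up: split the feature vector into sub-vectors, quantize each against a small codebook, and replace per-Gaussian density evaluation with lookups in precomputed tables. This is a generic illustration assuming diagonal-covariance Gaussians, not the s3.x code; the sub-vector layout, codebooks and Gaussians below are made up.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def nearest(vec, codebook):
    """Index of the codeword closest to vec (squared Euclidean distance)."""
    dists = [sum((a - b) ** 2 for a, b in zip(vec, cw)) for cw in codebook]
    return dists.index(min(dists))

def build_tables(gaussians, slices, sub_codebooks):
    """tables[g][s][k]: log density of codeword k of sub-vector s under Gaussian g."""
    return [
        [
            [log_gauss_diag(cw, mean[lo:hi], var[lo:hi]) for cw in codebook]
            for (lo, hi), codebook in zip(slices, sub_codebooks)
        ]
        for mean, var in gaussians
    ]

def approx_scores(x, slices, sub_codebooks, tables):
    """Approximate every Gaussian's log density with one table lookup per sub-vector."""
    codes = [nearest(x[lo:hi], cb) for (lo, hi), cb in zip(slices, sub_codebooks)]
    return [sum(per_gauss[s][k] for s, k in enumerate(codes)) for per_gauss in tables]

# Hypothetical 4-dimensional features split into two 2-dimensional sub-vectors.
slices = [(0, 2), (2, 4)]
sub_codebooks = [[[0.0, 0.0], [1.0, 1.0]], [[0.0, 0.0], [2.0, 2.0]]]
gaussians = [
    ([0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]),
    ([1.0, 1.0, 2.0, 2.0], [1.0, 1.0, 1.0, 1.0]),
]
tables = build_tables(gaussians, slices, sub_codebooks)
x = [1.0, 1.0, 2.0, 2.0]
scores = approx_scores(x, slices, sub_codebooks, tables)
```

The tables are built once; at decode time each frame costs one nearest-codeword search per sub-vector plus additions. In general the scores are approximations, so one plausible use is to shortlist the Gaussians that deserve exact evaluation.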

13
(cont.)

From s3.2 -> s3.3 (Rita, Ricky)
Live mode recognizer (livedecode) and simulator (livepretend)
From s3.3 -> s3.4 (Evandro, Arthur C, Jahanzeb)
4 levels of speed-up of GMM computation, phoneme lookahead
Bug fixes in live mode
From s3.4 -> s3.5 (Evandro, Arthur C, Yitao)
(Tentative) Speaker adaptation documentation

14
Facts about S3

A Java version exists -> sphinx3j
Open-sourced in 2002
Maintained by Evandro from 2001 to now.
s3.5 is the current active branch in S3 development.

15
SphinxTrain

Equally important and very complex
But not well understood.
What is SphinxTrain?
A collection of 40 tools for Sphinx 2, 3 and 4
acoustic model training
A set of Perl scripts to do training
Sphinx 2 and 3 have slightly different model formats

16
Mini-history

The Baum-Welch trainer and Viterbi trainer existed a very long time ago.
Training tools in general were not systematic and were not structured.
From the chaos, Eric Thayer first pulled everything together to create the SphinxTrain package
Rita did numerous bug fixes and modifications of the current trainer
Innovated the use of automatic question generation (make_quest)
Built a set of training scripts for RM (the 0/ scripts)
Wrote the first set of systematic tutorials on training
Ricky refined the code and wrote the first set of Perl scripts for training.
He made a PHD out of it too. (PHD = Push Here Dummy!)
Alan and Kevin
Put the code on SourceForge
Alan built a set of training scripts that can run through

17
Sphinx IV

Why Sphinx IV?
Too many limitations in SphinxTrain and Sphinx III
Only N-gram
Approximation of triphones
Fast GMM computation can be very troublesome to understand
bw doesn't skip silence. We rely heavily on forced alignment in training.

18
Sphinx IV (cont.)

(By no means complete)
Lead design: Bhiksha (MERL)
Lead team developer: Willie Walker (Sun)
Key developers: Evandro, Rita, Phillip Kwok and Paul Lamere
Many heavyweight speech advisors: Evandro, Rita, Ravi, Bhiksha, Pedro Moreno

19
Is Sphinx IV good?

Very accurate, very fast, very versatile and very nicely packaged Java-based speech recognizer
Some internal benchmarks on RM and WSJ 5k show it to be faster and more accurate than s3.3 (under 1xRT and 10% better)
Supports N-gram, FSM and FSG.
Will provide facilities like confidence scoring
Still under development (it has just had its first alpha release)
Trainer is not stable

20
Summary of the recognizers and trainers

Sphinx I -> obsolete
Sphinx II -> we are using the fast recognizer now
Sphinx III: the following coexist
S3 flat
S3 fast (s3.4 stable, s3.5 devel)
SphinxTrain (0.92 in the CVS)
Sphinx IV
Recognizer is alpha released
Trainer not yet stable

21
How can I get version X of Sphinx?

Official web page of Sphinx
http://cmusphinx.sourceforge.net
Gives announcements and news of development
Some documentation is there.
For the tarballs
http://sourceforge.net/projects/cmusphinx
Releases
sphinx2-0.4.tgz (s2)
sphinx3-0.1.tgz (s3.3)
sphinx3-0.4-rc2.tgz (s3.4 release candidate II)
sphinx4-0.1alpha-src.zip (s4)

22
Rule 2: If it doesn't exist in CVS, officially it doesn't exist

Simply speaking, no one actually supports and maintains it. Software falling into this category:
CMU LM Toolkit (we haven't touched it for a while)
We may do it in the future.
Phoenix (distributed somewhere else)
Training scripts in csh
Rita always actively supports them.

23
Rule 1: If there are no tarballs, they are in CVS

ANYONE can get the following modules through CVS by using the following command
cvs -z3 -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/cmusphinx co modulename
modulename
SphinxTrain -> SphinxTrain
archive_s3 -> s3: s3.0, s3.2, s3.3
sphinx2 -> devel version of sphinx2
sphinx3 -> s3.4; we will build on this to develop s3.5
share -> cepview, lm3g2dmp
sphinx3j -> the Java version of sphinx3
Sphinx4 -> development version of sphinx4
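
For scripting several checkouts, the command on this slide can be assembled programmatically. The sketch below only builds the argument list and does not contact any server; the helper name `checkout_command` is hypothetical, and the CVS root is the anonymous SourceForge pserver root implied by the slide.

```python
# Anonymous pserver root for the cmusphinx project, as implied by the slide.
CVS_ROOT = ":pserver:anonymous@cvs.sourceforge.net:/cvsroot/cmusphinx"

def checkout_command(module, cvs_root=CVS_ROOT, compression=3):
    """Return the argv list for an anonymous CVS checkout of `module`.

    -z<N> sets the compression level, -d selects the repository root,
    and `co` is the short form of `checkout`.
    """
    if not module:
        raise ValueError("module name must not be empty")
    return ["cvs", f"-z{compression}", f"-d{cvs_root}", "co", module]

command = checkout_command("sphinx3")
print(" ".join(command))
```

Passing the list to `subprocess.run(command, check=True)` would perform the checkout on a machine with a `cvs` client installed, provided the server is still reachable.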

24
Rule 3: You may need other modules to complete your task

SphinxTrain relies heavily on forced alignment, so you also need s3-align
Using any s3 recognizer requires the LM in DMP format, so you need the tool lm3g2dmp, which can be found in sphinx2 or share.

25
Where can I get more information on the recognizers?

People to ask
s2: Evandro, Ravi
S3 flat: Evandro, Ravi, ArthurC
S3 tree: Evandro, Ravi, ArthurC
SphinxTrain: Rita, Evandro, Ravi, ArthurC, Rong, Ziad, Murali.
S4: S4's developers on SourceForge
Willie, Paul, Phillip, Bhiksha, Rita, Evandro.

26
Web pages to look up

Rita's web page
www.cs.cmu.edu/~rsingh
Contains the training manual
TWiki web page for Sphinx 4 design
www.speech.cs.cmu.edu/cgi-bin/cmusphinx/twiki/view/Sphinx4/WebHome/
ArthurC's web page
Risked his life to write a manual for Sphinx 3.4
Also collects some information on each Sphinx

27
Outlook of all recognizers

Sphinx II
Sorry, we won't support it too much.
Reason: s3.4 and s4 have proved to have very nice speed and accuracy
Sphinx III
Only active branch is s3.5
Moderate changes in s3flat
Motivated by project CALO
This quarter: make adaptation work.
SphinxTrain
Write a set of scripts for Continuous HMM
training
Silence deletion problem will be fixed.

28
(cont.)

sphinxDoc
Chapters 1 and 2 completed (sigh, still 7 left)
Only being written when Arthur C is procrastinating and doesn't want to read or play video games.
Will be there around Sep or Oct.
Sphinx IV
Alpha release
Trainer will be fixed
Argus
Incorporates the advantages of many speech recognizers
Not yet started.

29
Conclusion