Title: Brief Overview of Different Versions of Sphinx
2. Introduction
- The software aspect of the recognizer is very important
- Research always requires correct use of the software.
- Sphinx II, III, IV and SphinxTrain
  - 100k lines of code
  - Each of them is fairly complex
3. This presentation (30 pages)
- Introduction (3 pages)
- History of Sphinx (13 pages)
  - Sphinx I (2 pages)
  - Sphinx II (2 pages)
  - Sphinx III (3 pages)
  - SphinxTrain (3 pages)
  - Sphinx IV (3 pages)
- How do I get the source code? (4 pages)
  - Versioning
  - Three rules for not getting lost among the different recognizers
- Where can I get official information? (2 pages)
- Outlook for each recognizer (3 pages)
- Conclusion
4. Brief history of Sphinx
- Largely adapted from
  - Rita's "The Sphinx Speech Recognition Systems"
    - www.cs.cmu.edu/rsingh/
  - Kevin et al.'s "Speech Recognition Past, Present and Future"
    - www.cs.cmu.edu/msiegler/ASR/futureofcmu-final.html
5. Before Sphinx
- Dragon
  - One of the first uses of HMMs in speech recognition
  - One of the first uses of a purely statistical model in speech
  - Expresses the knowledge using an HMM network
- Harpy
  - One of the first uses of beam search
  - Uses phonemes to represent words.
6. Sphinx I
- Before Sphinx ...
  - According to AT&T's literature, the concept of speaker independence was proposed in 1979
  - From 1979 to 1987, most systems were either
    - Speaker dependent, or
    - Speaker independent but in a very small domain (<100 words)
- Sphinx I was therefore outstanding
  - Accuracy is 90% on Resource Management
7. Sphinx I (1987)
- By Kai-Fu Lee and Roberto Bisiani
- Key developers included Hsiao-Wuen Hon and Fil Alleva
- Written in C.
- Continuous speech recognizer using discrete HMMs with 3 codebooks of size 256.
- Uses a simple word-pair grammar
- Generalized triphones
- Real-time on a Sun3 or DEC 3000
- Where is the source code? A good antique!
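To illustrate what "discrete HMM with 3 codebooks" means in practice: each frame's feature vectors are vector-quantized, so the HMM sees codeword indices instead of real-valued vectors. The following is a minimal sketch, not Sphinx I's actual code. It assumes one codebook per feature stream; the function names, the toy 4-entry codebooks (the real ones had 256 entries), and the feature values are all hypothetical.

```python
def nearest_codeword(vector, codebook):
    """Return the index of the codeword closest to `vector`
    (squared Euclidean distance)."""
    best_index, best_dist = 0, float("inf")
    for index, codeword in enumerate(codebook):
        dist = sum((v - c) ** 2 for v, c in zip(vector, codeword))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def quantize_frame(streams, codebooks):
    """Map one frame (one vector per feature stream) to a tuple of
    discrete symbols, one symbol per codebook."""
    return tuple(nearest_codeword(v, cb) for v, cb in zip(streams, codebooks))


# Hypothetical toy codebooks: 3 streams, 4 codewords each.
toy_codebooks = [
    [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)],
    [(0.0,), (0.5,), (1.0,), (1.5,)],
    [(-1.0, -1.0), (0.0, 0.0), (1.0, 1.0), (2.0, 2.0)],
]

# One hypothetical frame: one feature vector per stream.
frame = [(0.9, 0.1), (0.6,), (1.2, 0.9)]
symbols = quantize_frame(frame, toy_codebooks)
print(symbols)  # (1, 1, 2)
```

A discrete HMM then stores, for every state, a probability table over the codeword indices of each codebook, rather than a continuous density.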
8. Sphinx II (1992)
- By Xuedong Huang
- Hardwired to a 5-state Bakis topology
- 3-gram language models
- Decision-tree tying of HMMs (by Mei-Yuh Hwang)
- 90% on the WSJ task (WSJ0 or WSJ1?)
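The "5-state Bakis topology" above can be pictured as a transition table. This is a minimal sketch assuming the usual definition of a Bakis (left-to-right) model: a state may loop on itself, advance to the next state, or skip one state, and never move backwards. The function names and the uniform initial probabilities are hypothetical, and the exit transition out of the last state is left out for brevity.

```python
def bakis_mask(num_states=5, max_skip=1):
    """Return a num_states x num_states table; True marks an allowed
    transition.

    From state i the model may stay (i), advance (i+1), or skip up to
    `max_skip` states (i+2 when max_skip is 1). Backward moves are
    never allowed.
    """
    return [
        [0 <= dst - src <= max_skip + 1 for dst in range(num_states)]
        for src in range(num_states)
    ]


def uniform_transitions(mask):
    """Spread probability evenly over the allowed transitions of each state."""
    matrix = []
    for row in mask:
        allowed = sum(row)  # number of True entries in this row
        matrix.append([1.0 / allowed if ok else 0.0 for ok in row])
    return matrix


mask = bakis_mask()
transitions = uniform_transitions(mask)

# Print the allowed-transition pattern, one row per source state.
for row in mask:
    print("".join("x" if ok else "." for ok in row))
```

"Hardwired" means the decoder assumes this fixed pattern for every HMM, in contrast to the arbitrary topologies supported by Sphinx III.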
9. Fast Beam Search v. X
- FBS-6: flat lexicon decoder
- FBS-7: lexicon tree-based decoder
- FBS-8 decoder (written by Ravi Mosur, see his 1996 thesis)
  - Supports multiple types of beam pruning
  - Lexical tree
  - Tricks in GMM computation
    - Machine optimization: loop unrolling
    - Predictive codebook computation
  - Phoneme lookahead
  - Best-path search
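The basic idea behind beam pruning can be sketched in a few lines: at each frame, only hypotheses whose score is close enough to the best one stay active. This is a generic illustration, not the FBS-8 implementation; the function name, the hypothesis labels, and the scores are hypothetical.

```python
def beam_prune(scores, beam_width):
    """Keep hypotheses whose log score is within `beam_width` of the best.

    `scores` maps a hypothesis label to its log-domain score
    (higher is better).
    """
    if not scores:
        return {}
    best = max(scores.values())
    threshold = best - beam_width
    return {hyp: s for hyp, s in scores.items() if s >= threshold}


# Hypothetical active hypotheses at one frame.
active = {"state_a": -10.0, "state_b": -12.5, "state_c": -30.0}
survivors = beam_prune(active, beam_width=5.0)
print(sorted(survivors))  # ['state_a', 'state_b']
```

A decoder with several beam types would presumably apply the same test with different widths at different points of the search, for example to HMM states and to word exits; the exact set used in FBS-8 is not specified in this presentation.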
10. Other facts about Sphinx II
- It was licensed in the beginning (this seems to date back to around '95)
- In 2000, it was open-sourced on SourceForge under a Berkeley-style license
  - You can incorporate Sphinx's source code
  - You don't need to open your own source code (no recursive legal binding)
  - Similar to LGPL
- In 2001, a major alpha release by Kevin ensured portability across several platforms.
11. Sphinx III flat lexicon decoder (s3, s3flat, s3slow)
- Sphinx III (by Ravi Mosur)
  - Flat lexicon
  - Supports both CHMM and SCHMM
  - Poor-man's trigram
    - Uses only the most likely first word; this avoids a D^2 expansion of the word lattice.
  - Arbitrary topology
- Very accurate; used in the evaluation of BN (Broadcast News) and others.
- Derivatives of the search include
  - N-best generator
  - Aligner
  - Phone recognizer
12. Sphinx III tree lexicon decoder (s3.x, s3fast, s3inaccurate)
- What is s3.x actually?
  - A spin-off of the Sphinx III flat lexicon decoder's source code
  - First used in the BN 10x RT evaluation in 1999
- From s3.0 -> s3.2
  - Uses a tree lexicon with unigram lookahead
  - Lexical tree with approximations to avoid memory problems
  - One of the first in the world to use sub-vector quantization to speed up GMM computation
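The difference between a flat lexicon and a tree lexicon can be shown with a small prefix tree over pronunciations. This is a minimal sketch under stated assumptions: the function names, the three-word lexicon, and its phone strings are hypothetical, and none of the s3.x approximations or lookahead logic is modeled.

```python
def build_lexical_tree(lexicon):
    """Build a prefix tree over phone sequences.

    Words that share an initial phone sequence share the same nodes.
    """
    root = {"children": {}, "words": []}
    for word, phones in lexicon.items():
        node = root
        for phone in phones:
            node = node["children"].setdefault(
                phone, {"children": {}, "words": []}
            )
        node["words"].append(word)  # the word ends at this node
    return root


def count_nodes(node):
    """Count phone nodes below `node` (the root itself is not counted)."""
    return sum(1 + count_nodes(child) for child in node["children"].values())


# Hypothetical pronunciations, chosen so that the words share a prefix.
toy_lexicon = {
    "START": ["S", "T", "AA", "R", "T"],
    "STAR": ["S", "T", "AA", "R"],
    "STOP": ["S", "T", "AA", "P"],
}

tree = build_lexical_tree(toy_lexicon)
flat_size = sum(len(phones) for phones in toy_lexicon.values())
tree_size = count_nodes(tree)
print(flat_size, tree_size)  # 13 6
```

The tree evaluates shared prefixes once, which is where the speed comes from. The price is that the word identity is only known at the node where the word ends, so exact language model scores cannot be applied at word entry; this is one reason a tree decoder resorts to devices such as unigram lookahead.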
13. (cont.)
- From s3.2 -> s3.3 (Rita, Ricky)
  - Live-mode recognizer (livedecode) and simulator (livepretend)
- From s3.3 -> s3.4 (Evandro, Arthur C, Jahanzeb)
  - Four levels of GMM computation speed-up, phoneme lookahead
  - Bug fixes in live mode
- From s3.4 -> s3.5 (Evandro, Arthur C, Yitao)
  - (Tentative) Speaker adaptation documentation
14. Facts about S3
- A Java version exists -> sphinx3j
- Open-sourced in 2002
- Maintained continuously by Evandro from 2001 to now.
- s3.5 is the currently active branch of S3 development.
15. SphinxTrain
- Equally important and very complex
- But not well understood.
- What is SphinxTrain?
  - A collection of 40 tools for Sphinx 2, 3 and 4 acoustic model training
  - A set of Perl scripts to do training
- Sphinx 2 and 3 have slightly different model formats
16. Mini-history
- The Baum-Welch trainer and the Viterbi trainer have existed for a very long time.
- Training tools in general were neither systematic nor structured.
- From the chaos, Eric Thayer first pulled everything together to create the SphinxTrain package
- Rita did numerous bug fixes and modifications to the current trainer
  - Pioneered the use of automatic question generation (make_quest)
  - Built a set of training scripts for RM (the 0/ scripts)
  - Wrote the first systematic tutorial on training
- Ricky refined the code and wrote the first set of Perl scripts for training.
  - He made a PhD out of it too. (PhD: Push Here Dummy!)
- Alan and Kevin
  - Put the code on SourceForge
  - Alan built a set of training scripts that can run through
17. Sphinx IV
- Why Sphinx IV?
  - Too many limitations in SphinxTrain and Sphinx III
    - Only N-grams
    - Approximation of triphones
    - Fast GMM computation can be very troublesome to understand
    - bw (the Baum-Welch trainer) doesn't skip silence, so we rely heavily on forced alignment in training.
18. Sphinx IV (cont.)
- (By no means a complete list)
- Lead designer: Bhiksha (MERL)
- Lead team developer: Willie Walker (Sun)
- Key developers: Evandro, Rita, Phillip Kwok and Paul Lamere
- Many heavyweight speech advisors: Evandro, Rita, Ravi, Bhiksha, Pedro Moreno
19. Is Sphinx IV good?
- A very accurate, very fast, very versatile and very nicely packaged Java-based speech recognizer
  - Some internal benchmarks on RM and WSJ 5k show it to be faster and more accurate than s3.3 (under 1xRT and 10% better)
  - Supports N-gram, FSM and FSG.
  - Will provide facilities like confidence scoring
- Still under development (only the first alpha release so far)
  - Trainer is not stable
20. Summary of the recognizers and trainers
- Sphinx I -> obsolete
- Sphinx II -> we are using the fast recognizer now
- Sphinx III: the following coexist
  - S3 flat
  - S3 fast (s3.4 stable, s3.5 devel)
- SphinxTrain (0.92 in the CVS)
- Sphinx IV
  - Recognizer is alpha released
  - Trainer not yet stable
21. How can I get version X of Sphinx?
- Official web page of Sphinx
  - http://cmusphinx.sourceforge.net
  - Gives announcements and development news
  - Some documentation is there.
- For the tarballs
  - http://sourceforge.net/projects/cmusphinx
- Releases
  - sphinx2-0.4.tgz (s2)
  - sphinx3-0.1.tgz (s3.3)
  - sphinx3-0.4-rc2.tgz (s3.4 release candidate II)
  - sphinx4-0.1alpha-src.zip (s4)
22. Rule 2: If it doesn't exist in CVS, officially it doesn't exist
- Simply speaking, no one actually supports and maintains such software.
- Software that falls into this category:
  - CMU LM Toolkit (we haven't touched it for a while)
    - We may do it in the future.
  - Phoenix (distributed somewhere else)
  - Training scripts in csh
    - Rita always actively supports them.
23. Rule 1: If there are no tarballs, they are in CVS
- ANYONE can get the following modules through CVS by using the following command
  - cvs -z3 -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/cmusphinx co modulename
- modulename
  - SphinxTrain -> SphinxTrain
  - archive_s3 -> s3, s3.0, s3.2, s3.3
  - sphinx2 -> devel version of sphinx2
  - sphinx3 -> s3.4; we will use this as the base to develop s3.5
  - share -> cepview, lm3g2dmp
  - sphinx3j -> the Java version of sphinx3
  - Sphinx4 -> development version of sphinx4
24. Rule 3: You may need other modules to complete your task
- SphinxTrain relies heavily on forced alignment, so you also need s3-align
- Using any s3 recognizer requires the LM in DMP format, so you need the tool lm3g2dmp, which can be found in sphinx2 or share.
25. Where can I get more information about the recognizers?
- People to ask
  - s2: Evandro, Ravi
  - S3 flat: Evandro, Ravi, ArthurC
  - S3 tree: Evandro, Ravi, ArthurC
  - SphinxTrain: Rita, Evandro, Ravi, ArthurC, Rong, Ziad, Murali.
  - S4: S4's developers on SourceForge
    - Willie, Paul, Phillip, Bhiksha, Rita, Evandro.
26. Web pages to look up
- Rita's web page
  - www.cs.cmu.edu/rsingh
  - Contains the training manual
- TWiki web page for the Sphinx 4 design
  - www.speech.cs.cmu.edu/cgi-bin/cmusphinx/twiki/view/Sphinx4/WebHome/
- ArthurC's web page
  - Risked his life to write a manual for Sphinx 3.4
  - Also collects some information on each Sphinx
27. Outlook for all recognizers
- Sphinx II
  - Sorry, we won't support it much.
  - Reason: s3.4 and s4 have proved to have very good speed and accuracy
- Sphinx III
  - The only active branch is s3.5
  - Moderate changes in s3flat
  - Motivated by project CALO
  - This quarter: make adaptation work.
- SphinxTrain
  - Write a set of scripts for continuous HMM training
  - The silence deletion problem will be fixed.
28. (cont.)
- sphinxDoc
  - Chapters 1 and 2 completed (sigh, still 7 left)
  - Only gets written when Arthur C is procrastinating and doesn't want to read or play video games.
  - Will be available around Sep or Oct.
- Sphinx IV
  - Alpha release
  - Trainer will be fixed
- Argus
  - Incorporates the advantages of many speech recognizers
  - Not yet started.
29. Conclusion
- This presentation
  - Summarizes the current code status of Sphinx and SphinxTrain.
  - We still have a lot of work to do
- Next presentation
  - s3 or s3.4, from main to the search.