Brief Overview of Different Versions of Sphinx

Provided by: Arthu61
Learn more at: http://www.cs.cmu.edu

1
Brief Overview of Different Versions of Sphinx
  • Arthur Chan

2
Introduction
  • The software aspect of the recognizer is very
    important
  • Research always requires correct use of the
    software.
  • Sphinx II, III, IV and SphinxTrain
  • 100k lines of code
  • Each of them is fairly complex

3
This presentation (30 pages)
  • Introduction (3 pages)
  • History of Sphinx (13 pages)
  • Sphinx I (2 pages)
  • Sphinx II (2 pages)
  • Sphinx III (3 pages)
  • SphinxTrain (3 pages)
  • Sphinx IV (3 pages)
  • How do I get the source code? (4 pages)
  • Versioning
  • Three rules for not getting lost among different
    recognizers
  • Where can I get official information? (2 pages)
  • Outlook for each recognizer (3 pages)
  • Conclusion

4
Brief history of Sphinx
  • Largely adapted from:
  • Rita's "The Sphinx Speech Recognition Systems"
  • www.cs.cmu.edu/~rsingh/
  • Kevin et al.'s "Speech Recognition: Past, Present
    and Future"
  • www.cs.cmu.edu/~msiegler/ASR/futureofcmu-final.html

5
Before Sphinx
  • Dragon
  • One of the first uses of HMMs in speech recognition
  • One of the first uses of a purely statistical
    model in speech
  • Expressed knowledge as an HMM network
  • Harpy
  • One of the first uses of beam search
  • Used phonemes to represent words.

6
Sphinx I
  • Before Sphinx ...
  • According to AT&T's literature, the concept of
    speaker independence was proposed in 1979
  • In 1979-1987, most systems were either:
  • Speaker dependent
  • Speaker independent but in a very small domain
    (<100 words)
  • Sphinx I was therefore outstanding
  • Accuracy was 90% on Resource Management

7
Sphinx I (1987)
  • By Kai-Fu Lee and Roberto Bisiani
  • Key developers included Hsiao-Wuen Hon and Fil Alleva
  • Written in C.
  • Continuous speech recognizer using discrete HMMs
    with 3 codebooks of size 256.
  • Used a simple word-pair grammar
  • Generalized triphones
  • Real-time on Sun3 or DEC 3000
  • Where is the source code? Good antique!

8
Sphinx II (1992)
  • By Xuedong Huang
  • Hardwired to 5-state Bakis topology
  • 3-gram language models
  • Decision-tree tying of HMMs (by Mei-Yuh Hwang)
  • 90% on the WSJ task (0 or 1?)
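
To make the "5-state Bakis topology" bullet concrete, here is a minimal Python sketch of a generic left-to-right (Bakis) transition structure. It is an illustration only, not Sphinx II's code: the function name `bakis_transitions` and the `max_skip` parameter are hypothetical, and whether Sphinx II's hardwired topology allows exactly one skipped state is an assumption.

```python
def bakis_transitions(num_states=5, max_skip=1):
    """Return a boolean matrix: allowed[i][j] is True if state i may go to state j.

    A Bakis (left-to-right) topology allows a self-loop, a step to the next
    state, and optionally a skip over up to `max_skip` states. It never
    allows a transition back to an earlier state.
    """
    allowed = [[False] * num_states for _ in range(num_states)]
    for i in range(num_states):
        # Targets run from i (self-loop) up to i + 1 + max_skip, clipped to the last state.
        last_target = min(i + 1 + max_skip, num_states - 1)
        for j in range(i, last_target + 1):
            allowed[i][j] = True
    return allowed


topology = bakis_transitions()
for row in topology:
    # Print one row per state: 1 = transition allowed, 0 = not allowed.
    print(" ".join("1" if ok else "0" for ok in row))
```

A decoder hardwired to one such topology can unroll its inner loops for that fixed shape, which is the trade-off against the "arbitrary topology" support listed later for Sphinx III.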

9
Fast Beam Search v. X
  • FBS-6: flat lexicon decoder
  • FBS-7: lexicon tree-based
  • FBS-8 decoder (written by Ravi Mosur, see thesis
    in '96)
  • Supports multiple types of beam pruning.
  • Lexical tree
  • Tricks in GMM computation
  • Machine optimization: loop unrolling
  • Predictive codebook computation
  • Phoneme lookahead
  • Best path search
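
As a rough illustration of the beam pruning named above, the following Python sketch keeps only the hypotheses whose log score lies within a fixed beam of the best one. It shows the general technique, not the FBS-8 implementation; the function name `beam_prune`, the state labels and the scores are made up for the example.

```python
def beam_prune(scores, beam_width):
    """Keep hypotheses whose log score is within `beam_width` of the best.

    scores: dict mapping a hypothesis (e.g. an HMM state id) to its log score.
    beam_width: non-negative width of the beam in the log domain.
    """
    if not scores:
        return {}
    best = max(scores.values())
    threshold = best - beam_width
    # Anything scoring below the threshold is dropped from the active list.
    return {hyp: s for hyp, s in scores.items() if s >= threshold}


# Hypothetical active states after one frame, with log scores.
active = {"AH": -10.0, "B": -12.5, "K": -30.0}
survivors = beam_prune(active, beam_width=5.0)
print(sorted(survivors))
```

A decoder that supports "multiple types of beam pruning" would typically apply the same test with different widths at different levels (for example states, phones and word exits); the single-width version here is the simplest case.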

10
Other facts about Sphinx II
  • We licensed it at the beginning (this seems to date
    back to around '95)
  • In 2000, it was open-sourced on SourceForge under
    a Berkeley-style license
  • You can incorporate Sphinx's source code
  • You don't need to open your source code. (No
    recursive legal binding)
  • Similar to LGPL
  • In 2001, Kevin made a major alpha release that
    ensured portability across several platforms.

11
Sphinx III flat lexicon decoder
(s3, s3flat, s3slow)
  • Sphinx III (by Ravi Mosur)
  • Flat Lexicon
  • Support both CHMM and SCHMM
  • Poor-man's trigram
  • Uses only the most likely first word, which avoids
    D^2 expansion of the word lattice.
  • Arbitrary topology
  • Very accurate, used in the BN evaluation and
    others.
  • Derivatives of the search include:
  • N-best generator
  • Aligner
  • Phone recognizer

12
Sphinx III tree lexicon decoder (s3.x, s3fast,
s3inaccurate)
  • What is s3.x actually?
  • A spin-off of the Sphinx III flat lexicon decoder's
    source code
  • First use was in the BN 10x RT evaluation in 1999
  • From s3.0 -> s3.2
  • Uses a tree lexicon with unigram lookahead
  • Lexical tree with approximation to avoid memory
    problems
  • One of the first in the world to use sub-vector
    quantization to speed up GMM computation

13
(cont.)
  • From s3.2 -> s3.3 (Rita, Ricky)
  • Live mode recognizer (livedecode) and simulator
    (livepretend)
  • From s3.3 -> s3.4 (Evandro, Arthur C, Jahanzeb)
  • 4 levels of GMM computation speed-up, phoneme
    lookahead
  • Bug fixes in live mode
  • From s3.4 -> s3.5 (Evandro, Arthur C, Yitao)
  • (Tentative) Speaker adaptation documentation

14
Facts about S3
  • A Java version exists -> sphinx3j
  • Open-sourced in 2002
  • Maintained continuously by Evandro from 2001 to
    now.
  • s3.5 is the current active branch in S3
    development.

15
SphinxTrain
  • Equally important and very complex
  • But not well understood.
  • What is SphinxTrain?
  • A collection of 40 tools for Sphinx 2, 3 and 4
    acoustic model training
  • A set of Perl scripts to do training
  • Sphinx 2 and 3 have slightly different model
    formats

16
Mini-history
  • The Baum-Welch trainer and Viterbi trainer have
    existed for a very long time.
  • Training tools in general were not systematic and
    were not structured.
  • From the chaos, Eric Thayer first pulled everything
    together to create the package SphinxTrain
  • Rita did numerous bug fixes and modifications of
    the current trainer
  • Pioneered the use of automatic question
    generation (make_quest)
  • Built a set of training scripts for RM (the 0/
    scripts)
  • Wrote the first systematic set of tutorials on
    training
  • Ricky refined the code and wrote the first set of
    Perl scripts for training.
  • He made a PhD out of it too. (PHD: Push Here
    Dummy!)
  • Alan and Kevin
  • Put the code on SourceForge
  • Alan built a set of training scripts that can
    run through

17
Sphinx IV
  • Why Sphinx IV?
  • Too many limitations in SphinxTrain and Sphinx
    III
  • Only N-gram
  • Approximation of triphones
  • Fast GMM computation can be very troublesome to
    understand
  • Bw doesn't skip silence. We rely heavily on
    forced alignment in training.

18
Sphinx IV (cont.)
  • (By no means complete)
  • Lead Design: Bhiksha (MERL)
  • Lead Team Developer: Willie Walker (Sun)
  • Key developers: Evandro, Rita, Phillip Kwok and
    Paul Lamere
  • Many heavyweight speech advisors: Evandro, Rita,
    Ravi, Bhiksha, Pedro Moreno

19
Is Sphinx IV good?
  • Very accurate, very fast, very versatile and very
    nicely packaged Java-based speech recognizer
  • Internal benchmarks on RM and WSJ 5k show it to be
    faster and more accurate than s3.3 (under
    1xRT and 10% better)
  • Supports N-gram, FSM and FSG.
  • Will provide facilities like confidence scoring
  • Still under development (has just had its first
    alpha release)
  • Trainer is not stable

20
Summary of the recognizers and trainers
  • Sphinx I -> obsolete
  • Sphinx II -> we are using the fast recognizer now
  • Sphinx III: the following coexist
  • S3 flat
  • S3 fast (s3.4 stable, s3.5 devel)
  • SphinxTrain (0.92 in the CVS)
  • Sphinx IV
  • Recognizer is alpha released
  • Trainer not yet stable

21
How can I get version X of Sphinx?
  • Official Web page of Sphinx
  • http://cmusphinx.sourceforge.net
  • Gives announcements and news of development
  • Some documentation is there.
  • For the tarballs
  • http://sourceforge.net/projects/cmusphinx
  • Releases
  • sphinx2-0.4.tgz (s2)
  • sphinx3-0.1.tgz (s3.3)
  • sphinx3-0.4-rc2.tgz (s3.4 release candidate II)
  • sphinx4-0.1alpha-src.zip (s4)

22
Rule 2: If it doesn't exist in CVS, officially it
doesn't exist
  • Simply put, no one actually supports or maintains
    it. Software that falls into this category:
  • CMU LM Toolkit (we haven't touched it for a
    while)
  • We may do it in the future.
  • Phoenix (distributed somewhere else)
  • Training scripts in csh
  • Rita always actively supports them.

23
Rule 1: If there are no tarballs, they are in CVS
  • ANYONE can get the following modules through CVS
    by using the following command
  • cvs -z3 -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/cmusphinx co modulename
  • modulename:
  • SphinxTrain -> SphinxTrain
  • archive_s3 -> s3, s3.0, s3.2, s3.3
  • sphinx2 -> devel ver. of sphinx2
  • sphinx3 -> s3.4; we will base the development of
    s3.5 on this
  • share -> cepview, lm3g2dmp
  • sphinx3j -> the Java version of sphinx3
  • Sphinx4 -> development version of sphinx4

24
Rule 3: You may need other modules to complete
your task
  • SphinxTrain relies heavily on forced alignment, so
    you also need s3-align
  • Using any s3 recognizer requires the LM in
    DMP format, so you need the tool lm3g2dmp, which
    can be found in sphinx2 or share.

25
Where can I get more information for the
recognizer?
  • People to ask
  • s2: Evandro, Ravi
  • S3 flat: Evandro, Ravi, ArthurC
  • S3 tree: Evandro, Ravi, ArthurC
  • SphinxTrain: Rita, Evandro, Ravi, ArthurC, Rong,
    Ziad, Murali.
  • S4: S4's developers on SourceForge
  • Willie, Paul, Phillip, Bhiksha, Rita, Evandro.

26
Web page to look up
  • Rita's web page
  • www.cs.cmu.edu/~rsingh
  • Contains the manual of training
  • Twiki web page for Sphinx 4 design
  • www.speech.cs.cmu.edu/cgi-bin/cmusphinx/twiki/view/Sphinx4/WebHome/
  • ArthurC's web page
  • Risked his life to write a manual for Sphinx 3.4
  • Also collects some information on each Sphinx

27
Outlook of all recognizers
  • Sphinx II
  • Sorry, we won't support it too much.
  • Reason: s3.4 and s4 have proven to have very good
    speed and accuracy
  • Sphinx III
  • Only active branch is s3.5
  • Moderate change in s3flat
  • Motivated by project CALO
  • This quarter: make adaptation work.
  • SphinxTrain
  • Write a set of scripts for Continuous HMM
    training
  • Silence deletion problem will be fixed.

28
(cont.)
  • sphinxDoc
  • Chapter 1 and 2 completed (sigh, still 7 left)
  • Only being written when Arthur C is
    procrastinating and doesn't want to read or play
    video games.
  • Will be available around Sep or Oct.
  • Sphinx IV
  • Alpha release
  • Trainer will be fixed
  • Argus
  • Incorporate the advantages of many speech
    recognizers together
  • Not yet started.

29
Conclusion
  • This presentation
  • Summarized the current code status of Sphinx and
    SphinxTrain.
  • We still have a lot of work to do
  • Next presentation
  • s3 or s3.4 from main to the search.