From Main() to the search routine in Sphinx 3 (s3accurate)

Provided by: Arthu61
Learn more at: http://www.cs.cmu.edu

1
From Main() to the search routine in Sphinx 3
(s3accurate)
  • Arthur Chan
  • July 8, 2004

2
This presentation
  • Design goals of Sphinx 3 (s3accurate) (4 pages)
  • How hard is it to trace the Sphinx code? (2 pages)
  • How to trace source code in general? (1 page)
  • A tour from the main function to the search
    routine. (11 pages)
  • Next time: details of s3accurate, or we could start
    on s3fast.

3
Design Goals of a Speech Recognizer/Trainer
  • Different goals in building speech recognition
    software:
  • Software Accuracy
  • Software Speed
  • Software Usability
  • Code Readability: is it entertaining or easy to
    read?

4
If we optimize only one aspect of the design goals
  • Optimizing Accuracy:
  • Use 1000xRT techniques to optimize recognition
    rate.
  • Optimizing Speed:
  • Cryptic coding leaves programmers unable to
    understand what's going on.
  • Optimizing Usability:
  • Allow > 1000 options for users in the application,
  • but the code becomes extremely complex.

5
Conflicts between goals
  • Accuracy vs Speed
  • Approximate search usually gives faster
    decoding but lower accuracy.
  • Whole-sentence decoding makes sense for many
    techniques but causes a lot of delay.
  • Usability vs Readability
  • Making software very usable makes the code
    more complicated.
  • Complicated code is generally hard to read.

6
Design goal of Sphinx 3 (s3accurate)
  • Accuracy > Readability > Usability > Speed
  • Sphinx 3, the flat-lexicon version,
  • was mainly used for research purposes.
  • is very modular and easy to change for
    researchers with experience in C.
  • Few difficult code optimizations are to be
    found.

7
How hard is it to trace the Sphinx code? (My view)
  • Like many programs,
  • Sphinx is just a set of commands that do
    additions, subtractions, multiplications and
    divisions in a specific order.
  • So
  • It is not difficult to understand the code if the
    underlying algorithm is understood.

8
Is there a lot to read?
  • Number of lines of code in all .c and .h files:
  • 36143
  • Number of lines in the Harry Potter novels:
  • Book 1: ~9000 lines
  • Book 2: ~9000 lines
  • Book 3: ~12000 lines
  • Book 4: ~21000 lines
  • Book 5: ~27000 lines
  • So
  • It is much shorter than the whole series of Harry
    Potter novels.
  • Actually, it is not much to read.

9
How to read the source code in general?
  • Some advice:
  • Jot notes on the dependencies of the code
  • (A file, README.tracing, can be found in s3fast
    and SphinxTrain.)
  • Jot notes on how certain parts of the code work
  • Useful but not necessary: an editor with program
    statistics and hyperlinks to function
    definitions.
  • Such as Microsoft Visual C++ (?)
  • Or emacs (?) or vi (?)
  • Grab a comfortable chair.
  • Be patient.

10
A tour from the main function to the search
routine: Overview
  • Physical layout of the code
  • s3decode-anytopo
  • Brief tour of the programs
  • Initialization
  • Processing
  • Post-processing.

11
Physical Layout of s3
  • Getting the code from Sourceforge
  • cvs -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/cmusphinx/ co archive_s3
  • archive_s3
  • S3 << THE ONE
  • S3.0 <- Why is it there?
  • S3.2 <- Legacy implementation: s3fast w/o
    live-mode decoder
  • S3.3 <- Legacy implementation: s3fast w/
    live-mode decoder

12
s3
  • config/ <- for configuration of make
  • doc/ <- documentation
  • include/ <- header files
  • src/ <- all the .c files
  • lib/ <- where the library will be
  • bin/ <- where the binary will be

13
Inside src/
  • libio/: file I/O functions
  • libutil/: useful data structures
  • libfbs/: source for all the search code
  • libfeat/: code for feature extraction

14
Inside libfbs/
  • Several files contain a main(), including:
  • align-main.c -> the entry point for s3align
  • main.c -> the entry point for s3decode-anytopo
  • allphone-main.c -> the entry point for s3allphone
  • astar-main.c -> the entry point for s3astar
  • nbestrescore-main.c -> the entry point for
    s3nbestrescore
  • dag.c -> the entry point for s3dag

15
Logical structure of the code
  • The forward search
  • 1, Do GMM computation for every senone.
  • 2, Do search for 1 frame.
  • 3, Iterate until the end of the utterance.
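
The three steps above can be sketched as a toy frame-synchronous Viterbi loop. This is only an illustration under loud assumptions, not code from Sphinx: the model has two states with one single-Gaussian "senone" each, the observations are scalars, and every name (toy_forward_search, score_senones, search_one_frame, the means, variances and transition values) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define N_STATES 2                         /* toy model: two states, one senone each */
#define LOG_STAY   (-0.10536051565782628)  /* ln 0.9, assumed self-transition */
#define LOG_SWITCH (-2.302585092994046)    /* ln 0.1, assumed state change */
#define LOG_NORM   (-0.9189385332046727)   /* -0.5 * ln(2*pi), for unit variance */

/* Step 1: score one frame against every senone (1-D unit-variance Gaussians). */
static void score_senones(double x, const double mean[N_STATES],
                          double score[N_STATES])
{
    for (int s = 0; s < N_STATES; s++) {
        double d = x - mean[s];
        score[s] = LOG_NORM - 0.5 * d * d;
    }
}

/* Step 2: advance the best-path log scores by exactly one frame. */
static void search_one_frame(double path[N_STATES], const double score[N_STATES])
{
    double next[N_STATES];
    for (int to = 0; to < N_STATES; to++) {
        double best = path[0] + (to == 0 ? LOG_STAY : LOG_SWITCH);
        for (int from = 1; from < N_STATES; from++) {
            double cand = path[from] + (from == to ? LOG_STAY : LOG_SWITCH);
            if (cand > best)
                best = cand;
        }
        next[to] = best + score[to];
    }
    for (int s = 0; s < N_STATES; s++)
        path[s] = next[s];
}

/* Step 3: iterate to the end of the utterance; return the best final state. */
static int toy_forward_search(const double *obs, size_t n_frames)
{
    static const double mean[N_STATES] = { 0.0, 5.0 };  /* assumed senone means */
    double path[N_STATES] = { 0.0, 0.0 };               /* equal start scores */
    double score[N_STATES];

    assert(obs != NULL || n_frames == 0);
    for (size_t t = 0; t < n_frames; t++) {
        score_senones(obs[t], mean, score);
        search_one_frame(path, score);
    }
    return path[1] > path[0] ? 1 : 0;
}
```

Calling toy_forward_search on a short array of observations returns the index of the state with the best path score after the last frame. Keeping the senone scoring separate from the per-frame search mirrors the split on the slide.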

16
main.c
  • Pseudocode of top level
  • 1, Read command line arguments
  • cmd_ln_define
  • cmdline_parse
  • 2, Initialization
  • Initialize log table lookup: logs3_init
  • Initialize feature: feat_init
  • Initialize models: models_init
  • 3, Processing of all control files (this also does
    the decoding)
  • process_ctlfiles
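
The top-level flow above can be sketched in C as follows. This is a hedged outline, not the contents of main.c: the toy_* functions are hypothetical stand-ins that only record that each stage ran, the real signatures of logs3_init, feat_init, models_init and process_ctlfiles are not reproduced here, and the "-ctl" flag handling is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical decoder state; each flag records that a stage has run. */
typedef struct {
    const char *ctl_file;   /* control file path taken from the command line */
    int log_ready;
    int feat_ready;
    int models_ready;
    int processed;
} toy_decoder_t;

/* 1, Read command line arguments (assumed form: -ctl <file>). */
static int toy_parse_args(toy_decoder_t *d, int argc, char *argv[])
{
    for (int i = 1; i + 1 < argc; i += 2) {
        if (strcmp(argv[i], "-ctl") == 0)
            d->ctl_file = argv[i + 1];
    }
    return d->ctl_file != NULL ? 0 : -1;
}

/* 2, Initialization stages, in the order listed on the slide. */
static void toy_log_init(toy_decoder_t *d)    { d->log_ready = 1; }
static void toy_feat_init(toy_decoder_t *d)   { d->feat_ready = 1; }
static void toy_models_init(toy_decoder_t *d) { d->models_ready = 1; }

/* 3, Process the control file; decoding would happen inside this call. */
static void toy_process_ctlfiles(toy_decoder_t *d)
{
    assert(d->log_ready && d->feat_ready && d->models_ready);
    d->processed = 1;
}

/* Top level: parse, initialize, process. Returns 0 on success, 1 on bad usage. */
static int toy_main(int argc, char *argv[])
{
    toy_decoder_t d = { 0 };

    if (toy_parse_args(&d, argc, argv) != 0)
        return 1;
    toy_log_init(&d);
    toy_feat_init(&d);
    toy_models_init(&d);
    toy_process_ctlfiles(&d);
    return d.processed ? 0 : 1;
}
```

A real main() would simply forward its argc and argv to a function shaped like toy_main. The assert in toy_process_ctlfiles makes the ordering explicit: no decoding before all three initializations have run.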

17
process_ctlfiles
  • Pseudocode
  • 1, Read in more parameters
  • 2, For every cepstral file,
  • Run decoding (decode_utt)
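
The loop above might look roughly like this. It is a sketch under assumptions: the control list is taken to hold one utterance ID per line and is passed as an in-memory string to keep the example self-contained, and toy_process_ctl, toy_decode_fn and toy_count_utt are hypothetical names. The callback marks the place where the real code would call decode_utt.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-utterance hook; the decoder would run decode_utt here. */
typedef void (*toy_decode_fn)(const char *utt_id, void *user);

/* Example hook that only counts how many utterances it was given. */
static void toy_count_utt(const char *utt_id, void *user)
{
    (void)utt_id;
    ++*(int *)user;
}

/* Walk a control list (assumed: one utterance ID per line) and decode each
 * entry. Blank or oversized lines are skipped. Returns the number decoded. */
static int toy_process_ctl(const char *ctl_text, toy_decode_fn decode, void *user)
{
    char utt_id[256];
    const char *p = ctl_text;
    int n = 0;

    assert(ctl_text != NULL && decode != NULL);
    while (*p != '\0') {
        size_t len = strcspn(p, "\n");         /* length of the current line */
        if (len > 0 && len < sizeof utt_id) {
            memcpy(utt_id, p, len);
            utt_id[len] = '\0';
            decode(utt_id, user);              /* one decoding run per entry */
            n++;
        }
        p += len;
        if (*p == '\n')
            p++;                               /* step over the newline */
    }
    return n;
}
```

Passing toy_count_utt with a pointer to an int counter shows the control flow without any decoding; a fuller version would read the list from a file and load the matching cepstral file for each ID.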

18
decode_utt
  • Run forward search: fwd_vit
  • Dump hypothesis/lattice/statistics to the output

19
fwd_vit
  • 1, Initialization of the feature vector
  • 2, For every frame
  • a, Computation of feature
  • b, Computation of Gaussian distribution
  • c, Do forward search for 1 frame
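
Steps a and b above can be illustrated with a small sketch. Everything here is an assumption made for the example rather than the Sphinx implementation: the feature is a toy one (cepstra plus a simple two-frame difference), the dimensions are tiny, the Gaussian has a diagonal covariance with a precomputed normalization constant, and the names toy_compute_feat and toy_log_gauss are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define CEP_DIM  3                /* toy cepstral dimension (assumption) */
#define FEAT_DIM (2 * CEP_DIM)    /* cepstra followed by their deltas */

/* Step a: build the feature for frame t as [cep(t), cep(t+1) - cep(t-1)],
 * clamping the neighbours at the edges of the utterance. */
static void toy_compute_feat(const double cep[][CEP_DIM], size_t n_frames,
                             size_t t, double feat[FEAT_DIM])
{
    assert(n_frames > 0 && t < n_frames);
    size_t prev = (t > 0) ? t - 1 : 0;
    size_t next = (t + 1 < n_frames) ? t + 1 : n_frames - 1;

    for (size_t i = 0; i < CEP_DIM; i++) {
        feat[i] = cep[t][i];
        feat[CEP_DIM + i] = cep[next][i] - cep[prev][i];
    }
}

/* Step b: log-density of a diagonal-covariance Gaussian. log_norm is the
 * precomputed constant -0.5 * sum_i ln(2*pi*var[i]). */
static double toy_log_gauss(const double feat[FEAT_DIM],
                            const double mean[FEAT_DIM],
                            const double var[FEAT_DIM], double log_norm)
{
    double dist = 0.0;

    for (size_t i = 0; i < FEAT_DIM; i++) {
        double d = feat[i] - mean[i];
        dist += d * d / var[i];
    }
    return log_norm - 0.5 * dist;
}
```

In a per-frame loop, toy_compute_feat would be called once per frame and toy_log_gauss once per Gaussian, with the resulting scores handed to the one-frame search of step c. Precomputing log_norm keeps the inner loop free of logarithms.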

20
We are almost done!
  • We went through the key logic that leads us from
    the main() function to the GMM computation and
    search.
  • Next time
  • Each component in more detail.