Sphinx on Handhelds

Provided by: davidhugg
Learn more at: http://www.cs.cmu.edu


1
Sphinx on Handhelds
  • David Huggins-Daines
  • dhuggins_at_cs.cmu.edu

2
Sphinx on Handhelds?
  • Handheld/embedded devices are pretty speedy these
    days
  • LVCSR (large-vocabulary continuous speech
    recognition) on them is not unreasonable
  • An open-source one does not exist yet
  • CALO's new focus on mobility
  • S2S (speech-to-speech) translation projects could
    use it
  • Sublime, smartphone applications, etc.
  • ISL has it, so should we!

3
Handheld challenges
  • CPU speed
    • Typically 200-400 MHz ARM/XScale
    • Faster than the workstations Sphinx started out on
    • No hardware floating-point instructions
    • ARM has a very fast and sophisticated integer ISA
  • Memory and storage capacity/speed
    • DRAM is very limited (32 or 64 MB)
    • Storage is very slow (typically CompactFlash cards)
  • Inefficient and clumsy operating systems
    • WinCE has no stdio, a broken malloc, and a 32 MB
      per-process memory limit
    • PalmOS is much, much worse!

4
Plan for Sphinx on Handhelds
  • Start out with Sphinx2
    • It's fast
    • People use it already
  • Convert hot spots to integer math
  • Precompute model files
    • Avoid parsing (no stdio, remember)
    • Allow memory-mapped I/O (subvert the 32 MB limit
      on WinCE)
  • Disable unneeded features in libraries
    • e.g., flat-lexicon search, CDHMM
      (continuous-density HMM) support
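
The "precompute model files" and memory-mapped I/O points can be illustrated with a minimal POSIX sketch (the Zaurus runs Linux): the model is written once, offline, as a raw binary image, and at run time it is mapped read-only, so loading involves no parsing and no stdio. The file layout, model_hdr_t, MODEL_MAGIC, and the helper names are hypothetical, not Sphinx2's actual format; Windows CE would need its own file-mapping calls instead of mmap().

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical precomputed model image: a fixed header, then raw int32 values. */
#define MODEL_MAGIC 0x32485053u /* hypothetical tag */

typedef struct {
    uint32_t magic; /* MODEL_MAGIC */
    uint32_t count; /* number of int32 values that follow */
} model_hdr_t;

typedef struct {
    void *base;            /* start of the mapping */
    size_t len;            /* mapping length in bytes */
    const int32_t *values; /* points into the mapping: no copy, no parsing */
    uint32_t count;
} model_map_t;

/* Offline step: dump values in native byte order so the loader can use them as-is. */
static int model_write(int fd, const int32_t *values, uint32_t count) {
    model_hdr_t h = { MODEL_MAGIC, count };
    size_t n = (size_t)count * sizeof *values;
    if (write(fd, &h, sizeof h) != (ssize_t)sizeof h) return -1;
    if (write(fd, values, n) != (ssize_t)n) return -1;
    return 0;
}

/* Runtime step: map the file read-only; the kernel pages data in on demand. */
static int model_map(const char *path, model_map_t *m) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    struct stat st;
    if (fstat(fd, &st) < 0 || (size_t)st.st_size < sizeof(model_hdr_t)) {
        close(fd);
        return -1;
    }
    size_t len = (size_t)st.st_size;
    void *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED) return -1;

    const model_hdr_t *h = p;
    if (h->magic != MODEL_MAGIC ||
        h->count > (len - sizeof *h) / sizeof(int32_t)) {
        munmap(p, len);
        return -1;
    }
    m->base = p;
    m->len = len;
    m->values = (const int32_t *)((const char *)p + sizeof *h);
    m->count = h->count;
    return 0;
}

static void model_unmap(model_map_t *m) {
    munmap(m->base, m->len);
}
```

On the device, model_map() replaces a text-parsing load. Because the values are used in place, the offline tool that calls model_write() must produce the same byte order and int32 layout as the target.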

5
Current Status
  • Sphinx2 on Sharp Zaurus
    • Linux, 40 MB system RAM, 206 MHz ARM
    • Performance on RM1: 1.7x real time
    • No degradation in accuracy
  • Integer front-end and GMM code complete
    • Front end also has a faster mode
      • 10% faster, 10% degradation in accuracy
  • Memory consumption is too high
    • WSJ5k can just barely run
    • Sphinx2 consumes about 16 MB of heap space
    • Requires quantized mixture weights (-8bsen)
    • Sphinx3.x is much smaller and slower
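
The slide does not say how -8bsen encodes the weights. As a hypothetical illustration of why 8-bit weights matter, the sketch below quantizes an integer log-domain mixture weight (base 1.0001, so weights of at most 1 have logs of at most 0) to one byte by dropping low-order bits; a table of such codes needs a quarter of the memory of 32-bit values. QUANT_SHIFT, quantize_logw(), dequantize_logw(), and quantize_table() are assumptions for this sketch, not Sphinx2 names or its actual scheme.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical quantizer step: each 8-bit code spans 2^QUANT_SHIFT log units. */
#define QUANT_SHIFT 10

/* Map a log-domain weight (integer log base 1.0001, <= 0) to one byte by
 * dropping the low QUANT_SHIFT bits of its magnitude. */
static uint8_t quantize_logw(int32_t logw) {
    uint64_t mag = logw < 0 ? (uint64_t)(-(int64_t)logw) : 0;
    mag >>= QUANT_SHIFT;
    return mag > 255 ? 255 : (uint8_t)mag; /* very small weights saturate at 255 */
}

/* Approximate inverse; error is below 2^QUANT_SHIFT log units when not saturated. */
static int32_t dequantize_logw(uint8_t code) {
    return -(int32_t)((uint32_t)code << QUANT_SHIFT);
}

/* Quantize a whole weight table: 1 byte per entry instead of 4. */
static void quantize_table(const int32_t *logw, uint8_t *codes, size_t n) {
    for (size_t i = 0; i < n; i++)
        codes[i] = quantize_logw(logw[i]);
}
```

With a shift of 10, one code step spans 1024 log units, i.e. a weight ratio of 1.0001^1024, or about 1.108; coarser steps save nothing further here, since the code is already one byte, but finer steps would saturate more weights at the floor.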

6
Implementation details
  • FFT is done with 16.16 fixed point
    • Bits 31:16 are the whole part and sign
    • Bits 15:0 are the fractional part
    • I.e., all numbers are scaled by 65536 (2^16)
    • Lossless multiplication is done using four integer
      shift-multiply-accumulates (ARM is really good at
      this)
  • Mel spectrum calculated in the log domain
    • Using base 1.0001 in order to reuse the existing
      log-add table implementation
  • Faster mode uses 28.4 fixed point instead
    • Overflows saturated to INT_MAX
    • Zeroes floored to log(2^-4), the smallest positive
      28.4 value; very important!

7
Implementation details
  • Abstract types for intermediate values
    • mfcc_t, powspec_t, mean_t, var_t
    • Define FIXED_POINT to make them ints
  • Arithmetic macros (fixpoint.h)
    • fixed32 type analogous to float32
    • Addition and subtraction work as expected
    • MFCCMUL(), MFCC2FLOAT(), FLOAT2MFCC() macros
      become no-ops in the floating-point build
    • GMMADD(), GMMSUB() do saturating addition and
      subtraction
      • ARM has special instructions for this too! Wow!
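
A minimal sketch of how such an abstraction layer can look, assuming 16 fractional bits and, for brevity, a 64-bit intermediate in MFCCMUL(). Only the names mfcc_t, fixed32, float32, FIXED_POINT, MFCCMUL(), MFCC2FLOAT(), FLOAT2MFCC(), GMMADD(), and GMMSUB() come from the slide; the definitions, MFCC_FRAC_BITS, and the helpers sat_add32()/sat_sub32() are assumptions, not the contents of fixpoint.h.

```c
#include <stdint.h>

/* Enable the fixed-point build for this sketch; remove it for the float build. */
#define FIXED_POINT

typedef float float32;
typedef int32_t fixed32; /* analogous to float32 */

#define MFCC_FRAC_BITS 16 /* hypothetical radix */

#ifdef FIXED_POINT
typedef fixed32 mfcc_t;
/* Scale by 2^MFCC_FRAC_BITS and round to nearest (evaluates x twice). */
#define FLOAT2MFCC(x) \
    ((mfcc_t)((x) * (float32)(1 << MFCC_FRAC_BITS) + ((x) < 0 ? -0.5f : 0.5f)))
#define MFCC2FLOAT(x) ((float32)(x) / (float32)(1 << MFCC_FRAC_BITS))
/* Full-width product, then drop the extra fractional bits (arithmetic shift). */
#define MFCCMUL(a, b) ((mfcc_t)(((int64_t)(a) * (b)) >> MFCC_FRAC_BITS))
#else
typedef float32 mfcc_t;
#define FLOAT2MFCC(x) (x)
#define MFCC2FLOAT(x) (x)
#define MFCCMUL(a, b) ((a) * (b))
#endif

/* Saturating 32-bit add/sub: clamp to the int32 range instead of wrapping. */
static inline int32_t sat_add32(int32_t a, int32_t b) {
    int64_t s = (int64_t)a + b;
    return s > INT32_MAX ? INT32_MAX : s < INT32_MIN ? INT32_MIN : (int32_t)s;
}

static inline int32_t sat_sub32(int32_t a, int32_t b) {
    int64_t s = (int64_t)a - b;
    return s > INT32_MAX ? INT32_MAX : s < INT32_MIN ? INT32_MIN : (int32_t)s;
}

#define GMMADD(a, b) sat_add32((a), (b))
#define GMMSUB(a, b) sat_sub32((a), (b))
```

Removing the FIXED_POINT define gives the floating-point build, where the conversion macros pass values through unchanged. On ARMv5TE cores such as XScale, the QADD and QSUB instructions perform the same saturating 32-bit operations in a single step.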

8
Future Work
  • Rationalize the file formats
  • General WinCE porting (Mohit)
  • Front-end optimization
    • Implement fixed-point FHT (fast Hartley transform)
  • Investigate Sphinx 3.x for embedded use
    • SubVQ (subvector quantization) and GS (Gaussian
      selection) can make it faster and cut memory
      consumption even more
    • Much nicer architecture
    • But not widely used, and the API is not as stable