Hardware Speech Recognition for User Interfaces in Low Cost, Low Power Devices

Provided by: dac97

Transcript and Presenter's Notes

1
Hardware Speech Recognition for User Interfaces
in Low Cost, Low Power Devices
Sergiu Nedevschi, Rabin Patra, Eric Brewer
University of California, Berkeley
2
Motivation
  • ICTs are empowering technologies, but only for
    those with access to them
  • Current technology is built on different
    assumptions about:
  • Cost
  • Power
  • User Interfaces
  • 3 billion people live on less than US$2000 per
    annum; 862 million adults are illiterate
  • User interfaces relying on low-cost, low-power
    hardware speech recognition
  • Makes IT accessible to illiterate and
    semi-literate people
  • Reduces cost by replacing expensive displays
  • Low power means lower battery costs

3
Solution Requirements
  • Low cost and low power
  • Flexibility
  • Various languages, vocabularies
  • Different recognition algorithms and coding
    techniques
  • Re-trainability
  • Scalability
  • Simple design, add more processors
  • Extensible to bigger vocabulary

4
Speech Recognition Basics
5
Speech Decoding Computation
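HMM Models for Speech Utterances
  • Transition Probabilities
  • Observation Probabilities
  • Multivariate Gaussian Mixtures

[Figure: HMM state sequence running from START to END, annotated with the observation probability b1(O1) and the path likelihood P(O, q1 q2 q4 q4 q5 q5 q6 | M)]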
6
Speech Decoding Computation
$\phi_j(t) = \max_i \left[ \phi_i(t-1)\, a_{ij} \right] b_j(O_t)$
7
$\phi_j(t) = \max_i \left[ \phi_i(t-1)\, a_{ij} \right] b_j(O_t)$
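
The recursion on this slide reads as the standard Viterbi update: the best score of state j at frame t is the best predecessor score times the transition probability, times the observation probability of the current frame. A minimal sketch in the log domain (sums instead of products) follows. The function name `viterbi_log` and the list-based layout of `log_a`, `log_b` and `log_init` are assumptions made for illustration, not details from the presentation.

```python
import math

NEG_INF = float("-inf")

def viterbi_log(log_a, log_b, log_init):
    """Return (best log likelihood, best state path).

    log_a[i][j] : log transition probability from state i to state j
    log_b[t][j] : log observation probability of frame t in state j
    log_init[j] : log probability of starting in state j
    """
    n_states = len(log_init)
    # phi[j] holds the log of phi_j(t) for the current frame t.
    phi = [log_init[j] + log_b[0][j] for j in range(n_states)]
    backptr = []
    for t in range(1, len(log_b)):
        new_phi, ptr = [], []
        for j in range(n_states):
            # max over predecessors i of phi_i(t-1) + log a_ij
            best_i = max(range(n_states), key=lambda i: phi[i] + log_a[i][j])
            new_phi.append(phi[best_i] + log_a[best_i][j] + log_b[t][j])
            ptr.append(best_i)
        phi = new_phi
        backptr.append(ptr)
    # Trace the best path back from the best final state.
    last = max(range(n_states), key=lambda j: phi[j])
    path = [last]
    for ptr in reversed(backptr):
        path.append(ptr[path[-1]])
    path.reverse()
    return phi[last], path
```

Working in logs turns the multiplications into additions, which is also what makes a small fixed-point data path plausible for this computation.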
8
Algorithmic Design Decisions
  • Use regular grammar language models
  • Unified recognition network (big HMM)
  • Use Token Passing Algorithm for Likelihood
    Computation
  • Reduce complexity by partitioning vocabulary into
    active sets of words (< 100 words each)
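
As an illustration of token passing over a unified network, the sketch below advances every token by one frame and keeps only the best token arriving at each state. The `Token` class, the `pass_tokens` function and the dictionary-based network layout are hypothetical names chosen for this example; the presentation does not describe its data structures.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Token:
    score: float        # accumulated log likelihood of the best path so far
    words: tuple = ()   # word labels collected along that path

def pass_tokens(tokens, arcs, log_b_t, word_at):
    """Advance all tokens by one frame.

    tokens  : dict state -> Token currently held by that state
    arcs    : dict state -> list of (next_state, log transition probability)
    log_b_t : dict state -> log observation probability for this frame
    word_at : dict state -> word label recorded when a token enters the state
    """
    new_tokens = {}
    for state, tok in tokens.items():
        for nxt, log_a in arcs.get(state, []):
            score = tok.score + log_a + log_b_t[nxt]
            words = tok.words
            # Record the word only when the token enters from another state.
            if nxt in word_at and nxt != state:
                words = words + (word_at[nxt],)
            best = new_tokens.get(nxt)
            if best is None or score > best.score:
                new_tokens[nxt] = Token(score, words)
    return new_tokens

# Toy network: state 0 is a start state, states 1 and 2 model two words.
arcs = {
    0: [(0, math.log(0.5)), (1, math.log(0.25)), (2, math.log(0.25))],
    1: [(1, 0.0)],
    2: [(2, 0.0)],
}
word_at = {1: "yes", 2: "no"}
frame = {0: math.log(0.1), 1: math.log(0.7), 2: math.log(0.2)}

tokens = {0: Token(0.0)}
for _ in range(2):
    tokens = pass_tokens(tokens, arcs, frame, word_at)
best = max(tokens.values(), key=lambda tok: tok.score)
```

Each token carries its own word history, so the recognised phrase can be read from the best token at the end without a separate traceback.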

9
Hardware Design Decisions
  • Parallel design
  • Reduced frequency, voltage scaling
  • Set of small simple Processing Elements (PEs)
  • On-chip embedded FLASH and SRAM
  • No additional packaging, shorter wires, lower
    voltages
  • Multiple memory modules can deliver high
    throughput at low frequencies
  • Scaled fixed-point arithmetic
  • Much simpler and smaller, with almost no
    accuracy penalty
  • Data scaling determined by search for each
    operator
  • Single-cycle data path and gated clocks
  • Low frequency allows for long critical paths
    and gated clocks
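
To make the scaled fixed-point idea concrete, here is a minimal sketch: values are stored as integers with a chosen number of fractional bits, and the scale is picked by a simple search so that the largest value still fits the word. The 16-bit word size, the function names and the search strategy are all assumptions for illustration; the slides do not state how the per-operator search was performed.

```python
def word_limits(word_bits):
    """Smallest and largest signed integers for the given word size."""
    return -(1 << (word_bits - 1)), (1 << (word_bits - 1)) - 1

def to_fixed(x, frac_bits, word_bits=16):
    """Quantise a real value to a signed fixed-point integer, saturating."""
    lo, hi = word_limits(word_bits)
    return max(lo, min(hi, round(x * (1 << frac_bits))))

def to_float(q, frac_bits):
    """Convert a fixed-point integer back to a real value."""
    return q / (1 << frac_bits)

def choose_frac_bits(values, word_bits=16):
    """Search for the most fractional bits at which no value saturates."""
    _, hi = word_limits(word_bits)
    peak = max(abs(v) for v in values)
    for frac_bits in range(word_bits - 1, -1, -1):
        if round(peak * (1 << frac_bits)) <= hi:
            return frac_bits
    raise ValueError("values do not fit in the given word size")

def fixed_mul(a, a_frac, b, b_frac, out_frac, word_bits=16):
    """Multiply two fixed-point values and rescale to out_frac bits."""
    prod = a * b                         # carries a_frac + b_frac fractional bits
    shift = a_frac + b_frac - out_frac
    # Right shift on Python ints floors, like an arithmetic shift in hardware.
    prod = prod >> shift if shift >= 0 else prod << -shift
    lo, hi = word_limits(word_bits)
    return max(lo, min(hi, prod))

frac = choose_frac_bits([0.5, -1.75, 3.2])
product = fixed_mul(to_fixed(1.5, 8), 8, to_fixed(-2.25, 8), 8, 8)
```

Choosing the scale per operator rather than globally lets each multiplier or adder use the full precision of its word for the range of values it actually sees.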

10
Architecture
  • General-purpose CPU
  • Computes the PE allocation and initialization
  • Provides the observation vectors from the DSP
    front-end
  • Processing Elements
  • Assigned a set of Gaussians and a set of HMM
    nodes
  • Aggregators
  • Aggregate and filter data from PEs, pass results
    back to PEs

11
Processing Element
12
Workload Allocation
  • 3 important steps
  • Language loading
  • Phoneme speech models loaded
  • Common pool of phonemes
  • Application loading
  • Active set of words loaded
  • User Interface Context loading
  • Regular grammar for allowable phrases
  • Word Interconnections
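
One simple way to picture a UI context is as a table of allowed word successors, which is a restricted form of regular grammar. The sketch below is a hypothetical encoding for illustration only; the `<s>` and `</s>` markers, the example vocabulary and the `phrase_allowed` helper are not taken from the presentation.

```python
def phrase_allowed(grammar, phrase, start="<s>", end="</s>"):
    """Check a word sequence against a successor-table grammar.

    grammar : dict word -> set of words allowed to follow it
    phrase  : list of words to check
    """
    prev = start
    for word in phrase:
        if word not in grammar.get(prev, ()):
            return False
        prev = word
    # The phrase is complete only if it may end after the last word.
    return end in grammar.get(prev, ())

# Hypothetical UI context: which words may follow which.
context = {
    "<s>": {"call", "play"},
    "call": {"home", "doctor"},
    "play": {"radio"},
    "home": {"</s>"},
    "doctor": {"</s>"},
    "radio": {"</s>"},
}
```

Under this reading, switching UI context only replaces the successor table, while the phoneme models and the active word set stay loaded.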

13
Observation Probability Scheduling
  • Computation
  • Multivariate Gaussian mixtures
  • Common pool of phonemes, often repeated
  • Solution
  • Phonemes equally divided among PEs
  • Results propagated by aggregator
  • Performed at language load time
  • Parameters written to every PE's local FLASH
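
The scheduling idea can be sketched as follows: phonemes are dealt out evenly across PEs, each PE evaluates the Gaussian mixtures for its share, and an aggregation step merges the per-PE results so every PE can see all scores. Diagonal covariances, the round-robin split and all function names here are assumptions for the example; the slides only say that phonemes are divided equally and that results are propagated by the aggregator.

```python
import math

def log_gaussian_diag(obs, mean, var):
    """Log density of a diagonal-covariance Gaussian (assumed form)."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (o - m) ** 2 / v
        for o, m, v in zip(obs, mean, var)
    )

def log_mixture(obs, components):
    """Log likelihood of a mixture; components are (weight, mean, var)."""
    terms = [math.log(w) + log_gaussian_diag(obs, m, v) for w, m, v in components]
    peak = max(terms)
    # Log-sum-exp, shifted by the largest term for numerical stability.
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def split_phonemes(phonemes, n_pes):
    """Deal phonemes round-robin so PE shares differ by at most one."""
    return [phonemes[k::n_pes] for k in range(n_pes)]

def score_frame(obs, models, n_pes):
    """Score one observation vector: each PE handles its share of phonemes."""
    shares = split_phonemes(sorted(models), n_pes)
    per_pe = [{p: log_mixture(obs, models[p]) for p in share} for share in shares]
    merged = {}
    for result in per_pe:       # aggregator step: combine per-PE results
        merged.update(result)
    return merged

# Toy one-dimensional models for three phonemes.
models = {
    "a": [(1.0, [0.0], [1.0])],
    "b": [(0.5, [0.0], [1.0]), (0.5, [0.0], [1.0])],
    "c": [(1.0, [1.0], [1.0])],
}
scores = score_frame([0.0], models, n_pes=2)
```

Because many words share the same phonemes, each mixture is evaluated once per frame and the result reused, rather than recomputed for every word that contains it.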

14
Token Probability Scheduling
$\phi_j(t) = \max_i \left[ \phi_i(t-1)\, a_{ij} \right] b_j(O_t)$
  • Best if adjacent states assigned to same PE
  • Step 1 at application load time
  • Full words assigned to PEs
  • Writing RAM of each PE
  • Step 2 at UI context change
  • Word interconnections handled by aggregator
  • Writing the aggregators' RAM at each UI context
    change
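
Step 1 can be pictured as a load-balancing problem: whole words are placed on PEs so that adjacent states stay together and the state counts per PE stay roughly even. The greedy heuristic below (largest word first, onto the least loaded PE) is an assumption chosen for illustration, not the allocation method described by the authors; the word names and state counts are made up.

```python
import heapq

def assign_words(word_states, n_pes):
    """Assign whole words to PEs, balancing the number of HMM states.

    word_states : dict word -> number of HMM states in that word
    Returns a list with one list of words per PE.
    """
    heap = [(0, pe) for pe in range(n_pes)]      # (current load, PE index)
    heapq.heapify(heap)
    assignment = [[] for _ in range(n_pes)]
    # Place the largest words first; ties are broken alphabetically.
    for word in sorted(word_states, key=lambda w: (-word_states[w], w)):
        load, pe = heapq.heappop(heap)           # least loaded PE
        assignment[pe].append(word)
        heapq.heappush(heap, (load + word_states[word], pe))
    return assignment

# Hypothetical active word set with per-word state counts.
active_set = {"yes": 9, "no": 6, "stop": 12, "go": 6, "help": 12}
placement = assign_words(active_set, n_pes=2)
```

Keeping a word's states on one PE means the within-word part of the token update needs no communication; only word-to-word transitions go through the aggregator.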

15
Implementation
  • HTK (Hidden Markov Model Toolkit)-based software
    simulator
  • FPGA implementation on BEE (Berkeley Emulation
    Engine)
  • A real-time hardware emulation engine using 20
    high-density Xilinx Virtex-E FPGA chips
  • Needed due to memory constraints
  • Memory contents loaded at synthesis time
  • ASIC implementation
  • Synthesised in a 0.18 micron CMOS process, at 1.08 V
  • Synopsys Design Compiler

16
Area and Power Estimates
  • Area estimate (for 8 PEs): 2.5 mm²
  • Power estimate (at 5 MHz): 20 mW
  • 5 mW excluding memory (only 0.1 mW leakage)
  • 15 mW in memory (12.5 mW in FLASH, 2.5 mW in
    SRAM)

17
Energy Savings
  • Compare with power consumption for a low-power
    general purpose ARM processor with similar
    throughput
  • 2 methods to estimate power consumption for ARM

18
Language Independence
  • Tested our recognizer (software simulator) on a
    Tamil dataset
  • 4600 speech samples, 30 Tamil speakers
  • Collection performed in 3 villages in Tamil Nadu
    by briefly trained volunteers
  • Simple word-based recognition

19
Envisioned Recognition Platform
20
Questions/Comments?