Speech Enabling Mobile Devices using a Java Based Distributive Speech Recognition System


1
Speech Enabling Mobile Devices using a Java Based
Distributive Speech Recognition System
  • Speech Technology and Research Group
  • Department of Electrical Engineering
  • University of Cape Town
  • Dale Isaacs

2
Overview
  • Motivation for Research
  • Applications
  • Background
  • Current DSR Standards
  • Implementation of J2ME Front-End
  • Implementation of SPHINX4 Back-End
  • Preliminary Results
  • Summary

3
Motivation for Research
  • Increased performance of mobile phones
  • Advances in wireless technology
  • Rapid evolution in mobile phone services
  • Ubiquitous computing
  • Users forced to use small keypads and screens
  • Speech is the most natural form of communication
  • Advantageous for blind users

4
Motivation for Research
  • In 2005: 2,168,433,600 mobile devices worldwide
  • 708 million Java-equipped handsets
  • Currently there are 2.5 billion mobile devices
  • Number of Java-equipped phones has increased
  • Current working DSR implementations are written
    only in C
  • Implement a speech recognition system using a J2ME
    Front-End
  • Sphinx4 Back-End (Java based)

5
Applications
  • Hands-free communication with devices
  • Voice navigation, Voice Dialling
  • Dictate outgoing SMSs
  • Replay incoming SMSs
  • Security feature: Speaker Identification /
    Speaker Verification
  • Access to network based services
  • Give blind users easy access to new devices

6
Background
  • Speech Technology
  • Speech Synthesizer: Text-to-Speech
  • Speech Recognizer: Automatic Speech Recognition
    (Speech-to-Text)
  • Speaker Identification: identify a person by the
    sound of their voice
  • Speaker Verification: is the person who he/she
    claims to be?
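
The difference between the last two tasks can be sketched in a few lines. This is only an illustration: the class `SpeakerTasks`, the score map and the threshold are hypothetical, and the scores are assumed to be similarity values where higher means a better match.

```java
import java.util.Map;

/** Illustrative only: scores are hypothetical similarity values, higher = better match. */
final class SpeakerTasks {
    /** Identification (1:N): return the enrolled speaker with the highest score. */
    static String identify(Map<String, Double> scoresBySpeaker) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> e : scoresBySpeaker.entrySet()) {
            if (e.getValue() > bestScore) {
                bestScore = e.getValue();
                best = e.getKey();
            }
        }
        return best;
    }

    /** Verification (1:1): accept the claimed identity only if its score clears a threshold. */
    static boolean verify(Map<String, Double> scoresBySpeaker, String claimed, double threshold) {
        Double score = scoresBySpeaker.get(claimed);
        return score != null && score >= threshold;
    }
}
```

Identification always answers with one of the enrolled speakers, while verification answers yes or no for a single claimed identity.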

7
Background
  • Three ways to implement an Automatic Speech
    Recognition (ASR) System on a mobile device
  • Embedded Speech Recognition
  • Network Speech Recognition
  • Distributed Speech Recognition

8
Embedded Speech Recognition
[Diagram: Speech Input -> Feature Extractor -> Features -> Pattern Recognizer -> Recognition Decision; the built-in speech recognizer sits entirely on the Mobile Device]
  • All components sit on mobile device
  • Mobile devices still lack processing power
  • Suited to voice navigation / command and control
    with a small vocabulary

9
Network Speech Recognition
[Diagram: Client (Speech Input -> Speech Encoder) -> Server (Speech Decoder -> Feature Extraction -> Pattern Recognizer -> Recognition Decision)]
  • Client Server architecture
  • High bit-rate requirement

10
Distributed Speech Recognition
[Diagram: Client / Front-End (Speech Input -> Feature Extraction -> Encoder) -> Server / Back-End (Speech Decoder -> Pattern Recognizer -> Recognition Decision)]
  • Also Client Server architecture
  • Feature Extraction done on client side
  • Work load evenly distributed
  • Low bit-rate requirement
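
A rough calculation shows what "low bit-rate" can mean here. The numbers are assumptions recalled from ES 201 108 and are not on the slides, so check them against the standard: 44 bits per 10 ms frame, frames sent in pairs with a 4-bit CRC, and 24 frames per multiframe with a 48-bit header. Uncompressed 8 kHz, 16-bit PCM is used only as a reference point; an NSR system would send codec-compressed speech, and its rate depends on the codec. The class name `BitRateSketch` is hypothetical.

```java
/** Back-of-envelope bit-rate comparison; all constants are assumptions stated above. */
final class BitRateSketch {
    /** Bits per second for uncompressed PCM audio. */
    static int pcmBitsPerSecond(int sampleRateHz, int bitsPerSample) {
        return sampleRateHz * bitsPerSample;
    }

    /** Bits per second for a feature stream sent as fixed-size multiframes. */
    static int multiframeBitsPerSecond(int framesPerMultiframe, int frameShiftMs,
                                       int bitsPerFramePair, int headerBits) {
        int framePairs = framesPerMultiframe / 2;
        int bitsPerMultiframe = headerBits + framePairs * bitsPerFramePair;
        int multiframeMs = framesPerMultiframe * frameShiftMs;
        return bitsPerMultiframe * 1000 / multiframeMs;
    }

    public static void main(String[] args) {
        int pcm = pcmBitsPerSecond(8000, 16);
        // Frame pair = 2 frames x 44 bits + 4 CRC bits (assumed values).
        int dsr = multiframeBitsPerSecond(24, 10, 2 * 44 + 4, 48);
        System.out.println("Raw PCM:      " + pcm + " bit/s");
        System.out.println("DSR features: " + dsr + " bit/s");
    }
}
```

Under these assumptions the feature stream needs 4800 bit/s against 128000 bit/s for raw PCM.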

11
Current DSR Standards
  • In 2000, the European Telecommunications Standards
    Institute (ETSI) released the first standard for
    feature extraction in front-ends: ES 201 108
  • In 2002, ETSI released ES 202 050, adding noise
    reduction
  • In 2003, ETSI released ES 202 211 and ES 202 212,
    allowing reconstruction of intelligible speech
  • Implementations of all of the above standards
    exist only in C
  • Any speech recognizer can be used at the
    Back-End, e.g. ISIP, HTK or SPHINX

12
Implementation of J2ME Front-End
  • Based on ES 201 108
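
The early time-domain stages of a front-end of this kind (offset compensation, framing, log energy, pre-emphasis, windowing) can be sketched as below. This is a minimal sketch, not the presented implementation. The class `FrontEndSketch` is hypothetical. The constants (8 kHz sampling, 25 ms frames, 10 ms shift, filter coefficients 0.999 and 0.97, energy floor of -50) are assumptions recalled from ES 201 108. The Hamming window is the textbook form and may differ in detail from the standard. The code is plain Java SE for readability and would need adapting for J2ME.

```java
/** Sketch of early front-end stages; constants are assumptions, not quoted from the slides. */
final class FrontEndSketch {
    static final int FRAME_LENGTH = 200; // 25 ms at 8 kHz (assumed)
    static final int FRAME_SHIFT = 80;   // 10 ms at 8 kHz (assumed)

    /** Offset compensation: a notch filter that removes the DC component. */
    static double[] removeOffset(double[] in) {
        double[] out = new double[in.length];
        double prevIn = 0.0, prevOut = 0.0;
        for (int n = 0; n < in.length; n++) {
            out[n] = in[n] - prevIn + 0.999 * prevOut;
            prevIn = in[n];
            prevOut = out[n];
        }
        return out;
    }

    /** Number of complete frames available in a signal of the given length. */
    static int frameCount(int numSamples) {
        return numSamples < FRAME_LENGTH ? 0 : 1 + (numSamples - FRAME_LENGTH) / FRAME_SHIFT;
    }

    /** Log energy of one frame, floored so that silent frames stay finite. */
    static double logEnergy(double[] signal, int start, int length) {
        double energy = 0.0;
        for (int i = start; i < start + length; i++) {
            energy += signal[i] * signal[i];
        }
        return energy > 0.0 ? Math.max(Math.log(energy), -50.0) : -50.0;
    }

    /** Copy one frame, apply pre-emphasis, then a Hamming window (length must be > 1). */
    static double[] preEmphasizeAndWindow(double[] signal, int start, int length) {
        double[] frame = new double[length];
        for (int i = 0; i < length; i++) {
            double prev = (start + i > 0) ? signal[start + i - 1] : 0.0;
            double emphasized = signal[start + i] - 0.97 * prev;
            double w = 0.54 - 0.46 * Math.cos(2.0 * Math.PI * i / (length - 1));
            frame[i] = emphasized * w;
        }
        return frame;
    }
}
```

Each windowed frame would then go through the FFT, mel filtering, logarithm and DCT stages to give the cepstral coefficients.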

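[Diagram: Speech In -> ADC -> OFFCOM (offset compensation) -> FRAMING -> PE (pre-emphasis) -> W (windowing) -> FFT -> MF (mel filtering) -> LOG -> DCT -> 13 Cepstral Coefficients, with LogE (log energy) computed in a parallel branch. The cepstral coefficients and LogE pass through FEATURE COMPRESSION and BIT STREAM FORMATTING / FRAMING, giving 14 Compressed Cepstral Coefficients to Transmission Channel]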
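
The feature compression step can be illustrated with split vector quantization, which is the approach ES 201 108 takes as far as recalled here: the 14 features are grouped into pairs and each pair is replaced by the index of its nearest codebook entry. The sketch below makes several assumptions. The class `SplitVqSketch` is hypothetical, any codebook passed in is made up by the caller (the standard defines its own tables), and plain squared Euclidean distance is used where the standard may apply weighting.

```java
/** Split vector quantization sketch; real codebooks come from the standard, not from here. */
final class SplitVqSketch {
    /** Index of the codebook entry closest (squared Euclidean distance) to the pair (a, b). */
    static int nearest(double[][] codebook, double a, double b) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int i = 0; i < codebook.length; i++) {
            double da = a - codebook[i][0];
            double db = b - codebook[i][1];
            double dist = da * da + db * db;
            if (dist < bestDist) {
                bestDist = dist;
                best = i;
            }
        }
        return best;
    }

    /** Client side: quantize consecutive feature pairs, one codebook per pair. */
    static int[] encode(double[] features, double[][][] codebooks) {
        int[] indices = new int[features.length / 2];
        for (int p = 0; p < indices.length; p++) {
            indices[p] = nearest(codebooks[p], features[2 * p], features[2 * p + 1]);
        }
        return indices;
    }

    /** Server side: look the indices up again to rebuild approximate features. */
    static double[] decode(int[] indices, double[][][] codebooks) {
        double[] features = new double[indices.length * 2];
        for (int p = 0; p < indices.length; p++) {
            features[2 * p] = codebooks[p][indices[p]][0];
            features[2 * p + 1] = codebooks[p][indices[p]][1];
        }
        return features;
    }
}
```

Only the indices travel over the channel, so the back-end must hold the same codebooks to reconstruct the features.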
13
Implementation of SPHINX4 Back-End
[Diagram: SPHINX4 back-end architecture. Application (Control, Result, Tools and Utilities); Decoder containing the Search Manager (ActiveList, Pruner, Scorer); Linguist (AcousticModel, Dictionary, LanguageModel) providing the Search Graph; Configuration Manager. Compressed Features from the Transmission Channel enter through the Speech Decoder and Feature Reconstructor.]

14
Preliminary Results
  • WER (Word Error Rate): lower is better
  • Tested the Back-End (stand-alone) using existing
    speech databases
  • Values are what we expect from this recognizer
  • Once we link the Front-End, expect results to be
    acceptable but less impressive due to noise and
    errors over the transmission channel
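
For reference, WER is usually computed as (substitutions + deletions + insertions) divided by the number of words in the reference transcript, using a word-level edit distance. The sketch below shows that standard calculation. The class `WordErrorRate` is hypothetical and is not the scoring tool used for these results.

```java
/** Word error rate (WER) computed from a word-level edit distance. */
final class WordErrorRate {
    /** Minimum number of substitutions, deletions and insertions turning ref into hyp. */
    static int editDistance(String[] ref, String[] hyp) {
        int[][] d = new int[ref.length + 1][hyp.length + 1];
        for (int i = 0; i <= ref.length; i++) d[i][0] = i; // delete all reference words
        for (int j = 0; j <= hyp.length; j++) d[0][j] = j; // insert all hypothesis words
        for (int i = 1; i <= ref.length; i++) {
            for (int j = 1; j <= hyp.length; j++) {
                int sub = d[i - 1][j - 1] + (ref[i - 1].equals(hyp[j - 1]) ? 0 : 1);
                int del = d[i - 1][j] + 1;
                int ins = d[i][j - 1] + 1;
                d[i][j] = Math.min(sub, Math.min(del, ins));
            }
        }
        return d[ref.length][hyp.length];
    }

    /** Split on whitespace; an empty or blank string yields no words. */
    static String[] words(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    /** WER = (S + D + I) / N, with N the number of reference words. */
    static double wer(String reference, String hypothesis) {
        String[] ref = words(reference);
        if (ref.length == 0) {
            throw new IllegalArgumentException("reference must contain at least one word");
        }
        return (double) editDistance(ref, words(hypothesis)) / ref.length;
    }
}
```

For example, `WordErrorRate.wer("call home now", "call phone")` gives 2/3: one substitution and one deletion against three reference words.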

15
Summary
  • Implementing speech recognition will improve user
    experience with their devices
  • DSR is the best solution for implementing speech
    recognition on mobile devices
  • More Java-enabled handsets are in use
  • Hence implement a Java-based system
  • Preliminary results look promising at Back-End
  • Aim is to match the baseline results of the C
    equivalent

16
Thank You
  • Questions?
  • Email: dale_at_crg.ee.uct.ac.za