Title: John McCoey
1. Methods for Improving Readability of Speech Recognition Transcripts
2. What is a Speech Recognition Transcript?
- Direct output from a Speech-to-Text Translation (STT) system
- Uses
- Everyday Communication and Word Processing
- Recording lectures/court cases
- Assistive technology for the hearing impaired
- Especially focusing on classroom settings
3. What is Readability?
- Readability is not the same as word accuracy
- Change of speaker, change of thought, accent, pauses, disfluent words, capitalization, punctuation
- Example
Images from Jones, Douglas, et al., "Measuring the Readability of Automatic Speech-to-Text Transcripts," Proc. Eurospeech, pp. 1585-1588, 2003.
4. Measuring Readability
- Definitive measure for scoring the Accuracy and Readability of a Speech-to-Text Transcript
- Word Accuracy Percentage Score
- (Words Spoken - Word Errors) / Words Spoken × 100
- Readability Percentage Score
- ((Words Spoken + Sentences + Speaker Changes) - (Word Errors + Sentence Errors + Speaker Change Errors)) / (Words Spoken + Sentences + Speaker Changes) × 100
R. Stuckless. "Recognition means more than just getting the words right: Beyond accuracy to readability." Speech Technology, Oct./Nov. 1999, pp. 30-35.
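The two percentage scores on this slide can be worked through in a few lines of Python. This is a minimal sketch, assuming the usual reading of the formulas (errors are subtracted from the total count, and the ratio is scaled to a percentage). The function and parameter names are illustrative and do not come from Stuckless's article, and the counts in the example are made up.

```python
def word_accuracy(words_spoken, word_errors):
    """Word Accuracy Percentage Score: share of spoken words transcribed correctly."""
    return 100.0 * (words_spoken - word_errors) / words_spoken


def readability(words_spoken, sentences, speaker_changes,
                word_errors, sentence_errors, speaker_change_errors):
    """Readability Percentage Score: also counts sentence and speaker-change markers."""
    # Everything the transcript should have conveyed.
    total_items = words_spoken + sentences + speaker_changes
    # Everything it got wrong or left out.
    total_errors = word_errors + sentence_errors + speaker_change_errors
    return 100.0 * (total_items - total_errors) / total_items


# Made-up counts for illustration only.
accuracy = word_accuracy(words_spoken=200, word_errors=10)
score = readability(words_spoken=200, sentences=15, speaker_changes=5,
                    word_errors=10, sentence_errors=5, speaker_change_errors=3)
print(f"word accuracy: {accuracy:.1f}%  readability: {score:.1f}%")
```

With these counts the readability score comes out lower than the word accuracy score, because missed sentence boundaries and speaker changes count as errors even when every word is correct.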
5. What Factors Negatively Affect Readability?
- Number of recognizable words to search
- What if a word isn't recognized?
- Discontinuous words, pauses, or unexpected changes of thought
- Capitalization and punctuation
- Implementation
- Change in speaker
6. STT Systems and Algorithms
- CMU Sphinx
- Uses a large vocabulary and Hidden Markov Models to determine the probability of the next spoken word
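To make the Hidden Markov Model idea concrete, here is a minimal sketch of one forward (predict, then update) step over a two-word vocabulary. It is not Sphinx's implementation: the words, the acoustic labels "s" and "b", and every probability below are hypothetical toy values chosen only to show the mechanics.

```python
def forward_step(belief, trans_p, emit_p, observation):
    """One HMM forward step: predict the next word, then weigh it by the audio evidence."""
    # Predict: probability of each next word given the current belief
    # and the word-to-word transition probabilities.
    predicted = {
        nxt: sum(belief[prev] * trans_p[prev][nxt] for prev in belief)
        for nxt in emit_p
    }
    # Update: scale by how likely each word is to produce the observed sound.
    unnormalized = {w: predicted[w] * emit_p[w][observation] for w in predicted}
    total = sum(unnormalized.values())
    return {w: p / total for w, p in unnormalized.items()}


# Hypothetical toy model (not taken from Sphinx or from the slides).
belief = {"speech": 0.5, "beach": 0.5}
trans_p = {
    "speech": {"speech": 0.7, "beach": 0.3},
    "beach": {"speech": 0.4, "beach": 0.6},
}
emit_p = {
    "speech": {"s": 0.9, "b": 0.1},
    "beach": {"s": 0.2, "b": 0.8},
}

posterior = forward_step(belief, trans_p, emit_p, observation="s")
best_word = max(posterior, key=posterior.get)
print(best_word, posterior)
```

A real recognizer repeats this kind of step over many acoustic frames and a far larger vocabulary, which is why the size of the search space matters for both speed and accuracy.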
7. STT Systems and Algorithms
Mosur K. Ravishankar. "Efficient Algorithms for Speech Recognition." Ph.D. Thesis, Technical Report CMU-CS-96-143, Computer Science Department, Carnegie Mellon University, 1996.
8. VUST System
Image from Richard Kheir and Thomas Way. "Inclusion of Deaf Students in Computer Science Classes using Real-Time Speech Transcription." ITiCSE'07. Applied Computing Technology Laboratory, Department of Computing Sciences, Villanova University, 2007.
9. Classroom Use
- Real-time Text Display
- Disadvantages?
- Note-taking / Study Guide
- Missed class
- Review for later
- Personal laptop connection
- Accessed only by individuals who require access (hearing impaired)
- Ability to save to a personal computer for use as a future study guide
10. Proposed Work
- Incorporate Pauses in Training / DiBS
- Pause Detection Software
- Short
- Comma, semicolon
- Normal
- End of sentence
- Long
- End of paragraph, change of speaker, etc.
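The pause categories above could be mapped to punctuation roughly as follows. This is a sketch under stated assumptions, not the proposed system itself: the threshold values, the extra "none" category for negligible pauses, the choice of a comma for short pauses, and the input format (each word paired with the pause in seconds that follows it) are all hypothetical.

```python
# Hypothetical thresholds in seconds; real values would need tuning per speaker.
NONE_MAX = 0.15    # below this, treat as no pause at all (assumption)
SHORT_MAX = 0.4    # short pause: comma or semicolon
NORMAL_MAX = 1.0   # normal pause: end of sentence; anything longer is "long"

# Text appended after a word for each pause category.
SUFFIX = {"none": " ", "short": ", ", "normal": ". ", "long": ".\n\n"}


def classify_pause(duration_s):
    """Bucket a pause duration into none / short / normal / long."""
    if duration_s < NONE_MAX:
        return "none"
    if duration_s < SHORT_MAX:
        return "short"
    if duration_s < NORMAL_MAX:
        return "normal"
    return "long"


def punctuate(timed_words):
    """Insert punctuation and paragraph breaks from (word, pause_after) pairs."""
    pieces = []
    capitalize_next = True  # the first word starts a sentence
    for word, pause in timed_words:
        if capitalize_next:
            word = word[:1].upper() + word[1:]
        kind = classify_pause(pause)
        pieces.append(word + SUFFIX[kind])
        # A normal or long pause ends the sentence, so capitalize what follows.
        capitalize_next = kind in ("normal", "long")
    return "".join(pieces).strip()


sample = [("hello", 0.2), ("everyone", 0.6), ("today", 0.05),
          ("we", 0.0), ("begin", 1.5), ("thanks", 0.0)]
print(punctuate(sample))
```

A long pause is rendered here as a paragraph break only; telling a change of speaker apart from a new paragraph would need additional evidence beyond pause length.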
11. Any Questions?
- John McCoey
- CSC 3990-001
- Villanova University
- October 24, 2007