NAME-AWARE SPEECH RECOGNITION FOR INTERACTIVE QUESTION ANSWERING


Original TREC question: How many times has Limbaugh been married? Target NE: Rush Limbaugh. Modified with NE: How many times has Rush Limbaugh been married? ...


Provided by: dtur

Svetlana Stoyanchev (3), Gokhan Tur (2), Dilek Hakkani-Tür (1)
(1) International Computer Science Institute (ICSI), Speech Group, Berkeley, CA, USA
(2) SRI International, Speech Technology and Research (STAR) Lab., Menlo Park, CA, USA
(3) State University of New York (SUNY), Stony Brook, NY, USA
svetastenchikova_at_gmail.com, gokhan_at_speech.sri.com, dilek_at_icsi.berkeley.edu
INTRODUCTION

Goal: Improving speech recognition in a voice-enabled question answering application through interactivity.
Question answering (QA) is a natural language interface for information retrieval.
TREC is a yearly competition. Participants are given a document set and a set of questions: factoid (who, what, when, where), list, or other (find other relevant information on the given topic).

APPROACH

Allow interactivity in specifying a named entity.
Recognize a named entity using grammars and retrieve matching documents.
Build a question-specific language model.

DATASET

40 questions from the TREC 2007 competition.
AQUAINT corpus (3-gigabyte document collection).

EXPERIMENTS AND RESULTS
Evaluation: 3 speakers read 40 questions twice.
Set 3: 40 questions with a named entity (NE).
Set 4: 40 questions without a named entity.
CONTROL FLOW

The system asks the user to specify a named entity (1).
The named entity is recognized using a grammar constructed from a database of named entities (2).
Next, we extract documents matching the named entity (3).
Using these documents, we build a named entity-specific language model (4).
In parallel, we build a language model using TREC questions (5).
We merge the two language models (6).
In the final step, we recognize the question using the new language model (7).
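
Steps (4) to (6) can be illustrated with a small sketch. The slides do not say how the two language models are merged, so the linear interpolation below, the unigram models, the function names, and the weight `ne_weight` are all assumptions chosen for illustration, not the authors' implementation.

```python
from collections import Counter


def unigram_model(texts):
    """Maximum-likelihood unigram probabilities from a list of strings."""
    counts = Counter(word for text in texts for word in text.lower().split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def merge_models(ne_model, question_model, ne_weight=0.5):
    """Linear interpolation (an assumed merge strategy):
    p(w) = ne_weight * p_ne(w) + (1 - ne_weight) * p_question(w)."""
    vocabulary = set(ne_model) | set(question_model)
    return {
        word: ne_weight * ne_model.get(word, 0.0)
        + (1.0 - ne_weight) * question_model.get(word, 0.0)
        for word in vocabulary
    }


# Toy stand-ins for the retrieved documents (3) and the TREC questions (5).
ne_documents = ["brown resigned as head of fema"]
trec_questions = [
    "on what date did he resign",
    "in what film is he the main character",
]

ne_lm = unigram_model(ne_documents)            # step (4)
question_lm = unigram_model(trec_questions)    # step (5)
merged_lm = merge_models(ne_lm, question_lm)   # step (6)
```

With this kind of merge, words that occur only in the named-entity documents (here "fema") still receive probability mass in the model used for the final recognition pass, which is the effect the control flow is after. A real recognizer would use higher-order n-grams with smoothing.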

Example question: On what date did Michael Brown resign as a head of FEMA?
Identify that the target is a DATE, and identify the named entities Michael Brown and FEMA.
NE: Michael Brown, FEMA. Phrases: head of FEMA. Verb: resigned.
Search can use the web or a document collection.
Find candidate sentences and identify DATEs:
"On September 12, 2005, in the wake of what was widely believed to be feckless handling of the aftermath of Hurricane Katrina and facing allegations that he had falsified portions of his résumé, Brown resigned." (It is mentioned earlier in the document that Michael Brown was a head of FEMA.)
"After his September 12 resignation, Brown continued working for FEMA."
Cheating model (contains questions)
Candidate answers: September 12, 2005; September 12.
Identify the match between September 12, 2005 and September 12.
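
The matching of candidate answers can be sketched as follows. This is a minimal illustration, assuming that two dates match when the month and day agree and the years do not conflict; the regular expression and the helper names `parse_date` and `dates_match` are hypothetical and not taken from the slides.

```python
import re

MONTHS = [
    "january", "february", "march", "april", "may", "june",
    "july", "august", "september", "october", "november", "december",
]

# Month name, a one- or two-digit day, and an optional four-digit year.
DATE_PATTERN = re.compile(
    r"\b(" + "|".join(MONTHS) + r")\s*(\d{1,2})(?!\d)(?:,\s*(\d{4}))?",
    re.IGNORECASE,
)


def parse_date(text):
    """Return (month, day, year) for the first date found; year may be None."""
    match = DATE_PATTERN.search(text)
    if match is None:
        return None
    month = MONTHS.index(match.group(1).lower()) + 1
    day = int(match.group(2))
    year = int(match.group(3)) if match.group(3) else None
    return (month, day, year)


def dates_match(first, second):
    """Assumed rule: same month and day, and years equal or one of them missing."""
    a, b = parse_date(first), parse_date(second)
    if a is None or b is None:
        return False
    same_day = a[:2] == b[:2]
    years_compatible = a[2] is None or b[2] is None or a[2] == b[2]
    return same_day and years_compatible


print(dates_match("September 12, 2005", "After his September 12 resignation"))
```

Under this rule the two candidate answers from the example sentences are treated as the same answer, while dates that differ in day or in an explicitly stated year are kept apart.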

Motivation
The word error rates of state-of-the-art open-domain speech recognition technology are around 25-30%.
Performance is known to be even worse for names and rare words.
Named entities (NE) are strongly associated with the content words. For example, for the target name Gordon Gekko, one question used in the TREC 2004 evaluations is "In what film is Gordon Gekko the main character?", which includes non-function words related to the movie industry, such as "film" or "character".
The goal is to capture content words using the documents where the named entity appears frequently.
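
As a rough illustration of this idea, the sketch below ranks documents by how often the named entity is mentioned and then counts the remaining content words. The ranking by raw mention count, the tiny function-word list, and the function names are assumptions made for the example; the slides do not describe the retrieval method in this detail.

```python
from collections import Counter

# A deliberately tiny, assumed function-word list for the toy example.
FUNCTION_WORDS = {"the", "a", "an", "in", "is", "of", "and", "as", "what", "to", "for"}


def rank_documents(documents, named_entity, top_k=2):
    """Keep the top_k documents that mention the named entity most often."""
    name = named_entity.lower()
    scored = [(doc.lower().count(name), doc) for doc in documents]
    scored = [item for item in scored if item[0] > 0]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


def content_words(documents, named_entity):
    """Count words that are neither function words nor part of the name."""
    name_tokens = set(named_entity.lower().split())
    counts = Counter()
    for doc in documents:
        for token in doc.lower().split():
            word = token.strip(".,?!\"'")
            if word and word not in FUNCTION_WORDS and word not in name_tokens:
                counts[word] += 1
    return counts


docs = [
    "Gordon Gekko is the main character in the film. Gordon Gekko is a trader.",
    "The weather report for Berkeley.",
    "A film critic mentions Gordon Gekko.",
]
top_docs = rank_documents(docs, "Gordon Gekko")
word_counts = content_words(top_docs, "Gordon Gekko")
print(word_counts.most_common(3))
```

On this toy collection, movie-related words such as "film" and "character" surface from the documents that mention Gordon Gekko, while the unrelated document contributes nothing.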

Conclusions
The question-specific model achieves a 32.2% reduction in word error rate from the baseline (from 58.36% to 26.02%) using the questions where pronominal references are resolved.
We used name-specific language models in a question answering task where the user is asked beforehand for the target name in the question.
We used TREC benchmark questions on the AQUAINT corpus.
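
For reference, word error rate is conventionally computed as the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The sketch below shows that standard definition; it is not the scoring tool used in the experiments, and the function name is hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Row-by-row Levenshtein distance over words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


reference = "in what film is gordon gekko the main character"
hypothesis = "in what film is gordon gecko the main character"
print(word_error_rate(reference, hypothesis))
```

In the usage example a single misrecognized name out of nine reference words gives a word error rate of one ninth, which shows why errors on names weigh heavily in short questions.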
Future Work
Grounding process for the named entity:
ask the user for the type of the named entity (e.g. person, organization, location, movie) and associations
build a focused grammar
a user may specify, for example: Orhan Pamuk, a Turkish writer.

Related Work

R. Iyer and M. Ostendorf, "Modeling long distance dependencies in language: Topic mixtures versus dynamic cache model," IEEE Transactions on Speech and Audio Processing.
D. Gildea and T. Hofmann, "Topic-based language models using EM," in Proceedings of Eurospeech.
F. Bechet, G. Riccardi, and D. Hakkani-Tur, "Mining spoken dialog corpora for system evaluation and modeling."