Title: Focus of Attention Based on Speech Recognizer Hypothesis
1 Focus of Attention Based on Speech Recognizer Hypothesis
Michael Katzenmaier
Interactive Systems Labs, University of Karlsruhe
2 Outline
- Motivation
- Basics
- Settings
- Features
- Methods of Classification
- Experiments
- Preliminary Testing of Features
- Results Comparison
- Summary and Outlook
3 Motivation
- Machines are increasingly involved in human interaction, e.g.
  - intelligent room with voice-operated equipment (lighting, video/audio, etc.)
  - household robot
  - home multimedia terminal
- Improved usability through integration of human-machine communication (HM) in human-human communication (HH)
- => Requires determination of focus of attention
4 Possible Input Modalities for Tracking Focus of Attention
- inputs: hypothesis of speech recognizer, eye gaze, state of dialogue, place of speaker, gesture, ...
- classifier
- outputs: conversation human-human (HH) or commands human-machine (HM)
5 In this work
- restriction to the hypothesis of the speech recognizer
- features extracted on hypothesis and transcription
6 Experiment Settings
- data
  - approx. 10 min. of real collected dialogues
  - some 100 text-only sentences (commands)
  - (2 persons, 1 room/robot)
- for every turn X
  - transcript X_Trans
  - N-gram recognizer hypothesis X_N-Gram
  - CFG recognizer hypothesis X_CFG
  - corresponding features
- hand-crafted CFG customized for the Human Robots Project
- N-gram recognizer: JANUS Verbmobil evaluation system
- Stuttgart Neural Network Simulator for NN experiments
7 Features
- sentence length (X_N-Gram): S(X) ∈ {1, 2, ...}
- CFG-parseable: Z(X) ∈ {0, 1}
- perplexity ∈ [0, ∞)
  - with Verbmobil-LM: PerpHH1(X)
  - with VODIS-LM: PerpHM1(X)
  - with transcribed HM monologs: PerpHM2(X)
  - with transcribed HH dialogs: PerpHH2(X)
- correlation between X_N-Gram and X_CFG ∈ [0, 1]
  - of words: Kwrd(X_CFG, X_N-Gram)
  - of letters: Kltr(X_CFG, X_N-Gram)
- occurrence of "robot": R(X) ∈ {0, 1}
- number of imperatives: I(X) ∈ {1, 2, ...}
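The slide names the features but not their exact definitions. The sketch below shows how a few of them might be computed for a single turn. It rests on assumptions that are not from the slides: `word_correlation` takes Kwrd to be the share of words common to both hypotheses, and `unigram_perplexity` uses a toy unigram model in place of the Verbmobil or VODIS language models. All function names are hypothetical.

```python
import math
from collections import Counter


def sentence_length(words):
    """S(X): number of words in the turn."""
    return len(words)


def robot_occurs(words):
    """R(X): 1 if the word 'robot' occurs in the turn, else 0."""
    return int(any(w.lower() == "robot" for w in words))


def word_correlation(hyp_cfg, hyp_ngram):
    """Assumed stand-in for Kwrd: words shared by both hypotheses
    (counted with multiplicity) divided by the longer hypothesis length."""
    if not hyp_cfg or not hyp_ngram:
        return 0.0
    shared = sum((Counter(hyp_cfg) & Counter(hyp_ngram)).values())
    return shared / max(len(hyp_cfg), len(hyp_ngram))


def unigram_perplexity(words, probs, floor=1e-6):
    """Perplexity of a turn under a toy unigram LM (an assumption;
    unseen words get the probability `floor`)."""
    if not words:
        return float("inf")
    log_sum = sum(math.log(probs.get(w, floor)) for w in words)
    return math.exp(-log_sum / len(words))
```

For example, `word_correlation(["robot", "go", "left"], ["robot", "go", "right", "now"])` gives 0.5 under this assumed definition, because two of the four words of the longer hypothesis are shared.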
8 Methods of Classification
- simple comparison (threshold, etc.)
  - perpHM(X) <?> perpHH(X); hypothesis: perpHM(X) < perpHH(X) for X ∈ HM
  - K(X_CFG, X_N-Gram) >? threshold t; hypothesis: X ∈ HM ⇒ K(X_CFG, X_N-Gram) > t
- Bayes classifier
  - P(HM|X) <?> P(HH|X)
- Multilayer Perceptron
  - input: features perpHM(X), perpHH(X), S(X), K(X_CFG, X_N-Gram), Z(X)
  - output: HM or HH, as P(HM|X) and P(HH|X)
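The two simple comparison rules can be sketched as follows. This is a minimal illustration, not the original implementation: the function names are hypothetical, and sending ties to HH is an assumption that the slides do not state.

```python
def classify_by_perplexity(perp_hm, perp_hh):
    """Label a turn HM when the HM language model fits it better,
    i.e. when perpHM(X) < perpHH(X). Ties go to HH (assumption)."""
    return "HM" if perp_hm < perp_hh else "HH"


def classify_by_correlation(k, t):
    """Label a turn HM when the correlation K between the CFG and
    N-gram hypotheses exceeds the threshold t (K and t on the same scale)."""
    return "HM" if k > t else "HH"
```

The intuition behind the second rule is that a command covered by the hand-crafted grammar should be recognized similarly by both recognizers, while free conversation should not.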
9 Preliminary Testing 1
[Figure: perplexities perpHM1, perpHM2, perpHH1 and perpHH2 for HH and HM turns, on transcript and on hypothesis (X_N-Gram)]
10 Preliminary Testing 2
[Figure: frequency of the correlation of words and of the correlation of letters, for HH and HM turns]
11 Preliminary Testing 3
[Figure: frequency of the length of sentence (X_N-Gram) for HH and HM turns]
[Figure: "CFG parseable?" for HH and HM turns, on hypothesis and on transcript; values shown: 56, 15, 13, 6]
12 Preliminary Testing 4
[Figure: "Imperatives intact?" and "Does robot occur?" for HH and HM turns, on hypothesis (X_N-Gram) and on transcript; values shown: 44, 13, 2, 2]
13 Simple Linear Classifier using Perplexity
LM (%)       without data   with data
error              26           39
precision          63           75
recall             50           33

LM (%)       without data   with data
error              26           39
precision          69           88
recall             50           39
14 Simple Linear Classifier using Correlation
[Figure: frequency of the correlation of words and of letters, for HH and HM turns]
- thresholds: t_wrd = 10, t_ltr = 50
results (%)   wrd   ltr
error          20    25
precision     100    50
recall         25    13
15 Bayes Classifier
- estimate two models p(e(X)|HH) and p(e(X)|HM) with Gaussian distribution
- e(X) ⊆ {perpHM(X), perpHH(X), S(X), K(X_CFG, X_N-Gram), Z(X)}
- estimate P(HH) and P(HM) by counting
- classify: C* = argmax over C ∈ {HH, HM} of p(e(X)|C) P(C)
The three best results (%):
e(X)        perp2(X)   perpHH1/2(X), S(X)   perp1/2(X), S(X), Z(X)
error          23              20                     28
precision      75              75                     47
recall         19              38                     56
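The Bayes classifier steps above (Gaussian class models, priors by counting, argmax of likelihood times prior) can be sketched as follows. This is a sketch under assumptions: the Gaussians are taken to be diagonal (independent features), which the slides do not specify, and the function names and toy data are hypothetical.

```python
import math


def fit_gaussian(rows):
    """Per-feature mean and variance of the feature vectors e(X).
    Diagonal covariance is an assumption; variances are floored for stability."""
    n, dims = len(rows), len(rows[0])
    means = [sum(r[d] for r in rows) / n for d in range(dims)]
    variances = [
        max(sum((r[d] - means[d]) ** 2 for r in rows) / n, 1e-6)
        for d in range(dims)
    ]
    return means, variances


def log_likelihood(x, means, variances):
    """log p(e(X) | C) under the diagonal Gaussian."""
    return sum(
        -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
        for xi, m, v in zip(x, means, variances)
    )


def train(data):
    """data maps a class label ('HH' or 'HM') to its list of feature vectors.
    Priors P(C) are estimated by counting."""
    total = sum(len(rows) for rows in data.values())
    model = {}
    for label, rows in data.items():
        means, variances = fit_gaussian(rows)
        model[label] = (math.log(len(rows) / total), means, variances)
    return model


def classify(model, x):
    """C* = argmax over C of log P(C) + log p(e(X) | C)."""
    return max(
        model,
        key=lambda c: model[c][0] + log_likelihood(x, model[c][1], model[c][2]),
    )


# Hypothetical toy data: e(X) = (perplexity, sentence length).
toy = {
    "HH": [[120.0, 12.0], [150.0, 10.0], [130.0, 14.0]],
    "HM": [[30.0, 4.0], [40.0, 3.0]],
}
toy_model = train(toy)
```

Working in log space avoids numerical underflow when several features are multiplied together; it does not change which class attains the maximum.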
16 Multi-layer Perceptron
Used architectures:
- on hypothesis (X_N-Gram)
  - A1 = PerpHH1(X), PerpHM1(X)
  - A2 = A1 + S(X), Z(X)
  - A3 = A2 + PerpHH2(X), PerpHM2(X)
  - A4 = A3 + K(X)
- on transcript
  - T1 = same as A3
  - T2 = A4 + R(X), I(X)
Results (%):
e(X)         A1    T1    T2    A2    A3    A4
error        23    23    13    13    33    18
precision   100    56    90    83    43    65
recall       12    56    56    62    81    69
Best results with 2, 4, 6 and 7 features.
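The architectures differ only in which features feed the network. The sketch below builds the input vector for each one, reading the architecture definitions as cumulative feature sets (the "+" between an architecture and its added features is an assumption about formatting lost from the slide). The dictionary keys and the `input_vector` helper are hypothetical names.

```python
# Feature sets per architecture, read cumulatively (assumption).
A1 = ["PerpHH1", "PerpHM1"]
A2 = A1 + ["S", "Z"]
A3 = A2 + ["PerpHH2", "PerpHM2"]
A4 = A3 + ["K"]
T1 = list(A3)            # same features as A3, computed on the transcript
T2 = A4 + ["R", "I"]

ARCHITECTURES = {"A1": A1, "A2": A2, "A3": A3, "A4": A4, "T1": T1, "T2": T2}


def input_vector(features, arch):
    """Select and order the features of one turn for the given architecture."""
    return [float(features[name]) for name in ARCHITECTURES[arch]]


# Input sizes of the hypothesis-based networks A1..A4.
sizes = [len(ARCHITECTURES[a]) for a in ("A1", "A2", "A3", "A4")]
```

Under this reading, A1 to A4 have 2, 4, 6 and 7 inputs, which agrees with the slide's remark about the best results with 2, 4, 6 and 7 features.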
17 Comparison of Methods
method (%)   error   recall   prec.
Com-PP         26      50       69
Correl.        20      25      100
Bayes          20      38       75
MLP            18      69       65
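To compare the methods with a single number, the F-measure combines precision and recall. The sketch below assumes the balanced F-measure (harmonic mean of precision and recall), since the slides name the criterion without giving a weighting. The precision and recall values are those reported for each method on the earlier slides.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) in percent, as reported on the slides.
results = {
    "Com-PP": (69, 50),
    "Correl.": (100, 25),
    "Bayes": (75, 38),
    "MLP": (65, 69),
}

# Methods ordered from best to worst F-measure.
ranking = sorted(results, key=lambda m: f_measure(*results[m]), reverse=True)
```

With these values the correlation classifier scores exactly 40, despite its 100% precision, because its recall is only 25%.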
18 Summary and Outlook
- MLP has better results than all other methods (Com-PP > Bayes > Correl.; decision criterion: F-measure)
- Focus of attention on hypothesis only possible with 65% precision and 69% recall
- on transcript better than on hypothesis
- High expectations for further modalities, especially state of dialogue and gaze → diploma thesis (further work)