1
Focus of Attention Based on Speech Recognizer Hypothesis
Michael Katzenmaier, Interactive Systems Labs, University of Karlsruhe
2
Outline
  • Motivation
  • Basics
  • Settings
  • Features
  • Methods of Classification
  • Experiments
  • Preliminary Testing of Features
  • Results & Comparison
  • Summary & Outlook

3
Motivation
  • Machines are increasingly involved in human interaction, e.g.
    - intelligent room with voice-operated equipment (lighting, video/audio, etc.)
    - household robot
    - home multimedia terminal
  • Improved usability through integration of human-machine communication (HM)
    into human-human communication (HH)
  • ⇒ Requires determination of the focus of attention

4
Possible Input Modalities for Tracking Focus of Attention
[Diagram: input modalities (hypothesis of the speech recognizer, eye gaze, state of dialogue, place of the speaker, gesture, ...) feed a classifier that labels each turn as human-human conversation (HH) or human-machine command (HM).]
5
In this work
  • restriction to the hypothesis of the speech recognizer
  • features extracted on hypothesis and transcription

6
Experiment Settings
  • data: approx. 10 min. of real collected dialogues and some 100 text-only
    sentences (commands) (2 persons, 1 room/robot)
  • for every turn X:
    - transcript X_Trans
    - N-gram recognizer hypothesis X_N-Gram
    - CFG recognizer hypothesis X_CFG
    - corresponding features
  • hand-crafted CFG customized for the Human Robots Project
  • N-gram recognizer: JANUS Verbmobil evaluation system
  • Stuttgart Neural Network Simulator for the NN experiments

7
Features
  • sentence length: S(X) ∈ {1, 2, ...}, computed on X_N-Gram
  • CFG-parseable: Z(X) ∈ {0, 1}
  • perplexity ∈ (0, ∞):
    - with the Verbmobil LM: Perp_HH1(X)
    - with the VODIS LM: Perp_HM1(X)
    - with an LM from transcribed HM monologues: Perp_HM2(X)
    - with an LM from transcribed HH dialogues: Perp_HH2(X)
  • correlation between X_CFG and X_N-Gram, ∈ [0, 1] (see the sketch below):
    - of words: K_wrd(X_CFG, X_N-Gram)
    - of letters: K_ltr(X_CFG, X_N-Gram)
  • occurrence of "robot": R(X) ∈ {0, 1}
  • number of imperatives: I(X) ∈ {1, 2, ...}
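A minimal sketch (not from the original work) of how some of these features could be computed per turn; the slides do not define the correlation measure K, so difflib's matching ratio, normalized to [0..1], stands in for it here:

    # Sketch only: difflib's matching ratio is an assumed stand-in for K.
    from difflib import SequenceMatcher

    def correlation(a, b):
        """Similarity in [0..1] between two sequences."""
        return SequenceMatcher(None, a, b).ratio()

    def k_wrd(x_cfg, x_ngram):          # correlation of words
        return correlation(x_cfg.split(), x_ngram.split())

    def k_ltr(x_cfg, x_ngram):          # correlation of letters
        return correlation(list(x_cfg), list(x_ngram))

    def s(x_ngram):                     # sentence length S(X)
        return len(x_ngram.split())

    def r(x):                           # occurrence of "robot" R(X)
        return int("robot" in x.lower().split())

    hyp_ngram = "robot please switch on the light"
    hyp_cfg = "robot switch on the light"
    print(k_wrd(hyp_cfg, hyp_ngram), k_ltr(hyp_cfg, hyp_ngram),
          s(hyp_ngram), r(hyp_ngram))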

8
Methods of Classification
  • simple comparison (threshold, etc.; sketched below)
    - perplexity: compare perp_HM(X) <?> perp_HH(X);
      hypothesis: X ∈ HM ⇔ perp_HM(X) < perp_HH(X)
    - correlation: compare K(X_CFG, X_N-Gram) against a threshold t;
      hypothesis: X ∈ HM ⇔ K(X_CFG, X_N-Gram) > t
  • Bayes classifier: compare P(HM|X) <?> P(HH|X)
  • multilayer perceptron
    - input: features; output: HM or HH
[Diagram: features perp_HM(X), perp_HH(X), S(X), K(X_CFG, X_N-Gram), Z(X) feed the classifier, which outputs P(HM|X) vs. P(HH|X).]
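The two comparison rules are one-liners; a sketch, with placeholder feature values (the threshold t is tuned on the data in the experiments that follow):

    # Simple comparison classifiers from this slide (sketch, not original code).
    def classify_by_perplexity(perp_hm, perp_hh):
        # X is assigned to HM iff the HM language model fits better.
        return "HM" if perp_hm < perp_hh else "HH"

    def classify_by_correlation(k, t):
        # X is assigned to HM iff the CFG/N-gram correlation exceeds t.
        return "HM" if k > t else "HH"

    print(classify_by_perplexity(perp_hm=35.0, perp_hh=120.0))  # -> HM
    print(classify_by_correlation(k=0.8, t=0.5))                # -> HM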
9
Preliminary Testing 1
[Histograms of the perplexity features Perp_HM1, Perp_HM2, Perp_HH1, Perp_HH2 for HH vs. HM turns, on the hypothesis (X_N-Gram) and on the transcript.]
10
Preliminary Testing 2
[Frequency histograms of the correlation of letters and the correlation of words for HH vs. HM turns.]
11
Preliminary Testing 3
[Bar chart "CFG parseable?" with counts for HH vs. HM on hypothesis and on transcript (56, 15, 13, 6); frequency histogram of the sentence length on X_N-Gram for HH vs. HM.]
12
Preliminary Testing 4
[Bar charts "Imperatives intact?" and "Does 'robot' occur?" with counts for HH vs. HM on hypothesis (X_N-Gram) and on transcript (44, 13, 2, 2).]
13
Simple Linear Classifier using Perplexity

on hypothesis (X_N-Gram):
               LM without data   LM with data
  error             26%              39%
  precision         63%              75%
  recall            50%              33%

on transcript:
               LM without data   LM with data
  error             26%              39%
  precision         69%              88%
  recall            50%              39%
14
Simple Linear Classifier using Correlation

               words   letters
  error         20%     25%
  precision    100%     50%
  recall        25%     13%

thresholds: t_wrd = 10%, t_ltr = 50%
[Frequency histograms of the word and letter correlation for HH vs. HM, with the chosen thresholds marked.]
15
Bayes Classifier
  • estimate two models p(e(X)|HH) and p(e(X)|HM) with a Gaussian distribution
  • e(X) ⊆ {perp_HM(X), perp_HH(X), S(X), K(X_CFG, X_N-Gram), Z(X)}
  • estimate P(HH) and P(HM) by counting
  • classify: C = argmax_{C ∈ {HH, HM}} p(e(X)|C) · P(C)
    (a sketch follows this slide)

The three best results:

  e(X):        perp2(X)   perp_HH1/2(X), S(X)   perp_1/2(X), S(X), Z(X)
  error          23%             20%                    28%
  precision      75%             75%                    47%
  recall         19%             38%                    56%
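A compact sketch of this classifier, assuming diagonal-covariance Gaussians (the slide only says "Gaussian distribution"); the feature vectors and labels below are toy placeholders:

    # Gaussian Bayes classifier: fit p(e|C) per class, P(C) by counting,
    # classify by argmax over C in {HH, HM} of p(e(X)|C) * P(C).
    import numpy as np

    class GaussianBayes:
        def fit(self, X, y):                      # X: (n, d) features, y: labels
            X, y = np.asarray(X, float), np.asarray(y)
            self.classes = sorted(set(y))
            self.prior, self.mu, self.var = {}, {}, {}
            for c in self.classes:
                Xc = X[y == c]
                self.prior[c] = len(Xc) / len(X)  # P(C) estimated by counting
                self.mu[c] = Xc.mean(axis=0)
                self.var[c] = Xc.var(axis=0) + 1e-6  # avoid zero variance
            return self

        def predict(self, x):                     # argmax_C log p(x|C) + log P(C)
            def score(c):
                ll = -0.5 * np.sum(np.log(2 * np.pi * self.var[c])
                                   + (np.asarray(x) - self.mu[c]) ** 2 / self.var[c])
                return ll + np.log(self.prior[c])
            return max(self.classes, key=score)

    # toy usage with e(X) = (perp_HM(X), perp_HH(X))
    clf = GaussianBayes().fit([[35, 120], [40, 110], [240, 60], [200, 70]],
                              ["HM", "HM", "HH", "HH"])
    print(clf.predict([50, 100]))  # -> "HM"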
16
Multi-layer Perceptron

Architectures used:
  A1: Perp_HH1(X), Perp_HM1(X)
  A2: A1 + S(X), Z(X)
  A3: A2 + Perp_HH2(X), Perp_HM2(X)
  A4: A3 + K(X)
  T1: same as A3
  T2: A4 + R(X), I(X)
(A1-A4 on the hypothesis X_N-Gram; T1, T2 on the transcript)

               A1     A2     A3     A4     T1     T2
  error        23%    13%    33%    18%    23%    13%
  precision   100%    83%    43%    65%    56%    90%
  recall       12%    62%    81%    69%    56%    56%

Best results with 2, 4, 6 and 7 features.
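The experiments used the Stuttgart Neural Network Simulator; as a rough stand-in, a comparable MLP can be sketched with scikit-learn. Architecture A2's four inputs are assumed, and the data rows are toy placeholders:

    # Sketch: MLP with input = feature vector e(X), output = HM or HH.
    import numpy as np
    from sklearn.neural_network import MLPClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    # toy feature rows: [Perp_HH1(X), Perp_HM1(X), S(X), Z(X)]
    X = np.array([[120.0,  35.0,  4, 1],    # short, parseable command
                  [150.0,  30.0,  3, 1],
                  [ 60.0, 240.0, 11, 0],    # long conversational turn
                  [ 55.0, 210.0,  9, 0]])
    y = ["HM", "HM", "HH", "HH"]

    clf = make_pipeline(StandardScaler(),
                        MLPClassifier(hidden_layer_sizes=(8,),
                                      max_iter=2000, random_state=0))
    clf.fit(X, y)
    print(clf.predict([[100.0, 40.0, 5, 1]]))  # expected: ['HM']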
17
Comparison of Methods

  method     error   recall   prec.
  Com-PP      26%     50%      69%
  Correl.     20%     25%     100%
  Bayes       20%     38%      75%
  MLP         18%     69%      65%
18
Summary & Outlook
  • The MLP achieves better results than all other methods
    (MLP > Com-PP > Bayes > Correl.; decision criterion: F-measure)
  • Determining the focus of attention on the hypothesis alone is possible
    with 65% precision and 69% recall
  • Results on the transcript are better than on the hypothesis
  • High expectations for further modalities, especially state of dialogue
    and eye gaze → Diplomarbeit (further work)