Title: Extending the Wizard of Oz Methodology for Language-Enabled Multimodal Systems
Marita Ailomaa, Miroslav Melichar, Martin Rajman
Artificial Intelligence Laboratory
Ecole Polytechnique Fédérale de Lausanne (EPFL),
Switzerland
Agnes Lisowska, Susan Armstrong
ISSCO/TIM/ETI
University of Geneva,
Switzerland
Standard Wizard of Oz for speech interfaces
Extended Wizard of Oz for multimodal interfaces
User's work environment
Wizards' environment
Goal of the experiments
Overview
Collected data
- Gather data about
  - Interaction sequences
  - Use of language for commands vs. queries
  - Complexity of language input (keywords, short expressions, full sentences)
  - The proportional use of different modalities
91 users, 60 hours of recording, 5 months of experiments
- In a Wizard of Oz (WOz) experiment, the user believes they are interacting with a fully automated natural language dialogue system which is, in fact, controlled by a wizard, who simulates one or several components of the system, typically speech recognition, natural language understanding and/or dialogue management.
- The WOz methodology is useful because
  - It allows the characteristics of human-computer interaction to be studied as distinct from human-human interaction
  - It enables user evaluations of the natural language interface at early stages of development
  - It provides empirical data for the design and implementation of the NLP components, based on real-life interaction rather than hypothetical models
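The wizard-in-the-loop idea can be illustrated with a minimal sketch. It assumes the simulated component is natural language understanding and represents a semantic interpretation as a plain dictionary. The names `WozSystem`, `scripted_wizard` and `respond` are hypothetical and are not taken from any system described here. A scripted function stands in for the human wizard so that the sketch runs unattended.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Assumption: a semantic interpretation is a flat dict, e.g. {"cuisine": "chinese"}.
Frame = Dict[str, str]


@dataclass
class WozSystem:
    """Dialogue system whose language understanding is supplied from outside."""
    interpret: Callable[[str], Frame]  # filled in by the wizard, not by NLP code
    respond: Callable[[Frame], str]    # the automated part of the system

    def handle(self, user_utterance: str) -> str:
        # The user only sees the final response and cannot tell
        # whether a human or a program produced the interpretation.
        frame = self.interpret(user_utterance)
        return self.respond(frame)


def scripted_wizard(utterance: str) -> Frame:
    """Stand-in for the human wizard (hypothetical, for illustration only)."""
    if "chinese" in utterance.lower():
        return {"cuisine": "chinese"}
    return {}


def respond(frame: Frame) -> str:
    """Turn an interpretation into a system prompt."""
    if "cuisine" in frame:
        return f"Searching for {frame['cuisine']} restaurants."
    return "Could you rephrase that?"


system = WozSystem(interpret=scripted_wizard, respond=respond)
print(system.handle("I'd like to eat Chinese food"))
```

In a real experiment the `interpret` callback would block until the wizard has selected an interpretation in the control interface; once an NLP component exists, it could be swapped in behind the same callback.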
Camera (hands)
Camera (face)
User
Loud speakers
Input wizard
- Use gathered data for
  - Making design decisions about modalities to include in the interface
  - Implementing appropriate NLP algorithms
  - Validating and improving multimodal dialogue strategies for the system
Output wizard
(A) Output Control Interface
(B) User's screen
(C) User's face
(D) Input Control Interface
Multimodal user interface
Recording equipment
Important issues
Input Control Interface
Output Control Interface
- Complex technical setup: network of computers, live video/audio streaming, advanced recording equipment
- Wizards' cognitive load: very high, two wizards recommended
- Wizards' response time: needs to be balanced for the different modalities
- Interaction flow: user-driven, default prompts may need to be changed dynamically by the wizard
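One possible way to balance response times across modalities is to pad fast responses up to a common target latency, so that input handled quickly does not appear more responsive than input that requires wizard intervention. The sketch below is an assumption about how this could be done, not a description of the setup used in the experiments. The constant `TARGET_LATENCY_S` and the functions `padding_delay` and `deliver` are hypothetical.

```python
import time

# Assumed target: every response appears this many seconds after the user's
# input, regardless of how quickly the wizard (or the system) handled it.
TARGET_LATENCY_S = 2.0


def padding_delay(handling_time_s: float, target_s: float = TARGET_LATENCY_S) -> float:
    """Return the extra wait needed to reach the target latency.

    If handling already took longer than the target, no padding is added.
    """
    return max(0.0, target_s - handling_time_s)


def deliver(response: str, handling_time_s: float, sleep=time.sleep) -> str:
    """Wait for the padding delay, then hand the response to the output side."""
    sleep(padding_delay(handling_time_s))
    return response


# A mouse click handled in 0.5 s is padded; a spoken query that took the
# wizard 3 s to interpret is delivered immediately.
print(padding_delay(0.5))
print(padding_delay(3.0))
```

The `sleep` parameter is injectable so the delay can be replaced by a no-op when the logic is exercised outside a live session.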
Interaction cycle
System-driven or mixed-initiative dialogue
User input from one modality only
An example: The Archivus system
User
Wizard
A system for multimedia meeting data browsing and
retrieval
- The input modalities
- Text entry
- Voice
- The output modalities
- Audio
- Text
Semantic interpretation
I'd like to eat Chinese food
Short-cuts for language commands
User request
Options for semantic interpretations of language
queries
Semantic interpretation
Speech generation
Constant view of database (search space)
Meeting book
What time?
Default prompt
Wizard's prompts
The wizards keep track of many aspects of the interaction simultaneously, e.g. the current graphical element on the screen, the current search criteria, the choice of next modality, and the task being solved
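The aspects listed above could be gathered into a single state record that a wizard interface displays at all times. The following is a minimal sketch under that assumption. The class `InteractionState`, its field names and the example values are hypothetical and do not reflect the actual wizard interfaces.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class InteractionState:
    """Snapshot of what the wizards need to keep in view (illustrative only)."""
    current_element: str = "start screen"          # graphical element on screen
    search_criteria: Dict[str, str] = field(default_factory=dict)
    next_modality: Optional[str] = None            # None means no restriction
    task: Optional[str] = None                     # task the user is solving

    def summary(self) -> str:
        """One-line status that could be shown in a wizard's control interface."""
        criteria = ", ".join(
            f"{key}={value}" for key, value in sorted(self.search_criteria.items())
        ) or "none"
        return (
            f"element: {self.current_element} | criteria: {criteria} | "
            f"next modality: {self.next_modality or 'any'} | "
            f"task: {self.task or 'unknown'}"
        )


state = InteractionState()
state.search_criteria["year"] = "2004"   # example value, not from the source
state.next_modality = "voice"
print(state.summary())
```

Keeping this information in one structure is one way to reduce what each wizard has to remember during a session.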
Database
Dialogue manager
Conclusions
- The WOz methodology is a valuable framework for developing and evaluating multimodal interfaces, because it allows interdependent elements of the interaction (language use, use of other modalities, graphical interface navigation) to be evaluated in parallel without making too many a priori assumptions about how users may want to interact
- A WOz setup for vocal dialogue systems needs to be adapted to multimodal systems
- Developing efficient interfaces for the wizards is important for the quality of the data that is elicited with a complex multimodal system
System prompt
Current search criteria
Form-filling dialogue model
Simple technical setup (one computer)
Text input
SELECT () FROM any table WHERE
Search criteria definition buttons