1. Multimodal Interfaces: Robust interaction where graphical user interfaces fear to tread
- Philip R. Cohen
- Professor and Co-Director
- Center for Human-Computer Communication
- Oregon Health and Science Univ.
- http://www.cse.ogi.edu/CHCC
- and
- Natural Interaction Systems, LLC
2. Team Effort
- Co-PI: Sharon Oviatt
- Xiao Huang
- Ed Kaiser
- Sanjeev Kumar
- Rebecca Lunsford
- Richard Wesson
- Rajah Annamalai
- Alex Arthur
- Paulo Barthelmess
- Rachel Coulston
- Marisa Flecha-Garcia
Multidisciplinary research
3. Outline
- Multimodal Interaction
- Demonstration
- Multimodal architecture
- Benefits
- Tangible multimodal systems
- Educational applications
4. Multimodal Interaction
- Use of one or more natural communication modalities, e.g., speech, gesture, sketch
- Advantages over GUI and unimodal systems:
  - Easier to use; less training
  - Robust, flexible
  - Preferred by users
  - Faster, more efficient
  - Supports new functionality
- Applies to many different environments and form factors that challenge GUIs, especially mobile ones
5. Potential Application Areas
- Architecture and Design
- Geographical Information Systems
- Emergency Operations
- Field-based Operations
- Mobile Computing and Telecommunications
- Virtual/Augmented Reality
- Pervasive/Ubiquitous Computing
- Computer-Supported Collaborative Work
- Education
- Entertainment
6. Challenges for multimodal interface design
- More than 2 modes, e.g., spoken, gestural, facial expression, gaze, various sensors
- Inputs are uncertain (vs. keyboard/mouse):
  - Corrupted by noise
  - Multiple people
  - Recognition is probabilistic
  - Meaning is ambiguous
Design for uncertainty
7. Approach
- Gain robustness via:
  - Fusion of inputs from multiple modalities
  - Using strengths of one mode to compensate for weaknesses of others (design time and run time)
  - Avoiding/correcting errors:
    - Statistical architecture
    - Confirmation
    - Dialogue context
  - Simplification of language in a multimodal context
  - Output affecting/channeling input
8. Demo
Started with a 50-100 MHz 486.
9. Multimodal Architecture
10. System Architecture
[Architecture diagram: a Facilitator (routing, triggering, dispatching) connects components over the Interagent Communication Language (ICL, Horn clauses): speech and sketch/gesture recognizers, VR/AR interfaces (MAVEN, BARS), simulators, COM objects, web services (XML, SOAP, ...), databases, and other Facilitators]
Now the core is a single DLL.
11. Late MM Integration
- Parallel recognizers and understanders
- Time-stamped meaning fragments for each stream (see the sketch after this list)
- Common framework for meaning representation: typed feature structures
- Meaning fusion operation: unification
- Process for determining a joint interpretation (subject to semantic and spatiotemporal constraints)
- Statistical ranking
- Flexible asynchronous architecture
- Must handle unimodal and multimodal input
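To make this concrete, here is a minimal sketch of a time-stamped meaning fragment carrying a typed feature structure (all names and values are hypothetical illustrations, not the system's actual API):

```python
from dataclasses import dataclass, field

@dataclass
class MeaningFragment:
    """One scored, time-stamped hypothesis from a single input stream."""
    stream: str      # e.g., "speech" or "gesture"
    start: float     # start time in seconds
    end: float       # end time in seconds
    score: float     # recognizer confidence
    fs_type: str     # type tag of the feature structure, e.g., "create_line"
    features: dict = field(default_factory=dict)  # partial feature-value pairs

# Speech and gesture each contribute a partial description of one command:
speech = MeaningFragment("speech", 1.2, 2.0, 0.83, "create_line",
                         {"object": "barbed_wire"})
gesture = MeaningFragment("gesture", 0.9, 1.4, 0.91, "create_line",
                          {"location": {"coords": [(0, 0), (3, 4)]}})
```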
12. Approach
- Parallel continuous speech recognition (via ScanSoft, Microsoft, IBM recognizers) and continuous pen-gesture recognition (OGI)
- Common meaning representation: typed feature structures
- Meaning fusion via unification of typed feature structures
- Subject to semantic and temporal constraints
- Compare n spoken × n gestural interpretations
13. Temporal Constraints
- Oviatt et al., 1997 (CHI '97), sketched below:
  - Speech and gesture overlap, or
  - Gesture precedes speech by < 4 seconds
  - Speech does not precede gesture
- Given the sequence speech1 gesture speech2, the possible grouping is speech1 (gesture speech2)
- Finding (Oviatt et al., 2004, 2005): users have a consistent temporal integration style → adapt
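A minimal sketch of the temporal grouping rule above, reusing the hypothetical MeaningFragment fields from the earlier sketch:

```python
def temporally_compatible(speech, gesture, max_lag=4.0):
    """Speech and gesture overlap, or gesture precedes speech by less
    than max_lag seconds; speech must not precede the gesture."""
    if speech.end <= gesture.start:              # speech came first: reject
        return False
    if speech.start < gesture.end:               # intervals overlap: accept
        return True
    return speech.start - gesture.end < max_lag  # gesture led by < 4 s

# With the fragments above: gesture (0.9-1.4) overlaps speech (1.2-2.0)
assert temporally_compatible(speech, gesture)
```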
14. Advantages of multimodal integration via typed feature structure unification
- Partiality
- Structure Sharing
- Mutual Compensation
- Multimodal Discourse
15. Feature Structures
A typed feature structure pairs a type with a set of feature-value pairs. It is a very common representation in computational linguistics (FUG, LFG, PATR), e.g., for lexical entries, grammar rules, etc. Partiality: a structure can accumulate features (see the unification sketch below).
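A simplified sketch of unification over the dict-based feature structures used in these examples (no type hierarchy or structure sharing; not the system's actual algorithm):

```python
def unify(fs1, fs2):
    """Recursively merge two feature structures; None signals a clash."""
    if not isinstance(fs1, dict) or not isinstance(fs2, dict):
        return fs1 if fs1 == fs2 else None        # atoms must match exactly
    merged = dict(fs1)
    for feat, val in fs2.items():
        if feat in merged:
            sub = unify(merged[feat], val)
            if sub is None:                       # conflicting values: fail
                return None
            merged[feat] = sub
        else:
            merged[feat] = val                    # partiality: accumulate features
    return merged

# Speech supplies the object, gesture supplies the location:
print(unify({"object": "barbed_wire"},
            {"location": {"coords": [(0, 0), (3, 4)]}}))
```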
17. Mutual Disambiguation
- Each input mode provides a set of scored recognition hypotheses
[Diagram: scored speech hypotheses s1-s3, gesture hypotheses g1-g4, and object hypotheses o1-o3 combine into ranked multimodal interpretations mm1-mm4]
- MD derives the best joint interpretation by unification of meaning representation fragments
- P(MM) = α·P(S) + β·P(G) + C; learn α, β, and C over a multimodal corpus (a ranking sketch follows)
- MD stabilizes system performance in challenging environments
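A minimal sketch of the ranking step, combining the hypothetical helpers above; the weights alpha, beta, and c are placeholders for values learned from a multimodal corpus:

```python
def best_joint_interpretation(speech_hyps, gesture_hyps,
                              alpha=0.6, beta=0.4, c=0.0):
    """Score every temporally and semantically compatible pair of
    hypotheses and return the highest-scoring joint interpretation."""
    best, best_score = None, float("-inf")
    for s in speech_hyps:                         # n spoken hypotheses ...
        for g in gesture_hyps:                    # ... x n gestural hypotheses
            if s.fs_type != g.fs_type:            # semantic (type) constraint
                continue
            if not temporally_compatible(s, g):   # temporal constraint
                continue
            mm = unify(s.features, g.features)    # feature-level constraint
            if mm is None:
                continue
            score = alpha * s.score + beta * g.score + c
            if score > best_score:
                best, best_score = mm, score
    return best, best_score
```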
18. Benefits of mutual disambiguation

Application | RER (relative error reduction) | Reference
Non-native speakers and moderate mobility | 19-41% (multimodal commands) | Oviatt, 1999
Exerted users | 35% (multimodal commands) | Kumar et al., ICMI 2004
Multimodal 3D AR/VR environments | 67% (multimodal commands) | Kaiser et al., 2003
New-vocabulary speech and handwriting | 66% (phoneme), 16% (HW) | Kaiser et al., 2004; Kaiser PhD thesis
Audiovisual speech recognition in noisy environments | 35-50% (words) | Potamianos, Neti et al., 2003
19. Efficiency Benefits
CPOF: multimodal input 16x faster (NIS)
[Chart: speedup for creating lines and areas]
20. Multimodal New Vocabulary Recognition
Kaiser et al., 2004
Preliminary results: phone error rate 47% via speech, 19% via HW, 16% via SP+HW.
Ongoing work: timing-based OOV detection, phonetic alignment (following Kondrak; a sketch follows), abbreviations, other knowledge sources.
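As a rough illustration of the alignment step, here is a generic dynamic-programming sequence alignment in the spirit of Kondrak's work; his actual method scores phonetic-feature similarity, whereas the similarity function below is a placeholder:

```python
def align_score(seq1, seq2, sim=lambda a, b: 1.0 if a == b else -1.0,
                gap=-0.5):
    """Needleman-Wunsch-style global alignment score for two phoneme
    sequences (e.g., speech-derived vs. handwriting-derived)."""
    n, m = len(seq1), len(seq2)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap                     # leading gaps in seq2
    for j in range(1, m + 1):
        score[0][j] = j * gap                     # leading gaps in seq1
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sim(seq1[i - 1], seq2[j - 1]),
                score[i - 1][j] + gap,            # skip a phoneme in seq1
                score[i][j - 1] + gap)            # skip a phoneme in seq2
    return score[n][m]

# Hypothetical phoneme strings from speech vs. handwriting hypotheses:
print(align_score("kaet", "kat"))
```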
21. Personal Assistant that Learns
- DARPA project, via SRI International; 30 universities; 2nd year of a 5-year project (we hope)
- CALO system:
  - helps with office tasks
  - assists during meetings
  - learns from user behavior
22. CALO Meeting Understanding Concept
Summarize, Visualize, Search, Assist
- User need:
  - Understand "What topics were discussed, what were the participants' positions, what decisions resulted, what were the action items?"
  - Assist users in carrying out their organizational role
- Approach:
  - Capture, analyze, summarize, and enable users to interactively explore meetings
  - Operate within the context of the user and their business process
23. Sensing and early processing
All data are synchronized to enable fusion. Multiple sites may be involved.
24. Meeting room
A personal CALO introduces context: meeting agenda, participants; projects, schedules, people; articles, documents, email; topics and language; personal models for voice, gesture, ...
25. Our contributions
- Cognitive models of multimodal interaction during meetings
  - Intermodality timing, redundancy/complementarity of modes, etc.
- Multimodal integration of speech, gesture, posture
- Multimodal fusion of speech, sketch, and writing
  - Especially new vocabulary acquisition (other data sources as well)
  - Useful for whiteboard activity and personal notes on digital paper
- Dialogue manager based on joint intention theory
  - Tracks users' individual and joint (organizational) commitments through meetings, email, tasks
26. Data collection infrastructure
27. Demonstration
CMU: speech; MIT: body tracking; OHSU: multimodal fusion (speech + writing/sketch, 3D gesture); Stanford: NLP, dialogue
28. Summary
- Multiparty multimodal dialogues incorporating 2D (handwriting, sketch) and 3D gesture, head position
- Will be tested in May
- Goal is a system that people will use, initially with limited functionality
- Future: apply MNVR to notes, documents, email, etc., to improve speech recognition
29. Tangible Multimodal Systems for Safety-Critical Applications
What's Missing?
A Division Command Post during an exercise
McGee et al., CHI '02; Cohen & McGee, CACM '04
30. What they use
31. Many work practices rely on paper
ATC: Mackay, 1998
ICU: Gorman et al., 2000
32. Why do they use paper?
- Already know the interface
- Poor computer interfaces
- Fail-safe: robust to power outages
- High resolution
- Large/small scale
- Cheap
- Lightweight
- Portable
- Collaboration
33. Clinical Data Entry
- "Perhaps the single greatest challenge that has consistently confronted every clinical system developer is to engage clinicians in direct data entry." (IOM, 1997, p. 125)
- "To make it simple for the practitioner to interact with the record, data entry must be almost as easy as writing." (IOM, 1997, p. 88)
34. Multimodal Interaction with Paper (NIS)
Based on Anoto technology
35. Benefits
- Most people (incl. kids, seniors) know how to use the pen
- Portability (works over cell phone)
- Ubiquity: paper is everywhere
- Collaborative: multiple simultaneous pens
- Next: use for note-taking, alone or in meetings; fuse with ongoing speech
- Many new applications, e.g., architecture, engineering, education, field data capture
36. Elementary Science Education
37. Quiet Interfaces that Help People Think
Sharon Oviatt
oviatt@cse.ogi.edu
http://www.cse.ogi.edu/CHCC/