1
Multimodal Interfaces: Robust interaction where
graphical user interfaces fear to tread
  • Philip R. Cohen
  • Professor and Co-Director
  • Center for Human-Computer Communication
  • Oregon Health and Science Univ.
  • http://www.cse.ogi.edu/CHCC
  • and
  • Natural Interaction Systems, LLC

2
Team Effort
  • Co-PI: Sharon Oviatt
  • Rajah Annamalai
  • Alex Arthur
  • Paulo Barthelmess
  • Rachel Coulston
  • Marisa Flecha-Garcia
  • Xiao Huang
  • Ed Kaiser
  • Sanjeev Kumar
  • Rebecca Lunsford
  • Richard Wesson

Multidisciplinary research
3
Outline
  • Multimodal Interaction
  • Demonstration
  • Multimodal architecture
  • Benefits
  • Tangible multimodal systems
  • Educational applications

4
Multimodal Interaction
  • Use of one or more natural communication
    modalities, e.g., speech, gesture, sketch
  • Advantages over GUI and unimodal systems:
    • Easier to use; less training
    • Robust, flexible
    • Preferred by users
    • Faster, more efficient
    • Supports new functionality
  • Applies to many different environments and form
    factors that challenge GUIs, especially mobile ones

5
Potential Application Areas
  • Architecture and Design
  • Geographical Information Systems
  • Emergency Operations
  • Field-based Operations
  • Mobile Computing and Telecommunications
  • Virtual/Augmented Reality
  • Pervasive/Ubiquitous Computing
  • Computer-Supported Collaborative Work
  • Education
  • Entertainment

6
Challenges for multimodal interface design
  • More than two modes, e.g., spoken, gestural, facial
    expression, gaze; various sensors
  • Inputs are uncertain (vs. keyboard/mouse):
    • Corrupted by noise
    • Multiple people
    • Recognition is probabilistic
    • Meaning is ambiguous

Design for uncertainty
7
Approach
  • Gain robustness via:
    • Fusion of inputs from multiple modalities
    • Using strengths of one mode to compensate for
      weaknesses of others (design time and run time)
  • Avoiding/correcting errors:
    • Statistical architecture
    • Confirmation
    • Dialogue context
  • Simplification of language in a multimodal
    context
  • Output affecting/channeling input

8
Demo
Started with a 50-100 MHz 486
9
Multimodal Architecture
10
System Architecture
[Architecture diagram: a Facilitator (routing, triggering, dispatching)
connects speech, sketch/gesture, and VR/AR interfaces (MAVEN, BARS) to
simulators, COM objects, web services (XML, SOAP, ...), databases, and
other Facilitators via the Interagent Communication Language (ICL),
expressed as Horn clauses. The core is now a single DLL.]
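As a rough illustration of this facilitator pattern, here is a minimal routing sketch; the class, method, and message names are invented, and the real system exchanges ICL Horn clauses rather than Python dictionaries.

```python
# Minimal facilitator sketch: agents register the message types they can
# handle; the facilitator routes and dispatches requests to them.
# All names here are illustrative, not the actual ICL API.
from collections import defaultdict
from typing import Callable

class Facilitator:
    def __init__(self) -> None:
        self._handlers = defaultdict(list)  # message type -> agent handlers

    def register(self, message_type: str, handler: Callable) -> None:
        """An agent subscribes to a message type (e.g., 'speech')."""
        self._handlers[message_type].append(handler)

    def dispatch(self, message_type: str, payload: dict) -> list:
        """Route a message to every agent registered for its type."""
        return [handle(payload) for handle in self._handlers[message_type]]

facilitator = Facilitator()
facilitator.register("speech", lambda m: "recognizer got: " + m["utterance"])
facilitator.register("speech", lambda m: "logger got: " + m["utterance"])
print(facilitator.dispatch("speech", {"utterance": "pan the map left"}))
```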
11
Late MM Integration
  • Parallel recognizers and understanders
  • Time-stamped meaning fragments for each stream
  • Common framework for meaning representation:
    typed feature structures
  • Meaning fusion operation: unification
  • Process for determining a joint interpretation
    (subject to semantic and spatiotemporal
    constraints)
  • Statistical ranking
  • Flexible asynchronous architecture
  • Must handle unimodal and multimodal input

12
Approach
  • Parallel continuous speech recognition (via
    ScanSoft, Microsoft, IBM recognizers) and
    continuous pen-gesture recognition (OGI)
  • Common meaning representation: typed feature
    structures
  • Meaning fusion via unification of typed feature
    structures
  • Subject to semantic and temporal constraints
  • Compare n spoken × n gestural interpretations
    (sketched below)
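A minimal sketch of that n × n comparison, assuming plain dicts stand in for typed feature structures and that joint scores simply multiply; all hypotheses, scores, and field names below are invented.

```python
# Sketch of late integration: unify each of the n spoken interpretations
# with each of the n gestural interpretations, keep the pairs that unify,
# and rank them by combined score.

def unify(a, b):
    """Merge two partial meaning fragments; return None on conflict."""
    result = dict(a)
    for key, value in b.items():
        if key not in result:
            result[key] = value            # partiality: features accumulate
        elif isinstance(result[key], dict) and isinstance(value, dict):
            sub = unify(result[key], value)
            if sub is None:
                return None
            result[key] = sub
        elif result[key] != value:
            return None                    # conflicting values: fail
    return result

# Invented n-best lists: speech supplies the command, gesture the location.
speech_nbest = [
    ({"cmd": "create", "object": {"type": "flood_zone"}}, 0.7),
    ({"cmd": "create", "object": {"type": "blood_zone"}}, 0.3),  # misrecognition
]
gesture_nbest = [
    ({"object": {"type": "flood_zone", "coords": [(0, 0), (3, 1)]}}, 0.6),
    ({"object": {"coords": [(0, 0), (3, 1)]}}, 0.4),             # untyped area
]

joint = []
for s_frag, s_score in speech_nbest:
    for g_frag, g_score in gesture_nbest:
        merged = unify(s_frag, g_frag)
        if merged is not None:             # semantic constraint satisfied
            joint.append((s_score * g_score, merged))

for score, interp in sorted(joint, key=lambda j: j[0], reverse=True):
    print(round(score, 2), interp)
```

The misrecognized "blood_zone" hypothesis either fails to unify with the typed gesture or ranks below the compatible pair, which is the mutual-compensation effect discussed later.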

13
Temporal Constraints
  • Oviatt et al., 1997 (CHI '97):
    • Speech and gesture overlap, or
    • Gesture precedes speech by < 4 seconds
    • Speech does not precede gesture (sketched below)
  • Given the sequence speech1 gesture speech2, the
    possible grouping is speech1 (gesture speech2)
  • Finding (Oviatt et al., 2004, 2005): users have a
    consistent temporal integration style → adapt
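A sketch of this constraint as a predicate; the Interval type and the exact boundary handling are assumptions beyond what the slide states.

```python
# Sketch of the temporal constraint from Oviatt et al. (CHI '97): fuse
# speech and gesture only if their intervals overlap, or if the gesture
# ended no more than 4 seconds before the speech began.
from dataclasses import dataclass

@dataclass
class Interval:
    start: float  # seconds
    end: float

MAX_GESTURE_LEAD = 4.0  # seconds, per Oviatt et al., 1997

def may_fuse(speech: Interval, gesture: Interval) -> bool:
    overlap = speech.start <= gesture.end and gesture.start <= speech.end
    gesture_leads = 0.0 <= speech.start - gesture.end <= MAX_GESTURE_LEAD
    return overlap or gesture_leads

print(may_fuse(Interval(2.0, 3.5), Interval(1.0, 2.5)))  # True: overlap
print(may_fuse(Interval(7.0, 8.0), Interval(1.0, 2.5)))  # False: lead > 4 s
print(may_fuse(Interval(1.0, 2.0), Interval(3.0, 4.0)))  # False: speech first
```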

14
Advantages of multimodal integration via typed
feature structure unification
  • Partiality
  • Structure Sharing
  • Mutual Compensation
  • Multimodal Discourse

15
Feature Structures
[Diagram: a typed feature structure; the type labels the matrix of
feature-value pairs.]
A very common representation in computational linguistics (FUG, LFG,
PATR), e.g., for lexical entries, grammar rules, etc. Partiality:
features can accumulate across structures (a toy sketch follows).
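A toy sketch of typed unification under an invented type hierarchy; real typed feature structure logics (as in FUG, LFG, PATR) are far richer.

```python
# Toy typed feature structures: each structure carries a type from a small
# hierarchy; unification succeeds only when the types are compatible,
# resolving to the more specific one. Hierarchy and features are invented.
HIERARCHY = {"flood_zone": "area", "area": "region"}  # child -> parent

def ancestors(t):
    chain = [t]
    while t in HIERARCHY:
        t = HIERARCHY[t]
        chain.append(t)
    return chain

def unify_types(t1, t2):
    """Return the more specific of two compatible types, else None."""
    if t2 in ancestors(t1):
        return t1
    if t1 in ancestors(t2):
        return t2
    return None

def unify_typed(a, b):
    t = unify_types(a["type"], b["type"])
    if t is None:
        return None  # incompatible types: unification fails
    for key in (a.keys() & b.keys()) - {"type"}:
        if a[key] != b[key]:
            return None  # conflicting feature values
    return {**a, **b, "type": t}

speech = {"type": "flood_zone", "label": "flood zone"}          # no coords
gesture = {"type": "area", "coords": [(0, 0), (3, 1), (2, 4)]}  # no label
print(unify_typed(speech, gesture))
# {'type': 'flood_zone', 'label': 'flood zone', 'coords': [...]}
```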
16
(No Transcript)
17
Mutual Disambiguation
[Diagram: scored n-best hypothesis lists for speech (s1-s3), gesture
(g1-g4), and object (o1-o3) combine into ranked multimodal
interpretations (mm1-mm4).]
  • Each input mode provides a set of scored
    recognition hypotheses

  • MD derives the best joint interpretation by
    unification of meaning representation fragments
  • P(MM) = α·P(S) + β·P(G) + C; learn α, β, and C
    over a multimodal corpus (scoring sketch below)
  • MD stabilizes system performance in challenging
    environments
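A sketch of this combination score with placeholder weights; in the real system α, β, and C are learned over a multimodal corpus, and the posteriors below are invented.

```python
# Sketch of mutual-disambiguation scoring: each joint interpretation that
# survives unification gets a weighted combination of its speech and
# gesture posteriors. The weights are placeholders, not learned values.
ALPHA, BETA, C = 0.6, 0.35, 0.05

def mm_score(p_speech: float, p_gesture: float) -> float:
    return ALPHA * p_speech + BETA * p_gesture + C

# Invented joint hypotheses: (speech posterior, gesture posterior, reading)
candidates = [
    (0.70, 0.20, "pan map to sector A"),
    (0.55, 0.90, "zoom to sector B"),  # weaker speech, strong gesture
]
best = max(candidates, key=lambda c: mm_score(c[0], c[1]))
print(best[2], round(mm_score(best[0], best[1]), 3))  # gesture disambiguates
```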

18
Benefits of mutual disambiguation
(RER = relative error reduction)
Application                                  RER                        Reference
Non-native speakers and moderate mobility    19-41% (multimodal cmds)   Oviatt, 1999
Exerted users                                35% (multimodal cmds)      Kumar et al., ICMI, 2004
Multimodal 3D AR/VR environments             67% (multimodal cmds)      Kaiser et al., 2003
New vocabulary (speech and handwriting)      66% (phoneme), 16% (HW)    Kaiser et al., 2004; Kaiser PhD thesis
Audiovisual speech recognition in noise      35-50% (words)             Potamianos, Neti et al., 2003

19
Efficiency Benefits
CPOF multimodal input: 16× faster (NIS)
[Chart: task completion times for drawing lines and areas.]
20
Multimodal New Vocabulary Recognition
Kaiser et al., 2004
Preliminary results: phone error rate 47% via speech,
19% via handwriting (HW), 16% via speech + HW (SPHW).
Ongoing work: timing-based OOV detection, phonetic
alignment (following Kondrak), abbreviations, other
knowledge sources. (An alignment sketch follows.)
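The alignment step might look like the following sketch, which uses plain 0/1 edit costs; Kondrak's algorithm instead scores alignments by articulatory-feature similarity, and the phoneme strings here are invented.

```python
# Sketch of aligning a speech phoneme hypothesis with a handwriting-derived
# pronunciation, using plain edit-distance dynamic programming.
def edit_distance(a, b):
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # match / substitution
    return d[n][m]

speech_phones = ["B", "R", "AA", "K", "T", "AO", "L"]  # invented hypothesis
hw_phones = ["B", "R", "AA", "K", "T", "AA", "L"]      # from handwriting
# A small distance suggests both streams saw the same out-of-vocabulary word.
print(edit_distance(speech_phones, hw_phones))  # 1
```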
21
Personal Assistant that Learns
  • DARPA project, via SRI International, with 30
    universities; 2nd year of a 5-year project (we
    hope)
  • CALO system
  • helps with office tasks,
  • assists during meetings
  • learns from user behavior

22
CALO Meeting Understanding Concept
Summarize Visualize Search Assist
  • User Need
  • Understand "What topics were discussed, what were
    the participants positions, what decisions
    resulted, what were the action items?"
  • Assist in carrying out their organizational role
  • Approach
  • Capture, analyze, summarize and enable users to
    interactively explore meetings
  • Operate within context of the user and their
    business process

23
Sensing and early processing
All data are synchronized to enable fusion; multiple
sites may be involved.
24
Meeting room
Personal CALO introduces context:
  • Meeting agenda, participants
  • Projects, schedules, people
  • Articles, documents, email
  • Topics and language
  • Personal models for voice, gesture, ...
25
Our contributions
  • Cognitive models of multimodal interaction during
    meetings
    • Intermodality timing, redundancy/complementarity
      of modes, etc.
  • Multimodal integration of speech, gesture, and
    posture
  • Multimodal fusion of speech, sketch, and writing
    • Especially new vocabulary acquisition (drawing on
      other data sources as well)
    • Useful for whiteboard activity and personal notes
      on digital paper
  • Dialogue manager based on joint intention theory
    • Tracks users' individual and joint
      (organizational) commitments through meetings,
      email, and tasks

26
Data collection infrastructure
27
Demonstration
CMU: speech; MIT: body tracking; OHSU: multimodal
fusion (speech, writing/sketch, 3D gesture);
Stanford: NLP, dialogue
28
Summary
  • Multiparty multimodal dialogues incorporating 2D
    (handwriting, sketch) and 3D gesture, and head
    position
  • Will be tested in May
  • Goal is a system that people will use, initially
    with limited functionality
  • Future: apply MNVR to notes, documents, email,
    etc., to improve speech recognition

29
Tangible Multimodal Systems for Safety-Critical
Applications
What's Missing?
A Division Command Post during an exercise
McGee et al., CHI '02; Cohen & McGee, CACM '04
30
What they use
31
Many work practices rely on paper
ATC -- Mackay 98
ICU -- Gorman et al., 2000
32
  • Why do they use paper?
  • Already know the interface
  • Poor computer interfaces
  • Fail-safe: robust to power outages
  • High resolution
  • Large/small scale
  • Cheap
  • Lightweight
  • Portable
  • Collaboration

33
Clinical Data Entry
  • "Perhaps the single greatest challenge that has
    consistently confronted every clinical system
    developer is to engage clinicians in direct data
    entry." (IOM, 1997, p. 125)
  • "To make it simple for the practitioner to
    interact with the record, data entry must be
    almost as easy as writing." (IOM, 1997, p. 88)

34
Multimodal Interaction with Paper (NIS)
Based on Anoto technology
35
Benefits
  • Most people (incl. kids, seniors) know how to use
    a pen
  • Portability (works over a cell phone)
  • Ubiquity: paper is everywhere
  • Collaborative: multiple simultaneous pens
  • Next: use for note-taking, alone or in meetings;
    fuse with ongoing speech
  • Many new applications, e.g., architecture,
    engineering, education, field data capture

36
Elementary Science Education
  • Sharon Oviatt

37
Quiet Interfaces that Help People Think
Sharon Oviatt, oviatt@cse.ogi.edu,
http://www.cse.ogi.edu/CHCC/