Title: Distributed Multimodal Synchronization Protocol (DMSP)
- Chris Cross
- IETF 66
- July 13, 2006
With contributions from Gerald McCobb and Les Wilson
DMSP Background
- Result of a 4-year IBM and Motorola R&D effort
- OMA Architecture Document
- Multimodal and Multi-device Architecture
- http://member.openmobilealliance.org/ftp/Public_documents/BAC/MAE/Permanent_documents/OMA-AD-MMMD-V1_0-20060612-D.zip
- ID submitted to IETF on July 8, 2005 by IBM and Motorola
- Reason for contribution
- A standard is needed to synchronize network-based services implementing distributed modalities in multimodal applications
- Other protocols may overlap but do not address all multimodal interaction requirements
- Other IETF IDs and RFCs
- Media Server Control Protocol (MSCP)
- LRDP: The Lightweight Remote Display Protocol (Remote UI BoF)
- Media Resource Control Protocol Version 2 (MRCPv2)
- Widex
- RFC 1056: Distributed Mail System for Personal Computers (its protocol is also named DMSP)
Why do you need a distributed system, i.e., a Thin Client?
A thick client performs speech recognition and synthesis on the device. As the resources available on a device shrink, or as application requirements grow (e.g., larger application grammars), system performance eventually becomes unacceptable. Once that threshold is reached, it becomes economically feasible to distribute speech processing over the network.
(Figure: Grammar Size and Complexity. R = resources (memory and MIPS) on the client device; G = size and complexity of application grammars; R/G > 1: resources are adequate to perform real-time recognition and synthesis.)
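The threshold on this slide can be read as a simple placement rule. The Python sketch below is illustrative only: it assumes R and G are already expressed in comparable units (the slide defines no scale), it takes R/G > 1 as the condition for local processing, and the function name `choose_speech_placement` is hypothetical.

```python
def choose_speech_placement(resources: float, grammar_load: float) -> str:
    """Decide where speech recognition and synthesis should run.

    resources    -- R: client memory/MIPS budget (hypothetical comparable units)
    grammar_load -- G: size and complexity of the application grammars
    Returns "thick" (speech on the device) or "thin" (speech in the network).
    """
    if resources <= 0 or grammar_load <= 0:
        raise ValueError("R and G must both be positive")
    ratio = resources / grammar_load
    # Assumed rule: only R/G > 1 leaves enough headroom for real-time local
    # recognition and synthesis; at or below 1, distribute speech to a server.
    return "thick" if ratio > 1 else "thin"


print(choose_speech_placement(resources=200, grammar_load=50))   # thick
print(choose_speech_placement(resources=200, grammar_load=400))  # thin
```

The same device can move from thick to thin simply because an application ships a larger grammar, which is the scenario this slide motivates.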
Multimodal Use Cases
- Opera XV Pizza demo
- XV (XHTML+Voice)
- JV
- Future W3C multimodal languages (VoiceXML 3, etc.)
DMSP Architecture
- There are 4 DMSP building blocks
- Modalities
- Model-View-Controller (MVC) design pattern
- View Independent Model
- Event-based modality synchronization
DMSP Architecture Building Blocks
- Modalities are Views in the MVC Pattern
- GUI, Speech, Pen
- Individual browsers for each modality
- Compound browsers for multiple modalities
(Figure: Compound Browser)
DMSP Architecture Building Blocks
- Model-View-Controller (MVC) design pattern
- A multimodal system can be modeled in terms of the MVC pattern
- Each modality can be decomposed and implemented in its own MVC pattern
- A modality can implement a view independent model and controller locally or use one in the network (e.g., an Interaction Manager, IM)
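To make the MVC decomposition concrete, here is a minimal Python sketch, not taken from the DMSP specification: one shared model, two modality views (GUI and speech), and a controller. All class and method names are hypothetical. Input arriving through either modality updates the model, and the model re-renders every view, which is the synchronization behavior described above.

```python
from typing import Dict, List


class View:
    """One modality (GUI, speech, pen) rendering the shared model its own way."""

    def __init__(self, modality: str) -> None:
        self.modality = modality
        self.rendered: Dict[str, str] = {}

    def render(self, name: str, value: str) -> None:
        self.rendered[name] = value
        print(f"[{self.modality}] {name} = {value}")


class Model:
    """Shared model: holds field values and notifies every attached view."""

    def __init__(self) -> None:
        self.fields: Dict[str, str] = {}
        self._views: List[View] = []

    def attach(self, view: View) -> None:
        self._views.append(view)

    def set_field(self, name: str, value: str) -> None:
        self.fields[name] = value
        for view in self._views:  # every modality sees the change
            view.render(name, value)


class Controller:
    """Turns modality-specific input into model updates."""

    def __init__(self, model: Model) -> None:
        self.model = model

    def on_input(self, source: str, name: str, value: str) -> None:
        # The model does not care which modality produced the input;
        # 'source' is used only for logging here.
        print(f"input from {source}")
        self.model.set_field(name, value)


model = Model()
gui, speech = View("GUI"), View("speech")
model.attach(gui)
model.attach(speech)
Controller(model).on_input("speech", "topping", "mushrooms")
```

Whether the model and controller live in the client or in a network Interaction Manager does not change this shape; only the transport between the pieces differs.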
DMSP Architecture Building Blocks
- View Independent Model
- Enables a centralized model
- Modality interaction updates view and model
- Local event filters reflect important events to the view independent model
- A modality listens to the view independent model for only the events it cares about
- Compound clients, centralized control through an Interaction Manager, and distributed modalities are all enabled with a single protocol
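The filtering and listening behavior on this slide can be sketched as follows. This is a hypothetical in-process Python illustration: the class names, the `reflect` method, and the choice of which events count as important are all assumptions. In a distributed deployment, the calls between filter, model, and listener would travel as DMSP messages.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List, Set

Listener = Callable[[str, Dict[str, str]], None]


class ViewIndependentModel:
    """Central model that forwards each event only to modalities that asked for it."""

    def __init__(self) -> None:
        self._listeners: DefaultDict[str, List[Listener]] = defaultdict(list)

    def add_listener(self, event_type: str, listener: Listener) -> None:
        self._listeners[event_type].append(listener)

    def reflect(self, event_type: str, data: Dict[str, str]) -> None:
        # .get() avoids creating empty entries for event types nobody watches
        for listener in self._listeners.get(event_type, []):
            listener(event_type, data)


class LocalEventFilter:
    """Runs inside one modality and passes only 'important' events upstream."""

    def __init__(self, model: ViewIndependentModel, important: Set[str]) -> None:
        self.model = model
        self.important = important

    def on_local_event(self, event_type: str, data: Dict[str, str]) -> bool:
        if event_type in self.important:
            self.model.reflect(event_type, data)
            return True
        return False  # e.g., mouse movement stays local to the GUI


vim = ViewIndependentModel()
received: List[str] = []
# The speech modality registers interest in focus changes only.
vim.add_listener("DOMFocusIn", lambda t, d: received.append(f"{t}:{d['field']}"))

gui_filter = LocalEventFilter(vim, important={"DOMFocusIn", "submit"})
gui_filter.on_local_event("mousemove", {"field": "size"})   # filtered out locally
gui_filter.on_local_event("DOMFocusIn", {"field": "size"})  # reflected to speech
gui_filter.on_local_event("submit", {"field": "order"})     # reflected, no listener
print(received)  # ['DOMFocusIn:size']
```

Filtering at the source keeps chatty GUI events off the network, while listener registration keeps each modality from receiving events it would ignore anyway.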
DMSP Architecture Building Blocks
- Event-based synchronization
- Compound client: all modalities are rendered in the client
- Interactions in one modality are reflected in the others through event-based changes to one or more models
- The GUI DOM can serve as the view independent model
- Connecting non-DOM user agents (UAs) to the ones with a DOM
DMSP Architecture Building Blocks
- Event-based synchronization (cont.)
- Distributed modality: a modality is handled in the infrastructure
- Requires DMSP to distribute the modality
- Event-based synchronization via the view independent model gives a modality-independent distribution mechanism
- Enables multiple topologies
- Compound client with distributed modality
- Interaction Manager
DMSP Design
- There are 4 abstract interfaces
- Command
- Response
- Event
- Signal
- Each interface defines a set of methods and related data structures exchanged between user agents
- Specified as a set of messages
- XML and binary message encodings
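The four interfaces can be modeled as a tagged message envelope. The Python sketch below is an assumption-laden illustration: the `Message` class, the element and attribute names in its XML, and the `param` children are hypothetical and do NOT reproduce the XML encoding defined for DMSP. It only shows how one logical message can be serialized to XML and parsed back without loss.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class Interface(Enum):
    COMMAND = "command"
    RESPONSE = "response"
    EVENT = "event"
    SIGNAL = "signal"


@dataclass
class Message:
    interface: Interface
    name: str                                   # e.g. "CMD_LOAD_URL", "SIG_INIT"
    params: Dict[str, str] = field(default_factory=dict)

    def to_xml(self) -> str:
        # Hypothetical element layout, not the DMSP schema.
        root = ET.Element(self.interface.value, {"name": self.name})
        for key, value in self.params.items():
            ET.SubElement(root, "param", {"name": key}).text = value
        return ET.tostring(root, encoding="unicode")

    @staticmethod
    def from_xml(text: str) -> "Message":
        root = ET.fromstring(text)
        params = {p.get("name"): (p.text or "") for p in root.findall("param")}
        return Message(Interface(root.tag), root.get("name"), params)


msg = Message(Interface.COMMAND, "CMD_LOAD_URL", {"url": "http://example.com/pizza.vxml"})
xml_text = msg.to_xml()
print(xml_text)
# <command name="CMD_LOAD_URL"><param name="url">http://example.com/pizza.vxml</param></command>
```

A binary encoding would carry the same logical fields; its wire layout is not sketched here.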
DMSP Message Types
- Signals
- One-way asynchronous messages used to negotiate internal processing states
- Initialization (SIG_INIT)
- VXML Start (SIG_VXML_START)
- Close (SIG_CLOSE)
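Because signals are one-way, each side has to track the other's processing state on its own. A minimal Python sketch of such tracking follows. The state names and the transition table (initialize, then optionally start VoiceXML, then close) are assumptions inferred from the three signal names, not rules from the specification.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    INITIALIZED = auto()
    VXML_RUNNING = auto()
    CLOSED = auto()


# Assumed transition table: which signal is accepted in which state.
TRANSITIONS = {
    (State.IDLE, "SIG_INIT"): State.INITIALIZED,
    (State.INITIALIZED, "SIG_VXML_START"): State.VXML_RUNNING,
    (State.INITIALIZED, "SIG_CLOSE"): State.CLOSED,
    (State.VXML_RUNNING, "SIG_CLOSE"): State.CLOSED,
}


class SignalEndpoint:
    """Tracks one side's processing state as one-way signals arrive."""

    def __init__(self) -> None:
        self.state = State.IDLE

    def on_signal(self, signal: str) -> State:
        # Signals get no response message, so an unexpected signal is
        # rejected locally instead of being answered with RESP_ERROR.
        next_state = TRANSITIONS.get((self.state, signal))
        if next_state is None:
            raise ValueError(f"{signal} not valid in state {self.state.name}")
        self.state = next_state
        return next_state


endpoint = SignalEndpoint()
for sig in ("SIG_INIT", "SIG_VXML_START", "SIG_CLOSE"):
    print(sig, "->", endpoint.on_signal(sig).name)
```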
DMSP Message Types
- Command and control messages
- Add and remove event listener (CMD_ADD/REMOVE_EVT_LISTENER)
- Can dispatch (CMD_CAN_DISPATCH)
- Dispatch event (CMD_DISPATCH_EVT)
- Load URL (CMD_LOAD_URL)
- Load Source (CMD_LOAD_SRC)
- Get and Set Focus (CMD_GET/SET_FOCUS)
- Get and Set Fields (CMD_GET/SET_FIELDS)
- Cancel (CMD_CANCEL)
- Execute Form (CMD_EXEC_FORM)
- Get and Set Cookies (CMD_GET/SET_COOKIES)
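On the receiving user agent, the command set maps naturally onto a dispatch table whose handlers return DMSP response types (RESP_OK, RESP_BOOL, RESP_STRING, RESP_FIELDS, RESP_ERROR). The Python sketch below handles only a subset of the commands. The payload shapes, the expansion of CMD_ADD/REMOVE_EVT_LISTENER into CMD_ADD_EVT_LISTENER, and the meaning given to CMD_CAN_DISPATCH (whether a listener for the event type is registered) are all assumptions. Unhandled commands produce RESP_ERROR.

```python
from typing import Callable, Dict, List, Tuple

Response = Tuple[str, object]  # (response type, payload)


class CommandTarget:
    """A user agent that executes DMSP-style commands from a peer (sketch)."""

    def __init__(self) -> None:
        self.fields: Dict[str, str] = {}
        self.focus = ""
        self.listeners: List[str] = []
        self._handlers: Dict[str, Callable[[dict], Response]] = {
            "CMD_SET_FIELDS": self._set_fields,
            "CMD_GET_FIELDS": self._get_fields,
            "CMD_SET_FOCUS": self._set_focus,
            "CMD_GET_FOCUS": self._get_focus,
            "CMD_ADD_EVT_LISTENER": self._add_listener,
            "CMD_CAN_DISPATCH": self._can_dispatch,
        }

    def execute(self, command: str, args: dict) -> Response:
        handler = self._handlers.get(command)
        if handler is None:
            return ("RESP_ERROR", f"unsupported command {command}")
        return handler(args)

    def _set_fields(self, args: dict) -> Response:
        self.fields.update(args["fields"])
        return ("RESP_OK", None)

    def _get_fields(self, args: dict) -> Response:
        names = args.get("names", list(self.fields))
        return ("RESP_FIELDS", {n: self.fields[n] for n in names if n in self.fields})

    def _set_focus(self, args: dict) -> Response:
        self.focus = args["field"]
        return ("RESP_OK", None)

    def _get_focus(self, args: dict) -> Response:
        return ("RESP_STRING", self.focus)

    def _add_listener(self, args: dict) -> Response:
        self.listeners.append(args["event"])
        return ("RESP_OK", None)

    def _can_dispatch(self, args: dict) -> Response:
        # Assumed semantics: is anyone listening for this event type?
        return ("RESP_BOOL", args["event"] in self.listeners)


ua = CommandTarget()
print(ua.execute("CMD_SET_FIELDS", {"fields": {"size": "large"}}))  # ('RESP_OK', None)
print(ua.execute("CMD_GET_FIELDS", {}))  # ('RESP_FIELDS', {'size': 'large'})
print(ua.execute("CMD_LOAD_URL", {"url": "pizza.html"})[0])  # RESP_ERROR (not in this sketch)
```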
DMSP Message Types
- Responses
- Response messages to commands
- OK (RESP_OK)
- Boolean (RESP_BOOL)
- String (RESP_STRING)
- Fields (RESP_FIELDS)
- Contains 1 or more Field data structures
- Error (RESP_ERROR)
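A requester can validate each response against the command it sent. In the Python sketch below, the command-to-response pairing (RESP_BOOL for CMD_CAN_DISPATCH, RESP_STRING for CMD_GET_FOCUS, RESP_FIELDS for CMD_GET_FIELDS, RESP_OK otherwise) is inferred from the message names rather than taken from the specification, and `check_response` and `DmspError` are hypothetical.

```python
from typing import Dict

# Assumed pairing of commands with their success response type;
# commands not listed are assumed to answer with RESP_OK.
EXPECTED_RESPONSE: Dict[str, str] = {
    "CMD_CAN_DISPATCH": "RESP_BOOL",
    "CMD_GET_FOCUS": "RESP_STRING",
    "CMD_GET_FIELDS": "RESP_FIELDS",
}


class DmspError(Exception):
    """Raised when the peer answers a command with RESP_ERROR."""


def check_response(command: str, resp_type: str, payload: object) -> object:
    """Return the payload if resp_type is the expected reply to command."""
    if resp_type == "RESP_ERROR":
        raise DmspError(f"{command} failed: {payload}")
    expected = EXPECTED_RESPONSE.get(command, "RESP_OK")
    if resp_type != expected:
        raise ValueError(f"{command}: expected {expected}, got {resp_type}")
    if resp_type == "RESP_FIELDS" and not payload:
        # A Fields response contains one or more Field data structures.
        raise ValueError("RESP_FIELDS must carry at least one Field")
    return payload


print(check_response("CMD_GET_FOCUS", "RESP_STRING", "size"))  # size
try:
    check_response("CMD_LOAD_URL", "RESP_ERROR", "404")
except DmspError as err:
    print(err)  # CMD_LOAD_URL failed: 404
```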
DMSP Message Types
- Events
- Asynchronous notifications between user agents with a common data structure
- Events correlated with event listeners
- DOM events
- DOMActivate, DOMFocusIn, and DOMFocusOut
- HTML 4 events
- Click, mouse, key, submit, reset, etc.
- Error and abort
- VXML Done (e.g., VoiceXML form complete)
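The slide states that events are correlated with event listeners but does not show the mechanism. One plausible approach, sketched below in Python under that assumption, is for the requester to assign an identifier when it sends CMD_ADD_EVT_LISTENER and for each event message to carry that identifier back. The `listener_id` field, the `ListenerTable` class, and the ID scheme are hypothetical.

```python
import itertools
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class Event:
    listener_id: int   # hypothetical correlation key echoed by the sender
    event_type: str    # e.g. "DOMFocusIn", "submit"
    target: str        # element or form the event refers to
    data: Dict[str, str] = field(default_factory=dict)


Callback = Callable[[Event], None]


class ListenerTable:
    """Requester-side table that correlates incoming events with listeners."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._table: Dict[int, Tuple[str, Callback]] = {}

    def add(self, event_type: str, callback: Callback) -> int:
        # The returned ID would accompany CMD_ADD_EVT_LISTENER (assumption).
        listener_id = next(self._ids)
        self._table[listener_id] = (event_type, callback)
        return listener_id

    def remove(self, listener_id: int) -> None:
        # Counterpart of CMD_REMOVE_EVT_LISTENER; unknown IDs are ignored.
        self._table.pop(listener_id, None)

    def deliver(self, event: Event) -> bool:
        entry = self._table.get(event.listener_id)
        if entry is None or entry[0] != event.event_type:
            return False  # stale or mismatched event: drop it
        entry[1](event)
        return True


log: list = []
table = ListenerTable()
lid = table.add("DOMFocusIn", lambda e: log.append(e.target))
print(table.deliver(Event(lid, "DOMFocusIn", "size")))  # True
table.remove(lid)
print(table.deliver(Event(lid, "DOMFocusIn", "size")))  # False
print(log)  # ['size']
```

Dropping events whose listener was already removed avoids acting on notifications that were in flight when the listener was cancelled.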
DMSP Message Types
- Events (cont.)
- Form Data
- One or more Field data structures (GUI or Voice)
- Recognition Results
- One or more Result data structures with raw utterance, score, and one or more Field data structures
- Recognition Results EX
- One or more Result EX data structures with raw utterance, score, grammar, and semantics
- Start and stop playback
- Playback of audio or TTS prompts has started or stopped
- Start and stop playback mark
- TTS encounters a mark in the play text
- Custom (i.e., application-defined)
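The Field, Result, and Result EX structures can be sketched as plain Python data classes, together with one way a GUI might consume an n-best list of recognition results. The field types, the 0-to-1 score scale, the `min_score` cutoff, the grammar name, and the `best_fields` helper are assumptions for illustration, not the draft's exact definitions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Field:
    name: str
    value: str


@dataclass
class Result:
    utterance: str       # raw recognized utterance
    score: float         # recognizer confidence (0..1 scale assumed)
    fields: List[Field]  # one or more Field structures


@dataclass
class ResultEx:
    utterance: str
    score: float
    grammar: str                # grammar that matched, e.g. a URI (assumed)
    semantics: Dict[str, str]   # semantic interpretation (representation assumed)


def best_fields(results: List[Result], min_score: float = 0.5) -> Dict[str, str]:
    """Return the fields of the highest-scoring result, or {} if none is confident."""
    if not results:
        return {}
    top = max(results, key=lambda r: r.score)
    if top.score < min_score:
        return {}  # too uncertain: leave the GUI form unchanged
    return {f.name: f.value for f in top.fields}


n_best = [
    Result("large pepperoni", 0.82, [Field("size", "large"), Field("topping", "pepperoni")]),
    Result("large pepper", 0.41, [Field("size", "large"), Field("topping", "pepper")]),
]
print(best_fields(n_best))  # {'size': 'large', 'topping': 'pepperoni'}
ex = ResultEx("large pepperoni", 0.82, "pizza.grxml", {"size": "large", "topping": "pepperoni"})
print(ex.grammar, ex.semantics["topping"])
```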
DMSP Conclusion
- A protocol dedicated to distributed multimodal interaction
- Based on the Model-View-Controller design pattern
- Enables both Interaction Manager and client-based view independent model topologies
- Asynchronous signals and events
- Command-response messages
- Can be generalized to modalities other than GUI and voice
- Supports application-specific result protocols (e.g., EMMA) through an extension (TBD)
- Interested in getting more participation
Draft Charter
- The convergence of wireless communications with information technology and the miniaturization of computing platforms have resulted in advanced mobile devices that offer high-resolution displays, application programs with graphical user interfaces, and access to the internet through full-function web browsers.
- Mobile phones now support most of the functionality of a laptop computer. However, the miniaturization that has made the technology possible and commercially successful also constrains the user interface. Tiny displays and keypads significantly reduce the usability of application programs.
- Multimodal user interfaces, i.e., UIs that offer multiple modes of interaction, have been developed that greatly improve the usability of mobile devices. In particular, multimodal UIs that combine speech and graphical interaction are proving themselves in the marketplace.
- However, not all mobile devices provide the computing resources to perform speech recognition and synthesis locally on the device. For these devices, it is necessary to distribute the speech modality to a server in the network.
Draft Charter (cont.)
- The Distributed Multimodal Working Group will develop the protocols necessary to control, coordinate, and synchronize distributed modalities in a distributed multimodal system. Several protocols and standards are necessary to implement such a system, including DSR and AMR speech compression, session control, and media streaming. However, the DM WG will focus exclusively on the synchronization of modalities being rendered across a network, in particular Graphical User Interface and Voice Servers.
- The DM WG will develop an RFC for a Distributed Multimodal Synchronization Protocol that defines the logical message set to effect synchronization between modalities, along with enough background on the expected multimodal system architecture (or a reference architecture defined elsewhere in W3C or OMA) to present a clear understanding of the protocol. It will investigate existing protocols for the transport of the logical synchronization messages and develop an RFC detailing the message format for commercial alternatives, possibly including HTTP and SIP.
Draft Charter (cont.)
- While not limited to these, to keep the scope simple the protocol will assume RTP for carriage of media, SIP and SDP for session control, and DSR, AMR, QCELP, etc., for speech compression. The working group will not consider the authoring of applications, as it is assumed that this will be done with existing W3C markup standards such as XHTML and VoiceXML and commercial programming languages like Java and C/C++.
Draft Charter (cont.)
- It is expected that we will coordinate our work in the IETF with the W3C Multimodal Interaction Working Group.
- The following are our goals for the Working Group (date: milestone):
- TBD: Submit Internet-Draft describing DMSP (standards track)
- TBD: Submit drafts to IESG for publication
- TBD: Submit DMSP specification to IESG