Title: A Context Inference and Multimodal Approach to Mobile Information Access
1. A Context Inference and Multimodal Approach to Mobile Information Access
- David West
- Trent Apted
- Aaron Quigley
2. Overview
- Multimodal Interaction (David West)
- Motivation: Email Scenario
- Agent-Based Architecture
- EMMA
- Application Supplied Components
- Input Generation
- Implicit IO Agent Configuration
- Application Processing Context
- Implementation
- Context Awareness (Trent Apted)
- Relevance
- Our Approach
- Implementation
- Conclusion
3. Motivation - Example
Instrumented Environment
4. Motivation
- To support perceptual spaces: multiple users, multiple (embedded) input/output devices, multiple applications
- To enable seamless roaming between PAN/PLAN/PWAN environments
- The user should be able to approach any device and use it to continue her multimodal dialogue without interruption (e.g. without explicit configuration)
- To decouple multimodal application processing logic from modality-specific input/output processing logic
5. Distributed Agent-Based Architecture
[Architecture diagram: input agents (Speech, Pen, Mouse/Keyboard, ...) send EMMA through the context plane to application agents (Email, Scrapbook), which send output through the context plane to output agents (Voice, Graphics (XHTML/XUL/...))]
- Application agents and input/output agents reside on multiple devices
- The context plane controls their bindings
6. EMMA Example
<emma:emma emma:version="1.0">
  <emma:interpretation emma:id="speech1"
      emma:start="2004-07-26T0:00:00.2"
      emma:end="2004-07-26T0:00:00.4"
      emma:confidence="0.8"
      emma:medium="acoustic"
      emma:mode="speech">
    <command>next</command>
  </emma:interpretation>
  <emma:interpretation></emma:interpretation>
</emma:emma>
7. Application Supplied Components for Input and Output Agents
8. Input Generation
- Recognition: users produce an input signal which is passed through a recogniser in the input agent. The recogniser is constrained by a grammar.
- E.g. an email application speech grammar may look like:
  - public <COMMAND> = Read | Next | Delete | <FILE>;
  - <FILE> = (File | store | move) (in | to) folder <FOLDER>;
  - <FOLDER> = personal | spam | work;
9. Application Supplied Components for Input and Output Agents
10. Input Generation
- Interpretation: once user input is recognised, it is passed through an interpretation component to produce EMMA.
- E.g. an EMMA document may contain the following interpretation:

<command>
  File
  <folder>
    personal
  </folder>
</command>
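A minimal sketch of such an interpretation component, assuming the pipeline above; the class and method names here are illustrative, not the framework's actual API. It maps a recognised utterance onto the instance data shown in the example.

```java
// Hypothetical interpretation component for the email application's
// "File ... folder <FOLDER>" command (names invented for illustration).
public class FileCommandInterpreter {

    // Naive slot filling: treat the final token of the utterance
    // as the folder name and wrap it in EMMA instance data.
    public String interpret(String utterance) {
        String[] tokens = utterance.trim().split("\\s+");
        String folder = tokens[tokens.length - 1];
        return "<command>File<folder>" + folder + "</folder></command>";
    }

    public static void main(String[] args) {
        System.out.println(new FileCommandInterpreter()
                .interpret("File in to folder personal"));
        // prints: <command>File<folder>personal</folder></command>
    }
}
```

A real interpreter would be driven by the grammar's parse tree rather than token position, but the input/output contract is the same: recognised text in, EMMA instance data out.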
11. Implicit Activation of IO Agents
- Command agent activation of input and output agents
- An identification component: the user indicates they wish to use an agent
- Simple GUI login mechanism
- Could use biometric identification mechanisms (thumbprint, retina scanners, ...)
- RFID, proximity sensors, ...
12. Seamless Agent Configuration
- 1. Pen agent requests ApplicationWithFocus for the current user
- 2. Pen agent requests the data-plane location of the grammar and interpreter
- 3. Pen agent loads the grammar and interpreter
- 4-6. As for 1-3
- 7. Pen agent sends EMMA to the email agent
- 8. Email agent sends application-defined output to the Voice agent
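The binding sequence above can be sketched as follows. The in-memory map is a hypothetical stand-in for the context plane's shared tuple space, and the key names and data-plane URLs are invented for illustration:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a pen agent binding itself to the focused application.
public class BindingSketch {

    public static void main(String[] args) {
        // Hypothetical context-plane state (a real deployment would
        // query the LIME tuple space instead of a local map).
        Map<String, String> contextPlane = new HashMap<>();
        contextPlane.put("ApplicationWithFocus:alice", "email");
        contextPlane.put("grammar:email", "dataplane://email/pen.gram");
        contextPlane.put("interpreter:email", "dataplane://email/PenInterpreter.class");

        // 1. Pen agent asks which application has focus for this user.
        String app = contextPlane.get("ApplicationWithFocus:alice");
        // 2. Pen agent asks for the data-plane locations of that
        //    application's grammar and interpreter.
        String grammarUrl = contextPlane.get("grammar:" + app);
        String interpreterUrl = contextPlane.get("interpreter:" + app);
        // 3. Pen agent loads both; it can now send EMMA to the email agent.
        System.out.println(app + " " + grammarUrl + " " + interpreterUrl);
    }
}
```

Because the lookup is keyed only by user and focused application, any input agent the user walks up to can run the same three steps without explicit configuration.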
13. Application Processing Context
- Input agents also load command grammars/interpreters
- Allow application focus shifts
- E.g. "Switch to my scrap book application"
- An application focus shift causes the context plane to trigger all input/output agents in use by a user to rebind to the new application
- Currently only one active application is allowed
14. Implementation
- Object-oriented framework
- Input modes: pen, speech, GUI
- Output modes: text-to-speech, GUI
- Test applications: email and scrapbook
- See our UbiComp 2004 demo paper, "The UbiComp Scrapbook"
- Current context plane implemented using LIME (Linda In a Mobile Environment), providing a shared tuple space abstraction
- Grammars, interpreters and stylers are Java classes
- Stored in the data plane
- Custom class loaders in IO agents load these from the data plane as required
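The custom class-loading step might look like the following sketch. This is not the system's actual loader: here the local classpath stands in for the data plane, and the fetch logic is the part a real agent would replace with a data-plane lookup.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Sketch of an IO agent's class loader for grammar/interpreter/styler
// classes. defineClass() turns the fetched bytes into a Class object.
public class DataPlaneClassLoader extends ClassLoader {

    @Override
    public Class<?> findClass(String name) throws ClassNotFoundException {
        byte[] bytes = fetchBytes(name);
        return defineClass(name, bytes, 0, bytes.length);
    }

    // In the real system these bytes would be fetched from the data
    // plane; here we read the .class file from the classpath instead.
    private byte[] fetchBytes(String name) throws ClassNotFoundException {
        String path = name.replace('.', '/') + ".class";
        try (InputStream in = getResourceAsStream(path)) {
            if (in == null) throw new ClassNotFoundException(name);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) out.write(buf, 0, n);
            return out.toByteArray();
        } catch (IOException e) {
            throw new ClassNotFoundException(name, e);
        }
    }

    public static void main(String[] args) {
        DataPlaneClassLoader loader = new DataPlaneClassLoader();
        try {
            loader.findClass("some.missing.PenInterpreter");
        } catch (ClassNotFoundException e) {
            System.out.println("not available: " + e.getMessage());
        }
    }
}
```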
15. Context Awareness
16. Context Awareness
- Not just concerned with location and time, nor simply computing resources
- Applications:
- present their own context
- provide evidence to influence the resolution
- we resolve the context into high-level concepts
- We want to function well without this evidence, but better when we have it
- We want to learn from user decisions
17. Motivation - Recap
Instrumented Environment
18. Scenario
- As you walk into a room, you receive an email
- There is evidence to suggest that the email is urgent and confidential
- From past actions, we know that you prefer to read email on large displays
- However, the large, public display in this room can be seen by anyone else in the room
- But we also know there is nobody else in the room
19. Decisions
- How do we convey the context to the email application?
- How do we initially present the email?
- How do you influence the choice of output modality we make for you?
- How are you able to reply to the email?
- How do we learn from the choices you make?
20. Our Approach
- Basic rules (domain knowledge)
- Ontologies to aid our sharing of context
- Infer context through relationships
- Probabilistic and temporal logic
- Resolve context to establish possible actions
- Rank the possibilities based on suitability, user preference and evidence provided
- Feedback (reinforcement learning)
- If the user adjusts the decision, adjust the reasoning model
- The Context Plane
21. Implementation (current work)
- We want to harness existing work
- Representation, rule-based inference
- CYC Upper Ontology (domain knowledge)
- F-OWL (F-logic for the Web Ontology Language)
- Probabilistic and temporal logic
- Dynamic Bayesian networks (K. Murphy 2002)
- Intel Probabilistic Networks Library (PNL)
- Infrastructure (the Plane)
- Applications tap in regardless of connectivity
- Feedback, new applications, new context
- Our own techniques for collecting evidence and dynamically adapting our inference network
22. Conclusion
- An infrastructure and protocol for multimodal interaction
- supports multiple users across multiple applications
- multiple I/O modalities in a mobile/instrumented environment
- Integrated with a supporting information access infrastructure (currently using LIME)
- Discussion
- dwest, tapted, aquigley_at_it.usyd.edu.au
23. Questions / Discussion
24. This slide intentionally left blank
25. Virtual Personal Server Space (VPSS)
26. Application Input
- Input agents send application-defined, modality-neutral input to application agents in the form of the Extensible MultiModal Annotation Language (EMMA), part of the W3C's multimodal application framework
- EMMA consists of:
- Instance data: application-specific interpretation(s) of user intent
- Data model: specifies constraints on the format of the instance data, e.g. XML Schema, DTD; may be implicit
- Metadata: information about the instance data, e.g. timestamps, confidence scores, process information...
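These three parts map directly onto the earlier speech example; the comments below are added here to label each part (in that example the data model is left implicit):

```xml
<emma:emma emma:version="1.0">
  <!-- Metadata: timestamps, confidence, medium and mode describe
       the instance data without being part of it -->
  <emma:interpretation emma:id="speech1"
      emma:start="2004-07-26T0:00:00.2" emma:end="2004-07-26T0:00:00.4"
      emma:confidence="0.8" emma:medium="acoustic" emma:mode="speech">
    <!-- Instance data: the application-specific user intent -->
    <command>next</command>
  </emma:interpretation>
  <!-- Data model: a schema or DTD constraining <command> could be
       referenced here; in this example it is implicit -->
</emma:emma>
```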
27. Software Architecture
- Object-oriented framework
- Application writers write only the top-most layer
28. Our Method
- The Context Plane
- Collects and resolves context from the infrastructure
- Makes context available to mobile devices
- Collects context/evidence from applications to share with other applications and assist inferences
- Uses a common protocol between applications
- Binds application and I/O agents across multiple devices