Title: An Oxygenated Presentation Manager
1. An Oxygenated Presentation Manager
- Larry Rudolph
- Oxygen Workshop, January, 2002
2. Goals Overview
- Integrate Many Oxygen Technologies
- Application Driven
- Use an application that we understand
- Personally use often
- Would help if it were more human-centric
- Portable (as opposed to E-21)
- Develop Architectural Infrastructure
- Exposes new requirements
- Critique of Presentation Manager
- What is wrong with it
- What needs improvement
3. Application Scenario
4. An Oxygen Application
- Components
- Input
- Vision
- Speech
- Touch
- Output
- Projector
- Handheld
- Archive
- Processing
- Changing configuration
- Equipment
- Today, it is too hard: Linux laptop, Windows laptop, camera, microphone, network, projector, power blocks
- Tomorrow, much easier: a couple of H21s
5. Camera watching laser point on screen
- Camera Challenges
- Inexpensive ones have wrong focal length
- Alignment issues
- Use the edge of the screen: display a pattern, figure out the alignment from what is known to be visible
- We ended up displaying a pattern of concentric circles
- Relative size of the laser point depends on distance
- Beyond ten feet, had to use only certain types of lasers
- Could slow down the camera and let pixels saturate (too complicated)
6. Camera watching laser point on screen (cont.)
- Camera Interface
- Click at point (x,y)
- Hold laser at same location for 5 seconds
- Select horizontal line ( (x1,y1) , (x2,y1) )
- Sweep the laser back and forth; the line is the diameter of an ellipse
- Select object centered at point (x,y)
- Sweep laser in circle, point is center of circle
- Previous or Next
- Click in the left (right) 1/8 of the screen (see the sketch below)
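The gesture set above is simple enough to sketch. Below is a minimal, hypothetical classifier over a stream of detected laser positions; it is not the project's code, and the thresholds, the [0,1] screen-coordinate convention, and all names are assumptions.

```python
import math

# Hypothetical sketch of the gesture mapping described above.
# Input: a list of (x, y, t) laser detections in screen coordinates (0..1).
# Output: (gesture, argument), where gesture is one of "click", "line",
# "select", "previous", "next", or None.

HOLD_SECONDS = 5.0        # hold still this long to "click"
STILL_RADIUS = 0.01       # how much jitter still counts as "same location"
SCREEN_EDGE = 0.125       # left/right 1/8 of the screen

def centroid(points):
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return sum(xs) / len(xs), sum(ys) / len(ys)

def classify(points):
    if not points:
        return None, None
    cx, cy = centroid(points)
    duration = points[-1][2] - points[0][2]
    spread = max(math.hypot(x - cx, y - cy) for x, y, _ in points)

    # Hold at one spot for 5 seconds -> click at that point; a click in the
    # left (right) 1/8 of the screen means previous (next).
    if spread < STILL_RADIUS and duration >= HOLD_SECONDS:
        if cx < SCREEN_EDGE:
            return "previous", None
        if cx > 1.0 - SCREEN_EDGE:
            return "next", None
        return "click", (cx, cy)

    # Back-and-forth sweep -> line selection (the line is the diameter).
    xs = sorted(p[0] for p in points)
    ys = sorted(p[1] for p in points)
    if (ys[-1] - ys[0]) < 3 * STILL_RADIUS:   # mostly horizontal motion
        return "line", ((xs[0], cy), (xs[-1], cy))

    # Roughly constant distance from the centroid -> circular sweep,
    # select the object centered at the circle's center.
    radii = [math.hypot(x - cx, y - cy) for x, y, _ in points]
    if max(radii) - min(radii) < 0.25 * max(radii):
        return "select", (cx, cy)

    return None, None
```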
7. Microphone listening to speaker
- Microphone
- Many technologies
- Lapel mic, mic array, room microphone
- Current approach: iPAQ
- Continuous recognition
- Push to speak
- Audio server on the iPAQ
- Detects start and stop
- Best results when a human pushes to start and releases to stop
- Audio wave file sent to the Galaxy speech system
- Galaxy outputs actions via a CGI script
- A nice unifying mechanism
- One more complicated component
8. Speaker controlling presentation via iPAQ
- iPAQ output to the CGI-script server
- Same actions as from speech server
- The actions are (see the dispatch sketch below)
- Next slide, Previous slide, Goto slide n, Goto slide named <xxx>
- Next item, Previous item, Goto item n, Goto item named <xxx>
- Next animation, Previous animation, Goto animation n
- Start presentation <name>, End presentation, Pause presentation
- Initialize camera, Test microphone
- Handheld (iPAQ) display
- GUI generated from the SpeechBuilder grammar
- List of slides, items per slide
- Currently we use an ad-hoc solution where PowerPoint sends the lists to the iPAQ; a more automatic solution is needed
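Since both the Galaxy speech output and the iPAQ GUI issue the same actions through the CGI-script server, a single dispatcher can apply them to the presentation. The following is a minimal sketch under that assumption; the route format, parameter names, and the Presentation stand-in are hypothetical, not the actual CGI script.

```python
# Hypothetical sketch of the shared action interface: both input sources
# post the same action strings, and one dispatcher applies them.
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

class Presentation:
    """Stand-in for the real slide-show controller."""
    def __init__(self, n_slides):
        self.slide = 0
        self.n_slides = n_slides

    def apply(self, action, arg=None):
        if action == "next_slide":
            self.slide = min(self.slide + 1, self.n_slides - 1)
        elif action == "previous_slide":
            self.slide = max(self.slide - 1, 0)
        elif action == "goto_slide" and arg is not None:
            self.slide = max(0, min(int(arg), self.n_slides - 1))
        # ... next_item, goto_animation, start/end/pause, etc.
        return self.slide

show = Presentation(n_slides=48)

class ActionHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # e.g. GET /?action=goto_slide&arg=17 from either input source
        query = parse_qs(urlparse(self.path).query)
        action = query.get("action", [""])[0]
        arg = query.get("arg", [None])[0]
        slide = show.apply(action, arg)
        self.send_response(200)
        self.end_headers()
        self.wfile.write(f"slide={slide}\n".encode())

if __name__ == "__main__":
    HTTPServer(("", 8080), ActionHandler).serve_forever()
```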
9. Output to projector, handheld, archive
- Unlimited number of video / audio output producers
- E.g., PowerPoint is just one producer of output
- At any time, each output device has an associated producer
- This producer can receive input from several producers
- Handheld has a proxy
- To reduce bandwidth to the iPAQ
- Current slide, list of slides, list of commands
- Archive
- Each slide shown, plus audio (from a different microphone), is sent to the archive
- Currently just a GIF of the current slide
10. Processing: controlling the session
- Do not let PowerPoint control the world
- Slide viewer, movie player, program execution, browser, etc.
- Want to mix all types of applications
- Presenter has control of the output
- E.g., switch the output producer from PowerPoint to the media player
- Remove interrupting technologies
- Dynamically disconnect any input / output source
- All done via the CORE language
- Or some other glue language, e.g., Metaglue
- Which handles all the other infrastructure issues
11. Multi-Modal Input
- Shalini Agarwal
- Oxygen Conference
- January 8th, 2002
12. Initial Experience With Presentation Manager
- One Single Monolithic Context
- Commands within a slide, between slides, between applications
- Problem
- Too many false positives
- Preliminary Solution
- Slide tracking
- E.g., recognize the Next Slide command only after at least 60% of the words on the slide have been said (see the sketch below)
- E.g., recognize Show Demo only after slide 17
- Still lots of problems
- Many slide styles are hard to track (e.g., figures, not words, on the slide)
- Tracking within a slide is different from tracking between slides
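A minimal sketch of the word-coverage heuristic described above: the Next Slide command is only enabled once roughly 60% of the words on the current slide have been heard. The tokenization, class names, and threshold handling are assumptions.

```python
# Hypothetical sketch of slide tracking by word coverage.
import re

COVERAGE_THRESHOLD = 0.60

def words(text):
    return set(re.findall(r"[a-z']+", text.lower()))

class SlideTracker:
    def __init__(self, slide_texts):
        self.slides = [words(t) for t in slide_texts]
        self.current = 0
        self.heard = set()

    def hear(self, utterance):
        """Feed recognized speech; returns True once 'next slide' is allowed."""
        self.heard |= words(utterance)
        target = self.slides[self.current]
        if not target:
            return True                      # e.g. a figure-only slide
        coverage = len(self.heard & target) / len(target)
        return coverage >= COVERAGE_THRESHOLD

    def next_slide(self):
        self.current += 1
        self.heard = set()
```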
13. A Better Solution: Multiple Contexts
- Very Active Research Area
- Intelligent Room project, Galaxy, others
- Three layers, each having its own context
- Slide (Next Item, Next Animation)
- Presentation (Next Slide, Goto Conclusion, Goto Example)
- Session (Start Presentation, Switch to Browser, Show Questions)
- Challenges
- Each context requires its own speech recognition system
- Multicasting the sound wave to each system
- Selecting the best result (see the sketch below)
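A minimal sketch of the multiple-context idea: the same recognizer output is offered to one small grammar per layer, and a selector keeps the highest-scoring parse. The grammars and the scoring rule are stand-ins, not the Galaxy implementation.

```python
# Hypothetical sketch of per-layer contexts plus a selector.
CONTEXTS = {
    "slide":        {"next item", "previous item", "next animation"},
    "presentation": {"next slide", "previous slide", "goto conclusion"},
    "session":      {"start presentation", "end presentation", "switch to browser"},
}

def score(hypothesis, grammar):
    """Score one recognizer hypothesis against one layer's grammar."""
    text, acoustic_score = hypothesis
    return acoustic_score if text in grammar else 0.0

def select(nbest):
    """nbest: list of (text, acoustic_score) pairs from the recognizer."""
    best = (0.0, None, None)
    for layer, grammar in CONTEXTS.items():
        for hyp in nbest:
            s = score(hyp, grammar)
            if s > best[0]:
                best = (s, layer, hyp[0])
    return best          # (score, layer, command) or (0.0, None, None)

# Example: "next item" parses in the slide context even though every
# context received the same n-best list.
print(select([("next item", 0.82), ("next time", 0.74)]))
```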
14. Extending the Galaxy System
- Start with context for speech and then extend
- Note: our goals are similar but not identical to those of the Spoken Language Group
- We are not dialog-based
- Exploit their work
- Follow Galaxy
- Recognizer scores different guesses at words
- The Language Processing unit uses the input grammar to select the best input sentence
- Scott Cyphers gave us the n-best interface
15. Recognizer chooses the 10 best guesses at word matches (for this context)
The Language Processor picks the best sentence from the recognizer output based on the input grammar
16. System Structure
17. System Structure
[Diagram: each layer (Slide Layer, Session Layer) has its own Recognizer and Language Processor; candidate phrases such as "next item", "next movie", "previous item", "start presentation", "end presentation", and "start explorer" flow from the recognizers through the language processors to a single Selector, which picks the best result]
18. System Structure
19. Add Recognizer for T9
[Diagram: a T9 Input recognizer is added alongside the Sound Input recognizer; the Slide, Presentation, and Session layers each keep their own Recognizer and Language Processor, and the Selector chooses among candidates such as "next item", "go to slide nine", and "start presentation"]
20. Add Recognizer for Graffiti
[Diagram: a Graffiti Input recognizer joins the T9 Input and Sound Input recognizers; as before, each layer's Recognizer and Language Processor feed the Selector]
21. Other Input Modes
- T9 (telephone keypad)
- To input a, b, or c, press 2
- Current cell phones have a dictionary to select the correct word
- Lots of false positives (very annoying)
- Remember my introduction?
- Using an application-dependent grammar would reduce errors
- Pen-based character input
- Use strokes to input characters
- The current Palm Pilot only recognizes the Graffiti alphabet
- Lots of false positives (very annoying)
- Using an application-dependent grammar would reduce errors (see the sketch below)
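A minimal sketch of grammar-constrained T9: instead of a general dictionary, only the words allowed by the application grammar are candidate decodings of a keypress sequence, which removes most of the false positives. The vocabulary below is a stand-in for the presentation manager's grammar.

```python
# Hypothetical sketch of T9 decoding restricted to an application grammar.
T9 = {"a": "2", "b": "2", "c": "2", "d": "3", "e": "3", "f": "3",
      "g": "4", "h": "4", "i": "4", "j": "5", "k": "5", "l": "5",
      "m": "6", "n": "6", "o": "6", "p": "7", "q": "7", "r": "7", "s": "7",
      "t": "8", "u": "8", "v": "8", "w": "9", "x": "9", "y": "9", "z": "9"}

VOCABULARY = ["next", "previous", "slide", "item", "animation",
              "start", "end", "presentation", "nine"]

def keys(word):
    return "".join(T9[c] for c in word)

# Pre-index the grammar's words by their keypress sequence.
INDEX = {}
for w in VOCABULARY:
    INDEX.setdefault(keys(w), []).append(w)

def decode(digits):
    """Return the grammar words matching a T9 digit string."""
    return INDEX.get(digits, [])

# "6398" could be several dictionary words, but only "next" is in the grammar.
print(decode("6398"))   # ['next']
```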
22. Replacing the Recognizers
- Build recognizers for T9 and Graffiti
- Use the Galaxy system to process results from the new recognizers
[Diagram: the Galaxy Hub connects Audio, Speech Recog., T9 Recog., and Graffiti Recog. to Language Processing, Discourse Resolution, Dialogue Management, Language Generation, Speech Synthesis, and a Database Server]
23. Conclusion
- Each application defines an input grammar
- This grammar can be used to
- Ensure that each application gets valid input
- It might not be what the user wanted, but the application will understand it
- Reduce false positives
- Identify the input suitable for the associated application
- Choose the application with the highest score
- If there is a tie, must do something else (future research)
- Enable T9, Graffiti, Speech, other input modes
24. Critique of Presentation Manager
25. Vision / Gesture Recognition
- Laser Pointer
- Great for drawing attention to content
- Audience is primary consumer
- Secondary use to control presentation
- But it is not a mouse
- Semantics are tied to slide context
- Differs from Intelligent Room use
- Small number of identified gestures
- Gestures easily punctuated
- Low computational overhead
- Soon will be handled with an H21
26. Critique of Vision / Gesture Recognition
- Laser Pointer
- Great for drawing attention to content
- Cheap technology but mostly distracting
- Too shaky, imprecise
- But it is not a mouse
- More awkward to use than mouse
- Another gadget to hold in the hand, another button to identify, batteries to maintain
- Small number of identified gestures
- There are better ways of drawing attention to slide content
- I rarely use it and don't like it when others do
- Low computational overhead
- Dumb vs Intelligent Device Discussion
27. Speech Recognition
- Initially seems like great idea
- The speaker is already speaking, so speech can be used to control the presentation
- Want a passive, intelligent listener
- Not a dialog
- No prompts; prompting would be an alienating distraction
- Want no mistakes
- For dialog, better to guess than ignore
- For us, high cost for incorrect guess
- Most words are not relevant to speech system
- More trouble than it is worth
- But may be good for real-time search of content
28. More useful aspect: Output modalities
- The presenter has put the time and effort into the production
- Simpler is better
- Audience has harder task
- Understand material being presented
- Record thoughts, impressions, connections
- Filter for later review
- Process in real-time
- Keep up with the presentation
- Do all this with minimal distractions
- Output modalities
- Content for live audience
- Content for speaker (superset of audience)
- Content for retrieval
- Correlate notes with content
29. Record and correlate notes with presentation
30. CORE: Communication Oriented Routing Environment
31. Assumptions
- Actuators / Sensors (I/O) in the environment
- Many are shared by apps and users
- Many are flaky / faulty
- User does not know much about them
- The environment, the application, and the user's desires change over time
32. An Oxygen Application
- Interconnected Collection of Stuff
- Who specifies the stuff?
- I don't know, but it's mostly virtual stuff
- Many layers of abstraction
- Don't ask; it's turtles all the way down
- Two main layers of programming
- Professionals
- Users, e.g. grandmother
33. Communications-Oriented Programs
- Connecting the (virtual) stuff is done by the user
- Home stereo / theater analogy
- Plug stuff together; unplug it if it doesn't work
- Don't like it? Unplug it
- Device drivers, services, and clients don't know to whom or to what they connect
- In the client/server model,
- server knows a lot about the client,
- the client knows even more about the server
- Extend Unix Pipes
34. CORE
[Diagram: a CORE router with its attached nodes and connections to other COREs (labels in the figure: "Other COREs", "Larry Bear")]
35. Message Flow
- Messages flow between nodes and the core
- The core is both a language and a router
- Within the core router, some messages
- are interpreted and may trigger actions
- other messages get routed to other nodes
- Request-Reply message strategy
- Even number of messages
- No reply within the time period means an error (see the sketch below)
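A minimal sketch of the request-reply convention: every request expects exactly one reply, and the absence of a reply within the time period is reported as an error. The queue-based transport and the message format are assumptions, not the CORE router.

```python
# Hypothetical sketch of request-reply with a timeout.
import queue
import uuid

class CoreError(Exception):
    pass

def request(out_queue, in_queue, body, timeout=2.0):
    """Send one request message and wait for its matching reply."""
    msg_id = str(uuid.uuid4())
    out_queue.put({"id": msg_id, "body": body})
    try:
        reply = in_queue.get(timeout=timeout)
    except queue.Empty:
        # An even number of messages is expected; a missing reply is an error.
        raise CoreError(f"no reply to {body!r} within {timeout}s")
    if reply.get("in_reply_to") != msg_id:
        raise CoreError("reply does not match the outstanding request")
    return reply["body"]
```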
36. CORE Language Elements
- Four elements
- Nodes,
- Links,
- Messages,
- Rules
- Features
- Interpreted Language
- A statement is a message and its reply
- Each element has an inverse
37. Node: handler(nickname, specifier)
- Nodes: specify via INS
- Cam: device=web-cam, location=518
- PTRvision: device=process, OS=Linux, File=LaserVision, ...
[Diagram: the CORE with a Laser Vision node attached]
38. Node Statement Handler
- When node message arrives
- Verified for correctness (statements allowed)
- Routed to Node Manager (just another node)
- Node Manager
- INS lookup; verifies if allowed, creates if needed
- Creates a core thread to manage communication with the node
- Bookkeeping: reply message with handle/error (see the sketch below)
39. Links
- L_camera,vision = (Cam, PTRvision)
[Diagram: CORE linking the Laser Vision node to the Slide Speech, Presentation Speech, and Command Speech nodes]
40. Link Statement Handler
- Message routed to link manager
- Two queries to the node manager for thread controllers
- Message to thread controller of source node
- Specifying destination thread controller
- Message to thread controller of dest node
- Specifying source thread controller
- Bookkeeping: reply message with handle/error (see the sketch below)
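A minimal sketch of the link statement handling described above: look up the two endpoints' thread controllers, tell each about the other, and reply with a handle or an error. The mailbox registry stands in for the node manager's bookkeeping.

```python
# Hypothetical sketch of the link manager.
import queue

class LinkManager:
    def __init__(self, mailboxes):
        self.mailboxes = mailboxes      # nickname -> queue.Queue of controls
        self.links = set()

    def handle_link_statement(self, src, dst):
        if src not in self.mailboxes or dst not in self.mailboxes:
            return {"error": f"unknown node in link ({src}, {dst})"}
        # Tell the source thread where to route, and the destination thread
        # whom to expect messages from.
        self.mailboxes[src].put({"control": "add-destination", "node": dst})
        self.mailboxes[dst].put({"control": "add-source", "node": src})
        self.links.add((src, dst))
        return {"handle": (src, dst)}

# Example with two made-up nodes:
mailboxes = {"Cam": queue.Queue(), "PTRvision": queue.Queue()}
print(LinkManager(mailboxes).handle_link_statement("Cam", "PTRvision"))
```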
41. Messages
- Messages flow over the links
[Diagram: a "Next Slide!" message flowing through CORE among the Slide Speech, Presentation Speech, Command Speech, and Laser Vision nodes]
42. Message Handling
- Messages can be encrypted
- Core statement messages have fixed format
- Everything else is data message
- Each node thread has two unbounded buffers
- Core-to-node and node-to-core
- Logging, rollback, fault-tolerance
43. Rules
- RULE = (trigger, action)
- ( MESS=Question , L_slide,lcd → L_slide,qlcd )
[Diagram: when a Question message arrives, the rule reroutes the slide output toward a Questions node; CORE is shown with Slide Speech, Presentation Speech, Command Speech, Questions, and Laser Vision nodes]
44. Rule Statement Handler
- ( trigger , consequence )
- Both are event sets
- Eight basic events
- Node, -Node, Link, -Link
- Message, -Message, Rule, -Rule
- Event set is a set of events
- Trigger is true when events are true
- The consequence makes events true (see the sketch below)
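A minimal sketch of one possible representation of rules over the eight basic events: a trigger fires when all of its events hold, and the consequence makes its events true (or false, for the negated forms). The data shapes are assumptions, not the CORE implementation.

```python
# Hypothetical sketch of events, rules, and rule firing.
from dataclasses import dataclass

KINDS = ("node", "link", "message", "rule")     # each also has a negated form

@dataclass(frozen=True)
class Event:
    kind: str          # one of KINDS
    arg: tuple         # e.g. ("Cam",) or ("Cam", "PTRvision")
    positive: bool = True

@dataclass
class Rule:
    trigger: frozenset       # set of Event that must all hold
    consequence: frozenset   # set of Event to make true (or false)

def fire(rule, true_events):
    """If every trigger event holds, apply the consequence."""
    if not rule.trigger <= true_events:
        return true_events
    updated = set(true_events)
    for e in rule.consequence:
        if e.positive:
            updated.add(e)
        else:
            updated.discard(Event(e.kind, e.arg, True))
    return frozenset(updated)

# Example: "message m at node x" triggers "remove (x, m), add (m, y)",
# which is the link-as-rule shorthand of the next slide.
link_rule = Rule(
    trigger=frozenset({Event("message", ("x", "m"))}),
    consequence=frozenset({Event("message", ("x", "m"), positive=False),
                           Event("message", ("m", "y"))}))
```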
45. Rules: A link is a rule
- A message event is of form
- (node, message specifier)
- ( message specifier , node )
- Message came from or going to node
- A link (x, y) is just shorthand for the rule
- ( x, m ) → ( -(x, m) , (m, y) )
- If a message m arrives at node x, then make that event false (remove the message) and make the event of m arriving at y from the core true.
46. Rules: Access Control Lists
- An access control list is just a rule
- When messages arrive at a node, they are allowed to continue to flow only if they arrive from a valid node.
- Modifying access control lists is just adding or removing rules (see the sketch below).
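A minimal sketch of the access-control idea: each allowed sender corresponds to one rule, the router drops messages with no matching rule, and granting or revoking access is adding or removing a rule. The names are made up.

```python
# Hypothetical sketch of an access control list expressed as rules.
class AccessControl:
    def __init__(self, allowed=()):
        self.allowed = set(allowed)           # each entry stands for one rule

    def add_rule(self, sender):
        self.allowed.add(sender)

    def remove_rule(self, sender):            # the inverse (-Rule) statement
        self.allowed.discard(sender)

    def admit(self, sender, message):
        """Return the message if a rule lets it flow, otherwise drop it."""
        return message if sender in self.allowed else None

acl = AccessControl(allowed={"CommandSpeech", "LaserVision"})
assert acl.admit("LaserVision", "next slide") == "next slide"
assert acl.admit("UnknownNode", "next slide") is None
```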
47. Rules
- Rule statement gets sent to rule manager
- Event set is just another shorthand for rules
- The rule manager sends a command to the trigger node's thread that tells it about the consequence
- Rules are reversible
48. Reversibility
- Each statement is invertible (reversible)
- If there is an error in the application specification, then it can all be undone
- General debugging is possible with reversible rules and message flow (see the sketch below)
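A minimal sketch of reversibility: each statement records its inverse when applied, so a faulty application specification can be rolled back by replaying the inverses in reverse order. The statement objects are stand-ins for CORE statements.

```python
# Hypothetical sketch of undo via inverse statements.
class ReversibleCore:
    def __init__(self):
        self.log = []                 # (statement, inverse) pairs, in order

    def apply(self, statement, inverse):
        statement()                   # e.g. create node, add link, add rule
        self.log.append((statement, inverse))

    def undo_all(self):
        """Roll the whole specification back, most recent statement first."""
        while self.log:
            _, inverse = self.log.pop()
            inverse()                 # e.g. -node, -link, -rule

# Example with made-up setup statements:
core = ReversibleCore()
state = set()
core.apply(lambda: state.add("node Cam"), lambda: state.discard("node Cam"))
core.apply(lambda: state.add("link Cam->Vision"),
           lambda: state.discard("link Cam->Vision"))
core.undo_all()
assert state == set()
```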