Title: An Oxygenated Presentation Manager
1. An Oxygenated Presentation Manager
- Larry Rudolph
- Oxygen Workshop, January, 2002
2. Goals Overview
- Integrate Many Oxygen Technologies
- Application Driven
- Use an application that we understand
- Personally use often
- Would help if it were more human-centric
- Portable (as opposed to E-21)
- Develop Architectural Infrastructure
- Exposes new requirements
- Critique of Presentation Manager
- What is wrong with it
- What needs improvement
3. Application Scenario
4. An Oxygen Application
- Components
- Input
- Vision
- Speech
- Touch
- Output
- Projector
- Handheld
- Archive
- Processing
- Changing configuration
- Equipment
- Today, it is too hard: Linux laptop, Windows laptop, camera, microphone, network, projector, power blocks
- Tomorrow, much easier: a couple of H21s
5. Camera watching laser point on screen
- Camera Challenges
- Inexpensive ones have wrong focal length
- Alignment issues
- Use the edge of the screen: display a pattern, figure out the alignment from what is known to be visible
- We ended up displaying a pattern of concentric circles
- Relative size of the laser point depends on distance
- Beyond ten feet, had to use only certain types of lasers
- Could slow down the camera and let pixels saturate (too complicated)
6. Camera watching laser point on screen (cont.)
- Camera Interface
- Click at point (x,y)
- Hold laser at same location for 5 seconds
- Select horizontal line ( (x1,y1) , (x2,y1) )
- Sweep the laser back and forth; the line is the diameter of an ellipse
- Select object centered at point (x,y)
- Sweep laser in circle, point is center of circle
- Previous or Next
- Click in the left (right) 1/8 of the screen (see the sketch below)
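The gesture set above is simple enough to sketch. Below is a minimal, hypothetical classifier over a stream of detected laser positions; it is not the project's code, and the thresholds, the [0,1] screen-coordinate convention, and all names are assumptions.

```python
import math

# Hypothetical sketch of the gesture mapping described above.
# Input: a list of (x, y, t) laser detections in screen coordinates (0..1).
# Output: (gesture, argument), where gesture is one of "click", "line",
# "select", "previous", "next", or None.

HOLD_SECONDS = 5.0        # hold still this long to "click"
STILL_RADIUS = 0.01       # how much jitter still counts as "same location"
SCREEN_EDGE = 0.125       # left/right 1/8 of the screen

def centroid(points):
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return sum(xs) / len(xs), sum(ys) / len(ys)

def classify(points):
    if not points:
        return None, None
    cx, cy = centroid(points)
    duration = points[-1][2] - points[0][2]
    spread = max(math.hypot(x - cx, y - cy) for x, y, _ in points)

    # Hold at one spot for 5 seconds -> click at that point; a click in the
    # left (right) 1/8 of the screen means previous (next).
    if spread < STILL_RADIUS and duration >= HOLD_SECONDS:
        if cx < SCREEN_EDGE:
            return "previous", None
        if cx > 1.0 - SCREEN_EDGE:
            return "next", None
        return "click", (cx, cy)

    # Back-and-forth sweep -> line selection (the line is the diameter).
    xs = sorted(p[0] for p in points)
    ys = sorted(p[1] for p in points)
    if (ys[-1] - ys[0]) < 3 * STILL_RADIUS:   # mostly horizontal motion
        return "line", ((xs[0], cy), (xs[-1], cy))

    # Roughly constant distance from the centroid -> circular sweep,
    # select the object centered at the circle's center.
    radii = [math.hypot(x - cx, y - cy) for x, y, _ in points]
    if max(radii) - min(radii) < 0.25 * max(radii):
        return "select", (cx, cy)

    return None, None
```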
7. Microphone listening to speaker
- Microphone
- Many technologies
- Lapel mic, mic array, room microphone
- Current approach: iPAQ
- Continuous recognition
- Push to speak
- Audio server on the iPAQ
- Detects start and stop
- Best results when a human pushes to start and releases to stop
- Audio wave file sent to the Galaxy speech system
- Galaxy outputs actions via a CGI script
- A nice unifying mechanism
- One more complicated component
8. Speaker controlling presentation via iPAQ
- iPAQ output to the CGI-script server
- Same actions as from speech server
- The actions are (see the dispatch sketch below)
- Next slide, Previous slide, Goto slide n, Goto slide named <xxx>
- Next item, Previous item, Goto item n, Goto item named <xxx>
- Next animation, Previous animation, Goto animation n
- Start presentation <name>, End presentation, Pause presentation
- Initialize camera, Test microphone
- Handheld (iPAQ) display
- GUI generated from the SpeechBuilder grammar
- List of slides, items per slide
- Currently we use an ad-hoc solution where PowerPoint sends the lists to the iPAQ; a more automatic solution is needed
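Since both the Galaxy speech output and the iPAQ GUI issue the same actions through the CGI-script server, a single dispatcher can apply them to the presentation. The following is a minimal sketch under that assumption; the route format, parameter names, and the Presentation stand-in are hypothetical, not the actual CGI script.

```python
# Hypothetical sketch of the shared action interface: both input sources
# post the same action strings, and one dispatcher applies them.
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

class Presentation:
    """Stand-in for the real slide-show controller."""
    def __init__(self, n_slides):
        self.slide = 0
        self.n_slides = n_slides

    def apply(self, action, arg=None):
        if action == "next_slide":
            self.slide = min(self.slide + 1, self.n_slides - 1)
        elif action == "previous_slide":
            self.slide = max(self.slide - 1, 0)
        elif action == "goto_slide" and arg is not None:
            self.slide = max(0, min(int(arg), self.n_slides - 1))
        # ... next_item, goto_animation, start/end/pause, etc.
        return self.slide

show = Presentation(n_slides=48)

class ActionHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # e.g. GET /?action=goto_slide&arg=17 from either input source
        query = parse_qs(urlparse(self.path).query)
        action = query.get("action", [""])[0]
        arg = query.get("arg", [None])[0]
        slide = show.apply(action, arg)
        self.send_response(200)
        self.end_headers()
        self.wfile.write(f"slide={slide}\n".encode())

if __name__ == "__main__":
    HTTPServer(("", 8080), ActionHandler).serve_forever()
```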
9. Output to projector, handheld, archive
- Unlimited number of video / audio output producers
- E.g., PowerPoint is just one producer of output
- At any time, each output device has an associated producer
- This producer can receive input from several producers
- Handheld has a proxy
- To reduce bandwidth to the iPAQ
- Current slide, list of slides, list of commands
- Archive
- Each slide shown, plus audio (from a different microphone), is sent to the archive
- Currently just a GIF of the current slide
10. Processing: controlling the session
- Do not let PowerPoint control the world
- Slide viewer, movie player, program execution, browser, etc.
- Want to mix all types of applications
- Presenter has control of the output
- E.g., switch the output producer from PowerPoint to the media player
- Remove interrupting technologies
- Dynamically disconnect any input / output source
- All done via the CORE language
- Or some other glue language, e.g., Metaglue
- Which handles all the other infrastructure issues
11. Multi-Modal Input
- Shalini Agarwal
- Oxygen Conference
- January 8th, 2002
12. Initial Experience With Presentation Manager
- One Single Monolithic Context
- Commands within a slide, between slides, between applications
- Problem
- Too many false positives
- Preliminary Solution
- Slide tracking
- E.g., recognize the Next Slide command only after at least 60% of the words on the slide have been said (see the sketch below)
- E.g., recognize Show Demo only after slide 17
- Still lots of problems
- Many slide styles are hard to track (e.g., figures, not words, on the slide)
- Tracking within a slide is different from tracking between slides
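A minimal sketch of the word-coverage heuristic described above: the Next Slide command is only enabled once roughly 60% of the words on the current slide have been heard. The tokenization, class names, and threshold handling are assumptions.

```python
# Hypothetical sketch of slide tracking by word coverage.
import re

COVERAGE_THRESHOLD = 0.60

def words(text):
    return set(re.findall(r"[a-z']+", text.lower()))

class SlideTracker:
    def __init__(self, slide_texts):
        self.slides = [words(t) for t in slide_texts]
        self.current = 0
        self.heard = set()

    def hear(self, utterance):
        """Feed recognized speech; returns True once 'next slide' is allowed."""
        self.heard |= words(utterance)
        target = self.slides[self.current]
        if not target:
            return True                      # e.g. a figure-only slide
        coverage = len(self.heard & target) / len(target)
        return coverage >= COVERAGE_THRESHOLD

    def next_slide(self):
        self.current += 1
        self.heard = set()
```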
13. A Better Solution: Multiple Contexts
- Very Active Research Area
- Intelligent Room project, Galaxy, others
- Three layers, each having its own context
- Slide (Next Item, Next Animation)
- Presentation (Next Slide, Goto Conclusion, Goto Example)
- Session (Start Presentation, Switch to Browser, Show Questions)
- Challenges
- Each context requires its own speech recognition system
- Multicasting the sound wave to each system
- Selecting the best result (see the sketch below)
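A minimal sketch of the multiple-context idea: the same recognizer output is offered to one small grammar per layer, and a selector keeps the highest-scoring parse. The grammars and the scoring rule are stand-ins, not the Galaxy implementation.

```python
# Hypothetical sketch of per-layer contexts plus a selector.
CONTEXTS = {
    "slide":        {"next item", "previous item", "next animation"},
    "presentation": {"next slide", "previous slide", "goto conclusion"},
    "session":      {"start presentation", "end presentation", "switch to browser"},
}

def score(hypothesis, grammar):
    """Score one recognizer hypothesis against one layer's grammar."""
    text, acoustic_score = hypothesis
    return acoustic_score if text in grammar else 0.0

def select(nbest):
    """nbest: list of (text, acoustic_score) pairs from the recognizer."""
    best = (0.0, None, None)
    for layer, grammar in CONTEXTS.items():
        for hyp in nbest:
            s = score(hyp, grammar)
            if s > best[0]:
                best = (s, layer, hyp[0])
    return best          # (score, layer, command) or (0.0, None, None)

# Example: "next item" parses in the slide context even though every
# context received the same n-best list.
print(select([("next item", 0.82), ("next time", 0.74)]))
```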
14. Extending the Galaxy System
- Start with context for speech and then extend
- Note: our goals are similar but not identical to those of the Spoken Language Group
- We are not dialog-based
- Exploit their work
- Follow Galaxy
- Recognizer scores different guesses at words
- The Language Processing unit uses the input grammar to select the best input sentence
- Scott Cyphers gave us the n-best interface
15. Recognizer chooses the 10 best guesses at word matches (for this context)
The Language Processor picks the best sentence from the recognizer output based on the input grammar
16. System Structure
17. System Structure
[Diagram: each layer (Slide Layer, Session Layer) has its own Recognizer and Language Processor; candidate phrases such as "next item", "next movie", "previous item", "start presentation", "end presentation", and "start explorer" flow from the recognizers through the language processors to a single Selector, which picks the best result]
18. System Structure
19. Add Recognizer for T9
[Diagram: a T9 Input recognizer is added alongside the Sound Input recognizer; the Slide, Presentation, and Session layers each keep their own Recognizer and Language Processor, and the Selector chooses among candidates such as "next item", "go to slide nine", and "start presentation"]
20. Add Recognizer for Graffiti
[Diagram: a Graffiti Input recognizer joins the T9 Input and Sound Input recognizers; as before, each layer's Recognizer and Language Processor feed the Selector]
21. Other Input Modes
- T9 (telephone keypad)
- To input a, b, or c, press 2
- Current cell phones have a dictionary to select the correct word
- Lots of false positives (very annoying)
- Remember my introduction?
- Using an application-dependent grammar would reduce errors
- Pen-based character input
- Use strokes to input characters
- The current Palm Pilot only recognizes the Graffiti alphabet
- Lots of false positives (very annoying)
- Using an application-dependent grammar would reduce errors (see the sketch below)
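A minimal sketch of grammar-constrained T9: instead of a general dictionary, only the words allowed by the application grammar are candidate decodings of a keypress sequence, which removes most of the false positives. The vocabulary below is a stand-in for the presentation manager's grammar.

```python
# Hypothetical sketch of T9 decoding restricted to an application grammar.
T9 = {"a": "2", "b": "2", "c": "2", "d": "3", "e": "3", "f": "3",
      "g": "4", "h": "4", "i": "4", "j": "5", "k": "5", "l": "5",
      "m": "6", "n": "6", "o": "6", "p": "7", "q": "7", "r": "7", "s": "7",
      "t": "8", "u": "8", "v": "8", "w": "9", "x": "9", "y": "9", "z": "9"}

VOCABULARY = ["next", "previous", "slide", "item", "animation",
              "start", "end", "presentation", "nine"]

def keys(word):
    return "".join(T9[c] for c in word)

# Pre-index the grammar's words by their keypress sequence.
INDEX = {}
for w in VOCABULARY:
    INDEX.setdefault(keys(w), []).append(w)

def decode(digits):
    """Return the grammar words matching a T9 digit string."""
    return INDEX.get(digits, [])

# "6398" could be several dictionary words, but only "next" is in the grammar.
print(decode("6398"))   # ['next']
```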
22. Replacing the Recognizers
- Build recognizers for T9 and Graffiti
- Use the Galaxy system to process results from the new recognizers
[Diagram: the Galaxy Hub connects Audio, Speech Recog., T9 Recog., and Graffiti Recog. to Language Processing, Discourse Resolution, Dialogue Management, Language Generation, Speech Synthesis, and a Database Server]
23. Conclusion
- Each application defines an input grammar
- This grammar can be used to
- Ensure that each application gets valid input
- It might not be what the user wanted, but the application will understand it
- Reduce false positives
- Identify the input suitable for the associated application
- Choose the application with the highest score
- If there is a tie, must do something else (future research)
- Enable T9, Graffiti, Speech, other input modes
24. Critique of Presentation Manager
25. Vision / Gesture Recognition
- Laser Pointer
- Great for drawing attention to content
- Audience is primary consumer
- Secondary use to control presentation
- But it is not a mouse
- Semantics are tied to slide context
- Differs from Intelligent Room use
- Small number of identified gestures
- Gestures easily punctuated
- Low computational overhead
- Soon will be handled with an H21
26. Critique of Vision / Gesture Recognition
- Laser Pointer
- Great for drawing attention to content
- Cheap technology but mostly distracting
- Too shaky, imprecise
- But it is not a mouse
- More awkward to use than mouse
- Another gadget to hold in the hand, another button to identify, batteries to maintain
- Small number of identified gestures
- There are better ways of drawing attention to slide content
- I rarely use it and don't like it when others do
- Low computational overhead
- Dumb vs Intelligent Device Discussion
27. Speech Recognition
- Initially seems like great idea
- The speaker is already speaking, so speech can be used to control the presentation
- Want a passive, intelligent listener
- Not a dialog
- No prompts; prompting would be an alienating distraction
- Want no mistakes
- For dialog, better to guess than ignore
- For us, high cost for incorrect guess
- Most words are not relevant to speech system
- More trouble than it is worth
- But may be good for real-time search of content
28. More useful aspect: Output modalities
- The presenter has put the time and effort into the production
- Simpler is better
- Audience has harder task
- Understand material being presented
- Record thoughts, impressions, connections
- Filter for later review
- Process in real-time
- Keep up with the presentation
- Do all this with minimal distractions
- Output modalities
- Content for live audience
- Content for speaker (superset of audience)
- Content for retrieval
- Correlate notes with content
29. Record and correlate notes with presentation
30. CORE: Communication Oriented Routing Environment
31. Assumptions
- Actuators / Sensors (I/O) in the environment
- Many are shared by apps and users
- Many are flaky / faulty
- User does not know much about them
- The environment, the application, and the user's desires change over time
32. An Oxygen Application
- Interconnected Collection of Stuff
- Who specifies the stuff?
- I don't know, but it's mostly virtual stuff
- Many layers of abstraction
- Don't ask; it's turtles all the way down
- Two main layers of programming
- Professionals
- Users, e.g. grandmother
33. Communications-Oriented Programs
- Connecting the (virtual) stuff is done by the user
- Home stereo / theater analogy
- Plug stuff together; unplug it if it doesn't work
- Don't like it? Unplug it
- Device drivers, services, and clients don't know to whom or to what they connect
- In the client/server model,
- server knows a lot about the client,
- the client knows even more about the server
- Extend Unix Pipes
34. CORE
[Diagram: a CORE router with its attached nodes and connections to other COREs (labels in the figure: "Other COREs", "Larry Bear")]
35. Message Flow
- Messages flow between nodes and the core
- The core is both a language and a router
- Within the core router, some messages
- are interpreted and may trigger actions
- other messages get routed to other nodes
- Request-Reply message strategy
- Even number of messages
- No reply within the time period means an error (see the sketch below)
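A minimal sketch of the request-reply convention: every request expects exactly one reply, and the absence of a reply within the time period is reported as an error. The queue-based transport and the message format are assumptions, not the CORE router.

```python
# Hypothetical sketch of request-reply with a timeout.
import queue
import uuid

class CoreError(Exception):
    pass

def request(out_queue, in_queue, body, timeout=2.0):
    """Send one request message and wait for its matching reply."""
    msg_id = str(uuid.uuid4())
    out_queue.put({"id": msg_id, "body": body})
    try:
        reply = in_queue.get(timeout=timeout)
    except queue.Empty:
        # An even number of messages is expected; a missing reply is an error.
        raise CoreError(f"no reply to {body!r} within {timeout}s")
    if reply.get("in_reply_to") != msg_id:
        raise CoreError("reply does not match the outstanding request")
    return reply["body"]
```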
36. CORE Language Elements
- Four elements
- Nodes,
- Links,
- Messages,
- Rules
- Features
- Interpreted Language
- A statement is a message and its reply
- Each element has an inverse
37. Node: handler(nickname, specifier)
- Nodes: specify via INS
- Cam: device=web-cam, location=518
- PTRvision: device=process, OS=Linux, File=LaserVision, ...
[Diagram: the CORE with a Laser Vision node attached]
38. Node Statement Handler
- When node message arrives
- Verified for correctness (statements allowed)
- Routed to Node Manager (just another node)
- Node Manager
- INS lookup; verifies if allowed, creates if needed
- Creates a core thread to manage communication with the node
- Bookkeeping: reply message with handle/error (see the sketch below)
39. Links
- L_camera,vision = (Cam, PTRvision)
[Diagram: CORE linking the Laser Vision node to the Slide Speech, Presentation Speech, and Command Speech nodes]
40. Link Statement Handler
- Message routed to link manager
- Two queries to the node manager for thread controllers
- Message to thread controller of source node
- Specifying destination thread controller
- Message to thread controller of dest node
- Specifying source thread controller
- Bookkeeping: reply message with handle/error (see the sketch below)
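A minimal sketch of the link statement handling described above: look up the two endpoints' thread controllers, tell each about the other, and reply with a handle or an error. The mailbox registry stands in for the node manager's bookkeeping.

```python
# Hypothetical sketch of the link manager.
import queue

class LinkManager:
    def __init__(self, mailboxes):
        self.mailboxes = mailboxes      # nickname -> queue.Queue of controls
        self.links = set()

    def handle_link_statement(self, src, dst):
        if src not in self.mailboxes or dst not in self.mailboxes:
            return {"error": f"unknown node in link ({src}, {dst})"}
        # Tell the source thread where to route, and the destination thread
        # whom to expect messages from.
        self.mailboxes[src].put({"control": "add-destination", "node": dst})
        self.mailboxes[dst].put({"control": "add-source", "node": src})
        self.links.add((src, dst))
        return {"handle": (src, dst)}

# Example with two made-up nodes:
mailboxes = {"Cam": queue.Queue(), "PTRvision": queue.Queue()}
print(LinkManager(mailboxes).handle_link_statement("Cam", "PTRvision"))
```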
41. Messages
- Messages flow over the links
[Diagram: a "Next Slide!" message flowing through CORE among the Slide Speech, Presentation Speech, Command Speech, and Laser Vision nodes]
42. Message Handling
- Messages can be encrypted
- Core statement messages have fixed format
- Everything else is data message
- Each node thread has two unbounded buffers
- Core-to-node and node-to-core
- Logging, rollback, fault-tolerance
43. Rules
- RULE = (trigger, action)
- ( MESS=Question , L_slide,lcd → L_slide,qlcd )
[Diagram: when a Question message arrives, the rule reroutes the slide output toward a Questions node; CORE is shown with Slide Speech, Presentation Speech, Command Speech, Questions, and Laser Vision nodes]
44. Rule Statement Handler
- ( trigger , consequence )
- Both are event sets
- Eight basic events
- Node, -Node, Link, -Link
- Message, -Message, Rule, -Rule
- Event set is a set of events
- Trigger is true when events are true
- The consequence makes events true (see the sketch below)
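A minimal sketch of one possible representation of rules over the eight basic events: a trigger fires when all of its events hold, and the consequence makes its events true (or false, for the negated forms). The data shapes are assumptions, not the CORE implementation.

```python
# Hypothetical sketch of events, rules, and rule firing.
from dataclasses import dataclass

KINDS = ("node", "link", "message", "rule")     # each also has a negated form

@dataclass(frozen=True)
class Event:
    kind: str          # one of KINDS
    arg: tuple         # e.g. ("Cam",) or ("Cam", "PTRvision")
    positive: bool = True

@dataclass
class Rule:
    trigger: frozenset       # set of Event that must all hold
    consequence: frozenset   # set of Event to make true (or false)

def fire(rule, true_events):
    """If every trigger event holds, apply the consequence."""
    if not rule.trigger <= true_events:
        return true_events
    updated = set(true_events)
    for e in rule.consequence:
        if e.positive:
            updated.add(e)
        else:
            updated.discard(Event(e.kind, e.arg, True))
    return frozenset(updated)

# Example: "message m at node x" triggers "remove (x, m), add (m, y)",
# which is the link-as-rule shorthand of the next slide.
link_rule = Rule(
    trigger=frozenset({Event("message", ("x", "m"))}),
    consequence=frozenset({Event("message", ("x", "m"), positive=False),
                           Event("message", ("m", "y"))}))
```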
45. Rules: A link is a rule
- A message event is of form
- (node, message specifier)
- ( message specifier , node )
- Message came from or going to node
- A link (x, y) is just shorthand for the rule
- ( x, m ) → ( -(x, m) , (m, y) )
- If a message m arrives at node x, then make that event false (remove the message) and make the event of m arriving at y from the core true.
46. Rules: Access Control Lists
- An access control list is just a rule
- When messages arrive at a node, they are allowed to continue to flow only if they arrive from a valid node.
- Modifying access control lists is just adding or removing rules (see the sketch below).
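A minimal sketch of the access-control idea: each allowed sender corresponds to one rule, the router drops messages with no matching rule, and granting or revoking access is adding or removing a rule. The names are made up.

```python
# Hypothetical sketch of an access control list expressed as rules.
class AccessControl:
    def __init__(self, allowed=()):
        self.allowed = set(allowed)           # each entry stands for one rule

    def add_rule(self, sender):
        self.allowed.add(sender)

    def remove_rule(self, sender):            # the inverse (-Rule) statement
        self.allowed.discard(sender)

    def admit(self, sender, message):
        """Return the message if a rule lets it flow, otherwise drop it."""
        return message if sender in self.allowed else None

acl = AccessControl(allowed={"CommandSpeech", "LaserVision"})
assert acl.admit("LaserVision", "next slide") == "next slide"
assert acl.admit("UnknownNode", "next slide") is None
```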
47. Rules
- Rule statement gets sent to rule manager
- Event set is just another shorthand for rules
- The rule manager sends a command to the trigger node's thread that tells it about the consequence
- Rules are reversible
48. Reversibility
- Each statement is invertible (reversible)
- If there is an error in the application specification, then it can all be undone
- General debugging is possible with reversible rules and message flow (see the sketch below)
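A minimal sketch of reversibility: each statement records its inverse when applied, so a faulty application specification can be rolled back by replaying the inverses in reverse order. The statement objects are stand-ins for CORE statements.

```python
# Hypothetical sketch of undo via inverse statements.
class ReversibleCore:
    def __init__(self):
        self.log = []                 # (statement, inverse) pairs, in order

    def apply(self, statement, inverse):
        statement()                   # e.g. create node, add link, add rule
        self.log.append((statement, inverse))

    def undo_all(self):
        """Roll the whole specification back, most recent statement first."""
        while self.log:
            _, inverse = self.log.pop()
            inverse()                 # e.g. -node, -link, -rule

# Example with made-up setup statements:
core = ReversibleCore()
state = set()
core.apply(lambda: state.add("node Cam"), lambda: state.discard("node Cam"))
core.apply(lambda: state.add("link Cam->Vision"),
           lambda: state.discard("link Cam->Vision"))
core.undo_all()
assert state == set()
```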