An Oxygenated Presentation Manager - PowerPoint PPT Presentation


Slides: 49
Provided by: spokenlang
Learn more at: http://csg.csail.mit.edu

Transcript and Presenter's Notes


1
An Oxygenated Presentation Manager
  • Larry Rudolph
  • Oxygen Workshop, January, 2002

2
Goals Overview
  • Integrate Many Oxygen Technologies
  • Application Driven
  • Use an application that we understand
  • Personally use often
  • Would help if it were more human-centric
  • Portable (as opposed to E-21)
  • Develop Architectural Infrastructure
  • Exposes new requirements
  • Critique of Presentation Manager
  • What is wrong with it
  • What needs improvement

3
Application Scenario
4
An Oxygen Application
  • Components
  • Input
  • Vision
  • Speech
  • Touch
  • Output
  • Projector
  • Handheld
  • Archive
  • Processing
  • Changing configuration
  • Equipment
  • Today, it is too hard:
  • Linux laptop, Windows laptop, camera, microphone,
    network, projector, power blocks
  • Tomorrow, much easier:
  • a couple of H21s

5
Camera watching laser point on screen
  • Camera Challenges
  • Inexpensive ones have wrong focal length
  • Alignment issues
  • Use edge of screen, display pattern, figure out
    from what is known to be visible
  • We ended up displaying a pattern of concentric
    circles
  • Relative size of laser point depends on distance
  • Beyond ten feet, had to use only certain types of
    lasers
  • Could slow down the camera and let pixels saturate
    (too complicated)

6
Camera watching laser point on screen (cont)
  • Camera Interface
  • Click at point (x,y)
  • Hold laser at same location for 5 seconds
  • Select horizontal line ( (x1,y1) , (x2,y1) )
  • Sweep laser back and forth, line is diameter of
    ellipse
  • Select object centered at point (x,y)
  • Sweep laser in circle, point is center of circle
  • Previous or Next
  • Click in left (right) 1/8 of screen
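The click and Previous/Next conventions above can be sketched in a few lines. This is only an illustration, not the presenter's code: the function names, the sample format, and the 10-pixel jitter radius are assumptions; the 5-second hold and the 1/8-of-screen margins come from the slide.

```python
def is_dwell(samples, hold_seconds=5.0, radius=10.0):
    """Return True if the laser point stayed put long enough to count as a click.

    samples: list of (t, x, y) tuples, t in seconds, oldest first.
    radius is a hypothetical jitter tolerance in pixels.
    """
    if not samples:
        return False
    t0, x0, y0 = samples[0]
    for _, x, y in samples:
        # Any sample outside the tolerance circle cancels the dwell.
        if (x - x0) ** 2 + (y - y0) ** 2 > radius ** 2:
            return False
    return samples[-1][0] - t0 >= hold_seconds


def classify_click(x, screen_width):
    """Map the horizontal position of a click to a command."""
    if x < screen_width / 8:
        return "previous"
    if x >= screen_width * 7 / 8:
        return "next"
    return "click"
```

A click is then reported only when is_dwell holds, and classify_click decides whether it means Previous, Next, or a plain click at (x,y).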

7
Microphone listening to speaker
  • Microphone
  • Many technologies
  • Lapel mic, mic array, room microphone
  • Current approach: ipaq
  • Continuous recognition
  • Push to speak
  • Audio server on ipaq
  • Detects start and stop
  • Best results when human pushes to start and
    releases to stop
  • Audio wave file sent to Galaxy speech system
  • Galaxy output actions via CGI-script
  • A nice unifying mechanism
  • One more complicated component

8
Speaker controlling presentation via ipaq
  • Ipaq output to CGI-script Server
  • Same actions as from speech server
  • Actions are
  • Next slide, Previous slide, Goto slide n, Goto
    slide named <xxx>
  • Next item, Previous item, Goto item n, Goto item
    named <xxx>
  • Next animation, Previous animation, Goto
    animation n
  • Start presentation <name>, End presentation,
    Pause presentation
  • Initialize Camera, test microphone
  • Handheld (Ipaq) display
  • GUI generated from speechbuilder grammar
  • List of slides, items per slides
  • Currently use ad-hoc solution where PowerPoint
    sends lists to ipaq. Need more automatic
    solution
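Since speech and the ipaq both end up sending the same action strings, one small handler can serve both. The sketch below covers only the slide-level actions; the class name, the clamping of out-of-range slide numbers, and the return convention are assumptions made for illustration.

```python
import re


class Presentation:
    """Hypothetical holder of the current slide position."""

    def __init__(self, slide_count):
        self.slide_count = slide_count
        self.current = 1  # 1-based slide number

    def apply(self, action):
        """Apply one action string; return True if it was understood."""
        text = action.strip().lower()
        if text == "next slide":
            target = self.current + 1
        elif text == "previous slide":
            target = self.current - 1
        else:
            match = re.fullmatch(r"goto slide (\d+)", text)
            if match is None:
                return False  # not one of the actions handled here
            target = int(match.group(1))
        # Assumption: out-of-range requests are clamped to the deck.
        self.current = max(1, min(self.slide_count, target))
        return True
```

Item, animation, and session actions would follow the same pattern with their own state.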

9
Output to projector, handheld, archive
  • Unlimited number of video / audio output
    producers
  • E.g. PowerPoint is just one producer of output
  • At any time, each output device has an associated
    producer
  • This producer can receive input from several
    producers
  • Handheld has proxy
  • To reduce bandwidth to ipaq
  • Current slide, list of slides, list of commands
  • Archive
  • Each slide shown, audio (from a different
    microphone) sent to archive
  • Currently just gif of current slide

10
Processing controlling session
  • Do not let PowerPoint control the world
  • Slide viewer, movie player, program execution,
    browser, etc.
  • Want to mix all types of applications
  • Presenter has control of the output
  • E.g. switch output producer from PowerPoint to
    media player
  • Remove interrupting technologies
  • Dynamically disconnect any input / output source
  • All done via core language
  • Or some other glue language, e.g. meta-glue
  • Which does all the other infrastructure issues
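The idea that each output device has one associated producer, and that the presenter can switch or disconnect it at any time, can be sketched as a small routing table. The class and method names are hypothetical; in the talk this is done through the core language, not Python.

```python
class OutputRouter:
    """Hypothetical table of which producer currently feeds each output device."""

    def __init__(self):
        self.producer_for = {}  # device name -> producer name

    def connect(self, device, producer):
        # Connecting replaces whatever producer the device had before.
        self.producer_for[device] = producer

    def disconnect(self, device):
        self.producer_for.pop(device, None)

    def devices_for(self, producer):
        """Devices that should currently receive this producer's output."""
        return sorted(d for d, p in self.producer_for.items() if p == producer)
```

Switching the projector from the slide viewer to a media player is then a single connect call, and the archive can keep receiving slides independently.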

11
Multi-Modal Input
  • Shalini Agarwal
  • Oxygen Conference
  • January 8th, 2002

12
Initial Experience With Presentation Manager
  • One Single Monolithic Context
  • Command within slide, between slides, between
    applications
  • Problem
  • Too many false positives
  • Preliminary Solution
  • Slide tracking
  • e.g. recognize Next Slide command only after at
    least 60% of words on slide have been said
  • e.g. recognize Show Demo only after slide 17
  • Still lots of problems
  • Many slide styles hard to track (e.g. figures not
    words on slide)
  • Tracking for within slide different than for
    between slides
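A minimal sketch of the slide-tracking gate, assuming the slide's threshold of 60 is read as 60 percent of the distinct words on the slide. The class name, the tokenizer, and the choice to let a word-free slide pass immediately are assumptions, and the last of these is exactly the weak spot the slide mentions for figure-only slides.

```python
import re


def words(text):
    """Lowercased set of word tokens in a piece of text."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


class SlideTracker:
    def __init__(self, slide_text, threshold=0.6):
        self.slide_words = words(slide_text)
        self.heard = set()
        self.threshold = threshold

    def hear(self, utterance):
        # Only words that actually appear on the slide count as progress.
        self.heard |= words(utterance) & self.slide_words

    def coverage(self):
        if not self.slide_words:
            return 1.0  # assumption: nothing to track on a figure-only slide
        return len(self.heard) / len(self.slide_words)

    def allow_next_slide(self):
        return self.coverage() >= self.threshold
```

The recognizer would accept "Next Slide" only while allow_next_slide() is true, which suppresses early false positives but does nothing for commands within a slide.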

13
A Better Solution: Multiple Contexts
  • Very Active Research Area
  • Intelligent-room project, Galaxy, others
  • Three layers, each having its own context
  • Slide (Next Item, Next Animation)
  • Presentation (Next Slide, Goto Conclusion, Goto
    Example)
  • Session (Start Presentation, Switch to Browser,
    Show Questions)
  • Challenges
  • Each context requires its own speech recognition
    system
  • Multicasting sound wave to each system
  • Selecting the best result

14
Extending the Galaxy System
  • Start with context for speech and then extend
  • Note, our goals are similar but not identical to
    those of the Spoken Language Group
  • We are not dialog-based
  • Exploit their work
  • Follow Galaxy
  • Recognizer scores different guesses at words
  • Language Processing Unit uses input grammar to
    select best input sentence
  • Scott Cyphers gave us the nbest interface

15
Recognizer chooses 10 best guesses at word matches (for this context)
Language Processor picks best sentence from recognizer based on input grammar
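The two steps can be sketched as a filter over an n-best list. This is not Galaxy's nbest interface: the function name is hypothetical, and the grammar is reduced to a plain set of accepted sentences, which is a strong simplification of a real input grammar.

```python
def pick_sentence(nbest, grammar):
    """Pick the best-scoring hypothesis that the context's grammar accepts.

    nbest: list of (sentence, score) pairs, higher score is better.
    grammar: set of sentences this context accepts (a stand-in for a grammar).
    Returns (sentence, score), or None if no hypothesis is in the grammar.
    """
    valid = [(sentence, score) for sentence, score in nbest if sentence in grammar]
    if not valid:
        return None
    return max(valid, key=lambda pair: pair[1])
```

For example, if the recognizer's top guess is not a sentence of the slide context, the language processor falls through to the best guess that is.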
16
System Structure
17
System Structure
[Diagram: the Slide Layer and the Session Layer each have a Recognizer feeding a Language Processor; both layers feed a Selector. Phrases shown: next item, next movie, previous item, start presentation, end presentation, start explorer]
18
System Structure
19
Add Recognizer for T9
[Diagram: the Slide, Presentation, and Session Layers each have a Recognizer and a Language Processor; all feed a Selector. Inputs: Sound Input, T9 Input. Phrases shown: next item, go to slide nine, start presentation]
20
Add Recognizer for Graffiti
[Diagram: the Slide, Presentation, and Session Layers each have a Recognizer and a Language Processor; all feed a Selector. Inputs: Sound Input, T9 Input, Graffiti Input. Phrases shown: next item, go to slide nine, start presentation]
21
Other Input Modes
  • T9 (telephone keypad)
  • To input a, b, or c press 2
  • Current cell phones have dictionary to select
    correct word
  • Lots of false positives (very annoying)
  • Remember my introduction?
  • Using an application-dependent grammar would
    reduce errors
  • Pen-based character input
  • Use strokes to input characters
  • Current palm pilot only recognizes Graffiti
    alphabet
  • Lots of false positives (very annoying)
  • Using an application-dependent grammar would
    reduce errors
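To see why an application-dependent grammar helps T9, compare how many words match one key sequence in a general dictionary versus a small command vocabulary. The key layout is the standard telephone keypad; both word lists below are made up for illustration, and the function names are hypothetical.

```python
T9_KEYS = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_DIGIT = {
    letter: digit for digit, letters in T9_KEYS.items() for letter in letters
}


def to_digits(word):
    """Key sequence a user presses to enter the word (letters only)."""
    return "".join(LETTER_TO_DIGIT[ch] for ch in word.lower())


def t9_candidates(digits, vocabulary):
    """All vocabulary words that match the pressed key sequence."""
    return sorted(w for w in vocabulary if to_digits(w) == digits)


# Illustrative vocabularies, not taken from the talk.
general_words = {"good", "home", "gone", "hood", "next", "item"}
command_words = {"next", "previous", "goto", "slide", "item", "home"}
```

With these lists, the keys 4663 match four words in the general dictionary but only one in the command vocabulary, so the ambiguity disappears without asking the user to choose.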

22
Replacing the Recognizers
  • Build recognizers for T9 and Graffiti
  • Use Galaxy system to process results from new
    recognizers

[Diagram: Galaxy Hub connected to Audio, Speech Recog., T9 Recog., Graffiti Recog., Language Processing, Discourse Resolution, Dialogue Management, Database Server, Language Generation, Speech Synthesis]
23
Conclusion
  • Each application defines an input grammar
  • This grammar can be used to
  • Ensure that each application gets valid input
  • It might not be what the user wanted, but the
    application will understand it
  • Reduce false-positives
  • Identify the input suitable for associated
    application
  • Choose the application with the highest score
  • If tie, must do something else (future research)
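The "choose the application with the highest score" step might look like the sketch below. The function name and the result format are assumptions, and a tie simply returns None here because the slide leaves tie-breaking as future research.

```python
def select(results):
    """Choose among per-application (or per-layer) recognition results.

    results: dict mapping a name to (sentence, score), or to None when that
    application's grammar accepted nothing.
    Returns (name, sentence) for the unique top score, else None.
    """
    candidates = [
        (name, result[0], result[1])
        for name, result in results.items()
        if result is not None
    ]
    if not candidates:
        return None
    top = max(score for _, _, score in candidates)
    winners = [(name, sentence) for name, sentence, score in candidates if score == top]
    if len(winners) != 1:
        return None  # tie: left unresolved in this sketch
    return winners[0]
```

Because each application only reports sentences valid in its own grammar, whatever the selector returns is input the chosen application will understand.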
  • Enable T9, Graffiti, Speech, other input modes

24
Critique of Presentation Manager
25
Vision / Gesture Recognition
  • Laser Pointer
  • Great for drawing attention to content
  • Audience is primary consumer
  • Secondary use to control presentation
  • But it is not a mouse
  • Semantics are tied to slide context
  • Differs from Intelligent-room use
  • Small number of identified gestures
  • Gestures easily punctuated
  • Low computational overhead
  • Soon will be handled with a H21

26
Critique of Vision / Gesture Recognition
  • Laser Pointer
  • Great for drawing attention to content
  • Cheap technology but mostly distracting
  • Too shaky, imprecise
  • But it is not a mouse
  • More awkward to use than mouse
  • Another gadget to hold in the hand, button to
    identify, batteries to maintain
  • Small number of identified gestures
  • There are better ways of drawing attention to
    slide content
  • I rarely use it and don't like it when others do
  • Low computational overhead
  • Dumb vs Intelligent Device Discussion

27
Speech Recognition
  • Initially seems like great idea
  • Speaker is already speaking, so can use it to
    control presentation
  • Want passive, intelligent listener
  • Not a dialog
  • No prompt alienating distraction
  • Want no mistakes
  • For dialog, better to guess than ignore
  • For us, high cost for incorrect guess
  • Most words are not relevant to speech system
  • More trouble than it is worth
  • But may be good for real-time search of content

28
More useful aspect: Output modalities
  • Presenter has put the time and effort into the
    production
  • Simpler is better
  • Audience has harder task
  • Understand material being presented
  • Record thoughts, impressions, connections
  • Filter for later review
  • Process in real-time
  • Keep up with presentation
  • Do all this with minimal distractions
  • Output modalities
  • Content for live audience
  • Content for speaker (superset of audience)
  • Content for retrieval
  • Correlate notes with content

29
Record and correlate notes with presentation
30
CORE: Communication Oriented Routing Environment
  • (Oxygen Research Group)

31
Assumptions
  • Actuators / Sensors (I/O) in the environment
  • Many are shared by apps and users
  • Many are flaky / faulty
  • User does not know much about them
  • Environment, application, and user desires change
    over time

32
An Oxygen Application
  • Interconnected Collection of Stuff
  • Who specifies the stuff?
  • I don't know, but it's mostly virtual stuff
  • Many layers of abstraction
  • Don't ask, it's turtles all the way down
  • Two main layers of programming
  • Professionals
  • Users, e.g. grandmother

33
Communications-Oriented Programs
  • Connecting the (virtual) stuff done by user
  • Home stereo / theater analogy
  • Plug stuff together, unplug it if it doesn't work
  • Don't like it, unplug it
  • Device drivers, services, clients don't know to
    whom or to what they connect
  • In client/server model,
  • server knows a lot about the client,
  • the client knows even more about the server
  • Extend Unix Pipes

34
CORE
[Diagram labels: Other COREs, Larry Bear]
35
Message Flow
  • Messages flow between nodes and core
  • Core is both language and router
  • Within Core Router, some messages
  • are interpreted and may trigger actions
  • other messages get routed to other nodes
  • Request-Reply message strategy
  • Even number of messages
  • No reply within time period, means error
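The request-reply strategy with a timeout can be sketched with Python's standard queue module. The function name, the callback-style send, and the default timeout are assumptions; CORE's actual message format is not shown in the slides.

```python
import queue


def request(send, replies, message, timeout=1.0):
    """Send one message and wait for its reply.

    send: callable that delivers the message (hypothetical transport).
    replies: queue.Queue on which the reply is expected to arrive.
    Raises TimeoutError if no reply arrives within the time period.
    """
    send(message)
    try:
        return replies.get(timeout=timeout)
    except queue.Empty:
        raise TimeoutError(f"no reply to {message!r} within {timeout} s")
```

Every exchange therefore consists of an even number of messages, and a silent node is detected as an error instead of blocking the caller forever.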

36
CORE Language Elements
  • Four elements
  • Nodes,
  • Links,
  • Messages,
  • Rules
  • Features
  • Interpreted Language
  • Statement is a message reply
  • Each element has an inverse

37
Node: handler (nickname, specifier)
  • Nodes: specify via INS
  • Cam: device=web-cam, location=518
  • PTRvision: device=process, OS=Linux, File=Laser Vision, ..
[Diagram labels: CORE, Laser Vision]
38
Node Statement Handler
  • When node message arrives
  • Verified for correctness (statements allowed)
  • Routed to Node Manager (just another node)
  • Node Manager
  • INS lookup, verifies if allowed, creates if
    needed
  • Creates core thread to manage communication with
    node
  • Bookkeeping reply message with handle/error

39
Links
Lcamera,vision (Cam,PTRvision)
[Diagram labels: Slide Speech, Presentation Speech, Command Speech, CORE, Laser Vision]
40
Link Statement Handler
  • Message routed to link manager
  • Two queries to node manager for thread controllers
  • Message to thread controller of source node
  • Specifying destination thread controller
  • Message to thread controller of dest node
  • Specifying source thread controller
  • Bookkeeping reply message with handle/error

41
Messages
Messages flow over the links
[Diagram: the message "Next Slide!" flowing over a link. Node labels: Slide Speech, Presentation Speech, Command Speech, CORE, Laser Vision]
42
Message Handling
  • Messages can be encrypted
  • Core statement messages have fixed format
  • Everything else is data message
  • Each node thread has two unbounded buffers
  • Core to node, node to core
  • Logging, rollback, fault-tolerance
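The two unbounded buffers per node thread, together with a log, can be sketched as follows. The class and method names are hypothetical; queue.Queue() with its default maxsize of 0 is unbounded. Rollback and encryption are not modeled.

```python
import queue


class NodeChannel:
    """Two unbounded FIFO buffers for one node thread, plus a message log."""

    def __init__(self):
        self.core_to_node = queue.Queue()  # default maxsize=0 means unbounded
        self.node_to_core = queue.Queue()
        self.log = []  # (direction, message) in send order

    def send_to_node(self, message):
        self.log.append(("core->node", message))
        self.core_to_node.put(message)

    def send_to_core(self, message):
        self.log.append(("node->core", message))
        self.node_to_core.put(message)
```

Keeping the log next to the buffers is one simple way to support replay after a fault; how CORE itself does this is not stated in the slides.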

43
Rules
RULES (trigger,action)
( MESSQuestion , Lslide,lcd -- Lslide,qlcd )
[Diagram labels: Slide Speech, Presentation Speech, Command Speech, Questions, CORE, Laser Vision]
44
Rule Statement Handler
  • ( trigger , consequence )
  • Both are event sets
  • Eight basic events
  • Node, -Node, Link, -Link
  • Message, -Message, Rule, -Rule
  • Event set is a set of events
  • Trigger is true when events are true
  • Consequence makes events true

45
Rules: A link is a rule
  • A message event is of form
  • (node, message specifier)
  • ( message specifier , node )
  • Message came from or going to node
  • A link (x,y) is just shorthand for the rule
  • ( x , m ) → ( - (x, m) , (m , y) )
  • If a message m arrives at node x, then make that
    event false (remove the message) and make the
    event of m arriving at y from core true.
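The link-as-rule shorthand can be made concrete with sets of events. This is an illustrative model only: the tuple encoding of events, the "+" and "-" signs, and the function names are assumptions, not CORE syntax.

```python
def apply_rule(state, trigger, consequence):
    """Fire one rule against a set of currently true events.

    state: set of true events.
    trigger: set of events that must all be true for the rule to fire.
    consequence: list of (sign, event); "+" makes the event true,
                 "-" makes it false.
    Returns the new state (the same state if the trigger does not hold).
    """
    if not trigger <= state:
        return state
    new_state = set(state)
    for sign, event in consequence:
        if sign == "+":
            new_state.add(event)
        else:
            new_state.discard(event)
    return new_state


def link_rule(x, y, m):
    """Expand link (x, y) into its rule for one message m."""
    arrived = ("from", x, m)  # message m came from node x
    going = ("to", m, y)      # message m going to node y
    return {arrived}, [("-", arrived), ("+", going)]
```

Applying link_rule("Cam", "PTRvision", m) to a state containing the arrival event removes that event and adds the delivery event, which is exactly the forwarding behavior of a link.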

46
Rules: Access Control Lists
  • An access control list is just a rule
  • When messages arrive at node, if they arrive from
    valid node, then allowed to continue to flow.
  • Modifying access control lists is just adding or
    removing rules.
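Treating an access control list as rules means that checking access is a rule lookup and editing the list is adding or removing a rule. A minimal sketch, with hypothetical names and with each rule reduced to a (source, destination) pair:

```python
class AccessRules:
    """Access control expressed as a set of allow rules."""

    def __init__(self):
        self.allowed = set()  # (source node, destination node) pairs

    def add_rule(self, source, destination):
        self.allowed.add((source, destination))

    def remove_rule(self, source, destination):
        self.allowed.discard((source, destination))

    def may_flow(self, source, destination):
        """True if a message from source may continue to destination."""
        return (source, destination) in self.allowed
```

Revoking access is then the inverse statement of granting it, which fits the reversibility of rules.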

47
Rules
  • Rule statement gets sent to rule manager
  • Event set is just another shorthand for rules
  • Rule manager sends command to trigger node thread
    that tells it about the consequence
  • Rules are reversible

48
Reversibility
  • Each statement is invertible (reversible)
  • If there is an error in the application
    specification, then can undo it all.
  • General debugging is possible with reversible
    rules and message flow
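Reversibility can be sketched as a log of applied statements that is replayed backwards with each statement's inverse. The class name, the "+" and "-" signs, and the tuple encoding of elements are assumptions made for illustration.

```python
class Core:
    """Toy model of a configuration built from invertible statements."""

    def __init__(self):
        self.elements = set()  # e.g. ("Node", "Cam"), ("Link", "Cam", "PTRvision")
        self.log = []          # statements that changed the state, oldest first

    def apply(self, sign, element):
        """Apply a statement: "+" adds an element, "-" removes it.

        Statements that change nothing are not logged, so every logged
        statement is undone exactly by its inverse.
        """
        present = element in self.elements
        if (sign == "+") == present:
            return
        if sign == "+":
            self.elements.add(element)
        else:
            self.elements.discard(element)
        self.log.append((sign, element))

    def undo_all(self):
        """Undo every logged statement, most recent first."""
        while self.log:
            sign, element = self.log.pop()
            if sign == "+":
                self.elements.discard(element)
            else:
                self.elements.add(element)
```

If an application specification turns out to be wrong, undo_all returns the configuration to where it started; stepping back one statement at a time gives a basic form of debugging.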