The Universal Speech Interface (USI) PDG Progress Report

Learn more at: http://www.cs.cmu.edu

1
The Universal Speech Interface (USI) PDG
Progress Report
  • Thomas Harris, Stefanie Tomko, Arthur Toth, James
    Sanders,
  • Alex Rudnicky, Roni Rosenfeld
  • School of Computer Science
  • Carnegie Mellon University
  • 4 June 2003

2
Outline
  • USI Project Summary
  • USI Device Control
  • USI User Studies
  • Tech Transfer Initiative
  • USI Application Generator

3
Program Goals and Plan
  • Overall program goal
  • Design a universal (i.e. device-independent)
    interface for speech-based interaction with
    wearable and home devices
  • Program plan milestones
  • Q1: analysis, interaction principles
  • Q2: build device-simulation environment
  • Q3: build first device prototype
  • Q4: initial user studies, development tools

4
Program Deliverables
  • A novel universal design for speech-based
    interaction with wearable- and home-devices
  • At least one demonstration system exemplifying
    the new interface
  • A set of tools for rapid prototyping of compliant
    applications

5
The Universal Speech Interface (USI) In a Nutshell
  • Unifying approach to human-machine speech
    communication
  • Unified look and feel across all applications
  • analogous to the Xerox/Macintosh/Windows GUI
    look-and-feel
  • Stylized, semi-natural interaction
  • analogous to the Graffiti alphabet for the Palm
    PDA

6
Existing Speech Paradigm 1: Command-and-Control Systems
  • Specialized language, optimized for a given
    application
  • each application has its own interface
  • Intensive training of each user
  • Daily use helps retain knowledge

7
Existing Speech Paradigm 2: Unconstrained Dialog Systems
  • Off-the-street users, no training required
  • System models existing human behavior
  • But this comes at a cost
  • each application requires a great deal of data,
    labor, human expertise
  • Speech Recognition technology is pushed to the
    limit
  • user does not easily grasp the application's
    functional limits
  • Out-Of-Vocabulary words (OOV)
  • Out-Of-Domain concepts, requests

8
Is a Third Paradigm Needed?
  • In practice, people are likely to use
  • a handful of apps daily
  • scheduler, contact manager, email,...
  • many apps occasionally
  • weather, restaurants, ...
  • To exploit this, we need
  • flexible, powerful interface for familiar
    applications.
  • immediate engagement with occasional or new
    applications.

9
Our Approach
  • Identify application-independent universals
  • user-side
  • machine-side
  • Find suitable, general solutions
  • Human and machine meeting halfway
  • Design a stylized, universal look and feel
  • Teach it in 5 minutes

10
Universal Semantic primitives
  • Help primitives
  • what can the machine do? how do I do X? what
    can I say?
  • Speech channel primitives
  • detect and correct ASR errors; finished talking?
  • Interaction primitives
  • turn taking, question answering, session
    management, undo
  • Application primitives
  • environment variables: query, set
  • objects (e.g. lists): describe, navigate, create,
    modify, delete

11
USI Systems Developed
  • Information Access
  • MovieLine
  • FlightLine
  • ApartmentLine
  • Device Control
  • Stereo system
  • X-10 control (e.g., lights)
  • Alarm Clock applet
  • Digital Video Camera
  • Windows Media Player

12
USI Demonstration
  • MovieLine
  • Experimental subject

13
USI Device Control

14
Device Interaction Analysis
  • Analysis was done on multiple devices
  • alarm clock / radio
  • VCR
  • cell phone
  • MP3 player
  • memo pad / email / vmail
  • copier/fax

15
USI/Device Design Issues
  • Confirmation strategy
  • Error handling strategy
  • Exploration
  • Navigation
  • Disambiguation / context mgmt
  • Orientation
  • Querying state variables

16
USI/Device Design Issues
  • Confirmation strategy: restate--execute
  • Error handling strategy: ignore
  • Exploration: OPTIONS
  • Navigation: use concept of focus
  • Disambiguation / context mgmt: implicit
  • Orientation: STATUS
  • Querying state variables: WHAT IS THE...?
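
The strategies on this slide can be illustrated with a toy device. The following is a minimal sketch only, not the USI implementation: the class name, the state variables, and the response wording are all invented for illustration. It assumes that "restate--execute" means the system echoes what it understood and then applies the change, and that "ignore" means unrecognized input produces no response.

```python
class AlarmClockSketch:
    """Hypothetical toy device; none of these names come from the USI system."""

    def __init__(self):
        # Invented state variables that the user can query or set.
        self.state = {"alarm": "off", "alarm time": "7:00", "volume": "5"}

    def handle(self, utterance):
        words = utterance.strip().lower()
        if words == "options":                     # Exploration: OPTIONS
            return "you can say: " + ", ".join(sorted(self.state))
        if words == "status":                      # Orientation: STATUS
            return ", ".join(f"{k} is {v}" for k, v in sorted(self.state.items()))
        if words.startswith("what is the "):       # Querying state variables
            name = words[len("what is the "):].rstrip("?").strip()
            if name in self.state:
                return f"{name} is {self.state[name]}"
            return None
        # Setting a variable: "<variable> <value>". Longest names are tried
        # first so that "alarm time 6:30" is not read as "alarm" = "time 6:30".
        for name in sorted(self.state, key=len, reverse=True):
            if words.startswith(name + " "):
                value = words[len(name) + 1:].strip()
                response = f"{name} {value}"       # restate what was understood
                self.state[name] = value           # then execute
                return response
        return None                                # Error handling: ignore


clock = AlarmClockSketch()
print(clock.handle("alarm time 6:30"))
print(clock.handle("WHAT IS THE alarm time?"))
print(clock.handle("STATUS"))
```

Returning `None` for unrecognized input keeps the "ignore" strategy explicit: the caller simply says nothing and waits for the next utterance.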

17
Hooking up with the PUC project
  • Fits within the PUC project's vision of
    automatically generated interfaces with different
    modalities and form factors
  • But, can also be used as a standalone speech
    interface
  • Compatibility with visual design is desirable,
    but not always natural
  • nameless states (speech interface must have name
    for everything!)
  • speech interface can have shortcuts (MODE CD
    vs. CD)

18
Meshing with the PUC project
  • Device capabilities specified by XML doc
  • States vs. Action dichotomy of the visual
    interface does not always conform to speech
    interface intuition.
  • For now, creating our own interface specification
    document
  • Ultimately, will augment XML DTD, so both
    interfaces can co-exist

19
USI Device Control (a.k.a. James the Butler)
Hardware hacking courtesy of the PUC project

20
USI Demonstration
  • Device Control
  • Alarm Clock Example

21
User Studies

22
User study
  • Compared Speech Graffiti (SG) & natural language
    MovieLines
  • How does Speech Graffiti compare to a natural
    language interface?
  • Subjective user satisfaction
  • Task completion rates
  • Word error rates
  • How well do users "get" Speech Graffiti?
  • How often do they speak within the grammar?
  • In what ways do they deviate from the grammar?

23
Subjective user satisfaction
  • 17 of 23 preferred Speech Graffiti (SG)
  • SG user satisfaction ratings higher than NL in
    all categories
  • SG ratings positive except in annoyance &
    habitability

24
Computer experience & training
  • Computer Science / Engineering backgrounds and/or
    programming experience
  • Higher user satisfaction ratings
  • Better task completion rates
  • Training in-domain vs. out-of-domain
  • No differences in user satisfaction or task
    completion rates

25
Task completion
  • Overall
  • 67.9% of SG tasks
  • 67.4% of NL tasks
  • Individual means
  • 5.43 of 8 SG tasks
  • 5.30 of 8 NL tasks
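
As a quick check on how the two sets of figures relate, the individual means can be converted into percentages. This is a sketch that assumes the "Overall" figures are percentages of tasks completed; the function name is made up for illustration.

```python
TASKS_PER_USER = 8  # from the slide: "of 8 SG tasks", "of 8 NL tasks"


def completion_percent(mean_completed, tasks=TASKS_PER_USER):
    """Convert a mean number of completed tasks into a percentage."""
    return 100.0 * mean_completed / tasks


for label, mean_completed in [("SG", 5.43), ("NL", 5.30)]:
    print(f"{label}: {completion_percent(mean_completed):.1f}% of tasks completed")
```

For SG, 5.43 of 8 works out to about 67.9%, matching the overall figure. For NL, 5.30 of 8 is about 66%, slightly below the overall 67.4; the slide does not explain the difference.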

26
Time-to-completion
  • Completed tasks
  • 67.9 seconds SG
  • 73.4 seconds NL
  • Incomplete tasks

27
Turns-to-completion
  • Completed tasks
  • 8.2 turns SG
  • 3.9 turns NL
  • Incomplete tasks

28
Word error rates
  • Very high for both systems
  • On "cleaned" set (on-task, non-noisy utts)
  • Concept error is lower for USI
  • SG 29.2 from WER
  • NL 0.8 from WER
  • Low error rate is key to acceptance
  • 6 who preferred NL-ML had highest SG WER
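
The slide does not say how WER was scored. The sketch below assumes the conventional definition: the word-level edit distance (substitutions, deletions, insertions) between the reference transcript and the recognizer output, divided by the number of reference words. It is not the scoring tool used in the study, and the example utterances are invented.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("what is the theater", "what is theater"))
```

Because insertions are counted, WER computed this way can exceed 1.0 when the recognizer outputs many more words than were spoken.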

29
WER & user satisfaction
  • Good correlation for SG

30
How often do users speak within the Speech
Graffiti grammar?
  • Actually, pretty often!
  • and
  • grammaticality leads to user satisfaction

31
How do users deviate from the grammar?

32
Future Interface Design Work
  • Redesign Help facility
  • SG works best for those who "get it"
  • Current system provides no assistance to
    "clueless user"
  • Error analysis
  • Compare failure cases in SG and NL interfaces
  • Compare user recovery attempts in SG and NL
  • Address issues of generalizability
  • Promoting transparency of slot set and response
    sets
  • Accessing information sets rather than single
    items
  • Adjust grammar components

33
Future Architecture Work
  • Integrate current USI environments
  • Information Access
  • Device Control
  • Improve interface between PUC and USI components
  • Identify USI-specific techniques to achieve lower
    WER
  • Improved documentation and distribution packaging

34
Tech Transfer Initiative

35
Tech Transfer Initiative
  • Tools for creating new USI apps
  • 3 days to create a new application
  • prior exposure to speech technology highly
    beneficial
  • decided to further reduce the barrier
  • → create an application generator

36
From 3 Days to a Few Hours
  • A USI Application Generator
  • New USI applications w/out programming!
  • XML document fully specifies the application
  • slot names
  • accepted inputs
  • data types
  • slot properties
  • ...
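
To make the idea of an XML-specified application concrete, here is a minimal sketch of reading such a specification. The actual USI schema is not shown in these slides, so every element name, attribute name, and value below (`application`, `slot`, `input`, `datatype`, `required`, "ExampleLine") is an assumption made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical specification; the real USI document format may differ entirely.
SPEC = """
<application name="ExampleLine">
  <slot name="title" datatype="string" required="true">
    <input>any movie title</input>
  </slot>
  <slot name="day" datatype="enum" required="false">
    <input>today</input>
    <input>tomorrow</input>
  </slot>
</application>
"""


def load_slots(xml_text):
    """Return one dict per slot: name, data type, other properties, inputs."""
    root = ET.fromstring(xml_text)
    slots = []
    for slot in root.findall("slot"):
        slots.append({
            "name": slot.get("name"),
            "datatype": slot.get("datatype"),
            # Every remaining attribute is treated as a slot property.
            "properties": {k: v for k, v in slot.attrib.items()
                           if k not in ("name", "datatype")},
            "inputs": [(e.text or "").strip() for e in slot.findall("input")],
        })
    return slots


for slot in load_slots(SPEC):
    print(slot["name"], slot["datatype"], slot["inputs"])
```

The point of the sketch is that the four items listed on the slide (slot names, accepted inputs, data types, slot properties) can all be carried as data, so a generic engine needs no application-specific code.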

37
From a Few Hours to 15 minutes?
  • Created a Web interface for generating the XML
    document
  • Form filling, pulldown menus
  • Strong effort to further simplify the process,
    minimize complexity of form
  • many defaults
  • for less common choices, edit the XML doc.
  • More importantly, no computer savvy needed

38
Web Application Generator
  • Repository and tool for creating USI database
    applications
  • Abundant online help to guide users through
    process
  • Accessible to anyone with an Internet connection

39
Web Application Generator
  • Two step process
  • General specification
  • Slot-by-slot specification
  • choose datatype from built-in list, or create own
  • Fully featured system with save, copy, delete
    functionality
  • Hides intricacies of XML document writing
  • Advanced users have ability to further alter the
    final XML document

40
General Specification screen with help box
displayed.

41
Web Application Generator
  • Built-in generic voice; can record own voice
  • DB backend
  • Postgres
  • Oracle
  • ODBC (including ASCII files)
  • Ultimately web tables
  • Platform
  • originally mixed Unix/Windows, telephone based
  • converted to pure Windows, telephone or laptop

42
Transferring USI to PDG members
  • We do house calls!
  • Carnegie Mellon will install USI developer
    environment for each interested member and will
    train member staff in the use of the developer
    environment
  • Provide a short tutorial on USI principles and
    interface design

43
Thank you!
Pittsburgh Digital Greenhouse