CMPUT 301: Lecture 31 Out of the Glass Box - PowerPoint PPT Presentation

1
CMPUT 301 Lecture 31: Out of the Glass Box
  • Martin Jagersand
  • Department of Computing Science
  • University of Alberta

2
Overview
  • Idea
  • why only use the sense of vision in user
    interfaces?
  • increase the bandwidth of the interaction by
    using multiple sensory channels, instead of
    overloading the visual channel

3
Overview
  • Multi-sensory systems
  • use more than one sensory channel in interaction
  • e.g., sound, video, gestures, physical actions
    etc.

4
Overview
  • Usable senses
  • sight, sound, touch, taste, smell
  • haptics, proprioception, and accelerations
  • each is important on its own
  • together, they provide a fuller interaction with
    the natural world

5
Overview
  • Usable senses
  • computers rarely offer such a rich interaction
  • we can use sight, sound, and sometimes touch
  • flight simulators and some games use
    accelerations to create a multimodal, immersive
    experience
  • we cannot (yet) use taste or smell

6
Overview
  • Multi-modal systems
  • use more than one sense in the interaction
  • e.g., sight and sound: a word processor that
    speaks the words as well as rendering them on the
    screen

7
Overview
  • Multi-media systems
  • use a number of different media to communicate
    information
  • e.g., a computer-based teaching system with
    video, animation, text, and still images

8
Speech
  • Human speech
  • natural mastery of language
  • instinctive, taken for granted
  • difficult to appreciate the complexities
  • potentially a useful way to extend human-computer
    interaction

9
Speech
  • Structure
  • phonemes (English)
  • 40 (24 consonant and 16 vowel sounds)
  • basic atomic units of speech
  • sound slightly different depending on context

10
Speech
  • Structure
  • allophones
  • 120 to 130
  • all the sounds in the language
  • count depends on accents

11
Speech
  • Structure
  • morphemes
  • basic atomic units of language
  • part or whole words
  • formed into sentences using the rules of grammar

12
Speech
  • Prosody
  • variations in emphasis, stress, pauses, and pitch
    to impart more meaning to sentences
  • Co-articulation
  • the effect of context on the sound
  • transforms phonemes into allophones
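
A toy sketch of co-articulation turning phonemes into allophones, under a deliberately simplified assumption: one English rule in which the voiceless stops /p t k/ are aspirated at the start of a word before a vowel (the [tʰ] of "top") and left unaspirated elsewhere, for instance after /s/ (the [t] of "stop"). Real allophonic variation also depends on stress, syllable structure, and accent; the symbol sets and the function `to_allophones` are hypothetical.

```python
# Hypothetical phoneme inventory, just large enough for the examples.
VOICELESS_STOPS = {"p", "t", "k"}
VOWELS = {"a", "æ", "ɑ", "e", "ɛ", "i", "ɪ", "o", "ɔ", "u", "ʊ", "ʌ", "ə"}

def to_allophones(phonemes):
    """Map one word's phonemes to allophones with a single, simplified
    aspiration rule (an assumption, not a full model of English)."""
    out = []
    for i, p in enumerate(phonemes):
        nxt = phonemes[i + 1] if i + 1 < len(phonemes) else None
        if p in VOICELESS_STOPS and i == 0 and nxt in VOWELS:
            out.append(p + "ʰ")   # word-initial stop before a vowel: aspirated
        else:
            out.append(p)         # unchanged; a stop after /s/ stays unaspirated
    return out

print(to_allophones(["t", "ɑ", "p"]))       # "top"  → ['tʰ', 'ɑ', 'p']
print(to_allophones(["s", "t", "ɑ", "p"]))  # "stop" → ['s', 't', 'ɑ', 'p']
```

The same phoneme /t/ comes out as two different allophones purely because of its neighbours, which is the context effect the slide describes.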

13
Speech Recognition
  • Problems
  • different people speak differently (e.g., accent,
    stress, volume, etc.)
  • background noises
  • filled pauses such as "ummm" and "errr"
  • speech may conflict with complex cognition

14
Speech Recognition
  • Issues
  • recognizing words is not enough
  • need to extract meaning
  • understanding a sentence requires context, such
    as information about the subject and the speaker

15
Speech Recognition
  • Phonetic typewriter
  • developed for Finnish (a phonetic language)
  • trained on one speaker, tries to generalize to
    others
  • uses a neural network that clusters similar
    sounds together and maps each cluster to a character
  • poor performance on speakers it has not been
    trained on
  • requires a large dictionary of minor variations
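
Kohonen's phonetic typewriter used a self-organizing map; the sketch below swaps in a much simpler nearest-prototype classifier to illustrate the two points on the slide: sounds are clustered per character, and a model trained on one speaker can fail on another. The two-dimensional "acoustic features" and all their values are invented for illustration, and the function names are hypothetical.

```python
import math

def train_prototypes(samples):
    """Average each character's feature vectors into one prototype,
    i.e. one cluster centre per character (trained on one speaker)."""
    sums, counts = {}, {}
    for char, vec in samples:
        acc = sums.setdefault(char, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[char] = counts.get(char, 0) + 1
    return {c: [v / counts[c] for v in acc] for c, acc in sums.items()}

def classify(prototypes, vec):
    """Return the character whose prototype is nearest (Euclidean)."""
    return min(prototypes, key=lambda c: math.dist(prototypes[c], vec))

# Invented 2-D features, loosely like the first two formants in kHz,
# recorded from a single training speaker.
speaker_a = [("a", [0.75, 1.25]), ("a", [0.70, 1.20]),
             ("i", [0.30, 2.30]), ("i", [0.28, 2.20]),
             ("u", [0.32, 0.80]), ("u", [0.30, 0.85])]
protos = train_prototypes(speaker_a)

print(classify(protos, [0.72, 1.22]))  # speaker A says "a" → a
print(classify(protos, [0.45, 1.95]))  # a new speaker's "a" (invented) → i
```

The second call shows the generalization problem: the new speaker's vowel lands nearer the wrong cluster, so the system either needs retraining or many stored variants per character.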

16
Speech Recognition
  • Currently
  • single user, limited vocabulary systems can work
    satisfactorily
  • no general-user, general-vocabulary systems are
    commercially successful yet
  • Current commercial examples
  • simple telephone-based UIs, such as train
    schedule information systems
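
A sketch of why a limited-vocabulary telephone system is tractable, assuming the recognizer has already produced words: filler words are dropped, and a single fixed phrasing over a small, known set of station names extracts the meaning (origin and destination). The station list, filler list, query pattern, and `parse_query` are all hypothetical.

```python
import re

FILLERS = {"um", "umm", "ummm", "uh", "er", "err", "errr"}
STATIONS = ["edmonton", "calgary", "red deer", "jasper"]  # hypothetical vocabulary

# Only one phrasing is understood: "... from <station> to <station> ...".
STATION_RE = "|".join(re.escape(s) for s in STATIONS)
QUERY_RE = re.compile(rf"\bfrom ({STATION_RE}) to ({STATION_RE})\b")

def parse_query(transcript):
    """Turn recognized words into a schedule request, or None if the
    utterance is outside the small grammar and vocabulary."""
    words = [w for w in re.findall(r"[a-z]+", transcript.lower())
             if w not in FILLERS]           # drop "um", "errr", ...
    m = QUERY_RE.search(" ".join(words))
    if m is None:
        return None                         # e.g. re-prompt the caller
    return {"from": m.group(1), "to": m.group(2)}

print(parse_query("Um, next train from Edmonton to, errr, Calgary?"))
# → {'from': 'edmonton', 'to': 'calgary'}
```

Restricting both the vocabulary and the phrasing is what keeps the meaning-extraction step simple; a general-vocabulary system has no such closed list to match against.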

17
Speech Recognition
  • Potential
  • for users with physical disabilities
  • for lightweight, mobile devices
  • for when users' hands are already occupied with a
    manual task (e.g., auto mechanic, surgeon)

18
Speech Synthesis
  • What
  • computer-generated speech
  • natural and familiar way of receiving information

19
Speech Synthesis
  • Problems
  • humans find it difficult to adjust to monotone,
    non-prosodic speech
  • to produce appropriate prosody, the computer needs
    to understand the natural language and the domain
  • speech is transient (hard to review or browse)
  • produces noise in the workplace or requires
    headphones (intrusive)
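
To make the monotone-versus-prosodic contrast concrete, the sketch below generates one pitch (F0) target per syllable for a synthesizer: either a flat contour, or a simple prosodic one with gradual declination, a pitch accent on stressed syllables, and a final fall for statements or a final rise for yes/no questions. These are common simplifications of English intonation; the base frequency, step sizes, and `f0_contour` itself are illustrative assumptions.

```python
def f0_contour(n_syllables, stressed=(), question=False,
               base_hz=120.0, prosodic=True):
    """Return one pitch (F0) target in Hz per syllable."""
    if not prosodic:
        return [base_hz] * n_syllables      # robotic monotone
    contour = []
    for i in range(n_syllables):
        hz = base_hz - 2.0 * i              # gradual declination
        if i in stressed:
            hz += 20.0                      # pitch accent on a stressed syllable
        contour.append(hz)
    if contour:
        # Final rise for a yes/no question, final fall for a statement.
        contour[-1] += 30.0 if question else -15.0
    return contour

print(f0_contour(5, prosodic=False))  # → [120.0, 120.0, 120.0, 120.0, 120.0]
print(f0_contour(5, stressed={1}))    # → [120.0, 138.0, 116.0, 114.0, 97.0]
```

Note that the caller must know which syllables are stressed and whether the sentence is a question; deciding that is exactly where the synthesizer needs to understand the language.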

20
Speech Synthesis
  • Potential
  • screen readers
  • read a textual display to a visually impaired
    person
  • warning signals
  • spoken information especially for aircraft pilots
    whose visual and haptic channels are busy

21
Speech Synthesis
  • Virtual newscaster (Ananova)

22
Uninterpreted Speech
  • What
  • fixed, recorded speech
  • e.g., played back in airport announcements
  • e.g., attached as voice annotation to files

23
Uninterpreted Speech
  • Digital processing
  • change playback speed without changing pitch
  • to quickly scan phone messages
  • to manually transcribe voice to text
  • to figure out the lyrics and chords of a song
  • spatialization and environmental effects
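
Changing playback speed without changing pitch needs time-scale modification: simply resampling (playing the samples faster or slower) would shift the pitch as well. The sketch below uses plain overlap-add (OLA), which reads short windowed frames from the input at one hop size and writes them to the output at another. Plain OLA can introduce audible phase artifacts on real speech, and practical tools use refinements such as WSOLA or a phase vocoder; the frame size and the function names are assumptions.

```python
import math

def hann(n):
    """Periodic Hann window; copies overlapped at a hop of n/2 sum to 1."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def time_stretch_ola(x, speed, frame=256):
    """Change playback speed by `speed` (>1 faster, <1 slower) without
    resampling, using plain overlap-add (OLA)."""
    if len(x) < frame:
        raise ValueError("signal shorter than one frame")
    hop_out = frame // 2          # synthesis hop: frames written 50% overlapped
    hop_in = hop_out * speed      # analysis hop: read faster or slower than we write
    w = hann(frame)
    n_frames = int((len(x) - frame) / hop_in) + 1
    out_len = (n_frames - 1) * hop_out + frame
    y = [0.0] * out_len
    norm = [0.0] * out_len
    for k in range(n_frames):
        start = round(k * hop_in)                # where to read in the input
        for i in range(frame):
            y[k * hop_out + i] += w[i] * x[start + i]
            norm[k * hop_out + i] += w[i]
    # Divide by the summed window weights so the ends are not faded out.
    return [v / s if s > 1e-8 else 0.0 for v, s in zip(y, norm)]

# Example: one second of a 250 Hz tone at 8 kHz, slowed to half speed.
sr = 8000
tone = [math.sin(2 * math.pi * 250 * n / sr + 0.3) for n in range(sr)]
slow = time_stretch_ola(tone, 0.5)
print(len(tone), len(slow))   # → 8000 15744
```

Played back at the original 8 kHz sample rate, the stretched signal lasts about twice as long while the steady tone keeps its 250 Hz pitch, which is what makes slowed-down phone messages or songs easier to transcribe.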

24
Non-Speech Sound
  • What
  • boings, bangs, squeaks, clicks, etc.
  • commonly used in user interfaces to provide
    warnings and alarms

25
Non-Speech Sound
  • Why
  • fewer typing mistakes with key clicks
  • video games harder without sound

26
Non-Speech Sound?
  • Doh!

27
Non-Speech Sound
  • Dual mode displays
  • information presented along two different sensory
    channels
  • e.g., sight and sound
  • allows for redundant presentation
  • users use whichever channel they find easiest
  • allows for resolution of ambiguity in one mode
    through information in the other

28
Non-Speech Sound
  • Dual mode displays
  • humans can react faster to auditory than visual
    stimuli
  • sound is especially good for transient
    information that would otherwise clutter a visual
    display
  • sound is more language- and culture-independent
    than speech

29
Non-Speech Sound
  • Auditory icons
  • use natural sounds to represent different types
    of objects and actions in the user interface
  • e.g., breaking glass sound when deleting a file
  • direction and volume of sounds can indicate
    position and importance/size
  • e.g., SonicFinder (auditory icons for the
    Macintosh Finder)
  • not all actions have an intuitive sound
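
One common way to let direction and volume carry position, sketched under assumed conventions rather than as a description of any particular system: constant-power stereo panning (left gain cos θ, right gain sin θ, so perceived loudness stays steady as a sound moves) combined with inverse-distance attenuation of amplitude. The azimuth convention, the reference distance, and `spatialize` are hypothetical.

```python
import math

def spatialize(azimuth_deg, distance, ref_distance=1.0):
    """Return (left_gain, right_gain) for a mono sound source.

    azimuth_deg: -90 (hard left) .. 0 (centre) .. +90 (hard right).
    distance: beyond ref_distance, amplitude falls off as 1/distance.
    """
    az = max(-90.0, min(90.0, azimuth_deg))
    theta = (az + 90.0) / 180.0 * (math.pi / 2)   # map to 0 .. pi/2
    gain = ref_distance / max(distance, ref_distance)
    # Constant-power pan: left**2 + right**2 == gain**2 for every angle.
    return math.cos(theta) * gain, math.sin(theta) * gain

left, right = spatialize(-45, 2.0)   # quieter, and to the listener's left
```

An interface could, for example, map an object's on-screen position to the azimuth and its importance or size to the distance, so that larger or more urgent items sound closer.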

30
Non-Speech Sound
  • Earcons
  • synthetic sounds used to convey information
  • structured combinations of motives (short
    sequences of musical notes) to provide rich
    information
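
Earcons are built compositionally: one motive identifies a family (say, file operations), and added or modified motives distinguish its members. The sketch below represents a motive as a list of (MIDI note, beats) pairs, builds earcons by concatenating and transposing motives, and converts notes to frequencies with the standard equal-temperament formula f = 440 · 2^((m - 69)/12). The specific motives, their meanings, and the function names are hypothetical.

```python
# A motive is a short rhythmic/melodic pattern: (MIDI note number, beats).
FILE_MOTIVE = [(60, 0.5), (64, 0.5), (67, 1.0)]   # hypothetical "file" family
DELETE_MOTIVE = [(67, 0.5), (60, 1.0)]            # hypothetical "delete" action

def transpose(motive, semitones):
    """Shift every note; the rhythm is kept, so the family stays recognizable."""
    return [(note + semitones, beats) for note, beats in motive]

def earcon(*motives):
    """Concatenate motives into one earcon."""
    return [n for motive in motives for n in motive]

def midi_to_hz(note):
    """Equal temperament, with A4 (MIDI 69) = 440 Hz."""
    return 440.0 * 2 ** ((note - 69) / 12)

open_file = earcon(FILE_MOTIVE)
delete_file = earcon(FILE_MOTIVE, DELETE_MOTIVE)
open_folder = earcon(transpose(FILE_MOTIVE, 12))   # same rhythm, an octave up
print([round(midi_to_hz(n)) for n, _ in delete_file])  # → [262, 330, 392, 392, 262]
```

Because every member of a family starts with the same motive, a listener who has learned the "file" motive can recognize the object type before hearing which action was performed.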

31
Non-Speech Sound
  • Earcons

32
Handwriting Recognition
  • Handwriting
  • text and graphic input
  • complex strokes and spaces
  • natural

33
Handwriting Recognition
  • Problems
  • variation in handwriting between users
  • variation from day to day and over years for a
    single user
  • variation of letters depending on nearby letters

34
Handwriting Recognition
  • Currently
  • limited success with systems trained on a few
    users, with separated letters
  • generic, multi-user, cursive text recognition
    systems are not accurate enough to be
    commercially successful
  • current applications: e.g., pre-sorting of mail
    (a human has to handle the failures)

35
Handwriting Recognition
  • Newton
  • printing or cursive writing recognition
  • dictionary of words
  • contextual recognition
  • fine-tune spacing and letter shapes
  • fine-tune recognition speed
  • learn handwriting over time
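
A word dictionary enables one simple form of contextual correction, sketched here as an assumption rather than as Apple's actual method: snap the raw character string from the recognizer to the dictionary word with the smallest Levenshtein (edit) distance, and keep the raw string when nothing is close enough. The word list, the distance threshold, and the function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]

def correct(raw, dictionary, max_dist=2):
    """Return the closest dictionary word, or raw if none is close enough."""
    best = min(dictionary, key=lambda w: edit_distance(raw, w))
    return best if edit_distance(raw, best) <= max_dist else raw

WORDS = ["hello", "help", "world", "word", "newton"]   # hypothetical dictionary
print(correct("he1lo", WORDS))   # 'l' misread as '1' → hello
```

The threshold matters: without it, every scribble would be forced onto some dictionary word, which is one reason dictionary-based recognizers struggle with names and other out-of-vocabulary text.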

36
Handwriting Recognition
  • Newton

37
End
  • What did I learn today?
  • What questions do I still have?