Title: Audio Based Interaction


1
Audio Based Interaction
  • Zheng Wang

2
Audio Based Interaction
  • Audio Based Interaction
  • Audio Input
  • Audio Output (Audio Feedback)
  • Speech Audio Interaction
  • Non-Speech Audio Interaction

3
Audio Feedback
  • Combine graphical and auditory information
  • This is the most efficient and natural way possible:
    it matches human nature.
  • Sound gives us much of the information we need about our
    environment.
  • Advantage of using multimedia/multimodal interfaces:
    the senses enhance each other in various
    ways, adding synergies or further informational
    dimensions.
    (Blattner and Dannenberg, 1992)

4
Audio Feedback
  • Advantages: examples
  • We can concentrate our visual attention on one task,
    e.g. editing a document, while monitoring the state of
    other tasks on our machine by sound.
  • Driving: concentrate visual attention on the road
    while turning on the radio or changing the channel.

5
Audio Feedback
  • Research on the combination (I)
  • Visual search experiments: Brown, Newsome and
    Glinert (1989)
  • Aim: to reduce visual workload by using multiple
    sensory modalities.
  • Conclusions
  • 1. The auditory modality was more effective than
    the visual one.
  • 2. Humans can extract complex information from
    sound and then act upon it.

6
Audio Feedback
  • Research on the combination (II)
  • Locating visual targets by using 3D sound: Perrott,
    Sadralobadi, Saberi and Strybel (1991)
  • Conclusions
  • 1. The presence of spatial information from
    the auditory channel can reduce the time required
    to locate and identify a visual target.
  • 2. The benefit is particularly evident when a substantial
    shift in gaze is required in the presence of a
    cluttered visual field.

7
Audio Feedback
  • Research on the combination (III)
  • Sonically-enhanced scrollbar vs. standard visual
    one: Brewster, Wright and Edwards (1994)
  • Conclusions
  • 1. Sound significantly reduced the time taken by
    participants.
  • 2. It reduced the mental workload.
  • 3. Participants strongly preferred the
    sonically-enhanced scrollbar.

8
Audio Feedback
  • Research on the combination (IV)
  • Adding sound to graphical buttons: Brewster, Wright,
    Dix and Edwards (1994)
  • Problem: users can mis-hit graphical buttons and
    not notice.
  • This is difficult to solve with extra graphical
    feedback, because the user's attention shifts away.
  • Adding sound can solve this problem.
  • Conclusions
  • 1. Participants strongly preferred the buttons with
    sound.
  • 2. Sound reduced the time spent recovering from such
    mis-hit errors.
  • 3. Annoyance was not increased by adding
    sound.

9
Audio Feedback
  • Conclusion
  • Adding sound can be effective at improving
    usability.

10
Non-Speech Audio Feedback
  • One method for presenting information in sound:
    earcons

11
Non-Speech Audio Feedback
  • What are earcons?
  • Non-verbal audio messages that are used in
    the computer/user interface to provide
    information to the user about some computer
    object, operation or interaction.
  • Blattner, Sumikawa and Greenberg (1989), Sumikawa
    (1985) and Sumikawa, Blattner, Joy and Greenberg
    (1986)

12
Non-Speech Audio Feedback
  • What are earcons?
  • structured sequences of synthetic tones
  • can be used in different combinations
  • create complex audio messages
  • composed of motives (short, rhythmic sequences of
    pitches)
  • with variable intensity, timbre and register

13
Non-Speech Audio Feedback
  • One usage of earcons: menu hierarchies

14
  • Earcons as a Method of Providing Navigational
    Cues in a Menu Hierarchy
  • Stephen Brewster, Veli-Pekka Raty and Atte
    Kortekangas

15
Hierarchal Earcons
  • Represent menu hierarchies using hierarchal earcons
  • What are hierarchal earcons?
  • Earcons that represent information through complex
    manipulations of the parameters of sound, such as
    timbre, register, intensity, pitch and rhythm.
  • An example

16
Hierarchal Earcons
  • Example
  • The earcons form a tree.
  • Every earcon is a node.
  • Each node inherits the properties of the earcons above it.
  • Earcons at different levels vary different
    parameters (e.g. rhythm, pitch, timbre).

17
Hierarchal Earcons
  • The experiment
  • Aim
  • To use Hierarchal earcons to represent a bigger
    menu which has 25 nodes on four levels.

18
Hierarchal Earcons
19
Hierarchal Earcons
  • The experiment
  • Hypotheses
  • Participants should be able to recall the
    position of a node in the hierarchy from the
    information contained in an earcon,
  • even if they have not heard it before, by using
    the rules from which the earcons were constructed.

20
Hierarchal Earcons
  • The experiment
  • Participants
  • Twelve volunteer participants
  • All were familiar with computers and computer
    file systems

21
Hierarchal Earcons
  • The experiment
  • Sounds used
  • Level 1
  • a constant sound
  • Flute timbre
  • central spatial location
  • a pitch of D3 (261Hz)
  • neutral sounding

22
Hierarchal Earcons
  • The experiment
  • Sounds used
  • Level 2
  • Each family had a separate timbre, register and
    spatial location
  • Register: lowest on the left, highest on the right
  • Stereo position mirrored each family's position in the
    hierarchy

23
Hierarchal Earcons
  • The experiment
  • Sounds used in Level 1 and Level 2

24
Hierarchal Earcons
  • The experiment
  • In Level 1 and Level 2
  • Three parameters were used: timbre, stereo
    position, register.
  • Advantage: listeners who forget the instruments can
    still identify a family by its stereo position.

25
Hierarchal Earcons
  • The experiment
  • Level 3
  • rhythm used
  • repeated continuously
  • once every 2.5 s

26
Hierarchal Earcons
  • The experiment
  • Level 4
  • faster tempo used
  • same rhythm as level 3
  • repeated more frequently (once every 1 s)
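
The four levels described on the preceding slides can be read as a chain of parameter overrides, where each level keeps what it inherits and changes only its own parameters. The following is a minimal sketch of that reading, not the authors' implementation: the `Earcon` class, the "organ" timbre, the rhythm string and the numeric encodings are hypothetical placeholders; only the flute timbre, the central position and the 2.5 s and 1 s repeat intervals come from the slides.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Earcon:
    timbre: str
    stereo: float                       # -1.0 = left, 0.0 = centre, +1.0 = right
    register: int                       # hypothetical encoding: 0 = lowest
    rhythm: Optional[str] = None        # None means a constant sound
    repeat_every_s: Optional[float] = None


# Level 1: constant flute sound at a central position (from the slides).
root = Earcon(timbre="flute", stereo=0.0, register=0)

# Level 2: each family has its own timbre, register and stereo position.
# "organ" and the numeric values are illustrative placeholders.
family = replace(root, timbre="organ", stereo=-1.0, register=0)

# Level 3: adds a rhythm, repeated once every 2.5 s.
item = replace(family, rhythm="short-short-long", repeat_every_s=2.5)

# Level 4: same rhythm as level 3, faster tempo, repeated once every 1 s.
leaf = replace(item, repeat_every_s=1.0)
```

Because `replace` copies every field that is not overridden, `leaf` still carries the family's timbre and stereo position, which is the inheritance the slides describe.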

27
Hierarchal Earcons
  • The experiment
  • Training
  • 1. The experimenter showed each of the nodes of the
    hierarchy in turn and played the associated
    earcon, once only.
  • 2. Participants were given five minutes to learn
    the earcons by themselves, with no help.

28
Hierarchal Earcons
  • The experiment
  • Testing
  • 14 earcons were randomly selected.
  • 12 of the sounds were ones that participants had
    heard during the training.
  • The last two earcons were previously unheard (A and B).
  • An earcon was played; the participants then had to
    choose where it fitted into the hierarchy.

29
Hierarchal Earcons
  • The experiment
  • Testing
  • The node and level in hierarchy for each of the
    questions
  • This is the order that the
  • questions were presented to participants.

30
Hierarchal Earcons
  • The experiment
  • Results
  • Overall, 81.5% of earcons were correctly recalled.
  • The percentage of correct answers for each
    question:

31
Hierarchal Earcons
  • The experiment
  • Results
  • The three worst recalled earcons were Space Invaders,
    Paint and Business Letters, all in level 4.
  • Paint was recalled worst of all.
  • The reason is not known exactly.

32
Hierarchal Earcons
33
Hierarchal Earcons
  • The experiment
  • Results
  • New, previously unheard earcons (A and B)
  • Ten out of twelve participants recognised the
    earcon for A.
  • All the participants recognised the earcon for B.
  • Conclusion: the participants were able to use the
    rules to work out where an unheard earcon
    belonged.

34
Hierarchal Earcons
  • The experiment
  • Discussion --- advantages
  • Can be used where visual feedback is not possible,
    e.g. telephone-based interfaces.
  • Useful for visually disabled people.
  • A 27 node hierarchy can easily be represented.
  • After short training, a recall rate of 81.5% is
    achieved.
  • Users could easily learn the rules: listeners
    could recognize new earcons that had not been
    heard before (with 91.5% accuracy).
  • Earcons are an effective way of providing
    hierarchy information.

35
Hierarchal Earcons
  • The experiment
  • Discussion --- disadvantages
  • It is difficult to get the information from the bottom
    of the hierarchy: users must remember all of the earcon
    construction rules, which may be hard for older people.
  • Once the parameters have been used, there is
    nothing left to manipulate to create new
    levels, so the number of levels is limited.
  • How can this problem be solved? See the next paper.

36
  • Using Compound Earcons to Represent Hierarchies
  • Stephen Brewster, Adrian Capriotti
    and Cordelia Hall
  • University of Glasgow

37
Serial Compound Earcons
  • The Experiment
  • The same as in the previous experiment:
  • Hierarchy (menu)
  • Hypothesis
  • Training
  • Testing
  • So the two sets of results could be directly compared.

38
Serial Compound Earcons
  • The Experiment
  • Sounds used
  • single notes
  • 1 sec duration
  • sequentially
  • played at C3 (261Hz)
  • created on a Yamaha TG100 synthesizer

39
Serial Compound Earcons
  • The Experiment
  • Sounds used
  • 0: a sitar
  • 1: a piano
  • 2: an orchestral hit
  • 3: a bell
  • 4: a flute
  • dot (.): a marimba

40
Serial Compound Earcons
  • The Experiment
  • Sounds used
  • 5-9: the same instruments as 0-4, with the note played
    two octaves higher. E.g. 5 would be a note played
    at C1 (1046Hz) on the sitar.
  • Greater than 9: the two motives are added
    together. E.g. 10 would be a piano followed
    by a sitar.
  • Examples: 11 would be a piano followed by a piano;
    1.1 would be a piano, a marimba, then a piano.
41
Serial Compound Earcons
  • The Experiment
  • Method to represent menu hierarchy

42
Serial Compound Earcons
  • The Experiment
  • Results
  • Overall correct recall rate: 97%, vs. 81.5% in the
    previous experiment
  • Recognition rate of the new, unheard
    earcons: 97%
43
Serial Compound Earcons
  • The experiment
  • Discussion -- advantages
  • Compound earcons can provide effective navigation
    information in hierarchies.
  • They can represent arbitrarily sized hierarchies.
  • Unheard earcons could be recognised by the
    listeners with a high degree of accuracy.
  • The number of rules was kept to 7±2, to be as easy
    to remember as possible.

44
Serial Compound Earcons
  • The experiment
  • Discussion -- disadvantages
  • The user has to listen to the full earcon before
    he/she gets the location.
  • The longer the sound gets, the harder it is to
    recall: listeners remember the latter parts and
    forget the former.
  • What is the maximum size of hierarchy it can
    represent?
  • A long earcon may take a long time to play.
  • How can this be solved? See the next paper.

45
  • Parallel Earcons: Reducing the Length of
    Audio Messages
  • Stephen A. Brewster, Peter C. Wright and
    Alistair D. N. Edwards

46
Parallel Compound Earcons
  • What are parallel earcons?
  • Earcons whose sounds are played simultaneously.
  • They use the musical technique of counterpoint, in which
    individual instruments play separate musical
    lines which come together to make a musical whole.

47
Parallel Compound Earcons
  • Experiment
  • Aim
  • To test whether the recognition of parallel earcons was
    as accurate as that of serial earcons.
  • Participants
  • Twenty-four participants in total,
    split into two groups of twelve
  • Half of the participants in each group were
    musicians, i.e. they could play a musical instrument
    and read music.
  • All were undergraduate and postgraduate students from the
    University of York.

48
Parallel Compound Earcons
  • Experiment
  • Three phases
  • 1. participants learned earcons for objects
    (icons) like File, Folder, Application.
  • 2. participants learned earcons for actions
    (menus) like Open, Print, Copy.
  • 3. participants heard combined earcons made up of
    actions and objects.

49
Parallel Compound Earcons
50
Parallel Compound Earcons
  • Sound used
  • all lasted one second

51
Parallel Compound Earcons
  • Phase I Objects
  • Training
  • learn the names of all the icons
  • Listen to the sound
  • Each family of related items shared the same
    timbre. E.g. the paint program, the paint folder
    and paint files all had the same timbre
  • Items of the same type shared the same rhythm.
    e.g. all the programs had the same rhythm.
  • a unique sound to be created for each of the
    icons.

52
Parallel Compound Earcons
  • Phase I Objects
  • Testing
  • screen was cleared
  • the earcons were played back in a random order.
  • supply what information he/she could remember

53
Parallel Compound Earcons
  • Phase II Actions
  • Each menu had its own timbre
  • the items on each menu
  • were differentiated by rhythm, pitch or
    intensity.
  • Testing the same as Phase I.

54
Parallel Compound Earcons
  • Phases I and II were identical for both groups of
    participants.
  • Purpose: to make sure the participants would
    recognize the earcons when used in phase III.
  • Any participant who did not reach a 65%
    recognition rate was rejected.

55
Parallel Compound Earcons
  • Phase III
  • Serial case: an action sound was followed by an
    object one.
  • Parallel case: an object and an action were
    played together.
  • Nine out of a possible set of 81 earcons were
    presented; each was played once.
  • The participant was then instructed to give all the
    information he/she could about the family, type,
    menu and item of the stimulus heard.
  • The stimulus was then presented again; the
    participant could correct a previous answer or
    fill in any parts not recognised after the first
    presentation.
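
The difference between the serial and parallel cases can be illustrated with plain sample lists: serial playback concatenates the two sounds, parallel playback mixes them sample by sample. This is only a sketch under simplifying assumptions. Single sine tones stand in for the real earcons (which had timbre and rhythm), and the sample rate, frequencies and function names are hypothetical.

```python
import math

SAMPLE_RATE = 8000  # samples per second; illustrative value


def tone(freq_hz, duration_s=1.0):
    """A sine tone standing in for a one-second earcon."""
    n = int(SAMPLE_RATE * duration_s)
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]


def serial(action, obj):
    """Action earcon followed by object earcon."""
    return action + obj


def parallel(action, obj):
    """Both earcons played together; shorter one is padded with silence."""
    n = max(len(action), len(obj))
    a = action + [0.0] * (n - len(action))
    b = obj + [0.0] * (n - len(obj))
    # Averaging keeps the mixed signal inside the original amplitude range.
    return [(x + y) / 2 for x, y in zip(a, b)]


action_sound = tone(261.0)
object_sound = tone(392.0)
print(len(serial(action_sound, object_sound)) / SAMPLE_RATE)    # duration in seconds
print(len(parallel(action_sound, object_sound)) / SAMPLE_RATE)  # duration in seconds
```

With one-second components, the serial message lasts two seconds and the parallel one lasts one second, which is the length saving the paper's title refers to.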

56
Parallel Compound Earcons
  • Results and Discussion
  • Compound parallel earcons are as capable as
    compound serial earcons at
    communicating information.
  • They are an effective means of reducing the length of
    compound earcons without compromising
    recognition rates.

57
Parallel Compound Earcons
  • Results and Discussion
  • The more earcons were heard, the better the
    recognition rates became.
  • As there were no overall differences between the
    groups, this increase does not indicate that
    parallel earcons are more easily recognised.
  • Musicians were shown to be no better than
    non-musicians, so earcons will be usable by most users.
58
  • Earcons can significantly increase user
    efficiency during navigation of a visual menu
    system.
  • Do the same advantages exist when earcons
    are added to spoken menu systems, e.g.
    telephone-mediated database access?
  • See the next paper.

59
  • Combining Speech and Earcons to Assist Menu
    Navigation
  • Maria L.M. Vargas and Sven Anderson

60
Combining Speech and Earcons
  • Experiment: a sonified automobile interface
  • Why use an automobile?
  • Control of many existing automobile accessories
    (e.g., the radio) requires a driver to redirect
    her attention away from the road.
  • Direct visual feedback from such controls can
    divert visual attention from driving and should
    be minimized.
  • Drivers who have physical limitations could operate
    controls via a small set of buttons attached to the
    steering wheel.

61
Combining Speech and Earcons
  • Simulated automobile Interface
  • Implemented in Java 1.3 using the standard
    Application Programmers Interface (API)
  • None of the various graphical controls is active.
  • The state of various accessories is changed by
    using key presses to navigate an acoustically
    presented menu of subcategories corresponding to
    the lights.

62
Combining Speech and Earcons
  • Menu

63
Combining Speech and Earcons
  • How did users traverse the tree?
  • Up-arrow and down-arrow keys change level.
  • Right and left arrow keys traverse the current
    level.
  • Home key returns to the root node.
  • Enter key selects the current node.
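
The key bindings above can be sketched as a small tree navigator. This is an illustration rather than the study's software (which the slides say was written in Java): the class names are hypothetical, and it assumes that down-arrow moves to the first child, up-arrow moves to the parent, and left/right wrap around at the ends of a level. The four top-level families come from the slides; the two radio sub-items are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # compare nodes by identity, not by contents
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, name):
        child = Node(name, parent=self)
        self.children.append(child)
        return child


class MenuNavigator:
    def __init__(self, root):
        self.root = root
        self.current = root
        self.selected = None

    def press(self, key):
        node = self.current
        if key == "home":
            self.current = self.root
        elif key == "down" and node.children:
            self.current = node.children[0]       # assumed: go to first child
        elif key == "up" and node.parent is not None:
            self.current = node.parent
        elif key in ("left", "right") and node.parent is not None:
            siblings = node.parent.children
            step = 1 if key == "right" else -1
            # Assumed: traversal wraps around at either end of the level.
            self.current = siblings[(siblings.index(node) + step) % len(siblings)]
        elif key == "enter":
            self.selected = node
        return self.current


root = Node("root")
for family in ("lights", "windshield wipers", "ventilation", "radio"):
    root.add(family)
radio = root.children[3]
radio.add("volume")   # hypothetical sub-item
radio.add("station")  # hypothetical sub-item
```

In an interface like the one described, each call to `press` would be followed by playing the earcon and speech token for the node it returns.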

64
Combining Speech and Earcons
  • Sound
  • Speech
  • Prerecorded tokens collected from
    an adult male speaker of American English.
  • Earcons
  • Each top-level menu item has a particular simulated
    instrument (timbre) and motif (chord):
  • lights family - piano
  • windshield wipers family - chorus
  • ventilation family - bells
  • radio family - horns

65
Combining Speech and Earcons
  • Earcons
  • All items beneath a top-level entry inherit the
    instrument and notes of the top-level motif.
  • Within each node, earcons share timbre and motif
    and are therefore differentiated on the basis of
    melody and rhythm.
  • Earcons precede speech in feedback.
  • Earcon and speech playback can be interrupted by
    pressing any of the navigation keys.

66
Combining Speech and Earcons
  • Methods
  • Participants: thirty-six in total
  • Two groups: Speech Only Group and Earcon and
    Speech Group.
  • Training
  • 1. The simulated automobile interface and auditory
    menu were explained.
  • 2. Participants were permitted to become familiar
    with the software and the menu.
  • 3. Participants performed five practice tasks in 5 minutes.

67
Combining Speech and Earcons
  • Test
  • 43 tasks in total
  • Software logged all user keystrokes and times.

68
Combining Speech and Earcons
  • Results
  • Time
  • Speech Only Group: 11.5 seconds.
  • Earcon and Speech Group: 13.6 seconds.
  • Additional task time: 18% -- significant
  • Reason: auditory items (earcon plus speech)
    take longer to play than speech alone. On average,
    the earcons plus speech take approximately 90%
    longer than speech alone.

69
Combining Speech and Earcons
  • Keystroke Count
  • Speech Only Group: mean number of keystrokes is
    496.8
  • Earcon and Speech Group: mean number of
    keystrokes is 431.0
  • Efficiency Familiarity

70
Combining Speech and Earcons
  • Task Completion and Errors
  • Speech Only Group: average number of completed
    tasks 39.5; average number of errors 5.6
  • Earcon and Speech Group: average number of
    completed tasks 40.3; average number of errors 1.7
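
The group means reported on the results slides can be turned into relative differences with one line of arithmetic. The sketch below only recomputes percentages from the figures quoted in this presentation; the helper name `percent_change` is hypothetical, and no statistical test is implied.

```python
def percent_change(baseline, value):
    """Relative change of value with respect to baseline, in percent."""
    return (value - baseline) / baseline * 100.0


# (Speech Only Group, Earcon and Speech Group), as reported on the slides.
results = {
    "task time (s)": (11.5, 13.6),
    "keystrokes": (496.8, 431.0),
    "errors": (5.6, 1.7),
}

for name, (speech_only, earcon_speech) in results.items():
    print(f"{name}: {percent_change(speech_only, earcon_speech):+.1f}%")
# task time (s): +18.3%
# keystrokes: -13.2%
# errors: -69.6%
```

The +18.3% for task time matches the "18%" additional time quoted on the time slide; the keystroke and error reductions are derived here and are not stated as percentages in the presentation.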

71
Combining Speech and Earcons
  • Workload -- NASA Task Load Index
  • Temporal and mental demands
  • effort
  • No differences attained significance

72
Combining Speech and Earcons
  • Results
  • Earcons can be added to spoken menu systems.
  • They decrease the number of keystrokes and errors
  • without making appreciable changes to the
    overall perceived workload.
  • Disadvantage: longer task time.

73
Speech Interaction
74
  • TalkBack: a conversational answering machine
  • Vidya Lakshmipathy, Chris Schmandt and
    Natalia Marmasse
  • MIT Media Lab

75
Conversational Interface
  • What is TalkBack?
  • Is an answering machine.
  • Asynchronous message interface.
  • Allows the message receiver to "converse" with
    the messages left for them on the system.
  • Simplifies the process of answering

76
Conversational Interface
  • Technical Specifications
  • Client: Compaq iPaq
  • Server: Java 2 enabled
  • FTP Server, Voicemail system, SOX (SOund
    eXchange), SOLA Time Compression

[Diagram: voicemail, server, client, pause detection, FTP server, time compression, response]
77
Conversational Interface
  • How does it work?
  • Leave a speech message.

78
Conversational Interface
79
Conversational Interface
  • Segmentation -- Pause Finding Algorithm (I)
  • Pauses were found by comparing the average magnitude
    of non-overlapping 200 millisecond windows with a
    silence threshold.
  • The threshold was initialized to the average
    magnitude of the first 200 ms of the recording,
    which was assumed to be silence.
  • If the average magnitude of any 200 ms window was less
    than the silence threshold, the silence threshold
    was reset to that value.
  • If the average magnitude of any window was within
    12% of the silence threshold, it was considered
    silence.
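
The steps of this first pause-finding algorithm can be sketched as follows. This is one possible reading, not the TalkBack source code: it assumes a single online pass in which the threshold is lowered as quieter windows appear, interprets "within 12%" as "no more than 12% above the threshold", and uses hypothetical function names.

```python
def window_magnitudes(samples, sample_rate, window_ms=200):
    """Average absolute amplitude of each non-overlapping window."""
    size = sample_rate * window_ms // 1000
    return [sum(abs(s) for s in samples[i:i + size]) / size
            for i in range(0, len(samples) - size + 1, size)]


def find_silence(magnitudes, tolerance=0.12):
    """Return one flag per window: True where the window counts as silence."""
    if not magnitudes:
        return []
    # The first window is assumed to be silence and seeds the threshold.
    threshold = magnitudes[0]
    flags = []
    for m in magnitudes:
        if m < threshold:
            threshold = m  # a quieter window resets the threshold
        flags.append(m <= threshold * (1.0 + tolerance))
    return flags


# Seven windows: quiet, quiet, speech, speech, quieter, quiet, speech.
example = [0.010, 0.011, 0.200, 0.250, 0.008, 0.0085, 0.300]
print(find_silence(example))
# [True, True, False, False, True, True, False]
```

A run of consecutive `True` flags then marks a pause at which the message can be segmented.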

80
Conversational Interface
  • Segmentation -- Pause Finding Algorithm (II)
  • Find the silence threshold, i.e. the minimum
    window average magnitude.
  • Find the overall average magnitude of the entire
    recording.
  • The dynamic range is the difference between the
    overall average and the silence threshold.
  • Look at the average magnitudes of adjacent 200 ms
    non-overlapping windows.
  • If the difference between these window
    averages is greater than 10% of the dynamic
    range:
  •   if the average magnitudes are increasing,
      the second window is the beginning of speech;
  •   if the average magnitudes are decreasing,
      the second window is the end of speech.
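
The second algorithm compares neighbouring windows against a fraction of the dynamic range. A minimal sketch, assuming the input is already a list of per-window average magnitudes and approximating the overall average of the recording by the mean of those window averages; the function name and the event labels are hypothetical.

```python
def speech_boundaries(magnitudes, fraction=0.10):
    """Return (window_index, event) pairs for detected speech starts and ends."""
    if len(magnitudes) < 2:
        return []
    silence_threshold = min(magnitudes)
    overall_average = sum(magnitudes) / len(magnitudes)
    dynamic_range = overall_average - silence_threshold
    events = []
    for i in range(1, len(magnitudes)):
        diff = magnitudes[i] - magnitudes[i - 1]
        if abs(diff) > fraction * dynamic_range:
            # The second window of the pair carries the boundary.
            events.append((i, "speech_start" if diff > 0 else "speech_end"))
    return events


example = [0.01, 0.01, 0.30, 0.30, 0.01, 0.01]
print(speech_boundaries(example))
# [(2, 'speech_start'), (4, 'speech_end')]
```

Because the rule fires on every large enough jump, a gradual rise spread over several windows would yield several consecutive start events; a caller would need to merge them, which the slides do not describe.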

81
Conversational Interface
  • Receive and reply to the message

82
Conversational Interface
83
Conversational Interface
  • The client is an iPaq (hidden) placed in a
    picture frame connected to the local area
    network.
  • The listener does not have to respond to every
    segment: the system detects the silence and plays the
    next section.
  • The recipient can also interrupt and inject a
    response at any point during
    playback.

84
Conversational Interface
  • Receive the response

85
Conversational Interface
86
Conversational Interface
  • The caller receives a small portion of the original
    message, time-compressed by half, with the response.
  • Responses can be delivered via phone or via the
    Internet as files.

87
Conversational Interface
  • It is good because
  • It resembles face-to-face conversation, so it requires
    little training.
  • It is more convenient and efficient.
  • It suits populations who desire simple, easy to use
    interfaces:
  • Extreme age groups: very old and very young
  • People with memory constraints

88
  • References
  • Visual Search Experiments. Brown, Newsome and
    Glinert (1989)
  • Locate Visual Targets by Using 3D Sound. Perrott,
    Sadralobadi, Saberi and Strybel (1991)
  • Sonically-Enhanced Scrollbar vs. Standard Visual
    one. Brewster, Wright and Edwards (1994)
  • Add sound to graphical buttons. Brewster,
    Wright, Dix and Edwards (1994)
  • Earcons as a Method of Providing Navigational
    Cues in a Menu Hierarchy. Stephen Brewster,
    Veli-Pekka Raty and Atte Kortekangas
  • Using Compound Earcons to Represent
    Hierarchies. Stephen Brewster, Adrian Capriotti
    and Cordelia Hall
  • Parallel Earcons: Reducing the Length of Audio
    Messages. Stephen A. Brewster, Peter C. Wright
    and Alistair D. N. Edwards
  • Combining Speech and Earcons to Assist Menu
    Navigation. Maria L.M. Vargas and Sven Anderson
  • TalkBack: a conversational answering machine.
    Vidya Lakshmipathy, Chris Schmandt and Natalia
    Marmasse, MIT Media Lab

89
  • Questions and Discussion