Title: Audio Based Interaction
1Audio Based Interaction
2Audio Based Interaction
- Audio Based Interaction
- Audio Input
- Audio Output (Audio Feedback)
- Speech Audio Interaction
- Non-Speech Audio Interaction
3Audio Feedback
- Combine graphical and auditory information in the most efficient and natural way possible.
- Human nature: hearing gives us much of the information we need about our environment.
- Advantage of using multimedia/multimodal interfaces: the senses enhance each other in various ways, adding synergies or further informational dimensions (Blattner and Dannenberg, 1992).
4Audio Feedback
- Advantages: examples
- We can concentrate our visual attention on one task, e.g. editing a document, while monitoring the state of other tasks on our machine by ear.
- Driving: we concentrate visual attention on the road while turning on the radio or changing the channel.
5Audio Feedback
- Research on the combination (I)
- Visual search experiments: Brown, Newsome and Glinert (1989)
- Aim: to reduce visual workload by using multiple sensory modalities.
- Conclusions
- 1. The auditory modality was more effective than the visual one.
- 2. Humans can extract complex information from sound and then act upon it.
6Audio Feedback
- Research on the combination (II)
- Locating visual targets using 3D sound: Perrott, Sadralobadi, Saberi and Strybel (1991)
- Conclusions
- 1. The presence of spatial information from the auditory channel can reduce the time required to locate and identify a visual target.
- 2. This is particularly evident when a substantial shift in gaze is required in the presence of a cluttered visual field.
7Audio Feedback
- Research on the combination (III)
- Sonically-enhanced scrollbar vs. standard visual one: Brewster, Wright and Edwards (1994)
- Conclusions
- 1. Significantly reduced the time taken by participants.
- 2. Reduced the mental workload.
- 3. Participants strongly preferred the sonically-enhanced scrollbar.
8Audio Feedback
- Research on the combination (IV)
- Adding sound to graphical buttons: Brewster, Wright, Dix and Edwards (1994)
- Problem: users can mis-hit graphical buttons and not notice.
- This is difficult to solve with extra graphical feedback, because the user's attention shifts away from the button; adding sound can solve this problem.
- Conclusions
- 1. Participants strongly preferred the buttons with sound.
- 2. Reduced the time spent recovering from such mis-hit errors.
- 3. Annoyance was not increased by adding sound.
9Audio Feedback
- Conclusion
- Adding sound can be effective at improving usability.
10Non-Speech Audio Feedback
- One method for presenting information in sound: earcons
11Non-Speech Audio Feedback
- What are earcons?
- Non-verbal audio messages that are used in the computer/user interface to provide information to the user about some computer object, operation or interaction.
- Blattner, Sumikawa and Greenberg (1989); Sumikawa (1985); Sumikawa, Blattner, Joy and Greenberg (1986)
12Non-Speech Audio Feedback
- What are earcons?
- Structured sequences of synthetic tones
- Can be used in different combinations
- To create complex audio messages
- Composed of motives (short, rhythmic sequences of pitches) with variable intensity, timbre and register
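As a rough illustration of the definition above, a motive can be modelled as a short sequence of notes plus the three variable attributes. This is only a sketch: the class name `Motive`, the use of MIDI note numbers and the default values are assumptions made for the example, not part of the cited work.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class Motive:
    """A short, rhythmic sequence of pitches (hypothetical representation)."""
    # Each note is (MIDI note number, duration in seconds); both are assumptions.
    notes: Tuple[Tuple[int, float], ...]
    timbre: str = "piano"
    register: int = 0        # octave shift applied to every note
    intensity: float = 0.8   # 0.0 (silent) to 1.0 (loudest)

    def pitches(self) -> List[int]:
        """Note numbers after applying the register (12 semitones per octave)."""
        return [pitch + 12 * self.register for pitch, _ in self.notes]

    def duration(self) -> float:
        """Total playing time of the motive in seconds."""
        return sum(length for _, length in self.notes)


# The same rhythm and pitches, varied only in timbre and register.
base = Motive(notes=((60, 0.25), (64, 0.25), (67, 0.5)))
variant = replace(base, timbre="flute", register=1)
```

Varying one attribute while keeping the others fixed is what lets a family of related earcons sound related.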
13Non-Speech Audio Feedback
- One usage of earcons: menu hierarchies
14- Earcons as a Method of Providing Navigational Cues in a Menu Hierarchy
- Stephen Brewster, Veli-Pekka Raty and Atte Kortekangas
15Hierarchal Earcons
- Represent menu hierarchies using hierarchal earcons
- What are hierarchal earcons?
- Earcons that represent information through complex manipulations of the parameters of sound, such as timbre, register, intensity, pitch and rhythm.
- An example
16Hierarchal Earcons
- Example
- Tree
- Every earcon is a node
- Each node inherits the properties of the earcons above it
- Earcons at different levels vary different parameters (e.g. rhythm, pitch, timbre)
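The inheritance idea in the tree example can be sketched as follows. The class `EarconNode`, the node names and the parameter values are hypothetical, chosen only to show a child earcon taking on everything above it and adding or overriding one parameter of its own.

```python
class EarconNode:
    """One node of an earcon tree (illustrative sketch, not the authors' code)."""

    def __init__(self, name, parent=None, **own_params):
        self.name = name
        self.parent = parent
        self.own_params = own_params  # parameters set at this level only
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def parameters(self):
        """Sound parameters of this earcon: inherited ones, then this node's own."""
        inherited = self.parent.parameters() if self.parent else {}
        return {**inherited, **self.own_params}


# Hypothetical three-level fragment: each level changes a different parameter.
root = EarconNode("root", timbre="flute")
family = EarconNode("games", parent=root, timbre="piano", register="high")
item = EarconNode("arcade", parent=family, rhythm="short-short-long")

print(item.parameters())
```

Because a node's sound is fully determined by the path from the root, a listener who knows the rules can in principle place an earcon never heard before.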
17Hierarchal Earcons
- The experiment
- Aim
- To use hierarchal earcons to represent a larger menu hierarchy, with 27 nodes on four levels.
18Hierarchal Earcons
19Hierarchal Earcons
- The experiment
- Hypotheses
- Participants should be able to recall the position of a node in the hierarchy from the information contained in its earcon.
- They should be able to do so even if they have not heard the earcon before, by using the rules from which the earcons were constructed.
20Hierarchal Earcons
- The experiment
- Participants
- Twelve volunteer participants
- All were familiar with computers and computer
file systems
21Hierarchal Earcons
- The experiment
- Sounds used
- Level 1
- a constant sound
- Flute timbre
- central spatial location
- a pitch of D3 (261Hz)
- neutral sounding
22Hierarchal Earcons
- The experiment
- Sounds used
- Level 2
- Each family had a separate timbre, register and spatial location
- Register: lowest on the left, highest on the right
- Stereo position mirrored the family's position in the hierarchy
23Hierarchal Earcons
- The experiment
- Sounds used in Level 1 and Level 2
24Hierarchal Earcons
- The experiment
- In Level 1 and Level 2
- Three parameters were used: timbre, stereo position and register.
- Advantage: a listener who forgets the instruments can still identify the family by its stereo position.
25Hierarchal Earcons
- The experiment
- Level 3
- Rhythm used
- Repeated continuously, once every 2.5 s
26Hierarchal Earcons
- The experiment
- Level 4
- faster tempo used
- same rhythm as level 3
- repeated more frequently (once every 1 s)
27Hierarchal Earcons
- The experiment
- Training
- 1. The experimenter showed each node of the hierarchy in turn and played the associated earcon once only.
- 2. Participants were then given five minutes to learn the earcons by themselves, with no help.
28Hierarchal Earcons
- The experiment
- Testing
- 14 earcons were randomly selected
- 12 of the sounds were ones that participants had heard during the training
- The last two earcons (A and B) were previously unheard
- Each earcon was played, and the participant then had to choose where it fitted into the hierarchy
29Hierarchal Earcons
- The experiment
- Testing
- The node and level in the hierarchy for each of the questions, in the order in which the questions were presented to participants (table not reproduced).
30Hierarchal Earcons
- The experiment
- Results
- Overall, 81.5% of earcons were correctly recalled
- The percentage of correct answers for each question (chart not reproduced).
31Hierarchal Earcons
- The experiment
- Results
- The three worst recalled earcons were Space Invaders, Paint and Business Letters, all on level 4.
- Paint was recalled worst of all.
- It is not known exactly why.
32Hierarchal Earcons
33Hierarchal Earcons
- The experiment
- Results
- New, previously unheard earcons (A and B)
- Ten out of twelve participants recognised the earcon for A.
- All the participants recognised the earcon for B.
- Conclusion: the participants were able to use the rules to work out where an unheard earcon belonged.
34Hierarchal Earcons
- The experiment
- Discussion: advantages
- Can be used where visual feedback is not possible, e.g. telephone-based interfaces.
- Can be used by visually disabled people.
- A 27-node hierarchy can easily be represented.
- After short training, a recall rate of 81.5% was achieved.
- Users could easily learn the rules: listeners could recognize new earcons that had not been heard before (with 91.5% accuracy).
- Earcons are an effective way of providing hierarchy information.
35Hierarchal Earcons
- The experiment
- Discussion: disadvantages
- It is difficult to get the information from the bottom of the hierarchy, because the listener must remember all of the earcon construction rules. This may be a particular problem for older people.
- Once the parameters have been used there is nothing left to manipulate to create new levels, so the number of levels is limited.
- How can this problem be solved? See the next paper.
36- Using Compound Earcons to Represent Hierarchies
- Stephen Brewster, Adrian Capriotti and Cordelia Hall
- University of Glasgow
37Serial Compound Earcons
- The Experiment
- The same as in the previous experiment:
- Hierarchy menu
- Hypothesis
- Training
- Testing
- So the two sets of results could be directly compared
38Serial Compound Earcons
- The Experiment
- Sounds used
- single notes
- 1 sec duration
- sequentially
- played at C3 (261Hz)
- created on a Yamaha TG100 synthesizer
39Serial Compound Earcons
- The Experiment
- Sounds used
- 0: a sitar
- 1: a piano
- 2: an orchestral hit
- 3: a bell
- 4: a flute
- dot (.): a marimba
40Serial Compound Earcons
- The Experiment
- Sounds used
- 5-9: the same instruments as 0-4, with the note played two octaves higher. E.g. 5 would be a note played at C1 (1046Hz) on the sitar.
- Greater than 9: the motives for the individual digits are played one after another. E.g. 10 would be a piano followed by a sitar.
- Examples: 11 would be a piano, a piano
- 1.1 would be a piano, a marimba, a piano
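The numbering scheme above can be sketched as a small lookup. This is a minimal sketch under stated assumptions: the function name `serial_compound_earcon` is hypothetical, digits 5-9 are assumed to reuse the instruments of 0-4 in the same order (the slides only give 5 = sitar), and the marimba "dot" is assumed to sound at the lower pitch, which the slides do not specify.

```python
from typing import List, Tuple

INSTRUMENTS = ["sitar", "piano", "orchestral hit", "bell", "flute"]  # digits 0-4
LOW_HZ = 261     # pitch used for digits 0-4 (C3 in the slides)
HIGH_HZ = 1046   # two octaves higher, used for digits 5-9 (C1 in the slides)
NOTE_SECONDS = 1.0


def serial_compound_earcon(label: str) -> List[Tuple[str, int]]:
    """Translate a node label such as '1.1' into (instrument, frequency) notes."""
    notes = []
    for ch in label:
        if ch == ".":
            notes.append(("marimba", LOW_HZ))  # pitch of the dot is an assumption
        elif ch in "0123456789":
            digit = int(ch)
            pitch = LOW_HZ if digit < 5 else HIGH_HZ
            notes.append((INSTRUMENTS[digit % 5], pitch))
        else:
            raise ValueError(f"unexpected character in node label: {ch!r}")
    return notes


def playing_time(label: str) -> float:
    """Notes are played one after another, so the length grows with the label."""
    return len(serial_compound_earcon(label)) * NOTE_SECONDS


print(serial_compound_earcon("1.1"))
```

The playing time grows by one second per character of the label, so deep nodes produce long earcons.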
41Serial Compound Earcons
- The Experiment
- Method to represent menu hierarchy
42Serial Compound Earcons
- The Experiment
- Results
- Overall correct recall rate: 97%, versus 81.5% in the previous experiment
- Recognition rate of the new, unheard earcons: 97%
43Serial Compound Earcons
- The experiment
- Discussion: advantages
- Compound earcons can provide effective navigation information in hierarchies.
- They can represent arbitrarily sized hierarchies.
- Unheard earcons could be recognised by the listeners with a high degree of accuracy.
- Number of rules: 7±2, so that they are as easy to remember as possible
44Serial Compound Earcons
- The experiment
- Discussion: disadvantages
- The user has to listen to the full earcon before he/she gets the location.
- The longer the sound gets, the harder it is to recall: listeners remember the latter parts and forget the former.
- What is the maximum size of hierarchy it can represent?
- A long earcon may take a long time to play.
- How can this be solved? See the next paper.
45- Parallel Earcons: Reducing the Length of Audio Messages
- Stephen A. Brewster, Peter C. Wright and Alistair D. N. Edwards
46Parallel Compound Earcons
- What are parallel earcons?
- Earcons whose sounds are played simultaneously.
- They use the musical technique of counterpoint, in which individual instruments play separate musical lines which come together to make a musical whole.
47Parallel Compound Earcons
- Experiment
- Aim
- To test whether the recognition of parallel earcons was as accurate as that of serial earcons.
- Participants
- Twenty-four participants in total
- Split into two groups of twelve
- Half of the participants in each group were musicians, who could play a musical instrument and read music.
- All were undergraduate and postgraduate students from the University of York.
48Parallel Compound Earcons
- Experiment
- Three phases
- 1. Participants learned earcons for objects (icons) like File, Folder, Application.
- 2. Participants learned earcons for actions (menus) like Open, Print, Copy.
- 3. Participants heard combined earcons made up of actions and objects.
49Parallel Compound Earcons
50Parallel Compound Earcons
- Sound used
- all lasted one second
51Parallel Compound Earcons
- Phase I: Objects
- Training
- Learn the names of all the icons
- Listen to the sound of each
- Each family of related items shared the same timbre, e.g. the paint program, the paint folder and paint files all had the same timbre.
- Items of the same type shared the same rhythm, e.g. all the programs had the same rhythm.
- This allowed a unique sound to be created for each of the icons.
52Parallel Compound Earcons
- Phase I: Objects
- Testing
- The screen was cleared
- The earcons were played back in a random order.
- The participant had to supply what information he/she could remember
53Parallel Compound Earcons
- Phase II: Actions
- Each menu had its own timbre
- The items on each menu were differentiated by rhythm, pitch or intensity.
- Testing was the same as in Phase I.
54Parallel Compound Earcons
- Phases I and II were identical for both groups of participants.
- Purpose: to make sure the participants would recognize the earcons when used in Phase III.
- Any participant who did not reach a 65% recognition rate was rejected.
55Parallel Compound Earcons
- Phase III
- Serial case: an action sound was followed by an object one.
- Parallel case: an object and an action were played together.
- Nine out of a possible set of 81 earcons were presented; each was played once.
- The participant was then instructed to give all the information he/she could about the family, type, menu and item of the stimulus heard.
- The stimulus was then presented again, and the participant could correct a previous answer or fill in any parts not recognised after the first presentation.
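The difference between the two Phase III conditions is mainly one of length, which a short sketch makes concrete. The function names, the placeholder labels and the split of the 81 combinations into nine actions times nine objects are assumptions for illustration; the slides state only the total of 81, the sample of nine and the one-second length of each earcon.

```python
import random
from itertools import product

EARCON_SECONDS = 1.0  # every component earcon lasted one second


def serial_length(parts: int) -> float:
    """Serial compound earcon: components are played one after another."""
    return parts * EARCON_SECONDS


def parallel_length(parts: int) -> float:
    """Parallel compound earcon: components are played at the same time."""
    return EARCON_SECONDS if parts > 0 else 0.0


# Hypothetical labels; 9 x 9 is one way to arrive at 81 action/object pairs.
actions = [f"action_{i}" for i in range(1, 10)]
objects = [f"object_{i}" for i in range(1, 10)]
all_pairs = list(product(actions, objects))

rng = random.Random(0)            # fixed seed so the sketch is repeatable
stimuli = rng.sample(all_pairs, 9)

print(len(all_pairs), serial_length(2), parallel_length(2))
```

For an action plus an object, the parallel form takes half the time of the serial form.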
56Parallel Compound Earcons
- Results and Discussion
- Compound parallel earcons are as capable as compound serial earcons at communicating information.
- They are an effective means of reducing the length of compound earcons without compromising recognition rates.
57Parallel Compound Earcons
- Results and Discussion
- The more often the earcons were heard, the better the recognition rates became.
- As there were no overall differences between the groups, this increase does not indicate that parallel earcons are more easily recognised.
- Musicians were shown to be no better than non-musicians, so earcons will be usable by most users.
58- Earcons can significantly increase user efficiency during navigation of a visual menu system.
- Do the same advantages exist when earcons are added to spoken menu systems, e.g. telephone-mediated database access?
- See the next paper.
59- Combining Speech and Earcons to Assist Menu Navigation
- Maria L.M. Vargas and Sven Anderson
60Combining Speech Earcons
- Experiment: a sonified automobile interface
- Why use an automobile?
- Control of many existing automobile accessories (e.g., the radio) requires a driver to redirect her attention away from the road.
- Direct visual feedback from such controls can divert visual attention from driving and should be minimized.
- Drivers who have physical limitations could operate the controls via a small set of buttons attached to the steering wheel.
61Combining Speech Earcons
- Simulated automobile interface
- Implemented in Java 1.3 using the standard Application Programmers Interface (API)
- None of the various graphical controls is active.
- The state of various accessories is changed by using key presses to navigate an acoustically presented menu of subcategories corresponding to the lights.
62Combining Speech Earcons
63Combining Speech Earcons
- How did users traverse the tree?
- Up-arrow and down-arrow keys change level.
- Right and left arrow keys traverse the current level.
- Home key returns to the root node.
- Enter key selects the current node.
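The key bindings above amount to a small state machine over the menu tree. The sketch below is illustrative only: the class names, the direction of the up and down keys (down enters the first child, up returns to the parent), the absence of wrap-around at the ends of a level and the second-level items are all assumptions, since the slides do not specify them.

```python
class MenuNode:
    def __init__(self, label, children=()):
        self.label = label
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self


class Navigator:
    """Tracks the current node as navigation keys are pressed."""

    def __init__(self, root):
        self.root = root
        self.current = root
        self.selected = None

    def press(self, key):
        node = self.current
        if key == "home":
            self.current = self.root
        elif key == "down" and node.children:
            self.current = node.children[0]       # assumed: enter first child
        elif key == "up" and node.parent is not None:
            self.current = node.parent
        elif key in ("left", "right") and node.parent is not None:
            siblings = node.parent.children
            index = siblings.index(node) + (1 if key == "right" else -1)
            if 0 <= index < len(siblings):        # assumed: no wrap-around
                self.current = siblings[index]
        elif key == "enter":
            self.selected = node
        return self.current.label                 # label to speak or sonify


# Top-level families follow the slides; the items under "lights" are made up.
menu = MenuNode("menu", [
    MenuNode("lights", [MenuNode("headlights"), MenuNode("interior light")]),
    MenuNode("windshield wipers"),
    MenuNode("ventilation"),
    MenuNode("radio"),
])
nav = Navigator(menu)
```

In an auditory interface, the label returned by `press` is what would be played back as an earcon, speech, or both.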
64Combining Speech Earcons
- Sound
- Speech
- Prerecorded tokens collected from an adult male speaker of American English.
- Earcons
- Each top-level menu item has a particular simulated instrument (timbre) and motif (chord):
- Lights family: piano
- Windshield wipers family: chorus
- Ventilation family: bells
- Radio family: horns
65Combining Speech Earcons
- Earcons
- All items beneath a top-level entry inherit the instrument and notes of the top-level motif.
- Within each node, earcons share timbre and motif and are therefore differentiated on the basis of melody and rhythm.
- The earcon precedes the speech in feedback.
- Earcon and speech playback can be interrupted by pressing any of the navigation keys.
66Combining Speech Earcons
- Methods
- Participants: thirty-six in total
- Two groups: Speech Only Group and Earcon and Speech Group.
- Training
- 1. The simulated automobile interface and auditory menu were explained.
- 2. Participants were permitted to become familiar with the software and the menu.
- 3. Participants performed five practice tasks in 5 minutes.
67Combining Speech Earcons
- Test
- 43 tasks in total
- Software logged all user keystrokes and times.
68Combining Speech Earcons
- Results
- Time
- Speech Only Group: 11.5 seconds.
- Earcon and Speech Group: 13.6 seconds.
- Additional task time: 18%, a significant difference.
- Reason: auditory items (earcon plus speech) take longer to play than speech alone. On average, the earcon plus speech takes approximately 90% longer than speech alone.
69Combining Speech Earcons
- Keystroke Count
- Speech Only Group: mean number of keystrokes is 496.8
- Earcon and Speech Group: mean number of keystrokes is 431.0
- Efficiency Familiarity
70Combining Speech Earcons
- Task Completion and Errors
- Speech Only Group: average number of completed tasks 39.5; average number of errors 5.6
- Earcon and Speech Group: average number of completed tasks 40.3; average number of errors 1.7
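The group means reported on the results slides above can be put on a common scale by expressing each one as a percentage change relative to the Speech Only Group. This is a simple arithmetic sketch using only the figures quoted in the slides; the dictionary keys and the helper `percent_change` are names chosen for the example.

```python
def percent_change(baseline: float, value: float) -> float:
    """Change of `value` relative to `baseline`, in percent."""
    return (value - baseline) / baseline * 100.0


# Group means as quoted in the slides.
speech_only = {"task_time_s": 11.5, "keystrokes": 496.8, "completed": 39.5, "errors": 5.6}
earcon_and_speech = {"task_time_s": 13.6, "keystrokes": 431.0, "completed": 40.3, "errors": 1.7}

changes = {
    measure: round(percent_change(speech_only[measure], earcon_and_speech[measure]), 1)
    for measure in speech_only
}

for measure, change in changes.items():
    print(f"{measure}: {change:+.1f}%")
```

The task-time figure comes out at roughly 18%, matching the additional task time quoted above, while keystrokes and errors move in the opposite direction.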
71Combining Speech Earcons
- Workload -- NASA Task Load Index
- Temporal and mental demands
- effort
- No differences attained significance
72Combining Speech Earcons
- Results
- Earcons can be added to spoken menu systems
- They decrease the number of keystrokes and errors without making appreciable changes to the overall perceived workload.
- Disadvantage: longer task time.
73Speech Interaction
74- TalkBack: a conversational answering machine
- Vidya Lakshmipathy, Chris Schmandt and Natalia Marmasse
- MIT Media Lab
75Conversational Interface
- What is TalkBack?
- An answering machine.
- An asynchronous message interface.
- Allows the message receiver to "converse" with the messages left for them on the system.
- Simplifies the process of answering
76Conversational Interface
- Technical Specifications
- Client: Compaq iPAQ
- Server: Java 2 enabled
- FTP server, voicemail system, SoX (SOund eXchange), SOLA time compression
- Diagram (not reproduced), with components: Voicemail, Server, Client, Pause detection, FTP Server, Time compression, Response
77Conversational Interface
- How does it work?
- Leave a speech message.
78Conversational Interface
79Conversational Interface
- Segmentation: Pause Finding Algorithm (I)
- Pauses were found by comparing the average magnitude of non-overlapping 200 millisecond windows with a silence threshold.
- The threshold was initialized to the average magnitude of the first 200 ms of the recording, which was assumed to be silence.
- If the average magnitude of any 200 ms window was less than the silence threshold, the silence threshold was reset to that value.
- If the average magnitude of any window was within 12% of the silence threshold, it was considered silence.
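The steps of Pause Finding Algorithm (I) can be sketched as below. This is an interpretation, not the TalkBack implementation: the function names are hypothetical, "within 12%" is read as "no more than 12% above the current threshold", and the windows are classified in a single pass, so an early window is judged against the threshold as it stood at that moment.

```python
from typing import List, Sequence


def window_averages(samples: Sequence[float], rate: int, window_ms: int = 200) -> List[float]:
    """Average absolute magnitude of consecutive non-overlapping windows."""
    size = int(rate * window_ms / 1000)
    return [
        sum(abs(s) for s in samples[start:start + size]) / size
        for start in range(0, len(samples) - size + 1, size)
    ]


def silent_windows(samples: Sequence[float], rate: int, tolerance: float = 0.12) -> List[bool]:
    """True for each 200 ms window that is classified as silence."""
    averages = window_averages(samples, rate)
    if not averages:
        return []
    threshold = averages[0]          # first window is assumed to be silence
    flags = []
    for average in averages:
        if average < threshold:      # quieter window found: lower the threshold
            threshold = average
        flags.append(average <= threshold * (1 + tolerance))
    return flags


# Synthetic signal at 1000 samples/s: quiet, loud, quieter, back to "quiet".
signal = [2] * 200 + [100] * 200 + [1] * 200 + [2] * 200
print(silent_windows(signal, rate=1000))
```

In the synthetic example the last window is no longer counted as silence, because the third window lowered the threshold in the meantime.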
80Conversational Interface
- Segmentation: Pause Finding Algorithm (II)
- Find the silence threshold, i.e. the minimum window average magnitude.
- Find the overall average magnitude of the entire recording.
- The dynamic range is the difference between the overall average and the silence threshold.
- Look at the average magnitudes of adjacent 200 ms non-overlapping windows.
- If the difference between these window averages is greater than 10% of the dynamic range:
- if the average magnitudes are increasing, the second window is the beginning of speech;
- if the average magnitudes are decreasing, the second window is the end of speech.
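Pause Finding Algorithm (II) can be sketched in the same spirit. The function name and the event labels are hypothetical. The sketch takes the per-window average magnitudes as input and assumes the overall average can be computed as the mean of those window averages, which holds when the equal-sized windows cover the whole recording.

```python
from typing import List, Sequence, Tuple


def speech_boundaries(averages: Sequence[float], fraction: float = 0.10) -> List[Tuple[int, str]]:
    """Return (window index, event) pairs for detected starts and ends of speech."""
    if not averages:
        return []
    silence = min(averages)                      # silence threshold
    overall = sum(averages) / len(averages)      # overall average magnitude
    dynamic_range = overall - silence
    events = []
    for i in range(1, len(averages)):
        difference = averages[i] - averages[i - 1]
        if abs(difference) > fraction * dynamic_range:
            # The second window of the pair marks the boundary.
            kind = "speech_start" if difference > 0 else "speech_end"
            events.append((i, kind))
    return events


# Two quiet windows, two loud ones, then quiet again.
print(speech_boundaries([1, 1, 50, 50, 1]))
```

A recording with no variation at all has a dynamic range of zero and produces no events, since no difference between adjacent windows exceeds it.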
81Conversational Interface
- Receive and reply to the message
82Conversational Interface
83Conversational Interface
- The client is an iPAQ (hidden) placed in a picture frame and connected to the local area network.
- The listener does not have to respond to every segment: the system detects the silence and plays the next section.
- The recipient can also interrupt and inject a response at any point during playback.
84Conversational Interface
85Conversational Interface
86Conversational Interface
- The original caller receives a small portion of the original message, time-compressed by half.
- Responses can be delivered via phone or via the Internet as files.
87Conversational Interface
- It is good because
- It resembles face-to-face conversation, so it requires little training.
- It is more convenient and efficient.
- It suits populations who desire simple, easy-to-use interfaces:
- Extreme age groups: the very old and the very young
- People with memory constraints
88- References
- Visual Search Experiments. Brown, Newsome and Glinert (1989)
- Locate Visual Targets by Using 3D Sound. Perrott, Sadralobadi, Saberi and Strybel (1991)
- Sonically-Enhanced Scrollbar vs. Standard Visual One. Brewster, Wright and Edwards (1994)
- Add Sound to Graphical Buttons. Brewster, Wright, Dix and Edwards (1994)
- Earcons as a Method of Providing Navigational Cues in a Menu Hierarchy. Stephen Brewster, Veli-Pekka Raty and Atte Kortekangas
- Using Compound Earcons to Represent Hierarchies. Stephen Brewster, Adrian Capriotti and Cordelia Hall
- Parallel Earcons: Reducing the Length of Audio Messages. Stephen A. Brewster, Peter C. Wright and Alistair D. N. Edwards
- Combining Speech and Earcons to Assist Menu Navigation. Maria L.M. Vargas and Sven Anderson
- TalkBack: A Conversational Answering Machine. Vidya Lakshmipathy, Chris Schmandt and Natalia Marmasse, MIT Media Lab