Empirical Evaluation - PowerPoint PPT Presentation
1
Empirical Evaluation
  • Assessing usability

2
Agenda
  • Evaluation overview
  • Empirical studies
  • Process
  • Experimental design
  • Variables
  • Methods
  • Results

3
One Model
4
Evaluation
  • Earlier
    • Interpretive and predictive
    • Heuristic evaluation, walkthroughs, ethnography
  • Now
    • User involved, evaluate usage
    • Experiments, usage observations, interviews, ...

5
Users Involved
  • Interpretive (naturalistic) vs. Empirical
  • Naturalistic
    • In a realistic setting; usually includes some
      detached observation, careful study of users
  • Empirical
    • People use the system; you manipulate independent
      variables and observe dependent ones

6
Evaluation
  • Summative vs. Formative
  • What were they?

7
Evaluation Choices
  • Why done?
  • Summative
    • System already exists; measuring against some
      criteria
  • Formative
    • Inform and support iterative design
8
Evaluation Data Gathering
  • Information we gather about an interface can be
    objective or subjective
  • Information also can be qualitative or
    quantitative
  • Which are tougher to measure?

9
Empirical Usability Testing
  • Key
  • Perform experiments and observe users performing
    benchmark tasks with interface under study
  • Gather data to learn about usability,
    satisfaction, etc.
  • Use that to inform iterative redesign and
    refinement

10
Validity Concerns
  • Are typical users tested?
  • Are typical tasks used?
  • Is the physical environment typical?
  • Is the social context appropriate?

11
Process
  • Steps in formative evaluation using experiments
  • Develop the experiment
  • Direct the evaluation sessions
  • Collect the data
  • Analyze the data
  • Draw conclusions to form a resolution for each
    design problem
  • Redesign and implement the revised interface

12
Develop Experiment
  • Recruit participants
  • Use bribes: cookies, wash their car, real rewards
  • Make sure people have a good attitude
  • Fit the user population
  • 3-5 people as pilots
  • Do they carry through to the next round?
  • Maybe 1 out of 3 moves on to the next stage

13
Develop Experiment
  • Developing tasks
  • Benchmark tasks - gather quantitative data
  • Representative tasks - add breadth, can help
    understand process
  • Tell what to do, not how to do it
  • Have introductory remarks and explanation written
    down

14
Develop Experiment
  • Developing tasks (contd.)
  • Lab testing versus field testing issues
  • Informed consent form
  • Run pilot versions to shake out the bugs

15
Directing Sessions
  • Issues
  • Are you in same room or not?
  • Single person session or pairs of people
  • Objective data -- stay detached

16
Collecting Data
  • Data gathering
  • Note-taking
  • Audio and video tape
  • Instrumented user interface
  • Post-experiment questions and interviews

17
Collecting Data
  • Identifying errors can be difficult
  • Qualitative techniques
  • Think-aloud - can be very helpful
  • Post-hoc verbal protocol - review video
  • Critical incident logging - positive and negative
  • Structured interviews - good questions
  • What did you like best/least?
  • How would you change..?
  • More to come next time...

18
Data Analysis
  • Simple analysis
  • Determine the means (time, number of errors, etc.)
    and compare with goal values (coming up)
  • Determine
  • Why did the problems occur?
  • What were their causes?
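A minimal sketch of this simple analysis in Python; the completion times and the goal value below are made up for illustration:

```python
from statistics import mean

# Completion times (secs.) observed for one benchmark task (illustrative values)
times = [12, 17, 19, 15, 13, 21]
goal = 18  # hypothetical goal value from the usability specification

avg = mean(times)
print(f"mean time: {avg:.1f} secs. (goal: {goal} secs.)")
if avg > goal:
    print("goal missed: investigate why the problems occurred")
```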

19
Objective Data
Today's Focus
  • Users interact with interface
  • You observe, monitor, calculate, examine,
    measure, ...
  • Objective, scientific data gathering
  • Comparison to interpretive/predictive evaluation

20
Experiments
  • Utilize classical scientific method of
    hypothesis, experiment and analysis
  • Key is methodology

21
Benchmark Tasks
  • Specific, clearly stated task for users to carry
    out
  • Example: email handler
  • "Find the message from Mary and reply with a
    response of Tuesday morning at 11."
  • Users perform these under a variety of conditions
    and you measure result

22
Experimental Methodology
  • Variables - facets or attributes of study that
    can vary
  • We want to control all variables but the ones
    we're testing

Examples: subject experience, gender, interface 1 vs.
interface 2, lighting, intelligence, location,
color vs. b/w, etc.
23
Control
  • Two methods of achieving it
  • Don't allow it to vary
  • Make subjects/attributes as representative of the
    population as possible
  • In both mean and range
  • Often, the second method is all you can do

24
Types of Variables
  • Participants are a random variable
  • In an experiment, we have independent and
    dependent variables
  • Independent - What you're studying, what you
    intentionally vary (e.g., interface feature)
  • Dependent - What the study produces and you
    tabulate, measure or examine (e.g., time, number of
    errors)

25
Example
  • Do people complete operations faster with a
    black-and-white display or a color one?
  • Independent - color or b/w
  • Dependent - time it takes to complete

26
Experimental Designs
  • 1. Within Subjects
  • Every participant provides a score for all levels
    or conditions

        Color       B/W
P1      12 secs.    17 secs.
P2      19 secs.    15 secs.
P3      13 secs.    21 secs.
...
27
Experimental Designs
  • 2. Between Subjects
  • Each participant provides results for only one
    condition

Color           B/W
P1  12 secs.    P2  17 secs.
P7  19 secs.    P5  15 secs.
P3  13 secs.    P8  21 secs.
...
28
Which to Use?
  • What are the advantages and disadvantages of the
    two techniques?

29
Within Advantages
  • Within subjects gives you more relative
    information - each person is their own control
  • You need a bigger number of participants in a
    between-subjects design to average things out

30
Between Advantages
  • Within-subjects tests are much more liable to
    ordering effects
  • Participant may learn from the first condition
  • Fatigue may make the second performance worse
  • Remedy: half go first in one condition, half go
    first in the other

31
Experimental Results
  • How does one know if an experiment's results mean
    anything or confirm any beliefs?
  • Example: 20 people participated; 11 preferred
    interface 1, 9 preferred interface 2
  • What do you conclude?

32
Hypothesis Testing
  • In experiment, we set up a null hypothesis to
    check
  • Basically, it says that what occurred was simply
    because of chance
  • For example, any participant has an equal chance
    of preferring interface 1 over interface 2

33
Hypothesis Testing
  • If the probability that the result happened by
    chance is low, then your results are said to be
    significant
  • Statistical measures of significance levels
  • 0.05 often used
  • Less than 5% possibility it occurred by chance
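For the 20-participant preference example from slide 31, the null hypothesis can be checked with an exact sign (binomial) test; a sketch using only the Python standard library:

```python
from math import comb

def sign_test_p(successes: int, n: int) -> float:
    """Two-sided exact binomial test against the null hypothesis that
    each participant prefers either interface with probability 1/2."""
    k = max(successes, n - successes)
    one_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * one_tail)

# 11 of 20 participants preferred interface 1
p = sign_test_p(11, 20)
print(f"p = {p:.3f}")  # well above 0.05, so no significant preference
```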

34
Example
Experiment 1    Group 1        Group 2
                1, 10, 10      3, 6, 21
                Mean 7         Mean 10

Experiment 2    Group 1        Group 2
                6, 7, 8        8, 11, 11
                Mean 7         Mean 10
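The two experiments have the same mean difference but very different variability; a sketch computing the pooled two-sample t statistic (standard library only) shows why Experiment 2 is far more convincing:

```python
from math import sqrt
from statistics import mean, variance

def t_statistic(g1, g2):
    """Pooled two-sample t statistic (assumes equal variances)."""
    n1, n2 = len(g1), len(g2)
    pooled_var = ((n1 - 1) * variance(g1) + (n2 - 1) * variance(g2)) / (n1 + n2 - 2)
    return (mean(g2) - mean(g1)) / sqrt(pooled_var * (1 / n1 + 1 / n2))

t1 = t_statistic([1, 10, 10], [3, 6, 21])   # Experiment 1: high variance
t2 = t_statistic([6, 7, 8], [8, 11, 11])    # Experiment 2: low variance
# Same mean difference (3), but Experiment 2's t is much larger,
# so only it is likely to reach significance.
print(t1, t2)
```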
35
Other Methods
  • Another kind of test is contingency table measure
  • May have two variables and two conditions

            System A    System B
Men         13          17
Women       11          15
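A sketch of the chi-square statistic for this 2x2 table, using only the Python standard library (for one degree of freedom, the 0.05 critical value is 3.84):

```python
def chi_square_2x2(table):
    """Chi-square statistic for a 2x2 contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Men/Women x System A/System B counts from the slide
stat = chi_square_2x2([[13, 17], [11, 15]])
print(f"chi-square = {stat:.4f}")  # far below 3.84: no significant association
```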
36
Errors
  • Errors do occur
  • Types
  • Type I/False positive - You conclude there is a
    difference when in fact there isn't
  • Type II/False negative - You conclude there is no
    difference when there is
  • Type III

37
Presentation Techniques
[Figure: box plot of time in secs. (axis 0 to 20) by age,
showing the mean, low and high values, and the middle 50%]
38
Upcoming
  • Gathering data
  • Observing users
  • Subjective data, querying users
  • Usability Specifications
  • CHI videos