Title: CS376 Evaluation
1. Evaluation Methods
Jeffrey Heer 28 April 2009
2. Project Abstracts
- For final version (due online Fri 5/1 at 7am)
- Flesh out concrete details. What will you build? If running an experiment, what factors will you vary and what will you measure? What are your hypotheses and why? Provide rationale!
- Need to add study recruitment plan and related work sections (see http://cs376/project.html).
- Iterate more than once! Stop by office hours to discuss.
3. What is Evaluation?
- Something you do at the end of a project to show it works, so you can publish it.
- Part of the design-build-evaluate iterative design cycle.
- A way a discipline validates the knowledge it creates.
- A reason papers get rejected.
4. Establishing Research Validity
- "Methods for establishing validity vary depending on the nature of the contribution. They may involve empirical work in the laboratory or the field, the description of rationales for design decisions and approaches, applications of analytical techniques, or proof of concept system implementations."
- CHI 2007 Website
5. Evaluation Methods
- http://www.usabilitynet.org/tools/methods.htm
7. What to evaluate?
- Enable previously difficult/impossible tasks
- Improve task performance or outcome
- Modify/influence behavior
- Improve ease-of-use, user satisfaction
- User experience
- Sell more widgets
- What is the motivating research goal?
11. UbiFit (Consolvo et al.)
12. Momento (Carter et al.)
13. Evaluation Methods in HCI
- Inspection (Walkthrough) Methods
- Observation, User Studies
- Experience Sampling
- Interviews and Surveys
- Usage Logging
- Controlled Experimentation
- Fieldwork, Ethnography
- Mixed-Methods Approaches
14. Proof by Demonstration
- Prove feasibility by building prototype system
- Demonstrate that the system enables task
- Small user study may add little insight
15. Inspection Methods
- Often called discount usability techniques
- Expert review of user interface design
- Heuristic Evaluation (Nielsen, useit.com/papers/heuristic)
- Visibility of system status
- Match between system and real world
- User control and freedom
- Consistency and standards
- Error prevention
- Recognition over recall
- Flexibility and efficiency of use
- Aesthetic and minimalist design
- Help users recognize, diagnose, and recover from errors
- Help and documentation
16. How many evaluators?
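One hedged way to answer this (not on the original slide): Nielsen and Landauer model the proportion of usability problems found by i independent evaluators as 1 - (1 - L)^i, where L is the average per-evaluator detection rate, commonly cited as about 0.31. A minimal Python sketch under that assumption:

def problems_found(i, detection_rate=0.31):
    # Expected fraction of usability problems found by i independent evaluators,
    # per the Nielsen & Landauer model: 1 - (1 - L)^i.
    # detection_rate ~ 0.31 is a commonly cited average; it varies by study.
    return 1 - (1 - detection_rate) ** i

for n in (1, 3, 5, 10, 15):
    print(f"{n:2d} evaluators -> ~{problems_found(n):.0%} of problems found")

With these numbers, roughly five evaluators find on the order of 85% of the problems, which is the usual rationale for small expert review panels.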
17. Usability Testing
- Observe people interacting with prototype
- May include
- Providing tasks (e.g., easy, medium, hard)
- Talk-aloud protocol (users' verbal reports)
- Usage logging (see the sketch after this list)
- Pre/post study surveys
- NASA TLX (Task Load Index) workload assessment survey
- QUIS (Questionnaire for User Interaction Satisfaction)
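A minimal sketch of the usage-logging bullet above (illustrative only; the logger class, participant ID, and file names are hypothetical, not from the course): record timestamped events per participant and task, then compute completion times offline.

import csv, time

class SessionLogger:
    # Minimal usage logger for a usability session: writes timestamped
    # events (task start/end, clicks, errors) to a CSV for later analysis.
    def __init__(self, participant_id, path):
        self.participant_id = participant_id
        self.file = open(path, "w", newline="")
        self.writer = csv.writer(self.file)
        self.writer.writerow(["participant", "timestamp", "task", "event"])

    def log(self, task, event):
        self.writer.writerow([self.participant_id, time.time(), task, event])

    def close(self):
        self.file.close()

# Hypothetical usage during one session:
log = SessionLogger("P01", "p01_events.csv")
log.log("task_easy", "start")
log.log("task_easy", "click:search_button")
log.log("task_easy", "end")
log.close()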
18. Wizard-of-Oz Techniques
19. Controlled Experiments
- What are the important concerns?
20. Controlled Experiments
- Measure response of dependent variables to manipulation of independent variables.
- Within- or between-subjects design
- Change independent variables within or across subjects
- Randomization, replication, blocking
- Learning effects
- Choice of measure and statistical tests
- t-test, ANOVA, Chi-squared (χ²), non-parametric tests (see the sketch after this list)
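A sketch of how the design choice maps onto a test (illustrative numbers; assumes SciPy is available and the measure is task completion time): a between-subjects design compares independent groups, while a within-subjects design compares paired measurements from the same participants.

from scipy import stats

# Hypothetical task-completion times (seconds) under two interface conditions.
cond_a = [41.2, 38.5, 44.0, 39.8, 42.1, 40.3]
cond_b = [36.9, 35.2, 38.8, 34.5, 37.0, 36.1]

# Between-subjects: different participants per condition -> independent-samples t-test.
between = stats.ttest_ind(cond_a, cond_b)

# Within-subjects: each participant uses both conditions -> paired t-test.
# Counterbalance condition order to control for learning effects.
within = stats.ttest_rel(cond_a, cond_b)

print(f"between-subjects: t = {between.statistic:.2f}, p = {between.pvalue:.3f}")
print(f"within-subjects:  t = {within.statistic:.2f}, p = {within.pvalue:.3f}")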
21. Experimental Desiderata
- P-value: probability that the observed results are due to chance
- Type I error: accepting a spurious result (false positive)
- Bonferroni's principle: if you run enough significance tests, you'll eventually get lucky (see the sketch after this list)
- Type II error: mistakenly rejecting a real effect (false negative)
- Inappropriate measure or test?
- Statistical vs. practical significance
- N = 1000, p < 0.001, avg Δt = 0.12 sec.
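To make Bonferroni's principle concrete (a sketch, not from the slides): with many tests at α = 0.05 the chance of at least one Type I error grows quickly, and the Bonferroni correction compensates by shrinking the per-test significance threshold.

# Chance of at least one spurious "significant" result across 20 tests at alpha = 0.05:
alpha, n_tests = 0.05, 20
p_any_false_positive = 1 - (1 - alpha) ** n_tests   # ~0.64
print(f"P(at least one Type I error) ~= {p_any_false_positive:.2f}")

# Bonferroni correction: require p < alpha / n_tests for each individual test.
corrected_threshold = alpha / n_tests                # 0.0025
print(f"per-test threshold after Bonferroni: {corrected_threshold:.4f}")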
22. Internal Validity
- Internal validity: is a causal relation between two variables properly demonstrated?
- Confounds: is there another factor at play?
- Selection (bias): appropriate subject population?
- Experimenter bias: researcher actions
23. External Validity
- External validity: do the results generalize to other situations or populations?
- Subjects: do subjects' aptitudes interact with the independent variables?
- Situation: time, location, lighting, duration
24. Ecological Validity
- The degree to which the methods, materials, and setting of the study approximate the real-life situation under investigation.
- Flight simulator vs. flying a plane
- Simulated community activity vs. the open web
27. Next Time: Distributed Cognition
- "The Power of Representation," in Things That Make Us Smart, 1993, pp. 43-76. Donald Norman
- "On Distinguishing Epistemic from Pragmatic Action," Cognitive Science, 1994, pp. 513-549. David Kirsh and Paul Maglio