Title: MP2 Experimental Design Review HCI W2014
1. MP2 Experimental Design Review, HCI W2014
What is experimental design? How do I plan an experiment?
Acknowledgement: Much of the material in this lecture is based on material prepared for similar courses by Saul Greenberg (University of Calgary), as adapted by Joanna McGrenere.
2. Experimental Planning Flowchart
[Flowchart: five stages in sequence, with two feedback arrows]
- Stage 1, Problem definition: research idea, literature review, statement of problem, hypothesis development
- Stage 2, Planning: define variables, controls, apparatus, procedures, select subjects, experimental design
- Stage 3, Conduct research: pilot testing, data collection
- Stage 4, Analysis: data reductions, statistics, hypothesis testing
- Stage 5, Interpretation: interpretation, generalization, reporting
3. What's the goal?
- Overall research goals impact choice of study design
  - Exploratory research vs. hypothesis confirmation
  - Ecological validity vs. tightly controlled
- The stage in the design process impacts the choice of study design
  - Formative evaluation (to get iterative feedback on initial design and/or design choices)
  - Summative evaluation (to determine whether the design is better/stronger/faster than alternative approaches)
4. What's the research question?
- Study research questions impact choice of
  - Protocol, task
  - Experimental conditions (factors)
  - Constructs (effectiveness)
  - Measures (task completion, error rate)
- Testable hypotheses impact
  - choice of statistical analysis (also impacted by nature of the data and experimental design)
5. Experimental Planning Flowchart
[Flowchart repeated from slide 2: Problem definition, Planning, Conduct research, Analysis, Interpretation]
Reality check: does the final design support the research questions?
6. Quantitative system evaluation
- Quantitative
  - precise measurement, numerical values
  - bounds on how correct our statements are
- Methods
  - Controlled experiments
  - Statistical analysis
- Measures
  - Objective: user performance (speed and accuracy)
  - Subjective: user satisfaction
7. Controlled experiments
- The traditional scientific method
  - clear, convincing result on specific issues
  - in HCI
    - insights into cognitive process, human performance limitations, ...
    - allows comparison of systems, fine-tuning of details, ...
- Strive for
  - lucid and testable hypothesis (usually a causal inference)
  - quantitative measurement
  - measure of confidence in results obtained (inferential statistics)
  - ability to replicate the experiment
  - control of variables and conditions
  - removal of experimenter bias
8. The experimental method
- a) Begin with a lucid, testable hypothesis
  - H0: there is no difference in user performance (time and error rate) when selecting a single item from a pop-up or a pull-down menu, regardless of the subject's previous expertise in using a mouse or using the different menu types
9. The experimental method
- b) Explicitly state the independent variables that are to be altered
- Independent variables
  - the things you control (independent of how a subject behaves)
  - two different kinds
    - treatment: manipulated (can establish cause/effect, true experiment)
    - subject: individual differences (can never fully establish cause/effect)
  - in menu experiment
    - menu type: pop-up or pull-down
    - menu length: 3, 6, 9, 12, 15
    - expertise: expert or novice (a subject variable the researcher cannot manipulate)
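As a rough illustration of how the independent variables above combine into experimental conditions, the following sketch crosses the levels listed for the menu experiment. The variable names are hypothetical, and treating the design as fully crossed is an assumption; the slides do not say which combinations were actually run.

```python
from itertools import product

# Levels as listed for the menu experiment.
menu_types = ["pop-up", "pull-down"]       # treatment variable (manipulated)
menu_lengths = [3, 6, 9, 12, 15]           # treatment variable (manipulated)
expertise_levels = ["expert", "novice"]    # subject variable (observed, not assigned)

# Conditions the experimenter can manipulate: menu type x menu length.
treatment_conditions = list(product(menu_types, menu_lengths))

# Cells of the design once the subject variable is crossed in as well.
design_cells = list(product(menu_types, menu_lengths, expertise_levels))

print(len(treatment_conditions), "treatment conditions")
print(len(design_cells), "design cells")
for cell in design_cells[:4]:
    print(cell)
```

Only the first two variables can be assigned by the experimenter; expertise is a property each subject brings to the study.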
10. The experimental method
- c) Carefully choose the dependent variables that will be measured
- Dependent variables
  - variables dependent on the subject's behaviour / reaction to the independent variable
  - Make sure that what you measure actually represents the higher-level concept!
  - in menu experiment
    - time to select an item
    - selection errors made
    - higher-level concept: user performance
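One possible way to turn the two dependent variables of the menu experiment into numbers is sketched below. The `Trial` record, its field names, and the example values are all hypothetical and for illustration only.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Trial:
    """One menu selection by one subject (hypothetical record layout)."""
    selection_time_s: float  # time to select an item, in seconds
    correct: bool            # False if the wrong item was selected


def summarize(trials):
    """Return (mean selection time, error rate) for a list of trials."""
    mean_time = mean(t.selection_time_s for t in trials)
    error_rate = sum(not t.correct for t in trials) / len(trials)
    return mean_time, error_rate


# Made-up trials, for illustration only.
example_trials = [
    Trial(1.0, True),
    Trial(2.0, True),
    Trial(3.0, False),
    Trial(2.0, True),
]
mean_time, error_rate = summarize(example_trials)
print(f"mean time: {mean_time} s, error rate: {error_rate}")
```

Whether mean time and error rate together really capture "user performance" is the validity question raised above; the code only computes the measures.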
11. The experimental method
- d) Judiciously select and assign subjects to groups
- Ways of controlling subject variability
  - recognize classes and make them an independent variable
  - minimize unaccounted anomalies in subject group
    - superstars versus poor performers
  - use a reasonable number of subjects and random assignment
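Random assignment to groups could be done along the following lines. This is a minimal sketch: the subject IDs, group names, and the `assign_to_groups` helper are hypothetical, and dealing subjects round-robin after a shuffle is just one way to keep group sizes equal.

```python
import random


def assign_to_groups(subject_ids, group_names, seed=None):
    """Randomly assign subjects to groups whose sizes differ by at most one."""
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    groups = {name: [] for name in group_names}
    # Deal the shuffled subjects round-robin across the groups.
    for index, subject in enumerate(shuffled):
        groups[group_names[index % len(group_names)]].append(subject)
    return groups


subjects = [f"S{n:02d}" for n in range(1, 13)]  # 12 hypothetical subject IDs
groups = assign_to_groups(subjects, ["pop-up", "pull-down"], seed=2014)
for name, members in groups.items():
    print(name, members)
```

Fixing the seed makes the assignment reproducible for the write-up; leaving it as `None` gives a fresh random assignment each run.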
12. The experimental method...
- e) Control for biasing factors
  - unbiased instructions and experimental protocols
    - prepare ahead of time
  - double-blind experiments, ...
- Potential confounding variables
  - Order effects
  - Learning effects
  - Counterbalancing (http://www.yorku.ca/mack/RN-Counterbalancing.html)
13. The experimental method
- f) Apply statistical methods to data analysis
  - Confidence limits: the confidence that your conclusion is correct
  - "The hypothesis that mouse experience makes no difference is rejected at the .05 level" (i.e., null hypothesis rejected) means:
    - if mouse experience truly made no difference, a result this extreme would occur by chance less than 5% of the time
    - informally: about 95% confidence in the finding, and a 5% risk of wrongly rejecting the null hypothesis
- g) Interpret your results
  - what you believe the results mean, and their implications
  - yes, there can be a subjective component to quantitative analysis
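To make step f) concrete, here is a minimal sketch of one significance test, a two-sided permutation test on the difference in mean selection time. The slides do not name a particular test, so the choice of a permutation test is an assumption, and the data values are made up for illustration.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate a two-sided p-value for the difference in group means.

    The p-value is the share of random relabelings of the pooled data whose
    absolute mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_permutations


# Made-up selection times in seconds, for illustration only.
experts = [1.1, 1.3, 1.2, 1.0, 1.4, 1.2]
novices = [1.9, 2.2, 1.8, 2.4, 2.0, 2.1]

alpha = 0.05
p_value = permutation_test(experts, novices)
print(f"p = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

Rejecting H0 here says only that a difference this large would be unlikely if expertise made no difference; what the difference means for design is the interpretation step g).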
14. Experimental designs
- Between subjects: different participants. A single group of participants is allocated randomly to the experimental conditions.
- Within subjects: same participants. All participants appear in both conditions.
- Matched participants: participants are matched in pairs, e.g., based on expertise, gender, etc.
- Mixed: some independent variables are within subjects, some are between subjects.
15. Within-subjects
- It solves the individual differences issues
- Allows participants to make comparisons between conditions
- But raises other problems
  - Need to look at the impact of experiencing the two conditions
16. Order effects
- Changes in performance resulting from the (ordinal) position in which a condition appears in an experiment (always first?)
- Arise from warm-up, learning, participants learning what they will be asked to reflect upon, fatigue, etc.
- Effect can be averaged and removed if all possible orders are presented in the experiment and there has been random assignment to orders
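Presenting all possible orders with random assignment to orders might look like the sketch below. The condition labels, subject IDs, and the `full_counterbalance` helper are hypothetical; requiring the number of subjects to be a multiple of the number of orders is a simplifying assumption so that every order is used equally often.

```python
import random
from itertools import permutations


def full_counterbalance(conditions, subject_ids, seed=None):
    """Randomly assign subjects to every possible order of the conditions,
    using each order the same number of times."""
    orders = list(permutations(conditions))
    if len(subject_ids) % len(orders) != 0:
        raise ValueError(
            "number of subjects must be a multiple of the number of orders"
        )
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    return {subject: orders[i % len(orders)] for i, subject in enumerate(shuffled)}


conditions = ["A", "B", "C"]                      # hypothetical condition labels
subjects = [f"S{n:02d}" for n in range(1, 13)]    # 12 hypothetical subjects
schedule = full_counterbalance(conditions, subjects, seed=1)
for subject in sorted(schedule):
    print(subject, schedule[subject])
```

With three conditions there are 3! = 6 orders, so 12 subjects give two subjects per order. The number of orders grows factorially with the number of conditions, which is why full counterbalancing quickly becomes impractical.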
17. Sequence effects
- Changes in performance resulting from interactions among conditions (e.g., if done first, condition 1 has an impact on performance in condition 2)
- Effects viewed may not be main effects of the IV, but interaction effects
- Can be controlled by arranging each condition to follow every other condition equally often
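One arrangement in which each condition immediately follows every other condition equally often is a balanced Latin square. The sketch below uses the common construction for an even number of conditions (first row 0, 1, n-1, 2, n-2, ..., later rows shifted by one); the slides do not prescribe this construction, and the function names and condition labels are hypothetical.

```python
from collections import Counter


def balanced_latin_square(conditions):
    """Build a balanced Latin square for an even number of conditions."""
    n = len(conditions)
    if n % 2 != 0:
        raise ValueError("this construction needs an even number of conditions")
    # First row follows the index pattern 0, 1, n-1, 2, n-2, ...
    first_row = [0]
    low, high = 1, n - 1
    while len(first_row) < n:
        first_row.append(low)
        low += 1
        if len(first_row) < n:
            first_row.append(high)
            high -= 1
    # Each later row adds one more (mod n) to every index of the first row.
    return [
        [conditions[(index + shift) % n] for index in first_row]
        for shift in range(n)
    ]


def immediate_follow_counts(orders):
    """Count how often each condition directly follows each other condition."""
    counts = Counter()
    for order in orders:
        for earlier, later in zip(order, order[1:]):
            counts[(earlier, later)] += 1
    return counts


orders = balanced_latin_square(["A", "B", "C", "D"])
for order in orders:
    print(order)
follow_counts = immediate_follow_counts(orders)
print(follow_counts)
```

Each row is the order given to one subject (or one group of subjects), so the number of subjects should be a multiple of the number of rows.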
18. Counterbalancing
- Controlling order and sequence effects by arranging subjects to experience the various conditions (levels of the IV) in different orders
- Self-directed learning: investigate the different counterbalancing methods
  - Randomization
  - Block randomization
  - Reverse counterbalancing
  - Latin squares and Graeco-Latin squares (when you can't fully counterbalance)
  - http://www.experiment-resources.com/counterbalanced-measures-design.html
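Two of the methods listed above, block randomization and reverse counterbalancing, are simple enough to sketch directly. The helper names and condition labels are hypothetical, and the definitions follow the usual textbook readings of the terms rather than anything stated in the slides.

```python
import random


def block_randomize(conditions, n_blocks, seed=None):
    """Return a trial sequence built from n_blocks blocks; every block
    contains each condition exactly once, in a fresh random order."""
    rng = random.Random(seed)
    sequence = []
    for _ in range(n_blocks):
        block = list(conditions)
        rng.shuffle(block)
        sequence.extend(block)
    return sequence


def reverse_counterbalance(conditions):
    """Return an ABBA-style sequence: the conditions forward, then reversed."""
    forward = list(conditions)
    return forward + forward[::-1]


conditions = ["A", "B", "C"]  # hypothetical condition labels
print(block_randomize(conditions, n_blocks=4, seed=7))
print(reverse_counterbalance(conditions))
```

Block randomization keeps the conditions evenly spread across the session, while the reversed sequence gives every condition the same average position for a single subject.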
19. Between, within, matched participant design
20. Internal validity
- The extent to which a causal conclusion based on a study is warranted
- Internal validity is reduced by the presence of uncontrolled/confounded variables
  - But the study is not necessarily invalid
- It's important for the researcher to evaluate the likelihood that there are alternative hypotheses for observed differences
  - Need to convince self and audience of the validity
21. External validity
- The extent to which the results of a study can be generalized to other situations and to other people
- If the experimental setting more closely replicates the setting of interest, external validity can be higher than in a true experiment run in a controlled lab setting
- Often comes down to what is most important for the research question
  - Control or ecological validity?
22. Control
- True experiment: complete control over the subject assignment to conditions and the presentation of conditions to subjects
  - Control over the who, what, when, where, how
- Control of the who => random assignment to conditions
  - Only by chance can other variables be confounded with the IV
- Control of the what/when/where/how => control over the way the experiment is conducted
23. Quasi-experiment
- When you can't achieve complete control
  - Lack of complete control over conditions
  - Subjects for different conditions come from potentially non-random pre-existing groups
    - Experts vs. novices
    - Early adopters vs. technophobes?
24. It's a matter of control
- Random assignment of subjects to condition
- Manipulate the IV
- Control allows ruling out of alternative hypotheses
- Selection of subjects for the conditions
- Observe categories of subjects
- If the subject variable is the IV, it's a quasi-experiment
- Don't know whether differences are caused by the IV or differences in the subjects
25. Other features
- In some instances, cannot completely control the what, when, where, and how
  - Need to collect data at a certain time or not at all
  - Practical limitations to data collection, experimental protocol