Experimental Design - The basics presentation

1
Experimental Design - The basics

Richard Preziosi

2
How to formulate hypotheses

Where do you start?
What is a hypothesis?
Stating a hypothesis
Generating predictions
Statistical hypotheses (different!)
Only after completing this process will you be
able to decide what data to collect

3
Hypotheses: Where do you start?

Start by stating your research question
E.g. Why are male and female humans different
sizes?
Your question may easily produce more than one
hypothesis; that's fine.

4
Hypothesis

A hypothesis is a clear statement articulating a
plausible candidate explanation for observations
It should be constructed in such a way as to
allow gathering of data that can be used to
either refute or support the candidate explanation

5
Stating a Hypothesis

Phrase your hypothesis as a possible answer to
your research question.
E.g. Male and female size differ because males
grow faster than females

6
Generating predictions

These are the testable statements that follow
logically from your hypothesis
E.g. males have a faster growth rate than
females

7
Statistical hypotheses

Predictions should lead you to testable
statistical hypotheses
Note that the hypothesis of interest in
statistics is the one where nothing is different
(the null hypothesis)
A clearly stated null hypothesis will generally
lead you to the correct statistical test
E.g. There is no difference in the growth rate
of males and females
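A minimal sketch of how the null hypothesis above could be tested. The slides do not name a test, so the permutation test here is an assumption, and the growth rates are invented purely for illustration.

```python
import random
from statistics import mean

def permutation_test(group_a, group_b, n_perm=5000, seed=1):
    """Two-sided permutation test for a difference in group means.

    Under the null hypothesis the group labels are interchangeable, so we
    shuffle the pooled values and count how often a shuffled difference is
    at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add 1 to numerator and denominator so the p-value is never exactly 0
    return (extreme + 1) / (n_perm + 1)

# Hypothetical growth rates (cm/year), not real data
males = [6.1, 5.8, 6.4, 6.0, 6.3, 5.9]
females = [5.2, 5.5, 5.1, 5.6, 5.3, 5.4]
p_value = permutation_test(males, females)
print(f"p = {p_value:.4f}")
```

A small p-value would lead us to reject the null hypothesis of no difference in growth rate; it says nothing on its own about why the difference exists.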

8
Question -> Hypothesis -> Predictions -> Statistical (null) hypothesis

9
Pitfalls of generating predictions

Weak tests
Indirect measures
Non-useful outcomes
Your tests must satisfy the devil's advocate
(e.g. reviewers or examiners)

10
Weak test

Consider the hypothesis: "Students enjoy the
course in radiation training more than the
workshop in experimental design"
Prediction: "Students will get better grades in
radiation training than in experimental design"
This is a weak test (prediction) because other
explanations are equally likely AND because we
have used an indirect measure (grades as a
measure of enjoyment)

11
Non-useful outcomes

These are hypotheses that may well prove
interesting if true but are uninformative if false

12
Satisfying skeptics

Reviewers will look for logical flaws in your
experiments. You do not want to finish your paper
with:
"My results indicate that mechanism A determines
apoptosis rates. Although mechanism B could also
produce the same response, I believe that
mechanism A is the important one."
This will earn you a review of the form:
"This study provides no clear evidence to
distinguish between mechanisms A and B. The
authors need to redesign their study and start
again. Recommendation: reject this manuscript."

13
Pilot Studies and Preliminary Data

May be observational or mini-experiments
Ensures sensible questions
Can you observe the phenomenon?
Practice and validate techniques
Minimize training effects on data
Recognize logistical constraints
Standardization across observers
Allows tuning of design and statistics
Assessment of sample sizes (power)
Test run of statistical analysis

14
Experimental Manipulation vs. Natural Variation

In Manipulation studies you change an aspect of
the system and measure effects on traits of
interest (majority of lab studies and
Agricultural studies)
In Correlational studies you measure associations
between traits of interest (often assuming one is
influencing the other) (Many Environmental and
most Human studies)
Consider the hypothesis: "Long tail streamers seen
in many species of birds have evolved to make
males more attractive to females"

15
Correlational study using Natural Variation

In the bird tail length example we could
Measure the tails of males at the beginning of
the breeding season
Observe the number of matings each male has
Do statistics to determine if there is a
relationship between tail length and number of
matings
Results showing a relationship would support our
hypothesis
Results not showing a relationship would go
against our hypothesis
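The "do statistics" step above could be sketched as a Pearson correlation. The slides do not say which statistic is intended, so this choice is an assumption, and the tail lengths and mating counts are invented for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical measurements for six males, not real data
tail_length_cm = [18.0, 20.5, 22.0, 24.5, 26.0, 29.0]
matings = [1, 2, 2, 3, 4, 5]

r = pearson_r(tail_length_cm, matings)
print(f"r = {r:.2f}")
```

A value of r near +1 would be consistent with the hypothesis, but as a correlational result it cannot by itself show that longer tails cause more matings.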

16
Manipulative study

In the bird tail length example we could have 4
groups of birds
Results showing that males with artificially long
tails had more mates support our hypothesis
Results showing that males with reduced tails had
fewer mates also support our hypothesis
A comparison of group 1 males with the
unmanipulated males acts as a control comparison

17
Arguments for correlational studies

Often less work (but larger sample sizes usually
needed)
Deals with real levels of biological variation
(manipulations may take things outside naturally
occurring limits)
Requires less handling of organisms (important if
there are constraints like stress to animals or
endangered species)
Manipulative studies may produce unintended
effects (e.g. flight ability in example or
epistatic effects in knockouts)
Manipulation may not be possible
May provide a baseline study for manipulative
experiments

18
Arguments for manipulative studies (really,
against correlational studies)

Third variables
Reverse causation
These can be BIG problems if they occur

19
Third Variables

Third variables occur when there is an apparent
link between A and B but in fact there is no
direct link or mechanism. Instead both A and B
depend on C, the third variable.
This means that patterns in correlational studies
are just that, correlations.
Remember, correlation does not imply causation

20
Third Variables - an example

In the bird tail length example let's say that we
do see a correlation between tail length and
number of mates
Suppose that females are actually attracted to
territories not males, but that males on better
territories can grow larger tails
The third variable here is territory quality and
it drives both tail length and number of mates
and produces an apparent relationship.
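A small simulation can make the territory example concrete. This is a sketch under stated assumptions: all values are simulated, tail length and number of mates are treated as continuous scores, each depends on territory quality plus independent noise, and there is no direct link between them. The partial correlation formula is a standard way to remove the linear effect of a third variable; it is not something the slides prescribe.

```python
import random
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def partial_r(r_ab, r_ac, r_bc):
    """Correlation between A and B after removing the linear effect of C."""
    return (r_ab - r_ac * r_bc) / sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))

rng = random.Random(42)
n = 5000
# Territory quality (C) drives both tail length (A) and mating success (B)
territory = [rng.gauss(0, 1) for _ in range(n)]
tail = [t + rng.gauss(0, 1) for t in territory]
mates = [t + rng.gauss(0, 1) for t in territory]

r_tail_mates = pearson_r(tail, mates)
r_partial = partial_r(
    r_tail_mates,
    pearson_r(tail, territory),
    pearson_r(mates, territory),
)
print(f"raw correlation:            {r_tail_mates:.2f}")
print(f"controlling for territory:  {r_partial:.2f}")
```

In this simulation the raw correlation is clearly positive while the partial correlation is close to zero, which is the signature of a third variable. In a real study this only helps if the third variable has been identified and measured.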

21
Third Variables - Two famous examples

Fisher suggested that the link between smoking
and cancer was correlational not causative and
that another factor, perhaps stress, led people
both to smoke and develop cancer.
Fewer women postgrads marry than women in the
population as a whole. This relationship is
presumably due to some other correlated factor
(third variable)

22
Reverse causation

This occurs when a correlation is taken to mean
that A causes B when in fact B causes A
In some cases this can be ruled out based on
other data or common sense
In the bird example it is unlikely that the
number of mates for a male has any effect on tail
length measured at the start of the mating
season.

23
Reverse causation - a famous example

There is a correlation between the number of
storks nesting in chimneys and the number of
children in a house (old data from Holland)
Although storks bringing babies makes a nice
story the causation is likely reversed
Larger families tend to live in larger houses
with more chimneys, and hence more opportunities
for storks to nest.

24
Variation, replication and sampling

Variation among individuals
Replication and the experimental unit
Pseudoreplication

25
Variation among individuals

Variation among individuals is a given for most
biological systems
In any experiment we are concerned with variation
in the Response or Dependent Variable
Variation in the response variable can be divided
into
Variation explained by experimental factors (IV)
Variation not explained by experimental factors
(AKA error variation, random variation, noise)
In most studies we are interested in reducing
noise and, hopefully, increasing explained
variation
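The split into explained and unexplained variation can be sketched with sums of squares for a single experimental factor. The response values are invented for illustration, and the function name is hypothetical.

```python
def partition_variation(groups):
    """Split total variation in the response into two sums of squares.

    groups maps each treatment level to its list of response values.
    Returns (explained, error): variation among treatment means and
    variation of individuals around their own treatment mean.
    """
    all_values = [v for values in groups.values() for v in values]
    grand_mean = sum(all_values) / len(all_values)
    ss_explained = 0.0
    ss_error = 0.0
    for values in groups.values():
        group_mean = sum(values) / len(values)
        ss_explained += len(values) * (group_mean - grand_mean) ** 2
        ss_error += sum((v - group_mean) ** 2 for v in values)
    return ss_explained, ss_error

# Hypothetical response values, not real data
response = {"control": [4.0, 5.0, 6.0], "treated": [7.0, 8.0, 9.0]}
ss_explained, ss_error = partition_variation(response)
proportion_explained = ss_explained / (ss_explained + ss_error)
print(ss_explained, ss_error, round(proportion_explained, 2))
```

Reducing noise shrinks the error sum of squares, which raises the proportion of variation explained even if the treatment effect itself is unchanged.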

26
Variation among individuals

Single measurements from each treatment do not
allow us to distinguish between noise and effect;
make sure you have a sufficient number of
individuals that experience the same manipulation
These individuals that receive the same
manipulation are called replicates
What is the experimental unit?

27
Pseudoreplication

This occurs when there is confusion between
treatments, replicates and blocks.
Consider an experiment comparing the effect of a
toxicant on fish behaviour.
Let's say the toxicant is prepared in a batch and
drip fed into the treatment tanks (water is drip
fed into the control tanks)
Are the replicates
Each fish in a tank?
Each tank?
Each set of tanks on a common drip?
Each batch of toxicant?
Don't expect a simple answer, the answer is in
the biology, not in statistics
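If, for the sake of illustration, the tank were judged to be the experimental unit, one way to avoid treating individual fish as replicates is to collapse the data to one value per tank before analysis. That choice of unit is an assumption made for this sketch (the biology may point elsewhere), and the scores and names are invented.

```python
from statistics import mean

# Hypothetical activity scores: (treatment, tank, score for one fish)
observations = [
    ("toxicant", "tank_1", 4.0), ("toxicant", "tank_1", 5.0),
    ("toxicant", "tank_2", 3.0), ("toxicant", "tank_2", 4.0),
    ("control", "tank_3", 7.0), ("control", "tank_3", 8.0),
    ("control", "tank_4", 6.0), ("control", "tank_4", 9.0),
]

def experimental_unit_means(obs):
    """Average the fish within each tank so each tank contributes one value."""
    by_tank = {}
    for treatment, tank, score in obs:
        by_tank.setdefault((treatment, tank), []).append(score)
    return {key: mean(scores) for key, scores in by_tank.items()}

tank_means = experimental_unit_means(observations)

# Replicates per treatment are now tanks, not fish
replicates = {}
for (treatment, _tank), value in tank_means.items():
    replicates.setdefault(treatment, []).append(value)

for treatment, values in replicates.items():
    print(treatment, "n =", len(values), values)
```

Here eight fish yield only two replicates per treatment. Analysing the eight fish as if they were independent would overstate the sample size.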

28
Common sources of Pseudoreplication

Shared enclosures
Common environments
Relatedness
Pseudoreplicated stimulus
Non-independence of group behaviour
Pseudoreplicated measurements over time
Species comparisons
Sometimes pseudoreplication is unavoidable

29
Random sampling

Proper random sampling means that each individual
has an equal chance of being allocated to each
treatment group
The problem with non-random treatment of samples
is that any bias in assignment of individuals or
systematic pattern to errors may bias your
results
True random samples almost always require the use
of computers or random number tables

30
Random assignment and treatment

Random means not only random assignment but also
random treatment
Let's say that you are examining the effect of
rhizosphere bacteria on plant growth.
Not only should each plant have an equal
opportunity of being assigned to the bacterial or
non-bacterial (control) group, all other aspects
of the process should be random as well.
Plants should be planted in equivalent compost
(possibly in random order)
Plants should be randomly allocated to growth
chambers and perhaps positions in chambers
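A minimal sketch of computer-based random assignment for the plant example, using Python's standard random module. The plant labels, group names, seeds and function name are hypothetical; fixing a seed simply makes the allocation reproducible and documentable.

```python
import random

def randomize(units, treatments, seed=None):
    """Randomly allocate units to treatments in equal numbers."""
    if len(units) % len(treatments) != 0:
        raise ValueError("units must divide evenly among treatments")
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    per_group = len(units) // len(treatments)
    # The first per_group shuffled units get the first treatment, and so on
    return {unit: treatments[i // per_group] for i, unit in enumerate(shuffled)}

plants = [f"plant_{i:02d}" for i in range(1, 13)]
treatment_of = randomize(plants, ["bacteria", "control"], seed=2024)

# Randomize the other steps too, e.g. the order of positions in a chamber
chamber_order = list(plants)
random.Random(7).shuffle(chamber_order)

for plant in chamber_order:
    print(plant, treatment_of[plant])
```

Using separate shuffles for treatment and position keeps the two steps independent, so chamber position cannot systematically line up with treatment.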

31
Haphazard sampling

Haphazard does not mean Random
A haphazard sample is based on personal
assignment by the experimenter in a fashion that
they believe is random
Often severely biased even if the experimenter is
consciously trying to take a random sample
Consider trying to randomly select mice from a
bucket or randomly pipetting out aliquots of a
cell culture
True random samples usually involve setting up
experimental units BEFORE assigning treatments
BUT this is not always possible, use common sense
(or blind assignment)

32
Self selection

This is a real problem with survey or poll data
The subset of a population that respond to
surveys is rarely a random sample and thus may
bias your results
By all means use surveys to inform your research
BUT be very suspicious of anything but general
conclusions

33
Pitfalls of Random Sampling

Make sure that the randomization procedure you
use does what you intend
Randomise the order of collecting data (to guard
against learning effects)
Random samples vs. Representative samples - don't
let computers do your thinking for you

34
Sample size - how many replicates

Too few replicates can be a disaster - too many
can be a crime!
Always use educated guesswork - i.e. look at
similar experiments by previous workers and
determine what worked.
Pay attention to differences between the studies
Formal power analysis - do if possible!!!
Requires that you have some guess of variation
among replicates
Requires that you have an idea of how big of a
treatment effect you can expect (or require)
Requires that you know what statistical test you
will use
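The three requirements above map directly onto the inputs of a power calculation. As a rough sketch, assuming a two-group comparison of means and the usual normal approximation (neither is specified in the slides), the replicates needed per group could be estimated as follows; the function name and default values are illustrative.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(sd, effect, alpha=0.05, power=0.80):
    """Approximate replicates per group for a two-sided two-group comparison.

    sd     : guess of the standard deviation among replicates
    effect : smallest treatment difference you want to detect
    Uses the normal approximation n = 2 * ((z_alpha + z_beta) * sd / effect)^2,
    which slightly underestimates the n needed for a t-test on small samples.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) * sd / effect) ** 2)

# Detecting a difference of one standard deviation vs. half a standard deviation
print(n_per_group(sd=1.0, effect=1.0))
print(n_per_group(sd=1.0, effect=0.5))
```

Halving the effect size you want to detect roughly quadruples the number of replicates, which is why a realistic guess of the expected effect matters so much.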

35
Sample size - Resource Equation Model

Can be used for complex studies or when variation
among individuals is unknown
Only appropriate for quantitative data
Gives conservative estimates of sample size so
more appropriate for
Large effect size (e.g. lab rather than clinical)
Testing for significant effects rather than
estimating parameters
E = N - T - B
N is the total number of individuals - 1
T is the number of treatments - 1
B is the number of blocks - 1
E is the error df and should be between 10 and 20
In some cases E should be larger (see Festing et
al.)
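The resource equation is simple enough to check by hand, but a short sketch helps when comparing candidate designs. The function names and the example of four treatments without blocking are illustrative assumptions; the 10 to 20 range is the one given above.

```python
def error_df(n_individuals, n_treatments, n_blocks=1):
    """Error degrees of freedom E = N - T - B, where each term is a count minus 1."""
    N = n_individuals - 1
    T = n_treatments - 1
    B = n_blocks - 1  # a single block (no blocking) contributes 0
    return N - T - B

def acceptable_group_sizes(n_treatments, n_blocks=1, low=10, high=20, max_per_group=50):
    """Group sizes (individuals per treatment) giving low <= E <= high."""
    return [
        k for k in range(1, max_per_group + 1)
        if low <= error_df(k * n_treatments, n_treatments, n_blocks) <= high
    ]

# Four treatments, no blocking
print(error_df(n_individuals=24, n_treatments=4))
print(acceptable_group_sizes(n_treatments=4))
```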

36
Sample size optimization (Festing et al.)
37
Controls

This is the reference against which the results
of an experimental manipulation can be compared
Thus your control group should be identical to
your treatment group in everything except the
treatment itself
Simple concept, common mistake
If the predictions and statistical hypotheses
have been constructed well then the control group
will be obvious
Lack of a control group makes an experiment
pointless

38
Types of Controls

Negative control - unmanipulated
Positive control - manipulated but not treated
(vehicle control, sham procedure control)
Concurrent control - run at the same time as the
treatment group
Historic control - based on previous data (be
certain that individuals are identical except for
the treatment)

39
Blind Procedures

Designed to remove the perception that
unconscious bias might taint results
Particularly useful when response variables are
measured in a subjective way
Blind Procedure - the person measuring does not
know what treatment has been applied
Double Blind - neither the subject nor the person
measuring knows the treatment assigned
(human studies)

40
When controls are not needed (or allowed)

In medical or veterinary studies controls may be
an ethical issue. Historical controls can be used,
but give careful consideration to criticisms
When sets of treatments are being compared (e.g.
effect of two drugs on rat behaviour)

41
Factorial experiments

2 group comparison (t-test) design
Treatment and control compared
1 factor design
Control and several levels of treatment compared
2 factor design
More than one treatment considered simultaneously
Allows estimation of both main effects AND the
interaction between them
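For a 2 x 2 design, main effects and the interaction can be read off the four cell means. The sketch below assumes two hypothetical factors (food level and strain) and invented cell means, and defines the interaction as a difference of differences; a full analysis would use a two-way ANOVA on the raw data.

```python
def effects_2x2(m, food_levels=("low", "high"), strain_levels=("A", "B")):
    """Main effects and interaction from the four cell means of a 2 x 2 design.

    m maps (food, strain) to the mean response in that cell.
    """
    f0, f1 = food_levels
    s0, s1 = strain_levels
    # Main effect: difference between the two levels, averaged over the other factor
    food = ((m[f1, s0] + m[f1, s1]) - (m[f0, s0] + m[f0, s1])) / 2
    strain = ((m[f0, s1] + m[f1, s1]) - (m[f0, s0] + m[f1, s0])) / 2
    # Interaction: does the strain difference change with food level?
    interaction = (m[f1, s1] - m[f1, s0]) - (m[f0, s1] - m[f0, s0])
    return food, strain, interaction

# Hypothetical cell means, not real data
cell_means = {
    ("low", "A"): 10.0, ("low", "B"): 12.0,
    ("high", "A"): 14.0, ("high", "B"): 20.0,
}
food_effect, strain_effect, interaction = effects_2x2(cell_means)
print(food_effect, strain_effect, interaction)
```

A non-zero interaction means the effect of one factor depends on the level of the other, which two separate one-factor experiments could not reveal.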

42
Main effects and interactions

Food   Strain   Interaction
X      -        -
X      X        -
X      X        X
X      X        X
X      X        X
43
Main effects and interactions
44
Completely randomized designs vs. Blocking

Completely Randomized designs are usually simple
Completely Randomized designs assume small
among-individual variation
If among-individual variation can be attributed
to a known factor then you can BLOCK by that
factor, reduce error variation and increase your
signal to noise ratio (clearer results)
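A randomized block layout can be sketched as a separate random allocation within each block, so every treatment appears once per block. The blocks (benches), pot labels, treatment names and function name are hypothetical.

```python
import random

def randomized_block_design(blocks, treatments, seed=None):
    """Assign each treatment once within every block, in random order.

    blocks maps a block label to the list of experimental units in it.
    Returns a dict mapping each unit to (block, treatment).
    """
    rng = random.Random(seed)
    plan = {}
    for block, units in blocks.items():
        if len(units) != len(treatments):
            raise ValueError(f"block {block!r} needs one unit per treatment")
        order = list(treatments)
        rng.shuffle(order)  # fresh random order for every block
        for unit, treatment in zip(units, order):
            plan[unit] = (block, treatment)
    return plan

# Hypothetical layout: three benches (blocks), three pots on each
benches = {
    "bench_1": ["pot_1", "pot_2", "pot_3"],
    "bench_2": ["pot_4", "pot_5", "pot_6"],
    "bench_3": ["pot_7", "pot_8", "pot_9"],
}
doses = ["control", "low_dose", "high_dose"]
plan = randomized_block_design(benches, doses, seed=3)

for unit, (block, treatment) in plan.items():
    print(block, unit, treatment)
```

Because each block contains every treatment, differences among blocks can be separated from treatment effects in the analysis rather than inflating the error variation.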

45
Advantages of blocking
46
Advantages of blocking

Blocking is commonly used to remove effects of
Space
Time
Individual characters that can be ranked
Continuous characters that affect among-individual
variation can be used as covariates to
remove effects and improve signal to noise ratio

47
The most common design errors

Ad hoc designs
Inappropriate control/treatment groups
Sample sizes too large or too small
Failure to use blocking
Lab animal studies: failure to use isogenic
strains when GxE is unimportant
