Title: Experiments in the Real World
1Experiments in the Real World
2- Last time we talked about
- Experiments, Good and Bad
3Review Experiments, Good and Bad
- The aim of most statistical studies is to show how changes in explanatory variables cause changes in response variables
- In an experiment we set the values of the explanatory variables rather than just observing them
- Observational studies and one-track experiments often fail to produce useful conclusions because of confounding with lurking variables
- The effects of these confounded variables are mixed up with the effect of the treatment, making it impossible to say exactly what the effect of the treatment is
4Review Experiments, Good and Bad
- A randomized comparative experiment solves this problem
  - Compare two or more treatments
  - Use random chance to assign subjects to the treatment and control groups
  - Use enough subjects to ensure that the possible effect of random chance in assigning subjects to groups is small
- Comparing two groups of randomly assigned subjects controls for lurking variables such as the placebo effect because they act equally on all the groups
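The random assignment step described above can be sketched in a few lines of Python. This is only an illustration: the function name `randomize`, the subject labels, and the group names are assumptions made for the example, not part of the original slides.

```python
import random

def randomize(subjects, groups=("treatment", "control"), seed=None):
    """Shuffle the subjects, then deal them round-robin into the named groups."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)  # chance alone decides the order
    assignment = {group: [] for group in groups}
    for i, subject in enumerate(shuffled):
        assignment[groups[i % len(groups)]].append(subject)
    return assignment

# Hypothetical subject labels, used only for illustration.
subjects = [f"subject_{i:02d}" for i in range(1, 21)]
for group, members in randomize(subjects, seed=1).items():
    print(group, len(members), sorted(members))
```

Because the shuffle ignores everything about the subjects, lurking variables tend to be spread evenly across the groups, and more subjects make a lopsided split less likely.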
5Review Experiments, Good and Bad
- Differences between groups that are so large that they would rarely be the result of random chance (in assigning subjects to groups) are called statistically significant
- Statistically significant results from randomized comparative experiments are the best available evidence that changes in an explanatory variable really do cause changes in a response variable
- Observational studies of cause-and-effect questions are more impressive if they compare matched groups and measure as many lurking variables as possible to allow for statistical adjustment
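One way to make "would rarely be the result of random chance in assigning subjects to groups" concrete is a permutation test: reshuffle the group labels many times and count how often chance alone produces a difference at least as large as the one observed. This is a sketch of that idea, not the method of any study mentioned in these slides; the function name and the weight-loss numbers are hypothetical.

```python
import random

def permutation_p_value(treatment, control, n_shuffles=10_000, seed=0):
    """Estimate how often random regrouping gives a mean difference
    at least as large (in absolute value) as the observed one."""
    rng = random.Random(seed)
    n_t = len(treatment)
    observed = sum(treatment) / n_t - sum(control) / len(control)
    pooled = list(treatment) + list(control)
    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # a fresh random assignment to groups
        diff = sum(pooled[:n_t]) / n_t - sum(pooled[n_t:]) / (len(pooled) - n_t)
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / n_shuffles

# Hypothetical outcomes (e.g. pounds lost) for two small groups.
treatment = [12, 9, 15, 11, 14, 10, 13, 16]
control = [5, 8, 3, 7, 6, 9, 4, 2]
observed, p = permutation_p_value(treatment, control)
print(f"observed difference: {observed:.2f}, estimated p-value: {p:.4f}")
```

A small estimated p-value says the observed difference would rarely arise from the random assignment alone, which is what "statistically significant" means here.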
6- Questions from last time ??
7Equal treatment for all
- The logic of a randomized comparative experiment requires that all subjects be treated exactly alike except for the treatment(s)
- Any unequal treatment may lead to bias
- But, treating subjects exactly the same is hard!
8Example 1 Mice, rats, and rabbits
- There are special breeds of mice, rats and rabbits in which all animals of a given type are effectively clones (genetically identical)
- This ensures no genetic variability between them
- Even so, there are lots of other ways for variability to creep in
- Identical rats raised in an upper row of cages grow slightly faster than their cousins living in the bottom row of cages
- This biases any study that uses growth rates as an outcome, such as one determining the relative nutritive value of different breakfast cereals
9Double-blind experiments
- It is a well-proven fact that placebos work
- This means that, to demonstrate effectiveness, medical studies must clearly show that their treatment is better than a placebo
- So, part of equal treatment for all subjects is to be sure that the placebo effect applies to all subjects
10Statistical controversies herbal remedies
- What is a natural supplement?
- The FDA requires that new prescription drugs and medical devices demonstrate safety and effectiveness in randomized trials
- Natural supplements are not required to meet these standards
- Natural supplements cannot claim to cure diseases, but they can claim to help natural conditions
- What do we have to say about claims not backed by well-designed experiments?
- What are the potential problems?
11Example 2 The powerful placebo
- The bald placebo: a well-designed study found that 42% of balding men who took a placebo increased or maintained head hair!
- The poison ivy placebo: 13 poison-ivy-sensitive patients had poison ivy rubbed on one arm and a placebo on the other; they were told that the placebo was poison ivy and the poison ivy was the placebo
  - All 13 developed a rash from the harmless placebo
  - Only 2 developed a rash from the real poison ivy!
- The strength of the placebo effect depends a lot on the exact treatment and setting, but it must always be taken into account
12Double-blind experiments
- In the baldness example, 86% of men given the treatment responded well (kept or grew hair)
- The treatment was better than the placebo, but part of the treatment's effect is the placebo effect
- Because the placebo effect can be so strong, it would be a seriously bad idea to tell subjects what they are getting
- If they know they are getting a placebo, there will likely be no placebo effect
- No placebo effect in the control group means we can't sort out how much of the treatment effect is the placebo effect
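As a rough worked example, read the two figures from the baldness study as percentages (42% responding on placebo, 86% on the treatment). The sketch below assumes, for illustration only, that the placebo effect and the drug's own effect simply add; the function name is hypothetical.

```python
def effect_beyond_placebo(treatment_pct, placebo_pct):
    """Percentage points of response not accounted for by the placebo group.

    Assumes the placebo response and the drug's own effect are additive,
    which is a simplification made for this illustration.
    """
    return treatment_pct - placebo_pct

treatment_pct = 86  # responded well on the treatment
placebo_pct = 42    # responded well on the placebo
print(effect_beyond_placebo(treatment_pct, placebo_pct), "percentage points beyond placebo")
```

Without a placebo group that believes it may be getting the real treatment, there would be no 42% baseline to subtract, and the full 86% would wrongly be credited to the drug.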
13Double-blind experiments
- Similarly, it would be unwise to tell the experimenters (doctors in the baldness case) which subjects are getting the treatment and which aren't
- The experimenters may give less attention to the placebo subjects because they know that their effort is effectively wasted on the placebo subjects
- This is especially true in medical experiments in which doctors may interact closely with patients: any systematic difference in the amount of attention given to one treatment group over another may bias the results
- Whenever possible, experiments with human subjects should be double-blind
16Double-blind experiments
- In a double-blind experiment, only the study's statistician knows who's who and what's what until the study is over
- This is a typical quote from a medical journal describing an experiment testing a vaccine delivered as a nasal spray:
  - "This study was a randomized, double-blind, placebo-controlled trial. Participants were enrolled from 13 sites across the continental United States between mid-September and mid-November 1997."
17Refusals, nonadherers, and dropouts
- Experiments suffer from problems similar to nonresponse for sample surveys
- Subjects who participate but don't follow a treatment are called nonadherers
- This can lead to bias if the type of nonadherence or the nonadherence rate is systematically different across treatment groups
- Example: AIDS patients in a clinical trial of a new drug may be so concerned about actually getting the new drug that they have their medication tested and, if they are on the placebo, supplement their treatment with other drugs that are not part of the trial!
18Example 3 Minorities in clinical trials
- Refusal to participate in big clinical trials is a serious problem
- If those who refuse are systematically different from those who agree, then there will be bias and the results will not represent the community at large
- Minorities, women and the poor are historically underrepresented in clinical trials
- Often because they were never asked!
- The law now requires equal representation of these groups, and for the most part this is now the case
- But refusals remain a big problem
19Example 3 Minorities in clinical trials
- Minorities, especially the black community, are still more likely to refuse
- The government's Office of Minority Health says:
  - "Though recent studies have shown that African Americans have increasingly positive attitudes toward cancer medical research, several studies corroborate that they are still cynical about clinical trials. A major impediment for lack of participation is a lack of trust in the medical establishment."
20Example 3 Minorities in clinical trials
- Where would this lack of trust come from?
- The Tuskegee Study
- Some remedies for lack of trust are
  - Complete and clear information about the trial
  - Insurance coverage for experimental treatments
  - Participation of black researchers
  - Cooperation with doctors and health organizations in black communities
21Refusals, nonadherers, and dropouts
- Experiments that extend over a long time suffer from dropouts
- If equal numbers drop out from each treatment group, this is not such a big problem
- But if subjects drop out in response to their treatment, then bias can result
22Example 4 Dropouts from a medical study
- Orlistat is a drug that prevents absorption of fat in foods
- Its effectiveness as a weight-loss treatment was investigated in a randomized, double-blind, placebo-controlled clinical trial
  - Start with 1,187 obese subjects
  - Give placebo for four weeks and drop those who cannot take the treatment regularly (bye-bye, nonadherers!)
  - Randomly assign the remaining 892 to Orlistat or placebo, both with a weight-loss diet
  - After one year, 576 subjects remain
23Example 4 Dropouts from a medical study
- On average during the first year, the Orlistat group lost 7 pounds more than the placebo group
- Go on for another year, emphasizing keeping the weight off: the Orlistat group regained on average 5 pounds less than the placebo group
- 403 subjects were left at the end of the second year
- Looks good for Orlistat, but can we trust it?
- Overall dropout rates were similar in the Orlistat and placebo groups: 57% in placebo, 54% in Orlistat
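The subject counts quoted for the Orlistat trial (1,187 enrolled, 892 randomized, 576 after one year, 403 after two) can be turned into attrition percentages with a little arithmetic. The sketch below does only that; the function names and stage labels are made up for the example.

```python
# Subject counts reported for the Orlistat trial at each stage.
stages = [
    ("enrolled", 1187),
    ("randomized after placebo run-in", 892),
    ("remaining after year 1", 576),
    ("remaining after year 2", 403),
]

def attrition_table(stages):
    """Return (label, count, % lost since the previous stage) for each stage."""
    rows = []
    previous = None
    for label, count in stages:
        lost_pct = None if previous is None else 100 * (previous - count) / previous
        rows.append((label, count, lost_pct))
        previous = count
    return rows

def overall_dropout_pct(randomized, completed):
    """Percentage of randomized subjects who did not finish the study."""
    return 100 * (randomized - completed) / randomized

for label, count, lost_pct in attrition_table(stages):
    note = "" if lost_pct is None else f" ({lost_pct:.1f}% lost since previous stage)"
    print(f"{label}: {count}{note}")

print(f"overall dropout among randomized subjects: {overall_dropout_pct(892, 403):.1f}%")
```

More than half of the randomized subjects did not finish, which is in line with the per-group dropout rates quoted above and shows why dropouts deserve a close look.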
24Example 4 Dropouts from a medical study
- Are dropout rates related to the treatments?
- Not surprisingly, subjects in the placebo group are often more likely to drop out in weight-loss experiments
- This means that the subjects remaining in the placebo group by the end are the ones who could lose weight just by dieting, and this would bias against Orlistat, because the dropouts would have lost less weight and gained more back
- Did this happen? The statisticians looked closely and concluded that there was little bias
- But the situation is a lot muddier than we would like!
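A small simulation can show the direction of the bias described above. Everything in it is hypothetical and deliberately exaggerated: weight losses are drawn from normal distributions, the true drug effect is set to 7 pounds, the half of the placebo group that lost the least weight drops out, and drug-group dropouts are chosen at random. It is not a model of the actual Orlistat data.

```python
import random
from statistics import mean

def simulate(n=5000, true_effect=7.0, seed=0):
    """Compare the drug effect estimated from all subjects with the
    estimate from completers only, under treatment-related dropout."""
    rng = random.Random(seed)
    # Hypothetical weight loss in pounds for each subject.
    placebo = [rng.gauss(5.0, 10.0) for _ in range(n)]
    drug = [rng.gauss(5.0 + true_effect, 10.0) for _ in range(n)]
    full_estimate = mean(drug) - mean(placebo)

    # Placebo subjects who lost the least weight quit (exaggerated assumption).
    placebo_completers = sorted(placebo)[n // 2:]
    # Drug subjects drop out at random, unrelated to their results.
    drug_completers = rng.sample(drug, n // 2)
    completer_estimate = mean(drug_completers) - mean(placebo_completers)
    return full_estimate, completer_estimate

full, completers = simulate()
print(f"estimate using everyone:       {full:.1f} pounds")
print(f"estimate using completers only: {completers:.1f} pounds")
```

The completers-only estimate comes out smaller than the estimate from all subjects, because the placebo group that remains is made up of its most successful dieters.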
25Can we generalize?
- A well-designed experiment tells us that changes in the explanatory variable cause changes in the response variable
- More specifically, that in a specific environment certain changes in the explanatory variable led to certain changes in the response variable
- Usually we'd like to say something more general and exciting, like "changes in the explanatory variable always lead to changes in the response variable"
- The question: can we generalize our findings from a small group of subjects to a wider population?
26Can we generalize?
- The first step is to be sure our result is statistically significant
- This means that a result this large would rarely occur by chance alone
- We will assume that the study has a good statistician who can reassure us on this point
- The serious threat is that the treatments, the subjects, or the environment of the experiment may not be realistic
27Example 5 Studying frustration
- A psychologist wants to study the effects of frustration on group relationships
- She enrolls a number of student subjects to play a game together, and the game is rigged so that they lose or fail most of the time
- The psychologist watches through a one-way mirror to observe changes in their behavior as the evening wears on
- How similar is this to a real situation?
  - Playing for small stakes in a lab, knowing that the session will soon be over, versus
  - Working for months on a product or project that is finally abandoned by your boss
28Example 5 Studying frustration
- In the experiment the subjects know they are in an experiment, and the experiment is unrealistic
  - The game is rigged
  - The environment is a lab
  - The timeframe is short and defined
- The psychologist's aim is to make claims about teamwork in the workplace, but the environment of this experiment limits her ability to draw general conclusions of this type
- Designing and conducting generalizable experiments is often difficult
29Example 6 Brake lights
- Randomized comparative experiments in the 1980s with fleets of rental and business vehicles concluded that a high center brake light reduced rear-end collisions by 50%!
- Based on this, beginning in 1986 all cars sold in the US have been required to have a high center brake light
- Ten years later the insurance institute compared the rear-end collision rates of the many cars with and without high center brake lights that were on the roads by then
- A high center brake light reduced the rear-end collision rate by only 5%
- What happened?
30Example 6 Brake lights
- The environment in which high center brake lights operate changed
- Before 1986 only a few cars had them, so they were unusual and caught people's eye
- As they became common after 1986, no one paid attention to them anymore, and so they became less effective at alerting people to imminent stops
- So, both measurements are in fact reasonably accurate, but the real effectiveness of high center brake lights changed because the environment in which they operate changed
31Example 7 Are subjects treated too well?
- Patients in medical trials often get, on average, better care than other patients
- They are part of a carefully designed process whose procedures must be followed carefully, and this means that they are checked and monitored more frequently
- Their doctors are often specialists in their field, and they are dedicated to making the experiment work well
- As a result, the environment (level and quality of care) in which the trial patients receive their treatment is likely to be better than that of an ordinary patient
- As a result, the therapy (whatever it is) is not likely to work as well on ordinary patients
32Example 7 Are subjects treated too well?
- So, although any therapy that beats a placebo is
likely to have a positive effect on ordinary
patients, the cure rate measured in the trial
is likely an overestimate of the cure rate when
the therapy is applied to ordinary patients
33Can we generalize?
- When experiments are not fully realistic, it is difficult to generalize the results to a wider population
- Experimenters try very hard to make experiments as realistic as possible, but this is difficult or impossible in some cases
- Experimenters generalizing from students in a lab to workers in the real world must argue based on the findings of their experiments and their knowledge of how people function in the real world
- Generalizing from rats in cages to people is even harder!
34Can we generalize?
- A single experiment is rarely enough; a question must be investigated
  - By multiple experiments
  - In multiple environments
- A convincing case that an experiment is sufficiently realistic to be generalizable rests on both the statistical design of the experiment and the experimenter's knowledge of the subject
- So, good experiments must combine statistical principles and a good understanding of the specific field of study
35Experimental design in the real world
- Up to now the experimental designs we have met all have the same pattern
  - Randomly assign subjects to as many groups as there are treatments
  - Apply the treatments to the groups
- This is a completely randomized design
36Experimental design in the real world
- We have only dealt with experiments that have one explanatory variable
  - Drug vs. placebo
  - Classroom vs. web instruction
37Example 8 Effects of TV advertising
- What are the effects of repeated exposure to an advertising message?
- This may depend on
  - How many times the ad is shown
  - How long the ad message is
- An experiment to study this used undergrads (surprise!) as subjects
- Each had to watch a 40-minute TV program
- During the program some saw a 30-second commercial, others a 90-second commercial, for a digital camera
- The same commercial was repeated 1, 3, or 5 times during the program
38Example 8 Effects of TV advertising
- After viewing the program all subjects were asked about their attitude toward the camera and whether they were likely to buy it
- There are two explanatory variables in this experiment
  - Length of commercial: 2 levels
  - Number of times the commercial was shown: 3 levels
- This means there are a total of 6 (2 x 3) possible treatments in this experiment
- There are also multiple response variables
  - How a subject feels about the camera
  - Whether (or not) the subject wants to buy the camera
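The six treatments are just every combination of the two commercial lengths with the three repetition counts. A short Python sketch of that enumeration follows; the variable names are chosen for this example only.

```python
from itertools import product

lengths = ["30-second", "90-second"]  # explanatory variable 1: 2 levels
repetitions = [1, 3, 5]               # explanatory variable 2: 3 levels

# Each treatment is one (length, repetitions) combination.
treatments = list(product(lengths, repetitions))

for number, (length, times) in enumerate(treatments, start=1):
    print(f"treatment {number}: {length} commercial shown {times} time(s)")

print("total treatments:", len(treatments))
```

In general, the number of treatments is the product of the numbers of levels of the explanatory variables, here 2 x 3.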
40Example 8 Effects of TV advertising
- Frequently, we want to study the combined effects of multiple variables at once
- This is more complicated: the effects of multiple factors can interact to produce results that cannot be predicted from the solitary effects of each factor by itself
- In the TV example, perhaps by themselves both longer commercials and more commercials increase interest in a product
- BUT, maybe when a viewer has to sit through more commercials that are longer, it just gets annoying and their interest in the product suffers
- The six treatments in Example 8 help sort this out
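To see what an interaction looks like in numbers, the sketch below uses invented mean interest scores for the six treatments (they are not results from the study). It compares the score for the longest, most-repeated commercial with what would be predicted if the two effects simply added.

```python
# Hypothetical mean interest scores, keyed by (length in seconds, repetitions).
scores = {
    (30, 1): 4.0, (30, 3): 5.0, (30, 5): 6.0,
    (90, 1): 5.0, (90, 3): 6.0, (90, 5): 4.5,
}

def additive_prediction(scores, length, reps, base=(30, 1)):
    """Score expected if the length effect and repetition effect just add up."""
    base_length, base_reps = base
    length_effect = scores[(length, base_reps)] - scores[base]
    reps_effect = scores[(base_length, reps)] - scores[base]
    return scores[base] + length_effect + reps_effect

predicted = additive_prediction(scores, 90, 5)
actual = scores[(90, 5)]
print("predicted without interaction:", predicted)
print("actual:", actual)
print("interaction:", actual - predicted)
```

With these made-up numbers, longer and more frequent commercials each raise interest on their own, yet the combination scores well below the additive prediction. Only an experiment that includes all six treatments can reveal that kind of pattern.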
41Matched pairs and block designs
- Completely randomized designs are the simplest statistical designs for experiments
- However, they are often inferior to more elaborate statistical designs
- Matching the subjects in various ways can produce more precise results than simple randomization
- One common design that combines randomization with matching is the matched pairs design
42Matched pairs and block designs
- A matched pairs design compares just two treatments
- Assign one of the treatments to each subject in a matched pair by tossing a coin or using random digits
- Sometimes each matched pair in a matched pairs design consists of just one subject who gets the treatments one after another
- Each subject is their own control
- The order of the treatments could influence the subject's response, so the order is randomized
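Both versions of the matched pairs design can be sketched in Python: a coin toss within each pair, or a random treatment order for each single subject. The function names, subject labels, and treatment labels "A" and "B" are hypothetical.

```python
import random

def assign_matched_pairs(pairs, treatments=("A", "B"), seed=None):
    """For each pair, a coin toss decides which member gets which treatment."""
    rng = random.Random(seed)
    assignments = []
    for first, second in pairs:
        order = list(treatments)
        rng.shuffle(order)  # the coin toss for this pair
        assignments.append({first: order[0], second: order[1]})
    return assignments

def randomize_order(subjects, treatments=("A", "B"), seed=None):
    """When each subject is their own control, randomize the treatment order."""
    rng = random.Random(seed)
    return {subject: rng.sample(treatments, k=len(treatments)) for subject in subjects}

# Hypothetical pairs of similar subjects.
pairs = [("Ann", "Beth"), ("Carl", "Dev"), ("Eve", "Fay")]
print(assign_matched_pairs(pairs, seed=7))

# Hypothetical subjects who each receive both treatments.
print(randomize_order(["s1", "s2", "s3"], seed=7))
```

Randomization happens only inside each pair, so differences between pairs cannot be confused with differences between treatments.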
43Example 9 Coke vs. Pepsi
- Pepsi wanted to demonstrate that avowed Coke drinkers would actually prefer Pepsi if they didn't know what they were drinking
- The Coke-drinking subjects each tasted colas from glasses without brand markings and said which they liked best
- This is a matched pairs design in which each subject compares two colas
- Because the order of tasting might affect the response, the order of tasting should be randomized
44Example 9 Coke vs. Pepsi
- More than half the subjects said they liked Pepsi better
- Coke responded that the experiment was biased
- The glasses with Pepsi were labeled "M" while the glasses with Coke were labeled "Q"
- It could be that people just like "M" more than "Q", and this biased their tasting!
- You could do a better job of this yourself. How?
45Matched pairs and block designs
- Matched pairs designs use the principles of comparison and randomization, but
  - The randomization is not complete: the subjects are not all randomly assigned to treatments
  - Instead we only randomize within each matched pair
- This allows us to reduce the effect of variation among the subjects
- Matched pairs is a specific example of a block design
46Matched pairs and block designs
- A block design combines the idea of creating equivalent treatment groups by matching with the principle of creating treatment groups at random
- Blocks are another form of control: they control the effects of some outside variables by bringing them into the experiment in the form of blocks
47Example 10 Men, women and advertising
- Women and men respond to advertising differently
- An experimenter wants to test the effectiveness of three new TV commercials
- She will want to take into account the different ways that women and men will respond
- A completely randomized design would consider women and men together as a single group: randomization would distribute them into the three treatments without regard to sex, and this would ignore the differences between women and men
48Example 10 Men, women and advertising
- A better design would consider women and men as two separate blocks
- Randomly assign the women to three groups, one for each commercial
- Do the same for the men
- Compare the commercials separately for women and men
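The block design for the three commercials can be sketched as a separate randomization inside each block. The function name, the commercial labels, and the subject lists below are assumptions made for illustration.

```python
import random

def block_randomize(subjects_by_block, treatments, seed=None):
    """Randomly assign subjects to treatments separately within each block."""
    rng = random.Random(seed)
    design = {}
    for block, subjects in subjects_by_block.items():
        shuffled = list(subjects)
        rng.shuffle(shuffled)  # randomization happens inside the block only
        design[block] = {
            treatment: shuffled[i::len(treatments)]
            for i, treatment in enumerate(treatments)
        }
    return design

# Hypothetical subjects, already sorted into blocks before the experiment starts.
subjects_by_block = {
    "women": [f"W{i}" for i in range(1, 13)],
    "men": [f"M{i}" for i in range(1, 10)],
}
commercials = ["commercial 1", "commercial 2", "commercial 3"]

for block, groups in block_randomize(subjects_by_block, commercials, seed=11).items():
    for commercial, members in groups.items():
        print(block, commercial, sorted(members))
```

Every commercial is seen by both women and men, and no subject is ever compared across blocks by accident, so the commercials can be compared separately within each block.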
50Matched pairs and block designs
- A block is a group of subjects defined before the experiment starts
- The treatment is a condition that we impose on the subjects during the experiment
- In the last example there are 2 blocks and 3 treatments, not 6 treatments
- The advantages of block designs are similar to those of stratified samples
  - We can draw separate conclusions about each block
  - The results are more precise because they remove the systematic differences between the blocks, for example between women and men
51Matched pairs and block designs
- Blocking is another important idea in the statistical design of experiments
- A good experiment will include blocks based on the most important and unavoidable sources of variability among the experimental subjects
- Randomization then averages out the effects of the remaining variation within each block to allow an unbiased comparison of the treatments within each block
52Summary
- As with samples, experiments need both good statistical design and careful consideration of practical problems
- The placebo effect is strong, so clinical trials and other experiments with people should be double-blind whenever possible
- Double-blinding helps to ensure equal treatment for all subjects except for the treatments that the experiment is comparing
- Experiments suffer from uncooperative subjects
  - Refusal to participate
  - Dropouts
  - Poor compliance (nonadherence)
53Summary
- An important limitation of many experiments is that they cannot generalize widely
  - Special subjects (like college students)
  - Unrealistic treatments
  - Unusual environments
- To help with generalizability, we need to repeat the experiment with different subjects in different environments at different times
- Many experiments use designs more complex than the basic completely randomized design
54Summary
- Matched pairs designs compare two treatments by giving them at random to each of a pair of similar subjects, or in random order to the same subject
- Block designs form blocks of similar subjects and assign treatments at random separately within each block