The Basics of Study Design

1
The Basics of Study Design
Barry Braun, PhD, FACSM
Associate Professor
Director, Energy Metabolism Laboratory
Department of Kinesiology
University of Massachusetts, Amherst, MA
2
A fairy tale

While boardsailing in Belize, physician/scientist
Dr. Dulcinea Toboso gets hit on the head by her
mast and knocked unconscious. She wakes up in a
hut where she is cared for by a tribe of people
who share a remarkable characteristic: every
person is lean and toned, even though they eat
massive meals and do absolutely no exercise. They
tell her the secret is the bark of a rare tree
that only grows in the misty cloud forests that
hide the interior of the island. The bark smells
like elephant feces and, somehow, tastes even
worse.

Though it is strictly forbidden, Dr. Toboso
leaves with several kilograms of bark hidden in
her bathing suit. She flies to San Francisco and
heads to her laboratory to isolate the active
ingredient, which she plans to market as
"Bark-a-Lounge", a dietary supplement designed to
cause fat loss and muscle growth without any need
for exercise. As a conscientious scientist, she
decides to do a research study to show how well
it works. She writes the study design on her
prescription pad and orders her long-suffering
assistant to do the following study:

A group of 12 men she knows from her gym will
participate in the study. They will weigh
themselves at home and then come to the
laboratory so their body fat can be measured
using skinfold calipers. Then they will do as
many pushups and situps as they possibly can.
They will be given 30 doses of "Bark-a-Lounge" in
pill form and told to take 2 per day for about 15
days. Then they will re-weigh themselves, come
back to the lab to have body fat re-measured, and
do as many pushups and situps as possible. Dr.
Toboso is sure that the men will lose fat but
gain strength after taking "Bark-a-Lounge" for 15
days.

5
Objective

Although we have to give Dr. Toboso credit for
even considering actually subjecting her product
to
scientific testing, many of you recognize that
her
study design is not optimal. The overall goal of
this
lecture is to allow you to recognize the
strengths
and the flaws in published studies and media
reports. If you plan to conduct your own studies,
this
lecture will aid you in designing them in a way
that
maximizes their contribution to the body of
scientific
knowledge that is used to enhance the performance
of athletes and the health of the general public.

6
Plan of attack

Part 1: True Lies
What kind of study? Epidemiology vs. experiment,
cross-sectional vs. longitudinal, association and
causality, validity and reliability.
Part 2: Of Mice and (Wo)Men
Humans, animals, or cells? Controlling
confounding variables vs. real-world application.

7
More plan of attack

Part 3: Sub-divide and conquer
How do you attack big, important questions?
One big study or many small ones?
Part 4: The Color of Money
Can the funding source affect the study design?
The results?
Part 5: You can't always get what you want
All studies have flaws. Why continue to do them?

8
Some useful terms

Subjects: participants in a study (the term is
usually only used when participants are human).
Variable: something that can be measured.
Independent variables are controlled by the
investigator (research scientist). Dependent
variables are not.
Treatment: what subjects are exposed to. Also
called exposure or condition.

Outcomes: the dependent variables. The answers
to the question you are interested in.
Control group or condition: what the treatment or
exposure is compared with. Can be the initial
state (baseline) or a group that is given either
no treatment or a non-functional placebo.
Example: relative to starting weight (baseline),
what is the effect on body weight (outcome) when
I give 100 people (subjects) three pints of ice
cream per day for 6 months (treatment) as
compared with 100 people who get no ice cream
(control group)?

10
Epidemiological Studies

One or more characteristics of a population
(e.g. weight, blood lipids, or dietary habits)
are assessed, usually with questionnaires,
although other techniques are used as well.
Subjects are not asked to change behavior or
subjected to treatments like exercise or diet
change.
Researchers do not control the experimental
conditions; they are trying to understand
behavior, physiology, or metabolism in a natural
setting.

11
Cross Sectional

The variables of interest are measured once.
E.g., survey 600 subjects (300 women and 300 men)
and measure height. Exposure is gender and the
outcome is height.
Mean (average) height for men: 175 cm
Mean height for women: 165 cm
Based on your data, you might conclude that men
are taller than women.

Note that not EVERY man was taller than EVERY
woman. There is a lot of variation in human
height (let's say men in your sample ranged from
155-195 cm and women from 148-185 cm).
But the average or mean height for men (175 cm)
is greater than the mean height for women (165
cm).

[Figure: height scale in cm marking 148, 165,
175, and 195]
13

Because there is so much variation in height
within each gender (about 40 cm in your sample)
compared to the mean DIFFERENCE in height (only
10 cm), you need to study a lot of subjects to
see a difference between men and women that
accurately represents the population.
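
A rough simulation can make the sample-size point
concrete. The sketch below is illustrative only:
the 175 cm and 165 cm means come from the example
above, but the 7 cm standard deviation, the
bell-shaped (normal) distribution, and the
function names are assumptions, not data from a
real survey.

```python
import random
import statistics


def mean_difference(n_per_group, rng, mean_m=175.0, mean_w=165.0, sd=7.0):
    """Draw one simulated sample and return mean(men) - mean(women) in cm.

    The means come from the slide; sd=7.0 is an assumed value.
    """
    men = [rng.gauss(mean_m, sd) for _ in range(n_per_group)]
    women = [rng.gauss(mean_w, sd) for _ in range(n_per_group)]
    return statistics.mean(men) - statistics.mean(women)


def spread_of_estimates(n_per_group, repeats=500, seed=1):
    """Repeat the survey many times and report how much the estimated
    difference varies from one survey to the next (standard deviation)."""
    rng = random.Random(seed)
    diffs = [mean_difference(n_per_group, rng) for _ in range(repeats)]
    return statistics.stdev(diffs)


small = spread_of_estimates(5)    # 5 men and 5 women per survey
large = spread_of_estimates(300)  # 300 men and 300 women per survey
print(f"spread with 5 per group:   {small:.2f} cm")
print(f"spread with 300 per group: {large:.2f} cm")
```

Under these assumed numbers, the estimated
difference swings by several centimeters from one
survey to the next with 5 people per group, and by
well under 1 cm with 300 per group.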

Although very useful to illustrate a relationship
between exposures and outcomes, a problem with
observational studies is that you often can't
determine if the exposure caused the outcome.
Let's say you are interested in whether doing a
lot of aerobic exercise lowers the risk for
getting cancer, in particular skin cancer. You
send out surveys to hundreds of people asking
about their exercise habits and whether they had
skin cancer. This is a case-control study: it
compares people who got a disease (cases) with
those who didn't (controls).

15
Retrospective studies

You could do this study retrospectively; that is,
you could look through medical records, find
cases of skin cancer, and mail surveys to the
people you identified asking them about their
exercise habits.
The downside to this approach is that you depend
on people's memory of their past habits. You
might minimize this problem by having people mail
you their training diaries, but many will be
non-existent or incomplete and you have no way to
determine whether or not they are accurate.

16
Prospective studies

You can also do this study prospectively. You
start with a group of individuals who DON'T have
the disease and track them for some period of
time. Then, you look for differences between
people who got the disease vs. those who didn't.
You might randomly contact 5000 people from the
phone book and assess their exercise habits every
year. At the end of 5 years you would see who got
skin cancer and if there was a relationship
between time spent exercising and a diagnosis of
skin cancer.

The advantage of a prospective design is that the
subjects are followed longitudinally, that is,
over time, rather than cross-sectionally, which
only gives a single snapshot at one time point.
But to get meaningful comparisons you need to
have a fairly large number of people who get the
disease so that you can separate them into groups
that differ by exercise habits. And some of the
subjects will move away or lose interest over
time.
So getting accurate results often requires
recruiting and tracking thousands of people for
multiple years.
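
A back-of-the-envelope calculation shows why the
numbers get so large. This is only a sketch: the
5000 people and 5 years come from the example
above, while the 0.2% yearly incidence, the 5%
yearly dropout, and the function name are made-up
values chosen for illustration.

```python
def expected_cases(cohort_size, annual_incidence, annual_dropout, years):
    """Estimate how many disease cases a cohort study will observe.

    Each year a fraction of the people still being tracked is diagnosed,
    and a fraction of the remaining people is lost to follow-up.
    All rates passed in below are hypothetical.
    """
    at_risk = float(cohort_size)
    cases = 0.0
    for _ in range(years):
        new_cases = at_risk * annual_incidence
        cases += new_cases
        at_risk -= new_cases             # diagnosed people leave the at-risk pool
        at_risk *= 1.0 - annual_dropout  # some subjects move away or lose interest
    return cases


cases_5000 = expected_cases(5000, annual_incidence=0.002, annual_dropout=0.05, years=5)
cases_50000 = expected_cases(50000, annual_incidence=0.002, annual_dropout=0.05, years=5)
print(f"expected cases with 5000 people:  {cases_5000:.0f}")
print(f"expected cases with 50000 people: {cases_50000:.0f}")
```

With these invented rates, 5000 people yield only
a few dozen cases after 5 years, which is not many
once you split them into groups by exercise
habits.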

18
Questions and answers

Let's say that your results show that people who
run, cycle, and swim more than 20 hours/week have
higher rates of skin cancer than people who don't
exercise at all. Can you conclude that triathlon
training causes skin cancer? Alert the media!
Most triathletes spend an enormous amount of time
outdoors with a lot of skin exposure to the sun.
So is it exercise that causes more skin cancer or
is it more exposure to UV radiation from the sun?
Unless you collected data on sun exposure in your
survey, you would have no way to know.

19
Isolating the outcome of interest

With enough subjects and enough information
there are statistical methods to separate the
key variables. E.g., if you had good data on both
exercise habits and sun exposure, you might see
that once you remove or factor out the sun
exposure variable, there is no longer any
association between exercise habits and skin
cancer. In that case it is sun exposure, not
exercise, that increases the risk for skin
cancer.
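
One simple way to "factor out" a variable is to
compare like with like: look at exercise and skin
cancer separately within people who get a lot of
sun and within people who get little. The sketch
below uses invented counts (not real data) and
hypothetical names, and stratification is just one
of several statistical approaches.

```python
# (exercise level, sun exposure) -> (number of people, skin cancer cases)
# All counts are invented for illustration.
survey = {
    ("high", "high"): (800, 80),
    ("high", "low"): (200, 4),
    ("low", "high"): (200, 20),
    ("low", "low"): (800, 16),
}


def crude_risk(exercise):
    """Risk of skin cancer for one exercise level, ignoring sun exposure."""
    people = sum(n for (ex, _), (n, _) in survey.items() if ex == exercise)
    cases = sum(c for (ex, _), (_, c) in survey.items() if ex == exercise)
    return cases / people


def stratum_risk(exercise, sun):
    """Risk of skin cancer for one exercise level within one sun-exposure group."""
    people, cases = survey[(exercise, sun)]
    return cases / people


print("ignoring sun:  high exercise", crude_risk("high"), "low exercise", crude_risk("low"))
for sun in ("high", "low"):
    print(f"{sun} sun: high exercise", stratum_risk("high", sun),
          "low exercise", stratum_risk("low", sun))
```

In this made-up data set, heavy exercisers look
worse overall only because most of them are in the
high-sun group. Within each sun-exposure group the
risk is the same for both exercise levels.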

Take another example. Let's say you want to test
the hypothesis that a high intake of fat
increases the risk for heart disease. You would
need to:
1. accurately identify the men and women in
the population who get heart disease
2. accurately assess how much fat is in the
diet of each person
3. compare dietary fat in people who get heart
disease with dietary fat in people who don't

21
[Figure: made-up graph of people who get heart
disease (vertical axis) against dietary fat as a
% of total kilocalories (horizontal axis)]

This graph (I made it up) says that the number
of people who get heart disease increases as
the amount of fat in the diet increases.
What are potential problems with this story?
Well, did
we measure what we thought we were measuring?

22
Validity

Validity refers to the accuracy or truthfulness
of a measurement. In other words, are you
actually measuring what you think you are
measuring?
This can be obvious (using a body weight scale to
measure body fat), less obvious (are lower blood
lipids after starting exercise training due to
training or accompanying weight loss?) or very
subtle (do athletes perform better when given
carbohydrate during exercise because the sugar
does something directly or because they think
they should do better when given carbohydrate?)

23
Measuring physical activity

Activity monitors are a good example of how
difficult it can be to develop tools that yield
valid
measurements of physical activity. There are
many types of activity monitors available:
pedometers, accelerometers, etc.
If you are a scientist interested in accurately
measuring daily physical activity, how valid are
these tools?

For example, you decide that collecting physical
activity information using questionnaires is too
subjective and prone to bias, so you decide to
measure it objectively using an activity monitor
that is worn on the hip and is sensitive to
motion.
You give the monitors to 20 people and record
their physical activity for 7 days. 10 of your
subjects are world-class cyclists and 10 are
typical college students. After 7 days your
measurements indicate the college students are
more active than the elite cyclists! How can this
be?

Since the activity monitor only measures
movement in the vertical plane, the 600 miles
each of your cyclists covered during the week on
their bicycles were not detected as movement by
the monitor.
This is an extreme case, but researchers are
constantly forced to consider: am I really
measuring what I need to measure?

26
What do your subjects eat?

One of the most common measurements
attempted in Sport Nutrition is diet analysis. It
seems straightforward: you collect information
from subjects about what they eat over the course
of a few days and enter the foods into a database
which spits out grams of carbohydrate and protein
and thiamine and iron and vitamin C, etc.
In reality, the measurement is fraught with
potential inaccuracy.

27
Sources of potential error

How do you account for portion size? Estimate
based on showing the subjects plastic food models
before you start the study? Have them weigh their
food? Better, but they have to carry their scales
everywhere with them. What about combination
foods? How do they tell you ingredients and
portion sizes of the seafood paella they had at
their best friend's wedding? And how do you know
they are remembering to report everything they
ate?

And the process of having to weigh their food and
write everything down changes their typical
behavior.
People avoid foods that are difficult to record
accurately and start choosing easy things like
prepackaged foods that are conveniently labeled.
Diet records are often inaccurate even in the
hands of experienced users. Many subjects
under-report their actual food intake by hundreds
of kilojoules/day. In contrast, women with eating
disorders may OVER-report actual food intake.

29
Internal Validity

Chance: what is the chance that the outcome you
observe could occur even with NO association
between the exposure and outcome you measure?
Measured statistically and reported as a p-value,
the probability of obtaining a result at least
this extreme by chance alone if there were truly
no association. A p-value < 0.05 (5%) is commonly
defined as statistically significant. This means
that a result like the one observed would turn up
less than 5% of the time if chance were the only
thing at work.
Is this good enough? Is it too restrictive?
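
To see where a p-value can come from, here is a
minimal sketch of one approach, an exact
permutation test: shuffle the group labels every
possible way and count how often chance alone
produces a gap as big as the one observed. The
data (extra pushups after a supplement vs. a
placebo) are invented, the function name is
hypothetical, and real studies often use other
tests.

```python
from itertools import combinations


def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation test on the difference in group means.

    Every way of splitting the pooled values into two groups of the
    original sizes is treated as equally likely under "no association".
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)

    def mean_gap(sum_a):
        # Difference in means when group A holds values adding up to sum_a.
        return sum_a / n_a - (total - sum_a) / n_b

    observed = abs(mean_gap(sum(group_a)))
    as_extreme = 0
    splits = 0
    for indices in combinations(range(len(pooled)), n_a):
        splits += 1
        sum_a = sum(pooled[i] for i in indices)
        # Small tolerance guards against floating-point round-off.
        if abs(mean_gap(sum_a)) >= observed - 1e-9:
            as_extreme += 1
    return as_extreme / splits


supplement = [12, 15, 14, 16, 13]  # invented data
placebo = [10, 11, 9, 12, 10]      # invented data
p_value = permutation_p_value(supplement, placebo)
print(f"p = {p_value:.4f}")
```

For these made-up numbers only 4 of the 252
possible splits give a gap as large as the
observed one, so p is about 0.016, below the usual
0.05 cutoff.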

What are the consequences of getting it wrong?
Willing to accept an error rate higher than 5% if
the consequence is getting the wrong sandwich.
Not willing to accept an error rate greater than
0.1% if the consequence is landing on jagged
rocks.
Every reader will have to use their own judgment
regarding their comfort level with a given
probability that the results are due to chance.
Most journal editors have a comfort level right
at 5%.

Bias: a systematic error that misrepresents the
association between the treatment and outcome.
Investigators may design the study in a way that
makes it more likely to get a particular outcome.
Or, in conducting the study, they may treat the
subjects in one group differently than in the
other group (e.g. more encouragement during a
maximal exercise test with the treatment than
with the placebo).
Subjects can bias a study as well. Food intake is
often not accurately reported, e.g. because of
faulty memory or wanting to supply the right
answer.

32
Reliability

Reliability refers to the reproducibility of a
measurement. Measurement tools (surveys,
activity monitors, etc) are often tested
extensively before being used in studies to
determine if the values they report are
reproducible. Reliability is the main reason
researchers often need to make multiple
measurements over several days.

33
Reliability
It is important to be clear on the distinction
between validity and reliability. A measurement
can be reliable but not valid, i.e., it gives the
same incorrect value every time. Investigators
require results to be both reliable and valid.
[Figure: three clusters of x marks labeled
"Reliable but not valid", "Neither", and
"Reliable AND Valid"]
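
The same distinction can be put in numbers:
reliability is about how tightly repeated
measurements cluster, validity is about whether
they cluster around the true value. The sketch
below uses three imaginary bathroom scales
weighing a person whose true weight is 70 kg. The
readings, the 1 kg cutoffs, and the function name
are all assumptions chosen for illustration.

```python
import statistics


def describe(readings, true_value, max_bias=1.0, max_spread=1.0):
    """Label a set of repeated measurements using the slide's three cases.

    bias   = mean reading minus the true value (validity)
    spread = standard deviation of the readings (reliability)
    The cutoffs are arbitrary illustrative thresholds. An unreliable
    tool is labeled "neither" even if its readings average out close
    to the truth, matching the three categories on the slide.
    """
    bias = statistics.mean(readings) - true_value
    spread = statistics.stdev(readings)
    reliable = spread <= max_spread
    if reliable and abs(bias) <= max_bias:
        label = "reliable and valid"
    elif reliable:
        label = "reliable but not valid"
    else:
        label = "neither"
    return bias, spread, label


TRUE_WEIGHT = 70.0  # kg, assumed known for the sake of the example
scales = {
    "scale A": [75.1, 74.9, 75.0, 75.0],  # tight cluster, wrong place
    "scale B": [60.0, 68.0, 64.0, 72.0],  # scattered and off target
    "scale C": [70.1, 69.9, 70.0, 70.0],  # tight cluster, right place
}
labels = {name: describe(r, TRUE_WEIGHT)[2] for name, r in scales.items()}
for name, label in labels.items():
    print(name, "->", label)
```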
34
Reliability: influences on measurements

Some measurements, e.g. maximal oxygen
consumption (VO2max), are very reliable. You can
measure VO2max on different days, different times
of day, before or after a snack, and the results
will almost always be within a few % of each
other.
On the other hand, resting metabolic rate varies
day to day and is very sensitive to time of day,
food intake, exercise, room temperature, etc. You
need very controlled conditions and have to
repeat measurements at least 3 times.

Back to the made-up graph, which indicates that
the number of people who get heart disease
increases as the amount of fat in the diet
increases.
What are other potential problems with this
story? Did we account for all the other
confounding variables?

A confounding variable is one that is associated
with both the exposure and the outcome and that
affects the association between the exposure and
outcome.

[Diagram linking: more exercise hours per week,
more sun exposure, more skin cancer]
The relationship between exercise and skin cancer
is confounded by strong relationships between
exercise and sun exposure and between sun
exposure and skin cancer. Trying to minimize
confounding variables is the most difficult and
time-consuming part of study design.
37

Can we accurately measure the rate of heart
disease (probably) and the amount of fat in the
diet
(much more problematic)?
Do other factors need to be considered?
gender (true for men AND women?),
age (maybe elderly people eat more fat)
ethnicity (directly or indirectly)
other risky behavior (smoking, lack of
exercise,
less frequent physicals, etc.) in people who
eat more fat in diet?

Can you consider all the other factors?
Clearly not, because we don't even know what they
all are (e.g. there is a lot of recent evidence
that the conditions a fetus encounters in utero
can have an impact on adult-onset disease).
Even if you could, does a positive relationship
between 2 things (as 1 goes up, the other also
goes up) prove that one causes the other?

39
[Figure: price of gasoline plotted with distance
from the Earth to Saturn]

During this time period (2005), there was a
strong association between the distance from
Earth to Saturn and the price of gasoline. Did
gasoline prices rise because Earth was getting
farther from Saturn?
The relationship is a coincidence.
Association does not mean causality.
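
Any two things that both happen to drift upward
over the same months will show a strong
correlation. The sketch below computes the usual
Pearson correlation coefficient for two short
series. The numbers are invented stand-ins, not
real 2005 gasoline prices or astronomical
distances, and the function name is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Invented monthly values that both happen to rise over the same period.
saturn_distance = [8.1, 8.4, 8.8, 9.3, 9.7, 10.1]  # made-up distances
gas_price = [1.85, 2.05, 2.20, 2.45, 2.80, 2.95]   # made-up prices

r = pearson_r(saturn_distance, gas_price)
print(f"correlation = {r:.2f}")
```

The correlation comes out close to 1, yet nothing
in the calculation says anything about whether one
series drives the other.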

So, epidemiological studies are difficult to
design
in a way that gives you clear, definitive
answers.
To get a sharper picture of the causal
relationships
between diet and health or performance you can do
an experimental study.
Take a group of healthy people, feed them
different
amounts of fat, and see who gets heart disease?

41
Experimental Studies

The key difference from an observational study is
that the investigator actively manipulates the
treatment instead of letting things happen by
chance. Because the experimental conditions are
controlled, there is a much greater chance that
the outcomes are directly related to the
treatment.
A disadvantage is that by manipulating the
conditions, the results may have less direct
relevance to what happens in the real world.

42
Experimental Studies

In experimental research, study subjects (whether
human or animal) are selected according to
relevant characteristics and then assigned to
either an experimental group or a control group.
The subjects in the experimental group receive
treatment and the control group receives no
treatment or a placebo. If you do this
correctly, you can assume that differences
between the groups at the end of the study were
caused by the treatment.

43
Experimental Cross Sectional

Experimental studies can be cross-sectional
(multiple groups getting a single treatment) or
cross-over (one group getting multiple treatments
including control). In a cross-sectional design,
subjects are randomly assigned to either a
treatment or a control group. They are exposed
to the treatment or control for a period of time
and then the outcome is compared between the two
groups. Let's say you wanted to test whether
consuming only simple sugars for 28 days would
cause more synthesis of muscle glycogen compared
with a normal diet.

Your cross-sectional design might look something
like this:

1. Baseline test of muscle glycogen synthesis
2. Groups randomly assigned (Group 1, Group 2)
3. 28 days
4. Re-test of muscle glycogen synthesis
45
Assigning subjects to groups

One of the keys to doing this right is to ensure
that the 2 groups of subjects are as similar as
possible. To do this, subjects are usually
randomly assigned to the treatment or control
group.
An alternative is to match subjects in each group
on some key characteristics (e.g. age, weight,
training status, aerobic capacity). This helps to
distribute any characteristics that might
influence the results across the groups.
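
Random assignment itself is simple to carry out.
The following is a minimal sketch, assuming 40
subjects identified by made-up ID codes. The
function name and the seed value are hypothetical
(a fixed seed just makes the example repeatable).

```python
import random


def randomly_assign(subject_ids, seed=None):
    """Shuffle the subjects and split them into two equal-sized groups."""
    rng = random.Random(seed)
    shuffled = list(subject_ids)  # copy so the original list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


subjects = [f"S{i:02d}" for i in range(1, 41)]  # hypothetical IDs S01..S40
treatment_group, control_group = randomly_assign(subjects, seed=2024)
print("treatment:", sorted(treatment_group))
print("control:  ", sorted(control_group))
```

Because chance alone decides who goes where,
neither the investigator nor the subjects can
steer the fitter (or older, or heavier) people
into one group.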

The following example shows why randomization is
important.
Researchers want to determine if a high-fat diet
during marathon training can improve performance.
They give a baseline (before any treatment) test
of aerobic fitness to all of the potential
subjects. Then they assign them to different
groups: 20 to the high-fat diet group and 20 to
the high-carbohydrate diet group. Then they train
them using the different diets for 12 weeks.

At the end of that time, they redo the test of
aerobic fitness and find that the high-fat group
has improved considerably more (increased VO2max
from 45 to 52 ml/kg/min) than the
high-carbohydrate group (only increased from 68
to 70 ml/kg/min). They report in all of the media
outlets that runners can gain twice the training
effect by using a high-fat diet. Is this
reasonable?

Notice that the baseline VO2max was considerably
higher in the high-carbohydrate group. Runners
were clearly not randomly assigned: the
high-carbohydrate group seems to have contained
really fit elite runners (whose VO2max is already
about as high as it can be) and the high-fat
group looks like mainly novice runners (who can
improve a lot with training).
If the groups had been randomly assigned, the
baseline VO2max would have been similar in the 2
groups. In that case, a larger improvement in the
high-fat group could be interpreted as due to the
diet (assuming everything else had been done
right!).
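
Laying the numbers side by side makes the problem
easy to spot. This short sketch uses the VO2max
values from the example (45 to 52 and 68 to 70
ml/kg/min). The function and variable names are
just illustrative.

```python
def change(baseline, final):
    """Return the absolute change and the change as a % of baseline."""
    absolute = final - baseline
    percent = 100.0 * absolute / baseline
    return absolute, percent


# (baseline VO2max, final VO2max) in ml/kg/min, taken from the example
groups = {
    "high-fat": (45, 52),
    "high-carbohydrate": (68, 70),
}

summary = {name: change(b, f) for name, (b, f) in groups.items()}
baseline_gap = groups["high-carbohydrate"][0] - groups["high-fat"][0]

for name, (absolute, percent) in summary.items():
    print(f"{name}: +{absolute} ml/kg/min ({percent:.1f}% of baseline)")
print(f"gap between groups before any diet: {baseline_gap} ml/kg/min")
```

The 23 ml/kg/min gap between the groups before the
diets even started is far larger than either
training improvement, so the two groups were never
comparable.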

49
Blinding

Randomization is often blinded to limit
experimental bias (an interest in having a
particular
result). Blinding is used to prevent bias from
influencing the behavior of both the
investigators
and the subjects. There are two types of
blinding,
single blind and double blind. In a single
blinded
study the investigators know which treatment the
subjects are getting but the participants do not.
In a
double blinded study, a neutral third party
assigns
the groups and neither the investigators nor the
participants are aware of the group assignments.

A drawback of cross-sectional study design is
that
no matter how well you match the 2 groups on
important characteristics like age, height,
weight,
fitness, etc., there is no way to do this
perfectly.
Two groups may be similar but they can't be
identical, meaning inter-individual variability
(genetic and other differences between people)
will be a limitation to showing clear differences
between the treatment and the control groups.
Wouldn't it be great if you could clone each
subject and use their clone in the other group?

51
Experimental Cross Over

In a cross over design, subjects serve as their
own controls. Half of the subjects get the
treatment and the other half get placebo. Then
the same subjects undergo the opposite protocol.

1. Baseline test of muscle glycogen synthesis
2. Order of treatment randomly assigned (½ of
group to each order)
3. 28 days on the first condition
4. Re-test of muscle glycogen synthesis
5. 1 month washout
6. 28 days on the other condition
7. Final test of muscle glycogen synthesis
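
Because each subject gets both conditions, the
comparison can be made within each person rather
than between people. The sketch below shows the
idea with invented glycogen synthesis values
(arbitrary units) for 6 hypothetical subjects. It
is not data from any study.

```python
import statistics

# One value per subject under each condition (invented, arbitrary units).
# Subjects differ a lot from each other, but each responds similarly.
placebo = [50, 80, 65, 95, 70, 110]
treatment = [55, 84, 71, 99, 76, 115]

# Within-subject differences: each person serves as their own control.
differences = [t - p for t, p in zip(treatment, placebo)]

mean_difference = statistics.mean(differences)
spread_within = statistics.stdev(differences)  # variation in the differences
spread_between = statistics.stdev(placebo)     # variation between people

print("differences per subject:", differences)
print("mean difference:", mean_difference)
print(f"spread of differences: {spread_within:.2f}")
print(f"spread between subjects: {spread_between:.2f}")
```

In this made-up example the differences between
people are large, but the within-person change is
very consistent, which is exactly the situation in
which a cross-over design pays off.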
52
Washout period

A potential problem with the cross-over design is
that effects of the first condition (e.g.
treatment)
may have an impact on the response to the second
treatment (e.g. control). The solution is to put
a
washout period between the 2 conditions to
allow
the effects of the first condition to disappear.
This washout period may be long (months for
some interventions like training or lipid-soluble
anabolic agents). This makes the study very
lengthy
and it can be difficult to keep subjects in the
study.

53
External Validity

Also referred to as generalizability, meaning how
applicable the results are to the general
population. To increase the external validity,
investigators can study subjects varying in
gender, race, ethnicity, age, weight, etc. By
doing this, it is more likely that results can be
applied to the general population.

54
Overgeneralizing

Many classic studies in nutrition (for example,
the response to semi-starvation and re-feeding,
and human protein requirements) were performed
almost solely using Caucasian, male, healthy
subjects in their 20s and 30s.
Nutritional requirements were generalized from
those studies to the entire population, despite
few data on women, children, ethnic/racial
minorities, or people with underlying health
problems.

55
Trade-offs

All major funding agencies now mandate inclusion
of women and minorities or require a strong
justification for not doing that.
Why not include as many types of subjects as
possible in order to maximize the external
validity?
Increasing external validity also means
increasing the number of potential confounding
variables. In some studies, it is more prudent to
use a specific population to minimize confounding
variables.

56
Basic research studies

Experiments under highly controlled conditions
are often necessary to confirm observations or
uncover how a process works (the mechanism of
action). They may be conducted in vitro (e.g.
with cell populations on culture plates) or with
animals.
These studies allow the investigator to isolate
one variable of interest without confounding
variables such as environmental factors, genetic
variation, and differences in dietary or physical
activity patterns.