Study design Sample selection - PowerPoint PPT Presentation

Transcript and Presenter's Notes



1
Study design / Sample selection
  • Summer Course
  • Brian Healy

2
What have we learned so far?
  • Types of data
  • Descriptive statistics
  • Tables and graphs
  • Basics of R

3
What are we doing today?
  • Association vs. causation
  • Types of studies
  • How do we choose a sample?
  • Generalizability
  • Sampling variability

4
Big picture
  • The type of study and the sample selection
    determine the type of conclusions one can make
    from the study
  • Sometimes the study design can be chosen by the
    experimenter, but often the study design is
    dictated by the question of interest. For
    example, if we are interested in determining the
    difference between men and women we cannot
    randomize people to gender

5
Association vs Causation
  • Association
  • Definition: two factors are related, e.g. height
    and weight
  • There is no implied cause and effect relationship
  • Causation
  • Definition: one factor causes a second factor,
    either directly or indirectly, e.g. lying in the
    sun causes sunburn
  • The causal factor is required for the outcome to
    occur

6
Types of studies
  • Intervention
  • Randomized trial
  • Observational
  • Cohort study
  • Case-control study

7
Randomized clinical trial
  • Definition: a study in which patients are randomly
    assigned their exposure
  • One of the most common study types, especially
    for determining the effect of a treatment on an
    outcome because patients can be randomly assigned
    treatment
  • Advantages
  • Can draw causal conclusions
  • Controls for known and unknown risk factors
    related to the outcome and exposure
  • Disadvantages
  • More difficult to design
  • Only applies to certain types of exposures
  • Adherence

8
Why randomize?
  • We randomize subjects to treatment to ensure that
    patients in each group are exchangeable, meaning
    that the two groups are the same, on average,
    other than the treatment assignment
  • This allows the investigator to say that the
    difference in the outcome between the two groups
    is caused by the treatment
  • Therefore, we can say the
    CE = Y(treatment group) - Y(no treatment group)

9
How do we randomize?
  • The choice of randomization technique is often
    based on the type of intervention
  • For a drug treatment, we can randomize
    individuals
  • For a group treatment, we may need to randomize
    at the hospital or school level
  • The key to any randomization is to ensure that
    the groups are the same other than the exposure
    of interest
  • Simplest technique is to use a random number
    generator and assign people based on this number

10
When can we not randomize?
  • Exposures that cannot be changed
  • Sex
  • Age
  • Exposures that are not ethical to assign
  • Smoking
  • Obesity
  • Rare diseases
  • If we randomize to exposure, we may never observe
    the disease

11
Observational studies
  • Def: studies in which the group membership is
    simply observed, rather than assigned by the
    investigator
  • Advantages
  • Do not need to intervene on subjects
  • Can complete studies after disease has already
    occurred
  • Disadvantages
  • More difficult to make causal conclusions
  • Must collect many other factors to ensure you
    have controlled for confounders

12
Observational studies
  • Case-control study: patients are broken into
    groups based on disease status and exposure
    status is compared
  • Cohort study: patients are broken into groups
    based on exposure status and disease status is
    compared
  • Prospective cohort study: patients are observed
    with a specific exposure and followed over time
    to determine if they develop the disease
  • Retrospective cohort study: patients are observed
    after the disease may or may not have developed,
    but groups are still defined by exposure status

13
Case-control study
  • Rare disease
  • With a rare disease, you cannot match based on
    the exposure because you are rarely going to have
    anyone develop the disease
  • This design allows the study participants to be
    found using all available cases and the
    appropriate controls. Finding appropriate
    controls is often the most important design
    aspect of the study.
  • There is a link between the odds ratio of
    exposure given disease and the odds ratio of
    disease given exposure, which allows conclusions
    about the exposure

14
Choosing controls
  • In a case-control study, it is often difficult to
    determine the best control group
  • Depending on the exposure of interest, one can use:
  • Uninfected family members (Likely to participate)
  • Other patients in the hospital (Convenient
    sample)
  • Neighbors (Environmental factors)
  • Random people
  • The key to any set of controls is to ensure that
    selection bias is limited.
  • Must also consider recall bias

15
Prospective cohort study
  • Advantages
  • Ensures a temporal relationship between the
    exposure and the outcome
  • Easier to find controls
  • Limited recall bias
  • Disadvantages
  • Can take significant time
  • Requires many factors to be collected
  • Often many are lost to follow-up
  • Only practical for non-rare diseases

16
Retrospective cohort study
  • Advantages
  • Can collect all of the data at one time, no
    problem with loss to follow-up
  • Can use relative risk and risk difference, which
    are more interesting than the odds ratio used in
    case-control studies
  • Disadvantages
  • Can have recall bias
  • Temporal relationship may not be clear

17
How do we pick our sample?
  • Choosing exposure group
  • Choosing control group
  • Sample size

18
Reasons for differences between groups
  • Actual effect: when there is a real difference
    between the two groups (e.g. the treatment has an
    effect)
  • Chance
  • Bias
  • Confounding

19
Chance
  • When we run a study, we can only take a sample of
    the population. Our conclusions are based on the
    sample we have drawn. Just by chance, sometimes
    we can draw an extreme sample from the
    population. If we had taken a different sample,
    we may have drawn different conclusions. We call
    this sampling variability.
  • The key for this is that the sample must be
    different purely by chance, not because of a flaw
    in the study design, which is a type of bias.

20
Example
  • Let's say we were interested in determining if
    there is a difference in the average height in
    Boston and New York.
  • Assume that the distributions are the same and we
    take a random sample from each population (B
    and N).
  • Just by chance, sometimes these will be very
    different

[Figure: the two samples, B (Boston) and N (New York)]
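The Boston and New York example can be simulated to see how much two samples from identical distributions differ by chance alone. This is a rough sketch: the normal distribution, the mean of 170 cm, the standard deviation of 10 cm and the sample size of 25 are all assumptions chosen for illustration.

```python
import random
import statistics

def mean_difference(n, rng, mu=170.0, sigma=10.0):
    """Draw one sample of size n per city from the SAME distribution
    and return the difference in sample means (B minus N)."""
    boston = [rng.gauss(mu, sigma) for _ in range(n)]
    new_york = [rng.gauss(mu, sigma) for _ in range(n)]
    return statistics.mean(boston) - statistics.mean(new_york)

def simulate(n_studies=1000, n=25, seed=1):
    """Repeat the two-sample study many times and collect the differences."""
    rng = random.Random(seed)
    return [mean_difference(n, rng) for _ in range(n_studies)]

if __name__ == "__main__":
    diffs = simulate()
    print("average difference:", round(statistics.mean(diffs), 2))
    print("largest absolute difference:", round(max(abs(d) for d in diffs), 2))
```

The average difference across repeated studies sits near zero, yet individual studies can show a sizeable gap even though the two populations are identical.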
21
Chance
  • The most common application of biostatistics
    determines the probability of observing a
    difference at least as large as the one seen if
    chance alone were at work. If this probability is
    sufficiently small, we say that the difference is
    likely not due simply to chance and we have an
    actual effect. We call this probability the
    p-value and the cut-off the alpha level.
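One way to make the p-value concrete is a permutation test, sketched below. This is not necessarily the method used later in the course: group labels are shuffled repeatedly, and the p-value is the share of shuffles whose difference in means is at least as large as the observed one. The function name, the measurements and the alpha level of 0.05 are assumptions.

```python
import random
import statistics

def permutation_p_value(group_a, group_b, n_perm=5000, seed=0):
    """Two-sided permutation p-value for a difference in means."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    return extreme / n_perm

if __name__ == "__main__":
    # HYPOTHETICAL measurements for two groups
    treated = [12, 15, 14, 16, 18, 17, 15, 19]
    control = [11, 12, 13, 10, 14, 12, 11, 13]
    alpha = 0.05  # assumed cut-off
    p = permutation_p_value(treated, control)
    print("p-value:", p, "| below alpha:", p < alpha)
```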

22
Bias
  • Selection bias: patients in one group are more
    likely to have the disease
  • Ex. Patients in exposure group are systematically
    more ill than patients in control group
  • Recall bias: patients in one group have better or
    worse memory of the exposure
  • Ex. Studies comparing the diet / exercise of
    mothers of babies with and without birth defects
  • Observation bias: patients or doctors in one
    group are more likely to detect disease
  • Ex. Doctors of patients with a disease
    susceptibility may be more likely to spot the
    disease because they are looking for it
  • Interviewer bias: people conducting interviews
    gather information differently in two groups

23
  • You usually cannot correct for bias after the
    study has been conducted
  • There is some research into using ideas of DAGs
    and causal inference to control for selection
    bias (EPI 289), but you need to collect data on
    the reason for the bias
  • You must anticipate the possible types of bias in
    your study design

24
Confounding
  • Def: when a factor is related to the exposure and
    the outcome and therefore can mask the true
    effect
  • To control for confounding, you must collect
    information about all of the possible confounders.

25
  • As an example, think about alcohol consumption
    and lung cancer. It can be shown that there is an
    association between alcohol consumption and lung
    cancer, but this association may arise because
    smoking causes lung cancer and smoking is
    associated with alcohol use

[Diagram relating cigarette smoking, alcohol use and lung cancer]
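The alcohol, smoking and lung cancer example can be illustrated with a small stratified table. The counts below are entirely hypothetical, constructed so that alcohol has no effect within either smoking group; they are not data from any study. The crude comparison still shows an association because drinkers are more often smokers.

```python
def risk(cases, total):
    return cases / total

# HYPOTHETICAL counts: (lung cancer cases, number of people)
COUNTS = {
    ("smoker", "drinker"): (80, 800),
    ("smoker", "non-drinker"): (20, 200),
    ("non-smoker", "drinker"): (2, 200),
    ("non-smoker", "non-drinker"): (8, 800),
}

def crude_relative_risk(counts):
    """Relative risk for drinkers vs non-drinkers, ignoring smoking."""
    cases = {"drinker": 0, "non-drinker": 0}
    people = {"drinker": 0, "non-drinker": 0}
    for (_, alcohol), (c, n) in counts.items():
        cases[alcohol] += c
        people[alcohol] += n
    return (risk(cases["drinker"], people["drinker"])
            / risk(cases["non-drinker"], people["non-drinker"]))

def stratified_relative_risks(counts):
    """Relative risk for drinkers vs non-drinkers within each smoking group."""
    result = {}
    for smoking in ("smoker", "non-smoker"):
        c1, n1 = counts[(smoking, "drinker")]
        c0, n0 = counts[(smoking, "non-drinker")]
        result[smoking] = risk(c1, n1) / risk(c0, n0)
    return result

if __name__ == "__main__":
    print("crude RR:", round(crude_relative_risk(COUNTS), 2))
    print("RR within smoking groups:", stratified_relative_risks(COUNTS))
```

In these made-up counts the crude relative risk is well above 1, while the relative risk within each smoking group is 1, which is the pattern expected when smoking fully explains the alcohol association.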
26
Controlling for confounding
  • In a randomized trial, there should be no
    confounding because no factor is associated with
    the exposure. Remember the randomized trial is
    the best form of study because we can draw causal
    conclusions.
  • There is significant research (much here at HSPH)
    into ways to control for confounding, which is
    trying to make an observational study like a
    randomized trial to allow causal conclusions.
  • The objective is to make the exposure group and
    the control group similar so that the observed
    differences are caused by the exposure
  • Much more on this in EPI 289 and EPI 207

27
Conclusions
  • When we observe a difference between two groups
    we want to determine if this difference is
    important or due to chance, bias or confounding
  • The main job of the statistician after the study
    is to determine if the difference is simply due
    to chance, which will be the focus of most of the
    remainder of the summer
  • Control of bias and confounding requires the
    statistician and the subject area expert to
    ensure that the biases are limited and
    confounding factors are collected

28
Generalizability
  • Assume that we have found a difference between
    our exposure and control group and we have shown
    that this result is not likely due to chance,
    bias or confounding.
  • What does this mean for the general population?
    Specifically, to which group of people can we
    apply our results?
  • This is often based on how the sample was
    originally collected.
  • Ex. If the exposed group were school children
    living near power lines and the control group
    were school children living elsewhere, can we
    generalize the findings to all children? To
    adults?

29
Practice
  • I have given out a couple of papers. For each, we
    are going to go through:
  • question of interest
  • study design
  • study population
  • potential biases
  • confounders

30
  • Now, try to write functions to do the following
    things.
  • Take a vector input and find the mean of all of
    the values except the minimum and maximum
  • Take a vector input and output a graph with a
    histogram and boxplot
  • Take a matrix input. Find the mean and median of
    each column. Output the mean, median and column
    number as a list for the column with the highest
    median