Basic Statistical Principles for the Clinical Research Scientist Kristin Cobb October 13 and October 20, 2004 - PowerPoint PPT Presentation

1
Basic Statistical Principles for the Clinical
Research Scientist
Kristin Cobb
October 13 and October 20, 2004
2
Statistics in Medical Research
  • 1. Design phase
  • Statistics starts in the planning stages of a
    clinical trial or laboratory experiment to
  • establish optimal sample size needed
  • ensure sound study design
  • 2. Analysis phase
  • Make inferences about a wider population.

3
Common problems with statistics in medical
research
  • Sample size too small to find an effect (design
    phase problem)
  • Sub-optimal choice of measurement for predictors
    and outcomes (design phase problem)
  • Inadequate control for confounders (design or
    analysis problem)
  • Statistical analyses inadequate (analysis
    problem)
  • Incorrect statistical test used (analysis
    problem)
  • Incorrect interpretation of computer output
    (analysis problem)
  • Therefore, it is essential to collaborate with
    a statistician both during planning and analysis!

4
Additionally, errors arise when
  • The statistical content of the paper is confusing
    or misleading because the authors do not fully
    understand the statistical techniques used by the
    statistician.
  • The statistician performs inadequate or
    inappropriate analyses because she is unclear
    about the questions the research is designed to
    answer.
  • Therefore, clinical research scientists need to
    understand the basic principles of biostatistics

5
Outline (today and next week)
  • 1. Primer on hypothesis testing, p-values,
    confidence intervals, statistical power.
  • 2. Biostatistics in Practice: Applying statistics
    to clinical research design

6
Quick review
  • Standard deviation
  • Histograms (frequency distributions)
  • Normal distribution (bell curve)

7
Review: Standard deviation
Standard deviation tells you how variable a
characteristic is in a population. For example,
how variable is height in the US? A standard
deviation of height represents the average
distance that a random person is away from the
mean height in the population.
8
Review: Histograms
9
Review: Histograms
1-inch bins
10
Review: Normal Distribution
11
Review: Normal Distribution
In fact, here, 101/150 (67%) subjects have
heights between 62.7 and 67.7 inches (1 standard
deviation below and above the mean).
A perfect, theoretical normal distribution
carries 68% of its area within 1 standard
deviation of the mean.
12
Review: Normal Distribution
In fact, here, 146/150 (97%) subjects have
heights between 60.2 and 70.2 inches (2 standard
deviations below and above the mean).
A perfect, theoretical normal distribution
carries 95% of its area within 2 standard
deviations of the mean.
13
Review: Normal Distribution
In fact, here, 150/150 (100%) subjects have
heights between 57.7 and 72.7 inches (3 standard
deviations below and above the mean).
A perfect, theoretical normal distribution
carries 99.7% of its area within 3 standard
deviations of the mean.
14
Review: Applying the normal distribution
  • If women's heights in the US are normally
    distributed with a mean of 65 inches and a
    standard deviation of 2.5 inches, what percentage
    of women do you expect to have heights above 6
    feet (72 inches)?

Z = (72 - 65)/2.5 = 2.8. From a standard normal
chart or computer, a Z of 2.8 corresponds to a
right-tail area of .0026, so we expect 2-3 women
per 1000 to have heights of 6 feet or greater.
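This tail-area calculation can be reproduced with Python's standard library (a sketch; the mean, SD, and cutoff are the slide's numbers):

```python
from math import erfc, sqrt

mean, sd = 65.0, 2.5   # women's heights (inches), from the slide
x = 72.0               # 6 feet

z = (x - mean) / sd            # number of SDs above the mean
p = 0.5 * erfc(z / sqrt(2))    # right-tail area of the standard normal

print(z, round(p, 4))   # 2.8 0.0026 -> expect 2-3 women per 1000
```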
15
Statistics Primer
  • Statistical Inference
  • Sample statistics
  • Sampling distributions
  • Central limit theorem
  • Hypothesis testing
  • P-values
  • Confidence intervals
  • Statistical power

16
Statistical Inference: the process of making
guesses about the truth from a sample.
17
  • EXAMPLE: What is the average blood pressure of
    US post-docs?
  • We could go out and measure blood pressure in
    every US post-doc (thousands).
  • Or, we could take a sample and make inferences
    about the truth from our sample.

Using what we observe, 1. We can test an a priori
guess (hypothesis testing). 2. We can estimate
the true value (confidence intervals).
18
Statistical Inference is based on Sampling
Variability
  • Sample Statistic: we summarize a sample into one
    number; e.g., could be a mean, a difference in
    means or proportions, an odds ratio, or a
    correlation coefficient
  • E.g., average blood pressure of a sample of 50
    American men
  • E.g., the difference in average blood pressure
    between a sample of 50 men and a sample of 50
    women
  • Sampling Variability: If we could repeat an
    experiment many, many times on different samples
    with the same number of subjects, the resultant
    sample statistic would not always be the same
    (because of chance!).
  • Standard Error: a measure of the sampling
    variability

19
Examples of Sample Statistics
  • Single population mean
  • Difference in means (t-test)
  • Difference in proportions (Z-test)
  • Odds ratio/risk ratio
  • Correlation coefficient
  • Regression coefficient

20
Variability of a sample mean
Random Postdocs
The Truth (not knowable)
21
Variability of a sample mean
Random samples of 5 post-docs
The Truth (not knowable)
22
Variability of a sample mean
Samples of 50 Postdocs
The Truth (not knowable)
129 mmHg
134 mmHg
131 mmHg
130 mmHg
128 mmHg
130 mmHg
23
Variability of a sample mean
Samples of 150 Postdocs
The Truth (not knowable)
131.2 mmHg
130.2 mmHg
129.7 mmHg
130.9 mmHg
130.4 mmHg
129.5 mmHg
24
How sample means vary: A computer experiment
  • 1. Pick any probability distribution and specify
    a mean and standard deviation.
  • 2. Tell the computer to randomly generate 1000
    observations from that probability distribution
  • E.g., the computer is more likely to spit out
    values with high probabilities
  • 3. Plot the observed values in a histogram.
  • 4. Next, tell the computer to randomly generate
    1000 averages-of-2 (randomly pick 2 and take
    their average) from that probability
    distribution. Plot observed averages in
    histograms.
  • 5. Repeat for averages-of-5, and averages-of-100.
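The steps above can be sketched with Python's standard library (shown for the Uniform(0,1) case; swapping in `random.expovariate` or a binomial draw covers the other distributions on the following slides):

```python
import random
import statistics

random.seed(1)  # reproducible

def sampling_distribution(draw, n_per_avg, n_reps=1000):
    """Generate n_reps averages, each of n_per_avg random draws."""
    return [statistics.mean(draw() for _ in range(n_per_avg))
            for _ in range(n_reps)]

# Uniform(0,1): true mean 0.5, true SD sqrt(1/12) ~ 0.289
for n in (1, 2, 5, 100):
    avgs = sampling_distribution(random.random, n)
    print(n, round(statistics.mean(avgs), 2),
          round(statistics.stdev(avgs), 3))
```

Plotting each `avgs` list as a histogram reproduces the next slides: the spread shrinks roughly like SD/sqrt(n) as the averages get bigger.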

25
Uniform on [0,1]: average of 1 (original
distribution)
26
Uniform 1000 averages of 2
27
Uniform 1000 averages of 5
28
Uniform 1000 averages of 100
29
Exp(1): average of 1 (original distribution)
30
Exp(1) 1000 averages of 2
31
Exp(1) 1000 averages of 5
32
Exp(1) 1000 averages of 100
33
Bin(40, .05): average of 1 (original
distribution)
34
Bin(40, .05) 1000 averages of 2
35
Bin(40, .05) 1000 averages of 5
36
Bin(40, .05) 1000 averages of 100
37
The Central Limit Theorem
  • If all possible random samples, each of size n,
    are taken from any population with a mean μ and a
    standard deviation σ, the sampling distribution
    of the sample means (averages) will

1. have a mean equal to μ
2. have a standard deviation (standard error)
equal to σ/√n
3. be approximately normally distributed
regardless of the shape of the parent population
(normality improves with larger n)
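A quick numerical check of these points, sketched using Exp(1), whose population mean and SD are both 1:

```python
import random
import statistics

random.seed(2)
n = 50
# 1000 sample means, each of n draws from Exp(1) (mean 1, SD 1)
means = [statistics.mean(random.expovariate(1.0) for _ in range(n))
         for _ in range(1000)]

print(round(statistics.mean(means), 1))    # ~1.0: centered at the population mean
print(round(statistics.stdev(means), 2))   # ~0.14: close to sigma/sqrt(n) = 1/sqrt(50)
```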
38
Example 1 Weights of doctors
  • Experimental question Are practicing doctors
    setting a good example for their patients in
    their weights?
  • Experiment Take a sample of practicing doctors
    and measure their weights
  • Sample statistic mean weight for the sample
  • IF weight is normally distributed in doctors
    with a mean of 150 lbs and standard deviation of
    15, how much would you expect the sample average
    to vary if you could repeat the experiment over
    and over?

39
Relative frequency of 1000 observations of weight:
mean = 150 lbs; standard deviation = 15 lbs
40
(No Transcript)
41
(No Transcript)
42
(No Transcript)
43
Using Sampling Variability
  • In reality, we only get to take one sample!!
  • But, since we have an idea about how sampling
    variability works, we can make inferences about
    the truth based on one sample.

44
Experimental results
  • Let's say we take one sample of 100 doctors and
    calculate their average weight.

45
Expected Sampling Variability for n=100 if the
true weight is 150 (and SD=15)
46
Expected Sampling Variability for n=100 if the
true weight is 150 (and SD=15)
47
P-value associated with this experiment
P-value (the probability of our sample average
being 160 lbs or more IF the true average weight
is 150) < .0001. Gives us evidence that 150 isn't
a good guess.
48
The P-value
  • P-value is the probability that we would have
    seen our data (or something more unexpected) just
    by chance if the null hypothesis (null value) is
    true.
  • Small p-values mean the null value is unlikely
    given our data.

49
The P-value
  • By convention, p-values of <.05 are often
    accepted as statistically significant in the
    medical literature, but this is an arbitrary
    cut-off.
  • A cut-off of p<.05 means that in about 5 of 100
    experiments, a result would appear significant
    just by chance (Type I error).

50
Hypothesis Testing
  • The Steps:
  • 1. Define your hypotheses (null, alternative)
  • The null hypothesis is the "straw man" that we
    are trying to shoot down.
  • Null here: mean weight of doctors = 150 lbs
  • Alternative here: mean weight > 150 lbs
    (one-sided)
  • 2. Specify your sampling distribution (under the
    null)
  • If we repeated this experiment many, many times,
    the sample average weights would be normally
    distributed around 150 lbs with a standard error
    of 1.5
  • 3. Do a single experiment (observed sample mean
    = 160 lbs)
  • 4. Calculate the p-value of what you observed
    (p<.0001)
  • 5. Reject or fail to reject the null hypothesis
    (reject)
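The steps above amount to a one-sample z-test, which can be sketched with the standard library (the numbers are the slides' doctors'-weights example):

```python
from math import erfc, sqrt

null_mean, sd, n = 150.0, 15.0, 100   # null: doctors average 150 lbs
observed = 160.0                      # observed sample mean

se = sd / sqrt(n)                        # standard error = 1.5
z = (observed - null_mean) / se          # ~6.67 SEs above the null
p_one_sided = 0.5 * erfc(z / sqrt(2))    # P(sample mean >= 160 | null true)

print(round(se, 1), round(z, 2))
print(p_one_sided < 0.0001)   # True -> reject the null
```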

51
Errors in Hypothesis Testing
52
Errors in Hypothesis Testing
  • Type-I Error (false positive)
  • Concluding that the observed effect is real when
    it's just due to chance.
  • Type-II Error (false negative)
  • Missing a real effect.
  • POWER (the complement of type-II error)
  • The probability of seeing a real effect (of
    rejecting the null if the null is false).

53
Beyond Hypothesis Testing: Estimation (confidence
intervals)
We'd estimate based on these data that the
average weight is somewhere closer to 160 lbs.
And we could state the precision of this estimate
(a confidence interval).
54
Confidence Intervals
  • (Sample statistic) ± (measure of how confident
    we want to be) × (standard error)

55
Confidence interval (more information!!)
  • 95% CI for the mean
  • 160 ± 1.96(1.5) = (157, 163)
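In code (a sketch; 1.96 is the z multiplier for 95% confidence):

```python
from math import sqrt

sample_mean, sd, n = 160.0, 15.0, 100   # doctors'-weights example
se = sd / sqrt(n)                       # standard error = 1.5
z = 1.96                                # multiplier for 95% confidence

lo, hi = sample_mean - z * se, sample_mean + z * se
print(round(lo, 1), round(hi, 1))   # 157.1 162.9, which the slide rounds to (157, 163)
```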

56
What Confidence Intervals do
  • They indicate the un/certainty about the size
    of a population characteristic or effect. Wider
    CIs indicate less certainty.
  •   Confidence intervals can also answer the
    question of whether or not an association exists
    or a treatment is beneficial or harmful.
    (analogous to p-values)
  • e.g., since the 95% CI of the mean weight does
    not cross 150 lbs (the null value), then we
    reject the null at p<.05.

57
Expected Sampling Variability for n=2
58
Expected Sampling Variability for n=2
59

Expected Sampling Variability for n=10
60
Statistical Power
  • We found the same sample mean (160 lbs) in our
    100-doctor sample, 10-doctor sample, and 2-doctor
    sample.
  • But we only rejected the null based on the
    100-doctor and 10-doctor samples.
  • Larger samples give us more statistical power

61
Can we quantify how much power we have for given
sample sizes?
62
(No Transcript)
63
(No Transcript)
64
Null Distribution: mean=150, sd=4.74
Clinically relevant alternative: mean=160,
sd=4.74
65
(No Transcript)
66
Null Distribution: mean=150, sd=1.37
Nearly 100% power!
Clinically relevant alternative: mean=160,
sd=1.37
67
Factors Affecting Power
  • 1. Size of the difference (10 pounds higher)
  • 2. Standard deviation of the characteristic
    (sd=15)
  • 3. Bigger sample size
  • 4. Significance level desired

68
1. Bigger difference from the null mean
69
2. Bigger standard deviation
70
3. Bigger Sample Size
71
4. Higher significance level
72
Examples of Sample Statistics
  • Single population mean
  • Difference in means (t-test)
  • Difference in proportions (Z-test)
  • Odds ratio/risk ratio
  • Correlation coefficient
  • Regression coefficient

73
Example 2 Difference in means
  • Example: Rosenthal, R. and Jacobson, L. (1966)
    Teachers' expectancies: Determinants of pupils'
    IQ gains. Psychological Reports, 19, 115-118.

74
The Experiment (note: exact numbers have been
altered)
  • Grade 3 at Oak School were given an IQ test at
    the beginning of the academic year (n=90).
  • Classroom teachers were given a list of names of
    students in their classes who had supposedly
    scored in the top 20 percent; these students were
    identified as "academic bloomers" (n=18).
  • BUT the children on the teachers' lists had
    actually been randomly assigned to the list.
  • At the end of the year, the same IQ test was
    re-administered.

75
The results
  • Children who had been randomly assigned to the
    top-20-percent list had a mean IQ increase of
    12.2 points (sd=2.0), vs. an increase of only 8.2
    points (sd=2.0) in the control group.
  • Is this a statistically significant difference?
    Give a confidence interval for this difference.

76
Difference in means
  • Sample statistic Difference in mean change in IQ
    test score.
  • Null hypothesis: no difference between academic
    bloomers and normal students

77
Explore sampling distribution of difference in
means
  • Simulate 1000 differences in mean IQ change under
    the null hypothesis (both academic bloomers and
    controls improve by, let's say, 8 points, with a
    standard deviation of 2.0)

78
academic bloomers
79
normal students
80
Difference: academic bloomers - normal students
Notice that most experiments yielded a difference
value between -1.1 and 1.1 (wider than the above
sampling distributions!)
81
Confidence interval (more information!!)
  • 95% CI for the difference: 4.0 ± 1.99(.52) =
    (3.0, 5.0)

Does not cross 0; therefore, significant at p<.05.
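A sketch of this interval in Python; the group sizes 18 and 72 follow from the slides' n=90 with 18 "bloomers", and 1.99 is the t critical value for 88 degrees of freedom:

```python
from math import sqrt

n1, n2, sd = 18, 72, 2.0        # 18 "bloomers", 72 controls, common SD 2.0
diff = 12.2 - 8.2               # observed difference: 4.0 IQ points

se = sqrt(sd**2 / n1 + sd**2 / n2)     # standard error of the difference
t_crit = 1.99                          # t critical value, 88 df, 95% two-sided

lo, hi = diff - t_crit * se, diff + t_crit * se
print(round(se, 2), round(lo, 1), round(hi, 1))   # 0.53 3.0 5.0
```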
82
95% confidence interval for the observed
difference: 4 ± 2(.52) ≈ 3-5
83
Clearly lots of power to detect a difference of 4!
84
  • How much power to detect a difference of 1.0?

85
Power closer to 50% now.
86
Example 3 Difference in proportions
  • Experimental question: Do men tend to prefer Bush
    more than women?
  • Experimental design: Poll representative samples
    of men and women in the U.S. and ask them the
    question "do you plan to vote for Bush in
    November, yes or no?"
  • Sample statistic: The difference in the
    proportion of men who are pro-Bush versus women
    who are pro-Bush.
  • Null hypothesis: the difference in proportions
    = 0
  • Observed results: women = .36, men = .46

87
Explore sampling distribution of difference in
proportions
  • Simulate 1000 differences in proportion
    preferring Bush under the null hypothesis (41%
    overall prefer Bush, with no difference between
    genders)
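This simulation can be sketched with the standard library; the per-group n of 50 is an assumption chosen for illustration (later slides vary the group size):

```python
import random
import statistics

random.seed(3)
p_null = 0.41   # 41% overall prefer Bush under the null
n = 50          # per-group sample size (an assumption for illustration)

def sample_prop():
    """Proportion pro-Bush in one random sample of n voters."""
    return sum(random.random() < p_null for _ in range(n)) / n

diffs = [sample_prop() - sample_prop() for _ in range(1000)]
# theoretical SE of the difference: sqrt(2 * p(1-p) / n) ~ .098
print(round(statistics.stdev(diffs), 2))
```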

88
men
89
women
Under the null hypothesis, most experiments
yielded a mean between .27 and .55
90
Difference men-women
Under the null hypothesis, most experiments
yielded difference values between -.20 (women
preferring Bush more than men) and .20 (men
preferring Bush more)
91
  • What if we had 200 men and 200 women?

92
men
Most of 1000 simulated experiments yielded a mean
between .34 and .48
93
women
Most of 1000 simulated experiments yielded a mean
between .34 and .48
94
Difference men-women
Notice that most experiments will yield a
difference value between -.10 (women preferring
Bush more than men) and .10 (men preferring Bush
more)
95
  • What if we had 800 men and 800 women?

96
men
Most experiments will yield a mean between .38
and .44
97
women
Most experiments will yield a mean between .38
and .44
98
Difference men-women
Notice that most experiments will yield a
difference value between -.05 (women preferring
Bush more than men) and .05 (men preferring Bush
more)
99
If we sampled 1600 per group, a 2.5% difference
would be statistically significant at a
significance level of .05. If we sampled 3200
per group, a 1.25% difference would be
statistically significant at a significance
level of .05. If we sampled 6400 per group, a
.625% difference would be statistically
significant at a significance level of
.05. BUT if we found a significant difference
of 1% between men and women, would we care if we
were Bush or Kerry??
100
Limits of hypothesis testing: Statistical vs.
Clinical Significance
Consider a hypothetical trial comparing death
rates in 12,000 patients with multi-organ failure
receiving a new inotrope with 12,000 patients
receiving usual care. If there was a 1%
reduction in mortality in the treatment group
(49% deaths versus 50% in the usual care group),
this would be statistically significant (p<.05)
because of the large sample size. However, such
a small difference in death rates may not be
clinically important.
101
Example 4 The odds ratio
  • Experimental question Does smoking increase
    fracture risk?
  • Experiment Ask 50 patients with fractures and 50
    controls if they ever smoked.
  • Sample statistic Odds Ratio (measure of relative
    risk)
  • Null hypothesis: There is no association between
    smoking and fractures (odds ratio = 1.0).
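A sketch of how the sample OR, and a confidence interval for it, would be computed; the 2x2 counts here are hypothetical, chosen only to be consistent with the slides' 50 cases, 50 controls, and 20 ever-smokers:

```python
from math import exp, log, sqrt

# Hypothetical 2x2 table (for illustration only)
a, b = 14, 36   # cases: smokers, non-smokers
c, d = 6, 44    # controls: smokers, non-smokers

odds_ratio = (a * d) / (b * c)   # (a/b) / (c/d)
# inference is done on ln(OR), whose sampling distribution is closer to Gaussian
se_ln_or = sqrt(1/a + 1/b + 1/c + 1/d)
ci = (exp(log(odds_ratio) - 1.96 * se_ln_or),
      exp(log(odds_ratio) + 1.96 * se_ln_or))
print(round(odds_ratio, 2))   # 2.85
```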

102
The Odds Ratio (OR)
103
Example 4: Sampling Variability of the null Odds
Ratio (OR) (50 cases/50 controls/20 exposed)
If the Odds Ratio = 1.0, then with 50 cases and 50
controls, of whom 20 smoke, this is the expected
variability of the sample OR: note the right skew.
104
The Sampling Variability of the natural log of
the OR (lnOR) is more Gaussian
105
Statistical Power
  • Statistical power here is the probability of
    concluding that there is an association between
    exposure and disease if an association truly
    exists.
  • The stronger the association, the more likely we
    are to pick it up in our study.
  • The more people we sample, the more likely we are
    to conclude that there is an association if one
    exists (because the sampling variability is
    reduced).

106
Part II: Biostatistics in Practice: Applying
statistics to clinical research design
107
From concept to protocol
  • Define your primary hypothesis
  • Define your primary predictor and outcome
    variables
  • Decide on study type (cross-sectional,
    case-control, cohort, RCT)
  • Decide how you will measure your predictor and
    outcome variables, balancing statistical power,
    ease of measurement, and potential biases
  • Decide on the main statistical tests that will be
    used in analysis
  • Calculate sample size needs for your chosen
    statistical test/s
  • Describe your sample size needs in your written
    protocol, disclosing your assumptions
  • Write a statistical analysis plan
  • Briefly, describe descriptive statistics that you
    plan to present
  • Describe which statistical tests you will use to
    test your primary hypotheses
  • Describe which statistical tests you will use to
    test your secondary hypotheses
  • Describe how you will account for confounders and
    test for interactions
  • Describe any exploratory analyses that you might
    perform

108
Powering a study: What is the primary hypothesis?
  • Before you can calculate sample size, you need to
    know the primary statistical analysis that you
    will use in the end.
  • What is your main outcome of interest?
  • What is your main predictor of interest?
  • Which statistical test will you use to test for
    associations between your outcome and your
    predictor?
  • Do you need to adjust sample size needs upwards
    to account for loss to follow-up, switching arms
    of a randomized trial, accounting for
    confounders?
  • Seek guidance from a statistician

109
Overview of statistical tests
  • The following table gives the appropriate choice
    of a statistical test or measure of association
    for various types of data (outcome variables and
    predictor variables) by study design.

e.g., blood pressure, pounds, age, treatment
(1/0)
110
(No Transcript)
111
Comparing Groups
  • T-test: compares two means
  • (null hypothesis: difference in means = 0)
  • ANOVA: compares means between >2 groups
  • (null hypothesis: difference in means = 0)
  • Non-parametric tests are used when normality
    assumptions are not met
  • (null hypothesis: difference in medians = 0)
  • Chi-square test: compares proportions between
    groups
  • (null hypothesis: categorical variables are
    independent)

112
Simple sample size formulas/calculators available
  • Sample size for a difference in means
  • Sample size for a difference in proportions
  • Can roughly be used if you plan to calculate risk
    ratios, odds ratios, or to run logistic
    regression or chi-square tests
  • Sample size for a hazard ratio/log-rank test
  • If you plan to do survival analysis: Kaplan-Meier
    methods (log-rank test), Cox regression
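One standard form of the difference-in-means formula can be sketched as follows (the z values shown assume two-sided alpha = .05 and 80% power; these are assumptions for the example, not numbers from the slides):

```python
from math import ceil

def n_per_group(sd, diff, z_alpha=1.96, z_beta=0.84):
    """Subjects per group: n = 2 * sd^2 * (z_alpha + z_beta)^2 / diff^2."""
    return ceil(2 * sd**2 * (z_alpha + z_beta)**2 / diff**2)

# e.g., the teacher-expectancy example: SD 2.0, detect a 1-point difference
print(n_per_group(sd=2.0, diff=1.0))   # 63 per group
# the observed 4-point difference needs far fewer subjects
print(n_per_group(sd=2.0, diff=4.0))   # 4 per group
```

Note how sample size scales with the square of sd/diff: halving the detectable difference quadruples the required n.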

113
(No Transcript)
114
The pay-off for sitting through the theoretical
part of these lectures!
  • Here's where it pays to understand what's behind
    sample size/power calculations!
  • You'll have a much easier time using sample size
    calculators if you aren't just putting numbers
    into a black box!

115
(No Transcript)
116
(No Transcript)
117
(No Transcript)
118
(No Transcript)
119
If this looks complicated, don't panic!
  • In reality, you're unlikely to have to derive
    sample size formulas yourself
  • but it's critical to understand where they come
    from if you're going to apply them yourself.

120
Formula for difference in means
121
Formula for difference in proportions
122
Formula for hazard ratio/log-rank test
123
Recommended sample size calculators!
  • http://hedwig.mgh.harvard.edu/sample_size/size.html
  • http://vancouver.stanford.edu:8080/clio/index.html
  • → Traverse protocol wizard

124
These sample size calculations are idealized
  • We have not accounted for losses to follow-up
  • We have not accounted for non-compliance (for
    intervention trial or RCT)
  • We have assumed that individuals are independent
    observations (not true in clustered designs)
  • Consult a statistician for these considerations!

125
Applying statistics to clinical research design
Example
  • You want to study the relationship between
    smoking and fractures.

126
Steps
  • ✓ Define your primary hypothesis
  • ✓ Define your primary predictor and outcome
    variables
  • ✓ Decide on study type

127
Applying statistics to clinical research design
Example
  • ✓ Predictor: smoking (yes/no or continuous)
  • ✓ Outcome: osteoporotic fracture (time-to-event)
  • ✓ Study design: cohort

128
From concept to protocol
  • ✓ Decide how you will measure your predictor and
    outcome variables
  • ✓ Decide on the main statistical tests that will
    be used in analysis
  • ✓ Calculate sample size needs for your chosen
    statistical test(s)

129
(No Transcript)
130
Formula for hazard ratio/log-rank test
131
Example sample size calculation
  • Ratio of exposed to unexposed in your sample?
  • 1:1
  • Proportion of non-smokers who will fracture in
    your defined population over your defined study
    period?
  • 10%
  • What is a clinically meaningful hazard ratio?
  • 2.0
  • Based on the hazard ratio, how many smokers will
    fracture?
  • 1 - (.90)^2 = 19%
  • What power are you targeting?
  • 80%
  • What significance level?
  • .05
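These inputs can be turned into a number with one common approximation, Schoenfeld's formula for the log-rank test; this is a sketch under that assumption, not necessarily the exact formula on the (untranscribed) slide:

```python
from math import ceil, log

z_alpha, z_beta = 1.96, 0.84   # two-sided alpha = .05, 80% power
hr = 2.0                       # clinically meaningful hazard ratio
p_unexposed, p_exposed = 0.10, 0.19   # fracture risk: non-smokers, smokers

# Schoenfeld: total events needed for a 1:1 log-rank test
events = ceil(4 * (z_alpha + z_beta) ** 2 / log(hr) ** 2)
# convert events to subjects via the average event probability
n_total = ceil(events / ((p_unexposed + p_exposed) / 2))
print(events, n_total)   # 66 events -> 456 subjects (228 per group)
```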

132
Formula for hazard ratio/log-rank test
You may want to adjust upwards for loss to
follow-up. E.g., if you expect to lose 10%,
divide the above estimate by 90% (0.90).
133
From concept to protocol
  • Describe your sample size needs in your written
    protocol, disclosing your assumptions
  • Write a statistical analysis plan

134
(No Transcript)
135
Statistical analysis plan
  • Descriptive statistics
  • E.g., of study population by smoking status
  • Kaplan-Meier Curves (univariate)
  • Describe exploratory analyses that may be used to
    identify confounders and other predictors of
    fracture
  • Cox regression (multivariate)
  • What confounders have you measured, and how will
    you incorporate them into multivariate analysis?
  • How will you explore for possible interactions?
  • Describe potential exploratory analysis for other
    predictors of fracture