Biostatistics in Practice - PowerPoint PPT Presentation


Provided by: bios62

Transcript and Presenter's Notes

Title: Biostatistics in Practice

1
Biostatistics in Practice
Session 2: Summarization of Quantitative
Information
Peter D. Christenson, Biostatistician
http://gcrc.LABioMed.org/Biostat
2
Topics for this Session
Experimental Units: Independence of Measurements
Graphs: Summarizing Results
Graphs: Aids for Analysis
Summary Measures
Confidence Intervals
Prediction Intervals
3
Most Practical from this Session
Geometric Means
Confidence Intervals
Reference Ranges
Justify Methods from Graphs
4
Experimental Units: Independence of
Measurements
5
Statistical Independence
Experimental units are the smallest independent
entities for addressing a scientific question in
an analysis of an experiment. "Independent"
refers to the measurement that is made and the
question, not the units themselves.
Definition: If knowledge of the value for one unit
does not provide information about another unit's
value, given other factors (and the overall mean) in
the analysis of the experiment, then the units are
independent for this measurement. There may be a
hierarchy of units.
6
Importance of Independence
Many basic statistical methods require that
measurements are independent for the analysis to
be valid. Other methods can incorporate the lack
of independence. There can be some subjectivity
regarding independence. Statistical methods use
models. Models can be wrong.
7
Example: Units and Independence
Ten mice receive treatment A, each is bled, and
each blood sample is divided into 3 aliquots.
The same is done for 10 mice on treatment B.

A serum hormone is measured in the 60 aliquots
and compared between A and B.
The aliquots for a mouse are not independent.
The unit is a mouse.
Summary statistics from each mouse's 3 aliquots
(e.g., maximum or mean) are independent across mice.
N = 10 and 10, not 30 and 30.
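
A minimal sketch of reducing aliquots to one value per mouse before comparing treatments. The hormone values and mouse labels below are hypothetical, made up only to illustrate the bookkeeping, and only two mice per treatment are shown.

```python
from statistics import mean

# Hypothetical hormone values: 3 aliquots per mouse (illustrative numbers only).
aliquots = {
    ("A", "mouse1"): [5.1, 5.3, 5.2],
    ("A", "mouse2"): [6.0, 5.8, 6.1],
    ("B", "mouse1"): [4.2, 4.4, 4.3],
    ("B", "mouse2"): [4.9, 5.0, 4.8],
}

def per_mouse_means(data):
    """Collapse each mouse's aliquots to one mean, grouped by treatment."""
    by_treatment = {}
    for (treatment, _mouse), values in data.items():
        by_treatment.setdefault(treatment, []).append(mean(values))
    return by_treatment

summary = per_mouse_means(aliquots)

# N per treatment is the number of mice, not the number of aliquots.
n_per_group = {treatment: len(means) for treatment, means in summary.items()}
print(n_per_group)
```

Any later comparison of A and B would then use the per-mouse values in `summary`, so the sample size matches the number of experimental units.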

8
Example, Continued

One of the 30 A aliquots is further divided into
25 parts and 5 different in vitro challenges are
each made to a random set of 5 of the parts. The
same is done for a single B aliquot.
For this challenge experiment, each part is a
unit, the values of challenge response are
independent, and N = 25 + 25.
For comparing A and B, there are only N = 1 + 1
experimental units, the two mice.

9
Experimental Units in Case Study
10
Experimental Units in Case Study
There is a nested hierarchy of several "levels"
of data: schools, children within the schools,
and diets received by every child.
What would you use for the "N" for this study?
Which outcomes do you intuitively think are
correlated (in common language)?
Results from one child's three diets?
Results from children in the same school?
Schools?
11
Experimental Units in Case Study
N = number of children.
Results from one child's three diets cannot be
modeled as independent.
Results from children in the same school also
could be correlated (dependent). They can be
modeled as independent if the effect of school is
included in the analysis: knowing one child's
score and the school mean then gives no
information about another child's score.
12
Units and Analysis in the Case Study
N = number of children.
Analysis:
This method is a complex generalization of
methods we discuss in Session 3. For any method,
though, you need to inform the software of the
correct experimental units. For some experiments,
it is obvious and implicit.
13
Graphs: Summarizing Results
14
Common Graphical Summaries
Graph Name      Y-axis           X-axis
Histogram       Count or %       Category
Scatterplot     Continuous       Continuous
Dot Plot        Continuous       Category
Box Plot        Percentiles      Category
Line Plot       Mean or value    Category
Kaplan-Meier    Probability      Time
Many of the examples are from StatisticalPractice.com
15
Data Graphical Displays
Histogram: summarized data
Scatter plot: raw data
The raw data version of a histogram is a
stem-leaf plot. We will see one later.
16
Data Graphical Displays
Dot Plot: raw data
Box Plot: summarized data
17
Data Graphical Displays
Line or Profile Plot
Summarized - bars can represent various types of
ranges
18
Data Graphical Displays
Kaplan-Meier Plot
Probability of Surviving 5 years is 0.35
This is not necessarily 35% of subjects.
19
Graphs: Aids for Analysis
20
Graphical Aids for Analysis
Most statistical analyses involve modeling.
Parametric methods (t-test, ANOVA, χ²) have
stronger requirements than non-parametric
methods (rank-based). Every method is based on
data satisfying certain requirements. Many of
these requirements can be assessed with some
useful common graphics.
21
Look at the Data for Analysis Requirements

What do we look for?
In Histograms (one variable):
Ideal: Symmetric, bell-shaped.
Potential Problems:
Skewness.
Multiple peaks.
Many values at, say, 0, and bell-shaped
otherwise.
Outliers.

22
Example: Histogram OK for Typical Analyses

Symmetric.
One peak.
Roughly bell-shaped.
No outliers.

Typical analyses: mean, SD, confidence intervals,
to be discussed in later slides.
23
Histograms Not OK for Typical Analyses
Skewed: Need to transform intensity to another
scale, e.g. Log(intensity).
Multi-Peak: Need to summarize with percentiles,
not mean.
24
Histograms Not OK for Typical Analyses
Truncated Values: Undetectable in 28 samples
(< LLOQ). Need to use percentiles for most analyses.
Outliers: Need to use median, not mean, and
percentiles.
25
Look at the Data for Analysis Requirements

What do we look for?
In Scatter Plots (two variables):
Ideal: Football-shaped ellipse.
Potential Problems:
Outliers.
Funnel-shaped.
Gap with no values for one or both variables.

26
Example: Scatter Plot OK for Typical Analyses
27
Scatter Plot Not OK for Typical Analyses
Gap and Outlier
Funnel-Shaped
Ott, Amer J Obstet Gyn 2005;192:1803-9.
Ferber et al, Amer J Obstet Gyn 2004;190:1473-5.
Should transform y-value to another scale, e.g.
logarithm.
Consider analyzing subgroups.
28
Summary Measures
29
Common Summary Measures
Mean and SD or SEM
Geometric Mean
Z-Scores
Correlation
Survival Probability
Risks, Odds, and Hazards
30
Summary Statistics: One Variable

Data reduction to a few summary measures.
Basic need: typical value and variability of
values.
Typical Values (Location)
Mean for symmetric data.
Median for skewed data.
Geometric mean for some skewed data - details
in later slides.

31
Summary Statistics: Variation in Values

Standard Deviation, SD ≈ 1.25 × (average
deviation of values from their mean).
Standard by convention; values are non-intuitive.
SD of what? E.g., SD of individuals, or of
group means.
Fundamental, critical measure for most
statistical methods.
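
The factor of about 1.25 between the SD and the average absolute deviation can be checked numerically. The sketch below assumes bell-shaped (normal) data, where the factor is sqrt(pi/2); the simulated values, seed, and sample size are arbitrary choices for illustration.

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the sketch is reproducible

# Simulated bell-shaped data (hypothetical: mean 60, SD 10).
values = [random.gauss(60.0, 10.0) for _ in range(20000)]

center = mean(values)
sd = stdev(values)                                    # sample standard deviation
avg_abs_dev = mean([abs(v - center) for v in values])  # average deviation from the mean

ratio = sd / avg_abs_dev  # close to 1.25 for normal data
print(round(ratio, 2))
```

For strongly skewed or multi-peaked data the ratio can differ noticeably from 1.25.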

32
Examples: Mean and SD
A: Mean 60.6 min., SD 9.6 min.
B: Mean 15.1, SD 2.8
Note that the entire range of data in A is about
6 SDs wide. This width is the source of the name of
the Six Sigma process used in quality control and
business.
33
Examples: Mean and SD
Skewed: Mean 1.0 min., SD 1.1 min.
Multi-Peak: Mean 70.3, SD 22.3
34
Summary Statistics: Rule of Thumb

For bell-shaped distributions of data
(normally distributed):
68% of values are within mean ± 1 SD
95% of values are within mean ± 2 SD, the
(Normal) Reference Range
99.7% of values are within mean ± 3 SD
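
The rule of thumb can be checked against the normal distribution itself. This sketch uses Python's standard-library `statistics.NormalDist`; the helper name `coverage` is made up for the example.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

def coverage(k):
    """Fraction of a normal distribution lying within mean +/- k SD."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(100 * coverage(k), 1))

# The multiplier giving exactly 95% coverage is slightly below 2.
exact_95_multiplier = std_normal.inv_cdf(0.975)
print(round(exact_95_multiplier, 2))
```

The "2 SD" in the rule is a rounding of about 1.96, which is why mean ± 2 SD covers slightly more than 95% of a normal distribution.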

35
Summary Statistics: Geometric Means

Commonly used for skewed data.
Take logs of individual values.
Find, say, mean ± 2 SD of the logged values,
giving mean and (low, up).
Find antilogs of mean, low, up. Call them GM,
low2, up2 (back on original scale).
GM is the geometric mean. The interval
(low2, up2) is skewed about GM (corresponds to
graph).
See next slide.

36
Geometric Means
These are flipped histograms rotated 90º, with
box plots. Any log base can be used.
GM = exp(4.633) = 102.8
low2 = exp(4.633 - 2(1.09)) = 11.6
up2 = exp(4.633 + 2(1.09)) = 909.6
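
The log, summarize, antilog recipe can be sketched from raw values as follows. The titer values are hypothetical and the function name `geometric_summary` is invented for this illustration; natural logs are used, although any base gives the same result.

```python
import math
from statistics import mean, stdev

def geometric_summary(values, k=2.0):
    """Return (GM, low2, up2): antilogs of mean and mean +/- k SD of the logs."""
    logs = [math.log(v) for v in values]
    log_mean = mean(logs)
    log_sd = stdev(logs)
    gm = math.exp(log_mean)
    low2 = math.exp(log_mean - k * log_sd)
    up2 = math.exp(log_mean + k * log_sd)
    return gm, low2, up2

# Hypothetical skewed measurements (illustrative numbers only).
titers = [12.0, 35.0, 80.0, 110.0, 150.0, 400.0, 950.0]

gm, low2, up2 = geometric_summary(titers)
print(round(gm, 1), round(low2, 1), round(up2, 1))
```

The interval is symmetric on the log scale, so on the original scale up2/GM equals GM/low2, which is why it looks skewed about GM.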
37
Confidence Intervals
Reference ranges, or Prediction Intervals, are
for individuals. They contain values for 95% of
individuals.
_____________________________________
Confidence intervals (CI) are for a summary
measure (parameter) for an entire population.
They contain the (still unknown) summary
measure for everyone with 95% certainty.
38
Z-Score = (Measure - Mean)/SD
Standardizes a measure to have mean = 0 and SD = 1.
Z-scores make different measures comparable.
Original scale: Mean 60.6 min., SD 9.6 min. (axis ticks 41, 61, 79)
Z scale: Mean 0, SD 1 (axis ticks -2, 0, 2)
Z-Score = (Time - 60.6)/9.6
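
As a small worked example, the Z-score formula can be applied to the three times marked on the original axis (41, 61, and 79 minutes), using the mean and SD quoted on the slide. The function name `z_score` is just a label for this sketch.

```python
def z_score(value, mean, sd):
    """Standardize a value: how many SDs it lies from the mean."""
    return (value - mean) / sd

MEAN_MIN = 60.6  # mean time in minutes, from the slide
SD_MIN = 9.6     # SD in minutes, from the slide

zs = [round(z_score(t, MEAN_MIN, SD_MIN), 2) for t in (41, 61, 79)]
print(zs)
```

The results are close to -2, 0, and 2, matching the tick marks on the standardized axis.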
39
Outcome Measure in Case Study
GHA = Global Hyperactivity Aggregate
For each child at each time:
Z1 = Z-Score for ADHD from Teachers
Z2 = Z-Score for WWP from Parents
Z3 = Z-Score for ADHD in Classroom
Z4 = Z-Score for Conner on Computer
All have higher values → more hyperactive.
Zs make each measure scaled similarly.
GHA = Mean of Z1, Z2, Z3, Z4
40
Confidence Interval for Population Mean
95% Reference range, or Prediction Interval, or
Normal Range if subjects are normal, is
sample mean ± 2(SD)
_____________________________________
95% Confidence interval (CI) for the (true,
but unknown) mean for the entire population is
sample mean ± 2(SD/√N)
SD/√N is called the Standard Error of the Mean (SEM).
41
Confidence Interval: More Details
Confidence interval (CI) for the (true, but
unknown) mean for the entire population is:
95%, N = 100: sample mean ± 1.98(SD/√N)
95%, N = 30: sample mean ± 2.05(SD/√N)
90%, N = 100: sample mean ± 1.66(SD/√N)
99%, N = 100: sample mean ± 2.63(SD/√N)
If N is small (N < 30?), a normal, bell-shaped
data distribution is needed. Otherwise, skewness
is OK. This is not true for the PI, where
percentiles are needed.
42
Confidence Interval: Case Study
Table 2, Adjusted CI: -0.12 (-0.37 to 0.13)
Confidence Interval: -0.14 ± 1.99(1.04/√73) =
-0.14 ± 0.24 → -0.38 to 0.10, close to the
adjusted CI.
Prediction Interval: -0.14 ± 1.99(1.04) = -0.14 ±
2.07 → -2.21 to 1.93
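
The case-study arithmetic can be reproduced with two small helpers. The mean (-0.14), SD (1.04), N (73), and multiplier (1.99) are taken from the slide; the function names are invented for this sketch, and the multiplier is passed in rather than computed from a t distribution.

```python
import math

def confidence_interval(sample_mean, sd, n, multiplier):
    """Interval for the population mean: mean +/- multiplier * SD / sqrt(N)."""
    half_width = multiplier * sd / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width

def prediction_interval(sample_mean, sd, multiplier):
    """Interval for individuals: mean +/- multiplier * SD."""
    half_width = multiplier * sd
    return sample_mean - half_width, sample_mean + half_width

ci = confidence_interval(-0.14, 1.04, 73, 1.99)
pi = prediction_interval(-0.14, 1.04, 1.99)

print([round(x, 2) for x in ci])
print([round(x, 2) for x in pi])
```

The only difference between the two is the division by √N, which is why the CI narrows as N grows while the PI does not.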
43
CI for the Antibody Example
So, there is 95% assurance that an individual is
between 11.6 and 909.6, the PI.
GM = exp(4.633) = 102.8
low2 = exp(4.633 - 2(1.09)) = 11.6
up2 = exp(4.633 + 2(1.09)) = 909.6
So, there is 95% certainty that the population
(geometric) mean is between 92.1 and 114.8, the CI.
GM = exp(4.633) = 102.8
low2 = exp(4.633 - 2(1.09)/√394) = 92.1
up2 = exp(4.633 + 2(1.09)/√394) = 114.8
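
A short sketch that recomputes both antibody intervals from the log-scale summary values on the slide (log mean 4.633, log SD 1.09, N = 394). The variable names are chosen for this example only.

```python
import math

LOG_MEAN = 4.633  # mean of the natural logs, from the slide
LOG_SD = 1.09     # SD of the natural logs, from the slide
N = 394           # number of subjects, from the slide

gm = math.exp(LOG_MEAN)

# Prediction interval for individuals: +/- 2 SD on the log scale.
pi_low = math.exp(LOG_MEAN - 2 * LOG_SD)
pi_up = math.exp(LOG_MEAN + 2 * LOG_SD)

# Confidence interval for the population value: +/- 2 SEM on the log scale.
log_sem = LOG_SD / math.sqrt(N)
ci_low = math.exp(LOG_MEAN - 2 * log_sem)
ci_up = math.exp(LOG_MEAN + 2 * log_sem)

print(round(gm, 1), round(pi_low, 1), round(pi_up, 1))
print(round(ci_low, 1), round(ci_up, 1))
```

Both intervals are built on the log scale and then back-transformed, so each is asymmetric about GM on the original scale.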
44
Summary Statistics: Two Variables (Correlation)

Always look at scatterplot.
Correlation, r, ranges from -1 (perfect inverse
relation) to +1 (perfect direct relation). Zero = no
relation.
Specific to the ranges of the two variables.
Typically, cannot extrapolate to populations with
other ranges.
Measures association, not causation.
We will examine details in Session 5.

45
Correlation Depends on Range of Data
[Figure: scatter plots A and B]
Graph B contains only the points from graph A
that are in the ellipse. Correlation is reduced
in graph B. Thus correlation between two
quantities may be quite different in different
study populations.
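
The effect of a restricted range on correlation can be illustrated with simulated data. Everything here is an assumption made for the sketch: the data are simulated, the restriction is a simple cut on x (|x| < 0.5) rather than the ellipse in the graph, and `pearson_r` is a hand-written helper.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

random.seed(1)  # fixed seed so the sketch is reproducible

# Simulated population: y depends on x plus noise of equal size.
xs = [random.gauss(0.0, 1.0) for _ in range(5000)]
ys = [x + random.gauss(0.0, 1.0) for x in xs]
r_full = pearson_r(xs, ys)

# Same relationship, but only subjects with a narrow range of x.
narrow = [(x, y) for x, y in zip(xs, ys) if abs(x) < 0.5]
r_narrow = pearson_r([p[0] for p in narrow], [p[1] for p in narrow])

print(round(r_full, 2), round(r_narrow, 2))
```

The underlying relation between x and y is identical in both sets; only the range of x differs, yet the correlation in the narrow subset is much smaller.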
46
Correlation and Measurement Precision
[Figure: panels A and B; overall scatter plot, with
r = 0 for the subpopulation with 5 < x < 6]
A lack of correlation for the subpopulation with
5 < x < 6 may be due to inability to measure x and y
well. Lack of evidence of association is not
evidence of lack of association.
47
Summary Statistics: Survival Probability
Example: 100 subjects start a study. Nine
subjects drop out at 2 years and 7 drop out at 4
years, and 20, 20, and 17 die in the intervals 0-2,
2-4, and 4-5 years.
Then the 0-2 year interval has 80/100
surviving, the 2-4 interval has 51/71 surviving,
and 4-5 has 27/44 surviving. So, the 5-year
survival probability is
(80/100)(51/71)(27/44) = 0.35.
The actual method uses finer subdivisions than 0-2,
2-4, 4-5 years, with exact death times.
We don't know the vital status of 16 subjects at 5 years.
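
The interval-by-interval calculation can be sketched as a running product, using the counts from the example. This is a simplified interval version, not the exact-death-time method; the function name and the assumption that dropouts leave at the end of each interval are choices made for this illustration.

```python
def survival_probability(n_start, steps):
    """Multiply interval survival fractions; dropouts leave at the end of an interval.

    steps: list of (deaths_in_interval, dropouts_at_end_of_interval).
    """
    at_risk = n_start
    prob = 1.0
    for deaths, dropouts in steps:
        prob *= (at_risk - deaths) / at_risk
        at_risk -= deaths + dropouts
    return prob

# Intervals 0-2, 2-4, 4-5 years: deaths 20, 20, 17; dropouts 9 at 2 yrs, 7 at 4 yrs.
steps = [(20, 9), (20, 7), (17, 0)]
five_year = survival_probability(100, steps)

print(round(five_year, 2))
```

The numbers at risk work out to 100, 71, and 44, reproducing the fractions 80/100, 51/71, and 27/44. Only 27 of the original 100 are known to be alive at 5 years, which is why the survival probability is not simply the observed fraction of subjects.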
48
Summary Statistics: Relative Likelihood of an
Event
Compare groups A and B on mortality.
Relative Risk = ProbA[Death] / ProbB[Death],
where Prob[Death] = deaths per 100 persons.
Odds Ratio = OddsA[Death] / OddsB[Death],
where Odds = Prob[Death] / Prob[Survival].
Hazard Ratio = IA[Death] / IB[Death],
where I = incidence = deaths per 100 person-days.
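
The three ratios can be compared side by side on one set of counts. The deaths, group sizes, and person-days below are hypothetical, and the hazard ratio is computed here simply as a ratio of incidence rates, following the definition on the slide.

```python
# Hypothetical counts for two groups (illustrative numbers only).
group_a = {"deaths": 30, "persons": 100, "person_days": 9000}
group_b = {"deaths": 20, "persons": 100, "person_days": 9500}

def risk(g):
    """Probability of death: deaths per person."""
    return g["deaths"] / g["persons"]

def odds(g):
    """Odds of death: Prob[Death] / Prob[Survival]."""
    p = risk(g)
    return p / (1.0 - p)

def incidence(g):
    """Incidence rate: deaths per person-day of follow-up."""
    return g["deaths"] / g["person_days"]

relative_risk = risk(group_a) / risk(group_b)
odds_ratio = odds(group_a) / odds(group_b)
hazard_ratio = incidence(group_a) / incidence(group_b)

print(round(relative_risk, 2), round(odds_ratio, 2), round(hazard_ratio, 2))
```

With these counts the odds ratio is further from 1 than the relative risk, which is typical when the event is not rare.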
