Biostatistics in Practice - PowerPoint PPT Presentation

Slides: 27
Provided by: bios62
1
Biostatistics in Practice
Session 2: Summarization of Quantitative Information
Peter D. Christenson, Biostatistician
http://gcrc.LABioMed.org/Biostat
2
Readings for Session 2 from StatisticalPractice.com
  • Units of Analysis
  • Experimental units
  • Look at the data
  • Summary statistics
  • Typical values and their variability
  • Correlation
  • Normal distribution
  • Confidence intervals

Units: e.g., mouse or litter; not, e.g., mg/ml.
Raw data, preferably. Summary: the particular
method depends on structure in the raw data.
Bell curve is often natural. Want ranges
(for what?).
3
Units of Analysis
Go over this entire reading at
StatisticalPractice.com. The author states that
some students are more similar to each other than
are other students, or that some students are
independent. What does this mean? Independence
really refers to the measurement that is made,
not to the units such as students, classes, or
schools. If knowing the value for one student
does not change the likelihood of another
student's value, given class means, then the
students are independent for this measurement.
Would students from the same class likely be
independent on height? How about on knowing some
academic fact, such as what a case-control study
is?
4
Example: Units and Independence
Ten mice receive treatment A, each is bled, and
each blood sample is divided into 3 aliquots.
The same is done for 10 mice on treatment B.
  • A serum hormone is measured in the 60 aliquots
    and compared between A and B. The unit is a
    mouse. The mouse means, each from 3 aliquots,
    are independent, so N = 10 + 10; the aliquots
    from a single mouse are not independent.
  • One of the 30 A aliquots is further divided into
    25 parts, and 5 different in vitro challenges
    are each applied to 5 of the parts. The same is
    done for a single B aliquot. For the challenge
    experiment, each part is a unit, their values
    are independent, and N = 25 + 25. For comparing
    A and B, there are only N = 1 + 1 units, the two
    mice.
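
The first bullet can be sketched in code. This is only an illustration under stated assumptions: the hormone readings are made up by a fixed formula, and the names (make_aliquots, readings, mouse_means) are hypothetical. The point is that the 60 aliquot values are collapsed to one mean per mouse before A and B are compared.

```python
from statistics import mean


def make_aliquots(base):
    """Return hypothetical readings: 10 mice, 3 aliquots each (not real data)."""
    return {
        f"mouse{m}": [base + m + 0.1 * a for a in range(3)]
        for m in range(10)
    }


# Two treatments, 10 mice each, 3 aliquots per mouse = 60 measurements.
readings = {"A": make_aliquots(50.0), "B": make_aliquots(55.0)}

# Aliquots from the same mouse are not independent, so average them:
# one value per mouse is the unit of analysis.
mouse_means = {
    trt: [mean(aliquots) for aliquots in mice.values()]
    for trt, mice in readings.items()
}

n_aliquots = sum(len(a) for mice in readings.values() for a in mice.values())
n_units = {trt: len(values) for trt, values in mouse_means.items()}

print("measurements:", n_aliquots)
print("units per treatment:", n_units)
```

Any comparison of A and B would then use the 10 + 10 mouse means, not the 60 aliquot values.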

5
Look at the Data
Statistical methods depend on the form of a set
of data, which can be assessed with some common
useful graphics:

Graph Name     Y-axis          X-axis
Histogram      Count or %      Category
Scatterplot    Continuous      Continuous
Dot Plot       Continuous      Category
Box Plot       Percentiles     Category
Line Plot      Mean or value   Category

Examples on following slides are from
StatisticalPractice.com.
6
Data Graphical Displays
Histogram: summarized. The raw-data version is a
stem-and-leaf plot. We will see one later.
Scatter plot: raw data.
7
Data Graphical Displays
Dot Plot: raw data.
Box Plot: summarized.
8
Data Graphical Displays
Line or Profile Plot (X-axis: Week)
Summarized; the antennae can represent various
ranges.
9
Look at the Data, Continued
  • What do we look for?
  • Histograms. Ideal: symmetric, bell-shaped.
  • Potential problems:
  • Skewness.
  • Multiple peaks.
  • Many values at, say, 0, and bell-shaped
    otherwise.
  • Outliers.
  • Scatter plots. Ideal: football-shaped ellipse.
  • Potential problems:
  • Outliers.
  • Funnel shape.
  • Gap with no values for one or both variables.

10
Example: Histogram OK for Default Analyses
  • Symmetric.
  • One peak.
  • Roughly bell-shaped.
  • No outliers.

Default: the software default, i.e., the typical
mean, SD, and confidence intervals.
11
Histograms Not OK for Default Analyses
Skewed: need to transform intensity to another
scale, e.g., log(intensity).
Multi-Peak: need to summarize with percentiles,
not the mean.
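
A small sketch of both remedies, using Python's standard statistics module. The intensity values are hypothetical, chosen only to be right-skewed. It shows percentiles as a summary and what a log transform does to the "typical value".

```python
import math
from statistics import mean, median, quantiles

# Hypothetical right-skewed intensities (illustrative values only).
intensity = [1, 2, 2, 3, 3, 4, 5, 8, 20, 100]

raw_mean = mean(intensity)      # pulled upward by the long right tail
raw_median = median(intensity)  # 50th percentile, robust to the tail
q1, q2, q3 = quantiles(intensity, n=4)  # 25th, 50th, 75th percentiles

# Log transform: average on the log scale, then back-transform.
log_mean = mean(math.log(v) for v in intensity)
back_transformed = math.exp(log_mean)

print("mean:", raw_mean, "median:", raw_median)
print("quartiles:", q1, q2, q3)
print("back-transformed mean of logs:", round(back_transformed, 2))
```

For data like these, the back-transformed mean of the logs lies much closer to the median than the raw mean does.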
12
Histograms Not OK for Default Analyses
Truncated Values
Outliers
Undetectable in 28 samples (<LLOQ)
LLOQ
Need to use percentiles for most analyses.
Need to use median, not mean, and percentiles.
13
Example: Scatter Plot OK for Typical Analyses
14
Scatter Plot Not OK for Typical Analyses
Gap and Outlier (Ott, Amer J Obstet Gyn
2005;192:1803-9): consider analyzing subgroups.
Funnel-Shaped (Ferber et al, Amer J Obstet Gyn
2004;190:1473-5): could transform the y-value to
another scale, e.g., logarithm.
15
Summary Statistics I
  • Typical Values (Location)
  • Mean for symmetric data.
  • Median for skewed data.
  • Geometric mean for some skewed data (see later
    slide).
  • Variation in Values (Spread): Standard
    deviation (SD)
  • Standard by convention; its values are
    non-intuitive.
  • SD is roughly the average deviation of values
    from their mean.
  • SD of what? E.g., SD of individuals, or of group
    means.
  • Fundamental, critical measure for most
    statistical methods.
  • See graphs in reading for how mean and SD change
    if units of measurement change, e.g., nmoles to
    mg:
  • Mean(a + bX) = a + b×Mean(X)
  • SD(a + bX) = |b|×SD(X)
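
The two change-of-units rules at the end of this list can be checked numerically. A minimal sketch, with hypothetical values for x, a, and b; a negative b is used to show why the SD rule needs the absolute value.

```python
from statistics import mean, stdev

x = [2.0, 4.0, 4.0, 5.0, 7.0]  # hypothetical measurements
a, b = 10.0, -3.0              # hypothetical shift and scale factor
y = [a + b * v for v in x]     # the same data in new units

mean_direct = mean(y)
mean_by_rule = a + b * mean(x)    # Mean(a + bX) = a + b*Mean(X)

sd_direct = stdev(y)
sd_by_rule = abs(b) * stdev(x)    # SD(a + bX) = |b|*SD(X); the shift a drops out

print(mean_direct, mean_by_rule)
print(sd_direct, sd_by_rule)
```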

16
Examples: Mean and SD
A: Mean 60.6 min., SD 9.6 min.
B: Mean 15.1 min., SD 2.8 min.
Note that the entire range of data in A is about
6 SDs wide; this is the source of the Six Sigma
process used in business quality control.
17
Examples Mean and SD
Skewed
Multi-Peak
SD 1.1 min.
Mean 70.3 min.
Mean 1.0 min.
SD 22.3 min.
18
Summary Statistics II
  • Rule of Thumb
  • For bell-shaped distributions of data
    (normally distributed):
  • 68% of values are within mean ± 1 SD
  • 95% of values are within mean ± 2 SD
  • 99.7% of values are within mean ± 3 SD
  • Geometric means (see next slide)
  • Used for some skewed data.
  • Take logs of individual values.
  • Find, say, mean ± 2 SD → mean, (low, up) of the
    logged values.
  • Find antilogs of mean, low, up. Call them GM,
    low2, up2 (back on original scale).
  • GM is the geometric mean. The interval
    (low2, up2) is skewed about GM (corresponds to
    graph).
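
The geometric-mean steps above can be sketched as a small function. This is an illustration, not the presenter's code: the function name geometric_summary and the titer values are hypothetical, and all values must be positive for the logs to exist.

```python
import math
from statistics import mean, stdev


def geometric_summary(values, k=2.0):
    """Return (GM, low2, up2): antilogs of the log-scale mean and mean -/+ k SD."""
    logs = [math.log(v) for v in values]      # step 1: take logs
    m, s = mean(logs), stdev(logs)            # step 2: mean and SD of the logs
    low, up = m - k * s, m + k * s
    return math.exp(m), math.exp(low), math.exp(up)  # step 3: antilogs


# Hypothetical skewed data: each value is double the previous one.
titers = [10, 20, 40, 80, 160, 320, 640]
gm, low2, up2 = geometric_summary(titers)

print("GM:", round(gm, 1))
print("interval:", round(low2, 1), "to", round(up2, 1))
```

The interval is symmetric about the mean on the log scale, so on the original scale it is symmetric in ratio (GM/low2 equals up2/GM) and therefore extends further above GM than below it.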

19
Geometric Means
These are flipped histograms rotated 90º, and box
plots. Any base for the log transformation gives
a symmetric distribution. Ln is used here; log10
gives the same GM and bounds.
GM = exp(4.633) = 102.8
low2 = exp(4.633 - 2×1.09) = 11.6
upp2 = exp(4.633 + 2×1.09) = 909.6
20
Summary Statistics III (Correlation)
  • We will examine calculation details later.
  • With 2 continuous measures, always look at
    scatterplot.
  • See graphs in readings for values ranging from -1
    (perfectly inverse relation) to 1 (perfectly
    direct). Zero = no relation.
  • Measures linear association.
  • Very sensitive to outliers.
  • Specific to the ranges of the two variables.
  • Typically, cannot extrapolate to populations with
    other ranges.
  • Subgroups may not have the same correlation; in
    fact, they could have the opposite association
    (ecological fallacy).
  • Special correlations are used for non-symmetric
    data.
  • Measures association, not causation.
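
To illustrate the sensitivity to outliers, here is a minimal sketch that computes the Pearson correlation by hand, with and without one extreme point. The data are hypothetical and pearson_r is an illustrative helper, not a library function.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: y just alternates between 5 and 6 as x goes from 1 to 10.
x = list(range(1, 11))
y = [5, 6] * 5

r_without = pearson_r(x, y)              # weak correlation
r_with = pearson_r(x + [30], y + [30])   # same data plus one outlier at (30, 30)

print("r without outlier:", round(r_without, 2))
print("r with outlier:   ", round(r_with, 2))
```

A single point far from the rest is enough to turn a weak correlation into a strong one, which is why the scatterplot should always be inspected first.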

21
Correlation Depends on Ranges of X and Y
B
A
Graph B contains only the graph A points in the
ellipse. Correlation is reduced in graph B. Thus
correlations for the same quantities X and Y may
be quite different in different study
populations.
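
The same effect can be reproduced numerically. In this sketch the data are hypothetical and deterministic (y equals x plus an alternating error of +10 or -10), and pearson_r is an illustrative helper. Restricting x to a narrow band lowers the correlation even though the relation between x and y is unchanged.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical full population: x from 0 to 99, y = x plus a fixed-size error.
x_full = list(range(100))
y_full = [x + (10 if x % 2 == 0 else -10) for x in x_full]

# Restricted population: only the points with 40 <= x < 60.
pairs = [(x, y) for x, y in zip(x_full, y_full) if 40 <= x < 60]
x_sub = [p[0] for p in pairs]
y_sub = [p[1] for p in pairs]

r_full = pearson_r(x_full, y_full)
r_sub = pearson_r(x_sub, y_sub)

print("r over the full range:", round(r_full, 2))
print("r over the narrow range:", round(r_sub, 2))
```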
22
Correlation and Measurement Precision
[Figure: scatter plot of the overall population
with the subpopulation 5<x<6 marked; r is about 0
within that subpopulation.]
A lack of correlation for the subpopulation with
5<x<6 may be due to inability to measure x and y
well. Again, lack of evidence is not evidence of
lack (of association, in this setting).
23
Confidence Intervals I
  • See beginning of reading for the goal of
    confidence intervals.
  • CIs are not about individuals, but rather about
    populations, i.e., groups of individuals.
  • A mean from a sample estimates the mean of the
    entire population.
  • 95% CI for the mean is a range of values we're
    95% sure contains the unknown mean.
  • Reading example: N = 40 non-smokers. Vitamin C
    mean ± 2SD is 90 ± 2(35) = 20 to 160, the normal
    range. Our estimate of the unknown mean for all
    non-smokers is 90, but how confident are we about
    that estimate? Need a range for it that we are
    95% confident contains the unknown mean.

24
Confidence Intervals II
  • Can calculate a CI for any unknown parameter.
  • Typical 95% CI for a mean is roughly mean ±
    2SD/√N.
  • Larger SD → wider CI.
  • Larger N → narrower CI.
  • More confidence → wider CI.
  • For reading example, CI = 90 ± 2(35)/√40, roughly
    78 to 102.
  • I am being sloppy with terminology. The
    underlined mean above is the always-to-be-unknown
    mean for the population (everyone). The other
    mean, before ±, is the mean that is calculated
    from the sample of N, denoted X-bar, and it
    estimates the unknown mean, denoted µ.
  • Note explicit use of N: correct unit of analysis
    is critical. What if we measured vitamin C on 10
    days for each subject?
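
A minimal sketch of the rough CI formula, applied to the reading example (mean 90, SD 35, N = 40). The function name approx_ci95 is hypothetical, and the multiplier 2 is the slide's approximation rather than an exact t-value.

```python
import math


def approx_ci95(sample_mean, sd, n):
    """Rough 95% CI for a mean: sample_mean -/+ 2*SD/sqrt(N)."""
    half_width = 2 * sd / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width


# Reading example: 40 non-smokers, vitamin C mean 90, SD 35.
low, up = approx_ci95(90, 35, 40)

# Four times as many subjects gives a CI half as wide.
low4, up4 = approx_ci95(90, 35, 160)

# Treating 10 days x 40 subjects as N = 400 would give a CI that is too
# narrow: repeated days on one subject are not independent, so N stays 40.
wrong_low, wrong_up = approx_ci95(90, 35, 400)

print("N=40: ", round(low, 1), "to", round(up, 1))
print("N=160:", round(low4, 1), "to", round(up4, 1))
print("N=400 (wrong unit):", round(wrong_low, 1), "to", round(wrong_up, 1))
```

With N = 40 the computed limits are about 78.9 and 101.1, in line with the rounded range quoted on the slide.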

25
Confidence vs. Prediction Intervals
  • Typical 95% CI for a mean is roughly mean ±
    2SD/√N.
  • Recall that this CI is the range of values we're
    95% sure contains the unknown mean for
    everyone.
  • What about (normal) ranges for individuals?
  • This is often called a prediction interval (PI)
    = normal range = reference range.
  • 95% of individuals fall in a 95% PI.
  • 95% chance that an individual falls in a 95% PI.
  • Typical 95% PI for an individual is roughly mean
    ± 2SD.
  • With large N (how large? often N>30 is used), do
    not need bell-shaped data distribution for the
    CI, but that shape IS needed for the PI,
    regardless of N. Otherwise, we use percentiles
    for normal ranges.
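
The "mean ± 2SD" prediction interval relies on the bell-shaped rule of thumb. A short check using Python's standard statistics.NormalDist, followed by the PI for the vitamin C reading example (mean 90, SD 35). The variable names are illustrative.

```python
from statistics import NormalDist

# Fraction of a normal distribution within k SDs of its mean.
z = NormalDist()  # standard normal: mean 0, SD 1
coverage = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

# Rough 95% PI for an individual in the reading example: mean -/+ 2*SD.
# Note that N does not appear, unlike in the CI for the mean.
sample_mean, sd = 90, 35
pi = (sample_mean - 2 * sd, sample_mean + 2 * sd)

for k, frac in coverage.items():
    print(f"within {k} SD: {frac:.4f}")
print("95% PI:", pi)
```

The coverage figures hold only for bell-shaped data, which is why percentiles are used for normal ranges otherwise.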

26
CI and PI for the Antibody Example
PI:
GM = exp(4.633) = 102.8
low2 = exp(4.633 - 2×1.09) = 11.6
upp2 = exp(4.633 + 2×1.09) = 909.6
So, there is 95% assurance that an individual is
between 11.6 and 909.6, the PI.
CI:
GM = exp(4.633) = 102.8
lower = exp(4.633 - 2×1.09/√394) = 92.1
upper = exp(4.633 + 2×1.09/√394) = 114.8
So, there is 95% assurance that the population
(geometric) mean is between 92.1 and 114.8, the
CI.
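
These numbers can be reproduced in a few lines. The sketch assumes, as the formulas on this slide suggest, that 4.633 and 1.09 are the mean and SD of the natural-log values and that N = 394. The variable names are illustrative.

```python
import math

log_mean, log_sd, n = 4.633, 1.09, 394  # values read off the slide

gm = math.exp(log_mean)

# PI for an individual: log-scale mean -/+ 2*SD, then antilog.
pi = (math.exp(log_mean - 2 * log_sd), math.exp(log_mean + 2 * log_sd))

# CI for the population value: log-scale mean -/+ 2*SD/sqrt(N), then antilog.
half_width = 2 * log_sd / math.sqrt(n)
ci = (math.exp(log_mean - half_width), math.exp(log_mean + half_width))

print("GM:", round(gm, 1))
print("PI:", round(pi[0], 1), "to", round(pi[1], 1))
print("CI:", round(ci[0], 1), "to", round(ci[1], 1))
```

The CI is far narrower than the PI because its half-width is divided by √394, roughly 20.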