Title: Community Health Assessment in Small Populations: Tools for Working With Small Numbers
1Community Health Assessment in Small Populations
Tools for Working With Small Numbers
- Region 2 Quarterly Meeting
- January 26, 2009
2Outline
- Description of the Problem
- Random variation
- Survey samples versus complete count datasets
- Observed events versus underlying risk
- Statistical Tools
- Confidence intervals
- Combining data
- SMR
3Small Numbers: The Problem
4Random Variation
- Exercise
- Select a sample
- Calculate the median age
- State of New Mexico median age (see note 1): 36.0
- Why are they different?
1. 2007 American Community Survey, U.S. Census Bureau. Downloaded on 1/21/09 from http://factfinder.census.gov
5Random Variation and Sample Size
- What if we had a sample of New Mexico residents that was:
- Randomly selected
- n = 5,000
- Would it better match the state Census Bureau estimate?
6Size Matters
- The larger sample helps to cancel out the effects of random variation.
- Some sample subjects are older than the median.
- Some sample subjects are younger than the median.
- As you increase the number of sample subjects, the differences cancel out, and you get closer to the median.
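The exercise can be repeated on a computer. The following is a minimal sketch, assuming a made-up population of ages (uniform from 0 to 90, which is not the real New Mexico age distribution) and arbitrary sample sizes of 20 and 5,000. It draws many samples of each size and compares how much the sample medians bounce around.

```python
import random
import statistics

def sample_medians(population, n, repeats, rng):
    """Median age from each of `repeats` random samples of size n."""
    return [statistics.median(rng.sample(population, n)) for _ in range(repeats)]

rng = random.Random(2009)  # fixed seed so the run is repeatable
# Hypothetical population of 100,000 ages; NOT the real New Mexico data.
population = [rng.randint(0, 90) for _ in range(100_000)]
true_median = statistics.median(population)

small = sample_medians(population, 20, 200, rng)
large = sample_medians(population, 5_000, 200, rng)

# Standard deviation of the sample medians: a measure of random variation.
spread_small = statistics.pstdev(small)
spread_large = statistics.pstdev(large)
print(f"population median: {true_median}")
print(f"spread of sample medians, n=20:    {spread_small:.2f}")
print(f"spread of sample medians, n=5,000: {spread_large:.2f}")
```

The medians from the larger samples should cluster much more tightly around the population median.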
7Reliability and Validity
- The term "accuracy" is often used in relation to validity, while the term "precision" is used to describe reliability.
8Numerator vs. Denominator
- A large sample size means we have a large denominator, but the numerator also matters.
- Some methods use a Poisson distribution, which considers ONLY the numerator size when assessing precision.
- If we have only 1 event in one year, and 2 the next year, the addition of a single event doubles the rate of occurrence.
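To see why the numerator matters, recall that under a Poisson assumption the standard error of a count is roughly its square root, so the relative standard error is about 1 divided by the square root of the count. The sketch below tabulates that for a few counts; the function name and the counts chosen are illustrative only.

```python
import math

def relative_standard_error(events):
    """Approximate relative standard error of a count, assuming Poisson variation."""
    return 1 / math.sqrt(events)

for events in (1, 2, 5, 20, 100):
    print(f"{events:>3} events: RSE about {relative_standard_error(events):.0%}")
```

With 1 or 2 events the uncertainty is as large as the count itself; it shrinks only slowly as events accumulate.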
9Random Variation and Complete Count Datasets
- What are some complete count datasets?
- How do we use them for community health
assessment?
17Summary of the Problem
- Measurements are subject to sampling variability, also known as random error.
- Even complete count datasets are subject to random error because we use them as a reflection of the underlying disease risk or rate.
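A small simulation can make the second point concrete. This is a sketch under stated assumptions: the underlying risk is held constant at 45 per 100,000 (an illustrative value), the population is the 4,470 used for Union County elsewhere in these slides, and each resident is treated as an independent trial. Even with a complete count every year, the observed rate moves around.

```python
import random

rng = random.Random(26)   # fixed seed so the run is repeatable
POPULATION = 4_470        # Union County population used elsewhere in these slides
TRUE_RISK = 45 / 100_000  # assumed constant underlying risk (illustrative)

def simulate_year(population, risk, rng):
    """Count how many residents experience the event in one simulated year."""
    return sum(1 for _ in range(population) if rng.random() < risk)

counts = [simulate_year(POPULATION, TRUE_RISK, rng) for _ in range(10)]
rates = [count / POPULATION * 100_000 for count in counts]
for year, (count, rate) in enumerate(zip(counts, rates), start=1):
    print(f"year {year:>2}: {count} deaths, rate {rate:6.1f} per 100,000")
```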
18Summary of the Problem
- A larger sample (denominator, population size) helps to cancel out the effects of random variation.
- Size matters, in both the numerator and the denominator.
- A measure that is relatively free from the effects of random variation is called precise, reliable, or stable. Those terms are synonymous.
19Small Numbers: Statistical Tools
20Tool 1. Confidence Intervals
- Use confidence intervals to help you decide whether the rate is stable.
- They won't solve the problem, but they will provide information to help you interpret the rates.
- The stability of an observed rate is important when comparing areas or assessing whether disease risk has increased or decreased.
25Calculation of 95% C.I.
- The 95% confidence interval is calculated as 1.96 x the standard error of the estimate (s.e.).
- s.e. is calculated as $\sqrt{pq/n}$
- So the 95% C.I. is $\pm 1.96 \times \sqrt{pq/n}$
26The Normal Distribution
27Poisson Distribution
28Calculation of 95% C.I.
- p stands for probability. It is the rate without the multiplier (e.g., 100,000 for deaths).
- q is the complement of the probability (1 minus p).
- In Union County, there were 2 diabetes deaths among a population of 4,470, for a probability of 0.00045 (about 45 in 100,000).
29Calculation of 95% C.I.
- Formula: $1.96 \times \sqrt{pq/n}$
- p = 0.00045, q = 0.99955, n = 4,470
- $pq/n$ = 0.000447 / 4,470 = 0.000000100051
- $\sqrt{pq/n}$ = 0.000316
- $1.96 \times \sqrt{pq/n}$ = 1.96 x 0.000316 = 0.00062
- Then we need to add the multiplier back in, so the confidence interval is:
- 100,000 x 0.00062 = 62
30Calculation of 95% C.I.
- The diabetes death rate was 44.7 per 100,000.
- The confidence interval statistic is applied both above and below the rate.
- C.I. LL (lower limit) is 44.7 - 62 = -17.3, and since we cannot have a negative rate, we'll call it 0.
- C.I. UL (upper limit) is 44.7 + 62 = 106.7
- The diabetes death rate for Union County in 2006 was 44.7 per 100,000 (95% C.I., 0 to 106.7)
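The Union County arithmetic can be checked with a few lines of code. This is a minimal sketch of the normal-approximation calculation walked through on the slides; the function name and its parameters are illustrative, not part of any published tool.

```python
import math

def rate_confidence_interval(events, population, multiplier=100_000, z=1.96):
    """Normal-approximation confidence interval for a rate, as on the slides."""
    p = events / population
    q = 1 - p
    standard_error = math.sqrt(p * q / population)
    margin = z * standard_error * multiplier  # add the multiplier back in
    rate = p * multiplier
    lower = max(0.0, rate - margin)  # a rate cannot be negative
    upper = rate + margin
    return rate, margin, lower, upper

rate, margin, lower, upper = rate_confidence_interval(2, 4_470)
print(f"rate {rate:.1f}, margin {margin:.1f}, 95% C.I. {lower:.1f} to {upper:.1f}")
```

For 2 deaths among 4,470 people this reproduces the slide values: a rate of 44.7 with a margin of about 62.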
34Confidence Interval Factoids
- The confidence interval may be thought of as the range of probable true values for a statistic.
- The confidence interval is an indication of the precision (stability, reliability) of the estimate.
- A confidence interval is typically expressed as a symmetric value (e.g., "plus or minus 5"). But for percentages, when the point estimate is close to 0% or 100%, a confidence interval with an asymmetric shape can be used.
35More Confidence Interval Factoids
- The 95% confidence interval (calculated as 1.96 times the standard error of a statistic) indicates the range of values within which the statistic would fall 95% of the time if the researcher were to calculate the statistic from an infinite number of samples of the same size drawn from the same base population.
- Unless otherwise stated, a confidence interval will be the "95% confidence interval."
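The repeated-sampling idea can be approximated by simulation. The sketch below assumes a hypothetical true proportion of 0.30, samples of 500, and 2,000 repetitions standing in for "an infinite number of samples." It counts how often the sample estimate lands within 1.96 standard errors of the true value.

```python
import math
import random

rng = random.Random(95)  # fixed seed so the run is repeatable
TRUE_P = 0.30            # hypothetical true proportion in the base population
N = 500                  # size of each sample
REPEATS = 2_000          # stand-in for "an infinite number of samples"

standard_error = math.sqrt(TRUE_P * (1 - TRUE_P) / N)
low = TRUE_P - 1.96 * standard_error
high = TRUE_P + 1.96 * standard_error

inside = 0
for _ in range(REPEATS):
    # Draw one sample and count how many subjects have the characteristic.
    hits = sum(1 for _ in range(N) if rng.random() < TRUE_P)
    if low <= hits / N <= high:
        inside += 1

share_inside = inside / REPEATS
print(f"{share_inside:.1%} of sample estimates fell within 1.96 standard errors")
```

The share should come out close to 95%, though not exactly, because the number of repetitions is finite.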
36More Confidence Interval Factoids
- The 90% confidence interval, also commonly used, is calculated as 1.65 times the standard error of the estimate.
- To calculate a confidence interval when the number of health events = 0, you may use 0 as the lower confidence limit, and for the upper confidence limit, assume a count of 3 health events in the same population.
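One reading of the zero-event rule is that the upper limit is simply the rate you would get from 3 events in the same population. The sketch below applies that reading to a population of 4,470 (the Union County figure from the earlier slides; the zero count itself is hypothetical). It also prints -ln(0.05), the Poisson mean at which 0 events would be seen only 5% of the time, which is why a count of 3 is a reasonable stand-in.

```python
import math

def zero_event_upper_limit(population, multiplier=100_000):
    """Upper limit for a rate with 0 observed events, using a count of 3."""
    return 3 / population * multiplier

# Why 3? For a Poisson count, the mean that gives 0 events only 5% of the time
# is -ln(0.05), which is very close to 3.
print(f"-ln(0.05) = {-math.log(0.05):.3f}")
print(f"upper limit for 0 events in 4,470 people: "
      f"{zero_event_upper_limit(4_470):.1f} per 100,000")
```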
37Tool 2. Combine Data
- Combine years
- Combine geographic areas (e.g., use the regional estimate rather than the county estimate)
- Use a broader age group
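Combining years narrows the confidence interval because both the numerator and the denominator grow. The sketch below compares the single-year Union County figures (2 deaths, 4,470 people) with the eight-year figures used later in this deck (33,929 person-years). The eight-year count of 14 deaths is an inference from the stated rate of 41.3 per 100,000 and is not a number given on the slides.

```python
import math

def margin_per_100k(events, person_years, z=1.96):
    """Half-width of the normal-approximation C.I., per 100,000."""
    p = events / person_years
    return z * math.sqrt(p * (1 - p) / person_years) * 100_000

one_year = margin_per_100k(2, 4_470)       # 2006 only, from the slides
eight_years = margin_per_100k(14, 33_929)  # 1999-2006; 14 deaths is inferred
print(f"single year: +/- {one_year:.1f}")
print(f"eight years: +/- {eight_years:.1f}")
```

The margin drops from roughly 62 to roughly 22 per 100,000, so the combined rate is considerably more stable.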
41Interpretation of Diabetes Deaths in Union County
- Union County's diabetes death rate (1999-2006) was higher than the overall state rate, but was not statistically significantly higher.
- In other words, the Union County rate was marginally higher than the New Mexico state rate.
- Was it higher than Santa Fe County?
42Differences Between Two Rates
- Statistical significance of a change in a rate from time 1 to time 2
- Statistical significance of the difference between two rates in one time period (e.g., Union County versus Santa Fe County).
- Test of Proportions
43Test of Proportions
- Proportion 1: 1999-2006 Union County diabetes death rate = 41.3/100,000 = 0.000413
- Proportion 2: 1999-2006 Santa Fe County diabetes death rate = 20.4/100,000 = 0.000204
- Difference between the two proportions:
- 0.000413 - 0.000204 = 0.000209
44Test of Proportions (cont'd)
- The difference between the two rates (0.000209) must be considered in the context of the standard error of the difference between two rates (pooled standard error), computed as $\text{s.e.diff} = \sqrt{p_1 q_1 / n_1 + p_2 q_2 / n_2}$
- If the difference between the two rates, 0.000209, is greater than 1.96 x s.e.diff, then the difference is considered statistically significant.
Bruning, J.L., and Kintz, B.L. (1977). Computational Handbook of Statistics. Scott, Foresman and Company, London.
45Calculation of s.e.diff
- p = proportion, q = (1 - p), n is the person-years at risk, or the sum of the population counts across all eight years.
- Union County
- p1 = 0.000413
- q1 = 0.999587
- n1 = 33,929
- Santa Fe County
- p2 = 0.000204
- q2 = 0.999796
- n2 = 1,092,565
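46Calculation of s.e.diff
- $\text{s.e.diff} = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}} = \sqrt{\frac{0.000413 \times 0.999587}{33{,}929} + \frac{0.000204 \times 0.999796}{1{,}092{,}565}} \approx 0.0001111$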
47Evaluation of the Difference
- Union County: 41.3/100,000 = 0.000413
- Santa Fe County: 20.4/100,000 = 0.000204
- Difference = 0.000413 - 0.000204 = 0.000209
- s.e.diff = 0.0001111
- 1.96 x s.e.diff = 0.000218
- Is 0.000209 greater than 0.000218?
- No. Union County's rate is greater than Santa Fe County's rate, but the difference is NOT statistically significant.
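The comparison can be reproduced in code. The standard-error formula used here, the square root of p1q1/n1 + p2q2/n2, is an assumption: the slide showing the formula was not preserved, and this form was chosen because it reproduces the stated s.e.diff of 0.0001111 from the stated inputs. Function and variable names are illustrative.

```python
import math

def standard_error_of_difference(p1, n1, p2, n2):
    """Standard error of the difference between two independent proportions."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

p1, n1 = 0.000413, 33_929     # Union County, 1999-2006
p2, n2 = 0.000204, 1_092_565  # Santa Fe County, 1999-2006

difference = p1 - p2
se_diff = standard_error_of_difference(p1, n1, p2, n2)
threshold = 1.96 * se_diff
significant = abs(difference) > threshold

print(f"difference  = {difference:.6f}")
print(f"s.e.diff    = {se_diff:.7f}")
print(f"1.96 x s.e. = {threshold:.6f}")
print("significant" if significant else "not significant")
```

Almost all of the standard error comes from Union County's small denominator; Santa Fe County's term is tiny by comparison.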
48Tool 3. SMR and ISR
- Standardized Mortality (or Morbidity) Ratio (SMR)
- Estimates the number of deaths (or health events) one would EXPECT, based on:
- The age- and sex-specific rates in a standard population (e.g., New Mexico rate)
- The age and sex distribution of the index area.
- Indirectly Standardized Rates (ISR)
- Use SMR to perform age adjustment when the number of cases is less than 20.
49Standardized Mortality Ratio
- The all-cause death rate in New Mexico in 2006 was 757.5 deaths per 100,000 population.
- All other things being equal, we should expect the same death rate in Union County.
50Standardized Mortality Ratio
- BUT all other things are NOT equal.
- In 2006, the percentage of the population over age 65 was:
- 18.9% in Union County, compared with
- 12.3% statewide.
- In an older population, we would expect a higher
death rate.
51Standardized Mortality Ratio
- And Union County's death rate is higher: 1,364.6 deaths per 100,000.
- IF we adjust the New Mexico death rate to account for Union County's older population, THAT is how many deaths we should EXPECT.
52Standardized Mortality Ratio for 2006 Union County, All-cause Mortality
- Expected deaths in each group = (Rate x Pop) / 100,000
- SMR = Observed / Expected
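The slide's table did not survive, so the sketch below uses entirely hypothetical age groups, rates, populations, and observed count to show the mechanics: multiply each standard-population rate by the index area's population in that group, sum to get expected deaths, then divide observed by expected.

```python
# Hypothetical age strata; the slide's actual table was not preserved.
STANDARD_RATES = {"0-44": 150.0, "45-64": 600.0, "65+": 4000.0}  # per 100,000
INDEX_POPULATION = {"0-44": 2000, "45-64": 1500, "65+": 1000}
OBSERVED_DEATHS = 65  # hypothetical observed count in the index area

def expected_deaths(standard_rates, index_population):
    """Sum of (Rate x Pop) / 100,000 over all strata."""
    return sum(
        standard_rates[group] * index_population[group] / 100_000
        for group in index_population
    )

expected = expected_deaths(STANDARD_RATES, INDEX_POPULATION)
smr = OBSERVED_DEATHS / expected
print(f"expected deaths = {expected:.1f}, SMR = {smr:.2f}")
```

With these made-up inputs, 52 deaths are expected and the SMR is 1.25. A real calculation would also stratify by sex, as the slides describe.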
53SMR, Union County
- An SMR < 1.0 indicates less-than-expected mortality.
- An SMR > 1.0 indicates greater-than-expected mortality (also known as excess mortality).
- Union County's SMR was 1.28, so the county had excess mortality in 2006.
- Was it significantly more than expected?
54Indirect Age-Standardization
- You should not use direct age adjustment when there are fewer than 20 (some say 25) health events.
- If you multiply the New Mexico crude rate by the Union County SMR, you get the indirectly age-adjusted rate for Union County.
- Union Co. crude all-cause death rate = 1,364.6
- NM crude all-cause death rate = 757.5
- Union County SMR = 1.28
- Union County indirectly age-standardized rate = 969.6 (still higher than the state rate, but the effects of Union County's age distribution have been removed).
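The multiplication described above is a one-liner; the sketch below wraps it in a function (the name is illustrative) and applies it to the slide's figures.

```python
def indirectly_standardized_rate(standard_crude_rate, smr):
    """Indirectly age-standardized rate = standard population crude rate x SMR."""
    return standard_crude_rate * smr

union_rate = indirectly_standardized_rate(757.5, 1.28)
print(f"{union_rate:.1f} per 100,000")
```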
55Confidence Interval for SMR
- Observed deaths = 61 (number of deaths from Vital Records data)
- Expected deaths = 47.7 (number expected from SMR calculation)
- SMR = 1.28 (observed / expected)
- StdErr for SMR = 0.16 (SQRT(observed) / expected)
- 95% Confidence Interval = +/- 0.32 (1.96 x StdErr)
- Significance Test: Does the 95% confidence interval include 1.0?
- If "yes" -> not significant
- If "no" -> statistically significant
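Applying these steps to the Union County numbers answers the significance question. The sketch below uses the slide's standard error, SQRT(observed) / expected; the function name is illustrative.

```python
import math

def smr_confidence_interval(observed, expected, z=1.96):
    """SMR with the slide's standard error, SQRT(observed) / expected."""
    smr = observed / expected
    standard_error = math.sqrt(observed) / expected
    margin = z * standard_error
    return smr, standard_error, smr - margin, smr + margin

smr, se, lower, upper = smr_confidence_interval(61, 47.7)
significant = not (lower <= 1.0 <= upper)  # significant only if 1.0 is outside
print(f"SMR {smr:.2f}, s.e. {se:.2f}, 95% C.I. {lower:.2f} to {upper:.2f}")
print("statistically significant" if significant else "not significant")
```

The interval runs from about 0.96 to about 1.60, which includes 1.0, so by this test the excess mortality is not statistically significant.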
56Summary: Statistical Tools
- Use confidence intervals to assess the stability of a rate.
- Use C.I. to see if your local rate is significantly different from the state rate.
- A statistic called a Test of Proportions uses the pooled standard error to test whether two local rates are significantly different.
57Summary: Statistical Tools
- Combine data to improve the stability of your rate.
- Combine persons (e.g., broader age group)
- Combine place (larger area)
- Combine time (more years)
58Summary: Statistical Tools
- Use the Standardized Mortality (Morbidity) Ratio (SMR) to compare a local rate to a standard population (e.g., state overall).
- The SMR can be used for indirect age adjustment when the number of health events is fewer than 20, or if the age-specific death rates are not known.
59Thanks!
- Lois M. Haggard, PhD
- Community Health Assessment Program, NMDOH
- lois.haggard_at_state.nm.us
- http://ibis.health.state.nm.us