Title: Study designs: Cross-sectional studies, ecologic studies (and confidence intervals)
1Study designs: Cross-sectional studies, ecologic
studies (and confidence intervals)
Principles of Epidemiology for Public Health
(EPID600)
Victor J. Schoenbach, PhD
Department of Epidemiology
Gillings School of Global Public Health
University of North Carolina at Chapel Hill
www.unc.edu/epid600/
2Signs from around the world
- In a Copenhagen airline ticket office
- We take your bags and send them in all
directions.
3Signs from around the world
- In a Norwegian cocktail lounge
- Ladies are requested not to have children in the
bar.
4Signs from around the world
- Rome laundry: Ladies, leave your clothes here
and spend the afternoon having a good time.
5Faster keyboarding - 1
- I cdnuolt blveiee taht I cluod aulaclty
uesdnatnrd waht I was rdanieg. The phaonmneal
pweor of the hmuan mnid, aoccdrnig to a
rscheearch at Cmabrigde Uinervtisy. It dn'seot
mttaer in waht oredr the ltteers in a wrod are,
the olny iprmoatnt tihng is taht the frist and
lsat ltteer be in the rghit pclae. The rset can
be a taotl mses and you can sitll raed it wouthit
a porbelm.
- Gary C. Ramseyer's First Internet Gallery of
Statistics Jokes, http://davidmlane.com/hyperstat/humorf.html
(162)
6Faster keyboarding - 2
- Most of my friends could read this with
understanding and rather quickly I might add.
Then I had them read a statistical bit of
literature:
- Miittluvraae asilyans sattes an idtenossiy
ctuoonr epilsle is the itternoiecsno of a panle
pleralal to the xl-yapne and the sruacfe of a
btiiarave nmarol dbttiisruein.
- Gary C. Ramseyer's First Internet Gallery of
Statistics Jokes, http://davidmlane.com/hyperstat/humorf.html
(162)
8Today outline
- Cross-sectional studies (and sampling)
- Ecologic studies
- Confidence intervals
9Cross-sectional studies
- Cross-sectional studies include surveys
- People are studied at a point in time, without
follow-up.
- Can combine a cross-sectional study with
follow-up to create a cohort study.
- Can conduct repeated cross-sectional studies to
measure change in a population.
10Cross-sectional studies
- Number of uninsured Americans rises to 50.7
million. (USA Today, 9/17/2010; data from Census
Bureau)
- In 2007-2008, almost one in five children older
than 5 years was obese. (Health, United States,
2010; data from the National Health and Nutrition
Examination Survey)
- 35% (7.4 million) of births to U.S. women during
the preceding 5 years were mistimed or unwanted
(2002 National Survey of Family Growth, Series
23, No. 25, Table 21)
- Source: www.cdc.gov/nchs/
11Cross-sectional studies
- Incidence information is not available from a
typical cross-sectional study
- Sometimes can reconstruct incidence from
historical information
- Example: the incidence proportion of quitting
smoking, called the quit ratio
(ex-smokers / ever-smokers), is calculated from
survey data.
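As a minimal sketch of the quit ratio described above, the calculation from survey counts might look like this. The function name and the counts are hypothetical, chosen only for illustration.

```python
def quit_ratio(former_smokers: int, current_smokers: int) -> float:
    """Proportion of ever-smokers who have quit: ex-smokers / ever-smokers."""
    ever_smokers = former_smokers + current_smokers
    if ever_smokers == 0:
        raise ValueError("no ever-smokers in the sample")
    return former_smokers / ever_smokers


# Hypothetical survey counts, for illustration only.
ratio = quit_ratio(former_smokers=300, current_smokers=200)
print(f"Quit ratio: {ratio:.2f}")
```

Note that the denominator is ever-smokers, not the whole sample: people who never smoked cannot quit.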
12Measure prevalence at point in time
- Snapshot of a population, a still life
- Can measure attitudes, beliefs, behaviors,
personal or family history, genetic factors,
existing or past health conditions, or anything
else that does not require follow-up to assess.
- The source of most of what we know about the
population
13Population census
- A cross-sectional study of an entire population
- Provides the denominator data for many purposes
(e.g., estimation of rates, assessing
generalizability, projecting from smaller
studies)
- A huge effort: people can be difficult to find
and to count, and may not want to provide data
- Some countries maintain accurate and current
registries of the entire country
14National surveys conducted by NCHS
- National Health Interview Survey (NHIS):
household interviews
- National Health and Nutrition Examination Survey
(NHANES): interviews and physical examinations
- National Survey of Family Growth (NSFG):
household interviews
- National Health Care Survey (NHCS): medical
records
15National surveys
- Designed to be representative of the entire
country
- Modes: household interview, telephone, mail
- Employ complex sampling designs to optimize
efficiency (tradeoff between information and
cost)
- Logistically challenging (answering machines,
cellphones, . . .)
- See presentation by Dr. Anjani Chandra at
www.minority.unc.edu/institute/2003/materials/slides/Chandra-20030522.ppt
16Example: National Health Interview Survey
- Conducted every year in U.S. by National Center
for Health Statistics (CDC)
- Stratified, multistaged, household survey that
covers the civilian noninstitutionalized
population of the United States
- Redesigned every decade to use new census
17multistaged
- Improves logistical feasibility and reduces costs
(though reduces precision)
- 1. Divide population into primary sampling units
(PSUs)
PSU = primary sampling unit:
metropolitan statistical area, county, or group of
adjacent counties
18multistaged
- 2. Select sample of census block groups (SSUs)
within each selected PSU
- 3. Map each selected census block group or
examine building permits
- 4. Select one cluster of 4-8 housing units
dispersed evenly throughout the block
- NCHS draws a new representative sample for each
week's interviews
19stratified
- US divided into 1,900 PSUs
- Largest 52 PSUs are self-representing
- Rest of PSUs divided into 73 categories
(strata), based on socioeconomic and
demographic variables
- Sampling takes place separately within each
category (stratum)
20Sample size and Precision
21Weighted sampling
22stratified
- Also place census blocks into categories and
sample within each
- Oversample some strata
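To illustrate why oversampled strata call for weighting, here is a small sketch of stratified sampling with sampling weights. It is not the NHIS design: the two strata, their sizes, their prevalences, and all function names are hypothetical. Each sampled person gets a weight equal to the number of population members they represent (stratum size divided by stratum sample size).

```python
import random


def stratified_sample(strata, sample_sizes, seed=0):
    """Draw a simple random sample within each stratum.

    strata: dict mapping stratum name -> list of 0/1 outcome values
    sample_sizes: dict mapping stratum name -> number of units to draw
    Returns a list of (value, weight) pairs, where weight = N_h / n_h.
    """
    rng = random.Random(seed)
    sample = []
    for name, units in strata.items():
        n_h = sample_sizes[name]
        weight = len(units) / n_h
        for value in rng.sample(units, n_h):
            sample.append((value, weight))
    return sample


def weighted_prevalence(sample):
    """Weighted proportion of units with the outcome."""
    total_weight = sum(w for _, w in sample)
    return sum(v * w for v, w in sample) / total_weight


# Hypothetical population: large stratum with 10% prevalence,
# small stratum with 40% prevalence (overall 13%).
strata = {
    "large": [1] * 900 + [0] * 8100,  # 9,000 people
    "small": [1] * 400 + [0] * 600,   # 1,000 people
}

# Oversample the small stratum: equal sample sizes from unequal strata.
sample = stratified_sample(strata, {"large": 500, "small": 500})

unweighted = sum(v for v, _ in sample) / len(sample)
weighted = weighted_prevalence(sample)
print(f"unweighted: {unweighted:.3f}, weighted: {weighted:.3f}, true: 0.130")
```

The unweighted estimate is pulled toward the oversampled stratum, while the weighted estimate should land near the true population prevalence.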
23Defined population
- Studies, especially cross-sectional studies, are
easiest to interpret when they are based in a
population that has some existence apart from the
study itself (defined population)
- 1. Political subdivision (city, county, state)
- 2. Institutional (HMO, employer, profession)
- Probability sampling enables statistical
generalizability to the defined population
24Surveys of sentinel populations
- HIV seroprevalence survey in three county STD
clinics in central NC in 1988
- 3,000 anonymous, unlinked, leftover sera
- Anonymous questionnaire for demographics and risk
factors
- Schoenbach VJ, Landis SE, Weber DJ, Mittal
M, Koch GG, Levine PH. HIV seroprevalence in
sexually transmitted disease clients in a
low-prevalence southern state. Ann Epidemiol
1993;3:281-288
25HIV seroprevalence
- Schoenbach VJ, Landis SE, Weber DJ, Mittal
M, Koch GG, Levine PH. HIV seroprevalence in
sexually transmitted disease clients in a
low-prevalence southern state. Ann Epidemiol
1993;3:281-288
26Seroprevalence (% HIV) by risk factors
- Schoenbach VJ, Landis SE, Weber DJ, Mittal M,
Koch GG, Levine PH. HIV seroprevalence in
sexually transmitted disease clients in a
low-prevalence southern state. Ann Epidemiol
1993;3:281-288
27Interpretation
- Measures prevalence; if incidence is our real
interest, prevalence is often not a good
surrogate measure
- Studies only survivors and stayers
- May be difficult to determine whether a cause
came before an effect (exception: genetic
factors)
28Other points
- Can choose participants by exposure or overall
- Can choose participants by disease; may not be
distinguishable from a case-control study with
prevalent cases
29Outline
- Cross-sectional studies (and sampling)
- Ecologic studies
- Confidence intervals
30Ecologic studies
- Most study designs (cross-sectional,
case-control, cohort, intervention trials) can
be carried out with individuals or with groups
- Group-level studies which use routinely collected
data are easier and less costly
- Group-level studies that involve interventions
may not be easier or less costly
31Types of group-level variables
- Summary of individual-level variable (e.g.,
median household income, % with high school
diploma)
- Property of the aggregate (e.g., neighborhood
grocery stores, seat belt legislation, community
competence)
32Interpretation
- Link between summary exposure variable and
individual-level outcome must be inferred
- Inference from group to individual is not always
sound
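A toy illustration of why inference from group to individual can fail: in the made-up data below, group means of exposure and outcome are perfectly positively correlated, yet within every group the individual-level relationship runs the other way. The data and the helper function are hypothetical, constructed only to make the point.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Made-up data: (exposure, outcome) for individuals in three groups.
groups = {
    "A": [(1, 3), (2, 2), (3, 1)],
    "B": [(4, 6), (5, 5), (6, 4)],
    "C": [(7, 9), (8, 8), (9, 7)],
}

# Group-level (ecologic) analysis: correlate the group means.
mean_x = [sum(x for x, _ in g) / len(g) for g in groups.values()]
mean_y = [sum(y for _, y in g) / len(g) for g in groups.values()]
group_r = pearson_r(mean_x, mean_y)

# Individual-level analysis within each group.
within_r = {
    name: pearson_r([x for x, _ in g], [y for _, y in g])
    for name, g in groups.items()
}

print(f"group-level r: {group_r:+.2f}")
for name, r in within_r.items():
    print(f"within group {name}: r = {r:+.2f}")
```

An analyst who saw only the three group means would infer a positive individual-level relationship that does not exist in any group.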
33Example: Male Circumcision and HIV
(Slope indicates strength of relationship; r
indicates linearity)
- Source: Bongaarts J, et al. The relationship
between male circumcision and HIV infection in
African populations. AIDS 1989;3(6):373-7.
34Outline
- Cross-sectional studies (and sampling)
- Ecologic studies
- Confidence intervals
35Confidence intervals
- Provide a plausible range for the quantity being
estimated
- Width indicates the precision of an estimate for
a given level of confidence
- Confidence intervals quantify only random error
from sampling variation, not systematic error
from nonresponse, study design, etc.
36Confidence level vs. precision
- The more vague my estimate, the more confident I
can be that it includes the population parameter:
I am 100% confident that the prevalence of HIV
is between 0% and 100%.
- The more specific my estimate, the lower my
confidence: I am 0% confident that the
prevalence of HIV is 5.23%
37Confidence intervals interpretation
- Simple interpretations are typically not precise
- Precise interpretations are typically not simple
38Simple but imprecise
- There is 95% confidence that the interval
contains the true value. True, but begs the
question: how to define confidence?
39Simple but imprecise
- There is a 95% probability that the interval
contains the true value. Not quite correct:
probability (as conventionally defined) applies
to a process, not to a single instance
40Probability applies to a process: example
- A 95% confidence interval can be viewed as a
measurement or estimation process that will be
correct (the interval includes the true value of
the parameter) 95% of the time and incorrect 5%
of the time.
- Let us make up another estimation process that
will be correct (about) 95% of the time.
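The "process" view can be checked by simulation. The sketch below repeats a hypothetical study many times and counts how often the interval (estimate plus or minus 1.96 standard errors) contains the true value. The true mean, standard deviation, sample size, and number of studies are arbitrary assumptions, and the standard deviation is treated as known to keep the example short.

```python
import random
import statistics


def coverage(true_mean=50.0, sd=10.0, n=25, n_studies=2000, seed=1):
    """Fraction of simulated studies whose 95% interval contains true_mean."""
    rng = random.Random(seed)
    se = sd / n ** 0.5  # standard error of the mean (sd treated as known)
    hits = 0
    for _ in range(n_studies):
        sample_mean = statistics.fmean(
            rng.gauss(true_mean, sd) for _ in range(n)
        )
        lower = sample_mean - 1.96 * se
        upper = sample_mean + 1.96 * se
        hits += lower <= true_mean <= upper
    return hits / n_studies


print(f"Intervals containing the true value: {coverage():.1%}")
```

The proportion should come out close to 95%. Any single interval either contains the true value or does not; the 95% describes the long-run behavior of the procedure.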
41Why probability applies to a process
- Estimate your gender by flipping a coin 5 times:
if the result is 5 heads, estimate
your gender to be its opposite; otherwise
estimate your gender to be what you think it is
now.
- Probability that estimate will be correct is
(1 - Probability of 5 heads) = 0.97 = 97%
- Probability that estimate will be incorrect is 3%
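The arithmetic behind the 97% figure can be verified directly, assuming a fair coin. The short simulation alongside it is only a sanity check of the same procedure; the function name is hypothetical.

```python
import random

p_five_heads = 0.5 ** 5        # 1/32 = 0.03125
p_correct = 1 - p_five_heads   # 0.96875, about 97%


def estimate_is_correct(rng):
    """One run of the procedure: wrong only if all 5 flips are heads."""
    flips = [rng.random() < 0.5 for _ in range(5)]
    return not all(flips)


rng = random.Random(0)
runs = 100_000
observed = sum(estimate_is_correct(rng) for _ in range(runs)) / runs
print(f"exact: {p_correct:.5f}, simulated: {observed:.3f}")
```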
42Why probability applies to a process
- So we now have a measurement process that will be
correct 97% of the time. We will use it to
measure your gender.
- Flip the coin 5 times, and suppose you get 5
heads
- Is there a 97% probability that you are of the
opposite sex?
43Precise but not simple
- A 95% confidence interval is
- 1. obtained by using a procedure that will
include the population parameter being estimated
95% of the time
- 2. the set of all population values which are
likely to yield a sample like the one we
obtained
44Suppose that this line represents the value of
the parameter we are trying to estimate
(Figure labels: True value)
45Possible estimates of that parameter in N
identical studies (shows sampling variation)
(Figure labels: Study estimates; True value)
46One possible true value and how it would
manifest, on average, in N identical studies
(Figure labels: True value; 95% of the distribution)
47Estimate from one study of a given size
(Figure labels: Estimate)
48A possible true value with < 2.5% chance of
being observed at or beyond the estimate
(Figure labels: Estimate; 95% of the distribution)
49A possible true value with > 2.5% probability of
being observed at or beyond the estimate
(Figure labels: Estimate; 95% of the distribution)
50A possible true value with > 2.5% probability of
being observed at or beyond the estimate
(Figure labels: Estimate; 95% of the distribution)
51A possible true value with < 2.5% probability of
being observed at or beyond the estimate
(Figure labels: Estimate; 95% of the distribution)
52What the confidence interval represents
(Figure labels: 95% confidence interval)
53What the confidence interval represents
(Figure labels: 95% confidence interval)
54One possible true value and how it would
manifest, on average, in N identical studies
(Figure labels: True value; 1.96 x s.e., shown twice)
55Confidence intervals: another take
(Figure: grid of 200 symbols representing a population)
56One possible population
(Figure: population grid with cases marked)
57Another possible population
(Figure: population grid with cases marked)
58A 3rd possible population
(Figure: population grid with cases marked)
59A 4th possible population
(Figure: population grid with cases marked)
60A 5th possible population
(Figure: population grid with cases marked)
61A 6th possible population
(Figure: population grid with cases marked)
62etc.
(Figure: population grid with cases marked)
63There are 1.6 x 10^60 possible populations (no
cases ... all cases)
(Figure: population grid with cases marked)
64Suppose this is the population
(prevalence 15%)
(Figure: population grid with 30 cases marked)
65Take a sample (n = 10)
(Figure: population grid with 30 cases marked)
66The sample
(Figure: the 10 sampled people, 2 of them cases)
67Make point estimate of prevalence
(Figure: the 10 sampled people, 2 of them cases)
68Interval estimate
- What are all the possible populations that would
be expected to yield this prevalence in a sample
of size 10?
69This one is not possible
(Figure: population grid with 1 case marked)
70Possible, but VERY UNLIKELY
(Figure: population grid with 2 cases marked)
71Not quite 2.5% probability (2.1%, in fact)
(Figure: population grid with 5 cases marked)
72Yields just about 2.5% (3%, actually) probability
of selecting 2 (or more) cases in 10
(Figure: population grid with 6 cases marked)
73One possible true value and how it would
manifest, on average, in N identical studies
(Figure labels: True value; 95% of the distribution)
74Just above 2.5% (actually 2.6%) probability of
selecting 2 (or fewer) cases in 10
(Figure: population grid with more than half marked as cases)
75Just below 2.5% (actually 2.4%) probability of
selecting 2 (or fewer) cases in 10
(Figure: population grid with more than half marked as cases)
76Interval estimate for 2/10
- Lower bound: 2.5% (5 cases)
- Upper bound: 55% (110 cases)
- Meaning: Our sample of 10 with 2 cases provides
evidence to exclude, at conventional error
tolerance, populations with fewer than 5 cases or
more than 110 cases. Populations with 5-110
cases cannot be excluded as likely sources for this
sample.
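The tail probabilities quoted on the preceding slides can be reproduced with a short calculation. This sketch assumes a population of 200 people (so that 5 cases is 2.5% and 110 cases is 55%) and sampling without replacement, which makes the number of cases in the sample hypergeometric. The function names are hypothetical.

```python
from math import comb

N, n, observed = 200, 10, 2  # population size, sample size, cases in sample


def prob_cases(k, K):
    """P(exactly k cases in the sample | K cases in the population)."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def p_at_least(K):
    """P(observed or more cases in the sample | K cases in the population)."""
    return sum(prob_cases(k, K) for k in range(observed, min(n, K) + 1))


def p_at_most(K):
    """P(observed or fewer cases in the sample | K cases in the population)."""
    return sum(prob_cases(k, K) for k in range(0, observed + 1))


for K in (5, 6):
    print(f"{K} cases: P(2 or more in sample) = {p_at_least(K):.3f}")
for K in (109, 110):
    print(f"{K} cases: P(2 or fewer in sample) = {p_at_most(K):.3f}")
```

Under these assumptions the four probabilities should round to the 2.1%, 3%, 2.6%, and 2.4% figures quoted on the slides, which is where the 2.5% cutoff on each side places the bounds of the interval.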
77Interval estimate for 2/10
- Actual population prevalence was 15%, which in
fact is between 2.5% and 55%.
- 2.5% to 55% is a very wide interval, i.e., a very
imprecise estimate
- To make it more precise, we need a larger sample
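To see how sample size affects precision, the sketch below computes an approximate 95% interval (estimate plus or minus 1.96 standard errors) for the same observed prevalence of 20% at three sample sizes. This normal approximation is a swapped-in shortcut, not the exact method used on the slides, and it is known to be rough for samples as small as 10; the function name is hypothetical.

```python
from math import sqrt


def wald_interval(cases, n, z=1.96):
    """Approximate interval for a proportion: p +/- z * sqrt(p(1-p)/n)."""
    p = cases / n
    se = sqrt(p * (1 - p) / n)
    return max(0.0, p - z * se), min(1.0, p + z * se)


widths = {}
for n in (10, 100, 1000):
    lower, upper = wald_interval(cases=n // 5, n=n)  # 20% observed prevalence
    widths[n] = upper - lower
    print(f"n = {n:>4}: {lower:.3f} to {upper:.3f} (width {widths[n]:.3f})")
```

Because the standard error shrinks with the square root of n, a hundredfold increase in sample size narrows the interval only about tenfold.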
78Signs from around the world Germany
- A sign posted in Germany's Black Forest: It is
strictly forbidden on our black forest camping
site that people of different sex, for instance,
men and women, live together in one tent unless
they are married with each other for that
purpose.
79Signs from around the world Finland
- On the faucet in a Finnish washroom
- To stop the drip, turn cock to right.