Title: Canadian Community Health Survey
1 - Canadian Community Health Survey
- Cycle 1.1
- Overview of methodological issues and more...
2Presentation Outline
- Sample Design
- Target population, sample allocation and frames
- Sampling strategies, oversampling of sub-populations
- Data collection, response rates
- Imputation
- Weighting
- Sampling error
- Sampling variability guidelines
- Variance estimation: bootstrap re-sampling technique
- CV look-up tables
- Analysis
- Examples
- How to use the Bootvar programs
3CCHS - Cycle 1.1 Health Region-level survey
- Main objective
- Produce timely cross-sectional estimates for 136 health regions
- Target population
- Individuals aged 12 or older living in private occupied dwellings
- Exclusions: those living on Indian Reserves and Crown Lands, residents of institutions, full-time members of the Canadian Armed Forces and residents of some remote areas
- CCHS 1.1 covers 98% of the Canadian population
4CCHS - Sample Allocation to Provinces
Prov    Pop. size   No. of HRs   1st step (500/HR)   2nd step (X-prop)   Total sample
NFLD         551K            6               2,780               1,230          4,010
PEI          135K            2               1,000               1,000          2,000
NS           909K            6               3,000               2,040          5,040
NB           738K            7               3,500               1,650          5,150
QUE        7,139K           16               8,000              16,280         24,280
ONT       10,714K           37              18,500              23,760         42,260
MAN        1,114K           11               5,500               2,500          8,000
SASK         990K           11               5,400               2,320          7,720
ALB        2,697K           17               8,150               6,050         14,200
BC         3,725K           20              10,000               8,090         18,090
CAN       29,000K          133              65,830              64,920        130,750
- The sampling fraction in some small HRs was capped at 1 in 20 households
5CCHS - Sample Allocation to Health Regions
Size class   Population range     No. of HRs   Mean sample size
Small        less than 75,000             41                525
Medium       75,000 - 240,000             60                900
Large        240,000 - 640,000            25              1,500
X-Large      640,000 and more              7              2,500
6CCHS - Sample Allocation to Territories
Territory   Population   Sample
Yukon           25,000      850
NWT             36,000      900
Nunavut         22,000      800
7CCHS - Sample Frame
- CCHS sample selected from three frames:
- Area frame (Labour Force Survey structure)
- RDD frame of telephone numbers (Random Digit Dialling)
- List frame of telephone numbers
- Three frames are needed for CCHS for the following reasons:
- 1. To yield the desired sample sizes in all health regions
- 2. To have a telephone data collection structure in place to quickly address provincial/regional requests for buy-in sample and/or content at any point in time
- 3. To optimize collection costs
8Area frame - Sampling of households
- 83% of CCHS sampled households
- Multistage stratified cluster sample design
- 1. Each health region is divided into strata
- 2. Clusters are selected within strata (PPS sampling) (1st stage)
- 3. Dwellings are selected within clusters (2nd stage)
[Diagram: a health region divided into Stratum 1 and Stratum 2, with clusters selected within each stratum and dwellings selected within clusters]
9RDD frame of telephone numbers - Sampling of households
- Elimination of non-working banks method
- 7% of CCHS sampled households
- Telephone bank = area code + first 5 digits of a 7-digit phone number
- 1. Keep the banks with at least one valid phone number
- 2. Group the banks to encompass as closely as possible the health region areas (RDD strata)
- 3. Within each RDD stratum, first select one bank at random and then generate at random one number between 00 and 99
- 4. Repeat the process until the required number of telephone numbers within the RDD stratum is reached
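Steps 3 and 4 of the RDD procedure can be sketched in Python as follows. This is only an illustration, not the production system: the function name `draw_rdd_numbers`, the representation of a bank as an 8-digit string (area code plus first 5 digits) and the choice to discard duplicate numbers are all assumptions.

```python
import random

def draw_rdd_numbers(working_banks, n_required, seed=None):
    """Draw n_required distinct phone numbers within one RDD stratum.

    working_banks: 8-digit strings (area code + first 5 digits of the
    7-digit number) kept after eliminating the non-working banks.
    """
    banks = sorted(set(working_banks))
    if n_required > 100 * len(banks):
        raise ValueError("stratum cannot supply that many distinct numbers")
    rng = random.Random(seed)
    numbers = set()
    while len(numbers) < n_required:          # step 4: repeat until done
        bank = rng.choice(banks)              # step 3: one bank at random
        last_two = rng.randrange(100)         # step 3: a number from 00 to 99
        numbers.add(f"{bank}{last_two:02d}")  # duplicates dropped (assumption)
    return sorted(numbers)

# Hypothetical banks, for illustration only
sample = draw_rdd_numbers(["61395112", "61395113"], n_required=5, seed=1)
```

Each generated number still has to be dialled to find out whether it reaches a household, which is why the expected telephone hit rate matters when sizing the sample.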
10List frame of telephone numbers - Sampling of households
- Simple random sample of telephone numbers
- 10% of CCHS sampled households
- Telephone companies' billing address files and Telephone Infobase (repository of phone directories)
- 1. Create a list of phone numbers
- 2. Stratify the phone numbers by health region using the residential postal codes
- 3. Select phone numbers at random within a health region
- 4. Repeat the process until the required number of telephone numbers is reached
11CCHS - Sampling of persons
- Area frame
- Simple random sample (SRS) of one person aged 12 or older (82% of households)
- SRS of two persons aged 12 or older (18%)
- RDD / List frames
- SRS of one person aged 12 or older
12CCHS - Sampling of persons
Age group   1996 Census (%)   LFS sample, all persons (%)   CCHS simulated sample, only 1 person (%)
12-19                  13.2                          13.7                                        8.5
20-29                  16.4                          14.4                                       14.3
30-44                  30.8                          28.7                                       29.1
45-64                  25.8                          28.0                                       27.9
65+                    13.8                          15.2                                       20.2
- CCHS simulated sample: distribution averaged over 100 repetitions using the May 99 LFS sample
13CCHS - Representativity of sub-populations
- To address users' needs, two sub-population groups needed larger effective sample sizes
- Youths (12-19 years old)
- Decision => Oversample youths by selecting a second person (12-19) in some households based on their composition
- Elderly (65 years old and over)
- Decision => Do not oversample; let the general sample selection process address the issue by itself
14Sampling strategy based on household composition
Number aged     Number of persons aged 20 or over
12-19           0    1    2    3    4    5
0               -    A    A    A    A    B
1               A    A    C    C    C    B
2               A    C    C    C    C    C
3               A    C    C    C    C    C
- A: Simple random sample (SRS) of one person aged 12+
- B: SRS of two persons aged 12+
- C: SRS of one person in the age group 12-19 and SRS of one person aged 20+
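The household composition grid can be read as a simple lookup. The sketch below encodes it in Python; the names `STRATEGY` and `selection_strategy` are hypothetical, and treating the last row (3) and last column (5) as "3 or more" and "5 or more" is an assumption, since the slide does not say how larger households are handled.

```python
# Rows: number of persons aged 12-19 (0..3)
# Columns: number of persons aged 20 or over (0..5)
STRATEGY = {
    0: ["-", "A", "A", "A", "A", "B"],
    1: ["A", "A", "C", "C", "C", "B"],
    2: ["A", "C", "C", "C", "C", "C"],
    3: ["A", "C", "C", "C", "C", "C"],
}

def selection_strategy(n_youth, n_adult):
    """Return 'A', 'B' or 'C' for a household composition."""
    if n_youth < 0 or n_adult < 0:
        raise ValueError("counts must be non-negative")
    row = min(n_youth, 3)   # assumption: 3 stands for "3 or more"
    col = min(n_adult, 5)   # assumption: 5 stands for "5 or more"
    code = STRATEGY[row][col]
    if code == "-":
        raise ValueError("no eligible person aged 12 or over")
    return code
```

For example, `selection_strategy(1, 2)` returns `"C"`: one youth and one adult are selected from that household.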
15CCHS - Sample Distribution after Oversampling
Age group   1996 Census (%)   CCHS simulated sample, only 1 person (%)   CCHS simulated sample, some 2 persons (%)
12-19                  13.2                                        8.5                                        14.9
20-29                  16.4                                       14.3                                        13.1
30-44                  30.8                                       29.1                                        28.1
45-64                  25.8                                       27.9                                        26.3
65+                    13.8                                       20.2                                        17.6
- CCHS simulated samples: distribution averaged over 100 repetitions using the May 99 LFS sample
16CCHS - Initial data collection plan
- 12 monthly samples
- 12 collection months + 1: (09/2000 - 08/2001) + 09/2001
- Area frame
- CAPI
- STC field interviewers
- targeted response rate: 90%
- anticipated vacancy rate: 13%
- RDD / List frames
- CATI
- STC call centres
- targeted response rate: 85%
- telephone hit rate: 15-60%
17CCHS data collection - Observed situation
- Field interviewers
- workload exceeded field staff capacity
- Call centres
- new collection infrastructure
- unequal allocation of work among call centres
- Descriptive paper:
- "Preventing nonresponse in the Canadian Community Health Survey", Y. Béland, J. Dufour, and M. Hamel. 2001, Hull, Statistics Canada XVIIIth International Symposium.
18CCHS - Final response rates
Response rates (%)
         Field   Call centres   Total
NFLD      86.6           89.3    86.8
PEI       87.7           82.6    84.7
NS        88.8           89.3    88.8
NB        88.4           92.4    88.5
QUE       85.7           84.8    85.6
ONT       82.8           79.5    82.0
MAN       90.0           85.0    89.5
SASK      87.0           85.4    86.8
ALB       85.2           84.9    85.1
BC        83.9           86.7    84.7
YUK       79.3           95.6    82.7
NWT       89.6           85.4    89.2
NUN       66.3           34.6    62.5
CAN       85.1           83.1    84.7
19CCHS - Proxy interviews
- Definition: when another person in the household responds to the survey on behalf of the selected person in the sample
- Acceptable in the following cases:
- Out of the country for a long period of time
- Mental or physical state of health
- Language barrier
- Usually, 2% to 3% of respondents are proxy respondents
- Because of the field problems: 6.3%
- Higher rate in some health regions, for men and younger respondents
- Major consequence: one third of the questionnaire is missing
- Personal or sensitive types of questions are not asked
- Solution: imputation
20Modules for proxy and non-proxy
- Alcohol
- Chronic condition
- Exposure to second hand smoke
- Food insecurity
- General health (Q1, Q2 and Q7)
- Health care utilization
- Health Utility Index (HUI)
- Height / Weight (Q2 and Q3)
- Injuries
- Restriction of activities
- Smoking
- Tobacco alternatives
- Two-week disability
- Household composition & housing
- Income
- Labour force
- Socio-demographic characteristics
- Administration
- Drug use (optional)
- Home care (optional)
21Modules for non-proxy only
- Alcohol dependence / abuse
- Blood pressure check
- Breastfeeding
- Contacts with mental health professionals
- Mammography
- Fruit & vegetable consumption
- General health (Q3-Q6, Q8-Q10)
- Height / Weight (Q4 only)
- PAP smear test
- PSA test
- Physical activities
- Patient Satisfaction
- Breast examinations
- Breast self examinations
- Changes made to improve health
- Depression
- Dental visits
- Distress
- Driving under influence
- Eye examinations
- Flu shots
- Mastery
- Mood
- Physical check-up
- Sedentary activities
- Self-esteem
- Sexual behaviours
- Smoking cessation aids
- Social support
- Spirituality
- Suicidal thoughts and attempts
- Use of protective equipment
- Work stress
22Imputation Strategy - 5 passes
- 1st pass: Health prevention modules
- 3 modules imputed completely
- 6 modules imputed partially (some questions only)
- 2 modules not imputed
- 2nd pass: Mental health modules
- 6 modules imputed completely
- 7 modules not imputed
- 3rd pass: Sexual behaviours
- 4th pass: Fruit and vegetable consumption
- 5th pass: one question in the Height / Weight module
- Note that some modules and/or questions are not imputed (such as physical activity, distress, work stress, time since last flu shot, etc.)
23Imputation Strategy
- Strategy applied at each imputation pass:
- 1. Create imputation classes
- Usually Province x Sex x Age groups x Filters
- The donor has to be in the same imputation class as the recipient
- Minimum donor rule: donors / (donors + recipients) > 60%
- 2. Identify a list of matching variables
- 3. Assign a weight to each matching variable
- Default weight = 1, sometimes weight = 2, 3 or more
- 4. Find the nearest donor
- Highest total weighted match
- If more than one possible donor, select one randomly from them
- No imputation if no donor over a minimum number of matches
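The nearest-donor search can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the production imputation system: records are plain dictionaries, the field name `imp_class` and the function names are hypothetical, and the "minimum number of matches" is applied here as a threshold on the total weighted score.

```python
import random

def class_is_usable(n_donors, n_recipients):
    """Minimum donor rule: donors / (donors + recipients) > 60%."""
    return n_donors / (n_donors + n_recipients) > 0.60

def find_donor(recipient, donors, match_weights, min_score, rng=None):
    """Return the donor with the highest total weighted match, or None."""
    rng = rng or random.Random()
    best_score, best = -1, []
    for donor in donors:
        # The donor must be in the same imputation class as the recipient
        if donor["imp_class"] != recipient["imp_class"]:
            continue
        # Add the weight of every matching variable on which both agree
        score = sum(w for var, w in match_weights.items()
                    if donor[var] == recipient[var])
        if score > best_score:
            best_score, best = score, [donor]
        elif score == best_score:
            best.append(donor)
    if not best or best_score < min_score:
        return None              # no imputation for this recipient
    return rng.choice(best)      # ties are broken at random
```

Giving a matching variable a weight of 2 or 3 simply makes agreement on that variable count for more when donors are ranked.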
24CCHS - Weighting and Estimation
- Estimation relates the sample back to the population
- MUST use weights in the calculation of estimates to correctly draw conclusions about the population of interest
- Sampling weight is related to the probability of selecting a person in the sample
- Persons are selected with unequal probabilities and therefore have varying weights
25CCHS - Weighting and Estimation
- Three separate weighting systems
- Area frame design
- RDD frame design
- List frame design
- Several adjustments
- non-response (household and person)
- seasonal factor
- etc...
- Integration of the two weighting systems based on design effects and sample sizes (n / deff)
- Calibration using a one-dimensional poststratification adjustment of ten age/sex poststrata within each health region
- Variance estimation: bootstrap re-sampling approach
- set of 500 bootstrap weights for each individual
26Weighting & Estimation
27Weighting & Estimation
- Initial weight: inverse of the probability of being selected
28Weighting & Estimation
- Household nonresponse: distribute the weight of nonresponding households to responding ones
- Using nonresponse classes (such as HR, collection period and rural/urban)
29Weighting & Estimation
- No phone lines: no coverage of households without a phone line. Weights are boosted by a certain rate (specific to each HR)
- Rates of no phone lines calculated using area frame data
30Weighting & Estimation
- Number of people in the household: convert the household-level weight into a person-level weight (multiply by the number of people)
- Depends on the number of people selected (1 or 2) and their age
31Weighting & Estimation
- Person-level nonresponse: redistribute the weight of selected persons who did not respond to the ones who responded
- Using classes (age, sex, person selected, collection period, etc.)
32Weighting & Estimation
- Multiple phone lines: more phone lines = higher probability of being selected
- Weight divided by the number of residential phone lines
33Weighting & Estimation
- Final weight: each frame's final weight is representative of the total population. To create a single set of weights, they are combined through integration
34Weighting & Estimation
- Integration: combine the 2 sets of weights into one single set of weights
- Based on sample size and design effect of each frame
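The slides only say that integration is based on the sample size and design effect (n / deff) of each frame. One common way to do this, assumed here, is to scale the two sets of weights by complementary factors proportional to each frame's effective sample size. The function names and the labels "area" and "telephone" are hypothetical.

```python
def integration_factor(n_area, deff_area, n_tel, deff_tel):
    """Share given to the area frame, from effective sample sizes n / deff."""
    eff_area = n_area / deff_area
    eff_tel = n_tel / deff_tel
    return eff_area / (eff_area + eff_tel)

def integrate(area_weights, tel_weights, alpha):
    """Combine two sets of weights that each represent the whole population."""
    return ([w * alpha for w in area_weights]
            + [w * (1.0 - alpha) for w in tel_weights])

# Hypothetical example: each frame's weights sum to a population of 300
alpha = integration_factor(n_area=800, deff_area=2.0, n_tel=200, deff_tel=1.0)
combined = integrate([100.0, 100.0, 100.0], [300.0], alpha)
```

Because the two factors sum to 1, the combined weights still add up to the population total that each frame represented on its own.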
35Weighting & Estimation
- Seasonal effect: adjust weights so that each season contains 25% of the total population
- Based on the collection period (Sept-Nov / Dec-Feb / Mar-May / June-Aug)
36Weighting & Estimation
- Post-stratification: ensure the sum of weights matches the estimated population projections in each HR, for 10 age-sex groups
- 12-19, 20-29, 30-44, 45-64 and 65+ crossed with two sexes
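The post-stratification step amounts to a ratio adjustment within each cell. A minimal Python sketch follows; the record layout (`hr`, `age_group`, `sex`, `weight`) and the function name are assumptions made for illustration.

```python
from collections import defaultdict

def poststratify(records, pop_counts):
    """Scale weights so they sum to the population count of each cell.

    records: list of dicts with keys 'hr', 'age_group', 'sex', 'weight'
    pop_counts: {(hr, age_group, sex): projected population}
    """
    totals = defaultdict(float)
    for r in records:
        totals[(r["hr"], r["age_group"], r["sex"])] += r["weight"]
    adjusted = []
    for r in records:
        key = (r["hr"], r["age_group"], r["sex"])
        factor = pop_counts[key] / totals[key]   # ratio adjustment
        adjusted.append({**r, "weight": r["weight"] * factor})
    return adjusted

# Hypothetical cell: two respondents, projected population of 800
recs = [
    {"hr": "3501", "age_group": "12-19", "sex": "M", "weight": 100.0},
    {"hr": "3501", "age_group": "12-19", "sex": "M", "weight": 300.0},
]
out = poststratify(recs, {("3501", "12-19", "M"): 800.0})
```

In this example the two weights become 200 and 600, so the cell total matches the projection of 800.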
37Weighting & Estimation
- Final CCHS weight: final weight present on the CCHS master file
38CCHS - Special Weights
- For various reasons, many other weights are produced:
- Quarter 4 special weight
- PEI special weight
- Share weights (master, Q4 and PEI special)
- Link weights (master, Q4 and PEI special)
39Sampling Error
- Difference in estimates obtained from a sample as compared to a census
- The extent of this error depends on four factors:
- sample size
- variability of the characteristic of interest
- sample design
- estimation method
- Generally, the sampling error decreases as the
size of the sample increases
40Sampling Error
- Measures of precision associated with an estimate:
- Variance
- Standard deviation (square root of the variance)
- 95% confidence interval (estimate ± 1.96 x standard deviation)
- Coefficient of variation (CV)
- Standard deviation of estimate x 100% / estimate itself
- CV allows comparison of precision of estimates with different scales
- Example:
- 24% of the population are daily smokers, std dev. = 0.003
- => CV = 0.003 / 0.24 x 100% = 1.25%
- => 95% CI = 0.240 ± 1.96 x 0.003 = (0.234, 0.246)
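The daily smokers example can be reproduced with a few lines of Python. The function names are hypothetical; the formulas are the ones given on the slide.

```python
def cv_percent(estimate, std_dev):
    """Coefficient of variation: standard deviation x 100 / estimate."""
    return 100.0 * std_dev / estimate

def ci95(estimate, std_dev):
    """95% confidence interval: estimate +/- 1.96 x standard deviation."""
    half_width = 1.96 * std_dev
    return estimate - half_width, estimate + half_width

cv = cv_percent(0.24, 0.003)    # about 1.25 (%)
low, high = ci95(0.24, 0.003)   # about 0.234 and 0.246
```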
41Sampling Variability Guidelines
Type of estimate   CV (%)        Guidelines
Acceptable         0.0-16.5      General unrestricted release.
Marginal           16.6-33.3     General unrestricted release but with a warning cautioning users of the high sampling variability. Should be identified by the letter E.
Unacceptable       > 33.3        No release. Should be flagged with the letter F.
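The release guidelines translate directly into a small classification function. In this Python sketch the function name and the returned labels are hypothetical, and the CV is assumed to be expressed in percent and already rounded to one decimal, as in the table.

```python
def release_category(cv):
    """Classify an estimate from its CV (in %, rounded to one decimal)."""
    if cv < 0:
        raise ValueError("a CV cannot be negative")
    if cv <= 16.5:
        return "Acceptable"
    if cv <= 33.3:
        return "Marginal (E)"      # release with a warning, letter E
    return "Unacceptable (F)"      # no release, letter F
```

With the CV of 1.25% from the daily smokers example, `release_category(1.25)` returns `"Acceptable"`.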
42Sampling Error
- Measuring sampling error for complex sample designs
- Simple formulas not available
- Most software packages do not incorporate the design effect (and weight adjustments) appropriately in calculations
- Solution for CCHS: the bootstrap re-sampling method
43Bootstrap method
- Principle
- You want to estimate how precise your estimate of the number of smokers in Canada is
- You could draw 500 totally new CCHS samples and compare the 500 estimates you would get from these samples. The variance of these 500 estimates would indicate the precision.
- Problem: drawing 500 new CCHS samples is not practical
- Solution: assuming your sample is representative of the population, draw 500 new subsamples and compute new sampling weights for each subsample.
44Bootstrap method
- How CCHS bootstrap weights are created (the secret is now revealed!!!)
$$\mathrm{Var} = \frac{\sum_{i=1}^{500} (B_i - B)^2}{499}$$
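The variance formula on this slide can be sketched in Python as follows. This is not the BOOTVAR code: the function names are hypothetical, and B is assumed here to be the mean of the replicate estimates B_i, which the slide does not state explicitly.

```python
def weighted_total(values, weights):
    """Estimate of a total: sum of value x weight over respondents."""
    return sum(v * w for v, w in zip(values, weights))

def bootstrap_variance(values, bootstrap_weights):
    """Variance of the replicate estimates, divisor (replicates - 1).

    bootstrap_weights: one weight vector per replicate (500 for CCHS,
    which gives the divisor 499 shown on the slide).
    """
    estimates = [weighted_total(values, w) for w in bootstrap_weights]
    b = len(estimates)
    mean = sum(estimates) / b   # assumption: B is the replicate mean
    return sum((e - mean) ** 2 for e in estimates) / (b - 1)

# Toy example with 3 respondents and 3 replicates (hypothetical numbers)
smoker = [1, 0, 1]
replicates = [[10, 10, 10], [20, 0, 10], [0, 10, 30]]
var = bootstrap_variance(smoker, replicates)
```

The square root of this variance is the standard deviation used in the CV and confidence interval calculations.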
45Bootstrap Method
- How are bootstrap replicates built?
- The real recipe:
- 1. Subsample clusters (SRS) within a design stratum
- 2. Apply (initial design) weight
- 3. Adjust (boost) weight for selection of n-1 among n
- 4. Apply all standard weight adjustments (nonresponse, integration, share, etc.)
- 5. Post-stratification to population counts
- The bootstrap method intends to mimic the same approach used for the sampling and weighting processes
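Steps 1 to 3 of the recipe can be sketched in Python as below. Several points are assumptions, not taken from the slides: the n-1 clusters are drawn with replacement, the boost factor is n / (n-1) times the number of times a cluster is drawn, and the data layout and function name are hypothetical. Steps 4 and 5 (standard adjustments and post-stratification) would then be re-applied to each replicate.

```python
import random
from collections import Counter

def bootstrap_replicate(design_weights, rng):
    """Build one replicate of cluster-level weights.

    design_weights: {stratum: {cluster_id: initial design weight}}
    """
    replicate = {}
    for stratum, clusters in design_weights.items():
        ids = sorted(clusters)
        n = len(ids)
        if n < 2:
            raise ValueError("need at least 2 clusters per stratum")
        # Step 1: draw n-1 clusters at random (with replacement, assumed)
        draws = Counter(rng.choice(ids) for _ in range(n - 1))
        # Step 3: boost for selecting n-1 among n
        boost = n / (n - 1)
        for cid in ids:
            # Step 2: start from the design weight; clusters that were
            # not drawn get a weight of 0 in this replicate
            replicate[(stratum, cid)] = clusters[cid] * boost * draws[cid]
    return replicate

design = {"s1": {"a": 10.0, "b": 10.0, "c": 10.0}}
rep = bootstrap_replicate(design, random.Random(0))
```

Repeating this 500 times, each time followed by the full weighting process, would yield the 500 bootstrap weights attached to every respondent.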
46Bootstrap Method
- Sampling weight versus bootstrap weights
- Sampling weight: used to compute the estimate of a parameter (e.g. number of smokers)
- Bootstrap weights: used to compute the precision of the estimate (e.g. the CV of the estimated number of smokers)
47Bootstrap Method
- The process of variance estimation is divided into two phases
- Calculation of bootstrap weights
- Need to be produced only once
- Done by Statistics Canada methodologists
48Bootstrap Method
- Variance estimation using bootstrap weights
- Done by anyone, internally or externally
- Bootstrap weights files distributed with all CCHS files, except the Public-Use Microdata File (PUMF)
- Bootstrap weights are in a separate file (match using IDs)
- Not for PUMF because bootstrap weights reveal confidential information
- PUMF users must proceed through remote access to get exact variances or use the CV look-up tables
49Bootstrap Method
- Variance estimation using bootstrap weights
- SAS and SPSS (beta) macro programs provided to users (BOOTVAR)
- Allow users to perform a few statistical analyses (totals, proportions, differences of proportions and regression analysis)
- Fully documented with examples
- Bootstrap hands-on workshop
50How to use the Bootvar program
- STEP 1: Create your analytical file
- Read CCHS data file
- Prepare the necessary dummy variables
- Keep only the necessary variables
- Perform the analysis to obtain the point estimates (not essential but recommended)
- STEP 2: Compute your variances using Bootvar
- Specify the location of the files
- Your analytical file
- Bootstrap weights file
- Specify the level of geography
- Specify the analysis to perform
- Total, proportion, diff. of prop.
- Regression (linear, logistic)
- Generalized linear model
51How to use the Bootvar program
- Statistical analysis
- Using the NPHS cycle 3 (1998) cross-sectional dummy data, estimate the number of Ontarians aged 12 or older, by gender, who perceive themselves as being:
- in poor or fair health,
- in good health,
- in very good health,
- in excellent health.
- Compute the 95% confidence interval for each point estimate using the Bootvar program.
52Necessary variables for the analysis
- Self-perceived health (GHC8DHDI)
- 0 = poor, 1 = fair, 2 = good, 3 = very good, 4 = excellent, 9 = not stated
- Age (DHC8_AGE)
- >= 12
- Sex (DHC8_SEX)
- 1 = male, 2 = female
- Province (PRC8_CUR)
- 35 = Ontario
- Sampling weight (WT68)
- Record identifier for the household (REALUKEY)
- Number identifying the person in the household (PERSONID)
53Basic theoretical notions for estimating a proportion
- Example of a data file
ID   Weight   Sex   Asthma   Asthma_id
A        50   M     YES              1
B        60   M     NO               0
C        50   M     NO               0
D        70   M     YES              1
E        50   M     NO               0
$$\%(\text{asthma\_men}) = \frac{\text{Weight}_A + \text{Weight}_D}{\text{Weight}_A + \text{Weight}_B + \text{Weight}_C + \text{Weight}_D + \text{Weight}_E} \times 100 = \frac{50 + 70}{50 + 60 + 50 + 70 + 50} \times 100 = \frac{120}{280} \times 100 \approx 43\%$$
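The asthma example can be checked with a short Python sketch. The data are those of the example file; the function name `weighted_proportion` is hypothetical.

```python
# (ID, sampling weight, Asthma_id) for the five men in the example file
records = [("A", 50, 1), ("B", 60, 0), ("C", 50, 0), ("D", 70, 1), ("E", 50, 0)]

def weighted_proportion(rows):
    """Weighted percentage of records whose indicator equals 1."""
    numerator = sum(weight * flag for _, weight, flag in rows)
    denominator = sum(weight for _, weight, _ in rows)
    return 100.0 * numerator / denominator

pct = weighted_proportion(records)   # 120 / 280 x 100, about 43%
```

An unweighted count would give 2 out of 5, or 40%, which is why the sampling weights must be used in the estimate.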
54Little trick for the statistical analysis
- Create your univariate dummy variables
- Men: 1, 0 (men)
- Good health: 1, 0 (good)
- Men in good health: mgood = men x good
men   good   mgood
1     0      0
1     1      1
0     0      0
0     1      0
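The dummy variable trick is a simple product of 0/1 indicators, shown here in Python on the four combinations from the slide. The function name `domain_dummy` is hypothetical.

```python
def domain_dummy(men, good):
    """1 only when the person is both a man and in good health."""
    return men * good

rows = [(1, 0), (1, 1), (0, 0), (0, 1)]            # (men, good)
mgood = [domain_dummy(m, g) for m, g in rows]
```

Summing the sampling weights of the records where `mgood` equals 1 then gives the estimated number of men in good health.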
55Results of the statistical analysis
- Self-perceived health of Ontarians aged 12 or older, by gender, in 1998
                ('000)   95% CI             %      95% CI
Men
Poor / fair        391   (330 - 452)        8.4   (7.1 - 9.8)
Good             1,106   (1,007 - 1,204)   23.9   (21.7 - 26.0)
Very good        1,764   (1,648 - 1,880)   38.1   (35.6 - 40.6)
Excellent        1,373   (1,268 - 1,479)   29.6   (27.4 - 31.9)
Women
Poor / fair        480   (409 - 551)        9.9   (8.5 - 11.4)
Good             1,258   (1,151 - 1,364)   26.1   (23.9 - 28.3)
Very good        1,846   (1,726 - 1,965)   38.2   (35.8 - 40.7)
Excellent        1,243   (1,138 - 1,348)   25.8   (23.6 - 27.9)
56Why use the Bootstrap method?
- Other techniques
- Taylor
- Need to define a linear equation for each statistic examined
- Jackknife
- Number of replicates depends on the number of strata (a large number of strata makes it impossible to disseminate)
57Why use the Bootstrap method?
- BOOTSTRAP
- more user-friendly when there is a large number of strata
- sets of 500 bootstrap weights can be distributed to data users
- Recommended (over the jackknife) for estimating the variance of nonsmooth functions like quantiles, LICO
- Official reference:
- "Bootstrap Variance Estimation for the National Population Health Survey", D. Yeo, H. Mantel, and T.-P. Liu. 1999, Baltimore, ASA Conference.
58CV Look-up Tables
- Alternative to bootstrap
- Approximate
- Can only be used for categorical variables, and for estimates of totals and proportions
- Available for every health region, province and Canada
- Provided with PUMF and Share file for some subpopulations
59CV Look-up Tables - Example
- National Population Health Survey - 1996/1997
- Approximate Sampling Variability Tables for Ontario Health Area: OTTAWA CARLETON - Selected members
NUMERATOR OF
PERCENTAGE                              ESTIMATED PERCENTAGE
('000)     0.1   1.0   2.0   5.0  10.0  15.0  20.0  25.0  30.0  35.0  40.0  50.0  70.0  90.0
1               48.6  48.4  47.6  46.4  45.0  43.7  42.3  40.9  39.4  37.8  34.5  26.8  15.5
2               34.4  34.2  33.7  32.8  31.9  30.9  29.9  28.9  27.9  26.8  24.4  18.9  10.9
3               28.1  27.9  27.5  26.8  26.0  25.2  24.4  23.6  22.7  21.9  19.9  15.5   8.9
4               24.3  24.2  23.8  23.2  22.5  21.9  21.2  20.4  19.7  18.9  17.3  13.4   7.7
5               21.7  21.6  21.3  20.7  20.1  19.5  18.9  18.3  17.6  16.9  15.5  12.0   6.9
6               19.8  19.7  19.4  18.9  18.4  17.8  17.3  16.7  16.1  15.5  14.1  10.9   6.3
7               18.4  18.3  18.0  17.5  17.0  16.5  16.0  15.5  14.9  14.3  13.1  10.1   5.8
8                     17.1  16.8  16.4  15.9  15.5  15.0  14.5  13.9  13.4  12.2   9.5   5.5
9                     16.1  15.9  15.5  15.0  14.6  14.1  13.6  13.1  12.6  11.5   8.9   5.2
10                    15.3  15.1  14.7  14.2  13.8  13.4  12.9  12.5  12.0  10.9   8.5   4.9
...
60Another example using the Bootvar program
- Statistical analysis
- Using the NPHS cycle 3 (1998) cross-sectional dummy data, determine whether or not the number of men aged 12 or older who perceive themselves as being in excellent health in Ontario is statistically different (at level α = 5%) from the number of women.
61Basic theoretical notions for performing a Z-test
- M_excel = estimated proportion of men in excellent health
- F_excel = estimated proportion of women in excellent health
- Hypothesis test:
- H0: M_excel = F_excel
- H1: M_excel ≠ F_excel
- At level α = 0.05, we conclude H0 if |z| < 1.96
- We conclude H1 otherwise.
$$Z = \frac{\text{M\_excel} - \text{F\_excel}}{\text{sd}(\text{M\_excel} - \text{F\_excel})}$$
- We use the "difference of proportions" section of the BOOTVAR program to estimate the standard deviation of the difference between the two estimates.
62Results
- M_excel = 29.64%, F_excel = 25.75%, sd(M_excel - F_excel) = 1.62%
$$Z = \frac{\text{M\_excel} - \text{F\_excel}}{\text{sd}(\text{M\_excel} - \text{F\_excel})} = \frac{29.64 - 25.75}{1.62} = \frac{3.89}{1.62} = 2.40$$
- At the α = 0.05 level, we conclude H1 because |z| = 2.40 > 1.96.
- We can then conclude that among Ontarians aged 12 or older there is a statistically significant difference between men and women with regard to the characteristic "self-perceived health = excellent".
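The test can be reproduced in Python from the three figures reported on this slide. The function name `z_test` is hypothetical; the standard deviation of the difference is taken as given, since it comes from the BOOTVAR program.

```python
def z_test(p1, p2, sd_diff, critical=1.96):
    """Return the z statistic and whether H0 (p1 = p2) is rejected."""
    z = (p1 - p2) / sd_diff
    return z, abs(z) >= critical

# Men vs women in excellent health (percentages from the slide)
z, reject_h0 = z_test(29.64, 25.75, 1.62)   # z is about 2.40
```

Since 2.40 exceeds 1.96, `reject_h0` is `True`, in line with the conclusion on the slide.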
63CCHS - Data Dissemination Strategy
- Wide range of users and capacity
- 136 health regions
- 13 provincial/territorial Ministries of Health
- Health Canada and CIHI
- Internal STC analysts
- Academics
- Others
- Data products
- Microdata
- Analytical products (Health Reports, How Healthy are Canadians, etc.)
- Tabular statistics (ePubs, Cansim II, community profiles, etc.)
- Client support (head and regional offices, CCHS website, workshops, etc.)
64CCHS - Access to microdata
- Master file
- all records, all variables
- Statistics Canada
- university research data centres
- remote access
- Share / Link files
- respondents who agreed to share / link
- provincial/territorial Ministries of Health
- health regions (through the STC third-party share agreement)
- Public Use Microdata File (PUMF)
- all records, subset of variables with collapsed response categories
- free for 136 health regions
- cost recovery for others
65CCHS - Overview of Cycle 1.2
- Produce provincial cross-sectional estimates from a sample of 30,000 respondents
- Area frame sample only / one person per household
- CAPI only
- 90-minute in-depth interviews on mental health and well-being based on the WMH2000 questionnaire
- Scheduled to begin collection in May 2002
66CCHS - Future Plans
- Same two-year cycle approach
- health region level survey starting in January 2003
- provincial level survey starting in January 2004
- New consultation process with provincial and regional authorities
- Flexible sample designs (adaptable to regional needs)
- Development of an in-depth nutrition focus content (Cycle 2.2)
67CCHS Web site
- www.statcan.ca/health_surveys
- www.statcan.ca/enquetes_santé
68Contacts in Methodology
- Yves Béland: yves.beland_at_statcan.ca
- François Brisebois: francois.brisebois_at_statcan.ca