Title: After the First Steps: The Evolution of a Longitudinal Survey
1 After the First Steps: The Evolution of a Longitudinal Survey
- National Population Health Survey (NPHS)
- Douglas Yeo
- Workshop on Longitudinal Research in Social Science: A Canadian Focus
- Population Studies Centre, University of Western Ontario
- London, Oct. 25-27, 1999
2 NPHS Program
3 Objectives
- To aid in the development of public policy
- To understand the determinants of health
- Economic, social, demographic, occupational and environmental correlates of health
- To explore the relationship between health status and health care utilization
- To follow a panel of people to reflect the dynamic process of health
- To provide a means to supplement content or sample
- To allow linkage with administrative data
4 Sample
- Sample allocation at the national, provincial and territorial levels
- Minimum requirement of 1,200 households for each province and territory
- Household component: 20,000 households
- Use of the LFS sampling design
- Health care institutions: 2,500 residents
- The North: 2,400 persons
5 Description
- Longitudinal and cross-sectional
- First cycle in 1994, repeated every 2 years
- Personal and telephone interviews
- Basic information collected from all household members
- One household member selected as the health respondent (longitudinal respondent)
6 Description
- General Questionnaire
- All household members
- Proxy reporting permitted (55% of cases)
- Health Questionnaire
- One randomly selected respondent in each household
- Proxy reporting rarely permitted (4% of cases)
7 Content: Core (General)
- Two-week Disability
- Health Care Utilization
- Restriction of Activities
- Chronic Conditions
- Sociodemographic Characteristics
- Country of birth, immigration, language
- Labour force
- Income
- Education
8 Content: Core (Health)
- Self-Perceived Health
- Blood Pressure
- Women's Health
- Height and Weight
- Health Status
- Physical Activity
- Repetitive Strain (1996 and 1998)
- Injuries
9 Content: Core (Health)
- Use of Medications
- Smoking
- Alcohol
- Mental Health
- Social Support
- Sense of Coherence (1994 and 1998)
- Alcohol Dependence (1996)
10 Content: Focus 1994
- Stress
- Ongoing problems
- Recent Life Events
- Childhood and Adult Stressors (traumas)
- Work Stress
- Self-esteem
- Mastery
11 Content: Focus 1996
- Access to Services
- Blood pressure
- Pap smear test
- Mammography
- Breast examinations
- Breastfeeding
- Physical check-up
- Flu shots
- Dental visits
- Eye examination
- Emergency services
- Insurance coverage
12 HPS 1996
- Height and Weight
- Breast Self-Examination
- Breastfeeding
- Pregnancy
- HIV
- Smoking
- Alcohol
- Sexual Health
- Road Safety
- Food Insecurity
- Separate release
13 Content: Focus 1998
- Focus
- Self Care
- Family Medical History
- Diet/Nutrition
- Tobacco Alternatives
- Food Insecurity supplement (HRDC)
14 Content: Focus 2000
- Additional chronic conditions
- In-depth diabetes questions
- Fibromyalgia
- Tanning and UV exposure
- Stress questions are back
- Ongoing Problems
- Recent Life Events
- Childhood and Adult Stressors (traumas)
- Work stress
- Self-esteem
- Mastery
- Illicit drug use
15 File Creation: 1994
- Core sample (20,000)
- Buy-in sample
- N.B., Ont., Man., B.C.
- Files produced: cross-sectional
- 1994 General File (all household members)
- 1994 Health File (one randomly selected respondent)
16 File Creation: 1996
- 1994 responding panel members
- Cross-sectional Files
- 1996 General File (all household members)
- 1996 Health File (one randomly selected respondent)
- Includes buy-in sample
- Ontario, Manitoba, Alberta
- Longitudinal File (1994-96)
17 Products: Files
- Master Files: 1994-95, 1996-97 (released)
- Share Files: 1994-95, 1996-97 (released)
- (Health Canada and provinces)
- Public Use Microdata Files
- 1994-95 Household, General and Health (released)
- 1994-95 Institutions, Health (released)
- 1996-97 Household, General and Health (released)
- 1996-97 Institutions, Health and Longitudinal (late 1999)
- 1996-97 Household, Longitudinal (doubtful)
18 Products: Access
- Master Files
- Selected Regional Offices
- Users become deemed employees of Statistics Canada
- Remote Access
- Internet job submission
- Using test master files
- Free to clients
- DLI, SSHRC
19 Products: Publications
- NPHS Overview Report
- 1994-95: self-rated health and income, chronic conditions and pain, depression, use of health care services and alternative medicine
- 1996-97: chronic disease incidence, changes in activity limitation status, depression, repetitive strain injuries, smoking, use of health care services
- 1998-99: March 2000 issue of Health Reports
20 Products: Publications
- Health Reports: detailed articles
- Depression, chronic pain, immigrants' health, sense of coherence, smoking, hormone replacement therapy, bicycle helmet use, sample design
21 Products: NHRDP
- National Health Research and Development Program
- Jointly funded by Health Canada and Statistics Canada
- Up to $300,000 annually for NPHS research
- Cycle 1: 8 grants, papers available
- Cycle 2: 7 grants, papers available
- Cycle 3: 7 grants, research starting
- Cycle 4: Health Canada preparing RFP
22 1994 Sample Design
- Household target population
- Based upon the Labour Force Survey (LFS) and the Enquête sociale et de santé (in Quebec only)
- Household residents in all provinces
- Exclusions: Indian reserves, Canadian Forces bases, remote areas in Ontario and Quebec
- Stratified multistage design
23 1994 Sample Design
- 1st stage
- Strata formed: major urban centres, urban towns, rural areas
- Further stratified by geography and/or socioeconomic characteristics
- Clusters (heterogeneous) formed independently within strata
- Clusters selected using PPS sampling
- 2nd stage
- Dwelling lists prepared for each selected cluster
- Subsample of households selected within each cluster
24 Cluster Sampling
- Highly cost-effective in terms of listing and data collection
- Only selected clusters are listed
- Less efficient than SRS
- Neighbouring units are similar (intracluster correlation)
- PPS sampling
- Vary the probability with which a unit is selected according to its size
- Units do not have the same probability of selection (unequal weights)
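PPS selection of clusters can be sketched with systematic PPS sampling, one common way to implement it. The slides do not say which PPS variant the LFS design uses, so treat this as an illustration only. The function `pps_systematic` and its size measures are hypothetical. A selected cluster's design weight is the inverse of its inclusion probability, which is why units end up with unequal weights.

```python
import random


def pps_systematic(sizes, n, rng=random):
    """Systematic PPS selection of n clusters (illustrative sketch, not the LFS code).

    sizes: size measure of each cluster in one stratum (e.g. dwelling counts).
    Assumes n * size / total <= 1 for every cluster, so no cluster can be
    hit twice. Returns (selected cluster indices, inclusion probabilities).
    """
    total = sum(sizes)
    step = total / n                      # sampling interval on the cumulated sizes
    start = rng.uniform(0, step)          # random start within the first interval
    points = [start + k * step for k in range(n)]

    selected, cum, j = [], 0.0, 0
    for i, size in enumerate(sizes):
        upper = cum + size
        # A cluster is selected once for every selection point in [cum, upper).
        while j < n and points[j] < upper:
            selected.append(i)
            j += 1
        cum = upper

    # Inclusion probability is proportional to size; the design weight is its inverse.
    probs = [n * s / total for s in sizes]
    return selected, probs
```

For example, `pps_systematic([10, 20, 30, 40], 2)` gives inclusion probabilities 0.2, 0.4, 0.6 and 0.8. A selected cluster `i` therefore carries weight `1 / probs[i]`, and the largest clusters receive the smallest weights.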
25 1994 Sample Design: Rejective Method
- One member per household selected at random to be the longitudinal respondent
- The panel would therefore underrepresent persons in large households (parents and children) and overrepresent persons in smaller households (singles and elderly)
- Portion of sample pre-identified for screening
- If no member < 25 years old, household screened out
- Number of households visited increased in anticipation of those screened out
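The within-household selection and screening rule can be sketched as follows. This sketch assumes equal-probability selection of one member. It does not model any further weight adjustment NPHS made for the screened-out portion, because the slides do not describe one. The helper `select_health_respondent` and its arguments are hypothetical. Each member of a household of size k is chosen with probability 1/k, so the selected person's weight is the household weight times k. Without that factor, persons in large households would be underrepresented, as the slide notes.

```python
import random


def select_health_respondent(ages, screen, rng=random):
    """Pick one household member at random (hypothetical helper).

    ages:   ages of all household members.
    screen: True if the dwelling is in the portion pre-identified for screening.
    Returns None if the household is screened out, otherwise
    (index of the selected member, within-household weight factor).
    """
    if screen and not any(age < 25 for age in ages):
        return None                        # screened out: no member under 25
    idx = rng.randrange(len(ages))         # each member chosen with probability 1/k
    return idx, len(ages)                  # so the person weight is multiplied by k
```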
26 1994 Sample Design: Integration With NLSC
- NLSC follows 25,000 children
- Data for NPHS longitudinal respondents < 12 years of age collected by NLSC
- NPHS children's sample used in NLSC estimates and for NPHS
- Due to scheduling constraints, NPHS kids sample not selected before Q3 and Q4
27 Sample Design: Subsequent Cycles
- Longitudinal respondents recontacted using contact information from previous cycles
- Moved into an institution
- Moved to territories
- Moved to an Indian reserve: attempt made to collect data
- Moved temporarily away
- Identified deaths
- Households in sample include households where the longitudinal respondent currently lives
- Household composition may have changed
28 Sample Design: Subsequent Cycles
- Longitudinal respondents' data used for panel and cross-sectional purposes
- Household members' data used for cross-sectional estimates only (General file)
- NPHS kids sample now collected by NPHS, not NLSCY
- Cross-sectional supplementary samples from previous cycles not followed up
29 Sample Design: Subsequent Cycles
- Top-up of sample every second cycle
- First time in 1998
- For cross-sectional purposes only
- Accounts for changing population and panel attrition
- To cover the population not present in 1994: new births, immigrants
30 Data Collection
- Statistics Canada LFS interviewers
- Computer-Assisted Personal or Telephone Interviews (CAPI/CATI)
- Built-in edits, minimums, maximums
- Direct skip patterns
- On-screen prompts
- Pre-filling of text or data
- Average interview time: 1 hour
31 Data Collection
- Data collected at 4 points in time
- For operational and seasonality reasons
- June, August, November, February
- Nonresponse: no contact, refusal
- Letter sent, second call, senior interviewer follows up
- Sample dwellings are never replaced with others
- Resends: nonresponse followed up in subsequent quarters and in a special resend period the following June
32 Data Collection
- Tracing to find longitudinal respondents
- Panel member only
- Information from previous cycles fed back
- Data quality check
- Probes for reasons for change: restriction of activities, chronic conditions, smoking
- Some sociodemographic information not re-asked if no change
33 Processing
- Editing
- On-line edits in CAPI
- Some head office consistency edits
- Invalid, inconsistent data set to "not stated"
- Coding of write-in information (e.g., drugs)
- Creation of derived variables
34 Response Rates
- 1994 Household: 88.7%
- Selected respondent: 96.1%
- 1996 Longitudinal
- General: 93.6%
- Health: 92.8%
- Only 1.7% not traced
- 1996 Cross-sectional Household: 82.5%
- Selected respondent: 95.0%
35 Analysing Complex Data
- Point estimation
- Survey weights must be used in calculating estimates in order to draw correct conclusions about the population of interest
- Weights take stratification and unequal sampling probabilities into account
- Variance estimation
- Using the survey weight alone is not sufficient
- The complex design (and design effect) must be accounted for to avoid serious underestimation of standard errors
36 Effect of Weighting
- Comparison of males and females who reported being in excellent or very good health
- Weighted difference (%): 65.3 - 61.6 = 3.7
- Unweighted difference (%): 62.6 - 60.8 = 1.8
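The gap between the weighted and unweighted differences comes from how the proportions are formed. A weighted proportion sums the weights of people with the characteristic and divides by the total weight. An unweighted proportion simply counts sample members. The toy data below are hypothetical and only show the mechanism: when the respondents with the characteristic carry larger weights, the unweighted figure understates it.

```python
def weighted_proportion(flags, weights):
    """Share of the population with the characteristic, using survey weights."""
    return sum(w for f, w in zip(flags, weights) if f) / sum(weights)


def unweighted_proportion(flags):
    """Share of the sample with the characteristic, ignoring weights."""
    return sum(1 for f in flags if f) / len(flags)


# Toy data (hypothetical): the two respondents in good health represent
# many more people than the two who are not.
flags = [True, True, False, False]
weights = [300, 100, 50, 50]
print(weighted_proportion(flags, weights))   # 0.8
print(unweighted_proportion(flags))          # 0.5
```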
37 1998 Weighting Methodology
- All panel respondents have a longitudinal weight
- Includes those who moved to an institution, died, etc.
- Start with basic weights from 1994
- Derived from LFS or l'Enquête sociale et de santé weights
- Probability of selecting a dwelling in a selected cluster
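As a simplified illustration of a basic weight: in a two-stage design, the probability of selecting a dwelling is the cluster's selection probability times the within-cluster dwelling selection probability. The basic weight is the inverse of that product. The actual LFS and Enquête sociale et de santé weights involve further stages and adjustments that this hypothetical `basic_weight` helper does not model.

```python
def basic_weight(p_cluster, p_dwelling_given_cluster):
    """Inverse of the overall dwelling selection probability in a
    two-stage design (hypothetical simplification of the basic weight)."""
    p = p_cluster * p_dwelling_given_cluster
    if not 0 < p <= 1:
        raise ValueError("selection probability must be in (0, 1]")
    return 1.0 / p


# A cluster selected with probability 0.02, then 1 dwelling in 4 sampled:
print(basic_weight(0.02, 0.25))   # about 200: each dwelling represents ~200 dwellings
```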
38 1998 Weighting Methodology
- Nonresponse adjustment by weighting classes
- To account for potential nonresponse bias
- Study whether nonrespondents are different
- Create special weighting classes based on response propensity, using CHAID, to account properly for these differences
- Calibrate to 1994 population totals (by province/age/sex)
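The two adjustment steps can be sketched under simplifying assumptions. The weighting classes are taken as given; in NPHS they came from CHAID, which is not reproduced here. The calibration is shown as post-stratification, the simplest special case of calibrating to known totals. All function names and record fields are hypothetical. Within each class, respondents' weights are inflated so that they also carry the weight of the nonrespondents in that class.

```python
from collections import defaultdict


def nonresponse_adjust(records):
    """Weighting-class nonresponse adjustment (illustrative sketch).

    records: dicts with 'w' (weight), 'cls' (weighting class), 'resp' (bool).
    Respondents absorb the weight of nonrespondents in the same class;
    nonrespondents get weight 0. Assumes every class has a respondent.
    """
    total = defaultdict(float)
    responding = defaultdict(float)
    for r in records:
        total[r["cls"]] += r["w"]
        if r["resp"]:
            responding[r["cls"]] += r["w"]
    return [r["w"] * total[r["cls"]] / responding[r["cls"]] if r["resp"] else 0.0
            for r in records]


def poststratify(weights, cells, totals):
    """Scale weights to known population totals per cell (e.g. province x age x sex)."""
    sums = defaultdict(float)
    for w, c in zip(weights, cells):
        sums[c] += w
    return [w * totals[c] / sums[c] if w > 0 else 0.0 for w, c in zip(weights, cells)]
```

A class with no respondents would cause a division by zero. In practice, such classes are collapsed with neighbouring ones before the adjustment.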
39 1998 Weighting Methodology
- Three longitudinal weights
- WT68LF (Full): records fully completed for all components on all occasions
- WT68LP (Partial): records fully completed for 1994 and 1998
- WT64LS (Square): entire panel of 17,276, including nonrespondents
40 Design Effects
- Measure of the complexity of the sample design
- Calculate design variance using bootstrap weights
- Calculate SRS variance
- Deff = design variance / SRS variance
- Generally, deffs > 1 for clustered designs, deffs < 1 for stratified designs
- Varies (sometimes greatly) by characteristic
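The deff ratio above is simple to compute once the two variances are available. This sketch uses the textbook SRS variance of a proportion, p(1 - p)/n, ignoring the finite population correction. It also returns the square root of deff (often called deft), the factor by which SRS standard errors must be inflated. The function names are hypothetical.

```python
import math


def srs_variance_of_proportion(p, n):
    """Variance of an estimated proportion under simple random sampling
    of n persons (finite population correction ignored)."""
    return p * (1 - p) / n


def design_effect(design_variance, srs_variance):
    """Deff = design variance / SRS variance."""
    return design_variance / srs_variance


def deft(design_variance, srs_variance):
    """Square root of deff: the inflation factor for SRS standard errors."""
    return math.sqrt(design_effect(design_variance, srs_variance))
```

For example, if the bootstrap variance of a proportion is four times `srs_variance_of_proportion(p, n)`, then deff is 4. The standard errors from an SRS formula would then be too small by a factor of 2.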
41 Variance Estimation
- Measuring sampling error for complex sample designs
- Simple formulas not available
- Most software packages do not incorporate the design effect appropriately in variance calculations
- Need to provide some measures of data quality to users
42 NPHS Variance Estimation
- Bootstrap resampling method (similar to the jackknife) used for all variance estimation
- Aggregates, proportions, differences, coefficients from linear and logistic regressions
- Variance estimation program written as SAS/SPSS macros
- Approximate coefficient of variation (CV) look-up tables also provided with the PUMF
- For categorical variables: totals, proportions
43 Bootstrap Weight Method
- Variance estimation divided into two phases
- Calculation of bootstrap weights
- Calculated only once, by Statistics Canada
- Variance estimation using bootstrap weights
- Internally and externally
- Bootstrap weights available with regional office master files, with share files, and in the remote access program (dummy files)
- No need for design information
- Bootstrap weights incorporate the design effect implicitly
44 Bootstrap Weights: Calculation
- Resampling method that divides records into subgroups (replicates) and measures the variation in the estimates from replicate to replicate
- Within each stratum, resample from the original sample by taking an SRSWR of n-1 of the n clusters in that stratum
45 Bootstrap Weights: Calculation
- Recalculate the weight for each record in that stratum: this is the bootstrap weight
- We now have a new bootstrap weight for every record on the file. This set of weights is the first bootstrap replicate. A new point estimate (θ̂*) can be calculated using the weights of this replicate
- Repeat B times (e.g., B = 500)
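One replicate can be sketched under the assumption that the scheme above is the Rao-Wu rescaling bootstrap with n-1 clusters drawn with replacement per stratum. With that choice, a record's bootstrap weight is its weight times n/(n-1) times the number of times its cluster was drawn. The function names are hypothetical. The sketch only rescales the input weights; a production system would typically also repeat the nonresponse and calibration adjustments on each replicate, which this sketch omits.

```python
import random
from collections import Counter, defaultdict


def rao_wu_replicate(weights, strata, clusters, rng=random):
    """One bootstrap replicate of weights (Rao-Wu rescaling, m = n - 1).

    weights, strata, clusters: parallel lists, one entry per record.
    Every stratum must contain at least two clusters.
    """
    clusters_in = defaultdict(set)
    for h, c in zip(strata, clusters):
        clusters_in[h].add(c)

    factor = {}
    for h, cl in clusters_in.items():
        cl = sorted(cl)
        n = len(cl)
        if n < 2:
            raise ValueError(f"stratum {h!r} needs at least 2 clusters")
        # SRSWR of n-1 of the n clusters; a cluster may be drawn 0, 1 or more times.
        draws = Counter(rng.choice(cl) for _ in range(n - 1))
        for c in cl:
            factor[(h, c)] = draws[c] * n / (n - 1)

    return [w * factor[(h, c)] for w, h, c in zip(weights, strata, clusters)]


def bootstrap_weights(weights, strata, clusters, B=500, seed=0):
    """B replicates (the slide's example uses B = 500)."""
    rng = random.Random(seed)
    return [rao_wu_replicate(weights, strata, clusters, rng) for _ in range(B)]
```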
46 Bootstrap Weights: Variance Estimation
- To estimate the variance of any estimate (θ̂), first calculate the estimate B times, using the weights from the B bootstrap replicates
- Then calculate the variance among these B estimates
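The two steps above can be sketched as a generic function. It takes any estimator that maps a weight vector to a number (a total, a proportion, a regression coefficient) and computes the variance among the B replicate estimates. This sketch centres on the mean of the replicate estimates and divides by B; other implementations centre on the full-sample estimate or divide by B - 1. The CV helper reflects the usual definition, standard error divided by the estimate. Function names are hypothetical.

```python
import math


def bootstrap_variance(estimate_fn, replicate_weights):
    """Variance among the B replicate estimates.

    estimate_fn:       maps one set of weights to an estimate.
    replicate_weights: list of B weight vectors, one per bootstrap replicate.
    """
    estimates = [estimate_fn(w) for w in replicate_weights]
    mean = sum(estimates) / len(estimates)
    return sum((e - mean) ** 2 for e in estimates) / len(estimates)


def coefficient_of_variation(estimate, variance):
    """CV = standard error / estimate."""
    return math.sqrt(variance) / estimate
```

For example, `bootstrap_variance(sum, [[1, 1], [2, 2], [3, 3]])` treats each replicate's weight total as the estimate (2, 4 and 6) and returns 8/3.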
47 Bootstrap Weight Method: Advantages
- Sets of 500 bootstrap weights can be distributed to analysts
- Handles large datasets
- Interprovincial migration accounted for correctly in variance estimates
- Recommended (over the jackknife) for estimating the variance of nonsmooth functions such as quantiles, LICO, Gini index
48 Variance Estimation Example
- Comparison of males vs. females who are in excellent or very good health
- Weighted difference (%): 65.3 - 61.6 = 3.7
- SAS (scaled weights)
- Standard error: 0.36
- 95% confidence interval: (3.0, 4.4)
- Bootstrap
- Standard error: 0.70
- 95% confidence interval: (2.3, 5.1)
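Both intervals match the usual normal-approximation interval, estimate ± 1.96 × SE. This is inferred from the numbers; the slide does not state the formula. The sketch below reproduces them. The bootstrap standard error is almost twice the naive one, a variance ratio of (0.70/0.36)^2 ≈ 3.8. This is the kind of underestimation warned about in the complex-data slide.

```python
def normal_ci(estimate, se, z=1.96):
    """Two-sided 95% confidence interval under a normal approximation."""
    return estimate - z * se, estimate + z * se


for label, se in [("SAS, scaled weights", 0.36), ("Bootstrap", 0.70)]:
    low, high = normal_ci(3.7, se)
    print(f"{label}: ({low:.1f}, {high:.1f})")
# SAS, scaled weights: (3.0, 4.4)
# Bootstrap: (2.3, 5.1)
```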
49 Limitations and Feedback
- Some topics could be explored more thoroughly
- Data raises more questions than it answers
- Sample sizes can become small in a hurry
- Often useful to combine with other survey data to explain phenomena
- Nice to be able to calculate bootstrap variance, which takes the design into account
50 Analytical Findings
- Proxy / nonproxy reporting
- Handling item nonresponse
- Handling data inconsistencies
- Study gross flows / changes
51 Self-Rated Health
52 Self-Rated Health Change, 1994-96
53 NPHS Future Directions
- New Household Cross-sectional Survey
- Provide health-region estimates
- Sample size of 130,000 / 30,000
- Core, regional and rotating focus content
- 45-minute interview
- Every two years starting in 2000-01
54 NPHS Future Directions
- Expanded Health Care Institutions Survey
- Provide national and provincial estimates
- To start in 2001?
- Expanded Northern Survey
- Total sample of 3,000 (1,000 per territory)
- National Person-oriented Registries
- NPHS data linked
55 NPHS Future Directions
- Current Household Survey
- Strictly longitudinal focus
- New cohort to start in 2004?
- Physical measures content to (sample of) longitudinal cohort twice in 20-year life
- Continue every two years
56 NPHS Sample Design
57 NPHS Sample Design
58 NPHS Future Directions
- Focus entirely longitudinal
- Content will now specialise
- How big should the new household cohort be?
- Institutions, North panels
- How long should they be kept?
- Integration with other surveys
- When should new cohort be started?
59 National Population Health Survey
- Contacts: www.statcan.ca
- Manager: Lorna Bailie, lorna.bailie_at_statcan.ca
- Output Manager: Bryan Lafrance, 613-951-3285, bryan.lafrance_at_statcan.ca
- Senior Methodologists:
- Harold Mantel, 613-951-4150, harold.mantel_at_statcan.ca
- Douglas Yeo, 613-951-8614, douglas.yeo_at_statcan.ca