Title: The obsession with weight in the modelling world
1The obsession with weight in the modelling world
And its ancillary effects on analysis
2The basics
- The basic idea of sampling
- The reason behind complicating a good idea
- The implication when modelling data
3How Sampling Works.
Imagine a well-known picture. Since a picture is
made up of points of colour (pixels), we will
sample the points of colour at different rates.
Now let's assume that we had some idea about the
picture we wanted to see, and we decide to
stratify the sample. In this case we decide to
sample different areas of the picture at
different rates: the background, the dress, the
face, the hands, etc.
[Figure panels: 1 Random (systematic); 3 Random; 5 Random; 10 Random; 2.5 Stratified]
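The difference between sampling every pixel at one rate and sampling regions at different rates can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used to build the slides: the 100 x 100 picture, the "face" and "background" regions and the sampling rates are all assumptions chosen for the example.

```python
import random


def simple_random_sample(pixels, rate, rng):
    """Select round(rate * N) pixels with equal probability, without replacement."""
    n = round(rate * len(pixels))
    return rng.sample(pixels, n)


def stratified_sample(pixels_by_region, rates, rng):
    """Sample each region (stratum) independently at its own rate."""
    sample = {}
    for region, pixels in pixels_by_region.items():
        n = round(rates[region] * len(pixels))
        sample[region] = rng.sample(pixels, n)
    return sample


# Hypothetical 100 x 100 picture: the "face" is a 20 x 20 block,
# everything else is "background".
pixels_by_region = {"face": [], "background": []}
for row in range(100):
    for col in range(100):
        in_face = 40 <= row < 60 and 40 <= col < 60
        pixels_by_region["face" if in_face else "background"].append((row, col))

rng = random.Random(0)  # fixed seed so the sketch is repeatable
all_pixels = pixels_by_region["face"] + pixels_by_region["background"]

# One rate for the whole picture.
srs = simple_random_sample(all_pixels, 0.025, rng)

# Assumed rates: oversample the face, undersample the background.
rates = {"face": 0.25, "background": 1 / 64}
strat = stratified_sample(pixels_by_region, rates, rng)

print(len(srs), {region: len(s) for region, s in strat.items()})
```

With these assumed rates both samples contain the same number of pixels overall, but the stratified sample spends far more of them on the face. That is the purposeful distortion the weights later have to undo.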
4How Sampling Works.
[Figure panels, by sampling rate: 1; 3; 5; 10; 2.5 Stratified]
5How does this affect modelling or analysis?
- The sample is no longer simply random.
- We purposefully biased the sample to gain
efficiencies and to meet other goals.
- This bias is corrected when we apply the design
weights.
6Framework
If you were to analyse each stratum separately,
each part could actually be treated as a survey
in its own right, each with a simpler design.
There would still be some difficulty associated
with the correction for non-response and the
final calibration (post-stratification).
The sampling frame or design allows you to keep
all these parts together in a cohesive way for
analysis.
7How to interpret sampling
The way we sample is reflected and corrected by
how we weight the data in the end.
- If you looked only at the parts we sampled, you
wouldn't get an accurate picture.
- All the parts would be there, but not in the
right proportions.
- The design weights compensate for the known
distortions. The final weights also include
corrections for estimated distortions.
8On what would you base the fundamental
multivariate relationships in your model or
analysis?
9Steps to calculate the weights: basic overview
- At the survey design stage, some factors are used
to determine the sample size required
- Probability of selection calculated
- First series of adjustments for non-response
- Post-stratification
10Factors to determine the sample size
- Characteristics to be estimated (small
proportions)
- Required precision of the estimates (targeted
coefficient of variation, CV)
- Variability of the data
- Expected non-response rate
- Size of the population
11Original design weight
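- Once the sample is selected in each stratum,
calculate the original weight
- Nh / nh, where h is the stratum, Nh is the number
of units in the population of stratum h and nh is
the number of units sampled from it
- Since the sample is selected from the Labour
Force Survey (LFS), get the original weight from
the LFS.
- Adjustments for the number of available children.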
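The Nh / nh calculation can be sketched as follows. The stratum labels and counts are hypothetical and only illustrate the arithmetic; they are not NLSCY or LFS figures.

```python
def design_weights(population_counts, sample_counts):
    """Return the design weight Nh / nh for each stratum h."""
    weights = {}
    for stratum, n_h in sample_counts.items():
        if n_h <= 0:
            raise ValueError(f"stratum {stratum!r} has no sampled units")
        weights[stratum] = population_counts[stratum] / n_h
    return weights


# Hypothetical strata and counts, for illustration only.
population_counts = {"A": 1000, "B": 50000}
sample_counts = {"A": 100, "B": 500}

weights = design_weights(population_counts, sample_counts)

# Each sampled unit "stands for" Nh / nh units, so the weighted sample
# count in a stratum reproduces that stratum's population count.
estimated = {h: weights[h] * sample_counts[h] for h in weights}
print(weights, estimated)
```

Stratum A is sampled at 1 in 10 and stratum B at 1 in 100, so a unit from B carries ten times the weight of a unit from A.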
12Non-response adjustment
- Adjustments must be made to take into account the
total non-response
- Characteristics of respondents vs non-respondents
are analyzed
- Province, income, level of education of parents,
depression scale of the PMK (person most
knowledgeable about the child), urban/rural, etc.
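One common way to turn such an analysis into an adjustment is to group sampled units into classes with similar response behaviour and inflate the respondents' weights within each class. The sketch below assumes that weighting-class approach; the slides do not say which method was actually used, and the class labels and weights are hypothetical.

```python
from collections import defaultdict


def nonresponse_adjustments(units):
    """Compute one adjustment factor per weighting class.

    units: iterable of (class_label, weight, responded) tuples.
    Factor = sum of weights of all sampled units in the class
             / sum of weights of the respondents in the class.
    Classes with no respondents are left out; in practice they would
    have to be merged with a similar class first.
    """
    total = defaultdict(float)
    resp = defaultdict(float)
    for label, weight, responded in units:
        total[label] += weight
        if responded:
            resp[label] += weight
    return {label: total[label] / resp[label] for label in total if resp[label] > 0}


# Hypothetical sample: (class, design weight, responded?)
units = [
    ("urban", 10.0, True), ("urban", 10.0, True),
    ("urban", 10.0, True), ("urban", 10.0, False),
    ("rural", 20.0, True), ("rural", 20.0, False),
]

adj1 = nonresponse_adjustments(units)
print(adj1)
```

Respondents in the class with the lower response rate receive the larger factor, so that they also represent the non-respondents who resemble them.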
13Post-stratification
- Adjustment factor calculated in order to
post-stratify the sample to known population
counts, by province, age and gender
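A minimal sketch of the post-stratification factor, assuming it is the ratio of the known population count to the sum of the current weights in each province-age-gender cell. The cells, weights and population counts below are hypothetical.

```python
def poststratification_factors(weighted_units, known_counts):
    """Compute one factor per post-stratum.

    weighted_units: iterable of (post_stratum, weight) for respondents,
                    where weight already includes earlier adjustments.
    known_counts:   dict mapping post_stratum -> known population count.
    Factor = known population count / sum of weights in the post-stratum.
    """
    totals = {}
    for cell, weight in weighted_units:
        totals[cell] = totals.get(cell, 0.0) + weight
    return {cell: known_counts[cell] / total for cell, total in totals.items()}


# Hypothetical post-strata keyed by (province, age group, gender).
respondents = [
    (("PE", "0-5", "F"), 40.0), (("PE", "0-5", "F"), 60.0),
    (("ON", "0-5", "M"), 500.0), (("ON", "0-5", "M"), 500.0),
]
known_counts = {("PE", "0-5", "F"): 120, ("ON", "0-5", "M"): 900}

adj2 = poststratification_factors(respondents, known_counts)
print(adj2)
```

After multiplying each weight by its cell's factor, the weights in every cell add up to the known population count for that cell.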
14Final weight
- Wf = Wi × Adj1 × Adj2
- Where
- Wf = final weight
- Wi = initial weight
- Adj1 = non-response adjustment
- Adj2 = post-stratification adjustment
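As a worked example of the formula, with made-up numbers: an initial weight of 25, a non-response adjustment of 1.25 and a post-stratification adjustment of 0.96 give a final weight of about 30.

```python
def final_weight(w_i, adj1, adj2):
    """Wf = Wi x Adj1 x Adj2."""
    return w_i * adj1 * adj2


# Hypothetical values, for illustration only.
w_f = final_weight(w_i=25.0, adj1=1.25, adj2=0.96)
print(round(w_f, 2))
```

The adjustments are multiplicative, so a child's final weight can end up above or below the initial weight depending on whether the two factors push in the same direction.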
15Link between analysis and the sample design
(weight)
[Diagram labels: Intelligence, Grade level, Child's ability, Social environment, Teachers, School, Materials, Subject, Curriculum, Province]
The proportion of kids in the sample being taught
the PEI (Prince Edward Island) curriculum is much
larger than what is found in the population.
Province is a stratum.
16Link between analysis and the sample design
- There are very few things in a child's life that
are not related to where they live:
- In the city versus in a small village
- In a small province versus a large one
- What social/educational programs are offered
- What social support and services are offered
- Regional cultural differences
- To name a few
17Weights for cycle 4
- Cross-sectional weights
- Longitudinal weights, including the converted
respondents.
- Longitudinal weights for children introduced in
Cycle 1 who responded in every cycle. NEW
- Not to mention the bootstrap weights, which are
used for an entirely different purpose.
18Cross-sectional Weights
- Available for all cycles, up to Cycle 4.
- When are they used?
- Cycle 4 cross-sectional weights: to represent the
population aged 0-17 in 2000-01.
- Cycle 1 weights: to represent the population aged
0-11 in 1994-95.
19Cross-sectional Weights - Cycle 4 - Warning
- In Cycle 4, children with a cross-sectional
weight come from 4 different cohorts (introduced
in 1994, 1996, 1998 and 2000).
- By 2000, the 1994 cohort has been around for 6
years.
- Cross-sectional representativeness decreases over
time because of sample erosion and population
change (immigration).
20Cross-sectional Weights - Cycle 5
- For Cycle 5 (2002-2003), there are no children
aged 6 and 7.
- In addition, the 1994 cohort's cross-sectional
representativeness has declined even further
(erosion and immigration).
- As a result, cross-sectional weights will be
calculated only for children aged 0-5.
21Cross-sectional weights in a nutshell
- Cross-sectional weights must be used when the
analysis concerns a specific year, when you want
a snapshot of the situation at a specific point
in time.
22Longitudinal Weights
- Longitudinal weights represent the population of
children at the time they were brought into the
survey.
- Children introduced in Cycle 1: longitudinal
weights represent the population of children aged
0-11 in 1994-95.
23Longitudinal Weights (continued)
- Children introduced in Cycle 2: longitudinal
weights represent the population of children aged
0-1 in 1996-97.
- Children introduced in Cycle 3: longitudinal
weights represent the population of children aged
0-1 in 1998-99.
- Children introduced in Cycle 4: longitudinal
weights represent the population of children aged
0-1 in 2000-01.
24When are longitudinal weights used?
- When you want to track a cohort of children
introduced in a particular cycle and see how
they've developed over time.
25Longitudinal Weights - Cycle 4
- Something new in Cycle 4
- 2 sets of longitudinal weights
- Set 1: weights for children who responded in
their first cycle and in Cycle 4 (possible
non-response in Cycle 2 or 3)
- Set 2: weights for those introduced in Cycle 1
who responded in every cycle. NEW.
26Longitudinal Weights - Cycle 4
- Difference between the 2 sets of longitudinal
weights
- To avoid total non-response in Cycle 2 or 3, the
set of weights for those who responded throughout
can be used.
- If you're only interested in the changes between
Cycle 1 and Cycle 4 directly, the longitudinal
weights including converted respondents can be
used.
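The membership rules for the two sets can be written down as a small check. This is only one reading of the slide text: the function name, the "set1"/"set2" labels and the way response history is represented are hypothetical, not the survey's actual file layout.

```python
def longitudinal_weight_sets(intro_cycle, responded_cycles, current_cycle=4):
    """Return which of the two Cycle 4 longitudinal weight sets apply.

    intro_cycle:      cycle in which the child was introduced (1 to 4).
    responded_cycles: cycles in which the child responded.
    "set1": responded in their first cycle and in the current cycle
            (non-response in between is allowed).
    "set2": introduced in Cycle 1 and responded in every cycle.
    """
    responded = set(responded_cycles)
    sets = []
    if intro_cycle in responded and current_cycle in responded:
        sets.append("set1")
    every_cycle = all(c in responded for c in range(1, current_cycle + 1))
    if intro_cycle == 1 and every_cycle:
        sets.append("set2")
    return sets


print(longitudinal_weight_sets(1, [1, 2, 3, 4]))  # responded throughout
print(longitudinal_weight_sets(1, [1, 4]))        # missed Cycles 2 and 3
print(longitudinal_weight_sets(2, [2, 3, 4]))     # introduced in Cycle 2
```

A child who missed Cycle 2 or 3 and then returned (a converted respondent) has a weight only in the first set, which is why the second set is the one to use when every intermediate cycle enters the analysis.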
27Examples
- The following are real examples taken from the
NLSCY data
28Weighting - Examples
Average weights in Cycle 4, Prince Edward Island
[Figure labels: 7 1-year-olds; 5-year-old]
29Weighting - Examples
Average weights in Cycle 4 (continued), Ontario
[Figure labels: 712 15-year-olds; 15-year-old]
30Example: Proportion of children aged 0-17, by
province, Cycle 4, UNWEIGHTED
- 24% of Canada's children live in the Maritime
provinces, whereas in reality...
31Example: Proportion of children aged 0-17, by
province, Cycle 4, WEIGHTED
- Whereas in reality, 7.3% of children live in the
Maritime provinces.
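The gap between the unweighted and weighted proportions comes directly from how the two are computed. The sketch below uses a tiny made-up sample, not the NLSCY data: a small region is deliberately oversampled, so its records carry small weights.

```python
def unweighted_share(records, region):
    """Share of sample records that fall in the region."""
    return sum(1 for r, _ in records if r == region) / len(records)


def weighted_share(records, region):
    """Share of the total weight carried by records in the region."""
    total = sum(w for _, w in records)
    return sum(w for r, w in records if r == region) / total


# Hypothetical sample of (region, final weight) pairs.
records = [("small_region", 10.0)] * 3 + [("large_region", 100.0)] * 7

print(unweighted_share(records, "small_region"))
print(weighted_share(records, "small_region"))
```

In this sketch the small region makes up 30% of the sample records but only about 4% of the weighted total. The unweighted figure describes the sample; the weighted figure is the estimate for the population.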
32Number of children aged 0-15 by year of age,
Quebec, Cycle 3, unweighted
- The conclusion is obvious
- Huge increase in births in 1993 and 1997!!!!!
33Number of children aged 0-15 by year of age,
Quebec, Cycle 3, WEIGHTED
- So much for the pseudo baby boom...
34Conclusion
- To be obsessed with weights is a good thing where
statistical analysis is concerned.