Working with the ECLS-K Datasets: Weights and Other Issues


Provided by: hpcus1154

1
Working with the ECLS-K Datasets: Weights and
Other Issues
Information is courtesy of the Institute of
Education Sciences, National Center for
Education Statistics, and is used in their
training seminars.
2
Sampling Weights

What are sampling weights and why are they
important?
How are weights used?
What weights are on the ECLS-K data files and
when should they be used?

3
What is a Weight?

A weight is used to indicate the relative
strength of an observation.
In the simplest case, each observation is counted
equally.
For example, if we have five observations, and
wish to calculate the mean, we just add up the
values and divide by 5.

4
How are Weights Used?

Dataset with 5 cases.
Value 4 2 1 5 2
Weight 1 2 4 1 2
Sample mean = (4 + 2 + 1 + 5 + 2) / 5 = 2.8
Weighted mean = [(4 x 1) + (2 x 2) + (1 x 4) + (5 x 1) + (2 x 2)] / sum of weights
= (4 + 4 + 4 + 5 + 4) / 10
= 2.1
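The arithmetic on this slide can be reproduced with a few lines of Python. This is only an illustrative sketch; the function names are made up for the example and are not part of any ECLS-K tooling.

```python
# Values and weights taken from the five-case example on the slide.
values = [4, 2, 1, 5, 2]
weights = [1, 2, 4, 1, 2]


def unweighted_mean(xs):
    """Every case counts equally."""
    return sum(xs) / len(xs)


def weighted_mean(xs, ws):
    """Each case counts in proportion to its weight."""
    return sum(x * w for x, w in zip(xs, ws)) / sum(ws)


sample_mean = unweighted_mean(values)      # (4 + 2 + 1 + 5 + 2) / 5
wtd_mean = weighted_mean(values, weights)  # (4 + 4 + 4 + 5 + 4) / 10
print(sample_mean, wtd_mean)
```

The case with value 1 carries the largest weight (4), which is why the weighted mean is pulled below the unweighted one.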

5
What is the Difference Between Weighted and
Unweighted Data?

With unweighted data, each case is counted
equally.
Unweighted data represent only those in the
sample who provide data.
With weighted data, each case is counted relative
to its representation in the population.
Weights allow analyses that represent the target
population.

6
ECLS-K and Weights

The ECLS-K is a sample, i.e. the entire
population was not surveyed.
The ECLS-K is not a simple random sample (SRS).
That is, not all schools, teachers, and children
had an equal probability of selection.
Not all schools, teachers, and children
participated.

7
Why Use Weights in the ECLS-K?

The ECLS-K weights allow you to make statements
about the population of U.S. children who were
in kindergarten in 1998-99 or in first grade in
1999-2000. Without weights, estimates are
not nationally representative.
Weights adjust for differential selection
probabilities and reduce bias associated with
non-response by adjusting for differential
nonresponse.

8
Examples of Weighted vs. Unweighted Data
9
Examples of Weighted vs. Unweighted Data
10
Types of Weights on the ECLS-K

Weights vary according to:
Level of analysis: child, teacher, or school
(only child-level after base year).
Round(s) of data: cross-sectional or
longitudinal.
Source(s) of data: child assessment, parent
interview, and/or teacher questionnaires.

11
Level of Analysis: Base Year
The first element in a weight variable name
indicates the level of analysis.

Weights for School-level analyses begin with S.
Weights for Teacher-level analyses begin with
B.
Weights for Child-level analyses begin with C
(cross-sectional).
Weights for Child-level analyses begin with BY
(longitudinal).

12
Level of Analysis: 1st, 3rd and 5th Grades

Weights for Child-level analyses (cross sectional
and longitudinal) begin with C.
One exception: weight Y2COMW0 is for child-level
analyses of assessment data from rounds 1, 2 and
4 and parent and/or teacher data from spring of
first grade, and one or more base year rounds of
parent and/or teacher data.

13
Data Round(s)
The second element in a weight variable name
indicates the round(s) of data.

Weights for cross-sectional analyses have a
single round number: 1, 2, 3, 4, 5 or 6.
Weights for longitudinal analyses have 2 or more
numbers, for example:
45 for rounds 4 and 5.
124 for rounds 1, 2 and 4 (exception in
Y2COMW0).
1_4 for rounds 1, 2, 3 and 4.
1_6F for rounds 1, 2, 4, 5, 6 (F = full sample).
1_5S for rounds 1, 2, 3, 4, 5 (S = subsample).

14
Source of the Data
The third element in a weight variable name
indicates the source(s) of data. Weights for
analyses using data from:

Child assessments (alone or in conjunction with
any combination of a limited set of child
characteristics, e.g., age, sex, race/ethnicity)
have a C.
Parent interview (with or without child data)
have a P.
Child AND parent AND teacher have a CPT.
In 5th grade, the CPT is followed by either
R, M or S for reading, math or science
teacher.

15
Sources of the Data: Two Exceptions

BYCOMW0: Child assessment data from fall AND
spring kindergarten in conjunction with one or
more rounds of parent and/or teacher base year
data.
Y2COMW0: Child direct assessment data from fall
AND spring kindergarten AND spring first grade,
in conjunction with parent and/or teacher data
from spring first grade, AND one or more base
year rounds of parent and/or teacher data.

16
Source of the Data
Sources that do not affect choice of weight

School administrator questionnaire
Facilities checklist
Teacher questionnaire C
Special education questionnaires
Student record abstract data
Head Start data
Salary and benefits data

17
Example: C23PW0

C for child level analysis.
23 for analysis of data from rounds 2 and 3.
P for analysis of parent interview data.

18
Example: C6CPTM0

C for child level analysis.
6 for analysis of data from round 6.
CPTM for analysis of child, parent, and math
teacher.
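The three-part naming rule (level, rounds, source) can be sketched as a small parser. This is a hypothetical helper written for illustration, not an NCES tool. It assumes the regular pattern seen in names such as C23PW0 and C6CPTM0, and deliberately rejects the exceptions (BYPW0, BYCOMW0, Y2COMW0, and the 1_6F / 1_5S style names).

```python
import re

# Regular pattern only: level letter, round digits, source code, optional W, 0.
# Longer source codes are listed first so that CPTM is not read as C.
_WEIGHT_NAME = re.compile(r"([SBC])(\d+)(SAQ|CPT[RMS]|CPT|C|P|T)W?0")

_LEVELS = {"S": "school", "B": "teacher", "C": "child"}


def parse_weight_name(name):
    """Split a regular ECLS-K weight name into level, rounds, and source."""
    match = _WEIGHT_NAME.fullmatch(name.upper())
    if match is None:
        raise ValueError(f"{name!r} does not follow the regular naming pattern")
    level, rounds, source = match.groups()
    return {
        "level": _LEVELS[level],
        "rounds": [int(digit) for digit in rounds],
        "source": source,
    }


print(parse_weight_name("C23PW0"))
print(parse_weight_name("C6CPTM0"))
```

For the two slide examples this yields child level, rounds 2 and 3, source P, and child level, round 6, source CPTM.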

19
Cross-sectional Examples

C1PW0 -- Child-level analyses from round 1,
parent interview data (with or without child
assessment data).
B1TW0 -- Teacher level analyses (teacher data)
from round 1.
S2SAQW0 -- School-level analysis (SAQ data) from
round 2.
C6CW0 -- Child assessment data from round 6.
C5CPTW0 -- Child-level analyses from round 5 with
child, parent AND teacher data.

20
Longitudinal Examples
All longitudinal weights are for child-level
analyses.

BYPW0 -- Round 1 and 2 parent interview data.
BYCOMW0 -- Round 1 and 2 assessment data and some
other parent and teacher data.
C24PW0 -- Round 2 and 4 parent interview data.
C245CW0 -- Round 2, 4 and 5 assessment data.
C1_6FC0 -- Round 1, 2, 4, 5 and 6 assessment data.

21
Third and Fifth-Grade Weights

Unlike the first grade sample, the ECLS-K sample
was not freshened in third and fifth grade.
The ECLS-K sample does not represent all third
graders in 2001-02 or fifth graders in 2003-04.
These samples represent all children who began
kindergarten in 1998 or began first grade in 1999.

22
How to Use Weights

In SAS, use the WEIGHT statement.
In SPSS, use the WEIGHT BY statement.
Key fact: All ECLS-K weights sum to population
totals.

23
Weights in SAS

SAS uses the WEIGHT statement in various
PROCedures.
PROC FREQ data=test;
  Tables Age Gender Score;
  Weight weightvar;
Run;

24
Weights in SPSS

LIST VARIABLES=age TO weightvar.
FREQUENCIES VARIABLES=age, score /STATISTICS=DEFAULT.
WEIGHT BY weightvar.
FREQUENCIES VARIABLES=age, score /STATISTICS=DEFAULT.

25
Weights in STATA

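clear
use "c:\temp\test1.dta"
* tabulate does not accept pweights; aweights give the same weighted percentages
tabulate score [aweight=weightvar]
tabulate age [aweight=weightvar]
tabulate gender [aweight=weightvar]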
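For readers without SAS, SPSS, or STATA, a weighted frequency table can be sketched in plain Python. This is an illustrative sketch only: the function name is invented, the data reuse the five-case example from slide 4, and dropping cases without a positive weight is an assumption of the sketch.

```python
from collections import defaultdict


def weighted_frequencies(categories, weights):
    """Return {category: (weighted count, weighted percent)}."""
    totals = defaultdict(float)
    for category, weight in zip(categories, weights):
        if weight > 0:  # assumption: cases without a positive weight are skipped
            totals[category] += weight
    grand_total = sum(totals.values())
    return {
        category: (total, 100.0 * total / grand_total)
        for category, total in totals.items()
    }


scores = [4, 2, 1, 5, 2]
weightvar = [1, 2, 4, 1, 2]
table = weighted_frequencies(scores, weightvar)
for category in sorted(table):
    count, percent = table[category]
    print(category, count, percent)
```

Because the weights sum to 10 here, a score of 2 (two cases, weight 2 each) accounts for 40 percent of the weighted total even though it is 2 of 5 unweighted cases.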

26
Weights for HLM Users

ECLS-K weights are adjusted for nonresponse.
ECLS-K weights are not normalized (they sum to
the population N rather than the sample n).
A within-school child-level weight can be
approximated by dividing a regular child-level
weight by the school-level weight.
If the analysis includes children that stayed in
the same school at each round of the analysis,
the school weight (S2SAQW0) can be used as a
school-level weight.
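The within-school approximation described above is a simple ratio. The sketch below uses made-up numbers purely for illustration; the function name and the example values are assumptions, not ECLS-K data.

```python
def within_school_weight(child_weight, school_weight):
    """Approximate a within-school child weight as child weight / school weight."""
    if school_weight <= 0:
        raise ValueError("school weight must be positive")
    return child_weight / school_weight


# Hypothetical values: a child weight of 200 in a school whose weight is 50.
example = within_school_weight(200.0, 50.0)
print(example)
```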

27
Other Frequently Asked Questions

When selecting a weight, do I have to subset my
dataset?
What happens to cases where there is no positive
weight?
What weights do I use if analyzing a subsample of
cases?
What if I'm running a regression? What weights
do I use?

28
Summary about Weights

Weights should be used when analyzing data from
the ECLS-K.
The appropriate weight should be selected based
on level of analysis, round(s) of data, and
source(s) of data.
There may not be a perfect weight for some
analyses. The best weight can be determined with
some descriptive analysis.

29
Variance, Calculating Standard Errors

Why are standard errors important?
Why not use standard errors that assume a simple
random sample (SRS)?
How to use exact methods for estimating
standard errors.
How to use approximation methods for estimating
standard errors.

30
Why are Standard Errors Important?

Standard errors are produced for estimates from
sample surveys. They are a measure of the
variance in the estimates associated with the
selected sample being one of many possible
samples.
Standard errors are used to test hypotheses and
to study group differences when making inferences
to a population.
Using inaccurate standard errors can lead to
identification of statistically significant
results where none are present and vice versa.

31
Important Considerations

All weights on the ECLS-K data files sum to
population totals and not sample totals.
The ECLS-K has a complex sample design and is not
a simple random sample.

32
The ECLS-K Sample Design: Oversampling

The ECLS-K includes oversamples of private
schools and private school children.
The ECLS-K also oversamples Asian and Pacific
Islander children.

33
The ECLS-K Sample Design: Clustering

Sample children were clustered within primary
sampling units (PSUs) to reduce field costs.
Children were in closer geographical proximity
than would occur in a simple random sample.
Children in a clustered sample tend to be more
alike than those in a simple random sample.

34
Complex Samples and Standard Errors

The usual standard error formula assumes a simple
random sample.
Standard errors for estimates from a complex
sample must account for the within cluster/across
cluster variation.
Special software can make the adjustment, or this
adjustment can be approximated using the design
effect.

35
Options

Exact methods, such as Taylor series and
replication techniques.
Approximation method.

36
Exact Methods

Taylor series
Extract PSU and strata IDs from data file.
Software available: SUDAAN, STATA (using SVY
commands), and SAS (using PROC SURVEY commands).

37
Exact Methods: Replication Techniques

Extract replication weights (90 of them).
ECLS-K replication weights use jackknife 2 (JK2)
methods.
Software: WESVAR replication series (JK2), AM
(JK2), and SAS-callable SUDAAN.
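To make the replication idea concrete, here is a minimal sketch of a JK2-style variance estimate for a weighted mean. It assumes the usual paired-jackknife form, in which the variance is the sum over replicates of the squared difference between the replicate estimate and the full-sample estimate. The data and the two replicate weights are toy values invented for the example; a real ECLS-K analysis would use the 90 replicate weights on the file and dedicated software.

```python
import math


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def jk2_variance(values, full_weights, replicate_weights):
    """Sum of squared deviations of replicate estimates from the full estimate."""
    full_estimate = weighted_mean(values, full_weights)
    return sum(
        (weighted_mean(values, rep) - full_estimate) ** 2
        for rep in replicate_weights
    )


# Toy data: four cases forming two pairs. In each replicate, one member of a
# pair has its weight doubled and the other is set to zero.
values = [1.0, 2.0, 3.0, 4.0]
full_weights = [1.0, 1.0, 1.0, 1.0]
replicate_weights = [
    [2.0, 0.0, 1.0, 1.0],
    [1.0, 1.0, 2.0, 0.0],
]

variance = jk2_variance(values, full_weights, replicate_weights)
standard_error = math.sqrt(variance)
print(variance, standard_error)
```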

38
Approximation Method

Two stages:
First, normalize weights so standard error is
based on actual sample size rather than
population size.
Then, use design effect (DEFF) to account for
complex sampling design.

39
1) Normalizing Weights

Weights on the ECLS-K sum to the population
totals.
Calculate a new weight that sums to the sample
size.
Normalized weight = (ECLS-K weight) x (sample
n / population N).
SAS users do not need this step since estimates
are produced based on the actual sample size.

40
Example: Normalizing Weights

Weight to be normalized: C2PW0
Sum of weights: 3,865,946
Total number of cases with a positive weight:
18,950
Normalized weight = C2PW0 x (18,950 / 3,865,946)
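The normalization step can be sketched as follows. The two totals come from the example above; the function name and the four toy weights are assumptions made for illustration.

```python
# Totals from the C2PW0 example.
SUM_OF_WEIGHTS = 3_865_946
N_POSITIVE = 18_950

# Every C2PW0 value is multiplied by this factor.
factor = N_POSITIVE / SUM_OF_WEIGHTS


def normalize(weights):
    """Rescale weights so they sum to the number of cases with a positive weight."""
    positive = [w for w in weights if w > 0]
    scale = len(positive) / sum(positive)
    return [w * scale for w in weights]


# Toy weights (hypothetical): three positive cases and one zero-weight case.
toy = [100.0, 300.0, 0.0, 600.0]
normalized = normalize(toy)
print(factor)
print(normalized, sum(normalized))
```

After normalization the toy weights sum to 3, the number of cases with a positive weight, while their relative sizes are unchanged.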

41
2) Adjusting for Complex Design

The ECLS-K has a complex sample design; it is not
a simple random sample.
Software packages designed for simple random
samples tend to underestimate the standard errors
for complex sample designs.
Special methods are required for complex designs.

42
Using Design Effects (DEFF)

What is a design effect (DEFF)?
It's the ratio of the variance found in the actual
(complex) sample design to the variance expected
in a simple random sample of the same sample size.

43
Using Design Effects (DEFF)

DEFT: the square root of DEFF (design standard
error / simple random sample standard error).
Example for fall-kindergarten reading scores:
SE (SRS) = 0.063
SE (Design) = 0.156
DEFF = 0.156^2 / 0.063^2 = 6.15 (the rounded standard errors shown here give about 6.13)
DEFT = 0.156 / 0.063 = square root of 6.15 = 2.48
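The DEFF and DEFT calculation can be checked directly from the two standard errors. This is a small sketch with illustrative variable names. With the rounded standard errors shown on the slide, the ratio of variances works out to about 6.13, slightly below the reported 6.15, while DEFT rounds to 2.48 either way.

```python
# Standard errors from the fall-kindergarten reading example.
se_srs = 0.063     # assuming a simple random sample
se_design = 0.156  # accounting for the complex design

deft = se_design / se_srs  # ratio of standard errors
deff = deft ** 2           # ratio of variances

print(round(deft, 2), round(deff, 2))
```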

44
3 Ways of Using the DEFF

Multiply the SRS (simple random sample) standard
error produced by statistical software (when
using normalized weights) by the square root of
the DEFF (DEFT).
Or
Adjust the t-statistic by dividing it by the
square root of the design effect (DEFT) or adjust
the F-statistic by dividing it by the DEFF.
Or
Adjust the weight such that an adjusted standard
error is produced.
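The first two adjustments are one-line formulas. The sketch below uses invented function names; the standard-error example reuses the reported DEFF of 6.15 from the reading-score slide, and the t and F inputs are hypothetical.

```python
import math


def adjust_standard_error(se_srs, deff):
    """Inflate an SRS standard error by DEFT = sqrt(DEFF)."""
    return se_srs * math.sqrt(deff)


def adjust_t(t_statistic, deff):
    """Deflate a t-statistic by DEFT = sqrt(DEFF)."""
    return t_statistic / math.sqrt(deff)


def adjust_f(f_statistic, deff):
    """Deflate an F-statistic by DEFF."""
    return f_statistic / deff


adjusted_se = adjust_standard_error(0.063, 6.15)
print(round(adjusted_se, 3))

# Hypothetical test statistics with a DEFF of 4.0 (so DEFT = 2.0).
print(adjust_t(4.96, 4.0), adjust_f(8.0, 4.0))
```

Multiplying the SRS standard error of 0.063 by the square root of 6.15 recovers the design-based standard error of about 0.156.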

45
Using a DEFF-Adjusted Weight

First step, create a weight that sums to the
sample size (normalized weight).
Second, divide this normalized weight by the
DEFF.
Third, use this weight for analyses. The
standard errors produced will approximate the
standard errors obtained using exact methods.
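The three steps can be sketched in a few lines. The function name, the toy weights, and the DEFF of 2.0 are all hypothetical values chosen for illustration.

```python
def deff_adjusted_weights(weights, deff):
    """Normalize weights to sum to the sample size, then divide by DEFF."""
    positive = [w for w in weights if w > 0]
    scale = len(positive) / sum(positive)  # step 1: normalize
    return [w * scale / deff for w in weights]  # step 2: divide by DEFF


# Hypothetical inputs: three cases and a design effect of 2.0.
toy_weights = [100.0, 300.0, 600.0]
adjusted = deff_adjusted_weights(toy_weights, 2.0)
print(adjusted, sum(adjusted))
```

The adjusted weights sum to n / DEFF (here 3 / 2 = 1.5). Software that treats the sum of weights as the sample size then behaves as if the sample were that much smaller, which widens the standard errors accordingly.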

46
Where to find ECLS-K DEFFs

Training material: ECLS-K Specifications for
Computing Standard Errors
ECLS-K user's manuals:
Base Year (Kindergarten): Table 4.12
First Grade: Tables 4.13 and 9.4
Third Grade: Tables 4.14 and 9.2
Fifth Grade: Tables 4.19 and 9.2

47
For SAS Users

SAS base procedures such as PROC REG, PROC FREQ,
and PROC MEANS do account for the actual sample
size but not for complex sampling.
SAS procedures such as PROC SURVEYMEANS and PROC
SURVEYREG (and other procedures that begin with
SURVEY) use the Taylor series method to account
for complex sampling and provide exact estimates
of the standard errors.

48
PROC SURVEYREG Example

Example using ECLS-K data, spring kindergarten
and spring first grade variables.
proc surveyreg data=fscores;
  model c4r3mscl = c2r3mscl lowkread t4learn;
  strata c24cstr;   /* stratum identifier */
  cluster c24cpsu;  /* PSU identifier */
  weight c24cw0;
  where lowkmath = 0;
run;

49
PROC SURVEYLOGISTIC Example

Example using ECLS-K data, spring kindergarten
and spring first grade variables.
proc surveylogistic data=fscores;
  model lowkread (desc) = c2r3mscl t4learn;
  strata c24cstr;   /* stratum identifier */
  cluster c24cpsu;  /* PSU identifier */
  weight c24cw0;
  where lowkmath = 0;
run;

50
PROC SURVEYFREQ Example

Example using ECLS-K data, spring kindergarten
and spring first grade variables.
proc surveyfreq data=fscores;
  tables lowkread c2r3mscl t4learn;
  strata c24cstr;   /* stratum identifier */
  cluster c24cpsu;  /* PSU identifier */
  weight c24cw0;
run;

51
STATA Code for Complex Design

Logistic Regression Example, 3rd Grade Data
svyset C5CWPSU [pweight=C5CW0], strata(C5TCWSTR)
svy, subpop(male): logit highbmi white

52
STATA Code for Complex Design

Regression Example, 3rd Grade Data
svyset C5CWPSU [pweight=C5CW0], strata(C5TCWSTR)
svy, subpop(male): regress highbmi white

53
STATA Code for Complex Design

Means Example, 3rd Grade Data
svyset C5CWPSU [pweight=C5CW0], strata(C5TCWSTR)
svy, subpop(male): mean highbmi female

54
SPSS for Complex Sample Design

Use the SPSS add-on called SPSS Complex Samples.
Complex Samples Logistic Regression
(CSLOGISTIC): Performs binary logistic regression
analysis, as well as multinomial logistic regression
(MLR) analysis, for samples drawn by complex
sampling methods. The procedure estimates
variances by taking into account the sample
design used to select the sample, including equal
probability and PPS methods, and WR and WOR
sampling procedures. Optionally, CSLOGISTIC
performs analyses for subpopulations.
(Courtesy of SPSS)

55
Regression Analysis

Use appropriate software such as AM, WESVAR,
SUDAAN or SAS (SURVEYREG procedure).
For SAS (PROC REG procedure), use DEFF-adjusted
weights.
For SPSS, use normalized, DEFF-adjusted weights.

56
Summary
All statistical tests should be based on standard
errors that are calculated to account for the
complex sample design of the ECLS-K.

Preferred: Use software that incorporates JK2
replication methods, or
use software that incorporates the Taylor series
method, or
Last resort: Make approximate adjustments based
on design effects.

57
ECLS-K Data Availability

Base Year (Kindergarten) through 5th Grade
restricted-use and public-use datasets have been
released.
The 8th Grade restricted-use dataset should be
released in the winter of 2008 and the public-use
datasets should be released in March 2009.

58
Differences in Restricted Use and Public Use
ECLS-K Datasets.

Here's a short explanation from the NCES:
http://nces.ed.gov/ecls/kinderfaq.asp?faq=1
Chapter 7 in the ECLS-K 5th Grade User's Guide
has Tables 7-15 and 7-16 that describe the
differences in the public and restricted
datasets. The User's Guide can be found online
at http://sodapop.pop.psu.edu/codebooks/ecls/k5userpart2.pdf
