1
Working with the ECLS-K Datasets: Weights and
Other Issues
Information is courtesy of the Institute of
Education Sciences, National Center for
Education Statistics, and is used in their
training seminars.
2
Sampling Weights
  • What are sampling weights and why are they
    important?
  • How are weights used?
  • What weights are on the ECLS-K data files and
    when should they be used?

3
What is a Weight?
  • A weight is used to indicate the relative
    strength of an observation.
  • In the simplest case, each observation is counted
    equally.
  • For example, if we have five observations, and
    wish to calculate the mean, we just add up the
    values and divide by 5.

4
How are Weights Used?
  • Dataset with 5 cases.
  • Value:  4 2 1 5 2
  • Weight: 1 2 4 1 2
  • Sample mean = (4 + 2 + 1 + 5 + 2) / 5 = 2.8
  • Weighted mean = [(4 × 1) + (2 × 2) + (1 × 4) + (5 × 1)
    + (2 × 2)] / sum of weights = (4 + 4 + 4 + 5 + 4) / 10
    = 2.1
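
The five-case example above can be reproduced with a short Python sketch. The function names are illustrative only and are not part of any ECLS-K tooling.

```python
# The five cases and weights from the slide.
values = [4, 2, 1, 5, 2]
weights = [1, 2, 4, 1, 2]

def unweighted_mean(xs):
    """Each case counts equally."""
    return sum(xs) / len(xs)

def weighted_mean(xs, ws):
    """Each case counts in proportion to its weight."""
    return sum(x * w for x, w in zip(xs, ws)) / sum(ws)

print(unweighted_mean(values))         # 2.8
print(weighted_mean(values, weights))  # 2.1
```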

5
What is the Difference Between Weighted and
Unweighted Data?
  • With unweighted data, each case is counted
    equally.
  • Unweighted data represent only those in the
    sample who provide data.
  • With weighted data, each case is counted relative
    to its representation in the population.
  • Weights allow analyses that represent the target
    population.

6
ECLS-K and Weights
  • The ECLS-K is a sample, i.e. the entire
    population was not surveyed.
  • The ECLS-K is not a simple random sample (SRS).
    That is, not all schools, teachers, and children
    had an equal probability of selection.
  • Not all schools, teachers, and children
    participated.

7
Why Use Weights in the ECLS-K?
  • The ECLS-K weights allow you to make statements
    about the population of U.S. children who were
    in kindergarten in 1998-99 or in first grade in
    1999-2000. Without weights, estimates are
    not nationally representative.
  • Weights adjust for differential selection
    probabilities and reduce the bias associated with
    nonresponse by adjusting for differential
    nonresponse.

8
Examples of Weighted vs. Unweighted Data
9
Examples of Weighted vs. Unweighted Data
10
Types of Weights on the ECLS-K
  • Weights vary according to:
  • Level of analysis: child, teacher, or school
    (only child-level after base year).
  • Round(s) of data: cross-sectional or
    longitudinal.
  • Source(s) of data: child assessment, parent
    interview, and/or teacher questionnaires.

11
Level of Analysis: Base Year
The first element in a weight variable name
indicates the level of analysis:
  • Weights for School-level analyses begin with S.
  • Weights for Teacher-level analyses begin with
    B.
  • Weights for Child-level analyses begin with C
    (cross-sectional).
  • Weights for Child-level analyses begin with BY
    (longitudinal).

12
Level of Analysis: 1st, 3rd and 5th Grades
  • Weights for child-level analyses (cross-sectional
    and longitudinal) begin with C.
  • One exception: weight Y2COMW0 is for child-level
    analyses of assessment data from rounds 1, 2, and
    4 and parent and/or teacher data from spring of
    first grade, and one or more base year rounds of
    parent and/or teacher data.

13
Data Round(s)
The second element in a weight variable name
indicates the round(s) of data:
  • Weights for cross-sectional analyses have a
    single round number: 1, 2, 3, 4, 5, or 6.
  • Weights for longitudinal analyses have 2 or more
    numbers, for example:
  • 45 for rounds 4 and 5.
  • 124 for rounds 1, 2, and 4 (exception in
    Y2COMW0).
  • 1_4 for rounds 1, 2, 3, and 4.
  • 1_6F for rounds 1, 2, 4, 5, and 6 (F = full sample).
  • 1_5S for rounds 1, 2, 3, 4, and 5 (S = subsample).

14
Source of the Data
The third element in a weight variable name
indicates the source(s) of data. Weights for
analyses using data from:
  • Child assessments (alone or in conjunction with
    any combination of a limited set of child
    characteristics, e.g. age, sex, race/ethnicity)
    have a C.
  • Parent interview (with or without child data)
    have a P.
  • Child AND parent AND teacher have a CPT.
  • In 5th grade, the CPT is followed by either
    R, M, or S for reading, math, or science
    teacher.

15
Sources of the Data: Two Exceptions
  • BYCOMW0: Child assessment data from fall AND
    spring kindergarten in conjunction with one or
    more rounds of parent and/or teacher base year
    data.
  • Y2COMW0: Child direct assessment data from fall
    AND spring kindergarten AND spring first grade,
    in conjunction with parent and/or teacher data
    from spring first grade, AND one or more base
    year rounds of parent and/or teacher data.

16
Source of the Data
Sources that do not affect choice of weight:
  • School administrator questionnaire
  • Facilities checklist
  • Teacher questionnaire C
  • Special education questionnaires
  • Student record abstract data
  • Head Start data
  • Salary and benefits data

17
Example: C23PW0
  • C for child level analysis.
  • 23 for analysis of data from rounds 2 and 3.
  • P for analysis of parent interview data.

18
Example: C6CPTM0
  • C for child level analysis.
  • 6 for analysis of data from round 6.
  • CPTM for analysis of child, parent, and math
    teacher.
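
The three-part naming convention (level, rounds, source) can be sketched as a small Python parser. This is an illustration only: the function name parse_weight_name and the regular expression are assumptions inferred from the example names on these slides, not an official NCES grammar, and other weight names on the data files may not fit it.

```python
import re

# Level prefixes listed on the slides.
LEVELS = {"S": "school", "B": "teacher", "C": "child"}

# Names the slides single out as exceptions to the three-part pattern.
EXCEPTIONS = {"BYCOMW0", "Y2COMW0"}

# Level + rounds (or the BY prefix), then source, then a trailing W0 or 0.
PATTERN = re.compile(
    r"^(?:(?P<level>[SBC])(?P<rounds>\d_\d[FS]?|\d+)|BY)"
    r"(?P<source>CPT[RMS]?|SAQ|[CPT])W?0$"
)

def parse_weight_name(name):
    """Return (level, rounds, source) for a weight name (hypothetical helper)."""
    if name in EXCEPTIONS:
        return ("child", "exception", "COM")
    match = PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized weight name: {name}")
    if match.group("level") is None:
        # BY prefix: child-level, base-year longitudinal (rounds 1 and 2).
        return ("child", "BY", match.group("source"))
    return (LEVELS[match.group("level")],
            match.group("rounds"),
            match.group("source"))

for name in ["C23PW0", "C6CPTM0", "S2SAQW0", "BYPW0"]:
    print(name, parse_weight_name(name))
```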

19
Cross-sectional Examples
  • C1PW0 -- Child-level analyses from round 1,
    parent interview data (with or without child
    assessment data).
  • B1TW0 -- Teacher level analyses (teacher data)
    from round 1.
  • S2SAQW0 -- School-level analysis (SAQ data) from
    round 2.
  • C6CW0 -- Child assessment data from round 6.
  • C5CPTW0 -- Child-level analyses from round 5 with
    child, parent AND teacher data.

20
Longitudinal Examples
All longitudinal weights are for child-level
analyses.
  • BYPW0: Rounds 1 and 2 parent interview data.
  • BYCOMW0: Rounds 1 and 2 assessment data and some
    other parent and teacher data.
  • C24PW0: Rounds 2 and 4 parent interview data.
  • C245CW0: Rounds 2, 4, and 5 assessment data.
  • C1_6FC0: Rounds 1, 2, 4, 5, and 6 assessment data.

21
Third and Fifth-Grade Weights
  • Unlike the first grade sample, the ECLS-K sample
    was not freshened in third and fifth grade.
  • The ECLS-K sample does not represent all third
    graders in 2001-02 or fifth graders in 2003-04.
    These samples represent all children who began
    kindergarten in 1998 or began first grade in 1999.

22
How to Use Weights
  • In SAS, use the WEIGHT statement.
  • In SPSS, use the WEIGHT BY statement.
  • Key fact: All ECLS-K weights sum to population
    totals.

23
Weights in SAS
  • SAS uses the WEIGHT statement in various
    PROCedures.
  • PROC FREQ DATA=test;
  • TABLES Age Gender Score;
  • WEIGHT weightvar;
  • RUN;

24
Weights in SPSS
  • LIST VARIABLES=age TO weightvar.
  • FREQUENCIES VARIABLES=age, score /STATISTICS=DEFAULT.
  • WEIGHT BY weightvar.
  • FREQUENCIES VARIABLES=age, score /STATISTICS=DEFAULT.

25
Weights in STATA
  • clear
  • use "c:\temp\test1.dta"
  • tabulate score age gender [pweight=weightvar]

26
Weights for HLM Users
  • ECLS-K weights are adjusted for nonresponse.
  • ECLS-K weights are not normalized (they sum to
    the population N rather than the sample n).
  • A within-school child-level weight can be
    approximated by dividing a regular child-level
    weight by the school-level weight.
  • If the analysis includes only children who stayed in
    the same school at each round of the analysis,
    the school weight (S2SAQW0) can be used as the
    school-level weight.
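
The within-school approximation described above is a simple ratio. A minimal Python sketch follows; the function name and the two weight values are hypothetical and are not taken from the ECLS-K files.

```python
def within_school_weight(child_weight, school_weight):
    """Approximate a within-school child weight as child weight / school weight."""
    if school_weight <= 0:
        raise ValueError("school weight must be positive")
    return child_weight / school_weight

# Hypothetical values: a child-level weight of 210 in a school
# whose school-level weight is 35.
print(within_school_weight(210.0, 35.0))  # 6.0
```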

27
Other Frequently Asked Questions
  • When selecting a weight, do I have to subset my
    dataset?
  • What happens to cases where there is no positive
    weight?
  • What weights do I use if analyzing a subsample of
    cases?
  • If I'm running a regression, what weights
    do I use?

28
Summary about Weights
  • Weights should be used when analyzing data from
    the ECLS-K.
  • The appropriate weight should be selected based
    on level of analysis, round(s) of data, and
    source(s) of data.
  • There may not be a perfect weight for some
    analyses. The best weight can be determined with
    some descriptive analysis.

29
Variance, Calculating Standard Errors
  • Why are standard errors important?
  • Why not use standard errors that assume a simple
    random sample (SRS)?
  • How to use exact methods for estimating
    standard errors.
  • How to use approximation methods for estimating
    standard errors.

30
Why are Standard Errors Important?
  • Standard errors are produced for estimates from
    sample surveys. They are a measure of the
    variance in the estimates associated with the
    selected sample being one of many possible
    samples.
  • Standard errors are used to test hypotheses and
    to study group differences when making inferences
    to a population.
  • Using inaccurate standard errors can lead to
    identification of statistically significant
    results where none are present and vice versa.

31
Important Considerations
  • All weights on the ECLS-K data files sum to
    population totals and not sample totals.
  • The ECLS-K has a complex sample design and is not
    a simple random sample.

32
The ECLS-K Sample Design: Oversampling
  • The ECLS-K includes oversamples of private
    schools and private school children.
  • The ECLS-K also oversamples Asian and Pacific
    Islander children.

33
The ECLS-K Sample Design: Clustering
  • Sample children were clustered within primary
    sampling units (PSUs) to reduce field costs.
  • Children were in closer geographical proximity
    than would occur in a simple random sample.
  • Children in a clustered sample tend to be more
    alike than those in a simple random sample.

34
Complex Samples and Standard Errors
  • The usual standard error formula assumes a simple
    random sample.
  • Standard errors for estimates from a complex
    sample must account for the within cluster/across
    cluster variation.
  • Special software can make the adjustment, or this
    adjustment can be approximated using the design
    effect.

35
Options
  • Exact methods, such as the Taylor series and
    replication techniques.
  • Approximation method.

36
Exact Methods
  • Taylor series:
  • Extract PSU and strata IDs from the data file.
  • Software available: SUDAAN, Stata (using svy
    commands), and SAS (using the PROC SURVEY procedures).

37
Exact Methods: Replication Techniques
  • Extract replication weights (90 of them).
  • ECLS-K replication weights use jackknife 2 (JK2)
    methods.
  • Software: WesVar replication series (JK2), AM
    (JK2), and SAS-callable SUDAAN.
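
To show the idea behind replication, here is a minimal Python sketch of a JK2-style standard error for a weighted mean. It assumes the usual JK2 estimator, in which the variance is the sum of squared differences between each replicate estimate and the full-sample estimate. The data are a toy example with two replicate weights rather than the 90 on the ECLS-K files, and the function names are illustrative.

```python
def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def jk2_standard_error(values, full_weights, replicate_weights):
    """JK2-style SE: square root of the summed squared deviations of
    the replicate estimates from the full-sample estimate."""
    theta = weighted_mean(values, full_weights)
    variance = sum(
        (weighted_mean(values, rw) - theta) ** 2 for rw in replicate_weights
    )
    return variance ** 0.5

# Toy data: four cases in two variance strata of two units each.
values = [4, 2, 1, 5]
full_weights = [1, 1, 1, 1]
# In each replicate, one unit of a stratum is dropped (weight 0)
# and its partner's weight is doubled.
replicate_weights = [
    [2, 0, 1, 1],  # replicate for stratum 1
    [1, 1, 2, 0],  # replicate for stratum 2
]

print(round(jk2_standard_error(values, full_weights, replicate_weights), 3))
```

In practice the replicate weights are read from the data file and a package such as WesVar or AM performs this calculation.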

38
Approximation Method
  • Two stages:
  • First, normalize weights so standard error is
    based on actual sample size rather than
    population size.
  • Then, use design effect (DEFF) to account for
    complex sampling design.

39
1) Normalizing Weights
  • Weights on the ECLS-K sum to the population
    totals.
  • Calculate a new weight that sums to the sample
    size.
  • Normalized weight = (ECLS-K weight) × (sample
    n / population N).
  • SAS users do not need this step since estimates
    are produced based on the actual sample size.

40
Example: Normalizing Weights
  • Weight to be normalized: C2PW0
  • Sum of weights: 3,865,946
  • Total number of cases with a positive weight:
    18,950
  • Normalized weight = C2PW0 × (18,950 / 3,865,946)
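
The normalization step can be sketched in Python as follows. The two constants come from the slide; the function name and the toy weight list are hypothetical.

```python
# Figures from the slide for C2PW0.
SUM_OF_WEIGHTS = 3_865_946
N_POSITIVE = 18_950

# Multiplying every C2PW0 value by this factor makes the weights
# sum to the sample size instead of the population size.
c2pw0_factor = N_POSITIVE / SUM_OF_WEIGHTS  # about 0.0049

def normalize(weights):
    """Rescale weights so the positive weights sum to the number of
    cases with a positive weight."""
    positive = [w for w in weights if w > 0]
    factor = len(positive) / sum(positive)
    return [w * factor for w in weights]

# Toy weights (hypothetical): three positive cases and one zero weight.
normalized = normalize([100.0, 300.0, 0.0, 200.0])
print(normalized)       # [0.5, 1.5, 0.0, 1.0]
print(sum(normalized))  # 3.0
```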

41
2) Adjusting for Complex Design
  • The ECLS-K has a complex sample design; it is not
    a simple random sample.
  • Software packages designed for simple random
    samples tend to underestimate the standard errors
    for complex sample designs.
  • Special methods are required for complex designs.

42
Using Design Effects (DEFF)
  • What is a design effect (DEFF)?
  • It's the ratio of the variance found in the actual
    (complex) sample design to the variance expected
    in a simple random sample of the same sample size.

43
Using Design Effects (DEFF)
  • DEFT = the square root of DEFF (design standard
    error / simple random sample standard error).
  • Example for fall-kindergarten reading scores:
  • SE (SRS) = 0.063
  • SE (Design) = 0.156
  • DEFF = 0.156² / 0.063² ≈ 6.15
  • DEFT = 0.156 / 0.063 = square root of 6.15 ≈ 2.48
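
The DEFF and DEFT arithmetic can be checked with a few lines of Python. Using the two rounded standard errors shown on the slide gives a DEFF of about 6.13 rather than 6.15. The small gap presumably arises because the slide's figure was computed from unrounded standard errors, but that is an assumption.

```python
# Standard errors from the slide (rounded to three decimals).
se_srs = 0.063
se_design = 0.156

deft = se_design / se_srs  # ratio of standard errors
deff = deft ** 2           # ratio of variances

print(round(deft, 2))  # 2.48
print(round(deff, 2))  # 6.13
```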

44
3 Ways of Using the DEFF
  • Multiply the SRS (simple random sample) standard
    error produced by statistical software (when
    using normalized weights) by the square root of
    the DEFF (DEFT).
  • Or
  • Adjust the t-statistic by dividing it by the
    square root of the design effect (DEFT) or adjust
    the F-statistic by dividing it by the DEFF.
  • Or
  • Adjust the weight such that an adjusted standard
    error is produced.
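
The first two adjustments can be sketched in Python. The function names are illustrative. The DEFF of 6.15 and the SRS standard error of 0.063 are the fall-kindergarten reading figures quoted earlier in this presentation.

```python
import math

def adjust_se(se_srs, deff):
    """Inflate an SRS standard error by DEFT (the square root of DEFF)."""
    return se_srs * math.sqrt(deff)

def adjust_t(t_stat, deff):
    """Deflate a t-statistic by DEFT."""
    return t_stat / math.sqrt(deff)

def adjust_f(f_stat, deff):
    """Deflate an F-statistic by DEFF."""
    return f_stat / deff

deff = 6.15
print(round(adjust_se(0.063, deff), 3))  # 0.156
```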

45
Using a DEFF- Adjusted Weight
  • First, create a weight that sums to the
    sample size (the normalized weight).
  • Second, divide this normalized weight by the
    DEFF.
  • Third, use this weight for analyses. The
    standard errors produced will approximate the
    standard errors obtained using exact methods.
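
The following Python sketch shows why dividing a normalized weight by the DEFF inflates the standard error by DEFT. It assumes a simplified model in which the software treats the sum of the weights as the sample size, so real packages may differ in detail. The values, raw weights, and function names are hypothetical.

```python
import math

def normalized_weights(weights):
    """Rescale so the positive weights sum to the number of positive cases."""
    positive = [w for w in weights if w > 0]
    factor = len(positive) / sum(positive)
    return [w * factor for w in weights]

def deff_adjusted_weights(weights, deff):
    """Normalize, then divide each weight by the design effect."""
    return [w / deff for w in normalized_weights(weights)]

def naive_se_of_mean(values, weights):
    """SE of a weighted mean, treating the sum of weights as the sample size."""
    total = sum(weights)
    mean = sum(v * w for v, w in zip(values, weights)) / total
    variance = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return math.sqrt(variance / total)

values = [4, 2, 1, 5, 2]
raw_weights = [100.0, 200.0, 400.0, 100.0, 200.0]  # hypothetical
deff = 6.15

se_normalized = naive_se_of_mean(values, normalized_weights(raw_weights))
se_adjusted = naive_se_of_mean(values, deff_adjusted_weights(raw_weights, deff))

# The adjusted SE is larger by a factor of DEFT = sqrt(DEFF).
print(round(se_adjusted / se_normalized, 2))  # 2.48
```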

46
Where to find ECLS-K DEFFs
  • Training material: ECLS-K Specifications for
    Computing Standard Errors
  • ECLS-K user's manuals:
  • Base Year (Kindergarten): Table 4.12
  • First Grade: Tables 4.13 and 9.4
  • Third Grade: Tables 4.14 and 9.2
  • Fifth Grade: Tables 4.19 and 9.2

47
For SAS Users
  • SAS base procedures such as PROC REG, PROC FREQ,
    and PROC MEANS do account for the actual sample
    size but not for complex sampling.
  • SAS procedures such as PROC SURVEYMEANS and PROC
    SURVEYREG (and other procedures that begin with
    SURVEY) use the Taylor series method to account
    for complex sampling and provide exact estimates
    of the standard errors.

48
PROC SURVEYREG Example
  • Example using ECLS-K data, spring kindergarten
    and spring first grade variables.
  • proc surveyreg data=fscores;
  • model c4r3mscl = c2r3mscl lowkread t4learn;
  • cluster c24cpsu;
  • strata c24cstr;
  • weight c24cw0;
  • where lowkmath = 0;
  • run;

49
PROC SURVEYLOGISTIC Example
  • Example using ECLS-K data, spring kindergarten
    and spring first grade variables.
  • proc surveylogistic data=fscores;
  • model lowkread (desc) = c2r3mscl t4learn;
  • cluster c24cpsu;
  • strata c24cstr;
  • weight c24cw0;
  • where lowkmath = 0;
  • run;

50
PROC SURVEYFREQ Example
  • Example using ECLS-K data, spring kindergarten
    and spring first grade variables.
  • proc surveyfreq data=fscores;
  • tables lowkread c2r3mscl t4learn;
  • cluster c24cpsu;
  • strata c24cstr;
  • weight c24cw0;
  • run;

51
STATA Code for Complex Design
  • Logistic Regression Example, 3rd Grade Data
  • svyset C5CWPSU [pweight=C5CW0], strata(C5TCWSTR)
  • svy, subpop(male): logit highbmi white

52
STATA Code for Complex Design
  • Regression Example, 3rd Grade Data
  • svyset C5CWPSU [pweight=C5CW0], strata(C5TCWSTR)
  • svy, subpop(male): regress highbmi white

53
STATA Code for Complex Design
  • Means Example, 3rd Grade Data
  • svyset C5CWPSU [pweight=C5CW0], strata(C5TCWSTR)
  • svy, subpop(male): mean highbmi female

54
SPSS for Complex Sample Design
  • Use the SPSS add-on called SPSS Complex Samples.
  • Complex Samples Logistic Regression
    (CSLOGISTIC): Performs binary logistic regression
    analysis, as well as multinomial logistic regression
    (MLR) analysis, for samples drawn by complex
    sampling methods. The procedure estimates
    variances by taking into account the sample
    design used to select the sample, including equal
    probability and PPS methods, and WR and WOR
    sampling procedures. Optionally, CSLOGISTIC
    performs analyses for subpopulations.
  • Courtesy of SPSS

55
Regression Analysis
  • Use appropriate software such as AM, WESVAR,
    SUDAAN or SAS (SURVEYREG procedure).
  • For SAS (PROC REG procedure), use DEFF-adjusted
    weights.
  • For SPSS, use normalized, DEFF-adjusted weights.

56
Summary
All statistical tests should be based on standard
errors that are calculated to account for the
complex sample design of the ECLS-K.
  • Preferred: Use software that incorporates JK2
    replication methods, or
  • Use software that incorporates the Taylor series
    method, or
  • Last resort: Make approximate adjustments based
    on design effects.

57
ECLS-K Data Availability
  • Base Year (Kindergarten) through 5th Grade
    restricted-use and public-use datasets have been
    released.
  • The 8th Grade restricted-use dataset should be
    released in the winter of 2008, and the public-use
    datasets should be released in March 2009.

58
Differences in Restricted Use and Public Use
ECLS-K Datasets.
  • Here's a short explanation from NCES:
    http://nces.ed.gov/ecls/kinderfaq.asp?faq=1
  • Chapter 7 in the ECLS-K 5th Grade User's Guide
    has Tables 7-15 and 7-16, which describe the
    differences between the public and restricted
    datasets. The User's Guide can be found online
    at http://sodapop.pop.psu.edu/codebooks/ecls/k5userpart2.pdf