1
Using Weighted Data
  • Donald Miller
  • Population Research Institute
  • 812 Oswald Tower, 3-3155
  • miller_at_pop.psu.edu
  • December 2008

2
Review of David Johnson's Presentation
  • A population weight (pweight) is a variable that
    indicates how many people (in the population of
    interest) an observation counts for in a
    statistical procedure. This is different from a
    frequency weight (fweight), which indicates that a
    row of a dataset actually represents more than one
    observation.
  • Weights can be used to correct for design (over-
    and under-sampling) and for non-response bias.
  • Most software packages treat pweights properly (a
    notable exception is SPSS outside of its complex
    survey package).
  • To create a pweight, use either a raking-type
    algorithm or a logistic regression.
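To make the pweight / fweight distinction concrete, here is a minimal sketch in Python (used only for illustration; the toy data and all names are invented, not taken from the presentation). Both interpretations give the same weighted mean, but they disagree about how many observations the data contain, which is what drives the standard errors.

```python
# Hypothetical toy data: (outcome, weight) for four survey respondents.
rows = [(1.0, 100.0), (0.0, 300.0), (1.0, 50.0), (0.0, 50.0)]


def weighted_mean(rows):
    """Weighted mean: sum(w * y) / sum(w)."""
    total_w = sum(w for _, w in rows)
    return sum(y * w for y, w in rows) / total_w


unweighted = sum(y for y, _ in rows) / len(rows)
weighted = weighted_mean(rows)

# As a pweight, the data still hold only 4 observations.
n_as_pweight = len(rows)
# As an fweight, each row would stand for w identical observations.
n_as_fweight = int(sum(w for _, w in rows))

print(unweighted, weighted, n_as_pweight, n_as_fweight)
```

Treating a population weight as a frequency weight would therefore overstate the sample size and understate the standard errors.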

3
How to use Population Weights
  • SAS
  • Use the weight statement in procedures; this
    is a population weight
      proc logistic data=mydata descending;
        model finished = age cs_educ sex
              race_white a1b a2b a3b a4b a5b a6b;
        weight pwgt_variable;
      run;
  • Stata
  • Use the pweight option (you can abbreviate it to pw)
      regress y x1 x2 x3 [pweight=pwgt_variable]

4
Raking 1: Select Census Data
  • Choose a census dataset (CPS, ACS, etc.), and
    which variables you will use in your raking
    model. These are usually demographic variables
    (age, race, education, gender).
  • You will need to recode your survey variables
    and/or the census variables so the response
    categories match. This might require grouping
    some values together.
  • Match the year of the survey with the census
    data. If you have 2006 survey data, use the 2006
    census data.
  • Match the physical area as closely as possible.
    For example, the ACS identifies geography by PUMA
    codes (Public Use Microdata Areas, basically
    county-level data). Select only the PUMA codes
    of the area of interest.
  • Run some simple descriptives / frequencies to
    compare the survey to the census. Remember that
    the ACS already has a person weight (PWGTP).

5
Raking 2: Frequencies (Census data)
  • Construct 1-way frequency counts for every
    variable in the raking model. You need a dataset
    for each variable, with mrgtotal being the
    counts. SAS code example (do this for
    gender, race, etc.):
      proc freq data=acs.acs_myarea_recoded;
        table cs_educ / list missing out=cs_educ;
        weight PWGTP;
      run;
      data cs_educ;
        set cs_educ;
        rename COUNT=mrgtotal;
      run;
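The weighted 1-way frequency itself is just a sum of weights within each category. A minimal Python sketch of that idea follows; the records, category labels, and function name are hypothetical, and only cs_educ, PWGTP and mrgtotal come from the slide.

```python
from collections import defaultdict

# Hypothetical recoded census records: (cs_educ category, PWGTP weight).
acs_rows = [("hs", 20.0), ("college", 35.0), ("hs", 15.0), ("lt_hs", 10.0)]


def weighted_counts(rows):
    """Sum the weights within each category (a weighted 1-way frequency)."""
    totals = defaultdict(float)
    for category, weight in rows:
        totals[category] += weight
    return dict(totals)


# One table per raking variable; the values play the role of mrgtotal.
cs_educ_totals = weighted_counts(acs_rows)
print(cs_educ_totals)
```

The same function would be applied once per raking variable (gender, race, and so on) to produce the set of marginal totals.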

6
Raking 3: Raking Macro (SAS)
  • Izrael et al. have provided a SAS macro (RAKINGE)
    to do the main raking procedure. This is
    introduced in Paper 258-25, from SAS SUGI 25.
    This is available online from SAS at
    http://www2.sas.com/proceedings/sugi25/25/st/25p258.pdf
  • Various improvements were made to the macro and
    introduced in Paper 207-29, from SAS SUGI 29.
    This is available online from SAS at
    http://www2.sas.com/proceedings/sugi29/207-29.pdf
  • I uploaded the (corrected version of the) RAKINGE
    macro here:
    http://help.pop.psu.edu/help-by-statistical-method/weighting

7
Raking 4: Raking Macro (SAS)
  • You will need to save this macro, edit it
    slightly, and run it. You will never touch the
    vast majority of the code. Towards the top of
    the program you will need to change these lines:
      %macro rakinge (inds=INPUTDATASETNAME,
                      outds=OUTPUTDATASETNAME,
                      ...
                      outwt=NEW_PWEIGHT_VARIABLE_NAME,
                      ...
                      varlist=LIST OF VARIABLES IN RAKING MODEL,
                      numvar=4,
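For readers who want to see what a raking procedure does internally, the following is a minimal Python sketch of iterative proportional fitting. It is not the RAKINGE macro's code: the function names, the stopping rule, and the toy survey and census totals are all assumptions made for illustration.

```python
def weighted_totals(rows, weights, var):
    """Sum the current weights within each category of one variable."""
    totals = {}
    for row, w in zip(rows, weights):
        totals[row[var]] = totals.get(row[var], 0.0) + w
    return totals


def rake(rows, margins, start_weights, tol=1e-6, max_iter=50):
    """Adjust weights until every weighted margin matches its target.

    rows          : list of dicts holding each respondent's categories
    margins       : {variable: {category: population total}}
    start_weights : initial weight per respondent (e.g. all 1.0)
    tol, max_iter : hypothetical stopping rule for this sketch
    """
    weights = [float(w) for w in start_weights]
    for _ in range(max_iter):
        # One pass: rescale to each variable's targets in turn.
        for var, targets in margins.items():
            current = weighted_totals(rows, weights, var)
            weights = [w * targets[row[var]] / current[row[var]]
                       for row, w in zip(rows, weights)]
        # Largest remaining miss across all categories of all variables.
        gap = max(
            abs(weighted_totals(rows, weights, var).get(cat, 0.0) - total)
            for var, targets in margins.items()
            for cat, total in targets.items()
        )
        if gap < tol:
            return weights
    raise RuntimeError("raking did not converge; consider collapsing categories")


# Hypothetical survey of five respondents and invented census totals.
survey = [
    {"sex": "F", "educ": "hs"},
    {"sex": "F", "educ": "hs"},
    {"sex": "F", "educ": "college"},
    {"sex": "M", "educ": "hs"},
    {"sex": "M", "educ": "college"},
]
census = {
    "sex": {"F": 520.0, "M": 480.0},
    "educ": {"hs": 600.0, "college": 400.0},
}
raked = rake(survey, census, [1.0] * len(survey))
print([round(w, 1) for w in raked])
```

Note that the census totals for each variable must add up to the same population size, and a census category with no survey respondents can never be matched, which is one reason a raking run may fail to converge.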

8
Normalized Weight
  • If the raking macro does not converge, look at
    the frequencies (for census and survey) again.
    You may need to collapse some categories, or
    change the convergence criterion in the raking
    macro (you can control this with the TRMPCT and
    NUMITER options in the macro).
  • You may wish to normalize the weight, so that the
    sum of the weights for the dataset equals a
    predetermined number N (either the sample size or
    the area's total population). To do this, calculate
      SW = the sum of the weights
    then multiply each weight value by N/SW.
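The normalization step can be sketched in a few lines of Python. The weights below are made up, and the function name is hypothetical; only the N/SW rescaling comes from the slide.

```python
def normalize(weights, target_total):
    """Rescale weights so that they sum to target_total (N)."""
    sw = sum(weights)  # SW, the sum of the weights
    return [w * target_total / sw for w in weights]


raked = [312.0, 208.0, 288.0, 192.0]    # hypothetical raked weights
scaled = normalize(raked, len(raked))   # here N is the sample size
print(scaled, sum(scaled))
```

Because every weight is multiplied by the same constant, the relative weights (and therefore weighted means and proportions) are unchanged; only the total changes.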

9
Non-Response Bias 1
  • The probability of survey completion may differ
    among people with different characteristics
    (demographics, chronic conditions, etc.). To
    address this non-response bias, estimate a
    logistic regression model such as the following:
      FINISH = β0 + β1·AGE + β2·EDUCATION
               + β3·FEMALE + β4·WHITE + β5·OTHER + e
  • where FINISH is 1 if they finished the survey (0
    otherwise). The next four variables (AGE,
    EDUCATION, FEMALE, WHITE) are from the raking
    model. The last one (OTHER; there can be more
    than one of these) stands for other variables
    that might explain non-response bias.
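For reference, in a logistic regression the linear combination of predictors is the log-odds of completion, so the fitted model yields a predicted probability of finishing for each respondent. Using the slide's variable names (the symbol p-hat for the predicted probability is added notation, not the author's), this can be written as:

```latex
\hat{p} = \Pr(\text{FINISH}=1)
  = \frac{1}{1+\exp\!\bigl(-(\beta_0 + \beta_1\,\text{AGE} + \beta_2\,\text{EDUCATION}
      + \beta_3\,\text{FEMALE} + \beta_4\,\text{WHITE} + \beta_5\,\text{OTHER})\bigr)}
```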

10
Non-Response Bias 2
  • For each respondent, the non-response bias weight
    is the reciprocal of the predicted probability of
    survey completion. It is treated as a weight,
    and should be multiplied by the raking weight
    to create a total weight.
  • Sample SAS code (continued on next slide):
      proc logistic data=area_weighted descending;
        class finish;
        model finish = age cs_educ sex race_white a1b;
        output out=logitresults p=p;
        weight pwgt;
      run; /* check output to see if significant */

11
Non-Response Bias 3
  • (SAS code continued)
      proc sort data=logitresults;
        by ID;
      run;
      /* merge in pred. prob.; area_weighted must also be sorted by ID */
      data area_weighted2;
        merge area_weighted logitresults;
        by ID;
      run;
      data area_weighted2; /* calculate non-resp wgt */
        set area_weighted2;
        nonresp_wgt = 1/p;
        total_wgt = pwgt*nonresp_wgt;
      run;
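The arithmetic of the last data step can be sketched in Python as follows. The records and the function name are hypothetical; the field names ID, pwgt, p, nonresp_wgt and total_wgt mirror the SAS code, and the predicted probabilities are invented rather than output from a fitted model.

```python
# Hypothetical merged records: ID, raking weight (pwgt), and the
# predicted probability of completion (p) from the logistic model.
records = [
    {"ID": 1, "pwgt": 312.0, "p": 0.80},
    {"ID": 2, "pwgt": 208.0, "p": 0.50},
    {"ID": 3, "pwgt": 288.0, "p": 0.25},
]


def add_total_weight(record):
    """Return a copy with nonresp_wgt = 1/p and total_wgt = pwgt * nonresp_wgt."""
    if not 0.0 < record["p"] <= 1.0:
        raise ValueError("predicted probability must be in (0, 1]")
    out = dict(record)
    out["nonresp_wgt"] = 1.0 / record["p"]
    out["total_wgt"] = record["pwgt"] * out["nonresp_wgt"]
    return out


weighted = [add_total_weight(r) for r in records]
for r in weighted:
    print(r["ID"], r["nonresp_wgt"], r["total_wgt"])
```

Respondents with a low predicted probability of finishing receive a large non-response weight, so a few very small probabilities can dominate the total weight; it is worth inspecting the distribution of p before using the result.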

12
Stata / R
  • I personally haven't tried either of these yet,
    but raking packages exist for Stata and R.
  • Stata
  • survwgt - you can get and install this using
    findit
      survwgt rake pw, by(varlist_raking_model) totvars(varlist_totals) generate(pwgt) replace
  • R
  • Rake package
      sraked_data <- simpleRake(unraked_data,
          pop_totals, rake_var1, rakevar2, ...,
          TRUE)