NHANES 1999-2004 - PowerPoint PPT Presentation

Slides: 51
Provided by: aphaC
Transcript and Presenter's Notes



1
NHANES 1999-2004 Analytic Strategies
Deanna Kruszon-Moran, MS
2
Analyzing Data from NHANES 1999-2004: Preparing your
data files
  • Downloading demographic, questionnaire, exam and
    lab files.
  • Files are no longer available as self-extracting
    zip files.
  • Documentation and procedure files are now in
    Adobe PDF format and can be viewed or accessed
    directly via the web link.
  • Clicking on the data link will allow you to store
    the data file or open it directly with SAS.
  • Data files are in SAS transport (.xpt) format.

3
Know your data
  • Read the documentation!!
  • Read the documentation!!
  • Read the documentation!!
  • Read the documentation!!

4
Preparing your data files
  • Merging
  • Merge all files to the demographic file by the
    respondent sequence number (SEQN).
  • Verify the number of records merged and the
    final sample size against the published
    frequencies on the web.
  • Make sure the counts are what you expected and
    that all merges worked correctly.
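The merge-and-verify step can be sketched in Python as below. This is an illustration, not the presenter's SAS code: it assumes each file has already been read into a list of records (dicts) keyed by SEQN (for example with SAS, or with `pandas.read_sas(path, format="xport")`), and the records in the usage lines are hypothetical, not NHANES values.

```python
def merge_on_seqn(demo, other, name):
    """Left-merge `other` onto the demographic records by SEQN.

    Returns the merged records and a dict of counts to compare against
    the published frequencies.
    """
    lookup = {}
    for rec in other:
        if rec["SEQN"] in lookup:
            raise ValueError(f"{name}: duplicate SEQN {rec['SEQN']}")
        lookup[rec["SEQN"]] = rec

    demo_ids = {d["SEQN"] for d in demo}
    merged = []
    matched = 0
    for d in demo:
        row = dict(d)                    # copy so the input is not modified
        extra = lookup.get(d["SEQN"])
        if extra is not None:
            matched += 1
            row.update(extra)
        merged.append(row)

    report = {
        "demographic_records": len(demo),
        f"{name}_records": len(other),
        "matched": matched,
        # Component records with no demographic record signal a merge problem.
        "not_in_demographic": len(set(lookup) - demo_ids),
    }
    return merged, report


# Hypothetical records for illustration only.
demo = [{"SEQN": 1, "RIAGENDR": 2}, {"SEQN": 2, "RIAGENDR": 1}, {"SEQN": 3, "RIAGENDR": 2}]
exam = [{"SEQN": 1, "BMXBMI": 24.0}, {"SEQN": 3, "BMXBMI": 31.2}]
merged, report = merge_on_seqn(demo, exam, "exam")
print(report)
```

The report is what you would check against the published counts; a nonzero `not_in_demographic` means the merge did not work as expected.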

5
Know your data
  • Run basic frequencies and cross tabulations.
  • Know your target population.
  • Understand how each item was measured
  • (how the item is defined, top-coded, recoded).
  • Recode variables as necessary
  • (e.g., age groups, positive/negative lab
    tests, high/low BP, high/low cholesterol).
  • Recode unknown/refused responses as missing data
  • (e.g., recode 77 and 99 to missing).
  • Check your coding: run frequencies in SAS.
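A minimal Python sketch of recoding refused/don't-know codes to missing and then checking the result with a frequency table (the slides do this with a SAS data step and frequencies). The default codes 77 and 99 follow the slide's example; other NHANES items use different codes, so treat the code set as an assumption to be checked against each variable's documentation.

```python
# Codes treated as missing in the slide's example: 77 (refused), 99 (don't know).
# Check each variable's codebook: other items use 7/9, 777/999, etc.
# Never apply this to continuous measures where 77 or 99 can be real values
# (e.g., age).
DEFAULT_MISSING_CODES = frozenset({77, 99})


def recode_missing(values, missing_codes=DEFAULT_MISSING_CODES):
    """Replace refused/don't-know codes with None (missing)."""
    return [None if v in missing_codes else v for v in values]


def frequencies(values):
    """Simple frequency table, analogous to checking codes with PROC FREQ."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return counts


raw = [1, 2, 77, 1, 99, 2, 1]       # hypothetical yes/no item (1 = yes, 2 = no)
clean = recode_missing(raw)
print(frequencies(raw))             # before recoding
print(frequencies(clean))           # after recoding: 77 and 99 counted as None
```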

6
Know your data
  • Continuous Outcome Data
  • Look for outliers in your measure.
  • Run Proc Univariate.
  • Look for outliers among the weights.
  • Use Proc Univariate on the weight variable.
  • Outlying values, especially those with large
    weights, can strongly influence your estimates.
  • Check for normality.
  • Consider transformations.
  • Log, square root, power.
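The slides do not define a cutoff for an "outlying" weight, so the Python sketch below only surfaces the largest weights and their share of the total weight for review, in the spirit of the extreme-observations listing in PROC UNIVARIATE, plus a log transform for a positive, right-skewed measure. The weights shown are hypothetical.

```python
import math


def largest_weights(weights, k=5):
    """Return the k largest weights and each one's share of the total weight."""
    total = sum(weights)
    top = sorted(weights, reverse=True)[:k]
    return [(w, w / total) for w in top]


def log_transform(values):
    """Natural-log transform for a strictly positive, right-skewed measure."""
    if any(v <= 0 for v in values):
        raise ValueError("log transform requires strictly positive values")
    return [math.log(v) for v in values]


weights = [1000, 1200, 900, 1100, 9000]   # hypothetical sample weights
for w, share in largest_weights(weights, k=2):
    print(f"weight {w}: {share:.1%} of total weight")
```

Here one person carries about two thirds of the total weight, which is exactly the situation the slide warns about.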

7
NHANES Sample Design
  • NHANES uses a complex, multistage,
    probability cluster design to sample the civilian,
    noninstitutionalized US population.

8
Sample Weights
  • To analyze NHANES data, you must use the sample
    weights to account for:

9
1. The base probability of selection
10
2. Oversampling
  • NHANES 1999-2004 oversampled:
  • African Americans
  • Mexican Americans
  • Persons with low income
  • Adolescents aged 12-19
  • Persons aged 60 and older

11
3. Non-response to the interview and exam
[Figure: Sample persons age 20]
12
Non-response issues for NHANES
  • Non-response
  • Most components have some level of individual
    item or component non-response.
  • ONLY non-response to the interview and exam has
    already been accounted for in the weights.
  • All additional non-response to the outcome
    measure of interest should be examined against
    all possible predictors.
  • Potential biases should be discussed.
  • If non-response is high, re-weighting should be
    considered.
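The slides do not say how re-weighting should be done. One common approach is a weighting-class adjustment, sketched below in Python under stated assumptions: the adjustment cells (for example, age group by race/ethnicity) and the response indicator are hypothetical choices you would make after examining non-response against the predictors, and this is not presented as the NCHS procedure.

```python
from collections import defaultdict


def weighting_class_adjust(records, cell_of, responded, weight_key="weight"):
    """Weighting-class non-response adjustment.

    Within each adjustment cell, each respondent's weight is multiplied by
        (sum of weights of everyone in the cell) / (sum of weights of respondents),
    so respondents carry the weight of non-respondents in the same cell.
    Non-respondents receive an adjusted weight of 0.
    """
    cell_total = defaultdict(float)
    cell_resp = defaultdict(float)
    for r in records:
        c = cell_of(r)
        cell_total[c] += r[weight_key]
        if responded(r):
            cell_resp[c] += r[weight_key]

    for c, total in cell_total.items():
        if total > 0 and cell_resp[c] == 0:
            raise ValueError(f"cell {c!r} has no respondents; collapse cells first")

    adjusted = []
    for r in records:
        c = cell_of(r)
        if responded(r) and cell_resp[c] > 0:
            factor = cell_total[c] / cell_resp[c]
        else:
            factor = 0.0
        adjusted.append({**r, "adj_weight": r[weight_key] * factor})
    return adjusted


# Hypothetical records: "has_lab" marks response to the outcome measure.
records = [
    {"id": 1, "cell": "A", "weight": 100.0, "has_lab": True},
    {"id": 2, "cell": "A", "weight": 100.0, "has_lab": False},
    {"id": 3, "cell": "A", "weight": 200.0, "has_lab": True},
    {"id": 4, "cell": "B", "weight": 50.0, "has_lab": True},
]
adj = weighting_class_adjust(records, cell_of=lambda r: r["cell"],
                             responded=lambda r: r["has_lab"])
```

The adjusted weights still sum to the original total within each cell; the adjustment only removes bias to the extent that response depends on the variables used to form the cells.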

13
Why weight?
Sample subdomain        US population (%)   Unweighted sample (%)   Weighted sample (%)
Non-Hispanic Blacks     13                  25                      12
Mexican Americans        9                  28                       9
12-19 year olds         12                  24                      12
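The table's point can be reproduced with a toy calculation in Python: an oversampled subgroup makes up a large share of the unweighted sample, but because its members carry small weights, its weighted share returns to its population share. The four persons and their weights are hypothetical.

```python
def unweighted_share(in_group):
    """Share of sample persons in the subgroup, ignoring weights."""
    return sum(1 for g in in_group if g) / len(in_group)


def weighted_share(in_group, weights):
    """Share of the weighted population in the subgroup."""
    return sum(w for g, w in zip(in_group, weights) if g) / sum(weights)


# Hypothetical: two oversampled persons with small weights, two others with large weights.
in_group = [True, True, False, False]
weights = [500, 500, 4000, 5000]
print(unweighted_share(in_group))
print(weighted_share(in_group, weights))
```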
14
Sample weights: Which weights?
Years of data                                      Household interview data ONLY   Any data from exam/lab/MEC interview
Any 2 years (1999-2000, 2001-2002, or 2003-2004)   WTINT2YR                        WTMEC2YR
4 years (1999-2002)                                WTINT4YR                        WTMEC4YR
4 or 6 years (2001-2004 or 1999-2004)              Combine the appropriate 2- or 4-year weights as follows (both columns)
15
Two, Four, Six, Eight - How can we estimate?
  • For 4 years of data from 2001-2004:
  • MEC4YR = 1/2 * WTMEC2YR;
  • For 6 years of data from 1999-2004:
  • if sddsrvyr=1 or sddsrvyr=2 then
  • MEC6YR = 2/3 * WTMEC4YR; /* for 1999-2002 */
  • if sddsrvyr=3 then
  • MEC6YR = 1/3 * WTMEC2YR; /* for 2003-2004 */
  • Whenever your analysis includes 1999-2002, do
    not combine the 2-year weights for those years;
    use the 4-year weights provided.
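The same weight combinations, written as Python functions rather than SAS data-step statements (a sketch; argument names mirror the SAS variables, and the usage values are hypothetical).

```python
def mec4yr_2001_2004(wtmec2yr):
    """4-year MEC weight for pooled 2001-2004 data: 1/2 of the 2-year weight."""
    return wtmec2yr / 2


def mec6yr_1999_2004(sddsrvyr, wtmec2yr, wtmec4yr):
    """6-year MEC weight for pooled 1999-2004 data.

    SDDSRVYR 1 and 2 (1999-2002) use 2/3 of the provided 4-year weight;
    SDDSRVYR 3 (2003-2004) uses 1/3 of the 2-year weight.
    """
    if sddsrvyr in (1, 2):
        return 2 * wtmec4yr / 3
    if sddsrvyr == 3:
        return wtmec2yr / 3
    raise ValueError(f"SDDSRVYR {sddsrvyr} is outside 1999-2004")


print(mec6yr_1999_2004(1, wtmec2yr=None, wtmec4yr=3000.0))   # 1999-2000 record
print(mec6yr_1999_2004(3, wtmec2yr=9000.0, wtmec4yr=None))   # 2003-2004 record
```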

16
Two, Four, Six, Eight - How can we estimate?
  • Future years of data will be combined similarly.
  • For 6 years of data from 2001-2006:
  • if sddsrvyr in (2,3,4) then
  • MEC6YR = 1/3 * WTMEC2YR;
  • For 8 years of data from 1999-2006:
  • if sddsrvyr=1 or sddsrvyr=2 then
  • MEC8YR = 1/2 * WTMEC4YR; /* for 1999-2002 */
  • if sddsrvyr=3 or sddsrvyr=4 then
  • MEC8YR = 1/4 * WTMEC2YR; /* for 2003-2006 */
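All of these rules follow one pattern: with N years pooled, each record's weight is scaled by (years its weight represents) / N. The Python sketch below generalizes the slide's examples under an assumption stated in the docstring (SDDSRVYR 4 = 2005-2006, and so on). It is an illustration of the arithmetic, not an NCHS-provided function.

```python
def pooled_mec_weight(sddsrvyr, wtmec2yr, wtmec4yr, first_cycle, last_cycle):
    """Pooled MEC weight for survey cycles first_cycle..last_cycle (SDDSRVYR codes).

    Assumes SDDSRVYR 1 = 1999-2000, 2 = 2001-2002, 3 = 2003-2004, 4 = 2005-2006,
    and so on, each cycle covering 2 years. When the pooled period includes both
    1999-2000 and 2001-2002, those records use the 4-year weight (4/N of it);
    every other record uses 2/N of its 2-year weight, N = total years pooled.
    """
    if not first_cycle <= sddsrvyr <= last_cycle:
        raise ValueError("record is outside the pooled cycles")
    n_years = 2 * (last_cycle - first_cycle + 1)
    uses_4yr = first_cycle == 1 and last_cycle >= 2 and sddsrvyr in (1, 2)
    if uses_4yr:
        return 4 * wtmec4yr / n_years
    return 2 * wtmec2yr / n_years


# 1999-2006 (8 years): 1/2 of WTMEC4YR for 1999-2002, 1/4 of WTMEC2YR afterwards.
print(pooled_mec_weight(1, None, 4000.0, first_cycle=1, last_cycle=4))
print(pooled_mec_weight(4, 8000.0, None, first_cycle=1, last_cycle=4))
```

Pooling only 1999-2002 returns WTMEC4YR unchanged, and a single 2-year cycle returns WTMEC2YR unchanged, which matches the weights table.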

17
Sample Weights - Subsamples
  • Subsamples and appropriate weights
  • Look at your primary variable of interest and the
    corresponding weight.
  • Look at all other variables you want to combine
    with it.
  • Are all from the interview? The exam? A subsample
    (e.g., fasting, audiometry, dioxin, VOCs)?
  • Use the weight from the smallest subsample for
    your analysis.
  • Be consistent!

18
Sample Weights - Subsamples
  • Subsamples and appropriate weights
  • Be careful about combining subsamples beyond
    pairings such as MEC and VOCs, or interview and
    dioxin.
  • Combining subsamples such as the environmental
    and AM fasting subsamples could be problematic.
  • Some subsamples are mutually exclusive.
  • Weights were not designed for combining
    subsamples and may not produce good estimates.

19
Preparing for Analyses
  • Subsetting the data for SUDAAN
  • If using MEC exam weights, SUBSET the data to
    those MEC EXAMINED in SAS before using SUDAAN.
  • If using other subsample weights, subset the
    data to those in the subsample corresponding to
    the weights you are using.
  • Then use the SUBPOPN statement in the SUDAAN
    procedure to further subset your data by age,
    gender etc. to reflect the target population you
    are interested in analyzing.

20
Sample Weights
  • Example
  • You are interested in examining the association
    of high triglycerides, blood pressure, and body
    mass index (BMI), controlling for race/ethnicity,
    among females age 20-59, using the 6 years of
    data from 1999-2004.

21
Sample Weights
  • Step 1: Determine the smallest sample population
    in the analysis; this determines the correct
    weight to use.
  • Race/ethnicity, gender and age come from the
    interview.
  • Blood pressure and weight come from the MEC exam,
    a subset of those interviewed.
  • Triglycerides were measured on a subsample of
    those MEC examined who fasted for 8 hours and
    came to the AM MEC exam.
  • Therefore, the fasting subsample is the smallest
    subsample in the analysis, and you would use the
    AM fasting weights (WTSAF2YR and WTSAF4YR).
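Step 1 can be expressed as a lookup in Python, assuming the samples are strictly nested (every fasting-subsample person was MEC examined, and every examinee was interviewed). The variable labels are hypothetical; the weight names are the 1999-2002 4-year weights named on the slides. Non-nested subsamples cannot be ranked this way, so the sketch refuses them.

```python
# Nested samples from largest to smallest, as in the slide's example.
NESTED_SAMPLES = ["interview", "mec", "am_fasting"]

# 4-year (1999-2002) weights, as named on the slides.
WEIGHTS_1999_2002 = {"interview": "WTINT4YR", "mec": "WTMEC4YR", "am_fasting": "WTSAF4YR"}


def weight_for(variable_sources):
    """Pick the weight of the smallest (most restrictive) sample among the variables used."""
    unknown = set(variable_sources.values()) - set(NESTED_SAMPLES)
    if unknown:
        # Non-nested subsamples cannot be ranked this way.
        raise ValueError(f"not part of the nested hierarchy: {sorted(unknown)}")
    smallest = max(variable_sources.values(), key=NESTED_SAMPLES.index)
    return WEIGHTS_1999_2002[smallest]


# Hypothetical labels for the example's variables and where each was collected.
sources = {"race_ethnicity": "interview", "age": "interview", "sex": "interview",
           "blood_pressure": "mec", "bmi": "mec", "triglycerides": "am_fasting"}
print(weight_for(sources))
```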

22
Sample Weights
  • Step 2: Combine weights in SAS, prior to the
    SUDAAN procedure, for the 6 years from 1999-2004.
  • if sddsrvyr in (1,2) then
  • WEIGHT6 = 2/3 * WTSAF4YR; /* 1999-2002 */
  • if sddsrvyr=3 then
  • WEIGHT6 = 1/3 * WTSAF2YR; /* 2003-2004 */

23
Sample Weights
  • Step 3: Subset your data set in SAS to reflect
    the weight being used (AM fasting weights
    WTSAF2YR or WTSAF4YR).
  • SAS code:
  • IF WTSAF2YR ne . or WTSAF4YR ne . ;

24
Sample Weights
  • Step 4: Finally, specify the correct weight with
    the WEIGHT statement in SUDAAN,
  • and obtain the subpopulation of interest (females
    age 20-59) with the SUBPOPN statement in SUDAAN.
  • WEIGHT WEIGHT6;
  • SUBPOPN riagendr=2 and ridageyr gt 19 and
    ridageyr lt 60;
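Steps 2 through 4 can be mirrored in Python to make the logic explicit. This is a sketch, not a replacement for SUDAAN: the records and the `trig` variable name are hypothetical, and only the point estimate is computed. The domain flag keeps out-of-domain persons in the file, which is what SUBPOPN does, so that a design-based variance would still see every PSU.

```python
def weight6(rec):
    """Step 2: 6-year AM fasting weight for pooled 1999-2004 data."""
    if rec["SDDSRVYR"] in (1, 2):
        return 2 * rec["WTSAF4YR"] / 3       # 1999-2002
    if rec["SDDSRVYR"] == 3:
        return rec["WTSAF2YR"] / 3           # 2003-2004
    raise ValueError("record is outside 1999-2004")


def prepare(records):
    """Steps 2-4: combine weights, keep the fasting subsample, flag the domain."""
    prepared = []
    for rec in records:
        # Step 3: keep only persons in the AM fasting subsample.
        if rec.get("WTSAF2YR") is None and rec.get("WTSAF4YR") is None:
            continue
        row = dict(rec)
        row["WEIGHT6"] = weight6(row)
        # Step 4: flag females aged 20-59 instead of deleting everyone else.
        row["IN_DOMAIN"] = row["RIAGENDR"] == 2 and 19 < row["RIDAGEYR"] < 60
        prepared.append(row)
    return prepared


def domain_mean(rows, var):
    """Weighted mean of `var` within the flagged domain (point estimate only)."""
    num = sum(r["WEIGHT6"] * r[var] for r in rows if r["IN_DOMAIN"])
    den = sum(r["WEIGHT6"] for r in rows if r["IN_DOMAIN"])
    return num / den


# Hypothetical records; "trig" stands in for a triglyceride value.
records = [
    {"SEQN": 1, "SDDSRVYR": 1, "WTSAF2YR": 5000.0, "WTSAF4YR": 3000.0, "RIAGENDR": 2, "RIDAGEYR": 35, "trig": 120.0},
    {"SEQN": 2, "SDDSRVYR": 3, "WTSAF2YR": 9000.0, "WTSAF4YR": None, "RIAGENDR": 2, "RIDAGEYR": 45, "trig": 180.0},
    {"SEQN": 3, "SDDSRVYR": 3, "WTSAF2YR": 6000.0, "WTSAF4YR": None, "RIAGENDR": 1, "RIDAGEYR": 40, "trig": 200.0},
    {"SEQN": 4, "SDDSRVYR": 2, "WTSAF2YR": None, "WTSAF4YR": None, "RIAGENDR": 2, "RIDAGEYR": 50, "trig": None},
]
rows = prepare(records)
print(domain_mean(rows, "trig"))
```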

25
NHANES 1999-2000 Variance Estimation
  • Why must you use the sample design to estimate
    the variance?
  • NHANES is a cluster design.
  • Individuals within a cluster are more similar to
    one another than to those in other clusters.
  • This homogeneity, or clustering, reduces the
    effective sample size, because individuals are
    chosen within clusters rather than at random
    throughout the population.
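How much clustering costs can be illustrated with Kish's well-known approximation for the design effect of cluster sampling, deff ≈ 1 + (m - 1)ρ, where m is the average number of sample persons per cluster and ρ the intracluster correlation. This formula is not from the slides; it assumes equal-sized clusters and ignores weighting and stratification, and the numbers below are hypothetical.

```python
def kish_deff(mean_cluster_size, icc):
    """Kish's approximate design effect for cluster sampling: 1 + (m - 1) * rho."""
    return 1 + (mean_cluster_size - 1) * icc


n = 5000                                   # hypothetical number of sample persons
deff = kish_deff(mean_cluster_size=20, icc=0.05)
print(deff)                                # about 1.95
print(n / deff)                            # effective sample size, about 2564
```

Even a modest intracluster correlation nearly halves the effective sample size when many persons are sampled per cluster.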

26
NHANES 1999-2004 Variance Estimation
  • Why must you use the sample design to estimate
    the variance?
  • Variance estimates that do not account for this
    intracluster correlation are too low (biased
    downward).
  • Survey software such as SUDAAN or the SAS survey
    procedures must be used to account for the
    complex design and produce unbiased variance
    estimates.
  • These procedures require information on the
    sample design (i.e., the stratum and PSU
    identifiers) for each sample person.

27
NHANES 1999-2000 Variance Estimation
  • For the initial 1999-2000 data release we
    recommended
  • Using JK-1/Jackknife/leave-one-out procedure.
  • Required 52 replicate weights, one for each of
    the 52 groups created. These were provided only
    for 1999-2000.
  • Can still be used if you have software that can
    produce the replicate weights.
  • Replicate weights for this procedure will no
    longer be created on the data set.
  • Too cumbersome.

28
NHANES 1999-2004 Variance Estimation
  • We now recommend
  • Using the Taylor series (linearization) method
  • Same as that used in NHANES III.
  • We now provide Masked Variance Units (MVUs) in
    place of primary sampling units (PSUs) to
    maintain confidentiality.
  • The design variables are SDMVSTRA and
    SDMVPSU.

29
Design Variables
  • SDMVSTRA and SDMVPSU
  • Found in the demographic file.
  • Found in every two-year data set and can be
    combined for 4- or 6-year data sets.
  • Can be used the same as the actual stratum and
    PSU variables.
  • Produce variance estimates close to those using
    the true design.
  • Data MUST be sorted by SDMVSTRA and SDMVPSU
    first, before using SUDAAN.
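To show what the Taylor series method does with SDMVSTRA and SDMVPSU, here is a Python sketch of the linearization variance of a weighted mean under the with-replacement approximation commonly used by survey software. It is an illustration, not SUDAAN's or SAS's code: it handles only a full-sample weighted mean (no domains, no finite population correction), and the data are hypothetical. Grouping by (stratum, PSU) pairs means this sketch does not need sorted data; the sorting requirement on the slide applies to SUDAAN.

```python
from collections import defaultdict


def taylor_weighted_mean(y, w, strata, psus):
    """Weighted mean and its Taylor-series (linearization) variance.

    The linearized values w_i * (y_i - mean) / sum(w) are totaled by PSU, and the
    variance is the sum over strata of n_h / (n_h - 1) * sum((z_hi - zbar_h)^2),
    where n_h is the number of PSUs in stratum h (with-replacement approximation).
    """
    total_w = sum(w)
    mean = sum(wi * yi for wi, yi in zip(w, y)) / total_w

    psu_totals = defaultdict(float)
    for yi, wi, h, p in zip(y, w, strata, psus):
        psu_totals[(h, p)] += wi * (yi - mean) / total_w

    by_stratum = defaultdict(list)
    for (h, _), z in psu_totals.items():
        by_stratum[h].append(z)

    variance = 0.0
    for h, zs in by_stratum.items():
        n_h = len(zs)
        if n_h < 2:
            raise ValueError(f"stratum {h} has fewer than 2 PSUs")
        zbar = sum(zs) / n_h
        variance += n_h / (n_h - 1) * sum((z - zbar) ** 2 for z in zs)
    return mean, variance


# Hypothetical data: two strata (SDMVSTRA) with two PSUs (SDMVPSU) each.
y = [1.0, 3.0, 2.0, 2.0]
w = [1.0, 1.0, 1.0, 1.0]
sdmvstra = [1, 1, 2, 2]
sdmvpsu = [1, 2, 1, 2]
mean, var = taylor_weighted_mean(y, w, sdmvstra, sdmvpsu)
print(mean, var ** 0.5)    # mean and standard error
```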

30
Sample SUDAAN Code
31
Preparing for Analysis: Setting up the procedure
in SAS Surveymeans
32
Other data analysis issues from NHANES
  • Calculating Population Totals
  • Estimates of the number of persons in the U.S.
    population with a particular condition must be
    done carefully.
  • The recommended procedure is:
  • First, estimate the proportion with the condition
    for each subdomain of interest.
  • Then multiply that proportion by the population
    control total for that subdomain.
  • Tables are available on the NCHS web site with
    the current March 2001 CPS control totals as part
    of the analytic guidelines.
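The recommended calculation is a product per subdomain followed by a sum, sketched in Python below. The subdomains, prevalences, and control totals are hypothetical placeholders, not the CPS control totals published with the analytic guidelines.

```python
def population_counts(prevalence, control_totals):
    """Estimated number of persons with the condition in each subdomain.

    prevalence: estimated proportion (0-1) with the condition, by subdomain.
    control_totals: population control total for each subdomain.
    """
    counts = {d: p * control_totals[d] for d, p in prevalence.items()}
    return counts, sum(counts.values())


# Hypothetical subdomains and numbers for illustration only.
prevalence = {"men 20-39": 0.10, "women 20-39": 0.20}
control_totals = {"men 20-39": 40_000_000, "women 20-39": 41_000_000}
counts, total = population_counts(prevalence, control_totals)
print(counts, total)
```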

33
Other data analysis issues from NHANES
  • Calculating Population Totals
  • Estimates of the number of persons with a
    condition can also be obtained by summing the
    weights of those who are positive.
  • These estimates will be less reliable because of
  • item non-response
  • and sampling error.
  • This is not the recommended method.

34
Analyzing within NHANES 1999-2004
  • Things to consider
  • Data released in two year cycles.
  • We STRONGLY RECOMMEND using two or more cycles (4
    or more years) to produce reliable estimates.
  • Verify data items collected were comparable in
    wording and methods.
  • When combining years remember to use correct
    combined weights.

35
Analyzing trends with NHANES: NHANES III to
NHANES 1999-2004
  • Things to consider
  • What is your sample size from each survey and
    age group?
  • How different were the question wording and the
    interview methods?
  • How different were the lab or exam methodologies?
    Cutoffs used? Definitions?
  • For the current NHANES 1999-2004, sample sizes may
    be smaller, depending on the number of years an
    item was measured, especially in subdomains.
  • Larger sampling variation.
  • May need to limit comparisons.

36
Race/Ethnicity NHANES 1999-2004
  • Two variables available
  • RIDRETH1
  • RIDRETH2

37
Race/Ethnicity NHANES 1999-2004
  • RIDRETH1: use for analyses of 1999-2004 data
    alone.
  • 1 = Mexican American
  • 2 = other Hispanic
  • 3 = non-Hispanic white
  • 4 = non-Hispanic black
  • 5 = other races, including multiracial
  • For 2 and 4 years of data we know there is
    insufficient sample size to analyze other
    Hispanics (group 2) alone or to analyze all
    Hispanics.
  • Analyses to evaluate whether 6 years of data
    (1999-2004) are sufficient to analyze these
    Hispanic groups are ongoing.
  • Groups 2 and 5 can AND should continue to be
    combined to represent all other races.
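Combining RIDRETH1 groups 2 and 5, as the slide recommends, is a simple recode. In the Python sketch below the codes come from the slide, while the category labels are hypothetical names chosen for illustration.

```python
# RIDRETH1 codes from the slide; groups 2 and 5 are combined into one category.
# The label text is a hypothetical choice.
OTHER = "Other race (incl. other Hispanic and multiracial)"
RIDRETH1_ANALYSIS_GROUPS = {
    1: "Mexican American",
    2: OTHER,
    3: "Non-Hispanic white",
    4: "Non-Hispanic black",
    5: OTHER,
}


def race_group(ridreth1):
    """Map a RIDRETH1 code to the four-category analysis variable."""
    if ridreth1 not in RIDRETH1_ANALYSIS_GROUPS:
        raise ValueError(f"unexpected RIDRETH1 code: {ridreth1!r}")
    return RIDRETH1_ANALYSIS_GROUPS[ridreth1]


print([race_group(code) for code in (1, 2, 3, 4, 5)])
```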

38
Race/Ethnicity NHANES 1999-2004
  • RIDRETH2
  • Use for analyzing trends from NHANES III to
    NHANES 1999-2004.
  • Most comparable to the race/ethnicity variable
    collected in NHANES III.
  • Coded as:
  • 1 = non-Hispanic white
  • 2 = non-Hispanic black
  • 3 = Mexican American
  • 4 = other, including multiracial
  • 5 = other Hispanic

39
Analyzing data from NHANES 1999-2004
  • Crude versus Age Standardized Estimates
  • Age distributions within survey samples vary by
    racial/ethnic group.
  • Age distributions also vary by survey (NHANES
    III vs. NHANES 1999-2004).
  • When comparing estimates across racial/ethnic
    groups or between surveys you may need to age
    standardize.
  • Also present all age specific estimates!

40
Analyzing data from NHANES 1999-2004
  • When Age Standardizing
  • Use the 2000 U.S. Census population for
    consistency, for both NHANES III and all NHANES
    cycles from 1999-2000 onward.
  • For guidelines and population proportions, see
    the Klein and Schoenborn HP2010 Statistical Notes
    on Age Adjustment Using the 2000 Projected U.S.
    Population at the website below.
  • http://www.cdc.gov/nchs/data/statnt/statnt20.pdf

41
Analyzing data from NHANES 1999-2004
  • When Age Standardizing
  • In SUDAAN, use the STDVAR and STDWGT statements.
  • STDVAR: the variable name for the age groups.
  • STDWGT: the corresponding proportions of the 2000
    U.S. Census population for those age subgroups.
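Direct age standardization, which STDVAR and STDWGT carry out in SUDAAN, weights each age-specific estimate by the standard population's proportion in that age group. The Python sketch below computes only the point estimate (SUDAAN also produces the standard error). The age groups, rates, and proportions are hypothetical; the real 2000 standard proportions are in the Klein and Schoenborn note.

```python
def age_standardize(rates, std_props):
    """Directly age-standardized estimate: sum of rate * standard proportion.

    rates: age-specific estimates (e.g., prevalence in percent) by age group.
    std_props: standard-population proportions for the same groups (sum to 1).
    """
    if set(rates) != set(std_props):
        raise ValueError("age groups must match")
    if abs(sum(std_props.values()) - 1) > 1e-6:
        raise ValueError("standard proportions must sum to 1")
    return sum(rates[g] * std_props[g] for g in std_props)


# Hypothetical age-specific prevalences (%) and hypothetical standard proportions.
rates = {"20-39": 2.0, "40-59": 4.0, "60+": 6.0}
std_props = {"20-39": 0.5, "40-59": 0.3, "60+": 0.2}
print(age_standardize(rates, std_props))   # about 3.4
```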

42
Age standardization for NHANES
  • Crude vs. Age Standardized Estimates Example

Hepatitis B, NHANES III, % (95% CI)   Non-Hispanic White   Non-Hispanic Black   Mexican American
Crude prevalence                      3.1 (2.6-3.6)        11.9 (10.6-13.2)     3.6 (2.8-4.6)
Age standardized                      2.6 (2.2-3.1)        11.9 (10.7-13.3)     4.4 (3.4-5.6)
43
Other data analysis issues from NHANES
  • Design Effect
  • Sample design effect (DEFF): the ratio of the
    variance estimated under the complex sample design
    to the variance under simple random sampling
  • DEFF = Var(CSD) / Var(SRS)
  • SUDAAN: DEFT2 option in PROC DESCRIPT
  • Design effects can be averaged.

44
Other data analysis issues from NHANES
  • Effective Sample sizes
  • Sample sizes should be adjusted by the sample
    design effect (DEFF).
  • Effective N = N / DEFF
  • Minimum sample size for reporting each individual
    estimate depends on the statistic being
    calculated, its relative size, stability of the
    SE estimate, degrees of freedom and other special
    circumstances.
  • Please refer to the Analytic Guidelines on our
    web site for more details.
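The design effect and effective sample size from these two slides, worked through in Python for a prevalence estimate. The SRS variance of a proportion, p(1 - p)/n, is the standard textbook formula; the sample size, prevalence, and design-based SE are hypothetical.

```python
def design_effect(var_complex, var_srs):
    """DEFF = variance under the complex design / variance under SRS."""
    return var_complex / var_srs


def srs_variance_of_proportion(p, n):
    """Variance of a proportion under simple random sampling: p(1 - p) / n."""
    return p * (1 - p) / n


def effective_sample_size(n, deff):
    """Effective N = N / DEFF."""
    return n / deff


# Hypothetical: 1,600 persons, estimated prevalence 0.20, design-based SE 0.015.
n, p, se_complex = 1600, 0.20, 0.015
deff = design_effect(se_complex ** 2, srs_variance_of_proportion(p, n))
print(deff)                              # about 2.25
print(effective_sample_size(n, deff))    # about 711
```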

45
Other data analysis issues from NHANES: Estimate
Stability
  • Relative Standard Errors
  • For estimates such as means/prevalences,
    calculate the relative standard error (RSE) as
    follows: RSE = (SE of the mean / mean) × 100
  • For prevalence estimates near 100% (i.e., > 90%),
    look at the RSE for the percent negative, not just
    the percent positive.
  • i.e., calculate the RSE for min(p, 1 - p)
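The RSE rule, including the min(p, 1 - p) check for high prevalences, in Python. The key fact is that SE(p) equals SE(1 - p), so only the denominator changes. The 95% prevalence and its SE are hypothetical.

```python
def rse(estimate, se):
    """Relative standard error in percent: (SE / estimate) * 100."""
    return se / estimate * 100


def rse_prevalence(p, se):
    """RSE for a prevalence p (proportion 0-1), using min(p, 1 - p).

    SE(p) equals SE(1 - p), so only the denominator changes.
    """
    return rse(min(p, 1 - p), se)


# Hypothetical: prevalence 95% with an SE of 1.5 percentage points.
print(rse(0.95, 0.015))             # about 1.6: looks very stable
print(rse_prevalence(0.95, 0.015))  # about 30: the 5% negative is far less stable
```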

46
Other data analysis issues from NHANES
  • Relative Standard Errors and Rare Events
  • RSEs lt20, estimates are most likely reportable.
  • RSEs gt30, consider whether the estimate
    provides useful information.
  • Estimates of 50 with SE of 15 and RSE of 30
    give a 95 CIs approximately 20-80. Is this
    really useful information?
  • Estimates of low prevalence (i.e. 5) with SE of
    1.5 also give RSE of 30 but the 95 CI is
    approximately 2-8. This may be very useful
    information.
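The two examples above can be checked in Python with the normal-approximation interval, estimate ± 1.96 × SE. The 1.96 multiplier is used here only to reproduce the slide's approximate figures; the DF discussion later in the presentation calls for a t value based on the design degrees of freedom.

```python
def normal_ci(estimate, se, z=1.96):
    """Approximate 95% CI using the standard normal: estimate +/- z * SE."""
    return estimate - z * se, estimate + z * se


for est, se in [(50.0, 15.0), (5.0, 1.5)]:
    lo, hi = normal_ci(est, se)
    print(f"estimate {est}%, RSE {se / est * 100:.0f}%, 95% CI {lo:.1f}-{hi:.1f}")
```

Both estimates have the same RSE, but the intervals differ greatly in how much they tell you.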

47
Other data analysis Issues from NHANES
  • Confidence limits for rare (> 90% or < 10%)
    events
  • Standard normal approaches for calculating 95%
    CIs may give lower bounds < 0 or upper bounds >
    100%.
  • Statistical literature describes alternative
    methods under these situations.
  • Evaluation of these various methods - see
    analytic guidelines on NCHS web site.
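One alternative from the statistical literature is the Wilson score interval, sketched below in Python using the effective sample size implied by the design-based SE, n_eff = p(1 - p)/SE². This is only one of several methods. It is not necessarily among those evaluated in the NCHS analytic guidelines, and the prevalence and SE are hypothetical.

```python
import math


def wilson_ci(p, se, z=1.96):
    """Wilson score interval for a proportion, using the effective sample size
    implied by the design-based SE: n_eff = p(1 - p) / SE^2.

    Unlike p +/- z*SE, the bounds always stay within 0-1.
    """
    if not 0 < p < 1 or se <= 0:
        raise ValueError("need 0 < p < 1 and se > 0")
    n = p * (1 - p) / se ** 2
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z / denom * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2))
    return center - half, center + half


# Hypothetical rare outcome: prevalence 2% with an SE of 1.5 percentage points.
p, se = 0.02, 0.015
print(p - 1.96 * se, p + 1.96 * se)   # normal approach: lower bound below 0
print(wilson_ci(p, se))               # Wilson: both bounds within 0-1
```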

48
Other data analysis Issues from NHANES
  • Degrees of freedom (DF) for t-statistics
  • Must calculate the DF to obtain a correct
    t-statistic for calculating confidence limits.
  • DF = number of clusters in the 2nd level of
    sampling (PSUs) minus number of clusters in the
    1st level of sampling (strata) in your subgroup
    of interest.
  • Same for both SAS and SUDAAN when all strata and
    PSUs are represented in your subgroup.
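The DF rule can be sketched in Python by counting the strata and PSUs actually represented among the subgroup's records. PSU codes such as SDMVPSU repeat across strata, so PSUs are counted as (stratum, PSU) pairs. The records are hypothetical.

```python
def design_df(strata, psus):
    """Degrees of freedom = (number of PSUs) - (number of strata) in the subgroup.

    strata, psus: design variables (e.g., SDMVSTRA, SDMVPSU) for the records in
    the subgroup. PSU codes repeat across strata, so PSUs are counted as
    (stratum, PSU) pairs.
    """
    n_psus = len(set(zip(strata, psus)))
    n_strata = len(set(strata))
    return n_psus - n_strata


# Hypothetical subgroup records: stratum 3 has only one PSU represented.
strata = [1, 1, 1, 2, 2, 3]
psus = [1, 2, 2, 1, 2, 1]
print(design_df(strata, psus))   # 2
```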

49
Other data analysis Issues from NHANES
  • Degrees of freedom (DF) for t-statistics
  • SAS and SUDAAN do not calculate DF the same way
    when your subgroup is NOT represented in all
    PSUs and strata.
  • SAS is currently working on correcting this.
  • In SUDAAN, to calculate DF you must output the
    numbers of strata and PSUs using the ATLEV1 and
    ATLEV2 options in your PROC DESCRIPT or PROC
    CROSSTAB.

50
Analyzing Data from NHANES 1999-2004
  • Analytic Guidelines
  • Detailed guidelines for working with NHANES data
    can be found at
  • http://www.cdc.gov/nchs/nhanes.htm
  • This document contains everything discussed today
    and will continue to grow to include guidelines
    for statistical tests, multivariate analyses,
    modeling and more!
  • A web-based tutorial is also currently available
    and is continuously being developed.