Power analysis and hypothesis testing with multiple samples - PowerPoint PPT Presentation

Provided by: brucek64

1
Power analysis and hypothesis testing with
multiple samples
  • ESM 206
  • 6 April 2006

2
Types of error
  • Type II: fail to reject null hypothesis when it's
    really false
  • Desired level: β
  • Is associated with a given effect size
  • E.g., want a probability of 0.1 of failing to
    reject when true difference between means is 0.35.
  • Type I: reject null hypothesis when it's really
    true
  • Desired level: α

3
Setting error levels
  • α is controlled by setting critical P-value for
    rejecting null hypothesis
  • β is decreased by:
  • Increasing α
  • Increasing sample size (n)
  • Decreasing sample variance, var(x)
  • Increasing effect size, Δ
  • Tradeoff between α and β
  • Need to balance costs associated with type I and
    type II errors
  • Power is 1 − β
  • POWER ANALYSIS
  • Take a few samples to get an estimate of var(x)
  • Assume that population has mean μ0 + Δ and
    variance var(x)
  • If I take n samples, what is the probability of
    failing to reject the null hypothesis (getting
    P > α)?
  • Either through simulation or theory
  • Adjust n to get the desired error level
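
The power-analysis recipe above can be sketched in a few lines of Python. This is a minimal illustration, not the course's own code: it assumes normally distributed data, a one-sided test of H0: μ = μ0 against HA: μ > μ0, and it treats the preliminary estimate of the standard deviation as known (a z-test standing in for the t-test). The function names and the value sd = 1.0 are hypothetical; the effect size 0.35 and β = 0.1 echo the example on the "Types of error" slide.

```python
import random
from statistics import NormalDist, mean


def simulated_power(mu0, delta, sd, n, alpha=0.05, n_sims=2000, seed=1):
    """Estimate power by simulation for a one-sided z-test of H0: mu = mu0.

    Assumption: data are normal and sd is treated as known.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)  # reject when z > z_crit (P < alpha)
    rejections = 0
    for _ in range(n_sims):
        # Draw n observations from a population with mean mu0 + delta
        sample = [rng.gauss(mu0 + delta, sd) for _ in range(n)]
        z = (mean(sample) - mu0) / (sd / n ** 0.5)
        if z > z_crit:
            rejections += 1
    return rejections / n_sims  # fraction of rejections = estimated power


def theoretical_power(delta, sd, n, alpha=0.05):
    """Power of the same one-sided z-test, from the normal distribution."""
    z_crit = NormalDist().inv_cdf(1 - alpha)
    return 1 - NormalDist().cdf(z_crit - delta * n ** 0.5 / sd)


def smallest_n(delta, sd, alpha, beta, n_max=10_000):
    """Adjust n upward until the type II error rate is at most beta."""
    for n in range(2, n_max + 1):
        if theoretical_power(delta, sd, n, alpha) >= 1 - beta:
            return n
    return None


# Hypothetical inputs: effect size 0.35, sd 1.0
for n in (10, 30, 70):
    print(n, round(theoretical_power(0.35, 1.0, n), 3),
          simulated_power(0.0, 0.35, 1.0, n))
print(smallest_n(0.35, 1.0, alpha=0.05, beta=0.1))
```

Simulation and theory should agree to within sampling noise; the simulation route is the one that generalizes when the data are not normal.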

4
Power and the water temperature test
  • 3
  • Testing
  • H0 m 56

5
Effect of sample size on power
6
Statistics and Decision Making: EPA's Data Quality
Objectives (DQOs)
  • What are DQOs? DQOs are qualitative and
    quantitative statements, developed using the DQO
    Process, that clarify study objectives, define
    the appropriate type of data, and specify
    tolerable levels of potential decision errors
    that will be used as the basis for establishing
    the quality and quantity of data needed to
    support decisions. DQOs define the performance
    criteria that limit the probabilities of making
    decision errors by considering the purpose of
    collecting the data, defining the appropriate
    type of data needed, and specifying tolerable
    probabilities of making decision errors.
  • See link on class website

7
The DQO process
  • State the Problem
  • Define the problem; identify the planning team;
    examine budget, schedule.
  • Identify the Decision
  • State decision; identify study question; define
    alternative actions.
  • Identify the Inputs to the Decision
  • Identify information needed for the decision
    (information sources, basis for Action Level,
    sampling/analysis method).
  • Define the Boundaries of the Study
  • Specify sample characteristics; define
    spatial/temporal limits, units of decision making.
  • Develop a Decision Rule
  • Define statistical parameter (mean, median);
    specify Action Level; develop logic for action.
  • Specify Tolerable Limits on Decision Errors
  • Set acceptable limits for decision errors
    relative to consequences (health effects, costs).
  • Optimize the Design for Obtaining Data
  • Select resource-effective sampling and analysis
    plan that meets the performance criteria.

8
Preliminary assessment of household lead dust
  • State the Problem
  • Describing the problem. The owners wish to
    evaluate the potential hazards associated with
    lead in dust in a single-family residence because
    other residences in the Athington Park House
    neighborhood had shown levels of lead in dust
    that might pose potential hazards.
  • Establishing the planning team. The planning team
    included the property owners, a certified risk
    assessor (to collect and handle dust samples and
    serve as a liaison with the laboratory), and a
    quality assurance specialist. The decision makers
    were the property owners.
  • Describing the conceptual model of the potential
    hazard. The conceptual model described a
    single-family residence in a neighborhood where
    hazardous levels of lead had been detected in
    other residences. Interior sources of lead in
    dust were identified as lead-based paint on
    doors, walls, and trim, which deteriorated to
    form, or attach to, dust particles. Exterior
    sources included lead in exterior painted
    surfaces that had deteriorated and leached into
    the dripline soil, or lead deposited from
    gasoline combustion fumes that accumulated in
    soil. In these cases, soil could be tracked into
    the house, and collected as dust on floors,
    window sills, toys, etc. As this dust could be
    easily ingested through hand-to-mouth activities,
    dust was considered to be a significant exposure
    route. Levels of lead in floor dust were to be
    used as an indicator of the potential hazard.
  • Identifying the general intended use of collected
    data. The data collected in this study will be
    used to determine if a health hazard is present at
    Athington Park House using the criteria
    established under 40 CFR 745. This is a decision
    making (test of hypothesis) DQO Process.
  • Identifying available resources, constraints, and
    deadlines. The property owners were willing to
    commit up to $1,000 for the study. To minimize
    inconvenience to the family, all sampling would
    be conducted during one calendar day.
  • Identify the Decision
  • Specifying the primary study question. The
    primary question to be addressed was whether
    there were significant levels of lead in floor
    dust at the House.
  • Determining the range of possible outcomes from
    this study. If there were significant levels of
    lead in floor dust at the residence, the team
    planned follow-up testing to determine whether
    immediately dangerous contamination existed and
    the location of the contamination in the
    property. If not, then there was no potential
    lead hazard, and testing would be discontinued.

9
Preliminary assessment of household lead dust
  • Identify the Inputs to the Decision
  • Identifying the types of information that are
    needed to resolve the decision statement. The
    assessment of a dust lead hazard would be
    evaluated by measuring dust lead loadings by
    individual dust wipe sampling according to
    established protocol.
  • Identifying the source of information. The EPA
    proposed standard stated that if dust lead levels
    were above 50 µg/ft² on bare floors, a lead
    health hazard was possible and follow-up testing
    and/or intervention should be undertaken (40 CFR
    745).
  • Identifying how the Action Level will be
    determined. The Action Level is the EPA standard
    specified in 40 CFR 745.
  • Identifying appropriate sampling and analysis
    methods. Wipe samples were collected according to
    ASTM standard practice E1728. These samples were
    digested in accordance with ASTM standard
    practice E1644 and the sample extracts were
    chemically analyzed by ASTM standard test method
    E1613. The results of these analyses provided
    information on lead loading (i.e., µg of lead per
    square foot of wipe area) for each dust sample.
    The detection limit was well below the Action
    Level.
  • Define the Boundaries of the Study
  • Specifying the spatial and temporal boundaries
    for collecting data. The spatial boundaries of
    the study area were defined as all floor areas
    within the dwelling that were reasonably
    accessible to young children who lived at, or
    visited, the property. Dust contained in each
    one-ft² area of each floor of the residence was
    sampled and sent to a laboratory for analysis.
  • Specifying other practical constraints for
    collecting data. Permission from the residents of
    Athington Park House was required before risk
    assessors could enter the residence to collect
    dust wipe samples. Sampling was completed within
    1 calendar day to minimize the inconvenience to
    the residents.
  • Specifying the scale of estimates to be made. The
    test results were considered to appropriately
    characterize the current and future hazards. It
    was possible that lead contained in soil could be
    tracked into the residence and collect on
    surfaces, but no significant airborne sources of
    lead deposition were known in the region. The
    dust was not expected to be transported away from
    the property; therefore, provided the exterior
    paint was maintained in intact condition, lead
    concentrations measured in the dust were not
    expected to change significantly over time.
  • Specifying the scale of inference for decision
    making. The decision unit was the interior floor
    surface (approximately 1,700 ft²) of the
    residence at the time of sampling and in the near
    future.

10
Preliminary assessment of household lead dust
  • Develop a Decision Rule
  • Specifying the Action Level. This was given in 40
    CFR 745, which specified 50 µg/ft².
  • Developing the population of interest and the
    theoretical decision rule. From 40 CFR 745, the
    median was selected as the appropriate parameter
    to characterize the population under study. The
    median dust lead loading was defined to be that
    level, measured in µg/ft², above and below which
    50% of all possible dust lead loadings at the
    property were expected to fall. If the true
    median dust loading in the residence was greater
    than 50 µg/ft², then the planning team required
    follow-up testing. Otherwise, they decided that a
    dust lead hazard was not present and discontinued
    testing.

11
Preliminary assessment of household lead dust
  • Specify Tolerable Limits on Decision Errors
  • Setting the baseline condition. The baseline
    condition adopted by the property owners was that
    the true median dust lead loading was above the
    EPA hazard level of 50 µg/ft², due to the
    seriousness of the potential hazard. The planning
    team decided that the most serious decision error
    would be to decide that the true median dust lead
    loading was below the EPA hazard level of 50
    µg/ft², when in truth the median dust lead
    loading was above the hazard level. This
    incorrect decision would result in significant
    exposure to dust lead and adverse health effects.
  • Determining the impact of decision errors and
    setting tolerable decision error limits. The edge
    of the gray region was designated by considering
    that a false acceptance decision error would
    result in the unnecessary expenditure of scarce
    resources for follow-up testing and/or
    intervention associated with a presumed hazard
    that did not exist. The planning team decided
    that this decision error should be adequately
    controlled for true dust lead loadings of 40
    µg/ft² and below. Since human exposure to lead
    dust hazards causes serious health effects, the
    planning team decided to limit the false
    rejection error rate to 5%. This meant that if
    this dwelling's true median dust lead loading was
    greater than 50 µg/ft², the baseline condition
    would be correctly accepted 19 out of 20 times.
    The false acceptance decision, which would result
    in unnecessary use of testing and intervention
    resources, was allowed to occur more frequently
    (i.e., 20% of the time when the true dust-lead
    loading is 40 µg/ft² or less).

12
(No Transcript)
13
EXAMPLE: LEAD DUST
  • Preliminary sampling suggests that the standard
    deviation of lead dust observations is 30 µg/ft²
  • We want to know how many observations we need to
    take so that β is 0.2 if the true mean dust
    concentration is 10 µg/ft² below the
    contamination threshold and we are using an α of
    0.05

For the t-test, α for a 1-sided test is equivalent
to 2α for a 2-sided test.
[Figure: axis label 1 − β (power)]
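
The sample-size question on this slide can be approximated with the standard normal-theory formula n = ((z_{1−α} + z_{1−β}) · σ / Δ)². The sketch below is an illustration under that assumption, not the calculation shown on the slide, which refers to a t-test; a t-based calculation gives a slightly larger n. The function name is hypothetical; the inputs (σ = 30, Δ = 10, α = 0.05, β = 0.2) are the slide's.

```python
from math import ceil
from statistics import NormalDist


def sample_size(sd, delta, alpha=0.05, beta=0.2):
    """Normal-approximation sample size for a one-sided, one-sample test.

    Assumption: sd is treated as known, so z quantiles replace t quantiles.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha)  # critical value for the type I error rate
    z_beta = z.inv_cdf(1 - beta)    # quantile matching the desired power
    return ceil(((z_alpha + z_beta) * sd / delta) ** 2)


# Slide values: sd = 30 µg/ft², true mean 10 µg/ft² below the threshold
n_raw = sample_size(sd=30, delta=10, alpha=0.05, beta=0.2)
print(n_raw)  # → 56
```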
14
Interlude: the lognormal distribution
15
LEAD DUST DONE RIGHT
Effect size is log(50) − log(40) = 0.22
Std. dev. of log(lead) est. as 1.5
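
The numbers on this slide can be checked with a short calculation on the log scale. This is a hedged sketch: it assumes natural logarithms, a one-sided test, the normal-approximation formula n = ((z_{1−α} + z_{1−β}) · σ / Δ)², and that α = 0.05 and β = 0.2 carry over from the lead dust example. The variable names are hypothetical.

```python
from math import ceil, log
from statistics import NormalDist

# Effect size on the log scale: threshold 50 vs. true level 40 (µg/ft²)
effect = log(50) - log(40)
sd_log = 1.5  # estimated std. dev. of log(lead), from the slide

# Assumed error rates (carried over from the lead dust example)
alpha, beta = 0.05, 0.2

z = NormalDist()
n_log = ceil(((z.inv_cdf(1 - alpha) + z.inv_cdf(1 - beta)) * sd_log / effect) ** 2)

print(round(effect, 2))  # → 0.22
print(n_log)             # → 280
```

Under these assumptions, the required sample size on the log scale is much larger than a calculation that treats the raw measurements as normal would suggest.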
16
Cleanup of a contaminated site
  • THE PROBLEM
  • A site has suffered the release of a toxic
    chemical (TcCB) into the soil, and the company
    responsible has undertaken cleanup activities.
  • How should we decide whether the cleanup has been
    adequate?
  • THE DATA
  • We have samples of TcCB concentration (measured
    in ppb) in the soils at the cleanup site, as well
    as samples of concentrations at an uncontaminated
    reference site with similar soil
    characteristics.
  • The concentrations of TcCB at the reference site
    are not zero, and we need to determine what the
    normal levels of this chemical are.

17
EPA standards for assessing site contamination
  • If a site has not been declared to be
    contaminated, then the null hypothesis should be
    that it is clean, i.e., there is no difference
    from the control site. The alternative
    hypothesis is that the site is contaminated. A
    non-significant test result leads to the
    conclusion that there is no real evidence that
    the site is contaminated.
  • If a site has been declared to be contaminated,
    then the null hypothesis should be that this is
    true, i.e., there is a difference (in an
    unacceptable direction) from the control site.
    The alternative hypothesis is that the site is
    clean. A non-significant test result leads to
    the conclusion that there is no real evidence
    that the site has been cleaned up.

USEPA (1989). Methods for Evaluating the
Attainment of Cleanup Standards. Vol. 1: Soils
and Solid Media. EPA Report 230/02-89-042,
Office of Policy, Planning and Evaluation,
Washington, DC.
18
COMPARING TWO GROUPS
  • Two-sample t-test
  • Tests for differences between means of two groups
  • Null hypotheses
  • Under null hypothesis, difference in means,
    standardized by standard deviations of both
    groups, should follow a t distribution
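
As a rough illustration of the standardized difference described above, the sketch below computes the t statistic and approximate degrees of freedom for the unequal-variance (Welch) form of the two-sample t-test. The function name and the data are hypothetical and are not the TcCB measurements; turning t and df into a P-value requires the t distribution, which is left to a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(x, y):
    """Return (t, df) for a two-sample t-test with unequal variances."""
    nx, ny = len(x), len(y)
    # Squared standard error of each group mean
    vx, vy = variance(x) / nx, variance(y) / ny
    # Difference in means, standardized by the combined standard error
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df


# Hypothetical concentrations (ppb), for illustration only
cleanup = [0.6, 1.3, 0.9, 2.8, 0.5, 4.1]
reference = [0.4, 0.7, 0.5, 0.9, 0.6, 0.8]
t_stat, df = welch_t(cleanup, reference)
print(round(t_stat, 2), round(df, 2))
```

The fractional df is characteristic of the unequal-variance test.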

19
TcCB cleanup conclusion
  • Using the null hypothesis that the cleanup site
    is contaminated with respect to the control site,
    we fail to reject the hypothesis that the cleanup
    site is still contaminated (one-sided two-sample
    t-test with unequal variances, t = 1.45, df =
    76.05, P = 0.925).

20
Comparing fuel efficiency of two gasoline blends
  • THE PROBLEM
  • The owner of a taxi company is evaluating two
    gasoline blends, and wants to use the one that
    produces greater fuel efficiency
  • How should she decide which (if either) produces
    greater efficiency?
  • THE DATA
  • On one day, all the taxis in the fleet were
    fueled with gas A, and at the end of the day the
    efficiency of each car (in mpg) was calculated.
  • On the next day, all the taxis in the fleet were
    fueled with gas B, and at the end of the day the
    efficiency of each car was calculated.

21
Two-sample t-test of gas data
Want to control for variability among drivers
22
COMPARING TWO GROUPS
  • Paired t-test
  • Each observation is a pair of measurements
  • Water quality upstream and downstream of a road
    crossing
  • Fuel mileage by a taxi driver using two brands of
    gasoline
  • Natural variability between sampling units might
    swamp differences between the means
  • Streams have different background water quality
  • Drivers have different driving styles
  • Instead, test for mean of differences

23
  • CONCLUSION
  • We find strong evidence that mileage differs
    between gas A and gas B (paired t-test, t = 3.12,
    df = 9, P = 0.012). On average, the fuel
    efficiency with gas B is 0.6 mpg greater than
    with gas A.

24
COMPARING MEANS OF 3 OR MORE GROUPS
  • ANOVA (ANalysis Of VAriance)
  • Like 2-sample t-test, but with multiple groups
  • H0: All groups have the same mean
  • HA: Not all groups have the same mean
  • Rejecting H0 doesn't tell you which groups differ
  • Can do a bunch of t-tests for this
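
The F statistic behind one-way ANOVA compares variability between group means with variability within groups. The following is a minimal sketch with a hypothetical function name and toy data; it returns F and its two degrees of freedom, and leaves the P-value (which needs the F distribution) to a statistics package.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their group mean
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Toy data, for illustration only
f_stat, df1, df2 = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(round(f_stat, 2), df1, df2)  # → 13.0 2 6
```

The degrees of freedom are (number of groups − 1) and (number of observations − number of groups).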

25
CONCLUSION: Very strong evidence that highway
mileage differs among car types (one-way ANOVA,
F = 23.67, df = 5, 86, P < 0.0001)
26
HYPOTHESIS TESTING OVERVIEW
27
ASSUMPTIONS OF T-TEST AND ANOVA
  • T-test
  • Distribution within each group is normal
  • ANOVA
  • Distribution within each group is normal
  • Variances of all groups are the same
  • Both tests are robust to moderate violations of
    these assumptions
  • Regard P value as an approximate value
  • TcCB data
  • Assumption of normality is badly violated
  • Solution: do tests on transformed data
  • Car mileage data
  • Assumption of equal variances is badly violated
  • Solution: perform Welch ANOVA

28
(No Transcript)