The Importance of Data Validation - PowerPoint PPT Presentation

1
Ensuring High Quality Data
  • The Importance of Data Validation
  • Data Validation Levels
  • Level I: Field and Laboratory Checks
  • Level II: Internal Consistency Checks and
    Examples
  • Level III/IV: Unusual Value Identification and
    Examples
  • Validation of PM2.5 Mass
  • Information to be Provided with PM Sampler Data
  • Are Measurements Comparable?
  • National Contract Lab Responsibilities
  • Data Access
  • Sample Size Issues
  • References
  • Appendix: Criteria Tables for PM2.5 Mass
    Validation
  • Critical Criteria Table
  • Operational Evaluations Table
  • Systematic Issues

The purpose of data validation is to detect and
then verify any data values that may not
represent actual air quality conditions at the
sampling station. (U.S. EPA, 1984)
2
The Importance of Data Validation
  • Data validation is critical because serious
    errors in data analysis and modeling results can
    be caused by erroneous individual data values.
  • Data validation consists of procedures developed
    to identify deviations from measurement
    assumptions and procedures.
  • Timely data validation is required to minimize
    the generation of additional data that may be
    invalid or suspect and to maximize the
    recoverable data.

Main et al., 1998
3
The Importance of Data Validation
  • The quality and applicability of data analysis
    results are directly dependent upon the inherent
    quality of the data. In other words, data
    validation is critical because serious errors in
    data analysis and modeling results can be caused
    by erroneous individual data values. The EPA's
    PM2.5 speciation guidance document provides
    quality requirements for sampling and analysis.
    The guidance document also discusses data
    validation including the suggested four-level
    data validation system. It is the monitoring
    agency's responsibility to prevent, identify,
    correct, and define the consequences of
    difficulties that might affect the precision and
    accuracy, and/or the validity, of the
    measurements.
  • Once the quality assured data are provided to
    data analysts, additional data validation steps
    need to be taken. Given the newness and
    complexity of the PM2.5 speciation monitoring and
    sample analysis methods, errors are likely to
    pass through the system despite rigorous
    application of quality assurance and validation
    measures by the monitoring agencies. Therefore,
    data analysts should also check the validity of
    the data before conducting their analyses.
  • While some quality assurance and data validation
    can be performed without a broad understanding of
    the physical and chemical processes of PM (such
    as ascertaining that the field or laboratory
    instruments are operating properly), some degree
    of understanding of these processes is required.
    Key issues to understand include PM physical,
    chemical, and optical properties; PM formation
    and removal processes; and sampling artifacts,
    interferences, and limitations. These topics
    were discussed in the introduction and references
    therein. The analyst should also understand the
    measurement uncertainty and laboratory analysis
    uncertainty. These uncertainties may differ
    significantly among samplers and analysis methods,
    which, in turn, affects the
    interpretation and uses of the data (e.g., in
    source apportionment).

4
Data Validation Procedures and Tools
  • Data validation tools for PM are in development

5
Data Validation Levels
  • Level I. Routine checks during the initial data
    processing and generation of data (e.g., check
    file identification; review unusual events, field
    data sheets, and result reports; do instrument
    performance checks).
  • Level II. Internal consistency tests to identify
    values in the data that appear atypical when
    compared to values of the entire data set.
  • Level III. Current data comparisons with
    historical data to verify consistency over time.
  • Level IV. Parallel consistency tests with data
    sets from the same population (e.g., region,
    period of time, air mass) to identify systematic
    bias.

U.S. EPA, 1999a
6
Level I: Field and Laboratory Checks
  • Verify computer file entries against data sheets.
  • Flag samples when significant deviations from
    measurement assumptions have occurred.
  • Eliminate values for measurements that are known
    to be invalid because of instrument malfunctions.
  • Substitute data from a backup data acquisition
    system if the primary system fails.
  • Adjust measurement values for quantifiable
    calibration or interference biases.

Chow et al., 1996
7
Level II: Internal Consistency Checks
  • Compare collocated samplers (scatter plots,
    linear regression).
  • Check sum of chemical species vs. PM2.5 mass
    (multielements [Al to U] + sulfate + nitrate +
    ammonium ions + OC + EC - sulfur).
  • Check physical and chemical consistency (sulfate
    vs. total sulfur, soluble potassium vs. total
    potassium, soluble chloride vs. chlorine, babs
    vs. elemental carbon).
  • Balance cations and anions.
  • Balance ammonium.
  • Investigate nitrate volatilization and adsorption
    of gaseous organic carbon.
  • Prepare material balances and crude mass balances.

Chow, 1998
8
Level II: Consistency Check Guidelines
Chow, 1998
IC = ion chromatography; XRF = energy dispersive
X-ray fluorescence; AAS = atomic absorption
spectrophotometry
9
Example: Compare Collocated Samplers
  • Data from collocated samplers should be compared
    both between samplers of the same type and
    between different sampler types.
  • During the 1995 Integrated Monitoring Study
    (IMS95) in California, the collocated PM2.5
    samplers (same type) at Bakersfield showed
    excellent agreement.
  • SSI 1 and TEOM measurements did not correlate
    very well during the winter/fall season. The two
    samplers showed much better agreement during
    March-September (not shown).

[Figure: scatter plots of collocated sampler measurements with 1:1 line and linear regression fit (Reg.). Chow, 1998]
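
The collocated-sampler comparison described on this slide (scatter plot plus linear regression) can be sketched in a few lines. This is a minimal illustration, not the procedure used in the cited study: the function name `compare_collocated` and the sample concentrations are hypothetical, and an ordinary least-squares fit is assumed.

```python
from math import sqrt

def compare_collocated(x, y):
    """Ordinary least-squares fit y = slope * x + intercept, plus Pearson r."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two equal-length series with at least 3 pairs")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r

# Hypothetical 24-hr PM2.5 concentrations (ug/m3) from two collocated samplers.
sampler_a = [12.1, 25.4, 40.2, 8.7, 33.0, 18.5]
sampler_b = [11.8, 26.0, 39.5, 9.1, 32.2, 19.0]
slope, intercept, r = compare_collocated(sampler_a, sampler_b)
print(f"slope={slope:.2f} intercept={intercept:.2f} r={r:.3f}")
```

A slope near 1, an intercept near 0, and a high r would be consistent with the "excellent agreement" case; seasonal subsets can be passed separately to look for the kind of winter/fall divergence noted above.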
10
Example: Check Sum of Chemical Species vs. PM2.5 Mass
[Figure: sum of species vs. PM2.5 mass with 1:1 line and linear regression fit (Reg.). Chow, 1998]
  • Compare the sum of species to the PM2.5 mass
    measurements.
  • The comparison shown here indicates an excellent
    correlation (r = 0.98).
  • The sum of species concentrations is lower than
    the reported mass because the sum of species does
    not include oxygen.
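
As a rough sketch of this check, the sum of species can be computed per sample and compared with the gravimetric mass. The code assumes the Level II species list is added together, with sulfur subtracted so that it is not counted twice alongside sulfate; the function names and the sample values are hypothetical.

```python
def sum_of_species(elements, sulfate, nitrate, ammonium, oc, ec):
    """Sum measured species (ug/m3), removing elemental S to avoid
    double counting with sulfate. `elements` maps symbol -> concentration."""
    total_elements = sum(elements.values())
    sulfur = elements.get("S", 0.0)
    return total_elements - sulfur + sulfate + nitrate + ammonium + oc + ec

def species_to_mass_ratio(species_sum, mass):
    """Ratio of sum of species to reported PM2.5 mass."""
    if mass <= 0:
        raise ValueError("PM2.5 mass must be positive")
    return species_sum / mass

# Hypothetical sample (all values in ug/m3).
elements = {"Al": 0.1, "Si": 0.3, "S": 1.0, "Fe": 0.1}
total = sum_of_species(elements, sulfate=3.0, nitrate=2.0,
                       ammonium=1.5, oc=4.0, ec=1.0)
ratio = species_to_mass_ratio(total, mass=15.0)
print(f"sum of species = {total:.2f} ug/m3, ratio to mass = {ratio:.2f}")
```

Because the sum omits oxygen (and other unmeasured components), a ratio somewhat below 1 is expected; a ratio well above 1 would be a reason to look more closely at the sample.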

11
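Example: Check Chemical and Physical Consistency
(1 of 2)
[Figure: sulfate vs. total sulfur and soluble potassium vs. total potassium, with 1:1 and 3:1 reference lines and linear regression fits (Reg.). Chow, 1998]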
  • Chemical and physical consistency checks include
    comparing sulfate with total sulfur (sulfate
    should be about three times the sulfur
    concentrations) and comparing soluble potassium
    with total potassium.
  • In the examples shown, the sulfur data compare
    well while the potassium data comparison shows a
    considerable amount of scatter.
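
The "about three times" rule follows from the molar masses of sulfate (96.06 g/mol) and sulfur (32.06 g/mol). A minimal sketch of a per-sample check is shown below; the function name and the 25% tolerance are assumptions for illustration, not criteria from the cited guidance.

```python
SO4_MOLAR_MASS = 96.06   # g/mol
S_MOLAR_MASS = 32.06     # g/mol
EXPECTED_RATIO = SO4_MOLAR_MASS / S_MOLAR_MASS  # about 3.0

def check_sulfate_vs_sulfur(sulfate, sulfur, tolerance=0.25):
    """Return (ratio, consistent) for one sample.

    `tolerance` is a hypothetical fractional band around the expected
    ratio; choose it from the measurement uncertainties of both methods.
    """
    if sulfur <= 0:
        raise ValueError("total sulfur must be positive")
    ratio = sulfate / sulfur
    consistent = abs(ratio - EXPECTED_RATIO) <= tolerance * EXPECTED_RATIO
    return ratio, consistent

# Hypothetical samples: (sulfate by IC, total sulfur by XRF), ug/m3.
for so4, s in [(3.0, 1.0), (5.0, 1.0)]:
    ratio, ok = check_sulfate_vs_sulfur(so4, s)
    print(f"SO4/S = {ratio:.2f}, consistent = {ok}")
```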

12
Example: Check Chemical and Physical Consistency
(2 of 2)
  • Another consistency check that can be performed
    (if data are available) is to compare the
    elemental carbon concentrations with particle
    absorption (babs) measurements.
  • In the example shown, the two measurements agree
    well.

[Figure: babs vs. elemental carbon with linear regression fit (Reg.). Chow, 1998]
13
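Example: Anion and Cation Balance
  • Equations to calculate anion and cation balance
    (µmoles/m3):
  • Anion equivalence:
    $\mu e_{\mathrm{anion}} = \dfrac{[\mathrm{Cl^-}]}{35.453} + \dfrac{[\mathrm{NO_3^-}]}{62.005} + \dfrac{[\mathrm{SO_4^{2-}}]}{48.03}$
  • Cation equivalence:
    $\mu e_{\mathrm{cation}} = \dfrac{[\mathrm{Na^+}]}{23.0} + \dfrac{[\mathrm{K^+}]}{39.098} + \dfrac{[\mathrm{NH_4^+}]}{18.04}$
  • Plot cation equivalents vs. anion equivalents.

[Figure: cation equivalents vs. anion equivalents with linear regression fit (Reg.). Chow 1998]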
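
The two equivalence sums on this slide translate directly into code. The sketch below uses the divisors given above and assumes concentrations are supplied in µg/m3; the function names and the sample values are hypothetical.

```python
def anion_equivalents(chloride, nitrate, sulfate):
    """Anion equivalence from Cl-, NO3-, and SO4= concentrations (ug/m3)."""
    return chloride / 35.453 + nitrate / 62.005 + sulfate / 48.03

def cation_equivalents(sodium, potassium, ammonium):
    """Cation equivalence from Na+, K+, and NH4+ concentrations (ug/m3)."""
    return sodium / 23.0 + potassium / 39.098 + ammonium / 18.04

# Hypothetical sample (ug/m3).
anions = anion_equivalents(chloride=0.2, nitrate=6.0, sulfate=4.0)
cations = cation_equivalents(sodium=0.15, potassium=0.1, ammonium=3.2)
print(f"anions = {anions:.3f}, cations = {cations:.3f}, "
      f"cation/anion = {cations / anions:.2f}")
```

Plotting `cations` against `anions` for every sample gives the scatter plot described above; points far from the 1:1 line suggest a missing or mismeasured ion.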
14
Example: Ammonia Balance
  • Equations to calculate ammonia balance (µg/m3):
  • Calculated ammonium based on NH4NO3 and NH4HSO4:
    $[\mathrm{NH_4^+}]_{\mathrm{calc}} = 0.29\,[\mathrm{NO_3^-}] + 0.192\,[\mathrm{SO_4^{2-}}]$
  • Calculated ammonium based on NH4NO3 and (NH4)2SO4:
    $[\mathrm{NH_4^+}]_{\mathrm{calc}} = 0.29\,[\mathrm{NO_3^-}] + 0.38\,[\mathrm{SO_4^{2-}}]$
  • Plot calculated ammonium vs. measured ammonium
    for both forms of sulfate.

Chow 1998
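
A short sketch of the ammonium balance, using the coefficients from this slide for the two assumed sulfate forms. The function names and the sample values are hypothetical.

```python
def ammonium_as_bisulfate(nitrate, sulfate):
    """Calculated NH4+ (ug/m3) assuming NH4NO3 and NH4HSO4."""
    return 0.29 * nitrate + 0.192 * sulfate

def ammonium_as_sulfate(nitrate, sulfate):
    """Calculated NH4+ (ug/m3) assuming NH4NO3 and (NH4)2SO4."""
    return 0.29 * nitrate + 0.38 * sulfate

# Hypothetical sample (ug/m3).
nitrate, sulfate, measured_nh4 = 10.0, 10.0, 6.1
low = ammonium_as_bisulfate(nitrate, sulfate)
high = ammonium_as_sulfate(nitrate, sulfate)
print(f"calculated NH4+: {low:.2f} (bisulfate) to {high:.2f} (sulfate); "
      f"measured: {measured_nh4:.2f}")
```

Measured ammonium that falls between the two calculated values is plausible under either sulfate form; values well outside that range deserve a closer look.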
15
Example: Nitrate Volatilization Check
San Joaquin Valley, CA
  • Particularly for the western U.S., the analyst
    should understand the extent of possible nitrate
    volatilization in the data set.
  • This example shows that nitrate volatilization
    was significant during the summer.

Chow 1998
16
Example: Adsorption of Gaseous OC Check
  • Some VOCs evaporate from a filter (negative
    artifact) during sampling while others are
    adsorbed (positive artifact).
  • The top figure shows that the organic carbon (OC)
    concentrations on the backup filters were
    frequently 50% or more of the front filter
    concentrations. The error bars reflect
    measurement standard deviation.
  • The bottom figure shows the ratio of the backup
    OC to the front filter OC as a function of PM2.5
    mass. Relatively larger organic vapor artifacts
    at lower PM2.5 concentrations suggest that
    particles provide additional adsorption sites on
    the front filters (Chow et al., 1996).

Chow 1998
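
The backup-to-front OC ratio discussed here is simple to compute per sample. In the sketch below, the function names are hypothetical, and the 0.5 threshold only mirrors the "50% or more" observation above; it is not a validation criterion from the source.

```python
def backup_to_front_ratio(front_oc, backup_oc):
    """Ratio of backup-filter OC to front-filter OC for one sample."""
    if front_oc <= 0:
        raise ValueError("front filter OC must be positive")
    return backup_oc / front_oc

def flag_high_artifact(samples, threshold=0.5):
    """Return indices of (front, backup) pairs whose ratio meets the threshold."""
    return [i for i, (front, backup) in enumerate(samples)
            if backup_to_front_ratio(front, backup) >= threshold]

# Hypothetical (front OC, backup OC) pairs in ug/m3.
samples = [(4.0, 1.0), (2.0, 1.0), (3.0, 2.4)]
print(flag_high_artifact(samples))
```

Plotting the same ratio against PM2.5 mass reproduces the second view described above.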
17
Example: Material Balance
Denver, CO Core Sites
  • Geological (1.89 × Al + 2.14 × Si +
    1.4 × Ca + 1.43 × Fe)
  • Organic carbon (1.4 × OC)
  • Elemental carbon
  • Ammonium nitrate (1.29 × NO3)
  • Ammonium sulfate (1.38 × SO4)
  • Remaining trace elements (excluding Al, Si,
    Ca, Fe, and S)
  • Unidentified

Chow 1998
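
The material balance categories listed on this slide can be assembled per sample as follows. This is a sketch under the multipliers given above; the function name, dictionary keys, and sample values are hypothetical.

```python
GEOLOGICAL_FACTORS = {"Al": 1.89, "Si": 2.14, "Ca": 1.4, "Fe": 1.43}
EXCLUDED_FROM_TRACE = {"Al", "Si", "Ca", "Fe", "S"}

def material_balance(mass, elements, oc, ec, nitrate, sulfate):
    """Split PM2.5 mass (ug/m3) into material balance categories.

    `elements` maps element symbol -> concentration (ug/m3).
    """
    geological = sum(factor * elements.get(symbol, 0.0)
                     for symbol, factor in GEOLOGICAL_FACTORS.items())
    trace = sum(conc for symbol, conc in elements.items()
                if symbol not in EXCLUDED_FROM_TRACE)
    components = {
        "geological": geological,
        "organic_carbon": 1.4 * oc,
        "elemental_carbon": ec,
        "ammonium_nitrate": 1.29 * nitrate,
        "ammonium_sulfate": 1.38 * sulfate,
        "trace_elements": trace,
    }
    # Whatever the identified categories do not explain is "unidentified".
    components["unidentified"] = mass - sum(components.values())
    return components

# Hypothetical sample (ug/m3).
elements = {"Al": 1.0, "Si": 1.0, "Ca": 1.0, "Fe": 1.0,
            "S": 3.0, "Pb": 0.1, "Zn": 0.1}
balance = material_balance(mass=60.0, elements=elements,
                           oc=10.0, ec=2.0, nitrate=10.0, sulfate=10.0)
for name, value in balance.items():
    print(f"{name:>18}: {value:6.2f}")
```

A large negative "unidentified" value means the reconstructed components exceed the measured mass, which is itself a useful validation signal.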
18
Example: Crude Mass Balance
  • Crude mass balances can be constructed to
    investigate estimated source contributions.
  • Do the crude estimates make sense spatially and
    temporally?

[Figure: crude mass balance by site and site type, Las Vegas, NV. Chow 1998]
19
Level III/IV: Unusual Value Identification
  • Extreme values
  • Values that normally track the values of other
    variables in a time series
  • Values that normally follow a qualitatively
    predictable spatial or temporal pattern

The first assumption upon finding a measurement
that is inconsistent with physical expectations
is that the unusual value is due to a measurement
error. If, upon tracing the path of the
measurement, nothing unusual is found, the value
can be assumed to be a valid result of an
environmental cause.
Chow et al., 1996
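
One possible way to screen a series for extreme values is a median-based (modified z-score) test. This technique is swapped in here for illustration and is not prescribed by the cited guidance; the function name, the 3.5 cutoff, and the sample data are assumptions. Flagged values are candidates for tracing, not automatic invalidations.

```python
from statistics import median

def flag_extreme_values(values, threshold=3.5):
    """Return indices whose modified z-score exceeds `threshold`.

    The modified z-score is 0.6745 * (x - median) / MAD, where MAD is
    the median absolute deviation from the median.
    """
    center = median(values)
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        return []  # no spread to scale by; nothing can be flagged this way
    return [i for i, v in enumerate(values)
            if abs(0.6745 * (v - center) / mad) > threshold]

# Hypothetical daily potassium concentrations (ug/m3) with one spike.
potassium = [1.0, 1.1, 0.9, 1.2, 1.0, 0.8, 9.0]
print(flag_extreme_values(potassium))
```

Each flagged index would then be traced back through field and laboratory records before deciding whether the value reflects a measurement error or a real environmental cause.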
20
Example: Unusual Value Identification
  • Potassium nitrate (KNO3) is a major component of
    all fireworks.
  • This figure shows all available PM2.5 K data
    from all North American sites, averaged to
    produce a continental average for each day during
    1988-1997.
  • Fourth of July celebration fireworks are clearly
    observed in the potassium time series.
  • Fireworks displays on local holidays/events could
    have a similar effect on data.

Poirot (1998)
Regional averaging and count of sample numbers
were conducted in Voyager, using variations of
the Voyager script on p. 6 of the Voyager
Workbook Kvoy.wkb. Additional averaging and
plotting was conducted in Microsoft Excel.
21
Data Validation Continues During Data Analysis
  • Two source apportionment models were applied to
    PM2.5 data collected in Vermont, and the results
    of the models were compared.
  • Excellent agreement for the selenium source was
    observed for part of the data while the rest of
    the results did not agree well.
  • Further investigation showed that the period of
    good agreement coincided with a change in
    laboratory analysis, with an accompanying change
    in detection limit and measurement uncertainty.
    The two models treat these quantities
    differently.

Poirot, 1999
22
Validation of PM2.5 Mass
  • Consistent validation of PM2.5 mass
    concentrations across the U.S. is needed. To aid
    in this, three tables of criteria were developed
    and are provided in the appendix to this section
    of the workbook.
  • Observations that do not meet each and every
    criterion on the Critical Criteria Table should
    be invalidated unless there are compelling
    reasons and justification not to do so.
  • Criteria that are important for maintaining and
    evaluating the quality of the data collection
    system are included in the Operational
    Evaluations Table. Violation of a criterion or a
    number of criteria may be cause for invalidation.
  • Criteria important for the correct interpretation
    of the data but that do not usually impact the
    validity of a sample or group of samples are
    included on the Systematic Issues Table.

U.S. EPA, 1999c
23
Information to be Provided with PM Sampler Data
These supplemental measurements can help explain
or qualify unusual data.
40 CFR 50 Appendix L, Table L-1
24
Are Measurements Comparable?
  • Example comparison of 24-hr average TEOM (from
    hourly measurements), IMPROVE (gravimetric mass
    from the A filter), and FRM PM2.5 mass
    measurements made in New Haven, CT during the
    third and fourth quarters of 1998.
  • During the colder months at this site, the TEOM
    seems to report a lower concentration than the
    FRM.

PM2.5 average values (µg/m3), New Haven, CT, 1998
(number of samples in the calculated average in
parentheses). For example, in the third quarter,
TEOM and IMPROVE samples ran concurrently on 24
days. The ten values where all three samplers ran
are a subset of the 24.
Graham, 1999
25
National Contract Lab Responsibilities
  • Discussion of national contract laboratory
    responsibilities to be added.

26
Data Access (1 of 2)
  • Official data sources
      • AIRS Data via public web at
        http://www.epa.gov/airsdata
      • AIRS Air Quality System (AQS) via registered
        users; register with EPA/NCC (703-487-4630)
  • PM2.5 websites via public web
      • PM2.5 Data Analysis Workbook at
        http://capita.wustl.edu/databases/userdomain/pmfine/
      • EPA PM2.5 Data Analysis clearinghouse at
        http://www.epa.gov/oar/oaqps/pm25/
      • Northern Front Range Air Quality Study at
        http://nfraqs.cira.colostate.edu/index2.html
      • NEARDAT at http://capita.wustl.edu/NEARDAT

27
Data Access (2 of 2)
  • Secondary data sources
      • Meteorological parameters from National Weather
        Service (NWS): http://www.nws.noaa.gov
      • Meteorological parameters from PAMS/AIRS AQS;
        register with EPA/NCC (703-487-4630)
      • Collocated or nearby SO2, nitrogen oxides, CO,
        VOC from AIRS AQS
      • Private meteorological agencies (e.g., forestry
        service, agricultural monitoring, industrial
        facilities)

28
Sample Size Issues
  • How complete must data be to show that an area
    meets the NAAQS for PM?

U.S. EPA, 1999b
Sample size requirements for data analyses will
vary depending upon the analysis type, the
analysis goals, the variability in the data, and
other factors.
29
Summary
  • Data validation is vital because serious errors
    in data analysis and modeling results can be
    caused by erroneous individual data values.
  • This workbook section provides a discussion of
    data validation levels, example validation
    checks, and other information important to the
    data validation process.

30
References
  • Ayers G.P., Keywood M.D., Gras J.L. (1999) TEOM
    vs. manual gravimetric methods for determination
    of PM2.5 aerosol mass concentrations. Atmos.
    Environ., 33, pp. 3717-3721.
  • Chow J.C. and J.G. Watson (1998) Guideline on
    speciated particulate monitoring. Draft report 3
    prepared by Desert Research Institute for the
    U.S. EPA Office of Air Quality Planning and
    Standards. August.
  • Chow J.C. (1998) Descriptive data analysis
    methods. Presentation prepared by Desert
    Research Institute for the U.S. EPA in Research
    Triangle Park, November.
  • Chow J.C., J.G. Watson, Z. Lu, D.H. Lowenthal,
    C.A. Frazier, P.A. Solomon, R.H. Thuillier, K.
    Magliano (1996) Descriptive analysis of PM2.5 and
    PM10 at regionally representative locations
    during SJVAQS/AUSPEX. Atmos. Environ., Vol. 30,
    No. 12, 2079-2112.
  • Chow J.C. (1995) Measurement methods to determine
    compliance with ambient air quality standards for
    suspended particles. J. Air Waste Manage.
    Assoc., 45, pp. 320-382.
  • Graham J. (1999) Personal communication.
  • Homolya J.B., Rice J., Scheffe R.D. (1998) PM2.5
    speciation - objectives, requirements, and
    approach. Presentation. September.
  • Main H.H., Chinkin L.R., and Roberts P.T. (1998)
    PAMS data analysis workshops: illustrating the
    use of PAMS data to support ozone control
    programs. Web page prepared for the U.S.
    Environmental Protection Agency, Research
    Triangle Park, NC by Sonoma Technology, Inc.,
    Petaluma, CA,
    <http://www.epa.gov/oar/oaqps/pams/analysis>,
    STI-997280-1824, June.
  • Poirot R. (1999) Personal communication.
  • Poirot R. (1998) Tracers of opportunity:
    Potassium. Paper available at
    http://capita.wustl.edu/PMFine/Workgroup/SourceAttribution/Reports/In-progress/Potass/ktext.html
  • U.S. Environmental Protection Agency (1984)
    Quality assurance handbook for air pollution
    measurement systems, Volume II: Ambient air
    specific methods (interim edition),
    EPA/600/R-94/0386, April.
  • U.S. Environmental Protection Agency (1999a)
    Particulate matter (PM2.5) speciation guidance
    document. Available at
    http://www.epa.gov/ttn/amtic/files/ambient/pm25/spec/specpln3.pdf
  • U.S. Environmental Protection Agency (1999b)
    Guideline on data handling conventions for the PM
    NAAQS. EPA-454/R-99-008, April.
  • U.S. Environmental Protection Agency (1999c) PM2.5
    mass validation criteria. Available at
    http://www.epa.gov/ttn/amtic/pmqa.html

31
Critical Criteria Table
U.S. EPA, 1999c
32
Operational Evaluations Table (1 of 2)
U.S. EPA, 1999c
33
Operational Evaluations Table (2 of 2)
U.S. EPA, 1999c
34
Systematic Issues
U.S. EPA, 1999c