Title: The Importance of Data Validation
1. Ensuring High Quality Data
- The Importance of Data Validation
- Data Validation Levels
- Level I: Field and Laboratory Checks
- Level II: Internal Consistency Checks and Examples
- Level III/IV: Unusual Value Identification and Examples
- Validation of PM2.5 Mass
- Information to be Provided with PM Sampler Data
- Are Measurements Comparable?
- National Contract Lab Responsibilities
- Data Access
- Sample Size Issues
- References
- Appendix: Criteria Tables for PM2.5 Mass Validation
  - Critical Criteria Table
  - Operational Evaluations Table
  - Systematic Issues
The purpose of data validation is to detect and
then verify any data values that may not
represent actual air quality conditions at the
sampling station. (U.S. EPA, 1984)
2. The Importance of Data Validation
- Data validation is critical because serious errors in data analysis and modeling results can be caused by erroneous individual data values.
- Data validation consists of procedures developed to identify deviations from measurement assumptions and procedures.
- Timely data validation is required to minimize the generation of additional data that may be invalid or suspect and to maximize the recoverable data.
Main et al., 1998
3. The Importance of Data Validation
- The quality and applicability of data analysis results are directly dependent upon the inherent quality of the data. In other words, data validation is critical because serious errors in data analysis and modeling results can be caused by erroneous individual data values. The EPA's PM2.5 speciation guidance document provides quality requirements for sampling and analysis. The guidance document also discusses data validation, including the suggested four-level data validation system. It is the monitoring agency's responsibility to prevent, identify, correct, and define the consequences of difficulties that might affect the precision, accuracy, and/or validity of the measurements.
- Once the quality-assured data are provided to data analysts, additional data validation steps need to be taken. Given the newness and complexity of the PM2.5 speciation monitoring and sample analysis methods, errors are likely to pass through the system despite rigorous application of quality assurance and validation measures by the monitoring agencies. Therefore, data analysts should also check the validity of the data before conducting their analyses.
- While some quality assurance and data validation can be performed without a broad understanding of the physical and chemical processes of PM (such as ascertaining that the field or laboratory instruments are operating properly), some degree of understanding of these processes is required. Key issues to understand include PM physical, chemical, and optical properties; PM formation and removal processes; and sampling artifacts, interferences, and limitations. These topics were discussed in the introduction and references therein. The analyst should also understand the measurement uncertainty and laboratory analysis uncertainty. These uncertainties may differ significantly among samplers and analysis methods, which, in turn, affects the interpretation and uses of the data (e.g., in source apportionment).
4. Data Validation Procedures and Tools
- Data validation tools for PM are in development.
5. Data Validation Levels
- Level I. Routine checks during the initial data processing and generation of data (e.g., check file identification; review unusual events, field data sheets, and result reports; do instrument performance checks).
- Level II. Internal consistency tests to identify values in the data that appear atypical when compared to values of the entire data set.
- Level III. Current data comparisons with historical data to verify consistency over time.
- Level IV. Parallel consistency tests with data sets from the same population (e.g., region, period of time, air mass) to identify systematic bias.
U.S. EPA, 1999a
6. Level I: Field and Laboratory Checks
- Verify computer file entries against data sheets.
- Flag samples when significant deviations from measurement assumptions have occurred.
- Eliminate values for measurements that are known to be invalid because of instrument malfunctions.
- Replace data from a backup data acquisition system in the event of failure of the primary system.
- Adjust measurement values for quantifiable calibration or interference biases.
Chow et al., 1996
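Parts of these Level I checks can be automated. The Python sketch below illustrates one way database entries might be screened against field data sheet values; the record structure, field names, flag labels, and 2% tolerance are illustrative assumptions, not prescribed procedures.

```python
# Minimal sketch of a Level I screen: verify database entries against
# field data sheets and flag records that fail basic checks.
# Field names, flag labels, and the 2% tolerance are illustrative assumptions.

FIELD_SHEET = {  # values transcribed from the operator's field data sheet
    "S001": {"flow_lpm": 16.7, "duration_min": 1440, "operator_note": ""},
    "S002": {"flow_lpm": 15.1, "duration_min": 1380, "operator_note": "pump fault"},
}
DATABASE = {     # values keyed into the data system
    "S001": {"flow_lpm": 16.7, "duration_min": 1440},
    "S002": {"flow_lpm": 16.1, "duration_min": 1380},
}

def level_one_screen(field_sheet, database, rel_tol=0.02):
    """Return a flag for each sample: 'ok', 'entry_mismatch', or 'suspect'."""
    flags = {}
    for sample_id, sheet in field_sheet.items():
        db = database.get(sample_id)
        if db is None:
            flags[sample_id] = "missing_in_database"
            continue
        # Verify computer file entries against the data sheet values.
        mismatch = any(
            abs(db[key] - sheet[key]) > rel_tol * abs(sheet[key])
            for key in db
        )
        if mismatch:
            flags[sample_id] = "entry_mismatch"
        elif sheet["operator_note"]:
            # Flag samples with deviations noted in the field.
            flags[sample_id] = "suspect"
        else:
            flags[sample_id] = "ok"
    return flags

print(level_one_screen(FIELD_SHEET, DATABASE))
# -> {'S001': 'ok', 'S002': 'entry_mismatch'}
```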
7. Level II: Internal Consistency Checks
- Compare collocated samplers (scatter plots, linear regression).
- Check the sum of chemical species vs. PM2.5 mass (multielements Al to U + sulfate + nitrate + ammonium ions + OC + EC - sulfur).
- Check physical and chemical consistency (sulfate vs. total sulfur, soluble potassium vs. total potassium, soluble chloride vs. chlorine, babs vs. elemental carbon).
- Balance cations and anions.
- Balance ammonium.
- Investigate nitrate volatilization and adsorption of gaseous organic carbon.
- Prepare material balances and crude mass balances.
Chow, 1998
8. Level II Consistency Check Guidelines
Chow, 1998
IC = ion chromatography; XRF = energy dispersive X-ray fluorescence; AAS = atomic absorption spectrophotometry
9. Example: Compare Collocated Samplers
- Data from collocated samplers should be compared, both between samplers of the same type and between different sampler types.
- During the 1995 Integrated Monitoring Study (IMS95) in California, the collocated PM2.5 samplers (same type) at Bakersfield showed excellent agreement.
- SSI 1 and TEOM measurements did not correlate very well during the winter/fall season. The two samplers showed much better agreement during March-September (not shown).
[Scatter plots not shown; Reg. = linear regression fit]
Chow, 1998
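The collocated comparison is straightforward to reproduce. The sketch below, using invented concentration values, fits an ordinary least-squares line to paired measurements and reports the slope, intercept, and correlation coefficient that the regression ("Reg.") annotations on such figures summarize.

```python
import numpy as np

# Hypothetical collocated 24-hr PM2.5 concentrations (ug/m3) from two samplers.
sampler_a = np.array([12.1, 25.4, 48.0, 33.2, 8.7, 61.5, 19.9, 41.3])
sampler_b = np.array([11.8, 26.0, 46.5, 34.1, 9.2, 60.2, 20.7, 42.0])

# Ordinary least-squares regression of sampler B on sampler A.
slope, intercept = np.polyfit(sampler_a, sampler_b, deg=1)
r = np.corrcoef(sampler_a, sampler_b)[0, 1]

print(f"slope = {slope:.3f}, intercept = {intercept:.2f} ug/m3, r = {r:.3f}")
# Collocated samplers of the same type should give a slope near 1,
# an intercept near 0, and a high correlation coefficient.
```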
10. Example: Check Sum of Chemical Species vs. PM2.5 Mass
[Scatter plot not shown; Reg. = linear regression fit]
Chow, 1998
- Compare the sum of species to the PM2.5 mass measurements.
- The comparison shown here indicates an excellent correlation (r = 0.98).
- The sum of species concentrations is lower than the reported mass because the sum of species does not include oxygen.
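A minimal sketch of this check is shown below; the species list and concentration values are invented for illustration and are not the data shown in the figure.

```python
import numpy as np

# Hypothetical per-sample species concentrations (ug/m3); the species list
# and values are illustrative only.
species = {
    "sulfate":  np.array([4.2, 6.1, 3.0]),
    "nitrate":  np.array([2.1, 8.4, 1.5]),
    "ammonium": np.array([1.9, 4.6, 1.3]),
    "OC":       np.array([5.5, 7.2, 4.1]),
    "EC":       np.array([1.2, 1.8, 0.9]),
    "elements": np.array([0.8, 1.1, 0.6]),   # multielements Al to U, minus S
}
pm25_mass = np.array([17.0, 31.5, 12.4])     # gravimetric PM2.5 mass

sum_of_species = sum(species.values())
ratio = sum_of_species / pm25_mass
r = np.corrcoef(sum_of_species, pm25_mass)[0, 1]

print("sum/mass ratios:", np.round(ratio, 2), " r =", round(r, 3))
# The sum of species should correlate strongly with, and fall somewhat below,
# the gravimetric mass, since unmeasured components (e.g., oxygen) are not
# included in the sum.
```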
11. Example: Check Chemical and Physical Consistency (1 of 2)
[Scatter plots not shown; Reg. = linear regression fit]
Chow, 1998
- Chemical and physical consistency checks include comparing sulfate with total sulfur (sulfate should be about three times the sulfur concentration) and comparing soluble potassium with total potassium.
- In the examples shown, the sulfur data compare well, while the potassium data comparison shows a considerable amount of scatter.
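The sulfate vs. total sulfur comparison can also be screened sample by sample, as in the sketch below; the concentrations and the acceptance band around the expected ratio of about 3 are illustrative assumptions.

```python
import numpy as np

# Hypothetical paired measurements (ug/m3): sulfate from ion chromatography
# and total sulfur from XRF. Values are illustrative only.
sulfate_ic = np.array([6.3, 9.1, 2.8, 12.4])
sulfur_xrf = np.array([2.1, 3.0, 1.3, 4.2])

ratio = sulfate_ic / sulfur_xrf
# Sulfate (MW 96) should be about three times total sulfur (MW 32) when the
# sulfur is present as soluble sulfate; flag samples well outside that range.
flags = (ratio < 2.3) | (ratio > 3.7)   # band width is an assumption

for i, (r_i, bad) in enumerate(zip(ratio, flags)):
    print(f"sample {i}: sulfate/S = {r_i:.2f}" + ("  <-- check" if bad else ""))
```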
12. Example: Check Chemical and Physical Consistency (2 of 2)
- Another consistency check that can be performed (if data are available) is to compare the elemental carbon concentrations with particle absorption (babs) measurements.
- In the example shown, the two measurements agree well.
[Scatter plot of babs vs. elemental carbon not shown; Reg. = linear regression fit]
Chow, 1998
13. Example: Anion and Cation Balance
- Equations to calculate the anion and cation balance (µmoles/m3):
- Anion equivalence = Cl-/35.453 + NO3-/62.005 + SO4=/48.03
- Cation equivalence = Na+/23.0 + K+/39.098 + NH4+/18.04
- Plot cation equivalents vs. anion equivalents.
[Scatter plot not shown; Reg. = linear regression fit]
Chow, 1998
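A direct implementation of the equivalence equations above is sketched below; the ion concentrations are invented for illustration.

```python
# Sketch of the anion/cation balance using the equivalent weights listed
# above; the concentrations (ug/m3) are invented for illustration.

def anion_equivalents(cl, no3, so4):
    """Anion equivalents from Cl-, NO3-, and SO4= concentrations."""
    return cl / 35.453 + no3 / 62.005 + so4 / 48.03

def cation_equivalents(na, k, nh4):
    """Cation equivalents from Na+, K+, and NH4+ concentrations."""
    return na / 23.0 + k / 39.098 + nh4 / 18.04

samples = [  # (Cl-, NO3-, SO4=, Na+, K+, NH4+) in ug/m3
    (0.3, 5.2, 6.0, 0.2, 0.1, 3.6),
    (0.1, 2.4, 3.1, 0.1, 0.1, 1.1),
]
for cl, no3, so4, na, k, nh4 in samples:
    an = anion_equivalents(cl, no3, so4)
    cat = cation_equivalents(na, k, nh4)
    print(f"anions = {an:.3f}, cations = {cat:.3f}, cation/anion = {cat/an:.2f}")
# For a fully neutralized aerosol the points should scatter about the 1:1
# line; a cation deficit suggests acidic particles or unmeasured cations.
```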
14. Example: Ammonium Balance
- Equations to calculate the ammonium balance (µg/m3):
- Calculated ammonium based on NH4NO3 and NH4HSO4 = 0.29 × (NO3-) + 0.192 × (SO4=)
- Calculated ammonium based on NH4NO3 and (NH4)2SO4 = 0.29 × (NO3-) + 0.38 × (SO4=)
- Plot calculated ammonium vs. measured ammonium for both forms of sulfate.
Chow, 1998
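The two bounding calculations above can be applied to each sample, as in the sketch below; the measured concentrations are invented for illustration.

```python
# Sketch of the ammonium balance using the factors listed above; the
# measured concentrations (ug/m3) are invented for illustration.

def nh4_if_bisulfate(no3, so4):
    """Ammonium required if nitrate is NH4NO3 and sulfate is NH4HSO4."""
    return 0.29 * no3 + 0.192 * so4

def nh4_if_sulfate(no3, so4):
    """Ammonium required if nitrate is NH4NO3 and sulfate is (NH4)2SO4."""
    return 0.29 * no3 + 0.38 * so4

no3, so4, nh4_measured = 5.2, 6.0, 3.0   # ug/m3, hypothetical sample
low = nh4_if_bisulfate(no3, so4)
high = nh4_if_sulfate(no3, so4)
print(f"calculated NH4+: {low:.2f} (NH4HSO4) to {high:.2f} ((NH4)2SO4); "
      f"measured NH4+: {nh4_measured:.2f}")
# Measured ammonium falling between the two calculated values is consistent
# with a partly to fully neutralized aerosol; values well outside this range
# warrant a closer look at the ion measurements.
```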
15. Example: Nitrate Volatilization Check
San Joaquin Valley, CA
- Particularly for the western U.S., the analyst should understand the extent of possible nitrate volatilization in the data set.
- This example shows that nitrate volatilization was significant during the summer.
Chow, 1998
16. Example: Adsorption of Gaseous OC Check
- Some VOCs evaporate from a filter (negative artifact) during sampling, while others are adsorbed (positive artifact).
- The top figure shows that the organic carbon (OC) concentrations on the backup filters were frequently 50% or more of the front filter concentrations. The error bars reflect the measurement standard deviation.
- The bottom figure shows the ratio of the backup filter OC to the front filter OC as a function of PM2.5 mass. Relatively larger organic vapor artifacts at lower PM2.5 concentrations suggest that particles provide additional adsorption sites on the front filters (Chow et al., 1996).
Chow, 1998
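A sketch of the backup/front filter ratio check is given below; the OC and mass values are invented for illustration.

```python
import numpy as np

# Sketch of an organic-vapor adsorption check: compare backup-filter OC to
# front-filter OC and examine the ratio as a function of PM2.5 mass.
# All concentrations (ug/m3) are invented for illustration.
front_oc  = np.array([6.0, 4.1, 9.8, 3.2, 12.5])
backup_oc = np.array([2.9, 2.3, 2.7, 2.1, 2.8])
pm25_mass = np.array([18.0, 11.5, 35.2, 9.0, 48.7])

ratio = backup_oc / front_oc
for mass, r in sorted(zip(pm25_mass, ratio)):
    print(f"PM2.5 = {mass:5.1f} ug/m3   backup/front OC = {r:.2f}")
# Relatively larger backup/front ratios at lower PM2.5 concentrations mirror
# the pattern discussed above, i.e., a larger organic vapor artifact relative
# to the particulate OC loading.
```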
17. Example: Material Balance
Denver, CO Core Sites
- Geological (1.89 × Al + 2.14 × Si + 1.4 × Ca + 1.43 × Fe)
- Organic carbon (1.4 × OC)
- Elemental carbon
- Ammonium nitrate (1.29 × NO3-)
- Ammonium sulfate (1.38 × SO4=)
- Remaining trace elements (excluding Al, Si, Ca, Fe, and S)
- Unidentified
Chow, 1998
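The material balance above translates directly into code; the sketch below uses invented concentrations and reports each component as a fraction of the measured PM2.5 mass.

```python
# Sketch of the material balance using the multipliers listed above.
# The measured concentrations (ug/m3) are invented for illustration.
sample = {"Al": 0.4, "Si": 1.1, "Ca": 0.3, "Fe": 0.35,
          "OC": 5.5, "EC": 1.2, "NO3": 5.2, "SO4": 6.0,
          "trace_elements": 0.6, "PM25_mass": 31.0}

components = {
    "Geological":       1.89 * sample["Al"] + 2.14 * sample["Si"]
                        + 1.4 * sample["Ca"] + 1.43 * sample["Fe"],
    "Organic carbon":   1.4 * sample["OC"],
    "Elemental carbon": sample["EC"],
    "Ammonium nitrate": 1.29 * sample["NO3"],
    "Ammonium sulfate": 1.38 * sample["SO4"],
    "Trace elements":   sample["trace_elements"],
}
identified = sum(components.values())
# Whatever is not accounted for by the identified components is "unidentified".
components["Unidentified"] = sample["PM25_mass"] - identified

for name, value in components.items():
    share = 100 * value / sample["PM25_mass"]
    print(f"{name:<17s} {value:6.2f} ug/m3  ({share:4.1f}%)")
```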
18. Example: Crude Mass Balance
- Crude mass balances can be constructed to investigate estimated source contributions.
- Do the crude estimates make sense spatially and temporally?
[Figure: crude mass balance by site and site type, Las Vegas, NV; not shown]
Chow, 1998
19. Level III/IV: Unusual Value Identification
- Extreme values
- Values that normally track the values of other variables in a time series
- Values that normally follow a qualitatively predictable spatial or temporal pattern
The first assumption upon finding a measurement that is inconsistent with physical expectations is that the unusual value is due to a measurement error. If, upon tracing the path of the measurement, nothing unusual is found, the value can be assumed to be a valid result of an environmental cause.
Chow et al., 1996
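One simple way to flag candidates for this kind of review is to compare each value against the recent history of the same time series, as in the sketch below; the synthetic series, window length, and 4-sigma threshold are illustrative assumptions, not prescribed criteria.

```python
import numpy as np

# Minimal sketch of a screen for unusual values: flag points that depart
# sharply from the recent history of the same monitor's time series.
rng = np.random.default_rng(0)
series = rng.normal(loc=12.0, scale=3.0, size=120)   # synthetic daily PM2.5
series[75] = 55.0                                    # one injected unusual value

window = 14
flags = []
for i in range(window, len(series)):
    history = series[i - window:i]
    z = (series[i] - history.mean()) / history.std(ddof=1)
    if abs(z) > 4:            # threshold is an assumption for illustration
        flags.append((i, round(series[i], 1), round(z, 1)))

print("candidate unusual values (index, value, z):", flags)
# A flagged value is first assumed to be a measurement error; only after the
# measurement path is traced and nothing unusual is found should it be
# accepted as a valid result of an environmental cause (e.g., fireworks).
```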
20. Example: Unusual Value Identification
- Potassium nitrate (KNO3) is a major component of all fireworks.
- This figure shows all available PM2.5 K data from all North American sites, averaged to produce a continental average for each day during 1988-1997.
- Fourth of July celebration fireworks are clearly observed in the potassium time series.
- Fireworks displays on local holidays/events could have a similar effect on data.
Poirot, 1998
Regional averaging and counting of sample numbers were conducted in Voyager, using variations of the Voyager script on p. 6 of the Voyager Workbook Kvoy.wkb. Additional averaging and plotting were conducted in Microsoft Excel.
21. Data Validation Continues During Data Analysis
- Two source apportionment models were applied to PM2.5 data collected in Vermont, and the results of the models were compared.
- Excellent agreement for the selenium source was observed for part of the data, while the rest of the results did not agree well.
- Further investigation showed that the period of good agreement coincided with a change in laboratory analysis (with an accompanying change in detection limit and measurement uncertainty; the two models treat these quantities differently).
Poirot, 1999
22. Validation of PM2.5 Mass
- Consistent validation of PM2.5 mass concentrations across the U.S. is needed. To aid in this, three tables of criteria were developed and are provided in the appendix to this section of the workbook.
- Observations that do not meet each and every criterion on the Critical Criteria Table should be invalidated unless there are compelling reasons and justification not to do so.
- Criteria that are important for maintaining and evaluating the quality of the data collection system are included in the Operational Evaluations Table. Violation of a criterion or a number of criteria may be cause for invalidation.
- Criteria important for the correct interpretation of the data, but that do not usually impact the validity of a sample or group of samples, are included on the Systematic Issues Table.
U.S. EPA, 1999c
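Applying the Critical Criteria Table can be automated once its limits are encoded. The sketch below shows the general pattern with a few illustrative checks; the field names and threshold values are assumptions made for the example, and the authoritative limits are those in the table itself (U.S. EPA, 1999c) and 40 CFR 50, Appendix L.

```python
# Illustrative sketch of applying critical criteria to a PM2.5 sample record.
# The field names and threshold values are assumptions for illustration only;
# consult the Critical Criteria Table (U.S. EPA, 1999c) for the actual limits.

CRITICAL_CHECKS = {
    "flow_rate_ok":   lambda s: abs(s["avg_flow_lpm"] - 16.67) / 16.67 <= 0.05,
    "duration_ok":    lambda s: 1380 <= s["duration_min"] <= 1500,
    "filter_temp_ok": lambda s: s["max_filter_temp_diff_c"] <= 5.0,
}

def validate_sample(sample):
    """Return (valid, failed_criteria); any failed critical criterion
    invalidates the sample unless there is compelling justification."""
    failed = [name for name, check in CRITICAL_CHECKS.items() if not check(sample)]
    return (len(failed) == 0, failed)

sample = {"avg_flow_lpm": 16.9, "duration_min": 1310, "max_filter_temp_diff_c": 2.1}
print(validate_sample(sample))   # -> (False, ['duration_ok'])
```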
23. Information to be Provided with PM Sampler Data
These supplemental measurements will be useful to help explain or caveat unusual data.
40 CFR 50, Appendix L, Table L-1
24. Are Measurements Comparable?
- Example: comparison of 24-hr average TEOM (from hourly measurements), IMPROVE (gravimetric mass from the A filter), and FRM PM2.5 mass measurements made in New Haven, CT during the third and fourth quarters of 1998.
- During the colder months at this site, the TEOM seems to report a lower concentration than the FRM.
PM2.5 average values (µg/m3), New Haven, CT, 1998 (number of samples in the calculated average). For example, in the third quarter, TEOM and IMPROVE samples ran concurrently on 24 days. The ten values where all three samplers ran are a subset of the 24.
Graham, 1999
25. National Contract Lab Responsibilities
- Discussion of national contract laboratory responsibilities to be added.
26. Data Access (1 of 2)
- Official data sources
  - AIRS Data via public web at http://www.epa.gov/airsdata
  - AIRS Air Quality System (AQS) via registered users; register with EPA/NCC (703-487-4630)
- PM2.5 websites via public web
  - PM2.5 Data Analysis Workbook at http://capita.wustl.edu/databases/userdomain/pmfine/
  - EPA PM2.5 Data Analysis clearinghouse at http://www.epa.gov/oar/oaqps/pm25/
  - Northern Front Range Air Quality Study at http://nfraqs.cira.colostate.edu/index2.html
  - NEARDAT at http://capita.wustl.edu/NEARDAT
27. Data Access (2 of 2)
- Secondary data sources
  - Meteorological parameters from the National Weather Service (NWS): http://www.nws.noaa.gov
  - Meteorological parameters from PAMS/AIRS AQS; register with EPA/NCC (703-487-4630)
  - Collocated or nearby SO2, nitrogen oxides, CO, and VOC from AIRS AQS
  - Private meteorological agencies (e.g., forestry service, agricultural monitoring, industrial facilities)
28. Sample Size Issues
- How complete must data be to show that an area meets the NAAQS for PM?
U.S. EPA, 1999b
Sample size requirements for data analyses will vary depending upon the analysis type, the analysis goals, the variability in the data, and other factors.
29. Summary
- Data validation is vital because serious errors in data analysis and modeling results can be caused by erroneous individual data values.
- This workbook section provides a discussion of data validation levels, example validation checks, and other information important to the data validation process.
30. References
- Ayers G.P., Keywood M.D., and Gras J.L. (1999) TEOM vs. manual gravimetric methods for determination of PM2.5 aerosol mass concentrations. Atmos. Environ., 33, pp. 3717-3721.
- Chow J.C. and Watson J.G. (1998) Guideline on speciated particulate monitoring. Draft report 3 prepared by Desert Research Institute for the U.S. EPA Office of Air Quality Planning and Standards. August.
- Chow J.C. (1998) Descriptive data analysis methods. Presentation prepared by Desert Research Institute for the U.S. EPA, Research Triangle Park, NC, November.
- Chow J.C., Watson J.G., Lu Z., Lowenthal D.H., Frazier C.A., Solomon P.A., Thuillier R.H., and Magliano K. (1996) Descriptive analysis of PM2.5 and PM10 at regionally representative locations during SJVAQS/AUSPEX. Atmos. Environ., 30 (12), pp. 2079-2112.
- Chow J.C. (1995) Measurement methods to determine compliance with ambient air quality standards for suspended particles. J. Air Waste Manage. Assoc., 45, pp. 320-382.
- Graham J. (1999) Personal communication.
- Homolya J.B., Rice J., and Scheffe R.D. (1998) PM2.5 speciation - objectives, requirements, and approach. Presentation. September.
- Main H.H., Chinkin L.R., and Roberts P.T. (1998) PAMS data analysis workshops: illustrating the use of PAMS data to support ozone control programs. Web page prepared for the U.S. Environmental Protection Agency, Research Triangle Park, NC, by Sonoma Technology, Inc., Petaluma, CA, <http://www.epa.gov/oar/oaqps/pams/analysis>, STI-997280-1824, June.
- Poirot R. (1999) Personal communication.
- Poirot R. (1998) Tracers of opportunity: potassium. Paper available at http://capita.wustl.edu/PMFine/Workgroup/SourceAttribution/Reports/In-progress/Potass/ktext.html
- U.S. Environmental Protection Agency (1984) Quality assurance handbook for air pollution measurement systems, Volume II: Ambient air specific methods (interim edition), EPA/600/R-94/0386, April.
- U.S. Environmental Protection Agency (1999a) Particulate matter (PM2.5) speciation guidance document. Available at http://www.epa.gov/ttn/amtic/files/ambient/pm25/spec/specpln3.pdf
- U.S. Environmental Protection Agency (1999b) Guideline on data handling conventions for the PM NAAQS. EPA-454/R-99-008, April.
- U.S. Environmental Protection Agency (1999c) PM2.5 mass validation criteria. Available at http://www.epa.gov/ttn/amtic/pmqa.html
31. Critical Criteria Table
U.S. EPA, 1999c
32. Operational Evaluations Table (1 of 2)
U.S. EPA, 1999c
33. Operational Evaluations Table (2 of 2)
U.S. EPA, 1999c
34. Systematic Issues
U.S. EPA, 1999c