Title: FOUR METHODS OF ESTIMATING PM2.5 ANNUAL AVERAGES
- Yan Liu and Amy Nail
- Department of Statistics
- North Carolina State University
- EPA
- Office of Air Quality Planning and Standards
- Emissions, Monitoring, and Analysis Division
2. Project Objectives
- Estimation of the annual average PM2.5 concentration
- Estimation of the standard errors associated with the annual average estimates
- Estimation of the probability that a site's annual average exceeds 15 µg/m³
- At 2400 lattice points, for 2000 and 2001
- Comparison of 4 different methodologies
- 1. Quarter-based analysis (Yan)
- 2. Annual-based analysis (Yan)
- Daily-based analyses:
- 3. Doug Nychka's method (Bill)
- 4. Generalized least squares in SAS Proc Mixed (Amy)
3. Why are Standard Errors Important?
- We may estimate that the annual average for lattice point 329 is 16 µg/m³, which exceeds the standard of 15 µg/m³. But since our estimate has some uncertainty (standard error), we'd like to take this uncertainty into account in order to determine the probability that lattice point 329 exceeds 15 µg/m³.
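One way to turn an estimate and its standard error into an exceedance probability is a normal approximation. This is an assumption on our part; the slides do not say which distribution is used. A minimal sketch with hypothetical function names; the standard error of 1.0 is purely illustrative, not a project result:

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def exceedance_probability(estimate: float, std_error: float,
                           standard: float = 15.0) -> float:
    """P(true annual average > standard), ASSUMING the true value is
    normally distributed around the estimate with sd = std_error."""
    if std_error <= 0:
        raise ValueError("std_error must be positive")
    z = (standard - estimate) / std_error
    return 1.0 - normal_cdf(z)

# Lattice point 329: estimate 16 ug/m3, illustrative standard error 1.0
p = exceedance_probability(16.0, 1.0)
print(f"P(exceeds 15) = {p:.3f}")  # P(exceeds 15) = 0.841
```

A larger standard error pulls this probability toward 0.5, which is why an underestimated standard error makes an exceedance look more certain than it is.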
4. In addition to maps like this ...
5. ... we also want maps like this.
Note: This map is WRONG, so don't show it to anyone! We haven't figured out the correct way to determine the errors, so we cannot correctly draw a probability map yet.
6. Data Description
- Concentrations of PM2.5 measured during 2000 and 2001
- Domain analyzed: the portion of the U.S. east of 100° W longitude
- Concentrations measured every third day
7. Map of 2400 Lattice Points
8. Method 1: Quarterly Analysis
- 3 months in each quarter
- Q1 (Jan.-Mar.), Q2 (Apr.-Jun.)
- Q3 (Jul.-Sep.), Q4 (Oct.-Dec.)
- Within each quarter, 75% completeness required
- Found the quarterly mean concentration at each site
- For each quarter, kriged the mean concentration over the lattice
- Averaged the quarterly predictions to get the annual average estimate
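A minimal sketch of the per-site quarterly step, assuming the completeness rule means that at least 75% of the scheduled every-third-day samples in a quarter are valid (our reading of the slide). The function name is hypothetical, and the kriging of the site means is not shown:

```python
from statistics import mean
from typing import Optional, Sequence

def quarter_mean(values: Sequence[Optional[float]],
                 min_completeness: float = 0.75) -> Optional[float]:
    """Mean of one site's samples for one quarter.

    `values` holds one entry per scheduled sampling day; None marks a
    missing sample. Returns None when the site fails the completeness rule
    (the fraction of valid samples is below `min_completeness`).
    """
    if not values:
        return None
    valid = [v for v in values if v is not None]
    if len(valid) / len(values) < min_completeness:
        return None
    return mean(valid)

# 30 scheduled days with 24 valid samples (80%) passes the rule
print(quarter_mean([12.0] * 24 + [None] * 6))
```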
13. Annual Average Predictions
14. Method 2: Annual Analysis
- Used the sites common to all 4 quarters in the quarterly analysis
- Found the annual mean concentration at each site
- Kriged the annual mean concentration over the lattice
16. The Number of Sites
17. Model for Quarterly and Annual Analyses
- Predicted value = quadratic surface prediction (SP) + error prediction (KP)
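18. Estimating the Quadratic Surface
- Model: $\text{Conc} = \beta_0 + \beta_1\,\text{lat} + \beta_2\,\text{lon} + \beta_3\,\text{lat}^2 + \beta_4\,\text{lon}^2 + \beta_5\,\text{lat}\cdot\text{lon} + \varepsilon$
- Assumptions: (1) $E(\varepsilon) = 0$, $\mathrm{Var}(\varepsilon) = \sigma^2 I$; (2) the betas are estimated in SAS assuming iid errors
- Fit the parameters using ordinary least squares in SAS proc reg
- Obtained the surface predictions (SP), their standard errors ($SE_{sp}$), and the residuals $\hat\varepsilon$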
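SAS proc reg performs this fit; as an illustration only (not the SAS computation), the same OLS fit can be sketched in plain Python via the normal equations $X^\top X \beta = X^\top y$. Function names are hypothetical. The sketch returns the coefficients and the residuals, which are what get kriged next; the standard errors of the surface predictions are omitted.

```python
from typing import List, Sequence, Tuple

def design_row(lat: float, lon: float) -> List[float]:
    """Quadratic-surface terms: 1, lat, lon, lat^2, lon^2, lat*lon."""
    return [1.0, lat, lon, lat * lat, lon * lon, lat * lon]

def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_quadratic_surface(lats: Sequence[float], lons: Sequence[float],
                          conc: Sequence[float]) -> Tuple[List[float], List[float]]:
    """OLS fit of the quadratic surface; returns (betas, residuals).

    In practice, center lat/lon first: raw lon^2 values near 10^4 make the
    normal equations badly conditioned.
    """
    X = [design_row(la, lo) for la, lo in zip(lats, lons)]
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(X, conc)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    resid = [y - f for y, f in zip(conc, fitted)]
    return beta, resid
```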
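19. Kriging the Error Surface
- Model: $\varepsilon(s)$, $s \in \mathbb{R}^2$, with $E(\varepsilon(s)) = 0$ and $\tfrac{1}{2}\mathrm{Var}(\varepsilon(s) - \varepsilon(s')) = 0$ if $s = s'$, $\sigma_n^2 + \sigma^2\left(1 - e^{-\mathrm{dist}(s,s')/\theta}\right)$ if $s \neq s'$
- Estimated the variogram parameters (nugget $\sigma_n^2$, partial sill $\sigma^2$, range parameter $\theta$) using nonlinear least squares in Splus
- Obtained the kriging predictions (KP) and their standard errors ($SE_{kp}$)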
20. Variogram Models
- 3 commonly used variogram models
- Exponential: $\gamma(h) = 1 - \exp(-3h/a)$
- Spherical: $\gamma(h) = 1.5\,(h/a) - 0.5\,(h/a)^3$ if $h \le a$; $\gamma(h) = 1$ otherwise
- Gaussian: $\gamma(h) = 1 - \exp(-3h^2/a^2)$
- $a$: range
- $h$: distance
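The three standardized models (unit sill, no nugget) can be written directly as functions. A short sketch with hypothetical names; the factor 3 makes the exponential and Gaussian models reach about 95% of the sill at $h = a$, while the spherical model reaches the sill exactly there.

```python
import math

def exponential(h: float, a: float) -> float:
    """Standardized exponential variogram: 1 - exp(-3h/a)."""
    return 1.0 - math.exp(-3.0 * h / a)

def spherical(h: float, a: float) -> float:
    """Standardized spherical variogram; equals 1 for h >= a."""
    if h <= a:
        r = h / a
        return 1.5 * r - 0.5 * r ** 3
    return 1.0

def gaussian(h: float, a: float) -> float:
    """Standardized Gaussian variogram: 1 - exp(-3h^2/a^2), parabolic near 0."""
    return 1.0 - math.exp(-3.0 * h * h / (a * a))

for name, f in [("exponential", exponential), ("spherical", spherical),
                ("gaussian", gaussian)]:
    print(name, [round(f(h, 200.0), 3) for h in (0.0, 100.0, 200.0, 400.0)])
```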
21. Cross-Validation to Select the Variogram Model
- Idea: temporarily remove the sample value at one location at a time, and estimate this value from the remaining data using each of the candidate variogram models
- Prediction error = observed - predicted
- $\text{MSE} = \frac{1}{n-1}\sum (\text{prediction error})^2$
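The leave-one-out loop can be sketched generically: the hypothetical `loo_cv_mse` takes any predictor function, so one would call it once per candidate variogram model (kriging with the exponential, spherical, or Gaussian variogram) and keep the model with the smallest MSE. The placeholder `mean_predictor` ignores location entirely and exists only to make the example runnable. The $1/(n-1)$ normalization follows the slide.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
# predictor(train_points, train_values, target_point) -> predicted value
Predictor = Callable[[List[Point], List[float], Point], float]

def loo_cv_mse(points: Sequence[Point], values: Sequence[float],
               predict: Predictor) -> float:
    """Leave-one-out cross-validation MSE with 1/(n-1) normalization."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    sq_err = 0.0
    for k in range(n):
        train_pts = [p for i, p in enumerate(points) if i != k]
        train_vals = [v for i, v in enumerate(values) if i != k]
        err = values[k] - predict(train_pts, train_vals, points[k])  # observed - predicted
        sq_err += err * err
    return sq_err / (n - 1)

def mean_predictor(pts: List[Point], vals: List[float], target: Point) -> float:
    """Placeholder: swap in kriging with each candidate variogram model."""
    return sum(vals) / len(vals)

print(loo_cv_mse([(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)], [1.0, 2.0, 3.0], mean_predictor))  # 2.25
```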
22. Cross-Validation MSE for Three Variogram Models
The exponential model has the smallest MSE.
Conclusion: use the exponential model.
23. Calculating Predicted Annual Averages
- Quarterly averages: $P_{Q_i} = SP_{Q_i} + KP_{Q_i}$, $i = 1, \dots, 4$
- Annual average from the quarterly analysis: $P_{annual} = \frac{1}{4}\sum_{i=1}^{4} P_{Q_i}$
- Annual average from the annual analysis: $P_{annual} = SP_{annual} + KP_{annual}$
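At a single lattice point, the quarterly route is just two small steps. A sketch with hypothetical names; the input numbers are illustrative, not project results:

```python
from typing import List, Sequence

def quarter_predictions(sp: Sequence[float], kp: Sequence[float]) -> List[float]:
    """P_Qi = SP_Qi + KP_Qi for each quarter at one lattice point."""
    return [s + k for s, k in zip(sp, kp)]

def annual_from_quarters(p_q: Sequence[float]) -> float:
    """P_annual = (1/4) * sum of the four quarterly predictions."""
    if len(p_q) != 4:
        raise ValueError("expected four quarterly predictions")
    return sum(p_q) / 4.0

# Illustrative values in ug/m3
p_q = quarter_predictions([10.0, 12.0, 14.0, 11.0], [1.0, -1.0, 0.5, 0.5])
print(p_q, annual_from_quarters(p_q))  # [11.0, 11.0, 14.5, 11.5] 12.0
```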
25. Calculation of Standard Errors for Annual Averages
- Standard errors of quarterly averages: $SE_{Q_i} = \sqrt{SE_{sp,i}^2 + SE_{kp,i}^2}$
- Standard errors of annual averages from the quarterly analysis: $SE_{annual} = \sqrt{\frac{1}{16}\sum_{i=1}^{4} SE_{Q_i}^2}$
- Standard errors of annual averages from the annual analysis: $SE_{annual} = \sqrt{SE_{sp}^2 + SE_{kp}^2}$
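The same formulas as a sketch (hypothetical names, illustrative inputs). The comments spell out the independence assumptions that the root-sum-of-squares form quietly relies on:

```python
import math
from typing import Sequence

def se_combined(se_sp: float, se_kp: float) -> float:
    """sqrt(SE_sp^2 + SE_kp^2): valid only if SP and KP errors are independent."""
    return math.sqrt(se_sp ** 2 + se_kp ** 2)

def se_annual_from_quarters(se_q: Sequence[float]) -> float:
    """sqrt((1/16) * sum SE_Qi^2): valid only if the four quarterly errors are independent."""
    if len(se_q) != 4:
        raise ValueError("expected four quarterly standard errors")
    return math.sqrt(sum(s * s for s in se_q) / 16.0)

se_q = [se_combined(0.3, 0.4) for _ in range(4)]  # illustrative values
print(se_annual_from_quarters(se_q))
```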
26. Sources of Error
Less than 5% of the total error comes from fitting the quadratic surface; the kriging prediction error dominates.
30. Problems With Quarterly and Annual Analysis
- The surface prediction and the kriging prediction are not independent, so $\mathrm{Var}(SP + KP) \neq \mathrm{Var}(SP) + \mathrm{Var}(KP)$
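Written out, this is the standard variance-of-a-sum identity. The square-root formulas on slide 25 drop the covariance term, which is only valid when that term is zero:

```latex
\mathrm{Var}(SP + KP) = \mathrm{Var}(SP) + \mathrm{Var}(KP) + 2\,\mathrm{Cov}(SP, KP)
```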
33. More Problems With Quarterly and Annual Analysis
- Not using all available data
- When kriging residuals, the estimated variogram is biased low (Kim and Boos 2002). (This problem could be solved by using generalized least squares.)
- Ignored the standard deviation of the annual and/or quarterly averages when calculating the kriging prediction error
- Quarterly averages may not be independent
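If the quarterly errors are correlated, the variance of the quarter-based annual average picks up covariance terms that $SE_{annual} = \sqrt{\frac{1}{16}\sum SE_{Q_i}^2}$ leaves out. If those covariances are positive, the formula without them underestimates the standard error:

```latex
\mathrm{Var}\!\left(\frac{1}{4}\sum_{i=1}^{4} P_{Q_i}\right)
  = \frac{1}{16}\left[\sum_{i=1}^{4}\mathrm{Var}(P_{Q_i})
  + 2\sum_{1 \le i < j \le 4}\mathrm{Cov}(P_{Q_i}, P_{Q_j})\right]
```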
34. Methods 3 and 4: Daily-Based
- Used every-third-day data (122 days per year)
- Kriged each day to obtain predictions at the 2400 lattice points
- At each lattice point, fit a time series to the 122 daily estimates to estimate the annual average
- Calculated the time series error for the annual average using proc arima
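The slides use SAS proc arima for the time series step. As a swapped-in stand-in (a method-of-moments / Yule-Walker estimate, not what proc arima does by default), the AR(1) parameters for one lattice point's 122 daily estimates can be sketched from the sample mean and lag-1 autocorrelation. The function name is hypothetical.

```python
from typing import Sequence, Tuple

def fit_ar1_yule_walker(y: Sequence[float]) -> Tuple[float, float, float]:
    """Return (mu_hat, phi_hat, sigma2_hat) for Y_k = mu + phi (Y_{k-1} - mu) + e_k.

    mu_hat is the sample mean (the annual-average estimate), phi_hat is the
    lag-1 sample autocorrelation, and sigma2_hat = c0 * (1 - phi_hat^2) is
    the implied innovation variance.
    """
    n = len(y)
    if n < 3:
        raise ValueError("series too short")
    mu = sum(y) / n
    d = [v - mu for v in y]
    c0 = sum(v * v for v in d) / n                      # lag-0 autocovariance
    if c0 == 0.0:
        raise ValueError("constant series")
    c1 = sum(d[k] * d[k + 1] for k in range(n - 1)) / n  # lag-1 autocovariance
    phi = c1 / c0
    sigma2 = c0 * (1.0 - phi * phi)
    return mu, phi, sigma2
```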
35. Method 3: Doug's Method
- Fit a quadratic surface using the Krig function in Splus
- Used an algorithm that minimizes the generalized cross-validation error to estimate all parameters, including both the quadratic surface parameters and the covariance parameters
- Did not assume iid errors when fitting the quadratic surface, so its coefficients were estimated based on the covariance structure
- Specified an exponential covariance structure with a nugget
- Provided a fixed value of 200 km for the range parameter for all 122 days
36. Method 4: Amy's Method
- Fit a quadratic surface using generalized least squares in SAS Proc Mixed
- Used restricted (or residual) maximum likelihood (REML) to estimate all parameters
- Did not assume iid errors when fitting the quadratic surface, so its coefficients were estimated based on the covariance structure
- Specified an exponential covariance structure with a nugget
- Estimated every parameter separately for each day
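For reference, the generalized least squares estimator of the quadratic-surface coefficients has the standard form below. Here $X$ holds the six quadratic-surface columns, $d_{ab}$ is the distance between sites $a$ and $b$, and $\hat\Sigma$ is built from the estimated nugget $\hat\sigma_n^2$, partial sill $\hat\sigma^2$, and range $\hat\theta$ (symbols chosen for illustration). This is the textbook formula, not a description of Proc Mixed internals:

```latex
\hat{\boldsymbol\beta}_{GLS} = \left(X^\top \hat\Sigma^{-1} X\right)^{-1} X^\top \hat\Sigma^{-1}\mathbf{y},
\qquad
\hat\Sigma_{ab} =
\begin{cases}
\hat\sigma_n^2 + \hat\sigma^2, & a = b \\
\hat\sigma^2\, e^{-d_{ab}/\hat\theta}, & a \neq b
\end{cases}
```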
42. Problems with Doug's Method
- Using the same value for the range parameter every day requires assuming that the range parameter is constant over time. This is not a valid assumption. Amy's method does not make this assumption.
- Ignored the kriging prediction error when calculating the time series error for the annual average.
43. Problems with Amy's Method
- REML assumes that the data for each day are normally distributed. They aren't. This can be fixed by using a transformation, but we must be careful not to introduce bias in the back-transform. There is an unbiased back-transform predictor and an associated estimate of error in Cressie, section 3.2.2. We must also decide whether to transform each day using the same function. Doug's method does not require the normality assumption.
- Ignored the kriging prediction error when calculating the time series error for the annual average.
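The simplest case of the back-transform bias can be sketched assuming a log transform and Gaussian simple kriging on the log scale. If $\log Z(s_0)$ given the data is $N(p, s^2)$, with $s^2$ the simple kriging variance, then $E[Z(s_0) \mid \text{data}] = \exp(p + s^2/2)$, while the naive $\exp(p)$ is only the conditional median. This is the lognormal special case only; the general trans-Gaussian predictor is in Cressie, section 3.2.2. Function names are hypothetical.

```python
import math

def naive_back_transform(pred_log: float) -> float:
    """exp(log-scale prediction): the conditional median, biased low for the mean."""
    return math.exp(pred_log)

def lognormal_back_transform(pred_log: float, krig_var_log: float) -> float:
    """Conditional mean exp(p + s^2/2), ASSUMING Gaussian simple kriging
    on the log scale with kriging variance s^2."""
    return math.exp(pred_log + 0.5 * krig_var_log)

def lognormal_prediction_sd(pred_log: float, krig_var_log: float) -> float:
    """Conditional sd of Z under the same lognormal assumption:
    mean * sqrt(exp(s^2) - 1)."""
    m = lognormal_back_transform(pred_log, krig_var_log)
    return m * math.sqrt(math.exp(krig_var_log) - 1.0)

p, s2 = math.log(10.0), 0.2  # illustrative log-scale prediction and variance
print(naive_back_transform(p), lognormal_back_transform(p, s2), lognormal_prediction_sd(p, s2))
```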
44. What if we propagate errors?
- At a given lattice point we have 122 days' worth of predictions, each with a kriging prediction error. What if we treat the 122 days as independent observations (they aren't; they are AR(1)) and combine the errors accordingly, doing this for each of our 2400 lattice points?
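A sketch of that propagation at one lattice point, with hypothetical names. The first function treats the daily kriging errors as independent, as the slide proposes; the second assumes (our assumption) AR(1)-style correlation $\phi^{|k-m|}$ between the errors on days $k$ and $m$, which shows how positive correlation inflates the standard error of the mean.

```python
import math
from typing import Sequence

def se_of_mean_independent(se_daily: Sequence[float]) -> float:
    """SE of the mean of n daily predictions with independent errors:
    sqrt(sum SE_k^2) / n."""
    n = len(se_daily)
    return math.sqrt(sum(s * s for s in se_daily)) / n

def se_of_mean_ar1(se_daily: Sequence[float], phi: float) -> float:
    """SE of the mean ASSUMING corr(error_k, error_m) = phi^|k - m|."""
    n = len(se_daily)
    total = 0.0
    for k in range(n):
        for m in range(n):
            total += se_daily[k] * se_daily[m] * phi ** abs(k - m)
    return math.sqrt(total) / n

se = [2.0] * 122  # illustrative daily kriging standard errors
print(se_of_mean_independent(se), se_of_mean_ar1(se, 0.5))
```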
47. The Big Problem
- None of our standard error estimates are correct!
- They are all underestimates!
- We need to learn how to put spatial error
components together with temporal error
components.
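48. Model for one day
- $Y_{ij} = \beta_0 + \beta_1 i + \beta_2 i^2 + \beta_3 j + \beta_4 j^2 + \beta_5 ij + \varepsilon_{ij}$
- where $i$ = latitude and $j$ = longitude
- $E(\varepsilon_{ij}) = 0$
- $\mathrm{Cov}(\varepsilon_{ij}, \varepsilon_{i'j'}) = \sigma_n^2 + \sigma^2$ if $i = i'$ and $j = j'$ (dist = 0); $\sigma^2 e^{-\mathrm{dist}/\theta}$ if $i \neq i'$ or $j \neq j'$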
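Under this one-day covariance model, predicting the residual surface at an unsampled lattice point can be sketched as simple kriging. Assumptions (ours): the quadratic trend has already been removed, so the residuals have known mean 0; distances are planar in km; and the target is the nugget-free residual surface. This is a swapped-in simplification, since Methods 3 and 4 estimate the trend and covariance jointly. Names are hypothetical, and the 200 km range below only echoes the fixed value in Doug's method.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

def cov(d: float, nugget: float, sill: float, rng: float) -> float:
    """Exponential covariance with nugget: nugget + sill at d = 0, else sill * exp(-d / rng)."""
    return nugget + sill if d == 0.0 else sill * math.exp(-d / rng)

def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def simple_krige(sites: Sequence[Point], resid: Sequence[float], target: Point,
                 nugget: float, sill: float, rng: float) -> Tuple[float, float]:
    """Simple kriging of mean-zero residuals; returns (prediction, kriging variance)."""
    n = len(sites)
    C = [[cov(math.dist(sites[i], sites[j]), nugget, sill, rng) for j in range(n)]
         for i in range(n)]
    # Covariance of each noisy observation with the nugget-free signal at the target
    c0 = [sill * math.exp(-math.dist(s, target) / rng) for s in sites]
    lam = solve(C, c0)  # kriging weights
    pred = sum(w * r for w, r in zip(lam, resid))
    var = sill - sum(w * c for w, c in zip(lam, c0))
    return pred, var

print(simple_krige([(0.0, 0.0), (150.0, 0.0), (0.0, 120.0)], [1.2, -0.4, 0.3],
                   (60.0, 60.0), 0.2, 1.0, 200.0))
```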
49. Model for one site
- $Y_k = \mu + \phi\,(Y_{k-1} - \mu) + e_k$, $k = 1, \dots, 122$
- where $E(e_k) = 0$
- $\mathrm{Var}(e_k) = \sigma^2$
- Note: this is an AR(1) model. The errors are iid $(0, \sigma^2)$ because the temporal correlation is accounted for by the $\phi\,(Y_{k-1} - \mu)$ term.
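A short simulation makes the AR(1) model concrete. Normal innovations are our assumption (the slide states only mean 0 and variance $\sigma^2$), and the function name is hypothetical. With $\sigma = 0$ the series decays geometrically toward $\mu$, which is a handy sanity check.

```python
import random
from typing import List, Optional

def simulate_ar1(mu: float, phi: float, sigma: float, n: int = 122,
                 y0: Optional[float] = None, seed: int = 0) -> List[float]:
    """Simulate Y_k = mu + phi * (Y_{k-1} - mu) + e_k with e_k iid N(0, sigma^2)."""
    rng = random.Random(seed)
    prev = mu if y0 is None else y0  # starting value Y_0
    out = []
    for _ in range(n):
        prev = mu + phi * (prev - mu) + rng.gauss(0.0, sigma)
        out.append(prev)
    return out

series = simulate_ar1(mu=12.0, phi=0.5, sigma=3.0)  # illustrative parameters
print(len(series), sum(series) / len(series))
```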
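50. Model for all sites and days?
- $Y_{ijk} = \beta_{0,k} + \beta_{1,k}\, i + \beta_{2,k}\, i^2 + \beta_{3,k}\, j + \beta_{4,k}\, j^2 + \beta_{5,k}\, ij + \varepsilon_{ijk} + e_{ijk}$
- where $E(\varepsilon_{ijk}) = 0$ and $E(e_{ijk}) = 0$
- We've assumed isotropy and stationarity for simplicity.
- But how do we model $\mathrm{Cov}(\varepsilon_{ijk}, \varepsilon_{i'j'k'})$, $\mathrm{Cov}(e_{ijk}, e_{i'j'k'})$, and $\mathrm{Cov}(\varepsilon_{ijk}, e_{i'j'k'})$?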
51. Separability
- We've been treating the covariance structure as separable, meaning that the 1-D temporal and 2-D spatial covariance structures can be estimated separately and then mathematically combined to obtain a 3-D space-time covariance structure. We need to test for separability, and if the covariance components are separable, we need to combine them appropriately. We are just now learning how to do this.
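One common definition of separability is the product form $C(h, u) = C_S(h)\,C_T(u)$ for spatial lag $h$ and temporal lag $u$. For a fixed set of sites observed on the same days, the space-time covariance matrix is then the Kronecker product of the temporal and spatial matrices. The sketch below checks that identity using hypothetical components (exponential in space without a nugget, $\phi^{|u|}$ in time with $u$ in days). It shows what combining the components would mean under separability; it does not test whether the data are separable.

```python
import math
from typing import List

def kron(a: List[List[float]], b: List[List[float]]) -> List[List[float]]:
    """Kronecker product of two dense matrices."""
    return [[a[i][j] * b[k][q] for j in range(len(a[0])) for q in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

def c_space(h: float, sill: float = 1.0, rng: float = 200.0) -> float:
    """Hypothetical spatial covariance: exponential, no nugget."""
    return sill * math.exp(-h / rng)

def c_time(u: int, phi: float = 0.6) -> float:
    """Hypothetical temporal covariance: phi^|u|, u in days."""
    return phi ** abs(u)

def separable_cov(h: float, u: int) -> float:
    """Separable space-time covariance C(h, u) = C_S(h) * C_T(u)."""
    return c_space(h) * c_time(u)

sites = [(0.0, 0.0), (100.0, 0.0), (0.0, 150.0)]
days = [0, 3, 6]  # every third day
# Day-major ordering: row index = t * len(sites) + s
full = [[separable_cov(math.dist(sites[s1], sites[s2]), days[t1] - days[t2])
         for t2 in range(len(days)) for s2 in range(len(sites))]
        for t1 in range(len(days)) for s1 in range(len(sites))]
S = [[c_space(math.dist(p, q)) for q in sites] for p in sites]
T = [[c_time(d1 - d2) for d2 in days] for d1 in days]
K = kron(T, S)
print(max(abs(full[r][c] - K[r][c]) for r in range(9) for c in range(9)))
```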
52. Next Steps
- Re-do the quarterly and annual analyses using generalized least squares
- Perform Amy's analysis using transformations, making sure to use an unbiased estimator in the back-transform and the appropriate error estimator. How much does the lack of normality in the original analysis affect the results?
53. More next steps
- Investigate the separability of the covariance structure and the correct method for combining space and time covariance components.
- Attempt a 3-dimensional kriging. No assumption of separability is required to do this. We must, however, write our own code for this project, because there is no software package (to our knowledge) that performs such an analysis. This method would allow us to use even more data than we are using now, as we would not be restricted to every third day.
54. That's all, folks!