FOUR METHODS OF ESTIMATING PM2.5 ANNUAL AVERAGES - PowerPoint PPT Presentation

1 / 56
About This Presentation
Title:

FOUR METHODS OF ESTIMATING PM2.5 ANNUAL AVERAGES

Description:

Averaged the quarter predictions to get annual average estimate. Annual Average Predictions ... Also must decide whether to transform each day using the same function. ... – PowerPoint PPT presentation

Number of Views:70
Avg rating:3.0/5.0
Slides: 57
Provided by: amyn7
Category:

less

Transcript and Presenter's Notes

Title: FOUR METHODS OF ESTIMATING PM2.5 ANNUAL AVERAGES


1
FOUR METHODS OF ESTIMATING PM2.5 ANNUAL AVERAGES
  • Yan Liu and Amy Nail
  • Department of Statistics
  • North Carolina State University
  • EPA
  • Office of Air Quality, Planning, and Standards
  • Emissions Monitoring, and Analysis Division

2
Project Objectives
  • Estimation of annual average of PM2.5
    concentration
  • Estimation of standard errors associated with
    annual average estimates
  • Estimation of the probability that a sites
    annual average exceeds 15 mg/m3
  • At 2400 lattice points for 2000, 2001
  • Comparisons of 4 different methodologies
  • 1. Quarter-based analysis (Yan)
  • 2. Annual-based analysis (Yan)
  • Daily-based analyses
  • 3. Doug Nychkas method (Bill)
  • 4. Generalized least squares in
  • SAS Proc Mixed (Amy)

3
Why are Standard Errors Important?
  • We may estimate that the annual average for
    lattice point 329 is 16 mg/m3, which exceeds the
    standard of 15. But since our estimate has some
    uncertainty or standard error, wed like to take
    this uncertainty into account in order to
    determine the probability that lattice point 329
    exceeds 15.

4
In addition to maps like this ...
5
we also want maps like this.
Note This Map is WRONG--so dont show it to
anyone! We havent figured out the correct way
to determine errors, so we cannot correctly draw
a probability map yet.
6
Data Description
  • Concentrations of PM2.5 measured during 2000,
    2001
  • The domain analyzed the portion of the U.S. east
    of 100o longitude
  • Concentrations measured every third day

7
Map of 2400 Lattice Points
8
Method 1 Quarterly Analysis
  • 3 months in each quarter
  • Q1(Jan. - Mar.) Q2(Apr. - Jun.)
  • Q3(Jul. - Sep.) Q4(Oct. - Dec.)
  • Within quarters, 75 completeness
  • Found quarter mean conc. at each site
  • For each quarter, kriged mean conc. over lattice
  • Averaged the quarter predictions to get annual
    average estimate

9
(No Transcript)
10
(No Transcript)
11
(No Transcript)
12
(No Transcript)
13
Annual Average Predictions
14
Method 2 Annual Analysis
  • Used sites common to all 4 quarters in quarterly
    analysis
  • Found annual mean conc. at each site
  • Kriged annual mean conc. over lattice

15
(No Transcript)
16
The Number of Sites
17
Model for Quarterly and Annual Analyses
  • Predicted value
  • quadratic surface prediction (SP)
  • error prediction (KP)

18
Estimating Quadratic Surface
  • Model
  • Conc ?0 ?1lat ?2lon ?3lat2 ?4lon2
    ?5lat lon ?
  • Assume 1) E(?) 0, Var(?) ?2 I
  • 2) The betas are
    estimated by SAS
  • assuming errors iid
  • Fit parameters using ordinary least squares in
    SAS proc reg
  • Obtained surface predictions (SP) and their
    standard errors (SEsp) and the ?s

19
Kriging the Error Surface
  • Model
  • ?(s) s ? R2
    E(?(s) ) 0 Var(?(s) - ?(s) )
    0 if ss
    ?2n ?2(1- e-dist/?) if s?s
  • Estimated variogram parameters using nonlinear
    least squares in Splus
  • Obtained kriging predictions (KP) and their
    standard errors (SEkp)

20
Variogram Models
  • 3 commonly used variogram models
  • Exponential
  • ?(h)1 exp (-3h/a)
  • Spherical
  • ?(h)1.5 (h/a) - 0.5 (h/a)3 if h ?a
  • ?(h)1 otherwise
  • Gaussian
  • ?(h)1 - exp (-3h2 /a2)
  • a range
  • h distance

21
Cross Validation to Select Variogram Model
  • Idea temporarily remove the sample value at a
    particular location one at a time, estimate this
    value from remaining data using the different
    variogram models.
  • Prediction error observed - predicted
  • MSE 1/(n-1) ?(prediction error)2

22
Cross Validation MSE for Three Variogram Models
Exponential model has the least MSE.
Conclusion use Exponential model
23
Calculating Predicted Annual Averages
  • Quarter averages
  • PQi SPQi KPQi
  • Annual average from quarterly analysis
  • Pannual ( ? PQi) / 4
  • Annual average from annual analysis
  • Pannual SPannual KPannual

4
i1
24
(No Transcript)
25
Calculation of Standard Error for Annual Averages
  • Standard errors of quarterly averages
  • SEQi ?(SEspi)2 (SEkpi)2
  • Standard errors of annual averages from quarterly
    analysis
  • SEannual ?1/16 ?(SEQi)2
  • Standard errors of annual averages from annual
    analysis
  • SEannual ?(SEsp)2 (SEkp)2

26
Sources of Error
Less than 5 of total errors is coming from
fitting a quadratic surface. Kriging
prediction error dominates.
27
(No Transcript)
28
(No Transcript)
29
(No Transcript)
30
Problems With Quarterly Annual Analysis
  • The surface prediction and kriging prediction are
    not independent.
  • Var (SP KP) ? Var (SP) Var (KP)

31
(No Transcript)
32
(No Transcript)
33
More Problems With Quarterly and Annual Analysis
  • Not using all available data
  • When kriging residuals, estimated variogram is
    biased low (Kim and Boos 2002) (This problem
    could be solved by using generalized least
    squares.)
  • Ignored standard deviation of annual and/or
    quarterly averages in calculation of kriging
    prediction error
  • Quarterly averages may not be independent

34
Methods 3 4 - Daily-Based
  • Used every third day data (122 days per year)
  • Kriged each day to obtain predictions at 2400
    lattice points
  • At each lattice point fit a timeseries to the 122
    days estimates to estimate annual average
  • Calculated timeseries error for annual average
    using proc arima

35
Method 3 - Dougs Method
  • Fit a quadratic surface using the Krig function
    in Splus
  • Used an algorithm that minimizes generalized
    cross validation error in order to estimate all
    parameters--including both quadratic surface
    parameters and covariance parameters
  • Did not assume errors iid when fitting quad surf,
    so coefficients in quad surf estimated based on
    cov structure
  • Specified an exponential covariance structure
    with a nugget
  • Provided the fixed value of 200 km for range
    parameter for all 122 days

36
Method 4 - Amys Method
  • Fit a quadratic surface using Generalized Least
    Squares in SAS Proc Mixed
  • Restricted (or residual) Maximum Likelihood used
    to estimate all parameters
  • Did not assume errors iid when fitting quad surf,
    so coefficients in quad surf estimated based on
    cov structure
  • Specified an exponential covariance structure
    with a nugget
  • Estimated each parameter each day

37
(No Transcript)
38
(No Transcript)
39
(No Transcript)
40
(No Transcript)
41
(No Transcript)
42
Problems with Dougs Method
  • Using the same value for range parameter every
    day requires assumption that the range parameter
    is constant over time. Not a valid assumption.
    Amys method does not make this assumption.
  • Ignored kriging prediction error in calculation
    of timeseries error for annual average.

43
Problems with Amys Method
  • REML assumes data for each day is normally
    distributed. It isnt. Can fix by using a
    transformation, but must be careful not to
    introduce bias in back-transform. There is an
    unbiased back-transform predictor and an
    associated estimate of error in Cressie section
    3.2.2. Also must decide whether to transform
    each day using the same function. Dougs method
    does not require normality assumption.
  • Ignored kriging prediction error in calculation
    of timeseries error for annual average.

44
What if we propagate errors?
  • At a given lattice point we have 122 days worth
    of predictions, each with a kriging prediction
    error. What if we treat the 122 days as
    independent observations (they arent, they are
    AR1) and combine the errors accordingly? And we
    do this for each of our 2400 lattice points.

45
(No Transcript)
46
(No Transcript)
47
The Big Problem
  • None of our standard error estimates are correct!
  • They are all underestimates!
  • We need to learn how to put spatial error
    components together with temporal error
    components.

48
Model for one day
  • Yij ?o ?1i ?2i2 ?3j ?4j2 ?5ij ?ij
  • Where i lattitude j longitude
  • E(?ij) 0
  • Cov(?ij, ?Ij) ?2n ?2e-dist/? iiand jj
    ?2e-dist/? i?i or j?j

49
Model for one site
  • Yk ? ?(Yk-1- ?) ek k 1,,122
  • Where E(ek) 0
  • Var (ek) ?2
  • Note this is an AR1 model. The errors are iid
    (0, ?2) because the temporal correlation is
    accounted for using the ?(Yk-1- ?) term.

50
Model for all sites and days?
  • Yijk ?o,k ?1,ki ?2,ki2 ?3,kj ?4,kj2
    ?5,kij ?ijk eijk
  • Where E(?ijk ) 0, E(eijk) 0
  • Weve assumed isotropy and stationarity for
    simplicity.
  • But how do we model Cov(?ijk, ?Ijk), Cov(eijk,
    eijk), and Cov (?ijk, eijk)?

51
Separability
  • Weve been treating the covariance structure as
    separable--meaning that the 1-D temporal and 2-D
    spatial covariance structures can be estimated
    separately and then can be mathematically
    combined to obtain a 3-D space-time covariance
    structure. We need to test for separability, and
    if the covariance components are separable, we
    need to appropriately combine them. We are just
    now learning how to do this.

52
Next Steps.
  • Re-do Quarterly and Annual analyses using
    generalized least squares
  • Perform Amys analysis using transformations,
    making sure to use an unbiased estimator in the
    back-transform and the appropriate error
    estimator. How much does the lack of normality
    in the original analysis affect results?

53
More next steps.
  • Investigate the separability of the covariance
    structure and the correct method for combining
    space and time covariance components.
  • Attempt a 3-dimensional kriging. No assumption
    of separability is required to do this. We must,
    however, write our own code for this project
    because there is no software package (to our
    knowledge) that performs such an analysis. This
    method would allow us to use even more data than
    we are using now, as we would not be restricted
    to every third day.

54
Thats all, folks!
55
(No Transcript)
56
(No Transcript)
Write a Comment
User Comments (0)
About PowerShow.com