Title: Statistics in WR: Session 20
1. Statistics in WR: Session 20
- Introduction to Spatial Statistics
- Ernest To
2. Outline
- Basics of spatial statistics
- Kriging
- Application of spatial-temporal statistics
(Gravity currents in Corpus Christi Bay)
3. Basics
4. Consider the following scenario
- Two river stations, A and B, measure dissolved oxygen (DO).
- At Station A
  - mean DO: µA = 5 mg/L
  - std dev: σA = 2 mg/L
- At Station B
  - mean DO: µB = 5 mg/L
  - std dev: σB = 2 mg/L
- Correlation between measurements at Stations A and B: ρAB = 0.5.
(Figure: Stations A and B along the river)
5. New data!
- We collected a DO measurement of 2 mg/L at Station A.
- What is the updated mean (µB|A) and standard deviation (σB|A) at Station B?
- (Assume that the DO distributions are normal.)
(Figure: Station A with µA = 5 mg/L, σA = 2 mg/L, and new sample xA = 2 mg/L; Station B with µB = 5 mg/L, σB = 2 mg/L, µB|A = ?, σB|A = ?)
6. Let's sketch out the distributions
- Distributions at A and B (assume normal)
- Joint distribution at A and B
(Figure: marginal pdfs f(xA) and f(xB), each with mean 5 mg/L and std dev 2 mg/L, and the joint pdf f(xA, xB) plotted over XA and XB)
7. Marginal and joint distributions
8. How does ρAB affect the shape of the joint distribution?
(Figure: scatter plots of XA vs XB and the corresponding joint distributions f(xA, xB) for ρAB = 0.5, 0.99, 0, and -0.99)
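You can reproduce the effect of ρAB by simulation. Draw pairs (xA, xB) from a bivariate normal with the given correlation, then check their sample correlation. This is a minimal sketch that uses only the Python standard library. The function names, sample size, and random seed are illustrative choices.

```python
import math
import random


def sample_binormal(mu_a, sd_a, mu_b, sd_b, rho, n, seed=0):
    """Draw n (xA, xB) pairs from a bivariate normal with correlation rho."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        # xB reuses z1 with weight rho, so corr(xA, xB) = rho
        xa = mu_a + sd_a * z1
        xb = mu_b + sd_b * (rho * z1 + math.sqrt(1.0 - rho**2) * z2)
        pairs.append((xa, xb))
    return pairs


def sample_correlation(pairs):
    """Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


# The four cases in the figure (mu = 5 mg/L, sigma = 2 mg/L at both stations)
for rho in (0.5, 0.99, 0.0, -0.99):
    pairs = sample_binormal(5.0, 2.0, 5.0, 2.0, rho, 20000)
    print(f"rho = {rho:5.2f}  sample correlation = {sample_correlation(pairs):.3f}")
```

Plotting the pairs from each case gives the elongated (ρ near ±1) or round (ρ = 0) point clouds sketched in the figure.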
9. Bayesian conditioning
- PRIOR STAGE: the prior pdf is the joint distribution of XA and XB.
- CONDITIONALIZATION STAGE: the observed data (xA = 2 mg/L) are used to update the distribution.
- POSTERIOR STAGE: a conditional pdf for XB is generated.
(Figure: the prior joint pdf over XA and XB is sliced at xA = 2 mg/L, giving the conditional pdf of XB)
10. Conditional pdf
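If the prior pdf is binormal (bivariate normal), the conditional pdf of XB given XA = xA is also normal, with
Mean:
$$\mu_{B|A} = \mu_B + \rho_{AB}\,\frac{\sigma_B}{\sigma_A}\,(x_A - \mu_A)$$
Variance:
$$\sigma^2_{B|A} = \sigma_B^2\,(1 - \rho_{AB}^2)$$
(Figure: conditional pdf of XB|XA)
- The variance does not depend on the observed value xA (homoscedasticity).
- The expected value of the conditional pdf is a linear function of the conditioning data.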
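You can evaluate the conditional mean µB|A = µB + ρAB (σB/σA)(xA - µA) and the variance σ²B|A = σB²(1 - ρAB²) directly. This is a minimal sketch that uses the scenario's numbers. The function name is illustrative.

```python
import math


def conditional_normal(mu_a, sd_a, mu_b, sd_b, rho, x_a):
    """Mean and std dev of X_B given X_A = x_a under a bivariate normal prior."""
    mean = mu_b + rho * (sd_b / sd_a) * (x_a - mu_a)
    # The conditional variance does not involve x_a (homoscedasticity)
    var = sd_b**2 * (1.0 - rho**2)
    return mean, math.sqrt(var)


# muA = muB = 5 mg/L, sigmaA = sigmaB = 2 mg/L, rhoAB = 0.5, observed xA = 2 mg/L
mean, sd = conditional_normal(5.0, 2.0, 5.0, 2.0, 0.5, 2.0)
print(round(mean, 2), round(sd, 2))  # 3.5 1.73
```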
11. Back to the problem
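- Updated mean and std. dev. at Station B
  - Mean: $\mu_{B|A} = 5 + 0.5 \cdot \frac{2}{2} \cdot (2 - 5) = 3.5$ mg/L
  - Std. dev.: $\sigma_{B|A} = 2\sqrt{1 - 0.5^2} \approx 1.7$ mg/L
(Figure: Station A with µA = 5 mg/L, σA = 2 mg/L, and new sample xA = 2 mg/L; Station B updated from µB = 5 mg/L, σB = 2 mg/L to µB|A = 3.5 mg/L, σB|A = 1.7 mg/L)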
12. Can we do the same for any two points on the river?
- Yes, we can.
- But only under the following conditions:
  - Normality
  - 2nd-order stationarity
    - Mean does not change with location
    - Variance does not change with location
  - Known mean and variance
  - A function that determines the correlation between two locations
(Figure: Stations A and B along the river, with µ = 5 mg/L and σ = 2 mg/L)
13. Modeling correlation
- In spatial statistics, correlation is modeled as a function of the separation distance between two points,
$$\rho = f(h)$$
where h is the separation distance (aka lag).
- Most of the time, correlation decreases with distance: things that are closer together tend to be more correlated with each other.
14. Estimating correlation model from data
- Imagine the case where we have a smattering of data along an axis.
- Any given pair of data points, i and j, will have two properties:
  1. The semivariance, $\gamma_{ij} = \frac{1}{2}(Z_i - Z_j)^2$
  2. The separation distance, $h_{ij}$
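You can compute the two properties of every pair directly. This is a minimal sketch for data along one axis. The positions and DO values are hypothetical.

```python
from itertools import combinations


def variogram_cloud(xs, zs):
    """(h_ij, gamma_ij) for every pair of data points along a 1-D axis."""
    cloud = []
    for i, j in combinations(range(len(xs)), 2):
        h = abs(xs[i] - xs[j])               # separation distance (lag)
        gamma = 0.5 * (zs[i] - zs[j]) ** 2   # semivariance of the pair
        cloud.append((h, gamma))
    return cloud


# Hypothetical positions (m) and DO readings (mg/L), for illustration only
xs = [0.0, 100.0, 250.0, 400.0]
zs = [5.2, 4.8, 6.1, 3.9]
for h, g in variogram_cloud(xs, zs):
    print(f"h = {h:6.1f} m   gamma = {g:.3f} (mg/L)^2")
```

With n data points there are n(n - 1)/2 pairs, so the four points here give six (h, γ) values.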
15. Estimating correlation model from data
- We can plot the semivariance, γ, of all possible pairs against the lag, h. This gives us a variogram.
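With many points, the cloud of pair semivariances is noisy. A common refinement, which the slide does not describe, is to average the semivariances within lag classes. This is a minimal sketch. It assumes classes centred on multiples of a chosen lag spacing, each with a tolerance of half the spacing.

```python
from itertools import combinations


def experimental_variogram(xs, zs, lag, n_lags):
    """Average pair semivariances in lag classes k*lag +/- lag/2.

    Returns (class centre, mean semivariance, number of pairs) for every
    non-empty class with k = 0 .. n_lags - 1.
    """
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    for i, j in combinations(range(len(xs)), 2):
        h = abs(xs[i] - xs[j])
        k = round(h / lag)                    # nearest lag class
        if k < n_lags:
            sums[k] += 0.5 * (zs[i] - zs[j]) ** 2
            counts[k] += 1
    return [(k * lag, sums[k] / counts[k], counts[k])
            for k in range(n_lags) if counts[k] > 0]


# Evenly spaced toy data (hypothetical) so the result is easy to check by hand
print(experimental_variogram([0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 2.0, 3.0], 1.0, 4))
# [(1.0, 0.5, 3), (2.0, 2.0, 2), (3.0, 4.5, 1)]
```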
16. Estimating correlation model from data
- We can fit a curve through the semivariogram to
model the semivariance as a function of the lag.
This is the variogram model.
17. Estimating correlation model from data
(Figure: fitted variogram model with the sill and range marked)
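There are many ways to fit a curve through the semivariogram, for example by eye or by weighted least squares. This minimal sketch fits a spherical model with zero nugget by ordinary least squares. For each trial range, the best sill has a closed form, and the sketch keeps the trial range with the smallest squared error. The spherical form, the zero nugget, and the synthetic data are all assumptions made for illustration.

```python
def spherical_shape(h, a):
    """Unit-sill spherical variogram: rises from 0 and levels off at 1 at range a."""
    if h >= a:
        return 1.0
    r = h / a
    return 1.5 * r - 0.5 * r**3


def fit_spherical(lags, gammas, trial_ranges):
    """Least-squares fit of gamma(h) = sill * shape(h, a), zero nugget.

    For a fixed trial range the best sill is sum(f*g) / sum(f*f); the trial
    range giving the smallest squared error is kept.
    """
    best = None
    for a in trial_ranges:
        f = [spherical_shape(h, a) for h in lags]
        sill = sum(fi * g for fi, g in zip(f, gammas)) / sum(fi * fi for fi in f)
        sse = sum((g - sill * fi) ** 2 for fi, g in zip(f, gammas))
        if best is None or sse < best[2]:
            best = (a, sill, sse)
    return best[0], best[1]


# Synthetic experimental variogram generated from a known model (range 5, sill 2)
lags = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
gammas = [2.0 * spherical_shape(h, 5.0) for h in lags]
a_fit, sill_fit = fit_spherical(lags, gammas, [3.0, 4.0, 5.0, 6.0, 7.0])
print(a_fit, round(sill_fit, 3))  # 5.0 2.0
```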
18. Estimating correlation model from data
- Assuming that the mean and variance do not change with location (assumption of stationarity), the variogram model is related to the covariance model by the equation
$$C(h) = \sigma^2 - \gamma(h)$$
where σ² is the variance.
19. Estimating correlation model from data
- Assuming that the variance does not change with location (assumption of stationarity), the correlation model is related to the covariance model by the equation
$$\rho(h) = \frac{C(h)}{\sigma^2}$$
(Figure: correlation model ρ(h) plotted against lag h)
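You can apply the relations C(h) = σ² - γ(h) and ρ(h) = C(h)/σ² pointwise. This is a minimal sketch. It assumes a hypothetical exponential variogram whose sill equals the variance σ² = 4 (mg/L)², that is, σ = 2 mg/L as in the river example.

```python
import math


def covariance_from_variogram(gamma_h, variance):
    """C(h) = sigma^2 - gamma(h), valid under second-order stationarity."""
    return variance - gamma_h


def correlation_from_covariance(c_h, variance):
    """rho(h) = C(h) / sigma^2."""
    return c_h / variance


def exponential_variogram(h, variance, a):
    """Hypothetical exponential variogram model with sill = variance."""
    return variance * (1.0 - math.exp(-h / a))


sigma2 = 4.0  # (mg/L)^2, i.e. sigma = 2 mg/L
for h in (0.0, 150.0, 300.0, 900.0):
    g = exponential_variogram(h, sigma2, 300.0)
    c = covariance_from_variogram(g, sigma2)
    print(h, round(g, 3), round(c, 3), round(correlation_from_covariance(c, sigma2), 3))
```

At h = 0, the covariance equals the full variance and ρ = 1. As h grows, γ approaches the sill, so both C(h) and ρ(h) approach 0.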
20. How does the correlation model affect the estimation?
(Figure: for ρAB = 0, 0.5, and 0.99, the scatter plot of XA vs XB, the joint distribution f(xA, xB), and the conditional distribution of XB|XA; the arrow indicates increasing h)
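Combining a correlation model ρ(h) with the two-station conditioning formulas reproduces the trend in the figure. As h grows, ρ falls, the conditional mean drifts back toward the prior mean, and the conditional standard deviation widens toward σ. This is a minimal sketch. It assumes a hypothetical exponential correlation model with a 1000 m practical range, plus the river example's µ = 5 mg/L, σ = 2 mg/L, and xA = 2 mg/L.

```python
import math


def conditional_at_lag(h, x_a, mu, sigma, rho_of_h):
    """Conditional mean and std dev at B given x_a at A, a lag h away."""
    rho = rho_of_h(h)
    mean = mu + rho * (x_a - mu)          # sigma_A = sigma_B, so the ratio is 1
    sd = sigma * math.sqrt(1.0 - rho**2)
    return rho, mean, sd


def rho_exponential(h, practical_range=1000.0):
    """Hypothetical correlation model, about 0.05 at the practical range (m)."""
    return math.exp(-3.0 * h / practical_range)


for h in (0.0, 100.0, 300.0, 1000.0, 3000.0):
    rho, mean, sd = conditional_at_lag(h, 2.0, 5.0, 2.0, rho_exponential)
    print(f"h = {h:6.0f} m  rho = {rho:.3f}  mean = {mean:.2f}  sd = {sd:.2f}")
```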
21. Kriging
22. Multivariable case
- What if we have more than one location that provides conditioning data?
- (Assume distributions are STILL normal at all locations.)
- At Stations A1, A2, A3, A4
  - µA1 = µA2 = µA3 = µA4 = 5 mg/L
  - σA1 = σA2 = σA3 = σA4 = 2 mg/L
- At Station B
  - mean DO: µB = 5 mg/L
  - std dev: σB = 2 mg/L
- Correlation model: $\rho = f(h) = 0.0125h^2 - 0.225h + 1$
(Figure: Stations A1, A2, A3, A4, and B along the river)
23. Modeling correlation
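$$\rho = f(h) = 0.0125h^2 - 0.225h + 1$$
(Figure: correlation model plotted against distance along the river, in hundreds of meters; adjacent stations are 2 units apart)
From the correlation model:
- ρA1B = 0.0, ρA2B = 0.1, ρA3B = 0.3, ρA4B = 0.6
- ρA1A2 = 0.6, ρA1A3 = 0.3, ρA1A4 = 0.1, ρA2A3 = 0.6, ρA2A4 = 0.3, ρA3A4 = 0.6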
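You can reproduce these correlations from f(h). The sketch assumes the stations sit at 0, 2, 4, 6, and 8 hundred meters, in the order A1, A2, A3, A4, B. That layout matches both the 2-unit spacing in the figure and the listed correlations. `rho_model` and the position dictionary are illustrative names.

```python
def rho_model(h):
    """Correlation model from the slide; h in hundreds of meters."""
    return 0.0125 * h**2 - 0.225 * h + 1.0


# Assumed positions (hundreds of meters): stations 2 units apart, B after A4
positions = {"A1": 0.0, "A2": 2.0, "A3": 4.0, "A4": 6.0, "B": 8.0}
data = ["A1", "A2", "A3", "A4"]

# Data-to-data correlation matrix and data-to-B correlation vector
R_AA = [[rho_model(abs(positions[i] - positions[j])) for j in data] for i in data]
r_AB = [rho_model(abs(positions[i] - positions["B"])) for i in data]

for name, row in zip(data, R_AA):
    print(name, [f"{v:.2f}" for v in row])
print("to B", [f"{v:.2f}" for v in r_AB])
```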
24. Dealing with multiple variables
- Divide the locations into two groups:
  - The vector $\mathbf{X}_A = [X_{A1}, X_{A2}, X_{A3}, X_{A4}]^T$, representing the set of random variables at the locations contributing the conditioning data.
  - The variable $X_B$, representing the random variable at the point of estimation.
(Figure: Stations A1, A2, A3, A4, and B along the river)
25. Concept
1. If the individual distributions are normal, the joint pdf is assumed to be multinormal.
2. Group the variables into two groups: one for the points with data, one for the point of estimation.
(Figure: prior pdf)
3. Intersect the pdf with the conditioning data to get the conditional pdf.
(Figure: conditional pdf)
26. Dealing with multiple variables
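- The updated mean and variance of the distribution at Station B are given by
  - Mean: $\mu_{B|A} = \mu_B + \Sigma_{BA}\,\Sigma_{AA}^{-1}\,(\mathbf{x}_A - \boldsymbol{\mu}_A)$
  - Variance: $\sigma^2_{B|A} = \sigma_B^2 - \Sigma_{BA}\,\Sigma_{AA}^{-1}\,\Sigma_{AB}$
  - where $\Sigma_{AA}$ is the covariance matrix among the data locations A1 to A4, $\Sigma_{AB} = \Sigma_{BA}^T$ is the vector of covariances between the data locations and B, $\mathbf{x}_A$ is the vector of observed values, and $\boldsymbol{\mu}_A$ is the vector of their means.
(Figure: Stations A1, A2, A3, A4, and B along the river)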
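27. Equations in the multivariable case are more general
Recall the two-variable case:
$$\mu_{B|A} = \mu_B + \rho_{AB}\,\frac{\sigma_B}{\sigma_A}\,(x_A - \mu_A), \qquad \sigma^2_{B|A} = \sigma_B^2\,(1 - \rho_{AB}^2)$$
- The multivariable case takes into account
  - the correlation between the data locations and the estimated location ($\Sigma_{AB}$);
  - the correlation among the data locations ($\Sigma_{AA}$).
- This is the most fundamental form of kriging, i.e., simple kriging.
(Figure: multivariable case, conditional pdf)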
28. Plug and Chug
- Recall that $\mathrm{Cov}(A, B) = \rho_{AB}\,\sigma_A\,\sigma_B$
- Compute the data-to-data correlation
29. Plug and Chug
- Compute the data-to-estimation-point correlation
30. Plug and Chug
(Figure: computation of the weights)
Note: The weights attributed to each station are determined by the prior (joint distribution) among them.
31. Plug and Chug
Weights: λ1, λ2, λ3, ..., λn
32. Plug and Chug
33. Plug and Chug
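You can sketch the plug-and-chug steps with NumPy:

1. Build the covariances from Cov = ρσσ.
2. Solve for the weights from the data-to-data and data-to-estimation-point covariances.
3. Form the updated mean and variance.

The correlations are the ones listed for the four-station example. The observations x_a are hypothetical values chosen for illustration. Solving the linear system, rather than forming an explicit inverse, is a design choice the slides do not specify.

```python
import numpy as np


def simple_kriging(cov_aa, cov_ab, mu_a, mu_b, var_b, x_a):
    """Simple kriging (known means): weights, conditional mean, conditional variance.

    weights  = Sigma_AA^-1 Sigma_AB
    mean     = mu_B + weights . (x_A - mu_A)
    variance = sigma_B^2 - weights . Sigma_AB
    """
    weights = np.linalg.solve(cov_aa, cov_ab)   # solve the system, no explicit inverse
    mean = mu_b + weights @ (x_a - mu_a)
    variance = var_b - weights @ cov_ab
    return weights, mean, variance


sigma = 2.0  # mg/L at every station
# Data-to-data and data-to-estimation-point correlations from the example
R_AA = np.array([[1.0, 0.6, 0.3, 0.1],
                 [0.6, 1.0, 0.6, 0.3],
                 [0.3, 0.6, 1.0, 0.6],
                 [0.1, 0.3, 0.6, 1.0]])
r_AB = np.array([0.0, 0.1, 0.3, 0.6])

# Cov(A, B) = rho_AB * sigma_A * sigma_B
cov_aa = R_AA * sigma * sigma
cov_ab = r_AB * sigma * sigma

mu_a = np.full(4, 5.0)                   # mg/L at A1..A4
x_a = np.array([4.0, 3.0, 2.5, 2.0])     # hypothetical observations (mg/L)
w, m, v = simple_kriging(cov_aa, cov_ab, mu_a, 5.0, sigma**2, x_a)
print("weights:", np.round(w, 3))
print("mean:", round(float(m), 2), "mg/L  std dev:", round(float(np.sqrt(v)), 2), "mg/L")
```

Every station has the same σ, so the weights depend only on the correlations. They come out to about -0.018, -0.053, -0.053, and 0.649. A4, the station closest to B, gets nearly all the weight, and the stations behind it receive small negative weights (a screening effect). The conditional variance, 48/19 ≈ 2.53 (mg/L)², does not depend on x_a.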
34. Results from Simple Kriging
- The updated mean and standard deviation of the distribution at Station B are:
  - Mean
  - Standard deviation
(Figure: Stations A1, A2, A3, A4, and B along the river)
35. Other forms of kriging
- Ordinary kriging (OK)
  - Does not require the mean to be known.
  - Assumes that the mean is constant but unknown; it is effectively estimated from the conditioning data.
- Universal kriging (UK)
  - Requires the mean to be neither known nor constant.
  - The user specifies a model for the trend in the mean; UK then fits the model to the data.
- Indicator kriging (IK)
  - Handles binary variables (0 or 1).
  - Can handle non-normality in the data through repeated application (one indicator per threshold).
- Co-kriging (CK)
  - Takes into account a related secondary variable to help estimate the primary variable.
36. Extension to 2D, 3D
- The lag can be represented by the Euclidean distance between two points, e.g. in 3D $h = \sqrt{\Delta x^2 + \Delta y^2 + \Delta z^2}$.
- So a covariance model of the form C = f(h) can still be used.
- Variables may be more correlated in one direction than in another (anisotropy).
  - A linear transformation of the coordinates can be performed so that the correlation distance is the same in all directions (isotropy).
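For geometric anisotropy in 2D, the linear transformation is a rotation into the principal axes followed by a rescaling of the minor axis. This is a minimal sketch. It assumes the direction of the major axis and the two ranges are already known, and the values in the usage lines are hypothetical.

```python
import math


def anisotropic_lag(p, q, angle_deg, range_major, range_minor):
    """Lag between 2-D points p and q after geometric-anisotropy correction.

    The separation vector is rotated into the (major, minor) axes, and the
    minor-axis component is stretched by range_major / range_minor so that
    the correlation range is the same (range_major) in every direction.
    """
    dx, dy = q[0] - p[0], q[1] - p[1]
    t = math.radians(angle_deg)                  # major axis, counter-clockwise from x
    u = dx * math.cos(t) + dy * math.sin(t)      # component along the major axis
    v = -dx * math.sin(t) + dy * math.cos(t)     # component along the minor axis
    return math.hypot(u, v * range_major / range_minor)


# Hypothetical: correlation reaches 6000 m along x but only 1500 m along y
print(anisotropic_lag((0, 0), (6000, 0), 0.0, 6000.0, 1500.0))   # 6000.0
print(anisotropic_lag((0, 0), (0, 1500), 0.0, 6000.0, 1500.0))   # 6000.0
```

After the transformation, the two points lie at the same effective lag. An isotropic covariance model C = f(h) can therefore be applied to the transformed distances.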
37. Extension to space-time
- For space and time, there is no standard space-time metric.
- A single distance that simply combines the spatial and temporal separations is not always correct, because the temporal and spatial axes are not always orthogonal to each other.
  - Processes that happen in time usually have some dependency on processes that happen in space (they are not independent).
- A separate temporal lag term is usually used.
- The covariance function takes on the form $C(h, \tau)$, where h is the spatial lag and τ is the temporal lag.
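One family of space-time covariances that keeps a separate temporal lag is the product-sum model, C(h, τ) = k1 Cs(h) Ct(τ) + k2 Cs(h) + k3 Ct(τ). This is a minimal sketch of that model. The Gaussian spatial part, the spherical temporal part, and all parameter values are hypothetical choices for illustration.

```python
import math


def spatial_corr(h, a_s):
    """Hypothetical spatial correlation (Gaussian, practical range a_s)."""
    return math.exp(-3.0 * (h / a_s) ** 2)


def temporal_corr(tau, a_t):
    """Hypothetical temporal correlation (spherical, range a_t)."""
    if tau >= a_t:
        return 0.0
    r = tau / a_t
    return 1.0 - 1.5 * r + 0.5 * r**3


def product_sum_cov(h, tau, k1, k2, k3, a_s, a_t):
    """C(h, tau) = k1*Cs(h)*Ct(tau) + k2*Cs(h) + k3*Ct(tau).

    With valid Cs and Ct, nonnegative k1, k2, k3 keep C a valid covariance.
    """
    cs = spatial_corr(h, a_s)
    ct = temporal_corr(tau, a_t)
    return k1 * cs * ct + k2 * cs + k3 * ct


# Hypothetical parameters: spatial range 6000 m, temporal range 1 day
print(product_sum_cov(0.0, 0.0, 1.0, 0.5, 0.25, 6000.0, 1.0))      # 1.75
print(product_sum_cov(3000.0, 0.5, 1.0, 0.5, 0.25, 6000.0, 1.0))
```

Setting k2 = k3 = 0 reduces the model to a separable product Cs(h) Ct(τ).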
38. Application (Gravity currents in Corpus Christi Bay)
39. Sensors in Corpus Christi Bay
(Figure: aerial photo from Google Earth of Corpus Christi Bay, Oso Bay, Laguna Madre, and the Gulf of Mexico, showing TCOON stations, TCEQ stations, HRI stations, USGS gages, and SERF stations)
42. Selecting a study area
(Figure: map of the candidate study area showing depressions, ridges, and the channel, with Oso Bay, West Laguna Madre, and East Laguna Madre labeled; elevation legend from 1.0 m to 5.0 m referenced to Mean High Water Level)
43. Downstream of East Laguna Madre
- Water quality data, July 12 and 18, 2006 (at the birth and demise of the gravity current)
  - Paul Montagna, Texas A&M University-Corpus Christi
- Plume tracking survey, July 14 to 17, 2006 (while the gravity current was on the move)
  - Ben Hodges, University of Texas at Austin
44. Synthesis of data
45. Data Preparation
1. Salinity data from HRI are acquired using HydroGet (a GIS web service client) and combined with the plume tracking data.
2. Data locations are projected onto a reference line following the general direction of flow.
- Space-time kriging is performed in 3 dimensions:
  - X: longitudinal measure (meters from the origin point)
  - Y: time (days since 7/12/2006)
  - Z: elevation (meters from the water surface)
(Figure: reference line with its origin at x = 0 m)
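Step 2, projecting data locations onto the reference line, can be sketched as follows. The sketch assumes planar projected coordinates in meters and a reference line stored as a polyline whose first vertex is the origin (x = 0 m). The function name and the nearest-segment rule are illustrative choices, not necessarily what the study did.

```python
import math


def measure_along_polyline(point, vertices):
    """Return (measure, offset) for the closest point on the polyline.

    measure: distance along the line from its first vertex (x = 0 m)
    offset:  perpendicular distance from the data point to the line
    """
    px, py = point
    best = None
    start = 0.0  # cumulative length at the start of the current segment
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:]):
        sx, sy = x1 - x0, y1 - y0
        seg_len = math.hypot(sx, sy)
        # Parameter of the orthogonal projection, clamped to the segment
        t = ((px - x0) * sx + (py - y0) * sy) / seg_len**2
        t = max(0.0, min(1.0, t))
        cx, cy = x0 + t * sx, y0 + t * sy
        offset = math.hypot(px - cx, py - cy)
        if best is None or offset < best[1]:
            best = (start + t * seg_len, offset)
        start += seg_len
    return best


# Hypothetical reference line (m, projected coordinates); first vertex is x = 0 m
line = [(0.0, 0.0), (1000.0, 0.0), (1000.0, 1000.0)]
print(measure_along_polyline((1020.0, 400.0), line))  # (1400.0, 20.0)
```

The returned measure becomes the X (longitudinal) coordinate used in kriging. The offset can flag data points that lie far from the reference line.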
46. Variogram along direction of flow
Gaussian variogram model, where
- h = lag distance along the direction of flow
- C0 = nugget = 2 psu²
- C1 = sill = 3.6 psu²
- a = range = 6000 m
47. Variogram along direction of flow
(Figure: variogram along the direction of flow, with the sill, nugget, and range marked)
48. Variogram along depth
Gaussian variogram model, where
- h = lag distance along depth
- C0 = nugget = 0 psu²
- C1 = sill = 3.6 psu²
- a = range = 1.7 m
49. Variogram along time axis
Spherical variogram model, where
- h = lag distance along the time axis
- C0 = nugget = 0 psu²
- C1 = sill = 3 psu²
- a = range = 1 day
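This is a minimal sketch of the two model families with the parameters above. It assumes common textbook forms: the Gaussian model is written with a practical-range factor of 3, and C1 is added on top of the nugget, so the plateau is C0 + C1. Other conventions exist, so treat these forms as assumptions rather than the exact equations used in the study.

```python
import math
from functools import partial


def gaussian_variogram(h, c0, c1, a):
    """Gaussian model: c0 + c1 * (1 - exp(-3 h^2 / a^2)) for h > 0, 0 at h = 0.

    The factor 3 makes a the practical range (assumed convention).
    """
    if h == 0:
        return 0.0
    return c0 + c1 * (1.0 - math.exp(-3.0 * (h / a) ** 2))


def spherical_variogram(h, c0, c1, a):
    """Spherical model: c0 + c1 * (1.5 h/a - 0.5 (h/a)^3) for 0 < h < a, c0 + c1 beyond."""
    if h == 0:
        return 0.0
    if h >= a:
        return c0 + c1
    r = h / a
    return c0 + c1 * (1.5 * r - 0.5 * r**3)


# Parameters from the slides (salinity variance in psu^2)
along_flow = partial(gaussian_variogram, c0=2.0, c1=3.6, a=6000.0)  # h in m
along_depth = partial(gaussian_variogram, c0=0.0, c1=3.6, a=1.7)    # h in m
along_time = partial(spherical_variogram, c0=0.0, c1=3.0, a=1.0)    # h in days

print(round(along_flow(3000.0), 3), round(along_depth(0.85), 3), along_time(0.5))
```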
50. Interpolation results
(Figure: interpolated longitudinal salinity profiles on 7/12/2006 18:00 and 7/13/2006 18:00, with x, y, z axes and a north arrow; legend classes 37-40, 40-42, 42-43, 43-44, and 44-46 psu)
51. Longitudinal Profiles
52. Bottom salinities
53. Cross validation
- A common method to evaluate variogram models
- Also known as the fictitious point method (Delhomme, 1978)
- Remove one data point at a time from the data set, then use the remaining n - 1 points to estimate the removed point.
- The estimated and actual values are then compared with each other.
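You can sketch the leave-one-out procedure as follows. The sketch assumes simple kriging with a known mean and a hypothetical exponential covariance model. The data values are invented for illustration, and the mean squared error is just one possible summary of the comparison.

```python
import numpy as np


def loo_cross_validation(x, z, cov, mean):
    """Leave-one-out estimates at each data point using simple kriging (known mean)."""
    n = len(x)
    estimates = np.empty(n)
    for k in range(n):
        keep = np.arange(n) != k                       # drop point k
        xs, zs = x[keep], z[keep]
        c_data = cov(np.abs(xs[:, None] - xs[None, :]))  # data-to-data covariance
        c_target = cov(np.abs(xs - x[k]))              # data-to-removed-point covariance
        w = np.linalg.solve(c_data, c_target)
        estimates[k] = mean + w @ (zs - mean)
    return estimates


def exp_cov(h, sill=4.0, a=300.0):
    """Hypothetical exponential covariance model (practical range a)."""
    return sill * np.exp(-3.0 * h / a)


x = np.array([0.0, 100.0, 200.0, 350.0, 500.0])   # hypothetical positions (m)
z = np.array([5.5, 5.1, 4.2, 4.6, 6.0])           # hypothetical DO data (mg/L)
est = loo_cross_validation(x, z, exp_cov, mean=5.0)
errors = est - z
print(np.round(est, 2), "MSE:", round(float(np.mean(errors**2)), 3))
```

To compare candidate variogram models, rerun the procedure with each model and compare the resulting error statistics.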
54. Conclusions
- We've covered
  - Basics of spatial statistics
  - Kriging
  - Application of spatial-temporal statistics (Gravity currents in Corpus Christi Bay)
- Spatial statistics is fun!
55. Geostatistical tools
- ArcGIS Geostatistical Analyst
  - Easiest to use
- GSLIB
  - Library of Fortran programs
- De Cesare's version of GSLIB
  - Modification of GSLIB to do space-time kriging
- BMELIB
  - Library of MATLAB programs