Model Based Geostatistics - PowerPoint PPT Presentation

Slides: 46
Provided by: ArchieC2

Transcript and Presenter's Notes

Title: Model Based Geostatistics


1
Model Based Geostatistics
  • Archie Clements
  • University of Queensland
  • School of Population Health

2
Overview
  • Introduction to geostatistics
  • Assumptions
  • Variogram components
  • Variogram models
  • Kriging
  • Assumptions
  • Model-based geostatistics
  • Principles
  • Building the model
  • Prediction
  • Validation
  • Applications: parasitic disease control in Africa

3
Spatial variation
[Figure: spatial variation shown on X, Y and Z axes]
4
First and second order variation
  • First-order variation
  • Trend
  • Large-scale variation
  • Can be due to large-scale environmental drivers
    (e.g. temperature for vector-borne diseases)
  • Second-order variation
  • Localised variation (clustering)
  • Modelled using geostatistics

5
Spatial dependence
  • Observations close in space are more similar than
    observations far apart
  • The variance of pairs of observations that are
    close together (small h) tends to be smaller than
    the variance of pairs far apart (large h)
  • Basis of the semivariogram
  • Spatial decomposition of the sample variance

6
Semivariance: statistical notation
Semivariance is half the average squared
difference between values observed at locations
separated by a given distance (and direction).
It is a function of distance (and direction): distance
is grouped in bins, direction in sectors of compass azimuth.
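The definition above can be sketched in a few lines of Python. This is an illustrative, omnidirectional version only (no direction sectors); the function name `empirical_semivariogram` and its arguments are hypothetical and not taken from the presentation.

```python
import math
from collections import defaultdict


def empirical_semivariogram(coords, values, bin_width, max_lag):
    """Omnidirectional empirical semivariogram.

    Returns {bin_index: (mean_lag, semivariance, n_pairs)}, where the
    semivariance of a bin is half the average squared difference of all
    pairs of observations whose separation distance falls in that bin.
    """
    # Per bin: [sum of distances, sum of squared differences, pair count]
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            h = math.hypot(coords[i][0] - coords[j][0],
                           coords[i][1] - coords[j][1])
            if h > max_lag:
                continue  # pairs beyond the maximum lag are ignored
            acc = sums[int(h // bin_width)]
            acc[0] += h
            acc[1] += (values[i] - values[j]) ** 2
            acc[2] += 1
    return {b: (s[0] / s[2], s[1] / (2 * s[2]), s[2])
            for b, s in sorted(sums.items())}


# Toy data: four points on a line with made-up values.
toy_coords = [(0, 0), (1, 0), (2, 0), (3, 0)]
toy_values = [1.0, 2.0, 4.0, 7.0]
toy_result = empirical_semivariogram(toy_coords, toy_values,
                                     bin_width=1.5, max_lag=3)
for b, (lag, gamma, pairs) in toy_result.items():
    print(b, round(lag, 2), round(gamma, 3), pairs)
```

In the toy data the semivariance rises with lag, which is the pattern expected when nearby observations are more similar than distant ones.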
7
Modelling spatial correlation: semivariogram
[Figure: semivariogram of semivariance against lag (h), annotated with nugget, partial sill and sill]
8
Nugget
  • Random variation (white noise): non-spatial
    measurement error
  • Microvariation (spatial variation at a scale
    smaller than the smallest bin)
  • If no spatial correlation
  • Nugget = sill (flat semivariogram)

9
Semivariogram: decisions to be made
  • How many/what sized bins?
  • Depends on density of data points
  • For regularly spaced (grid-sampled) data: bin size =
    size of cells in the grid
  • For irregular sampling: modify according to
    range of spatial correlation (big range, big
    bins; small range, small bins)
  • What maximum lag (h) to use?
  • Should be estimated up to half the length of the
    shortest side of the study area
  • Which parametric model to use?
  • Visual fit
  • Statistical fit

10
Variogram models
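The transcript does not preserve which parametric models this slide showed. As a hedged illustration, the sketch below writes out two widely used forms, the exponential and the spherical model, in terms of the nugget, partial sill and range introduced earlier. The function and parameter names are hypothetical, and the parameter values in the usage lines are made up.

```python
import math


def exponential_variogram(h, nugget, partial_sill, range_param):
    """Exponential model: approaches the sill (nugget + partial sill)
    asymptotically; about 95% of the partial sill is reached at 3 * range_param."""
    if h == 0:
        return 0.0  # semivariance is zero at zero lag by definition
    return nugget + partial_sill * (1.0 - math.exp(-h / range_param))


def spherical_variogram(h, nugget, partial_sill, range_param):
    """Spherical model: reaches the sill exactly at h = range_param."""
    if h == 0:
        return 0.0
    if h >= range_param:
        return nugget + partial_sill
    r = h / range_param
    return nugget + partial_sill * (1.5 * r - 0.5 * r ** 3)


# Made-up parameters: nugget 0.02, partial sill 0.05, range 40 km.
for lag in (0, 10, 20, 40, 80):
    print(lag,
          round(exponential_variogram(lag, 0.02, 0.05, 40), 4),
          round(spherical_variogram(lag, 0.02, 0.05, 40), 4))
```

A flat semivariogram (no spatial correlation) corresponds to a partial sill of zero, so that the nugget equals the sill.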
11
Schistosoma mansoni, Uganda
Omnidirectional semivariograms
12
Anisotropy
  • Spatial dependence is different in different
    directions
  • Semivariogram calculated in one direction is
    different from semivariogram calculated in
    another direction
  • Should check for anisotropy and, if present,
    accommodate it in interpolation
  • Range or sill (or both) can differ

13
Schistosoma mansoni, Uganda: directional
semivariograms
Direction        Range (km)  Sill  Nugget
Omnidirectional  43.4        7E-2  4E-2
0°               39.4        1E-1  -3E-3
45°              43.6        7E-2  2E-2
90°              35.8        8E-2  3E-2
135°             39.5        1E-1  2E-2
14
Schistosoma haematobium, Northwestern Tanzania
Direction        Range (km)  Sill  Nugget
Omnidirectional  36.0        5E-2  0
0°               260.1       2E-2  3E-2
45°              163.9       6E-3  3E-2
90°              56.2        5E-2  0
135°             97.7        3E-2  7E-3
15
Schistosoma haematobium, Northwestern Tanzania
16
Trended and skewed data
  • Data should be de-trended
  • Polynomials (regression on XY coordinates)
  • Generalised linear models (regression on
    covariates)
  • Generalised additive models (can over-fit)
  • If directional variograms are calculated and the
    range in one direction is >3 × the range in the
    perpendicular direction, this is a sign of trend
  • If skewed, consider transformation (e.g. log
    transformation, normal score transformation)
  • Otherwise, extreme values overly influence
    interpolated map
  • Have to back-transform interpolated values
  • Called disjunctive Kriging
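The rule of thumb on this slide (a range in one direction more than 3 times the range in the perpendicular direction suggests trend) can be checked against the directional ranges reported for S. haematobium in northwestern Tanzania. The sketch below is only an illustration; the helper name `perpendicular_ratios` is hypothetical.

```python
def perpendicular_ratios(ranges_by_direction):
    """Ratio of larger to smaller range for each pair of perpendicular
    directions (directions in degrees, ranges in km)."""
    ratios = {}
    for direction, rng_km in ranges_by_direction.items():
        perpendicular = (direction + 90) % 180
        # Visit each perpendicular pair once (from its smaller angle).
        if perpendicular in ranges_by_direction and direction < perpendicular:
            other = ranges_by_direction[perpendicular]
            ratios[(direction, perpendicular)] = (
                max(rng_km, other) / min(rng_km, other))
    return ratios


# Directional ranges (km) from the Tanzania table in the slides.
tanzania_ranges = {0: 260.1, 45: 163.9, 90: 56.2, 135: 97.7}
tanzania_ratios = perpendicular_ratios(tanzania_ranges)
for pair, ratio in tanzania_ratios.items():
    flag = "possible trend" if ratio > 3 else "no strong sign of trend"
    print(pair, round(ratio, 2), flag)
```

With these values only the 0°/90° pair exceeds the factor of 3.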

17
Non-stationarity
  • Spatial correlation structure cannot be
    generalised to the whole study area
  • Why does it occur?
  • Different factors may operate in different parts
    of the study area
  • Different ecological zones with different disease
    epidemiology
  • Need to estimate the spatial correlation
    structure separately in each homogeneous zone

18
Kriging
  • Z(si) is the measured value at the ith location
  • λi is the weight attributed to the measured value
    at the ith location (calculated using the
    semivariogram)
  • s0 is the prediction location

For formulae on how the weights are estimated
using the variogram, see http://en.wikipedia.org/wiki/Kriging
The prediction standard error/variance gives an
indication of the precision of the prediction
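As a rough illustration of how the weights follow from the variogram, the sketch below implements ordinary Kriging, in which the prediction is the weighted sum of the measured values with weights constrained to sum to 1. The slide does not say which Kriging variant was used, so ordinary Kriging is an assumption here, as are the exponential variogram and all function names.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def exponential_variogram(h, nugget=0.0, partial_sill=1.0, range_param=1.0):
    """Assumed variogram model; any valid model could be substituted."""
    return 0.0 if h == 0 else nugget + partial_sill * (1 - math.exp(-h / range_param))


def ordinary_kriging(coords, values, s0, variogram=exponential_variogram):
    """Return (prediction at s0, Kriging variance, weights)."""
    n = len(coords)

    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    # Kriging system: semivariances between data points, bordered by the
    # Lagrange-multiplier row/column that forces the weights to sum to 1.
    a = [[variogram(dist(coords[i], coords[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [variogram(dist(c, s0)) for c in coords] + [1.0]
    solution = solve(a, b)
    weights, lagrange = solution[:n], solution[n]
    prediction = sum(w * z for w, z in zip(weights, values))
    variance = sum(w * g for w, g in zip(weights, b[:n])) + lagrange
    return prediction, variance, weights


# Made-up prevalence values at three locations.
demo_coords = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
demo_values = [0.1, 0.5, 0.3]
demo_pred, demo_var, demo_weights = ordinary_kriging(demo_coords, demo_values, (1.0, 1.0))
print(round(demo_pred, 3), round(demo_var, 3), [round(w, 3) for w in demo_weights])
```

With a zero nugget the predictor reproduces the measured value at a sampled location, and the Kriging variance there is zero.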
19
Geostatistics summary
  • Geostatistics involves 3 steps
  • Exploratory data analysis
  • Definition of a variogram
  • Using the variogram for interpolation (Kriging)
  • Technique applicable for
  • Point-referenced data
  • Spatially continuous processes
  • Disease risk
  • Rainfall, elevation, temperature, other climate
    variables
  • Wildlife, vegetation, geology (mineral deposits)

20
Bayesian model-based geostatistics
  • Seminal paper
  • Diggle, Tawn and Moyeed (1998). Model-based
    geostatistics. Appl. Stat. 47(3): 299-350
  • Observed a need for addressing non-Gaussian
    observational error
  • Idea is to embed linear Kriging methodology
    within a more general distributional framework
  • Generalised linear models with an unobserved
    Gaussian process in the linear predictor
  • Implemented in a Bayesian framework
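To make "a generalised linear model with an unobserved Gaussian process in the linear predictor" concrete, the sketch below simulates survey data from one such model: a binomial outcome with a logit link and a spatial random effect whose covariance decays exponentially with distance. The exponential covariance, the single covariate and all names and parameter values are assumptions for illustration; this is a forward simulation, not the Bayesian fitting procedure.

```python
import math
import random


def cholesky(a):
    """Lower-triangular Cholesky factor of a positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def simulate_surveys(coords, covariate, n_children, alpha, beta, sill, phi, rng):
    """Simulate (number positive, true prevalence) for each location.

    logit(p_i) = alpha + beta * x_i + u_i, where u is a zero-mean Gaussian
    process with covariance sill * exp(-phi * distance) (assumed form).
    """
    n = len(coords)
    cov = [[sill * math.exp(-phi * math.hypot(coords[i][0] - coords[j][0],
                                              coords[i][1] - coords[j][1]))
            for j in range(n)] for i in range(n)]
    low = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Correlated spatial random effect u = L z
    u = [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
    surveys = []
    for i in range(n):
        eta = alpha + beta * covariate[i] + u[i]
        p = 1.0 / (1.0 + math.exp(-eta))
        positives = sum(rng.random() < p for _ in range(n_children))
        surveys.append((positives, p))
    return surveys


# Hypothetical schools and covariate values, 60 children per school.
sim_coords = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (3.0, 3.0)]
sim_covariate = [0.0, 1.0, 0.0, 1.0]
sim_surveys = simulate_surveys(sim_coords, sim_covariate, 60,
                               alpha=-1.0, beta=0.4, sill=1.0, phi=2.0,
                               rng=random.Random(1))
for positives, p in sim_surveys:
    print(positives, round(p, 3))
```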

21
Advantages of the Bayesian approach
  • Natural framework for incorporation of parameter
    uncertainty into spatial prediction
  • Can build uncertainty into parameters using
    priors
  • Non-informative
  • Informative (based on exploratory analysis,
    additional sources of information)
  • Convenient for modelling hierarchical data
    structures

22
Bayesian model-based geostatistics
23
Predictions
  • Can predict at specified validation locations
    (with observed outcomes for comparison)
  • Can predict at non-sampled locations, e.g. a
    prediction grid
  • Might be interested in
  • outcome
  • spatial random effect
  • Standard error of predicted outcome

24
Validation
  • Jack-knifing: sampling with replacement
  • Remove one observation, do prediction at that
    location and store predicted value
  • Repeat for all observations
  • Compare predicted to observed using statistical
    measures of fit (RMSE) and discriminatory
    performance (AUC)
  • Not feasible with MBG other than with very small
    datasets
  • Cross-validation: sampling without replacement
  • Set aside a subset for validation (ideally 50%)
  • Use remaining data to train model
  • Compare predicted and observed for the validation
    subset using statistical measures
  • Can then recombine the validation and training
    subsets for final model build
  • External validation using other prospective or
    retrospective dataset
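The two measures named above can be computed directly once predicted and observed values are paired up. The sketch below is a minimal illustration with made-up prevalence values; dichotomising the observed prevalence at 50% for the AUC is an assumption, and the function names are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted))
                     / len(observed))


def auc(labels, scores):
    """Area under the ROC curve as the probability that a randomly chosen
    positive case has a higher score than a randomly chosen negative case
    (ties count as one half)."""
    positives = [s for lab, s in zip(labels, scores) if lab]
    negatives = [s for lab, s in zip(labels, scores) if not lab]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Made-up validation subset: observed vs predicted prevalence.
val_observed = [0.10, 0.60, 0.30, 0.80]
val_predicted = [0.20, 0.50, 0.35, 0.70]
val_rmse = rmse(val_observed, val_predicted)
# Discrimination of locations with observed prevalence above 50% (assumed cut-off).
val_auc = auc([o > 0.5 for o in val_observed], val_predicted)
print(round(val_rmse, 4), val_auc)
```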

25
Model-based geostatistics summary
  • Model-based geostatistics involves
  • Visual and exploratory data analysis
  • Variography (to determine if there is
    second-order spatial variation)
  • Variable selection (for deterministic component)
  • Building model (e.g. in WinBUGS)
  • Model selection (e.g. using DIC)
  • Prediction and validation

26
Application: schistosomiasis in sub-Saharan
Africa
27
  • Schistosomiasis
  • 779 million people at risk
  • 207 million infected
  • Most in Africa
  • Significant illness and mortality
  • Two main forms in Africa
  • Urinary schistosomiasis caused by Schistosoma
    haematobium
  • Intestinal schistosomiasis caused by S. mansoni

28
Life cycle of Schistosoma haematobium

[Figure: life cycle. Adult worm in human bladder wall → eggs in urine → miracidia → sporocysts in snail → cercariae released]
29
Diagnosis of infection
  • S. haematobium
  • Microscopic examination of urine slides: presence
    of eggs and egg counts
  • Macrohaematuria (visible blood)
  • Microhaematuria (invisible blood) tested using
    chemical reagent strips
  • Blood-in-urine questionnaire
  • S. mansoni and soil-transmitted helminths
  • Microscopic examination of stool samples

30
School-based control programmes
  • School-aged children have highest prevalence
    (proportion infected) and intensity (severity) of
    infection
  • Education system is convenient for control:
    central location to access target population

31
How do we determine which schools should be
targeted?
  • World Health Organisation guidelines: treat
    communities every two years where prevalence in
    school-age children is >10% and annually where
    prevalence is >50%
  • No surveillance
  • Need to do surveys
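The guideline thresholds quoted on this slide amount to a simple decision rule, sketched below. It assumes the lower tier means treatment once every two years, and it deliberately returns no recommendation at or below 10% because the slide does not give one; the function name is hypothetical.

```python
def treatment_frequency(prevalence):
    """Map prevalence in school-age children (as a proportion, 0 to 1)
    to the treatment frequency quoted on the slide."""
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must be a proportion between 0 and 1")
    if prevalence > 0.5:
        return "annual"
    if prevalence > 0.1:
        return "every two years"
    return "not specified on the slide"


for school_prevalence in (0.05, 0.25, 0.65):
    print(school_prevalence, treatment_frequency(school_prevalence))
```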

32
Field survey: northwest Tanzania
[Map label: Lake Victoria]
  • 153 schools surveyed
  • 60 children per school
  • What about non-sampled locations? Need to predict
    (interpolate) values

33
MBG model for S. haematobium prevalence
34
S. haematobium model results
Variable           Coefficient         Odds Ratio
Intercept          1.9 (-2.3 - 10.3)
LST >35-39°C       0.4 (-0.3 - 1.1)    1.5 (0.8 - 2.9)
LST >39°C          0.3 (-1.5 - 2.2)    1.4 (0.2 - 8.6)
Rainfall >1050 mm  -1.1 (-3.4 - 1.1)   0.3 (3.3 x 10^-2 - 3.1)
?                  0.9 (0.6 - 1.3)
φ                  0.2 (0.1 - 1.0)
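The odds ratios in this table are the exponentiated coefficients of the logistic model. The short sketch below reproduces the point estimates from the rounded coefficients shown; interval bounds recomputed this way can differ slightly from the table because the coefficients on the slide are rounded. The function name is hypothetical.

```python
import math


def odds_ratio(coefficient, lower, upper):
    """Exponentiate a log-odds coefficient and its interval bounds."""
    return tuple(math.exp(x) for x in (coefficient, lower, upper))


# Rounded coefficients (with interval bounds) copied from the table.
table_coefficients = {
    "LST >35-39C": (0.4, -0.3, 1.1),
    "LST >39C": (0.3, -1.5, 2.2),
    "Rainfall >1050 mm": (-1.1, -3.4, 1.1),
}
table_odds_ratios = {name: odds_ratio(*vals)
                     for name, vals in table_coefficients.items()}
for name, (estimate, lo, hi) in table_odds_ratios.items():
    print(name, round(estimate, 2), round(lo, 3), round(hi, 2))
```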
35
Clements et al. TMIH 2006
36
Uncertainty
Lower bound 95% PI
Upper bound 95% PI
37
  • Co-ordinated surveys in 3 contiguous countries
  • 418 schools
  • >26,000 children

Variable                                      Mean (95% CI)      SD
Sex: female                                   0.70 (0.65, 0.76)  0.03
Age 9-10 years                                1.16 (1.00, 1.33)  0.08
Age 11-12 years                               1.51 (1.31, 1.73)  0.10
Age 13-16 years                               1.79 (1.53, 2.06)  0.14
Distance to perennial water body              0.34 (0.21, 0.54)  0.08
Land surface temperature                      0.80 (0.51, 1.21)  0.18
Land surface temperature^2                    1.10 (0.85, 1.40)  0.14
Rate of decay of spatial correlation          2.03 (1.48, 2.74)  0.32
Variance of the spatial random effect (sill)  7.03 (5.36, 9.31)  1.01
Probability that prevalence is >50% (Clements et al. EID 2008)
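A map of the probability that prevalence exceeds 50% can be derived from posterior predictive draws at each prediction location: the probability is simply the share of draws above the threshold. The sketch below assumes such draws are available as a list per location; the draws shown are made up and the function name is hypothetical.

```python
def exceedance_probability(draws, threshold=0.5):
    """Proportion of posterior predictive draws above the threshold."""
    return sum(d > threshold for d in draws) / len(draws)


# Made-up posterior draws of prevalence at two prediction locations.
location_draws = {
    "location A": [0.62, 0.55, 0.48, 0.71, 0.66, 0.59, 0.45, 0.68],
    "location B": [0.12, 0.20, 0.08, 0.31, 0.15, 0.52, 0.09, 0.18],
}
exceedance = {name: exceedance_probability(draws)
              for name, draws in location_draws.items()}
for name, prob in exceedance.items():
    print(name, prob)
```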
38
Other outcomes: co-infection
East Africa (Brooker and Clements, Int. J. Parasitol., in press)
  • S. mansoni mono-infection: 7.9%
  • Hookworm mono-infection: 40.5%
  • Co-infection: 8.1%
39
Model for co-infection
$Y_{ijk} \sim \mathrm{Multinomial}(p_{ijk}, n_{ijk})$
40
Posterior mean (95% posterior CI) for each outcome
Variable                 S. mansoni mono-infection  Hookworm mono-infection  S. mansoni/hookworm co-infection
Intercept                -3.8 (-4.7 - -2.9)         -0.6 (-1.1 - -0.3)       -4.4 (-5.0 - -3.7)
OR Elevation             0.35 (0.22 - 0.58)         0.77 (0.65 - 0.89)       0.30 (0.20 - 0.47)
OR DPWB                  0.23 (0.10 - 0.45)         0.94 (0.76 - 1.15)       0.30 (0.18 - 0.58)
OR Rural vs urban        0.43 (0.21 - 0.79)         0.98 (0.68 - 1.37)       0.61 (0.36 - 1.02)
OR Ext. rural vs urban   0.62 (0.23 - 1.44)         1.16 (0.82 - 1.81)       0.75 (0.31 - 1.62)
OR LST                   0.88 (0.62 - 1.25)         0.60 (0.50 - 0.72)       0.57 (0.31 - 0.87)
OR Female                0.86 (0.76 - 0.96)         0.91 (0.86 - 0.97)       0.70 (0.63 - 0.77)
OR Age (9-10 years)      1.67 (1.37 - 2.06)         1.17 (1.04 - 1.30)       1.82 (1.52 - 2.21)
OR Age (11-13 years)     2.44 (2.06 - 2.89)         1.55 (1.39 - 1.71)       2.99 (2.55 - 3.52)
OR Age (14 years)        2.87 (2.19 - 3.71)         1.88 (1.63 - 2.14)       3.83 (3.01 - 4.86)
Phi (rate of decay)      3.52 (1.73 - 7.21)         4.98 (3.38 - 7.33)       3.76 (2.10 - 7.36)
Sill                     6.39 (3.52 - 11.78)        1.31 (0.98 - 1.76)       6.34 (3.98 - 9.95)
41
Co-infection
[Maps: hookworm mono-infection; S. mansoni-hookworm co-infection; S. mansoni mono-infection]
42
Other outcomes: intensity of infection
  • Prevalence is used (currently) for disease
    control planning
  • Intensity of infection (eggs/ml urine or eggs/g
    faeces) is more indicative of
  • Morbidity (anaemia, urinary tract and hepatic
    pathology)
  • Transmission

43
Model for intensity of infection
44
Intensity of S. mansoni infection, East Africa
Clements et al. Parasitol 2006
Variable         Posterior Mean (95% CI)
Intercept        10.06 (5.77 - 13.22)
Female           -0.41 (-0.72 - -0.11)
Elevation (m)    -0.007 (-0.01 - -0.004)
DPWB (dec deg)   -5.36 (-7.51 - -3.30)
Sill             23.96 (19.06 - 32.07)
Range            0.134 (0.09 - 0.20)
Overdispersion   0.06 (0.058 - 0.062)
45
Conclusions
  • In disease control we need an evidence-based
    framework for deciding where to allocate
    limited control resources
  • Maps are useful tools for highlighting
    sub-national variation, targeting interventions,
    advocacy (national and local), integrated control
    programmes and estimating heterogeneities in
    disease burden
  • Model-based geostatistics enables rich inference
    from spatial data, including uncertainty