Global Clustering Tests - PowerPoint PPT Presentation

Title:

Global Clustering Tests

Slides: 32
Provided by: MartinKu
Learn more at: https://www.satscan.org


1
Global Clustering Tests
2
Tests for Spatial Randomness
H0: The risk of disease is the same everywhere
after adjustment for age, gender and/or other
covariates.
3
Tests for Global Clustering
Evaluate whether clustering exists as a global
phenomenon throughout the map, without
pinpointing the location of specific clusters.
4
Tests for Global Clustering
  • More than 100 different tests for global
    clustering have been proposed by different
    scientists in different fields. For example:
  • Whittemore's Test, Biometrika 1987
  • Cuzick-Edwards k-NN, JRSS 1990
  • Besag-Newell's R, JRSS 1991
  • Tango's Excess Events Test, StatMed 1995
  • Swartz's Entropy Test, Health and Place 1998
  • Tango's Max Excess Events Test, StatMed 2000

5
Cuzick-Edwards k-NN Test
$\sum_i c_i \sum_j c_j \, I(d_{ij} < d_{ik(i)})$
where
$c_i$ = number of deaths in county i
$d_{ij}$ = distance from county i to county j
$k(i)$ = the county with the k-nearest neighbor to an
individual in county i, defined in terms of expected
cases rather than individuals.
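A minimal Python sketch of the statistic on this slide may help make the indicator term concrete. It rests on one stated assumption: k(i) is taken to be the first county, in order of distance from county i (county i itself included), at which the cumulative expected cases reach k. The function name and arguments are hypothetical and are not taken from any software package.

```python
import math

def cuzick_edwards_statistic(cases, expected, coords, k):
    """Sum over i of c_i * sum over j of c_j * I(d_ij < d_ik(i)).

    Assumption: k(i) is the first county, ordered by distance from
    county i, where the cumulative expected cases reach k.
    """
    n = len(cases)
    total = 0.0
    for i in range(n):
        # Distances from county i to every county (including itself).
        dist = [math.dist(coords[i], coords[j]) for j in range(n)]
        order = sorted(range(n), key=lambda j: dist[j])
        # Locate k(i); fall back to the farthest county if k is never reached.
        cumulative = 0.0
        k_i = order[-1]
        for j in order:
            cumulative += expected[j]
            if cumulative >= k:
                k_i = j
                break
        # Add c_i times the cases in counties strictly closer than k(i).
        inner = sum(cases[j] for j in range(n) if dist[j] < dist[k_i])
        total += cases[i] * inner
    return total
```

Larger values of the statistic indicate that cases tend to have other cases among their nearest neighbors; significance would still have to be assessed against the null distribution, which this sketch does not do.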
6
Cuzick-Edwards k-NN Test
Special case of the Weighted Moran's I
Test, proposed by Cliff and Ord, 1981
7
Tango's Excess Events Test
$\sum_i \sum_j [c_i - E(c_i)] [c_j - E(c_j)] \, e^{-4 d_{ij}^2 / \lambda^2}$
where
$c_i$ = number of deaths in county i
$E(c_i)$ = expected cases in county i under H0
$d_{ij}$ = distance from county i to county j
$\lambda$ = clustering scale parameter
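The double sum above can be sketched in a few lines of Python. This is only an illustration of the formula as written on the slide, using raw counts; the function name and the argument `lam` (standing in for the clustering scale parameter) are hypothetical.

```python
import math

def tango_excess_events(cases, expected, coords, lam):
    """Sum over i, j of (c_i - E(c_i)) * (c_j - E(c_j)) * exp(-4 d_ij^2 / lam^2)."""
    n = len(cases)
    # Excess of observed over expected cases in each county.
    excess = [c - e for c, e in zip(cases, expected)]
    total = 0.0
    for i in range(n):
        for j in range(n):
            d = math.dist(coords[i], coords[j])
            # Closeness weight: 1 at distance 0, decaying quickly beyond lam.
            weight = math.exp(-4.0 * d * d / (lam * lam))
            total += excess[i] * excess[j] * weight
    return total
```

Pairs of counties that are close together and both have an excess (or both a deficit) of cases push the statistic up, while distant pairs contribute almost nothing.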
8
Whittemore's Test
  • Whittemore et al. proposed the statistic

9
Besag-Newell's R
  • For each case, find the collection of nearest
    counties so that there are a total of at least k
    cases in the area of the original and neighboring
    counties.
  • Using the Poisson distribution, check if this
    area is statistically significant (not adjusting
    for multiple testing)
  • R is the number of cases for which this
    procedure creates a significant area
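
The three steps above can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the authors' implementation: the index case's own county counts toward the k cases, the Poisson mean for an area is its population times the overall rate (total cases divided by total population), and an area is called significant when the probability of at least k cases is at most `alpha`. All function and parameter names are hypothetical.

```python
import math

def poisson_upper_tail(k, mean):
    """P(S >= k) for S ~ Poisson(mean)."""
    term = math.exp(-mean)  # P(S = 0)
    cdf = 0.0
    for s in range(k):
        cdf += term
        term *= mean / (s + 1)  # P(S = s + 1) from P(S = s)
    return 1.0 - cdf

def besag_newell_r(cases, population, coords, k, alpha=0.05):
    """Number of cases whose nearest-k-cases area is significant."""
    n = len(cases)
    rate = sum(cases) / sum(population)  # overall rate (assumed null model)
    r = 0
    for i in range(n):
        if cases[i] == 0:
            continue
        # Add counties in order of distance until at least k cases are covered.
        order = sorted(range(n), key=lambda j: math.dist(coords[i], coords[j]))
        area_cases = 0
        area_pop = 0
        for j in order:
            area_cases += cases[j]
            area_pop += population[j]
            if area_cases >= k:
                break
        # No adjustment for multiple testing, as stated on the slide.
        if area_cases >= k and poisson_upper_tail(k, rate * area_pop) <= alpha:
            r += cases[i]  # every case in county i shares the same area
    return r
```

Because all cases in a county produce the same area, the sketch tests each county once and adds its case count to R when the area is significant.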

10
Besag-Newell's R
  • Let $u_m(i) = \min_j \{ D_j(i) + 1 \ge k \}$. Under the
    null hypothesis, the number of cases s will have
    a Poisson distribution with probability
  • where p = C/N. For each county
  • R is defined as

11
Swartz's Entropy Test
  • The test statistic is defined as

where $n_i$ is the population in county i, and N is
the total population
12
Global Clustering Tests: Power Evaluation
Joint work with Toshiro Tango, Peter Park and
Changhong Song
13
Power Evaluation, Setup
  • 245 counties and county equivalents in
    Northeastern United States
  • Female population
  • 600 randomly distributed cases, according to
    different probability models

14
Note
Besag-Newell's R and Cuzick-Edwards k-NN tests
depend on a clustering scale parameter. For each
test we evaluate three different parameters.
15
Global Chain Clustering
  • Each county has the same expected number of cases
    under the null and alternative hypotheses
  • 300 cases are distributed according to complete
    spatial randomness
  • Each of these has a twin case, located at the
    same or a nearby location.
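
As an illustration, the zero-distance version of this model can be simulated in Python as below. The sketch assumes that "complete spatial randomness" means each original case falls in a county with probability proportional to its population, and that every twin lands in the same county as its original; the function name and arguments are hypothetical.

```python
import random

def simulate_chain_zero_distance(population, n_pairs=300, rng=None):
    """Return case counts per county for n_pairs originals plus their twins."""
    rng = rng or random.Random()
    counties = range(len(population))
    # Originals: placed at random, proportional to population (assumption).
    originals = rng.choices(counties, weights=population, k=n_pairs)
    cases = [0] * len(population)
    for county in originals:
        cases[county] += 2  # the original case and its twin in the same county
    return cases
```

With the default of 300 pairs this yields 600 cases in total. Each county keeps the same expected count as under the null, but counts arrive in pairs, which inflates their variance.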

16
Power: Zero Distance
  • Besag-Newell 0.48 0.49 0.42
  • Cuzick-Edwards 1.00 0.92 0.73
  • Tango's MEET 0.99
  • Swartz's Entropy 1.00
  • Whittemore's Test 0.13
  • Spatial Scan 0.79

17
Power: Fixed Distance, 1
  • Besag-Newell 0.06 0.08 0.23
  • Cuzick-Edwards 0.16 0.32 0.38
  • Tango's MEET 0.41
  • Swartz's Entropy 0.14
  • Whittemore's Test 0.12
  • Spatial Scan 0.28

18
Power: Fixed Distance, 4
  • Besag-Newell 0.06 0.06 0.12
  • Cuzick-Edwards 0.06 0.06 0.07
  • Tango's MEET 0.17
  • Swartz's Entropy 0.06
  • Whittemore's Test 0.10
  • Spatial Scan 0.12

19
Power: Random Distance, 1
  • Besag-Newell 0.14 0.21 0.27
  • Cuzick-Edwards 0.53 0.52 0.47
  • Tango's MEET 0.56
  • Swartz's Entropy 0.39
  • Whittemore's Test 0.12
  • Spatial Scan 0.35

20
Power: Random Distance, 4
  • Besag-Newell 0.08 0.10 0.12
  • Cuzick-Edwards 0.14 0.17 0.18
  • Tango's MEET 0.25
  • Swartz's Entropy 0.13
  • Whittemore's Test 0.10
  • Spatial Scan 0.18

21
Hot Spot Clusters
  • One or more neighboring counties have higher risk
    than those outside.
  • Constant risks among counties in the cluster, as
    well as among those outside the cluster
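
A hot-spot alternative of this kind can be simulated along the following lines. This Python sketch assumes each case is assigned to a county with probability proportional to population, multiplied by the relative risk `rr` inside the cluster; the function name, arguments, and the default of 600 cases (taken from the setup slide) are illustrative only.

```python
import random

def simulate_hot_spot(population, cluster, rr, n_cases=600, rng=None):
    """Return case counts per county under a single hot-spot cluster."""
    rng = rng or random.Random()
    # Constant risk rr inside the cluster, constant risk 1 outside it.
    weights = [
        pop * (rr if i in cluster else 1.0)
        for i, pop in enumerate(population)
    ]
    draws = rng.choices(range(len(population)), weights=weights, k=n_cases)
    cases = [0] * len(population)
    for county in draws:
        cases[county] += 1
    return cases
```

For example, `simulate_hot_spot(population, cluster={0}, rr=2.85)` would raise the risk in county 0 only; power for a test could then be estimated as the share of simulated data sets in which that test rejects H0.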

22
Power: Grand Isle, Vermont (RR = 193)
  • Besag-Newell 0.71 0.39 0.09
  • Cuzick-Edwards 0.75 0.17 0.04
  • Tango's MEET 0.20
  • Swartz's Entropy 0.94
  • Whittemore's Test 0.02
  • Spatial Scan 1.00

23
Power: Grand Isle + 15 neighbors (RR = 3.9)
  • Besag-Newell 0.82 0.88 0.50
  • Cuzick-Edwards 0.76 0.62 0.25
  • Tango's MEET 0.23
  • Swartz's Entropy 0.71
  • Whittemore's Test 0.01
  • Spatial Scan 0.97

24
Power: Pittsburgh, PA (RR = 2.85)
  • Besag-Newell 0.04 0.02 0.98
  • Cuzick-Edwards 0.65 0.92 0.90
  • Tango's MEET 0.92
  • Swartz's Entropy 0.27
  • Whittemore's Test 0.00
  • Spatial Scan 0.94

25
Power: Pittsburgh + 15 neighbors (RR = 2.1)
  • Besag-Newell 0.29 0.28 0.91
  • Cuzick-Edwards 0.60 0.72 0.84
  • Tango's MEET 0.83
  • Swartz's Entropy 0.35
  • Whittemore's Test 0.00
  • Spatial Scan 0.95

26
Power: Manhattan (RR = 2.73)
  • Besag-Newell 0.04 0.03 0.95
  • Cuzick-Edwards 0.63 0.86 0.89
  • Tango's MEET 0.94
  • Swartz's Entropy 0.26
  • Whittemore's Test 0.27
  • Spatial Scan 0.92

27
Power: Manhattan + 15 neighbors (RR = 1.53)
  • Besag-Newell 0.01 0.06 0.37
  • Cuzick-Edwards 0.26 0.65 0.80
  • Tango's MEET 0.99
  • Swartz's Entropy 0.05
  • Whittemore's Test 0.87
  • Spatial Scan 0.93

28
Power, Three Clusters: Grand Isle (RR = 193),
Pittsburgh (RR = 2.85), Manhattan (RR = 2.73)
  • Besag-Newell 0.54 0.18 1.00
  • Cuzick-Edwards 0.99 1.00 0.99
  • Tango's MEET 1.00
  • Swartz's Entropy 0.99
  • Whittemore's Test 0.01
  • Spatial Scan 1.00

29
Power, Three Clusters: Grand Isle + 15, Pittsburgh
+ 15, Manhattan + 15
  • Besag-Newell 0.64 0.77 0.84
  • Cuzick-Edwards 0.91 0.96 0.96
  • Tango's MEET 0.98
  • Swartz's Entropy 0.74
  • Whittemore's Test 0.12
  • Spatial Scan 0.98

30
Conclusions
  • Besag-Newell's R and Cuzick-Edwards k-NN often
    perform very well, but are highly dependent on
    the chosen parameter
  • Moran's I and Whittemore's Test have problems
    with many types of clustering
  • Tango's MEET performs well for global clustering
  • The spatial scan statistic performs well for
    hot-spot clusters

31
Limitations
  • Only a few alternative models evaluated, on one
    particular geographical data set.
  • Results may be different for other types of
    alternative models and data sets.