Title: Global Clustering Tests
Tests for Spatial Randomness
H0: The risk of disease is the same everywhere after adjustment for age, gender and/or other covariates.
Tests for Global Clustering
These tests evaluate whether clustering exists as a global phenomenon throughout the map, without pinpointing the location of specific clusters.
Tests for Global Clustering
- More than 100 different tests for global clustering have been proposed by scientists in different fields. For example:
- Whittemore's Test, Biometrika 1987
- Cuzick-Edwards k-NN, JRSS 1990
- Besag-Newell's R, JRSS 1991
- Tango's Excess Events Test, StatMed 1995
- Swartz Entropy Test, Health and Place 1998
- Tango's Max Excess Events Test, StatMed 2000
Cuzick-Edwards k-NN Test
$$\sum_i c_i \sum_j c_j \, I\big(d_{ij} < d_{i\,k(i)}\big)$$
where
- $c_i$ = number of deaths in county $i$
- $d_{ij}$ = distance from county $i$ to county $j$
- $k(i)$ = the county containing the k-th nearest neighbor to an individual in county $i$, where nearness is defined in terms of expected cases rather than individuals
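A minimal Python sketch of the statistic above, assuming `cases` holds the county counts $c_i$, `dist` is a symmetric distance matrix $d_{ij}$, and `expected` holds the expected counts used to locate $k(i)$. The function names and the rule for $k(i)$ (sort the other counties by distance from $i$ and take the first one at which the cumulative expected count reaches k) are our own illustrative assumptions, not taken from the slides.

```python
import numpy as np

def kth_neighbor_county(dist, expected, k):
    """Hypothetical rule for k(i): order the other counties by distance from
    county i and return the first one where the cumulative expected count
    reaches k (assumption, not from the slides)."""
    n = len(expected)
    k_of = np.empty(n, dtype=int)
    for i in range(n):
        order = np.argsort(dist[i], kind="stable")
        order = order[order != i]                   # county i itself excluded (assumption)
        cum = np.cumsum(expected[order])
        pos = int(np.searchsorted(cum, k))          # first position with cum >= k
        k_of[i] = order[min(pos, len(order) - 1)]   # fall back to the farthest county
    return k_of

def cuzick_edwards(cases, dist, expected, k):
    """sum_i c_i sum_j c_j I(d_ij < d_{i,k(i)}), taken literally from the
    formula above (the j = i term is included)."""
    cases = np.asarray(cases, dtype=float)
    dist = np.asarray(dist, dtype=float)
    k_of = kth_neighbor_county(dist, np.asarray(expected, dtype=float), k)
    radius = dist[np.arange(len(cases)), k_of]      # d_{i,k(i)} for every county i
    within = (dist < radius[:, None]).astype(float) # indicator matrix I(d_ij < d_{i,k(i)})
    return float(cases @ within @ cases)
```

For three counties at positions 0, 1 and 5 on a line with unit expected counts, `cuzick_edwards([2, 1, 3], dist, np.ones(3), k=2)` returns 21.0. In practice the observed value would be compared against values simulated under H0.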
Cuzick-Edwards k-NN Test
The Cuzick-Edwards k-NN test is a special case of the weighted Moran's I test, proposed by Cliff and Ord (1981).
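For comparison, a minimal sketch of Moran's I in its general form with an arbitrary spatial weight matrix `w` (zero diagonal assumed by convention). The function name and inputs are illustrative assumptions; the exact weighting under which Cuzick-Edwards becomes a special case is not given on the slide and is not reproduced here, although the nearest-neighbor indicator $I(d_{ij} < d_{i\,k(i)})$ presumably plays the role of the weights.

```python
import numpy as np

def morans_i(x, w):
    """Moran's I for values x (e.g. county rates) and spatial weights w.

    I = (n / sum_ij w_ij) * sum_ij w_ij z_i z_j / sum_i z_i^2,
    with z_i = x_i - mean(x). A zero diagonal in w is assumed."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    z = x - x.mean()                  # deviations from the mean
    n = len(x)
    return (n / w.sum()) * (z @ w @ z) / (z @ z)
```

With four counties in a chain (each adjacent pair weighted 1) and values `[1, 1, 3, 3]`, the statistic is 1/3: similar values sit next to each other, so I is positive.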
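Tango's Excess Events Test
$$\sum_i \sum_j \big(c_i - E(c_i)\big)\big(c_j - E(c_j)\big)\, e^{-4 d_{ij}^2/\lambda^2}$$
where
- $c_i$ = number of deaths in county $i$
- $E(c_i)$ = expected number of cases in county $i$ under $H_0$
- $d_{ij}$ = distance from county $i$ to county $j$
- $\lambda$ = clustering scale parameter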
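A direct Python transcription of the excess events statistic, assuming array inputs for $c_i$, $E(c_i)$ and $d_{ij}$. The function name is hypothetical. Larger $\lambda$ gives more weight to excess events in distant county pairs; the max version (MEET) that scans over several $\lambda$ values is not shown.

```python
import numpy as np

def tango_eet(cases, expected, dist, lam):
    """sum_i sum_j (c_i - E(c_i)) (c_j - E(c_j)) exp(-4 d_ij^2 / lam^2)."""
    r = np.asarray(cases, dtype=float) - np.asarray(expected, dtype=float)  # excess events
    a = np.exp(-4.0 * np.asarray(dist, dtype=float) ** 2 / lam**2)         # closeness, a_ii = 1
    return float(r @ a @ r)
```

For two counties one unit apart with $\lambda = 2$, `tango_eet([3, 1], [2, 2], [[0, 1], [1, 0]], lam=2.0)` equals $2 - 2e^{-1} \approx 1.264$: the excess in one county and the deficit in its neighbor partly cancel.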
Whittemore's Test
- Whittemore et al. proposed the statistic
Besag-Newell's R
- For each case, find the collection of nearest counties such that the original county and its neighbors together contain at least k cases.
- Using the Poisson distribution, check whether this area is statistically significant (without adjusting for multiple testing).
- R is the number of cases for which this procedure yields a significant area.
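A minimal sketch of the procedure above for county-level data. It assumes the significance check is the Poisson tail probability of seeing at least k cases in the area's population $u$ at the overall rate $p = C/N$, and that every case in a county yields the same area because location is only known at the county level. Function names and the `alpha` default are hypothetical.

```python
import math
import numpy as np

def poisson_tail(k, mean):
    """P(X >= k) for X ~ Poisson(mean)."""
    return 1.0 - sum(math.exp(-mean) * mean**s / math.factorial(s) for s in range(k))

def besag_newell_r(cases, population, dist, k, alpha=0.05):
    """Number of cases whose k-case neighborhood is significant at level alpha
    (no adjustment for multiple testing)."""
    cases = np.asarray(cases, dtype=int)
    population = np.asarray(population, dtype=float)
    dist = np.asarray(dist, dtype=float)
    p = cases.sum() / population.sum()               # overall rate p = C / N
    r = 0
    for i in np.flatnonzero(cases):                  # all cases in county i share one area
        others = np.argsort(dist[i], kind="stable")
        order = np.concatenate(([i], others[others != i]))  # own county first
        cum_cases = np.cumsum(cases[order])
        m = min(int(np.searchsorted(cum_cases, k)), len(order) - 1)  # first point with >= k cases
        u = population[order[: m + 1]].sum()         # population of the area
        if poisson_tail(k, p * u) < alpha:           # >= k cases in so few people is unlikely under H0
            r += int(cases[i])
    return r
```

With counties at positions 0, 1 and 10, populations `[10, 1000, 1000]`, cases `[5, 0, 1]` and k = 3, only the five cases in the small first county produce a significant area, so R = 5. The statistic R itself would then be assessed against its null distribution.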
Besag-Newell's R
- Let $m(i) = \min_j \{\, j : D_j(i) + 1 \geq k \,\}$. Under the null hypothesis, the case number $s$ will have a Poisson distribution with probability
- where $p = C/N$. For each county
- R is defined as
Swartz's Entropy Test
- The test statistic is defined as
where $n_i$ is the population in county $i$, and $N$ is the total population
Global Clustering Tests: Power Evaluation
Joint work with Toshiro Tango, Peter Park and Changhong Song
Power Evaluation: Setup
- 245 counties and county equivalents in the Northeastern United States
- Female population
- 600 cases randomly distributed according to different probability models
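The power figures in the following slides come from simulation. A minimal Monte Carlo sketch of how such power can be estimated, assuming each of the 600 cases falls independently in county $i$ with a given probability (for H0, presumably proportional to the county's female population). The replicate counts, seed and function name are our assumptions; the study's exact protocol is not given on the slides.

```python
import numpy as np

def estimate_power(statistic, null_probs, alt_probs, n_cases=600,
                   n_null=999, n_alt=1000, alpha=0.05, seed=0):
    """Monte Carlo power of a test that rejects for large values of `statistic`.

    null_probs / alt_probs: probability that a single case falls in each
    county under H0 and under the alternative (each must sum to 1)."""
    rng = np.random.default_rng(seed)
    null_stats = np.array([statistic(rng.multinomial(n_cases, null_probs))
                           for _ in range(n_null)])
    critical = np.quantile(null_stats, 1 - alpha)   # simulated critical value
    alt_stats = np.array([statistic(rng.multinomial(n_cases, alt_probs))
                          for _ in range(n_alt)])
    return float(np.mean(alt_stats > critical))     # share of alternative data sets rejected
```

Any of the statistics above can be passed as `statistic` by fixing its other arguments, e.g. `lambda c: tango_eet(c, 600 * null_probs, dist, lam)` if such a helper is available. Using a strict `>` keeps the size at or below `alpha` for discrete statistics.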
Note
Besag-Newell's R and the Cuzick-Edwards k-NN test depend on a clustering scale parameter. For each of these two tests we evaluate three different parameter values, reported as three columns in the power tables.
Global Chain Clustering
- Each county has the same expected number of cases under the null and alternative hypotheses.
- 300 cases are distributed according to complete spatial randomness.
- Each of these has a twin case, located at the same or a nearby location.
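A hypothetical generator for the chain-clustering alternative described above. The zero-distance case (twin in the same county) follows the slide directly. The slides do not define the "fixed distance 1 / 4" and "random distance" variants, so `neighbor_rank`, which places the twin in the county of that rank in distance order, is only an illustrative stand-in. With `neighbor_rank=0` and population-proportional `probs`, each county keeps its null expected count of $600 p_i$; other ranks do not enforce this.

```python
import numpy as np

def chain_clustered_cases(probs, dist, n_pairs=300, neighbor_rank=0, rng=None):
    """Place n_pairs primary cases by complete spatial randomness (county
    probabilities `probs`), then give each one a twin case.

    neighbor_rank=0 puts the twin in the same county; larger values use the
    county of that rank in distance order (hypothetical interpretation)."""
    rng = np.random.default_rng() if rng is None else rng
    probs = np.asarray(probs, dtype=float)
    dist = np.asarray(dist, dtype=float)
    primary = rng.choice(len(probs), size=n_pairs, p=probs)
    counts = np.bincount(primary, minlength=len(probs))
    for i in primary:
        others = np.argsort(dist[i], kind="stable")
        order = np.concatenate(([i], others[others != i]))  # own county first
        twin = order[min(neighbor_rank, len(order) - 1)]
        counts[twin] += 1
    return counts
```

Every zero-distance data set has an even count in each county, which is exactly the kind of excess pairing that the global clustering tests are meant to pick up.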
Power: Zero Distance
- Besag-Newell 0.48 0.49 0.42
- Cuzick-Edwards 1.00 0.92 0.73
- Tango's MEET 0.99
- Swartz Entropy 1.00
- Whittemore's Test 0.13
- Spatial Scan 0.79
Power: Fixed Distance, 1
- Besag-Newell 0.06 0.08 0.23
- Cuzick-Edwards 0.16 0.32 0.38
- Tango's MEET 0.41
- Swartz Entropy 0.14
- Whittemore's Test 0.12
- Spatial Scan 0.28
Power: Fixed Distance, 4
- Besag-Newell 0.06 0.06 0.12
- Cuzick-Edwards 0.06 0.06 0.07
- Tango's MEET 0.17
- Swartz Entropy 0.06
- Whittemore's Test 0.10
- Spatial Scan 0.12
Power: Random Distance, 1
- Besag-Newell 0.14 0.21 0.27
- Cuzick-Edwards 0.53 0.52 0.47
- Tango's MEET 0.56
- Swartz Entropy 0.39
- Whittemore's Test 0.12
- Spatial Scan 0.35
Power: Random Distance, 4
- Besag-Newell 0.08 0.10 0.12
- Cuzick-Edwards 0.14 0.17 0.18
- Tango's MEET 0.25
- Swartz Entropy 0.13
- Whittemore's Test 0.10
- Spatial Scan 0.18
Hot Spot Clusters
- One or more neighboring counties have a higher risk than the counties outside.
- Risk is constant among the counties in the cluster, as well as among those outside the cluster.
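A minimal sketch of the hot-spot alternative described above, assuming "risk" means per-capita risk, so a county's probability of receiving a case is proportional to its population times its relative risk (RR inside the cluster, 1 outside). The function name is hypothetical, and the RR values quoted in the following slides are taken as given rather than derived here.

```python
import numpy as np

def hot_spot_probs(population, cluster, rr):
    """County probabilities for a hot-spot alternative: per-capita risk is
    rr for counties in `cluster` and 1 elsewhere, constant within each group."""
    risk = np.ones(len(population))
    risk[list(cluster)] = rr                  # elevated, constant risk inside the cluster
    weights = np.asarray(population, dtype=float) * risk
    return weights / weights.sum()            # normalize to a probability vector
```

For populations `[100, 100, 200]` with county 0 as the cluster and RR = 3, the probabilities are `[0.5, 1/6, 1/3]`; the resulting vector can be used directly as the alternative probabilities in a multinomial simulation of the 600 cases.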
Power: Grand Isle, Vermont (RR = 193)
- Besag-Newell 0.71 0.39 0.09
- Cuzick-Edwards 0.75 0.17 0.04
- Tango's MEET 0.20
- Swartz Entropy 0.94
- Whittemore's Test 0.02
- Spatial Scan 1.00
Power: Grand Isle and 15 Neighbors (RR = 3.9)
- Besag-Newell 0.82 0.88 0.50
- Cuzick-Edwards 0.76 0.62 0.25
- Tango's MEET 0.23
- Swartz Entropy 0.71
- Whittemore's Test 0.01
- Spatial Scan 0.97
Power: Pittsburgh, PA (RR = 2.85)
- Besag-Newell 0.04 0.02 0.98
- Cuzick-Edwards 0.65 0.92 0.90
- Tango's MEET 0.92
- Swartz Entropy 0.27
- Whittemore's Test 0.00
- Spatial Scan 0.94
Power: Pittsburgh and 15 Neighbors (RR = 2.1)
- Besag-Newell 0.29 0.28 0.91
- Cuzick-Edwards 0.60 0.72 0.84
- Tango's MEET 0.83
- Swartz Entropy 0.35
- Whittemore's Test 0.00
- Spatial Scan 0.95
Power: Manhattan (RR = 2.73)
- Besag-Newell 0.04 0.03 0.95
- Cuzick-Edwards 0.63 0.86 0.89
- Tango's MEET 0.94
- Swartz Entropy 0.26
- Whittemore's Test 0.27
- Spatial Scan 0.92
Power: Manhattan and 15 Neighbors (RR = 1.53)
- Besag-Newell 0.01 0.06 0.37
- Cuzick-Edwards 0.26 0.65 0.80
- Tango's MEET 0.99
- Swartz Entropy 0.05
- Whittemore's Test 0.87
- Spatial Scan 0.93
Power, Three Clusters: Grand Isle (RR = 193), Pittsburgh (RR = 2.85), Manhattan (RR = 2.73)
- Besag-Newell 0.54 0.18 1.00
- Cuzick-Edwards 0.99 1.00 0.99
- Tango's MEET 1.00
- Swartz Entropy 0.99
- Whittemore's Test 0.01
- Spatial Scan 1.00
Power, Three Clusters: Grand Isle, Pittsburgh and Manhattan, Each with 15 Neighbors
- Besag-Newell 0.64 0.77 0.84
- Cuzick-Edwards 0.91 0.96 0.96
- Tango's MEET 0.98
- Swartz Entropy 0.74
- Whittemore's Test 0.12
- Spatial Scan 0.98
Conclusions
- Besag-Newell's R and Cuzick-Edwards k-NN often perform very well, but are highly dependent on the chosen parameter.
- Moran's I and Whittemore's Test have problems with many types of clustering.
- Tango's MEET performs well for global clustering.
- The spatial scan statistic performs well for hot-spot clusters.
Limitations
- Only a few alternative models were evaluated, on one particular geographical data set.
- Results may differ for other types of alternative models and data sets.