Summary of - PowerPoint PPT Presentation

1
Summary of A Spatial Scan Statistic by M.
Kulldorff

Presented by Gauri S. Datta
gauri_at_stat.uga.edu
Mid-Year Meeting
February 3, 2006

2
Background

Scan Statistic
A tool to detect clusters in a Point Process
Naus (1965 JASA) studied in one dimension
tests if a 1-dim point process is purely random
Point Process
Consider a time interval [a,b] and a window
A = [t, t+w] of fixed width w
µ(A) = # of e-mails arrived in the time window A
n(A) = n_A = # of junk e-mails = number of
points
Arrival times of junk e-mails define a Point
Process

3
Main Idea in Scan Statistic

Move a window [t, t+w] of size w < b-a over a time
interval [a,b]
Over all possible values of t, record the maximum
number of points in the window
Compare this number with cut-off points under the
hypothesis of a purely Poisson Process
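
The sliding-window idea above can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the presentation: the function name `scan_statistic` and its arguments are hypothetical, and the window is assumed to be the closed interval [t, t+w] with a <= t <= b-w.

```python
import bisect

def scan_statistic(points, a, b, w):
    """Maximum number of points in any window [t, t+w] lying inside [a, b]."""
    pts = sorted(p for p in points if a <= p <= b)
    best = 0
    # An optimal window can be slid right until its left edge hits a point
    # (or until its right edge hits b), so only those windows are checked.
    for p in pts:
        left = min(p, b - w)
        lo = bisect.bisect_left(pts, left)        # first point >= left
        hi = bisect.bisect_right(pts, left + w)   # one past last point <= left + w
        best = max(best, hi - lo)
    return best

# Hypothetical arrival times on [0, 1], window width 0.2
print(scan_statistic([0.1, 0.2, 0.25, 0.9], 0.0, 1.0, 0.2))
```

The returned maximum would then be compared with a cut-off computed under the purely random (Poisson) hypothesis.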

4
(No Transcript)
5
(No Transcript; figure labels: p, p, q)
6
Building block of Scan Test

Repeated use of tests for equality of two
Binomial or Poisson populations
Two populations are defined by the scanning
window A and its complement Ac
As in multiple comparison, these tests are
dependent as one moves the scanning window

7
Spatial Scan Statistic (SSS)

Kulldorff (1997) used SSS to detect clusters in
spatial process
SSS can be used
In multi-dim point process
With variable window size
With baseline process an inhomogeneous Poisson
process or Bernoulli Process

8
SSS (continued)

Scanning window can be any predefined shape
SSS is on a geographical space G with a measure µ
In traditional point process, G is a line, µ is a
uniform measure
In 2-dim, G is a plane, µ a Lebesgue measure

9
(No Transcript; figure labels: p, p, q)
10
Examples

Forestry
Spatial clustering of trees.
Want to look for clusters of a specific kind of
trees after adjusting for uneven spatial
distribution of all trees
µ(A) = Total # of trees in region A
n_A = # of trees in A of specific kind

11
Examples (continued)

Epidemiology
Interest in detecting geographical clusters of
disease
Need to adjust for uneven population density
Rural vs. urban population
For data aggregated into census districts,
measure is concentrated at the central
coordinates of districts

12
Examples (continued)

If interest is in space-time clusters of a
disease, the measure will still be concentrated
in the geographical region as in the prior
example
Adjusting for uneven population distribution is
not always enough. Should take confounding
factors into account. E.g., in epidemiology
measure can reflect standardized expected
incidence rate

13
SS LR statistic

For a fixed size window, scan statistic is the
maximum # of points in the window at any given
time/geographical region
Test Stat is equivalent to LR test statistic for
testing H0: θ1 = θ2 vs. Ha: θ1 > θ2
Generalization to LR test is important for
variable window

14
Generalized SS Notation/Models

G: Geographical area / study space
A: Window ⊂ G
N(A): Random # of points in A
A spatial point process
Goal: to find the prominent cluster
Two useful models for point process
(a) Bernoulli model
(b) Poisson model

15
Standard Models for SS

For Bernoulli model, measure µ is such that µ(A)
is an integer for all subsets A of G
Two states (disease point or no disease) for
each unit
Location of the points define a point process

16
(No Transcript)
17
LR Test Bernoulli Model
18
LR Test Bernoulli Model
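
The formulas on these two slides were not captured in the transcript. As a rough sketch, the Bernoulli likelihood ratio for a single zone Z, in the form given in Kulldorff (1997), could be computed as below. The function and argument names are hypothetical; `n_z` and `n_g` are the numbers of points (cases) in Z and in G, and `mu_z` and `mu_g` are the corresponding measures, assumed to satisfy 0 < µ(Z) < µ(G).

```python
import math

def _xlogy(x, y):
    """x * log(y), with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)

def bernoulli_llr(n_z, mu_z, n_g, mu_g):
    """Log likelihood ratio for zone Z under the Bernoulli model."""
    p = n_z / mu_z                      # estimated probability inside Z
    q = (n_g - n_z) / (mu_g - mu_z)     # estimated probability outside Z
    if p <= q:
        return 0.0                      # no excess inside Z: ratio is 1
    r = n_g / mu_g                      # estimated probability under H0
    log_alt = (_xlogy(n_z, p) + _xlogy(mu_z - n_z, 1 - p)
               + _xlogy(n_g - n_z, q)
               + _xlogy((mu_g - mu_z) - (n_g - n_z), 1 - q))
    log_null = _xlogy(n_g, r) + _xlogy(mu_g - n_g, 1 - r)
    return log_alt - log_null

# Hypothetical counts: 10 cases among 100 units in Z, 20 among 1000 in G
print(bernoulli_llr(10, 100, 20, 1000))
```

The scan statistic is the maximum of this quantity over all candidate zones.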
19
Poisson Model

Under Poisson model, points generated by inhom.
Poiss. Proc. There is exactly one zone Z ⊂ G s.t.
N(A) ~ Po(p µ(A ∩ Z) + q µ(A ∩ Z^c)) for all A.
Null hypothesis H0: p = q
Alternative hypo H1: p > q, Z ∈ 𝒵.
Under H0, N(A) ~ Po(p µ(A)) for all A.
- the parameter Z disappears under H0
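
The continuation slides carry no transcript, so the following is only a sketch of the Poisson likelihood ratio for one zone Z, in the form given in Kulldorff (1997). The names are hypothetical: `n_z`, `n_g` are the observed numbers of points in Z and G, and `mu_z`, `mu_g` are the measures µ(Z) and µ(G), assumed to satisfy 0 < µ(Z) < µ(G).

```python
import math

def poisson_llr(n_z, mu_z, n_g, mu_g):
    """Log likelihood ratio for zone Z under the Poisson model."""
    inside = n_z / mu_z
    outside = (n_g - n_z) / (mu_g - mu_z)
    if inside <= outside:
        return 0.0                      # no excess inside Z: ratio is 1
    llr = n_z * math.log(inside) - n_g * math.log(n_g / mu_g)
    if n_g > n_z:                       # skip the 0 * log(0) term
        llr += (n_g - n_z) * math.log(outside)
    return llr

# Hypothetical counts: 10 points on measure 100 in Z, 20 on measure 1000 in G
print(poisson_llr(10, 100, 20, 1000))
```

As with the Bernoulli model, the test statistic is this quantity maximized over the candidate zones.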

20
Poisson Model (continued)
21
Poisson Model (continued)
22
Poisson Model (continued)
23
Choice of Zones

How is the collection of zones 𝒵 selected? Possibilities
1. All circular subsets
2. All circles centered at any of several foci on a
fixed grid, with a possible upper limit on size
3. Same as (2) but with a fixed size
4. All rectangles of fixed size and shape
5. If looking for space-time clusters, use
cylinders scanning circular geographical areas
over variable time intervals

24
Bernoulli vs. Poisson Model

Choice between a Bernoulli or Poisson model does
not matter much if
n(G) << µ(G)
In other cases, use the model most appropriate
for application

25
A Useful Result

An important result on most likely cluster
based on these models is given in the paper. It
states that as long as the points within the zone
constituting the most likely cluster are located
where they are, H_0 will be rejected irrespective
of the other points in G. If a cluster is located
in Seattle, locations of the points on the East
Coast of the U.S. do not matter (Theorem 1)

26
Computations and MC

To find the value of λ, we need to calculate LR
maximized over collection of zones in H1. Seems
like a daunting task since # of zones could be
infinite.
# of observed points is finite
For a fixed # of points, likelihood decreases as
µ(Z) increases

27
Computations (contd)

If the circle size increases for a fixed focus,
need to recalculate likelihood whenever a new
point enters the circle. For a finite # of points,
# of likelihood recalculations for each focus is finite.
Distribution of λ is difficult. MC simulation
used to generate histogram of λ. Under H0,
replicate the data sets conditional on n_G.
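
A minimal sketch of the Monte Carlo step for aggregated data is shown below. It assumes the Poisson model, under which the counts per region, conditional on the total n_G, are multinomial with probabilities proportional to the measure. The function name, the `statistic` callback (any function mapping a list of counts to the maximized LR) and the toy data are all hypothetical.

```python
import random

def mc_p_value_spatial(cases, measure, statistic, n_sim=999, seed=0):
    """Monte Carlo p-value of a scan statistic, conditional on the total count."""
    rng = random.Random(seed)
    n_g = sum(cases)
    observed = statistic(cases)
    regions = list(range(len(measure)))
    exceed = 0
    for _ in range(n_sim):
        # Replicate the data under H0: allocate n_G points to regions
        # with probability proportional to the measure.
        sim = [0] * len(measure)
        for i in rng.choices(regions, weights=measure, k=n_g):
            sim[i] += 1
        if statistic(sim) >= observed:
            exceed += 1
    # Rank of the observed value among the replicates
    return (exceed + 1) / (n_sim + 1)

# Toy stand-in statistic (largest single count); a real analysis would
# pass the likelihood ratio maximized over all zones instead.
print(mc_p_value_spatial([10, 1, 1], [100, 100, 100], max, n_sim=199))
```

For the Bernoulli model the replicates would instead be drawn without replacement, which this sketch does not cover.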

28
Application of SSS to SIDS

Bernoulli and Poisson models are illustrated
using the SIDS data from NC
For 100 counties in NC, total # of live births
and # of SIDS cases for 1974-84.
Live births range from 567 to 52345
Locations of county seats are the coordinates.
Measure is the # of live births in a county

29
Application to SIDS (continued)

Zones for scanning window are circles centered at
a county coordinate point including at most half
of the total population
Zones are circular only wrt the aggregated data.
As circles around a county seat are drawn, other
counties will either be completely part of a zone
or else not at all, depending on whether its
county seat is within the circle or not
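
The construction of zones from aggregated data can be sketched as follows: for each county seat, the other seats are sorted by distance and added one at a time, stopping before the zone would exceed half of the total measure. The function name and the toy coordinates are hypothetical, and ties in distance are not treated specially here.

```python
import math

def circular_zones(coords, measure, max_fraction=0.5):
    """Yield (center_index, member_indices) for nested circular zones."""
    total = sum(measure)
    for c, (cx, cy) in enumerate(coords):
        # Regions ordered by distance of their coordinate point from the center
        order = sorted(
            range(len(coords)),
            key=lambda i: math.hypot(coords[i][0] - cx, coords[i][1] - cy),
        )
        members, mass = [], 0.0
        for i in order:
            mass += measure[i]
            if mass > max_fraction * total:
                break                   # zone would exceed the size limit
            members.append(i)
            yield c, tuple(members)     # a region is entirely in or out

# Three hypothetical regions on a line, with measures 1, 1 and 2
for center, zone in circular_zones([(0, 0), (1, 0), (5, 0)], [1, 1, 2]):
    print(center, zone)
```

Each yielded zone is a whole set of regions, which is why the zones are circular only with respect to the aggregated data.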

30
Bernoulli model for SIDS

Bernoulli model is very natural. Each birth can
correspond to at most one SIDS case. Table 1 summarizes
the results of the analysis.
From Figure 1, the most likely cluster, A,
consists of Bladen, Columbus, Hoke, Robeson, and
Scotland.
Using a conservative test, a secondary cluster,
B, consists of Halifax, Hertford and Northampton
counties.

31
Poisson model for SIDS

For a rare disease such as SIDS, Poisson model gives a
close approximation to Bernoulli. Results are
reported in Table 1
Both models detect the same cluster
P-values for the primary cluster are the same for
both models; p-values for the secondary
cluster are very close

32
Application to SIDS (continued)
33
Two significant clusters based on SSS
34
SSS adjusted for Race

For SIDS one useful covariate is race
Race is related to SIDS through unobserved
covariates such as quality of housing, access to
health care
Overall incidence of SIDS for white children is
1.512 per 1000 and for black children is 2.970
per 1000.
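
One way to read the race adjustment is as indirect standardization: the measure of a county becomes its expected number of SIDS cases given its racial composition, using the two overall rates quoted above. The sketch below assumes that reading; the function name and the birth counts in the example are hypothetical, not data from the paper.

```python
RATE_WHITE = 1.512 / 1000   # overall SIDS incidence, white children (from the slide)
RATE_BLACK = 2.970 / 1000   # overall SIDS incidence, black children (from the slide)

def adjusted_measure(white_births, black_births):
    """Expected number of SIDS cases in a county under race-specific rates."""
    return RATE_WHITE * white_births + RATE_BLACK * black_births

# Hypothetical county with 3000 white and 1000 black live births
print(adjusted_measure(3000, 1000))
```

Scanning with this expected count as the measure, in place of the raw number of live births, flags a county only when its cases exceed what its racial composition alone would predict.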

35
SSS race-adjusted (continued)

Racial distribution differs widely among the
counties in NC
This analysis leads to the same primary cluster
(see Figure 2)
Previous secondary cluster disappeared but a
third secondary cluster C emerges. Cluster C
consists of a bunch of counties in the western
part of the state

36
Application to SIDS (continued)
37
SSS to SIDS adjusted for race
38
A Bayesian alternative to SSS