Title: Apresentação do PowerPoint (PowerPoint Presentation). Author: Usuario. Last modified by: Luiz Duczmal. Created: 9/10/2005 1:32:12 PM. Document presentation format: PowerPoint PPT presentation.

1
Geoinfo 2006
What is the true shape of a disease cluster? The
multi-objective genetic scan
Luiz Duczmal
André L.F. Cançado
Ricardo C.H. Takahashi
Univ. Federal Minas Gerais, Brazil, Statistics
Dept., Electrical Engineering Dept.,
Mathematics Dept.
2
Irregularly shaped spatial disease clusters occur
commonly in epidemiological studies, but their
geographic delineation is poorly defined. Most
current spatial scan software displays only one
of the many possible cluster solutions with
different shapes, from the most compact round
cluster to the most irregularly shaped one,
corresponding to varying values of the
penalization parameter imposed on the freedom
of shape.
Even when a fairly complete set of solutions
is available, the choice of the most appropriate
parameter setting is left to the practitioner,
whose decision is often subjective.
3
We propose quantitative criteria for choosing the
best cluster solution, through multi-objective
optimization, by finding the Pareto-set in the
solution space. Two competing objectives are
involved in the search: regularity of shape and
scan statistic value. Instead of sequentially
running a cluster-finding algorithm with varying
degrees of penalization, the complete set of
solutions is found in parallel, employing a
genetic algorithm.
4
  • The cluster significance concept is extended
    to this set in a natural and unbiased way, and
    is employed as a decision criterion for choosing
    the optimal solution. The Gumbel distribution is
    used to approximate the empirical scan statistic
    distribution, speeding up the significance
    estimation. The method is fast, with good power
    of detection. An application to breast cancer
    clusters is discussed.
  • Keywords: spatial scan statistic, disease
    clusters, geometric compactness penalty
    correction, Pareto-sets, multi-objective
    optimization, vector optimization, Gumbel
    distribution, genetic algorithm.

5
Spatial Scan Statistics, Kulldorff (1997)
Map with m regions, total population N, and C
cases.
Under the null hypothesis there is no cluster
in the map, and the number of cases in each
region is Poisson distributed.
6
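For each circle centered on a region's centroid,
let z be the collection of regions that lie
inside it. Let $c_z$ = number of cases inside z
and $\mu_z$ = expected cases inside z.
The scan statistic is defined as

$$L(z) = \left(\frac{c_z}{\mu_z}\right)^{c_z} \left(\frac{C - c_z}{C - \mu_z}\right)^{C - c_z}$$

if $c_z > \mu_z$, and one otherwise.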
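
As a minimal sketch of how this statistic can be evaluated for a single zone, the function below works on the log scale. It assumes the standard Poisson form of Kulldorff's likelihood ratio, with the expected cases scaled so that they sum to the total number of cases C over the whole map. The function and argument names are illustrative, not taken from the slides.

```python
import math


def log_likelihood_ratio(cases_in_zone, expected_in_zone, total_cases):
    """Log of the Poisson scan likelihood ratio for one zone.

    Assumes expected counts are scaled to sum to total_cases over the map.
    Returns 0.0 (likelihood ratio equal to one) when the zone has no excess.
    """
    c, mu, C = cases_in_zone, expected_in_zone, total_cases
    if c <= mu:
        return 0.0
    inside = c * math.log(c / mu)
    # The outside term vanishes when every case falls inside the zone.
    outside = (C - c) * math.log((C - c) / (C - mu)) if c < C else 0.0
    return inside + outside
```

Working with the logarithm avoids overflow for large case counts and gives the LLR(z) value used on the later slides.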
7
The collection (or zone) z with the highest L(z)
is the most likely cluster.
We sweep through all the $m^2$ possible circular
zones, looking for the highest L(z) value.
We need to compare this value against the max
L(z) for maps with cases distributed
randomly under the null hypothesis.
The whole procedure is repeated thousands of
times, once for each set of randomly distributed
cases (Monte Carlo, Dwass (1957)).
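
The Monte Carlo step can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: null maps are drawn by distributing the C cases among regions with probability proportional to their expected counts, and the p-value is the usual rank of the observed maximum among the simulated maxima. Both function names are hypothetical.

```python
import random
from collections import Counter


def simulate_null_cases(expected, total_cases, rng):
    """Distribute total_cases among regions, proportionally to expected counts."""
    regions = range(len(expected))
    draws = rng.choices(regions, weights=expected, k=total_cases)
    counts = Counter(draws)
    return [counts.get(i, 0) for i in regions]


def monte_carlo_p_value(observed_max, simulated_maxima):
    """Rank-based p-value of the observed maximum among the simulated ones."""
    at_least_as_large = sum(1 for s in simulated_maxima if s >= observed_max)
    return (1 + at_least_as_large) / (1 + len(simulated_maxima))
```

With 999 simulated maps, the smallest attainable p-value under this rule is 1/1000.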
8
Extreme example of an irregularly shaped cluster
Penalty function to control the freedom of shape
(joint work with Kulldorff and Huang)
9
A(z) = area of the zone z
H(z) = perimeter of the convex hull of z
Intuitively, the convex hull of a planar object
is the region inside a rubber band stretched
around it.
Compactness
K(z) = the area of z divided by the area of the
circle with perimeter H(z):
$$K(z) = \frac{4\pi A(z)}{H(z)^2}$$
10
Compactness for some common shapes
Circle: $K(z) = 1$
Square: $K(z) = \pi/4$
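
These two values can be checked numerically. A circle with perimeter H has radius H/(2π) and area H²/(4π), so the compactness reduces to 4πA/H². The short sketch below uses that expression; the function name is illustrative.

```python
import math


def compactness(area, hull_perimeter):
    """Area of the zone divided by the area of the circle whose
    perimeter equals the perimeter of the zone's convex hull."""
    return 4.0 * math.pi * area / hull_perimeter ** 2


# Circle of radius 2: area = pi * r^2, hull perimeter = 2 * pi * r.
circle_k = compactness(math.pi * 2.0 ** 2, 2.0 * math.pi * 2.0)

# Square of side 3: area = s^2, hull perimeter = 4 * s.
square_k = compactness(3.0 ** 2, 4.0 * 3.0)
```

Neither result depends on the chosen radius or side length, since K(z) is scale invariant.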
11
Penalty function for the log of the likelihood
ratio (LLR(z)):
$K(z) \cdot LLR(z)$
Generalized compactness correction:
$K(z)^a \cdot LLR(z)$
a = 1: full compactness correction
a = 0.5: medium compactness correction
a = 0: no compactness correction
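
A small sketch of the correction, assuming the generalized form multiplies LLR(z) by the compactness K(z) raised to the power a (this reading of the slide is an assumption, and the function name is illustrative):

```python
def penalized_llr(llr, compactness, a=1.0):
    """LLR penalized by the compactness raised to the exponent a.

    a = 1 applies the full correction, a = 0 leaves the LLR unchanged.
    """
    return (compactness ** a) * llr
```

For a zone with K(z) = 0.25 and LLR(z) = 8, the full correction gives 2, the medium correction (a = 0.5) gives 4, and no correction leaves 8, so lower values of a favor more irregular zones.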
12
The Elliptic Scan Statistic (joint work with
Kulldorff, Huang and Pickle)
The scanning window has variable location, size,
shape and angle. A penalty function may be used.
13
Breast Cancer Mortality Rates
15
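[Figure: circular scan with penalty correction; scale from 1 to 0]
16
[Figure: elliptical scan with penalty correction; scale from 1 to 0]
17
[Figure: irregular scan with penalty correction; scale from 1 to 0]
18
[Figure: irregular scan with no penalty correction: disaster! Scale from 1 to 0]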
19
Extreme example of an irregularly shaped cluster
(joint work with Martin Kulldorff and Lan Huang)
20
Average homicide rate, 1998-2002, Minas Gerais
State, Brazil (homicides per 100,000 inhabitants
per year). 853 municipalities. Source: DATASUS.
Map by Ricardo Tavares. Scale bar: 100 km.
21
Genetic Algorithms (joint work with Cançado,
Takahashi and Bessegato)
  • OBJECTIVE: find a quasi-optimal solution for a
    maximization problem.
  • Initial population.
  • Random crossing-over of parents and offspring
    generation.
  • Selection of children and parents for the next
    generation.
  • Random mutation.
  • Repeat the previous steps for a predefined number
    of generations or until there is no improvement
    in the functional.
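
The generic loop listed above can be sketched on a toy problem. The example below is not the zone-based algorithm of the talk: it evolves plain bit strings with one-point crossover, which is an assumption made only to keep the illustration short. It runs for a fixed number of generations and omits the "no improvement" stopping rule. All names and default parameters are hypothetical.

```python
import random


def genetic_maximize(fitness, n_bits, pop_size=20, generations=50,
                     mutation_rate=0.05, seed=0):
    """Toy genetic algorithm maximizing fitness over bit strings (n_bits >= 2)."""
    rng = random.Random(seed)
    # Initial population: random bit strings.
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        children = []
        for _ in range(pop_size):
            # Random crossing-over of two parents (one-point crossover).
            mother, father = rng.sample(population, 2)
            cut = rng.randrange(1, n_bits)
            child = mother[:cut] + father[cut:]
            # Random mutation: flip each bit with a small probability.
            child = [1 - b if rng.random() < mutation_rate else b
                     for b in child]
            children.append(child)
        # Selection of children and parents for the next generation.
        population = sorted(population + children, key=fitness,
                            reverse=True)[:pop_size]
        best = population[0]
    return best
```

Calling `genetic_maximize(sum, 12)` searches for the 12-bit string with the most ones. Because parents compete with their children in the selection step, the best fitness never decreases from one generation to the next.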

22
We minimize the graph-related operations by
means of fast offspring generation and fast
evaluation of Kulldorff's scan likelihood
ratio statistic. This algorithm is more than
ten times faster and exhibits less variance
than a similar approach using simulated
annealing, and thus gives better confidence
intervals for the Monte Carlo inference process
of significance evaluation for the most likely
cluster found.
25
Incidence of Malaria Deaths in the Brazilian
Amazon (1998-2002)
27
Initial population construction: Start at a region
of the map.
28
Initial population construction: Add the neighbor
which forms the highest LLR 2-cell zone.
29
Initial population construction: Add the neighbor
which forms the highest LLR 3-cell zone.
30
Initial population construction: Add the neighbor
which forms the highest LLR 4-cell zone.
31
Initial population construction: Stop. (It is
impossible to form a higher LLR 5-cell zone.)
32
Initial population construction: Start at another
region of the map.
33
Initial population construction: Add the neighbor
which forms the highest LLR 2-cell zone.
34
Initial population construction: etc. Repeat the
previous steps for all the regions of the map.
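
The greedy construction walked through on these slides can be sketched as below. The sketch assumes the map is given as an adjacency dictionary plus lists of observed and expected cases per region, with expected counts summing to the total number of cases. The data layout, the helper `llr`, and the function name `grow_zone` are all illustrative choices, not taken from the talk.

```python
import math


def llr(c, mu, total):
    """Log of the Poisson scan likelihood ratio (0.0 if there is no excess)."""
    if c <= mu:
        return 0.0
    outside = (total - c) * math.log((total - c) / (total - mu)) if c < total else 0.0
    return c * math.log(c / mu) + outside


def grow_zone(start, neighbors, cases, expected):
    """Grow a connected zone from start, adding at each step the neighbor
    that yields the highest LLR, and stopping when no neighbor improves it."""
    total = sum(cases)
    zone = {start}
    best = llr(cases[start], expected[start], total)
    while True:
        frontier = {n for r in zone for n in neighbors[r]} - zone
        candidates = [
            (llr(sum(cases[r] for r in zone | {n}),
                 sum(expected[r] for r in zone | {n}), total), n)
            for n in frontier
        ]
        if not candidates:
            return zone, best
        candidate_llr, candidate = max(candidates)
        if candidate_llr <= best:
            return zone, best
        zone.add(candidate)
        best = candidate_llr


# Toy map: four regions in a row, with an excess of cases in regions 1 and 2.
toy_neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
toy_cases = [1, 9, 8, 2]
toy_expected = [5.0, 5.0, 5.0, 5.0]
initial_population = [grow_zone(s, toy_neighbors, toy_cases, toy_expected)
                      for s in toy_neighbors]
```

Every zone built this way is connected by construction, because only neighbors of the current zone are ever added.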
35
THE OFFSPRING GENERATION (a simple example)
38
THE OFFSPRING GENERATION (a simple example)
Another possible numbering
39
THE OFFSPRING GENERATION (a more sophisticated
example)
40
One instance of two parent trees
41
  • Advantages
  • The offspring generation is very inexpensive
  • All the children zones are automatically
    connected
  • Random mutations are easy to implement
  • The selection for the next generation is
    straightforward
  • Fast evolution convergence
  • The variance between different test runs is
    small.

42
Population Evolution Performance
43
  • Irregularly shaped clusters benchmark, Northeast
    US counties map.
  • Duczmal L, Kulldorff M, Huang L. (2006)
    Evaluation of spatial scan statistics for
    irregularly shaped clusters. J. Comput. Graph.
    Stat.

44
 
  Power evaluation of the genetic algorithm,
compared to the simulated annealing algorithm.
45
Cluster of high incidence of breast cancer. São
Paulo State, Brazil, 2002. Population adjusted
for age and under-reporting.
46
Cluster of high incidence of breast cancer. São
Paulo State, Brazil, 2002. Population adjusted
for age and under-reporting.
Data source: DATASUS, G.L. Souza
Compactness correction: 1.0
Cluster cases: 2,924
Cluster population: 346,024
Incidence: 0.00845
LLR: 298.9
p-value = 0.001
Scale bar: 0 to 100 km
47
Cluster of high incidence of breast cancer. São
Paulo State, Brazil, 2002. Population adjusted
for age and under-reporting.
Data source: DATASUS, G.L. Souza
Compactness correction: 0.5
Cluster cases: 3,078
Cluster population: 361,373
Incidence: 0.00852
LLR: 343.8
p-value = 0.001
Scale bar: 0 to 100 km
48
Cluster of high incidence of breast cancer. São
Paulo State, Brazil, 2002. Population adjusted
for age and under-reporting.
Data source: DATASUS, G.L. Souza
Compactness correction: 0.0
Cluster cases: 3,324
Cluster population: 394,294
Incidence: 0.00843
LLR: 449.6
p-value = 0.001
Scale bar: 0 to 100 km
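
The incidence reported for the three São Paulo solutions is the number of cluster cases divided by the cluster population. The short check below reproduces those figures from the numbers on slides 46 to 48. The dictionary layout is only an illustrative way of holding them.

```python
# Compactness correction -> (cluster cases, cluster population, reported LLR),
# as given on slides 46 to 48.
solutions = {
    1.0: (2924, 346024, 298.9),
    0.5: (3078, 361373, 343.8),
    0.0: (3324, 394294, 449.6),
}


def incidence(cases, population):
    """Cases per inhabitant inside the cluster."""
    return cases / population


incidences = {a: incidence(cases, pop)
              for a, (cases, pop, _) in solutions.items()}
```

The incidence is nearly the same in all three solutions, while the LLR grows as the correction is relaxed and the cluster takes in more population.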
49
  • The genetic algorithm for disease cluster
    detection is fast and exhibits less variance
    than similar approaches.
  • Its potential use in epidemiological studies
    and syndromic surveillance is encouraged.
  • The need for penalty functions for the
    irregularity of cluster shape is clearly
    demonstrated by the power evaluation tests.
  • The power of detection of clusters is similar to
    that of the simulated annealing algorithm.
  • The flexibility of shape control gives the
    practitioner more insight into the geographic
    cluster delineation.

50
Northeast US counties map with observed cases.
Age-adjusted female breast cancer, 1995.
Kulldorff M., Feuer E.J., Miller B.A., Freedman
L.S. (1997) Breast cancer clusters in the
Northeast United States: a geographic analysis.
American Journal of Epidemiology, 146:161-170.
Percent below/above expected (map legend):
> 20, 12 to 20, 4 to 12, -4 to 4, -12 to -4,
-20 to -12, < -20
51
The Gumbel parametric approximation to the log
likelihood ratio scan. Joint work with Cançado
and Takahashi. Based on the results of Abrams,
Kulldorff and Kleinman.
[Figure; axis label: LLR]
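
One way to use a Gumbel approximation is to fit the distribution to the maximum LLR values obtained from a modest number of null simulations and read the p-value from the fitted tail. The sketch below does this with method-of-moments estimates. Whether the authors used moments or another estimator is not stated on the slide, so that choice is an assumption, as are the function names.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329


def fit_gumbel_moments(samples):
    """Method-of-moments estimates (location, scale) of a Gumbel distribution."""
    mean = statistics.mean(samples)
    std = statistics.stdev(samples)
    scale = std * math.sqrt(6.0) / math.pi
    location = mean - EULER_GAMMA * scale
    return location, scale


def gumbel_p_value(observed, location, scale):
    """Upper-tail probability P(X >= observed) under the fitted Gumbel."""
    z = (observed - location) / scale
    # 1 - exp(-exp(-z)), written with expm1 to keep precision in the far tail.
    return -math.expm1(-math.exp(-z))
```

Unlike the rank-based Monte Carlo p-value, the fitted tail is not limited to multiples of 1/(R + 1), where R is the number of simulations.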
52
Pareto Sets
The detection of irregularly shaped disease
clusters through multi-objective optimization.
53
The genetic algorithm is used to maximize two
objectives:
  • the scan statistic;
  • the regularity of shape (compactness).
54
Elite (red dots): each red dot is not surpassed
by any other point on all variables
simultaneously.
[Plot axes: compactness, log likelihood ratio]
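
Selecting the elite amounts to a non-dominated filter over (compactness, LLR) pairs. The sketch below assumes the usual Pareto dominance rule, in which a point is dominated when another point is at least as good on both objectives and differs from it. The slide's wording could also be read as requiring strict improvement on both, so this rule is an assumption. The function name and sample points are illustrative.

```python
def pareto_elite(points):
    """Return the points not dominated by any other, maximizing both coordinates."""
    def dominates(p, q):
        return p[0] >= q[0] and p[1] >= q[1] and p != q

    return [p for p in points if not any(dominates(q, p) for q in points)]


# (compactness, LLR) pairs for six hypothetical candidate zones.
candidates = [(1, 5), (2, 4), (3, 3), (2, 2), (1, 1), (3, 1)]
elite = pareto_elite(candidates)
```

Here the elite is the first three points: each of the remaining three is matched or beaten on both objectives by one of them. The filter is quadratic in the number of points, which is acceptable for the population sizes of a genetic algorithm.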
58
The Pareto surface is formed by joining the elite
points.
[Plot axes: compactness, log likelihood ratio]
66
Null hypothesis: critical value Pareto surface,
95th percentile (circles). 100 elites (from 100
simulations under the null hypothesis).
[Plot axes: compactness, log likelihood ratio]
67
Power test: Pareto surface, 95th percentile under
the null hypothesis (red circles). 100 elites (from
100 simulations under the alternative hypothesis).
[Plot axes: compactness, log likelihood ratio]
76
References
  • Duczmal L, Kulldorff M, Huang L. (2006)
    Evaluation of spatial scan statistics for
    irregularly shaped clusters. J. Comput. Graph.
    Stat., 15(2), 1-15.
  • Duczmal L, Cançado ALF, Takahashi RHC, Bessegato
    LF. (2006) A genetic algorithm for irregularly
    shaped spatial scan statistics (submitted).
  • Duczmal L, Cançado ALF, Takahashi RHC. (2006)
    Delineation of irregularly shaped disease
    clusters through multi-objective optimization
    (submitted).
  • Duczmal L, Assunção R. (2004) A simulated
    annealing strategy for the detection of
    arbitrarily shaped spatial clusters. Comp. Stat.
    Data Anal., 45, 269-286.
  • Kulldorff M, Huang L, Pickle L, Duczmal L.
    (2005) An elliptic spatial scan statistic.
    Statistics in Medicine (to appear).
  • Patil GP, Taillie C. (2004) Upper level set scan
    statistic for detecting arbitrarily shaped
    hotspots. Envir. Ecol. Stat., 11, 183-197.
  • Kulldorff M. (1997) A spatial scan statistic.
    Comm. Statist. Theory Meth., 26(6), 1481-1496.
  • Kulldorff M, Tango T, Park PJ. (2003) Power
    comparisons for disease clustering tests. Comp.
    Stat. Data Anal., 42, 665-684.
  • Kulldorff M, Feuer EJ, Miller BA, Freedman LS.
    (1997) Breast cancer clusters in the Northeast
    United States: a geographic analysis. Amer. J.
    Epidem., 146, 161-170.
  • de Souza Jr. GL. (2005) The detection of clusters
    of breast cancer in São Paulo State, Brazil.
    M.Sc. Dissertation, Univ. Fed. Minas Gerais.

 