National Institute of Statistical Sciences Workshop on Statistics and Counterterrorism, G. P. Patil, November 20, 2004, New York University

Title: NIST Presentation Author: gp patil Last modified by: G. P. Patil Created Date: 8/9/2001 5:15:35 PM Document presentation format: On-screen Show – PowerPoint PPT presentation

Slides: 83
Provided by: gppa4
Learn more at: https://www.niss.org
Transcript and Presenter's Notes

1
National Institute of Statistical Sciences
Workshop on Statistics and Counterterrorism
G. P. Patil
November 20, 2004
New York University
2

3

4
(No Transcript)
5
The Spatial Scan Statistic
  • Move a circular window across the map.
  • Use a variable circle radius, from zero up to a
    maximum where 50 percent of the population is
    included.
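The scan described above can be sketched as follows. This is a minimal illustration, not the presenter's implementation: the slide does not state the test statistic, so the sketch assumes the Poisson log-likelihood ratio commonly used with circular scans, and the grid of cells (coordinates, population, cases) is hypothetical. Circles are grown around each cell centroid by adding cells in order of distance, so cells at equal distance enter one at a time.

```python
import math

def poisson_llr(c, e, total):
    """Poisson log-likelihood ratio for a zone with c observed, e expected cases."""
    if c <= e:
        return 0.0  # only zones with an elevated rate are of interest
    llr = c * math.log(c / e)
    if total > c:
        llr += (total - c) * math.log((total - c) / (total - e))
    return llr

def circular_scan(cells, max_pop_fraction=0.5):
    """cells: list of (x, y, population, cases) with positive populations.
    Returns (best_llr, best_zone) where best_zone is a sorted list of cell indices."""
    total_pop = sum(p for _, _, p, _ in cells)
    total_cases = sum(c for _, _, _, c in cells)
    best = (0.0, [])
    for cx, cy, _, _ in cells:
        # Grow a circle around this centroid by adding cells nearest first.
        order = sorted(range(len(cells)),
                       key=lambda j: math.hypot(cells[j][0] - cx, cells[j][1] - cy))
        pop = cases = 0
        zone = []
        for j in order:
            pop += cells[j][2]
            if pop > max_pop_fraction * total_pop:
                break  # window may hold at most half the population
            cases += cells[j][3]
            zone.append(j)
            expected = total_cases * pop / total_pop
            llr = poisson_llr(cases, expected, total_cases)
            if llr > best[0]:
                best = (llr, sorted(zone))
    return best

# Hypothetical 3 x 3 grid: 100 people per cell, one background case per cell,
# with extra cases in two adjacent cells.
cells = [(x, y, 100, 1) for x in range(3) for y in range(3)]
cells[0] = (0, 0, 100, 9)
cells[1] = (0, 1, 100, 7)
best_llr, best_zone = circular_scan(cells)
print(best_zone)  # [0, 1]
```

In practice the significance of the best zone would be judged by repeating the scan on data simulated under the null hypothesis; that step is omitted here.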

6
A small sample of the circles used
7
Detecting Emerging Clusters
  • Instead of a circular window in two dimensions,
    we use a cylindrical window in three dimensions.
  • The base of the cylinder represents space, while
    the height represents time.
  • The cylinder is flexible in its circular base and
    starting date, but we only consider those
    cylinders that reach all the way to the end of
    the study period. Hence, we are only considering
    alive clusters.

8
West Nile Virus Surveillance in New York City
  • 2000 Data: Simulation/Testing of Prospective
    Surveillance System
  • 2001 Data: Real-Time Implementation of Daily
    Prospective Surveillance

9
West Nile Virus Surveillance in New York City
Major epicenter on Staten Island
  • Dead bird surveillance system: June 14
  • Positive bird report: July 16 (collected July 5)
  • Positive mosquito trap: July 24 (collected July 7)
  • Human case report: July 28 (onset July 20)

10
(No Transcript)
11
Hospital Emergency Admissions in New York City
  • Hospital emergency admissions data from a
    majority of New York City hospitals.
  • At midnight, hospitals report the last 24 hours of
    data to the New York City Department of Health
  • A spatial scan statistic analysis is performed
    every morning
  • If there is an alarm, a local investigation is
    conducted

12
Issues
13
Geospatial Surveillance
14
Spatial Temporal Surveillance
15
Syndromic Crisis-Index Surveillance
16
Hotspot Prioritization
17
(No Transcript)
18
National Applications
  • Biosurveillance
  • Carbon Management
  • Coastal Management
  • Community Infrastructure
  • Crop Surveillance
  • Disaster Management
  • Disease Surveillance
  • Ecosystem Health
  • Environmental Justice
  • Sensor Networks
  • Robotic Networks
  • Environmental Management
  • Environmental Policy
  • Homeland Security
  • Invasive Species
  • Poverty Policy
  • Public Health
  • Public Health and Environment
  • Syndromic Surveillance
  • Social Networks
  • Stream Networks

19
Geographic Surveillance and Hotspot Detection for
Homeland Security
  • Cyber Security and Computer Network Diagnostics:
    Securing the nation's computer networks from
    cyber attack is an important aspect of Homeland
    Security. Project develops diagnostic tools for
    detecting security attacks, infrastructure
    failures, and other operational aberrations of
    computer networks.
  • Tasking of Self-Organizing Surveillance Mobile
    Sensor Networks: Many critical applications of
    surveillance sensor networks involve finding
    hotspots. The upper level set scan statistic is
    used to guide the search by estimating the
    location of hotspots based on the data previously
    taken by the surveillance network.
  • Drinking Water Quality and Water Utility
    Vulnerability: New York City has installed 892
    drinking water sampling stations. Currently,
    about 47,000 water samples are analyzed annually.
    The ULS scan statistic will provide a real-time
    surveillance system for evaluating water quality
    across the distribution system.
  • Surveillance Network and Early Warning: Emerging
    hotspots for disease or biological agents are
    identified by modeling events at local hospitals.
    A time-dependent crisis index is determined for
    each hospital in a network. The crisis index is
    used for hotspot detection by scan statistic
    methods.
  • West Nile Virus, An Illustration of the Early
    Warning Capability of the Scan Statistic: West
    Nile virus is a serious mosquito-borne disease.
    The mosquito vector bites both humans and birds.
    Scan statistical detection of dead bird clusters
    provides an early crisis warning and allows
    targeted public education and increased mosquito
    control.
  • Crop Pathogens and Bioterrorism: Disruption of
    American agriculture and our food system could be
    catastrophic to the nation's stability. This
    project has the specific aim of developing novel
    remote sensing methods and statistical tools for
    the early detection of crop bioterrorism.
  • Disaster Management, Oil Spill Detection,
    Monitoring, and Prioritization: The scan
    statistic hotspot delineation and poset
    prioritization tools will be used in combination
    with our oil spill detection algorithm to provide
    for early warning and spatial-temporal monitoring
    of marine oil spills and their consequences.
  • Network Analysis of Biological Integrity in
    Freshwater Streams: This study employs the
    network version of the upper level set scan
    statistic to characterize biological impairment
    along the rivers and streams of Pennsylvania and
    to identify subnetworks that are badly impaired.

Center for Statistical Ecology and Environmental Statistics
G. P. Patil, Director
20
Hotspot Detection Innovation Upper Level Set Scan
Statistic
  • Attractive Features
  • Identifies arbitrarily shaped clusters
  • Data-adaptive zonation of candidate hotspots
  • Applicable to data on a network
  • Provides both a point estimate as well as a
    confidence set for the hotspot
  • Uses hotspot-membership rating to map hotspot
    boundary uncertainty
  • Computationally efficient
  • Applicable to both discrete and continuous
    syndromic responses
  • Identifies arbitrarily shaped clusters in the
    spatial-temporal domain
  • Provides a typology of space-time hotspots with
    discriminatory surveillance potential

21
Candidate Zones for Hotspots
  • Goal: Identify geographic zone(s) in which a
    response is significantly elevated relative to
    the rest of a region
  • A list of candidate zones Z is specified a priori
  • This list becomes part of the parameter space and
    the zone must be estimated from within this list
  • Each candidate zone should generally be spatially
    connected, e.g., a union of contiguous spatial
    units or cells
  • Longer lists of candidate zones are usually
    preferable
  • Expanding circles or ellipses about specified
    centers are a common method of generating the
    list

22
Scan Statistic Zonation for Circles and
Space-Time Cylinders
23
ULS Candidate Zones
  • Question: Are there data-driven (rather than a
    priori) ways of selecting the list of candidate
    zones?
  • Motivation for the question: A human being can
    look at a map and quickly determine a reasonable
    set of candidate zones and eliminate many other
    zones as obviously uninteresting. Can the
    computer do the same thing?
  • A data-driven proposal: Candidate zones are the
    connected components of the upper level sets of
    the response surface. The candidate zones have a
    tree structure (echelon tree is a subtree), which
    may assist in automated detection of multiple,
    but geographically separate, elevated zones.
  • Null distribution: If the list is data-driven
    (i.e., random), its variability must be accounted
    for in the null distribution. A new list must be
    developed for each simulated data set.

24
ULS Scan Statistic
  • Data-adaptive approach to reduced parameter space
    $\Omega_0$
  • Zones in $\Omega_0$ are connected components of
    upper level sets of the empirical intensity
    function $G_a = Y_a / A_a$
  • Upper level set (ULS) at level $g$ consists of all
    cells $a$ where $G_a \ge g$
  • Upper level sets may be disconnected. Connected
    components are the candidate zones in $\Omega_0$
  • These connected components form a rooted tree
    under set inclusion.
  • Root node: entire region $R$
  • Leaf nodes: local maxima of empirical intensity
    surface
  • Junction nodes: occur when connectivity of ULS
    changes with falling intensity level
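The construction of candidate zones can be sketched in a few lines. This is a minimal illustration under stated assumptions: the five cells, their intensity values, and the adjacency map are hypothetical, and the function simply recomputes connected components at every distinct level rather than building the tree incrementally as an efficient implementation would.

```python
def uls_candidate_zones(rates, neighbors):
    """Return the connected components of all upper level sets.

    rates: {cell: empirical intensity}
    neighbors: {cell: set of adjacent cells}
    """
    zones = set()
    for level in sorted(set(rates.values()), reverse=True):
        upper = {a for a, r in rates.items() if r >= level}
        unseen = set(upper)
        while unseen:
            start = unseen.pop()
            component, stack = {start}, [start]
            while stack:  # depth-first search restricted to the upper level set
                a = stack.pop()
                for b in neighbors[a]:
                    if b in unseen:
                        unseen.remove(b)
                        component.add(b)
                        stack.append(b)
            zones.add(frozenset(component))
    return zones

# Hypothetical cells A to E arranged along a line.
rates = {"A": 5.0, "B": 1.0, "C": 4.0, "D": 3.0, "E": 0.5}
neighbors = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"},
             "D": {"C", "E"}, "E": {"D"}}
zones = uls_candidate_zones(rates, neighbors)
for zone in sorted(zones, key=len):
    print(sorted(zone))
```

Here {A} and {C} are the leaves (local maxima), {A, B, C, D} is the junction where the two branches merge, and the full set of cells is the root. The number of candidate zones never exceeds the number of cells, which keeps the search small.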

25
Upper Level Set (ULS) of Intensity Surface
26
Changing Connectivity of ULS as Level Drops
27
ULS Connectivity Tree
28
A confidence set of hotspots on the ULS tree.
The different connected components correspond to
different hotspot loci while the nodes within a
connected component correspond to different
delineations of that hotspot
29
Network Analysis of Biological Integrity in
Freshwater Streams
30
New York City Water Distribution Network
31
NYC Drinking Water Quality Within-City Sampling
Stations
  • 892 sampling stations
  • Each station about 4.5 feet high and draws water
    from a nearby water main
  • Sampling frequency increased after 9-11
  • Currently, about 47,000 water samples analyzed
    annually
  • Parameters analyzed
  • Bacteria
  • Chlorine levels
  • pH
  • Inorganic and organic pollutants
  • Color, turbidity, odor
  • Many others

32
Network-Based Surveillance
  • Subway system surveillance
  • Drinking water distribution system surveillance
  • Stream and river system surveillance
  • Postal System Surveillance
  • Road transport surveillance
  • Syndromic Surveillance

33
Syndromic Surveillance
  • Symptoms of disease such as diarrhea, respiratory
    problems, headache, etc
  • Earlier reporting than diagnosed disease
  • Less specific, more noise

34
Syndromic Surveillance
(left) The overall procedure, leading from
admissions records to the crisis index for a
hospital. The hotspot detection algorithm is
then applied to the crisis index values defined
over the hospital network. (right) The
ε-machine procedure for converting an event stream
into a parse tree and finally into a
probabilistic finite state automaton (PFSA).
35
Experimental Validation
Formal Language Events
  • a: green to red or red to green
  • b: green to tan or tan to green
  • c: green to blue or blue to green
  • d: red to tan or tan to red
  • e: blue to red or red to blue
  • f: blue to tan or tan to blue
Pressure sensitive floor
Wall following
Random walk
Analyze String Rejections
Target Behavior
36
Emergent Surveillance Plexus (ESP) Surveillance
Sensor Network Testbed: Autonomous Ocean Sampling
Network
Types of Hotspots
  • Hotspots due to multiple, localized, stationary
    sources
  • Hotspots corresponding to areas of interest in a
    stationary mapped field
  • Time-dependent, localized hotspots
  • Hotspots due to moving point sources

37
Ocean SAmpling MObile Network (OSAMON)
38
Ocean SAmpling MObile Network (OSAMON) Feedback Loop
  • Network sensors gather preliminary data
  • ULS scan statistic uses available data to
    estimate hotspot
  • Network controller directs sensor vehicles to new
    locations
  • Updated data is fed into ULS scan statistic system

39
SAmpling MObile Networks (SAMON) Additional
Application Contexts
  • Hotspots for radioactivity and chemical or
    biological agents to prevent or mitigate the
    effects of terrorist attacks or to detect nuclear
    testing
  • Mapping elevation, wind, bathymetry, or ocean
    currents to better understand and protect the
    environment
  • Detecting emerging failures in a complex
    networked system like the electric grid,
    internet, cell phone systems
  • Mapping the gravitational field to find
    underground chambers or tunnels for rescue or
    combat missions

40
Sensor Devices
Miniaturized Spec Node Prototype
41
Scalable Wireless Geo-Telemetry with Miniature
Smart Sensors
Geo-telemetry enabled sensor nodes deployed by a
UAV into a wireless ad hoc mesh network
Transmitting data and coordinates to TASS and GIS
support systems
42
Architectural Block Diagram of Geo-Telemetry
Enabled Sensor Node with Mesh Network Capability
43
Standards Based Geo-Processing Model
44
UAV Capable of Aerial Survey
45
Data Fusion Hierarchy for Smart Sensor Network
with Scalable Wireless Geo-Telemetry Capability
46
Wireless Sensor Networks for Habitat Monitoring
47
Target Tracking in Distributed Sensor Networks
48
Video Surveillance and Data Streams
49
Video Surveillance and Data Streams: Turning Video
into Information, Measuring Behavior by Segments
  • Customer Intelligence
  • Enterprise Intelligence
  • Entrance Intelligence
  • Media Intelligence
  • Video Mining Service

50
Deterministic Finite Automata (DFA)
  • Directed Graph (loops and multiple edges
    permitted) such that
  • Nodes are called States
  • Edges are called Transitions
  • Distinguished initial (or starting) state
  • Transitions are labeled by symbols from a given
    finite alphabet, $\Sigma = \{a, b, c, \ldots\}$
  • The same symbol can label several transitions
  • A given symbol can label at most one transition
    from a given state (deterministic)

51
Deterministic Finite Automata (DFA): Formal
Definition
  • Quadruple $(Q, q_0, \Sigma, \delta)$ such that
  • $Q$ is a finite set of states
  • $\Sigma$ is a finite set of symbols, called the
    alphabet
  • $q_0 \in Q$ is the initial state
  • $\delta : Q \times \Sigma \to Q \cup \{\text{Blocked}\}$
    is the transition function
  • $\delta(q, a) = \text{Blocked}$ if there is no
    transition from $q$ labeled by $a$
  • $\delta(q, a) = q'$ if $a$ is a transition from
    $q$ to $q'$

52
DFA and Strings
Any path through the graph starting from the
initial state determines a string from the
alphabet.
  • Example: The blue dashed path determines the
    string a b c a
Conversely, any string from the alphabet is
either blocked or determines a path through the
graph.
  • Example: The following strings are blocked:
    c, aa, ac, abb, etc.
  • Example: The following strings are not blocked:
    a, b, ab, bb, etc.
The collection of all unblocked strings is called
the language accepted or determined by the DFA
(all states are final in our approach).
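The idea of blocked and unblocked strings can be illustrated with a small sketch. The slide's graph is not part of this transcript, so the four-state transition table below is hypothetical; it was chosen only so that it reproduces the blocked and unblocked examples listed above.

```python
BLOCKED = None  # stands in for the "Blocked" value of the transition function

# Hypothetical transition table: (state, symbol) -> next state.
DELTA = {
    ("q0", "a"): "q1", ("q0", "b"): "q3",
    ("q1", "b"): "q2",
    ("q2", "c"): "q3", ("q2", "a"): "q0",
    ("q3", "a"): "q0", ("q3", "b"): "q3",
}

def run(string, start="q0"):
    """Return the list of states visited, or BLOCKED if a symbol has no transition."""
    path = [start]
    for symbol in string:
        nxt = DELTA.get((path[-1], symbol), BLOCKED)
        if nxt is BLOCKED:
            return BLOCKED
        path.append(nxt)
    return path

for s in ["c", "aa", "ac", "abb", "a", "b", "ab", "bb", "abca"]:
    print(s, "blocked" if run(s) is BLOCKED else "accepted")
```

Because every state is treated as final, a string belongs to the language exactly when `run` returns a path.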
53
Strings and Languages
  • $\Sigma$ = (finite) alphabet
  • $\Sigma^*$ = set of all (finite) strings from
    $\Sigma$
  • A language is any subset of $\Sigma^*$.
  • Not all languages can be determined by a DFA.
  • Different DFAs can accept the same language
54
Probabilistic Finite Automata (PFA)
  • A PFA is a DFA $(Q, q_0, \Sigma, \delta)$ with a
    probability attached to each transition such that
    the sum of the probabilities across all
    transitions from a given node is unity.
  • Formally, $p : Q \times \Sigma \to [0, 1]$ such
    that
  • $p(q, a) = 0$ if and only if
    $\delta(q, a) = \text{Blocked}$

Multiplying branch probabilities lets us assign a
probability value $\mu(q_0, s)$ to each string $s$
in $\Sigma^*$. E.g.,
$\mu(q_0, abca) = (.8)(1)(.6)(.4) = .192$
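The product rule can be checked with a short sketch. The automaton in the slide's figure is not available in this transcript, so the transition probabilities below are hypothetical; they were picked so that the string abca follows branches with probabilities .8, 1, .6 and .4, and so that the probabilities leaving each state sum to one.

```python
import math
from itertools import product

# Hypothetical PFA: (state, symbol) -> (next state, transition probability).
PFA = {
    ("q0", "a"): ("q1", 0.8), ("q0", "b"): ("q3", 0.2),
    ("q1", "b"): ("q2", 1.0),
    ("q2", "c"): ("q3", 0.6), ("q2", "a"): ("q0", 0.4),
    ("q3", "a"): ("q0", 0.4), ("q3", "b"): ("q3", 0.6),
}

def string_probability(s, start="q0"):
    """Multiply branch probabilities along the path of s; blocked strings get 0."""
    state, prob = start, 1.0
    for symbol in s:
        if (state, symbol) not in PFA:
            return 0.0
        state, p = PFA[(state, symbol)]
        prob *= p
    return prob

print(string_probability("abca"))
print(string_probability("aa"))

# For a fixed length, the probabilities of all strings of that length sum to one.
total_len3 = sum(string_probability("".join(t)) for t in product("abc", repeat=3))
print(math.isclose(total_len3, 1.0))
```

The last line illustrates why the assignment is a probability measure on strings of each fixed length: every state distributes total probability one over its outgoing transitions.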
55
Properties of $\mu(q_0, s)$
  • For fixed $q_0$, $\mu(q_0, s)$ is a measure on
    $\Sigma^*$
  • Support of $\mu$ is the language accepted by the
    DFA
  • For fixed $q_0$, $\mu(q_0, s)$ is a probability
    measure on $\Sigma^i$ ($\Sigma^i$ = strings of
    length $i$). This probability measure is written
    as $\mu^{(i)}$.
  • Given a probability distribution $w(i)$ across
    string lengths $i$, the mixture
    $\mu_w = \sum_i w(i)\,\mu^{(i)}$ defines a
    probability measure across $\Sigma^*$, called the
    w-weighted probability measure of the PFA. If all
    $w(i)$ are positive, then the support of $\mu_w$
    is also the language accepted by the underlying
    DFA.

56
Distance Between Two PFA
  • Let $A$ and $B$ be two PFAs on the same alphabet
    $\Sigma$
  • Let $w(i)$ be a probability distribution across
    string lengths $i$
  • Let $\mu_A$ and $\mu_B$ be the w-weighted
    probability measures of $A$ and $B$
  • Define the distance between $A$ and $B$ as the
    variational distance between the probability
    measures $\mu_A$ and $\mu_B$:
    $d(A, B) = \lVert \mu_A - \mu_B \rVert$
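A small numerical sketch of this distance follows. Several choices here are assumptions rather than details from the slides: the two single-state automata are hypothetical, the length weights put mass only on lengths 1 and 2 so that the sum is finite, and the variational distance is computed as the sum of absolute differences (some texts use half of that sum).

```python
import math
from itertools import product

def string_prob(pfa, s, start="q0"):
    """Probability of string s under a PFA given as (state, symbol) -> (next, prob)."""
    state, prob = start, 1.0
    for symbol in s:
        if (state, symbol) not in pfa:
            return 0.0
        state, p = pfa[(state, symbol)]
        prob *= p
    return prob

def weighted_measure(pfa, alphabet, w):
    """w: {string length: weight}. Returns {string: weighted probability}."""
    measure = {}
    for length, weight in w.items():
        for symbols in product(alphabet, repeat=length):
            s = "".join(symbols)
            p = weight * string_prob(pfa, s)
            if p > 0:
                measure[s] = p
    return measure

def variational_distance(pfa_a, pfa_b, alphabet, w):
    """Sum of absolute differences between the two weighted measures."""
    mu_a = weighted_measure(pfa_a, alphabet, w)
    mu_b = weighted_measure(pfa_b, alphabet, w)
    return sum(abs(mu_a.get(s, 0.0) - mu_b.get(s, 0.0))
               for s in set(mu_a) | set(mu_b))

# Hypothetical single-state automata that differ only in their branch probabilities.
A = {("q0", "a"): ("q0", 0.5), ("q0", "b"): ("q0", 0.5)}
B = {("q0", "a"): ("q0", 0.9), ("q0", "b"): ("q0", 0.1)}
w = {1: 0.5, 2: 0.5}
d_ab = variational_distance(A, B, "ab", w)
print(round(d_ab, 6))
```

Automata that generate similar event streams end up close under this distance, which is what makes it usable for comparing an observed behavior with a target behavior.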
57
(No Transcript)
58
Crop Biosurveillance/Biosecurity
59
Crop Biosurveillance/Biosecurity: Data Processing
Module
60
Prioritization Innovation: Partial Order Set
Ranking
We also present a prioritization innovation: the
ability to prioritize and rank hotspots based on
multiple indicator and stakeholder criteria,
using Hasse diagrams and partially ordered sets,
without having to integrate the indicators into an
index. This leads to early warning systems and to
the selection of investigational areas.
61
HUMAN ENVIRONMENT INTERFACE: LAND, AIR, WATER
INDICATORS
  • for land: % of undomesticated land, i.e., total
    land area minus domesticated (permanent crops and
    pastures, built up areas, roads, etc.)
  • for air: % of renewable energy resources, i.e.,
    hydro, solar, wind, geothermal
  • for water: % of population with access to safe
    drinking water

RANK  COUNTRY          LAND    AIR    WATER
      Sweden           69.01   35.24  100
      Finland          76.46   19.05   98
      Norway           27.38   63.98  100
5     Iceland           1.79   80.25  100
13    Austria          40.57   29.85  100
22    Switzerland      30.17   28.10  100
39    Spain            32.63    7.74  100
45    France           28.34    6.50  100
47    Germany          32.56    2.10  100
51    Portugal         34.62   14.29   82
52    Italy            23.35    6.89  100
59    Greece           21.59    3.20   98
61    Belgium          21.84    0.00  100
64    Netherlands      19.43    1.07  100
77    Denmark           9.83    5.04  100
78    United Kingdom   12.64    1.13  100
81    Ireland           9.25    1.99  100
62
Hasse Diagram (all countries)
63
Hasse Diagram (Western Europe)
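The edges of a Hasse diagram can be derived directly from indicator values. The sketch below is an illustration under stated assumptions: it uses the (land, air, water) values of six countries as read from the indicator table, treats one country as better than another when it is at least as good on every indicator and not identical, and keeps only the cover relations (pairs with no country strictly in between). The function names are hypothetical.

```python
def dominates(x, y):
    """True if x is at least as good as y on every indicator and differs from y."""
    return all(a >= b for a, b in zip(x, y)) and x != y

def cover_relations(scores):
    """Return (better, worse) pairs that are edges of the Hasse diagram."""
    covers = []
    for hi in scores:
        for lo in scores:
            if not dominates(scores[hi], scores[lo]):
                continue
            between = any(dominates(scores[hi], scores[m]) and
                          dominates(scores[m], scores[lo]) for m in scores)
            if not between:
                covers.append((hi, lo))
    return covers

# (land, air, water) for a subset of the countries in the table.
scores = {
    "Sweden": (69.01, 35.24, 100),
    "Norway": (27.38, 63.98, 100),
    "Iceland": (1.79, 80.25, 100),
    "Austria": (40.57, 29.85, 100),
    "Switzerland": (30.17, 28.10, 100),
    "France": (28.34, 6.50, 100),
    "Ireland": (9.25, 1.99, 100),
}
edges = cover_relations(scores)
for better, worse in edges:
    print(better, ">", worse)
```

In this subset Iceland is comparable to no other country (lowest land value, highest air value), so it appears as an isolated node. That is the kind of incomparability a single composite index would hide.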
64
Ranking Partially Ordered Sets 5
[Figure: Poset (Hasse diagram) on the elements a to f, and its linear extension decision tree]
Jump Size 1 3 3 2 3 5 4 3 3 2 4 3 4 4 2 2
65
Cumulative Rank Frequency Operator 5: An Example
of the Procedure
In the example from the preceding slide, there
are a total of 16 linear extensions, giving the
following cumulative frequency table.

Element  Rank 1  Rank 2  Rank 3  Rank 4  Rank 5  Rank 6
a           9      14      16      16      16      16
b           7      12      15      16      16      16
c           0       4      10      16      16      16
d           0       2       6      12      16      16
e           0       0       1       4      10      16
f           0       0       0       0       6      16

Each entry gives the number of linear extensions
in which the element (row label) receives a rank
equal to or better than the column heading.
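The table can be reproduced by brute force. The Hasse diagram itself is not part of this transcript, so the cover relations below are an assumption: they describe one poset on a to f that has 16 linear extensions and yields exactly the cumulative frequencies shown above. The sketch enumerates all permutations, keeps those that respect the order, and counts how often each element attains each rank or better.

```python
from itertools import permutations

ELEMENTS = "abcdef"
# Assumed cover relations (better, worse), chosen to be consistent with the table.
COVERS = [("a", "c"), ("c", "e"), ("b", "d"), ("d", "f"), ("c", "f")]

def linear_extensions(elements, covers):
    """All total orders of the elements that respect every (better, worse) pair."""
    result = []
    for perm in permutations(elements):
        position = {x: i for i, x in enumerate(perm)}
        if all(position[hi] < position[lo] for hi, lo in covers):
            result.append(perm)
    return result

def cumulative_rank_frequency(elements, extensions):
    """table[x][k] = number of extensions in which x has rank k + 1 or better."""
    n = len(elements)
    table = {x: [0] * n for x in elements}
    for ext in extensions:
        for rank, x in enumerate(ext):
            for k in range(rank, n):
                table[x][k] += 1
    return table

extensions = linear_extensions(ELEMENTS, COVERS)
table = cumulative_rank_frequency(ELEMENTS, extensions)
print(len(extensions))
for x in ELEMENTS:
    print(x, table[x])
```

Enumerating permutations is only feasible for very small posets; larger ones would need sampling of linear extensions, which is outside this sketch.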
66
Cumulative Rank Frequency Operator 6: An Example
of the Procedure
The curves are stacked one above the other and
the result is a linear ordering of the elements
a > b > c > d > e > f
67
Cumulative Rank Frequency Operator 7: An example
where F must be iterated
$F^2$
68
Incorporating Judgment Poset Cumulative Rank
Frequency Approach
  • Certain of the indicators may be deemed more
    important than the others
  • Such differential importance can be accommodated
    by the poset cumulative rank frequency approach
  • Instead of the uniform distribution on the set of
    linear extensions, we may use an appropriately
    weighted probability distribution $\pi$ on that
    set

69
(No Transcript)
70
(No Transcript)
71
(No Transcript)
72
(No Transcript)
73
Space-Time Poverty Hotspot Typology
  • Federal Anti-Poverty Programs have had little
    success in eradicating pockets of persistent
    poverty
  • Can spatial-temporal patterns of poverty hotspots
    provide clues to the causes of poverty and lead
    to improved location-specific anti-poverty
    policy?

74
Covariate Adjustment
  • Known Covariate Effects (age, population size,
    etc.)

75
Covariate Adjustment
  • Given Covariates, Unknown Effects

76
Incorporating Spatial Autocorrelation
  • Ignoring autocorrelation typically results in
  • under-assessment of variability
  • over-assessment of significance (H0 rejected too
    frequently)

How can we account for possible autocorrelation?
GLMM (SAR) Model:
  • $Y_a$ = count in cell $a$
  • $Y_a$ distributed as Poisson
  • $\theta_a = \log(E\,Y_a)$
  • The $Y_a$ are conditionally independent given the
    $\theta_a$
  • The $\theta_a$ are jointly Gaussian with a
    Simultaneous AutoRegressive (SAR) specification
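For reference, one common way to write such a model is given below. This is a sketch of the standard SAR form, not necessarily the exact parameterization used in the talk. Writing $\theta$ for the vector of log-means, the mean vector $\mu$ (which could carry covariate effects), the spatial weights matrix $W$, the autocorrelation parameter $\rho$ and the innovation variance $\sigma^2$ are all notation assumed here; none of them appears in the transcript.

```latex
\begin{aligned}
Y_a \mid \theta_a &\sim \mathrm{Poisson}\!\left(e^{\theta_a}\right),
  \quad \text{independently across cells } a, \\
\theta &= \mu + \rho W (\theta - \mu) + \varepsilon,
  \qquad \varepsilon \sim N(0, \sigma^2 I), \\
\operatorname{Var}(\theta) &= \sigma^2
  \left[ (I - \rho W)^{\top} (I - \rho W) \right]^{-1}.
\end{aligned}
```

The last line follows from solving the second for $\theta - \mu = (I - \rho W)^{-1} \varepsilon$. Under a CAR specification with symmetric $W$, the covariance is commonly written as $\sigma^2 (I - \rho W)^{-1}$ instead.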

77
Incorporating Spatial Autocorrelation

78
Incorporating Spatial Autocorrelation

79
Spatial Autocorrelation Plus Covariates

80
CAR Model
The entire formulation is similar for Conditional
AutoRegressive (CAR) specifications, except that
the form of the variance-covariance matrix of
$\theta$ changes.

81

82