Homeland Security: What Can Mathematics Do?

Slides: 70
Provided by: dimacsR
1
Homeland Security: What Can Mathematics Do?
Fred Roberts
Professor of Mathematics, Rutgers University
Chair, RU Homeland Security Research Initiative
Director, DIMACS Center
2
  • Mathematical methods have become important tools
    in preparing plans for defense against terrorist
    attacks, especially when combined with powerful,
    modern computer methods for analysis and
    simulation.

3
Are you Serious?? What Can Mathematics Do For Us?
4
(No Transcript)
5
  • After Pearl Harbor: Mathematics and
    mathematicians played a vitally important role in
    the US World War II effort.

6
  • Critical War-Effort Contributions Included:
  • Code breaking.
  • Creation of the mathematics-based field of
    Operations Research:
  • logistics
  • optimal scheduling
  • inventory
  • strategic planning

Enigma machine
7
But Terrorism is Different. Can Mathematics
Really Help?
5 2 ?
1, 2, 3,
8
I'll Illustrate with Mathematics Projects I'm
Involved in. There are Many Others
  • Bioterrorism Sensor Location
  • Monitoring Message Streams
  • Identification of Authors
  • Detecting a Bioterrorist Attack through
    Syndromic Surveillance

9
OUTLINE
  • Bioterrorism Sensor Location
  • Monitoring Message Streams
  • Identification of Authors
  • Detecting a Bioterrorist Attack through
    Syndromic Surveillance

10
The Bioterrorism Sensor Location Problem
11
  • Early warning is critical in defense against
    terrorism
  • This is a crucial factor underlying the
    government's plans to place networks of
    sensors/detectors to warn of a bioterrorist
    attack

The BASIS System Salt Lake City
12
Locating Sensors is not Easy
  • Sensors are expensive
  • How do we select them and where do we place them
    to maximize coverage, expedite an alarm, and
    keep the cost down?
  • Approaches that improve upon existing, ad hoc
    location methods could save countless lives in
    the case of an attack and also money in capital
    and operational costs.

13
Two Fundamental Problems
  • Sensor Location Problem
  • Choose an appropriate mix of sensors
  • decide where to locate them for best protection
    and early warning

14
Two Fundamental Problems
  • Pattern Interpretation Problem: When sensors set
    off an alarm, help public health decision makers
    decide:
  • Has an attack taken place?
  • What additional monitoring is needed?
  • What was its extent and location?
  • What is an appropriate response?

15
The Sensor Location Problem
  • Approach is to develop new algorithmic methods.
  • Developing new algorithms involves fundamental
    mathematical analysis.
  • Analyzing how efficient algorithms are involves
    fundamental mathematical methods.
  • Implementing the algorithms on a computer is
    often a separate problem which needs to go hand
    in hand with the basic mathematics of algorithm
    development.

16
Algorithmic Approaches I: Greedy Algorithms
17
Greedy Algorithms
  • Find the most important location first and locate
    a sensor there.
  • Find second-most important location.
  • Etc.
  • Builds on earlier mathematical work at Institute
    for Defense Analyses (Grotte, Platt)
  • Steepest ascent approach.
  • No guarantee of optimal or best solution.
  • In practice, gets pretty close to optimal
    solution.
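A minimal sketch of the greedy idea above, assuming (hypothetically) that each candidate site covers a known set of locations and that "most important" means "covers the most locations not yet covered". The site names and coverage sets are invented for illustration and do not come from the project.

```python
def greedy_sensor_placement(coverage, num_sensors):
    """Pick sites one at a time, each time taking the site that covers
    the most locations not yet covered (a steepest-ascent step)."""
    chosen, covered = [], set()
    for _ in range(num_sensors):
        # Best remaining site by marginal gain; ties go to the first name.
        best = max(
            (s for s in sorted(coverage) if s not in chosen),
            key=lambda s: len(coverage[s] - covered),
            default=None,
        )
        if best is None or not coverage[best] - covered:
            break  # no remaining site adds coverage
        chosen.append(best)
        covered |= coverage[best]
    return chosen, covered

# Hypothetical candidate sites and the locations each one would cover.
coverage = {
    "A": {1, 2, 3, 4},
    "B": {3, 4, 5},
    "C": {5, 6},
    "D": {1, 6},
}
sites, covered = greedy_sensor_placement(coverage, 2)
```

Each step is locally best, which is why the method is fast but carries no optimality guarantee in general.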

18
Algorithmic Approaches II: Variants of Classic
Facility Location Theory Methods
19
Location Theory
  • Old problem in operations research: where to
    locate facilities (fire houses, garbage dumps,
    etc.) to best serve users.
  • Often deal with a network with nodes, edges, and
    distances along edges.
  • Users u1, u2, ..., un are located at nodes.
  • One approach: locate the facility at node x
    chosen so that the sum of distances to users is
    minimized.
  • Minimize $\sum_{i=1}^{n} d(x, u_i)$.

20
Location Theory: A Network
1's represent distances along edges
Nodes are places for users or facilities
21
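Users: u1, u2, u3
  • x = a: $\sum_i d(x, u_i) = 1 + 1 + 2 = 4$
  • x = b: $\sum_i d(x, u_i) = 2 + 0 + 1 = 3$
  • x = c: $\sum_i d(x, u_i) = 3 + 1 + 0 = 4$
  • x = d: $\sum_i d(x, u_i) = 2 + 2 + 1 = 5$
  • x = e: $\sum_i d(x, u_i) = 1 + 3 + 2 = 6$
  • x = f: $\sum_i d(x, u_i) = 0 + 2 + 3 = 5$
  • x = b is optimal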
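The sums listed above can be reproduced with a short script. The slide's figure is not part of the transcript, so the sketch below assumes the network is a six-node cycle a-b-c-d-e-f-a with unit edge lengths and users at f, b, and c. That layout is an assumption, chosen because it matches every distance in the list.

```python
from collections import deque

# Assumed network: a 6-cycle with unit-length edges.
nodes = "abcdef"
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f"), ("f", "a")]
users = {"u1": "f", "u2": "b", "u3": "c"}  # assumed user positions

adj = {v: [] for v in nodes}
for p, q in edges:
    adj[p].append(q)
    adj[q].append(p)

def distances_from(src):
    """Breadth-first search: shortest distances when every edge has length 1."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

# Sum of distances from each candidate node x to all users.
totals = {x: sum(distances_from(x)[u] for u in users.values()) for x in nodes}
best = min(totals, key=totals.get)
```

Checking every node is fine for a network this small; the complications on the later slides are what make realistic instances hard.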
22
Variants of Classic Facility Location Theory
Methods: Complications
  • We don't have a network with nodes and edges; we
    have points in a city
  • Sensors can only be at certain locations (size,
    weight, power source, hiding place)
  • We need to place more than one sensor
  • Instead of users, we have places where
    potential attacks take place.
  • Potential attacks take place with certain
    probabilities.
  • Wind, buildings, mountains, etc. add
    complications.

23
Variants of Classic Facility Location Theory
Methods: Complications
  • These more complex problems are hard!
  • The best-known algorithms for solving these
    higher-dimensional variants of the classic
    location problem are due to Rafail Ostrovsky -- a
    partner on our project.
  • The mathematics-based approximation methods due
    to Ostrovsky and his colleagues are promising.

24
Algorithmic Approaches III: Variants of Air
Pollution Monitoring Models
25
Variants of Air Pollution Monitoring Models
  • Long history of using mathematical models to
    locate air pollution monitors.
  • Use fluid dynamics
  • Use plume models.
  • Large computer simulations needed.
  • Long used in nuclear weapons defense.

26
Variants of Air Pollution Monitoring Models
  • Mathematical challenge: modify air pollution
    monitor placement modeling tools for complex
    biological agents.
  • E.g., complications arise when applying the
    models to cities: buildings make it hard!

27
The Pattern Interpretation Problem
28
The Pattern Interpretation Problem (PIP)
  • It will be up to the Decision Maker to decide how
    to respond to an alarm from the sensor network.

29
Approaching the PIP: Minimizing False Alarms
30
Approaching the PIP: Minimizing False Alarms
  • One approach: redundancy.
  • Could require two or more sensors to make a
    detection before an alarm is considered confirmed
  • Could require same sensor to register two alarms:
    Portal Shield requires two positives for the same
    agent during a specific time period.

31
Approaching the PIP: Minimizing False Alarms
  • Could place two or more sensors at or near the
    same location. Require two proximate sensors to
    give off an alarm before we consider it
    confirmed.
  • Redundancy has drawbacks: cost, delay in
    confirming an alarm.
  • We need mathematical methods to analyze the
    tradeoff between lowered false alarm rate and
    extra cost/delay
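One simple way to see part of this tradeoff is a binomial calculation. The sketch below assumes sensors err independently and uses invented per-sensor rates (`p_false`, `p_detect`). Neither the independence assumption nor the numbers come from the slides, and cost and delay are not modeled.

```python
from math import comb

def prob_at_least(k, n, p):
    """P(at least k of n independent sensors fire), each firing with probability p."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))

p_false = 0.01   # hypothetical per-sensor false-alarm probability
p_detect = 0.95  # hypothetical per-sensor detection probability

# One sensor versus two co-located sensors that must both fire.
single_false = prob_at_least(1, 1, p_false)
paired_false = prob_at_least(2, 2, p_false)
single_detect = prob_at_least(1, 1, p_detect)
paired_detect = prob_at_least(2, 2, p_detect)
```

Under these assumptions, requiring both sensors cuts the false-alarm probability from 0.01 to 0.0001, but it also lowers the chance of confirming a real release from 0.95 to 0.9025 and doubles the hardware.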

32
Approaching the PIP: Using Decision Rules
  • Existing sensors come with a sensitivity level
    specified and sound an alarm when the number of
    particles collected is sufficiently high above
    threshold.

33
Approaching the PIP: Using Decision Rules
  • Let f(x) = number of particles collected at
    sensor x in the past 24 hours. Sound an alarm if
    f(x) > T.
  • Alternative decision rule: alarm if two sensors
    reach 90% of threshold, three reach 75% of
    threshold, etc.
  • Alarm if
  • f(x) > T for some x,
  • or if f(x1) > .9T and f(x2) > .9T for some
    x1, x2,
  • or if f(x1) > .75T and f(x2) > .75T and
    f(x3) > .75T for some x1, x2, x3.
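The combined rule above can be sketched as a small function. The list-of-counts input and the `rules` table are illustrative choices, not part of any deployed sensor system.

```python
def alarm(counts, T):
    """counts: particle counts f(x), one per sensor, over the past 24 hours.
    Returns True if any tier of the decision rule is satisfied."""
    # (minimum number of sensors, fraction of threshold T they must exceed)
    rules = [(1, 1.0), (2, 0.9), (3, 0.75)]
    return any(
        sum(1 for c in counts if c > frac * T) >= k
        for k, frac in rules
    )

# Example readings against a hypothetical threshold T = 100.
one_high = alarm([101, 0, 0], 100)       # one sensor above T
two_near = alarm([95, 92, 10], 100)      # two sensors above 0.9 T
three_mid = alarm([80, 80, 76], 100)     # three sensors above 0.75 T
no_alarm = alarm([95, 80, 10], 100)      # no tier satisfied
```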

34
Approaching the PIP: Using Decision Rules
  • Prior work along these lines in missile detection
    (Cherikh and Kantor)

35
Bioterrorism Sensor Location: Partner
Agencies/Institutions
  • Defense Threat Reduction Agency
  • MITRE Corporation
  • Los Alamos National Laboratory
  • Institute for Defense Analyses
  • New York City Dept. of Health

36
OUTLINE
  • Bioterrorism Sensor Location
  • Monitoring Message Streams
  • Identification of Authors
  • Detecting a Bioterrorist Attack through
    Syndromic Surveillance

37
Monitoring Message Streams: Algorithmic Methods
for Automatic Processing of Messages
38
Objective
Monitor huge communication streams, in
particular, streams of textualized communication,
to automatically detect pattern changes and
"significant" events.
Motivation: monitoring email traffic, news,
communiques, faxes
39
Technical Approaches
  • Given stream of text in any language.
  • Decide whether "events" are present in the flow
    of messages.
  • Event: new topic or topic with unusual level of
    activity.
  • Suppose events have been classified into classes
    or groups: group 1, group 2, ...
  • A new message comes in. Does it fit into group 1?
    Into group 2? Or does it (and related messages)
    define a new group of interest?

40
One Approach: Bag of Words
  • List all the words of interest that may arise in
    the messages being studied: w1, w2, ..., wn
  • Bag of words vector b has k as the ith entry if
    word wi appears k times in the message.
  • Sometimes, use "bag of bits": a vector of 0s and
    1s, with 1 if word wi appears in the message, 0
    otherwise.

41
Bag of Words Example
  • Words
  • w1 = bomb, w2 = attack, w3 = strike
  • w4 = train, w5 = plane, w6 = subway
  • w7 = New York, w8 = Los Angeles, w9 = Madrid, w10
    = Tokyo, w11 = London
  • w12 = January, w13 = March

42
Bag of Words
  • Message 1
  • Strike Madrid trains on March 1.
  • Strike Tokyo subway on March 2.
  • Strike New York trains on March 11.
  • Bag of words b1 = (0,0,3,2,0,1,1,0,1,1,0,0,3)
  • w1 = bomb, w2 = attack, w3 = strike
  • w4 = train, w5 = plane, w6 = subway
  • w7 = New York, w8 = Los Angeles, w9 = Madrid, w10
    = Tokyo, w11 = London
  • w12 = January, w13 = March
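As a rough illustration, the vector for Message 1 can be computed as follows. The normalization here (lowercasing, and an optional trailing "s" so that "trains" counts as "train") is an assumption; the slides do not say how word forms were matched.

```python
import re

# Vocabulary w1..w13 from the example slide, lowercased.
VOCAB = ["bomb", "attack", "strike", "train", "plane", "subway",
         "new york", "los angeles", "madrid", "tokyo", "london",
         "january", "march"]

def bag_of_words(message):
    """Count whole-word occurrences of each vocabulary entry."""
    text = message.lower()
    return tuple(
        len(re.findall(r"\b" + re.escape(w) + r"s?\b", text))
        for w in VOCAB
    )

def bag_of_bits(message):
    """1 if the word appears at all, 0 otherwise."""
    return tuple(1 if k > 0 else 0 for k in bag_of_words(message))

message1 = ("Strike Madrid trains on March 1. "
            "Strike Tokyo subway on March 2. "
            "Strike New York trains on March 11.")
b1 = bag_of_words(message1)
bits1 = bag_of_bits(message1)
```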

43
The Approach: Bag of Words
  • Key idea: how close are two such vectors?
  • Suppose known messages have been classified into
    different groups: group 1, group 2, ...
  • A message comes in. Which group should we put it
    in? Or is it new?
  • You look at the bag of words vector associated
    with the incoming message and see if it fits
    closely to typical vectors associated with a
    given group.

44
The Approach: Bag of Words
  • Your performance can improve over time.
  • You learn how to classify better.
  • Typically you do this automatically and try to
    develop mathematical methods that will allow a
    machine to learn from past data.

45
Bag of Words
  • Message 2
  • Bomb Madrid trains on March 1.
  • Attack Tokyo subway on March 2.
  • Strike New York trains on March 11.
  • Bag of words b2 = (1,1,1,2,0,1,1,0,1,1,0,0,3)
  • w1 = bomb, w2 = attack, w3 = strike
  • w4 = train, w5 = plane, w6 = subway
  • w7 = New York, w8 = Los Angeles, w9 = Madrid, w10
    = Tokyo, w11 = London
  • w12 = January, w13 = March

46
Bag of Words
  • Note that b1 and b2 are close:
  • b1 = (0,0,3,2,0,1,1,0,1,1,0,0,3)
  • b2 = (1,1,1,2,0,1,1,0,1,1,0,0,3)
  • Closeness could be measured using distance
    d(b1,b2) = number of places where b1, b2 differ
    (Hamming distance between vectors).
  • Here d(b1,b2) = 3
  • The messages are similar and could belong to the
    same group or class of messages.
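The distance computation is short enough to write out directly; the function name is illustrative.

```python
def hamming(u, v):
    """Number of positions at which two equal-length vectors differ."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(1 for a, b in zip(u, v) if a != b)

b1 = (0, 0, 3, 2, 0, 1, 1, 0, 1, 1, 0, 0, 3)
b2 = (1, 1, 1, 2, 0, 1, 1, 0, 1, 1, 0, 0, 3)
d = hamming(b1, b2)  # the vectors differ only in the first three entries
```

Hamming distance ignores how much two entries differ: the third entries (3 versus 1) count the same as the first (0 versus 1).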

47
Bag of Words
  • Message 3
  • Go on strike against Madrid trains on March 1.
  • Go on strike against Tokyo subway on March 2.
  • Go on strike against New York trains on March 11.
  • Bag of words b3 is the same as b1.
  • BUT message 3 is quite different from message
    1.
  • Shows complexity of problem. Maybe we are missing
    some key words like "go", or maybe we should use
    pairs of words like "on strike" (bigrams)
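A small sketch of the bigram idea, using the first sentence of Message 1 and of Message 3. The tokenization (lowercase alphabetic runs only) is an assumption made for illustration.

```python
import re

def bigrams(message):
    """Set of adjacent word pairs in the message."""
    words = re.findall(r"[a-z]+", message.lower())
    return set(zip(words, words[1:]))

sentence1 = "Strike Madrid trains on March 1."
sentence3 = "Go on strike against Madrid trains on March 1."

# The pair ("on", "strike") separates the two messages even though
# their single-word counts over the example vocabulary agree.
in_message1 = ("on", "strike") in bigrams(sentence1)
in_message3 = ("on", "strike") in bigrams(sentence3)
```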

48
One Approach: k-Nearest Neighbor (kNN)
Classifiers
  • How kNN Classifiers Work
  • Find k most similar training messages
    (neighbors)
  • Assign a message to those groups that are most
    common among neighbors (using weighting by
    distance)
  • kNN classifiers had been considered inefficient
    since finding neighbors is slow
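A brute-force sketch of distance-weighted kNN over bag-of-bits vectors. It scans every training message, which is exactly the slow step noted above. The training data, the 1/(1 + d) weighting, and the choice to return a single group rather than several are all illustrative assumptions.

```python
from collections import defaultdict

def hamming(u, v):
    return sum(1 for a, b in zip(u, v) if a != b)

def knn_classify(query, training, k=3):
    """training: list of (vector, group) pairs.
    Returns the group with the largest distance-weighted vote
    among the k nearest training vectors."""
    # sorted() is stable, so equal distances keep training order.
    neighbors = sorted(training, key=lambda item: hamming(query, item[0]))[:k]
    votes = defaultdict(float)
    for vec, group in neighbors:
        votes[group] += 1.0 / (1.0 + hamming(query, vec))  # closer counts more
    return max(votes, key=votes.get)

# Hypothetical labeled vectors over a 4-word vocabulary.
training = [
    ((1, 1, 0, 0), "group 1"),
    ((1, 0, 0, 0), "group 1"),
    ((0, 0, 1, 1), "group 2"),
    ((0, 1, 1, 1), "group 2"),
]
label = knn_classify((1, 1, 1, 0), training, k=3)
```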

49
Speeding up kNN
  • Can finding neighbors be made fast enough to make
    kNN practical?
  • Mathematics can help.
  • Store text and classes sparsely
  • Use "inverted file" heuristics that group input
    by word, not by document, and compute
    similarities using only the few words occurring
    in the document
  • Result: new methods are 10 to 100 times faster
    with only a 2-10% loss in effectiveness
    (according to some standard measures)
  • Software delivered to sponsors.
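The inverted-file idea can be sketched as follows. This is a generic illustration, not the project's software: the dot-product similarity and the sample word counts are assumptions.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """docs: {doc_id: {word: count}} -> {word: [(doc_id, count), ...]}."""
    index = defaultdict(list)
    for doc_id, counts in docs.items():
        for word, k in counts.items():
            index[word].append((doc_id, k))
    return index

def similarities(query, index):
    """Dot-product similarity between the query and every stored message
    that shares a word with it. Only the lists for the query's own
    words are visited; unrelated messages are never touched."""
    scores = defaultdict(int)
    for word, qk in query.items():
        for doc_id, dk in index.get(word, []):
            scores[doc_id] += qk * dk
    return dict(scores)

# Hypothetical sparse word counts for three stored messages.
docs = {
    "m1": {"strike": 3, "train": 2, "march": 3},
    "m2": {"bomb": 1, "attack": 1, "strike": 1},
    "m3": {"plane": 2, "london": 1},
}
index = build_inverted_index(docs)
scores = similarities({"strike": 1, "march": 1}, index)
```

Because a message contains only a few of the vocabulary words, the work per query depends on those few word lists rather than on the size of the whole collection.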

50
Streaming Data
  • We often have just one shot at the data as it
    comes streaming by because there is so much of
    it. This calls for powerful new algorithms.

51
Research Challenge: Historic Data Analysis
  • The accumulation of text messages is massive over
    time
  • We can only save summaries of the data.
  • It is a great challenge to use only summarized
    historic data to see if a currently emerging
    phenomenon had precursors occurring in the past,
    since you don't have the original data.
  • We have had some success with a novel
    architecture for historic and posterior analyses
    via small summaries ("sketches")
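The slides do not say which sketches the project used. As one well-known example of a small summary, the count-min sketch below keeps approximate word counts in fixed memory; the class, its parameters, and the hashing scheme are illustrative assumptions.

```python
import hashlib

class CountMinSketch:
    """Fixed-size table of counters; estimates never undercount."""

    def __init__(self, width=64, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, item):
        # One hashed column per row, derived from SHA-256 for repeatability.
        for row in range(self.depth):
            digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        for row, col in self._cells(item):
            self.table[row][col] += count

    def estimate(self, item):
        # Collisions can only inflate a cell, so the minimum is the best bound.
        return min(self.table[row][col] for row, col in self._cells(item))

sketch = CountMinSketch()
for word in ["strike", "march", "strike", "train", "strike"]:
    sketch.add(word)
```

After the stream has passed, `sketch.estimate("strike")` returns a value of at least 3 without the original messages being stored.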

52
OUTLINE
  • Bioterrorism Sensor Location
  • Monitoring Message Streams
  • Identification of Authors
  • Detecting a Bioterrorist Attack through
    Syndromic Surveillance

53
Related Project: Author Identification
Develop and evaluate techniques for identifying
authors in large collections of textual artifacts
(e-mails, communiques, transcribed speech, etc.).
Questions Addressed: Which of a set of authors
wrote a particular document/message? Were two
documents written by the same author?
54
Author Identification
  • We are using methods developed in the Monitoring
    Message Streams Project
  • Building on classical work in statistics: Who
    wrote the Federalist papers, Hamilton or Madison?
  • More complicated than conventional text
    classification
  • Large number of possible authors
  • Not much training data
  • Authors write on multiple topics
  • Authors write in different styles in different
    genres

55
One Approach: In Bag of Words, Use Function
Words
  • a
  • about
  • above
  • according
  • accordingly
  • actual
  • actually
  • after
  • afterward
  • afterwards
  • again
  • against
  • another
  • any
  • anybody
  • anyone
  • anything
  • anywhere
  • are
  • aren't
  • around
  • art
  • as
  • aside
  • at
  • away
  • ago
  • ah
  • ain't
  • all
  • almost
  • along
  • already
  • also
  • although
  • always
  • am
  • among
  • an
  • and
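A sketch of how such a list might be turned into a feature vector: the relative frequency of each function word in a text. Only a small subset of the list is used, and the tokenization is an assumption. The usual rationale is that these words depend less on topic than content words do.

```python
import re

# Small subset of the function-word list above.
FUNCTION_WORDS = ["a", "about", "after", "again", "all",
                  "also", "an", "and", "as", "at"]

def function_word_profile(text):
    """Relative frequency of each function word among all words in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    if total == 0:
        return [0.0] * len(FUNCTION_WORDS)
    return [words.count(w) / total for w in FUNCTION_WORDS]

profile = function_word_profile("After all, a plan is also a plan.")
```

Profiles from two documents can then be compared with the same distance-based methods used for bag-of-words vectors.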

56
Partner Agencies: Monitoring Message Streams and
Author Identification Projects
  • Research sponsored by ITIC (Intelligence
    Technology Innovation Center)
  • Administratively under the CIA
  • Through interagency Knowledge, Discovery, and
    Dissemination (KDD) program.

57
OUTLINE
  • Bioterrorism Sensor Location
  • Monitoring Message Streams
  • Identification of Authors
  • Detecting a Bioterrorist Attack through
    Syndromic Surveillance

58
Bioterrorist Event Detection
  • Great concern about the deliberate introduction
    of diseases such as smallpox by bioterrorists has
    led to new challenges for mathematical
    scientists.

  • smallpox
59
Bioterrorist Event Detection
  • Mathematical models of infectious diseases go
    back to Daniel Bernoulli's mathematical analysis
    of smallpox in 1760.
  • However, modern data-gathering methods bring with
    them new challenges for mathematicians.
  • Methods used in Monitoring Message Streams and
    Author ID projects enter into using large data
    sets to detect bioterrorist events or emerging
    diseases (SARS) through syndromic surveillance

60
New Data Types for Public Health Surveillance
  • Managed care patient encounter data
  • Pre-diagnostic/chief complaint (ED data)
  • Over-the-counter sales transactions
  • Drug store
  • Grocery store
  • 911-emergency calls
  • Ambulance dispatch data
  • Absenteeism data
  • ED discharge summaries
  • Prescription/pharmaceuticals
  • Adverse event reports

61
Syndromic Surveillance: NYC Dept. of Health Data
62
Approach
  • As with Monitoring Message Streams and Author
    Identification, represent data by using a vector.
  • For example, use bag of bits (0 or 1 only in
    each entry).
  • If we use symptoms, then 1 or 0 represents presence
    or absence of symptoms such as coughing, fever
    over 102 degrees, achy legs, disoriented, etc.
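A minimal sketch of a symptom bag of bits, using the four symptoms named above. The record format (a list of symptom strings per patient) is a hypothetical choice.

```python
SYMPTOMS = ["coughing", "fever over 102 degrees", "achy legs", "disoriented"]

def symptom_bits(observed):
    """1 if the symptom was recorded for the patient, 0 otherwise."""
    observed = set(observed)
    return tuple(1 if s in observed else 0 for s in SYMPTOMS)

# Hypothetical patient record.
record = symptom_bits(["coughing", "disoriented"])
```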

63
Many New Mathematical Methods and Approaches
under Development
  • Spatial-temporal scan statistics
  • Statistical process control (SPC)
  • Bayesian applications
  • Market-basket association analysis
  • Text mining
  • Rule-based surveillance
  • Change-point techniques

64
Project: a Collaboration between a Math/CS
Research Center and a Government Agency
DIMACS: Center for Discrete Mathematics and
Theoretical Computer Science
CDC: Centers for Disease Control and Prevention
65
  • Would Mathematics help Protect our Bridges and
    Tunnels?

George Washington Bridge
Lincoln Tunnel
66
  • Would Mathematics Help Protect our Borders?

67
Would it help with a Deliberate Outbreak of
Anthrax?
68
  • Similar approaches, using mathematical models,
    have proven useful in many other fields, to
  • make policy
  • plan operations
  • analyze risk
  • compare interventions
  • identify the cause of observed events

69
  • Why not in homeland security?