Title: RAPID: Representation and Analysis of Probabilistic Intelligence Data
1RAPID: Representation and Analysis of Probabilistic Intelligence Data
PAINT
- Carnegie Mellon University
  - PI: Prof. Jaime G. Carbonell / jgc_at_cs.cmu.edu / (412) 268-7279
  - Dr. Eugene Fink / e.fink_at_cs.cmu.edu / (412) 268-6593
  - Dr. Anatole Gershman / anatoleg_at_cs.cmu.edu / (412) 268-8259
- DYNAMiX Technologies
  - POC: Dr. Ganesh Mani / gmani_at_dynamixtechnologies.com / (412) 401-0121
  - Mr. Dwight Dietrich / ddietrich_at_dynamixtechnologies.com / (724) 940-4304
2People
- Carnegie Mellon
  - Faculty: Jaime G. Carbonell, Eugene Fink, Anatole Gershman
  - Students: Bin Fu, Diwakar Punjani, Andrew Yeager
- DYNAMiX
  - Principals: Dwight Dietrich, Ganesh Mani
  - Engineers: Atul Bhandari, Jeremy Hermann, Veera Manda
3Outline of the presentation
- RAPID functionality
- Preliminary demo
- Architecture and main components
- Integration with REALISM
- Current results and work plan
4Analysis of uncertain intelligence
- RAPID is a probabilistic reasoning engine for the analysis of dynamically evolving intelligence data.
- RAPID will help
  - Identify important holes
  - Locate most crucial missing pieces
  - Insert these pieces
- Knowledge sources
  - Public domain
  - Intelligence
  - Inferences
5Analysis of uncertain intelligence
- RAPID will help intelligence analysts to accomplish the following tasks.
  - Draw probabilistic conclusions from available intelligence, including uncertain and missing data
  - Identify potentially surprising developments
  - Formulate and assess hypotheses
  - Identify critical uncertainties
  - Develop strategies for proactive collection of additional intelligence to resolve uncertainties, based on the analysis of cost / benefit trade-offs
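The cost / benefit trade-off in the last task can be illustrated with a minimal sketch, assuming a single yes/no hypothesis and probes whose hit and false-alarm rates are known. The `Probe` class, its fields, the entropy-based benefit measure, and the example probes are illustrative assumptions, not the RAPID design.

```python
import math
from dataclasses import dataclass


def entropy(p: float) -> float:
    """Binary entropy (in bits) of a yes/no hypothesis held with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


@dataclass
class Probe:
    """Hypothetical description of one intelligence-collection action."""
    name: str
    cost: float         # assumed cost units, must be positive
    hit_rate: float     # P(probe reports "yes" | hypothesis is true)
    false_alarm: float  # P(probe reports "yes" | hypothesis is false)


def expected_gain(prior: float, probe: Probe) -> float:
    """Expected entropy reduction (bits) from running the probe once."""
    p_yes = prior * probe.hit_rate + (1 - prior) * probe.false_alarm
    p_no = 1 - p_yes
    # Bayes' rule for each possible probe outcome.
    post_yes = prior * probe.hit_rate / p_yes if p_yes > 0 else prior
    post_no = prior * (1 - probe.hit_rate) / p_no if p_no > 0 else prior
    expected_posterior = p_yes * entropy(post_yes) + p_no * entropy(post_no)
    return entropy(prior) - expected_posterior


def rank_probes(prior: float, probes: list) -> list:
    """Order probes by expected gain per unit cost, best first."""
    return sorted(probes,
                  key=lambda pr: expected_gain(prior, pr) / pr.cost,
                  reverse=True)


# Made-up probes, for illustration only.
probes = [
    Probe("expensive, reliable source", cost=10.0, hit_rate=0.9, false_alarm=0.1),
    Probe("cheap, noisy source", cost=1.0, hit_rate=0.7, false_alarm=0.3),
]
for probe in rank_probes(0.5, probes):
    print(probe.name, round(expected_gain(0.5, probe) / probe.cost, 4))
```

Ranking by gain per unit cost is only one possible policy; a plan that combines several probes would also need to account for overlap in what they reveal.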
6Underlying functionality
- Representation of uncertainty: Novel representation of massive uncertain data, which supports fast matching and inferences
- Inferences from uncertain data: Scalable inference mechanism for reasoning about uncertain intelligence
- Analysis of critical uncertainties: Assessment of uncertain situations, evaluation of data utility, and identification of important missing data
- Proactive intelligence planning: Evaluation of available probes and construction of optimized intelligence-collection plans
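As a rough illustration of the first two items, the sketch below represents uncertain facts and one uncertain inference rule, and derives a conclusion with its own probability. It assumes that premises are independent and that a rule carries a single conditional probability; the `Fact` and `Rule` classes and the example statements are hypothetical, not the representation used in RAPID.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Fact:
    statement: str
    prob: float  # degree of belief in [0, 1]


@dataclass(frozen=True)
class Rule:
    premises: tuple   # statements that must all hold
    conclusion: str
    strength: float   # assumed P(conclusion | all premises hold)


def apply_rule(rule: Rule, facts: dict) -> Optional[Fact]:
    """Return the inferred Fact, or None if some premise is unknown.

    Assumes independent premises, so their joint probability is the
    product of the individual probabilities. The result is a lower
    bound: it ignores ways the conclusion could hold without the premises.
    """
    belief = rule.strength
    for premise in rule.premises:
        if premise not in facts:
            return None
        belief *= facts[premise].prob
    return Fact(rule.conclusion, belief)


# Made-up statements, for illustration only.
facts = {
    "lab acquired equipment": Fact("lab acquired equipment", 0.8),
    "lab hired specialists": Fact("lab hired specialists", 0.5),
}
rule = Rule(("lab acquired equipment", "lab hired specialists"),
            "lab started new program", 0.9)
print(apply_rule(rule, facts))
```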
8Preliminary demo
Uncertainty analysis and probe evaluation, integrated into Excel.
10Architecture
Advanced analysis of incomplete data, identification of critical uncertainties, evaluation and selection of probes, what-if analysis, and visualization.
Uncertainty calculus and proactive probe planning
Excel extension for the analysis of uncertainty, probes, and proactive data collection
11Architecture
Analyst interface
Uncertain situation assessment and data-collection planning
Fast database operations on a stream of newly incoming data, and integration of this stream with the static database.
Uncertainty calculus and proactive probe planning
Excel extension for the analysis of uncertainty, probes, and proactive data collection
Processing of data streams
Real-time matching of queries and inference rules against a massive stream of new data
Scalable assessment of uncertain intelligence
Relational database of uncertain data and inference rules
External API
OTHER PAINT SYSTEMS
12Architecture
Analyst interface
Approved plans for proactive data collection
Uncertain situation assessment and data-collection planning
Uncertainty calculus and proactive probe planning
Proactive intelligence collection
Excel extension for the analysis of uncertainty, probes, and proactive data collection
Processing of data streams
Real-time matching of queries and inference rules against a massive stream of new data
Massive new intelligence
Scalable assessment of uncertain intelligence
General intelligence collection
Relational database of uncertain data and inference rules
External API
Value-added reasoning tools
OTHER PAINT SYSTEMS
13Uncertainty calculus and proactive probe planning
Microsoft Excel
14Scalable assessment of uncertain intelligence
Uncertain facts
Uncertain inference rules
Semantic network
15Value-added reasoning tools
The available intelligence data and inference
rules are in Excel tables, and in the uncertainty
database integrated with Excel.
Uncertainty calculus and proactive probe planning
Excel extension for the analysis of uncertainty, probes, and proactive data collection
16Analyst interface
- Optional extension of the Excel interface
- Visualization and explanation of intelligence
data, inferences, and data-collection plans
18Integration goals
We will integrate the text-extraction system
developed by HNC / Fair Isaac with the
uncertainty-analysis system developed by CMU /
DYNAMiX. The integrated system will support the
following capabilities.
- Extraction of facts, relations, and causal links from natural-language documents
- Evaluation of given hypotheses
- Proactive information gathering
- Application to the analysis of Iranian nano-technology plans and capabilities
19Inputs and outputs
REALISM
- Input
  - Requirements and filters for the information extraction
  - Natural-language documents
  - World-wide web
- Output
  - Large structured tables of relevant facts and entities, which include uncertainty
  - Inference-rule representation of relations and causal links, also including uncertainty
RAPID
- Input
  - Tables of uncertain facts
  - Uncertain inference rules
  - Queries for specific data
  - Analyst hypotheses
- Output
  - Inferences from uncertain data
  - Exact and approximate matches for given queries
  - Hypothesis assessment
  - Proactive plans for collecting additional data
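To make "exact and approximate matches for given queries" concrete, here is a minimal sketch that scores rows of a table of uncertain facts against a query. The field names, the scoring rule (fraction of query fields matched), and the threshold are assumptions chosen for illustration; the slides do not specify how RAPID matches queries.

```python
def match_score(query: dict, record: dict) -> float:
    """Fraction of query fields that the record matches exactly."""
    if not query:
        return 1.0
    hits = sum(1 for key, value in query.items() if record.get(key) == value)
    return hits / len(query)


def find_matches(query: dict, table: list, min_score: float = 0.5) -> list:
    """Return (score, certainty, record) triples, best first.

    Each table row is a (record, certainty) pair. A score of 1.0 is an
    exact match; lower scores are approximate matches.
    """
    results = []
    for record, certainty in table:
        score = match_score(query, record)
        if score >= min_score:
            results.append((score, certainty, record))
    # Sort by match quality first, then by certainty of the fact.
    results.sort(key=lambda r: (r[0], r[1]), reverse=True)
    return results


# Made-up rows, for illustration only.
table = [
    ({"country": "X", "topic": "materials"}, 0.6),
    ({"country": "X", "topic": "energy"}, 0.9),
    ({"country": "Y", "topic": "energy"}, 0.8),
]
query = {"country": "X", "topic": "materials"}
for score, certainty, record in find_matches(query, table):
    print(score, certainty, record)
```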
20Architecture
Analyst interface
Uncertain situation assessment and data-collection planning
Information requests
Topic filters
Uncertainty calculus and proactive probe planning
Structured facts and entities
TEXT DOCUMENTS
REALISM
External API
Structured relations and causal links
Scalable assessment of uncertain intelligence
WEB
RAPID: CMU / DYNAMiX
REALISM: HNC / Fair Isaac
22Initial results
- Detailed technical plan of uncertain situation assessment and proactive probe planning: architecture, functionality, and algorithms
- Uncertain intelligence scenario based on public data about Iranian nano-technology
CONCEPTUAL
23Current work
- Uncertainty calculus, integrated with Excel
- Proactive probe planning
- Scalable uncertainty assessment, integrated with a relational database
- Integration with REALISM
- Initial analyst interface
24Short-term plan
- Prototype of uncertainty calculus: March
- Prototype of probe-planning tools: March
- Initial RAPID / REALISM integration: May
- Initial analyst interface (extended Excel): June
- Prototype of uncertainty database: July
25Long-term plan
All versions of RAPID will demonstrate all main
capabilities, with increasing functionality over
time.
- Uncertain situation assessment and proactive probe planning: July 2008
- Discrimination among competing hypotheses and identification of critical uncertainties: July 2009
- Fully integrated deployable prototype: July 2009
- Advanced proactive-intelligence planning and learning of inference rules: July 2010
- Value-added tools, which may include data-stream processing, entity co-reference, adversarial search, and Markov reasoning: July 2011
- Fully integrated deliverable system: Jan 2012
27Evaluation
We expect that RAPID will provide significant
advantage over available off-the-shelf tools,
such as standard spreadsheets and database
systems.
To support this claim, we plan to compare the
productivity of analysts using RAPID with that of
analysts who perform the same tasks using
commercially available tools.
We will view RAPID as a success if it consistently
outperforms the standard tools, and the analysts
report an overall positive experience of using
it.
28Adjustment of the earlier plan
- We need to adjust the plan to the new budget. We will deliver the full core functionality, but we propose to reduce the work on value-added tools.
- Reduced work
  - Processing of data streams
  - Advanced contingency analysis
  - Analyst interface
- Suspended work
  - Predictive Markov models
  - Analysis of adversarial actions
29APPENDICES
30Appendices
- Previous work
- Empirical evaluation
- PAINT contributions
31ARGUS
- ARGUS project sponsored by DTO/ARDA: Identification and tracking of novel patterns in massive databases and data streams.
[Figure: ARGUS processing diagram. Recoverable labels: Create, Detect, Novel, Historical, Background, Re-cluster, Clusters, Data, Model, Events, Tracked, New, Generate, Update, Match, Profiles, Alerts, Analysts.]
32ARGUS
- Estimate the density function at t0
- Grow the cluster for a period of Δt while reducing the weight of old records
- Estimate the new density function at t0 + Δt
- Compare the two estimates
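The four steps above can be sketched as follows. This is a simplified stand-in, not the ARGUS algorithm: it replaces the density function with a weighted histogram over one numeric attribute, down-weights old records with an assumed exponential half-life, and compares the two estimates by total-variation distance. All function names, parameters, and data are hypothetical.

```python
def weighted_histogram(records, now, half_life, edges):
    """Normalized histogram of record values, with older records down-weighted.

    records: list of (timestamp, value) pairs.
    half_life: a record's weight halves every `half_life` time units.
    edges: sorted bin boundaries; values outside them are ignored.
    """
    bins = [0.0] * (len(edges) - 1)
    for timestamp, value in records:
        if timestamp > now:
            continue  # record not yet observed at time `now`
        weight = 0.5 ** ((now - timestamp) / half_life)
        for i in range(len(bins)):
            if edges[i] <= value < edges[i + 1]:
                bins[i] += weight
                break
    total = sum(bins)
    return [b / total for b in bins] if total > 0 else bins


def density_shift(records, t0, dt, half_life, edges):
    """Total-variation distance between the estimates at t0 and t0 + dt."""
    before = weighted_histogram(records, t0, half_life, edges)
    after = weighted_histogram(records, t0 + dt, half_life, edges)
    return 0.5 * sum(abs(a - b) for a, b in zip(before, after))


# Made-up records: early values fall in the first bin, later ones in the second.
records = [(0, 1.0), (0, 1.5), (5, 8.0), (5, 8.5)]
print(density_shift(records, t0=1, dt=5, half_life=1, edges=[0, 5, 10]))
```

A shift near 0 suggests the cluster is stable, while a shift near 1 suggests that the recent records form a different pattern; the threshold for raising an alert would have to be chosen empirically.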
33ARGUS
[Figure: re-clustering example. Recoverable labels: Respiratory Diseases, SARS, Re-clustering, t0, Δt.]
34RADAR
- RADAR project sponsored by DARPA: Analysis and management of volatile crisis situations based on uncertain data.
Top-level control and learning
Process new data
Analysts
35RADAR
We have applied the system to repair a conference
schedule after a crisis caused the loss of rooms.
36RAPID
- Unlike ARGUS
  - Represents and analyzes uncertainty
  - Supports complex inferences
- Unlike RADAR
  - Scales to massive intelligence datasets
  - Analyzes complex external situations
  - Develops intelligence-collection plans
38Evaluation goals
We expect that RAPID will provide significant
advantage over available off-the-shelf tools,
such as standard spreadsheets and database
systems.
39Experimental setup
We expect to recruit retired intelligence
analysts for the system evaluation, and ask them
to perform several tasks based on given uncertain
data.
- Identify the data most relevant to given tasks
- Evaluate the validity of given hypotheses
- Find relevant hidden patterns
- Identify critical missing data and propose
a cost-effective plan for collecting this data
40Performance measurements
We will measure the following main factors to
evaluate the performance of analysts:
- Number of high-level tasks completed within the experiment time frame
- Accuracy of hypothesis evaluation
- Number and relevance of identified patterns
- Effectiveness and costs of data-collection plans
We will also ask analysts to complete a
questionnaire on their overall experience.
41Expected results
- We will view the proposed work as a success if
  - RAPID consistently outperforms the off-the-shelf tools in all four performance factors,
  - the performance difference for each factor is statistically significant, and
  - analysts report an overall positive experience of using the system.
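The slides do not say which significance test will be used. As one possible approach, the sketch below applies a one-sided permutation test to a single performance factor, comparing scores of analysts using RAPID with scores of analysts using off-the-shelf tools. The function name, trial count, and all scores are hypothetical.

```python
import random


def permutation_p_value(rapid, baseline, trials=10000, seed=0):
    """One-sided p-value for the claim mean(rapid) > mean(baseline).

    Repeatedly shuffles the pooled scores and counts how often a random
    split gives a difference of means at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = sum(rapid) / len(rapid) - sum(baseline) / len(baseline)
    pooled = list(rapid) + list(baseline)
    n = len(rapid)
    extreme = 0
    for _ in range(trials):
        rng.shuffle(pooled)
        diff = sum(pooled[:n]) / n - sum(pooled[n:]) / (len(pooled) - n)
        if diff >= observed - 1e-12:  # small tolerance for rounding error
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (trials + 1)


# Made-up scores, e.g. tasks completed per session.
rapid_scores = [9, 10, 11, 12, 10, 11]
baseline_scores = [5, 6, 4, 5, 6, 5]
p = permutation_p_value(rapid_scores, baseline_scores)
print("significant at 0.05:", p < 0.05)
```

Because the criterion covers four factors, the test would be run once per factor, and a correction for multiple comparisons may be appropriate.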
42RAPID / REALISM evaluation
- Component evaluation: We will measure the following performance factors
  - Accuracy and completeness of text extraction
  - Accuracy of hypothesis evaluation
  - Effectiveness of data-collection plans
  - Speed of each system component
- Component utility: We will also evaluate the utility of REALISM and RAPID by comparing the productivity of subjects under the following three conditions
  - Use of the integrated system
  - Use of REALISM without RAPID
  - Use of RAPID without REALISM
44Main contributions
Strategy Generation and Exploration
3
Response Options
Data
Dynamic Simulation Models
Feedback
4
Representation of massive uncertain knowledge
Automated discovery of causal relationships
Fast probabilistic integration of all evidence
Analysis of possible future developments
3
Identification of critical uncertainties
Planning of proactive intelligence gathering
4
45Inputs and outputs
General intelligence collection
CONTINUOUS DATA STREAM
Uncertain intelligence and analyst opinions
Massive stream of structured records
Proactive intelligence collection
RAPID
Data-search queries
Query matches
Uncertain situation assessment
Specific hypotheses
Evaluation of hypotheses
INTERACTIVE DATA ANALYSIS
New learned rules
Inference rules
Domain knowledge
Plans for proactive intelligence collection
46Inputs
- From other PAINT components
  - Available intelligence data and its certainty
  - Hypotheses about unknown factors
  - Available domain knowledge
- From analysts
  - Intelligence-analysis tasks and priorities
  - Hypotheses and related opinions
  - Responses to RAPID-generated probes
  - Additional domain knowledge
- From other sources
  - Databases with available intelligence
  - Public databases with relevant data
47Outputs
- Inferences from available uncertain data
- Evaluation of given hypotheses
- New hypotheses and their certainties
- Plans for proactive intelligence collection
- Learned inference rules