Title: A Process Catalog for Workflow Generation
1 A Process Catalog for Workflow Generation
- Michael Wolverton, David Martin, Ian Harrison, Jerome Thomere
- SRI International
2 Outline
- Program overview
- Project overview
- Qualitative (capabilities) layer
- Modeling and query handling
- Quantitative (quality of service) layers
- Modeling and query handling
- Implementation
Primary focus in this talk
3 Tangram Program Objectives
- Support the intelligence analyst in using data analysis tools effectively
- Automatic instantiation of data analysis workflows
- Maximize performance within acceptable resource constraints
- Reusable workflow templates
- Flexible workflow requests
- Automatic selection of data analysis components and datasets
- Quick, easy characterization of component descriptions
- By non-experts
- Supporting precise capabilities queries
- Incorporating empirical measures of speed and effectiveness
4 Example Workflow Template
[Figure: workflow template. Labels: Entity/Transaction Data, Entity Equivalence (Alias Resolution), Suspicion Scoring, GroupSeedSet, Group Detection (appears twice), Likelihood Ratio Detection, Group Hypothesis Merging, Inexact Graph Matching, Event Equivalence, Logical Inference, Recognized Events/Alerts.]
- Backwards sweep
- Forwards sweep
5 Simple Example: Backward Sweep
[Figure: backward sweep example. Labels: Process Descriptions, Accuracy Models, Qualitative Query, Process Preconditions; processes NetKit, UWisc Suspicion Scoring, LAW, CADRE; patterns VulnExp Pattern, Threat Resource Acquire Pattern; types memberOf, suspiciousEntity.]
Expressions shown in the figure:
(containsLinkType ?DS suspiciousEntity)
(containsNodeType ?DS Group)
(containsNodeType ?DS memberOf)
(containsNodeType ?DS suspiciousEntity)
(containsNodeType ?DS SuspiciousEvent)
6 Simple Example: Forward Sweep
[Figure: forward sweep example. Labels: "Query Process Problem Data Model" (shown twice), Data Model (shown twice), Process Descriptions, Accuracy Models; processes NetKit, UWisc Suspicion Scoring, LAW, CADRE; patterns VulnExp Pattern, Threat Resource Acquire Pattern; types memberOf, suspiciousEntity.]
7 Program Architecture
9 Project Overview: Objectives and Approach
- Challenge: Characterize individual components in a way that allows a workflow management component to reason about them effectively
- Approach: Characterize processes and answer queries in terms of
- Process capabilities
- What kinds of problems they are capable of answering
- How they modify the available data
- What data looks like before running the process and what it looks like after
- Content
- Accuracy
- Performance
- System requirements (memory, OS, etc.)
- Time, memory use, etc.
10 Approach: Layered Process Description
Layer Name | Contents | Formalism | Source of knowledge
Capabilities | Qualitative functional descriptions, hard resource constraints, invocation details | Static characteristics in OWL; pre- and postconditions in rules | Hand-coded by component developers
Data Modification | Statistical before/after descriptions of data | Problem × Data Model -> Data Model | Experimental analysis, theoretical analysis
Accuracy | Statistical description of expected accuracy of algorithm results | Problem × Data Model × Accuracy Model -> Accuracy Model | Experimental analysis, theoretical analysis
Performance | Statistical prediction of performance of algorithm | Problem × Data Model × Resource Model -> Performance Model | Experimental analysis, theoretical analysis
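The Formalism column can be read as a set of function signatures. The following is a minimal sketch of that reading, not ProCat's actual representation: every type name, the dictionary-based models, and the toy group detector are assumptions made for illustration.

```python
from typing import Callable, Dict

# Hypothetical stand-ins: each model is a plain dict of named statistics.
Problem = Dict[str, float]
DataModel = Dict[str, float]
AccuracyModel = Dict[str, float]
ResourceModel = Dict[str, float]
PerformanceModel = Dict[str, float]

# One callable type per quantitative layer, mirroring the Formalism column.
DataModification = Callable[[Problem, DataModel], DataModel]
Accuracy = Callable[[Problem, DataModel, AccuracyModel], AccuracyModel]
Performance = Callable[[Problem, DataModel, ResourceModel], PerformanceModel]


def toy_group_detector(problem: Problem, data: DataModel) -> DataModel:
    """Invented data-modification model: one new group node per 10 person nodes."""
    out = dict(data)
    out["group_nodes"] = data.get("group_nodes", 0.0) + data.get("person_nodes", 0.0) / 10.0
    return out


def chain(steps, problem: Problem, data: DataModel) -> DataModel:
    """Feed each step's predicted output data model into the next step."""
    for step in steps:
        data = step(problem, data)
    return data


predicted = chain([toy_group_detector, toy_group_detector], {}, {"person_nodes": 50.0})
print(predicted)
```

Because each data-modification model maps a data model to a data model, predictions for consecutive workflow steps can be chained, which is one way to read the composability requirement on the quantitative layers.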
11 Relationship to Service Discovery Problem
- Easier in some ways (simplifying assumptions)
- Components operate on data only
- No side-effects in the world
- Simple patterns of I/O shared by most components
- Smallish domain model (ontology)
- Harder in some ways
- Need to return preconditions related to specific needs
- Least sufficient conditions
- Need hi-fidelity (quantitative) Quality of Service models
- Compute QoS for specific datasets at query-time
12 ProCat Architecture
[Figure: ProCat architecture. Labels: RDF/XML Syntax with Extension, SPARQL, Query Handler, CL Reasoner, Capabilities Layer KB (OWL, RDFS Reasoning; Ontologies: Process, Data, TEO), Quantitative Layer Prediction (PM Non-Linear Search Model Predictor, Linear Predictor), Quantitative Models Repository (entries labeled GD1, GD2, PM1, PM2, Pattern1, Coeff., Data).]
14 Capabilities Layer
[Figure: ProCat architecture diagram, repeated. Labels: Query Handler, CL Reasoner, Capabilities Layer KB (Ontologies: Process, Data, TEO), Quantitative Layer Prediction (PM Non-Linear Search Model Predictor, Linear Predictor), Quantitative Models Repository (entries labeled GD1, GD2, PM1, PM2, Pattern1, Coeff., Data).]
15 Example Capabilities Query
<pcat:FindInputDataRequirements>
  <pcat:component>
    <rdf:Description rdf:about="http://...?component2">
      <rdf:type rdf:resource="http://.../Process.owl#PatternMatchingProcess"/>
      <pdl:hasOutput rdf:resource="http://...?dataVariable5"/>
      <pdl:hasInput rdf:resource="http://...?dataVariable4"/>
      <pdl:hasInput rdf:resource="http://...?dataVariable3"/>
    </rdf:Description>
  </pcat:component>
  <pcat:constraints>
    <rdf:Description rdf:about="http://...?dataVariable5">
      <pdl:hasRole rdf:resource="http://.../Process.owl#HypothesisOutputRole"/>
      <rdf:type rdf:resource="http://...Hypothesis"/>
      <pdl:containsNodeType rdf:resource="http://...MoneyLaunderingEvent"/>
    </rdf:Description>
    ..
  </pcat:constraints>
</pcat:FindInputDataRequirements>
16 Process Description Ontology
- Process
- Class hierarchy
- Parameters
- Types
- Roles
- Default values
- Multiple inheritance
- Pre- and post-conditions
- Process Usage Template
- Process installation
- Resource requirements
- Memory, disk space, libraries, etc.
- Invocation conventions
- Environment variables, paths
17 Process Ontology
18 Process
19 Capabilities Layer Challenges
- Pre- and post-conditions
- Hypothetical in nature
- Inherently reified
- Applicable to execution instances of a process
pre: (input containsNodeType Person)
20 Capabilities Layer Challenges
- Pre- and post-conditions
- Hypothetical in nature
- Inherently reified
- Applicable to execution instances of a process
- Propagation of values (in backwards sweep)
pre: (input containsNodeType ?T)
post: (output containsNodeType ?T)
21 Capabilities Layer Challenges
- Pre- and post-conditions
- Hypothetical in nature
- Inherently reified
- Applicable to execution instances of a process
- Propagation of values (in backwards sweep)
- Universally quantified conditional rules
(output containsNodeType ?T) :- (input1 containsNodeType ?T), (input2 containsNodeType ?T).
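The propagation of values through a conditional rule can be illustrated with a small one-way matcher. This is a minimal sketch under stated assumptions: the tuple encoding of clauses and the helper names `unify`, `substitute`, and `back_propagate` are invented here, and ProCat itself evaluates such rules in Prolog rather than Python.

```python
def is_var(term):
    """Variables are written as strings that start with '?'."""
    return isinstance(term, str) and term.startswith("?")


def unify(pattern, goal):
    """One-way match of a rule head against a ground goal clause.

    Returns a dict of variable bindings, or None if the clauses differ.
    """
    if len(pattern) != len(goal):
        return None
    bindings = {}
    for p, g in zip(pattern, goal):
        p = bindings.get(p, p)  # reuse a binding made earlier in this clause
        if is_var(p):
            bindings[p] = g
        elif p != g:
            return None
    return bindings


def substitute(clause, bindings):
    """Replace bound variables in a clause with their values."""
    return tuple(bindings.get(term, term) for term in clause)


def back_propagate(goal, head, body):
    """Turn a required output condition into the input conditions that imply it."""
    bindings = unify(head, goal)
    if bindings is None:
        return None
    return [substitute(clause, bindings) for clause in body]


# The rule from the slide: output contains ?T if both inputs contain ?T.
HEAD = ("output", "containsNodeType", "?T")
BODY = [("input1", "containsNodeType", "?T"), ("input2", "containsNodeType", "?T")]

required_inputs = back_propagate(("output", "containsNodeType", "Person"), HEAD, BODY)
print(required_inputs)
```

Binding `?T` to `Person` while matching the rule head carries the requested node type back to both inputs, which is the value propagation a backwards sweep relies on.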
22 Capabilities Layer Challenges
- Pre- and post-conditions
- Hypothetical in nature
- Inherently reified
- Applicable to execution instances of a process
- Propagation of values (in backwards sweep)
- Universally quantified rules
- Queries may contain pre- and post-condition elements (including arbitrary pre-condition elements)
pre: (input1 rdf:type PersonDataset) (input2 rdf:type EventDataset) (input2 temporalRange <...>)
post: (output containsLinkType ParticipatedIn)
23 Capabilities Layer Challenges
- Pre- and post-conditions
- Hypothetical in nature
- Inherently reified
- Applicable to execution instances of a process
- Propagation of values (in backwards sweep)
- Universally quantified rules
- Queries may contain pre- and post-condition elements
- Least sufficient precondition is desired
24 Solution
- Process Usage Template (PUT)
- Snapshot of an arbitrary successful occurrence of a process
- Each process can have multiple PUTs
- 2 declarative units
- Pre / post condition (existentially quantified)
- Conditional effect rules (universally quantified)
- Two-stage query processing
- SPARQL queries identify candidate processes based on static properties
- Prolog-based evaluation of pre/post-condition query clauses
- Asymmetric treatment of pre vs. post
- Query postcondition clauses must be derivable from PUT postcondition (or conditional effect)
- Query precondition clauses must be consistent with PUT precondition
- Result precondition is accumulation of
- Precondition (with propagated variable bindings)
- Bodies of CE rules used to establish postcondition clauses (with propagated variable bindings)
- Precondition clauses given in query
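The asymmetric treatment and the accumulation of the result precondition can be sketched as follows. This is an illustration under loud assumptions, not the ProCat implementation (which uses SPARQL and Prolog): the `PUT` class, the tuple encoding, and the function names are hypothetical, and "consistent" is reduced to a toy check that predicates in an assumed `SINGLE_VALUED` set never take two different values for one subject. The `range-A` value stands in for the range elided on the slide.

```python
from dataclasses import dataclass, field


def is_var(term):
    return isinstance(term, str) and term.startswith("?")


def match(pattern, clause, bindings):
    """One-way match of a PUT pattern against a ground query clause."""
    new = dict(bindings)
    for p, c in zip(pattern, clause):
        p = new.get(p, p)
        if is_var(p):
            new[p] = c
        elif p != c:
            return None
    return new


def subst(clause, bindings):
    return tuple(bindings.get(term, term) for term in clause)


@dataclass
class PUT:
    pre: list                                     # precondition clauses
    post: list                                    # postcondition clauses
    ce_rules: list = field(default_factory=list)  # (head, [body clauses]) pairs


# Assumption for this sketch only: predicates treated as single-valued.
SINGLE_VALUED = {"temporalRange"}


def consistent(clauses):
    seen = {}
    for s, p, o in clauses:
        if p in SINGLE_VALUED and not is_var(o):
            if seen.setdefault((s, p), o) != o:
                return False
    return True


def result_precondition(put, query_pre, query_post):
    """Return the accumulated precondition, or None if the PUT does not apply."""
    bindings, rule_bodies = {}, []
    for goal in query_post:
        for clause in put.post:            # derivable from the PUT postcondition?
            b = match(clause, goal, bindings)
            if b is not None:
                bindings = b
                break
        else:
            for head, body in put.ce_rules:  # or from a conditional effect rule?
                b = match(head, goal, {})    # rule variables are local to each use
                if b is not None:
                    rule_bodies.extend(subst(c, b) for c in body)
                    break
            else:
                return None                  # postcondition clause not derivable
    result = [subst(c, bindings) for c in put.pre] + rule_bodies + list(query_pre)
    if not consistent(result):
        return None                          # query precondition conflicts with PUT
    return list(dict.fromkeys(result))       # drop duplicates, keep order


put = PUT(
    pre=[("input1", "rdf:type", "PersonDataset"), ("input2", "rdf:type", "EventDataset")],
    post=[("output", "containsLinkType", "ParticipatedIn")],
    ce_rules=[(("output", "containsNodeType", "?T"), [("input1", "containsNodeType", "?T")])],
)
needed = result_precondition(
    put,
    query_pre=[("input2", "temporalRange", "range-A")],
    query_post=[("output", "containsLinkType", "ParticipatedIn"),
                ("output", "containsNodeType", "Person")],
)
print(needed)
```

The sketch only accumulates the three sources listed above; it makes no attempt to prove that the result is the least sufficient precondition.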
26 Quantitative Layers
Layer Name | Contents | Formalism | Source of knowledge
Capabilities | Qualitative functional descriptions, hard resource constraints, invocation details | Static characteristics in OWL; pre- and postconditions in rules | Hand-coded by component developers
Data Modification | Statistical before/after descriptions of data | Problem × Data Model -> Data Model | Experimental analysis, theoretical analysis
Accuracy | Statistical description of expected accuracy of algorithm results | Problem × Data Model × Accuracy Model -> Accuracy Model | Experimental analysis, theoretical analysis
Performance | Statistical prediction of performance of algorithm | Problem × Data Model × Resource Model -> Performance Model | Experimental analysis, theoretical analysis
27 ProCat Quantitative Layers Architecture
[Figure: quantitative layers architecture. Labels: SR, TEE, Component Execution Data, Experimental Results, Data Characterizations, "4.2 and 5.2 Queries", Query Handler, Quantitative Data, Accuracy, and Performance Predictions, Quantitative Layer Prediction (Prediction Engine, PM Nonlinear Search Model Predictor, Linear Predictor), Quantitative Models Repository (Models, DC Metrics Ontology; entries labeled GD1, GD2, PM1, Pattern1, Coeff., Data).]
28 Quantitative Layers
- Requirements
- Precise
- Efficient
- Composable
- Quantitative models represented declaratively
- Tabular format (not in OWL)
- Query result generation done procedurally
- Using Lisp functions
- Coefficients for the linear model can be learned through a regression method
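As an illustration of learning linear-model coefficients by regression and keeping them in a declarative table, here is a minimal sketch. It uses ordinary least squares on a single feature; the training numbers, the feature (dataset size), the table layout, and the function names are all invented for this example, and the slides do not say which regression method or features ProCat uses.

```python
def fit_linear(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(coeffs, x):
    """Apply stored coefficients to a new dataset characterization."""
    intercept, slope = coeffs
    return intercept + slope * x


# Invented experimental results: (dataset size, observed runtime in seconds).
sizes = [100.0, 200.0, 300.0, 400.0]
runtimes = [1.5, 2.5, 3.5, 4.5]

# Declarative, tabular store: one row of coefficients per process.
# "GD1" is used only as a placeholder process label.
MODEL_TABLE = {"GD1": fit_linear(sizes, runtimes)}

estimate = predict(MODEL_TABLE["GD1"], 1000.0)
print(MODEL_TABLE["GD1"], estimate)
```

Keeping the coefficients as data and the prediction as a small procedure mirrors the split described above between declaratively represented models and procedural query result generation.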
29 Process-specific Prediction Models
[Figure panels: Data Modification, Performance]
- Recurrence relation Pattern Matcher model compared to LAW actual results
- Mean error
- Data Modification: 20
- Performance: 19
- Runtime differs from LAW by over 2 orders of magnitude
31 Implementation
[Figure: implementation architecture. Labels: Component descriptions, Domain ontologies, WINGS, ProCat GUI, TEE, Logging, Concurrent queries, Tangram Workflow Services API, ProCat API, Web service API, SOAP, ProCat infrastructure, ProCat Server, SPARQL, RDFS, Prolog, Access API, AllegroGraph Triple Store.]
32 GUI
33 Future directions
- Validity checking of ontology updates
- Validity checking of new / updated process characterizations
- Allow for disjunction in pre- and post-conditions
- Process characterization editor
- Automation of quantitative model acquisition
- Assistance for updating process descriptions against ontology changes
- Better online browsing and catalog management
34 Summary
- Design and implementation of a Process Catalog for Workflow Generation
- Qualitative (capabilities) layer
- Quantitative (quality of service) layers
- Novel elements
- Quantitative layers (Quality of Service)
- Numeric models for data modification, accuracy, performance
- Novel approach to reasoning about pre- and post-conditions
- Propagation of values (in backwards sweep)
- Universally quantified rules
- Queries may contain pre- and post-condition elements
- Computation of least sufficient precondition