A Process Catalog for Workflow Generation



1
A Process Catalog for Workflow Generation
  • Michael Wolverton, David Martin, Ian Harrison,
    Jerome Thomere
  • SRI International

2
Outline
  • Program overview
  • Project overview
  • Qualitative (capabilities) layer
  • Modeling and query handling
  • Quantitative (quality of service) layers
  • Modeling and query handling
  • Implementation

Primary focus in this talk
3
Tangram Program Objectives
  • Support the intelligence analyst in using data
    analysis tools effectively
  • Automatic instantiation of data analysis
    workflows
  • Maximize performance within acceptable resource
    constraints
  • Reusable workflow templates
  • Flexible workflow requests
  • Automatic selection of data analysis components
    and datasets
  • Quick and easy characterization of component
    descriptions
  • By non-experts
  • Supporting precise capabilities queries
  • Incorporating empirical measures of speed and
    effectiveness

4
Example Workflow Template
[Workflow diagram. Node labels: Entity/Transaction Data; Entity Equivalence (Alias Resolution); Event Equivalence; Suspicion Scoring; Likelihood Ratio Detection; Group Detection (appears twice); Group Hypothesis Merging; Inexact Graph Matching; Logical Inference; GroupSeedSet; Recognized Events/Alerts]
  • Backwards sweep
  • Forwards sweep

5
Simple Example Backward Sweep
[Diagram. Labels: Qualitative Query (twice); ProcessDescriptions; AccuracyModels; Process Preconditions (twice); LAW; CADRE; NetKit; UWisc Suspicion Scoring; VulnExp Pattern; Threat Resource Acquire Pattern; memberOf; suspiciousEntity]
Clauses shown in the diagram:
(containsNodeType ?DS SuspiciousEvent)
(containsNodeType ?DS memberOf)
(containsNodeType ?DS suspiciousEntity)
(containsLinkType ?DS suspiciousEntity)
(containsNodeType ?DS Group)
6
Simple Example Forward Sweep
[Diagram. Labels: Query: Process, Problem, Data Model (twice); Data Model (twice); ProcessDescriptions; AccuracyModels; LAW; CADRE; NetKit; UWisc Suspicion Scoring; VulnExp Pattern; Threat Resource Acquire Pattern; memberOf; suspiciousEntity]
7
Program Architecture
8
Outline
  • Program overview
  • Project overview
  • Qualitative (capabilities) layer
  • Modeling and query handling
  • Quantitative (quality of service) layers
  • Modeling and query handling
  • Implementation

9
Project Overview: Objectives and Approach
  • Challenge: Characterize individual components in
    a way that allows a workflow management component
    to reason about them effectively
  • Approach: Characterize processes and answer queries
    in terms of
  • Process capabilities
  • What kinds of problems they are capable of
    answering
  • How they modify the available data
  • What data looks like before running the process
    and what it looks like after
  • Content
  • Accuracy
  • Performance
  • System requirements (memory, OS, etc.)
  • Time, memory use, etc.

10
Approach: Layered Process Description
Layer Name | Contents | Formalism | Source of knowledge
Capabilities | Qualitative functional descriptions, hard resource constraints, invocation details | Static characteristics in OWL; pre- and postconditions in rules | Hand-coded by component developers
Data Modification | Statistical before/after descriptions of data | Problem × Data Model -> Data Model | Experimental analysis, theoretical analysis
Accuracy | Statistical description of expected accuracy of algorithm results | Problem × Data Model × Accuracy Model -> Accuracy Model | Experimental analysis, theoretical analysis
Performance | Statistical prediction of performance of algorithm | Problem × Data Model × Resource Model -> Performance Model | Experimental analysis, theoretical analysis
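The Formalism column reads naturally as a set of function signatures. The sketch below writes them as Python type aliases and chains two toy predictors across consecutive workflow steps. Every name, statistic, and formula in it is an assumption made for illustration; the slides do not show how ProCat represents these models internally.

```python
from typing import Callable, Dict

# Hypothetical stand-ins: each "model" is a dict of named statistics.
Problem = Dict[str, float]
DataModel = Dict[str, float]
AccuracyModel = Dict[str, float]
ResourceModel = Dict[str, float]
PerformanceModel = Dict[str, float]

# One signature per quantitative layer, mirroring the Formalism column.
DataModification = Callable[[Problem, DataModel], DataModel]
Accuracy = Callable[[Problem, DataModel, AccuracyModel], AccuracyModel]
Performance = Callable[[Problem, DataModel, ResourceModel], PerformanceModel]


def toy_data_modification(problem: Problem, data: DataModel) -> DataModel:
    """Invented rule: the process adds nodes in proportion to input size."""
    out = dict(data)
    out["nodes"] = data["nodes"] + problem["hits_per_node"] * data["nodes"]
    return out


def toy_performance(problem: Problem, data: DataModel,
                    resources: ResourceModel) -> PerformanceModel:
    """Invented rule: runtime grows with node count and shrinks with CPUs."""
    return {"seconds": problem["cost_per_node"] * data["nodes"] / resources["cpus"]}


def predict_chain(steps, data: DataModel, resources: ResourceModel):
    """Thread the predicted data model through consecutive workflow steps."""
    total_seconds = 0.0
    for problem, modify, perform in steps:
        total_seconds += perform(problem, data, resources)["seconds"]
        data = modify(problem, data)  # output model feeds the next step
    return data, total_seconds


problem = {"hits_per_node": 0.5, "cost_per_node": 0.25}
steps = [(problem, toy_data_modification, toy_performance)] * 2
final_data, seconds = predict_chain(steps, {"nodes": 1000.0}, {"cpus": 4.0})
```

Because each data-modification prediction returns a Data Model, it can serve directly as the input to the next step's predictions, which is what lets a workflow generator estimate the cost of a whole chain before running anything.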
11
Relationship to Service Discovery Problem
  • Easier in some ways (simplifying assumptions)
  • Components operate on data only
  • No side-effects in the world
  • Simple patterns of I/O shared by most components
  • Smallish domain model (ontology)
  • Harder in some ways
  • Need to return preconditions related to specific
    needs
  • Least sufficient conditions
  • Need hi-fidelity (quantitative) Quality of
    Service models
  • Compute QoS for specific datasets at query-time

12
ProCat Architecture
[Architecture diagram. Labels: RDF/XML Syntax with Extension; SPARQL; Query Handler; CL Reasoner; RDFS Reasoning; Capabilities Layer KB; OWL; Ontologies; Process; Data; TEO; Quantitative Layer Prediction; Quantitative Models Repository; PM Non-Linear Search Model Predictor; Linear Predictor; Coeff.; PM1; PM2; GD1; GD2; Pattern1]
13
Outline
  • Program overview
  • Project overview
  • Qualitative (capabilities) layer
  • Modeling and query handling
  • Quantitative (quality of service) layers
  • Modeling and query handling
  • Implementation

14
Capabilities Layer
[Architecture diagram. Labels: Query Handler; CL Reasoner; Capabilities Layer KB; Ontologies; Process; Data; TEO; Quantitative Layer Prediction; Quantitative Models Repository; PM Non-Linear Search Model Predictor; Linear Predictor; Coeff.; PM1; PM2; GD1; GD2; Pattern1]
15
Example Capabilities Query
<pcat:FindInputDataRequirements>
  <pcat:component>
    <rdf:Description rdf:about="http://...?component2">
      <rdf:type rdf:resource="http://.../Process.owl#PatternMatchingProcess"/>
      <pdl:hasOutput rdf:resource="http://...?dataVariable5"/>
      <pdl:hasInput rdf:resource="http://...?dataVariable4"/>
      <pdl:hasInput rdf:resource="http://...?dataVariable3"/>
    </rdf:Description>
  </pcat:component>
  <pcat:constraints>
    <rdf:Description rdf:about="http://...?dataVariable5">
      <pdl:hasRole rdf:resource="http://.../Process.owl#HypothesisOutputRole"/>
      <rdf:type rdf:resource="http://...Hypothesis"/>
      <pdl:containsNodeType rdf:resource="http://...MoneyLaunderingEvent"/>
    </rdf:Description>
    ..
  </pcat:constraints>
</pcat:FindInputDataRequirements>
16
Process Description Ontology
  • Process
  • Class hierarchy
  • Parameters
  • Types
  • Roles
  • Default values
  • Multiple inheritance
  • Pre- and post-conditions
  • Process Usage Template
  • Process installation
  • Resource requirements
  • Memory, disk space, libraries, etc.
  • Invocation conventions
  • Environment variables, paths

17
Process Ontology
18
Process
19
Capabilities Layer Challenges
  • Pre- and post-conditions
  • Hypothetical in nature
  • Inherently reified
  • Applicable to execution instances of a process

pre: (input containsNodeType Person)
20
Capabilities Layer Challenges
  • Pre- and post-conditions
  • Hypothetical in nature
  • Inherently reified
  • Applicable to execution instances of a process
  • Propagation of values (in backwards sweep)

pre: (input containsNodeType ?T)
post: (output containsNodeType ?T)
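To make the propagation concrete, here is a minimal sketch of how a binding for ?T found on the output side could be carried back to the input side during a backward sweep. The triple representation and the helper names unify and substitute are assumptions for illustration, not ProCat's implementation (which the talk describes as Prolog-based). Variable-to-variable chains are not resolved in this simplified version.

```python
def is_var(term):
    """Variables are written with a leading '?', as on the slides."""
    return isinstance(term, str) and term.startswith("?")


def unify(pattern, fact, bindings):
    """Unify two triples; return the extended bindings, or None on a clash."""
    extended = dict(bindings)
    for p, f in zip(pattern, fact):
        p = extended.get(p, p)  # replace an already-bound variable by its value
        f = extended.get(f, f)
        if p == f:
            continue
        if is_var(p):
            extended[p] = f
        elif is_var(f):
            extended[f] = p
        else:
            return None  # two different constants cannot be unified
    return extended


def substitute(clause, bindings):
    """Apply bindings to every term of a triple."""
    return tuple(bindings.get(term, term) for term in clause)


pre = ("input", "containsNodeType", "?T")
post = ("output", "containsNodeType", "?T")

# Backward sweep: the workflow needs Person nodes in this process's output.
goal = ("output", "containsNodeType", "Person")
bindings = unify(post, goal, {})
required_input = substitute(pre, bindings)
```

With the goal above, ?T is bound to Person while matching the postcondition, so the precondition returned to the workflow generator becomes (input containsNodeType Person).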
21
Capabilities Layer Challenges
  • Pre- and post-conditions
  • Hypothetical in nature
  • Inherently reified
  • Applicable to execution instances of a process
  • Propagation of values (in backwards sweep)
  • Universally quantified conditional rules

(output containsNodeType ?T) :- (input1 containsNodeType ?T),
    (input2 containsNodeType ?T).
22
Capabilities Layer Challenges
  • Pre- and post-conditions
  • Hypothetical in nature
  • Inherently reified
  • Applicable to execution instances of a process
  • Propagation of values (in backwards sweep)
  • Universally quantified rules
  • Queries may contain pre- and post-condition
    elements (including arbitrary pre-condition
    elements)

pre: (input1 rdf:type PersonDataset)
     (input2 rdf:type EventDataset)
     (input2 temporalRange <...>)
post: (output containsLinkType ParticipatedIn)
23
Capabilities Layer Challenges
  • Pre- and post-conditions
  • Hypothetical in nature
  • Inherently reified
  • Applicable to execution instances of a process
  • Propagation of values (in backwards sweep)
  • Universally quantified rules
  • Queries may contain pre- and post-condition
    elements
  • Least sufficient precondition is desired

24
Solution
  • Process Usage Template (PUT)
  • Snapshot of an arbitrary successful occurrence
    of a process
  • Each process can have multiple PUTs
  • 2 declarative units
  • Pre / post condition (existentially quantified)
  • Conditional effect rules (universally quantified)
  • Two-stage query processing
  • SPARQL queries identify candidate processes based
    on static properties
  • Prolog-based evaluation of pre/post-condition
    query clauses
  • Asymmetric treatment of pre vs. post
  • Query postcondition clauses must be derivable
    from PUT postcondition (or conditional effect)
  • Query precondition clauses must be consistent
    with PUT precondition
  • Result precondition is accumulation of
  • Precondition (with propagated variable bindings)
  • Bodies of CE rules used to establish
    postcondition clauses (with propagated variable
    bindings)
  • Precondition clauses given in query
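The asymmetric treatment and the accumulation step can be sketched as follows. This covers only the second stage (the first-stage SPARQL filtering is omitted) and uses Python in place of Prolog. The class names PUT and Rule, the triple encoding, and above all the consistency test (a hypothetical set of single-valued properties) are assumptions; the slides do not say how consistency is decided. Query clauses are assumed to be ground.

```python
from dataclasses import dataclass, field


def match(pattern, clause, bindings):
    """Match a pattern triple (may hold ?vars) against a ground clause."""
    extended = dict(bindings)
    for p, c in zip(pattern, clause):
        if p.startswith("?"):
            if extended.setdefault(p, c) != c:
                return None  # variable already bound to something else
        elif p != c:
            return None
    return extended


def subst(clause, bindings):
    return tuple(bindings.get(term, term) for term in clause)


@dataclass
class Rule:  # conditional effect: head holds whenever the whole body holds
    head: tuple
    body: list


@dataclass
class PUT:  # Process Usage Template
    pre: list
    post: list
    rules: list = field(default_factory=list)


FUNCTIONAL = {"hasFormat"}  # hypothetical single-valued properties


def consistent(clauses):
    """Toy check: a single-valued property may not take two values."""
    seen = {}
    for subject, predicate, obj in clauses:
        if predicate in FUNCTIONAL and seen.setdefault((subject, predicate), obj) != obj:
            return False
    return True


def evaluate(put, query_pre, query_post):
    """Return the accumulated precondition, or None if the PUT does not fit."""
    bindings, rule_bodies = {}, []
    for goal in query_post:
        # Postcondition clauses must be derivable: first try the PUT
        # postcondition, whose variables are shared across the template.
        direct = next((b for b in (match(p, goal, bindings) for p in put.post)
                       if b is not None), None)
        if direct is not None:
            bindings = direct
            continue
        # Otherwise try a conditional effect rule with fresh variables.
        for rule in put.rules:
            local = match(rule.head, goal, {})
            if local is not None:
                rule_bodies += [subst(c, local) for c in rule.body]
                break
        else:
            return None  # goal is not derivable from this PUT
    # Precondition clauses only need to be consistent, not derivable.
    result = [subst(c, bindings) for c in put.pre] + rule_bodies + list(query_pre)
    result = list(dict.fromkeys(result))  # drop duplicates, keep order
    return result if consistent(result) else None


put = PUT(
    pre=[("input1", "containsNodeType", "Person"),
         ("input2", "containsNodeType", "Event")],
    post=[("output", "containsLinkType", "ParticipatedIn")],
    rules=[Rule(head=("output", "containsNodeType", "?T"),
                body=[("input1", "containsNodeType", "?T"),
                      ("input2", "containsNodeType", "?T")])],
)
query_pre = [("input1", "rdf:type", "PersonDataset")]
query_post = [("output", "containsLinkType", "ParticipatedIn"),
              ("output", "containsNodeType", "Group")]
needed = evaluate(put, query_pre, query_post)
```

Here the first postcondition clause is found directly in the PUT, the second is established through the conditional effect rule with ?T bound to Group, and the returned list is the PUT precondition, followed by the instantiated rule body, followed by the query's own precondition clause. The sketch makes no attempt to minimize that list, so it should not be read as computing the least sufficient precondition.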

25
Outline
  • Program overview
  • Project overview
  • Qualitative (capabilities) layer
  • Modeling and query handling
  • Quantitative (quality of service) layers
  • Modeling and query handling
  • Implementation

26
Quantitative Layers
Layer Name | Contents | Formalism | Source of knowledge
Capabilities | Qualitative functional descriptions, hard resource constraints, invocation details | Static characteristics in OWL; pre- and postconditions in rules | Hand-coded by component developers
Data Modification | Statistical before/after descriptions of data | Problem × Data Model -> Data Model | Experimental analysis, theoretical analysis
Accuracy | Statistical description of expected accuracy of algorithm results | Problem × Data Model × Accuracy Model -> Accuracy Model | Experimental analysis, theoretical analysis
Performance | Statistical prediction of performance of algorithm | Problem × Data Model × Resource Model -> Performance Model | Experimental analysis, theoretical analysis
27
ProCat Quantitative Layers Architecture
[Architecture diagram. Labels: SR; TEE; 4.2 and 5.2 Queries; Query Handler; Component Execution Data; Experimental Results; Data Characterizations; Quantitative Data, Accuracy, and Performance Predictions; Quantitative Layer Prediction; Prediction Engine; Quantitative Models Repository; Models; DC Metrics Ontology; PM Nonlinear Search Model Predictor; Linear Predictor; Coeff.; Data; PM1; GD1; GD2; Pattern1]
28
Quantitative Layers
  • Requirements
  • Precise
  • Efficient
  • Composable
  • Quantitative models represented declaratively
  • Tabular format (not in OWL)
  • Query result generation done procedurally
  • Using Lisp functions
  • Coefficients for the linear model can be learned
    through a regression method
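As an illustration of the last point, the sketch below fits a one-feature linear model by ordinary least squares and stores the learned coefficients in a small table keyed by component name, echoing the tabular representation mentioned above. The slides do not say which regression method, features, or table layout ProCat uses; the function names, the single feature (dataset size in thousands of nodes), and the runtime figures are all invented for the example, and Python stands in for Lisp.

```python
def fit_linear(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(table, component, x):
    """Look up a component's coefficients and evaluate the linear model."""
    intercept, slope = table[component]
    return intercept + slope * x


# Invented experimental results: dataset size (thousands of nodes)
# versus observed runtime (seconds) for a component labeled GD1.
sizes = [1.0, 2.0, 3.0, 4.0]
runtimes = [3.0, 5.0, 7.0, 9.0]

coefficient_table = {"GD1": fit_linear(sizes, runtimes)}
estimate = predict(coefficient_table, "GD1", 6.0)
```

Keeping the coefficients in a plain table means a new experimental run only has to refit and overwrite one row, while the prediction code stays unchanged.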

29
Process-specific Prediction Models
Data Modification
Performance
  • Recurrence relation Pattern Matcher model
    compared to LAW actual results
  • Mean error
  • Data Modification: 20
  • Performance: 19
  • Runtime differs from LAW by over 2 orders of
    magnitude
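The slide does not state how mean error was computed. One common choice, shown here purely as an assumption, is the mean relative error between predicted and observed values. The numbers below are invented and are not the LAW measurements.

```python
def mean_relative_error(predicted, actual):
    """Average of |predicted - actual| / |actual| over paired observations."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - a) / abs(a) for p, a in zip(predicted, actual)) / len(actual)


# Invented runtimes in seconds, for illustration only.
predicted_runtimes = [110.0, 90.0, 240.0]
actual_runtimes = [100.0, 100.0, 200.0]
error = mean_relative_error(predicted_runtimes, actual_runtimes)
```

Multiplying the result by 100 expresses it as a percentage.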

30
Outline
  • Program overview
  • Project overview
  • Qualitative (capabilities) layer
  • Modeling and query handling
  • Quantitative (quality of service) layers
  • Modeling and query handling
  • Implementation

31
Implementation
[Implementation diagram. Labels: Component descriptions; Domain ontologies; WINGS; ProCatGUI; TEE; Logging; Concurrent queries; Tangram Workflow Services API; ProCat API; Web service API; ProCat infrastructure; ProCat Server; SPARQL; RDFS; Prolog; Access API; SOAP; AllegroGraph Triple Store]
32
GUI
33
Future directions
  • Validity checking of ontology updates
  • Validity checking of new / updated process
    characterizations
  • Allow for disjunction in pre- and post-conditions
  • Process characterization editor
  • Automation of quantitative model acquisition
  • Assistance for updating process descriptions
    against ontology changes
  • Better online browsing and catalog management

34
Summary
  • Design and implementation of a Process Catalog for
    Workflow Generation
  • Qualitative (capabilities) layer
  • Quantitative (quality of service) layers
  • Novel elements
  • Quantitative layers (Quality of Service)
  • Numeric models for data modification, accuracy,
    performance
  • Novel approach to reasoning about pre- and
    post-conditions
  • Propagation of values (in backwards sweep)
  • Universally quantified rules
  • Queries may contain pre- and post-condition
    elements
  • Computation of least sufficient precondition