1
Data Grid Automation
Or What is SRB Matrix?
  • Arun Jagatheesan et al.,
  • San Diego Supercomputer Center
  • University of California, San Diego

VLDB Workshop on Data Management in
Grids, Trondheim, Norway, 2-3 September 2005
2
Talk Outline
  • Data grid Landscape
  • Long-run data management processes
  • Data Grid ILM
  • Data Grid Triggers
  • Dataflow Pipelines
  • Execution Logic: Data Grid Language
  • End-to-End Infrastructure Deployment
  • API
  • User GUI
  • Service-oriented Infrastructure

3
Data Grid Landscape
4
The Grid Vision
5
Data Grid Resource Providers
Grid Resource Providers (GRP) providing content
and/or storage
GRP
6
Data Grid Administrative Domain
  • Administrative domain with one or more GFS
    Resource Providers
  • Could include their data centers

Research Lab
GRP
7
Data Grid Administrative domains
University data storage (10)
Storage-R-Us Resource Providers data storage
(50)
Research lab- Taiwan data storage (40)
GRP
8
Data Grid (Enterprise Utility)
Physical Resources managed by autonomous
administrative domains of the same enterprise
(ABCZ.com)
3rd Party
IT Department US
IT Department Asia
ABCZ.com US
Data center
ABCZ.com Asia
9
Data Grid (Enterprise Utility)
Each project has a data grid instance consisting
of Logical Resources with different SLAs offered
by IT department
Project 1
Project 2
3rd Party
IT Department US
IT Department Asia
ABCZ.com US
Data center
ABCZ.com Asia
10
Data Grid (Enterprise Utility)
Each department has a data grid instance
consisting of Logical Resources with different
SLAs offered by IT department
Dept1
Dept2
3rd Party
IT Department US
IT Department Asia
ABCZ.com US
Data center
ABCZ.com Asia
11
Data Grid (Enterprise Utility)
Project1
Project2
Project3
Project4
3rd Party
IT Department US
IT Department Asia
ABCZ.com US
Data center
ABCZ.com Asia
12
Long-run Processes in Data Grid
  • Data Grid ILM
  • Data Grid Triggers
  • Data Gridflows

13
Data Grid ILM
14
Change is Constant
  • Changes in access patterns
  • Based on the number of users accessing the data
  • Domains which want to access data
  • Data Value
  • The value of a data set (or collection?) for a
    particular domain, based on its business model and
    its users' access patterns
  • Each domain will have a different value based on
    its users and its role in a data grid

15
Data Value based on users
When more users access a project's data, its data
value increases, so move that data to a faster
storage type
Project1
Project2
Project3
Project4
3rd Party
IT Department US
IT Department Asia
ABCZ.com US
Data center
ABCZ.com Asia
16
Data Value based on domain
When more users from the same domain access the
data, the data value for that particular data in
that particular domain increases, so replicate
the data to resources in that domain. (converse
is also true)
Project1
Project2
Project3
Project4
3rd Party
IT Department US
IT Department Asia
ABCZ.com US
Data center
ABCZ.com Asia
17
Data Value based on role
The 3rd party data center has no users who use the
data, but is interested in having a replica of any
data (or deleted data) for long-term preservation
Project1
Project2
Project3
Project4
3rd Party
IT Department US
IT Department Asia
ABCZ.com US
Data center
ABCZ.com Asia
18
Data Grid ILM
  • ILM: Information Lifecycle Management
  • Dynamic re-orientation of data placement and data
    retention policies (rules)
  • Based on business value of data and storage
    cost
  • HSM: Hierarchical Storage Management, based on
    data freshness. ILM goes one step further
  • Applying this concept to a Data Grid is very tricky,
    as different autonomous domains have different
    business rules
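
The placement rules from the Data Value slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tier names, the thresholds, and the function names are hypothetical and do not come from SRB or Matrix. It shows a tier choice driven by total user count, replication into a domain whose own users access the data heavily, and the converse rule, with an exemption for a preservation-only domain.

```python
from dataclasses import dataclass


@dataclass
class AccessStats:
    # domain name -> number of distinct users accessing the data (assumed input)
    users_by_domain: dict


def choose_tier(stats, hot_threshold=20, warm_threshold=5):
    """Pick a storage tier from the total user count (thresholds are assumptions)."""
    total = sum(stats.users_by_domain.values())
    if total >= hot_threshold:
        return "disk_cache"
    if total >= warm_threshold:
        return "disk"
    return "tape"


def domains_to_replicate(stats, current_replicas, domain_threshold=10):
    """Domains whose own users access the data heavily but hold no replica yet."""
    return sorted(
        d for d, n in stats.users_by_domain.items()
        if n >= domain_threshold and d not in current_replicas
    )


def replicas_to_retire(stats, current_replicas, archive_domains=()):
    """Converse rule: replicas in domains with no users, except archival domains."""
    return sorted(
        d for d in current_replicas
        if stats.users_by_domain.get(d, 0) == 0 and d not in archive_domains
    )


stats = AccessStats({"US": 12, "Asia": 3})
tier = choose_tier(stats)
to_add = domains_to_replicate(stats, current_replicas={"Asia"})
to_drop = replicas_to_retire(
    AccessStats({"US": 12}),
    current_replicas={"US", "Asia", "3rd Party"},
    archive_domains={"3rd Party"},
)
```

In a real data grid each autonomous domain would supply its own thresholds and value function, which is exactly the difficulty the last bullet points out.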

19
Data Grid Triggers
20
Data Grid Triggers
  • Similar to triggers in databases
  • Based on ECA concepts
  • Event
  • Condition
  • Action
  • Example
  • Event: Insert new file in collection
    (/ourProject/data)
  • Condition: (color blue galaxy
    Andromeda)
  • Action: Run ( selectiveDataReplicator.dgl )
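
A minimal Python sketch of the Event-Condition-Action pattern in this example follows. The class and field names are hypothetical, the condition is assumed to be a conjunction of two metadata equality tests, and the action only returns a string naming the DGL document rather than submitting anything to a server.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Event:
    kind: str          # e.g. "insert"
    collection: str    # logical collection path
    metadata: dict     # metadata attached to the new file


@dataclass
class Trigger:
    event_kind: str
    collection: str
    condition: Callable[[dict], bool]
    action: Callable[[Event], str]

    def fire(self, event):
        """Return the action's result if event and condition match, else None."""
        if event.kind != self.event_kind or event.collection != self.collection:
            return None
        if not self.condition(event.metadata):
            return None
        return self.action(event)


def run_flow(event):
    # Stand-in for handing a DGL document to the grid; here it is only named.
    return "run selectiveDataReplicator.dgl"


trigger = Trigger(
    event_kind="insert",
    collection="/ourProject/data",
    # Assumed reading of the condition: both metadata attributes must match.
    condition=lambda m: m.get("color") == "blue" and m.get("galaxy") == "Andromeda",
    action=run_flow,
)

matching = Event("insert", "/ourProject/data", {"color": "blue", "galaxy": "Andromeda"})
outcome = trigger.fire(matching)
```

Keeping the condition as a plain predicate over metadata keeps the trigger independent of how the metadata catalog is queried.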

21
Data → Discovery
New data
Digital entities
updates relationships among data in collections
Meta-data
Services invoked to analyze new relationships
Services
DGMS applications get notified of state updates
State
22
Data Gridflows
23
Gridflow in SCEC (data → information pipeline)
Metadata derivation
Ingest Data
Ingest Metadata
Determine analysis pipeline
Initiate automated analysis
Use the optimal set of resources based on the
task on demand
Organize result data into distributed data grid
collections
All gridflow activities stored for data flow
provenance
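
The pipeline on this slide can be pictured as an ordered list of steps whose execution is logged for provenance. The Python sketch below is an assumption-laden toy: the step functions, the size-based pipeline choice, and the record layout are all hypothetical and are not taken from the SCEC deployment.

```python
import time


def run_gridflow(steps, data):
    """Run named steps in order, recording one provenance entry per step."""
    provenance = []
    for name, step in steps:
        started = time.time()
        data = step(data)
        provenance.append({"step": name, "started": started, "finished": time.time()})
    return data, provenance


# Hypothetical stand-ins for the stages named on the slide.
def ingest_data(d):
    return {**d, "ingested": True}


def derive_metadata(d):
    return {**d, "metadata": {"size": len(d["payload"])}}


def choose_pipeline(d):
    # Assumed rule: small inputs take the "small" analysis pipeline.
    return {**d, "pipeline": "small" if d["metadata"]["size"] < 1000 else "large"}


def analyze(d):
    return {**d, "result": d["payload"].upper()}


steps = [
    ("ingest data", ingest_data),
    ("derive metadata", derive_metadata),
    ("determine analysis pipeline", choose_pipeline),
    ("automated analysis", analyze),
]
result, provenance = run_gridflow(steps, {"payload": "seismogram"})
```

Because every step passes through `run_gridflow`, the provenance list is complete by construction and no step needs to remember to log itself.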
24
Data Grid Language (DGL)
25
Data Grid Language
  • Requirement
  • Data Grid ILM process
  • The long run process that has to be run is
    described in DGL
  • Data Grid Triggers
  • Action part of the ECA (Event-Condition-Action)
    logic
  • Data Gridflows
  • Step by step execution of long run process on
    Data Grid
  • Analogous to SQL in relational databases
  • Long-run process procedures stored and executed
    in the Data Grid itself
  • Captures the Infrastructure Execution Logic

26
DGL Request
Annotations about the Data Grid Request
Can be either a Flow or a Status Query
27
DGL Requests (2 types)
  • Data Grid Flow
  • An XML structure that describes the execution
    logic, associated procedural rules and DGL
    variables. A flow can be synchronous or
    asynchronous
  • Status Query
  • An XML structure used to query the execution
    status of any gridflow or sub-flow at any level
    of granularity. Status Queries can be made for both
    synchronous and asynchronous flows
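
The two request types can be illustrated with a toy in-memory server in Python. This is not the Matrix API: `FlowServer`, its method names, and the status strings are hypothetical, and a flow is reduced to a plain callable. The sketch only shows the pattern of submitting a flow synchronously or asynchronously and then querying its status by request identifier.

```python
import threading
import uuid
from concurrent.futures import ThreadPoolExecutor


class FlowServer:
    """Toy stand-in for a gridflow server (hypothetical, not the Matrix API)."""

    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=2)
        self._futures = {}

    def submit(self, flow, asynchronous=True):
        """Start a flow; return (request_id, result), where result is None if asynchronous."""
        request_id = str(uuid.uuid4())
        future = self._pool.submit(flow)
        self._futures[request_id] = future
        result = None if asynchronous else future.result()
        return request_id, result

    def status(self, request_id):
        """Status query: works for both synchronous and asynchronous flows."""
        future = self._futures[request_id]
        if not future.done():
            return "running"
        return "failed" if future.exception() else "completed"

    def result(self, request_id):
        """Block until the flow finishes and return its result."""
        return self._futures[request_id].result()


server = FlowServer()
sync_id, sync_result = server.submit(lambda: 42, asynchronous=False)

gate = threading.Event()                      # holds the asynchronous flow open
async_id, _ = server.submit(lambda: gate.wait() and "done")
status_while_running = server.status(async_id)
gate.set()
final = server.result(async_id)
```

The request identifier plays the role that lets a client disconnect after submitting a long-run flow and ask about it later.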

28
Flow
Scoped Variables that can control the flow
Logic used by the sub-members
Sub-members that are the real execution statements
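
One way to picture the three parts of a flow is the Python sketch below. It is a hypothetical model, not the DGL schema: the class names are invented, variable scoping is assumed to resolve from the innermost flow outward, and the only flow logic shown is sequential execution.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Step:
    name: str
    run: Callable[["Flow"], None]   # the actual execution statement


@dataclass
class Flow:
    name: str
    variables: dict = field(default_factory=dict)   # scoped variables
    members: list = field(default_factory=list)     # Steps or nested Flows
    parent: Optional["Flow"] = None

    def lookup(self, var):
        """Resolve a variable in this flow, then in the enclosing flows."""
        scope = self
        while scope is not None:
            if var in scope.variables:
                return scope.variables[var]
            scope = scope.parent
        raise KeyError(var)

    def add(self, member):
        if isinstance(member, Flow):
            member.parent = self
        self.members.append(member)
        return self

    def execute(self, log):
        """Sequential flow logic: run the members in order, depth first."""
        for member in self.members:
            if isinstance(member, Flow):
                member.execute(log)
            else:
                log.append(member.name)
                member.run(self)


outer = Flow("replicate", variables={"target": "tape"})
inner = Flow("verify", variables={"retries": 2})
seen = []
inner.add(Step("checksum", lambda f: seen.append((f.lookup("target"), f.lookup("retries")))))
outer.add(Step("copy", lambda f: None)).add(inner)

log = []
outer.execute(log)
```

The nested flow sees `target` from its parent and `retries` from its own scope, which is the sense in which variables are scoped here.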
29
Flow Logic (How a flow executes)
30
<userDefinedRule name="beforeEntry">
  <condition>
    <simpleQuery>numVar 1</simpleQuery>
  </condition>
  <action name="true">
    <actionString>SET var1 1</actionString>
  </action>
  <action name="true">
    <actionString>SET var2 "foo"</actionString>
  </action>
  <action name="false">
    <actionString>SET var1 0</actionString>
  </action>
</userDefinedRule>
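
The rule on this slide pairs a condition with actions named "true" and "false". The Python sketch below shows how such a rule might be interpreted; it is not the Matrix implementation. The XML literal is a reconstruction of the slide's rule with angle brackets and attribute quoting restored, the condition syntax is not interpreted at all (the caller passes the outcome), and the `SET name value` parsing, with an optional `=`, is an assumption.

```python
import re
import xml.etree.ElementTree as ET

# Reconstruction of the slide's rule; element text is kept as transcribed.
RULE_XML = """
<userDefinedRule name="beforeEntry">
  <condition><simpleQuery>numVar 1</simpleQuery></condition>
  <action name="true"><actionString>SET var1 1</actionString></action>
  <action name="true"><actionString>SET var2 "foo"</actionString></action>
  <action name="false"><actionString>SET var1 0</actionString></action>
</userDefinedRule>
"""

# Assumed action syntax: SET <name> [=] <value>
SET_RE = re.compile(r'^SET\s+(\w+)\s*=?\s*(.+)$')


def apply_rule(rule_xml, outcome, variables):
    """Apply every action whose name matches the condition outcome."""
    rule = ET.fromstring(rule_xml)
    wanted = "true" if outcome else "false"
    for action in rule.findall("action"):
        if action.get("name") != wanted:
            continue
        text = action.findtext("actionString", default="").strip()
        match = SET_RE.match(text)
        if match is None:
            raise ValueError(f"unsupported action: {text!r}")
        name, raw = match.group(1), match.group(2).strip()
        # Quoted values are treated as strings, everything else as integers.
        variables[name] = raw.strip('"') if raw.startswith('"') else int(raw)
    return variables


when_true = apply_rule(RULE_XML, True, {})
when_false = apply_rule(RULE_XML, False, {})
```

Several actions may share the same name, so a single outcome can set more than one variable, as the two "true" actions do here.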
31
What is SRB Matrix?
  • Matrix provides the SRB as a Web Service
  • Web Service based on Data Grid Language
  • SOA for Data Grid or Digital Library
  • Service oriented infrastructure
  • Asynchronous end-user facing applications
  • Long run operations presented to users as
    portlets
  • Data Grid Automation and ILM
  • File Triggers on unstructured data
  • Automated movement or management of data

32
Matrix Gridflow Server Architecture
JAXM Wrapper
WSDL Description
SOAP Service for Matrix Clients
Matrix Data Grid Request Processor
Sangam P2P Gridflow Broker and Protocols
Transaction Handler
Status Query Handler
Workflow Query Processor
Flow Handler and Execution Manager
Gridflow Meta data Manager
XQuery Processor
ECA rules Handler
Persistence (Store) Abstraction
Matrix Agent Abstraction
Agents for Java, WSDL and other grid executables
SDSC SRB Agents
Other SDSC Data Services
In Memory Store
JDBC
33
Conclusion
  • Data Grids are evolving
  • Data Grid Automation of long-run processes is
    essential
  • Need a language for Data Grid Automation
  • Data Grid Language is one such effort, as part of
    the SRB Matrix Project
  • Open source project for anyone to use (or join)
  • talk2matrix_at_sdsc.edu (or arun_at_sdsc.edu)