Data Management in Cloud Workflow Systems



Transcript and Presenter's Notes



1
Data Management in Cloud Workflow Systems
Dong Yuan
Faculty of Information and Communication Technology
Swinburne University of Technology
2
Outline
  • Cloud Computing and Cloud Workflow Systems
  • Introduction to cloud workflow systems. A brief
    overview of grid workflow systems.
  • Data Management in Cloud Workflow Systems
  • New features and research issues
  • Cloud Computing Environment and SwinDeW-C
  • Our simulation environment and cloud workflow
    system

3
  • Cloud Computing and Cloud Workflow Systems

4
Cloud Computing
  • Some new features of cloud computing
  • Large data centres with cheap hardware
  • Virtualisation
  • Internet based and SOA
  • SaaS, PaaS, IaaS
  • Market driven and cost model
  • Research on cloud computing has emerged in many
    areas
  • Data mining, databases, parallel computing,
    scientific applications, content delivery

5
Cloud Workflow Systems
  • Grid workflow systems
  • Kepler, Pegasus, Taverna, MOTEUR, Triana, ASKALON
  • Gridbus, GridFlow
  • Build-time: focus on data modelling
  • Kepler: actor-oriented data modelling. Taverna:
    Scufl. ASKALON: AGWL
  • Runtime: adopt Data Grid systems
  • Grid DataFarm, GDMP, GridDB, SRB, RLS (P-RLS),
    GSB, DaltOn

6
Cloud Workflow Systems
  • Architecture
  • Based on Internet
  • Platform as a Service
  • More distributed

7
  • Data Management in Cloud Workflow Systems

8
Data Management in Cloud Workflow Systems
  • New features and challenges
  • Independent of users and automatic
  • Cost driven
  • Computation cost, storage cost, data transfer
    cost
  • Data dependency
  • Task-data, data-data, derivation
  • Some research issues
  • Data partition, placement, replication,
    synchronisation, provenance, catalogue,
    meta-data, consistency, reduction, storage,
    movement, etc.
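
A minimal sketch of the cost-driven view on this slide, assuming the total cost is simply the sum of the three named components. The `CostRates` class, its field names, and the prices used with it are hypothetical and not taken from the presentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostRates:
    """Hypothetical unit prices; a real provider publishes its own."""
    per_cpu_hour: float
    per_gb_month: float
    per_gb_transferred: float


def total_cost(rates, cpu_hours, gb_months, gb_transferred):
    """Sum of computation, storage, and data transfer cost."""
    computation = rates.per_cpu_hour * cpu_hours
    storage = rates.per_gb_month * gb_months
    transfer = rates.per_gb_transferred * gb_transferred
    return computation + storage + transfer
```

For example, `total_cost(CostRates(1.0, 2.0, 3.0), 1, 1, 1)` adds one unit of each component at the assumed prices.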

9
Data Placement in Cloud Workflow Systems
  • Data placement: decide where to store the
    application data across the distributed data centres
  • Aims
  • Reduce data movement
  • Reduce task waiting time
  • Strategy
  • Data dependency: dataset-dataset
  • Build-time: existing data. Runtime: generated
    data (including intermediate data)
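
One way to picture dependency-based placement is sketched below. It assumes the dependency between two datasets is the number of tasks that use both, and uses a simple greedy rule; this is an illustration, not the strategy presented in the talk. The function names, the `capacity` limit (datasets per centre), and the tie-break toward the less loaded centre are all assumptions.

```python
def dependency(tasks_by_dataset, a, b):
    """Number of tasks that use both dataset a and dataset b."""
    return len(tasks_by_dataset[a] & tasks_by_dataset[b])


def place(tasks_by_dataset, centres, capacity):
    """Greedily put each dataset where it shares the most tasks.

    Ties go to the centre holding fewer datasets, then to the
    centre listed first.
    """
    placement = {c: [] for c in centres}
    for d in sorted(tasks_by_dataset):
        candidates = [c for c in centres if len(placement[c]) < capacity]
        if not candidates:
            raise ValueError("no data centre has room for " + d)

        def score(c):
            shared = sum(dependency(tasks_by_dataset, d, other)
                         for other in placement[c])
            return (shared, -len(placement[c]))

        placement[max(candidates, key=score)].append(d)
    return placement
```

Datasets used by the same tasks end up in the same centre, so those tasks can run there without moving data between centres.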

10
Data Replication in Cloud Workflow Systems
  • Data replication: for one dataset, store several
    copies in different places (data centres)
  • Aims
  • Increase data security
  • Fast data access
  • Reduce data movement
  • Strategy
  • Dynamic replication.
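
The slide gives no detail on the dynamic replication strategy, so the following is only a sketch of one common form: create a copy in a data centre once it has read a dataset remotely a given number of times. The `ReplicaManager` class and its `threshold` parameter are hypothetical.

```python
from collections import Counter


class ReplicaManager:
    """Adds a replica in a centre after `threshold` remote reads."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.replicas = {}              # dataset -> centres holding a copy
        self.remote_reads = Counter()   # (dataset, centre) -> remote reads

    def add_dataset(self, dataset, home):
        self.replicas[dataset] = {home}

    def read(self, dataset, centre):
        """Record a read; return True if it was served locally."""
        if centre in self.replicas[dataset]:
            return True
        key = (dataset, centre)
        self.remote_reads[key] += 1
        if self.remote_reads[key] >= self.threshold:
            # Enough remote traffic: keep a copy here from now on.
            self.replicas[dataset].add(centre)
        return False
```

With `threshold=2`, the first two reads from a remote centre are served remotely, and later reads from that centre are local.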

11
Intermediate Data Storage in Cloud Workflow Systems
  • Intermediate data storage is especially
    important in scientific workflows
  • Aim
  • Reduce system cost
  • Strategy
  • Intermediate data can be regenerated with data
    provenance information
  • Selectively store some key intermediate datasets
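
The store-or-regenerate trade-off can be sketched as a cost comparison. The sketch assumes a linear provenance chain (each dataset has at most one parent), a monthly storage price per GB, and a known usage frequency; all function and parameter names are hypothetical and the rule is an illustration, not the presented strategy.

```python
def regeneration_cost(dataset, gen_cost, parent, stored):
    """Cost to regenerate a dataset from its nearest stored ancestor.

    Unstored ancestors in the provenance chain must be regenerated too.
    """
    cost = gen_cost[dataset]
    p = parent.get(dataset)
    while p is not None and p not in stored:
        cost += gen_cost[p]
        p = parent.get(p)
    return cost


def should_store(dataset, size_gb, uses_per_month, gen_cost, parent,
                 stored, price_per_gb_month):
    """Store when monthly regeneration costs more than monthly storage."""
    storage = size_gb[dataset] * price_per_gb_month
    regeneration = (regeneration_cost(dataset, gen_cost, parent, stored)
                    * uses_per_month[dataset])
    return regeneration > storage
```

Whether a dataset is worth storing depends on which of its ancestors are already stored, which is why provenance information is needed to make the decision.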

12
  • Cloud computing environment and SwinDeW-C

13
Simulation Cloud

14
Web Portal

15
Related key system components of SwinDeW-C

16
End
  • Questions?

Thanks!