1
Optimizing of data access using replication
technique
  • Renata Slota (1), Darin Nikolow (1), Lukasz Skital (2),
    Jacek Kitowski (1,2)
  • (1) Institute of Computer Science AGH-UST, Cracow
  • (2) ACC CYFRONET AGH, Cracow

2
Agenda
  • Motivation of the work
  • Why does grid computing need replication today?
  • Replication basics
  • Clusterix Data Management System
  • Architecture, optimization and replication
    algorithms
  • Optimization Example
  • Replication Example
  • Summary, conclusions

3
Site-level vs. Grid-level replication
  • Site-level replication
  • Replicas in one site
  • Implementation examples
  • RAID
  • HSM
  • Grid-level replication
  • Data management systems
  • Replicas spread on many sites

4
Motivation of the work: Why does grid computing
need replication today?
  • Data protection and availability
  • Failure of one storage element does not affect
    the data itself; only performance suffers
  • Performance
  • Low level optimization and replication are not
    sufficient (RAID, HSM)
  • Limited network bandwidth
  • Limited storage performance

5
Replication scenarios
  • Static replication
  • Decision made by system administrator or user
  • Limited system support: replica selection,
    replica coherency, replica ordering
  • Dynamic replication
  • Decision made by dedicated grid component based
    on current data access pattern of users
  • Full system support

6
Replication consequences
  • Optimal replica selection algorithm
  • Replica creation and removal algorithm
  • Cost of replica creation, update and storage
  • Replica coherency

7
Clusterix: National Cluster of Linux Systems
  • Project aim
  • To develop a set of tools and procedures for
    building a productive Grid environment based on
    local PC clusters spread across independent
    supercomputing centers
  • Network Layer
  • Pionier: Polish optical network

8
Clusterix Data Management System: Architecture

9
Optimization Algorithm
  • Selects optimal storage element for
  • data accessing
  • replica creation
  • Takes into account the current state of the
    system
  • The optimal storage element is the one with the
    maximal weight W(s,d)
  • $W(s,d) = \min\big((1 - \mathrm{NetLoad}(s)) \cdot \mathrm{Bandwidth}(s,d),\ (1 - \mathrm{Sload}(s)) \cdot \mathrm{Sbandwidth}(s)\big)$
  • s: storage element
  • d: destination node
  • NetLoad(s): network interface load of s
  • Bandwidth(s,d): available bandwidth between s
    and d
  • Sload(s): storage system load
  • Sbandwidth(s): storage system bandwidth
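
The selection rule above can be sketched in Python. This is only an illustration, not the CDMS implementation: the `StorageElement` class, its field names and the helper functions are hypothetical, loads are assumed to be fractions in [0, 1], and all bandwidths are assumed to share one unit (MB/s here).

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class StorageElement:
    """Hypothetical snapshot of one storage element s, seen from destination d."""
    name: str
    net_load: float      # NetLoad(s), assumed to be a fraction in [0, 1]
    bandwidth: float     # Bandwidth(s, d) in MB/s (assumed unit)
    s_load: float        # Sload(s), assumed to be a fraction in [0, 1]
    s_bandwidth: float   # Sbandwidth(s) in MB/s (assumed unit)


def weight(se: StorageElement) -> float:
    """W(s, d): the smaller of the unused network and unused storage throughput."""
    free_network = (1.0 - se.net_load) * se.bandwidth
    free_storage = (1.0 - se.s_load) * se.s_bandwidth
    return min(free_network, free_storage)


def select_optimal(candidates: Sequence[StorageElement]) -> StorageElement:
    """Return the candidate with the maximal weight W(s, d)."""
    if not candidates:
        raise ValueError("no storage elements to choose from")
    return max(candidates, key=weight)
```

Taking the minimum of the two terms means the slower of the network path and the storage system bounds the weight, so a fast disk behind a saturated link does not rank highly.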

10
Automatic replication algorithm
  • Takes into account the gain from replication
    G(), the cost of replica creation C(), the cost
    of replica updates U() and the administrative
    factor A().
  • Replication profit
  • $P(d,R,S,f) = G(d,R,S,f) - C(d,R,f) - U(d,R,S,f) + A(d,f)$
  • d: storage element for which the profit is computed
  • R: set of storage elements containing replicas
    of f
  • S: statistical data (history of file usage)
  • f: the file under consideration
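
One way to read the profit function is as a ranking over candidate storage elements. The Python sketch below assumes that the gain and the administrative factor add to the profit while the two costs subtract from it, and that a replica is created only when the profit is positive. Both the signs and the threshold are assumptions of this sketch, and the `ProfitTerms` class with its numeric inputs is hypothetical rather than CDMS output.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ProfitTerms:
    """Hypothetical estimates for one candidate storage element d and file f."""
    destination: str
    gain: float            # G(d, R, S, f)
    creation_cost: float   # C(d, R, f)
    update_cost: float     # U(d, R, S, f)
    admin_factor: float    # A(d, f)


def profit(t: ProfitTerms) -> float:
    """P = G - C - U + A; the signs are an assumption of this sketch."""
    return t.gain - t.creation_cost - t.update_cost + t.admin_factor


def best_destination(candidates: Sequence[ProfitTerms]) -> Optional[ProfitTerms]:
    """Return the candidate with the highest positive profit, or None if none pays off."""
    profitable = [t for t in candidates if profit(t) > 0]
    return max(profitable, key=profit) if profitable else None
```

Returning `None` when no candidate has a positive profit models the case in which creating another replica would cost more than it saves.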

11
Storage-oriented problems: Data-intensive
applications for Clusterix
  • Simulation of transonic flow past wing tips
  • Visualization of complex multidimensional
    structures
  • Ecosystem modeling and simulation

12
Optimization Example
  • Node A needs file F stored on SE1, SE2 and SE3

[Diagram: Node A; CDMS with Optimizer; storage elements SE1, SE2 and SE3, each holding F; NMS and JIMS components]

13
Optimization Example
  • Node A sends request to CDMS

[Diagram: Node A; CDMS with Optimizer; storage elements SE1, SE2 and SE3, each holding F; NMS and JIMS components]

14
Optimization Example
  • CDMS uses the Optimizer to choose the optimal SE

[Diagram: Node A; CDMS with Optimizer; storage elements SE1, SE2 and SE3, each holding F; NMS and JIMS components]

15
Optimization Example
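  • Optimizer is working

$W(s_1,d) = \min\big((1 - \mathrm{NetLoad}(s_1)) \cdot \mathrm{Bandwidth}(s_1,d),\ (1 - \mathrm{Sload}(s_1)) \cdot \mathrm{Sbandwidth}(s_1)\big)$
$W(s_2,d) = \min\big((1 - \mathrm{NetLoad}(s_2)) \cdot \mathrm{Bandwidth}(s_2,d),\ (1 - \mathrm{Sload}(s_2)) \cdot \mathrm{Sbandwidth}(s_2)\big)$
$W(s_3,d) = \min\big((1 - \mathrm{NetLoad}(s_3)) \cdot \mathrm{Bandwidth}(s_3,d),\ (1 - \mathrm{Sload}(s_3)) \cdot \mathrm{Sbandwidth}(s_3)\big)$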
[Diagram: Node A; CDMS with Optimizer; storage elements SE1, SE2 and SE3, each holding F; NMS and JIMS components]

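To make the three-way comparison concrete, the following Python sketch evaluates the weight for SE1, SE2 and SE3 with made-up monitor readings. All numbers are hypothetical, loads are assumed to be fractions in [0, 1], and bandwidths are assumed to be in MB/s.

```python
# Hypothetical readings for the three storage elements that hold F,
# as seen from destination Node A.
readings = {
    "SE1": {"net_load": 0.75, "bandwidth": 100.0, "s_load": 0.5, "s_bandwidth": 60.0},
    "SE2": {"net_load": 0.25, "bandwidth": 80.0, "s_load": 0.5, "s_bandwidth": 160.0},
    "SE3": {"net_load": 0.0, "bandwidth": 40.0, "s_load": 0.25, "s_bandwidth": 120.0},
}


def weight(r):
    """W(s, d) = min(unused network throughput, unused storage throughput)."""
    free_network = (1 - r["net_load"]) * r["bandwidth"]
    free_storage = (1 - r["s_load"]) * r["s_bandwidth"]
    return min(free_network, free_storage)


# One weight per storage element, then pick the maximum.
weights = {name: weight(r) for name, r in readings.items()}
best = max(weights, key=weights.get)

for name, w in weights.items():
    print(f"W({name}, Node A) = {w}")
print("Optimal SE:", best)
```

With these readings the weights are 25 for SE1, 60 for SE2 and 40 for SE3, so SE2 is selected even though SE1 has the fastest link, because most of the SE1 link is already in use.
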
16
Initial replication example
[Diagram: User Workstation; Clusterix Entry point; CDMS with Optimizer; storage elements SE1, SE2 and SE3; NMS and JIMS components]

17
Dynamic replication in Clusterix
  • Initial replication
  • Every stored data file should be replicated
  • Replication on demand
  • Job driven replication
  • Replication ordered by external process
  • Replication based on statistic analysis
  • Data access pattern driven replication

18
Automatic replication example: Situation
  • 3 clusters
  • 4 storage elements
  • 2 contain replica of
  • Set of applications running on these clusters and
    accessing file

[Diagram: storage elements SE1, SE2, SE3 and SE4, with replicas of F]

19
Automatic replication example
[Diagram: CDMS with Optimizer, Replication Module and Statistic Module; inputs labelled Gain, Cost of rep., Cost of update and Adm. factor; module states Sleeping and Working; storage elements SE1 to SE4, with replicas of F]

20
Automatic replication example
[Diagram: CDMS with Optimizer, Replication Module and Statistic Module; module states Working and Sleeping; storage elements SE1 to SE4, with replicas of F]

21
Automatic replication example
[Diagram: CDMS with Optimizer, Replication Module and Statistic Module; module state Sleeping; storage elements SE1 to SE4, with replicas of F]

22
Summary
  • Architecture of CDMS with Optimization and
    Replication modules has been designed
  • Replication and optimization algorithms have been
    specified
  • Module interfaces have been specified
  • Future work
  • Integration and tests

23
Conclusions
  • Simulation of replication vs. real system
    implementation
  • Replication should be designed to meet specific
    Clusterix applications profile
  • Data availability
  • Replication drawbacks

24
Publications
  • Extended functionality of Virtual Storage System
    for grid
  • Renata Slota, Darin Nikolow, Lukasz Skital, Jacek
    Kitowski
  • Cracow Grid Workshop 2004, poster no. 13
  • Application of data replication methods in
    Clusterix project (in Polish)
  • Renata Slota, Darin Nikolow, Lukasz Skital, Jacek
    Kitowski
  • Pionier 2004, 19-20 May, Poznan, electronic
    publication
  • Implementation of replication methods in the Grid
    Environment
  • Renata Slota, Darin Nikolow, Lukasz Skital, Jacek
    Kitowski
  • Submitted to European Grid Conference

25
Thank You!
26
Clusterix Data Management System: Architecture
  • Replication module
  • Responsible for
  • Automatic replica creation/removal
  • Implementation
  • Java
  • Apache SOAP
  • Cooperates with
  • Optimization module
  • Statistic module

27
Clusterix Data Management System: Architecture
  • Optimization Module
  • Responsible for
  • storage element selection for newly created
    replica,
  • optimal replica selection.
  • Implementation
  • C/C++
  • gSOAP
  • Cooperates with
  • Network Monitoring System (NMS)
  • Information System
  • JMX-based Infrastructure Monitoring System (JIMS)

28
Clusterix Data Management System: Architecture
  • Information System (JIMS)
  • Department of Computer Science, AGH University of
    Science and Technology
  • Provides the following information for a selected
    node
  • Available storage capacity
  • Total storage capacity
  • Network interface load
  • Network interface bandwidth
  • Storage system load
  • Average storage system load
  • Maximal measured storage bandwidth

29
Clusterix Data Management System: Architecture
  • Network Monitoring System
  • Poznan Supercomputing and Networking Center
  • Provides the following information
  • Maximum bandwidth between two network nodes
  • Current load between two network nodes
  • Node availability
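
As a rough illustration of how these link readings could feed the network term of the storage element weight, the Python sketch below derives the unused bandwidth of a link. The `LinkInfo` class and `free_bandwidth` function are hypothetical names, the load is assumed to be a fraction in [0, 1], and an unavailable node is assumed to contribute zero bandwidth; none of this describes the actual NMS interface.

```python
from dataclasses import dataclass


@dataclass
class LinkInfo:
    """Hypothetical record of the readings for one pair of network nodes."""
    source: str
    destination: str
    max_bandwidth: float   # maximum bandwidth between the nodes, MB/s (assumed unit)
    current_load: float    # current load between the nodes, assumed fraction in [0, 1]
    reachable: bool        # True when both end nodes are reported available


def free_bandwidth(link: LinkInfo) -> float:
    """Unused bandwidth of the link; zero when either end node is unavailable."""
    if not link.reachable:
        return 0.0
    return (1.0 - link.current_load) * link.max_bandwidth
```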

30
Clusterix Data Management System: Architecture
  • Statistic Module
  • Bialystok Technical University
  • Responsible for gathering information about past
    data usage