Title: Data Management for Grid Environments By Zafar Nasir
4. Data Management for Grid Environments
- Topics to be covered:
- Introduction and motivation
- Current build-up
- Data management as part of high-performance and high-throughput computing
- Services fundamental to the Data Grid
- Data Grid services as an extension of the Globus Toolkit
- GridFTP
- Replica management
5. Introduction and Motivation
- Data-intensive applications
- Two possible approaches to identifying data management needs in a Grid environment:
- 1. Assessment based on existing components associated with data management.
- 2. Assessing application needs and requirements, and identifying missing functionality.
- Middle ground between the two approaches
6. Current Build-up
- The Data Grid is being considered as the next generation of data handling systems for sharing access to data and storage systems within multiple administrative domains
- Use of databases within Grid applications for data and metadata management
- Adoption of the Open Grid Services Architecture (OGSA) to promote integration of the Grid with web services
7. Data Management Challenges
- Diverse usage scenarios: both updatable and readable data, differing consistency requirements, and diverse data access methods.
- Heterogeneity at all system levels: storage systems, data mechanisms, and policy.
- Performance demands associated with access, manipulation, and analysis of large quantities of data.
8. Communities Requiring Access to Distributed Data Sources
- Digital libraries, which provide services for manipulating, presenting, discovering, browsing, and displaying digital objects.
- Grid environments for processing distributed data, with applications involving the extraction of complex scientific information from large collections of measured and computed data.
- Persistent archives for maintaining collections while the technology changes.
9. High-Performance and High-Throughput Computing
- Distinction based on single versus multiple applications
- Difference between business computing and scientific computing
- Data management for component-based environments
- Data Grid and Computational Grid
10. Building Blocks to Implement Data Management Functions
- Local services that must be provided by a given storage resource.
- Global services that need to be provided in a wider context.
- Each data storage and management resource must subscribe to a global service and must also support all or some of the APIs.
- Grid-enabled diversity in operations and mechanisms, and performance tolerance across mechanisms and resources.
11. Data Management as a Unified Process
- Entails a number of stages, each having its own family of products and algorithms
- 1. Management services: relate to operations and mechanisms offered within each storage resource, and to the global services with which the resource interacts
- 2. Support and application services: relate to higher-level operations that undertake correlations and aggregations on the stored data
12. Common Data Processing and Management Operations
- Data pre-processing and formatting
- Data fusion
- Data storage, which also involves data migration and replication between different storage media
- Data analysis and provenance
- Query estimation and optimisation
13. Data Storage Standards
- The IEEE reference model for Open Storage Systems Interconnection (OSSI) focuses on the technical details of mass storage systems and contains specifications for storage media and data management software.
- ISO's Open Archival Information System (OAIS) aims to provide a framework for the operation of long-term archives serving a particular community.
14. Data Grid Infrastructure
- Support for data-intensive, high-performance computing applications
- Provides a set of orthogonal, application-independent services
- Fundamental Data Grid services:
- (1) Secure, reliable, efficient data transfer
- (2) Ability to register, locate, and manage multiple copies of data sets
15. Globus Toolkit/Data Grid
- Middleware services:
- 1. GSI (Grid Security Infrastructure), which provides public-key-based authentication and authorization services
- 2. Resource management services
- 3. Mechanisms for immediate and advance reservation of Grid resources
- 4. Remote job management and information services
16. Data Grid Specific Services
- Data Grid services complement and build on the existing components of the Globus Toolkit.
- 1. GridFTP transfer service
- 2. Replica management service
- 3. A higher-level data replication service can use the information service to locate the best replica and the resource management service to reserve a variety of resources.
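The replica selection step in item 3 can be pictured as a small ranking routine. What follows is a minimal sketch, not Globus Toolkit code: the `ReplicaInfo` record, its bandwidth and latency fields, and the `best_replica` function are hypothetical names, the `example.org` locations are placeholders, and the cost model (latency plus size divided by bandwidth) is an assumption chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicaInfo:
    """Hypothetical record an information service might report for one replica."""
    location: str          # physical location of the copy (placeholder URL)
    bandwidth_mb_s: float  # estimated bandwidth to the client, in MB/s
    latency_s: float       # estimated connection set-up latency, in seconds


def estimated_transfer_time(replica: ReplicaInfo, size_mb: float) -> float:
    """Assumed cost model: fixed latency plus size divided by bandwidth."""
    if replica.bandwidth_mb_s <= 0:
        raise ValueError("bandwidth must be positive")
    return replica.latency_s + size_mb / replica.bandwidth_mb_s


def best_replica(replicas, size_mb: float) -> ReplicaInfo:
    """Return the replica with the lowest estimated transfer time."""
    if not replicas:
        raise ValueError("no replicas available")
    return min(replicas, key=lambda r: estimated_transfer_time(r, size_mb))


if __name__ == "__main__":
    candidates = [
        ReplicaInfo("gsiftp://site-a.example.org/data/run1.dat", 10.0, 0.1),
        ReplicaInfo("gsiftp://site-b.example.org/data/run1.dat", 50.0, 2.0),
    ]
    # A large file favours the high-bandwidth site, a small one the low-latency site.
    print(best_replica(candidates, size_mb=1000).location)
    print(best_replica(candidates, size_mb=1).location)
```

Under this assumed model the "best" replica depends on the size of the request, which is one reason such a service would consult the information service at request time rather than fix a preferred site in advance.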
17. GridFTP as a File Access Service
- Fundamental data access and data transport service
- Provides a uniform interface to various storage systems
- GridFTP functionality includes both features supported by the FTP standard and a number of extensions
- GridFTP can be used both to access specific data values and to move data blobs
18. Features Added as Extensions to FTP
- Authenticated third-party control of data transfers between storage servers
- Parallel data transfer through FTP command extensions and data channel extensions
- Striped data transfer, using multiple TCP streams to move data partitioned across multiple servers
- Partial file transfer, needed for example by high-energy physics (HEP) applications
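The idea behind partial and parallel transfer can be shown without any network code. The sketch below is an illustration only and does not use GridFTP or any of its commands: an in-memory `bytes` object stands in for the remote file, worker threads stand in for data channels, and the helper names `split_ranges` and `partial_parallel_copy` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_ranges(offset: int, length: int, streams: int):
    """Split [offset, offset + length) into at most `streams` contiguous (start, size) blocks."""
    if offset < 0 or length < 0 or streams < 1:
        raise ValueError("offset and length must be >= 0 and streams >= 1")
    base, extra = divmod(length, streams)
    ranges, start = [], offset
    for i in range(streams):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first blocks
        if size:
            ranges.append((start, size))
            start += size
    return ranges


def partial_parallel_copy(source: bytes, offset: int, length: int, streams: int = 4) -> bytes:
    """Copy only source[offset:offset + length], one block per simulated stream."""
    if offset < 0 or length < 0 or offset + length > len(source):
        raise ValueError("requested range lies outside the source")
    dest = bytearray(length)

    def copy_block(block):
        start, size = block
        rel = start - offset  # position of this block inside the destination buffer
        dest[rel:rel + size] = source[start:start + size]

    with ThreadPoolExecutor(max_workers=streams) as pool:
        # list() forces every block to finish and re-raises any worker exception.
        list(pool.map(copy_block, split_ranges(offset, length, streams)))
    return bytes(dest)


if __name__ == "__main__":
    data = b"0123456789abcdef"
    print(split_ranges(4, 8, 3))
    print(partial_parallel_copy(data, offset=4, length=8, streams=3))
```

Each block writes to a disjoint slice of the destination, so blocks may complete in any order, which is the property that lets independent streams proceed without coordinating with one another.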
19. Replica Management
- Responsible for managing the replication of complete or partial copies of data sets
- The services include:
- 1. Creating new copies of a complete or partial collection of files
- 2. Registering new copies in a Replica Catalogue
- 3. Allowing users and applications to query the catalogue
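The three services listed above (create, register, query) can be sketched with local files standing in for Grid storage. This is a toy model under stated assumptions, not the Globus replica management API: the `ReplicaCatalogue` class and the `create_replica` helper are hypothetical, and a local file copy replaces a real wide-area transfer.

```python
import shutil
import tempfile
from collections import defaultdict
from pathlib import Path


class ReplicaCatalogue:
    """Toy catalogue mapping a logical file name to the physical copies registered for it."""

    def __init__(self):
        self._entries = defaultdict(list)

    def register(self, logical_name: str, physical_path) -> None:
        """Service 2: record a physical copy under its logical name (duplicates ignored)."""
        path = Path(physical_path)
        if path not in self._entries[logical_name]:
            self._entries[logical_name].append(path)

    def query(self, logical_name: str):
        """Service 3: return every registered physical copy, or an empty list."""
        return list(self._entries.get(logical_name, []))


def create_replica(catalogue: ReplicaCatalogue, logical_name: str, source, target_dir) -> Path:
    """Service 1: copy the file to another storage location, then register the new copy."""
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / Path(source).name
    shutil.copy2(source, target)  # stands in for a real data transfer
    catalogue.register(logical_name, target)
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        original = Path(root) / "site_a" / "run1.dat"
        original.parent.mkdir()
        original.write_bytes(b"measured data")

        catalogue = ReplicaCatalogue()
        catalogue.register("run1.dat", original)
        create_replica(catalogue, "run1.dat", original, Path(root) / "site_b")
        for copy in catalogue.query("run1.dat"):
            print(copy)
```

Registration is kept separate from copying so that copies created by other means can still be added to the catalogue.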
20. Replica Management Architecture: Data Model
- Data are organized into files
- Users group files into collections
- A replica is a subset of a collection that is stored on a particular storage system
- There may be multiple, possibly overlapping subsets of a collection stored on multiple storage systems in a Data Grid.
- Grid storage systems may use a variety of underlying storage technologies, independent of replica management.
- Distinction between logical file names and physical file names
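The data model above maps naturally onto a few small types. The following is a hedged sketch rather than the actual catalogue schema: the `Collection` and `Replica` classes, the `locate` function, and the `example.org` storage URLs are all assumptions introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    """A named group of logical file names."""
    name: str
    logical_files: set = field(default_factory=set)


@dataclass
class Replica:
    """A subset of a collection held on one particular storage system."""
    collection: Collection
    storage_url: str     # placeholder prefix identifying the storage system
    logical_files: set   # must be a subset of the collection's files

    def __post_init__(self):
        unknown = self.logical_files - self.collection.logical_files
        if unknown:
            raise ValueError(f"files not in collection: {sorted(unknown)}")

    def physical_name(self, logical_file: str) -> str:
        """Map a logical file name to its physical name on this storage system."""
        if logical_file not in self.logical_files:
            raise KeyError(logical_file)
        return f"{self.storage_url.rstrip('/')}/{logical_file}"


def locate(replicas, logical_file: str):
    """List the physical names of every replica that holds the logical file."""
    return [r.physical_name(logical_file) for r in replicas if logical_file in r.logical_files]


if __name__ == "__main__":
    run1 = Collection("run1", {"a.dat", "b.dat", "c.dat"})
    replicas = [
        Replica(run1, "gsiftp://s1.example.org/data/", {"a.dat", "b.dat"}),
        Replica(run1, "gsiftp://s2.example.org/store", {"b.dat", "c.dat"}),
    ]
    # b.dat lies in the overlap of the two subsets, so it has two physical names.
    for name in locate(replicas, "b.dat"):
        print(name)
```

The logical name stays the same wherever a file is stored, while each replica contributes its own physical name, so overlapping subsets simply give some logical files more than one physical name.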
21. Partial List of Elements of the Data Grid Reference Architecture
23. Thank You