Data Management for Grid Environments By Zafar Nasir

Data Management for Grid Environments
  • Topics to be covered:
  • Introduction and motivation
  • Current build-up
  • Data management as part of high-performance and
    high-throughput computing
  • Services fundamental to the Data Grid
  • Data Grid services as an extension of the Globus
    Toolkit
  • GridFTP
  • Replica management

Introduction and Motivation
  • Data-intensive applications
  • Two possible approaches to identifying data
    management needs in a Grid environment:
  • 1. Assessment based on existing components
    associated with data management
  • 2. Assessing application needs and requirements,
    and identifying missing functionality
  • A middle ground between the two approaches

Current Build-up
  • The Data Grid is being considered as the next
    generation of data handling systems for sharing
    access to data and storage systems within
    multiple administrative domains
  • Use of databases within Grid applications for
    data and metadata management
  • Adoption of the Open Grid Services Architecture
    (OGSA) to promote integration of the Grid with
    web services

Data Management Challenges
  • Diverse usage scenarios: both updatable and
    read-only data, differing consistency
    requirements, and diverse data access methods
  • Heterogeneity at all system levels: storage
    systems, data mechanisms, and policy
  • Performance demands associated with the access,
    manipulation, and analysis of large quantities
    of data

Communities Requiring Access to Distributed Data Sources
  • Digital libraries, which provide services for
    manipulating, presenting, discovering, browsing
    and displaying digital objects
  • Grid environments for processing distributed
    data, with applications involving the extraction
    of complex scientific information from large
    collections of measured and computed data
  • Persistent archives for maintaining collections
    while the underlying technology changes

High-Performance and High-Throughput Computing
  • Distinction based on single vs. multiple
    applications
  • Difference between business computing and
    scientific computing
  • Data management for component-based environments
  • Data Grid and Computational Grid

Building Blocks to Implement Data Management Functions
  • Local services that must be provided by a given
    storage resource
  • Global services that need to be provided in a
    wider context
  • Each data storage and management resource must
    subscribe to a global service and must also
    support all or some of the APIs
  • Grid-enabled diversity in operations/mechanisms,
    and performance tolerance across mechanisms and
    resources
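The split between local services, a global service, and optional APIs can be sketched in Python. This is a minimal illustration under stated assumptions. `StorageResource`, `InMemoryResource`, `GlobalRegistry`, and API names such as `"partial_read"` are hypothetical. The registry stands in for whatever global service a real Data Grid would provide.

```python
from abc import ABC, abstractmethod


class StorageResource(ABC):
    """Local services that every storage resource provides (hypothetical API)."""

    def __init__(self, name: str, supported_apis: set[str]) -> None:
        self.name = name
        # Optional APIs this resource supports ("all or some of the APIs").
        self.supported_apis = frozenset(supported_apis)

    @abstractmethod
    def read(self, path: str) -> bytes: ...

    @abstractmethod
    def write(self, path: str, data: bytes) -> None: ...


class InMemoryResource(StorageResource):
    """Toy local storage resource backed by a dict."""

    def __init__(self, name: str, supported_apis: set[str]) -> None:
        super().__init__(name, supported_apis)
        self._files: dict[str, bytes] = {}

    def read(self, path: str) -> bytes:
        return self._files[path]

    def write(self, path: str, data: bytes) -> None:
        self._files[path] = data


class GlobalRegistry:
    """Global service to which each resource subscribes (hypothetical)."""

    def __init__(self) -> None:
        self._resources: dict[str, StorageResource] = {}

    def subscribe(self, resource: StorageResource) -> None:
        self._resources[resource.name] = resource

    def resources_supporting(self, api: str) -> list[str]:
        """Names of subscribed resources that support a given optional API."""
        return sorted(
            name for name, res in self._resources.items() if api in res.supported_apis
        )
```

A broker could call `resources_supporting(...)` to route a request only to resources that implement the operation it needs.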

Data Management as a Unified Process
  • Entails a number of stages, each having its own
    family of products and algorithms
  • 1. Management services: relate to operations
    and mechanisms offered within each storage
    resource, and to the global services with which
    the resource interacts
  • 2. Support and application services: relate to
    higher-level operations which undertake
    correlations and aggregations on the stored data

Common Data Processing and Management Operations
  • Data pre-processing and formatting
  • Data fusion
  • Data storage, which also involves data migration
    and replication between different storage media
  • Data analysis/provenance
  • Query estimation and optimisation

Data Storage Standards
  • IEEE Reference Model for Open Storage Systems
    Interconnection (OSSI): focussed on technical
    details of mass storage systems; contains
    specifications for storage media and data
    management software
  • ISO's Open Archival Information System (OAIS):
    aims to provide a framework for the operation of
    long-term archives serving a particular community

Data Grid Infrastructure
  • Support for data-intensive, high-performance
    computing applications
  • Provides a set of orthogonal,
    application-independent services
  • Fundamental Data Grid services:
  • (1) Secure, reliable, efficient data transfer
  • (2) Ability to register, locate and manage
    multiple copies of data sets
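For the first fundamental service, the "reliable" part can be sketched as a checksum-verified copy with retries. The sketch rests on a few assumptions:

- Local file paths stand in for remote endpoints.
- SHA-256 serves as the integrity check.
- The retry limit is fixed.

Security (authentication, encryption) and efficiency (parallel streams) are deliberately left out. The function names are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def reliable_copy(src: Path, dst: Path, max_attempts: int = 3) -> int:
    """Copy src to dst and verify the result by checksum.

    Retries on I/O errors or checksum mismatch; returns the attempt that succeeded.
    """
    expected = sha256_of(src)
    for attempt in range(1, max_attempts + 1):
        try:
            shutil.copyfile(src, dst)
            if sha256_of(dst) == expected:
                return attempt
        except OSError:
            pass  # treat as a transient failure and retry
    raise RuntimeError(f"copy of {src} failed after {max_attempts} attempts")
```

Verifying against a checksum computed at the source means a corrupted or truncated copy is caught even when the copy operation itself reports success.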

Globus Toolkit/Data grid
  • Middleware services:
  • 1. GSI (Grid Security Infrastructure): provides
    public-key-based authentication and authorization
    services
  • 2. Resource management services
  • 3. Mechanisms for immediate and advance
    reservations of Grid resources
  • 4. Remote job management and information services
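Advance reservation can be illustrated with a toy admission check: a reservation over a time interval is accepted only if the resource's capacity is never exceeded during it. This is a hypothetical sketch, not the Globus reservation mechanism. Times are plain integers, and `capacity` is an abstract unit count (for example CPUs). An immediate reservation is simply one whose start time is the current time.

```python
from dataclasses import dataclass, field


@dataclass
class ReservationBook:
    """Admits reservations of a shared resource over half-open intervals [start, end)."""

    capacity: int
    _booked: list[tuple[int, int, int]] = field(default_factory=list)  # (start, end, amount)

    def _load_at(self, t: int) -> int:
        return sum(a for s, e, a in self._booked if s <= t < e)

    def _peak_load(self, start: int, end: int) -> int:
        # Load only rises at a booking's start time, so the peak within
        # [start, end) occurs at `start` or at a booked start inside the interval.
        points = {start} | {s for s, _, _ in self._booked if start <= s < end}
        return max(self._load_at(t) for t in points)

    def reserve(self, start: int, end: int, amount: int) -> bool:
        if not 0 <= start < end or amount <= 0:
            raise ValueError("need 0 <= start < end and amount > 0")
        if self._peak_load(start, end) + amount > self.capacity:
            return False
        self._booked.append((start, end, amount))
        return True
```

For example, with `capacity=8`, `reserve(0, 10, 6)` is accepted. `reserve(5, 15, 4)` is then rejected, because 6 + 4 units would overlap at time 5. `reserve(10, 20, 8)` fits, because the first booking ends at 10.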

Data Grid Specific Services
  • Data Grid services complement and build on the
    existing components of the Globus Toolkit:
  • 1. GridFTP transfer service
  • 2. Replica management service
  • 3. A higher-level data replication service can
    use the information service to locate the best
    replica and the resource management service to
    reserve a variety of resources
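Using the information service to locate the best replica can be sketched as a simple cost model. The sketch assumes, hypothetically, that the information service reports a latency and an achievable bandwidth per site. It ranks replicas by estimated transfer time (latency plus size divided by bandwidth). Real replica selection may weigh other factors, such as load or policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteInfo:
    """Per-site metrics as a (hypothetical) information service might report them."""

    latency_s: float
    bandwidth_mb_s: float  # achievable throughput to the client, in MB/s


def best_replica(size_mb: float, replicas: list[str], info: dict[str, SiteInfo]) -> str:
    """Return the replica site with the lowest estimated transfer time."""
    candidates = [site for site in replicas if site in info]
    if not candidates:
        raise LookupError("no replica has information-service data")

    def estimated_time(site: str) -> float:
        m = info[site]
        return m.latency_s + size_mb / m.bandwidth_mb_s

    return min(candidates, key=estimated_time)
```

Consider a low-latency site at 0.05 s and 100 MB/s and a distant site at 0.2 s and 400 MB/s. The distant site wins for a 1000 MB file (2.7 s vs. 10.05 s). The nearby site wins for a 1 MB file.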

GridFTP as a File Access Service
  • Fundamental data access and data transport
    service
  • Provides a uniform interface to various storage
    systems
  • GridFTP functionality includes both the features
    supported by the FTP standard and a number of
    extensions
  • GridFTP can be used both to access specific data
    values and to move data blobs
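A uniform interface that serves both fine-grained reads and whole-object transfers might look like the following. This is an illustrative sketch only. `StorageBackend`, the two backends, and the helper functions are hypothetical. GridFTP itself provides this through FTP commands and protocol extensions, not a Python API.

```python
from pathlib import Path
from typing import Protocol


class StorageBackend(Protocol):
    """Hypothetical uniform access interface over different storage systems."""

    def size(self, name: str) -> int: ...

    def read_range(self, name: str, offset: int, length: int) -> bytes: ...


class MemoryBackend:
    """Storage system simulated in memory."""

    def __init__(self, files: dict[str, bytes]) -> None:
        self._files = files

    def size(self, name: str) -> int:
        return len(self._files[name])

    def read_range(self, name: str, offset: int, length: int) -> bytes:
        return self._files[name][offset:offset + length]


class FileSystemBackend:
    """Storage system backed by a local directory."""

    def __init__(self, root: Path) -> None:
        self._root = root

    def size(self, name: str) -> int:
        return (self._root / name).stat().st_size

    def read_range(self, name: str, offset: int, length: int) -> bytes:
        with (self._root / name).open("rb") as f:
            f.seek(offset)
            return f.read(length)


def read_value(store: StorageBackend, name: str, offset: int, length: int) -> bytes:
    """Access a specific data value without moving the whole object."""
    return store.read_range(name, offset, length)


def read_blob(store: StorageBackend, name: str) -> bytes:
    """Move the whole object through the same interface."""
    return store.read_range(name, 0, store.size(name))
```

Callers depend only on the two interface methods, so the same `read_value` and `read_blob` work unchanged against either backend.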

Features as Extensions to FTP
  • Authenticated third-party control of data
    transfers between storage servers
  • Parallel data transfer through FTP command
    extensions and data channel extensions
  • Striped data transfer through multiple TCP
    streams, with the data partitioned across
    multiple servers
  • Partial file transfer (e.g., for high-energy
    physics (HEP) analyses that need only portions
    of large files)
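Parallel and partial transfer both rest on splitting a byte range into pieces that can be fetched independently. In this hypothetical sketch, threads stand in for parallel TCP streams. `fetch` is any callable that returns the bytes at a given offset and length. The sketch does not implement the GridFTP protocol.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

FetchRange = Callable[[int, int], bytes]  # (offset, length) -> bytes


def split_ranges(offset: int, length: int, streams: int) -> list[tuple[int, int]]:
    """Divide [offset, offset + length) into up to `streams` contiguous pieces."""
    if streams < 1:
        raise ValueError("streams must be >= 1")
    base, extra = divmod(length, streams)
    ranges: list[tuple[int, int]] = []
    pos = offset
    for i in range(streams):
        n = base + (1 if i < extra else 0)  # spread the remainder over the first pieces
        if n:
            ranges.append((pos, n))
            pos += n
    return ranges


def parallel_fetch(fetch: FetchRange, offset: int, length: int, streams: int = 4) -> bytes:
    """Fetch a (possibly partial) byte range using several concurrent workers."""
    pieces = split_ranges(offset, length, streams)
    with ThreadPoolExecutor(max_workers=len(pieces) or 1) as pool:
        chunks = list(pool.map(lambda r: fetch(*r), pieces))
    return b"".join(chunks)  # pool.map preserves order, so the pieces reassemble correctly
```

Passing a non-zero `offset` and a `length` smaller than the file gives a partial file transfer. The number of streams is a tuning parameter.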

Replica Management
  • Responsible for managing the replication of
    complete or partial copies of data sets
  • The services include:
  • 1. Creating new copies of a complete or partial
    collection of files
  • 2. Registering new copies in a Replica Catalogue
  • 3. Allowing users and applications to query the
    catalogue
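The three replica management services (create, register, query) can be sketched with an in-memory catalogue. This is a hypothetical illustration, not the Globus Replica Catalog API. Storage systems are simulated as dictionaries keyed by site name, and a copy is registered only after it has been written.

```python
from collections import defaultdict


class ReplicaCatalogue:
    """In-memory replica catalogue (hypothetical, not the Globus API).

    Maps each logical file name to the set of storage sites holding a copy.
    """

    def __init__(self) -> None:
        self._sites: defaultdict[str, set[str]] = defaultdict(set)

    def register(self, logical_name: str, site: str) -> None:
        self._sites[logical_name].add(site)

    def lookup(self, logical_name: str) -> list[str]:
        # .get() avoids inserting an empty entry for unknown names.
        return sorted(self._sites.get(logical_name, set()))


def create_replica(
    catalogue: ReplicaCatalogue,
    stores: dict[str, dict[str, bytes]],
    logical_name: str,
    source: str,
    target: str,
) -> None:
    """Copy a file between simulated storage sites, then register the new copy."""
    if source not in catalogue.lookup(logical_name):
        raise LookupError(f"no registered copy of {logical_name} at {source}")
    stores[target][logical_name] = stores[source][logical_name]  # 1. create the copy
    catalogue.register(logical_name, target)  # 2. register it in the catalogue
```

Registering only after the copy exists keeps the catalogue from advertising a replica that cannot yet be read.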

Replica Management Architecture: Data Model
  • Data are organized into files
  • Users group files into collections
  • A replica is a subset of a collection that is
    stored on a particular storage system
  • There may be multiple, possibly overlapping
    subsets of a collection stored on multiple
    storage systems in a Data Grid
  • Grid storage systems may use a variety of
    underlying storage technologies, independent of
    replica management
  • Distinction between logical file names and
    physical file names
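The data model can be expressed with plain sets. A collection is a set of logical file names, and each replica is the subset held by one storage system. The sketch below is hypothetical. In particular, the `root/logical_name` scheme used to derive physical names is an assumption for illustration, since each real storage system has its own naming.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    """The subset of a collection held on one storage system."""

    storage_system: str
    logical_files: frozenset[str]


def holders(replicas: list[Replica], logical_file: str) -> list[str]:
    """Storage systems holding a copy of one logical file (replicas may overlap)."""
    return sorted(r.storage_system for r in replicas if logical_file in r.logical_files)


def unreplicated(collection: set[str], replicas: list[Replica]) -> set[str]:
    """Files in the collection that no replica covers."""
    covered: set[str] = set().union(*(r.logical_files for r in replicas))
    return collection - covered


def physical_names(replicas: list[Replica], roots: dict[str, str], logical_file: str) -> list[str]:
    """Map one logical name to per-system physical names (naming scheme assumed)."""
    return [f"{roots[s]}/{logical_file}" for s in holders(replicas, logical_file)]
```

Because replicas are just subsets, overlap (the same file on two systems) and gaps (a file held nowhere) fall out of ordinary set operations.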

Partial List of Elements of the Data Grid Reference Architecture
THANK YOU