GGF Data Grid Interoperability Demonstration

Provided by: ggf2
Learn more at: http://www.ggf.org

1
GGF Data Grid Interoperability Demonstration
  • Organizers: Erwin Laure (Erwin.Laure_at_cern.ch)
  • Reagan Moore (moore_at_sdsc.edu)
  • Arun Jagatheesan (arun_at_sdsc.edu) - grid
    coordination
  • Sheau-Yen Chen (sheauc_at_sdsc.edu) - data grid
    administrator
  • Chien-Yi Hou (chienyi_at_sdsc.edu) - collection
    administrator
  • Goals:
  • Demonstrate federation of 17 SRB data grids
    (shared name spaces)
  • Demonstrate replication of a collection
  • Participants (19 data grids):
  • APAC Australia: Stephen McMahon
    (stephen.mcmahon_at_anu.edu.au)
  • ASGC Taiwan: Eric Yen, Wei-Long Ueng
    (wlueng_at_twgrid.org)
  • ChinaGrid China: Li Qi (quick.qi_at_gmail.com)
  • DEISA-Italy: Giuseppe Fiameni
    (g.fiameni_at_cineca.it)
  • IB-New Zealand: Daniel Hanlon
    (d.j.hanlon_at_dl.ac.uk)
  • IB-UK: Daniel Hanlon (d.j.hanlon_at_dl.ac.uk)
  • IN2P3-France: Jean-Yves Nief (nief_at_cc.in2p3.fr)
  • KEK-Japan: Yoshimi Iida (yoshimi.iida_at_kek.jp)
  • LCDRG-US: Chien-Yi Hou (chienyi_at_sdsc.edu)
  • NCHC Taiwan: Hsu-Mei Chou (hmchou_at_nchc.org.tw)

2
Intellectual Property Policy
  • I acknowledge that participation in GGF18 is
    subject to the GGF Intellectual Property Policy.
  • Intellectual Property Notices Note Well: All
    statements related to the activities of the GGF
    and addressed to the GGF are subject to all
    provisions of Section 17 of GFD-C.1 (.pdf), which
    grants to the GGF and its participants certain
    licenses and rights in such statements. Such
    statements include verbal statements in GGF
    meetings, as well as written and electronic
    communications made at any time or place, which
    are addressed to:
  • the GGF plenary session,
  • any GGF working group or portion thereof,
  • the GFSG, or any member thereof on behalf of the
    GFSG,
  • the GFAC, or any member thereof on behalf of the
    GFAC,
  • any GGF mailing list, including any working group
    or research group list, or any other list
    functioning under GGF auspices,
  • the GFD Editor or the GWD process.
  • Statements made outside of a GGF meeting, mailing
    list or other function, that are clearly not
    intended to be input to a GGF activity, group or
    function, are not subject to these provisions.
  • Excerpt from Section 17 of GFD-C.1: Where the GFSG
    knows of rights, or claimed rights, the GGF
    secretariat shall attempt to obtain from the
    claimant of such rights, a written assurance that
    upon approval by the GFSG of the relevant GGF
    document(s), any party will be able to obtain the
    right to implement, use and distribute the
    technology or works when implementing, using or
    distributing technology based upon the specific
    specification(s) under openly specified,
    reasonable, non-discriminatory terms. The working
    group or research group proposing the use of the
    technology with respect to which the proprietary
    rights are claimed may assist the GGF secretariat
    in this effort. The results of this procedure
    shall not affect advancement of document, except
    that the GFSG may defer approval where a delay
    may facilitate the obtaining of such assurances.
    The results will, however, be recorded by the GGF
    Secretariat, and made available. The GFSG may
    also direct that a summary of the results be
    included in any GFD published containing the
    specification. GGF Intellectual Property
    Policies are adapted from the IETF Intellectual
    Property Policies that support the Internet
    Standards Process.

3
GIN - Two Approaches
  • Virtualize the storage resource
  • Provide a standard interface to the storage
    system for access
  • Storage Resource Manager
  • Asynchronous interface to storage
  • Virtualize the shared collection
  • Manage the properties of a shared collection
    independently of the multiple storage systems
    where it is distributed
  • Storage Resource Broker
  • Collection management
  • Federation of independent collections

4
SRB Data Grid Federation Status

5
Data Grid Federation
  • Builds on:
  • Registry for data grid names - ensures each data
    grid has a unique identity
  • Trust establishment - explicit registration
    command issued by the data grid administrator of
    each data grid
  • Peer-to-peer server interaction - each SRB server
    can respond to commands from any other SRB
    server, provided trust has been established
    between the data grids
  • Administrator controlled registration of name
    spaces - each grid controls whether they will
    share user names, file names, replicate data,
    replicate metadata or allow remote data storage
  • Shibboleth-style user authentication - a person
    is identified by /Zone-name/user-name.domain-name.
  • Authentication is done by the home zone. No
    passwords are shared between zones.
  • Local authorization - operations are under the
    control of the zone being accessed, including
    controls on access to files, storage resources,
    metadata and user quotas.
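
The split between home-zone authentication and local authorization can be sketched in a few lines of Python. This is an illustrative model only, not SRB code: the `FederatedUser` class, both function names, and the zone names used in the usage note are assumptions made for the sketch. It also assumes that the user name contains no dot, so the first dot separates user from domain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FederatedUser:
    zone: str    # home zone, which performs authentication
    user: str
    domain: str


def parse_identity(identity):
    """Split '/Zone-name/user-name.domain-name' into its three parts."""
    parts = identity.strip("/").split("/")
    if len(parts) != 2 or "." not in parts[1]:
        raise ValueError(f"not a /Zone/user.domain identity: {identity!r}")
    user, domain = parts[1].split(".", 1)
    if not parts[0] or not user or not domain:
        raise ValueError(f"empty component in identity: {identity!r}")
    return FederatedUser(zone=parts[0], user=user, domain=domain)


def responsible_zones(identity, accessed_zone):
    """Say which zone authenticates and which zone authorizes a request."""
    person = parse_identity(identity)
    return {
        "authenticates": person.zone,   # home zone checks the credentials
        "authorizes": accessed_zone,    # accessed zone applies its own controls
    }
```

For example, `responsible_zones("/ZoneA/alice.example", "ZoneB")` reports that ZoneA authenticates the user while ZoneB decides what she may do, which is why no passwords need to cross zone boundaries.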

6
Federation Between Data Grids
[Figure: two federated data grids, one per collection (Data
Collection A and Data Collection B), reached through common data
access methods (Web Browser, Scommands, OAI-PMH). Access controls
and consistency constraints govern the cross-registration of name
spaces between them.]
Each data grid maintains:
  • Logical resource name space
  • Logical user name space
  • Logical file name space
  • Logical context (metadata)
  • Control/consistency constraints

7
Challenge - Replicate a Collection
  • Replicate files in a collection
  • Demonstrated at GGF17
  • Replicate metadata associated with a shared
    collection
  • Authenticity metadata - describe provenance of
    file
  • Integrity metadata - state information such as
    checksums, access controls
  • SRB information synchronization command:
  • Szonesync.pl -d -z remotezone
  • Synchronize data information with zone
    remotezone
  • Szonesync.pl -u -z remotezone
  • Synchronize user information with zone
    remotezone
  • Szonesync.pl -r -z remotezone
  • Synchronize user and resource information with
    zone remotezone
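
The three invocations listed above can be assembled programmatically. The following is a minimal sketch that only builds the command lines shown on the slide and does not run them. The function names and the default ordering (user, resource, data) are assumptions for illustration; the slide does not say in which order the synchronizations should run.

```python
import shlex

# Flags exactly as listed on the slide, with the slide's descriptions.
SYNC_FLAGS = {
    "data": "-d",      # synchronize data information
    "user": "-u",      # synchronize user information
    "resource": "-r",  # synchronize user and resource information
}


def szonesync_argv(kind, remote_zone):
    """Build the argument vector for one Szonesync.pl call."""
    if kind not in SYNC_FLAGS:
        raise ValueError(f"unknown synchronization kind: {kind!r}")
    if not remote_zone:
        raise ValueError("remote zone name must not be empty")
    return ["Szonesync.pl", SYNC_FLAGS[kind], "-z", remote_zone]


def replication_plan(remote_zone, kinds=("user", "resource", "data")):
    """Return printable command lines (assumed order, not from the slide)."""
    return [shlex.join(szonesync_argv(kind, remote_zone)) for kind in kinds]
```

To execute a plan, each argument vector could be passed to `subprocess.run(argv, check=True)` on a host where the SRB utilities are installed and on the search path.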

8
Collection Management
  • Metadata extraction - user-defined metadata
  • Execute remote procedure to extract metadata from
    a file
  • Load the extracted metadata into the remote zone
    MCAT catalog
  • Demonstrated on a FITS astronomy image
  • Images provided by Irene Barg - NOAO
    (noao-ls-t3-z1 data grid)
  • Created parsing template to extract metadata
    attributes from FITS header
  • Modified SRB to support extraction of multiple
    versions of the same metadata attribute from
    large files
  • Executed commands on the /GGF-RNP data grid in
    Brazil
  • Extracted 183 metadata attributes from a FITS
    header
  • ./Sufmeta ct422131.fits
  • DTPI 'Christopher Stubbs'
  • 89 DTPIAFFL 'University of Washington'
  • 90 DTTITLE 'A Next Generation Microlensing
    Survey of the LMC'
  • 91 DTACQUIS 'ctioa8.ctio.noao.edu'
  • 92 DTACCOUN 'mosaic '
  • 93 DTACQNAM '/ua00/mosaic/tonight/sm84.051011_0516.100.fits'
  • 94 DTNSANAM 'ct422131.fits '
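
The kind of extraction shown above can be approximated in plain Python. This is a hedged sketch of a FITS header parser, not the SRB parsing template or the Sufmeta implementation: the function names are hypothetical, and it assumes the standard FITS layout of 80-character cards with the keyword in columns 1-8 and `= ` in columns 9-10. It returns a list of pairs rather than a dictionary so that repeated keywords are all kept, mirroring the "multiple versions of the same metadata attribute" requirement.

```python
def parse_value(field):
    """Return the value part of a FITS card as a string, without the comment."""
    field = field.strip()
    if not field.startswith("'"):
        # Numeric or logical value: everything before the comment slash.
        return field.split("/", 1)[0].strip()
    chars = []
    i = 1
    while i < len(field):
        if field[i] == "'":
            if field[i + 1:i + 2] == "'":   # '' is an escaped quote
                chars.append("'")
                i += 2
                continue
            break                            # closing quote
        chars.append(field[i])
        i += 1
    return "".join(chars).rstrip()           # trailing blanks are not significant


def parse_fits_header(header, card_len=80):
    """Return (keyword, value) pairs in header order, keeping duplicates."""
    attributes = []
    for start in range(0, len(header), card_len):
        card = header[start:start + card_len]
        keyword = card[:8].strip()
        if keyword == "END":
            break
        if card[8:10] != "= ":
            continue                         # COMMENT, HISTORY, or blank card
        attributes.append((keyword, parse_value(card[10:])))
    return attributes
```

Given a header built from cards such as `"DTPIAFFL= 'University of Washington'"` padded to 80 characters, the parser yields `("DTPIAFFL", "University of Washington")`.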

9
Collection Management
  • Metadata hierarchy - extensible schema
  • Create additional tables in MCAT catalog to
    support schema extension
  • Load a metadata hierarchy into the remote zone
    MCAT catalog
  • Demonstrated on a State Department collection of
    communiques about Amelia Earhart
  • Collection provided by Mark Conrad - NARA
    (LCDRG-GGF data grid)
  • Created scripts to add 70 tables to the MCAT
    catalog
  • Created scripts to load the Life Cycle Data
    Requirements Guide metadata into MCAT
  • Added LCDRG metadata hierarchy to 43 files in the
    Amelia Earhart collection on the UERJ-HEPGrid in
    Brazil
  • Queried the metadata hierarchy

10
Information Management
  • Squery -N LCDRG_object -S LCDRG_object.object_data_id
  • --------------------------- RESULTS ------------------------------
  • data_id 503
  • -----------------------------------------------------------------
  • data_id 558
  • -----------------------------------------------------------------
  • Squery -N LCDRG_recordgroup -S LCDRG_recordgroup.recordgroup_grno -N LCDRG_object LCDRG_object.object_data_id 558
  • --------------------------- RESULTS ------------------------------
  • grno 59
  • -----------------------------------------------------------------
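
The two queries above read like a select on one extension table followed by a join across two. The sketch below mimics that shape with an in-memory SQLite database. It is an illustration under stated assumptions, not the MCAT schema: only the table names and the `object_data_id` and `recordgroup_grno` columns come from the slide, while the `recordgroup_id` link column and the function names are hypothetical. The values 503, 558 and 59 are the ones shown in the results; object 503 is left without a record group because the slide does not show one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE LCDRG_recordgroup (
        recordgroup_id   INTEGER PRIMARY KEY,  -- hypothetical key
        recordgroup_grno INTEGER
    );
    CREATE TABLE LCDRG_object (
        object_data_id INTEGER PRIMARY KEY,
        recordgroup_id INTEGER REFERENCES LCDRG_recordgroup  -- hypothetical link
    );
    INSERT INTO LCDRG_recordgroup VALUES (1, 59);
    INSERT INTO LCDRG_object VALUES (503, NULL), (558, 1);
""")


def object_ids(db):
    """Analogue of the first query: list every object_data_id."""
    rows = db.execute(
        "SELECT object_data_id FROM LCDRG_object ORDER BY object_data_id"
    )
    return [row[0] for row in rows]


def record_group_numbers(db, data_id):
    """Analogue of the second query: group numbers for one object."""
    rows = db.execute(
        """
        SELECT g.recordgroup_grno
        FROM LCDRG_recordgroup AS g
        JOIN LCDRG_object AS o ON o.recordgroup_id = g.recordgroup_id
        WHERE o.object_data_id = ?
        """,
        (data_id,),
    )
    return [row[0] for row in rows]
```

With this toy data, `object_ids(conn)` lists both objects and `record_group_numbers(conn, 558)` returns the single group number attached to object 558.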

11
Challenges
  • Multiple software versions
  • All 3.4 versions interoperate
  • Using SRB 3.4.0, SRB 3.4.1, SRB 3.4.2, and SRB
    3.4.2-P
  • Management of firewalls
  • Require ports opened to allow control messages to
    be exchanged
  • Support client-initiated and server-initiated
    parallel I/O and bulk load operations for data
    and metadata transport
  • Network tuning
  • Need to ensure the system buffer size and TCP window
    size are set for intercontinental latencies
  • Need to specify 6-16 parallel I/O streams as
    default
  • Management of shared collection
  • Decide what will be shared
  • Create a logical resource name that will
    support the shared data
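
The network tuning point can be made concrete with the bandwidth-delay product, which gives the amount of data that must be in flight to keep a long path busy. The sketch below is a rough estimate only: the function names are hypothetical, the 1 Gbit/s bandwidth and 200 ms round-trip time in the usage note are assumed example figures rather than measurements from the demonstration, and packet loss is ignored.

```python
def bandwidth_delay_product(bandwidth_bits_per_s, rtt_ms):
    """Bytes that must be in flight to fill the path (integer arithmetic)."""
    # bits/s * ms = millibits; divide by 8 (bits per byte) and 1000 (ms per s).
    return bandwidth_bits_per_s * rtt_ms // 8000


def per_stream_window(bandwidth_bits_per_s, rtt_ms, streams):
    """TCP window each stream needs when the load is split evenly."""
    if streams < 1:
        raise ValueError("at least one stream is required")
    total = bandwidth_delay_product(bandwidth_bits_per_s, rtt_ms)
    return -(-total // streams)  # ceiling division
```

For an assumed 1 Gbit/s path with a 200 ms round trip, `bandwidth_delay_product(1_000_000_000, 200)` is 25,000,000 bytes. Splitting the transfer over 8 parallel streams lowers the window each stream needs to 3,125,000 bytes, and 16 streams lowers it to 1,562,500 bytes, which is one reason parallel I/O helps when per-connection buffers are limited.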

12
Challenges
  • Port of SRM interface as client API to an SRB
    collection
  • Established as a collaboration
  • Wayne Schroeder schroede_at_sdsc.edu
  • Wei-Long Ueng wlueng_at_twgrid.org
  • Eric Yen eric_at_sinica.edu.tw
  • Ethan Lin ethanlin_at_gate.sinica.edu.tw
  • Abhishek Singh Rana rana_at_fnal.gov
  • Wiki created at:
  • http://www.sdsc.edu/srb/index.php/SRM-SRB
  • Initial draft document published on high-level
    approach

13
Demonstration - Web Browser
  • https://srb.npaci.edu/mysrb331reagan.shtml
  • Log onto shared collection at SDSC
  • Collection defined by port number and host
    machine
  • Differentiate between local collection and shared
    collection
  • Local collection - /home/user.domain/collection
  • Shared collection - /Zone/home/user.domain/collection
  • Web browser displays status of federated zone
  • Select remote data grid by clicking on zone
  • Browse metadata, list files, perform authorized
    operations
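
The difference between the two path forms can be sketched as a small parser. This is an illustration, not part of the SRB client: the function name and the returned dictionary layout are hypothetical, and the sketch assumes that no zone is itself named `home`, so a leading `home` component always marks a local collection.

```python
def parse_collection_path(path):
    """Classify a logical path as a local or shared (federated) collection."""
    parts = [p for p in path.split("/") if p]
    zone = None
    if parts and parts[0] != "home":
        # A leading component other than 'home' is taken to be the zone name.
        zone, parts = parts[0], parts[1:]
    if len(parts) < 2 or parts[0] != "home" or "." not in parts[1]:
        raise ValueError(f"unrecognized collection path: {path!r}")
    user, domain = parts[1].split(".", 1)
    return {
        "zone": zone,                        # None for a local collection
        "user": user,
        "domain": domain,
        "collection": "/".join(parts[2:]),
        "shared": zone is not None,
    }
```

Applied to the two forms above, `/home/user.domain/collection` parses with `zone` set to `None`, while `/Zone/home/user.domain/collection` parses with `zone` set to `"Zone"` and `shared` set to `True`.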

14
Demonstration - Shell Commands
  • SRB shell commands located in ./SRB3_4_1/utilities/bin
  • ./Sinit /* connect to default collection
    specified in .srb environment file;
    authenticate yourself with
    challenge-response or GSI certificate */
  • ./Sls /* list collections and files */
  • ./Scd collection-name /* change to another
    collection */
  • ./Sufmeta -e stylesheet file /* extract metadata
    from a file */
  • ./Smeta file-name /* list user-defined metadata */
  • ./Squery -N namespace -S attributename
    /* query extensible schema */

15
Preservation Application
  • More detailed information provided in:
  • Preservation Environments research group
  • Tuesday 10:00 - 11:30
  • Room 158 A-B
  • Will also discuss next generation data management
    systems in PERG session
  • Rule-oriented data systems - iRODS
  • Support mapping of management policies to rules
    that are executed by the data management system
  • Assertions on integrity and authenticity
  • Assertions on data management - replication, data
    distribution
  • Assertions on access controls and display
  • http://www.sdsc.edu/srb/future/index.php/Main_Page