Title: WP2: Data Management


1
WP2 Data Management
  • Tutorial for PM9 Release
  • RAL 31st January 2002
  • Gavin McCance
  • University of Glasgow

2
PM9 Release
  • Grid Data Mirroring Package (GDMP)
  • Basic replica management tool
  • How-to
  • Spitfire
  • Basic meta-data management prototype

Previously called Grid Data Management Pilot
3
GDMP
  • Useful documentation and reference
  • WP2 web page
  • http://grid-data-management.web.cern.ch/grid-data-management
  • GDMP page
  • http://cmsdoc.cern.ch/cms/grid
  • GDMP 2.0 manual
  • GDMP User Instructions for the Testbed

4
GDMP
  • Version 2.0 (not the 2.0alpha)
  • Client-Server system for replicating files from
    one grid site to another
  • Subscription mechanism allows for automatic
    replication of files
  • Interfaced to the Grid Replica Catalogue
    (currently Globus MDS Replica Cat)

5
GDMP
  • Any file type can be transferred
  • Replication mechanisms assume read-only files
    i.e. no update synchronisation
  • Particular plug-in for Objectivity
  • Handles update of local database

6
GDMP Requirements
  • Tested on Linux RH6.1 and RH6.2
  • Globus Toolkit 2.0 Alpha 9
  • i.e. the EDG PM9 special release
  • GridFTP (NOT gsi-wuftp !)
  • g++ from gcc-2.91.66 or gcc-2.95.2
  • RPM v3 or higher
  • Or.. Usual GNU make collection

7
THE GDMP EDG PM9 RPM
One of these is not an acronym
  • Recommend this for UK testbed
  • DataGrid WP6 site
  • (Or.. Get original RPM from GDMP site)
  • Manual gives RPM, SRPM, and tarball installation
    instructions
  • All paths relative to GDMP_INSTALL_DIR
  • /opt/edg in testbed release
  • Or the path from ./configure --prefix

8
Configuration
  • Full details in manual
  • Edit /opt/edg/etc/gdmp.conf. Set
  • GDMP_INSTALL_DIR
  • GDMP_LOCAL_HOST PORT
  • GLOBUS_LOCATION
  • If used, OBJECTIVITY stuff: binaries, boot file
    path, root directory
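
A quick sanity check after editing the file can be sketched in POSIX shell. It only tests that each setting name listed above appears somewhere in the file and assumes nothing about the file's syntax; the function name check_gdmp_conf is made up for this illustration.

```shell
#!/bin/sh
# Hypothetical helper: report which of the required setting names are
# missing from a GDMP configuration file. Only the names are checked,
# not their values or the file's exact syntax.
check_gdmp_conf() {
    conf="$1"
    missing=0
    if [ ! -r "$conf" ]; then
        echo "cannot read $conf" >&2
        return 2
    fi
    for name in GDMP_INSTALL_DIR GDMP_LOCAL_HOST GLOBUS_LOCATION; do
        if ! grep -q "$name" "$conf"; then
            echo "missing: $name"
            missing=1
        fi
    done
    return "$missing"
}
```

Typical use would be: check_gdmp_conf /opt/edg/etc/gdmp.conf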

9
RepCat Configuration
  • http://www.globus.org/datagrid/deliverables/replicaGettingStarted.pdf
  • GDMP_REP_CAT_URL
  • ldap://host2/rc=replica-catalogue,
  • GDMP_REP_CAT_MANAGER_DN
  • cn=RCManager, dc=host2, dc=cern, dc=ch
  • GDMP_REP_CAT_MANAGER_PWD
  • secret

10
Inetd Configuration
  • As root
  • configure_gdmp <install-dir> <userid> <port>
  • Updates /etc/services, /etc/inetd
  • Request served as gdmp_server using
  • GDMP_INSTALL_DIR/utils/gdmp_server_start
  • User manual Section 3.4 and Appendix A.
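
One way to confirm that the port was registered is to look it up in /etc/services, whose standard layout is "name port/protocol". A minimal sketch, assuming the entry is a TCP one; the helper name is made up, and the service name that configure_gdmp actually writes is not given on the slide.

```shell
#!/bin/sh
# Hypothetical helper: print the service name(s) registered for a TCP port
# in a services file (default /etc/services). Exit status is non-zero when
# no entry is found.
service_for_port() {
    port="$1"
    file="${2:-/etc/services}"
    awk -v want="$port/tcp" '
        /^[ \t]*#/ { next }                 # skip comment lines
        $2 == want { print $1; found = 1 }  # field 2 is "port/protocol"
        END { exit (found ? 0 : 1) }
    ' "$file"
}
```

For example, service_for_port 2000 prints the name registered for port 2000, the port used in the usage slides.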

11
Server cert
  • GDMP requires a CA-signed server certificate to
    identify itself
  • Default issue is one from CERN
  • Not really secure, since anyone can download GDMP
    RPMs.
  • Get a new one from your CA if being used for
    production

12
GDMP client usage
Site A
  • A) su gdmp (or whatever user)
  • Currently client applications should run as same
    user as the server (given in /etc/inetd)
  • A) grid-proxy-init
  • B) Add gdmp server DN cert to mapfile!
  • A) setenv GDMP_CONFIG_FILE /opt/edg/etc/gdmp.conf
  • A) gdmp_ping hostb.ac.uk:2000
  • The GDMP server on hostb.ac.uk:2000 is listening

Site B
13
GDMP usage
Site A
Site B
  • A,B) Start GDMP services (inetd)
  • B) Registers itself with site A
  • gdmp_host_subscribe hosta.ac.uk:2000
  • A) New files? Register them
  • gdmp_register_local_file -d /pool/files/
  • This updates the local GDMP internal catalogue
    (on A)

14
GDMP usage
Site A
Site B
  • A) Tell the world (well..all subscribed sites)
  • gdmp_publish_catalogue
  • Will update the import catalogue on all
    subscribed sites, e.g. the import catalogue on
    site B
  • By default, it will also publish the GDMP
    internal catalogue on the Globus Replica Catalogue

15
GDMP usage
Site A
Site B
  • B) Get the new files from site A (and from any
    other sites to which B may be subscribed)
  • gdmp_replicate_get
  • Any new files on A will be transferred from site
    A → site B
  • Put in GDMP_FLATFILE_ROOT_DIR as specified by
    gdmp.conf
  • By default, Globus Replica Catalogue is updated
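
The subscribe, register, publish and fetch steps can be collected into one small script per site. The sketch below uses only the commands and arguments shown on the slides (check the manual for the exact option syntax); the wrapper functions and the DRY_RUN switch are illustrative additions. With DRY_RUN=1 the commands are printed rather than executed, which helps when checking the sequence on a machine without GDMP.

```shell
#!/bin/sh
# Sketch of the GDMP replication workflow. Set DRY_RUN=1 to print the
# commands instead of running them.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Site B, once: subscribe to the producer site A (argument is host:port).
site_b_subscribe() {
    run gdmp_host_subscribe "$1"
}

# Site A, whenever new files appear: register the directory, then publish.
site_a_publish() {
    run gdmp_register_local_file -d "$1" &&
    run gdmp_publish_catalogue
}

# Site B: fetch everything newly published by the sites it subscribes to.
site_b_fetch() {
    run gdmp_replicate_get
}
```

On site A this would be called as site_a_publish /pool/files/ and on site B as site_b_subscribe hosta.ac.uk:2000 followed, after each publish, by site_b_fetch.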

16
Staging Support
  • Support for staging to and from MSS
  • GDMP server at B will be notified if there is
    some staging to be done at A and will drop
    connection. When staging is complete, B is
    notified by A, and can re-request the transfer.
  • GDMP section 7.

17
Automation
  • Transfer waits until site B runs
    gdmp_replicate_get
  • However, when the import catalogue is updated on
    B, a script is called:
    GDMP_NOTIFICATION_FOR_PUBLISH_CATALOGUE
  • An example would be to run gdmp_replicate_get so
    the transfer happens automatically
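
A notification script along those lines might look like the sketch below. It assumes the script is invoked without arguments, that gdmp_replicate_get is on the PATH of the GDMP user, and that the log file location is free to choose; none of this is specified on the slide. The GDMP_REPLICATE_GET and GDMP_HOOK_LOG overrides are hypothetical and exist only so the function can be exercised without GDMP installed.

```shell
#!/bin/sh
# Hypothetical publish-notification hook: pull newly published files as
# soon as the import catalogue changes, and log what happened.
on_publish() {
    GDMP_CONFIG_FILE="${GDMP_CONFIG_FILE:-/opt/edg/etc/gdmp.conf}"
    export GDMP_CONFIG_FILE
    log="${GDMP_HOOK_LOG:-/tmp/gdmp_replicate_get.log}"
    status=0
    {
        echo "$(date) import catalogue updated, starting transfer"
        "${GDMP_REPLICATE_GET:-gdmp_replicate_get}" || status=$?
        echo "$(date) gdmp_replicate_get exited with status $status"
    } >>"$log" 2>&1
    return "$status"
}
```

To use it as the hook, end the script with a call to on_publish and point the notification setting in gdmp.conf at the script.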

18
RepCat C API
  • Described in Appendix D.
  • WP2 working with Globus on new distributed
    Replica Catalogue model
  • GIGGLE framework
  • Will attempt to keep existing APIs as much as
    possible!

19
Meta-data
  • Spitfire is a basic prototype
  • Purpose is to allow secure access to any SQL
    database over the grid
  • Secure access via HTTP(S)
  • Standard access (i.e. don't need to know what the
    backend DB is)

20
Meta-data
  • Current implementation is via XSQL templates
  • http://hep-proj-spitfire.cern.ch/hep-proj-spitfire
  • Server side XSQL templates are filled-in by
    attributes from an http GET or POST
  • Example..

21
Meta-data
  • Template metatrig.xsql on server
  • select LFN from FileMetaData where TRIGGER='{@trig}'
    and RUNNO>{@runmin} and RUNNO<{@runmax}
  • An HTTP(S) request (e.g. from a browser form)
  • http://meta1.atlas.rl.ac.uk/metatrig.xsql?trig=low1-a25&runmin=1100&runmax=1500
  • Will return an XML or HTML encoded list of
    matching Logical File Names.
  • Good if you have a specific problem now!
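
For scripted rather than browser access, the request URL can be assembled from the template's parameter names (trig, runmin, runmax). A minimal sketch in shell, using the host from the example above; the function name is made up, and the values are assumed to need no URL-escaping (plain letters, digits and hyphens).

```shell
#!/bin/sh
# Build the query URL for the metatrig.xsql template from its three
# parameters: trigger name, lower run number, upper run number.
metatrig_url() {
    trig="$1"; runmin="$2"; runmax="$3"
    printf 'http://meta1.atlas.rl.ac.uk/metatrig.xsql?trig=%s&runmin=%s&runmax=%s\n' \
        "$trig" "$runmin" "$runmax"
}
```

The resulting string can be handed to any HTTP client; the server fills the template with the three values and returns the matching Logical File Names.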

22
Meta-data
  • Must maintain templates
  • Dependence on Oracle XSQL code
  • No client side APIs defined yet
  • It's being rewritten for next release
  • Initially for new replica catalogue
  • Proper authorisation, meta-data distribution,
    client side API