WP2: Data Management



1
WP2: Data Management
  • Gavin McCance
  • University of Glasgow
  • November 5, 2001

2
Overview
  • Deliverables
  • Replication: GDMP
  • Meta-data: Spitfire
  • GridPP effort
  • Future work
  • Query Optimisation

3
Deliverables
  • EU DataGrid WP2 Major M9 deliverables met
  • GDMP delivered
  • Spitfire delivered
  • Architecture Document
  • http://www.cern.ch/grid-data-management

4
GDMP
http://cmsdoc.cern.ch/cms/grid/
  • Generic mirroring tool for any file type (read
    only replica)
  • Particular plug-ins for Objectivity database
    files
  • Subscription model for automatic synchronisation
    of files
  • Automatic update of replica catalogue
  • Currently uses Globus Replica Catalogue

5
GDMP
  • BrokerInfo API from WP1
  • Allows users of GDMP to obtain information from
    the job scheduler
  • Mass Storage Interface from WP5
  • e.g. Support for file staging
  • Security is provided via standard GSI (single
    sign-on)
  • Authorisation via grid mapfile
  • File transfer made using GridFTP
  • Installation: RPM and tarball
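
To illustrate the grid mapfile authorisation mentioned above, here is a minimal Python sketch that parses mapfile-style lines, each pairing a quoted certificate subject (DN) with one or more local account names. The sample DNs, account names and function names are made up for illustration; this is not GDMP's own code, only the general idea of a DN-to-account lookup.

```python
import shlex


def parse_grid_mapfile(text):
    """Return {certificate subject DN: [local account names]}.

    Assumes the common layout: a quoted DN followed by one account name
    or a comma-separated list of names; lines starting with '#' are comments.
    """
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = shlex.split(line)  # honours the quotes around the DN
        if len(parts) != 2:
            raise ValueError(f"malformed mapfile line: {line!r}")
        dn, accounts = parts
        mapping[dn] = accounts.split(",")
    return mapping


def is_authorised(mapping, dn):
    """A DN is authorised only if it appears in the mapfile."""
    return dn in mapping


# Made-up entries for illustration only.
SAMPLE = '''
# subject DN                              local account(s)
"/O=Grid/O=ExampleOrg/CN=Alice Example" alice
"/O=Grid/O=ExampleOrg/CN=Bob Example" bob,cmsprod
'''

if __name__ == "__main__":
    mapping = parse_grid_mapfile(SAMPLE)
    for dn, accounts in mapping.items():
        print(dn, "->", accounts)
```

A DN that does not appear in the file is simply rejected, which is the whole of the authorisation decision in this sketch.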

6
GDMP usage
[Diagram: Site A and Site B]
  • A,B) Start GDMP services (inetd)
  • B) Registers itself with site A
  • gdmp_host_subscribe
  • A) New files → Register them
  • gdmp_register_local_file <path-to-file>
  • This updates the local (on A) catalogue
  • A) Tell the world (well... all subscribed sites)
  • gdmp_publish_catalogue
  • Will update the import catalogue on all
    subscribed sites

7
GDMP usage
[Diagram: Site A and Site B]
  • B) Get the new files from site A
  • gdmp_replicate_get
  • The new files will be transferred from site A →
    site B
  • Globus replica catalogue updated
  • Filters so you only get files you want
  • CRC checking of file transfer
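
The subscribe / register / publish / replicate cycle described on the two usage slides can be sketched as a toy in-memory model. Everything here is an assumption made for illustration: the `Site` class and its method names are hypothetical analogues of the GDMP commands, a dictionary lookup stands in for the GridFTP transfer, and the Globus replica catalogue update is left out.

```python
import zlib


class Site:
    """Toy in-memory stand-in for a GDMP site (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.files = {}             # file name -> bytes held at this site
        self.local_catalogue = {}   # file name -> CRC32 of registered file
        self.import_catalogue = {}  # file name -> (source site, CRC32)
        self.subscribers = []

    def subscribe_to(self, producer):
        # Analogue of gdmp_host_subscribe: this site registers with producer.
        producer.subscribers.append(self)

    def register_local_file(self, name, data):
        # Analogue of gdmp_register_local_file: update the local catalogue.
        self.files[name] = data
        self.local_catalogue[name] = zlib.crc32(data)

    def publish_catalogue(self):
        # Analogue of gdmp_publish_catalogue: update the import catalogue
        # on every subscribed site with files it does not hold yet.
        for site in self.subscribers:
            for name, crc in self.local_catalogue.items():
                if name not in site.files:
                    site.import_catalogue[name] = (self, crc)

    def replicate_get(self, wanted=lambda name: True):
        # Analogue of gdmp_replicate_get: fetch the files that pass the
        # filter and verify each one against the published CRC.
        fetched = []
        for name, (source, crc) in list(self.import_catalogue.items()):
            if not wanted(name):
                continue
            data = source.files[name]  # stands in for a GridFTP transfer
            if zlib.crc32(data) != crc:
                raise IOError(f"CRC mismatch for {name}")
            self.files[name] = data
            del self.import_catalogue[name]
            fetched.append(name)
        return fetched


if __name__ == "__main__":
    a, b = Site("A"), Site("B")
    b.subscribe_to(a)
    a.register_local_file("run1.db", b"events")
    a.register_local_file("notes.txt", b"log")
    a.publish_catalogue()
    print(b.replicate_get(wanted=lambda n: n.endswith(".db")))
```

The filter passed to `replicate_get` plays the role of the "only get files you want" step; files that do not match stay in the import catalogue for a later request.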

8
Spitfire
http://hep-proj-spitfire.web.cern.ch/hep-proj-spitfire/
  • Provides grid enabled access to any relational
    database
  • SQL Database Service
  • Storage of general meta-data
  • Service Index soon
  • Secure access via GSI (single sign-on)
  • Installation: RPM and tarball

9
Spitfire
Java Servlet based
  • Allows any HTTP compliant system e.g.
    Web-browsers / standard C HTTP libraries to
    access any relational database across the grid



[Diagram labels: Oracle, PostgreSQL; Grid Security; Standard communication protocols (XML over HTTPS); SQL Database Service (Spitfire)]

10
Spitfire security
  • Authentication is currently provided
  • Standard user and server grid certificates
  • For both application programs and web browsers
  • Authorisation matrix coming soon
  • Will map grid identity to role(s)
  • Reader, info-update, manager
  • Roles will then map to a given database
    connection with given permissions on a database
  • e.g. query-only, insert, update, create new tables
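
One way to picture the planned authorisation matrix is a two-step lookup: grid identity to role(s), then role to permitted operations. The sketch below is only an illustration under stated assumptions. The role names come from the slide, but which operations each role gets, the sample DNs and the function names are all hypothetical, and checking the first SQL keyword is a simplification of mapping a role to a database connection with given permissions.

```python
# Hypothetical mapping: role -> SQL statement kinds it may run.
ROLE_PERMISSIONS = {
    "reader": {"SELECT"},
    "info-update": {"SELECT", "INSERT", "UPDATE"},
    "manager": {"SELECT", "INSERT", "UPDATE", "CREATE"},
}

# Hypothetical mapping: grid identity (certificate DN) -> role(s).
IDENTITY_ROLES = {
    "/O=Grid/O=ExampleOrg/CN=Alice Example": ["reader"],
    "/O=Grid/O=ExampleOrg/CN=Bob Example": ["reader", "info-update"],
}


def allowed_operations(dn):
    """Union of the operations granted by every role held by this identity."""
    ops = set()
    for role in IDENTITY_ROLES.get(dn, []):
        ops |= ROLE_PERMISSIONS[role]
    return ops


def may_execute(dn, sql):
    """Naive check: compare the statement's first keyword with the grants."""
    words = sql.split()
    if not words:
        return False
    return words[0].upper() in allowed_operations(dn)


if __name__ == "__main__":
    alice = "/O=Grid/O=ExampleOrg/CN=Alice Example"
    print(may_execute(alice, "SELECT * FROM runs"))
    print(may_execute(alice, "INSERT INTO runs VALUES (1)"))
```

An identity with no entry in the matrix ends up with an empty set of operations, so unknown users are refused by default.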

11
Spitfire
  • Easy to install
  • Good documentation
  • Ready to run examples
  • For grid-based meta-data catalogue needs..
  • we need feedback!

12
WP2 GridPP Effort
  • Based at Glasgow
  • Effort will focus primarily on the query
    optimisation task of WP2
  • 1 PhD student, 1.5 RA
  • Continuing effort in development of Spitfire and
    related applications
  • 0.7 RA

13
Future Spitfire work
  • Look at common ground between WP2 and WP3
  • Spitfire and R-GMA?
  • Security
  • Authorisation mechanisms
  • Other Spitfire applications
  • Service Index, Replica Catalogue
  • Work on scalable architectures
  • Common with e.g. replica catalogue work

14
Query Optimisation work
  • Categorise possible areas for optimisation
  • User oriented: high performance
  • Minimising cost for specific job
  • Grid oriented: high throughput
  • Maximise efficient usage of resources
  • Site oriented: local policy
  • Respond to specific site policies / requirements
  • Much preliminary work done!
  • Workshop in December 2001

15
Query Optimisation
  • Short term
  • Data Access optimisation
  • Replica Optimiser component
  • How long will it take to get the data here?
  • Developing and evaluating appropriate algorithms
    for working this out and choosing best replica
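
The "how long will it take to get the data here?" question can be illustrated with a very simple cost model: a fixed start-up delay plus file size divided by throughput, with the replica of lowest estimated time chosen. This is a sketch under assumptions, not the algorithm being developed in WP2; the `Replica` fields, the site names and the numbers are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    site: str
    bandwidth_mb_per_s: float  # assumed sustained throughput to the local site
    latency_s: float           # assumed fixed start-up cost, e.g. file staging


def estimated_transfer_time(replica, file_size_mb):
    """Estimated seconds to fetch the file: start-up delay + size / bandwidth."""
    return replica.latency_s + file_size_mb / replica.bandwidth_mb_per_s


def choose_best_replica(replicas, file_size_mb):
    """Pick the replica with the smallest estimated transfer time."""
    if not replicas:
        raise ValueError("no replicas available")
    return min(replicas, key=lambda r: estimated_transfer_time(r, file_size_mb))


if __name__ == "__main__":
    candidates = [
        Replica("site-a", bandwidth_mb_per_s=10.0, latency_s=60.0),
        Replica("site-b", bandwidth_mb_per_s=2.0, latency_s=1.0),
    ]
    for size_mb in (100, 2000):
        best = choose_best_replica(candidates, size_mb)
        print(size_mb, "MB ->", best.site)
```

Even this crude model shows why the choice is not fixed per site: the slow but immediately available replica wins for the small file, while the fast replica with a long start-up delay wins for the large one.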

16
Query Optimisation
  • Modelling and Simulation
  • Best not to test out the more crazy algorithms on
    the experiment testbed
  • Work underway with MONARC tool
  • Evaluating suitability as simulation tool for
    this particular work
  • Integrate into the QO work

17
Summary
  • Major deliverables for M9 met
  • GDMP and Spitfire
  • GridPP will concentrate effort on Query
    Optimisation task of WP2
  • Continued Spitfire development
  • Work already underway