Title: Data Management GridPP and EDG


1
Data Management: GridPP and EDG
  • Gavin McCance
  • University of Glasgow
  • May 9, 2002

http://www.gridpp.ac.uk/datamanagement
http://cern.ch/grid-data-management
2
Overview
  • Status of data management work
  • Products delivered for the 1.2 release
  • GDMP 3.0
  • Reptor replica manager
  • Spitfire
  • Optor grid simulation
  • What's currently available and future plans

3
WP2 Data Management
Work is done within the EDG WP2 team (based at
CERN):
  • Replication
  • Replica catalogue
  • Replica manager
  • Query Optimisation
  • Grid replica optimisation
  • Meta-data management
  • Secure, transparent access to meta-data
  • Service discovery

Direct UK involvement
4
General Status
  • Deliverables on target
  • Major software released for 1.2
  • UK manpower based at Glasgow
  • 2.5 RAs: me, Will Bell, Paul Millar (50%)
  • 1 PhD student, David Cameron
  • 1 more student to come in September

5
File Replication
  • Requires a replica catalogue or replica location
    service
  • Keeps track of the mapping between the logical file
    name (LFN) and the physical file names
  • Requires a replica manager or replica management
    service
  • High-level tool to actually do the replication
    and manage which files are being replicated

[Diagram: logical file File-1 (LFN) with physical replicas at Paris, Chicago and Glasgow]
6
File Replication
  • Current replication functionality is provided by
    GDMP 3.0, new for the 1.2 release!
  • Used for mirroring of storage elements
  • Implements a subscription-based replication model
    with security, and updates the Globus replica
    catalogue

7
GDMP 3.0
  • Site B subscribes to site A's files
  • A produces a new file, and B is notified of this
  • B then starts the transfer of the new file from A
  • The replica catalogue at B is updated to reflect
    the new file replica
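The subscription flow above can be sketched in a few lines of Python. This is a minimal in-memory model, not GDMP's real interface: the `Site` class, its method names and the "transfer" (copying a name between sets) are hypothetical stand-ins for GDMP's server, notification and file-transfer machinery.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """A storage site with files, a local replica catalogue and subscribers (toy model)."""
    name: str
    files: set = field(default_factory=set)
    # Local replica catalogue: logical file name -> names of sites holding a replica
    catalogue: dict = field(default_factory=dict)
    subscribers: list = field(default_factory=list)

    def subscribe(self, producer: "Site") -> None:
        """Ask to be notified whenever `producer` creates a new file."""
        producer.subscribers.append(self)

    def unsubscribe(self, producer: "Site") -> None:
        producer.subscribers.remove(self)

    def produce(self, lfn: str) -> None:
        """Create a new file locally, record it, then notify every subscriber."""
        self.files.add(lfn)
        self.catalogue.setdefault(lfn, set()).add(self.name)
        for subscriber in self.subscribers:
            subscriber.notify(self, lfn)

    def notify(self, producer: "Site", lfn: str) -> None:
        """Pull the new file from the producer, then update the local catalogue."""
        if lfn in producer.files:  # stand-in for a real wide-area transfer
            self.files.add(lfn)
            self.catalogue.setdefault(lfn, set()).update({producer.name, self.name})


a, b = Site("A"), Site("B")
b.subscribe(a)                 # site B subscribes to site A's files
a.produce("lfn:run25555.dat")  # A produces a new file; B is notified and copies it
print(sorted(b.catalogue["lfn:run25555.dat"]))  # ['A', 'B']
```

Keeping the transfer inside `notify` mirrors the pull model on the slide: the producer only announces new files, and each subscriber fetches them itself.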

8
GDMP 3.0
  • Changes w.r.t. version 2:
  • New security model: host certificates
  • Server delegation, i.e. accounts on the SE are not
    necessarily required
  • Client-only install possible
  • Basic space management
  • Stand-alone server option
  • Unsubscribe option

9
GDMP 3.0 status
  • Final version of GDMP released for 1.2
  • In future, GDMP will be absorbed into the
    Replica Manager Service, which will offer richer
    functionality
  • Available as SRPM, RPM and tarball, with a User Guide
    and Quick Config for EDG SEs
  • http://cmsdoc.cern.ch/cms/grid/

10
Replica Location Service
  • Current Globus replica catalogue is LDAP-based
  • To be replaced by the new Replica Location Service,
    built on the GIGGLE framework
  • Joint EDG WP2 / Globus / PPDG project
  • Trade-offs: global consistency, space, query /
    update overhead, reliability

11
RLS model
  • Reliable local state
  • Relaxed global consistency
  • Soft-state updates to global index nodes permit
    graceful behaviour in the face of network problems
  • Secure access
  • Implemented as web service
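To show why soft-state updates degrade gracefully, here is a minimal sketch of a global index node. The class name, the time-to-live (`ttl`) and the assumption that each LRC periodically re-sends its full LFN list are illustrative choices, not the RLS protocol.

```python
import time


class SoftStateIndex:
    """Global index node whose LFN -> LRC entries expire unless they are refreshed."""

    def __init__(self, ttl: float, clock=time.monotonic):
        self.ttl = ttl          # seconds an entry stays valid without a refresh
        self.clock = clock      # injectable so the example below can fake time
        self._last_seen = {}    # (lfn, lrc_name) -> time of the last update

    def update(self, lrc_name: str, lfns: list) -> None:
        """Soft-state update: an LRC re-asserts every LFN it currently holds."""
        now = self.clock()
        for lfn in lfns:
            self._last_seen[(lfn, lrc_name)] = now

    def lookup(self, lfn: str) -> set:
        """Return the LRCs believed to hold `lfn`, after discarding stale entries."""
        now = self.clock()
        stale = [key for key, seen in self._last_seen.items() if now - seen > self.ttl]
        for key in stale:
            del self._last_seen[key]
        return {lrc for (name, lrc) in self._last_seen if name == lfn}


fake_now = [0.0]
index = SoftStateIndex(ttl=60, clock=lambda: fake_now[0])
index.update("lrc-glasgow", ["lfn:file-1"])
index.update("lrc-paris", ["lfn:file-1"])
fake_now[0] = 30
index.update("lrc-paris", ["lfn:file-1"])  # Glasgow's refresh is lost (network problem)
fake_now[0] = 75
print(index.lookup("lfn:file-1"))  # {'lrc-paris'}
```

If an LRC becomes unreachable, its entries simply age out of the index; nothing has to be deleted explicitly, and the LRC itself, which holds the reliable local state, is unaffected.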

12
Hierarchical indexing: the higher-level RLI contains pointers to lower-level RLIs or LRCs.
RLI = Replica Location Index; LRC = Local Replica Catalog
[Diagram: three RLIs above five LRCs, which sit above five Storage Elements]
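A lookup through this hierarchy can be sketched as follows: each RLI stores only which child (a lower-level RLI or an LRC) knows about an LFN, and the LFN-to-PFN mapping itself is resolved at the LRC. The classes, the example LFNs and the example PFNs are hypothetical; indexing happens once when a child is attached, standing in for the periodic soft-state updates a real RLI would receive.

```python
from dataclasses import dataclass, field


@dataclass
class LRC:
    """Local Replica Catalog: authoritative LFN -> PFN mappings for one storage element."""
    name: str
    mappings: dict = field(default_factory=dict)  # LFN -> list of PFNs

    def lfns(self) -> set:
        return set(self.mappings)

    def lookup(self, lfn: str) -> list:
        return list(self.mappings.get(lfn, []))


@dataclass
class RLI:
    """Replica Location Index: records which child (RLI or LRC) knows about each LFN."""
    name: str
    children: dict = field(default_factory=dict)  # child name -> RLI or LRC
    index: dict = field(default_factory=dict)     # LFN -> set of child names

    def add_child(self, child) -> None:
        """Attach a child and index its LFNs (a stand-in for soft-state updates)."""
        self.children[child.name] = child
        for lfn in child.lfns():
            self.index.setdefault(lfn, set()).add(child.name)

    def lfns(self) -> set:
        return set(self.index)

    def lookup(self, lfn: str) -> list:
        """Follow index pointers down the hierarchy and collect PFNs from the LRCs."""
        pfns = []
        for child_name in sorted(self.index.get(lfn, ())):
            pfns.extend(self.children[child_name].lookup(lfn))
        return pfns


glasgow = LRC("lrc-glasgow", {"lfn:file-1": ["gsiftp://se.gla.example/file-1"]})
paris = LRC("lrc-paris", {"lfn:file-1": ["gsiftp://se.paris.example/file-1"],
                          "lfn:file-2": ["gsiftp://se.paris.example/file-2"]})
chicago = LRC("lrc-chicago", {"lfn:file-3": ["gsiftp://se.chicago.example/file-3"]})

europe = RLI("rli-europe")
europe.add_child(glasgow)
europe.add_child(paris)
top = RLI("rli-top")
top.add_child(europe)
top.add_child(chicago)

print(top.lookup("lfn:file-1"))
# ['gsiftp://se.gla.example/file-1', 'gsiftp://se.paris.example/file-1']
```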
13
Scalable, reliable
  • LFN Namespace partitioned among RLIs
  • Redundant RLIs for reliability
  • Lossy compression
  • As a result, higher-level RLIs may lose accuracy about mappings
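The slide does not say which lossy compression scheme is meant. One widely used option for summarising a set of names compactly is a Bloom filter, sketched below as an assumption: an LRC would send the bit array instead of its full LFN list, and the receiving RLI can answer "possibly present" for an LFN the LRC does not hold (a false positive), but never "absent" for one it does. That is one concrete way a higher-level index loses accuracy while staying safe: a false positive only costs an extra query to the LRC.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit array summarising a set of LFNs: false positives, no false negatives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive k bit positions from salted SHA-256 digests of the item.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


summary = BloomFilter(num_bits=2048, num_hashes=4)
for n in range(100):
    summary.add(f"lfn:run{n}.dat")
print(summary.might_contain("lfn:run42.dat"))  # True: members are never missed
print(len(summary.bits))                        # 256 bytes, however long the LFNs are
print(summary.might_contain("lfn:run999.dat"))  # usually False, occasionally a false positive
```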

14
RLS status
  • Currently an alpha release for developers
  • http://cern.ch/grid-data-management/replica-location-service/RLS.html
  • The new version will be progressively integrated with
    other replication software
  • Testbed deployment in the September release

15
Replica Management Service
  • Web Service under development (Reptor)
  • Will absorb GDMP functionality and extend it
  • Will use the Replica Location Service
  • Two facets: the Core Replica Management API and
    the Optimisation API

16
Core Reptor API
  • Similar to GDMP API
  • registerEntry
  • copyFile
  • copyAndRegisterFile
  • replicateFile
  • deleteFile
  • listReplicas
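To make the division of labour between these calls concrete, here is a toy replica manager over an in-memory catalogue and in-memory "storage elements". Only the method names come from the slide (kept in camelCase to match it); their parameters and behaviour are assumptions, and a real implementation would talk to the Replica Location Service and perform actual transfers.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicaManager:
    """Toy replica manager; signatures and behaviour are hypothetical."""
    catalogue: dict = field(default_factory=dict)  # LFN -> set of PFNs
    storage: dict = field(default_factory=dict)    # PFN -> file contents (stands in for SEs)

    def registerEntry(self, lfn: str, pfn: str) -> None:
        """Record that `pfn` is a replica of `lfn`; no data is moved."""
        self.catalogue.setdefault(lfn, set()).add(pfn)

    def copyFile(self, src_pfn: str, dest_pfn: str) -> None:
        """Copy data between storage locations without touching the catalogue."""
        self.storage[dest_pfn] = self.storage[src_pfn]

    def copyAndRegisterFile(self, lfn: str, src_pfn: str, dest_pfn: str) -> None:
        self.copyFile(src_pfn, dest_pfn)
        self.registerEntry(lfn, dest_pfn)

    def replicateFile(self, lfn: str, dest_pfn: str) -> None:
        """Choose an existing replica of `lfn` as the source, then copy and register."""
        replicas = self.listReplicas(lfn)
        if not replicas:
            raise KeyError(f"no replica registered for {lfn}")
        src_pfn = min(replicas)  # deterministic pick; a real manager would choose by cost
        self.copyAndRegisterFile(lfn, src_pfn, dest_pfn)

    def deleteFile(self, lfn: str, pfn: str) -> None:
        """Remove one replica's data and its catalogue entry."""
        self.storage.pop(pfn, None)
        self.catalogue.get(lfn, set()).discard(pfn)

    def listReplicas(self, lfn: str) -> set:
        return set(self.catalogue.get(lfn, set()))


rm = ReplicaManager()
rm.storage["se1:/data/file-1"] = b"event data"
rm.registerEntry("lfn:file-1", "se1:/data/file-1")
rm.replicateFile("lfn:file-1", "se2:/data/file-1")
print(sorted(rm.listReplicas("lfn:file-1")))  # ['se1:/data/file-1', 'se2:/data/file-1']
```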

17
Interactions with SE
  • Defined file types

18
RMS Current Status
  • Testbed can use GDMP for 1.2
  • Defined Reptor API currently wraps the Globus
    Replica Manager
  • Will be developed progressively
  • Full version on testbed in September
  • Technical reports: http://cern.ch/grid-data-management/publications.html

19
Grid Query Optimisation
  • Best place for a job?
  • Joint WP1 / WP2 question
  • Approach: two-phase optimisation
  • Phase 1: find a suitable CE for job execution, given
    the distribution of the files it will access
  • Phase 2: re-optimise file access during job
    execution (due to the dynamic nature of the Grid,
    resource status changes over time)
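Phase 1 can be illustrated as a simple cost minimisation: for each candidate CE, assume each input file is read from its cheapest replica, and pick the CE with the lowest total. The `cost` table is a hypothetical placeholder for network and I/O cost estimates, and the additive, size-independent cost model is an assumption rather than the WP1/WP2 algorithm.

```python
from typing import Dict, List, Tuple


def choose_ce(
    files: List[str],
    replicas: Dict[str, List[str]],
    cost: Dict[Tuple[str, str], float],
    ces: List[str],
) -> str:
    """Phase 1: pick the CE with the lowest total cost of reading the job's input files.

    cost[(ce, se)] is an estimated cost for a job on `ce` to read one file from `se`;
    a missing entry counts as unreachable.
    """

    def total_cost(ce: str) -> float:
        # Each file is assumed to be read from whichever replica is cheapest for this CE.
        return sum(
            min((cost.get((ce, se), float("inf")) for se in replicas[lfn]),
                default=float("inf"))
            for lfn in files
        )

    return min(ces, key=total_cost)


replica_map = {"lfn:a": ["se-cern", "se-gla"], "lfn:b": ["se-gla"]}
costs = {
    ("ce-cern", "se-cern"): 1.0, ("ce-cern", "se-gla"): 10.0,
    ("ce-gla", "se-cern"): 10.0, ("ce-gla", "se-gla"): 1.0,
}
print(choose_ce(["lfn:a", "lfn:b"], replica_map, costs, ["ce-cern", "ce-gla"]))  # ce-gla
```

Phase 2 would repeat the per-file part of this calculation while the job runs, since the costs themselves change over time.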

20
Optimisation API
  • initFilePrefetch(LFN, CE, protocol, fraction)
  • cancelFilePrefetch(LFN, CE)
  • getBestFile(LFN, protocol, fraction)
  • getNetworkCosts(SE1, SE2, filesize, protocol)
    from WP7
  • getIOCosts(SE, PFN) from WP5
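A minimal sketch of what `getBestFile` might compute: the replica with the lowest estimated network plus I/O cost. The two cost callables stand in for `getNetworkCosts` and `getIOCosts`; the `replicas` and `filesizes` tables are hypothetical, the `protocol` argument is omitted, and `fraction` is assumed to be the fraction of the file the job will read, used to scale the transfer size. The bandwidth and I/O figures in the example are made up.

```python
from typing import Callable, Dict, List, Tuple


def get_best_file(
    lfn: str,
    fraction: float,
    local_se: str,
    replicas: Dict[str, List[Tuple[str, str]]],
    filesizes: Dict[str, int],
    network_cost: Callable[[str, str, int], float],
    io_cost: Callable[[str, str], float],
) -> str:
    """Return the PFN of `lfn` whose estimated access cost from `local_se` is lowest."""
    size = int(filesizes[lfn] * fraction)  # assumed: only this much data is read

    def estimated_cost(entry: Tuple[str, str]) -> float:
        se, pfn = entry
        return network_cost(se, local_se, size) + io_cost(se, pfn)

    _, best_pfn = min(replicas[lfn], key=estimated_cost)
    return best_pfn


# Made-up figures: bandwidth in MB/s between storage elements, I/O cost in seconds.
bandwidth = {("se-paris", "se-gla"): 50.0, ("se-chicago", "se-gla"): 10.0}


def net(src: str, dst: str, size_mb: int) -> float:
    return size_mb / bandwidth[(src, dst)]


def io(se: str, pfn: str) -> float:
    return 2.0 if se == "se-paris" else 0.5


replicas = {"lfn:file-1": [("se-paris", "paris:/data/file-1"),
                           ("se-chicago", "chicago:/data/file-1")]}
print(get_best_file("lfn:file-1", 0.5, "se-gla", replicas, {"lfn:file-1": 1000}, net, io))
# paris:/data/file-1  (10 s + 2 s beats 50 s + 0.5 s)
```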

21
Grid Replica Optimisation
  • Controlled intelligent replication to optimise
    grid over the longer term
  • Collect getBestFile requests
  • Replication decisions driven by algorithms that analyse these requests
  • Test replication algorithms on data-centric grid
    simulator

22
Optor replica optimiser simulation
  • Simulates a prototype Grid
  • Input: site policies and experiment data files
  • Introduce a replication algorithm
  • Files are always replicated to local storage
  • If necessary, the oldest files are deleted
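The replication policy described above can be sketched as a small cache: every requested file is copied to local storage, and when space runs out the oldest files are deleted first. "Oldest" is interpreted here as earliest replicated (first in, first out), which is an assumption; the class name and size units are hypothetical, and this is not Optor's code.

```python
from collections import OrderedDict


class LocalStorage:
    """Storage element that always keeps a local replica, evicting the oldest files.

    FIFO eviction is assumed; an LRU policy would instead move a file to the back
    of the queue on every access.
    """

    def __init__(self, capacity: int):
        self.capacity = capacity     # total space, in arbitrary size units
        self.files = OrderedDict()   # lfn -> size, oldest first
        self.used = 0
        self.remote_reads = 0        # transfers over the network

    def access(self, lfn: str, size: int) -> None:
        """Serve a job's file request, replicating the file locally if it is missing."""
        if lfn in self.files:
            return                   # local hit: no network traffic
        self.remote_reads += 1
        if size > self.capacity:
            return                   # too big to keep; read remotely only
        while self.used + size > self.capacity:
            _, old_size = self.files.popitem(last=False)  # delete the oldest file
            self.used -= old_size
        self.files[lfn] = size
        self.used += size


se = LocalStorage(capacity=3)
for lfn in ["f1", "f2", "f1", "f3", "f4", "f1"]:
    se.access(lfn, size=1)
print(list(se.files), se.remote_reads)  # ['f3', 'f4', 'f1'] 5
```

Counting `remote_reads` is the simplest proxy for the network traffic that a simulator like this would compare across replication algorithms.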

23
Optor first results
Even a basic replication algorithm significantly
reduces network traffic and program running times.
New economics-based algorithms are under
investigation!
http://ppewww.ph.gla.ac.uk/ScotGRID/Optor
24
Meta-data Management
  • Spitfire v1.1.0 delivered
  • A grid-enabled database service
  • Grid-enabled front end to any type of RDBMS
  • Examples:
  • Grid meta-data: replica catalogue, service
    registry
  • Application meta-data: experimental data
    catalogues, calibration data

25
V1.1.0 XSQL Spitfire
  • The current version (v1.1.0) is based on XSQL templates on
    the server, e.g.:

    <role="Read-only"/>
    <query>
      SELECT FILENAME FROM HFS_DATASET
      WHERE RNNO=@run AND TRIGGER=@trig AND STATUS=@stat
    </query>

File URL: http://filecat1.atlas.cern.ch/hfs/findDataSet.xsql
26
V1.1.0 Spitfire client
  • Any HTTP client: either your own application or a
    web-browser form
  • POST an HTML form to
    http://filecat1.atlas.cern.ch/hfs/findDataSet.xsql with parameters
    run=25555, trig=highlumi, stat=good
  • The operation is performed on the database, and the
    result is sent back to the client
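A client along these lines can be written with Python's standard `urllib`. The URL and the `run`, `trig` and `stat` parameters come from the slide; the function names and the UTF-8 decoding of the response are assumptions. The 2002 host is unlikely to be reachable today, so the example only builds the request and prints it.

```python
import urllib.parse
import urllib.request

# URL taken from the slide; the host is historical and probably no longer resolves.
FIND_DATASET_URL = "http://filecat1.atlas.cern.ch/hfs/findDataSet.xsql"


def build_find_dataset_request(run: int, trig: str, stat: str) -> urllib.request.Request:
    """Build the same form POST a web browser would send for the findDataSet template."""
    body = urllib.parse.urlencode({"run": run, "trig": trig, "stat": stat}).encode("ascii")
    return urllib.request.Request(
        FIND_DATASET_URL,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def find_dataset(run: int, trig: str, stat: str, timeout: float = 30.0) -> str:
    """Send the request and return the response body as text (UTF-8 assumed)."""
    req = build_find_dataset_request(run, trig, stat)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


req = build_find_dataset_request(25555, "highlumi", "good")
print(req.get_method(), req.full_url)
print(req.data)  # b'run=25555&trig=highlumi&stat=good'
```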

27
Security Mechanism
[Diagram components: Servlet Container; SSLServletSocketFactory; TrustManager with Trusted CAs and Revoked Certs repository; Security Servlet; Authorization Module ("Does user specify role?", Role repository); Translator Servlet (Role/Connection mappings: "Map role to connection id"); ConnectionPool; RDBMS]
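One reading of the security diagram is a three-step pipeline: validate the client certificate (trusted CA, not revoked), settle on a role, and map that role to a pooled database connection. The sketch below follows that reading; the data structures, the default-role fallback and all names are assumptions, and it performs no real X.509 or SSL handling.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple


@dataclass(frozen=True)
class ClientCert:
    subject: str  # distinguished name of the user
    issuer: str   # distinguished name of the issuing CA
    serial: int


class AccessDenied(Exception):
    pass


def connection_for(
    cert: ClientCert,
    requested_role: Optional[str],
    trusted_cas: Set[str],
    revoked: Set[Tuple[str, int]],
    role_repository: Dict[str, Set[str]],
    default_role: Dict[str, str],
    connection_mappings: Dict[str, str],
) -> str:
    """Return the connection id to use for this request, or raise AccessDenied."""
    # 1. Trust: the certificate must be issued by a trusted CA and not be revoked.
    if cert.issuer not in trusted_cas:
        raise AccessDenied("certificate not issued by a trusted CA")
    if (cert.issuer, cert.serial) in revoked:
        raise AccessDenied("certificate has been revoked")
    # 2. Authorisation: use the requested role, or fall back to the user's default.
    role = requested_role if requested_role is not None else default_role.get(cert.subject)
    if role is None or role not in role_repository.get(cert.subject, set()):
        raise AccessDenied(f"role {role!r} not granted to {cert.subject}")
    # 3. Translation: each role maps to a pooled connection with matching DB privileges.
    return connection_mappings[role]


ca = "/O=Grid/CN=Example CA"
user = ClientCert("/O=Grid/CN=Jane Physicist", ca, 1234)
print(connection_for(
    user, None, {ca}, set(),
    role_repository={user.subject: {"reader", "writer"}},
    default_role={user.subject: "reader"},
    connection_mappings={"reader": "conn-readonly", "writer": "conn-readwrite"},
))  # conn-readonly
```

Mapping roles to connections, rather than users to database accounts, means the database only needs one account per role.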
28
V1.1.0
  • v1.1.0 available for the 1.2 release now!
  • SRPM, RPM, tarball installation
  • User / Admin / Quick Install guides
  • http://cern.ch/hep-proj-spitfire

29
New spitfire client (dev)
  • Users can use either this or the static (XSQL
    template-based) v1.1.0 functionality
  • A database client API has been defined
  • Will be implemented as a grid service using standard
    web-service technologies

30
Client side API to access remote database
  • DB Admin
  • Create(), Drop(), Alter() Table, Database
  • DB Core functionality
  • Insert(), Update(), Delete(), Select()
  • DB Role admin
  • Secure, role based authorisation
  • DB Information
  • Schema, Quotas, Disk space
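The shape of such a client API can be sketched with SQLite standing in for the remote RDBMS. The class and method names are hypothetical and cover only a subset of the slide's categories (DB Admin, DB Core functionality, DB Information); a real client would send these operations to a secured remote service, and role administration is left out. Table and column names are interpolated into the SQL, so they must come from trusted code; values always go through `?` placeholders.

```python
import sqlite3


class DatabaseClient:
    """Toy client exposing admin, core and information calls over a local SQLite database."""

    def __init__(self, path: str = ":memory:"):
        self.conn = sqlite3.connect(path)

    # DB Admin
    def create_table(self, table: str, columns: dict) -> None:
        cols = ", ".join(f"{name} {sql_type}" for name, sql_type in columns.items())
        self.conn.execute(f"CREATE TABLE {table} ({cols})")

    def drop_table(self, table: str) -> None:
        self.conn.execute(f"DROP TABLE {table}")

    # DB Core functionality
    def insert(self, table: str, row: dict) -> None:
        names = ", ".join(row)
        marks = ", ".join("?" for _ in row)
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(f"INSERT INTO {table} ({names}) VALUES ({marks})",
                              tuple(row.values()))

    def update(self, table: str, values: dict, where: str, params: tuple = ()) -> int:
        assignments = ", ".join(f"{name} = ?" for name in values)
        with self.conn:
            cur = self.conn.execute(f"UPDATE {table} SET {assignments} WHERE {where}",
                                    (*values.values(), *params))
        return cur.rowcount

    def delete(self, table: str, where: str, params: tuple = ()) -> int:
        with self.conn:
            cur = self.conn.execute(f"DELETE FROM {table} WHERE {where}", params)
        return cur.rowcount

    def select(self, table: str, columns: list, where: str = "1 = 1", params: tuple = ()) -> list:
        cols = ", ".join(columns)
        return self.conn.execute(f"SELECT {cols} FROM {table} WHERE {where}", params).fetchall()

    # DB Information
    def schema(self, table: str) -> list:
        """Column names and declared types, via SQLite's table_info pragma."""
        return [(row[1], row[2]) for row in self.conn.execute(f"PRAGMA table_info({table})")]


db = DatabaseClient()
db.create_table("hfs_dataset", {"filename": "TEXT", "rnno": "INTEGER",
                                "trig": "TEXT", "status": "TEXT"})
db.insert("hfs_dataset", {"filename": "run25555.dat", "rnno": 25555,
                          "trig": "highlumi", "status": "good"})
db.update("hfs_dataset", {"status": "bad"}, "rnno = ?", (25555,))
print(db.select("hfs_dataset", ["filename", "status"], "rnno = ?", (25555,)))
# [('run25555.dat', 'bad')]
```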

31
Extra functionality
  • To be developed:
  • Distributed querying
  • Replication of meta-data
  • Automated expiration and cleanup
  • Discussions with the UK DBTF and the GGF Database Group

32
Service Index
  • How do I find a specific grid service?
  • E.g. replica location server, image database,
    information service
  • XML service description
  • What, where, attributes, how to contact
  • Scalable architectures for querying these descriptions
    have been developed
  • Service index web service
  • W. Hoschek's thesis and paper (WP2@CERN)
  • API developed
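The "what, where, attributes, how to contact" description can be illustrated with a toy XML document and a query over it. The element and attribute names, service names and endpoints below are invented for illustration and are not the schema or API from Hoschek's work; a single linear scan like this one would not scale, which is the problem the scalable query architectures address.

```python
import xml.etree.ElementTree as ET

# Invented descriptions: "what" (type), "where"/"how to contact" (endpoint), attributes.
SERVICE_DESCRIPTIONS = """<services>
  <service type="replica-location" name="rls-a">
    <endpoint>https://rls-a.example.org:8443/rls</endpoint>
    <attribute name="vo">atlas</attribute>
  </service>
  <service type="replica-location" name="rls-b">
    <endpoint>https://rls-b.example.org:8443/rls</endpoint>
    <attribute name="vo">cms</attribute>
  </service>
  <service type="information" name="info-1">
    <endpoint>ldap://info.example.org:2135</endpoint>
  </service>
</services>"""


def find_services(xml_text: str, service_type: str, **attrs: str) -> list:
    """Return (name, endpoint) pairs for services of `service_type` whose attributes match."""
    root = ET.fromstring(xml_text)
    results = []
    for svc in root.iter("service"):
        if svc.get("type") != service_type:
            continue
        have = {a.get("name"): a.text for a in svc.findall("attribute")}
        if all(have.get(key) == value for key, value in attrs.items()):
            results.append((svc.get("name"), svc.findtext("endpoint")))
    return results


print(find_services(SERVICE_DESCRIPTIONS, "replica-location", vo="atlas"))
# [('rls-a', 'https://rls-a.example.org:8443/rls')]
```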

33
More Info
  • More information available at

http://www.gridpp.ac.uk/datamanagement
http://cern.ch/grid-data-management