Gridifying the LHC Data - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Gridifying the LHC Data


1
Gridifying the LHC Data
  • Peter Kunszt
  • CERN IT/DB
  • EU DataGrid Data Management
  • Peter.Kunszt_at_cern.ch

2
Outline
  • The Grid as a means of transparent data access
  • Current mode of operations at CERN
  • Elements of Grid data access
  • Current capabilities of the EU DataGrid/LCG-1
    Grid infrastructure
  • Outlook

3
Outline
  • The Grid as a means of transparent data access
  • Current mode of operations at CERN
  • Elements of Grid data access
  • Current capabilities of the EU DataGrid/LCG-1
    Grid infrastructure
  • Outlook

4
The Grid vision
  • "Flexible, secure, coordinated resource sharing
    among dynamic collections of individuals,
    institutions, and resources"
  • From "The Anatomy of the Grid: Enabling Scalable
    Virtual Organizations"
  • Enable communities (virtual organizations) to
    share geographically distributed resources as
    they pursue common goals -- assuming the absence
    of:
  • central location,
  • central control,
  • omniscience,
  • existing trust relationships.

5
Grids: Elements of the Problem
  • Resource sharing
  • Computers, storage, sensors, networks, ...
  • Sharing always conditional: issues of trust,
    policy, negotiation, payment, ...
  • Coordinated problem solving
  • Beyond client-server: distributed data analysis,
    computation, collaboration, ...
  • Dynamic, multi-institutional virtual
    organizations
  • Community overlays on classic organization
    structures
  • Large or small, static or dynamic

6
Grid middleware architecture hourglass
  • Current Grid architectural functional blocks

  • Specific application layer: ALICE, ATLAS, CMS, LHCb
  • Common application layer: LCG
  • Advanced Grid Services: EU DataGrid middleware
  • Basic Grid Services: GLOBUS 2.2
  • OS, Storage & Network services
7
Grid Middleware Cloud
  • Authentication, Authorization
  • Requirement parsing
  • Resource matching
  • Resource allocation
  • Accessibility
8
Vision of Grid Data Management
  • Distributed Shared Data Storage
  • Ubiquitous Data Access
  • Transparent Data Transfer and Migration
  • Consistency and Robustness
  • Optimisation

9
Vision of Grid Data Management
  • Distributed Shared Data Storage
  • Different architectures
  • Heterogeneous data stores
  • Self-describing data and metadata

10
Vision of Grid Data Management
  • Ubiquitous Data Access
  • Global Namespace
  • Transparent security control and enforcement
  • Access anytime, from anywhere; physical data
    location irrelevant
  • Automatic Data Replication and Validation

11
Vision of Grid Data Management
  • Transparent Data Transfer and Migration
  • Protocol negotiation and multiple protocol
    support
  • Management of data formats and database versions

12
Vision of Grid Data Management
  • Consistency and Robustness
  • Replicated data is reasonably up-to-date
  • Reliable data transfer
  • Self-detecting and self-correcting mechanisms
    upon data corruption

13
Vision of Grid Data Management
  • Optimisation
  • Customisation or self-adaptation to specific
    access patterns
  • Distributed Querying, Data Analysis and Data
    Mining

14
Existing Middleware for Grid Data Management: Overview
  • Globus: GridFTP, Replica Catalog, Replica Manager
  • EU DataGrid: GDMP, Replica Catalog, Replica
    Manager, Spitfire
  • Condor: NeST
  • PPDG: Magda, JASMine, GDMP, SAM
  • GriPhyN/iVDGL: Virtual Data Toolkit
  • Storage Resource Broker
  • Storage Resource Manager
  • ROOT
  • AliEn
  • Nimrod-G
  • Legion

(Not exhaustive)
15
What you would like to see
reliable, available, powerful, calm, cool, easy to use
16
Outline
  • The Grid as a means of transparent data access
  • Current mode of operations at CERN
  • Non-Grid operations
  • Grid operations
  • Elements of Grid data access
  • Current capabilities of the EU DataGrid/LCG-1
    Grid infrastructure
  • Outlook

17
Current non-Grid Operations (Oversimplified)
Fabric
Storage
18
Current non-Grid Operations
  • Planning of
  • Resources (computing and storage)
  • Projects and job schedules
  • Data placement
  • Monitoring of
  • Running jobs
  • Available resources
  • Alarms

21
Grid Testbed Today
  • Currently the largest Grid testbed: EU DataGrid
  • Not a full-fledged Grid fulfilling the Grid
    vision
  • Pragmatic: what can be done today
  • Research aspect: trying out novel approaches
  • Operation of each Grid Site
  • Huge effort at each computing center for
    installation and operations support
  • Local user support is necessary but not
    sufficient
  • Operation of the Grid as a logical entity
  • Complex management coordination effort among
    Grid centers concerning Grid middleware updates,
    but also policies, trust relationships
  • Grid middleware support through many channels
  • Complex interdependencies of Grid middleware

See talk in the afternoon
22
Outline
  • The Grid as a means of transparent data access
  • Current mode of operations at CERN
  • Elements of Grid data access
  • Current capabilities of the EU DataGrid/LCG-1
    Grid infrastructure
  • Outlook

23
Grid Data Access Elements: Storage and Transfer
  • Grid Storage Resource Manager
  • Managed storage
  • GSI enabled
  • Mass Storage System interface
  • Grid accessible Relational Database
  • Data transfer mechanism between sites in place

(Diagram labels: Site 1, Site 2, Site 3)
24
Grid Data Access Elements: I/O
  • Transfer protocols (gsiftp, https, scp)
  • File System and/or POSIX I/O for direct
    read/write of files from Worker Nodes
  • SQL or equivalent interface to relational data
    from Worker Nodes

(Diagram labels: Storage; POSIX or SQL Interface; Transfer Protocols)
25
Grid Data Access Elements: Catalogs
  • Grid Data Location Service
  • Find location of all identical copies (replicas)
  • Metadata Catalogs
  • File/Object specific metadata
  • Logical names
  • Collections
  • Grid Database Object Catalog

26
Higher Level Data Management Services
  • Customizable pre- and post-processing services
  • Transparent encryption and decryption of data for
    transfer and storage
  • External catalog updates
  • Optimization Services
  • Location of the best replica based on access
    cost
  • Active preemptive replication based on usage
    patterns
  • Automated replication service based on
    subscriptions
  • Data Consistency Services
  • Data Versioning
  • Consistency between replicas
  • Reliable data transfer service
  • Consistency between catalog data and the actual
    stored data
  • Virtual Data Services
  • On-the-fly data generation
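
The consistency check between catalog data and the actual stored data can be sketched roughly as follows. This is only an illustration in Python, assuming the catalog records a file size and an MD5 checksum for each replica; the names `CatalogEntry`, `summarize`, and `is_consistent` are hypothetical and not part of any EDG/LCG interface.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class CatalogEntry:
    """Replication metadata as a catalog might record it (hypothetical layout)."""
    size: int
    md5: str


def summarize(chunks: Iterable[bytes]) -> CatalogEntry:
    """Compute size and MD5 over a stream of byte chunks read from storage."""
    digest = hashlib.md5()
    size = 0
    for chunk in chunks:
        digest.update(chunk)
        size += len(chunk)
    return CatalogEntry(size=size, md5=digest.hexdigest())


def is_consistent(entry: CatalogEntry, chunks: Iterable[bytes]) -> bool:
    """True if the stored bytes still match what the catalog claims."""
    return summarize(chunks) == entry
```

Taking an iterable of chunks keeps the check independent of how the bytes are read (local file, remote I/O, and so on), which is a design choice of this sketch rather than something the slides specify.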

27
Outline
  • The Grid as a means of transparent data access
  • Current mode of operations at CERN
  • Elements of Grid data access
  • Current capabilities of the EU DataGrid/LCG-1
    Grid infrastructure
  • Outlook

28
File Names
  • GUID: Global Unique IDentifier
  • guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6
  • LFN: Logical File Name
  • lfn:presentation
  • LCN: Logical Collection Name
  • lcn:storage_workshop_presentations
  • SFN: Site File Name
  • sfn://ccgridli02.in2p3.fr/edg/SE/dev/wpsix/higgs/data/123.ppt
  • TURL: Transport URL
  • file:///home/pkunszt/presentation.ppt
  • https://storage.cern.ch/data/pkunszt/file4256.dat
  • rfio://srm.cern.ch/castor/user/p/pkunszt/presentation.ppt
  • gsiftp://pcrd24.cern.ch/data/pkunszt/pfn10_1.ppt
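
To make the naming scheme concrete, here is a minimal Python sketch that classifies a name by its scheme prefix. The function `classify_grid_name` and the set of transport schemes are assumptions for illustration, taken from the examples on this slide; they are not an EDG/LCG API.

```python
# Hypothetical helper: classify a Grid file name by its scheme prefix.
# The scheme table mirrors the slide's examples only.

LOGICAL_SCHEMES = {"guid": "GUID", "lfn": "LFN", "lcn": "LCN", "sfn": "SFN"}
TRANSPORT_SCHEMES = {"file", "https", "rfio", "gsiftp"}


def classify_grid_name(name: str) -> str:
    """Return 'GUID', 'LFN', 'LCN', 'SFN', or 'TURL' for a name string."""
    scheme, sep, rest = name.partition(":")
    if not sep or not rest:
        raise ValueError(f"no scheme prefix in {name!r}")
    scheme = scheme.lower()
    if scheme in LOGICAL_SCHEMES:
        return LOGICAL_SCHEMES[scheme]
    if scheme in TRANSPORT_SCHEMES:
        return "TURL"
    raise ValueError(f"unknown scheme {scheme!r}")
```

For example, `classify_grid_name("lfn:presentation")` yields `"LFN"`, while any of the four transport URLs above is reported as a `"TURL"`.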

29
Data Services in EDG/LCG1: Storage and I/O
  • Grid Storage Element for files
  • Understands SFNs, maps them into TURLs
  • GSI-enabled interface: Storage Resource Manager
    (http://sdm.lbl.gov/srm/documents/joint.docs/SRM.joint.func.design.part1.doc)
  • Support of different MSS backends
  • Support for GridFTP and RFIO (CASTOR)
  • Will be deployed this week in the EDG testbed for
    the first time.
  • GridFTP Server for files
  • Only TURLs (gsiftp://)
  • Also GSI-enabled
  • Only FTP-like functionality, no management
    capabilities
  • Current remote I/O
  • NFS
  • RFIO
  • GridFTP

30
Data Services in EDG/LCG1: Database access
  • Spitfire
  • Thin client for GSI-enabled database access
  • Customizable, API exposed through a Web Service
    (WSDL)
  • Not suitable for large result sets
  • Not used by HEP applications yet

31
Data Services in EDG/LCG1: Replica Location
Service (RLS)
  • Distributed file catalog
  • Stores GUID-to-SFN mappings
  • Stores replication metadata (e.g. file size,
    creator, MD5 checksum) on SFNs
  • Local Catalogs (LRCs) hold the actual name mappings
  • Remote Indices (RLIs) respond with the list of LRCs
    most likely to have an entry for the file
  • LRCs are configured to send index updates to any
    number of RLIs
  • Indexes are Bloom filters
  • Implementation using Web Service technology
  • Scales well
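
The role of Bloom filters in the index can be illustrated with a small Python sketch. It assumes each local catalog summarizes its GUIDs in one filter and the index asks every filter whether it might contain a GUID. The class `BloomFilter`, the function `candidate_catalogs`, the filter size, and the use of SHA-256 for hashing are all assumptions for illustration; the slides do not describe the actual implementation.

```python
import hashlib
from typing import Dict, Iterator, List


class BloomFilter:
    """Compact set summary: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int serves as the bit array

    def _positions(self, item: str) -> Iterator[int]:
        # Derive num_hashes bit positions by salting the item with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: str) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def candidate_catalogs(index: Dict[str, BloomFilter], guid: str) -> List[str]:
    """Names of the local catalogs that probably hold an entry for guid."""
    return sorted(name for name, bloom in index.items()
                  if bloom.might_contain(guid))
```

Because a Bloom filter can report false positives but never false negatives, the index returns catalogs that "most probably" have the entry; the catalog itself must still be queried for the authoritative answer.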

32
Data Services in EDG/LCG1: Replica Metadata
Catalog
  • Single logical service for replication metadata
  • Deployment possible as a high-availability
    service (Web service technology)
  • Possibility of synchronized data on many sites to
    avoid a single site entry point (using underlying
    database technology)
  • Holds Logical File Name (LFN) to GUID mappings
    (aliases)
  • Contains LCN to set-of-GUIDs mappings
    (collections)
  • Holds replication metadata on LFNs, LCNs and
    GUIDs
  • Might hold a small amount of replica-specific
    application metadata: O(10) items
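
As a rough sketch of the mappings listed above, the following Python code models aliases (LFN to GUID) and collections (LCN to a set of GUIDs) with plain dictionaries, and then resolves an LFN to site file names through a separate GUID-to-SFN table. The class and function names are hypothetical, and the in-memory dictionaries stand in for what would really be database-backed services.

```python
from typing import Dict, List, Set


class ReplicaMetadataCatalog:
    """Toy model of the alias and collection mappings (illustration only)."""

    def __init__(self) -> None:
        self._alias_to_guid: Dict[str, str] = {}
        self._collections: Dict[str, Set[str]] = {}

    def add_alias(self, lfn: str, guid: str) -> None:
        self._alias_to_guid[lfn] = guid

    def add_to_collection(self, lcn: str, guid: str) -> None:
        self._collections.setdefault(lcn, set()).add(guid)

    def guid_for(self, lfn: str) -> str:
        # Raises KeyError for an unknown logical file name.
        return self._alias_to_guid[lfn]

    def guids_in(self, lcn: str) -> List[str]:
        return sorted(self._collections.get(lcn, set()))


def resolve_lfn(rmc: ReplicaMetadataCatalog,
                guid_to_sfns: Dict[str, List[str]],
                lfn: str) -> List[str]:
    """Follow LFN -> GUID -> SFNs; guid_to_sfns plays the replica-location role."""
    guid = rmc.guid_for(lfn)
    return list(guid_to_sfns.get(guid, []))
```

Keeping the alias table separate from the GUID-to-SFN table mirrors the split between the metadata catalog and the replica location service: a file can gain or lose replicas without its logical names changing.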

33
Data Services in EDG/LCG1: Higher Level Services
  • EDG Replica Manager (ERM)
  • Coordinates all replication services
  • Replica Optimization Service (ROS)
  • Relies on network monitoring (iperf) between
    testbed sites
  • Relies on the Storage Element access cost method
    (estimated time to stage a file)
  • Summarizes network costs for generic access
    requests
  • Allows the Replica Manager to choose the best replica
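
One way to picture the replica choice is as a minimum over estimated access times. The Python sketch below assumes a simple cost model, staging time plus file size divided by measured bandwidth; that model, the `ReplicaCost` fields, and the function names are assumptions for illustration and not the ROS algorithm itself.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ReplicaCost:
    sfn: str
    bandwidth_mbit_s: float  # e.g. from network monitoring between sites
    staging_seconds: float   # Storage Element estimate of time to stage


def estimated_seconds(cost: ReplicaCost, file_size_mbit: float) -> float:
    """Assumed cost model: staging delay plus transfer time."""
    if cost.bandwidth_mbit_s <= 0:
        raise ValueError("bandwidth must be positive")
    return cost.staging_seconds + file_size_mbit / cost.bandwidth_mbit_s


def choose_best_replica(costs: Iterable[ReplicaCost],
                        file_size_mbit: float) -> ReplicaCost:
    """Pick the replica with the lowest estimated access time."""
    return min(costs, key=lambda c: estimated_seconds(c, file_size_mbit))
```

Under this model the best replica depends on file size: a replica already on disk behind a slow link can win for small files, while a replica that must first be staged but sits behind a fast link can win for large ones.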

34
Replication Services Interactions
35
Outlook
  • We have taken only the first step on the long road
    to fulfilling the Grid vision
  • Promising initial results, but a lot of work still
    needs to be done
  • Industry has only recently joined the Grid
    community through GGF; industrial-strength
    middleware solutions are not available yet
  • By the end of this year we'll have a first
    experience with a Grid infrastructure that was
    built for production from the start (LCG1).

36
Open Grid Services Architecture
  • OGSA is a framework for a Grid architecture based
    on the Web Service paradigm.
  • Every service is a Grid Service. The main
    difference to Web Services is that Grid Services
    may be stateful services.
  • These Grid Services interoperate through
    well-understood interfaces.
  • The reference implementation, Globus Toolkit 3, is
    still in beta; the first release is expected
    this summer.
  • Depending on the evolution until the end of the
    year, we will see whether OGSA becomes stable
    enough to be considered for integration into LCG
    next year.

37
  • Thank you for your attention