1
Integration of Large File Store and HSM
  • Michael Ernst
  • Fermilab
  • May 29, 2003

2
The Place
The Machine
The Detectors
3
A Stack of Accelerators
4
(No Transcript)
5
A particle passes through the detector, collides with atoms, and
kicks out electrons. The electrons are attracted to the nearest
positive wire. The electric pulse on the wire is amplified and sent
to a computer. From the position of the wire and the arrival time of
the signal, the computer reconstructs the position of the
collision(s).
6
The CMS Experiment
7
CMS has adopted a distributed computing model to perform data
analysis, event simulation, and event reconstruction, in which
two-thirds of the total computing resources are located at regional
centers.
The unprecedented size of the LHC collaborations and the complexity
of the computing task require that new approaches be developed to
allow globally distributed physicists to participate efficiently.
8
Shared Resources (ATLAS and CMS)
9
Size of US CMS Tier1 Regional Center
10
Data Handling Problems and Opportunities
  • Data volumes are huge; individual data products are 1 kB to 1 MB
    in size
  • Workloads are dominated by reading
  • Stepwise refinement of algorithms and event selection functions
    leads to repeatedly running over the same input dataset
  • Because of the large data reduction factor in CMS physics
    analysis, the input datasets in such analysis efforts will
    represent very sparse subsets of the set of events over a period
    of detector running (with sparseness increasing as the analysis
    effort progresses).
  • A system that uses a fixed partitioning of data product values
    over large files, and stages all needed data by staging all of
    those large files, is too inefficient (see the illustration after
    this list)
  • This leads to the creation of new files (copying sparse subsets
    of data product values into new files, which could be as bad as
    doing it manually)
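As a rough illustration (the numbers here are assumed for the
example, not taken from the talk): if an analysis selects about 1%
of the events in a running period and those events are scattered
across all of the large files, staging whole files moves on the order
of 100 times more data than the analysis actually reads.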

11
CMS is working on
  • CMS is currently in the process of developing the
    Data Model
  • Data Storage and Data Access are the most
    demanding problems
  • The choice of OS and Persistency solution can strongly influence
    the hardware needs (and the human resources required to support
    them)
  • Moving away from a Persistency Model based on
    OODB
  • Problem: mapping Objects to Files
  • New Model should be developed with focus on
    optimization of the underlying storage
    architecture and storage technology
  • Classic Filesystems are at the limit of their scaling
    capabilities

12
Requirements
  • Offline (Batch) Analysis of MC Data
  • Total volume: 10 TB now, growing to 100 TB in 2005
  • Subset of O(1 TB < size < 10 TB) analyzed at a given time
  • > 500 simultaneous streams @ O(0.1 MB/s < bandwidth < 1 MB/s)
    per stream
  • Interactive Analysis
  • Space/Bandwidth ratio: 2 TB / 200-300 MB/s
  • 100 Physicists working simultaneously
  • Analysis performed on desktops with 100BaseT connections
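Taken together, these figures imply an aggregate batch bandwidth of
roughly 50 to 500 MB/s (500 streams at 0.1 to 1 MB/s each), and the
interactive figure of 200 to 300 MB/s against 2 TB of cache space
corresponds to turning over the full cache roughly every two to
three hours.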

13
(Diagram: HSM; 10 TB dCache; Production/Analysis Batch; shared work
space; AFS with user home areas and executables; Interactive)
14
(Diagram: HSM; 100 TB dCache serving Farms and Desktops;
Production/Analysis Batch; NAS Server with user home areas,
executables, and shared workspace; Interactive)
15
User access to FNAL Facilities

(Diagram: Network; Objects; Large Filestore (Caching); HSM)
16
System Architecture
(Diagram: Production Cluster (150-1500 nodes); ESnet (OC12, DF to
StarLight); Cisco 6509; US-CMS Testbed; R&D; HSM (17 drives, shared);
NAS; dCache (> 7 TB); User Analysis)
17
Unified Storage Interface
(Diagram: unified storage interface layered between CMS-specific
components and the HSM)
18
Integration of Large Fileserver with HSM
  • Required Fileserver Components
  • Internal Locking: handle file states, state changes, and
    conflicts (between migration and client access)
  • Separation of File Data Management and File
    Namespace Management (Metadata Handling)
  • Ability to handle generic file metadata for any
    file object
  • Store bitfile-id and other data in persistent
    relation to the file object
  • Support for versioning of migrated data
  • Space Management Components
  • Agent looking for candidates to be migrated (controls the
    migration/recall operations)

  • Admin Interface to control the migration behavior (rule-based
    low/high-watermark system); a sketch follows this list
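To make the watermark rule concrete, here is a minimal sketch of such
an agent in C. The hook names (pool_usage, pick_candidate, hsm_store,
record_bitfile_id, release_disk_copy) and the watermark values are
assumptions for illustration, not the interface of any particular
fileserver or HSM; they are implemented as stand-in stubs so the
sketch compiles and runs.

/*
 * Sketch of a rule-based low/high-watermark migration agent.
 * pool_usage(), pick_candidate(), hsm_store(), record_bitfile_id()
 * and release_disk_copy() are illustrative stubs, not a real API.
 */
#include <stdio.h>

#define HIGH_WATERMARK 0.90   /* start migrating above 90% pool usage */
#define LOW_WATERMARK  0.75   /* stop once usage falls below 75%      */

static double usage = 0.95;                 /* stand-in pool statistic */

static double pool_usage(void) { return usage; }

/* stand-in: a real agent would pick, e.g., the least recently used file */
static const char *pick_candidate(void) { return "/pool/data/run001.evt"; }

/* stand-in for the HSM store call; returns a bitfile-id on success */
static int hsm_store(const char *path, char *bitfile_id, size_t len)
{
    snprintf(bitfile_id, len, "bfid:%s", path);
    return 0;
}

/* keep the bitfile-id in persistent relation to the file object */
static void record_bitfile_id(const char *path, const char *bfid)
{
    printf("migrated %s -> %s\n", path, bfid);
}

/* free the disk copy; in this stub it simply lowers the usage figure */
static void release_disk_copy(const char *path)
{
    (void)path;
    usage -= 0.05;
}

int main(void)
{
    /* one evaluation pass; a real agent would loop and sleep */
    if (pool_usage() > HIGH_WATERMARK) {
        while (pool_usage() > LOW_WATERMARK) {
            const char *path = pick_candidate();
            char bfid[128];

            if (!path || hsm_store(path, bfid, sizeof bfid) != 0)
                break;
            record_bitfile_id(path, bfid);
            release_disk_copy(path);
        }
    }
    return 0;
}

The point is the rule itself: start migrating above the high mark,
keep migrating until the low mark is reached, and keep the returned
bitfile-id in persistent relation to the file object so the data can
be recalled later.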

19
Integration of Large Fileserver with HSM
  • Interface Scenarios
  • Scenario 1
  • File-copy style application internal to the Fileserver,
    interfacing with the HSM (a minimal sketch follows this list)
  • POSIX compliant I/O calls
  • TCP Data Connections between Fileserver and HSM
  • UDP/TCP/RPC based control connections (depending
    on HSM)
  • The requirements likely demand a POSIX-compliant OS
  • The Fileserver Core needs to support the following
    Components/Services
  • I/O to/from the file data to be migrated/recalled/removed
  • Transparent to end-user
  • File metadata is not affected by migration
    operation (unchanged)
  • Access to Metadata related to File Object

  • The strict relation between File Object and HSM Metadata is
    managed by the Fileserver Components
  • Network programming interface (i.e. socket
    interface, ONC RPC)
  • HSM requirements

  • Control and Data Communication through standard
    network protocols (i.e. TCP/IP)
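As a concrete illustration of Scenario 1, the sketch below copies one
file's data to the HSM over a TCP data connection using plain POSIX
I/O, leaving the file's own metadata untouched. The mover host name,
port, and the absence of any framing protocol are assumptions made
for the example, not the control/data protocol of any real HSM.

/*
 * Scenario 1 sketch: file-copy style migration inside the fileserver,
 * POSIX I/O on the file and a TCP data connection to the HSM mover.
 * Host, port and (absent) framing are assumptions, not a real HSM protocol.
 */
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <netdb.h>
#include <sys/socket.h>

#define HSM_MOVER_HOST "hsm-mover.example.org"   /* hypothetical endpoint */
#define HSM_MOVER_PORT "33000"                   /* hypothetical port     */

static int connect_to_mover(void)
{
    struct addrinfo hints = {0}, *res;
    int fd;

    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(HSM_MOVER_HOST, HSM_MOVER_PORT, &hints, &res) != 0)
        return -1;
    fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
    if (fd >= 0 && connect(fd, res->ai_addr, res->ai_addrlen) != 0) {
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}

/* stream the file's data to the HSM; the file metadata is untouched */
static int migrate_file(const char *path)
{
    char buf[65536];
    ssize_t n;
    int rc = 0;
    int in = open(path, O_RDONLY);
    int out = connect_to_mover();

    if (in < 0 || out < 0) {
        if (in >= 0) close(in);
        if (out >= 0) close(out);
        return -1;
    }
    while ((n = read(in, buf, sizeof buf)) > 0) {
        /* a real mover would also handle partial writes and retries */
        if (write(out, buf, (size_t)n) != n) {
            rc = -1;
            break;
        }
    }
    if (n < 0)
        rc = -1;
    close(in);
    close(out);
    return rc;
}

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <file-to-migrate>\n", argv[0]);
        return 1;
    }
    return migrate_file(argv[1]) == 0 ? 0 : 1;
}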


20
Integration of Large Fileserver with HSM
  • Interface Scenarios
  • Scenario 2
  • The application accessing the HSM runs outside the fileserver
    (a minimal sketch follows this list)
  • All Communication is done through the network.
  • Data and Metadata access through the supported
    Network FS Protocols (i.e. NFS)
  • Requirements on Fileserver
  • Export of a separate namespace which allows metadata access
    (virtual files) and access to the 'true' file object metadata
    (i.e. the real filesize), in contrast to the unchanged file
    metadata the user sees on the client system
  • Ability to trigger store/restore/remove operations (i.e. through
    special files or file contents) and pass the result/completion
    code back to the fileserver
  • Requirements on HSM
  • Programming Interface to build custom application
    for HSM access or
  • Toolkit supporting the requested functionality (might require
    'glue' code, i.e. shell script, TCL, Perl, etc.)
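As a concrete illustration of Scenario 2, the sketch below triggers a
store operation from outside the fileserver by writing a command into
a special file on an NFS mount and then polling a virtual result file
for the completion code. The control namespace layout (.hsm/command,
.hsm/result) and the command syntax are invented for this example; a
real fileserver would define its own special files and semantics.

/*
 * Scenario 2 sketch: an application outside the fileserver triggers
 * store operations through special files exported over NFS.
 * The ".hsm/command" / ".hsm/result" layout is invented for the example.
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int request_store(const char *nfs_mount, const char *relpath)
{
    char cmd_path[512], res_path[512], line[128];
    FILE *f;

    snprintf(cmd_path, sizeof cmd_path, "%s/.hsm/command", nfs_mount);
    snprintf(res_path, sizeof res_path, "%s/.hsm/result", nfs_mount);

    /* 1. trigger the store by writing a command into the special file */
    f = fopen(cmd_path, "w");
    if (!f)
        return -1;
    fprintf(f, "store %s\n", relpath);
    fclose(f);

    /* 2. poll the virtual result file for the completion code */
    for (int i = 0; i < 60; i++) {
        f = fopen(res_path, "r");
        if (f) {
            char *got = fgets(line, sizeof line, f);
            fclose(f);
            if (got && strncmp(line, "done", 4) == 0)
                return 0;                     /* store completed */
            if (got && strncmp(line, "error", 5) == 0)
                return -1;                    /* store failed    */
        }
        sleep(5);                             /* still pending   */
    }
    return -1;                                /* timed out       */
}

int main(void)
{
    /* example: request migration of one file on an NFS-mounted pool */
    const char *mount = "/mnt/pool";          /* hypothetical mount point */
    const char *file  = "data/run001.evt";    /* hypothetical file        */

    if (request_store(mount, file) == 0) {
        printf("store of %s completed\n", file);
        return 0;
    }
    fprintf(stderr, "store of %s failed or timed out\n", file);
    return 1;
}

In practice this kind of gateway is where the 'glue' code mentioned
above (shell script, TCL, Perl) tends to live; the C version is shown
only to keep the examples in one language.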