Title: Using distributed resources in STAR
1. Using distributed resources in STAR
An overview of our tools, architecture and experience
Jérôme Lauret, Gabriele Carcassi, Richard Casella, Eftathiadis Eftratios, Eric Hjort, Doug Olson, Jeff Porter
Jérôme Lauret, CHEP03, March 2003
2. STAR overview
- The STAR collaboration
  - 450 members, 45 institutions (3 more in a week ...)
  - Large data sets: 33 M events last year (AuAu 200 AGeV), 55 M events this year so far and only halfway through (150 M planned)
  - Represents Pbytes of data per year (total) and growing ...
  - A demanding user community: data mining is done in a few months ... and users also want to cross-compare with past years' data sets ...
  - 6 to 7 FTE for the entire computing staff
- Data reduction: pass 1 gives a reduction factor of 5 to 10, micro-DST an extra gain of 2 to 5; typical pass 1 storage size is 13 TB (last year)
- About 20 TB of centralized storage available
- 2 main facilities / processing centers: RCF/BNL and PDSF/NERSC
3. STAR overview
- 137,000 pads, 70 M pixels
- If zero-suppressed: 10 M pixels
- Event size comparable to an image taken by a digital camera ...
4. Getting to Distributed Disk ...
- The situation then ...
  - NFS resident storage did not scale well ( issue)
  - IO bottlenecks reduce overall farm utilization efficiency
  - Some storage on distributed disk (DD, local to our processing nodes): 15 TB actually ... very tempting ...
- Not GRID but
  - Problem(s) are the same ...
  - Sociological: accessing data that is not visible from the user standpoint (nodes are non-interactive). Cannot use find, wildcard ls or recursive scanning scripts ...
  - Adiabatic changes needed (Physics first)
  - Infrastructure: accurate inventory, easy access with a minimal amount of knowledge from the user, rapid / real-time deployment of data sets
  - Analysis assumed to be statistically and data set driven
5. Where we need(ed) to go ...
- Needed
  - A good and efficient File Catalog / Replica Catalog
  - A user interface to submit jobs on the Grid ...
  - Maximizing the usage of the 2 STAR main processing sites (implies resource brokering, monitoring ...)
  - Other components needed (VO management, ...)
  - The tools to distribute the data around but also to make the results available back to the users: HRM / DRM
6. What are RMs ?
Hierarchical Resource Managers / Storage Resource Managers
- Grid middleware developed by the Scientific Data Management Resource group (SDMR) in collaboration with STAR
- Includes DRM, TRM, HRM
- Software handling the data transfer for you ...
- http://sdm.lbl.gov/indexproj.php?ProjectID
- SRM Talk by Alex Sim, 450 Center Hall 115
- Will not go through the details ... It works great !! We like it ...
- Looking into improving and expanding transfer capabilities ...
- Poster P6: Design of a High Performance Data Replication in the Grid Environment for the STAR Collaboration (Dantong Yu)
7. File Cataloging
- STAR had a file Catalog ... in flat table format, MySQL back-end
- Currently, 1.7 M entries (logical names) and 2.4 M replicas
- Queries became problematic at 800k entries (of the order of 10 seconds)
Needed a new (temporary) approach
- Could support millions of entries
- Would not necessarily be centralized ...
- Needed to contain the information we had before: complex MetaData containing information about
  - The Run: magnetic field, collision, detector configuration, trigger setup
  - The Production: conditions, library version ...
- MUST support our distributed disk approach ...
Nothing on the market at the time
8. FileCatalog
- Schedule
  - First proof-of-principle in 2001 ... (Nikita Soldatov)
  - Pushed it in early 2002 (had to fight some momentum) (Adam Kisiel)
  - In service for more than a year
  - The DD program made it an essential tool, breaking the first user psychological barrier (enforced end of 2002)
- Basic design relies on
  - MetaData in separate tables we call dictionaries
  - A main table for the logical name: FileData
  - One table holding the FileLocations, which includes site (BNL, LBL, ...), node, type of storage (NFS, HPSS, local ...)
9. FileCatalog - Design
Storage, site, node and path form the unique key for FileLocations.
[Diagram: table relations. Meta Data side: RunParams, Production Conditions, FileTypes, FileData. Locations / Replicas side: File Locations, Storage Types (HPSS, NFS, local), Storage Sites. Tables are linked by 1..N and N..1 relations.]
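The table layout on this slide can be illustrated with a minimal sketch. Everything concrete below is an assumption made for illustration: the column names, the use of SQLite in place of the MySQL back-end, treating `path` as the full file path, and the helper names `build_catalog` and `add_replica`. Only the table names and the unique key (storage, site, node, path) come from the slide.

```python
import sqlite3

# Illustrative schema only: column names are assumptions, and SQLite
# stands in for the MySQL back-end mentioned in the talk.
SCHEMA = """
CREATE TABLE StorageTypes (storageTypeID INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE StorageSites (storageSiteID INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE FileData (fileDataID INTEGER PRIMARY KEY, logicalName TEXT UNIQUE);
CREATE TABLE FileLocations (
    fileLocationID INTEGER PRIMARY KEY,
    fileDataID     INTEGER NOT NULL REFERENCES FileData(fileDataID),
    storageTypeID  INTEGER NOT NULL REFERENCES StorageTypes(storageTypeID),
    storageSiteID  INTEGER NOT NULL REFERENCES StorageSites(storageSiteID),
    node           TEXT NOT NULL,
    path           TEXT NOT NULL,
    availability   INTEGER NOT NULL DEFAULT 1,
    UNIQUE (storageTypeID, storageSiteID, node, path)
);
"""


def build_catalog():
    """Create an empty in-memory catalog with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def _dictionary_id(conn, table, id_column, name):
    # table and id_column are fixed strings from this module, never user input.
    conn.execute(f"INSERT OR IGNORE INTO {table} (name) VALUES (?)", (name,))
    row = conn.execute(
        f"SELECT {id_column} FROM {table} WHERE name = ?", (name,)
    ).fetchone()
    return row[0]


def add_replica(conn, logical_name, storage, site, node, path):
    """Register one replica; return False if that location already exists."""
    conn.execute(
        "INSERT OR IGNORE INTO FileData (logicalName) VALUES (?)", (logical_name,)
    )
    file_id = conn.execute(
        "SELECT fileDataID FROM FileData WHERE logicalName = ?", (logical_name,)
    ).fetchone()[0]
    type_id = _dictionary_id(conn, "StorageTypes", "storageTypeID", storage)
    site_id = _dictionary_id(conn, "StorageSites", "storageSiteID", site)
    try:
        conn.execute(
            "INSERT INTO FileLocations "
            "(fileDataID, storageTypeID, storageSiteID, node, path) "
            "VALUES (?, ?, ?, ?, ?)",
            (file_id, type_id, site_id, node, path),
        )
    except sqlite3.IntegrityError:
        return False  # unique key (storage, site, node, path) already present
    return True
```

In this sketch a new replica only adds a row to FileLocations, while the logical name is stored once in FileData and the storage and site names live in small dictionary tables.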
10. FileCatalog - Learning experience
- MySQL based
  - Avoid table locking and index rebuilds as much as possible
  - INSERT in tables with un-referenced index ID is delayed
  - All UPDATEs are low priority (delayed and on the stack)
  - Instead of deleting an entry, we mark FileLocations availability=0
  - A replication only adds an entry in FileLocations (optimized)
  - Supports ancestry (actually needed)
- Regression test
  - 150K files distributed on local disks over 100 nodes
  - No problem with simultaneous connections
  - Takes less than 10 seconds to get the records, check all files and update the Catalog ...
  - Most of the time is spent in if ( ! -e file ) nop
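The "mark instead of delete" rule and the file existence check from the regression test could look roughly like the sketch below. It is not the production code (which the talk describes as Perl on MySQL): SQLite is used as a stand-in, the one-table layout is simplified, and the names `make_catalog` and `spider` are hypothetical.

```python
import os
import sqlite3


def make_catalog(paths):
    """Build a tiny in-memory FileLocations table (simplified, assumed layout)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE FileLocations ("
        "path TEXT PRIMARY KEY, availability INTEGER NOT NULL DEFAULT 1)"
    )
    conn.executemany(
        "INSERT INTO FileLocations (path) VALUES (?)", [(p,) for p in paths]
    )
    return conn


def spider(conn, exists=os.path.exists):
    """Mark missing files availability=0 (no DELETE); return how many changed."""
    rows = conn.execute(
        "SELECT path FROM FileLocations WHERE availability = 1"
    ).fetchall()
    # The existence test is the per-file cost the slide refers to.
    missing = [(path,) for (path,) in rows if not exists(path)]
    conn.executemany(
        "UPDATE FileLocations SET availability = 0 WHERE path = ?", missing
    )
    conn.commit()
    return len(missing)
```

Keeping the row and flipping a flag avoids a delete on an indexed table, which fits the stated goal of avoiding locking and index rebuilds; the `exists` parameter is there only so the check can be replaced in tests.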
11. Distributing files ... and Catalog
[Diagram labels]
- DataCarousel
- Server: interacts with HPSS, sorting, threads, restores files on cache
- Client: script adds records
- Pftp on local disk
- Control Nodes
- FileCatalog Management: update FileLocations, mark un-available, spider and update
12. Distributing files ... and Catalog
- File distribution has been in production for 6 months and is sturdy ...
  - Files are added to the Catalog as they are produced and the disk is populated
  - Spidering done on demand, ucheck once an hour, full check only once a day (stable)
- Lessons learned
  - Turns out that not all analyses are statistically driven
  - Automate population ?? Magic algorithm ??
  - Best bet is to look at users' usage patterns. Mixed technology ??
  - Resource sharing / resource blocking
  - Facilities are shared. More replicas ?? Pre-emption ??
  - Smarter distribution, re-distribute (Mercedes Lopez Noriega)
  - API: a complete one in perl/DBI but only a partial C interface (does not access the full complexity). Should have tried to use schemas ...
13. Catalog - Web Front end
14. FileCatalog - What's next ?
- Test Distributed Catalog in real life (and spare time) ...
  - Master-Slave replication. Extensive experience here
  - At any given site, will have N Catalogs, 1 slave per site
  - One Master-Copy (will) contain merged records
  - Interface for connections already set (XML, one API)
- For now
  - Deploy at PDSF (on-going)
  - WAIT for a replacement: a GRID-aware viable solution
  - or 1 FTE
Good enough for now and for
15. The STAR Scheduler / Resource Broker (Poster P9)
- For details, MUST see Poster P9 ... (Gabriele Carcassi)
- Fully interfaced with our FileCatalog
- Flexible XML U-JDL, also provides hand-shaking with (any) db
- The scheduler
  - Interfaces with the FileCatalog; the Query Resolver implementation is modular
  - Splits the job into N sub-jobs according to where the files are OR where a resource is available ...

<?xml version="1.0" encoding="utf-8" ?>
<job>
  <command>root4star -q -b myMacro.C\(\"$FILELIST\"\)</command>
  <stdout URL="file:/star/u/xxx/work/$JOBID.out" />
  <input URL="catalog:star.bnl.gov?production=P03ia,storage=local,filetype=daq_reco_MuDst" nFiles="2000" />
  <output fromScratch="*.root" toURL="file:/star/u/xxx/work/" />
</job>
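The "split the job into N sub-jobs according to where the files are" step can be sketched as a simple grouping by node. This is only an illustration of the idea, not the scheduler's actual algorithm: the function name `split_job`, the `(node, path)` input shape and the `max_files_per_job` cap are all assumptions, and the alternative of splitting by resource availability is not covered.

```python
from collections import defaultdict


def split_job(file_locations, max_files_per_job):
    """Group (node, path) pairs into sub-jobs that each stay on one node.

    Returns a list of (node, [paths]) tuples, ordered by node name, with at
    most max_files_per_job files per sub-job.
    """
    by_node = defaultdict(list)
    for node, path in file_locations:
        by_node[node].append(path)

    sub_jobs = []
    for node in sorted(by_node):
        paths = by_node[node]
        # Chunk each node's file list so that no sub-job grows too large.
        for start in range(0, len(paths), max_files_per_job):
            sub_jobs.append((node, paths[start:start + max_files_per_job]))
    return sub_jobs
```

Each resulting sub-job would then be dispatched to the node holding its files, so that the files are read from local disk rather than over NFS.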
16. Scheduler Architecture (Poster P9)
[Diagram labels: MySQL Server, Perl Module, Ganglia, MDS]
Does not yet address how files are returned to users, if returned. Next ...
17. Ganglia / MDS
- Ganglia: a distributed monitoring system ... http://ganglia.sourceforge.net/
- Ganglia information / MDS
  - MDS: Monitoring and Discovery Service
  - Why ? Security issues ... Not adequate for cross-site propagation. Mesh of dependencies may become complex
- Phase 1 done (Eftathiadis Eftratios)
  - Provider vs Schema issues debugged and now works
  - Information pushed into MDS and checked
- Needs much, much more work to be usable for resource brokering
  - publishing delay issues
  - service availability issue
18. Speaking of security ...
- Before long, we will need to address MySQL security issues (data integrity)
- Not only a FileCatalog issue: STAR has ALL Calibration in MySQL and already 10 mirrors. Millions of records, 10x GB of (reduced) data
- MySQL 4.x - X509 certificates ... Being investigated (Richard Casella, Jeff Porter)
- Strategy for now: what is really there and what works
- What we need is encrypted database replication
- Should investigate GT3/OGSA
Collaborative effort needed and welcomed ...
19. Conclusion
- We have learned and gained a lot ...
  - Nicely preparing our users for the Grid (local scale distributed resources, fixed U-JDL) without them noticing it.
  - We learned lessons ourselves on resource sharing / blocking
  - In a position to refine our resource brokering
- We are ready for component swapping
  - STAR Scheduler components can be replaced by Grid middleware
  - Submission to Condor-G tested; more experience in the months to come
  - Waiting for a stable Replica Catalog (but not stuck)
- Last but not least: learning to work with one another, its merits and limitations