Title: Proto-GRID at Tevatron: a personal view
1 Proto-GRID at Tevatron: a personal view
- Stefano Belforte
- INFN-Trieste
2 Proto-GRID at Tevatron
- Tevatron now means 2 experiments: CDF and D0
- Running experiments that started 15 years ago; now it is Run2
- Started Run2 with the same structure as Run1
- It works, don't fix it!
- Run2 data = 10x Run1, 5 years later: a piece of cake?
- Instead, CPU needs are 1000x Run1
- Solution: 10x from technology, 100x from brute force (Linux farms)
- Evolution toward full-fledged GRIDs is natural and in progress
- Can't wait for LCG tools to be production quality
- Hence a proto-GRID: build most of the GRID functionality with simpler tools, explore the fastest ways, test effectiveness, user response, cost/benefit ratio...
- Our biggest contribution to LCG will come from our experience, not our designing. So I will only talk about what I really know: CDF, and in particular data analysis.
3 CDF situation
- Motivated by a vast data sample and unexciting code performance, CDF is going to gather computing resources from anywhere: the CDF-GRID
- A project started recently
- From enabling remote institutions to look at some data
- To integrating remote resources in a common framework
- Data Handling is at the core and is now a joint CDF-D0 project
- SAM: Sequential Access through Metadata
- Moves data around (fast and safely, CRC checks enforced) and manages the local disk cache, while keeping track of locations and associating data and metadata (primarily files to datasets)
- Users process hundreds of files in a single job (see the sketch below)
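To make the file/dataset bookkeeping concrete, here is a minimal Python sketch of how a data handler in the spirit of SAM might stage a dataset into a local cache, enforcing a CRC check on each file. It is purely illustrative: the catalog layout, the CRC values and the fetch_file hook are assumptions, not the real SAM interface.

    # Illustrative sketch only: a toy data handler in the spirit of SAM.
    # None of these names come from the real SAM API.
    import zlib
    from pathlib import Path

    # Toy catalog: dataset name -> list of (file name, expected CRC32, known locations)
    DATASET_CATALOG = {
        "bhadronic_example": [
            ("bphys_0001.root", 0x1A2B3C4D, ["enstore://fnal/bphys_0001.root"]),
            ("bphys_0002.root", 0x5E6F7A8B, ["enstore://fnal/bphys_0002.root"]),
        ],
    }

    def crc32_of(path):
        """Compute the CRC32 of a local file, used as the transfer integrity check."""
        crc = 0
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                crc = zlib.crc32(chunk, crc)
        return crc & 0xFFFFFFFF

    def stage_dataset(dataset, cache_dir, fetch_file):
        """Bring every file of `dataset` into the local cache and return the local paths.

        `fetch_file(remote, local)` is a user-supplied transfer function (e.g. a
        wrapper around GridFtp or rootd); a file enters the cache only if its CRC
        matches the catalog entry.
        """
        staged = []
        cache_dir = Path(cache_dir)
        cache_dir.mkdir(parents=True, exist_ok=True)
        for name, expected_crc, locations in DATASET_CATALOG[dataset]:
            local = cache_dir / name
            if not local.exists():                # simple disk cache: reuse if present
                fetch_file(locations[0], local)   # try the first known location
            if crc32_of(local) != expected_crc:   # enforce the CRC check
                local.unlink()
                raise IOError(f"CRC mismatch for {name}, refetch needed")
            staged.append(local)
        return staged

A user job would then simply loop over the returned paths, which is how a single submission can process hundreds of files without listing them by hand.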
4 Why the CDF-GRID? Physics!!
- CDF is increasing the DAQ rate to tape
- Event compression: x2 rate in the same bandwidth
- Increased bandwidth: 20 → 60 MBytes/sec by 2006
- Main motivation is B physics
- Get the most out of the Tevatron
- Increase and saturate L1/L2/L3/DAQ bandwidth
- Many analyses (e.g. Bs) are statistics limited
- This doubles the (already large) needs for analysis computing:
- CPU
- Disk
- Tape drives
- Analysis computing is by far the single most expensive item in CDF computing: 50% of the total cost (2-3 M$/year vs. 1.5 M$ available from Fermilab)
5 Convergence toward a GRID
- Resources from Fermilab are not enough any more
- All MC must be done offsite
- At least 50% of users' analysis has to be done offsite, i.e. at least all of the hadronic-B sample
- Thanks to SVT, lots of data to do beautiful physics
- A huge sample whose size is independent of Tevatron luminosity
- Collaborating institutions want to do more at home
- They have more money, and/or computers are cheaper, and/or they want to spend more locally
- Want to tap into local supercomputer centers (CM, UCSD)
- Want to tap into emerging LHC-oriented computing centers
- Want independence and resource control
- At last it is possible: the WAN is not a bottleneck any more
- No data have moved on tape in/out of FNAL in Run2
6 What to do on the GRID
- Reconstruction: limited need, one site is enough; mostly a logistics/bookkeeping and code robustness problem
- Rare bugs (1 in 10^6 events) slow down the farm significantly
- Monte Carlo: not a very difficult problem, relatively easy to do offsite; a centralized/controlled activity, best limited to a few sites
- Just a matter of money
- Users' analysis: the most demanding in both resources and functional requirements; needs to reach everybody everywhere and be easy, fast, solid and effective
- Still, the most rewarding for users
- The main topic of the following discussion
- Databases: too often forgotten. At the heart of everything there is a DB that keeps track of it all. Very difficult, very unexciting and unrewarding to work on.
7 Users' analysis on the GRID
- Very challenging
- Have to cope immediately and effectively with:
- Authentication/Authorization
- Hundreds of users: fair share, priorities, quotas, short-lived data (a user produces little data at a time, but does it again and again), scratch areas, access to desktops
- Immediate response, robustness, ease of use, diagnostics
- Why does my job not run after 1 hour, 5 hours, 2 days?
- Why did my job crash? It was running on my desktop!
- Full-cycle optimization: no point in making ntuples fast if the desktop cannot process them fast
- Need to do it across many sites: the CDF-GRID
8 Starting point: CAF (CDF Analysis Farm)
- Compile/link/debug everywhere
- Submit from everywhere
- Execute on the CAF
- Submission of N parallel jobs with a single command (see the sketch below)
- Access local data from CAF disks
- Access tape data via transparent cache
- Get job output everywhere
- Store small output on a local scratch area for later analysis
- Access to the scratch area from everywhere
- IT WORKS NOW at FNAL, in Italy and elsewhere
- 10 CAFs all around the world
[Diagram: the user's desktop ("my favorite computer") submits a job that fans out into N jobs on the CAF at FNAL (gateway, dCache, Enstore, SAM, GridFtp, rootd, ftp, scratch server) or on a remote CAF such as INFN (a pile of PCs, local data servers, dCache, NFS, rootd); outputs and log files flow back to the desktop.]
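To make the "N parallel jobs with a single command" idea concrete, here is a small Python sketch of how one submission can be dealt out into N independent sections. It is my own illustration: split_into_sections and submit_one are made-up names, not the real CAF interface.

    # Illustrative sketch: splitting one logical analysis job into N parallel
    # sections, in the spirit of a CAF submission (not the actual CAF tool).
    def split_into_sections(files, n_sections):
        """Deal the input files round-robin into n_sections roughly equal slices."""
        sections = [[] for _ in range(n_sections)]
        for i, f in enumerate(files):
            sections[i % n_sections].append(f)
        return sections

    def submit_all(files, n_sections, submit_one):
        """Submit one batch job per non-empty section.

        `submit_one(section_id, file_list)` stands in for whatever the local
        batch system provides; it is a hypothetical hook, not a CAF API call.
        """
        for sid, section in enumerate(split_into_sections(files, n_sections)):
            if section:                  # skip empty sections when N > number of files
                submit_one(sid, section)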
9 The VISION beyond many CAFs
- Develop/debug the application on a desktop anywhere in the world
- Submit to the CDF-GRID specifying the usual CAF stuff and a dataset
- Data are (pre)fetched if/as needed from the central repository
- DH takes care of striping datasets across physical volumes for optimal performance, load balancing, fault tolerance, etc.
- Users' output data are also stored by DH on limited, recycled disk space for each user (see the sketch at the end of this slide); backup on request, cataloguing, and storing and associating metadata are also provided
- An interactive grid to provide fast (GB/sec) ROOT access to final data
- Organize the GRID around virtual analysis centers, not just regional centers: each site has a copy of one (or more) datasets and supports everybody's analysis on those
- More efficient than everyone having a piece of many datasets
- Forces collaboration: you run my jobs, I run yours
10 From Vision to Reality: POLITICS
- Recently the CDF International Finance Committee received a proposal from the collaboration:
- Move offsite 50% of the foreseen analysis load
- Equivalent to a 0.5-1 M$ contribution every year
- Requires 6 sites tied into a CDF-GRID, each providing at least 100 dual-CPU servers and 20 TB of disk
- Candidates: Italy, UK, Germany, UCSD, Canada, ...
- Good response from the committee; no money committed yet, but most of that hardware is already in the planning
- This means we will build the CDF-GRID and try to get more hardware into it as we go along
- Financing bodies accept the idea that, e.g., hardware bought in Italy for INFN physicists can be expanded and shared with everybody
11 Software
- CAF at FNAL: the basic brick
- dCAFs: CAF clones around the world
- SAM
- Data management on the WAN scale
- Metadata and data file catalog
- Datasets: documented file collections, handled as a single unit
- No tcl files with hundreds/thousands of file names
- dCache: our best (and only) solution for serving up to 100 TB to 1000 nodes without hitting NFS limits
- JIM
- Job brokering across many farms
- From Kerberos to X.509 for authentication
- PEAC (PROOF Enabled Analysis Cluster; PROOF = parallel ROOT)
- CPU need is a series of temporal spikes: how to get the CPU?
- Piggy-back on top of a large batch farm: allow high-priority PROOF to take 10% of the total time, handing a set of nodes to each user who will accept a <1 duty cycle (see the sketch below)
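The bookkeeping behind the piggy-back scheme is simple arithmetic; here is a back-of-the-envelope sketch (my own illustration, not PEAC code; the farm size, share and duty cycles are assumed numbers) showing how handing out whole nodes at a low duty cycle stays within the 10% time share.

    # Back-of-the-envelope check for the "piggy-back on the batch farm" idea:
    # each interactive user borrows whole nodes but only for a fraction of the
    # time (the duty cycle); the time-averaged total must stay under the share
    # granted to PROOF.  Illustrative numbers only, not PEAC code.
    FARM_NODES = 400          # assumed size of the shared batch farm
    PROOF_SHARE = 0.10        # 10% of total time granted to interactive PROOF

    # (nodes borrowed, duty cycle) per concurrent interactive user -- assumptions
    users = [(20, 0.3), (10, 0.5), (30, 0.2)]

    avg_nodes = sum(n * d for n, d in users)      # time-averaged nodes in use
    print(f"time-averaged usage: {avg_nodes:.1f} nodes = "
          f"{avg_nodes / FARM_NODES:.1%} of the farm (cap {PROOF_SHARE:.0%})")
    # -> 20*0.3 + 10*0.5 + 30*0.2 = 17 nodes = 4.2% of 400, within the 10% cap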
12 PEAC
- Initiate sessions in minutes, perform queries in seconds
- 1 GB / 5 sec with PROOF on 10 nodes (demonstrated on a real Bs sample)
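For scale, simple arithmetic on the figures above: 1 GB in 5 sec is an aggregate of roughly 200 MB/sec, i.e. about 20 MB/sec per node across the 10 nodes.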
13 Status
- What works (extremely well)
- CAF + dCAFs
- SAM for data import
- dCache at Fermilab
- What is still in progress
- Usage of dCache outside FNAL
- Integration of SAM and dCache
- A tapeless, redundant dCache pool
- Friendly tools to manage users' data in/out of SAM/dCache/Enstore
- JIM
- How are we doing?
- Reasonably well
- Too slow (as usual), but 2004 will be the year of the CDF-GRID
14 What we are learning
- Real world means the 1st priority is authentication/authorization
- Can't use any tool that does not have a solid and easy-to-use authentication method now
- Do not try to outguess/outsmart users
- Do not look for complete automation; expect some intelligence from users ("shall I run this MC at FNAL or at FZKA?")
- Beware of providing a tool that they will not use
- Be prepared for success: it started just to see if it works, and now no one can live without it, even if it is ugly and not ready; when will we do cleanup and documentation?
- Do not only look at usage patterns in the past/present; try to imagine what they will be with the new tool, and try to figure out how it will affect the daily work of the student doing analysis, our real and only customer
- Give the users abundant monitoring/diagnostic tools, and let them figure out by themselves why their jobs crash (the dCAFs provide top, ls and tail of log files, and gdb; see the sketch below)
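As a flavor of what such self-service diagnostics can look like (a generic sketch, not the actual dCAF monitoring code; the helper name is made up), a small utility that returns the tail of a job's log file:

    # Generic self-service diagnostic sketch: show the last lines of a job's log
    # so the user can see why it is stuck or crashed.  Not dCAF code.
    from collections import deque

    def tail_log(log_path, n_lines=40):
        """Return the last n_lines of a (possibly large) log file as one string."""
        with open(log_path, "r", errors="replace") as f:
            return "".join(deque(f, maxlen=n_lines))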
15 Example: Who needs a resource broker?
- The Vision: submit your job to the grid; the grid will look for resources, run it, and bring back the result ASAP
- The Reality: the real world is complex, and some information just is not on the web. Very difficult to automate educated decisions:
- The farm at site A is full now, but
- I have friends there who will let me jump ahead in priority
- Most jobs are from my colleague X and I know they are going to fail
- Most jobs are from my students and I will ask them to kill them
- The farm in country B is free, but
- I know my colleague Y is preparing a massive MC that will swamp it for weeks starting tonight
- The data I want are not cached at farm C now, so it will take longer if I run there, but
- I know I will run on those data again next week
- Or because that farm has lots of CPU, ...
- What is the point in giving users something that is not as good?
16 More learning
- Sites are managed by people: opinions differ
- Security concerns are different at different places
- Most sites will not relinquish ownership of the system
- Have to make the software work in different environments rather than imposing an environment on users; we cannot distribute a system installation. Let the local sysadmin deal with security patches, ssh versions, the default compiler, etc.
- Live with firewalls, nodes on private networks, constraints on node names and user names, and sharing of the computer farm with other experiments (CDF and D0 cannot both have a user 'sam' on the same cluster)
- Lots of sites do not have full-time system managers dedicated to CDF
- Experiment software installation, operation and upgrade should not require system privileges; they must be doable by users
- This includes much of the CDF-GRID infrastructure
- SAM and CAF are operated by non-privileged users
17 Conclusion
- Never forget that users already have a way to do analysis
- It may be awkward and slow, but it works
- Users' priority is to get results, not to experiment with tools
- New tools have to provide significant advantages
- CDF is using a bottom-up approach: we introduce grid elements without breaking the current working (although saturated) system, looking for just those tools that make analysis easier and letting users decide whether the new tools are better than the old ones
- This makes CDF less cutting-edge from the technical standpoint, but an excellent testing ground for the effectiveness and relative priority of various grid components
- We are a physics-driven collaboration!
- Software improvement is graded in time-to-publication
- We hope LCG learns something from us, while we try to incorporate new tools from them
18 Spare/additional slides
19 Hardware
Projected costs (M$):

             FY03  FY04  FY05  FY06
RECO Farms   0.13  0.19  0.19  0.19
CAF CPU      0.31  0.76  1.16  1.03
CAF Disk     0.34  0.20  0.64  0.56
Tape Robot   0.20  0.27  0.57  0.78
Inter. CPU   0.08  0.12  0.10  0.10
Network + DB 0.37  0.35  0.29  0.22
Total        1.4   1.9   3.0   2.9
FNAL budget  1.5   1.5   1.5   1.5
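A quick derived calculation from the table above (my own reading of the numbers): the gap between the yearly total and the flat FNAL budget is what offsite resources would have to cover.

    # Yearly shortfall implied by the table above: total need minus FNAL budget (M$).
    totals = {"FY03": 1.4, "FY04": 1.9, "FY05": 3.0, "FY06": 2.9}
    fnal_budget = 1.5
    for fy, total in totals.items():
        print(f"{fy}: shortfall {max(total - fnal_budget, 0.0):.1f} M$")
    # -> FY03: 0.0, FY04: 0.4, FY05: 1.5, FY06: 1.4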
20 Politics details
- CDF has recently reviewed (internally) the possibility of upgrading the system (called CSL) that writes data to disk online. It presently peaks at 20 MB/s. The recommended upgrade would be capable of writing up to 40 MB/s and eventually 60 MB/s to disk. The main physics goals associated with this upgrade are to strengthen the B physics program associated with the silicon vertex trigger (SVT), developed largely by Italy (Ristori et al.) with collaboration from the US. The SVT has been very successful and we continue to plan how best to exploit this novel resource. The yield of charm and bottom is limited by the trigger and the rate at which we write data to disk.
- CDFGrid Proposal: CDF will pursue the increased-bandwidth upgrade. This upgrade will increase the charm and bottom physics program of CDF while maintaining the full high transverse momentum program at the highest luminosity. We will pursue a GRID model of computing and are asking our international colleagues to participate in building a world-wide network for CDF analysis. Each country would be welcome to contribute what is practical. Discussions are under way with the Fermilab CD for support for a local GRID team that will facilitate the plan. It is envisioned that this work will be beneficial to CDF and the LHC experiments. Making LHC software and CDF/Fermilab software more GRID friendly is expected to require a large effort.