Title: Status GridKa
1 Status GridKa / ALICE T2 in Germany
- Kilian Schwarz
- GSI Darmstadt
2 ALICE T2
- Present status
- Plans and timelines
- Issues and problems
3 Status GridKa
- Pledged 600 kSI2k, delivered 133; 11% of ALICE jobs (last month)
(Plot: ALICE jobs at FZK and CERN, last month)
4 GridKa main issue
- Resources are provided according to the megatable
- The share among the Tier1s follows automatically when considering the Tier2s connecting to this Tier1
- GridKa pledges for 2008: tape 1.5 PB, disk 1 PB
- Current megatable: tape 2.2 PB !!!
- Much more than pledged, more than all other experiments together; most of the additional demand is due to the Russian T2 (0.8 PB)
- The point is that the money is fixed. In principle a switch between tape/disk/CPU should be possible, though not on short notice. For 2009 things can still be changed.
5 GridKa: one more issue
- Disk cache in front of the mass storage: how to compute this value?
- Suggestion:
- It depends strongly on the ALICE computing model, therefore the formula to compute it should be the same for all T1 centres (a sketch of such a formula follows below).
- The various parameters in the formula should be defined by the individual sites, according to the actual MSS implementation (dCache, DPM, xrootd, ...)
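A minimal sketch of what such a common formula could look like, under the assumption that the cache simply has to buffer the aggregate tape traffic for the time a file resides on disk; the function and all numbers are illustrative and not taken from the ALICE computing model:

```python
# Minimal sketch, not the agreed ALICE/T1 formula: size the disk cache so it
# can hold the aggregate tape traffic for the time a file stays on disk.
# All names and numbers are illustrative assumptions.

def disk_cache_tb(write_mb_s, read_mb_s, residency_hours):
    """Cache size in TB needed to buffer tape writes and reads."""
    total_mb = (write_mb_s + read_mb_s) * residency_hours * 3600
    return total_mb / 1e6  # MB -> TB

# Example with made-up site parameters: 60 MB/s migration to tape,
# 40 MB/s staging back, files kept 24 h on disk.
print(round(disk_cache_tb(60, 40, 24), 1), "TB")  # -> 8.6 TB
```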
6 ALICE T2 present status
(Diagram: ALICE T2 setup at GSI. CERN and GridKa are reached via Grid over a 150 Mbps link; vobox and LCG RB/CE; ALICE::GSI::SE (xrootd) with 30 TB (plus 120 TB new) on fileservers; GSI batch farm with 39 nodes / 252 cores for ALICE, of which GSIAF uses 14 nodes; directly attached disk storage (55 TB) as ALICE::GSI::SE_tactical (xrootd) for PROOF/batch.)
7 Present status
- ALICE::GSI::SE (xrootd)
- > 30 TB disk on fileservers (8 FS of 4 TB each)
- 120 TB disk on fileservers
- 20 fileservers, 3U, 15 x 500 GB disks each, RAID 5
- 6 TB user space per server
- Batch farm/GSIAF and ALICE::GSI::SE_tactical (xrootd)
- Nodes dedicated to ALICE
- 15 D-Grid funded boxes, each with
- 2 x 2-core 2.67 GHz Xeon, 8 GB RAM
- 2.1 TB local disk space on 3 disks plus a system disk
- Additionally 24 new boxes, each with
- 2 x 4-core 2.67 GHz Xeon, 16 GB RAM
- 2.0 TB local disk space on 4 disks including the system disk
(The arithmetic check below ties these numbers together.)
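As a quick cross-check (assuming the "2 x 2-core" and "2 x 4-core" reading of the node specification above), the quoted capacities are mutually consistent:

```python
# Consistency check of the numbers on this slide (no new data, just arithmetic).
old_boxes, old_cores = 15, 2 * 2   # D-Grid funded boxes, 2 x 2-core Xeon
new_boxes, new_cores = 24, 2 * 4   # new boxes, 2 x 4-core Xeon
print(old_boxes + new_boxes)                          # 39 nodes in the farm
print(old_boxes * old_cores + new_boxes * new_cores)  # 252 cores for ALICE

print(20 * 6)   # 20 new fileservers x 6 TB user space = 120 TB
print(8 * 4)    # 8 existing fileservers x 4 TB = 32 TB, i.e. > 30 TB
```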
8 ALICE T2 short term plans
- Extend GSIAF to all 39 nodes
- Study the coexistence of interactive and batch processes on the same machines. Develop the possibility to increase/decrease the number of batch jobs on the fly in order to give priority to analysis (a sketch of the idea follows after this list).
- Add the newly bought fileservers (about 120 TB disk space) to ALICE::LCG::SE (xrootd)
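A minimal sketch of such on-the-fly throttling, assuming a node-local loop that counts PROOF workers and lowers the advertised batch slots accordingly; the slot-setting call is only a placeholder, since the real command depends on the batch system in use at GSI:

```python
# Minimal sketch, not the GSIAF implementation: reduce the number of batch
# slots on a node while interactive PROOF workers are running, so analysis
# gets priority, and restore them when the node is idle again.
import subprocess
import time

TOTAL_SLOTS = 8        # cores per node (assumption)
MIN_BATCH_SLOTS = 2    # never drain batch completely (assumption)

def proof_workers():
    """Count running PROOF worker processes (proofserv) on this node."""
    out = subprocess.run(["pgrep", "-c", "proofserv"],
                         capture_output=True, text=True)
    return int(out.stdout.strip() or 0)

def set_batch_slots(n):
    """Placeholder: the real call depends on the local batch system."""
    print("would set batch job slots to", n)

while True:
    busy = proof_workers()
    set_batch_slots(max(MIN_BATCH_SLOTS, TOTAL_SLOTS - busy))
    time.sleep(60)
```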
9 ALICE T2 medium term plans
- Add 25 additional nodes to the GSI batch farm/GSIAF, to be financed via a 3rd party project (D-Grid)
- Upgrade the GSI network connection to 1 Gb/s, either as a dedicated line to GridKa (a direct T2 connection to the T0 is problematic) or as a general internet connection
10 ALICE T2 ramp up plans
- http://lcg.web.cern.ch/LCG/C-RRB/MoU/WLCGMoU.pdf
11 Plans for the ALICE Tier 2/3 at GSI
- Remarks
- 2/3 of that capacity is for the Tier 2 (ALICE central use, fixed via the WLCG MoU)
- 1/3 is for the Tier 3 (local usage, may be used via Grid)
- According to the ALICE computing model there is no tape for the Tier 2
- Tape for the Tier 3 is independent of the MoU
- Heavy-ion run in October -> upgrade operational in Q3 of each year
12 ALICE T2/T3
- Language definition according to the GSI interpretation: ALICE T2 = central use, ALICE T3 = local use. T3 resources may be used via Grid, but they are not pledged resources.
- Remarks related to ALICE T2/3:
- At the T2 centres are the physicists who know what they are doing
- Analysis can be prototyped quickly with the experts close by
- GSI requires flexibility for optimising the ratio of calibration/analysis/simulation at the Tier 2/3
13 ALICE T2 use cases (see computing model)
- Three kinds of data analysis:
- Fast pilot analysis of the data just collected, to tune the first reconstruction, at the CERN Analysis Facility (CAF)
- Scheduled batch analysis using the Grid (Event Summary Data and Analysis Object Data)
- End-user interactive analysis using PROOF and the Grid (AOD and ESD)
- CERN: does the first-pass reconstruction; stores one copy of RAW, the calibration data and the first-pass ESDs
- T1: does reconstruction and scheduled batch analysis; stores the second collective copy of RAW, one copy of all data to be kept, and disk replicas of ESDs and AODs
- T2: does simulation and end-user interactive analysis; stores disk replicas of AODs and ESDs
14 Data reduction in ALICE
(Figure: data reduction in ALICE; RAW 14 MB/ev, RAW 1.1 MB/ev)
15 Data transfers CERN - GSI
- Motivation: the calibration model and algorithms need to be tested before October
- Test the functionality of the current T0/T1 -> T2 transfer methods
- At GSI the CPU and storage resources are available, but how do we bring the data here?
16 Data transfer CERN - GSI
- The system is not yet ready for generic use; therefore expert control by a mirror master at CERN is necessary.
- In principle individual file transfer works fine now (a sketch of such a single-file copy follows below). The next transfers are planned with Pablo's new collection-based commands, plus a webpage where transfer requests can be entered and the transfer status can be followed up.
- So far about 700 ROOT files have been successfully transferred, corresponding to about 1 TB of data.
- 30 of the newest requests are still pending.
- Maximum speed achieved so far: 15 MB/s (almost the complete GSI bandwidth), but only during a relatively short time.
- Since August 8 no relevant transfers anymore. Reasons:
- August 8: pending xrootd update at the Castor SE
- August 14: GSI SE failure due to network problems
- August 20: instability of the central AliEn services; production comes first
- Up to recently: AliEn update
- GSI plans to analyse the transferred data ASAP and to continue with more transfers. PDC data also need to be transferred for prototyping and testing of the analysis code.
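For illustration, an individual file copy of the kind mentioned above could look like the sketch below; the redirector host names and the file path are placeholders, not the actual CERN Castor or GSI SE endpoints, and Pablo's collection-based AliEn commands are not shown.

```python
# Sketch of a single-file copy between two xrootd SEs with xrdcp.
# Host names and paths are placeholder assumptions.
import subprocess
import time

SRC = "root://castor-xrootd.cern.ch/"   # assumed source redirector
DST = "root://xrootd.gsi.de/"           # assumed GSI SE redirector

files = ["/alice/data/2007/some_run/AliESDs.root"]   # illustrative path

start = time.time()
for f in files:
    subprocess.check_call(["xrdcp", "-np", SRC + f, DST + f])
print(len(files), "files copied in", round(time.time() - start), "s")
```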
17 Data transfer CERN - GSI
18 ALICE T2 problems and issues
- Where do we get our kSI2k values from for the monitoring of CPU usage? Currently from http://www.spec.org/cpu/results, but (e.g. HEPiX, Intel CPUs) the list is not complete and the published performance is not the performance available for typical HEP applications, since the results are optimised with Intel compilers etc. (a rough example of the conversion is given below)
- How to compare the values published by ALICE and by WLCG?
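As a rough illustration of the accounting arithmetic in question (all numbers are made up, not official spec.org, HEPiX or WLCG figures):

```python
# Illustrative arithmetic only: turn a per-core SPECint2000 result into a
# kSI2k factor and normalise the accounted CPU time with it.
specint2000_per_core = 1500.0                   # assumed benchmark result
ksi2k_per_core = specint2000_per_core / 1000.0  # 1 kSI2k = 1000 SI2k units

cpu_hours = 12000.0                             # accounted CPU time (example)
print(round(cpu_hours * ksi2k_per_core), "kSI2k*hours")  # -> 18000
```

The ALICE vs. WLCG comparison then reduces to agreeing on the per-core factor and on the CPU-time definition used in the two accounting systems.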