Title: DISc NIKHEF: Driving Grid Development with Real Applications Kors Bos NIKHEF
1 DISc @ NIKHEF: Driving Grid Development with Real Applications
Kors Bos, NIKHEF
15 October 2004
http://www.nikhef.nl/grid
EGEE is a project funded by the European Union
under contract IST-2003-508833
2 Contents
- The HEP Computing Problem
- Grid research and dissemination activities at NIKHEF
- Community forming
- Large-scale infrastructure operation
- Data Intensive Sciences on the Grid
3 The HEP Computing Problem
- Collect, distribute, and archive data
- Transform these data to useful quantities
- Mine the transformed data looking for interesting
stuff
4 Collect, distribute, and archive data
5 Scales: Data Archival
One of the four LHC detectors: 40 MHz (40 TB/sec)
Online system: multi-level trigger to filter out background and reduce data volume
- Level 1, special hardware: 75 kHz (75 GB/sec)
- Level 2, embedded processors: 5 kHz (5 GB/sec)
- Level 3, PCs: 100 Hz (100 MB/sec)
Data recording and offline analysis
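The rates quoted for the trigger chain can be checked with a little arithmetic. The sketch below is illustrative only: it reads each quoted rate as what remains after the corresponding trigger level (an interpretation of the diagram), assumes decimal units (1 TB = 10^12 bytes), and uses stage labels and function names invented here.

```python
# Rates and bandwidths as quoted on the slide. The stage labels are
# hypothetical names chosen for this sketch; decimal units are assumed.
STAGES = [
    ("detector output", 40e6, 40e12),  # 40 MHz, 40 TB/sec
    ("after level 1", 75e3, 75e9),     # 75 kHz, 75 GB/sec
    ("after level 2", 5e3, 5e9),       # 5 kHz, 5 GB/sec
    ("after level 3", 100.0, 100e6),   # 100 Hz, 100 MB/sec
]


def event_size_bytes(rate_hz, bandwidth_bytes_per_s):
    """Average event size implied by an event rate and a bandwidth."""
    return bandwidth_bytes_per_s / rate_hz


def rejection_factors(stages):
    """Return (stage name, factor by which that stage cuts the event rate)."""
    return [(cur[0], prev[1] / cur[1]) for prev, cur in zip(stages, stages[1:])]


if __name__ == "__main__":
    for name, rate, bandwidth in STAGES:
        size_mb = event_size_bytes(rate, bandwidth) / 1e6
        print(f"{name}: {rate:g} events/sec, {size_mb:g} MB per event")
    for name, factor in rejection_factors(STAGES):
        print(f"{name}: rate reduced by a factor of about {factor:.0f}")
    overall = STAGES[0][1] / STAGES[-1][1]
    print(f"overall reduction: about {overall:.0f}")
```

Under these assumptions every stage implies the same event size of about 1 MB, so the reduction in data volume comes entirely from discarding events, not from shrinking them.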
6 Collect, distribute, and archive data
10 PB
Few Gb/s
Few to tens of GB/s depending on how you do it
7 Transform the data to useful quantities
8 Transform the Data to Useful Quantities
- Place event info on 3D map
- Trace trajectories through hits
- Assign type to each track
- Find particles you want
- Needle in a haystack!
- This is a relatively easy case
9 Already Doing Data Transformation on Global Scale!
10 Scales: Data Transformation
- 90 seconds per event to reconstruct and analyze
- 100 incoming events per second
- Even to keep up, need either
- A computer that is nine thousand times faster, or
- nine thousand computers working together
- It's worse than just keeping up: each event will need to be analyzed several times
- Each event is an independent entity: pleasantly parallel
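The nine-thousand figure follows directly from the two numbers on this slide, and the independence of events is what makes the work easy to spread out. The following is a minimal sketch, not the experiments' software: `passes=3` is an assumed value standing in for "several times", and `reconstruct` is a hypothetical placeholder for real reconstruction code.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def processors_needed(seconds_per_event, events_per_second, passes=1):
    """Processors required so that events are processed as fast as they arrive."""
    return math.ceil(seconds_per_event * events_per_second * passes)


def reconstruct(event):
    """Hypothetical stand-in for reconstruction; uses only its own event."""
    return sum(event)


def reconstruct_all(events, workers=4):
    """Process events independently; no event needs data from any other."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(reconstruct, events))


if __name__ == "__main__":
    print(processors_needed(90, 100))            # slide values: 90 s, 100 events/sec
    print(processors_needed(90, 100, passes=3))  # assumed three passes per event
    print(reconstruct_all([[1, 2], [3], []]))
```

Because no event depends on another, the list of events can be split across any number of workers (or sites) without coordination beyond collecting the results.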
11 Results: Data Challenge 04
- Up to 3000 simultaneous jobs per experiment, globally distributed
- Equivalent of 2.2 million hours (250 yrs) of CPU time (2.0 GHz) in period of one month
- Total data volume produced > 25 TB
For LHCb: NIKHEF 6% of worldwide total
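The CPU-time figure can be put in perspective with a quick unit conversion. This is a rough sketch that assumes a 365-day year and a 30-day month; the function names are made up for the example.

```python
HOURS_PER_YEAR = 365 * 24   # 8760, ignoring leap years
HOURS_PER_MONTH = 30 * 24   # 720, assuming a 30-day month


def cpu_years(cpu_hours):
    """Convert CPU hours to years of continuous running on one processor."""
    return cpu_hours / HOURS_PER_YEAR


def average_busy_processors(cpu_hours, wall_hours=HOURS_PER_MONTH):
    """Average number of processors kept busy to deliver cpu_hours in wall_hours."""
    return cpu_hours / wall_hours


if __name__ == "__main__":
    total = 2.2e6  # CPU hours quoted on the slide
    print(f"{cpu_years(total):.0f} CPU years")
    print(f"{average_busy_processors(total):.0f} processors busy on average")
```

With these assumptions, 2.2 million CPU hours is about 250 years on one processor, or roughly three thousand processors kept busy around the clock for the month.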
12 Community Management
- Effective Grid infrastructures need
- Low threshold for forming user communities
- Effective mechanisms for arranging resource sharing and federation
- Effective tracking of resource usage at site level
- NIKHEF research focuses on enabling sites to support
- dynamic user communities
- securely
- with minimal effort
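As an illustration of what tracking resource usage at site level could look like in its simplest form, the sketch below totals CPU hours per user community from a list of job records. Everything in it is hypothetical: the `JobRecord` fields, the community names, and the numbers are invented for the example and do not describe NIKHEF's actual accounting tools.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class JobRecord:
    """Hypothetical per-job accounting record kept by a site."""
    community: str    # user community the job ran on behalf of
    user: str         # submitting user
    cpu_hours: float  # CPU time consumed by the job


def usage_by_community(records):
    """Sum CPU hours per user community."""
    totals = defaultdict(float)
    for record in records:
        totals[record.community] += record.cpu_hours
    return dict(totals)


if __name__ == "__main__":
    # Invented sample data, for illustration only.
    records = [
        JobRecord("community-a", "alice", 12.0),
        JobRecord("community-b", "bob", 3.5),
        JobRecord("community-a", "carol", 8.0),
    ]
    print(usage_by_community(records))
```

Keying the totals on the community and not on individual accounts is one way to let membership change without touching the site's bookkeeping.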
13 Large-Scale Infrastructure Operation
- Grids ultimately need to scale to
- 10,000 to 100,000 processors
- Petabytes to exabytes of data
- Hundreds of user communities
- Thousands of users
- When NIKHEF started in 2001, scales were
- 100s of processors
- Gigabytes of data
- Six user communities
- Tens of users
- We learn at each increase in scale
- Our users at NIKHEF are the ideal test team: they really use the stuff, and want to use it at the largest scales possible
- We find out how to solve scaling problems on our local facility
- We transmit design requirements to our software-engineer partners
- We transmit experience and fixes to our production-facility partner SARA
- Crucial that NIKHEF has its own relatively large facility!
14 Data Intensive Science Research
- The HEP use cases are reasonably generic
- If they work for HEP, they work for others too
- Astronomy (radio telescopes, VLBI)
- Bioinformatics (fMRI)
- Earth Observation (Ozone Profile Processing)
- Biodiversity (tracking bird migration patterns)
- Didn't detail the data mining use case; we do metadata research here, useful for many other data miners
- Link to these groups via
- Hosting part-time presence at NIKHEF
- Leadership of Data Intensive Sciences track in the VL-E (Virtual Lab for e-Science) project