Title: Data processing with GFARM at Belle experiment
1Data processing with GFARM at Belle experiment
- Hidekazu Kakuno (Tokyo)
- Ichiro Adachi (KEK)
- Nobu Katayama (KEK)
- Frederic Ronga (KEK)
- 9 September 2005
Gfarm Workshop 2005
2Belle experiment
- B-meson factory experiment at KEK, Tsukuba
B meson: bound state of a b quark and a (u or d) quark
[Figure: KEKB ring, with e+ and e- beams producing a pair of B mesons]
- Explore CP violation and flavor physics in B
mesons
3Belle Detector
- Silicon Vertex Detector: 3 layers of DSSD for vertexing
- ToF counter: particle ID
- Aerogel Cherenkov Counter: π/K separation
- Central Drift Chamber: tracking, dE/dx
- CsI(Tl) Calorimeter: photons and electrons
- KLM: muon and KL catcher
- Superconducting solenoid: 1.5 T
4Belle data processing scheme
5dataflow
6Data acquisition and processing figure
- Output event rate: 230 Hz
- Output data rate: 8.9 MB/s
- Raw event size: 38 kB
- DST event size: 60 kB
- mdst event size: 12 kB
- Total raw data: 247 TB
- Total DST data: 390 TB
- Total mdst data: 80 TB
(as of summer 2004)
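As a quick consistency check on the figures above, the output data rate should be roughly the output event rate times the raw event size. The sketch below assumes decimal units (1 MB = 1000 kB); the function name is illustrative only and is not part of any Belle software.

```python
def data_rate_mb_per_s(event_rate_hz: float, event_size_kb: float) -> float:
    """Return the data rate in MB/s, assuming 1 MB = 1000 kB."""
    return event_rate_hz * event_size_kb / 1000.0


# Figures quoted on this slide: 230 Hz output rate, 38 kB raw event size.
raw_rate = data_rate_mb_per_s(230, 38)
print(f"estimated raw data rate: {raw_rate:.2f} MB/s")
```

The estimate comes out at about 8.7 MB/s, close to the quoted 8.9 MB/s; the small difference presumably reflects rounding in the quoted rate and event size.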
As of summer 2005, the total data is 1.6 times that of summer 2004
- Output event rate: 230 Hz
- Output data rate: 8.9 MB/s
- Total raw data: 247 TB → since last summer: 120 TB
- Total DST data: 390 TB → since last summer: 186 TB
7Present Belle computing system
- Two major components
- System under rental contract
- Started in 2001
- Belle's own system
8Computing resources evolving
- So far, we have purchased what we needed as we accumulated integrated luminosity
[Figure: growth of computing resources over time; axes labeled GHz, TB, TB]
Processing power in 2005: 7 fb^-1/day (5 fb^-1/day in 2004)
9Resources in future
- 40,000 SPEC CINT2000 rates of compute servers in 2006
- 5 PB tape and 1 PB disk storage system, with extensions
- Network connection fast enough to read/write data at a rate of 2-10 GB/s (2 for DST, 10 for physics analysis)
- User-friendly and efficient batch system that can be used collaboration-wide
x 6 data
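To get a feel for the 2-10 GB/s requirement, the following sketch estimates how long a single pass over a data set takes at a given sustained rate. It assumes decimal units (1 TB = 1000 GB) and uses the summer 2004 totals quoted earlier in this talk as example inputs; the function name is hypothetical.

```python
def scan_time_hours(total_tb: float, rate_gb_per_s: float) -> float:
    """Hours needed to read total_tb once at a sustained rate_gb_per_s."""
    if rate_gb_per_s <= 0:
        raise ValueError("rate must be positive")
    seconds = total_tb * 1000.0 / rate_gb_per_s  # 1 TB = 1000 GB (assumed)
    return seconds / 3600.0


# Example inputs: 390 TB of DST at 2 GB/s, 80 TB of mdst at 10 GB/s.
print(f"DST pass:  {scan_time_hours(390, 2):.1f} h")
print(f"mdst pass: {scan_time_hours(80, 10):.1f} h")
```

Under these assumptions one pass over the DST data takes roughly 54 hours and one pass over the mdst data a little over 2 hours, before any growth in data volume is taken into account.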
10Computing at remote site
- In addition to the real data, Monte Carlo (MC) simulation data is necessary (3 times the amount of real data)
- Many institutes help generate MC data
- Total CPU at remote sites: 600 GHz
- All MC data are transferred to KEK via FTP
11Data Management
Users have to go through all of these files to get final results
- 20K files for beam runs
- 240K files for run-dependent MC data
File information is stored in a PostgreSQL database (metadata)
[Figure: user, metadata database, and data files; arrows labeled inquire, answer, job submit, access, read]
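The lookup step described above (ask the metadata database which files to read, then submit jobs on those files) can be sketched as follows. This is only an illustration: it uses Python's built-in sqlite3 as a stand-in for PostgreSQL so that it runs without a server, and the table layout, column names, and file names are all hypothetical, not the actual Belle schema.

```python
import sqlite3


def build_catalog(rows):
    """Create an in-memory file catalog (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE files (path TEXT PRIMARY KEY, kind TEXT, run INTEGER)"
    )
    conn.executemany("INSERT INTO files VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def files_for_run(conn, kind, run):
    """Return the paths of all files of a given kind for one run."""
    cur = conn.execute(
        "SELECT path FROM files WHERE kind = ? AND run = ? ORDER BY path",
        (kind, run),
    )
    return [row[0] for row in cur.fetchall()]


# Made-up entries for illustration only.
catalog = build_catalog([
    ("beam/run0001.mdst", "beam", 1),
    ("beam/run0002.mdst", "beam", 2),
    ("mc/run0001_a.mdst", "mc", 1),
    ("mc/run0001_b.mdst", "mc", 1),
])
print(files_for_run(catalog, "mc", 1))
```

With a few hundred thousand files, a query of this kind is what lets a user select inputs by run and data type instead of browsing directories.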
12GFARM at Belle experiment
13Possible application of GFARM
- Management of MC data among the institutes and KEK
- Better management of file information (inside KEK)
- Distributed storage of analysis data to reduce the heavy I/O load of analysis jobs
- (Some data files can be accessed by many jobs (100) at the same time)
14Test of GFARM with Belle software
- Testing the Belle software framework on a cluster at AIST
- 50 nodes, 2-CPU 2.8 GHz Xeon
- Belle software works with GFARM without modification
- Basic test by MC data production
- 300 million events/day
[Figure: MC production job; input data and output data both under /gfarm/...]
15Test of GFARM with Belle software(cont'd)
- Tested more advanced usage
- Distributed storage and distributed data processing
- Tested up to 30 parallel jobs
- Belle raw data distributed using the gfarm library
[Figure: processing jobs with input data and output data under /gfarm/...]
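The test above ran up to 30 jobs in parallel over distributed input data. A minimal sketch of how a list of input files might be divided among such jobs is shown below. It is a generic round-robin split, not Gfarm's own scheduling or API (which may, for example, take data location into account), and the function and file names are hypothetical.

```python
def partition_files(files, n_jobs):
    """Split files round-robin across at most n_jobs non-empty job lists."""
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    n = min(n_jobs, len(files))  # never create empty jobs
    return [files[i::n] for i in range(n)]


# Made-up file names for illustration only.
files = [f"raw_{i:03d}.dat" for i in range(100)]
jobs = partition_files(files, 30)
print(len(jobs), "jobs,", min(len(j) for j in jobs), "to",
      max(len(j) for j in jobs), "files each")
```

Each file is assigned to exactly one job, and job sizes differ by at most one file, which keeps the parallel jobs roughly balanced when files are of similar size.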
16Test of GFARM with Belle software(cont'd)
- Installed the Belle software on gfarm and loaded it from there
[Figure: Belle software modules (executable, shared libs) loaded from /gfarm/...; processing with input data and output data under /gfarm/...]
17Possible usage in future
- Distributed storage of MC data among institutes
[Figure: KEK and MC production sites; MC data under /gfarm/...; metadata]
18Summary
- The Belle experiment has accumulated a lot of data (500M B decay events) thanks to the excellent performance of the accelerator.
- → The data volume will increase more and more in the future.
- A sophisticated data handling scheme is needed
- → GFARM will be a solution
- Belle software works well with GFARM. We will test GFARM for more practical usage.