Transcript and Presenter's Notes

Title: Data GRID deployment in HEPnet-J


1
Data GRID deployment in HEPnet-J
  • Takashi Sasaki
  • Computing Research Center
  • KEK

2
Who are we?
  • KEK stands for Kou Enerugi Kasokuki Kenkyu Kiko
  • Kou = High
  • Enerugi = Energy
  • Kasokuki Kenkyu Kiko = Accelerator Research
    Organization
  • Like the other national universities and national
    laboratories in Japan, we have been a governmental
    agency since 2004
  • We are an Inter-University Research Institute
    Corporation

3
Major projects at KEK
  • Belle
  • CP violation
  • At KEK
  • K2K, T2K
  • Neutrino
  • KEK/Tokai to Kamioka
  • CDF
  • Hadron collider, top quark
  • Fermi Lab., US
  • ATLAS
  • Hadron collider, SUSY
  • CERN, Switzerland
  • J-PARC
  • Joint project with JAEA
  • Being built at Tokai
  • ILC
  • International Linear Collider
  • Site not yet decided
  • International competition
  • Japan is interested in hosting
  • Lattice QCD
  • Dedicated IBM Blue Gene
  • 57.3TFlops
  • Material and life science
  • Synchrotron radiation
  • Muon and meson science
  • Technology transfer
  • Medical applications
  • Simulation
  • Accelerator

4
HENP institutes in Japan
  • KEK is the only central laboratory in Japan
  • Smaller scale centers also exist
  • ICEPP (U. Tokyo), RIKEN, Osaka Univ. and a few
    others
  • The majority are smaller groups in universities
  • Mostly 1-3 faculty members and/or researchers plus
    graduate students
  • No engineers or technicians for IT
  • This is not HENP specific, but commonly observed
  • KEK has a role in offering them the necessary
    assistance
  • Unfortunately, graduate students in physics are
    the main human resource supporting IT

5
HEPnet-J
  • Originally, KEK organized the HEP institutes in
    Japan to provide networking among them
  • We started with 9600 bps DECnet in the early 1980s
  • KEK was one of the first Internet sites and hosted
    the first web site in Japan (1983? and 1992)
  • This year, Super SINET3 will be introduced, with
    20 Gbps and 10 Gbps to main nodes, as the final
    upgrade
  • The focus shifts toward applications rather than
    bandwidth
  • GRID deployment is an issue
  • Virtual Organization for HEP Japan

6
History of HEPnet-J
(Timeline figure of network upgrades; latest entry: 2003, Super SINET backbone, IP 10 Gbps)
7
Belle at KEK
8
(No Transcript)
9
Belle collaboration
10
Data flow model in Belle
  • At every beam crossing, an interaction between
    particles happens and final state particles are
    observed by the detector
  • Event
  • A different type of interaction may happen at
    each beam crossing
  • Events are in time sequence
  • Something like one frame of a movie film
  • Run
  • Something like a reel of movie film
  • Cut at a convenient file size for later processing
    (historically the size of a tape, 2GB or 4GB)
  • Data from the detector (signals) are called raw
    data
  • Physical properties of each particle are
    reconstructed
  • Vectorization of images and conversion of units
  • A kind of signal processing
  • Events are classified into types of interactions
    (pattern matching)
  • Data Summary Tape (DST)
  • More condensed event samples are selected from the
    DST
  • Something like knowledge discovery in images
  • Called mini DST
  • Detector signals are stripped
  • Sometimes a subset of the mini DST, the micro DST,
    is produced (a sketch of this chain follows below)
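The chain above (raw data, reconstruction, DST, mini DST, micro DST) can be pictured as a simple reduction pipeline. The following is only an illustrative Python sketch; all names (Event, Run, reconstruct, make_dst, make_mini_dst) are hypothetical and are not taken from the Belle software.

    # Illustrative toy model of the run/event structure and the
    # raw -> DST -> mini DST reduction described above.
    from dataclasses import dataclass, field
    from typing import Dict, List

    @dataclass
    class Event:
        number: int
        raw: bytes                      # detector signals ("raw data")
        tracks: List[Dict] = field(default_factory=list)  # reconstructed particles
        channel: str = ""               # interaction type after classification

    @dataclass
    class Run:                          # "a reel of movie film"
        number: int
        events: List[Event]

    def fit_tracks(raw: bytes) -> List[Dict]:
        # Placeholder for real reconstruction (signal processing,
        # unit conversion); here it just fakes one track per event.
        return [{"p": float(len(raw))}]

    def classify(tracks: List[Dict]) -> str:
        # Placeholder for pattern matching into interaction types.
        return "hadronic" if tracks else ""

    def reconstruct(run: Run) -> Run:
        # Raw data -> physical properties of each particle.
        for ev in run.events:
            ev.tracks = fit_tracks(ev.raw)
            ev.channel = classify(ev.tracks)
        return run

    def make_dst(run: Run) -> List[Event]:
        # DST: reconstructed events, classified by interaction type.
        return [ev for ev in run.events if ev.channel]

    def make_mini_dst(dst: List[Event], channel: str) -> List[Dict]:
        # Mini DST: condensed sample for one channel, raw signals stripped.
        return [{"number": ev.number, "tracks": ev.tracks}
                for ev in dst if ev.channel == channel]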

11
Belle data analysis
  • Frequency of reprocessing
  • Reconstruction from raw data
  • Once a year or less
  • DST production
  • Twice a year or less
  • Mini DST production
  • Many times
  • Micro DST production
  • Many times
  • End user analysis
  • Every day, very many times
  • Monte Carlo production
  • More events than the real data
  • Mostly CPU intensive jobs
  • Full simulation
  • Fast simulation
  • Event size
  • 40 KB in raw data (signal only)
  • Record rate
  • 10 MB/s
  • Accumulated events in total
  • 1 PB
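As a rough cross-check of these figures (my own arithmetic, under the simplifying assumption that the 1 PB total were made up entirely of 40 KB raw events):

    \[
      \frac{10\ \mathrm{MB/s}}{40\ \mathrm{KB/event}} \approx 250\ \mathrm{events/s},
      \qquad
      \frac{1\ \mathrm{PB}}{40\ \mathrm{KB/event}} \approx 2.5\times 10^{10}\ \mathrm{events}.
    \]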

12
Event processing
  • Reconstruction and DST production are done on
    site due to the large data size
  • Physics analysis jobs are executed locally
    against miniDST or microDST, and also MC
  • What they mainly do is statistical analysis and
    visualization of histograms (see the small example
    after this list)
  • Also software development
  • Official jobs, like MC production, cross the
    levels
  • CPU intensive jobs
  • miniDST and microDST production are done by
    sub-groups and can be localized
  • Most jobs are integer intensive rather than
    floating point
  • Many branches in the code
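As an illustration of the kind of end-user job meant here, a minimal, purely hypothetical Python example that fills and prints a histogram from fake miniDST-like values (no Belle libraries are used):

    # Toy end-user analysis job: loop over fake miniDST records,
    # fill a histogram of a reconstructed quantity, print it as text.
    import random

    BIN_WIDTH = 0.1                     # GeV, arbitrary choice
    bins = [0] * 20                     # histogram over [0, 2) GeV

    for _ in range(10_000):             # stand-in for the miniDST event loop
        mass = random.gauss(1.0, 0.2)   # fake reconstructed quantity
        b = int(mass / BIN_WIDTH)
        if 0 <= b < len(bins):
            bins[b] += 1

    for i, n in enumerate(bins):        # crude text "visualization"
        lo = i * BIN_WIDTH
        print(f"{lo:4.1f}-{lo + BIN_WIDTH:4.1f} GeV  {'#' * (n // 40)}")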

13
Data Distribution Model in Belle
  • Level 0 (a few PB)
  • Only KEK has raw data and reconstructed data
  • Whole MC data
  • Level 1 (a few 10TB)
  • Big institutions may want a replica of DST
  • Join MC production
  • Level 2 (a few 100GB)
  • Most institutions are satisfied with mini DST
  • May join MC production
  • Smaller institutions may even be satisfied with
    micro DST (the levels are summarized in a small
    sketch after this list)
  • Collaboration wide data set
  • Raw data
  • Reconstructed data
  • DST
  • MC events (background signal)
  • Sub group wide data set
  • Mini DST
  • Micro DST
  • MC events (signals)
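The same three-level model, restated as a small Python structure for clarity. The sizes are the approximate figures from this slide; the layout itself is only a summary, not a Belle configuration file.

    # Summary of the Belle data distribution levels described above.
    DISTRIBUTION_LEVELS = {
        0: {"who":    "KEK only",
            "holds":  ["raw data", "reconstructed data", "whole MC data"],
            "volume": "a few PB"},
        1: {"who":    "big institutions",
            "holds":  ["DST replica", "MC production"],
            "volume": "a few 10 TB"},
        2: {"who":    "most institutions",
            "holds":  ["mini DST (or micro DST)", "possibly MC production"],
            "volume": "a few 100 GB"},
    }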

14
GRID deployment at KEK
  • Bare Globus
  • We followed it up to GT2 and then gave up
  • We have our own GRID CA
  • In production since this January
  • Accredited by the APGrid PMA
  • Two LCG sites and one test bed
  • KEK-LCG-01
  • For R&D
  • KEK-LCG-02
  • For production
  • Interface to HPSS
  • Test bed
  • Training and tests
  • NAREGI test bed
  • Under construction
  • SRB (UCSD)
  • GSI authentication or password
  • SRB-DSI became available
  • Works as an SRM for the SRB world from the LCG
    side
  • Performance tests will be done
  • Performance tests among RAL, CC-IN2P3 and KEK are
    ongoing
  • Gfarm
  • Collaboration with AIST

15
GRID deployment
  • ATLAS definitely requires LCG/gLite
  • ICEPP (International Center for Elementary
    Particle Physics), U of Tokyo will be a tier-2
    center of ATLAS
  • They have been downgraded from tier-1
  • One professor, one associate professor and a few
    assistant professors are working on the tier-2
    center
  • No technicians, engineers or contractors, only
    physicists
  • Can you believe this?
  • How can other ATLAS member institutes, mostly
    smaller groups, survive?
  • Belle
  • Some collaborators have requested that we support
    a GRID environment for data distribution and
    efficient analysis
  • Sometimes these collaborators also join one of
    the LHC experiments
  • They want to use the same thing for both

16
LCG/gLite
  • LCG (LHC Computing GRID) is now based on gLite
    3.0.
  • The only middleware available today that
    satisfies HEP requirements
  • The US is also developing its own (OSG)
  • Difficulties
  • Support
  • Language gaps
  • Quality assurance
  • Assumes rich manpower

17
NAREGI
  • What we expect for NAREGI
  • Better quality
  • Easier deployment
  • Better support in the native language
  • What we need but which still appears to be
    missing from NAREGI
  • File/replica catalogue and data GRID related
    functionalities
  • Need more assessments
  • It comes a little late
  • Earlier is better for us
  • We need something working today!
  • Requires the commercial version of PBS for the β
    release

18
First stage plan
  • Ask NAREGI to implement LFC on their middleware
  • We assume job submission between them will be
    realized
  • Share the same file/replica catalogue space
    between LCG/gLite and NAREGI
  • Move data between them using GridFTP (a small
    sketch follows below)
  • Try something by ourselves
  • Brute-force porting of LFC onto NAREGI
  • NAREGI <-> SRB <-> gLite will also be tried
  • Assessments will be done for
  • Command level compatibility (syntax) between
    NAREGI and gLite
  • Job description languages
  • Software in the experiments, especially ATLAS
  • How much does it depend on LCG/gLite?
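A hedged sketch of moving a file between the two worlds with GridFTP, assuming the standard Globus clients (grid-proxy-init, globus-url-copy) are installed and a valid user certificate is in place. The host names and paths are hypothetical examples, not real KEK or NAREGI endpoints.

    # Sketch only: third-party GridFTP copy between two storage servers.
    import subprocess

    SRC = "gsiftp://lcg-se.example.jp/data/belle/mdst/run001.mdst"
    DST = "gsiftp://naregi-se.example.jp/data/belle/mdst/run001.mdst"

    # Create a GSI proxy from the user certificate (prompts for the pass phrase).
    subprocess.run(["grid-proxy-init"], check=True)

    # Transfer the file between the two GridFTP servers.
    subprocess.run(["globus-url-copy", SRC, DST], check=True)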

19
Future strategy
  • ILC, International Linear Collider, will be the
    target
  • Interoperability among gLite, OSG and NAREGI
    will be required

20
Conclusion
  • HE(N)P has a problem that must be solved today
  • GRID seems to be the solution; however, its heavy
    consumption of human resources is a problem
  • We expect much from NAREGI
  • Still, we cannot escape from gLite
  • Interoperability is the issue
  • We are working on this issue together with NAREGI
    and IN2P3