Title: Mihnea Dulea, IFIN-HH
Efficient Handling and Processing of PetaByte-Scale Data for the Grid Centers within the FR Cloud
1ST JOINT SYMPOSIUM CEA-IFA
HaPPSDaG - PROJECT PRESENTATION - FIRST YEAR PROGRESS REPORT
M. Dulea, National Institute for Nuclear Physics and Engineering 'Horia Hulubei' (IFIN-HH)
OVERVIEW
- Computing support for LHC
- Project topics
- Project objectives and work planning
- Framework agreements
- General information
- Project teams and infrastructure
- First year results
COMPUTING SUPPORT for LHC - LCG
- LHC COMPUTING GRID
- LCG is a widely distributed array of computing resources that provides the computing support required for the storage, processing, simulation and analysis of the data gathered by the four major experiments performed at the LHC.
- It consists of more than 140 computing centres and federations of centres from 35 countries.
- The resource centres are classified according to their size and functionality as Tier-0 (CC @ CERN), Tier-1 (11 centres), and Tier-2.
- The centres are interconnected through a high-speed network (GEANT2 in the EU).
- Current and 2012-2014 activity related to LHC.
COMPUTING SUPPORT - FR
- ATLAS FRENCH CLOUD
- Grid sites
  - CC-IN2P3 (Tier-1)
  - Tier-2 centres ... (many)
- GRIF
  - Grille de Recherche d'Ile de France: computing grid in the Paris region, a joint initiative of CEA/IRFU and labs of CNRS/IN2P3 (6 sites)
- The sites are interconnected through dedicated 10 Gbps links connected to the FR NREN, RENATER (Réseau national de télécommunications pour la technologie, l'enseignement et la recherche)
- FR Cloud includes foreign grid centres from China, Japan, Romania
COMPUTING SUPPORT - RO
- ROMANIAN TIER-2 FEDERATION RO-LCG
- Grid sites
  - IFIN-HH, 5 Grid sites (resource centres)
  - ISS - Inst. for Space Sciences (2 sites)
  - UPB - Univ. 'Politehnica' of Bucharest
  - ITIM - NIRD in Molecular Isotopic Technologies - Cluj
  - UAIC - 'Alex. Ioan Cuza' University - Iasi
- The sites are connected to the 10 Gbps backbone of the RO NREN, the Romanian Educational and Research Network RoEduNet
- 4 grid sites currently support the ATLAS VO: RO-07-NIPNE, RO-02-NIPNE (IFIN-HH), RO-14-ITIM (Cluj), RO-16-UAIC (Iasi)
PROJECT TOPIC
- Computing support for LHC experiments: provision of grid resources and services
- The overall support of LCG deployment and operation is provided from other funds (e.g. CONDEGRID project in RO).
- HAPPSDAG addresses specific ATLAS issues in order to optimize resource usage
ATLAS ISSUES
- Generic requirements regarding
  - data transfer from Tier-1 to the associated Tier-2 sites (CC-IN2P3 -> RO-LCG)
  - transfer of large files from SE to WN for each analysis job (consider many simultaneous jobs)
  - transfer of log and results files from WN to SE; immediate transfer of log file to UI
- RO specific needs at the beginning of the project
  - analysis of the causes of the lower performance of RO-LCG sites before Oct. 2010
  - elaborate and test technical solutions for performance improvement
  - ensure better communication and coordination between the RO sites and the FR-cloud partners
- General measures for improving Tier1 - Tier2 interaction
  - elaborate general guidelines regarding the improvement in efficiency of the grid centers which are associated to ATLAS clouds
- Figure: Grid cluster. Transfer paths from/to the Storage Element (SE)
PROJECT OBJECTIVES
Strategic objective: provide means for improving the processing and handling of large data sets at the Tier-2 centers which participate in the computing support of the ATLAS experiment at the LHC (RO - case study).
Specific objectives and partner contributions
- Improve communication and coordination between GRIF/IN2P3 and RO sites (RO/FR)
- Testing and improving the quality of the FR - RO data link for large dataset transfers (RO/FR)
- Implementation of specific measures for increasing ATLAS job load and storage performance on sites (RO)
- Improving large dataset transfer between FR - RO and data analysis (RO/FR)
- Contributing to grid monitoring and technical support within FR-cloud (RO)
- Training regarding grid monitoring and support (FR -> RO)
- Dissemination (RO/FR)
PLANNING of WORK
- Stage 1 (01.10.2010 - 10.12.2010)
  - Analysis of Tier1-Tier2 communication
- Stage 2 (01.01.2011 - 30.09.2011)
  - Studies and software tools for monitoring and operation of the FR Cloud - RO grid connection and job loading. Testing of data handling and processing.
- Stage 3 (01.10.2011 - 30.09.2012)
  - Methods and procedures for improving the performance of the RO sites within the FR Cloud
FRAMEWORK AGREEMENTS
- General Cooperation Agreement for Scientific Research between CEA and IFA, signed in December 2009
  - Field of cooperation: Technologies for Information and Health
  - Topic proposed for 2010: Grid Technologies
- Joint Call for proposals of joint R&D projects (May 2010)
  - IFIN-HH and IRFU submitted a proposal for a Joint Research and Development Project
- Cooperation Agreement in the Field of Scientific Research (AS) between CEA and IFIN-HH (01.10.2010)
  - General Coordinators: Gerard Cognet (FR), Ioan Ursu (RO)
  - leading and coordinating the cooperation activities
- Project Agreement (CEA, IFIN-HH)
GENERAL INFORMATION
- RO Contract no. C1-06/2010, between IFA and IFIN-HH
- Start date: 01/10/2010
- Duration: 24 months
- Funding of the RO part of the project: 400 000 lei (approx. 94.000 EUR)
- Funding of the FR part of the project: 133 000 EUR
BUDGET | 2010 RO (lei) | 2010 CEA (Eur) | 2011 RO (lei) | 2011 CEA (Eur) | 2012 RO (lei) | 2012 CEA (Eur)
Manpower | 25.333 | 6.000 | 120.133 | 48.000 | 82.000 | 22.000
Travels | 8.000 | 4.000 | 3.200 | 14.000 | 8.000 | 14.000
Others (Romanian engineer staying at Saclay) | - | 5.000 | - | 10.000 | - | 10.000
Others (French guests staying in Romania) | 0 | - | 10.000 | - | 10.000 | -
Others (equipment) | 0 | - | 40.000 | - | 40.000 | -
Others (indirect costs) | 6.667 | - | 26.667 | - | 20.000 | -
Total | 40.000 | 15.000 | 200.000 | 72.000 | 160.000 | 46.000
PROJECT TEAMS
- Project coordinators: Jean-Pierre Meyer (FR), Mihnea Dulea (RO)
- Technical correspondents: Pierrick Micout (FR), Gabriel Stoicea (RO)
- FR team (CEA/IRFU)
  - Eric LANÇON
  - Pierrick MICOUT
  - Christine LEROY
  - Frédéric SCHAER
  - Zoulikha GEORGETTE
  - Adelino GOMEZ
- RO team (IFIN-HH)
  - Serban Constantinescu
  - Mihai Ciubancan
  - Ionut Traian Vasile
  - Camelia Mihaela Visan
Centre for Informational Technologies (CTI) - IFIN-HH
INFRASTRUCTURE @ CTI/DPETI
1200 (grid) and 960 (HPC) cores, 270 TB
ANALYSIS of NETWORK INFRASTRUCTURE
- Objective: identify the weak points of the FR-RO data connection and adopt measures for improving the transfer capacity of large datasets.
- Network structure: complex, various owners and administrators -> more difficult to act
Section | Centres | Administrator | Owner | Location
IFIN-HH LAN | RO-02-NIPNE, RO-07-NIPNE | CTI/DPETI | IFIN-HH | Magurele
IFIN - UPB | UPB | ICOMM | IFIN-HH | UPB
RoEduNet | RO-14-ITIM, RO-16-UAIC | AARNIEC | MECTS | Romania
GEANT2 | In 34 EU states | DANTE | EU NRENs | EU
RENATER | GRIF, IN2P3 | GIP RENATER | GIP RENATER | France
- Activities (RO/FR)
  - Testing connectivity and transport capacity with various tools
  - Finding routing paths and points of data traffic delay
  - Comparing the performance of the RO-CERN link with that of the RO-IN2P3 link
- Conclusions
  - a) performance degradation at the RoEduNet / GEANT2 interface
  - b) bottlenecks on some of the RoEduNet routers
IMPROVING POINT-TO-POINT TRAFFIC PERFORMANCES
- Requires close collaboration with network administrators along the RO-FR path
- Example: following bandwidth capacity and traffic analysis, a RoEduNet router was found to be responsible for a bottleneck. AARNIEC's intervention raised the available bandwidth to 700 Mbps (fig. below).
- Permanent monitoring required
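The effect of a single slow segment on a point-to-point transfer can be illustrated with a short Python sketch. It is not the project's tooling: the segment names follow the network sections listed earlier, but the per-segment bandwidths are hypothetical values chosen for illustration (only the 700 Mbps figure comes from the slide), and the transfer time ignores protocol overhead and competing traffic.

```python
def bottleneck(segments):
    """Return (name, capacity_mbps) of the slowest segment on a path."""
    return min(segments.items(), key=lambda item: item[1])


def transfer_time_s(size_gb, capacity_mbps):
    """Ideal transfer time in seconds for size_gb gigabytes (10^9 bytes)."""
    return size_gb * 8000.0 / capacity_mbps


# Hypothetical available bandwidth per path segment, in Mbps.
# Only the 700 Mbps value is taken from the slide; the rest are assumptions.
path = {
    "IFIN-HH LAN": 10000.0,
    "RoEduNet": 700.0,
    "GEANT2": 10000.0,
    "RENATER": 10000.0,
}

name, capacity = bottleneck(path)
hours = transfer_time_s(1000, capacity) / 3600
print(f"bottleneck: {name} at {capacity:.0f} Mbps")
print(f"ideal time for a 1 TB dataset: {hours:.1f} h")
```

The end-to-end rate is bounded by the slowest segment, which is why finding and fixing one router can change the transfer time for the whole path.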
MONITORING TOOLS for DATA TRANSFER and STORAGE PERFORMANCE - 1
- Development of software tools for monitoring of SE traffic (in/out): daemons running on the storage servers send data that is added to a database, and a web interface displays it
- Tools developed in IFIN-HH, useful for FR partners too for monitoring RO sites.
- Traffic from/to WNs and from/to external network (plots: max at 5 Gbps; max at > 3 Gbps)
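The daemon-to-database step described on this slide can be sketched as follows. This is a minimal illustration, not the IFIN-HH tool: the table name `se_traffic`, the server name `disk-server-01`, and the use of SQLite are all assumptions, and the byte counters are passed in directly rather than read from a network interface.

```python
import sqlite3
import time


def rate_mbps(prev_bytes, cur_bytes, interval_s):
    """Average rate in Mbps between two readings of a byte counter."""
    return (cur_bytes - prev_bytes) * 8 / (interval_s * 1e6)


def open_db(path=":memory:"):
    """Open the (hypothetical) traffic database and create its table."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS se_traffic "
        "(ts REAL, server TEXT, in_mbps REAL, out_mbps REAL)"
    )
    return db


def record(db, server, prev, cur, interval_s, ts=None):
    """Store one sample; prev and cur are (rx_bytes, tx_bytes) readings."""
    ts = time.time() if ts is None else ts
    in_mbps = rate_mbps(prev[0], cur[0], interval_s)
    out_mbps = rate_mbps(prev[1], cur[1], interval_s)
    db.execute(
        "INSERT INTO se_traffic VALUES (?, ?, ?, ?)",
        (ts, server, in_mbps, out_mbps),
    )
    db.commit()
    return in_mbps, out_mbps


db = open_db()
# Two counter readings taken 60 s apart on a hypothetical disk server.
record(db, "disk-server-01", (0, 0), (7_500_000_000, 3_750_000_000), 60)
print(db.execute("SELECT server, in_mbps, out_mbps FROM se_traffic").fetchall())
```

A display layer would then query this table by time range. The sketch does not handle counter resets or wraparound, which a long-running daemon would need to.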
MONITORING TOOLS for DATA TRANSFER and STORAGE PERFORMANCE - 2
- Traffic on gateway (in/out): SE external throughput
- Monitoring groups of running or pending jobs
MONITORING TOOLS for DATA TRANSFER and STORAGE PERFORMANCE - 3
- Accounting of running or pending jobs on CE or CREAM-CE
IMPROVEMENT of SITE MONITORING and TECHNICAL SUPPORT
- Implementation of the site's own SAM (Service Availability Monitoring) system, which uses the IFIN-HH grid infrastructure and a new monitoring VO, ifops. Results are published using Nagios.
- Early notification of technical staff leads to improved availability of grid services
- Figure: Monitoring of CREAM-CE, tbit03.nipne.ro
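How a probe result turns into a Nagios notification can be sketched in Python. The status codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) follow the standard Nagios plugin convention; everything else here is an assumption made for illustration, including the thresholds, the idea of classifying by response time, and the function names. It does not reproduce the SAM probes used at IFIN-HH.

```python
# Standard Nagios plugin status codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(response_s, warn_s=5.0, crit_s=30.0):
    """Map a probe response time (None = no answer) to a status code.

    The thresholds are hypothetical defaults, not values from the project.
    """
    if response_s is None:
        return CRITICAL
    if response_s < 0:
        return UNKNOWN  # a negative time means the measurement itself failed
    if response_s >= crit_s:
        return CRITICAL
    if response_s >= warn_s:
        return WARNING
    return OK


def plugin_output(service, response_s, warn_s=5.0, crit_s=30.0):
    """Return (status code, one-line status text) for a probe result."""
    code = classify(response_s, warn_s, crit_s)
    if response_s is None:
        detail = "no response"
    else:
        detail = f"response in {response_s:.1f}s"
    return code, f"{service} {LABELS[code]} - {detail}"


code, line = plugin_output("CREAM-CE", 7.2)
print(line)
```

A real plugin would print the line and finish with `sys.exit(code)`, so that Nagios can raise the notification that alerts the technical staff.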
IMPROVEMENT and TESTS of SE-WN THROUGHPUT
- Adding more resources (WNs) doesn't always mean better results. Scalability is required
- Improvement of file transfer speed from SE to WN, required by analysis jobs (4-6 files of 2-4 GB each)
- Replacing the transfer to disk servers through the Network File System (NFS) protocol with new DPM (Disk Pool Manager) disk storage servers.
- Higher transfer speed -> no job exceeds the time limit -> no cancellation
- Tests of the new configuration
- Figure: Time representation of the transfer speed (in Mbps) for 70 quasi-simultaneous jobs
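The load that the numbers above put on the SE can be estimated with a few lines of Python. The file counts, file sizes and the 70 jobs come from the slide; the aggregate SE-to-WN throughput of 3 Gbps is an assumption chosen only to make the arithmetic concrete, and the result is an ideal lower bound that ignores protocol overhead and disk contention.

```python
def job_input_gb(n_files, file_gb):
    """Input volume of one analysis job, in GB."""
    return n_files * file_gb


def staging_time_s(total_gb, aggregate_gbps):
    """Ideal time to move total_gb (10^9 bytes) at aggregate_gbps Gbit/s."""
    return total_gb * 8 / aggregate_gbps


JOBS = 70                         # quasi-simultaneous jobs in the test
low = JOBS * job_input_gb(4, 2)   # every job reads 4 files of 2 GB
high = JOBS * job_input_gb(6, 4)  # every job reads 6 files of 4 GB

AGGREGATE_GBPS = 3.0              # assumption, not a measured value

for total in (low, high):
    minutes = staging_time_s(total, AGGREGATE_GBPS) / 60
    print(f"{total} GB in total -> about {minutes:.1f} min of staging")
```

If the staging time approaches the job time limit, jobs start to be cancelled, so the achievable aggregate throughput sets how many simultaneous analysis jobs a site can usefully run.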
GLOBAL IMPROVEMENT of EFFICIENCY
- Mean efficiency of ATLAS job execution in 2011: 91%
- Figure: Monthly number of ATLAS jobs and number of ATLAS events processed in RO-LCG
TRAINING REGARDING MONITORING AND TECHNICAL SUPPORT
- 20.06.11 - 04.07.11: training stay of C. Visan at CEA/IRFU, preparing later participation in monitoring and support activities for FR Cloud sites.
- Topics
  - CEA/IRFU monitoring methods at site, VO, project levels; EGI/WLCG and LHC monitoring (Christine Leroy, Pierrick Micout)
  - grid site usage (Georgette Zoulikha)
  - NAGIOS installation/configuration on virtual machines (Frederic Schaer)
  - job submission through Pathena (PanDA Athena), at LAL-Orsay (Laurent Duflot)
  - CACTI site monitoring (Victor Mendoza, Université Pierre et Marie Curie (UPMC))
  - instructions for site and job monitoring in ADCoS (ATLAS Distributed Computing Operations Shift) and for the support team of FR Cloud (Squad) (Sabine Crepe)
MOBILITY
- Kick-off meeting (15-16.11.2010, Saclay)
- Participation at the RO-LCG 2010 Conference, Bucharest (Christine Leroy, Sabine Crepe - IN2P3)
- Participation of Gabriel Stoicea in the spring meeting of LCG-France (30-31.05.2011)
- Training - monitoring and support (20.06.11 - 04.07.11, Saclay), C.M. Visan
BENEFITS
- CEA/IRFU
  - The results of the project contribute to global improvement of FR Cloud efficiency
  - Elaboration, in collaboration, of general guidelines for interaction between grid centres in ATLAS clouds
  - Using FR-RO interaction as a representative case study for sharing best practices with smaller sites
- IFIN-HH
  - General efficiency improvement of the activity of the RO sites
  - Better integration and visibility in the framework of the computing support for the ATLAS collaboration
  - High-level training of RO technical staff
PROSPECTS
- Further development of methods and procedures for improving the performance of the RO sites within the FR Cloud
- General guidelines regarding the improvement in efficiency of the grid centers which are associated to ATLAS clouds
- HAPPSDAG workshop and technical meeting in Bucharest (28-30.11.2011)
- Participation of IFIN-HH in site and job monitoring in ADC shifts (ATLAS Distributed Computing) or in the monitoring team of FR Cloud.
- Dissemination of results
THANK YOU FOR YOUR ATTENTION! Questions?