Title: Production Grid Challenges in Hungary
1Production Grid Challenges in Hungary
- Péter Stefán
- Ferenc Szalai
- Gábor Vitéz
- NIIF/HUNGARNET
2Agenda
- Brief introduction
- Grid initiatives - ClusterGrid
- Challenges in a production environment
- Generic ClusterGrid operation model
- Management issues
- User support
- Monitoring
- ClusterGrid future challenges
- Conclusions
3Brief NIIF Introduction
GRID supercomputing
Videoconference, central HA cluster
VPNs, VoIP, directory service
IP, IPv6, MPLS, lambda etc
10G backbone, 600.000 users, 750 institutions
4Supercomputers
- Consists of 2 SUN E15Ks and 2 SUN 10Ks located at
two universities, including 276 CPUs and 300 GB of
memory.
- Used to be in the top 500.
- In production since 2001.
- Serves more than 200 users, and 100 scientific
projects.
5Hungarian grid initiatives, MGKK
- Hungarian grid initiatives can be classified into
grid infrastructure and grid system development
projects.
- Key players have formed a grid collaboration, the
Hungarian Grid Competence Center (MGKK), involving
BUTE, ELUB, MTA-SZTAKI, NIIF/HUNGARNET, KFKI, and the
University of Veszprém.
- Intensive participation in many national and
European grid initiatives: EGEE, NorduGrid,
SEE-GRID, etc.
6ClusterGrid initiative
- It is a pool of 1400 PC nodes throughout the
country involving more than 26 clusters.
- Production infrastructure since July 2002.
- Supercomputer clusters are planned to be involved
too.
- A rough estimate of the total compute capacity
is about 600 Gflops.
- Even though it is much smaller than regional or
continental grids, its complexity is in the
same range.
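To put the capacity figures above in perspective, the following minimal sketch derives two averages from the numbers quoted on the slide. The constant and function names are illustrative only, and the inputs are the slide's round estimates, so the results are rough: about 0.43 Gflops per node, and at most about 54 nodes per cluster, since the slide gives the cluster count only as a lower bound.

```python
# Figures quoted on the slide; treated here as round estimates.
TOTAL_GFLOPS = 600.0   # rough total compute capacity
NODE_COUNT = 1400      # PC nodes in the pool
CLUSTER_COUNT = 26     # the slide says "more than 26", so this is a lower bound


def per_node_gflops(total_gflops: float, nodes: int) -> float:
    """Average capacity contributed by a single node."""
    return total_gflops / nodes


def average_cluster_size(nodes: int, clusters: int) -> float:
    """Average number of nodes per cluster."""
    return nodes / clusters


if __name__ == "__main__":
    print(f"{per_node_gflops(TOTAL_GFLOPS, NODE_COUNT):.2f} Gflops per node")
    # An upper bound on the average, because CLUSTER_COUNT is a lower bound.
    print(f"{average_cluster_size(NODE_COUNT, CLUSTER_COUNT):.0f} nodes per cluster at most")
```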
7Challenges in production environment
- Grid definition - set clear objectives for what to
build
- Simplicity - keep the system transparent, usable
- Completeness - cover not only the application level
- Security - using computer networking methods
(MPLS, VLAN technologies)
- Compatibility - other grids (X509, LDAP)
- Manageability - easy-to-maintain
- Robustness - fault tolerant behavior
- Usability - cover many job classes, user support
- Platform independence - to be able to execute on
MS
8ClusterGrid architecture
9Some new ideas
- MPLS, VLAN connected resources
- Web-transaction based resource broker
- Dynamic, separated run-time environment
10Generic production service model
11Challenges in production contd
- Management
- physical compute resources (supercomputers,
clusters),
- virtual resources (virtual clusters),
- storage nodes,
- users,
- services
- User support
- Grid architecture monitoring
12Compute-cluster management
13Virtual compute-cluster management
14Virtual compute-cluster management
15Storage management
- Low level management of disks, volumes and file
systems (cost efficient storage solutions by
using ATA over Ethernet - AoE).
- Medium level access management (gridFTP, FTPS).
- High level data brokering (extended SRM model).
16User management
- User personal data is kept in an LDAP based
directory service separately from authentication
data.
- Aided by a web registration interface.
- Authentication
- X509 certificates,
- LDAP based authentication.
- No authorization yet.
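The separation of personal data from authentication data described above can be illustrated with a small in-memory sketch. This is not the ClusterGrid implementation: the class and field names are hypothetical, plain dictionaries stand in for the LDAP directory, the certificate path only looks up a subject name and does not validate a certificate chain, and the salted password hash is a stand-in for whatever the LDAP based authentication actually does. In line with the slide, there is no authorization step.

```python
import hashlib
import hmac
import os
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class PersonalRecord:
    """Personal data (hypothetical fields), kept apart from credentials."""
    uid: str
    full_name: str
    institution: str


@dataclass
class UserRegistry:
    # Store 1: personal data only.
    personal: Dict[str, PersonalRecord] = field(default_factory=dict)
    # Store 2: authentication data only.
    cert_subjects: Dict[str, str] = field(default_factory=dict)  # subject DN -> uid
    passwords: Dict[str, Tuple[bytes, bytes]] = field(default_factory=dict)  # uid -> (salt, hash)

    @staticmethod
    def _hash(password: str, salt: bytes) -> bytes:
        return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)

    def register(self, record: PersonalRecord,
                 cert_subject: Optional[str] = None,
                 password: Optional[str] = None) -> None:
        """What a web registration form might call after validating its input."""
        self.personal[record.uid] = record
        if cert_subject is not None:
            self.cert_subjects[cert_subject] = record.uid
        if password is not None:
            salt = os.urandom(16)
            self.passwords[record.uid] = (salt, self._hash(password, salt))

    def authenticate_by_certificate(self, subject_dn: str) -> Optional[str]:
        """Map an already verified certificate subject to a uid, or None."""
        return self.cert_subjects.get(subject_dn)

    def authenticate_by_password(self, uid: str, password: str) -> Optional[str]:
        """Return the uid if the password matches, otherwise None."""
        entry = self.passwords.get(uid)
        if entry is None:
            return None
        salt, expected = entry
        matches = hmac.compare_digest(expected, self._hash(password, salt))
        return uid if matches else None
```

Keeping the two stores separate means that a component which only needs to authenticate a user never has to read that user's personal record.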
17Service management (experimental)
- Relatively new direction.
- It is a special service.
- It is based on well-established authorization.
- Basically helps to start, stop, (re)configure
grid services.
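As a rough illustration of the idea above, the sketch below gates start, stop and reconfigure operations behind an authorization check. Everything here is hypothetical: the `ServiceManager` class, the `is_authorized` callback and the service names are invented for the example, and the slides do not describe how the experimental service is actually built.

```python
from enum import Enum
from typing import Callable, Dict


class State(Enum):
    STOPPED = "stopped"
    RUNNING = "running"


class ServiceManager:
    """Tracks grid services and changes their state only for authorized users."""

    def __init__(self, is_authorized: Callable[[str, str, str], bool]):
        # is_authorized(user, action, service) -> bool is supplied by the caller.
        self._is_authorized = is_authorized
        self._state: Dict[str, State] = {}
        self._config: Dict[str, Dict[str, str]] = {}

    def _check(self, user: str, action: str, service: str) -> None:
        if not self._is_authorized(user, action, service):
            raise PermissionError(f"{user} may not {action} {service}")

    def state(self, service: str) -> State:
        return self._state.get(service, State.STOPPED)

    def config(self, service: str) -> Dict[str, str]:
        return dict(self._config.get(service, {}))

    def start(self, user: str, service: str) -> None:
        self._check(user, "start", service)
        self._state[service] = State.RUNNING

    def stop(self, user: str, service: str) -> None:
        self._check(user, "stop", service)
        self._state[service] = State.STOPPED

    def reconfigure(self, user: str, service: str, config: Dict[str, str]) -> None:
        """Merge new settings into the stored configuration; state is unchanged."""
        self._check(user, "reconfigure", service)
        merged = self.config(service)
        merged.update(config)
        self._config[service] = merged
```

A caller might pass a simple allow-list as the callback, for example `ServiceManager(lambda user, action, service: user == "admin")`.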
18User support
- Grid service provider gives user support
covering
- consultation about the benefits of grid usage,
- code porting and optimization,
- partial aid in code implementation,
- job formation and execution,
- generic grid usage.
- Not yet covered
- model creation,
- formal description,
- algorithm creation.
19ClusterGrid monitoring
- Fluctuation of grid cluster resources between
day-shift and night-shift operation.
- Blue line: total. Green area: occupied.
- 2-layer hierarchical monitoring system.
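One way to picture a 2-layer hierarchy is sketched below: a first layer summarizes the nodes of each cluster, and a second layer sums the cluster summaries into grid-wide "total" and "occupied" figures like those plotted on the slide. This is a sketch under stated assumptions, not the actual ClusterGrid monitoring system: the types, the function names, and the notion that a node is simply online or offline and busy or idle are all invented for the example.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class NodeStatus:
    name: str
    online: bool   # node is currently offered to the grid
    busy: bool     # node is currently running a job


@dataclass(frozen=True)
class Summary:
    scope: str     # cluster name, or "grid" for the top layer
    total: int     # nodes currently available
    occupied: int  # of those, nodes running a job


def summarize_cluster(cluster: str, nodes: Iterable[NodeStatus]) -> Summary:
    """Layer 1: collapse per-node status into one record per cluster."""
    online = [n for n in nodes if n.online]
    return Summary(cluster, len(online), sum(1 for n in online if n.busy))


def summarize_grid(clusters: Iterable[Summary]) -> Summary:
    """Layer 2: add up the cluster records into a grid-wide view."""
    items = list(clusters)
    return Summary("grid",
                   sum(s.total for s in items),
                   sum(s.occupied for s in items))
```

With this split, the top layer only ever sees one small record per cluster, so nodes joining at night and leaving during the day change the totals without changing the top-level logic.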
20ClusterGrid monitoring
21ClusterGrid monitoring
22Future ClusterGrid (?) challenges
- Continuously growing demand for reliable compute
and data storage infrastructure.
- Grid systems should conform to international
standards and MUST interoperate with one another.
- Platform independence is not an issue yet, but
will be.
- LEGO-based principles are of increasing
importance.
- Threats: solutions that prevent development, and
erosion of the belief in the power of grid.
23Conclusions
- One of the first production-level grids has been
shown in a nutshell.
- With special emphasis on operation, management
and user support issues.
- Management generally covers grid resource
management, grid user management and monitoring.
- Some remarks regarding future development were
also made.
24Thanks for your attention!
- www.clustergrid.hu
- www.mgkk.hu
- grid-tech_at_niif.hu