Title: Antun Balaz, antun.balaz_at_scl.rs
1High Performance Cluster and Grid Computing
- Antun Balaz, antun.balaz_at_scl.rs
- Scientific Computing Laboratory
- Institute of Physics Belgrade
- Serbia
Introduction to High Performance and Grid
Computing
2Overview
- Introduction to clusters
- High performance computing
- Grid computing paradigm
- Ingredients for Grid development
- Introduction to Grid middleware
3Parallel computing
- Splitting a problem into smaller tasks that are executed concurrently
- Why?
- Absolute physical limits of hardware components (speed of light, electron speed, ...)
- Economic reasons: more complex means more expensive
- Performance limits: double frequency <> double performance
- Large applications demand too much memory and time
- Advantages: increasing speed, optimizing resource utilization
- Disadvantages: complex programming models, difficult development
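The idea of splitting a problem into smaller tasks that run concurrently can be sketched in a few lines of Python. The example below is only an illustration under stated assumptions: the function names (`partial_sum`, `split_range`, `parallel_sum`) are hypothetical, and a thread pool is used purely for simplicity. For CPU-bound work one would more likely use separate processes or MPI, since CPython threads share a single interpreter lock.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(bounds):
    """Sum of squares over the half-open range [start, stop)."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))


def split_range(n, parts):
    """Split [0, n) into `parts` contiguous chunks of nearly equal size."""
    step, extra = divmod(n, parts)
    chunks, start = [], 0
    for k in range(parts):
        # The first `extra` chunks get one additional element.
        stop = start + step + (1 if k < extra else 0)
        chunks.append((start, stop))
        start = stop
    return chunks


def parallel_sum(n, workers=4):
    """Compute sum(i*i for i in range(n)) by combining per-chunk results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum, split_range(n, workers)))


if __name__ == "__main__":
    n = 100_000
    # Compare the chunked result with the straightforward serial sum.
    print(parallel_sum(n) == sum(i * i for i in range(n)))
```

The decomposition step (`split_range`) and the combination step (the final `sum`) are exactly the parts that make parallel programs harder to develop than serial ones.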
4Parallelism levels
- CPU
- Multiple CPUs
- Multiple CPU cores
- Threads (time sharing)
- Memory
- Shared
- Distributed
- Hybrid (virtual shared memory)
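The difference between the shared and the distributed memory model can be illustrated with a small Python sketch. This is an emulation under loud assumptions: both variants run as threads inside one process, the function names are hypothetical, and the "distributed" variant only mimics the programming style (private data plus explicit messages). On a real cluster the messages would travel over the network, for example through MPI.

```python
import queue
import threading


def shared_memory_sum(values, workers=4):
    """Shared memory: all workers update one location, guarded by a lock."""
    total = {"value": 0}
    lock = threading.Lock()

    def work(chunk):
        local = sum(chunk)
        with lock:  # synchronize access to the shared location
            total["value"] += local

    threads = [threading.Thread(target=work, args=(values[k::workers],))
               for k in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total["value"]


def message_passing_sum(values, workers=4):
    """Distributed-memory style: private data, results sent as messages."""
    mailbox = queue.Queue()

    def work(chunk):
        mailbox.put(sum(chunk))  # the result leaves the worker as a message

    # Each worker receives its own private copy of its part of the data.
    threads = [threading.Thread(target=work, args=(list(values[k::workers]),))
               for k in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # The collecting side receives exactly one message per worker.
    return sum(mailbox.get() for _ in range(workers))


if __name__ == "__main__":
    data = list(range(1000))
    print(shared_memory_sum(data), message_passing_sum(data))
```

In the first variant correctness depends on the lock; in the second it depends on every worker sending its message and the collector receiving all of them.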
5Parallel architectures (1)
- Vector machines
- CPU processes multiple data sets
- shared memory
- advantages: performance, ease of programming
- issues: scalability, price
- examples: Cray SV, NEC SX, Athlon/3DNow!, Pentium IV/SSE/SSE2
- Massively parallel processors (MPP)
- large number of CPUs
- distributed memory
- advantages: scalability, price
- issues: performance, programming difficulty
- examples: Connection Machine CM-1 and CM-2, GAPP (Geometric Arithmetic Parallel Processor)
6Parallel architectures (2)
- Symmetric Multiprocessing (SMP)
- two or more processors
- shared memory
- advantages: price, performance, ease of programming
- issues: scalability
- examples: UltraSparc II, Alpha ES, Generic Itanium, Opteron, Xeon, ...
- Non-Uniform Memory Access (NUMA)
- Solves the SMP scalability issue
- hybrid memory model
- advantages: scalability
- issues: price, performance, programming difficulty
- examples: SGI Origin/Altix, Alpha GS, HP Superdome
7Clusters
- Poor man's supercomputer
- "Collection of interconnected stand-alone computers working together as a single, integrated computing resource" (R. Buyya)
- Cluster consists of
- Nodes
- Network
- OS
- Cluster middleware
- Standard components
- Avoiding expensive proprietary components
8Cluster classification
- High performance clusters (HPC)
- Parallel, tightly coupled applications
- High throughput clusters (HTC)
- Large number of independent tasks
- High availability clusters (HA)
- Mission critical applications
- Load balancing clusters
- Web servers, mail servers,
- Hybrid clusters
- Example: HPC+HA
9Beowulf clusters
- 1994
- T. Sterling and D. Becker
- NASA Goddard Space Flight Center
- Frontend
- Access machine
- JMS and monitoring server
- Shared storage: NFS (directory /home)
- Nodes
- Multiple private networks
- Local storage (/scratch)
- Private networks
- High speed / low latency
10From clusters to Grids
- Many distributed computing resources (clusters) exist, even in Serbia
- Problem 1: they cannot be used by end users transparently
- Problem 2: even when users are granted access to several clusters, they tend to neglect the smaller ones
- Problem 3: distribution of input/output data, sharing of data between clusters
- To overcome such problems, the Grid paradigm was introduced
11Unifying concept: Grid
Resource sharing and coordinated problem solving
in dynamic, multi-institutional virtual
organizations.
12Effective policy governing access within a
collaboration
13What problems Grid addresses
- Too hard to keep track of authentication data (ID/password) across institutions
- Too hard to monitor system and application status across institutions
- Too many ways to submit jobs
- Too many ways to store and access files/data
- Too many ways to keep track of data
- Too easy to leave dangling resources lying around (robustness)
14Requirements
- Security
- Monitoring/Discovery
- Computing/Processing Power
- Moving and Managing Data
- Managing Systems
- System Packaging/Distribution
- Secure, reliable, on-demand access to data,
software, people, and other resources (ideally
all via a Web Browser!)
15Why Grid security is hard (1)
- Resources being used may be valuable and the problems being solved sensitive
- Both users and resources need to be careful
- Dynamic formation and management of user groups
- Large, dynamic, unpredictable
- Resources and users are often located in distinct administrative domains
- Cannot assume cross-organizational trust agreements
- Different mechanisms and credentials
16Why Grid security is hard (2)
- Interactions are not just client/server, but service-to-service on behalf of the user
- Requires delegation of rights: user -> service
- Services may be dynamically instantiated
- Standardization of interfaces to allow for discovery, negotiation and use
- Implementation must be broadly available and applicable
- Standard, well-tested, well-understood protocols integrated with a wide variety of tools
- Policy from sites, user communities and users needs to be combined
- Varying formats
- Want to hide as much as possible from applications!
17Grids and VOs (1)
- Virtual organizations (VOs) are groups of Grid users (authenticated through digital certificates)
- VO Management Service (VOMS) serves as a central repository for user authorization information, providing support for sorting users into a general group hierarchy, keeping track of their roles, etc.
- VO Manager, according to VO policies and rules, authorizes authenticated users to become VO members
18Grids and VOs (2)
- Resource centers (RCs) may support one or more VOs, and this is how users are authorized to use computing, storage and other Grid resources
- VOMS allows a flexible approach to authentication and authorization (AA) on the Grid
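The relationship between users, VOs and resource centers described on these two slides can be sketched as a small data model. This is a simplified illustration, not the VOMS API: the class names, the method names, the example VO name and the certificate subject are all hypothetical, and real deployments use signed attribute certificates rather than an in-memory dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualOrganization:
    """Hypothetical stand-in for the membership data a VOMS server keeps."""
    name: str
    # Maps a certificate subject (DN) to a set of (group, role) pairs.
    members: dict = field(default_factory=dict)

    def add_member(self, subject_dn, group, role="member"):
        """VO Manager step: admit an already authenticated user."""
        self.members.setdefault(subject_dn, set()).add((group, role))

    def attributes(self, subject_dn):
        """Groups and roles of a user; empty set if not a VO member."""
        return self.members.get(subject_dn, set())


@dataclass
class ResourceCenter:
    """A site that decides which VOs it supports."""
    name: str
    supported_vos: set = field(default_factory=set)

    def authorizes(self, subject_dn, vo):
        """Allow access if the RC supports the VO and the user is a member."""
        return vo.name in self.supported_vos and bool(vo.attributes(subject_dn))


if __name__ == "__main__":
    vo = VirtualOrganization("example-vo")          # hypothetical VO name
    alice = "/C=RS/O=Example/CN=Alice"              # hypothetical subject DN
    vo.add_member(alice, group="/example-vo/physics")
    rc = ResourceCenter("RC-1", {"example-vo"})
    print(rc.authorizes(alice, vo))
    print(rc.authorizes("/C=RS/O=Example/CN=Bob", vo))
```

The point of the design is that a resource center never keeps a list of individual users; it only states which VOs it supports, and membership is managed centrally per VO.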
19User view of the Grid
20Ingredients for GRID development
- Right balance of push and pull factors is needed
- Supply side
- Technology: inexpensive HPC resources (Linux clusters)
- Technology: network infrastructure
- Financing: domestic, regional, EU, donations from industry
- Demand side
- Need for novel eScience applications
- Hunger for number crunching power and storage
capacity
21Supply side - clusters
- The cheapest supercomputers: massively parallel PC clusters
- This is possible due to
- Increase in PC processor speed (> Gflop/s)
- Increase in networking performance (1 Gb/s)
- Availability of stable OS (e.g. Linux)
- Availability of standard parallel libraries (e.g. MPI)
- Advantages
- Widespread choice of components/vendors, low price (by factor 5-10)
- Long warranty periods, easy servicing
- Simple upgrade path
- Disadvantages
- Good knowledge of parallel programming is required
- Hardware needs to be adjusted to the specific application (network topology)
- More complex administration
- Tradeoff: brain power <-> purchasing power
- The next step is GRID
- Distributed computing, computing on demand
- "Should do for computing the same as the Internet did for information" (UK PM, 2002)
22Supply side - network
- Needed at all scales
- World-wide
- Pan-European (GEANT2)
- Regional (SEEREN2, ...)
- National (NREN)
- Campus-wide (WAN)
- Building-wide (LAN)
- Remember: it is the end-user-to-end-user connection that matters
23GÉANT2: Pan-European IP R&E network
24GÉANT2 Global Connectivity
25Future development: regional network
26Supply side - financing
- National funding (Ministries responsible for research)
- Lobby the government to commit to Lisbon targets
- Level of financing should follow an increasing trend (as a % of GDP)
- Seek financing for clusters and network costs
- Bilateral projects and donations
- Regional initiatives
- Networking (HIPERB)
- Action Plan for R&D in SEE
- EU funding
- FP6: IST priority, eInfrastructures and GRIDs
- FP7
- CARDS
- Other international sources (NATO, ...)
- Donations from industry (HP, SUN, ...)
27Demand side - eScience
- Usage of computers in science
- Trivial: text editing, elementary visualization, elementary quadrature, special functions, ...
- Nontrivial: differential eq., large linear systems, searching combinatorial spaces, symbolic algebraic manipulations, statistical data analysis, visualization, ...
- Advanced: stochastic simulations, risk assessment in complex systems, dynamics of systems with many degrees of freedom, PDE solving, calculation of partition functions/functional integrals, ...
- Why is the use of computation in science growing?
- Computational resources are more and more powerful and available (Moore's law)
- Standard approaches are having problems: experiments are more costly, theory more difficult
- Emergence of new fields/consumers: finance, economy, biology, sociology
- Emergence of new problems with unprecedented storage and/or processor requirements
28Demand side - consumers
- Those who study
- Complex discrete time phenomena
- Nontrivial combinatorial spaces
- Classical many-body systems
- Stress/strain analysis, crack propagation
- Schrödinger eq., diffusion eq.
- Navier-Stokes eq. and its derivatives
- Functional integrals
- Decision making processes with incomplete information
- ...
- Who can deliver? Those with
- Adequate training in mathematics/informatics
- Stamina needed for complex problem solving
- Answer: rocket scientists (natural sciences and engineering)
29Scenario
[Figure: job submission scenario. Labels: User interface, Input sandbox, Output sandbox, Computing Element, Logging and bookkeeping, publish state, /bin/hostname, stdout.txt, stderr.txt]
A worker node is allocated by the local jobmanager
- STD input stream is read from file
- STD out and err. streams are redirected into files
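What happens to the job's standard streams on the worker node in this scenario can be imitated locally. The sketch below is an assumption-laden illustration, not Grid middleware code: `run_job` is a hypothetical helper that starts an executable with the standard input read from a file (if one is given) and the standard output and error streams redirected into `stdout.txt` and `stderr.txt`, the file names used in the scenario.

```python
import subprocess
from pathlib import Path


def run_job(executable, arguments=(), workdir=".", stdin_file=None):
    """Run a command with its standard streams bound to files.

    Standard output and error go to stdout.txt and stderr.txt inside
    `workdir`; standard input is read from `stdin_file` when given.
    Returns the exit code of the command.
    """
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    # Without an input file the job gets an empty standard input.
    stdin = open(stdin_file, "rb") if stdin_file else subprocess.DEVNULL
    try:
        with open(workdir / "stdout.txt", "wb") as out, \
             open(workdir / "stderr.txt", "wb") as err:
            result = subprocess.run([executable, *arguments],
                                    stdin=stdin, stdout=out, stderr=err)
    finally:
        if stdin_file:
            stdin.close()
    return result.returncode
```

On a system where `/bin/hostname` exists, calling `run_job("/bin/hostname", workdir="job_output")` would leave the host name in `job_output/stdout.txt`. The two files play the role of the output sandbox that is returned to the user interface.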