Title: Grids: From Computing to Data Management
1 Grids: From Computing to Data Management
- Jean-Marc Pierson
- Lionel Brunie
- INSA Lyon, Dec. 2004
2 Outline
- a very short introduction to Grids
- a brief introduction to parallelism
- a not so short introduction to Grids
- data management in Grids
3 Grid concepts: an analogy
- Electric power distribution: the electric network and high voltage
4 Grid concepts
- Computing power distribution: the Internet network and high performance (parallelism and distribution)
5 Parallelism: an introduction
- Grids date back only to 1996
- Parallelism is older! (first classification in 1972)
- Motivations
- need for more computing power (weather forecasting, atomic simulation, genomics)
- need for more storage capacity (petabytes and more)
- in a word: improve performance! 3 ways...
- Work harder -> use faster hardware
- Work smarter -> optimize algorithms
- Get help -> use more computers!
6 Parallelism: the old classification
- Flynn (1972)
- Parallel architectures classified by the number of instruction streams (single/multiple) and the number of data streams (single/multiple) treated at the same time
- SISD: Single Instruction, Single Data
- SIMD: Single Instruction, Multiple Data
- MISD: Multiple Instructions, Single Data
- MIMD: Multiple Instructions, Multiple Data
7 SIMD architectures
- in decline since 1997 (disappeared from the marketplace)
- concept: the same instruction is performed on several CPUs (as many as 16,384) on different data
- data are treated in parallel
- subclass: vector processors
- act on arrays of similar data using specialized CPUs; used in compute-intensive physics
8 MIMD architectures
- different instructions are performed in parallel on different data
- divide and conquer: many subtasks run in parallel to shorten the global execution time
- large heterogeneity of systems
9 Another taxonomy
- based on how memories and processors are interconnected
- SMP: Symmetric Multiprocessors
- MPP: Massively Parallel Processors
- Constellations
- Clusters
- Distributed systems
10 Symmetric Multi-Processors (1/2)
- Small number of identical processors (2-64)
- Share-everything architecture
- single memory (shared-memory architecture)
- single I/O
- single OS
- equal access to resources
[Figure: several CPUs connected through a network to a single shared memory and disk (HD)]
11 Symmetric Multi-Processors (2/2)
- Pros
- easy to program: only one address space to exchange data (but the programmer must take care of synchronization in memory accesses: critical sections)
- Cons
- poor scalability: when the number of processors increases, the cost of transferring data becomes too high; more CPUs means more memory accesses through the network, hence more memory bandwidth needed!
- Direct transfer from processor to processor (-> MPP)
- Different interconnection schemes (full interconnection impossible: it grows in O(n^2) when the number of processors grows in O(n)): bus, crossbar, multistage crossbar, ...
12 Massively Parallel Processors (1/2)
- Several hundred nodes with a high-speed interconnection network/switch
- A share-nothing architecture
- each node owns its own memory (distributed memory) and one or more processors, and each runs its own OS copy
[Figure: nodes, each pairing a CPU with its own memory, linked by an interconnection network]
13 Massively Parallel Processors (2/2)
- Pros
- good scalability
- Cons
- communication between nodes takes longer than in shared memory; improved interconnection schemes: hypercube, (2D or 3D) torus, fat tree, multistage crossbars
- harder to program
- data and/or tasks have to be explicitly distributed to the nodes
- remote procedure calls (RPC, Java RMI)
- message passing between nodes (PVM, MPI), with synchronous or asynchronous communications
- DSM (Distributed Shared Memory): a virtual shared memory
- upgrade processors and/or communication?
14 Constellations
- a small number of processors (up to 16) clustered in SMP nodes (fast connection)
- the SMPs are connected through a less costly network with poorer performance
- with DSM, memory may be addressed globally: each CPU has a global memory view, and memory and cache coherence is guaranteed (ccNUMA)
[Figure: SMP nodes and peripherals linked by an interconnection network]
15 Clusters
- a collection of workstations (PCs for instance) interconnected through a high-speed network, acting as an MPP/DSM with network RAM and software RAID (redundant storage, parallel I/O)
- clusters: a specialized version of NOWs (Networks Of Workstations)
- Pros
- low cost
- standard components
- take advantage of unused computing power
16 Distributed systems
- interconnection of independent computers
- each node runs its own OS
- each node might be any of: SMP, MPP, constellation, cluster, individual computer
- the heart of the Grid!
- "A distributed system is a collection of independent computers that appear to the users of the system as a single computer." (Distributed Operating Systems, A. Tanenbaum, Prentice Hall, 1994)
17 Where are we today (Nov. 22, 2004)?
- a source of efficient and up-to-date information: www.top500.org
- the 500 best architectures!
- we are heading towards 100 TFlops
- 1 Flops = 1 floating-point operation per second
- 1 TeraFlops = 1000 GigaFlops = 1,000,000 MegaFlops = 1,000,000,000,000 Flops: one thousand billion operations per second
18 Today's best
- comparison on a similar matrix math test (Linpack): Ax = b

Rank  TFlops  Machine                                Nb of procs
1     70.72   (USA) IBM BlueGene/L / DOE             32768
2     51.87   (USA) SGI Columbia / NASA              10160
3     35.86   (Japan) NEC Earth Simulator             5120
4     20.53   (Spain) IBM MareNostrum                 3564
5     19.94   (USA) California Digital Corporation    4096
41     3.98   (France) HP AlphaServer / CEA           2560
19 NEC Earth Simulator
- a MIMD with distributed memory
- single-stage crossbar, 2700 km of cables
- 700 TB of disk space, 1.6 PB of mass storage
- the size of 4 tennis courts, 3 floors
20 How does it grow?
- in 1993 (11 years ago!)
- no. 1: 59.7 GFlops
- no. 500: 0.4 GFlops
- Sum: 1.17 TFlops
- in 2004 (yesterday?)
- no. 1: 70 TFlops (x1118)
- no. 500: 850 GFlops (x2125)
- Sum: 1127.4 TFlops and 408,629 processors
23 Problems of parallelism
- Two models of parallelism
- driven by data flow: how to distribute data?
- driven by control flow: how to distribute tasks?
- Scheduling
- which task to execute, on which data, when?
- how to maximize compute time (overlap communication and computation?)
- Communication
- using shared memory?
- using explicit node-to-node communication?
- what about the network?
- Concurrent accesses
- to memory (in shared-memory systems)
- to input/output (parallel I/O)
24 The performance? Ideally it grows linearly
- Speed-up
- if TS is the best time to treat a problem sequentially, the time with P processors should be TP = TS/P!
- Speedup = TS/TP
- limited (Amdahl's law): any program has a sequential part and a parallel part, TS = F + T//; thus the speedup is limited: S = (F + T//) / (F + T// / P) < 1/F (with times normalized so that TS = 1, F being the sequential fraction)
- Scale-up
- if TPS is the time to treat a problem of size S with P processors, then TPS should also be the time to treat a problem of size nS with nP processors
25 Network performance analysis
- scalability: can the network be extended?
- limited by wire length, physical problems
- fault tolerance: what if one node is down? (for instance in a hypercube)
- multiple accesses to the medium?
- deadlocks?
- The metrics
- latency: time to establish a connection
- bandwidth: measured in MB/s
26 Tools/environments for parallelism (1/2)
- Communication between nodes
- by global memory! (if possible, real or virtual)
- otherwise:
- low-level communication: sockets
- s = socket(AF_INET, SOCK_STREAM, 0);
- mid-level communication libraries (PVM, MPI)
- info = pvm_initsend(PvmDataDefault);
- info = pvm_pkint(array, 10, 1);
- info = pvm_send(tid, 3);
- remote service/object calls (RPC, RMI, CORBA)
- the service runs on a distant node; only its name and parameters (in, out) have to be known
27 Tools/environments for parallelism (2/2)
- Programming tools
- threads: lightweight processes
- data-parallel languages (for distributed-memory architectures)
- HPF (High Performance Fortran)
- say how data (arrays) are placed; the system will infer the best placement of computations (to minimize total execution time, e.g. further communications)
- task-parallel languages (for shared-memory architectures)
- OpenMP: compiler directives and library routines based on threads. The parallel program stays close to the sequential one: it is a step-by-step transformation
- parallel loop directives (PARALLEL DO)
- task-parallel constructs (PARALLEL SECTIONS)
- PRIVATE and SHARED data declarations
28 Parallel databases: motivations
- Necessity
- information systems have increased in size!
- the transactional load has increased in volume!
- memory and I/O bottlenecks have become worse and worse
- queries have increased in complexity (multiple sources, multiple formats: text, image, video)
- Price
- the price/performance ratio is continuously decreasing (see clusters)
- New applications
- data mining (genomics, health data)
- decision support (data warehouses)
- management of complex hybrid data
29 Target application example (1/2): video servers
- Huge volume of raw data
- bandwidth 150 Mb/s: 1h is about 70 GB
- HDTV bandwidth 863 Mb/s: 1h is about 362 GB
- 1h of MPEG-2: about 2 GB
- NEED for parallel I/O
- Highly structured, complex and voluminous metadata (descriptors)
- NEED for large memory!
30 Target application example (2/2): medical image databases
- images produced by PACS (Picture Archiving and Communication Systems)
- 1 year of production: 8 TB of data
- heterogeneous data (multiple imaging devices)
- heterogeneous queries
- complex manipulations (multimodal image fusion, feature extraction)
- NEED for CPUs, memory, I/O!
31 Another motivation: the theoretical problems!
- query optimization
- execution strategies
- load balancing
32 Intrinsic limitations
- Startup time
- Contentions
- concurrent accesses to shared resources
- sources of contention:
- architecture
- data partitioning
- communication management
- execution plan
- Load imbalance
- response time = that of the slowest process
- NEED to balance data, I/O, computations, communications
33 Shared-memory architectures and databases
- Pros
- data are transparently accessible
- easier load balancing
- fast access to data
- Cons
- scalability
- availability
- memory and I/O contentions
34 Shared-disk architectures and databases
- Pros
- no memory contention
- easy code migration from a uni-processor server
- Cons
- cache consistency management
- I/O bottleneck
35 Share-nothing architectures and databases
- Pros
- cost
- scalability
- availability
- Cons
- data partitioning
- communication management
- load balancing
- complexity of the optimization process
36 Communication: high-performance networks (1/2)
- High bandwidth: Gb/s
- Low latency: µs
- Thanks to:
- point-to-point links
- switch-based topologies
- custom VLSI (Myrinet)
- network protocols (ATM)
- kernel bypassing (VIA)
- network technology (optical fiber)
37 Communication: high-performance networks (2/2)
- Opportunities for parallel databases
- WAN: connecting people to archives (VoD, CDN)
- LAN: the local network as a parallel machine
- NOWs are used
- as virtual super-servers
- e.g. one Oracle server plus some read-only databases on idle workstations (hot sub-bases)
- as virtual parallel machines
- a different database on several machines (in a hospital: one for cytology, one for MRI, one for radiology, ...)
- SAN on PoPC (Piles of PCs), clusters
- low-cost parallel DBMSs
38 Bibliography / Webography
- G.C. Fox, R.D. Williams and P.C. Messina, "Parallel Computing Works!", Morgan Kaufmann, 1994, ISBN 1-55860-253-4
- M. Cosnard and D. Trystram, "Parallel Algorithms and Architectures", Thomson Learning, 1994, ISBN 1-85032-125-6
- M. Gengler, S. Ubéda and F. Desprez, "Initiation au parallélisme : concepts, architectures et algorithmes" (in French), Masson, 1995, ISBN 2-225-85014-3
- Parallelism: www.ens-lyon.fr/desprez/SCHEDULE/tutorials.html, www.buyya.com/cluster
- Grids: www.lri.fr/fci/Hammamet/Cosnard-Hammamet-9-4-02.ppt
- TOP 500: www.top500.org
- PVM: www.csm.ornl.gov/pvm
- OpenMP: www.openmp.org
- HPF: www.crpc.rice.edu/HPFF