Title: NORA/Clusters
1 NORA/Clusters
- AMANO, Hideharu
- Textbook pp.140-147
2 NORA (No Remote Access Memory Model)
- No hardware shared memory
- Data exchange is done by messages (or packets)
- Dedicated synchronization mechanism is provided.
- High peak performance
- Message-passing libraries (MPI, PVM) are provided.
3 Message passing (Blocking)
4 Message passing (Non-blocking)
5 PVM (Parallel Virtual Machine)
- A buffer is provided for a sender.
- Both blocking and non-blocking receives are provided.
- Barrier synchronization
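The three ideas above (a buffer on the sender side, blocking versus non-blocking receive, and barrier synchronization) can be illustrated with a toy sketch. It uses only Python's standard `queue` and `threading` modules as a stand-in. It is not PVM or MPI code, and the names `run_demo`, `mailbox`, and `events` are invented for this illustration.

```python
import queue
import threading


def run_demo():
    """Toy message passing between a sender thread and the calling thread."""
    mailbox = queue.Queue()          # plays the role of the message buffer
    barrier = threading.Barrier(2)   # two participants: sender and receiver
    events = []

    def sender():
        mailbox.put(("tag-1", "hello"))  # buffered send: returns immediately
        barrier.wait()                   # wait until the receiver also arrives

    # Non-blocking receive on an empty buffer: returns at once, nothing there.
    try:
        mailbox.get_nowait()
    except queue.Empty:
        events.append("non-blocking receive: no message yet")

    worker = threading.Thread(target=sender)
    worker.start()

    # Blocking receive: the caller is suspended until a message arrives.
    tag, payload = mailbox.get()
    events.append(f"blocking receive: {tag} {payload}")

    barrier.wait()                   # barrier synchronization with the sender
    worker.join()
    events.append("both sides passed the barrier")
    return events


if __name__ == "__main__":
    for line in run_demo():
        print(line)
```

The non-blocking call lets the receiver do other work when no message is available, while the blocking call simply waits; the barrier releases both sides only once each has reached it.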
6 MPI (Message Passing Interface)
- A superset of PVM for one-to-one communication.
- Group communication
- Various communication modes are supported.
- Error checking with communication tags.
- Next week, homework based on MPI on a cluster will be assigned.
7 Shared memory model vs. Message passing model
- Shared memory: benefits
- Distributed OS is easy to implement.
- Automatic parallelizing compiler.
- Message passing like Smalltalk requires shared memory.
- GHC → KL1
- Message passing
- Formal verification is easy (Blocking)
- No side effects (a shared variable is a side effect itself)
- Small cost
8 Multicomputers vs. Clusters
- Multicomputers
- Dedicated hardware (CPU, network)
- High performance but expensive
- Hitachi's SR8000, Cray T3E, etc.
- WS/PC Clusters
- Using standard CPU boards
- High performance/cost ratio
- Standard bus often forms a bottleneck
- Beowulf Clusters
- Standard CPU boards, standard components
- LAN + TCP/IP
- Free software
- A cluster with a standard System Area Network (SAN) like Myrinet is often called a Beowulf Cluster
9 Beowulf Clusters (RWCP, using Myrinet)
10 Networks for NORA machines/Clusters
- Direct or Non-symmetric indirect networks
- Nodes are connected with links.
- Locality of communication can be used.
- Extension to large size is easy.
11 Basic direct networks
- Figure: linear, central concentration, ring, tree, complete connection, mesh
12 Metrics of direct interconnection networks (D and d)
- Diameter: D
- Number of hops between the two most distant nodes through the minimal path
- Degree: d
- The largest number of links per node.
- D represents performance and d represents cost
- Recent trends
- Performance: throughput
- Cost: the number of long links
13 Diameter
- Figure: diameter D = 2(n-1)
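As a minimal sketch of the two metrics, the code below computes the diameter by breadth-first search from every node and the degree as the largest neighbor count. The topology builders `linear`, `ring`, and `mesh2d` are helper names chosen for this example, and the graph is assumed to be connected. The 2(n-1) value is checked here on an n × n 2-D mesh, which is one network that has that diameter.

```python
from collections import deque


def degree(adj):
    """Largest number of links attached to any single node."""
    return max(len(neighbors) for neighbors in adj.values())


def diameter(adj):
    """Largest hop count between any two nodes along shortest paths."""
    def farthest(src):
        dist = {src: 0}
        todo = deque([src])
        while todo:
            u = todo.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    todo.append(v)
        return max(dist.values())

    return max(farthest(s) for s in adj)


def linear(n):
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


def ring(n):
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def mesh2d(n):
    steps = ((1, 0), (-1, 0), (0, 1), (0, -1))
    return {
        (x, y): [(x + dx, y + dy) for dx, dy in steps
                 if 0 <= x + dx < n and 0 <= y + dy < n]
        for x in range(n) for y in range(n)
    }


if __name__ == "__main__":
    for name, net in (("linear(8)", linear(8)), ("ring(8)", ring(8)),
                      ("mesh2d(4)", mesh2d(4))):
        print(name, "D =", diameter(net), "d =", degree(net))
```

For the 4 × 4 mesh this gives D = 6, which agrees with 2(n-1) for n = 4, and d = 4.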
14 Other requirements
- Uniformity: every node/link has the same configuration.
- Extendability: the size can be easily extended.
- Fault tolerance: a single fault on a link or node does not cause fatal damage to the whole network.
- Embeddability: emulating other networks
- Bisection bandwidth
15 Bisection bandwidth
The total amount of data traffic between two
halves of the network.
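A rough way to get a feel for this metric is to count the links that cross a cut dividing the nodes into two equal halves. The sketch below assumes every link has unit bandwidth and uses one natural cut per topology (the top address bit for a hypercube, the middle column boundary for a 2-D mesh); it does not search over all possible cuts. The function names are invented for this example.

```python
def crossing_links(links, half):
    """Count links with exactly one endpoint inside the given half."""
    return sum(1 for a, b in links if (a in half) != (b in half))


def hypercube_links(dim):
    """Each link joins two addresses that differ in exactly one bit."""
    n = 1 << dim
    return [(u, u ^ (1 << b)) for u in range(n) for b in range(dim)
            if u < (u ^ (1 << b))]


def mesh2d_links(n):
    links = []
    for x in range(n):
        for y in range(n):
            if x + 1 < n:
                links.append(((x, y), (x + 1, y)))
            if y + 1 < n:
                links.append(((x, y), (x, y + 1)))
    return links


if __name__ == "__main__":
    cube_half = {u for u in range(16) if u < 8}                    # top bit is 0
    mesh_half = {(x, y) for x in range(2) for y in range(4)}       # left columns
    print("16-node hypercube:", crossing_links(hypercube_links(4), cube_half))
    print("4 x 4 mesh:", crossing_links(mesh2d_links(4), mesh_half))
```

With 16 nodes each, the hypercube has 8 links across its cut and the mesh has 4, which is one reason the hypercube is credited with a high bisection bandwidth.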
16 Hypercube
- Figure: hypercube with 16 nodes labeled 0000 to 1111
17 Routing on hypercube
- Figure: 16-node hypercube (0000 to 1111), routing 0101 → 1100 by the different bits
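The routing idea in this slide can be sketched as follows: XOR the source and destination addresses to find the different bits, then flip those bits one at a time, each flip being one hop. The order in which bits are corrected (lowest bit first here) is an assumption of this sketch; the slide does not specify it. `route` and `fmt` are names chosen for the example.

```python
def route(src, dst, dim):
    """Return the list of nodes visited when correcting differing bits in order."""
    path = [src]
    current = src
    diff = src ^ dst                 # 1-bits mark the dimensions to traverse
    for bit in range(dim):           # assumed order: lowest dimension first
        if diff & (1 << bit):
            current ^= 1 << bit      # one hop along this dimension
            path.append(current)
    return path


def fmt(node, dim):
    """Binary address padded to dim digits."""
    return format(node, f"0{dim}b")


if __name__ == "__main__":
    hops = route(0b0101, 0b1100, 4)
    print(" -> ".join(fmt(n, 4) for n in hops))   # 0101 -> 0100 -> 1100
```

The number of hops equals the number of different bits, so 0101 → 1100 takes two hops, and a pair of addresses that differ in every bit takes the maximum number of hops.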
18 The diameter of hypercube
- Figure: 16-node hypercube (0000 to 1111), 0101 → 1010
- All bits are different → the largest distance
19 Characteristics of hypercube
- $D = d = \log_2 N$
- High throughput and bisection bandwidth
- Embeddability for various networks
- Satisfies all fundamental characteristics of direct networks (extendability is questionable)
- Most of the first generation of NORA machines are hypercubes (iPSC, NCUBE, FPS-T)
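The relation between diameter, degree, and the number of nodes N can be checked numerically. This sketch relies on the fact that the shortest path between two hypercube nodes has as many hops as their addresses have different bits (the Hamming distance). It also counts the total number of links, which grows as N·log2(N)/2. The function names are illustrative only.

```python
def hamming(a, b):
    """Number of bit positions in which two addresses differ."""
    return bin(a ^ b).count("1")


def hypercube_metrics(dim):
    n = 1 << dim
    neighbors = {u: [u ^ (1 << b) for b in range(dim)] for u in range(n)}
    d = max(len(v) for v in neighbors.values())               # degree
    big_d = max(hamming(u, v) for u in range(n) for v in range(n))  # diameter
    links = sum(len(v) for v in neighbors.values()) // 2      # each link counted twice
    return {"N": n, "d": d, "D": big_d, "links": links}


if __name__ == "__main__":
    print(hypercube_metrics(4))   # {'N': 16, 'd': 4, 'D': 4, 'links': 32}
    print(hypercube_metrics(6))   # {'N': 64, 'd': 6, 'D': 6, 'links': 192}
```

Going from 16 to 64 nodes multiplies the link count by six while the diameter only grows from 4 to 6.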
20 Problems of hypercube
- Large number of links
- Large number of distant links
- High-bandwidth links are difficult to provide for high-performance processors.
- Small D does not contribute to performance because of innovations in packet transfer.
- Programming is difficult → Hypercube's dilemma
21 Is hypercube extendable?
- Yes (theoretical viewpoint)
- The throughput increases in proportion to the system size.
- No (practical viewpoint)
- The system size is limited by the number of links per node.
22 Hypercube's dilemma
- Programming that considers the topology is difficult, unlike with 2-D/3-D mesh/torus
- Programming for a random communication network cannot make use of locality of communication.
- 2-D/3-D mesh/torus
- Killer applications fit the topology
- Partial differential equations, image processing, etc.
- Simple mapping strategies
- Frequently communicating processes should be assigned to neighboring nodes
23 k-ary n-cube
- Generalized mesh/torus
- A k-ary n-digit number is assigned to each node
- For each dimension (digit), links are provided to the nodes whose numbers are the same except in that digit, where they are adjacent in order.
- Wrap-around links (k-1 → 0) form a torus; otherwise, a mesh.
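The link rule above can be sketched as a neighbor function. A node is represented as a tuple of n digits, each in the range 0 to k-1; for every digit the function steps one position down and one up, wrapping around only when a torus is requested. The name `neighbors` and the tuple representation are choices made for this illustration.

```python
def neighbors(node, k, torus=False):
    """Nodes linked to `node` in a k-ary n-cube (n = number of digits in node)."""
    result = []
    for dim, digit in enumerate(node):
        for step in (-1, 1):
            new = digit + step
            if torus:
                new %= k                  # wrap-around link, e.g. k-1 -> 0
            elif not 0 <= new < k:
                continue                  # mesh: no link past the edge
            candidate = node[:dim] + (new,) + node[dim + 1:]
            # Skip self-links (k = 1) and duplicates (k = 2 with wrap-around).
            if candidate != node and candidate not in result:
                result.append(candidate)
    return result


if __name__ == "__main__":
    print(neighbors((0, 0, 0), 5))              # corner of a mesh: 3 links
    print(neighbors((0, 0, 0), 5, torus=True))  # same node in a torus: 6 links
```

In the mesh, the corner node 000 of a 5-ary 3-cube is linked to 100, 010, and 001; in the torus it is additionally linked to 400, 040, and 004.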
24 k-ary n-cube
25 k-ary n-cube
- Figure: 3-ary 1-cube, 3-ary 2-cube
26 3-ary 4-cube
27 k-ary n-cube
- Figure: nodes labeled with 3-digit base-5 numbers, 000 to 444
28 5-ary 4-cube
29 Properties of k-ary n-cube
- A class of networks that has linear, ring, 2-D/3-D mesh/torus, and hypercube (binary n-cube) as its members.
- Small $d = 2n$ but large $D$ ($O(N^{1/n})$ for $N = k^n$ nodes)
- Large number of neighboring links
- k-ary n-cube has been the mainstream of NORA networks. Recently, small-n large-k networks are trendy.
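The trade-off between degree and diameter can be made concrete with the standard closed forms, stated here as a sketch rather than taken from the slides: a mesh-type k-ary n-cube has D = n(k-1), a torus-type one has D = n·floor(k/2), and the degree is 2n when k ≥ 3 (for k = 2 the two neighbors in a dimension coincide, so it is n). The function name is hypothetical and k ≥ 2 is assumed.

```python
def kary_ncube_metrics(k, n, torus=False):
    """Return (number of nodes, diameter D, degree d) for a k-ary n-cube, k >= 2."""
    nodes = k ** n
    if torus:
        diameter = n * (k // 2)      # at most half-way around each ring
    else:
        diameter = n * (k - 1)       # end to end along each dimension
    degree = 2 * n if k > 2 else n   # binary n-cube: one neighbor per dimension
    return nodes, diameter, degree


if __name__ == "__main__":
    print(kary_ncube_metrics(5, 3))              # (125, 12, 6)
    print(kary_ncube_metrics(5, 3, torus=True))  # (125, 6, 6)
    print(kary_ncube_metrics(2, 4))              # (16, 4, 4), the hypercube
```

For a fixed n the degree stays constant while the diameter grows linearly with k, which is the small-d, large-D behavior noted above.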
30 Exercise
- Calculate the diameter (D) and degree (d) of the 6-ary 4-cube (mesh-type).