Title: The fat-tree topology and its performance issues
1. The fat-tree topology and its performance issues
- Different names for fat-trees:
  - folded Clos networks
  - constant bisection bandwidth (CBB) networks
  - trees with multiple roots
  - multi-stage networks
- The fat-tree is the de facto topology in high-speed system area networks.
  - Almost all medium and large clusters (> 100 ports) are connected with some kind of fat-tree topology.
2. Why is the fat-tree so popular?
- Clusters with nodes connected by one centralized switch have many desirable properties for building large (scalable) systems.
  - Constant latency
  - Bisection bandwidth scales linearly with the number of nodes
- Fat-trees approximate a centralized switch with a large number of ports (the Clos network).
3. Fat-tree construction
- The fat-tree as originally defined by C.E. Leiserson is very flexible regarding bisection bandwidth.
  - C.E. Leiserson, Fat-trees: Universal Networks for Hardware-Efficient Supercomputing, IEEE Transactions on Computers, 34(10):892-901, Oct. 1985.
- The ones used in current system area networks are mostly constant bisection bandwidth (CBB) networks.
- We will introduce two subclasses of fat-trees.
4. Some example fat-trees
5. General idea in fat-tree construction
- A perfect fat-tree has the same functionality as a crossbar.
- Use smaller switches to approximate large switches.
  - Connectivity is reduced, but the topology is implementable.
- Constant bisection bandwidth is maintained by having the same number of links at each level.
6. FT(m, n): m-port n-tree
- Reference: X. Lin, Y. Chung and T. Huang, A Multiple LID Routing Scheme for Fat-tree Based InfiniBand Networks, IEEE IPDPS 2004.
- Fat-trees built with m-port switches: all internal nodes are of degree m.
- FT(m, n) is built from sub fat-trees (SUBFT): fat-trees with open up-links.
7. Sub fat-trees (SUBFT)
- SUBFT(m, h) has (m/2)^h open up-links and connects (m/2)^h machines (leaves). It consists of:
  - (m/2)^(h-1) top-level switches
  - m/2 copies of SUBFT(m, h-1)
8. FT(m, h)
- (m/2)^(h-1) top-level switches
- m copies of SUBFT(m, h-1)
9. FT(m, h)
- Number of machines: m(m/2)^(h-1)
- Number of switches: (2h-1)(m/2)^(h-1)
- Typical value for m: 24
- Typical value for h: 2 or 3
- FT(24, 3): 3456 ports, 720 switches
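The counts above can be checked against the recursive construction. The following is a minimal sketch, not code from the referenced paper: the function names are made up for illustration, and it assumes that SUBFT(m, 0) is a single machine with no switch (the slides do not state the base case).

```python
def subft_switches(m, h):
    """Switches in SUBFT(m, h), counted by following the recursive construction."""
    if h == 0:
        return 0  # assumption: SUBFT(m, 0) is a bare machine, no switch
    # (m/2)^(h-1) top-level switches plus m/2 copies of SUBFT(m, h-1)
    return (m // 2) ** (h - 1) + (m // 2) * subft_switches(m, h - 1)


def ft_switches_recursive(m, h):
    """Switches in FT(m, h): (m/2)^(h-1) top-level switches plus m copies of SUBFT(m, h-1)."""
    return (m // 2) ** (h - 1) + m * subft_switches(m, h - 1)


def ft_size(m, h):
    """Closed-form (machines, switches) for FT(m, h)."""
    machines = m * (m // 2) ** (h - 1)
    switches = (2 * h - 1) * (m // 2) ** (h - 1)
    return machines, switches


if __name__ == "__main__":
    for m, h in [(4, 3), (24, 2), (24, 3)]:
        machines, switches = ft_size(m, h)
        # The recursive count and the closed form should agree.
        assert switches == ft_switches_recursive(m, h)
        print(f"FT({m}, {h}): {machines} machines, {switches} switches")
```

For FT(24, 3) both the recursive count and the closed form give 3456 machines and 720 switches, matching the figures quoted above.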
10. FT(4, 3)
11. Generalized fat-tree: GFT(h, m, w)
- Reference: S. R. Ohring, M. Ibel, S. K. Das, M. J. Kumar, On Generalized Fat-tree, IEEE IPPS 1995.
- FT(m, n) is a constant bisection bandwidth (CBB) network: each node has m/2 children and m/2 parents.
- GFT(h, m, w)
  - Each node has m children and w parents.
  - m:w is the bisection bandwidth ratio.
  - m:w = 1:1 is sometimes called full bisection bandwidth (FBB).
12. GFT(h, m, w)
- GFT(0, m, w): a single node
- GFT(h+1, m, w):
  - w^(h+1) top-level switches (each having m children)
  - m copies of GFT(h, m, w)
- Similar to how FT(m, n) is constructed.
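The recursive definition of GFT(h, m, w) can be turned into a small counting sketch. The function names below are illustrative only. The per-level formula m^(h-i) * w^i is derived here from the recursion (each level of GFT(h-1, m, w) is replicated m times, and w^h nodes are added on top); it is not quoted from the slides.

```python
def gft_nodes_recursive(h, m, w):
    """Total nodes in GFT(h, m, w), following the recursive construction."""
    if h == 0:
        return 1  # GFT(0, m, w) is a single node
    # w^h top-level switches plus m copies of GFT(h-1, m, w)
    return w ** h + m * gft_nodes_recursive(h - 1, m, w)


def gft_level_sizes(h, m, w):
    """Number of nodes at each level, from level 0 (leaves) up to level h (top)."""
    return [m ** (h - i) * w ** i for i in range(h + 1)]


def gft_uplinks_per_level(h, m, w):
    """Links between level i and level i+1: every level-i node has w parents."""
    return [m ** (h - i) * w ** (i + 1) for i in range(h)]


if __name__ == "__main__":
    for h, m, w in [(2, 2, 1), (2, 2, 2), (2, 4, 2)]:
        sizes = gft_level_sizes(h, m, w)
        assert sum(sizes) == gft_nodes_recursive(h, m, w)
        print(f"GFT({h}, {m}, {w}): level sizes {sizes}, "
              f"up-links per level {gft_uplinks_per_level(h, m, w)}")
```

When m = w the number of links stays the same from level to level, which is the constant bisection bandwidth case. When w < m (for example GFT(2, 4, 2)), the link count shrinks by a factor of w/m at every level going up.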
13. GFT(x, 2, 1)
[Figure: GFT(0, 2, 1), GFT(1, 2, 1), and GFT(2, 2, 1)]
14. GFT(x, 2, 2)
[Figure: GFT(0, 2, 2), GFT(1, 2, 2), and GFT(2, 2, 2)]
15. GFT(3, 2, 2)
[Figure: GFT(3, 2, 2); labels: GFT(2, 2, 2), GFT(1, 2, 2)]
How is this different from FT(4, 3)?
16. GFT(2, 4, 4)
17. GFT(2, 3, 3)
18. GFT(2, 2, 3)
19. GFT(2, 4, 2)
20. Performance issues in fat-trees
- A Clos network is non-blocking when n >= 2m-1.
- 2-level fat-trees (e.g. GFT(1, 2, 4)) are equivalent to Clos networks, thus the name folded Clos.
- Can 2-level fat-trees achieve non-blocking communication?
21. Can 2-level fat-trees achieve non-blocking communication?
- Clos networks are non-blocking when:
  - The system knows all ongoing traffic.
    - Needs a centralized controller.
  - The source can use any path to the destination.
    - Needs to support a large number of paths.
- Are these conditions practical in large computer clusters?
22. Practical fat-trees
- 2-level CBB networks, or folded Clos networks with n = m, are the minimum required to achieve non-blocking communication (rearrangeably non-blocking).
- Network contention is possible due to the lack of a centralized controller.
- Techniques are needed to minimize the possibility of network contention.
  - What kinds of techniques can do this?
23. Practical fat-trees
- What kinds of techniques can reduce contention?
  - Routing: spread traffic among all links.
    - Adaptive routing (Quadrics)
      - Requires multiple paths; avoids links currently in use.
      - Limited applicability: used on up-links, but not on down-links.
    - Source routing (Myrinet): similar idea to adaptive routing, but less flexible.
    - Deterministic routing (InfiniBand): worst performer, but simple to implement.
  - Congestion control: slow down when the network is in trouble.
    - A reactive approach: is this good for high-speed networks?
24. Routing issues in fat-trees
Can we compute routes that achieve non-blocking communication for any permutation?
25. A case study of current fat-tree interconnection networks
- Reference: T. Hoefler, T. Schneider, and A. Lumsdaine, Multistage Interconnection Networks are not Crossbars: Effects of static routing in high performance networks, IEEE Cluster, 2008.
- Many large-scale fat-tree based networks have been built. How are they doing?
26. Performance metrics
- User-perceived bisection bandwidth
  - 4X DDR InfiniBand: 20 Gbps between each pair.
  - What happens when half of the machines send to the other half simultaneously?
    - In a crossbar, all pairs should get 20 Gbps.
    - How about a fat-tree?
  - Due to routing constraints, the user-perceived bisection bandwidth depends on the permutation.
27. User-perceived bisection bandwidth on some systems
- Results obtained using simulation (average of many random permutations), as a percentage of a crossbar's bandwidth:
  - Ranger (3908 nodes): 57.5%
  - Atlas (1142 nodes): 55.6%
  - Thunderbird (4390 nodes): 40.6%
- 40% to 60% of a crossbar's bandwidth seems not too bad.
- But the results are the average case, not the worst case.
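The dependence of user-perceived bandwidth on the traffic pattern can be illustrated with a toy simulation. This is a rough sketch, not the simulator or methodology of the referenced study, and it will not reproduce the numbers above. It assumes a 2-level FT(m, 2), a static rule that sends traffic for destination d through top switch d mod (m/2), and a simple model in which a flow's bandwidth is 1 divided by the number of flows sharing its most loaded link. All names are hypothetical.

```python
import random
from collections import Counter


def effective_bisection_fraction(m, trials=200, seed=0):
    """Average per-flow bandwidth, as a fraction of a crossbar, over random bisections.

    Toy model of FT(m, 2): m leaf switches, each with k = m/2 hosts and
    k up-links, one to each of the k top switches.
    """
    k = m // 2
    hosts = list(range(m * k))
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Random bisection: the first half sends, the second half receives.
        rng.shuffle(hosts)
        half = len(hosts) // 2
        pairs = list(zip(hosts[:half], hosts[half:]))

        # Static routing (assumed): destination d is reached via top switch d % k.
        # Down-links are then unique per destination, so only up-links
        # (source leaf, top switch) can be shared in a permutation.
        up_load = Counter()
        for src, dst in pairs:
            if src // k != dst // k:
                up_load[(src // k, dst % k)] += 1

        share = 0.0
        for src, dst in pairs:
            if src // k == dst // k:
                share += 1.0  # same leaf switch: no up-link needed
            else:
                share += 1.0 / up_load[(src // k, dst % k)]
        total += share / len(pairs)
    return total / trials


if __name__ == "__main__":
    for m in (8, 24):
        frac = effective_bisection_fraction(m)
        print(f"FT({m}, 2): about {100 * frac:.1f}% of crossbar bandwidth")
```

Because the result is an average over random pairings, it says nothing about the worst-case permutation, which can be considerably lower under static routing.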
28. Other effects of network contention
- Bandwidth varies with communication pattern.
- Performance prediction and modeling is not easy.
- Message latency is also affected.
29. Conclusion
- Fat-trees can only approximate a crossbar.
- Are there better topologies than fat-trees under practical constraints?
- In the current fat-tree topology, what are the best routing schemes for adaptive, source, and single-path routing?
- It is commonly believed that adaptive routing is good for fat-trees, but is adaptive routing good enough?
30. References
- Fat-tree origins
  - C.E. Leiserson, Fat-trees: Universal Networks for Hardware-Efficient Supercomputing, IEEE Transactions on Computers, 34(10):892-901, Oct. 1985.
- Fat-tree construction
  - S. R. Ohring, M. Ibel, S. K. Das, M. J. Kumar, On Generalized Fat-tree, IEEE IPPS 1995.
  - X. Lin, Y. Chung and T. Huang, A Multiple LID Routing Scheme for Fat-tree Based InfiniBand Networks, IEEE IPDPS 2004.
- Fat-tree routing and performance issues
  - T. Hoefler, T. Schneider, and A. Lumsdaine, Multistage Interconnection Networks are not Crossbars: Effects of static routing in high performance networks, IEEE Cluster, 2008.
  - P. Geoffray and T. Hoefler, Adaptive Routing Strategies for Modern High Performance Networks, 16th Annual IEEE Symposium on High Performance Interconnects (HOTI 2008), pages 165-172, Aug. 2008.