Title: EE384Y: Packet Switch Architectures
1. EE384Y Packet Switch Architectures, Part II: Scaling Crossbar Switches
Nick McKeown, Professor of Electrical Engineering and Computer Science, Stanford University
nickm_at_stanford.edu
http://www.stanford.edu/nickm
2. Outline
- Up until now, we have focused on high-performance packet switches with:
  - A crossbar switching fabric,
  - Input queues (and possibly output queues as well),
  - Virtual output queues, and
  - A centralized arbitration/scheduling algorithm.
- Today we'll talk about the implementation of the crossbar switch fabric itself. How are crossbars built, how do they scale, and what limits their capacity?
3. Crossbar switch: Limiting factors
- N^2 crosspoints per chip, or N x N-to-1 multiplexors.
- It's not obvious how to build a crossbar from multiple chips.
- Capacity of I/Os per chip.
  - State of the art: about 300 pins, each operating at 3.125 Gb/s, gives roughly 1 Tb/s per chip.
  - About 1/3 to 1/2 of this capacity is available in practice because of overhead and speedup.
- Crossbar chips today are limited by I/O capacity.
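The I/O budget above is simple arithmetic, sketched below. The pin count, per-pin rate, and the 1/3 to 1/2 usable fraction come from the slide; the function names are illustrative only. The raw figure works out to 937.5 Gb/s, which the slide rounds to 1 Tb/s.

```python
# Back-of-the-envelope I/O budget for one crossbar chip (figures from the slide).
PINS = 300            # approximate number of high-speed pins
RATE_GBPS = 3.125     # per-pin signalling rate in Gb/s


def raw_capacity_gbps(pins: int = PINS, rate_gbps: float = RATE_GBPS) -> float:
    """Aggregate raw I/O rate of the chip in Gb/s."""
    return pins * rate_gbps


def usable_range_gbps(raw_gbps: float) -> tuple[float, float]:
    """Usable capacity if only 1/3 to 1/2 survives overhead and speedup."""
    return raw_gbps / 3, raw_gbps / 2


raw = raw_capacity_gbps()
low, high = usable_range_gbps(raw)
print(f"raw: {raw} Gb/s, usable: between {low} and {high} Gb/s")
```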
4. Scaling number of outputs: Trying to build a crossbar from multiple chips
[Figure: a larger crossbar assembled from building-block chips. Labels: building block, 4 inputs, 4 outputs.]
Eight inputs and eight outputs required!
5. Scaling line-rate: Bit-sliced parallelism
- Cell is striped across multiple identical planes.
- Crossbar-switched bus.
- Scheduler makes the same decision for all slices.
[Figure: a linecard stripes each cell across k crossbar planes (planes 1 to 8 shown), all driven by a single scheduler.]
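A minimal sketch of the striping idea, under stated assumptions: slices are taken at byte granularity rather than bit granularity to keep the code short, and `stripe`, `reassemble`, and `switch_bit_sliced` are hypothetical names. The point it illustrates is that every plane applies the same input-to-output match to its own slice of the cell.

```python
def stripe(cell: bytes, k: int) -> list[bytes]:
    """Split a cell into k slices; slice i carries bytes i, i+k, i+2k, ..."""
    if len(cell) % k != 0:
        raise ValueError("cell length must be a multiple of k")
    return [cell[i::k] for i in range(k)]


def reassemble(slices: list[bytes]) -> bytes:
    """Interleave the k slices back into the original cell."""
    k = len(slices)
    out = bytearray(len(slices[0]) * k)
    for i, piece in enumerate(slices):
        out[i::k] = piece
    return bytes(out)


def switch_bit_sliced(cells: dict[int, bytes], match: dict[int, int], k: int) -> dict[int, bytes]:
    """Carry each cell over k planes; all planes use the same match."""
    planes = [dict() for _ in range(k)]          # planes[p][output] = slice
    for inp, out in match.items():
        for p, piece in enumerate(stripe(cells[inp], k)):
            planes[p][out] = piece               # same decision on every plane
    return {out: reassemble([planes[p][out] for p in range(k)])
            for out in match.values()}


cells = {0: bytes(range(64)), 1: bytes(range(64, 128))}
delivered = switch_bit_sliced(cells, match={0: 1, 1: 0}, k=8)
print(delivered[1] == cells[0], delivered[0] == cells[1])
```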
6. Scaling line-rate: Time-sliced parallelism
- A cell is carried by one plane and takes k cell times.
- Scheduler is unchanged.
- Scheduler makes a decision for each slice in turn.
[Figure: a linecard sends successive whole cells to successive crossbar planes (planes 1 to 8 shown, k in total), driven by a single scheduler.]
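A small sketch of the time-sliced schedule, assuming a round-robin assignment of cell times to planes (the slide says only that the scheduler decides "for each slice in turn"). The function name and the log format are hypothetical. Each plane is occupied for k cell times, so it is free again exactly when its turn comes round.

```python
def time_sliced_schedule(matches: list[dict[int, int]], k: int):
    """Give the match computed in cell time t to plane t mod k.

    Returns a list of (start, finish, plane, match) tuples; each transfer
    occupies its plane for k cell times.
    """
    free_at = [0] * k                      # cell time at which each plane is idle
    log = []
    for t, match in enumerate(matches):
        plane = t % k                      # each slice in turn
        if free_at[plane] > t:
            raise RuntimeError("plane still busy")
        free_at[plane] = t + k             # a cell takes k cell times on one plane
        log.append((t, t + k, plane, match))
    return log


for start, finish, plane, match in time_sliced_schedule([{0: 1}, {1: 0}, {0: 0}], k=2):
    print(f"cell time {start}: plane {plane} carries {match} until {finish}")
```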
7. Scaling a crossbar
- Conclusion: scaling the capacity is relatively straightforward (although the chip count and power may become a problem).
- What if we want to increase the number of ports?
- Can we build a crossbar-equivalent from multiple stages of smaller crossbars?
- If so, what properties should it have?
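8. 3-stage Clos Network
[Figure: 3-stage Clos network with N inputs and N outputs. First stage: m switches, each n x k. Middle stage: k switches, each m x m. Third stage: m switches, each k x n.]
N = n x m, k >= n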
9. With k = n, is a Clos network non-blocking like a crossbar?
Consider the example: scheduler chooses to match (1,1), (2,4), (3,3), (4,2).
10. With k = n, is a Clos network non-blocking like a crossbar?
Consider the example: scheduler chooses to match (1,1), (2,2), (4,4), (5,3), ...
By rearranging matches, the connections could be added.
Q: Is this Clos network rearrangeably non-blocking?
11. With k = n, a Clos network is rearrangeably non-blocking
- Routing matches is equivalent to edge-coloring in a bipartite multigraph.
- Colors correspond to middle-stage switches.
[Figure: bipartite multigraph for the matches (1,1), (2,4), (3,3), (4,2).]
No two edges at a vertex may be colored the same.
Each vertex corresponds to an n x k or k x n switch.
Vizing '64: a D-degree bipartite graph can be colored in D colors. Therefore, if k = n, a 3-stage Clos network is rearrangeably non-blocking (and can therefore perform any permutation).
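The equivalence between routing and edge-coloring can be sketched in code. This is an illustrative implementation, not the method from the lecture: it uses the textbook alternating-path recoloring argument for bipartite multigraphs. The names `color_edges` and `route_clos`, the 0-indexed ports, and the choice n = 2, m = 2 for the example matches are all assumptions. Port p is assumed to attach to first-stage (or third-stage) switch p // n.

```python
def color_edges(edges, num_colors, m):
    """Properly edge-color a bipartite multigraph with max degree <= num_colors.

    edges: list of (left_vertex, right_vertex); vertices are 0 .. m-1 per side.
    Returns one color per edge.
    """
    left = [[None] * num_colors for _ in range(m)]    # left[u][c] = edge id
    right = [[None] * num_colors for _ in range(m)]   # right[v][c] = edge id
    color = [None] * len(edges)
    for e, (u, v) in enumerate(edges):
        a = left[u].index(None)                       # a color free at u
        b = right[v].index(None)                      # a color free at v
        if right[v][a] is not None:
            # Collect the path from v whose edges alternate a, b, a, ...
            path, on_right, vertex, c = [], True, v, a
            while True:
                f = (right if on_right else left)[vertex][c]
                if f is None:
                    break
                path.append(f)
                vertex = edges[f][0] if on_right else edges[f][1]
                on_right = not on_right
                c = b if c == a else a
            # Swap a and b along the path; afterwards a is free at v too.
            for f in path:
                fu, fv = edges[f]
                left[fu][color[f]] = None
                right[fv][color[f]] = None
            for f in path:
                fu, fv = edges[f]
                color[f] = b if color[f] == a else a
                left[fu][color[f]] = f
                right[fv][color[f]] = f
        color[e] = a
        left[u][a] = e
        right[v][a] = e
    return color


def route_clos(matches, n, m):
    """Map each (input, output) match to a middle-stage switch, with k = n."""
    edges = [(i // n, o // n) for i, o in matches]    # stage-1 / stage-3 switch
    return dict(zip(matches, color_edges(edges, n, m)))


# The matches (1,1), (2,4), (3,3), (4,2), rewritten with 0-indexed ports.
for match, middle in route_clos([(0, 0), (1, 3), (2, 2), (3, 1)], n=2, m=2).items():
    print(match, "-> middle switch", middle)
```

Each match is processed incrementally, so the recoloring step is exactly the "rearrangement" of existing connections that the slide refers to.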
12. How complex is the rearrangement?
- Method 1: Find a maximum-size bipartite matching for each of D colors in turn, O(D N^2.5).
- Method 2: Partition graph into Euler sets, O(N log D) [Cole et al. '00].
13. Edge-Coloring using Euler sets
- Make the graph regular: modify the graph so that every vertex has the same degree, D [combine vertices and add edges; O(E)].
- For D = 2^i, perform i Euler splits and 1-color each resulting graph. This is log D operations, each of O(E).
14. Euler partition of a graph
- Euler partition of graph G:
  - Each odd-degree vertex is at the end of one open path.
  - Each even-degree vertex is at the end of no open path.
15. Euler split of a graph
[Figure: a graph G and the two graphs G1 and G2 obtained from its Euler split.]
- Euler split of G into G1 and G2:
  - Scan each path in an Euler partition.
  - Place each alternate edge into G1 and G2.
16. Edge-Coloring using Euler sets
- Make the graph regular: modify the graph so that every vertex has the same degree, D [combine vertices and add edges; O(E)].
- For D = 2^i, perform i Euler splits and 1-color each resulting graph. This is log D operations, each of O(E).
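A minimal sketch of the second step, assuming the graph has already been made D-regular with D a power of two (the regularization step is not shown). Because every degree is then even, the Euler partition consists only of closed trails, and in a bipartite graph each closed trail has even length, so alternating its edges gives every vertex exactly half its edges in each half. The names `euler_split` and `color_regular` are hypothetical.

```python
def euler_split(edges):
    """Split an all-even-degree bipartite multigraph into halves G1 and G2.

    edges: list of (left_vertex, right_vertex). Walk closed trails and put
    alternate edges into G1 and G2.
    """
    adj = {}
    for e, (u, v) in enumerate(edges):
        adj.setdefault(("L", u), []).append(e)
        adj.setdefault(("R", v), []).append(e)
    used = [False] * len(edges)
    halves = ([], [])
    for start in adj:
        vertex, step = start, 0
        while True:
            stack = adj[vertex]
            while stack and used[stack[-1]]:
                stack.pop()                        # drop edges already walked
            if not stack:
                break                              # trail has closed at start
            e = stack.pop()
            used[e] = True
            halves[step % 2].append(edges[e])      # alternate between G1 and G2
            u, v = edges[e]
            vertex = ("R", v) if vertex[0] == "L" else ("L", u)
            step += 1
    return halves


def color_regular(edges, degree):
    """Edge-color a degree-regular bipartite multigraph, degree = 2^i.

    Returns a list of `degree` color classes, each a perfect matching.
    """
    if degree < 1 or degree & (degree - 1):
        raise ValueError("degree must be a power of two")
    if degree == 1:
        return [list(edges)]                       # a 1-regular graph is one color
    g1, g2 = euler_split(edges)
    return color_regular(g1, degree // 2) + color_regular(g2, degree // 2)


# 4-regular bipartite graph on 4 + 4 vertices (the complete graph K4,4).
k44 = [(u, (u + s) % 4) for s in range(4) for u in range(4)]
for c, matching in enumerate(color_regular(k44, 4)):
    print("color", c, sorted(matching))
```

The recursion depth is log D and each level touches every edge once, which matches the "log D operations, each of O(E)" count on the slide.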
17. Implementation
[Figure: request graph -> Scheduler -> permutation -> Route connections -> paths.]
18. Implementation
- Pros
  - A rearrangeably non-blocking switch can perform any permutation.
  - A cell switch is time-slotted, so all connections are rearranged every time slot anyway.
- Cons
  - Rearrangement algorithms are complex (in addition to the scheduler).
- Can we eliminate the need to rearrange?
19. Strictly non-blocking Clos Network
Clos' Theorem: If k >= 2n - 1, then a new connection can always be added without rearrangement.
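20. [Figure: 3-stage Clos network with N inputs and N outputs. First stage: m switches I1 to Im, each n x k. Middle stage: k switches M1 to Mk, each m x m. Third stage: m switches O1 to Om, each k x n.]
N = n x m, k >= n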
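21. Clos Theorem
[Figure: first-stage switch Ia and third-stage switch Ob, with n - 1 connections already in use at the input and at the output.]
- Consider adding the n-th connection between 1st stage Ia and 3rd stage Ob.
- We need to ensure that there is always some center-stage M available.
- In the worst case, the n - 1 existing connections at Ia and the n - 1 existing connections at Ob all use different center-stage switches.
- If k > (n - 1) + (n - 1), then there is always an M available, i.e. we need k >= 2n - 1.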
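The counting argument can be checked with a short sketch. It builds the worst case, in which the n - 1 busy center-stage switches at Ia overlap as little as possible with the n - 1 busy ones at Ob, and then looks for a free one. The function names are hypothetical and the code assumes k >= n.

```python
def free_middle(in_busy: set[int], out_busy: set[int], k: int):
    """Return a center-stage switch unused by both Ia and Ob, or None."""
    for mid in range(k):
        if mid not in in_busy and mid not in out_busy:
            return mid
    return None


def worst_case(n: int, k: int):
    """Try to add the n-th connection when the busy sets overlap least.

    Ia already uses the first n - 1 center-stage switches and Ob uses the
    last n - 1, so the two sets are disjoint whenever k allows it.
    """
    in_busy = set(range(n - 1))
    out_busy = set(range(k - (n - 1), k))
    return free_middle(in_busy, out_busy, k)


n = 4
print("k = 2n - 1:", worst_case(n, 2 * n - 1))   # some switch is still free
print("k = 2n - 2:", worst_case(n, 2 * n - 2))   # None: the new connection blocks
```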
22. Scaling Crossbars: Summary
- Scaling capacity through parallelism (bit-slicing and time-slicing) is straightforward.
- Scaling number of ports is harder.
- Clos network:
  - Rearrangeably non-blocking with k = n, but routing is complicated,
  - Strictly non-blocking with k >= 2n - 1, so routing is simple. But requires more bisection bandwidth.