Title: Using Load-Balancing to Build High-Performance Routers, PhD Oral Exam
1. EE384Y: Packet Switch Architectures, Part II: Load-balanced Switches
Nick McKeown, Professor of Electrical Engineering and Computer Science, Stanford University
nickm_at_stanford.edu
http://www.stanford.edu/nickm
2. The Arbitration Problem
- A packet switch fabric is reconfigured for every packet transfer.
- For example, at 160Gb/s, a new IP packet can arrive every 2ns.
- The configuration is picked to maximize throughput and not waste capacity.
- Known algorithms are probably too slow.
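The 2ns figure can be checked with a short calculation. This is a minimal sketch that assumes a 40-byte minimum-size IP packet (the packet size is an assumption, not stated on the slide); the function name is illustrative.

```python
def min_packet_time_ns(line_rate_gbps: float, packet_bytes: int = 40) -> float:
    """Time to receive one packet at the given line rate, in nanoseconds.

    packet_bytes=40 is an assumed minimum IP packet size.
    """
    bits = packet_bytes * 8
    # bits divided by Gb/s gives nanoseconds directly.
    return bits / line_rate_gbps


print(min_packet_time_ns(160))  # 2.0
```

Under that assumption the fabric has about 2ns to choose and apply each new configuration.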
3. Approach
- We know that a crossbar with VOQs and uniform Bernoulli i.i.d. arrivals gives 100% throughput for the following scheduling algorithms:
  - Pick a permutation uniformly at random (u.a.r.) from all permutations.
  - Pick a permutation u.a.r. from a set of N permutations in which each input-output pair (i,j) is connected exactly once.
  - From the same set as above, repeatedly cycle through a fixed sequence of N different permutations.
- Can we make non-uniform, bursty traffic uniform enough for the above to hold?
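One set of N permutations with the "each pair connected exactly once" property is the set of cyclic shifts. The sketch below builds that set, checks the property, and shows the random and round-robin schedules; the function names are illustrative, and cyclic shifts are only one possible choice of set.

```python
import random


def cyclic_permutations(n):
    """Return n permutations; permutation k connects input i to output (i + k) % n."""
    return [[(i + k) % n for i in range(n)] for k in range(n)]


def covers_each_pair_once(perms):
    """True if every input-output pair (i, j) appears in exactly one permutation."""
    n = len(perms)
    seen = set()
    for perm in perms:
        for i, j in enumerate(perm):
            if (i, j) in seen:
                return False
            seen.add((i, j))
    return len(seen) == n * n


def random_schedule(perms, rng):
    """Pick a permutation uniformly at random from the set."""
    return rng.choice(perms)


def round_robin_schedule(perms, t):
    """Cycle through the set in a fixed order; t is the time slot."""
    return perms[t % len(perms)]


perms = cyclic_permutations(4)
print(covers_each_pair_once(perms))
print(round_robin_schedule(perms, 5))
print(random_schedule(perms, random.Random(0)))
```

The round-robin schedule needs no scheduler decisions at all, which is what makes it attractive if the traffic can be made uniform.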
4. Design Example
- Stanford Optics in Routers project
  - http://yuba.stanford.edu/or/
- Some challenging numbers
  - 100Tb/s
  - 160Gb/s linecards
  - 640 linecards
- Goals
  - Scale to High Linecard Speeds (160Gb/s)
    - No Centralized Scheduler
    - Optical Switch Fabric
    - Low Packet-Processing Complexity
  - Scale to High Number of Linecards (640)
  - Provide Performance Guarantees
    - 100% Throughput Guarantee
    - No Packet Reordering
5. Outline
- Basic idea of load-balancing
- Packet mis-sequencing
- An optical switch fabric
- Scaling number of linecards
6. 100% Throughput in a Mesh Fabric
[Figure: mesh fabric; inputs (In) at line rate R]
7. If Traffic Is Uniform
[Figure: mesh fabric; inputs (In) at line rate R]
8. Real Traffic is Not Uniform
9. Load-Balanced Switch
[Figure: two-stage load-balanced switch. A load-balancing stage followed by a forwarding stage; linecards (In, Out) at rate R, mesh links at rate R/N]
100% throughput for weakly mixing traffic (Valiant, C.-S. Chang)
10. Load-Balanced Switch
[Figure: two-stage load-balanced switch with packets labeled 1, 2, 3; linecards at rate R, mesh links at rate R/N]
11. Load-Balanced Switch
[Figure: two-stage load-balanced switch with packets labeled 1, 2, 3; linecards at rate R, mesh links at rate R/N]
12. Intuition: 100% Throughput
[Figure: two-stage load-balanced switch; linecards at rate R, mesh links at rate R/N]
- Arrivals to second mesh
- Capacity of second mesh
- Second mesh arrival rate < service rate
C.-S. Chang
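The rate argument can be illustrated with average rates only. The sketch below assumes the first mesh spreads every input's traffic evenly over the N middle ports, and expresses all rates as fractions of the line rate R; it is an illustration of the intuition, not Chang's proof, and the function names are illustrative.

```python
def second_mesh_loads(lam):
    """Average arrival rate on a middle-port-to-output-j link, for each j.

    lam[i][j] is the rate from input i to output j as a fraction of R.
    After even spreading, each middle port carries lam[i][j] / N of every flow,
    so the load is the same on every middle port and only depends on j.
    """
    n = len(lam)
    return [sum(lam[i][j] for i in range(n)) / n for j in range(n)]


def second_mesh_is_underloaded(lam):
    """True if every second-mesh link has arrival rate below its capacity R/N."""
    n = len(lam)
    capacity = 1.0 / n  # R/N as a fraction of R
    return all(load < capacity for load in second_mesh_loads(lam))


# Highly non-uniform traffic: input i sends only to output i at 90% of R.
diagonal = [[0.9, 0.0, 0.0],
            [0.0, 0.9, 0.0],
            [0.0, 0.0, 0.9]]
print(second_mesh_loads(diagonal))
print(second_mesh_is_underloaded(diagonal))
```

Any traffic matrix whose column sums are below R passes the check, however non-uniform it is, because the first mesh has already spread it.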
13. Another way of thinking about it
[Figure: External Inputs, load-balancing cyclic shift, Internal Inputs, switching cyclic shift, External Outputs]
- First stage load-balances incoming packets
- Second stage is a cyclic shift
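The two cyclic shifts can be sketched as a small slotted simulation. The model is a simplification with several assumptions not taken from the slides: time is slotted, each input receives at most one packet per slot, a packet crosses the first stage in the slot it arrives, and each middle port keeps one queue per output. All names are illustrative.

```python
import random
from collections import deque


def simulate(n, slots, dest_of, seed=0):
    """Simulate a two-stage cyclic-shift switch; return (arrived, delivered).

    dest_of(i, rng) returns the output for a packet arriving at input i,
    or None if no packet arrives in this slot.
    """
    rng = random.Random(seed)
    # voq[k][j]: packets waiting at middle port k for output j.
    voq = [[deque() for _ in range(n)] for _ in range(n)]
    arrived = delivered = 0
    for t in range(slots):
        # First stage: input i is connected to middle port (i + t) % n.
        for i in range(n):
            dest = dest_of(i, rng)
            if dest is None:
                continue
            voq[(i + t) % n][dest].append((i, t))
            arrived += 1
        # Second stage: middle port k is connected to output (k + t) % n.
        for k in range(n):
            j = (k + t) % n
            if voq[k][j]:
                voq[k][j].popleft()
                delivered += 1
    return arrived, delivered


def diagonal_traffic(i, rng):
    """Non-uniform example: input i sends only to output i, at 90% load."""
    return i if rng.random() < 0.9 else None


arrived, delivered = simulate(4, 2000, diagonal_traffic)
print(arrived, delivered)
```

Neither stage looks at the traffic when choosing its configuration, so there is no scheduler to run.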
14. Load-Balanced Switch
[Figure: External Inputs, load-balancing cyclic shift, Internal Inputs, switching cyclic shift, External Outputs]
16. Outline of Chang's Proof
17. Outline
- Basic idea of load-balancing
- Packet mis-sequencing
- An optical switch fabric
- Scaling number of linecards
18. Packet Reordering
[Figure: two-stage load-balanced switch; linecards at rate R, mesh links at rate R/N]
19. Bounding Delay Difference Between Middle Ports
[Figure: two-stage load-balanced switch; linecards at rate R, mesh links at rate R/N]
20. UFS (Uniform Frame Spreading)
[Figure: two-stage load-balanced switch; linecards at rate R, mesh links at rate R/N]
21. FOFF (Full Ordered Frames First)
[Figure: two-stage load-balanced switch; linecards at rate R, mesh links at rate R/N]
22. FOFF (Full Ordered Frames First)
[Figure: FOFF input with numbered packets]
- Input Algorithm
  - N FIFO queues corresponding to the N output flows
  - Spread each flow uniformly: if the last packet was sent to middle port k, send the next to k+1.
  - Every N time-slots, pick a flow:
    - If a full frame exists, pick it and spread like UFS
    - Else if all frames are partial, pick one in round-robin order and send it
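The input algorithm can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class and method names are hypothetical, a "full frame" is taken to mean at least N queued packets for one output, and choosing among several full frames in round-robin order is an assumption (the slide only says to pick one).

```python
from collections import deque


class FoffInput:
    """Sketch of one FOFF input with n per-output FIFO queues."""

    def __init__(self, n):
        self.n = n
        self.queues = [deque() for _ in range(n)]  # one FIFO per output flow
        self.next_middle = [0] * n  # per flow: middle port for its next packet
        self.rr_full = 0     # round-robin pointer over full frames (assumed)
        self.rr_partial = 0  # round-robin pointer over partial frames

    def enqueue(self, output, packet):
        self.queues[output].append(packet)

    def _pick(self, start, wanted):
        """First flow at or after `start` (cyclically) whose queue satisfies `wanted`."""
        for offset in range(self.n):
            flow = (start + offset) % self.n
            if wanted(self.queues[flow]):
                return flow
        return None

    def next_frame(self):
        """Call once every n time slots; returns a list of (middle_port, output, packet)."""
        n = self.n
        flow = self._pick(self.rr_full, lambda q: len(q) >= n)
        if flow is not None:
            self.rr_full = (flow + 1) % n
            count = n  # a full frame: one packet to each middle port
        else:
            flow = self._pick(self.rr_partial, lambda q: len(q) > 0)
            if flow is None:
                return []  # nothing queued
            self.rr_partial = (flow + 1) % n
            count = len(self.queues[flow])  # send the whole partial frame
        sends = []
        for _ in range(count):
            k = self.next_middle[flow]
            sends.append((k, flow, self.queues[flow].popleft()))
            self.next_middle[flow] = (k + 1) % n  # last went to k, next goes to k+1
        return sends


inp = FoffInput(3)
for p in "abcd":
    inp.enqueue(1, p)
inp.enqueue(0, "x")
print(inp.next_frame())  # full frame for output 1
print(inp.next_frame())  # partial frame, round-robin
```

Because each flow remembers its own next middle port, consecutive packets of a flow always go to consecutive middle ports, whether they leave in full or partial frames.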
23. Bounding Reordering
[Figure: two-stage load-balanced switch; linecards at rate R, mesh links at rate R/N]
24. FOFF
[Figure: output with queued packets labeled 1 to 4]
- Output properties
  - N FIFO queues corresponding to the N middle ports
  - Buffer size less than N^2 packets
  - If there are N^2 packets, one of the head-of-line packets is in order
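A resequencing output built on these properties might look like the sketch below. It assumes each packet carries its source input and a per-flow sequence number, which the slides do not specify; the class and method names are hypothetical.

```python
from collections import deque


class Resequencer:
    """Sketch of an output with one FIFO per middle port."""

    def __init__(self, n):
        self.queues = [deque() for _ in range(n)]  # one FIFO per middle port
        self.expected = {}  # source input -> next sequence number to release

    def receive(self, middle_port, src, seq, payload):
        self.queues[middle_port].append((src, seq, payload))

    def release(self):
        """Release every head-of-line packet that is next in order for its flow."""
        out = []
        progress = True
        while progress:
            progress = False
            for q in self.queues:
                if not q:
                    continue
                src, seq, payload = q[0]
                if seq == self.expected.get(src, 0):
                    q.popleft()
                    self.expected[src] = seq + 1
                    out.append(payload)
                    progress = True
        return out


r = Resequencer(3)
r.receive(1, src=0, seq=1, payload="b")  # arrives early via middle port 1
print(r.release())
r.receive(0, src=0, seq=0, payload="a")
print(r.release())
```

Only the N head-of-line packets are ever inspected, so the work per release step does not grow with the number of buffered packets.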
25. FOFF Properties
- Property 1: FOFF maintains packet order.
- Property 2: FOFF has O(1) complexity.
- Property 3: Congestion buffers operate independently.
- Property 4: FOFF maintains an average packet delay within a constant of an ideal output-queued router.
- Corollary: FOFF has 100% throughput for any adversarial traffic.
26. Output-Queued Router
[Figure: output-queued router; inputs (In) at line rate R]
27. Outline
- Basic idea of load-balancing
- Packet mis-sequencing
- An optical switch fabric
- Scaling number of linecards
28. From Two Meshes to One Mesh
[Figure: two-stage load-balanced switch; linecards at rate R, mesh links at rate R/N]
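29. From Two Meshes to One Mesh
[Figure: linecards at rate R connected through a first mesh and a second mesh]
30. From Two Meshes to One Mesh
[Figure: linecards at rate R connected through a single combined mesh]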
31. Many Fabric Options
N channels, each at rate 2R/N
Any spreading device
Options:
- Space: full uniform mesh
- Time: round-robin crossbar
- Wavelength: static WDM
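For a sense of scale, the per-channel rate 2R/N can be evaluated with the design-example numbers from earlier in the deck (160Gb/s linecards, 640 linecards). The function name is illustrative.

```python
def channel_rate_gbps(line_rate_gbps: float, n: int) -> float:
    """Rate of each of the N channels leaving a linecard: 2R/N."""
    return 2 * line_rate_gbps / n


rate = channel_rate_gbps(160, 640)
print(rate)        # 0.5
print(rate * 640)  # 320.0, i.e. 2R per linecard in total
```

Each channel is slow, but there are N of them per linecard, which is why the choice of spreading device matters.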
32. AWGR (Arrayed Waveguide Grating Router): A Passive Optical Component
[Figure: N×N AWGR connecting Linecards 1 to N on one side to Linecards 1 to N on the other; each port carries multiple wavelengths (λ)]
- Wavelength i on input port j goes to output port (i+j-1) mod N
- Can shuffle information from different inputs
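The port rule, read as (i + j - 1) mod N, can be checked with a few lines. The sketch assumes wavelengths and ports are numbered 1 to N and that a result of 0 denotes port N; both are assumptions about the slide's numbering, and the function names are illustrative.

```python
def awgr_output(wavelength: int, input_port: int, n: int) -> int:
    """Output port for a wavelength on an input port, numbered 1..n."""
    out = (wavelength + input_port - 1) % n
    return out if out != 0 else n  # assumed: 0 wraps to port n


def is_full_shuffle(n: int) -> bool:
    """True if each input reaches every output on a distinct wavelength,
    and each output hears every input on a distinct wavelength."""
    ports = set(range(1, n + 1))
    for j in ports:
        if {awgr_output(i, j, n) for i in ports} != ports:
            return False
    for i in ports:
        if {awgr_output(i, j, n) for j in ports} != ports:
            return False
    return True


print(awgr_output(1, 1, 4))
print(is_full_shuffle(8))
```

Because the mapping is fixed, a linecard selects its destination purely by choosing which wavelength to transmit on.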
33. Static WDM Switching: Packaging
AWGR: passive and almost zero power
[Figure: packaging diagram with elements labeled A, B, C, D]
34. Outline
- Basic idea of load-balancing
- Packet mis-sequencing
- An optical switch fabric
- Scaling number of linecards
35. Scaling Problem
- For N < 64, an AWGR is a good solution.
- We want N = 640.
- Need to decompose.
36. A Different Representation of the Mesh
[Figure: mesh]
37. A Different Representation of the Mesh
[Figure: mesh with links at rate 2R/N]
38. Example: N = 8
[Figure: mesh for N = 8 with links at rate 2R/8]
39. When N is Too Large: Decompose into groups (or racks)
[Figure: linecard links at rate 2R, group aggregates at rate 4R, inter-group links at rate 4R/4]
40. When N is Too Large: Decompose into groups (or racks)
[Figure: Group/Rack 1 through Group/Rack G; linecard links at rate 2R, group aggregates at rate 2RL, inter-group links at rate 2RL/G]
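The rates in the grouped design can be tabulated with a short calculation. The sketch reads L as the number of linecards per group and G as the number of groups (so N = L × G); that reading is inferred from the figure labels rather than stated on the slide, and the split of 640 linecards into 40 groups of 16 is a hypothetical example.

```python
def group_rates(r: float, l: int, g: int) -> dict:
    """Link rates in a grouped mesh, in the same units as r.

    r: line rate R; l: linecards per group (assumed meaning of L);
    g: number of groups (assumed meaning of G).
    """
    return {
        "linecard": 2 * r,                # each linecard link: 2R
        "group_total": 2 * r * l,         # aggregate per group: 2RL
        "group_to_group": 2 * r * l / g,  # each inter-group link: 2RL/G
    }


# With R = 1, L = 2, G = 4 this gives 4R per group and 4R/4 between groups.
print(group_rates(1, 2, 4))

# Hypothetical split of 640 linecards at 160 Gb/s into 40 groups of 16.
print(group_rates(160, 16, 40))
```

Under this reading the number of inter-group links grows with G rather than with N, which is what makes the decomposition attractive for N = 640.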
41. Outline
- Basic idea of load-balancing
- Packet mis-sequencing
- An optical switch fabric
- Scaling number of linecards