Title: Scaling Internet Routers Using Optics
1Scaling Internet Routers Using Optics
- Isaac Keslassy, Shang-Tse Chuang, Kyoungsik Yu, David Miller, Mark Horowitz, Olav Solgaard, Nick McKeown
- Department of Electrical Engineering
- Stanford University
2Backbone router capacity
[Figure: capacity axis from 1Gb/s to 1Tb/s. Trend line: router capacity per rack, 2x every 18 months.]
3Backbone router capacity
[Figure: capacity axis from 1Gb/s to 1Tb/s. Trend lines: traffic, 2x every year; router capacity per rack, 2x every 18 months.]
4Extrapolating
[Figure: capacity axis from 1Tb/s to 100Tb/s. Trend lines: traffic, 2x every year; router capacity, 2x every 18 months. Annotation: 16x disparity by 2015.]
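The 16x figure follows directly from the two doubling periods on the slide. A minimal sketch of that arithmetic is below; the function names are illustrative and not from the talk, and the only inputs are the two growth rates quoted above.

```python
import math

TRAFFIC_DOUBLING_YEARS = 1.0   # "Traffic 2x every year"
CAPACITY_DOUBLING_YEARS = 1.5  # "Router capacity 2x every 18 months"


def disparity(years: float) -> float:
    """Ratio of traffic growth to router-capacity growth after `years`."""
    traffic = 2 ** (years / TRAFFIC_DOUBLING_YEARS)
    capacity = 2 ** (years / CAPACITY_DOUBLING_YEARS)
    return traffic / capacity


def years_until(gap: float) -> float:
    """Years needed for the traffic/capacity ratio to reach `gap`."""
    # disparity(t) = 2 ** (t * (1/T_traffic - 1/T_capacity))
    rate = 1 / TRAFFIC_DOUBLING_YEARS - 1 / CAPACITY_DOUBLING_YEARS
    return math.log2(gap) / rate


print(disparity(12))               # traffic grew 4096x, capacity 256x
print(round(years_until(16), 6))   # years of divergence behind a 16x gap
```

Under these two rates the gap doubles every three years, so a 16x disparity corresponds to 12 years of divergence.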
5Consequence
- Unless something changes, operators will need:
- 16 times as many routers, consuming
- 16 times as much space,
- 256 times the power,
- costing 100 times as much.
- Actually, they will need more than that.
6Stanford 100Tb/s Internet Router
- Goal: Study scalability
- Challenging, but not impossible
- Two orders of magnitude faster than deployed routers
- We will build components to show feasibility
7Throughput Guarantees
- Operators increasingly demand throughput guarantees:
- To maximize use of expensive long-haul links
- For predictability and planning
- Despite lots of effort and theory, no commercial router today has a throughput guarantee.
8Requirements of our router
- 100Tb/s capacity
- 100% throughput for all traffic
- Must work with any set of linecards present
- Use technology available within 3 years
- Conform to RFC 1812
9What limits router capacity?
[Figure: approximate power consumption per rack]
Power density is the limiting factor today
10Trend: Multi-rack routers
Reduces power density
11[Figure: router product photos labeled Juniper TX8/T640, Alcatel 7670 RSP, Avici TSR, Chiaro]
12Limits to scaling
- Overall power is dominated by linecards
- Sheer number
- Optical WAN components
- Per-packet processing and buffering.
- But power density is dominated by switch fabric
13Trend: Multi-rack routers
Reduces power density
14Multi-rack routers
[Figure: block diagram with labels Switch fabric, Linecard, and WAN In/Out ports.]
15Question
- Instead, can we use an optical fabric at 100Tb/s with 100% throughput?
- Conventional answer: No.
- Need to reconfigure switch too often
- 100% throughput requires complex electronic scheduler.
16Outline
- How to guarantee 100% throughput?
- How to eliminate the scheduler?
- How to use an optical switch fabric?
- How to make it scalable and practical?
17 100% Throughput
[Figure: switch with three inputs labeled In.]
18If traffic is uniform
[Figure: three inputs (In), each at line rate R.]
19Real traffic is not uniform
20Two-stage load-balancing switch
[Figure: linecards at line rate R (In, Out) joined by two meshes of links, each link at rate R/N. The first mesh is labeled Load-balancing stage and the second Switching stage.]
21[Figure: the two-stage switch (line rate R, links at rate R/N) with packets labeled 1, 2 and 3 at the inputs and on the R/N links.]
22[Figure: the two-stage switch again (line rate R, links at rate R/N), with packets labeled 1, 2 and 3 on the R/N links.]
23Chang's load-balanced switch: Good properties
- 100% throughput for broad class of traffic
- No scheduler needed → Scalable
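Why spreading helps can be checked with a small fluid-rate calculation. The sketch below is not the authors' analysis: it assumes the first stage spreads each input's traffic for every output evenly over all N intermediate ports (which is what holds for the "broad class of traffic"), and it expresses all rates in units of the line rate R. The function names are hypothetical.

```python
from fractions import Fraction


def is_admissible(lam):
    """No input or output is oversubscribed (rates in units of R)."""
    n = len(lam)
    rows_ok = all(sum(lam[i]) <= 1 for i in range(n))
    cols_ok = all(sum(lam[i][j] for i in range(n)) <= 1 for j in range(n))
    return rows_ok and cols_ok


def direct_mesh_overloaded(lam):
    """Fixed mesh of R/N links, no load balancing: link i->j carries lam[i][j]."""
    n = len(lam)
    cap = Fraction(1, n)
    return any(lam[i][j] > cap for i in range(n) for j in range(n))


def second_stage_loads(lam):
    """Load on each link k->j of the switching stage after even spreading."""
    n = len(lam)
    col = [sum(Fraction(lam[i][j]) for i in range(n)) for j in range(n)]
    # Every intermediate port k receives 1/N of the traffic bound for output j.
    return [[col[j] / n for j in range(n)] for _ in range(n)]


def load_balanced_overloaded(lam):
    n = len(lam)
    cap = Fraction(1, n)
    return any(load > cap for row in second_stage_loads(lam) for load in row)


# Highly non-uniform but admissible: each input sends everything to one output.
hot = [[1, 0, 0],
       [0, 1, 0],
       [0, 0, 1]]

print(is_admissible(hot))             # True
print(direct_mesh_overloaded(hot))    # True: link i->i needs R, has R/3
print(load_balanced_overloaded(hot))  # False: every link carries exactly R/3
```

For any admissible matrix the column sums are at most 1, so each switching-stage link carries at most 1/N of R, which is its capacity. The calculation says nothing about traffic that defeats the even-spreading assumption.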
24Chang's load-balanced switch: Bad properties
- Packet mis-sequencing
- Pathological traffic patterns → Throughput 1/N-th of capacity
- Uses two switch fabrics → Hard to package
- Doesn't work with some linecards missing → Impractical
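The mis-sequencing problem can be illustrated with a toy model. This is only a sketch under simplifying assumptions that are not from the talk: consecutive packets of one flow are sent round-robin to intermediate linecards, one per time slot, and each intermediate linecard delays its packet by a fixed backlog. The name `departure_order` and the backlog values are hypothetical.

```python
def departure_order(backlogs):
    """Order in which packets 0..n-1 of one flow leave the switch.

    Packet p is sent to intermediate linecard p at slot p and waits
    backlogs[p] further slots there before leaving (toy model).
    """
    finish = [(p + backlogs[p], p) for p in range(len(backlogs))]
    return [p for _, p in sorted(finish)]


# Intermediate linecard 0 has three slots of queued traffic, the others none.
print(departure_order([3, 0, 0]))  # [1, 2, 0]: packet 0 leaves last
print(departure_order([0, 0, 0]))  # [0, 1, 2]: equal backlogs keep the order
```

Because the intermediate queues generally hold different amounts of traffic, packets of the same flow can overtake one another unless the design takes extra steps to prevent it.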
25Single Mesh Switch
[Figure: a single mesh joining the linecards (In), each link at rate 2R/N.]
26Packaging
[Figure: three linecards (In), each at line rate R.]
27Many fabric options
N channels each at rate 2R/N
Any permutation network
Options:
- Space: Full uniform mesh
- Time: Round-robin crossbar
- Wavelength: Static WDM
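The "Time" option can be pictured as a crossbar that cycles through a fixed sequence of permutations without any scheduler. The sketch below assumes one particular sequence, input i connected to output (i + t) mod N in slot t; the slide only requires some permutation network, so this choice and the function names are illustrative.

```python
from collections import Counter


def round_robin_schedule(n):
    """schedule[t][i] is the output connected to input i in slot t."""
    return [[(i + t) % n for i in range(n)] for t in range(n)]


def is_permutation(connections):
    """Each output appears exactly once, so no two inputs collide."""
    return sorted(connections) == list(range(len(connections)))


def pair_counts(schedule):
    """How many slots per cycle each (input, output) pair is connected."""
    counts = Counter()
    for slot in schedule:
        for i, j in enumerate(slot):
            counts[(i, j)] += 1
    return counts


schedule = round_robin_schedule(4)
print(schedule[1])                                    # [1, 2, 3, 0]
print(all(is_permutation(s) for s in schedule))       # True
print(set(pair_counts(schedule).values()))            # {1}
```

Every input-output pair is connected for exactly one slot in N, so a crossbar whose ports run at 2R provides each pair with a channel of average rate 2R/N, matching the uniform mesh.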
28Static WDM switching
Array Waveguide Router (AWGR): passive and almost zero power
[Figure: AWGR with ports labeled A, B, C, D.]
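An AWGR routes light by wavelength alone, which is why it needs no configuration and almost no power. The sketch below assumes the common cyclic routing rule, in which wavelength index k entering input port i leaves on output port (i + k) mod N; real devices may number ports and wavelengths differently, and the function names are hypothetical.

```python
def awgr_output(in_port: int, wavelength: int, n: int) -> int:
    """Output port reached from `in_port` on wavelength index `wavelength`."""
    return (in_port + wavelength) % n


def wavelength_for(in_port: int, out_port: int, n: int) -> int:
    """Wavelength index a linecard must transmit on to reach `out_port`."""
    return (out_port - in_port) % n


def collision_free(n: int) -> bool:
    """True if no output port receives the same wavelength from two inputs."""
    seen = set()
    for i in range(n):
        for k in range(n):
            key = (awgr_output(i, k, n), k)
            if key in seen:
                return False
            seen.add(key)
    return True


n = 4  # ports A, B, C, D taken as indices 0..3
k = wavelength_for(1, 3, n)
print(k, awgr_output(1, k, n))  # 2 3
print(collision_free(n))        # True
```

With N fixed-wavelength transmitters per linecard, one per wavelength, every linecard reaches every other linecard simultaneously, which gives the static mesh of N channels without any switching element.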
29Linecard dataflow
[Figure: linecard dataflow. Input (In) at rate R; WDM stages using wavelengths λ1, λ2, ..., λN; steps numbered 1 to 4.]
30Problems of scale
- For N < 64, WDM is a good solution.
- We want N = 640.
- Need to decompose.
31Decomposing the mesh
[Figure: full mesh among 8 linecards, numbered 1 to 8, each link at rate 2R/8.]
32Decomposing the mesh
[Figure: the same 8 linecards, numbered 1 to 8, with links at rate 2R/8 bundled into links at rate 2R/4.]
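The link rates on the two "Decomposing the mesh" slides can be reproduced with a few lines of arithmetic. The sketch assumes that decomposition bundles the links from one linecard to all members of a group, and that the 8-linecard figure uses groups of 2; both are readings of the figure labels rather than statements from the talk. The larger example uses N = 640 with R = 160Gb/s and 16 linecards per group, values that appear on other slides.

```python
from fractions import Fraction


def mesh_link_rate(r: int, n: int) -> Fraction:
    """Rate of one link in a uniform mesh of n linecards: 2R/N."""
    return Fraction(2 * r, n)


def grouped_link_rate(r: int, n: int, group_size: int) -> Fraction:
    """Rate from one linecard to one whole group after bundling."""
    if n % group_size:
        raise ValueError("n must be a multiple of group_size")
    return mesh_link_rate(r, n) * group_size


# Slide example, in units of R: 2R/8 per link, 2R/4 after grouping in pairs.
print(mesh_link_rate(1, 8), grouped_link_rate(1, 8, 2))            # 1/4 1/2

# Target router, in Gb/s: N = 640 linecards at R = 160Gb/s, groups of 16.
print(mesh_link_rate(160, 640), grouped_link_rate(160, 640, 16))   # 1/2 8
```

Grouping turns 640 channels of 0.5Gb/s per linecard into 40 channels of 8Gb/s, one per group, which keeps the number of wavelengths within the range where WDM works well.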
33When N is too large: Decompose into groups (or racks)
[Figure: Group/Rack 1 through Group/Rack G, linecards connected at rate 2R, joined by an Array Waveguide Router (AWGR) using wavelengths λ1, λ2, ..., λG.]
34When a linecard is missing
- Each linecard spreads its data equally over every other linecard.
- Problem: If one is missing, or failed, then the spreading no longer works.
35When a linecard fails
[Figure: mesh among the linecards (In) with links at rate 2R/3.]
- Solution:
- Move light beams
- Replace AWGR with MEMS switch.
- Reconfigure when linecard added, removed or fails.
- Finer channel granularity
- Multiple paths.
36Solution: Use transparent MEMS switches
[Figure: Group/Rack 1 through Group/Rack G = 40, linecards connected at rate 2R.]
MEMS switches reconfigured only when linecard added, removed or fails.
Theorems:
1. Require L+G-1 MEMS switches
2. Polynomial time reconfiguration algorithm
37Challenges
[Figure: linecard block diagram. In and Out paths at rate R = 160Gb/s, Address Lookup, WDM using wavelengths λ1, λ2, ..., λG; steps numbered 1 to 4.]
38What we are building
[Figure labels: Chip 1: 160Gb/s Packet Buffer; Buffer Manager, 90nm ASIC; 250ms DRAM; 320Gb/s; 160Gb/s (twice); Optical Detector; Optical Modulator.]
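The buffer figures on this slide imply a concrete memory size. The sketch below assumes that "250ms DRAM" means buffering 250ms of traffic at the 160Gb/s line rate, and that the 320Gb/s label is the memory bandwidth needed to write and read at line rate at the same time; both readings are assumptions, not statements from the slide.

```python
LINE_RATE_BPS = 160 * 10**9  # 160Gb/s packet buffer
BUFFER_MS = 250              # "250ms DRAM"


def buffer_bits(line_rate_bps: int, buffer_ms: int) -> int:
    """Bits needed to hold `buffer_ms` milliseconds of line-rate traffic."""
    return line_rate_bps * buffer_ms // 1000


bits = buffer_bits(LINE_RATE_BPS, BUFFER_MS)
gigabytes = bits / 8 / 10**9
memory_bandwidth_bps = 2 * LINE_RATE_BPS  # one write plus one read per bit

print(bits)                            # 40000000000
print(gigabytes)                       # 5.0
print(memory_bandwidth_bps // 10**9)   # 320
```

Under these assumptions each linecard needs about 5 gigabytes of DRAM accessed at 320Gb/s.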
39 100Tb/s Load-Balanced Router
L = 16 160Gb/s linecards
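The headline capacity can be checked from the numbers on the slides. The sketch reads the slide labels as G = 40 groups (racks) and L = 16 linecards per group, each at 160Gb/s; the function name is illustrative.

```python
GROUPS = 40               # racks, from "Group/Rack G40" read as G = 40
LINECARDS_PER_GROUP = 16  # L = 16
LINE_RATE_GBPS = 160      # R = 160Gb/s


def router_capacity_gbps(groups: int, per_group: int, rate_gbps: int) -> int:
    """Aggregate line-rate capacity of all linecards, in Gb/s."""
    return groups * per_group * rate_gbps


total_linecards = GROUPS * LINECARDS_PER_GROUP
capacity = router_capacity_gbps(GROUPS, LINECARDS_PER_GROUP, LINE_RATE_GBPS)

print(total_linecards)   # 640
print(capacity / 1000)   # 102.4 (Tb/s)
```

The 640 linecards match the N = 640 target, and 102.4Tb/s is the figure rounded to 100Tb/s in the title.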