Title: The Crosspoint Queued Switch
1. The Crosspoint Queued Switch
Yossi Kanizo (Technion, Israel)
Joint work with Isaac Keslassy (Technion, Israel)
and David Hay (Politecnico di Torino, Italy)
2. Typical Switch Architectures
[Figure: linecards and switch fabric, shown for two architectures]
- IQ: Input Queued
- CICQ: Combined Input and Crosspoint Queued
- Assumes an instantaneous closed loop
3. Single-Rack Router
- Instantaneous closed loop → works in a single rack
- Problem: multi-rack routers
4. Current Router Architectures
Is the closed loop still instantaneous?
Source: N. McKeown
5. Time Trends
[Figure: time trends; axis unit: ns]
6. Hiding Propagation Delays
- Traditional solutions
  - Increase time-slot → poor switch performance
  - Hide propagation delays using buffers → impractical amount of buffering
- Proposed solution: closed loop → open loop
  - Performance degradation vs. instantaneous closed loop
7. Outline
- CQ: Open-loop switch architecture
- Performance Evaluation
  - Analytical results
  - Simulations
→ CQ performance degradation is not significant
8. Proposed Architecture: The Crosspoint-Queued (CQ) Switch
[Figure: linecards and switch core; distance label: 10s of meters]
- No queues in the linecards
- Buffering only inside the fabric
- Independent output schedulers
- Drops with full buffers
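
The four properties above can be made concrete with a small slot-based simulation. This is a minimal sketch, not the authors' simulator. It assumes Bernoulli uniform traffic, arrivals processed before departures within a slot, and an LQF (Longest Queue First) scheduler at every output; the name `simulate_cq` and its parameters are hypothetical.

```python
import random

def simulate_cq(n, b, load, slots, seed=0):
    """Simulate an n x n CQ switch; return delivered / arrived packets."""
    rng = random.Random(seed)
    # q[i][j] = packets held in the crosspoint buffer from input i to output j.
    # There are no linecard queues: buffering exists only at the crosspoints.
    q = [[0] * n for _ in range(n)]
    arrived = delivered = 0
    for _ in range(slots):
        # Arrival phase (assumed first): at most one packet per input per slot.
        for i in range(n):
            if rng.random() < load:
                j = int(rng.random() * n)  # destination chosen uniformly
                arrived += 1
                if q[i][j] < b:
                    q[i][j] += 1
                # else: the crosspoint buffer is full and the packet is dropped
        # Departure phase: every output decides on its own, using only the
        # state of its own column (open loop, no feedback to the linecards).
        for j in range(n):
            longest = max(range(n), key=lambda i: q[i][j])  # LQF choice
            if q[longest][j] > 0:
                q[longest][j] -= 1
                delivered += 1
    return delivered / arrived if arrived else 1.0

print(simulate_cq(8, 1, 0.9, 2000))
print(simulate_cq(8, 4, 0.9, 2000))
```

Packets still buffered when the run ends are counted as undelivered, so short runs slightly understate throughput.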
9. CQ Properties
- Open loop
  - No communication overhead
- No linecard queues
  - No linecard queue management
- Router on a chip
  - Buffering and switch fabric on same chip
10. Why not 10 years ago?
- No need: single rack
- No technology: SRAM density
  - Moore's law: density doubling every 2.5 years
  - Aggressive 128x128 CQ switch: 4 cells of 64 bytes per crosspoint → 64 cells today
- Conservative buffer requirements
  - TCP: Stanford model with smaller buffer needs [Appenzeller, Keslassy and McKeown '04]
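
The SRAM numbers above can be checked with a few lines of arithmetic. The sketch below reads "4 cells → 64 cells today" as the same chip budget scaled by ten years of density doubling every 2.5 years; that reading, and the helper names `cq_buffer_bytes` and `cells_after`, are assumptions made for illustration.

```python
def cq_buffer_bytes(n, cells_per_crosspoint, cell_bytes=64):
    """Total crosspoint buffer memory of an n x n CQ switch, in bytes."""
    return n * n * cells_per_crosspoint * cell_bytes

def cells_after(years, cells_then, doubling_years=2.5):
    """Scale a per-crosspoint cell budget by periodic density doubling."""
    return cells_then * 2 ** (years / doubling_years)

print(cq_buffer_bytes(128, 4) / 2**20)   # 4.0  (MiB, 4 cells per crosspoint)
print(cells_after(10, 4))                # 64.0 (cells per crosspoint)
print(cq_buffer_bytes(128, 64) / 2**20)  # 64.0 (MiB, 64 cells per crosspoint)
```

Ten years at one doubling per 2.5 years is four doublings, a factor of 16, which matches the step from 4 to 64 cells per crosspoint.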
11. Outline
- CQ: Our open-loop switch architecture
- Performance Evaluation
  - Analytical results
  - Simulations
12. 100% Throughput as B → ∞
- Throughput bounds: OQ(2B-1) ≤ CQ(B) ≤ OQ(NB)
[Figure labels: 100% throughput under each of the three terms]
Buffer size B, LQF (Longest Queue First) scheduling algorithm
13. Uniform Traffic, B=1
- Uniform traffic model
  - At each time-slot, at each of the N inputs: Bernoulli IID packet arrivals with probability ρ
  - Each packet is destined for one of the N outputs uniformly at random
- Theorem: Under uniform traffic and B=1, the performance of the switch is independent of the specific work-conserving scheduling algorithm
  - Intuition: Symmetry
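
The symmetry argument can be probed empirically by simulating one output under two different work-conserving rules. This is a hedged sketch: it assumes arrivals precede the departure within a slot, models a single output in isolation (input i sends to it with probability ρ/N), and uses the hypothetical names `output_throughput` and `pick`. A simulation only suggests the result; it does not prove the theorem.

```python
import random

def output_throughput(n, load, slots, pick, seed=0):
    """One output of a CQ switch with B=1: n one-cell crosspoint buffers."""
    rng = random.Random(seed)
    full = [False] * n
    arrived = delivered = 0
    for _ in range(slots):
        for i in range(n):
            # Input i sends a packet to this output with probability load / n.
            if rng.random() < load / n:
                arrived += 1
                if not full[i]:
                    full[i] = True
                # else: the one-cell buffer is occupied, the packet is dropped
        occupied = [i for i in range(n) if full[i]]
        if occupied:  # work-conserving: serve whenever something is queued
            full[pick(occupied)] = False
            delivered += 1
    return delivered / arrived

# Two deterministic work-conserving rules: lowest and highest occupied index.
first = output_throughput(8, 0.9, 50_000, pick=lambda occ: occ[0])
last = output_throughput(8, 0.9, 50_000, pick=lambda occ: occ[-1])
print(round(first, 3), round(last, 3))
```

The two estimates should agree up to sampling noise, because an arriving packet is accepted or dropped depending only on how many buffers are occupied, not on which ones.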
14. Uniform Traffic, B=1
- Theorem: The throughput and waiting time of a CQ switch with B=1 are [equation missing in source], where q = 1 - ρ/N
- Proof: Based on Z-transform
- Throughput goes to 100% as N goes to infinity
15. Models for larger buffers
- Approximate Performance Analysis
  - Model for exhaustive round-robin scheduling
    - Based on modifications to polling system with zero switch-over times
  - Model for random scheduling algorithm
  - Show 100% throughput as N → ∞
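
For reference, the two scheduling rules named above can be written as selection functions over the crosspoint queues of one output. This is an illustrative sketch of the scheduling rules only, not of the analytical models; the names `exhaustive_rr` and `random_pick` are hypothetical.

```python
import random

def exhaustive_rr(queues, current):
    """Exhaustive round-robin: return the index of the queue to serve.

    Keep serving `current` while it is non-empty; otherwise advance
    cyclically to the next non-empty queue. The move costs no time slot
    (zero switch-over time). Returns None if every queue is empty.
    """
    n = len(queues)
    for step in range(n):
        i = (current + step) % n
        if queues[i] > 0:
            return i
    return None

def random_pick(queues, rng):
    """Random scheduling: serve a uniformly chosen non-empty queue."""
    nonempty = [i for i, length in enumerate(queues) if length > 0]
    if not nonempty:
        return None
    return nonempty[int(rng.random() * len(nonempty))]

# Drain three queues with no new arrivals, one packet per slot.
queues = [2, 0, 1]
served, current = [], 0
while (current := exhaustive_rr(queues, current)) is not None:
    queues[current] -= 1
    served.append(current)
print(served)  # [0, 0, 2]
```

Queue 0 is emptied completely before the scheduler moves on, which is what distinguishes the exhaustive variant from plain round-robin.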
16. Trace-Driven Simulation
[Figure: 32x32 CQ switch with different buffer sizes (in units of 64-byte packets)]
Buffers of size 64 suffice to ensure 99% throughput for N=32.
17. Conclusions
- CQ is open loop → allows multi-rack configuration
- CQ provides easy scheduling
- CQ is feasible to implement in a single chip
- CQ shows good performance in simulations
18. Thank You