Packet Switch Architecture and Buffered Crossbar Switches


1
Packet Switch Architecture and Buffered Crossbar
Switches

Manolis Katevenis
FORTH and Univ. of Crete, Greece
http://archvlsi.ics.forth.gr/kateveni

2
Ubiquitous Interconnects

Information Technology Infrastructure:
Compute: processors, specialized compute engines
Store: memories, disks, etc.
Interface: keyboards, displays, sensors, actuators, etc.
Communicate: interconnect all of the above together, from on-chip to cross-continent range
Packet Switches: the building block for high-performance interconnects
This Talk:
brief background review: Packet Switch Architecture
recent research at FORTH: Buffered Crossbars

3
Packet Switching: Unscheduled Arrivals
4
Buffer Memory Architectures
(1) Output Queueing: the reference architecture

ideal performance, but excessive cost
N(N+1) total memory throughput for an N×N switch

5
(2) Input Queueing: the usual architecture

N×2 total memory throughput
need to solve the crossbar scheduling problem
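
The memory-throughput figures for output and input queueing can be checked with a short calculation. This is only a sketch under stated assumptions: throughput is counted in multiples of the line rate, each output queue must absorb up to N writes plus one read per cell time, and each input queue sees one write plus one read. The function names are illustrative and do not come from the talk.

```python
def output_queueing_throughput(n: int) -> int:
    """Total memory throughput, in line rates, for an N x N output-queued switch.

    Assumption: each of the N output memories handles up to N writes
    (one from every input) plus 1 read per cell time.
    """
    return n * (n + 1)


def input_queueing_throughput(n: int) -> int:
    """Total memory throughput, in line rates, for an N x N input-queued switch.

    Assumption: each of the N input memories handles 1 write plus 1 read.
    """
    return n * 2


if __name__ == "__main__":
    for n in (4, 16, 32):
        oq = output_queueing_throughput(n)
        iq = input_queueing_throughput(n)
        print(f"N={n}: output queueing {oq}, input queueing {iq}")
```

The gap grows linearly with N, which is the "excessive cost" that makes output queueing a reference point rather than a practical design.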

6
Crossbar Scheduling (1 of 3)
7
Crossbar Scheduling (2 of 3)
8
Crossbar Scheduling (3 of 3)
9
(3) Combined Input-Output Queueing (CIOQ): the practical architecture
10
Part 2: Buffered Crossbars @ FORTH

Small buffers inside the crossbar (or switching fabric)
Large buffers at the inputs (as before)
Backpressure to keep the small buffers from overflowing
simpler (distributed) scheduling, QoS capable (WRR)
better scheduling efficiency
directly operates with variable-size packets, without segmentation and reassembly (SAR)
NO internal speedup needed: lower power consumption, lower cost, or external lines as fast as internal core
NO output queues needed

11
Distributed Scheduling in Buffered Crossbars

inputs decide independently where to send to, subject to space availability
outputs decide independently where to feed from, subject to data availability
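
The two independent decisions can be illustrated with a toy model. This is a minimal sketch, not the FORTH design: it assumes fixed-size cells, one cell per port per time step, round-robin selection at every scheduler, and a hypothetical `BufferedCrossbar` class whose names and parameters are invented for illustration.

```python
from collections import deque


class BufferedCrossbar:
    """Toy N x N buffered crossbar with one small queue per crosspoint."""

    def __init__(self, n: int, xpoint_capacity: int = 2):
        self.n = n
        self.capacity = xpoint_capacity  # cells per crosspoint buffer (assumed)
        # voq[i][j]: cells waiting in the large buffer of input i for output j
        self.voq = [[deque() for _ in range(n)] for _ in range(n)]
        # xp[i][j]: small buffer at the crosspoint of input i and output j
        self.xp = [[deque() for _ in range(n)] for _ in range(n)]
        self.in_ptr = [0] * n    # round-robin pointer of each input scheduler
        self.out_ptr = [0] * n   # round-robin pointer of each output scheduler
        self.delivered = [[] for _ in range(n)]

    def enqueue(self, i: int, j: int, cell) -> None:
        self.voq[i][j].append(cell)

    def step(self) -> None:
        n = self.n
        # Each output independently picks a crosspoint that holds data.
        for j in range(n):
            for k in range(n):
                i = (self.out_ptr[j] + k) % n
                if self.xp[i][j]:
                    self.delivered[j].append(self.xp[i][j].popleft())
                    self.out_ptr[j] = (i + 1) % n
                    break
        # Each input independently picks a destination whose crosspoint
        # buffer has space (this check plays the role of backpressure).
        for i in range(n):
            for k in range(n):
                j = (self.in_ptr[i] + k) % n
                if self.voq[i][j] and len(self.xp[i][j]) < self.capacity:
                    self.xp[i][j].append(self.voq[i][j].popleft())
                    self.in_ptr[i] = (j + 1) % n
                    break


if __name__ == "__main__":
    sw = BufferedCrossbar(2, xpoint_capacity=1)
    sw.enqueue(0, 0, "a")
    sw.enqueue(1, 0, "b")
    sw.enqueue(0, 1, "c")
    for _ in range(4):
        sw.step()
    print(sw.delivered)
```

No scheduler looks at any other scheduler's state: inputs consult only the occupancy of their own row of crosspoints, and outputs only their own column.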

12
No Speedup needed to approach Output Queuing

Uniform destinations
Internet-style synthetic workload: 40-1500 byte packet sizes
Unbuffered crossbar with SAR: one-iteration iSLIP, 64-byte segments

13
Saturation Throughput under Unbalanced Traffic

Poisson arrivals, Pareto sizes (40-1500)
For iSLIP, packet sizes are multiples of 64 B (→ no SAR overhead)

14
A VPS Buffered Crossbar Chip Design

32x32 ports, 300 Gbps aggregate throughput
2 KBytes per crosspoint buffer x 1024 crosspoints
Variable-size packets (multiples of 4 Bytes)
32-bit datapaths
Cut-through at the crosspoints
Fully designed, in Verilog
Core only, no pads & transceivers
Fully verified: Verilog versus C performance simulator
Crosspoint logic: 100 FF, 25 gates (simplicity!)

15
Chip Design: Synthesis, Placement & Routing

32x32 ports, 300 Gbps
Synthesized: Synopsys
Placed & routed: Cadence Encounter, 0.18 µm UMC
→ Clock frequency: 300 MHz @ 0.18 µm (operates at maximum SRAM clock frequency)
→ Core Power: 6 Watt typical @ 0.18 µm
→ Core Area: 420 mm² @ 0.18 µm, or 200 mm² @ 0.13 µm
Conclusion
0.18 µm: 24x24 ports (or 10x10 ports w. Jumbo frames)
0.13 µm: 32x32 ports @ 10 Gbps/port
0.09 µm: higher port counts and line rates achievable
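
The headline numbers of the chip design are consistent with one another, as a quick calculation shows. This is a back-of-the-envelope sketch that assumes each port moves one 32-bit word per clock cycle; the constant names are illustrative.

```python
PORTS = 32                       # 32x32 switch
DATAPATH_BITS = 32               # 32-bit datapaths
CLOCK_HZ = 300e6                 # 300 MHz at 0.18 um
XPOINT_BUFFER_BYTES = 2 * 1024   # 2 KBytes per crosspoint buffer

# Assumption: one datapath word is transferred per port per clock cycle.
per_port_gbps = DATAPATH_BITS * CLOCK_HZ / 1e9
aggregate_gbps = PORTS * per_port_gbps

crosspoints = PORTS * PORTS
total_buffer_mbytes = crosspoints * XPOINT_BUFFER_BYTES / 2**20

if __name__ == "__main__":
    print(f"per-port rate:        {per_port_gbps:.1f} Gbps")
    print(f"aggregate throughput: {aggregate_gbps:.1f} Gbps")
    print(f"crosspoints:          {crosspoints}")
    print(f"total crosspoint buffering: {total_buffer_mbytes:.1f} MBytes")
```

Under that assumption the per-port rate comes out just under 10 Gbps and the aggregate just above 300 Gbps, matching the rounded figures on the slides.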

16
Chip Core Layout
17
Core Area, Power Allocation
18
Optimizing for Buffer Memory Technologies

Crosspoint buffers are too expensive when maximum packet size is large (1.5 to 10 KBytes)
DRAM on ingress line card operates efficiently only on fixed-size blocks

19
Variable-Size Multipacket Segments
Fixed size segments (cells): induce heavy padding overheads
Variable size segments: small buffers, no padding; buffered crossbars can operate on variable size units
Variable size multipacket segments: pack multiple packets into each segment

Encapsulating multiple packets into each segment
- reduces overhead of internal headers
- provides better performance with smaller crosspoint buffers
- is well suited to DRAM buffers on ingress line cards
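
The padding argument can be made concrete with a small comparison. This is a sketch under several assumptions that are not stated on the slide: internal headers are ignored, fixed cells carry 64 bytes, all packets belong to the same queue, and packets may straddle segment boundaries when packed. Both function names are hypothetical.

```python
import math


def fixed_cell_bytes(packet_sizes, cell_payload=64):
    """Bytes sent when each packet is cut into fixed-size cells.

    The last cell of every packet is padded up to cell_payload bytes.
    """
    return sum(math.ceil(p / cell_payload) * cell_payload for p in packet_sizes)


def multipacket_segments(packet_sizes, max_segment=512):
    """Sizes of the segments produced by packing packets back to back.

    Assumption: packets of one queue form a byte stream that is cut every
    max_segment bytes; the final segment is simply shorter, never padded.
    """
    total = sum(packet_sizes)
    full, rest = divmod(total, max_segment)
    return [max_segment] * full + ([rest] if rest else [])


if __name__ == "__main__":
    packets = [40, 65, 1500]
    payload = sum(packets)
    cells = fixed_cell_bytes(packets)
    segments = multipacket_segments(packets)
    print(f"payload bytes:            {payload}")
    print(f"fixed 64-byte cells send: {cells} ({cells - payload} bytes of padding)")
    print(f"multipacket segments:     {segments} (no padding)")
```

The 65-byte packet is the worst case for fixed cells: it needs two 64-byte cells, so almost half of the transmitted bytes are padding.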
20
SRAM/DRAM Queueing using Variable-Size Multipacket Segments
21
Multipacket Segments in CICOQ vs. CIOQ
Uniform synthetic workload including jumbo frames
Max segment size: 512 B
Crosspoint buffer size: 512 B
iSLIP with 5 iterations
Switch size: 32

CICOQ delay curve is translated by the reassembly delay relative to OQ
CIOQ curve reveals the timer setting problem

22
Multipacket vs. Unipacket Segments in CICOQ
Uniform traffic of 40-byte packets only
4-byte internal header
Max segment size: 512 B
Crosspoint buffer size: 520 B

Multipacket Segments
- Switch stable for all loads up to 512/516 ≈ 99%, due to segment size adaptivity
Unipacket Segments
- Switch unstable for loads greater than 40/44 ≈ 91%
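
Both load limits follow from the ratio of payload to payload plus internal header. The sketch below reproduces them, assuming a 4-byte header per segment and no internal speedup; `max_stable_load` is an illustrative name.

```python
def max_stable_load(payload_bytes: int, header_bytes: int = 4) -> float:
    """Highest input load a switch without speedup can carry.

    Every segment costs header_bytes of internal bandwidth on top of its
    payload, so only payload / (payload + header) of the capacity is useful.
    """
    return payload_bytes / (payload_bytes + header_bytes)


# Multipacket segments: under heavy load, segments fill up to 512 bytes.
multipacket_limit = max_stable_load(512)

# Unipacket segments: every 40-byte packet travels alone with its own header.
unipacket_limit = max_stable_load(40)

if __name__ == "__main__":
    print(f"multipacket: {multipacket_limit:.1%}")
    print(f"unipacket:   {unipacket_limit:.1%}")
```

The multipacket limit rises with load because segments grow as queues build up, which is the segment size adaptivity mentioned on the slide.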

23
Multipacket vs. Unipacket Segments in CICOQ
(cont.)
Unbalanced synthetic workload
Maximum pkt size: 1500 B
Max segment size: 512 B
Crosspoint buffer size: 512/1024 B
RTT: 500 byte times

Multipacket Segments
- Satisfactory performance with just 1 max. size segment crosspoint buffer
Unipacket Segments
- At least 1 max. size segment + 1 RTT window is needed
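
The two crosspoint buffer sizes used in this experiment line up with the two requirements. The arithmetic below is a sketch that assumes the unipacket requirement is one maximum-size segment plus one round-trip window, and that one byte time corresponds to one byte of buffering; the names are illustrative.

```python
MAX_SEGMENT_BYTES = 512
RTT_BYTE_TIMES = 500  # assumed: one byte time equals one byte of buffer space

# Multipacket segments: one maximum-size segment of buffering suffices.
multipacket_buffer = MAX_SEGMENT_BYTES

# Unipacket segments: one maximum-size segment plus one RTT window.
unipacket_buffer = MAX_SEGMENT_BYTES + RTT_BYTE_TIMES


def fits(required_bytes: int, available_bytes: int) -> bool:
    """True when a crosspoint buffer of available_bytes covers the requirement."""
    return required_bytes <= available_bytes


if __name__ == "__main__":
    for size in (512, 1024):
        print(
            f"{size} B buffer: multipacket ok = {fits(multipacket_buffer, size)}, "
            f"unipacket ok = {fits(unipacket_buffer, size)}"
        )
```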

24
Switching Fabrics with Internal Backpressure

Most promising scalable architecture
Open questions still remain: congestion control → active research topic
past & present research at FORTH

25
ATLAS I: Single-chip ATM Switch

1996-98
6 million transistors
0.35 µm CMOS
10 Gbit/s (16x16 @ 622 Mbps)
multilane backpressure at the granularity of 32 K flows
on-chip shared buffer (pipelined memory, US patent 5,774,653)

26
Commodity Architectures: an Analogy?

Network switches
1985-2005: immature switch architectures.
1995-2005: Internet Routers, Digital Telephony: specialized, expensive, small market.
2005(?)-: clusters (fabrics) of commodity switches → SAN, LAN, WAN