Transcript and Presenter's Notes

Title: Packet Switch Architecture and Buffered Crossbar Switches


1
Packet Switch Architecture and Buffered Crossbar
Switches
  • Manolis Katevenis
  • FORTH and Univ. of Crete, Greece
  • http://archvlsi.ics.forth.gr/kateveni

2
Ubiquitous Interconnects
  • Information Technology Infrastructure
  • Compute: processors, specialized compute engines
  • Store: memories, disks, etc.
  • Interface: keyboards, displays, sensors,
    actuators, etc.
  • Communicate: interconnect all of the above together!
  • from on-chip to cross-continent range
  • Packet Switches: the building block for
    high-performance interconnects
  • This Talk:
  • brief background review: Packet Switch
    Architecture
  • recent research at FORTH: Buffered Crossbars

3
Packet Switching: Unscheduled Arrivals
4
Buffer Memory Architectures
(1) Output Queueing: the reference architecture
  • ideal performance, but excessive cost
  • N(N+1) total memory throughput for an N×N switch

5
(2) Input Queueing: the usual architecture
  • 2N total memory throughput (see the worked
    comparison below)
  • need to solve the crossbar scheduling problem
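As a quick sanity check on the two memory-throughput figures, the sketch below (plain C; the 32-port, 10 Gbps numbers are assumptions chosen only for illustration) tallies the aggregate buffer-memory bandwidth each architecture needs: output queueing must let every output buffer absorb up to N simultaneous writes plus one read, while input queueing needs only one write and one read per input buffer.

```c
/* Sketch: aggregate buffer-memory bandwidth for an N x N switch.
 * Port count and line rate are assumptions made for the example. */
#include <stdio.h>

int main(void)
{
    const int    N         = 32;    /* ports (assumption) */
    const double line_gbps = 10.0;  /* line rate per port (assumption) */

    /* Output queueing: each output buffer may be written by all N inputs
     * at once and is read at line rate -> (N + 1) port-rates per buffer. */
    double oq = N * (N + 1) * line_gbps;

    /* Input queueing: each input buffer is written and read at line rate. */
    double iq = 2.0 * N * line_gbps;

    printf("Output queueing: %.0f Gbps of buffer-memory throughput\n", oq);
    printf("Input  queueing: %.0f Gbps of buffer-memory throughput\n", iq);
    return 0;
}
```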

6
Crossbar Scheduling (1 of 3)
7
Crossbar Scheduling (2 of 3)
8
Crossbar Scheduling (3 of 3)
9
(3) Combined Input-Output Queueing (CIOQ): the
practical architecture
10
Part 2: Buffered Crossbars @ FORTH
  • Small buffers inside the crossbar (or switching
    fabric)
  • Large buffers at the inputs (as before)
  • Backpressure to keep the small buffers from
    overflowing (see the credit sketch below)
  • simpler (distributed) scheduling, QoS capable
    (WRR)
  • better scheduling efficiency
  • directly operates with variable-size packets,
    w/o SAR
  • NO internal speedup needed
  • lower power consumption, lower cost, or
    external lines as fast as internal core
  • NO output queues needed
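One simple way to picture the backpressure is per-crosspoint credit counting. The C fragment below is only an illustrative sketch under assumed names and a 2 KB buffer size; it is not the chip's actual flow-control logic. An input forwards a segment toward a crosspoint only while it holds enough credits, and credits flow back as the output side drains the buffer, so the small buffer can never overflow.

```c
/* Illustrative credit-based backpressure for one crosspoint buffer.
 * All names and sizes are assumptions made for this sketch. */
#include <stdbool.h>

#define XPOINT_BUF_BYTES 2048   /* small buffer inside the crossbar */

typedef struct {
    int credits;                /* free bytes the input believes remain */
} xpoint_credit_t;

/* Input side: may a 'len'-byte segment be sent into this crosspoint? */
static bool can_send(const xpoint_credit_t *c, int len)
{
    return c->credits >= len;
}

/* Input side: account for a segment actually sent. */
static void on_send(xpoint_credit_t *c, int len)
{
    c->credits -= len;
}

/* Output side drained 'len' bytes: the credit flows back upstream. */
static void on_credit_return(xpoint_credit_t *c, int len)
{
    c->credits += len;
}

int main(void)
{
    xpoint_credit_t c = { XPOINT_BUF_BYTES };   /* start with a full credit pool */
    if (can_send(&c, 300))
        on_send(&c, 300);                       /* input ships a 300 B segment   */
    on_credit_return(&c, 300);                  /* output drains it, credit back */
    return 0;
}
```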

11
Distributed Scheduling in Buffered Crossbars
  • inputs decide independently
    where to send to, subject to space
    availability
  • outputs decide independently
    where to feed from, subject to data availability
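These two independent decisions can be sketched as two arbiters that never talk to each other. The C fragment below uses plain round-robin pointers (the slides also mention WRR for QoS); all data structures and names are assumptions made purely for the illustration.

```c
/* Sketch of the two independent decisions in a buffered crossbar.
 * Round-robin pointers are used here; WRR is equally possible.
 * All structures below are assumptions made for the illustration. */
#include <stdbool.h>

#define N 4                     /* small switch for the example */

static int  xp_bytes[N][N];     /* bytes queued in crosspoint (i, j) */
static int  xp_space[N][N];     /* free bytes in crosspoint (i, j)  */
static bool voq_has_data[N][N]; /* input i has data for output j    */

/* Input i decides, on its own, which output to send to next,
 * subject to space availability in the crosspoint buffer. */
static int input_pick(int i, int *rr_in)
{
    for (int k = 0; k < N; k++) {
        int j = (*rr_in + k) % N;
        if (voq_has_data[i][j] && xp_space[i][j] > 0) {
            *rr_in = (j + 1) % N;   /* advance round-robin pointer */
            return j;
        }
    }
    return -1;                      /* nothing eligible this cycle */
}

/* Output j decides, on its own, which crosspoint to drain next,
 * subject to data availability. */
static int output_pick(int j, int *rr_out)
{
    for (int k = 0; k < N; k++) {
        int i = (*rr_out + k) % N;
        if (xp_bytes[i][j] > 0) {
            *rr_out = (i + 1) % N;
            return i;
        }
    }
    return -1;
}

int main(void)
{
    int rr_in = 0, rr_out = 0;
    voq_has_data[0][2] = true;  xp_space[0][2] = 512;   /* toy state */
    xp_bytes[1][2] = 64;
    int j = input_pick(0, &rr_in);      /* input 0 chooses output 2  */
    int i = output_pick(2, &rr_out);    /* output 2 drains input 1   */
    return (j == 2 && i == 1) ? 0 : 1;
}
```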

12
No Speedup needed to approach Output Queueing
  • Uniform destinations
  • Internet-style synthetic workload: 40-1500 byte
    packet sizes
  • Unbuffered crossbar w. SAR: one-iteration iSLIP,
    64-byte segments

13
Saturation Throughput under Unbalanced Traffic
  • Poisson arrivals, Pareto sizes (40-1500 B)
  • For iSLIP, packet sizes are multiples of 64 B (⇒
    no SAR overhead; see the padding sketch below)
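The "no SAR overhead" remark is easy to quantify: a crossbar that only moves fixed 64-byte cells must pad every packet up to a multiple of 64 B. The small C helper below is hypothetical (not part of any cited simulator) and simply computes that padding, which is why the iSLIP comparison deliberately uses packet sizes that are exact multiples of 64 B.

```c
/* Padding overhead of segmenting a packet into fixed 64-byte cells.
 * Purely illustrative helper written for this transcript. */
#include <stdio.h>

#define CELL_BYTES 64

static int padded_size(int pkt_bytes)
{
    int cells = (pkt_bytes + CELL_BYTES - 1) / CELL_BYTES; /* round up */
    return cells * CELL_BYTES;
}

int main(void)
{
    int sizes[] = { 40, 64, 65, 1500 };
    for (int k = 0; k < 4; k++) {
        int p = sizes[k];
        printf("%4d B packet -> %4d B on the fabric (%d B padding)\n",
               p, padded_size(p), padded_size(p) - p);
    }
    return 0;
}
```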

14
A VPS Buffered Crossbar Chip Design
  • 32x32 ports, 300 Gbps aggregate throughput
  • 2 KBytes / crosspoint buffer x 1024 crosspoints
  • Variable-size packets (multiples of 4 Bytes)
  • 32-bit datapaths
  • Cut-through at the crosspoints
  • Fully designed, in Verilog
  • Core only, no pads & transceivers
  • Fully verified: Verilog versus C performance
    simulator
  • Crosspoint logic: 100 FF + 25 gates
    (simplicity!)
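The headline numbers are mutually consistent, as the back-of-the-envelope calculation below shows; it only restates figures already on this slide and the next one (the 300 MHz clock), so treat it as a consistency check rather than additional data.

```c
/* Back-of-the-envelope consistency check of the slide's chip figures. */
#include <stdio.h>

int main(void)
{
    const int    ports        = 32;
    const int    xpoints      = ports * ports;        /* 1024 crosspoints */
    const int    buf_per_xp   = 2 * 1024;             /* 2 KB each        */
    const double clk_mhz      = 300.0;                /* from next slide  */
    const int    datapath_bit = 32;

    double total_buf_mb  = (double)xpoints * buf_per_xp / (1024.0 * 1024.0);
    double gbps_per_port = clk_mhz * 1e6 * datapath_bit / 1e9;   /* ~9.6  */
    double aggregate     = gbps_per_port * ports;                /* ~307  */

    printf("buffer SRAM: %.1f MB, per port: %.1f Gbps, aggregate: %.0f Gbps\n",
           total_buf_mb, gbps_per_port, aggregate);
    return 0;
}
```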

15
Chip Design: Synthesis, Placement & Routing
  • 32x32 ports, 300 Gbps
  • Synthesized: Synopsys
  • Placed & routed: Cadence Encounter, 0.18 µm UMC
  • ⇒ Clock frequency: 300 MHz @ 0.18 µm
  • (operates at maximum SRAM clock frequency)
  • ⇒ Core Power: 6 Watt typical @ 0.18 µm
  • ⇒ Core Area: 420 mm² @ 0.18 µm, or 200 mm² @ 0.13
    µm
  • Conclusion:
  • 0.18 µm: 24x24 ports (or 10x10 ports w. Jumbo
    frames)
  • 0.13 µm: 32x32 ports @ 10 Gbps/port
  • 0.09 µm: higher port counts and line rates
    achievable

16
Chip Core Layout
17
Core Area, Power Allocation
18
Optimizing for Buffer Memory Technologies
  • Crosspoint buffers are too expensive when the maximum
    packet size is large (1.5 – 10 KBytes; see the sizing
    note below)
  • DRAM on the ingress line card operates efficiently
    only on fixed-size blocks
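To see how quickly per-crosspoint buffering grows with the maximum packet size, the short calculation below scales the 32x32 crosspoint array of the earlier chip design to buffers that must each hold one maximum-size packet; the 1.5 KB and 10 KB figures are the ones from this slide, and the rest is illustrative arithmetic only.

```c
/* How crosspoint SRAM grows with the maximum packet size (illustration). */
#include <stdio.h>

int main(void)
{
    const int xpoints   = 32 * 32;                 /* as in the chip design */
    const int max_pkt[] = { 1536, 10 * 1024 };     /* ~1.5 KB and 10 KB     */

    for (int k = 0; k < 2; k++) {
        double mb = (double)xpoints * max_pkt[k] / (1024.0 * 1024.0);
        printf("max packet %5d B -> at least %.1f MB of crosspoint SRAM\n",
               max_pkt[k], mb);
    }
    return 0;
}
```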

19
Variable-Size Multipacket Segments
[Figure: fixed-size segments (cells) vs. variable-size
segments; buffered crossbars can operate on variable-size
units and can pack multiple packets into each segment]
  • Fixed-size cells induce heavy padding overheads
  • Variable-size segments: small buffers, no
    padding
  • Variable-size multipacket segments: encapsulating
    multiple packets into each segment (see the packing
    sketch below)
    - reduces overhead of internal headers
    - provides better performance with smaller
      crosspoint buffers
    - well suited to DRAM buffers on ingress line cards
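A minimal sketch of the packing step, in C and under assumed field and constant names (MAX_SEG_BYTES, HDR_BYTES): arriving packets are appended to the current segment until the next one would no longer fit, at which point the segment is shipped with a single internal header. Splitting of packets larger than one segment is omitted for brevity.

```c
/* Sketch: packing several variable-size packets into one segment.
 * Names and constants are assumptions made for this illustration;
 * packets larger than MAX_SEG_BYTES are not handled here. */
#include <stdio.h>

#define MAX_SEG_BYTES 512
#define HDR_BYTES       4      /* one internal header per segment */

typedef struct {
    int used;                   /* payload bytes packed so far */
} segment_t;

/* Try to add a packet; returns 0 if the segment must be shipped first. */
static int seg_add(segment_t *s, int pkt_bytes)
{
    if (s->used + pkt_bytes > MAX_SEG_BYTES)
        return 0;
    s->used += pkt_bytes;
    return 1;
}

int main(void)
{
    int pkts[] = { 40, 40, 300, 120, 64 };   /* arriving packet sizes */
    segment_t seg = { 0 };

    for (int k = 0; k < 5; k++) {
        if (!seg_add(&seg, pkts[k])) {
            printf("ship segment: %d B payload + %d B header\n",
                   seg.used, HDR_BYTES);
            seg.used = 0;
            seg_add(&seg, pkts[k]);          /* packet starts next segment */
        }
    }
    if (seg.used)
        printf("ship segment: %d B payload + %d B header\n",
               seg.used, HDR_BYTES);
    return 0;
}
```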
20
SRAM & DRAM Queueing using Variable-Size
Multipacket Segments
21
Multipacket Segments in CICOQ vs. CIOQ
Uniform synthetic workload including jumbo frames
Max segment size: 512 B; Crosspoint buffer size: 512 B
iSLIP with 5 iterations; Switch size: 32
  • CICOQ delay curve is translated by the
    reassembly delay relative to OQ
  • CIOQ curve reveals the timer setting problem

22
Multipacket vs. Unipacket Segments in CICOQ
Uniform traffic of 40-byte packets only; 4-byte
internal header; Max segment size: 512 B;
Crosspoint buffer size: 520 B
  • Multipacket Segments
  • - Switch stable for all loads up to
    512/516 ≈ 99%, due to segment size adaptivity
  • Unipacket Segments
  • - Switch unstable for loads greater than
    40/44 ≈ 91%
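Both stability limits follow directly from header overhead. The arithmetic below (plain C, restating the slide's own numbers) computes the largest sustainable input load in each case: multipacket segments amortize the 4-byte internal header over up to 512 B of payload, while unipacket segments pay it on every 40-byte packet.

```c
/* Maximum sustainable load implied by per-segment header overhead
 * for 40-byte packets (numbers taken from the slide). */
#include <stdio.h>

int main(void)
{
    const double hdr = 4.0;                 /* internal header, bytes   */

    double multi = 512.0 / (512.0 + hdr);   /* full multipacket segment */
    double uni   =  40.0 / ( 40.0 + hdr);   /* one 40 B packet + header */

    printf("multipacket: stable up to %.0f%% load\n", 100.0 * multi); /* ~99 */
    printf("unipacket:   stable up to %.0f%% load\n", 100.0 * uni);   /* ~91 */
    return 0;
}
```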

23
Multipacket vs. Unipacket Segments in CICOQ
(cont.)
Unbalanced synthetic workload; Maximum pkt size:
1500 B; Max segment size: 512 B; Crosspoint
buffer size: 512/1024 B; RTT: 500 byte times
  • Multipacket Segments
  • - Satisfactory performance with just 1 max.-size-
    segment crosspoint buffer
  • Unipacket Segments
  • - At least 1 max.-size segment + 1 RTT window
    is needed
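The sizing rule behind these two bullets can be written as one line per scheme; the sketch below (illustrative only, using the slide's 512 B maximum segment and 500 byte-time RTT) compares the crosspoint buffer each scheme needs to keep an output busy under backpressure.

```c
/* Crosspoint buffer needed to keep an output busy under backpressure,
 * using the slide's parameters; a rough illustration, not a design rule. */
#include <stdio.h>

int main(void)
{
    const int max_seg = 512;    /* bytes, from the slide             */
    const int rtt_wnd = 500;    /* backpressure RTT, in byte times   */

    int multipacket = max_seg;              /* one max-size segment       */
    int unipacket   = max_seg + rtt_wnd;    /* segment + one RTT of data  */

    printf("multipacket segments: >= %d B per crosspoint\n", multipacket);
    printf("unipacket  segments:  >= %d B per crosspoint\n", unipacket);
    return 0;
}
```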

24
Switching Fabrics with Internal Backpressure
  • Most promising scalable architecture
  • Open questions still remain: congestion control
  • ⇒ active research topic
  • past & present research at FORTH

25
ATLAS I: Single-chip ATM Switch
  • 1996-98
  • 6 million transistors
  • 0.35 µm CMOS
  • 10 Gbit/s (16x16 @ 622 Mbps)
  • multilane backpressure at the granularity of 32 K
    flows
  • on-chip shared buffer (pipelined memory, US
    patent 5,774,653)

26
Commodity Architectures: an Analogy?
  • Network switches
  • 1985-2005: immature switch architectures.
  • 1995-2005: Internet Routers, Digital Telephony:
    specialized, expensive, small market.
  • 2005(?)-: clusters (fabrics) of commodity
    switches? - SAN, LAN, WAN
  • Processors
  • 1975-85: immature pre-RISC architectures.
  • 1985-95: Supercomputers: specialized, expensive,
    small market.
  • 1995-: clusters of low-cost, mass-market
    (commodity) processing nodes