A Deficit Round Robin 20MB/s Layer 2 Switch presentation

1
A Deficit Round Robin 20MB/s Layer 2 Switch

Muraleedhara Navada
Francois Labonte

2
Fairness in Switches
Output Queued Switch

How to provide fair bandwidth allocation at the output link?
A simple FIFO favors greedy flows
Separate flows into per-flow FIFOs at the output
Bit-by-bit fair queuing
Weighted Fair Queuing allows different weights for flows
Packetized Weighted Fair Queuing (aka PGPS) calculates a departure time for each packet
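
To make the "departure time for each packet" idea concrete, here is a minimal sketch under a strong simplifying assumption: every packet is already queued at time zero, so no virtual clock needs to be tracked. Each packet's finish tag is the previous tag of its flow plus length divided by weight, and packets leave in increasing tag order. The function name, flow names, weights, and packet sizes are illustrative, not taken from the slides.

```python
def finish_tags(flows):
    """Order packets by weighted finish tag.

    flows: dict mapping flow name -> (weight, list of packet lengths).
    Assumes all packets are backlogged at time zero (illustrative only).
    """
    tagged = []
    for name, (weight, lengths) in flows.items():
        finish = 0.0
        for length in lengths:
            # A heavier weight makes the tag grow more slowly.
            finish += length / weight
            tagged.append((finish, name, length))
    # Smallest finish tag departs first; ties break on flow name.
    return sorted(tagged)


# Hypothetical flows: C has twice the weight of A and B.
example = {"A": (1, [50, 50]), "B": (1, [100]), "C": (2, [150])}
departures = [name for _, name, _ in finish_tags(example)]
```

A full scheduler would also have to track virtual time as flows become idle and busy again, which is where most of the implementation complexity comes from.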

[Figure: round-robin bit-by-bit allocation; queued packets of size 50, 100, and 150]
3
Deficit Round Robin

Packetized Weighted Fair Queuing is complicated to implement
Deficit Round Robin keeps track of credits for each flow
A flow sends according to its credits
Credits are added according to the flow's weight
Essentially PWFQ at a coarser level
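
The credit mechanism above can be sketched as follows. This is a minimal software illustration of Deficit Round Robin, not the hardware design from the talk: on each round a backlogged flow receives credits equal to its weight and sends head-of-line packets for as long as the credits cover them. The function name, flow names, weights, and packet sizes are assumptions chosen for the example.

```python
from collections import deque


def drr_schedule(queues, quanta, rounds):
    """Return the (flow, length) send order over a fixed number of rounds.

    queues: dict flow -> deque of packet lengths
    quanta: dict flow -> credits added per round (the flow's weight)
    """
    credits = {flow: 0 for flow in queues}
    sent = []
    for _ in range(rounds):
        for flow, q in queues.items():
            if not q:
                continue  # idle flows receive no credits
            credits[flow] += quanta[flow]
            # Send while the head-of-line packet fits in the credits.
            while q and q[0] <= credits[flow]:
                length = q.popleft()
                credits[flow] -= length
                sent.append((flow, length))
            if not q:
                credits[flow] = 0  # an emptied flow does not bank credits
    return sent


# Hypothetical flows with equal weights of 75.
queues = {"A": deque([50, 50, 50]), "B": deque([100, 50]), "C": deque([150])}
quanta = {"A": 75, "B": 75, "C": 75}
order = drr_schedule(queues, quanta, rounds=2)
```

In this example only flow A can send in the first round; the larger packets of B and C have to wait until a second round of credits has accumulated.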

[Figure: Deficit Round Robin example over time; per-flow credits and queued packets of size 50, 100, and 150, shown at three successive steps]
4
NetFPGA System

8-port 10MB/s duplex Ethernet
Control FPGA (CFPGA) handles the physical interface (MAC)
Our design targets both User FPGAs (UFPGA)

[Figure: NetFPGA block diagram showing CFPGA, UFPGA0, UFPGA1, 1MB SRAM blocks, and the 10MB/s Ethernet interface]
5
Design Considerations

4 MACs behind each of the 8 ports
Each flow is a unique (Source Address, Destination Address) pair
1024 flows
Split across FPGAs
Each UFPGA reads incoming packets from different ports (0-3 and 4-7)
Tradeoff between memory storage and fairness across all flows
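
The flow count follows from the numbers above: 4 MACs on each of 8 ports gives 32 addresses, and 32 sources times 32 destinations gives 1024 pairs. The sketch below shows one way a flow ID could be formed. It assumes each learned MAC is assigned a small index equal to port * 4 + slot, which is an illustrative encoding and not something the slides specify; the helper names are hypothetical too.

```python
MACS_PER_PORT = 4
NUM_PORTS = 8
NUM_MACS = MACS_PER_PORT * NUM_PORTS  # 32 addresses in total


def flow_id(src_index, dst_index):
    """Map a (source, destination) MAC index pair to a flow ID in 0..1023."""
    if not (0 <= src_index < NUM_MACS and 0 <= dst_index < NUM_MACS):
        raise ValueError("MAC index out of range")
    return src_index * NUM_MACS + dst_index


def owning_ufpga(src_index):
    """UFPGA that reads the source's port: ports 0-3 -> 0, ports 4-7 -> 1.

    Assumes index = port * MACS_PER_PORT + slot (hypothetical encoding).
    """
    port = src_index // MACS_PER_PORT
    return 0 if port < NUM_PORTS // 2 else 1


# Count the flows whose packets enter through each UFPGA.
per_ufpga = [0, 0]
for src in range(NUM_MACS):
    for dst in range(NUM_MACS):
        per_ufpga[owning_ufpga(src)] += 1
```

Under this encoding each UFPGA sees 16 sources times 32 destinations, or 512 flows, which matches the per-FPGA figure used on the later slides.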

6
Memory Buffer Allocation

Static partitioning of 1MB SRAM across 512 flows gives 2 KB per flow, < 2 max-size packets
Need more dynamic allocation
Smaller segment size means less fragmentation, but more pointer and list handling overhead
128 bytes was chosen
Keep a list of free segments
Store on-chip only the head and tail pointers of each flow
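
A minimal sketch of the segment scheme described above, assuming a software model in which the per-segment "next" links stand in for pointers kept alongside the data in SRAM and two small dictionaries stand in for the on-chip head and tail pointers. The class and method names are hypothetical, and packet boundaries (the per-packet length) are left out for brevity.

```python
from collections import deque

SEGMENT_BYTES = 128
SRAM_BYTES = 1 << 20
NUM_SEGMENTS = SRAM_BYTES // SEGMENT_BYTES  # 8192 segments in 1MB


class SegmentMemory:
    def __init__(self, num_segments=NUM_SEGMENTS):
        self.free = deque(range(num_segments))  # free segment list
        self.next_seg = [None] * num_segments   # link stored with each segment
        self.head = {}                          # flow -> first segment (on-chip)
        self.tail = {}                          # flow -> last segment (on-chip)

    def enqueue(self, flow, length):
        """Append a packet of `length` bytes; return False if memory is full."""
        needed = -(-length // SEGMENT_BYTES)    # ceiling division
        if needed > len(self.free):
            return False
        for _ in range(needed):
            seg = self.free.popleft()
            self.next_seg[seg] = None
            if flow in self.tail:
                self.next_seg[self.tail[flow]] = seg
            else:
                self.head[flow] = seg
            self.tail[flow] = seg
        return True

    def dequeue_segment(self, flow):
        """Release and return the flow's first segment."""
        seg = self.head[flow]
        nxt = self.next_seg[seg]
        if nxt is None:
            del self.head[flow], self.tail[flow]
        else:
            self.head[flow] = nxt
        self.free.append(seg)
        return seg
```

With 128-byte segments a 300-byte packet occupies three segments, wasting 84 bytes in the last one; a static 2 KB slot per flow would instead be unable to hold two back-to-back maximum-size frames.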

[Figure: SRAM segments holding packets P1 through P6; P1 and P5 each occupy two segments]
7
MAC address Learning

Instead of being told which MAC addresses belong to which port
Learn them from the source address
Note that our split FPGA design (reading from different ports) requires the FPGAs to communicate learned MACs to each other
When the destination MAC is not learned yet, broadcast (send to all other ports)
So MAC learning implies broadcast capability
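
The learning and broadcast behaviour above can be sketched as a small forwarding table. This is an illustrative single-table model with hypothetical names; in the split design from the slides, each update would also have to be shared with the other FPGA.

```python
class LearningSwitch:
    def __init__(self, num_ports=8):
        self.num_ports = num_ports
        self.table = {}  # MAC address -> port it was last seen on

    def forward(self, in_port, src, dst):
        """Learn the source, then return the list of output ports."""
        self.table[src] = in_port
        out_port = self.table.get(dst)
        if out_port is None:
            # Unknown destination: send to every port except the ingress.
            return [p for p in range(self.num_ports) if p != in_port]
        if out_port == in_port:
            return []  # destination is on the ingress port, nothing to forward
        return [out_port]


switch = LearningSwitch()
first = switch.forward(0, "aa", "bb")   # "bb" unknown, so broadcast
reply = switch.forward(3, "bb", "aa")   # "aa" was learned on port 0
second = switch.forward(0, "aa", "bb")  # "bb" is now known on port 3
```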

8
Read Operation
[Figure: read-path block diagram]
Blocks: Master Control, MAC Learning / Flow Assignment, CFPGA Interface, Control Handler, Packet Memory Manager, DRR Engine, 1 MB SRAM
Signals: Share SA; Read, port; DA, SA; Flow ID; Flow Tail; Length, ptr
9
Write Operation
[Figure: write-path block diagram]
Blocks: Master Control, MAC Learning / Flow Assignment, CFPGA Interface, Control Handler, Packet Memory Manager, DRR Engine, 1 MB SRAM
Signals: Write, port; Port REQ; Port GNT; Data Ready; Head, length; Next head, length, latency
10
DRR Engine

How to handle 512 flows and stay work conserving?
Only one flow active at any time
DRR allocation happens on dequeuing
FIFOs contain the next flow to be serviced for each port
Statistics per flow
Weight
Latency
Bytes sent
Packets sent
Packets active
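
One way to read the points above is sketched below for a single output port: a FIFO holds the IDs of backlogged flows, credits are only touched when a packet is dequeued, and the scheduler never idles while any flow has a packet queued. This is a software model under stated assumptions, not the engine from the talk. All names are hypothetical, weights must be positive, and the latency statistic is omitted.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FlowState:
    weight: int
    credits: int = 0
    packets: deque = field(default_factory=deque)  # queued packet lengths
    bytes_sent: int = 0
    packets_sent: int = 0


class PortScheduler:
    def __init__(self):
        self.flows = {}
        self.active = deque()  # FIFO of flow IDs with queued packets
        self.current = None    # flow whose visit is in progress

    def enqueue(self, flow_id, length, weight=1500):
        if flow_id not in self.flows:
            self.flows[flow_id] = FlowState(weight)
        state = self.flows[flow_id]
        if not state.packets:
            self.active.append(flow_id)
        state.packets.append(length)

    def dequeue(self):
        """Return (flow_id, length) of the next packet, or None when idle."""
        while self.active:
            flow_id = self.active[0]
            state = self.flows[flow_id]
            if self.current != flow_id:
                state.credits += state.weight  # new visit: add the quantum
                self.current = flow_id
            if state.packets[0] <= state.credits:
                length = state.packets.popleft()
                state.credits -= length
                state.bytes_sent += length
                state.packets_sent += 1
                if not state.packets:
                    state.credits = 0
                    self.active.popleft()
                    self.current = None
                return flow_id, length
            # Credits exhausted: move the flow to the back of the FIFO.
            self.active.rotate(-1)
            self.current = None
        return None
```

Because `dequeue` keeps rotating through the FIFO until some flow has enough credits, it returns a packet whenever one is queued, which is the work-conserving property the slide asks for.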
