A Deficit Round Robin 20MB/s Layer 2 Switch

Provided by: Franco87
Learn more at: http://web.stanford.edu

Transcript and Presenter's Notes


1
A Deficit Round Robin 20MB/s Layer 2 Switch
  • Muraleedhara Navada
  • Francois Labonte

2
Fairness in Switches
Output Queued Switch
  • How can fair bandwidth allocation be provided at
    the output link?
  • A simple FIFO favors greedy flows
  • Separate flows into per-flow FIFOs at the output
  • Bit-by-bit fair queuing
  • Weighted Fair Queuing allows different weights for
    flows
  • Packetized Weighted Fair Queuing (aka PGPS)
    calculates a departure time for each packet
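
The bit-by-bit idea on this slide can be illustrated with a minimal sketch. It assumes all packets are already queued at time zero, that lengths are positive integers, and that one "tick" serves one unit of data from one flow. The function name and the data layout are hypothetical, not taken from the presentation.

```python
from collections import deque

def bit_by_bit_finish_order(flows):
    """Simulate unit-by-unit round robin over per-flow FIFOs.

    flows: dict mapping a flow name to a list of positive packet lengths.
    Returns (flow, packet_index, finish_tick) tuples in finishing order.
    """
    queues = {name: deque(enumerate(pkts)) for name, pkts in flows.items()}
    # Units still to be served for the head packet of each backlogged flow.
    remaining = {name: q[0][1] for name, q in queues.items() if q}
    tick = 0
    finished = []
    while remaining:
        for name in list(remaining):
            tick += 1                      # serve one unit from this flow
            remaining[name] -= 1
            if remaining[name] == 0:       # head packet fully served
                index, _ = queues[name].popleft()
                finished.append((name, index, tick))
                if queues[name]:
                    remaining[name] = queues[name][0][1]
                else:
                    del remaining[name]
    return finished
```

A packetized scheduler cannot interleave units like this, so it instead sends whole packets in the order in which they would finish here. Computing those finish times per packet is the costly part.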

[Figure: round-robin bit-by-bit allocation]
3
Deficit Round Robin
  • Packetized Weighted Fair Queuing is complicated
    to implement
  • Deficit Round Robin keeps track of credits for
    each flow
  • A flow sends according to its credits
  • Credits are added according to the flow's weight
  • Essentially PWFQ at a coarser granularity
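
The credit mechanism described above can be sketched as follows. This is the textbook form of Deficit Round Robin, not the authors' hardware design. The function name, the dictionary layout, and the choice to reset the credits of an emptied flow are assumptions.

```python
from collections import deque

def drr_schedule(queues, quantum, rounds):
    """Run a fixed number of DRR rounds.

    queues:  dict flow -> deque of packet lengths (modified in place)
    quantum: dict flow -> credits added per round (the flow's weight)
    Returns the (flow, length) pairs in the order they are sent.
    """
    credits = {flow: 0 for flow in queues}
    sent = []
    for _ in range(rounds):
        for flow, q in queues.items():
            if not q:
                continue                    # idle flows earn no credits
            credits[flow] += quantum[flow]
            # Send head packets for as long as the credits cover them.
            while q and q[0] <= credits[flow]:
                length = q.popleft()
                credits[flow] -= length
                sent.append((flow, length))
            if not q:
                credits[flow] = 0           # an emptied flow keeps no credits
    return sent

queues = {"A": deque([100, 50]), "B": deque([50, 50, 50])}
print(drr_schedule(queues, {"A": 75, "B": 50}, rounds=2))
# [('B', 50), ('A', 100), ('A', 50), ('B', 50)]
```

In the first round flow A cannot send its 100-byte packet with 75 credits, so the credits carry over. In the second round it holds 150 and sends both packets.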

[Figure: per-flow credits over time under Deficit Round Robin]
4
NetFPGA System
  • 8-port 10MB/s duplex Ethernet
  • Control FPGA (CFPGA) handles the physical interface
    (MAC)
  • Our design targets both User FPGAs (UFPGA)

[Figure: NetFPGA block diagram showing the CFPGA, UFPGA0, UFPGA1, the 10MB/s Ethernet interface, and the 1MB SRAM blocks]
5
Design Considerations
  • 4 MACs behind each of the 8 ports
  • Each flow is a unique (Source Address,
    Destination Address) pair
  • 1024 flows
  • Split across the two FPGAs
  • Each UFPGA reads incoming packets from a different
    set of ports (0-3 and 4-7)
  • Tradeoff between memory storage and fairness
    across all flows
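
The flow count follows from the numbers on this slide: 8 ports with 4 MACs each give 32 addresses, and 32 × 32 source/destination pairs give 1024 flows. The sketch below works through that arithmetic. The index layout (`mac_index`, `flow_id`) and the mapping of ports 0-3 to the first UFPGA are hypothetical, since the slides do not say how flow IDs are actually encoded.

```python
PORTS = 8
MACS_PER_PORT = 4
NUM_MACS = PORTS * MACS_PER_PORT        # 32 addresses in total
TOTAL_FLOWS = NUM_MACS * NUM_MACS       # 1024 (source, destination) pairs

def mac_index(port, slot):
    """Hypothetical dense index for the slot-th MAC behind a port."""
    return port * MACS_PER_PORT + slot

def flow_id(src_index, dst_index):
    """Hypothetical flow ID: one ID per (source, destination) pair."""
    return src_index * NUM_MACS + dst_index

def owning_fpga(ingress_port):
    """Assumed split: ports 0-3 on one UFPGA, ports 4-7 on the other."""
    return 0 if ingress_port < PORTS // 2 else 1

# Flows whose source MAC sits behind ports 0-3.
flows_on_first_fpga = sum(
    1
    for src in range(NUM_MACS)
    for dst in range(NUM_MACS)
    if owning_fpga(src // MACS_PER_PORT) == 0
)
```

Under this assumed split each UFPGA sees 16 source addresses and 32 destinations, which gives the 512 flows per chip that the later slides mention.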

6
Memory Buffer Allocation
  • Static partitioning of the 1MB SRAM across 512 flows
    gives 2 KB per flow, < 2 max-size packets
  • Need more dynamic allocation
  • Smaller segments mean less fragmentation,
    but more pointer and list handling overhead
  • A segment size of 128 bytes was chosen
  • Keep a list of free segments
  • Store on-chip only the pointers to the head and tail of
    each flow
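
A minimal software sketch of the segment scheme above, assuming the free list and each flow's queue are singly linked lists threaded through one next-pointer array. The class and method names are hypothetical. Packet boundaries and lengths are not tracked here, and the actual hardware layout is not given in the slides.

```python
SRAM_BYTES = 1 << 20                         # 1MB of packet memory
SEGMENT_BYTES = 128                          # segment size from the slide
NUM_SEGMENTS = SRAM_BYTES // SEGMENT_BYTES   # 8192 segments

class SegmentPool:
    def __init__(self, num_segments=NUM_SEGMENTS):
        # next_ptr[i] is the segment that follows segment i in its list.
        self.next_ptr = list(range(1, num_segments)) + [None]
        self.free_head = 0 if num_segments else None
        self.free_count = num_segments
        self.head = {}   # flow -> first segment (the "on-chip" state)
        self.tail = {}   # flow -> last segment

    def _alloc(self):
        seg = self.free_head
        self.free_head = self.next_ptr[seg]
        self.next_ptr[seg] = None
        self.free_count -= 1
        return seg

    def enqueue(self, flow, length):
        """Append enough segments for `length` bytes; False means drop."""
        needed = -(-length // SEGMENT_BYTES)     # ceiling division
        if needed > self.free_count:
            return False
        for _ in range(needed):
            seg = self._alloc()
            if flow in self.tail:
                self.next_ptr[self.tail[flow]] = seg
            else:
                self.head[flow] = seg
            self.tail[flow] = seg
        return True

    def dequeue_segment(self, flow):
        """Detach the flow's head segment and return it to the free list."""
        seg = self.head[flow]
        nxt = self.next_ptr[seg]
        if nxt is None:
            del self.head[flow], self.tail[flow]
        else:
            self.head[flow] = nxt
        self.next_ptr[seg] = self.free_head
        self.free_head = seg
        self.free_count += 1
        return seg
```

With this layout a busy flow can borrow segments that idle flows are not using, which the static 2 KB partition cannot do. The price is one next pointer per segment.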

[Figure: memory segments labeled P1, P1, P2, P3, P4, P5, P5, P6]
7
MAC address Learning
  • Instead of being told which MAC addresses belong to
    which port, learn them from the source address
  • Note that our split FPGA design (reading from
    different ports) requires the two FPGAs to communicate
    the MACs they learn to each other
  • When the destination MAC has not been learned yet,
    broadcast (send to all other ports)
  • So MAC learning implies broadcast capability
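
The learning and broadcast behavior above can be sketched in a few lines. The `MacTable` class, its `peer` link (a stand-in for the exchange between the two FPGAs), and the dropping of frames whose destination sits on the ingress port are assumptions. The last of these is common bridge behavior, but the slides do not mention it.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"

class MacTable:
    def __init__(self, num_ports=8):
        self.num_ports = num_ports
        self.port_of = {}     # learned MAC address -> port
        self.peer = None      # the other FPGA's table, if any

    def learn(self, src, port, from_peer=False):
        self.port_of[src] = port
        # Share newly seen source addresses with the other FPGA.
        if self.peer is not None and not from_peer:
            self.peer.learn(src, port, from_peer=True)

    def handle(self, ingress_port, src, dst):
        """Learn from the source address, then return the output ports."""
        self.learn(src, ingress_port)
        out = self.port_of.get(dst)
        if dst == BROADCAST or out is None:
            # Unknown destination: send to all other ports.
            return [p for p in range(self.num_ports) if p != ingress_port]
        if out == ingress_port:
            return []         # assumed: destination is on the same port
        return [out]

a, b = MacTable(), MacTable()
a.peer, b.peer = b, a
print(a.handle(0, "aa", "bb"))   # unknown destination: [1, 2, 3, 4, 5, 6, 7]
print(b.handle(5, "bb", "aa"))   # learned via the peer: [0]
print(a.handle(0, "aa", "bb"))   # now known: [5]
```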

8
Read Operation
[Figure: read-operation block diagram. Blocks: Master Control, MAC Learning Flow Assignment, CFPGA Interface, Control Handler, Packet Memory Manager, DRR Engine, 1 MB SRAM. Signal labels: Share SA; Read, port; DA, SA; Flow ID; Flow Tail; Length, ptr]
9
Write Operation
[Figure: write-operation block diagram. Blocks: Master Control, MAC Learning Flow Assignment, CFPGA Interface, Control Handler, Packet Memory Manager, DRR Engine, 1 MB SRAM. Signal labels: Write, port; Port REQ; Port GNT; Data Ready; Head, length; Next head, length, latency]
10
DRR Engine
  • How to handle 512 flows and stay work conserving?
  • Only one flow active at any time
  • DRR allocation happens on dequeuing
  • FIFOs contain the next flow to be serviced for
    each port
  • Statistics per flow: weight, latency, bytes sent,
    packets sent, packets active
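
One way to read the bullets above is that each output port keeps a FIFO of backlogged flow IDs, so the engine never scans idle flows, and credits are granted when a flow is dequeued from that FIFO. The sketch below follows that reading. The class names and the exact bookkeeping are hypothetical, and the latency statistic is left out because it would need timestamps the slides do not describe.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class FlowState:
    weight: int                     # credits granted per service turn
    credits: int = 0
    packets: deque = field(default_factory=deque)   # queued packet lengths
    bytes_sent: int = 0
    packets_sent: int = 0

    @property
    def packets_active(self):
        return len(self.packets)

class PortScheduler:
    def __init__(self):
        self.flows = {}
        self.active = deque()       # FIFO of flow IDs that have packets

    def add_flow(self, flow_id, weight):
        self.flows[flow_id] = FlowState(weight)

    def enqueue(self, flow_id, length):
        flow = self.flows[flow_id]
        if not flow.packets:        # flow becomes backlogged
            self.active.append(flow_id)
        flow.packets.append(length)

    def service_next(self):
        """Serve the flow at the head of the FIFO; return packets sent."""
        if not self.active:
            return []
        flow_id = self.active.popleft()
        flow = self.flows[flow_id]
        flow.credits += flow.weight         # allocation on dequeue
        sent = []
        while flow.packets and flow.packets[0] <= flow.credits:
            length = flow.packets.popleft()
            flow.credits -= length
            flow.bytes_sent += length
            flow.packets_sent += 1
            sent.append((flow_id, length))
        if flow.packets:
            self.active.append(flow_id)     # still backlogged: requeue
        else:
            flow.credits = 0
        return sent
```

A turn can return an empty list when the head flow still lacks credits for its packet. A caller that wants to keep the link busy would call `service_next` again until something is sent or the FIFO is empty.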

11
Conclusion
  • A Deficit Round Robin Switch with 1k flows has
    been implemented
  • Provides dynamic memory buffer allocation, MAC
    learning and broadcast
  • Parallel design split across 2 chips
  • Gathers statistics on flows