EE384x: Packet Switch Architectures I

Slides: 25

Provided by: nickmc

Transcript and Presenter's Notes

1
EE384x: Packet Switch Architectures I

Parallel Packet Buffers

Nick McKeown
Professor of Electrical Engineering and Computer Science, Stanford University
nickm@stanford.edu
http://www.stanford.edu/nickm
2
The Problem

All packet switches (e.g. Internet routers, ATM
switches) require packet buffers for periods of
congestion.
Size: For TCP to work well, the buffers need to
hold one RTT (about 0.25s) of data.
Speed: Clearly, the buffer needs to store
(retrieve) packets as fast as they arrive
(depart).

[Figure: N memories, numbered 1 to N, each with an input and an output at line rate R.]
3
An Example: Packet buffers for a 40Gb/s router
linecard
[Figure: a buffer manager attached to a 10Gbit buffer memory.]
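The 10Gbit figure follows from the sizing rule on the previous slide. A minimal sketch of that arithmetic, using the slide values (40Gb/s, 0.25s RTT) and the 40B packet size that appears on the following slides; the function names are illustrative only:

```python
LINE_RATE_BPS = 40_000_000_000   # 40 Gb/s linecard, from the slide
RTT_SECONDS = 0.25               # one round-trip time, from the slide
MIN_PACKET_BYTES = 40            # 40B packets, as used on the following slides


def buffer_bits(line_rate_bps, rtt_seconds):
    """Bits needed to hold one RTT of traffic arriving at line rate."""
    return line_rate_bps * rtt_seconds


def packet_time_ns(packet_bytes, line_rate_bps):
    """Time in nanoseconds to receive one packet at line rate."""
    return packet_bytes * 8 * 10**9 / line_rate_bps


print(buffer_bits(LINE_RATE_BPS, RTT_SECONDS) / 1e9, "Gbit")
print(packet_time_ns(MIN_PACKET_BYTES, LINE_RATE_BPS), "ns per packet")
```

This gives 10 Gbit of buffering and one 40B packet every 8ns, which is the access-time budget the memory has to meet.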
4
Memory Technology

Use SRAM?
Fast enough random access time, but
too low density to store 10Gbits of data.
Use DRAM?
High density means we can store the data, but
can't meet the random access time.

5
Can't we just use lots of DRAMs in parallel?
Read/write 320B every 32ns
[Figure: packets arrive at the buffer manager at write rate R, one 40B packet every 8ns. The buffer manager writes 320B blocks striped across eight buffer memories: bytes 0-39, 40-79, ..., 280-319.]
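The striping idea can be sketched as follows. The split into eight 40B slices follows the byte ranges in the figure; the explanation of the 32ns figure (one block write plus one block read in the 64ns it takes eight packets to arrive) is one reading of the slide, not something it states. All names are illustrative.

```python
BLOCK_BYTES = 320
PACKET_BYTES = 40
PACKET_TIME_NS = 8


def stripe(block, n_memories):
    """Split one block into equal slices; slice i is written to memory i."""
    if len(block) % n_memories != 0:
        raise ValueError("block length must divide evenly across memories")
    width = len(block) // n_memories
    return [block[i * width:(i + 1) * width] for i in range(n_memories)]


def access_interval_ns(block_bytes, packet_bytes, packet_time_ns):
    """Time budget per block access, assuming one write and one read per block time."""
    block_time_ns = (block_bytes // packet_bytes) * packet_time_ns
    return block_time_ns / 2


block = bytes(i % 256 for i in range(BLOCK_BYTES))
slices = stripe(block, 8)
for i, piece in enumerate(slices):
    print(f"memory {i}: bytes {i * len(piece)}-{(i + 1) * len(piece) - 1}")
print(access_interval_ns(BLOCK_BYTES, PACKET_BYTES, PACKET_TIME_NS), "ns per 320B access")
```

Each memory then sees one 40B access every 32ns instead of every 8ns, which is what makes slower DRAM usable.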
6
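Works fine if there is only one FIFO
[Figure: 40B packets arrive at the buffer manager at write rate R and depart at read rate R, one 40B packet every 8ns in each direction. The buffer manager groups them into 320B blocks, which are written to and read from the buffer memory striped as bytes 0-39, 40-79, ..., 280-319.]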
7
Works fine if there is only one FIFO: Variable Length Packets
[Figure: the same single-FIFO arrangement with variable-length packets (labelled ?B) at write rate R and read rate R. The buffer manager still packs them into 320B blocks, and the buffer memory holds a sequence of 320B blocks striped as bytes 0-39, 40-79, ..., 280-319.]
8
In practice, buffer holds many FIFOs
e.g.
In an IP Router, Q might be 200.
In an ATM switch, Q might be 10^6.

How can we write multiple variable-length packets
into different queues?
[Figure: Q FIFOs, numbered 1 to Q, each a sequence of 320B blocks in the buffer memory, striped as bytes 0-39, 40-79, ..., 280-319.]
9
Problems

A 320B block will contain packets for different
queues, which can't be written to, or read from,
the same location.
If instead a different address is used for each
memory, and packets in the 320B block are written
to different locations, how do we know the memory
will be available for reading when we need to
retrieve the packet?
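The first problem can be made concrete with a small sketch. It packs arriving packets back to back into 320B blocks and records which queues end up in each block. The arrival sequence and the function name are hypothetical, chosen only for illustration.

```python
BLOCK_BYTES = 320


def pack_into_blocks(packets, block_bytes=BLOCK_BYTES):
    """Pack (queue_id, length) packets back to back; return the set of queues per block."""
    blocks = []
    offset = 0
    for queue_id, length in packets:
        first = offset // block_bytes
        last = (offset + length - 1) // block_bytes
        while len(blocks) <= last:
            blocks.append(set())
        for i in range(first, last + 1):
            blocks[i].add(queue_id)   # this packet has bytes in block i
        offset += length
    return blocks


# Hypothetical arrivals: (queue id, packet length in bytes)
arrivals = [(1, 40), (2, 120), (1, 200), (3, 40)]
for i, queues in enumerate(pack_into_blocks(arrivals)):
    print(f"block {i} holds bytes for queues {sorted(queues)}")
```

Both blocks here mix two queues, so neither can be stored at a single per-queue location.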

10
Hybrid Memory Hierarchy
11
Some Thoughts

What is the minimum SRAM needed to guarantee that
a byte is always available in SRAM when
requested?
What algorithm should we use to manage the
replenishment of the SRAM cache memory?

12
An Example: Q = 5, w = 9, b = 6
13
An Example: Q = 5, w = 9, b = 6

14
The size of the SRAM cache
[Figure: the SRAM cache drawn as Q queues, each w bytes deep.]

Necessity
How large does the SRAM cache need to be under
any memory management algorithm (MMA)?
Theorem: Qw > Q(b - 1)(2 + ln Q)
Sufficiency
For a specific MMA, and for any pattern of
arrivals, what is the smallest SRAM cache needed
so that a byte is always available when
requested?
For one particular algorithm: Qw = Qb(2 + ln Q)
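To get a feel for the two bounds, the sketch below evaluates them numerically, reading them as Q(b - 1)(2 + ln Q) for necessity and Qb(2 + ln Q) for sufficiency. Q = 200 is the IP-router figure from an earlier slide; b = 320 is an assumed value chosen for illustration, not one the slides pair with it.

```python
import math


def necessary_sram_bytes(Q, b):
    """Lower bound: the SRAM cache must be larger than this under any MMA."""
    return Q * (b - 1) * (2 + math.log(Q))


def sufficient_sram_bytes(Q, b):
    """Size that is enough for the one particular algorithm on the slide."""
    return Q * b * (2 + math.log(Q))


Q, b = 200, 320   # b = 320 is an assumption for illustration
low = necessary_sram_bytes(Q, b)
high = sufficient_sram_bytes(Q, b)
print(f"necessary:  more than {low:,.0f} bytes")
print(f"sufficient: {high:,.0f} bytes")
print(f"gap: {high - low:,.0f} bytes")
```

The two expressions differ only in (b - 1) versus b, so the gap between them is Q(2 + ln Q) bytes.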

15
Some Definitions

Occupancy: X(q,t)
The number of bytes in FIFO q (in SRAM) at time
t.
Deficit: D(q,t) = w - X(q,t)

[Figure: Q queues, each w bytes deep, with each queue split into its occupancy and its deficit.]
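A small worked instance of the two definitions, using w = 9 and Q = 5 from the earlier example slide. The occupancy values are hypothetical.

```python
def deficits(occupancy, w):
    """D(q, t) = w - X(q, t) for every queue, at one instant t."""
    return [w - x for x in occupancy]


W = 9
occupancy = [9, 4, 7, 0, 6]   # hypothetical X(q, t) for Q = 5 queues
d = deficits(occupancy, W)
most_deficit_queue = max(range(len(d)), key=d.__getitem__)
print("deficits:", d)
print("queue with the largest deficit:", most_deficit_queue)
```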
16
Smallest SRAM cache: Necessity
17
Smallest SRAM cache: Necessity

In addition, each queue needs to hold (b - 1)
bytes in case it is replenished with b bytes when
only 1 byte has been removed.
Therefore, SRAM size must be at least Qw > Q(b -
1)(2 + ln Q).

18
Most Deficit Queue First MMA: Sufficiency

Algorithm: Every b timeslots, MDQF-MMA
replenishes the queue with the largest deficit.
Theorem: With MDQF-MMA, an SRAM cache of size Qw
> Qb(2 + ln Q) is sufficient.
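The algorithm can be sketched as a small simulation, reading the bound as Qw > Qb(2 + ln Q). Several modelling choices here are assumptions rather than anything the slides specify: every SRAM FIFO starts full, the arbiter reads one byte per timeslot, DRAM always has data to supply, and a replenishment is skipped when b bytes would not fit in the chosen queue.

```python
import math


def mdqf_simulate(requests, Q, b, w):
    """Run MDQF-MMA over a request sequence; return the largest deficit seen."""
    occupancy = [w] * Q              # X(q, t): assume every FIFO starts full
    max_deficit = 0
    for t, q in enumerate(requests, start=1):
        if occupancy[q] == 0:
            raise RuntimeError(f"queue {q} underflowed at timeslot {t}")
        occupancy[q] -= 1            # arbiter reads one byte from queue q
        max_deficit = max(max_deficit, w - occupancy[q])
        if t % b == 0:               # every b timeslots the MMA acts
            # smallest occupancy is the same as largest deficit
            target = min(range(Q), key=lambda i: occupancy[i])
            if w - occupancy[target] >= b:   # assumption: skip if b bytes will not fit
                occupancy[target] += b
    return max_deficit


Q, b = 4, 4
w = math.ceil(b * (2 + math.log(Q)))         # smallest integer w with Qw > Qb(2 + ln Q)
round_robin = [i % Q for i in range(64)]
print("w =", w)
print("largest deficit, round robin:", mdqf_simulate(round_robin, Q, b, w))
print("largest deficit, single queue:", mdqf_simulate([0] * 40, Q, b, w))
```

These two request patterns are benign. The theorem is about the worst case over all patterns, which this sketch does not search for.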

19
Reducing the size of the SRAM

Intuition
If we use a lookahead buffer to peek at the
requests in advance, we can replenish the SRAM
cache only when needed.
This increases the latency from when a request is
made until the byte is available.
But because it is a pipeline, the issue rate is
the same.

20
The ECQF-MMA Algorithm
Lookahead: The next Q(b - 1) + 1 arbiter requests
are known.
Replenish: Fetch b bytes for the troubled queue.
[Figure: Q queues, each b - 1 bytes deep, drawn alongside each step.]
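The selection step can be sketched as below, reading the lookahead length as Q(b - 1) + 1. The slides do not define "troubled" or "critical"; this sketch takes it to mean the first queue whose requests in the lookahead exceed the bytes it currently holds in SRAM, which is an interpretation. The request sequence is hypothetical.

```python
def earliest_critical_queue(lookahead, occupancy):
    """Replay the lookahead; return the first queue to run out of bytes, or None."""
    remaining = list(occupancy)
    for q in lookahead:
        remaining[q] -= 1
        if remaining[q] < 0:
            return q
    return None


Q, b = 4, 4
occupancy = [b - 1] * Q                                  # hypothetical: b - 1 bytes per queue
lookahead = [0, 1, 2, 3, 1, 0, 2, 1, 3, 1, 0, 2, 3]      # Q(b - 1) + 1 = 13 requests
critical = earliest_critical_queue(lookahead, occupancy)
if critical is not None:
    occupancy[critical] += b                             # fetch b bytes for that queue
print("critical queue:", critical)
print("occupancy after replenishment:", occupancy)
```

With at most Q(b - 1) bytes in SRAM and Q(b - 1) + 1 requests visible, at least one queue must be critical by a counting argument, so under this reading there is always something to replenish.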
21
Example of ECQF-MMA: Q = 4, b = 4
22
Theorem

Patient Arbiter: An SRAM cache of size Q(b - 1)
bytes is sufficient to guarantee that a requested
byte is available within Q(b - 1) + 1 request
times. The algorithm is called ECQF-MMA (Earliest
Critical Queue First).
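The trade-off between cache size and latency can be put in numbers. The sketch below reads the ECQF-MMA cache size as Q(b - 1) bytes and its latency as Q(b - 1) + 1 request times, and compares against Qb(2 + ln Q) for MDQF-MMA. Q = 4, b = 4 are the values from the example slide; the function names are illustrative.

```python
import math


def ecqf_cache_bytes(Q, b):
    """SRAM size that suffices for ECQF-MMA."""
    return Q * (b - 1)


def ecqf_latency_requests(Q, b):
    """Request times within which ECQF-MMA makes a requested byte available."""
    return Q * (b - 1) + 1


def mdqf_cache_bytes(Q, b):
    """SRAM size that suffices for MDQF-MMA, which uses no lookahead."""
    return Q * b * (2 + math.log(Q))


Q, b = 4, 4
print("ECQF-MMA cache:", ecqf_cache_bytes(Q, b), "bytes")
print("ECQF-MMA latency:", ecqf_latency_requests(Q, b), "request times")
print(f"MDQF-MMA cache: {mdqf_cache_bytes(Q, b):.1f} bytes")
```

The lookahead removes the (2 + ln Q) factor from the cache size, and the price is a fixed pipeline delay.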

23
Maximum Deficit Queue First with Latency
(MDQFL-MMA)

What if the application can only tolerate a latency
lmax < Q(b - 1) + 1 timeslots?
Algorithm: Maximum Deficit Queue First with
Latency (MDQFL-MMA) services a queue once every
b timeslots, in the following order:
If there is an earliest critical queue, replenish
it.
If not, then replenish the queue that will have
the most deficit lmax timeslots in the future.
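The two-step rule can be sketched as a selection function. It rests on assumptions the slide does not spell out: the lookahead holds the next lmax requests, a critical queue is one whose requests in that lookahead exceed its bytes in SRAM, and the deficit lmax timeslots ahead is the current deficit plus the queue's requests in the lookahead. The example values are hypothetical.

```python
def earliest_critical_queue(lookahead, occupancy):
    """Replay the lookahead; return the first queue to run out of bytes, or None."""
    remaining = list(occupancy)
    for q in lookahead:
        remaining[q] -= 1
        if remaining[q] < 0:
            return q
    return None


def mdqfl_choose(lookahead, occupancy, w):
    """Pick the queue to replenish: earliest critical queue, else most future deficit."""
    critical = earliest_critical_queue(lookahead, occupancy)
    if critical is not None:
        return critical
    # Assumed model: future deficit = current deficit + requests still in the lookahead
    future_deficit = [w - x for x in occupancy]
    for q in lookahead:
        future_deficit[q] += 1
    return max(range(len(occupancy)), key=lambda q: future_deficit[q])


W = 6
print(mdqfl_choose([0, 0, 2], [1, 5, 5], W))   # queue 0 goes critical inside the lookahead
print(mdqfl_choose([0, 0, 1], [5, 2, 4], W))   # nothing critical: fall back to future deficit
```

In the second call the future deficits are 3, 5 and 2, so queue 1 is chosen.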