Transcript and Presenter's Notes

Title: Memory for High Performance


1
Memory for High Performance Internet Routers
Micron, February 12th 2003
Nick McKeown
Professor of Electrical Engineering and Computer Science, Stanford University
nickm@stanford.edu
www.stanford.edu/nickm
2
Ways to get involved
  • Weekly group meetings, talks and papers:
    http://klamath.stanford.edu
  • Optics in Routers Project:
    http://klamath.stanford.edu/or
  • Networking classes at Stanford:
  • Introduction to Computer Networks: EE284, CS244a,
    EE384a.
  • Packet Switch Architectures: EE384x, EE384y.
  • Multimedia Networking: EE384b,c.
  • Stanford Network Seminar Series:
    http://netseminar.stanford.edu
  • Stanford Networking Research Center:
    http://snrc.stanford.edu

3
Outline
  • Context: High Performance Routers
  • Trends and Consequences
  • Fast Packet Buffers

4
What a High Performance Router Looks Like
[Figure: Cisco GSR 12416 (capacity 160Gb/s, power 4.2kW, 19in wide, 6ft tall, 2ft deep) and Juniper M160 (capacity 80Gb/s, power 2.6kW, 19in wide, 3ft tall, 2.5ft deep).]
5
Points of Presence (POPs)
6
Generic Router Architecture
[Figure: per-packet datapath: header processing (lookup IP address, update header), then queue packet.]
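The datapath labels above summarize the per-packet work. As a rough illustration (not from the talk; the forwarding-table contents, field names, and helper structure below are all assumptions), a minimal sketch in Python:

# Illustrative sketch of the generic per-packet datapath shown above:
# longest-prefix-match lookup, header update, then enqueue to the
# output queue. All names and structures here are assumptions.

import ipaddress

# Hypothetical forwarding table: prefix -> egress port
FIB = {
    ipaddress.ip_network("10.0.0.0/8"): 1,
    ipaddress.ip_network("10.1.0.0/16"): 2,
}

output_queues = {1: [], 2: []}  # per-port packet buffers

def forward(packet):
    """Process one packet: lookup, update header, queue."""
    dst = ipaddress.ip_address(packet["dst"])
    # Lookup IP address: longest prefix match over the FIB.
    matches = [n for n in FIB if dst in n]
    if not matches:
        return  # no route: drop
    port = FIB[max(matches, key=lambda n: n.prefixlen)]
    # Update header: decrement TTL (checksum update omitted).
    packet["ttl"] -= 1
    if packet["ttl"] <= 0:
        return  # TTL expired: drop
    # Queue packet for the egress line.
    output_queues[port].append(packet)

forward({"dst": "10.1.2.3", "ttl": 64})
assert len(output_queues[2]) == 1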
7
Generic Router Architecture
[Figure: N linecards, each with a buffer manager in front of its own buffer memory.]
8
Outline
  • Context: High Performance Routers
  • Trends and Consequences
  • Routing Tables
  • Network Processors
  • Circuit Switches
  • Bigger Routers
  • Multi-rack Routers
  • Packet Buffers
  • Fast Packet Buffers

9
Trends in Routing Tables
10
Trends in Technology, Routers & Traffic
  • Line Capacity: 2x / 7 months
  • User Traffic: 2x / 12 months
  • Router Capacity: 2.2x / 18 months
  • Moore's Law: 2x / 18 months
  • DRAM Random Access Time: 1.1x / 18 months
11
Trends and Consequences
  • Consequences:
  • Packet processing is getting harder, and
    eventually network processors will be used less
    for high performance routers.
  • (Much) bigger routers will be developed.

12
Trends and Consequences (2)
Trends:
  3. Power consumption will exceed POP limits.
  4. Disparity between line-rate and memory access time.
Consequences:
  3. Multi-rack routers will spread power over multiple racks.
  4. It will get harder to build packet buffers for linecards.
13
Outline
  • Context: High Performance Routers
  • Trends and Consequences
  • Fast Packet Buffers
  • Work with Sundar Iyer (PhD Student)
  • Problem of big, fast memories
  • Hybrid SRAM-DRAM
  • How big does the SRAM need to be?
  • Prototyping

14
The Problem
  • All packet switches (e.g. Internet routers, ATM
    switches) require packet buffers for periods of
    congestion.
  • Size: For TCP to work well, the buffers need to
    hold one RTT (about 0.25s) of data.
  • Speed: Clearly, the buffer needs to store
    (retrieve) packets as fast as they arrive
    (depart).

[Figure: linecards 1 through N, each with a memory written and read at linerate R.]
15
An Example: Packet buffers for a 40Gb/s router linecard
[Figure: buffer manager in front of a 10Gbit buffer memory.]
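The 10Gbit figure follows from the sizing rule on the previous slide (one round-trip time of data at line rate); as a quick check:

B = R \times RTT = 40\,\mathrm{Gb/s} \times 0.25\,\mathrm{s} = 10\,\mathrm{Gbits} \approx 1.25\,\mathrm{GBytes}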
16
Memory Technology
  • Use SRAM?
  • Fast enough random access time, but
  • Too low density to store 10Gbits of data.
  • Use DRAM?
  • High density means we can store data, but
  • Can't meet random access time.

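To make the mismatch concrete (the ~50ns DRAM figure below is a typical 2003-era random access time, an assumption not stated in the slides): at 40Gb/s a minimum-size 40B packet arrives every

t_{pkt} = \frac{40 \times 8\,\mathrm{bits}}{40\,\mathrm{Gb/s}} = 8\,\mathrm{ns}
\qquad\text{vs.}\qquad
T_{access}^{\mathrm{DRAM}} \approx 50\,\mathrm{ns}\ \text{(assumed)},

so a single DRAM cannot keep up, even before allowing for simultaneous reads.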
17
Can't we just use lots of DRAMs in parallel?
[Figure: a buffer manager in front of 8 parallel buffer memories. Each 320B block is striped across them (bytes 0-39, 40-79, ..., 280-319), so the bank reads/writes one 320B block every 32ns while packets arrive at write rate R, one 40B packet every 8ns.]
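A minimal sketch of the wide-word striping shown above, using the slide's parameters (8 memories, 40B chunks, 320B blocks); the class and method names are assumptions:

# Sketch of striping a 320B block across 8 parallel DRAMs,
# per the slide's parameters. Names and structure are assumptions.

NUM_DRAMS = 8
CHUNK = 40                  # bytes per DRAM per access
BLOCK = NUM_DRAMS * CHUNK   # 320B wide word

class StripedBuffer:
    def __init__(self):
        self.drams = [[] for _ in range(NUM_DRAMS)]  # one chunk list per DRAM
        self.staging = b""  # accumulate packets until a full 320B block

    def write(self, packet: bytes):
        """Accumulate arriving data; when 320B is staged, write one
        40B chunk to each DRAM in a single wide access."""
        self.staging += packet
        while len(self.staging) >= BLOCK:
            block, self.staging = self.staging[:BLOCK], self.staging[BLOCK:]
            for i in range(NUM_DRAMS):
                self.drams[i].append(block[i * CHUNK:(i + 1) * CHUNK])

    def read_block(self) -> bytes:
        """Read one 40B chunk from each DRAM and reassemble 320B."""
        return b"".join(dram.pop(0) for dram in self.drams)

buf = StripedBuffer()
for _ in range(8):           # eight 40B packets = one 320B block
    buf.write(b"\x00" * 40)
assert len(buf.read_block()) == 320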
18
Works fine if there is only one FIFO
[Figure: with a single FIFO, arriving 40B packets accumulate in a 320B staging buffer; full blocks are written across the parallel memories, read back as 320B blocks, and re-serialized into 40B packets. Write rate and read rate are both R: one 40B packet every 8ns in each direction.]
19
Works fine if there is only one FIFO
Variable Length Packets
[Figure: the same single-FIFO arrangement with variable-length packets: packets of arbitrary size (?B) are packed into 320B blocks in the staging buffer, so block boundaries need not align with packet boundaries. Write and read rates are both R.]
20
In practice, buffer holds many FIFOs
  • e.g.
  • In an IP Router, Q might be 200.
  • In an ATM switch, Q might be 10^6.

How can we write multiple variable-length packets
into different queues?
[Figure: Q FIFOs (1, 2, ..., Q), each a chain of 320B blocks striped across the parallel memories (bytes 0-39 through 280-319).]
21
Problems
  1. A 320B block will contain packets for different
    queues, which can't be written to, or read from,
    the same location.
  2. If instead a different address is used for each
    memory, and packets in the 320B block are written
    to different locations, how do we know the memory
    will be available for reading when we need to
    retrieve the packet?

22
Hybrid Memory Hierarchy
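The figure for this slide did not survive the transcript. Based on the outline item "Hybrid SRAM-DRAM" and the discussion on the following slides, the hierarchy pairs small SRAM caches with bulk DRAM: writes are batched into b-byte blocks for DRAM, and reads are served from an SRAM cache that must be replenished in time. A minimal sketch under those assumptions (all names are hypothetical):

# Minimal sketch of a hybrid SRAM-DRAM packet buffer, assuming a
# small SRAM tail cache that batches arriving bytes into b-byte
# blocks for DRAM, and a small SRAM head cache replenished from
# DRAM so departures never wait on a slow DRAM access. All names
# are assumptions, not the talk's implementation.

from collections import deque

B = 6  # block size in bytes transferred to/from DRAM per access

class HybridQueue:
    def __init__(self):
        self.tail_sram = bytearray()   # arriving bytes, not yet a full block
        self.dram = deque()            # bulk storage, b-byte blocks only
        self.head_sram = bytearray()   # bytes staged for departure

    def arrive(self, data: bytes):
        """Writes go to SRAM; full b-byte blocks move to DRAM."""
        self.tail_sram += data
        while len(self.tail_sram) >= B:
            self.dram.append(bytes(self.tail_sram[:B]))
            del self.tail_sram[:B]

    def replenish(self):
        """Move one block DRAM -> head SRAM (the step the cache
        management algorithm must schedule across all Q queues)."""
        if self.dram:
            self.head_sram += self.dram.popleft()

    def depart(self, n: int) -> bytes:
        """Reads are served from head SRAM only."""
        out, self.head_sram = self.head_sram[:n], self.head_sram[n:]
        return bytes(out)

q = HybridQueue()
q.arrive(b"abcdefgh")   # with B = 6: one block to DRAM, 2 bytes staged
q.replenish()
assert q.depart(6) == b"abcdef"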
23
Some Thoughts
  • The buffer architecture itself is well known.
  • Usually designed to work OK on average.
  • We would like deterministic guarantees.
  • What is the minimum SRAM needed to guarantee that
    a byte is always available in SRAM when
    requested?
  • What algorithm should we use to manage the
    replenishment of the SRAM cache memory?

24
An Example: Q = 5, w = 9, b = 6
25
An Example: Q = 5, w = 9, b = 6
26
Theorem
  • Impatient Arbiter: An SRAM cache of size Qb(2 +
    ln Q) bytes is sufficient to guarantee a byte is
    always available when requested. Algorithm is
    called MDQF (Most Deficit Queue First).

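Plugging in the parameters used in the plot on slide 30 (Q = 1000 queues, b = 10 bytes) gives a feel for the bound:

Qb\,(2 + \ln Q) = 1000 \cdot 10 \cdot (2 + \ln 1000) \approx 10^4 \cdot 8.91 \approx 89{,}000\ \mathrm{bytes}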
27
Reducing the size of the SRAM
  • Intuition:
  • If we use a lookahead buffer to peek at the
    requests in advance, we can replenish the SRAM
    cache only when needed.
  • This increases the latency from when a request is
    made until the byte is available.
  • But because it is a pipeline, the issue rate is
    the same.

28
Theorem
  • Patient Arbiter: An SRAM cache of size Q(b - 1)
    bytes is sufficient to guarantee that a requested
    byte is available within Q(b - 1) + 1 request
    times. Algorithm is called ECQF (Earliest
    Critical Queue First).

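With the same parameters (Q = 1000, b = 10), the patient arbiter needs far less SRAM, in exchange for a bounded pipeline delay:

Q(b - 1) = 1000 \cdot 9 = 9{,}000\ \mathrm{bytes},
\qquad
\text{latency} \le Q(b - 1) + 1 = 9{,}001\ \text{request times}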
29
Maximum Deficit Queue First with Latency (MDQFL)
  • What if the application can only tolerate a latency
    lmax < Q(b - 1) + 1 timeslots?
  • Algorithm: Maximum Deficit Queue First with
    latency (MDQFL) services a queue once every b
    timeslots, in the following order (see the sketch
    after this list):
  • If there is an earliest critical queue, replenish
    it.
  • If not, then replenish the queue that will have
    the most deficit lmax timeslots in the future.

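A compact sketch of that selection rule, under simplifying assumptions (one byte requested per timeslot, a per-queue SRAM byte count, and a lookahead window of upcoming requests; all names are hypothetical):

# Illustrative sketch of the MDQFL selection rule described above.
# Every b timeslots the arbiter picks one queue to replenish from
# DRAM. Deficits, the lookahead window, and all names here are
# simplifying assumptions, not the talk's exact definitions.

def first_shortfall(q, lookahead):
    """Timeslot at which queue q's SRAM cache would run dry, given
    the lookahead window (queue id requested per timeslot); None if
    it survives the whole window."""
    remaining = q["sram_bytes"]
    for t, qid in enumerate(lookahead):
        if qid == q["id"]:
            remaining -= 1          # one byte requested from q at slot t
            if remaining < 0:
                return t
    return None

def projected_deficit(q, lookahead, l_max):
    """Bytes queue q will owe l_max timeslots from now."""
    demand = sum(1 for qid in lookahead[:l_max] if qid == q["id"])
    return max(0, demand - q["sram_bytes"])

def choose_queue(queues, lookahead, l_max):
    # 1. If any queue would run dry within the latency budget,
    #    replenish the earliest critical one.
    critical = [(first_shortfall(q, lookahead), q) for q in queues]
    critical = [(t, q) for t, q in critical if t is not None and t <= l_max]
    if critical:
        return min(critical, key=lambda tq: tq[0])[1]
    # 2. Otherwise replenish the queue that will have the most
    #    deficit l_max timeslots in the future.
    return max(queues, key=lambda q: projected_deficit(q, lookahead, l_max))

queues = [{"id": 0, "sram_bytes": 2}, {"id": 1, "sram_bytes": 2}]
assert choose_queue(queues, lookahead=[0, 0, 0, 1], l_max=4)["id"] == 0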
30
Queue Length vs. Latency: Q = 1000, b = 10
[Plot: SRAM queue length versus tolerated latency for Q = 1000 queues, b = 10.]
31
What's Next
  • We plan to prototype a 160Gb/s linecard buffer.
  • Part of the Optics in Routers Project at Stanford:
    http://klamath.stanford.edu/or
  • Funding: Cisco, MARCO (US Government-Industry
    consortium), TI.
  • Would Micron like to work with us?