Transcript and Presenter's Notes

Title: Memory for High Performance


1
Memory for High Performance Internet Routers
Micron, February 12th 2003
Nick McKeown
Professor of Electrical Engineering and Computer Science, Stanford University
nickm@stanford.edu
www.stanford.edu/nickm
2
Ways to get involved
  • Weekly group meetings, talks and papers:
    http://klamath.stanford.edu
  • Optics in Routers Project:
    http://klamath.stanford.edu/or
  • Networking classes at Stanford:
  • Introduction to Computer Networks: EE284, CS244a,
    EE384a.
  • Packet Switch Architectures: EE384x, EE384y.
  • Multimedia Networking: EE384b,c.
  • Stanford Network Seminar Series:
    http://netseminar.stanford.edu
  • Stanford Networking Research Center:
    http://snrc.stanford.edu

3
Outline
  • Context: High Performance Routers
  • Trends and Consequences
  • Fast Packet Buffers

4
What a High Performance Router Looks Like
[Figure: Cisco GSR 12416 (capacity 160Gb/s, power 4.2kW, 19in wide, 6ft tall, 2ft deep) and Juniper M160 (capacity 80Gb/s, power 2.6kW, 19in wide, 3ft tall, 2.5ft deep).]
5
Points of Presence (POPs)
6
Generic Router Architecture
[Figure: per-packet datapath: header processing (lookup IP address, update header), then queue packet.]
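The datapath labels above summarize the per-packet work. As a rough illustration (not from the talk; the forwarding-table contents, field names, and helper structure below are all assumptions), a minimal sketch in Python:

# Illustrative sketch of the generic per-packet datapath shown above:
# longest-prefix-match lookup, header update, then enqueue to the
# output queue. All names and structures here are assumptions.

import ipaddress

# Hypothetical forwarding table: prefix -> egress port
FIB = {
    ipaddress.ip_network("10.0.0.0/8"): 1,
    ipaddress.ip_network("10.1.0.0/16"): 2,
}

output_queues = {1: [], 2: []}  # per-port packet buffers

def forward(packet):
    """Process one packet: lookup, update header, queue."""
    dst = ipaddress.ip_address(packet["dst"])
    # Lookup IP address: longest prefix match over the FIB.
    matches = [n for n in FIB if dst in n]
    if not matches:
        return  # no route: drop
    port = FIB[max(matches, key=lambda n: n.prefixlen)]
    # Update header: decrement TTL (checksum update omitted).
    packet["ttl"] -= 1
    if packet["ttl"] <= 0:
        return  # TTL expired: drop
    # Queue packet for the egress line.
    output_queues[port].append(packet)

forward({"dst": "10.1.2.3", "ttl": 64})
assert len(output_queues[2]) == 1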
7
Generic Router Architecture
[Figure: N linecards, each with a buffer manager in front of its own buffer memory.]
8
Outline
  • Context: High Performance Routers
  • Trends and Consequences
  • Routing Tables
  • Network Processors
  • Circuit Switches
  • Bigger Routers
  • Multi-rack Routers
  • Packet Buffers
  • Fast Packet Buffers

9
Trends in Routing Tables
10
Trends in Technology, Routers & Traffic
  • Line Capacity: 2x / 7 months
  • User Traffic: 2x / 12 months
  • Router Capacity: 2.2x / 18 months
  • Moore's Law: 2x / 18 months
  • DRAM Random Access Time: 1.1x / 18 months
11
Trends and Consequences
  • Consequences:
  • Packet processing is getting harder, and
    eventually network processors will be used less
    for high performance routers.
  • (Much) bigger routers will be developed.

12
Trends and Consequences (2)
Trends:
  3. Power consumption will exceed POP limits.
  4. Disparity between line-rate and memory access time.
Consequences:
  3. Multi-rack routers will spread power over multiple racks.
  4. It will get harder to build packet buffers for linecards.
13
Outline
  • Context: High Performance Routers
  • Trends and Consequences
  • Fast Packet Buffers
  • Work with Sundar Iyer (PhD Student)
  • Problem of big, fast memories
  • Hybrid SRAM-DRAM
  • How big does the SRAM need to be?
  • Prototyping

14
The Problem
  • All packet switches (e.g. Internet routers, ATM
    switches) require packet buffers for periods of
    congestion.
  • Size: For TCP to work well, the buffers need to
    hold one RTT (about 0.25s) of data.
  • Speed: Clearly, the buffer needs to store
    (retrieve) packets as fast as they arrive
    (depart).

[Figure: linecards 1 through N, each with a memory written and read at linerate R.]
15
An Example: Packet buffers for a 40Gb/s router linecard
[Figure: buffer manager in front of a 10Gbit buffer memory.]
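The 10Gbit figure follows from the sizing rule on the previous slide (one round-trip time of data at line rate); as a quick check:

B = R \times RTT = 40\,\mathrm{Gb/s} \times 0.25\,\mathrm{s} = 10\,\mathrm{Gbits} \approx 1.25\,\mathrm{GBytes}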
16
Memory Technology
  • Use SRAM?
  • Fast enough random access time, but
  • Too low density to store 10Gbits of data.
  • Use DRAM?
  • High density means we can store data, but
  • Can't meet random access time.

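To make the mismatch concrete (the ~50ns DRAM figure below is a typical 2003-era random access time, an assumption not stated in the slides): at 40Gb/s a minimum-size 40B packet arrives every

t_{pkt} = \frac{40 \times 8\,\mathrm{bits}}{40\,\mathrm{Gb/s}} = 8\,\mathrm{ns}
\qquad\text{vs.}\qquad
T_{access}^{\mathrm{DRAM}} \approx 50\,\mathrm{ns}\ \text{(assumed)},

so a single DRAM cannot keep up, even before allowing for simultaneous reads.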
17
Can't we just use lots of DRAMs in parallel?
[Figure: a buffer manager in front of 8 parallel buffer memories. Each 320B block is striped across them (bytes 0-39, 40-79, ..., 280-319), so the bank reads/writes one 320B block every 32ns while packets arrive at write rate R, one 40B packet every 8ns.]
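A minimal sketch of the wide-word striping shown above, using the slide's parameters (8 memories, 40B chunks, 320B blocks); the class and method names are assumptions:

# Sketch of striping a 320B block across 8 parallel DRAMs,
# per the slide's parameters. Names and structure are assumptions.

NUM_DRAMS = 8
CHUNK = 40                  # bytes per DRAM per access
BLOCK = NUM_DRAMS * CHUNK   # 320B wide word

class StripedBuffer:
    def __init__(self):
        self.drams = [[] for _ in range(NUM_DRAMS)]  # one chunk list per DRAM
        self.staging = b""  # accumulate packets until a full 320B block

    def write(self, packet: bytes):
        """Accumulate arriving data; when 320B is staged, write one
        40B chunk to each DRAM in a single wide access."""
        self.staging += packet
        while len(self.staging) >= BLOCK:
            block, self.staging = self.staging[:BLOCK], self.staging[BLOCK:]
            for i in range(NUM_DRAMS):
                self.drams[i].append(block[i * CHUNK:(i + 1) * CHUNK])

    def read_block(self) -> bytes:
        """Read one 40B chunk from each DRAM and reassemble 320B."""
        return b"".join(dram.pop(0) for dram in self.drams)

buf = StripedBuffer()
for _ in range(8):           # eight 40B packets = one 320B block
    buf.write(b"\x00" * 40)
assert len(buf.read_block()) == 320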
18
Works fine if there is only one FIFO
[Figure: with a single FIFO, arriving 40B packets accumulate in a 320B staging buffer; full blocks are written across the parallel memories, read back as 320B blocks, and re-serialized into 40B packets. Write rate and read rate are both R: one 40B packet every 8ns in each direction.]
19
Works fine if there is only one FIFO
Variable Length Packets
[Figure: the same single-FIFO arrangement with variable-length packets: packets of arbitrary size (?B) are packed into 320B blocks in the staging buffer, so block boundaries need not align with packet boundaries. Write and read rates are both R.]
20
In practice, buffer holds many FIFOs
  • e.g.
  • In an IP Router, Q might be 200.
  • In an ATM switch, Q might be 10^6.

How can we write multiple variable-length packets
into different queues?
[Figure: Q FIFOs (1, 2, ..., Q), each a chain of 320B blocks striped across the parallel memories (bytes 0-39 through 280-319).]
21
Problems
  1. A 320B block will contain packets for different
    queues, which can't be written to, or read from,
    the same location.
  2. If instead a different address is used for each
    memory, and packets in the 320B block are written
    to different locations, how do we know the memory
    will be available for reading when we need to
    retrieve the packet?

22
Hybrid Memory Hierarchy
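The figure for this slide did not survive the transcript. Based on the outline item "Hybrid SRAM-DRAM" and the discussion on the following slides, the hierarchy pairs small SRAM caches with bulk DRAM: writes are batched into b-byte blocks for DRAM, and reads are served from an SRAM cache that must be replenished in time. A minimal sketch under those assumptions (all names are hypothetical):

# Minimal sketch of a hybrid SRAM-DRAM packet buffer, assuming a
# small SRAM tail cache that batches arriving bytes into b-byte
# blocks for DRAM, and a small SRAM head cache replenished from
# DRAM so departures never wait on a slow DRAM access. All names
# are assumptions, not the talk's implementation.

from collections import deque

B = 6  # block size in bytes transferred to/from DRAM per access

class HybridQueue:
    def __init__(self):
        self.tail_sram = bytearray()   # arriving bytes, not yet a full block
        self.dram = deque()            # bulk storage, b-byte blocks only
        self.head_sram = bytearray()   # bytes staged for departure

    def arrive(self, data: bytes):
        """Writes go to SRAM; full b-byte blocks move to DRAM."""
        self.tail_sram += data
        while len(self.tail_sram) >= B:
            self.dram.append(bytes(self.tail_sram[:B]))
            del self.tail_sram[:B]

    def replenish(self):
        """Move one block DRAM -> head SRAM (the step the cache
        management algorithm must schedule across all Q queues)."""
        if self.dram:
            self.head_sram += self.dram.popleft()

    def depart(self, n: int) -> bytes:
        """Reads are served from head SRAM only."""
        out, self.head_sram = self.head_sram[:n], self.head_sram[n:]
        return bytes(out)

q = HybridQueue()
q.arrive(b"abcdefgh")   # with B = 6: one block to DRAM, 2 bytes staged
q.replenish()
assert q.depart(6) == b"abcdef"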
23
Some Thoughts
  • The buffer architecture itself is well known.
  • Usually designed to work OK on average.
  • We would like deterministic guarantees.
  • What is the minimum SRAM needed to guarantee that
    a byte is always available in SRAM when
    requested?
  • What algorithm should we use to manage the
    replenishment of the SRAM cache memory?

24
An Example: Q = 5, w = 9, b = 6
25
An Example: Q = 5, w = 9, b = 6
26
Theorem
  • Impatient Arbiter: An SRAM cache of size Qb(2 +
    ln Q) bytes is sufficient to guarantee a byte is
    always available when requested. Algorithm is
    called MDQF (Most Deficit Queue First).

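Plugging in the parameters used in the plot on slide 30 (Q = 1000 queues, b = 10 bytes) gives a feel for the bound:

Qb\,(2 + \ln Q) = 1000 \cdot 10 \cdot (2 + \ln 1000) \approx 10^4 \cdot 8.91 \approx 89{,}000\ \mathrm{bytes}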
27
Reducing the size of the SRAM
  • Intuition:
  • If we use a lookahead buffer to peek at the
    requests in advance, we can replenish the SRAM
    cache only when needed.
  • This increases the latency from when a request is
    made until the byte is available.
  • But because it is a pipeline, the issue rate is
    the same.

28
Theorem
  • Patient Arbiter: An SRAM cache of size Q(b - 1)
    bytes is sufficient to guarantee that a requested
    byte is available within Q(b - 1) + 1 request
    times. Algorithm is called ECQF (Earliest
    Critical Queue First).

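With the same parameters (Q = 1000, b = 10), the patient arbiter needs far less SRAM, in exchange for a bounded pipeline delay:

Q(b - 1) = 1000 \cdot 9 = 9{,}000\ \mathrm{bytes},
\qquad
\text{latency} \le Q(b - 1) + 1 = 9{,}001\ \text{request times}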
29
Maximum Deficit Queue First with Latency (MDQFL)
  • What if the application can only tolerate a latency
    lmax < Q(b - 1) + 1 timeslots?
  • Algorithm: Maximum Deficit Queue First with
    latency (MDQFL) services a queue once every b
    timeslots, in the following order (see the sketch
    after this list):
  • If there is an earliest critical queue, replenish
    it.
  • If not, then replenish the queue that will have
    the most deficit lmax timeslots in the future.

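A compact sketch of that selection rule, under simplifying assumptions (one byte requested per timeslot, a per-queue SRAM byte count, and a lookahead window of upcoming requests; all names are hypothetical):

# Illustrative sketch of the MDQFL selection rule described above.
# Every b timeslots the arbiter picks one queue to replenish from
# DRAM. Deficits, the lookahead window, and all names here are
# simplifying assumptions, not the talk's exact definitions.

def first_shortfall(q, lookahead):
    """Timeslot at which queue q's SRAM cache would run dry, given
    the lookahead window (queue id requested per timeslot); None if
    it survives the whole window."""
    remaining = q["sram_bytes"]
    for t, qid in enumerate(lookahead):
        if qid == q["id"]:
            remaining -= 1          # one byte requested from q at slot t
            if remaining < 0:
                return t
    return None

def projected_deficit(q, lookahead, l_max):
    """Bytes queue q will owe l_max timeslots from now."""
    demand = sum(1 for qid in lookahead[:l_max] if qid == q["id"])
    return max(0, demand - q["sram_bytes"])

def choose_queue(queues, lookahead, l_max):
    # 1. If any queue would run dry within the latency budget,
    #    replenish the earliest critical one.
    critical = [(first_shortfall(q, lookahead), q) for q in queues]
    critical = [(t, q) for t, q in critical if t is not None and t <= l_max]
    if critical:
        return min(critical, key=lambda tq: tq[0])[1]
    # 2. Otherwise replenish the queue that will have the most
    #    deficit l_max timeslots in the future.
    return max(queues, key=lambda q: projected_deficit(q, lookahead, l_max))

queues = [{"id": 0, "sram_bytes": 2}, {"id": 1, "sram_bytes": 2}]
assert choose_queue(queues, lookahead=[0, 0, 0, 1], l_max=4)["id"] == 0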
30
Queue Length vs. Latency: Q = 1000, b = 10
[Plot: SRAM queue length versus tolerated latency for Q = 1000 queues, b = 10.]
31
What's Next
  • We plan to prototype a 160Gb/s linecard buffer.
  • Part of the Optics in Routers Project at Stanford:
    http://klamath.stanford.edu/or
  • Funding: Cisco, MARCO (US Government-Industry
    consortium), TI.
  • Would Micron like to work with us?