Title: Network Processor Algorithms: Design and Analysis
1. Network Processor Algorithms: Design and Analysis
Balaji Prabhakar, Stanford University
- Stochastic Networks Conference
- Montreal
- July 22, 2004
2. Overview

- Network Processors
  - What are they?
  - Why are they interesting to industry and to researchers?
- SIFT: a simple algorithm for identifying large flows
  - The algorithm and its uses
- Traffic statistics counters
  - The basic problem and algorithms
- Sharing processors and buffers
  - A cost/benefit analysis
3. IP Routers

[Figure: photos of two carrier-class routers, each 19 in. wide]
- Cisco GSR 12416: capacity 160 Gb/s, power 4.2 kW, about 6 ft tall and 2 ft deep
- Juniper M160: capacity 80 Gb/s, power 2.6 kW, about 3 ft tall and 2.5 ft deep
- Capacity is the sum of the rates of the line-cards
4. A Detailed Sketch

[Figure: router architecture. Each line card contains a lookup engine, packet buffers, and a network processor; the line cards connect through an interconnection fabric (switch) and an output scheduler to the outputs.]
5. Network Processors

- Network processors are an increasingly important component of IP routers
- They perform a number of tasks (essentially everything except switching and route lookup):
  - Buffer management
  - Congestion control
  - Output scheduling
  - Traffic statistics counters
  - Security
- They are programmable, and hence add great flexibility to a router's functionality
6. Network Processors

- But, because they operate under severe constraints (very high line rates, heat constraints), the algorithms that they can support must be lightweight
- They have become very attractive to industry
- They give rise to some interesting algorithmic and performance-analytic questions
7. Rest of the Talk

- SIFT: a simple algorithm for identifying large flows
  - The algorithm and its uses
  - (with Arpita Ghosh and Costas Psounis)
- Traffic statistics counters
  - The basic problem and algorithms
  - (with Sundar Iyer, Nick McKeown and Devavrat Shah)
- Sharing processors and buffers
  - A cost/benefit analysis
  - (with Vivek Farias and Ciamac Moallemi)
8. SIFT: Motivation

- Current egress buffers on router line cards serve packets in FIFO order
- But giving the packets of short flows higher priority, e.g. using the SRPT (Shortest Remaining Processing Time) policy,
  - reduces average flow delay
  - and, given the heavy-tailed nature of the Internet flow-size distribution, the reduction in delay can be huge
9. But...

- SRPT is unimplementable
  - the router would need to know the residual flow sizes of all enqueued flows, which is virtually impossible to implement
- Other pre-emptive schemes like SFF (shortest flow first) or LAS (least attained service) are likewise too complicated to implement
- This has led researchers to consider tagging flows at the edge, where the number of distinct flows is much smaller
  - but this requires a different design of edge and core routers
  - more importantly, it needs extra space in IP packet headers to signal flow size
- Is something simpler possible?
10. SIFT: A Randomized Algorithm

- Flip a coin with bias p (= 0.01, say) for heads on each arriving packet, independently from packet to packet
- A flow is sampled if one of its packets has a head on it

[Figure: a stream of packets marked H T T T T T H; a flow is sampled when one of its packets draws a head]
11. SIFT: A Randomized Algorithm

- A flow of size X has roughly a 0.01X chance of being sampled
  - flows with fewer than 15 packets are sampled with prob < 0.15
  - flows with more than 100 packets are sampled with prob ≈ 1
  - the precise probability is 1 − (1 − 0.01)^X
- Most short flows will not be sampled, and most long flows will be, as the sketch below illustrates
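As a quick numerical illustration of this formula (a minimal sketch; p = 0.01 is from the slide, the flow sizes are illustrative choices of mine):

```python
# P(flow sampled) = 1 - (1 - p)^X for a flow of X packets, each
# drawing heads independently with bias p.
p = 0.01
for X in [5, 15, 50, 100, 500]:
    print(f"flow size {X:4d}: P(sampled) = {1 - (1 - p) ** X:.3f}")
```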
12. The Accuracy of Classification

- Ideally, we would like to sample like the blue curve
- Sampling with prob p gives the red curve
  - there are false positives and false negatives
- Can we get the green curve?

[Figure: P(sampled) vs. flow size for the ideal (blue), single-coin (red), and target (green) curves]
13. SIFT

- Sample with a coin of bias q = 0.1
- Say that a flow is sampled if it gets two heads!
  - this reduces the chance of making errors (see the sketch below)
  - but you have to count the number of heads
- So, how can we use SIFT at a router?
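To see why demanding two heads sharpens the classification curve, here is a hedged sketch of the two-heads sampling probability (q = 0.1 is from the slide; the closed form and the flow sizes are my own):

```python
# P(at least two heads among X flips with bias q)
#   = 1 - (1 - q)^X - X*q*(1 - q)^(X - 1)
# This rises much more steeply with flow size X than the single-head
# curve does, approximating the desired green curve.
q = 0.1
for X in [5, 15, 50, 100]:
    p2 = 1 - (1 - q) ** X - X * q * (1 - q) ** (X - 1)
    print(f"flow size {X:4d}: P(sampled) = {p2:.3f}")
```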
14. SIFT at a Router

- Sample incoming packets
- Place any packet with a head (or the second such packet) in the low-priority buffer
- Place all further packets from this flow in the low-priority buffer (to avoid mis-sequencing); a sketch of this rule follows
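A minimal sketch of this enqueue rule (names and data structures are mine, not from the talk; the single-head variant is shown):

```python
import random

SAMPLING_BIAS = 0.01        # per-packet heads probability
sampled_flows = set()       # table of ids of sampled flows
high_priority = []          # logical queue for unsampled (short) flows
low_priority = []           # logical queue for sampled (long) flows

def enqueue(packet, flow_id):
    # Packets of an already-sampled flow stay in the low-priority
    # queue to avoid mis-sequencing within the flow.
    if flow_id in sampled_flows:
        low_priority.append(packet)
    elif random.random() < SAMPLING_BIAS:
        sampled_flows.add(flow_id)      # this packet drew a head
        low_priority.append(packet)
    else:
        high_priority.append(packet)
```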
15. Simulation Results

[Figure: simulation topology with traffic sources on one side and traffic sinks on the other]
16. Overall Average Delays

17. Average Delay for Short Flows

18. Average Delay for Long Flows
19. Implementation Requirements

- SIFT needs
  - two logical queues in one physical buffer
  - to sample arriving packets
  - a table for maintaining the ids of sampled flows
  - to check whether an incoming packet belongs to a sampled flow or not
- all quite simple to implement
20. A Big Bonus

- The buffer for the short flows has very low occupancy
  - so, can we simply reduce it drastically without sacrificing performance?
- More precisely, suppose we reduce the buffer size for the small flows, increase it for the large flows, and keep the total the same as for the single FIFO
21. SIFT Incurs Fewer Drops

Buffer_Size(Short flows) = 10, Buffer_Size(Long flows) = 290, Buffer_Size(Single FIFO Queue) = 300

[Figure: packet drops over time, SIFT vs. FIFO]
22. Reducing Total Buffer Size

- Suppose we reduce the buffer size of the long flows as well
- Questions
  - will packet drops still be fewer?
  - will the delays still be as good?
23. Drops with Less Total Buffer

Buffer_Size(PRQ0) = 10, Buffer_Size(PRQ1) = 190, Buffer_Size(One Queue) = 300

[Figure: packet drops, SIFT (two priority queues) vs. FIFO (one queue)]
24. Delay Histogram for Short Flows

[Figure: delay histogram, SIFT vs. FIFO]
25. Delay Histogram for Long Flows

[Figure: delay histogram, SIFT vs. FIFO]
26. Why SIFT Reduces Buffers

- The amount of buffering needed to keep links fully utilized:
  - old rule of thumb: B = RTT × C, e.g. 0.25 s × 10 Gb/s = 2.5 Gb
  - corrected to B = RTT × C / √N, e.g. about 250 Mb here
- But this formula is for large (elephant) flows, not for short (mice) flows
  - the elephant arrival rate is 0.65 or 0.7 of C, hence smaller buffers suffice for them
  - mice buffers are almost empty due to their high priority, and mice don't cause elephant packet drops
  - elephants use TCP to regulate their sending rate

[Figure: SIFT splits arriving traffic into mice (high priority) and elephants (low priority)]
27. Conclusions for SIFT

- A randomized scheme; preliminary results show that
  - it has a low implementation complexity
  - it reduces delays drastically (users are happy)
  - with 30-35% smaller buffers at egress line cards (router manufacturers are happy)
- Leads to a "15 packets or fewer" lane on the Internet, which could be useful
- Further work needed
  - at the moment we have a good understanding of how to sample, and extensive (and encouraging) simulation tests
  - we need to understand the effect of reduced buffers on end-to-end congestion control algorithms
28. Traffic Statistics Counters: Motivation

- Switches maintain statistics, typically using counters that are incremented when packets arrive
- At high line rates, memory technology is a limiting factor in the implementation of counters: for example, in a 40 Gb/s switch, each packet must be processed in 8 ns
- To maintain a counter per flow at these line rates, we would like an architecture with the speed of SRAM and the density (size) of DRAM
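The 8 ns figure is consistent with minimum-size packets; as a quick check (the 40-byte packet size is my assumption, not stated on the slide):

```latex
\frac{40\,\text{bytes} \times 8\,\text{bits/byte}}{40\,\text{Gb/s}} = 8\,\text{ns per packet}
```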
29. Hybrid Architecture

- Shah, Iyer, Prabhakar, and McKeown (2001) proposed a hybrid SRAM/DRAM architecture

[Figure: N counters in SRAM receive arrivals (at most one per time slot); a counter management algorithm updates the corresponding counter in DRAM and empties the SRAM counter, once every b time slots]
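A minimal sketch of the hybrid update loop (structure, names, and parameter values are mine; LCF, discussed below, is used as the counter management algorithm for concreteness):

```python
import random

N, b = 8, 4          # illustrative: N counters, one DRAM write every b slots
sram = [0] * N       # small, fast counters (absorb every increment)
dram = [0] * N       # large, slow counters (updated once every b slots)

def arrival(i):
    sram[i] += 1     # at most one increment per time slot

def service():
    # Largest Counter First: flush the fullest SRAM counter to DRAM
    j = max(range(N), key=lambda i: sram[i])
    dram[j] += sram[j]
    sram[j] = 0

for t in range(1000):
    arrival(random.randrange(N))
    if t % b == b - 1:
        service()

assert sum(sram) + sum(dram) == 1000   # no counts are ever lost
```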
30. Counter Management Algorithm

- Shah et al. require that the counter management algorithm (CMA) maintain all counter values accurately
- That is, given N and b, what should the size of each SRAM counter be so that no counts are missed?
31. Some CMAs

- Round robin
  - the maximum counter value is bN
- Largest Counter First (LCF)
  - optimal in terms of SRAM memory usage
  - no counter can have a value larger than [equation]
32. Analysis of LCF

- This upper bound is proved by establishing a bound on the following potential (Lyapunov) function
  - let Qi(t) be the size of counter i at time t; then [equation]
- Hence, the size of the largest counter is at most [equation]
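The slide's formulas did not survive extraction. As a hedged placeholder only — my recollection of the bound quoted in this line of work, not recovered from the slide itself — the LCF counter bound has the form:

```latex
% Hedged reconstruction (assumption, not from the slide):
\max_i Q_i(t) \;\le\; \frac{\ln(bN)}{\ln\!\left(\frac{b}{b-1}\right)}
```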
33. An Implementable Algorithm

- LCF is difficult to implement
  - with one counter per flow, we would like to support at least 1 million counters
  - maintaining a sorted list of counters to determine the largest counter takes too much SRAM memory
- Ramabhadran and Varghese (2003) proposed a simpler algorithm with the same memory usage as LCF
34. LCF with Threshold

- The algorithm keeps track of the counters whose values are at least as large as b
- At any service time, let j be the counter with the largest value among those incremented since the previous service, and let c be its value
  - if c ≥ b, serve counter j
  - if c < b, serve any counter with value at least b; if no such counter exists, serve counter j
- Maintaining the set of counters with values at least b is a non-trivial problem; it is solved using a bitmap and an additional data structure (a sketch of the serving rule, without these structures, appears below)
- Is something even simpler possible?
35. Some Simpler Algorithms

- Possible approaches for a CMA that is simpler to implement
  - arrival information (serve the largest counter among those incremented)
  - random sampling
  - a round-robin pointer
- Trade-off between simplicity and performance: these schemes need more SRAM in the worst case
36. An Alternative Architecture

[Figure: N counters in SRAM, a counter management algorithm, and a FIFO buffer in front of the DRAM]

- Decision problem: given a counter with a particular value and the occupancy of the buffer, when should the counter value be moved to the FIFO buffer? What size counters does this lead to?
- An interesting question; with Poisson arrivals and exponential services, it is tractable
37. The Cost of Sharing

- We have seen that there is a very limited amount of buffering and processing capability on each line card
- In order to fully utilize these resources, it will become necessary to share them amongst the packets arriving at the line cards
- But sharing imposes a cost
  - we may need to traverse the switch fabric more often than necessary
  - each of the two processors involved in a migration will need to do some processing, e.g. ε local plus 1 remote, instead of just 1
  - or the host processor may simply be worse at the processing, e.g. 1 local versus K (> 1) remote
- We need to understand the trade-off between costs and benefits
  - we will focus on a specific queueing model
  - we are interested in simple rules
  - benefit is measured as a reduction in backlogs
38. The Setup

[Figure: parallel queues, each fed by independent Poisson(λ) arrivals with exp(1) service times; a job costs 1 to process locally and K to process remotely]

- Does sharing reduce backlogs?
39. Additive Threshold Policy

- A job arrives at queue 1
- Send the job to queue 2 if Q1 > Q2 + T, where T is the additive threshold (sketch below)
- Otherwise, keep the job in queue 1
- An analogous policy applies to jobs arriving at queue 2
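A sketch of the rule (the slide's inequality was an image that did not survive extraction; Q1 > Q2 + T is my reconstruction of the natural additive form, and T = 5 is an illustrative value):

```python
T = 5   # additive threshold (illustrative)

def route_from_queue_1(q1_len, q2_len):
    """Return 2 to migrate the new job, 1 to keep it local."""
    # Migrate only when queue 1 leads queue 2 by more than T jobs.
    return 2 if q1_len > q2_len + T else 1
```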
40. Additive Thresholds: Queue Tails

[Figure: queue-tail probabilities under the additive policy vs. no sharing]
41. Additive Thresholds: Stability

- Theorem: The additive policy is stable if [equation] and unstable if [equation]
- For example, if [equation]: stable for [range of λ], unstable for [range of λ]
42. Inference

- The pros and cons of sharing
  - reduction in backlogs
  - loss of throughput
43. Multiplicative Threshold Policy

- A job arrives at queue 1
- Send the job to queue 2 if Q1 > c·Q2, where c > 1 is the multiplicative threshold
- Otherwise, keep the job in queue 1
- Theorem: The multiplicative policy is stable for all λ < 1
- Interestingly, this policy improves delays while preserving throughput! (a simulation sketch of both policies follows)
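A rough simulation sketch contrasting the two policies under the workload model of slide 38 (all parameter values, and the cost model in which migrated work is inflated by the factor K, are my assumptions, not the talk's model):

```python
import random

lam, K, T, c = 0.9, 1.5, 5, 2.0   # illustrative parameters

def simulate(policy, horizon=100_000):
    q = [0.0, 0.0]                        # residual work in each queue
    t = 0.0
    while t < horizon:
        dt = random.expovariate(2 * lam)  # next arrival, at either queue
        q = [max(0.0, w - dt) for w in q] # both servers drain at rate 1
        t += dt
        i = random.randrange(2)           # queue receiving the arrival
        job = random.expovariate(1.0)     # exp(1) service requirement
        if policy == "additive":
            migrate = q[i] > q[1 - i] + T
        else:                             # multiplicative
            migrate = q[i] > c * q[1 - i]
        q[(1 - i) if migrate else i] += (K * job) if migrate else job
    return (q[0] + q[1]) / 2              # crude final-backlog snapshot

for policy in ("additive", "multiplicative"):
    print(policy, round(simulate(policy), 2))
```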
44. Multiplicative Thresholds: Queue Tails

[Figure: queue-tail probabilities under the multiplicative policy vs. no sharing]
45. Multiplicative Thresholds: Delay

[Figure: average delay comparison]
46. Conclusions

- Network processors add useful features to a router's function
- There are many algorithmic questions that come up
  - simple, high-performance algorithms are needed
- For the theorist, there are many new and interesting questions; we have seen three examples briefly
  - SIFT: a sampling algorithm
  - designing traffic statistics counters
  - sharing: a cost-benefit analysis