Network Processor Algorithms: Design and Analysis

1
Network Processor Algorithms: Design and Analysis
Balaji Prabhakar, Stanford University
  • Stochastic Networks Conference
  • Montreal
  • July 22, 2004

2
Overview
  • Network Processors
  • What are they?
  • Why are they interesting to industry and to
    researchers?
  • SIFT: a simple algorithm for identifying large
    flows
  • The algorithm and its uses
  • Traffic statistics counters
  • The basic problem and algorithms
  • Sharing processors and buffers
  • A cost/benefit analysis

3
IP Routers
Capacity is the sum of the rates of the line cards
[Figure: two routers in 19-inch racks, the Juniper M160 and the Cisco GSR 12416. Labels: capacity 160 Gb/s, power 4.2 kW; capacity 80 Gb/s, power 2.6 kW; dimensions 6 ft, 3 ft, 2 ft, 2.5 ft, 2.5 ft]

4
A Detailed Sketch
[Figure: router block diagram. Line cards, each with a Network Processor, a Lookup Engine, and Packet Buffers; an Interconnection Fabric (Switch); an Output Scheduler; Outputs]

5
Network Processors
  • Network processors are an increasingly important
    component of IP routers
  • They perform a number of tasks
  • (essentially everything except Switching and
    Route lookup)
  • Buffer management
  • Congestion control
  • Output scheduling
  • Traffic statistics counters
  • Security
  • They are programmable, and hence add great
    flexibility to a router's functionality

6
Network Processors
  • But, because they operate under severe
    constraints
  • very high line rates
  • heat constraints
  • the algorithms that they can support should
    be lightweight
  • They have become very attractive to industry
  • They give rise to some interesting algorithmic
    and performance analytic questions

7
Rest Of The Talk
  • SIFT: a simple algorithm for identifying large
    flows
  • The algorithm and its uses
  • with Arpita Ghosh and Costas Psounis
  • Traffic statistics counters
  • The basic problem and algorithms
  • with Sundar Iyer, Nick McKeown and Devavrat Shah
  • Sharing processors and buffers
  • A cost/benefit analysis
  • with Vivek Farias and Ciamac Moallemi

8
SIFT: Motivation
  • Current egress buffers on router line cards serve
    packets in a FIFO manner
  • But, giving the packets of short flows a higher
    priority, e.g. using the SRPT (Shortest Remaining
    Processing Time) policy
  • reduces average flow delay
  • given the heavy-tailed nature of Internet flow
    size distribution, the reduction in delay can be
    huge
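
The effect of prioritizing short flows can be seen in a toy calculation. The sketch below assumes a hypothetical backlog in which all flows are already queued at time zero and are served back to back, so "shortest first" stands in for SRPT; the flow sizes are made up for illustration.

```python
from itertools import accumulate


def mean_completion_time(sizes):
    """Mean completion time when jobs are served back to back in the given order."""
    finish_times = list(accumulate(sizes))
    return sum(finish_times) / len(finish_times)


# Hypothetical backlog: one long flow queued ahead of four short ones (sizes in packets).
flows = [100, 2, 1, 3, 1]

fifo_delay = mean_completion_time(flows)                  # served in arrival order
short_first_delay = mean_completion_time(sorted(flows))   # shortest flow first

print(f"FIFO: {fifo_delay:.1f}, shortest first: {short_first_delay:.1f}")
```

The long flow finishes at the same time in both orders; only the short flows gain.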

9
But
  • SRPT is unimplementable
  • the router needs to know residual flow sizes for all
    enqueued flows, which is virtually impossible to implement
  • Other pre-emptive schemes like SFF (shortest flow
    first) or LAS (least attained service) are
    likewise too complicated to implement
  • This has led researchers to consider tagging
    flows at the edge, where the number of distinct
    flows is much smaller
  • but, this requires a different design of edge and
    core routers
  • more importantly, needs extra space on IP packet
    headers to signal flow size
  • Is something simpler possible?

10
SIFT: A Randomized Algorithm
  • Flip a coin with bias p (= 0.01, say) for heads
    on each arriving packet, independently from
    packet to packet
  • A flow is sampled if one of its packets has a head
    on it

[Figure: a row of arriving packets labeled with coin flips H T T T T T H]

11
SIFT: A Randomized Algorithm
  • A flow of size X has roughly a 0.01X chance of
    being sampled
  • flows with fewer than 15 packets are sampled with
    prob below 0.15
  • flows with more than 100 packets are sampled with
    prob near 1 by this rough estimate
  • the precise probability is $1 - (1 - 0.01)^X$
  • Most short flows will not be sampled, most long
    flows will be

12
The Accuracy of Classification
  • Ideally, we would like to sample like the blue
    curve
  • Sampling with prob p gives the red curve
  • there are false positives and false negatives
  • Can we get the green curve?

[Figure: Prob(sampled) versus flow size, with the blue, red, and green curves]

13
SIFT
  • Sample with a coin of bias q = 0.1
  • say that a flow is sampled if it gets two
    heads!
  • this reduces the chance of making errors
  • but you have to keep a count of the number of heads
  • So, how can we use SIFT at a router?
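
To see how the two-heads rule reshapes the sampling curve, the probability of collecting at least k heads in X flips can be computed from the binomial distribution. The sketch below is illustrative only; the function name and the flow sizes are assumptions, while q = 0.1 and the two-heads rule come from the slide.

```python
from math import comb


def prob_at_least_k_heads(flow_size, q, k):
    """P(at least k heads in flow_size independent flips with head probability q)."""
    fewer_than_k = sum(
        comb(flow_size, i) * q**i * (1 - q) ** (flow_size - i) for i in range(k)
    )
    return 1.0 - fewer_than_k


for size in (5, 15, 50, 100):
    one_head = prob_at_least_k_heads(size, 0.01, 1)   # single coin, p = 0.01
    two_heads = prob_at_least_k_heads(size, 0.1, 2)   # two heads, q = 0.1
    print(f"{size:4d} packets: one head {one_head:.3f}, two heads {two_heads:.3f}")
```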

14
SIFT at a router
  • Sample incoming packets
  • Place any packet with a head (or the second such
    packet) in the low priority buffer
  • Place all further packets from this flow in the
    low priority buffer (to avoid mis-sequencing)
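
The three steps above can be sketched as a small classifier. This is a minimal illustration, not the authors' implementation: the class and method names are hypothetical, flows are identified by an opaque id, and strict priority between the two queues is an assumption.

```python
import random
from collections import deque


class SiftClassifier:
    """Two logical queues plus a table of sampled flow ids (illustrative sketch)."""

    def __init__(self, p=0.01, heads_needed=1, rng=None):
        self.p = p                        # coin bias for heads
        self.heads_needed = heads_needed  # 1 for plain SIFT, 2 for the two-heads variant
        self.rng = rng or random.Random()
        self.heads = {}                   # flow id -> heads seen so far
        self.sampled = set()              # flows already moved to low priority
        self.high = deque()
        self.low = deque()

    def enqueue(self, flow_id, packet):
        # Flip a coin only for flows that have not been sampled yet.
        if flow_id not in self.sampled and self.rng.random() < self.p:
            self.heads[flow_id] = self.heads.get(flow_id, 0) + 1
            if self.heads[flow_id] >= self.heads_needed:
                self.sampled.add(flow_id)
        # The sampling packet and all later packets of the flow go to low priority.
        queue = self.low if flow_id in self.sampled else self.high
        queue.append((flow_id, packet))

    def dequeue(self):
        # Assumed strict priority: serve the high-priority queue first.
        if self.high:
            return self.high.popleft()
        if self.low:
            return self.low.popleft()
        return None
```

Passing a seeded `random.Random` as `rng` makes a run repeatable.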

15
Simulation results
  • Topology

[Figure: simulation topology with traffic sources and traffic sinks]

16
Overall Average Delays
17
Average Delay for Short Flows
18
Average Delay for Long Flows
19
Implementation Requirements
  • SIFT needs
  • two logical queues in one physical buffer
  • to sample arriving packets
  • a table for maintaining id of sampled flows
  • to check whether incoming packet belongs to
    sampled flow or not
  • all quite simple to implement

20
A Big Bonus
  • The buffer of the short flows has very low
    occupancy
  • so, can we simply reduce it drastically without
    sacrificing performance?
  • More precisely, suppose
  • we reduce the buffer size for the small flows,
    increase it for the large flows, keep the total
    the same as FIFO

21
SIFT Incurs Fewer Drops
Buffer_Size(Short flows) = 10, Buffer_Size(Long flows) = 290, Buffer_Size(Single FIFO Queue) = 300
[Plot legend: SIFT, FIFO]

22
Reducing Total Buffer Size
  • Suppose we reduce the buffer size of the long
    flows as well
  • Questions
  • will packet drops still be fewer?
  • will the delays still be as good?

23
Drops With Less Total Buffer
Buffer_Size(PRQ0) = 10, Buffer_Size(PRQ1) = 190, Buffer_Size(One Queue) = 300
[Plot legend: SIFT, FIFO]

24
Delay Histogram for Short Flows
[Plot legend: SIFT, FIFO]

25
Delay Histogram for Long Flows
[Plot legend: SIFT, FIFO]

26
Why SIFT Reduces Buffers
  • The amount of buffering needed to keep links
    fully utilized
  • old formula: 10 Gb/s x 0.25 s = 2.5 Gb
  • corrected to [formula lost in extraction] ≈ 250 Mb
  • But, this formula is for large (elephant) flows,
    not for short (mice) flows
  • elephant arrival rate is 0.65 or 0.7 of C; hence
    smaller buffers suffice for them
  • mice buffers are almost empty due to high
    priority; mice don't cause elephant packet drops
  • elephants use TCP to regulate their sending rate
    according to [formula lost in extraction]

[Figure: SIFT separating mice from elephants]

27
Conclusions for SIFT
  • A randomized scheme, preliminary results show
    that
  • it has a low implementation complexity
  • it reduces delays drastically
  • (users are happy)
  • with 30-35% smaller buffers at egress line cards
  • (router manufacturers are happy)
  • Leads to a "15 pkts or less" lane on the Internet,
    which could be useful
  • Further work needed
  • at the moment we have a good understanding of how
    to sample,
  • and extensive (and encouraging) simulation
    tests
  • need to understand the effect of reduced buffers
    on end-to-end congestion control algorithms

28
Traffic Statistics Counters: Motivation
  • Switches maintain statistics, typically using
    counters that are incremented when packets arrive
  • At high line rates, memory technology is a
    limiting factor for the implementation of
    counters; for example, in a 40 Gb/s switch, each
    packet must be processed in 8 ns
  • To maintain a counter per flow at these line
    rates, we would like an architecture with the
    speed of SRAM, and the density (size) of DRAM
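
The 8 ns figure is consistent with back-to-back minimum-size packets. The sketch below assumes 40-byte packets, which the slide does not state; the function name is illustrative.

```python
def packet_time_ns(packet_bytes, line_rate_gbps):
    """Time to receive one packet at the given line rate, in nanoseconds."""
    bits = packet_bytes * 8
    return bits / line_rate_gbps  # bits divided by Gb/s gives nanoseconds


# Assumption: 40-byte minimum-size packets arriving back to back.
print(packet_time_ns(40, 40))   # per-packet budget at 40 Gb/s
print(packet_time_ns(40, 10))   # the same packets at 10 Gb/s
```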

29
Hybrid Architecture
  • Shah, Iyer, Prabhakar, and McKeown (2001)
    proposed a hybrid SRAM/DRAM architecture

[Figure: N counters in SRAM receive arrivals (at most one per time slot). Once every b time slots, the counter management algorithm updates a counter in DRAM and empties the corresponding counter in SRAM]

30
Counter Management Algorithm
  • Shah et al. place a requirement on the counter
    management algorithm (CMA) that it must maintain
    all counter values accurately
  • That is, given N and b, what should the size of
    each SRAM counter be so that no counts are missed?

31
Some CMAs
  • Round robin
  • maximum counter value is bN
  • Largest Counter First (LCF)
  • optimal in terms of SRAM memory usage
  • no counter can have a value larger than [bound lost in extraction]
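
The hybrid architecture and these two CMAs can be exercised with a small slot-by-slot simulation. This is a sketch under assumptions: the function names are hypothetical, an arrival is processed before the service in the same slot, and the arrival pattern (every packet hits counter 0) is chosen to reproduce the round-robin worst case of bN.

```python
def simulate(arrivals, num_counters, b, choose):
    """Run a CMA over a list of arrivals (a counter index or None per time slot).

    Returns the final SRAM values, the DRAM values, and the largest SRAM value seen.
    """
    sram = [0] * num_counters
    dram = [0] * num_counters
    largest_seen = 0
    for slot, i in enumerate(arrivals, start=1):
        if i is not None:                      # at most one arrival per slot
            sram[i] += 1
            largest_seen = max(largest_seen, sram[i])
        if slot % b == 0:                      # one DRAM update every b slots
            j = choose(sram, slot // b - 1)
            dram[j] += sram[j]
            sram[j] = 0
    return sram, dram, largest_seen


def round_robin(sram, service_index):
    return service_index % len(sram)


def largest_counter_first(sram, service_index):
    return max(range(len(sram)), key=sram.__getitem__)


N, b = 4, 2
arrivals = [0] * 40                            # every packet increments counter 0
_, _, rr_max = simulate(arrivals, N, b, round_robin)
_, _, lcf_max = simulate(arrivals, N, b, largest_counter_first)
print(rr_max, lcf_max)
```

The SRAM counter width has to cover the largest value seen, which is why the choice of CMA matters.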

32
Analysis of LCF
  • This upper bound is proved by establishing a
    bound on the following potential (Lyapunov)
    function
  • let Q_i(t) be the size of counter i at time t, then [definition lost in extraction]
  • E.g. for b = 2, [expression lost in extraction]
  • Hence, the size of the largest counter is at most [bound lost in extraction]

33
An Implementable Algorithm
  • LCF is difficult to implement
  • with one counter per flow, we would like to
    support at least 1 million counters
  • maintaining a sorted list of counters to
    determine the largest counter takes too much SRAM
    memory
  • Ramabhadran and Varghese (2003) proposed a
    simpler algorithm with the same memory usage as
    LCF

34
LCF with Threshold
  • The algorithm keeps track of the counters that
    have value at least as large as b
  • At any service time, let j be the counter with
    the largest value among those incremented since
    the previous service, and let c be its value
  • if c ≥ b, serve counter j
  • if c < b, serve any counter with value at least
    b; if no such counter exists, serve counter j
  • Maintaining the counters with values at least b
    is a non-trivial problem; it is solved using a
    bitmap and an additional data structure
  • Is something even simpler possible?
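
The selection rule described above can be written as a single function. This is a sketch only: the function name is hypothetical, the comparisons are read as c ≥ b and c < b, and a linear scan stands in for the bitmap and additional data structure that a real implementation would use.

```python
def choose_counter(sram, incremented, b):
    """Pick the counter to flush to DRAM at a service time.

    sram: list of SRAM counter values.
    incremented: indices incremented since the previous service.
    Returns None only if nothing was incremented and no counter has reached b
    (an assumption; the slide does not cover that case).
    """
    j = max(incremented, key=lambda i: sram[i]) if incremented else None
    if j is not None and sram[j] >= b:
        return j                      # largest recently incremented counter
    for i, value in enumerate(sram):  # stand-in for the bitmap lookup
        if value >= b:
            return i                  # any counter at or above the threshold
    return j                          # nothing at or above b: fall back to j


print(choose_counter([1, 5, 0, 3], [0, 3], b=2))
print(choose_counter([1, 5, 0, 1], [0, 3], b=2))
```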

35
Some Simpler Algorithms
  • Possible approaches for a CMA that is simpler to
    implement
  • arrival information (serve largest counter among
    those incremented)
  • random sampling
  • round-robin pointer
  • Trade-off between simplicity and performance:
    more SRAM is needed in the worst case for these
    schemes

36
An Alternative Architecture
[Figure: alternative architecture with N counters in SRAM, a FIFO buffer, DRAM, and the counter management algorithm]

  • Decision problem: given a counter with a
    particular value and the occupancy of the buffer,
    when should the counter value be moved to the
    FIFO buffer? What size counters does this lead
    to?
  • Interesting question; with Poisson arrivals and
    exponential services, it is tractable

37
The Cost of Sharing
  • We have seen that there is a very limited amount
    of buffering and processing capability in each
    line card
  • In order to fully utilize these resources, it
    will become necessary to share them amongst the
    packets arriving at each line card
  • But, sharing imposes a cost
  • we may need to traverse the switch fabric more
    often than needed
  • each of the two processors involved in a
    migration will need to do some processing, e.g. ε
    local, 1 remote, instead of just 1
  • or, the host processor may simply be worse at the
    processing
  • e.g. 1 local versus K (> 1) remote
  • Need to understand the tradeoff between costs
    and benefits
  • will focus on a specific queueing model
  • interested in simple rules
  • benefit measured in reduction of backlogs

38
The Setup
[Figure: queueing setups with Poisson(λ) arrivals and exp(1) services; labels K, 1, 1]

  • Does sharing reduce backlogs?
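
As a baseline for that question, the no-sharing case has a closed form. The sketch below assumes each unshared queue behaves as a standard M/M/1 queue with arrival rate λ < 1 and service rate 1, whose stationary queue length is geometric; the function names are illustrative.

```python
def mm1_tail(lam, n):
    """P(queue length >= n) for an M/M/1 queue with arrival rate lam, service rate 1."""
    if not 0 <= lam < 1:
        raise ValueError("the queue is stable only for 0 <= lam < 1")
    return lam**n


def mm1_mean_backlog(lam):
    """Mean number of jobs in the same M/M/1 queue."""
    if not 0 <= lam < 1:
        raise ValueError("the queue is stable only for 0 <= lam < 1")
    return lam / (1 - lam)


for lam in (0.5, 0.9):
    print(f"lam={lam}: mean backlog {mm1_mean_backlog(lam):.2f}, "
          f"P(Q >= 10) = {mm1_tail(lam, 10):.4f}")
```

Any sharing policy has to beat these tails to count as a reduction in backlog.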

39
Additive Threshold Policy
  • Job arrives at queue 1
  • Send the job to queue 2 if [condition lost in extraction]
  • Otherwise, keep the job in queue 1
  • Analogous policy for jobs arriving at queue 2

40
Additive Thresholds - Queue Tails
[Plot: queue tails; curve label: No Sharing]

41
Additive Thresholds - Stability
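  • Theorem: Additive policy is stable if [condition lost in extraction]
  • and unstable if [condition lost in extraction]
  • For example, if [values lost in extraction]

Stable for [range lost in extraction]; unstable for [range lost in extraction]
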
42
Inference
  • The pros/cons of sharing
  • Reduction in backlogs
  • Loss of throughput

43
Multiplicative Threshold Policy
  • Job arrives at queue 1
  • Send the job to queue 2 if [condition lost in extraction]
  • Otherwise, keep the job in queue 1
  • Theorem: Multiplicative policy is stable for all
    λ < 1
  • Interestingly, this policy improves delays while
    preserving throughput!

44
Multiplicative Thresholds - Queue Tails
[Plot: queue tails; curve label: No Sharing]

45
Multiplicative Thresholds - Delay
[Plot: average delay]

46
Conclusions
  • Network processors add useful features to a
    router's function
  • There are many algorithmic questions that come up
  • simple, high performance algorithms are needed
  • For the theorist, there are many new and
    interesting questions; we have seen three
    examples briefly
  • SIFT: a sampling algorithm
  • Designing traffic statistics counters
  • Sharing: a cost-benefit analysis