SIP Server Overload Control: Design and Evaluation - PowerPoint PPT Presentation

Slides: 25
Provided by: charle235

Transcript and Presenter's Notes


1
SIP Server Overload Control: Design and
Evaluation
  • Charles Shen and Henning Schulzrinne
  • Columbia University
  • Erich Nahum
  • IBM T.J. Watson Research Center

2
Session Initiation Protocol (SIP)
  • Application layer signaling protocol for managing
    sessions in the Internet
  • Runs on top of transport-layer protocols, e.g.,
    UDP, TCP and SCTP
  • Typical usage: voice over IP (VoIP) call setup,
    instant messaging, presence, conferencing

3
SIP Server Overload Problem
  • Many causes of an excessive number of messages
    overwhelming the server:
  • Natural disasters and emergency-induced call
    volume: e.g., earthquakes
  • Predictable special events: Mother's Day
  • Flash crowds: American Idol, "free tickets to the
    third caller"
  • Denial of service attacks
  • Simply dropping requests on overload?
  • SIP has retransmission timers for message loss,
    especially over UDP
  • E.g., Timer A for INVITE retransmission
  • Starts at T1 = 500 ms and doubles after each
    retransmission until the total timeout period
    exceeds 32 s
  • Simple message dropping induces more messages due
    to retransmission!
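To see why simple dropping backfires, the sketch below computes when an unanswered INVITE is retransmitted over UDP under the RFC 3261 defaults. Timer A starts at T1 = 500 ms and doubles on each firing, and Timer B (64·T1 = 32 s) ends the transaction. The function name is hypothetical. The sketch models only the client transaction timers, not any particular SIP stack.

```python
T1 = 0.5            # RFC 3261 default round-trip estimate, in seconds
TIMER_B = 64 * T1   # INVITE client transaction timeout: 32 s


def invite_retransmit_times(t1: float = T1, timeout: float = TIMER_B) -> list[float]:
    """Return the instants (seconds after the first send) at which an unanswered
    INVITE is retransmitted over UDP: Timer A starts at T1 and doubles each time."""
    times = []
    interval = t1
    t = interval
    while t < timeout:
        times.append(t)
        interval *= 2      # exponential back-off of Timer A
        t += interval
    return times


if __name__ == "__main__":
    schedule = invite_retransmit_times()
    print(schedule)  # [0.5, 1.5, 3.5, 7.5, 15.5, 31.5]
    print(f"one dropped INVITE can come back {len(schedule)} more times")
```

A single silently dropped INVITE can therefore return up to six more times within 32 s. Under sustained overload this multiplies the offered message rate rather than relieving it.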

4
SIP Server Overload Problem (Cont.)
  • Rejecting excessive requests upon overload?
  • SIP 503 (Service Unavailable) response code used
    to reject individual requests
  • Individual sessions are rejected, but the overall
    sending rate is not reduced
  • Even worse: rejecting a request takes CPU cycles
    comparable to accepting it!
  • 503 (Service Unavailable) with Retry-After?
  • The client is completely shut off for the
    specified period
  • Reducing the rate with an on/off pattern may
    cause oscillation
  • Trying an alternative server?
  • The alternative server may soon be overloaded
    too → cascading failure!
  • Feedback-based SIP overload control
  • Sender is instructed by the receiver not to send
    more requests than the receiver can accept in the
    first place!

5
Feedback-based SIP Overload Control
  • Absolute rate feedback
  • The receiving entity (RE) estimates its target
    controlled load $\lambda$ and feeds it back to the
    sending entities (SEs)
  • The SE throttles its offered load by
    $P_b = 1 - \lambda / (\text{offered load})$, so that the
    actual load to the RE conforms to the target load
  • Key is accurate controlled load estimation
  • Relative rate feedback (loss-based feedback)
  • The RE estimates a load throttle percentage Pb
    based on a target metric (e.g., CPU utilization,
    queue length) and feeds it back to the SEs
  • The SE throttles its offered load by Pb to
    conform to the target controlled load
  • Key is the target metric and the throttle
    percentage adjustment algorithm
  • Window feedback
  • The RE estimates a window size indicating the
    number of new calls it can currently accept and
    feeds it back to the SEs
  • The SE throttles new call arrivals whenever no
    window slot is available, thus limiting its
    offered load to the target controlled load
  • Key is the maximum window setting and the dynamic
    window adjustment algorithm
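The SE-side enforcement of the three feedback types can be sketched as follows. Absolute rate feedback is converted into a throttle percentage Pb = 1 − target/offered. Rate feedback is then enforced by rejecting each new call with probability Pb, and window feedback by spending one slot per new call. The class and function names are hypothetical and are not taken from the authors' simulator.

```python
import random
from typing import Optional


def throttle_from_target(target_load: float, offered_load: float) -> float:
    """Absolute rate feedback: convert the RE's target load (calls/s) into the
    fraction Pb of new calls the SE must throttle, Pb = 1 - target / offered."""
    if offered_load <= target_load:
        return 0.0                       # already within the target
    return 1.0 - target_load / offered_load


class PercentageThrottle:
    """Rate feedback enforcement: reject each new call with probability pb."""

    def __init__(self, pb: float = 0.0, seed: Optional[int] = None):
        self.pb = pb
        self.rng = random.Random(seed)

    def admit(self) -> bool:
        return self.rng.random() >= self.pb


class WindowThrottle:
    """Window feedback enforcement: admit a new call only while a slot is left;
    the RE refreshes the window through feedback."""

    def __init__(self, window: int = 0):
        self.window = window

    def on_feedback(self, window: int) -> None:
        self.window = window

    def admit(self) -> bool:
        if self.window > 0:
            self.window -= 1
            return True
        return False
```

For example, if the RE can accept 72 calls/s while an SE offers 96 calls/s, `throttle_from_target(72, 96)` yields Pb = 0.25, so one in four new calls is throttled at the SE instead of reaching the RE.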

6
SIP Overload Feedback Control Design
Considerations: Control Unit
  • What is a control unit: a SIP message or a SIP
    session?
  • Although the signaling is message-based, not all
    messages carry equal weight
  • A typical SIP call contains one INVITE followed by
    six additional messages
  • A new INVITE is much more expensive than other
    messages
  • A job, or control unit, is therefore defined as a
    whole SIP session (e.g., a SIP call)
  • How to characterize the end of a SIP session?
  • Can we always expect a BYE to mark the end of a
    session?
  • Easier if we can: full session check approach
  • Otherwise, use a dynamic start session check
    approach
  • Under normal working conditions, the actual
    session acceptance rate is roughly equal to the
    session service rate
  • The session service rate is estimated as the
    number of INVITEs accepted per measurement
    interval
  • Standard smoothing functions can be applied
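A minimal sketch of the service-rate estimate follows. It counts INVITEs accepted during each measurement interval Tm and smooths the per-interval samples with an exponentially weighted moving average (EWMA). The EWMA is one possible "standard smoothing function", and its weight `alpha` is an assumption, not a value from the slides. The class name is hypothetical.

```python
class ServiceRateEstimator:
    """Estimate the session service rate mu (sessions/s) as INVITEs accepted per
    measurement interval Tm, smoothed with an EWMA."""

    def __init__(self, tm: float, alpha: float = 0.3, initial_rate: float = 0.0):
        self.tm = tm              # measurement interval Tm, in seconds
        self.alpha = alpha        # EWMA weight (assumed, not from the slides)
        self.rate = initial_rate  # smoothed estimate of mu
        self._accepted = 0        # INVITEs accepted in the current interval

    def on_invite_accepted(self) -> None:
        self._accepted += 1

    def end_interval(self) -> float:
        """Close the current Tm interval and return the updated smoothed rate."""
        sample = self._accepted / self.tm          # raw rate for this interval
        self.rate = self.alpha * sample + (1 - self.alpha) * self.rate
        self._accepted = 0
        return self.rate
```

With Tm = 100 ms and 7 INVITEs accepted per interval, the raw sample is 70 sessions/s. A larger `alpha` tracks changes faster but passes more measurement noise into the feedback.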

7
SIP Overload Feedback Control Design
Considerations: Dynamic Session Estimation
  • We often need to know the current number of
    sessions in the server system
  • This is NOT equal to the number of INVITE
    messages in the system
  • Non-INVITE messages must also be accounted for!
  • Proposed Dynamic Session Estimation Algorithm
    (DSEA)
  • $N_{sess} = N_{inv} + N_{noninv} / (L_{sess} - 1)$
  • where $L_{sess}$ is the estimated session size
    (number of messages per session)
  • $N_{inv}$ is the number of INVITE messages in the
    system
  • $N_{noninv}$ is the number of non-INVITE messages
    in the system
  • DSEA holds for both the full session check and
    start session check approaches
  • They differ in how the $L_{sess}$ parameter is
    obtained:
  • Full session check: checking the start and end of
    each individual SIP session
  • Start session check: number of messages processed
    divided by number of sessions accepted per unit
    time
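DSEA and the start-session-check estimate of the session size can be sketched in a few lines. Each session contributes one INVITE plus (Lsess − 1) non-INVITE messages, so the non-INVITE backlog counts as Nnoninv / (Lsess − 1) sessions. The function names are hypothetical.

```python
def estimate_session_size(messages_processed: int, sessions_accepted: int) -> float:
    """Start session check: Lsess = messages processed / sessions accepted,
    both counted over the same measurement interval."""
    if sessions_accepted == 0:
        raise ValueError("no sessions accepted in this interval")
    return messages_processed / sessions_accepted


def estimate_sessions(n_inv: int, n_noninv: int, l_sess: float) -> float:
    """DSEA: Nsess = Ninv + Nnoninv / (Lsess - 1)."""
    if l_sess <= 1:
        raise ValueError("a session must contain more than one message")
    return n_inv + n_noninv / (l_sess - 1)


if __name__ == "__main__":
    l_sess = estimate_session_size(700, 100)   # seven-message call flow
    print(estimate_sessions(10, 30, l_sess))   # 15.0
```

With a seven-message call flow, 10 queued INVITEs plus 30 queued non-INVITE messages amount to about 15 sessions, not 10.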

8
SIP Overload Feedback Control Design
Considerations: Active Source Estimation and
Feedback Communication
  • The RE may wish to know the number of active
    sources, e.g., to explicitly allocate its total
    capacity among multiple SEs
  • Approach: directly track each currently active SE
    with a table entry
  • Each entry has an expiration timer set to one
    second
  • Feedback Communication
  • For SIP overload between servers, in-band
    feedback is appropriate
  • Any feedback information is piggybacked on the
    next SIP message sent to the corresponding next
    hop
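Below is a sketch of the active-source table. Every message from an SE re-arms that SE's one-second expiration timer, and expired entries are purged before the active set is read. Time is passed in explicitly so the behavior is deterministic. The equal split in `equal_shares` is only one illustrative use of the count. The class and method names are hypothetical.

```python
class ActiveSourceTable:
    """Track active SEs: an entry expires `ttl` seconds after the last message
    seen from that SE (one second in the slides)."""

    def __init__(self, ttl: float = 1.0):
        self.ttl = ttl
        self._expires: dict[str, float] = {}   # SE identifier -> expiration time

    def on_message(self, se_id: str, now: float) -> None:
        self._expires[se_id] = now + self.ttl  # (re)arm the expiration timer

    def active_sources(self, now: float) -> list[str]:
        # Purge expired entries, then report the SEs that are still active.
        self._expires = {se: t for se, t in self._expires.items() if t > now}
        return sorted(self._expires)

    def equal_shares(self, capacity: float, now: float) -> dict[str, float]:
        """Illustrative use: split RE capacity equally among active SEs."""
        active = self.active_sources(now)
        return {se: capacity / len(active) for se in active}
```

An SE that stops sending drops out of the table within one second, so its share is automatically redistributed on the next allocation.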

9
Win-disc Window Control Algorithm
  • Principle: estimate and adjust the number of
    acceptable sessions every control interval
  • Decrease the window upon each new session arrival
  • Adjust the window every control interval Tc
  • The new available window (W) is the total allowed
    number of sessions in the next interval minus the
    existing backlog:
    $W = \mu T_c + \mu D_B - N_{sess}$
  • $\mu$: current session service rate
  • $D_B$: budget queuing delay (should be smaller than
    the INVITE retransmission timer)
  • $N_{sess} = N_{inv} + N_{noninv} / (L_{sess} - 1)$ is
    the current number of sessions in the system
  • Suggested initial window: $W_0 = \mu_{eng} T_c$, where
    $\mu_{eng}$ is the engineered server capacity
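Win-disc can be sketched as a counter that is recomputed once per control interval and decremented for each new session. Clamping W at zero and rounding it down to an integer are assumptions of this sketch, not details given on the slide. The class name is hypothetical.

```python
class WinDisc:
    """win-disc sketch: every control interval Tc the window is recomputed as
    W = mu*Tc + mu*DB - Nsess; each new session consumes one slot."""

    def __init__(self, mu_eng: float, tc: float, db: float):
        self.tc = tc                        # control interval Tc (s)
        self.db = db                        # budget queuing delay DB (s)
        self.window = int(mu_eng * tc)      # suggested W0 = mu_eng * Tc

    def on_new_session(self) -> bool:
        """Consume a slot for a new session (INVITE); False means throttle it."""
        if self.window > 0:
            self.window -= 1
            return True
        return False

    def on_control_interval(self, mu: float, n_sess: float) -> int:
        """Sessions allowed in the next interval minus the current backlog
        (clamped at zero and rounded down: assumptions of this sketch)."""
        self.window = max(0, int(mu * self.tc + mu * self.db - n_sess))
        return self.window
```

With the simulated RE (72 cps) and Tc = DB = 200 ms, W0 = 14. A backlog of 10 sessions then yields a new window of ⌊14.4 + 14.4 − 10⌋ = 18.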

10
Win-cont Window Control Algorithm
  • Principle: continuously keep the estimated number
    of existing sessions in the system below a target
    number
  • Decrease the window size upon each new session
    arrival (enqueueing an INVITE)
  • Increase the available window size (W) whenever
    the current estimated number of sessions is
    smaller than the maximum allowed number of
    sessions:
    $W = \mu D_B - N_{sess}$
  • $\mu D_B$ equals the maximum allowed number of
    sessions in the system (maximum window size)
  • $N_{sess} = N_{inv} + N_{noninv} / (L_{sess} - 1)$ is
    the current number of sessions in the system
  • Suggested initial window: $W_0 = \mu_{eng} T_c$, where
    $\mu_{eng}$ is the engineered server capacity
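Win-cont has no control interval. One reading of the slide is sketched below: each enqueued INVITE consumes a slot, and whenever the session estimate changes, the window is raised to μDB − Nsess if that is larger. The "raise only" rule, the rounding down and all names are assumptions of this sketch.

```python
class WinCont:
    """win-cont sketch: no control interval; whenever the session estimate
    changes, the window is raised to mu*DB - Nsess if that is larger."""

    def __init__(self, mu: float, db: float, w0: int):
        self.mu = mu          # estimated session service rate (sessions/s)
        self.db = db          # budget queuing delay DB (s); mu*DB = max window
        self.window = w0

    def on_new_session(self) -> bool:
        """Enqueueing a new INVITE consumes one slot; False means throttle it."""
        if self.window > 0:
            self.window -= 1
            return True
        return False

    def on_estimate(self, n_sess: float) -> int:
        """Increase the window while fewer than mu*DB sessions are in the system."""
        headroom = int(self.mu * self.db - n_sess)
        if headroom > self.window:
            self.window = headroom
        return self.window
```

Because the window is topped up as soon as sessions leave the system, win-cont reacts continuously instead of once per Tc.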

11
Win-auto Window Control Algorithm
  • Principle: a simple window adaptation that
    automatically slows down when the system is
    congested
  • Decrease the window size by one upon each new
    session arrival (receiving an INVITE)
  • Increase the window by one upon dequeueing a NEW
    INVITE (not a retransmission)
  • Therefore, the window increases more slowly than
    it decreases
  • The system adapts itself to a steady state with a
    fairly low dynamic available window
  • Suggested initial window W0: a reasonably large
    positive value; the exact value is not important
  • Biggest advantage: simplicity
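Win-auto needs no rate or session estimation at all, as the sketch below shows. A new session arrival takes a slot, and a slot is returned only when a new (non-retransmitted) INVITE is dequeued for processing. The default W0 = 100 and the names are illustrative assumptions.

```python
class WinAuto:
    """win-auto sketch: -1 per new session arrival, +1 per new (non-retransmitted)
    INVITE dequeued for processing; no rate or session estimation."""

    def __init__(self, w0: int = 100):
        self.window = w0      # W0: any reasonably large positive value

    def on_session_arrival(self) -> bool:
        """A new INVITE consumes one slot; False means throttle it."""
        if self.window > 0:
            self.window -= 1
            return True
        return False

    def on_invite_dequeued(self, is_retransmission: bool) -> None:
        if not is_retransmission:
            self.window += 1  # only new INVITEs give a slot back
```

Slots come back only as fast as the server actually dequeues new INVITEs. The window therefore settles at the server's own processing pace without any explicit measurement.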

12
rate-abs: Absolute Rate-Based Control
  • During every control interval Tc, the RE notifies
    the SE of the new target load $\lambda$:
    $\lambda = \mu \left(1 - \frac{d_q - D_B}{T_c}\right)$
  • $\mu$: the current estimated service rate
  • $d_q = N_{sess} / \mu$: queuing delay at the last
    measurement interval, where
  • $N_{sess}$ is the current number of sessions in the
    server, obtained using our Dynamic Session
    Estimation Algorithm
  • The SE applies a percentage throttle to keep its
    offered load to the RE within the assigned target
    for each control interval

Algorithm proposed by Hosein et al.
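The rate-abs target computation is a direct transcription of the formula above into code. The clamp at zero, which prevents a negative rate when the backlog is very large, is an assumption of this sketch. The function name is hypothetical.

```python
def rate_abs_target(mu: float, n_sess: float, db: float, tc: float) -> float:
    """rate-abs: lambda = mu * (1 - (dq - DB) / Tc), with dq = Nsess / mu.
    If the measured queuing delay dq exceeds the budget DB, the target drops
    below mu so that the excess backlog drains within one control interval."""
    dq = n_sess / mu                       # queuing delay at the last measurement
    target = mu * (1 - (dq - db) / tc)
    return max(0.0, target)                # clamping at zero is an assumption
```

Multiplying the formula by Tc gives λTc = μTc + μDB − Nsess. This is exactly the window that win-disc computes, so the two algorithms share a binding and differ only in how the SE enforces it (percentage vs. window throttle).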
13
rate-occ: Relative Rate-Based Control
  • During every control interval Tc, the RE notifies
    the SE of an acceptance ratio f
  • f is adjusted by comparing the measured processor
    occupancy with a budget processor occupancy
    $\rho_B$
  • $f_k$ and $f_{k+1}$ are the acceptance ratios of the
    current and next control intervals
  • $\phi_k = \min(\rho_B / \rho_k, \phi_{max})$, where
    $\rho_k$ is the current processor occupancy
  • $f_{min}$: a non-zero minimal acceptance ratio
  • $\phi_{max}$: maximum multiplicative increase
    factor between two consecutive control intervals
  • In this paper, $\phi_{max} = 5$ and $f_{min} = 0.02$

Algorithm proposed by Cyr et al.
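The update equation itself did not survive in this transcript. The sketch below assumes the common occupancy-based form f_{k+1} = min(1, max(fmin, φk·fk)). That form is consistent with the definitions above but not confirmed by the slide. The defaults follow the settings reported in the deck (ρB = 0.85, φmax = 5, fmin = 0.02). The function name is hypothetical.

```python
def rate_occ_update(f_k: float, rho_k: float, rho_b: float = 0.85,
                    phi_max: float = 5.0, f_min: float = 0.02) -> float:
    """Return the next acceptance ratio f_{k+1} from f_k and the measured
    processor occupancy rho_k.
    ASSUMED update form: f_{k+1} = min(1, max(f_min, phi_k * f_k)),
    with phi_k = min(rho_B / rho_k, phi_max)."""
    phi_k = phi_max if rho_k <= 0 else min(rho_b / rho_k, phi_max)
    return min(1.0, max(f_min, phi_k * f_k))
```

When occupancy sits exactly at the 85% budget, f stays unchanged. When the CPU is nearly idle, f can grow by at most a factor of 5 per interval. fmin keeps some traffic flowing so that occupancy can still be measured.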
14
Simulation Assumptions and Metrics
  • Simulator: RFC 3261-compatible simulator built on
    OPNET
  • Node model
  • Each UA represents an infinite number of
    callers/callees
  • UAs and SEs have infinite capacity
  • RE server configuration: service capacity
    72 cps (calls per second), rejection rate 3000 cps
  • Traffic model
  • Calls from callers on the left to callees on the
    right
  • Exponentially distributed interarrival times and
    call holding times
  • Standard seven-message call flow
  • Transport and network model
  • UDP transport → all SIP timers active
  • No link delay or loss is assumed
  • Feedback method: piggybacked on the next
    available message to the particular next hop
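The traffic model can be reproduced with Python's standard `random.expovariate`, which generates Poisson call arrivals with exponentially distributed holding times. The message list assumes the usual basic call flow (INVITE, 100 Trying, 180 Ringing, 200 OK, ACK, BYE, 200 OK). The mean holding time is not given in the slides and is left as a caller-supplied parameter. The function name is hypothetical.

```python
import random

# Assumed seven-message SIP call flow between caller and callee
CALL_FLOW = ["INVITE", "100 Trying", "180 Ringing", "200 OK", "ACK", "BYE", "200 OK"]


def generate_calls(rate_cps: float, mean_hold_s: float, duration_s: float,
                   seed: int = 1) -> list[tuple[float, float]]:
    """Return (arrival_time, holding_time) pairs: exponentially distributed
    interarrival times (Poisson arrivals at rate_cps) and holding times."""
    rng = random.Random(seed)
    calls = []
    t = rng.expovariate(rate_cps)          # first arrival
    while t < duration_s:
        calls.append((t, rng.expovariate(1.0 / mean_hold_s)))
        t += rng.expovariate(rate_cps)     # next interarrival gap
    return calls
```

Offering load 1 to the simulated RE means `rate_cps=72`, which gives about 720 calls in 10 s. Load 8.4 would correspond to roughly 605 cps.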

15
SIP Overload Performance without Any Feedback
Control
  • Simple Drop scenario
  • Messages are dropped when the queue is full
  • Threshold Rejection scenario
  • The queue is configured with a high and a low
    threshold value
  • When the queue length exceeds the high threshold:
  • new INVITE requests are rejected, but other
    messages are still processed
  • When the queue length falls below the low
    threshold:
  • INVITE processing is restored
  • Similar congestion collapse, but for DIFFERENT
    reasons
  • Simple Drop
  • Only one third of INVITEs arrive at the callee
  • All 180 Ringing and most 200 OK responses are
    also dropped due to queue overflow
  • Threshold Rejection
  • No INVITE reaches the callee
  • The RE is only sending rejection messages
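The Threshold Rejection baseline is a simple hysteresis on queue length, sketched below. Between the two thresholds the previous state persists. Non-INVITE messages are never subject to this check. The class name is hypothetical.

```python
class ThresholdRejection:
    """Queue-length hysteresis: start rejecting new INVITEs when the queue
    exceeds `high`, resume accepting them once it falls below `low`."""

    def __init__(self, high: int, low: int):
        if not low < high:
            raise ValueError("low threshold must be below high threshold")
        self.high, self.low = high, low
        self.rejecting = False

    def reject_invite(self, queue_len: int) -> bool:
        if queue_len > self.high:
            self.rejecting = True
        elif queue_len < self.low:
            self.rejecting = False
        return self.rejecting          # unchanged between the thresholds
```

The queue is protected, but each rejected INVITE still costs a 503 response. Under heavy overload the RE's cycles therefore go almost entirely to rejections, matching the collapse described above.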

16
Summary and Comparison of Feedback Algorithm
Parameters
Algorithm   Binding   Control Interval   Measurement Interval   Additional Parameters
Rate-abs    DB        Tc                 Tm                     none
Rate-occ    ρB        Tc                 Tm                     fmin and φmax
Win-disc    DB        Tc                 Tm                     none
Win-cont    DB        N/A                Tm                     none
Win-auto    N/A       N/A                N/A                    none
  • Most algorithms have a binding parameter
  • Three use the budget queuing delay DB
  • One uses the budget CPU occupancy ρB
  • All three discrete-time control algorithms need
    Tc
  • Tm is used by four of the five algorithms to
    measure the service rate or CPU occupancy, where
    applicable
  • Tm = min(100 ms, Tc) was found to be a reasonable
    choice
  • Queue length is measured instantaneously
  • DB: budget queuing delay
  • ρB: budget CPU occupancy
  • Tc: discrete-time feedback control interval
  • Tm: discrete-time measurement interval for the
    selected server metric, Tm ≤ Tc
  • fmin: minimal acceptance fraction
  • φmax: multiplicative factor
  • DB is recommended for robustness, although a
    fixed binding window size can also be used;
    optionally, DB may be applied for corner cases
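The parameter summary can be encoded as a small configuration structure that derives Tm = min(100 ms, Tc) instead of setting it by hand. The Tc values below are the 200 ms settings used later in the evaluation. Giving win-cont Tm = 100 ms follows the best-performance settings. Class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

TM_CAP = 0.100  # 100 ms cap on the measurement interval, in seconds


@dataclass(frozen=True)
class AlgoParams:
    name: str
    binding: Optional[str]   # "DB", "rhoB", or None (win-auto has no binding)
    tc: Optional[float]      # control interval Tc in seconds; None if not discrete-time
    uses_tm: bool = True     # False if nothing is measured (win-auto)

    def tm(self) -> Optional[float]:
        """Tm = min(100 ms, Tc); win-cont has no Tc but still measures every 100 ms."""
        if not self.uses_tm:
            return None
        return TM_CAP if self.tc is None else min(TM_CAP, self.tc)


ALGORITHMS = [
    AlgoParams("rate-abs", "DB", 0.2),
    AlgoParams("rate-occ", "rhoB", 0.2),
    AlgoParams("win-disc", "DB", 0.2),
    AlgoParams("win-cont", "DB", None),
    AlgoParams("win-auto", None, None, uses_tm=False),
]
```

Deriving Tm from Tc keeps the constraint Tm ≤ Tc automatically satisfied. For example, a 14 ms control interval yields a 14 ms measurement interval.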

17
Sensitivity of Budget Queuing Delay and Control
Interval
  • Sensitivity of budget queuing delay
  • A small budget queuing delay (< ½ T1) avoids
    timeouts and gives the best results
  • Example results for win-disc:
  • Unit goodput when DB < 200 ms and Tc = 200 ms
  • Goodput degraded by 25% at DB = 500 ms
  • Results for win-cont and rate-abs show a similar
    shape, with slightly different sensitivity
  • In general, a positive DB value of around 200 ms
    is sufficient for all three algorithms
  • Sensitivity of control interval
  • The smaller Tc, the better
  • Example results for win-disc:
  • At DB = 200 ms, Tc < 200 ms is sufficient to
    achieve unit goodput in our scenario

All load and goodput values are normalized to
server capacity.
18
Impact of Control Interval across Algorithms
  • Comparing Tc for win-disc, rate-abs and rate-occ
    at DB = 200 ms
  • For both win-disc and rate-abs:
  • close to unit goodput, except at Tc = 1 s under
    heavy load
  • win-disc is more sensitive to Tc than rate-abs →
    window throttling produces burstier traffic
  • a shorter Tc gives better results (< 200 ms is
    sufficient)
  • rate-occ does not perform as well as the other two
  • Interesting point: as Tc grows from 14 ms to
    100 ms, goodput increases under light overload
    and decreases under heavy overload
  • Possibly because the rate adjustment parameters
    cut the rate too much under light overload
Figures: Goodput vs. Tc at load 1 and at load 8.4
rate-occ has ρB set to 85%, which was seen to
give the highest and most stable performance across
different load conditions in the given scenario
19
Best Performance Comparison across Algorithms
  • All algorithms except rate-occ reach unit goodput:
  • no retransmissions at all
  • the server is always busy processing messages
  • every single message is part of a successful
    session
  • rate-occ does not operate at unit goodput
  • not simply due to the artificial 85% CPU limit
  • occupancy is inherently not as direct a metric as
    needed
  • an extremely small Tc improves performance at
    heavy load, but with many problems:
  • difficulty in implementation
  • the actual server occupancy departs greatly from
    the originally intended setting
  • poor performance under light overload → may be
    linked to the OCC increase and decrease heuristic
    parameters

Algorithm    DB (s)   Tc (s)   Tm (s)
Rate-abs     0.2      0.2      0.1
Rate-occ1    N/A      0.2      0.1
Rate-occ2    N/A      0.014    0.014
Win-disc     0.2      0.2      0.1
Win-cont     0.2      N/A      0.1
Win-auto     N/A      N/A      N/A
Rate-occ settings: ρB = 0.85, φmax = 5, fmin = 0.02
20
Fairness for SIP Overload Control
  • User-centric fairness
  • In its basic form, it ensures an equal success
    rate for each individual user
  • Implementation: assign the capacity of the
    overloaded server to the upstream servers in
    proportion to their original offered load
  • Applicability example: the third caller receives
    a free gift
  • Provider-centric fairness
  • Assuming each upstream server represents a
    provider, in its basic form it ensures that each
    provider gets the same aggregate share of the
    total capacity
  • Implementation: divide the capacity equally among
    upstream servers
  • Applicability example: equal-share SLA
  • Customized fairness
  • Any allocation pre-specified by an SLA, etc.
  • Denial of service attacks: penalizing the
    specific sources
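The two basic fairness policies can be sketched as capacity-allocation functions over the SEs' offered loads. Plain equal division would waste capacity when an SE offers less than its share. The provider-centric version here therefore adds a max-min refinement: unused share is re-split among the remaining SEs. That refinement is an assumption beyond the slide's basic form. Function names are hypothetical.

```python
def user_centric(capacity: float, offered: dict[str, float]) -> dict[str, float]:
    """Split RE capacity among upstream SEs in proportion to their offered load,
    so that every user sees roughly the same success rate."""
    total = sum(offered.values())
    if total <= capacity:
        return dict(offered)                  # no overload: admit everything
    return {se: capacity * load / total for se, load in offered.items()}


def provider_centric(capacity: float, offered: dict[str, float]) -> dict[str, float]:
    """Equal split among upstream SEs; an SE offering less than its equal share
    keeps only its load and the remainder is re-split (assumed max-min refinement)."""
    alloc: dict[str, float] = {}
    remaining = dict(offered)
    left = capacity
    while remaining:
        share = left / len(remaining)
        small = {se: load for se, load in remaining.items() if load <= share}
        if not small:                         # every SE wants at least a full share
            alloc.update({se: share for se in remaining})
            break
        for se, load in small.items():        # satisfy light SEs, return the rest
            alloc[se] = load
            left -= load
            del remaining[se]
    return alloc
```

Take the 72 cps RE with SEs offering 30 and 90 cps. User-centric fairness gives them 18 and 54 cps, so both see the same 60% success rate. Provider-centric fairness would give each 36 cps.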

21
Dynamic Load Performance w/ Provider Centric
Fairness
  • Realistic server-to-server overload situations:
  • more likely short periods of bulk load
  • possibly accompanied by new source arrivals or
    departures
  • Example result using the rate-abs algorithm:
  • each upstream SE gets a close-to-equal share of
    the RE capacity
  • fast dynamic transitions

22
Dynamic Load Performance w/ User Centric Fairness
  • Double-feed architecture
  • With load feedforward from the SEs to assist the
    RE's capacity allocation
  • Example using the win-cont algorithm:
  • upstream SEs share the RE capacity in proportion
    to their offered load
  • fast dynamic transitions

23
Dynamic Load Performance of win-auto Algorithm
  • Source arrival transition time could be
    noticeably longer
  • Capacity split not easy to predict
  • hard to enforce explicit fairness
  • basically no processing intervention
  • Still achieves aggregate unit goodput

24
Conclusions and Future Work
  • The SIP overload problem is special because of the
    high rejection cost and the retransmission of
    dropped messages
  • The goal of SIP overload control is to maximize
    the number of calls completed in a timely manner
  • Approach: have the SE send only as many calls as
    the RE can handle in a timely manner
  • Presented and compared five algorithms under both
    steady and dynamic load
  • Win-disc/win-cont/win-auto/rate-abs/rate-occ
  • All but rate-occ are able to achieve unit goodput
  • Algorithms bound to queue metrics are preferred
    over the occupancy-based heuristic
  • All but win-auto adapt well to dynamic load and
    source departures/arrivals
  • All but win-auto can achieve both user-centric
    and provider-centric fairness
  • Win-disc/win-cont/rate-abs require a double
    feedback architecture for user-centric fairness
  • win-auto is still extremely simple, with close to
    unit steady-state aggregate goodput
  • Future work
  • More realistic network configurations, including
    link delay and loss and a node failure model
  • Feedback enforcement algorithms other than
    percentage throttle and window throttle