Title: Advanced Topics in Congestion Control
1Advanced Topics in Congestion Control
- Slides originally developed by S. Kalyanaraman
(RPI) based in part upon slides of Prof. Raj Jain
(OSU), Srini Seshan (CMU), J. Kurose (U Mass),
I. Stoica (UCB)
2Overview
- Queue Management Schemes
- RED, ARED, FRED, BLUE, REM
- TCP Congestion Control (CC) Modeling
- TCP Friendly CC
- Accumulation-based Schemes: TCP Vegas, Monaco
- Static Optimization Framework Model for Congestion Control
- Explicit Rate Feedback Schemes (ATM ABR: ERICA)
3Readings
- Refs: Chap. 13.21, 13.22 in Comer textbook
- Floyd and Jacobson, "Random Early Detection Gateways for Congestion Avoidance"
- Ramakrishnan and Jain, "A Binary Feedback Scheme for Congestion Avoidance in Computer Networks with a Connectionless Network Layer"
- Padhye et al., "Modeling TCP Throughput: A Simple Model and its Empirical Validation"
- Low and Lapsley, "Optimization Flow Control, I: Basic Algorithm and Convergence"
- Kalyanaraman et al., "The ERICA Switch Algorithm for ABR Traffic Management in ATM Networks"
- Harrison et al., "An Edge-based Framework for Flow Control"
4Queuing Disciplines
- Each router must implement some queuing discipline
- Queuing allocates bandwidth and buffer space
- Bandwidth: which packet to serve next (scheduling)
- Buffer: which packet to drop next (buffer management)
- Queuing also affects latency
[Figure: traffic sources feed traffic classes A, B, and C; buffer management decides which packets to drop and scheduling decides which class to serve]
5Typical Internet Queuing
- FIFO and Drop-tail
- Simplest choice
- Used widely in the Internet
- FIFO (first-in-first-out)
- Implies single class of traffic
- Drop-tail
- Arriving packets get dropped when the queue is full, regardless of flow or importance
- Important distinction
- FIFO: scheduling discipline
- Drop-tail: buffer management policy
6FIFO and Drop-tail Problems
- FIFO issues: in a FIFO discipline, the service seen by a flow is convolved with the arrivals of packets from all other flows!
- No isolation between flows: full burden on e2e control
- No policing: send more packets => get more service
- Drop-tail issues
- Routers are forced to have large queues to maintain high utilization
- Larger buffers => larger steady-state queues/delays
- Synchronization: end hosts react to the same events because packets tend to be lost in bursts
- Lock-out: a side effect of burstiness and synchronization is that a few flows can monopolize queue space
7Design Objectives
- Keep throughput high and delay low (i.e., operate at the knee)
- Accommodate bursts
- Queue size should reflect ability to accept bursts rather than steady-state queuing
- Improve TCP performance with minimal hardware changes
8Queue Management Ideas
- Synchronization, lock-out
- Random drop: drop a randomly chosen packet
- Drop front: drop packet from head of queue
- High steady-state queuing vs. burstiness
- Early drop: drop packets before the queue is full
- Do not drop packets too early, because the queue may reflect only burstiness and not true overload
- Misbehaving vs. fragile flows
- Drop packets in proportion to each flow's queue occupancy
- Try to protect fragile flows from packet loss (e.g., color them or classify them on the fly)
- Drop packets vs. mark packets
- Dropping packets interacts with reliability mechanisms
- Marking packets: need to trust end systems to respond!
9Packet Drop Dimensions
- Aggregation: per-connection state, single class, or class-based queuing
- Drop position: head, tail, or random location
- Overflow drop vs. early drop
10Random Early Detection (RED)
[Figure: P(drop) vs. running average queue length, with thresholds minth and maxth on the horizontal axis and levels maxP and 1.0 on the vertical axis]
11Random Early Detection (RED)
- Maintain running average of queue length, avgQ
- Low-pass filtering
- If avgQ < minth, do nothing
- Low queuing, send packets through
- If avgQ > maxth, drop packet
- Protection from misbehaving sources
- Else mark (or drop) packet with probability proportional to queue length, biased to protect against synchronization
- $P_b = \mathit{max}_p \cdot (\mathit{avgQ} - \mathit{min}_{th}) / (\mathit{max}_{th} - \mathit{min}_{th})$
- Further, bias $P_b$ by history of unmarked packets
- $P_{drop} = P_b / (1 - \mathit{count} \cdot P_b)$
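The RED marking rule above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names, the EWMA weight, and the handling of the saturated case (`count * P_b >= 1`) are assumptions made for the example.

```python
def red_drop_probability(avg_q, min_th, max_th, max_p, count):
    """Return the RED mark/drop probability for the current average queue.

    count is the number of packets let through since the last mark or drop.
    """
    if avg_q < min_th:
        return 0.0  # low queuing: send packets through
    if avg_q > max_th:
        return 1.0  # protection from misbehaving sources
    # Base probability grows linearly between the two thresholds.
    p_b = max_p * (avg_q - min_th) / (max_th - min_th)
    # Bias by the history of unmarked packets to spread marks out in time.
    denom = 1.0 - count * p_b
    if denom <= 0.0:
        return 1.0  # assumption: once the bias saturates, mark this packet
    return min(1.0, p_b / denom)


def update_average(avg_q, sample_q, weight=0.002):
    """Low-pass filter (EWMA) of the queue length; the weight is illustrative."""
    return (1.0 - weight) * avg_q + weight * sample_q
```

With minth = 5, maxth = 15, and maxp = 0.1, an average queue of 10 gives a base probability of 0.05, which the count bias doubles after ten unmarked packets.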
12RED Issues
- Issues
- Breaks synchronization well
- Extremely sensitive to parameter settings
- Wild queue oscillations upon load changes
- Fails to prevent buffer overflow as the number of sources increases
- Does not help fragile flows (e.g., small-window flows or retransmitted packets)
- Does not adequately isolate cooperative flows from non-cooperative flows
- Isolation
- Fair queuing achieves isolation using per-flow state
- RED penalty box: monitor history for packet drops, identify flows that use disproportionate bandwidth
13Variant ARED (Feng, Kandlur, Saha, Shin 1999)
- Adaptive RED
- Motivation: RED is extremely sensitive to the number of sources and to parameter settings
- Idea: adapt maxp to load
- If avgQ < minth, decrease maxp
- If avgQ > maxth, increase maxp
- No per-flow information needed
14Variant FRED (Lin & Morris 1997)
- Flow RED
- Motivation: marking packets in proportion to flow rate is unfair (e.g., adaptive vs. non-adaptive flows)
- Idea
- A flow can buffer up to minq packets without being marked
- A flow that frequently buffers more than maxq packets gets penalized
- All flows with backlogs in between are marked according to RED
- No flow can buffer more than avgcq packets persistently
- Where avgcq = average per-flow buffer use
- Need per-active-flow accounting
15Variant BLUE (Feng, Kandlur, Saha, Shin 1999)
- BLUE ??? (2nd 3rd authors at IBM)
- Motivation: wild oscillation of RED leads to cyclic overflow and underutilization
- Algorithm
- On buffer overflow, increment marking probability
- On link idle, decrement marking probability
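The two BLUE rules can be sketched as a tiny state machine. The class name and the step sizes below are hypothetical, and refinements of the published algorithm (such as holding the probability fixed for a short time after each change) are deliberately left out.

```python
class Blue:
    """Minimal BLUE sketch: one marking probability driven by two events."""

    def __init__(self, increment=0.25, decrement=0.125):
        # Step sizes are illustrative assumptions, not recommended values.
        self.increment = increment
        self.decrement = decrement
        self.p_mark = 0.0

    def on_buffer_overflow(self):
        # Packet loss from overflow means marking is too gentle.
        self.p_mark = min(1.0, self.p_mark + self.increment)

    def on_link_idle(self):
        # An idle link means marking is too aggressive.
        self.p_mark = max(0.0, self.p_mark - self.decrement)
```

Unlike RED, the probability here depends only on overflow and idle events, not on the instantaneous or average queue length.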
16Variant Stochastic Fair Blue
- Motivation: protection against non-adaptive flows
- Algorithm
- L hash functions map a packet to L bins (out of N x L)
- Marking probability associated with each bin is
- Incremented if bin occupancy exceeds threshold
- Decremented if bin occupancy is 0
- Packets marked with probability min{p1, ..., pL}
[Figure: hash functions h1, h2, ..., hL-1, hL map a non-adaptive flow and an adaptive flow to one bin per level]
17Stochastic Fair Blue continued
- Idea
- A non-adaptive flow drives the marking probability to 1 at all L bins it is mapped to
- An adaptive flow may share some of its L bins with non-adaptive flows
- Non-adaptive flows can be identified and penalized with reasonable state overhead (not necessarily per-flow)
- Large numbers of bad flows may cause false positives
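A rough Python sketch of the Stochastic Fair Blue bookkeeping follows. The class and method names, the salted SHA-256 hashing, and the parameter values are all assumptions chosen for illustration; only the increment, decrement, and minimum rules come from the slides.

```python
import hashlib


class StochasticFairBlue:
    """Sketch of SFB accounting: L levels of N bins, one BLUE-style
    marking probability per bin."""

    def __init__(self, levels=4, bins_per_level=16, threshold=10, delta=0.05):
        self.levels = levels
        self.bins_per_level = bins_per_level
        self.threshold = threshold
        self.delta = delta
        self.occupancy = [[0] * bins_per_level for _ in range(levels)]
        self.p_mark = [[0.0] * bins_per_level for _ in range(levels)]

    def _bins(self, flow_id):
        # One bin per level; salting with the level index stands in for
        # L independent hash functions (an assumption of this sketch).
        indices = []
        for level in range(self.levels):
            digest = hashlib.sha256(f"{level}:{flow_id}".encode()).digest()
            indices.append(int.from_bytes(digest[:4], "big") % self.bins_per_level)
        return indices

    def on_enqueue(self, flow_id):
        for level, b in enumerate(self._bins(flow_id)):
            self.occupancy[level][b] += 1
            if self.occupancy[level][b] > self.threshold:
                self.p_mark[level][b] = min(1.0, self.p_mark[level][b] + self.delta)

    def on_dequeue(self, flow_id):
        for level, b in enumerate(self._bins(flow_id)):
            self.occupancy[level][b] = max(0, self.occupancy[level][b] - 1)
            if self.occupancy[level][b] == 0:
                self.p_mark[level][b] = max(0.0, self.p_mark[level][b] - self.delta)

    def mark_probability(self, flow_id):
        # A flow is marked with the minimum probability over its L bins.
        return min(self.p_mark[level][b] for level, b in enumerate(self._bins(flow_id)))
```

Taking the minimum is what protects an adaptive flow: it is penalized only if every one of its L bins is shared with heavy traffic.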
18REM (Athuraliya & Low 2000)
- REM: Random Exponential Marking
- Decouple congestion measure from performance measure
- Price adjusted to match rate and clear buffer
- Marking probability exponential in price
[Figure: marking probability (up to 1) vs. average queue, for REM and RED]
19Comparison of AQM Performance
- REM
- queue = 1.5 pkts
- utilization = 92%
- γ = 0.05, α = 0.4, φ = 1.15
- DropTail: queue = 94%
20The DECbit Scheme
- Basic ideas
- Mark packets instead of dropping them
- Special support at routers and e2e
- Scheme
- On congestion, router sets congestion indication (CI) bit on packet
- Receiver relays bit to sender
- Sender adjusts sending rate
- Key design questions
- When to set CI bit?
- How does sender respond to CI?
21Setting CI Bit
[Figure: queue length vs. time, showing the previous cycle, the current cycle up to the current time, and the averaging interval]
- Average queue length is taken over the averaging interval: the previous (busy + idle) cycle plus the current interval
22DECbit Routers
- Router tracks average queue length
- Regeneration cycle: queue goes from empty to non-empty to empty
- Average from start of previous cycle
- If average > 1 => router sets bit for flows sending more than their share
- If average > 2 => router sets bit in every packet
- Threshold is a trade-off between queuing and delay
- Optimizes power (throughput / delay)
- Compromise between sensitivity and stability
- Acks carry bit back to source
23DECbit Source
- Source averages across acks in window
- Congestion if > 50% of bits set
- Will detect congestion earlier than TCP
- Additive increase, multiplicative decrease
- Decrease factor = 0.875
- Increase factor = 1 packet
- After change, ignore DECbit for packets in flight (vs. TCP: ignore other drops in window)
- No slow start
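The source policy can be sketched as a single window-update function. The function name, the list-of-bits input, and the floor of one packet are assumptions for illustration; the 50% threshold, the 0.875 decrease factor, and the one-packet increase come from the slide.

```python
def decbit_update(window, ci_bits, decrease_factor=0.875, increase=1.0):
    """Return the new window after one window's worth of acks.

    ci_bits holds the congestion-indication bit (0 or 1) from each ack.
    """
    if not ci_bits:
        return window  # nothing observed, leave the window alone
    fraction_set = sum(ci_bits) / len(ci_bits)
    if fraction_set > 0.5:
        # Multiplicative decrease; the 1-packet floor is an assumption.
        return max(1.0, window * decrease_factor)
    return window + increase  # additive increase
```

For example, a window of 8 packets with three of four bits set shrinks to 7, while one of four bits set grows it to 9.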
24Alternate Congestion Control Models
- Loss-based: TCP Classic, TCP Reno
- Accumulation-based schemes: TCP Vegas, Monaco
- Use per-flow queue contribution (backlog) as a congestion estimate instead of loss rate
- Explicit rate-based feedback
- Controller at bottleneck assigns rates to each flow
- Packet-pair congestion control (not covered)
- WFQ at bottlenecks isolates flows and gives fair rates
- Packet-pair probing discovers this rate and sets the source rate to it
25TCP Reno (Jacobson 1990)
[Figure: window vs. time, showing slow start (SS) and congestion avoidance (CA) phases]
26TCP Vegas (Brakmo Peterson 1994)
[Figure: window vs. time, showing slow start (SS) and congestion avoidance (CA) phases]
- Accumulation-based scheme
- Converges, no retransmission
- Provided the buffer is large enough for no loss
27Accumulation Single Queue
- Flow i at router j
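- Arrival curve $A_{ij}(t)$
- Service curve $S_{ij}(t)$
- Cumulative
- Continuous
- Non-decreasing
- If no loss, then $q_{ij}(t) = A_{ij}(t) - S_{ij}(t)$
[Figure: cumulative bits vs. time for $A_{ij}(t)$ and $S_{ij}(t)$; the vertical gap at $t_1$ (between $b_1$ and $b_2$) is the queue $q_{ij}(t_1)$, and the horizontal gap (from $t_1$ to $t_2$) is the delay]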
28Accumulation Series of Queues
Model as a single queue
[Figure: routers 1, ..., j, j+1, ..., J between ingress and egress, labeled with $d_j$, $f_i$, $\mu_{ij}$, and $\mu_i$]
29Queue vs. Accumulation Behavior
- Queue $q_{ij}(t)$: info of flow i queued in a FIFO router j
- Accumulation $a_i(t)$: info of flow i queued in a set of FIFO routers 1, ..., J
- The collective queuing behavior of a set of FIFO routers looks similar to that of one single FIFO router
30Accumulation Distributed, Time-shifted Sum
31Control Policy
- Control objective keep
- If , no way to probe increase of available
bw
32Two Accumulation-Based Schemes
- Monaco
- Accumulation estimation out-of-band / in-band
- Congestion response
- Additive Increase/Additive Decrease (AIAD)
- Vegas
- Accumulation estimation in-band
- Congestion response
- Additive Increase/Additive Decrease (AIAD)
33Accumulation vs. Monaco Estimator
[Figure: time axis across routers 1, ..., j, j+1, ..., J]
34Accumulation vs. Monaco Estimator
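[Figure: forward data path through routers 1, ..., jf, jf+1, ..., Jf and backward control path through routers Jb, ..., jb+1, jb, ..., 1; at each router a classifier separates out-of-band control packets from the FIFO queue that carries in-band control and data packets]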
35Monaco
- Congestion estimation
- Out-of-band and in-band control packets
- Congestion response
- If $q_m < \alpha$, cwnd(k+1) = cwnd(k) + 1
- If $q_m > \beta$, cwnd(k+1) = cwnd(k) - 1 [1 = α < β = 3]
36TCP Vegas
- Congestion estimation
- Define $q_v = (cwnd / rttp - cwnd / rtt) \times rttp$
- where rttp is the estimate of round-trip propagation delay
- Congestion response
- If $q_v < \alpha$, cwnd(k+1) = cwnd(k) + 1
- If $q_v > \beta$, cwnd(k+1) = cwnd(k) - 1 [1 = α < β = 3]
37Vegas Accumulation Estimator
- The physical meaning of $q_v$
- rtt = rttp + rttq [rttq is queuing time]
- $q_v = (cwnd / rttp - cwnd / rtt) \times rttp$
- $= (cwnd / rtt) \times (rtt - rttp)$
- $= (cwnd / rtt) \times rttq$ [if rtt is typical]
- = sending rate × rttq [Little's Law]
- = packets backlogged [Little's Law again]
- So Vegas maintains between α and β packets queued inside the network
- It adjusts sending rate additively to achieve this
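The Vegas estimator and its additive increase/additive decrease response can be sketched as below. Function names and the one-packet window floor are assumptions; the defaults α = 1 and β = 3 follow the thresholds quoted on the Vegas slide.

```python
def vegas_backlog(cwnd, rtt, rtt_prop):
    """Estimate packets queued in the network:
    (cwnd / rtt_prop - cwnd / rtt) * rtt_prop, which equals
    (cwnd / rtt) * (rtt - rtt_prop), i.e. sending rate times queuing time."""
    return (cwnd / rtt_prop - cwnd / rtt) * rtt_prop


def aiad_update(cwnd, backlog, alpha=1, beta=3):
    """Additive increase / additive decrease around the [alpha, beta] band."""
    if backlog < alpha:
        return cwnd + 1
    if backlog > beta:
        return max(1, cwnd - 1)  # floor of one packet is an assumption
    return cwnd
```

With cwnd = 20 packets, a 100 ms propagation delay, and a measured RTT of 125 ms, the estimate is 4 packets backlogged, so the window drops by one.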
38Accumulation vs. Vegas Estimator
39Vegas vs. Monaco Estimators
- Vegas accumulation estimator
- Ingress-based
- Round trip (forward data path and backward ack path)
- Sensitive to ack path queuing delay
- Sensitive to round-trip propagation delay measurement error
- Monaco accumulation estimator
- Egress-based
- One way (only forward data path)
- Insensitive to ack path queuing delay
- No need to explicitly know one-way propagation delay
40Queue, Utilization w/ basertt Errors
41TCP Modeling
- Given the congestion behavior of TCP, can we predict what type of performance we should get?
- What are the important factors?
- Loss rate
- Affects how often window is reduced
- RTT
- Affects increase rate and relates BW to window
- RTO
- Affects performance during loss recovery
- MSS
- Affects increase rate
42Overall TCP Behavior
- Let's focus on steady state (congestion avoidance) with no slow starts, no timeouts, and perfect loss recovery
- Some additional assumptions
- Fixed RTT
- No delayed ACKs
[Figure: window vs. time]
43Derivation
[Figure: window vs. time t]
- Each cycle delivers $2w^2/3$ packets
- Assume: each cycle delivers $1/p$ packets, so $1/p = 2w^2/3$
- Delivers 1/p packets followed by a drop
- => Loss probability = $p/(1+p) \approx p$ if p is small
- Hence $w = \sqrt{3/(2p)}$
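Solving $1/p = 2w^2/3$ for the window gives $w = \sqrt{3/(2p)}$, and dividing by the round-trip time turns that into a rate. The short sketch below checks the arithmetic; the function names and the explicit `rtt` parameter are assumptions added for the example.

```python
import math


def packets_per_cycle(w):
    """Packets delivered in one cycle, per the derivation: 2 w^2 / 3."""
    return 2.0 * w * w / 3.0


def avg_window(p):
    """Window implied by one loss per cycle: w = sqrt(3 / (2 p))."""
    return math.sqrt(3.0 / (2.0 * p))


def throughput(p, rtt):
    """Packets per second for loss probability p and a fixed RTT in seconds."""
    return avg_window(p) / rtt
```

For p = 0.015 the window is about 10 packets, so a 100 ms RTT yields roughly 100 packets per second; quadrupling p halves both numbers.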
44Alternate Derivation
- Assume: loss is a Bernoulli process with probability p
- Assume: p is small
- $w_n$ is the window size after the nth RTT
45 Law
- Equilibrium window size
- Equilibrium rate
- Empirically, constant a ≈ 1
- Verified extensively through simulations and on the Internet
- References
- T.J. Ott, J.H.B. Kemperman and M. Mathis (1996)
- M. Mathis, J. Semke, J. Mahdavi, T. Ott (1997)
- T.V. Lakshman and U. Madhow (1997)
- J. Padhye, V. Firoiu, D. Towsley, J. Kurose (1998)
46Implications
- Applicability
- Additive increase, multiplicative decrease (Reno)
- Congestion avoidance dominates
- No timeouts, e.g., SACK+RH
- Small losses
- Persistent, greedy sources
- Receiver not bottleneck
- Implications
- Reno equalizes window
- Reno discriminates against long connections
- Halving throughput => quadrupling loss rate!
47Refinement (Padhye, Firoiu, Towsley & Kurose 1998)
- Renewal model including
- FR/FR with Delayed ACKs (b packets per ACK)
- Timeouts
- Receiver window limitation
- Source rate
- When p is small and Wr is large, reduces to
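As a hedged illustration of the refined model, the sketch below codes the approximate closed form commonly quoted from the Padhye et al. reading, together with the small-p limit it reduces to. The function and parameter names are assumptions, the default b = 1 mirrors the earlier "no delayed ACKs" simplification, and the expression should be checked against the paper before being relied on.

```python
import math


def pftk_rate(p, rtt, t0, b=1, w_max=float("inf")):
    """Approximate steady-state TCP rate in packets per second.

    p: loss probability, rtt: round-trip time (s), t0: retransmission
    timeout (s), b: packets acknowledged per ACK, w_max: receiver window.
    """
    if p <= 0.0:
        return w_max / rtt
    fast_recovery = rtt * math.sqrt(2.0 * b * p / 3.0)
    timeouts = t0 * min(1.0, 3.0 * math.sqrt(3.0 * b * p / 8.0)) * p * (1.0 + 32.0 * p * p)
    return min(w_max / rtt, 1.0 / (fast_recovery + timeouts))


def simple_rate(p, rtt, b=1):
    """Limit for small p and a large receiver window: sqrt(3 / (2 b p)) / rtt."""
    return math.sqrt(3.0 / (2.0 * b * p)) / rtt
```

At very small loss rates the two functions agree closely; at a few percent loss the timeout term pulls the refined estimate well below the square-root law.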
48TCP Friendliness
- What does it mean to be TCP friendly?
- TCP is not going away
- Fairness means equal shares
- Any new congestion control must compete with TCP flows
- Should not clobber TCP flows and grab bulk of link
- Should also be able to hold its own, i.e., grab its fair share, or it will never become popular
49Binomial Congestion Control
- In AIMD
- Increase: $W_{n+1} = W_n + \alpha$
- Decrease: $W_{n+1} = (1 - \beta) W_n$
- In Binomial
- Increase: $W_{n+1} = W_n + \alpha / W_n^k$
- Decrease: $W_{n+1} = W_n - \beta W_n^l$
- $k = 0$, $l = 1$ => AIMD
- $l < 1$ results in less than multiplicative decrease
- Good for multimedia applications
50Binomial Congestion Control continued
- $\text{Rate} \propto 1 / (\text{loss rate})^{1/(k+l+1)}$
- If $k + l = 1$ => rate $\propto 1/p^{0.5}$
- TCP friendly
- AIMD ($k = 0$, $l = 1$) is the most aggressive of this class
- SQRT ($k = 1/2$, $l = 1/2$) and IIAD ($k = 1$, $l = 0$)
- Good for applications that want to probe quickly and can use any available bandwidth
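The binomial family is easy to compare numerically. In the sketch below the increase rule is W + α/W^k and the decrease rule is W - β·W^l; the function names and the default constants α = 1 and β = 0.5 are illustrative assumptions.

```python
def binomial_increase(w, k, alpha=1.0):
    """Window after one loss-free RTT: W + alpha / W**k."""
    return w + alpha / w ** k


def binomial_decrease(w, l, beta=0.5):
    """Window after a loss: W - beta * W**l."""
    return w - beta * w ** l
```

Starting from W = 16: AIMD (k=0, l=1) moves to 17 on an increase and 8 on a loss, SQRT (k=l=1/2) moves to 16.25 and 14, and IIAD (k=1, l=0) moves to 16.0625 and 15.5. The gentler decreases are what make the l < 1 variants attractive for multimedia.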
51Static Optimization Framework
[Figure: feedback loop between source rates $x_i(t)$ and congestion measures $p_l(t)$]
- Duality theory => equilibrium
- Source rates $x_i(t)$ are primal variables (i is a flow)
- Congestion measures $p_l(t)$ are dual variables, where $p_l(t)$ = probability of loss on link l at time t
- Congestion control is an optimization process over the Internet
52Overview Equilibrium
- Interaction of source rates $x_s(t)$ and congestion measures $p_l(t)$
- Duality theory
- They are primal and dual variables
- Flow control is optimization process
- Example congestion measure
- Loss (Reno)
- Queueing delay (Vegas)
53Overview Equilibrium continued
- Congestion control problem
- TCP/AQM protocols (F, G)
- Maximize aggregate source utility
- With different utility functions Us(xs)
54Model
- Sources s
- $L(s)$: links used by source s
- $U_s(x_s)$: utility if source rate = $x_s$
- Network
- Link l has capacity $c_l$
55Primal Problem
- Assumptions
- Strictly concave increasing Us
- Unique optimal rates xs exist
- Direct solution impractical
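A sketch of the primal problem in the notation of the Model slide, following the standard form in the Low and Lapsley reading; the exact layout on the original slide is an assumption.

```latex
\max_{x_s \ge 0} \; \sum_{s} U_s(x_s)
\quad \text{subject to} \quad
\sum_{s \,:\, l \in L(s)} x_s \;\le\; c_l \quad \text{for every link } l
```

Each constraint couples all sources that share link l, which is why solving the problem directly would need global coordination.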
56Duality Approach
57Gradient Algorithm
Theorem (Low, Lapsley, 1999): converges to optimal rates in an asynchronous environment
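A small synchronous simulation can make the dual gradient idea concrete. Everything below is a sketch under stated assumptions: utilities are $U_s(x_s) = \log x_s$ (so each source picks a rate of 1 over its path price), the step size, iteration count, and rate cap are arbitrary, and the function name is hypothetical. The theorem covers asynchronous updates; this loop is the simpler synchronous case.

```python
def dual_gradient(routes, capacity, gamma=0.1, steps=2000, x_max=10.0):
    """routes[s] lists the link indices used by source s.

    Returns (rates, prices) after `steps` synchronous iterations.
    """
    prices = [1.0] * len(capacity)
    rates = [0.0] * len(routes)
    for _ in range(steps):
        # Source step: maximize log(x) - x * path_price  =>  x = 1 / path_price.
        for s, links in enumerate(routes):
            path_price = sum(prices[l] for l in links)
            rates[s] = min(x_max, 1.0 / path_price) if path_price > 0 else x_max
        # Link step: raise the price when demand exceeds capacity, else lower it,
        # projecting onto non-negative prices.
        for l in range(len(capacity)):
            load = sum(rates[s] for s, links in enumerate(routes) if l in links)
            prices[l] = max(0.0, prices[l] + gamma * (load - capacity[l]))
    return rates, prices
```

For two unit-capacity links with one long flow crossing both and one short flow on each, `dual_gradient([[0, 1], [0], [1]], [1.0, 1.0])` settles near rates of 1/3, 2/3, 2/3, the proportionally fair allocation, with both prices near 1.5.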
58Example
59Example continued
- $x_s$: proportionally fair (Vegas)
- $p_l$: Lagrange multiplier, (shadow) price, congestion measure
- How to compute (x, p)?
- Gradient algorithms, Newton algorithm, primal-dual algorithm, ...
- Relevance to TCP/AQM?
- TCP/AQM protocols implement primal-dual algorithms over the Internet
61Active Queue Management (AQM)
- Idea: provide congestion information by probabilistically marking packets
- Issues
- How to measure congestion (p and G)?
- How to embed congestion measure?
- How to feed back congestion info?
62RED (Floyd Jacobson 1993)
- Congestion measure: average queue length
- $p_l(t+1) = [p_l(t) + x_l(t) - c_l]^+$
- Embedding: piecewise-linear probability function
[Figure: marking probability (up to 1) vs. average queue]
63REM (Athuraliya Low 2000)
- Congestion measure: price
- $p_l(t+1) = [p_l(t) + \gamma(\alpha_l b_l(t) + x_l(t) - c_l)]^+$
- Embedding: exponential probability function
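The REM price update and exponential embedding can be sketched as follows. The marking form $1 - \phi^{-p}$ is the one usually given for REM and is stated here as an assumption, as is reading the values 0.05, 0.4, and 1.15 from the comparison slide as the step size, backlog weight, and marking base. Function names are hypothetical.

```python
def rem_price_update(price, backlog, input_rate, capacity, gamma=0.05, alpha=0.4):
    """One price step: grow with backlog and rate mismatch, never below zero."""
    return max(0.0, price + gamma * (alpha * backlog + input_rate - capacity))


def rem_mark_probability(price, phi=1.15):
    """Marking probability exponential in price: 1 - phi ** (-price)."""
    return 1.0 - phi ** (-price)
```

The price stops moving only when the backlog is cleared and the input rate matches capacity, which is the "match rate and clear buffer" property; the marking probability is 0 at zero price and approaches 1 as the price grows.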
64Key Features
- Clear buffer and match rate
Theorem (Paganini 2000): global asymptotic stability for general utility function (in the absence of delay)
65AQM Summary
66Reno F(p(t), x(t))
for every ack (ca) { W += 1/W }
for every loss { W := W/2 }
Primal-dual algorithm
x(t+1) = F( p(t), x(t) )
p(t+1) = G( p(t), x(t) )
67Reno Implications
- Equilibrium characterization
- Duality
- Congestion measure p = loss
- Implications
- Reno equalizes window $w_i = \tau_i x_i$
- Rate is inversely proportional to delay $\tau_i$
- $1/\sqrt{p}$ dependence for small p
- DropTail fills queue, regardless of queue capacity
68Reno Gradient Algorithm
- TCP: approximate version of gradient algorithm
70Vegas
for every RTT {
  if W/RTTmin - W/RTT < α then W++
  if W/RTTmin - W/RTT > α then W--
}
for every loss { W := W/2 }
queue size
71ATM ABR Explicit Rate Feedback
[Figure: RM cells travel from source to destination and back]
- Sources regulate transmission using a rate parameter
- Feedback scheme
- Every (n+1)th cell is an RM (control) cell containing current cell rate, allowed cell rate, etc.
- Switches adjust the rate using rich information about congestion to calculate explicit, multi-bit feedback
- Destination returns the RM cell to the source
- Control policy: sources adjust to the new rate
72ERICA Design Goals
- Allows utilization to be 100% (better tracking)
- Allows operation at any point between knee and cliff
- The queue length can be set to any desired value (tracking)
- Max-min fairness (fairness)
[Figure: plots of link utilization vs. time, throughput vs. load, and delay/queue length vs. load]
73Efficiency vs. Fairness OSU Scheme
- Efficiency: high utilization
- Fairness: equal allocations for contending sources
- Worry about fairness after utilization is close to 100%
- Target Utilization (U) and Target Utilization Band (TUB)
[Figure: total load vs. time, with ticks at 91, 95, and 99 marking the TUB around U; overload region above, underload region below, and "worry about fairness here" inside the band]
74ERICA Switch Algorithm
- Overload = Input rate / Target rate
- Fair Share = Target rate / # of active VCs
- This VC's Share = VC's rate / Overload
- ER = Max(Fair Share, This VC's Share)
- ER in Cell = Min(ER in Cell, ER)
- This is the basic algorithm.
- The full algorithm has more steps for improved fairness, queue management, transient spike suppression, and averaging of metrics.
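The basic ERICA computation maps directly to code. The function and argument names below are assumptions, and the sketch presumes positive input rate, target rate, and VC count; none of the additional fairness or averaging steps are included.

```python
def erica_explicit_rate(input_rate, target_rate, active_vcs, vc_rate, er_in_cell):
    """Return the explicit rate (ER) a switch writes into a passing RM cell."""
    overload = input_rate / target_rate    # > 1 means the link is overloaded
    fair_share = target_rate / active_vcs  # equal split of the target rate
    vc_share = vc_rate / overload          # scale this VC toward the target
    er = max(fair_share, vc_share)
    # Never raise an ER that an earlier switch on the path already lowered.
    return min(er_in_cell, er)
```

For example, with an input rate of 150, a target rate of 100, and 4 active VCs, a VC sending at 60 gets ER = max(25, 40) = 40, while a VC sending at 15 is lifted to the fair share of 25.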
75TCP Rate Control
- Step 1: explicit control of window
[Figure: congestion window (CWND) and window W]
Actual Window = Min(Cwnd, Wr)
- Step 2: control rate of acks (ack-bucket)
Tradeoff: ack queues in reverse path for fewer packets in forward path
[Figure: pkts and acks vs. time, labeled r, W, and R]
76Summary
- Active Queue Management (AQM): RED, REM, etc.
- Alternative models
- Accumulation-based schemes: Monaco, Vegas
- Explicit rate-based schemes
- TCP stochastic modeling
- Static (duality) optimization framework
- Can manage TCP with very little loss