Title: Advanced Computer Networks
1. Advanced Computer Networks
- TCP Congestion Control and RED
- Lan Wang
- lanwang_at_memphis.edu
- http://www.cs.memphis.edu/lanwang
Based in part upon the slides of Shriv Kalyanaraman (RPI), Lixin Gao (UMass), Raj Jain (OSU), J. Kurose (UMass), S. Keshav (Cornell), I. Stoica (UCB), S. Deering (Cisco).
2. Congestion: Tragedy of the Commons
- Different sources compete for common or shared resources inside the network.
- Sources are unaware of the current state of the resource
- Sources are unaware of each other
- Each source acts in its own self-interest: it assumes that increasing its rate by N will lead to an increase of N in throughput!
- This conflicts with collective interests: if all sources do this and drive the system to overload, the throughput gain is NEGATIVE and worsens rapidly with incremental overload => congestion collapse!!
- Need enlightened self-interest!
3. Congestion: A Close-up View
- knee: point after which
  - throughput increases very slowly
  - delay increases fast
- cliff: point after which
  - throughput starts to decrease very fast to zero (congestion collapse)
  - delay approaches infinity
- Note (in an M/M/1 queue)
  - delay = 1/(1 - utilization)
[Figure: throughput vs. load and delay vs. load, with the knee, the cliff, packet loss, and the congestion collapse region marked]
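The M/M/1 note can be made concrete with a small sketch. It assumes delay is measured in units of the mean service time, which is one reading of the normalized formula 1/(1 - utilization); the function name `mm1_delay` is made up for illustration.

```python
def mm1_delay(utilization: float) -> float:
    """Normalized M/M/1 delay: 1 / (1 - utilization).

    Assumption: the result is in units of the mean service time.
    """
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return 1.0 / (1.0 - utilization)


# Delay grows slowly at first, then explodes as utilization approaches 1.
for rho in (0.5, 0.9, 0.99):
    print(f"utilization={rho:.2f}  normalized delay={mm1_delay(rho):.1f}")
```

Under this formula, going from 50% to 90% utilization multiplies the delay by about five, and going from 90% to 99% multiplies it by about ten again.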
4. Congestion Control vs. Congestion Avoidance
- Congestion control goal
  - stay left of cliff
- Congestion avoidance goal
  - stay left of knee
- Right of cliff
  - Congestion collapse
[Figure: throughput vs. load, with the knee, the cliff, and the congestion collapse region marked]
5. Basic Control Model
- Let's assume window-based operation
- Reduce window when congestion is perceived
  - How is congestion signaled?
    - Either mark or drop packets
  - When is a router congested?
    - Drop-tail queues: when queue is full
    - Average queue length: at some threshold
- Increase window otherwise
  - Probe for available bandwidth: how?
6. Simple Linear Control
- Many different possibilities for reaction to congestion and methods for probing
- Examine simple linear controls
  - Window(t + 1) = a + b * Window(t)
  - Different a_i/b_i for increase and a_d/b_d for decrease
- Supports various reactions to signals
  - Increase/decrease additively
  - Increase/decrease multiplicatively
- Which of the four combinations is optimal?
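The linear control Window(t + 1) = a + b * Window(t) covers all four policies through the choice of a and b. The sketch below is only an illustration; the function name and the specific parameter values are assumptions, not values from the slides.

```python
def linear_update(window: float, a: float, b: float) -> float:
    """One step of the linear control: Window(t + 1) = a + b * Window(t)."""
    return a + b * window


# Hypothetical (a, b) choices, one per policy.
POLICIES = {
    "additive increase": (1.0, 1.0),         # a > 0, b = 1
    "additive decrease": (-1.0, 1.0),        # a < 0, b = 1
    "multiplicative increase": (0.0, 2.0),   # a = 0, b > 1
    "multiplicative decrease": (0.0, 0.5),   # a = 0, 0 < b < 1
}

for name, (a, b) in POLICIES.items():
    print(f"{name:24s} 10 -> {linear_update(10.0, a, b):g}")
```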
7. Phase Plots
- Simple way to visualize behavior of competing flows over time
- Caveat: assumes 2 flows, synchronized feedback, equal RTT, discrete rounds of operation
[Figure: phase plot of User 1's allocation x1 vs. User 2's allocation x2, showing the fairness line, the efficiency line, the optimal point, and the overload and underutilization regions]
8. Additive Increase/Decrease
- Both X1 and X2 increase/decrease by the same amount over time
  - Additive increase improves fairness and increases load
  - Additive decrease reduces fairness and decreases load
[Figure: phase plot of User 1's allocation x1 vs. User 2's allocation x2 with the fairness line and efficiency line; the operating point moves between T0 and T1]
9. Multiplicative Increase/Decrease
- Both X1 and X2 increase by the same factor over time
  - Fairness unaffected (constant), but load increases (MI) or decreases (MD)
[Figure: phase plot of User 1's allocation x1 vs. User 2's allocation x2 with the fairness line and efficiency line; the operating point moves between T0 and T1]
10. Additive Increase/Multiplicative Decrease (AIMD) Policy
- Assumption: decrease policy must (at minimum) reverse the load increase over and above the efficiency line
- Implication: decrease factor should be conservatively set to account for any congestion detection lags, etc.
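A minimal simulation shows why AIMD moves two flows toward the fairness line. It works under the phase-plot assumptions (2 flows, synchronized feedback, discrete rounds). The capacity, the increase step, and the decrease factor are illustrative values, not taken from the slides.

```python
def aimd(x1: float, x2: float, capacity: float = 100.0,
         add: float = 1.0, mult: float = 0.5, rounds: int = 200):
    """Run synchronized AIMD rounds for two flows and return the final rates."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # Overload: both flows decrease multiplicatively.
            # This shrinks the gap x1 - x2 by the factor `mult`.
            x1, x2 = x1 * mult, x2 * mult
        else:
            # No congestion signal: both flows increase additively.
            # This leaves the gap x1 - x2 unchanged.
            x1, x2 = x1 + add, x2 + add
    return x1, x2


x1, x2 = aimd(80.0, 10.0)
print(f"start gap = 70.0, final gap = {abs(x1 - x2):.3f}")
```

Each multiplicative decrease shrinks the difference between the two allocations, while additive increase leaves it unchanged, so the operating point drifts toward equal shares.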
11. TCP Congestion Control
- Maintains three variables
  - cwnd: congestion window
  - rcv_win: receiver advertised window
  - ssthresh: threshold size (used to update cwnd)
    - Rough estimate of knee point
- For sending, use: win = min(rcv_win, cwnd)
12. TCP Slow Start
- Goal: initialize system and discover congestion quickly
- How? Quickly increase cwnd until network is congested => get a rough estimate of the optimal cwnd
- How do we know when network is congested?
  - packet loss (TCP)
    - over the cliff here => congestion control
  - congestion notification (e.g., DEC Bit, ECN)
    - over knee, before the cliff => congestion avoidance
- Implications of using loss as congestion indicator
  - Late congestion detection if the buffer size is large
  - Higher speed links or large buffers => larger windows => higher probability of burst loss
  - Interactions with retransmission algorithm and timeouts
13. TCP Slow Start
- Whenever starting traffic on a new connection, or whenever increasing traffic after congestion was experienced:
  - Set cwnd = 1
  - Each time a segment is acknowledged, increment cwnd by one (cwnd++).
- Does Slow Start increment slowly? Not really. In fact, the increase of cwnd is exponential!!
  - Window increases to W in RTT * log2(W)
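The log2(W) claim can be checked by counting rounds. This sketch assumes an idealized sender: no loss, one ACK per segment, and a full window sent every round trip. The function name is made up for illustration.

```python
def slow_start_rounds(target_window: int) -> int:
    """Count round trips for cwnd to grow from 1 to at least target_window."""
    cwnd, rounds = 1, 0
    while cwnd < target_window:
        # A full window of cwnd segments is ACKed in one round trip,
        # and each ACK adds 1 to cwnd, so the window doubles.
        cwnd += cwnd
        rounds += 1
    return rounds


for w in (8, 64, 1024):
    print(f"W={w:5d}  rounds={slow_start_rounds(w)}")
```

For a power-of-two target W the count equals log2(W), so the time to reach W is about RTT * log2(W).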
14. Slow Start Example
- The congestion window size grows very rapidly
- TCP slows down the increase of cwnd when cwnd > ssthresh
[Figure: segment/ACK timeline in which the window grows each round trip: cwnd = 2, cwnd = 4, cwnd = 8]
15. Slow Start Sequence Plot
[Figure: sequence number vs. time; the window doubles every round]
16. Congestion Avoidance
- Goal: maintain operating point at the left of the cliff and close to the knee.
- How?
  - additive increase: starting from the rough estimate (ssthresh), slowly increase cwnd to probe for additional available bandwidth
  - multiplicative decrease: cut congestion window size aggressively if a loss is detected.
17. Congestion Avoidance
- Slow down "Slow Start"
- If cwnd > ssthresh, then each time a segment is acknowledged, increment cwnd by 1/cwnd
  - i.e., cwnd += 1/cwnd
- So cwnd is increased by one only once all segments in the current window have been acknowledged.
- (more about ssthresh later)
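The per-ACK rule cwnd += 1/cwnd can be traced over one round trip. The sketch assumes one ACK per segment and no loss, and the function name is hypothetical.

```python
def congestion_avoidance_round(cwnd: float) -> float:
    """Apply cwnd += 1/cwnd once per ACK for one window's worth of ACKs."""
    acks = int(cwnd)  # one ACK for each segment sent in this round
    for _ in range(acks):
        cwnd += 1.0 / cwnd
    return cwnd


before = 10.0
after = congestion_avoidance_round(before)
print(f"cwnd {before:.2f} -> {after:.2f} after one round trip")
```

The growth is roughly one segment per round trip. It is slightly less than one, because cwnd rises during the round and each later ACK adds a bit less than 1/10.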
18. Congestion Avoidance Sequence Plot
[Figure: sequence number vs. time; the window grows by 1 every round]
19. Slow Start/Congestion Avoidance Example
[Figure: cwnd (in segments) vs. round-trip times, with ssthresh marked]
20. Putting Everything Together: TCP Pseudo-code
- Initially:
  - cwnd = 1
  - ssthresh = infinite
- New ack received:
  - if (cwnd < ssthresh)
    - /* Slow Start */
    - cwnd = cwnd + 1
  - else
    - /* Congestion Avoidance */
    - cwnd = cwnd + 1/cwnd
- Timeout (loss detection):
  - /* Multiplicative decrease */
  - ssthresh = win/2
  - cwnd = 1
- Sending: while (next < unack + win) transmit next packet, where win = min(cwnd, flow_win)
[Figure: sequence-number line (seq) showing unack, next, and the window win]
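The pseudo-code above translates almost line for line into a small runnable sketch. Windows are counted in segments. The class and method names are made up for illustration, and the sketch omits retransmission, timers, and the sending loop.

```python
import math


class TcpSender:
    """Window bookkeeping only: slow start, congestion avoidance, timeout."""

    def __init__(self, flow_win: float = math.inf):
        self.cwnd = 1.0
        self.ssthresh = math.inf
        self.flow_win = flow_win  # receiver's advertised (flow-control) window

    @property
    def win(self) -> float:
        # The usable window is limited by both congestion and flow control.
        return min(self.cwnd, self.flow_win)

    def on_new_ack(self) -> None:
        if self.cwnd < self.ssthresh:
            self.cwnd += 1.0              # slow start
        else:
            self.cwnd += 1.0 / self.cwnd  # congestion avoidance

    def on_timeout(self) -> None:
        self.ssthresh = self.win / 2.0    # multiplicative decrease
        self.cwnd = 1.0


sender = TcpSender()
for _ in range(7):
    sender.on_new_ack()
print("after 7 ACKs:", sender.cwnd)
sender.on_timeout()
print("after timeout:", sender.cwnd, sender.ssthresh)
```

After the timeout the sender slow-starts again, but only up to the new ssthresh, and then switches to the 1/cwnd increment.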
21. The Big Picture
[Figure: cwnd vs. time, showing Slow Start, Congestion Avoidance, and a Timeout]
22. Packet Loss Detection: Timeout Avoidance
- Wait for Retransmission Time Out (RTO)
- What's the problem with this?
  - RTO is a performance killer
- In the BSD TCP implementation, RTO is usually more than 1 second
  - the granularity of the RTT estimate is 500 ms
  - the retransmission timeout is at least two times the RTT
- Solution: don't wait for RTO to expire
  - Use an alternate mechanism for loss detection
  - Fall back to RTO only if these alternate mechanisms fail.
23. Fast Retransmit
- Resend a segment after 3 duplicate ACKs
  - A duplicate ACK means that an out-of-sequence segment was received
- Notes
  - duplicate ACKs can also be due to packet reordering!
  - if the window is small, the sender doesn't get enough duplicate ACKs!
[Figure: segment/ACK timeline as cwnd grows from 2 to 4; segments 4 to 7 are sent and the receiver returns ACK 4 repeatedly (3 duplicate ACKs)]
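Duplicate-ACK counting can be sketched as a scan over the stream of cumulative ACK numbers seen by the sender. The function name and the list-based interface are assumptions for illustration. A real sender tracks this state per connection as ACKs arrive.

```python
def fast_retransmit_points(acks, threshold: int = 3):
    """Return the ACK numbers at which the duplicate-ACK threshold is reached."""
    retransmits = []
    last_ack, dup_count = None, 0
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == threshold:
                # Third duplicate: resend the segment the receiver is asking for.
                retransmits.append(ack)
        else:
            # A new cumulative ACK resets the duplicate counter.
            last_ack, dup_count = ack, 0
    return retransmits


# The first "4" is a new ACK; the next three are duplicates.
print(fast_retransmit_points([2, 3, 4, 4, 4, 4]))
```

With a small window, too few segments follow the lost one to generate three duplicates, and the sender has to fall back to the timeout.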
24. Fast Recovery (Simplified)
- After a fast retransmit, halve the window: set ssthresh to cwnd/2 and cwnd to the new ssthresh
  - i.e., don't reset cwnd to 1
- But when RTO expires, still do cwnd = 1
- Fast Retransmit and Fast Recovery => implemented by TCP Reno, the most widely used version of TCP today
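The two loss reactions can be compared side by side in a simplified Reno-style sketch. It assumes ssthresh is first set to half the current window and, after a fast retransmit, cwnd resumes from that new ssthresh. It leaves out the temporary window inflation and lower bounds that real implementations apply. The function and event names are hypothetical.

```python
def react_to_loss(cwnd: float, event: str):
    """Return (new_cwnd, new_ssthresh) after a loss signal, in segments."""
    ssthresh = cwnd / 2.0
    if event == "triple_dup_ack":
        # Fast recovery: continue in congestion avoidance from half the window.
        return ssthresh, ssthresh
    if event == "timeout":
        # RTO expired: fall back to slow start from one segment.
        return 1.0, ssthresh
    raise ValueError(f"unknown loss event: {event!r}")


print(react_to_loss(16.0, "triple_dup_ack"))
print(react_to_loss(16.0, "timeout"))
```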
25. Fast Retransmit and Fast Recovery
[Figure: cwnd vs. time, showing Slow Start followed by Congestion Avoidance]
- Retransmit after 3 duplicate ACKs
  - prevent expensive timeouts
- No need to slow start again
- At steady state, cwnd oscillates around the optimal window size.
26. Fast Retransmit
[Figure: sequence number vs. time, showing a lost segment (X), the duplicate ACKs, and the retransmission]
27. Typical Internet Queuing
- FIFO + drop-tail
  - Simplest choice
  - Used widely in the Internet
- FIFO (first-in-first-out)
  - Implies single class of traffic
- Drop-tail
  - Arriving packets get dropped when queue is full, regardless of flow or importance
- Important distinction
  - FIFO: scheduling discipline
  - Drop-tail: drop (buffer management) policy
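The distinction between the scheduling discipline and the drop policy shows up clearly in code. In this minimal sketch, `dequeue` implements FIFO scheduling and `enqueue` implements the drop-tail policy. The class name and the packet-count capacity are assumptions for illustration.

```python
from collections import deque


class DropTailQueue:
    """FIFO scheduling plus drop-tail buffer management, single traffic class."""

    def __init__(self, capacity: int):
        self.capacity = capacity  # maximum number of queued packets
        self.packets = deque()
        self.dropped = 0

    def enqueue(self, packet) -> bool:
        # Drop-tail: an arriving packet is dropped if the queue is full,
        # regardless of which flow it belongs to.
        if len(self.packets) >= self.capacity:
            self.dropped += 1
            return False
        self.packets.append(packet)
        return True

    def dequeue(self):
        # FIFO: always serve the packet that has waited longest.
        return self.packets.popleft() if self.packets else None


q = DropTailQueue(capacity=2)
print([q.enqueue(p) for p in ("a", "b", "c")], "dropped:", q.dropped)
```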
28. FIFO + Drop-tail Problems
- FIFO issues: in a FIFO discipline, the service seen by a flow is convoluted with the arrivals of packets from all other flows!
  - No isolation between flows: full burden on e2e control
  - No policing: send more packets => get more service
- Drop-tail issues
  - Routers are forced to have large queues to maintain high utilization
  - Larger buffers => larger steady-state queues/delays
  - Synchronization: end hosts react to the same events because packets tend to be lost in bursts
  - Lock-out: a side effect of burstiness and synchronization is that a few flows can monopolize queue space
29. Design Objectives
- Keep throughput high and delay low (i.e., knee)
- Accommodate bursts
- Queue size should reflect ability to accept bursts rather than steady-state queuing
- Improve TCP performance with minimal hardware changes
30. Queue Management Ideas
- Synchronization, lock-out
  - Random drop: drop a randomly chosen packet
  - Drop front: drop packet from head of queue
- High steady-state queuing vs. burstiness
  - Early drop: drop packets before queue is full
  - Do not drop packets too early, because queue may reflect only burstiness and not true overload
- Misbehaving vs. fragile flows
  - Drop packets proportional to queue occupancy of flow
  - Try to protect fragile flows from packet loss (e.g., color them or classify them on the fly)
- Drop packets vs. mark packets
  - Dropping packets interacts w/ reliability mechanisms
  - Mark packets: need to trust end-systems to respond!
31. Packet Drop Dimensions
- Aggregation: single class, per-connection state, class-based queuing
- Drop position: tail, head, random location
- Early drop vs. overflow drop
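32. Random Early Detection (RED)
[Figure: queue with min thresh and max thresh marked against the average queue length; plot of P(drop) vs. avg queue length, with minth and maxth on the horizontal axis and maxP and 1.0 on the vertical axis]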
33. Random Early Detection (RED)
- Maintain running average of queue length
  - Low-pass filtering
- If avg Q < minth, do nothing
  - Low queuing, send packets through
- If avg Q > maxth, drop packet
  - Protection from misbehaving sources
- Else mark (or drop) packet in a manner proportional to queue length; bias to protect against synchronization
  - Pb = maxp * (avg - minth) / (maxth - minth)
  - Further, bias Pb by history of unmarked packets
  - Pa = Pb / (1 - count * Pb)
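The RED rules above can be sketched as two small functions. The running average is written as an exponentially weighted moving average, which is one common form of low-pass filter. The weight `w` and all threshold values used in the example are illustrative assumptions, and the function names are made up.

```python
def red_average(avg: float, queue_len: float, w: float = 0.002) -> float:
    """Low-pass filter the instantaneous queue length (w is an assumed weight)."""
    return (1.0 - w) * avg + w * queue_len


def red_mark_probability(avg: float, count: int,
                         minth: float, maxth: float, maxp: float) -> float:
    """Probability Pa of marking (or dropping) the arriving packet.

    count is the number of packets let through unmarked since the last mark.
    """
    if avg < minth:
        return 0.0  # low queuing: send packets through
    if avg > maxth:
        return 1.0  # persistent overload: drop
    pb = maxp * (avg - minth) / (maxth - minth)
    denom = 1.0 - count * pb
    # Guard against a zero or negative denominator once count grows large.
    return 1.0 if denom <= 0.0 else min(1.0, pb / denom)


for count in (0, 5, 10):
    pa = red_mark_probability(avg=10.0, count=count, minth=5.0, maxth=15.0, maxp=0.1)
    print(f"count={count:2d}  Pa={pa:.3f}")
```

Because Pa grows with the number of packets passed since the last mark, marks end up spaced more evenly than independent coin flips with probability Pb would give, which helps avoid bursts of drops that synchronize flows.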
34. Assignments
- Paper Summaries
  - L. Zhang, S. Deering, D. Estrin, S. Shenker, and D. Zappala, "RSVP: A New Resource ReSerVation Protocol," IEEE Network, September 1993 (due Mar. 31, 2005; lecture date Apr. 12, 2005).
  - Ion Stoica, Scott Shenker, Hui Zhang, "Core-Stateless Fair Queueing: A Scalable Architecture to Approximate Fair Bandwidth Allocations in High Speed Networks," Proceedings of the ACM SIGCOMM '98 (due Apr. 5, 2005; lecture date Apr. 7, 2005).