Title: Chapter 6 TCP Congestion Control
1Chapter 6TCP Congestion Control
- Professor Rick Han
- University of Colorado at Boulder
- rhan_at_cs.colorado.edu
2Announcements
- Read Sections 6.1 - 6.4, Skip 6.5
- Hope to get HW 4 to you by Thursday
- Programming Assignment 2 due April 2
- Midterm
- Still grading, a little long, time management
also an issue, will be curved, hope to get back
by Thursday but may have to wait until April 2 - Next, more on TCP
3Recap of Previous Lecture
- TCP Transmission Control Protocol
- Three-way handshake
- SYN and SYN/ACK exchange
- FIN and FIN/ACK exchange
- TCP State Machine
- Setup
- Established
- Tear-Down
- TCP Segments
- Sequence is of lowest byte
- Cumulative ACK
- TCP Flow Control
- Receiver advertises window
4TCP Adaptive Retransmission
- TCP achieves reliability by retransmitting
segments after - A Timeout
- Receiving 3 duplicate cumulative ACKs
- Two out-of-order segments arrive at receiver, but
lowest unacknowledged segment has yet to arrive - Receiver repeats its highest received cumulative
sequence -- hence duplicate cumulative ACKs - Doesnt wait for timeout fast retransmit
- Choosing the value of the Timeout
- If too small, retransmit unnecessarily
- If too large, poor throughput
- Make this adaptive, to respond to changing
congestion delays in Internet
5Initial Round-trip Estimator
- Round trip times exponentially averaged
- New RTT estimate a (old RTT estimate) (1 - a)
(new RTT) - Recommended value for a 0.8 - 0.9
- 0.875 for most TCPs
- Retransmission Timeout RTO b RTT, where b 2
- Thought to be large enough to provide enough
cushion to prevent spurious retransmissions - and small enough to keep throughput high
- Every time the timer expires, retransmit segment
6RTT Retransmission Ambiguity
A
B
Original transmission
X
RTO
Sample RTT
retransmission
ACK
7Karn/Partridges modified RTT Estimator
- Basic problem
- If a sender has retransmitted a segment, then ACK
for that segment may correspond to any of the
retransmissions - Is RTT for first transmission or retransmission?
- Solution
- Each time a segment is retransmitted
- Dont average the RTT estimate with the current
RTT sample - Also, Double the RTO exponential backoff like
Ethernet, assuming that the packet loss was due
to congestion. - If a segment was ACKed after one transmission
- Recalculate RTT estimate and RTO
8Jacobson/Karels Retransmission Timeout
- Key observation
- Original smoothed RTT cant keep up with
wide/rapid variations in RTT - Solution
- Base RTO on both the average RTT and
variance/standard deviation of RTT estimate - Should have the property that
- When stddev is large, want RTO to stay above the
rapid oscillations and not timeout too often - i.e. set RTO Average RTT Nstddev
- When stddev is small, stay close to average RTT
9Jacobson/Karels Retransmission Timeout (2)
- Err current RTT old Ave RTT A
- Next A old A gErr, g0.125
- Next Std Dev D old D h(Err-old D), h0.25
- RTO A 4 Next D
10TCP Congestion Control
- Flow control addresses congestion at receiver,
not in middle of network - Suppose you initially send up to the size of flow
control window - Intermediate routers may not be able to handle so
much traffic - Congestion overflows router buffers causing lost
packets causing retransmissions causing more
congestion congestion collapse in late 80s - Van Jacobsen observation
- Send only enough packets into the network that
the network has the capacity to handle without
loss
11TCP Congestion Control (2)
- Define a congestion window CW
- Distinct from flow control window FW
- Actual window size W min (CW, FW)
- of data bytes that can be on the link
- If receiver is slowest, then W FW
- Else if network is slowest, then W CW
- Send no more data than the bottleneck can handle
12TCP Congestion Control (3)
- How does sender set CW?
- Adaptively probe the network with data segments
- Keep expanding the window until a segment is
lost, then contract window. - Continue with expand/contract cycle throughout
connection sawtooth behavior
13TCP Slow Start
- The rate at which new packets should be injected
into network is the rate at which ACKs are
returned by other end - Use ACKs to pace transmission of packets
self-clocking - Start by setting CW 1 segment (in bytes)
- Initial segment size set by receiver
- For each ACK that returns, increment CW by one.
- Send 1 packet. When ACK returns, increment CW,
CW2 - Send 2 packets. When 1st ACK returns, increment
CW to CW3, when 2nd ACK returns, increment CW to
CW4 - Can send 4 packets. After 4 ACKs return, CW will
be up to 8 - Exponential increase not slow, quickly reach
window size that the network can accommodate
14TCP Slow Start
CW1
CW2
CW3
CW4
CW8
15TCP Additive Increase/ Multiplicative Decrease
- How does a sender detect that CW is too large?
- It starts to see timeouts, which are interpreted
as packet loss due to congestion - After a timeout
- TCP remembers that congestion occurred near CW by
storing CW/2 in ssthresh CW/2 - ssthresh Slow Start Threshold
- TCP drastically resets CW1 and slow starts again
- But TCP exponentially increases only to ssthresh,
halfway to old congestion mark - After CWgtssthresh, additively increase CW
- Rationale Be cautious about sending new data
packets once you get near old mark that caused
timeouts/congestion
16TCP Additive Increase/ Multiplicative Decrease (2)
- Additive increase
- If entire windows worth (CW) of packets in a RTT
is ACKed w/o error, then increment CW by one - In practice, TCP adds a/CW to CW as each ACK
returns, rather than waiting for a full CW of
ACKs to return
17TCP Additive Increase
CW1
CW2
CW3
CW4
18TCP Additive Increase/ Multiplicative Decrease (3)
- Why not just slow start exclusively (exponential
increase) after timeout, instead of additive
increase? - Rationale Be more cautious about adding new
packets once youre near old congestion point, so
dont do exponential increase exclusively - Each time a timeout occurs, divide CW by half and
store in ssthresh multiplicative decrease - Minimum ssthresh and minimum CW is one
19TCP Additive Increase/ Multiplicative Decrease (4)
- Why not additive decrease instead of
multiplicative decrease after congestion? - Consequences of having a too-large congestion
window are worse than having a too-small CW - Adds to congestion
- Additive decrease can keep CW too large for too
long compared to multiplicative decrease
20TCP Saw Tooth Behavior
Congestion Window
Timeouts may still occur
Time
Slowstart to pace packets
Fast Retransmit and Recovery
Initial Slowstart
Courtesy Srini Seshan
21TCP Additive Increase/ Multiplicative Decrease (5)
- What happens if the amount of unacknowledged data
is greater than CW? - Cant send new data
- Retransmit unacknowledged data
- Wait for ACKs for unacknowledged data to increase
CW above size of unacknowledged data, then can
send new data - After a timeout, TCP slows down in two ways
- Congestion window collapses, restricting new data
- RTO backs off exponentially, slowing down
retransmission of old unacknowledged data
22TCP Tahoe vs. TCP Reno
- TCP Tahoe
- Slow start
- Additive increase after hitting ssthresh
- Multiplicative decrease and slow start after
timeout - Fast Retransmit
- TCP Reno
- TCP Tahoe Fast Recovery
- Observation packet loss can be inferred not only
by timeouts, but also by duplicate ACKs - Widely deployed on most UNIX systems, though
SACK-TCP is now gaining prominence
23TCP Tahoe vs. TCP Reno (2)
- How should TCP react if it receives duplicate
cumulative ACKs? - This is a sign that some but not all packets are
getting through to receiver, out of order - Dont react as harshly as when there is a timeout
- If 3 duplicate ACKs are received, then infer that
one segment has been lost - Retransmit immediately, rather than wait for a
timeout called Fast Retransmit - Cancel slow start, and drop CW to half its value
(approximately) rather than to one called Fast
Recovery
24TCP Saw Tooth Behavior
Congestion Window
Timeouts may still occur
Time
Slowstart to pace packets
Fast Retransmit and Recovery
Initial Slowstart
Courtesy Srini Seshan
25Congestion Avoidance
- Congestion control
- Cycle of actively probing, transmitting more than
the network can handle, then backing off - Congestion avoidance
- Back off before there are packet losses
- How can you tell that congestion is increasing?
- Look at RTT is it expanding?
- Be informed by routers that there is congestion
- Set a bit in the packet DECbit
26Congestion Avoidance (2)
- DECbit
- Implemented in Digital Network Architecture (DNA)
- Could be implemented in TCP/IP
- Router sets a congestion bit, and receiver sends
back ACK with a congestion bit set at the
transport layer - If lt50 of packets had congestion bit set, then
add one to CW - If gt50 of packets had congestion bit set, then
decrease CW by 0.875
27Congestion Avoidance (3)
- RED Random Early Detection
- Rather than explicitly setting a congestion bit,
drop a packet before queues are completely full - Controversial why drop a packet until you have
to? - Early signals to end host via timeout or
duplicate ACK that router is getting full - Policy
- If avg queue len lt minthresh, queue packet
- If minthresh lt avg queue len lt maxthresh, drop
packet with probability p - If avg queue len gt maxthresh, drop packet
28Congestion Avoidance (4)
- Source-based congestion avoidance look at RTT
and back off early in your transmissions - Example during two RTTs, if avg RTT gt (min RTT
max RTT)/2, then decrease CW by factor of 8 - Example compare throughput, keep incrementing CW
(CWn1CW1) until - if send rate at RTT(n1) lt 0.5send rate at
RTT(n), then decrease CW by one