Title: TCP Traffic Control
1. Chapter 12
2. Introduction
- Performance implications of TCP Flow and Error Control
- Performance implications of TCP Congestion Control
- Performance of TCP/IP over ATM
3. TCP Flow and Error Control
- Uses a form of sliding window (GBN)
- Differs from the mechanism used in LLC, HDLC, X.25, and others
- Decouples acknowledgement of received data units from granting permission to send more
- TCP's flow control is known as a credit allocation scheme
- Each transmitted octet is considered to have a sequence number
- Each acknowledgement can grant permission to send a specific amount of additional data
- TCP window size ≤ Min(CongWin, RcvWin)
4. TCP Header Fields for Flow Control
- Sequence number (SN) of first octet in the data segment
- Acknowledgement number (AN) of next expected octet
- Window size (W) in octets
- An acknowledgement contains AN = i, W = j
- Octets through SN = i − 1 are acknowledged
- Permission is granted to send W = j more octets, i.e., octets i through i + j − 1
5. TCP Credit Allocation Mechanism
Note: the trailing edge of the window advances each time A sends data; the leading edge advances only when B grants additional credit.
6. Credit Allocation is Flexible
- Suppose the last message B issued was AN = i, W = j
- To increase credit to k (k > j) when no new data has arrived, B issues AN = i, W = k
- To acknowledge a segment containing m octets (m < j) without granting additional credit, B issues AN = i + m, W = j − m (see the sketch below)
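A minimal sketch of the (AN, W) bookkeeping just described; the class and method names are illustrative, not taken from any real TCP implementation:

    # Sketch of the credit-allocation arithmetic from this slide.
    class Receiver:
        def __init__(self, an, w):
            self.an = an   # AN: next expected octet
            self.w = w     # W: current credit in octets

        def ack_without_new_credit(self, m):
            """Acknowledge m octets without granting more credit:
            AN = i + m, W = j - m."""
            self.an += m
            self.w -= m
            return self.an, self.w

        def increase_credit(self, k):
            """Grant credit of k octets with no new data: AN = i, W = k."""
            self.w = k
            return self.an, self.w

    r = Receiver(an=1001, w=1400)
    print(r.ack_without_new_credit(400))  # -> (1401, 1000)
    print(r.increase_credit(2000))        # -> (1401, 2000)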
7. Flow Control Perspectives
8. Credit Policy
- The receiver needs a policy for how much credit to give the sender
- Conservative approach: grant credit up to the limit of available buffer space
- May limit throughput in long-delay situations
- Optimistic approach: grant credit based on the expectation of freeing space before the data arrives
9. Effect of TCP Window Size on Performance
- W = TCP window size (octets)
- R = data rate (bps) at the TCP source
- D = propagation delay (seconds) between source and destination
- After the TCP source begins transmitting, it takes D seconds for the first bits to arrive and D seconds for the acknowledgement to return (RTT = 2D)
- In that time the TCP source could transmit at most 2RD bits, or RD/4 octets
10. Maximum Normalized Throughput S

S = 1           if W ≥ RD/4
S = 4W / (RD)   if W < RD/4

where W = window size (octets), R = data rate (bps) at the TCP source, and D = propagation delay (seconds) between TCP source and destination, with 2D = RTT. Note: RD (bits) is known as the rate-delay product.
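As a quick check of the formula, a small illustrative function (names and example values are not from the source) that evaluates S for a given window, data rate, and delay:

    # S = 1 if W >= RD/4, else 4W/(RD); W in octets, R in bps, D in seconds.
    def normalized_throughput(w_octets, r_bps, d_seconds):
        rd = r_bps * d_seconds              # rate-delay product, in bits
        return min(1.0, 4 * w_octets / rd)

    # Example: 64 KB window on a 10 Mbps path with 50 ms one-way delay.
    print(normalized_throughput(65535, 10e6, 0.05))   # ~0.52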
11. Window Scale Parameter (optional header item)
[Figure: throughput limit curves for W = RD/4, W = 2^16 − 1, and W = 2^20 − 1, plotted against the rate-delay product RD.]
12. Complicating Factors
- Multiple TCP connections are multiplexed over the same network interface, reducing the data rate R seen by each connection (which raises the normalized throughput S achievable with a given window)
- For multi-hop connections, D is the sum of the delays across each network plus the delays at each router, increasing D (lowering S)
- If the source data rate R exceeds the data rate on one of the hops, that hop becomes a bottleneck (lowering S)
- Lost segments are retransmitted, reducing throughput; the impact depends on the retransmission policy (lowering S)
13. Retransmission Strategy
- TCP relies exclusively on positive acknowledgements and retransmission on acknowledgement timeout
- There is no explicit negative acknowledgement (NAK-less)
- Retransmission is required when:
- A segment arrives damaged, as indicated by a checksum error, causing the receiver to discard it
- A segment fails to arrive (implicit detection scheme)
14. TCP Timers
- A timer is associated with each segment as it is sent
- If the timer expires before the segment is acknowledged, the sender must retransmit
- Key design issue: the value of the retransmission timer
- Too small: many unnecessary retransmissions, wasting network bandwidth
- Too large: delay in handling a lost segment
15. Two Strategies
- The timer should be longer than the round-trip delay (send segment, receive ack)
- Round-trip delay is variable
- Strategies:
- Fixed timer
- Adaptive
16. Problems with Adaptive Scheme
- The peer TCP entity may accumulate acknowledgements and not acknowledge immediately
- For retransmitted segments, the sender can't tell whether an acknowledgement is a response to the original transmission or to the retransmission
- Network conditions may change suddenly
- These factors tend to introduce artificialities into the timer measurements at the TCP source (more later)
17. Adaptive Retransmission Timer
- Average round-trip time (ARTT):

ARTT(K+1) = [1/(K+1)] × Σ(i = 1 to K+1) RTT(i)
          = [K/(K+1)] × ARTT(K) + [1/(K+1)] × RTT(K+1)
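A short sketch (illustrative names and sample values only) showing that the incremental form above reproduces the plain average of the observed RTT samples:

    # ARTT(K+1) = (K/(K+1))*ARTT(K) + RTT(K+1)/(K+1)
    def update_artt(artt, k, rtt_new):
        """artt is the average over the first k samples; fold in sample k+1."""
        return (k / (k + 1)) * artt + rtt_new / (k + 1)

    samples = [100, 120, 90, 150, 110]        # RTTs in ms
    artt, k = 0.0, 0
    for rtt in samples:
        artt = update_artt(artt, k, rtt)
        k += 1
    print(artt, sum(samples) / len(samples))  # both 114.0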
18. RFC 793 Exponential Averaging
- Smoothed round-trip time (SRTT):

SRTT(K+1) = α × SRTT(K) + (1 − α) × RTT(K+1)

- The older the observation, the less it is counted in the average.
19. Exponential Smoothing Coefficients
20. Exponential Averaging
[Figure: exponential averaging applied to a decreasing function and to an increasing function.]
21. RFC 793 Retransmission Timeout

RTO(K+1) = Min(UB, Max(LB, β × SRTT(K+1)))

- UB, LB: prechosen fixed upper and lower bounds
- Example values for α, β: 0.8 < α < 0.9, 1.3 < β < 2.0
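A minimal sketch of the RFC 793 estimator; the specific α, β, LB, and UB values below are illustrative choices within the ranges above, not prescribed by the RFC:

    # SRTT(K+1) = a*SRTT(K) + (1-a)*RTT(K+1);  RTO = min(UB, max(LB, b*SRTT))
    ALPHA, BETA = 0.85, 1.5      # within the example ranges on this slide
    LB, UB = 1.0, 60.0           # assumed bounds, in seconds

    def update(srtt, rtt_sample):
        srtt = ALPHA * srtt + (1 - ALPHA) * rtt_sample
        rto = min(UB, max(LB, BETA * srtt))
        return srtt, rto

    srtt = 0.5
    for rtt in [0.4, 0.6, 1.2, 0.5]:
        srtt, rto = update(srtt, rtt)
        print(f"SRTT={srtt:.3f}s RTO={rto:.3f}s")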
22. TCP Implementation Policy Options (per RFC 793/1122)
- Send: free to transmit when convenient
- Deliver: free to deliver to the application when convenient
- Accept: how data are accepted
- In-order (out-of-order data is discarded)
- In-window (accept and buffer any data within the window)
- Retransmit: how to handle queued, but not yet acknowledged, data
- First-only (one timer per queue; retransmit the segment at the front of the queue)
- Batch (one timer per queue; retransmit the whole queue)
- Individual (one timer per segment; retransmit segment by segment)
- Acknowledge:
- Immediate (send an immediate ack for each good segment)
- Cumulative (timed wait, then piggyback a cumulative ack)
Performance implications?
23. TCP Congestion Control
- Dynamic routing can alleviate congestion by spreading the load more evenly
- But it is only effective for unbalanced loads and brief surges in traffic
- Congestion can only be effectively controlled by limiting the total amount of data entering the network
- The ICMP Source Quench message is crude and not effective
- RSVP may help, but is not widely implemented
24. TCP Congestion Control is Difficult
- IP is connectionless and stateless, with no provision for detecting or controlling congestion
- RFC 3168 adds ECN to IP, but it is not yet widely deployed
- TCP only provides end-to-end flow control
- There is no cooperative, distributed algorithm to bind together the various TCP entities
25. TCP Flow and Congestion Control
- The rate at which a TCP entity can transmit is determined by the rate of incoming ACKs to previous segments with new credit (TCP self-clocking)
- The rate of ACK arrival is determined by the round-trip path between source and destination
- The bottleneck may be the destination or the internet
- The sender cannot tell which
- Only an internet bottleneck can be due to congestion
26. TCP Segment Pacing (Self-Clocking)
- Congestion control: bottleneck in the network
- Flow control: bottleneck at the receiver
27. TCP Flow and Congestion Control: Potential Bottlenecks
- Physical bottlenecks: physical capacity constraints
- Logical bottlenecks: queuing effects due to load
28. TCP Congestion Control Measures
Note: TCP Tahoe and TCP Reno come from the Berkeley Unix TCP implementations.
29. Retransmission Timer Management
- Three techniques to calculate the retransmission timeout (RTO) value:
- RTT variance estimation
- Exponential RTO backoff
- Karn's algorithm
30. RTT Variance Estimation (Jacobson's Algorithm)
- Three sources of high variance in RTT:
- If the data rate is relatively low, transmission delay will be relatively large, with larger variance due to variance in packet size
- Load may change abruptly due to other sources
- The peer may not acknowledge segments immediately
31. Jacobson's Algorithm

SRTT(K+1) = (1 − g) × SRTT(K) + g × RTT(K+1)
SERR(K+1) = RTT(K+1) − SRTT(K)
SDEV(K+1) = (1 − h) × SDEV(K) + h × |SERR(K+1)|
RTO(K+1)  = SRTT(K+1) + f × SDEV(K+1)

- g = 0.125 (1/8)
- h = 0.25 (1/4)
- f = 2 or f = 4 (most current implementations use f = 4)
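A compact sketch of this estimator using the constants on the slide (g = 0.125, h = 0.25, f = 4); the variable names and sample RTTs are illustrative:

    # SRTT += g*SERR; SDEV = (1-h)*SDEV + h*|SERR|; RTO = SRTT + f*SDEV
    G, H, F = 0.125, 0.25, 4

    def jacobson_update(srtt, sdev, rtt_sample):
        serr = rtt_sample - srtt
        srtt = srtt + G * serr               # same as (1-G)*srtt + G*rtt
        sdev = (1 - H) * sdev + H * abs(serr)
        rto = srtt + F * sdev
        return srtt, sdev, rto

    srtt, sdev = 0.5, 0.0
    for rtt in [0.5, 0.7, 0.4, 1.0]:
        srtt, sdev, rto = jacobson_update(srtt, sdev, rtt)
        print(f"SRTT={srtt:.3f} SDEV={sdev:.3f} RTO={rto:.3f}")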
32. Jacobson's RTO Calculations
[Figure: RTO calculations for a decreasing function and for an increasing function.]
33. Two Other Factors
- Jacobson's algorithm can significantly improve TCP performance, but:
- What RTO should we use for retransmitted segments?
- ANSWER: the exponential RTO backoff algorithm
- Which round-trip samples should we use as input to Jacobson's algorithm?
- ANSWER: Karn's algorithm
34. Exponential RTO Backoff
- Increase the RTO each time the same segment is retransmitted (backoff process)
- Multiply the RTO by a constant:

RTO = q × RTO

- q = 2 is called binary exponential backoff (as in Ethernet CSMA/CD)
35. Which Round-trip Samples?
- If an ack is received for a retransmitted segment, there are two possibilities:
- The ack is for the first transmission
- The ack is for the second transmission
- The TCP source cannot distinguish between these two cases
- There is no valid way to calculate the RTT:
- From the first transmission to the ack, or
- From the second transmission to the ack?
36. Karn's Algorithm
- Do not use the measured RTT for retransmitted segments to update SRTT and SDEV
- Calculate the backoff RTO when a retransmission occurs
- Use the backoff RTO for segments until an ack arrives for a segment that has not been retransmitted
- Then resume using Jacobson's algorithm to calculate the RTO (sketched below)
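A rough sketch combining binary exponential backoff (q = 2) with Karn's rule of discarding RTT samples from retransmitted segments; the class and state names are illustrative, not taken from any real stack:

    Q, G, H, F = 2, 0.125, 0.25, 4     # backoff factor plus Jacobson constants

    class RtoManager:
        def __init__(self):
            self.srtt, self.sdev, self.rto = 1.0, 0.0, 1.0

        def on_retransmit(self):
            self.rto *= Q              # exponential backoff on each retransmission

        def on_ack(self, rtt_sample, was_retransmitted):
            if was_retransmitted:
                return                 # Karn: the RTT sample is ambiguous; ignore it
            serr = rtt_sample - self.srtt
            self.srtt += G * serr
            self.sdev = (1 - H) * self.sdev + H * abs(serr)
            self.rto = self.srtt + F * self.sdev   # resume Jacobson's RTO

    m = RtoManager()
    m.on_retransmit(); m.on_retransmit()
    print(m.rto)                               # 4.0 after two backoffs
    m.on_ack(1.2, was_retransmitted=True)      # ignored per Karn's rule
    m.on_ack(1.2, was_retransmitted=False)     # Jacobson estimation resumes
    print(round(m.rto, 3))                     # about 1.225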
37. Window Management
- Slow start
- Dynamic window sizing on congestion
- Fast retransmit
- Fast recovery
- ECN
- Other Mechanisms
38. Slow Start

awnd = MIN[credit, cwnd]

where
- awnd = allowed window, in segments
- cwnd = congestion window, in segments (assumes MSS bytes per segment)
- credit = amount of unused credit granted in the most recent ack (rcvwindow)
- cwnd = 1 for a new connection; during slow start it is increased by 1 for each ack received, up to a maximum
39. Effect of TCP Slow Start
40. Dynamic Window Sizing on Congestion
- A lost segment indicates congestion
- It is prudent (conservative) to reset cwnd to 1 and begin the slow start process again
- This may not be conservative enough: it is easy to drive a network into saturation but hard for the net to recover (Jacobson)
- Instead, use slow start followed by linear growth in cwnd after reaching a threshold value (see the sketch below)
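A toy sketch of the resulting cwnd growth pattern: exponential below the threshold (slow start), then linear above it (congestion avoidance). The segment-counted model and values are illustrative only:

    # Slow start doubles cwnd each RTT (+1 per ACK); congestion avoidance
    # adds roughly one segment per RTT once cwnd reaches ssthresh.
    def grow_cwnd(cwnd, ssthresh, acks_this_rtt):
        if cwnd < ssthresh:
            return cwnd + acks_this_rtt    # slow start
        return cwnd + 1                    # congestion avoidance (linear)

    cwnd, ssthresh = 1, 16
    for rtt in range(10):
        print(rtt, cwnd)                   # 1, 2, 4, 8, 16, 17, 18, ...
        cwnd = grow_cwnd(cwnd, ssthresh, acks_this_rtt=cwnd)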
41. Slow Start and Congestion Avoidance
42. Illustration of Slow Start and Congestion Avoidance
43. Fast Retransmit (TCP Tahoe)
- The RTO is generally noticeably longer than the actual RTT
- If a segment is lost, TCP may be slow to retransmit
- TCP rule: if a segment is received out of order, an ack must be issued immediately for the last in-order segment
- Tahoe/Reno fast retransmit rule: if 4 acks are received for the same segment (i.e., 3 duplicate acks), it is highly likely that a segment was lost, so retransmit immediately rather than waiting for the timeout
44. Fast Retransmit
Triple duplicate ACK
45. Fast Recovery (TCP Reno)
- When TCP retransmits a segment using fast retransmit, a segment is assumed lost
- Congestion avoidance measures are appropriate at this point
- E.g., the slow-start/congestion avoidance procedure
- This may be unnecessarily conservative, since the multiple ACKs indicate that segments are actually getting through
- Fast recovery: retransmit the lost segment, cut the threshold in half, set the congestion window to threshold + 3 segments, and proceed with linear increase of cwnd (see the sketch below)
- This avoids the initial slow start
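A simplified sketch of the Reno reaction to a triple duplicate ACK described above; real implementations track more state, and the inflation of cwnd by one segment per additional duplicate ACK follows standard Reno behavior rather than anything stated on this slide:

    # On the 3rd duplicate ACK: halve ssthresh, set cwnd = ssthresh + 3,
    # retransmit the missing segment, then grow cwnd linearly.
    def on_duplicate_ack(state, dup_count):
        if dup_count == 3:
            state["ssthresh"] = max(state["cwnd"] // 2, 2)
            state["cwnd"] = state["ssthresh"] + 3   # account for 3 buffered segments
            state["retransmit"] = True
        elif dup_count > 3:
            state["cwnd"] += 1      # each extra dup ACK means a segment left the network
        return state

    state = {"cwnd": 20, "ssthresh": 64, "retransmit": False}
    for dups in range(1, 6):
        state = on_duplicate_ack(state, dups)
    print(state)    # cwnd = 10 + 3 + 2 = 15, ssthresh = 10, retransmit = True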
46. Fast Recovery Example
[Figure: fast recovery (Reno) compared with Tahoe slow start.]
47. Explicit Congestion Notification in TCP/IP (RFC 3168)
- An ECN-capable router sets congestion bits in the IP header of packets to indicate congestion
- The TCP receiver sets bits in the TCP ACK header to return the congestion indication to its peer (the sender)
- TCP senders respond to a congestion indication as if a packet loss had occurred
- The IP bits live in the IPv4 TOS field / IPv6 Traffic Class field
48. Explicit Congestion Notification in TCP/IP (RFC 3168)
ECN field in the IP header (2 bits):
- 0 0: Not ECN-Capable Transport
- 0 1: ECT(1)
- 1 0: ECT(0)
- 1 1: Congestion Experienced (CE)
ECN flags in the TCP header (CWR, ECE):
- 0 0: Set-up: not-ECN-capable response
- 0 1: Receiver: ECN-Echo (CE packet received)
- 1 0: Sender: Congestion Window Reduced acknowledgement
- 1 1: Set-up: sent with SYN to indicate ECN capability
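A small table-driven sketch decoding the two-bit IP ECN field per the codepoints above; the function name is illustrative:

    # The ECN field is the low two bits of the IPv4 TOS / IPv6 Traffic Class byte.
    ECN_CODEPOINTS = {
        0b00: "Not ECN-Capable Transport",
        0b01: "ECT(1)",
        0b10: "ECT(0)",
        0b11: "Congestion Experienced (CE)",
    }

    def decode_ecn(tos_byte):
        return ECN_CODEPOINTS[tos_byte & 0b11]

    print(decode_ecn(0b00000011))   # Congestion Experienced (CE)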
49. TCP/IP ECN Protocol
- Hosts negotiate ECN capability during TCP connection setup
- TCP sender to TCP receiver: SYN with ECE and CWR set
- TCP receiver to TCP sender: SYN-ACK with ECE set
50. TCP/IP ECN: Some Other Considerations
- The sender sets CWR only on the first data packet after receiving ECE
- CWR is also set for a window reduction for any other reason
- ECT must not be set in retransmitted packets
- The receiver continues to send ECE in all ACKs until CWR is received
- Delayed ACKs: if any covered data packet has CE set, send ECE
- Fragmentation: if any fragment has CE set, send ECE
51. Some Other Mechanisms for TCP Congestion Control
- Limited Transmit: for small congestion windows, triggers fast retransmit with fewer than 3 duplicate ACKs
- Appropriate Byte Counting (ABC): the congestion window is modified based on the number of bytes acknowledged by each ACK, rather than by the number of ACKs that arrive
- Selective Acknowledgement: defines the use of the TCP selective acknowledgement (SACK) option
52. Performance of TCP over ATM
- How best to manage TCP's segment size, window management, and congestion control mechanisms
- at the same time as ATM's quality of service and traffic control policies
- TCP may operate end-to-end over one ATM network, or there may be multiple ATM LANs or WANs mixed with non-ATM networks
53. TCP/IP over AAL5/ATM
54. Performance of TCP over UBR
- Buffer capacity at the ATM switches is a critical parameter in assessing TCP throughput performance (why?)
- Insufficient buffer capacity results in lost TCP segments and retransmissions
- No segments are lost if each ATM switch has buffer capacity equal to or greater than the sum of the rcvwindows of all TCP connections through the switch (practical?)
55. Effect of Switch Buffer Size (example: Romanow and Floyd)
- Data rate of 141 Mbps
- End-to-end propagation delay of 6 µs
- IP packet sizes of 512 to 9180 octets
- TCP window sizes from 8 Kbytes to 64 Kbytes
- ATM switch buffer size per port from 256 to 8000 cells
- One-to-one mapping of TCP connections to ATM virtual circuits
- TCP sources have an infinite supply of data ready to send
56. Performance of TCP over UBR
57. Observations
- If a single cell is dropped, the other cells in the same IP datagram are unusable, yet the ATM network forwards these useless cells to the destination
- A smaller buffer increases the probability of dropped cells
- A larger segment size increases the number of useless cells transmitted if a single cell is dropped
58. Partial Packet and Early Packet Discard
- Reduce the transmission of useless cells
- Work on a per-virtual-channel basis
- Partial Packet Discard
- If a cell is dropped, drop all subsequent cells of that segment (i.e., up to and including the first cell with the SDU type bit set to one)
- Early Packet Discard
- When a switch buffer reaches a threshold level, preemptively discard all cells of a segment
59. Performance of TCP over UBR
60. ATM Switch Buffer Layout: Selective Drop and Fair Buffer Allocation
61. EPD Fairness: Selective Drop
- Ideally, N/V cells are buffered for each of the V virtual channels (N = total cells currently buffered)
- Weight ratio: W(i) = N(i) / (N/V) = N(i) × V / N
- Selective Drop:
- If N > R (a buffer-occupancy threshold) and W(i) > Z, then drop the next new packet on VC(i) (see the sketch below)
- Z is a parameter to be chosen (studies show the optimal Z is slightly less than 1)
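A sketch of the Selective Drop test just described, with W(i) = N(i) × V / N; the parameter values are illustrative:

    # Drop new packets on VC i when total occupancy N exceeds the threshold R
    # and VC i holds more than Z times its fair share of the buffered cells.
    def selective_drop(n_i, n_total, v, r_threshold, z=0.9):
        if n_total <= r_threshold:
            return False                  # below threshold: accept everything
        w_i = n_i * v / n_total           # weight ratio W(i) = N(i)/(N/V)
        return w_i > z

    # 4 VCs, 800 cells buffered, threshold 600 cells.
    print(selective_drop(n_i=300, n_total=800, v=4, r_threshold=600))  # True
    print(selective_drop(n_i=150, n_total=800, v=4, r_threshold=600))  # False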
62. Fair Buffer Allocation
- More aggressive dropping of packets as congestion increases
- Drop a new packet when:

N > R and W(i) > Z × (B − R) / (N − R)

- Note that the larger the portion of the safety zone (B − R) that is occupied, the smaller the number with which W(i) is compared (B = buffer capacity, R = threshold).
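The same test with the FBA scaling applied to Z, so the comparison value shrinks as the safety zone fills; the values below are illustrative:

    # Fair Buffer Allocation: drop when N > R and W(i) > Z*(B - R)/(N - R).
    def fba_drop(n_i, n_total, v, b_capacity, r_threshold, z=0.9):
        if n_total <= r_threshold:
            return False
        w_i = n_i * v / n_total
        limit = z * (b_capacity - r_threshold) / (n_total - r_threshold)
        return w_i > limit

    # As occupancy rises (B = 1000 cells, R = 600), the same per-VC share
    # gets dropped sooner because the limit shrinks.
    for n in (700, 800, 950):
        print(n, fba_drop(n_i=250, n_total=n, v=4, b_capacity=1000, r_threshold=600))
    # 700 False, 800 False, 950 True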
63. TCP over ABR
- Good performance of TCP over UBR can be achieved with minor adjustments to switch mechanisms (i.e., PPD/EPD)
- This reduces the incentive to use the more complex and more expensive ABR service
- Performance and fairness of ABR are quite sensitive to some ABR parameter settings
- Overall, ABR does not provide a significant performance advantage over the simpler and less expensive UBR-EPD or UBR-EPD-FBA