Title: TCP: Overview
Transport Layer
Part 2
TCP Flow Control, Congestion Control, Connection Management, etc.
Encapsulation in TCP/IP
[Figure: a TCP segment carried inside an IP datagram]
TCP Overview
Error detection, retransmission, cumulative ACKs, timers, and header fields for sequence and ACK numbers
- full duplex data:
  - bi-directional application data flow in the same connection
  - MSS: maximum segment size
- connection-oriented:
  - handshaking (exchange of control messages) initializes sender and receiver state before data exchange
- flow controlled:
  - the sender will not "flood" the receiver with data
- point-to-point:
  - one sender, one receiver
- reliable, in-order byte stream:
  - no message boundaries
- pipelined:
  - TCP congestion and flow control set the window size
- send and receive buffers
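[Figure: the sending application writes data through its socket "door" into the TCP send buffer; TCP sends segments to the TCP receive buffer, from which the receiving application reads data through its socket "door"]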
Recall
Reliable Data Transfer Mechanisms
- Checksum: verifies the integrity of a packet
- Timer: signals that a retransmission is required
- Sequence number: keeps track of which packets have been sent and received
- ACK / NAK: indicates receipt of a packet in good or bad form
- Window, pipelining: allows the sender to send multiple yet-to-be-acknowledged packets
Internet Checksum Example
- Note:
  - when adding numbers, a carryout from the most significant bit needs to be added to the result (wraparound)
- Example: add two 16-bit integers

                1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
  data          1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

  wraparound  1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

  sum           1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0
  checksum      0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

To check: the receiver adds the data words and the checksum (again with wraparound); if no errors are detected, the result is all ones:
                1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
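The checksum arithmetic above can be sketched in C. This is a minimal sketch, assuming the data has already been split into 16-bit words in host byte order; the function names ones_complement_sum and internet_checksum are hypothetical, and real implementations (see RFC 1071) also deal with odd-length data and byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Ones'-complement sum of 16-bit words: any carryout from the most
 * significant bit is wrapped around and added back into the result. */
uint16_t ones_complement_sum(const uint16_t *words, size_t n)
{
    assert(words != NULL || n == 0);
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += words[i];
        sum = (sum & 0xFFFFu) + (sum >> 16); /* wraparound */
    }
    return (uint16_t)sum;
}

/* The checksum is the ones' complement (bitwise NOT) of the sum. */
uint16_t internet_checksum(const uint16_t *words, size_t n)
{
    return (uint16_t)~ones_complement_sum(words, n);
}
```

With the two words from the example (0xE666 and 0xD555), the sum after wraparound is 0xBBBC and the checksum is 0x4443; adding 0xE666, 0xD555 and 0x4443 gives 0xFFFF, the all-ones result the receiver checks for.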
Connection Oriented Transport: TCP
- TCP Segment Structure
- SEQ and ACK numbers
- Calculating the Timeout Interval
- The Simplified TCP Sender
- ACK Generation Recommendation (RFC 1122, RFC 2581)
- Interesting Transmission Scenarios
- Flow Control
- TCP Connection Management
TCP segment structure
[Figure: TCP segment structure, showing the header (including the sequence number and acknowledgement number fields) followed by the data]
We can view these teeny-weeny details using
Ethereal.
In practice, PSH, URG, and the Urgent Data
Pointer are not used.
Example
Suppose that a process in Host A wants to send a stream of data to a process in Host B over a TCP connection.
Assume:
- the data stream is a file consisting of 500,000 bytes
- MSS = 1,000 bytes
- the first byte of the data stream is numbered 0
TCP constructs 500 segments out of the data stream:
500,000 bytes / 1,000 bytes = 500 segments
TCP sequence numbers and ACKs
[Figure: bytes 0, 1, 2, ..., 999 of the stream form Segment 1; bytes 1000, 1001, ..., 1999 form Segment 2; and so on]
- Sequence numbers:
  - byte-stream "number" of the first byte in the segment's data
  - do not necessarily start from 0; a random initial number R is used
    - Segment 1: 0 + R
    - Segment 2: 1000 + R, etc.
- ACKs (acknowledgments):
  - sequence number of the next byte expected from the other side (last byte received + 1)
  - cumulative ACK
    - if segment 1 has been received, the receiver waits for segment 2
    - e.g. ACK = 1000 + R (bytes up to 999 + R have been received)
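The numbering rules above can be sketched as three small C helpers, assuming every segment carries a full MSS and segments are numbered from 1; the helper names are hypothetical. Unsigned 32-bit arithmetic wraps modulo 2^32, as TCP sequence numbers do.

```c
#include <assert.h>
#include <stdint.h>

/* Number of segments needed for a byte stream, rounding up. */
uint32_t segment_count(uint32_t stream_bytes, uint32_t mss)
{
    assert(mss > 0);
    return (stream_bytes + mss - 1) / mss;
}

/* Sequence number of segment k (k = 1, 2, ...) when the first byte of
 * the stream is numbered with the initial sequence number r. */
uint32_t segment_seq(uint32_t r, uint32_t mss, uint32_t k)
{
    assert(k >= 1);
    return r + (k - 1) * mss; /* wraps modulo 2^32 */
}

/* Cumulative ACK after segments 1..k arrived in order:
 * the next byte expected, i.e. last byte received + 1. */
uint32_t cumulative_ack(uint32_t r, uint32_t mss, uint32_t k)
{
    return r + k * mss;
}
```

With R = 12345 and MSS = 1,000: segment_count(500000, 1000) is 500, segment 2 starts at 13345, and after segment 1 arrives the cumulative ACK is 13345 (that is, 1000 + R).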
TCP sequence numbers and ACKs
[Figure: exchange of sequence and ACK numbers between a client and a server]
- Q: how does the receiver handle out-of-order segments?
  - A: the TCP specification does not say; it is decided when implementing TCP
Assume that the starting sequence numbers for Host A and Host B are 42 and 79, respectively.
- Client: "I'm sending data starting at seq. num 42."
- Server: "Send me the bytes from 43 onward." The ACK is piggy-backed on server-to-client data.
Yet another server echo example
- Host A sends: Seq = 42, ACK = 79; later Seq = 47, ACK = 84
- Host B sends: Seq = 79, ACK = 47; later Seq = 84, ACK = 50
- e.g. Seq = 84, ACK = 50, data = 200
The ACK number tells up to which byte data has been received, and which byte the host expects to receive next.
TCP Round Trip Time and Timeout
Main issue: how long should the sender wait before retransmitting a packet?
- Q: how to set the TCP timeout value?
  - longer than the RTT
    - but the RTT varies
  - too short: premature timeout
    - unnecessary retransmissions
  - too long: slow reaction to segment loss
- RTT: round trip time
- Q: how to estimate the RTT?
  - SampleRTT: measured time from segment transmission until ACK receipt
    - ignore retransmissions and cumulatively ACKed segments
  - SampleRTT will vary; we want the estimated RTT to be "smoother"
    - use several recent measurements, not just the current SampleRTT
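TCP Round Trip Time and Timeout
- Setting the timeout:
  - EstimatedRTT plus a "safety margin"
    - large variation in EstimatedRTT -> larger safety margin
  - estimate how much SampleRTT deviates from EstimatedRTT (recommended value of x = 0.25):
    $$\text{Deviation} = (1-x)\cdot\text{Deviation} + x\cdot\left|\text{SampleRTT}-\text{EstimatedRTT}\right|$$
  - then set the timeout interval:
    $$\text{TimeoutInterval} = \text{EstimatedRTT} + 4\cdot\text{Deviation}$$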
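A minimal C sketch of the estimator and the timeout rule, assuming EstimatedRTT is smoothed with weight alpha (0.125 in the sample calculations) and that the first sample initializes EstimatedRTT = SampleRTT and Deviation = SampleRTT / 2 (an assumption borrowed from RFC 6298, not from these slides). Deviation is updated with the previous EstimatedRTT before EstimatedRTT itself changes, which is the ordering RFC 6298 specifies. The struct and function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical RTT estimator state; times in seconds. */
struct rtt_estimator {
    double estimated_rtt;
    double deviation;
    int has_sample;
};

/* alpha: EWMA weight for EstimatedRTT; x: weight for Deviation. */
void rtt_update(struct rtt_estimator *e, double sample_rtt, double alpha, double x)
{
    assert(alpha > 0.0 && alpha < 1.0 && x > 0.0 && x < 1.0);
    if (!e->has_sample) {
        /* First measurement (assumption taken from RFC 6298). */
        e->estimated_rtt = sample_rtt;
        e->deviation = sample_rtt / 2.0;
        e->has_sample = 1;
        return;
    }
    double diff = sample_rtt - e->estimated_rtt;
    if (diff < 0.0)
        diff = -diff;
    e->deviation = (1.0 - x) * e->deviation + x * diff;
    e->estimated_rtt = (1.0 - alpha) * e->estimated_rtt + alpha * sample_rtt;
}

/* TimeoutInterval = EstimatedRTT + 4 * Deviation */
double rtt_timeout(const struct rtt_estimator *e)
{
    return e->estimated_rtt + 4.0 * e->deviation;
}
```

After a single 100 ms sample the timeout is 0.1 + 4 × 0.05 = 0.3 s; a second identical sample shrinks Deviation to 0.0375 s and the timeout to 0.25 s, so steady RTTs earn a smaller safety margin.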
Sample Calculations
$$\text{EstimatedRTT} = 0.875\cdot\text{EstimatedRTT} + 0.125\cdot\text{SampleRTT}$$
- After the receipt of the ACK of segment 1: EstimatedRTT = RTT for segment 1 = 0.02746 second
- After the receipt of the ACK of segment 2: EstimatedRTT = $0.875 \times 0.02746 + 0.125 \times 0.035557 = 0.0285$
- After the receipt of the ACK of segment 3: EstimatedRTT = $0.875 \times 0.0285 + 0.125 \times 0.070059 = 0.0337$
- After the receipt of the ACK of segment 4: EstimatedRTT = $0.875 \times 0.0337 + 0.125 \times 0.11443 = 0.0438$
- After the receipt of the ACK of segment 5: EstimatedRTT = $0.875 \times 0.0438 + 0.125 \times 0.13989 = 0.0558$
- After the receipt of the ACK of segment 6: EstimatedRTT = $0.875 \times 0.0558 + 0.125 \times 0.18964 = 0.0725$
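The sample calculations can be replayed with a short C function. This sketch carries full precision instead of rounding each intermediate estimate to four decimals as the slide does; the final value still rounds to 0.0725 s. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* EstimatedRTT = (1 - alpha) * EstimatedRTT + alpha * SampleRTT,
 * with the first estimate taken to be the first sample.
 * If history is non-NULL, it receives the estimate after each ACK. */
double replay_estimated_rtt(const double *samples, size_t n, double alpha, double *history)
{
    assert(samples != NULL && n > 0);
    double est = samples[0];
    if (history != NULL)
        history[0] = est;
    for (size_t i = 1; i < n; i++) {
        est = (1.0 - alpha) * est + alpha * samples[i];
        if (history != NULL)
            history[i] = est;
    }
    return est;
}
```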
RTT Samples and RTT estimates
[Figure: SampleRTT and EstimatedRTT, RTT (msec., 100 to 300) plotted against time]
The variations in the SampleRTT are smoothed out in the computation of the EstimatedRTT.
An Actual RTT estimation
FSM of TCP for Reliable Data Transfer
Simplified TCP sender, assuming:
- one-way data transfer
- no flow or congestion control
Simplified TCP Sender
- Assumptions:
  - the sender is not constrained by TCP flow or congestion control
  - data from above is less than MSS in size
  - data transfer is in one direction only
The (single) retransmission timer is associated with the oldest unACKed segment.
TCP sender (simplified)

    NextSeqNum = InitialSeqNum
    SendBase = InitialSeqNum

    loop (forever) {
        switch(event)

        event: data received from application above
            create TCP segment with sequence number NextSeqNum
            if (timer currently not running)
                start timer
            pass segment to IP
            NextSeqNum = NextSeqNum + length(data)

        event: timer timeout
            retransmit not-yet-acknowledged segment with smallest sequence number
            start timer

        event: ACK received, with ACK field value of y
            if (y > SendBase) {
                SendBase = y
                if (there are currently not-yet-acknowledged segments)
                    start timer
            }
    } /* end of loop forever */

- Comment:
  - SendBase - 1: last cumulatively ACKed byte
- Example:
  - SendBase - 1 = 71; y = 73, so the receiver wants 73 onward
  - y > SendBase, so new data is ACKed
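The event loop above can be sketched in C as one handler per event. This is a simplified model under the slide's assumptions (one-way transfer, no flow or congestion control); the struct and function names are hypothetical, the timer is reduced to a flag, passing the segment to IP is omitted, and sequence-number wraparound is ignored in the y > SendBase comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state of the simplified, one-way TCP sender. */
struct simple_sender {
    uint32_t next_seq_num; /* NextSeqNum */
    uint32_t send_base;    /* SendBase: oldest unACKed byte */
    bool timer_running;
};

void sender_init(struct simple_sender *s, uint32_t initial_seq_num)
{
    s->next_seq_num = initial_seq_num;
    s->send_base = initial_seq_num;
    s->timer_running = false;
}

/* Event: data received from application above.
 * Returns the sequence number placed in the new segment. */
uint32_t sender_on_app_data(struct simple_sender *s, uint32_t length)
{
    uint32_t seq = s->next_seq_num;
    if (!s->timer_running)
        s->timer_running = true;   /* start timer */
    /* pass segment to IP: omitted in this sketch */
    s->next_seq_num += length;
    return seq;
}

/* Event: timer timeout. Returns the sequence number of the
 * not-yet-acknowledged segment with the smallest sequence number. */
uint32_t sender_on_timeout(struct simple_sender *s)
{
    s->timer_running = true;       /* restart timer */
    return s->send_base;
}

/* Event: ACK received, with ACK field value y. */
void sender_on_ack(struct simple_sender *s, uint32_t y)
{
    assert(y <= s->next_seq_num);  /* cannot ACK bytes never sent */
    if (y > s->send_base) {        /* new data is ACKed */
        s->send_base = y;
        /* keep the timer running only while unACKed data remains
         * (stopping it otherwise is an assumption of this sketch) */
        s->timer_running = (s->send_base < s->next_seq_num);
    }
}
```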
TCP with Modifications: Sender
Why wait for the timeout to expire when consecutive duplicate ACKs can be used to indicate a lost segment?
With Fast Retransmit: [Figure: TCP sender FSM extended with fast retransmit]
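Fast retransmit can be sketched as a counter of duplicate ACKs, assuming (as later in these slides) that three duplicate ACKs signal a segment loss. The tracker and function names are hypothetical; a real sender would then retransmit the segment starting at the duplicated ACK number without waiting for its timer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical duplicate-ACK tracker for fast retransmit. */
struct dupack_tracker {
    uint32_t last_ack; /* highest cumulative ACK seen so far */
    int dup_count;     /* duplicates of last_ack received */
};

/* Returns true when the segment starting at last_ack should be
 * retransmitted immediately, i.e. on the third duplicate ACK. */
bool on_ack_fast_retransmit(struct dupack_tracker *t, uint32_t y)
{
    if (y > t->last_ack) {        /* new data ACKed: reset the count */
        t->last_ack = y;
        t->dup_count = 0;
        return false;
    }
    if (y == t->last_ack) {
        t->dup_count++;
        assert(t->dup_count > 0);
        return t->dup_count == 3; /* fire once, on the 3rd duplicate */
    }
    return false;                 /* older ACK: ignore */
}
```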
TCP ACK generation [RFC 1122, RFC 2581]
The receiver does not discard out-of-order segments.
[Table: four receiver events (1 to 4) and the corresponding TCP receiver actions]
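The four receiver events are commonly summarized from RFC 1122 and RFC 2581 roughly as follows: (1) an in-order segment arrives and all earlier data is already ACKed: delay the ACK, waiting up to 500 ms for another segment; (2) an in-order segment arrives while another in-order segment has an ACK pending: immediately send one cumulative ACK for both; (3) an out-of-order segment with a higher-than-expected sequence number arrives: immediately send a duplicate ACK for the next expected byte; (4) a segment arrives that partially or completely fills a gap, starting at its lower end: immediately send an ACK. A C sketch of that decision follows, with hypothetical names and a simplified view of the receiver's state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum ack_action {
    ACK_DELAYED,              /* event 1: wait up to 500 ms, then ACK */
    ACK_IMMEDIATE_CUMULATIVE, /* event 2: one ACK for both in-order segments */
    ACK_DUPLICATE,            /* event 3: re-ACK the next expected byte */
    ACK_IMMEDIATE             /* event 4: ACK at once */
};

/* seq: first byte of the arriving segment; expected: next expected byte;
 * ack_pending: an earlier in-order segment still awaits its delayed ACK;
 * gap_exists: out-of-order data is buffered beyond `expected`. */
enum ack_action choose_ack(uint32_t seq, uint32_t expected, bool ack_pending, bool gap_exists)
{
    if (seq > expected)
        return ACK_DUPLICATE;            /* gap detected */
    if (seq == expected && gap_exists)
        return ACK_IMMEDIATE;            /* fills the gap at its lower end */
    if (seq == expected)
        return ack_pending ? ACK_IMMEDIATE_CUMULATIVE : ACK_DELAYED;
    /* seq < expected: old data; re-ACKing at once is an assumption here */
    assert(seq < expected);
    return ACK_IMMEDIATE;
}
```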
TCP Interesting Scenarios
Simplified TCP version
[Figure annotations: ACK = 120; the timer is restarted here for Seq = 92; ACK = 120; the segment with Seq = 100 is not retransmitted]
[Figure annotation: retransmission due to a lost ACK]
TCP Retransmission Scenario
[Figure: X marks a loss; the cumulative ACK = 120 reaches the sender]
The cumulative ACK avoids retransmission of the first segment.
TCP Modifications: Doubling the Timeout Interval
This provides a limited form of congestion control.
Congestion may get worse if sources continue to retransmit packets persistently.
Timer expiration is more likely caused by congestion in the network, so on each timeout:
$$\text{TimeoutInterval} = 2 \cdot \text{TimeoutInterval}_{\text{previous}}$$
After an ACK is received, TimeoutInterval is again derived from the most recent EstimatedRTT and DevRTT.
TCP acts more politely by increasing the TimeoutInterval, causing the sender to retransmit after longer and longer intervals.
Other modifications: check RFC 2018 (selective ACK).
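A minimal C sketch of the doubling rule and of the recomputation after an ACK, assuming the usual TimeoutInterval = EstimatedRTT + 4 · DevRTT rule (as in RFC 6298); the function names are hypothetical, and RFC 6298 also allows an upper bound (at least 60 seconds) on the backed-off value, which this sketch omits.

```c
#include <assert.h>

/* On each timer expiration, double the interval (exponential backoff). */
double timeout_after_expiry(double previous_interval)
{
    assert(previous_interval > 0.0);
    return 2.0 * previous_interval;
}

/* After an ACK arrives, recompute from the latest estimates. */
double timeout_after_ack(double estimated_rtt, double dev_rtt)
{
    assert(estimated_rtt > 0.0 && dev_rtt >= 0.0);
    return estimated_rtt + 4.0 * dev_rtt;
}
```

Two consecutive timeouts starting from 0.75 s give 1.5 s and then 3 s; once an ACK arrives with EstimatedRTT = 0.5 s and DevRTT = 0.125 s, the interval returns to 1 s.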
TCP Flow Control
- The receiver explicitly informs the sender of the (dynamically changing) amount of free buffer space
  - RcvWindow field in the TCP segment
- The sender keeps the amount of transmitted, unACKed data less than the most recently received RcvWindow
Flow control: the sender won't overrun the receiver's buffer by transmitting too much, too fast.
Flow Control: Receiver
Example: HOST A sends a large file to HOST B
Receiver: HOST B uses RcvWindow, LastByteRcvd, LastByteRead
[Figure: data from IP enters HOST B's receive buffer, and the application process reads from it]
HOST B tells HOST A how much spare room it has in
the connection buffer by placing its current
value of RcvWindow in the receive window field of
every segment it sends to HOST A.
Initially, RcvWindow = RcvBuffer.
As the application reads from the buffer, RcvWindow is updated as
$$\text{RcvWindow} = \text{RcvBuffer} - [\text{LastByteRcvd} - \text{LastByteRead}]$$
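The receiver-side formula translates directly into C. A minimal sketch, assuming byte counters that do not wrap; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead] */
uint32_t rcv_window(uint32_t rcv_buffer, uint32_t last_byte_rcvd, uint32_t last_byte_read)
{
    assert(last_byte_rcvd >= last_byte_read);
    uint32_t buffered = last_byte_rcvd - last_byte_read; /* received, not yet read */
    assert(buffered <= rcv_buffer);                      /* buffer never overflows */
    return rcv_buffer - buffered;
}
```

With a 4,096-byte buffer, RcvWindow starts at 4,096; after 3,000 bytes arrive and the application reads 1,000 of them, RcvWindow is 2,096.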
Flow Control: Sender
Example: HOST A sends a large file to HOST B
Sender: HOST A uses HOST B's RcvWindow, LastByteSent, LastByteACKed
[Figure: HOST A sends data to HOST B and receives ACKs from HOST B]
To ensure that HOST B's buffer does not overflow, HOST A maintains throughout the connection's life that
$$\text{LastByteSent} - \text{LastByteACKed} \le \text{RcvWindow}$$
Flow Control
Some issues to consider
RcvWindow is used by the connection to provide the flow control service.
What happens when the receive buffer of HOST B is full (that is, when RcvWindow = 0)?
TCP requires that HOST A continue to send segments with one data byte when HOST B's receive window is 0. Such segments will be ACKed by HOST B. Eventually, the buffer will have some space and the ACKs will contain RcvWindow > 0.
TCP sends a segment only when there is data or an ACK to send, so HOST B would never tell HOST A that space has opened up if HOST A stopped sending. Therefore, the sender must keep the connection alive with these one-byte segments.
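On the sender side, the invariant LastByteSent - LastByteACKed ≤ RcvWindow and the zero-window rule can be sketched together in C. This is a simplification: the function name is hypothetical, and it assumes the one-byte probe segment is sent only when no other data is unACKed.

```c
#include <assert.h>
#include <stdint.h>

/* How many new bytes HOST A may send now without violating
 * LastByteSent - LastByteACKed <= RcvWindow. */
uint32_t sendable_bytes(uint32_t last_byte_sent, uint32_t last_byte_acked, uint32_t rcv_window)
{
    assert(last_byte_sent >= last_byte_acked);
    uint32_t in_flight = last_byte_sent - last_byte_acked;
    if (rcv_window == 0)
        return in_flight == 0 ? 1 : 0; /* zero window: one-byte probe */
    return in_flight >= rcv_window ? 0 : rcv_window - in_flight;
}
```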
TCP Connection Management
- Recall: the TCP sender and receiver establish a "connection" before exchanging data segments
- Initialize TCP variables:
  - sequence numbers
  - buffers, flow control info (e.g. RcvWindow)
- Client: the connection initiator
  - In Java: Socket clientSocket = new Socket("hostname", portNumber);
  - In C (Winsock): connect()
- Server: contacted by the client
  - In Java: ServerSocket.accept() returns a new Socket for the connection
  - In C (Winsock): accept()

    /* client side: connect socket s to the server address in sin */
    if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) != 0) {
        printf("connect failed\n");
        WSACleanup();
        exit(1);
    }

    /* server side: accept a connection on listening socket s */
    ns = accept(s, (struct sockaddr *)&remoteaddr, &addrlen);
TCP Connection Management
Establishing a connection
- Three-way handshake:
  - Step 1: the client end system sends a TCP SYN control segment to the server (executed by TCP itself)
    - specifies the client's initial sequence number (client_isn)
  - Step 2: the server end system receives the SYN and replies with a SYNACK control segment
    - ACKs the received SYN
    - allocates buffers
    - specifies the server's initial sequence number (server_isn)
  - Step 3: the client ACKs the connection with
    - ACK = server_isn + 1
    - allocates buffers
    - SYN = 0
  - Connection established!
This is what happens when we create a socket for a connection to a server.
After the connection is established, segments can carry application-generated data (with SYN = 0).
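The sequence and ACK numbers of the three handshake segments can be sketched in C. A SYN consumes one sequence number, which is why each side ACKs the other's initial sequence number + 1. The struct and function names are hypothetical, and only the fields discussed on this slide are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of a segment's control fields. */
struct ctrl_segment {
    bool syn, ack_flag;
    uint32_t seq, ack;
};

/* Step 1: client sends SYN carrying client_isn. */
struct ctrl_segment make_syn(uint32_t client_isn)
{
    struct ctrl_segment s = { true, false, client_isn, 0 };
    return s;
}

/* Step 2: server ACKs the SYN (ACK = client_isn + 1) and picks server_isn. */
struct ctrl_segment make_synack(const struct ctrl_segment *syn, uint32_t server_isn)
{
    assert(syn->syn && !syn->ack_flag);
    struct ctrl_segment s = { true, true, server_isn, syn->seq + 1 };
    return s;
}

/* Step 3: client ACKs the SYNACK: ACK = server_isn + 1, SYN = 0. */
struct ctrl_segment make_final_ack(const struct ctrl_segment *synack)
{
    assert(synack->syn && synack->ack_flag);
    struct ctrl_segment s = { false, true, synack->ack, synack->seq + 1 };
    return s;
}
```

With client_isn = 42 and server_isn = 79, the SYNACK carries Seq = 79, ACK = 43, and the final ACK carries Seq = 43, ACK = 80, SYN = 0.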
TCP Connection Management (cont.)
How a TCP connection is established and torn down
- Closing a connection:
  - the client closes its socket:
    - C (Winsock): closesocket(s);
    - Java: clientSocket.close();
  - Step 1: the client end system sends a TCP FIN control segment to the server
  - Step 2: the server receives the FIN and replies with an ACK. It closes the connection and sends a FIN.
TCP Connection Management (cont.)
- Step 3: the client receives the FIN and replies with an ACK.
  - Enters "timed wait": it will respond with an ACK to any received FINs
- Step 4: the server receives the ACK. Connection closed.
- Note: with a small modification, simultaneous FINs can be handled.
[Figure: FIN and ACK segments exchanged between client and server]
TCP Connection Management (cont.)
[Figure: TCP server lifecycle and TCP client lifecycle, shown as state diagrams with numbered transitions]
The timed wait is used in case the final ACK gets lost. Its duration is implementation-dependent (e.g. 30 seconds, 1 minute, or 2 minutes).
When it ends, the connection formally closes and all resources (e.g. port numbers) are released.
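The client side of the connection lifecycle can be sketched as a small state machine in C. The states follow the standard TCP state names (RFC 793); the enum and function names are hypothetical, simultaneous close is not handled, and unexpected events simply leave the state unchanged.

```c
#include <assert.h>

enum client_state {
    CS_CLOSED, CS_SYN_SENT, CS_ESTABLISHED,
    CS_FIN_WAIT_1, CS_FIN_WAIT_2, CS_TIME_WAIT
};

enum client_event {
    EV_APP_CONNECT, EV_RCV_SYNACK, EV_APP_CLOSE,
    EV_RCV_ACK, EV_RCV_FIN, EV_WAIT_EXPIRED
};

/* Returns the next client state; the comment on each line names the
 * segment the client sends on that transition. */
enum client_state client_next(enum client_state s, enum client_event e)
{
    switch (s) {
    case CS_CLOSED:      return e == EV_APP_CONNECT  ? CS_SYN_SENT    : s; /* send SYN */
    case CS_SYN_SENT:    return e == EV_RCV_SYNACK   ? CS_ESTABLISHED : s; /* send ACK */
    case CS_ESTABLISHED: return e == EV_APP_CLOSE    ? CS_FIN_WAIT_1  : s; /* send FIN */
    case CS_FIN_WAIT_1:  return e == EV_RCV_ACK      ? CS_FIN_WAIT_2  : s;
    case CS_FIN_WAIT_2:  return e == EV_RCV_FIN      ? CS_TIME_WAIT   : s; /* send ACK */
    case CS_TIME_WAIT:   return e == EV_WAIT_EXPIRED ? CS_CLOSED      : s; /* release resources */
    }
    assert(0 && "unknown state");
    return s;
}
```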
End of Flow Control and Error Control
Flow Control vs. Congestion Control
Similar actions are taken, but for very different reasons.
- Flow control:
  - point-to-point traffic between sender and receiver
  - a speed-matching service: matches the rate at which the sender is sending against the rate at which the receiving application is reading
  - prevents the receiver buffer from overflowing
Congestion happens when there are too many sources attempting to send data at too high a rate for the routers along the path.
- Congestion control:
  - a service that makes sure that the routers between end systems are able to carry the offered traffic
  - prevents routers from overflowing
Same course of action: throttling the sender.
Principles of Congestion Control
- Congestion:
  - informally: "too many sources sending too much data too fast for the network to handle"
  - different from flow control!
- Manifestations:
  - lost packets (buffer overflow at routers)
  - long delays (queuing in router buffers)
- A top-10 problem!
Approaches towards congestion control
Two broad approaches towards congestion control:
- Network-assisted congestion control:
  - routers provide feedback to end systems in the form of:
    - a single bit indicating link congestion (SNA, DECbit, TCP/IP ECN, ATM ABR)
    - an explicit transmission rate the sender should send at
- End-to-end congestion control:
  - no explicit feedback from the network
  - congestion is inferred by end systems from observed packet loss and delay
  - the approach taken by TCP
TCP Congestion Control
How does the TCP sender limit the rate at which it sends traffic into its connection?
New variable: the congestion window, CongWin (kept by the sender)
$$(\text{Amount of unACKed data})_{\text{sender}} \le \min(\text{CongWin}, \text{RcvWindow})$$
By adjusting CongWin, the sender can therefore adjust the rate at which it sends data into its connection.
Assumptions:
- the TCP receive buffer is very large, so there is no RcvWindow constraint
  - therefore, the amount of unACKed data at the sender is solely limited by CongWin
- packet loss delay and packet transmission delay are negligible
$$\text{Sending rate} \approx \frac{\text{CongWin}}{\text{RTT}}$$
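The window rule and the rate approximation can be written as two C helpers; a minimal sketch with hypothetical names, sizes in bytes and RTT in seconds.

```c
#include <assert.h>
#include <stdint.h>

/* The sender may have at most min(CongWin, RcvWindow) bytes unACKed. */
uint32_t effective_window(uint32_t cong_win, uint32_t rcv_window)
{
    return cong_win < rcv_window ? cong_win : rcv_window;
}

/* Approximate sending rate in bytes/second: window / RTT. */
double approx_rate(uint32_t window_bytes, double rtt_seconds)
{
    assert(rtt_seconds > 0.0);
    return (double)window_bytes / rtt_seconds;
}
```

For CongWin = 8,000 bytes, RcvWindow = 64,000 bytes and RTT = 100 ms, the sender may have 8,000 bytes unACKed, for roughly 80,000 bytes/s.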
TCP Congestion Control
TCP uses ACKs to trigger ("clock") its increase in congestion window size: TCP is self-clocking.
The arrival of ACKs is an indication to the sender that all is well.
- ACKs arrive at a slow rate:
  - the congestion window will be increased at a relatively slow rate
- ACKs arrive at a high rate:
  - the congestion window will be increased more quickly
TCP Congestion Control
How does TCP perceive that there is congestion on the path?
Loss event: when there is excessive congestion, router buffers along the path overflow, causing datagrams to be dropped, which in turn results in a loss event at the sender:
- Timeout
  - no ACK is received after segment loss
- Receipt of three duplicate ACKs
  - segment loss is followed by three duplicate ACKs received at the sender
TCP Congestion Control details
- The sender limits transmission:
  $$\text{LastByteSent} - \text{LastByteAcked} \le \text{cwnd}$$
- Roughly:
  $$\text{rate} \approx \frac{\text{cwnd}}{\text{RTT}} \text{ bytes/sec}$$
- cwnd is dynamic, a function of perceived network congestion
- How does the sender perceive congestion?
  - loss event: timeout or 3 duplicate ACKs
  - the TCP sender reduces its rate (cwnd) after a loss event
- Three mechanisms:
  - AIMD
  - slow start
  - conservative after timeout events
TCP congestion avoidance: additive increase, multiplicative decrease
- Approach: increase the transmission rate (window size), probing for usable bandwidth, until loss occurs
  - additive increase: increase cwnd by 1 MSS every RTT until loss is detected
  - multiplicative decrease: cut cwnd in half after loss
[Figure: saw-tooth behavior, probing for bandwidth; cwnd (congestion window size) plotted against time]
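AIMD can be sketched per ACK in C. Additive increase is approximated by adding MSS · (MSS / cwnd) for each ACK, which adds about 1 MSS once a full window of ACKs has arrived; multiplicative decrease halves cwnd but, as assumed here, never below 1 MSS. The function names are hypothetical.

```c
#include <assert.h>

/* Additive increase, applied per ACK (sizes in bytes). */
double aimd_on_ack(double cwnd, double mss)
{
    assert(cwnd > 0.0 && mss > 0.0);
    return cwnd + mss * (mss / cwnd);
}

/* Multiplicative decrease on loss: halve, but keep at least 1 MSS. */
double aimd_on_loss(double cwnd, double mss)
{
    double half = cwnd / 2.0;
    return half < mss ? mss : half;
}
```

Starting from cwnd = 4,000 bytes with MSS = 1,000, four ACKs raise cwnd to about 4,920 bytes, slightly under the idealised +1 MSS because each increment uses the already-grown cwnd.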
TCP Slow Start
- When the connection begins, increase the rate exponentially until the first loss event:
  - initially cwnd = 1 MSS
  - double cwnd every RTT
  - done by incrementing cwnd by 1 MSS for every ACK received
- Summary: the initial rate is slow but ramps up exponentially fast (doubling of the sending rate every RTT)
[Figure: Host A sends one segment to Host B, then two segments one RTT later, then four segments]
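The per-ACK rule above doubles the window every RTT. A C sketch in MSS units, assuming every segment of a window is ACKed within one RTT and no loss occurs; the function name is hypothetical.

```c
#include <assert.h>

/* Slow start: cwnd (in MSS) grows by 1 MSS per ACK received. */
unsigned slow_start_after_rtts(unsigned initial_cwnd_mss, unsigned rtts)
{
    assert(initial_cwnd_mss > 0);
    unsigned cwnd = initial_cwnd_mss;
    for (unsigned r = 0; r < rtts; r++) {
        unsigned acks = cwnd;        /* one ACK per segment sent this RTT */
        for (unsigned a = 0; a < acks; a++)
            cwnd += 1;               /* +1 MSS per ACK */
    }
    return cwnd;
}
```

Starting from 1 MSS, the window is 2, 4 and 8 MSS after one, two and three RTTs, matching the one, two and four segments in the figure.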
Refinement: inferring loss
- After 3 duplicate ACKs:
  - cwnd is cut in half
  - the window then grows linearly
- But after a timeout event:
  - cwnd is set to 1 MSS
  - the window then grows exponentially
  - up to a threshold, then grows linearly
Philosophy:
- 3 duplicate ACKs indicate that the network is capable of delivering some segments
- a timeout indicates a more alarming congestion scenario
Refinement
- Q: when should the exponential increase switch to linear?
- A: when cwnd gets to 1/2 of its value before the timeout.
- Implementation:
  - variable ssthresh (slow-start threshold)
  - on a loss event, ssthresh is set to 1/2 of cwnd just before the loss event
TCP Sender Congestion Control
State | Event | TCP Sender Congestion-Control Action | Commentary
Slow Start (SS) | ACK receipt for previously unACKed data | CongWin = CongWin + MSS; if (CongWin > Threshold), set state to Congestion Avoidance | Results in a doubling of CongWin every RTT
Congestion Avoidance (CA) | ACK receipt for previously unACKed data | CongWin = CongWin + MSS · (MSS / CongWin) | Additive increase, resulting in an increase of CongWin by 1 MSS every RTT
SS or CA | Loss event detected by triple duplicate ACK | Threshold = CongWin / 2; CongWin = Threshold; set state to Congestion Avoidance | Fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS
SS or CA | Timeout | Threshold = CongWin / 2; CongWin = 1 MSS; set state to Slow Start | Enter slow start
SS or CA | Duplicate ACK | Increment the duplicate ACK count for the segment being ACKed | CongWin and Threshold are not changed
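The congestion-control table can be turned into a small C state machine. This is a sketch: the names are hypothetical, sizes are in bytes stored as doubles, and the third duplicate ACK triggers the triple-duplicate-ACK row; flooring Threshold at 1 MSS is an assumption made so that CongWin never drops below 1 MSS.

```c
#include <assert.h>

enum cc_state { SLOW_START, CONGESTION_AVOIDANCE };

/* Hypothetical sender congestion-control state (sizes in bytes). */
struct cc_sender {
    enum cc_state state;
    double cong_win;
    double threshold;
    double mss;
    int dup_acks;
};

static double at_least_one_mss(double w, double mss) { return w < mss ? mss : w; }

/* ACK receipt for previously unACKed data. */
void cc_on_new_ack(struct cc_sender *s)
{
    assert(s->cong_win > 0.0 && s->mss > 0.0);
    s->dup_acks = 0;
    if (s->state == SLOW_START) {
        s->cong_win += s->mss;                          /* doubles every RTT */
        if (s->cong_win > s->threshold)
            s->state = CONGESTION_AVOIDANCE;
    } else {
        s->cong_win += s->mss * (s->mss / s->cong_win); /* ~ +1 MSS per RTT */
    }
}

/* Loss detected by triple duplicate ACK: multiplicative decrease. */
void cc_on_triple_dup_ack(struct cc_sender *s)
{
    s->threshold = at_least_one_mss(s->cong_win / 2.0, s->mss);
    s->cong_win = s->threshold;
    s->state = CONGESTION_AVOIDANCE;
}

/* Timeout: back to 1 MSS and slow start. */
void cc_on_timeout(struct cc_sender *s)
{
    s->threshold = at_least_one_mss(s->cong_win / 2.0, s->mss);
    s->cong_win = s->mss;
    s->state = SLOW_START;
}

/* Duplicate ACK: count it; CongWin and Threshold are unchanged
 * unless this is the third duplicate. */
void cc_on_dup_ack(struct cc_sender *s)
{
    if (++s->dup_acks == 3)
        cc_on_triple_dup_ack(s);
}
```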
Summary: TCP Congestion Control
TCP's Congestion Control Service
Problem: gridlock sets in when there is packet loss due to router congestion.
The sending system's packet is lost due to congestion, and the sender is alerted when it stops receiving ACKs for the packets it sent.
[Figure: CLIENT and SERVER communicating across a congested network]
TCP's congestion control service forces the end systems to decrease the rate at which packets are sent during periods of congestion.
Macroscopic Description of TCP throughput
(Based on an idealised model for the steady-state dynamics of TCP)
- What's the average throughput of TCP as a function of window size and RTT?
  - ignore slow start (typically very short phases)
- Let W be the window size when loss occurs.
  - when the window is W, throughput is W/RTT
  - just after loss, the window drops to W/2, and throughput to W/(2 RTT)
  - throughput then increases linearly (by MSS/RTT every RTT)
- Average throughput = 0.75 W/RTT
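The 0.75 factor comes from averaging a linear ramp. Under the slide's idealised model (slow start ignored, throughput growing linearly from W/(2 RTT) back up to W/RTT between losses), the mean of a linearly varying quantity is the mean of its two endpoints:

```latex
\text{Average throughput}
  = \frac{1}{2}\left(\frac{W}{2\,\mathrm{RTT}} + \frac{W}{\mathrm{RTT}}\right)
  = \frac{3}{4}\cdot\frac{W}{\mathrm{RTT}}
  = 0.75\,\frac{W}{\mathrm{RTT}}
```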
TCP Futures: TCP over "long, fat pipes"
- Example: GRID computing application
- 1500-byte segments, 100 ms RTT, desired throughput of 10 Gbps
  - requires window size W = 83,333 in-flight segments
- Throughput in terms of loss rate L:
  $$\text{Throughput} = \frac{1.22 \cdot \text{MSS}}{\text{RTT}\,\sqrt{L}}$$
  - this requires $L = 2 \cdot 10^{-10}$: a very small loss rate! (1 loss event every 5 billion segments)
- New versions of TCP are needed for high-speed environments
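The two numbers on this slide can be reproduced in C, using the commonly cited relation Throughput = 1.22 · MSS / (RTT · √L) and taking the segment size as the MSS. A sketch with hypothetical function names.

```c
#include <assert.h>

/* Window (in segments) needed to sustain a target throughput:
 * W = throughput * RTT / segment size (in bits). */
double required_window_segments(double throughput_bps, double rtt_s, double segment_bytes)
{
    assert(segment_bytes > 0.0);
    return throughput_bps * rtt_s / (segment_bytes * 8.0);
}

/* Loss rate L at which 1.22 * MSS / (RTT * sqrt(L)) equals the target:
 * L = (1.22 * MSS / (RTT * throughput))^2, with MSS in bits. */
double required_loss_rate(double throughput_bps, double rtt_s, double segment_bytes)
{
    assert(throughput_bps > 0.0 && rtt_s > 0.0);
    double root = 1.22 * segment_bytes * 8.0 / (rtt_s * throughput_bps);
    return root * root;
}
```

For 1,500-byte segments, 100 ms RTT and 10 Gbps, these give about 83,333 segments and a loss rate of about 2.1 · 10^-10.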
TCP Fairness
- Fairness goal: if N TCP sessions share the same bottleneck link, each should get an average transmission rate of R/N, an equal share of the link's bandwidth
Analysis of 2 connections sharing a link
Assumptions:
- a link with a transmission rate of R
- each connection has the same MSS and RTT
- no other TCP connections or UDP datagrams traverse the shared link
- ignore the slow-start phase of TCP
- operating in congestion-avoidance mode (linear increase phase)
Goal: adjust the sending rates of the two connections to allow for equal bandwidth sharing
Why is TCP fair?
- Two competing sessions:
  - additive increase gives a slope of 1 as throughput increases
  - multiplicative decrease decreases throughput proportionally
[Figure: Connection 1 throughput (x-axis, 0 to R) versus Connection 2 throughput (y-axis, 0 to R), showing the equal bandwidth share line, the full bandwidth utilisation line, and the congestion avoidance (additive increase) trajectory]
A point on the graph depicts the amount of link bandwidth jointly consumed by the connections.
We can view a simulation of this.
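The convergence argument can be checked with a toy C simulation: two flows add the same increment each step (the slope-1 move) and both halve whenever their combined rate exceeds the link rate R. This is an idealised, hypothetical model (synchronized losses, equal RTTs), not a TCP implementation.

```c
#include <assert.h>

/* Two AIMD flows sharing a link of capacity r: each step both add `inc`;
 * when the sum exceeds r, both halve (losses assumed synchronized). */
void aimd_two_flows(double *x1, double *x2, double r, double inc, int steps)
{
    assert(r > 0.0 && inc > 0.0 && steps >= 0);
    for (int i = 0; i < steps; i++) {
        *x1 += inc;             /* additive increase: equal gains */
        *x2 += inc;
        if (*x1 + *x2 > r) {    /* loss event */
            *x1 /= 2.0;         /* multiplicative decrease */
            *x2 /= 2.0;
        }
    }
}
```

Starting from 10 and 70 with R = 100, additive increase leaves the gap between the flows unchanged while every loss halves it, so after 1,000 steps the two rates are essentially equal.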
The End
The following slides are for additional reading only.
TCP Latency Modeling
[Figure: multiple end systems sharing a link with transmission rate R bps; most use 1 TCP connection each, while a multithreaded implementation opens 3 TCP connections]
Loopholes in TCP:
- In practice, client/server applications with a smaller RTT grab available bandwidth more quickly as it becomes free; therefore, they have higher throughputs.
- Multiple parallel TCP connections allow one application to get a bigger share of the bandwidth.
TCP latency modeling
Latency: the time from when the client initiates a TCP connection until the client receives the requested object in its entirety.
Q: How long does it take to receive an object from a Web server?
- TCP connection establishment time
- data transfer delay
- actual data transmission time
Two cases to consider:
- $WS/R > \mathrm{RTT} + S/R$: an ACK for the first segment in the window returns to the sender before a window's worth of data is sent
  - no data transfer delay
- $WS/R < \mathrm{RTT} + S/R$: the sender has to wait for an ACK after a window's worth of data is sent
  - there is data transfer delay
TCP Latency Modeling
[Figure: a CLIENT requests a FILE from a SERVER over a link with transmission rate R bps]
- O: size of the object in bits
- S: number of bits in an MSS (max. segment size)
Assumptions:
- the network is uncongested, with one link of rate R between the end systems
- CongWin (fixed) determines the amount of data that can be sent
- no packet loss, no packet corruption, no retransmissions required
- header overheads are negligible
- the file to send is an integer number of segments of size MSS
- connection establishment, request messages, ACKs, and TCP connection-establishment segments have negligible transmission times
- the initial threshold of the TCP congestion mechanism is very big
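TCP latency Modeling
Case Analysis: STATIC CONGESTION WINDOW
Case 1: $WS/R > \mathrm{RTT} + S/R$: an ACK for the first segment in the window returns to the sender before a window's worth of data is sent.
$$\text{latency} = 2\,\mathrm{RTT} + O/R$$
Number of segments: O/S; number of windows: $K = \lceil O/(WS) \rceil$ (rounded up to the nearest integer).
e.g. O = 256 bits, S = 32 bits, W = 4 segments: O/S = 8 segments, so K = 2 windows.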
TCP latency Modeling
Case Analysis: STATIC CONGESTION WINDOW
Case 2: $WS/R < \mathrm{RTT} + S/R$: the sender has to wait for an ACK after a window's worth of data is sent.
If there are K windows, the sender will be stalled (K - 1) times.
[Figure: stalled period between consecutive windows]
$$\text{latency} = 2\,\mathrm{RTT} + O/R + (K-1)\left[S/R + \mathrm{RTT} - WS/R\right]$$
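Both static-window cases can be combined in one C function: Case 1 gives 2 RTT + O/R and Case 2 gives 2 RTT + O/R + (K - 1)[S/R + RTT - WS/R] with K = ⌈O/(WS)⌉. A sketch under the slide's assumptions (sizes in bits, O a whole number of segments), with a hypothetical function name. At WS/R = RTT + S/R the stall term is zero, so the boundary can go to either case.

```c
#include <assert.h>

/* Static congestion window latency (sizes in bits, rate in bits/s,
 * RTT and result in seconds). w is the window in segments. */
double static_window_latency(unsigned long o_bits, unsigned long s_bits,
                             unsigned long w, double r_bps, double rtt)
{
    assert(s_bits > 0 && o_bits % s_bits == 0 && w > 0 && r_bps > 0.0);
    double s_r = (double)s_bits / r_bps;      /* S/R */
    double o_r = (double)o_bits / r_bps;      /* O/R */
    double ws_r = (double)w * s_r;            /* WS/R */
    if (ws_r >= rtt + s_r)                    /* Case 1: never stalls */
        return 2.0 * rtt + o_r;
    unsigned long segments = o_bits / s_bits;
    unsigned long k = (segments + w - 1) / w; /* K = ceil(O / (WS)) */
    return 2.0 * rtt + o_r + (double)(k - 1) * (s_r + rtt - ws_r);
}
```

With O = 256 bits, S = 32 bits, W = 4 and R = 32 bits/s (so S/R = 1 s): for RTT = 2 s this is Case 1 and the latency is 2·2 + 8 = 12 s; for RTT = 5 s it is Case 2 with K = 2, giving 10 + 8 + (1 + 5 - 4) = 20 s.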
Case Analysis: DYNAMIC CONGESTION WINDOW
[Figure: stalled periods when O/S = 15 segments are sent in 4 windows of 1, 2, 4 and 8 segments]
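Case Analysis: DYNAMIC CONGESTION WINDOW
- Let K be the number of windows that cover the object.
- We can express K in terms of the number of segments in the object as follows:
$$K = \min\{k : 2^0 + 2^1 + \cdots + 2^{k-1} \ge O/S\} = \min\{k : 2^k - 1 \ge O/S\} = \left\lceil \log_2\!\left(\frac{O}{S} + 1\right)\right\rceil$$
Note: the kth window contains $2^{k-1}$ segments.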
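Case Analysis: DYNAMIC CONGESTION WINDOW
- From the time the server begins to transmit the kth window until the time the server receives an ACK for the first segment in the window: $S/R + \mathrm{RTT}$
- Transmission of the kth window: $2^{k-1}\,S/R$
- Stall time: $\left[S/R + \mathrm{RTT} - 2^{k-1}\,S/R\right]^+$, where $[x]^+ = \max(x, 0)$
- Latency:
$$\text{latency} = 2\,\mathrm{RTT} + \frac{O}{R} + \sum_{k=1}^{K-1}\left[\frac{S}{R} + \mathrm{RTT} - 2^{k-1}\frac{S}{R}\right]^+$$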
Case Analysis: DYNAMIC CONGESTION WINDOW
- Let Q be the number of times the server would stall if the object contained an infinite number of segments:
$$Q = \left\lfloor \log_2\!\left(1 + \frac{\mathrm{RTT}}{S/R}\right)\right\rfloor + 1$$
- The actual number of times that the server stalls is $P = \min\{Q, K-1\}$.
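- Closed-form expression for the latency:
$$\text{latency} = 2\,\mathrm{RTT} + \frac{O}{R} + P\left[\mathrm{RTT} + \frac{S}{R}\right] - \left(2^P - 1\right)\frac{S}{R}$$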
Slow start will not significantly increase latency if $\mathrm{RTT} \ll O/R$.
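The dynamic-window latency can be computed by summing the stall times directly, assuming window k carries 2^(k-1) segments and that the stall after window k is S/R + RTT - 2^(k-1) S/R when positive and zero otherwise. A sketch with a hypothetical function name; times can be in any consistent unit.

```c
#include <assert.h>

/* Slow-start latency: segments = O/S, s_r = S/R, rtt in the same unit.
 * latency = 2 RTT + O/R + sum over k = 1..K-1 of the positive stall times. */
double dynamic_window_latency(unsigned long segments, double s_r, double rtt)
{
    assert(segments > 0 && s_r > 0.0 && rtt >= 0.0);
    /* K: smallest k with 1 + 2 + ... + 2^(k-1) >= segments */
    unsigned long k_windows = 0, covered = 0, window = 1;
    while (covered < segments) {
        covered += window;
        window *= 2;
        k_windows++;
    }
    double latency = 2.0 * rtt + (double)segments * s_r; /* 2 RTT + O/R */
    double win_segs = 1.0;                                /* 2^(k-1) */
    for (unsigned long k = 1; k < k_windows; k++) {
        double stall = s_r + rtt - win_segs * s_r;
        if (stall > 0.0)
            latency += stall;                             /* positive part only */
        win_segs *= 2.0;
    }
    return latency;
}
```

For O/S = 15 segments, S/R = 1 and RTT = 2 (same unit), K = 4 and the stalls after windows 1 to 3 are 2, 1 and 0, giving 2·2 + 15 + 3 = 22; the closed form with P = min{Q, K - 1} = 2 gives the same value, 4 + 15 + 2·3 - 3 = 22.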
- http://www1.cse.wustl.edu/~jain/cis788-97/ftp/tcp_over_atm/index.htm#atm-features