Title: TCP
TCP purpose
- TCP provides reliable data transmission over an unreliable network.
- TCP provides congestion control.
- TCP provides flow control.
- TCP passes messages.
- Inputs
  - Destination address
  - Destination port
  - Source port (socket)
  - Message
- Outputs
  - Message
  - Error reporting
- If TCP reports that the message has been delivered, then we can rest assured that the data has reached the receiving host's TCP (it is in the receive buffer). Whether the application reads it, and what it does with it, is another story.
- At least 85% of all traffic uses TCP, but I have heard that 50% of the traffic in South Korea uses UDP (gaming).
- UDP
  - No flow control
  - No error reporting (or very little error reporting)
(Figure: protocol stack. BGP, FTP, HTTP, SMTP, and telnet run over TCP; TCP and UDP run over IP, alongside ICMP and OSPF.)
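TCP header
- The IP header is 20 bytes (source IP, destination IP, protocol, TTL, ...).
- The TCP header is 20 bytes (without options):
  - Source port (16 bits)
  - Destination port (16 bits)
  - Sequence number (32 bits)
  - ACK number (32 bits)
  - Header length (4 bits)
  - Reserved (6 bits)
  - Flags: URG, ACK, PSH, RST, SYN, FIN (1 bit each)
  - Receiver window (16 bits)
  - Checksum (16 bits)
  - Urgent pointer (16 bits)
  - Options and padding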
- Ports are used so that a single host can have many connections at the same time. When a packet arrives, its connection is identified by the source IP, source port, destination IP, and destination port. More or less, the IPs and ports define an application.
- The sequence number indicates the first byte of the data.
- The ACK number is the next expected sequence number.
- The header length is in 32-bit words. 4 bits means the maximum header size is 15 × 4 = 60 bytes. 20 bytes are used by the fixed header, so up to 40 more bytes can be options.
- Flags
  - URG: the urgent pointer is valid (there is urgent data, e.g., Ctrl-C).
  - ACK: the ACK number is valid.
  - PSH: the receiver should pass this data to the application as soon as possible, as opposed to waiting for its buffer to fill. The sender sets PSH when this packet empties its outgoing buffer, so the receiver should not wait for a full buffer before passing data to the app. Just deliver it now.
  - RST: reset the connection (something went wrong; useful for detecting attacks).
  - SYN: synchronize sequence numbers.
  - FIN: the sender is finished sending data.
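The field layout and the header-length rule above can be checked with a short parser. This is a minimal sketch using Python's `struct` module; the function name `parse_tcp_header` and the sample bytes are made up for illustration, the six reserved bits are treated as reserved (as on the slide), and options are returned raw rather than decoded.

```python
import struct

FLAG_NAMES = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]  # from bit 0 upward

def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte TCP header and slice out any options."""
    src, dst, seq, ack, off_flags, rwin, checksum, urg = struct.unpack(
        "!HHIIHHHH", segment[:20]
    )
    header_len = (off_flags >> 12) * 4  # the 4-bit field counts 32-bit words
    flags = [name for bit, name in enumerate(FLAG_NAMES) if off_flags & (1 << bit)]
    return {
        "src_port": src, "dst_port": dst, "seq": seq, "ack": ack,
        "header_len": header_len, "flags": flags, "rwin": rwin,
        "checksum": checksum, "urgent_ptr": urg,
        "options": segment[20:header_len],
    }

# A hypothetical SYN/ACK: header length 5 words (no options), SYN and ACK bits set.
raw = struct.pack("!HHIIHHHH", 80, 50000, 197, 2198, (5 << 12) | 0x12, 65535, 0, 0)
print(parse_tcp_header(raw))
```

With the largest header-length value (15), the same formula gives 60 bytes, which is where the 40-byte limit on options comes from.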
Connection establishment
Node A initiates a connection with node B: node A performs an active open, and node B performs a passive open (listen).
- A sends a SYN: SYN=1, seq=2197, ACK flag=0.
- B sends a SYN/ACK: SYN=1, seq=197, ACK flag=1, ack=2198.
- A sends an ACK (for B's SYN): ACK flag=1, seq=2198, ack=198.
The initial sequence number depends on the implementation.
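The numbers in the handshake follow one rule: each SYN consumes one sequence number. A minimal sketch that reproduces the three segments from the two initial sequence numbers; the `Segment` class and `three_way_handshake` function are hypothetical names for illustration.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    syn: bool
    ack_flag: bool
    seq: int
    ack: int  # only meaningful when ack_flag is set

def three_way_handshake(isn_a, isn_b):
    """Segments of a three-way handshake; A opens actively, B passively."""
    syn = Segment(syn=True, ack_flag=False, seq=isn_a, ack=0)
    # The SYN consumes one sequence number, so B acknowledges isn_a + 1.
    syn_ack = Segment(syn=True, ack_flag=True, seq=isn_b, ack=isn_a + 1)
    # A's final ACK starts at isn_a + 1 and acknowledges B's SYN.
    final_ack = Segment(syn=False, ack_flag=True, seq=isn_a + 1, ack=isn_b + 1)
    return [syn, syn_ack, final_ack]

for seg in three_way_handshake(2197, 197):
    print(seg)
```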
Connection establishment
- If the first SYN is dropped, it is resent 3 seconds later. If that one is dropped, the next is resent 6 seconds later, and so on (the wait doubles each time). The maximum waiting time is 64 seconds, although it can be as high as 180 seconds; this depends on the implementation.
- If the listener does not get an ACK for its SYN/ACK, it retransmits the SYN/ACK after 3 seconds and backs off in the same way.
- But if the listener gets a data packet, that packet has the ACK flag set, and this ends the connection establishment.
- Connection setup data is often included in the options during connection establishment.
  - E.g., the maximum segment size is included in the options.
  - More options are discussed later.
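The retransmission schedule described above is exponential back-off with a cap. A minimal sketch, assuming a 3-second initial wait, doubling, and a 64-second cap (the function name and the number of tries are illustrative; real stacks differ).

```python
def backoff_schedule(initial=3.0, cap=64.0, tries=8):
    """Waits between successive SYN retransmissions under capped exponential back-off."""
    waits = []
    wait = initial
    for _ in range(tries):
        waits.append(wait)
        wait = min(2 * wait, cap)  # double, but never exceed the cap
    return waits

print(backoff_schedule())
# [3.0, 6.0, 12.0, 24.0, 48.0, 64.0, 64.0, 64.0]
```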
Connection termination
- The FIN flag means that no more data will be sent from that host.
- A FIN from each side closes the connection.
- A FIN from only one side puts the connection in the half-closed state.
- Example: node A closes first.
  - A sends a packet with FIN=1 and seq=U (A enters FIN_WAIT_1).
  - B responds with an ACK, ack=U+1 (B enters CLOSE_WAIT).
  - A receives the ACK (A enters FIN_WAIT_2).
  - Now B closes.
  - B sends a packet with FIN set and seq=V (B enters LAST_ACK).
  - A responds with an ACK, ack=V+1 (A enters TIME_WAIT, stays there for 120 seconds, and then enters CLOSED).
  - B receives the ACK and enters CLOSED.
- Use netstat to determine the state of the TCP connections.
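The close sequence above can be written as a small transition table. This is a minimal sketch that covers only the path in the example (A closes first); it leaves out simultaneous close and the CLOSING state, and the event names are made up for illustration.

```python
# (state, event) -> next state, for the close sequence in the example above.
TRANSITIONS = {
    ("ESTABLISHED", "app_close/send_FIN"): "FIN_WAIT_1",
    ("FIN_WAIT_1", "recv_ACK"): "FIN_WAIT_2",
    ("FIN_WAIT_2", "recv_FIN/send_ACK"): "TIME_WAIT",
    ("TIME_WAIT", "timer_expired"): "CLOSED",
    ("ESTABLISHED", "recv_FIN/send_ACK"): "CLOSE_WAIT",
    ("CLOSE_WAIT", "app_close/send_FIN"): "LAST_ACK",
    ("LAST_ACK", "recv_ACK"): "CLOSED",
}

def run(events, state="ESTABLISHED"):
    trace = [state]
    for event in events:
        state = TRANSITIONS[(state, event)]  # KeyError: event not covered by this sketch
        trace.append(state)
    return trace

node_a = run(["app_close/send_FIN", "recv_ACK", "recv_FIN/send_ACK", "timer_expired"])
node_b = run(["recv_FIN/send_ACK", "app_close/send_FIN", "recv_ACK"])
print(node_a)  # ['ESTABLISHED', 'FIN_WAIT_1', 'FIN_WAIT_2', 'TIME_WAIT', 'CLOSED']
print(node_b)  # ['ESTABLISHED', 'CLOSE_WAIT', 'LAST_ACK', 'CLOSED']
```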
Sending data
- Either side can send data. The sequence number indicates where the first byte is placed in the receiver buffer.
- The receiver responds with an ACK; the ACK number indicates the next empty byte location in the buffer.
Example (figure: receiver buffer indexed by sequence number): the SYN had seq=14, so the first data byte is 15, and "Steve" already occupies bytes 15 to 19.
- Sender: seq=20, ack=1001, data "Hi" (2 bytes), placed in bytes 20 and 21.
- Receiver: seq=1001, ack=22, no data.

Example with a lost segment (same buffer):
- Sender: seq=20, ack=1001, data "Hi" (2 bytes). This segment is lost.
- Sender: seq=22, ack=1001, data "Bye" (3 bytes), placed in bytes 22 to 24, leaving a gap at bytes 20 and 21.
- Receiver: seq=1001, ack=20, no data (byte 20 is still the next one expected).
- Sender retransmits: seq=20, ack=1001, data "Hi" (2 bytes), filling the gap.
- Receiver: seq=1001, ack=25, no data (bytes 15 to 24 are now in order).
Note that here the receiver is not sending data, so its sequence number (1001) never changes, and the ACK number in the sender's packets never changes either. But the definitions of the sequence and ACK numbers still hold. SYN and FIN packets are special cases: they carry no data, but each consumes one sequence number, so the ACK increments by one.
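The receiver's side of these examples is a cumulative ACK over a byte-indexed buffer. A minimal sketch, assuming one byte per sequence number and a buffer that the application never drains; the `Receiver` class is a hypothetical name. It replays the lost-segment example.

```python
class Receiver:
    """Cumulative-ACK receiver that keeps out-of-order bytes until the gap is filled."""

    def __init__(self, syn_seq):
        self.next_expected = syn_seq + 1  # the SYN consumes one sequence number
        self.buffer = {}                  # seq -> byte, including out-of-order bytes

    def on_segment(self, seq, data):
        for offset, ch in enumerate(data):
            if seq + offset >= self.next_expected:  # ignore bytes already in order
                self.buffer[seq + offset] = ch
        # Advance over every byte that is now contiguous.
        while self.next_expected in self.buffer:
            self.next_expected += 1
        return self.next_expected  # the ACK number to send back

    def contents(self):
        return "".join(self.buffer[s] for s in sorted(self.buffer))

rx = Receiver(syn_seq=14)
print(rx.on_segment(15, "Steve"))  # 20
# The segment "Hi" at seq=20 is lost.
print(rx.on_segment(22, "Bye"))    # 20: a gap remains at bytes 20 and 21
print(rx.on_segment(20, "Hi"))     # 25: the gap is filled
print(rx.contents())               # SteveHiBye
```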
Retransmission time-out
- How do we decide when a packet should be retransmitted?
- There are two methods. Here we talk about the first: when the ACK has not been received for a long time, TCP assumes that the packet was dropped.
- How long is a long time? There is no good solution.
Van Jacobson's algorithm
This does not work all that well. Really, it is MinRTO that controls when time-outs occur. But more analysis is required.
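The slide does not spell out Van Jacobson's formulas. The sketch below uses the form later standardized in RFC 6298 (gains 1/8 and 1/4, K = 4), ignores clock granularity, and exposes the floor as `min_rto`; the class name and the sample RTTs are illustrative. With typical RTTs of about 100 ms, the computed value is far below a 1-second floor, which illustrates the point that MinRTO, not the estimator, decides when time-outs occur.

```python
class RtoEstimator:
    """Jacobson-style RTO estimator in the form standardized by RFC 6298.

    Clock granularity is ignored (assumed 0); min_rto plays the role of MinRTO.
    """
    ALPHA, BETA, K = 1 / 8, 1 / 4, 4

    def __init__(self, min_rto=1.0, max_rto=60.0):
        self.min_rto, self.max_rto = min_rto, max_rto
        self.srtt = None
        self.rttvar = None

    def sample(self, r):
        if self.srtt is None:  # first measurement
            self.srtt, self.rttvar = r, r / 2
        else:  # RTTVAR is updated with the old SRTT, then SRTT is updated
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - r)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r
        return self.rto()

    def rto(self):
        raw = self.srtt + self.K * self.rttvar
        return min(max(raw, self.min_rto), self.max_rto)

est = RtoEstimator(min_rto=1.0)
for r in [0.10, 0.12, 0.09, 0.11]:
    print(est.sample(r))  # each value is clamped up to 1.0 by MinRTO
```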
RTO analysis
Using the July 25, 2001 snapshot of round-trip times from the NLANR data set, we computed the empirical probability of spurious timeouts. The total data set consists of nearly 13,000 connections between 122 sites and 17.5 million round-trip time measurements. The data consists of a time series of round-trip times for each connection, with each time series containing 1440 round-trip times (one sample per minute over the entire day).
Detecting drops with triple DUP ACKs
Example (figure: receiver buffer indexed by sequence number 15 to 35; "Steve" occupies bytes 15 to 19):
- Sender: seq=20, ack=1001, data "Hi" (2 bytes). Receiver: seq=1001, ack=22.
- Sender: seq=22, ack=1001, data "Bye" (3 bytes). This segment is lost.
- Sender: seq=25, ack=1001, data "Wazup" (5 bytes). Receiver: seq=1001, ack=22 (first DUP ACK).
- Sender: seq=30, ack=1001, data "Give" (4 bytes). Receiver: seq=1001, ack=22 (second DUP ACK).
- Sender: seq=34, ack=1001, data "Me" (2 bytes). Receiver: seq=1001, ack=22 (third DUP ACK).
- After the third DUP ACK, the sender retransmits seq=22, data "Bye" (3 bytes). Receiver: seq=1001, ack=36, since bytes 15 to 35 are now in order.
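On the sender, detecting the drop comes down to counting ACKs that repeat the highest ACK seen so far. A minimal sketch, assuming DupThresh = 3 and a hypothetical function name; it replays the ACK stream from the example.

```python
def fast_retransmit_points(acks, dup_thresh=3):
    """Return (index, ack) pairs where the sender would fast-retransmit.

    A DUP ACK repeats the highest ACK number seen so far.
    """
    points = []
    last_ack, dups = None, 0
    for i, ack in enumerate(acks):
        if ack == last_ack:
            dups += 1
            if dups == dup_thresh:  # exactly the third duplicate triggers it
                points.append((i, ack))
        elif last_ack is None or ack > last_ack:
            last_ack, dups = ack, 0  # new data ACKed: reset the counter
        # ACKs below last_ack are stale and ignored.
    return points

# ACK 22 for "Hi", then three duplicates caused by "Wazup", "Give", "Me", then 36.
print(fast_retransmit_points([22, 22, 22, 22, 36]))  # [(3, 22)]
```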
Why triple DUP ACK?
- Why not one DUP ACK?
- Bennett and Partridge, "Packet Reordering Is Not Pathological Network Behavior," 1999. This paper showed that packet reordering can and does occur. Further research into this could be a project.
- The reason for the packet reordering is that routers have parallel paths through them. So, depending on the order of arrival and the packet sizes, the outgoing order can differ from the incoming order.
- Supposedly this was only a problem with older-model Juniper routers. There are many of these routers out there. Cisco field day!
- Reordering only happens when the packets arrive at nearly the same time. This might not happen that much in TCP (see ACK clocking later).
- However, this is an active research area.
- Load balancing can cause packets to take different paths. This can cause reordering. Load balancing is a good project topic.
- Route flap can also cause reordering.
- Why not a larger DupThresh (larger than 3)?
  - This causes other problems.
  - Limited transmit can help. See my papers on TCP-PR for details.
- Using triple DUP ACKs instead of the RTO is called fast retransmit because the drop is detected faster.
Flow control: so the receiver does not get overwhelmed
- The amount of unacknowledged data must not exceed the receiver window.
- As the receiver's buffer fills, the receiver decreases the receiver window.
Example (figure: a 9-byte receiver buffer; the SYN had seq=14, and "Steve" occupies bytes 15 to 19):
- Sender: seq=20, ack=1001, data "Hi" (2 bytes). The buffer now holds "SteveHi" (bytes 15 to 21).
- Receiver: seq=1001, ack=22, Rwin=2.
- Sender: seq=22, ack=1001, data "By" (2 bytes). The buffer is now full (bytes 15 to 23).
- Receiver: seq=1001, ack=24, Rwin=0. The sender must stop.
- The application reads the buffer, freeing all 9 bytes.
- Receiver (window update): seq=1001, ack=24, Rwin=9.
- Sender: seq=24, ack=1001, data "e" (1 byte).
Variant: the window update does not reach the sender.
- After the application reads the buffer, the receiver sends seq=1001, ack=24, Rwin=9, but the sender never sees it, so it still believes Rwin=0.
- After 3 s, the sender sends a window probe: seq=24, ack=1001, no data (0 bytes).
- Receiver: seq=1001, ack=24, Rwin=9.
- Sender: seq=24, ack=1001, data "e" (1 byte).
Variant: the receiver window stays at 0.
- After receiving Rwin=0, the sender waits 3 s and sends a window probe: seq=24, ack=1001, no data.
- Receiver: seq=1001, ack=24, Rwin=0.
- The sender waits 6 s and sends another probe: seq=24, ack=1001, no data.
- The wait keeps growing; the maximum time between probes is 60 or 64 seconds.
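The sender's flow-control arithmetic is one subtraction, plus a growing probe timer when the window is zero. A minimal sketch, assuming byte sequence numbers (`snd_una` is the oldest unACKed byte and `snd_nxt` the next byte to send) and a 60-second cap on the probe interval; both function names are hypothetical.

```python
def usable_window(snd_una, snd_nxt, rwnd):
    """Bytes the sender may still send: receiver window minus unacknowledged bytes."""
    return max(0, rwnd - (snd_nxt - snd_una))

def probe_waits(first=3.0, cap=60.0, count=7):
    """Waits before successive zero-window probes: doubling, capped."""
    waits, w = [], first
    for _ in range(count):
        waits.append(w)
        w = min(2 * w, cap)
    return waits

# After "SteveHi" is ACKed (ack=22, Rwin=2), nothing is outstanding:
print(usable_window(snd_una=22, snd_nxt=22, rwnd=2))  # 2: "By" fits, "Bye" does not
# After "By" is ACKed with Rwin=0:
print(usable_window(snd_una=24, snd_nxt=24, rwnd=0))  # 0: start probing
print(probe_waits())  # [3.0, 6.0, 12.0, 24.0, 48.0, 60.0, 60.0]
```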
Receiver window
- The receiver window field is 16 bits.
- Default receiver window
  - By default, the receiver window is in units of bytes.
  - Hence 64 KB (65,535 bytes) is the maximum receiver window for any (default) implementation.
  - Ethernet packets are 1500 bytes (1460 bytes of TCP data).
  - So that would give 44 packets.
  - If the bit-rate were 10 Mbps, what RTT would make this window size equal to the bandwidth-delay product?
- Receiver window scale
  - During the SYN exchange, one option is the receiver window scale.
  - This option provides the amount by which to shift the receiver window.
  - E.g., if rec win scale = 4 and rec win = 10, then the real receiver window is 10 << 4 = 160 bytes.
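The arithmetic on this slide, including the RTT question, can be worked out directly. A minimal sketch, assuming a 1460-byte MSS and ignoring header overhead when converting the window to bits; `scaled_window` is a hypothetical helper name.

```python
MAX_RWIN = 2**16 - 1  # largest value of the 16-bit field, in bytes
MSS = 1460            # TCP payload of a 1500-byte Ethernet packet

full_packets = MAX_RWIN // MSS
print(full_packets)  # 44

# RTT at which a 64 KB window equals the bandwidth-delay product of a 10 Mbps link:
rate_bps = 10e6
rtt = MAX_RWIN * 8 / rate_bps
print(f"{rtt * 1000:.1f} ms")  # 52.4 ms

def scaled_window(rwin_field, shift):
    """Real receiver window when the window-scale option announced `shift`."""
    return rwin_field << shift

print(scaled_window(10, 4))  # 160
```

So with a default (unscaled) window, any path longer than roughly 52 ms at 10 Mbps is window-limited.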
Congestion control
- Make sure not to overwhelm the network.
- How much data should be put into the network?
- The sender maintains the congestion window (cwnd), which is the maximum number of unacknowledged packets.
- InFlight is the number of unacknowledged packets.
- If InFlight < cwnd, then a packet can be sent.
- When an ACK arrives, InFlight decreases, so another packet can be sent.
Suppose that cwnd = 4 MSS
MSS is the maximum segment size: the minimum of the segment sizes of the sender and receiver. It is negotiated during the SYN exchange.
Suppose MSS = 1000.
- Sender: seq=20, ack=1001, data 1 MSS. InFlight = 1 MSS.
- Sender: seq=1020, ack=1001, data 1 MSS. InFlight = 2 MSS.
- Sender: seq=2020, ack=1001, data 1 MSS. InFlight = 3 MSS.
- Sender: seq=3020, ack=1001, data 1 MSS. InFlight = 4 MSS = cwnd, so the sender stops.
- Receiver: seq=1001, ack=1020. InFlight = 3 MSS.
- Sender: seq=4020, ack=1001, data 1 MSS. InFlight = 4 MSS.
- Receiver: seq=1001, ack=2020. InFlight = 3 MSS.
- Sender: seq=5020, ack=1001, data 1 MSS. InFlight = 4 MSS.
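The rule "send while InFlight < cwnd" can be replayed on the example. A minimal sketch, assuming full-sized segments and that only new (non-duplicate) ACKs free room; `window_sender` is a hypothetical name.

```python
def window_sender(cwnd, mss, first_seq, acks):
    """Replay a window-limited sender: send while InFlight < cwnd; new ACKs free room."""
    next_seq, highest_ack = first_seq, first_seq
    sent = []

    def fill():
        nonlocal next_seq
        while (next_seq - highest_ack) // mss < cwnd:  # InFlight, in packets
            sent.append(next_seq)
            next_seq += mss

    fill()
    for ack in acks:
        if ack > highest_ack:  # a new ACK; duplicates do not change InFlight
            highest_ack = ack
            fill()
    return sent

print(window_sender(cwnd=4, mss=1000, first_seq=20, acks=[1020, 2020]))
# [20, 1020, 2020, 3020, 4020, 5020]
```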
ACK clocking: what is the maximum rate at which ACKs can arrive at the sender?
ACK clocking
(Figure: sender, 100 Mbps link, 10 Mbps bottleneck link, 100 Mbps link, receiver.)
- Packets can leave the sender at 100 Mbps.
- Packets leave the bottleneck at a rate of 10 Mbps.
- At what rate do packets leave the last link? 10 Mbps: they arrive at 10 Mbps.
- What about the ACKs? They leave the receiver at 40/1040 × 10 Mbps (40-byte ACKs for 1040-byte packets). Equivalently, they leave at a rate such that, if a packet is sent for each ACK, the packets are sent at 10 Mbps. The ACKs arrive at the sender with the same spacing.
- What about the packets the sender sends in response? 10 Mbps. Perfect!
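The answers above follow from the spacing the bottleneck imposes. A minimal sketch, assuming 1040-byte packets, 40-byte ACKs, one ACK per packet (no delayed ACKs), and no cross traffic; the variable names are illustrative.

```python
PACKET_BYTES = 1040  # 1000 bytes of data plus 40 bytes of headers
ACK_BYTES = 40
BOTTLENECK_BPS = 10e6

# The bottleneck spaces packets by one packet transmission time; ACKs keep that spacing.
gap = PACKET_BYTES * 8 / BOTTLENECK_BPS       # seconds between packets (and ACKs)
ack_bitrate = ACK_BYTES * 8 / gap             # 40/1040 * 10 Mbps
clocked_send_rate = PACKET_BYTES * 8 / gap    # one packet sent per arriving ACK

print(f"gap = {gap * 1e3:.3f} ms")                          # 0.832 ms
print(f"ACK bit-rate = {ack_bitrate / 1e6:.3f} Mbps")       # 0.385 Mbps
print(f"sender rate = {clocked_send_rate / 1e6:.1f} Mbps")  # 10.0 Mbps
```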
Congestion control
- ACK clocking keeps the sender from sending any faster than the bottleneck link speed.
- But how do we fill the pipe? We only send cwnd packets in a burst. How big should cwnd be?
(Figure: the sender sends a burst at 10 Mbps, then sends nothing while it waits for ACKs (wasted bandwidth), then sends another burst at 10 Mbps.)
- The number of packets sent in one RTT is cwnd. In order not to waste bandwidth, how many packets should be sent in one RTT?
- Enough to keep the bottleneck link busy for a whole RTT:
$$\text{cwnd (bytes)} = \text{bottleneck link byte-rate (bytes/s)} \times \text{RTT (s)} = \text{bandwidth-delay product}$$
- Ideally, cwnd = bandwidth-delay product.
- This ignores fairness. If there are N flows that also use the same link, then ideally cwnd = bandwidth-delay product / N.
- But how do we find this value?
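A worked number makes the target concrete. A minimal sketch of the ideal window, assuming a 10 Mbps bottleneck, a 100 ms RTT, and 1000-byte packets (all illustrative values), with the fair share for N flows.

```python
def ideal_cwnd(link_bps, rtt_s, mss=1000, flows=1):
    """Bandwidth-delay product per flow, in bytes and in whole MSS-sized packets."""
    bdp_bytes = link_bps / 8 * rtt_s / flows
    return bdp_bytes, int(bdp_bytes // mss)

print(ideal_cwnd(10e6, 0.100))           # (125000.0, 125)
print(ideal_cwnd(10e6, 0.100, flows=5))  # (25000.0, 25)
```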
TCP congestion control
- Theme: probe the system.
- Slowly increase cwnd until there is a packet drop. A drop must imply that the cwnd size (or the sum of the window sizes) is larger than the BWDP.
- Once a packet is dropped, decrease cwnd. Then continue to slowly increase it.
- Two phases:
  - Slow start (to get into the ballpark of the correct cwnd).
  - Congestion avoidance (to oscillate around the correct cwnd size).
(Figure: state diagram. Connection establishment leads to slow start; slow start moves to congestion avoidance when cwnd > ssthresh; a triple DUP ACK keeps the connection in congestion avoidance; a timeout returns it to slow start; both phases end in connection termination.)
Slow start
- Used when the connection first starts (and, for today's TCPs, after a timeout).
- Cwnd starts at 1 or 2 MSS.
- For each non-duplicate ACK received, the window size increases by one MSS.
- This increase continues until the window reaches the value of SSThresh.
- The initial value of SSThresh is often large (taken as infinite), so the Rwin limits the growth of the window.
(Figure: time-sequence diagram. Handshake: SYN seq=20; SYN/ACK seq=1000, ack=21; ACK seq=21, ack=1001. The sender then sends 1000-byte segments starting at seq=21; each ACK increases cwnd by one, so cwnd goes 1, 2, 3, ... 8, until the pipe is full.)
- Cwnd doubles every RTT!
- The pipe is full!
- What is happening after that? The RTT grows because the queue is filling. Either the queue will fill and drop a packet, or the receiver window will stop cwnd from increasing.
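"One MSS per ACK" and "doubling every RTT" are the same rule seen at two time scales. A minimal sketch in MSS units, assuming one ACK per packet (no delayed ACKs) and modeling only slow start (growth simply stops at ssthresh or Rwin); the function name is hypothetical.

```python
def slow_start_rounds(rounds, cwnd=1, ssthresh=float("inf"), rwin=64):
    """cwnd (in MSS) at the start of each RTT when every ACK adds one MSS."""
    history = []
    for _ in range(rounds):
        history.append(cwnd)
        acks_this_rtt = cwnd  # one ACK per packet sent in this RTT
        for _ in range(acks_this_rtt):
            if cwnd < ssthresh:
                cwnd += 1
        cwnd = min(cwnd, rwin)  # the receiver window caps growth
    return history

print(slow_start_rounds(5))          # [1, 2, 4, 8, 16]
print(slow_start_rounds(6, rwin=8))  # [1, 2, 4, 8, 8, 8]
```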
- If RecWin != inf, RecWin < bandwidth-delay product + queue size, and there is no other traffic, then there will never be a drop. These are a lot of conditions, but a large number of flows do not experience drops.
- If RecWin and ssthresh are both infinite and the outgoing link of the sender is not the bottleneck, then eventually there will be a drop.
  - If the drop is detected with a triple DUP ACK, then cwnd = cwnd/2 and congestion avoidance is entered.
  - If the drop(s) is (are) detected with a timeout, then ssthresh = cwnd/2, cwnd = 1, and slow start is continued.
- If ssthresh < bandwidth-delay product + queue size and RecWin > ssthresh, then congestion avoidance is entered.
Congestion avoidance
Basics: additive increase, multiplicative decrease (AIMD)!! Rough view: for every cwnd's worth of packets, cwnd is incremented by one. When there is a drop, cwnd = cwnd/2.
(Figure: time-sequence diagram with sequence numbers in MSS; cwnd grows by one per window of packets and is halved when a drop is detected.)
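The rough view of AIMD translates directly into a counter. A minimal sketch in MSS units, assuming integer halving and that a drop is signaled as a single event; the `aimd` function and the event strings are made up for illustration.

```python
def aimd(cwnd, events):
    """Congestion avoidance, rough view: +1 per cwnd ACKs, halve on a drop.

    events is a sequence of "ack" or "drop"; cwnd is in MSS.
    """
    acked = 0
    trace = [cwnd]
    for event in events:
        if event == "drop":
            cwnd = max(cwnd // 2, 1)  # multiplicative decrease
            acked = 0
        else:
            acked += 1
            if acked >= cwnd:  # a full window has been ACKed
                cwnd += 1      # additive increase
                acked = 0
        trace.append(cwnd)
    return trace

events = ["ack"] * 9 + ["drop"] + ["ack"] * 3
print(aimd(4, events))  # [4, 4, 4, 4, 5, 5, 5, 5, 5, 6, 3, 3, 3, 4]
```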
Rough view of TCP congestion control
(Figure: cwnd versus time: slow start, a congestion-avoidance sawtooth with a halving at each drop, and a return to slow start.)
TCP: a more detailed view
- Delayed ACKs
  - The worry was that the network was going to be jammed up with ACKs.
  - So instead of sending an ACK for every packet, delay the ACK and possibly ACK two packets at once.
  - Generate an ACK for at least every other packet.
  - Do not delay an ACK by more than 500 ms (the exact number depends on the implementation).
  - If packets are out of order, generate an ACK for every packet.
  - Also, immediately send an ACK when a gap in the buffer is filled.
- Delayed ACKs can greatly slow down a connection.
  - E.g., with cwnd = 1, the ACK for the first packet is delayed (by up to 500 ms) because no second packet arrives to trigger it.
  - Depending on the implementation, cwnd will grow more slowly.
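The delayed-ACK rules above amount to a small decision procedure on the receiver. A minimal sketch, assuming segments numbered 0, 1, 2, ... and a 500 ms bound; the `DelayedAckReceiver` class and its methods are hypothetical. The first step reproduces the cwnd = 1 case, where only the timer releases the ACK.

```python
class DelayedAckReceiver:
    """Decides when to ACK under the delayed-ACK rules (segments numbered 0, 1, 2, ...)."""

    def __init__(self, max_delay=0.5):
        self.max_delay = max_delay
        self.next_expected = 0
        self.out_of_order = set()
        self.unacked_in_order = 0
        self.timer_deadline = None

    def on_segment(self, seg, now):
        """Return True if an ACK is sent immediately."""
        if seg < self.next_expected:
            return self._send_ack()  # duplicate: ACK immediately
        if seg > self.next_expected:
            self.out_of_order.add(seg)  # out of order: ACK every packet
            return self._send_ack()
        filled_gap = bool(self.out_of_order)  # in-order data while a gap exists
        self.next_expected += 1
        while self.next_expected in self.out_of_order:
            self.out_of_order.remove(self.next_expected)
            self.next_expected += 1
        self.unacked_in_order += 1
        if filled_gap or self.unacked_in_order >= 2:
            return self._send_ack()  # gap filled, or every other packet
        self.timer_deadline = now + self.max_delay
        return False

    def on_timer(self, now):
        """Return True if the delayed-ACK timer has expired and an ACK is sent."""
        if self.timer_deadline is not None and now >= self.timer_deadline:
            return self._send_ack()
        return False

    def _send_ack(self):
        self.unacked_in_order = 0
        self.timer_deadline = None
        return True

rx = DelayedAckReceiver()
print(rx.on_segment(0, now=0.0))  # False: wait for a second packet or the timer
print(rx.on_timer(now=0.5))       # True: the 500 ms bound forces the ACK
print(rx.on_segment(1, now=1.0))  # False
print(rx.on_segment(2, now=1.1))  # True: every other packet
print(rx.on_segment(4, now=1.2))  # True: out of order
print(rx.on_segment(3, now=1.3))  # True: fills the gap
```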
Details: fast recovery
- cwnd after a drop
  - Recall that TCP only sends packets when InFlight < cwnd.
  - InFlight only decreases when a new ACK is received, i.e., a DUP ACK does not cause InFlight to change.
  - If a DUP ACK arrives, it means that a packet arrived at the receiver and an ACK was sent. So the number of packets in the network has decreased, and InFlight should decrease.
  - But maybe the network has duplicated the ACK. To be conservative, leave InFlight as is (I guess).
Fast recovery
- Upon the arrival of the first two DUP ACKs, do nothing. Do not send any packets (InFlight is the same).
- Upon the third DUP ACK:
  - Set SSThresh = cwnd/2.
  - Set cwnd = cwnd/2 + 3.
  - Retransmit the requested packet.
- Upon every later DUP ACK, cwnd = cwnd + 1.
  - If InFlight < cwnd, send a packet and increment InFlight.
- When a new ACK arrives, set cwnd = ssthresh (RENO).
- When an ACK arrives that ACKs all packets that were outstanding when the first drop was detected, set cwnd = ssthresh (NEWRENO).
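The Reno rules above can be checked against the single-drop example (cwnd = 6). A minimal sketch in packet units: DUP ACKs leave InFlight unchanged, the retransmission is not counted again (the lost packet is already in InFlight), and cwnd growth outside recovery is not modeled. The class name and log strings are made up for illustration.

```python
class RenoFastRecovery:
    """Reno fast retransmit and fast recovery, following the rules above (units: packets)."""

    def __init__(self, cwnd, inflight):
        self.cwnd, self.inflight = cwnd, inflight
        self.ssthresh = float("inf")
        self.dupacks = 0
        self.in_recovery = False
        self.sent = []  # log of transmissions

    def _send_new(self):
        while self.inflight < self.cwnd:
            self.sent.append("new")
            self.inflight += 1

    def on_dup_ack(self):
        self.dupacks += 1  # InFlight is deliberately left unchanged
        if self.dupacks == 3:
            self.ssthresh = self.cwnd // 2
            self.cwnd = self.ssthresh + 3
            self.in_recovery = True
            self.sent.append("retransmit")  # already counted in InFlight
        elif self.dupacks > 3:
            self.cwnd += 1  # each further DUP ACK inflates cwnd
        self._send_new()

    def on_new_ack(self, packets_acked):
        self.inflight -= packets_acked
        self.dupacks = 0
        if self.in_recovery:
            self.cwnd = self.ssthresh  # Reno: deflate on the first new ACK
            self.in_recovery = False
        self._send_new()

tcp = RenoFastRecovery(cwnd=6, inflight=6)
for _ in range(5):
    tcp.on_dup_ack()
print(tcp.cwnd, tcp.inflight, tcp.sent)  # 8 8 ['retransmit', 'new', 'new']
tcp.on_new_ack(packets_acked=6)
print(tcp.cwnd, tcp.inflight)            # 3 3
```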
(Figure: the congestion-avoidance example annotated with cwnd and InFlight. When the third DUP ACK arrives, cwnd = 6/2 + 3 = 6 and InFlight = 6; further DUP ACKs raise cwnd and InFlight to 7 and then 8; when the new ACK arrives, cwnd = InFlight = 3.)
Fast recovery with multiple drops: RENO
(Figure: the same example with two drops in one window, annotated with cwnd and InFlight. The first drop triggers fast recovery with cwnd = 6/2 + 3; after the first new ACK, Reno deflates cwnd, and the second drop triggers another reduction, ending with cwnd = InFlight = 2.)
Why is this bad? The first drop told us that we were sending too fast. The second drop tells us the same thing (which we already know). So why react to the same news twice? NewReno fixes this.
Fast recovery with multiple drops: NewReno
- The problem was that one of the packets that was outstanding when the drop was detected was also dropped.
- Solution (NewReno)
  - When a drop is detected:
    - ssthresh = cwnd/2
    - cwnd = cwnd/2 + 3
    - recover = seq of the largest byte sent
    - Retransmit the dropped packet.
  - Upon a DUP ACK, increment cwnd and send if InFlight < cwnd.
  - If the ACK is larger than the previous ACK but smaller than recover (a partial ACK):
    - Suppose that the previous ACK = X and now ACK = Y < recover.
    - Retransmit the dropped packet.
    - cwnd = cwnd - (Y - X) + 1
    - Of course, InFlight = InFlight - (Y - X).
    - So transmit another packet (that makes two transmissions).
  - If ACK > recover:
    - cwnd = ssthresh
    - Exit fast recovery.
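The partial-ACK rule is easiest to trust after replaying it. A minimal sketch in packet units (ACK numbers name the next expected packet); the class name, log strings, and the starting state are hypothetical, with the numbers chosen so that the partial ACK reproduces the step 16 = 19 - (21 - 17) + 1 and InFlight = 19 - 4 = 15 shown in the multiple-drop figure.

```python
class NewRenoRecovery:
    """NewReno fast recovery in packet units, following the rules above."""

    def __init__(self, cwnd, inflight, highest_sent, ack):
        # Entered on the third DUP ACK for `ack`.
        self.ssthresh = cwnd // 2
        self.cwnd = cwnd // 2 + 3
        self.inflight = inflight
        self.recover = highest_sent
        self.prev_ack = ack
        self.next_new = highest_sent + 1
        self.in_recovery = True
        self.log = [f"retransmit {ack}"]

    def _send_new(self):
        while self.inflight < self.cwnd:
            self.log.append(f"new {self.next_new}")
            self.next_new += 1
            self.inflight += 1

    def on_dup_ack(self):
        self.cwnd += 1
        self._send_new()

    def on_ack(self, y):
        x, self.prev_ack = self.prev_ack, y
        self.inflight -= y - x
        if y > self.recover:  # everything outstanding at the drop is ACKed
            self.cwnd = self.ssthresh
            self.in_recovery = False
        else:  # partial ACK: packet y was lost too
            self.log.append(f"retransmit {y}")
            self.cwnd = self.cwnd - (y - x) + 1
        self._send_new()

nr = NewRenoRecovery(cwnd=14, inflight=14, highest_sent=29, ack=17)
for _ in range(9):
    nr.on_dup_ack()
print(nr.cwnd, nr.inflight)  # 19 19
nr.on_ack(21)                # partial ACK
print(nr.cwnd, nr.log[-2:])  # 16 ['retransmit 21', 'new 35']
```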
Fast recovery with a single drop: NewReno
(Figure: cwnd and InFlight during NewReno recovery with Recover = 29. Note how the actual number of packets outstanding is always 7.)
Fast recovery with multiple drops: NewReno
(Figure: cwnd and InFlight during NewReno recovery with Recover = 29. At the partial ACK, cwnd = 19 - (21 - 17) + 1 = 16 and InFlight = 19 - 4 = 15; after the final ACK, NewReno exits fast recovery.)
- NewReno sends two packets for every ACK indicating a multiple drop.
- Two drops take 2 RTTs to recover; N drops take N RTTs to recover. If N × RTT > RTO, then the slow-but-steady variant does not time out, while the impatient variant does.
Other things
- Idle restart
  - If no packet has been sent in RTO seconds:
    - SSThresh = cwnd
    - cwnd = 1
    - Slow start
  - This avoids big bursts after idle times.
    - E.g., waiting to get data from disk.
    - HTTP 1.1
- Time-out exponential back-off
  - If no ACK arrives before the RTO timer expires, then there is a time-out:
    - ssthresh = cwnd/2, cwnd = 2, slow start
    - RTO = min(2 × RTO, 64 s)
  - If the next packet is dropped, then the wait is longer.
  - TCP gives up after 9 to 12 tries, but this is implementation dependent (ns never stops).
- If a retransmitted packet is dropped, then TCP times out.
DUP ACKs after a timeout
(Figure: cwnd and InFlight during NewReno recovery with Recover = 29, including the partial-ACK step cwnd = 19 - (21 - 17) + 1 = 16, InFlight = 19 - 4 = 15; eventually the connection times out, and the retransmissions after the timeout generate DUP ACKs.)
When the timeout occurs, set send_high to the maximum seq sent. If DUP ACKs are received for segments less than send_high, assume they do not indicate a drop. In case there was a drop, there will be a timeout.
Selective Acknowledgment (SACK): the latest widespread congestion control
- Problem: when multiple packets are dropped, the cumulative ACK does not give information about which packets were dropped. As a result, fast recovery is not so fast: it takes one RTT per lost packet.
- Solution: embed into the ACK some information about which packets have successfully arrived.
- TCP-SACK allows ACKs to contain information about received packets.
- If the packets are received in order, then the ACK looks the same as in TCP-RENO or TCP-NEWRENO. But if the packets arrive out of order, then the ACK contains SACK blocks.
- A SACK block indicates a contiguous sequence of segments that have been received.
(Figure: sequence numbers 15 to 35 marking an ACKed run, two SACKed runs separated by holes, and segments not yet sent.)
TCP-SACK
SACK blocks are 8 bytes long (4 bytes for each edge). The SACK option also includes 1 byte (kind = 5) to specify that it is a SACK option and 1 byte for the option length (2 + 8 bytes per block). With padding, the header sizes (IP + TCP) are:
- 1 SACK block: 10 bytes + 2 bytes padding -> 52-byte header
- 2 SACK blocks: 18 bytes + 2 bytes padding -> 60-byte header
- 3 SACK blocks: 26 bytes + 2 bytes padding -> 68-byte header
- 4 SACK blocks: 34 bytes + 2 bytes padding -> 76-byte header
The maximum ACK is 80 bytes (a 20-byte IP header plus a 60-byte TCP header). If the timestamp option is used, then the maximum number of SACK blocks is 3.
Example SACK option (figure): kind = 5, length = 18, left edge of 1st block = 20, right edge of 1st block = 23, left edge of 2nd block = 26, right edge of 2nd block = 30.
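The byte counts above can be reproduced by actually encoding the option. A minimal sketch using `struct`, assuming the padding is done with NOP bytes (kind 1) placed in front of the option and that the timestamp option takes 12 bytes once padded; the function names are hypothetical.

```python
import struct

SACK_KIND = 5
NOP = 1

def sack_option(blocks):
    """Encode a SACK option: kind, length, then 32-bit (left, right) edges per block."""
    body = b"".join(struct.pack("!II", left, right) for left, right in blocks)
    return struct.pack("!BB", SACK_KIND, 2 + len(body)) + body

def padded(option):
    """Prefix NOPs so the option ends on a 4-byte boundary."""
    return bytes([NOP]) * (-len(option) % 4) + option

def header_bytes(n_blocks, ip_header=20, tcp_header=20):
    return ip_header + tcp_header + len(padded(sack_option([(0, 0)] * n_blocks)))

def max_sack_blocks(other_options_len=0):
    """Blocks that fit in 40 option bytes after 2 NOPs and the 2-byte kind/length."""
    return (40 - other_options_len - 2 - 2) // 8

opt = sack_option([(20, 23), (26, 30)])
print(opt[0], opt[1], len(opt))                  # 5 18 18
print([header_bytes(n) for n in range(1, 5)])    # [52, 60, 68, 76]
print(max_sack_blocks(), max_sack_blocks(12))    # 4 3
```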
Generation of SACKs
- No SACK blocks if there are no out-of-order packets.
- No delayed ACKs if there are out-of-order packets (send an ACK for every received packet).
- When an out-of-order packet arrives, the first SACK block contains the segment that just arrived.
- The ACK should contain as many SACK blocks as fit and are required (no skimping to save bit-rate).
- The other SACK blocks included should be those that have most recently been reported (in addition to the first block described in the third point above). So if there are at most 3 SACK blocks, then each contiguous block of segments will be reported at least 3 times.
- If the packet that arrived has already been received (a duplicate reception), then the first SACK block should identify this packet. (This is the DSACK extension to SACK.) In this case, the next SACK block should indicate the contiguous sequence of segments that contains the segment received in duplicate.
(Figure: the same buffer, with the left and right edges of the two SACK blocks marked.)
Now suppose that segment 21 arrives for a second time. The ACK then carries:
- kind = 5, length = 26
- left edge of DUP packet = 21, right edge of DUP packet = 22
- left edge of 1st block = 20, right edge of 1st block = 23
- left edge of 2nd block = 26, right edge of 2nd block = 30
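The generation rules can be sketched on the receiver side. This is a minimal sketch, assuming segment-numbered edges (right edge exclusive, as in the example), at most 3 blocks, and "most recently reported" approximated as "block containing the most recent arrival"; `SackReceiver` and `contiguous_blocks` are hypothetical names. It reproduces the DSACK example for segment 21.

```python
def contiguous_blocks(segments):
    """Group segment numbers into (left, right) blocks with an exclusive right edge."""
    blocks = []
    for s in sorted(segments):
        if blocks and blocks[-1][1] == s:
            blocks[-1] = (blocks[-1][0], s + 1)
        else:
            blocks.append((s, s + 1))
    return blocks

class SackReceiver:
    def __init__(self, next_expected, max_blocks=3):
        self.next_expected = next_expected  # cumulative ACK point
        self.max_blocks = max_blocks
        self.out_of_order = set()
        self.arrival = {}                   # segment -> time of its latest arrival
        self.clock = 0

    def on_segment(self, seg):
        """Return (ack, sack_blocks) for the ACK sent in response to `seg`."""
        self.clock += 1
        dsack = None
        if seg < self.next_expected or seg in self.out_of_order:
            dsack = (seg, seg + 1)  # duplicate reception: report it first (DSACK)
        elif seg == self.next_expected:
            self.next_expected += 1
            while self.next_expected in self.out_of_order:
                self.out_of_order.remove(self.next_expected)
                self.next_expected += 1
        else:
            self.out_of_order.add(seg)
        self.arrival[seg] = self.clock
        blocks = contiguous_blocks(self.out_of_order)
        # Most recently touched block first, so it holds the segment that just arrived.
        blocks.sort(key=lambda b: max(self.arrival[s] for s in range(*b)), reverse=True)
        if dsack:
            blocks.insert(0, dsack)
        return self.next_expected, blocks[: self.max_blocks]

rx = SackReceiver(next_expected=18)
for s in [20, 21, 22, 26, 27, 28]:
    rx.on_segment(s)
print(rx.on_segment(29))  # (18, [(26, 30), (20, 23)])
print(rx.on_segment(21))  # (18, [(21, 22), (20, 23), (26, 30)])
```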
DSACK
- DSACK is used to identify packets that have been needlessly retransmitted.
- The primary source of such retransmissions is packet reordering.
- If such a retransmission occurs, it likely means that cwnd was needlessly divided by 2.
- DSACK helps identify these needless divisions by two.
- It is not clear what can be done once they are identified.
  - Many ideas have been suggested, but it remains to be seen whether they actually improve things.
  - Ethan Blanton and Mark Allman, "On Making TCP More Robust to Packet Reordering" (2002), show that some improvement is possible.
  - Bohacek et al. show that if there is persistent reordering, more drastic measures are required.
  - Neither paper includes an analysis of the current situation in the Internet.
  - The current situation is not completely known.
  - The homework provides backbone traces with rampant reordering.
  - In my opinion (on 2/20/04), some sort of timer-based approach is necessary. The DUP ACK threshold approach is not appropriate because a burst of packets (as can be seen in the homework) can be heavily reordered. But reordering by more than a few milliseconds is very rare.
  - A project could examine this.
Eifel detection
- DSACK is only useful after the arrival of the second copy of the packet.
- Eifel uses timestamps to inform the sender that a packet that was thought to have been lost has actually arrived.
TCP-SACK (sender side)
- Slow start and the linear increase part of SACK are the same as in TCP-RENO/NEWRENO. The fast recovery part is different.
- SACK provides more information about which packets have been lost. The sender can use this to determine:
  - which packets to send
  - when to send packets
  - when to assume that a packet is lost. A packet is assumed lost if either:
    1. DupThresh SACK blocks (each a contiguous run of SACKed data) with larger sequence numbers have been received. The idea is the same as waiting for DupThresh packets with larger sequence numbers to be SACKed, but contiguous SACK blocks are counted instead; or
    2. DupThresh × MSS bytes with larger sequence numbers have been SACKed.
(Figure: MSS = 5 bytes, DupThresh = 3, with some little packets smaller than the MSS; the sequence space shows ACKed, SACKed, and not-yet-sent data.)
- One hole is assumed dropped because of both reasons 1 and 2:
  - Number of contiguous SACK blocks with a higher seq num = 4 ≥ DupThresh
  - Number of SACKed bytes with a larger seq num = 25 ≥ MSS × DupThresh
- Another hole is assumed dropped because of reason 1 only:
  - Number of contiguous SACK blocks with a higher seq num = 3 ≥ DupThresh
  - Number of SACKed bytes with a larger seq num = 9 < MSS × DupThresh
- The remaining hole is not assumed dropped.
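The two loss tests can be written as one small check. A minimal sketch, assuming SACKed data is given as (left, right) byte ranges with an exclusive right edge, and that ranges touching each other count as one contiguous block; `sack_evidence` and `is_lost` are hypothetical names. The sample inputs are constructed to give the same counts as the slide (4 blocks/25 bytes, 3 blocks/9 bytes).

```python
def merge(ranges):
    """Merge touching or overlapping (left, right) ranges into contiguous blocks."""
    out = []
    for left, right in sorted(ranges):
        if out and left <= out[-1][1]:
            out[-1] = (out[-1][0], max(out[-1][1], right))
        else:
            out.append((left, right))
    return out

def sack_evidence(seq, sacked):
    """(number of contiguous SACK blocks, SACKed bytes) above `seq`."""
    blocks = [(l, r) for l, r in merge(sacked) if l > seq]
    return len(blocks), sum(r - l for l, r in blocks)

def is_lost(seq, sacked, mss, dup_thresh=3):
    n_blocks, n_bytes = sack_evidence(seq, sacked)
    return n_blocks >= dup_thresh or n_bytes >= dup_thresh * mss  # reason 1 or 2

print(sack_evidence(15, [(20, 25), (30, 35), (40, 45), (50, 60)]))  # (4, 25): both reasons
print(sack_evidence(15, [(20, 22), (24, 26), (28, 33)]))            # (3, 9): reason 1 only
print(is_lost(15, [(20, 22), (24, 31)], mss=5))                     # False
```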
Number in pipe (InFlight)
- If a packet has been sent, is not assumed lost, and has not been SACKed, then it is assumed to be in the pipe.
- Any packet that has been retransmitted and not SACKed is also in the pipe.
  - Retransmissions happen in order (smallest seq num first; why?).
  - Let HighRX denote the highest segment that has been retransmitted.
  - Any packet that has not been SACKed and has a seq num at most HighRX has been retransmitted, so it is in the pipe.
Which packet to send next? (during fast recovery)
- The next to transmit is the segment with the smallest seq num that satisfies all of:
  - the segment is greater than HighRX;
  - the segment has a seq num less than the largest segment in a SACK block;
  - the segment is assumed to be lost.
(Figure: sequence numbers 15 to 35 with ACKed, SACKed, and not-sent segments; HighRX marks the segments already retransmitted, and the next to be sent is the first lost hole above HighRX.)
- If the above is an empty set, then the next to be sent is the smallest segment that has not yet been sent.
(Figure: the same layout with no lost holes left above HighRX; the next to be sent comes from the unsent segments.)
- If the above is also empty (because there are no more packets to be sent), then the next to be sent is the smallest unSACKed segment above HighRX, even though it is not yet assumed lost.
(Figure: at the end of the file, the next to be sent is a hole above HighRX.)
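The pipe estimate and the choice of the next segment can be combined into one scoreboard sketch. Assumptions: segments are numbered and equal-sized, so "lost" is simplified to "at least DupThresh SACKed segments lie above it"; `high_rx` is the highest retransmitted segment (una - 1 if none); and `last_seg` is a hypothetical "last segment of the file". All function names are illustrative.

```python
def is_lost(seg, sacked, dup_thresh=3):
    """Equal-sized segments: lost once DupThresh SACKed segments lie above it."""
    return sum(1 for s in sacked if s > seg) >= dup_thresh

def pipe(una, high_sent, sacked, high_rx):
    """Packets assumed in the network, following the two pipe rules above."""
    count = 0
    for seg in range(una, high_sent + 1):
        if seg in sacked:
            continue
        if not is_lost(seg, sacked):
            count += 1  # sent, not lost, not SACKed
        if seg <= high_rx:
            count += 1  # retransmitted and not SACKed
    return count

def next_seg(una, high_sent, sacked, high_rx, last_seg):
    """Segment to send next during fast recovery, or None."""
    top = max(sacked, default=una - 1)
    holes = [s for s in range(max(una, high_rx + 1), top) if s not in sacked]
    lost = [s for s in holes if is_lost(s, sacked)]
    if lost:
        return lost[0]           # smallest lost hole above HighRX
    if high_sent < last_seg:
        return high_sent + 1     # otherwise, new data
    return holes[0] if holes else None  # end of file: a hole not yet deemed lost

sacked = {18, 19, 20, 23, 24}
print(next_seg(15, 26, sacked, high_rx=14, last_seg=30))  # 15
print(pipe(15, 26, sacked, high_rx=14))                   # 4
print(next_seg(15, 26, sacked, high_rx=17, last_seg=30))  # 27
print(next_seg(15, 26, sacked, high_rx=17, last_seg=26))  # 21
```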
TCP-SACK congestion control
- When a loss is detected:
  - Set RecoveryPoint = seq num of the highest segment sent. Fast recovery ends when this seq num is ACKed (SACKed is not good enough).
  - ssthresh = cwnd = InFlight/2
  - Retransmit the lost packet with the smallest seq num.
  - Set HighRX equal to the retransmitted packet.
- During recovery (until RecoveryPoint is ACKed):
  - If pipe < cwnd, then send the next to be sent.
TCP-SACK notes
- After an RTO, the TCP-SACK sender starts fresh and erases the SACK information from before the RTO (some of it might be regained through retransmitted SACK blocks).
- Like NEWRENO, the highest seq sent before an RTO is recorded, and a DUP ACK for a packet with a seq num less than this highest seq does not cause fast recovery/retransmit.
- Like NEWRENO, the retransmit timer can be reset during recovery (slow-but-steady) or not (impatient).
TCP-SACK timeout
- SACK, NewReno, etc. will time out if a retransmission is lost.
- If SACK uses the same technique to increase cwnd as NewReno (i.e., cwnd = InFlight/2 + 3), and if more than cwnd/2 packets are lost, SACK will time out.
- The ns implementation has this problem.
(Figure: cwnd (NewReno-style), InFlight, and packets sent during recovery; after many losses, no more packets are sent, and the connection times out.)
TCP-SACK burst
- Multiple drops lead to a burst of packets being sent.
(Figure: cwnd (SACK), pipe, and packets sent during recovery from multiple drops; the sender loses ACK clocking and sends a burst of packets before recovery ends.)
Limited transmit
- When a packet is dropped and the window size is less than 4, TCP will always time out (not enough ACKs arrive to get a triple DUP ACK).
- If, upon receiving a DUP ACK, a packet is transmitted, then there might be enough DUP ACKs to cause a fast retransmit and avoid the time-out.
- Limited transmit allows a packet to be sent when the second DUP ACK is received (in general, for every other DUP ACK).
- Even if a packet is lost, sending a packet for every other ACK is sending at half the bit-rate.
- While this helps TCP avoid time-outs, it also makes this version of TCP far more aggressive for loss probabilities greater than about 1% (where time-outs become quite prevalent for TCP without limited transmit).
(Figure: two time-sequence diagrams with cwnd = 3 and one drop. Without limited transmit, only two DUP ACKs arrive and the sender times out; with limited transmit, the extra packet produces a triple DUP ACK, and there is no time-out.)
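Whether a small window reaches the triple DUP ACK can be counted directly. A minimal sketch, assuming the first packet of the window is the only loss, every other packet (including the extra ones) arrives and produces one DUP ACK, and limited transmit follows the slide's rule of one new packet per every other DUP ACK; the function name is hypothetical.

```python
def reaches_triple_dup(cwnd, limited_transmit, dup_thresh=3):
    """True if a window with its first packet lost produces DupThresh DUP ACKs."""
    pending = cwnd - 1  # packets after the hole; each will trigger one DUP ACK
    dupacks = 0
    while pending > 0:
        pending -= 1
        dupacks += 1
        if dupacks >= dup_thresh:
            return True  # fast retransmit, no time-out
        if limited_transmit and dupacks % 2 == 0:
            pending += 1  # one new packet for every other DUP ACK
    return False  # not enough DUP ACKs: the sender must wait for the RTO

print(reaches_triple_dup(3, limited_transmit=False))  # False: time-out
print(reaches_triple_dup(3, limited_transmit=True))   # True: triple DUP ACK
```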
(Figure: two more limited-transmit examples, annotated with seq (MSS) and cwnd; in both, a triple DUP ACK is reached.)
ECN
- Sometimes the router has a large enough queue to accept the packet, but the queue occupancy is beyond a threshold. In order to get the TCP flows to send at a slower rate, the router drops packets (even though there is room in the queue).
- It is odd to drop packets when there is room in the queue, so another option is to mark the packets. The receiver includes in the ACK an indication that the packet being ACKed was marked, and the sender reacts to this marking as it would to a drop, except that there is no reason to retransmit the marked packet.
- This approach has little impact in general, except that, like limited transmit, it can reduce time-outs when the loss probability is very high.