Title: Summary of Last Lecture
1Summary of Last Lecture
- Transport layer services and protocols
- Logical communication
- End-to-end service
- TCP: reliable, in-order delivery
- UDP: unreliable, unordered delivery
2Summary of Last Lecture
- Multiplexing and demultiplexing
- Receiving host demultiplexing
- Sending host multiplexing
- Connectionless demultiplexing
- Identified by two-tuple
- (dest IP, dest port)
- Connection-oriented demultiplexing
- Identified by four-tuple
- (source IP, source port, dest IP, dest port)
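The two lookup rules above can be sketched in Python as follows. This is only an illustration: the socket tables, addresses, and socket names are hypothetical, and a real stack keeps this state inside the kernel.

```python
# Hypothetical socket tables, for illustration only.
udp_sockets = {("10.0.0.5", 53): "dns_socket"}
tcp_sockets = {
    ("10.0.0.1", 4706, "10.0.0.5", 80): "conn_from_client_1",
    ("10.0.0.2", 4706, "10.0.0.5", 80): "conn_from_client_2",
}

def demux_udp(src_ip, src_port, dest_ip, dest_port):
    # Connectionless: only the destination two-tuple selects the socket.
    return udp_sockets.get((dest_ip, dest_port))

def demux_tcp(src_ip, src_port, dest_ip, dest_port):
    # Connection-oriented: the full four-tuple selects the socket.
    return tcp_sockets.get((src_ip, src_port, dest_ip, dest_port))

# Two different senders reach the same UDP socket...
print(demux_udp("10.0.0.1", 9999, "10.0.0.5", 53))
print(demux_udp("10.0.0.2", 1234, "10.0.0.5", 53))
# ...but land on different TCP sockets, even with the same dest IP and port.
print(demux_tcp("10.0.0.1", 4706, "10.0.0.5", 80))
print(demux_tcp("10.0.0.2", 4706, "10.0.0.5", 80))
```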
3Summary of Last Lecture
- Roadmap of reliable data transfer
- Rdt 1.0: no bit errors, no loss of packets
- Rdt 2.0: channel with bit errors
- Error detection
- Receiver feedback using control msgs (ACK, NAK)
- Rdt 2.1: handles garbled ACK/NAKs
- Introduction of sequence number
- Rdt 2.2: a NAK-free protocol
- Duplicate ACK at sender: retransmit current pkt
4Summary of Last Lecture
- Roadmap of reliable data transfer (cont.)
- Rdt 3.0: channels with errors and pkt loss
- Sequence number
- Timer
5Summary of Last Lecture
- Go-Back-N
- Sender
- K-bit sequence number
- Cumulative ACK
- Timer
- Timeout(n): retransmit pkt n and all higher sequence number pkts in window
- Receiver
- No receiver buffering
- Repeat ACK for pkt with highest in-order sequence number
6Summary of Last Lecture
- Selective Repeat
- Sender
- Only retransmit pkts for which ACK not received
- Receiver
- Buffers out-of-order pkts
- Individually acknowledges all correctly received pkts
7CS352- TCP and UDP
- Dept. of Computer Science
- Rutgers University
8The Internet Transport Layer
- Two transport layer protocols supported by the Internet
- Reliable
- The Transmission Control Protocol (TCP)
- Unreliable
- The User Datagram Protocol (UDP)
9 UDP
- UDP is an unreliable transport protocol that can be used in the Internet
- UDP does not provide
- connection management
- flow or error control
- guaranteed in-order packet delivery
- UDP is almost a null transport layer
10Why UDP?
- No connection needs to be set up
- Throughput may be higher because UDP packets are easier to process, especially at the source
- The user doesn't care if the data is transmitted reliably
- The user wants to implement his or her own transport protocol
11UDP: more
- often used for streaming multimedia apps
- loss tolerant
- rate sensitive
- other UDP uses
- DNS
- SNMP
- reliable transfer over UDP: add reliability at application layer
- application-specific error recovery!
UDP segment format (each row is 32 bits wide):
- source port | dest port
- length | checksum
- Application data (message)
Length: in bytes, of the UDP segment, including header
12UDP checksum
- Goal: detect errors (e.g., flipped bits) in transmitted segment
- Receiver
- compute checksum of received segment
- check if computed checksum equals checksum field value
- NO: error detected
- YES: no error detected. But maybe errors nonetheless? More later.
- No action to recover from an error
- Sender
- treat segment contents as sequence of 16-bit integers
- checksum: 1's complement of (1's complement sum of segment contents)
- sender puts checksum value into UDP checksum field
13Internet Checksum Example
- Note
- When adding numbers, a carryout from the most significant bit needs to be added to the result
- Example: add two 16-bit integers

                          1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
                          1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
Wraparound the carry    1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
sum                       1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0
Checksum (complement)     0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1
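A short Python sketch of the same computation, using the two 16-bit words 1110011001100110 and 1101010101010101. The function names are made up for this illustration, and the sketch works on a list of 16-bit integers rather than on a real UDP segment.

```python
def ones_complement_sum(words):
    """Add 16-bit words, wrapping any carry out of the top bit back in."""
    total = 0
    for word in words:
        total += word
        # Fold the carry (bit 16) back into the low 16 bits.
        total = (total & 0xFFFF) + (total >> 16)
    return total

def internet_checksum(words):
    """1's complement of the 1's complement sum."""
    return ~ones_complement_sum(words) & 0xFFFF

a = 0b1110011001100110
b = 0b1101010101010101
checksum = internet_checksum([a, b])
print(format(ones_complement_sum([a, b]), "016b"))
print(format(checksum, "016b"))

# Receiver side: summing the words plus the checksum gives all ones
# when no error is detected.
print(ones_complement_sum([a, b, checksum]) == 0xFFFF)
```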
14TCP: Overview (RFCs 793, 1122, 1323, 2018, 2581)
- point-to-point
- one sender, one receiver
- reliable, in-order byte stream
- no message boundaries
- pipelined
- TCP congestion and flow control set window size
- send & receive buffers
- full duplex data
- bi-directional data flow in same connection
- MSS: maximum segment size
- connection-oriented
- handshaking (exchange of control msgs) inits sender, receiver state before data exchange
- flow controlled
- sender will not overwhelm receiver
15 TCP
- TCP provides the end-to-end reliable connection that IP alone cannot support
- The protocol
- Frame format
- Connection management
- Retransmission
- Flow control
- Congestion control
16TCP segment structure
Annotations on the segment diagram:
- sequence and acknowledgement numbers: counting by bytes of data (not segments!)
- URG: urgent data (generally not used)
- ACK: ACK # valid
- PSH: push data now (generally not used)
- RST, SYN, FIN: connection estab (setup, teardown commands)
- receive window: # bytes rcvr willing to accept
- checksum: Internet checksum (as in UDP)
17TCP Frame Fields
- Source & Destination Ports
- 16-bit port identifiers for each packet
- Sequence number
- The sequence number of the first data byte in the packet
- Acknowledgement number
- The sequence number of the next byte expected by the receiver
18TCP Frame Fields (contd)
- Window size
- Specifies how many bytes may be sent after the first acknowledged byte
- Checksum
- Checksums the TCP header, the data, and the IP address fields
- Urgent Pointer
- Points to urgent data in the TCP data field
19TCP Frame Fields (contd)
- Header bits
- URG: Urgent pointer field in use
- ACK: Indicates whether frame contains acknowledgement
- PSH: Data has been pushed. It should be delivered to higher layers right away.
- RST: Indicates that the connection should be reset
- SYN: Used to establish connections
- FIN: Used to release a connection
20TCP seq. #'s and ACKs
- Seq. #'s
- byte stream "number" of first byte in segment's data
- ACKs
- seq # of next byte expected from other side
- cumulative ACK
- Q: how receiver handles out-of-order segments
- A: TCP spec doesn't say; up to implementer
Simple telnet scenario (Host A, Host B):
- User at Host A types 'C'
- A -> B: Seq=42, ACK=79, data = 'C'
- Host B ACKs receipt of 'C', echoes back 'C'
- B -> A: Seq=79, ACK=43, data = 'C'
- Host A ACKs receipt of echoed 'C'
- A -> B: Seq=43, ACK=80
21 TCP Connection Establishment
Three-way handshake between Host A and Host B:
- A -> B: SYN (seq=x)
- B -> A: SYN (seq=y, ACK=x+1)
- A -> B: ACK (seq=x+1, ACK=y+1)
22TCP Connection Tear-down
Each direction of the connection is closed separately:
- A -> B: FIN (seq=x)
- B -> A: ACK (ACK=x+1)
- A->B torn down
- B -> A: FIN (seq=y)
- A -> B: ACK (ACK=y+1)
- B->A torn down
23 TCP Retransmission
- When a packet remains unacknowledged for a period of time, TCP assumes it is lost and retransmits it
- TCP tries to calculate the round trip time (RTT) for a packet and its acknowledgement
- From the RTT, TCP can guess how long it should wait before timing out
24TCP Round Trip Time and Timeout
- Q: how to estimate RTT?
- SampleRTT: measured time from segment transmission until ACK receipt
- ignore retransmissions
- SampleRTT will vary, want estimated RTT "smoother"
- average several recent measurements, not just current SampleRTT
- Q: how to set TCP timeout value?
- longer than RTT
- but RTT varies
- too short: premature timeout
- unnecessary retransmissions
- too long: slow reaction to segment loss
25Round Trip Time (RTT)
Figure: data crosses the Network (time for data to arrive), then the ACK comes back (time for ACK to return).
RTT = Time for packet to arrive at destination + Time for ACK to return from destination
26RTT Calculation
Figure: the Sender transmits at 0.9 sec and the Receiver's ACK 2048 arrives back at 2.2 sec.
RTT = 2.2 sec - 0.9 sec = 1.3 sec
27Smoothing the RTT measurement
- First, we must smooth the round trip time due to variations in delay within the network
- SRTT = a * SRTT + (1 - a) * RTT of the arriving ACK
- The smoothed round trip time (SRTT) weights previously received RTTs by the a parameter
- a is typically equal to 0.875
28Retransmission Timeout Interval (RTO)
- The timeout value is then calculated by multiplying the smoothed RTT by some factor (greater than 1) called b
- Timeout = b * SRTT
- This coefficient b is included to allow for some variation in the round trip times.
29Example
Initial SRTT = 1.50, a = 0.875, b = 4.0

RTT Meas.   SRTT                             Timeout
1.5 s       1.50                             b * 1.50 = 6.00
1.0 s       1.50*a + 1.0*(1 - a) = 1.44      b * 1.44 = 5.76
2.2 s       1.44*a + 2.2*(1 - a) = 1.54      b * 1.54 = 6.16
1.0 s       1.54*a + 1.0*(1 - a) = 1.47      b * 1.47 = 5.88
0.8 s       1.47*a + 0.8*(1 - a) = 1.39      b * 1.39 = 5.56
3.1 s
2.0 s
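The smoothing and timeout rules can be tried out with a few lines of Python. This is a sketch with made-up function names, using a = 0.875 and b = 4.0 from the example; it carries full precision between steps, so a printed value may differ in the last digit from a table that rounds SRTT at every row.

```python
def update_srtt(srtt, rtt_sample, a=0.875):
    """SRTT = a * SRTT + (1 - a) * RTT of the arriving ACK."""
    return a * srtt + (1 - a) * rtt_sample

def timeout(srtt, b=4.0):
    """Timeout = b * SRTT."""
    return b * srtt

srtt = 1.50
for sample in [1.0, 2.2, 1.0, 0.8]:
    srtt = update_srtt(srtt, sample)
    print(f"RTT {sample:.1f} s -> SRTT {srtt:.2f}, timeout {timeout(srtt):.2f}")
```

The two unfilled rows (3.1 s and 2.0 s) can be worked the same way by extending the list of samples.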
30Problem with RTT Calculation
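Figure: the Sender's segment times out and is retransmitted; ACK 2048 then arrives from the Receiver.
RTT? Measured from the original transmission, or from the retransmission?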
31Karn's Algorithm
- Retransmission ambiguity
- Measure RTT from original data segment
- Measure RTT from most recent segment
- Either way there is a problem in RTT estimate
- One solution
- Never update RTT measurements based on acknowledgements from retransmitted packets
- Problem: Sudden change in RTT can cause system never to update RTT
- Primary path failure leads to a slower secondary path
32Karn's algorithm
- Use back-off as part of RTT computation
- Whenever there is packet loss, RTO is increased by a factor
- Use this increased RTO as RTO estimate for the next segment (not from SRTT)
- Only after an acknowledgment is received for a successful transmission is the timer set to the new RTO obtained from SRTT
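A minimal Python sketch of this back-off rule. The class and method names are hypothetical, and the back-off factor of 2 is an assumption (the slide only says "a factor"); a and b reuse the values from the earlier timeout example.

```python
class KarnTimer:
    def __init__(self, srtt, a=0.875, b=4.0, backoff=2.0):
        self.srtt = srtt
        self.a = a
        self.b = b
        self.backoff = backoff  # assumed factor; not given in the slides
        self.rto = b * srtt

    def on_timeout(self):
        # Packet loss: grow the RTO and leave SRTT untouched.
        self.rto *= self.backoff

    def on_ack(self, rtt_sample, was_retransmitted):
        if was_retransmitted:
            # Ambiguous sample: ignore it and keep the backed-off RTO.
            return
        # Clean sample: update SRTT and derive the RTO from it again.
        self.srtt = self.a * self.srtt + (1 - self.a) * rtt_sample
        self.rto = self.b * self.srtt

timer = KarnTimer(srtt=1.5)
timer.on_timeout()
print(timer.rto)                      # backed-off RTO
timer.on_ack(1.0, was_retransmitted=True)
print(timer.rto)                      # unchanged by the ambiguous ACK
timer.on_ack(1.0, was_retransmitted=False)
print(timer.rto)                      # recomputed from SRTT
```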
33Another Problem with RTT Calculation
- RTT measurements can sometimes fluctuate severely
- smoothed RTT (SRTT) is not a good reflection of round-trip time in these cases
- Solution: Use Jacobson/Karels algorithm
- Error = RTT - SRTT
- SRTT = SRTT + (a * Error)
- Dev = Dev + d * (|Error| - Dev)
- Timeout = SRTT + (b * Dev)
34Jacobson/Karels Algorithm: Example
Error = RTT - SRTT
SRTT = SRTT + (a * Error)
Dev = Dev + d * (|Error| - Dev)
Timeout = SRTT + (b * Dev)
Initial SRTT = 1.50, Dev = 0, a = 0.125, d = 0.25, b = 4.0

RTT Meas.   SRTT    Error    Dev.    Timeout
1.5 s       1.50     0.0     0.00    1.50
1.0 s       1.44    -0.50    0.13    1.94
2.2 s       1.54     0.76    0.28    2.67
1.0 s       1.47    -0.54    0.35    2.85
0.8 s       1.39    -0.67    0.43    3.09
3.1 s
2.0 s
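One update step of the Jacobson/Karels estimator, sketched in Python. The function name is made up for this illustration; the absolute value of Error is used in the deviation update, which is what reproduces the Dev. column (for example 0.13 after the 1.0 s sample). Intermediate values are not rounded, so a printed digit may differ slightly from a hand-rounded table.

```python
def jacobson_karels(srtt, dev, rtt_sample, a=0.125, d=0.25, b=4.0):
    """Return the updated (SRTT, Dev, Timeout) for one RTT measurement."""
    error = rtt_sample - srtt
    srtt = srtt + a * error
    dev = dev + d * (abs(error) - dev)
    return srtt, dev, srtt + b * dev

srtt, dev = 1.50, 0.0
for sample in [1.0, 2.2, 1.0, 0.8]:
    srtt, dev, rto = jacobson_karels(srtt, dev, sample)
    print(f"RTT {sample:.1f} s -> SRTT {srtt:.2f}, Dev {dev:.2f}, timeout {rto:.2f}")
```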
35Example RTT computation
36 TCP Flow Control
- TCP uses a modified version of the sliding window
- In acknowledgements, TCP uses the Window size field to tell the sender how many bytes it may transmit
- TCP uses bytes, not packets, as sequence numbers
37TCP Flow Control (contd)
Important information in TCP/IP packet headers
- Send side
- N: Number of bytes in packet (contained in IP header)
- SEQ: Sequence number of first data byte in packet (contained in TCP header)
- Recv side (ACK bit set)
- ACK: Sequence number of next expected byte (contained in TCP header)
- WIN: Window size at the receiver (contained in TCP header)
38Example TCP session
(1)remus tcpdump -S host scully
Kernel filter, protocol ALL, datagram packet socket
tcpdump: listening on all devices
15:15:22.152339 eth0 > remus.4706 > scully.echo: S 1264296504:1264296504(0) win 32120 <mss 1460,sackOK,timestamp 71253512 0,nop,wscale 0>
15:15:22.153865 eth0 < scully.echo > remus.4706: S 875676030:875676030(0) ack 1264296505 win 8760 <mss 1460>
15:15:22.153912 eth0 > remus.4706 > scully.echo: . 1264296505:1264296505(0) ack 875676031 win 32120

remus telnet scully 7
A <return>
A
39Example TCP session
Packet 1: 15:15:22.152339 eth0 > remus.4706 > scully.echo: S 1264296504:1264296504(0) win 32120 <mss 1460,sackOK,timestamp 71253512 0,nop,wscale 0> (DF)
Packet 2: 15:15:22.153865 eth0 < scully.echo > remus.4706: S 875676030:875676030(0) ack 1264296505 win 8760 <mss 1460>
Packet 3: 15:15:22.153912 eth0 > remus.4706 > scully.echo: . 1264296505:1264296505(0) ack 875676031 win 32120
Fields, in order of appearance: Timestamp, Source IP/port, Dest IP/port, Flags, Start Sequence Number, End Sequence Number, Acknowledgement Number, Window, Options
40TCP data transfer
Packet 4: 15:15:28.591716 eth0 > remus.4706 > scully.echo: P 1264296505:1264296508(3) ack 875676031 win 32120
Packet 5: 15:15:28.593255 eth0 < scully.echo > remus.4706: P 875676031:875676034(3) ack 1264296508 win 8760
The value in parentheses, (3), is the number of data bytes.
41TCP Flow Control (contd)
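Sender and Receiver; the receiver's buffer holds 4K (0 to 4K) and starts Empty.
- Application does a 2K write
- Receiver answers ACK 2048 WIN 2048
- Application does a 3K write (only 2K fits in the advertised window)
- Receiver's buffer is Full; it answers ACK 4096 WIN 0
- Sender is blocked
- Application reads 2K
- Receiver sends ACK 4096 WIN 2048
- Sender may send up to 2K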
42TCP Flow Control (contd)
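Piggybacking: Allows more efficient bidirectional communication
- Segment from A to B: N and SEQ describe the data from A to B; ACK and WIN are the ACK for data from B to A
- Segment from B to A: N and SEQ describe the data from B to A; ACK and WIN are the ACK for data from A to B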
43TCP Congestion Control
- Recall: Network layer is responsible for congestion control
- However, TCP/IP blurs the distinction
- In TCP/IP
- the network layer (IP) simply handles routing and packet forwarding
- congestion control is done end-to-end by TCP
44Self-Clocking Model
Figure: Sender and Receiver connected by a Fast link and a Bottleneck link; Data flows to the Receiver and Acks flow back. Labels: Pb, Pr (data packets); Ab, Ar (Acks).
1. Send Burst
2. Receive data packet
3. Send Acknowledgement
4. Receive Acknowledgement
5. Send a data packet
Given Pb = Pr = Ar = Ab (in units of time)
Sending a packet on each ACK keeps the bottleneck link busy
45Changing bottleneck bandwidth
- one router, finite buffers
- sender retransmission of lost packet
Figure: Host A sends through a router with finite shared output link buffers to Host B.
- λin: original data
- λ'in: original data, plus retransmitted data
- λout
46TCP Congestion Control
- Goal: achieve self-clocking state
- Even if we don't know the bandwidth of the bottleneck
- Bottleneck may change over time
- Two phases to keep bottleneck busy
- Slow-start ramps up to the bottleneck limit
- Packet loss signals we passed bandwidth of bottleneck
- Congestion Avoidance tries to maintain self-clocking mode once established
47TCP Congestion Window
- TCP introduces a second window, called the congestion window
- This window maintains TCP's best estimate of the amount of outstanding data to allow in the network to achieve self-clocking
48TCP Congestion Window
- To determine how many bytes it may send, the sender takes the minimum of the receiver window (i.e. sliding window) and the congestion window
- Example
- If the receiver window says the sender can transmit 8K, but the congestion window is only 4K, then the sender may only transmit 4K
- If the congestion window is 8K but the receiver window says the sender can transmit 4K, then the sender may only transmit 4K
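The rule is a single min(). A small Python sketch follows; the function name and the optional bytes_in_flight argument are assumptions added for illustration.

```python
K = 1024

def send_limit(receiver_window, congestion_window, bytes_in_flight=0):
    """Bytes the sender may still transmit right now."""
    window = min(receiver_window, congestion_window)
    # Data already sent but not yet acknowledged uses up part of the window.
    return max(0, window - bytes_in_flight)

print(send_limit(8 * K, 4 * K) // K)   # receiver allows 8K, congestion window 4K
print(send_limit(4 * K, 8 * K) // K)   # congestion window 8K, receiver allows 4K
```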
49TCP Slow Start Phase
- TCP defines the maximum segment size (MSS) as the maximum amount of data a TCP packet can carry (not including the header)
- TCP Slow Start
- Congestion window starts small, at 1 segment size
- Each time a transmitted segment is acknowledged, the congestion window is increased by one maximum segment size
- On each ack, cwnd = cwnd + 1
50TCP Slow Start (contd)
Congestion Window Size   Event
1K                       A sends 1 segment to B; B ACKs the segment
2K                       A sends 2 segments to B; B ACKs both segments
4K                       A sends 4 segments to B; B ACKs all four segments
8K                       A sends 8 segments to B; B ACKs all eight segments
16K                      and so on
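The doubling follows directly from the per-ACK rule: a window of n segments produces n ACKs, and each ACK adds one segment. A Python sketch, assuming no loss and one ACK per segment (the function name is made up):

```python
def slow_start_rounds(mss, rounds):
    """Congestion window size at the start of each round, in bytes."""
    cwnd = mss
    sizes = [cwnd]
    for _ in range(rounds):
        acks = cwnd // mss      # one ACK per segment sent this round
        cwnd += acks * mss      # each ACK grows the window by one MSS
        sizes.append(cwnd)
    return sizes

print([size // 1024 for size in slow_start_rounds(1024, 4)])
```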
51TCP Slow Start (contd)
- Congestion window size grows exponentially (i.e. it keeps on doubling)
- Packet losses indicate congestion
- Packet losses are determined by using timers at the sender
- When a timeout occurs, the congestion window is reduced to one maximum segment size and everything starts over
52TCP Slow Start
- When connection begins, increase rate exponentially until first loss event
- double CongWin every RTT
- done by incrementing CongWin for every ACK received
- Summary: initial rate is slow but ramps up exponentially fast
Figure: Host A sends one segment to Host B; one RTT later it sends two segments, then four segments.
53TCP Slow Start (contd)
Figure: Congestion window versus Transmission Number. The window grows exponentially and falls back to 1 Maximum Segment Size at each timed-out transmission.
54TCP Slow Start (contd)
- TCP Slow Start by itself is inefficient
- Although the congestion window builds exponentially, it drops to 1 segment size every time a packet times out
- This leads to low throughput
55TCP Linear Increase Threshold
- Establish a threshold at which the rate increase is linear instead of exponential to improve efficiency
- Algorithm
- Start the threshold at 64K (ssthresh)
- Slow start
- Once the threshold is passed, only increase the congestion window size by 1 segment size for each congestion window of data transmitted
- For each ack received, cwnd = cwnd + (mss * mss)/cwnd
- If a timeout occurs, reset the congestion window size to 1 segment and set threshold to max(2 * mss, 1/2 of MIN(sliding window, congestion window))
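The two rules can be written as small Python functions. This is a sketch with hypothetical names; treating cwnd equal to the threshold as linear mode is an assumption, since the slide only says "once the threshold is passed".

```python
K = 1024

def on_ack(cwnd, ssthresh, mss):
    """New congestion window after one ACK."""
    if cwnd < ssthresh:
        return cwnd + mss                 # slow start: +1 MSS per ACK
    return cwnd + (mss * mss) / cwnd      # linear: about +1 MSS per window

def on_timeout(cwnd, sliding_window, mss):
    """Return (new cwnd, new threshold) after a timeout."""
    ssthresh = max(2 * mss, min(sliding_window, cwnd) / 2)
    return mss, ssthresh

print(on_ack(1 * K, 32 * K, K))
print(on_ack(32 * K, 32 * K, K))
print(on_timeout(40 * K, 40 * K, K))
```

With a 1K segment size, a timeout at a 40K window gives a new threshold of 20K and a window of 1K.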
56TCP Linear Increase Threshold Phase
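Example: Maximum segment size = 1K; assume SSthresh = 32K
Timeout occurs when MIN(sliding window, congestion window) = 40K
Figure: Congestion window and Thresholds versus Transmission Number. The window grows from 1K to the 32K threshold, then linearly to 40K; after the timeout the threshold is 20K and the window restarts at 1K.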
57TCP Fast Retransmit
- Another enhancement to TCP congestion control
- Idea: When sender sees 3 duplicate ACKs, it assumes something went wrong
- The packet is immediately retransmitted instead of waiting for it to timeout
- Why?
- Note that acks are sent by the receiver when it receives a packet
- Dup ack implies something is getting through
- Better than time out
58TCP Fast Retransmit: Example
MSS = 1K. ACKs arriving at the Sender from the Receiver:
- ACK 2048 WIN 31K: ACK of new data
- ACK 2048 WIN 30K: Duplicate ACK 1
- ACK 2048 WIN 29K: Duplicate ACK 2
- ACK 2048 WIN 28K: Duplicate ACK 3
- Fast Retransmit occurs (2nd packet is now retransmitted w/o waiting for it to timeout)
- ACK 2048 WIN 27K
- ACK 7168 WIN 26K
59TCP Fast Recovery
- Yet another enhancement to TCP congestion control
- Idea: Don't do a slow start after a fast retransmit
- Instead, use this algorithm
- Drop threshold to max(2 * mss, 1/2 of MIN(sliding window, congestion window))
- Set congestion window to threshold + 3 * MSS
- For each duplicate ACK (after the fast retransmit), increment congestion window by MSS
- When next non-duplicate ACK arrives, set congestion window equal to the threshold
60TCP Fast Recovery: Example
Continuing with the Fast Retransmit Example...
MSS = 1K; Sliding Window (SW), Congestion Threshold (TH), Congestion Window (CW)
Sender state:
- Before the third duplicate ACK: SW=29K, TH=15K, CW=20K
- ACK 2048 WIN 28K arrives: SW=28K, TH=15K, CW=20K
- Fast Retransmit occurs: SW=28K, TH=10K, CW=13K
- ACK 2048 WIN 27K arrives: SW=27K, TH=10K, CW=14K
- ACK 7168 WIN 26K arrives: SW=26K, TH=10K, CW=10K
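The numbers in this example can be reproduced with a short Python sketch of the fast recovery steps. The function names are made up for this illustration, and sizes are in bytes with MSS = 1K.

```python
K = 1024

def enter_fast_recovery(sw, cw, mss):
    """On the third duplicate ACK: return (new threshold, new congestion window)."""
    th = max(2 * mss, min(sw, cw) // 2)
    return th, th + 3 * mss

def on_duplicate_ack(cw, mss):
    """Each further duplicate ACK inflates the window by one MSS."""
    return cw + mss

def on_new_ack(th):
    """The next non-duplicate ACK deflates the window to the threshold."""
    return th

th, cw = enter_fast_recovery(sw=28 * K, cw=20 * K, mss=K)
print(th // K, cw // K)          # after Fast Retransmit
cw = on_duplicate_ack(cw, K)
print(th // K, cw // K)          # after ACK 2048 WIN 27K
cw = on_new_ack(th)
print(th // K, cw // K)          # after ACK 7168 WIN 26K
```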
61Resulting TCP Sawtooth
In steady state, window oscillates around the bottleneck's capacity (i.e. number of outstanding bytes in transit)
Figure: Congestion window versus Transmission Number (axis marks at 1K, 20K, 32K, 40K). After Slow Start, the window climbs in Linear Mode past the Bottleneck Capacity and drops back, producing the Sawtooth.
62TCP Recap
- Timeout Computation
- Timeout is a function of 2 values
- the weighted average of sampled RTTs
- The sampled variance of each RTT
- Congestion control
- Goal: Keep the self-clocking pipe full in spite of changing network conditions
- 3 key Variables
- Sliding window (Receiver flow control)
- Congestion window (Sender flow control)
- Threshold (Sender's slow start vs. linear mode line)
63TCP Recap (cont)
- Slow start
- Add 1 segment for each ACK to the congestion window
- Doubles the congestion window's volume each RTT
- Linear mode (Congestion Avoidance)
- Add 1 segment's worth of data to each congestion window
- Adds 1 segment per RTT
64Algorithm Summary: TCP Congestion Control
- When CongWin is below Threshold, sender in slow-start phase, window grows exponentially.
- When CongWin is above Threshold, sender is in congestion-avoidance phase, window grows linearly.
- When a triple duplicate ACK occurs, Threshold set to max(FlightSize/2, 2 * mss) and CongWin set to Threshold + 3 * mss. (Fast retransmit, Fast recovery)
- When timeout occurs, Threshold set to max(FlightSize/2, 2 * mss) and CongWin is set to 1 MSS.
- FlightSize = MIN(sliding window, congestion window): the amount of data that has been sent but not yet acknowledged.
65TCP sender congestion control
66TCP Fairness
- Fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
67Why is TCP fair?
- Two competing sessions
- Additive increase gives slope of 1, as throughput increases
- multiplicative decrease decreases throughput proportionally
Figure: Connection 2 throughput versus Connection 1 throughput, each axis running up to R, with the equal bandwidth share line. Congestion avoidance: additive increase. Loss: decrease window by factor of 2.
68Fairness (more)
- Fairness and UDP
- Multimedia apps often do not use TCP
- do not want rate throttled by congestion control
- Instead use UDP
- pump audio/video at constant rate, tolerate packet loss
- Research area: TCP friendly
- Fairness and parallel TCP connections
- nothing prevents app from opening parallel connections between 2 hosts.
- Web browsers do this
- Example: link of rate R supporting 9 connections
- new app asks for 1 TCP, gets rate R/10
- new app asks for 11 TCPs, gets R/2 !
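The parallel-connection example is simple arithmetic if every connection is assumed to get an equal share of the link. A Python sketch under that assumption (the function name is made up):

```python
def app_share(existing_connections, new_connections, R=1.0):
    """Rate obtained by an app that opens new_connections on a shared link."""
    total = existing_connections + new_connections
    return R * new_connections / total

print(app_share(9, 1))    # one connection among ten
print(app_share(9, 11))   # eleven connections among twenty
```

With 11 connections the app holds 11 of 20 equal shares, slightly more than the R/2 quoted above.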
69Delay modeling
- Q: How long does it take to receive an object from a Web server after sending a request?
- Ignoring congestion, delay is influenced by
- TCP connection establishment
- data transmission delay
- slow start
- Notation, assumptions
- Assume one link between client and server of rate R
- S: MSS (bits)
- O: object size (bits)
- no retransmissions (no loss, no corruption)
- Window size
- First assume: fixed congestion window, W segments
- Then dynamic window, modeling slow start
70Fixed congestion window (1)
- First case
- WS/R > RTT + S/R: ACK for first segment in window returns before window's worth of data sent
delay = 2RTT + O/R
71Fixed congestion window (2)
- Second case
- WS/R < RTT + S/R: wait for ACK after sending window's worth of data
delay = 2RTT + O/R + (K-1)[RTT - (W-1)S/R]
delay = 2RTT + O/R + (K-1)[S/R + RTT - WS/R]
K = O/(WS)
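Both fixed-window cases fit in one small Python function. This is a sketch: the function name is made up, and rounding K up to a whole number of windows is an assumption for objects that are not an exact multiple of WS.

```python
import math

def fixed_window_delay(O, S, R, RTT, W):
    """Delay for an object of O bits with a fixed window of W segments of S bits."""
    base = 2 * RTT + O / R
    stall = S / R + RTT - W * S / R   # idle time after each window
    if stall <= 0:
        # First case: ACKs return before the window is used up.
        return base
    # Second case: the server stalls after every window except the last.
    K = math.ceil(O / (W * S))
    return base + (K - 1) * stall

# Hypothetical numbers: 8000-bit object, 1000-bit segments, 1000 bits/s, RTT = 2 s.
print(fixed_window_delay(O=8000, S=1000, R=1000, RTT=2, W=4))  # first case
print(fixed_window_delay(O=8000, S=1000, R=1000, RTT=2, W=2))  # second case
```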
72TCP Delay Modeling Slow Start (1)
- Now suppose window grows according to slow start
- Will show that the delay for one object can be written in terms of P, Q and K, where
- P is the number of times TCP idles at server
- Q is the number of times the server idles if the object were of infinite size
- K is the number of windows that cover the object
73TCP Delay Modeling Slow Start (2)
- Delay components
- 2 RTT for connection estab and request
- O/R to transmit object
- time server idles due to slow start
- Server idles P = min{K-1, Q} times
- Example
- O/S = 15 segments
- K = 4 windows
- Q = 2
- P = min{K-1, Q} = 2
- Server idles P = 2 times
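The example can be checked with a short Python sketch. Assumptions, none of which are spelled out in the text above: the window starts at one segment and doubles each round, Q is passed in rather than computed, and the delay uses the closed form commonly given for this model, delay = 2RTT + O/R + P(RTT + S/R) - (2^P - 1)S/R. Function names are made up.

```python
import math

def windows_to_cover(O, S):
    """K: number of slow-start windows (1, 2, 4, ... segments) that cover the object."""
    segments = math.ceil(O / S)
    K, covered, window = 0, 0, 1
    while covered < segments:
        covered += window
        window *= 2
        K += 1
    return K

def slow_start_delay(O, S, R, RTT, Q):
    """Delay with slow start, for a given Q (idle count for an infinite object)."""
    K = windows_to_cover(O, S)
    P = min(K - 1, Q)
    return 2 * RTT + O / R + P * (RTT + S / R) - (2 ** P - 1) * S / R

# O/S = 15 segments gives K = 4; with Q = 2 the server idles P = 2 times.
print(windows_to_cover(15000, 1000))
print(slow_start_delay(O=15000, S=1000, R=1000, RTT=2, Q=2))
```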
74TCP Delay Modeling (3)
75TCP Delay Modeling (4)
Recall K = number of windows that cover object
How do we calculate K?
Calculation of Q, number of idles for infinite-size object, is similar (see HW).
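One way to work out K, assuming the first window is one segment and each later window doubles (so the k-th window carries 2^{k-1} segments of S bits): K is the smallest k for which the first k windows together cover the O bits of the object.

```latex
\begin{align*}
K &= \min\{\, k : 2^{0}S + 2^{1}S + \cdots + 2^{k-1}S \ge O \,\} \\
  &= \min\{\, k : 2^{k} - 1 \ge O/S \,\} \\
  &= \left\lceil \log_2\!\left(\frac{O}{S} + 1\right) \right\rceil
\end{align*}
```

For O/S = 15 this gives K = 4, matching the earlier example.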