Title: How to use it
1 How to use it
- Press Space to advance the slide animation.
- Do not hurry to press Space again. Wait for the end of the animation.
- If you want to go back, use the PgUp key.
Version: 08 June 1999. Come back later: the presentation is under construction now.
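2 Encapsulation of data into an Ethernet packet
[Figure: user data passes down the protocol stack and each layer adds its own header.]
Application data = application header + user data.
Frame on the wire, in order: Ethernet header, IP header, TCP header, application data, Ethernet trailer.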
3 IEEE 802.2/802.3 Encapsulation (RFC 1042)
LENGTH contains the length of the packet from the next byte up to the CRC (the CRC is not included).
DSAP (Destination Service Access Point) and SSAP (Source Service Access Point) are both set to 0xAA.
CNTL (Control field) is set to 3.
ORG CODE is always 0 in all bytes.
TYPE identifies the data that follows. For example, type 0x0800 (hex) means that an IP datagram follows.
4 Ethernet Encapsulation (RFC 894)
5 IP packet structure
[Figure: first two 32-bit words of the IP header, bits 0-15 and 16-31. Word 1: 4-bit version, 4-bit IHL, TOS, 16-bit total packet length. Word 2: 16-bit identification, 3-bit flags, 13-bit fragment offset.]
VERSION - the current protocol version is 4.
IHL - IP header length. IHL is the number of 32-bit words in the IP header. The field is 4 bits long, so the maximum header length is 60 bytes.
TOS - type of service. It consists of 3 precedence bits (ignored), 4 TOS bits, and an unused bit which must be 0. The 4 TOS bits are:
- minimize delay
- maximize throughput
- maximize reliability
- minimize monetary cost
Only 1 of these 4 bits can be turned on.
TPL - total packet length: the total length of the IP packet in bytes. The field is 16 bits long, so the maximum length of an IP packet is 65535 bytes.
IDENTIFICATION - this field is used when IP needs to fragment datagrams. It identifies each datagram and is incremented each time a datagram is sent. We will see the meaning of this field when we talk about fragmentation.
FLAGS and FRAGMENT OFFSET - we will also look at these when we talk about fragmentation.
Continue...
6 IP packet structure
[Figure: remaining words of the IP header, bits 0-15 and 16-31: TTL, protocol, header checksum; source address; destination address; options (padding); data.]
TTL - time-to-live sets an upper limit on the number of routers through which a datagram can pass. The field is decremented each time the datagram passes a router. When the field reaches 0, the datagram is dropped by the router and an ICMP message is sent to the sender of the datagram.
PROTOCOL - this field identifies the DATA portion of the datagram (which protocol is encapsulated in the IP datagram).
HEADER CHECKSUM is calculated over the IP header only.
SOURCE and DESTINATION addresses are the sender's and the receiver's IP addresses.
OPTIONS is a variable-length field which contains some options. We will discuss some of them later. The options field always ends on a 32-bit boundary. PAD bytes (value 0) are added if necessary.
DATA is the data carried by the datagram.
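The field layout described on the two slides above can be sketched in Python. This is a minimal illustration, not a full IP implementation: the function name `parse_ipv4_header` and the sample values are hypothetical, and the field widths that the slides do not state (8-bit TOS, TTL and protocol, 16-bit header checksum) are taken from the standard IPv4 layout.

```python
import socket
import struct


def parse_ipv4_header(packet: bytes) -> dict:
    """Decode the fixed 20-byte part of an IPv4 header (options are not parsed)."""
    if len(packet) < 20:
        raise ValueError("an IPv4 header is at least 20 bytes long")
    (ver_ihl, tos, total_length, identification, flags_frag,
     ttl, protocol, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    ihl = ver_ihl & 0x0F                      # number of 32-bit words
    return {
        "version": ver_ihl >> 4,
        "header_length": ihl * 4,             # bytes; at most 15 * 4 = 60
        "tos": tos,
        "total_length": total_length,
        "identification": identification,
        "flags": flags_frag >> 13,            # upper 3 bits
        "fragment_offset": flags_frag & 0x1FFF,  # lower 13 bits
        "ttl": ttl,
        "protocol": protocol,
        "header_checksum": checksum,
        "source": socket.inet_ntoa(src),
        "destination": socket.inet_ntoa(dst),
    }


# Hypothetical sample: version 4, IHL 5, total length 40, TTL 64, protocol 6 (TCP).
sample = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 1, 0, 64, 6, 0,
                     socket.inet_aton("1.1.1.2"), socket.inet_aton("1.1.1.1"))
fields = parse_ipv4_header(sample)
```

The sample header leaves the checksum field at 0, so it only demonstrates field extraction.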
7 Special case IP addresses
IP address classes
Class A: 0.0.0.0 to 127.255.255.255
Class B: 128.0.0.0 to 191.255.255.255
Class C: 192.0.0.0 to 223.255.255.255
Class D: 224.0.0.0 to 239.255.255.255 (multicast)
Class E: 240.0.0.0 to 247.255.255.255
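Since the class is determined by the first octet alone, the table above can be turned into a small lookup. The function name `ip_class` is hypothetical, and the sketch follows the ranges exactly as listed here: anything above 247 is reported as "unclassified", which is a choice of this sketch rather than something the slide states.

```python
def ip_class(address: str) -> str:
    """Return the class letter of a dotted-decimal IPv4 address, per the table above."""
    first_octet = int(address.split(".")[0])
    if not 0 <= first_octet <= 255:
        raise ValueError("first octet out of range")
    if first_octet <= 127:
        return "A"
    if first_octet <= 191:
        return "B"
    if first_octet <= 223:
        return "C"
    if first_octet <= 239:
        return "D"   # multicast
    if first_octet <= 247:
        return "E"
    return "unclassified"  # above the ranges listed on the slide
```

For example, `ip_class("172.19.128.255")` gives "B" and `ip_class("224.8.8.1")` gives "D".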
8 ARP and RARP
- ARP. For example, we are working on an Ethernet network. The Ethernet driver and adapter use MAC addresses. TCP/IP uses IP addresses. When a host wants to send data to another host, it knows only the receiver's IP address and passes this information to the TCP/IP stack. The TCP/IP stack therefore needs a mechanism to map between MAC and IP addresses. IP has two protocols for this: ARP and RARP.
- RARP. If a system has no hard or floppy drive and must boot from the network, it cannot take its IP address from local resources. Such a system has only a MAC address. RARP is the protocol that allows the system to obtain an IP address from the network.
9 ARP
[Flowchart: ARP address resolution]
Sending host:
- IP asks ARP to resolve an IP address to a hardware address.
- ARP: do I know the hardware address?
- Yes: pass the data to the Ethernet driver.
- No: send an ARP request through the Ethernet driver.
Every other host (Ethernet driver, then ARP):
- Is somebody looking for my address?
- No: ignore the request.
- Yes: send an ARP reply.
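10 RARP
[Flowchart: RARP exchange between a diskless workstation and a RARP server]
Diskless workstation:
- Boot.
- Read own hardware network address.
- Send RARP request.
- Receive the RARP reply: I have an IP address!
RARP server:
- Somebody wants to have an IP address!
- Give that host an IP address from my table.
- Send RARP reply.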
11 ARP packet
type - Ethernet frame type: 0x0806.
hardware type - specifies the hardware type: 1 for Ethernet.
protocol type - 0x0800 for IP.
hardware size - size of the hardware address: 6 for Ethernet.
protocol size - size of the protocol address: 4 for IP.
op - type of operation (request or reply): ARP request - 1, ARP reply - 2, RARP request - 3, RARP reply - 4.
Dest address - the Ethernet destination address: broadcast.
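The field values listed above can be packed into an ARP request body as a rough sketch. The function name `build_arp_request` and the sample addresses are hypothetical. The slide does not show the four address fields that follow `op` (sender hardware address, sender IP, target hardware address, target IP); their order here is an assumption based on the standard ARP layout.

```python
import socket
import struct

ARP_REQUEST = 1  # op values from the slide: 1 request, 2 reply


def build_arp_request(sender_mac: bytes, sender_ip: str, target_ip: str) -> bytes:
    """Build the 28-byte ARP request body (without the Ethernet header)."""
    return struct.pack(
        "!HHBBH6s4s6s4s",
        1,                      # hardware type: Ethernet
        0x0800,                 # protocol type: IP
        6,                      # hardware size
        4,                      # protocol size
        ARP_REQUEST,            # op
        sender_mac,
        socket.inet_aton(sender_ip),
        b"\x00" * 6,            # target hardware address is unknown in a request
        socket.inet_aton(target_ip),
    )


# Hypothetical sender 1.1.1.2 asking for the hardware address of 1.1.1.1.
arp_packet = build_arp_request(b"\x02\x00\x00\x00\x00\x01", "1.1.1.2", "1.1.1.1")
```

On the wire this body would follow an Ethernet header with frame type 0x0806 and the broadcast destination address.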
12 ICMP - Internet Control Message Protocol (RFC 792): packet structure
8-bit type
8-bit code
16-bit checksum (for the entire ICMP message)
Contents depend on type and code
13 ICMP address mask request and reply
Type: 17 - request, 18 - reply
Code: 0
16-bit checksum (for the entire ICMP message)
identifier (anything)
sequence number (anything)
Subnet mask
Message length: 12 bytes
ICMP timestamp request and reply
Type: 13 - request, 14 - reply
Code: 0
16-bit checksum (for the entire ICMP message)
identifier (anything)
sequence number (anything)
32-bit originate timestamp
32-bit receive timestamp
32-bit transmit timestamp
Message length: 20 bytes
14 ICMP port unreachable error
The error message must include:
- the IP header of the datagram that generated the error;
- at least the 8 bytes that followed this IP header. In this example it is the UDP header.
General format of the ICMP unreachable message
type: 3
code: 0-15
16-bit checksum (for the entire ICMP message)
Unused (must be 0)
(these fields take 8 bytes)
IP header (including options) + first 8 bytes of the original IP datagram data
15 ICMP echo request and echo reply (PING)
[Diagram: ping exchange between client and server]
- Client: I want to know whether the server is alive. Send echo request.
- Server: I received a ping to my address. Answer the client: send echo reply.
- Client: the server is alive.
Packets
type: 0 - reply, 8 - request
code: 0
16-bit checksum (for the entire ICMP message)
identifier
sequence number
(these fields take 8 bytes)
Optional data
identifier - the process ID of the sending process.
sequence number - starts at 0 and is incremented every time a new echo request is sent.
The server must echo the identifier and sequence number fields in its reply. Historically, ping has operated in a mode where it sends an echo request once a second.
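A minimal sketch of building an echo request with the fields listed above. The function names and the sample identifier are hypothetical. The checksum routine is the usual 16-bit one's complement sum used by ICMP; the slide only says that the checksum covers the entire ICMP message, so the algorithm itself is an assumption taken from the standard definition.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of the data."""
    if len(data) % 2:
        data += b"\x00"                       # pad to a whole number of 16-bit words
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:                        # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int, payload: bytes = b"") -> bytes:
    """ICMP echo request: type 8, code 0, checksum over the entire message."""
    header = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence) + payload


# Hypothetical values: identifier 1234 (a process ID), first request of a run.
echo_request = build_echo_request(identifier=1234, sequence=0, payload=b"ping")
```

A receiver can verify the message by running the same checksum over the whole packet: a correct message yields 0.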
16 IP record route option (ping -r option)
[Diagram: the client sends an echo request with the -r option through Router 1, Router 2 and Router 3 to the server; the server sends an echo reply.]
Routers put the IP addresses of their outgoing interfaces into the RR option of the packet.
Packet IP option
[Figure: RR option layout: code, len, ptr, then the IP address slots; ptr values 4, 8, 12, 16, 20, 24, 28, ...]
Code - 1-byte field specifying the type of IP option. For the RR option its value is 7.
Len - total number of bytes of the RR option. Ping always provides a 39-byte option, to record up to 9 IP addresses, which is the maximum.
Ptr - points to the place in the option where the next IP address is stored.
There is limited room in the IP header for the list of IP addresses, because the entire IP header is limited to 15 32-bit words (60 bytes). There are only up to 40 bytes for the options field in the IP header.
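The limit of 9 addresses follows from simple arithmetic, sketched below. The constant names are hypothetical; the 20-byte fixed header is implied by the 60-byte maximum and the 40 bytes left for options, and the 3 bytes of overhead are the code, len and ptr fields.

```python
MAX_IHL_WORDS = 15        # the 4-bit IHL field counts 32-bit words
FIXED_HEADER_BYTES = 20   # IP header without options
RR_OVERHEAD_BYTES = 3     # code, len and ptr, one byte each
ADDRESS_BYTES = 4         # one IPv4 address

max_header_bytes = MAX_IHL_WORDS * 4                                  # 60
option_room = max_header_bytes - FIXED_HEADER_BYTES                   # 40
max_addresses = (option_room - RR_OVERHEAD_BYTES) // ADDRESS_BYTES    # 9
rr_option_length = RR_OVERHEAD_BYTES + max_addresses * ADDRESS_BYTES  # 39
```

One byte of the 40 is left over, which is why the option is padded to end on a 32-bit boundary.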
17 BROADCASTING
Four types of IP broadcast
Name: Limited
Address: 255.255.255.255
Description: limited broadcast; never forwarded by a router.
Name: Net-directed
Address: netid.255.255.255
Description: routers forward this kind of broadcast. The broadcast is addressed to the IP network netid.
Name: Subnet-directed
Address: host ID all one bits
Description: broadcast for a specific subnet; knowledge of the subnet mask is required. For example, 172.19.128.255 is the broadcast address for subnet 172.19.128.x with subnet mask 255.255.255.0.
Name: All-subnets-directed
Address: subnet ID all ones, host ID all ones
Description: knowledge of the subnet mask is required. If the network is subnetted, this is an all-subnets-directed broadcast. If the network is not subnetted, this is a net-directed broadcast.
18 MULTICASTING
Note: on Ethernet, the multicast address bit is 01:00:00:00:00:00.
Addressing
Do you remember? Here is the format of a class D IP address:
- the first four bits of a class D address are 1110, so the first byte runs from 1110 0000 (224) to 1110 1111 (239);
- the remaining 28 bits are the multicast group ID.
The set of hosts listening to a particular IP multicast address is called a host group. A host group can span multiple networks. Membership in a host group is dynamic: hosts may join and leave a host group at will. There is no restriction on the number of hosts in a host group, and a host does not have to belong to a group to send a message to that group.
19 MULTICASTING
Converting multicast group addresses to Ethernet addresses
The Ethernet addresses corresponding to IP multicasting are in the range 01:00:5e:00:00:00 through 01:00:5e:7f:ff:ff.
We have 23 bits in the Ethernet address to correspond to the IP multicast group ID. The mapping places the low-order 23 bits of the multicast group ID into these 23 bits of the Ethernet address.
[Figure: class D IP address above the Ethernet address 01:00:5e:...; the upper 5 bits of the multicast group ID are not used to form the Ethernet address; the low-order 23 bits of the multicast group ID are copied to the Ethernet address.]
Since the upper 5 bits of the multicast group ID are ignored in this mapping, it is not unique: 32 different multicast group IDs map to the same Ethernet address (5 bits give 32 combinations). The device driver or the IP software must perform filtering, since the interface card may receive multicast frames in which the host is really not interested.
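The mapping described above can be sketched in a few lines. The function name `multicast_ip_to_mac` is hypothetical; the prefix 01:00:5e and the 23-bit copy come from the text.

```python
import socket


def multicast_ip_to_mac(group: str) -> str:
    """Map a class D IP address to its Ethernet multicast address."""
    ip = int.from_bytes(socket.inet_aton(group), "big")
    if ip >> 28 != 0b1110:
        raise ValueError("not a class D (multicast) address")
    low23 = ip & 0x7FFFFF                    # low-order 23 bits of the group ID
    mac = (0x01005E << 24) | low23           # 01:00:5e followed by those 23 bits
    return ":".join(f"{byte:02x}" for byte in mac.to_bytes(6, "big"))
```

For example, 224.8.8.1 maps to 01:00:5e:08:08:01, while 224.128.64.32 and 224.0.64.32 differ only in the ignored upper bits and therefore map to the same Ethernet address.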
20 IGMP reports and queries
(Internet Group Management Protocol)
[Animated diagram: two hosts running processes 1, 2 and 3, and a multicast router attached through interface 1. Each host keeps a table of multicast groups with the number of participating processes; the router keeps a list of the multicast groups on interface 1.]
Group addresses: group 1 - 224.8.8.1, group 2 - 224.8.8.2.
Events shown in the animation:
- A process joins group 1. The host sends an IGMP report: Dest IP - 224.8.8.1, Group IP - 224.8.8.1. After a random timer (for example, 2 or 3 seconds) it sends another IGMP report with the same addresses.
- Another process on the same host joins group 1 and group 2. Group 1 is already reported, so the host reports group 2 only: Dest IP - 224.8.8.2, Group IP - 224.8.8.2, followed by another IGMP report for group 2 after the random timer.
- Timer! The router sends an IGMP query: Dest IP - 224.0.0.1, Group IP - 0.
- Each host waits for 0-10 seconds and then sends an IGMP report for each group it belongs to. The router marks group 1 and group 2 as alive.
- A process leaves group 2. The host does not report group 2 next time.
21 IGMP packet
IGMP message (8 bytes):
- IGMP version (1)
- IGMP type (1-2)
- unused
- 16-bit checksum
- 32-bit group address (class D IP address)
Version - 1.
Type - 1: query sent by a multicast router; 2: response sent by a host.
Group address - a class D IP address. In a query the group address is set to 0.
22 UDP
23 UDP packet
Source port
Destination port
UDP length
UDP checksum
DATA (if any)
24 TFTP - Trivial File Transfer Protocol
Packet types
Requests (mode: netascii or octet)
Data packet
ACK packet
Error packet
25 TFTP operations
[Diagram: reading a file; client process on the left, server on the right]
- Client: I need the file File from the server. Read request for File: opcode 1, Dest UDP port 69, Source UDP port - chosen by the application.
- Server: can the file be read by the client? YES.
- Server: file transfer, opcode 3, block number 1, 512 bytes. Dest UDP port - the application's port; Source UDP port - a new port number, appointed for this file transfer by the TFTP server. These port numbers will be used during the file transfer.
- Client: receives block 1. ACK: opcode 4, block number 1.
- Server: the client has received block 1. File transfer, opcode 3, block number 2, 356 bytes (the last block of File).
- Client: receives block 2. The data size is less than 512 bytes, so this is the last block of the file. ACK: opcode 4, block number 2.
To write a file, the client sends a WRQ. If all is OK, the server responds with an ACK with block number 0. And so on.
Error messages. The server responds with this type of packet if a read request or write request cannot be processed. A read or write error during file transmission can also cause this message to be sent, and transmission is then terminated.
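The packets of the read transfer above can be sketched as follows. The function names are hypothetical. Opcodes 1, 3 and 4 and the 512-byte block size come from the slide; the null-terminated filename and mode strings in the request are an assumption based on the standard TFTP packet layout, which the slide does not show.

```python
import struct

OP_RRQ, OP_DATA, OP_ACK = 1, 3, 4   # opcodes used in the read transfer above
BLOCK_SIZE = 512


def build_rrq(filename: str, mode: str = "octet") -> bytes:
    """Read request: opcode, then filename and mode as null-terminated strings."""
    return (struct.pack("!H", OP_RRQ)
            + filename.encode("ascii") + b"\x00"
            + mode.encode("ascii") + b"\x00")


def build_data(block: int, data: bytes) -> bytes:
    """Data packet: opcode 3, block number, up to 512 bytes of data."""
    return struct.pack("!HH", OP_DATA, block) + data


def build_ack(block: int) -> bytes:
    """ACK packet: opcode 4 and the block number being acknowledged."""
    return struct.pack("!HH", OP_ACK, block)


def is_last_block(data_packet: bytes) -> bool:
    """A data packet with fewer than 512 data bytes ends the transfer."""
    return len(data_packet) - 4 < BLOCK_SIZE
```

In the example transfer, block 1 carries 512 bytes, so the client keeps waiting; block 2 carries 356 bytes, so `is_last_block` returns True.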
26 BOOTP - Bootstrap Protocol
BOOTP Packet Format
27 BOOTP datagram
[Figure: BOOTP datagram, 300 bytes, drawn as 32-bit rows (bits 0-7, 8-15, 16-23, 24-31)]
Fields, in order:
- opcode: 1 - request, 2 - reply.
- hardware type: 1 for Ethernet.
- hardware address length: 6 for Ethernet.
- hop count: set to 0 by the client.
- transaction ID: set by the client and returned by the server.
- number of seconds: set by the client.
- unused
- client IP address: set by the client. If the client does not have an address, it is 0.
- your IP address: filled in by the server with the client's IP address.
- server IP address: filled in by the server.
- gateway IP address: filled in by a proxy server, if there is one.
- client hardware address (16 bytes): must be set by the client.
- server hostname (64 bytes): null-terminated string that is optionally filled in by the server.
- boot filename (128 bytes): fully qualified, null-terminated pathname of a file to bootstrap from.
- vendor-specific information (64 bytes)
28 BOOTP
Port numbers
Server - 67
Client - 68
Vendor-specific information
If information is provided in the vendor-specific field, the first 4 bytes of this area are set to the IP address 99.130.83.99. This is called the magic cookie.
[Figure: item formats in the vendor-specific area. Items such as Subnet mask and Gateway consist of a tag, a length and the value; there are many fields ...]
Pad
End - end of the items. Any bytes after this should be set to 255.
29 BOOTP operations
[Animated diagram: boot process. Client: BOOTP process, UDP port 68. Server: BOOTP server, UDP port 67, IP - 1.1.1.1; the address for the client is 1.1.1.2.]
- Client's request: Dest UDP port 67, Source IP 0.0.0.0, Dest IP - 255.255.255.255.
- Server's reply: Source IP - 1.1.1.1, Your IP - 1.1.1.2, Server IP - 1.1.1.1, Gateway IP - 1.1.1.1, Boot file name - BFILE.
- Client: receiving information. Is my IP address unique?
- ARP request to see if anyone else on the network has the same address: Target IP - 1.1.1.2, Source IP - 0.0.0.0. The client sends a second ARP request 0.5 second later, and a third ARP request 0.5 second after that. In the third ARP request the source IP address is 1.1.1.2 (the client's address).
- NOBODY ANSWERS. My IP address is unique!
- Client's request: Source IP 1.1.1.2, Dest IP - 255.255.255.255.
- Server's reply: Source IP - 1.1.1.1, Your IP - 1.1.1.2, Server IP - 1.1.1.1, Gateway IP - 1.1.1.1, Boot file name - BFILE.
- ARP request (who is the server?): Sender IP - 1.1.1.2, Target IP - 1.1.1.1.
- ARP reply: Sender - 1.1.1.1, carrying the server's hardware address.
- TFTP: the client reads the boot file BFILE from the server.
- Client: I have an IP address, I have a loadable image. I can start!
30 TCP
31 TCP packet
Source port
Destination port
Sequence number
Acknowledgment number
Header length (4 bits)
Reserved (6 bits)
Flags (6 bits)
Window
Checksum
Urgent pointer
Options (padding)
DATA
The MSS option is used only in SYN packets.
32 TCP sequence and acknowledgement numbers
[Diagram: data exchange between client and server]
- Client: sends 10 bytes, SEQ 10, no ACK.
- Server: receives SEQ 10 and 10 bytes. My ACK = 10 (SEQ) + 10 bytes = 20. Sends its own data with its own SEQ and ACK 20: 20 bytes, SEQ 30, ACK 20.
- Client: receives SEQ 30, DATA 20, ACK 20. My ACK = 30 + 20 = 50. The server received my data: its ACK 20 equals my current SEQ (previous SEQ plus data sent, 10 + 10). Sends 10 bytes, SEQ 20, ACK 50.
- Server: receives SEQ 20, DATA 10, ACK 50. My ACK = 20 + 10 = 30. The client received my data: its ACK 50 equals my current SEQ (previous SEQ plus data sent, 30 + 20). Sends 20 bytes, SEQ 50, ACK 30.
And so on.
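The bookkeeping in this exchange can be replayed with a small sketch. The `Endpoint` class and its attribute names are hypothetical; it only tracks the two numbers discussed on the slide (the next SEQ to send and the next byte expected from the peer) and ignores everything else TCP does.

```python
class Endpoint:
    """Tracks only the SEQ and ACK numbers of one side of a connection."""

    def __init__(self, next_seq: int):
        self.next_seq = next_seq   # SEQ of the next byte this side will send
        self.next_ack = None       # next byte expected from the peer (None: nothing received yet)

    def send(self, nbytes: int) -> dict:
        segment = {"seq": self.next_seq, "len": nbytes, "ack": self.next_ack}
        self.next_seq += nbytes    # current SEQ = previous SEQ plus data sent
        return segment

    def receive(self, segment: dict) -> None:
        # ACK = received SEQ + number of data bytes received
        self.next_ack = segment["seq"] + segment["len"]


client, server = Endpoint(10), Endpoint(30)
s1 = client.send(10); server.receive(s1)   # SEQ 10, no ACK
s2 = server.send(20); client.receive(s2)   # SEQ 30, ACK 20
s3 = client.send(10); server.receive(s3)   # SEQ 20, ACK 50
s4 = server.send(20); client.receive(s4)   # SEQ 50, ACK 30
```

The four segments reproduce the numbers from the diagram.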
33 TCP connection establishment
[Diagram: client performs the active open, ISN 145; server performs the passive open, ISN 348. ISN - initial sequence number.]
- Client: sends a packet with the S (SYN) flag (a SYN segment): SEQ 145, no ACK, Flags S. The packet contains the port number of the server that the client wants to connect to.
- Server: receives the packet. Responds with its own SYN segment containing its own SN and an ACK for the client's SYN plus one (a SYN consumes one sequence number): ACK = 145 + 1 = 146. Sends SEQ 348, ACK 146, Flags SA.
- Client: receives the server's response. The response contains the correct ACK. Acknowledges the server's SYN with ACK = server's SN + 1 = 348 + 1 = 349. Sends ACK 349, Flags A.
The three segments described complete the connection establishment. This is often called the three-way handshake.
34 TCP connection termination
[Diagram: client performs the active close, server the passive close]
- Client: the user types quit, for example. The next ACK should be, for example, 426 and my own SN must be 658. Sends FIN (a packet with the FIN flag): SEQ 658, ACK 426, Flags FA.
- Server: receives the FIN packet. Responds with the corresponding ACK: ACK 659, Flags A.
- Now the connection is half-closed. The server may still send some data to the client, with corresponding ACKs. Then the server closes the other direction of the connection (I should close the second direction): SEQ 426, ACK 659, Flags FA.
- Client: receives the FIN packet. Responds with the corresponding ACK: ACK 427, Flags A.
A TCP connection is full duplex, and each direction must be shut down independently.
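35 TCP states for connection establishment and termination
[Diagram: segments exchanged between client and server, with the state each side enters]
- Client sends SYN J: client SYN_SENT, server SYN_RCVD.
- Server sends SYN K, ack J+1: client ESTABLISHED.
- Client sends ack K+1: server ESTABLISHED.
- Client sends FIN M: client FIN_WAIT_1, server CLOSE_WAIT.
- Server sends ack M+1: client FIN_WAIT_2.
- Server sends FIN N: server LAST_ACK, client TIME_WAIT.
- Client sends ack N+1: server CLOSED.
The client stays in the TIME_WAIT state for twice the MSL.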
36 2MSL state
- All datagrams received for the connection are discarded.
- It is impossible to open another connection for this socket pair (the same IP addresses and port numbers).
Quiet Time
A host in the 2MSL wait may crash, reboot within MSL seconds and immediately establish new connections using the same local and foreign IP addresses and port numbers. To protect against this scenario, RFC 793 states that TCP should not create any connection for MSL seconds after rebooting. This is called the quiet time.
Reset Segments
Reset segment - the reset (RST) bit in the TCP header is set to 1. Any queued data is thrown away and the reset is sent immediately. The receiver of the RST can tell that the other end did an abort instead of a normal close.
Example. We try to connect to a server port number that is not in use on the destination. With UDP, a port unreachable message is sent in this case. TCP sends a reset segment.
SEQ 400, Flags S, port 10000
SEQ 0, ACK 401, Flags RA
The server does not have a process on port 10000.
FIN - orderly release. RST - abortive release.
37 Half-Open
[Animation: packets flow between two peers]
All is fine!
But sometimes something can crash.
The computer that is still alive does not know that its peer has died. The peer has not sent a FIN or RST segment. The connection is half-open.
38 Simultaneous Open
Usual connection open
SYN J
SYN_SENT
SYN_RCVD
SYN K, ack J+1
ESTABLISHED
ack K+1
ESTABLISHED
Simultaneous Open
Result - one connection, not two.
39 Simultaneous Close
Usual connection close
FIN M
FIN_WAIT_1
CLOSE_WAIT
ack M+1
FIN_WAIT_2
LAST_ACK
FIN N
TIME_WAIT
ack N+1
CLOSED
Simultaneous Close
40 TCP options (RFC 793 and 1323)
(examples)
End of option list
No operation
These two options do not have a length field. The others do.
Maximum segment size
Window scale factor
Timestamp
len is the total length of the option, including the kind and len bytes.
41 Delayed Acknowledgment (delayed ACK)
For example, the delayed ACK timer here is 200 ms. Watch the client.
[Timeline diagram: client and server on a TIME axis marked in 200 ms intervals. Segments shown: PSH 2:6 (4) ack 11; ack 6; PSH 612 (4) ack 11; PSH 11:15 (4) ack 12 (piggyback). Labels: START KERNEL, long time..., is waiting, And now..., Another instant, Here delayed ACK flag is turned off.]
The client does not send an ACK immediately. It delays the ACK, hoping to have data to send in the same direction as the ACK. It can wait until the next delayed-ACK boundary.
When TCP decides to send a data packet, the ACK is piggybacked on it.
42 Nagle algorithm
[Animated diagram: the APPLICATION writes data 1 byte at a time into the TCP buffer; mss is 20 bytes.]
- Send packet: PSH 2:3 (1) ack 2.
- More bytes arrive in the TCP buffer. TCP does not send a packet: we are waiting for the ACK of the first packet.
- ack 3: TCP has received the ACK. Now it can send the data from the buffer: PSH 3:5 (2) ack 2.
- Again TCP does not send a small packet: we are waiting for the ACK of the previous packet.
- TCP has enough data to send an entire packet (20 bytes), and it does so: PSH 5:25 (20) ack 2.
- ack 5, ack 25.
Some time has passed...
- PSH 8:10 (2) ack 55
- PSH 55:56 (1) ack 10
- ack 56: an ACK has been received, I have data, so I prepare and send a packet: PSH 10:12 (2) ack 56.
- Before the packet was pushed onto the physical media, another packet from the server had been received: PSH 56:58 (2) ack 10.
- Now I have data to send again, and I have a free ACK from the server's packet: PSH 56:58 (2) ack 12.
43 TCP timers
- Retransmission timer. This timer is used when expecting an acknowledgment from the other end.
- Persist timer. It keeps window size information flowing even if the other end closes its receive window.
- Keepalive timer. It detects when the other end of an otherwise idle connection crashes or reboots.
- 2MSL timer. It measures the time a connection has been in the TIME_WAIT state.
44 Round-Trip Time
[Diagram: send bytes, PSH 2:3 (1) ack 2; receive the ACK for those bytes, ack 3. The time between them is the measured RTT (M).]
The following formulas are used to calculate the retransmission timeout value (RTO):
Err = M - A
A <- A + g*Err
D <- D + h*(|Err| - D)
RTO = A + 4D
A - smoothed RTT (an estimator of the average)
D - smoothed mean deviation
g - 0.125 (1/8)
h - 0.25
Karn's algorithm. The algorithm specifies that when a retransmission occurs, we cannot update the RTT estimators when the acknowledgement for the retransmitted data finally arrives.
45 RTT example. Measurement.
Most implementations measure only one RTT value per connection at any time. If the timer for a given connection is already in use when a data segment is transmitted, that segment is not timed.
[Time-line diagram; segments are numbered in order]
segment 1: 1:257 (256) ack 1 - start timer
segment 2: ack 257 - stop timer, RTT #1 = 1.061 sec
segment 3: 257:513 (256) ack 1 - start timer
segment 4: 513:769 (256) ack 1
segment 5: ack 513 - stop timer, RTT #2 = 0.808 sec
segment 6: 769:1025 (256) ack 1 - start timer
segment 7: 1025:1281 (256) ack 1
segment 8: ack 769
segment 9: 1281:1537 (256) ack 1
segment 10: ack 1025 - stop timer, RTT #3 = 1.015 sec
segment 11: 1537:1793 (256) ack 1
segment 12: ack 1281
46 RTT example. Measurement.
The timing is done by incrementing a counter every time the 500-ms TCP timer routine is invoked. The figure shows the relationship in our example between the actual RTTs, which we can determine with a network analyzer, and the counted clock ticks.
[Figure: time line of 500-ms clock ticks with the three start timer / stop timer intervals]
47 RTT example. Calculation.
RTT #1 = 3 ticks
RTT #2 = 1 tick
RTT #3 = 2 ticks
A is initialized to 0.
D is initialized to 3.
Initial RTO = A + 2D = 0 + 2*3 = 6 seconds. (The factor 2 is used only for the initial calculation.)
When the ACK for the first data segment arrives (segment 2), the measured RTT is 3 ticks and our estimators are initialized as:
A = M + 0.5 = 1.5 + 0.5 = 2
D = A/2 = 1
RTO = A + 4D = 2 + 4*1 = 6 seconds
When the ACK for the second data segment arrives (segment 5), the measured RTT is 1 tick and the update is:
Err = M - A = 0.5 - 2 = -1.5
A = A + g*Err = 2 - 0.125*1.5 = 1.8125
D = D + h*(|Err| - D) = 1 + 0.25*(1.5 - 1) = 1.125
RTO = A + 4D = 1.8125 + 4*1.125 = 6.3125
But most implementations use an RTO that is a multiple of 500 ms. In our instance the RTO will be 6 seconds.
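The calculation on this slide can be checked with a short sketch. The class name `RttEstimator` and its method names are hypothetical; the constants, the initial values and the update rules are the ones given in the text (A = 0 and D = 3 initially, first measurement handled as A = M + 0.5 and D = A/2, later measurements with g = 0.125 and h = 0.25). Measurements are passed in seconds, so 3 ticks is 1.5 and 1 tick is 0.5. Rounding the RTO to a multiple of 500 ms is left out.

```python
G = 0.125   # gain for the smoothed RTT
H = 0.25    # gain for the smoothed mean deviation


class RttEstimator:
    def __init__(self):
        self.a = 0.0        # smoothed RTT
        self.d = 3.0        # smoothed mean deviation
        self.first = True   # the first measurement initializes the estimators

    def initial_rto(self) -> float:
        # The factor 2 is used only for the initial calculation.
        return self.a + 2 * self.d

    def update(self, m: float) -> float:
        """Feed one measured RTT (seconds) and return the new RTO."""
        if self.first:
            self.a = m + 0.5
            self.d = self.a / 2
            self.first = False
        else:
            err = m - self.a
            self.a += G * err
            self.d += H * (abs(err) - self.d)
        return self.a + 4 * self.d


estimator = RttEstimator()
rto_initial = estimator.initial_rto()   # before any measurement
rto_first = estimator.update(1.5)       # RTT #1: 3 ticks
rto_second = estimator.update(0.5)      # RTT #2: 1 tick
```

The three values come out as 6, 6 and 6.3125 seconds, matching the slide.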
48 Congestion example.
[Time-line diagram]
Normal data flow:
- 6401:6657 (256) ack 1
- 6657:6913 (256) ack 1
- ack 6657
- 6913:7169 (256) ack 1. Congestion: for example, a router loses this packet.
- ack 6913
- 7169:7425 (256) ack 1. The host knows that the previous packet is missing. It sends an ACK for the last packet received in order and saves the received packet.
- ack 6913 (save 256): first duplicate ACK
- 7425:7681 (256) ack 1
- 7681:7937 (256) ack 1
- ack 6913 (save 256): second duplicate ACK
- 7937:8193 (256) ack 1
- ack 6913 (save 256): third duplicate ACK
- 6913:7169 (256) ack 1: retransmission
- ack 6913 (save 256)
- 8193:8449 (256) ack 1
- ack 8193. The missing packet has been received. Now this host has all data bytes 6913-8192.
- ack 8449
TCP counts the number of duplicate ACKs received, and when the third one is received it assumes that a segment has been lost. TCP retransmits only one segment, starting with that sequence number. We discuss the fast retransmit algorithm later.
49 Slow start.
Slow start works with the congestion window, CWND. CWND is initialized to 1 (one) segment and is increased by one segment each time an ACK is received.
[Time-line diagram]
- cwnd = 1: 1:513 (512) ack 1
- ack 513
- cwnd = 2: 513:1025 (512) ack 1, 1025:1537 (512) ack 1
- ack 1025
- cwnd = 3: 1537:2049 (512) ack 1, 2049:2561 (512) ack 1. The sender sends only two segments because the ACK for segment 1025:1537 has not been received yet. Result: we have CWND = 3 and 3 segments sent but not yet acknowledged.
- ack 1537
- cwnd = 4: 2561:3073 (512) ack 1, 3073:3585 (512) ack 1
- And so on.
At some point the capacity of the network can be reached and some packets can be discarded. This tells the sender that its CWND is too large. We will see the mechanism for adjusting CWND later.
The sender can transmit up to the minimum of the congestion window and the advertised window. CWND is flow control imposed by the sender.
CWND is maintained in bytes.
50 Congestion avoidance algorithm.
Congestion avoidance and slow start are different algorithms. But in practice congestion avoidance and slow start are implemented together. When congestion occurs, TCP slows down the transmission rate of packets into the network and then invokes slow start to get things going again.
Congestion avoidance and slow start require that two variables be maintained for each connection:
- a congestion window, cwnd
- a slow start threshold size, ssthresh
There are two indications of packet loss:
- a timeout
- the receipt of duplicate ACKs
51 Congestion avoidance algorithm.
[Flowchart: how the combined algorithms work, with Yes/No decision branches]
CWS - current window size
52 Congestion avoidance algorithm. Illustration.
[Graph: CWND over time, starting at the congestion moment]
Starting point. We assume that congestion has just occurred when CWND had a value of 32 segments. Congestion was indicated by a timeout.
SSTHRESH = 32 / 2 = 16
CWND = 1: 1 segment is sent at time 0.
At time 1 an ACK is returned and CWND is incremented to 2 segments.
At time 2 two ACKs are returned and CWND is incremented to 4 segments (CWND was 2 and two ACKs were received).
And so on.
CWND = SSTHRESH. Slow start is stopped and congestion avoidance is started.
Now congestion avoidance is working. CWND increases linearly, with a maximum increase of one segment per round-trip time.
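The shape of the curve in this illustration can be reproduced with a small sketch that counts CWND in segments per round trip. The function name `cwnd_per_round_trip` is hypothetical, and the model is deliberately simplified: it assumes every segment of a window is acknowledged within one round trip and that no further loss occurs.

```python
def cwnd_per_round_trip(ssthresh: int, round_trips: int) -> list:
    """CWND (in segments) at the start of each round trip after a timeout."""
    cwnd = 1                      # after a timeout CWND restarts at 1 segment
    history = []
    for _ in range(round_trips):
        history.append(cwnd)
        if cwnd < ssthresh:
            # slow start: one extra segment per ACK, so the window doubles
            cwnd = min(cwnd * 2, ssthresh)
        else:
            # congestion avoidance: at most one extra segment per round trip
            cwnd += 1
    return history


# Congestion at CWND = 32 segments gives SSTHRESH = 16.
growth = cwnd_per_round_trip(ssthresh=32 // 2, round_trips=8)
```

With SSTHRESH = 16 the window grows 1, 2, 4, 8, 16 during slow start and then 17, 18, 19 under congestion avoidance.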
53 Fast retransmit and fast recovery algorithms.
[Time-line diagram]
- I am able to send 3 packets: 1:513 (512) ack 1, 513:1025 (512) ack 1, ...
- ack 513
- ack 513: 1st duplicate ACK. A duplicate ACK may be generated by reordered segments.
- ack 513: 2nd duplicate ACK. This duplicate ACK may also be generated by reordered segments.
- ack 513: 3rd duplicate ACK. I think the segment is lost.
The host does not wait for the retransmission timer to expire. It sends the lost segment. This is the FAST RETRANSMIT ALGORITHM.
Slow start is not performed, but the congestion avoidance algorithm is working. This is the FAST RECOVERY ALGORITHM.
54 Fast retransmit and fast recovery algorithms.
[Flowchart: how the combined algorithms work]
55 Slow start and congestion avoidance example
[Graph: CWND against SEQ x 1000. The connection starts with SYN, SYN, "S, A" and ACK, followed by data segments. The numbers below are taken from the table.]
- Initialization: CWND = MSS = 256, SSTHRESH = 65535.
- A timeout occurs: SSTHRESH = CWS/2, with a minimum value of 512. CWND = 1 segment = 256.
- No change here, because new data is not being acknowledged.
- Here is an ACK for data! CWND < SSTHRESH, so we are in slow start. 1 segment = 256. CWND = CWND + 256 = 512.
- CWND <= SSTHRESH: slow start. CWND = CWND + 1 segment. CWND = 512 + 256 = 768.
- CWND > SSTHRESH: congestion avoidance. CWND <- 768 + 256*256/768 + 256/8 = 885. We are using integer arithmetic.
- CWND > SSTHRESH: congestion avoidance. CWND <- 885 + 256*256/885 + 256/8 = 991. We are using integer arithmetic.
- CWND > SSTHRESH: congestion avoidance. CWND <- 991 + 256*256/991 + 256/8 = 1089. We are using integer arithmetic.
The real formula for the 1/CWND increase is:
cwnd <- cwnd + (segsize*segsize)/cwnd + segsize/8
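The congestion avoidance steps in this example can be verified with integer arithmetic. The function name `congestion_avoidance_step` is hypothetical; the formula and the segment size of 256 bytes are the ones given above.

```python
def congestion_avoidance_step(cwnd: int, segsize: int = 256) -> int:
    """One congestion avoidance update of cwnd (bytes), using integer division."""
    return cwnd + (segsize * segsize) // cwnd + segsize // 8


cwnd_values = [768]
for _ in range(3):
    cwnd_values.append(congestion_avoidance_step(cwnd_values[-1]))
```

Starting from 768, the three updates give 885, 991 and 1089, the values quoted on the slide.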
56 Slow start and congestion avoidance example
- The first two duplicate ACKs are received and counted, and CWND is left alone.
- The third duplicate ACK arrives. SSTHRESH = CWND/2 = 2426/2, which gives 1024 (rounded down to the next multiple of the segment size). CWND = SSTHRESH + number of duplicate ACKs * segment size = 1024 + 3*256 = 1792. The retransmission is sent.
- A duplicate ACK is received. CWND = CWND + 1 segment = 1792 + 256 = 2048. But CWND is not big enough to send data.
- A duplicate ACK is received. CWND = CWND + 1 segment = 2048 + 256 = 2304. But CWND is not big enough to send data.
- A duplicate ACK is received. CWND = CWND + 1 segment = 2304 + 256 = 2560. We can send data. NOTE: here we have 2304 bytes of unacknowledged data from previous segments. Data is sent.
- There are some more segments with the same situation.
- An ACK for new data is received. CWND <= SSTHRESH: slow start! CWND = SSTHRESH + segment size = 1024 + 256 = 1280.
57 TCP keepalive timer
A TCP implementation may use the keepalive option. This option is used to find out: is my peer alive?
One example is a half-open connection. One peer has died, but the other end does not know about it. It keeps the socket (IP address and port number) for the dead peer. But the peer does not need anything any more, and the live one must find that out!
Usually the keepalive timer is 2 hours.
There are 4 scenarios when there is no activity on the connection and one peer sends a keepalive probe to the other.
58 TCP keepalive timer
Scenario 1. The peer is alive and reachable.
[Diagram: packets are exchanged, then the connection goes idle]
- That is all: the peers do not have any data to send to each other, but the connection is established.
- 2 (two) hours pass...
- Client: my keepalive timer has expired. Is my peer alive? But I forgot its MAC address... ARP request, ARP reply.
- Client sends a keepalive probe. The keepalive probe has a SEQ that is one less than it should be (for example, the receiver is waiting for SEQ 14, but the keepalive probe has SEQ 13). The receiver gets a packet with an incorrect SEQ and is forced to respond with an ACK which contains the next SEQ that the server is expecting.
- The client receives the answer from the server. It knows that the server is alive and resets its keepalive timer.
Scenario 2. The peer has crashed and is down or in the process of rebooting.
[Diagram: packets are exchanged, then the connection goes idle]
- That is all: the peers do not have any data to send to each other, but the connection is established. But the peer has crashed.
- 2 hours have passed. Client: my keepalive timer has expired. Is my peer alive? TCP sends a keepalive probe. We do not look at the lower level (ARP) now. We need to know whether the peer is alive or not.
- 75 seconds: no answer. The client sends another keepalive probe.
- 75 seconds: no answer. The client sends 10 keepalive probes. If it does not receive a response, it considers the peer's host down and terminates the connection.
59 TCP keepalive timer
Scenario 3. The peer has crashed and rebooted.
The host has crashed and rebooted. It has a working TCP stack but does not have a socket for that connection.
- To be brief: 2 hours have passed. Once again, my keepalive timer has expired. Is my peer alive? The client sends a keepalive probe.
- Peer: are they crazy? I do not have such a socket! It resets the connection.
Scenario 4. The peer is running, but unreachable.
In this scenario the situation is the same as in scenario 2 from the point of view of the side sending the probes: no answer arrives. This situation may be caused by a failure of an intermediate router.
60 Path MTU Discovery
If the other end does not specify an MSS, it defaults to 536.
It is possible to save the path MTU on a per-route basis.
But things change. For example, a router went down and the route changed. Another router needs to fragment our datagram, but the datagram has the DF bit set. The router sends an ICMP error to our host.
61 TCP packet with MSS option
[Figure: TCP packet with the maximum segment size option]
62 Path MTU Discovery. Example.
[Diagram: two hosts on networks with MTU 1500, connected through Router 1 and further links with MTU 552 and MTU 296]
- SYN, mss 1460
- SYN, ACK, mss 512
- Sender: the MTU is 552! I can send a datagram with 512 bytes of data. Sends 1:513 (512), ACK.
- Router: I cannot send such a big datagram without fragmentation. But the DF bit is set, so an error occurs! ICMP error message: Host 1 unreachable, need to frag, mtu 296 (the mtu is reported by newer router implementations).
- Sender: my MSS is now 256 (MTU 296). Sends 1:257 (256), ACK.
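The MSS values in this example all follow from MTU minus 40 bytes. A minimal sketch, assuming 20-byte IP and TCP headers without options; the function names `mss_for_mtu` and `on_need_frag` are hypothetical.

```python
IP_HEADER_BYTES = 20    # assumption: no IP options
TCP_HEADER_BYTES = 20   # assumption: no TCP options


def mss_for_mtu(mtu: int) -> int:
    """Largest TCP payload that fits in one unfragmented datagram."""
    return mtu - IP_HEADER_BYTES - TCP_HEADER_BYTES


def on_need_frag(current_mss: int, next_hop_mtu: int) -> int:
    """Reduce the MSS after an ICMP 'need to frag' error that reports an MTU."""
    return min(current_mss, mss_for_mtu(next_hop_mtu))
```

With the numbers from the diagram, MTU 1500 gives mss 1460, MTU 552 gives 512, and the ICMP error reporting mtu 296 lowers the sender's MSS from 512 to 256.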
63 Window Scale Option
- Networks are growing and buffers are becoming bigger, and a window size of 65535 (the maximum window size allowed by the window field in the TCP header) is not enough.
- Newer implementations use the WINDOW SCALE OPTION.
- Newer implementations can still work with older implementations.
The options field can contain the WINDOW SCALE OPTION.
Window: there are only 16 bits.
The WINDOW SCALE OPTION can be advertised only in a SYN segment. The scale factor is fixed in each direction when the connection is established.
Shift count: 0 - 14. 0 means no scaling is performed.
64 Window Scale Option. Setting.
To enable window scaling, both ends must have this option in their SYN segments.
SYN, wscale 1
SYN, ACK, wscale 3
Active peer: I think my window scale should be 1.
Passive peer: the active peer is going to use window scale! I understand it and choose my window scale 0. I must set this option to 0.
How scaling works.
The window scale is used to shift the value from the window field to get the real window size.
For example, the window scale was set to 1 and the window size in the received packet is 4 (it is only an example).
Using the window scale, the value is shifted left by 1 bit...
The real advertised window is 8.
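The scaling rule amounts to a left shift of the 16-bit window field. A minimal sketch; the function name `real_window` is hypothetical, and the limits (16-bit field, shift count 0 to 14) come from the previous slide.

```python
MAX_SHIFT_COUNT = 14


def real_window(window_field: int, shift_count: int) -> int:
    """Real advertised window: the 16-bit window field shifted left by the scale."""
    if not 0 <= window_field <= 0xFFFF:
        raise ValueError("the window field is only 16 bits wide")
    if not 0 <= shift_count <= MAX_SHIFT_COUNT:
        raise ValueError("the shift count must be between 0 and 14")
    return window_field << shift_count
```

For the example on the slide, `real_window(4, 1)` gives 8; with shift count 0 the value is unchanged.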
65 Timestamp option
The timestamp option is used for better RTT calculation.
The sender places a 32-bit value in the first field and the receiver echoes this back in the reply field.
To use this option, both ends must be able to work with it.
To establish this option, the active peer must set the timestamp option in its SYN, and the other (passive) end must answer with the option too.
Only one timestamp value is kept per connection. How does TCP do it?
- The receiver's TCP keeps the ACK number from the last ACK that was sent (lastack) and the timestamp value that goes with it (tsrecent). The ACK number is the next sequence number that we are waiting for.
- A segment arrives. If the SEQ of the segment is equal to lastack, tsrecent is set to the timestamp option value from the segment.
- tsrecent is sent in the timestamp reply field and lastack is sent as the ACK value in the ACK being sent.
66 PAWS - Protection Against Wrapped Sequence Numbers