Title: Linux TCP/IP Stack
1 Linux TCP/IP Stack
2 TCP/IP vs. OSI model
OSI layers: 7 Application, 6 Presentation, 5 Session, 4 Transport, 3 Network, 2 Data Link, 1 Physical Layer
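3 TCP/IP Stack Overview
Layers, top to bottom:
- Process
- Socket Layer
- Protocol Layer (TCP Layer)
- Protocol Layer (IP Layer)
- Interface Layer (Ethernet Device Driver)
- Physical Media
Send path (downward): 1 sosend(...) in the Socket Layer, 2 tcp_output(...) in the TCP Layer, 3 ip_output(...) in the IP Layer, 4 ethernet_output(...) in the Interface Layer.
Receive path (upward): 2 ethernet_input(...) in the Interface Layer, 3 ip_input(...) in the IP Layer, 4 tcp_input(...) in the TCP Layer, 5 recvfrom(...) in the Socket Layer.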
4 Process Layer to TCP Layer
Process:
- send(int socket, const char *buf, int length, int flags)
Kernel:
- sendto(int socket, const char *data_buffer, int length, int flags, struct sockaddr *destination, int destination_length)
- sendit(struct proc *p, int socket, struct msghdr *mp, int flags, int *return_size), in uipc_syscalls.c
- sosend(struct socket *s, struct mbuf *addr, struct uio *uio, struct mbuf *top, struct mbuf *control, int flags), in uipc_socket.c
- tcp_usrreq(struct socket *s, int request, struct mbuf *m, struct mbuf *nam, struct mbuf *control), in tcp_usrreq.c
TCP Layer:
- tcp_output(struct tcpcb *tp), in tcp_output.c
5 Socket Layer
sendto(int socket, const char *data_buffer, int length, int flags, struct sockaddr *destination, int destination_length)
MBUF chain: the 150 bytes of data in data_buffer are copied into a chain of two 128-byte mbufs.

Field             First mbuf        Second mbuf
m_next            (second mbuf)     NULL
m_nextpkt         NULL              NULL
m_len             100               50
m_data            (start of data)   (start of data)
m_type            MT_DATA           MT_DATA
m_flags           M_PKTHDR          0
m_pkthdr.len      150               (none)
m_pkthdr.recvif   NULL              (none)
Header size       28 bytes          20 bytes
Data              100 bytes         50 bytes
Unused space      none              58 bytes
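The chain in the figure can be imitated with a much simplified structure. The following is a minimal sketch, not the kernel's struct mbuf: the names toy_mbuf, toy_chain_from_buffer and toy_chain_free are hypothetical, and the capacities of 100 and 108 bytes are taken from the header sizes shown in the figure (128 - 28 and 128 - 20).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MHLEN  100  /* data bytes in the first mbuf: 128 - 28-byte header */
#define TOY_MLEN   108  /* data bytes in a plain mbuf: 128 - 20-byte header */
#define TOY_PKTHDR 0x1  /* stands in for M_PKTHDR */

/* Simplified stand-in for struct mbuf (assumption, for illustration only). */
struct toy_mbuf {
    struct toy_mbuf *m_next;  /* next mbuf in the chain */
    int  m_len;               /* bytes of data stored in this mbuf */
    int  m_flags;             /* TOY_PKTHDR on the first mbuf only */
    int  pkthdr_len;          /* total packet length, valid with TOY_PKTHDR */
    char m_dat[TOY_MLEN];     /* data area */
};

/* Release every mbuf in a chain. */
void toy_chain_free(struct toy_mbuf *m)
{
    while (m != NULL) {
        struct toy_mbuf *next = m->m_next;
        free(m);
        m = next;
    }
}

/* Copy len bytes from buf into a new chain; returns NULL if allocation fails. */
struct toy_mbuf *toy_chain_from_buffer(const char *buf, int len)
{
    struct toy_mbuf *head = NULL;
    struct toy_mbuf **tail = &head;
    int total = len;

    assert(buf != NULL && len > 0);
    while (len > 0) {
        struct toy_mbuf *m = calloc(1, sizeof(*m));
        int room = (head == NULL) ? TOY_MHLEN : TOY_MLEN;

        if (m == NULL) {
            toy_chain_free(head);
            return NULL;
        }
        if (head == NULL) {          /* first mbuf carries the packet header */
            m->m_flags = TOY_PKTHDR;
            m->pkthdr_len = total;
        }
        m->m_len = (len < room) ? len : room;
        memcpy(m->m_dat, buf, (size_t)m->m_len);
        buf += m->m_len;
        len -= m->m_len;
        *tail = m;                   /* append to the chain */
        tail = &m->m_next;
    }
    return head;
}
```

Calling toy_chain_from_buffer with a 150-byte buffer gives a first mbuf with m_len 100 and a second with m_len 50, leaving 58 bytes of the second data area unused, as in the figure.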
6 Socket Layer - sosend
sosend passes data and control information to the protocol layer.
sosend(struct socket *s, struct mbuf *addr, struct uio *uio, struct mbuf *data_buffer, struct mbuf *control, int flags)
Flow:
1. Initialize a new memory buffer and variables to hold flags.
2. Is there enough space in the send buffer? Check sbspace(&s->so_snd).
3. If yes: int error = tcp_usrreq(s, flags, mbuf, addr, control)
4. More buffers to send? If yes, repeat from step 2.
5. If no: free the memory buffers received.
6. Return the value of error to sendto( ).
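The loop above can be sketched as follows. This is an illustration under stated assumptions, not the kernel code: toy_sockbuf, toy_sbspace, toy_usrreq and toy_sosend are hypothetical names, the space check only compares the byte count against a high-water mark, and the "no" branch (which the slide does not show) is assumed to return EWOULDBLOCK as a non-blocking socket would.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified send buffer (assumption): byte count and high-water mark only. */
struct toy_sockbuf {
    long sb_cc;     /* bytes currently queued in the buffer */
    long sb_hiwat;  /* maximum number of bytes the buffer may hold */
};

/* Free space left in the send buffer. */
static long toy_sbspace(const struct toy_sockbuf *sb)
{
    return sb->sb_hiwat - sb->sb_cc;
}

/* Hypothetical protocol hook: queues the chunk and reports success. */
static int toy_usrreq(struct toy_sockbuf *sb, long chunk)
{
    sb->sb_cc += chunk;
    return 0;
}

/* Hand resid bytes to the protocol in chunks of at most chunk_size bytes. */
int toy_sosend(struct toy_sockbuf *sb, long resid, long chunk_size)
{
    int error = 0;

    assert(sb != NULL && chunk_size > 0);
    while (resid > 0 && error == 0) {
        long chunk = (resid < chunk_size) ? resid : chunk_size;

        if (toy_sbspace(sb) < chunk)
            return EWOULDBLOCK;      /* assumed behaviour of the "no" branch */
        error = toy_usrreq(sb, chunk);
        resid -= chunk;              /* more buffers to send? */
    }
    return error;                    /* value handed back to the caller */
}
```

With a 300-byte high-water mark, sending 150 bytes in 100-byte chunks succeeds; a following 200-byte send queues one chunk and then stops because only 50 bytes of space remain.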
7 TCP Layer - tcp_usrreq(struct socket *s, int request, struct mbuf *data_buffer, struct mbuf *nam, struct mbuf *control)
Initialize the Internet protocol control block inp and the TCP control block tp to store information useful for TCP.
Convert the socket to an Internet protocol control block:
inp = sotoinpcb(s)
Convert the Internet protocol control block to a TCP control block:
tp = intotcpcb(inp)
Switch on request:
- PRU_SEND: int error = tcp_output(tp)
Return error to the caller of tcp_usrreq( ).
8 TCP Layer (tcp_output.c) - tcp_output(struct tcpcb *tp)
Called by tcp_usrreq for one of the following reasons:
- To send the initial SYN
- To send a finished_sending message
- To send data
- To send a window update after data has been received
tcp_output( ) functionality:
1. Determine whether TCP can send a segment, depending on:
- flags in the data sent by the socket layer (to send an ACK, etc.)
- size of the window advertised by the receiver's end
- amount of data ready to send
- whether unacknowledged data already exists for the connection
2. Calculate the amount of data to be sent, depending on:
- size of the receiver's window
- number of bytes in the send buffer
3. Check for window shrink.
4. Send a segment:
- Allocate a buffer for the TCP and IP header from the header template.
- Copy the TCP and IP header template into the buffer to be sent.
- Fill the fields in the TCP header.
- Decrement the number of buffers to be sent, so that the end can be checked.
- Set the sequence number and acknowledgement fields.
- Set three fields in the IP header: IP length, TTL and TOS.
- Pass the datagram to IP.
9 TCP Layer (tcp_output.c) - tcp_output(struct tcpcb *tp)
struct socket *so = tp->t_inpcb->inp_socket
Initialize a TCP header tcp_header.
idle is true if the maximum sequence number sent equals the oldest unacknowledged sequence number, i.e. if an ACK is not expected from the other end.
int idle = (tp->snd_max == tp->snd_una)
Check ACK flag (decision on idle):
- true: an acknowledgement is not expected, so set the congestion window to one segment: tp->snd_cwnd = tp->t_maxseg
- false: the congestion window is left unchanged.
10 TCP Layer - tcp_output(struct tcpcb *tp)
off is the offset in bytes from the beginning of the send buffer of the first data byte to send. off bytes have already been sent and their acknowledgement is awaited.
int off = tp->snd_nxt - tp->snd_una
Determine the length of data that should be transmitted and the flags to be used. len is the minimum of the number of bytes in the send buffer and win (the minimum of the receiver's window and the congestion window), less off.
len = min(so->so_snd.sb_cc, win) - off
Determine the flags, such as TH_ACK, TH_FIN, TH_RST, TH_SYN:
flags = tcp_outflags[tp->t_state]
11 TCP Layer - tcp_output(struct tcpcb *tp)
After the flags have been determined from tcp_outflags[tp->t_state], the following checks are made in order:
1. tp->t_flags & TF_ACKNOW
- true: send acknowledgement
- false: go to the next check
2. flags & (TH_SYN | TH_RST)
- true: send sequence number or reset
- false: go to the next check
3. flags & TH_FIN
- true: finished sending
- false: continue
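The state-to-flags lookup and the three checks can be sketched as below. The table contents follow the 4.4BSD tcp_outflags array as best recalled and should be verified against the actual source; the TOY_ names and toy_must_send are hypothetical, and the extra conditions the real code attaches to the FIN check are left out.

```c
#include <assert.h>
#include <stdint.h>

/* TCP header flag bits (positions as defined by the TCP header format). */
#define TOY_TH_FIN 0x01
#define TOY_TH_SYN 0x02
#define TOY_TH_RST 0x04
#define TOY_TH_ACK 0x10

#define TOY_TF_ACKNOW 0x0001  /* "ACK the peer immediately" (value is illustrative) */

enum toy_tcp_state {
    TOY_CLOSED, TOY_LISTEN, TOY_SYN_SENT, TOY_SYN_RECEIVED,
    TOY_ESTABLISHED, TOY_CLOSE_WAIT, TOY_FIN_WAIT_1, TOY_CLOSING,
    TOY_LAST_ACK, TOY_FIN_WAIT_2, TOY_TIME_WAIT, TOY_NSTATES
};

/* Flags to send in each state; an assumption modelled on tcp_outflags. */
static const uint8_t toy_outflags[TOY_NSTATES] = {
    TOY_TH_RST | TOY_TH_ACK,  /* CLOSED */
    0,                        /* LISTEN */
    TOY_TH_SYN,               /* SYN_SENT */
    TOY_TH_SYN | TOY_TH_ACK,  /* SYN_RECEIVED */
    TOY_TH_ACK,               /* ESTABLISHED */
    TOY_TH_ACK,               /* CLOSE_WAIT */
    TOY_TH_FIN | TOY_TH_ACK,  /* FIN_WAIT_1 */
    TOY_TH_FIN | TOY_TH_ACK,  /* CLOSING */
    TOY_TH_FIN | TOY_TH_ACK,  /* LAST_ACK */
    TOY_TH_ACK,               /* FIN_WAIT_2 */
    TOY_TH_ACK                /* TIME_WAIT */
};

/* Returns 1 if one of the three checks on the slide asks for a segment. */
int toy_must_send(int t_flags, enum toy_tcp_state state)
{
    uint8_t flags;

    assert(state >= TOY_CLOSED && state < TOY_NSTATES);
    flags = toy_outflags[state];
    if (t_flags & TOY_TF_ACKNOW)
        return 1;                             /* send acknowledgement */
    if (flags & (TOY_TH_SYN | TOY_TH_RST))
        return 1;                             /* send SYN or reset */
    if (flags & TOY_TH_FIN)
        return 1;                             /* finished sending */
    return 0;
}
```

In the ESTABLISHED state with no pending ACK request none of the three checks fires, so whether a segment is sent depends on the data and window tests described earlier.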
12 Check flags to determine the type of message:
- window probe
- retransmission
- normal data transmission
Allocate an mbuf for the TCP/IP header and data if possible:
MGETHDR(m, M_DONTWAIT, MT_HEADER)
M_DONTWAIT indicates that if memory is not available for the mbuf, the routine returns with an error instead of waiting.
Length of data < 44 bytes (100 - 40 - 16)?
- yes: copy the data from the socket send buffer into the new packet header mbuf.
- no: create a new mbuf chain, copy the surplus data, and point it to the first mbuf chain.
Pass the segment to IP:
ip_output(m, tp->t_inpcb->inp_options, &tp->t_inpcb->inp_route, so->so_options & SO_DONTROUTE, 0)
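The 44-byte limit is plain arithmetic on three sizes. A small sketch, assuming the 100 is the data area of a packet header mbuf, the 40 is the combined TCP and IP header, and the 16 is room reserved for a link-layer header; the macro and function names are hypothetical, and the strict "<" follows the slide (the boundary case may be handled differently in the real source).

```c
#include <assert.h>

#define TOY_MHLEN        100  /* data area of a packet header mbuf */
#define TOY_TCPIP_HDRLEN  40  /* 20-byte IP header + 20-byte TCP header */
#define TOY_MAX_LINKHDR   16  /* space kept free for the link-layer header */

/* Bytes of user data that still fit next to the headers. */
#define TOY_INLINE_LIMIT (TOY_MHLEN - TOY_TCPIP_HDRLEN - TOY_MAX_LINKHDR)

/* 1 if len bytes of data go into the header mbuf, 0 if a second chain is needed. */
int toy_fits_in_header_mbuf(int len)
{
    assert(len >= 0);
    return len < TOY_INLINE_LIMIT;
}
```

A 30-byte write therefore travels in a single mbuf, while the 150-byte example from the socket layer slide needs a separate data chain linked behind the header mbuf.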
13 ip_output.c
ip_output(struct mbuf *m, struct mbuf *opt, struct route *ro, int flags, struct ip_moptions *imo)
Phases:
1. Header initialization
2. Route selection
3. Source address selection and fragmentation

1. Header initialization
Packets damaged? Check whether there were any errors while adding headers in higher layers. Most of the fields of the IP header are predefined by higher-layer protocols.
- yes: ERROR
- no: continue
The value of flags decides what is to be done with the data:
- IP_FORWARDING: forward packet
- IP_ROUTETOIF: route directly to interface
- IP_ALLOWBROADCAST: allow broadcasting of packet
- IP_RAWOUTPUT: packet contains pre-constructed header
if ((flags & IP_FORWARDING) || (flags & IP_RAWOUTPUT))
- yes: if the packet has to be forwarded to another host, i.e. if the machine is acting as a router, then the IP header for forwarded packets should not be modified by ip_output. Save the header length in hlen for the fragmentation algorithm.
- no: if the packet is not being forwarded and has to be sent to another host, construct and initialize the IP header: set ip_v = 4, clear ip_off, and assign a unique identifier to ip_id. Length, offset, TTL, protocol, TOS, etc. are set by higher layers.
14 2. Route Selection
A cached route may be provided to ip_output as an argument. UDP and TCP maintain a route cache associated with each socket.
Verify the cached route for the destination address: check whether the cached route leads to the correct destination. If a route has not been provided, ip_output sets up a temporary route structure called iproute.
if (cached_route == destination)
- yes: the cached route is usable. Find the interface on which the packet has to be placed; ifp points to the interface's ifnet structure.
- no: locate a route. Call rtalloc(dst_ip) to locate a route to the destination, then find the interface on which the packet has to be placed; ifp points to the interface's ifnet structure. If rtalloc(dst_ip) fails to find a route, return a host unreachable error.
Notes on the "no" branch:
- If the packet is being routed, rtalloc locates a route to the address specified by dst.
- If rtalloc fails, an EHOSTUNREACH error is generated. If ip_forward called ip_output, the error is converted to an ICMP error.
- If the address is found, ifp is made to point to the ifnet structure for the interface.
- If the next hop is not the packet's final destination, dst is changed to point to the next-hop router.
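The cached-route check can be sketched with a toy routing table. Everything here is an assumption made for illustration: the struct and function names are hypothetical, addresses are host-order integers, and the lookup is an exact match by linear scan rather than the kernel's routing tree.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct toy_ifnet { const char *if_name; };

struct toy_rtentry {
    uint32_t dst;              /* destination this entry applies to */
    uint32_t gateway;          /* next-hop router, used if is_gateway is set */
    int      is_gateway;       /* nonzero if the destination is not directly reachable */
    struct toy_ifnet *rt_ifp;  /* outgoing interface */
};

struct toy_route {
    struct toy_rtentry *ro_rt; /* cached entry, or NULL */
    uint32_t ro_dst;           /* destination the cached entry was looked up for */
};

/* Hypothetical stand-in for rtalloc: exact-match linear scan. */
static struct toy_rtentry *toy_rtalloc(struct toy_rtentry *table, size_t n, uint32_t dst)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (table[i].dst == dst)
            return &table[i];
    return NULL;
}

/* Fills in the interface and next hop for dst; returns 0 or EHOSTUNREACH. */
int toy_select_route(struct toy_route *ro, struct toy_rtentry *table, size_t n,
                     uint32_t dst, struct toy_ifnet **ifp, uint32_t *next_hop)
{
    assert(ro != NULL && ifp != NULL && next_hop != NULL);
    if (ro->ro_rt == NULL || ro->ro_dst != dst) {
        /* No cached route, or it is for another destination: look one up. */
        ro->ro_dst = dst;
        ro->ro_rt = toy_rtalloc(table, n, dst);
    }
    if (ro->ro_rt == NULL)
        return EHOSTUNREACH;
    *ifp = ro->ro_rt->rt_ifp;
    /* If the next hop is a router, send to it rather than to dst. */
    *next_hop = ro->ro_rt->is_gateway ? ro->ro_rt->gateway : dst;
    return 0;
}
```

A second call for the same destination reuses ro_rt without another lookup, which is the benefit of keeping the cache with the socket.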
15 3. Source address selection and Fragmentation
The final section of ip_output ensures that the IP header has a valid source IP address. This could not have been done earlier because the route had not been selected yet. If there is no source IP address, the IP address of the outgoing interface is used as the source IP address.
Is a valid source address specified?
- no: select the IP address of the outgoing interface as the source address.
- yes: continue
Does the packet have to be fragmented?
- yes: larger packets (packets that exceed the MTU) must be fragmented before they can be sent. Fragment the packet if its size is greater than the MTU.
- no: continue
In either case (fragmented or not) the checksum is computed (in_cksum). If no errors are found, the data is sent to the if_output function of the selected output interface.
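The checksum and the fragmentation decision can both be tried out in a few lines. The sketch below implements the standard Internet checksum (the 16-bit one's complement sum used for the IP header) and counts the fragments a datagram needs; it works on a flat byte array rather than an mbuf chain, and the names toy_in_cksum and toy_fragment_count are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Internet checksum: one's complement of the one's complement 16-bit sum. */
uint16_t toy_in_cksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    assert(data != NULL);
    while (len > 1) {                      /* add 16-bit big-endian words */
        sum += ((uint32_t)data[0] << 8) | data[1];
        data += 2;
        len -= 2;
    }
    if (len == 1)                          /* odd trailing byte, padded with zero */
        sum += (uint32_t)data[0] << 8;
    while (sum >> 16)                      /* fold carries back into 16 bits */
        sum = (sum & 0xffffu) + (sum >> 16);
    return (uint16_t)~sum;
}

/*
 * Number of fragments needed for a datagram of total_len bytes (header
 * included) with an hlen-byte header on a link with the given MTU.
 * Fragment payloads must be multiples of 8 bytes, except for the last one.
 */
unsigned toy_fragment_count(unsigned total_len, unsigned hlen, unsigned mtu)
{
    unsigned per_frag, payload;

    assert(total_len >= hlen && mtu >= hlen + 8);
    if (total_len <= mtu)
        return 1;                          /* fits: no fragmentation */
    per_frag = (mtu - hlen) & ~7u;         /* payload bytes per fragment */
    payload = total_len - hlen;
    return (payload + per_frag - 1) / per_frag;
}
```

Computing the checksum over a header whose checksum field already holds the correct value yields 0, which is the property the receiving side relies on. A 4000-byte datagram with a 20-byte header on a 1500-byte MTU link carries 3980 payload bytes in 1480-byte pieces, so it needs 3 fragments.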
16 Interface Layer (if_ethersubr.c)
ether_output(struct ifnet *ifp, struct mbuf *mbuf, struct sockaddr *destination, struct rtentry *routing_entry)
Phases:
1. Verification
2. Protocol-Specific Processing
3. Frame Construction
4. Interface Queuing

1. Verification
Ethernet port up and running? ifp->if_flags & (IFF_UP | IFF_RUNNING)
- no: senderr(ENETDOWN)
- yes: continue
17 Interface Layer (if_ethersubr.c) - ether_output(struct ifnet *ifp, struct mbuf *mbuf, struct sockaddr *destination, struct rtentry *rt_entry)
Function: takes the data portion of an Ethernet frame, encapsulates it with a 14-byte header, and places it on the interface send queue.
Phases: Verification, Protocol-Specific Processing, Frame Construction, Interface Queuing.
Arguments:
- ifp points to the outgoing interface's ifnet structure
- mbuf is the data to be sent
- destination is the destination address
- rt_entry points to the routing entry
Initialize the Ethernet header: struct eth_header eh
Verification:
Ethernet port up and running? ifp->if_flags & (IFF_UP | IFF_RUNNING)
- no: senderr(ENETDOWN)
- yes: continue
18 Verification (continued)
Route valid? rt_entry = rtalloc1(destination, 1)
- 0: senderr(EHOSTUNREACH)
- 1: continue
Next hop a gateway? If so (1), rt = rt->rt_gwroute; otherwise (0), rt is kept.
Destination responding to ARP requests? If not, do not send more packets, to avoid flooding: rt->rt_flags & RTF_REJECT
- no (flag not set): Verification is complete; continue with Protocol-Specific Processing.
19 Protocol-Specific Processing
Functionality: finds the Ethernet address corresponding to the IP address of the destination.
Switch on destination->sa_family:
- AF_INET: send an ARP broadcast to find the Ethernet address corresponding to the destination IP address. Use m_copy( ) to keep the packet until an ACK is received.
Next phase: Frame Preparation
20 Frame Preparation (follows Protocol-Specific Processing)
Make sure there is room for the 14-byte Ethernet header:
M_PREPEND(m, sizeof(ethernet_header), M_DONTWAIT)
Form the Ethernet header from the Ethernet frame type, the Ethernet MAC address, and the unicast Ethernet address associated with the output interface, e.g. the default gateway for a host.
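The 14-byte header consists of a 6-byte destination address, a 6-byte source address and a 2-byte type field in network byte order. The sketch below builds such a frame in a flat buffer instead of prepending to an mbuf; toy_ether_encapsulate and the TOY_ constants are hypothetical names, and 0x0800 is the EtherType for IPv4.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_ETHER_ADDR_LEN 6
#define TOY_ETHER_HDR_LEN  14      /* 6 (dst) + 6 (src) + 2 (type) */
#define TOY_ETHERTYPE_IP   0x0800  /* EtherType for IPv4 */

/*
 * Writes header + payload into frame (capacity cap bytes).
 * Returns the frame length, or 0 if the frame does not fit.
 */
size_t toy_ether_encapsulate(uint8_t *frame, size_t cap,
                             const uint8_t *dst, const uint8_t *src,
                             uint16_t type,
                             const uint8_t *payload, size_t len)
{
    assert(frame != NULL && dst != NULL && src != NULL && payload != NULL);
    if (cap < TOY_ETHER_HDR_LEN + len)
        return 0;                                  /* no room for the header */
    memcpy(frame, dst, TOY_ETHER_ADDR_LEN);        /* destination MAC */
    memcpy(frame + 6, src, TOY_ETHER_ADDR_LEN);    /* MAC of output interface */
    frame[12] = (uint8_t)(type >> 8);              /* type, big-endian */
    frame[13] = (uint8_t)(type & 0xff);
    memcpy(frame + TOY_ETHER_HDR_LEN, payload, len);
    return TOY_ETHER_HDR_LEN + len;
}
```

The early return plays the role of the "make sure there is room" step: when the buffer cannot hold header plus data, nothing is written.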
21 Interface Queuing (follows Frame Preparation)
Is the output queue full?
- yes: discard the frame, free the memory buffer, senderr(ENOBUFS)
- no: place the frame on the interface's send queue, then call lestart(ifp)
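The queue-full test can be sketched with a small fixed-size FIFO. The structure below is a simplified stand-in for the interface send queue: the names and the capacity of 4 are assumptions, and frames are represented by opaque pointers.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_IFQ_MAXLEN 4  /* illustrative capacity of the send queue */

struct toy_ifqueue {
    const void *frames[TOY_IFQ_MAXLEN];  /* ring buffer of queued frames */
    int head;                            /* index of the oldest frame */
    int len;                             /* number of queued frames */
    int drops;                           /* frames discarded because the queue was full */
};

/* Queue a frame for transmission; returns 0 or ENOBUFS if the queue is full. */
int toy_if_enqueue(struct toy_ifqueue *q, const void *frame)
{
    assert(q != NULL && frame != NULL);
    if (q->len >= TOY_IFQ_MAXLEN) {
        q->drops++;                      /* discard the frame */
        return ENOBUFS;
    }
    q->frames[(q->head + q->len) % TOY_IFQ_MAXLEN] = frame;
    q->len++;
    return 0;
}

/* Take the oldest frame off the queue, as the driver start routine would. */
const void *toy_if_dequeue(struct toy_ifqueue *q)
{
    const void *frame;

    assert(q != NULL);
    if (q->len == 0)
        return NULL;
    frame = q->frames[q->head];
    q->head = (q->head + 1) % TOY_IFQ_MAXLEN;
    q->len--;
    return frame;
}
```

Once the driver has dequeued a frame, a slot is free again and the next enqueue succeeds, so ENOBUFS only signals temporary congestion on the interface.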
22 Interface Layer (if_le.c) - lestart(struct ifnet *ifp)
Function: dequeues frames from the interface output queue and arranges for them to be transmitted by the Ethernet card.
struct le_softc *le = &le_softc[ifp->if_unit]
le->sc_if.if_flags & IFF_RUNNING
- 0: return error
- 1: copy the frame in the mbuf to the hardware buffer, then set the IFF_OACTIVE flag to indicate that the device is busy transmitting.
23 ip_input.c
void ipintr( )
Phases:
1. Verification of incoming packets
2. Option processing and forwarding
3. Packet reassembly
4. Demultiplexing
Storing IP packets: IP packets are stored in a chain of mbuf structs in a linked list. The header must be stored in one mbuf.
Dequeue packets: get the IP header from ipintrq in the first mbuf.
Verification:
1. If no IP addresses are set yet but the interfaces are receiving, nothing can be done with incoming packets. This occurs during system initialization when interfaces have not been configured.
2. If the length of the packet in the mbuf < length of struct ip, increment ipstat.ips_toosmall.
3. Check the IP version.
4. Check the header length.
5. Compute ip_sum = in_cksum( ); ip_sum should be 0. (in_cksum is used by all protocols, although on different parts of the packet.)
6. Convert from network byte order to host byte order.
7. ip_len > m_pkthdr.len indicates that some bytes are missing.
8. Trim buffers if longer than expected.
9. Drop if shorter than expected.
Packets damaged?
- yes: discard
- no: continue with option processing
Option processing and forwarding:
- ip_dooptions( ) == 1: ICMP error message; go to the next buffer.
- Is ip_dst a local address? Look for ip_dst in in_ifaddr (the list of configured addresses).
- ip_dst found? yes: the host is in the same subnet; no: check ip_forwarding.
- ip_forwarding == 0? yes: discard the packet and free its memory; no: call ip_forward( ). Then go to the next buffer.
Packet reassembly:
- Unable to reassemble a complete datagram: go to the next buffer.
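The verification steps can be sketched on a raw byte buffer. This is a simplified illustration: it reads the header fields directly in network byte order instead of converting a struct ip, it reports a verdict instead of updating ipstat counters, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum toy_ip_verdict {
    TOY_IP_OK,        /* packet passed all checks */
    TOY_IP_TOOSMALL,  /* fewer bytes than a minimal IP header */
    TOY_IP_BADVERS,   /* version field is not 4 */
    TOY_IP_BADHLEN,   /* header length field is inconsistent */
    TOY_IP_BADSUM,    /* header checksum does not verify */
    TOY_IP_BADLEN,    /* total length smaller than the header length */
    TOY_IP_TOOSHORT   /* fewer bytes received than ip_len claims */
};

/* Internet checksum over len bytes; 0 for a header with a valid checksum. */
static uint16_t toy_cksum(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;

    for (; len > 1; p += 2, len -= 2)
        sum += ((uint32_t)p[0] << 8) | p[1];
    if (len == 1)
        sum += (uint32_t)p[0] << 8;
    while (sum >> 16)
        sum = (sum & 0xffffu) + (sum >> 16);
    return (uint16_t)~sum;
}

/* pkt holds pktlen received bytes, starting with the IP header. */
enum toy_ip_verdict toy_ip_verify(const uint8_t *pkt, size_t pktlen)
{
    size_t hlen, ip_len;

    assert(pkt != NULL);
    if (pktlen < 20)
        return TOY_IP_TOOSMALL;
    if ((pkt[0] >> 4) != 4)
        return TOY_IP_BADVERS;
    hlen = (size_t)(pkt[0] & 0x0f) << 2;          /* header length in bytes */
    if (hlen < 20 || hlen > pktlen)
        return TOY_IP_BADHLEN;
    if (toy_cksum(pkt, hlen) != 0)
        return TOY_IP_BADSUM;
    ip_len = ((size_t)pkt[2] << 8) | pkt[3];      /* total length field */
    if (ip_len < hlen)
        return TOY_IP_BADLEN;
    if (ip_len > pktlen)
        return TOY_IP_TOOSHORT;                   /* some bytes are missing */
    return TOY_IP_OK;                             /* extra bytes would be trimmed */
}
```

A buffer that is longer than ip_len still passes, matching the "trim buffers if longer than expected" step, while a buffer shorter than ip_len is rejected.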
24 ip_forward(struct mbuf *m, int srcrt)
m points to the packet to be forwarded. If srcrt != 0, the packet is being forwarded because of a source route option.
Phase I: Is the packet eligible for forwarding?
Multicast packet?
- yes: ip_mforward( )
- no: continue
Is the packet a link-level broadcast packet, a loopback packet, or addressed to network 0 or a class E address?
- yes: discard
- no: continue
TTL <= 1?
- yes: ICMP error message
- no: locate the next hop
Locate next hop:
struct route {
    struct rtentry *ro_rt;   // pointer to struct with information
    struct sockaddr ro_dst;  // destination associated with the route entry pointed to by ro_rt
}
- Cache the most recent route; usually consecutive packets have the same destination.
- Decrement the TTL.
- Save at most 64 bytes of the packet in case an ICMP message has to be sent.