Internetworking Protocols and Programming - PowerPoint PPT Presentation

Provided by: krishp9
1
Internetworking Protocols and Programming
CSE 5348 / 7348
Instructor: Krish Pillai
Session 8
2
TCP Protocol
  • Uses sliding window protocol to optimize network
    throughput
  • Provides acknowledgements for guaranteed data
    delivery
  • Provides full duplex communication through the
    use of ACK and SYN numbering
  • Retransmissions done if ACK is not received
    within a certain time limit
  • Retransmit time is based on RTT under normal
    conditions, or on Karn's backoff strategy once a
    retransmit has occurred
  • Advertised windows are used by the receiver to
    keep the sender from flooding the receiver's
    buffer space with data

3
Network Delay
  • The timeout value based on RTT computation is
    not responsive to short term variations in
    network delay
  • Intermediate routers have to queue incoming
    datagrams while they are busy consulting routing
    tables
  • This causes queue lengths to increase, making
    certain datagrams spend extra time in buffers
    waiting to be forwarded
  • This results in jitter, or changing round trip
    time, which is a function of network congestion
  • Queuing theory shows that the variance σ in
    round trip time is inversely proportional to
    available network capacity

4
Network Delay
  • If L is the current network load expressed as a
    fraction such that 0 ≤ L ≤ 1, the variance σ in
    RTT is proportional to 1/(1 - L)
  • If the network is running at 80% capacity
    (L = 0.8), then 1/(1 - L) = 5, so the variance is
    scaled by a factor of 5 (5σ)
  • The previous technique of setting
    Timeout = β × RTT works well for loads up to 30%,
    where RTT = (α × Old_RTT) + ((1 - α) × New_RTT)
  • This equation did not have a predictive (diff)
    component in it
  • The new computation for the timeout value factors
    in the load differential for better response

5
Network Delay
  • Compute the difference in successive RTT
    measurements
  • DIFF = SAMPLE - OLD_RTT
  • Smoothed_RTT = OLD_RTT + δ × DIFF
  • DEV = OLD_DEV + ρ × (|DIFF| - OLD_DEV)
  • Timeout = Smoothed_RTT + η × DEV
  • Where
  • 0 ≤ δ ≤ 1 controls how quickly the new sample
    affects the weighted average
  • 0 ≤ ρ ≤ 1 controls how quickly the new sample
    affects the mean deviation
  • η ≥ 1 controls how strongly the deviation
    affects the timeout value
  • The original value of η was 2, but it was later
    changed to 4 in 4.4BSD
  • It was found that this estimate responded well
    but tended to underestimate RTT causing
    retransmission
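
The timeout computation on this slide can be sketched in C as follows. This is a minimal illustration, not code from the course: the struct, the function name rtt_update, and the gain values DELTA and RHO are assumptions chosen for the example; only ETA = 4 comes from the slide (the 4.4BSD value).

```c
/* Illustrative estimator state; names are assumptions for this sketch. */
struct rtt_estimator {
    double smoothed_rtt; /* weighted average of RTT samples, in seconds */
    double dev;          /* smoothed mean deviation, in seconds */
};

/* Assumed gains for the example; the slide only requires 0 <= gain <= 1. */
#define DELTA 0.125
#define RHO   0.25
#define ETA   4.0   /* value the slide attributes to 4.4BSD */

/* Fold one RTT sample into the estimator and return the new timeout. */
double rtt_update(struct rtt_estimator *e, double sample)
{
    double diff = sample - e->smoothed_rtt;        /* DIFF = SAMPLE - OLD_RTT */
    double abs_diff = diff < 0 ? -diff : diff;     /* |DIFF| */

    e->smoothed_rtt += DELTA * diff;               /* Smoothed_RTT */
    e->dev += RHO * (abs_diff - e->dev);           /* DEV */
    return e->smoothed_rtt + ETA * e->dev;         /* Timeout */
}
```

Starting from a smoothed RTT of 1.0 s and a deviation of 0, a single 2.0 s sample moves the smoothed RTT only to 1.125 s but raises the timeout to 2.125 s, which shows how the deviation term reacts to a sudden change faster than the average does.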

6
Congestion Response
  • Delay caused by routers under heavy load can
    therefore cause TCP stacks to send unwanted
    retransmissions triggering a congestion collapse
  • TCP should reduce transmission rates when
    congestion occurs on routers
  • Unlike UDP, TCP provides congestion control
  • Congestion Algorithms prevent senders from
    overloading the network
  • Most implementations include the following four
    basic algorithms, standardized by the IETF
  • Slow Start
  • Congestion Avoidance
  • Fast retransmit
  • Fast recovery

7
Slow Start Algorithm
  • Routers send ICMP Source Quench if buffers fill
    up, but this works only to a point
  • If TCP starts injecting packets based on
    advertised window, a rapid rise in traffic can
    cause congestion and subsequent retransmissions
  • Slow Start Algorithm adds a new window called
    the cwnd or the Congestion Window
  • TCP acts as if the sliding window size is
    min(advertised window, congestion window)
  • When a connection is established, cwnd is
    initialized to one segment
  • Slow Start Algorithm is based on the observation
    that new packets should be injected into the
    network at the rate at which acknowledgements
    are received

8
Slow Start Algorithm
  • Increase congestion window by one segment each
    time an acknowledgement is received
  • Starts with one segment being sent with cwnd set
    to 1
  • When the ACK arrives, two segments are sent and
    cwnd is set to two
  • When two ACKs arrive, cwnd is set to four, and
    so on
  • Congestion window is allowed to grow until it
    becomes equal to the advertised window
  • For a receiver advertised window size of N
    segments, it takes only log2(N) round trips
    before cwnd reaches the advertised size
  • This mechanism for injecting packets works only
    if there are no losses in the network
  • Losses are detected by Timeouts or duplicate ACKS
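
The log2(N) claim can be checked with a small loss-free simulation. This is a sketch under stated assumptions: windows are counted in whole segments, every segment is acknowledged individually, and the function name slow_start_round_trips is made up for the example.

```c
/* Count the round trips slow start needs before cwnd (in segments)
 * reaches an advertised window of n segments. Assumes no loss and one
 * ACK per segment, so every round trip adds cwnd ACKs' worth of growth. */
unsigned slow_start_round_trips(unsigned n)
{
    unsigned long long cwnd = 1;   /* connection starts with one segment */
    unsigned rounds = 0;

    while (cwnd < n) {
        cwnd += cwnd;              /* +1 segment for each of cwnd ACKs */
        ++rounds;
    }
    return rounds;
}
```

For N = 16 the loop runs 4 times and for N = 64 it runs 6 times, matching log2(N); when N is not a power of two the count rounds up.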

9
Congestion Avoidance
  • To control the rate at which packets are
    injected in the event of loss, TCP uses a
    different rate to inject packets
  • TCP uses a congestion avoidance phase when loss
    is detected where cwnd is increased by one only
    if all segments in the window are acknowledged
  • To detect this cross over point between slow
    start and congestion avoidance, TCP uses a
    register called ssthresh
  • When segment loss is detected by TCP, half the
    cwnd value is copied into the ssthresh register
  • If duplicate ACKs are detected, cwnd is
    multiplicatively reduced
  • If a timeout was the reason, cwnd is set to one
    segment
  • When the receiver starts acknowledging segments,
    TCP uses slow start or congestion avoidance to
    grow cwnd
  • If cwnd ≤ ssthresh then TCP is in slow start,
    else in congestion avoidance

10
TCP Congestion Control Algorithms
  • The combined algorithm works as follows
  • A new connection sets cwnd to one segment and
    ssthresh to 65535 bytes
  • The cwnd is grown according to Slow start
    algorithm as ACKs are received
  • TCP never sends more than the lower value of the
    cwnd or the receiver's advertised window, which
    is supplied in the ACK packet
  • When packet loss is detected (timeout or
    duplicate ACK), one half of the cwnd is stored as
    the ssthresh value; additionally, if congestion
    was indicated by a timeout (prolonged
    congestion), cwnd is set to one segment
  • When new data is acknowledged by the other end,
    cwnd is increased
  • The way cwnd is increased depends on whether TCP
    is in slow start or congestion avoidance mode
  • If cwnd is less than or equal to ssthresh, TCP
    is in Slow Start, else TCP is in Congestion
    Avoidance mode
  • Slow start increases cwnd exponentially, while
    Congestion Avoidance increases cwnd by
    segsize × segsize / cwnd per ACK (linear growth)
  • Exponential window growth occurs until TCP is
    halfway up to where congestion occurred, then
    window growth is linear (less aggressive)
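
The combined algorithm can be sketched as a few state updates. This is a simplified illustration that follows the bullets on this slide literally; the struct and function names (tcp_cc, cc_init, cc_on_ack, cc_on_loss, cc_send_window) are assumptions, and real stacks add details the slides do not cover.

```c
/* Per-connection congestion state, all values in bytes. */
struct tcp_cc {
    unsigned long cwnd;      /* congestion window */
    unsigned long ssthresh;  /* slow start threshold */
    unsigned long segsize;   /* segment size */
};

/* New connection: cwnd is one segment, ssthresh is 65535 bytes. */
void cc_init(struct tcp_cc *c, unsigned long segsize)
{
    c->segsize = segsize;
    c->cwnd = segsize;
    c->ssthresh = 65535;
}

/* Called when an ACK for new data arrives. */
void cc_on_ack(struct tcp_cc *c)
{
    if (c->cwnd <= c->ssthresh)
        c->cwnd += c->segsize;                        /* slow start */
    else
        c->cwnd += c->segsize * c->segsize / c->cwnd; /* congestion avoidance */
}

/* Called when loss is detected; timed_out is nonzero for a timeout,
 * zero when the loss was signalled by duplicate ACKs. */
void cc_on_loss(struct tcp_cc *c, int timed_out)
{
    c->ssthresh = c->cwnd / 2;
    if (timed_out)
        c->cwnd = c->segsize;
}

/* TCP never sends more than min(cwnd, advertised window). */
unsigned long cc_send_window(const struct tcp_cc *c, unsigned long advertised)
{
    return c->cwnd < advertised ? c->cwnd : advertised;
}
```

With a 1000-byte segment, one ACK grows cwnd from 1000 to 2000 in slow start; after a loss sets ssthresh to 1000, the next ACK adds only 1000 × 1000 / 2000 = 500 bytes, which is the linear phase.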

11
TCP Congestion Control Algorithms
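[Figure: TCP window growth as a function of time. Axes: cwnd size against time. Marked levels: cwnd = advertised window and cwnd = ssthresh. Curve labels: "Delta cwnd 2 cwnd" on the slow start segment and "Delta cwnd = segsize × segsize / cwnd" on the congestion avoidance segment.]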
12
Fast Retransmit
  • Fast Retransmit algorithm avoids TCP waiting for
    a timeout to resend lost segments (1990)
  • TCP sender does not know if duplicate ACKs are
    due to packets being delivered out of sequence or
    if the packet was indeed lost
  • Sender waits for a small number of duplicate ACKs
    to be received
  • Assumption: if the packets were delivered out of
    order, there will be only one or two duplicate
    ACKs before a new ACK is sent by the receiver
  • If three or more duplicate ACKs are received in
    succession, it is strongly indicative of packet
    loss
  • Sender then retransmits the (apparently) missing
    segment without waiting for a timeout to occur
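
The duplicate-ACK rule can be sketched as a small counter. This is a simplified illustration: the names dupack_state and fast_retransmit_on_ack are made up here, and the sketch looks only at the acknowledgement number, whereas a real stack also checks that the segment carries no data or window update.

```c
/* Tracks the most recent ACK number and how often it has repeated. */
struct dupack_state {
    unsigned long last_ack;   /* highest ACK number seen so far */
    unsigned dup_count;       /* duplicates of last_ack received */
};

#define DUPACK_THRESHOLD 3    /* three duplicates signal probable loss */

/* Process one incoming ACK number. Returns 1 exactly once, when the
 * third duplicate arrives and the segment starting at ack should be
 * retransmitted without waiting for a timeout; returns 0 otherwise. */
int fast_retransmit_on_ack(struct dupack_state *s, unsigned long ack)
{
    if (ack == s->last_ack) {
        s->dup_count++;
        return s->dup_count == DUPACK_THRESHOLD;
    }
    s->last_ack = ack;        /* new data acknowledged: reset the count */
    s->dup_count = 0;
    return 0;
}
```

One or two duplicates, the pattern expected from simple reordering, leave the return value at 0, so the sender keeps waiting.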

13
Fast Recovery
  • Oftentimes network congestion is transitory and
    Slow Start can force TCP to lose whatever it
    learnt about the network
  • Therefore once Fast Retransmit starts, the cwnd
    is grown based on congestion avoidance (linear
    growth) and not slow start (exponential growth)
  • This approach is termed the Fast Recovery
    Algorithm
  • Assumption: receipt of a few duplicate ACKs
    means that packets are getting through, though
    some packets were dropped
  • Receiver can generate a duplicate ACK only when
    another packet has been received and buffered
  • This indicates that congestion is transient,
    hence no need to reduce flow abruptly by going
    into slow start (sets cwnd to one segment)
  • After retransmission occurs and all segments in
    the current window are acknowledged, the cwnd
    size is increased linearly

14
Congestion and Tail Drop
  • Routers handling heavy TCP traffic need
    specialized mechanisms to recover from congestion
  • In its simplest form, a router handles congestion
    by dropping new packets that arrive at its full
    ingress buffer (the Tail-Drop policy)
  • Tail-Drop can trigger a global response on TCP
    flows that pass through a specific router
  • When a router that supports several TCP flows
    drops packets, all supported flows are affected
  • All affected senders reset their cwnd to 1
    segment, causing an abrupt drop in traffic and
    allowing the router to recover
  • All senders may go into slow start
    simultaneously
  • They may start growing traffic together to drive
    the router back into congestion causing the
    network throughput to oscillate

15
Random Early Discard
  • Routers use an improved scheme for overload
    control
  • Routers set two markers for the input buffer
    pool at Tmax and Tmin
  • If the queue contains fewer than Tmin datagrams,
    add all arriving datagrams to the queue
  • If the queue contains more than Tmin datagrams
    but fewer than Tmax, randomly discard arriving
    packets with a probability of p
  • If the queue contains Tmax or more datagrams,
    discard all arriving packets
  • The probability p for discarding packets can
    be increased or decreased based on the nature of
    congestion
  • If congestion is so high that Tmax is
    consistently maintained, RED degenerates to
    Tail-Drop, causing global oscillations
  • A simple approach is to increase p from 10% to
    100% in increments of 10% if sustained
    congestion occurs

16
Random Early Discard
  • If a short burst of datagrams pushes the queue
    size above Tmin, RED starts dropping packets
    randomly
  • The queue may never get filled under such
    circumstances
  • To keep short bursts from triggering discards,
    RED computes a weighted average queue size
  • avg = (1 - γ) × Old_avg + γ × Current_queue_size
  • where γ is a coefficient between 0 and 1
  • The queue is generally measured in terms of
    octets and not datagrams but discarding is done
    based on datagrams
  • This means small datagrams have a lower
    probability of being discarded compared to large
    datagrams
  • This makes sure that pure ACKS have a lower
    probability of getting dropped under congestive
    situations
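
The RED decision for one arriving datagram can be sketched as below. This is a minimal sketch, assuming the thresholds are compared against the weighted average rather than the instantaneous queue size; the names red_queue and red_on_arrival are made up for the example, and the caller supplies the random number u so that the behaviour is repeatable.

```c
/* RED state for one queue; sizes use whatever unit the router measures. */
struct red_queue {
    double avg;     /* weighted average queue size */
    double gamma;   /* averaging coefficient, between 0 and 1 */
    double t_min;   /* below this average, always enqueue */
    double t_max;   /* at or above this average, always discard */
    double p;       /* discard probability between the thresholds */
};

enum red_action { RED_ENQUEUE = 0, RED_DISCARD = 1 };

/* Decide the fate of one arriving datagram. current_queue_size is the
 * instantaneous queue size; u is a uniform random number in [0, 1). */
enum red_action red_on_arrival(struct red_queue *q,
                               double current_queue_size, double u)
{
    /* avg = (1 - gamma) * Old_avg + gamma * Current_queue_size */
    q->avg = (1.0 - q->gamma) * q->avg + q->gamma * current_queue_size;

    if (q->avg < q->t_min)
        return RED_ENQUEUE;
    if (q->avg >= q->t_max)
        return RED_DISCARD;
    return u < q->p ? RED_DISCARD : RED_ENQUEUE;
}
```

With γ = 0.5, Tmin = 10 and an initial average of 0, a single arrival that finds 12 items queued yields an average of only 6, so the burst is absorbed without any discard.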

17
Establishing a Connection
  • TCP is connection oriented requiring
    establishment of a connection before processes
    can talk
  • The server (usually) issues a passive OPEN call
  • Clients issue an active OPEN call
  • The passive OPEN call remains dormant until a
    process attempts to connect to it by an active
    OPEN

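The three-way handshake (between Process 1 and Process 2)
Passive side: passive OPEN, waits for an active request
Active side: active OPEN
1. Active side sends SYN, seq = n (ISN)
   Passive side receives SYN
2. Passive side sends SYN, seq = m (ISN), ACK = n + 1
   Active side receives SYN + ACK
3. Active side sends ACK = m + 1
ISN = Initial Sequence Number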
18
Closing a Connection
  • TCP is full duplex, therefore release signals
    should be sent to both ends of the connection
  • One end sends its last TCP segment with the FIN
    bit set
  • The other process sends all its remaining data,
    ending with a TCP segment with the FIN bit set
  • The FIN bit signals the termination of a
    connection in one direction
  • FIN signals have to be received at both ends
    before the connection is released

MSL = Maximum Segment Lifetime. The connection stays in the TIME_WAIT state for 2 MSL after an active close.
Orderly release between Process 1 and Process 2
1. Active close: FIN, seq = 1415531522 (0 data bytes), ACK = 1823083522
2. Passive close: ACK = 1415531523
3. Passive close: FIN, seq = 1823083522 (0 data bytes), ACK = 1415531523
4. Active close: ACK = 1823083523, followed by TIME_WAIT (2 MSL)
19
Closing a Connection
  • TCP takes three segments to establish a
    connection
  • It takes four segments to terminate a connection
    (Orderly release)
  • TCP does a half-close in each direction before
    a connection is terminated
  • Either end may send a FIN signal
  • When TCP receives a FIN, the stack notifies the
    application that the other end has terminated
    its side of the connection
  • TCP provides this seldom-used half-close feature
    so that an application can close transmission
    one way while continuing to receive data from
    the far end
  • Connections both ways can also be terminated by
    an ABORT signal (abortive release) with the RST
    bit set in the code field
  • Processes can do a simultaneous active
    open/close (Peer to Peer)
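
In the sockets API a half-close is requested with shutdown(fd, SHUT_WR), a call these slides do not cover. The sketch below is an illustration under stated assumptions: it uses a local socketpair() stream instead of a real TCP connection so that it runs in a single process, and the function name half_close_demo is made up. On TCP the same shutdown call is what causes the FIN to be sent.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* One end sends a request and closes its sending direction; the other
 * end sees end-of-file but can still reply. Returns the number of reply
 * bytes read after the half-close, or -1 on error. */
int half_close_demo(void)
{
    int sv[2];
    char buf[16];
    int result = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    /* End 0: send the last data, then close only the outgoing direction. */
    if (write(sv[0], "ping", 4) != 4) goto out;
    if (shutdown(sv[0], SHUT_WR) < 0) goto out;

    /* End 1: read the data, then observe end-of-file (read returns 0). */
    if (read(sv[1], buf, sizeof buf) != 4) goto out;
    if (read(sv[1], buf, sizeof buf) != 0) goto out;

    /* End 1 can still transmit; end 0 can still receive. */
    if (write(sv[1], "pong", 4) != 4) goto out;
    result = (int)read(sv[0], buf, sizeof buf);

out:
    close(sv[0]);
    close(sv[1]);
    return result;
}
```

The read that returns 0 is how the stack notifies the application that the other end has finished sending.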

20
Silly Window Syndrome
  • TCP throughput deteriorates when one of the
    machines involved in the transaction is extremely
    slow
  • As the buffer on the receiver fills up it will
    progressively advertise a smaller window
  • Transmission from the sender will stop when
    advertisement drops to zero
  • Subsequent transmissions may have segments
    carrying one byte at a time degrading network
    throughput
  • This can also happen if the application sends
    data in blocks of B octets and TCP transmits in
    segments of M octets
  • If B is not a multiple of M, the remainder of
    each block has to be transmitted in a small
    segment
  • This problem is termed the Silly Window Syndrome
    (SWS)

21
SWS Avoidance
  • Avoidance of SWS can be done at the receiver and
    at the sender
  • Receive-side Silly window avoidance
  • The receiver advertises a zero window when its
    buffer fills up
  • The receiver is then made to delay window
    advertisements until the buffer empties
    substantially
  • Window advertisements resume when the buffer is
    50% empty or when enough buffer space to hold a
    segment of size MSS (maximum segment size) is
    available
  • Receive-side Delayed Acknowledgements
  • Once the advertised window becomes small, the
    receiver can start to delay acknowledgements
  • If data arrives in the meantime, a single
    acknowledgement can signal receipt of all the
    datagrams, reducing reverse traffic
  • If ACKs are delayed too much, a retransmission
    may occur
  • RTT computation on the sender may go awry due to
    the artificial delay from the receiver (ACKs
    should never be delayed more than 500 ms)
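
The receive-side rule for when to reopen the window can be written as a single test. This is a sketch of the rule as stated on this slide; the function name sws_advertised_window and its parameters are assumptions for the example.

```c
/* Receive-side silly window avoidance: advertise the free space only
 * when at least half the buffer is empty or at least one MSS fits;
 * otherwise keep advertising a zero window. All sizes are in octets. */
unsigned long sws_advertised_window(unsigned long free_space,
                                    unsigned long buffer_size,
                                    unsigned long mss)
{
    if (free_space >= buffer_size / 2 || free_space >= mss)
        return free_space;
    return 0;
}
```

With an 8192-octet buffer and an MSS of 1460, 100 free octets are still advertised as a zero window, while 1460 free octets are advertised in full.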

22
SWS Avoidance
  • Send-side silly window avoidance: the Nagle
    Algorithm
  • Data is clumped into aggregates before it is
    sent so that tinygrams are avoided
  • An adaptive technique is used by TCP to send
    data accumulating in the transmit buffer
  • While previously sent data is unacknowledged, new
    data is queued in the transmit buffer until a
    limit is reached
  • Data is transmitted when the limit is reached
  • If the buffer is still not filled, transmit the
    data when an ACK arrives
  • Apply the rule even when the PUSH flag is set
  • Certain highly interactive applications, such as
    X Window traffic, require immediate transmission
    of mouse and cursor events
  • The Nagle Algorithm can be turned off to improve
    response time using the TCP_NODELAY socket option
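
Turning the Nagle Algorithm off takes one setsockopt call on the TCP socket. The sketch below assumes a POSIX system; the helper names tcp_socket_nodelay and nodelay_is_set are made up for the example.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <unistd.h>

/* Create a TCP socket with the Nagle Algorithm disabled.
 * Returns the descriptor, or -1 on error. */
int tcp_socket_nodelay(void)
{
    int on = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof on) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Read the option back: 1 if TCP_NODELAY is set, 0 if not, -1 on error. */
int nodelay_is_set(int fd)
{
    int val = 0;
    socklen_t len = sizeof val;

    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &val, &len) < 0)
        return -1;
    return val != 0;
}
```

The option is set per socket, so a bulk-transfer connection in the same program can keep the default behaviour.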

23
Application Program Interfaces
  • Two most prevalent APIs are Berkeley Sockets and
    the System V Transport Layer Interface (TLI)
  • Both were developed for UNIX and first
    implemented in the C language
  • The API design approach is to make network I/O
    as similar to file I/O as possible
  • File I/O supports the following six system
    calls: open, creat, close, read, write, and lseek
  • File I/O operations work with file descriptors
  • File descriptor is an integer unique to a
    process that is used to identify a file that has
    been opened for I/O
  • Though superficially similar, the nature of
    Network I/O requires more details and options
    than file I/O

24
Application Program Interfaces
  • Need to specify whether the protocol between
    Client and Server is connection-oriented or
    connectionless
  • Only two processes take part in a transaction
    based on a connection-oriented protocol
  • a dedicated connection is set up for each
    transaction between the server process and the
    client process
  • Multiple client processes can talk to a server
    process using a connectionless protocol
  • Several Client processes can simply send
    messages to the same server process
  • For an efficient Client-Server application, we
    have to maximize transaction throughput
  • Servers should process as many client requests
    as possible within a specific time slot

25
Server Client Model
  • Iterative server - Server waits for a client
    request, services it when it arrives, and goes
    back to waiting for a new request
  • Server process should know how long it takes to
    service a client request
  • Requests arriving while the Server is busy are
    queued for the Server by the Kernel
  • Concurrent server - Server starts a new process
    to handle each request
  • Server process does not know how long it takes
    to handle a request
  • Client and Server processes are asymmetric and
    are coded differently
  • Servers are generally started first, and clients
    connect to them later on

26
Server Client Model
  • Servers
    1. Open a communication channel and inform the
       local host of its readiness to accept client
       requests on a well-known address (WKA)
    2. Wait for a client request to arrive at the WKA
    3. For an iterative server, process the request
       and send a reply. For a concurrent server, a
       new process is spawned to handle this client
       request
    4. Go back to step 2 and wait for another client
       request
  • Clients
    1. Open a communication channel and connect to a
       specific well-known address on a specific host
    2. Send service request messages to the server,
       and receive the responses
    3. Close the communication channel and terminate

27
Client-Server Model
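[Figure: exchange between Server and Client over a connection-oriented protocol. The server blocks until a connection arrives from the client; after connection establishment, the client sends Data (request) and the server returns Data (reply).]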
28
Client-Server Model
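[Figure: exchange between Server and Client over a connectionless protocol. The server blocks until data is received from the client; the client sends data (request), the server processes the request and returns data (reply).]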
29
Socket Addresses
  • Most BSD networking system calls require a
    pointer to a socket address structure as an
    argument (defined in <sys/socket.h>)
  • struct sockaddr {
        u_short sa_family;    /* address family: AF_xxx value */
        char    sa_data[14];  /* up to 14 bytes of protocol-specific address */
    };
  • The struct sockaddr is a generic construct that
    can hold identifiers for any protocol
  • For the Internet family the following structures
    are defined in <netinet/in.h>
  • struct in_addr {
        u_long s_addr;        /* 32-bit network ID and host ID */
    };
  • struct sockaddr_in {
        short          sin_family;   /* AF_INET */
        u_short        sin_port;     /* 16-bit port number, network byte ordered */
        struct in_addr sin_addr;     /* 32-bit netid/hostid, network byte ordered */
        char           sin_zero[8];  /* unused */
    };
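
Filling in a sockaddr_in before passing it to a socket call might look like the sketch below. The helper name fill_inet_addr is an assumption for the example; htons and inet_pton are standard calls that put the port and the address into network byte order.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Fill an Internet socket address from a dotted-quad string and a port
 * in host byte order. Returns 0 on success, -1 if the string is not a
 * valid IPv4 address. */
int fill_inet_addr(struct sockaddr_in *sa, const char *dotted_quad,
                   unsigned short port)
{
    memset(sa, 0, sizeof *sa);           /* also clears sin_zero */
    sa->sin_family = AF_INET;
    sa->sin_port = htons(port);          /* network byte order */
    if (inet_pton(AF_INET, dotted_quad, &sa->sin_addr) != 1)
        return -1;
    return 0;
}
```

When the structure is handed to bind, connect, or sendto, it is cast to the generic type: (struct sockaddr *)&sa.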

30
Elementary Socket System Calls
  • socket system call: returns a descriptor of
    type integer
  • This call specifies the type of communication
    protocol desired (TCP, UDP, XNS (Xerox Network
    Systems), etc.)
  • #include <sys/types.h>
  • #include <sys/socket.h>
  • int socket(int family, int type, int protocol);
  • where family is
  • AF_UNIX: Unix internal protocols
  • AF_INET: Internet protocols
  • AF_NS: Xerox NS protocols
  • AF_IMPLINK: Interface Message Processor link layer

31
Elementary Socket System Calls
  • socket system call: returns a descriptor of
    type integer
  • #include <sys/types.h>
  • #include <sys/socket.h>
  • int socket(int family, int type, int protocol);
  • where type is
  • SOCK_STREAM: stream socket
  • SOCK_DGRAM: datagram socket
  • SOCK_RAW: raw socket
  • SOCK_SEQPACKET: sequenced packet socket
  • SOCK_RDM: reliably delivered message socket
    (unimplemented)

32
Elementary Socket System Calls
  • socket system call: returns a descriptor of
    type integer
  • #include <sys/types.h>
  • #include <sys/socket.h>
  • int socket(int family, int type, int protocol);
  • where protocol is
  • IPPROTO_UDP: UDP
  • IPPROTO_TCP: TCP
  • IPPROTO_ICMP: ICMP
  • IPPROTO_RAW: protocol field left empty

33
Elementary Socket System Calls
  • bind system call: assigns a name to an unnamed
    socket
  • #include <sys/types.h>
  • #include <sys/socket.h>
  • int bind(int sockfd, struct sockaddr *myaddr,
    int addrlen);
  • Servers register their well-known addresses with
    the system so that the system can forward packets
    bound for this IP address and port number to the
    bound process
  • A client can register a specific address for
    itself
  • A connectionless client needs to ensure that the
    system assigns it some unique address, so that
    the other end has a valid return address to send
    its responses to

34
Elementary Socket System Calls
  • connect system call: the client establishes a
    connection with the server
  • #include <sys/types.h>
  • #include <sys/socket.h>
  • int connect(int sockfd, struct sockaddr
    *servaddr, int addrlen);
  • sockfd is a descriptor returned from the socket
    call
  • The second argument points to a sockaddr filled
    in with the server's address
  • The connect call does not return until a
    connection is negotiated and established
  • A connection-oriented client does not have to
    bind to a local address before calling connect;
    the local address is assigned automatically

35
Elementary Socket System Calls
  • listen system call: a connection-oriented server
    indicates to the system its willingness to
    receive connections
  • #include <sys/types.h>
  • #include <sys/socket.h>
  • int listen(int sockfd, int backlog);
  • The call is executed after both the socket and
    bind calls and immediately before the accept
    system call
  • backlog defines the length of the queue for
    incoming connection requests that arrive while
    the server is busy with an earlier accept
    (usually set to five)
  • In concurrent connection-oriented servers, the
    server needs to accept a request and fork a child
    process before it can do another accept. This
    involves time delay with possible queue buildup

36
Elementary Socket System Calls
  • accept system call: after a connection-oriented
    server calls listen, it executes the accept
    system call
  • #include <sys/types.h>
  • #include <sys/socket.h>
  • int accept(int sockfd, struct sockaddr *peer,
    int *addrlen);
  • accept takes the first request in the queue and
    creates another socket with the same properties
    as sockfd, assigns a new descriptor and returns
    this value
  • The sockaddr is filled with the address of the
    client requesting service
  • addrlen is a value-result parameter. It contains
    the size of the struct sockaddr before the call,
    and is filled in with the size of the sockaddr
    that defines the connection request

37
Elementary Socket System Calls
  • send/sendto system calls: similar to write but
    require additional arguments
  • #include <sys/types.h>
  • #include <sys/socket.h>
  • int send(int sockfd, char *buff, int nbytes, int
    flags);
  • int sendto(int sockfd, char *buff, int nbytes,
    int flags, struct sockaddr *to, int addrlen);
  • The send call sends data into the socket defined
    by sockfd. The contents of the buffer pointed to
    by buff, up to nbytes in length, are transmitted.
    For sendto, the sockaddr pointed to by to holds
    the destination address
  • The flags field is either zero or is formed by
    ORing the following
  • MSG_OOB: send out-of-band data
  • MSG_DONTROUTE: bypass routing (send or sendto)

38
Elementary Socket System Calls
  • recv/recvfrom system calls: similar to read but
    require additional arguments
  • #include <sys/types.h>
  • #include <sys/socket.h>
  • int recv(int sockfd, char *buff, int nbytes, int
    flags);
  • int recvfrom(int sockfd, char *buff, int nbytes,
    int flags, struct sockaddr *from, int *addrlen);
  • Both calls receive data from the other end. The
    recv system call is used with connection-oriented
    clients and servers. recvfrom fills in from and
    addrlen with the sender's address
  • The flags field is either zero or is formed by
    ORing the following
  • MSG_OOB: receive out-of-band data
  • MSG_PEEK: peek at incoming message (recv or
    recvfrom)

39
Elementary Socket System Calls
  • close system call: closes the socket and sends
    any queued data if the protocol is reliable
  • #include <sys/types.h>
  • #include <sys/socket.h>
  • int close(int sockfd);
  • Sends any queued data if the protocol used by the
    socket is reliable
  • Normally the call returns immediately, while the
    kernel still attempts to send any queued data

40
Elementary Socket System Calls
  • Connectionless Clients can also call the connect
    system call
  • The connect call for UDP is a dummy call that
    does not send any packets out through the UDP
    socket
  • Local data structures for the destination
    address get set up with this call
  • Once connected, the client can use send and
    recv to exchange data with the server
  • The server address does not have to be supplied
    each time data is transmitted, as it is with
    sendto and recvfrom
  • The term connect for a UDP client is a misnomer,
    but it helps code efficiency
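
A connected UDP client can be sketched in a single process over the loopback interface. This is an illustration under stated assumptions: both the "server" and the "client" socket live in the same function, the server port is chosen by the system, and the name connected_udp_demo is made up for the example.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/* Send one datagram from a connected UDP socket to a UDP socket bound
 * on 127.0.0.1. Returns the number of bytes received, or -1 on error. */
int connected_udp_demo(void)
{
    struct sockaddr_in srv;
    socklen_t len = sizeof srv;
    char buf[32];
    int result = -1;
    int s = socket(AF_INET, SOCK_DGRAM, 0);   /* "server" socket */
    int c = socket(AF_INET, SOCK_DGRAM, 0);   /* "client" socket */

    if (s < 0 || c < 0) goto out;

    memset(&srv, 0, sizeof srv);
    srv.sin_family = AF_INET;
    srv.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    srv.sin_port = htons(0);                  /* system picks the port */
    if (bind(s, (struct sockaddr *)&srv, sizeof srv) < 0) goto out;
    if (getsockname(s, (struct sockaddr *)&srv, &len) < 0) goto out;

    /* connect() on UDP only records the destination; nothing is sent. */
    if (connect(c, (struct sockaddr *)&srv, sizeof srv) < 0) goto out;

    /* send() needs no address argument once the socket is connected. */
    if (send(c, "hello", 5, 0) != 5) goto out;

    result = (int)recv(s, buf, sizeof buf, 0);

out:
    if (s >= 0) close(s);
    if (c >= 0) close(c);
    return result;
}
```

Without the connect call, the client would have to pass the server address to sendto on every transmission.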

41
Elementary Socket System Calls
  • getsockname system call: returns the local
    protocol address associated with a socket
  • #include <sys/types.h>
  • #include <sys/socket.h>
  • int getsockname(int sockfd, struct sockaddr
    *localaddr, int *addrlen);
  • If a connection-oriented client does not call
    bind, getsockname can be used to return the local
    IP address and local port number assigned to the
    connection by the kernel
  • After calling bind with a port number of zero,
    getsockname can be used to get the port allocated
    by the system for the process

42
Elementary Socket System Calls
  • getpeername system call: returns the foreign
    protocol address associated with a socket
  • #include <sys/types.h>
  • #include <sys/socket.h>
  • int getpeername(int sockfd, struct sockaddr
    *peeraddr, int *addrlen);
  • If a connection-oriented server calls accept and
    execs a child process, getpeername is the only
    way the child process can obtain the client's
    identity
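
The relationship between getsockname and getpeername can be checked over a loopback TCP connection. This is a sketch, not code from the course: it keeps the listener, the client, and the accepted socket in one process rather than using a forked or exec'd child, and the function name peer_identity_demo is made up for the example.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/* Returns 1 if getpeername() on the accepted socket reports the same
 * address and port that getsockname() reports on the client socket,
 * 0 if they differ, -1 on error. */
int peer_identity_demo(void)
{
    struct sockaddr_in srv, local, peer;
    socklen_t len;
    int afd = -1, result = -1;
    int lfd = socket(AF_INET, SOCK_STREAM, 0);   /* listening socket */
    int cfd = socket(AF_INET, SOCK_STREAM, 0);   /* client socket */

    if (lfd < 0 || cfd < 0) goto out;

    memset(&srv, 0, sizeof srv);
    srv.sin_family = AF_INET;
    srv.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    srv.sin_port = htons(0);                     /* system picks the port */
    if (bind(lfd, (struct sockaddr *)&srv, sizeof srv) < 0) goto out;
    if (listen(lfd, 5) < 0) goto out;

    /* Learn which port the system allocated for the listener. */
    len = sizeof srv;
    if (getsockname(lfd, (struct sockaddr *)&srv, &len) < 0) goto out;

    /* The client never calls bind; the kernel assigns its local address. */
    if (connect(cfd, (struct sockaddr *)&srv, sizeof srv) < 0) goto out;
    afd = accept(lfd, NULL, NULL);
    if (afd < 0) goto out;

    len = sizeof local;
    if (getsockname(cfd, (struct sockaddr *)&local, &len) < 0) goto out;
    len = sizeof peer;
    if (getpeername(afd, (struct sockaddr *)&peer, &len) < 0) goto out;

    result = (peer.sin_port == local.sin_port &&
              peer.sin_addr.s_addr == local.sin_addr.s_addr);

out:
    if (afd >= 0) close(afd);
    if (cfd >= 0) close(cfd);
    if (lfd >= 0) close(lfd);
    return result;
}
```

The accept call here passes NULL for the peer address on purpose, to mimic a child process that inherits only the descriptor and has to ask getpeername who the client is.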