Title: COM 360
COM 360
Chapter 5
Getting Processes to Communicate
- Previously we were concerned with using various
  technologies to connect a collection of computers:
  - direct links (LANs such as Ethernet and Token Ring)
  - packet-switched networks (ATM)
  - internetworks
- The next problem is to turn the host-to-host packet
  delivery service into a process-to-process
  communication channel.
End-to-End Protocol
- The transport layer supports communication between
  the end application processes and is called the
  end-to-end protocol.
- It interacts with two layers:
  - From above, the application-level processes
  - From below, the underlying network
End-to-End Protocol
- A transport layer protocol provides these services
  to the application layer:
  - Guarantees message delivery
  - Delivers messages in the same order that they are
    sent
  - Delivers at most one copy of each message
  - Supports arbitrarily large messages
  - Supports synchronization between sender and
    receiver
  - Allows the receiver to apply flow control to the
    sender
  - Supports multiple application processes on each
    host
End-to-End Protocol
- The underlying network has some limitations. It may:
  - Drop messages
  - Reorder messages
  - Deliver duplicate copies of a given message
  - Limit messages to some finite size
  - Deliver messages after some arbitrarily long delay
- This kind of network, exemplified by the Internet,
  provides best-effort service.
The Challenge!
- Develop algorithms that turn the less-than-desirable
  properties of the network into the high level of
  service required by application programs.
- Representative services:
  - Simple asynchronous demultiplexing service (UDP)
  - Reliable byte-stream service (TCP)
  - Request/reply service (for example, RPC, remote
    procedure call)
Simple Demultiplexer (UDP)
- The simplest transport protocol extends the
  host-to-host delivery service of the underlying
  network into a process-to-process communication
  service.
- There are multiple processes running on a host, so
  there must be a level of demultiplexing to allow
  multiple applications to share the network.
- The User Datagram Protocol (UDP) uses the
  best-effort network service.
Simple Demultiplexer (UDP)
- What form of address is used to identify the target
  process?
  - Processes can directly identify each other with an
    OS-assigned process ID (pid).
  - More commonly, processes indirectly identify each
    other using a port or mailbox.
  - The source sends a message to a port and the
    destination receives the message from the port.
- A UDP port is 16 bits, so there are 64K possible
  ports - not enough for all Internet hosts.
- A process is identified by a port on a particular
  host: a (port, host) pair.
Simple Demultiplexer (UDP)
- How does a process learn the port to which it wants
  to send a message?
- A client initiates a message exchange with a server
  process.
- The server learns the client's port (contained in
  the message header) and can reply to it.
- The server accepts messages at a well-known port.
  - Examples: DNS at port 53, mail at port 25
  - Published in /etc/services
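The (port, host) addressing described above can be sketched with the standard sockets API. This is a minimal illustration, not part of the chapter: binding to port 0 asks the OS for any free port, standing in for a well-known port such as 53 or 25, and the server learns the client's port from the address returned by recvfrom.

```python
import socket

# A UDP "server" bound to an ephemeral port on localhost.
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))          # port 0: OS assigns a free port
host, port = server.getsockname()      # the (port, host) pair identifies the process

# A client sends a datagram to that (host, port) pair.
client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.sendto(b"hello", (host, port))  # demultiplexed by destination port

data, addr = server.recvfrom(1024)     # addr carries the client's port for the reply
server.close()
client.close()
```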
Simple Demultiplexer (UDP)
- UDP does not implement flow control or reliable
  delivery.
- It just sends messages to an application and ensures
  correctness with a checksum (optional in IPv4, but
  mandatory in IPv6).
- UDP computes the checksum over the header, the
  message body, and a pseudo-header, which contains
  three fields from the IP header (the protocol
  number, the source IP address, and the destination
  IP address) plus the UDP length.
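As a sketch of the checksum computation above: UDP uses the standard 16-bit ones'-complement Internet checksum, computed over the pseudo-header plus the segment. The IP addresses in the usage note are hypothetical, chosen only for illustration.

```python
import socket
import struct

def internet_checksum(data: bytes) -> int:
    """16-bit ones'-complement checksum used by UDP, TCP, and IP."""
    if len(data) % 2:
        data += b"\x00"                              # pad to an even length
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)     # fold the carry back in
    return ~total & 0xFFFF

def udp_checksum(src_ip: str, dst_ip: str, udp_segment: bytes) -> int:
    # Pseudo-header: source IP, destination IP, a zero byte,
    # the protocol number (17 for UDP), and the UDP length.
    pseudo = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
              + struct.pack("!BBH", 0, 17, len(udp_segment)))
    return internet_checksum(pseudo + udp_segment)
```

A useful property of this checksum: summing the data together with its own checksum yields zero, which is how a receiver verifies a segment such as `udp_checksum("10.0.0.1", "10.0.0.2", segment)`.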
Format for UDP Header
UDP Message Queue
UDP Message Queue
- The port abstraction is typically implemented as a
  message queue. When a message arrives, the protocol
  appends it to the end of the queue.
- If the queue is full, the message is discarded.
  There is no flow control that tells the sender to
  slow down.
- When an application wants to receive a message, one
  is removed from the front of the queue; if the queue
  is empty, the process waits until one arrives.
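The port-as-queue behavior above can be sketched in a few lines. The class and method names here are illustrative, not from any real kernel; note that a full queue silently drops the new message (no flow control), and for simplicity an empty queue returns None rather than blocking as a real socket would.

```python
from collections import deque

class UdpPort:
    """Sketch of a UDP port implemented as a bounded message queue."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.messages = deque()

    def deliver(self, msg: bytes) -> bool:
        """Called when a datagram arrives; drop it silently if the queue is full."""
        if len(self.messages) >= self.capacity:
            return False          # discarded - no signal back to the sender
        self.messages.append(msg)
        return True

    def receive(self):
        """Application-side read; None if empty (a real socket would block)."""
        return self.messages.popleft() if self.messages else None
```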
Reliable Byte Stream (TCP)
- The Internet's Transmission Control Protocol (TCP)
  offers a reliable, connection-oriented, byte-stream
  service.
- TCP guarantees reliable, in-order delivery of a
  stream of bytes.
- It is full duplex and contains a flow control
  mechanism that allows the receiver to limit how much
  data the sender can transmit at one time.
- TCP also implements a congestion control mechanism,
  which keeps the sender from overloading the network.
Congestion Control vs. Flow Control
- Congestion control involves preventing too much data
  from being sent into the network, which would cause
  switches or links to become overloaded.
- Flow control involves preventing senders from
  over-running the capacity of receivers.
- Flow control is thus an end-to-end issue, while
  congestion control is concerned with how hosts and
  networks interact.
End-to-End Issues
- The sliding window algorithm is at the heart of TCP,
  which runs over the Internet.
- TCP supports logical connections between processes
  running on any two computers on the Internet.
- TCP uses an explicit connection establishment phase,
  during which the two sides agree to exchange data
  with each other.
- TCP connections have different round-trip times
  depending on the connections and distances between
  hosts.
End-to-End Issues
- Packets may become reordered as they cross the
  Internet, which does not happen on a point-to-point
  link.
- The sliding window algorithm can reorder packets
  using the sequence number.
- TCP assumes that each packet has a maximum lifetime,
  or maximum segment lifetime (MSL), usually set at
  120 seconds.
End-to-End Issues
- Computers connected to a single link are usually
  engineered to support that link.
- TCP must include a mechanism that each side uses to
  learn what resources (e.g., buffer space) the other
  side can apply to the connection - a flow control
  issue.
- The load on a link is visible as a queue of packets
  at the sender, but the sending side of a TCP
  connection does not know what links will be used to
  reach the destination and can cause network
  congestion.
Segment Format
- TCP is a byte-oriented protocol: the sender writes
  bytes into a TCP connection and the receiver reads
  bytes out of the TCP connection - a byte stream.
- TCP at the source buffers bytes to fill a packet and
  then sends the packet to the destination. TCP at the
  destination empties the packet into a receive
  buffer, and the receiving process reads from this
  buffer at its leisure.
- The packets exchanged are called segments, since
  each carries a segment of the byte stream.
How TCP Manages a Byte Stream
A TCP connection supports byte streams flowing in
both directions.
TCP Header Format
The UrgPtr field indicates where the non-urgent data
begins.
TCP Process
Shows data flow in one direction and ACKs in the
other.
Connection Establishment and Termination
- A TCP connection begins with a client (caller) doing
  an active open to a server (callee).
- The two sides exchange messages to establish the
  connection.
- After the connection establishment phase is over,
  the two sides begin sending data.
- When a participant finishes sending data, it closes
  one direction of the connection, which causes TCP to
  initiate a round of connection termination messages.
- Note: connection setup is asymmetric (one side does
  a passive open and the other an active open), while
  teardown is symmetric (each side closes
  independently). One side can keep sending after the
  other has closed.
Three-Way Handshake
- The algorithm used by TCP to establish a connection
  is called a three-way handshake.
- Idea: the two parties want to agree on a set of
  parameters - the starting sequence numbers for their
  byte streams.
- TCP requires that each side select a random initial
  sequence number (to prevent reusing the same
  sequence numbers again too soon).
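The sequence-number exchange in the handshake can be sketched as follows. The dictionaries stand in for segments; only the SEQ and ACK fields matter here, and each ACK names the next sequence number its sender expects (the peer's ISN plus one).

```python
import random

def three_way_handshake(client_isn: int, server_isn: int):
    """Sketch of the SYN, SYN+ACK, ACK exchange given each side's ISN."""
    syn     = {"flags": "SYN",     "seq": client_isn}
    syn_ack = {"flags": "SYN+ACK", "seq": server_isn,
               "ack": syn["seq"] + 1}          # acknowledges the client's SYN
    ack     = {"flags": "ACK",
               "ack": syn_ack["seq"] + 1}      # acknowledges the server's SYN
    return syn, syn_ack, ack

# Each side picks a random 32-bit initial sequence number.
syn, syn_ack, ack = three_way_handshake(random.randrange(2 ** 32),
                                        random.randrange(2 ** 32))
```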
Timeline for the Three-Way Handshake Algorithm
Each acknowledgement identifies the next sequence
number expected.
State Transition Diagram
- Each circle in TCP's state transition diagram
  denotes a state that one end of the connection can
  find itself in.
- All connections start in the CLOSED state.
- The connection moves from state to state along the
  arcs.
- Each arc is labeled with a tag of the form
  event/action.
State Transition Diagram
- Two kinds of events trigger a state transition:
  - A segment arrives from a peer, or
  - The local application process invokes an operation
    on TCP
- TCP's state transition diagram defines the semantics
  of both its peer-to-peer interface and its service
  interface. The syntax is given by the segment
  format.
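A few arcs of the diagram can be encoded as a table of (state, event) pairs mapping to (next state, action) - the event/action tags described above. This is an illustrative subset of the connection-establishment arcs, not the complete diagram.

```python
# (state, event) -> (next state, action); events are either arriving
# segments ("recv ...") or local operations invoked on TCP.
TRANSITIONS = {
    ("CLOSED",   "active_open"):  ("SYN_SENT",    "send SYN"),
    ("CLOSED",   "passive_open"): ("LISTEN",      None),
    ("LISTEN",   "recv SYN"):     ("SYN_RCVD",    "send SYN+ACK"),
    ("SYN_SENT", "recv SYN+ACK"): ("ESTABLISHED", "send ACK"),
    ("SYN_RCVD", "recv ACK"):     ("ESTABLISHED", None),
}

def step(state: str, event: str):
    """Follow one labeled arc of the diagram."""
    return TRANSITIONS[(state, event)]
```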
TCP State Transition Diagram
Sliding Window Revisited
- TCP's version of the sliding window algorithm serves
  several purposes:
  - It guarantees the reliable delivery of data
  - It ensures that data is delivered in order
  - It enforces flow control between the sender and
    receiver
- TCP uses the sliding window algorithm and adds the
  flow control function as well.
Sliding Window Revisited
- TCP differs from earlier algorithms because it
  includes the flow control function.
- Rather than having a fixed-size sliding window, the
  receiver advertises a window size to the sender.
- The sender is limited to having less than the value
  of AdvertisedWindow bytes of unacknowledged data at
  any given time.
- The receiver selects a suitable window value based
  on the amount of available buffer memory, to keep
  from over-running the buffer.
Reliable and Ordered Delivery
Relationship between TCP send buffer (a) and receive
buffer (b).
Reliable and Ordered Delivery
- TCP on the sending side maintains a send buffer.
- This buffer is used to store data that has been sent
  but not yet acknowledged, as well as data that has
  been written by the sending application but not yet
  transmitted.
- On the receiving side, TCP maintains a receive
  buffer.
- This buffer holds data that arrives out of order, as
  well as data in the correct order that the
  application has not yet read.
- (Both buffers are finite and will eventually wrap
  around.)
Flow Control
- Data arriving from upstream fills the send buffer,
  and data being transmitted to a downstream node
  empties the receive buffer.
- The size of the window sets the amount of data that
  can be sent without waiting for an acknowledgement
  from the receiver.
- TCP on the receiver side must keep the buffer from
  overflowing, so it advertises the amount of
  remaining free space.
- The sender computes an effective window that limits
  how much data it can send.
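The two window computations described above can be written out directly. The variable names follow the usual textbook convention (MaxRcvBuffer, LastByteRcvd, and so on); the numeric values in the usage note are made up for illustration.

```python
def advertised_window(max_rcv_buffer: int, last_byte_rcvd: int,
                      last_byte_read: int) -> int:
    """Receiver side: advertise the free space left in the receive buffer."""
    return max_rcv_buffer - (last_byte_rcvd - last_byte_read)

def effective_window(advertised: int, last_byte_sent: int,
                     last_byte_acked: int) -> int:
    """Sender side: how many additional bytes may be sent right now."""
    return advertised - (last_byte_sent - last_byte_acked)
```

For example, a receiver with a 100-byte buffer that has received through byte 80 and whose application has read through byte 30 advertises 50 bytes; a sender with 20 bytes already in flight may then send 30 more.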
Protecting Against Wraparound
- We need to make sure that the sequence number does
  not wrap around within a 120-second (MSL) period of
  time. (On a T1 line wraparound takes about 6.4
  hours, but on an OC-48 line it takes only 14
  seconds.)
- Gigabit Ethernet is getting close to the point where
  32 bits are too small for the sequence number.
  Extensions are being developed to solve this.
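The wraparound times quoted above follow from a one-line computation: the 32-bit sequence number counts bytes, so the space is exhausted after 2^32 bytes are sent at line rate. A quick sketch:

```python
SEQ_SPACE_BITS = 2 ** 32 * 8   # 2^32 bytes of sequence space, in bits

def wrap_time_seconds(bandwidth_bps: float) -> float:
    """Time to send 2^32 bytes at a given rate, i.e. until the space wraps."""
    return SEQ_SPACE_BITS / bandwidth_bps
```

At T1 rate (about 1.5 Mbps) this gives roughly 6.4 hours, comfortably above the 120-second MSL; at OC-48 rate (about 2.5 Gbps) it gives roughly 14 seconds, which is below it.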
Keeping the Pipe Full
- The delay x bandwidth product dictates the size of
  the AdvertisedWindow field. The window needs to be
  large enough to allow a full delay x bandwidth
  product's worth of data to be transmitted.
- TCP's 16-bit AdvertisedWindow field is not big
  enough to handle even a T3 connection.
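The T3 claim above is easy to check with the delay x bandwidth product; the 100 ms RTT used here is an assumed cross-country figure, chosen only for illustration.

```python
def required_window_bytes(rtt_seconds: float, bandwidth_bps: float) -> float:
    """One delay x bandwidth product: the window needed to keep the pipe full."""
    return rtt_seconds * bandwidth_bps / 8

MAX_ADVERTISED_WINDOW = 2 ** 16 - 1    # 16-bit field: just under 64 KB
```

With a 100 ms RTT, a T3 (about 45 Mbps) needs roughly 550 KB of window, far more than the 64 KB a 16-bit field can express; the window-scaling extension mentioned later addresses exactly this.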
Triggering Transmission
- How does TCP decide to transmit a segment?
  - An application writes bytes into the stream and
    TCP decides that it has enough bytes to send a
    segment.
  - TCP uses the maximum segment size (MSS) and sends
    the largest segment it can without fragmenting.
  - The sending process invokes a push operation,
    which flushes the buffer.
  - A timer fires and the current contents of the
    buffer are sent.
Silly Window Syndrome
- Aggressively taking advantage of any available
  window leads to the silly window syndrome.
- Think of a conveyor belt with full containers (data
  segments) going in one direction and empty
  containers (ACKs) going in the reverse direction.
- MSS-sized segments are like large containers and
  1-byte segments are like small containers.
- If the sender aggressively fills a container as soon
  as it arrives, then small containers remain in the
  system: each is immediately filled and emptied and
  never merged into a larger container.
Silly Window Syndrome
- Silly window syndrome is only a problem when either
  the sender sends a small segment or the receiver
  opens the window a small amount. If neither happens,
  a small container is never introduced into the
  stream.
- Since we can't prevent every small container, there
  must be a means of coalescing them.
- The receiver can do this by delaying ACKs - sending
  one combined ACK rather than multiple small ones.
Silly Window Syndrome
Nagle's Algorithm
- The ultimate solution comes back to the sender. If
  there is data to send, but the open window is less
  than MSS, we may want to wait before transmitting.
  But how long?
- We could introduce a timer and transmit when it
  expires.
- Nagle instead introduced a self-clocking solution.
Nagle's Algorithm
- Idea: as long as there is data in flight, the sender
  will eventually receive an ACK.
- This ACK can be treated as a timer firing,
  triggering the transmission of more data.
- When the application produces data to send:
  - if both the available data and the window >= MSS
    - send a full segment
  - else if there is unACKed data in flight
    - buffer the new data until the ACK arrives
  - else send all the new data now
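The decision rule above translates into a small predicate; the function name and parameters are illustrative, not from any real TCP implementation.

```python
def nagle_may_send(data_len: int, window: int, mss: int,
                   unacked_bytes: int) -> bool:
    """Sketch of Nagle's self-clocking rule: may the sender transmit now?"""
    if data_len >= mss and window >= mss:
        return True        # a full segment fits in the window: send it
    if unacked_bytes == 0:
        return True        # nothing in flight: send even a small segment now
    return False           # otherwise buffer until an ACK arrives
```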
Nagle's Algorithm
- In other words, it's always OK to send a full
  segment if the window allows.
- It's also all right to send a small amount of data
  if there are currently no segments in transit, but
  if there is anything in flight, the sender must wait
  for an ACK before transmitting the next segment.
- An interactive application like telnet, which
  continually writes one byte at a time, will send
  data at a rate of one segment per RTT.
- Some applications cannot afford such a delay; the
  socket interface allows them to turn off Nagle's
  algorithm (the TCP_NODELAY option).
Adaptive Retransmission
- Because TCP guarantees reliable delivery of data, it
  retransmits each segment if an ACK is not received
  within a certain period of time.
- Choosing an appropriate timeout value is not easy.
  To do so, TCP uses an adaptive retransmission
  mechanism.
- The idea is to keep a running average of the RTT and
  then compute the timeout as a function of this RTT:
  Timeout = 2 x EstimatedRTT.
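The running average above is an exponentially weighted moving average. A sketch of one update step, using a typical smoothing factor in the 0.8 to 0.9 range:

```python
ALPHA = 0.875    # typical smoothing factor, between 0.8 and 0.9

def update_estimated_rtt(estimated_rtt: float, sample_rtt: float,
                         alpha: float = ALPHA) -> float:
    """Blend each new RTT sample into the running average."""
    return alpha * estimated_rtt + (1 - alpha) * sample_rtt

def timeout(estimated_rtt: float) -> float:
    """Conservative timeout: twice the running average."""
    return 2 * estimated_rtt
```

For example, with an estimate of 100 ms and a new 200 ms sample, the estimate moves only to 112.5 ms: a large alpha makes the average resistant to momentary fluctuations.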
Computing an Accurate RTT
a) Original transmission vs. b) Retransmission
Karn/Partridge Algorithm
- There is a flaw in the previous estimate. An ACK
  does not acknowledge a transmission; it acknowledges
  the receipt of data. So when a segment is
  retransmitted, it cannot be determined whether the
  ACK should be associated with the first or the
  second transmission.
- The Karn/Partridge algorithm (1987) is a simple
  solution: take RTT samples only for segments that
  were sent exactly once, and each time TCP
  retransmits, set the next timeout to be twice the
  last one (exponential backoff, as Ethernet uses).
Jacobson/Karels Algorithm
- The Karn/Partridge algorithm was designed to
  eliminate some of the Internet congestion.
- In 1988, Jacobson and Karels proposed a more drastic
  change to TCP to battle congestion.
- TCP computes the timeout value as a function of both
  EstimatedRTT and the deviation:
  Timeout = mu x EstimatedRTT + phi x Deviation
  (typically mu = 1 and phi = 4).
- Timeout is related to congestion because if you time
  out too soon, you may unnecessarily retransmit a
  segment.
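One update step of the Jacobson/Karels estimator can be sketched as follows, using the usual textbook constants (a gain of 1/8 per sample, mu = 1, phi = 4):

```python
DELTA = 0.125    # gain applied to each new sample
MU, PHI = 1, 4   # typical weights in the timeout computation

def jacobson_karels(est: float, dev: float, sample: float):
    """One update of EstimatedRTT and Deviation, returning the new timeout."""
    diff = sample - est
    est = est + DELTA * diff                 # move the estimate toward the sample
    dev = dev + DELTA * (abs(diff) - dev)    # track how much samples vary
    return est, dev, MU * est + PHI * dev
```

Because the timeout grows with the deviation, a connection with jittery RTTs gets a larger safety margin than one with steady RTTs.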
Implementation
- All of these retransmission algorithms are based on
  acknowledgement timeouts, which indicate that a
  segment has probably been lost. Note that a timeout
  does not tell the sender whether any segments it
  sent after the lost segment were received.
- There are TCP extensions to assist with this.
Record Boundaries
- TCP is a byte-stream protocol. The number of bytes
  written by the sender is not necessarily the same as
  the number of bytes read by the receiver. (For
  example, an application might write 8 bytes, then 2
  bytes, then 20 bytes to a TCP connection, while on
  the receiving end the application reads 5 bytes in a
  loop that iterates 6 times.)
- This is in contrast to UDP, where the message sent
  is exactly the same size as the message received.
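The 8/2/20-byte example above can be made concrete: once the writes are appended to the stream, nothing records where one write ended and the next began, so the receiver's read sizes are independent of the write sizes.

```python
# The sender's three writes (8, 2, and 20 bytes) are simply appended
# to one byte stream; no record boundaries survive.
writes = [b"A" * 8, b"B" * 2, b"C" * 20]
stream = b"".join(writes)                  # 30 bytes total

# The receiver reads 5 bytes in a loop that iterates 6 times.
reads = [stream[i:i + 5] for i in range(0, len(stream), 5)]
```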
Record Boundaries
- There are two features that can be used to insert
  record boundaries into the byte stream:
  - The urgent data feature, which uses the URG flag
    and the UrgPtr field in the TCP header to mark
    special data
  - The push operation, which a sender uses to flush
    the bytes TCP has buffered
TCP Extensions
- These extensions are options that can be added to
  the TCP header:
  - The first helps to improve TCP's timeout mechanism
  - The second addresses the problem of TCP's 32-bit
    sequence number field wrapping around too soon on
    a high-speed network, using a timestamp
  - The third allows TCP to advertise a larger window
  - The fourth is the selective acknowledgement, or
    SACK, option, which allows the sender to
    retransmit just the segments that are missing
Alternative Design Choices
- TCP is robust and satisfies the needs of a wide
  range of applications, but it is not the only design
  choice possible.
- There could have been request/reply protocols like
  RPC.
- TCP could have provided a reliable message-stream
  service instead of a reliable byte-stream service.
- TCP implements explicit setup/teardown phases, but
  could have sent connection parameters with the first
  data message.
- TCP is a window-based protocol, but rate-based
  designs are possible.
Remote Procedure Call (RPC)
- The request/reply paradigm is also called a message
  transaction.
- A client sends a request message to a server and the
  server responds with a reply message, with the
  client blocking (suspending execution) to wait for
  the reply.
- A transport protocol that supports request/reply is
  more than a UDP message going in one direction
  followed by a UDP message going in the other
  direction.
- A third type of transport protocol, called RPC,
  matches the needs of an application involved in a
  request/reply message exchange.
Timeline for RPC
RPC Fundamentals
- RPC is more than just a protocol; it is a popular
  mechanism for structuring distributed systems.
- The application program makes a call into a
  procedure, regardless of whether it is local or
  remote, and blocks until the call returns.
- RPC is also called remote method invocation (RMI).
RPC Fundamentals
- A complete RPC mechanism involves two major
  components:
  - A protocol that manages the messages sent between
    the client and the server and deals with the
    properties of the underlying network
  - Programming language and compiler support to
    package arguments into a request message on the
    client machine and to translate this message back
    into arguments on the server machine (a stub
    compiler)
Complete RPC Mechanism
RPC Fundamentals
- RPC refers to a type of protocol rather than a
  specific standard like TCP.
- Unlike TCP, which is the dominant reliable
  byte-stream protocol, there is no one dominant RPC
  protocol.
RPC Fundamentals
- RPC performs a complex set of functions. Think of it
  as a stack of protocols:
  - BLAST fragments and reassembles large messages
  - CHAN synchronizes request and reply messages
  - SELECT dispatches request messages to the correct
    process
- These are not standard protocols, but they
  demonstrate the algorithms needed to implement RPC.
A Simple RPC Stack
Synchronous versus Asynchronous Protocols
- One way to characterize a protocol is by whether it
  is synchronous or asynchronous.
- At the asynchronous end of the spectrum, the
  application knows nothing when send returns.
- At the synchronous end of the spectrum, the send
  operation returns a reply message.
SunRPC
- SunRPC is a widely used protocol; while it has not
  been accepted by any standards body, it has become a
  de facto standard.
- It plays a central role in Sun's Network File System
  (NFS).
- SunRPC addresses the three functions of
  fragmentation, synchronization, and dispatching in a
  slightly different order.
Protocol Graph for SunRPC
SunRPC Header Formats
a) Request b) Reply
DCE
- DCE is the Distributed Computing Environment, a set
  of standards and software for building distributed
  systems.
- DCE-RPC is designed to run on top of UDP and is
  similar to SunRPC in using a two-level addressing
  scheme.
Typical DCE-RPC Message Exchange
Transport for Real-Time Applications (RTP)
- Real-time traffic, such as digitized voice samples,
  is carried over packet networks.
- A real-time application is one with strong
  requirements for the timely delivery of information
  (for example, VoIP, or voice over IP, and multimedia
  applications).
- Real-time applications make demands on the transport
  protocol that are not met by earlier protocols like
  TCP and UDP.
- The Real-time Transport Protocol (RTP) is designed
  to meet some of these challenges.
Performance
- Network performance is evaluated by two metrics:
  delay (or latency) and throughput.
- These represent performance as seen by the
  application programs.
Performance
- Each experiment involved running three identical
  instances of the same test.
- Delay (latency) was measured for message sizes of 1
  byte, 100 bytes, 200 bytes, ..., 1000 bytes.
- Throughput results were computed for message sizes
  of 1 KB, 2 KB, 4 KB, ..., 32 KB.
- Latency for the 1-byte case represents the overhead
  involved in processing each message and is the lower
  bound on latency.
- Delay increases with message size for both UDP and
  TCP.
Measured System
Two Pentium workstations running Linux connected by a
100 Mbps Ethernet. A test program pings messages
between them.
Round Trip Latencies
Measured Throughput
Performance
- Throughput improves as the messages get larger,
  since larger messages mean less overhead per byte.
- The throughput curve flattens above 16 KB and tops
  out before reaching 100 Mbps.
- The factor preventing the systems from running at
  full Ethernet speed is a limitation of the network
  adaptor, rather than the software.
Summary
- Four end-to-end protocols:
  - UDP: a simple demultiplexer that dispatches
    messages to the appropriate application process
    based on a port number, offering an unreliable,
    connectionless datagram service.
  - TCP: a reliable byte-stream protocol, which
    recovers from lost messages and delivers bytes in
    the order they were sent. It provides flow control
    using the sliding window protocol and a
    timeout-based retransmission mechanism.
  - RPC: a request/reply protocol.
  - RTP: a real-time transport protocol.