Title: End-to-End Protocols
1End-to-End Protocols
- Underlying best-effort network
- drops messages
- re-orders messages
- delivers duplicate copies of a given message
- limits messages to some finite size
- delivers messages after an arbitrarily long delay
- Common end-to-end services
- guarantee message delivery
- deliver messages in the same order they are sent
- deliver at most one copy of each message
- support arbitrarily large messages
- support synchronization
- allow the receiver to flow control the sender
- support multiple application processes on each host
2Simple Demultiplexor (UDP)
- Unreliable and unordered datagram service
- Adds multiplexing
- No flow control
- Endpoints identified by ports
- servers have well-known ports
- see /etc/services on Unix
- Header format
- Optional checksum
- computed over pseudo header + UDP header + data
[Figure: UDP header format, bit positions 0, 16, 31; fields SrcPort, DstPort, Checksum, Length, Data]
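The checksum coverage listed above (pseudo header, UDP header, data) can be illustrated with the 16-bit ones' complement Internet checksum. The following is a minimal sketch, not a full UDP implementation; the helper names `cksum_add` and `cksum_finish` are hypothetical. It assumes every buffer except the last has even length, which holds when the pseudo header and UDP header are summed before the data.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: add a buffer to a running ones' complement sum.
 * Bytes are read in network order, two at a time. */
static uint32_t cksum_add(uint32_t sum, const uint8_t *buf, size_t len)
{
    size_t i;
    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
    if (len & 1)                       /* odd trailing byte: pad with zero */
        sum += (uint32_t)buf[len - 1] << 8;
    return sum;
}

/* Fold the carries back into 16 bits and complement the result. */
static uint16_t cksum_finish(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;
}
```

A caller would chain `cksum_add` over the pseudo header, the UDP header (with its checksum field set to zero), and the data, then store the value returned by `cksum_finish`.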
3- The following customer-specific entries were found in the services file prior to an upgrade. Note that service names and their corresponding port numbers must be registered with IANA, http://www.iana.org, and entries not registered as such may not be preserved automatically by future upgrades.
- echo 7/udp
- discard 9/tcp sink null
- systat 11/tcp users
- daytime 13/udp
- netstat 15/tcp
- chargen 19/tcp ttytst source
- ftp-data 20/tcp
- ftp 21/tcp
- ssh 22/tcp # Secure Shell
- telnet 23/tcp
- smtp 25/tcp mail
- time 37/tcp timserver
- name 42/udp nameserver
- whois 43/tcp nicname # usually to sri-nic
- bootps 67/udp # BOOTP/DHCP server
- bootpc 68/udp # BOOTP/DHCP client
- kerberos 88/udp kdc # Kerberos V5 KDC
- hostnames 101/tcp hostname # usually to sri-nic
- pop2 109/tcp pop-2 # Post Office Protocol - V2
- pop3 110/tcp # Post Office Protocol - Version 3
- sunrpc 111/udp rpcbind
- sunrpc 111/tcp rpcbind
- imap 143/tcp imap2 # Internet Mail Access Protocol v2
- ldap 389/tcp # Lightweight Directory Access Protocol
- ldap 389/udp # Lightweight Directory Access Protocol
- dhcpv6-client 546/udp dhcpv6c # DHCPv6 Client (RFC 3315)
- dhcpv6-server 547/udp dhcpv6s # DHCPv6 Server (RFC 3315)
- submission 587/tcp # Mail Message Submission
- submission 587/udp # see RFC 2476
- ldaps 636/tcp # LDAP protocol over TLS/SSL (was sldap)
- ldaps 636/udp # LDAP protocol over TLS/SSL (was sldap)
- tftp 69/udp
- rje 77/tcp
4TCP Overview
- Connection-oriented
- Byte-stream
- app writes bytes
- TCP sends segments
- app reads bytes
- Full duplex
- Flow control: keep sender from overrunning receiver
- Congestion control: keep sender from overrunning network
5Data Link Versus Transport
- Potentially connects many different hosts
- need explicit connection establishment and termination
- Potentially different RTT
- need adaptive timeout mechanism
- Potentially long delay in network
- need to be prepared for arrival of very old packets
- Potentially different capacity at destination
- need to accommodate different node capacity
- Potentially different network capacity
- need to be prepared for network congestion
6Managing the Byte Stream
7Segment Format
8Segment Format (cont)
- Each connection identified with 4-tuple
- (SrcPort, SrcIPAddr, DstPort, DstIPAddr)
- Sliding window flow control
- Acknowledgment, SequenceNum, AdvertisedWindow
- Flags
- SYN, FIN, RESET, PUSH, URG, ACK
- Checksum
- pseudo header + TCP header + data
9Connection Establishment and Termination
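[Figure: three-way handshake between the active participant (client) and the passive participant (server)]
- Client to server: SYN, SequenceNum = x
- Server to client: SYN + ACK, SequenceNum = y, Acknowledgment = x + 1
- Client to server: ACK, Acknowledgment = y + 1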
10State Transition Diagram
11Sliding Window Revisited
- Sending side
- LastByteAcked <= LastByteSent
- LastByteSent <= LastByteWritten
- buffer bytes between LastByteAcked and LastByteWritten
- Receiving side
- LastByteRead < NextByteExpected
- NextByteExpected <= LastByteRcvd + 1
- buffer bytes between NextByteRead and LastByteRcvd
12Flow Control
- Send buffer size: MaxSendBuffer
- Receive buffer size: MaxRcvBuffer
- Receiving side
- LastByteRcvd - LastByteRead <= MaxRcvBuffer
- AdvertisedWindow = MaxRcvBuffer - (NextByteExpected - NextByteRead)
- Sending side
- LastByteSent - LastByteAcked <= AdvertisedWindow
- EffectiveWindow = AdvertisedWindow - (LastByteSent - LastByteAcked)
- LastByteWritten - LastByteAcked <= MaxSendBuffer
- block sender if (LastByteWritten - LastByteAcked) + y > MaxSendBuffer, where y is the number of bytes the application tries to write
- Always send ACK in response to arriving data segment
- Persist when AdvertisedWindow = 0
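The window arithmetic on this slide can be sketched in C as follows. The struct layouts and function names are hypothetical; only the field names come from the slide. The sketch clamps EffectiveWindow at zero, which is an added safety assumption for the case where the receiver shrinks its window.

```c
#include <stdint.h>

/* Hypothetical per-connection state; field names follow the slide. */
struct rcv_state {
    uint32_t NextByteExpected, NextByteRead, MaxRcvBuffer;
};
struct snd_state {
    uint32_t LastByteAcked, LastByteSent, LastByteWritten, MaxSendBuffer;
};

/* Receiver: free buffer space to advertise to the sender. */
static uint32_t advertised_window(const struct rcv_state *r)
{
    return r->MaxRcvBuffer - (r->NextByteExpected - r->NextByteRead);
}

/* Sender: how much more data may be put on the wire right now. */
static uint32_t effective_window(const struct snd_state *s, uint32_t advertised)
{
    uint32_t in_flight = s->LastByteSent - s->LastByteAcked;
    return advertised > in_flight ? advertised - in_flight : 0;
}

/* Sender: would a write of y bytes overflow the send buffer? */
static int must_block(const struct snd_state *s, uint32_t y)
{
    return (s->LastByteWritten - s->LastByteAcked) + y > s->MaxSendBuffer;
}
```

For example, with a 16384-byte receive buffer holding 4000 unread bytes, the receiver advertises 12384 bytes; a sender with 8000 bytes in flight may then send 4384 more.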
13Protection Against Wrap Around
- 32-bit SequenceNum
- Bandwidth: Time Until Wrap Around
- T1 (1.5 Mbps): 6.4 hours
- Ethernet (10 Mbps): 57 minutes
- T3 (45 Mbps): 13 minutes
- FDDI (100 Mbps): 6 minutes
- STS-3 (155 Mbps): 4 minutes
- STS-12 (622 Mbps): 55 seconds
- STS-24 (1.2 Gbps): 28 seconds
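The wrap-around times in this table follow from dividing the 2^32-byte sequence space by the link rate. A minimal sketch of that arithmetic, with the hypothetical function name `wrap_seconds`, assuming the link runs at full rate the whole time:

```c
/* Seconds until a 32-bit byte sequence space wraps at a given link rate. */
static double wrap_seconds(double bits_per_second)
{
    const double seq_space_bytes = 4294967296.0;   /* 2^32 bytes */
    return seq_space_bytes * 8.0 / bits_per_second;
}
```

For a T1 link, `wrap_seconds(1.5e6) / 3600.0` is about 6.4 hours; for STS-12, `wrap_seconds(622e6)` is about 55 seconds.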
14Keeping the Pipe Full
- 16-bit AdvertisedWindow
- Bandwidth: Delay x Bandwidth Product
- T1 (1.5 Mbps): 18KB
- Ethernet (10 Mbps): 122KB
- T3 (45 Mbps): 549KB
- FDDI (100 Mbps): 1.2MB
- STS-3 (155 Mbps): 1.8MB
- STS-12 (622 Mbps): 7.4MB
- STS-24 (1.2 Gbps): 14.8MB
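The products in this table can be checked against the 16-bit AdvertisedWindow, which allows at most 65535 unacknowledged bytes. The sketch below uses hypothetical function names. The slide does not state the round-trip time; 100 ms is an assumption here, chosen because it reproduces the listed values.

```c
/* Bytes needed in flight to keep a link busy for one round-trip time. */
static double pipe_bytes(double bits_per_second, double rtt_seconds)
{
    return bits_per_second * rtt_seconds / 8.0;
}

/* Nonzero if a 16-bit AdvertisedWindow (at most 65535 bytes) can fill the pipe. */
static int window_fills_pipe(double bits_per_second, double rtt_seconds)
{
    return pipe_bytes(bits_per_second, rtt_seconds) <= 65535.0;
}
```

With a 100 ms RTT, a T1 link needs about 18750 bytes in flight, which fits in the window; 10 Mbps Ethernet needs 125000 bytes, which does not.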
15Adaptive Retransmission (Original Algorithm)
- Measure SampleRTT for each segment/ACK pair
- Compute weighted average of RTT
- EstRTT = a x EstRTT + b x SampleRTT
- where a + b = 1
- a between 0.8 and 0.9
- b between 0.1 and 0.2
- Set timeout based on EstRTT
- TimeOut = 2 x EstRTT
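The weighted average above can be sketched as a one-line update. The function names are hypothetical, and b is taken as 1 - a so that the two weights always sum to 1.

```c
/* One step of the original estimator: EstRTT = a x EstRTT + b x SampleRTT,
 * with b = 1 - a. */
static double est_rtt_update(double est_rtt, double sample_rtt, double a)
{
    return a * est_rtt + (1.0 - a) * sample_rtt;
}

/* Original timeout rule: twice the estimated RTT. */
static double timeout_original(double est_rtt)
{
    return 2.0 * est_rtt;
}
```

With EstRTT = 100 ms, a = 0.875, and a new SampleRTT of 200 ms, the estimate moves to 112.5 ms and the timeout to 225 ms.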
16Karn/Partridge Algorithm
[Figure: two Sender/Receiver timelines, each showing an Original transmission, a Retransmission, an ACK, and the resulting SampleRTT]
- Do not sample RTT when retransmitting
- Double timeout after each retransmission
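The two rules above can be sketched as a small piece of per-segment state. The struct and function names are hypothetical, and the timer itself is left out.

```c
/* Hypothetical per-segment retransmission state. */
struct rexmit_state {
    double timeout;        /* current retransmission timeout, in ms */
    int    retransmitted;  /* nonzero once the segment has been resent */
};

/* Timer expired: the segment is resent and the timeout doubles. */
static void on_timeout(struct rexmit_state *s)
{
    s->retransmitted = 1;
    s->timeout *= 2.0;
}

/* ACK arrived: returns 1 if the measured RTT may be used as a SampleRTT. */
static int sample_is_valid(const struct rexmit_state *s)
{
    return !s->retransmitted;
}
```

Skipping the sample after a retransmission avoids guessing whether the ACK belongs to the original transmission or to the retransmission.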
17Jacobson/Karels Algorithm
- New calculations for average RTT
- Diff = SampleRTT - EstRTT
- EstRTT = EstRTT + (d x Diff)
- Dev = Dev + d(|Diff| - Dev)
- where d is a factor between 0 and 1
- Consider variance when setting timeout value
- TimeOut = m x EstRTT + f x Dev
- where m = 1 and f = 4
- Notes
- algorithm only as good as granularity of clock (500ms on Unix)
- accurate timeout mechanism important to congestion control (later)
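The calculations on this slide can be sketched as follows. The struct and function names are hypothetical; the deviation update uses the absolute value of Diff, and the timeout uses m = 1 and f = 4 as given on the slide.

```c
/* Hypothetical estimator state: running mean and mean deviation of the RTT. */
struct rtt_est {
    double est_rtt;
    double dev;
};

/* Fold one SampleRTT into the estimate; d is the gain, between 0 and 1. */
static void jk_update(struct rtt_est *e, double sample_rtt, double d)
{
    double diff = sample_rtt - e->est_rtt;
    double abs_diff = diff < 0 ? -diff : diff;
    e->est_rtt += d * diff;
    e->dev += d * (abs_diff - e->dev);
}

/* TimeOut = m x EstRTT + f x Dev, with m = 1 and f = 4. */
static double jk_timeout(const struct rtt_est *e)
{
    return 1.0 * e->est_rtt + 4.0 * e->dev;
}
```

Starting from EstRTT = 100 ms and Dev = 10 ms, a sample of 140 ms with d = 0.125 gives EstRTT = 105 ms, Dev = 13.75 ms, and a timeout of 160 ms.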
18TCP Extensions
- Implemented as header options
- Store timestamp in outgoing segments
- Extend sequence space with 32-bit timestamp (PAWS)
- Shift (scale) advertised window
19Remote Procedure Call
- Outline
- Protocol Stack
- Presentation Formatting
20RPC Timeline
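[Figure: timeline between Client and Server with Request and Reply messages; the client is Blocked while the server is Computing, and the server is Blocked otherwise]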
21RPC Components
- Protocol Stack
- BLAST: fragments and reassembles large messages
- CHAN: synchronizes request and reply messages
- SELECT: dispatches request to the correct process
- Stubs
22Bulk Transfer (BLAST)
- Unlike AAL and IP, tries to recover from lost fragments
- Strategy
- selective retransmission
- aka partial acknowledgements
23BLAST Details
- Sender
- after sending all fragments, set timer DONE
- if receive SRR, send missing fragments and reset DONE
- if timer DONE expires, free fragments
24BLAST Details (cont)
- Receiver
- when first fragment arrives, set timer LAST_FRAG
- when all fragments present, reassemble and pass up
- four exceptional conditions
- if last fragment arrives but message not complete
- send SRR and set timer RETRY
- if timer LAST_FRAG expires
- send SRR and set timer RETRY
- if timer RETRY expires for first or second time
- send SRR and set timer RETRY
- if timer RETRY expires a third time
- give up and free partial message
25BLAST Header Format
- MID: must protect against wrap around
- TYPE = DATA or SRR
- NumFrags: indicates number of fragments
- FragMask: distinguishes among fragments
- if Type = DATA, identifies this fragment
- if Type = SRR, identifies missing fragments
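The FragMask bookkeeping can be sketched with bit operations. This sketch assumes a 32-bit mask with bit i standing for fragment i, and it marks missing fragments with 1 bits; both the width and the bit polarity are assumptions made for illustration, and the wire encoding of an SRR may differ. The function names are hypothetical.

```c
#include <stdint.h>

/* Mask with one bit set for each of the num_frags fragments (1..32). */
static uint32_t all_frags_mask(unsigned num_frags)
{
    return num_frags >= 32 ? 0xFFFFFFFFu : ((1u << num_frags) - 1u);
}

/* Receiver: record that fragment i (0-based) has arrived. */
static uint32_t mark_received(uint32_t received, unsigned i)
{
    return received | (1u << i);
}

/* Receiver: set of fragments still missing, as reported in an SRR. */
static uint32_t missing_frags(uint32_t received, unsigned num_frags)
{
    return all_frags_mask(num_frags) & ~received;
}
```

For a five-fragment message where fragments 0, 1, and 3 have arrived, `missing_frags` returns 0x14, that is, fragments 2 and 4.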
26Request/Reply (CHAN)
- Guarantees message delivery
- Synchronizes client with server
- Supports at-most-once semantics
- Simple case: implicit ACKs
27CHAN Details
- Lost message (request, reply, or ACK)
- set RETRANSMIT timer
- use message id (MID) field to distinguish
- Slow (long running) server
- client periodically sends "are you alive" probe, or
- server periodically sends "I'm alive" notice
- Want to support multiple outstanding calls
- use channel id (CID) field to distinguish
- Machines crash and reboot
- use boot id (BID) field to distinguish
28CHAN Header Format
    typedef struct {
        u_short Type;      /* REQ, REP, ACK, PROBE */
        u_short CID;       /* unique channel id */
        int     MID;       /* unique message id */
        int     BID;       /* unique boot id */
        int     Length;    /* length of message */
        int     ProtNum;   /* high-level protocol */
    } ChanHdr;

    typedef struct {
        u_char    type;       /* CLIENT or SERVER */
        u_char    status;     /* BUSY or IDLE */
        int       retries;    /* number of retries */
        int       timeout;    /* timeout value */
        XkReturn  ret_val;    /* return value */
        Msg       request;    /* request message */
        Msg       *reply;     /* reply message */
        Semaphore reply_sem;  /* client semaphore */
        int       mid;        /* message id */
        /* further fields used by later slides (e.g. event, hdr_template)
           are not shown on this slide */
    } ChanState;
29Synchronous vs Asynchronous Protocols
- Asynchronous interface
- xPush(Sessn s, Msg *msg)
- xPop(Sessn s, Msg *msg, void *hdr)
- xDemux(Protl hlp, Sessn s, Msg *msg)
- Synchronous interface
- xCall(Sessn s, Msg *req, Msg *rep)
- xCallPop(Sessn s, Msg *req, Msg *rep, void *hdr)
- xCallDemux(Protl hlp, Sessn s, Msg *req, Msg *rep)
- CHAN is a hybrid protocol
- synchronous from above: xCall
- asynchronous from below: xPop/xDemux
30
    static XkReturn
    chanCall(Sessn self, Msg *msg, Msg *rmsg)
    {
        ChanState *state = (ChanState *)self->state;
        ChanHdr   *hdr;
        char      *buf;

        /* ensure only one transaction per channel */
        if (state->status != IDLE)
            return XK_FAILURE;
        state->status = BUSY;

        /* save copy of req msg and ptr to rep msg */
        msgConstructCopy(&state->request, msg);
        state->reply = rmsg;

        /* fill out header fields */
        hdr = &state->hdr_template;
        hdr->Length = msgLen(msg);
        if (state->mid == MAX_MID)
            state->mid = 0;
        hdr->MID = ++state->mid;
31
        /* attach header to msg and send it */
        buf = msgPush(msg, HDR_LEN);
        chan_hdr_store(hdr, buf, HDR_LEN);
        xPush(xGetDown(self, 0), msg);

        /* schedule first timeout event */
        state->retries = 1;
        state->event = evSchedule(retransmit, self, state->timeout);

        /* wait for the reply msg */
        semWait(&state->reply_sem);

        /* clean up state and return */
        flush_msg(&state->request);
        state->status = IDLE;
        return state->ret_val;
    }
32
    static void
    retransmit(Event ev, int *arg)
    {
        Sessn     s = (Sessn)arg;
        ChanState *state = (ChanState *)s->state;
        Msg       tmp;

        /* see if event was cancelled */
        if (evIsCancelled(ev))
            return;

        /* unblock client if we've retried 4 times */
        if (++state->retries > 4) {
            state->ret_val = XK_FAILURE;
            semSignal(&state->reply_sem);
            return;
        }

        /* retransmit request message */
        msgConstructCopy(&tmp, &state->request);
        xPush(xGetDown(s, 0), &tmp);

        /* rescheduling of the timeout event is not shown on the slide */
    }
33
    static XkReturn
    chanPop(Sessn self, Sessn lls, Msg *msg, void *inHdr)
    {
        /* see if this is a CLIENT or SERVER session */
        if (self->state->type == SERVER)
            return(chanServerPop(self, lls, msg, inHdr));
        else
            return(chanClientPop(self, lls, msg, inHdr));
    }
34
    static XkReturn
    chanClientPop(Sessn self, Sessn lls, Msg *msg, void *inHdr)
    {
        ChanState *state = (ChanState *)self->state;
        ChanHdr   *hdr = (ChanHdr *)inHdr;

        /* verify correctness of msg header */
        if (!clnt_msg_ok(state, hdr))
            return XK_FAILURE;

        /* cancel retransmit timeout event */
        evCancel(state->event);

        /* if ACK, then schedule PROBE and exit */
        if (hdr->Type == ACK) {
            state->event = evSchedule(probe, self, PROBE);
            return XK_SUCCESS;
        }

        /* handling of reply messages is not shown on the slide */
    }
35Dispatcher (SELECT)
- Dispatch to appropriate procedure
- Synchronous counterpart to UDP
- Address Space for Procedures
- flat: unique id for each possible procedure
- hierarchical: program + procedure number
36Example Code
- Client side
    static XkReturn
    selectCall(Sessn self, Msg *req, Msg *rep)
    {
        SelectState *state = (SelectState *)self->state;
        char        *buf;

        /* prepend the SELECT header, then make the synchronous call */
        buf = msgPush(req, HLEN);
        select_hdr_store(state->hdr, buf, HLEN);
        return xCall(xGetDown(self, 0), req, rep);
    }
- Server side
    static XkReturn
    selectCallPop(Sessn s, Sessn lls, Msg *req, Msg *rep, void *inHdr)
    {
        /* hand the request to the procedure above */
        return xCallDemux(xGetUp(s), s, req, rep);
    }
37Simple RPC Stack
38VCHAN: A Virtual Protocol
    static XkReturn
    vchanCall(Sessn s, Msg *req, Msg *rep)
    {
        Sessn      chan;
        XkReturn   result;
        VchanState *state = (VchanState *)s->state;

        /* wait for an idle channel */
        semWait(&state->available);
        chan = state->stack[--state->tos];

        /* use the channel */
        result = xCall(chan, req, rep);

        /* free the channel */
        state->stack[state->tos++] = chan;
        semSignal(&state->available);
        return result;
    }
39SunRPC
- IP implements BLAST-equivalent
- except no selective retransmit
- SunRPC implements CHAN-equivalent
- except not at-most-once
- UDP + SunRPC implement SELECT-equivalent
- UDP dispatches to program (ports bound to programs)
- SunRPC dispatches to procedure within program
40SunRPC Header Format
- XID (transaction id) is similar to CHAN's MID
- Server does not remember last XID it serviced
- Problem if client retransmits request while reply is in transit
41Presentation Formatting
- Marshalling (encoding) application data into messages
- Unmarshalling (decoding) messages into application data
- Data types we consider
- integers
- floats
- strings
- arrays
- structs
- Types of data we do not consider
- images
- video
- multimedia documents
42Difficulties
- Representation of base types
- floating point: IEEE 754 versus non-standard
- integer: big-endian versus little-endian (e.g., 34,677,374)
- Compiler layout of structures
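The byte-order difference can be made concrete with the example value 34,677,374, which is 0x0211227E in hexadecimal. The following sketch writes a 32-bit integer in each byte order; the function names are hypothetical.

```c
#include <stdint.h>

/* Store v most significant byte first (big-endian). */
static void put_be32(uint8_t out[4], uint32_t v)
{
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}

/* Store v least significant byte first (little-endian). */
static void put_le32(uint8_t out[4], uint32_t v)
{
    out[0] = (uint8_t)v;
    out[1] = (uint8_t)(v >> 8);
    out[2] = (uint8_t)(v >> 16);
    out[3] = (uint8_t)(v >> 24);
}
```

For 34,677,374 the big-endian bytes are 2, 17, 34, 126, and the little-endian bytes are the same four values in reverse order. Because the shifts operate on the value rather than on memory, the functions behave the same on any host.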
43Taxonomy
- Data types
- base types (e.g., ints, floats): must convert
- flat types (e.g., structures, arrays): must pack
- complex types (e.g., pointers): must linearize
- Conversion Strategy
- canonical intermediate form
- receiver-makes-right (an N x N solution)
[Figure: application data structure passed through a marshaller]
44Taxonomy (cont)
- Tagged versus untagged data
- Stubs
- compiled
- interpreted
45eXternal Data Representation (XDR)
- Defined by Sun for use with SunRPC
- C type system (without function pointers)
- Canonical intermediate form
- Untagged (except array length)
- Compiled stubs
46
    #define MAXNAME 256
    #define MAXLIST 100

    struct item {
        int  count;
        char name[MAXNAME];
        int  list[MAXLIST];
    };

    bool_t
    xdr_item(XDR *xdrs, struct item *ptr)
    {
        /* encode or decode each field in turn; fail if any step fails */
        return(xdr_int(xdrs, &ptr->count) &&
               xdr_string(xdrs, &ptr->name, MAXNAME) &&
               xdr_array(xdrs, &ptr->list, &ptr->count,
                         MAXLIST, sizeof(int), xdr_int));
    }
47Abstract Syntax Notation One (ASN.1)
- An ISO standard
- Essentially the C type system
- Canonical intermediate form
- Tagged
- Compiled or interpreted stubs
- BER: Basic Encoding Rules
- (tag, length, value)
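The (tag, length, value) triple can be illustrated by encoding a single integer. This is a minimal sketch with the hypothetical name `ber_put_int32`, not a general BER encoder. It uses the universal INTEGER tag 0x02, a one-byte length, and the shortest two's complement content.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode a signed 32-bit value as a BER INTEGER into out (at least 6 bytes).
 * Returns the total number of bytes written: tag + length + content. */
static size_t ber_put_int32(uint8_t *out, int32_t value)
{
    uint32_t u = (uint32_t)value;
    size_t n = 4;
    size_t i;

    /* drop leading bytes that only repeat the sign bit */
    while (n > 1) {
        uint8_t top  = (uint8_t)(u >> (8 * (n - 1)));
        uint8_t next = (uint8_t)(u >> (8 * (n - 2)));
        if ((top == 0x00 && !(next & 0x80)) || (top == 0xFF && (next & 0x80)))
            n--;
        else
            break;
    }

    out[0] = 0x02;                 /* tag: INTEGER */
    out[1] = (uint8_t)n;           /* length of the content in bytes */
    for (i = 0; i < n; i++)        /* value, most significant byte first */
        out[2 + i] = (uint8_t)(u >> (8 * (n - 1 - i)));
    return 2 + n;
}
```

The value 5 becomes the three bytes 02 01 05, while 128 needs a leading zero byte (02 02 00 80) so that it is not read as a negative number.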
48Network Data Representation (NDR)
- Defined by DCE
- Essentially the C type system
- Receiver-makes-right (architecture tag)
- Individual data items untagged
- Compiled stubs from IDL
- 4-byte architecture tag
- IntegerRep
- 0 = big-endian
- 1 = little-endian
- CharRep
- 0 = ASCII
- 1 = EBCDIC
- FloatRep
- 0 = IEEE 754
- 1 = VAX
- 2 = Cray
- 3 = IBM
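Receiver-makes-right can be sketched for integers: the sender transmits in its own byte order, and the receiver consults the IntegerRep value from the architecture tag to decode. The enum and function names below are hypothetical, and parsing the 4-byte tag itself is left out; only the values 0 and 1 come from the slide.

```c
#include <stdint.h>

/* IntegerRep values as listed on the slide. */
enum { NDR_INT_BIG = 0, NDR_INT_LITTLE = 1 };

/* Decode a 32-bit integer that the sender wrote in its own byte order. */
static uint32_t ndr_get_u32(const uint8_t in[4], int sender_int_rep)
{
    if (sender_int_rep == NDR_INT_LITTLE)
        return (uint32_t)in[0] | ((uint32_t)in[1] << 8) |
               ((uint32_t)in[2] << 16) | ((uint32_t)in[3] << 24);
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8) | (uint32_t)in[3];
}
```

Either byte layout of 34,677,374 decodes to the same value as long as the tag matches the layout, so the sender never converts.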