CS556: Distributed Systems

1
CS-556 Distributed Systems
Inter-process Communication (III)
  • Manolis Marazakis
  • maraz_at_csd.uoc.gr

2
Berkeley Sockets (I)
  • Socket primitives for TCP/IP.

3
Berkeley Sockets (II)
  • Connection-oriented communication pattern using
    sockets.

4
Connected vs Connectionless (I)
  • IP: best-effort, unreliable, connectionless
  • Remembers nothing about a packet after it has
    sent it
  • Checksum computed on header only
  • No assumptions about the underlying physical
    medium
  • Serial link, Ethernet, Token ring, X.25, ATM,
    wireless CDPD,
  • UDP
  • (optional) checksum
  • notion of port

5
Connected vs Connectionless (II)
  • TCP: reliable, connection-oriented service
  • Segments are sent in IP datagrams
  • Checksum of data in each segment
  • Sequence number of the 1st byte in the segment
  • Acknowledge-and-retransmit mechanism
  • Each side maintains a receive window
  • Range of sequence numbers that this side is prepared to
    receive
  • Any arriving data with a sequence number outside the
    receive window is discarded
  • Queuing of data arriving out-of-order
  • Window slides to the right, if the next expected
    sequence number has arrived
  • and an ACK is sent back with the sequence number
    expected next
  • Send window
  • Bytes sent but not yet acknowledged
  • RTO timer (retransmission timeout)
  • Timeout does not always mean that the data was
    lost !!
  • Bytes that can be sent but have not yet been sent
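The receive-window rules above can be sketched in a few lines. This is a toy model, not TCP itself: the class name `ReceiveWindow`, the byte-granular bookkeeping, and the default window size of 8 are assumptions made for illustration.

```python
class ReceiveWindow:
    """Toy receive window that tracks individual byte sequence numbers."""

    def __init__(self, next_expected=0, size=8):
        self.next_expected = next_expected  # left edge of the window
        self.size = size                    # window size in bytes (illustrative)
        self.out_of_order = {}              # seq -> byte, held until the gap fills
        self.delivered = bytearray()        # in-order bytes ready for the application

    def receive(self, seq, data):
        """Accept a segment starting at seq; return the ACK (next expected seq)."""
        for offset, byte in enumerate(data):
            s = seq + offset
            # Data outside [next_expected, next_expected + size) is discarded.
            if self.next_expected <= s < self.next_expected + self.size:
                self.out_of_order[s] = byte
        # Slide the window to the right while the next expected byte has arrived.
        while self.next_expected in self.out_of_order:
            self.delivered.append(self.out_of_order.pop(self.next_expected))
            self.next_expected += 1
        return self.next_expected
```

A segment that arrives ahead of a gap is queued and the ACK does not advance; once the missing bytes arrive, the window slides past both segments at once.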

6
UDP Failure Model
  • Omission failures
  • timeouts
  • duplicate messages
  • lost messages
  • Need to maintain history
  • Last reply sent to each client
  • provided that a client can make only one request
    at a time
  • Server interprets each request as the ACK for the
    previous reply
  • periodic purge of history
  • No ACK for the last response received before
    client terminates
  • Fixed max. buffer size (8 KB)
  • No message order guarantee
  • Process crash failures
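The "last reply per client" history can be sketched as follows. The class `ReplyHistory`, the `request_id` field, and the explicit `purge` method are hypothetical names chosen for this example; it assumes, as the slide does, that a client has at most one request outstanding.

```python
class ReplyHistory:
    """Keeps the last reply per client so duplicate requests are not re-executed."""

    def __init__(self, handler):
        self.handler = handler  # function that actually executes a request
        self.last = {}          # client_id -> (request_id, reply)

    def handle(self, client_id, request_id, payload):
        cached = self.last.get(client_id)
        if cached is not None and cached[0] == request_id:
            # Duplicate (e.g. a client retransmission): resend the stored reply.
            return cached[1]
        reply = self.handler(payload)
        # A new request implicitly ACKs the previous reply, so overwrite it.
        self.last[client_id] = (request_id, reply)
        return reply

    def purge(self, client_id):
        """Drop history for a client that will never ACK its last reply."""
        self.last.pop(client_id, None)
```

In a real server, `purge` would be driven by a timer, since a terminating client never acknowledges its final reply.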

7
TCP Failure Model
  • Reliable message delivery
  • checksums, sequence numbers, timeouts
  • no need for applications to deal with
  • retransmissions
  • duplicates
  • reordering
  • no need for histories
  • Flow control mechanism
  • large transfers without overwhelming the receiver
  • BUT not reliable sessions
  • Connections may be severed or severely congested
  • Processes cannot distinguish network from process
    failure
  • Processes cannot tell if their recent messages
    were received

8
TCP is a stream protocol
  • No inherent notion of message boundary
  • The amount of data in a packet is not directly
    related to the amount of data delivered to TCP in
    the send() call
  • No reliable way for the receiver to determine how the
    data was packetized
  • Several packets may have arrived between recv()
    calls
  • The amount of data returned in any given read()
    is unpredictable
  • Fixed-length messages
  • Variable-length messages
  • End-of-record marker
  • Fixed-length header (including record length) +
    variable-length data
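A minimal sketch of the last option, a fixed-length header carrying the record length followed by variable-length data. The 4-byte big-endian header and the helper names `send_record`, `recv_record` and `recv_exact` are assumptions for illustration, not part of any standard API.

```python
import socket
import struct

HEADER = struct.Struct("!I")  # 4-byte record length in network byte order


def recv_exact(sock, n):
    """Loop on recv() until exactly n bytes have arrived."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)  # may return fewer bytes than requested
        if not chunk:
            raise ConnectionError("peer closed before a full record arrived")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def send_record(sock, payload):
    """Prefix the payload with its length so the receiver can find the boundary."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_record(sock):
    """Read one complete record, however TCP happened to packetize it."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)
```

The loop in `recv_exact` is the important part: a single `recv()` may return part of a record or, if the caller asks for more than one record's worth, bytes from several records.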

9
TCP Failure Modes (I)
  • TCP guarantees delivery of the data it sends
  • True or False ?

False. How can we handle outages & crashes?
Guarantee to whom?
10
TCP Failure Modes (II)
  • IP is a best-effort, unreliable protocol
  • so the TCP layer is the first place in the data
    path where it makes sense to even talk about
    guarantees
  • The sender's TCP layer can make no guarantee
    about segments that arrive at the receiver's TCP
    layer
  • An arriving segment may be corrupted, or it may
    contain duplicate data, or it may be out of order
  • The receiver's TCP layer guarantees to the
    sender's TCP layer that any segment that it ACKs,
    and all data that came before it, have been
    correctly received
  • This does not mean that the data has been
    delivered to the application, or that it will
    ever be delivered !!
  • For example, the receiving host may crash after
    the ACK but before delivery

11
TCP Failure Modes (III)
  • It also makes sense to talk about guarantees at
    application B (receiver)
  • There can be no guarantee that all data sent by
    application A will arrive
  • However, all data that does arrive will be in
    order and uncorrupted

Avoid the attitude that TCP will take care of
everything.
TCP is an end-to-end protocol, providing a
reliable transport mechanism between peers.
The peers are the TCP layers of the sender &
the receiver !!
12
TCP Failure Modes (IV)
  • Explicit acknowledgements
  • What does the client do if the server does not
    ACK receipt ??
  • It may not be safe to simply resend a request

When a problem occurs at an endpoint, there is
generally no alternative path -> The problem
persists until it is repaired.
An intermediate router may send the originator an
ICMP message indicating that the destination
network or the host is unreachable,
OR the sender eventually times out & resends the
segments not ACKed. This continues until the
sender gives up & drops the connection (9
minutes). Pending read -> ETIMEDOUT. Otherwise, the
next write fails -> SIGPIPE or EPIPE.
13
TCP Failure Modes (V)
  • Peer crash
  • Indistinguishable from the case of the peer
    calling close() and then exit()
  • The peer's TCP layer issues a FIN segment
  • This does not necessarily imply that the peer has
    no more data to send, or even that it is not
    willing to receive more data
  • Reception of the FIN may come at different
    execution states of the application
  • If client is blocked, TCP has no way of notifying
    it
  • The next transmission generates a RST segment ->
    ECONNRESET
  • If the RST is ignored & more data is transmitted
    -> SIGPIPE
  • This may occur if the client performs >=2
    consecutive write() calls without an intervening
    read() -> Notification takes place only after the
    2nd write()
  • If client has a pending read(), it gets an
    immediate error indication (e.g. read() returns
    EOF)
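How these conditions surface to an application can be sketched in Python, where CPython ignores SIGPIPE by default and a failed write is reported as an exception instead. The helper names `read_or_eof` and `try_send` are hypothetical, and the returned strings are only labels for the corresponding errno values.

```python
import socket


def read_or_eof(sock, n=4096):
    """Return received bytes, or None if the peer closed (read returned EOF)."""
    data = sock.recv(n)
    return data if data else None


def try_send(sock, payload):
    """Send payload; return None on success, else a label for the failure."""
    try:
        sock.sendall(payload)
        return None
    except BrokenPipeError:        # errno EPIPE: writing after the peer has gone
        return "EPIPE"
    except ConnectionResetError:   # errno ECONNRESET: a RST was received
        return "ECONNRESET"
    except TimeoutError:           # errno ETIMEDOUT: the sender gave up
        return "ETIMEDOUT"
```

An application that only writes never sees the EOF, which is why the failure is often noticed a write or two late; checking `read_or_eof` between writes narrows that window.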

14
TCP Failure Modes (VI)
  • Peer's host crash
  • The peer's TCP cannot issue the FIN segment
  • Until recovery, this case cannot be distinguished
    from a network outage
  • The peer's TCP no longer responds, but the sender
    keeps retransmitting
  • Until either the host recovers, or the sender
    gives up the connection -> ETIMEDOUT
  • If the host reboots before the sender gives up, a
    retransmitted segment may arrive at the TCP layer
    without it having knowledge of the connection ->
    RST
  • If sender has a read() pending -> ECONNRESET
  • Else, the next write() results in a SIGPIPE signal

15
Behavior of Peers
  • Checking for client termination
  • Heartbeats, timeouts for read operations,
    SO_KEEPALIVE option,
  • Checking for valid input
  • Buffer overflow errors

16
We rely on DNS
17
The Message-Passing Interface
  • Some of the most intuitive primitives of MPI.

18
Group Communication
  • Multicasting: 1-to-many communication pattern
  • Applications
  • replicated services (better fault tolerance)
  • discovery of services
  • replicated data (better performance)
  • propagation of event notifications
  • Failure model
  • depends on implementation
  • IP multicast (UDP datagrams): omission failures
  • class-D Internet addresses: 1110 bit prefix
  • TTL
  • reliable multicast
  • ordered multicast
  • FIFO
  • Causal
  • Total

19
Conventional Procedure Call
  • Parameter passing in a local procedure call: the
    stack before the call to read
  • The stack while the called procedure is active

20
Software layers
RPC is more than a (transport) protocol: a
structuring mechanism for distributed systems
21
Steps of a Remote Procedure Call
  • Client procedure calls client stub in normal way
  • Client stub builds message, calls local OS
  • Client's OS sends message to remote OS
  • Remote OS gives message to server stub
  • Server stub unpacks parameters, calls server
  • Server does work, returns result to the stub
  • Server stub packs it in message, calls local OS
  • Server's OS sends message to client's OS
  • Client's OS gives message to client stub
  • Stub unpacks result, returns to client
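The ten steps can be traced in a single-process sketch in which an ordinary function call stands in for the two operating systems and the network. Everything here is illustrative: the wire format (a 4-byte procedure number plus an 8-byte integer), the procedure numbering, and the names `client_stub_square`, `server_stub` and `transport` are assumptions, not Sun RPC or DCE RPC.

```python
import struct


def square(x):
    """The server procedure (steps 5-6: does the work, returns the result)."""
    return x * x


PROCEDURES = {1: square}  # procedure number -> implementation (illustrative)


def server_stub(request):
    """Steps 5-7: unpack parameters, call the server, pack the result."""
    proc, arg = struct.unpack("!Iq", request)
    result = PROCEDURES[proc](arg)
    return struct.pack("!q", result)


def transport(request):
    """Steps 3-4 and 8-9: stands in for both OS kernels and the network."""
    return server_stub(request)


def client_stub_square(x):
    """Steps 1-2 and 10: build the message, send it, unpack the reply."""
    request = struct.pack("!Iq", 1, x)
    reply = transport(request)
    (result,) = struct.unpack("!q", reply)
    return result
```

The caller simply writes `client_stub_square(11)` as if it were a local call; only the stubs know that a message was built and sent.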

22
Client and Server Stubs
  • Principle of RPC between a client & server
    program.

23
Example (Sun RPC - ONC)
  • long square(long) example
  • client ren.eecis.udel.edu 11
  • result: 121
  • Need RPC specification file (square.x): defines
    procedure name, arguments & results
  • Run rpcgen square.x: generates square.h,
    square_clnt.c, square_xdr.c, square_svc.c
  • square_clnt.c & square_svc.c: Stub routines for
    client & server
  • square_xdr.c: XDR (External Data Representation)
    code - takes care of data type conversions

24
RPC Specification File (square.x)
struct square_in { long arg1; };
struct square_out { long res1; };
program SQUARE_PROG {
    version SQUARE_VERS {
        square_out SQUAREPROC(square_in) = 1;   // procedure
    } = 1;                                      // version
} = 0x321230000;                                // program
IDL: Interface Definition Language
25
Parameter Specification & Stub Generation
procedure
Corresponding message
26
Writing a Client & a Server
  • The steps in writing a client & a server in DCE
    RPC.

27
Binding (SUN RPC)
  • Port Mapper (rpcbind) listens at UDP port 111
  • Server registers program ID & version
  • rpcinfo -p -> displays all registered RPC servers
  • When client issues clnt_create, the port mapper
    is contacted
  • program-to-port number mapping
  • arguments: (program ID, version, protocol)
  • response: server's port number
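The program-to-port mapping can be modelled as a small lookup table. This is a toy stand-in for the port mapper, not its wire protocol: the class `ToyPortMapper`, its method names, and the program number and port used in the usage note are invented for illustration.

```python
class ToyPortMapper:
    """In-memory model of the (program, version, protocol) -> port table."""

    def __init__(self):
        self.table = {}

    def register(self, program, version, protocol, port):
        """What a server does at start-up."""
        self.table[(program, version, protocol)] = port

    def lookup(self, program, version, protocol):
        """What the client library asks for; 0 means 'not registered'."""
        return self.table.get((program, version, protocol), 0)

    def dump(self):
        """Roughly the information that rpcinfo -p displays."""
        return sorted(self.table.items())
```

For example, after `pm.register(0x20000001, 1, "udp", 40001)` a client lookup for the same triple yields 40001, while a lookup for version 2 yields 0.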

28
Binding (DCE)
29
Passing Value Parameters (I)
30
Passing Value Parameters (II)
  • a. Original message on Pentium (little-endian)
  • b. The message after receipt on SPARC
    (big-endian)
  • c. The message after being inverted.
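The byte-order problem in (a) and (b) can be reproduced with the `struct` module. The helper names are hypothetical; the value 5 is just a convenient small integer.

```python
import struct


def to_little_endian(value):
    """Bytes of a 32-bit integer as a little-endian machine stores them."""
    return struct.pack("<i", value)


def to_big_endian(value):
    """Bytes of a 32-bit integer as a big-endian machine stores them."""
    return struct.pack(">i", value)


def misread_as_big_endian(raw):
    """What a big-endian receiver sees if the bytes are copied unchanged."""
    return struct.unpack(">i", raw)[0]
```

Sending the little-endian bytes of 5 unchanged makes the big-endian side read 5 * 2**24 = 83886080. Blindly inverting every 4-byte word fixes integers but scrambles character strings, which is why the stubs need type information.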

31
Passing Value Parameters (III)
  • How to pass pointers ?
  • Meaningful only within a specific address space !
  • Arrays (of known length) & structures
  • Copy/restore semantics (between stubs)
  • IN/OUT/INOUT markers
  • Optimization may eliminate one copy operation
  • Pointer to an arbitrary data structure ?
  • No general solution
  • Work-around
  • Pass back the pointer to its source

32
External Data Representation (I)
  • Data structures
  • flattened on transmission
  • rebuilt upon reception
  • Primitive data types
  • byte order (big-endian: MSB comes first)
  • ASCII vs UNICODE (2 bytes per character)
  • marshalling/unmarshalling
  • to/from agreed external format
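Flattening a structure into an agreed external format can be sketched in the style of XDR: integers as 4 big-endian bytes, strings as a 4-byte length followed by the bytes, zero-padded to a multiple of 4. The record layout (a name and a year) and all function names are assumptions for this example.

```python
import struct


def pack_int(value):
    """32-bit signed integer, most significant byte first."""
    return struct.pack(">i", value)


def pack_string(text):
    """Length word, then the bytes, zero-padded to a 4-byte boundary."""
    raw = text.encode("ascii")
    padding = (4 - len(raw) % 4) % 4
    return struct.pack(">I", len(raw)) + raw + b"\x00" * padding


def marshal_person(name, year):
    """Flatten a (name, year) record for transmission."""
    return pack_string(name) + pack_int(year)


def unmarshal_person(buf):
    """Rebuild the record on reception."""
    (length,) = struct.unpack_from(">I", buf, 0)
    name = buf[4:4 + length].decode("ascii")
    offset = 4 + (length + 3) // 4 * 4      # skip the padding
    (year,) = struct.unpack_from(">i", buf, offset)
    return name, year
```

Both sides must agree on the field order in advance, because this format carries no type tags in the byte stream.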

33
External Data Representation (II)
  • XDR (RFC 1832), CDR (CORBA), Java
  • data -> byte stream
  • object references
  • HTTP/MIME
  • data -> ASCII text

34
CORBA CDR example
35
Properties of TCP
  • Connected vs Connectionless Protocols
  • TCP is a stream protocol
  • Performance of TCP
  • Avoid re-inventing TCP !!
  • TCP failure modes
  • Behaviour of peers
  • LAN vs WAN testing
  • Tools & Resources

36
Basic socket calls
SERVER
CLIENT
37
Performance of TCP (I)
  • 4.4BSD Implementation
  • UDP: 800 LOC
  • TCP: 4,500 LOC
  • CPU processing: checksums, data copying
  • TCP ACKs
  • Receiver can piggyback the ACK
  • Usually every second segment is ACKed
  • May even delay ACKs (up to 0.5 sec)
  • Connection setup: 3 segments
  • 1 ½ RTT: SYN, SYN+ACK, ACK
  • Connection tear-down: 4 segments
  • FIN, ACK, FIN (server-to-client), ACK
  • Except the last segment, these can be combined
    with data-bearing segments

38
Performance of TCP (II)
  • Results from a benchmark involving transmission
    of 5,000 data blocks
  • UDP datagram size = TCP write size = 1,440 bytes
  • Ethernet frame: 1,500 bytes
  • IP header: 20 bytes, TCP header: 20 bytes
  • TCP options: 12 bytes
  • Average over 50 runs
  • Client produces data blocks, transmits them, and
    then exits
  • Server may run on
  • localhost (127.0.0.1)
  • Same host as the client, but given as an address
  • Other host

39
Performance of TCP (III)
Localhost (loop-back): MTU = 16,384
Client (network I/f): MTU = 1,500
40
Performance of TCP (IV)
Results for write size = 300 bytes
41
Avoid re-inventing TCP !!
  • Retransmissions ?
  • RTO
  • Must be adjustable
  • Exponential back-off
  • Flow control
  • Sliding window
  • Congestion control
  • Matching replies to requests ?
  • Sequence number for each request
  • Efficiency of the implementation ?
  • TCP code base is highly optimized
  • and runs in kernel-space
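To give a sense of what "adjustable RTO with exponential back-off" means for anyone tempted to build it over UDP, here is a minimal schedule generator. The function name and the example parameters are illustrative, not values taken from any TCP implementation.

```python
def rto_schedule(initial_rto, max_rto, retries):
    """Timeouts (in seconds) to wait before each successive retransmission."""
    schedule = []
    rto = initial_rto
    for _ in range(retries):
        schedule.append(rto)
        rto = min(rto * 2, max_rto)  # double after every timeout, up to a cap
    return schedule
```

With `rto_schedule(1.0, 64.0, 8)` the waits are 1, 2, 4, 8, 16, 32, 64, 64 seconds. This is only the timer; estimating the initial RTO from measured round-trip times, flow control and congestion control would all still be missing.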

42
LAN vs WAN testing
  • Performance on the WAN may not be satisfactory,
    due to the extra latency
  • may have to reconsider the design
  • Incorrect code is more likely to be triggered on
    the WAN
  • assumptions on volume/rate of arriving data

43
HTTP
  • Methods
  • GET, HEAD, POST
  • PUT, DELETE, TRACE, OPTIONS
  • Resource: MIME-encoded data
  • Content negotiation
  • Authentication

44
Tools (I)
  • ping
  • IP header ICMP echo request/reply
  • tcpdump
  • Network analyzer / sniffer
  • traceroute
  • Determine the network path by forcing each
    intermediate router to send an ICMP error message
    to the originator
  • Send a UDP datagram with TTL=1 - so that the 1st
    router in the path will discard it !
  • Send a 2nd UDP datagram with TTL=2 so that the
    2nd router in the path will discard it !
  • At the last hop, TTL=1: an attempt is made to
    deliver the datagram (generating the ICMP error
    message "port unreachable")
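The TTL trick can be illustrated with a pure simulation; a real traceroute needs raw sockets and privileges, so nothing is sent on the network here. The function name, the `path` list, and the reply labels are assumptions for this sketch.

```python
def simulate_traceroute(path, max_ttl=30):
    """path lists the routers in order, ending with the destination host.

    Returns one (ttl, node, icmp_error) tuple per probe.
    """
    replies = []
    for ttl in range(1, min(max_ttl, len(path)) + 1):
        node = path[ttl - 1]  # where the TTL runs out for this probe
        if ttl == len(path):
            # The probe reaches the destination, where no process listens
            # on the chosen UDP port.
            replies.append((ttl, node, "port unreachable"))
        else:
            # An intermediate router discards the datagram and reports back.
            replies.append((ttl, node, "time exceeded"))
    return replies
```

Each probe reveals one more hop, and the "port unreachable" reply is how the originator knows it has reached the end of the path.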

45
Tools (II)
  • ttcp
  • Benchmarking tool, with many parameters
  • UDP or TCP transfers, buffers, size of
    read/writes
  • lsof
  • Determine which process has a file descriptor
    open (file or socket)
  • lsof -i TCP:6000
  • lsof -i @remotehost.xdomain.net
  • netstat
  • Active sockets: netstat -af inet
  • Interfaces: netstat -i
  • Routing table: netstat -rn
  • Protocol statistics: netstat -sp tcp
  • System call tracers: strace, truss, ktrace

46
Resources
  • Books
  • Richard Stevens
  • TCP/IP illustrated series
  • Protocols, Implementation, T/TCP/HTTP/NNTP/Domain
    Sockets
  • UNIX Network Programming series
  • Networking APIs: Sockets, XTI
  • Interprocess Communication
  • J.C. Snader: Effective TCP/IP Programming
  • RFCs: http://www.rfc-editor.org