Fast Communication - PowerPoint PPT Presentation

1
Fast Communication
  • Firefly RPC
  • Lightweight RPC
  • CS 614
  • Tuesday March 13, 2001
  • Jeff Hoy

2
Why Remote Procedure Call?
  • Simplify building distributed systems and
    applications
  • Looks like local procedure call
  • Transparent to user
  • Balance between semantics and efficiency
  • Universal programming tool
  • Secure inter-process communication

3
RPC Model
  • Call path: Client Application, Client Stub, Client
    Runtime, Network, Server Runtime, Server Stub,
    Server Application
  • Return path: the same layers in reverse order

4
RPC In Modern Computing
  • CORBA and Internet Inter-ORB Protocol (IIOP)
  • Each CORBA server object exposes a set of methods
  • DCOM and Object RPC
  • Built on top of RPC
  • Java and Java Remote Method Protocol (JRMP)
  • Interface exposes a set of methods
  • XML-RPC, SOAP
  • RPC over HTTP and XML

5
Goals
  • Firefly RPC
  • Inter-machine Communication
  • Maintain Security and Functionality
  • Speed
  • Lightweight RPC
  • Intra-machine Communication
  • Maintain Security and Functionality
  • Speed

6
Firefly RPC
  • Hardware
  • DEC Firefly multiprocessor
  • 1 to 5 MicroVAX CPUs per node
  • Concurrency considerations
  • 10 megabit Ethernet
  • Takes advantage of 5 CPUs

7
Fast Path in an RPC
  • Transport Mechanisms
  • IP / UDP
  • DECNet byte stream
  • Shared Memory (intra-machine only)
  • Determined at bind time
  • Embodied in the transport procedures Starter,
    Transporter, and Ender for the caller, and
    Receiver for the server

8
Caller Stub
  • Gets control from calling program
  • Calls Starter for packet buffer
  • Copies arguments into the buffer
  • Calls Transporter and waits for reply
  • Copies result data into the caller's result variables
  • Calls Ender and frees result packet
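
The caller-stub steps above can be sketched roughly as follows. This is a minimal Python illustration, not Firefly code: the class `LoopbackTransport`, the function `caller_stub`, and the one-integer packet layout are hypothetical. Only the procedure names Starter, Transporter, and Ender come from the slides.

```python
import struct


class LoopbackTransport:
    """Hypothetical stand-in for a transport selected at bind time."""

    def __init__(self, server_proc):
        self.server_proc = server_proc
        self.free_buffers = [bytearray(64)]  # pool of reusable packet buffers

    def starter(self):
        # Hand the stub a packet buffer from the pool.
        return self.free_buffers.pop()

    def transporter(self, packet):
        # "Send" the call and wait for the reply. Here the server runs
        # inline and writes its result back into the same packet.
        (arg,) = struct.unpack_from("!i", packet, 0)
        struct.pack_into("!i", packet, 0, self.server_proc(arg))
        return packet

    def ender(self, packet):
        # Return the result packet to the pool.
        self.free_buffers.append(packet)


def caller_stub(transport, arg):
    packet = transport.starter()                    # get a packet buffer
    struct.pack_into("!i", packet, 0, arg)          # copy arguments into it
    reply = transport.transporter(packet)           # send, wait for reply
    (result,) = struct.unpack_from("!i", reply, 0)  # copy result out
    transport.ender(reply)                          # free the result packet
    return result


transport = LoopbackTransport(lambda x: x + 1)
print(caller_stub(transport, 41))
```

Because the stub only sees three procedure variables, a different transport (UDP, DECNet, shared memory) could be substituted at bind time without changing the stub itself.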

9
Server Stub
  • Receives incoming packet
  • Copies data onto the stack or into a new data
    block, or leaves it in the packet
  • Calls server procedure
  • Copies result into the call packet and transmits it

10
Transport Mechanism
  • Transporter procedure
  • Completes RPC header
  • Calls Sender to complete UDP, IP, and Ethernet
    headers (Ethernet is the chosen means of
    communication)
  • Invoke Ethernet driver via kernel trap and queue
    the packet

11
Transport Mechanism
  • Receiver procedure
  • Server thread awakens in Receiver
  • Receiver calls the interface stub identified in
    the received packet, and the interface stub calls
    the procedure stub
  • Reply is similar
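
A rough sketch of that dispatch chain (Receiver, then interface stub, then procedure stub) is shown below. The packet layout (a 2-byte interface id, a 2-byte procedure id, then one integer argument) and all function names are assumptions made for illustration. The stubs are held in tuples of callables to mimic dispatch through procedure variables.

```python
import struct


def null_stub(packet):
    # Procedure stub for Null(): no arguments, no result.
    return packet


def add_one_stub(packet):
    # Procedure stub: unmarshal the argument, run the server procedure,
    # and write the result back into the same call packet.
    (arg,) = struct.unpack_from("!i", packet, 4)
    struct.pack_into("!i", packet, 4, arg + 1)
    return packet


def make_interface_stub(procedure_stubs):
    def interface_stub(packet):
        # Pick the procedure stub named in the packet header.
        (proc_id,) = struct.unpack_from("!H", packet, 2)
        return procedure_stubs[proc_id](packet)
    return interface_stub


# One exported interface with two procedures (hypothetical).
INTERFACES = (make_interface_stub((null_stub, add_one_stub)),)


def receiver(packet):
    # Pick the interface stub named in the packet header.
    (iface_id,) = struct.unpack_from("!H", packet, 0)
    return INTERFACES[iface_id](packet)


call = bytearray(struct.pack("!HHi", 0, 1, 41))
reply = receiver(call)
print(struct.unpack_from("!i", reply, 4)[0])
```

In this sketch the reply is the same buffer as the call packet, so no new packet is allocated for the result.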

12
Threading
  • Client Application creates RPC thread
  • Server Application creates call thread
  • Threads operate in the server application's address
    space
  • No need to spawn entire process
  • Threads need to consider locking resources

13
Threading
14
Performance Enhancements
  • Over traditional RPC
  • Stubs marshal arguments rather than library
    functions handling arguments
  • RPC procedures called through procedure variables
    rather than by lookup table
  • Server retains call packet for results
  • Buffers reside in shared memory
  • Sacrifices abstract structure

15
Performance Analysis
  • Null() Procedure
  • No arguments or return value
  • Measures base latency of RPC mechanism
  • Multi-threaded caller and server

16
Time for 10,000 RPCs
  • Base latency: 2.66 ms
  • MaxResult latency (1500 bytes): 6.35 ms
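
As a back-of-the-envelope reading of these two figures, the short calculation below derives the extra cost of returning 1500 bytes and the rates one caller would reach if it issued calls back to back at exactly these latencies. The variable names and the single-caller assumption are illustrative and do not come from the slides.

```python
base_ms = 2.66          # Null() latency from the slide
max_result_ms = 6.35    # MaxResult latency from the slide
payload_bytes = 1500    # result size from the slide

# Extra time attributable to carrying the 1500-byte result.
extra_ms = max_result_ms - base_ms
us_per_byte = extra_ms * 1000 / payload_bytes

# Rates for one caller issuing calls back to back (assumption).
null_calls_per_second = 1000 / base_ms
max_result_mbit_per_second = payload_bytes * 8 / (max_result_ms * 1000)

print(round(extra_ms, 2))
print(round(us_per_byte, 2))
print(round(null_calls_per_second, 1))
print(round(max_result_mbit_per_second, 2))
```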

17
Send and Receive Latency
18
Send and Receive Latency
  • With larger packets, transmission time dominates
  • Overhead becomes less of an issue
  • Good for Firefly RPC, assuming large transmission
    over network
  • Is overhead acceptable for intra-machine
    communication?

19
Stub Latency
  • Significant overhead for small packets

20
Fewer Processors
  • Seconds for 1,000 Null() calls

21
Fewer Processors
  • Why the slowdown with one processor?
  • Fast path can be followed only in multiprocessor
    environment
  • Lock conflicts, scheduling problems
  • Why little speedup past two processors?

22
Future Improvements
  • Hardware
  • Faster network will help larger packets
  • Triple CPU speed will reduce Null() time by 52%
    and MaxResult by 36%
  • Software
  • Omit IP and UDP headers for Ethernet datagrams,
    2-4% gain
  • Redesign RPC protocol, 5% gain
  • Busy thread wait, 10-15% gain
  • Write more in assembler, 5-10% gain

23
Other Improvements
  • Firefly RPC handles intra-machine communication
    through the same mechanisms as inter-machine
    communication
  • Firefly RPC also has very high overhead for small
    packets
  • Does this matter?

24
RPC Size Distribution
  • Majority of RPC transfers under 200 bytes

25
Frequency of Remote Activity
  • Most calls are to the same machine

26
Traditional RPC
  • Most calls are small messages that take place
    between domains of the same machine
  • Traditional RPC contains unnecessary overhead,
    like
  • Scheduling
  • Copying
  • Access validation

27
Lightweight RPC (LRPC)
  • Also written for the DEC Firefly system
  • Mechanism for communication between different
    protection domains on the same system
  • Significant performance improvements over
    traditional RPC

28
Overhead Analysis
  • Theoretical minimum to invoke Null() across
    domains: a kernel trap and context change to call,
    plus a trap and context change to return
  • Theoretical minimum on the Firefly: 109 us
  • Actual cost with Firefly RPC: 464 us

29
Sources of Overhead
  • 355 us added on top of the theoretical minimum
  • Stub overhead
  • Message buffer overhead
  • Not so much in Firefly RPC
  • Message transfer and flow control
  • Scheduling and abstract threads
  • Context Switch

30
Implementation of LRPC
  • Similar to RPC
  • Call to server is done through kernel trap
  • Kernel validates the caller
  • Servers export interfaces
  • Clients bind to server interfaces before making a
    call

31
Binding
  • Servers export interfaces through a clerk
  • The clerk registers the interface
  • Clients bind to the interface through a call to
    the kernel
  • Server replies with an entry address and size of
    its A-stack
  • Client gets a Binding Object from the kernel
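
The binding handshake above might be modeled along these lines. This is only a sketch under stated assumptions: the classes `Kernel`, `Clerk`, and `ProcedureDescriptor`, the interface name "Adder", and the use of a random token as the Binding Object are all hypothetical and are not taken from the LRPC implementation.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcedureDescriptor:
    entry: object      # stands in for the server's entry address
    astack_size: int   # bytes needed for arguments and results


class Clerk:
    """Server-side helper that answers bind requests (hypothetical)."""

    def __init__(self, descriptor):
        self.descriptor = descriptor

    def accept_bind(self):
        # Server's reply: entry address and A-stack size.
        return self.descriptor


class Kernel:
    def __init__(self):
        self.exported = {}   # interface name -> clerk
        self.bindings = {}   # Binding Object -> (descriptor, A-stack)

    def register(self, name, clerk):
        # The clerk registers the exported interface.
        self.exported[name] = clerk

    def bind(self, name):
        # Client bind request: ask the clerk, allocate an A-stack,
        # and hand back a hard-to-guess Binding Object.
        descriptor = self.exported[name].accept_bind()
        astack = bytearray(descriptor.astack_size)
        binding_object = secrets.token_hex(8)
        self.bindings[binding_object] = (descriptor, astack)
        return binding_object, astack


kernel = Kernel()
kernel.register("Adder", Clerk(ProcedureDescriptor(lambda a: a + 1, 16)))
binding_object, astack = kernel.bind("Adder")
print(len(astack), binding_object in kernel.bindings)
```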

32
Calling
  • Each procedure is represented by a stub
  • Client makes a call through the stub
  • Manages A-stacks
  • Traps to the kernel
  • Kernel switches context to the server
  • Server returns by its own stub
  • No verification needed
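
The call path above can be sketched as follows, assuming a binding has already been established. The names `Kernel.trap_call`, `ClientStub`, and `server_entry_stub`, the string used as a Binding Object, and the two-integer argument layout are all hypothetical. The "context switch" is modeled as an ordinary function call.

```python
import struct


class Kernel:
    def __init__(self):
        self.bindings = {}  # Binding Object -> server entry stub

    def trap_call(self, binding_object, astack):
        # Validate the caller's Binding Object on the way in.
        entry_stub = self.bindings.get(binding_object)
        if entry_stub is None:
            raise PermissionError("invalid Binding Object")
        entry_stub(astack)  # stands in for switching to the server domain
        # Control comes straight back; the return is not re-validated.


def server_entry_stub(astack):
    # Read arguments from the shared A-stack and write the result back.
    a, b = struct.unpack_from("!ii", astack, 0)
    struct.pack_into("!i", astack, 0, a + b)


class ClientStub:
    def __init__(self, kernel, binding_object, astacks):
        self.kernel = kernel
        self.binding_object = binding_object
        self.astacks = astacks  # free A-stacks, managed by the stub

    def add(self, a, b):
        astack = self.astacks.pop()  # take a free A-stack
        try:
            struct.pack_into("!ii", astack, 0, a, b)
            self.kernel.trap_call(self.binding_object, astack)
            (result,) = struct.unpack_from("!i", astack, 0)
            return result
        finally:
            self.astacks.append(astack)  # give the A-stack back


kernel = Kernel()
kernel.bindings["adder-binding"] = server_entry_stub
stub = ClientStub(kernel, "adder-binding", [bytearray(16)])
print(stub.add(2, 3))
```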

33
Stub Generation
  • Procedure representation
  • Call stub for client
  • Entry stub for server
  • LRPC merges protocol layers
  • Stub generator creates run-time stubs in assembly
    language
  • Portability sacrificed for Performance
  • Falls back on Modula-2+ for complex calls

34
Multiple Processors
  • LRPC caches domains on idle processors
  • Kernel checks for an idling processor in the
    server domain
  • If a processor is found, caller thread can
    execute on the idle processor without switching
    context
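
The idle-processor check described above might look roughly like this. It is a simplified model with hypothetical names (`Processor`, `find_idle_processor`, `dispatch_call`) and it ignores everything a real kernel would have to handle, such as locking and returning the processor afterwards.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Processor:
    cpu_id: int
    cached_domain: str  # domain whose context is currently loaded
    idle: bool


def find_idle_processor(processors: List[Processor],
                        server_domain: str) -> Optional[Processor]:
    # Look for a processor idling in the server's domain.
    for cpu in processors:
        if cpu.idle and cpu.cached_domain == server_domain:
            return cpu
    return None


def dispatch_call(processors: List[Processor], server_domain: str) -> str:
    cpu = find_idle_processor(processors, server_domain)
    if cpu is not None:
        cpu.idle = False  # the caller's thread now runs there
        return f"run on cpu {cpu.cpu_id} without a context switch"
    return "context switch on the calling processor"


cpus = [Processor(0, "client", False), Processor(1, "server", True)]
print(dispatch_call(cpus, "server"))
print(dispatch_call(cpus, "server"))
```

The second call in the usage lines falls back to a context switch because the only processor cached in the server domain is no longer idle.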

35
Argument Copying
  • Traditional RPC copies arguments four times for
    intra-machine calls
  • Client stub to RPC message to kernel's message to
    server's message to server's stack
  • In many cases, LRPC needs to copy the arguments
    only once
  • Client stub to A-stack
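
To make the difference in copy counts concrete, the sketch below counts buffer copies on each path. It is only an illustration: `CopyCounter`, `traditional_call`, and `lrpc_call` are hypothetical names, and a shared `bytearray` read through a `memoryview` stands in for an A-stack visible to both domains.

```python
class CopyCounter:
    def __init__(self):
        self.count = 0

    def copy(self, data):
        # Make a real copy of the bytes and record that it happened.
        self.count += 1
        return bytearray(data)


def traditional_call(args, counter):
    rpc_message = counter.copy(args)               # stub -> RPC message
    kernel_message = counter.copy(rpc_message)     # -> kernel's message
    server_message = counter.copy(kernel_message)  # -> server's message
    server_stack = counter.copy(server_message)    # -> server's stack
    return server_stack


def lrpc_call(args, astack, counter):
    astack[:len(args)] = args  # the single copy: stub -> A-stack
    counter.count += 1
    # The server reads the arguments in place through a view.
    return memoryview(astack)[:len(args)]


args = b"\x00\x01\x02\x03"

traditional = CopyCounter()
traditional_call(args, traditional)

lightweight = CopyCounter()
shared_astack = bytearray(16)
server_view = lrpc_call(args, shared_astack, lightweight)

print(traditional.count, lightweight.count)
```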

36
Performance Analysis
  • LRPC is roughly three times faster than
    traditional RPC
  • Null() LRPC cost: 157 us, close to the 109 us
    theoretical minimum
  • Additional 48 us of overhead from stub and
    kernel execution
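
The "roughly three times faster" claim can be checked against the Null() figures quoted in the slides (109 us minimum, 464 us for Firefly RPC, 157 us for LRPC). The variable names below are illustrative.

```python
minimum_us = 109   # theoretical minimum for a cross-domain Null()
rpc_null_us = 464  # Firefly RPC Null() cost
lrpc_null_us = 157 # LRPC Null() cost

rpc_overhead_us = rpc_null_us - minimum_us
lrpc_overhead_us = lrpc_null_us - minimum_us
speedup = rpc_null_us / lrpc_null_us

print(rpc_overhead_us)
print(lrpc_overhead_us)
print(round(speedup, 2))
```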

37
Single-Processor Null() LRPC
38
Performance Comparison
  • LRPC versus traditional RPC (in us)

39
Multiprocessor Speedup
40
Inter-machine Communication
  • LRPC is best for messages between domains on the
    same machine
  • The first instruction of the LRPC stub checks if
    the call is cross-machine
  • If so, stub branches to conventional RPC
  • Larger messages are handled well: LRPC cost scales
    linearly with packet size, like traditional RPC
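
The early branch in the stub could be sketched like this. The `Binding` class with its `is_remote` flag and the two path functions are hypothetical placeholders. The real stubs are generated in assembly language, and the network path is not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Binding:
    is_remote: bool  # set at bind time (assumption)


def conventional_rpc(procedure, args):
    # Placeholder for the network path: marshal, send, wait for reply.
    return ("conventional RPC", procedure(*args))


def lrpc(procedure, args):
    # Placeholder for the same-machine path through the kernel.
    return ("LRPC", procedure(*args))


def call_stub(binding, procedure, args):
    # The first thing the stub does is test for a cross-machine call.
    if binding.is_remote:
        return conventional_rpc(procedure, args)
    return lrpc(procedure, args)


def add(a, b):
    return a + b


print(call_stub(Binding(is_remote=False), add, (2, 3)))
print(call_stub(Binding(is_remote=True), add, (2, 3)))
```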

41
Cost
  • LRPC avoids needless scheduling, copying, and
    locking by integrating the client, kernel,
    server, and message protocols
  • Abstraction is sacrificed for functionality
  • RPC is built into operating systems (Linux DCE
    RPC, MS RPC)

42
Conclusion
  • Firefly RPC is fast compared to most RPC
    implementations. LRPC is even faster. Are they
    fast enough?
  • The performance of Firefly RPC is now good
    enough that programmers accept it as the standard
    way to communicate (1990)
  • Is speed still an issue?