Title: The Peregrine High-Performance RPC System
1 The Peregrine High-Performance RPC System
- David B. Johnson and
- Willy Zwaenepoel
- Department of Computer Science
- Rice University
- Presented by Khaled Elmeleegy
- Assisted by Moez Abdel-gawad
2 Overview
- Peregrine is an RPC system.
- It aims to optimize RPC performance.
- The paper supports its design with experimental results.
3 Key optimizations
- No intermediate copies of arguments or results.
- No data conversion between client and server (unless needed).
- Preallocated and precomputed header templates for transmitted packets (sketched below).
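A minimal sketch, in C, of what a precomputed header template could look like; the struct layout, field names, and the prepare_call_header helper are illustrative assumptions, not Peregrine's actual packet format. Everything constant for a given binding is filled in once, so each call only updates the few per-call fields.

    /* Hypothetical header-template sketch: field layout is an assumption. */
    #include <stdint.h>

    struct rpc_header {
        uint8_t  eth_dst[6];    /* server Ethernet address: fixed per binding */
        uint8_t  eth_src[6];    /* client Ethernet address: fixed */
        uint16_t eth_type;      /* RPC protocol type: fixed */
        uint32_t binding_id;    /* established at bind time: fixed */
        uint32_t procedure;     /* remote procedure number: varies per call */
        uint32_t call_seq;      /* sequence number: varies per call */
        uint32_t data_len;      /* argument length: varies per call */
    };

    static struct rpc_header header_template;   /* computed once, at bind time */

    /* Per-call work touches only the variable fields. */
    static void prepare_call_header(struct rpc_header *hdr,
                                    uint32_t proc, uint32_t seq, uint32_t len)
    {
        *hdr = header_template;    /* copy the precomputed constant part */
        hdr->procedure = proc;
        hdr->call_seq  = seq;
        hdr->data_len  = len;
    }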
4 Key optimizations (continued)
- No thread-specific state is saved between calls in the server.
- Arguments are mapped into the server's address space rather than being copied.
- For multi-packet arguments, most copying overlaps with transmission of the next packet.
5 Implementation
[Figure: RPC in Peregrine. Call path: application local call, client stub, trap, kernel transmit (gather DMA); server kernel receive, reinitialize server thread, jump to server stub, call and execute the remote procedure, return, transmit the return message (DMA); client kernel receive, return, local return.]
6 Implementation (cont'd)
- In Peregrine, the kernel is responsible for:
  1. Getting RPC messages from one address space to another (usually from one machine to another).
  2. Reinitializing a free thread in the server when a call message arrives; that thread handles the call, including the binding (see the sketch after this list).
  3. Unblocking the client thread when the return message arrives.
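A rough sketch, in C, of the kernel-side dispatch described in point 2; the types and helper names (thread_pool_get, start_thread_at, server_stub_entry) are hypothetical, not the Peregrine kernel's API.

    /* Hypothetical sketch of dispatching an arriving call message. */
    struct thread;
    struct call_msg { unsigned procedure; void *args; unsigned len; };

    extern struct thread *thread_pool_get(void);              /* a free server thread */
    extern void start_thread_at(struct thread *t,
                                void (*entry)(struct call_msg *),
                                struct call_msg *msg);
    extern void server_stub_entry(struct call_msg *msg);      /* generated server stub */

    void kernel_handle_call(struct call_msg *msg)
    {
        struct thread *t = thread_pool_get();
        if (t != NULL) {
            /* Reinitialize the free thread and start it in the stub;
             * the stub locates the procedure named in the message. */
            start_thread_at(t, server_stub_entry, msg);
        }
        /* Queueing the call when no thread is free is omitted here. */
    }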
7 Implementation (cont'd)
- Unlike the previous paper:
  - There is no RPCRuntime; instead, it is the kernel's responsibility to transfer messages reliably.
  - A pool of threads is used instead of a pool of processes, which gives better performance.
  - All processing specific to the particular server procedure being called is performed in the stubs.
8 Hardware Requirements
- The Peregrine implementation utilizes:
  - The ability to remap memory pages between address spaces by manipulating the page-table entries (see the sketch below).
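A conceptual sketch, in C, of moving a page between address spaces by editing page-table entries instead of copying the data; pte_of, the PTE layout, and remap_page are illustrative assumptions, not a real MMU interface.

    /* Hypothetical page-remapping sketch: the PTE format is an assumption. */
    #include <stdint.h>

    typedef uint32_t pte_t;                               /* frame number + flags */

    extern pte_t *pte_of(void *addr_space, void *vaddr);  /* hypothetical PTE lookup */

    #define PTE_FRAME_MASK  0xFFFFF000u
    #define PTE_PRESENT     0x00000001u

    /* Make 'dst_va' in 'dst_as' refer to the physical frame that
     * currently backs 'src_va' in 'src_as'; the data is never copied. */
    void remap_page(void *src_as, void *src_va, void *dst_as, void *dst_va)
    {
        pte_t *src = pte_of(src_as, src_va);
        pte_t *dst = pte_of(dst_as, dst_va);

        *dst = (*src & PTE_FRAME_MASK) | PTE_PRESENT;
        /* A real kernel would also flush the TLB entry for dst_va. */
    }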
9 Hardware Requirements (cont'd)
- The gather DMA capability of the Ethernet controller (sketched after the figure below).
[Figure: Gather DMA transmission. Argument pages (P1-P9) are picked up directly from the client's address space, sent over the network, and delivered into the server's address space.]
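A minimal sketch, in C, of how gather DMA lets the call packet leave the machine without an intermediate copy: the controller is handed a list of (address, length) pairs covering the header template and the argument words still sitting in the client's pages. The dma_segment layout and dma_transmit call are assumptions, not a real Ethernet controller interface.

    /* Hypothetical gather-DMA transmit sketch. */
    #include <stddef.h>

    struct dma_segment {
        const void *addr;    /* start of this piece of the packet */
        size_t      len;     /* number of bytes to take from it */
    };

    extern void dma_transmit(const struct dma_segment *segs, int nsegs); /* assumed */

    void send_call_packet(const void *header, size_t hdr_len,
                          const void *args,   size_t arg_len)
    {
        struct dma_segment segs[2] = {
            { header, hdr_len },   /* precomputed header template */
            { args,   arg_len },   /* arguments, taken directly from client pages */
        };
        dma_transmit(segs, 2);     /* one packet on the wire, no intermediate copy */
    }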
10 Implementation of the optimizations
- Gather DMA is used to send arguments/results, instead of expensive copying.
- No data conversion (unless needed).
- Use of packet header templates.
- The received packet is mapped into the thread's stack to avoid copying (sketched below).
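A conceptual sketch, in C, of the receive-side mapping: the Ethernet receive buffer page holding the call packet is remapped to the top of the server thread's stack, so the arguments already lie where a local call would have placed them. The thread layout, the 4 KB page size, and the remap_page helper (from the earlier sketch) are assumptions.

    /* Hypothetical sketch: map the received arguments onto the thread's stack. */
    #include <stddef.h>

    extern void remap_page(void *src_as, void *src_va, void *dst_as, void *dst_va);
    extern void *kernel_as;                        /* kernel address space (assumed) */

    struct thread { void *addr_space; char *stack_top; };

    /* Returns where the server stub will find the arguments; no copy is made. */
    void *map_args_onto_stack(struct thread *t, void *rx_buffer_page,
                              size_t args_offset)
    {
        char *stack_page = t->stack_top - 4096;    /* page becoming the new stack top */
        remap_page(kernel_as, rx_buffer_page, t->addr_space, stack_page);
        return stack_page + args_offset;
    }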
11 Implementation of the optimizations (cont'd)
[Figure: Received call packet in one of the server's Ethernet receive buffer pages.]
12 Used optimization techniques (cont'd)
- The server thread doesn't save or restore its registers between different RPCs, since reentering the stub is a jump, not a call (sketched below).
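A small sketch, in C, of the "jump, not call" idea: after a call completes, the thread does not unwind a call chain; it simply jumps back to the stub entry point with a fresh stack, so nothing from the previous call needs to be saved or restored. setjmp/longjmp stand in here for the kernel's reinitialize-and-jump; wait_for_call and run_procedure are hypothetical.

    /* Illustrative only: setjmp/longjmp model the reinitialize-and-jump. */
    #include <setjmp.h>

    static jmp_buf stub_entry;                 /* recorded once per server thread */

    extern void wait_for_call(void);           /* hypothetical: block for the next call */
    extern void run_procedure(void);           /* hypothetical: invoke the bound procedure */

    void server_thread_start(void)
    {
        setjmp(stub_entry);                    /* the point every call (re)starts from */
        wait_for_call();
        run_procedure();
        /* Jump rather than return: the old stack contents are simply
         * abandoned, so no registers are saved or restored between calls. */
        longjmp(stub_entry, 1);
    }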
13 Multi-Packet Network RPC
- When a network RPC message containing the argument or result values is larger than the data portion of a single Ethernet packet, the message is broken into multiple packets.
- As in the single-packet case, the data are transmitted directly from the client's address space using gather DMA to avoid copying.
14 Multi-Packet Network RPC (cont'd)
- Other than packet transmission, the execution of a multi-packet network RPC is the same as for the single-packet case (receive-side sketch below).
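A minimal sketch, in C, of the receive side of a multi-packet call, under the assumption that later packets are copied into place while the controller is already receiving the following packet; wait_for_packet is a hypothetical helper, not Peregrine's actual interface.

    /* Hypothetical multi-packet receive loop. */
    #include <stddef.h>
    #include <string.h>

    extern void *wait_for_packet(size_t *payload_len);   /* next filled receive buffer */

    void receive_multi_packet_args(char *arg_area, size_t total_len)
    {
        size_t done = 0;
        while (done < total_len) {
            size_t len;
            void *payload = wait_for_packet(&len);
            /* The controller keeps receiving the following packet while
             * this memcpy runs, so most copying overlaps reception. */
            memcpy(arg_area + done, payload, len);
            done += len;
        }
    }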
15 Multi-Packet Network RPC (cont'd)
[Figure: Example multi-packet call transmission and reception.]
16 Local RPC
- Occurs between two threads executing on the same machine.
- Memory mapping is used to move the call arguments and results between the client's and server's address spaces (sketched below).
- Otherwise, the execution is the same as for a network RPC.
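A thin sketch, in C, of the local case, reusing the hypothetical remap_page helper from the earlier sketch: for a same-machine call, the kernel remaps the page holding the arguments from the client's address space into the server's instead of driving the network.

    /* Hypothetical local-RPC argument transfer by remapping. */
    extern void remap_page(void *src_as, void *src_va, void *dst_as, void *dst_va);

    void local_rpc_move_args(void *client_as, void *client_args_page,
                             void *server_as, void *server_args_page)
    {
        /* One page-table update in place of copying up to a page of
         * argument data; results move back the same way on return. */
        remap_page(client_as, client_args_page, server_as, server_args_page);
    }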
17 Performance Numbers
[Table: Peregrine RPC performance for single-packet network RPCs (microseconds).]
18 Performance Numbers (cont'd)
[Table: Peregrine RPC performance for multi-packet network RPCs.]
19 Effectiveness of the Optimizations
- Not copying memory for either the arguments or the results was shown to be a very effective optimization.
- For multi-packet RPCs, avoiding copying on the critical path saved significant time as well.
- Not performing data representation conversion when it is not needed was yet another effective optimization.
20 Conclusion
- Peregrine, by
  - avoiding expensive copies,
  - avoiding expensive data representation conversions,
  - avoiding recomputation of packet headers, and
  - reducing thread-management overhead,
- achieves performance that is very close to the hardware latency, for both network RPCs and local RPCs.