Clusters Networks II: Protection and Performance
1
Clusters Networks II: Protection and Performance
  • Andrew Chien
  • High Performance Distributed Computing (CSE225)
  • January 14, 1999

2
Announcements/Review
  • Class on Tuesday January 19th is cancelled.
  • Catch up on reading :-)
  • Next class, Thursday, January 21
  • Last Time
  • Efficient aggregation in Clusters
  • High Performance Communication
  • Coordinated Scheduling (coarse and fine grained)
  • Uniform Resource Access
  • High Performance Communication

3
Today's Outline
  • Multi-process Protection in High Performance Networks
  • U-Net: User-level Network Interface
  • Virtual Interface Architecture
  • Delivering Performance to Applications
  • FM 2.0 API and Layer Composition

4
U-Net Multiprocess Protection
[Diagram: application -> kernel -> NIC -> network; all traffic crosses the kernel]
  • Traditional networking view: OS-mediated communication
  • Problem: system call overhead limits performance
  • 10-50 µs fast trap in current-day systems
  • 100s of µs in some cases
  • 50 µs => what kind of bandwidth limits in a Gbps network?
  • 100 bytes => 20 Mbps; 250 bytes => ??; 500 bytes => 100 Mbps
  • 1 KB => 200 Mbps; 2 KB => ??; full bandwidth => 5 KB or 6 KB messages (worked arithmetic sketched below)
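These ceilings follow from simple arithmetic: with a fixed per-message software overhead, the deliverable bandwidth for an n-byte message is at most n divided by that overhead. A minimal sketch of the calculation in C, using the 50 µs figure from this slide (function and variable names are illustrative only):

#include <stdio.h>

/* Upper bound on deliverable bandwidth when every message pays a fixed
 * software overhead; link transmission time is ignored. */
static double bw_limit_mbps(double msg_bytes, double overhead_us)
{
    return (msg_bytes * 8.0) / overhead_us;   /* bits per microsecond = Mbps */
}

int main(void)
{
    const double overhead_us = 50.0;   /* per-message overhead from the slide */
    const double sizes[] = { 100, 250, 500, 1024, 2048, 6144 };
    for (int i = 0; i < 6; i++)
        printf("%5.0f bytes -> %6.1f Mbps\n",
               sizes[i], bw_limit_mbps(sizes[i], overhead_us));
    return 0;   /* messages of roughly 6 KB are needed to approach 1 Gbps */
}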

5
U-Net OS Bypass and Virtualization
[Diagram: several applications, each with its own memory-mapped virtual NIC, plus the kernel's NIC, all multiplexed onto the physical network]
  • Each process gets a virtual network interface (memory-mapped protection)
  • Runs protocols, buffer management, etc. in user space
  • What's hard about this?

6
Providing Network Protection
  • How to avoid interference, preserve network data
    integrity, avoid spoofing?
  • Traditional model depends on the kernel to authenticate and route each packet
  • Idea: division of effort
  • Kernel sets up routes and connections between virtual interfaces
  • Packet tagging (done by the interfaces) and mux/demultiplexing (sketched below) ensure that users can only
  • send packets to authorized connections
  • receive packets from the places authorized to do so
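A minimal sketch of that division of effort in C: the kernel agent writes a tag into a per-endpoint connection table when it sets up a route, and the NIC's mux/demux code accepts or drops packets based on that tag. All structure and function names here are illustrative, not taken from the U-Net implementation.

#include <stdbool.h>
#include <stdint.h>

#define MAX_CONNS 64

/* Per-endpoint connection table; only the kernel agent writes it. */
struct conn { uint32_t tag; uint16_t remote_node; bool valid; };
struct endpoint_routes { struct conn conns[MAX_CONNS]; };

/* Send path: refuse to emit a packet on a channel the kernel never
 * authorized; otherwise stamp the kernel-assigned tag into the packet. */
static bool nic_send_check(const struct endpoint_routes *rt, int chan,
                           uint32_t *tag_out, uint16_t *dest_out)
{
    if (chan < 0 || chan >= MAX_CONNS || !rt->conns[chan].valid)
        return false;                      /* unauthorized channel: reject */
    *tag_out  = rt->conns[chan].tag;
    *dest_out = rt->conns[chan].remote_node;
    return true;
}

/* Receive path: demultiplex by tag; a packet whose tag matches none of
 * this endpoint's connections is never delivered to the user. */
static int nic_recv_demux(const struct endpoint_routes *rt, uint32_t pkt_tag)
{
    for (int c = 0; c < MAX_CONNS; c++)
        if (rt->conns[c].valid && rt->conns[c].tag == pkt_tag)
            return c;                      /* deliver on this channel */
    return -1;                             /* no match: drop the packet */
}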

7
U-Net Endpoints (virtual interfaces)
[Diagram: a U-Net endpoint: a communication segment with send, receive, and free buffer queues]
  • Each endpoint is a virtualized NI buffer pool (sketched below)
  • Connected by the kernel agent to another endpoint (bidirectional connections)
  • Communication segments are pinned DMA regions; buffer pool management is done by the network interface
  • Notification done by polling or event-driven (upcall)
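A rough picture of the endpoint data structure, assuming the send/receive/free queue layout described in the U-Net paper; the field names, queue length, and segment size are purely illustrative.

#include <stdint.h>

#define QLEN     64        /* descriptors per queue (illustrative)        */
#define SEG_SIZE 65536     /* pinned communication segment (illustrative) */

/* One descriptor: an offset into the communication segment plus a length. */
struct unet_desc {
    uint32_t offset;       /* where the buffer lives within the segment */
    uint32_t length;       /* valid bytes                                */
    uint32_t channel;      /* which kernel-established connection        */
};

/* A virtual network interface owned by one process.  Everything lives in
 * pinned memory so the NIC can DMA directly, keeping the kernel off the
 * data path. */
struct unet_endpoint {
    uint8_t           segment[SEG_SIZE];  /* DMA region for message payloads */
    struct unet_desc  send_q[QLEN];       /* descriptors the app wants sent  */
    struct unet_desc  recv_q[QLEN];       /* descriptors of arrived messages */
    struct unet_desc  free_q[QLEN];       /* empty buffers handed to the NIC */
    volatile uint32_t send_head, send_tail;
    volatile uint32_t recv_head, recv_tail;
    volatile uint32_t free_head, free_tail;
};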

8
Network Communication
[Diagram: two applications exchanging data directly through their endpoints; the kernels sit off the data path]
  • Applications communicate through endpoints
  • Kernel operations are NOT along the data movement path

9
U-Net Performance
  • Raw U-Net
  • Benefits of OS bypass
  • 65 µs RT latency, 120 µs for 32 bytes, then amortizing toward link speed
  • => reduced overhead to 30 µs

10
U-Net vs. Fore Firmware
  • Lesson: commercial products are not always well-designed / mature
  • They are a snapshot of the state of the art, and often very constrained by other circumstances

11
U-Net w/ Active Messages and IP Protocols
  • Overheads for IP significantly higher (2 - 2.5x)
  • Reduces deliverable bandwidth fraction for short
    messages
  • Peak Bandwidth achievable (for this network)

12
U-Net Summary
  • Demonstrated partition of kernel managed
    connection setup
  • User-level communication
  • User-level buffer management and protocols
  • Demonstrated reasonable performance

13
Virtual Interface Architecture
  • VIA Project: Intel/Compaq/Microsoft
  • catalyze an industry standard for high performance cluster communication
  • capitalize on the technical advances in academic research to reduce communication overhead and deliver the performance
  • started Dec 1996, standard in Dec 1997
  • technical work: paper design, emulator, lots of political wrangling amongst the companies (billions at stake)
  • designed to provide user-level communication to multiple operating systems -- WinNT, Novell Netware, Unix
  • designed to provide this user-level interface independent of the underlying interconnection fabric (ATM, GigE, Myrinet, Giganet, etc.)

14
VIA Basic Ideas
  • Endpoints a la U-Net
  • hardware-supported NIC virtualization
  • send and receive buffer pools (memory registered with the interface)
  • doorbells for notification between host and NIC (see the sketch after this list)
  • polling and interrupt-based notification (user selectable)
  • Network reliability attributes (failure semantics)
  • Read and Write RDMA operations (and ordering)
  • Memory protection attributes
  • Group notification (shared completion queues)
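To make the doorbell idea concrete, here is a hedged sketch of a user-level send: build a descriptor in registered memory, then write to the VI's memory-mapped doorbell so the NIC picks the descriptor up without any system call. This is not VIPL; every name and the descriptor layout are assumptions for illustration only.

#include <stdint.h>

/* A send descriptor in registered (pinned) memory; the NIC reads it by
 * DMA after the doorbell is rung.  Layout is illustrative. */
struct send_desc {
    uint64_t buf_addr;     /* address of a registered send buffer            */
    uint32_t buf_len;      /* bytes to transmit                              */
    uint32_t mem_handle;   /* handle returned when the buffer was registered */
    uint32_t flags;        /* e.g. request a completion-queue notification   */
};

/* A virtual interface as the user process sees it: a descriptor ring plus
 * a doorbell register that the NIC maps into the process address space. */
struct virtual_interface {
    struct send_desc  *send_ring;   /* registered descriptor ring */
    uint32_t           send_tail;
    volatile uint32_t *doorbell;    /* memory-mapped NIC register */
};

/* Post a send without entering the kernel: fill in the next descriptor,
 * then write its index to the doorbell so the NIC starts the transfer. */
static void post_send(struct virtual_interface *vi,
                      uint64_t addr, uint32_t len, uint32_t mh)
{
    struct send_desc *d = &vi->send_ring[vi->send_tail];
    d->buf_addr   = addr;
    d->buf_len    = len;
    d->mem_handle = mh;
    d->flags      = 0;
    *vi->doorbell = vi->send_tail;            /* "ring" the doorbell */
    vi->send_tail = (vi->send_tail + 1) % 64; /* ring of 64 entries (illustrative) */
}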

15
  • VIA VIPL Overview Slides

16
Delivering Gigabit Performance: FM 2.0
17
FM 1.x Evaluation
  • MPI on FM (Fall 1995)
  • BSD Sockets on FM (December 1995)

18
MPI-FM Efficiency
  • Problems: excessive copies (pacing API), hard to program (interleaving)

19
MPI-FM Performance (initial)
  • Problems: the FM 1.x (and AM) API is a poor design for composition
  • How can we design an API that makes it easy to deliver performance?
  • Key issues:
  • Eliminate copying for header attach/remove (gather-scatter)
  • Eliminate copying from network overrun (receiver flow control)
  • Ease programming effort for interleaved PTUs (handler multithreading)
  • All needed to deliver performance to the application layers
  • Partial changes enabled:
  • MPI on FM 1.1 (19 µs, 17.5 MB/s) [JPDC '97]
  • Sockets on FM 1.1 (35 µs, 17.5 MB/s)

20
Illinois Fast Messages 2.x
  • Gather-scatter interface enables efficient
    layering, data movement without copies
    (packetization invisible)
  • Multithreading provides sequential view of
    message reception (packetization and interleaving
    invisible)
  • Bonuses: multiprocess support, dynamic network namespaces

21
Receiver Flow Control
  • Receiver determines data pacing from network
    subsystem
  • Lower levels provide communication/computation overlap
  • Provides a simple composition model (examples)
  • Leverages reliable delivery and flow control at
    the lower level

22
FM 2.x API
  • Sending:
  • FM_begin_message(NodeID, Handler, size)
  • FM_send_piece(stream, buffer, size) // gather
  • FM_end_message()
  • Receiving:
  • FM_receive(buffer, size) // scatter
  • FM_extract(total_bytes) // rcvr flow control (see the loop sketch below)
  • Implementation:
  • C parser rewrites code
  • Logical thread for each message receive
  • OS thread safe
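The receiving process paces its own consumption of network data by calling FM_extract from its main loop; message handlers run only from inside that call. A minimal sketch of such a loop under that assumption (work_to_do() and do_work() are placeholders for application computation, not FM calls):

/* Receive-side main loop using FM 2.x receiver flow control.  FM_extract
 * bounds how many bytes of incoming messages are processed per iteration,
 * so the application, not the network, sets the pace. */
extern int  work_to_do(void);
extern void do_work(void);

/* prototype assumed here; real code would include the FM 2.x header */
extern void FM_extract(unsigned int total_bytes);

void main_loop(void)
{
    while (work_to_do()) {
        do_work();          /* application computation                */
        FM_extract(4096);   /* accept up to 4 KB of messages; pending */
                            /* handlers execute inside this call      */
    }
}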

23
Send Example (List Send)
extern FM_handler myhandler;

void sendlist(unsigned int dest, Node *nodep, unsigned int elts)
{
  FM_stream *mystream;
  unsigned int databytes = elts * sizeof(int);

  /* spin until the FM layer will accept a new outgoing message */
  while (!(mystream = FM_begin_message(dest, databytes, myhandler)))
    ;
  /* gather: stream each list element directly, with no marshalling copy */
  while (nodep) {
    FM_send_piece(mystream, &nodep->data, sizeof(int));
    nodep = nodep->next;
  }
  FM_end_message(mystream);
}
24
Handler Example (MPI)
#pragma FM_declare_handler
int myhandler(FM_stream *str, unsigned int sender)
{
  struct header myheader;
  int msglen;

  FM_receive(&myheader, str, sizeof(struct header));  /* scatter: header first */
  msglen = myheader.length;
  if (myheader.littlemsg)                /* then direct the payload to a buffer */
    FM_receive(littlebuf, str, msglen);
  else
    FM_receive(findbigbuffer(msglen), str, msglen);
  return FM_CONTINUE;
}
25
Platform Upgrade: PCs
[Diagram: Pentium Pro/II host, P6 bus, PCI, NIC at 1280 Mbps; Sparc -> x86; 2x / 2x]
  • PCs exploit cost advantages, eliminate PIO
    problem (graphics driven)
  • Faster network cards and links (2x)
  • OS: Windows NT and Linux (goal: widespread use)

26
FM 2.x Performance
  • Latency 11 µs, BW 77 MB/s, N1/2 < 200 bytes (see the note below)
  • Fast in absolute terms (compares to MPPs, internal memory BW)
  • Delivers a large fraction of hardware performance for short messages
  • The performance bottleneck has moved inside the system!
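N1/2 is the message size at which half of the asymptotic bandwidth is delivered. Under a simple linear cost model T(n) = T0 + n / BW_peak, it works out to N1/2 = T0 * BW_peak. A small illustrative calculation; the model and the assumed 2.5 µs fixed per-message cost are chosen only to show the shape of the metric, not FM measurements:

#include <stdio.h>

/* Half-power point under T(n) = T0 + n / BW_peak: the delivered rate
 * n / T(n) reaches BW_peak / 2 exactly when n = T0 * BW_peak. */
int main(void)
{
    double bw_peak = 77e6;            /* bytes/s peak bandwidth (from slide) */
    double t0      = 2.5e-6;          /* s, assumed fixed per-message cost   */
    double n_half  = t0 * bw_peak;    /* bytes                               */
    printf("N1/2 = %.0f bytes\n", n_half);   /* about 190 bytes here         */
    return 0;
}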

27
Performance Implications
  • Typical packet distributions:
  • 80-90% of packets < 200 bytes
  • => FM 2.x delivers 40 MB/s (320 Mbps) @ 256 bytes
  • Fast UDP delivers 2 MB/s @ 256 bytes
  • 20x superior bandwidth/overhead
  • => Of course, these are not directly comparable.

28
FM 2.x Evaluation (MPI)
  • MPI-FM: 70 MB/s, 17 µs latency, 5.1 µs overhead
  • Peak BW ~ IBM SP2; short messages much better
  • Latency ~ SGI O2K
  • FM: 77 MB/s, 11 µs latency, 4.1 µs overhead

29
FM2.x Evaluation (MPI) cont.
  • High transfer efficiency, approaches 100%
  • Other systems much lower even at 1 KB (100 Mbit: 40%, 1 Gbit: 5%)

30
FM 2.0 Summary
  • APIs and Guarantees matter for delivering
    performance
  • Layer composition is a critical issue in software
    communication architectures
  • What are the equivalent concepts for other types of Grid performance (usable computation, memory, etc.)?
  • What are the right metrics to drive this? N1/2
    for parallelism?

31
Overall Summary
  • User-level network interfaces
  • Separation of connection setup
  • User protocol processing and buffer management
  • Embodied in U-Net and VIA (and FM)
  • VIA: fault tolerance and RDMA operations
  • Delivering communication performance
  • Depends on APIs and guarantees
  • Usable performance is the critical question
  • Generalizations to Grid resource abstractions?

32
Next Time (January 21st)
  • Reading Assignments
  • Grid Book, Chapters 11 (Globus Toolkit) and 9.4 -
    9.6 (Legion)
  • Globus High Level Vision
  • The Globus Project: A Status Report. I. Foster, C. Kesselman, Proc. IPPS/SPDP '98 Heterogeneous Computing Workshop, pp. 4-18, 1998.
  • Globus: A Metacomputing Infrastructure Toolkit. I. Foster, C. Kesselman, Intl. J. Supercomputer Applications, 11(2):115-128, 1997.
  • Globus papers available from http://www.globus.org/documentation/papers.html

33
Further reading (will be assigned next)
  • A Directory Service for Configuring High-Performance Distributed Computations. S. Fitzgerald, I. Foster, C. Kesselman, G. von Laszewski, W. Smith, S. Tuecke. Proc. 6th IEEE Symp. on High-Performance Distributed Computing, pp. 365-375, 1997.
  • Usage of LDAP in Globus. I. Foster, G. von Laszewski.
  • A Fault Detection Service for Wide Area Distributed Computations. P. Stelling, I. Foster, C. Kesselman, C. Lee, G. von Laszewski, Proc. 7th IEEE Symp. on High Performance Distributed Computing, 1998.