Title: Virtualizing Modern High-Speed Interconnection Networks with Performance and Scalability


1
Virtualizing Modern High-Speed Interconnection
Networks with Performance and Scalability
Bo Li, Zhigang Huo, Panyong Zhang, Dan Meng
{leo, zghuo, zhangpanyong, md}@ncic.ac.cn
Presenter: Xiang Zhang  zhangxiang@ncic.ac.cn
  • Institute of Computing Technology, Chinese
    Academy of Sciences, Beijing, China

2
Introduction
  • Virtualization is now one of the enabling
    technologies of Cloud Computing
  • Many HPC providers now offer their systems as
    platforms for cloud/utility computing; these
    HPC-on-Demand offerings include
  • Penguin's POD
  • IBM's Computing On Demand service
  • R Systems' dedicated hosting service
  • Amazon's EC2

3
Introduction: Virtualizing HPC clouds?
  • Pros
  • good manageability
  • proactive fault tolerance
  • performance isolation
  • online system maintenance
  • Cons
  • Performance gap
  • Lack of low-latency interconnects, which are
    important to tightly coupled MPI applications
  • VMM-bypass I/O has been proposed to address this
    concern

4
Introduction: VMM-bypass I/O Virtualization
  • Xen's split device driver model is used only to
    set up the necessary user access points
  • Data communication on the critical path bypasses
    both the guest OS and the VMM

VMM-Bypass I/O (figure courtesy of [7])
5
Introduction: InfiniBand Overview
  • InfiniBand is a popular high-speed interconnect
  • OS-bypass/RDMA
  • Latency: ~1 µs
  • Bandwidth: ~3300 MB/s
  • 41.4% of Top500 systems now use InfiniBand as
    the primary interconnect

(Chart: Interconnect family share of Top500 systems, June 2010.
Source: http://www.top500.org)
6
Introduction: InfiniBand Scalability Problem
  • Reliable Connection (RC)
  • Queue Pair (QP): each QP consists of a Send
    Queue (SQ) and a Receive Queue (RQ)
  • QPs require memory
  • Shared Receive Queue (SRQ)
  • eXtensible Reliable Connection (XRC)
  • XRC domain, SRQ-based addressing

(Diagram: with N = node count and C = cores per node, RC needs
(N-1)·C connections per process, while XRC needs only (N-1).)
7
Problem Statement
  • Does a scalability gap exist between native and
    virtualized environments? (See the worked numbers
    after the table below.)
  • CV = cores per VM

Scalability gap exists!
Transport     QPs per Process    QPs per Node
Native RC     (N-1)·C            (N-1)·C²
Native XRC    (N-1)              (N-1)·C
VM RC         (N-1)·C            (N-1)·C²
VM XRC        (N-1)·(C/CV)       (N-1)·(C²/CV)
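To make these formulas concrete, here is a minimal arithmetic sketch in C. The example values (N = 4096 nodes, C = 16 cores per node, CV = 1 core per VM) are assumptions chosen to match the 64K-process scenario used later in the memory-usage evaluation, not numbers from this slide.

/* Arithmetic sketch of the table above: QP counts per process and per
 * node for each transport.  N, C and CV are assumed example values. */
#include <stdio.h>

int main(void)
{
    const long N  = 4096;  /* node count (assumed)     */
    const long C  = 16;    /* cores per node (assumed) */
    const long CV = 1;     /* cores per VM (assumed)   */

    long rc_proc    = (N - 1) * C;            /* RC: one QP per remote process      */
    long rc_node    = (N - 1) * C * C;        /* C processes per node               */
    long xrc_proc   = (N - 1);                /* native XRC: one QP per remote node */
    long xrc_node   = (N - 1) * C;
    long vmxrc_proc = (N - 1) * (C / CV);     /* VM XRC: one QP per remote VM,      */
    long vmxrc_node = (N - 1) * (C * C / CV); /* since each VM has its own XRCD     */

    printf("RC:     %ld QPs/process, %ld QPs/node\n", rc_proc, rc_node);
    printf("XRC:    %ld QPs/process, %ld QPs/node\n", xrc_proc, xrc_node);
    printf("VM XRC: %ld QPs/process, %ld QPs/node\n", vmxrc_proc, vmxrc_node);
    return 0;
}

With CV = 1 this prints 65,520 QPs per process for VM XRC versus 4,095 for native XRC: the 16x gap the VM-proof design aims to remove.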
8
Presentation Outline
  • Introduction
  • Problem Statement
  • Proposed Design
  • Evaluation
  • Conclusions and Future Work

9
Proposed Design: VM-proof XRC Design
  • The design goal is to eliminate the scalability gap
  • Conns/Process: (N-1)·(C/CV) → (N-1)

10
Proposed Design: Design Challenges
  • VM-proof sharing of XRC domain
  • A single XRC domain must be shared among
    different VMs within a physical node
  • VM-proof connection management
  • With a single XRC connection, P1 can send data to
    all the processes on another physical node (P5–P8),
    regardless of which VMs those processes reside in
    (see the sketch after this list)
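What makes this possible is XRC's SRQ-based addressing: the sender names the destination process's SRQ in the work request, while the QP itself only identifies the remote node. Below is a minimal, hedged sketch of posting such a send using today's upstream libibverbs API (the qp_type.xrc.remote_srqn field of struct ibv_send_wr); the OFED 1.4-era XRC extension actually used in this work exposed the same idea through an older structure layout, and the helper name and parameters here are purely illustrative.

#include <stdint.h>
#include <infiniband/verbs.h>

/* Post one XRC send addressed to a specific SRQ on the remote node.
 * The QP identifies the remote NODE; remote_srqn selects which PROCESS
 * on that node receives the message.  Sketch only, minimal error handling. */
static int post_xrc_send(struct ibv_qp *xrc_send_qp, struct ibv_mr *mr,
                         void *buf, uint32_t len, uint32_t remote_srqn)
{
    struct ibv_sge sge = {
        .addr   = (uintptr_t)buf,   /* registered send buffer */
        .length = len,
        .lkey   = mr->lkey,
    };
    struct ibv_send_wr wr = {
        .wr_id      = 1,
        .sg_list    = &sge,
        .num_sge    = 1,
        .opcode     = IBV_WR_SEND,
        .send_flags = IBV_SEND_SIGNALED,
        .qp_type.xrc.remote_srqn = remote_srqn,  /* SRQ-based addressing */
    };
    struct ibv_send_wr *bad_wr = NULL;

    return ibv_post_send(xrc_send_qp, &wr, &bad_wr);
}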

11
Proposed Design: Implementation
  • VM-proof sharing of XRCD
  • XRCD is shared by opening the same XRCD file
  • Guest domains and the IDD have dedicated,
    non-shared filesystems
  • A pseudo XRCD file (in each guest) maps to the
    real XRCD file (in the IDD)
  • VM-proof CM (connection management)
  • Traditionally, an IP address/hostname was used to
    identify a node
  • The LID of the HCA is used instead (see the
    sketch after this list)
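A minimal sketch of the sharing idea, written against today's upstream verbs API: ibv_alloc_xrcd takes an open file descriptor and returns the XRC domain bound to that file, so every process that opens the same backing file joins the same domain (the OFED 1.4.2 stack used in this work exposed the same idea through an older XRC extension call). The file path and helper name are illustrative assumptions; in the VM-proof design the guests' pseudo XRCD files must resolve to the single real file managed by the IDD.

#include <fcntl.h>
#include <stdint.h>
#include <infiniband/verbs.h>

/* Open (or create) the shared XRC domain backed by `path`, and report the
 * HCA port LID used to identify the physical node for connection
 * management.  Sketch only: no cleanup, minimal error handling. */
static struct ibv_xrcd *open_shared_xrcd(struct ibv_context *ctx,
                                         const char *path, uint16_t *lid)
{
    int fd = open(path, O_RDONLY | O_CREAT, 0666);  /* same file => same XRCD */
    if (fd < 0)
        return NULL;

    struct ibv_xrcd_init_attr attr = {
        .comp_mask = IBV_XRCD_INIT_ATTR_FD | IBV_XRCD_INIT_ATTR_OFLAGS,
        .fd        = fd,
        .oflags    = O_CREAT,
    };
    struct ibv_xrcd *xrcd = ibv_alloc_xrcd(ctx, &attr);

    struct ibv_port_attr port;
    if (xrcd && ibv_query_port(ctx, 1, &port) == 0)
        *lid = port.lid;   /* node identity instead of IP/hostname */

    return xrcd;
}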

12
Proposed Design: Discussions
  • Safe XRCD sharing
  • Unauthorized applications from other VMs may
    share the XRCD
  • Isolation of XRCD sharing can be guaranteed by
    the IDD
  • Isolation between VMs running different MPI jobs
  • By using different XRCD files, different jobs (or
    VMs) can share different XRCDs and run without
    interfering with each other
  • XRC migration
  • Main challenge: an XRC connection is a
    process-to-node communication channel
  • Future work

13
Presentation Outline
  • Introduction
  • Problem Statement
  • Proposed Design
  • Evaluation
  • Conclusions and Future Work

14
Evaluation: Platform
  • Cluster Configuration
  • 128-core InfiniBand Cluster
  • Quad-socket, quad-core AMD Barcelona, 1.9 GHz
  • Mellanox DDR ConnectX HCA, 24-port MT47396
    Infiniscale-III switch
  • Implementation
  • Xen 3.4 with Linux 2.6.18.8
  • OpenFabrics Enterprise Edition (OFED) 1.4.2
  • MVAPICH-1.1.0

15
Evaluation: Microbenchmark
Explanation: memory-copy operations in the virtualized case involve
interactions between the guest domain and the IDD.
  • The bandwidth results are nearly the same
  • Virtualized IB performs ~0.1 µs worse when using
    the blueframe mechanism
  • due to the memory copy of the send data to the
    HCA's blueframe page

(Charts: IB verbs latency using doorbell, IB verbs latency using
blueframe, and MPI latency using blueframe)
16
Evaluation: VM-proof XRC Evaluation
  • Configurations
  • Native-XRC: native environment running XRC-based
    MVAPICH
  • VM-XRC (CV=n): VM-based environment running
    unmodified XRC-based MVAPICH; the parameter CV
    denotes the number of cores per VM
  • VM-proof XRC: VM-based environment running
    MVAPICH with our VM-proof XRC design

17
Evaluation: Memory Usage
  • 16-core/node cluster, fully connected
  • The X-axis denotes the process count
  • 12 KB of memory for each QP
  • 16x lower memory usage
  • 64K processes consume 13 GB/node with the VM-XRC
    (CV=1) configuration
  • The VM-proof XRC design reduces the memory usage
    to only 800 MB/node (see the check after this list)
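A back-of-the-envelope check of those two figures, under the stated assumptions (64K processes, 16 cores per node, hence 4096 nodes, and 12 KB per QP):

#include <stdio.h>

/* Reproduce the 13 GB vs. 800 MB per-node figures from the slide. */
int main(void)
{
    const long   procs    = 64 * 1024;
    const long   cores    = 16;
    const long   nodes    = procs / cores;   /* 4096 nodes   */
    const double qp_bytes = 12.0 * 1024;     /* 12 KB per QP */

    /* VM-XRC (CV = 1): each process holds (N-1)*C QPs, C processes/node. */
    double vm_xrc   = (nodes - 1) * cores * (double)cores * qp_bytes / 1e9;
    /* VM-proof XRC: each process holds (N-1) QPs, as in native XRC.      */
    double vm_proof = (nodes - 1) * (double)cores * qp_bytes / 1e9;

    printf("VM-XRC (CV=1): %.1f GB/node\n", vm_xrc);    /* ~12.9 GB, i.e. "13 GB"  */
    printf("VM-proof XRC:  %.2f GB/node\n", vm_proof);  /* ~0.81 GB, i.e. "800 MB" */
    return 0;
}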

18
Evaluation: MPI Alltoall Evaluation
  • a total of 32 processes
  • 10–25% improvement for messages < 256 B

19
Evaluation: Application Benchmarks
  • VM-proof XRC performs nearly the same as
    Native-XRC
  • except for BT and EP
  • Both are better than VM-XRC

  • little variation across different CV values
  • CV=8 is an exception
  • memory allocation is not guaranteed to be NUMA-aware

20
Evaluation: Application Benchmarks (Cont'd)
Benchmark  Configuration   Comm. Peers  Avg. QPs/Process  Max QPs/Process  Avg. QPs/Node
FT         VM-XRC (CV=1)   127          127               127              2032
FT         VM-XRC (CV=2)   127          63.4              65               1014
FT         VM-XRC (CV=4)   127          31.1              32               498
FT         VM-XRC (CV=8)   127          15.1              16               242
FT         VM-proof XRC    127          8                 8                128
FT         Native-XRC      127          7                 7                112
IS         VM-XRC (CV=1)   127          127               127              2032
IS         VM-XRC (CV=2)   127          63.7              65               1019
IS         VM-XRC (CV=4)   127          31.7              33               507
IS         VM-XRC (CV=8)   127          15.8              18               253
IS         VM-proof XRC    127          8.6               12               138
IS         Native-XRC      127          7.6               11               122
FT: 15.9x fewer connections (VM-XRC CV=1 vs. VM-proof XRC)
IS: 14.7x fewer connections (VM-XRC CV=1 vs. VM-proof XRC)
21
Conclusion and Future Work
  • The VM-proof XRC design converges two technologies
  • VMM-bypass I/O virtualization
  • eXtensible Reliable Connection (XRC) in modern
    high-speed interconnection networks (InfiniBand)
  • Our VM-proof XRC design delivers the same raw
    performance and scalability as the native,
    non-virtualized environment
  • A 16x scalability improvement is seen on
    16-core/node clusters
  • Future work
  • evaluations on different platforms at larger scale
  • add VM migration support to our VM-proof XRC
    design
  • extend our work to the new SR-IOV-enabled
    ConnectX-2 HCAs

22
  • Questions?

{leo, zghuo, zhangpanyong, md}@ncic.ac.cn
23
Backup Slides
24
OS-bypass of InfiniBand
(Diagram: OpenIB Gen2 stack)