Title: Nomad: Migrating OS-bypass Networks in Virtual Machines
- Wei Huang, M. Koop, D. K. Panda
- The Ohio State University
- J. Liu, B. Abali
- IBM T.J. Watson Research Center
- In Proc. VEE '07
Outline
- Intro
- Background
- Challenges
- Design
- Implementation
- Evaluation
- Related Work
- Q&A
1. Intro
- Intro
- Importance of VMs: performance, scalability, management
- Low-latency, high-bandwidth networks (InfiniBand, Myrinet, Quadrics) -> OS bypass, RDMA -> suitable for cluster environments
- Bad for migration
- Intelligent NICs manage location-dependent resources
- Applications in cluster environments expect reliable services, so migration must be transparent to them
1. Intro (cont.)
- Our goal
- Address the migration problem of modern OS-bypass interconnects
- Target: cluster environments, i.e. very tightly coupled systems with stringent communication performance requirements
- Three parts
- User-level communication library: suspend and resume communication, namespace virtualization
- Modified guest OS driver: free and reallocate communication resources, namespace virtualization
- Coordination framework: a coordinator in the privileged domain and a central server
Xiang Xiaojia
Department of Computer Science Slide 4
2. Background
- OS-bypass I/O
- I/O handled inside the OS imposes overhead
- Context switches between user and kernel mode
- Extra data copies
- User-level communication
- The user process itself executes frequent, time-critical operations, such as I/O communication
2. Background (cont.)
- OS-bypass I/O
- InfiniBand architecture
- Queue pairs (QPs), completion queues (CQs)
- Buffer keys
- Initiating data transmission
- Post a work request to the QP
- Ring a doorbell
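The two-step send path above (post work to the QP, then ring a doorbell) can be pictured with a toy model. This is only an illustrative sketch: `ToyHCA`, `QueuePair`, `CompletionQueue`, and `demo_send` are hypothetical names invented here, not the InfiniBand verbs API, and the "hardware" is simulated in plain Python.

```python
from collections import deque


class CompletionQueue:
    """Toy CQ: holds the ids of work requests the adapter has finished."""

    def __init__(self):
        self.entries = deque()

    def poll(self):
        # Return the oldest completion, or None if nothing has completed.
        return self.entries.popleft() if self.entries else None


class QueuePair:
    """Toy QP: user code appends work requests without entering the kernel."""

    def __init__(self, qpn, cq):
        self.qpn = qpn
        self.cq = cq
        self.send_queue = deque()

    def post_send(self, wr_id, payload):
        # Posting only queues the request; nothing is transmitted yet.
        self.send_queue.append((wr_id, payload))


class ToyHCA:
    """Stand-in for the adapter (assumption: drains a QP when its doorbell rings)."""

    def __init__(self):
        self.wire = []

    def ring_doorbell(self, qp):
        while qp.send_queue:
            wr_id, payload = qp.send_queue.popleft()
            self.wire.append((qp.qpn, payload))
            qp.cq.entries.append(wr_id)


def demo_send():
    hca = ToyHCA()
    cq = CompletionQueue()
    qp = QueuePair(qpn=7, cq=cq)
    qp.post_send(1, b"hello")
    qp.post_send(2, b"world")
    before = cq.poll()          # no completion before the doorbell
    hca.ring_doorbell(qp)       # adapter processes the queued work
    return before, cq.poll(), cq.poll(), hca.wire
```

In this model the application learns about finished work only by polling the CQ, which mirrors why the OS never sees the individual sends.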
2. Background (cont.)
- Xen
- Hypervisor runs at the lowest level
- Split device driver model
- Frontend: guest OS
- Backend: IDD (isolated device domain)
- Migration
- Frontend: suspend / resume callbacks
- Backend: does the actual work
- IP and MAC addresses move with the OS
- Lost or out-of-order IP packets are handled by the TCP layer
2. Background (cont.)
- Direct I/O
- Backend module acts as a proxy
- Creates virtual access points
- Coordinates access among VMs
3. Challenges
3. Challenges (cont.)
- Location-dependent resources
- Opaque handles -> HCA resources
- Migration -> invalid handles
- IB port address (LID)
- A LID can be shared by VMs
- It cannot be changed during migration
- QPNs, CQNs
3. Challenges (cont.)
- User-level communication
- Applications cache opaque handles
- Memory keys and QPNs can be cached anywhere
- RDMA needs some handles to be cached in remote peers
- Hard to suspend communication from the kernel
- User-level direct communication
3. Challenges (cont.)
- Hardware-managed connection state
- Hardware stores connection state information
- No OS stack processing, good performance
- No easy way to migrate hardware connection state
- The hardware cannot recover packets dropped during migration
- Dropped or out-of-order packets may cause fatal errors
4. Design
4. Design (cont.)
- Location-dependent resources
- Opaque handles
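Namespace virtualization for opaque handles can be pictured as a translation table: the application keeps a stable virtual handle, and only the table entry changes when resources are reallocated on the destination host. The following is a minimal sketch under that assumption; `HandleTable`, `rebind`, and `demo_rebind` are hypothetical names, and the handle values are made up for illustration.

```python
import itertools


class HandleTable:
    """Maps stable virtual handles to whatever the current HCA handed out."""

    def __init__(self):
        self._next = itertools.count(1)
        self._physical = {}

    def register(self, physical_handle):
        # The application only ever sees the virtual handle returned here.
        virtual = next(self._next)
        self._physical[virtual] = physical_handle
        return virtual

    def translate(self, virtual):
        # Called on every operation, just before talking to the adapter.
        return self._physical[virtual]

    def rebind(self, allocate):
        # After migration: reallocate each resource on the new adapter and
        # update the mapping in place; virtual handles stay unchanged.
        for virtual in self._physical:
            self._physical[virtual] = allocate(virtual)


def demo_rebind():
    table = HandleTable()
    qp = table.register(0x1A)             # handle from the source host
    before = table.translate(qp)
    table.rebind(lambda v: 0x900 + v)     # destination host allocates anew
    after = table.translate(qp)
    return qp, before, after
```

The cached value `qp` held by the application remains valid across `rebind`, which is the property the slide's "opaque handles" design is after.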
4. Design (cont.)
- Location-dependent resources
4. Design (cont.)
- Location-dependent resources
- Memory keys
4. Design (cont.)
- User-level communication
- Communication is intercepted in the library
- Example
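One way to picture interception in the library is a thin wrapper around the send call that defers work while communication is suspended. This is not the slide's original example; it is a hedged sketch in which `SuspendableQP` and `hw_post` are hypothetical names, and deferring (rather than rejecting) requests is an assumption.

```python
class SuspendableQP:
    """Library-level wrapper that intercepts sends before they reach the HCA."""

    def __init__(self, hw_post):
        self._hw_post = hw_post      # callable that really posts the request
        self._suspended = False
        self._deferred = []

    def post_send(self, wr):
        # While suspended, hold the request in the library instead of
        # handing it to the hardware. Returns True if posted immediately.
        if self._suspended:
            self._deferred.append(wr)
            return False
        self._hw_post(wr)
        return True

    def suspend(self):
        self._suspended = True

    def resume(self):
        # Replay deferred requests in their original order.
        self._suspended = False
        pending, self._deferred = self._deferred, []
        for wr in pending:
            self._hw_post(wr)


def demo_intercept():
    sent = []
    qp = SuspendableQP(sent.append)
    first = qp.post_send("a")
    qp.suspend()
    second = qp.post_send("b")
    during = list(sent)
    qp.resume()
    return first, second, during, sent
```

Because the application calls the wrapper rather than the hardware, suspension stays invisible to it apart from added latency.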
4. Design (cont.)
- Connection state
- Method: bring the connection (QP) state to a deterministic state
- 1. Mark all QPs suspended -> no in-flight packets originating from this VM
- 2. Send a suspend request to all connected VMs
- 3.
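The first two steps can be sketched as a small simulation; the third step is cut off in the source and is deliberately not modeled. `VM`, `connect`, `begin_suspend`, and `on_suspend_request` are hypothetical names, and the peer's reaction (marking its own QP toward the migrating VM as suspended) is an assumption made for illustration.

```python
class VM:
    """Toy VM that tracks one QP state per connected peer."""

    def __init__(self, name):
        self.name = name
        self.qps = {}    # peer name -> "active" or "suspended"

    def on_suspend_request(self, sender):
        # Assumed peer behavior: stop sending toward the migrating VM.
        self.qps[sender] = "suspended"


def connect(a, b):
    a.qps[b.name] = "active"
    b.qps[a.name] = "active"


def begin_suspend(vm, cluster):
    # Step 1: mark every local QP suspended so no new packets originate here.
    for peer in vm.qps:
        vm.qps[peer] = "suspended"
    # Step 2: send a suspend request to every connected VM.
    notified = []
    for peer in vm.qps:
        cluster[peer].on_suspend_request(vm.name)
        notified.append(peer)
    return notified


def demo_suspend():
    a, b, c = VM("a"), VM("b"), VM("c")
    cluster = {vm.name: vm for vm in (a, b, c)}
    connect(a, b)
    connect(a, c)
    connect(b, c)
    notified = begin_suspend(a, cluster)
    return notified, a.qps, b.qps, c.qps
```

Connections that do not involve the migrating VM (here, between `b` and `c`) stay active in this sketch.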
4. Design (cont.)
- Unreliable Datagram (UD) service
- Only the address data structure needs to be handled
5. Implementation
5. Implementation (cont.)
- Migration
- Optimization
- Active vs. inactive QPs
6. Evaluation
- Setup
- Benchmarks
- IB verbs-layer micro-benchmarks
- HPC benchmarks: NAS Parallel Benchmarks (NPB) over MVAPICH (an MPI implementation over InfiniBand)
7. Related Work
- OS bypass
- Active Messages, U-Net, FM, VMMC, Arsenic
- VMM I/O
- VMware Workstation, ESX, Xen
- Migration over Ethernet
- Xen carries the IP address and sends an unsolicited ARP
- Process-level migration
- Zap, VNAT
- Mobile IP
Q&A