Title: Chapter 8 System Virtual Machines
System VMs
- 2005.11.9
- Dong In Shin
- Distributed Computing System Laboratory
- Seoul National Univ.
Performance Enhancement of System Virtual Machines
Reasons for Performance Degradation
- Setup
- Emulation
  - Some guest instructions need to be emulated (usually via interpretation) by the VMM.
- Interrupt handling
- State saving
- Bookkeeping
  - Ex. The accounting of time charged to a user
- Time elongation
Instruction Emulation Assists
- The VMM emulates a privileged instruction using a routine whose operation depends on whether the virtual machine is supposed to be executing in system mode or in user mode.
- A hardware assist can check this virtual state and perform the corresponding actions itself.
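The mode-dependent emulation routine described above can be sketched as follows. This is a minimal illustration, not any particular VMM's code: the `GuestState` class and the instruction names `SET_TIMER` and `SET_PSW_KEY` are hypothetical, and the policy of reflecting a privileged-operation fault to the guest in virtual user mode is an assumption that mirrors what real hardware would do.

```python
from dataclasses import dataclass, field

@dataclass
class GuestState:
    virtual_mode: str                      # "system" or "user": the mode the guest believes it is in
    regs: dict = field(default_factory=dict)
    pending_faults: list = field(default_factory=list)

def emulate_privileged(guest: GuestState, instr: str, operand: int) -> None:
    """VMM trap handler for a privileged instruction (the guest really runs in user mode)."""
    if guest.virtual_mode == "user":
        # A real machine in user mode would raise a privileged-operation exception,
        # so the VMM reflects that fault to the guest OS instead of executing it.
        guest.pending_faults.append(("privileged_operation", instr))
        return
    # Virtual system mode: apply the instruction's effect to the guest's virtual state.
    if instr == "SET_TIMER":
        guest.regs["timer"] = operand
    elif instr == "SET_PSW_KEY":
        guest.regs["psw_key"] = operand
    else:
        raise NotImplementedError(f"no emulation routine for {instr}")
```

A hardware assist moves this mode check and the common state updates into the processor, so the trap into the VMM software can be avoided for those cases.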
Virtual Machine Monitor Assists
- Context switch
  - Using hardware to save and restore registers
- Decoding of privileged instructions
  - Hardware decodes the privileged instructions so the VMM does not have to do it in software.
- Virtual interval timer
  - Decrementing the virtual counter by some amount estimated by the VMM from the amount that the real timer decrements
- Adding to the instruction set
  - A number of new instructions that are not part of the machine's ISA
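The virtual interval timer can be sketched as a per-guest counter that the VMM decrements whenever the real timer moves. The class name and the `scale` factor (a stand-in for however the VMM estimates the guest's share of the elapsed real time) are assumptions for illustration only.

```python
class VirtualIntervalTimer:
    """Per-guest virtual interval timer maintained by the VMM (illustrative sketch)."""

    def __init__(self, initial: int, scale: float = 1.0):
        self.value = initial   # guest-visible counter
        self.scale = scale     # hypothetical estimate of the guest's share of real elapsed time

    def on_real_timer_decrement(self, real_delta: int) -> bool:
        """Charge the guest for real_delta real-timer units; return True if the virtual timer expires."""
        estimate = round(real_delta * self.scale)
        self.value -= estimate
        if self.value <= 0:
            self.value = 0
            return True        # the VMM should now deliver a timer interrupt to the guest
        return False
```

For example, with `scale=0.5` a guest timer set to 100 survives one real decrement of 100 (dropping to 50) and expires on the second.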
Improving Performance of the Guest System
- Non-paged mode
  - The guest OS disables dynamic address translation and defines its real address space to be as large as the largest virtual address space, so page frames are mapped to fixed real pages.
  - The guest OS no longer has to exercise demand paging.
  - No double paging
  - No potential conflict in paging decisions by the guest OS and the VMM
Double Paging
- Two independent layers of paging interact and perform poorly.
  - The guest OS incorrectly believes a page to be in physical memory (green/gold pages).
  - The VMM believes an unneeded page is still in use (teal pages).
  - The guest evicts a page despite available physical memory (red pages).
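The first anomaly can be reproduced with two independent LRU policies, one in the guest and one in the VMM. This is a simplified sketch under stated assumptions: both layers use LRU, the guest's page-out traffic is not modeled, and "hidden fault" is a label introduced here for an access the guest thinks is resident but the VMM has paged out.

```python
from collections import OrderedDict

class LRU:
    """Minimal LRU set of resident pages."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()

    def touch(self, page):
        """Access page; return (hit, evicted_page_or_None)."""
        if page in self.pages:
            self.pages.move_to_end(page)
            return True, None
        evicted = None
        if len(self.pages) >= self.capacity:
            evicted, _ = self.pages.popitem(last=False)
        self.pages[page] = True
        return False, evicted

def simulate(trace, guest_frames, real_frames):
    """Return (guest page faults, hidden faults where guest hit but the VMM missed)."""
    guest = LRU(guest_frames)   # guest OS: which virtual pages occupy guest-physical frames
    vmm = LRU(real_frames)      # VMM: which guest-physical frames are in real memory
    frame_of = {}               # virtual page -> guest-physical frame
    free = list(range(guest_frames))
    guest_faults = hidden_faults = 0
    for vpage in trace:
        hit, evicted = guest.touch(vpage)
        if not hit:
            guest_faults += 1
            # Reuse the evicted page's frame, or take a free one.
            frame_of[vpage] = frame_of.pop(evicted) if evicted is not None else free.pop(0)
        frame_hit, _ = vmm.touch(frame_of[vpage])
        if hit and not frame_hit:
            hidden_faults += 1  # guest believed the page was in memory; VMM had paged it out
    return guest_faults, hidden_faults
```

With a cyclic trace `[0, 1, 2, 0, 1, 2]`, three guest frames and only two real frames, every guest "hit" turns into a VMM fault: `simulate(...)` returns `(3, 3)`, while three real frames give `(3, 0)`.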
Pseudo-page-fault handling
- A page fault in a VM system is either
  - a page fault in some VM's page table, or
  - a page fault in the VMM's page table.
- Pseudo-page-fault handling process
  - The VMM initiates a page-in operation from backing store.
  - The VMM triggers a pseudo page fault in the guest.
  - The guest OS suspends the guest's user process.
  - The VMM does not suspend the guest, so the guest OS can keep running other processes.
  - On completion of the page-in operation:
    - The VMM calls the guest's pseudo-page-fault handler again.
    - The guest OS handler wakes up the blocked user process.
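The two-phase protocol above can be sketched as a pair of cooperating objects. All class and method names here are hypothetical; the point is that only the faulting guest process blocks, while the guest as a whole keeps running.

```python
class GuestOS:
    def __init__(self, processes):
        self.ready = list(processes)
        self.blocked = {}   # guest page -> process waiting for it

    def pseudo_page_fault(self, page, process, completed):
        if not completed:
            # First call: block only the faulting process; other processes stay runnable.
            self.ready.remove(process)
            self.blocked[page] = process
        else:
            # Second call, after the page-in completes: wake the blocked process.
            self.ready.append(self.blocked.pop(page))

class VMM:
    def __init__(self, guest):
        self.guest = guest
        self.pending = []   # page-in operations in progress

    def on_host_page_fault(self, page, process):
        self.pending.append((page, process))   # start page-in from backing store
        self.guest.pseudo_page_fault(page, process, completed=False)
        # The VMM does NOT suspend the guest virtual machine here.

    def on_page_in_complete(self, page):
        self.pending = [p for p in self.pending if p[0] != page]
        self.guest.pseudo_page_fault(page, None, completed=True)
```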
Other Techniques
- Spool files
  - Without any special mechanism, the VMM must intercept the I/O commands and work out that the virtual machines are simultaneously attempting to send a job to the I/O device.
  - Handshaking allows the VMM to pick up the spool file and merge it into its own buffer.
- Inter-virtual-machine communication
  - Communication between two physical machines involves processing message packets through several layers at the sender and receiver sides.
  - This process can be streamlined, simplified, and made faster if the two machines are virtual machines on the same host platform.
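A minimal sketch of the streamlined path, assuming a hypothetical VMM that keeps an inbox per co-resident VM: a message between two VMs on the same host is handed over directly instead of passing through the protocol layers of both sides.

```python
from collections import deque

class HostVMM:
    """Hypothetical VMM that short-circuits messages between co-resident VMs."""
    def __init__(self):
        self.inboxes = {}        # vm_id -> deque of (sender, payload)
        self.network_sends = 0   # messages that had to leave the host

    def register(self, vm_id):
        self.inboxes[vm_id] = deque()

    def send(self, src, dst, payload: bytes):
        if dst in self.inboxes:
            # Same host: deliver straight to the receiver, skipping packetization
            # and protocol-stack processing on both sender and receiver sides.
            self.inboxes[dst].append((src, bytes(payload)))
        else:
            self.network_sends += 1   # placeholder for the normal network path

    def receive(self, vm_id):
        return self.inboxes[vm_id].popleft() if self.inboxes[vm_id] else None
```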
Specialized Systems
- Virtual-equals-real (VR) virtual machine
  - The host address space representing the guest real memory is mapped one-to-one to the host real memory address space.
- Shadow-table bypass assist
  - The guest page tables can point directly to physical addresses if the dynamic address translation hardware is allowed to manipulate the guest page tables.
- Preferred-machine assist
  - Allows a guest OS to operate in system mode rather than user mode.
- Segment sharing
  - Sharing the code segments of the operating system among the virtual machines, provided the operating system code is written in a reentrant manner.
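The two-level translation behind these assists can be sketched as below. The 4 KB page size, the dictionary page tables, and the `relocated_map` offset are illustrative assumptions. Under a VR mapping the composed (shadow) table is identical to the guest's own table, which is why the guest tables can point directly at physical addresses.

```python
PAGE_SIZE = 4096

def translate(vaddr, guest_pt, host_map):
    """Two-level translation: guest virtual -> guest real -> host real."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    guest_real_page = guest_pt[vpn]             # guest OS page table
    host_real_page = host_map(guest_real_page)  # VMM's mapping of guest real memory
    return host_real_page * PAGE_SIZE + offset

def build_shadow(guest_pt, host_map):
    """Shadow page table: both levels composed so the hardware does one lookup."""
    return {vpn: host_map(grp) for vpn, grp in guest_pt.items()}

def vr_map(guest_real_page):
    """Virtual-equals-real guest: guest real page n is host real page n."""
    return guest_real_page

def relocated_map(guest_real_page):
    """Ordinary guest whose real memory starts at host page 100 (illustrative)."""
    return guest_real_page + 100
```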
Generalized Support for Virtual Machines
- Interpretive Execution Facility (IEF)
  - The processor directly executes most of the functions of the virtual machine in hardware.
  - An extreme case of a VM assist
- Interpretive execution entry and exit
  - Entry
    - Start Interpretive Execution (SIE): the software gives up control to the hardware IEF, and the processor enters interpretive execution mode.
  - Exit
    - Host interrupt
    - Interception
      - Unsupported hardware instructions
      - Exception during the execution of an interpreted instruction
      - Some special cases
Interpretive Execution Entry and Exit
[Figure: the VMM software issues SIE to enter interpretive execution mode. The guest leaves that mode either by an exit for interception, which the VMM handles by emulation, or by an exit for a host interrupt, which goes to the host interrupt handler.]
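The entry/exit cycle in the figure amounts to a dispatch loop in the VMM. In this sketch `sie` is a hypothetical callable standing in for the SIE instruction; it returns an exit reason plus details, and the VMM either emulates an intercepted operation or services a host interrupt before re-entering.

```python
from enum import Enum, auto

class ExitReason(Enum):
    INTERCEPTION = auto()     # e.g. unsupported instruction or exception during interpretation
    HOST_INTERRUPT = auto()

def run_guest(sie, emulate, handle_host_interrupt, max_entries=1000):
    """VMM loop around a hypothetical `sie()` that runs the guest until the next exit."""
    for _ in range(max_entries):
        reason, info = sie()              # enter interpretive execution mode
        if reason is ExitReason.INTERCEPTION:
            if not emulate(info):         # emulate() returns False when the guest should stop
                return "halted"
        else:
            handle_host_interrupt(info)   # then re-enter the guest on the next iteration
    return "entry limit reached"
```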
Full-virtualization Versus Para-virtualization
- Full virtualization
  - Provides a total abstraction of the underlying physical system and creates a complete virtual system in which the guest operating systems can execute.
  - No modification is required in the guest OS or application.
  - The guest OS or application is not aware of the virtualized environment.
  - Advantages
    - Streamlines the migration of applications and workloads between different physical systems.
    - Complete isolation of different applications, which makes this approach highly secure.
  - Disadvantages
    - Performance penalty
  - Examples: Microsoft Virtual Server and VMware ESX Server
Full-virtualization Versus Para-virtualization
- Para-virtualization
  - A virtualization technique that presents a software interface to virtual machines that is similar but not identical to that of the underlying hardware.
  - This technique requires modifications to the guest OSs running on the VMs.
  - The guest OSs are aware that they are executing on a VM.
  - Advantages
    - Near-native performance
  - Disadvantages
    - Some limitations, including several insecurities such as the guest OS cache data, unauthenticated connections, and so forth.
  - Example: the Xen system
Case Study: VMware Virtual Platform
VMware Virtual Platform
- A popular virtual machine infrastructure for IA-32-based PCs and servers
- An example of a hosted virtual machine system
- The native virtualization architecture product is VMware ESX Server.
- This book is limited to the hosted system, VMware GSX Server (VMware 2001).
- Challenges
  - The difficulty of virtualizing the IA-32 environment efficiently
  - The openness of the system architecture
  - Easy installation
VMware's Hosted Virtual Machine Model
Processor Virtualization
- Critical instructions in the Intel IA-32 architecture
  - Not efficiently virtualizable
  - Protection system references
    - Reference the storage protection system, memory system, or address relocation system (ex. mov ax, cs).
  - Sensitive register instructions
    - Read or change resource-related registers and memory locations (ex. POPF).
- Problems
  - Sensitive instructions executed in user mode do not behave as expected unless they are emulated. For example, POPF executed at user level silently ignores a change to the interrupt-enable flag instead of trapping to the VMM.
- Solutions
  - The VM monitor substitutes the instruction with another sequence of instructions that emulates the action of the original code.
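The substitution idea can be sketched on a toy instruction stream. The tuple encoding, the `SENSITIVE` set (a small illustrative subset, not a complete IA-32 list), and the `CALL_VMM` call-out are assumptions; the POPF emulation tracks the guest's *virtual* interrupt flag (EFLAGS bit 9), which is exactly the state a real user-mode POPF would silently drop.

```python
SENSITIVE = {"POPF", "PUSHF", "MOV_FROM_CS"}   # illustrative subset only
EFLAGS_IF = 1 << 9                              # interrupt-enable flag in EFLAGS

def translate_block(block):
    """Replace sensitive instructions with call-outs to VMM emulation routines."""
    out = []
    for op, *args in block:
        if op in SENSITIVE:
            out.append(("CALL_VMM", op, *args))   # hypothetical emulation call-out
        else:
            out.append((op, *args))               # innocuous: runs natively
    return out

def emulate_popf(vcpu: dict, value: int) -> None:
    """Emulated POPF: update the guest's virtual EFLAGS, including its virtual IF."""
    vcpu["eflags"] = value
    vcpu["virtual_if"] = bool(value & EFLAGS_IF)
```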
Input/Output Virtualization
- The PC platform supports many more devices and types of devices than any other platform.
- Emulation in the VMMonitor
  - Converting the IN and OUT I/O instructions into new I/O instructions
  - Requires some knowledge of the device interfaces
- New capability for devices through an abstraction layer
  - The VMApp's ability to insert a layer of abstraction above the physical device
  - Advantages
    - Reduces performance losses due to virtualization.
  - Ex) A virtual Ethernet switch between a virtual NIC and a physical NIC
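A virtual Ethernet switch of the kind mentioned above can be sketched as a simple learning switch. The port names and frame tuples are assumptions; the switch learns which port each source MAC address lives on and floods frames whose destination it has not yet seen.

```python
class VirtualSwitch:
    """Hypothetical learning switch joining virtual NICs and a physical NIC uplink."""
    def __init__(self):
        self.ports = {}       # port name -> list of delivered frames
        self.mac_table = {}   # MAC address -> port name

    def add_port(self, name):
        self.ports[name] = []

    def forward(self, in_port, src_mac, dst_mac, payload):
        self.mac_table[src_mac] = in_port            # learn where the sender lives
        out = self.mac_table.get(dst_mac)
        targets = [out] if out is not None else list(self.ports)   # flood unknown destinations
        delivered = [p for p in targets if p != in_port]           # never echo to the sender
        for p in delivered:
            self.ports[p].append((src_mac, dst_mac, payload))
        return delivered
```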
Using the Services of the Host Operating System
- The request is converted into a host OS call.
- Advantages
  - No limitations on the VMM's access to the host OS's I/O features
  - Running performance-critical applications
Memory Virtualization
- Paging requests of the guest OS
  - Not directly intercepted by the VMM, but converted into disk reads/writes.
  - The VMMonitor translates them into requests on the host OS through the VMApp.
- Page replacement policy of the host OS
  - The host could replace critical pages of the VM system while competing with other host applications.
  - The VMDriver pins critical pages in the virtual memory system.
VMware ESX Server
- Native VM
  - A thin software layer designed to multiplex hardware resources among virtual machines
  - Provides higher I/O performance and complete control over resource management
- Full virtualization
  - For servers running multiple instances of unmodified operating systems
Page Replacement Issues
- Problem of double paging
  - Unintended interactions between the native memory management policies of the guest operating systems and the host system
- Ballooning
  - Reclaims the pages considered least valuable by the operating system running in a virtual machine.
  - A small balloon module is loaded into the guest OS as a pseudo-device driver or kernel service.
  - The module communicates with the ESX Server via a private channel.
Ballooning in VMware ESX Server
- Inflating a balloon
  - Happens when the server wants to reclaim memory.
  - The driver allocates pinned physical pages within the VM.
  - This increases memory pressure in the guest OS, which reclaims space to satisfy the driver's allocation request.
  - The driver communicates the physical page number of each allocated page to the ESX Server.
- Deflating
  - Frees up memory for general use within the guest OS.
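The inflate/deflate cycle can be sketched as follows. `GuestMemory`, `BalloonDriver`, and the `notify_host` callback (standing in for the private channel to the ESX Server) are hypothetical names. The key point is that the guest's own policy chooses which page to give up when the balloon creates memory pressure; writing that victim to the guest's swap is omitted here.

```python
class GuestMemory:
    """Toy guest OS memory: free pages plus pages it could reclaim under pressure."""
    def __init__(self, free_pages, reclaimable_pages):
        self.free = list(free_pages)
        self.reclaimable = list(reclaimable_pages)   # least valuable first, per guest policy

    def alloc_pinned(self):
        if not self.free and self.reclaimable:
            # Memory pressure: the guest reclaims its least valuable page (page-out omitted).
            self.free.append(self.reclaimable.pop(0))
        return self.free.pop() if self.free else None

class BalloonDriver:
    def __init__(self, guest, notify_host):
        self.guest = guest
        self.notify_host = notify_host   # hypothetical private channel to the server
        self.balloon = []                # physical page numbers held by the balloon

    def inflate(self, n):
        for _ in range(n):
            ppn = self.guest.alloc_pinned()
            if ppn is None:
                break
            self.balloon.append(ppn)
            self.notify_host("reclaim", ppn)   # server may reuse the backing machine page

    def deflate(self, n):
        for _ in range(min(n, len(self.balloon))):
            ppn = self.balloon.pop()
            self.notify_host("restore", ppn)
            self.guest.free.append(ppn)        # page returns to general use in the guest
```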
Virtualizing I/O Devices on VMware Workstation
- Virtual devices supported by VMware
  - PS/2 keyboard, PS/2 mouse, floppy drive, IDE controllers with ATA disks and ATAPI CD-ROMs, a Sound Blaster 16 sound card, serial and parallel ports, virtual BusLogic SCSI controllers, AMD PCnet Ethernet adapters, and an SVGA video controller
- Procedures
  - Intercept I/O operations issued by the guest OS (IA-32 IN and OUT).
  - Emulate them either in the VMM or in the VMApp.
- Drawbacks
  - Virtualizing I/O devices can incur overhead from world switches between the VMM and the host.
  - Handling the privileged instructions used to communicate with the hardware
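The intercept-then-emulate procedure can be sketched as a port-range dispatch table. The handler registration API, the example ports (0x60-0x64 for the keyboard controller, 0x3F8-0x3FF for COM1), and the cost of two world switches per VMApp-handled access are illustrative assumptions; the sketch also assumes unclaimed ports read as 0xFF.

```python
class PortIODispatcher:
    """Hypothetical VMM table routing intercepted IN/OUT to device emulators."""
    def __init__(self):
        self.handlers = []       # (first_port, last_port, where, handler)
        self.world_switches = 0

    def register(self, first, last, where, handler):
        self.handlers.append((first, last, where, handler))

    def on_io_exit(self, port, is_out, value=None):
        for first, last, where, handler in self.handlers:
            if first <= port <= last:
                if where == "vmapp":
                    self.world_switches += 2   # VMM -> host world and back
                return handler(port, is_out, value)
        return None if is_out else 0xFF        # assumed: unclaimed port reads as all ones
```

Devices emulated in the VMM avoid the world switch entirely, which is why the choice between VMM and VMApp emulation matters for performance.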
Case Study: The Intel VT-x (Vanderpool) Technology
Overview
- VT-x (Vanderpool) technology for IA-32 processors
  - Enhances the performance of VM implementations through hardware enhancements to the processor.
- Main feature
  - The inclusion of the new VMX mode of operation (VMX root/non-root operation)
  - VMX root operation
    - Fully privileged, intended for the VM monitor
    - New instructions: the VMX instructions
  - VMX non-root operation
    - Not fully privileged, intended for guest software
    - Reduces guest software privilege without relying on rings
Technological Overview
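VT-x Operations
[Figure: VMXON takes the processor from ordinary IA-32 operation into VMX root operation; VMLAUNCH and VMRESUME enter VMX non-root operation (one or more guests); a VM exit returns control to VMX root operation.]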
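The transitions in the figure can be captured as a small state machine. This is a toy model, not hardware semantics: it ignores VMCS contents and VMX failure codes, but it does reflect the rule that VMLAUNCH is used for a VMCS that has not yet been launched and VMRESUME for one that has.

```python
class VMX:
    """Toy model of VT-x operating modes (ignores VMCS contents and failure codes)."""

    def __init__(self):
        self.mode = "ia32"        # ordinary IA-32 operation
        self.launched = False     # launch state of the current VMCS

    def _require(self, ok, what):
        if not ok:
            raise RuntimeError(f"{what} not allowed in mode {self.mode}")

    def vmxon(self):
        self._require(self.mode == "ia32", "VMXON")
        self.mode = "root"

    def vmlaunch(self):
        # First entry into a guest whose VMCS has not been launched yet.
        self._require(self.mode == "root" and not self.launched, "VMLAUNCH")
        self.launched = True
        self.mode = "non-root"

    def vmresume(self):
        # Re-entry into a guest whose VMCS was already launched.
        self._require(self.mode == "root" and self.launched, "VMRESUME")
        self.mode = "non-root"

    def vm_exit(self, reason):
        # Triggered by guest activity (e.g. a sensitive instruction), not issued by the VMM.
        self._require(self.mode == "non-root", "VM exit")
        self.mode = "root"
        return reason
```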
Capabilities of the Technology
- A key aspect
  - Eliminates the need to run all guest code in user mode.
- Maintenance of state information
  - A major source of overhead in a software-based solution
  - A hardware technique allows all of the state-holding data elements to be mapped to their native structures.
  - VMCS (Virtual Machine Control Structure)
    - The hardware implementation takes over the tasks of loading and unloading the state from its physical locations.
Virtual Machine Control Structure (VMCS)
- Control structures in memory
  - Only one VMCS is active per virtual processor at any given time.
- VMCS payload
  - VM-execution, VM-exit, and VM-entry controls
  - Guest and host state
  - VM-exit information fields
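The payload list above can be sketched as a data structure, with the state movement the hardware performs on entry and exit. Field names are simplified assumptions (the real VMCS uses encoded fields accessed with VMREAD/VMWRITE); the model follows the rule that the VMM fills in the host-state area beforehand, VM entry loads guest state, and VM exit saves guest state, records the exit reason, and loads host state.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class VMCS:
    execution_controls: dict = field(default_factory=dict)   # which guest events cause exits
    exit_controls: dict = field(default_factory=dict)
    entry_controls: dict = field(default_factory=dict)
    guest_state: dict = field(default_factory=dict)
    host_state: dict = field(default_factory=dict)            # written by the VMM before entry
    exit_reason: Optional[str] = None                         # VM-exit information

def vm_entry(cpu: dict, vmcs: VMCS) -> None:
    """Hardware loads the guest-state area into the processor."""
    cpu.clear()
    cpu.update(vmcs.guest_state)

def vm_exit(cpu: dict, vmcs: VMCS, reason: str) -> None:
    """Hardware saves guest state, records the reason, and loads the host-state area."""
    vmcs.guest_state = dict(cpu)
    vmcs.exit_reason = reason
    cpu.clear()
    cpu.update(vmcs.host_state)
```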
Case Study: Xen Virtualization
Xen Design Principles
- Support for unmodified application binaries is essential.
- Supporting full multi-application operating systems is important.
- Paravirtualization is necessary to obtain high performance and strong resource isolation.
Xen Features
- Secure isolation between VMs
- Resource control and QoS
- Only the guest kernel needs to be ported
  - All user-level apps and libraries run unmodified.
  - Linux 2.4/2.6, NetBSD, FreeBSD, WinXP
- Execution performance is close to native.
- Live migration of VMs between Xen nodes
Xen 3.0 Architecture
Xen Para-virtualization
- Arch Xen/x86: privileged instructions are replaced with Xen hypercalls.
- Control transfer between domains and Xen
  - Hypercalls: synchronous calls from a domain into Xen
  - Events: notifications are delivered to domains from Xen using an asynchronous event mechanism.
- Modify the OS to understand the virtualized environment
  - Wall-clock time vs. virtual processor time
    - Xen provides both types of alarm timer.
  - Expose real resource availability
- Xen hypervisor
  - An additional protection domain between guest OSes and I/O devices
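The split between synchronous hypercalls and asynchronous events can be sketched as below. The `set_timer` hypercall name, the -1 error code, and the `pending_events` set are hypothetical simplifications, not Xen's real hypercall interface: the guest kernel calls into the hypervisor instead of executing a privileged timer instruction, and the hypervisor later marks an event pending for the domain.

```python
class Domain:
    def __init__(self, name):
        self.name = name
        self.pending_events = set()   # events the guest will notice asynchronously

class Hypervisor:
    """Toy hypervisor: synchronous hypercalls in, asynchronous events out."""
    def __init__(self):
        self.table = {"set_timer": self._set_timer}   # hypothetical hypercall table
        self.timers = {}

    def hypercall(self, domain, name, *args):
        handler = self.table.get(name)
        if handler is None:
            return -1                 # unknown hypercall: error code returned to the guest
        return handler(domain, *args)

    def _set_timer(self, domain, deadline):
        self.timers[domain.name] = deadline
        return 0

    def tick(self, now, domains):
        for d in domains:
            deadline = self.timers.get(d.name)
            if deadline is not None and now >= deadline:
                del self.timers[d.name]
                d.pending_events.add("timer")   # asynchronous notification to the domain
```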
x86 Processor Virtualization
- Xen runs in ring 0 (most privileged).
- Rings 1 and 2 are used for the guest OS, ring 3 for user space.
- Xen lives in the top 64 MB of the linear address space.
  - Segmentation is used to protect Xen, because switching page tables is too slow on standard x86.
- Hypercalls jump to Xen in ring 0.
- The guest OS may install a "fast trap" handler.
- MMU virtualization: shadow vs. direct mode
Para-virtualizing the MMU
- Guest OSes allocate and manage their own page tables.
  - Hypercalls are used to change the page-table base.
- The Xen hypervisor is responsible for trapping accesses to the virtual page table, validating updates, and propagating changes.
- Xen must validate page table updates before use.
  - Updates may be queued and batch-processed.
  - Validation rules are applied to each PTE.
    - The guest may only map pages it owns.
- XenoLinux implements a balloon driver.
  - Adjusts a domain's memory usage by passing memory pages back and forth between Xen and XenoLinux.
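Batched validation can be sketched as a check over a whole queue of PTE updates before any is applied. The sketch assumes a 32-bit non-PAE PTE layout (frame number in bits 12 and up, present bit 0, read/write bit 1) and an ownership map; besides the "guest may only map pages it owns" rule from the slide, it adds, as an assumption modeled on Xen's direct mode, the rule that page-table pages may not be mapped writable.

```python
PAGE_PRESENT = 0x1
PAGE_RW = 0x2

def validate_and_apply(updates, page_table, owner, domain, pt_frames):
    """Apply a batch of (index, pte) updates only if every update passes validation."""
    for _index, pte in updates:
        frame = pte >> 12
        if pte & PAGE_PRESENT:
            if owner.get(frame) != domain:
                return False      # guest may only map pages it owns
            if frame in pt_frames and pte & PAGE_RW:
                return False      # assumed rule: no writable mappings of page-table pages
    for index, pte in updates:
        page_table[index] = pte   # all checks passed: propagate the whole batch
    return True
```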
MMU Virtualization
Writable Page Tables
I/O Architecture
- Asynchronous buffer descriptor rings
  - Using shared memory
- Xen I/O spaces delegate to guest OSes protected access to specified hardware devices.
- The guest OS passes buffer information vertically through the system.
  - Xen performs validation checks.
- Xen supports a lightweight event-delivery mechanism, which is used for sending asynchronous notifications to a domain.
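A descriptor ring of this kind can be sketched with free-running producer and consumer counters, where a slot index is the counter modulo the ring size. This simplified version models only the request direction; Xen's actual rings also carry responses and live in a page shared between domains, which a Python list only stands in for.

```python
class DescriptorRing:
    """Fixed-size ring of buffer descriptors (conceptually in shared memory)."""
    def __init__(self, size):
        self.size = size
        self.slots = [None] * size
        self.req_prod = 0    # advanced by the guest (request producer)
        self.req_cons = 0    # advanced by the backend (request consumer)

    def put_request(self, desc):
        if self.req_prod - self.req_cons == self.size:
            return False     # ring full: producer must wait for the consumer
        self.slots[self.req_prod % self.size] = desc
        self.req_prod += 1
        return True

    def get_request(self):
        if self.req_cons == self.req_prod:
            return None      # ring empty
        desc = self.slots[self.req_cons % self.size]
        self.req_cons += 1
        return desc
```

Because each side only advances its own counter, producer and consumer can run asynchronously, with an event notification telling the other side that new entries are available.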
Data Transfer: I/O Descriptor Rings
Device Channel Interface
Performance
Thank You!