Title: Virtual Machines Background
1 Virtual Machines Background
- Adapted from Silberschatz
2 Virtual Machines
- A virtual machine takes the layered approach to its logical conclusion. It treats the hardware and the operating system kernel as though they were all hardware.
- A virtual machine provides an interface identical to the underlying bare hardware.
- For example, the operating system creates the illusion of multiple processes, each executing on its own processor with its own (virtual) memory.
3 Virtual Machines (Cont.)
- The resources of the physical computer are shared to create the virtual machines.
- CPU scheduling can create the appearance that users have their own processor.
- Spooling and a file system can provide virtual card readers and virtual line printers.
- A normal user time-sharing terminal serves as the virtual machine operator's console.
4 System Models
[Figure: non-virtual machine vs. virtual machine]
5 Advantages/Disadvantages of Virtual Machines
- The virtual-machine concept provides complete protection of system resources, since each virtual machine is isolated from all other virtual machines. What might be bad about this?
- This isolation, however, permits no direct sharing of resources.
- A virtual-machine system is a perfect vehicle for operating-systems research and development. System development is done on the virtual machine instead of on a physical machine, so it does not disrupt normal system operation.
- The virtual-machine concept is difficult to implement due to the effort required to provide an exact duplicate of the underlying machine.
6 Java Virtual Machine
- Compiled Java programs are platform-neutral bytecodes executed by a Java Virtual Machine (JVM).
- The JVM consists of:
  - class loader
  - class verifier
  - runtime interpreter
- Just-In-Time (JIT) compilers increase performance.
7 Java Virtual Machine
8 An Overview of Virtual Machine Architectures
9 Definitions
- Instruction Set Architecture (ISA)
- Precise specification of the interface between hardware and software.
- Application Binary Interface (ABI)
- Defines how an application can work with a platform at the binary level. (Contrast with API.)
- Includes the user ISA, the system call interface, etc.
- Suppose an ABI is changed.
- Recompile?
- Source changes?
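10 Virtualization
[Figure: a conventional application / OS / ISA / hardware stack compared with a virtualized stack, in which a guest application and OS run on a virtual ISA provided by the VMM (the virtual machine), which in turn runs on the host's ISA and hardware.]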
- VMM also known as hypervisor.
11 Virtual Machine Uses
- Emulation
- One ISA can be used to emulate another.
- Provides cross-platform portability.
- Optimization
- Emulators can optimize as they emulate.
- Also can optimize same ISA to same ISA.
- Replication
- A single physical machine can be replicated, providing isolation between the VMs.
- Composition
- Two virtual machines can be composed, combining
the functionality of each.
12 Process vs. System
- The meaning of "machine" depends on perspective.
- To a process, the machine is the system calls, libraries, etc.
- Already abstract.
- The entire system also runs on a machine.
- Includes the ISA, actual devices, etc.
- Other kinds of machines?
- As there are two perspectives, there are two kinds of virtual machines: process and system.
- A process virtual machine can support an individual process.
- A system virtual machine can run a complete OS plus its environment.
13 Process vs. System
[Figure: system VM vs. process VM. Labels: native apps, Win32 apps, Java programs, Windows, Linux, Java VM, VMM, x86.]
- Examples?
14 Process VMs
- Multiprogramming
- A process has the illusion of having the whole machine to itself.
- Emulation
- Interpreted. (Define.)
- Translated. (Define.)
- What are the relative merits?
- Dynamic optimizers
- Especially useful with some kind of profile-directed translation.
- High-Level Language VMs
- A high-level language is compiled to an intermediate language.
- The VM then runs the intermediate language.
- Example: Java. Interpreted or translated?
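To make "interpreted" and "translated" concrete, here is a minimal sketch for a made-up stack bytecode (the PUSH/ADD/MUL opcodes are hypothetical, not real JVM instructions). The interpreter decodes every instruction on every run; the translator decodes once into host-level closures that can be re-run without decoding, which is the idea a JIT compiler pushes further by emitting native code.

```python
from typing import Callable, List, Tuple

# Hypothetical stack bytecode: a list of (opcode, operand) pairs.
Program = List[Tuple[str, int]]


def interpret(program: Program) -> int:
    """Decode and dispatch every instruction each time the program runs."""
    stack: List[int] = []
    for op, arg in program:
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode {op}")
    return stack.pop()


def translate(program: Program) -> Callable[[], int]:
    """Decode once, producing host-level code (closures) to run later."""
    steps: List[Callable[[List[int]], None]] = []
    for op, arg in program:
        if op == "PUSH":
            steps.append(lambda s, v=arg: s.append(v))  # bind operand now
        elif op == "ADD":
            steps.append(lambda s: s.append(s.pop() + s.pop()))
        elif op == "MUL":
            steps.append(lambda s: s.append(s.pop() * s.pop()))
        else:
            raise ValueError(f"unknown opcode {op}")

    def run() -> int:
        stack: List[int] = []
        for step in steps:  # no decoding at run time
            step(stack)
        return stack.pop()

    return run
```

For example, `[("PUSH", 2), ("PUSH", 3), ("ADD", 0), ("PUSH", 4), ("MUL", 0)]` computes (2 + 3) * 4 either way. Translation pays an up-front decoding cost that is repaid when the same code runs many times.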
15 System VMs
- Same ISA
- Classic (Define. Pros/cons?)
- VMM built directly on top of the hardware.
- Most efficient, but requires wiping the slate clean.
- Requires device drivers in the VMM.
- Hosted (Define. Pros/cons?)
- VMM built on top of an existing OS.
- Most convenient.
- Device drivers supplied by the host OS; the VMM uses facilities provided by the host OS.
- Different ISA
- Whole-system VMs: emulation
- ISA not the same, so everything must be emulated.
- Co-designed VMs: optimization
- Hardware designed to support VMs.
- Provides a clean design for virtualization.
- Can be significantly more efficient.
16 Virtualization
- The state of a machine must be maintained.
- Physical machine: latches, flip-flops, etc.
- Virtual machine: a combination of the physical machine and state emulated in software using RAM, etc.
- At certain points in execution, such as a trap, the state of the machine must be materialized.
- Not trivial, due to the complex hardware techniques used to provide high performance.
- This ability to materialize the state is termed preciseness.
- Three aspects of virtualization:
- State: registers and memory
- Instructions: may involve emulation
- State materialization: when exceptions occur
17 Process VMs: Virtualization
- Multiprogramming
- State
- Mapped 1:1
- Instructions
- Native
- State materialization
- Provided by hardware
- Dynamic translation
- State
- Registers mapped to host registers as available (overflow to memory). Memory mapped to host memory.
- Instructions
- Emulated
- State materialization
- Provided by VM software
- HLL VMs
- State
- Mapped to host resources as available.
- Instructions
- Emulated, JIT compiled
18 System VMs: Virtualization
- Classic VMs
- State
- Mapped 1:1, except for privileged registers.
- Instructions
- Native, except trapping for privileged instructions.
- State materialization
- Provided by hardware
- Whole System VMs
- State
- Mapped to available memory, not 1:1
- Instructions
- Emulated
- State materialization
- Provided by VM software
- Co-Designed VMs
- State
- Mapped 1:1
- Instructions
- Block-level translated
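The classic-VM row ("native, except trapping for privileged instructions"; state "mapped 1:1, except for privileged registers") can be sketched as trap-and-emulate. This is a toy model under stated assumptions: the opcodes CLI/STI/ADDI and the single virtual interrupt flag are hypothetical, and the "hardware" is a Python function that raises an exception where a real CPU would trap.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical privileged opcodes: disable/enable interrupts.
PRIVILEGED = {"CLI", "STI"}


class PrivilegedTrap(Exception):
    """Raised by the simulated CPU when de-privileged code runs a privileged op."""


@dataclass
class GuestState:
    # Ordinary registers are used directly (mapped 1:1).
    regs: Dict[str, int] = field(default_factory=lambda: {"r0": 0})
    # Privileged register kept as virtual state by the VMM: the guest's interrupt flag.
    virtual_if: bool = True


def cpu_execute(state: GuestState, instr: Tuple) -> None:
    """Guest code runs natively; privileged instructions trap precisely."""
    op = instr[0]
    if op in PRIVILEGED:
        raise PrivilegedTrap(op)
    if op == "ADDI":
        _, reg, imm = instr
        state.regs[reg] += imm
    # Other opcodes are ignored in this sketch.


def vmm_emulate(state: GuestState, op: str) -> None:
    """The VMM updates the guest's virtual interrupt flag, not the real one."""
    state.virtual_if = op == "STI"


def run_guest(program: List[Tuple]) -> Tuple[GuestState, int]:
    state, traps = GuestState(), 0
    for instr in program:
        try:
            cpu_execute(state, instr)
        except PrivilegedTrap as trap:
            traps += 1
            vmm_emulate(state, trap.args[0])
    return state, traps
```

Only the CLI costs a trap; the two ADDI instructions run "natively". Because the trap is precise, the VMM sees a consistent guest state, which is what "state materialization provided by hardware" means here.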
19 Taxonomy
- Process
- Same ISA
- Multiprogramming
- Dynamic optimization
- Different ISA
- Dynamic translators
- HLL VM
- System
- Same ISA
- Classic OS VMs (IBM)
- Hosted VMs
- Different ISA
- Whole system
- Co-designed VMs
20 Key Ideas
- VMs can support an individual process only, or can support a whole OS.
- A useful taxonomy can be constructed based on:
- process or system
- same ISA or different ISA
21 Virtualizing I/O Devices on VMware Workstations
[Figure: I/O virtualization in VMware Workstation; labels: Host, VMM.]
22 Virtualizing the PC Platform
- Several hurdles
- Non-virtualizable processor
- Some privileged (sensitive) instructions fail silently instead of trapping when run at user level. (Why is this a problem?) (What's the solution?)
- PC hardware diversity
- Why is this problematic for a classic VM?
- Pre-existing PC software
- Must stay compatible.
- To address these, VMware uses a hosted VM. (Not a classic VM.)
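A classic trap-and-emulate VMM works only if every sensitive instruction (one that reads or changes privileged machine state) traps when run without privilege; this is Popek and Goldberg's condition. The sketch below checks that condition over a small, hand-labeled subset of x86 instructions. The labels are illustrative, not a complete table: on x86 without later hardware extensions, SGDT can be executed at user level, and POPF silently skips updating the interrupt flag instead of trapping.

```python
from typing import Dict, Set

# Illustrative, hand-labeled subset of x86 (not exhaustive).
# sensitive: reads or changes privileged machine state.
# traps_in_user_mode: faults when executed outside ring 0.
INSTRUCTIONS: Dict[str, Dict[str, bool]] = {
    "MOV":  {"sensitive": False, "traps_in_user_mode": False},
    "HLT":  {"sensitive": True,  "traps_in_user_mode": True},
    "LGDT": {"sensitive": True,  "traps_in_user_mode": True},
    "SGDT": {"sensitive": True,  "traps_in_user_mode": False},
    "POPF": {"sensitive": True,  "traps_in_user_mode": False},
}


def problem_instructions(isa: Dict[str, Dict[str, bool]]) -> Set[str]:
    """Sensitive instructions that do not trap, which defeat trap-and-emulate."""
    return {name for name, props in isa.items()
            if props["sensitive"] and not props["traps_in_user_mode"]}


def classically_virtualizable(isa: Dict[str, Dict[str, bool]]) -> bool:
    # Popek-Goldberg condition: every sensitive instruction is privileged (traps).
    return not problem_instructions(isa)
```

The non-empty result is exactly the "fails silently" set; one known workaround, used by VMware, is to find and rewrite such instructions with binary translation before they run.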
23 Two Worlds
- The VMApp runs in the host, using the VMDriver host kernel component to establish the VMM.
- The CPU is thus executing in either the host world or the virtual world, using the VMDriver to switch worlds.
- World switches are expensive, since user and system state must be switched.
24 Architecture
[Figure: VMware Workstation architecture; labels: VMApp, host kernel, VMDriver, VMNet.]
25 Virtualizing the NIC
- I/O port operations by the guest OS must be intercepted by the VMM.
- They must then be processed in the VMM (to maintain the virtual state),
- or executed in the host world. (When must it do what?)
- Send operations start as a sequence of operations on virtual I/O ports.
- Upon finalization of the send, the VMApp issues a host OS syscall to the VMNet driver, which passes it on to the real NIC.
- Finally, completion is signaled by raising a virtual IRQ.
- Receive operations work in reverse.
- The VMApp executes the select() syscall on possible sources.
- It reads the packet and forwards it to the VMM, which raises a virtual IRQ.
26 Details
- Send
- Guest OS out to I/O port
- Trap to VMDriver
- Pass to VMApp
- Syscall to VMNet
- Pass to actual NIC driver
- Receive
- Hardware IRQ
- Actual NIC delivers to VMNet driver
- VMNet driver causes VMApp to return from select()
- VMApp copies packet to VM memory
- VMApp asks VMM to raise virtual IRQ
- Guest OS performs port operations to read data
- Trap to VMDriver
- VMApp returns from ioctl() to raise IRQ
27 Reducing Network Virtualization Overheads
- Handling I/O ports in the VMM
- Many accesses don't involve actual I/O.
- Let the VMM maintain the state, avoiding a world switch.
- Send combining
- If the data rate is high, queue up packets and send them as a group.
- IRQ notification
- Use a shared memory bitmap rather than requiring the VMApp to call select() when an IRQ is received on the host system.
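Send combining can be sketched as a queue that is flushed either immediately (low data rate, to keep latency down) or once a batch has built up (high data rate), so that one world switch carries several packets. The class and its fields are hypothetical; each `flush` stands for one world switch plus one host syscall to VMNet.

```python
from typing import List


class SendCombiner:
    """Hypothetical sketch of send combining in a virtual NIC."""

    def __init__(self, batch_size: int) -> None:
        self.batch_size = batch_size
        self.queue: List[bytes] = []
        self.world_switches = 0          # one per flush in this model
        self.sent: List[List[bytes]] = []

    def send(self, packet: bytes, high_rate: bool) -> None:
        self.queue.append(packet)
        # Low rate: send at once. High rate: wait until a batch accumulates.
        if not high_rate or len(self.queue) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if self.queue:
            self.world_switches += 1
            self.sent.append(self.queue)
            self.queue = []
```

Eight packets at a high rate with a batch size of 4 cost two world switches instead of eight. A real design would also need a timeout so that a partial batch is not held indefinitely when traffic stops.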
28 Performance Enhancements
- Reducing CPU virtualization overhead
- Find operations on the interrupt controller that have memory semantics and replace them with a MOV operation, which does not require intervention by the VMM.
- Apparently requires dynamic binary translation.
- Modifying the guest OS
- Eliminate idle-task page table switching, which is not necessary, since the idle task's pages are mapped in every process's page table.
- Run the idle task with the page table of the last process.
- What would happen if the idle task had a bug and wrote to some random addresses?
29 Performance Enhancements
- Creating a custom virtual device
- Virtualizing a real device is somewhat inefficient, since the interfaces of these devices are optimized for real devices, not virtual devices.
- Designing a custom virtual device can reduce expensive operations.
- The disadvantage is that a new device driver must be written in the guest OS for this virtual device.
- Modifying the host OS
- The VMNet driver allocates kernel memory (an sk_buff), then copies from the VMApp to the sk_buff.
- The copy can be eliminated by using memory from the VM's physical memory.
- Bypassing the host OS
- The VMM uses its own drivers rather than going through the host OS. (Note that going through the host OS is using a kind of process VM provided by the host OS.)
- The disadvantage is that you have to write your own VMM driver for every supported real device.
30 Summary
- Main goal is to develop some understanding of the
issues of hosted system VM performance.
31 Question
- If privileged instructions are overwritten with a brk instruction, how does the VMM know what instruction used to be there?
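One straightforward answer, sketched here under assumptions (instructions modeled as strings, addresses as list indices, and a hypothetical `Patcher` class), is for the VMM to keep a side table keyed by address: it records each original instruction when it patches in the break, and on the resulting trap it looks up the faulting address.

```python
from typing import Dict, List, Set

BRK = "BRK"  # stand-in for the breakpoint/trap instruction


class Patcher:
    """Hypothetical sketch: replace privileged instructions with BRK and
    remember the originals in a side table keyed by address."""

    def __init__(self, code: List[str], privileged: Set[str]) -> None:
        self.code = list(code)
        self.saved: Dict[int, str] = {}
        for addr, instr in enumerate(self.code):
            if instr in privileged:
                self.saved[addr] = instr   # remember what was there
                self.code[addr] = BRK      # patch in the trap

    def on_trap(self, addr: int) -> str:
        """On a BRK trap, the faulting address indexes the side table."""
        return self.saved[addr]
```

A complete design would also have to hide the patches from guest code that reads its own instruction bytes.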
32 Xen and the Art of Virtualization
- A (bad) play on "Zen and the Art of Motorcycle Maintenance"
33 Motivation
- Server farm scenario
- Multiple applications installed on machines.
- Different customers.
- (What's admission control?)
- Current approaches
- Allow users to install and run apps.
- Configuration interactions between apps (such as versions of Java jars, shared libraries, etc.) can lead to compatibility problems that require time-consuming system administration to solve.
- The behavior of one app can impact the performance of another. Need performance isolation.
- One approach is QoS.
- Extend the OS to provide QoS to apps.
- (What's the difference between QoS and real-time? QoS and performance isolation?)
34 Use VMs
- Instead, use multiple VMs, one VM per app.
- Each app can configure the entire OS exactly how it requires.
- It is relatively easier to implement algorithms at the VM level to isolate the performance behavior of different apps.
- Requirements for successful partitioning
- Isolation (Does VMware provide this?)
- Accommodate heterogeneity
- Good performance
- To avoid the performance penalties of VMs like VMware, use paravirtualization.
35 Design Principles
- Support for unmodified application binaries is essential.
- Must virtualize all features required by existing ABIs.
- Support for full multi-app OSs is important. (Not just process VMs.)
- Complex configurations may have multiple processes and should be configured within a single VM.
- Paravirtualization is necessary to obtain high performance and strong resource isolation.
- For example, virtualizing page tables can result in many expensive traps.
- Even on ISAs designed for virtualization, completely hiding the virtualization from the guest OS risks both correctness and performance.
- For example, the VM should know real time (and not just virtual time) to handle things like timeouts.
- Contrast with the Denali security model.
- Separate namespaces.
- Xen uses a hypervisor.
36 The VM Interface Overview
- Memory management
- Paging
- Xen occupies the top 64 MB of every address space, avoiding a TLB flush on hypervisor transitions.
- Guest OSs update the actual hardware page tables through Xen, which improves performance. (But it makes them aware of virtualization.)
- Segmentation
- Cannot install fully privileged segment descriptors.
37 The VM Interface Overview (contd.)
- CPU
- Protection
- The guest OS must run at lower privilege. Since rings 1 and 2 are seldom used, the guest OS runs in ring 1.
- Exceptions
- Guest OSs must register handlers with Xen. These are generally identical to the originals.
- Safety is ensured by checking that a handler doesn't execute in ring 0.
- System calls
- Fast handlers may be registered to avoid going through ring 0; calls instead go directly from ring 3 to ring 1.
- Does this change the ABI?
- Page fault
- The page fault handler must be modified, since the faulting address is in a privileged register.
- The technique is for Xen to write it to a location in the stack frame.
- Device I/O
- Network, disk, etc.
- All replaced with a special, buffer-based event mechanism.
38 Porting
- XP directly accessed PTEs, while Linux used macros. (Why significant?)
39 Control and Management
- Separation of policy from mechanism
- Microkernel-like design
- Basic control mechanism provided by the hypervisor through a control interface.
- Policies implemented by a special, distinguished guest OS instance (domain).
- Scheduling parameters, physical memory allocations, domain creation/destruction, and creation/deletion of virtual network interfaces and block devices.
40 Architecture
41 Details
42 Hypercalls and Events
- Hypercalls
- From domain to Xen
- Explicit calls into the hypervisor by the guest OS. Used by the guest OS for things like updating hardware page tables.
- Events
- From Xen to domain
- Bitmask and handler
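The "bitmask and handler" pair for events can be sketched as follows: Xen sets a bit per pending event source, and the domain's registered handler later scans and clears the bits. The class, the `port` numbering, and the method names are hypothetical; the point is that setting a bit is idempotent, so repeated notifications coalesce until the guest handles them.

```python
from typing import Callable, Dict


class EventChannel:
    """Hypothetical sketch of Xen-style event delivery to one domain."""

    def __init__(self) -> None:
        self.pending = 0                          # one bit per event source
        self.handlers: Dict[int, Callable[[int], None]] = {}

    def register(self, port: int, handler: Callable[[int], None]) -> None:
        self.handlers[port] = handler

    def notify(self, port: int) -> None:
        """Called on the 'Xen' side: mark the event pending."""
        self.pending |= 1 << port                 # idempotent, so events coalesce

    def dispatch(self) -> None:
        """Called in the guest's upcall: handle pending events, lowest first."""
        while self.pending:
            port = (self.pending & -self.pending).bit_length() - 1
            self.pending &= ~(1 << port)          # clear before handling
            self.handlers[port](port)
```

Notifying port 3 twice before the guest runs results in a single handler call for port 3.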
43 Data Transfer
- The hypervisor is another layer, so it is imperative to minimize overhead.
- For resource accountability:
- Minimize the work to demultiplex data.
- That is, figure out as quickly as possible which domain it goes to.
- Memory committed to I/O comes from the relevant domains.
- Minimize cross-talk.
44 I/O Rings
- Buffers are separate. How is the pointer shared? How does reordering work? NBS.
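A Xen-style I/O ring can be sketched as a fixed circular array of descriptors plus four free-running counters: request producer and consumer, response producer and consumer. Descriptors only reference data buffers that are allocated separately, and each carries an identifier so responses may come back in a different order than the requests. This is a single-threaded sketch with hypothetical names; in the real design the counters live in shared memory and need memory barriers.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Descriptor:
    req_id: int        # lets a response be matched to its request after reordering
    buffer_ref: int    # reference to a separately allocated data buffer


class IORing:
    """Hypothetical sketch of a shared request/response ring."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.slots: List[Optional[Descriptor]] = [None] * size
        self.req_prod = 0   # advanced by the guest
        self.req_cons = 0   # advanced by Xen
        self.rsp_prod = 0   # advanced by Xen
        self.rsp_cons = 0   # advanced by the guest

    def post_request(self, d: Descriptor) -> bool:
        if self.req_prod - self.rsp_cons >= self.size:
            return False                       # every slot is still in use
        self.slots[self.req_prod % self.size] = d
        self.req_prod += 1
        return True

    def take_request(self) -> Optional[Descriptor]:
        if self.req_cons == self.req_prod:
            return None
        d = self.slots[self.req_cons % self.size]
        self.req_cons += 1
        return d

    def post_response(self, d: Descriptor) -> None:
        # Responses reuse slots whose requests were already consumed.
        if self.rsp_prod >= self.req_cons:
            raise RuntimeError("no consumed request slot to reuse")
        self.slots[self.rsp_prod % self.size] = d
        self.rsp_prod += 1

    def take_response(self) -> Optional[Descriptor]:
        if self.rsp_cons == self.rsp_prod:
            return None
        d = self.slots[self.rsp_cons % self.size]
        self.rsp_cons += 1
        return d
```

Because each side only advances its own counters, producer and consumer never need a lock on the shared ring in the single-producer, single-consumer case.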
45 CPU Scheduling
- CPU Scheduling
- Borrowed Virtual Time (BVT)
- Work-conserving
- Latency vs. throughput
- When would you want non-work-conserving?
- Fast-dispatch (borrowing)
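The core of BVT can be sketched as follows: each domain's actual virtual time (AVT) advances by its run time divided by its weight, and the scheduler always runs the domain with the smallest effective virtual time (EVT), which equals AVT minus a warp value while the domain is warping. Warping is the "borrowing" that gives latency-sensitive domains fast dispatch. This sketch omits BVT's context-switch allowance and warp time limits; the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Domain:
    name: str
    weight: float
    avt: float = 0.0        # actual virtual time
    warp: float = 0.0       # how far the domain may borrow ahead
    warping: bool = False

    @property
    def evt(self) -> float:
        """Effective virtual time: AVT minus the warp while warping."""
        return self.avt - (self.warp if self.warping else 0.0)


def schedule(domains: List[Domain], slices: int, quantum: float = 1.0) -> List[str]:
    """Run the lowest-EVT domain each slice (work-conserving)."""
    trace: List[str] = []
    for _ in range(slices):
        d = min(domains, key=lambda x: x.evt)
        d.avt += quantum / d.weight   # heavier domains' virtual time grows slower
        trace.append(d.name)
    return trace
```

With weights 2 and 1, six slices split 4:2. A warping domain with a larger AVT can still be dispatched ahead of others, which trades throughput fairness in the short term for latency.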
46 Time and Timers
- Time and timers
- Guest OSs are made aware of real time, virtual time, and wall-clock time.
- Real time: nanoseconds since boot; can be frequency-locked to an external time source.
- Virtual time advances only when the guest OS is executing. Used for scheduling by the guest OS.
- Wall-clock time? An offset from real time. (When would one ever adjust it?)
- Xen-provided timers are used by the guest OS.
- Solves one efficiency problem with VMware Workstation.
- A Windows XP guest causes the host to perform poorly, because the VMM must constantly deliver timer interrupts to XP to do things like smooth transition animations (such as minimizing a window). Forcing the guest to use a Xen-provided timer would eliminate the need to virtualize these timer interrupts.
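The three notions of time can be sketched from the guest's side. Real time (nanoseconds since boot) is passed in explicitly, virtual time accumulates only between being scheduled in and out, and wall-clock time is real time plus an offset. The `GuestClock` class and its methods are hypothetical, not Xen's interface.

```python
from typing import Optional


class GuestClock:
    """Hypothetical sketch of real, virtual, and wall-clock time for one guest.
    All values are in nanoseconds; real time is supplied by the caller."""

    def __init__(self, wallclock_offset_ns: int) -> None:
        self.wallclock_offset_ns = wallclock_offset_ns
        self.virtual_ns = 0                        # advances only while running
        self._running_since: Optional[int] = None  # real time when last scheduled

    def scheduled_in(self, real_ns: int) -> None:
        self._running_since = real_ns

    def scheduled_out(self, real_ns: int) -> None:
        if self._running_since is not None:
            self.virtual_ns += real_ns - self._running_since
            self._running_since = None

    def wall_clock_ns(self, real_ns: int) -> int:
        # Wall-clock time is real time plus an offset.
        return real_ns + self.wallclock_offset_ns
```

A guest that runs from 100 to 150 and from 300 to 320 has 70 ns of virtual time, while its timeouts should still be measured against real time.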
47 Virtual Address Translation
- Virtual address translation
- Handled by Xen, with batched updates.
- Must be validated by Xen.
- A type and a reference count are associated with each frame.
- The type is used to aid validation.
- For example, a page table frame needs to be validated once, not on every subsequent use.
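Per-frame type and reference counting can be sketched like this: the expensive validation runs only when a frame changes to the page-table type, later uses at the same type just bump the count, and a frame cannot be taken at a conflicting type (for example writable) while it is in use as a page table. The names `FrameTable`, `get_type`, `put_type`, and `mfn` (machine frame number) are used here for illustration only.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Frame:
    kind: str = "none"   # e.g. "none", "writable", "page_table"
    refs: int = 0        # references holding the frame at this type


class FrameTable:
    """Hypothetical sketch of type + reference count per machine frame."""

    def __init__(self) -> None:
        self.frames: Dict[int, Frame] = {}
        self.validations = 0

    def get_type(self, mfn: int, want: str) -> None:
        f = self.frames.setdefault(mfn, Frame())
        if f.refs == 0:
            if want == "page_table":
                self.validations += 1   # expensive check only on a type change
            f.kind = want
        elif f.kind != want:
            raise PermissionError(f"frame {mfn} is in use as {f.kind}")
        f.refs += 1

    def put_type(self, mfn: int) -> None:
        f = self.frames[mfn]
        f.refs -= 1
        if f.refs == 0:
            f.kind = "none"             # next use at any type starts fresh
```

A second use of frame 7 as a page table costs no revalidation, and a request to map it writable is refused until all page-table references are dropped.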
48 Physical Memory
- Physical memory
- Reserved for each guest OS instance at the time of creation.
- Provides strong isolation.
- But no sharing. What would be the advantage of sharing?
- The OS may use an additional table to give the illusion of contiguous physical memory.
- Might need to know the real hardware addresses for optimizing placement.
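The "additional table" can be sketched as a physical-to-machine map: the guest numbers its frames 0..n-1 contiguously and maps each to whatever machine frame was actually granted, with a reverse map for the other direction. The class name, the example frame numbers, and the 4 KB page size are assumptions for illustration.

```python
from typing import Dict, List


class PhysToMachine:
    """Hypothetical sketch: contiguous guest 'physical' frames mapped onto
    scattered machine frames."""

    def __init__(self, machine_frames: List[int]) -> None:
        self.p2m: List[int] = list(machine_frames)                 # pfn -> mfn
        self.m2p: Dict[int, int] = {m: p for p, m in enumerate(machine_frames)}

    def to_machine(self, pfn: int) -> int:
        return self.p2m[pfn]

    def to_phys(self, mfn: int) -> int:
        return self.m2p[mfn]

    def machine_address(self, paddr: int, page_size: int = 4096) -> int:
        # Translate a guest physical address, keeping the offset within the page.
        pfn, offset = divmod(paddr, page_size)
        return self.to_machine(pfn) * page_size + offset
```

Placement optimizations that care about real hardware locality would have to consult `to_machine`, which is why the guest may need to know the machine addresses.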
49 Network
- VIFs (virtual network interfaces)
- Two I/O rings
- Zero-copy
50 Disks
- VBDs (virtual block devices; Domain0 has direct access)
- Disk scheduling
- The guest doesn't know the real layout.
- Xen does some reordering.
- (A bit of a violation of policy/mechanism separation.)
- Scheduling is round-robin over batched requests, then elevator.
- May also have reorder barriers.
- (How well does this provide isolation?)
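"Round-robin over batched requests, then elevator" can be sketched as two stages: each round takes up to `batch` requests from every domain in turn, and that round is then ordered by an elevator (SCAN) sweep from the current head position, upward first and then back down. The function name, its parameters, and the per-round restart of the sweep are simplifying assumptions; reorder barriers are not modeled.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

Request = Tuple[str, int]  # (domain, block number)


def schedule_disk(queues: Dict[str, List[int]], batch: int, head: int) -> List[Request]:
    """Hypothetical sketch: round-robin batches per domain, then SCAN per round."""
    if batch <= 0:
        raise ValueError("batch must be positive")
    pending: Dict[str, Deque[int]] = {d: deque(b) for d, b in queues.items()}
    order: List[Request] = []
    while any(pending.values()):
        # Round-robin: up to `batch` requests from each domain in turn.
        round_reqs: List[Request] = []
        for dom, q in pending.items():
            for _ in range(min(batch, len(q))):
                round_reqs.append((dom, q.popleft()))
        # Elevator over this round: sweep up from the head, then back down.
        up = sorted((r for r in round_reqs if r[1] >= head), key=lambda r: r[1])
        down = sorted((r for r in round_reqs if r[1] < head),
                      key=lambda r: r[1], reverse=True)
        order.extend(up + down)
        head = order[-1][1]
    return order
```

The round-robin stage bounds how many requests one domain can place ahead of another, while the elevator stage only reorders within a round.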
51 Performance
52 Relative Performance
- Compared Linux, XenoLinux, VMware 3.2, and UML.
- Tests with others could not be published.
- Tests
- SPEC INT2000
- Linux build
- Native Linux: 7% of CPU time is spent in the OS (system time).
- Open Source Database Benchmark (OSDB) Information Retrieval (IR)
- OSDB On-Line Transaction Processing (OLTP)
- dbench
- File system benchmark
- SPEC WEB99
- App level for Web servers (Apache)
53 Performance
54 Performance
55 Operating System BMs
- What does SMP stand for?
- Why might SMP be slower?
- Why are the highlighted ones slower?
- Why is signal handling faster for Xen?
56 Operating System BMs
- Needs a hypercall.
- Why do more processes need more time?
- Why is there less significant difference with a bigger working set?
57 Operating System BMs
- mmap and page faults require two transitions. (Why?)
58 Operating System BMs
59 Concurrent VMs
- Run on a 2-CPU SMP.
- Apache: only a 28% improvement over UP.
- Xen: a 9% improvement over UP.
- Why slightly better sometimes?
60 PostgreSQL
- Scores running multiple PostgreSQL instances on native Linux are 25-35% lower, possibly due to SMP scalability plus poor use of the block cache.
- Weights seem to have an effect in the information retrieval case, but no effect in the OLTP case due to lots of synchronous writes. Why synchronous writes?
61 Performance Isolation
- Only 4% and 2% below earlier results.
- Does this make sense?
62 Scalability of VMs
- SPEC INT2000
- Native Linux identifies the workload as compute-bound and uses a 50 ms time slice. (Why does this matter?)
63 Future Work
- Universal buffer cache with COW
- How might this be used?
- Last-chance page cache (LPC)
- Of non-zero length only when machine memory is undersubscribed.
- Clean, evicted pages are added to the LPC.
- On a fault, check the LPC.
- (Why only clean pages?)
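A last-chance page cache can be sketched as a bounded cache that holds clean pages a guest has evicted, with zero capacity when machine memory is fully committed, and that is checked on a fault before going to disk. The class, its capacity policy, and its oldest-first eviction are assumptions for illustration, not a described design.

```python
from collections import OrderedDict
from typing import Optional


class LastChanceCache:
    """Hypothetical sketch of a last-chance page cache (LPC)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity   # 0 when machine memory is fully committed
        self.pages: "OrderedDict[int, bytes]" = OrderedDict()

    def evict(self, page_no: int, data: bytes, dirty: bool) -> None:
        # Only clean pages are cached, so an entry can be discarded at any
        # moment without losing data.
        if dirty or self.capacity == 0:
            return
        self.pages[page_no] = data
        self.pages.move_to_end(page_no)
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)   # drop the oldest entry

    def on_fault(self, page_no: int) -> Optional[bytes]:
        # Check the LPC first; a hit avoids a disk read.
        return self.pages.pop(page_no, None)
```

With room for two pages, evicting clean pages 1, 3, and 4 keeps 3 and 4, so a fault on page 3 is served from the LPC while a fault on page 1 must go to disk.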
64 Key Ideas
- A virtual ISA (paravirtualization) is better.
- Better performance
- Allows VMs to be isolated from one another. One VM can't cause another to thrash, for instance.
- Allows up to 100 OS instances.
- Making the guest OS aware of virtualization improves correctness and performance.
- Control and management of Xen itself is done from a guest OS, via a special interface.
- Cherry picking?
- Generally speaking, people always choose tests that show their work in the best light.
- It may be hard to tell in a complex situation.
65 Microkernels Meet Recursive Virtual Machines
66 Decomposition
- Microkernels decompose functionality horizontally (mainly).
- Monolithic services separated horizontally.
- Moved up one layer.
- Stackable VMMs decompose functionality vertically.
- Each layer supplies some functionality.
67 Fluke
- Uses a nested process architecture.
- Each process provides a VM to its children, possibly with additional functionality.
- Different from the usual parent-child relationship in that children are completely contained within, and visible to, the parent.
- This is necessary for the parent to be a VM to its children.
- Two APIs
- Low-level kernel API to the microkernel for basic manipulation
- High-level protocols to handle:
- Parent Interface
- Process
- MemPool
- FileSystem
- Nested VMs interact directly with the microkernel for the low-level API, but interact with the parent VM for high-level protocols.
- The parent VM uses interposition to add additional functionality. This is how the stacking works.
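Stacking through interposition can be sketched with a high-level file system protocol: each layer presents the same interface to its child, adds one feature, and forwards the rest to its own parent. All class and method names are hypothetical; Fluke's actual protocols (Parent, Process, MemPool, FileSystem) are not modeled here, and the low-level microkernel calls that bypass the stack are left out.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class FileSystem(ABC):
    """Hypothetical stand-in for a high-level protocol a parent offers its child."""

    @abstractmethod
    def write(self, path: str, data: bytes) -> None: ...

    @abstractmethod
    def read(self, path: str) -> bytes: ...


class RootFS(FileSystem):
    """Bottom layer: actually stores the data (in memory here)."""

    def __init__(self) -> None:
        self.files: Dict[str, bytes] = {}

    def write(self, path: str, data: bytes) -> None:
        self.files[path] = data

    def read(self, path: str) -> bytes:
        return self.files[path]


class QuotaLayer(FileSystem):
    """Interposes on writes to enforce a byte quota; forwards reads unchanged."""

    def __init__(self, parent: FileSystem, limit: int) -> None:
        self.parent, self.limit, self.used = parent, limit, 0

    def write(self, path: str, data: bytes) -> None:
        # Counts every write, including overwrites, to keep the sketch short.
        if self.used + len(data) > self.limit:
            raise OSError("quota exceeded")
        self.used += len(data)
        self.parent.write(path, data)

    def read(self, path: str) -> bytes:
        return self.parent.read(path)


class LoggingLayer(FileSystem):
    """Interposes on every call to record it, then forwards to its parent."""

    def __init__(self, parent: FileSystem) -> None:
        self.parent = parent
        self.log: List[str] = []

    def write(self, path: str, data: bytes) -> None:
        self.log.append(f"write {path}")
        self.parent.write(path, data)

    def read(self, path: str) -> bytes:
        self.log.append(f"read {path}")
        return self.parent.read(path)
```

A child given `LoggingLayer(QuotaLayer(RootFS(), limit=10))` sees one ordinary file system, yet gets logging and a quota from two separately stacked layers.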
69 Key Ideas
- Implement a microkernel that allows process virtual machines to be stacked.
- Each virtual machine is a user-level server.
- Stacking occurs through process nesting.
- Use pass-through to avoid exponential behavior.
- Mainly interesting for the ideas; performance is relatively poor, but may be improvable.