Execution Replay for Multiprocessor Virtual Machines


Slides: 29

Provided by: george299


1
Execution Replay for Multiprocessor Virtual
Machines

George W. Dunlap
Dominic Lucchetti
Michael A. Fetterman
Peter M. Chen

2
Big ideas

Detection and replay of memory races is possible
on commodity hardware
Overhead high for some workloads
but surprisingly low for other workloads

3
Execution Replay
CPU
Interrupts
Network
Memory
Keyboard, mouse
Disk
4
Uses of Execution Replay

Reconstructing state
Fault tolerance
Reconstructing execution
Debugging
Realistic trace generation
Both
Intrusion analysis

5
Single-processor Replay

Basic principles well understood
Log all non-deterministic inputs
Timing of asynchronous events
Minimal overhead (Dunlap02)
13% worst case
Log for months or years
Available commercially
VMware Record/Replay
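A minimal sketch of the "log all non-deterministic inputs" principle, not the ReVirt implementation. The names `guest`, `record`, and `replay` are hypothetical, and the timing of asynchronous events is left out: the guest here is a deterministic function whose only non-determinism is the values it reads.

```python
import random


def guest(read_input):
    """Deterministic computation; its only non-determinism is read_input()."""
    total = 0
    for _ in range(3):
        total = total * 31 + read_input()
    return total


def record(read_real_input):
    """Run the guest on live input and log every value it consumes."""
    log = []

    def read_and_log():
        value = read_real_input()
        log.append(value)
        return value

    return guest(read_and_log), log


def replay(log):
    """Re-run the guest, feeding it the logged values instead of live input."""
    entries = iter(log)
    return guest(lambda: next(entries))


result, log = record(lambda: random.randrange(256))
print(replay(log) == result)  # True
```

Because the guest is deterministic apart from its inputs, replaying the log reproduces the recorded result exactly.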

6
Replay for Multiprocessors

Memory races in multiprocessor VMs
The Ordering Requirement
The CREW Protocol
Implementing with page protections
Relation to the Ordering Requirement
Generating constraints from CREW events
DMA-capable devices and CREW
Performance

7
The Multiprocessor Challenge

Interleaved reads and writes
Fine-grained non-determinism
Much more difficult
Existing solutions
Hardware modification
Software instrumentation
SMP-ReVirt
Hardware MMU to detect sharing

8
Multiprocessor Replay
[Figure: processors P1 and P2 access a shared variable n in memory; labels: n=5, n=3, if (n<4)]
9
Ordering Memory Accesses

Preserving order will reproduce execution
a→b: a happens-before b
Ordering is transitive: a→b and b→c imply a→c
Two instructions must be ordered if
they both access the same memory, and
one of them is a write
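The ordering rule above can be written as a small predicate. This is an illustrative sketch: the `Access` record and `must_order` function are hypothetical names, and the different-processor check is an added assumption (accesses on one processor are already ordered by program order).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    cpu: int        # processor issuing the access
    addr: int       # memory location touched
    is_write: bool  # True for a store, False for a load


def must_order(a: Access, b: Access) -> bool:
    """Return True if the relative order of a and b has to be preserved."""
    same_memory = a.addr == b.addr
    one_is_write = a.is_write or b.is_write
    # Assumption: same-CPU accesses are already ordered by program order.
    different_cpu = a.cpu != b.cpu
    return same_memory and one_is_write and different_cpu


print(must_order(Access(0, 0x10, True), Access(1, 0x10, False)))   # True
print(must_order(Access(0, 0x10, False), Access(1, 0x10, False)))  # False
```

Two reads never need ordering, which is why read-mostly sharing is cheap to record.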

10
Constraints: Enforcing order

To guarantee a→d
a→d
b→d
a→c
b→c
Suppose we need b→c
b→c is necessary
a→d is redundant

[Figure: P1 executes a, then b; P2 executes c, then d; label: overconstrained]
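Why a→d becomes redundant once b→c is logged can be checked with a reachability search. This sketch assumes P1 executes a then b and P2 executes c then d; the helper name `implies` is hypothetical.

```python
def implies(edges, src, dst):
    """True if src happens-before dst follows from edges by transitivity."""
    graph = {}
    for before, after in edges:
        graph.setdefault(before, set()).add(after)
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt == dst:
                return True
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


# Assumed program order: a before b on P1, c before d on P2.
program_order = [("a", "b"), ("c", "d")]
logged = [("b", "c")]

print(implies(program_order + logged, "a", "d"))  # True
print(implies(program_order, "a", "d"))           # False
```

With b→c in the log, a→b→c→d already guarantees a→d, so logging a→d as well would add nothing.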
11
CREW Protocol

Each shared object in one of two states
Concurrent-Read: all processors can read, none can write
Exclusive-Write: one processor (the owner) can read and write; others have no access

12
CREW protocol, cont

Enforced with hardware MMU
Read/write
Read-only
None
Change CREW states on demand
Fault, fixup, re-execute
CREW event: increasing or reducing permission due to a CREW state change
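The fault, fixup, re-execute cycle can be simulated for a single page. This is a simplified sketch, not the SMP-ReVirt code: `CrewPage` and its methods are hypothetical, per-processor permissions stand in for page-table protections, and a read fault is assumed to return the page to Concurrent-Read for all processors.

```python
NONE, READ, READ_WRITE = 0, 1, 2


class CrewPage:
    def __init__(self, num_cpus):
        # Start in Concurrent-Read: every processor maps the page read-only.
        self.perm = [READ] * num_cpus
        self.events = []  # CREW events as (cpu, old_perm, new_perm)

    def _set(self, cpu, new):
        old = self.perm[cpu]
        if old != new:
            self.perm[cpu] = new
            self.events.append((cpu, old, new))

    def access(self, cpu, is_write):
        """Return True if the access faulted and permissions were fixed up."""
        needed = READ_WRITE if is_write else READ
        if self.perm[cpu] >= needed:
            return False  # permitted by the current protections
        cpus = range(len(self.perm))
        if is_write:
            # Move to Exclusive-Write: reduce the others, then raise the owner.
            for other in cpus:
                if other != cpu:
                    self._set(other, NONE)
            self._set(cpu, READ_WRITE)
        else:
            # Move to Concurrent-Read: reduce the owner, then raise the rest.
            for other in cpus:
                if self.perm[other] == READ_WRITE:
                    self._set(other, READ)
            for other in cpus:
                self._set(other, READ)
        return True  # the faulting access would now be re-executed


page = CrewPage(2)
page.access(1, is_write=True)
print(page.perm)    # [0, 2]
print(page.events)  # [(0, 1, 0), (1, 1, 2)]
```

Each fixup records the privilege reductions before the privilege increase, matching the order in which a hypervisor would have to revoke access before granting it.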

13
CREW Property

If two instructions on different processors
access the same page,
and one of them is a write,
there will be a CREW event on each processor
between them.

14
Generating Constraints

State: Concurrent-Read
All processors read-only
d: CREW fault
New state: P2 Exclusive
r: privilege reduction
Read to None
i: privilege increase
Read to Read/write
Log timing of r and i
Constraint:
r → i

[Figure: timelines for P1 and P2 showing events a, d, r, i, d]
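A sketch of turning one write fault into logged constraints. The log format here is an assumption, not the one used by SMP-ReVirt: `Point` and `constraints_for_write_fault` are hypothetical names, and a per-processor instruction count stands in for the "timing" of r and i.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    cpu: int          # processor on which the CREW event occurs
    instr_count: int  # assumed timing: instructions retired so far


def constraints_for_write_fault(faulting_cpu, counters):
    """Emit (r, i) pairs meaning: reduction r happens-before increase i.

    counters maps each cpu to its current (hypothetical) instruction count.
    """
    i = Point(faulting_cpu, counters[faulting_cpu])
    constraints = []
    for cpu, count in sorted(counters.items()):
        if cpu != faulting_cpu:
            r = Point(cpu, count)  # this cpu loses access: Read to None
            constraints.append((r, i))
    return constraints


# P2 (cpu 1) faults on a write while P1 (cpu 0) holds read access.
print(constraints_for_write_fault(1, {0: 120, 1: 75}))
```

During replay, each pair would tell the faulting processor to wait at point i until the other processor has reached point r.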
15
Direct Memory Access

Device accesses memory directly
Logically another processor
Reads and writes need to be ordered
IOMMU can't fault/fixup/re-execute
Observation: transaction model
Device: non-preemptible actor

16
Prototype: SMP-ReVirt

Modified Xen hypervisor
Implement logging, CREW protocol
Details in paper

17
Evaluation questions

What is the overhead?
What affects performance?
In paper
When might I want to use MP?
Log with 1, 2, or N cpus?

18
Evaluation Workloads

SPLASH2 parallel application suite
FMM, LU, ocean, radix, water-spatial, radiosity
Kernel-build
Dbench

19
Predicting results

Key: changes in sharing attributes
4096-byte sharing granularity
Miss is very expensive
SPLASH2
Good: high spatial locality / low false sharing
Bad: random access patterns / high false sharing
The Linux kernel
Tuned to 16-byte cacheline
Involving the kernel may be expensive
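The effect of the coarser sharing granularity can be illustrated with simple address arithmetic. The addresses below are made up for the example; the two sizes are the ones quoted on the slide.

```python
PAGE_SIZE = 4096      # sharing granularity under page-protection CREW
CACHELINE_SIZE = 16   # cacheline size quoted on the slide


def same_unit(addr_a, addr_b, granularity):
    """True if both addresses fall in the same sharing unit."""
    return addr_a // granularity == addr_b // granularity


# Two hypothetical variables placed 64 bytes apart.
a, b = 0x1000, 0x1040

print(same_unit(a, b, CACHELINE_SIZE))  # False
print(same_unit(a, b, PAGE_SIZE))       # True
```

Data laid out to avoid false sharing at cacheline granularity can still share a page, so every alternating write by two processors would trigger a CREW fault.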

20
Single-processor Xen guests
21
Log Growth Rate
22
2-processor Xen guests
23
2-processor, cont
24
Log Growth Rate
25
4-processor Xen guests
26
Recap