Title: JIT Instrumentation
JIT Instrumentation: A Novel Approach to Dynamically Instrument Operating Systems
Marek Olszewski, Keir Mierle, Adam Czajkowski, Angela Demke Brown
University of Toronto
Instrumenting Operating Systems
- Operating systems are growing in complexity
- Kernel instrumentation can help
  - Used for debugging, profiling, monitoring, security auditing, and more
- Dynamic instrumentation
  - No recompilation, no reboot
  - Good for debugging systemic problems
  - Feasible in production settings
Current Approach: Probe-Based
- Dynamic instrumentation tools for OSs are probe-based
  - Overwrite existing code with a jump or trap
- Efficient on fixed-length architectures
- Slow on variable-length architectures (e.g., x86)
  - Not safe to overwrite multiple instructions with a jump (an x86 near jump takes 5 bytes, often more than the probed instruction)
    - A branch might target an address between the overwritten instructions
    - A thread might be sleeping between the overwritten instructions
  - Must use a trap instruction instead
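A minimal C sketch of why the jump option fails on x86, assuming the lengths of the instructions at the probe point have already been decoded. The helper names (`count_overwritten`, `choose_probe`) are hypothetical; the only hard facts used are that a near `jmp rel32` occupies 5 bytes and `int3` occupies 1 byte.

```c
#include <assert.h>
#include <stddef.h>

#define JMP_REL32_LEN 5 /* x86 near jmp: opcode E9 + 32-bit displacement */

enum probe_kind { PROBE_JMP, PROBE_INT3 };

/* How many instructions, starting at the probe point, a 5-byte jmp would
 * overwrite. insn_len[] holds the decoded lengths of consecutive instructions. */
size_t count_overwritten(const size_t *insn_len, size_t n_insns)
{
    size_t covered = 0, count = 0;
    while (count < n_insns && covered < JMP_REL32_LEN) {
        covered += insn_len[count];
        count++;
    }
    return count;
}

/* A jmp is only safe if it fits inside the single probed instruction;
 * otherwise it spans instruction boundaries that a branch or a sleeping
 * thread might still resume at, so fall back to the 1-byte int3 (opcode CC). */
enum probe_kind choose_probe(const size_t *insn_len, size_t n_insns)
{
    assert(n_insns > 0);
    return insn_len[0] >= JMP_REL32_LEN ? PROBE_JMP : PROBE_INT3;
}
```

For the example code on the next slides, `inc 14(edx)` and the following `mov 28(edi),eax` are 3 bytes each in their usual short encodings, so a jump written at `inc 14(edx)` would clobber two instructions and `choose_probe` returns `PROBE_INT3`.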
Current Approach: Trap-Based
[Figure: a trap-based probe. An int3 is written into the area of interest; each time it executes, the trap handler runs the instrumentation code.]
Area of interest:
  sub 6c,esp
  mov ffffe000,edx
  and esp,edx
  inc 14(edx)
  int3
  mov 28(edi),eax
  mov 2c(edi),ebx
  mov 30(edi),ebp
  add 1,eax
  and 3,eax
  or c,eax
  mov eax,(ebx)
  add 2,ebp
  or f,ebp
  mov ebp,4(ebx)
Trap handler:
- Save processor state
- Look up which instrumentation to call
- Call instrumentation
- Emulate overwritten instruction
- Restore processor state
Instrumentation code:
  add 1,count_l
  adc 0,count_h
Very expensive!
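The five trap-handler steps can be modeled in user space with a small C sketch. It only illustrates the control flow every probe hit pays for (a trap, a table lookup, an indirect call, and emulation of the displaced instruction); it is not how Kprobes or the Linux int3 handler are written, all names are hypothetical, and the cost of the CPU entering and leaving the exception handler cannot be shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy processor state; flags are carried but not updated by the emulation. */
struct cpu_state {
    uint32_t eax, ecx, edx, eflags;
    uintptr_t pc; /* in this model: address of the int3 that trapped */
};

struct probe {
    uintptr_t addr;                      /* where int3 was written */
    void (*instrument)(void);            /* instrumentation to run */
    void (*emulate)(struct cpu_state *); /* emulates the overwritten insn */
    size_t insn_len;                     /* its length in bytes */
};

#define MAX_PROBES 16
static struct probe probes[MAX_PROBES];
static size_t n_probes;

void register_probe(struct probe p)
{
    assert(n_probes < MAX_PROBES);
    probes[n_probes++] = p;
}

/* Returns 1 if the trap belonged to a registered probe, 0 otherwise. */
int trap_handler(struct cpu_state *cpu)
{
    struct cpu_state saved = *cpu;        /* 1. save processor state */
    const struct probe *p = NULL;
    for (size_t i = 0; i < n_probes; i++) /* 2. look up instrumentation */
        if (probes[i].addr == saved.pc)
            p = &probes[i];
    if (p == NULL)
        return 0;
    p->instrument();                      /* 3. call instrumentation */
    p->emulate(&saved);                   /* 4. emulate overwritten insn */
    saved.pc += p->insn_len;              /*    and step past it */
    *cpu = saved;                         /* 5. restore processor state */
    return 1;
}

/* Example instrumentation and emulation routines for the usage below. */
unsigned long hit_count;
void count_hit(void) { hit_count++; }
void emulate_add1_eax(struct cpu_state *s) { s->eax += 1; } /* add 1,eax */
```

Registering a probe at a toy address with `count_hit` and `emulate_add1_eax`, then trapping there, increments the hit counter, applies the displaced `add 1,eax`, and resumes after it.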
Alternative: JIT Instrumentation
- Propose just-in-time (JIT) dynamic instrumentation
  - Rewrite code to insert new instructions between existing ones
- More efficient
- More powerful; supports
  - Instrumenting branch directions
  - Basic-block-level instrumentation
  - Per-execution-path instrumentation
- Already proven in user space (Pin, Valgrind)
JIT Instrumentation
[Figure: the area of interest, the instrumentation code, and an initially empty code cache.]
Area of interest:
  sub 6c,esp
  mov ffffe000,edx
  and esp,edx
  inc 14(edx)
  mov 28(edi),eax
  mov 2c(edi),ebx
  mov 30(edi),ebp
  add 1,eax
  and 3,eax
  or c,eax
  mov eax,(ebx)
  add 2,ebp
  or f,ebp
  mov ebp,4(ebx)
Instrumentation code:
  add 1,count_l
  adc 0,count_h
Code cache: (empty)
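The instrumentation code in these examples, `add 1,count_l` followed by `adc 0,count_h`, increments a 64-bit counter on 32-bit x86: `add` sets the carry flag when the low word wraps, and `adc` adds that carry into the high word. Because both instructions overwrite the condition codes, any code that calls this routine has to preserve the flags around it. A C equivalent, assuming (for illustration only) that the two halves are plain globals:

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit counter stored as two 32-bit halves, as in count_l / count_h. */
uint32_t count_l, count_h;

/* C rendering of:  add 1,count_l ; adc 0,count_h */
void increment_counter(void)
{
    uint32_t old = count_l;
    count_l = old + 1u;                   /* add: wraps modulo 2^32 (sets CF) */
    count_h += (count_l < old) ? 1u : 0u; /* adc 0: add the carry-out */
}

/* Reassemble the full 64-bit value. */
uint64_t read_counter(void)
{
    return ((uint64_t)count_h << 32) | count_l;
}
```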
JIT Instrumentation
[Figure: the JIT copies the area of interest into the code cache and inserts a call to the instrumentation code, wrapped in pushf/popf, ahead of inc 14(edx); the original code is left unmodified.]
Code cache (so far):
  sub 6c,esp
  mov ffffe000,edx
  and esp,edx
  pushf
  call instrmtn
  popf
JIT Instrumentation
[Figure: next animation step of the same example; the code cache shows sub 6c,esp, mov ffffe000,edx, and esp,edx, pushf, call instrmtn.]
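The copy-and-insert step in these figures can be sketched in C with instructions represented as assembly strings. This is a simplification under stated assumptions: the real JIT works on x86 machine code, must also save caller-saved registers, and takes the insertion point from the plugin. `jit_copy_with_call` and the `INSN_*` constants are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Instructions the sketch inserts around the instrumentation call. */
static const char INSN_PUSHF[] = "pushf";
static const char INSN_CALL[] = "call instrmtn";
static const char INSN_POPF[] = "popf";

/* Copy the n instructions of src into dst (the code cache), inserting a
 * flag-preserving call to the instrumentation before src[probe].
 * src itself is never modified. Returns the number of instructions emitted. */
size_t jit_copy_with_call(const char *const *src, size_t n, size_t probe,
                          const char **dst, size_t cap)
{
    assert(probe < n && cap >= n + 3);
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        if (i == probe) {
            dst[out++] = INSN_PUSHF; /* the instrumentation clobbers flags */
            dst[out++] = INSN_CALL;
            dst[out++] = INSN_POPF;
        }
        dst[out++] = src[i];         /* original instruction, copied as-is */
    }
    return out;
}
```

Copying the first five instructions of the area of interest with the probe at `inc 14(edx)` yields `sub`, `mov`, `and`, `pushf`, `call instrmtn`, `popf`, `inc 14(edx)`, `mov 28(edi),eax`, while the source array is untouched.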
Dynamic Binary Rewriting
- Use dynamic binary rewriting to insert the new instructions
  - Interleaves binary rewriting with execution
  - Performed by a runtime system
  - Typically at basic-block granularity
- Code is rewritten into a code cache
- Rewritten code must be
  - Efficient
  - Unaware of its new location (it must behave as if it still ran at its original address)
Dynamic Binary Rewriting
[Figure: original code (bb1 to bb4), code cache, and runtime system; the runtime system translates bb1 into the code cache.]
Dynamic Binary Rewriting
[Figure: execution continues to bb2, which the runtime system translates into the code cache after bb1.]
Dynamic Binary Rewriting
[Figure: bb4 is translated as well; the code cache now holds bb1, bb2, and bb4.]
Once these blocks are cached, this path no longer needs to enter the runtime system
Dynamic Binary Rewriting
- Used for rewriting operating systems
  - Virtualization (VMware)
  - Emulation (QEMU)
- Never used for instrumentation of OSs
- Never used to rewrite the host OS in a general manner
  - Doing so allows instrumentation of a live system
Outline
- Prototype (JIFL)
  - Design
  - OS issues
- Performance comparison
  - Kprobes vs. JIFL
- Example plugin
  - Checking branch hint directions
Prototype Design
- JIFL: JIT Instrumentation Framework for Linux
- Instruments code reachable from system calls
JIFL Software Architecture
[Figure: JIFL software architecture]
User space:
- JIFL Plugin Starter
Kernel space:
- JIFL plugin (loadable kernel module)
- Linux kernel (all code reachable from system calls)
- JIFL (loadable kernel module)
  - Code cache
  - Runtime system: dispatcher, JIT compiler
  - Heap and memory allocator
Gaining Control
- The runtime system must gain control before it can start rewriting/instrumenting the OS
- Update a system call table entry to point to a dynamically emitted entry stub, which
  - Calls the per-system-call instrumentation
  - Calls the dispatcher, passing the original system call pointer
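A user-space C model of gaining control through the system call table. Everything here is a hypothetical stand-in: in the kernel the table is Linux's own and the stubs are machine code emitted at run time, whereas here a macro generates one C function per entry, and `dispatcher` simply calls the original function instead of running its translated copy from the code cache.

```c
#include <assert.h>
#include <stddef.h>

typedef long (*syscall_fn)(long arg);

#define NR_SYSCALLS 4
syscall_fn sys_call_table[NR_SYSCALLS];      /* stand-in for the real table */
syscall_fn original_table[NR_SYSCALLS];      /* original entries, saved */
unsigned long per_syscall_hits[NR_SYSCALLS]; /* per-system-call instrumentation */

/* Stand-in dispatcher: JIFL would run the translated copy of `original`
 * from the code cache; this model just calls it natively. */
static long dispatcher(syscall_fn original, long arg)
{
    return original(arg);
}

/* One entry stub per system call, playing the role of emitted code. */
#define DEFINE_STUB(nr)                                            \
    static long entry_stub_##nr(long arg)                          \
    {                                                              \
        per_syscall_hits[nr]++; /* per-system-call instrumentation */ \
        return dispatcher(original_table[nr], arg);                \
    }
DEFINE_STUB(0)
DEFINE_STUB(1)
DEFINE_STUB(2)
DEFINE_STUB(3)

static syscall_fn const entry_stubs[NR_SYSCALLS] = {
    entry_stub_0, entry_stub_1, entry_stub_2, entry_stub_3,
};

/* Redirect table entry `nr` to its stub, remembering the original. */
void take_control(size_t nr)
{
    assert(nr < NR_SYSCALLS && sys_call_table[nr] != NULL);
    original_table[nr] = sys_call_table[nr];
    sys_call_table[nr] = entry_stubs[nr];
}

/* Example "system call" used in the usage below. */
long sys_double(long x) { return 2 * x; }
```

After `take_control(1)`, calling through `sys_call_table[1]` first bumps `per_syscall_hits[1]` and then reaches the original `sys_double` via the dispatcher, so callers see unchanged results.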
Dispatcher
- Saves registers and condition code state
- Checks whether the target basic block is in the code cache
  - If so, jumps to that basic block
  - Otherwise, invokes the JIT to compile and instrument the new basic block
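The dispatcher's check-then-JIT logic can be sketched as a lookup from original block address to code-cache address. The slide does not say which data structure JIFL uses; the open-addressing hash table below, the 64-byte "translation", and all names are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy code-cache index: original basic-block address -> translated address.
 * Open addressing with linear probing. */
#define CACHE_SLOTS 64 /* power of two */

struct cache_entry {
    uintptr_t orig;   /* original address; 0 marks an empty slot */
    uintptr_t xlated; /* address of the block's copy in the code cache */
};

static struct cache_entry cache_index[CACHE_SLOTS];
static size_t cache_used;
unsigned jit_invocations;
static uintptr_t cache_top = 0x10000; /* toy code-cache allocation pointer */

/* Stand-in for the JIT: pretend every block translates to 64 bytes. */
static uintptr_t jit_compile(uintptr_t orig)
{
    (void)orig;
    jit_invocations++;
    uintptr_t at = cache_top;
    cache_top += 64;
    return at;
}

/* Return the code-cache address to jump to for original address `orig`,
 * compiling the block first if needed. Saving registers and condition
 * codes, which JIFL does before this point, is not modeled. */
uintptr_t dispatch(uintptr_t orig)
{
    /* At least one empty slot must remain so the probe loop terminates. */
    assert(orig != 0 && cache_used < CACHE_SLOTS);
    size_t i = (orig >> 4) & (CACHE_SLOTS - 1);
    while (cache_index[i].orig != 0 && cache_index[i].orig != orig)
        i = (i + 1) & (CACHE_SLOTS - 1);     /* linear probing */
    if (cache_index[i].orig == orig)
        return cache_index[i].xlated;        /* hit: jump to cached block */
    cache_index[i].orig = orig;              /* miss: JIT, then remember */
    cache_index[i].xlated = jit_compile(orig);
    cache_used++;
    return cache_index[i].xlated;
}
```

A second `dispatch` of the same address returns the cached translation without invoking the JIT, which is what lets frequently executed kernel paths stay in the code cache.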
JIT Compiler
- Like a conventional JIT compiler, except that its input and output are both x86 machine code
- Compiles at dynamic basic-block granularity
- All instructions except the final control-flow instruction are copied directly into the code cache
- Control-flow instructions are modified to account for the new location of the code
- Communicates with the JIFL plugin to determine what instrumentation to insert
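Retargeting a direct relative branch is a small piece of arithmetic. For a 5-byte `jmp rel32` at address A with displacement d, the CPU jumps to T = A + 5 + d; once the instruction is copied to A' in the code cache, its displacement must become d' = T - (A' + 5), and d' must still fit in 32 bits. The C sketch below shows only this arithmetic; how JIFL handles each kind of control-flow instruction is not spelled out on the slide, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Target of a jmp/call rel32: the displacement is relative to the address
 * of the next instruction. Addresses are held in int64_t for signed math. */
int64_t rel32_target(int64_t insn_addr, int insn_len, int32_t disp)
{
    return insn_addr + insn_len + disp;
}

/* Compute the displacement that makes the same instruction, copied to
 * new_addr in the code cache, reach the original target. Returns 0 if the
 * target is out of rel32 range from the new location. */
int relocate_rel32(int64_t old_addr, int64_t new_addr, int insn_len,
                   int32_t disp, int32_t *new_disp)
{
    assert(insn_len > 0 && new_disp != NULL);
    int64_t target = rel32_target(old_addr, insn_len, disp);
    int64_t d = target - (new_addr + insn_len);
    if (d < INT32_MIN || d > INT32_MAX)
        return 0;
    *new_disp = (int32_t)d;
    return 1;
}
```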
JIT: Inserting Instrumentation
- Instrumentation is added by inserting a call instruction into the basic block
- Additional instructions are also needed to
  - Push/pop instrumentation parameters
  - Save/restore volatile (caller-saved) registers (eax, ecx, edx)
  - Save/restore the condition code register
- Several optimizations can reduce the instrumentation cost
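To show what those additional instructions look like, here is a C helper that prints, in the slides' operand-order style, one plausible wrapper for a call taking a single constant parameter. The exact sequence JIFL emits is not given on the slide; `emit_instrumentation_call` is hypothetical, and a real JIT would emit machine-code bytes rather than text.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes the wrapper as text into buf; returns snprintf's result. */
int emit_instrumentation_call(char *buf, size_t cap, const char *routine,
                              unsigned long param)
{
    assert(buf != NULL && cap > 0 && routine != NULL);
    return snprintf(buf, cap,
                    "pushf\n"     /* save condition codes */
                    "push eax\n"  /* save caller-saved registers */
                    "push ecx\n"
                    "push edx\n"
                    "push %lx\n"  /* push the parameter */
                    "call %s\n"
                    "add 4,esp\n" /* pop the parameter */
                    "pop edx\n"   /* restore in reverse order */
                    "pop ecx\n"
                    "pop eax\n"
                    "popf\n",
                    param, routine);
}
```

Ten extra instructions surround a single call, which is why eliminating unnecessary saves and inlining small routines pay off.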
Eliminating Redundant State Saving
- Eliminate code that saves dead registers and condition codes
  - Perform liveness analysis
- Reduce state-saving overhead further
  - Per-basic-block instrumentation can run anywhere in its block
  - Search for the cheapest place to insert it
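A minimal C sketch of both ideas: a backward liveness pass over one basic block, then a search for the insertion point with the least live state. Assumptions: only eax, ecx, edx, and the flags are tracked; cost is simply the number of live items; and an instruction's `def` set must list only state it fully overwrites (for example, x86 `inc` preserves CF, so a real analysis must treat the flags at a finer grain than this sketch does). All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Tracked state: the caller-saved registers and the flags. */
enum { R_EAX = 1u << 0, R_ECX = 1u << 1, R_EDX = 1u << 2, R_FLAGS = 1u << 3 };

struct insn {
    unsigned use; /* tracked state the instruction reads */
    unsigned def; /* tracked state the instruction fully overwrites */
};

static int count_bits(unsigned m)
{
    int c = 0;
    for (; m != 0; m &= m - 1)
        c++;
    return c;
}

/* Backward liveness over one basic block: live[i] = state live just
 * before insn i. live_out = state live after the block (pass all bits
 * when unknown, which is the conservative choice). */
void liveness(const struct insn *b, size_t n, unsigned live_out, unsigned *live)
{
    unsigned cur = live_out;
    for (size_t i = n; i-- > 0;) {
        cur = b[i].use | (cur & ~b[i].def);
        live[i] = cur;
    }
}

/* For per-basic-block instrumentation, which may go before any insn of the
 * block, pick the position where the least state must be saved. */
size_t cheapest_point(const struct insn *b, size_t n, unsigned live_out)
{
    assert(n > 0 && n <= 64);
    unsigned live[64];
    liveness(b, n, live_out, live);
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (count_bits(live[i]) < count_bits(live[best]))
            best = i;
    return best;
}
```

At the chosen point only the live items need saving; if the flags are dead there, the `pushf`/`popf` pair can be dropped entirely.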
Inlining Instrumentation
- Small instrumentation routines can be inlined into the basic block
  - Removes the call and ret instructions
- Constant parameters are propagated to remove stack accesses
- Copy propagation and dead-code elimination are applied to specialize the instrumentation routine for its context
- All done on native x86 code. No IR!
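The following C sketch walks an Increment_Counter-style routine (`mov 4(esp),eax ; add 1,(eax) ; ret`) through those steps. For readability the instructions are held in a small decoded struct rather than raw x86 bytes; this is an assumption of the sketch, not a claim about JIFL's internals. It handles only mov/add/ret with register, immediate, and simple memory operands, and `inline_call` and the operand kinds are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Decoded toy instruction form (AT&T order: op src,dst). */
enum opkind { OP_REG, OP_IMM, OP_MEM_REG, OP_MEM_ABS, OP_PARAM };
struct operand { enum opkind kind; long val; }; /* reg no., value, address, or param index */
enum opcode { I_MOV, I_ADD, I_RET };
struct insn { enum opcode op; struct operand src, dst; };

#define NREGS 8
#define MAX_BODY 32

/* Rewrite one operand using constant parameters and known register values. */
static struct operand propagate(struct operand o, const long *params,
                                const int *known, const long *value, int is_dst)
{
    if (o.kind == OP_PARAM)                            /* stack param -> constant */
        return (struct operand){OP_IMM, params[o.val]};
    if (o.kind == OP_REG && !is_dst && known[o.val])   /* copy propagation */
        return (struct operand){OP_IMM, value[o.val]};
    if (o.kind == OP_MEM_REG && known[o.val])          /* (reg) -> (address) */
        return (struct operand){OP_MEM_ABS, value[o.val]};
    return o;
}

/* Inline `body` at a call site whose parameters are the constants `params`.
 * live_after: bitmask of registers the call site still needs afterwards.
 * Returns the number of instructions written to `out`. */
size_t inline_call(const struct insn *body, size_t n, const long *params,
                   unsigned live_after, struct insn *out)
{
    assert(n <= MAX_BODY);
    int known[NREGS] = {0};
    long value[NREGS] = {0};
    struct insn tmp[MAX_BODY];
    size_t m = 0;

    /* Forward pass: drop ret (no call/ret once inlined), propagate constants. */
    for (size_t i = 0; i < n; i++) {
        if (body[i].op == I_RET)
            continue;
        struct insn x = body[i];
        x.src = propagate(x.src, params, known, value, 0);
        x.dst = propagate(x.dst, params, known, value, 1);
        if (x.dst.kind == OP_REG) { /* register redefined here */
            known[x.dst.val] = (x.op == I_MOV && x.src.kind == OP_IMM);
            value[x.dst.val] = x.src.val;
        }
        tmp[m++] = x;
    }

    /* Backward pass: dead-code elimination of movs into unused registers. */
    unsigned live = live_after;
    int keep[MAX_BODY];
    for (size_t i = m; i-- > 0;) {
        struct insn x = tmp[i];
        unsigned dbit = (x.dst.kind == OP_REG) ? 1u << x.dst.val : 0u;
        keep[i] = !(x.op == I_MOV && dbit != 0 && !(live & dbit));
        if (!keep[i])
            continue;
        if (x.op == I_MOV)
            live &= ~dbit;           /* mov only writes its destination */
        if (x.src.kind == OP_REG || x.src.kind == OP_MEM_REG)
            live |= 1u << x.src.val; /* registers this insn reads */
        if (x.dst.kind == OP_MEM_REG || (x.op == I_ADD && dbit != 0))
            live |= 1u << x.dst.val;
    }

    size_t k = 0;
    for (size_t i = 0; i < m; i++)
        if (keep[i])
            out[k++] = tmp[i];
    return k;
}
```

With the counter's address as the constant parameter and eax dead after the call site, the three-instruction routine collapses to a single `add 1,(address)`; if eax is still live, the `mov` of the address into eax is kept.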
Effect of Optimizations
- Average system call latencies with per-basic-block instrumentation
[Chart: normalized execution time]
Prototype
Memory Allocator
- While JITing, JIFL often needs to allocate dynamic memory
- Cannot rely on Linux's kmalloc and vmalloc routines, as they are not reentrant
  - JIFL may need memory while the kernel is already executing inside one of them
- Instead, we created our own memory allocator
  - Pre-allocates a heap when JIFL starts up
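A minimal C sketch of such an allocator, assuming fixed-size blocks carved out of a heap reserved once at startup and kept on a free list, so allocation and freeing never call back into kmalloc or vmalloc. The block size, heap size, and function names are hypothetical. The sketch is single-threaded; an in-kernel version on an SMP machine would also need per-CPU pools or some other protection against concurrent and interrupt-time use.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK_SIZE 64
#define NUM_BLOCKS 1024

union block {
    union block *next;               /* link while on the free list */
    unsigned char bytes[BLOCK_SIZE]; /* payload while allocated */
};

static union block heap[NUM_BLOCKS]; /* the pre-allocated heap */
static union block *free_list;

/* Thread every block onto the free list, lowest address first. */
void pool_init(void)
{
    free_list = NULL;
    for (size_t i = NUM_BLOCKS; i-- > 0;) {
        heap[i].next = free_list;
        free_list = &heap[i];
    }
}

/* O(1) allocation; returns NULL if the request is too big or the pool is empty. */
void *pool_alloc(size_t size)
{
    if (size > BLOCK_SIZE || free_list == NULL)
        return NULL;
    union block *b = free_list;
    free_list = b->next;
    return b;
}

/* O(1) free: push the block back onto the free list. */
void pool_free(void *p)
{
    if (p == NULL)
        return;
    union block *b = p;
    assert(b >= heap && b < heap + NUM_BLOCKS);
    b->next = free_list;
    free_list = b;
}
```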
Releasing Control
- Calls to schedule() have to be redirected
  - Otherwise, JIFL keeps control even after a context switch
- To do so, JIFL must
  - Save the return address in a hash table
  - Call schedule()
  - Look up the saved return address and call the dispatcher
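A sketch of the bookkeeping around a redirected schedule() call, in C. The saved addresses are keyed by task because several tasks can be asleep inside schedule() at once, each needing to resume at its own place. Here a small array indexed by task id stands in for JIFL's hash table, and the function names are hypothetical. Between the two calls, schedule() itself runs natively rather than from the code cache.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_TASKS 64

/* Stand-in for the hash table: one slot per task id, holding the
 * original-code return address of a task that is inside schedule(). */
static uintptr_t saved_ret[MAX_TASKS];
static int has_saved[MAX_TASKS];

/* Runs before the native call to schedule(): remember where this task
 * should resume in the original code. */
void before_schedule(int task, uintptr_t orig_ret)
{
    assert(task >= 0 && task < MAX_TASKS && !has_saved[task]);
    saved_ret[task] = orig_ret;
    has_saved[task] = 1;
}

/* Runs after schedule() returns: fetch the saved address, which is then
 * passed to the dispatcher to continue executing from the code cache. */
uintptr_t after_schedule(int task)
{
    assert(task >= 0 && task < MAX_TASKS && has_saved[task]);
    has_saved[task] = 0;
    return saved_ret[task];
}
```

If task 2 goes to sleep before task 1 resumes, each still gets back its own return address, which a single global slot could not guarantee.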
Performance Comparison
Performance Evaluation
- Instrument every system call with three types of instrumentation
  - System call monitoring (coarse grained)
  - Call tracing (medium grained)
  - Basic block counting (fine grained)
- LMbench and ApacheBench2 benchmarks
- Test setup
  - 4-way Intel Pentium 4 Xeon SMP, 2.8 GHz
  - Linux 2.6.17.13
  - With SMP support and no preemption
System Call Monitoring
[Chart: normalized execution time]
Call Tracing
[Chart: normalized execution time, log scale]
Basic Block Counting
[Chart: normalized execution time, log scale]
Apache Throughput
[Chart: normalized requests per second]
Example Plugin
- Checking correctness of branch hints
Example Plugin: Checking Branch Hints
  int correct_count ← 0
  int incorrect_count ← 0

  // Called for every newly discovered basic block.
  procedure Basic_Block_Callback
    if last instruction is not a hinted branch
      return
    if hinted in the branch-not-taken direction
      call Insert_Branch_Not_Taken_Instrumentation(Increment_Counter, correct_count)
      call Insert_Branch_Taken_Instrumentation(Increment_Counter, incorrect_count)
    else
      // Same instrumentation, but for the reverse branch directions
      call Insert_Branch_Taken_Instrumentation(Increment_Counter, correct_count)
      call Insert_Branch_Not_Taken_Instrumentation(Increment_Counter, incorrect_count)

  // Executed for every instrumented branch.
  procedure Increment_Counter(reference to int Counter)
    Counter ← Counter + 1
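The plugin logic can be rendered in C as follows. The `branch_site` structure and `execute_branch`, which stands in for the instrumented branch running in the code cache, are hypothetical; the real plugin registers instrumentation through JIFL's own interface, which the slide does not show. The counter is passed by pointer so that each increment reaches the global count.

```c
#include <assert.h>
#include <stddef.h>

/* The final branch of a basic block, with the counters the plugin attaches
 * to each direction. */
struct branch_site {
    int hinted;                  /* block ends in a hinted branch? */
    int hinted_taken;            /* 1: hinted taken, 0: hinted not taken */
    unsigned long *on_taken;     /* counter for the taken direction */
    unsigned long *on_not_taken; /* counter for the fall-through direction */
};

unsigned long correct_count, incorrect_count;

/* Takes a pointer so the increment reaches the global counter. */
static void increment_counter(unsigned long *counter) { (*counter)++; }

/* Called for every newly discovered basic block. */
void basic_block_callback(struct branch_site *b)
{
    if (!b->hinted)
        return;
    if (!b->hinted_taken) {           /* hinted not taken */
        b->on_not_taken = &correct_count;
        b->on_taken = &incorrect_count;
    } else {                          /* reverse directions */
        b->on_taken = &correct_count;
        b->on_not_taken = &incorrect_count;
    }
}

/* Stand-in for the instrumented branch executing in the code cache. */
void execute_branch(const struct branch_site *b, int taken)
{
    unsigned long *c = taken ? b->on_taken : b->on_not_taken;
    if (c != NULL)
        increment_counter(c);
}

/* Percentage of executed hinted branches that went against their hint. */
double misprediction_percent(void)
{
    unsigned long total = correct_count + incorrect_count;
    assert(total > 0);
    return 100.0 * (double)incorrect_count / (double)total;
}
```

A branch hinted not taken that is taken three times out of four yields a 75% misprediction rate, while branches without a hint are never counted.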
Example Plugin: Checking Branch Hints
- 5 system calls with poor branch hint performance
  - Misprediction rates > 75%
  - Contained > 30% of the hinted branches executed
- Examined using a second plugin
  - Monitored individual branches
  - Found the 4 greatest contributors
  - Mapped them back to source code
  - Can't fix: not hinted by the programmer!
Conclusions
- JIT instrumentation is viable for operating systems
- Developed a prototype for the Linux kernel (JIFL)
- Results are very competitive
  - JIFL outperforms Kprobes by orders of magnitude
- Enables more powerful instrumentation
  - e.g., checking branch hints