Title: Pin: Building Customized Program Analysis Tools with Dynamic Instrumentation
1. Pin: Building Customized Program Analysis Tools with Dynamic Instrumentation
- C.K. Luk, R. Cohn, R. Muth, H. Patil, A. Klauser, G. Lowney, S. Wallace, V.J. Reddi, K. Hazelwood
- Presented by Michael Laurenzano
2. What is Program Instrumentation?
- Inserting extra code into an application to observe its behavior
- Example: Cache Simulation
    for (int i = 0; i < LENGTH; i++) {
        CacheSim(&A[i]); A[i] = (double)i;  // simulate each access, then perform it
        CacheSim(&B[i]); B[i] = (double)i;
        CacheSim(&C[i]); C[i] = (double)i;
    }
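The slide does not show what `CacheSim` does internally. As a minimal sketch, assuming a direct-mapped cache, the routine could forward each address to a model like the one below. The class name `DirectMappedCache` and its geometry parameters are hypothetical, not taken from the talk.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical direct-mapped cache model; the geometry is an assumption.
class DirectMappedCache {
public:
    DirectMappedCache(std::size_t numLines, std::size_t lineBytes)
        : lineBytes_(lineBytes), blocks_(numLines, 0), valid_(numLines, false) {
        assert(numLines > 0 && lineBytes > 0);
    }

    // Simulate one memory access; returns true on a hit.
    bool access(std::uintptr_t addr) {
        const std::uintptr_t block = addr / lineBytes_;    // which memory block
        const std::size_t index = block % blocks_.size();  // which cache line
        if (valid_[index] && blocks_[index] == block) {
            ++hits_;
            return true;
        }
        ++misses_;
        blocks_[index] = block;  // fill the line, evicting the previous block
        valid_[index] = true;
        return false;
    }

    std::uint64_t hits() const { return hits_; }
    std::uint64_t misses() const { return misses_; }

private:
    std::size_t lineBytes_;
    std::vector<std::uintptr_t> blocks_;  // block number held by each line
    std::vector<bool> valid_;
    std::uint64_t hits_ = 0;
    std::uint64_t misses_ = 0;
};
```

Under this sketch, a call such as `CacheSim(&A[i])` would reduce to `cache.access(reinterpret_cast<std::uintptr_t>(&A[i]))` on a shared `cache` object.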
3. Uses of Program Instrumentation
- Code Profiles
  - Basic block/Instruction count
  - Operation results
- Microarchitectural study
  - Branch outcomes
  - Memory addresses
- Bug checking
  - Memory leaks
  - Uninitialized data
4-9. Pin System Layout
[Figure: Pin system architecture, built up one component per slide]
- Pintool: tells us where and how to perform analysis
- JIT compiler: combines application and pintool code to create instrumented code
- Code cache: stores the instrumented code created by the JIT
- Application: the code being analyzed
- Controls execution, maintains data structures, tracks program state
10. Simplified Instrumentation
- Transfer control to the VM at an application control transfer
- Look for an instrumented version of the branch target in the code cache
- If found, execute the instrumented code
- If not, compile the code, insert it into the code cache, and execute the new code
- Repeat
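The loop on this slide can be sketched as a toy dispatcher. This is an illustration only, not Pin's implementation: `ToyVM`, `Addr`, `Code`, and `kExit` are hypothetical names, "compiling" merely wraps the original code with a counter, and each code sequence is modeled as a function that returns its branch target.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <unordered_map>

using Addr = std::uint32_t;
using Code = std::function<Addr()>;  // runs a sequence, returns the branch target
constexpr Addr kExit = 0;            // hypothetical "program finished" target

struct ToyVM {
    std::unordered_map<Addr, Code> application;  // original code by address
    std::unordered_map<Addr, Code> codeCache;    // instrumented code by address
    std::uint64_t compiles = 0;  // how many times the "JIT" ran
    std::uint64_t executed = 0;  // instrumentation: sequences executed

    // "Compile": wrap the original code with an inserted counter.
    // The wrapper captures `this`, so a ToyVM must not be copied or moved.
    Code compile(Addr pc) {
        assert(application.count(pc) == 1);
        ++compiles;
        Code original = application.at(pc);
        return [this, original]() { ++executed; return original(); };
    }

    void run(Addr entry) {
        Addr pc = entry;
        while (pc != kExit) {
            auto it = codeCache.find(pc);  // look up the branch target
            if (it == codeCache.end()) {   // not found: compile and insert
                it = codeCache.emplace(pc, compile(pc)).first;
            }
            pc = it->second();             // execute, then repeat
        }
    }
};
```

In this sketch a loop body that runs many times is compiled once and afterwards served from the code cache, which is the property the slide relies on.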
11. Trace Linking
- Transfer control directly between traces
- Branch target must be known statically
- Target trace must be present in code cache
[Figure: control flow in three configurations. Regular Execution: Sequence 1 -> Sequence 2. Pin w/o Trace Linking: Trace 1 -> Virtual Machine -> Trace 2. Pin w/ Trace Linking: Trace 1 -> Trace 2.]
12. Trace Linking (Indirect)
- Unknown targets are usually somewhat predictable
  - Function typically returns to a few locations (few call sites)
  - Indirect jump usually goes to a few locations
- Try several predicted targets to see if we can avoid VM intervention
  - Short target lists are maintained for each indirect branch
  - If we exhaust this list, use the VM
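A minimal sketch of the per-branch target list described above. The class name `IndirectBranch`, the list capacity, and the policy of appending a newly seen target after a VM fallback are assumptions made for illustration; the slide only states that short lists are tried before the VM.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Addr = std::uint64_t;

// Hypothetical model of one indirect branch and its short list of
// predicted targets.
class IndirectBranch {
public:
    explicit IndirectBranch(std::size_t maxTargets) : maxTargets_(maxTargets) {
        assert(maxTargets > 0);
    }

    // Returns true if a predicted target matched (no VM intervention),
    // false if the list was exhausted and the VM had to resolve the target.
    bool resolve(Addr actualTarget) {
        for (Addr predicted : targets_) {
            if (predicted == actualTarget) {
                ++linkedHits_;
                return true;
            }
        }
        ++vmFallbacks_;
        // Assumed policy: remember the new target while there is room.
        if (targets_.size() < maxTargets_) {
            targets_.push_back(actualTarget);
        }
        return false;
    }

    std::uint64_t linkedHits() const { return linkedHits_; }
    std::uint64_t vmFallbacks() const { return vmFallbacks_; }

private:
    std::size_t maxTargets_;
    std::vector<Addr> targets_;
    std::uint64_t linkedHits_ = 0;
    std::uint64_t vmFallbacks_ = 0;
};
```

Keeping the list short bounds the cost of the comparisons, which matters because every miss in the list is paid before the VM is entered anyway.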
13. Function Cloning
- Most common indirect control transfer is a function return
- Create a function instance for each call site
- Return address is then unique and known for each function instance
- Turns this indirect control transfer into a direct control transfer
- Code bloat!
- Implemented by keeping a call stack for each instrumented instruction sequence
  - Keep last 4 in call stack
  - Call stack represented as a 64-bit integer
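The slide does not say how four call-stack entries are packed into 64 bits. One plausible encoding, assumed here purely for illustration, gives each call site a 16-bit identifier and shifts the oldest entry out on every call. The names `pushCall`, `entryAt`, and `CloneTable` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>

using Addr = std::uint64_t;
using CallStackKey = std::uint64_t;  // four 16-bit call-site ids (assumed layout)

// Push a call site: the oldest of the four entries falls off the top.
inline CallStackKey pushCall(CallStackKey key, std::uint16_t site) {
    return (key << 16) | site;
}

// Read an entry; depth 0 is the most recent call site.
inline std::uint16_t entryAt(CallStackKey key, unsigned depth) {
    assert(depth < 4);
    return static_cast<std::uint16_t>(key >> (16 * depth));
}

// One function instance (clone) per distinct (function, call stack) pair.
struct CloneTable {
    std::map<std::pair<Addr, CallStackKey>, int> clones;

    int instanceFor(Addr function, CallStackKey key) {
        const auto k = std::make_pair(function, key);
        auto it = clones.find(k);
        if (it == clones.end()) {
            it = clones.emplace(k, static_cast<int>(clones.size())).first;
        }
        return it->second;
    }
};
```

With this keying, the same function reached from two call sites gets two instances, each with a single known return address. The table also makes the "code bloat" concern concrete: it grows with the number of distinct call-stack suffixes, not with the number of functions.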
14. Register Bindings
- Register re-allocation occurs so that Pin can use registers
- The register bindings can be different from one trace to the next
- When compiling, keep register bindings from the previous trace if possible
- When linking traces, modify the register bindings before going to the next trace
- Usually only a few registers are mismatched in practice
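To illustrate the linking step, the sketch below compares the binding at a trace's exit with the binding expected at the next trace's entry and lists the registers that need fixing. The representation (a vector indexed by virtual register) and the names `Binding`, `Move`, and `reconcile` are assumptions; ordering the resulting moves safely, for example when two registers must be swapped, is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// binding[v] = physical register currently holding virtual register v.
using Binding = std::vector<int>;

struct Move {
    int virtualReg;
    int fromPhysical;
    int toPhysical;
};

// List the virtual registers whose physical location differs between the
// exit of one trace and the entry of the next.
std::vector<Move> reconcile(const Binding& exitBinding,
                            const Binding& entryBinding) {
    assert(exitBinding.size() == entryBinding.size());
    std::vector<Move> moves;
    for (std::size_t v = 0; v < exitBinding.size(); ++v) {
        if (exitBinding[v] != entryBinding[v]) {
            moves.push_back({static_cast<int>(v), exitBinding[v], entryBinding[v]});
        }
    }
    return moves;
}
```

When the compiler reuses the previous trace's bindings, as the slide recommends, this list is empty or very short, so the compensation code on a link stays cheap.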
15. Optimization: Inlined Analysis Routines
[Figure: Without Inlining: Application -> Bridge Routine -> Analysis Routine -> Bridge Routine -> Application. With Inlining: Application -> Bridge Code -> Analysis Code -> Bridge Code -> Application.]
- 2 fewer calls and 2 fewer returns
- Other optimizations: constant folding, code relocation
16. Optimization: eflags Register Liveness
- The x86 eflags register is treated as a bit-vector containing state information
- This register can be modified as a side-effect of some instructions
- eflags might not be live when we reach the analysis routine
  - If this is the case, we do not need to save/restore it
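The liveness test can be sketched as a forward scan from the insertion point: a flag is live if some later instruction reads it before another instruction overwrites it. The instruction model below (`Insn` with read and write masks), the four-flag subset, and the choice to treat flags still undecided at the end of the trace as live are simplifying assumptions, not Pin's actual analysis.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using FlagSet = std::uint8_t;
constexpr FlagSet kCarry = 1, kZero = 2, kSign = 4, kOverflow = 8;
constexpr FlagSet kAllFlags = kCarry | kZero | kSign | kOverflow;

// Hypothetical instruction summary: which flags it reads and which it writes.
struct Insn {
    FlagSet reads;
    FlagSet writes;
};

// Flags that are live just before trace[pos] executes.
FlagSet liveFlagsAt(const std::vector<Insn>& trace, std::size_t pos) {
    assert(pos <= trace.size());
    FlagSet live = 0;
    FlagSet undecided = kAllFlags;  // neither read nor overwritten so far
    for (std::size_t i = pos; i < trace.size() && undecided != 0; ++i) {
        // A read of a still-undecided flag makes it live.
        live = static_cast<FlagSet>(live | (trace[i].reads & undecided));
        // Once read or written, the flag's status is settled.
        undecided = static_cast<FlagSet>(
            undecided & ~(trace[i].reads | trace[i].writes));
    }
    // Conservative: anything unresolved at the trace exit counts as live.
    return static_cast<FlagSet>(live | undecided);
}
```

An analysis call inserted at `pos` would need to save and restore eflags only when `liveFlagsAt(trace, pos)` is nonzero.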
17. Optimization: Call Scheduling
- User can specify that the routine be put anywhere in the particular scope
  - Anywhere in instruction, basic block, function, program, etc.
- Pin can schedule the call according to best performance
  - Perhaps at a point where few registers need to be saved
- How well will this actually work?
18. Basic Pin Overhead
19. Effectiveness of Optimizations
20. Questions?