Title: Research Overview for LLVM Group
1Automatic Pool Allocation: Compile-Time Control
Over Complete Pointer-Based Data Structures
Vikram Adve
University of Illinois at Urbana-Champaign
Joint work with Chris Lattner, Dinakar
Dhurjati, Sumant Kowshik
Thanks: NSF (CAREER, Embedded02, NGS00, NGS99, OSC99), Marco/DARPA
2Why Does Data Layout Matter?
- Performance: working sets, spatial locality, temporal locality, heap allocation overheads
- Security: buffer overruns, dangling pointers, uninitialized pointers
- S/w Reliability: dangling pointers, checkpointing, static bug detection, static data race detection
... and complex heap-based data structures are ubiquitous.
3Compiling Pointer-Intensive Codes Today
- Current analyses and transformations focus on primitives:
  - disambiguate individual loads and stores
  - optimize individual loads and stores
  - reorder, split, or merge individual data types
- Q. Can compilers manipulate entire logical data structures?
  - A list?
  - A tree of linked lists?
  - A hashtable?
  - A graph?
4[Figure labels: List 1 Nodes, List 2 Nodes, Tree Nodes]
5Why Segregate Data Structures into Pools?
- Programs are designed around data structures
- Direct benefit of segregation: better performance
  - Smaller working sets
  - Improved spatial locality
  - Sometimes convert irregular to regular strides
- Primary goal: better compiler information and control
  - Compiler knows where (sets of) data structures live in memory
  - Compiler knows order of data in memory (in some cases)
  - Compiler knows type information and the runtime points-to graph
  - Compiler knows which pools point to which other pools
  - Compiler knows bounds on pool lifetimes
6Outline
- Automatic Pool Allocation [LA PLDI05]
- Using Pool Allocation to Improve Performance
  - Use 1: Improving heap locality, performance
  - Use 2: Transparent pointer compression [LA MSP05]
- Using Pool Allocation for Bug Detection, Security
  - Use 3: Detecting buffer overruns fast and transparently [DA ICSE06]
  - Use 4: Detecting all dangling pointer errors fast [DA Submitted]
  - Use 5: SAFECode ...
- SAFECode: A Safe Execution Environment for C/C++
  - Sound program analysis, memory safety for full C [DKA PLDI06]
  - Memory safety for type-safe C [DKAL TECS05]
7- Automatic Pool Allocation
- The transformation algorithm
Lattner and Adve, PLDI 2005 (Best Paper Award)
8Pool Allocation: Current Approaches
- Current: manual pool allocation
  - Via library: by class (e.g., C++ STL), scope, or data structure
  - Via language support: by scope or data structure
Compiler has no information about pool properties
- Automatic region inference for ML (Tofte & Birkedal, Aiken)
  - By lifetime only, e.g., stack of regions
  - Limited destructive updates
Goal is memory management, not layout control, not DS separation
- Never automated before:
  - Imperative languages, including C, C++, ...
  - Pool allocation by logical data structures
9Pool Allocation: The Key Insight
- Partition heap objects according to the results of some pointer analysis.
- The pointer analysis representation we use is called a Data Structure Graph (DS Graph).
10DS Graph Properties
int G;
void twoLists() {
  list *X = makeList(10);
  list *Y = makeList(100);
  addGToList(X);
  addGToList(Y);
  freeList(X);
  freeList(Y);
}
11DS Graph for Olden MST Benchmark
Key Insight: A fully context-sensitive points-to graph identifies data structure instances.
Fully context-sensitive => identify objects by full acyclic call paths.
12DS Graph for Olden EM3D Benchmark
13DS Graph for Olden Power Benchmark
- Olden-Power Benchmark
build_tree()
  t = malloc()
  t->l = build_lateral()
build_lateral()
  l = malloc()
  l->next = build_lateral()
  l->b = build_branch()
14Automatic Pool Allocation Overview
- Segregate memory according to points-to graph
- N graph nodes -> 1 pool (default 1-to-1)
- Retain explicit free() for objects
[Figure: points-to graph (two disjoint linked lists), one list in Pool 1 and the other in Pool 2]
15Points-to Graph Assumptions
- Specific assumptions:
  - Separate points-to graph for each function
  - Unification-based graph
  - Can be used to compute escape info
- Use any points-to analysis that satisfies the above
- Our implementation uses DSA [Lattner PhD]
  - Infers C type info for many objects
  - Context-sensitive
  - Field-sensitive analysis
  - Results show that it is very fast
DSA + pool allocation time is < 3% of GCC -O3 time for all tested programs.
16Pool Allocation Example
list *makeList(int Num) {
  list *New = malloc(sizeof(list));
  New->Next = Num ? makeList(Num-1) : 0;
  New->Data = Num; return New;
}

int twoLists( ) {
  list *X = makeList(10);
  list *Y = makeList(100);
  GL = Y;
  addGToList(X);
  addGToList(Y);
  freeList(X);
  freeList(Y);
}
Change calls to free into calls to poolfree => retain explicit deallocation
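A minimal sketch of what the pool-allocated form of this example could look like. The toy runtime here is an assumption for illustration: the names poolinit and poolalloc, the Pool layout, and the per-object list header are not taken from the slides, and sumList is a hypothetical stand-in for addGToList. The assignment to the global GL is omitted so that both pools can stay local to twoLists.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy pool runtime (illustrative only): every object carries a small
 * header that links it into its owning pool. */
typedef struct PoolNode { struct PoolNode *prev, *next; } PoolNode;
typedef struct Pool { PoolNode *head; size_t live; } Pool;

void poolinit(Pool *P) { P->head = NULL; P->live = 0; }

void *poolalloc(Pool *P, size_t size) {
    PoolNode *n = malloc(sizeof(PoolNode) + size);
    if (!n) abort();
    n->prev = NULL;
    n->next = P->head;
    if (P->head) P->head->prev = n;
    P->head = n;
    P->live++;
    return n + 1;                    /* user data follows the header */
}

void poolfree(Pool *P, void *obj) {
    PoolNode *n = (PoolNode *)obj - 1;
    if (n->prev) n->prev->next = n->next; else P->head = n->next;
    if (n->next) n->next->prev = n->prev;
    P->live--;
    free(n);
}

/* Releases whatever is still live in the pool. */
void pooldestroy(Pool *P) {
    while (P->head) poolfree(P, P->head + 1);
}

typedef struct list { struct list *Next; int Data; } list;

/* The pool descriptor becomes an extra argument. */
list *makeList(Pool *PD, int Num) {
    list *New = poolalloc(PD, sizeof(list));
    New->Next = Num ? makeList(PD, Num - 1) : NULL;
    New->Data = Num;
    return New;
}

/* free() becomes poolfree(): explicit deallocation is retained. */
void freeList(Pool *PD, list *L) {
    while (L) { list *Next = L->Next; poolfree(PD, L); L = Next; }
}

/* Hypothetical stand-in for addGToList: just reads the list. */
int sumList(const list *L) {
    int s = 0;
    for (; L; L = L->Next) s += L->Data;
    return s;
}

int twoLists(void) {
    Pool PD1, PD2;                   /* one pool per disjoint list */
    poolinit(&PD1);
    poolinit(&PD2);
    list *X = makeList(&PD1, 10);
    list *Y = makeList(&PD2, 100);
    int total = sumList(X) + sumList(Y);
    freeList(&PD1, X);
    freeList(&PD2, Y);
    pooldestroy(&PD1);
    pooldestroy(&PD2);
    return total;
}
```

Each list lives entirely in its own pool, so the nodes of X and Y never interleave in memory, and pooldestroy bounds the lifetime of each list.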
17Pool Allocation Algorithm Details
- Indirect function calls:
  - call fp1(arg1 ... argN), fp1 -> {F1, F2}
  - call fp2(arg1 ... argN), fp2 -> {F2, F3}
  - Must pass same pool arguments to F1, F2 and F3
- Partition functions into equivalence classes:
  - If F1, F2 have common call-site => same class
- Merge points-to graphs for each equivalence class
- Apply previous transformation unchanged
- Pools reachable from global variables:
  - Such a pooldesc is a runtime constant, so make it global also
- See paper for details [LA PLDI05]
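The partitioning step for indirect calls can be sketched with a union-find structure. This is an illustration only, not the implementation from the paper: functions are represented by small integer ids, and the names classes_init, class_of, and merge_call_site are hypothetical.

```c
#include <assert.h>

enum { MAX_FUNCS = 16 };             /* assumed upper bound for the sketch */
static int parent[MAX_FUNCS];

/* Start with every function in its own equivalence class. */
void classes_init(void) {
    for (int i = 0; i < MAX_FUNCS; i++) parent[i] = i;
}

/* Representative of f's class, with path halving. */
int class_of(int f) {
    assert(f >= 0 && f < MAX_FUNCS);
    while (parent[f] != f) {
        parent[f] = parent[parent[f]];
        f = parent[f];
    }
    return f;
}

/* All possible targets of one indirect call site must receive the same
 * pool arguments, so they are merged into a single class. */
void merge_call_site(const int *targets, int n) {
    for (int i = 1; i < n; i++) {
        int a = class_of(targets[0]);
        int b = class_of(targets[i]);
        if (a != b) parent[b] = a;
    }
}
```

With call sites fp1 -> {F1, F2} and fp2 -> {F2, F3}, merging both sites places F1, F2, and F3 in one class because F2 is shared, which is why all three must take the same pool arguments.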
18Two Further Refinements
- (1) Eliminating poolfree()
  - poolfree() just before pooldestroy() is redundant
  - This is effectively static garbage collection!
      DS = Create(P);
      ProcessData(DS);
      Free(DS, P);     // redundant if ...
      pooldestroy(P);
- (2) Reducing pool lifetimes
  - Pools need not be created / destroyed at function boundaries
  - Intraprocedural flow analysis to create later, destroy earlier
  - Can be extended interprocedurally [Aiken et al., PLDI 96]
19Pool Allocation Properties
- Strengths
  - Transparent: fully automatic for any LLVM program
  - Static map: every pointer var/field points to a unique, known pool
  - Pool type information: many type-homogeneous pools
  - Lifetimes: lifetime of every pool is bounded
  - Pool points-to graph: compiler knows which pools contain pointers to every pool, and vice versa
- Limitations
  - No deallocation: no automatic deallocation of items in pools
  - Unsafe: no guarantee of memory safety
  - Lifetimes: pools reachable from global vars have global lifetime
  - Missing type info: type-unsafe objects (DS nodes)
20- Use 1 of Pool Allocation
- Improving performance of heap-intensive codes
Lattner and Adve, PLDI 2005
21Simple Pool Allocation Statistics
DSA + pool allocation compile time is small: less than 3% of GCC compile time for all tested programs. See paper for details.
22Pool Allocation Speedup
- Several programs unaffected by pool allocation
- 10-20% speedup across many pointer-intensive programs
- Some programs (ft, chomp) an order of magnitude faster
23Cache/TLB miss reduction
Miss rates measured with perfctr on AMD Athlon 2100+
- Sources:
  - Defragmented heap
  - Reduced inter-object padding
  - Segregating the heap!
24Chomp Access Pattern with Malloc
25Chomp Access Pattern with PoolAlloc
26FT Access Pattern With Malloc
- Heap segregation has a similar effect on FT
- See Lattner's Ph.D. thesis for details
27Pool Specific Optimizations
- Different data structures have different properties
- Pool allocation segregates heap
  - Optimize using pool-specific properties
- Examples of properties we look for:
  - Pool is type-homogeneous
  - Pool contains data that only requires 4-byte alignment
  - Opportunities to reduce allocation overhead
28Looking Closely: Anatomy of a Heap
- Fully general malloc-compatible allocator
  - Supports malloc/free/realloc/memalign etc.
  - Standard malloc overheads: object header, alignment
  - Allocates slabs of memory with exponential growth
  - By default, all returned pointers are 8-byte aligned
- In memory, things look like (16-byte allocs):
[Figure: per-object layout is 4-byte padding for user-data alignment, 4-byte object header, 16-byte user data; shown against one 32-byte cache line]
29Pool-Specific Optimizations
- Selective pool allocation
  - Don't pool allocate when not profitable
- PoolFree elimination
  - poolfree redundant if followed by pooldestroy
- Bump-pointer allocation if pool has no poolfree
  - Eliminate per-object header
  - Eliminate freelist overhead (faster object allocation)
- Type-safe pools infer a type for the pool
  - Use 4-byte alignment for pools we know don't need 8-byte alignment
30PAOpts (3/4): Bump Pointer Optimization
- If a pool has no poolfrees:
  - Eliminate per-object header
  - Eliminate freelist overhead (faster object allocation)
- Eliminates 4 bytes of inter-object padding
  - Pack objects more densely in the cache
- Interacts with poolfree elimination (PAOpt 2/4)!
  - If poolfree elim deletes all frees, BumpPtr can apply
[Figure: 16-byte user data objects packed back to back, with no header or padding; shown against one 32-byte cache line]
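A bump-pointer pool can be sketched in a few lines. This is a hedged illustration, not the runtime used in the talk: BumpPool, bump_init, and bump_alloc are hypothetical names, and the sketch uses one fixed slab where a real pool would obtain more slabs as needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct BumpPool {
    unsigned char *cur;              /* next free byte */
    unsigned char *end;              /* one past the end of the slab */
} BumpPool;

void bump_init(BumpPool *P, void *slab, size_t bytes) {
    P->cur = slab;
    P->end = P->cur + bytes;
}

/* Allocation is an align-up plus an add: no object header, no free list.
 * Returns NULL when the slab is exhausted. */
void *bump_alloc(BumpPool *P, size_t size, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);   /* power of two */
    uintptr_t p = ((uintptr_t)P->cur + align - 1) & ~(uintptr_t)(align - 1);
    if (p > (uintptr_t)P->end || size > (uintptr_t)P->end - p) return NULL;
    P->cur = (unsigned char *)p + size;
    return (void *)p;
}
```

Because there is no poolfree, individual objects are never returned; the whole slab is released at once when the pool is destroyed. Consecutive 16-byte allocations land exactly 16 bytes apart, so two objects fit in a 32-byte cache line.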
31PAOpts (4/4): Alignment Analysis
- Malloc must return 8-byte aligned memory:
  - It has no idea what types will be used in the memory
  - Some machines bus error, others suffer performance problems for unaligned memory
- Type-safe pools infer a type for the pool:
  - Use 4-byte alignment for pools we know don't need 8-byte alignment
  - Reduces inter-object padding
[Figure: each 4-byte object header is followed directly by 16 bytes of user data, with no padding; shown against one 32-byte cache line]
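The per-object footprint in the three layouts (general malloc, bump pointer, 4-byte alignment) follows from one rounding rule. The sketch below assumes the header sits immediately before the user data and that each header-plus-data unit is padded up to the required alignment; object_stride is a hypothetical helper, not a function from the pool runtime.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to a multiple of align (align must be a power of two). */
static size_t round_up(size_t n, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Distance in bytes between the starts of consecutive objects. */
size_t object_stride(size_t size, size_t header, size_t align) {
    return round_up(header + size, align);
}
```

For 16-byte objects: object_stride(16, 4, 8) is 24 (header plus 4 bytes of padding), object_stride(16, 0, 8) is 16 (bump pointer, no header), and object_stride(16, 4, 4) is 20 (header kept, padding removed).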
32Pool Optimization Speedup (FullPA)
[Chart: run time relative to pool allocation ("PA Time")]
- Baseline 1.0 = run time with pool allocation
- Optimizations help all of these programs
  - Despite being very simple, they make a big impact
33- Use 3 of Pool Allocation
- Detecting buffer overruns fast and transparently
- Dhurjati and Adve, ICSE 2006, to appear
34Array Bounds Errors
- Most common reason for security attacks
  - Over 50% of attacks reported by CERT
  - 1988: First exploited
  - ...
  - 2006: Continues to get exploited
Key problem: Tracking the target object of each pointer is very expensive (without fat pointers)
35Jones-Kelly Transparent Bounds Checking
[Figure: tree of registered objects; one node is labeled (p, n 4)]
ref = lookup(q)
Check(ref, r)
Idea: Register all array objects in a global splay tree; lookup on every pointer calculation.
Advantage: Backwards-compatible; no wrappers needed.
Problem: 4-5x slowdowns (up to 12x for Ruwase-Lam extension).
36Separate search tree per pool
ref = lookup(P1, q)
Check(ref, r)
- 3 Key Insights
  - Splay tree for a pool should be (very) small.
    - In fact, 2-element cache works great!
  - Pool for each pointer is known!
  - In type-homogeneous pools, can distinguish (and ignore) scalars.
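A per-pool lookup with a two-element cache might look like the following sketch. It is illustrative only: a linear array stands in for the per-pool splay tree, the names PoolObjs, pool_register, pool_lookup, and pool_boundscheck are hypothetical, and the check is stricter than real checkers, which also accept the one-past-the-end address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { MAX_OBJS = 64 };              /* assumed capacity for the sketch */

typedef struct Range { uintptr_t start, end; } Range;   /* [start, end) */

typedef struct PoolObjs {
    Range objs[MAX_OBJS];            /* stand-in for the per-pool splay tree */
    int count;
    Range cache[2];                  /* the two most recently found objects */
} PoolObjs;

/* Record an array object that belongs to this pool. */
void pool_register(PoolObjs *P, const void *base, size_t size) {
    assert(P->count < MAX_OBJS);
    P->objs[P->count].start = (uintptr_t)base;
    P->objs[P->count].end = (uintptr_t)base + size;
    P->count++;
}

static int in_range(const Range *r, uintptr_t a) {
    return a >= r->start && a < r->end;
}

/* Find the object containing q: try the cache, then the full registry. */
const Range *pool_lookup(PoolObjs *P, const void *q) {
    uintptr_t a = (uintptr_t)q;
    for (int i = 0; i < 2; i++)
        if (in_range(&P->cache[i], a)) return &P->cache[i];
    for (int i = 0; i < P->count; i++)
        if (in_range(&P->objs[i], a)) {
            P->cache[1] = P->cache[0];
            P->cache[0] = P->objs[i];
            return &P->cache[0];
        }
    return NULL;
}

/* r was computed from q: it must stay inside the object q points into. */
int pool_boundscheck(PoolObjs *P, const void *q, const void *r) {
    const Range *ref = pool_lookup(P, q);
    return ref != NULL && in_range(ref, (uintptr_t)r);
}
```

A zero-initialized PoolObjs is an empty pool. Because the compiler knows which pool each pointer belongs to, the lookup only searches that pool's objects rather than one global structure.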
37Experimental Results
- Dramatic improvement in lookup overheads
  - Average overhead 12% for Olden (34%, 69% for 2 cases)
  - < 4% for 2 system daemons
  - Compares with 5x-6x for original Jones-Kelly.
  - Up to 11x-12x for Ruwase-Lam extension (which we use).
- Effective in finding bugs
  - Zitser's suite models 14 buffer overruns in sendmail (7), wu-ftpd (4), bind (3)
  - All 14 detected successfully.
Caveat: Like J-K, doesn't work for casts from pointers to int and back
38- Use 5: SAFECode
- A Safe Compilation Strategy for C/C++ Programs
- Sound analysis: Dhurjati and Adve, PLDI 2006, to appear
  - Formal proof of soundness is in accompanying technical report TR UIUCDCS-R-2005-2657.
- Memory safety: Dhurjati et al., PLDI 2006, TECS 2005
39Safe Languages Provide Basic Guarantees
e.g., Java, C#, Modula-3, ML
- Prevent memory access violations
- Detect errors during development
- Enable sound compile-time analyses
  - e.g. in tools for safety checking, model checking, program verification
Often ignored
Weakly typed languages like C, C++ do not provide any of these benefits
40Why care about C/C++?
- Huge body of essential legacy software
- Dominant in critical domains: OS kernels, embedded systems, daemons, language run-time systems.
- Example: Microsoft Longhorn (basis of Vista)?
  - Less than 25% in C# [Amitabh Srivastava, CGO 04 keynote address]
  - Mostly high level components, e.g., windowing system
  - Performance critical code still in C/C++
The features that make C/C++ popular for system software are the features that make C/C++ unsafe: nested structs, stack-allocated objects, untagged unions, explicit free, custom allocators.
41Current Solutions
Solution          Overhead      No memory violations  Error checking  Sound static analysis
Purify, Valgrind  Several 100x  -                     some            -
SafeC             5x            -                     some            -
Jones-Kelly       5-6x          -                     some            -
SFI               Over 2x       Y                     -               -
Fisher-Patil      2x-6x         Y                     Y               -
Yong              Over 2x       -                     some            -
SAFECode          0-30%         Y                     some            Y
CCured            Up to 1.87x   Y                     some            Y
Cyclone           1x-2x         Y                     some            Y
[Row group labels in the original figure: Pure C, Modified C]
42SAFECode Compiler and Run-time System
- A typed assembly language (LLVM)
  - Language-independent
- Simple, transparent runtime system
- Sound analysis and memory safety
  - Heap safety via Automatic Pool Allocation + run-time checks
  - Stack safety via Data Structure Analysis (DSA) + heap conversion
  - Array safety via pool checks or precise array bounds checks
Initially, for type-safe C, with restricted pointer casts [TECS 2005]. Now, for nearly arbitrary, unmodified C programs [PLDI 2006].
43Guaranteeing Static Analysis
- Many program verification tools build on alias analysis, call graph, assumed type information
  - E.g., SLAM, ESP, BLAST
- Memory errors can invalidate these analyses
- Detecting all memory errors is expensive
  - Dangling pointer errors
  - Precise array bounds errors
Solution: Enforce key analyses in the presence of some memory errors: alias analysis, call graph, type information.
44What is Alias Analysis?
A static summary of memory objects and their connectivity
struct List *head = makeList(20);
int P[4]; P[i] ...
struct List *Q = (struct List *)P; Q->val ...
TK: Type Known, TU: Type Unknown
45Memory errors invalidate alias analysis
struct List *tail, head;
int B[4];                  // TU
head.field1 = tail;
Tmp = (struct List *)B;
Tmp->field6 = ..;          // could corrupt head.field1
- head.field1 could point anywhere in memory
  - pointer analysis incorrect
- head.field1 could corrupt memory of another TK node
46Enforcing Alias Analysis
- Problem 1
  - Must ensure that tmp points to an object in this points-to set
- With normal allocation
  - Objects are scattered in memory
  - Checking set membership at run-time is extremely expensive
- Insight 1
  - Automatic Pool Allocation partitions heap corresponding to nodes in the graph. These partitions are compact and can be checked efficiently!
Caveat: Currently only flow-insensitive, unification based
47Enforcing Alias Analysis
- Problem 2
  - Checking every pointer access or initialization is still very expensive
- Insight 2
  - Ignoring memory errors, any pointer obtained from TK pool already has correct aliasing behavior.
  - Pointers obtained from other pools will be explicitly checked
- Poolcheck(PP, p, align)
  - Mask lower k bits of p, look in hash table of page addresses in PP
  - Alignment check if array references in TK pool
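The poolcheck operation can be sketched as a page-set membership test. Everything concrete here is an assumption made for illustration: 4 KiB pages (k = 12), a tiny open-addressed table, addresses passed as integers, and an alignment test taken relative to the page start. PagePool and pool_add_page are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { PAGE_BITS = 12, TABLE_SIZE = 64 };   /* assumed: k = 12, small table */

typedef struct PagePool {
    uintptr_t pages[TABLE_SIZE];     /* page base addresses; 0 marks an empty slot */
    int count;
} PagePool;

static size_t slot(uintptr_t page) {
    return (size_t)((page >> PAGE_BITS) % TABLE_SIZE);
}

/* Record that a page belongs to this pool (linear probing). */
void pool_add_page(PagePool *PP, uintptr_t page) {
    assert(page != 0 && PP->count < TABLE_SIZE - 1);
    size_t i = slot(page);
    while (PP->pages[i] != 0 && PP->pages[i] != page) i = (i + 1) % TABLE_SIZE;
    if (PP->pages[i] == 0) PP->count++;
    PP->pages[i] = page;
}

/* Nonzero if p lies in a page owned by PP and, when align > 1, its offset
 * within the page is a multiple of align. */
int poolcheck(const PagePool *PP, uintptr_t p, size_t align) {
    uintptr_t page = p & ~(((uintptr_t)1 << PAGE_BITS) - 1);   /* mask low k bits */
    size_t i = slot(page);
    for (size_t probes = 0; probes < TABLE_SIZE; probes++) {
        if (PP->pages[i] == 0) return 0;
        if (PP->pages[i] == page) return align <= 1 || (p - page) % align == 0;
        i = (i + 1) % TABLE_SIZE;
    }
    return 0;
}
```

The cost of a check is one mask and, typically, one table probe, independent of how many objects the pool holds.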
48Tolerating Dangling Pointers
- Problem 3
  - But memory errors (dangling pointer errors, array bounds violations) could corrupt locations in TK pools
- Insight 3 (also used for type-safe C w/o GC)
  - Reallocating a freed block to a new request of the same type cannot cause any type violation or (in the same pool) aliasing violation, despite dangling pointers.
  - Only array references in TK pools must be checked (can optimize)
  - Poolcheck(PP, p, align).
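Insight 3 can be illustrated with a type-homogeneous pool whose freed slots are only ever reused for the same type. This is a sketch under assumptions: a fixed-capacity pool and the hypothetical names NodePool, node_alloc, and node_free.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Node { struct Node *next; int val; } Node;

enum { POOL_SLOTS = 8 };             /* assumed capacity for the sketch */

typedef struct NodePool {
    Node slots[POOL_SLOTS];          /* every slot holds a Node, nothing else */
    Node *free_list;                 /* threaded through next of freed nodes */
    int used;                        /* slots handed out from the array so far */
} NodePool;

/* Reuse a freed slot if one exists, otherwise take a fresh one. */
Node *node_alloc(NodePool *P) {
    Node *n;
    if (P->free_list) {
        n = P->free_list;
        P->free_list = n->next;
    } else if (P->used < POOL_SLOTS) {
        n = &P->slots[P->used++];
    } else {
        return NULL;
    }
    n->next = NULL;
    n->val = 0;
    return n;
}

/* A freed slot stays in this pool and keeps its type. */
void node_free(NodePool *P, Node *n) {
    n->next = P->free_list;
    P->free_list = n;
}
```

A dangling pointer to a freed Node may observe a different logical object after reuse, but it still refers to a Node in the same pool, so the type information and points-to graph assumed by the static analysis remain valid. Even the free-list link is a Node pointer into the same pool.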
49Evaluation of Run-time Overhead
- Programs: Olden, Ptrdist, 3 system daemons
- No source changes necessary
- Compared Olden with CCured.
Program   SAFECode ratio  CCured ratio
bh        1.03            1.31
bisort    1.00            0.97
em3d      1.27            1.49
treeadd   0.99            2.72
tsp       0.99            1.23
yacr2     1.30            -
ftpd      1.00            -
fingerd   1.03            -
Max       1.30            2.72
50 51What Could You Do With Pool Allocation?
- Embedded Systems
  - Pointer compression, data compression for embedded codes
  - Data partitioning for explicit local memories / buffers / tiles
  - Power savings for dead / dormant pools
- Dependable Systems
  - Efficient checkpointing by ignoring unmodified pools
  - Efficient replicated execution for servers
  - Focusing instrumentation for program testing
- High Performance Systems
  - Data-structure-centric profiling
  - Linked pointer prefetching
52Summary
- Automatic Pool Allocation
  - Gives compilers information about data structure layouts, lifetimes, points-to information
- SAFECode
  - A sound execution strategy for C, C++ programs: enable sound analysis, enforce memory safety.
llvm.cs.uiuc.edu
53llvm.cs.uiuc.edu