Title: Data layouts for object-oriented programs
1 Data layouts for object-oriented programs
- Martin Hirzel, IBM Research
- SIGMETRICS 6/16/2007
2 Problem
- Object-oriented programs put data in objects.
- Caches and TLBs put data in blocks.
- Scattering objects over blocks causes cache/TLB misses.
- Misses cost time.
[Figure: objects o1 to o11 scattered across cache lines and TLB pages]
3 Solution
- Most object-oriented languages use garbage collection.
- Garbage collection can move objects.
- To avoid misses, move objects to the right cache/TLB blocks.
- Simple, right?
[Figure: objects o1 to o11 in their original scattered placement, and the same objects after being rearranged]
4 Cheney Copying GC
[Figure: from-space holding objects o1 to o10; to-space divided into a "copied, scanned" region and a "copied, not yet scanned" region]
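A minimal sketch of Cheney-style copying, assuming the heap is a plain dictionary from object name to child names (a hypothetical representation chosen for illustration, not the one used in the talk). The to-space list doubles as the queue: entries before `scan` are copied and scanned, entries from `scan` onward are copied but not yet scanned.

```python
def cheney_copy(heap, roots):
    """Copy everything reachable from roots; return to-space in copy order.

    heap  -- dict mapping object name -> list of child names (assumed model)
    roots -- list of root object names
    """
    to_space = []      # objects in the order they were copied
    forwarded = {}     # object name -> index in to_space (forwarding pointer)

    def copy(name):
        # Copy an object only the first time it is reached.
        if name not in forwarded:
            forwarded[name] = len(to_space)
            to_space.append(name)

    for root in roots:
        copy(root)

    scan = 0
    while scan < len(to_space):
        # Scanning an object copies its not-yet-copied children to the end.
        for child in heap[to_space[scan]]:
            copy(child)
        scan += 1
    return to_space


# Hypothetical five-object heap; o5 is unreachable and is not copied.
example_heap = {"o1": ["o2", "o3"], "o2": ["o4"], "o3": ["o4"], "o4": [], "o5": []}
print(cheney_copy(example_heap, ["o1"]))  # ['o1', 'o2', 'o3', 'o4']
```

Because no separate queue is needed, the resulting order is breadth-first, which is the layout discussed on the next slide.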
5 BF Breadth-first layout
[Figure: nodes numbered 1 to 15 in breadth-first layout order]
Why? Siblings
How? Queue-based traversal
Who? Cheney 1970
6 DF Depth-first layout
[Figure: nodes numbered 1 to 15 in depth-first layout order]
Why? Child-parent
How? Stack-based traversal
Who? Fenichel/Yochelson 1969
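The stack-based traversal can be sketched as follows. The 15-node tree and its labels are an illustrative assumption (node i has children 2i and 2i+1), not the numbering used in the slide's figure.

```python
def depth_first_layout(children, root):
    """Return objects in the order a stack-based (depth-first) copy visits them."""
    order, seen, stack = [], set(), [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # Push children in reverse so the leftmost child is popped first.
        for child in reversed(children.get(node, [])):
            if child not in seen:
                stack.append(child)
    return order


# Hypothetical complete binary tree with 15 nodes.
tree = {i: [2 * i, 2 * i + 1] for i in range(1, 8)}
print(depth_first_layout(tree, 1))
# [1, 2, 4, 8, 9, 5, 10, 11, 3, 6, 12, 13, 7, 14, 15]
```

Each child lands directly after its parent or after a sibling's subtree, which is why this order favors child-parent locality.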
7 HI Hierarchical layout
[Figure: nodes numbered 1 to 15 in hierarchical layout order]
Why? Both siblings and child-parent
How? Block-bounded breadth first
Who? Moon 1984; Wilson et al. 1991
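One way to picture block-bounded breadth-first copying is the simplified sketch below. It is an assumption-laden illustration, not the algorithm of Moon or Wilson et al.: `block_size` counts objects rather than bytes, and the tree labels (node i has children 2i and 2i+1) are hypothetical.

```python
from collections import deque


def hierarchical_layout(children, root, block_size):
    """Group objects into blocks, running a breadth-first copy inside each block."""
    blocks = []
    placed = set()
    pending = deque([root])   # objects that will start a later block
    while pending:
        start = pending.popleft()
        if start in placed:
            continue
        block, queue = [], deque([start])
        # Breadth-first traversal that stops once the block is full.
        while queue and len(block) < block_size:
            node = queue.popleft()
            if node in placed:
                continue
            placed.add(node)
            block.append(node)
            queue.extend(children.get(node, []))
        pending.extend(queue)  # whatever did not fit seeds new blocks
        blocks.append(block)
    return blocks


tree = {i: [2 * i, 2 * i + 1] for i in range(1, 8)}
print(hierarchical_layout(tree, 1, block_size=3))
# [[1, 2, 3], [4, 8, 9], [5, 10, 11], [6, 12, 13], [7, 14, 15]]
```

Each block holds a parent together with its children, so both sibling and child-parent accesses tend to stay within one block.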
8 AO Allocation order layout
[Figure: objects 1 to 9 in allocation order, with 2, 4, 5, and 8 marked "d"; after compaction the layout is 1c, 3c, 6c, 7c, 9c]
Why? Creation order matches usage
How? Sliding compaction
Who? Lisp 2 collector 1967
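Sliding compaction keeps survivors in their original address order, which is what preserves allocation order. The sketch below assumes unit-sized objects and reads the figure's "d" labels as dead objects; both are assumptions made for illustration.

```python
def sliding_compact(heap, live):
    """Slide live objects toward address 0 without reordering them.

    heap -- object ids in address (allocation) order
    live -- set of ids that survive collection
    Returns (compacted heap, forwarding table id -> new address).
    """
    # Pass 1: assign each live object its new address.
    forwarding, next_free = {}, 0
    for obj in heap:
        if obj in live:
            forwarding[obj] = next_free
            next_free += 1
    # A real collector would update pointers here using the forwarding table.
    # Final pass: move each live object to its new address.
    compacted = [None] * next_free
    for obj in heap:
        if obj in live:
            compacted[forwarding[obj]] = obj
    return compacted, forwarding


compacted, forwarding = sliding_compact(list(range(1, 10)), live={1, 3, 6, 7, 9})
print(compacted)  # [1, 3, 6, 7, 9]
```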
9 SZ Size segregation layout
[Figure: objects 1, 2, 3d, 4, 5d, 7, 6, 9, 8d before collection; afterwards the layout is 1c, 6c, 2c, 7c, 9c, 4c]
Why? Efficiently finding allocation holes
How? Segregated free lists
Who? Comfort 1964
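A minimal sketch of segregated free lists, assuming a fixed set of size classes with a fixed number of cells each. The class name, method names, and slot representation are hypothetical.

```python
class SegregatedFreeLists:
    """Toy allocator with one free list per size class."""

    def __init__(self, size_classes, cells_per_class):
        self.size_classes = sorted(size_classes)
        # Each free list holds the indices of the unused cells in that class.
        self.free_lists = {s: list(range(cells_per_class)) for s in self.size_classes}

    def allocate(self, size):
        """Return a (size_class, cell_index) slot from the smallest fitting class."""
        for size_class in self.size_classes:
            if size <= size_class:
                if not self.free_lists[size_class]:
                    raise MemoryError(f"size class {size_class} is exhausted")
                return (size_class, self.free_lists[size_class].pop())
        raise ValueError(f"no size class fits {size} bytes")

    def release(self, slot):
        """Return a slot to the free list of its size class."""
        size_class, cell_index = slot
        self.free_lists[size_class].append(cell_index)


allocator = SegregatedFreeLists(size_classes=[16, 32], cells_per_class=2)
small = allocator.allocate(10)   # served from the 16-byte class
large = allocator.allocate(20)   # served from the 32-byte class
allocator.release(small)
reused = allocator.allocate(12)  # reuses the hole left by `small`
```

Finding a hole is a constant-time pop from the matching list, at the price of placing objects of different sizes in different regions regardless of how they are used together.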
10 TH Thread local layout
[Figure: objects 1 to 12 grouped into regions labeled with the bit patterns 001, 010, 100, 101, 110, and 111]
Why? Disjoint working sets
How? Reachability from call stacks
Who? Steensgaard 2000
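The sketch below shows one plausible reading of "reachability from call stacks": each object gets a bitmask with one bit per thread whose stack roots reach it, and objects are then grouped by mask. Interpreting the figure's bit patterns this way is an assumption, and the function names and graph are hypothetical.

```python
def thread_masks(children, thread_roots):
    """Map each reachable object to a bitmask of the threads that reach it."""
    masks = {}
    for thread_id, roots in enumerate(thread_roots):
        bit = 1 << thread_id
        stack = list(roots)
        while stack:
            node = stack.pop()
            if masks.get(node, 0) & bit:
                continue  # this thread already visited the object
            masks[node] = masks.get(node, 0) | bit
            stack.extend(children.get(node, []))
    return masks


def thread_layout(children, thread_roots):
    """Order objects so that those with the same thread mask are adjacent."""
    masks = thread_masks(children, thread_roots)
    return sorted(masks, key=lambda obj: (masks[obj], obj))


graph = {"a": ["c"], "b": ["c"], "c": []}
roots = [["a"], ["b"]]              # thread 0 holds a, thread 1 holds b
print(thread_masks(graph, roots))   # {'a': 1, 'c': 3, 'b': 2}
print(thread_layout(graph, roots))  # ['a', 'b', 'c']
```

Objects reached by a single thread end up next to each other, while shared objects such as `c` form their own group.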
11 Problem
AO Allocation order
AS Allocation site
BF Breadth-first
DF Depth-first
HI Hierarchical
PO Popularity
RA Random
SZ Size
TH Thread
TY Type
- Which layout is best, and which is worst?
- How much does it matter in practice?
- How similar are the layouts?
- How much does it matter in the limit?
12 Solutions?
- Appeal to intuition
  - They can't all be right!
- Formal
  - Petrank/Rawitz showed hardness
- Simulation
  - Who would believe those numbers?
- Brute-force
  - Do you have a few person-years to spare?
13 Avoiding Heisenberg Effects
[Figure: timeline alternating between garbage collector phases and application phases]
- Exclude garbage collector performance
- Measure real effect of layout on application
- Implement the layouts with simple algorithms
14 Object sorting garbage collection
Phases: populate, sort, copy, fixup
Sort keys: AO, AS, PO, RA, SZ, TH, TY
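The four phases can be sketched as below, assuming objects are dictionaries with a `refs` list and that the sort key is an arbitrary function of the object. This is a hypothetical model for illustration, not the implementation measured in the talk.

```python
def sorting_gc(objects, roots, key):
    """Lay out live objects in the order given by key.

    objects -- dict id -> {"refs": [ids], ...other attributes}
    roots   -- list of root ids
    key     -- function (id, record) -> sort key, e.g. size for SZ
    Returns (to_space list of records, map from old id to new address).
    """
    # populate: collect every reachable object
    live, seen, stack = [], set(), list(roots)
    while stack:
        obj = stack.pop()
        if obj in seen:
            continue
        seen.add(obj)
        live.append(obj)
        stack.extend(objects[obj]["refs"])
    # sort: the key alone decides the layout
    live.sort(key=lambda obj: key(obj, objects[obj]))
    # copy: move records to to-space in sorted order
    new_address = {obj: index for index, obj in enumerate(live)}
    to_space = [dict(objects[obj]) for obj in live]
    # fixup: rewrite references to point at the new addresses
    for record in to_space:
        record["refs"] = [new_address[ref] for ref in record["refs"]]
    return to_space, new_address


heap = {
    "x": {"refs": ["z"], "size": 24},
    "y": {"refs": [], "size": 8},      # unreachable, so it is dropped
    "z": {"refs": ["x"], "size": 16},
}
to_space, new_address = sorting_gc(heap, ["x"], key=lambda obj, rec: rec["size"])
print(new_address)  # {'z': 0, 'x': 1}
```

Swapping the `key` function is all it takes to switch between layouts, which keeps the collector itself simple and identical across experiments.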
15 32 Benchmarks
16 Mutator time overhead
- All layouts sometimes best, sometimes worst
- RA is worst, as expected
- Low averages, but beware of worst cases!
- AO has best average (but not by much)
- DF has most best cases
- TH has most benign worst case
- Performance impact increases with SMP
- Conclusions for AO, DF, TH, RA still hold
17 Mutator miss rate increases
- Layouts have large impact on miss rates
- Miss rates confirm overhead conclusions
- Miss rates cannot replace time measurements
18Layout similarities and differences
- AO, PO, and TH are quite similar
- As expected, RA is far out
19 Estimated limit mutator time
- Benchmarks: largest avg. overhead
- Baseline: best observed
- Linear regression:
Total time (as measured) = Limit time (no misses) + cache misses × cache latency + TLB misses × TLB latency
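A regression of this kind can be sketched as an ordinary least-squares fit of total time against cache and TLB miss counts, where the intercept estimates the limit time with no misses and the two slopes estimate the miss latencies. The function name and the synthetic data are hypothetical, and the exact model used in the talk may differ.

```python
def fit_limit_time(samples):
    """Least-squares fit of total = limit + cache_latency*c + tlb_latency*t.

    samples -- list of (cache_misses, tlb_misses, total_time), one per layout
    Returns (limit_time, cache_latency, tlb_latency).
    Assumes at least three samples whose miss counts are not collinear.
    """
    rows = [(1.0, float(c), float(t)) for c, t, _ in samples]
    totals = [float(total) for _, _, total in samples]
    n = 3
    # Augmented normal equations: (X^T X | X^T y).
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * y for r, y in zip(rows, totals))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * p for x, p in zip(a[r], a[col])]
    return tuple(a[i][n] / a[i][i] for i in range(n))


# Synthetic measurements generated from total = 100 + 2*cache + 5*tlb.
measurements = [(10, 1, 125), (20, 3, 155), (30, 2, 170), (40, 5, 205)]
limit, cache_latency, tlb_latency = fit_limit_time(measurements)
print(round(limit, 3), round(cache_latency, 3), round(tlb_latency, 3))
```

On real measurements the fit is only an estimate: the intercept extrapolates to a miss count that no layout actually achieves.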
20 Related work
21 Conclusions
- Layouts matter little on average, but beware of the worst cases!
- Layout importance increases with SMP
- All layouts are sometimes best, sometimes worst
- AO has best average
- DF has most best-cases
- TH has best worst-cases