Structure Layout Optimizations in the Open64 Compiler: Design, Implementation and Measurements

1
Structure Layout Optimizations in the Open64 Compiler:
Design, Implementation and Measurements
  • Gautam Chakrabarti and Fred Chow
  • PathScale, LLC.

2
Outline
  • Motivation
  • Types of structure layout optimizations
  • Criteria for structure layout optimizations
  • Implementation details
  • Performance results
  • Future work
  • Conclusion

3
Motivation
  • Poor data locality in many applications
  • High data cache miss rates
  • Growing gap between processor and memory speeds
  • Our aim: make applications more cache-friendly
  • Our approach: change the layout of data structures
  • Requires whole-program optimization
  • Use Inter-Procedural Analysis and Optimizations (IPA)

4
IPA
  • Summarization
  • Analysis
  • Optimization

5
Types of Structure Layout Optimizations
  • Structure splitting
  • Structure peeling

    struct struct_A {
        double d1;
        double d2;
        int i;
        float f;
        long long l;
        char c;
        struct struct_A *next;
    };

    struct struct_A {
        double d1;
        double d2;
        int i;
        float f;
        long long l;
        char c;
    };

6
Structure Splitting Example
Original layout:

    struct struct_A {
        double d1;
        double d2;
        int i;
        float f;
        long long l;
        char c;
        struct struct_A *next;
    };

After splitting:

    struct new_struct_A {
        double d1;
        int i;
        long long l;
        struct new_struct_A *next;
        struct cold_sub_struct_A *p;
    };

    struct cold_sub_struct_A {
        double d2;
        float f;
        char c;
    };

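A minimal C sketch of how accesses might look once the split layout above is in place. The struct definitions follow the slide; the helper names (`make_node`, `sum_hot`, `read_cold_f`) are illustrative assumptions and not part of Open64, which performs the equivalent rewrite on its intermediate representation.

```c
#include <stdlib.h>

/* Cold fields moved out of the node (layout taken from the slide). */
struct cold_sub_struct_A {
    double d2;
    float f;
    char c;
};

/* Hot fields stay in the node, plus a link to the cold part. */
struct new_struct_A {
    double d1;
    int i;
    long long l;
    struct new_struct_A *next;
    struct cold_sub_struct_A *p;
};

/* Hypothetical helper: allocate a node together with its cold part. */
struct new_struct_A *make_node(double d1, int i, struct new_struct_A *next)
{
    struct new_struct_A *n = malloc(sizeof *n);
    struct cold_sub_struct_A *cold = malloc(sizeof *cold);
    if (n == NULL || cold == NULL) {
        free(n);
        free(cold);
        return NULL;
    }
    cold->d2 = 0.0;
    cold->f = 0.0f;
    cold->c = 0;
    n->d1 = d1;
    n->i = i;
    n->l = 0;
    n->next = next;
    n->p = cold;
    return n;
}

/* Hot traversal: reads only the hot node, never the cold part. */
double sum_hot(const struct new_struct_A *n)
{
    double s = 0.0;
    for (; n != NULL; n = n->next)
        s += n->d1 * n->i;
    return s;
}

/* A cold access that was n->f in the original layout now goes through n->p. */
float read_cold_f(const struct new_struct_A *n)
{
    return n->p->f;
}
```

The hot loop walks smaller nodes, while the rare cold access pays one extra pointer dereference.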
7
Structure Peeling Example
Original layout:

    struct struct_A {
        double d1;
        double d2;
        int i;
        float f;
        long long l;
        char c;
    };

After peeling:

    struct new_struct_A {
        double d1;
        int i;
        long long l;
    };

    struct cold_sub_struct_A {
        double d2;
        float f;
        char c;
    };

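To illustrate why peeling can help, here is a small C sketch of the same hot loop before and after the transformation, assuming the data lives in arrays. The function names are hypothetical; only the three struct layouts come from the slide.

```c
#include <stddef.h>

/* Original layout (from the slide). */
struct struct_A {
    double d1;
    double d2;
    int i;
    float f;
    long long l;
    char c;
};

/* Peeled layout (from the slide): hot and cold fields in separate types. */
struct new_struct_A {
    double d1;
    int i;
    long long l;
};

struct cold_sub_struct_A {
    double d2;
    float f;
    char c;
};

/* Hot loop before peeling: every element brings its cold fields along. */
double sum_before(const struct struct_A *a, size_t n)
{
    double s = 0.0;
    for (size_t k = 0; k < n; k++)
        s += a[k].d1 + (double)a[k].i + (double)a[k].l;
    return s;
}

/* Same loop after peeling: it reads only the dense array of hot fields. */
double sum_after(const struct new_struct_A *hot, size_t n)
{
    double s = 0.0;
    for (size_t k = 0; k < n; k++)
        s += hot[k].d1 + (double)hot[k].i + (double)hot[k].l;
    return s;
}
```

Unlike splitting, no link pointer is needed: element k of the hot array and element k of the cold array together stand for element k of the original array.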
8
Criteria for structure layout optimizations
  • Legality analysis
      • Type cast
      • Address of a field is taken
      • Escaped types
      • Parameter types
      • Full visibility to IPA
      • Alignment restrictions
  • Profitability analysis
      • Hotness
      • Affinity
      • Field accesses at loop level
      • Size
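The first two legality criteria can be made concrete with a short C sketch. It shows source patterns that depend on the declared layout of a struct; that these are the kinds of patterns the legality check rejects is our reading of the bullets, and the function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

struct struct_A {
    double d1;
    double d2;
    int i;
    float f;
    long long l;
    char c;
};

/* Type cast: the struct is viewed as raw bytes, so the code relies on the
   declared offset of d2. Moving d2 into another struct would break it. */
double read_d2_via_cast(const struct struct_A *a)
{
    double v;
    memcpy(&v, (const char *)a + offsetof(struct struct_A, d2), sizeof v);
    return v;
}

/* Callee that only sees an int pointer, not the struct it came from. */
void bump(int *counter)
{
    *counter += 1;
}

/* Address of a field is taken: the pointer escapes to code that cannot be
   rewritten in terms of the new layout. */
void touch(struct struct_A *a)
{
    bump(&a->i);
}
```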

9
Implementation Details
  • Step 1: Type information summarization (IPL)
  • Step 2: Symbol table merging (IPA)
  • Step 3: Legality and profitability analysis (IPA analysis)
  • Step 4: Transforming the program (IPA optimization)

10
Implementation Details: Type information summarization
  • Information summarization in IPL
  • Framework for computing static profiles using
    heuristics
  • New TY flag: TY_NO_SPLIT
  • SUMMARY_TY_INFO
  • SUMMARY_LOOP
      • For each DO_LOOP, WHILE_DO, DO_WHILE
      • Bit-vector to track field accesses of up to N
        structures for each loop
      • Considers field accesses immediately inside the loop
      • These fields are considered affine to each other
      • Execution count of statements immediately inside
        the loop
      • From statically estimated profiles or from
        runtime feedback
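The per-loop bit-vector described above can be sketched in C as follows. This is an assumed, simplified record for a single structure type; the type and function names are hypothetical and do not reflect the actual SUMMARY_LOOP layout in Open64.

```c
#include <stdint.h>

/* Hypothetical per-loop summary for one structure type. */
struct loop_summary {
    uint32_t field_bits;  /* bit (j - 1) set => field Fj is accessed in the loop */
    uint64_t exec_count;  /* estimated or measured execution count */
};

/* Record an access to field number `field` (1-based, at most 32 fields). */
void record_field_access(struct loop_summary *ls, unsigned field)
{
    if (field >= 1 && field <= 32)
        ls->field_bits |= (uint32_t)1 << (field - 1);
}

/* Two fields are treated as affine if the same loop accesses both. */
int fields_affine_in_loop(const struct loop_summary *ls, unsigned f1, unsigned f2)
{
    if (f1 < 1 || f1 > 32 || f2 < 1 || f2 > 32)
        return 0;
    uint32_t mask = ((uint32_t)1 << (f1 - 1)) | ((uint32_t)1 << (f2 - 1));
    return (ls->field_bits & mask) == mask;
}
```

With this encoding, a loop that touches F1 and F3 of a four-field struct ends up with the bit pattern 0101.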

11
Implementation Details: IPA Analysis
  • Inter-procedurally update statically estimated
    execution count of PUs
  • Update statically estimated loop frequencies in
    SUMMARY_LOOP
  • Consider SUMMARY_LOOP from the hottest P PUs
  • Determine candidates for structure-layout
    transformation
  • Determine new layout of structures

12
Implementation Details: IPA Analysis Example

Loop   F4   F3   F2   F1   BV
L1     -    22   -    22   0101
L2     -    -    14   -    0010
L3     -    12   -    12   0101
L4     8    8    -    -    1100
L5     -    6    -    6    0101

Group  F4   F3   F2   F1
AG1    -    40   -    40
AG2    -    -    14   -
AG3    8    8    -    -

Li: loops. Fj: fields in a struct. AGk: affinity groups.

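One way to arrive at the affinity groups in this example is to merge loops that have identical bit-vectors and add up their execution counts. The C sketch below does exactly that and reproduces the AG1, AG2 and AG3 weights; it is an assumption about the grouping rule, since the slides do not spell out the algorithm, and all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

struct loop_summary {
    uint32_t field_bits;  /* bit-vector of accessed fields (F1 = lowest bit) */
    uint64_t exec_count;  /* execution count of the loop body */
};

struct affinity_group {
    uint32_t field_bits;  /* fields that belong to the group */
    uint64_t weight;      /* summed execution count of contributing loops */
};

/* Merge loops with identical bit-vectors into groups.
   Returns the number of groups written (at most max_groups). */
size_t build_affinity_groups(const struct loop_summary *loops, size_t n_loops,
                             struct affinity_group *groups, size_t max_groups)
{
    size_t n_groups = 0;
    for (size_t i = 0; i < n_loops; i++) {
        size_t g = 0;
        while (g < n_groups && groups[g].field_bits != loops[i].field_bits)
            g++;
        if (g == n_groups) {
            if (n_groups == max_groups)
                continue;  /* no room left: this sketch simply skips the loop */
            groups[g].field_bits = loops[i].field_bits;
            groups[g].weight = 0;
            n_groups++;
        }
        groups[g].weight += loops[i].exec_count;
    }
    return n_groups;
}
```

Note that F3 appears in both AG1 and AG3 in the example; how such overlaps are resolved when the new layout is chosen is not covered by this sketch.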
13
Implementation Details: Transforming the program
  • New type definitions
  • Field table update
  • Field access statements
  • New symbols
  • Assignment statements

Example

    struct S {
        // N fields
        struct T *p;
        // M fields
    };

    struct T {
        // AG1 fields
        // AG2 fields
    };

    // peel T

    struct S {
        // N fields
        struct T1 *p1;
        struct T2 *p2;
        // M fields
    };

    struct T1 {
        // AG1 fields
    };

    struct T2 {
        // AG2 fields
    };

14
Implementation Details: Transforming the program (continued)
  • Function calls to memory management routines

Example
    p = (T *) malloc (N * sizeof (T));
    if (p == NULL) exit (1);
  • Detect memory management routine calls involving
    transformed type T
  • Replicate call, assignment statements
  • Update size of memory being allocated
  • Handle comparisons involving pointer p
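The bullets above can be sketched in C for a type T peeled into two new types. The types `T1` and `T2`, their fields, and the helper names are hypothetical. The sketch returns an error code instead of calling `exit (1)` so that it can be exercised in isolation.

```c
#include <stdlib.h>

/* Hypothetical peeled halves of a type T. */
struct T1 {
    double hot_a;
    int hot_b;
};

struct T2 {
    double cold_a;
    char cold_b;
};

/* Before:  p = (T *) malloc (N * sizeof (T));  if (p == NULL) exit (1);
   After peeling, the call and the assignment are replicated once per new
   type, each with its own element size, and the NULL comparison has to
   cover both pointers. */
int alloc_peeled(size_t n, struct T1 **p1, struct T2 **p2)
{
    *p1 = malloc(n * sizeof **p1);
    *p2 = malloc(n * sizeof **p2);
    if (*p1 == NULL || *p2 == NULL) {
        free(*p1);
        free(*p2);
        *p1 = NULL;
        *p2 = NULL;
        return -1;
    }
    return 0;
}

/* A single free (p) is replicated in the same way. */
void free_peeled(struct T1 *p1, struct T2 *p2)
{
    free(p1);
    free(p2);
}
```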

15
Performance Results
  • Compilation options: -Ofast, 32-bit ABI
  • Speedup (%) due to structure layout optimizations

Benchmark        AMD Opteron  AMD Barcelona  Intel EM64T  Intel Core   SiCortex MIPS  Geometric
                 (2.8GHz,     (2.0GHz,       (3.4GHz,     (3.0GHz,     (500MHz,       Mean
                 4GB, 1MB)    8GB, 512KB)    4GB, 1MB)    4GB, 4MB)    4GB, 256KB)
179.art          134          66             56           47           41             62.5
181.mcf          24           23             23           31           13             22.0
462.libquantum   32           17             40           72           62             39.6
Geometric Mean   46.9         29.6           37.2         47.2         32.1           37.9

16
Performance Results (continued)
  • Compilation options: -Ofast, 64-bit ABI
  • Speedup (%) due to structure layout optimizations

Benchmark        AMD Opteron  AMD Barcelona  Intel EM64T  Intel Core   SiCortex MIPS  Geometric
                 (2.8GHz,     (2.0GHz,       (3.4GHz,     (3.0GHz,     (500MHz,       Mean
                 4GB, 1MB)    8GB, 512KB)    4GB, 1MB)    4GB, 4MB)    4GB, 256KB)
179.art          169          66             53           60           45             69.3
181.mcf          25           35             12           30           7              18.6
462.libquantum   82           51             75           70           69             68.6
Geometric Mean   70.2         49.0           36.3         50.1         27.9           44.6

17
Performance Results (continued)
  • Compilation options: -Ofast, 64-bit ABI
  • Multiple copies of 462.libquantum running on a
    multi-core chip
  • Platform: quad-core AMD Barcelona (2.0 GHz, 8GB,
    512KB, 2MB)
  • 3rd level cache shared among 4 cores
  • Speedup from structure layout optimizations

Benchmark        1 copy   2 copies   4 copies
462.libquantum   51       69         123

18
Future Work
  • Tune static profile estimation
  • Fewer restrictions
  • Integrate with field-reordering

19
Conclusion
  • A framework for performing structure layout
    transformations is now available in the Open64
    compiler.
  • The superior infrastructure in the Open64
    compiler helped us implement the optimizations
    cleanly and with relatively little effort.
  • Substantial speedups are possible on some of the
    CPU2000 and CPU2006 SPEC benchmarks.
  • Structure layout optimization is a required
    feature for a compiler to remain competitive.