Title: Open64: A Framework for High performance Compiler
1Open64 A Framework for High performance Compiler
March 2007
2Outline
- Open64 History
- Osprey Project
- Research Activities
- Retargetability
3Open64 Based Research Activities at University of Delaware
- Open64 Code Porting for Large-Scale Multi-Core Architectures
- Code Optimization for Large-Scale Multi-Core Architectures
- Research on a points-to alias analysis under an SSA framework
- Landing Software Pipelining on Large-Scale Multi-Core Architectures
4Port Open64 to Cyclops64
- Based on PathScale 2.2.1/x86-64.
- Beginning with the gcc 3.2.1/MIPS C front end (FE), we changed the machine description (MD) so that it generates an AST compatible with the Cyclops64 ABI.
- Rewritten from scratch for C64.
- Only dependence testing is enabled; the loop transformations are disabled because the original loop transformations are not readily applicable to an architecture without a cache.
- Heavily changed: CGIR lowering, scheduling, EBO, etc.
- The tool chain (as, ld, simulator, etc.) is provided by ETI.
5Some Research on Open64/C64
- Scratch pad utilization
  - Divide scratch pad memory into 3 areas: 2nd-level general purpose registers (L2 GPR), software rotating registers (SRR), and a free area.
  - L2 GPR is further divided into caller-save and callee-save; live ranges are colored with L2 GPR when register allocation (RA) runs out of real registers.
  - SRR: prefetching and improved temporal locality
    - E.g. 1, prefetch 5 iterations ahead: for () x => for () rr[x] x rr[x+5]
    - E.g. 2, improve temporal locality: for (i=0; i<10000; i++) a[i] = b[i] + a[i-5]; => for (i=0; i<10000; i++) { rr[x] = b[i] + rr[x-5]; a[i] = rr[x]; }
  - Use LDM (load multiple word) to reduce the bandwidth requirement.
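The temporal-locality example (E.g. 2) can be sketched in plain C. This is only an illustration under stated assumptions: the operator between b[i] and a[i-5] is garbled in the slide and is assumed here to be addition, the loop starts at i = 5 so that a[i-5] stays in bounds, and a small circular buffer (rr, slot) stands in for the hardware-assisted software rotating registers. The function names are hypothetical and do not come from the Open64/C64 sources.

```c
#include <assert.h>
#include <stddef.h>

#define DIST 5  /* recurrence distance, taken from the slide's a[i-5] */

/* Original form: every iteration reloads a[i-5] from memory. */
void recurrence_plain(int *a, const int *b, size_t n) {
    for (size_t i = DIST; i < n; i++)
        a[i] = b[i] + a[i - DIST];
}

/* Rotating form: the last DIST results are kept in rr[], a stand-in
 * for software rotating registers, so a[i-5] is never reloaded. */
void recurrence_rotating(int *a, const int *b, size_t n) {
    int rr[DIST];
    assert(n >= DIST);
    for (size_t k = 0; k < DIST; k++)
        rr[k] = a[k];                /* preload the first DIST values */
    size_t slot = 0;                 /* slot currently holding a[i-DIST] */
    for (size_t i = DIST; i < n; i++) {
        int v = b[i] + rr[slot];     /* rr[slot] equals a[i-DIST] here */
        rr[slot] = v;                /* oldest value is replaced by a[i] */
        a[i] = v;
        slot = (slot + 1) % DIST;    /* rotate the register window */
    }
}
```

Both functions should produce the same contents of a; the rotating form trades one memory load per iteration for a register-resident value, which is the effect the slide attributes to SRR.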
6Unification based points-to analysis using SSA
- Motivations
  - Incremental change to the existing Steensgaard's PT analysis, with better precision
  - Retain almost linear time
  - Limited flow sensitivity improves the precision of the analysis of p and q, where p and q are global variables/pointers or may be modified by callees.
  - Reduce the imprecision due to unification
- Limited flow sensitivity by SSA form
  - Build a (preliminary) SSA form for all variables (including global variables and local variables whose address is taken). Do not take aliasing into account.
  - Perform points-to analysis on the preliminary SSA form, updating the SSA form during PT analysis. p3 initially points to n; after analyzing stmt 4, p3 points to both n and z.
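For readers unfamiliar with the baseline, the following is a minimal sketch of plain unification-based (Steensgaard-style) points-to analysis using union-find. It is not the SSA-extended variant described on this slide and is not taken from Open64; all names (pt_new_node, pt_unify, pt_address_of, pt_copy, pt_may_alias) are hypothetical. It shows where the almost linear time comes from and why unification loses precision: after p = q, everything p and q point to lands in one class.

```c
#include <assert.h>

#define MAX_NODES 64
#define NONE (-1)

static int parent[MAX_NODES];   /* union-find parent links */
static int pointee[MAX_NODES];  /* class this node's class points to, or NONE */
static int node_count = 0;

/* Create a fresh abstract location and return its id. */
int pt_new_node(void) {
    assert(node_count < MAX_NODES);
    parent[node_count] = node_count;
    pointee[node_count] = NONE;
    return node_count++;
}

/* Find the class representative, with path halving. */
int pt_find(int x) {
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];
        x = parent[x];
    }
    return x;
}

/* Unify two classes and, recursively, the classes they point to. */
void pt_unify(int a, int b) {
    a = pt_find(a);
    b = pt_find(b);
    if (a == b) return;
    int pa = pointee[a], pb = pointee[b];
    parent[b] = a;
    if (pa == NONE) pointee[a] = pb;
    else if (pb != NONE) pt_unify(pa, pb);
}

/* Class that x's class points to, created on demand. */
int pt_target(int x) {
    x = pt_find(x);
    if (pointee[x] == NONE) {
        int t = pt_new_node();
        pointee[x] = t;
    }
    return pt_find(pointee[x]);
}

/* p = &x : x joins the class that p points to. */
void pt_address_of(int p, int x) { pt_unify(pt_target(p), x); }

/* p = q : p and q must point to the same class. */
void pt_copy(int p, int q) { pt_unify(pt_target(p), pt_target(q)); }

/* Nonzero if p and q may point to the same location. */
int pt_may_alias(int p, int q) {
    int tp = pointee[pt_find(p)], tq = pointee[pt_find(q)];
    return tp != NONE && tq != NONE && pt_find(tp) == pt_find(tq);
}
```

Each statement is processed once and each operation is a near-constant-time union-find step. The cost is that p = &x; q = &y; p = q leaves x and y in the same class even though q never points to x, which is the imprecision the slide aims to reduce.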
7Unification based points-to analysis using SSA
(cont)
- Differentiate flat unification and updating unification
  - Flat unification: let s1 = points_to(p1) and s2 = points_to(p2). The statement p = cond ? p1 : p2 unifies s1 and s2 simply because p may point to both sets. s1 and s2 themselves do not need to be updated at the moment unification happens.
  - Incremental update: points_to(p1) => {a, b}; q1 = some_ptr may change p's value, hence points_to(p1) should be updated to {a, b} U points_to(some_ptr).
  - The final unified set encodes the type of unification of its smaller subsets. Flat-unified subsets remain disjoint.
8Software Pipelining of Multi-Core Architectures
A Brief Introduction
- Problem description
- Software toolchain
- Where Open64 helped
- Some results.
9Problem Description
- Software pipelining on multi-threaded architectures
- Single-dimension Software Pipelining (SSP)
- Workload distribution
- Data communication
- Data synchronization
10Software Pipelining Toolchain Based on Open64
11Implementation
12What Open64 features are used in multi-core
software pipelining
- Multi-dimensional dependence analysis
- WHIRL clean interface
- Machine model
- Reservation tables
- Register allocation
- Modulo-scheduler
- Code generator
- No need to implement everything to test
- Clean code despite lack of documentation!
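To make the roles of reservation tables and the modulo scheduler concrete, here is a generic sketch of a modulo reservation table and the resource-constrained lower bound on the initiation interval (II). It illustrates the general technique only; the type and function names (ModuloTable, mrt_init, mrt_reserve, res_mii) are hypothetical and are not Open64's data structures or API. It assumes each operation uses one unit of one resource for one cycle.

```c
#include <assert.h>
#include <string.h>

#define MAX_II 32
#define MAX_RES 8

typedef struct {
    int ii;                       /* initiation interval */
    int used[MAX_II][MAX_RES];    /* busy units per (cycle mod II, resource) */
    int capacity[MAX_RES];        /* available units per resource */
} ModuloTable;

/* Clear the table and record the per-resource capacities. */
void mrt_init(ModuloTable *t, int ii, const int *capacity, int nres) {
    assert(ii > 0 && ii <= MAX_II && nres >= 0 && nres <= MAX_RES);
    memset(t, 0, sizeof *t);
    t->ii = ii;
    for (int r = 0; r < nres; r++)
        t->capacity[r] = capacity[r];
}

/* Try to place an operation needing one unit of `res` at `cycle`.
 * Returns 1 and records the reservation, or 0 on a resource conflict. */
int mrt_reserve(ModuloTable *t, int cycle, int res) {
    assert(cycle >= 0 && res >= 0 && res < MAX_RES);
    int row = cycle % t->ii;      /* iterations overlap every II cycles */
    if (t->used[row][res] >= t->capacity[res])
        return 0;
    t->used[row][res]++;
    return 1;
}

/* Resource-constrained minimum II: the largest ceil(uses / capacity)
 * over all resources. Capacities are assumed to be positive. */
int res_mii(const int *uses, const int *capacity, int nres) {
    int mii = 1;
    for (int r = 0; r < nres; r++) {
        int need = (uses[r] + capacity[r] - 1) / capacity[r];
        if (need > mii) mii = need;
    }
    return mii;
}
```

A scheduler would typically start from this lower bound (combined with a recurrence-based bound, not shown) and retry with a larger II whenever mrt_reserve cannot place some operation.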
13Cyclops64 architecture
14Some Results