Title: An Overview of the Trimaran Compiler Infrastructure
What Is Trimaran?
- A parametric compilation and performance monitoring system
  - A full-blown C compiler for the HPL-PD instruction set architecture (ISA)
  - A cycle-by-cycle parametric machine simulator
  - A cache simulator
  - A suite of optimization and analysis tools
- Uses HPL-PD, a parameterized very long instruction word (VLIW) ISA
  - Supports predication, control and data speculation, and compiler-controlled management of the memory hierarchy
- Compiles for target architectures specified by a machine description language
  - Can compile optimized code for a variety of VLIW and superscalar architectures
Trimaran's Goal
- To provide a vehicle for implementation of, and experimentation with, state-of-the-art research in compiler techniques for instruction-level parallel architectures
- Currently, the infrastructure is oriented towards Explicitly Parallel Instruction Computing (EPIC) architectures
  - But it can also support compiler research for superscalar architectures
- Primarily for back-end compiler research
  - Instruction scheduling, register allocation, and machine-dependent optimizations
Compiling a Program
[Diagram: Source Program (C, C++, Java, etc.)]
- Compiles programs for only one architecture
- All optimizations are tuned for the given target machine
A Retargetable Compiler and Simulator
[Diagram: Source Program (C, C++, Java, etc.) → Front-End → High-level Optimizations → Low-level Optimizations → Code Generation]
- MDES (the machine description) influences optimizations and code generation
- Executing the binary performs cycle-by-cycle simulation based on MDES
Terms and Definitions
- ILP (Instruction-Level Parallelism)
  - More than one operation issued per clock cycle within a single CPU
- EPIC (Explicitly Parallel Instruction Computing)
  - ILP under compiler control
  - A single instruction may contain many operations
  - The compiler determines operation dependences and specifies which operations may execute concurrently
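To make the EPIC definition concrete, here is a minimal C++ sketch of the compiler's job of packing independent operations into wide instructions. It assumes unit latency, an acyclic dependence list per operation, and a single issue-width limit. The `Op` struct and `bundle` function are hypothetical illustrations, not Trimaran or Elcor code; a real scheduler also models resources and latencies from the MDES.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy operation: it may issue only after every op listed in
// `deps` has issued in an earlier instruction (unit latency assumed).
struct Op {
    std::vector<std::size_t> deps;  // indices of ops this op depends on
};

// Greedily packs ops into wide instructions of at most `width` operations.
// Returns, for each op, the index of the instruction (cycle) it issues in.
// Precondition: the dependence graph is acyclic.
std::vector<int> bundle(const std::vector<Op>& ops, std::size_t width) {
    assert(width > 0);
    std::vector<int> cycle(ops.size(), -1);  // -1 means not yet scheduled
    std::size_t scheduled = 0;
    for (int c = 0; scheduled < ops.size(); ++c) {
        std::size_t used = 0;  // slots filled in instruction c
        for (std::size_t i = 0; i < ops.size() && used < width; ++i) {
            if (cycle[i] != -1) continue;
            bool ready = true;
            for (std::size_t d : ops[i].deps)
                if (cycle[d] == -1 || cycle[d] >= c) ready = false;  // dep not finished
            if (ready) { cycle[i] = c; ++used; ++scheduled; }
        }
    }
    return cycle;
}
```

With op 2 depending on ops 0 and 1, a width of 2 yields the instructions {0, 1} and then {2, 3}; a width of 4 lets op 3 join the first instruction.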
Infrastructure Components
- A machine description language, HMDES, for describing ILP architectures
- A parameterized ILP architecture called HPL-PD
  - The current instantiation in the infrastructure is an EPIC architecture
- A compiler front-end for C, performing parsing, type checking, and a large suite of high-level (i.e., machine-independent) optimizations
  - This is the IMPACT module (IMPACT group, University of Illinois)
Infrastructure Components (contd)
- A compiler back-end, parameterized by a machine description, performing instruction scheduling, register allocation, and machine-dependent optimizations
  - Each stage of the back-end may be replaced or modified by a compiler researcher
  - Primarily implemented as part of the ELCOR effort by the CAR Group at HP Labs
  - Augmented with a scalar register allocator from CREST
Infrastructure Components (contd)
- An extensible IR (intermediate program representation)
  - Has both an internal and a textual representation, with conversion routines between the two; the textual language is called REBEL
  - Supports modern compiler techniques by representing control flow, data and control dependence, and many other attributes
  - Easy to use in both its internal representation (clear C++ object hierarchy) and its textual representation (human-readable)
- A cycle-level simulator of HPL-PD, configurable by an MDES, that provides run-time information on execution time, branch frequencies, and resource utilization
  - This information can be used for profile-driven optimizations, as well as to validate new optimizations
Infrastructure Support (contd)
- An integrated graphical user interface (GUI) for configuring and running the Trimaran system
Trimaran System Organization
[Diagram: Trimaran system organization]
- IMPACT: C program; K&R/ANSI-C Parsing; Renaming; Flattening; Control-Flow Profiling; C Source File Splitting; Function Inlining; Classical Optimizations; Code Layout; Superblock Formation; Hyperblock Formation; ILP Transformations
- Elcor/CAR: Dependence Graph Construction; Modulo Scheduling; Acyclic Scheduling; . . .
- ReaCT-ILP: Region-based Register Allocation
- Also shown: Machine Description; ToIR; Post-pass Scheduling; Simulator; Execution Statistics
An Overview of the IMPACT Module and Its Optimization Suite
K&R/ANSI-C Parser
- Built upon the EDG C parser
  - Solid but persnickety about the C language spec
  - May need to modify benchmark source to match the spec
  - Utilizes the native compiler's header files (in most cases) and libraries
  - We may only distribute binaries and source diffs
    - Unmodified source is available via a free educational license from EDG (see the web site for source diffs and instructions)
  - Modified to generate our source-level intermediate representation
- Compile all the available source together
  - Don't link in libraries if you have the source for the libraries!
  - The profiler and source analysis tools need everything
IMPACT steps
- Flattening
  - Transforms complex expressions into simple ones, adding temporary variables if required
- Profiling
  - Simple control-arc weighting based on one profile run
- File splitting
  - Generates one file per function
- Function inlining
  - Inlines the most frequently called functions to speed them up while limiting code growth
- Classical optimizations
  - Performs classical optimizations (as in the Red Dragon book)
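The flattening step can be illustrated with a small C++ sketch that turns a nested expression into simple three-address statements, inventing temporaries as it goes. The `Expr` type, the helper names, and the `t1, t2, ...` naming scheme are assumptions for illustration; IMPACT's actual flattening works on its own source-level IR.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical expression tree: a leaf holds a variable name; an interior
// node holds a binary operator and two children.
struct Expr {
    std::string name;  // set for leaves
    char op = 0;       // '+', '*', ... for interior nodes
    std::unique_ptr<Expr> lhs, rhs;
};

std::unique_ptr<Expr> leaf(std::string n) {
    auto e = std::make_unique<Expr>();
    e->name = std::move(n);
    return e;
}

std::unique_ptr<Expr> bin(char op, std::unique_ptr<Expr> l, std::unique_ptr<Expr> r) {
    auto e = std::make_unique<Expr>();
    e->op = op;
    e->lhs = std::move(l);
    e->rhs = std::move(r);
    return e;
}

// Flattens `e` into statements of the form "t = x op y", appending them to
// `out` and introducing temporaries t1, t2, ... as needed.
// Returns the name that holds the value of `e`.
std::string flatten(const Expr& e, std::vector<std::string>& out, int& next_temp) {
    if (!e.lhs) return e.name;  // a leaf is already simple
    std::string l = flatten(*e.lhs, out, next_temp);
    std::string r = flatten(*e.rhs, out, next_temp);
    std::string t = "t" + std::to_string(++next_temp);
    out.push_back(t + " = " + l + " " + e.op + " " + r);
    return t;
}
```

For `(a + b) * c`, `flatten` emits `t1 = a + b` and `t2 = t1 * c` and returns `t2`.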
IMPACT steps (contd)
- Code layout optimizations
  - Makes most branches fall through, etc.
- Superblock/Hyperblock formation
  - Can generate superblocks/hyperblocks
- ILP optimizations
  - Expose more ILP through loop unrolling, etc.
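Code layout that makes most branches fall through can be approximated by a greedy chaining heuristic over profile weights. The C++ sketch below is a simplified stand-in, not IMPACT's layout algorithm. It assumes block 0 is the entry and that edge weights come from a profile run such as the one in the profiling step; `WeightedEdge` and `layout` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CFG edge annotated with a profile weight.
struct WeightedEdge {
    std::size_t to;
    double weight;
};

// Greedy layout: starting from the entry (block 0), repeatedly place the
// heaviest not-yet-placed successor next so that the hot branch falls
// through; when no successor is free, continue with the first unplaced block.
std::vector<std::size_t> layout(const std::vector<std::vector<WeightedEdge>>& succs) {
    const std::size_t n = succs.size();
    std::vector<bool> placed(n, false);
    std::vector<std::size_t> order;
    std::size_t cur = 0;
    while (order.size() < n) {
        placed[cur] = true;
        order.push_back(cur);
        bool found = false;
        double best = -1.0;
        std::size_t next = 0;
        for (const WeightedEdge& e : succs[cur])
            if (!placed[e.to] && e.weight > best) { best = e.weight; next = e.to; found = true; }
        if (!found)  // no free successor: start a new chain
            for (std::size_t b = 0; b < n; ++b)
                if (!placed[b]) { next = b; found = true; break; }
        if (!found) break;  // every block has been placed
        cur = next;
    }
    return order;
}
```

With heavy edges 0→2 and 2→3 and light edges 0→1 and 1→3, the order is 0, 2, 3, 1, so both hot branches fall through.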
An Overview of the ELCOR module
Elcor Functional Overview
- Elcor is a collection of compiler components and scripts that analyze and transform Rebel
- Analysis modules
  - Control dependence
  - Data flow
- Transformation and optimization modules
- Scheduling modules
  - Acyclic schedulers
  - Loop schedulers
- Rotating register allocator
- Static register allocator
  - by ReaCT-ILP
- Elementary data structures
  - Container classes
  - Data structures for compiler algorithms
  - Intermediate Representation data structures
- I/O modules
  - Rebel reader/writer
  - Lcode reader/writer
  - Mdes interface
Control Flow Analysis
- Dominator analysis
- Control Dependence Analysis
- Loop detection
- Induction variable detection
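Dominator analysis and loop detection fit together: once dominators are known, any CFG edge whose target dominates its source is a back edge, and the target is a loop header. The C++ sketch below uses the classic iterative algorithm on a toy adjacency list. It assumes block 0 is the entry and every block is reachable, and it is not the implementation in Elcor's Analysis or Control directories.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Cfg = std::vector<std::vector<std::size_t>>;  // succs[n] = successors of block n

// Iterative dominator analysis: dom[n][d] is true when block d dominates n.
// Assumes block 0 is the entry and every block is reachable from it.
std::vector<std::vector<bool>> dominators(const Cfg& succs) {
    const std::size_t n = succs.size();
    Cfg preds(n);
    for (std::size_t u = 0; u < n; ++u)
        for (std::size_t v : succs[u]) preds[v].push_back(u);

    // Start with "everything dominates everything" except at the entry.
    std::vector<std::vector<bool>> dom(n, std::vector<bool>(n, true));
    dom[0].assign(n, false);
    dom[0][0] = true;

    bool changed = true;
    while (changed) {
        changed = false;
        for (std::size_t b = 1; b < n; ++b) {
            std::vector<bool> next(n, true);
            for (std::size_t p : preds[b])  // intersect over all predecessors
                for (std::size_t d = 0; d < n; ++d)
                    next[d] = next[d] && dom[p][d];
            next[b] = true;  // every block dominates itself
            if (next != dom[b]) { dom[b] = std::move(next); changed = true; }
        }
    }
    return dom;
}

// Loop detection: an edge u -> v is a back edge when v dominates u;
// v is then the header of a natural loop.
std::vector<std::pair<std::size_t, std::size_t>> back_edges(const Cfg& succs) {
    auto dom = dominators(succs);
    std::vector<std::pair<std::size_t, std::size_t>> result;
    for (std::size_t u = 0; u < succs.size(); ++u)
        for (std::size_t v : succs[u])
            if (dom[u][v]) result.push_back({u, v});
    return result;
}
```

For the CFG 0→1, 1→2, 2→1, 2→3, the only back edge found is 2→1, making block 1 a loop header.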
Control Flow Transformations
- Loop region construction
  - Constructs a cyclic region that can be modulo scheduled
  - Single back edge
- Structural region formation
  - Identifies acyclic subgraphs of the CFG that are single-entry, multiple-exit
- Branch normalization/denormalization
  - Constructs a memory-layout-independent form of the CFG that can be transformed easily
Control Flow Transformations (contd)
- Tail duplication
  - Useful for constructing single-entry, multiple-exit regions
[Figure: tail duplication example over blocks A, B, C, D]
Control Flow Transformations (contd)
- If-conversion of single-entry, multiple-exit basic block regions
  - Supports if-conversion with or without fully resolved predicates (FRPs)
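If-conversion replaces a branch with a predicate definition and guarded operations, so both arms can be scheduled together without a branch. The C++ sketch below handles only the simplest two-armed hammock, not general single-entry, multiple-exit regions, and ignores fully resolved predicates. `PredOp`, `if_convert`, and the `pred_def` operation are hypothetical stand-ins, the last for a two-target compare-to-predicate operation of the kind HPL-PD provides.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical predicated operation: `text` executes only when `guard`
// is true; an empty guard means the op always executes.
struct PredOp {
    std::string text;
    std::string guard;
};

// If-converts "if (cond) then_ops else else_ops": the branch becomes one
// predicate-defining op that sets p_t = cond and p_f = !cond, and each
// arm's ops are guarded by the matching predicate.
std::vector<PredOp> if_convert(const std::string& cond,
                               const std::vector<std::string>& then_ops,
                               const std::vector<std::string>& else_ops) {
    std::vector<PredOp> out;
    out.push_back({"p_t, p_f = pred_def(" + cond + ")", ""});  // hypothetical op
    for (const auto& op : then_ops) out.push_back({op, "p_t"});
    for (const auto& op : else_ops) out.push_back({op, "p_f"});
    return out;
}
```

For `if (a < b) x = a; else x = b;`, the result is one predicate definition followed by `x = a` guarded by `p_t` and `x = b` guarded by `p_f`.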
Data-flow analysis
- Live variable analysis
  - Live variable information is annotated on the IR
- Reaching definitions analysis
  - A data structure for def-use chains is annotated on the IR
- Available expression analysis
  - Queries for expression availability are provided at any point on the control-flow graph
- These analyses can be performed on any region
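Live variable analysis is a backward data-flow problem. A minimal C++ sketch of the conventional iterative solution over basic blocks follows. The `Block` summary (upward-exposed uses, definitions, successors) is an assumption for illustration; Elcor's version works on regions with predicate-aware transfer functions instead.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical block summary: variables read before any write in the block
// (use), variables written (def), and CFG successors.
struct Block {
    std::set<std::string> use, def;
    std::vector<std::size_t> succs;
};

// Backward iterative liveness:
//   live_out[b] = union of live_in[s] over successors s
//   live_in[b]  = use[b] | (live_out[b] - def[b])
// Iterates until no set changes and returns live_in for every block.
std::vector<std::set<std::string>> live_in(const std::vector<Block>& blocks) {
    std::vector<std::set<std::string>> in(blocks.size());
    bool changed = true;
    while (changed) {
        changed = false;
        for (std::size_t i = blocks.size(); i-- > 0;) {  // reverse order converges faster
            std::set<std::string> out;
            for (std::size_t s : blocks[i].succs) out.insert(in[s].begin(), in[s].end());
            std::set<std::string> next = blocks[i].use;
            for (const auto& v : out)
                if (!blocks[i].def.count(v)) next.insert(v);
            if (next != in[i]) { in[i] = std::move(next); changed = true; }
        }
    }
    return in;
}
```

If block 0 defines `x` and falls into block 1, which reads `x` and `y`, then only `y` is live on entry to block 0.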
Data-flow analysis architecture
- Uses a CFG consisting of basic blocks and hyperblocks
  - Such a cut has to exist in the region hierarchy
- Region-based analysis has three steps
  - Transfer functions are constructed for each entry-exit pair on a CFG node
    - Transfer functions are constructed using local predicate relationships
  - The global iterative solver is conventional
    - Solves data-flow equations at basic/hyperblock entry and exit points
  - Local analysis is used to determine data-flow equation solutions at points within a block, using the global solver results
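The last step, deriving solutions inside a block from the global solver's results, can be sketched for liveness: starting from the block's live-out set, walk the operations backward, removing each definition and adding each use. `SimpleOp` and `live_before_each` are hypothetical names, and predication is ignored here, whereas the real analysis accounts for it through local predicate relationships.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical operation: writes `def` (empty if none) and reads `uses`.
struct SimpleOp {
    std::string def;
    std::vector<std::string> uses;
};

// Given the live-out set computed by the global solver at the block's exit,
// walk the block backward and return the variables live immediately
// before each operation.
std::vector<std::set<std::string>> live_before_each(const std::vector<SimpleOp>& ops,
                                                   std::set<std::string> live) {
    std::vector<std::set<std::string>> before(ops.size());
    for (std::size_t i = ops.size(); i-- > 0;) {
        if (!ops[i].def.empty()) live.erase(ops[i].def);      // kill the definition
        live.insert(ops[i].uses.begin(), ops[i].uses.end());  // add the uses
        before[i] = live;
    }
    return before;
}
```

For the block `a = b; c = a + d` with live-out `{c}`, the live sets are `{b, d}` before the first op and `{a, d}` before the second.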
Optimizations
- Predicate speculation
- Dead code elimination
- Global copy propagation (forward)
- Local copy propagation (forward and backward)
- Global common sub-expression elimination
- Loop-invariant code removal
- Global register renaming
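As an example of one optimization in this list, here is a C++ sketch of local forward copy propagation within a single basic block. The three-address `TacOp` form and the `mov` opcode name are assumptions for illustration, not Elcor's IR or HPL-PD opcodes.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical three-address op; opcode "mov" marks a copy "dst = srcs[0]".
struct TacOp {
    std::string dst, opcode;
    std::vector<std::string> srcs;
};

// Local forward copy propagation over one basic block: after "x = mov y",
// later reads of x are rewritten to read y until x or y is redefined.
void propagate_copies(std::vector<TacOp>& block) {
    std::map<std::string, std::string> copy_of;  // x -> y for copies still valid
    for (TacOp& op : block) {
        // Rewrite reads first: they see values from before this op's write.
        for (std::string& s : op.srcs) {
            auto it = copy_of.find(s);
            if (it != copy_of.end()) s = it->second;
        }
        // op.dst is being redefined, so drop any copy that mentions it.
        for (auto it = copy_of.begin(); it != copy_of.end();) {
            if (it->first == op.dst || it->second == op.dst) it = copy_of.erase(it);
            else ++it;
        }
        if (op.opcode == "mov" && op.srcs.size() == 1 && op.srcs[0] != op.dst)
            copy_of[op.dst] = op.srcs[0];
    }
}
```

In the block `x = mov y; z = add x, w; y = mov q; u = add x, x`, the read of `x` in the second op becomes `y`, but the last op keeps `x` because `y` was redefined in between.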
The Elcor Intermediate Representation
Factors Motivating the Design
- Global scheduling is key to exploiting ILP
  - We are moving towards bigger and more complex regions
  - Frequency-based regions have more complex structure than traditional structure-based regions
    - Even a trace is a multiple-entry, multiple-exit region
- Many of the ILP-enhancing techniques, e.g., height reduction, rely on estimates of height and resource usage
  - Such estimates may be helpful even in earlier phases
- Analyses such as memory disambiguation are expensive
  - Need to represent and maintain their results accurately
Factors Motivating the Design (contd)
- Flexibility in phase ordering
  - Because we don't fully understand the right phase order
- Flexibility and ability to grow
  - In many cases, we don't fully understand the requirements
  - An IR highly optimized for a specific purpose may not be the right one
  - Put general mechanisms in place to support various policies
- Well-defined interfaces to modules, and encapsulation
- Uniformity
  - Easy to build, modify, and grow the software
IR Features
- Registers carry values; edges represent dependences
- A uniform, edge-based representation of control flow and data dependences
- Supports threading of data dependences
  - Dependence flow graphs
- Hierarchical, non-overlapping region structure (a tree)
- Multi-state IR
- Provides mechanisms for representing
  - A traditional control flow graph
  - Control dependences
  - Data dependences for both registers and memory, in various forms
  - Various forms of register usage: single assignment, multiple assignments
  - Expanded virtual registers (EVRs)
  - Predicated execution
- Data section
  - Global symbols, arrays, etc.
Internal vs. Textual Representation
- Each component of the graph data structure is a C++ object
  - All modules of Elcor use this IR
  - Optimizations are simply IR-to-IR transformations
- There is an ASCII intermediate representation, called Rebel
  - Phases of Elcor may communicate using Rebel
  - A reader procedure is provided that reads Rebel and constructs the corresponding internal program representation
  - A writer procedure is provided for generating Rebel from the internal representation
Example of textual IR
- See http://www.trimaran.org/docs/elcor_ir_manual.pdf
Program Representation
- A program unit is represented by a graph of operations connected by edges
- Control flow is represented explicitly and at the operation level
- A region structure over the operation graph (a tree)
  - The root of the tree is the program unit, e.g., a procedure
  - The leaf nodes of the tree are operations
- Operation graph elements
  - Op(eration) class
  - Operand class
  - Edge class
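The shape of this representation, an operation graph plus a region tree whose leaves are operations, can be sketched with a few C++ classes. These classes and the `connect` and `count_ops` helpers are hypothetical and heavily simplified; Elcor's real `Op`, `Operand`, `Edge`, and `Region` classes (in the Graph directory) have different, much richer interfaces.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified IR classes (not Elcor's API).
struct Operand {
    std::string reg;  // e.g. a virtual register name
};

struct Op;

// An edge records one dependence (or control transfer) between two ops.
struct Edge {
    enum Kind { Control, RegFlow, Memory } kind;
    Op* src;
    Op* dst;
};

struct Op {
    std::string opcode;
    std::vector<Operand> dests, srcs;
    std::vector<Edge*> in_edges, out_edges;
};

// Region tree: the root is the program unit (e.g. a procedure);
// leaf regions wrap individual operations.
struct Region {
    std::string kind;  // "procedure", "hyperblock", "op", ...
    std::vector<std::unique_ptr<Region>> children;
    Op* op = nullptr;  // set only for leaf regions
};

// Creates an edge owned by `edges` and links it into both endpoint ops.
Edge* connect(std::vector<std::unique_ptr<Edge>>& edges, Edge::Kind k, Op& from, Op& to) {
    edges.push_back(std::make_unique<Edge>(Edge{k, &from, &to}));
    Edge* e = edges.back().get();
    from.out_edges.push_back(e);
    to.in_edges.push_back(e);
    return e;
}

// Counts the operations (leaf regions) under a region.
std::size_t count_ops(const Region& r) {
    if (r.op) return 1;
    std::size_t n = 0;
    for (const auto& c : r.children) n += count_ops(*c);
    return n;
}
```

Because edges are first-class objects, passes can walk an op's incoming and outgoing dependences uniformly, whatever kind of dependence each edge represents.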
Navigating ELCOR code
Projects for this class
- You are going to be modifying parts of ELCOR
- ELCOR code is written in C++ with its own template library
- Very large code base
- Many tools are already available in ELCOR; find them to avoid reinventing the wheel every time
Some useful directories
- Graph directory
  - Contains the C++ representation of Op, Region, Operand, etc.
- Control directory
  - Contains tools helpful in identifying control structures (such as loops)
- Analysis directory
  - Analyses such as dominator analysis, liveness analysis, etc.
- Opti directory
  - Contains code for certain optimizations
- Main directory
  - Starting point for ELCOR
- Tools directory
  - Structures such as maps, vectors, etc.
Some tips
- Use ctags (exuberant-ctags) or etags
- Look at Main/process_function.cpp
- Do NOT assume Trimaran to be bug-free
- Look at how supplied tools are used
- There are iterators that are very useful (look in Graph and Control)
- Spend time exploring the code before starting to code; you may find that things are already there
The HPL-PD Simulator and Performance Monitoring Environment
User View of the Simulator
- To the user, the simulator is simply another phase of the compilation/execution process
- Transparently to the user, Makefiles guide the
  - Configuration of the simulator using MDES
  - Generation of executable code from the Rebel output of the back end
  - Creation of an interface for foreign calls
    - to C routines provided by the user or as part of a standard library
- A GUI is provided to extract and analyze the execution results of the simulator
[Diagram: C program → Front End → Back End → Simulator → Execution Statistics]
Execution Results
- During execution, the simulator produces raw data, namely a trace specifying
  - Control flow execution
    - gives the order of control-block execution
  - Memory addresses referenced
  - Guarded predicate values
    - whether an operation within an HPL-PD instruction was disabled by predication
- A trace-driven profiler tool is run after execution
  - Reads the trace and Rebel file(s), and extracts the desired information
  - Emits a detailed statistics/profile information file
Statistics
- List of items generated by the trace-driven profiler
  - IPC (number of HPL-PD operations per clock cycle)
  - Memory address usage frequencies
  - Control block visit frequencies
  - Resource utilization
  - Register usage frequencies
  - Functional unit utilization
  - Memory (stack/heap) utilization
  - Effectiveness of guarded predicates
  - Register allocation overhead
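A trace-driven profiler reduces raw trace records to statistics such as IPC and control-block visit frequencies. The C++ sketch below assumes a hypothetical trace format of one record per block visit, carrying the number of operations issued and the cycles spent; Trimaran's actual trace format and profiler differ, and `Visit`, `Profile`, and `summarize` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical trace record: one visit to a control block, with the number
// of HPL-PD operations issued and clock cycles spent during that visit.
struct Visit {
    int block;
    std::size_t ops;
    std::size_t cycles;
};

struct Profile {
    double ipc = 0.0;                   // operations per clock cycle
    std::map<int, std::size_t> visits;  // control-block visit frequencies
};

// Summarizes a whole trace: IPC = total operations / total cycles.
Profile summarize(const std::vector<Visit>& trace) {
    Profile p;
    std::size_t ops = 0, cycles = 0;
    for (const Visit& v : trace) {
        ops += v.ops;
        cycles += v.cycles;
        ++p.visits[v.block];
    }
    if (cycles > 0) p.ipc = static_cast<double>(ops) / cycles;
    return p;
}
```

For three visits issuing 8, 4, and 8 operations over 2, 4, and 2 cycles, IPC is 20 / 8 = 2.5, and block 1 is counted as visited twice.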
Viewing execution statistics using the GUI
The Trimaran GUI
- The Trimaran system is configured and run via a graphical user interface to
  - choose a program to compile
  - configure the target machine
  - configure compilation stages
  - view graphical program representations at various stages of compilation
  - view execution statistics (graphs, pie charts, etc.)
  - view extensive on-line help and documentation
- If desired, the system can also be run from the command line and invoked from shell scripts
The control panel
- The GUI is operated from this main control panel.
[Screenshot callouts: Viewing Program Intermediate Representation; GUI Settings and Defaults; Compiler and Simulator Parameters; Compiler Options; Organize Collections of Programs, Machines, Parameter Sets, etc.; Viewing Execution Statistics; Target Machine Configuration]
The Compiler Panel
- The compiler panel allows you to choose a
  - benchmark program to compile
    - you can add your own as well
  - target machine configuration
  - parameter set (for the compiler and simulator)
  - project file
- It also allows you to easily configure the stages of the compiler and start the compilation process
Choosing a benchmark and machine
[Screenshot callouts: Choosing a benchmark; Choosing a machine]
Configuring the compiler
[Screenshot callouts: Front end features; Back end features; Simulator on/off]
On-line Documentation
- On-line documentation is available for each component of Trimaran
  - This is the on-line help for the compiler panel
The Machine Panel
- The machine panel is used to create new target machines and modify existing ones
- Here, one selects an existing machine to copy or modify
Machine Descriptions
- To edit a machine description, the GUI opens an editor window
- Trimaran includes a very powerful machine description facility
  - It is the subject of an entire section of this tutorial
- The GUI simplifies the process of machine description
The Parameters Panel
- The parameters panel allows you to modify a large number of parameters used by the front end, the back end, and the simulator
- Generally, the default settings will be used
- Once a new parameter set has been configured, the set can be named and saved for subsequent use
Modifying Parameters
- Upon clicking open, the parameters are displayed
- Here, the compiler front end parameters are displayed, along with their current values
- Clicking a ? button opens a help window for that parameter
- Parameters can also be modified by editing text files, if desired
Parameters for the Back End
- The compiler back end has the largest number of parameters
- The parameters are organized into groups according to their use
  - Analysis
  - Optimizations
  - Register Allocation
  - Etc.
[Screenshot callout: Help window]
The Statistics Panel
- The statistics panel allows you to choose which statistics are displayed for the programs in one's project file
  - Function-level execution profile
  - Region-level profile
  - Instruction usage
  - Etc.
Viewing Statistics
- For each program in your project file, a separate graph is displayed
- Here, pie charts show the dynamic instruction distribution
The View IR Panel
- The IR viewer provides five kinds of views of a program
  - The program regions (hyperblocks, loops, etc.)
  - Dependence Graph
  - Control Flow Graph (CFG)
  - ILP Instruction Schedule
  - Profile Information
Control Flow View
- Here is a portion of the control flow graph for a program
- The user can specify a portion of the program to display
- The viewer has zoom in, zoom out, scroll, etc.
Summary
- Trimaran is an open-source compilation/simulation environment to study VLIW compilation
  - Intel IA-64 (Itanium and McKinley)
- Trimaran has a parameterized structure
  - Relies on an HPL-PD architecture description file
- Trimaran supports various research and educational efforts
  - www.trimaran.org for more information