Title: What Programming Language/Compiler Researchers Should Know about Computer Architecture
- Lizy Kurian John
- Department of Electrical and Computer Engineering
- The University of Texas at Austin
Somebody once said:
- Computers are dumb actors, and compilers/programmers are the master playwrights.
Computer Architecture Basics
- ISAs
- RISC vs. CISC
- Assembly language coding
- Datapath (ALU) and controller
- Pipelining
- Caches
- Out-of-order execution
- Hennessy and Patterson architecture books
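Basics
- ILP (instruction-level parallelism)
- DLP (data-level parallelism)
- TLP (thread-level parallelism)
- Massive parallelism
- SIMD/MIMD
- VLIW
- Performance and power metrics
- Hennessy and Patterson architecture books; ASPLOS, ISCA, MICRO, and HPCA proceedings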
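As a reminder of the performance and power metrics above, here are the standard textbook relations (general definitions, not results from this talk). N is the dynamic instruction count, CPI is cycles per instruction (IPC = 1/CPI), f is the clock frequency, alpha is the switching activity factor, C is the switched capacitance, and V is the supply voltage.

```latex
\begin{align}
  T              &= N \cdot \mathrm{CPI} \cdot \frac{1}{f} = \frac{N}{\mathrm{IPC} \cdot f} \\
  P_{\text{dyn}} &= \alpha \, C \, V^{2} f \\
  E              &= P \cdot T \\
  \mathrm{EDP}   &= E \cdot T
\end{align}
```

Energy (E) tracks total work, while power (P) is energy per unit time, so an optimization can move the two in different directions.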
The Bottom Line
- Programming language choice affects performance and power (e.g., Java)
- Compilers affect performance and power
A Java Hardware Interpreter
- Radhakrishnan, Ph.D. 2000 (ISCA 2000, ICS 2001)
- This technique is used by Nazomi Communications and Parthus (Chicory Systems)
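For context on what a hardware interpreter replaces, here is a minimal software interpreter loop. The opcode names are JVM-like, but the tuple encoding, the `interpret` function, and its dispatch counter are hypothetical; the slide does not describe Hard-Int's internal design. The sketch only shows that a software interpreter performs one fetch, decode, and dispatch step per bytecode, which is the kind of overhead a hardware interpreter aims to remove.

```python
def interpret(bytecode):
    """Minimal switch-dispatch stack interpreter for a hypothetical subset of
    JVM-like bytecodes, encoded as tuples (opcode, *operands).
    Returns (result, dispatches): each bytecode costs one software dispatch."""
    stack, pc, dispatches = [], 0, 0
    while True:
        op, *args = bytecode[pc]   # fetch and decode
        pc += 1
        dispatches += 1            # one trip through the dispatch "switch"
        if op == "iconst":
            stack.append(args[0])
        elif op == "iadd":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "imul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "ireturn":
            return stack.pop(), dispatches
        else:
            raise ValueError(f"unknown bytecode {op!r}")


# (2 + 3) * 4 takes six bytecodes, hence six software dispatches.
program = [("iconst", 2), ("iconst", 3), ("iadd",),
           ("iconst", 4), ("imul",), ("ireturn",)]
print(interpret(program))  # (20, 6)
```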
Hard-Int Performance
- Hard-Int performs consistently better than the interpreter.
- In JIT mode, it gives a significant performance boost in 4 of 5 applications.
Compiler and Power
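[Figure: a data dependence graph (DDG) of six operations, A through F, and two 4-cycle schedules of it. Both schedules execute all six operations (Energy 6). In one, E issues in cycle 2 together with B and C, so three operations share a cycle (Peak Power 3); in the other, E is placed in a cycle with only one other operation (Peak Power 2).]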
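To make the slide's point concrete, the sketch below scores schedules of a small DDG under a unit-cost model (each operation costs one unit of energy in the cycle where it issues), which is consistent with the slide's numbers. The edges in `DEPS`, the schedule names, and the greedy `capped_list_schedule` helper are hypothetical; they are not the speaker's exact example or any production scheduler.

```python
from collections import Counter

# Hypothetical DDG: the slide's edges are not recoverable, so these are assumed.
# Each operation maps to the operations whose results it needs.
DEPS = {
    "A": [],
    "B": ["A"],
    "C": ["A"],
    "D": ["B", "C"],
    "E": [],          # assumed independent, so it can issue in cycle 1, 2, or 3
    "F": ["D"],
}


def respects_deps(schedule, deps):
    """True if every op issues in a later cycle than each of its predecessors."""
    return all(schedule[op] > schedule[p] for op in deps for p in deps[op])


def peak_and_energy(schedule):
    """Unit-cost model (assumption): each op costs 1 energy unit in its cycle.
    Peak power = most ops in any one cycle; energy = total ops executed."""
    per_cycle = Counter(schedule.values())
    return max(per_cycle.values()), len(schedule)


def capped_list_schedule(deps, cap):
    """Greedy list scheduling that issues at most `cap` ready ops per cycle
    (cap must be >= 1 and the graph must be acyclic)."""
    schedule, cycle = {}, 1
    while len(schedule) < len(deps):
        ready = [op for op in sorted(deps)
                 if op not in schedule
                 and all(p in schedule and schedule[p] < cycle for p in deps[op])]
        for op in ready[:cap]:
            schedule[op] = cycle
        cycle += 1
    return schedule


# Two hand-made schedules: E packed with B and C, or E moved to cycle 1.
packed = {"A": 1, "B": 2, "C": 2, "E": 2, "D": 3, "F": 4}
spread = {"A": 1, "E": 1, "B": 2, "C": 2, "D": 3, "F": 4}
assert respects_deps(packed, DEPS) and respects_deps(spread, DEPS)
print(peak_and_energy(packed))  # (3, 6)
print(peak_and_energy(spread))  # (2, 6)
```

Spreading E lowers peak power without changing energy. With `capped_list_schedule(DEPS, 2)`, the greedy scheduler also reaches a peak of 2 in four cycles, which is one simple way a compiler could trade issue width for peak power.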
Valluri et al., 2001 HPCA Workshop
- A quantitative study
- Examines the influence of state-of-the-art optimizations on the energy and power of the processor
- Optimizations studied:
- Standard levels -O1 to -O4 of the DEC Alpha's cc compiler
- Four individual optimizations: simple basic-block instruction scheduling, loop unrolling, function inlining, and aggressive global scheduling
Standard Optimizations on Power
Somebody once said:
- Computers are dumb actors, and compilers/programmers are the master playwrights.
A large part of modern out-of-order processors
- is hardware that could have been eliminated if a good compiler existed.
Let me get more arrogant
- A large part of modern out-of-order processors was designed because computer architects thought compiler writers could not do a good job.
Value Prediction
- Is a slap in your face
- Shen and Lipasti
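Value prediction exploits value locality in hardware. The sketch below is a simplified last-value predictor: a table indexed by instruction address (PC) that remembers the last result and a 2-bit saturating confidence counter, and only predicts once the counter reaches a threshold. It is in the spirit of load value prediction but is not Lipasti and Shen's exact design; the unbounded dict table, the threshold of 2, and the `run_trace` helper are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class LastValuePredictor:
    """Simplified last-value predictor: pc -> [last_value, confidence].
    Confidence is a 2-bit saturating counter (0..3)."""
    threshold: int = 2
    table: dict = field(default_factory=dict)

    def predict(self, pc):
        """Return the predicted value, or None if not confident enough."""
        entry = self.table.get(pc)
        if entry is not None and entry[1] >= self.threshold:
            return entry[0]
        return None

    def update(self, pc, actual):
        """Train on the actual result once the instruction executes."""
        entry = self.table.get(pc)
        if entry is None:
            self.table[pc] = [actual, 0]       # first sighting: no confidence yet
        elif entry[0] == actual:
            entry[1] = min(entry[1] + 1, 3)    # same value again: gain confidence
        else:
            entry[0] = actual                  # new value: replace, lose confidence
            entry[1] = max(entry[1] - 1, 0)


def run_trace(trace, predictor):
    """Count (correct, wrong) predictions over a list of (pc, value) pairs."""
    correct = wrong = 0
    for pc, value in trace:
        guess = predictor.predict(pc)
        if guess is not None:
            if guess == value:
                correct += 1
            else:
                wrong += 1
        predictor.update(pc, value)
    return correct, wrong


# A load that always returns the same value (e.g. a run-time constant).
print(run_trace([(0x40, 7)] * 10, LastValuePredictor()))                 # (7, 0)
# A load whose value changes every time: confidence never builds up.
print(run_trace([(0x80, v) for v in range(10)], LastValuePredictor()))   # (0, 0)
```

The confidence counter is the design choice that matters here: it keeps the predictor quiet on loads with no value locality instead of mispredicting them.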
Value Locality
- The likelihood that an instruction's computed result, or a similar predictable result, will recur soon
- Observation: a limited set of unique values constitutes the majority of values produced and consumed during execution
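One way to put a number on value locality, assuming a history-depth formulation, is to count how often an instruction's result matches one of the last N results produced by the same static instruction. The function name and the trace format of (pc, value) pairs below are hypothetical.

```python
from collections import defaultdict, deque


def value_locality(trace, history_depth=1):
    """Fraction of dynamic results equal to one of the last `history_depth`
    results produced by the same static instruction (identified by its PC).
    The first execution of each PC has no history and counts as a miss."""
    history = defaultdict(lambda: deque(maxlen=history_depth))
    hits = 0
    for pc, value in trace:
        if value in history[pc]:
            hits += 1
        history[pc].append(value)   # deque drops the oldest value when full
    return hits / len(trace) if trace else 0.0


# A load at PC 1 that alternates between two values: 0, 5, 0, 5, ...
alternating = [(1, v) for v in [0, 5] * 4]
print(value_locality(alternating, history_depth=1))  # 0.0
print(value_locality(alternating, history_depth=2))  # 0.75
```

A deeper history captures values that recur without repeating back to back, which matches the observation that a small set of unique values dominates.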
Load Value Locality
Causes of value locality
- Data redundancy: many 0s, sparse matrices, white space in files, empty cells in spreadsheets
- Program constants
- Computed branches: the base address for jump tables is a run-time constant
- Virtual function calls: involve code to load a function pointer, which can be constant
Causes of value locality (contd.)
- Memory alias resolution: the compiler conservatively generates code that reloads values after stores that may alias with the loads
- Register spill code: stores and subsequent loads
- Convergent algorithms: parts of an algorithm converge before global convergence
- Polling algorithms
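The memory-alias cause can be sketched by modeling memory as a dict. Suppose a compiled loop body stores through a pointer q and then reads through a pointer p, and the compiler could not prove p != q, so it must reload *p after every store. When the pointers do not alias at run time, every reload returns the same value. The addresses and the `run_loop` and `last_value_hits` helpers are hypothetical.

```python
def run_loop(memory, p, q, iterations):
    """Model of a loop body `*q = i; x = *p;` where the compiler could not
    prove p != q, so *p is reloaded after every store through q.
    Returns the values produced by the reload of *p."""
    loaded = []
    for i in range(iterations):
        memory[q] = i              # store through q (may alias p)
        loaded.append(memory[p])   # conservative reload of *p
    return loaded


def last_value_hits(values):
    """How many loads returned the same value as the previous load."""
    return sum(1 for prev, cur in zip(values, values[1:]) if prev == cur)


# No run-time aliasing: the reload is redundant and perfectly predictable.
no_alias = run_loop({0x100: 42, 0x200: 0}, p=0x100, q=0x200, iterations=5)
print(no_alias, last_value_hits(no_alias))    # [42, 42, 42, 42, 42] 4
# Real aliasing: the reload is necessary and the value changes every time.
aliased = run_loop({0x100: 42}, p=0x100, q=0x100, iterations=5)
print(aliased, last_value_hits(aliased))      # [0, 1, 2, 3, 4] 0
```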
Two Extremist Views
- Anything that can be done in hardware should be done in hardware.
- Anything that can be done in software should be done in software.
What do we need?
- The dumb actor
- Or the defiant actor who pays very little attention to the script?
Challenging all compiler writers
- The last 15 years were the defiant actors' era
- What about the next 15? TLP, multithreading, parallelizing compilers
- It's time for a lot more dumb acting from the architects' side
- And it's time for some good scriptwriting from the compiler writers' side
BACKUP
Compiler Optimizations
- cc: native C compiler on a DEC Alpha 21064 running the OSF/1 operating system
- gcc: used to study the effect of individual optimizations
Standard Optimization Levels on cc
- -O0: no optimizations performed
- -O1: local optimizations such as common subexpression elimination (CSE), copy propagation, induction variable elimination (IVE), etc.
- -O2: inline expansion of static procedures and global optimizations such as loop unrolling and instruction scheduling
- -O3: inline expansion of global procedures
- -O4: software pipelining, loop vectorization, etc.
Standard Optimization Levels on gcc
- -O0: no optimizations performed
- -O1: local optimizations such as CSE, copy propagation, dead-code elimination, etc.
- -O2: aggressive instruction scheduling
- -O3: inlining of procedures
NOTE
- cc and gcc perform almost the same optimizations at each level
- In both cc and gcc, optimizations that increase ILP are in levels -O2, -O3, and -O4
- cc was used wherever possible; gcc was used where specific hooks were required
Individual Optimizations
- Four gcc optimizations, all applied on top of -O1
- -fschedule-insns: local register allocation followed by basic-block list scheduling
- -fschedule-insns2: postpass scheduling
- -finline-functions: integrates all simple functions into their callers
- -funroll-loops: performs loop unrolling
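The shape of the loop-unrolling transformation can be sketched at source level: the unrolled version does four elements per loop test and branch, plus a remainder loop for leftover elements. This is written in Python purely for illustration and is not gcc's implementation; the unroll factor of 4 and the `loop_overhead_ops` cost model (three overhead operations per executed iteration: index update, compare, branch) are assumptions.

```python
def sum_rolled(xs):
    """Reference loop: one element per iteration."""
    total, i = 0, 0
    while i < len(xs):
        total += xs[i]
        i += 1
    return total


def sum_unrolled4(xs):
    """Same loop unrolled by 4, plus a remainder (epilogue) loop."""
    total, i, n = 0, 0, len(xs)
    while i + 4 <= n:          # main body: 4 elements per test and branch
        total += xs[i]
        total += xs[i + 1]
        total += xs[i + 2]
        total += xs[i + 3]
        i += 4
    while i < n:               # leftover elements, one at a time
        total += xs[i]
        i += 1
    return total


def loop_overhead_ops(n, unroll):
    """Hypothetical cost model: each executed iteration pays 3 overhead ops
    (index update, compare, branch); the body work is the same either way."""
    main_iters, remainder_iters = divmod(n, unroll)
    return 3 * (main_iters + remainder_iters)


data = list(range(10))
print(sum_rolled(data), sum_unrolled4(data))                   # 45 45
print(loop_overhead_ops(1000, 1), loop_overhead_ops(1000, 4))  # 3000 750
```

Under this model, unrolling cuts the dynamic instruction count by removing loop overhead, while the larger loop body gives the scheduler more independent instructions to overlap.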
Some observations
- Energy consumption decreases when the number of instructions is reduced, i.e., when the total work done is less, energy is less
- Power dissipation is directly proportional to IPC
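Both observations follow from a simple model, assuming every instruction costs roughly the same average energy E_inst (a modeling assumption, not a measured figure from the study). N is the dynamic instruction count and f is the clock frequency.

```latex
\begin{align}
  E &= N \cdot E_{\text{inst}} \\
  T &= \frac{N}{\mathrm{IPC} \cdot f} \\
  P &= \frac{E}{T} = E_{\text{inst}} \cdot \mathrm{IPC} \cdot f
\end{align}
```

At a fixed frequency and per-instruction energy, energy scales with N and power scales with IPC.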
Observations (contd.)
- Function inlining was found to be good for both power and energy
- Unrolling was found to be good for energy consumption but bad for power dissipation
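The unrolling result can be reproduced with the same unit-energy model: if unrolling removes some instructions but removes cycles even faster, energy drops while power rises. The instruction and cycle counts below are hypothetical illustration values, not measurements from the study, and `energy_and_power` is a hypothetical helper.

```python
def energy_and_power(instructions, cycles, e_inst=1.0, freq=1.0):
    """Simple model (assumption): every instruction costs e_inst energy units.
    Energy = N * e_inst; time = cycles / freq; power = energy / time."""
    energy = instructions * e_inst
    time = cycles / freq
    return energy, energy / time


# Hypothetical counts: unrolling removes loop overhead (fewer instructions)
# and exposes ILP (fewer cycles, higher IPC).
rolled = energy_and_power(instructions=1000, cycles=1000)   # IPC 1.0
unrolled = energy_and_power(instructions=800, cycles=500)   # IPC 1.6
print(rolled)    # (1000.0, 1.0)
print(unrolled)  # (800.0, 1.6)
```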
MMX/SIMD
- Automatic use of SIMD ISAs is still difficult 10 years after the introduction of MMX.
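Part of the difficulty is proving that a loop is safe to vectorize. The sketch below uses Python to stand in for SIMD hardware with a hypothetical 4-lane width: each vector step loads all of its inputs before any lane stores, as a packed load, add, and store would. A loop-carried dependence of distance 1 makes the naive vector version compute a different answer, while a distance at least as large as the vector width is safe. Real vectorizers rely on far richer dependence analysis; the function names here are hypothetical.

```python
def scalar_loop(a, dist):
    """Reference: a[i] = a[i - dist] + 1, one element at a time (dist >= 1)."""
    a = list(a)
    for i in range(dist, len(a)):
        a[i] = a[i - dist] + 1
    return a


def simd_loop(a, dist, width=4):
    """Naive vectorization: each group of `width` lanes reads all of its
    inputs before any lane writes (packed load, then packed add and store)."""
    a = list(a)
    for start in range(dist, len(a), width):
        lanes = range(start, min(start + width, len(a)))
        loaded = [a[i - dist] for i in lanes]   # packed load
        for i, v in zip(lanes, loaded):         # packed add + store
            a[i] = v + 1
    return a


def naive_vectorization_is_safe(dist, width=4):
    """For dist >= 1: no lane reads a value written by another lane of the
    same vector step when the dependence distance is at least the width."""
    return dist >= width


print(scalar_loop([0] * 8, 1))  # [0, 1, 2, 3, 4, 5, 6, 7]
print(simd_loop([0] * 8, 1))    # [0, 1, 1, 1, 1, 2, 1, 1]  (wrong)
print(simd_loop([0] * 12, 4) == scalar_loop([0] * 12, 4))  # True
```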
Standard Optimizations on Power (contd.)