Title: EECS 583 Advanced Compilers Course Intro, Overview of VLIW Architectures
1EECS 583 Advanced Compilers: Course Intro, Overview of VLIW Architectures
- Winter 2005, University of Michigan
- January 5, 2005
2About Me
- Mahlke = "mall key"
- But just call me Scott
- 4th year here at Michigan
- Compiler guy who likes hardware
- Program optimization and building custom hardware for high performance
- Before this: HP Labs
- Compiler research for Itanium-like processors
- PICO: automatic design of NPAs
- Before before: grad student at Univ of Illinois
- Before^3: undergrad at Illinois
3Class Overview
- This class is NOT about
- Programming languages
- Parsing, syntax checking
- Handling advanced language features: virtual functions, ...
- Frontend transformations: array dependence analysis, ...
- Debugging
- Simulation
- Compiler backend
- Mapping applications to processor hardware
- Retargetability: work for multiple platforms (not hard coded)
- Work at the assembly-code level
- Processor independent -> Machine code
- Speed/Efficiency
- How to make the application run fast
- Use less memory (text, data)
4Background You Should Have
- 1. Programming
- Good C programmer (essential)
- Linux, gcc, gdb, emacs
- Compiler system not ported to Windows or Mac
- 2. Computer architecture
- EECS 370 is good, 470 is better but not essential
- Basics: caches, pipelining, function units, registers, virtual memory, branches, branch prediction, assembly code
- 3. Compilers
- Frontend stuff is not very relevant for this class
- Basic backend stuff we will go over fast
- Non-EECS 483 people will have to do some supplemental reading
- 4. Powerpoint
- You will have to make a presentation in this class
5Textbook and Other Classroom Material
- No required text: lecture notes, papers
- Other useful material
- Trimaran webpage: http://www.trimaran.org
- UIUC Impact webpage: http://www.crhc.uiuc.edu/Impact
- Course webpage + course newsgroup
- Will be set up by next Monday
- Lecture notes
- Newsgroup: forum for helping each other; I will try to check regularly, but I won't be able to answer everything
6What the Class Will be Like
- Class meeting time: 1:00 - 3:00, MW
- 2 hrs is hard to handle
- We'll go for an hour, take 10 min break
- Core backend stuff
- Textbook material: some overlap with 483
- Few homeworks to apply classroom material
- Research papers
- I'll present research material along the way
- Presentations by students: you guys are going to teach
7What the Class Will be Like (2)
- Learning compilers
- No memorizing definitions, terms, formulas, algorithms, etc.
- Learn by doing: writing code
- Substantial amount of programming
- Big learning curve for Trimaran compiler
- Reasonable amount of reading
- Classroom
- Attendance: you should be here
- Discussion is important
- Work out examples, discuss papers, etc.
- Each of you will teach some advanced material to the rest of us
- Essential to stay caught up
- Special interest groups: smaller meetings outside of class that focus on particular compiler topics
8Course Grading
- Yes, everyone will get a grade
- Distribution of grades, scale, etc - ???
- Most (hopefully all) will get As and Bs
- Slackers will be obvious and will suffer
- Components
- Midterm exam: 30%
- Project: 40%
- Homeworks: 15%
- Paper presentation: 10%
- Class participation: 5%
9Homeworks
- Around 3 of these
- Small/modest programming assignments
- Design and implement something we discussed in class
- Goals
- Learn the important concepts
- Learn the compiler infrastructure so you can do the project
- Grading
- 4/3/2/1 (almost perfect, good, so-so, didn't try very hard)
- Working together is ok
- Make sure you understand things or it will come back to bite you
- For now, everyone must turn in their own assignment; this may change later in the semester
10Projects
- Design and implement an interesting compiler technique and demonstrate its usefulness
- Topic/scope/work
- 1-3 people per project
- You will pick the topics (I have to agree)
- Projects will be planned/organized at the SIG level
- You will have to
- Read background material
- Plan and design
- Implement and debug
- Deliverables
- Working implementation
- Project report: 5-10 page paper describing what you did/results
- 30 min presentation at end (demo if you want)
11Types of Projects
- New idea
- Small research idea
- Design and implement it, see how it works
- Extend existing idea
- Take an existing paper, implement their technique
- Then, extend it to do something interesting
- Generalize strategy, make more efficient/effective
- Implementation
- Take existing idea, create quality (could be released) implementation in Trimaran
- Evaluate it on a set of VLIW architectures
12Class Participation
- Interaction and discussion are essential in a graduate class
- Be here
- Don't just stare at the wall
- Be prepared to discuss the material
- Have something useful to contribute
- Opportunities for participation
- Research paper discussions: thoughts, comments, etc.
- Saying what you think in the special interest group meetings
- Solving class problems
13Special Interest Groups
- Divide up the class into 4 focus groups
- Each group will meet at times TBD
- Identify research papers, discuss papers and project ideas
- Start SIGs about 1/3rd way through class
- 4 groups: equal number of people in each group
- Control flow handling, optimization
- Analysis and optimization
- Code generation (scheduling, register allocation, ...)
- Managing the memory hierarchy
- Within each: performance, code size, power, ...
14Special Interest Groups (2)
- FAQ
- Do I have to be in a group? Yes
- Can I be in more than 1 group? No
- Do I get to pick which group I am in? Sort of
- What if I get put in a group that I do not want to be in? Tough
- Do I have to go to the SIG meetings? Yes
- Can I do my project with someone in another SIG? No
15Contact Info
- Office: 2223 EECS
- Email: mahlke_at_umich.edu
- Office hours
- Mon, Wed after class or by appointment
- Visiting office hrs
- No GSI for this class
- I don't have time to fix everyone's bugs
- You will have to be independent in this class
- Read the documentation and look at the code
- Come to me when you are really stuck or confused
- Helping each other is encouraged
16Role of the Compiler
- Hardware people have to understand compilers
- No attention to compilers -> bad processor design
- Frontend material is not what real compiler people focus on
- Parsing, syntax checking, etc.: standard, mature field
- Backend is where the action is at
- How to make code run fast (approach hand coding)
- How to reduce power/energy
- How to reduce code size
- How to reduce memory stalls
- How to make use of unusual architectural features
- How to design better processors
17Superscalar Processors
- Do everything in hardware
- Sequential code comes in
- Hardware parallelizes the code on the fly
- Traditional computer architecture class
- Emphasis on Pentium class architectures
- Desktop architecture is the only thing that is important
- In this class ...
- Very Long Instruction Word (VLIW) architectures are the focus
- Why? Dumb hardware, smart compiler
- Burden shifted to the compiler to exploit machine resources
18VLIW/EPIC Architectures
- Our target processor for this class is VLIW/EPIC
- EPIC = Explicitly Parallel Instruction Computing
- Think of these as synonyms for this class
- Desktop
- IA-64: aka Itanium I and II, Merced, McKinley
- Embedded processors
- All high-performance DSPs are VLIW
- Why? Cost/power of superscalar, more scalability
- TI-C6x, Philips Trimedia, Starcore, ST-200
- Itanium (aka Itanic): is it a bad idea?
19VLIW/EPIC Philosophy
- Compiler creates complete plan of run-time execution (POE)
- At what time and using what resource
- POE communicated to hardware via the instruction set
- Processor obediently follows POE
- No dynamic scheduling, out of order execution (these second guess the compiler's plan)
- Compiler allowed to play the statistics
- Many types of info only available at run-time (branch directions, locations accessed via pointers)
- Traditionally compilers behave conservatively -> handle worst case possibility
- Allow the compiler to gamble when it believes the odds are in its favor: feedback-directed optimization
- Expose microarchitecture to the compiler
- Memory system, branch execution
20Defining Feature I - MultiOp
- Superscalar
- Operations are sequential
- Hardware figures out resource assignment, time of execution
- MultiOp instruction
- Set of independent operations that are to be issued simultaneously (no sequential notion within a MultiOp)
- 1 instruction issued every cycle: provides notion of time
- Resource assignment indicated by position in MultiOp
- POE communicated to hardware via MultiOps
[Figure: one MultiOp instruction with eight operation slots: add | sub | load | load | store | mpy | shift | branch]
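A minimal sketch of the "resource assignment indicated by position" idea, assuming a hypothetical 8-wide machine whose slot layout mirrors the figure (the `Op` type, `SLOT_UNITS`, `UNIT_FOR`, and `check_multiop` are illustrative names, not part of any real toolchain). It checks that each operation sits in a slot whose function unit can execute it, and, as a conservative reading of "independent", that no operation reads or rewrites a register written by another slot of the same MultiOp.

```python
from typing import NamedTuple, Optional, Tuple

class Op(NamedTuple):
    opcode: str
    dest: Optional[str]          # None for ops without a register result
    srcs: Tuple[str, ...]

# Hypothetical machine: the slot position selects the function unit.
SLOT_UNITS = ("alu", "alu", "mem", "mem", "mem", "mul", "shift", "branch")
UNIT_FOR = {"add": "alu", "sub": "alu", "load": "mem", "store": "mem",
            "mpy": "mul", "shift": "shift", "branch": "branch"}

def check_multiop(slots):
    """Return a list of problems; an empty list means the MultiOp is well formed."""
    if len(slots) != len(SLOT_UNITS):
        return [f"expected {len(SLOT_UNITS)} slots, got {len(slots)}"]
    problems = []
    written = {}                 # register -> slot that writes it
    for pos, op in enumerate(slots):
        if op is None:           # empty slot (no-op)
            continue
        if UNIT_FOR[op.opcode] != SLOT_UNITS[pos]:
            problems.append(f"slot {pos}: {op.opcode} cannot run on the {SLOT_UNITS[pos]} unit")
        if op.dest is not None:
            if op.dest in written:
                problems.append(f"slot {pos}: {op.dest} is also written by slot {written[op.dest]}")
            else:
                written[op.dest] = pos
    for pos, op in enumerate(slots):
        if op is None:
            continue
        for src in op.srcs:
            if src in written and written[src] != pos:
                problems.append(f"slot {pos}: reads {src}, which slot {written[src]} writes")
    return problems

GOOD = (Op("add", "r1", ("r2", "r3")), Op("sub", "r4", ("r5", "r6")),
        Op("load", "r7", ("r8",)), Op("load", "r9", ("r10",)),
        Op("store", None, ("r11", "r12")), Op("mpy", "r13", ("r14", "r15")),
        Op("shift", "r16", ("r17",)), Op("branch", None, ("p1",)))
# Same MultiOp, except the sub now depends on the add in slot 0.
BAD = (GOOD[0], Op("sub", "r4", ("r1", "r6"))) + GOOD[2:]
```

Here `check_multiop(GOOD)` reports no problems, while `check_multiop(BAD)` flags the dependence between slots 0 and 1. A compiler scheduler would perform this kind of check while packing operations into instructions.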
21Defining Feature II - Exposed Latency
- Superscalar
- Sequence of atomic operations
- Sequential order defines semantics
- Unit assumed latency (UAL)
- Each conceptually finishes before the next one starts
- EPIC: non-atomic operations
- Register reads/writes for 1 operation separated in time
- Semantics determined by relative ordering of reads/writes
- Assumed latency (NUAL, non-unit assumed latency, if > 1 for at least one op)
- Contract between the compiler and hardware
- Instruction issuance provides common notion of time
22UAL vs NUAL example
Assume load = 4 cycles, add = 1, mpy = 3
Operations (spread over instructions 1-13):
- r1 = load(r2)
- r1 = load(r3)
- r4 = mpy(r1, r5)
- r4 = add(r1, r6)
- r7 = mpy(r4, r9)
- r7 = add(r7, r8)
Same operations split into two phases (spread over time 1-13):
- Phase 1 operations: v1 = load(r2), v2 = load(r3), v3 = mpy(r1, r5), v4 = add(r1, r6), v5 = mpy(r4, r9), v6 = add(r7, r8)
- Phase 2 operations: r1 = v1, r1 = v2, r4 = v4, r4 = v3, r7 = v6, r7 = v5
[Figure labels: NUAL, traditional]
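The two-phase view can be sketched as a small simulator: an operation reads its sources in the cycle it issues and its destination changes only after the assumed latency. The issue cycles in `SCHEDULE`, the initial register values, and the memory contents below are assumptions chosen for illustration, since the slide's exact cycle placement is not available; only the six operations and the latencies come from the slide.

```python
LATENCY = {"load": 4, "add": 1, "mpy": 3}   # cycles, as assumed on the slide

def run_nual(schedule, regs, mem):
    """Execute (issue_cycle, dest, opcode, srcs) tuples under NUAL semantics.

    Sources are read in the issue cycle; the destination changes only
    LATENCY[opcode] cycles later. Returns the final registers and a log of
    register writes in the order they became visible.
    """
    regs = dict(regs)
    pending = []      # (cycle the result becomes visible, dest, value)
    writes = []
    last_cycle = max(issue + LATENCY[op] for issue, _, op, _ in schedule)
    for cycle in range(1, last_cycle + 1):
        # Phase 2: commit results whose latency has elapsed.
        for entry in [p for p in pending if p[0] == cycle]:
            pending.remove(entry)
            _, dest, value = entry
            regs[dest] = value
            writes.append((cycle, dest, value))
        # Phase 1: operations issued this cycle read their sources now.
        for issue, dest, op, srcs in schedule:
            if issue != cycle:
                continue
            vals = [regs[s] for s in srcs]
            if op == "load":
                value = mem[vals[0]]
            elif op == "add":
                value = vals[0] + vals[1]
            else:  # mpy
                value = vals[0] * vals[1]
            pending.append((cycle + LATENCY[op], dest, value))
    return regs, writes

# Hypothetical issue cycles for the slide's six operations.
SCHEDULE = [
    (1, "r1", "load", ("r2",)),
    (2, "r1", "load", ("r3",)),
    (5, "r4", "mpy", ("r1", "r5")),   # reads the first load's result
    (6, "r4", "add", ("r1", "r6")),   # reads the second load's result
    (7, "r7", "mpy", ("r4", "r9")),   # reads the add's result
    (8, "r7", "add", ("r7", "r8")),   # reads r7 before the mpy overwrites it
]
REGS = {"r1": 0, "r2": 100, "r3": 200, "r4": 0, "r5": 10,
        "r6": 1, "r7": 7, "r8": 8, "r9": 5}
MEM = {100: 2, 200: 3}

final_regs, write_log = run_nual(SCHEDULE, REGS, MEM)
```

With this schedule the writes land in the order r1, r1, r4 (from the add), r4 (from the mpy), r7 (from the add), r7 (from the mpy), which matches the Phase 2 ordering listed above: the short-latency add issued after the mpy still writes first.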
23Other Architectural Features of VLIW/EPIC
- Add features into the architecture to support VLIW/EPIC philosophy
- Create more efficient POEs
- Expose the microarchitecture
- Play the statistics
- Register structure
- Branch architecture
- Data/Control speculation
- Memory hierarchy management
- Predicated execution
24Register Structure
- Superscalar
- Small number of architectural registers
- Rename using large pool of physical registers at run-time
- EPIC
- Compiler responsible for all resource allocation including registers
- Rename at compile time: large pool of regs needed
- Static renaming
- Modify operands explicitly
- Dynamic renaming
- Operands not explicitly modified
- Is this feature lost? NO!
[Figure: operation sequence Op1, Op2, Op3, Op4 in which r13 is renamed (r13 -> r67)]
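Static renaming ("modify operands explicitly") can be sketched as a compile-time pass over straight-line code. This is a minimal sketch under stated assumptions: operations are simplified to `(dest, srcs)` pairs, the registers starting at `first_free` are assumed unused, and the function name `rename_redefinitions` and the four-operation example are hypothetical. The first definition of a register keeps its name; each later redefinition gets a fresh register, and subsequent uses are rewritten to match.

```python
def rename_redefinitions(ops, first_free):
    """Rename every redefinition of a register to a fresh register.

    ops: list of (dest, srcs) in program order.
    first_free: number of the first register assumed to be unused.
    """
    current = {}       # original name -> register currently holding its value
    defined = set()    # original names that have been written at least once
    next_free = first_free
    renamed = []
    for dest, srcs in ops:
        # Uses see the most recent definition of each source.
        new_srcs = tuple(current.get(s, s) for s in srcs)
        if dest in defined:
            new_dest = f"r{next_free}"   # fresh register removes the reuse
            next_free += 1
        else:
            new_dest = dest
        defined.add(dest)
        current[dest] = new_dest
        renamed.append((new_dest, new_srcs))
    return renamed

# Hypothetical code in which r13 is written twice.
OPS = [
    ("r13", ("r1", "r2")),    # Op1
    ("r5",  ("r13", "r3")),   # Op2
    ("r13", ("r4", "r6")),    # Op3: redefinition of r13
    ("r7",  ("r13", "r5")),   # Op4: must read Op3's value
]
RENAMED = rename_redefinitions(OPS, first_free=67)
```

After the pass, Op3 writes r67 and Op4 reads r67, so Op3 no longer has to wait for Op2 to finish reading r13. A real pass would also have to account for values that are live out of the region, which this sketch ignores.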
25Rotating Registers
- Overlap loop iterations
- How do you prevent register overwrite in later iterations?
- Compiler-controlled dynamic register renaming
- Rotating registers
- Each iteration writes to r13
- But this gets mapped to a different physical register
- Block of consecutive regs allocated for each reg in loop corresponding to number of iterations it is needed
- actual reg = (reg + RRB) % NumRegs
- At end of each iteration, RRB--
[Figure: iteration n (RRB = 7) and iteration n+1 (RRB = 6), started II cycles apart; each iteration contains Op1 and Op2, which use r13]
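The mapping rule can be sketched directly. The class below is an illustrative model, not a description of any particular processor: `NumRegs = 64` in the usage lines is an assumed size, and wrapping RRB modulo the file size on decrement is an added assumption to keep it in range.

```python
class RotatingRegisterFile:
    """Model of: actual reg = (reg + RRB) % NumRegs, with RRB-- per iteration."""

    def __init__(self, num_regs, rrb=0):
        self.num_regs = num_regs
        self.rrb = rrb                      # rotating register base
        self.physical = [None] * num_regs

    def actual(self, reg):
        """Physical register that architectural register `reg` names right now."""
        return (reg + self.rrb) % self.num_regs

    def write(self, reg, value):
        self.physical[self.actual(reg)] = value

    def read(self, reg):
        return self.physical[self.actual(reg)]

    def end_iteration(self):
        # RRB--; wrapping is an assumption so that RRB stays within the file.
        self.rrb = (self.rrb - 1) % self.num_regs

rf = RotatingRegisterFile(num_regs=64, rrb=7)   # iteration n: RRB = 7
rf.write(13, "value from iteration n")          # lands in physical register 20
rf.end_iteration()                              # iteration n+1: RRB = 6
rf.write(13, "value from iteration n+1")        # lands in physical register 19
```

Both iterations wrote "r13", yet neither overwrote the other: iteration n+1 can still reach the older value under the name r14, because (14 + 6) % 64 is the same physical register as (13 + 7) % 64.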
26Branch Architecture
- Branch actions
- Branch condition computed
- Target address formed
- Instructions fetched from taken, fall-through or both
- Branch itself executes
- After the branch, target of the branch is decoded/executed
- Superscalar processors use hardware to hide the latency of all the actions
- Icache prefetching
- Branch prediction: guess outcome of branch
- Dynamic scheduling: overlap other instructions with branch
- Reorder buffer: squash when wrong
27EPIC Branches
- Make each action visible with an architectural latency
- No stalls
- No prediction necessary (though sometimes still used)
- Branch separated into 3 distinct operations
- 1. Prepare to branch: compute target address, prefetch instructions from likely target
- Executed well in advance of branch
- 2. Compute branch condition: comparison operation
- 3. Branch itself
- Branches with latency > 1 have delay slots
- Must be filled with operations that execute regardless of the direction of the branch
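The delay-slot rule can be illustrated with a toy fetch loop. This is a sketch under simplifying assumptions: one operation per instruction, a single branch, and a redirect that takes effect exactly `branch_latency` cycles after the branch issues. The operation names (`pbr`, `cmp`, `br`, and so on) are placeholders, not a real instruction set.

```python
def run(program, branch_latency, taken):
    """Return the names of the operations executed, in order.

    program: list of (name, target_index or None); "halt" stops execution.
    A taken branch issued in cycle t redirects fetch in cycle t + branch_latency,
    so the branch_latency - 1 operations after it always execute.
    """
    executed, pc, cycle, redirect = [], 0, 0, None
    while True:
        if redirect is not None and redirect[0] == cycle:
            pc, redirect = redirect[1], None
        name, target = program[pc]
        executed.append(name)
        if name == "halt":
            return executed
        if target is not None and taken:
            redirect = (cycle + branch_latency, target)
        pc += 1
        cycle += 1

PROGRAM = [
    ("pbr", None),    # 1. prepare to branch (target address, prefetch)
    ("cmp", None),    # 2. compute branch condition
    ("br", 7),        # 3. branch itself, target is index 7
    ("slot1", None),  # delay slot
    ("slot2", None),  # delay slot
    ("fall", None),   # fall-through path
    ("halt", None),
    ("targ", None),   # taken path
    ("halt", None),
]
```

With `branch_latency=3`, both `run(PROGRAM, 3, taken=True)` and `run(PROGRAM, 3, taken=False)` execute `slot1` and `slot2` before diverging, which is why the compiler may only place direction-independent operations there. With `branch_latency=1` there are no delay slots and the taken path goes straight from `br` to `targ`.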
28Control/Data Speculation
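- Control speculation: hoist conditionally executed instructions above the condition
- [Figure: before, if (a > b) guards a sequence of operations on x, u, w, y, z and the constant 4; after, those operations are placed ahead of if (a > b)]
- Data speculation: hoist loads/uses over potentially aliased stores
- [Figure: before, a store involving a and b precedes operations on y, x, z and the constant 4; after, those operations come first and the store last]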
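A minimal sketch of data speculation, assuming the slide's second example reads as a store through `a` followed by a load through `x` and a dependent add (the operators did not survive in the slide text, so this reading is an assumption). Memory is modeled as a Python dict, and the explicit address comparison stands in for whatever checking mechanism the hardware provides.

```python
def original(mem, a, b, x):
    """Store first, then the load and its use (program order)."""
    mem[a] = b        # store: may write the location the load reads
    y = mem[x]        # load
    z = y + 4         # use of the loaded value
    return z

def speculated(mem, a, b, x):
    """Load and use hoisted above the store, with a check and recovery code."""
    y = mem[x]        # speculative load, moved above the store
    z = y + 4         # dependent use, also hoisted
    mem[a] = b        # the store stays in its original position
    if a == x:        # check: did the store alias the speculative load?
        y = mem[x]    # recovery: redo the load ...
        z = y + 4     # ... and everything that depended on it
    return z
```

When the addresses differ, the common case the compiler is betting on, the hoisted load and add are already complete by the time the store executes. When they do alias, the recovery path recomputes the result, so both versions always agree.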
29Predicated Execution
Source code:
  a = b + c
  if (a > 0) e = f + g
  else e = f / g
  h = i - j
Traditional branching code:
  add a, b, c (BB1)
  bgt a, 0, L1 (BB1)
  div e, f, g (BB3)
  jump L2 (BB3)
  L1: add e, f, g (BB2)
  L2: sub h, i, j (BB4)
[Figure: control flow graph in which BB1 branches to BB2 or BB3, and both rejoin at BB4]
Predicated code:
  add a, b, c if T (BB1)
  p2 = a > 0 if T (BB1)
  p3 = a <= 0 if T (BB1)
  div e, f, g if p3 (BB3)
  add e, f, g if p2 (BB2)
  sub h, i, j if T (BB4)
[Figure: single block containing BB1, BB2, BB3, BB4; p2 -> BB2, p3 -> BB3]
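The equivalence of the two versions can be checked with a small sketch. The `predicated` function below is an illustrative interpreter, not a real ISA model: each operation carries a guard, and an operation whose guard is false is simply skipped (nullified). The register names follow the slide; the function names and the dict-based register file are assumptions.

```python
def branching(b, c, f, g, i, j):
    """The source code with an ordinary if/else."""
    a = b + c
    if a > 0:
        e = f + g
    else:
        e = f / g
    h = i - j
    return a, e, h

def predicated(b, c, f, g, i, j):
    """Straight-line version: every operation carries a guard predicate."""
    regs = {"b": b, "c": c, "f": f, "g": g, "i": i, "j": j, "T": True}
    program = [
        ("T",  "a",  lambda r: r["b"] + r["c"]),   # add a, b, c   if T
        ("T",  "p2", lambda r: r["a"] > 0),        # p2 = a > 0    if T
        ("T",  "p3", lambda r: r["a"] <= 0),       # p3 = a <= 0   if T
        ("p3", "e",  lambda r: r["f"] / r["g"]),   # div e, f, g   if p3
        ("p2", "e",  lambda r: r["f"] + r["g"]),   # add e, f, g   if p2
        ("T",  "h",  lambda r: r["i"] - r["j"]),   # sub h, i, j   if T
    ]
    for guard, dest, compute in program:
        if regs[guard]:            # a false guard nullifies the operation
            regs[dest] = compute(regs)
    return regs["a"], regs["e"], regs["h"]
```

Note that `predicated(1, 2, 3, 0, 5, 1)` does not divide by zero: `p3` is false there, so the `div` is nullified rather than executed. Since `p2` and `p3` are complementary, exactly one of the two writes to `e` takes effect on every input.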
30VLIW/EPIC Advantages and Disadvantages
- Advantages
- No run-time dependence checks
- No run-time scheduling decisions
- No register renaming
- Rely on the compiler to do all the work
- SIMPLER hardware, more effective (larger scope!)
- Disadvantages
- No tolerance for different or variable latencies
- No tolerance for phased program behavior
- No object code compatibility
- More complex compiler
31What if I Don't Care About VLIWs?
- How do we compile for superscalars?
- How do we compile for RISCs?
- All the basic compiler analyses and transformations are the same for all processor types
- They were developed for RISCs
- Superscalar compilers work by pretending the processor is a VLIW
- But must worry about hardware undoing what the compiler did
- Other resources to worry about (reorder buffer, reservation stations, etc.)
- Not all hardware features available