EECS 583 Advanced Compilers: Course Intro, Overview of VLIW Architectures




1
EECS 583 Advanced Compilers: Course Intro,
Overview of VLIW Architectures
  • Winter 2005, University of Michigan
  • January 5, 2005

2
About Me
  • Mahlke (pronounced "mall key")
  • But just call me Scott
  • 4th year here at Michigan
  • Compiler guy who likes hardware
  • Program optimization and building custom hardware
    for high performance
  • Before this: HP Labs
  • Compiler research for Itanium-like processors
  • PICO: automatic design of NPAs
  • Before before: grad student at Univ of Illinois
  • Before^3: undergrad at Illinois

3
Class Overview
  • This class is NOT about
  • Programming languages
  • Parsing, syntax checking
  • Handling advanced language features: virtual
    functions, ...
  • Frontend transformations: array dependence
    analysis, ...
  • Debugging
  • Simulation
  • Compiler backend
  • Mapping applications to processor hardware
  • Retargetability: work for multiple platforms
    (not hard coded)
  • Work at the assembly-code level
  • Processor independent -> machine code
  • Speed/Efficiency
  • How to make the application run fast
  • Use less memory (text, data)

4
Background You Should Have
  • 1. Programming
  • Good C programmer (essential)
  • Linux, gcc, gdb, emacs
  • Compiler system not ported to Windows or Mac
  • 2. Computer architecture
  • EECS 370 is good, 470 is better but not essential
  • Basics: caches, pipelining, function units,
    registers, virtual memory, branches, branch
    prediction, assembly code
  • 3. Compilers
  • Frontend stuff is not very relevant for this
    class
  • Basic backend stuff we will go over fast
  • Non-EECS 483 people will have to do some
    supplemental reading
  • 4. Powerpoint
  • You will have to make a presentation in this class

5
Textbook and Other Classroom Material
  • No required text: lecture notes, papers
  • Other useful material
  • Trimaran webpage: http://www.trimaran.org
  • UIUC Impact webpage: http://www.crhc.uiuc.edu/Impact
  • Course webpage + course newsgroup
  • Will be set up by next Monday
  • Lecture notes
  • Newsgroup: forum for helping each other. I will
    try to check regularly, but I won't be able to
    answer everything

6
What the Class Will be Like
  • Class meeting time: 1:00 - 3:00, MW
  • 2 hrs is hard to handle
  • We'll go for an hour, take a 10 min break
  • Core backend stuff
  • Textbook material: some overlap with 483
  • Few homeworks to apply classroom material
  • Research papers
  • I'll present research material along the way
  • Presentations by students: you guys are going to
    teach

7
What the Class Will be Like (2)
  • Learning compilers
  • No memorizing definitions, terms, formulas,
    algorithms, etc
  • Learn by doing: writing code
  • Substantial amount of programming
  • Big learning curve for Trimaran compiler
  • Reasonable amount of reading
  • Classroom
  • Attendance: you should be here
  • Discussion: important
  • Work out examples, discuss papers, etc
  • Each of you will teach some advanced material to
    the rest of us
  • Essential to stay caught up
  • Special interest groups: smaller meetings
    outside of class that focus on certain compiler
    topics

8
Course Grading
  • Yes, everyone will get a grade
  • Distribution of grades, scale, etc - ???
  • Most (hopefully all) will get A's and B's
  • Slackers will be obvious and will suffer
  • Components
  • Midterm exam: 30%
  • Project: 40%
  • Homeworks: 15%
  • Paper presentation: 10%
  • Class participation: 5%

9
Homeworks
  • Around 3 of these
  • Small/modest programming assignments
  • Design and implement something we discussed in
    class
  • Goals
  • Learn the important concepts
  • Learn the compiler infrastructure so you can do
    the project
  • Grading
  • 4/3/2/1 (almost perfect, good, so-so, didn't try
    very hard)
  • Working together is ok
  • Make sure you understand things or it will come
    back to bite you
  • For now, everyone must turn in their own
    assignment; this may change later in the semester

10
Projects
  • Design and implement an interesting compiler
    technique and demonstrate its usefulness
  • Topic/scope/work
  • 1-3 people per project
  • You will pick the topics (I have to agree)
  • Projects will be planned/organized at the SIG
    level
  • You will have to
  • Read background material
  • Plan and design
  • Implement and debug
  • Deliverables
  • Working implementation
  • Project report: 5-10 page paper describing what
    you did and your results
  • 30 min presentation at end (demo if you want)

11
Types of Projects
  • New idea
  • Small research idea
  • Design and implement it, see how it works
  • Extend existing idea
  • Take an existing paper, implement their technique
  • Then, extend it to do something interesting
  • Generalize strategy, make more efficient/effective
  • Implementation
  • Take existing idea, create quality (could be
    released) implementation in Trimaran
  • Evaluate it on a set of VLIW architectures

12
Class Participation
  • Interaction and discussion is essential in a
    graduate class
  • Be here
  • Don't just stare at the wall
  • Be prepared to discuss the material
  • Have something useful to contribute
  • Opportunities for participation
  • Research paper discussions: thoughts, comments,
    etc
  • Saying what you think in the special interest
    group meetings
  • Solving class problems

13
Special Interest Groups
  • Divide up the class into 4 focus groups
  • Each group will meet at times TBD
  • Identify research papers, discuss papers and
    project ideas
  • Start SIGs about 1/3 of the way through class
  • 4 groups: equal number of people in each group
  • Control flow handling, optimization
  • Analysis and optimization
  • Code generation (scheduling, register allocation,
    ... )
  • Managing the memory hierarchy
  • Within each: performance, code size, power, ...

14
Special Interest Groups (2)
  • FAQ
  • Do I have to be in a group? Yes
  • Can I be in more than 1 group? No
  • Do I get to pick which group I am in? Sort of
  • What if I get put in a group that I do not want
    to be in? Tough
  • Do I have to go to the SIG meetings? Yes
  • Can I do my project with someone in another SIG?
    No

15
Contact Info
  • Office: 2223 EECS
  • Email: mahlke_at_umich.edu
  • Office hours
  • Mon, Wed after class or by appointment
  • Visiting office hrs
  • No GSI for this class
  • I don't have time to fix everyone's bugs
  • You will have to be independent in this class
  • Read the documentation and look at the code
  • Come to me when you are really stuck or confused
  • Helping each other is encouraged

16
Role of the Compiler
  • Hardware people have to understand compilers
  • No attention to compilers -> bad processor design
  • Frontend material is not what real compiler
    people focus on
  • Parsing, syntax checking, etc: standard, mature
    field
  • Backend is where the action is at
  • How to make code run fast (approach hand coding)
  • How to reduce power/energy
  • How to reduce code size
  • How to reduce memory stalls
  • How to make use of unusual architectural features
  • How to design better processors

17
Superscalar Processors
  • Do everything in hardware
  • Sequential code comes in
  • Hardware parallelizes the code on the fly
  • Traditional computer architecture class
  • Emphasis on Pentium class architectures
  • Desktop architecture is the only thing that is
    important
  • In this class ...
  • Very Long Instruction Word (VLIW) architectures
    are the focus
  • Why? Dumb hardware, smart compiler
  • Burden shifted to the compiler to exploit machine
    resources

18
VLIW/EPIC Architectures
  • Our target processor for this class is VLIW/EPIC
  • EPIC = Explicitly Parallel Instruction Computing
  • Think of these as synonyms for this class
  • Desktop
  • IA-64: aka Itanium I and II, Merced, McKinley
  • Embedded processors
  • All high-performance DSPs are VLIW
  • Why? Cost/power of superscalar, more scalability
  • TI-C6x, Philips Trimedia, Starcore, ST-200
  • Itanium (aka Itanic): is it a bad idea?

19
VLIW/EPIC Philosophy
  • Compiler creates complete plan of run-time
    execution (POE)
  • At what time and using what resource
  • POE communicated to hardware via the instruction
    set
  • Processor obediently follows POE
  • No dynamic scheduling, out of order execution
    (these second guess the compiler's plan)
  • Compiler allowed to play the statistics
  • Many types of info only available at run-time
    (branch directions, locations accessed via
    pointers)
  • Traditionally compilers behave conservatively ->
    handle worst case possibility
  • Allow the compiler to gamble when it believes the
    odds are in its favor: feedback-directed
    optimization
  • Expose microarchitecture to the compiler
  • memory system, branch execution

20
Defining Feature I - MultiOp
  • Superscalar
  • Operations are sequential
  • Hardware figures out resource assignment, time of
    execution
  • MultiOp instruction
  • Set of independent operations that are to be
    issued simultaneously (no sequential notion
    within a MultiOp)
  • 1 instruction issued every cycle provides
    notion of time
  • Resource assignment indicated by position in
    MultiOp
  • POE communicated to hardware via MultiOps

[Figure: one MultiOp with eight operation slots: add | sub | load | load | store | mpy | shift | branch]
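
The MultiOp idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the unit names in UNITS and the helper issue_plan are hypothetical, chosen to mirror the eight slots in the figure. It shows that the slot position picks the resource and the position in the instruction stream picks the cycle.

```python
# Hypothetical function units, one per MultiOp slot (names are assumptions).
UNITS = ("int0", "int1", "mem0", "mem1", "mem2", "mul0", "shift0", "br0")


def issue_plan(multiops):
    """Expand a list of MultiOps into (cycle, unit, operation) triples.

    One MultiOp issues per cycle, so the list index is the issue cycle.
    The slot index inside a MultiOp selects the unit. None marks an empty slot.
    """
    plan = []
    for cycle, multiop in enumerate(multiops):
        if len(multiop) != len(UNITS):
            raise ValueError("each MultiOp needs one entry per unit slot")
        for unit, op in zip(UNITS, multiop):
            if op is not None:
                plan.append((cycle, unit, op))
    return plan


program = [
    ("add", "sub", "load", "load", "store", "mpy", "shift", "branch"),
    ("add", None, "load", None, None, None, None, None),
]
plan = issue_plan(program)
```

Nothing in issue_plan reorders or checks dependences; the plan is taken as given, which is the point of the MultiOp contract.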
21
Defining Feature II - Exposed Latency
  • Superscalar
  • Sequence of atomic operations
  • Sequential order defines semantics
  • Unit assumed latency (UAL)
  • Each conceptually finishes before the next one
    starts
  • EPIC: non-atomic operations
  • Register reads/writes for 1 operation separated
    in time
  • Semantics determined by relative ordering of
    reads/writes
  • Assumed latency (NUAL if > 1 for at least one op)
  • Contract between the compiler and hardware
  • Instruction issuance provides common notion of
    time

22
UAL vs NUAL example
Assume load = 4 cycles, add = 1, mpy = 3

[Figure: timeline over cycles 1-13 with three operation rows]
Traditional (UAL), operations in instruction order:
  r1 = load(r2)
  r1 = load(r3)
  r4 = mpy(r1, r5)
  r4 = add(r1, r6)
  r7 = mpy(r4, r9)
  r7 = add(r7, r8)
NUAL, each operation split into two phases:
  Phase 1: v1 = load(r2), v2 = load(r3), v3 = mpy(r1, r5),
           v4 = add(r1, r6), v5 = mpy(r4, r9), v6 = add(r7, r8)
  Phase 2: r1 = v1, r1 = v2, r4 = v4, r4 = v3, r7 = v6, r7 = v5
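
To make the UAL/NUAL difference concrete, here is a small Python sketch. It assumes a simplified model that is not taken from the slides: sources are read in the issue cycle, the destination is written exactly latency cycles later, and cycles are numbered from 0. The helper names run_ual and run_nual are hypothetical. Only the latencies (add = 1, mpy = 3) come from the example.

```python
import operator

# Latencies from the example: load = 4, add = 1, mpy = 3 (load is unused here).
LATENCY = {"load": 4, "add": 1, "mpy": 3}
COMPUTE = {"add": operator.add, "mpy": operator.mul}


def run_ual(ops, regs):
    """Sequential semantics: each op finishes before the next one starts."""
    regs = dict(regs)
    for dest, op, a, b in ops:
        regs[dest] = COMPUTE[op](regs[a], regs[b])
    return regs


def run_nual(schedule, regs):
    """Exposed-latency semantics for (issue_cycle, dest, op, a, b) entries."""
    regs = dict(regs)
    pending = []  # (due_cycle, dest, value) for results still in flight
    last_cycle = max(t + LATENCY[op] for t, _, op, _, _ in schedule)
    for cycle in range(last_cycle + 1):
        # Phase 2: results whose latency has elapsed reach the register file.
        for due, dest, value in pending:
            if due == cycle:
                regs[dest] = value
        pending = [p for p in pending if p[0] != cycle]
        # Phase 1: ops issued this cycle read their sources now.
        for t, dest, op, a, b in schedule:
            if t == cycle:
                value = COMPUTE[op](regs[a], regs[b])
                pending.append((cycle + LATENCY[op], dest, value))
    return regs


regs = {"r1": 2, "r4": 0, "r5": 3, "r6": 10}
ual = run_ual([("r4", "mpy", "r1", "r5"), ("r4", "add", "r1", "r6")], regs)
nual = run_nual([(0, "r4", "mpy", "r1", "r5"), (1, "r4", "add", "r1", "r6")], regs)
```

With the same two operations, the sequential run leaves the add result in r4, while the exposed-latency run leaves the mpy result there, because the 3-cycle mpy writes after the 1-cycle add. That is the same ordering as "r4 = v4, r4 = v3" in Phase 2 of the example.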
23
Other Architectural Features of VLIW/EPIC
  • Add features into the architecture to support
    VLIW/EPIC philosophy
  • Create more efficient POEs
  • Expose the microarchitecture
  • Play the statistics
  • Register structure
  • Branch architecture
  • Data/Control speculation
  • Memory hierarchy management
  • Predicated execution

24
Register Structure
  • Superscalar
  • Small number of architectural registers
  • Rename using large pool of physical registers at
    run-time
  • EPIC
  • Compiler responsible for all resource allocation
    including registers
  • Rename at compile time: large pool of regs
    needed
  • Static renaming
  • Modify operands explicitly
  • Dynamic renaming
  • Operands not explicitly modified
  • Is this feature lost? NO!

[Figure: static renaming. Op1 -> r13 -> Op2, then Op3 -> r13 -> Op4, with the second r13 renamed to r67]
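
Static renaming ("modify operands explicitly") can be sketched as a compile-time pass over straight-line code. This is a minimal illustration, not the course's or Trimaran's algorithm: the function rename, the (dest, sources) tuple format, and the assumption that registers numbered first_free and above are unused are all hypothetical.

```python
def rename(ops, first_free):
    """Give every redefinition of a register a fresh name and patch later uses.

    ops is a list of (dest, sources) tuples for straight-line code.
    Registers numbered first_free and above are assumed to be unused.
    """
    current = {}      # original name -> name holding its latest value
    defined = set()   # original names that have been written at least once
    next_free = first_free
    renamed = []
    for dest, sources in ops:
        # Uses refer to whatever name currently holds the value.
        new_sources = tuple(current.get(s, s) for s in sources)
        if dest in defined:
            current[dest] = f"r{next_free}"  # redefinition: take a fresh register
            next_free += 1
        else:
            defined.add(dest)
            current[dest] = dest
        renamed.append((current[dest], new_sources))
    return renamed


ops = [
    ("r13", ("r1",)),   # Op1 writes r13
    ("r2", ("r13",)),   # Op2 reads r13
    ("r13", ("r3",)),   # Op3 writes r13 again
    ("r4", ("r13",)),   # Op4 reads the second r13
]
renamed = rename(ops, first_free=67)
```

After the pass, Op3 writes r67 and Op4 reads r67, so Op3 no longer has to wait for Op2 to finish reading the old r13.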
25
Rotating Registers
  • Overlap loop iterations
  • How do you prevent register overwrite in later
    iterations?
  • Compiler-controlled dynamic register renaming
  • Rotating registers
  • Each iteration writes to r13
  • But this gets mapped to a different physical
    register
  • Block of consecutive regs allocated for each reg
    in loop, corresponding to the number of iterations
    it is needed

[Figure: two overlapped iterations started II cycles apart. Iteration n (RRB = 7): Op1 -> r13 -> Op2. Iteration n + 1 (RRB = 6): Op1 -> r13 -> Op2]
actual reg = (reg + RRB) % NumRegs
At end of each iteration, RRB--
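
The mapping rule can be checked with a short Python sketch. The rule itself, actual reg = (reg + RRB) % NumRegs with RRB decremented at the end of each iteration, is from the slide. The value NUM_REGS = 64, the wrap-around of RRB, and the helper names are assumptions made for illustration.

```python
NUM_REGS = 64  # hypothetical size of the rotating register file


def actual_reg(reg, rrb, num_regs=NUM_REGS):
    """Map an architectural rotating register to a physical register."""
    return (reg + rrb) % num_regs


def writes_per_iteration(reg, start_rrb, iterations, num_regs=NUM_REGS):
    """Physical register written through the same name in successive iterations."""
    rrb = start_rrb
    physical = []
    for _ in range(iterations):
        physical.append(actual_reg(reg, rrb, num_regs))
        rrb = (rrb - 1) % num_regs  # RRB-- at the end of each iteration
    return physical


physical = writes_per_iteration(13, start_rrb=7, iterations=3)
# The value written as r13 in iteration n is visible as r14 in iteration n + 1.
same_value = actual_reg(14, 6)
```

Each iteration writes r13, yet the writes land in physical registers 20, 19, and 18, so a later iteration does not overwrite a value that an earlier one still needs.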
26
Branch Architecture
  • Branch actions
  • Branch condition computed
  • Target address formed
  • Instructions fetched from taken, fall-through or
    both
  • Branch itself executes
  • After the branch, target of the branch is
    decoded/executed
  • Superscalar processors use hardware to hide the
    latency of all the actions
  • Icache prefetching
  • Branch prediction: guess outcome of branch
  • Dynamic scheduling: overlap other instructions
    with branch
  • Reorder buffer: squash when wrong

27
EPIC Branches
  • Make each action visible with an architectural
    latency
  • No stalls
  • No prediction necessary (though sometimes still
    used)
  • Branch separated into 3 distinct operations
  • 1. Prepare to branch: compute target address,
    prefetch instructions from likely target
  • Executed well in advance of branch
  • 2. Compute branch condition: comparison
    operation
  • 3. Branch itself
  • Branches with latency > 1 have delay slots
  • Must be filled with operations that execute
    regardless of the direction of the branch

28
Control/Data Speculation
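Control speculation: hoist conditionally executed instructions above
the condition
  Before:  if (a > b) { x = u + w; y = *x; z = y + 4; } ...
  After:   x = u + w; y = *x; z = y + 4; if (a > b) { ... }
Data speculation: hoist loads/uses over potentially aliased stores
  Before:  *a = b; ... y = *x; z = y + 4;
  After:   y = *x; z = y + 4; ... *a = b;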
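
The data speculation case can be sketched in Python with memory modeled as a list and pointers as indices. The explicit alias check with recovery in speculated is an assumption added for illustration, since the slide only shows the hoisting. The function names are hypothetical.

```python
def original(mem, a, b, x):
    """Program order: the store comes first, then the load and its use."""
    mem = list(mem)
    mem[a] = b       # *a = b
    y = mem[x]       # y = *x
    z = y + 4        # z = y + 4
    return z, mem


def speculated(mem, a, b, x):
    """Load and use hoisted above the store, with a check that repairs aliasing."""
    mem = list(mem)
    y = mem[x]       # speculative load, issued before the store
    z = y + 4        # speculative use
    mem[a] = b       # the store the load was hoisted over
    if a == x:       # check: the store hit the loaded address, so redo the work
        y = mem[x]
        z = y + 4
    return z, mem


memory = [10, 20, 30, 40]
no_alias = speculated(memory, a=1, b=99, x=2)
alias = speculated(memory, a=2, b=99, x=2)
```

When the addresses differ, the hoisted load already has the right value and the check costs almost nothing. When they alias, the recovery path recomputes y and z, so both versions agree in either case.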
29
Predicated Execution
Source code
  a = b + c
  if (a > 0)
    e = f + g
  else
    e = f / g
  h = i - j

Traditional branching code (CFG: BB1 -> BB2 or BB3 -> BB4)
  BB1      add a, b, c
  BB1      bgt a, 0, L1
  BB3      div e, f, g
  BB3      jump L2
  BB2  L1: add e, f, g
  BB4  L2: sub h, i, j

Predicated code (single block BB1 BB2 BB3 BB4; p2 guards BB2, p3 guards BB3)
  BB1  add a, b, c    if T
  BB1  p2 = a > 0     if T
  BB1  p3 = a <= 0    if T
  BB3  div e, f, g    if p3
  BB2  add e, f, g    if p2
  BB4  sub h, i, j    if T
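
The predicated sequence can be run through a tiny interpreter to check that it matches the branching version. This is a sketch under assumptions: the tuple encoding, the cmp_gt and cmp_le opcode names, and the "zero" constant register are hypothetical. An operation whose predicate is false is skipped, which models nullification.

```python
import operator

OPS = {
    "add": operator.add,
    "sub": operator.sub,
    "div": operator.truediv,
    "cmp_gt": operator.gt,
    "cmp_le": operator.le,
}

# (opcode, dest, src1, src2, guarding predicate); "T" is the always-true predicate.
PREDICATED = [
    ("add", "a", "b", "c", "T"),
    ("cmp_gt", "p2", "a", "zero", "T"),
    ("cmp_le", "p3", "a", "zero", "T"),
    ("div", "e", "f", "g", "p3"),
    ("add", "e", "f", "g", "p2"),
    ("sub", "h", "i", "j", "T"),
]


def run_predicated(inputs):
    """Execute every operation in order; a false predicate nullifies the op."""
    regs = dict(inputs, T=True, zero=0)
    for opcode, dest, src1, src2, pred in PREDICATED:
        if regs[pred]:
            regs[dest] = OPS[opcode](regs[src1], regs[src2])
    return regs["a"], regs["e"], regs["h"]


def run_branching(inputs):
    """Reference version with ordinary control flow."""
    a = inputs["b"] + inputs["c"]
    if a > 0:
        e = inputs["f"] + inputs["g"]
    else:
        e = inputs["f"] / inputs["g"]
    h = inputs["i"] - inputs["j"]
    return a, e, h


taken = {"b": 1, "c": 2, "f": 8, "g": 2, "i": 5, "j": 3}
not_taken = {"b": -5, "c": 2, "f": 8, "g": 2, "i": 5, "j": 3}
```

Because the div is guarded by p3, a zero divisor is harmless whenever a > 0: the operation is nullified instead of raising an error, just as the branching code never reaches it.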
30
VLIW/EPIC Advantages and Disadvantages
  • Advantages
  • No run-time dependence checks
  • No run-time scheduling decisions
  • No register renaming
  • Rely on the compiler to do all the work
  • SIMPLER hardware, more effective (larger scope!)
  • Disadvantages
  • No tolerance for different or variable latencies
  • No tolerance for phased program behavior
  • No object code compatibility
  • More complex compiler

31
What if I Don't Care About VLIWs?
  • How do we compile for superscalars?
  • How do we compile for RISCs?
  • All the basic compiler analyses and
    transformations are the same for all processor
    types
  • They were developed for RISCs
  • Superscalar compilers work by pretending the
    processor is a VLIW
  • But must worry about hardware undoing what the
    compiler did
  • Other resources to worry about (e.g., reorder
    buffer, reservation stations, etc.)
  • Not all hardware features available