Title: The Last Class ... Exam 2, Thoughts on Compiler Research
1 The Last Class ... Exam 2, Thoughts on Compiler Research
- EECS 483 Lecture 24
- University of Michigan
- Wednesday, December 10, 2003
2 Grade Distribution
Max: 143, Min: 47.5, Mean: 102, Median: 104, Std. Dev.: 19.7
3 Observations
- This test was tougher than I thought
- Supposed to be simple questions, but ...
- 5: available expressions
- 6: loop invariant code elimination
- 9: induction variable strength reduction
- Tough questions
- 2: lots of interesting answers here
- 11a: constructing the live ranges
- 12: no one got all the points!
- 13: many did quite well on this
4 Regrades
- Must make regrade requests by next Wednesday, 12/17
- None will be accepted after this
- Problems with adding up the score
- Take it to anyone
- If you feel a problem was unfairly graded
- Take it to the person who graded it
- 1-5: Peter
- 6-9, 13: Scott
- 10-12: Yuan
5 483 Exam 2 Awards
- 3 of the 4 top grades by undergraduates
- Honorable mention
- Stan Dimitrov
- Andrew Feret
- Top Graduate Student
- Hyunchul Park
- Top Undergraduate Student (and overall top score)
- Michael Schwartz
6 Project 4 Demos
- During finals week (12/15 to 12/19)
- Turn in code then too
- In my office, 2223 EECS
- Signup sheet available Thursday afternoon
- Format: 30 min slot
- Explain what you did on the whiteboard
- What optis, how you implemented them, why you chose them, who did what
- Demo
- Run benchmarks, correct execution, execution cycles, show us some examples of the optimizations in action, answer questions
- Predetermined benchmark: find examples of your optis
7 Project 4 Benchmarks
- More will be posted this weekend
- Small SPEC benchmarks
- Other media benchmarks
- Synthetic benchmarks
- Small, made up ones
- But, with lots of optimization opportunities
- Show us examples of your optis in action
8 EECS 583 Preview
- Focus on VLIW/EPIC processor models
- VLIW: Very Long Instruction Word
- EPIC: Explicitly Parallel Instruction Computing
- IA-64, aka Itanium I and II, or IPF
- Embedded processors: 90% of the processors
- All high-performance embedded CPUs are VLIW
- TI-C6x, Philips Trimedia, ST LX/200
9 VLIW/EPIC Philosophy
- Compiler creates complete plan of run-time execution (POE)
- At what time and using what resource
- POE communicated to hardware via the instruction set
- Processor obediently follows POE
- No dynamic scheduling (second-guessing the compiler's plan)
- Compiler allowed to play the statistics
- Many types of info only available at run-time (branch directions, locations accessed via pointers)
- Traditionally compilers behave conservatively
- Allow the compiler to gamble when it believes the odds are in its favor (i.e., profiling)
- Expose microarchitecture to the compiler
- Memory system, branch execution
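One way to picture "playing the statistics" is as an expected-cost comparison driven by profile counts. The sketch below is only an illustration under assumed numbers: the function names, the cycle costs, and the profile counts are all hypothetical and do not come from any particular compiler.

```python
def expected_cycles(p_win, cycles_if_win, cycles_if_lose):
    """Expected cost of a gamble that pays off with probability p_win."""
    return p_win * cycles_if_win + (1.0 - p_win) * cycles_if_lose


def should_gamble(taken, not_taken, conservative, cycles_if_win, cycles_if_lose):
    """Gamble only if the profile says the odds favor it.

    taken / not_taken: hypothetical profile counts for the favorable and
    unfavorable outcome; conservative: cycles for the safe schedule.
    """
    p_win = taken / (taken + not_taken)
    return expected_cycles(p_win, cycles_if_win, cycles_if_lose) < conservative


# Hypothetical costs: safe code takes 10 cycles, a winning gamble 6,
# a losing gamble 14 (extra cycles for recovery).
print(should_gamble(90, 10, conservative=10, cycles_if_win=6, cycles_if_lose=14))
print(should_gamble(30, 70, conservative=10, cycles_if_win=6, cycles_if_lose=14))
```

With a 90/10 profile the gamble is worth taking; with a 30/70 profile the conservative schedule wins. A real compiler would weigh many more factors, so treat this as a toy model of the idea only.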
10 Defining Feature I - MultiOp
- Superscalar
- Operations are sequential
- Hardware figures out resource assignment, time of execution
- MultiOp instruction
- Set of independent operations that are to be issued simultaneously (no sequential notion within a MultiOp)
- 1 instruction issued every cycle provides notion of time
- Resource assignment indicated by position in MultiOp
- POE communicated to hardware via MultiOps
[Figure: one MultiOp with eight operation slots: add, sub, load, load, store, mpy, shift, branch]
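A MultiOp can be pictured as a fixed-width record in which the slot position names the resource and the instruction index names the cycle. The following is a minimal sketch of that idea, assuming a hypothetical eight-slot machine. The slot names and the opcodes each slot accepts are invented for illustration and do not describe any real processor.

```python
# Hypothetical machine: eight slots in a fixed order, one per functional unit.
SLOTS = ("alu0", "alu1", "mem0", "mem1", "mem2", "mul", "shift", "branch")

# Hypothetical mapping of which opcodes each slot can execute.
ALLOWED = {
    "alu0": {"add", "sub"},
    "alu1": {"add", "sub"},
    "mem0": {"load", "store"},
    "mem1": {"load", "store"},
    "mem2": {"load", "store"},
    "mul": {"mpy"},
    "shift": {"shift"},
    "branch": {"branch"},
}


def plan_of_execution(program):
    """Turn a list of MultiOps into (cycle, slot, opcode) triples.

    Each MultiOp is a tuple with one entry per slot; None marks an empty
    slot. The cycle comes from the MultiOp's index (one issued per cycle)
    and the resource comes from the position inside the MultiOp.
    """
    plan = []
    for cycle, multiop in enumerate(program):
        if len(multiop) != len(SLOTS):
            raise ValueError("MultiOp must have one entry per slot")
        for slot, opcode in zip(SLOTS, multiop):
            if opcode is None:
                continue  # empty slot, nothing issued on this unit
            if opcode not in ALLOWED[slot]:
                raise ValueError(f"{opcode} cannot issue on {slot}")
            plan.append((cycle, slot, opcode))
    return plan


program = [
    ("add", "sub", "load", "load", "store", "mpy", "shift", "branch"),
    (None, "add", None, None, None, "mpy", None, None),
]
for entry in plan_of_execution(program):
    print(entry)
```

Nothing in this sketch decides when or where an operation runs; the program text already says so, which is the contrast with a superscalar machine.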
11 Defining Feature II - Exposed Latency
- Superscalar
- Sequence of atomic operations
- Sequential order defines semantics
- Unit assumed latency (UAL)
- Each conceptually finishes before the next one starts
- EPIC: non-atomic operations
- Register reads/writes for 1 operation separated in time
- Semantics determined by relative ordering of reads/writes
- Assumed latency (NUAL, non-unit assumed latency, if > 1 for at least one op)
- Contract between the compiler and hardware
- Instruction issuance provides common notion of time
12 UAL vs NUAL example
Assume load = 4 cycles, add = 1, mpy = 3
[Table: instructions 1 to 11 across; rows Operation (traditional) and Phase1/Phase2 Operation (NUAL)]
Traditional, Operation:
- r1 = load(r2)
- r1 = load(r3)
- r4 = mpy(r1, r5)
- r4 = add(r1, r6)
- r7 = mpy(r4, r9)
- r7 = add(r7, r8)
NUAL, Phase1 Operation:
- v1 = load(r2)
- v2 = load(r3)
- v3 = mpy(r1, r5)
- v4 = add(r1, r6)
- v5 = mpy(r4, r9)
- v6 = add(r7, r8)
NUAL, Phase2 Operation:
- r1 = v1
- r1 = v2
- r4 = v4
- r4 = v3
- r7 = v6
- r7 = v5
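To make the UAL/NUAL difference concrete, here is a minimal Python sketch that runs the six operations from this slide both ways. The timing model is an assumption the slide does not spell out: one operation issues per cycle in program order, sources are read at issue, and a result becomes visible to operations issued `latency` cycles later. The initial register and memory values are made up for illustration.

```python
# Latencies from the slide: load = 4 cycles, add = 1, mpy = 3.
LATENCY = {"load": 4, "add": 1, "mpy": 3}

# The six operations from the slide: (destination, opcode, sources).
PROGRAM = [
    ("r1", "load", ("r2",)),
    ("r1", "load", ("r3",)),
    ("r4", "mpy", ("r1", "r5")),
    ("r4", "add", ("r1", "r6")),
    ("r7", "mpy", ("r4", "r9")),
    ("r7", "add", ("r7", "r8")),
]

# Hypothetical initial state (r2 and r3 hold addresses).
REGS = {"r1": 1, "r2": 100, "r3": 104, "r4": 2, "r5": 3,
        "r6": 4, "r7": 5, "r8": 6, "r9": 7}
MEMORY = {100: 10, 104: 20}


def compute(opcode, values, memory):
    """Phase 1: produce a result from source values already read."""
    if opcode == "load":
        return memory[values[0]]
    if opcode == "add":
        return values[0] + values[1]
    if opcode == "mpy":
        return values[0] * values[1]
    raise ValueError(f"unknown opcode {opcode}")


def run_ual(regs, memory):
    """UAL: each operation finishes before the next one starts."""
    regs = dict(regs)
    for dest, opcode, srcs in PROGRAM:
        regs[dest] = compute(opcode, [regs[s] for s in srcs], memory)
    return regs


def run_nual(regs, memory, latency=LATENCY):
    """NUAL: read at issue, write `latency` cycles later (assumed model)."""
    regs = dict(regs)
    pending = []  # (visible_cycle, issue_cycle, dest, value)

    def retire(up_to):
        # Phase 2: apply writes in time order, ties broken by issue order.
        pending.sort()
        while pending and pending[0][0] <= up_to:
            _, _, dest, value = pending.pop(0)
            regs[dest] = value

    for cycle, (dest, opcode, srcs) in enumerate(PROGRAM, start=1):
        retire(cycle)
        value = compute(opcode, [regs[s] for s in srcs], memory)
        pending.append((cycle + latency[opcode], cycle, dest, value))
    retire(float("inf"))
    return regs


print("UAL :", run_ual(REGS, MEMORY))
print("NUAL:", run_nual(REGS, MEMORY))
```

With these made-up inputs, UAL ends with r4 = 24 and r7 = 174, while NUAL ends with r4 = 3 and r7 = 35. Under NUAL each add, although issued later, writes its register before the longer-latency mpy does, so the mpy result is what survives. This matches the Phase2 ordering on the slide (r4 = v4 before r4 = v3, r7 = v6 before r7 = v5).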
13 Other VLIW/EPIC Architectural Features
- Add features into the architecture to support VLIW/EPIC philosophy
- Create more efficient POEs
- Expose the microarchitecture
- Play the statistics
- Register structure
- Branch architecture
- Data/Control speculation
- Memory hierarchy management
- Predicated execution
14 EECS 583 Topics
- 3 central focuses of the class
- Machine dependent (VLIW) optimization
- Mapping program onto real hardware
- Register allocation is 1 example of this
- Profile-guided optimization
- Control flow analysis/optimization
- Region formation: traces, superblocks
- If-conversion
- Optimizing for the icache
15 EECS 583 Topics (cont.)
- Dataflow analysis/Opti
- Analysis of predicated code
- ILP opti (height reduction)
- Scheduling/Code generation
- Dependence edges, machine descriptions
- Instruction scheduling, modulo scheduling
- Clustering
- Managing the memory hierarchy
- Prefetching, cache bypassing, local memories
16 My Research: Compilers Creating Custom Processors
- Scott Mahlke
- CCCP Group: Mike Chu, Nate Clark, Ganesh Dasika, Kevin Fan, Manjunath Kudlur, Pracheeti Nagarkar, Rajiv Ravindran, Wilkin Tang, Hongtao Zhong
17 Overview
- Traditional compiler
- Customize software to predefined hardware
- Our approach
- The compiler becomes the computer architect
[Figure: CCCP Design System (Compiler); labels: Application, Power/perf/cost constraints, Customized processor]
18 Motivation for this Work
- Application diversity growing rapidly
- Encryption, packet routing, wireless, signal/image processing, speech recognition
- Each application domain has its own unique
- Power, performance, cost requirements
- Computation structure and memory access patterns
- One size processor does not fit all
- Generality of desktop processors costs
- Order of magnitude wins possible through
specialization, but at a loss of programmability
19 Focus of the CCCP Project
- Automatic specialization of processor hardware
- Highly customized to target application
- Compiler orchestrates the design
- Deep program analysis
- Resource allocation, scheduling, code optimization
- However, we are not designing an ASIC!
- Customized, but programmable hardware
- Multiple similar applications execute
- Hardware creation is only ½ the solution
- Compiler-friendly customization
20 CCCP System
[Figure: CCCP system block diagram; labels: Extended C, Parallelism Spec, Architecture Spec, Spacewalker, Hardware Compiler, Processor Synthesizer, Retargetable Compiler, Mdes, HDL, object code, Cost, Power, Perf, cycles, area]
21 Interested Students
- Needed skills
- Architecture/compiler background
- VLSI/synthesis knowledge is useful
- C programming
- For more info
- Visit http://cccp.eecs.umich.edu
- Drop by and talk with me, 2223 EECS
22 Info for Undergraduates
- Grad school
- 3.4 program/CUGS: Don't use these!
- Take the GRE, apply normally
- Get a taste of research (499 project)
- See if you like it
- Good way to get your foot in the door
- Looking for a job ...
- Chip companies: Intel, IBM, Sun, HP, TI, AMD, Motorola, ST, ARM, Agere
- Software companies: Microsoft
23 Graduate Schools Beyond Michigan: Compiler-Related Projects
- Illinois
- W. Hwu, S. Patel
- Princeton
- David August
- Colorado
- Dan Connors
- NCSU
- Tom Conte
- Georgia Tech
- K. Palem, S. Lee
- MIT
- Amarasinghe, Agarwal
- CMU
- T. Mowry, S. Goldstein
- Penn St.
- M. Irwin
- Wisconsin
- J. Smith
- Arizona
- R. Gupta