What Is a Compiler When The Architecture Is Not Hardware

About This Presentation

Title:

What Is a Compiler When The Architecture Is Not Hardware

Description:

UCB, November 8, 2001. Krishna V Palem, Georgia Tech. The Nature of Embedded Systems ...

Learn more at: https://ptolemy.berkeley.edu

Transcript and Presenter's Notes

1
What Is a Compiler When The Architecture Is Not
Hard(ware) ?

Krishna V Palem

This work was supported in part by awards from
Hewlett-Packard Corporation, IBM, Panasonic AVC,
and by DARPA under Contract No. DABT63-96-C-0049
and Grant No. 25-74100-F0944. Portions of this
presentation were given by the speaker as a
keynote at the ACM LCTES2001 and as an invited
speaker at EMSOFT01
2
Embedded Computing
Why ?
What ?
How ?
3
The Nature of Embedded Systems
4
Favorable Trends

Supported by Moore's (second) law
Computing power doubles every eighteen months
Corollary: cost per unit of computing halves
every eighteen months
From hundreds of millions to billions of units
Projected by market research firms (VDC) to be a
$50 billion space over the next five years
High volume, relatively low per unit margin
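
The corollary above is plain exponential decay, and a minimal sketch makes the arithmetic concrete. The function name and the 18-month default are illustrative choices that restate the slide, not a measured model.

```python
def relative_cost(months: float, halving_period: float = 18.0) -> float:
    """Cost per unit of computing relative to today, assuming it halves
    every `halving_period` months (the slide's stated corollary)."""
    return 0.5 ** (months / halving_period)


# Relative cost today, after one halving period, and after two.
for months in (0, 18, 36):
    print(f"after {months:>2} months: {relative_cost(months):.2f}x today's cost")
```

Over the five-year horizon mentioned on the slide, `relative_cost(60)` comes to roughly a tenth of today's cost under this assumption.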

5
Embedded Systems Desiderata
6
Timing Example
Video-On-Demand
Predictable Timing Behavior
Unpredictable Timing Behavior
7
Current Art
Vertical application domains

Meet desiderata while overcoming NRE cost hurdles
through volume
High migration inertia across applications
Long time to market

8
Subtle but Sure Hurdles

For Moore's corollary to be true
Non-recurring engineering (NRE) cost must be
amortized over high volume
Else prohibitively high per-unit costs
Implies uniform designs over large workload
classes
(e.g., numerical, integer, signal processing)
Demands of embedded systems
Non-uniform or application-specific designs
Per-application volume might not be high
High NRE costs => infeasible cost/unit
Time to market pressure
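
The amortization argument can be sketched numerically. The figures below are hypothetical and chosen only to show the shape of the trade-off: the same NRE cost is negligible at high volume and dominant at low volume.

```python
def cost_per_unit(nre_cost: float, volume: int, marginal_cost: float) -> float:
    """Per-unit cost when a one-time NRE cost is spread over `volume` units."""
    if volume <= 0:
        raise ValueError("volume must be positive")
    return nre_cost / volume + marginal_cost


# Hypothetical figures, not taken from the presentation.
NRE = 10_000_000.0   # one-time design and verification cost
MARGINAL = 5.0       # manufacturing cost of each additional unit

for volume in (10_000, 1_000_000, 100_000_000):
    unit = cost_per_unit(NRE, volume, MARGINAL)
    print(f"{volume:>11,} units -> {unit:,.2f} per unit")
```

With these assumed numbers, an application-specific design shipping ten thousand units carries about a thousand in NRE per unit, while a uniform design shipping a hundred million carries almost none.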

9
The Embedded Systems Challenge
Multiple application domains

Sustain Moore's corollary
Keep NRE costs down

10
Responding Via Automation
Multiple application domains
11
Three Active Approaches

Custom microprocessors
Architecture exploration and synthesis
Architecture assembly for reconfigurable computing

12
Custom Processor Implementation
Proprietary ISA, Architecture Specification
Custom Processor implementation
Proprietary Tools
Application analysis
Fabricate
Processor
Proprietary ISA
Language with custom extensions
Application
Compiler
Binary

High performance implementation
Customized in silicon for particular application
domain
O(months) of design time
Once designed, programmable like standard
processors

Tensilica, HP-ST Microelectronics approach
13
Architecture Exploration and Synthesis
The PICO Vision
Automatic synthesis of application-specific
parallel / VLIW ULSI microprocessors and their
compilers for embedded computing
14
Custom Microprocessors
Application(s) define workload
Analyze
Define ISA extension (e.g., IA-64)
Optimizing Compiler
Define Compiler Optimizations
Design Implementation
Microprocessor (e.g., Itanium)
15
Application Specific Design
Applications
Single Application
Analyze
Program Analysis
Extended EPIC Compiler Technology
Library of possible implementations (Bypass ISA)
Explore and Synthesize implementations
VLIW core + non-programmable extension
Application-specific processor runs single
application
16
The Compiler Optimization Trajectory
17
What Is the Compiler's Target ISA?
Compiler
Hardware

Target is a range of architectures and their
building blocks
Compiler reaches into a constrained space of
silicon
Explores architectural implementations
O(days-weeks) of design time
Exploration sensitive to application-specific
hardware modules
Fixed function silicon is the result
Verification NRE costs still there
One approach to overcoming time to market

[Figure: division of work between compiler and hardware. Stages, in order: Frontend and Optimizer; Determine Dependences; Determine Independences; Bind Operations to Function Units; Bind Transports to Busses; Execute. Architecture classes labeled between the stages: Superscalar, Dataflow, Indep. Arch., VLIW, TTA.]
B. Ramakrishna Rau and Joseph A. Fisher.
Instruction-level parallel processing: History,
overview, and perspective. The Journal of
Supercomputing, 7(1-2):9-50, May 1993.
18
Choices of Silicon
High level design/synthesis
19
Reconfigurable Computing
20
FPGAs As an Alternative Choice for Customization

Frequent (re)configuration and hence frequent
recustomization
Fabrication process is steadily improving
Gate densities are going up
Performance levels are acceptable
Amortize large NRE investments by using COTS
platform

21
Major Motivations

Poor compilation times
Lack of correspondence between standard IR and
final configurations
Place and route inherently complex
Would like to have compile times of the order of
seconds to minutes
Provide customization of data-path by automatic
analysis and optimization in software

22
Adaptive EPIC
23
Compiler-Processor Interface
[Figure: Source program -> Compiler -> Executable, with the ISA as the interface. Labels: ADD (format, semantics), LD (format, semantics), Registers, Exceptions...]
24
Redefining Processor-Compiler Interface
Source program
Compiler
Executable
Let compiler determine the instruction sets (and
their realization on chip)
25
EPIC execution model
Record of execution
26
Adaptive EPIC execution model
ILP-1
Configured Functional units
Reconfigure datapath
ILP-2
27
What can be efficiently compiled for today?
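[Figure: architectures plotted by parallelism against approximate instruction packet size.
Architectures shown: FPGA, Complex ASIC, DPGA, RAW, RaPiD, GARP, MultiChip, CVH, TRACE (Multiscalar), SMT, SuperSpeculative, EPIC/VLIW, TTA, SuperScalar, Dataflow, VECTOR, Simple Pipelined/Embedded, Early x86, Simple ASIC.
Axis: Parallelism.
Axis: Approximate instruction packet size, with ticks 0, 4, 16, 32, 64, 128-512, 1K-10K, 100K-1M, >1M.]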
28
Adaptive EPIC

AEPIC = EPIC processing + datapath
reconfiguration
Reconfiguration through three basic operations
add a functional unit to the datapath
remove a functional unit from the datapath
execute an operation on resident functional unit
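
The three basic operations can be sketched as a toy software model. Everything here is hypothetical illustration: the `Datapath` class, the fixed `capacity`, and the example units are assumptions, not the AEPIC instruction encoding or hardware interface.

```python
class Datapath:
    """Toy model of a reconfigurable datapath with a bounded number of
    resident functional units (capacity is an assumed parameter)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.resident = {}  # unit name -> Python callable standing in for the unit

    def add_unit(self, name, func):
        """Operation 1: add a functional unit to the datapath."""
        if name in self.resident:
            return
        if len(self.resident) >= self.capacity:
            raise RuntimeError("datapath full; remove a unit first")
        self.resident[name] = func

    def remove_unit(self, name):
        """Operation 2: remove a functional unit from the datapath."""
        self.resident.pop(name, None)

    def execute(self, name, *operands):
        """Operation 3: execute an operation on a resident functional unit."""
        if name not in self.resident:
            raise KeyError(f"{name} is not resident")
        return self.resident[name](*operands)


dp = Datapath(capacity=2)
dp.add_unit("mac", lambda a, b, c: a * b + c)          # multiply-accumulate
dp.add_unit("sat_add8", lambda a, b: min(a + b, 255))  # saturating 8-bit add
result_mac = dp.execute("mac", 3, 4, 5)
result_sat = dp.execute("sat_add8", 200, 100)

dp.remove_unit("mac")                                  # make room for another unit
dp.add_unit("popcount", lambda x: bin(x).count("1"))
result_pop = dp.execute("popcount", 0b1011)
print(result_mac, result_sat, result_pop)
```

The capacity check stands in for the limited reconfigurable area: a compiler targeting such a machine has to decide which units are resident at each point in the program.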

29
AEPIC Machine Organization
[Figure: AEPIC block diagram. Labels: I-Cache, Level-1 configuration cache, Configuration cache, CRF, F/D, Multi-context Reconfigurable Logic Array, GPR/FPR/, Array Register File, L1, L2.]
30
AEPIC Compilation
31
Key Tasks For AEPIC Compiler

Partitioning
Identifying code sections that convert to custom
instructions
Operation synthesis
Efficient CFU implementations for identified
partitions
Instruction selection
Multiple CFU choices for code partitions
Resource allocation
Allocation of C-cache and MRLA resources for CFUs
Scheduling
When and where to schedule adaptive component
instructions

32
Compiler Modules
Source program
Front-end processing
Partitioning
Configuration Library
High-level optimizations (EPIC core)
Mapping (Adaptive)
Machine description
Performance statistics
33
Resource allocation for configurations
Reconfigurable logic can accommodate only two
configurations simultaneously
[Figure: record of execution on the processor, showing live ranges of configurations labeled B, G, RL, R and an overlapping live-range region.]
34
Managing Reconfigurable Resource (contd.)

Simpler version related to register allocation
problem
New parameters
No need to save to memory on spill:
configurations are immutable
Load costs differ across configurations
C-cache, MRLA: analogous to a multi-level register file
Adapted register allocation techniques to
configuration allocation
Non-uniform sizes of configurations => graph
multi-coloring
Adapted Chaitin's coloring-based techniques
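
The multi-coloring view can be sketched with a simple greedy allocator. This is not Chaitin's simplify-and-select algorithm and not the authors' adapted technique; it is a minimal illustration, assuming each configuration needs a known number of slots, that slots need not be contiguous, and that two configurations with overlapping live ranges must not share a slot. All names and sizes are hypothetical.

```python
def allocate(sizes, interference, num_slots):
    """Greedy graph multi-coloring.

    sizes:        configuration name -> number of slots it needs
    interference: configuration name -> set of names live at the same time
    num_slots:    slots available in the reconfigurable array (assumed)

    Returns (assignment, spilled). A spilled configuration is simply
    reloaded later; nothing is written back, since configurations are
    immutable.
    """
    assignment, spilled = {}, []
    # Place large configurations first; break ties by name for determinism.
    for cfg in sorted(sizes, key=lambda c: (-sizes[c], c)):
        taken = set()
        for other in interference.get(cfg, ()):
            taken |= assignment.get(other, set())
        free = [s for s in range(num_slots) if s not in taken]
        if len(free) >= sizes[cfg]:
            assignment[cfg] = set(free[:sizes[cfg]])
        else:
            spilled.append(cfg)
    return assignment, spilled


# Three mutually interfering configurations competing for four slots.
sizes = {"A": 2, "B": 1, "C": 2}
interference = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B"}}
assignment, spilled = allocate(sizes, interference, num_slots=4)
print(assignment, spilled)
```

Here A and C fill the four slots and B is spilled. A cost-aware version would weigh each configuration's load cost when choosing what to spill, which is one of the new parameters the slide lists.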

35
The Problem With Long Reconfiguration Times
36
Speculating Configuration Loads

Mask reconfiguration times!

Need to know when and where to speculate

If f1 >> f2, do not speculate to the red empty load
slot

37
Sample Compiler Topics

Configuration cache management
Power/clock rate vs. Performance tradeoff
Bit-width optimizations

38
More Generally: Architecture Assembly
Applications

An ISA view
Synthesis and other hardware design off-line
Much closer to compiler optimizations implies
faster compile time

Build off-line (synthesis, place and route)
Program
Prebuilt Implementations
Compiler selects, assembles, and optimizes
program
Data path
Storage
Interconnect
Dynamically variable ISA Architecture
implementation
Also applicable to yield fixed implementations in
silicon
