What Is a Compiler When The Architecture Is Not Hardware

UCB, November 8, 2001. Krishna V Palem, Georgia Tech.

1
What Is a Compiler When The Architecture Is Not Hard(ware)?
  • Krishna V Palem

This work was supported in part by awards from
Hewlett-Packard Corporation, IBM, Panasonic AVC,
and by DARPA under Contract No. DABT63-96-C-0049
and Grant No. 25-74100-F0944. Portions of this
presentation were given by the speaker as a
keynote at ACM LCTES 2001 and as an invited
talk at EMSOFT 2001.
2
Embedded Computing
Why ?
What ?
How ?
3
The Nature of Embedded Systems
4
Favorable Trends
  • Supported by Moore's (second) law
  • Computing power doubles every eighteen months
  • Corollary: cost per unit of computing halves
    every eighteen months
  • From hundreds of millions to billions of units
  • Projected by market research firms (VDC) to be a
    50 billion space over the next five years
  • High volume, relatively low per-unit margin
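
A quick check of the corollary, as a minimal sketch: assuming the cost per unit of computing halves exactly every eighteen months, the cost after t months is the starting cost times 2^(-t/18). The function name and the starting cost of 100 are illustrative assumptions, not figures from the talk.

```python
def cost_per_unit_of_computing(initial_cost: float, months: float,
                               halving_period: float = 18.0) -> float:
    """Cost after `months`, assuming it halves every `halving_period` months."""
    return initial_cost * 0.5 ** (months / halving_period)


# Hypothetical starting cost of 100 (arbitrary units).
for months in (0, 18, 36, 54):
    print(months, cost_per_unit_of_computing(100.0, months))
```

Under this idealized model, three halving periods (54 months) cut the cost to one eighth of its starting value.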

5
Embedded Systems Desiderata
6
Timing Example
Video-On-Demand
Predictable Timing Behavior
Unpredictable Timing Behavior
7
Current Art
Vertical application domains
  • Meet desiderata while overcoming NRE cost hurdles
    through volume
  • High migration inertia across applications
  • Long time to market

8
Subtle but Sure Hurdles
  • For Moore's corollary to be true:
  • Non-recurring engineering (NRE) cost must be
    amortized over high volume
  • Else prohibitively high per-unit costs
  • Implies uniform designs over large workload
    classes
  • E.g., numerical, integer, signal processing
  • Demands of embedded systems:
  • Non-uniform or application-specific designs
  • Per-application volume might not be high
  • High NRE costs → infeasible cost/unit
  • Time-to-market pressure
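
The amortization argument can be made concrete with a small sketch. It assumes the simplest possible cost model, per-unit cost = NRE / volume + marginal cost per unit; the model, the function name, and all dollar figures below are hypothetical and are not taken from the talk.

```python
def cost_per_unit(nre_cost: float, volume: int, marginal_cost: float) -> float:
    """Per-unit cost when a one-time NRE cost is spread over `volume` units."""
    if volume <= 0:
        raise ValueError("volume must be positive")
    return nre_cost / volume + marginal_cost


# Hypothetical numbers: the same NRE cost, amortized over a high-volume
# general-purpose part and over a low-volume application-specific part.
NRE = 10_000_000.0
MARGINAL = 5.0
print(cost_per_unit(NRE, 10_000_000, MARGINAL))  # high volume
print(cost_per_unit(NRE, 10_000, MARGINAL))      # low volume
```

With these made-up numbers the NRE share of the unit cost goes from 1 to 1000 when volume drops by a factor of a thousand, which is the hurdle the slide describes.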

9
The Embedded Systems Challenge
Multiple application domains
  • Sustain Moore's corollary
  • Keep NRE costs down

10
Responding Via Automation
Multiple application domains
11
Three Active Approaches
  • Custom microprocessors
  • Architecture exploration and synthesis
  • Architecture assembly for reconfigurable computing

12
Custom Processor Implementation
[Design-flow diagram. Labels: Application analysis; Proprietary tools; Proprietary ISA, architecture specification; Custom processor implementation; Fabricate; Processor; Application; Language with custom extensions; Compiler; Binary (proprietary ISA)]
  • High performance implementation
  • Customized in silicon for a particular application
    domain
  • O(months) of design time
  • Once designed, programmable like standard
    processors

Tensilica, HP-ST Microelectronics approach
13
Architecture Exploration and Synthesis
The PICO Vision
Automatic synthesis of application-specific
parallel/VLIW ULSI microprocessors and their
compilers for embedded computing
14
Custom Microprocessors
Application(s) define workload
Analyze
Define ISA extension (e.g., IA-64)
Optimizing Compiler
Define Compiler Optimizations
Design Implementation
Microprocessor (e.g., Itanium)
15
Application Specific Design
Applications
Single Application
Analyze
Program Analysis
Extended EPIC Compiler Technology
Library of possible implementations (Bypass ISA)
Explore and Synthesize implementations
VLIW core + non-programmable extension
Application-specific processor runs a single application
16
The Compiler Optimization Trajectory
17
What Is the Compiler's Target ISA?
Compiler
Hardware
  • Target is a range of architectures and their
    building blocks
  • Compiler reaches into a constrained space of
    silicon
  • Explores architectural implementations
  • O(days to weeks) of design time
  • Exploration sensitive to application-specific
    hardware modules
  • Fixed function silicon is the result
  • Verification NRE costs still there
  • One approach to overcoming time to market

[Figure: division of work between compiler and hardware. Stages, in order: Frontend and Optimizer; Determine Dependences; Determine Independences; Bind Operations to Function Units; Bind Transports to Busses; Execute. Architecture classes marked between successive stages: Superscalar, Dataflow, Indep. Arch., VLIW, TTA]
B. Ramakrishna Rau and Joseph A. Fisher.
Instruction-level parallel processing: History,
overview, and perspective. The Journal of
Supercomputing, 7(1-2):9-50, May 1993.
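
One way to read the stage list from the Rau and Fisher figure is as a table of where the compiler hands off to the hardware. The sketch below encodes that reading; it assumes each architecture label marks the last stage done by the compiler, with everything after it (including Execute) left to hardware. The names `STAGES`, `LAST_COMPILER_STAGE`, and `split_of_work` are illustrative only.

```python
# Stages that may be done either by the compiler or by the hardware.
STAGES = [
    "Frontend and Optimizer",
    "Determine Dependences",
    "Determine Independences",
    "Bind Operations to Function Units",
    "Bind Transports to Busses",
]

# Assumed reading of the figure: index of the last stage the compiler performs.
LAST_COMPILER_STAGE = {
    "Superscalar": 0,
    "Dataflow": 1,
    "Indep. Arch.": 2,
    "VLIW": 3,
    "TTA": 4,
}


def split_of_work(arch: str) -> tuple[list[str], list[str]]:
    """Return (compiler stages, hardware stages) for an architecture class."""
    cut = LAST_COMPILER_STAGE[arch] + 1
    return STAGES[:cut], STAGES[cut:] + ["Execute"]


for arch in LAST_COMPILER_STAGE:
    compiler, hardware = split_of_work(arch)
    print(f"{arch}: compiler does {len(compiler)} stage(s), hardware does {len(hardware)}")
```

Read this way, the slide's point is that the compiler's share grows from Superscalar to TTA, and the talk pushes the boundary further, into the choice of hardware itself.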
18
Choices of Silicon
High level design/synthesis
19
Reconfigurable Computing
20
FPGAs As an Alternative Choice for Customization
  • Frequent (re)configuration and hence frequent
    recustomization
  • Fabrication process is steadily improving
  • Gate densities are going up
  • Performance levels are acceptable
  • Amortize large NRE investments by using COTS
    platform

21
Major Motivations
  • Poor compilation times
  • Lack of correspondence between standard IR and
    final configurations
  • Place and route inherently complex
  • Would like to have compile times on the order of
    seconds to minutes
  • Provide customization of data-path by automatic
    analysis and optimization in software

22
Adaptive EPIC
23
Compiler-Processor Interface
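[Diagram: Source program → Compiler → Executable. The ISA sits between compiler and processor and fixes the format and semantics of each instruction (ADD, LD, ...), along with registers, exceptions, ...]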
24
Redefining Processor-Compiler Interface
Source program
Compiler
Executable
Let the compiler determine the instruction sets (and
their realization on chip)
25
EPIC execution model
Record of execution
26
Adaptive EPIC execution model
ILP-1
Configured Functional units
Reconfigure datapath
ILP-2
27
What can be efficiently compiled for today?
[Chart. Axis labels: Parallelism; Approximate instruction packet size (0, 4, 16, 32, 64, 128-512, 1K-10K, 100K-1M, >1M). Plotted architectures: FPGA, Complex ASIC, DPGA, RAW, RaPiD, GARP, MultiChip, CVH, TRACE (Multiscalar), SMT, SuperSpeculative, EPIC/VLIW, TTA, SuperScalar, Dataflow, VECTOR, Simple Pipelined/Embedded, Early x86, Simple ASIC]
28
Adaptive EPIC
  • AEPIC = EPIC processing + datapath
    reconfiguration
  • Reconfiguration through three basic operations:
  • add a functional unit to the datapath
  • remove a functional unit from the datapath
  • execute an operation on a resident functional unit
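
The three basic operations can be pictured as an interface on the datapath. The following is a toy software model, not the AEPIC hardware or its instruction encoding: the class name `Datapath`, the method names, the fixed `capacity`, and the multiply-accumulate example unit are all assumptions made for illustration.

```python
from typing import Callable


class Datapath:
    """Toy model of a reconfigurable datapath with a bounded number of units."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.resident: dict[str, Callable[..., int]] = {}

    def add_unit(self, name: str, behaviour: Callable[..., int]) -> None:
        """Operation 1: add a functional unit to the datapath."""
        if name in self.resident:
            return  # already configured, nothing to load
        if len(self.resident) >= self.capacity:
            raise RuntimeError("no room: remove a unit first")
        self.resident[name] = behaviour

    def remove_unit(self, name: str) -> None:
        """Operation 2: remove a functional unit from the datapath."""
        self.resident.pop(name, None)

    def execute(self, name: str, *operands: int) -> int:
        """Operation 3: execute an operation on a resident functional unit."""
        if name not in self.resident:
            raise KeyError(f"{name} is not resident")
        return self.resident[name](*operands)


dp = Datapath(capacity=2)
dp.add_unit("mac", lambda a, b, c: a * b + c)  # hypothetical custom unit
print(dp.execute("mac", 2, 3, 4))
dp.remove_unit("mac")
```

In this model an `execute` on a non-resident unit is an error, which is why the compiler has to schedule the add before the first use.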

29
AEPIC Machine Organization
[Block diagram. Labels: Level-1 configuration cache; I-Cache; Configuration cache; CRF; F/D; Multi-context Reconfigurable Logic Array; GPR/FPR/; Array Register File; L1; L2]
30
AEPIC Compilation
31
Key Tasks For AEPIC Compiler
  • Partitioning: identifying code sections that convert
    to custom instructions
  • Operation synthesis: efficient CFU implementations
    for identified partitions
  • Instruction selection: multiple CFU choices for code
    partitions
  • Resource allocation: allocation of C-cache and MRLA
    resources for CFUs
  • Scheduling: when and where to schedule adaptive
    component instructions
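
To illustrate the instruction selection task, here is a minimal sketch that picks one implementation per code partition from several CFU choices. It is not the AEPIC compiler's algorithm: it assumes each choice has a known area and cycle count, treats a zero-area "software" fallback as one of the choices, and simply enumerates all combinations under an area budget. All names and numbers are hypothetical.

```python
from itertools import product

# (implementation name, area units, cycles) per partition; made-up values.
Choice = tuple[str, int, int]


def select_cfus(choices: dict[str, list[Choice]], area_budget: int):
    """Exhaustively pick one choice per partition, minimizing total cycles."""
    partitions = list(choices)
    best = None
    for combo in product(*(choices[p] for p in partitions)):
        area = sum(c[1] for c in combo)
        cycles = sum(c[2] for c in combo)
        if area <= area_budget and (best is None or cycles < best[0]):
            best = (cycles, dict(zip(partitions, (c[0] for c in combo))))
    return best  # None if no combination fits the budget


choices = {
    "fir": [("software", 0, 400), ("small", 30, 120), ("wide", 60, 70)],
    "crc": [("software", 0, 200), ("small", 30, 50)],
}
print(select_cfus(choices, area_budget=60))
```

Exhaustive enumeration is only workable for a handful of partitions; it is used here because it makes the trade-off between competing CFU choices easy to see.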

32
Compiler Modules
Source program
Front-end processing
Partitioning
Configuration Library
High-level optimizations (EPIC core)
Mapping (Adaptive)
Machine description
Performance statistics
33
Resource allocation for configurations
Reconfigurable logic can accommodate only two configurations simultaneously
Record of execution
B
G
RL
R
Overlapping live-range region
Processor
34
Managing Reconfigurable Resource (contd.)
  • Simpler version related to the register allocation
    problem
  • New parameters:
  • No need to save to memory on spill, since
    configurations are immutable
  • Load costs differ across configurations
  • C-cache and MRLA act as a multi-level register file
  • Adapted register allocation techniques to
    configuration allocation
  • Non-uniform sizes of configurations → graph
    multi-coloring
  • Adapted Chaitin's coloring-based techniques
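
The analogy with register allocation can be sketched as a Chaitin-style simplify/select pass over an interference graph of configurations. This is a simplified illustration, not the allocator described in the talk: it assumes every configuration occupies exactly one slot (so it does not cover the non-uniform sizes that call for multi-coloring), and the spill heuristic, function name, and load costs are assumptions. A "spilled" configuration is simply reloaded later, with nothing written back.

```python
def allocate_configurations(interference: dict[str, set[str]],
                            load_cost: dict[str, float],
                            num_slots: int):
    """Assign configurations to slots; return (slot assignment, spilled list)."""
    if num_slots < 1:
        raise ValueError("need at least one slot")
    graph = {n: set(adj) for n, adj in interference.items()}
    stack, spilled = [], []
    while graph:
        # Simplify: a node with fewer than num_slots neighbours can always get a slot.
        node = next((n for n in graph if len(graph[n]) < num_slots), None)
        if node is None:
            # Spill: evict the configuration with the lowest reload cost per conflict.
            node = min(graph, key=lambda n: load_cost[n] / len(graph[n]))
            spilled.append(node)
        else:
            stack.append(node)
        for neighbour in graph.pop(node):
            graph[neighbour].discard(node)
    # Select: assign slots in reverse removal order.
    slots: dict[str, int] = {}
    for node in reversed(stack):
        used = {slots[nb] for nb in interference[node] if nb in slots}
        slots[node] = next(s for s in range(num_slots) if s not in used)
    return slots, spilled


# Three configurations whose live ranges all overlap, but only two slots.
interference = {"B": {"G", "R"}, "G": {"B", "R"}, "R": {"B", "G"}}
load_cost = {"B": 100.0, "G": 40.0, "R": 80.0}  # hypothetical reload costs
print(allocate_configurations(interference, load_cost, num_slots=2))
```

With two slots and three mutually overlapping live ranges, one configuration has to be evicted; the sketch evicts the one that is cheapest to reload.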

35
The Problem With Long Reconfiguration Times
36
Speculating Configuration Loads
  • Mask reconfiguration times!
  • Need to know when and where to speculate
  • If f1 >> f2, do not speculate to the red empty load
    slot
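
The rule of thumb on this slide can be written as a one-line test. The sketch assumes f1 and f2 are the execution frequencies of two alternative paths, and that the load being considered is only useful on the f2 path; the slide does not define them, so this reading, the function name, and the `dominance_ratio` threshold standing in for ">>" are all assumptions.

```python
def should_speculate(f1: float, f2: float, dominance_ratio: float = 10.0) -> bool:
    """Speculate the configuration load for the f2 path unless f1 dominates f2."""
    if f1 < 0 or f2 < 0:
        raise ValueError("frequencies must be non-negative")
    # "f1 >> f2" is modelled as f1 >= dominance_ratio * f2 (assumed threshold).
    return f1 < dominance_ratio * f2


print(should_speculate(f1=1000, f2=10))  # f1 dominates: leave the slot empty
print(should_speculate(f1=60, f2=40))    # comparable frequencies: speculate
```

A real compiler would more likely weigh reconfiguration latency and the cost of displacing a resident configuration, not just a frequency ratio.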

37
Sample Compiler Topics
  • Configuration cache management
  • Power/clock rate vs. Performance tradeoff
  • Bit-width optimizations

38
More Generally: Architecture Assembly
Applications
  • An ISA view
  • Synthesis and other hardware design off-line
  • Much closer to compiler optimizations, which implies
    faster compile time

Build off-line (synthesis, place and route)
Program
Prebuilt Implementations
Compiler selects, assembles, and optimizes program
Data path
Storage
Interconnect
Dynamically variable ISA Architecture implementation
Also applicable to yield fixed implementations in silicon