Synthesis of Custom Processors based on Extensible Platforms

About This Presentation
Slides: 38
Provided by: fei78
Learn more at: https://www.cecs.uci.edu

Transcript and Presenter's Notes

1
Synthesis of Custom Processors based on
Extensible Platforms
  • Fei Sun, Srivaths Ravi,
  • Anand Raghunathan and Niraj K. Jha
  • Dept. of Electrical Engineering
  • Princeton University
  • NEC Laboratories America, Inc.

2
Outline
  • SoC design constraints
  • Background
  • Previous work in ASIP design
  • Xtensa platform
  • Manual custom instruction generation procedure
  • Automatic custom instruction generation flow
  • Experimental results
  • Conclusions

3
SoC Design Constraints
  • Time to market
  • Cost
  • Performance
  • Power
  • Cost-performance trade-off
  • Flexibility

4
Comparison of Different Approaches
Table: ASIC, ASIP and GPP rated on time to market, cost, performance, power, cost-performance and flexibility
Rating scale: from very good to very bad
5
Flexibility vs. Energy Efficiency
6
Previous Work in ASIP Design
  • ASIP architectures and overall design methodologies
    • Huang, 1994; Adams, 1996; Fisher, 1999; Kucukcakar, 1999
  • Application-specific instruction set selection
    • Choi, 1999; Gschwind, 1999; Arnold, 1999
  • Low power ASIP design
    • Kalambur, 1997; Dougherty, 1999; Ishihara, 2000; Sami, 2001
  • Commercial offerings
    • Xtensa, ARCtangent, Jazz, SP-5flex, Carmel

7
Xtensa Architecture
Figure: Xtensa processor block diagram
Blocks: Processor Controls, TRACE Port, JTAG Tap Control, On Chip Debug, Interrupt Control, Timers 1 to n, Exception Support, Instruction Address Watch 0 to n, Data Address Watch 0 to n, Instruction Fetch, Align and Decode, Branch Logic, Instruction Memory or Cache Tags, Memory Protection Unit, Processor Interface, Write Buffer, Data Memory or Cache Tags, Window Register File, ALU, Address Generation, MAC 16, Special Function Register Access, Coprocessor Register File, Coprocessor Execution Units, Designer Defined Instruction Execution Unit
Signals: Instruction, Instruction Address, Data, Data Address
Legend: Base ISA Feature, Configurable Function, Optional Function, Configurable Optional Function, Extensible
Source: www.tensilica.com
8
Xtensa Processor Design Flow
Processor Configuration Inputs
Designer-Defined Instruction Descriptions
Configuration File
Configured GNU C/C++ Compiler
Configured Processor HDL
Configured GNU Assembler/Disassembler
Configured Instruction Set Simulator/Emulator
Area, Power and Timing Estimation
Application Source Code
Sample Application Data
Optimized Hardware
Optimized Software
Legend: Generator Output, Internal Database, Design data, Use of Generated Data
Source: www.tensilica.com
9
Manual Custom Instruction Generation Procedure
  • Identify potential new instructions: profile, read source code
  • Describe custom instructions: understand source code
  • Insert custom instructions: rewrite source code
  • Verify functional correctness
  • Slow and error-prone
10
Contributions of Our Work
  • Automatic custom instruction selection
    • From an application program to an extensible processor with custom instructions
  • Features
    • Efficient design space search
    • Uses accurate information from the instruction set simulator and synthesis
    • Bridges the gap between automatically synthesized and manually designed architectures

11
Automatic Custom Instruction Generation Flow
12
Automatic Custom Instruction Generation Flow
13
Example Illustration of Template Generation
14
Example Illustration of Template Generation
15
Example Illustration of Template Generation
16
Example Illustration of Template Generation
17
Example Illustration of Template Generation
18
Key Observations for Pruning
  • The higher the weight of a template, the higher its
    potential for improvement (Amdahl's law)
  • Scope for optimization is determined by computation:
    the number of cycles needed to execute the template
  • Scope for optimization is also limited by read/write
    ports: additional cycles are needed for extra
    reading/writing of input/output variables
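
The first observation can be made concrete with Amdahl's law: if a template accounts for a fraction w of the execution time and a custom instruction speeds that part up by a factor s, the whole program speeds up by 1 / ((1 - w) + w / s). Below is a minimal C sketch of that bound. The function names and the sample weights are illustrative assumptions, not values from the presentation.

```c
#include <assert.h>
#include <stdio.h>

/* Overall speedup when a fraction `weight` of the run time (the template's
 * share) is accelerated by `local_speedup` and the rest is unchanged. */
double amdahl_speedup(double weight, double local_speedup)
{
    assert(weight >= 0.0 && weight <= 1.0);
    assert(local_speedup > 0.0);
    return 1.0 / ((1.0 - weight) + weight / local_speedup);
}

/* Prints the bound for a few hypothetical template weights, all with the
 * same hypothetical 4x local speedup. */
void print_amdahl_examples(void)
{
    const double weights[] = {0.05, 0.30, 0.60};
    for (int i = 0; i < 3; i++) {
        printf("weight %.2f: 4x local speedup -> %.2fx overall\n",
               weights[i], amdahl_speedup(weights[i], 4.0));
    }
}
```

Calling `print_amdahl_examples()` from a `main` shows that a low-weight template cannot pay off much however fast its custom instruction is, which is why such templates are natural candidates for pruning.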

19
Pruning Algorithm
  • Ranking criterion
    • OriginalTime: fraction of the total execution time of the original program spent in the template (weight)
    • In, Out: number of inputs and outputs of the template, respectively
    • α, β: number of inputs/outputs encoded in the instruction
    • ?: number of cycles needed for executing the template
  • Higher priority means greater potential for speedup

20
Template Generation with Pruning
Figure labels: Ranked pool of seed templates, Template set, Threshold = 0.1
Values shown: 10.51, 7.92, 4.05, 2.13
21
Template Generation with Pruning
Figure labels: Ranked pool of seed templates, Template set, Highest priority, Threshold = 0.1
Values shown: 12.73, 1.18, 16.35
22
Template Generation with Pruning
Figure labels: Ranked pool of seed templates, Template set, Highest priority, Threshold = 0.1
Values shown: 12.73, 16.35
23
Template Generation with Pruning
Figure labels: Ranked pool of seed templates, Template set, Highest priority, Threshold = 0.1
Values shown: 12.73, 16.35
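
One way to read the pruning step on slides 20 to 23 is that a template stays in the pool only if its priority is at least the threshold times the highest priority currently in the pool. That rule is an assumption inferred from the values shown (1.18 disappears once 16.35 is the highest value and the threshold is 0.1); the exact rule from the presentation did not survive in the transcript. The function names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Keeps the entries whose priority is at least threshold * (highest priority).
 * Survivors are compacted to the front of `priority`; returns their count.
 * ASSUMPTION: this "relative to the best" rule is one reading of the slides. */
size_t prune_templates(double *priority, size_t n, double threshold)
{
    assert(threshold >= 0.0 && threshold <= 1.0);
    if (n == 0)
        return 0;

    double best = priority[0];
    for (size_t i = 1; i < n; i++)
        if (priority[i] > best)
            best = priority[i];

    size_t kept = 0;
    for (size_t i = 0; i < n; i++)
        if (priority[i] >= threshold * best)
            priority[kept++] = priority[i];
    return kept;
}

/* Applies the rule to the three values shown on slide 21. */
size_t slide21_survivors(void)
{
    double p[3] = {12.73, 1.18, 16.35};
    return prune_templates(p, 3, 0.1);
}
```

With a threshold of 0.1 the cut-off on slide 21 would be 1.635, so 1.18 is dropped and the other two values survive, which matches what slide 22 shows.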
24
No. of Templates vs. Threshold Ratio
25
Automatic Custom Instruction Generation Flow
26
Automatic Custom Instruction Generation Flow
(Contd.)
27
Automatic Custom Instruction Generation Flow
(Contd.)
28
Custom Instruction Insertion
  • Care must be taken to insert custom instructions
    into appropriate places without affecting the
    program's functional correctness
  • If custom instructions need extra inputs
    (outputs), care must be taken to select
    appropriate variables to write to (read from)
    user-defined registers

29
Example Illustration of Custom Instruction
Insertion
30
Example Illustration of Custom Instruction
Insertion (Contd.)
(a)
....
offset t 1
for (i = 0; i < 100; i++)
    j ....
    result offset i j
....

(b)
....
offset t 1
for (i = 0; i < 100; i++)
    j ....
    result CustomInstr(i,j)
....
Added in (b): WUR(offset,0)
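
The slide's before/after pair can be sketched in plain C so that the two versions can be compared on a host machine. Everything below is an assumption made for illustration: the arithmetic operators were lost in the transcript, so `offset + i * j` is a guess; the computation of `j` is a placeholder for the elided code; the user-defined register is emulated with a static variable; and `wur_emulated` and `custom_instr_emulated` are hypothetical stand-ins for `WUR` and `CustomInstr`. The sketch places the register write before the loop because `offset` does not change inside it.

```c
#include <assert.h>

#define N 100

/* Software stand-ins for the hardware pieces (hypothetical names). */
static int user_reg0;                              /* emulated user-defined register */
static void wur_emulated(int v) { user_reg0 = v; } /* stand-in for WUR(v, 0) */
static int custom_instr_emulated(int i, int j)     /* ASSUMED to compute offset + i * j */
{
    return user_reg0 + i * j;
}

/* (a) Original loop; one result per iteration is stored for comparison. */
void run_original(int t, int result[N])
{
    assert(result != 0);
    int offset = t + 1;
    for (int i = 0; i < N; i++) {
        int j = 2 * i + 3;        /* placeholder for the elided computation */
        result[i] = offset + i * j;
    }
}

/* (b) Rewritten loop: the extra input `offset` goes into the emulated user
 * register once, then the custom instruction takes only i and j. */
void run_rewritten(int t, int result[N])
{
    assert(result != 0);
    int offset = t + 1;
    wur_emulated(offset);
    for (int i = 0; i < N; i++) {
        int j = 2 * i + 3;
        result[i] = custom_instr_emulated(i, j);
    }
}

/* Returns 1 when both versions produce identical results for input t. */
int versions_agree(int t)
{
    int a[N], b[N];
    run_original(t, a);
    run_rewritten(t, b);
    for (int i = 0; i < N; i++)
        if (a[i] != b[i])
            return 0;
    return 1;
}
```

A check such as `versions_agree(41)` is a small-scale version of the functional-correctness verification that the insertion step has to preserve.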
31
Automatic Custom Instruction Generation Flow
32
Custom Instruction Combination Selection ---
Problem Statement
  • Given a set of non-overlapping custom
    instructions, with each instruction having
    several versions, find a version for each
    instruction such that performance is maximized
    while area is under a certain threshold
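
The problem statement has the shape of a multiple-choice knapsack: pick exactly one version per instruction, maximize total gain, keep total area within a budget. The sketch below solves it with a textbook dynamic program over integer area units. It is a swapped-in standard technique and not necessarily the procedure in the presentation's flow chart, which did not survive in the transcript. The types, the integer area units, the "cycles saved" measure of performance, and the sample numbers are all assumptions.

```c
#include <assert.h>

#define MAX_VERSIONS 4
#define MAX_AREA 256              /* budget in abstract integer area units */

typedef struct {
    int area;                     /* extra area of this version (>= 0) */
    int gain;                     /* cycles saved by this version (>= 0) */
} Version;

typedef struct {
    int n_versions;
    Version v[MAX_VERSIONS];
} CustomInstr;

/* Returns the maximum total gain with total area <= budget when exactly one
 * version is chosen per instruction, or -1 if no combination fits. */
int best_gain(const CustomInstr *ci, int n, int budget)
{
    assert(budget >= 0 && budget <= MAX_AREA);
    int best[MAX_AREA + 1], next[MAX_AREA + 1];

    /* best[a] = max gain using exactly area a so far, -1 if unreachable */
    for (int a = 0; a <= budget; a++)
        best[a] = -1;
    best[0] = 0;

    for (int k = 0; k < n; k++) {
        for (int a = 0; a <= budget; a++)
            next[a] = -1;
        for (int a = 0; a <= budget; a++) {
            if (best[a] < 0)
                continue;
            for (int j = 0; j < ci[k].n_versions; j++) {
                assert(ci[k].v[j].area >= 0 && ci[k].v[j].gain >= 0);
                int na = a + ci[k].v[j].area;
                int ng = best[a] + ci[k].v[j].gain;
                if (na <= budget && ng > next[na])
                    next[na] = ng;
            }
        }
        for (int a = 0; a <= budget; a++)
            best[a] = next[a];
    }

    int answer = -1;
    for (int a = 0; a <= budget; a++)
        if (best[a] > answer)
            answer = best[a];
    return answer;
}

/* Two hypothetical instructions, each with a small and a large version. */
int example_best_gain(int budget)
{
    const CustomInstr ci[2] = {
        {2, {{3, 10}, {6, 18}}},
        {2, {{2, 5}, {5, 12}}},
    };
    return best_gain(ci, 2, budget);
}
```

With a budget of 8 units the best choice in this example is the large version of the first instruction plus the small version of the second (area 8, gain 23). Leaving an instruction out entirely could be modelled as an extra version with zero area and zero gain.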

33
Custom Instruction Combination Selection --- Flow
Chart
34
Automatic Custom Instruction Generation Flow
35
Experimental Methodology
Flow inputs, tools and intermediate results: C Program, Aristotle, Automatic Custom Instruction Generation, Xtensa GNU Profiler, Xtensa TIE Compiler, Modified C program, Cross Compiler, Tensilica Processor Generator, ISS, Synopsys Design Compiler (appears twice in the flow), Sente WattWatcher
Measured outputs: Execution Cycles, Power, Area, Clock Period
36
Experimental Results (Contd.)
Average:
  • Performance improvement: 3.4X
  • Energy reduction: 3.2X
  • Energy-delay reduction: 12.6X
  • Area increase: 1.8
37
Conclusions
  • Automatic custom instruction synthesis for ASIPs
    • Template generation/selection
    • Custom instruction insertion
    • Custom instruction combination selection
  • Experimental results
    • 3.4X average performance improvement
    • 12.6X average energy-delay reduction