Ph.D. Research Plan Presentation - PowerPoint PPT Presentation

1
Ph.D. Research Plan Presentation

Anup Gangwar

2
Presentation Outline

Introduction and motivation
Specialization opportunities in VLIW processors
Methodology
Validation framework (supporting tools required)
Work plan
Status of work
References

3
Introduction

Why customize architectures?
General-purpose computing domain vs. embedded
Customization leads to cheaper design solutions
Architectural choices for exploiting ILP
Superscalar processors
Try to extract ILP at run time, hence complex hardware
Limited clock speeds and high power dissipation
Not suited to embedded applications
VLIW processors
Compiler has a lot of knowledge about the hardware
Compiler extracts ILP statically, hence simplified hardware
Possible to attain higher clock speeds

4
Introduction - Problems with VLIW Processors

Complex compiler required for extracting ILP
Adequate hardware support needed for compiler-controlled execution
Code size expansion due to explicit NOPs if:
The application does not contain enough parallelism
The compiler is not able to extract parallelism from the application
Need for good instruction encoding and NOP compression schemes
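
As an illustration of the NOP compression idea mentioned above, here is a minimal sketch of one common approach: a per-instruction presence mask that records which issue slots hold real operations, so that NOPs need not be stored. The slot layout, the `NOP` marker and the function names are assumptions made for this example; they are not taken from the presentation or from any specific encoding scheme.

```python
from typing import List, Optional, Tuple

# Hypothetical marker for an empty issue slot (an explicit NOP).
NOP = None


def compress(multiop: List[Optional[str]]) -> Tuple[int, List[str]]:
    """Return (mask, ops): bit i of mask is set when slot i holds a real op."""
    mask = 0
    ops: List[str] = []
    for slot, op in enumerate(multiop):
        if op is not NOP:
            mask |= 1 << slot
            ops.append(op)
    return mask, ops


def decompress(mask: int, ops: List[str], width: int) -> List[Optional[str]]:
    """Rebuild the full-width instruction, re-inserting NOPs where mask bits are 0."""
    remaining = iter(ops)
    return [next(remaining) if mask & (1 << slot) else NOP for slot in range(width)]


# A four-slot instruction with two useful operations and two NOPs.
multiop = ["ADD", NOP, "FMUL", NOP]
mask, ops = compress(multiop)
restored = decompress(mask, ops, len(multiop))
```

Only the mask and the two real operations are stored; a decompressor stage in the pipeline would expand them back to full width before issue.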

5
Presentation Outline

Introduction and motivation
Specialization opportunities in VLIW processors
Methodology
Validation framework (supporting tools required)
Work plan
Status of work
References

6
Specialization Opportunities -> FUs
7
Specialization Opportunities -> FUs (contd...)

Functional Unit Types
MISO or Multiple Input Single Output
MIMO or Multiple Input Multiple Output
MIMO with LD/ST or MIMOs with memory interaction
Rigid or flexible I/O timeshapes

8
Specialization Opportunities -> Reg. File

Single register file organization doesn't scale well
Area grows as N^3
Delay grows as N^(3/2)
Power grows as N^3
where N is the number of functional units connected to the register file
Clustered VLIW architectures are the solution
Each FU can read from/write to only a subset of registers
Data copying may increase execution latency
Powerful application analysis required to overcome the above-mentioned problems
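
To make the scaling argument concrete, the sketch below compares a single register file serving N functional units with a clustered organization, using the growth rates quoted on this slide (cubic in N for area and power, N to the power 3/2 for delay). The costs are unitless and relative, the FUs are assumed to divide evenly across clusters, and the cost of the inter-cluster interconnect and of copy operations is deliberately ignored; the function names are hypothetical.

```python
from typing import Dict


def rf_cost(n_fus: int) -> Dict[str, float]:
    """Relative cost of one register file connected to n_fus functional units."""
    return {
        "area": float(n_fus ** 3),
        "delay": float(n_fus ** 1.5),
        "power": float(n_fus ** 3),
    }


def clustered_cost(n_fus: int, n_clusters: int) -> Dict[str, float]:
    """Relative cost when the FUs are split evenly over n_clusters register files.

    Assumption: interconnect and inter-cluster copy costs are not modeled.
    """
    if n_clusters <= 0 or n_fus % n_clusters != 0:
        raise ValueError("n_fus must divide evenly across n_clusters")
    per_cluster = rf_cost(n_fus // n_clusters)
    return {
        "area": n_clusters * per_cluster["area"],    # one file per cluster
        "delay": per_cluster["delay"],               # files are accessed in parallel
        "power": n_clusters * per_cluster["power"],
    }


monolithic = rf_cost(8)
clustered = clustered_cost(8, 4)
area_ratio = monolithic["area"] / clustered["area"]
```

For 8 FUs in 4 clusters the register file area in this simplified model drops by a factor of C^2 = 16. The data copying between clusters is what the model leaves out, and that is where the application analysis comes in.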

9
Specialization Opportunities -> Reg. File (contd...)
Figure: A Clustered VLIW Architecture
10
Specialization Opportunities -> Interconnect

Clustering FUs together requires deciding the interconnection network (ICN)
between different clusters
between clusters and memory
Analysis of data access patterns required for evaluating cost-performance tradeoffs
Current ASIP vendors do not offer customizable interconnects

11
Specialization Opportunities -> Encoding

Instruction encoding/decoding scheme affects
Code size
Object code compatibility
Branch misprediction penalty
Hardware cost
Address specification in code size
Each UniOp is equivalent to a RISC/CISC instruction

Figure: a MultiOp composed of four UniOps
12
Specialization Opportunities -> Encoding (contd...)

IALU.0  IALU.1  FALU.0  BU.0
ADD     NOP     FMUL    NOP
Figure: NOPs in a MultiOp

Figure: VLIW Processor Pipeline with Instruction Decompressor
13
Specialization Opportunities -> Summary
14
Presentation Outline

Introduction and motivation
Specialization opportunities in VLIW processors
Methodology
Validation framework (supporting tools required)
Work plan
Status of work
References

15
Existing Methodologies -> Simulation Driven
16
Task Set and Constraints
Architecture Description
Application Parameter Extraction
Architecture Design Space Exploration
Retargetable Compiler
Instruction Encoding Specialization
Validation (Simulation with encoded instructions)
Architecture Description (Output to synthesizer)
VLIW ASIP Synthesis Methodology
17
Presentation Outline

Introduction and motivation
Specialization opportunities in VLIW processors
Methodology
Validation framework (supporting tools required)
Work plan
Status of work
References

18
Validation Framework -> Trimaran
C Program
Bridge Code
IMPACT

ANSI C Parsing
Code profiling
Classical machine independent optimizations
Block formation

ELCOR

Machine dependent code optimizations
Code scheduling
Register allocation

ELCOR IR
SIMULATOR Generator
Generated Simulator (Statistics)

ELCOR IR to low level C files
HPL-PD virtual machine
Cache simulation
Performance statistics

Compute and stall cycles
Cache stats
Spill code info

HMDES Machine Description
19
Validation Framework -> Trimaran (contd...)
REBEL
Low level C files
C libraries
Emulation Library
Code Processor
HMDES
Native Compiler
Executable for the host platform
20
Validation Framework -> Retargetable Assembler
Instruction Encoding Description
Toolkit Generator
Generated Assembler
Assembly Instructions
Object Code
To Simulator (for simulation with encoded instructions)
21
Presentation Outline

Introduction and motivation
Specialization opportunities in VLIW processors
Methodology
Validation framework (supporting tools required)
Work plan
Status of work
References

22
Work Plan -> Interconnect/RF/FU Specialization

Initially model the interconnect problem as an integer linear program (ILP) and later move to other solutions
Code selection problem in compilers is similar to identifying compute-intensive parts for AFUs
Number and type of FUs have not been properly explored
RF clustering problem has not been dealt with elsewhere
Jacome et al.
Deal with Interconnect/RF/FU specialization simultaneously
Operation chaining is not considered

23
Work Plan -> Encoding/Decoding Specialization

Goal is to be able to generate encoding schemes automatically
Work of Shail Aditya et al.
Basically a parameterized encoding scheme
Techniques specific to the HPL-PD architecture
Do not address dynamic code size minimization
Encoding template is fixed; exploration is limited to the template design space
Various encoding templates need to be explored; the template itself may also be derived from the application
Dynamic code size minimization needs to be considered

24
Presentation Outline

Introduction and motivation
Specialization opportunities in VLIW processors
Methodology
Validation framework (supporting tools required)
Work plan
Status of work
References

25
Work Status -> Specialized FUs in Trimaran

Modeling MISOs
Model as external function calls
Find the calls in the Trimaran bridge code and replace them with the AFU op
Model the new AFU in MDES with the required ops
Introduce the semantics in the simulator op definitions file
Modeling MIMOs
Model as external function calls returning void
Find the calls in the Trimaran bridge code and replace them with the AFU op
Explicitly reserve registers in C code for returning values
Introduce the operation semantics in the simulator op definitions file

26
Work Status -> Specialized FUs in Trimaran (contd...)

Modeling MIMOs with LD/ST
Model as regular MIMOs
Memory interaction with block LD/ST at beginning and end of execute cycles
Additionally
Possible to impose register file constraints
Various I/O timeshapes, rigid or flexible
Possible to introduce pipelined functional units

27
Work Status -> Instruction Enc. in Trimaran
28
Work Status -> Instruction Enc. in Trimaran (contd...)

New Jersey Machine Code Toolkit (NJMC)
Deals with bits at symbolic level
Can be used to write assemblers/disassemblers
Specification in SLED (Specification Language for Encoding/Decoding)
Model instruction decompressor in HMDES
Instrument ELCOR to generate assembly code
Encoding is done using procedures generated by NJMC
Problems with NJMC
VLIW instructions need to be broken up into 32-bit tokens
Encoded instructions must end on an 8-bit boundary
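
The two NJMC constraints listed above can be illustrated with a small sketch that pads an encoded VLIW instruction to the next 8-bit boundary and then cuts it into tokens of at most 32 bits. This is not NJMC or SLED code. The bit-string representation, the zero padding at the end, the short final token and the function name are all assumptions made for the example.

```python
from typing import List


def to_tokens(bits: str, token_bits: int = 32, align_bits: int = 8) -> List[str]:
    """Pad a bit string to a multiple of align_bits, then split into tokens.

    Assumptions: padding is zero bits appended at the end, and the last
    token may be shorter than token_bits.
    """
    if set(bits) - {"0", "1"}:
        raise ValueError("bits must contain only '0' and '1'")
    pad = (-len(bits)) % align_bits          # bits needed to reach the boundary
    padded = bits + "0" * pad
    return [padded[i:i + token_bits] for i in range(0, len(padded), token_bits)]


# A hypothetical 76-bit encoded instruction.
instruction = "1" * 76
tokens = to_tokens(instruction)
token_lengths = [len(t) for t in tokens]
```

Here the 76-bit instruction is padded to 80 bits and split into tokens of 32, 32 and 16 bits. The padding bits are pure overhead, which is one reason the byte-boundary restriction matters for code size.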

29
Work Status -> Code Gen. for Clustered ASIPs

ELCOR
Disadvantages
ELCOR is heavily oriented towards the HPL-PD architecture
Does not support clustered VLIW architectures
Advantages
Strong optimizing compiler
Rich library to deal with the IR
IMPACT compiler system offers another choice for building a backend
Feasibility study being carried out to decide the direction of work

30
Presentation Outline

Introduction and motivation
Specialization opportunities in VLIW processors
Methodology
Validation framework (supporting tools required)
Work plan
Status of work
References

31
References

Bhuvan Middha, Varun Raj, Anup Gangwar, M. Balakrishnan, Anshul Kumar and Paolo Ienne, A Trimaran based framework for exploring design space of VLIW ASIPs with coarse grain FUs, ISSS-2002.
Anup Gangwar, M. Balakrishnan and Anshul Kumar, A framework for studying the effect of VLIW processor instruction encoding and decoding schemes, Mini Project, Dept. of CSE.
M. Jacome and G. de Veciana, Design challenges for new application specific processors, IEEE Design and Test of Computers-2000.
B. Ramakrishna Rau and Michael S. Schlansker, Embedded computer architecture and automation, IEEE Computer-2001.
Michael S. Schlansker and B. Ramakrishna Rau, EPIC: An architecture for instruction-level parallel processors, HPCA-2000.
N. G. Busa, A. van der Werf and M. Bekooij, Scheduling coarse grain operations for VLIW processors, ASPDAC-1998.
Shail Aditya, Scott A. Mahlke and B. Ramakrishna Rau, Code size minimization and retargetable assembly for custom EPIC and VLIW processors, ISSS-1999.