Exploiting Forwarding to Improve Data Bandwidth of Instruction-Set Extensions

Title: Exploiting Forwarding to Improve Data Bandwidth of Instruction-Set Extensions


1
Exploiting Forwarding to Improve Data Bandwidth
of Instruction-Set Extensions
  • Ramkumar Jayaseelan, Haibin Liu, Tulika Mitra
    School of Computing, National University of Singapore
    {ramkumar, liuhb, tulika}@comp.nus.edu.sg

Presented by Alex Oumantsev
2
Exploiting Forwarding to Improve Data Bandwidth
of Instruction-Set Extensions
  • Introduce the material
  • Related Work
  • Proposed Architecture
  • Compilation Toolchain
  • Experimental Evaluation
  • Conclusion

3
Application-Specific instruction-set extensions
(Custom Instructions)
  • Extend the instruction-set architecture
  • Balance performance and time-to-market
  • Frequently used computation patterns
  • Custom Functional Units
  • Parallelization and chaining of operations
  • Processor support (RISC-style)
  • Altera Nios-II
  • Tensilica Xtensa

4
Base Processor / Custom Instruction Mismatch
  • RISC-style
  • Fixed-length instructions
  • Two input operands per instruction
  • Custom Instructions
  • Complex
  • Multiple inputs per operation

5
Number of Inputs per Custom Instruction
6
Data Forwarding
  • Present on a typical RISC processor
  • Register Bypassing
  • Supplies data to a Functional Unit from buffer
  • Resolves Data hazards between instructions
  • Input operands for Custom Instruction
  • Use existing Logic
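
As a rough illustration of the bypass logic listed above, the following sketch models a forwarding multiplexer for a 5-stage pipeline. The names `Latch` and `read_operand` are hypothetical and chosen for this example only; the sketch assumes results waiting in the EX/MEM and MEM/WB latches, and is not taken from the presentation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Latch:
    """Pipeline latch holding a result that has not been written back yet."""
    dest: Optional[int]  # destination register number, None if no result
    value: int = 0


def read_operand(src: int, regfile: List[int], ex_mem: Latch, mem_wb: Latch) -> int:
    """Return the newest value of register `src` (the forwarding mux)."""
    if ex_mem.dest == src:      # youngest producer takes priority
        return ex_mem.value
    if mem_wb.dest == src:
        return mem_wb.value
    return regfile[src]         # no hazard: use the register file


# Example state: r3 in the register file is stale, the new value is in EX/MEM.
regfile = [0] * 32
regfile[3] = 7
ex_mem = Latch(dest=3, value=42)
mem_wb = Latch(dest=5, value=9)
```

Here `read_operand(3, regfile, ex_mem, mem_wb)` picks the value from the EX/MEM latch instead of the stale register-file entry, which is the data-hazard case forwarding resolves.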

7
Related Work
  • Design Space Exploration
  • Data Bandwidth
  • Nios-II: internal register files
  • Extra cycles wasted on explicit MOV
  • MicroBlaze: Xilinx Fast Simplex Link
  • put and get instructions
  • Relaxing register file port constraints
  • Fixed length instruction problem

8
Proposed Architecture
  • MIPS-like 5 stage pipeline

9
Data Forwarding
  • CUST instruction draws 2 inputs from the
    forwarding paths
  • Able to take up to 4 inputs
  • Modification: do not read from the register
    file in ID if the operand is forwarded
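
The operand gathering described on this slide might be sketched as follows. This is a simplified model under stated assumptions: the function name `gather_cust_inputs`, the `("reg", n)` / `("fwd", stage)` operand tuples and the stage labels are all hypothetical, and the limit of two register reads reflects the two read ports of the base processor.

```python
def gather_cust_inputs(operands, regfile, fwd):
    """Collect the inputs of a custom instruction.

    operands: list of ("reg", register_number) or ("fwd", stage_name) pairs
    regfile:  list of register values
    fwd:      dict mapping a stage name to the value on that forwarding path
    Returns (values, number_of_register_file_reads).
    """
    if len(operands) > 4:
        raise ValueError("a custom instruction takes at most 4 inputs")
    values, reg_reads = [], 0
    for kind, ref in operands:
        if kind == "fwd":
            values.append(fwd[ref])        # forwarded: no register read in ID
        else:
            reg_reads += 1
            values.append(regfile[ref])
    if reg_reads > 2:
        raise ValueError("only two register read ports are available")
    return values, reg_reads


regfile = list(range(100, 132))            # r0..r31 hold 100..131 (made-up data)
fwd = {"EX/MEM": 10, "MEM/WB": 20}
operands = [("reg", 1), ("reg", 2), ("fwd", "EX/MEM"), ("fwd", "MEM/WB")]
values, reg_reads = gather_cust_inputs(operands, regfile, fwd)
```

With two operands from the register file and two from the forwarding paths, the instruction receives four inputs while using only the two existing read ports.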

10
Instruction Encoding
  • Transparent to regular instructions
  • Minimize number of bits for operands
  • Nios-II example
  • Use the 11 bits of the OPX field
  • OPD defines which operands come from forwarding
  • COP specifies the custom instruction
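
To make the field packing concrete, here is a small sketch of encoding and decoding an 11-bit OPX field. The slide does not say how the 11 bits are divided between OPD and COP, so the 4-bit/7-bit split and the bit positions below are purely illustrative assumptions, as are the function names.

```python
OPX_BITS = 11
OPD_BITS = 4                     # ASSUMPTION: width not given on the slide
COP_BITS = OPX_BITS - OPD_BITS   # remaining bits select the custom instruction


def encode_opx(cop: int, opd: int) -> int:
    """Pack COP (low bits) and OPD (high bits) into one OPX value."""
    if not 0 <= cop < (1 << COP_BITS):
        raise ValueError("COP does not fit")
    if not 0 <= opd < (1 << OPD_BITS):
        raise ValueError("OPD does not fit")
    return (opd << COP_BITS) | cop


def decode_opx(opx: int):
    """Split an OPX value back into (COP, OPD)."""
    return opx & ((1 << COP_BITS) - 1), opx >> COP_BITS


opx = encode_opx(cop=5, opd=0b0011)
```

Because only the OPX field is reinterpreted, regular instructions keep their existing encoding, which is the transparency goal stated on the slide.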

11
Predictable Forwarding
  • Two prior instructions can be used
  • Problems with multicycle operations and cache misses
  • They create bubbles in the pipeline
  • Can't rely on forwarding
  • Modification: send a stall signal to all stages
  • Pauses the pipeline till ready
  • No need for NOP instruction
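
The effect of stalling every stage at once can be illustrated with a toy pipeline model. This is a minimal sketch, not the presented hardware: the functions `step` and `run`, the instruction labels and the choice of stall cycles are hypothetical. It shows that a global stall freezes all stages together, so the two producers stay exactly one and two stages ahead of the custom instruction.

```python
def step(pipeline, next_instr, stall):
    """Advance a 5-stage pipeline [IF, ID, EX, MEM, WB] by one cycle."""
    if stall:
        return list(pipeline)              # every stage holds; no bubble appears
    return [next_instr] + pipeline[:-1]


def run(program, stall_cycles):
    """Simulate `program`, stalling all stages in the given cycle numbers."""
    pipeline = [None] * 5
    pending = list(program)
    snapshots = []
    cycle = 0
    while pending or any(pipeline):
        stall = cycle in stall_cycles
        nxt = pending.pop(0) if (pending and not stall) else None
        pipeline = step(pipeline, nxt, stall)
        snapshots.append(tuple(pipeline))
        cycle += 1
    return snapshots


program = ["i1", "i2", "cust"]
no_stall = run(program, stall_cycles=set())
with_stall = run(program, stall_cycles={3, 4})   # e.g. a 2-cycle cache miss
```

In both runs, whenever `cust` occupies the EX stage, `i2` is in MEM and `i1` is in WB, so the forwarded operands are where the encoding says they are; the stalled run simply takes two more cycles.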

12
Multicycle Delays
13
Cache Miss Delays
14
Compilation Toolchain
  • Compiler cooperation needed
  • Determine if operand can be forwarded
  • Encode custom instruction correctly
  • Schedule to maximize forwarding

15
Compilation Toolchain
  • IR Scheduling
  • Pattern Identification
  • Identify all possible patterns for custom
    instructions
  • Pattern Selection
  • Heuristic: pattern priority = speedup × frequency
  • Instruction Scheduling
  • Find optimal scheduling with forwarding
  • Forwarding Check and MOV Insertion
  • Insert MOV from x reg to x reg if needed
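
The final step, forwarding check and MOV insertion, could look roughly like the sketch below. It is a simplified illustration and not the authors' algorithm: it assumes a forwarded operand must be written by one of the two instructions immediately before the custom instruction, and it inserts `MOV r, r` right before the custom instruction when that is not the case. The function name and the `(opcode, dest_register)` schedule format are hypothetical.

```python
def insert_movs(schedule, cust_index, forwarded):
    """Return a schedule in which every register in `forwarded` is written by
    one of the two instructions immediately before the custom instruction.

    schedule:  list of (opcode, dest_register) pairs
    forwarded: set of at most two registers read from the forwarding paths
    """
    before = schedule[:cust_index]
    for k in range(3):                         # try 0, 1, then 2 inserted MOVs
        # Original instructions that remain in the 2-instruction window
        window = before[max(0, len(before) - (2 - k)):]
        missing = sorted(forwarded - {dest for _, dest in window})
        if len(missing) <= k:
            movs = [("MOV", reg) for reg in missing]   # MOV reg, reg
            return before + movs + schedule[cust_index:]
    raise ValueError("more than two forwarded operands")


schedule = [("ADD", "r1"), ("SUB", "r2"), ("LW", "r3"), ("CUST", "r9")]
unchanged = insert_movs(schedule, 3, {"r2", "r3"})
one_mov = insert_movs(schedule, 3, {"r1", "r3"})
two_movs = insert_movs(schedule, 3, {"r1", "r2"})
```

When the scheduler has already placed both producers directly before the custom instruction, no MOV is added; each MOV that is needed costs one extra cycle, which is why the scheduling step tries to maximize forwarding.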

16
Experimental Evaluation
  • SimpleScalar tool set used
  • Constraint of max 4 inputs and one output
  • Selected benchmarks

17
Speedup
  • Speedup = (CycleOrigin / CycleEx - 1) × 100
  • Ideal: 4 read ports from registers
  • Forwarding: discussed solution (may have MOV)
  • MOV: Nios-II implemented solution (forces MOV)
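
As a worked example of the speedup metric, reading it as (original cycles / cycles with extensions - 1) × 100, the helper below computes the percentage. The function name and the cycle counts are made up for illustration and are not results from the evaluation.

```python
def speedup_percent(cycles_original: int, cycles_extended: int) -> float:
    """Speedup in percent of the extended processor over the original one."""
    return (cycles_original / cycles_extended - 1) * 100


# Hypothetical cycle counts: 1500 cycles without and 1000 with custom instructions.
example = speedup_percent(1500, 1000)   # 50.0
```

A value of 0 means no improvement, and 50 means the original run took one and a half times as many cycles as the extended one.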

18
Energy Consumption
  • Energy used by Registers
  • Ideal: 4 read ports from registers
  • Forwarding: discussed solution (may have MOV)
  • MOV: Nios-II implemented solution (forces MOV)

19
Conclusion
  • Compiler modification
  • Minor pipeline modification
  • Data Forwarding used for MISO custom instructions
  • Overcome limited register ports
  • Compatible instruction encoding
  • Near-ideal speedup