Instruction Generation For Hybrid Reconfigurable Architectures

1
Instruction Generation For Hybrid Reconfigurable Architectures
  • Philip Brisk, Adam Kaplan, Ryan Kastner*, Majid Sarrafzadeh
  • Computer Science Department, UCLA
  • *ECE Department, UCSB
  • October 11, 2002
  • CASES
  • Grenoble, France

2
Outline
  • What is Instruction Generation?
  • Related Work
  • Sequential and Parallel Templates
  • The Algorithm
  • Experimental Setup
  • Experimental Results
  • Conclusion and Future Work

3
Instruction Generation
  • Given a set of applications, what computations
    should be customized?

[Figure: an application-specific instruction-set processor, with an ALU, register bank, control logic, and customized (hard/soft) macros implemented in a PLD]
  • Main objective: find complex, commonly occurring computation patterns
  • Look for computational patterns at the instruction level
  • Basic operations are add, multiply, shift, etc.

4
Customization and Performance
  • A customized instruction must offer some
    measurable performance increase.
  • In this work, we categorize two types of customized instructions and quantify the performance each offers.
  • Sequential Instructions
  • Savings come from instruction-fetch reduction or from datapath optimization (e.g., ADD-ADD converted to a 3-input adder).
  • Parallel Instructions
  • Given multiple ALUs and data paths, allow data
    independent instructions to be computed
    simultaneously.

5
Problem Definition
  • Determining customized functionality reduces to regularity extraction
  • Regularity extraction: find common sub-structures (templates) in one graph or in a collection of graphs
  • Each application can be specified by a collection of graphs (CDFGs)
  • Templates are implemented as customized instructions
  • Related problem: instruction selection

6
What Is Instruction Generation?
The Instruction Selection Problem
[Figure: classic instruction selection, covering an IR expression tree with predefined machine-instruction templates (register-transfer patterns)]
Templates are given as inputs. But how do we determine the templates in the first place?
7
What Is Instruction Generation?
The Alternative: Instruction Generation
  • Reconfigurable architectures allow us to rethink
    the assumptions underlying our notion of
    instruction selection.
  • The target machine language can be changed by
    reconfiguring the FPGA to implement new
    instructions.
  • This presents new challenges for mapping IR to
    machine language.
  • We propose a scheme by which this mapping could
    be obtained at compile time.

8
What Is Instruction Generation?
Instruction Generation: Applications to CAD and Embedded System Design
  • Template Generation plays a role in the
    interaction between compilation and high-level
    synthesis.
  • Each template corresponds to a resource which
    must be provided by the underlying architecture.
  • A high-level synthesis tool can then allocate
    resources and schedule the operations on these
    resources.
  • This work investigates the latency-area tradeoff
    created by instruction generation.

9
Related Work
  • Similar techniques have proven beneficial in
    reducing area and increasing performance for the
    PipeRench Architecture (Goldstein et al. 2000)
  • Corazao et al. have shown that well-matched, regular templates can have a significant positive impact on critical path delay and clock speed
  • Kastner et al. (ICCAD 2001) formulated an algorithm for template matching as well as template generation for hybrid reconfigurable systems

10
Our Model of Computation: Control Data Flow Graphs
  • if (cond1) bb1(); else bb2();
  • bb3();
  • switch (test1) {
  •     case c1: bb4(); break;
  •     case c2: bb5(); break;
  •     case c3: bb6(); break;
  • }
  • bb7();

bb = basic block
11
Instruction Generation
  • The basic idea: an iterative process whereby we examine dataflow graphs and cluster combinations of nodes that occur frequently.
  • Ideally, we want large templates that occur often.
  • Sequential template generation: identifies templates where the IR operations have data dependencies between them.
  • Parallel template generation: identifies dataflow operations that may be scheduled in parallel.

12
Sequential Template Generation
  • Algorithm designed by Kastner et al. (ICCAD 2001).
  • Basic idea: examine each edge in the DFG. The type of an edge is the ordered pair of its source and sink node types.
  • Maintain a count for each edge type.
  • Cluster the most frequently occurring edge type by replacing each matching (head, tail) pair of vertices with a super-vertex, maintaining the original vertices in an internal DAG (a minimal sketch follows below).
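As a rough Python sketch of one such clustering iteration (the dictionary-based DAG representation, the node naming, and the one-pair-at-a-time contraction are our assumptions for readability, not the authors' implementation):

# Sketch: one iteration of sequential template generation (assumed
# representation: dag = {node_id: (op_type, [successor_ids])}).
from collections import Counter

def most_frequent_edge_type(dag):
    # Classify each DFG edge by its (source type, sink type) pair
    # and count how often each pair occurs.
    counts = Counter()
    for src, (src_type, succs) in dag.items():
        for dst in succs:
            counts[(src_type, dag[dst][0])] += 1
    return counts.most_common(1)[0][0] if counts else None

def cluster_one_edge(dag, edge_type):
    # Replace one matching (head, tail) pair with a super-vertex whose
    # "type" records the internal operations it absorbed.
    for src, (src_type, succs) in list(dag.items()):
        for dst in succs:
            if (src_type, dag[dst][0]) != edge_type:
                continue
            dst_type, dst_succs = dag[dst]
            new_id = src + "+" + dst
            new_succs = [s for s in succs if s != dst] + list(dst_succs)
            del dag[src]
            del dag[dst]
            dag[new_id] = ((src_type, dst_type), new_succs)
            # Redirect edges that pointed at the absorbed vertices.
            for n, (t, ss) in dag.items():
                dag[n] = (t, [new_id if s in (src, dst) else s for s in ss])
            return True
    return False

Repeating the count-and-cluster steps until a stopping condition holds yields the sequential templates.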

13
Sequential Template Generation
14
Parallel Template Generation
  • Instead of examining DFG edges, we must determine
    whether pairs of computations can be scheduled in
    parallel.
  • We introduce a data structure called the
    All-Pairs Common Slack Graph (APCSG) to help us
    with this analysis.
  • APCSG edges are placed between nodes that could
    possibly be scheduled together.
  • Two nodes can be scheduled at the same time if
    they share common slack between them.

15
All-Pairs Common Slack Graph (APCSG)
  • Common slack: the total number of time steps in which two operations x and y could both be scheduled by some scheduling heuristic.
  • The APCSG is an undirected graph:
  • Nodes correspond to operations
  • Edge weights represent the common slack between every pair of operations, as sketched below
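A minimal sketch, assuming common slack is the overlap of the two operations' ASAP/ALAP mobility windows and that directly data-dependent pairs receive no edge (the slide does not give the exact formula, so the details below are illustrative):

# Sketch: build an All-Pairs Common Slack Graph (APCSG).
# asap/alap map each operation to its earliest/latest feasible time step;
# deps is a set of (producer, consumer) data dependencies.
from itertools import combinations

def common_slack(asap, alap, x, y):
    # Time steps in which both x and y could legally be scheduled:
    # the overlap of their [ASAP, ALAP] mobility windows.
    lo = max(asap[x], asap[y])
    hi = min(alap[x], alap[y])
    return max(0, hi - lo + 1)

def build_apcsg(ops, asap, alap, deps):
    # Undirected graph: nodes are operations, edge weights are common slack.
    apcsg = {}
    for x, y in combinations(ops, 2):
        if (x, y) in deps or (y, x) in deps:
            continue  # data-dependent operations can never run in parallel
        s = common_slack(asap, alap, x, y)
        if s > 0:
            apcsg[(x, y)] = s
    return apcsg

# Hypothetical 4-operation example:
ops = ["a", "b", "c", "d"]
asap = {"a": 0, "b": 0, "c": 1, "d": 1}
alap = {"a": 1, "b": 0, "c": 2, "d": 2}
deps = {("a", "c")}  # c consumes a's result
print(build_apcsg(ops, asap, alap, deps))
# -> {('a', 'b'): 1, ('a', 'd'): 1, ('c', 'd'): 2}

Only direct dependencies are checked here; a full implementation would exclude transitively dependent pairs as well.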

16
All-Pairs Common Slack Graph (Example)
17
Parallel Template Generation Algorithm
1. Given a labeled digraph G(V, E)
2. T is a set of template types
3. T ← ∅
4. while not stop_conditions_met(G):
     I.   APCSG ← create_apcsg(G)
     II.  T ← determine_template_candidates(APCSG)
     III. cluster_vertices(G, T)
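As an illustration of step II, one plausible way to determine template candidates from the APCSG is to rank operation-type pairs by the total common slack on the edges joining them; the slide does not spell out the selection rule, so the following sketch is an assumption:

# Sketch: rank parallel-template candidates from an APCSG built as above.
# op_type maps each operation to its opcode (add, mul, shift, ...).
from collections import defaultdict

def determine_template_candidates(apcsg, op_type, top_k=1):
    # Sum the common slack carried by APCSG edges for each unordered
    # pair of operation types, then return the heaviest pairs.
    weight = defaultdict(int)
    for (x, y), slack in apcsg.items():
        weight[tuple(sorted((op_type[x], op_type[y])))] += slack
    ranked = sorted(weight.items(), key=lambda kv: kv[1], reverse=True)
    return [pair for pair, _ in ranked[:top_k]]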
18
Parallel Template Generation
19
Stopping Conditions
  • So when should we stop clustering a graph?
  • Aside from pragmatic arguments, a correct stopping condition is essential if we are to prove that our template generation algorithm is optimal with respect to some criterion.

20
Stopping Criteria We Have Considered
  • Percentage of nodes covered
  • Number of nodes left in the graph
  • Ratio of the number of nodes in the graph before and after clustering
  • Number of unique template types exceeds a given threshold
  • Templates exceed a given size
  • Percentage of overall slack lost in the graph over an iteration
Stopping Criteria We Have Used
  • Template sizes are restricted to be < 5 nodes total.
  • The algorithm stops when the total number of nodes is less than half of what we started with (see the sketch after this list).
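A minimal sketch of the two tests actually used, as listed above (the function names and the graph representation are ours):

# Sketch: stopping criteria used in this work.
MAX_TEMPLATE_NODES = 5  # templates must stay below 5 nodes total

def template_size_ok(template_ops):
    # Reject candidate templates that would reach the size cap.
    return len(template_ops) < MAX_TEMPLATE_NODES

def stop_conditions_met(graph, original_node_count):
    # graph is any collection whose length is the current node count.
    # Stop once clustering has collapsed more than half of the
    # original nodes into templates.
    return len(graph) < original_node_count / 2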

21
Scheduling Constraints
[Figure: a scheduler mapping clustered operations onto ALU1 over successive clock cycles]
Essentially, we have scheduled our operations at
the compiler level. What kind of job did we do?
22
Measuring The Damage
  • Length of schedule: the latency of all the operations
  • Ideally, we want it to be short.
  • We measure the resulting schedule length for three versions of each DAG:
  • The original, non-clustered DAG
  • Sequential templates only
  • Sequential and parallel templates

23
Experimental Setup
[Figure: experimental toolflow, comprising the SUIF compiler IR, a CDFG pass for data flow graph and DAG generation, the sequential template generation algorithm, a co-compiler, and a high-level synthesis tool using a locally optimal geometric scheduling algorithm]
24
Benchmarks
  • CONVOLUTION: an image convolution algorithm
  • DeCSS: an algorithm for breaking DVD encryption
  • DES: the cryptographic symmetric encryption standard for over 20 years
  • Rijndael (AES): the new Advanced Encryption Standard

25
Experimental Procedure
  • First, we compiled the program to the SUIF IR using the front end built by The Portland Group and Stanford University.
  • Next, we converted the SUIF IR to CDFG form.
  • Then, we performed template generation on each basic block of each program.
  • We selected 4 large dataflow graphs from each program to schedule and evaluate our results.
  • We scheduled the dataflow graphs following template generation and compared them to the original graphs.

26
Results
27
Conclusion And Future Work
  • The sequential template generation algorithm can
    be expanded to accommodate parallel templates.
  • Parallel template generation reduces latency at
    the expense of slack and area.
  • In the future, we plan to repeat these
    experiments
  • with a more realistic architecture description
  • with the ability to cross-schedule parallel instructions
  • We also plan to explore compiler transformations,
    such as function inlining, to
  • extract even more regularity
  • determine a more global view of the program