HLS Challenges and Proposed Solutions - PowerPoint PPT Presentation

Provided by: creat4
Title: HLS Challenges and Proposed Solutions


1
HLS Challenges and Proposed Solutions
  • Presented by
  • Gagan Raj Gupta (2001107)
  • Madhur Gupta (2001343)

2
Overview
  • HLS vs Low level Design Methodology
  • HDL CAD Tools
  • Benefits
  • No commercial tool till now
  • Challenging Tasks
  • Handling I/O
  • Timing Challenges
  • Memory Synthesis
  • Interconnect cost control
  • FSM Delay Estimation

3
HLS vs Low Level Design Methodology
  • The size of digital ICs has grown
  • HDLs have replaced schematic design
  • Tools for simulation and synthesis have come to help
  • HLS gives hope of keeping pace with Moore's law
    through a 10X boost in productivity

4
Benefits of HLS
  • Far fewer lines of code
  • Faster specification and debugging
  • Faster Simulation
  • Facilitated by the lower level of detail
  • Flexibility in reconfiguring and reusing modules
  • Technology independent
  • Can be ported to any technology or architecture
    (by varying implementation phase)
  • Can meet a wide variety of performance, cost and
    power goals

5
Level of Success
  • Though efforts are ongoing, no company has yet
    successfully marketed a production-worthy design
    tool flow that can produce and verify high-quality
    designs from behavioral descriptions.
  • Synopsys has developed Behavioral Compiler.
  • There are some efforts in academia, such as SPARK.

6
Challenging tasks for HLS
  • Handling I/O
  • Timing Challenges
  • Memory Synthesis
  • FSM Delay Estimation

7
I/O Scheduling Modes
  • Cycle-fixed
  • cycle-by-cycle I/O behavior fixed
  • not allowed to add extra FSM states!
  • Superstate-fixed
  • extra states can be added if necessary

8
Cycle-fixed scheduling
  • wait statements mark cycle boundaries
  • I/O operations between two waits are constrained
    to be scheduled into same cycle
  • reading and writing of ports
  • requires careful analysis!
  • Non-I/O operations can be scheduled anywhere
  • arithmetic/logical operations

9
Contd
    wait();          // S1
    if (x) {
        p = 1;
        wait();      // S2
    } else {
        p = 2;
        q = 1;
    }
    r = 1;
    wait();          // S3
  • if a wait occurs in one branch of a
    conditional, it should also occur in all other
    branches
  • is this reasonable?
  • why impose this?

Problems with further scheduling
10
Contd
  • Wait imposed by coding-style restriction
  • Leads to only simple actions on each transition
  • easier to generate FSM
  • New state in FSM
  • Unwanted Addition!
  • changes behavior
    wait();          // S1
    if (x) {
        p = 1;
        wait();      // S2
    } else {
        p = 2;
        q = 1;
        wait();      // S4
    }
    r = 1;
    wait();          // S3

New State
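
The coding-style rule from the last two slides (a wait in one branch of a conditional forces a wait in every other branch) can be sketched as a small tree rewrite. This is only an illustration: the program representation (strings for statements, a tuple `("if", then_block, else_block)` for conditionals) and the function names are assumptions made for this sketch, not the format of any real HLS tool.

```python
def has_wait(block):
    """Return True if the block, or any conditional nested in it, contains a wait."""
    for stmt in block:
        if stmt == "wait":
            return True
        if isinstance(stmt, tuple) and any(has_wait(b) for b in stmt[1:]):
            return True
    return False


def balance_waits(block):
    """Append a wait to every branch that lacks one when a sibling branch has one."""
    out = []
    for stmt in block:
        if isinstance(stmt, tuple):  # ("if", then_block, else_block)
            branches = [balance_waits(b) for b in stmt[1:]]
            if any(has_wait(b) for b in branches):
                # Each appended wait is a new FSM state (S4 in the example).
                branches = [b if has_wait(b) else b + ["wait"] for b in branches]
            out.append(("if", *branches))
        else:
            out.append(stmt)
    return out


program = [
    "wait",                                           # S1
    ("if", ["p = 1", "wait"], ["p = 2", "q = 1"]),    # S2 only in the then-branch
    "r = 1",
    "wait",                                           # S3
]
balanced = balance_waits(program)
```

Here `balanced` carries an extra `"wait"` at the end of the else-branch, which is the unwanted new state: the I/O behavior on that path is now one cycle longer than the designer wrote.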
11
Mux inputs before FUs
  • If binding follows scheduling...
  • an FU may perform different operations in
    different control steps
  • MUX implied at FU input
  • MUX delay needs to be accounted for by scheduler
  • but it's too early! The scheduler cannot know the MUX size

12
Accounting for MUX Delays at FU Inputs
  • Solution: integrate FU binding with scheduling
  • Iterate Scheduling and Binding
  • assume some MUX size (e.g., 4x1) during
    scheduling
  • binding is constrained
  • if binding cannot find a solution, increase MUX
    size (e.g., 8x1) and reschedule
  • Results may be pessimistic
  • Worst-case delay is assumed for the MUXes
  • Binding is constrained
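
A minimal sketch of the iterate-and-grow loop above, under strong simplifying assumptions that are not from the slides: operations are independent, every operation has a distinct source (so the number of operations bound to an FU equals its input MUX size), and MUX delay grows by a fixed hypothetical `level_delay` per doubling of inputs. All function names are made up for illustration.

```python
import math


def mux_delay(n_inputs, level_delay=0.4):
    """Assumed model: a tree of 2:1 muxes, one level per doubling of inputs."""
    return level_delay * math.ceil(math.log2(n_inputs)) if n_inputs > 1 else 0.0


def schedule(op_delays, clock, mux_inputs, n_fus):
    """Greedy schedule: each op holds one FU long enough for FU delay + assumed MUX delay."""
    free_at = [0] * n_fus          # first free control step of each FU
    load = [0] * n_fus             # number of ops placed on each FU
    for delay in op_delays:
        cycles = math.ceil((delay + mux_delay(mux_inputs)) / clock)
        fu = min(range(n_fus), key=lambda i: free_at[i])
        free_at[fu] += cycles
        load[fu] += 1
    return load, max(free_at)


def schedule_and_bind(op_delays, clock, n_fus, start_size=4, max_size=64):
    """Assume a MUX size, schedule, check the binding constraint, grow the MUX on failure."""
    size = start_size
    while size <= max_size:
        load, latency = schedule(op_delays, clock, size, n_fus)
        if max(load) <= size:      # binding fits within the assumed MUX size
            return size, latency
        size *= 2                  # e.g. 4x1 -> 8x1, then reschedule
    raise ValueError("no binding found within max_size")


result = schedule_and_bind([2.0] * 10, clock=3.0, n_fus=2)
```

With ten 2.0-unit operations on two FUs and a 3.0-unit clock, the 4x1 assumption fails binding (five operations per FU), and the retry with 8x1 pushes every operation to two cycles. That extra cycle is the pessimism the slide mentions: the worst-case MUX delay is charged to every operation.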

13
MUX Inputs before Registers
  • A similar problem arises because register
    allocation follows scheduling
  • The same register may hold results from several
    FUs

14
Accounting for MUX Delays at Register Inputs
  • Solution: integrate all steps
  • Scheduling
  • FU binding
  • Register Allocation
  • assume some MUX size (e.g., 4x1) during
    scheduling
  • FU binding and register allocation are
    constrained
  • if no solution, increase MUX size (e.g., 8x1) and
    reschedule
  • Results may be pessimistic

15
In short
  • At the time of scheduling, operations are assigned
    to control steps
  • Need to know the amount of cycle time available
    to FUs
  • Wire delays are unknown at the time of scheduling
  • Fanout from FUs varies; wire lengths are not known
  • FSM delays are unknown
  • The number of states and the number of status and
    control signals are unknown
  • Register-FU interconnection is complex
  • Register-FU interconnection is complex

16
Memory Synthesis
  • Motivation
  • Reduce the complexity of interconnect between
    registers and FUs
  • Modeling timing for accessing registers (useful
    during scheduling)
  • Optimizing interconnection cost and memory
    utilization cost

17
Older Approach
  • Two steps
  • Grouping the variables or registers to form
    memory modules
  • Determining the interconnection between memory
    modules and functional units
  • But the optimization is weak because there is no
    precise way to predict the result of step 2
    during step 1

18
Newer Approach
  • Start with one fictitious large memory
  • Assign the variables to the ports and ports to
    the functional units in each control step so as
    to minimize interconnection cost.
  • Partition the variables and ports to form memory
    modules, minimizing the cost of memory modules
  • Primary focus on minimizing the interconnection
    cost

19
Step 1
  • Each variable is mapped to a unique register
  • At each control step
  • Assign the active variable to port (internal
    connection)
  • Assign the ports to corresponding functional unit
    terminal (external connection)
  • The number of internal connections is related to
    memory module cost.
  • Heuristics are used (e.g. commutative operations)

20
Cost of interconnection
  • E is the number of external connections
  • I is the number of internal connections
  • c is the variable parameter adjusted to reach
    minimal cost
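
As a rough illustration of how E and I could be counted from a Step 1 assignment, the sketch below treats each access as a hypothetical `(variable, port, fu_terminal)` triple per control step. The triple format and the terminal names are assumptions; how E, I and c combine into a single cost is not shown in this transcript, so the sketch stops at the two counts.

```python
def count_connections(steps):
    """Return (E, I): distinct port-to-terminal and variable-to-port connections."""
    internal = set()   # variable -> port (inside the memory)
    external = set()   # port -> FU terminal (outside the memory)
    for step in steps:
        for variable, port, terminal in step:
            internal.add((variable, port))
            external.add((port, terminal))
    return len(external), len(internal)


# Step 2 reads a and c into an adder; variable a is routed through port p2.
naive = [
    [("a", "p1", "add.in1"), ("b", "p2", "add.in2")],
    [("c", "p1", "add.in1"), ("a", "p2", "add.in2")],
]
# Addition is commutative, so the operands of step 2 can be swapped
# to let a reuse its existing connection to p1.
swapped = [
    [("a", "p1", "add.in1"), ("b", "p2", "add.in2")],
    [("a", "p1", "add.in1"), ("c", "p2", "add.in2")],
]
naive_counts = count_connections(naive)
swapped_counts = count_connections(swapped)
```

In this toy case the operand swap leaves E at 2 and lowers I from 4 to 3, which is the kind of saving the commutative-operation heuristic aims for.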

21
Step 2
  • Construct a graph from Step 1
  • Node → Port
  • Edge between two ports → there is a variable
    that is accessed through both ports
  • Divide the graph into connected components
  • Solve the problem of allocating memory modules by
    solving 2-D bin packing problem
  • Two dimensions are the number of registers and
    the number of ports in the module
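
The graph construction and component split above can be sketched with a union-find over ports. The input format (a list of hypothetical `(variable, port)` access pairs taken from Step 1) and the function name are assumptions for illustration; the bin-packing step itself is not implemented here.

```python
from collections import defaultdict


def port_components(accesses):
    """Group ports that share a variable; return (registers, ports) per component."""
    accesses = list(accesses)
    parent = {}

    def find(port):
        parent.setdefault(port, port)
        while parent[port] != port:
            parent[port] = parent[parent[port]]   # path halving
            port = parent[port]
        return port

    first_port = {}
    for variable, port in accesses:
        find(port)                                # register the node
        if variable in first_port:
            # Same variable seen through two ports: add an edge.
            parent[find(first_port[variable])] = find(port)
        else:
            first_port[variable] = port

    ports = defaultdict(set)
    registers = defaultdict(set)
    for variable, port in accesses:
        root = find(port)
        ports[root].add(port)
        registers[root].add(variable)
    return sorted((len(registers[r]), len(ports[r])) for r in ports)


sizes = port_components([("a", "p1"), ("a", "p2"), ("b", "p2"), ("c", "p3")])
```

Each returned pair is one item for the 2-D bin packing: its two dimensions are the number of registers and the number of ports that the component would need from a memory module.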

22
Algorithm Flow
23
Estimating FSM Delay
  • FSM description not available before scheduling
  • FSM delays have to be estimated from high-level
    details, such as
  • Application size
  • Application code structure
  • Resource constraints
  • Number of variables
  • Types of operations

24
Formulation
  • Experimental verification of these suppositions
  • Slight modification of an earlier formulation
    (Neeraj Singh's work) to estimate delay with more
    precision
  • Coefficients determined by the underlying
    technology

25
Correlation of identified factors with FSM
characteristics
  • Code size → Number of states
  • Number of variables → Number of control signals
    (increase in registers and mux size)
  • Types of operations and resource constraints →
    Number of control signals (more control signals
    for the FUs)

26
References
  • Kelvin Morris, FPGA and Programmable Logic
    Journal
  • Raul Camposano, extended abstract from DAC-33
  • Taewhan Kim and C. L. Liu, "Utilization of
    Multiport Memories in Data Path Synthesis",
    CS Department, UIUC
  • Lecture slides by our teacher, Dr. P. R. Panda

27
Thank You