HLS-l: High-Level Synthesis of High Performance Latch-Based Circuits

1
HLS-l: High-Level Synthesis of High Performance
Latch-Based Circuits
  • Seungwhun Paik, Insup Shin, and Youngsoo Shin
  • Dept. of Electrical Engineering, KAIST, KOREA

2
Outline
  • Motivation & main idea
  • Latch-based high-level synthesis: HLS-l
    • Scheduling
    • Register allocation
    • Control synthesis
    • Optimize duty cycle
  • Experimental results
  • Conclusion

3
Motivation
  • Large performance gap between custom designs and
    ASICs
    • Microarchitecture, circuit style, cell design,
      coping with process variation, sequencing
      overhead, etc.
  • Latch-based designs
    • Pros: lower sequencing overhead; transparency
      offers time borrowing
    • Cons: complicated timing analysis, more glitches

D. Chinnery et al., Closing the Gap Between ASIC & Custom,
Kluwer Academic Publishers, 2002.

4
Main Idea of HLS-l
  • Schedule operations at both edges of clock
  • Scheduling is done in a finer granularity
  • Control signals are generated on a per-phase-step
    (p-step) basis


[Figure: a control-step (c-step)]

5
Main Idea of HLS-l

[Figure: conventional scheduling vs. proposed scheduling]

6
Operation Delay (c-step)
  • Conventional c-step based scheduling
    • Execution delay of operation i is given as a number of
      c-steps, in terms of:
    • $T_{clk}$: clock period
    • $D_{FU}(i)$: max. delay of the FU that computes OP i
    • $D_{margin}$: extra delay through the data-path
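
The formula itself did not survive in this transcript. A minimal sketch, assuming the usual ceiling form $\lceil (D_{FU}(i) + D_{margin}) / T_{clk} \rceil$ suggested by the three quantities defined above; the function name `cstep_delay` and the use of integer time units are illustrative choices, not taken from the slides:

```python
def cstep_delay(d_fu: int, d_margin: int, t_clk: int) -> int:
    """Number of c-steps needed by an operation (assumed ceiling formula).

    All times are integers in a common unit (e.g. 0.1 ns) so that the
    ceiling division is exact.
    """
    total = d_fu + d_margin
    return -(-total // t_clk)  # integer ceiling of total / t_clk


# An operation that exactly fills one clock period takes one c-step;
# anything longer spills into the next c-step.
print(cstep_delay(100, 0, 100))  # 1
print(cstep_delay(100, 5, 100))  # 2
```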

7
Operation Delay (p-step)
  • P-step based scheduling
    • Execution delay of operation i is given as a number of
      p-steps, in terms of:
    • $r_i$: residual delay ($r_i = D_i \bmod T_{clk}$)
    • $T_{tr}$: the period of time during which latches are
      transparent
    • $P_i$: the p-step at which operation i is scheduled
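
The p-step delay formula is also missing from the transcript. The sketch below is one plausible reading, not the authors' exact expression: it assumes p-steps alternate between a transparent phase of length $T_{tr}$ and an opaque phase of length $T_{clk} - T_{tr}$, and that an operation occupies consecutive p-steps until its delay $D_i$ has elapsed. This reading agrees with the $T_{tr}$ conditions listed on the "A Heuristic Approach" slide. The name `pstep_delay` and the boolean `starts_transparent` (standing in for $P_i$) are hypothetical.

```python
def pstep_delay(d_i: int, t_clk: int, t_tr: int, starts_transparent: bool) -> int:
    """Number of p-steps occupied by an operation of delay d_i (assumed model).

    Every full clock period covers two p-steps. The residual r_i needs one
    more p-step if it fits in the phase where the operation starts,
    otherwise two more. Times are integers in a common unit.
    """
    q, r = divmod(d_i, t_clk)
    if r == 0:
        return 2 * q
    first_phase = t_tr if starts_transparent else t_clk - t_tr
    return 2 * q + (1 if r <= first_phase else 2)


# With T_clk = 10 and T_tr = 3, a delay of 17 (r_i = 7) depends on the start phase.
print(pstep_delay(17, 10, 3, starts_transparent=True))   # 4
print(pstep_delay(17, 10, 3, starts_transparent=False))  # 3
```

When $r_i = 0$ the result is the same for both start phases, which matches the observation that only $r_i \neq 0$ makes the p-step delay vary.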

8
Operation Delay (p-step)
  • $r_i \neq 0$ → p-step based OP delay may vary
  • $r_i = 0$

9
P-step Based Scheduling
  • Most conventional scheduling algorithms can be
    easily extended to p-step based scheduling
  • No need to postpone scheduling of an operation to
    the next p-step (even though the delay gets
    smaller)
  • Concurrent read/write operations must be handled
    with care
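
As an illustration of how a conventional algorithm might be extended, here is a minimal resource-constrained list scheduler that advances one p-step at a time. It is a sketch under stated assumptions, not the authors' implementation: even-numbered p-steps are taken to be transparent, the p-step delay model is the alternating-phase reading of $r_i$, $T_{tr}$ and $P_i$, priority is simply dictionary order, and concurrent read/write operations are not handled. The names `list_schedule`, `ops` and `fu_count` are hypothetical.

```python
def pstep_delay(d, t_clk, t_tr, starts_transparent):
    """Assumed p-step delay model (alternating transparent/opaque phases)."""
    q, r = divmod(d, t_clk)
    if r == 0:
        return 2 * q
    first_phase = t_tr if starts_transparent else t_clk - t_tr
    return 2 * q + (1 if r <= first_phase else 2)


def list_schedule(ops, fu_count, t_clk, t_tr):
    """ops: name -> (fu_type, delay, predecessor names), in priority order.

    Assumes an acyclic graph and at least one FU of every type used.
    Returns the start and finish p-step of every operation.
    """
    start, finish = {}, {}
    free_at = {fu: [0] * n for fu, n in fu_count.items()}  # per FU instance
    p = 0
    while len(start) < len(ops):
        for name, (fu, d, preds) in ops.items():
            if name in start:
                continue
            # Every predecessor must have finished by p-step p.
            if any(finish.get(q, p + 1) > p for q in preds):
                continue
            unit = next((k for k, t in enumerate(free_at[fu]) if t <= p), None)
            if unit is None:
                continue  # no free FU of this type in this p-step
            n = pstep_delay(d, t_clk, t_tr, starts_transparent=(p % 2 == 0))
            start[name], finish[name] = p, p + n
            free_at[fu][unit] = p + n
        p += 1
    return start, finish


ops = {
    "m1": ("mul", 13, []),
    "m2": ("mul", 17, []),
    "a1": ("add", 10, ["m1", "m2"]),
}
start, finish = list_schedule(ops, {"mul": 1, "add": 1}, t_clk=10, t_tr=5)
print(start)         # {'m1': 0, 'm2': 3, 'a1': 7}
print(finish["a1"])  # 9
```

In this toy example the schedule ends after 9 p-steps, whereas rounding each multiplication up to 2 c-steps and the addition to 1 c-step would take 5 c-steps (10 p-steps).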



[Figure: example schedules taking 4 p-steps and 3 p-steps]

10
Register Allocation
  • Coloring of register conflict graph
  • Use of latch-based registers incurs extra
    conflicts
    • Condition 1: input and output operands of the
      same OP that completes at a transparent p-step
      (e.g., a and b)
    • Condition 2: input and output operands of two
      different OPs that complete at the same
      transparent p-step (e.g., a and c)
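
The two conditions can be sketched as extra edges added to a conventional lifetime-overlap conflict graph, followed by a simple greedy coloring. This is an illustrative sketch only: the data layout (half-open lifetimes in p-steps, operations as `(inputs, output, completion p-step)` tuples), the even-p-steps-are-transparent rule, and the greedy coloring order are all assumptions, and Condition 2 is read here as "an input of one OP conflicts with the output of the other".

```python
from itertools import combinations


def conflict_edges(lifetimes, ops, is_transparent):
    """Conventional overlap edges plus the extra latch-related edges."""
    edges = set()
    # Conventional conflicts: lifetimes [start, end) that overlap.
    for u, v in combinations(sorted(lifetimes), 2):
        (s1, e1), (s2, e2) = lifetimes[u], lifetimes[v]
        if s1 < e2 and s2 < e1:
            edges.add((u, v))
    # Extra conflicts: inputs vs. outputs of OPs completing at the same
    # transparent p-step (same OP: Condition 1; different OPs: Condition 2).
    for ins, _, done in ops:
        if not is_transparent(done):
            continue
        for _, out, done2 in ops:
            if done2 == done:
                edges.update(tuple(sorted((x, out))) for x in ins if x != out)
    return edges


def greedy_color(nodes, edges):
    """Assign each variable the lowest register index unused by its neighbors."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    color = {}
    for n in sorted(nodes):
        used = {color[m] for m in adj[n] if m in color}
        color[n] = next(c for c in range(len(nodes)) if c not in used)
    return color


lifetimes = {"a": (0, 2), "d": (0, 2), "b": (2, 4), "c": (2, 4)}
ops = [(["a"], "b", 2), (["d"], "c", 2)]  # both complete at p-step 2
transparent = lambda p: p % 2 == 0        # assumption: even p-steps

base = conflict_edges(lifetimes, [], transparent)
full = conflict_edges(lifetimes, ops, transparent)
print(len(set(greedy_color(lifetimes, base).values())))  # 2 registers
print(len(set(greedy_color(lifetimes, full).values())))  # 4 registers
```

In this small case a and b could share a register with edge-triggered registers, but the extra edges (a, b) and (a, c) forbid that sharing when the completing p-step is transparent.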

[Figure: register conflict graph for operands a, b, and c]

11
Concurrent Read/Write Operation
  • Concurrent read/write operation (CRWO)
  • Operation w/ one of its input operands being the
    same as its output operand
  • Handled during operation scheduling

12
Control Synthesis
  • Generate control signals at both edges of clock
  • 1. Use a separate clock w/ twice the frequency of
    the data-path clock
    • Duty cycle of the data-path clock has to be fixed at
      50% (i.e., $T_{tr}$ is fixed at $0.5\,T_{clk}$)
    • Clock network power is roughly doubled
  • 2. Use dual-edge triggered flip-flops (DETFFs)

13
Dual-Edge Triggered Flip-Flop
  • A latch-mux implementation of D-type DETFF

[Figure: latch-mux DETFF with data input D, output Q, and clock clk]
R. P. Llopis et al., "Low power, testable dual
edge triggered flip-flops," ISLPED, 1996.

14
Control Synthesis Flow
  • Commercial tools do not support synthesis w/
    DETFFs
  • Control synthesis flow
    • Initially, synthesize w/ single-edge triggered
      FFs (SETFFs)
    • Substitute DETFFs for SETFFs after the synthesis
    • Check the timing of the controller at both edges
      of clock
    • Timing failure → increase timing guardband and
      re-synthesize
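
The flow above is an iterate-until-timing-closes loop. A minimal sketch of that control structure follows; the three callables are hypothetical stand-ins for tool invocations (none of them is a real tool API), and the guardband step and iteration limit are arbitrary illustrative values.

```python
def close_controller_timing(synthesize, to_detff, meets_timing,
                            guardband=0, step=50, max_iters=10):
    """Repeat synthesis with a growing guardband until both edges meet timing.

    synthesize(guardband) -> netlist using SETFFs      (hypothetical callable)
    to_detff(netlist)     -> netlist using DETFFs      (hypothetical callable)
    meets_timing(netlist, edge) -> bool, edge is "rise" or "fall"
    """
    for _ in range(max_iters):
        netlist = to_detff(synthesize(guardband))
        if all(meets_timing(netlist, edge) for edge in ("rise", "fall")):
            return netlist, guardband
        guardband += step  # timing failure: tighten the constraint and retry
    raise RuntimeError("controller timing not met within max_iters")


# Stub callables that pretend timing closes once the guardband reaches 100.
netlist, used = close_controller_timing(
    synthesize=lambda g: {"guardband": g, "ff": "SETFF"},
    to_detff=lambda n: {**n, "ff": "DETFF"},
    meets_timing=lambda n, edge: n["guardband"] >= 100,
)
print(used, netlist["ff"])  # 100 DETFF
```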

15
Optimize Duty Cycle
  • Latency is affected by the selection of $T_{tr}$
  • Either too small or too large a $T_{tr}$ increases
    latency

16
A Heuristic Approach
  • Derive $T_{tr}$ that minimizes the delay of each OP type k
    • $r_k < T_{clk}/2$: $r_k \le T_{tr} \le T_{clk} - r_k$
    • $r_k > T_{clk}/2$: $r_k \le T_{tr}$ or $T_{tr} \le T_{clk} - r_k$
  • Find the intersection of the $T_{tr}$ ranges that minimize
    the delay of each OP type (favor OP types with higher cost)
    • Cost of OP type k: $cost_k = w_k \times occur_k$
    • Perform initial scheduling to find the number of critical
      OPs of each OP type ($occur_k$)
    • Weight of OP type k ($w_k$):
      $r_k < T_{clk}/2$: $w_k = 2$; $r_k > T_{clk}/2$: $w_k = 1$
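
One way to sketch this heuristic is to score a small set of candidate $T_{tr}$ values by the total cost of the OP types whose condition they satisfy. The candidate set (the interval endpoints $r_k$ and $T_{clk} - r_k$), the scoring by summed cost, and the treatment of the boundary case $r_k = T_{clk}/2$ are assumptions made for illustration; the slides only state the conditions and the cost definition.

```python
def satisfies(t_tr, r, t_clk):
    """True if t_tr minimizes the delay of an OP type with residual delay r."""
    if 2 * r <= t_clk:  # boundary r == t_clk/2 grouped here (assumption)
        return r <= t_tr <= t_clk - r
    return t_tr >= r or t_tr <= t_clk - r


def best_ttr(op_types, t_clk):
    """op_types: name -> (D_k, occur_k). Returns (best candidates, scores)."""
    info = []
    for d, occur in op_types.values():
        r = d % t_clk
        if r == 0:
            continue  # delay does not depend on T_tr
        w = 2 if 2 * r <= t_clk else 1
        info.append((r, w * occur))  # (r_k, cost_k)
    candidates = sorted({c for r, _ in info for c in (r, t_clk - r)})
    scores = {c: sum(cost for r, cost in info if satisfies(c, r, t_clk))
              for c in candidates}
    if not scores:
        return [], scores
    top = max(scores.values())
    return [c for c in candidates if scores[c] == top], scores


best, scores = best_ttr(
    {"add": (10, 1), "fast_mul": (13, 1), "slow_mul": (17, 1)}, t_clk=10)
print(best)    # [3, 7]
print(scores)  # {3: 3, 7: 3}
```

When several candidates tie, as here, the slides resolve the tie by scheduling with each candidate and keeping the one with the smaller latency.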

17
Example of Ttr Selection
  • Assume $T_{clk} = 10$
  • Perform initial scheduling with $T_{tr} = 5$
  • OPs on the critical path
    • One for each OP type
  • $T_{tr}$ that minimizes the delay of each OP type
    • Addition ($D_i = 10$)
      • $r_k = 0$, no need to consider $T_{tr}$
    • Fast multiplication ($D_i = 13$)
      • $r_k = 3$ → $3 \le T_{tr} \le 7$
      • $cost_k = 2 \times 1 = 2$
    • Slow multiplication ($D_i = 17$)
      • $r_k = 7$ → $T_{tr} \ge 7$ or $T_{tr} \le 3$
      • $cost_k = 1 \times 1 = 1$






$T_{tr}$ that minimizes the latency is either 3 or 7
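
The result follows from intersecting the two ranges. A short worked version, assuming $0 < T_{tr} < T_{clk}$ (the open ends at 0 and 10 are an assumption, not stated on the slide):

```latex
\begin{align*}
\text{Fast multiplication } (r_k = 3):&\quad T_{tr} \in [3,\,7] \\
\text{Slow multiplication } (r_k = 7):&\quad T_{tr} \in (0,\,3] \cup [7,\,10) \\
\text{Intersection}:&\quad [3,\,7] \cap \bigl((0,\,3] \cup [7,\,10)\bigr) = \{3,\,7\}
\end{align*}
```

Addition has $r_k = 0$ and places no constraint on $T_{tr}$, so only the two multiplication types shape the result.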
18
Example of Ttr Selection
  • Try scheduling with both $T_{tr} = 3$ and $T_{tr} = 7$
  • Select $T_{tr} = 3$ as it results in smaller latency

19
Overall Design Flow
[Flowchart: Behavior description → VHDL analysis & DFG generation → DFG →
HLS-l (using FU IPs) → RTL → Logic synthesis → gate-level netlists of
data-path and controller → Substitute DETFFs for SETFFs → Check timing of
controller → on success: Physical design; on fail: Increase timing
guardband and repeat logic synthesis]

20
Experimental Setting
  • Resource-constrained list scheduling
  • 10 behavioral benchmark designs
  • (23 resource constraints)
  • $T_{clk} = 8.2$ ns, $T_{tr} = 2.5$ ns
    • $D_i$ of addition/subtraction = 8.2 ns → 1 c-step
      (2 p-steps)
    • $D_i$ of multiplication = 10.7 ns → 2 c-steps (3
      p-steps)
  • Logic synthesis with DC @ 1.2 V, 65-nm industrial
    standard library
  • Use DesignWare FUs (32-bit)
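
The c-step and p-step counts quoted above can be reproduced with a short check. This is a sketch under two assumptions that are not spelled out in the transcript: c-steps are the ceiling of $D_i / T_{clk}$, and p-steps follow an alternating transparent/opaque phase model. Times are expressed in units of 0.1 ns so the arithmetic stays in exact integers.

```python
def cstep_count(d, t_clk):
    """Ceiling of d / t_clk using integer arithmetic (assumed formula)."""
    return -(-d // t_clk)


def pstep_count(d, t_clk, t_tr, starts_transparent):
    """Assumed p-step model: alternating transparent and opaque phases."""
    q, r = divmod(d, t_clk)
    if r == 0:
        return 2 * q
    first_phase = t_tr if starts_transparent else t_clk - t_tr
    return 2 * q + (1 if r <= first_phase else 2)


T_CLK, T_TR = 82, 25  # 8.2 ns and 2.5 ns in units of 0.1 ns
delays = {"add/sub": 82, "mul": 107}

for name, d in delays.items():
    p_counts = sorted({pstep_count(d, T_CLK, T_TR, s) for s in (True, False)})
    print(name, cstep_count(d, T_CLK), p_counts)
# add/sub 1 [2]
# mul 2 [3]
```

With $T_{tr} = 2.5$ ns the residual delay of multiplication (2.5 ns) fits in either phase, so under this model it takes 3 p-steps regardless of where it starts.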

21
Latency Comparison
22
Area Comparison
  • Resource constraint of 1 ALU, 1
  • Average area reduction of 13% (9.5% for all
    benchmarks)
  • Mainly due to smaller area of latch-registers
    (24.6% less)

23
Conclusion
  • Proposed a complete framework of high-level
    synthesis for latch-based circuits
    • Scheduling based on p-steps
    • Register allocation w/ extra conflict edges
    • Control synthesis using DETFFs
    • A method to optimize duty cycle
  • Results (compared with conventional HLS)
    • Latency is reduced by 3.8 c-steps (16.6%)
    • Area is reduced by 9.5%

24
Q & A
Thank you for your attention
Design Technology Lab., KAIST
Seungwhun Paik (swpaik@dtlab.kaist.ac.kr)