HLS-l: High-Level Synthesis of High Performance Latch-Based Circuits

About This Presentation

Slides: 25

Provided by: swp6

Transcript and Presenter's Notes

1
HLS-l: High-Level Synthesis of High Performance Latch-Based Circuits

Seungwhun Paik, Insup Shin, and Youngsoo Shin
Dept. of Electrical Engineering, KAIST, Korea

2
Outline

Motivation & main idea
Latch-based high-level synthesis: HLS-l
Scheduling
Register allocation
Control synthesis
Optimize duty cycle
Experimental results
Conclusion

3
Motivation

Large performance gap between custom designs and ASICs
Microarchitecture, circuit style, cell design, coping with process variation, sequencing overhead, etc.
Latch-based designs
Pros: lower sequencing overhead; transparency offers time borrowing
Cons: complicated timing analysis, more glitches

D. Chinnery et al., Closing the Gap Between ASIC & Custom, Kluwer Academic Publishers, 2002.
4
Main Idea of HLS-l

Schedule operations at both edges of the clock
Scheduling is done at a finer granularity
Control signals are generated on a per-phase-step (p-step) basis

Control-step

5
Main Idea of HLS-l

[Figure: conventional scheduling vs. proposed scheduling]
6
Operation Delay (c-step)

Conventional c-step based scheduling
Execution delay of operation i is given as a number of c-steps

Tclk: clock period
DFU(i): max. delay of the FU that computes OP i
Dmargin: extra delay through the data-path
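
The delay formula itself did not survive in this transcript. As a minimal sketch, assuming the usual rule that an operation occupies the ceiling of its total delay divided by the clock period, the c-step count could be computed as below. The function name `cstep_delay` and the zero-margin example values are illustrative assumptions, not taken from the slides.

```python
import math


def cstep_delay(d_fu: float, d_margin: float, t_clk: float) -> int:
    """Number of c-steps an operation occupies.

    Assumed rule (not quoted from the slides):
        ceil((D_FU(i) + D_margin) / T_clk)
    """
    if t_clk <= 0:
        raise ValueError("t_clk must be positive")
    return math.ceil((d_fu + d_margin) / t_clk)


# Illustrative values: Tclk = 10, zero margin.
print(cstep_delay(10, 0, 10))
print(cstep_delay(13, 0, 10))
```

Under this assumption, an operation that overshoots the clock period by a small residual is still charged a whole extra c-step, which is the overhead that finer-grained scheduling aims to reduce.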

7
Operation Delay (p-step)

P-step based scheduling
Execution delay of operation i is given as a number of p-steps

ri: residual delay (ri = Di mod Tclk)
Ttr: period of time during which latches are transparent
Pi: p-step where operation i is scheduled
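
The p-step delay expression is also missing from the transcript. The sketch below uses a simplified model that is an assumption, not the authors' formula: each clock period splits into one p-step of length Ttr and one of length Tclk - Ttr, and an operation occupies consecutive p-steps from its start until their accumulated length covers its delay Di. The name `pstep_delay` and the `start_transparent` flag (standing in for the dependence on Pi) are hypothetical.

```python
def pstep_delay(d: float, t_clk: float, t_tr: float, start_transparent: bool) -> int:
    """Count the p-steps needed to cover a delay d (simplified, assumed model).

    start_transparent tells whether the first p-step is the one of length
    t_tr; the other p-step of each period has length t_clk - t_tr.
    """
    if not 0 < t_tr < t_clk:
        raise ValueError("need 0 < t_tr < t_clk")
    if d <= 0:
        raise ValueError("d must be positive")
    first, second = t_tr, t_clk - t_tr
    if not start_transparent:
        first, second = second, first
    elapsed = 0
    steps = 0
    while elapsed < d:
        # Alternate between the two p-step lengths.
        elapsed += first if steps % 2 == 0 else second
        steps += 1
    return steps


# With Tclk = 10 and Ttr = 3, a delay of 17 depends on the starting p-step.
print(pstep_delay(17, 10, 3, start_transparent=True))
print(pstep_delay(17, 10, 3, start_transparent=False))
```

In this model an operation with zero residual delay always takes the same number of p-steps, while one with a nonzero residual can take one more or one fewer depending on where it starts.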

8
Operation Delay (p-step)

ri ≠ 0 → p-step based OP delay may vary

ri = 0

9
P-step Based Scheduling

Most conventional scheduling algorithms can be easily extended to p-step based scheduling
No need to postpone scheduling of an operation to the next p-step (even though the delay gets smaller)
Concurrent read/write operations must be handled with care

[Figure: example schedules labeled "4 p-steps" and "3 p-steps"]

10
Register Allocation

Coloring of register conflict graph
Use of latch-based registers incurs extra conflicts
Condition 1: Input and output operands of the same OP that completes at a transparent p-step (e.g., a and b)
Condition 2: Input and output operands of two different OPs that complete at the same transparent p-step (e.g., a and c)
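
The two conditions above can be sketched as extra edges added to a conflict graph before coloring. This is an illustrative reading, not the authors' implementation: the operation tuples, the `is_transparent` predicate (here, even p-steps are assumed transparent), and the greedy coloring heuristic are all assumptions, and ordinary lifetime-overlap conflicts are left out for brevity.

```python
def extra_latch_conflicts(ops, is_transparent):
    """Edges implied by Conditions 1 and 2.

    ops: list of (inputs, output, completion_pstep) tuples (hypothetical format).
    An input of an OP completing at a transparent p-step conflicts with the
    output of any OP (itself or another) completing at that same p-step.
    """
    edges = set()
    for inputs_a, _, p_a in ops:
        if not is_transparent(p_a):
            continue
        for _, output_b, p_b in ops:
            if p_b != p_a:
                continue
            for var in inputs_a:
                if var != output_b:  # skip self-loops
                    edges.add(frozenset((var, output_b)))
    return edges


def greedy_coloring(variables, edges):
    """Assign register indices so that conflicting variables differ."""
    adjacent = {v: set() for v in variables}
    for edge in edges:
        u, v = tuple(edge)
        adjacent[u].add(v)
        adjacent[v].add(u)
    color = {}
    # Highest degree first; ties broken by name for determinism.
    for v in sorted(variables, key=lambda x: (-len(adjacent[x]), x)):
        used = {color[n] for n in adjacent[v] if n in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color


# Two OPs that both complete at p-step 2 (assumed transparent).
ops = [(("a",), "b", 2), (("d",), "c", 2)]
edges = extra_latch_conflicts(ops, is_transparent=lambda p: p % 2 == 0)
registers = greedy_coloring(["a", "b", "c", "d"], edges)
print(sorted(tuple(sorted(e)) for e in edges))
print(registers)
```

Greedy coloring is used here only because it is short; any graph-coloring method could take its place.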

[Figure: example operations with operands a, b, and c, and the resulting register conflict graph]
11
Concurrent Read/Write Operation

Concurrent read/write operation (CRWO)
Operation w/ one of its input operands being the
same as its output operand
Handled during operation scheduling

12
Control Synthesis

Generate control signals at both edges of the clock
1. Use a separate clock w/ twice the frequency of the data-path clock
Duty cycle of the data-path clock has to be fixed at 50% (i.e., Ttr is fixed at 0.5 Tclk)
Clock network power is roughly doubled
2. Use dual-edge triggered flip-flops (DETFFs)

13
Dual-Edge Triggered Flip-Flop

A latch-mux implementation of D-type DETFF

[Figure: latch-mux DETFF schematic with data input D, output Q, and clock clk]
R.P. Llopis et al, Low power, testable dual
edge triggered flip-flops, ISLPED, 1996
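
As a rough behavioral illustration of the latch-mux idea, the sketch below models two level-sensitive latches on opposite clock levels and a multiplexer that always selects the latch currently holding its value. It is a zero-delay software model under stated assumptions, not the circuit from the cited paper; the class and method names are hypothetical.

```python
class LatchMuxDETFF:
    """Zero-delay behavioral model of a latch-mux dual-edge triggered FF."""

    def __init__(self, d=0, clk=0):
        self.d = d
        self.clk = clk
        self.latch_hi = d  # transparent while clk == 1
        self.latch_lo = d  # transparent while clk == 0

    def _update(self):
        # Only the currently transparent latch follows D.
        if self.clk:
            self.latch_hi = self.d
        else:
            self.latch_lo = self.d

    def set_d(self, d):
        self.d = d
        self._update()

    def set_clk(self, clk):
        self.clk = clk
        self._update()

    @property
    def q(self):
        # The mux picks the opaque latch, so Q changes only at clock edges.
        return self.latch_lo if self.clk else self.latch_hi


ff = LatchMuxDETFF()
ff.set_d(1)      # clk is low: Q must not change yet
before = ff.q
ff.set_clk(1)    # rising edge captures D = 1
after_rise = ff.q
ff.set_d(0)      # clk is high: Q must not change yet
ff.set_clk(0)    # falling edge captures D = 0
after_fall = ff.q
print(before, after_rise, after_fall)
```

The model shows why one such flip-flop can register a new value on both edges of a single clock.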
14
Control Synthesis Flow

Commercial tools do not support synthesis w/ DETFFs
Control synthesis flow
Initially, synthesize w/ single-edge triggered FFs (SETFFs)
Substitute DETFFs for SETFFs after the synthesis
Check the timing of the controller at both edges of the clock
Timing failure → increase timing guardband and re-synthesize
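
The flow above is an iterate-until-timing-passes loop, which can be sketched as follows. The three callables are hypothetical stand-ins for tool runs (logic synthesis, flip-flop substitution, timing check). The guardband step and the iteration limit are also assumptions, since the slides do not give them.

```python
def synthesize_controller(synthesize, substitute_detffs, timing_ok,
                          guardband=0, step=10, max_iters=10):
    """Repeat synthesis with a growing guardband until timing passes.

    synthesize(guardband) -> netlist with single-edge triggered FFs
    substitute_detffs(netlist) -> netlist with DETFFs swapped in
    timing_ok(netlist) -> True if timing is met at both clock edges
    """
    for _ in range(max_iters):
        netlist = substitute_detffs(synthesize(guardband))
        if timing_ok(netlist):
            return netlist, guardband
        guardband += step  # timing failure: tighten the constraint and retry
    raise RuntimeError("timing not met within max_iters")


# Toy stand-ins: timing passes once the guardband reaches 20 units.
result, used = synthesize_controller(
    synthesize=lambda g: {"guardband": g, "ff": "SETFF"},
    substitute_detffs=lambda n: {**n, "ff": "DETFF"},
    timing_ok=lambda n: n["guardband"] >= 20,
)
print(result, used)
```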

15
Optimize Duty Cycle

Latency is affected by the selection of Ttr
Either too small or too large Ttr increases
latency

16
A Heuristic Approach

Derive Ttr that minimizes the delay of each OP type k
rk < Tclk/2: rk ≤ Ttr ≤ Tclk − rk
rk > Tclk/2: rk ≤ Ttr or Ttr ≤ Tclk − rk
Find the intersection of the Ttr ranges that minimize the delay of each OP type (favor OP types with higher cost)
Cost of OP type k: costk = wk × occurk
Perform initial scheduling to find the number of critical OPs for each OP type (occurk)
Weight of OP type k (wk)
rk < Tclk/2: wk = 2; rk > Tclk/2: wk = 1

17
Example of Ttr Selection

Assume Tclk = 10
Perform initial scheduling with Ttr = 5
OPs on the critical path
One for each OP type
Ttr that minimizes the delay of each OP type
Addition (Di = 10)
rk = 0, no need to consider Ttr
Fast multiplication (Di = 13)
rk = 3 → 3 ≤ Ttr ≤ 7
costk = 2 × 1 = 2
Slow multiplication (Di = 17)
rk = 7 → Ttr ≥ 7 or Ttr ≤ 3
costk = 1 × 1 = 1

Ttr that minimizes the latency is either 3 or 7
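
The example can be checked with a small script. The sketch scores every integer Ttr between 1 and Tclk − 1 by the summed cost of the OP types whose delay it minimizes, then keeps the best-scoring values. The integer candidate grid, the function names, and the treatment of rk = Tclk/2 (handled like rk > Tclk/2) are assumptions the slides do not specify.

```python
def minimizes_delay(r, t_tr, t_clk):
    """True if t_tr lies in the delay-minimizing range for residual r."""
    if r < t_clk / 2:
        return r <= t_tr <= t_clk - r
    return t_tr >= r or t_tr <= t_clk - r


def weight(r, t_clk):
    return 2 if r < t_clk / 2 else 1


def best_ttr_candidates(delays, occur, t_clk):
    """Return the integer Ttr values with the highest satisfied total cost."""
    score = {}
    for t_tr in range(1, t_clk):
        total = 0
        for op_type, d in delays.items():
            r = d % t_clk
            if r == 0:
                continue  # no residual delay: Ttr does not matter
            if minimizes_delay(r, t_tr, t_clk):
                total += weight(r, t_clk) * occur[op_type]
        score[t_tr] = total
    best = max(score.values())
    return [t_tr for t_tr, s in score.items() if s == best]


delays = {"add": 10, "fast_mul": 13, "slow_mul": 17}
occur = {"add": 1, "fast_mul": 1, "slow_mul": 1}
print(best_ttr_candidates(delays, occur, t_clk=10))  # [3, 7]
```

Scoring by summed cost is one way to read "favor OP type with higher cost": when no Ttr satisfies every OP type, the higher-cost types win.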
18
Example of Ttr Selection

Try scheduling with both Ttr = 3 and Ttr = 7
Select Ttr = 3 as it results in smaller latency

19
Overall Design Flow
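[Figure: design-flow chart; recoverable steps listed below]
Behavior description → VHDL analysis & DFG generation → DFG → HLS-l → RTL
RTL → logic synthesis → gate-level netlist of data-path, gate-level netlist of controller
Controller netlist → substitute DETFFs for SETFFs → check timing of controller
fail → increase timing guardband → logic synthesis
success → physical design
Additional input: FU IPs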
20
Experimental Setting