High-Level Synthesis: Creating Custom Circuits from High-Level Code

Transcript and Presenter's Notes
1
High-Level Synthesis: Creating Custom
Circuits from High-Level Code
  • Greg Stitt
  • ECE Department
  • University of Florida

2
Existing FPGA Tool Flow
  • Register-transfer (RT) synthesis
  • Specify RT structure (muxes, registers, etc)
  • + Allows precise specification
  • - But: time-consuming, difficult, error-prone

[Tool-flow figure: HDL -> RT Synthesis -> Netlist -> Physical Design (Technology Mapping, Placement, Routing) -> Bitfile]
3
Future FPGA Tool Flow?
C/C++, Java, etc.

[Tool-flow figure: C/C++/Java -> High-Level Synthesis -> HDL -> RT Synthesis -> Netlist -> Physical Design (Technology Mapping, Placement, Routing) -> Bitfile]
4
High-level Synthesis
  • Wouldn't it be nice to write high-level code?
  • Ratio of C to VHDL developers (10,000:1?)
  • Easier to specify
  • Separates function from architecture
  • More portable
  • - Hardware potentially slower
  • Similar to assembly code era
  • Programmers could always beat compiler
  • But, no longer the case
  • Hopefully, high-level synthesis will catch up to
    manual effort

5
High-level Synthesis
  • More challenging than compilation
  • Compilation maps behavior into assembly
    instructions
  • Architecture is known to compiler
  • High-level synthesis creates a custom
    architecture to execute behavior
  • Huge hardware exploration space
  • Best solution may include microprocessors
  • Should handle any high-level code
  • Not all code appropriate for hardware

6
High-level Synthesis
  • First, consider how to manually convert
    high-level code into circuit
  • Steps
  • 1) Build FSM for controller
  • 2) Build datapath based on FSM

acc = 0; for (i=0; i < 128; i++) acc += a[i];
7
Manual Example
  • Build a FSM (controller)
  • Decompose code into states

acc = 0; for (i=0; i < 128; i++) acc += a[i];

[FSM figure - states: "acc = 0, i = 0" -> "if (i < 128)" -> "load a[i]" -> "acc += a[i]" -> "i++" (back to the test), or Done when the test fails]
8
Manual Example
  • Build a datapath
  • Allocate resources for each state

[Datapath figure - resources allocated per state: registers a[i], acc, addr, i; a comparator for i < 128 (constant 128); adders for acc + a[i] and i + 1]

acc = 0; for (i=0; i < 128; i++) acc += a[i];
9
Manual Example
  • Build a datapath
  • Determine register inputs

[Datapath figure - register inputs determined: 2x1 muxes select either the initial values (acc <- 0, i <- 0, addr <- a) or the updated values (acc + a[i], i + 1); the a[i] register is loaded from memory]

acc = 0; for (i=0; i < 128; i++) acc += a[i];
10
Manual Example
  • Build a datapath
  • Add outputs

[Datapath figure - outputs added: acc (the result) and the memory address; data returning from memory feeds the a[i] register]

acc = 0; for (i=0; i < 128; i++) acc += a[i];
11
Manual Example
  • Build a datapath
  • Add control signals

[Datapath figure - control signals added to the mux selects and register loads, to be driven by the controller; outputs acc and memory address remain]

acc = 0; for (i=0; i < 128; i++) acc += a[i];
12
Manual Example
  • Combine controller + datapath

[Figure - the controller FSM connected to the datapath: the controller drives the mux selects, register loads, memory read, memory address, and Done output; the datapath supplies the i < 128 result and produces acc]

acc = 0; for (i=0; i < 128; i++) acc += a[i];
13
Manual Example
  • Alternatives
  • Use one adder (plus muxes)

[Datapath figure - the same registers and comparator, but a single adder shared through additional muxes on its inputs; outputs are acc and the memory address]
14
Manual Example
  • Comparison with high-level synthesis
  • Determining when to perform each operation
  • => Scheduling
  • Allocating a resource for each operation
  • => Resource allocation
  • Mapping operations onto resources
  • => Binding

15
Another Example
  • Your turn

x = 0;
for (i=0; i < 100; i++) {
  if (a[i] > 0) x++;
  else x -= a[i];
}
// output x
  • Steps
  • 1) Build FSM (do not perform if conversion)
  • 2) Build datapath based on FSM

16
High-Level Synthesis
Could be C, C++, Java, Perl, Python, SystemC,
ImpulseC, etc.
High-level Code
High-Level Synthesis
Custom Circuit
Usually an RT-level VHDL description, but could be as
low-level as a bitfile
17
High-Level Synthesis
acc = 0; for (i=0; i < 128; i++) acc += a[i];
High-Level Synthesis
18
Main Steps
High-level Code
Converts code to intermediate representation -
allows all following steps to use language
independent format.
Front-end
Syntactic Analysis
Intermediate Representation
Optimization
Determines when each operation will execute, and
resources used
Scheduling/Resource Allocation
Back-end
Maps operations onto physical resources
Binding/Resource Sharing
Controller + Datapath
19
Syntactic Analysis
  • Definition: Analysis of code to verify syntactic
    correctness
  • Converts code into intermediate representation
  • 2 steps
  • 1) Lexical analysis (Lexing)
  • 2) Parsing

High-level Code
Lexical Analysis
Syntactic Analysis
Parsing
Intermediate Representation
20
Lexical Analysis
  • Lexical analysis (lexing) breaks code into a
    series of defined tokens
  • Token = defined language construct

x = 0; if (y < z) x = 1;
Lexical Analysis
ID(x), ASSIGN, INT(0), SEMICOLON, IF, LPAREN,
ID(y), LT, ID(z), RPAREN, ID(x), ASSIGN, INT(1),
SEMICOLON
21
Lexing Tools
  • Define tokens using regular expressions - outputs
    C code that lexes input
  • Common tool is lex

/* braces and parentheses */
"{"   { YYPRINT; return LBRACE;      }
"}"   { YYPRINT; return RBRACE;      }
","   { YYPRINT; return COMMA;       }
";"   { YYPRINT; return SEMICOLON;   }
"!"   { YYPRINT; return EXCLAMATION; }
"["   { YYPRINT; return LBRACKET;    }
"]"   { YYPRINT; return RBRACKET;    }
"-"   { YYPRINT; return MINUS;       }

/* integers */
[0-9]+  { yylval.intVal = atoi( yytext ); return INT; }
22
Parsing
  • Analysis of token sequence to determine correct
    grammatical structure
  • Languages defined by context-free grammar

Correct Programs:
  x = 0; y = 1;
  x = 0;
  if (a < b) x = 10;
  if (var1 != var2) x = 10;
  x = 0; if (y < z) x = 1;
  x = 0; if (y < z) x = 1; y = 5; t = 1;

Grammar:
  Program -> Exp
  Exp     -> Stmt SEMICOLON | IF LPAREN Cond RPAREN Exp | Exp Exp
  Cond    -> ID Comp ID
  Stmt    -> ID ASSIGN INT
  Comp    -> LT | NE
23
Parsing
Incorrect Programs:
  x = 3 + 5;
  x = 5
  x + 5;
  if (x5 > y) x = 2;
  x = y;

Grammar:
  Program -> Exp
  Exp     -> S SEMICOLON | IF LPAREN Cond RPAREN Exp | Exp Exp
  Cond    -> ID Comp ID
  S       -> ID ASSIGN INT
  Comp    -> LT | NE
24
Parsing Tools
  • Define grammar in special language
  • Automatically creates parser based on grammar
  • Popular tool is yacc (yet another compiler-compiler)

program:   functions                  { $1; }
functions: function                   { $1; }
         | functions function         { $1; }
function:  HEXNUMBER LABEL COLON code { $2; }
25
Intermediate Representation
  • Parser converts tokens to intermediate
    representation
  • Usually, an abstract syntax tree

x = 0; if (y < z) x = 1; d = 6;

[AST figure - a statement sequence whose children are: assign(x, 0); if(cond: y < z, then: assign(x, 1)); assign(d, 6)]
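As a rough illustration of what the parser might build (a sketch only - the node kinds, field names, and helper functions below are hypothetical, not from the slides), an AST for this fragment can be represented in C with a small tagged-node structure:

#include <stdio.h>
#include <stdlib.h>

/* Hypothetical AST node for the toy language: statement sequences,
   assignments, if-statements, identifiers, integer constants, comparisons. */
typedef enum { N_SEQ, N_ASSIGN, N_IF, N_ID, N_INT, N_LT } NodeKind;

typedef struct Node {
    NodeKind kind;
    const char *name;        /* used by N_ID  */
    int value;               /* used by N_INT */
    struct Node *kid[3];     /* children (at most 3 used here) */
} Node;

static Node *mk(NodeKind k, Node *a, Node *b, Node *c) {
    Node *n = calloc(1, sizeof *n);
    n->kind = k; n->kid[0] = a; n->kid[1] = b; n->kid[2] = c;
    return n;
}
static Node *mk_id(const char *s) { Node *n = mk(N_ID, 0, 0, 0);  n->name  = s; return n; }
static Node *mk_int(int v)        { Node *n = mk(N_INT, 0, 0, 0); n->value = v; return n; }

int main(void) {
    /* x = 0; if (y < z) x = 1; d = 6; */
    Node *ast = mk(N_SEQ,
        mk(N_ASSIGN, mk_id("x"), mk_int(0), 0),
        mk(N_SEQ,
           mk(N_IF, mk(N_LT, mk_id("y"), mk_id("z"), 0),
                    mk(N_ASSIGN, mk_id("x"), mk_int(1), 0), 0),
           mk(N_ASSIGN, mk_id("d"), mk_int(6), 0), 0),
        0);
    printf("built AST, root kind %d\n", ast->kind);
    return 0;
}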
26
Intermediate Representation
  • Why use intermediate representation?
  • Easier to analyze/optimize than source code
  • Theoretically can be used for all languages
  • Makes synthesis back end language independent

[Figure - C code, Java, and Perl each pass through their own syntactic analysis into a common intermediate representation; the back end (scheduling, resource allocation, binding - and sometimes optimizations) is independent of the source language]
27
Intermediate Representation
  • Different Types
  • Abstract Syntax Tree
  • Control/Data Flow Graph (CDFG)
  • Sequencing Graph
  • Etc.
  • We will focus on CDFG
  • Combines control flow graph (CFG) and data flow
    graph (DFG)

28
Control flow graphs
  • CFG
  • Represents control flow dependencies of basic
    blocks
  • Basic block is section of code that always
    executes from beginning to end
  • I.e. no jumps into or out of block

acc = 0; for (i=0; i < 128; i++) acc += a[i];

[CFG figure - basic blocks: "acc = 0, i = 0" -> "if (i < 128)" -> either "acc += a[i]; i++" (looping back to the test) or Done]
29
Control flow graphs
  • Your turn
  • Create a CFG for this code

i = 0;
while (j < 10) {
  if (x < 5) y = 2;
  else if (z < 10) y = 6;
}
30
Data Flow Graphs
  • DFG
  • Represents data dependencies between operations

x = a + b;
y = c + d;
z = x - y;

[DFG figure - the two additions feed the subtraction that produces z]
31
Control/Data Flow Graph
  • Combines CFG and DFG
  • Maintains DFG for each node of CFG

acc = 0; for (i=0; i < 128; i++) acc += a[i];

[CDFG figure - the CFG from the previous slide with a DFG per basic block: constants 0 feeding acc and i in the first block, the comparison i < 128 in the second, and adders for acc + a[i] and i + 1 in the loop body]
32
High-Level Synthesis: Optimization
33
Synthesis Optimizations
  • After creating CDFG, high-level synthesis
    optimizes graph
  • Goals
  • Reduce area
  • Improve latency
  • Increase parallelism
  • Reduce power/energy
  • 2 types
  • Data flow optimizations
  • Control flow optimizations

34
Data Flow Optimizations
  • Tree-height reduction
  • Generally made possible by commutativity,
    associativity, and distributivity

[Figure - two examples of rebalancing a DFG over inputs a, b, c, d: a serial chain such as ((a + b) + c) + d is restructured into the balanced tree (a + b) + (c + d), reducing the number of operator levels (and thus latency) from 3 to 2]
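A minimal sketch of the idea in C (values chosen arbitrarily, not from the slides): both forms compute the same result, but the balanced version has a shorter critical path of adders.

#include <stdio.h>

int main(void) {
    int a = 1, b = 2, c = 3, d = 4;

    /* Serial chain: 3 dependent additions (tree height 3). */
    int serial   = ((a + b) + c) + d;

    /* Balanced tree: a+b and c+d are independent, so in hardware they can
       execute in parallel; only 2 adder levels on the critical path. */
    int balanced = (a + b) + (c + d);

    printf("%d %d\n", serial, balanced);   /* same value, shorter latency */
    return 0;
}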
35
Data Flow Optimizations
  • Operator Strength Reduction
  • Replacing an expensive (strong) operation with
    a faster one
  • Common example replacing multiply/divide with
    shift

1 multiplication                 0 multiplications

b[i] = a[i] * 8;                 b[i] = a[i] << 3;
a = b * 5;                       c = b << 2;  a = b + c;
a = b * 13;                      c = b << 2;  d = b << 3;  a = c + d + b;
36
Data Flow Optimizations
  • Constant propagation
  • Statically evaluate expressions with constants

x = 0;  y = x * 15;  z = y + 10;        =>        x = 0;  y = 0;  z = 10;
37
Data Flow Optimizations
  • Function Specialization
  • Create specialized code for common inputs
  • Treat common inputs as constants
  • If inputs not known statically, must include if
    statement for each call to specialized function

int f (int x) { y = x * 15; return y + 10; }

int f_opt () { return 10; }       // treat the frequent input (x == 0) as a constant

for (i=0; i < 1000; i++) f(0);        =>        for (i=0; i < 1000; i++) f_opt();
38
Data Flow Optimizations
  • Common sub-expression elimination
  • If expression appears more than once, repetitions
    can be replaced

a = x + y;                           a = x + y;
. . .                      =>        . . .
b = c * 25 + x + y;                  b = c * 25 + a;     // x + y already determined
39
Data Flow Optimizations
  • Dead code elimination
  • Remove code that is never executed
  • May seem like stupid code, but often comes from
    constant propagation or function specialization

int f (int x) { if (x > 0) a = b * 15; else a = b / 4; return a; }

int f_opt () { a = b * 15; return a; }

Specialized version for x > 0 does not need the else
branch - dead code
40
Data Flow Optimizations
  • Code motion (hoisting/sinking)
  • Avoid repeated computation

for (i=0; i < 100; i++) { z = x + y; b[i] = a[i] + z; }

        =>

z = x + y;
for (i=0; i < 100; i++) { b[i] = a[i] + z; }
41
Control Flow Optimizations
  • Loop Unrolling
  • Replicate body of loop
  • May increase parallelism

for (i=0; i < 128; i++)
  a[i] = b[i] + c[i];

for (i=0; i < 128; i += 2) {
  a[i]   = b[i]   + c[i];
  a[i+1] = b[i+1] + c[i+1];
}
42
Control Flow Optimizations
  • Function Inlining
  • Replace function call with body of function
  • Common for both SW and HW
  • SW - Eliminates function call instructions
  • HW - Eliminates unnecessary control states

for (i=0; i < 128; i++)
  a[i] = f( b[i], c[i] );
. . . .
int f (int a, int b) { return a + b * 15; }

        =>

for (i=0; i < 128; i++)
  a[i] = b[i] + c[i] * 15;
43
Control Flow Optimizations
  • Conditional Expansion
  • Replace if with logic expression
  • Execute if/else bodies in parallel

y = ab;  if (a) x = b + d;  else x = bd;

y = ab;  x = a(b + d) + a'bd        [DeMicheli]

Can be further optimized to:
y = ab;  x = y + d(a + b)
44
Example
  • Optimize this

x = 0;  y = a * b;
if (x < 15) z = a * b - c;
else        z = x * 12;
output z * 12;
45
High-Level Synthesis: Scheduling/Resource
Allocation
46
Scheduling
  • Scheduling assigns a start time to each operation
    in DFG
  • Start times must not violate dependencies in DFG
  • Start times must meet performance constraints
  • Alternatively, resource constraints
  • Performed on the DFG of each CFG node
  • => Can't execute multiple CFG nodes in parallel

47
Examples
[Figure - three different valid schedules of the same small DFG (inputs a, b, c, d), assigning the operations to cycles 1-3 in different ways]
48
Scheduling Problems
  • Several types of scheduling problems
  • Usually some combination of performance and
    resource constraints
  • Problems
  • Unconstrained
  • Not very useful, every schedule is valid
  • Minimum latency
  • Latency constrained
  • Minimum-latency, resource-constrained
  • i.e. find the schedule with the shortest latency
    that uses less than a specified number of resources
  • NP-Complete
  • Minimum-resource, latency-constrained
  • i.e. find the schedule that meets the latency
    constraint (which may be anything) and uses the
    minimum number of resources
  • NP-Complete

49
Minimum Latency Scheduling
  • ASAP (as soon as possible) algorithm
  • Find a candidate node
  • Candidate is a node whose predecessors have been
    scheduled and completed (or has no predecessors)
  • Schedule node one cycle later than max cycle of
    predecessor
  • Repeat until all nodes scheduled

[Figure - ASAP schedule of a DFG with inputs a-h (operations include *, +, -, and <): every operation is placed in the earliest cycle its predecessors allow, filling cycles 1-4]

Minimum possible latency: 4 cycles
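A minimal C sketch of ASAP over a DFG stored as a dependence matrix (the graph below is a made-up example, not the one in the figure; single-cycle operations are assumed):

#include <stdio.h>

#define N 8                    /* number of operations */

/* dep[u][v] = 1 means operation v depends on operation u (u -> v). */
static int dep[N][N];
static int cycle[N];           /* assigned start cycle; 0 = not yet scheduled */

/* ASAP: a node is a candidate once all predecessors are scheduled (and, with
   single-cycle ops, completed); it is placed one cycle after the latest one. */
static void asap(void) {
    int scheduled = 0;
    while (scheduled < N) {
        for (int v = 0; v < N; v++) {
            if (cycle[v]) continue;
            int ready = 1, latest = 0;
            for (int u = 0; u < N; u++) {
                if (!dep[u][v]) continue;
                if (!cycle[u]) { ready = 0; break; }
                if (cycle[u] > latest) latest = cycle[u];
            }
            if (ready) { cycle[v] = latest + 1; scheduled++; }
        }
    }
}

int main(void) {
    /* hypothetical DFG: 0,1 -> 2;  2,3 -> 4;  4,5 -> 6;  6 -> 7 */
    dep[0][2] = dep[1][2] = 1;
    dep[2][4] = dep[3][4] = 1;
    dep[4][6] = dep[5][6] = 1;
    dep[6][7] = 1;

    asap();
    for (int v = 0; v < N; v++)
        printf("op %d -> cycle %d\n", v, cycle[v]);
    return 0;
}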
50
Minimum Latency Scheduling
  • ALAP (as late as possible) algorithm
  • Run ASAP, get minimum latency L
  • Find a candidate
  • Candidate is node whose successors are scheduled
    (or has none)
  • Schedule node one cycle before min cycle of
    successor
  • Nodes with no successors scheduled to cycle L
  • Repeat until all nodes scheduled

[Figure - ALAP schedule of the same DFG: every operation is placed in the latest cycle that still lets its successors finish by cycle L, so operations off the critical path move to later cycles than in the ASAP schedule]

L = 4 cycles
51
Minimum Latency Scheduling
  • ALAP (as late as possible) algorithm
  • Run ASAP, get minimum latency L
  • Find a candidate
  • Candidate is node whose successors are scheduled
    (or has none)
  • Schedule node one cycle before min cycle of
    successor
  • Nodes with no successors scheduled to cycle L
  • Repeat until all nodes scheduled

[Figure - the completed ALAP schedule over cycles 1-4]

L = 4 cycles
52
Minimum Latency Scheduling
  • ALAP
  • Has to run ASAP first, seems pointless
  • But, many heuristics need the mobility/slack of
    each operation
  • ASAP gives the earliest possible time for an
    operation
  • ALAP gives the latest possible time for an
    operation
  • Slack = difference between earliest and latest
    possible schedule
  • Slack of 0 implies the operation has to be done in the
    currently scheduled cycle
  • The larger the slack, the more options a
    heuristic has to schedule the operation
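For example, if the ASAP and ALAP start cycles of each operation are already known (the numbers below are hypothetical), the mobility/slack is just their difference; a small C sketch:

#include <stdio.h>

int main(void) {
    /* Hypothetical ASAP/ALAP results for an 8-operation DFG. */
    int asap[8] = {1, 1, 1, 1, 2, 2, 3, 4};
    int alap[8] = {1, 2, 1, 3, 2, 3, 3, 4};

    for (int v = 0; v < 8; v++) {
        int slack = alap[v] - asap[v];
        printf("op %d: asap=%d alap=%d slack=%d%s\n", v, asap[v], alap[v],
               slack, slack == 0 ? "  (must go in this cycle - critical)" : "");
    }
    return 0;
}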

53
Latency-Constrained Scheduling
  • Instead of finding the minimum latency, find
    latency less than L
  • Solutions
  • Use ASAP, verify that the minimum latency is less than L
  • Use ALAP starting with cycle L instead of the minimum
    latency (don't need ASAP)

54
Scheduling with Resource Constraints
  • Schedule must use less than specified number of
    resources

Constraints: 1 ALU (+/-), 1 Multiplier

[Figure - the example DFG (inputs a-g) scheduled under these constraints; operations are serialized onto the single ALU and multiplier, taking 5 cycles]
55
Scheduling with Resource Constraints
  • Schedule must use less than specified number of
    resources

Constraints: 2 ALUs (+/-), 1 Multiplier

[Figure - with a second ALU, two ALU operations can share a cycle and the same DFG finishes in 4 cycles]
56
Minimum-Latency, Resource-Constrained Scheduling
  • Definition: Given resource constraints, find the
    schedule that has the minimum latency
  • Example

Constraints: 1 ALU (+/-), 1 Multiplier

[Figure - one valid schedule under these constraints, finishing in cycle 6]
57
Minimum-Latency, Resource-Constrained Scheduling
  • Definition: Given resource constraints, find the
    schedule that has the minimum latency
  • Example

Constraints: 1 ALU (+/-), 1 Multiplier

[Figure - an alternative schedule using the same resources that finishes in cycle 5]
Different schedules may use same resources, but
have different latencies
58
Minimum-Latency, Resource-Constrained Scheduling
  • Hu's Algorithm
  • Assumes one type of resource
  • Basic Idea
  • Input: graph, number of resources r
  • 1) Label each node by max distance from output
  • i.e. Use path length as priority
  • 2) Determine C, the set of scheduling candidates
  • Candidate if either no predecessors, or
    predecessors scheduled
  • 3) From C, schedule up to r nodes to current
    cycle, using label as priority
  • 4) Increment current cycle, repeat from 2) until
    all nodes scheduled
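A minimal C sketch of these steps (the DFG below is a made-up example with single-cycle operations, not the slides' graph):

#include <stdio.h>

#define N 9
/* Hu's algorithm sketch: one resource type, single-cycle operations.
   dep[u][v] = 1 means operation v depends on operation u. */
static int dep[N][N];
static int label[N];          /* priority = max distance (in nodes) to an output */
static int start[N];          /* assigned cycle; 0 = unscheduled */

static int distance(int v) {  /* step 1: label each node */
    if (label[v]) return label[v];
    int best = 0;
    for (int w = 0; w < N; w++)
        if (dep[v][w]) { int d = distance(w); if (d > best) best = d; }
    return label[v] = best + 1;
}

int main(void) {
    int r = 3;                /* resource constraint */
    /* hypothetical DFG: 0,1 -> 4;  2,3 -> 5;  4,5 -> 6;  6,7 -> 8 */
    dep[0][4] = dep[1][4] = 1;
    dep[2][5] = dep[3][5] = 1;
    dep[4][6] = dep[5][6] = 1;
    dep[6][8] = dep[7][8] = 1;

    for (int v = 0; v < N; v++) distance(v);

    for (int c = 1, done = 0; done < N; c++) {       /* repeat per cycle       */
        int used = 0;
        while (used < r) {                           /* steps 2-3: up to r ops */
            int best = -1;
            for (int v = 0; v < N; v++) {
                if (start[v]) continue;
                int ready = 1;
                for (int u = 0; u < N; u++)          /* predecessors finished? */
                    if (dep[u][v] && (!start[u] || start[u] >= c)) ready = 0;
                if (ready && (best < 0 || label[v] > label[best])) best = v;
            }
            if (best < 0) break;                     /* no more candidates     */
            start[best] = c; used++; done++;
        }
    }
    for (int v = 0; v < N; v++)
        printf("op %d (priority %d) -> cycle %d\n", v, label[v], start[v]);
    return 0;
}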

59
Minimum-Latency, Resource-Constrained Scheduling
  • Hu's Algorithm
  • Example

[Figure - example DFG with inputs a-k and a mix of multiply, add, and subtract operations; resource constraint r = 3]
60
Minimum-Latency, Resource-Constrained Scheduling
  • Hu's Algorithm
  • Step 1 - Label each node by max distance from
    output
  • i.e. use path length as priority

[Figure - each operation labeled with its maximum distance to an output (priorities 1-4); r = 3]
61
Minimum-Latency, Resource-Constrained Scheduling
  • Hu's Algorithm
  • Step 2 - Determine C, the set of scheduling
    candidates

[Figure - candidate set C for cycle 1: operations with no unscheduled predecessors, shown with their priority labels; r = 3]
62
Minimum-Latency, Resource-Constrained Scheduling
  • Hu's Algorithm
  • Step 3 - From C, schedule up to r nodes to
    current cycle, using label as priority

[Figure - the three highest-priority candidates are scheduled in cycle 1; one candidate is not scheduled due to lower priority; r = 3]
63
Minimum-Latency, Resource-Constrained Scheduling
  • Hu's Algorithm
  • Step 2

[Figure - candidate set C recomputed at the start of cycle 2; r = 3]
64
Minimum-Latency, Resource-Constrained Scheduling
  • Hu's Algorithm
  • Step 3

[Figure - up to r = 3 candidates scheduled in cycle 2]
65
Minimum-Latency, Resource-Constrained Scheduling
  • Hu's Algorithm
  • Skipping to finish

[Figure - completed schedule: the remaining operations are placed in cycles 3 and 4; r = 3]
66
Minimum-Latency, Resource-Constrained Scheduling
  • Hu's algorithm solves a simplified problem
  • Common Extensions
  • Multiple resource types
  • Multi-cycle operation

[Figure - small DFG with inputs a-d illustrating the extensions: operations of different resource types, plus a multi-cycle divide that spans more than one cycle]
67
Minimum-Latency, Resource-Constrained Scheduling
  • List Scheduling - (minimum-latency,
    resource-constrained version)
  • Extension for multiple resource types
  • Basic Idea - Hu's algorithm for each resource
    type
  • Input: graph, set of constraints Rt for each
    resource type t
  • 1) Label nodes based on max distance to output
  • 2) For each resource type t
  • 3) Determine candidate nodes, C (those w/ no
    predecessors or w/ scheduled predecessors)
  • 4) Schedule up to Rt operations from C based on
    priority, to current cycle
  • Rt is the constraint on resource type t
  • 5) Increment cycle, repeat from 2) until all
    nodes scheduled
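Extending the earlier Hu sketch with per-type limits, a minimal C version of this procedure (the DFG, operation types, and constraints below are hypothetical; single-cycle operations assumed):

#include <stdio.h>

#define N 8
#define T 2                       /* resource types: 0 = ALU (+/-), 1 = multiplier */

static int dep[N][N];             /* dep[u][v] = 1: v depends on u */
static int type[N];               /* resource type used by each op */
static int prio[N];               /* priority = max distance to an output */
static int start[N];              /* assigned cycle; 0 = unscheduled */
static int R[T] = {2, 2};         /* constraints: 2 ALUs, 2 multipliers */

static int distance(int v) {      /* step 1: label nodes */
    if (prio[v]) return prio[v];
    int best = 0;
    for (int w = 0; w < N; w++)
        if (dep[v][w]) { int d = distance(w); if (d > best) best = d; }
    return prio[v] = best + 1;
}

int main(void) {
    /* hypothetical DFG: multiplies 0, 1, 4, 5 and ALU ops 2, 3, 6, 7 */
    type[0] = type[1] = type[4] = type[5] = 1;
    dep[0][2] = dep[1][2] = 1;
    dep[2][6] = dep[3][6] = 1;
    dep[4][7] = dep[5][7] = 1;
    dep[6][7] = 1;

    for (int v = 0; v < N; v++) distance(v);

    for (int c = 1, done = 0; done < N; c++) {       /* one pass per cycle   */
        for (int t = 0; t < T; t++) {                /* step 2: each type    */
            int used = 0;
            while (used < R[t]) {                    /* steps 3-4            */
                int best = -1;
                for (int v = 0; v < N; v++) {
                    if (start[v] || type[v] != t) continue;
                    int ready = 1;
                    for (int u = 0; u < N; u++)      /* predecessors done?   */
                        if (dep[u][v] && (!start[u] || start[u] >= c)) ready = 0;
                    if (ready && (best < 0 || prio[v] > prio[best])) best = v;
                }
                if (best < 0) break;                 /* no more candidates   */
                start[best] = c; used++; done++;
            }
        }
    }
    for (int v = 0; v < N; v++)
        printf("op %d (type %d, prio %d) -> cycle %d\n",
               v, type[v], prio[v], start[v]);
    return 0;
}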

68
Minimum-Latency, Resource-Constrained Scheduling
  • List scheduling - minimum latency
  • Step 1 - Label nodes based on max distance to
    output (not shown, so you can see operations)
  • nodes given IDs for illustration purposes

2 ALUs (+/-), 2 Multipliers

[Figure - example DFG with inputs a-k; operations numbered 1-8: multiplies 1, 5, 6 and ALU (+/-) operations 2, 3, 4, 7, 8]
69
Minimum-Latency, Resource-Constrained Scheduling
  • List scheduling - minimum latency
  • For each resource type t
  • 3) Determine candidate nodes, C (those w/ no
    predecessors or w/ scheduled predecessors)
  • 4) Schedule up to Rt operations from C based on
    priority, to current cycle
  • Rt is the constraint on resource type t

2 ALUs (+/-), 2 Multipliers

Candidates - Cycle 1: ALU ops 2, 3, 4; Mult op 1

[Figure - DFG with the cycle-1 candidates highlighted]
70
Minimum-Latency, Resource-Constrained Scheduling
  • List scheduling - minimum latency
  • For each resource type t
  • 3) Determine candidate nodes, C (those w/ no
    predecessors or w/ scheduled predecessors)
  • 4) Schedule up to Rt operations from C based on
    priority, to current cycle
  • Rt is the constraint on resource type t

2 ALUs (+/-), 2 Multipliers

Candidates - Cycle 1: ALU ops 2, 3, 4; Mult op 1

[Figure - ops 1, 2, 3 scheduled in cycle 1; op 4 is a candidate but not scheduled due to low priority]
71
Minimum-Latency, Resource-Constrained Scheduling
  • List scheduling - minimum latency
  • For each resource type t
  • 3) Determine candidate nodes, C (those w/ no
    predecessors or w/ scheduled predecessors)
  • 4) Schedule up to Rt operations from C based on
    priority, to current cycle
  • Rt is the constraint on resource type t

2 ALUs (+/-), 2 Multipliers

Candidates - Cycle 1: ALU ops 2, 3, 4; Mult op 1
             Cycle 2: ALU op 4; Mult ops 5, 6

[Figure - candidates recomputed at the start of cycle 2]
72
Minimum-Latency, Resource-Constrained Scheduling
  • List scheduling - minimum latency
  • For each resource type t
  • 3) Determine candidate nodes, C (those w/ no
    predecessors or w/ scheduled predecessors)
  • 4) Schedule up to Rt operations from C based on
    priority, to current cycle
  • Rt is the constraint on resource type t

2 ALUs (+/-), 2 Multipliers

Candidates - Cycle 1: ALU ops 2, 3, 4; Mult op 1
             Cycle 2: ALU op 4; Mult ops 5, 6

[Figure - ops 4, 5, 6 scheduled in cycle 2]
73
Minimum-Latency, Resource-Constrained Scheduling
  • List scheduling - minimum latency
  • For each resource type t
  • 3) Determine candidate nodes, C (those w/ no
    predecessors or w/ scheduled predecessors)
  • 4) Schedule up to Rt operations from C based on
    priority, to current cycle
  • Rt is the constraint on resource type t

2 ALUs (+/-), 2 Multipliers

Candidates - Cycle 1: ALU ops 2, 3, 4; Mult op 1
             Cycle 2: ALU op 4; Mult ops 5, 6
             Cycle 3: ALU op 7

[Figure - op 7 scheduled in cycle 3]
74
Minimum-Latency, Resource-Constrained Scheduling
  • List scheduling - (minimum latency)
  • Final schedule
  • Note - ASAP would require more resources
  • ALAP wouldn't here, but in general it would

2 ALUs (+/-), 2 Multipliers

[Figure - final schedule: cycle 1 = ops 1, 2, 3; cycle 2 = ops 4, 5, 6; cycle 3 = op 7; cycle 4 = op 8]
75
Minimum-Latency, Resource-Constrained Scheduling
  • Extension for multicycle operations
  • Same idea (differences shown in red)
  • Input graph, set of constraints R for each
    resource type
  • 1) Label nodes based on max cycle latency to
    output
  • 2) For each resource type t
  • 3) Determine candidate nodes, C (those w/ no
    predecessors or w/ scheduled and completed
    predecessors)
  • 4) Schedule up to (Rt - nt) operations from C
    based on priority, one cycle after predecessor
  • Rt is the constraint on resource type t
  • nt is the number of resource t in use from
    previous cycles
  • Repeat from 2) until all nodes scheduled

76
Minimum-Latency, Resource-Constrained Scheduling
  • Example

2 ALUs (+/-), 2 Multipliers

[Figure - the same DFG scheduled with multi-cycle multiplies: each multiply spans several cycles, stretching the schedule to 7 cycles]
77
List Scheduling (Min Latency)
  • Your turn (2 ALUs, 1 Mult)
  • Steps (will be on test)
  • 1) Label nodes with priority
  • 2) Update candidate list for each cycle
  • 3) Redraw graph to show schedule

[Figure - practice DFG with operations numbered 1-11 (a mix of +, -, and * operations)]
78
List Scheduling (Min Latency)
  • Your turn (2 ALUs, 1 Mult, Mults take 2 cycles)

[Figure - practice DFG with inputs a-g and operations numbered 1-6 (a mix of ALU and multiply operations)]
79
Minimum-Resource, Latency-Constrained
  • Note that if no resource constraints given,
    schedule determines number of required resources
  • Max of each resource type used in a single cycle

[Figure - an unconstrained schedule of the example DFG (inputs a-g, cycles 1-4); its busiest cycles use 3 ALUs and 2 multipliers, so those are the resources this schedule requires]
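The required resources can be read off a schedule by counting, for each cycle, how many operations of each type run at once and taking the maximum; a small C sketch (hypothetical schedule data, single-cycle operations):

#include <stdio.h>

int main(void) {
    /* Hypothetical schedule: start cycle and resource type per operation
       (0 = ALU, 1 = multiplier); operations take one cycle. */
    int start[7] = {1, 1, 1, 2, 2, 3, 4};
    int type[7]  = {0, 0, 1, 0, 1, 0, 0};
    int need[2]  = {0, 0};

    for (int c = 1; c <= 4; c++) {
        int used[2] = {0, 0};
        for (int v = 0; v < 7; v++)
            if (start[v] == c) used[type[v]]++;
        for (int t = 0; t < 2; t++)          /* max usage over all cycles */
            if (used[t] > need[t]) need[t] = used[t];
    }
    printf("required: %d ALUs, %d multipliers\n", need[0], need[1]);
    return 0;
}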
80
Minimum-Resource, Latency-Constrained
  • Minimum-Resource Latency-Constrained Scheduling
  • For all schedules that have latency less than the
    constraint, find the one that uses the fewest
    resources

Latency Constraint: <= 4 cycles

[Figure - two schedules of the same DFG that both meet the 4-cycle constraint: one needs 3 ALUs and 2 multipliers, the other spreads operations over the available slack and needs only 2 ALUs and 1 multiplier]
81
Minimum-Resource, Latency-Constrained
  • List scheduling (Minimum resource version)
  • Basic Idea
  • 1) Compute latest start times for each op using
    ALAP with specified latency constraint
  • Latest start times must include multicycle
    operations
  • 2) For each resource type
  • 3) Determine candidate nodes
  • 4) Compute slack for each candidate
  • Slack = latest possible cycle - current cycle
  • 5) Schedule ops with 0 slack
  • Update required number of resources (assume 1 of
    each to start with)
  • 6) Schedule ops that require no extra resources
  • 7) Repeat from 2) until all nodes scheduled
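A minimal C sketch of this flow for a single resource type (the DFG and latency constraint are hypothetical; single-cycle operations assumed, so the ALAP start time is also the last possible cycle):

#include <stdio.h>

#define N 5
#define L 4                        /* latency constraint (cycles) */

static int dep[N][N];              /* dep[u][v] = 1: v depends on u */
static int lpc[N];                 /* last possible cycle (ALAP)    */
static int start[N];               /* assigned cycle; 0 = unscheduled */

static void alap(void) {           /* 1) latest start times under L */
    int done = 0;
    while (done < N) {
        for (int v = 0; v < N; v++) {
            if (lpc[v]) continue;
            int ok = 1, earliest = L + 1;
            for (int w = 0; w < N; w++) {
                if (!dep[v][w]) continue;
                if (!lpc[w]) { ok = 0; break; }
                if (lpc[w] < earliest) earliest = lpc[w];
            }
            if (ok) { lpc[v] = (earliest > L) ? L : earliest - 1; done++; }
        }
    }
}

static int ready(int v, int c) {   /* all predecessors finished before cycle c? */
    for (int u = 0; u < N; u++)
        if (dep[u][v] && (!start[u] || start[u] >= c)) return 0;
    return 1;
}

int main(void) {
    /* hypothetical DFG: 0 -> 2;  1 -> 3;  2,3 -> 4 */
    dep[0][2] = 1;
    dep[1][3] = 1;
    dep[2][4] = dep[3][4] = 1;

    alap();

    int resources = 1;                                /* assume 1 unit to start */
    for (int c = 1, done = 0; done < N; c++) {
        int used = 0;
        for (int v = 0; v < N; v++)                   /* 5) slack-0 ops must go */
            if (!start[v] && ready(v, c) && lpc[v] == c) {
                start[v] = c; used++; done++;
                if (used > resources) resources = used;   /* grow only if forced */
            }
        for (int v = 0; v < N && used < resources; v++)   /* 6) fill free units */
            if (!start[v] && ready(v, c)) { start[v] = c; used++; done++; }
    }
    for (int v = 0; v < N; v++)
        printf("op %d: lpc=%d cycle=%d\n", v, lpc[v], start[v]);
    printf("resources needed: %d\n", resources);
    return 0;
}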

82
Minimum-Resource, Latency-Constrained
  • 1) Find ALAP schedule

Latency Constraint: 3 cycles

Last possible cycle (LPC) per node: 1 -> 1, 2 -> 1, 3 -> 1, 4 -> 3, 5 -> 2, 6 -> 2, 7 -> 3

[Figure - the DFG (inputs a-k) and its ALAP schedule under the 3-cycle constraint; the ALAP schedule defines the last possible cycle for each operation]
83
Minimum-Resource, Latency-Constrained
  • 2) For each resource type
  • 3) Determine candidate nodes C
  • 4) Compute slack for each candidate
  • Slack = latest possible cycle - current cycle

Cycle 1.  Candidates: 1, 2, 3, 4.  Initial resources: 1 Mult, 1 ALU.

Node:  1  2  3  4  5  6  7
LPC:   1  1  1  3  2  2  3
Slack: 0  0  0  2  -  -  -

[Figure - DFG (inputs a-k) with the cycle-1 candidate operations highlighted]
84
Minimum-Resource, Latency-Constrained
  • 5)Schedule ops with 0 slack
  • Update required number of resources
  • 6) Schedule ops that require no extra resources

Cycle 1.  Candidates: 1, 2, 3, 4.

Node:  1  2  3  4  5  6  7
LPC:   1  1  1  3  2  2  3

Zero-slack candidates 1, 2, and 3 are scheduled in cycle 1, raising the ALU count to 2.
Candidate 4 requires 1 more ALU, so it is not scheduled.

Resources so far: 1 Mult, 2 ALUs

[Figure - DFG with ops 1, 2, 3 scheduled in cycle 1]
85
Minimum-Resource, Latency-Constrained
  • 2)For each resource type
  • 3) Determine candidate nodes C
  • 4) Compute slack for each candidate
  • Slack = latest possible cycle - current cycle

Cycle 2.  Candidates: 4, 5, 6.  Resources so far: 1 Mult, 2 ALUs.

Slack: node 4 = 1, node 5 = 0, node 6 = 0

[Figure - DFG with the cycle-2 candidate operations highlighted]
86
Minimum-Resource, Latency-Constrained
  • 5)Schedule ops with 0 slack
  • Update required number of resources
  • 6) Schedule ops that require no extra resources

Cycle 2.  Candidates: 4, 5, 6.

Zero-slack ops 5 and 6 are scheduled, raising the multiplier count to 2.
An ALU is already available (counted in cycle 1), so op 4 can also be scheduled without adding resources.

Resources so far: 2 Mults, 2 ALUs

[Figure - ops 4, 5, 6 scheduled in cycle 2]
87
Minimum-Resource, Latency-Constrained
  • 2)For each resource type
  • 3) Determine candidate nodes C
  • 4) Compute slack for each candidate
  • Slack = latest possible cycle - current cycle

Cycle 3.  Candidates: 7 (slack 0).  Resources so far: 2 Mults, 2 ALUs.

[Figure - op 7 is the only remaining candidate]
88
Minimum-Resource, Latency-Constrained
  • Final Schedule

Required Resources: 2 Mults, 2 ALUs

Node: 1  2  3  4  5  6  7
LPC:  1  1  1  3  2  2  3

[Figure - final schedule: cycle 1 = ops 1, 2, 3; cycle 2 = ops 4, 5, 6; cycle 3 = op 7]
89
Other extensions
  • Chaining
  • Multiple operations in a single cycle
  • Pipelining
  • Input DFG, data delivery rate
  • For fully pipelined circuit, must have one
    resource per operation (remember systolic arrays)

[Figure - chaining example with inputs a-f: several adds chained combinationally in one cycle alongside a divide, since multiple adds may be faster than one divide]
90
Summary
  • Scheduling assigns each operation in a DFG a
    start time
  • Done for each DFG in the CDFG
  • Different Types
  • Minimum Latency
  • ASAP, ALAP
  • Latency-constrained
  • ASAP, ALAP
  • Minimum-latency, resource-constrained
  • Hu's Algorithm
  • List Scheduling
  • Minimum-resource, latency-constrained
  • List Scheduling

91
High-Level Synthesis: Binding/Resource Sharing
92
Binding
  • During scheduling, we determined
  • When ops will execute
  • How many resources are needed
  • We still need to decide which ops execute on
    which resources
  • => Binding
  • If multiple ops use the same resource
  • => Resource Sharing

93
Binding
  • Basic Idea - Map operations onto resources such
    that operations in the same cycle don't use the same
    resource

2 ALUs (+/-), 2 Multipliers

[Figure - the scheduled DFG (cycle 1: ops 1, 2, 3; cycle 2: ops 4, 5, 6; cycle 3: op 7; cycle 4: op 8) with each operation assigned to one of ALU1, ALU2, Mult1, Mult2]
94
Binding
  • Many possibilities
  • Bad binding may increase resources, require huge
    steering logic, reduce clock, etc.

2 ALUs (+/-), 2 Multipliers

[Figure - an alternative binding of the same schedule onto ALU1, ALU2, Mult1, Mult2]
95
Binding
  • Can't do this
  • 1 resource can't perform multiple ops
    simultaneously!

2 ALUs (+/-), 2 Multipliers

[Figure - an invalid binding: two operations scheduled in the same cycle are mapped onto the same resource]
96
Binding
  • How to automate?
  • More graph theory
  • Compatibility Graph
  • Each node is an operation
  • Edges represent compatible operations
  • Compatible - if two ops can share a resource
  • I.e. Ops that use same type of resource (ALU,
    etc.) and are scheduled to different cycles
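Binding from a compatibility graph can be sketched in C with a simple greedy clique-growing pass (clique partitioning itself is NP-complete, and this greedy heuristic is only an illustration, not the slides' exact procedure; the schedule data below mirrors the example on the following slides, renumbered from 0):

#include <stdio.h>

#define N 8

/* compat[i][j] = 1 if ops i and j can share a resource
   (same resource type, scheduled in different cycles). */
static int compat[N][N];
static int bound[N];          /* resource id per op; 0 = not yet bound */

int main(void) {
    /* Ops 0..7 correspond to the slides' ops 1..8:
       start cycle and resource type (0 = ALU, 1 = multiplier). */
    int cycle[N] = {1, 1, 1, 2, 2, 2, 3, 4};
    int type[N]  = {1, 0, 0, 0, 1, 1, 0, 0};

    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            compat[i][j] = (i != j) && type[i] == type[j] && cycle[i] != cycle[j];

    /* Greedy clique growing: each new resource starts a clique and absorbs
       every still-unbound op compatible with all of its current members. */
    int resources = 0;
    for (int i = 0; i < N; i++) {
        if (bound[i]) continue;
        bound[i] = ++resources;
        for (int j = i + 1; j < N; j++) {
            if (bound[j] || !compat[i][j]) continue;
            int ok = 1;
            for (int k = 0; k < N; k++)
                if (bound[k] == resources && !compat[j][k]) ok = 0;
            if (ok) bound[j] = resources;
        }
    }
    for (int v = 0; v < N; v++)
        printf("op %d (type %d, cycle %d) -> resource %d\n",
               v, type[v], cycle[v], bound[v]);
    return 0;
}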

97
Compatibility Graph

[Figure - the scheduled DFG (cycles 1-4) next to its compatibility graph: ALU ops 2, 3, 4, 7, 8 and Mult ops 1, 5, 6 are nodes, with edges between compatible operations. 2 and 3 are not compatible (same cycle); 5 and 6 are not compatible (same cycle)]
98
Compatibility Graph

Note - fully connected subgraphs can share a resource (all involved nodes are compatible)

[Figure - the scheduled DFG and compatibility graph with one such fully connected subgraph (clique) highlighted]
99
Compatibility Graph

[Figure - the same compatibility graph with a different fully connected subgraph (clique) highlighted]
100
Compatibility Graph

[Figure - the same compatibility graph with yet another fully connected subgraph (clique) highlighted]
101
Compatibility Graph
  • Binding: Find the minimum number of fully connected
    subgraphs (cliques) that cover the entire graph
  • Well-known problem: clique partitioning
    (NP-complete)
  • Cliques: {2, 8, 7, 4}, {3}, {1, 5}, {6}
  • ALU1 executes 2, 8, 7, 4
  • ALU2 executes 3
  • MULT1 executes 1, 5
  • MULT2 executes 6

[Figure - compatibility graph partitioned into these four cliques]
102
Compatibility Graph
  • Final Binding


[Figure - the schedule annotated with the chosen binding: ALU1 = {2, 8, 7, 4}, ALU2 = {3}, Mult1 = {1, 5}, Mult2 = {6}]
103
Compatibility Graph
  • Alternative Final Binding


[Figure - an alternative legal binding of the same schedule onto the 2 ALUs and 2 multipliers]
104
Translation to Datapath
[Figure - the scheduled and bound DFG, with primary inputs a-i feeding operations 1-8 over cycles 1-4]
  1. Add resources and registers
  2. Add mux for each input
  3. Add input to left mux for each left input in DFG
  4. Do same for right mux
  5. If only 1 input, remove mux

[Figure - resulting datapath: functional units Mult(1,5), Mult(6), ALU(2,7,8,4), ALU(3), each with an output register and input muxes selecting among the inputs a-i and register outputs]
105
Left Edge Algorithm
  • Alternative to clique partitioning
  • Take scheduled DFG, rotate it 90 degrees

2 ALUs (+/-), 2 Multipliers
106
Left Edge Algorithm
2 ALUs (+/-), 2 Multipliers

  1. Initialize right_edge to 0
  2. Find a node N whose left edge is > right_edge
  3. Bind N to a particular resource
  4. Update right_edge to the right edge of N
  5. Repeat from 2) for nodes using the same resource
    type until right_edge passes all nodes
  6. Repeat from 1) until all nodes bound

[Figure - the schedule rotated 90 degrees: each operation becomes an interval spanning the cycles (1-7) in which it executes; right_edge starts at 0]
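A minimal C sketch of the left-edge pass for one resource type (the intervals below are hypothetical; in practice the pass is run once per resource type, and the standard formulation picks the unbound interval with the smallest qualifying left edge):

#include <stdio.h>

#define N 8

int main(void) {
    /* Each operation occupies cycles [left, right] on one resource of its
       type (multicycle ops span several cycles). Hypothetical intervals. */
    int left[N]  = {1, 1, 1, 2, 2, 3, 3, 4};
    int right[N] = {1, 2, 1, 2, 3, 3, 3, 4};
    int res[N]   = {0};                  /* assigned resource; 0 = unbound */

    int resources = 0, remaining = N;
    while (remaining > 0) {
        resources++;                     /* 1)/6) start a new resource      */
        int right_edge = 0;
        for (;;) {
            int pick = -1;               /* 2) unbound node, left edge > right_edge */
            for (int v = 0; v < N; v++)
                if (!res[v] && left[v] > right_edge &&
                    (pick < 0 || left[v] < left[pick]))
                    pick = v;
            if (pick < 0) break;         /* 5) right_edge passed all nodes  */
            res[pick] = resources;       /* 3) bind node to this resource   */
            right_edge = right[pick];    /* 4) advance right_edge           */
            remaining--;
        }
    }
    for (int v = 0; v < N; v++)
        printf("op %d [%d,%d] -> resource %d\n", v, left[v], right[v], res[v]);
    printf("resources used: %d\n", resources);
    return 0;
}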
107
Left Edge Algorithm
2 ALUs (+/-), 2 Multipliers

  1. Initialize right_edge to 0
  2. Find a node N whose left edge is > right_edge
  3. Bind N to a particular resource
  4. Update right_edge to the right edge of N
  5. Repeat from 2) for nodes using the same resource
    type until right_edge passes all nodes
  6. Repeat from 1) until all nodes bound

[Figure - walkthrough step: an interval whose left edge exceeds right_edge is bound and right_edge advances]
108
Left Edge Algorithm
2 ALUs (+/-), 2 Multipliers

  1. Initialize right_edge to 0
  2. Find a node N whose left edge is > right_edge
  3. Bind N to a particular resource
  4. Update right_edge to the right edge of N
  5. Repeat from 2) for nodes using the same resource
    type until right_edge passes all nodes
  6. Repeat from 1) until all nodes bound

[Figure - walkthrough step: an interval whose left edge exceeds right_edge is bound and right_edge advances]
109
Left Edge Algorithm
2 ALUs (+/-), 2 Multipliers

  1. Initialize right_edge to 0
  2. Find a node N whose left edge is > right_edge
  3. Bind N to a particular resource
  4. Update right_edge to the right edge of N
  5. Repeat from 2) for nodes using the same resource
    type until right_edge passes all nodes
  6. Repeat from 1) until all nodes bound

[Figure - walkthrough step: an interval whose left edge exceeds right_edge is bound and right_edge advances]
110
Left Edge Algorithm
2 ALUs (+/-), 2 Multipliers

  1. Initialize right_edge to 0
  2. Find a node N whose left edge is > right_edge
  3. Bind N to a particular resource
  4. Update right_edge to the right edge of N
  5. Repeat from 2) for nodes using the same resource
    type until right_edge passes all nodes
  6. Repeat from 1) until all nodes bound

[Figure - walkthrough step: an interval whose left edge exceeds right_edge is bound and right_edge advances]
111
Left Edge Algorithm
2 ALUs (+/-), 2 Multipliers

  1. Initialize right_edge to 0
  2. Find a node N whose left edge is > right_edge
  3. Bind N to a particular resource
  4. Update right_edge to the right edge of N
  5. Repeat from 2) for nodes using the same resource
    type until right_edge passes all nodes
  6. Repeat from 1) until all nodes bound

[Figure - walkthrough step: an interval whose left edge exceeds right_edge is bound and right_edge advances]
112
Left Edge Algorithm
2 ALUs (+/-), 2 Multipliers

  1. Initialize right_edge to 0
  2. Find a node N whose left edge is > right_edge
  3. Bind N to a particular resource
  4. Update right_edge to the right edge of N
  5. Repeat from 2) for nodes using the same resource
    type until right_edge passes all nodes
  6. Repeat from 1) until all nodes bound

[Figure - walkthrough step: an interval whose left edge exceeds right_edge is bound and right_edge advances]
113
Left Edge Algorithm
2 ALUs (+/-), 2 Multipliers

  1. Initialize right_edge to 0
  2. Find a node N whose left edge is > right_edge
  3. Bind N to a particular resource
  4. Update right_edge to the right edge of N
  5. Repeat from 2) for nodes using the same resource
    type until right_edge passes all nodes
  6. Repeat from 1) until all nodes bound

[Figure - walkthrough step: an interval whose left edge exceeds right_edge is bound and right_edge advances]
114
Left Edge Algorithm
2 ALUs (+/-), 2 Multipliers

  1. Initialize right_edge to 0
  2. Find a node N whose left edge is > right_edge
  3. Bind N to a particular resource
  4. Update right_edge to the right edge of N
  5. Repeat from 2) for nodes using the same resource
    type until right_edge passes all nodes
  6. Repeat from 1) until all nodes bound

[Figure - walkthrough step: an interval whose left edge exceeds right_edge is bound and right_edge advances]