Title: Memory Efficient Software Synthesis from Dataflow Graph
- Wonyong Sung, Junedong Kim, Soonhoi Ha
- Codesign and Parallel Processing Lab.
- Seoul National University
Contents
- Introduction
- Code Generation from Block Diagram Specification
- Synchronous Data Flow and Single Appearance Schedule
- Proposed Strategies
- Optimization 1: code sharing
- Optimization 2: buffer requirement minimization
- Experiments
- Conclusions
Introduction
- Motivations
- Embedded systems have a limited amount of memory
- A large program memory increases cost and power consumption and incurs a performance penalty
- New trend in software development: high-level design methodology
- Driven by growing complexity, fast design turn-around time, limited budget, etc.
- Goal of research
- Reduce the code and data size of automatically generated software
- In an automatic software synthesis environment
- Specification: dataflow graph with SDF (Synchronous DataFlow) semantics
Software Synthesis from SDF Graph
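Code for (6A)(4B)(3C)(2D):
main() {
    for (i = 0; i < 6; i++) A;
    for (i = 0; i < 4; i++) B;
    for (i = 0; i < 3; i++) C;
    for (i = 0; i < 2; i++) D;
}
Code for (2(3A2B))(3C)(2D):
main() {
    for (i = 0; i < 2; i++) {
        for (j = 0; j < 3; j++) A;
        for (j = 0; j < 2; j++) B;
    }
    for (i = 0; i < 3; i++) C;
    for (i = 0; i < 2; i++) D;
}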
[Figure: SDF graph with actors A, B, C and D; each edge is annotated with its token production and consumption rates]
Possible schedules:
- AABCABACDABABCD
- (6A)(4B)(3C)(2D) and (2(3A2B))(3C)(2D): single appearance schedules (SAS)
Previous Efforts
- Single Appearance Schedule (SAS): APGAN, RPMC
- by Bhattacharyya et al. in the Ptolemy group
- SAS guarantees the minimum code size (without code sharing)
- APGAN and RPMC: heuristics to find a data-minimized SAS
- ILP formulation for data memory minimization
- by Ritz et al. in the Meyr group
- flat single appearance schedule with sharing of data buffers
- Rate-optimal compile-time schedule
- by Govindarajan et al. in the Gao group
- tried to minimize the buffer requirement using linear programming
- An algorithm to compute the smallest data buffer size
- by Ade et al. in the GRAPE group
Proposed Strategies
- Coding style
- Not tied to one coding style: a hybrid approach
- Generated code is a mixture of inlined code and functions
- Optimization 1: Code Sharing
- Multiple instances of the same kernel are treated as different nodes in an SAS
- Code sharing has a gain (code block size) and a cost (context size)
- Optimization 2: Schedule Adjustment
- Give up the single appearance schedule to reduce the data size
- (1) Represent the schedule information with the BTLC data structure
- (2) Find possible locations for adjustment
- (3) Adjust the schedule
Flowchart of Optimization Procedure
- Get an SAS schedule (RPMC, APGAN)
- Code sharing optimization (based on code-block size and context size)
- Schedule adjustment
- C code generation
Example of Code Sharing (CD2DAT)
[Figure: CD2DAT graph (ramp, sine, fir1 to fir4, xgraph) before and after the FIR instances share one code block]
Code before sharing:
for (int i = 0; i < 2; i++) {
    /* code for fir1 */
    out ... tap ... input[i] ...
    /* code for fir2 */
    ...
}
Code after sharing:
for (int i = 0; i < 2; i++) fir(1);
for (int i = 0; i < 3; i++) fir(2);
void fir(int context) {
    context_FIR[context].out ...
}
Context definition:
typedef struct {
    double out;
    int output_ofs;
    int output_bs;
    int output_nx;
    ...
    double decimation;
    double tap;
} context_FIR;
Code Size Overhead (SPARC/Solaris)
- Without context: ... value ... (4 bytes)
ldd [%fp-336], %o0
- With context: ... (context_CGCRamp[context].value) ... (40 bytes)
sethi %hi(0x20800), %o1
ld [%o1+0x3c8], %o0
mov %o0, %o2
sll %o2, 2, %o1
add %o1, %o0, %o1
sll %o1, 3, %o0
add %fp, -424, %o1
add %o1, %o0, %o2
ld [%o2+0x1c], %o0
ldd [%o0], %o2
- Reference overhead: 36 bytes!
Optimization 1: Code Sharing
- Multiple instances of the same kernel each have their own context
- Kernel code must be transformed into a shared-version function
- Shared version
- All references go through the context variable
- Gain and cost of sharing
- Gain = (number of instances − 1) × (code block size)
- Cost = (number of instances) × (context variable size) + (code block overhead)
- Code sharing is performed only when the gain is larger than the cost
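Decision Formula
(1) Code sharing overhead: $\Delta = \Delta_{\text{context}} + \Delta_{\text{reference}}$
(2) $\Delta_{\text{context}} = \sum_{p_i \in \text{ports}} \gamma(p_i)$, where $\gamma(x) = 3\,\mathrm{sizeof(int)} + \mathrm{sizeof(pointer)}$
(3) $\Delta_{\text{reference}} = \sum_{t \in \{S,\,C,\,AS,\,AP\}} \rho(t)\,\delta(t)$, where $\rho(t)$ is the reference count, $\delta(t)$ the unit overhead, and $t$ the type of reference
(4) $\beta$: code block size
(5) $n$: number of instances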
Optimization 2: Adjusting the SAS
- Adjusting the single appearance schedule
- 2(7A3B)5C → buffer requirement 51
- 2(7A3B2C)C → buffer requirement 39
- Give up the single appearance schedule
- BTLC (Binary Tree with Leaf Chain)
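[Figure: BTLC of graph G for schedule 2(7A3B)5C, with loop counts 2, 7, 3 and 5; each node is annotated with an (input, inside, output) triple: (0,0,3), (7,0,5), (6,0,0), (0,0,21), (21,0,15)]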
Computation of Buffer Requirements
[Figure: buffer computation for schedule 2(7A3B)5C: edge A→B needs 21 tokens and edge B→C needs 30]
Flowchart of Schedule Adjustment
- Start from the SAS schedule
- Construct the BTLC
- Compute the buffer requirement
- Find a candidate for adjustment
- If found: adjust the schedule (split a chain)
- If not found: done, proceed to code generation
Splitting a Chain
- Finding a split candidate
- the chain with the largest number
- in this example, BC is selected
- Schedule after splitting
- 2(7A3B2C)C
- In general, for a schedule with two clusters $a C_a \, b C_b$ ($a$ and $b$ are loop counts), the new schedule is defined as
- $a(C_a \lfloor b/a \rfloor C_b)\,(b \bmod a)\,C_b$, if $a < b$
- $(a \bmod b)\,C_a\, b(\lfloor a/b \rfloor C_a C_b)$, otherwise
[Figure: BTLC of schedule 2(7A3B)5C annotated with (input, inside, output) triples; the split point lies on chain BC, between the (7A3B) cluster and C]
Decision Formula
[Figure: BTLC of graph G before and after splitting chain BC, each cluster annotated with its triple]
- Cluster W: value of the cluster
- New schedule: 2(7A3B2C)C, gain 12
Experiment: CD2DAT
Experimental Results
Program size after each optimization:
| Optimization | CD2DAT | Filter Bank |
|---|---|---|
| SAS | 13672 | 28512 |
| Code Sharing | 12768 | 22024 |
| Schedule Adjustment | 12296 | 22024 |

Memory behavior of CD2DAT on ARM7:
| Optimization | Fetches | Misses |
|---|---|---|
| SAS | 17098177 | 57189 |
| Code Sharing | 17573923 | 52867 |
| Schedule Adjustment | 17499386 | 54331 |
Conclusion
- Our environment
- PeaCE: Ptolemy extension as Codesign Environment
- Optimization techniques in software synthesis
- For automatic code generation from dataflow graphs
- Joint minimization of code and data size
- Selective application of code sharing and schedule adjustment to the SAS
- Future work
- Clustering multiple fine-grain nodes into a larger one
- Increases the chance of code sharing
- Buffer sharing
- Further reduces the buffer size and improves cache behavior
Thank You!