Generation of CDFGs from Scheduled and Pipelined Assembly Code

Description:

Linearizing Pipelined Computational Operations ... 27% increase in code size during the linearization process. ... Linearize the assembly code. Generate the ...

Slides: 19

Provided by: davidcz

Learn more at: https://www.ece.lsu.edu

1
Generation of CDFGs from Scheduled and Pipelined
Assembly Code

The 18th International Workshop on Languages and
Compilers for Parallel Computing
October 20, 2005
David Zaretsky, Gaurav Mittal, Robert Dick, and
Prith Banerjee
Department of Electrical Engineering and Computer
Science, Northwestern University
College of Engineering, University of Illinois at
Chicago

2
The Future of DSP Applications

Recent advances in embedded communications and
control systems are pushing the computational
limits of DSP applications, driving the need for
hardware/software co-design systems.

3
Binary Translation

Problems with high-level synthesis
High-level application unavailable
Hardware compiler unavailable
Binary Translation
Grammar
Operation Latencies
Software Pipelining
Processor Architecture Limitations
Functional Units
Data Paths
Physical Registers
Memory Spilling
Control and Data Flow Graphs
Optimizations
Scheduling
Design decisions

4
FREEDOM: Bridging the Gap

The FREEDOM compiler automates the task of HW/SW
partitioning for software binaries.
FREEDOM is an acronym for Fabrication of
Reconfigurable Hardware Environments from DSP
Optimized Machine Code.

5
Related Work

Binary Decompilation & Translation
Cifuentes93/96/98
Kruegel04
Dehnert03
Stitt02/03
Dynamic Binary Optimizations
Bala00
Gschwind00
Ye00
Levine03
Control and Data Flow Analysis
Kastner02
Decker03
Amme00
Cooper02

6
Presentation Overview

FREEDOM Compiler Infrastructure
Data Dependency Analysis
CDFG Generation from Scheduled Assembly Code
Experimental Results
Summary & Conclusions

7
The FREEDOM Compiler

Common entry point for multiple assembly
languages.
Intermediate levels:
Machine Language Syntax Tree
Control Data Flow Graph
Hardware Description Language
An Architecture Description Language provides
resource information for the target FPGA
architecture.
Output: RTL VHDL/Verilog and testbench.

8
Machine Language Abstract Syntax Tree (MST)

Generic language encapsulates most ISAs,
including predicated and parallel instruction
sets.
All MST instructions are three-operand,
predicated instructions of the form: pred op src1 src2 dst
Operand types: Memory Address, Label, Register,
Immediate.
Operator types:
Logical: AND, NAND, NEG, NOR, NOT, OR, XOR, SLL,
SRL, etc.
Arithmetic: ADD, DIV, MULT, SUB
Branch: BEQ, BGEQ, BGT, BLEQ, BLT, BNEQ, GOTO,
CALL
Comparison: CMPEQ, CMPNE, CMPLT, CMPLE, CMPGT,
CMPGE
Assignment: LD, ST, MOVE, UNION
General: NOP
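
The instruction format above can be pictured as a small record type. This is only an illustrative sketch: the class name MstInstruction, its field names, and the OPERATORS table are assumptions made for this example and are not taken from the FREEDOM compiler itself. The operator groups are copied from the slide, with the "etc." in the logical group left unexpanded.

```python
from dataclasses import dataclass
from typing import Optional

# Operator groups as listed on the slide (hypothetical table name).
OPERATORS = {
    "logical": {"AND", "NAND", "NEG", "NOR", "NOT", "OR", "XOR", "SLL", "SRL"},
    "arithmetic": {"ADD", "DIV", "MULT", "SUB"},
    "branch": {"BEQ", "BGEQ", "BGT", "BLEQ", "BLT", "BNEQ", "GOTO", "CALL"},
    "comparison": {"CMPEQ", "CMPNE", "CMPLT", "CMPLE", "CMPGT", "CMPGE"},
    "assignment": {"LD", "ST", "MOVE", "UNION"},
    "general": {"NOP"},
}


@dataclass(frozen=True)
class MstInstruction:
    """One three-operand, optionally predicated instruction (illustrative)."""
    op: str
    src1: Optional[str] = None
    src2: Optional[str] = None
    dst: Optional[str] = None
    pred: Optional[str] = None  # guarding predicate; None means "always execute"

    def group(self) -> str:
        """Return the operator group this instruction belongs to."""
        for name, ops in OPERATORS.items():
            if self.op in ops:
                return name
        raise ValueError(f"unknown MST operator: {self.op}")


add = MstInstruction("ADD", "A6", "A7", "A7")
guarded_sub = MstInstruction("SUB", "A1", "1", "A1", pred="A1")
print(add.group(), guarded_sub.pred)
```

Keeping the predicate as an optional field means unpredicated and predicated instructions share one representation, which is one plausible way to cover ISAs with and without predication.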

9
Data Dependency Analysis

Each MST instruction is assigned:
A timestep T
An operation delay
The nth instruction in a parallel set has its
timestep incremented: T_n = T + 0.01 × n
The mth instruction in an expanded set has its
timestep incremented: T_m = T_n + 0.0001 × m
The write-back stage of an instruction is defined
as wb = timestep + delay
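
The timestep and write-back rules above amount to a few lines of arithmetic. The sketch below is an assumption-laden illustration: the function names are invented here, and the values are rounded to four decimals to match the four-digit timesteps in the slide's example. It evaluates the four instructions of that example schedule.

```python
def timestep(cycle: int, n: int = 0, m: int = 0) -> float:
    """T + 0.01*n + 0.0001*m, where n indexes the parallel set and m the expanded set."""
    return round(cycle + 0.01 * n + 0.0001 * m, 4)


def write_back(ts: float, delay: int) -> float:
    """wb = timestep + delay."""
    return round(ts + delay, 4)


# (operation, timestep, delay) for the example schedule on this slide
schedule = [
    ("MULT", timestep(1), 2),
    ("LD", timestep(2), 5),
    ("ADD", timestep(2, n=1), 1),  # issued in parallel with the LD
    ("ADD", timestep(3), 1),
]

for op, ts, delay in schedule:
    print(f"{ts:.4f}  {op:<4}  delay={delay}  wb={write_back(ts, delay):.4f}")
```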

TIMESTEP  PC      OP    DELAY  SRC1  SRC2  DST
1.0000    0X0020  MULT  (2)    A4    2     A4
2.0000    0X0024  LD    (5)    (A4)        A2
2.0100    0X0028  ADD   (1)    A4    4     A2
3.0000    0X002c  ADD   (1)    A4    A2    A3

10
CDFG Generation from Scheduled Assembly Code

0x0000  VECTORSUM        ZERO  A7
0x0004                   LDW   A4, A6
0x0008                   B     LOOP
0x000C                   LDW   A4, A6
0x0010                   B     LOOP
0x0014                   LDW   A4, A6
0x0018                   B     LOOP
0x001C                   LDW   A4, A6
0x0020                   B     LOOP
0x0024                   LDW   A4, A6
0x0028                   B     LOOP
0x002C                   SUB   A1, 4, A1
0x0030  LOOP             ADD   A6, A7, A7
0x0034             [A1]  LDW   A4, A6
0x0038             [A1]  SUB   A1, 1, A1
0x003C             [A1]  B     LOOP
0x0040                   STW   A7, A5
0x0044                   NOP   4

Pipelined assembly code presents difficulties in
CDFG generation:
Complex control flows
Varying data dependencies
CDFG generation proceeds in three steps:
Generate a Control Flow Graph
Linearize Pipelined Operations
Generate Data Flow Graph

11
Building a Control Flow Graph

Based on work by K. Cooper et al., "Building a
Control-Flow Graph from Scheduled Assembly Code,"
Dept. of Computer Science, Rice University.
Generates a CFG in O(n) time.
Requires three stages:
Partition the code at labels into a set of basic
blocks.
Add edges between CFG blocks to represent normal
flow of control.
Iteratively propagate pipelined branch and
counter information in a simulated control flow.
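
The first two stages above can be sketched in a single pass over the code. This is a simplified illustration, not the algorithm from the cited report: the function names and the (label, text) input format are assumptions, "normal flow of control" is read here as plain fall-through between consecutive blocks, and the third stage (simulating pipelined branches) is deliberately left out.

```python
from typing import Dict, List, Optional, Tuple

Instr = Tuple[Optional[str], str]  # (label or None, instruction text)


def partition_at_labels(code: List[Instr]) -> List[Dict]:
    """Stage 1: start a new block at every labelled instruction."""
    blocks: List[Dict] = []
    for label, text in code:
        if label is not None or not blocks:
            blocks.append({"label": label, "instrs": []})
        blocks[-1]["instrs"].append(text)
    return blocks


def fall_through_edges(blocks: List[Dict]) -> List[Tuple[int, int]]:
    """Stage 2 (assumed reading): connect each block to its successor."""
    return [(i, i + 1) for i in range(len(blocks) - 1)]


code = [
    ("VECTORSUM", "ZERO A7"),
    (None, "LDW A4, A6"),
    (None, "B LOOP"),
    ("LOOP", "ADD A6, A7, A7"),
    (None, "LDW A4, A6"),
    (None, "B LOOP"),
    (None, "STW A7, A5"),
]

blocks = partition_at_labels(code)
edges = fall_through_edges(blocks)
print([b["label"] for b in blocks], edges)
```

Note that the branches do not end blocks in this sketch, because a pipelined branch takes effect several cycles after it is issued; placing those delayed edges is the job of the third stage.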

12
Event-Triggered Operations

Analogous to a read/write pipeline architecture.
Event trigger and execution stages are offset by
operation delay (d).
Implemented using a virtual shift register of
size d.
Event is triggered by assigning a 1 to the
highest bit (d-1).
SRL operation is performed on the register in
successive cycles.
Event is executed after d cycles, when a 1
appears in the zero bit.
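
A minimal sketch of the virtual shift register described above follows. The class and method names are hypothetical, and the timing convention is an assumption: step() is called once per cycle, reads bit zero, and then performs the SRL, so an event triggered with delay d fires on the d-th step.

```python
class EventShiftRegister:
    """Virtual shift register of size d that delays an event by d cycles."""

    def __init__(self, delay: int) -> None:
        if delay < 1:
            raise ValueError("delay must be at least 1")
        self.delay = delay
        self.bits = 0

    def trigger(self) -> None:
        """Assign a 1 to the highest bit (d-1)."""
        self.bits |= 1 << (self.delay - 1)

    def step(self) -> bool:
        """Advance one cycle: report whether bit zero is set, then shift right."""
        fired = bool(self.bits & 1)
        self.bits >>= 1
        return fired


reg = EventShiftRegister(delay=5)
reg.trigger()
history = [reg.step() for _ in range(6)]
print(history)
```

Because each in-flight event occupies its own bit, a second trigger issued before the first one fires is tracked independently, which is what back-to-back pipelined branches require.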

13
Linearizing Pipelined Branch Operations

Iteratively propagate pipelined branch and
counter information in a simulated control flow.
Trigger a change in control flow after a number
of delay cycles.
Only the event is propagated using the SRL
operation.
Copy of branch instruction inserted at each
execution point.
The branch is predicated on the event
shift-register.
Intersecting branch paths are merged by OR-ing
predicates.
The original branch instructions are replaced
with NOPs.

14
Linearizing Pipelined Computational Operations

Multi-cycle instructions are serialized into
well-defined data flow paths along the pipeline.
For an instruction with n delay slots, the value
is propagated through virtual registers Rn-1 ← Rn,
Rn-2 ← Rn-1, ..., R0 ← R1, where R0 is the original
register name.
Each instruction in the sequence is guarded by a
predicate on an event-triggering register bit.
Intersecting data paths are merged by OR-ing
predicates.
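
The register chain above can be generated mechanically. The sketch below is an illustration under stated assumptions: the function name linearize, the tuple layout (pred, op, srcs, dst), the virtual register naming scheme dst_i, and the pairing of the move into register i with event bit ev[i] are all choices made for this example and are not confirmed by the slides.

```python
from typing import List, Optional, Sequence, Tuple

MstOp = Tuple[Optional[str], str, Tuple[str, ...], str]  # (pred, op, srcs, dst)


def linearize(op: str, srcs: Sequence[str], dst: str, n: int,
              event: str = "ev") -> List[MstOp]:
    """Expand an instruction with n delay slots into a chain of guarded moves."""
    if n < 0:
        raise ValueError("n must be non-negative")
    # The original operation writes the last virtual register Rn
    # (or the real destination when there are no delay slots).
    first_dst = dst if n == 0 else f"{dst}_{n}"
    seq: List[MstOp] = [(None, op, tuple(srcs), first_dst)]
    # Rn-1 <- Rn, ..., R0 <- R1, with R0 being the original register name.
    for i in range(n - 1, -1, -1):
        target = dst if i == 0 else f"{dst}_{i}"
        seq.append((f"{event}[{i}]", "MOVE", (f"{dst}_{i + 1}",), target))
    return seq


chain = linearize("LD", ["(A4)"], "A2", n=2)
for pred, name, sources, target in chain:
    guard = f"[{pred}] " if pred else ""
    print(f"{guard}{name} {', '.join(sources)} -> {target}")
```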

15
Building the Data Flow Graph

DFG represents data dependencies in each MST
procedure.
DFG is generated using write-back times of MST
instructions.
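
One plausible reading of "generated using write-back times" is sketched below: a consumer depends on the producer of a source register whose write-back time is the latest one not exceeding the consumer's timestep. The Op type, the build_dfg function, and the sample schedule are hypothetical. The slides do not say how fractional timesteps of parallel instructions are compared, so this example uses whole-cycle timesteps only.

```python
from typing import List, NamedTuple, Tuple


class Op(NamedTuple):
    name: str
    timestep: float
    delay: int
    srcs: Tuple[str, ...]  # source registers only (immediates omitted)
    dst: str

    @property
    def wb(self) -> float:
        """Write-back time: timestep + delay."""
        return self.timestep + self.delay


def build_dfg(ops: List[Op]) -> List[Tuple[int, int, str]]:
    """Return edges (producer index, consumer index, register)."""
    edges = []
    for ci, consumer in enumerate(ops):
        for reg in consumer.srcs:
            # Producers of reg whose result is written back in time.
            ready = [(p.wb, pi) for pi, p in enumerate(ops)
                     if pi != ci and p.dst == reg and p.wb <= consumer.timestep]
            if ready:
                _, pi = max(ready)  # most recent completed write wins
                edges.append((pi, ci, reg))
    return edges


ops = [
    Op("MULT", 1, 2, ("A4",), "A4"),      # wb = 3
    Op("LD", 2, 5, ("A4",), "A2"),        # wb = 7, reads the old A4
    Op("ADD", 3, 1, ("A4", "A2"), "A3"),  # wb = 4, sees MULT but not LD
    Op("ADD", 7, 1, ("A2", "A3"), "A5"),  # sees LD and the first ADD
]
dfg = build_dfg(ops)
print(dfg)
```

The LD issued at timestep 2 gets no edge from the MULT because the MULT result is not written back until timestep 3; this is the kind of latency-dependent dependency that scheduled code makes necessary to track.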

DOTPROD        MVK   .S1  500,A1
               ZERO  .L1  A7
               MVK   .S1  2000,A3
LOOP           LDW   .D1  A4,A2
               LDW   .D1  A3,A5
               NOP        4
               MPY   .M1  A2,A5,A6
               SUB   .S1  A1,1,A1
               ADD   .L1  A6,A7,A7
         [A1]  B     .S2  LOOP
               NOP        5
               STW   .D1  A7,A3

16
CDFG Optimizations