Title: DFL Language Training
1 Training Software Version v2.2
2 Training Overview
- Key Concepts
- Edit and Compile Source
- Create Architecture
- Map to Architecture
- Schedule Operations
- Build the RT-Level
- Verify the Design
- Create and Use a User Library
- Supported C subset
3 Key Concepts
4Electronic Product Design
High-Complexity Applications
Time-2-Market
Time-2-Profit
Power-Efficient, High-Performance,
Cost-Effective, Flexible Architectures
Low-Cost
Low-Power
Deep-Sub-Micron Silicon Assembly
5Design Flow
Algorithm
Architecture
RT-level Synthesis
Abstraction Levels
architecture
Gates
Layout
6Time-to-Market
- Raising the abstraction level
- Code compactness
- Algorithmic description FIR filter
- 100 lines of C code
- RT-level description FIR filter
- 5,200 lines of HDL
- Blackbox
- Better simulation performance
- Easier design transfer and re-use
BEHAVIOR ( C SUBSET ) RT-LEVEL
7Flexibility
- Optimal area for application
- Low-power design
- More processing power/throughput
- Same starting point
- FPGA
- ASIC
- cheaper custom solution
BEHAVIOR ( C SUBSET ) RT-LEVEL
8Flexibility Example
RT-level synthesis
Behavioral synthesis
Reduction 50
Reduction 13
9Application Area
- Data path elements are shared over clock cycles
- Moderate decision making is involved
Controller FSM
Control/ Flags
Control
Data Path Cores Register Files
RAM/ROM Addr/Data Regs
Address/ Data
10Typical Applications
- ASSP Application Specific Standard Product
- Relatively complex data/signal processing
- GSM, DECT, wireless LAN
- Speech recognition, compression, processing
- JPEG, image processing
- Portable medical electronics
- ...
11 Design Constraints
- Design considerations
- Algorithm level
- Frame rate
- Frame = 1 execution of your algorithm
- 1 frame consumes 1 value for each input, produces 1 value for each output
- e.g. GSM LTP: 1 data frame (160 samples) every 20 ms
- Maximal latency: delay on signal caused by the algorithm
- RT-level
- Clock rate
- e.g. 50 MHz clock
- Cycle budget = Clock rate / Frame rate
- The number of clock cycles available to execute one frame
- e.g. for GSM LTP: 4000 cycles
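The cycle-budget relation can be sketched in C. This is a minimal illustration, not part of the tool: the function names `cycle_budget` and `min_clock_hz` are hypothetical, and a frame every 20 ms is taken as a frame rate of 50 frames per second.

```c
#include <assert.h>

/* Cycle budget = clock rate / frame rate (both in Hz).
   Integer division rounds down: a partial cycle cannot be used. */
static unsigned long cycle_budget(unsigned long clock_hz, unsigned long frame_hz)
{
    assert(frame_hz > 0);
    return clock_hz / frame_hz;
}

/* Smallest clock rate (Hz) that provides at least `cycles` per frame. */
static unsigned long min_clock_hz(unsigned long cycles, unsigned long frame_hz)
{
    assert(frame_hz > 0);
    return cycles * frame_hz;
}
```

Under these assumptions, a 50 MHz clock gives 1,000,000 cycles per 20 ms frame, and a budget of 4000 cycles at the same frame rate would correspond to a 200 kHz clock.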
12Target Processor Architecture
branch logic
ALU
MULT
IN
OUT
RAM
ROM
13Structure of a Cluster
14Internal Design Flow
15Internal Design Flow(2)
16Defaults, Options and Pragmas
- Increasing order of priority
- Tool defaults
- Option settings (if any)
- Pragmas for specific cases
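The priority order can be pictured as a simple lookup. The sketch below is only an illustration: the type `setting_t` and the function `effective_value` are hypothetical names, and a setting is reduced to a single integer for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical setting sources, in increasing order of priority. */
typedef struct {
    int tool_default;   /* always present */
    const int *option;  /* NULL when no option is set */
    const int *pragma;  /* NULL when no pragma applies */
} setting_t;

/* A pragma wins over an option, an option wins over the tool default. */
static int effective_value(const setting_t *s)
{
    assert(s != NULL);
    if (s->pragma != NULL) return *s->pragma;
    if (s->option != NULL) return *s->option;
    return s->tool_default;
}
```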
17Hardware Libraries
- Default library
- Supplied by Frontier
- Two versions
- - for Xilinx FPGA flow
- - for ASIC flow
- Sufficient to map all supported C operators
- User libraries
- Existing hardware blocks
- Custom hardware blocks for better speed/area/power trade-off
18Project organization
artd_cache
19 Edit and Compile Source
20 Key Concept
- In a first step, ART Designer converts the behavioral description of your algorithm into an internal representation in an intelligent way.
- intelligent -> it checks whether the code is C/C++ compliant and whether non-synthesizable constructs are present
- You can describe your algorithm using C/C++, optionally enriched by ART Library fixed-point types in C-style or SystemC-style.
- To use ART Library types
- #include <fxp.h> /* C/C++ version */
- #include <sc_fxp.h> /* SystemC version */
21C Compiler optimizations
- Dead code elimination
- Constant propagation
- only for temporary expressions with constants
- b = a + 2 + 3 => b = a + 5
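The effect of folding constants in a temporary expression can be checked with ordinary C. This is a sketch with hypothetical function names; it only shows that both forms compute the same value, not how the compiler performs the rewrite.

```c
#include <assert.h>

/* Expression as written in the source. */
static int as_written(int a) { return a + 2 + 3; }

/* Expression after the two constants have been combined. */
static int folded(int a) { return a + 5; }

/* Both forms must agree for any input that does not overflow. */
static void check_fold(int a)
{
    assert(as_written(a) == folded(a));
}
```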
22C Compiler Options (1)
Specification of the include search path. Multiple entries are separated by semicolons. Specification is relative to the project subdirectory.
Example: /home/john/include;..;MY_INCLUDES/include
Macros to be defined/undefined, semicolon separated.
Example for Defines: FXPTRACE;MY_DEFINE1
Enables C test bench generation. I/O can be read in binary or decimal format.
Saves the source file obtained after CPP processing.
Enables strict ANSI C compliance.
23 C Compiler Options (2)
- Data flow analysis
- identifies and accurately represents the parallelism of the C code by determining the exact data dependencies between the variables
- to achieve
- - better performance
- - optimal use of target processor
24 Data Flow Analysis
void calc_address(const T_AD i, const T_AD j, T_AD &address)
{
    address = const1*i + const2*j;
}
void mydesign()
{
    ...
    for (i=0; i<16; i++)
        for (j=0; j<16; j++) {
            calc_address(i, j, address);
            a = A[address];
            ..   // calculation of b
            A[address] = b;
        }
}
DFA will check whether or not the write address is different from the read address for every iteration! This determines how much loop folding can be performed.
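The kind of question data flow analysis answers for this loop nest can be illustrated by brute force. The sketch below is not the tool's algorithm: it enumerates all 16 x 16 iterations and reports whether two different iterations ever touch the same address. The values passed for `const1` and `const2` are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>

enum { N = 16 };

static int calc_address(int i, int j, int const1, int const2)
{
    assert(i >= 0 && i < N && j >= 0 && j < N);
    return const1 * i + const2 * j;
}

/* Returns true when no two different iterations (i,j) use the same
   address, so the write A[address] of one iteration can never collide
   with the read A[address] of another iteration. */
static bool iterations_independent(int const1, int const2)
{
    for (int a = 0; a < N * N; a++)
        for (int b = a + 1; b < N * N; b++)
            if (calc_address(a / N, a % N, const1, const2) ==
                calc_address(b / N, b % N, const1, const2))
                return false;
    return true;
}
```

With `const1 = 16` and `const2 = 1` every iteration has its own address, so iterations may overlap freely; with `const1 = 1` and `const2 = 1` addresses repeat and the overlap has to be limited.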
25 Pragmas in C Source
- #pragma OUT <var_name_1> <var_name_2>
- Used to indicate function arguments that are strictly outputs
- This is not checked by the compiler!
- Example
26 Create Architecture
27 Key Concept
- In this step, you instantiate the hardware resources needed to define the target architecture you want to use
- You only have to instantiate the central elements of hardware clusters (auxiliary resources like register files, muxes and tristate buffers are generated automatically at a later step)
- Cores (ALU, MULT, ...)
- Memories (RAM, ROM, ...)
- Ports (INPORT, OUTPORT)
- You also instantiate one type of controller
28Architecture Model
29 Instantiating Resources
- Resources can be instantiated from
- The default library artd_library (for ASIC flow) or artd_xilinx_library (for Xilinx FPGA flow)
- A user library
- The libraries must have been selected in the Create Architecture options
30Resources in the Default Library (1)
- Cores
- alu, alusat,
- mult, multp, mac2, mac3
- acu
- Memories
- rom, ram
- romctrl
- dpram_r_w, dpram_r_rw, dpram_w_rw, dpram_rw_rw
- dprom, dpromctrl
31Resources in the Default Library(2)
- Ports
- inport, inport_nohs, inport_noaddr, inport_noaddr_nohs
- outport, outport_nohs, outport_noaddr, outport_noaddr_nohs
- Controllers
- mbc_11, mbc_12, mbc_22, mbc_23
32Pragma Syntax Table
- I integer (e.g. 10)
- IL integerlist (e.g. 10,20,6 )
- IW integer or wildcard (e.g. 10 or * or _)
- C quoted string (e.g. "acu")
- CL quoted stringlist (e.g. "in18","in210")
- EXPR expression (e.g. _*_)
33 Pragmas (1)
- instantiate(C, C, C)
- instantiate(libraryName, resourceName, instanceName)
- This pragma instantiates a resource defined in a library
- The default library is called artd_library or artd_xilinx_library
- Multiple instances of the same resource can be created
- EXAMPLE
- instantiate("artd_xilinx_library","multp","multp_1")
- instantiate("artd_library","mbc_12","ctrl")
- instantiate("my_own_library","multiplier","mymult")
34 Pragmas (2)
- instantiate_function(C, C)
- instantiate_function(functionName, instanceName)
- This pragma instantiates a virtual resource, not defined in a library
- All calls to the named function will be mapped on this virtual resource as single-cycle operations
- Only a single function can be associated with a virtual resource
- Allows design exploration without actually having to create a library element
- EXAMPLE
- instantiate_function("cordic","cordic_1")
35 Pragmas (3)
- merge_regfiles(CL, C)
- merge_regfiles(registerfileName, newRegisterfileName)
- Merge a list of register files into a new register file with the specified name
- May lead to fewer registers but possibly a longer schedule
- EXAMPLE
- merge_regfiles("reg_a_ram_1","reg_dx_acu_1","addr_reg")
[Figure: ram_1 and acu_1 before and after merging their register files into addr_reg]
36 Pragmas (4)
- set_regfileports(C, IN|OUT, I)
- set_regfileports(regFileName, IN|OUT, nrports)
- This pragma allows you to generate multiport register files
- This pragma overrules the default register file settings of one input port and one output port
- EXAMPLE
- set_regfileports("merged_reg",IN,2)
- set_regfileports("merged_reg",OUT,2)
- This will result in a multiport register file called merged_reg with two input ports and two output ports
37 Pragmas (5)
- connect_bus(C, CL, CL)
- connect_bus(busName, writer, reader)
- Allows you to define a bus and its connections.
- With this pragma you can restrict resources from writing to specific busses, or you can merge a number of busses into one single bus.
- By using multiple connect_bus pragmas you can define a partial or a complete bus network. The outport of a resource that still has no bus connection after the last connect_bus pragma will automatically receive a private bus.
- EXAMPLE
- connect_bus( ram2_bus,acu_2dout,reg_a_ram_2d0,reg_dx_acu_2d0)
- Defines a bus called ram2_bus that is written to by the output of acu_2 and read by the address port of ram_2 and the first input port of acu_2
38 Pragmas (6)
- no_connection(C, CL)
- no_connection(writer, reader)
- With this pragma you can prevent connections between one output of a resource (defined by the first argument!) and a list of inputs.
- EXAMPLE
- no_connection( romctrl_1dout,reg_a_ram_2d0,reg_dx_acu_2d0)
- Using this pragma, no connection will be present between the output of romctrl_1 and the address register of ram_2 and the first input of acu_2
39 Default Architecture
- The following resources from the (ASIC) default library are automatically instantiated when a new project is created
- alu, mult
- acu
- romctrl
- ram, rom
- inport, outport
- mbc_23
40 Example Pragma File
// INPORT and OUTPORT without address generation
instantiate("artd_library","inport_noaddr","inport_1")
instantiate("artd_library","outport_noaddr","outport_1")
// ACU and ROMCTRL for RAM and ROM addressing
instantiate("artd_library","acu","acu_ram")
instantiate("artd_library","acu","acu_rom")
instantiate("artd_library","romctrl","romctrl_ram")
instantiate("artd_library","romctrl","romctrl_rom")
// Cores and Memories
instantiate("my_library","mac","my_mac")
instantiate("artd_library","rom","rom_1")
instantiate("artd_library","ram","ram_1")
// Controller
instantiate("artd_library","mbc_23","ctrl")
// dedicated address generation cluster
connect_bus(bus_romctrl_rom, romctrl_romdout, reg__acu_romd0)
connect_bus(bus_dout_acu_rom, acu_romdout, reg__acu_romd0, reg_a_acu_romd0)
no_connection(acu_ramdout, reg_a_rom_1)
no_connection(romctrl_ramdout, reg__acu_rom, reg_a_rom_1)
41Views
Architecture view
42 Views
- Architecture view
- Graphical representation of the selected architecture
- In this view you can select and highlight individual components and resources. You can also jump to the architecture report for a detailed textual overview
43 Reports (1)
44 Reports (2)
- Architecture report
- Lists all selected resource instances and their registers
- Lists for each instance/register
- input ports and connected register files/muxes
- output ports and connected buses
- Resources from the default library are listed with unspecified types and with their complete instruction set
- Resources from user libraries are listed with types and instruction list as specified in the library
45 Map to Architecture
46 Key Concepts
- In the mapping step, the following tasks are performed
- Memory management: variables and temporary variables (introduced by the compilation step) are allocated to the available memory resources
- Core resource assignment: operations from the design are assigned to corresponding core resources and translated into RTs (register transfers)
- Multiplexer introduction: muxes are introduced if more than 1 bus is connected to the input of a register, or if 2 or more variables with different types are transferred to that input over a bus connected to it
47 Memory Management
[Figure: memory resources (RAM, ROM, INPORT/OUTPORT) and arrays addressed by the data path, compared by access speed and area per memory location]
48 Core resource assignment
- Resource assignment is completely determined by a set of internal mapping rules and by user pragmas.
- The rules are divided into two groups
- The first set applies to the mapping of the core resources in the default library. This set of rules is transparent to the user but not accessible
- The second set applies to the mapping of operations on user-defined resources and is an essential part of the pragmas of the corresponding user-defined library
49 Mapping rules
- Operations or instructions on resources from the standard library are handled as taking one clock cycle. Exception: MAC (has a pipeline register)
- By default, operations and implicit operations are mapped to the first instance of a resource that can execute the operation
- "First" means first instantiated in the pragma file of the previous step
- Implicit operations
- - ROM/RAM addressing: initialize address, compute next address
- - FOR loops: initialize loop counter, update, test
- - Implicit constants for all instances
50 Multiplexer introduction
- In the last stage of the mapping step, muxes are introduced where needed.
- Their function is threefold
- bus selection
- data alignment
- type manipulation performed by coding cast operations
51 Pragmas (1)
- assign_expression(C, EXPR, C)
- assign_expression(scopeName, expression, instanceName)
- This pragma forces the mapping of operations, indicated with an expression, onto a particular instance
- The action of the pragma is restricted to the scope indicated by scopeName
- EXAMPLE
- assign_expression("/top", _*_, "mult_2")
- All multiplications in top will be mapped on mult_2
52 Pragmas (2)
- assign_operation(C, C)
- assign_operation(operationName, instanceName)
- This pragma forces the mapping of operations, indicated with the hierarchical name or source label, onto a particular instance
- The label needs to be specified using its full pathname
- Wildcards can be used in levels and in labels
- EXAMPLE
- assign_operation("/.../incri", "acu_2")
void cordic()
{
    ...
    Uint<4> tmp_i = 0;
    loopi: for (int i=0; i<14; i++) {
        incri: tmp_i++;
        ...
    }
}
53 Pragmas (3)
- assign_variable(C, C)
- assign_variable(variableName, instanceName)
- This pragma allows
- the mapping of a scalar or array variable onto a specific memory (RAM, ROM) or port (INPORT, OUTPORT)
- the mapping of constant variables onto a specific ROMCTRL memory
- The variable needs to be specified using its full pathname
- EXAMPLE
- assign_variable("/top/AX","ram_2")
- assign_variable("/c4","romctrl_3")
- assign_variable("/cordic/Q_in","inport_2")
54 Pragmas (4)
- assign_address(C, C)
- assign_address(variableName, instanceName)
- This pragma forces the address computation of a specific variable to be performed on a particular instance
- If one of the operations needed for address computation cannot be performed on the given instance, the default (acu_1) is used instead
- EXAMPLE
- assign_address("/cordic/A","acu_2")
55 Pragmas (5)
- assign_loopcounter(C, C)
- assign_loopcounter(iteratorName, instanceName)
- This pragma forces the operations for the specified loop counter (mostly decrement operations) to be performed on the given instance
- The default resource is the first instantiated ACU
- EXAMPLE
- assign_loopcounter("/cordic/loopi/i","acu_2")
void cordic() { ... loopi: for (int i=0; i<14; i++) ... }
56 Pragmas (7)
- unroll(C, IW, IW)
- unroll(loopName, firstIterationsToUnroll, lastIterationsToUnroll)
- This pragma unrolls loops or parts of loops
- The whole loop is unrolled when using the wildcard _
- EXAMPLE
- unroll("/cordic/loopi",_,_) // unrolls the whole loop
- unroll("/cordic/loopi",3,0) // unrolls the first 3 iterations
- unroll("/cordic/loopi",2,4) // unrolls the first 2 and last 4 iterations
void cordic() { ... loopi: for (int i=0; i<14; i++) ... }
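What unrolling the first 2 and last 4 iterations of a 14-iteration loop means can be shown with a hand-written equivalent in plain C. This is only a sketch: the loop body `body` is a hypothetical stand-in, and the tool performs the transformation on its internal representation rather than on source text.

```c
#include <assert.h>

enum { ITER = 14 };

/* Stand-in loop body: the result depends on the iteration order. */
static unsigned body(unsigned acc, unsigned i) { return acc * 3u + i; }

/* Reference: the rolled loop. */
static unsigned rolled(void)
{
    unsigned acc = 0;
    for (unsigned i = 0; i < ITER; i++) acc = body(acc, i);
    return acc;
}

/* First 2 and last 4 iterations become straight-line code,
   the 8 iterations in the middle stay a loop. */
static unsigned unrolled_2_4(void)
{
    assert(ITER >= 2 + 4);
    unsigned acc = 0;
    acc = body(acc, 0);
    acc = body(acc, 1);
    for (unsigned i = 2; i < ITER - 4; i++) acc = body(acc, i);
    acc = body(acc, 10);
    acc = body(acc, 11);
    acc = body(acc, 12);
    acc = body(acc, 13);
    return acc;
}
```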
57 Pragmas (8)
- assign_variable_to_port(C, C, C)
- assign_variable_to_port(variableName, instanceName, accessPortName)
- This pragma assigns all read/write operations of a variable to a specific port of a dual-port memory
- EXAMPLE
- assign_variable_to_port("/top/A","ram_1","ram_1_access2")
- Assigns all accesses from/to array A to port 2 of ram_1
58 Pragmas (9)
- assign_operation_to_port(C, C, C)
- assign_operation_to_port(operationName, instanceName, accessPortName)
- This pragma assigns an operation to a specific port of a dual-port memory
- EXAMPLE
- assign_operation_to_port("/top/loop/Aread","ram_1","ram_1_access2")
- The read operation labeled Aread is done via access port 2 of ram_1
59 Reports
- Architecture report
- List of resource instances is reduced to those that are really used
- Instances from the default library
- types have been set to the maximal types that are used in the source
- instruction list has been reduced to those instructions that are used
- Instances from a user library
- actual types and rules have been checked against the types and rules specified in the library
- The controller sizes are still unknown
60 Reports
- Memory map
- Detailed information on all present memory instances
- - RAM, ROM, ROMCTRL, INPORT, OUTPORT
61 Reports
- Mux Report
- Summary and detailed information on all multiplexers
- Type Report
- For all muxes
- - Output buses they are connected to
- - For each bus: variables, types, bitwise connection and alignment
Two-way cross-highlighting with source!
62 Schedule Operations
63 Key Concept
- In this fourth step, two tasks are performed
- Scheduling of the operations: the resulting RT-graph is ordered along a time axis in as few machine cycles as possible, taking into account data and hardware constraints
- Register assignment: variables are assigned to fields of register files in such a way that the overall size of the register files is kept to a minimum
64 List Scheduling
[Figure: list scheduling flow from INPUT through the candidate list and conflict/priority computation to the scheduled operation and OUTPUT, with resources INPORT, ALU, MULT and OUTPORT]
65 ALAP and ALAP Greedy Scheduler
[Figure: the same operation graph (in, incx, decx, out) scheduled with ALAP and with ALAP Greedy]
Parallel Path Optimizer: Less Registers
66 Loop Folding (1)
[Figure: loop iterations 1 to 3 on CORE1 and CORE2, before and after folding]
1 cycle per iteration since there is no dependency
21 cycles
67Loop Folding (2)
- Performed automatically
- Equivalent to pipelining
- Advantage
- Faster schedule through more parallelism
- Disadvantages
- Larger controller
- Larger register files
68 Register Assignment
- For a particular register file: assignment of variables to specific register fields
[Figure: variables a to f assigned to register fields 1 to 3 over cycles 0 to 3 (a, d in field 1; b, e in field 2; c, f in field 3)]
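One common way to share register fields between variables is a greedy pass over their lifetimes. The sketch below is an assumption for illustration only, not necessarily the method used by the tool: `lifetime_t` and `assign_fields` are hypothetical names, and the lifetimes in the usage note are invented.

```c
#include <assert.h>

enum { MAX_FIELDS = 8 };

/* A variable is live in the half-open cycle interval [birth, death). */
typedef struct { int birth, death; } lifetime_t;

/* Assigns each variable to a register field and returns the number of
   fields used. `v` must be sorted by birth; field[k] receives the field
   index chosen for v[k]. */
static int assign_fields(const lifetime_t *v, int n, int *field)
{
    int free_at[MAX_FIELDS];  /* cycle at which each field becomes free */
    int used = 0;
    for (int k = 0; k < n; k++) {
        assert(k == 0 || v[k - 1].birth <= v[k].birth);
        int f = 0;
        while (f < used && free_at[f] > v[k].birth) f++;  /* first free field */
        assert(f < MAX_FIELDS);
        if (f == used) used++;                            /* open a new field */
        free_at[f] = v[k].death;
        field[k] = f;
    }
    return used;
}
```

For six variables with lifetimes [0,2), [0,1), [0,2), [1,3), [2,4) and [2,3), three fields are enough, because later variables reuse fields whose earlier occupants are no longer live.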
69 Scheduler Options
- Scheduler algorithm
- ASAP (default): as soon as possible
- ALAP: as late as possible
- ALAP Greedy: complete paths are scheduled
- All: tests them all for every level of hierarchy and takes the one that results in the smallest cycle count
- Unconstrained folding
- General option for all loops
- Default on: the scheduler will try to reduce the total machine cycle count of a for-loop by increasing the available parallelism within every iteration
- Can be overridden by pragma for a specific loop
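The difference between ASAP and ALAP can be sketched on a small dependency graph. The code is a simplified illustration under stated assumptions: every operation takes one cycle, resources are unlimited, operations are numbered in topological order, and the names `asap` and `alap` are hypothetical.

```c
#include <assert.h>

enum { MAX_OPS = 16 };

/* dep[i][j] != 0 means operation j needs the result of operation i.
   Operations must be numbered so that i < j for every dependency. */

/* ASAP: each operation runs one cycle after its latest predecessor. */
static void asap(int n, int dep[MAX_OPS][MAX_OPS], int *cycle)
{
    assert(n <= MAX_OPS);
    for (int j = 0; j < n; j++) {
        cycle[j] = 0;
        for (int i = 0; i < j; i++)
            if (dep[i][j] && cycle[i] + 1 > cycle[j])
                cycle[j] = cycle[i] + 1;
    }
}

/* ALAP: each operation runs one cycle before its earliest successor,
   or in cycle `last` when nothing depends on it. */
static void alap(int n, int dep[MAX_OPS][MAX_OPS], int last, int *cycle)
{
    assert(n <= MAX_OPS);
    for (int i = n - 1; i >= 0; i--) {
        cycle[i] = last;
        for (int j = i + 1; j < n; j++)
            if (dep[i][j] && cycle[j] - 1 < cycle[i])
                cycle[i] = cycle[j] - 1;
    }
}
```

An operation without predecessors or successors lands in cycle 0 under ASAP and in the last cycle under ALAP, while operations on the longest dependency chain get the same cycle in both.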
70 Pragmas (1)
- fold(C, I)
- fold(loopName, reductionLimit)
- For a specific loop, this pragma specifies the maximum number of iterations by which the original number of iterations may be reduced
- EXAMPLE
- fold("/cordic/loopi", 3)
- Iteration reduction of loop "loopi" located in function "cordic" due to folding is at most 3. Suppose the original number of iterations is 14. After folding, the resulting number of iterations is at least 11.
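Why folding reduces the iteration count can be seen from a hand-folded loop. The sketch assumes folding corresponds to the usual pipelining transformation, with a two-stage body folded by one iteration; `stage1` and `stage2` are hypothetical stand-ins for operations on two different resources.

```c
#include <assert.h>

enum { N = 14 };

static int stage1(int i) { return 2 * i + 1; }         /* e.g. a memory read  */
static int stage2(int acc, int t) { return acc + t; }  /* e.g. an accumulate  */

/* Original loop: both stages of iteration i run back to back. */
static int unfolded(void)
{
    int acc = 0;
    for (int i = 0; i < N; i++) {
        int t = stage1(i);
        acc = stage2(acc, t);
    }
    return acc;
}

/* Folded by one iteration: stage1 of iteration i+1 shares a loop body
   with stage2 of iteration i, so the two can run in parallel. The loop
   has N-1 iterations plus a prologue and an epilogue. */
static int folded_once(void)
{
    assert(N >= 2);
    int acc = 0;
    int t = stage1(0);                 /* prologue */
    for (int i = 0; i < N - 1; i++) {  /* 13 iterations instead of 14 */
        int t_next = stage1(i + 1);
        acc = stage2(acc, t);
        t = t_next;
    }
    acc = stage2(acc, t);              /* epilogue */
    return acc;
}
```

Each folding step of this kind removes one iteration from the loop, so a reduction limit of 3 on a 14-iteration loop leaves at least 11 iterations.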
71 Pragmas (2)
- max_cycles(C, I)
- max_cycles(scopeName, nrOfCycles)
- Only used for the calculation of the cycle count!
- Suppose scope C in the source hierarchy contains a conditional statement or a non-manifest loop. You can then specify a maximum cycle count value for scope C
- When the exact number of cycles needed for C can be computed, this pragma is ignored
- EXAMPLE
- max_cycles("/top/block1", 10)
- If the number of cycles needed to execute "/top/block1" cannot be computed exactly, a value of 10 will be assumed to calculate the cycle count
72Views
Load view
73Views
- Load view
- The top part shows the loop structure of the resulting schedule
- Height of vertical bars represents number of loop iterations
- The bottom part shows activity of cores, register files and/or buses as a function of the schedule (program counter)
- Dashed vertical lines represent loop and condition boundaries
- Colored vertical line shows split between init section and run section
- One-way cross-referencing to Schedule Report!
- Brings you to the potential, not to the exact RT
74Views
made
consumed
75Views
- Lifetime view
- The top part shows, for the selected register file(s), the lifetime of all variables stored in these files as a function of the schedule (program counter)
- The bottom part shows, for the selected register file(s), the required number of fields as a function of the schedule
- One-way cross-referencing to Schedule Report!
76Views
77Views
- RAM view
- The lifetimes of variables stored in the selected RAM(s) are displayed against the address space of the RAM(s)
78 Reports
79Reports
- Schedule report
- Detailed listing of all RTs and their relative sequence
- Two-way cross-referencing with source!
- Information on all I/O performed by the processor via INPORT and OUTPORT resources
80 Reports
81Reports
- Cycle Count Report
- Information on number of cycles
- Exact number for manifest descriptions
- C program for non-manifest descriptions (except when using pragma max_cycles)
- Information on loop structure of the input description
82 Reports
83Reports
- Register report
- Summary overview of register usage
- Detailed listing for all register files and all fields
- Variables stored in the register field
- For each variable, potentials at which writes and reads occur
84 Build the RT-Level
85Key Concepts
- In this last step, the following is performed
- Controller generation
- ROM optimizations
- HDL generation
86Controller-based versus Hardwired
Algorithm a x y - zc Controller-based
implementation (e.g. with microprogram)
microprogram 1. zctmp1 2 y-tmp1tmp2 3.
Xtmp2a
Hardwired implementation
x
-
y
z
a
c
87 Multi-branch versus Single-Branch
Single-branch code
if (in>8) result = 3;
else if (in>4) result = 2;
else if (in>2) result = 1;
else result = 0;
Consecutive conditional jumps
Multiple-branch code
switch (in) {
    case large: result = 3; break;
    case medium: result = 2; break;
    case low: result = 1; break;
    default: result = 0;
}
Offers a large number of possible next addresses
88Controller Generation
Generates the control bits sent to the different
resources of the datapath
Evaluates boolean combinations of its inputs
(status flags). Output condition code
Decodes the condition code to decide if the PC
has to branch to a specific address. Output
encoded jump address
89Controller Alternatives (1)
90Controller Alternatives (2)
- mbc_22
- This is a variation on the default controller
- difference a pipeline stage (status) has been
removed
91 Controller Alternatives (3)
- FSM-based: the microcode is replaced with a synthesizable HDL model
- mbc_11
92 Controller Alternatives (4)
- mbc_12: control delay of 2
93Optimizations
- Optimization of micro-ROM
- Removal of columns constant columns and
duplicate columns are removed
[Figure: micro-ROM bit columns for Res. 1 and Res. 2 before and after optimization; removed constant columns are tied to GND or VDD]
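The column removal described above can be sketched on a small bit matrix. This is an illustration under assumptions, not the tool's implementation: the ROM is a hypothetical 4-row array of 0/1 integers, and `columns_to_keep` is an invented name.

```c
#include <assert.h>
#include <stdbool.h>

enum { ROWS = 4, MAX_COLS = 16 };

static bool same_column(int rom[ROWS][MAX_COLS], int a, int b)
{
    for (int r = 0; r < ROWS; r++)
        if (rom[r][a] != rom[r][b]) return false;
    return true;
}

static bool constant_column(int rom[ROWS][MAX_COLS], int c)
{
    for (int r = 1; r < ROWS; r++)
        if (rom[r][c] != rom[0][c]) return false;
    return true;
}

/* Marks which columns must stay in the ROM and returns how many are kept.
   A constant column can be tied to GND or VDD; a duplicate column can be
   driven by the earlier identical column. */
static int columns_to_keep(int rom[ROWS][MAX_COLS], int cols, bool *keep)
{
    assert(cols <= MAX_COLS);
    int kept = 0;
    for (int c = 0; c < cols; c++) {
        keep[c] = !constant_column(rom, c);
        for (int p = 0; keep[c] && p < c; p++)
            if (keep[p] && same_column(rom, p, c)) keep[c] = false;
        if (keep[c]) kept++;
    }
    return kept;
}
```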
94Optimizations
- Optimization of ROMCTRLs
- Constants are put in micro-ROM if the number of
micro-ROM columns is not increasing
95HDL Netlist Structure
artd_<design>_microrom
artd_<design>
StatusLogic
artd_rom
Controller
BranchLogic
ir
Cores, ports, auxiliary resources
alu_1
Busses
96 Options
- Netlist manipulations
- processor init: choose to have it internal or external
- optimize dataROM
- Generate Hold pin: to freeze the processor state
- Separate files: generates separate HDL files instead of 1 large HDL file
97 Pragmas (1)
- optimize_dataroms(CL, ON|OFF)
- optimize_dataroms(romOrRomctrlInstanceName, ON|OFF)
- Overrules the Options setting for a specific ROM or ROMCTRL
- EXAMPLE
- optimize_dataroms("rom_*", ON)
- Optimizes contents of all ROM instances whose names start with rom_
- optimize_dataroms("romctrl_1","rom_1", OFF)
- No optimization of contents of romctrl_1 and rom_1
98 Pragmas (2)
- define_vhdl_generic(C, C, C, C)
- define_vhdl_generic(vhdlinstancePathName, genericName, genericType, genericValue)
- Suppose you have your own VHDL core with generics. This pragma allows you to supply the additional information needed to instantiate the core in the VHDL netlist
- You have to specify such a pragma for every generic
- EXAMPLE
- define_vhdl_generic("mymult_1", "width1", "integer", "8")
- The generic width1 of type "integer" of the instance "mymult_1" will be set to the value "8"
99 Pragmas (3)
- define_verilog_parameters(C, C)
- define_verilog_parameters(veriloginstancePathName, parameterArgumentValues)
- Suppose you have your own Verilog core with parameters
- This pragma allows you to specify the values of these parameters for an instance of the core in the Verilog netlist
- EXAMPLE
- define_verilog_parameters("mymult_1", "8, 7")
- The parameter string "8, 7" will be used for the instantiation of mymult_1 in the Verilog netlist
100 Pragmas (4)
- make_external(CL)
- make_external(componentName)
- The full instance pathname has to be specified (see examples)
- EXAMPLES
- make_external("rom*")
- makes all instances external whose names start with rom
- make_external("reg_dy_alu_1")
- makes the register file for the Y input of the alu_1 instance external
- make_external("mult_1")
- makes mult_1 and its associated registers and multiplexers external
101 Pragmas (5)
- inport_benchmode(C, READONCE|READONCEPERFRAME)
- For a specific INPORT, provides more flexibility for reading input values within the test bench
- Address generation on
- Default: each stimuli file is read once per frame
- By selecting READONCE, the stimuli file is only read in the first frame
- The read value(s) will be reused in the next frames
- E.g. useful for constant parameter values
- Address generation off
- Default: stimuli file is read every time a read operation is encountered
- By selecting READONCEPERFRAME, the stimuli file will only be read once per frame
- Only supported if only one variable is mapped on the INPORT instance
- EXAMPLE: inport_benchmode("inport_1",READONCE)
102 Pragmas (6)
- map_rom_on_lutram(romInstanceName)
- map_ram_on_lutram(ramInstanceName)
- Indicates whether the ROM/RAM will be mapped on a network of LUT RAMs
- This pragma will only be effective if you have chosen the Xilinx FPGA flow in the Create Architecture step
- EXAMPLE: map_ram_on_lutram(ram_1)
103 Pragmas (7)
- map_rom_on_blockram(C, I)
- map_ram_on_blockram(C, I)
- map_rom_on_blockram(romInstanceName, max_nr_blockrams)
- map_ram_on_blockram(ramInstanceName, max_nr_blockrams)
- Indicates whether the ROM/RAM will be mapped on a network of block RAMs and LUT RAMs
- This pragma will only be effective if you have chosen the Xilinx Virtex or Spartan II flow in the Create Architecture step
- EXAMPLE: map_ram_on_blockram("ram_1",4)
- Maps ram_1 on a combination of block RAM and LUT RAM. At most 4 block RAMs may be used. LUT RAMs will only be used if there are not enough block RAMs.
104 Reports
- Architecture report
- Controller dimensions are now filled in
105 Verify the Design
106 Fetching Inputs with INPORT
Ports generated on processor for every INPORT
ART Designer Processor
<inport_name>_address (not for inport_noaddr)
<inport_name>_data
External Memory Device
<inport_name>_dreq
<inport_name>_davail
107 Writing Outputs with OUTPORT
Ports generated on processor for every OUTPORT
ART Designer Processor
<outport_name>_address (not for outport_noaddr)
External Memory Device
<outport_name>_data
<outport_name>_dready
<outport_name>_daccept
108Processor Control Pins
clk
ART Designer Processor
ready
rst
start
109Processor Startup Sequence
110Timing of Processor I/O
Meaning of the ready flag
Timing of Input Signals
Timing of Output Signals
111Generated HDL Test Bench
112 Verifying the Generated HDL
- Example for vsim and Unix
- go to the artd_vhd or artd_v subdirectory of your project
- vlib work
- vcom artd_design.vhd artd_bench.vhd
- vsim artd_bench -c -do "run -all; quit -f"
113 User Libraries
114 Using your own Data Path Resources
void addsub(const Int<16> in1, const Int<16> in2, const Uint<1> mode, Int<16> &out)
{
#pragma OUT out
    if (mode == 0) out = in1 + in2;
    if (mode == 1) out = in1 - in2;
}
You need to define: I/O, instruction set, timing
[Figure: Add/Sub block with 16-bit inputs in1 and in2, 1-bit input mode and 16-bit output out, shown against time]
115 Constraints on User Library Resources
- You can only create user-defined resources that perform arithmetic or logical operations on their inputs
- Types of arguments have to be determined
- Latency (number of cycles) has to be manifest
116 Library Data Organization
Contains info about the contents and the
parameters of the library
For every resource you need a pragma file with
the ART Designer model
Optional HDL description of every resource for
simulation/synthesis
117Declaring a User Library
- Use the Create Architecture options
- Library Name
- Symbolic name to be used in pragmas
- Library Path
- Actual directory for library data
118 Creating a User Library
- To create a user library, ART Designer is equipped with a Library Manager
- Tools>Library Manager
119 Adding resources
- Once you have created the user library, you have to add resources to that library
- Resource>New
Resource Name
Origin: ART Builder or user supplied
Pin availability and their names
Model availability and names
120Necessary pragmas for every resource
121 Pragmas (1)
- define_view(CORE, C)
- define_view(CORE, resourceName)
- Defines a name for the user-defined resource
- EXAMPLE
- define_view(CORE,"myMult")
122 Pragmas (2)
- define_inputs(C, CL)
- define_inputs(resourceName, inputPort)
- Defines the name and the width of each input.
- Control input (command bus) can also be specified here
- Up to two command busses can be present
- EXAMPLE
- define_inputs("cordiccore", "I_in8", "Q_in8")
- define_inputs("myBlock", "in124", "in224", "Cbus3")
Command bus
123 Pragmas (3)
- define_outputs(C, CL)
- define_outputs(resourceName, outputPort)
- For a specific resource, this pragma defines the name and the width of each data output
- Flag outputs can also be specified here
- EXAMPLE
- define_outputs("myBlock", "xi16", "xq16", "xp16")
- define_outputs("myOwnAlu", "out16", "flag11", "flag21")
124 Pragmas (4)
- define_instruction(C, C, CL)
- define_instruction(resourceName, instructionName, control)
- This pragma allows you to define an instruction, along with its control bits
- A default instruction must always be defined
- Will be applied for the clock cycles when the resource is not active
- Important for resources with internal state
- EXAMPLE
- define_instruction("myAlu", "add", "CbusFTT")
- define_instruction("myBlock", "default", "") /* single-mode block without internal state */
125 Pragmas (5)
- define_singlecycle_reservationGraph(C, C) or
- define_pipelined_reservationGraph(C, C) or
- define_multicycle_reservationGraph(C, C)
- These pragmas define different reservation graphs that can be applied to specific instructions
- There are 3 predefined reservation graphs
- - single cycle, pipelined and multicycle
- It is also possible to create your own reservation graph using the define_reservationGraph pragma in combination with other optional pragmas
- EXAMPLE
- define_pipelined_reservationGraph("myMult", "onepipelinestage")
126 Pragmas (6)
- map_function(C, C, C, CL, CL, CL)
- map_function(funcName, resourceName, reservationGraph, inputMapping, outputMapping, ctrlMapping)
- This pragma defines how a function is mapped on a resource. It calls the function and maps it onto the first instance of this type of resource.
- EXAMPLE
- define_rule("mult", "mymult", "onepipelinestage", "in1", "in2", "out1", "FlagZ", "modedefault")
127 Optional Pragmas
- map_expression(EXPR, C, C, CL, CL, CL)
- map_expression(expression, resourceName, reservationGraph, inputMapping, outputMapping, ctrlMapping)
- This pragma allows you to map an expression (with type definition!) on a user-defined resource
- EXAMPLE
- map_expression(Int<32>9(Int<32>(_) Int<32>(_)), "myMult", "resgraph", "in1", "in2", "out", "flagZ", "modedefault")
128 Optional Pragmas
- define_pipeline(C, I)
- define_pipeline(timeshapeName, nrOfPipeRegs)
- This pragma defines a pipelined timeshape that can be applied to a resource with pipeline registers
- The instruction(s) with this timeshape and the inputs need to be applied during a specific cycle; the corresponding output appears after the specified number of pipelines + 1
- EXAMPLE
- define_pipeline("multTimeshape", 1)
129 Optional Pragmas
- define_sideeffect(C, IL)
- define_sideeffect(resourceName, instruction)
- For resources with internal state, this pragma defines which instructions change the state
- Used by the speculation and loop-invariant code optimizers
- EXAMPLE
- define_sideeffect("userMac2", "mac", "mpy")
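Why an instruction with a side effect cannot be freely speculated or hoisted out of a loop can be illustrated with a small behavioral model. The sketch below is plain host-side C, not ART Designer code: the type `MacState`, the functions `mac_model_mpy` and `mac_model_mac`, and the assumed semantics (mpy overwrites an accumulator, mac adds to it) are hypothetical and only serve as an illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical internal state of the resource: a single accumulator. */
typedef struct {
    int64_t acc;
} MacState;

/* Assumed "mpy" behavior: overwrite the accumulator with a * b.
   The state changes, so the instruction has a side effect. */
int64_t mac_model_mpy(MacState *st, int16_t a, int16_t b)
{
    assert(st != 0);
    st->acc = (int64_t)a * (int64_t)b;
    return st->acc;
}

/* Assumed "mac" behavior: add a * b to the accumulator.
   Calling it twice with the same inputs gives two different results. */
int64_t mac_model_mac(MacState *st, int16_t a, int16_t b)
{
    assert(st != 0);
    st->acc += (int64_t)a * (int64_t)b;
    return st->acc;
}
```

Because the result of each call depends on earlier calls, an optimizer has to keep the number and the order of such instructions unchanged.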
130 Optional Pragmas
- define_alignment(resourceName, LSB,MSB)
- This pragma defines how all operands and operations will be aligned on the ports of the resource
- No mixed alignment is allowed. LSB alignment is the default.
- EXAMPLE
- define_alignment("myMult", MSB)
131 Example Pragma Set (1)
define_view(CORE,"acs")
define_inputs("acs","in116","in216","in316","mode2")
define_outputs("acs","out116","out216","out31","out41")
define_instruction("acs","default","modeFF")
define_instruction("acs","nMax1","modeFT")
define_instruction("acs","pMax2","modeTF")
define_instruction("acs","nMax2","modeTT")
define_singlecycle_reservationGraph("acs","graph")
define_rule("pMax1","acs","graph","in1","in2","in3","out1","out2","out3","out4","modedefault")
define_rule("nMax1","acs","graph","in1","in2","in3","out1","out2","out3","out4","modenMax1")
define_rule("pMax2","acs","graph","in1","in2","in3","out1","out2","out3","out4","modepMax2")
define_rule("nMax2","acs","graph","in1","in2","in3","out1","out2","out3","out4","modenMax2")
132 Example Pragma Set (2)
- Defines a resource acs with
- 3 data inputs
- 1 control input
- 4 data outputs
- Defines 4 instructions that can be mapped on this resource
- All instructions take a single cycle to execute
133 User-defined Libraries
- Running ART Designer
- Needed: pragma file that models I/O, instructions and timing
- Performing C simulation of your algorithm
- Needed:
- Either a behavioral model in C for the complete block ...
- Or a separate behavioral model in C for each instruction (recommended)
- Performing RTL HDL simulation and/or synthesis
- Needed: HDL model
134 Supported C subset
- Key Concepts
- Edit and Compile Source
- Create Architecture
- Map to Architecture
- Schedule Operations
- Build the RT-Level
- Verify the Design
- Create and Use a User Library
135 Constant Definition
- A warning is generated when an overflow or quantization occurs during compilation
- Example: Fix<12,9> coef = -1.70171875;
- Generates the following warning: Quantization occurred when casting the constant "-1.701718749999999900e+00" to the type "fix<12,9>". Result is "0bt110.010011000" ( "-1.703125" )
- String literals are only supported for the initialisation of ART Library type variables.
- Example: Ufix<5,2> c = "0bu011.01";
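The numbers in the quantization warning can be checked by hand. The sketch below is plain C, not ART Library code; `quantize_truncated` and `raw_to_real` are hypothetical helper names. It assumes that the second type parameter is the number of fractional bits (9 here, as the result pattern 110.010011000 suggests) and that truncation rounds toward minus infinity. Overflow of the 12-bit range is not modeled.

```c
#include <assert.h>

/* Hypothetical helper: scale a real constant by 2^frac_bits and
   round toward minus infinity (assumed meaning of truncation). */
long quantize_truncated(double value, int frac_bits)
{
    assert(frac_bits >= 0 && frac_bits < 31);
    double scaled = value * (double)(1L << frac_bits);
    long raw = (long)scaled;      /* a C cast rounds toward zero */
    if ((double)raw > scaled) {
        raw -= 1;                 /* move negative values down to the floor */
    }
    return raw;
}

/* Convert the raw integer back to the real value it represents. */
double raw_to_real(long raw, int frac_bits)
{
    assert(frac_bits >= 0 && frac_bits < 31);
    return (double)raw / (double)(1L << frac_bits);
}
```

With 9 fractional bits, -1.70171875 scales to about -871.28, which truncates to the raw value -872, and -872 / 512 is exactly -1.703125, the value reported in the warning.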
136 Enumeration Constants
- Identifiers declared as enumerators are treated as integer constants.
- Renumbering is possible.
- ART Designer will use the smallest possible type to represent the enumeration.
- Example: enum State {INIT, EXEC, OUTPUT}; State states;
- => Internally, the variable states will be represented as a 2-bit variable
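The 2-bit figure follows from the largest enumerator value. A minimal sketch in plain C, assuming non-negative enumerator values; `bits_for_max_value` is a hypothetical helper and not part of ART Designer:

```c
#include <assert.h>
#include <limits.h>

enum State { INIT, EXEC, OUTPUT };   /* values 0, 1 and 2 */

/* Hypothetical helper: number of bits needed to hold the unsigned
   values 0 .. max_value (always at least 1 bit). */
unsigned bits_for_max_value(unsigned max_value)
{
    const unsigned max_bits = (unsigned)(sizeof(unsigned) * CHAR_BIT);
    unsigned bits = 1;
    while (bits < max_bits && (max_value >> bits) != 0u) {
        bits++;
    }
    assert(bits >= 1u && bits <= max_bits);
    return bits;
}
```

For the example, `bits_for_max_value(OUTPUT)` gives 2, since the largest value is 2 (binary 10). Renumbering the enumerators to larger values would increase the width accordingly.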
137 Data-Types (1)
- The standard C types are mapped into ART Library types before being mapped into an internal representation
C Type               ART Library Type
signed char          Fix<8,0>
unsigned char        Ufix<8,0>
signed short int     Fix<16,0>
unsigned short int   Ufix<16,0>
signed int           Fix<32,0>
unsigned int         Ufix<32,0>
signed long int      Fix<32,0>
unsigned long int    Ufix<32,0>
bool                 Uint<1>
void                 Not mapped
138 Data-Types (2)
- The C types float, double, long double are NOT supported
- Pointers and pointer arithmetic are NOT supported, use arrays instead
- Arrays with incomplete type descriptions are not supported
- Int<10> A[][5]; // is erroneous
- A structure is expanded into its member values
- Example: struct algoState { long buffer; char offset; }; struct algoState s1;
- The corresponding HDL variables will be s1xbuffer and s1xoffset
- Consequence: an array of a structure with N elements is represented as N different arrays.
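The consequence for arrays of structures can be pictured as a split into one array per member. The sketch below is plain host-side C for illustration only; `expand_states` and the array names `s_buffer` and `s_offset` are hypothetical and do not reflect the names ART Designer generates. It reads "N elements" as N structure members, which is an assumption.

```c
#include <assert.h>

struct algoState {
    long buffer;
    char offset;
};

/* Copy an array of n structures into one array per member.
   Two members give two separate arrays. */
void expand_states(const struct algoState s[], int n,
                   long s_buffer[], char s_offset[])
{
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        s_buffer[i] = s[i].buffer;   /* all buffer members together */
        s_offset[i] = s[i].offset;   /* all offset members together */
    }
}
```

Accessing `s[i].buffer` in the source then corresponds to accessing element i of the buffer array, and likewise for `offset`.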
139 Data-Types (3)
- If structures are used as input or output to the top function or an ASU, you will get extra inputs or outputs in the generated HDL.
- The bit information for bit-field structure members is ignored. Example:
struct PPN {
  unsigned int PFN : 22;
  int : 4; // unused
  unsigned int CCA : 3;
  bool dirty : 1;
  bool valid : 1;
  bool global : 1;
};
- Union types are NOT supported. Example: union value { char s; int i; };
140 Expressions
- All C operations mapped on resources out of the default library will get the quantization characteristic TRUNCATED and the overflow characteristic WRAPPED.
- This corresponds to the default behavior of ART Library.
- ART Designer only supports the following non-default characteristics
- +, - with saturate
- The division / and modulo % operators are NOT supported.
- Special modulo can be performed on the ACU by using the predefined functions of <artd_acu.h>
- A shift operation with a negative shift value shifts in the opposite direction.
- Recursive function calls are NOT supported
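The difference between wrapped and saturated overflow, and the meaning of a negative shift value, can be sketched in plain C. This is not ART Library code: the function names are hypothetical, and the 16-bit width is chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* WRAPPED overflow: keep only the low 16 bits of the sum. */
int16_t add_wrapped16(int16_t a, int16_t b)
{
    uint16_t sum = (uint16_t)((uint16_t)a + (uint16_t)b);
    /* Reinterpret the 16-bit pattern as a two's complement value. */
    return (sum & 0x8000u) ? (int16_t)((int32_t)sum - 65536) : (int16_t)sum;
}

/* Saturating add: clamp the exact sum to the 16-bit signed range. */
int16_t add_saturated16(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;
    if (sum > INT16_MAX) return INT16_MAX;
    if (sum < INT16_MIN) return INT16_MIN;
    return (int16_t)sum;
}

/* Left shift in which a negative amount shifts right instead. */
uint32_t shift_left_signed_amount(uint32_t x, int amount)
{
    assert(amount > -32 && amount < 32);
    return (amount >= 0) ? (x << amount) : (x >> -amount);
}
```

Adding 1 to 32767 wraps to -32768 in the first function and stays at 32767 in the second. Shifting 8 left by -2 gives 2, the same as shifting right by 2.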
141 Declarations & Initialization
- Declarations
- The volatile type qualifier is ignored
- The register class specifier is ignored
- Initialization
- Initialization of static variables by using a function argument is not supported.
Int<16> func(Int<8> in) {
  static Int<8> temp = in; // erroneous
}
- Non-initialized static fixed-point variables get a don't-care value.
- Static and global variables that have a supported C type and are not explicitly initialized are initialized to zero according to the C semantics. ART Designer will automatically initialize such variables to zero.
142 Function Declarations (1)
- Defining the I/O arguments (for top function and for functions mapped to resources)
- return argument (if present): output
- non-pointer type argument: input
- reference or array type arguments will be mapped on an input and an output argument. However, the input or the output argument can be removed
- Input only: use the const qualifier
- const Int<8> sample[3]
- Output only: use the pragma precompiler directive
- #ifdef __SYNTHESIS__
- #pragma OUT <argument_name_1> <argument_name_2>
- #endif
143Function Declarations (2)
include ltfxp.hgt void adder( const Intlt4gt a,
//const is optional const Intlt4gt b, //const
is optional Intlt4gt c ) pragma OUT c
cab
entity adder is port ( a in
std_logic_vector(3 downto 0) b in
std_logic_vector(3 downto 0) c out
std_logic_vector(3 downto 0) ) end adder
Inputs a and b, outputs c
144 Linkage & Preprocessing
- A linking step is NOT supported by ART Designer; the consequences are
- The input may be kept in several files, but they have to be included in the file containing the top function
- External functions are not supported
- External variables are not supported (extern specifier)
- Preprocessing: the preprocessor variable __SYNTHESIS__ is automatically set by ART Designer and can be used to exclude some parts of the C specification from the mapping. Example:
#ifndef __SYNTHESIS__
printf("I am debugging\n"); // NOT mapped by ART Designer
#endif
145 Bit Operations
- All bit operations in ART Library are supported
- concatenation
- z = concat(x,y)
- bit select or bit set
- z = bit(x,pos)
- bit(z,pos,x)
- slice select or slice set
- z = slice(x,pos1,pos2)
- slice(z,pos1,pos2,x)
- They may result in non-optimal hardware since they are mapped into basic operations
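What "mapped into basic operations" can mean is sketched below with shifts, masks and ORs on 32-bit words. This is plain C for illustration, not the ART Library implementation: all function names are hypothetical, and the argument convention (first position is the most significant bit of the slice, second the least significant) is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Concatenate x (upper part) with the y_width low bits of y. */
uint32_t concat_bits(uint32_t x, uint32_t y, unsigned y_width)
{
    assert(y_width > 0 && y_width < 32);
    uint32_t mask = ((uint32_t)1 << y_width) - 1u;
    return (x << y_width) | (y & mask);
}

/* Bit select: value (0 or 1) of bit pos of x. */
uint32_t bit_select(uint32_t x, unsigned pos)
{
    assert(pos < 32);
    return (x >> pos) & 1u;
}

/* Bit set: z with bit pos replaced by the low bit of x. */
uint32_t bit_set(uint32_t z, unsigned pos, uint32_t x)
{
    assert(pos < 32);
    return (z & ~((uint32_t)1 << pos)) | ((x & 1u) << pos);
}

/* Mask covering bits msb..lsb inclusive. */
static uint32_t slice_mask(unsigned msb, unsigned lsb)
{
    assert(lsb <= msb && msb < 32);
    unsigned width = msb - lsb + 1u;
    uint32_t ones = (width == 32u) ? 0xFFFFFFFFu
                                   : (((uint32_t)1 << width) - 1u);
    return ones << lsb;
}

/* Slice select: bits msb..lsb of x, moved down to bit 0. */
uint32_t slice_select(uint32_t x, unsigned msb, unsigned lsb)
{
    return (x & slice_mask(msb, lsb)) >> lsb;
}

/* Slice set: z with bits msb..lsb replaced by the low bits of x. */
uint32_t slice_set(uint32_t z, unsigned msb, unsigned lsb, uint32_t x)
{
    uint32_t mask = slice_mask(msb, lsb);
    return (z & ~mask) | ((x << lsb) & mask);
}
```

Each operation costs one or more shift, AND and OR steps, which gives a feel for why a dedicated resource may be cheaper than the generic mapping.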
146 Other Constructs
- The goto statement is NOT supported
- The standard library is NOT supported
- stdio.h in C
- iostream.h in C++
147 Coding Guidelines (1)
- Type definitions: use typedef instead of #define. This results in cleaner and safer code.
- Example: typedef Fix<22,21> D_FIX; D_FIX a = 0.75;
148 Coding Guidelines (2)
- Fixed-point variables constructed f