Title: 7. Targeting Hardware
17.Targeting Hardware
2Targeting Hardware
- Learn how to compile Handel-C code for a specific
FPGA device
- DK1 design flow
- FPGA pins
- Clocks
- Interfaces
- Building for EDIF
- The Technology Mapper
- FPGA place and route tools
- Timing analysis
3Targeting Hardware
- DK1 Design Flow
- This diagram shows the design flow with Handel-C.
Like any design flow diagram, it contains many
simplifications, but it does illustrate how a
design progresses from concept to silicon.
- Algorithms are often developed and refined using
C or C++. These may be known algorithms that need
to be run on dedicated hardware, or new
algorithms.
- The first stage in creating a hardware
implementation of an algorithm is to translate
the C or C++ code into Handel-C. Because Handel-C
is based on standard C, a first-pass translation
can be quite easy. However, getting an efficient
hardware implementation may require a great deal
of refinement to exploit the parallelism of
Handel-C and hardware.
- In parallel with this, simulation tests have to
be created. These can be written in Handel-C;
alternatively, external C and C++ functions can
be called from a Handel-C function. The Handel-C
model and the tests can now be built for
simulation and the simulations run. If the design
is not working correctly, some iteration round the
design flow may be required.
- Once the simulation is working as expected,
hardware interfaces can be added to the Handel-C
model. The design will then be ready for building
for a specific FPGA device. The output from this
stage in the design process is an EDIF netlist.
This is an industry-standard way of describing a
hardware design for EDA (Electronic Design
Automation) tools. (The term "CAD" or
"Computer-Aided Design" is sometimes used in
this context; EDA and CAD are equivalent.)
- The EDIF file is then read by the FPGA vendor's
place and route tools, which produce the file
needed to program the FPGA. This BIT file can be
downloaded to an FPGA development board, or used
to program an FPGA in an actual system.
4DK1 Design Flow
5Targeting Hardware
- Clock, Device and Pins
- Before a Handel-C program can be compiled to
produce an EDIF file, some information must be
added to the design. This describes the target
device and the interfaces between the FPGA and
the outside world.
- Up to now, we have been declaring a clock and
giving it a dummy pin name. We now need to supply
the name of a clock pin in the actual FPGA we are
using. Many FPGAs have several clock pins, so we
need to know which pin (or pins, since we can have
more than one clock) is being used as the system
clock. Here, we have stated that the clock pin is
called "Pin_L6".
- We can also include the name of the target FPGA
family and the device part number in the Handel-C
code. Alternatively, this information could be
included in the DK1 project settings, or in the
FPGA vendor's place and route tools. With some
vendors you may have to repeat this information,
even if it was included in the DK1 project or the
Handel-C code. Here we are targeting an Altera
APEX 20KE device with part number
EPF20K200EQ208-1. The part number indicates the
package (EQ208) and speed grade (-1).
- If our design is going to communicate with the
outside world, we must tell the tools which pins
it is going to use for this purpose. Here we have
declared an eight-bit input bus and specified
which pins the bus will use. Interfaces are
discussed in detail later in this section.
- You will need to consult the data sheet for the
target FPGA or development board for details of
part numbers and pin names.
6Clock, Device and Pins
set clock = external "Pin_L6";
set family = Altera20KE;
set part = "EPF20K200EQ208-1";

interface bus_in (unsigned 8 data_in with
    {data = {"Pin_A2", "Pin_A3", "Pin_A4", "Pin_A5",
             "Pin_B3", "Pin_B4", "Pin_B5", "Pin_B6"}})
    data_bus ();

void main (void) { ... }
7Targeting Hardware
- Target FPGA Family and Part
- The target FPGA family and part can be specified
in the DK1 tools using the Chip tab of the
Project Settings dialog. The family is usually
assigned when a project is created. A value for
the family is required, but the part can be left
blank and assigned later in the vendor's place
and route tools. The target family affects the
format of the EDIF netlist, so you must use the
right one for the device you are targeting.
- If the set family and set part specifications are
included in the Handel-C code, the values must be
the same as the ones in the Project Settings. It
is therefore best to use the Project Settings in
the DK1 GUI, and not include these specifications
in the Handel-C code. This also allows the same
Handel-C code to be used unchanged for different
FPGA families and devices.
8Target FPGA Family and Part
- You can set the target family and part in
Handel-C
- Or in a DK1 project
9Targeting Hardware
- Clocks
- We know that in Handel-C a main function must
have exactly one clock. This is usually declared
to be connected to a specified external pin. DK1
will issue a warning if no pin name is given.
- If you want to, you can supply a clock rate in
MHz. This value will be communicated to the place
and route tools, which will attempt to make the
design operate at this speed. The rate does not
affect the behavior of the DK1 tools.
- In Handel-C it is possible to have more than one
main function. If so, each main must have a
separate clock specification. Two main functions
can in fact use the same clock pin; each main
function represents a separate top-level block of
hardware. There are some issues regarding the
simulation of designs with more than one main
function, which will be discussed in a later
section.
10Clocks
- Each main function has exactly one clock
- You can have more than one main; each has its
own clock
- Clocks may be external or internally generated
- Clocks may be divided on-chip
11Targeting Hardware
- "Internal" Clocks
- Some older Xilinx chips (the 4000 series) have
internal clock generation circuitry. It is
usually better, and much more common, for a clock
to be generated off-chip using a crystal. If you
want to use an internally generated clock for one
of these devices, you will need to know what
frequencies are supported, by referring to the
relevant data sheet.
- Internal clocks are also used in two other
situations:
- In a multi-module design (i.e. one with more than
one main function) where the clock pin is
connected to one module and passed to the other
one.
- In an FPGA that includes a specialized clock
circuit such as a PLL (Altera), a DLL (Xilinx
Virtex) or a DCM (Xilinx Virtex II). These
circuits provide clock multiplication, division
and phase shifting. Their use will be described
in a later section.
- The slide shows how to use an input port as a
clock. The example will make more sense once
interfaces have been discussed later in this
section.
12"Internal" Clocks
- Some older Xilinx chips (XC4000) have on-chip
clock generators
- Usually, "internal" clocks come from interfaces
- Used in modular designs or with PLLs (Altera),
DLLs (Xilinx Virtex) or DCMs (Xilinx Virtex II)
13Targeting Hardware
- Divided Clocks
- Handel-C clocks may be derived by dividing the
incoming clock. The example shows how to specify
that the incoming clock should be divided by 2,
to give a 50MHz clock for the design. (The rate
is that of the divided clock; here the external
clock is assumed to run at 100MHz.)
- Clock division is achieved using a shift
register, configured as a ring counter, with the
output being fed back to the input. To divide by
two, two flip-flops are used. These are
configured to power up with opposite values. The
output of the second flip-flop then completes one
cycle for every two cycles of the incoming clock.
- It is very important to use global clock buffers
to distribute clock signals. The DK1 software
inserts these where they are needed. The vendors'
place and route tools will issue warnings if
there is a problem.
- Some FPGAs include special circuits that perform
functions like clock division. These should be
used in preference to a Handel-C external_divide
clock, if they are available, because they have
been designed specially for this purpose. Their
use will be described in a later section.
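The ring counter described above can be sketched as a small software model. This is only an illustration in C, not the circuit that DK1 generates; the type and function names (RingDivider, ring_divider_init, ring_divider_tick) are made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Software sketch of a two-flip-flop ring counter (divide by 2). */
typedef struct {
    bool ff1; /* first flip-flop */
    bool ff2; /* second flip-flop; its output is the divided clock */
} RingDivider;

/* Power-up state: the two flip-flops hold opposite values. */
RingDivider ring_divider_init(void)
{
    RingDivider d = { true, false };
    return d;
}

/* One rising edge of the incoming clock. Each flip-flop loads the
   previous value of the other one, because the output of the shift
   register is fed back to its input. Returns the divided clock. */
bool ring_divider_tick(RingDivider *d)
{
    assert(d != NULL);
    bool next_ff1 = d->ff2;
    bool next_ff2 = d->ff1;
    d->ff1 = next_ff1;
    d->ff2 = next_ff2;
    return d->ff2;
}
```

Calling ring_divider_tick repeatedly returns alternating high and low values, so the returned signal has twice the period of the incoming clock.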
14Divided Clocks
- Clocks may be divided on-chip
15Targeting Hardware
- Variable Initialization
- Another important consideration in a real
hardware design is initialization. All the
flip-flops should power up in a known state. In
Handel-C terms, this means that all variables
should be initialized.
- In Handel-C, global and static variables may be
initialized when they are declared. Automatic
variables may not be initialized like this;
instead they should be initialized with an
assignment statement.
16Variable Initialization
- All variables should explicitly be assigned an
initial value
- Global and static variables and constants may be
initialized with their declaration
- Non-static function variables can be initialized
in a separate assignment
17Targeting Hardware
- Global Reset
- It is a good idea for an FPGA to have a global
reset pin. This allows the chip to be reset to
its initial state when the system resets, for
example.
- The Handel-C set reset directive defines a reset
pin in the same way that set clock defines the
clock pin. When the pin is asserted, the
variables revert to their initial values.
However, internal memories are not reset.
(Memories are discussed in a later section.)
- Just as an internal clock can be specified, so
can an internal reset. Be careful! Using an
internally generated asynchronous reset can cause
timing problems in the design. It is not
recommended.
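The effect of the global reset on variables and memories can be illustrated with a small C model. This is a sketch under assumptions: the Design structure, its field names, its initial values and the zeroed RAM at power-up are all invented for the example and do not come from the course material.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RAM_WORDS 4

/* Hypothetical design state: two registers (Handel-C variables) and a
   small internal RAM. */
typedef struct {
    uint8_t counter;        /* initial value 0 */
    uint8_t mode;           /* initial value 3 */
    uint8_t ram[RAM_WORDS]; /* internal memory */
} Design;

/* Power-up: registers take their initial values. The RAM is assumed
   to start at zero in this sketch. */
void design_power_up(Design *d)
{
    assert(d != NULL);
    d->counter = 0;
    d->mode = 3;
    for (size_t i = 0; i < RAM_WORDS; i++) {
        d->ram[i] = 0;
    }
}

/* Global reset asserted: registers revert to their initial values.
   The RAM contents are deliberately left untouched. */
void design_global_reset(Design *d)
{
    assert(d != NULL);
    d->counter = 0;
    d->mode = 3;
}
```

After design_global_reset, any data written to ram before the reset is still there, which is why a design that relies on memory contents needs its own code to clear or reload them.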
18Global Reset
- set reset controls the global reset
- Variables are reset to their initial values
- RAMs are NOT reset
19Targeting Hardware
- Try ... Reset
- The Handel-C try-reset construct provides a
synchronous internal reset mechanism. This is
much better than using an internal asynchronous
reset.
- The statements in the try block are executed in
the normal way, until the reset condition becomes
true (i.e. not 0). The reset condition acts as an
external disable or interrupt.
- Note that the name reset is a little misleading:
no variables are reset unless you write some
Handel-C statements to reset them! Also, when the
reset condition becomes false again, normal
operation does not resume unless you have written
the code to do so.
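The behaviour described in these notes can be mimicked in C with a per-clock step function. This is a rough behavioural sketch, not Handel-C and not a definition of the construct; TryResetModel and try_reset_step are hypothetical names, and the choice to clear count in the handler is an assumption made only to show an explicit, hand-written reset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    unsigned count;   /* variable updated by the try block */
    bool     stopped; /* set once the reset condition has fired */
} TryResetModel;

/* One clock cycle. While reset_cond is false the try body runs.
   When reset_cond becomes true the body is abandoned and the reset
   handler runs. The handler clears count only because this sketch
   chooses to. Afterwards the body does not restart by itself. */
void try_reset_step(TryResetModel *m, bool reset_cond)
{
    assert(m != NULL);
    if (m->stopped) {
        return;               /* no automatic resume */
    }
    if (reset_cond) {
        m->count = 0;         /* explicit, hand-written reset */
        m->stopped = true;
    } else {
        m->count++;           /* normal try-block work */
    }
}
```

In this model, resuming after the condition clears would need extra code, for example an enclosing loop that clears stopped, which mirrors the point made in the last bullet.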
20Try ... Reset
- try... reset provides a means of stopping the
execution of some Handel-C code when a reset
condition is asserted
21Targeting Hardware
- Interfaces
- An interface is a Handel-C object that is used to
describe an interface between an FPGA and its
environment, or between different parts of a
single FPGA.
- Pre-defined interfaces are provided for
connecting FPGA pins to a design ("buses") and
for connecting Handel-C designs together or to
designs written using other languages, such as
VHDL and Verilog, or using a schematic editor
("ports"). Both these sorts of interface
correspond to external interfaces of some sort.
- User-defined interfaces may be defined so that
foreign components can be instanced in a Handel-C
design.
- Ports and user-defined interfaces will be
discussed in detail in a later section of the
course.
22Interfaces
- Pre-defined interfaces - FPGA pins
- bus_in, bus_latch_in, bus_clock_in
- bus_out
- bus_ts, bus_ts_latch_in, bus_ts_clock_in
- Pre-defined interfaces - EDIF/VHDL/Verilog ports
- port_in, port_out
- E.g. for connecting two Handel-C models together
- User-defined interfaces - "Black box" components
- EDIF, VHDL, Verilog models
- Simulation "plugin" model (native PC object code)
23Targeting Hardware
- Reading from External Pins
- A bus_in interface is a pre-defined interface
that is used to input data from an FPGA's pins. A
bus_in interface has a name and a single input.
The input has a type and an optional name. The
default name, if none is given explicitly, is in.
The syntax is illustrated in the examples
opposite.
- Data is read from an interface by using the name
of the interface followed by a period and the
name of the input. The value can be used in any
expression, just like a variable.
- If you have two input buses, you must have two
bus_in interfaces. Alternatively, you could have
one wide input bus that represents a
concatenation of the two input buses.
24Reading from External Pins
- A bus_in interface enables data to be read from
external pins
- Data is read as follows
25Targeting Hardware
- Input Timing Issues
- There is a potential problem with reading the
same input bus into two variables. This is a
result of the routing from the pins to the
flip-flops in the chip. There is no guarantee in
the real hardware that the input values will
arrive at two separate locations in the same
clock cycle; the routing delays to the two
locations might be different.
26Input Timing Issues
27Targeting Hardware
- Solution to Input Timing Problem
- The solution is to read the input into one
flip-flop and distribute the internal value. In
other words, synchronize the input to the clock.
- Alternatively, if the timing of the input data
relative to the clock is known, it might be
possible to constrain the place and route tools
to ensure that the data is read correctly.
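The input timing problem and its fix can be illustrated with a simplified timing model in C. This is a sketch under stated assumptions: a single input transition from 0 to 1, setup time ignored, and invented function names and delay values. It is not output from any timing tool.

```c
#include <assert.h>
#include <stdbool.h>

/* Value captured by a flip-flop at a clock edge (time edge_ns) when the
   pin changes from old_val to new_val at change_ns and the routing
   delay from the pin to the flip-flop is delay_ns. */
int sample_at_edge(int old_val, int new_val,
                   double change_ns, double delay_ns, double edge_ns)
{
    assert(delay_ns >= 0.0);
    return (change_ns + delay_ns <= edge_ns) ? new_val : old_val;
}

/* Direct approach: two flip-flops sample the same pin through
   different routes. Returns true if both capture the same value. */
bool direct_reads_agree(double change_ns, double delay_a_ns,
                        double delay_b_ns, double edge_ns)
{
    int a = sample_at_edge(0, 1, change_ns, delay_a_ns, edge_ns);
    int b = sample_at_edge(0, 1, change_ns, delay_b_ns, edge_ns);
    return a == b;
}

/* Synchronized approach: one flip-flop samples the pin, and both
   consumers copy that registered value on the following edge. */
bool synchronized_reads_agree(double change_ns, double delay_ns,
                              double edge_ns)
{
    int sync = sample_at_edge(0, 1, change_ns, delay_ns, edge_ns);
    int a = sync;
    int b = sync;
    return a == b;
}
```

With a clock edge at 20 ns, an input change at 17 ns and routing delays of 2 ns and 4 ns, the two directly connected flip-flops capture different values, whereas the synchronized version always agrees because both consumers read the same registered copy.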
28Solution to Input Timing Problem
- Alternatively, constrain the input delays in
place and route
29Targeting Hardware
- "Latched" Input Bus
- A bus_latch_in interface provides a conditional
input. Despite the name, this sort of interface
is implemented using a flip-flop, not a
transparent latch. The flip-flop is clocked by an
external signal, not by the system clock. This is
a potential cause of timing problems, because the
flip-flop's output is not synchronized to the
system clock.
30"Latched" Input Bus
- Use bus_latch_in for conditionally registered
inputs
31Targeting Hardware
- Clocked Input Bus
- A bus_clock_in interface is similar to a latched
input, but uses the system clock. This provides a
way of ensuring that inputs are properly
synchronized. Some FPGAs have special flip-flops
associated with their I/O pins and a bus_clock_in
interface can make use of these.
32Clocked Input Bus
- Use bus_clock_in for clocked inputs
33Targeting Hardware
- Writing to External Pins
- Having looked at the input buses, we now move on
to look at the output bus interfaces. These are
declared like their input bus equivalents, but
have an output instead of an input. The output
type, width and name are given in brackets after
the name of the interface. The output name is
required; there is no default. An output
expression must also be given.
- Note that a non-trivial output expression can be
given: out = A + B describes combinational logic
(an adder) between the FPGA core and the output
pins. However, it is usually best for outputs to
be registered, so simple assignment expressions
should generally be used.
- For example,
- interface bus_out () Q_bus (unsigned 8 out = Sum);
- and in the body of the model,
- Sum = A + B; // Sum will be registered
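The difference between a combinational output expression and a registered one can be shown with a small C model. This is a software sketch only; OutputModel, comb_output, clock_edge and registered_output are names invented for the illustration, and 8-bit wrap-around arithmetic is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t a, b; /* internal values feeding the adder */
    uint8_t sum;  /* register driving the output pins  */
} OutputModel;

/* Combinational output (out = A + B): the pins follow the adder
   directly, so they change as soon as A or B changes. */
uint8_t comb_output(const OutputModel *m)
{
    assert(m != NULL);
    return (uint8_t)(m->a + m->b);
}

/* Registered output (Sum = A + B, out = Sum): the adder result is
   stored on a clock edge and the pins are driven from the register. */
void clock_edge(OutputModel *m)
{
    assert(m != NULL);
    m->sum = (uint8_t)(m->a + m->b);
}

uint8_t registered_output(const OutputModel *m)
{
    assert(m != NULL);
    return m->sum;
}
```

If a changes between clock edges, comb_output changes at once while registered_output keeps its old value until the next call to clock_edge, which is the property that makes registered outputs easier to time.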
34Writing to External Pins
- A bus_out interface enables data to be written to
external pins
- Expressions can be used
35Targeting Hardware
- Bi-directional Pins
- Bi-directional bus interfaces (bus_ts) have both
an input and an output. They also have an output
enable, which is listed after the output. When
the output enable is 1, the pin is used as an
output and the output expression appears on the
pin. When the output enable is 0, the output
stage is put into its high impedance state, and
the pin can be used as an input.
- The interfaces bus_ts_latch_in and bus_ts_clock_in
are bi-directional pins with input flip-flops,
similar to bus_latch_in and bus_clock_in
respectively.
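The output enable behaviour can be modelled in a few lines of C. This is an illustrative sketch: TsDriver and bus_ts_pin_value are hypothetical names, and the model assumes that an external device drives the pins whenever the FPGA output stage is in its high impedance state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* FPGA side of a bi-directional bus. */
typedef struct {
    bool    output_enable; /* 1: FPGA drives the pins, 0: high impedance */
    uint8_t out_value;     /* value of the output expression */
} TsDriver;

/* Value seen on the pins. When output_enable is true the FPGA drives
   out_value. When it is false the output stage is high impedance, so
   the pins carry whatever the external device drives. */
uint8_t bus_ts_pin_value(const TsDriver *fpga, uint8_t external_value)
{
    assert(fpga != NULL);
    return fpga->output_enable ? fpga->out_value : external_value;
}
```

The value read back through the input side of the interface is the pin value, so with the output enable at 0 the design reads the external data.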
36Bi-Directional Pins
- Use interfaces bus_ts, bus_ts_latch_in, or
bus_ts_clock_in
- bus_ts_latch_in and bus_ts_clock_in have input
registers
37Targeting Hardware
- Compiling for EDIF
- Once a Handel-C program has all the necessary
clock specifications and interfaces, an EDIF
netlist can be created. To do this, configure DK1
to build for EDIF.
38Compiling for EDIF
39Targeting Hardware
- Compiling for EDIF
- As with building for simulation, messages are
written to the output window at the bottom of the
main DK1 application window. You will see
additional messages, corresponding to the
optimizations that are performed during mapping
to EDIF.
40Compiling for EDIF
- Build log is displayed in Output window
41Targeting Hardware
- FPGA Place and Route Tools
- This diagram illustrates the back-end design
flow, i.e. the design flow from EDIF to
programming an actual FPGA device.
- Handel-C produces an EDIF netlist, which is the
main input to FPGA vendors' place and route
tools. The EDIF netlist will have been tailored
by DK1 for the device family being targeted.
- The other important inputs to the place and route
tools are the design constraints. If a clock rate
was given in the Handel-C clock specification,
this will be passed to the place and route tools.
For many designs this is sufficient for the tools
to produce a good implementation of the design.
In other cases, additional constraints may need
to be entered. These may include additional
timing constraints, especially if the interface
timing is critical, or the clocking is complex.
They may also include placement constraints. The
details of how to do this are specific to the
target device and tools and are outside the scope
of the present course.
- The place and route tools implement the design in
the EDIF file whilst attempting to meet any
constraints. The result is a bit file, which is
used to configure the FPGA, and many report
files. The most important report file is the one
that documents the timing of the design. This
will indicate whether or not any timing
constraints, including the clock rate, have been
met. If they haven't, the design may not work
properly. Don't be fooled if the development
board or lab model appears to work: timing
violations sometimes produce intermittent
system malfunctions. The speed at which a device
runs is also temperature-dependent, so the system
may only work if the ambient temperature is low
enough. The timing information used by the place
and route tools is based on "worst-case"
conditions (within certain limits).
42FPGA Place and Route Tools
43Targeting Hardware
- Technology Mapper
- Part of the design implementation process is
technology mapping. This means taking a logic
function, described in terms of logic gates such
as and, or, xor, and mapping it to the lookup
tables in the FPGA. Mapping can be performed by
DK1 or by the place and route tools. If DK1 does
the mapping, it can provide more accurate
estimates of device speed and utilization than
would otherwise be the case.
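The idea of mapping a gate-level function onto a lookup table can be sketched in C. This is a simplified illustration, not the algorithm used by DK1 or by any place and route tool. It assumes a 4-input LUT, and the gate-level function and all names (gate_level, map_to_lut4, lut4_eval) are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical gate-level function: f = (a AND b) XOR (c OR d). */
bool gate_level(bool a, bool b, bool c, bool d)
{
    return (a && b) != (c || d);
}

/* Build the 16-entry truth table of a 4-input LUT. Bit i of the
   result holds the output for the input combination encoded by i,
   with a in bit 0, b in bit 1, c in bit 2 and d in bit 3. */
uint16_t map_to_lut4(bool (*f)(bool, bool, bool, bool))
{
    assert(f != NULL);
    uint16_t table = 0;
    for (unsigned i = 0; i < 16; i++) {
        bool a = (i >> 0) & 1u;
        bool b = (i >> 1) & 1u;
        bool c = (i >> 2) & 1u;
        bool d = (i >> 3) & 1u;
        if (f(a, b, c, d)) {
            table |= (uint16_t)(1u << i);
        }
    }
    return table;
}

/* Evaluate the LUT: the four inputs form an index into the table. */
bool lut4_eval(uint16_t table, bool a, bool b, bool c, bool d)
{
    unsigned index = (unsigned)a | ((unsigned)b << 1)
                   | ((unsigned)c << 2) | ((unsigned)d << 3);
    return (table >> index) & 1u;
}
```

Three gates collapse into one table here, which is why delay and area estimates made after mapping are closer to the final hardware than estimates based on gate counts.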
44Technology Mapper
- Mapping is normally performed by PAR tools
- It is the translation of a gate-level netlist to
FPGA/PLD LUTs
- DK 1.1 now includes its own Technology Mapper
- Allows more detailed estimation of design delay
and area
- Can provide improved performance for some designs
- Supported devices
- Xilinx Spartan II, Virtex, Virtex E, Virtex II,
Virtex II Pro
- Altera Apex 20K, 20KE and 20KC, Apex II,
Excalibur
- Actel ProASIC, ProASIC+
45Targeting Hardware
- Timing Constraints
- In order to understand the timing report that is
produced by an FPGA place and route tool, it is
necessary to understand the timing model of the
FPGA design that DK1 produces.
- Handel-C produces a design that is synchronous by
construction, provided that there is only one
main function, or, if not, that all the main
functions share a common clock. If this is not
the case, the timing reports may be misleading,
and some manual constraints will have to be
entered.
- For a synchronous design, all storage is in
flip-flops clocked by a common clock. Functions
are implemented using combinational logic. There
is no feedback in the combinational logic. Timing
analysis calculates the longest delay through
each combinational block under worst-case
operating conditions. Together with the
flip-flops' setup time parameters, this provides
a figure for the minimum period of the system
clock. This must be less than the actual period
of the clock for the circuit to operate correctly.
The difference between the actual and the minimum
clock period is called the "slack". A negative
slack indicates a timing constraint "violation".
- Special care is required with inputs and outputs.
The delay from an input pin to an internal
flip-flop or between an internal flip-flop and an
output pin must also be less than the clock
period. Some timing analysis tools may ignore
these delays in the timing report, unless
explicitly requested not to. The problem can be
minimized by using flip-flops at every input or
output.
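The slack calculation described above can be written out as a few lines of C. This is a minimal sketch that uses only the two terms named in the notes (longest combinational delay and flip-flop setup time); real timing reports include further terms. The function names and the example delays are assumptions made for illustration.

```c
#include <assert.h>

/* Clock period in ns for a clock rate given in MHz. */
double period_ns_from_mhz(double rate_mhz)
{
    assert(rate_mhz > 0.0);
    return 1000.0 / rate_mhz;
}

/* Minimum clock period: longest combinational delay plus the
   flip-flop setup time. */
double min_period_ns(double longest_comb_delay_ns, double setup_ns)
{
    assert(longest_comb_delay_ns >= 0.0 && setup_ns >= 0.0);
    return longest_comb_delay_ns + setup_ns;
}

/* Slack = actual period - minimum period.
   A negative result indicates a timing violation. */
double slack_ns(double actual_period_ns, double minimum_period_ns)
{
    return actual_period_ns - minimum_period_ns;
}
```

For a 50MHz clock the actual period is 20 ns. With an assumed worst-case combinational delay of 17.5 ns and a setup time of 1.5 ns the minimum period is 19 ns, giving a slack of 1 ns; a 21 ns combinational delay would give a slack of -2.5 ns and so a violation.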
46Timing Constraints