Title: VHDL Design Tips and Low Power Design Techniques
1 VHDL Design Tips and Low Power Design Techniques
- Jonathan Alexander
- Applications Consulting Manager
- Actel Corporation
- MAPLD 2004
2 Agenda
- Advanced VHDL
- ProASICPlus Synthesis, Options and Attributes
- Timing Specifications
- Design Hints
- Power-Conscious Design Techniques
- Summary
3 Actel ProASICPlus Design Flow
[Figure: design flow]
- VHDL Source (with Directives)
- Synthesis: Logic Optimization and Technology Mapping (inputs: Attributes, Timing)
- Place & Route: Technology Implementation (inputs: Timing, Pin, Placement)
4 What is Synthesis?
- The mapping of a behavioral description to a specific target technology
  - i.e., generates a structural netlist from an HDL description
  - Includes optimization steps
- Optimize the design implementation for
  - Higher Speed
  - Smaller Area
  - Lower Power
5 ProASICPlus HDL Attributes and Directives
- Attributes are used to direct the way your design is optimized and mapped during synthesis.
- Directives control the way your design is analyzed prior to synthesis. Because of this, directives must be included in your VHDL source code.
- Three important ProASICPlus attributes or directives are available
  - syn_maxfan (attribute)
  - syn_keep (directive)
  - syn_encoding (attribute)
6 ProASICPlus HDL Attributes and Directives (contd)
- syn_maxfan = Value
  - Value range: > 4
  - Can be assigned to an input port, register output, or a net
  - Overrides the global Fanout Limit setting
  - The tool will replicate the signal if this attribute is associated with it
- Syntax
  - In the HDL code
    - attribute syn_maxfan of data_in : signal is 1000;
  - In the constraint file
    - define_attribute clk syn_maxfan 200
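As a minimal sketch of where the HDL form of the attribute could sit in a complete design unit, the entity below attaches syn_maxfan to an input port. The entity name fanout_demo, the port widths, and the local declaration of the attribute as an integer are assumptions made for illustration; a vendor attribute package may already provide the declaration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity fanout_demo is
  port (
    clk      : in  std_logic;
    data_in  : in  std_logic;                     -- high-fanout input
    data_out : out std_logic_vector(7 downto 0)
  );
  -- An attribute must be declared before it is applied. Declaring it
  -- here as integer is an assumption for this sketch.
  attribute syn_maxfan : integer;
  -- Attributes of ports are specified in the entity declarative part.
  attribute syn_maxfan of data_in : signal is 1000;
end entity fanout_demo;

architecture rtl of fanout_demo is
begin
  -- data_in fans out to every bit of the output register.
  process (clk)
  begin
    if rising_edge(clk) then
      data_out <= (others => data_in);
    end if;
  end process;
end architecture rtl;
```

Tools that do not recognize the attribute simply ignore it, so the code stays portable.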
7 ProASICPlus HDL Attributes and Directives (contd)
- syn_keep = 1
  - When associated with a signal, this directive prevents Synplify from combining or collapsing the node.
  - This directive can be associated with combinatorial signals only
- Syntax
  - In the HDL code
    - attribute syn_keep of st : signal is 1; (integer value)
  - In the constraint file
    - define_attribute st syn_keep 1
9 Timing Constraints Specification
- The Synplify ProASICPlus mapper allows specification of the following
  - Global design frequency
  - Multi-clock design
  - Skew between two clocks
  - Input and output delays
  - Functional multi-cycle and false paths
- All these timing specifications are available in the GUI; this presentation covers the SDC constructs only.
10 Design Frequency Specification
- Multiple Clocks
  - The Frequency item in the graphical user interface allows specification of a global value for all clocks
  - This setting influences the operator architecture selection (speed or area) during mapping
  - This value should be set to the highest frequency required in the design
  - To specify individual values for different clocks, use the following SDC constructs
    - define_clock clock_1 -freq <Value1>
    - define_clock clock_2 -freq <Value2>
11 Skew Specification in Synplify
- To define a skew between two clocks, use the following constraint
  - define_clock_delay -rise clock1 -rise clock2 value
- Example
  - define_clock_delay -rise CLK19M -rise MPU_CLK 1.0
  - define_clock_delay -rise MPU_CLK -rise CLK19M 2.0
12 Input Delay
- Specifies the input arrival time of a signal in relation to the clock.
- It is used at the input ports to model the interface of the FPGA inputs with the outside environment.
- The value entered should represent the delay outside of the chip before the signal arrives at the input pin
- To specify the input delay on an input port, use the following constraint
  - define_input_delay InputPortName Value
13 Output Delay
- Specifies the delay of the logic outside the FPGA driven by the top-level outputs.
- Used to model the interface of the FPGA outputs with the outside environment.
- To specify the output delay, use the following constraint
  - define_output_delay OutputPortName Value
14 Functional False Path
- define_false_path allows the user to specify paths which will be ignored for timing analysis, but will still be optimized, without priority, within Synplify.
- The following options are available
  - -from <a register or input pin>
  - -to <a register or output pin>
  - -through <a net signal>
- Example
  - define_false_path -from Register_A
    - Paths from Register_A are ignored
  - define_false_path -to Register_B
    - Paths to Register_B are ignored
  - define_false_path -through test_net
    - Paths through test_net are ignored
16 Late Arrival Signals Prioritization
  -- Initial Description
  case State is
    when WAIT =>
      if Critical then
        Target <= Source_1;
      else
        Target <= Source_2;
      end if;
    when ACTIVE =>
      if Critical then
        Target <= Source_1;
      else
        Target <= Source_3;
      end if;
    when ...
  end case;

  -- Modified Description
  if Critical then
    Target <= Source_1;
  else
    case State is
      when WAIT =>
        Target <= Source_2;
      when ACTIVE =>
        Target <= Source_3;
      when ...
    end case;
  end if;
[Figure: multiplexer structures for the initial and modified descriptions; inputs State, Source_1, Source_2, Critical; output Target]
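The modified description can be sketched as a complete, compilable design unit. The entity name late_select, the single-bit ports, and the one-bit state encoding are assumptions made for illustration. Because WAIT is a reserved word in VHDL, the sketch encodes the two states on a hypothetical state_is_active port instead of using the slide's state names.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity late_select is
  port (
    state_is_active : in  std_logic;  -- hypothetical encoding: '0' = waiting, '1' = active
    critical        : in  std_logic;  -- late-arriving control signal
    source_1        : in  std_logic;
    source_2        : in  std_logic;
    source_3        : in  std_logic;
    target          : out std_logic
  );
end entity late_select;

architecture rtl of late_select is
begin
  process (state_is_active, critical, source_1, source_2, source_3)
  begin
    -- The late signal is tested first, so it only controls the final
    -- multiplexer; the state decode is done while waiting for it.
    if critical = '1' then
      target <= source_1;
    else
      case state_is_active is
        when '0'    => target <= source_2;
        when others => target <= source_3;
      end case;
    end if;
  end process;
end architecture rtl;
```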
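17 Late Arrival Signal: Another Hint!
  -- Initial Description: A_late passes through the adder and the comparator
  ...
  begin
    if ((A_late + B) > Max) then
      Out <= C;
    else
      Out <= D;
    end if;
  end process;
[Figure: A_late and B are combined, compared (>) with Max, and the comparator drives the mux selecting C or D onto Out]
  -- Modified Description: (Max - B) uses only early signals, so A_late passes through the comparator only
  if ((Max - B) < A_late) then
    Out <= C;
  else
    Out <= D;
  end if;
[Figure: comparator (>) fed directly by A_late, driving the mux selecting C or D onto Out]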
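A minimal sketch of the rearranged comparison follows, assuming 8-bit signed operands and the hypothetical names late_compare, max_val, and result (Out is a reserved word in VHDL, so it cannot be used as a signal name). One extra bit is kept on the subtraction so that Max - B cannot overflow, which keeps the rewritten test equivalent to A_late + B > Max.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity late_compare is
  port (
    a_late  : in  signed(7 downto 0);  -- late-arriving operand
    b       : in  signed(7 downto 0);
    max_val : in  signed(7 downto 0);
    c, d    : in  std_logic;
    result  : out std_logic
  );
end entity late_compare;

architecture rtl of late_compare is
  signal threshold : signed(8 downto 0);
begin
  -- Max - B depends only on early signals; 9 bits hold the full range.
  threshold <= resize(max_val, 9) - resize(b, 9);

  -- A_late + B > Max is equivalent to A_late > Max - B, so the late
  -- signal now goes through a single comparator.
  result <= c when resize(a_late, 9) > threshold else d;
end architecture rtl;
```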
18 Signal vs Variable
- Variable assignments are sensitive to order.
  - Variables are updated immediately
- Signal assignments are order independent.
  - Signal assignments are scheduled

  -- Signals, assigned in data-flow order
  process (Clk)
  begin
    if (Clk'event and Clk = '1') then
      Trgt1 <= In1 xor In2;
      Trgt2 <= Trgt1;
      Trgt3 <= Trgt2;
    end if;
  end process;

  -- Variables, assigned before they are read
  signal vTrgt3 : std_logic;
  process (Clk)
    variable vTrgt1, vTrgt2 ...
  begin
    if (Clk'event and Clk = '1') then
      vTrgt1 := In1 xor In2;
      vTrgt2 := vTrgt1;
      vTrgt3 <= vTrgt2;
    end if;
  end process;

  -- Variables, read before they are assigned
  process (Clk)
    variable vTrgt1, vTrgt2 ...
  begin
    if (Clk'event and Clk = '1') then
      Trgt3  <= vTrgt2;
      vTrgt2 := vTrgt1;
      vTrgt1 := In1 xor In2;
    end if;
  end process;

  -- Signals, assigned in a different order
  process (Clk)
  begin
    if (Clk'event and Clk = '1') then
      Trgt2 <= Trgt1;
      Trgt3 <= Trgt2;
      Trgt1 <= In1 xor In2;
    end if;
  end process;
[Figure: resulting register structures driving Trgt3 for each process]
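The difference can be sketched in one compilable design unit that places a signal-based process and a variable-based process side by side. The entity name sig_vs_var and the port names are assumptions made for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity sig_vs_var is
  port (
    clk   : in  std_logic;
    in1   : in  std_logic;
    in2   : in  std_logic;
    q_sig : out std_logic;  -- output of the signal-based process
    q_var : out std_logic   -- output of the variable-based process
  );
end entity sig_vs_var;

architecture rtl of sig_vs_var is
  signal s1, s2 : std_logic;
begin
  -- Signals: every assignment is scheduled and takes effect after the
  -- process suspends, so this infers a chain of three flip-flops.
  sig_p : process (clk)
  begin
    if rising_edge(clk) then
      s1    <= in1 xor in2;
      s2    <= s1;
      q_sig <= s2;
    end if;
  end process sig_p;

  -- Variables assigned before they are read: v1 and v2 are updated
  -- immediately and behave as wires, so only q_var infers a flip-flop.
  var_p : process (clk)
    variable v1, v2 : std_logic;
  begin
    if rising_edge(clk) then
      v1    := in1 xor in2;
      v2    := v1;
      q_var <= v2;
    end if;
  end process var_p;
end architecture rtl;
```

In simulation, q_sig follows in1 xor in2 three clock edges later, while q_var follows it after a single edge.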
19 Resource Sharing and Operand Alignment
HDL Code
  process (X, Y, Z, Sel)
  begin
    if (Sel = '0') then
      Res <= X + Y;
    else
      Res <= Y + Z;
    end if;
  end process;
Implementations
- With Resource Sharing (Smaller)
- Operand Alignment (Faster)
- Without Resource Sharing (Larger and Slower)
(*) Especially if Y is a Late Arrival Signal
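One way to make the shared implementation explicit in the source is to select the differing operand first and then use a single operator. This is a sketch, assuming the operator lost from the slide is addition, 16-bit unsigned operands, and the hypothetical names shared_add and other_operand.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity shared_add is
  port (
    sel : in  std_logic;
    x   : in  unsigned(15 downto 0);
    y   : in  unsigned(15 downto 0);  -- common (possibly late) operand
    z   : in  unsigned(15 downto 0);
    res : out unsigned(15 downto 0)
  );
end entity shared_add;

architecture rtl of shared_add is
  signal other_operand : unsigned(15 downto 0);
begin
  -- Only the operand that differs between the branches is multiplexed.
  other_operand <= x when sel = '0' else z;

  -- One adder serves both branches; y reaches it without passing
  -- through the multiplexer.
  res <= y + other_operand;
end architecture rtl;
```

Keeping Y on a direct adder input is what the operand alignment note refers to: the late signal crosses the adder only.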
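20 Resource Sharing to Avoid
VHDL Code
  process (X, Y, Z, T, Sel)
  begin
    if (Sel = '0') then
      Eq <= (X = Y);
    else
      Eq <= (Z = T);
    end if;
  end process;
Implementation
- With Resource Sharing (Larger and Slower)
  [Figure: Sel-controlled muxes on the 16-bit operands X, Y, Z, T feeding one comparator that produces the 1-bit Eq]
- Without Resource Sharing (Smaller and Faster)
  [Figure: two comparators whose 1-bit results are selected by a Sel-controlled mux to produce Eq]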
21 Internal Three-state Buffers
- At the VHDL Level
  - Either use the multiplexer-based modified VHDL code, or
  - Replace the three-state structure with the following equivalent AND-OR structure
[Figure: four three-state buffers (tri_in1 to tri_in4, enabled by tri_en1 to tri_en4) driving tri_out, and the equivalent AND-OR structure driving mux_out]
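The AND-OR replacement can be sketched as follows, reusing the signal names from the figure; the entity name and_or_bus is an assumption. The sketch relies on at most one enable being active at a time, as on a correctly driven three-state bus, and it drives '0' rather than high impedance when no enable is active.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity and_or_bus is
  port (
    tri_en1, tri_en2, tri_en3, tri_en4 : in  std_logic;
    tri_in1, tri_in2, tri_in3, tri_in4 : in  std_logic;
    mux_out                            : out std_logic
  );
end entity and_or_bus;

architecture rtl of and_or_bus is
begin
  -- Each AND gate passes its data input only while its enable is
  -- active; the OR gate merges the four gated sources.
  mux_out <= (tri_en1 and tri_in1) or
             (tri_en2 and tri_in2) or
             (tri_en3 and tri_in3) or
             (tri_en4 and tri_in4);
end architecture rtl;
```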
22 Agenda
- Advanced VHDL
- Power-Conscious Design Techniques
- Data Path Selection
- FSM Encoding
- Gating Clocks and Signals
- Advanced Power Design Practices
- Summary
23 Sources of Dynamic Power Consumption
- Switching
  - CMOS circuits dissipate power during switching
  - The more logic levels used, the more switching activity needed
- Frequency
  - Dynamic power increases linearly with frequency
- Loading
  - Dynamic power increases with capacitive loading
- Glitch Propagation
  - Glitches cause excessive switching to occur at relatively high frequencies.
- Clock Trees
  - Clock trees operate at high frequency under heavy loading, so they contribute significantly to the total power consumption.
24 Data Path Elements Selection
- Basic block selection is critical, as the power/speed tradeoff has to be well identified
- Power depends on switching activity, and therefore on the input data pattern
- Watch the architecture of the basic arithmetic and logic blocks
  - Check area/speed and fanout distribution/number of logic levels
  - High fanout combined with a large number of logic levels leads to higher glitch propagation
- Investigate the effect of pipelining on power dissipation
  - Impact on clock tree power consumption
  - Impact on block fanout distribution
25 Data Path Architectures
- Adder Architectures
  - Architecture Evaluation
  - Test Results
- Multipliers
  - Architectures and Power Implications
- Pipelined Configurations
  - Pipeline Effect on Power
  - Pipelining vs Re-timing
26 Review: Ripple Adder
Carry signal switching propagates through all the stages and consumes power
27 Review: Carry Look-Ahead Adder
- Carry signal switching propagates through fewer stages
- However, higher number of logic levels
28 Carry Select Adder Overview
Principle: do it twice (considering Carry = 0 and Carry = 1); then, when the actual carry is ready, select the appropriate result
- Carry signal switching propagates through fewer stages
- However, higher duplication and complexity
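The principle can be sketched on an 8-bit adder split into two 4-bit halves. The entity name carry_select8, the widths, and the internal signal names are assumptions made for illustration; a synthesis tool may restructure the arithmetic, so the mapped result should be checked if a particular architecture is required.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity carry_select8 is
  port (
    a, b : in  unsigned(7 downto 0);
    cin  : in  std_logic;
    sum  : out unsigned(7 downto 0);
    cout : out std_logic
  );
end entity carry_select8;

architecture rtl of carry_select8 is
  signal cin_u : unsigned(0 downto 0);
  signal low   : unsigned(4 downto 0);  -- low half sum, bit 4 is its carry out
  signal high0 : unsigned(4 downto 0);  -- high half computed for carry = 0
  signal high1 : unsigned(4 downto 0);  -- high half computed for carry = 1
  signal high  : unsigned(4 downto 0);
begin
  cin_u(0) <= cin;

  low   <= resize(a(3 downto 0), 5) + resize(b(3 downto 0), 5) + cin_u;

  -- The high half is computed twice, in parallel with the low half.
  high0 <= resize(a(7 downto 4), 5) + resize(b(7 downto 4), 5);
  high1 <= resize(a(7 downto 4), 5) + resize(b(7 downto 4), 5) + 1;

  -- When the actual carry is ready, it only has to drive a multiplexer.
  high  <= high1 when low(4) = '1' else high0;

  sum   <= high(3 downto 0) & low(3 downto 0);
  cout  <= high(4);
end architecture rtl;
```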
29 Adder Architectures
- Forward Carry Look Ahead (CLF): fastest but also largest
- Brent and Kung (BK): almost the same speed as CLF but drastically smaller
- Carry Look Ahead (CLA): relatively small and slow
- Ripple (RPL): smallest but slowest
Brent and Kung: best area/speed tradeoff
30 Adders Power Dissipation
- Brent and Kung: lowest power dissipation
  - Lowest logic levels
  - Lowest fanout
32 Multipliers Power Consumption
- Wallace advantages over Carry-Save Multiplier (CSM)
  - Uniform switching propagation
  - Fewer logic levels
  - Lower average fanout
34 Pipelining for Glitch Reduction
- A logically deep internal net is typically affected by more primary inputs switching, and is therefore more susceptible to glitches
- Pipelining shortens the depth of combinatorial logic by inserting pipeline registers
- Pipelining is very effective for data path elements such as parity trees and multipliers
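A minimal sketch of a pipelined parity tree is shown below, assuming a 16-bit input split into four 4-bit groups; the entity name parity16_pipe and the grouping are illustrative choices, not taken from the presentation. The result appears two clock cycles after the input.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity parity16_pipe is
  port (
    clk    : in  std_logic;
    d      : in  std_logic_vector(15 downto 0);
    parity : out std_logic
  );
end entity parity16_pipe;

architecture rtl of parity16_pipe is
  signal stage1 : std_logic_vector(3 downto 0);
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- Stage 1: parity of each 4-bit group, captured in a register so
      -- that glitches inside a group do not propagate further.
      for i in 0 to 3 loop
        stage1(i) <= d(4*i) xor d(4*i + 1) xor d(4*i + 2) xor d(4*i + 3);
      end loop;

      -- Stage 2: combine the registered group parities.
      parity <= stage1(0) xor stage1(1) xor stage1(2) xor stage1(3);
    end if;
  end process;
end architecture rtl;
```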
35 Pipelining Effect on Power
Pipelining increases clock tree power, but overall power is lowered
36 Pipelining vs. Re-timing
- Pipelining introduces new registers
- Re-timing does not introduce new registers
  - Example: FIR re-timing
- Re-timing also reduces power
  - Registers prevent glitch propagation through paths with many logic levels (e.g., multipliers)
38 FSM and Counter Encoding Impact on Power
39 Counters and FSMs: State Register Transitions
40 Counters Power Measurement on ProASIC
Power dissipation for 200 instances of 8-bit counters. As expected, Gray counters dissipate less power (25)
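For reference, an 8-bit Gray counter can be sketched as below. The entity name gray_counter8, the synchronous reset, and the convert-increment-convert structure are assumptions; this is not the counter that was measured. Only the Gray-coded value is stored, so exactly one register bit changes per count, although the next-state logic itself may still glitch.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity gray_counter8 is
  port (
    clk  : in  std_logic;
    rst  : in  std_logic;  -- synchronous reset (assumed)
    gray : out std_logic_vector(7 downto 0)
  );
end entity gray_counter8;

architecture rtl of gray_counter8 is
  signal gray_r : unsigned(7 downto 0) := (others => '0');
begin
  process (clk)
    variable bin : unsigned(7 downto 0);
  begin
    if rising_edge(clk) then
      if rst = '1' then
        gray_r <= (others => '0');
      else
        -- Gray to binary: each bit is the XOR of all Gray bits at or
        -- above its position.
        bin(7) := gray_r(7);
        for i in 6 downto 0 loop
          bin(i) := bin(i + 1) xor gray_r(i);
        end loop;

        bin := bin + 1;

        -- Binary to Gray: XOR the value with itself shifted right by one.
        gray_r <= bin xor shift_right(bin, 1);
      end if;
    end if;
  end process;

  gray <= std_logic_vector(gray_r);
end architecture rtl;
```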
41 FSM Encoding Effects on Power
43 Signal Gating
- There are several logic implementations of signal gating
  - Latch or FF
  - Tri-state buffer
44 Gating Clocks
- Most used mechanism to gate clocks
[Figure: latch-based clock gating. Signals: New_Data (N bits), Data_Out (N bits), LD_Enable (from the FSM), LATCH, CLK_En, CLK]
Gating clock signals with combinatorial logic is not recommended. Glitches are easily created by the clock gate, which may result in incorrect triggering of the register.
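A common latch-based gating scheme can be sketched as follows. The original figure is not recoverable, so the latch polarity (transparent while CLK is low), the AND gate, and the names gated_register and gated_clk are assumptions; the port names follow the figure labels.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity gated_register is
  generic (N : positive := 8);
  port (
    clk       : in  std_logic;
    ld_enable : in  std_logic;  -- load request, e.g. from an FSM
    new_data  : in  std_logic_vector(N - 1 downto 0);
    data_out  : out std_logic_vector(N - 1 downto 0)
  );
end entity gated_register;

architecture rtl of gated_register is
  signal clk_en    : std_logic := '0';
  signal gated_clk : std_logic;
begin
  -- The latch is transparent only while clk is low, so clk_en cannot
  -- change during the high phase of the clock.
  process (clk, ld_enable)
  begin
    if clk = '0' then
      clk_en <= ld_enable;
    end if;
  end process;

  -- Because clk_en is stable while clk is high, the AND gate cannot
  -- produce a glitch on the gated clock.
  gated_clk <= clk and clk_en;

  process (gated_clk)
  begin
    if rising_edge(gated_clk) then
      data_out <= new_data;
    end if;
  end process;
end architecture rtl;
```

The register sees a clock edge only in cycles where a load was requested, so it and its share of the clock tree stay idle otherwise.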
45 Gating Signals: Address Decoder Example
[Figure: decoder with inputs IN0 and IN1, an Enable/Select input, and outputs OUT0 to OUT3]
Switching activity on one of the decoder inputs induces a large number of toggling outputs. The Enable/Select signal prevents the propagation of this switching activity.
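A gated 2-to-4 decoder can be sketched as below; the entity name decoder2to4 and the vector ports standing in for IN0/IN1 and OUT0 to OUT3 are assumptions made for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity decoder2to4 is
  port (
    enable : in  std_logic;                     -- Enable/Select
    sel    : in  std_logic_vector(1 downto 0);  -- IN1 and IN0
    outs   : out std_logic_vector(3 downto 0)   -- OUT3 down to OUT0
  );
end entity decoder2to4;

architecture rtl of decoder2to4 is
begin
  process (enable, sel)
  begin
    -- While the decoder is disabled, every output stays at '0', so
    -- activity on sel does not reach the downstream logic.
    outs <= (others => '0');
    if enable = '1' then
      case sel is
        when "00"   => outs(0) <= '1';
        when "01"   => outs(1) <= '1';
        when "10"   => outs(2) <= '1';
        when others => outs(3) <= '1';
      end case;
    end if;
  end process;
end architecture rtl;
```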
47 VHDL Coding Effect on Power
- Example: IF ... THEN ... ELSE ...
- Re-organizing the code helps to prevent propagation of switching activity
48 Delay Balancing
- If all primary inputs have the same arrival time and the same switching probability, balancing trees eliminates switching propagation
[Figure: Un-Balanced vs. Balanced trees]
49 Guarded Evaluation
- Technique used to reduce switching activity by adding latches or floating gates at the inputs of combinatorial blocks if their outputs are not used.
- Example: the result of a multiplier may or may not be used, depending on a condition. Adding transparent latches or AND gates on the inputs avoids power dissipation, as they mask useless input activity.
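The AND-gate variant of the multiplier example can be sketched as follows, assuming 8-bit unsigned operands and a hypothetical use_product control that is valid before the operands change; the entity and signal names are illustrative.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity guarded_mult is
  port (
    use_product : in  std_logic;  -- '1' when the result will be used
    a, b        : in  unsigned(7 downto 0);
    product     : out unsigned(15 downto 0)
  );
end entity guarded_mult;

architecture rtl of guarded_mult is
  signal a_g, b_g : unsigned(7 downto 0);
begin
  -- Guard logic: the operands are forced to zero while the product is
  -- not needed, so input activity does not toggle the multiplier.
  a_g <= a when use_product = '1' else (others => '0');
  b_g <= b when use_product = '1' else (others => '0');

  product <= a_g * b_g;
end architecture rtl;
```

With transparent latches instead of AND gates, the multiplier inputs would hold their last value rather than return to zero.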
50 Pre-computation Based Power Reduction
[Figure: pre-computation architecture with a Common Clock, registers R1 (Pre-Computation Input) and R2 (Gated Input), Pre-Computation Logic, and Combinatorial Logic producing the Outputs]
51 Operator Reduction
- Based on transformations of operations into computationally equivalent implementations
- Example: distributivity of multiplication over addition (resource sharing)
  - (X * Y) + (Z * Y) = (X + Z) * Y
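The factored form can be sketched as below, assuming 8-bit unsigned operands; the entity name factored_mac and the widths are illustrative. The intermediate sum is kept one bit wider so that the result equals (X * Y) + (Z * Y) exactly.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity factored_mac is
  port (
    x, y, z : in  unsigned(7 downto 0);
    result  : out unsigned(16 downto 0)
  );
end entity factored_mac;

architecture rtl of factored_mac is
  signal sum_xz : unsigned(8 downto 0);
begin
  -- One adder, widened by one bit so that the carry is not lost.
  sum_xz <= resize(x, 9) + resize(z, 9);

  -- One multiplier instead of two: 9 bits times 8 bits gives 17 bits.
  result <= sum_xz * y;
end architecture rtl;
```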
52 Input Signals Ordering
- Never forget that adders are commutative and associative
- Amplitude of IN is larger than the amplitude of IN >> 7 and IN >> 8
[Figure: Switching Probability vs. Bit Number (2, 4, 6, ... 14 ...) for IN, IN >> 7, and IN >> 8, with the Sign Bit Correlation region marked]
53 Summary
- Advanced VHDL Design Tips
  - Identify critical and late arrival signals in your design
  - Write code in a way that reduces the logic levels for such signals
  - Perform functions such as state determination while waiting for late signals
- Low Power Design Techniques
  - Reduce switching activity per clock cycle
  - Reduce propagation of switching activity
  - Use power-efficient architecture and encoding
  - Disable logic blocks whose outputs are not used
  - Re-evaluate expressions to achieve the above
54 Additional Resources
- Documents available on http://www.actel.com
  - Low Power Resource Center
    - http://www.actel.com/products/rescenter/power/index.html
  - Power Conscious Design with ProASIC
    - http://www.actel.com/documents/PowerConscious.pdf
  - Low Power Design for Antifuse FPGAs
    - http://www.actel.com/documents/lowpower.pdf