Lecture 7: Behavioral Synthesis for Low-Power

1
Lecture 7: Behavioral Synthesis for Low-Power
  • Behavioral Level Transforms
  • Potential for large power reduction
  • Summary
  • Michael L. Bushnell
  • CAIP Center and WINLAB
  • ECE Dept., Rutgers U., Piscataway, NJ

2
Motivation
  • Conventional automated layout synthesis method:
  • Describe the design at RTL or a higher level
  • Generate a technology-independent realization
  • Map the logic-level circuit to a technology library
  • The optimization goal is shifting from low area to low power and higher performance
  • This requires accurate signal probability/activity estimates
  • Low-power needs must be considered at all design levels

3
Behavioral-Level Transformations
4
Algorithm-Level Power Reductions vs. Other Levels
5
Differential Coefficients for Finite Impulse
Response (FIR) Filters
  • Discrete-time linear time-invariant FIR system: Y(n) = C0 X(n) + C1 X(n-1) + ... + C(N-1) X(n-N+1)
  • Ci are the filter coefficients
  • N is the number of taps (the filter length)
  • Differential Coefficients Method (DCM) reduces
    computations to save power
  • Uses differences between coefficients rather than
    direct-form computation
  • Uses various orders of differences
  • Requires more storage devices and storage
    accesses
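
As a point of reference for the transforms on the following slides, a minimal direct-form FIR evaluation can be sketched in Python (the function name and calling convention are illustrative, not from the slides):

    # Direct-form FIR: Y(n) = sum_{i=0}^{N-1} C[i] * X(n - i)
    # x_hist[0] holds the current sample X(n), x_hist[i] holds X(n - i).
    def fir_direct(C, x_hist):
        return sum(c * x for c, x in zip(C, x_hist))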

6
First-Order Differences
  • Write out 3 consecutive outputs Y and their product terms
  • Rewrite the product terms using coefficient differences
  • Except for C0, each coefficient can be expressed as the preceding coefficient plus the difference between the two: Ck = Ck-1 + d1(k-1)/k

7
First-Order Differences (contd)
  • Store the product terms and reuse them in the next output time period
  • Need only 1 extra addition per product term and 1 extra storage element
  • Store C0 and the differences d
  • Trades a long multiplier for a short one, at the cost of storage overhead
  • d1(k-1)/k = Ck - Ck-1 is the first-order difference between Ck and Ck-1 (a code sketch follows)
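
A minimal Python sketch of the first-order scheme described above, assuming the product terms computed for the previous output are kept in prev_prod (all names are illustrative):

    # d1[k] = C[k] - C[k-1]; x[i] = X(n - i); prev_prod[k] = C[k] * X(n-1-k)
    def fir_dcm_first_order(C, d1, x, prev_prod):
        N = len(C)
        prod = [0] * N
        prod[0] = C[0] * x[0]              # the only full-width multiply
        for k in range(1, N):
            # C[k]*X(n-k) = C[k-1]*X(n-k) + d1[k]*X(n-k);
            # C[k-1]*X(n-k) was computed last period as prev_prod[k-1]
            prod[k] = prev_prod[k - 1] + d1[k] * x[k]
        return sum(prod), prod             # prod is stored for the next output

The multiply by d1[k] still exists, but the differences are shorter words than the full coefficients, which is where the long-multiplier/short-multiplier trade-off comes from.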

8
Orders of Differences
9
Second-Order Differences
  • Coefficient expressions using 2nd-order differences
  • Needs just 2 extra storage variables and 2 extra additions per product term to compute the FIR output with 2nd-order differences, compared with direct-form computation

10
Generalized mth-Order and Negative Differences
  • mth-order differences require storing m intermediate results for each of the N product terms, so mN storage variables and m extra additions per product term are needed compared with the direct form
  • Differences can be positive or negative
  • Possible to get absolute value of partial product
    with negative differences

11
Sorted Recursive Differences (SRD)
  • DCM is only applicable to systems where the envelope generated by the coefficient sequence (or its differences) is a smoothly-varying continuous function
  • Mainly low-pass FIR filters
  • SRD recursively sorts the coefficients and uses various orders of differences to reduce computation
  • Uses the transposed direct form of the FIR output computation
  • No restriction on the applicable coefficient sequence
  • Word-length reduction is not the same for each coefficient

12
Transposed Direct-Form (TDF) Computation
  • Compute all N product terms for a particular data sample before computing the terms for the next data sample
  • Same throughput as for direct-form computation
  • Signal-flow graph shown on the next slide (a code sketch follows)
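
A behavioral sketch of one transposed direct-form output step, under the assumption that state[] models the partial-sum registers of the signal-flow graph (names are illustrative):

    # Transposed direct-form FIR: all N products of the current sample are
    # formed at once and accumulated into a partial-sum delay line.
    def fir_transposed_step(C, x_n, state):
        # state[k] is the partial sum that will be completed k+1 samples
        # from now; len(state) == len(C) - 1, and len(C) >= 2 is assumed.
        N = len(C)
        y = C[0] * x_n + state[0]
        for k in range(N - 2):
            state[k] = C[k + 1] * x_n + state[k + 1]
        state[N - 2] = C[N - 1] * x_n
        return y, state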

13
Signal Flow Graph for TDF Realization
14
Maximum Savings in Adds Using SRD
15
Frequency Response of SRD Low-Pass Filter and
Hamming Window
16
Savings in Adds for Low-Pass Filter
  • Black: N = 201, Grey: N = 101, White: N = 51

17
Savings in Shifts Using SRD
18
Least-Squares Coefficient Optimization for Filters
  • Find the coefficient closest to the desired coefficient, but with the fewest 1 bits -- the number of 1s is called the code class
  • Goal is to reduce the number of additions
  • Use a sign-magnitude coefficient representation
  • k = maximum code class allowed per coefficient
  • Use a branch-and-bound method to solve the integer programming problem of selecting the coefficient approximations (a simplified sketch follows)
  • Shown to reduce addition computations in low-power filters by more than 40%
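
As a simplified illustration (not the branch-and-bound integer program referenced above), the sketch below enumerates sign-magnitude fractions of word length W with at most k one-bits and returns the one closest to a desired coefficient; the names and the default word length are assumptions:

    from itertools import combinations

    def closest_low_weight_coeff(target, W=12, k=2):
        # Bit position b contributes 2^-(b+1); at most k bits may be set.
        best = 0.0
        for sign in (1, -1):
            for ones in range(1, k + 1):
                for bits in combinations(range(W), ones):
                    value = sign * sum(2.0 ** -(b + 1) for b in bits)
                    if abs(value - target) < abs(best - target):
                        best = value
        return best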

19
Activity-Driven Architectural Transformations
  • Basic idea: power consumption in digital filters depends on the order of the addition operators
  • Restructure the addition tree to move adders fed by the higher-coefficient multiplications towards the output
  • Higher-activity circuitry is moved closest to the root of the addition tree
  • Average signal activity is defined over N consecutive time frames (the per-frame activity is defined on a later slide)

20
Data Flow Graph of IIR Filter
21
Perfectly Balanced Addition Tree
22
Filter Implementations
  • Can be bit-serial or word-parallel arithmetic
  • W bits are fed in parallel to the adders and multipliers
  • At time t + 1, z of the W bits change from their time-t values
  • Activity β(t) = z / W (computed in the sketch below)
  • β(t) is a random variable; the stochastic process is assumed strict-sense stationary
  • Average power dissipation is proportional to the average activity
  • In bit-serial implementations, intra-word bit differences, not inter-word bit differences, cause node activity
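
The per-transition word-level activity can be computed directly from consecutive W-bit words; a small illustrative sketch:

    # beta(t) = z / W, where z is the number of bit positions that differ
    # between the word at time t and the word at time t+1.
    def word_activity(prev_word, curr_word, W):
        z = bin((prev_word ^ curr_word) & ((1 << W) - 1)).count("1")
        return z / W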

23
Architectural Transforms on DSP Filters
  • I inputs to a word-parallel computation tree
  • For a balanced binary tree with l levels of adders, I = 2^l
  • Output y = Σ (j = 1 to I) aj bj
  • The minimum average value of the node activity qi over all balanced adder nodes is obtained when a1 ≤ a2 ≤ ... ≤ aI or a1 ≥ a2 ≥ ... ≥ aI
  • The minimum average value in a linear array of adders is obtained when a1 ≤ a2 ≤ ... ≤ aI
24
Linear Array of Adders
  • Assume mutually independent
    inputs, but method works even
    when signals correlate due to
    reconvergent fanouts

25
Power Optimization Algorithm
  • Simulate circuit at functional level
  • Using random, mutually-independent input values
  • Note signal activities at all adder inputs
  • Restructure adder trees using above 2 hypotheses
  • Move additions with high activity closer to root
    of computation tree
  • Recompute average activities
  • Iterate until no additional power is saved
  • Method shown to save up to 23% of power (a restructuring sketch follows)
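
A minimal sketch of the restructuring step for a linear adder array, assuming per-input activities have already been measured by functional simulation (the data structures are illustrative):

    # Sort adder inputs by measured activity and rebuild the adder chain so
    # that the highest-activity signals enter closest to the root.
    def restructure_adder_chain(inputs):
        # inputs: list of (signal_name, average_activity) pairs
        ordered = sorted(inputs, key=lambda s: s[1])    # lowest activity first
        tree = ordered[0][0]
        for name, _ in ordered[1:]:
            tree = f"({tree} + {name})"                 # higher activity nearer the root
        return tree

    # restructure_adder_chain([("a", 0.4), ("b", 0.1), ("c", 0.7)])
    # -> "((b + a) + c)"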

26
Architecture-Driven Voltage Scaling
  • Scale down VDD to save power, but this increases circuit delay
  • Reduce the delay by scaling down device sizes (less C)
  • But then interconnect C becomes dominant, not device C
  • Need an architectural transformation that introduces more parallelism to compensate for the increased delay
  • Introduce a parallel or pipelined architecture (see the sketch after this list)
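
A first-order numerical illustration of the trade-off, using the standard dynamic-power expression and a simple delay model that is my assumption, not from the slides:

    # P = activity * C_eff * VDD^2 * f ;  delay ~ VDD / (VDD - Vt)^2
    def dynamic_power(c_eff, vdd, freq, activity=1.0):
        return activity * c_eff * vdd ** 2 * freq

    def relative_delay(vdd, vt=0.7, vdd_ref=3.3):
        def d(v):
            return v / (v - vt) ** 2
        return d(vdd) / d(vdd_ref)

    # With 2-way parallelism each unit may run at f/2, so VDD can be lowered
    # until relative_delay(vdd) is about 2; then compare
    #   dynamic_power(2 * c_eff, vdd_low, freq / 2)   # two units, plus wiring overhead
    # against
    #   dynamic_power(c_eff, 3.3, freq)               # original design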

27
Example: Original Data Path Operator
28
Redesigned Parallel Implementation: > 2X Area Increase
29
Redesigned Pipelined Implementation
30
Operation Reduction Methods
  • Reduce the number of operators in the data flow graph
  • Example: computes X^3 + A X^2 + B X + C (see the sketch after this list)
  • Reduces switched capacitance, but may slow down critical paths
  • Reduction while maintaining throughput
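
One standard restructuring for the polynomial above is Horner-style factoring; the sketch below is my example of the idea, not necessarily the exact data-flow graph on the slides:

    def poly_direct(X, A, B, C):
        # 5 multiplications, 3 additions
        return X * X * X + A * X * X + B * X + C

    def poly_reduced(X, A, B, C):
        # ((X + A)*X + B)*X + C : 2 multiplications, 3 additions,
        # but the operations are now chained along one critical path
        return ((X + A) * X + B) * X + C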

31
Example
  • Reduction with lower throughput

32
Operation Substitution Methods
  • Multiplication uses more energy than addition
  • Replace multipliers with adders in high-level
    synthesis

33
Method and Results
  • Transformations:
  • Common sub-expression utilization
  • Apply the distributive law
  • Replace multiplication with repeated shifting and adding (see the sketch after this list)
  • On an 11-tap FIR filter:
  • Saved 62% of dissipated power
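
A small illustration of the shift-and-add substitution (my example): a constant multiply whose binary form has few 1 bits becomes shifts plus adds.

    def mul_by_10(x):
        # 10 = 0b1010, so 10*x = 8*x + 2*x
        return (x << 3) + (x << 1)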

34
Precomputation-Based Optimization
  • Basic idea: precompute (with low-overhead hardware) circuit output logic values 1 cycle before they are needed
  • Use the precomputed information in the next clock cycle to disable unneeded hardware, which reduces switching activity
  • Must be careful: the precomputation hardware can add area and lengthen the clock period

35
Precomputation Architecture
36
Explanation
  • When both predictor functions are 0, no prediction of the output value is possible
  • When a prediction happens (we know definitely that the output is 1 or 0), register R2 is turned off
  • This reduces activity in the combinational logic block
  • R1 is still active, so the combinational logic still computes the correct output
  • More effective if P(f1 + f2) is large
  • For a comparator, this saves 50% of the power (see the sketch after this list)
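
A behavioral sketch of MSB-based precomputation for an n-bit comparator (A > B); the structure follows the idea explained above, but the code, names, and register modelling are illustrative rather than a gate-level design:

    def compare_with_precompute(a, b, n, state):
        # f1: output is definitely 1; f2: output is definitely 0
        a_msb, b_msb = (a >> (n - 1)) & 1, (b >> (n - 1)) & 1
        f1 = a_msb == 1 and b_msb == 0
        f2 = a_msb == 0 and b_msb == 1
        if f1 or f2:
            # Register R2 (feeding the low-order bits) is not clocked, so the
            # large comparator logic sees no new switching this cycle.
            return f1, state
        # No prediction: latch the low-order bits (R2) and compare as usual.
        mask = (1 << (n - 1)) - 1
        state = (a & mask, b & mask)
        return state[0] > state[1], state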

37
Specific Comparator Example
38
Can Precompute Outputs Needed 2 or More Clocks
Later
  • Can reduce switching activity by 12.5%

39
Example: Adder-Comparator
40
Precomputation with Shannon's Expansion Theorem
41
Summary
  • Behavioral or architectural level synthesis
  • Resynthesize state variable equations to save
    power
  • Scale down the supply voltage, and introduce parallelism and pipelining to make up for the resulting slowdown of the hardware

42
Future Research Directions
  • Formal methods are needed at the behavioral or data-flow level to explore the power-reduction design space
  • Behavioral-level power estimation algorithms are needed
  • Synthesis scheduling and data-path allocation algorithms should incorporate power tradeoffs