Title: OPTIMAL FSMD PARTITIONING FOR LOW POWER
1 OPTIMAL FSMD PARTITIONING FOR LOW POWER
- Nainesh Agarwal and Nikitas Dimopoulos
- Electrical and Computer Engineering
- University of Victoria
2 Summary
- Power and energy
- Power gating
- Partitioning as a means to achieve optimal power gating
- What next
3 Computation Power and Energy
- What is the minimum energy a computation can expend?
- Are we there yet?
4 Computation Power and Energy contd
- Feynman gives a relation between free energy and computation rate for reversible computation
- $E = kT \log r$
- Where $r$ is the computation rate.
- This means that at the limit, we may expend zero energy (when $r = 1$), but then the computation will take infinitely long.
5 Computation Power and Energy contd
- For irreversible computation,
- $\Delta E = kT\,b \log 2$
- Where $b$ is the number of bits involved in the computation (entropy)
6 Computation Power and Energy contd
- In both cases, these quantities are exceptionally small.
- $k = 1.3806504 \times 10^{-23}$ J/K
- At $T = 300$ K, $kT = 4.14 \times 10^{-21}$ J
- A 50 W, 3 GHz processor consumes about $1.65 \times 10^{-8}$ J in one cycle
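The gap between these figures can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes the logarithm in the irreversible bound is the natural logarithm. Dividing 50 W by 3 GHz gives roughly 1.67 × 10^-8 J, close to the per-cycle figure quoted above.

```python
import math

K_BOLTZMANN = 1.3806504e-23  # J/K, the value quoted on the slide


def thermal_energy(temperature_k):
    """Return kT in joules for a temperature in kelvin."""
    return K_BOLTZMANN * temperature_k


def irreversible_bound(bits, temperature_k):
    """Return kT * b * ln(2); the natural log is an assumption here."""
    return thermal_energy(temperature_k) * bits * math.log(2)


def energy_per_cycle(power_w, clock_hz):
    """Average energy spent in one clock cycle, in joules."""
    return power_w / clock_hz


kt = thermal_energy(300.0)            # about 4.14e-21 J
cycle = energy_per_cycle(50.0, 3e9)   # about 1.67e-8 J
orders = math.log10(cycle / kt)       # orders of magnitude between the two

print(f"kT at 300 K      : {kt:.3e} J")
print(f"energy per cycle : {cycle:.3e} J")
print(f"gap              : {orders:.1f} orders of magnitude")
```

Under these assumptions the per-cycle energy sits between 12 and 13 orders of magnitude above kT.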
7 Computation Power and Energy contd
- DSPstone benchmarks synthesized in 180 nm and 90
nm technologies
8 DSPstone dynamic energy
9 DSPstone total energy
10 Computation Power and Energy contd
- Computational energy is far above the theoretical minimum (by more than 10 orders of magnitude)
- Technological drive reduces total energy (an order of magnitude per generation)
- Leakage power has become an issue
- Power gating may provide efficiencies to further scale the technology
11 Partitioning
- Controller and datapath are considered together
- Problem is formulated as
- Integer Linear Programming
- Non-linear programming solved using simulated
annealing
12 Notation
- $s_i$ represents a state of an FSMD
- $v_k$ represents a variable associated with one or more states
- A variable $v_k$ is considered to be shared between two states $s_i$ and $s_j$ if the variable is read and/or written at both states
- $T_{ij}$ is the total number of bits of all variables shared by states $s_i$ and $s_j$
- $E_{ij}$ is 1 if there is a transition between states $s_i$ and $s_j$; otherwise it is 0.
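The notation above can be made concrete with a small example. The FSMD below is hypothetical (four states, three variables), and the sketch assumes a transition in either direction sets the edge indicator to 1; the slides do not say whether direction matters.

```python
from itertools import combinations

# Hypothetical toy FSMD: variables read or written in each state.
state_vars = {
    "s0": {"a", "b"},
    "s1": {"b", "c"},
    "s2": {"c"},
    "s3": {"a"},
}
var_bits = {"a": 8, "b": 16, "c": 4}  # bit width of each variable
transitions = {("s0", "s1"), ("s1", "s2"), ("s2", "s0"), ("s2", "s3")}


def shared_bits(state_vars, var_bits):
    """T[(si, sj)]: total bits of the variables used in both states."""
    table = {}
    for si, sj in combinations(sorted(state_vars), 2):
        common = state_vars[si] & state_vars[sj]
        table[(si, sj)] = sum(var_bits[v] for v in common)
    return table


def edge_indicator(states, transitions):
    """E[(si, sj)]: 1 if a transition links the states (either direction)."""
    table = {}
    for si, sj in combinations(sorted(states), 2):
        linked = (si, sj) in transitions or (sj, si) in transitions
        table[(si, sj)] = 1 if linked else 0
    return table


T = shared_bits(state_vars, var_bits)
E = edge_indicator(state_vars, transitions)
for pair in sorted(T):
    print(pair, "T =", T[pair], "E =", E[pair])
```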
13 ILP formulation
- Minimizes the number of bits that are shared between the partitions and the number of times that control could transfer between the partitions
- $s_{ij}$ is 1 if both states $s_i$ and $s_j$ are in the same partition; otherwise it is 0.
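For very small machines, the objective described above can be explored by exhaustive search instead of an ILP solver. The sketch below is not the formulation from the slides: it assumes the cost is the sum of $T_{ij}$ plus a weighted $E_{ij}$ over state pairs that land in different partitions, and it only excludes the trivial split with an empty partition. The `edge_weight` parameter and the matrices are hypothetical.

```python
from itertools import product


def cut_cost(assignment, T, E, edge_weight=1.0):
    """Bits and edges crossing the cut; s_ij = 1 when i and j share a partition."""
    n = len(assignment)
    cost = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            s_ij = 1 if assignment[i] == assignment[j] else 0
            cost += (T[i][j] + edge_weight * E[i][j]) * (1 - s_ij)
    return cost


def best_bipartition(T, E, edge_weight=1.0):
    """Try every two-way split; state 0 is pinned to partition 0 to avoid mirrors."""
    n = len(T)
    best = None
    for tail in product((0, 1), repeat=n - 1):
        assignment = (0,) + tail
        if sum(assignment) == 0:
            continue  # every state in one partition: not a real split
        cost = cut_cost(assignment, T, E, edge_weight)
        if best is None or cost < best[0]:
            best = (cost, assignment)
    return best


# Hypothetical symmetric matrices for a four-state FSMD.
T = [[0, 16, 0, 8], [16, 0, 4, 0], [0, 4, 0, 0], [8, 0, 0, 0]]
E = [[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]]

print(best_bipartition(T, E))
```

The search visits 2^(n-1) assignments, so it is useful only as a reference for checking a solver on toy inputs.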
14 ILP formulation - complete
15 Simulated Annealing formulation
- $x_i$ is $-1$ if state $s_i$ is in the left partition, and it is 1 if $s_i$ is in the right partition
- These quantities count the number of variable bits and transition edges shared between the two partitions
16 Simulated Annealing formulation
- simplification steps
- Observe that the sum of all $T_{ij}$ is constant (the total number of variable-bits)
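The expressions on these slides were lost in extraction. One plausible reconstruction, offered as an assumption and not as the authors' exact formulas, uses the fact that with $x_i \in \{-1, +1\}$ the term $(1 - x_i x_j)/2$ equals 1 exactly when $s_i$ and $s_j$ are in different partitions. The names $B$ and $C$ are hypothetical.

```latex
\begin{align*}
B(x) &= \sum_{i<j} T_{ij}\,\frac{1 - x_i x_j}{2}
      = \frac{1}{2}\sum_{i<j} T_{ij} - \frac{1}{2}\sum_{i<j} T_{ij}\,x_i x_j,\\
C(x) &= \sum_{i<j} E_{ij}\,\frac{1 - x_i x_j}{2}
      = \frac{1}{2}\sum_{i<j} E_{ij} - \frac{1}{2}\sum_{i<j} E_{ij}\,x_i x_j.
\end{align*}
```

Because the first sum in each line does not depend on $x$, minimizing $B(x) + C(x)$ under this reading is equivalent to minimizing $-\sum_{i<j} (T_{ij} + E_{ij})\,x_i x_j$.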
17 Simulated Annealing formulation
- Minimizes both the shared bits and the transition
edges.
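A minimal simulated-annealing sketch for a two-way split is given below. It is not the authors' MATLAB implementation: the cost function, the geometric cooling schedule, the single-state flip move, the `edge_weight` parameter, and the rule that rejects an empty partition are all assumptions made for illustration.

```python
import math
import random


def cost(x, T, E, edge_weight=1.0):
    """Shared bits plus weighted edges across the cut, with x_i in {-1, +1}."""
    n = len(x)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += (T[i][j] + edge_weight * E[i][j]) * (1 - x[i] * x[j]) / 2
    return total


def anneal(T, E, steps=2000, t_start=10.0, t_end=0.01, seed=0):
    """Return (best cost, best assignment) found by flipping one state at a time."""
    rng = random.Random(seed)
    n = len(T)
    x = [-1 if i % 2 == 0 else 1 for i in range(n)]  # alternating start
    current = cost(x, T, E)
    best, best_x = current, list(x)
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        i = rng.randrange(n)
        x[i] = -x[i]
        if abs(sum(x)) == n:  # flip would empty one partition: undo it
            x[i] = -x[i]
            continue
        candidate = cost(x, T, E)
        delta = candidate - current
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current = candidate
            if current < best:
                best, best_x = current, list(x)
        else:
            x[i] = -x[i]  # reject the move
    return best, best_x


# Hypothetical symmetric matrices for a four-state FSMD.
T = [[0, 16, 0, 8], [16, 0, 4, 0], [0, 4, 0, 0], [8, 0, 0, 0]]
E = [[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]]

best, best_x = anneal(T, E)
print(best, best_x)
```

Tracking the best assignment seen, and not only the final one, guards against the search ending on an uphill move.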
18 Evaluation
- Implemented four integer algorithms
- 8-bit counter
- 5/3 wavelet transform using lifting
- Multiplierless approximation to the eight-point Discrete Cosine Transform (DCT)
- Integer transform from the H.264 standard
- Used CoDeL to implement the designs.
- Trace data were obtained from simulations using Synopsys
- The ILP model was solved using the CPLEX solver included in the AIMMS modeling environment
- Simulated annealing was run in MATLAB
19 Evaluation contd
- Power savings were estimated (no partitioned design implementation yet)
- The static power savings depend on the size of the sequential logic and the portion of time spent in each partition.
- The dynamic power savings depend on the number of bits that are not clocked while a partition is powered down, moderated by the overhead due to data communication when the active partition changes.
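The dependence of static savings on partition size and residency time can be illustrated with a simple model. This is an assumption, not the estimate used in the evaluation: it supposes that static power scales with the number of register bits, that exactly one partition is powered at a time, and that gating is ideal. The function name and inputs are hypothetical.

```python
def static_saving_fraction(partition_bits, time_fraction):
    """Fraction of static power saved when only the active partition is powered.

    partition_bits[p] is the number of register bits in partition p;
    time_fraction[p] is the share of execution time spent in partition p.
    """
    total_bits = sum(partition_bits)
    saving = 0.0
    for bits, share in zip(partition_bits, time_fraction):
        gated_bits = total_bits - bits  # bits powered down while p is active
        saving += share * gated_bits / total_bits
    return saving


# Two equal partitions with equal residency.
print(static_saving_fraction([32, 32], [0.5, 0.5]))
# An unbalanced split where most of the time is spent in the larger partition.
print(static_saving_fraction([48, 16], [0.9, 0.1]))
```

In this model, savings shrink when execution concentrates in the larger partition, which matches the qualitative dependence stated above.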
20 Evaluation (Static Power Savings)
21 Evaluation (Dynamic Power Savings)
22 Results (ILP)
23 Results (Simulated Annealing)
24 Discussion
- Results show that partitioning the control and datapaths could potentially save up to 50% of power (static power)
- Some circuits could not be partitioned (the DWT includes one tight loop where it spends more than 90% of the time)
- Simulated annealing and ILP (for the partitioned circuits) give identical results.
- Simulated annealing is much faster.
25 Future
- Extend methodology to more than 2 partitions
- Implement the partitioned FSMD machines and confirm the realized power savings
- Lower energy!