Title: Low Power Hardware Synthesis from Concurrent Action Oriented Specifications (CAOS)
1. Low Power Hardware Synthesis from Concurrent Action Oriented Specifications (CAOS)
- Sandeep K. Shukla
- Gaurav Singh
- FERMAT Lab, Virginia Tech.
2. Outline
- CAOS Scheduling Problem
  - Complexity Analysis
- Peak Power Problem
  - Complexity Analysis
  - Technique: Rescheduling (suppressing actions)
- Dynamic Power Problem
  - Complexity Analysis
  - Techniques: Rescheduling, Operand Isolation, Clock Gating, Gated Guards
3. CAOS SCHEDULING PROBLEM (Complexity Analysis)
4. SCHEDULING PROBLEMS WITHOUT A PEAK POWER CONSTRAINT
- Maximum Non-conflicting Subset of Actions (MNS)
  - Choosing actions which can execute in a clock cycle.
- Minimum Length Schedule Construction (MLS)
  - Distributing actions over multiple clock cycles.
5. MAXIMUM NON-CONFLICTING SUBSET OF ACTIONS (MNS)
- Instance: a set $A = \{a_1, a_2, \ldots, a_n\}$ of enabled actions; a collection $C$ of pairs of actions, where $\{a_i, a_j\} \in C$ means that actions $a_i$ and $a_j$ conflict; an integer $K \le n$.
- Question: Is there a subset $A' \subseteq A$ such that $|A'| \ge K$ and no pair of actions in $A'$ conflict?
- The MNS problem is NP-complete.
- It corresponds to the Maximum Independent Set (MIS) problem.
6. MAXIMUM NON-CONFLICTING SUBSET OF ACTIONS (MNS)
- NOTE: For any $\rho \ge 1$, a $\rho$-approximation algorithm for a combinatorial optimization problem is a heuristic that produces a solution whose value is within a factor $\rho$ of the optimal solution value.
- It is known that for any $\epsilon > 0$, there is no $O(n^{1-\epsilon})$-approximation algorithm for the MIS problem, unless P = NP.
- The same holds for the MNS problem.
7. MAXIMUM NON-CONFLICTING SUBSET OF ACTIONS (MNS)
- SOLUTION: Heuristics with good performance guarantees can be devised by exploiting the relationship between the MNS and MIS problems.
- SPECIAL CASES
  - Each action conflicts with at most $\Delta$ other actions, for some constant $\Delta$:
    - An approximation algorithm exists that provides a performance guarantee of $\Delta + 1$.
  - Planar graphs, near-planar graphs and unit disk graphs:
    - Efficient approximation algorithms are known for such classes of graphs.
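The relationship to MIS can be illustrated with a small sketch. The code below is a common minimum-degree greedy heuristic for independent sets, applied to a conflict graph; it is not taken from the slides, and the function name `greedy_non_conflicting` and the action names are hypothetical.

```python
def greedy_non_conflicting(actions, conflicts):
    """Greedy heuristic: repeatedly pick the action with the fewest
    remaining conflicts, then discard everything it conflicts with.
    Returns a non-conflicting subset (not necessarily a maximum one)."""
    neighbours = {a: set() for a in actions}
    for a, b in conflicts:
        neighbours[a].add(b)
        neighbours[b].add(a)

    remaining = set(actions)
    chosen = []
    while remaining:
        # Fewest remaining conflicts first; ties broken by name for determinism.
        best = min(remaining, key=lambda a: (len(neighbours[a] & remaining), a))
        chosen.append(best)
        remaining -= neighbours[best] | {best}
    return chosen


# Hypothetical conflict graph: a1 - a2 - a3 - a4 (a path).
subset = greedy_non_conflicting(
    ["a1", "a2", "a3", "a4"],
    [("a1", "a2"), ("a2", "a3"), ("a3", "a4")],
)
print(subset)
```

Any subset produced this way is maximal (no further action can be added without a conflict), which is what makes degree-bounded guarantees possible.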
8. MINIMUM LENGTH SCHEDULE CONSTRUCTION (MLS)
- Instance: a set $A = \{a_1, a_2, \ldots, a_n\}$ of actions; a collection $C$ of pairs of actions, where $\{a_i, a_j\} \in C$ means that actions $a_i$ and $a_j$ conflict; an integer $t \le n$.
- Question: Is there a partition of $A$ into $r$ subsets $A_1, A_2, \ldots, A_r$, for some $r \le t$, such that for each $i$, $1 \le i \le r$, the actions in $A_i$ are pairwise non-conflicting?
- The MLS problem is NP-complete.
- It corresponds to the Minimum K-coloring (MINCOLOR) problem.
9. MINIMUM LENGTH SCHEDULE CONSTRUCTION (MLS)
- It is known that for any $\epsilon > 0$, there is no $O(n^{1-\epsilon})$-approximation algorithm for the MINCOLOR problem, unless P = NP.
- The same holds for the MLS problem.
10. MINIMUM LENGTH SCHEDULE CONSTRUCTION (MLS)
- SOLUTION: Heuristics for graph coloring can be used in constructing schedules of near-minimum length.
- SPECIAL CASES
  - The upper bound on the length of the schedule is two:
    - Corresponds to the problem of determining whether a graph is 2-colorable.
    - Efficient algorithms are known.
  - Each action conflicts with at most $\Delta$ other actions:
    - For such instances, a schedule of length at most $\Delta + 1$ can be constructed in polynomial time.
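The bounded-conflict special case can be sketched with the textbook greedy coloring procedure: each action is placed in the earliest clock cycle that holds none of its conflicting actions. Since an action has at most Δ conflicting neighbours, at most Δ cycles are ever blocked for it, so no more than Δ + 1 cycles are used. The function name `greedy_schedule` and the example actions are assumptions made for illustration.

```python
def greedy_schedule(actions, conflicts):
    """Assign each action to the earliest time slot that contains no
    conflicting action (greedy graph coloring). Uses at most
    (max number of conflicts per action) + 1 slots."""
    neighbours = {a: set() for a in actions}
    for a, b in conflicts:
        neighbours[a].add(b)
        neighbours[b].add(a)

    slot_of = {}
    for a in actions:
        blocked = {slot_of[b] for b in neighbours[a] if b in slot_of}
        slot = 0
        while slot in blocked:
            slot += 1
        slot_of[a] = slot

    length = max(slot_of.values(), default=-1) + 1
    schedule = [[] for _ in range(length)]
    for a in actions:
        schedule[slot_of[a]].append(a)
    return schedule


# Hypothetical instance: a, b, c conflict pairwise; d conflicts only with c.
print(greedy_schedule(
    ["a", "b", "c", "d"],
    [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")],
))
```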
11. PEAK POWER PROBLEM (Complexity Analysis)
12. SCHEDULING PROBLEMS INVOLVING A POWER CONSTRAINT
- Single Clock Cycle
  - Maximum Number of Actions in a Time Slot Subject to Peak Power Constraint (MNA-PP).
  - Maximizing Utility Subject to Peak Power Constraint (MU-PP).
13. Maximum Number of Actions in a Time Slot Subject to Peak Power Constraint (MNA-PP)
- Instance
  - a set $A = \{a_1, a_2, \ldots, a_n\}$ of non-conflicting actions,
  - for each action $a_i$, the power $p_i$ needed to execute that action,
  - a positive number $P$ representing the peak power constraint.
- Requirement: Find a subset $A' \subseteq A$ such that
  - the total power needed to execute the actions in $A'$ is at most $P$, and
  - $|A'|$ is a maximum over all subsets of $A$ that satisfy the peak power constraint.
- Optimal Solution
  - Sort the actions in $A$ into non-decreasing order by the amount of power.
  - Keep adding actions in that order as long as the peak power constraint is satisfied.
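The optimal greedy procedure above can be sketched as follows. The function name `max_actions_under_peak` and the power values are hypothetical and chosen only for illustration.

```python
def max_actions_under_peak(power, peak):
    """power: dict mapping action name -> power needed to execute it.
    Returns the actions selected by adding them in non-decreasing order
    of power until the next one would exceed the peak power bound."""
    chosen = []
    total = 0
    # Sort by power, breaking ties by name so the result is deterministic.
    for name, p in sorted(power.items(), key=lambda kv: (kv[1], kv[0])):
        if total + p > peak:
            break  # every later action needs at least as much power
        chosen.append(name)
        total += p
    return chosen


print(max_actions_under_peak({"a1": 5, "a2": 2, "a3": 4, "a4": 1}, peak=7))
```

Stopping at the first action that does not fit is safe because all remaining actions need at least as much power.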
14. Maximizing Utility Subject to Peak Power Constraint (MU-PP)
- Instance
  - a set $A = \{a_1, a_2, \ldots, a_n\}$ of non-conflicting actions,
  - for each action $a_i$, the power $p_i$ it consumes and its utility $u_i$,
  - a positive number $P$ representing the peak power,
  - a positive number $G$ representing the required utility.
- Question: Is there a subset $A' \subseteq A$ such that the total power needed to execute all the actions in $A'$ is at most $P$ and the utility of $A'$ is at least $G$?
- The MU-PP problem is NP-complete.
- It corresponds to the KNAPSACK problem.
15. Maximizing Utility Subject to Peak Power Constraint (MU-PP)
- Any approximation algorithm for the KNAPSACK problem can be used as an approximation algorithm with the same performance guarantee for the optimization version of MU-PP.
- When the weights and profits are integers, there is a polynomial time approximation scheme (PTAS) for the KNAPSACK problem.
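The correspondence with KNAPSACK can be made concrete with the textbook dynamic program for 0/1 knapsack, with power playing the role of weight and utility the role of profit. Note that this is an exact pseudo-polynomial algorithm for integer powers, not the approximation scheme mentioned above; the function name `max_utility_under_peak` and the sample values are assumptions.

```python
def max_utility_under_peak(actions, peak):
    """actions: list of (name, power, utility) with positive integer powers.
    peak: integer peak power bound.
    Returns (best_utility, chosen_names) via the 0/1 knapsack DP."""
    # best[c] = (utility, names) achievable with total power at most c.
    best = [(0, [])] * (peak + 1)
    for name, p, u in actions:
        # Walk capacities downwards so each action is used at most once.
        for cap in range(peak, p - 1, -1):
            candidate = best[cap - p][0] + u
            if candidate > best[cap][0]:
                best[cap] = (candidate, best[cap - p][1] + [name])
    return best[peak]


print(max_utility_under_peak(
    [("a1", 3, 4), ("a2", 4, 5), ("a3", 2, 3)], peak=5))
```

The decision version of MU-PP then asks whether the returned utility is at least G.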
16. SCHEDULING PROBLEMS INVOLVING A POWER CONSTRAINT
- Multiple Clock Cycles
  - Minimizing Makespan Subject to Peak Power Constraint (MM-PP).
  - Minimizing Peak Power Subject to Makespan Constraint (MPP-M).
  - Minimizing Makespan and Peak Power, Decision Version (MPP-DECISION).
17. Minimizing Makespan Subject to Peak Power Constraint (MM-PP)
- Instance
  - a set $A = \{a_1, a_2, \ldots, a_n\}$ of non-conflicting actions,
  - for each action $a_i$, the power $p_i$ needed to execute that action,
  - a positive number $P$ representing the peak power.
- Requirement
  - Find a schedule of minimum length for the actions in $A$ such that the total power needed to execute the actions in each time slot is at most $P$.
18. Minimizing Peak Power Subject to a Makespan Constraint (MPP-M)
- Instance
  - a set $A = \{a_1, a_2, \ldots, a_n\}$ of non-conflicting actions,
  - for each action $a_i$, the power $p_i$ needed to execute that action,
  - a positive number $L$ representing the makespan (number of slots used by a schedule).
- Requirement
  - Find a schedule of length at most $L$ for the actions in $A$ such that the maximum total power used in any time slot is a minimum over all schedules of length at most $L$.
- NOTE: MPP-M is the dual of MM-PP.
19. Minimizing Makespan and Peak Power (MPP-DECISION): Decision Version of MM-PP and MPP-M
- Instance
  - a set $A = \{a_1, a_2, \ldots, a_n\}$ of non-conflicting actions,
  - for each action $a_i$, the power $p_i$ needed to execute that action,
  - a positive number $P$ representing the peak power,
  - a positive number $L$ representing the makespan.
- Question
  - Is there a schedule of length at most $L$ for the actions in $A$ such that the total power used in any time slot is at most $P$?
- The MPP-DECISION problem is strongly NP-complete.
- It corresponds to the 3-PARTITION problem.
- There is no pseudo-polynomial algorithm for the MPP-DECISION problem, unless P = NP.
20. Approximation Algorithms for MM-PP
- Efficient approximation algorithms are possible by reducing the problem to the well-known BIN PACKING problem: actions are items, their power values are item sizes, and each time slot is a bin of capacity $P$.
- Example: a simple algorithm called First Fit Decreasing (FFD) provides an (asymptotic) performance guarantee of 11/9.
  - Sort the items in non-increasing order of their sizes and then assign each item to the first bin in which it will fit.
  - A sophisticated implementation reduces the running time to $O(n \log n)$.
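A minimal sketch of First Fit Decreasing applied to MM-PP, under the assumption that each time slot acts as a bin of capacity P. This is the straightforward quadratic-time version, not the O(n log n) implementation mentioned above; `first_fit_decreasing` and the sample powers are hypothetical.

```python
def first_fit_decreasing(power, peak):
    """power: dict mapping action name -> power; peak: per-slot power bound.
    Returns a list of time slots, each a list of action names."""
    slots = []  # each entry is [power_used, [action names]]
    # Largest power first; ties broken by name for determinism.
    for name, p in sorted(power.items(), key=lambda kv: (-kv[1], kv[0])):
        if p > peak:
            raise ValueError(f"{name} alone exceeds the peak power bound")
        for slot in slots:
            if slot[0] + p <= peak:  # first slot where the action fits
                slot[0] += p
                slot[1].append(name)
                break
        else:
            slots.append([p, [name]])  # open a new time slot
    return [names for _, names in slots]


print(first_fit_decreasing({"a": 5, "b": 4, "c": 3, "d": 2, "e": 2}, peak=8))
```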
21. Approximation Algorithms for MPP-M
- Efficient approximation algorithms are possible by reducing the problem to the classical multiprocessor scheduling problem.
- Example: a 4/3-approximation algorithm
  - Sort the actions in non-increasing order of their power requirements.
  - Assign each action to the time slot whose total power used is the smallest at that time.
  - Can be implemented to run in $O(n \log n)$ time.
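The rule above is the classical longest-processing-time-first heuristic from multiprocessor scheduling, with time slots in place of processors. A minimal sketch using a heap to find the least-loaded slot follows; `lpt_schedule` and the sample powers are hypothetical names and values.

```python
import heapq


def lpt_schedule(power, num_slots):
    """power: dict mapping action name -> power; num_slots: makespan L (>= 1).
    Returns (slots, peak) where slots[i] lists the actions in time slot i
    and peak is the largest total power used in any slot."""
    heap = [(0, i) for i in range(num_slots)]  # (power used so far, slot index)
    slots = [[] for _ in range(num_slots)]
    # Largest power first; ties broken by name for determinism.
    for name, p in sorted(power.items(), key=lambda kv: (-kv[1], kv[0])):
        used, i = heapq.heappop(heap)  # slot with the smallest total power
        slots[i].append(name)
        heapq.heappush(heap, (used + p, i))
    peak = max(used for used, _ in heap)
    return slots, peak


print(lpt_schedule({"a": 5, "b": 4, "c": 3, "d": 2, "e": 2}, num_slots=2))
```

On this instance the heuristic reaches a peak of 9 while the best possible is 8 (slots {a, c} and {b, d, e}), which stays within the 4/3 guarantee.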
22. LOW PEAK POWER TECHNIQUE
- Re-scheduling: Suppress some actions in each cycle to reduce the peak power of the design.
- Possible Ways
  - Conflict-based
    - Add extra conflicts for the sake of peak power.
  - Memory-based
    - Use memory to select how many actions to execute in each cycle.
23. MEMORY-BASED LOW PEAK POWER TECHNIQUE
- ALGORITHM
  - Arrange actions based on their TRS ordering.
  - Find the possible combinations of non-conflicting actions which can violate the peak power constraint when executed concurrently.
  - For each violating combination:
    - find a satisfying combination by suppressing some actions,
    - give priority to actions which come earlier in the TRS ordering,
    - store the satisfying combination in a memory.
- In hardware, the memory is used to execute the appropriate actions in each clock cycle in order to satisfy the peak power constraint.
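One possible reading of the algorithm above, as a brute-force sketch: enumerate the non-conflicting combinations, and for each one that exceeds the peak power bound record the subset obtained by keeping actions in TRS order while they still fit. The function name `build_suppression_table`, the data layout, and the example values are assumptions; the slides do not specify how the compiler implements this step. The sketch also assumes every single action fits within the bound on its own.

```python
from itertools import combinations


def build_suppression_table(order, power, conflicts, peak):
    """order: action names in TRS order; power: name -> power;
    conflicts: pairs of conflicting actions; peak: peak power bound.
    Returns {violating combination: combination to execute instead}."""
    conflict_set = {frozenset(pair) for pair in conflicts}
    table = {}
    for size in range(2, len(order) + 1):
        # combinations() preserves the TRS order of `order`.
        for combo in combinations(order, size):
            if any(frozenset(pair) in conflict_set
                   for pair in combinations(combo, 2)):
                continue  # conflicting actions never execute together
            if sum(power[a] for a in combo) <= peak:
                continue  # combination already satisfies the constraint
            kept, total = [], 0
            for a in combo:  # earlier actions in TRS order get priority
                if total + power[a] <= peak:
                    kept.append(a)
                    total += power[a]
            table[combo] = tuple(kept)
    return table


print(build_suppression_table(
    ["r1", "r2", "r3"], {"r1": 4, "r2": 3, "r3": 2},
    conflicts=[("r1", "r2")], peak=5))
```

The enumeration is exponential in the number of actions, which is consistent with the memory-size limitation this technique has for designs with many actions.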
24. MEMORY-BASED LOW PEAK POWER TECHNIQUE
- Implemented in the Bluespec Compiler.
- Around 10% peak-power savings achieved for small designs such as a Vending Machine.
- Larger power savings may be possible for larger designs; experiments are ongoing.
25. MEMORY-BASED LOW PEAK POWER TECHNIQUE
- LIMITATIONS
  - Some designs written under the assumption that the maximum number of actions will execute in each clock cycle might not be able to use this technique.
  - It increases latency, so it is applicable mostly to latency-insensitive designs.
  - Designs with a large number of actions may result in a big memory.
26. DYNAMIC POWER PROBLEM (Complexity Analysis)
27. DYNAMIC POWER PROBLEM (DPP)
- Instance
  - a set $A = \{a_1, a_2, \ldots, a_n\}$ of actions,
  - a positive integer $P$ representing the dynamic power consumed.
- Requirement
  - Select the ordering of execution of the actions in $A$ such that $P$ is minimized.
- DPP is NP-complete.
- It corresponds to the Traveling Salesman Problem, which is a sub-problem of DPP.
28. LOW DYNAMIC POWER TECHNIQUES
- Re-scheduling.
- Operand Isolation.
- Clock Gating.
- Gated Guards.
29. RE-SCHEDULING
- Actions can be re-scheduled such that switching at the inputs of the functional units is minimized.
- Resource sharing: Conflicts can be created such that the same functional units can be shared among actions consisting of the same operations on the same operands.
30. OPERAND ISOLATION
- Operand Isolation
  - Computation corresponding to the body of an action is allowed only when its output is used in the present clock cycle.
- Involves
  - Insertion of gates at the appropriate points without affecting guards.
  - Selection of the activation signal.
    - Guards of actions are used as gating signals.
- The algorithm implemented in the Bluespec Compiler saved up to 25% dynamic power.
31. OPERAND ISOLATION: SINGLE ACTION
- Computations stay quiescent except when the action executes, i.e. when its guard is True.
[Figure: circuit for an example action foo with guard (x < y). Labels: current state (x, y, z), cond logic, enable signals, body logic (F2), next-state values, next state, state register (D, Q, EN).]
32. OPERAND ISOLATION: MULTIPLE ACTIONS
[Figure: Rule1 ... RuleN, each with its condition logic (Cond1 ... CondN) and action logic (Action1 ... ActionN, F2 ... FN); Scheduler; Rule Control; DataSelect; State.]
- Isolating multiple actions of a design.
33. REGISTER CLOCK GATING
- Register clock gating
  - Registers having a common ENABLE signal can be provided the same gated clock.
- CAOS: Registers being updated by the same set of actions can be passed the same gated clock.
- The algorithm implemented in the Bluespec Compiler saved up to 45% dynamic power.
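The grouping rule for CAOS can be illustrated with a short sketch: registers are grouped by the exact set of actions that update them, and each group can then share one gated clock. The function name `group_registers_for_gating` and the action/register names are hypothetical; the slides do not describe the compiler's data structures.

```python
from collections import defaultdict


def group_registers_for_gating(writes):
    """writes: dict mapping action name -> registers that action updates.
    Returns {frozenset of writing actions: [registers sharing a gated clock]}."""
    writers = defaultdict(set)
    for action, registers in writes.items():
        for reg in registers:
            writers[reg].add(action)

    groups = defaultdict(list)
    for reg in sorted(writers):
        # Registers written by exactly the same actions share an enable.
        groups[frozenset(writers[reg])].append(reg)
    return dict(groups)


groups = group_registers_for_gating({
    "r1": ["a", "b"],
    "r2": ["a", "b", "c"],
    "r3": ["d"],
})
for actions, regs in groups.items():
    print(sorted(actions), "->", regs)
```

In this example registers `a` and `b` are both written by `r1` and `r2` only, so a single gated clock enabled by those actions firing could serve both.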
34. REGISTER CLOCK GATING
[Figure: a register with inputs DIN and EN and output QOUT, clocked by GATED_CLK; GATED_CLK is derived from CLK and EN.]
- In CAOS, the guards of the actions provide the control for gating the clocks of the registers.
35. GATED GUARDS
- In hardware, only the required guards should be computed in each clock cycle, for the sake of power.
- Static analysis can be done to figure out which guards should be computed.
36. Gated Guards
- Rule 1: (x > y) && (y != 0) --> (x = y, y = x)
- Rule 2: (x <= y) && (y != 0) --> (y = y - x)
- Rule 3: (y == 0) --> (result = x)
- Let P = (x > y), Q = (y == 0)
- Then g1 = P && !Q
- g2 = !P && !Q
- g3 = Q
- ------------------------------------------
- g1 && g2 = false
- g1 && g3 = false
- g2 && g3 = false
37. Gated Guards
- What else can we infer?
- From the guard and the action of Rule 1: (x > y), (y != 0), (x = y), (y = x)
- ------------------------------------------------------
- ((x <= y) && (y != 0)) OR (y == 0)
- So after Rule 1 executes, we know for sure that G1 cannot be true, but G2 or G3 may be true; hence G1 need not be evaluated. Also prioritize G3.
38. Gated Guard
- Gcd (70, 42)
- x = 70, y = 42 --> Rule 1
- x = 42, y = 70 --> Rule 2
- x = 42, y = 28 --> Rule 1
- x = 28, y = 42 --> Rule 2
- x = 28, y = 14 --> Rule 1
- x = 14, y = 28 --> Rule 2
- x = 14, y = 14 --> Rule 2
- x = 14, y = 0 --> Rule 3
- result = 14
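The trace above can be reproduced with a small software model of the three GCD rules, firing whichever rule's guard holds in each step. This is only a simulation sketch (the function name `gcd_trace` is hypothetical), and it assumes a positive starting value of x so that the rules terminate.

```python
def gcd_trace(x, y):
    """Fire the three GCD rules until Rule 3 produces the result.
    Returns (result, trace), where trace lists (x, y, rule fired)."""
    if x <= 0 or y < 0:
        raise ValueError("sketch assumes x > 0 and y >= 0")
    trace = []
    while True:
        if x > y and y != 0:          # guard of Rule 1
            trace.append((x, y, "Rule 1"))
            x, y = y, x               # swap
        elif x <= y and y != 0:       # guard of Rule 2
            trace.append((x, y, "Rule 2"))
            y = y - x                 # subtract
        else:                         # guard of Rule 3: y == 0
            trace.append((x, y, "Rule 3"))
            return x, trace


result, trace = gcd_trace(70, 42)
for x, y, rule in trace:
    print(f"x = {x}, y = {y} --> {rule}")
print("result =", result)
```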
39. Gated Guard
- Use a F/F that gets the value 1 when Rule 1 is fired, and becomes 0 when other rules are fired.
- If this F/F holds the value 1, evaluate only G3 and then G2.
- Unless Rule 1 is fired, this F/F stays at 0, and hence can be clock gated most of the time.
- This example may not be very useful, as the guards are simple to evaluate, but guard calculus on complex guards can lead to savings.
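The effect of the flip-flop can be estimated with a software model that counts how often G1 actually has to be evaluated. The sketch below is an illustration under assumptions: `gcd_with_gated_guard` is a hypothetical name, G3 is checked first in every cycle, and G1 is skipped in any cycle that directly follows a firing of Rule 1.

```python
def gcd_with_gated_guard(x, y):
    """Simulate the GCD rules with a flip-flop that records whether
    Rule 1 fired in the previous cycle. Assumes x > 0 and y >= 0.
    Returns (result, number of G1 evaluations, number of cycles)."""
    after_rule1 = False  # models the F/F: 1 right after Rule 1 fires
    g1_evals = 0
    cycles = 0
    while True:
        cycles += 1
        if y == 0:                    # G3 is evaluated first (prioritized)
            return x, g1_evals, cycles
        if after_rule1:
            fire_rule1 = False        # G1 is known to be false, so skip it
        else:
            g1_evals += 1
            fire_rule1 = x > y        # G1, given that y != 0
        if fire_rule1:
            x, y = y, x               # Rule 1
        else:
            y = y - x                 # Rule 2
        after_rule1 = fire_rule1


print(gcd_with_gated_guard(70, 42))
```

For Gcd(70, 42) the model evaluates G1 in 4 of the 8 cycles, since every cycle that follows Rule 1 can skip it.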
40. GATED GUARDS
- Theorem proving techniques can be used for the deductions.
- Such analysis can be done for more complicated designs.
- A memory in hardware can be used to store the information about which guards need not be computed in the present clock cycle.
41. Thank You!