Title: Low Power Memory Partitioning
1. Low Power Memory Partitioning
2. Topics
- Memory synthesis for embedded systems: the flow from logical design to physical design
- Focus on a low-power solution: a new memory (RAM) organization, based on the memory access profile of the application, will be described
3. Embedded Systems
- A combination of computer hardware and software designed to perform a dedicated function
- Properties
  - Single purpose
  - Repetitive
  - Limited resources
- Outcome: predictable behavior
4. System-on-a-Chip
- A System-on-a-Chip (SoC) is a single-chip solution that implements most of the functions of an embedded system product
- Placing memory and CPU on the same chip reduces energy consumption (shorter wires)
- On-chip memory occupies a large area of the chip. Today such memories contain up to 128Kb. Infrequently accessed addresses reside in an external (off-chip) memory bank.
5. Why Reduce Energy Consumption
- Problems
  - Heat: bad for durability, requires fan cooling
  - Short battery life
- Important in portable embedded products
- Tradeoff: willing to sacrifice area and performance for reduced energy consumption
6. Solution
- Memory has 2 operation modes
  - Active
  - Standby (sleep mode): information is retained but cannot be accessed
- Standby mode consumes much less energy
- SRAM is used because it doesn't need to be refreshed
7. Solution (cont.)
- The memory is split into several partitions
- At any time, only one partition is active, while the others are in standby
- Mode changes are driven by the R/W requests
- The saving comes from paying the energy cost only for the active partition, instead of for the whole memory
8. Solution (cont.)
- Solution steps
  - Analyze the memory access pattern of the application
  - Invoke the algorithm that splits the memory into partitions that provide minimal energy consumption
  - Create memory partitions based on the algorithm output
  - Create the control unit: the decoder
  - Place and route
9. Partitioning Example
- A highly non-uniform access profile with clustered high-access addresses leads to better energy savings
- A uniform access profile, or scattered high-access addresses, leads to minimal energy savings
10. The Algorithm
11. The Algorithm: Purpose
- Get a memory cut set, representing the partition sizes, that provides minimal energy consumption
- The cut set has up to MaxP elements
- Each cut is a contiguous block of memory
- The sum of the cut lengths is M, the number of distinct addresses in the data memory to be partitioned
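A cut set with these properties can be represented as a list of inclusive (lo, hi) address bounds. The sketch below checks the three stated properties; the function name `is_valid_cut_set` and the tuple representation are assumptions made for illustration, not something taken from the slides.

```python
def is_valid_cut_set(cuts, M, max_p):
    """Check that `cuts` is a list of inclusive (lo, hi) address ranges that
    has at most max_p elements, is contiguous, and covers addresses 0..M-1."""
    if not 1 <= len(cuts) <= max_p:
        return False
    expected_lo = 0
    for lo, hi in cuts:
        # Each cut must start right after the previous one and be non-empty.
        if lo != expected_lo or hi < lo:
            return False
        expected_lo = hi + 1
    # The cut lengths sum to M exactly when the last cut ends at M - 1.
    return expected_lo == M


print(is_valid_cut_set([(0, 148), (149, 255)], M=256, max_p=4))
```

A gap between two cuts, a cut set that stops short of M - 1, or more than MaxP cuts all make the check fail.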
12. The Input: Overhead Vector
- The overhead vector has MaxP elements, indexed 0 to MaxP-1
- Element i is the energy overhead of moving from i to i+1 partitions (element 0 corresponds to the monolithic memory)
- This energy overhead is consumed by the control unit and the additional wires, and is added to the cost of each memory access, regardless of the partition size
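If element i is read as the extra overhead of going from i to i+1 partitions, the per-access overhead of a k-partition memory is the running sum of the first k elements. The sketch below shows that reading; the name `cumulative_overhead` and the numeric values are hypothetical placeholders.

```python
from itertools import accumulate


def cumulative_overhead(overhead):
    """Return a list whose entry k-1 is the per-access overhead of a
    k-partition memory, i.e. overhead[0] + ... + overhead[k-1]."""
    return list(accumulate(overhead))


# Placeholder values in arbitrary energy units (not from the slides).
overhead = [0.0, 2.0, 1.5, 1.0]
per_access = cumulative_overhead(overhead)
# per_access[0] is the monolithic memory; per_access[3] is 4 partitions.
print(per_access)
```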
13. The Input: Overhead Vector (cont.)
- The partitioning algorithm must compensate for this overhead and still deliver an energy consumption improvement
- The overhead vector is a conservative estimate that prevents marginal partitionings that would not improve, or would even impair, the energy consumption in real life
- Fine-tuning the overhead vector improves the energy savings
- The overhead depends mostly on the number of partitions, and is almost insensitive to the size of the partitions
14. The Input: Overhead Vector (cont.)
- 2 ways to acquire the overhead values
  - Create synthetic partitions (not based on the algorithm), create the control unit, then measure the overhead in the model
  - Choose a conservative overhead estimate, run the algorithm for some samples, create the partitions based on the algorithm's output, create the control unit, measure the overhead in the model, and iterate the process with the new overhead values until convergence is reached
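The second, iterative method can be sketched as a fixed-point loop. Everything named here is an assumption: `run_flow` stands for a hypothetical callable that runs the partitioning algorithm, builds the partitions and control unit, and returns the overhead measured on the resulting model, while `tol` and `max_iter` are arbitrary stopping parameters.

```python
def calibrate_overhead(initial, run_flow, tol=1e-3, max_iter=20):
    """Repeat: partition with the current estimate, measure, update.

    `run_flow(overhead)` is a hypothetical stand-in for the whole
    partition / build / measure step and returns a measured overhead vector.
    """
    overhead = list(initial)
    for _ in range(max_iter):
        measured = run_flow(overhead)
        # Converged when no element moved by more than tol.
        if max(abs(m - o) for m, o in zip(measured, overhead)) <= tol:
            return measured
        overhead = measured
    raise RuntimeError("overhead estimate did not converge")
```

Starting from a conservative (high) estimate, as the slide suggests, keeps early iterations from accepting partitionings that the measured overhead would not justify.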
15. The Input: Memory Access Analysis
- We need an analysis of the memory access pattern of the application
- It can be obtained because of the predictable behavior of embedded applications
- The analysis is stored in 2 vectors of size M
  - r(i) stores the number of reads from address i
  - w(i) stores the number of writes to address i
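As a minimal sketch, the two vectors can be filled from an access trace. The trace format used here, a sequence of (operation, address) pairs with "R" and "W" operations, is an assumption for illustration; a real profiler or simulator would supply its own format.

```python
def build_access_profile(trace, M):
    """Count reads and writes per address.

    `trace` is an iterable of (op, address) pairs with op in {"R", "W"};
    this trace format is an assumption made for illustration.
    """
    r = [0] * M
    w = [0] * M
    for op, addr in trace:
        if not 0 <= addr < M:
            raise ValueError(f"address {addr} outside 0..{M - 1}")
        if op == "R":
            r[addr] += 1
        elif op == "W":
            w[addr] += 1
        else:
            raise ValueError(f"unknown operation {op!r}")
    return r, w


trace = [("R", 0), ("W", 2), ("R", 0), ("R", 3), ("W", 2)]
r, w = build_access_profile(trace, M=4)
print(r, w)  # [2, 0, 0, 1] [0, 0, 2, 0]
```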
16. The Input: Memory Access Analysis
- The analysis can be obtained in 2 ways
  - Dynamic method: execution profiling with instruction-level simulators
  - Static method: data flow analysis profiles accesses on a graph representation of the program
- The static method, when performed on an accurate model, provides more accurate results
17. The Input: the Cost Function
- $$\mathrm{MemE}(lo, hi, w, r) = E_r(hi - lo) \sum_{i=lo}^{hi} r(i) + E_w(hi - lo) \sum_{i=lo}^{hi} w(i)$$
- $E_r(d)$: energy consumption for a read access in a memory of d words
- $E_w(d)$: energy consumption for a write access in a memory of d words
- MemE is the amount of energy consumed by the partition whose bounds are (lo, hi) and whose access profile is given by r() and w()
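The cost function translates directly into code. In the sketch below the per-access energy models are passed in as the extra parameters `e_r` and `e_w`, which is an assumption of this example; the two linear models at the bottom are hypothetical placeholders, not measured values.

```python
def mem_e(lo, hi, w, r, e_r, e_w):
    """Energy of the partition with inclusive bounds (lo, hi).

    e_r(d) and e_w(d) return the energy of one read / write in a memory of
    d words. They are called with hi - lo, following the formula as written.
    """
    d = hi - lo
    reads = sum(r[lo:hi + 1])
    writes = sum(w[lo:hi + 1])
    return e_r(d) * reads + e_w(d) * writes


# Hypothetical energy models, for illustration only.
def e_r(d):
    return 1 + d


def e_w(d):
    return 2 + d


r = [5, 0, 1, 3]
w = [1, 0, 0, 2]
print(mem_e(0, 1, w, r, e_r, e_w))  # (1 + 1) * 5 + (2 + 1) * 1 = 13
```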
18. The Algorithm
- $P_k$: the cut sets of k partitions, each of the form $\{(0, hi_0), (hi_0+1, hi_1), (hi_1+1, hi_2), \ldots, (hi_{k-2}+1, M-1)\}$
- MinConsump = energy in monolithic memory
- for k = 2 to MaxP
  - for each cut set of k partitions in $P_k$
    - sum = overhead for k partitions (elements 0 through k-1 of the overhead vector added together)
    - for each partition (lo, hi) in the cut set
      - sum += MemE(lo, hi, w, r)
    - if sum < MinConsump then
      - save the new minimal energy consumption
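The exhaustive search can be sketched in Python as follows. Two assumptions go beyond the pseudocode: the overhead values are taken as absolute per-access energies, so a k-partition memory adds their running sum to every access (this follows the earlier statement that the overhead is added to the cost of each memory access), and the energy models `e_r` and `e_w` are passed in as callables. The function name `best_partition` is illustrative.

```python
from itertools import combinations


def best_partition(r, w, overhead, e_r, e_w):
    """Exhaustively search cut sets of up to len(overhead) partitions.

    Returns (energy, cuts), where cuts is a list of inclusive (lo, hi) bounds.
    Assumption: overhead[i] is an absolute per-access energy, so a
    k-partition memory adds sum(overhead[:k]) to every access.
    `overhead` must be non-empty.
    """
    M = len(r)
    max_p = len(overhead)
    total_accesses = sum(r) + sum(w)

    def mem_e(lo, hi):
        return e_r(hi - lo) * sum(r[lo:hi + 1]) + e_w(hi - lo) * sum(w[lo:hi + 1])

    # Start from the monolithic memory (a single partition).
    best = (overhead[0] * total_accesses + mem_e(0, M - 1), [(0, M - 1)])
    for k in range(2, min(max_p, M) + 1):
        per_access = sum(overhead[:k])
        # Choosing k-1 upper bounds among 0..M-2 enumerates every k-way cut set.
        for his in combinations(range(M - 1), k - 1):
            los = (0,) + tuple(h + 1 for h in his)
            cuts = list(zip(los, his + (M - 1,)))
            energy = per_access * total_accesses + sum(
                mem_e(lo, hi) for lo, hi in cuts
            )
            if energy < best[0]:
                best = (energy, cuts)
    return best
```

With a clustered profile such as `r = [10, 10, 0, 0, 0, 0, 0, 0]` the search isolates the two hot addresses in a small partition. With a uniform profile and a large overhead it keeps the monolithic memory.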
19. Complexity
- Checking k-way partitioning takes $O(M^{k-1})$ operations
- Checking up to MaxP-way partitioning (the best solution out of 1, 2, ..., MaxP partitions) takes $O(1) + O(M) + O(M^2) + \cdots + O(M^{MaxP-1}) = O(M^{MaxP-1})$
- Therefore MaxP = 4 is a practical limitation
- Improvement: work with bigger basic units, meaning that each partition size must be a multiple of t, where t > 1
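To see the effect of the basic unit t, note that splitting M contiguous addresses into k non-empty partitions means choosing k-1 boundaries among M-1 positions, which gives C(M-1, k-1) cut sets. Restricting sizes to multiples of t shrinks this to C(M/t - 1, k-1). The helper below is an illustrative sketch; its name and the sample sizes are assumptions.

```python
from math import comb


def count_cut_sets(M, k, t=1):
    """Number of ways to split M words into k contiguous, non-empty
    partitions whose sizes are multiples of t (M must be a multiple of t)."""
    if M % t:
        raise ValueError("M must be a multiple of t")
    return comb(M // t - 1, k - 1)


print(count_cut_sets(256, 4))        # 2731135
print(count_cut_sets(256, 4, t=16))  # 455
```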
20. Memory Logic Generation
21. Decoder Generation
- The decoder has one input, the address from the CPU, and 2 outputs to the partitions: memory select and relative address
- Memory select: selects the active memory partition based on the input address. It also deactivates the partitions that are not currently needed
- Address translation: the input absolute address is converted to an address relative to the active partition
22. Decoder Generation (cont.)
- The decoder is placed between the CPU and the memory partitions in order to reduce wire length
- Careful design is needed so as not to increase system delay and thus degrade performance
- Example: 2 memory partitions, 0-148 and 149-255
  - Absolute address: 168
  - Relative address: 168 - 149 = 19
- The decoder logic is not straightforward: it is not made of simple logic gates, but of a series of comparators and subtractors
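The decoder's behavior can be modeled in software as a comparison step followed by a subtraction. This is only a behavioral sketch, not a hardware description; the function name `decode` and the list of partition start addresses are assumptions of this example.

```python
from bisect import bisect_right


def decode(address, partition_starts, memory_size):
    """Return (partition index, relative address) for an absolute address.

    partition_starts is the sorted list of first addresses of each partition,
    e.g. [0, 149] for the partitions 0-148 and 149-255.
    """
    if not 0 <= address < memory_size:
        raise ValueError("address out of range")
    # Memory select: the comparison step finds the partition holding the address.
    select = bisect_right(partition_starts, address) - 1
    # Address translation: subtract the base address of the selected partition.
    return select, address - partition_starts[select]


print(decode(168, [0, 149], 256))  # (1, 19)
```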
23. Decoder Generation (cont.)
24. Memory Physical Design
25. Memory Generation
- Memory generation tool: a tool that generates the physical layout of a memory component, based on the requested memory size and aspect ratio (the proportion between the height and width of the block)
- The algorithm produces the optimal memory cut set, in terms of energy consumption, based on execution profiling
- Since the algorithm can generate partitions of any size, but the memory generation tool is not that flexible, the sizes are matched to values the tool supports
26. Block Placement and Routing
- We have generated the physical layouts of the memory partitions and the decoder. Now we need to place and route them along with the CPU
27. Block Placement
- 2 placement methods are available
  - Automatic: placement of blocks with no insight into their functionality. We choose a placement method that aims to reduce wire length in the routing phase.
  - Manual: uses knowledge of the pin positions and block functionality to ease the routing phase. A bus channel arrangement is used. In the case of highly asymmetrical partitions, significant area is wasted.
- Since automatic placement gives better results, it is preferred
28. Routing Phase
- 2 methods for routing
  - Automatic: with wire length as the cost function
  - Manual
- Since automatic routing gives better results, it is preferred
- An automatic search-and-repair phase resolves geometric and delay violations
29. Experimental Results
30. Lower Bound for Energy Savings
- For each memory access we pay a fixed overhead plus the energy consumed by the memory partition, which is proportional to its size
- The overhead vector (0, 20%, 15%, 10%) is a conservative overhead estimate, whose values are relative to the energy consumption of the monolithic memory
- The simplest case is partitions of equal size, regardless of the memory access pattern
31. Lower Bound for Energy Savings
- Expected lower bound for energy savings: 31%
- These results are for the worst case, a uniform access profile. A highly non-uniform access profile with clustered high-access addresses will provide better results
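One way to reproduce a bound of this size is sketched below. It assumes the overhead values are fractions of the monolithic access energy (0, 0.20, 0.15, 0.10), that they accumulate as partitions are added, and that access energy is exactly proportional to partition size, so one access to one of k equal partitions costs 1/k of a monolithic access plus the cumulative overhead. The function name is illustrative, and this is a reading of the slide rather than the authors' stated calculation.

```python
def even_split_savings(overhead_fractions):
    """Savings of k equal partitions under a uniform access profile.

    Assumes access energy is proportional to partition size, so one access
    costs 1/k of the monolithic access energy plus the cumulative overhead.
    Returns a dict mapping k to the fractional energy saving.
    """
    savings = {}
    cumulative = 0.0
    for k, step in enumerate(overhead_fractions, start=1):
        cumulative += step
        savings[k] = 1.0 - (1.0 / k + cumulative)
    return savings


savings = even_split_savings([0.0, 0.20, 0.15, 0.10])
best_k = max(savings, key=savings.get)
print(best_k, round(savings[best_k], 3))
```

Under these assumptions the best even split uses 3 partitions and saves a little under 32%, in line with the lower bound quoted above; 2 and 4 partitions each save about 30%.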
32. Energy Savings vs. Area Penalty
Area penalty is the ratio between the sizes of
the partitioned memory and the monolithic memory
33. Energy Breakdown
34. Software Upgrade Problem
- The suggested solution gives an optimal result for a given application access profile
- A software upgrade will probably present a different access profile
- Applying the software upgrade to the existing chip will therefore result in non-optimal energy consumption
35. Suggestion for Improvement
- A highly non-uniform access profile that is scattered all over the memory cannot be optimized, because partitions must be contiguous memory blocks
- A compiler/linker suite that uses the access profile to group highly accessed memory locations together would solve this problem
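The grouping idea can be illustrated with a simple remapping that sorts addresses by total access count. This is a deliberate simplification: it assumes every address can be relocated independently, which a real compiler or linker could not do for arrays and other structured data. The function name `cluster_by_access` is hypothetical.

```python
def cluster_by_access(r, w):
    """Return (mapping, new_r, new_w), where mapping[old] = new address and
    the most accessed locations are placed first.
    Ties keep their original relative order."""
    order = sorted(range(len(r)), key=lambda a: -(r[a] + w[a]))
    mapping = {old: new for new, old in enumerate(order)}
    new_r = [r[old] for old in order]
    new_w = [w[old] for old in order]
    return mapping, new_r, new_w


mapping, new_r, new_w = cluster_by_access([1, 50, 0, 40], [0, 5, 0, 0])
print(mapping)  # {1: 0, 3: 1, 0: 2, 2: 3}
print(new_r)    # [50, 40, 1, 0]
```

After remapping, the hot addresses sit in one contiguous block, which is the profile shape that the partitioning algorithm exploits best.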
36. Conclusions
- The new memory component can achieve 35% energy savings over the monolithic memory
- This memory component will be about 2 times bigger than the monolithic memory
- With careful design, the memory component will not impair performance
37. What Have We Learned?
- The importance of reducing energy consumption in embedded systems
- We have seen a complete design flow of a memory component
- The algorithm for partitioning a memory in order to achieve minimal energy consumption
- The logic component that manages this new memory model was introduced
- The physical design considerations that were taken in the transition from the logical view to the physical view