Low Power Memory Partitioning - PowerPoint PPT Presentation

About This Presentation
Title:

Low Power Memory Partitioning

Provided by: Boaz

Transcript and Presenter's Notes


1
Low Power Memory Partitioning
  • Boaz Moskovich

2
Topics
  • Memory synthesis flow for embedded systems, from
    logical design to physical design
  • Focus on a low-power solution: a new memory (RAM)
    organization, based on the memory access profile
    of the application, will be described

3
Embedded Systems
  • A combination of computer hardware and software
    designed to perform a dedicated function
  • Properties: single purpose, repetitive, limited
    resources
  • Outcome: predictable behavior

4
System-on-a-Chip
  • System-on-a-Chip (SoC) is a single-chip solution
    that implements most of the functions of an
    embedded system product
  • Placing memory and CPU on the same chip reduces
    energy consumption (shorter wires)
  • On-chip memory occupies a large area of the
    chip. Today such memories contain up to 128 Kb.
    Infrequently accessed addresses reside in an
    external (off-chip) memory bank.

5
Why Reduce Energy Consumption?
  • Problems:
  • Heat: harms durability and requires cooling fans
  • Short battery life, which matters in portable
    embedded products
  • Tradeoff: we are willing to sacrifice area and
    performance for reduced energy consumption

6
Solution
  • Memory has 2 operation modes:
  • Active
  • Standby (sleep mode): data is retained but cannot
    be accessed
  • Standby mode consumes much less energy
  • SRAM is used because it doesn't need to be
    refreshed

7
Solution (cont.)
  • The memory will be split into several partitions
  • At any time only one partition will be active,
    while the others are in standby
  • Mode changes are driven by the read/write
    requests
  • The saving comes from paying the access energy
    only for the active partition instead of the
    whole memory
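A minimal Python sketch of where the saving comes from, assuming (as the experiments slide does) that the energy of one access is proportional to the size of the memory block it hits, and ignoring both standby energy and the decoder overhead. `access_energy`, `trace_energy` and the access trace are hypothetical illustrations, not part of the original flow.

```python
def access_energy(size_words: int, e_per_word: float = 1.0) -> float:
    # Assumed model: the energy of one access grows linearly with the
    # number of words in the memory (or partition) being accessed.
    return e_per_word * size_words


def trace_energy(trace, partitions):
    """Total energy of an address trace when only the partition holding
    each address is active. `partitions` is a list of inclusive (lo, hi)
    address ranges that together cover the memory."""
    total = 0.0
    for addr in trace:
        for lo, hi in partitions:
            if lo <= addr <= hi:
                total += access_energy(hi - lo + 1)
                break
        else:  # no partition contains the address
            raise ValueError(f"address {addr} is not mapped")
    return total


M = 256
trace = [3, 7, 7, 12, 200, 5, 9, 3]            # hypothetical access trace
mono = trace_energy(trace, [(0, M - 1)])       # 8 accesses x 256 words
split = trace_energy(trace, [(0, 15), (16, M - 1)])
print(mono, split)                             # 2048.0 352.0
```

Seven of the eight accesses hit the 16-word partition, so they are charged for 16 words instead of 256; only the single access to address 200 pays for the large partition.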

8
Solution (cont.)
  • Solution steps:
  • Analyze the memory access pattern of the
    application
  • Invoke the algorithm that splits the memory into
    the partitions that provide minimal energy
    consumption
  • Create the memory partitions based on the
    algorithm's output
  • Create the control unit (the decoder)
  • Place and route

9
Partitioning Example
  • A highly non-uniform access profile with clustered
    high-access addresses leads to better energy
    savings
  • A uniform access profile, or scattered high-access
    addresses, leads to minimal energy savings
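The effect of the access profile can be sketched by searching every 2-way split of three toy profiles. This assumes per-access energy proportional to partition size and a 2-partition overhead of 20% of the monolithic energy (the value used later in the experiments); the profiles and the function name are hypothetical.

```python
def best_two_way_saving(counts, delta1=0.20):
    """Best relative saving of a 2-way split over the monolithic memory.

    counts[i] is the number of accesses to address i. Assumptions: the
    energy of an access is proportional to the size of the partition it
    hits, and delta1 (overhead of going from 1 to 2 partitions) is a
    fraction of the monolithic energy."""
    M, total = len(counts), sum(counts)
    mono = total * M
    best = mono
    left = 0                                  # accesses to addresses [0, cut)
    for cut in range(1, M):
        left += counts[cut - 1]
        energy = left * cut + (total - left) * (M - cut) + delta1 * mono
        best = min(best, energy)
    return 1 - best / mono


M = 100
uniform = [1] * M
clustered = [50] * 10 + [1] * (M - 10)                    # hot block at the bottom
scattered = [50 if a % 10 == 0 else 1 for a in range(M)]  # hot words spread out
for name, prof in [("uniform", uniform), ("clustered", clustered),
                   ("scattered", scattered)]:
    print(name, round(best_two_way_saving(prof), 3))
```

With the same total number of hot accesses, the clustered profile lets a small partition absorb most of the traffic, while the scattered profile does no better than the uniform one.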

10
The Algorithm
11
The Algorithm: Purpose
  • Find a memory cut set, representing the partition
    sizes, that provides minimal energy consumption
  • The cut set has up to MaxP elements
  • Each cut is a contiguous block of memory
  • The cut lengths sum to M, the number of distinct
    addresses in the data memory to be partitioned
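A small sketch of the cut-set representation: a list of partition sizes is turned into contiguous inclusive address ranges, and the constraints above are checked. The helper name and `MAX_P = 4` are assumptions for illustration; the sizes reproduce the two-partition decoder example (0-148, 149-255).

```python
def cut_set_to_partitions(sizes):
    """Turn a cut set (list of partition sizes, in words) into inclusive
    (lo, hi) address ranges. Each partition is a contiguous block that
    starts right after the previous one."""
    partitions, lo = [], 0
    for size in sizes:
        if size <= 0:
            raise ValueError("partition sizes must be positive")
        partitions.append((lo, lo + size - 1))
        lo += size
    return partitions


MAX_P = 4                       # assumed value of MaxP
sizes = [149, 107]              # the two-partition decoder example
assert len(sizes) <= MAX_P      # the cut set has up to MaxP elements
parts = cut_set_to_partitions(sizes)
print(parts)                    # [(0, 148), (149, 255)]
print(sum(sizes))               # M = 256
```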

12
The Input: δ Vector
  • $\delta = (\delta_0, \delta_1, \delta_2, \ldots, \delta_{MaxP-1})$
  • δ_i: the energy overhead of moving from i to i+1
    partitions (δ_0 refers to the monolithic memory)
  • This overhead is consumed by the control unit and
    the additional wires. It is added to the cost of
    each memory access, regardless of the partition
    size
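Because δ_i is the increment from i to i+1 partitions, a k-way partitioning pays the running sum δ_0 + ... + δ_{k-1}. A minimal sketch, assuming δ is given in percent of the monolithic energy (as in the experiments, δ = 0, 20, 15, 10); the function name is hypothetical.

```python
from itertools import accumulate


def overhead_by_k(delta):
    """delta[i] = extra energy of moving from i to i + 1 partitions
    (delta[0] belongs to the monolithic memory). The overhead paid by a
    k-way partitioning is delta[0] + ... + delta[k - 1]."""
    return {k: total for k, total in enumerate(accumulate(delta), start=1)}


delta = [0, 20, 15, 10]          # percent of the monolithic energy
print(overhead_by_k(delta))      # {1: 0, 2: 20, 3: 35, 4: 45}
```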

13
The Input: δ Vector (cont.)
  • The partitioning algorithm must compensate for
    this overhead and still improve the energy
    consumption
  • δ is a conservative estimate that prevents
    marginal partitionings that would not improve, or
    would even worsen, the energy consumption in real
    life
  • Fine-tuning δ improves the energy savings
  • δ depends mostly on the number of partitions and
    is almost insensitive to the partition sizes

14
The Input: δ Vector (cont.)
  • 2 ways to acquire the δ values:
  • Create synthetic partitions (not based on the
    algorithm) → create the control unit → measure δ
    in the model
  • Choose a conservative δ estimate → run the
    algorithm on some samples → create the partitions
    based on the algorithm's output → create the
    control unit → measure δ in the model → iterate
    with the new δ until convergence is reached
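The second method is a fixed-point iteration. The sketch below shows only the loop structure: `build_and_measure` is a hypothetical callback standing in for the whole synthesis-and-measurement flow, and `toy_measure` is a made-up model used purely to exercise the loop. The tolerance and iteration limit are arbitrary.

```python
def estimate_delta(initial_delta, build_and_measure, tol=0.5, max_iter=20):
    """Iterate the delta estimate until it stops changing.

    build_and_measure is a hypothetical callback: run the partitioning
    algorithm with the current delta, create the partitions and the
    control unit, and return the delta values measured in the model
    (same length as the input)."""
    delta = list(initial_delta)
    for _ in range(max_iter):
        measured = list(build_and_measure(delta))
        if max(abs(m - d) for m, d in zip(measured, delta)) < tol:
            return measured
        delta = measured
    raise RuntimeError("delta estimate did not converge")


def toy_measure(delta):
    # Made-up stand-in for synthesis + measurement: each overhead
    # moves halfway toward 8 (percent) on every iteration.
    return [0] + [0.5 * d + 4 for d in delta[1:]]


print(estimate_delta([0, 20, 15, 10], toy_measure))
```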

15
The Input: Memory Access Analysis
  • We need an analysis of the application's memory
    access pattern
  • It can be obtained thanks to the predictable
    behavior of embedded applications
  • The analysis is stored in 2 vectors of size M:
    r(i) stores the number of reads from address i,
    and w(i) stores the number of writes to address i
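A minimal sketch of building r and w from a profiling trace. The trace format, a list of ('R' or 'W', address) pairs, is an assumption; real instruction-level simulators each have their own output format.

```python
def access_profile(trace, M):
    """Build the read/write vectors r and w of size M from a trace of
    (op, address) pairs, where op is 'R' or 'W' (assumed format)."""
    r = [0] * M
    w = [0] * M
    for op, addr in trace:
        if op == "R":
            r[addr] += 1
        elif op == "W":
            w[addr] += 1
        else:
            raise ValueError(f"unknown operation {op!r}")
    return r, w


trace = [("R", 0), ("W", 0), ("R", 3), ("R", 3), ("W", 2)]
r, w = access_profile(trace, M=4)
print(r, w)   # [1, 0, 0, 2] [1, 0, 1, 0]
```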

16
The Input: Memory Access Analysis (cont.)
  • The analysis can be obtained in 2 ways:
  • Dynamic method: execution profiling with
    instruction-level simulators
  • Static method: data-flow analysis that profiles
    accesses on a graph representation of the program
  • The static method, when performed on an accurate
    model, provides more accurate results

17
The Input: the Cost Function
  • $MemE(lo,hi,w,r) = E_r(hi-lo)\sum_{i=lo}^{hi} r(i) + E_w(hi-lo)\sum_{i=lo}^{hi} w(i)$
  • $E_r(d)$: energy consumed by a read access in a
    memory of d words
  • $E_w(d)$: energy consumed by a write access in a
    memory of d words
  • MemE is the energy consumed by the partition whose
    bounds are (lo, hi) and whose access profile is
    given by r() and w()
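A direct transcription of MemE, with E_r and E_w passed in as functions. The size argument hi - lo is used exactly as written in the formula; the linear energy models below are hypothetical placeholders, not measured data.

```python
from typing import Callable, Sequence


def mem_e(lo: int, hi: int, w: Sequence[int], r: Sequence[int],
          e_read: Callable[[int], float],
          e_write: Callable[[int], float]) -> float:
    """Energy of the partition covering addresses lo..hi (inclusive):
    Er(hi - lo) * sum(r[lo..hi]) + Ew(hi - lo) * sum(w[lo..hi])."""
    d = hi - lo
    return e_read(d) * sum(r[lo:hi + 1]) + e_write(d) * sum(w[lo:hi + 1])


def e_read(d: int) -> float:
    # Hypothetical read energy, proportional to the d + 1 words spanned.
    return 1.0 * (d + 1)


def e_write(d: int) -> float:
    # Hypothetical write energy, assumed twice the read energy.
    return 2.0 * (d + 1)


r = [1, 0, 0, 2]
w = [1, 0, 1, 0]
print(mem_e(0, 3, w, r, e_read, e_write))   # 4*3 + 8*2 = 28.0
print(mem_e(2, 3, w, r, e_read, e_write))   # 2*2 + 4*1 = 8.0
```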

18
The Algorithm
  • $P_k$: all cut sets of k partitions, each of the form
    $(0, hi_0), (hi_0+1, hi_1), (hi_1+1, hi_2), \ldots, (hi_{k-2}+1, M-1)$
  • MinConsump = energy consumption of the monolithic memory
  • for k = 2 to MaxP
      for each cut set of k partitions ∈ P_k
        sum = overhead for k partitions ($\delta_0 + \delta_1 + \cdots + \delta_{k-1}$)
        for each partition [lo, hi] in the cut set
          sum += MemE(lo, hi, w, r)
        if sum < MinConsump then
          save the new minimal energy consumption
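A runnable sketch of the exhaustive search above. Two assumptions are worth stating loudly: δ is taken in percent of the monolithic energy (as in the experiments), so the k-way overhead is (δ_0 + ... + δ_{k-1})/100 times the monolithic energy; and the energy model and profile are hypothetical. Each k-way cut set is enumerated by choosing its k - 1 inner partition ends.

```python
from itertools import combinations


def partition_energy(lo, hi, w, r, e_read, e_write):
    # MemE(lo, hi, w, r) from the cost-function slide.
    d = hi - lo
    return e_read(d) * sum(r[lo:hi + 1]) + e_write(d) * sum(w[lo:hi + 1])


def best_partition(w, r, e_read, e_write, delta_percent, max_p):
    """Exhaustive search over cut sets of 2..max_p contiguous partitions.

    delta_percent[i] is the overhead of moving from i to i + 1 partitions,
    assumed to be a percentage of the monolithic energy.
    Returns (min_consump, list of inclusive (lo, hi) ranges)."""
    M = len(r)
    mono = partition_energy(0, M - 1, w, r, e_read, e_write)
    min_consump, best = mono, [(0, M - 1)]
    for k in range(2, max_p + 1):
        overhead = sum(delta_percent[:k]) / 100 * mono
        # A k-way cut set is fixed by its inner ends hi_0 < ... < hi_{k-2}.
        for ends in combinations(range(M - 1), k - 1):
            bounds = [-1, *ends, M - 1]
            parts = [(bounds[j] + 1, bounds[j + 1]) for j in range(k)]
            total = overhead + sum(partition_energy(lo, hi, w, r, e_read, e_write)
                                   for lo, hi in parts)
            if total < min_consump:          # save the new minimal consumption
                min_consump, best = total, parts
    return min_consump, best


def linear(d):
    # Hypothetical energy model: proportional to the d + 1 words spanned.
    return float(d + 1)


r = [10, 10, 0, 0, 0, 0, 0, 0]               # hypothetical: reads hit 2 addresses
w = [0] * 8
print(best_partition(w, r, linear, linear, [0, 20, 15, 10], max_p=4))
# (72.0, [(0, 1), (2, 7)])
```

Each partition sum is recomputed from scratch for clarity; prefix sums over r and w would make the per-cut-set cost proportional to k rather than M.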

19
Complexity
  • Checking all k-way partitionings takes
    $O(M^{k-1})$ operations
  • Checking up to MaxP-way partitioning (the best
    solution out of 1, 2, ..., MaxP partitions) takes
    $O(1) + O(M) + O(M^2) + \cdots + O(M^{MaxP-1}) \Rightarrow O(M^{MaxP-1})$
  • Therefore MaxP = 4 is a practical limit
  • Improvement: work with bigger basic units, meaning
    that each partition size must be a multiple of t,
    where t > 1
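A quick count of how many cut sets the exhaustive search visits, and how much a basic unit of t words helps. A k-way cut of u units corresponds to choosing k - 1 of the u - 1 gaps between units, i.e. C(u - 1, k - 1) cut sets. M = 4096 and t = 64 are arbitrary illustrative values.

```python
from math import comb


def cut_sets_checked(M, max_p, t=1):
    """Number of cut sets examined for 1..max_p partitions when every
    partition size must be a multiple of t (t must divide M)."""
    if M % t:
        raise ValueError("t must divide M")
    units = M // t
    # A k-way cut of `units` basic blocks: choose k - 1 of the units - 1 gaps.
    return sum(comb(units - 1, k - 1) for k in range(1, max_p + 1))


print(cut_sets_checked(4096, 4))        # every word is a basic unit
print(cut_sets_checked(4096, 4, t=64))  # partitions in multiples of 64 words
```

The first count is on the order of 10^10, the second is 41,728, which illustrates why coarser units keep the search practical.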

20
Memory Logic Generation
21
Decoder Generation
  • The decoder has one input, the CPU address, and
    two outputs to the partitions: memory select and
    relative address
  • Memory select: selects the active memory
    partition based on the input address and
    deactivates the partitions that are not currently
    needed
  • Address translation: the absolute input address
    is converted to an address relative to the active
    partition

22
Decoder Generation (cont.)
  • The decoder is placed between the CPU and the
    memory partitions in order to reduce wire length
  • Careful design is needed so as not to increase
    system delay and thus degrade performance
  • Example: 2 memory partitions, 0-148 and 149-255.
    Absolute address 168 → relative address
    168 - 149 = 19
  • The decoder logic is not straightforward: it is
    not made of simple logic gates, but of a series
    of comparators and subtractors
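A behavioral software model of the decoder, reproducing the example above. It is illustrative only: in hardware the comparisons against each partition base would run in parallel, whereas here a binary search (`bisect_right`) plays the role of the comparator bank and a subtraction plays the role of the subtractor.

```python
from bisect import bisect_right


def make_decoder(partition_starts, memory_size):
    """Software model of the decoder (illustrative only).

    partition_starts: first absolute address of each partition, in
    increasing order, starting at 0. Returns decode(address) ->
    (memory_select, relative_address)."""
    def decode(address):
        if not 0 <= address < memory_size:
            raise ValueError(f"address {address} out of range")
        # Comparators: index of the last partition whose base <= address.
        select = bisect_right(partition_starts, address) - 1
        # Subtractor: relative address = absolute address - partition base.
        return select, address - partition_starts[select]
    return decode


decode = make_decoder([0, 149], memory_size=256)  # partitions 0-148, 149-255
print(decode(168))   # (1, 19)
print(decode(100))   # (0, 100)
```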

23
Decoder Generation (cont.)
24
Memory Physical Design
25
Memory Generation
  • Memory generation tool: a tool that generates the
    physical layout of a memory component, based on
    the requested memory size and aspect ratio (the
    proportion between the height and width of the
    block)
  • The algorithm produced the optimal memory cut
    set, in terms of energy consumption, based on
    execution profiling
  • The algorithm can generate partitions of any
    size, but the memory generation tool is not that
    flexible, so the partition sizes are matched to
    applicable values
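One simple way to do this matching is to round each partition up to the smallest size the generator can build, so that no address is lost. The list of legal sizes (multiples of 16 words up to 1024) is purely hypothetical; real memory generators publish their own legal size and aspect-ratio ranges, and the original flow may match sizes differently.

```python
from bisect import bisect_left


def match_to_generator(sizes, supported):
    """Round each partition size up to the smallest size the memory
    generator can build. `supported` is a hypothetical sorted list of
    legal sizes (in words); rounding up keeps every address mapped."""
    matched = []
    for size in sizes:
        i = bisect_left(supported, size)    # first legal size >= size
        if i == len(supported):
            raise ValueError(f"no generator size can hold {size} words")
        matched.append(supported[i])
    return matched


SUPPORTED = [16 * n for n in range(1, 65)]         # hypothetical: 16..1024 words
print(match_to_generator([149, 107], SUPPORTED))   # [160, 112]
```

Rounding up trades some wasted words (and area) for keeping the algorithm's cut positions unchanged.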

26
Block Placement and Routing
  • We have generated the physical layouts of the
    memory partitions and the decoder. Now we need to
    place and route them along with the CPU

27
Block Placement
  • 2 placement methods are available:
  • Automatic: places blocks with no insight into
    their functionality. We choose a placement method
    that aims to reduce wire length in the routing
    phase.
  • Manual: uses knowledge of the pin positions and
    block functionality to ease the routing phase. A
    bus channel arrangement is used. With highly
    asymmetrical partitions, significant area is
    wasted.
  • Since automatic placement gives better results,
    it is preferred

28
Routing Phase
  • 2 methods for routing:
  • Automatic, with wire length as the cost function
  • Manual
  • Since automatic routing gives better results, it
    is preferred
  • An automatic search-and-repair phase resolves
    geometric and delay violations

29
Experimental Results
30
Lower Bound for Energy Savings
  • For each memory access we pay a fixed overhead
    plus the energy consumed by the active memory
    partition, which is proportional to its size
  • δ = (0%, 20%, 15%, 10%) is a conservative overhead
    estimate, with values relative to the energy
    consumption of the monolithic memory
  • The simplest case is partitions of even size,
    regardless of the memory access pattern

31
Lower Bound for Energy Savings (cont.)
  • Expected lower bound for energy savings: 31%
  • These results are for the worst case, a uniform
    access profile. A highly non-uniform access
    profile with clustered high-access addresses
    provides better results
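A back-of-the-envelope check of the lower bound, under one plausible reading of the model: with a uniform profile and k equal partitions, each access costs 1/k of the monolithic access energy plus the cumulative overhead δ_0 + ... + δ_{k-1}, all in percent of the monolithic energy. The function is a hypothetical sketch, not the presenter's tool.

```python
def even_split_savings(delta_percent):
    """Savings (percent) of k equal partitions under a uniform access
    profile, for k = 1..len(delta_percent). Assumed model: an access
    costs 100/k percent of the monolithic energy plus the cumulative
    overhead delta_0 + ... + delta_{k-1}."""
    results = {}
    overhead = 0.0
    for k, d in enumerate(delta_percent, start=1):
        overhead += d
        cost = 100.0 / k + overhead     # percent of monolithic energy
        results[k] = 100.0 - cost
    return results


savings = even_split_savings([0, 20, 15, 10])
best_k = max(savings, key=savings.get)
print(best_k, round(savings[best_k], 1))   # 3 31.7
```

Under this model 2 and 4 even partitions both save 30%, and 3 partitions save about 31.7%, which lands close to the lower bound quoted on this slide.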

32
Energy Savings vs. Area Penalty
Area penalty is the ratio between the sizes of
the partitioned memory and the monolithic memory
33
Energy Breakdown
 
34
Software Upgrade Problem
  • The suggested solution gives an optimal result
    for a given application access profile
  • A software upgrade will probably present a
    different access profile
  • Applying the software upgrade to the existing
    chip will therefore result in non-optimal energy
    consumption

35
Suggestion for Improvement
  • A highly non-uniform access profile whose hot
    addresses are scattered all over the memory cannot
    be optimized, because partitions must be
    contiguous memory blocks
  • A compiler/linker suite that uses the access
    profile to group highly accessed memory locations
    together would solve this problem
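A toy sketch of the idea: reorder addresses by decreasing access count (the kind of relocation a profile-driven linker could apply) and compare the best 2-way split before and after. `hot_first_layout`, the profile, and the size-proportional energy model (overhead ignored) are hypothetical; no specific compiler or linker feature is implied.

```python
def hot_first_layout(counts):
    """Hypothetical linker-style relayout: place addresses in order of
    decreasing access count so that hot locations become contiguous.
    Returns (new_counts, mapping old address -> new address)."""
    order = sorted(range(len(counts)), key=lambda a: counts[a], reverse=True)
    mapping = {old: new for new, old in enumerate(order)}
    return [counts[a] for a in order], mapping


def best_two_way_energy(counts):
    """Minimum energy over 2-way splits; energy per access is assumed
    proportional to the partition size, overhead ignored."""
    M, total = len(counts), sum(counts)
    best, left = total * M, 0
    for cut in range(1, M):
        left += counts[cut - 1]
        best = min(best, left * cut + (total - left) * (M - cut))
    return best


scattered = [50 if a % 10 == 0 else 1 for a in range(100)]
regrouped, mapping = hot_first_layout(scattered)
print(best_two_way_energy(scattered), best_two_way_energy(regrouped))
```

Python's `sorted` is stable even with `reverse=True`, so equally hot addresses keep their original relative order in the new layout.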

36
Conclusions
  • The new memory component can achieve 35% energy
    savings over the monolithic memory
  • This memory component is about twice the size of
    the monolithic memory
  • With careful design, the memory component does not
    impair performance

37
What Have We Learned?
  • The importance of reducing energy consumption in
    Embedded Systems
  • We have seen a complete design flow of a memory
    component
  • The algorithm for partitioning a memory, in order
    to achieve minimal energy consumption
  • The logic component that manages this new memory
    model was introduced
  • The physical design considerations that were
    taken in the transition from logical view to
    physical view