Optimization techniques for high performance DSPs - PowerPoint PPT Presentation

Slides: 29
Provided by: robos
Transcript and Presenter's Notes

Title: Optimization techniques for high performance DSPs


1
SE 746-NT Embedded Software Systems Development
Robert Oshana
Lecture 30
For more information, please contact:
NTU Tape Orders, NTU Media Services, (970) 495-6455
oshana_at_airmail.net
tapeorders_at_ntu.edu
2
Power Optimization
3
Power Agenda
  • Introduction
  • Silicon Story in a Nutshell
  • SW Management Makes a Difference
  • RTOS
  • Code gen
  • Measurement/Estimation/Analysis
  • CCP/Emulation
  • ACT (Munin, Power Profiling, Real-time PBC)
  • Simulation & Simulation/Emulation Combos
  • Strategy Summary

4
The Power Pyramid
[Figure: the power pyramid diagram. Labels on the diagram: Measurement/Estimation, Confirm Predictions, P.S., Assure, Chips, Code Model, Software Management, Voltage Scaling, Consult, RTOS, Power Management, SI Story, Function Design, Predict, Circuit Design Techniques, Leakage, Switching, Power Components. Annotations: "Little Work In This Area" and "Design Decisions Happen Here".]
5
Energy vs. Power
  • Power = Energy / Time
  • Energy relates to battery life
  • How many mW over how long a time
  • Power relates to heat dissipation and current
    draw
  • Temperature change depends on average power over
    1,000s to 1,000,000s of cycles
  • Intel Pentium-4 showed a maximum change in
    temperature of 1 degree/msec
  • Current draw depends on average power over 100s
    to 10,000s of cycles
  • Due to chip, board-level, and power supply
    capacitance
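
The relationship between power, energy, and battery life can be made concrete with a small calculation. The sketch below assumes an idealized battery rated in mWh and a load that alternates between an active and an idle state; the function names and the example numbers are illustrative and do not come from the lecture.

```c
/* Energy in millijoules: power (mW) multiplied by time (s). */
double energy_mj(double power_mw, double seconds)
{
    return power_mw * seconds;
}

/* Average power of a load that is active for a fraction `duty`
 * (0.0 to 1.0) of the time and idle for the rest. */
double average_power_mw(double active_mw, double idle_mw, double duty)
{
    return active_mw * duty + idle_mw * (1.0 - duty);
}

/* Hours of operation from an idealized battery of `capacity_mwh`. */
double battery_life_hours(double capacity_mwh, double avg_power_mw)
{
    return capacity_mwh / avg_power_mw;
}
```

For example, a 100 mW load that is active 25% of the time and idles at 20 mW averages 40 mW, so a 2000 mWh battery would last about 50 hours under these assumptions.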

6
Power optimization
  • Relatively little emphasis has been placed on
    code size optimization, and even less on power
    optimization
  • The growing embedded computing market is very
    different in that it does place a budget on
    everything; it's all about cost
  • Code size relates directly to cost, as every
    extra memory chip needed makes the system cost a
    little bit more, which can be a big deal when you
    sell a million units

7
Power optimization
  • Power also relates to cost
  • if your application draws too much power, the
    required battery can grow to make the product
    expensive, unwieldy, and undesirable
  • How, then, to keep the power consumption under
    control if you're programming in C code?
  • make the application run in as few cycles as
    possible

8
Power optimization
  • Every instruction executed consumes power
  • The cheapest instruction is the one you don't
    execute at all
  • If you're meeting your real-time deadlines, you
    can reduce the clock speed
  • Power consumed is roughly proportional to the
    cube of the frequency when the supply voltage is
    scaled down along with the clock
  • Reducing the clock speed can be a big win
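
The cubic relationship can be derived from the standard first-order model of CMOS switching power, assuming that the supply voltage is lowered roughly in proportion to the clock frequency. Here $\alpha$ is the switching activity factor, $C$ the switched capacitance, $V$ the supply voltage, $f$ the clock frequency, and $t_{\text{task}}$ the time to finish a fixed amount of work:

```latex
\begin{align*}
P_{\text{switch}} &= \alpha C V^{2} f \\
V \propto f \;&\Rightarrow\; P_{\text{switch}} \propto f^{3} \\
t_{\text{task}} \propto \frac{1}{f} \;&\Rightarrow\; E_{\text{task}} = P_{\text{switch}}\, t_{\text{task}} \propto f^{2}
\end{align*}
```

At a fixed voltage, switching power falls only linearly with frequency and the energy per task stays roughly constant, so the large win comes from lowering the voltage along with the clock.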

9
Power optimization
  • Apply lessons learned for code size reduction to
    power optimizations
  • The fewer instructions there are in your program,
    the less power it will use
  • Fewer instructions means a smaller memory
    footprint
  • Fewer memory chips need to be kept powered up
  • Fewer fetches are made from memory, which takes
    power
  • Your program is more likely to fit in the cache,
    from which fetches are lower power

10
Power optimization
  • Optimizing for speed and code size can go a long
    way toward reducing power
  • There may be even larger gains to be made by
    exploiting hardware power reduction techniques
  • Some DSPs have multiple functional units
  • The hardware automatically detects whether
    upcoming instructions will need a particular
    functional unit
  • It turns off power to unused units automatically
    (no programmer intervention)

11
C code modifications
  • First run the cycle profiler on your code to
    identify the "hot spots"
  • Code typically spends 90% of its time in a
    handful of loops
  • Since this is where you're spending most of your
    time, it's pretty likely where you're spending
    the most power
  • Focus your efforts on these loops, and you might
    not need to examine the other code

12
Power optimization
  • Some DSPs offer a zero-overhead loop feature
    (RPTB)
  • You don't pay the branch latency penalty for
    every iteration of the loop
  • Loop buffer: if your loop is small enough to fit
    entirely into this cache, the DSP CPU will fetch
    from this very low-power, high-speed cache
    rather than memory
  • Examine the assembly code the compiler emits to
    see if you have loops which are just slightly too
    big (the assembler can help), and tweak your
    source code to try to reduce the size

13
Power optimization
  • A loop that has more instructions, but a smaller
    code size, than an equivalent loop can run
    faster because it fits in the loop cache
  • Fetches from memory take power
  • The compiler will try to avoid fetching the same
    value repeatedly, but if the code has complex
    pointer manipulations (particularly multiple
    pointers), it might not be able to prove to
    itself that the memory location always has the
    same value
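
The aliasing problem described above can be sketched with two versions of the same loop. The function names are illustrative. In the first version the compiler has to allow for `out` and `gain` pointing at overlapping memory, so it may reload `*gain` on every iteration; in the second, the value is copied into a local once.

```c
#include <stddef.h>

/* The compiler must assume that a store to out[i] could change *gain,
 * so it may fetch *gain from memory on every iteration. */
void scale_reload(int *out, const int *in, const int *gain, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        out[i] = in[i] * *gain;
}

/* Reading *gain once into a local makes the loop invariance explicit:
 * the value can stay in a register for the whole loop. */
void scale_hoisted(int *out, const int *in, const int *gain, size_t n)
{
    size_t i;
    const int g = *gain;
    for (i = 0; i < n; i++)
        out[i] = in[i] * g;
}
```

The two versions give the same result only when `gain` does not point into `out`, which is exactly the fact the compiler cannot prove on its own.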

14
Power optimization
  • Avoid using complicated pointer expressions when
    you can
  • arrays are preferred
  • Write your algorithms in a straightforward
    fashion
  • The more clever you are with your code, the more
    trouble the compiler will have with it
  • Many DSP compilers are highly optimizing
  • The compiler might be able to do some of those
    tricks itself
  • Give the compiler a chance

15
Power optimization
  • Use the linker command file to place critical
    sections of your application in on-chip
    (typically lower-powered) memory
  • Try to cram on-chip as much of the application
    (code and data) as possible
  • Be sure to place each function in its own section
    so that the linker has more freedom to pack
  • Use power-down modes in idle loops
  • These can be big power savers
  • Although not directly accessible from C code,
    they could certainly be placed in a library
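
One way to steer a critical function into its own named section from C is sketched below. It assumes TI's `CODE_SECTION` pragma when built with a TI compiler and GCC's `section` attribute on ELF targets. The section name `.fast_code` and the function `fir_step` are hypothetical, and the linker command file must still map that section to on-chip memory.

```c
/* The section name ".fast_code" is hypothetical. It must match an
 * output section that the linker command file places on-chip. */
#if defined(__TI_COMPILER_VERSION__)
#pragma CODE_SECTION(fir_step, ".fast_code")
#define FAST_CODE
#elif defined(__GNUC__) && defined(__ELF__)
#define FAST_CODE __attribute__((section(".fast_code")))
#else
#define FAST_CODE   /* no placement control on other toolchains */
#endif

/* A hot inner loop: multiply-accumulate over `taps` coefficients. */
FAST_CODE int fir_step(const int *coef, const int *hist, int taps)
{
    int acc = 0;
    int i;
    for (i = 0; i < taps; i++)
        acc += coef[i] * hist[i];
    return acc;
}
```

Giving each critical function its own section name lets the linker place them independently, which is the freedom to pack that the slide refers to.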

16
Power optimization
  • Look for the availability of power-saving library
    features, and use them in your C code
  • Problems of speed, size, and power are not
    independent
  • What might be an optimal solution for speed might
    not be the optimal power solution
  • You may find yourself making contorted
    hand-optimizations to get the last tenth of a
    percent of performance from a CPU, while
    unwittingly sacrificing too much power to do so

17
Power optimization
  • idle instruction
  • Scales down clock
  • Turns off peripherals
  • Turns off cache
  • Dedicated interrupt to wake up
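
A background loop built around an idle instruction might look like the sketch below. Every name here is hypothetical: `cpu_idle()` stands in for whatever library routine wraps the device's idle instruction, and on a host it only counts calls so the sketch can run without hardware.

```c
unsigned long idle_entries = 0;      /* times the CPU was put to sleep */
unsigned long frames_processed = 0;  /* times real work was done */
volatile int work_pending = 0;       /* set from the wake-up interrupt */

/* Hypothetical stand-in for a library call that executes the idle
 * instruction. On a real target the CPU would stop here until the
 * dedicated wake-up interrupt fires. */
void cpu_idle(void)
{
    idle_entries++;
}

/* Hypothetical interrupt handler: flag that a frame is ready. */
void data_ready_isr(void)
{
    work_pending = 1;
}

/* One pass of the background loop. Returns 1 if work was done and
 * 0 if the CPU was idled instead. */
int background_step(void)
{
    if (!work_pending) {
        cpu_idle();
        return 0;
    }
    work_pending = 0;
    frames_processed++;
    return 1;
}
```

On real hardware the check of `work_pending` and the entry to idle usually have to be made atomic with respect to the interrupt, otherwise a wake-up that arrives between the two can be missed.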

18
Power optimization
  • A trick that might work acceptably in one
    application might be counter-productive in
    another
  • Your best bet is to be familiar with your
    application and with the features of your hardware

19
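Example: a tweak that improves power performance

cl55 -o power.c

#include <stdio.h>
#include <string.h>   /* strlen */

char *reverse(char *str)
{
    int i;
    char *beg = str;
    char *end = str + strlen(str) - 1;

    /* strlen() is called again on every iteration */
    for (i = 0; i < strlen(str)/2; i++)
    {
        char t = *end;
        *end-- = *beg;
        *beg++ = t;
    }
    return str;
}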
20
Example: a tweak that improves power performance

char *reverse_rptblocal(char *str)
{
    int i;
    char *beg = str;
    char *end = str + strlen(str) - 1;
    int len = strlen(str);   /* computed once, outside the loop */

    for (i = 0; i < len/2; i++)
    {
        char t = *end;
        *end-- = *beg;
        *beg++ = t;
    }
    return str;
}

Moving the call out of the loop allows the optimizer to make the loop a RPTBLOCAL.
21
Summary: Power Optimization with Code Gen
  • DO LESS
  • Execute fewer instructions
  • Access Memory Less (common sub-expression
    elimination, etc)
  • Many performance optimizations also save power
  • USE LOWER-POWER RESOURCES
  • Use lowest-power functional unit for an operation
  • Put important code on-chip
  • Put important data on-chip
  • Restructure code to maximize utilization of
    caches
  • Keep the program close to the CPU (less bus and
    pin traffic)
  • Identify critical code and data!

22
Power Optimization with RTOS
  • Keep the maximum HW powered down
  • Peripherals
  • Memory modules
  • HW teams claim that most power is not consumed
    by the CPU but by peripherals
  • A 20% CPU power saving is not nearly as important
    as a 30-60% power reduction in the peripherals,
    which seems possible
  • Use Lowest Voltage and Clock Rates
  • Use dynamic voltage scheduling (DVS)
  • Must predict the MIPS requirements in the near
    future
  • The extra computing needed must not offset power
    gains
  • Flat MIPS meeting all deadlines is probably best
  • Peak MIPS needs force deviation from the flat
    MIPS model
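
Why a flat MIPS profile tends to win can be shown with a small comparison. The sketch assumes that power scales with the cube of the clock frequency (voltage tracking frequency) and ignores leakage and idle power; the function names are illustrative and the units are arbitrary, so only the ratios mean anything.

```c
/* Relative energy of running at clock `freq` for `seconds`, assuming
 * power is proportional to freq cubed. */
double relative_energy(double freq, double seconds)
{
    return freq * freq * freq * seconds;
}

/* Relative energy to complete `cycles` of work at clock `freq`, with
 * the CPU assumed to draw nothing once the work is finished. */
double energy_for_work(double cycles, double freq)
{
    return relative_energy(freq, cycles / freq);
}
```

With 100 cycles of work due in 10 time units, running flat at a clock of 10 costs 10000 units, while running at 20 and then idling costs 40000: doubling the clock quadruples the energy for the same work under this model. When leakage or idle power is significant, the balance can shift, so this is an argument under the stated assumptions rather than a general rule.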

23
Dynamic Voltage Scheduling
  • Dynamic Voltage Scheduling (DVS) approaches
  • on-line
  • off-line
  • inter-task
  • intra-task
  • System then adjusts the voltage so that the CPU
    is (ideally) never idle and all threads continue
    to meet real-time deadlines
  • In all cases, one must predict the MIPS
    requirements in the immediate future. The real
    trick is in predicting these MIPS requirements.

24
Off-line Approach
  • Compute a static schedule of voltages that may
    vary in real-time but do not adjust to "new"
    algorithms being downloaded to the platform
  • Can spend an arbitrary amount of time determining
    an "optimal" schedule
  • Processor must be a closed system
  • OK for many "classic" DSP applications

25
On-line Approach
  • Requires the RTOS to dynamically adjust to new
    algorithms
  • Can't spend a huge amount of time computing an
    optimal schedule
  • Simple heuristics are employed (because they must
    execute in real-time on the target)
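
A simple on-line heuristic of the kind mentioned above might look like the following sketch. It is one possible policy, not the lecture's: the frequency table, the 1/4 smoothing weight, the 25% headroom, and the assumption of one instruction per cycle (so MIPS equals MHz) are all hypothetical choices made for illustration.

```c
/* Hypothetical table of available clock rates in MHz, ascending. */
static const unsigned freq_table_mhz[] = { 50, 100, 150, 200 };
#define NUM_FREQS (sizeof freq_table_mhz / sizeof freq_table_mhz[0])

/* Exponential moving average: 3/4 weight on the previous estimate,
 * 1/4 on the newest measurement. Integer math keeps it cheap. */
unsigned predict_mips(unsigned prev_estimate, unsigned measured)
{
    return (3u * prev_estimate + measured) / 4u;
}

/* Lowest clock that covers the prediction plus 25% headroom.
 * Falls back to the highest clock if nothing is fast enough. */
unsigned pick_freq_mhz(unsigned predicted_mips)
{
    unsigned needed = predicted_mips + predicted_mips / 4u;
    unsigned i;
    for (i = 0; i < NUM_FREQS; i++)
        if (freq_table_mhz[i] >= needed)
            return freq_table_mhz[i];
    return freq_table_mhz[NUM_FREQS - 1];
}
```

The predictor costs a handful of integer operations per update, which matters because the extra computing must not offset the power gains.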

26
Inter-task Approach
  • "Pure" RTOS solution: no change is made to
    algorithm code
  • Voltage scheduling decisions are made only at
    context-switch boundaries
  • RTOS is given a "black-box" characterization of
    the execution time requirements of each task in
    the system
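
One way an RTOS could use such a black-box characterization is sketched below. The policy is an assumption for illustration: each task is described only by its worst-case cycles per activation and its period, and the clock is chosen so that total utilization stays at or below 100%, which is sufficient under earliest-deadline-first scheduling when deadlines equal periods. The structure and function names are hypothetical.

```c
#include <stddef.h>

/* Black-box description of a periodic task. */
struct task_info {
    unsigned long wcet_cycles;  /* worst-case cycles per activation */
    unsigned long period_us;    /* activation period in microseconds */
    int active;                 /* nonzero if the task is in the system */
};

/* Cycles per microsecond is MHz, so summing wcet/period over the
 * active tasks gives the minimum clock in MHz. Called at a
 * context-switch boundary when the set of active tasks may change. */
unsigned long required_mhz(const struct task_info *tasks, size_t n)
{
    unsigned long mhz = 0;
    size_t i;
    for (i = 0; i < n; i++) {
        if (!tasks[i].active)
            continue;
        /* Round each task's demand up so the estimate is conservative. */
        mhz += (tasks[i].wcet_cycles + tasks[i].period_us - 1)
               / tasks[i].period_us;
    }
    return mhz;
}
```

Because only worst-case figures are available, this approach cannot reclaim the slack left when a task finishes early; that is where the intra-task approach comes in.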

27
Intra-task Approach
  • An algorithm solution
  • Algorithms "call out" to the OS to inform the OS
    about how much more work is required before
    completing the current frame of data
  • Works well in algorithms with large variations
    between worst/average-case execution times (e.g.,
    MPEG4)
  • Theoretically, compilers can automatically
    compute the points in an algorithm where it
    should call out to the OS; the compiler can
    compute worst-case execution times from various
    points in the call-graph
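
The "call out" can be sketched as a function the algorithm invokes with its remaining worst-case work. The name `mhz_for_remaining`, its parameters, and the rounding policy are hypothetical; a real system would then map the result onto the voltage and clock settings the hardware actually offers.

```c
/* Clock (in MHz) needed to finish `cycles_left` worst-case cycles in
 * `us_to_deadline` microseconds, rounded up and clamped to `max_mhz`.
 * Returns max_mhz if no time remains before the deadline. */
unsigned long mhz_for_remaining(unsigned long cycles_left,
                                unsigned long us_to_deadline,
                                unsigned long max_mhz)
{
    unsigned long mhz;
    if (us_to_deadline == 0)
        return max_mhz;
    mhz = (cycles_left + us_to_deadline - 1) / us_to_deadline;
    return mhz > max_mhz ? max_mhz : mhz;
}
```

For a frame with a 2,000,000-cycle worst case and a 10,000 microsecond deadline, the first call asks for 200 MHz. If the data turns out to be easy and only 300,000 worst-case cycles remain with 6,000 microseconds left, a later call asks for 50 MHz, which is how large gaps between worst-case and average-case execution times become power savings.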

28
SE 746-NT Embedded Software Systems Development
Robert Oshana
End of Lecture