Title: Slide 1
1Hardware/Software Codesign of Embedded Systems
Power/Voltage Management
Voicu Groza School of Information Technology and
Engineering Groza_at_SITE.uOttawa.ca
2Embedded Systems
- Power/Energy Aware Embedded Systems
- Dynamic Voltage Scheduling
- Dynamic Power Management
http://www.phys.ncku.edu.tw/htsu/humor/fry_egg.html
Processors have surpassed the hot (kitchen) plate in power density. Why not use it?
3Processing units
- Need for efficiency (power + energy)
"Power is considered as the most important
constraint in embedded systems." [in: L. Eggermont
(ed.): Embedded Systems Roadmap 2002, STW]
"Current smart phones can hardly be operated for
more than an hour, if data is being
transmitted." [from a report of the Financial
Times, Germany, on an analysis by Credit Suisse
First Boston: http://www.ftd.de/tm/tk/9580232.html?nvse]
4The energy/flexibility conflict - Intrinsic Power
Efficiency -
[Figure: intrinsic power efficiency in operations/Watt (MOPS/mW, axis ticks 0.01, 0.1, 1, 10) versus technology (1.0µ, 0.5µ, 0.25µ, 0.13µ, 0.07µ). Labels: Ambient Intelligence, hardwired muxed ASIC, DSP-ASIPs, Reconfigurable Computing, Processors (µPs)]
It is necessary to optimize HW/SW; otherwise the price
for software flexibility cannot be paid!
[H. de Man, Keynote, DATE'02; T. Claasen, ISSCC'99]
5Power and energy are related to each other
[Figure: power P plotted over time t; the energy E is the area under the curve, $E = \int P \, dt$]
In many cases, faster execution also means less
energy, but the opposite may be true if power has
to be increased to allow faster execution.
6Low Power vs. Low Energy Consumption
- Minimizing the power consumption is important for
- the design of the power supply
- the design of voltage regulators
- the dimensioning of interconnect
- short term cooling
- Minimizing the energy consumption is important due to
- restricted availability of energy (mobile systems)
- limited battery capacities (only slowly improving)
- very high costs of energy (solar panels, in space)
- cooling
- high costs
- limited space
- dependability
- long lifetimes, low temperatures
7Application Specific Circuits (ASICs) or Full
Custom Circuits
- Custom-designed circuits necessary
- if ultimate speed or
- energy efficiency is the goal and
- large numbers can be sold.
- Approach suffers from
- long design times,
- lack of flexibility (changing standards) and
- high costs (e.g. mask costs in the millions).
8Mask cost for specialized HW becomes very
expensive
→ Trend towards implementation in software
http://www.molecularimprints.com/Technology/tech_articles/MII_COO_NIST_2001.PDF9
9Power Consumption of a Gate
10Fundamentals of dynamic voltage scaling (DVS)
Power consumption of CMOS circuits (ignoring
leakage)
Delay for CMOS circuits
→ Decreasing Vdd reduces P quadratically, while
the run-time of algorithms is only linearly
increased (ignoring the effects of the memory
system).
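The two formulas announced on this slide did not survive extraction. As an assumption about what was shown, the usual first-order CMOS models are given here, with α the switching activity, C_L the load capacitance, f the clock frequency, V_t the threshold voltage and k a circuit-dependent constant (standard textbook notation, not confirmed by the source):

```latex
P = \alpha \, C_L \, V_{dd}^{2} \, f
\qquad\qquad
\tau = k \, C_L \, \frac{V_{dd}}{(V_{dd} - V_t)^{2}}
\quad \text{with } V_t \ll V_{dd}
```

Under these models the delay grows roughly as 1/Vdd when V_t is small compared to Vdd, which matches the statement that run-time increases only linearly while power falls quadratically.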
11Potential for Energy Optimization
- Saving Energy under given Time Constraints
- Reduce the supply voltage Vdd
- Reduce switching activity α
- Reduce the load capacitance CL
- Reduce the number of cycles (#Cycles)
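The four levers above can be read off a simple energy model. The following is a minimal sketch, assuming the common first-order model E = α · CL · Vdd² · #Cycles (leakage ignored); the function name and the numeric values are hypothetical and chosen only for illustration.

```python
def dynamic_energy(alpha, c_load, vdd, cycles):
    """Dynamic energy in joules under the assumed model E = alpha * C_L * Vdd^2 * cycles."""
    return alpha * c_load * vdd ** 2 * cycles


# Hypothetical values: activity 0.5, 1 nF switched capacitance, one million cycles.
base = dynamic_energy(alpha=0.5, c_load=1e-9, vdd=5.0, cycles=1_000_000)
low_vdd = dynamic_energy(alpha=0.5, c_load=1e-9, vdd=2.5, cycles=1_000_000)
fewer_cycles = dynamic_energy(alpha=0.5, c_load=1e-9, vdd=5.0, cycles=500_000)

print("halved Vdd:    ", low_vdd / base)
print("halved #Cycles:", fewer_cycles / base)
```

In this model, halving Vdd cuts the energy to a quarter, whereas halving α, CL or the cycle count each only halves it.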
12Processors
At the chip level, embedded chips include
micro-controllers and microprocessors.
Micro-controllers are the true workhorses of the
embedded family. They are the original embedded
chips and include those first employed as
controllers in elevators and thermostats [Ryan,
1995].
13Voltage Scaling and Power Management: Dynamic
Voltage Scaling
[Figure: energy per cycle (nJ) plotted against Vdd]
14Power density continues to get worse
[Figure: power density chart; one reference level is labeled "Nuclear reactor"]
15Need to consider CPU & System Power
Courtesy: N. Dutt; Source: V. Tiwari
16New ideas can actually reduce energy consumption
[Figure: Pentium vs. Crusoe]
Running the same multimedia application.
As published by Transmeta [www.transmeta.com]
17Dynamic power management (DPM)
Example: StrongARM SA1100
- RUN: operational
- IDLE: a software routine may stop the CPU when not in
use, while monitoring interrupts
- SLEEP: shutdown of on-chip activity
[Figure: power state machine with states RUN (400 mW), IDLE (50 mW) and SLEEP (160 µW); edge labels 10 µs, 90 µs and 160 ms; "Power fault signal" appears on two edges]
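A power state machine of this kind can be captured in a small table-driven model. The sketch below is an illustration only: the state powers come from the slide, but the assignment of the 10 µs, 90 µs and 160 ms labels to specific edges is an assumption (the figure itself was lost), and energy spent during transitions is ignored for simplicity.

```python
# Power per state in watts (values from the slide).
POWER_W = {"RUN": 0.400, "IDLE": 0.050, "SLEEP": 160e-6}

# Transition latencies in seconds.
# ASSUMPTION: this edge assignment is not recoverable from the transcript.
TRANSITION_S = {
    ("RUN", "IDLE"): 10e-6,
    ("IDLE", "RUN"): 10e-6,
    ("RUN", "SLEEP"): 90e-6,
    ("IDLE", "SLEEP"): 90e-6,
    ("SLEEP", "RUN"): 160e-3,
}


def trace_cost(trace):
    """Return (energy in J, total transition latency in s) for a state trace.

    trace is a list of (state, dwell_seconds) pairs. Only the dwell time
    in each state contributes to energy; transition energy is ignored.
    """
    energy = 0.0
    latency = 0.0
    for idx, (state, dwell) in enumerate(trace):
        energy += POWER_W[state] * dwell
        if idx + 1 < len(trace):
            edge = (state, trace[idx + 1][0])
            if edge not in TRANSITION_S:
                raise ValueError(f"no transition {edge[0]} -> {edge[1]}")
            latency += TRANSITION_S[edge]
    return energy, latency


energy, latency = trace_cost([("RUN", 1.0), ("IDLE", 1.0), ("RUN", 1.0)])
print(energy, latency)
```

Replacing the IDLE second by SLEEP in the trace lowers the dwell energy but adds the long wake-up latency, which is the tradeoff a power manager has to weigh.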
18Variable-voltage/frequency example: Intel XScale
OS should schedule distribution of the energy
budget.
From Intel's Web site
19Key requirement 2: Code-size efficiency
- CISC machines: RISC machines designed for
run-time-, not for code-size-efficiency
- Compression techniques: key idea
20Code-size efficiency
- Compression techniques (continued)
- 2nd instruction set, e.g. ARM Thumb instruction
set
Dynamically decoded at run-time
Same approach for LSI TinyRisc
Requires support by compiler, assembler etc.
21Dictionary approach, two level control
store (indirect addressing of instructions)
"Dictionary-based coding schemes cover a wide
range of various coders and compressors. Their
common feature is that the methods use some kind
of a dictionary that contains parts of the input
sequence which frequently appear. The encoded
sequence in turn contains references to the
dictionary elements rather than containing these
over and over." [Á. Beszédes et al.: Survey of
Code-Size Reduction Methods, ACM Computing Surveys,
Vol. 35, Sept. 2003, pp. 223-267]
22Key idea (for d bit instructions)
Uncompressed storage of a instructions, each d bits wide,
requires $a \times d$ bits. In compressed code, each
instruction pattern is stored only
once. Hopefully, $a \times b + c \times d < a \times d$. Called
nanoprogramming in the Motorola 68000.
For each instruction address, S contains table
address of instruction.
[Figure: the instruction address selects one of a entries in table S; each entry is a b-bit index into the table of used instructions (dictionary), which holds $c \le 2^b$ (small) entries of d bits each; the selected d-bit instruction is passed to the CPU]
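The two-level scheme can be sketched in a few lines. This is an illustrative model only, not the decoder of any particular processor: the function names and the 32-bit instruction words in the example are hypothetical, and b is taken as the smallest index width that can address all c dictionary entries.

```python
def compress(program):
    """Return (index table S, dictionary) for a list of instruction words."""
    dictionary = []      # each distinct instruction pattern is stored once
    position = {}        # instruction word -> index in the dictionary
    index_table = []     # S: one dictionary index per instruction address
    for word in program:
        if word not in position:
            position[word] = len(dictionary)
            dictionary.append(word)
        index_table.append(position[word])
    return index_table, dictionary


def fetch(index_table, dictionary, address):
    """Indirect instruction fetch: address -> S -> dictionary entry."""
    return dictionary[index_table[address]]


def sizes_in_bits(program, d):
    """Return (a*d, a*b + c*d) for d-bit instruction words."""
    a = len(program)
    _, dictionary = compress(program)
    c = len(dictionary)
    b = max(1, (c - 1).bit_length())   # smallest b with c <= 2**b
    return a * d, a * b + c * d


# Hypothetical program: 8 instruction words, only 3 distinct patterns.
program = [0x1A00, 0x2B11, 0x1A00, 0x2B11, 0x1A00, 0x3C22, 0x1A00, 0x2B11]
index_table, dictionary = compress(program)
uncompressed, compressed = sizes_in_bits(program, d=32)
print(uncompressed, compressed)
```

The saving depends on how often patterns repeat: if every instruction is distinct, c equals a and the compressed form is larger than the original.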
23Key requirement 3: Run-time efficiency -
Domain-oriented architectures -
Application: $y(j) = \sum_{i=0}^{n-1} x(j-i) \cdot a(i)$
$\forall i,\ 0 \le i \le n-1:\ y_i(j) = y_{i-1}(j) + x(j-i) \cdot a(i)$
Architecture: Example: Data path ADSP210x
Application maps nicely onto architecture:
MR:=0; A1:=1; A2:=n-2; MX:=x[n-1]; MY:=a[0];
for (j:=1 to n) {MR:=MR+MX*MY; MY:=a[A1]; MX:=x[A2];
A1++; A2--}
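The multiply-accumulate loop can be mirrored in software. The sketch below is a plain Python model, not ADSP210x code: the register names MR, MX, MY, A1 and A2 follow the slide, but the operands are loaded inside the loop rather than one iteration ahead, so that no element beyond the arrays is read.

```python
def fir_output(x, a):
    """Compute y = sum over i of x[n-1-i] * a[i] with a multiply-accumulate loop.

    x holds the n most recent samples (x[n-1] is the newest),
    a holds the n filter coefficients.
    """
    n = len(a)
    if len(x) != n:
        raise ValueError("x and a must have the same length")
    mr = 0          # MR: accumulator
    a1 = 0          # A1: address register walking up through the coefficients
    a2 = n - 1      # A2: address register walking down through the samples
    for _ in range(n):
        mx = x[a2]  # MX := x[A2]
        my = a[a1]  # MY := a[A1]
        mr += mx * my
        a1 += 1
        a2 -= 1
    return mr


y = fir_output(x=[1, 2, 3], a=[10, 20, 30])
print(y)  # 3*10 + 2*20 + 1*30 = 100
```

On a DSP data path of this kind the two loads, the multiplication and the accumulation can overlap, which is why the slide preloads MX and MY before the loop.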
24Modulo addressing
Modulo addressing: Am++ ≡ Am := (Am+1) mod n
(implements ring or circular buffer in memory)
[Figure: sliding window over the signal x(t), covering the n most recent values]
Memory, t=t1: .. x[t1-1] x[t1] x[t1-n+1] x[t1-n+2] ..
Memory, t2=t1+1: .. x[t1-1] x[t1] x[t1+1] x[t1-n+2] ..
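A software model of this addressing mode makes the memory picture concrete. This is a minimal sketch; the class and method names are hypothetical, and `am` plays the role of the address register Am.

```python
class SlidingWindow:
    """Ring buffer that keeps the n most recent samples."""

    def __init__(self, n):
        self.n = n
        self.mem = [0] * n
        self.am = 0   # address register Am: slot holding the oldest sample

    def push(self, sample):
        """Overwrite the oldest sample, then advance Am modulo n."""
        self.mem[self.am] = sample
        self.am = (self.am + 1) % self.n   # Am := (Am + 1) mod n

    def recent(self):
        """Return the stored samples ordered from oldest to newest."""
        return [self.mem[(self.am + k) % self.n] for k in range(self.n)]


window = SlidingWindow(4)
for sample in [1, 2, 3, 4, 5, 6]:
    window.push(sample)
print(window.mem)       # physical layout: [5, 6, 3, 4]
print(window.recent())  # logical order:   [3, 4, 5, 6]
```

Each new sample replaces exactly one memory word (the oldest), so no data is moved when the window slides.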
25Saturating arithmetic
- Returns largest/smallest number in case of
over/underflows
- Example:
a = 0111, b = 1001
standard wrap around arithmetic: (1)0000
saturating arithmetic: 1111
(a+b)/2: correct 1000
wrap around arithmetic: 0000
saturating arithmetic + shifted: 0111 (almost correct)
- Appropriate for DSP/multimedia applications
- No timeliness of results if interrupts are
generated for overflows
- Precise values less important
- Wrap around arithmetic would be worse.
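The 4-bit example can be checked with a short model. This is a sketch for unsigned operands only (the slide's example is unsigned); the function names are hypothetical.

```python
def add_wrap(a, b, bits=4):
    """Unsigned addition that discards the carry out (wrap around)."""
    return (a + b) & ((1 << bits) - 1)


def add_saturate(a, b, bits=4):
    """Unsigned addition that clamps to the largest representable value."""
    return min(a + b, (1 << bits) - 1)


a, b = 0b0111, 0b1001
exact_mean = (a + b) // 2            # computed with unlimited precision
wrap_mean = add_wrap(a, b) >> 1      # average after a wrapped sum
sat_mean = add_saturate(a, b) >> 1   # average after a saturated sum

print(format(add_wrap(a, b), "04b"), format(add_saturate(a, b), "04b"))
print(format(exact_mean, "04b"), format(wrap_mean, "04b"), format(sat_mean, "04b"))
```

The saturated result is off by one from the exact mean, while the wrapped result is off by the full value, which is why saturation is preferred for audio and video samples.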
26Fixed-point arithmetic
Shifting required after multiplications and
divisions in order to maintain binary point.
27Properties of fixed-point arithmetic
- Automatic scaling a key advantage for
multiplications.
- Example:
x = 0.5 × 0.125 + 0.25 × 0.125 = 0.0625 + 0.03125 = 0.09375
For iwl=1 and fwl=3 decimal
digits, the less significant digits are
automatically chopped off: x = 0.093
Like a floating point system with numbers ∈ [0..1), with
no stored exponent (bits used to increase
precision).
- Appropriate for DSP/multimedia applications (well-known
value ranges).
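The shift after a multiplication and the chopping of low-order digits can be illustrated with integers. This sketch uses binary fractional bits (fwl counts bits here), whereas the example above counts decimal digits; the helper names are hypothetical and only non-negative values are handled.

```python
def to_fixed(value, fwl):
    """Encode a non-negative real number with fwl fractional bits (chopping)."""
    return int(value * (1 << fwl))


def fixed_mul(p, q, fwl):
    """Multiply two fixed-point numbers; the right shift restores the binary point."""
    return (p * q) >> fwl


def to_float(raw, fwl):
    """Decode a fixed-point number back to a float."""
    return raw / (1 << fwl)


def example(fwl):
    """Evaluate x = 0.5 * 0.125 + 0.25 * 0.125 with fwl fractional bits."""
    c = to_fixed(0.125, fwl)
    raw = fixed_mul(to_fixed(0.5, fwl), c, fwl) + fixed_mul(to_fixed(0.25, fwl), c, fwl)
    return to_float(raw, fwl)


print(example(fwl=8))  # enough fractional bits: 0.09375
print(example(fwl=4))  # second product is chopped to zero: 0.0625
```

With too few fractional bits the smaller product vanishes entirely, so the word length has to be matched to the known value range of the application.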
28Spatial vs. Dynamic Supply Voltage Management
- Analogy of biological blood systems
- Different supply to different regions
- High pressure: high pulse count and high
activity
- Low pressure: low pulse count and low activity
Not all components require same performance.
Required performance may change over time
30Example: Processor with 3 voltages. Case a)
Complete task ASAP
Task that needs to execute $10^9$ cycles within 25
seconds.
$E_a = 10^9 \times 40 \times 10^{-9} = 40$ J
31Case b) Two voltages
$E_b = 750 \cdot 10^6 \times 40 \times 10^{-9} + 250 \cdot 10^6 \times 10 \times 10^{-9} = 32.5$ J
32Case c) Optimal voltage
$E_c = 10^9 \times 25 \times 10^{-9} = 25$ J
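The three cases can be recomputed with a small script. The energies per cycle (40, 25 and 10 nJ) are the ones used in the formulas above; the table that tied them to voltages and clock rates was not transcribed, so the pairing with 5.0 V, 4.0 V and 2.5 V and the clock frequencies of 50, 40 and 25 MHz are assumptions made here so that the 25-second deadline can be checked.

```python
NJ = 1e-9

# Vdd in volts -> (energy per cycle in J, clock frequency in Hz).
# ASSUMPTION: the voltage pairing and the frequencies are not in the transcript.
LEVELS = {
    5.0: (40 * NJ, 50e6),
    4.0: (25 * NJ, 40e6),
    2.5: (10 * NJ, 25e6),
}


def evaluate(schedule):
    """Return (energy in J, run time in s) for a list of (vdd, cycles) pairs."""
    energy = sum(LEVELS[vdd][0] * cycles for vdd, cycles in schedule)
    run_time = sum(cycles / LEVELS[vdd][1] for vdd, cycles in schedule)
    return energy, run_time


case_a = evaluate([(5.0, 1_000_000_000)])                      # as fast as possible
case_b = evaluate([(5.0, 750_000_000), (2.5, 250_000_000)])    # two voltages
case_c = evaluate([(4.0, 1_000_000_000)])                      # single ideal voltage

for name, (energy, run_time) in zip("abc", (case_a, case_b, case_c)):
    print(f"case {name}: {energy:.1f} J in {run_time:.1f} s")
```

Under these assumptions case a) finishes 5 s early, while cases b) and c) both end exactly at the deadline; case c) still needs less energy because it avoids the expensive high-voltage cycles.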
33Observations
→ A minimum energy consumption is achieved for
the ideal supply voltage of 4 Volts. In the
following: variable voltage processor = processor
that allows any supply voltage up to a certain
maximum. It is expensive to support truly
variable voltages, and therefore, actual
processors support only a few fixed voltages.
[Ishihara, Yasuura: Voltage scheduling problem
for dynamically variable voltage processors,
Proc. of the 1998 International Symposium on Low
Power Electronics and Design (ISLPED'98)]
34Generalization
- Lemma [Ishihara, Yasuura]
- If a variable voltage processor completes a task
before the deadline, then the energy consumption
can be reduced.
- If a processor uses a single supply voltage V
and completes a task T just at its deadline, then
V is the unique supply voltage which minimizes
the energy consumption of T.
- If a processor can only use a number of discrete
voltage levels, then a voltage schedule with at
most two voltages minimizes the energy
consumption under any time constraint.
- If a processor can only use a number of discrete
voltage levels, then the two voltages which
minimize the energy consumption are the two
immediate neighbors of the ideal voltage Videal
possible for a variable voltage processor.
35The case of multiple tasks: Assigning optimum
voltages to a set of tasks
N: the number of tasks
ECj: the number of execution cycles of task j
L: the number of voltages of the target processor
Vi: the ith voltage, with 1 ≤ i ≤ L
Fi: the clock frequency for supply voltage Vi
T: the global deadline at which all tasks must have been completed
SCj: the average switching capacitance during the execution of task j (SCj comprises the actual capacitance CL and the switching activity α)
Xi,j: the number of clock cycles task j is executed at voltage Vi
36Designing an IP model
- Simplifying assumptions of the IP-model include
the following:
- There is one target processor that can be
operated at a limited number of discrete
voltages.
- The time for voltage and frequency switches is
negligible.
- The worst case number of cycles for each task
is known.
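The model itself is not part of the transcript. One formulation that is consistent with the symbols N, L, ECj, Vi, Fi, T, SCj and Xi,j and with the assumptions listed here is sketched below; it should be read as a reconstruction, not as the slide's own formula.

```latex
\text{minimize} \quad E = \sum_{j=1}^{N} \sum_{i=1}^{L} SC_j \cdot X_{i,j} \cdot V_i^{2}

\text{subject to} \quad
\forall j:\ \sum_{i=1}^{L} X_{i,j} = EC_j ,
\qquad
\sum_{j=1}^{N} \sum_{i=1}^{L} \frac{X_{i,j}}{F_i} \le T ,
\qquad
X_{i,j} \in \mathbb{N}_0
```

The first constraint makes every task execute all of its cycles, the second keeps the total run time within the global deadline, and switching overhead does not appear because it is assumed to be negligible.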
37Experimental Results
38Voltage Scheduling Techniques
- Static Voltage Scheduling
- Extension: Deadline for each task
- Formulation as IP problem (SS)
- Decisions taken at compile time
- Dynamic Voltage Scheduling
- Decisions taken at run time
- 2 Variants
- arrival times of tasks are known (SD)
- arrival times of tasks are unknown (DD)
39Dynamic Voltage Control by Operating Systems
Voltage Control and Task Scheduling by Operating
System to minimize energy consumption [Okuma,
Ishihara, and Yasuura: Real-Time Task Scheduling
for a Variable Voltage Processor, Proc. of the
1999 International Symposium on System Synthesis
(ISSS'99)]
- Target
- single processor system
- Only OS can issue voltage control instructions
- Voltage can be changed anytime
- only one supply voltage is used at any time
- overhead for switching is negligible
- static determination of worst case execution
cycles
40Problem for Operating Systems
[Figure: timing chart of Task1, Task2 and Task3 between arrival time and deadline, with supply voltage levels 2.5V, 4.0V and 5.0V]
What is the optimum supply voltage assignment for
each task in order to obtain minimum energy
consumption?
41The proposed Policy
Consider a time slot the task can use without
violating real-time constraints of other tasks
executed in the future
- Once time slot is determined
- The task is executed at a frequency of WCEC / T
Hz, where WCEC is the task's worst case execution
cycles and T is the length of the time slot
- The scheduler assigns start and end times of
time slot
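The frequency rule can be turned into a small selection routine. This is a minimal sketch under stated assumptions: the list of operating points (supply voltage and maximum clock frequency) is hypothetical, and on a processor with only discrete levels the lowest level that is still fast enough is chosen.

```python
# Hypothetical operating points, sorted by voltage: (Vdd in V, max clock in Hz).
OPERATING_POINTS = [(2.5, 25e6), (4.0, 40e6), (5.0, 50e6)]


def required_frequency(wcec, slot_start, slot_end):
    """Frequency in Hz needed to run wcec cycles within the time slot (WCEC / T)."""
    slot = slot_end - slot_start
    if slot <= 0:
        raise ValueError("time slot must have positive length")
    return wcec / slot


def lowest_feasible_point(wcec, slot_start, slot_end):
    """Pick the lowest-voltage operating point that meets the required frequency."""
    f_required = required_frequency(wcec, slot_start, slot_end)
    for vdd, f_max in OPERATING_POINTS:
        if f_max >= f_required:
            return vdd, f_max
    raise ValueError("task does not fit its slot even at the maximum voltage")


# A task with 100 million worst-case cycles in slots of different length.
print(lowest_feasible_point(100_000_000, slot_start=0.0, slot_end=4.0))
print(lowest_feasible_point(100_000_000, slot_start=0.0, slot_end=3.0))
```

The longer the slot the scheduler can grant, the lower the voltage that suffices, which is where the energy saving comes from.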
42Two Algorithms
- Two possible situations
- The arrival time of tasks is known
- SD Algorithm
- Static ordering and Dynamic voltage assignment
- The arrival time of tasks is unknown
- DD Algorithm
- Dynamic ordering and Dynamic voltage assignment
43SD Algorithm (CPU Time Allocation)
- Arrival time of all tasks is known
- Deadline of all tasks is known
- WCEC of all tasks is known
- CPU time can be allocated statically
- CPU time is assigned to each task
- assuming maximum supply voltage
- assuming WCEC
44SD Algorithm (Start Time Assignment)
- In SD, it is possible to assign lower supply
voltage to Task2 using the free time
- In SS, the scheduler can't use the free time
because it has statically assigned voltage
45DD Algorithm
When the task's arrival time is unknown, its end
time can't be predicted statically using the SD
algorithm → No predetermined CPU time, start or
end times
- Start Time Assignment
- New task arrives: it either
- Preempts currently executing task
- Starts right after currently executing task
- Starting time is determined
46DD Algorithm (cont.)
End Time Prediction: Based on the currently
executing task's end time prediction, add the new
task's WCEC time at maximum voltage
47DD Algorithm (cont.)
→ If the currently executing task finishes
earlier, then the new task can start sooner and run
slower at lower voltage
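The end time prediction and the reuse of slack can be sketched as follows. This is an illustration of the idea only, not the published DD algorithm in full: preemption is left out, the maximum clock frequency `F_MAX` is a hypothetical value, and a continuously adjustable frequency is assumed.

```python
F_MAX = 50e6   # hypothetical clock frequency at maximum supply voltage, in Hz


def predict_end(current_task_predicted_end, new_task_wcec, f_max=F_MAX):
    """Predicted end time of a new task queued right after the current one.

    The new task is budgeted with its WCEC executed at maximum voltage.
    """
    return current_task_predicted_end + new_task_wcec / f_max


def reclaimed_frequency(actual_start, predicted_end, wcec, f_max=F_MAX):
    """Frequency at which the task just reaches its predicted end time."""
    slot = predicted_end - actual_start
    if slot <= 0:
        raise ValueError("no time left before the predicted end")
    return min(f_max, wcec / slot)


wcec = 50_000_000
end_new = predict_end(current_task_predicted_end=2.0, new_task_wcec=wcec)

on_time = reclaimed_frequency(actual_start=2.0, predicted_end=end_new, wcec=wcec)
early = reclaimed_frequency(actual_start=1.0, predicted_end=end_new, wcec=wcec)
print(end_new, on_time, early)
```

If the current task ends on time, the new task must run at full speed; if it ends one second early in this example, half the clock frequency is enough, and the supply voltage can be lowered accordingly.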
48Comparison SD vs. DD
[Figure: two timing charts, one for SD and one for DD, each marking a task's start time and end time]
49Experimental Results Energy
Normal: Processor runs at maximum supply voltage
SS: Static Scheduling
SD: Scheduling done by SD Algorithm
DD: Scheduling done by DD Algorithm
50Dynamic power management (DPM)
Dynamic power management tries to assign optimal
power saving states
Requires hardware support
Example: StrongARM SA1100
- RUN: operational
- IDLE: a software routine may stop the CPU when not in use, while monitoring interrupts
- SLEEP: shutdown of on-chip activity
[Figure: power states RUN (400 mW), IDLE (50 mW) and SLEEP (160 µW)]
51The opportunity: Reduce power according to
workload
Desired: Shutdown only during long idle times →
Tradeoff between savings and overhead
52The challenge
- Questions
- When to go to a power-saving state?
- Is an idle period long enough for shutdown?
- Predicting the future
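Whether an idle period is long enough is often judged against a break-even time. The sketch below is one simple way to compute it and is not taken from the slides: it assumes that staying idle costs `p_idle` for the whole period, and that sleeping costs a fixed transition energy plus `p_sleep` for the remaining time. The example values reuse the StrongARM powers quoted earlier; charging the transitions at RUN power is an additional assumption.

```python
def break_even_time(p_idle, p_sleep, t_transition, e_transition):
    """Shortest idle period in seconds for which sleeping is not worse than idling.

    Idling for T costs p_idle * T. Sleeping costs
    e_transition + p_sleep * (T - t_transition), and T must cover t_transition.
    """
    if p_idle <= p_sleep:
        raise ValueError("sleep state must draw less power than idle state")
    t_energy = (e_transition - p_sleep * t_transition) / (p_idle - p_sleep)
    return max(t_transition, t_energy)


def should_shut_down(predicted_idle, t_break_even):
    """Shut down only if the predicted idle period reaches the break-even time."""
    return predicted_idle >= t_break_even


# Example values: IDLE 50 mW, SLEEP 160 µW, 90 µs to enter and 160 ms to leave SLEEP.
t_transition = 90e-6 + 160e-3
e_transition = 0.400 * t_transition   # ASSUMPTION: transitions drawn at RUN power
t_be = break_even_time(0.050, 160e-6, t_transition, e_transition)

print(round(t_be, 3))
print(should_shut_down(2.0, t_be), should_shut_down(0.5, t_be))
```

The remaining difficulty is the one named above: the length of the idle period is not known in advance and has to be predicted.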
53Adaptive Stochastic Models
Sliding Window (SW) [Chung, DATE'99]
- Interpolating pre-computed optimization tables to
determine power states
- Using sliding windows to adapt to non-stationarity
54Comparison of different approaches
P: average power
Nsd: number of shutdowns
Nwd: wrong shutdowns (actually waste energy)
55What about multitasking?
→ Coordinate multiple workload sources
[Figure: user, requesters (three programs), operating system with power manager, device]
56Requesters
- Concurrent processes
- Created, executed, and terminated
- Have different device utilization
- Generate requests only when running (occupy CPU)
- Power manager is notified when processes change
state
We use processes to represent requesters: requester =
process
57Task Scheduling
Rearrange task execution to cluster similar
utilization and idle periods
[Figure: timeline of tasks 1, 2 and 3 (t1, t2, t3) over time, with idle periods marked]
T: time quantum
58Power-aware OS implementations
- Windows APM and ACPI
- Device-centric, shutdown based
- Power-aware Linux
- Good research platform (several partial
implementations, e.g. U. Delft, Compaq, etc.)
- Quite high-overhead for low-end embedded systems
- Power-aware ECOS
- Good research platform (HP-Unibo implementation)
- Lower overhead than Linux, modular
- Micro OSes
59Application Aware DPM: Example: Communication
Power
NICs powered by portables reduce battery life
8 hours
- In general
- Higher bit rates lead to higher power consumption
- 90% of power for listening to a radio channel
- → Proper use of PHY layer services by MAC is
critical!
60Off mode power savings
[Figure: streaming from a server via an access point to a client with buffering. Labels: beacons, request, refill, playback, playing, buffer full, low water mark (LWM reached); NIC power over time in doze mode and off mode (energy saving)]
61LWM / Buffer characteristics
Where to put the LWM?
- Low LWM:
- Higher error probability
- Exploits NIC off-state
- Min. value to allow data acquisition
- High LWM:
- Lower error probability
- Incurs NIC off-state overhead
- Max. value: Buffer_length - 1 block
How long should the buffer be?
- Depends on memory availability
- The longer the buffer, the higher the NIC
off-state benefits
Buffering Strategies should be Power Aware!
62Comparison
- Low length buffers incur off mode power overhead
- Good power saving for high length buffers
63Exploiting application knowledge
- Approximate processing [Chandrakasan98-01]
- Trade off quality for energy (e.g. lossy compression)
- Design algorithms for graceful degradation
- Enforce power-efficiency in programming
- Avoid repetitive polling [Intel98]
- Use event-based activation (interrupts)
- Localize computation whenever possible
- Helps shutdown of peripherals
- Helps shutdown of memories