Title: Power Aware Software Architecture
1. Power Aware Software Architecture
- Rajesh K. Gupta
- University of California, San Diego
- Cristiano Pereira, Ravindra Jejurikar, Yuvraj Agrawal, Manjari Chachharia, Sandeep Shukla, Mukesh Rajan
- In collaboration with Sandy Irani, Mani Srivastava
2. Computing In New Spaces
- Generational shift in computing devices
  - a lot more of everything, including networking and communications
  - a lot less of power, energy, volume, weight, patience
- Application is everything; the possibilities are limitless
- System architectures are due for an overhaul
  - the architectures are (radically) changed/challenged
  - the programming context is changed
  - the system software contract is changed
  - new awareness: location, power, timing, reactivity, stability
3. Outline
- The case for power awareness in
  - application development
  - system software
- Managing power in the OS
  - knobs and strategies
- Making software power aware
  - the hardware knobs (DVS, DPM)
  - the application knobs (duty cycling, criticality, aesthetics)
- An ongoing experiment
4. The Case for Power Awareness
- Limited availability
- Energy and power use of new devices is markedly different from that of laptops and notebook computers
  - much wider dynamic range of power demand
  - increasing share of memory, communication and signal processing
  - multiple power use modalities depending upon application
    - immortal, paging-mode RX, lifeline TX, mission-mode
5. Power Management Places
- Hardware, firmware
  - many techniques for low power design in circuits, architectures, etc.
  - but they don't know the global state and lack application-specific knowledge
- Users
  - don't know component characteristics, and can't make frequent decisions
- Applications
  - operate independently
  - and the OS hides machine information from them
- OS (role in resource allocation and sharing)
  - it is a logical place for dynamic power management
  - early results show 50-70% savings due to f/v scaling [Weiser94]
  - but application-specific constraints and opportunities for saving energy can be known only at the application level
6. Operating System Directed Power Management
- Significant opportunities in power management lie with application-specific knobs
  - quality of service, timing criticality of various functions
- Needs of applications are the driving force for OS power management functions: power-based API
  - collaboration between applications and the OS in setting energy use policy
  - OS helps resolve conflicts and promote cooperation
- OS is the most reasonable place, but
  - OS should incorporate application information in power management
  - OS should expose power state and events to applications for them to adapt.
7. Power Savings Mechanisms
- A. Dynamic Power Management (DPM)
  - When a device is idle, it can transition to low-power sleep states.
  - Current trend is to design devices with multiple sleep states and provide device driver hooks to change these states under OS control.
- B. Dynamic Voltage Scaling (DVS)
  - A device can be run at different speeds at different power levels
  - Execution of jobs can be slowed down to save power as long as all jobs are completed by their deadlines.
- C. Application level knobs
  - quality and performance measures, application tolerances
8. A. Dynamic Power Management
- When a device becomes idle, it can transition to a lower power usage state.
- A fixed amount of additional time and energy is required to transition back to the active state when a new request for service arrives.
- What is the best time threshold to transition to the sleep state?
  - Too soon: pay the start-up cost too frequently.
  - Too late: spend too much time in the high-power state.
- Generally, transition to the sleep state when the cost of being in the active state is at least the cost of waking up.
9. Our Work In This Context
- We have developed quantitative bounds on the quality of DPM algorithms based on Competitive Analysis [TCAD 01]
  - provides a basis for DPM strategy comparison
- Developed DPM strategies for devices with both multiple active and multiple sleep states [TCAD 02]
- Designed and analyzed algorithms for systems that allow both DPM and DVS [SODA 03, TECS 02]
- Important conclusions
  - Not all power states are useful in a given DPM strategy
  - DPM is generally useful for improving quality measures.
10. Competitive Analysis
- Deterministic algorithm (ski rental)
  - Transition to the sleep state when the cost of being in the active state is at least the cost of waking up.
  - Normalize the cost of transitioning from the sleep state to the active state to 1.
  - Power consumption rate of the active state is $\alpha$.
  - This algorithm is 2-competitive.
  - 2 is the best possible competitive ratio for any deterministic algorithm.
- Probabilistic algorithm
  - Idle period length is generated by a known distribution with density function $p(t)$.
  - Choose threshold $T$ to minimize the expected cost
  - For any distribution $p(t)$, the expected cost of the above algorithm is within $e/(e-1)$ of the optimal cost. Furthermore, there is a distribution for which no algorithm can be better than $e/(e-1)$ times optimal.
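The deterministic (ski rental) rule above can be sketched in C. This is a minimal illustration, not the authors' code: the function names are hypothetical, sleep-state power is assumed to be 0, the wake-up cost is normalized to 1 as on the slide, and `alpha` stands for the active-state power rate.

```c
#include <stddef.h>

/* Energy used over one idle period of length t by the threshold policy:
 * stay active (rate alpha) until time T, then sleep and pay the
 * normalized wake-up cost of 1 when the next request arrives.
 * Sleep power is assumed to be 0. */
double dpm_threshold_cost(double alpha, double T, double t)
{
    if (t < T)
        return alpha * t;      /* request arrived before the device slept */
    return alpha * T + 1.0;    /* idled until T, then paid the wake-up */
}

/* Offline optimum: it knows t in advance, so it either stays active for
 * the whole period or sleeps immediately, whichever is cheaper. */
double dpm_optimal_cost(double alpha, double t)
{
    double stay = alpha * t;
    return stay < 1.0 ? stay : 1.0;
}

/* Break-even threshold: the idle time at which the energy spent in the
 * active state equals the wake-up cost. */
double dpm_break_even(double alpha)
{
    return 1.0 / alpha;
}

/* Worst ratio of online cost to offline cost over a set of idle period
 * lengths. Every idle[i] must be strictly positive. */
double dpm_worst_ratio(double alpha, const double *idle, size_t n)
{
    double T = dpm_break_even(alpha);
    double worst = 0.0;
    for (size_t i = 0; i < n; i++) {
        double ratio = dpm_threshold_cost(alpha, T, idle[i])
                     / dpm_optimal_cost(alpha, idle[i]);
        if (ratio > worst)
            worst = ratio;
    }
    return worst;
}
```

With the break-even threshold, the ratio never exceeds 2: idle periods shorter than the threshold cost the same as the optimum, and longer ones cost at most twice the wake-up cost.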
11. Multi-state DPM Case
- Let there be $k+1$ states
- Let State $k$ be the shut-down state and State 0 be the active state
- Let $\alpha_i$ be the energy dissipation rate at state $i$
- Let $\beta_i$ be the total energy dissipated to move back to State 0
- States are ordered such that $\alpha_{i+1} \le \alpha_i$
- $\alpha_k = 0$ and $\beta_0 = 0$ (without loss of generality).
  - Power down energy cost can be incorporated in the power up cost for analysis (if additive).
- Now formulate an optimization problem to determine the state transition thresholds.
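12. Lower Envelope Idea
- For each state $i$, plot the energy $\alpha_i t + \beta_i$ against idle time $t$ ($\alpha_i$: energy dissipation rate in state $i$; $\beta_i$: energy to return to State 0).
- [Figure: energy versus time, one line per state (State 1 to State 4); the lower envelope of the lines changes state at thresholds t1, t2, t3.]
- LEA (Lower Envelope Algorithm) can be deterministic or probabilistic
- PLEA is $e/(e-1)$ competitive.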
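The lower envelope idea can be sketched in C as follows. The function names are hypothetical; the arrays `alpha` and `beta` hold, for each state, the energy dissipation rate and the energy needed to return to the active state. A state whose line never touches the envelope is never selected, which illustrates why not every power state is useful.

```c
#include <stddef.h>

/* Energy spent if the device sits in state i for an idle period of
 * length t and then returns to the active state. */
double lea_energy(const double *alpha, const double *beta, size_t i, double t)
{
    return alpha[i] * t + beta[i];
}

/* State on the lower envelope at idle time t: the state with the least
 * energy among n states. Ties go to the lower-numbered state. */
size_t lea_state_at(const double *alpha, const double *beta, size_t n, double t)
{
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (lea_energy(alpha, beta, i, t) < lea_energy(alpha, beta, best, t))
            best = i;
    return best;
}

/* Idle time at which the lines of states i and j cross.
 * Requires alpha[i] > alpha[j]. */
double lea_crossover(const double *alpha, const double *beta, size_t i, size_t j)
{
    return (beta[j] - beta[i]) / (alpha[i] - alpha[j]);
}
```

For illustrative values `alpha = {4, 2, 1.5, 0}` and `beta = {0, 2, 6, 8}`, the envelope moves from state 0 to state 1 at t = 1 and from state 1 directly to state 3 at t = 3; state 2 is skipped.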
13. Power-Latency Tradeoff
- Tasks arrive through time and take time to run
- If the device is busy when a task arrives, the task waits in a queue
- An idle period begins when the device finishes the current job and the queue is empty
- If the device transitions to a sleep state in an idle period, some latency is incurred as the device transitions back to the active state.
  - This in turn affects (shortens) the length of future idle periods.
- Power-Latency tradeoff extremes
  - Minimize latency: always stay in the active state.
  - Minimize energy usage: delay completing any tasks until they have all arrived.
14. Experimental Study: IBM Mobile Hard Drive
Trace data with arrival times of disk accesses from the Auspex file server archive.
15. IBM Mobile Hard Drive
16. B. Dynamic Voltage Scaling
- Device which can run at any speed $s$.
  - Power consumed when running at speed $s$ is given by a convex function $P(s)$.
- Jobs arrive through time. Job $j$ has
  - Arrival time $a_j$
  - Deadline $b_j$
  - Work required $R_j$
- Schedule $S = (s, job)$
  - $s(t)$ is the speed of the device at time $t$.
  - $job(t)$ is the job executed at time $t$.
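17. Dynamic Voltage Scaling - No Sleep (DVS-NS)
- Schedule $S$ is feasible for a set of jobs $J$ if, for every $j$ in $J$, the work done on $j$ between its arrival time and its deadline covers $R_j$:
  $\int_{a_j}^{b_j} s(t)\,[job(t) = j]\,dt \ge R_j$
- Cost of Schedule $S$ is the total energy consumed: $\int P(s(t))\,dt$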
18. DVS with Sleep State (DVS-S)
- Schedule $S = (s, job, h)$
  - $h(t)$ is either sleep or on
  - If $h(t)$ = sleep, then $s(t) = 0$.
- Power is a function of speed and state
  - $P(s, state) = P(s)$ if state = on.
  - $P(s, state) = 0$ if state = sleep.
  - $P(0) = \alpha$ is the power required to keep the device active with no tasks running.
- Let $k$ be the number of times the device transitions from the sleep state to the on state
- Cost of a schedule $S$ is the energy consumed, $\int P(s(t), h(t))\,dt$, plus $k$ times the cost of one sleep-to-on transition
19. Critical Speed
- If the cost to transition from the sleep state to the on state were 0, the optimal speed for all jobs would be the $s$ that minimizes $(R_j/s)\,P(s)$
  - This is the $s$ that satisfies $P(s) = s\,P'(s)$.
  - Call this $s_{crit}$, the critical speed for $\alpha$, where $\alpha = P(0)$.
- If we compress the execution of a task by $x$,
  - we expend additional energy because we execute the job faster
  - we save $\alpha x$.
- $s_{crit}$ is the point at which it is no longer beneficial to compress the execution of a task.
- Our approach
  - Decide on Active/Idle intervals (determined by critical speed)
  - Decide on Sleep/On intervals (determined by the cost of staying on)
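As a worked example of the critical speed condition, assume a power function of the form $P(s) = \alpha + c\,s^3$. The cubic term and the constant $c$ are assumptions chosen for illustration and do not come from the slides; $\alpha = P(0)$ is the idle power.

```latex
\begin{align*}
E_j(s) &= \frac{R_j}{s}\,P(s) = R_j\left(\frac{\alpha}{s} + c\,s^2\right) \\
\frac{dE_j}{ds} &= R_j\left(-\frac{\alpha}{s^2} + 2c\,s\right) = 0
  \quad\Longrightarrow\quad s_{crit}^3 = \frac{\alpha}{2c} \\
\text{Check: } P(s) &= s\,P'(s)
  \iff \alpha + c\,s^3 = 3c\,s^3
  \iff s^3 = \frac{\alpha}{2c}
\end{align*}
```

Under this assumed model the critical speed grows with the idle power: the more it costs to stay on, the faster jobs should run so that the device can sleep sooner.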
20. Implementing DVS
- Often done using slowdown factors
  - can be static or dynamic
- For example
  - Given a frequency range of $[f_{min}, f_{max}]$
  - The slowdown factor is the frequency scaled to $[\eta_{min}, 1]$, where $\eta_{min} = f_{min}/f_{max}$.
  - When we use a slowdown factor of $\eta$, we set the frequency to $f = \eta\,f_{max}$.
  - The voltage is changed to the minimum voltage supported at $f$.
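A minimal C sketch of the mapping from a slowdown factor to an operating point, assuming the processor offers only a discrete table of frequency/voltage pairs. The struct, the function name and any table values are hypothetical. The requested frequency is rounded up to the next supported level so that the slowdown never exceeds what was asked for.

```c
#include <stddef.h>

/* One supported operating point: a frequency and the minimum voltage
 * that sustains it. */
struct op_point {
    unsigned freq_mhz;
    unsigned volt_mv;
};

/* Pick the slowest operating point whose frequency is at least
 * eta * f_max. `levels` must be sorted by ascending frequency and hold
 * n >= 1 entries. Factors at or below f_min / f_max select the slowest
 * level; factors at or above 1 select the fastest. */
const struct op_point *select_op_point(const struct op_point *levels,
                                       size_t n, double eta)
{
    double f_max = (double)levels[n - 1].freq_mhz;
    double eta_min = (double)levels[0].freq_mhz / f_max;

    if (eta <= eta_min)
        return &levels[0];
    if (eta >= 1.0)
        return &levels[n - 1];

    double target = eta * f_max;
    for (size_t i = 0; i < n; i++)
        if ((double)levels[i].freq_mhz >= target)
            return &levels[i];
    return &levels[n - 1];
}
```

Rounding up rather than to the nearest level keeps any schedulability guarantee that was derived for the requested factor.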
21. Slowdown Factors
- Much of the work on slowdown factors has been in the context of real-time systems
  - makes sense, since we need something to trade off against the power saved
- Known results
  - Essentially use schedulability tests to determine the amount of slowdown possible
  - Along with the attendant assumptions and almost a repeat of Real Time research history...
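To make the link between schedulability tests and slowdown concrete, here is a small C sketch that derives a static slowdown factor from the EDF utilization test for periodic tasks with deadline equal to period. EDF is used here only because its test is a single inequality; it is a stand-in and not necessarily the test used in this work. All names are hypothetical.

```c
#include <stddef.h>

struct periodic_task {
    double wcet;    /* worst-case execution time at full speed */
    double period;  /* period, also used as the relative deadline */
};

/* Total utilization at full speed: the sum of wcet / period. */
double utilization(const struct periodic_task *tasks, size_t n)
{
    double u = 0.0;
    for (size_t i = 0; i < n; i++)
        u += tasks[i].wcet / tasks[i].period;
    return u;
}

/* Static slowdown factor under EDF. Running at speed eta stretches each
 * execution time by 1 / eta, so the set stays schedulable while
 * U / eta <= 1, that is eta >= U. The result is raised to eta_min if
 * needed. Returns -1.0 when the set is infeasible even at full speed. */
double static_slowdown_edf(const struct periodic_task *tasks, size_t n,
                           double eta_min)
{
    double u = utilization(tasks, n);
    if (u > 1.0)
        return -1.0;
    return u < eta_min ? eta_min : u;
}
```

A task set with utilization 0.7 can therefore run at 70% of the maximum frequency, provided the hardware supports a level that low.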
22. C. Enable Application Knobs
- Need API: provide ways by which the application, OS and hardware can exchange energy/power and performance related information efficiently.
- Need Middleware: facilitate a continuous dialogue / adaptation between OS and applications.
- Need HAL: facilitate the implementation of power aware OS services by providing a software interface to low power devices
23. Power-aware API Requirements
- Independent of hardware and RTOS implementations
  - enables its use on different hardware platforms
    - for this, all routines should access the HAL (Hardware Abstraction Layer) rather than the hardware directly
  - enables its use in different RTOSes as well as its use with different scheduling strategies
    - do not count on specific RTOS info and/or specific schedulers
- Services provided
  - processor frequency scaling and low-power state transitions
    - with costs of making such transitions
  - battery status (if the system is battery based)
  - appropriate routines to control energy-speed and energy-accuracy knobs available on I/O devices
    - network interface, serial interface, LCD, etc.
24. Power-aware API
- The application interface provides the following services
- The application is able to
  - tell RT information to the OS (period, deadlines, WCET, hardness)
  - create new threads
  - tell the OS the time predicted to finish a given task instance
    - depending on the conditions of the environment (application dependent and not yet implemented)
- The OS must be able to predict and tell applications the time estimated to finish the task
  - depends on the scheduling scheme used
- A hard task must be killed if its deadline is missed
  - matter of policy in the context of application use.
25. A Power-Aware Software Architecture
26. Power Aware Software Architecture
- PA-API (Power Aware API)
  - interfaces applications and the OS, making the power aware OS services available to the application writer.
- PA-OSL (Power Aware Operating System Layer)
  - implements modified OS services and active components such as a DPM manager.
- PA-HAL (Power Aware Hardware Abstraction Layer)
  - interfaces the OS and hardware, making the power control knobs available to the OS programmer.
27. Software Architecture
- PA-API - Power aware function calls available to the application writer.
  - Some functions of this layer are specific to certain scheduling techniques.
- PA-Middleware - Power aware services
  - implemented on top of the OS (power management threads, data handling, etc.).
- POSIX - Standard interface for OS system calls.
  - This isolates PA-API and PA-Middleware from the OS.
- PA-OSL - Power aware OS layer.
  - Calls related to modified OS services should go through this level. Also isolates the OS from PA-API and PA-Middleware.
- PA-HAL - Power Aware Hardware Abstraction Layer.
  - Isolates the OS from the underlying power aware hardware.
- Modified OS services
  - Implementation / modification of OS services in a power related fashion, e.g. scheduler, memory manager, I/O, etc.
28. Layer Functionality
29. DVS Related Functions
- paapi_dvs_create_thread_type(), paapi_dvs_create_thread_instance()
  - create the type and an instance of a task, respectively
- paapi_dvs_app_started(), paapi_dvs_app_done()
  - delimit the execution of useful work in a thread. Tell the OS whether the task has finished execution or not.
- paapi_dvs_get_time_prediction(), paapi_dvs_set_time_prediction()
  - get and set the current execution time prediction for a given thread
- paapi_dvs_set_adaptive_param()
  - sets the parameters of the adaptive policy (described later) for a given task.
- paapi_dvs_set_policy()
  - chooses the policy to be used for DVS
30. DVS Related Functions (contd.)
- paosl_dvs_create_task_type_entry(), ...
  - create a type and an instance of a thread in the kernel internal tables of types and instances, respectively
- paosl_dvs_killer_thread()
  - kills a thread that missed a deadline
- pahal_dvs_initialize_processor_pm()
  - initializes structures for processor power management
- pahal_dvs_get_current_frequency(), pahal_dvs_set_frequency_and_voltage(), pahal_dvs_pre_set_frequency_and_voltage(), pahal_dvs_get_frequency_levels_info(), pahal_dvs_post_set_frequency_and_voltage()
  - functions to switch the processor among the possible frequency levels
- pahal_dvs_get_lowpower_states_info(), pahal_dvs_set_lowpower_state()
  - functions to switch the processor among low power states
31. DPM Functions
- paapi_dpm_register_device()
  - registers the device to be power managed
- paosl_dpm_deamon()
  - implements the actual policy for a specific device. This daemon uses PA-HAL functions to decide on how to switch devices among all possible states.
- pahal_dpm_device_switch_state()
  - switches the device's state
- pahal_dpm_device_check_activity()
  - checks whether the device has been idle and for how long. This function needs support from the device driver.
- pahal_dpm_device_get_info(), pahal_dpm_device_get_curr_state()
  - get information about the device and about its current state, respectively
- Others
  - functions for helping implement power policies. For example, pahal_battery_get_info() gets battery status
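How the DPM daemon might combine these PA-HAL calls is sketched below in C. The function names come from the list above, but their signatures, the `struct device` type and the two-state model are assumptions made for illustration; the real API specification may differ. The HAL calls are replaced by trivial stand-ins that operate on a simulated device.

```c
#include <stddef.h>

enum dev_state { DEV_ACTIVE = 0, DEV_SLEEP = 1 };

/* Simulated device; a real port would query the device driver instead. */
struct device {
    enum dev_state state;
    double idle_seconds;        /* time since the last request, 0 if busy */
    double break_even_seconds;  /* idle time after which sleeping pays off */
};

/* Stand-ins for the PA-HAL calls. The signatures are guesses. */
double pahal_dpm_device_check_activity(const struct device *d)
{
    return d->idle_seconds;
}

enum dev_state pahal_dpm_device_get_curr_state(const struct device *d)
{
    return d->state;
}

void pahal_dpm_device_switch_state(struct device *d, enum dev_state s)
{
    d->state = s;
}

/* One pass of a threshold policy. Returns 1 if the state changed. */
int dpm_daemon_step(struct device *d)
{
    double idle = pahal_dpm_device_check_activity(d);
    enum dev_state cur = pahal_dpm_device_get_curr_state(d);

    if (cur == DEV_ACTIVE && idle >= d->break_even_seconds) {
        pahal_dpm_device_switch_state(d, DEV_SLEEP);
        return 1;
    }
    if (cur == DEV_SLEEP && idle == 0.0) {  /* a new request has arrived */
        pahal_dpm_device_switch_state(d, DEV_ACTIVE);
        return 1;
    }
    return 0;
}
```

Keeping the policy in one step function and the hardware access behind the HAL calls mirrors the layering described earlier: the policy can be swapped without touching device-specific code.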
32. Current Status
- API specification available from
  - http://www.ics.uci.edu/cpereira/pads/
- Implementation
  - eCos RTOS
    - open source, object oriented and highly configurable RTOS (by means of a scripting language)
- Hardware platforms we are currently working with
  - Linux-synthetic (emulation of eCos over Linux, for debugging purposes only)
  - Compaq iPaq Pocket PC - StrongARM SA1110 based platform
  - Accelent IDP (Integrated Development Platform) - also StrongARM SA1110 based.
  - LRH Intel evaluation board 80200EVB - Intel XScale based
33. DPM Algorithms Implemented
- A predictive RMS low-power scheduler
  - It validates the power-aware API implementation
  - assumes periodic tasks and deadline = period
- The predictive scheduler implementation is divided as follows
  - tables and variables manipulation
  - admission control and static slowdown factor
  - dynamic slowdown factor computation (time prediction)
  - deadline management (hard deadline tasks)
- The processor frequency and voltage are scaled according to the time predicted by the OS
- The application can also predict the execution time in order to enhance accuracy.
34. Experiments - XScale Processor
- For varying voltage
- All measurements taken while executing a busy loop
35. Using Power Aware OS: Example
- The scheduler adapts the frequency according to the real-time parameters passed in as parameters on the thread type.
- The frequency is adjusted by means of slowdown factors (a factor can also speed up the processor if it is > 1).

    void main()
    {
        /* arguments: period and deadline (both 100), WCET (30), hardness;
           a hard thread instance is killed when its deadline is missed */
        mpeg_decoding_t = paapi_dvs_create_thread_type(100, 30, 100, hard);
        /* selects the DVS policy for all threads */
        paapi_dvs_set_policy(SHUTDOWN | STATIC | DYNAMIC | ADAPTIVE);
        paapi_dvs_create_thread_instance(mpeg_decoding_t, mpeg_decode_thread);
        ...
    }

    void mpeg_decode_thread()
    {
        for (;;) {
            paapi_dvs_app_started();
            /* original code */
            mpeg_frame_decode();
            paapi_dvs_app_done();
        }
    }
36. An Experiment
- Application + OS running on the 80200 XScale board
- Altera FPGA board generating interrupts to wake up the processor
- Maxim board providing voltage scaling
- Host PC for debugging and for loading the App. + OS onto the board
37. The Experiment with DVS
- Shutdown when idle
  - as soon as the CPU becomes idle, shut down the processor
- Shutdown + static slowdown factors
  - offline slowdown factors are applied. The CPU is shut down when idle.
- Shutdown + static slowdown + dynamic slowdown
  - run-time slowdown factors are computed based on a history of execution times, in addition to the static factors and shutdown
- Shutdown + static slowdown + dynamic slowdown + adaptive slowdown
  - a deadline-driven factor is also applied in addition to the other factors and shutdown. This factor adapts itself according to the number of deadlines missed in a previous window of executions.
38. DVS Experiment
- Four parameters are defined for the adaptive factor
  - Number of tolerable deadline misses (D) every W executions
  - Window size (W)
  - Lower bound for the factor (L)
  - Increment and decrement steps (Inc and Dec)
- For every W executions
  - if the number of deadlines missed is less than D
    - lower the adaptive factor by Dec if it is greater than L, otherwise keep it as it was.
  - if the number of deadlines missed is greater than D
    - increment the adaptive factor by Inc.
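The window rule above can be written as a short C sketch. The struct and function names are hypothetical. The rule is implemented literally: the slide does not say what happens when exactly D deadlines are missed, so the factor is left unchanged in that case, and it does not say whether the factor is clamped at L after a decrement, so no clamping is done.

```c
struct adaptive_params {
    int    d;      /* tolerable deadline misses per window (D) */
    int    w;      /* window size in executions (W) */
    double lower;  /* lower bound for the factor (L) */
    double inc;    /* increment step (Inc) */
    double dec;    /* decrement step (Dec) */
};

struct adaptive_state {
    double factor;      /* current adaptive factor */
    int    executions;  /* executions seen in the current window */
    int    missed;      /* deadlines missed in the current window */
};

/* Record one finished execution. When a window of W executions is
 * complete, update the factor and start a new window. Returns the
 * current factor. */
double adaptive_record(struct adaptive_state *s,
                       const struct adaptive_params *p,
                       int missed_deadline)
{
    s->executions++;
    if (missed_deadline)
        s->missed++;
    if (s->executions < p->w)
        return s->factor;

    if (s->missed < p->d) {
        if (s->factor > p->lower)
            s->factor -= p->dec;   /* few misses: slow down further */
    } else if (s->missed > p->d) {
        s->factor += p->inc;       /* too many misses: speed up */
    }
    /* exactly D misses: factor unchanged (not specified on the slide) */

    s->executions = 0;
    s->missed = 0;
    return s->factor;
}
```

A caller would invoke `adaptive_record()` once per task instance, for example from the point where the thread reports that its work is done.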
39. Application Set
- Three different real applications running concurrently
  - An MPEG2 decoder
  - An ADPCM (Adaptive Differential Pulse Code Modulation) speech encoder
  - A floating point FFT application
40. Task Set
- We used three tasksets based on the applications described earlier, as shown in the table below
41. Frequency and Voltage Scaling
- For the 4 schemes and the 3 tasksets, we measured processor power consumption using a shunt resistor and a DAQ board.
- The voltage of the XScale processor is dynamically varied according to the frequency, as in the table below
42. Results: Taskset A
- The column "deadlines missed" shows the number of deadlines missed per task (T1, T3, T4) for a total of 415/207/138 executions, respectively. For the adaptive algorithm, M varies as the number between parentheses; Inc = 0.1, Dec = 0.5, W = 10 and D = 20
43. Results: Taskset B
- The column "deadlines missed" shows the number of deadlines missed per task (T2, T3, T4) for a total of 130/65/43 executions, respectively
44. Results: Taskset C
- The column "deadlines missed" shows the number of deadlines missed per task (T1, T3, T5) for a total of 130/65/43 executions, respectively
45. OS-directed DVS Results
46. Using Application-level Knob
- Example: Image Compression Algorithm
  - trade off image quality against the energy available by varying compression parameters such as BPP (bits per pixel)
- The image compression algorithm is run in a continuous loop with battery polling every 10 seconds.
- A simple power tradeoff policy is added to adapt the quality of the image to the battery voltage left.
  - Whenever the battery voltage drops by 30 mV, the application adjusts the image BPP by -0.5, starting at 1.5.
- For a cut-off of 4020 mV, the battery life is extended from 290 seconds to 340 seconds.
47.
- The battery life is extended by 18% with a slight (not noticeable by the human eye) degradation of image quality
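The battery-driven quality policy can be sketched in C as below. The function names are hypothetical, and the 0.5 BPP floor is an assumption: the slides give the starting value and the step but not the lowest quality used.

```c
/* Image quality in bits per pixel for the current battery voltage.
 * Quality starts at 1.5 BPP and drops by 0.5 for every full 30 mV the
 * battery has fallen below its starting voltage. The result is never
 * lower than 0.5 BPP (assumed floor). */
double bpp_for_battery(int start_mv, int now_mv)
{
    const double start_bpp = 1.5;
    const double step_bpp = 0.5;
    const double min_bpp = 0.5;   /* assumption, not from the slides */
    const int step_mv = 30;

    int drop = start_mv - now_mv;
    if (drop < 0)
        drop = 0;                 /* voltage recovered: keep full quality */

    double bpp = start_bpp - step_bpp * (double)(drop / step_mv);
    return bpp < min_bpp ? min_bpp : bpp;
}

/* Nonzero while the battery is still above the cut-off voltage. */
int battery_ok(int now_mv, int cutoff_mv)
{
    return now_mv > cutoff_mv;
}
```

In the experiment's loop, the application would poll the battery every 10 seconds, call `bpp_for_battery()` with the reading, and stop once `battery_ok()` reports that the 4020 mV cut-off has been reached.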
48. Concluding Remarks
- Computers with radios present a very wide range of system optimization opportunities for power and performance
- Efficient power and energy management is key to enabling a new range of applications
- Energy efficiency is a system-level concern that cuts across components, functionality layers and implementations
- Application programming needs to be energy aware and provide knobs for the system designer to incorporate in DPM.
49. Yes, but Microsoft...