Title: Evaluating a DVS Scheme for Real-Time Embedded Systems
1Evaluating a DVS Scheme for Real-Time Embedded
Systems
- Ruibin Xu, Daniel Mossé and Rami Melhem
2Introduction
- Energy conservation is important for real-time embedded systems
- Dynamic Voltage Scaling (DVS) is effective in power management
- A popular problem: minimizing energy consumption while meeting the deadlines
3Focus
- Frame-based systems that execute variable workloads
- The problem becomes minimizing the expected energy consumption while meeting the deadlines
4A New DVS Scheme (MEEC)
[Diagram labels: original problem, relax, simplified problem, efficient algorithm, optimal solution, fix, practical solution, Evaluations; references: emsoft05, parc05]
5Task and System Model
- N periodic tasks T1, T2, ..., TN to be executed consecutively in each frame
- The power function is $p(f) = c_0 + c_1 f^{\alpha}$
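The power function determines the energy a task consumes, since energy is power multiplied by execution time. A short derivation, assuming a task executes $w$ cycles at a constant frequency $f$ (the symbols $w$ and $E$ are introduced here for illustration only):

```latex
E(w, f) = p(f)\,\frac{w}{f} = c_0\,\frac{w}{f} + c_1\, w\, f^{\alpha - 1}
```

When $c_0 = 0$ and $\alpha > 1$, the energy grows with $f$, so running as slowly as the deadline allows saves energy. A nonzero $c_0$ penalizes very low speeds because the task takes longer to finish.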
6Review of Existing Schemes
Proportional Scheme
Greedy Scheme
Statistical Scheme
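The slide lists the schemes by name only. The sketch below shows how the Proportional and Greedy schemes are commonly described in the DVS literature, which is an assumption about what the talk intends rather than the authors' code. The function names, the cycle counts, and the time units are hypothetical, and the Statistical scheme is left out because its exact rule is not stated here.

```python
def allotted_time(scheme, i, wcec, time_left, f_max):
    """Time budget for task i (0-based) when it starts with time_left remaining.

    wcec holds the worst-case execution cycles of every task in the frame.
    Assumes the frame is feasible, i.e. sum(wcec[i:]) / f_max <= time_left.
    """
    if scheme == "proportional":
        # Split the remaining time in proportion to the worst-case cycles
        # of the tasks that have not run yet.
        return time_left * wcec[i] / sum(wcec[i:])
    if scheme == "greedy":
        # Reserve worst-case time at f_max for the later tasks and give
        # everything else to the task that is about to run.
        return time_left - sum(wcec[i + 1:]) / f_max
    raise ValueError(f"unknown scheme: {scheme}")


def speed(scheme, i, wcec, time_left, f_max):
    """Frequency that finishes the worst case of task i within its budget."""
    return min(f_max, wcec[i] / allotted_time(scheme, i, wcec, time_left, f_max))
```

With `wcec = [10, 20, 30]`, `time_left = 100`, and `f_max = 1.0`, the Proportional sketch starts the first task at speed 0.6, while the Greedy sketch starts it at 0.2 and leaves less room for the later tasks.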
7The MEEC Scheme
- Incorporates the variability of the tasks into the speed schedule
- The variability of the tasks is captured by the probability density function of the workload of the tasks
- Aims to minimize the expected energy consumption in the system
[Figure: probability density of a task's workload; axes: workload, probability]
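The quantity being minimized is an expectation over the workload distribution. As a minimal illustration for a single task, assume the distribution is given as a discrete histogram and the task runs at one constant frequency. Both are simplifying assumptions, and `expected_energy` and its parameters are hypothetical names.

```python
def expected_energy(pmf, f, c0=0.0, c1=1.0, alpha=3.0):
    """Expected energy of one task run at constant frequency f.

    pmf is a list of (cycles, probability) pairs whose probabilities sum to 1.
    The power model is p(f) = c0 + c1 * f**alpha.
    """
    power = c0 + c1 * f ** alpha
    # Energy for a given workload is power * (cycles / f); average over the pmf.
    return sum(prob * power * cycles / f for cycles, prob in pmf)
```

For example, `expected_energy([(2, 0.5), (4, 0.5)], f=2.0)` averages the energy of a 2-cycle run and a 4-cycle run, each weighted by probability 0.5.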
8The MEEC Scheme
[Figure: slack and β1]
9An Important Property
The optimal expected energy consumption for
10Computing βi
[Figure: tasks T1, T2, T3, T4]
11Applying PACE
- PACE is a technique in which the execution speed
is gradually increased as the task progresses
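The gradual speed increase can be sketched for a task whose workload distribution is a histogram. The sketch assumes a cubic power function p(f) = f^3 and unrestricted continuous frequency. Under those assumptions, minimizing expected energy subject to finishing the worst case on time gives a speed proportional to the inverse cube root of the probability that a given cycle is reached. The function name, the bin layout, and the time budget are hypothetical.

```python
def pace_speeds(bins, time_allotted):
    """Speed for each histogram bin of a task's workload.

    bins is a list of (cycles_in_bin, probability_workload_ends_in_bin).
    Every bin is assumed to have positive probability, and the
    probabilities are assumed to sum to 1.
    """
    tail = 1.0  # probability that execution reaches the current bin
    weights = []
    for _, prob in bins:
        # Cycles that are less likely to be executed can run faster.
        weights.append(tail ** (-1.0 / 3.0))
        tail -= prob
    # Scale all speeds so the worst case uses exactly time_allotted.
    unscaled_time = sum(cycles / w for (cycles, _), w in zip(bins, weights))
    scale = unscaled_time / time_allotted
    return [scale * w for w in weights]
```

The returned speeds never decrease from one bin to the next, because the tail probability only shrinks as the task progresses.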
12The MEEC Scheme
- The β values (optimal) are computed based on the assumption of unrestricted continuous frequency
- We need to deal with
  - Minimum and maximum speed restriction
  - Discrete speed
- We have solutions and will use simulation to test them
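The slides do not spell out how the two restrictions are handled. One simple, conservative option is to round an ideal continuous speed up to the next available level and clamp it to the supported range. This is an assumption for illustration and not necessarily the authors' solution, and `pick_frequency` is a hypothetical helper.

```python
import bisect


def pick_frequency(f_ideal, available):
    """Smallest available frequency that is >= f_ideal, clamped to the range."""
    levels = sorted(available)
    idx = bisect.bisect_left(levels, f_ideal)
    if idx == len(levels):
        # The ideal speed exceeds the maximum; the fastest level is the best we can do.
        return levels[-1]
    # Requests below the minimum land on index 0, i.e. the slowest level.
    return levels[idx]
```

Rounding up never makes a task slower than the continuous schedule intended, so it does not endanger deadlines, at the cost of some extra energy.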
13Evaluations Power models
- Synthetic processor
  - Strictly conforms to $p(f) = f^3$
  - 10 frequencies: 100 MHz, 200 MHz, ..., 1000 MHz
- Intel Xscale
  - Power numbers from Intel datasheets
  - $p(f) = 80 + 1520\,(f/1000)^3$
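The two power models can be written down directly. The sketch below takes the frequency in the same units as the slide (the Xscale formula divides by 1000) and adds a hypothetical helper, `energy_per_cycle`, to show how the constant term changes which speed is most efficient.

```python
def power_synthetic(f):
    """Synthetic processor: p(f) = f^3."""
    return f ** 3


def power_xscale(f):
    """Intel Xscale model from the slide: p(f) = 80 + 1520 * (f / 1000)^3."""
    return 80 + 1520 * (f / 1000) ** 3


def energy_per_cycle(power, f):
    """Energy spent on one cycle at frequency f: power divided by cycles per unit time."""
    return power(f) / f
```

For the synthetic model the energy per cycle keeps falling as `f` drops. For the Xscale model the constant term 80 makes very low frequencies wasteful: `energy_per_cycle(power_xscale, f)` is smallest near `f = 300` and rises on either side.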
14Evaluation Synthetic Workload
- We simulated systems that have 5, 10, 15, or 20 tasks
- The worst-case execution cycles (WCEC) of each task are randomly generated from 10M to 1G cycles
- The probability distribution of each task is randomly chosen from 6 representative distributions
- Frame length
15Evaluation Synthetic Workload
- We evaluated 8 schemes
  - Proportional with and without PACE
  - Greedy with and without PACE
  - Statistical with and without PACE
  - MEEC with and without PACE
- We simulated 100,000 frames and computed the average energy consumption per frame for each scheme
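The evaluation loop can be sketched as a small Monte Carlo simulation. Everything specific here is an assumption for illustration: the power function is p(f) = f^3 with continuous speeds, actual cycles are drawn uniformly between 0 and the WCEC, and the only policy shown is a proportional time split. The names `simulate` and `proportional` are hypothetical, and the default frame count is lower than the 100,000 used in the talk to keep the example quick.

```python
import random


def simulate(allot, wcec, frame_length, frames=10_000, f_max=1.0, seed=0):
    """Average energy per frame for a speed policy.

    allot(i, wcec, time_left, f_max) returns the time budget for task i.
    Assumes the frame is feasible: sum(wcec) / f_max <= frame_length.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(frames):
        time_left = frame_length
        for i, worst in enumerate(wcec):
            # Speed that finishes the worst case within the budget.
            f = min(f_max, worst / allot(i, wcec, time_left, f_max))
            cycles = rng.uniform(0.0, worst)  # actual workload this frame
            elapsed = cycles / f
            total += (f ** 3) * elapsed  # energy = power * time
            time_left -= elapsed  # unused budget becomes slack for later tasks
    return total / frames


def proportional(i, wcec, time_left, f_max):
    """Split the remaining time in proportion to worst-case cycles."""
    return time_left * wcec[i] / sum(wcec[i:])
```

Calling `simulate(proportional, [1.0, 1.0, 1.0], frame_length=6.0)` returns the average energy per frame for that policy. Other policies can be compared by passing a different `allot` function with the same seed.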
16Results Synthetic Workload
- For the synthetic CPU, the best scheme is always MEEC (with or without PACE), but MEEC with PACE is better than MEEC without PACE only 13.6% of the time, with an average saving of 1.2%
- For Intel Xscale, the best scheme is always MEEC without PACE
- Conclusion: PACE is not recommended in the MEEC scheme
17Why PACE Is Not Good in MEEC scheme?
PACE (under the assumption of unrestricted
continuous frequency)
18Results Synthetic Workload
19Evaluation Automatic Target Recognition (ATR)
- The ATR application does pattern matching of targets in images
- The regions of interest (ROI) in the image are detected and each ROI is compared with all the templates
- Image processing time is proportional to the number of ROIs
20Evaluation Automatic Target Recognition (ATR)
- A front-end is responsible for collecting images and sending them to the back-end periodically for target recognition
- This application can be modeled as a frame-based real-time system in which all the tasks have the same workload distribution
[Figure: front-end and back-end]
21Evaluation Automatic Target Recognition (ATR)
- Simulation setup
  - Use Intel Xscale
  - The period is 100 ms
  - The front-end sends 1 to 6 images to the back-end
  - The number of ROIs in an image varies from 1 to 8
  - The back-end precomputes 6 speed schedules
22Results - Automatic Target Recognition (ATR)
23Summary
- In this paper, we demonstrate and evaluate a new
DVS scheme that aims to minimize the expected
energy consumption in the system
24Conclusions
- The MEEC scheme achieves significant energy savings over the existing schemes
- Using only static information or aggregating dynamic information, even with probabilistic techniques, will not produce results as good as when dynamic information for each task is considered separately
25 26A Simple Example
- 3 tasks, the frame length is 14 time units
- For the CPU, $c_0 = 0$, $c_1 = 1$, $f_{\min} = 0$, and $f_{\max} = 1$