Title: Integrated Management of Power Aware Computing
1Integrated Management of Power Aware Computing and Communication Technologies
- Review Meeting
- Nader Bagherzadeh, Pai H. Chou, Fadi Kurdahi, UC Irvine
- Jean-Luc Gaudiot, USC
- Nazeeh Aranki, Benny Toomarian, JPL
- DARPA Contract F33615-00-1-1719
- June 13, 2001
- JPL -- Pasadena, CA
2Agenda
- Administrative
- Review of milestones, schedule
- Technical presentation
- Progress
- Applications (UAV/DAATR, Rover, Deep Impact, distributed sensors)
- Scheduling (system-level pipelining)
- Advanced microarchitecture power modeling (SMT)
- Architecture (mode selection with overhead)
- Integration (Copper, JPL, COTS data sheet)
- Lessons learned
- Challenges, issues
- Next accomplishments
- Questions and action items review
3Quad Chart
Behavior
Innovations
high-level simulation
- Component-based power-aware design
- Exploit off-the-shelf components and protocols
- Best price/performance, reliable, cheap to replace
- CAD tool for global power policy optimization
- Optimal partitioning, scheduling, configuration
- Manage entire system, including mechanical and thermal
- Power-aware reconfigurable architectures
- Reusable platform for many missions
- Bus segmentation, voltage / frequency scaling
functional partitioning scheduling
Architecture
mapping
system integration synthesis
static configuration
dynamic power management
Year 1
Year 2
Impact
Kickoff
2Q 02
2Q 00
2Q 01
- Static hybrid optimizations
- partitioning / allocation
- scheduling
- bus segmentation
- voltage scaling
- COTS component library
- FireWire and I2C bus models
- Static composition authoring
- Architecture definition
- High-level simulation
- Benchmark Identification
- Dynamic optimizations
- task migration
- processor shutdown
- bus segmentation
- frequency scaling
- Parameterizable components library
- Generalized bus models
- Dynamic reconfiguration authoring
- Architecture reconfiguration
- Low-level simulation
- System benchmarking
- Enhanced mission success
- More tasks for the same power
- Dramatic reduction in mission completion time
- Cost saving over a variety of missions
- Reusable platform design techniques
- Fast turnaround time by configuration, not redesign
- Confidence in complex design points
- Provably correct functional/power constraints
- Retargetable optimization to eliminate overdesign
- Power protocol for massive scale
4Program Overview
- Power-aware system-level design
- Amdahl's law applies to power as well as performance
- Enhance mission success (time, task)
- Rapid customization for different missions
- Design tool
- Exploration and evaluation
- Optimization and specialization
- Technique integration
- System architecture
- Statically configurable
- Dynamically adaptive
- Use COTS parts and protocols
5Personnel and teaming plans
- UC Irvine - Design tools
- Nader Bagherzadeh - PI
- Pai Chou - Co-PI
- Fadi Kurdahi
- Jinfeng Liu
- Dexin Li
- Duan Tran
- USC - Component power optimization
- Jean-Luc Gaudiot - faculty participant
- Seong-Won Lee - student
- JPL - Applications benchmarking
- Nazeeh Aranki
- Nikzad Benny Toomarian
- students
6Milestones and Schedule
- Static hybrid optimizations
- partitioning / allocation
- scheduling
- bus segmentation
- voltage scaling
- COTS component library
- FireWire and I2C bus models
- Static composition authoring
- Architecture definition
- High-level simulation
- Benchmark Identification
- Dynamic optimizations
- task migration
- processor shutdown
- bus segmentation
- frequency scaling
- Parameterizable components library
- Generalized bus models
- Dynamic reconfiguration authoring
- Architecture reconfiguration
- Low-level simulation
- System benchmarking
7Review of Progress
- May'00: Kickoff meeting (Scottsdale, AZ)
- Sept'00: Review meeting (UCI)
- Scheduling formulation, UI mockup, system-level configuration
- Examples: Pathfinder, X-2000 (manual solution)
- Nov'00: PI meeting (Annapolis, MD)
- Tools: scheduler, UI v.1 (Java)
- Examples: Pathfinder, X-2000 (automated)
- Apr'01: PI meeting (San Diego, CA)
- Tools: scheduler, UI v.2 - v.3 (Jython)
- Examples: Pathfinder, initial UAV (pipelined)
- June'01: Review meeting (we are here!)
8New for this Review (June '01)
- Tools
- Scheduler UI v.4 (pipelined, buffer matching)
- Mode selector v.1 (mode change overhead, constraint based)
- SMT model
- Examples
- Pathfinder, µAMPS sensors (mode selection)
- UAV, Wavelet (dataflow) (pipelined, detailed estimate)
- Deep Impact (command driven) (planning)
- Integration
- Input from Copper: timing/power estimation (PowerPC simulation model)
- Output to Copper: power profile, budget (Copper Compiler)
- Within IMPACCT: initial Scheduler and Mode Selector integration
9Overview of Design Flow
- Input
- Tasks, constraints, component library
- Estimation (measurement or simulation via COPPER)
- Refinement Loop
- Scheduling (pipeline/transform)
- Mode Selection (either before or after scheduling)
- System level simulation (planned integration)
- Output to COPPER
- Interchange Format
- Power Profile, Schedule, Selected modes
- Code Generation
- Microarchitecture Simulation
10Design Flow
task allocation, component selection
task model, timing /power constraints
scheduler
high-level simulator
IMPACCT
component library
mode model
mode selector
power profile, C program
power timing estimation
power simulator
Compiler
low-level simulator
COPPER
executable
11Power Aware Scheduling
- Execution model
- Multiple processors, multiple power consumers
- Multiple domains: digital, thermal, mechanical
- Constraint driven
- Min / Max power
- Min / Max timing constraints
- Handles problems in different domains
- Time Driven
- System level pipelining -- in time and in space
- Parallelism extraction
- Experimental results
- Coarse to fine grained parallelism tradeoffs
12Prototype of GUI scheduling tool
- Power-aware Gantt chart
- Time view
- Timing of all tasks on parallel resources
- Power consumption of each task
- Power view
- System-level power profile
- Min/max power constraint, energy cost
- Interactive scheduling
- Automated schedulers: timing, power, loop
- Manual intervention: drag and drop
- Demo available
13Power-Aware Scheduling
- New constraint-based application model (paper at CODES'01)
- Min/Max timing constraints
- Precedence, subsumes dataflow, general timing, shared resource
- Dependency across iteration boundaries: loop pipelining
- Execution delay of tasks: enables frequency/voltage scaling
- Power constraints
- Max power: total power budget
- Min power: controls power jitter or forces utilization of free source
- System-level, multi-scenario scheduling (paper at DAC'01)
- 25% faster while saving 31% energy cost
- Exploits "free" power (solar, nuclear min-output)
- System-level loop pipelining (working papers)
- Borrow time and power across iteration boundaries
- Aggressive design space exploration by new constraint classification
- Achieves 49% speedup and 24% energy reduction
14Scheduling case study: Mars Pathfinder
- System specification
- 6 wheel motors
- 4 steering motors
- System health check
- Hazard detection
- Power supply
- Battery (non-rechargeable)
- Solar panel
- Power consumption
- Digital
- Computation, imaging, communication, control
- Mechanical
- Driving, steering
- Thermal
- Motors must be heated in low-temperature
environment
15Scheduling case study: Mars Pathfinder
- Input
- Time-constrained tasks
- Min/Max power constraints
- Rationale: control jitter, ensure utilization of free power
- Core algorithm
- Static analysis of slack properties
- Solves time constraints by branch-and-bound
- Solves power constraints by local movements within slacks
- Target architecture
- X-2000 like configurable space platform
- Symmetric multiprocessors, multi-domain power consumers, solar/batt
- Results
- Ability to track power availability
- Finishes tasks faster while incurring less energy cost
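The min/max power constraints on this slide can be checked against a candidate schedule by building its system-level power profile. The following is a minimal Python sketch under stated assumptions, not the IMPACCT scheduler: the `Task` record, the function names, the unit-slot time discretization, and all numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    start: int      # start time, in unit slots (assumed discretization)
    duration: int   # execution delay, in unit slots
    power: float    # power drawn while the task runs, in watts

def power_profile(tasks, horizon):
    """System-level power in each unit slot of [0, horizon)."""
    profile = [0.0] * horizon
    for task in tasks:
        for slot in range(task.start, min(task.start + task.duration, horizon)):
            profile[slot] += task.power
    return profile

def violations(profile, p_min, p_max):
    """Slots above the max power budget and slots below the min power level."""
    over = [i for i, p in enumerate(profile) if p > p_max]
    under = [i for i, p in enumerate(profile) if p < p_min]
    return over, under

def energy_cost(profile, p_free):
    """Energy drawn above the free (e.g. solar) level; each slot lasts one time unit."""
    return sum(max(0.0, p - p_free) for p in profile)

# Hypothetical schedule with a power spike in slots 0-1 and unused free power in slots 3-4.
before = [Task("drive", 0, 3, 6.0), Task("comm", 0, 2, 4.0), Task("image", 3, 2, 3.0)]
# Same tasks after a local move of "comm" into the later slots.
after = [Task("drive", 0, 3, 6.0), Task("comm", 3, 2, 4.0), Task("image", 3, 2, 3.0)]

profile_before = power_profile(before, 5)
profile_after = power_profile(after, 5)
```

With a min level of 4 W and a max budget of 9 W, `violations(profile_before, 4.0, 9.0)` flags both kinds of violation, while the moved schedule has none and also draws less energy above the free level.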
16More aggressive scheduling: System-level pipelining
- Borrow tasks across iterations
- Alleviates "hot spots" by spreading to another iteration
- Smooth out utilization by borrowing across iterations
- Core techniques
- Formulation: separate pseudo dependency from true dependency
- Static analysis and task transformation
- Augmented scheduler for new dependency
- Results -- on Mars Pathfinder example
- Additional energy savings with speedup
- Smoother power profile
17Scheduling case study: UAV DAATR
- Example of a very different nature!
- Algorithm, rather than "system" example
- Target architecture
- C code -- unspecified; assume sequential execution, no parallelism
- MatLab -- unmapped
- Algorithm
- Sequential, given in MatLab or C
- Potential parallelism in space, not in time
- Constraints, dependencies
- Dataflow: partial ordering
- Timing: latency; no pairwise Min/Max timing
- Power: budget for different resolutions
18Scheduling case study: UAV example (cont'd)
- Challenge: parallelism extraction
- Essential to enable scheduling
- Difficult to automate; needs manual code rewrite
- Different pipeline stages must be relatively similar in length
- Rewritten code
- Inserted checkpoints for power estimation
- Error prone: buffer mapping between iterations
- Found a dozen bugs in benchmark C code
- Missing summation in standard deviation calculation
- Frame buffer off by one line
- Dangling pointers not exposed until pipelined
19ATR application: what we are given
[Figure: ATR dataflow diagram. Labels: 1 Frame, Target Detection, m Detections, FFT, 3 filters, Filter/IFFT, Bugs]
20Bug report
- Misread input data file
- OK, no effect on the algorithm
- Miscalculate mean, std for image
- OK, these values not used (currently)
- Wrong filter data for SUN/PowerPC
- OK for us, since we operate on different platforms
- Bad for SUN/PowerPC users, wrong results
- Misplaced FFT module
- The algorithm is wrong
- However, these problems are not captured in the
output image files
21What it should look like
1 Frame
Target Detection
m Detections
3 filters
k distances
22What it really should look like
1 Frame
Target Detection
m Detections
3 filters
k distances
23Problems
- Limited parallelism
- Serial data flow with tight dependency
- Parallelism available (diff. detections, filters, etc.) but limited
- Limited ability to extract parallelism
- Limited by serial execution model (C implementation)
- No available parallel platforms
- Limited scalability
- Cannot guarantee response time for big images (N^2 complexity)
- Cannot apply optimization for small images (each block is too small)
- Limited system-level knowledge
- High-level knowledge lost in a particular implementation
24Our vision: 2-dimensional partitioning
Output target detection w/ distance for N
simultaneous frames
25System-level blocks
Input: N simultaneous frames
N Frames (N target detections)
Target Detection
M Targets (M FFTs)
FFT
M Targets (3M IFFTs)
Filter/IFFT
K Distances (2K IFFTs)
Compute Distance
Output: target detection w/ distance for N simultaneous frames
26Our vision
27System-level pipelining
Input: N simultaneous frames
Target Detection
FFT
Filter/IFFT
Compute Distance
Output target detection w/ distance for N
simultaneous frames
28What does it buy us?
- Parallelism
- All modules run in PARALLEL
- Each module processes N (M, K) INDEPENDENT instances that could all be processed in parallel
- NO DATA DEPENDENCY between modules
- Throughput
- Throughput multiplied by processing units
- Process N frames at a reduced response time
- Better utilization of resources
29What does it buy us? (cont'd)
- Flexibility
- Insert / remove modules at any time
- Adjust N, (M or K) at any time
- Make each module parallel / serial at any time
- More knobs to tune: parallelism / response time / throughput / power
- Driven by run-time constraints
- Scalability
- Reduced response time on big images (small N and/or deeper pipe)
- Better utilization/throughput on small images
- More compiler support
- Simple control / data flow: each module is just a simple loop, which is essentially parallel
- Need an automatic partitioning tool to take horizontal cuts
30What does it buy us: how power-aware is it?
- Subsystem shut-down
- Turn on / off any time based on power budget
- Split / merge (migrate) modules on demand
- Power-aware scheduling
- Each task can be scheduled at any time during one pipe stage, since they are totally independent
- More scheduling opportunity with an entire system
- Dynamic voltage/frequency scaling
- The amount of computation N, (M or K) is known ahead of time
- Scaling factor C / N (very simple!)
- Less variance of code behavior => strong guarantee to meet deadline, more accurate power estimates
- Run-time code versioning
- Select right code based on N, (M or K)
31Experimental implementation: pipelining transformation
- Goal
- To make everything completely independent
- Methodology
- Dataflow graph extraction (vertical)
- Initial partitioning (currently manual with some aids from COPPER)
- Horizontal clustering
- Horizontal cut (final partitioning)
- Techniques
- Buffer assignment: each module gets its own buffer
- Buffer renaming: read/write on different buffer
- Circular buffer: each module gets a window of fixed buffer size
- Our approach: the combination
32Buffer rotation
Circular buffer B
B
Pipe stages a, b, c, d
33Background - acyclic dataflow
- Single circular buffer
- One serial data flow path
- All data flows are of same type, same size
- Multiple buffers
- Multiple data flow paths
- Different type, size
[Figure: pipe stages a, b, c, d]
34A more complete picture
1. Buffer ready (raw data, e.g. ATR images)
2. Buffer live
3. Life-time spent in pipeline
4. Buffer dead
Circular buffer A, B
Pipe stages a, b, c, d
Head pointer
35How does it work?
- Raw data is dumped into the buffer from the data sources
- A head pointer keeps incrementing
- Buffer is ready, but not live (active in pipeline) yet
- Example: ATR image data coming from sensors
- Buffer becomes live in pipeline
- Raw data are consumed and/or forwarded
- New data are produced/consumed
- When a buffer is no longer needed by any pipeline stage, it is dead and recycled
- Is everything really independent?
- Yes!
- At each snapshot, each module is operating on different data
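The buffer life cycle described on this slide (ready, live, dead, recycled) can be illustrated with a small ring of buffer slots. This is only a sketch under assumptions: the function name `run_pipeline`, the one-frame-per-step timing, and the toy stage functions are hypothetical, and the stages run one after another here even though they touch disjoint slots and could run in parallel.

```python
def run_pipeline(frames, stages, ring_size):
    """Push frames through the pipe stages, using one ring slot per in-flight frame."""
    n = len(stages)
    if ring_size < n + 1:
        raise ValueError("ring must hold one filling slot plus one slot per stage")
    ring = [None] * ring_size
    results = []
    for step in range(len(frames) + n):
        touched = set()
        # Data source: dump raw data at the head pointer (buffer becomes ready).
        if step < len(frames):
            head = step % ring_size
            ring[head] = frames[step]
            touched.add(head)
        # Stage k works on the frame that arrived k + 1 steps ago (buffer is live).
        for k, stage in enumerate(stages):
            frame_index = step - (k + 1)
            if 0 <= frame_index < len(frames):
                slot = frame_index % ring_size
                assert slot not in touched  # each module operates on different data
                touched.add(slot)
                ring[slot] = stage(ring[slot])
                if k == n - 1:
                    results.append(ring[slot])
                    ring[slot] = None  # buffer is dead and can be recycled
    return results

# Four toy pipe stages standing in for a, b, c, d.
stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x * x]
outputs = run_pipeline([1, 2, 3], stages, ring_size=5)
```

The internal assertion checks the independence claim at every snapshot: no two stages, and not the data source either, ever share a slot.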
36What are we trading off?
Speed: computation intensity, parallelism, throughput, power
Time: response time, delay
Workload: amount of computation, energy
37 3-D Design space navigation
Workload N frames
Time
Speed
38Design flow
C Source code
IMPACCT pipeline code versioning
DFG
Pipelined C Source code
COPPER power simulator
Task-level constraints
Power-aware schedule
IMPACCT scheduler and mode selection
System-level constraints
39Scheduling case study: Wavelet compression (JPL)
- Algorithm in C
- Wavelet decomposition
- Compression "knob" to choose lossy factor or lossless
- Example category
- Dataflow, similar to DAATR
- Finer grained, better structure
- IMPACCT improvements
- Transformation to enable pipelining
- Exploit lossy factor in trade space
40Wavelet Algorithm
- Wavelet Decomposition
- Quantization
- Entropy coding
41Wavelet Algorithm structure
For all image blocks
Initialization (check params, allocate memory)
block init.,set params, read image block
decomp(), (lossless FWT)
- Sequential execution blocks
- No data dependency between image blocks
(remove overlap)
Bit_plane_decomp, (set decomp param)
(1st level entropy coding)
Output result to file
(bit_plane encoding)
42Wavelet experiments
- Experiments being conducted
- Checkpoints marked up manually
- Initial power estimation obtained
- Code being manually rewritten / restructured for pipelining
- Appears better structured than UAV example
- Trade space
- High performance to low power
- Pipelining in space and in time, similar to UAV example
- Lossy compression parameter
43Ongoing scheduling case study: Deep Impact
- "Planning" level example
- Coarse grained, system level
- Hardware architecture
- COTS PowerPC 750 babybed, emulating a Rad-Hard PPC at 4x => models the X-2000 architecture using DS1 software
- COTS PowerPC 603e board, emulating I/O devices in real time
- Software architecture
- vxWorks, static priority driven, preemptive
- JPL's own software architecture -- command based
- 1/8-second time steps, 1-second control loops
- Task set
- 60 tasks to schedule, 255 priority levels
44NASA Deep Impact project
- Platform
- X-2000 configurable architecture
- to be using RAD 6000 (Rad-Hard PowerPC 750 @ 133 MHz)
- Testbed (JPL Autonomy Lab)
- PPC 750 single-board computer -- runs flight software
- Prototype @ 233 MHz, real flight @ 133 MHz
- COTS board, L1 only, no L2 cache
- PowerPC 603e -- emulates the I/O devices
- connected via CompactPCI
- DS1: Deep Space One (legacy flight software)
- Software architecture
- 8 Hz ticks, command based
- running on top of vxWorks
- Perfmon: performance monitoring utility in DS1
- 11 test activities
- 60 tasks
45Deep Impact example (cont'd)
- Available form: real-time traces
- Collected using Babybed
- 90 seconds of trace, time-stamped tasks, L-1 cache
- Input needed
- Algorithm (not available)
- Timing / power constraints (easy)
- Functional constraints
- Sequence of events
- Combinations of illegal modes
- Challenges
- Modeling two layers of software architecture (RTOS + command)
47SMT Power Simulator
- Simulator Features
- Compatible with SimpleScalar 3.0b
- Executes PISA and EV6 binaries
- Portability: runs on most kinds of computers
- Handling Simultaneous Multithreading
- Runs up to 8 threads simultaneously
- Similar to UW SMT model
- Power Aware Features
- Same analytic power model as WATTCH
- Clock Gating
- Parameterized Models
- 42 functional unit classifications (WATTCH has 12)
- 10 dynamic activity factors (WATTCH has 4)
48Examples of Module Classification
- Functional units include
- Arithmetic units: ALU, FPU, etc.
- Control units: instruction decoder, etc.
- Memory units: caches, CAM, etc.
- Buses: result bus
- Cache Access
- Cache Hit
- Read Tag and Data
- Cache Miss
- Read Tag
- Update Tag and Data
- Read Data
- Arithmetic Operation: 4 groups
- Int ALU: +, -, bit operations
- Int MULT: *, /
- FP ALU: +, -
- FP MULT: *, /
49SMT Power Simulator
- Project Status
- Performance simulator: done
- Power simulator: implementation is done
- Power parameter verification: ongoing
- Verification Methodology
- Analytic model
- Proven models from WATTCH
- Comparison with COTS processors
- Motorola PowerPC 7450
- Intel mobile Pentium III
- Alpha 21264
50Example of Verification with COTS Processors
PowerPC 7450 Power Consumption
- Typical/Maximum Power Consumption
- Typical -> average power consumption of applications
- Maximum -> peak power consumption of applications
- Benchmark simulations are needed to verify
- Modules in operation
- Deep Sleep: nothing -> static power dissipation
- Sleep: PLL working -> static + PLL power dissipation
- Nap: bus snooping -> static + PLL + I/O power dissipation
- Doze: no instruction fetch -> no information
51Example of Simulation Result
- Processor Configuration
- 4-issue superscalar
- Target programs: 4 simple test programs
- Maximum power consumption
- 87.37 W at 4 IPC (instructions per cycle): maximum throughput
- Clock gating
- CC1: max power for running units and zero for idle units
- CC2: input-dependent power for running units and zero for idle units
- CC3: input-dependent power for running units and static power for idle units
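The three clock gating styles differ only in what a unit is charged when it is running and when it is idle. A minimal Python sketch follows, not the simulator's actual code. It assumes, as illustration only, that "input-dependent" power scales linearly with the fraction of a unit's ports in use and that idle static power under CC3 is a fixed fraction of the unit's maximum; the function name, parameters, and the 10% default are hypothetical.

```python
def unit_power(style, p_max, ports_used, ports_total, idle_fraction=0.1):
    """Per-cycle power of one functional unit under clock gating style CC1, CC2 or CC3."""
    if style not in ("CC1", "CC2", "CC3"):
        raise ValueError(f"unknown clock gating style: {style}")
    if ports_used == 0:
        # Idle unit: zero under CC1/CC2, static power under CC3.
        return p_max * idle_fraction if style == "CC3" else 0.0
    if style == "CC1":
        # Running unit always charged its maximum power.
        return p_max
    # CC2 and CC3: running unit charged in proportion to its activity (assumed linear).
    return p_max * ports_used / ports_total

# A unit with 4 ports and a hypothetical 8 W maximum, with 1 port busy this cycle.
busy = {style: unit_power(style, 8.0, 1, 4) for style in ("CC1", "CC2", "CC3")}
idle = {style: unit_power(style, 8.0, 0, 4) for style in ("CC1", "CC2", "CC3")}
```

Summing `unit_power` over all units for each simulated cycle would give a dynamic power trace; CC1 is the most pessimistic estimate for busy units, and CC3 is the only style that charges idle units.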
52SMT Simulation Methodology
- Input
- C Program
- Executable Binaries
- PISA
- EV6
- Processor Parameters
- Architectural Parameters
- Output
- Static Power Consumption
- Program independent
- Dynamic Power Consumption
- Program dependent
- Power Profile Moving Avg.
Processor parameters
Target C Program
Power Parameters
Host Compiler
Cross compiler
Power Simulator
Dynamic Power
Dynamic Profile
Static Power
53SMT Power Simulator Tool Usage
- Host Portability
- Any host computer that can run SimpleScalar
- Execution command
- sim-smt options target.list
- List file content
- executable program arguments
- Processor parameters
- -config configuration.file
- Simulation results redirection
- -redir:sim simulator.result
- -redir:prog target.program.result
54Mode Selection
- Determine which component runs in which mode, and when
- Mode selection is non-trivial
- Scheduler will be overwhelmed if it must determine component modes at the same time!
- Exploration space of all mode combinations is tremendous
- Greedy solution may fail mission timing constraints or power constraints
- Mode selection is worthwhile
- Exploration spaces exist to improve power reduction and power-awareness
- Energy saving (5-15%), cost saving (10-40%)
- Eases task planning and gives a more realistic picture
55Methodology and Design Flow
- The whole picture - the integration of
- Power-aware scheduler
- Mode selector
- Power estimation/profiling tools
- Static view
modified schedule
Scheduler
Mode Selector
Initial schedule
Power/timing number power profile
Power/timing budget
Power profile
Power Estimator
Power/timing budget
56System Modeling
- Component power model
- Power modes with overhead
- System timing model
- Constraint graph
- Mode dependency modeling
- Mode dependency graph
- External parameters
- Environment temperature
- Surrounding terrain
57Component Power Model
- Power mode
- Each mode is defined by power and timing attributes
- Constant, profile, external (environmental) parameters
- May be hierarchical -- e.g. PowerPC 7450
- active: cache on (cache settings), cache off, voltage scaling, clock scaling
- doze: clock scaling
- nap
- deep sleep
- Overhead on mode changes
- Power overhead, timing overhead
- e.g. preheating a motor, voltage scaling, PLL
- Environmental parameters
- e.g. temperature, terrain (roughness of ground for a motor)
- Affect power and timing overhead
58Component Model Examples
- Driving motor
- Power is a function of temperature T
- Mode change time is also a function of temperature T
- Microprocessor (PowerPC 603e)
Power 2.2W Time (1.875T10)(Tlt0) 10(T0)
off
on
Power 0.1225T 1.0
0W
Power 0.5W Time 3
Full power
4.0W
DPM
3.2W
10 cycles -
10 cycles -
100us 255 bus clocks 10 cycles
10 cycles -
100us 255 bus clocks 10 cycles
10 cycles -
Doze
Sleep
Nap
40mW
1.0W
70mW
t1 3 cycles
3 cycles
t1 3 cycles
59FireWire Bus Power Model
- Cable Power
- $P_c = \mu L C_f$ ($\mu$: constant, $L$: cable length, $C_f$: data transfer rate)
- Driver Power ($P_d$)
- Fast lookup table
- Protocol simulator (in progress)
- Event-driven system-level simulator
- Generated event traces for high level power estimation
- Bus Power
- $P_{bus} = P_c + P_d$
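As a rough illustration of this bus power model, the sketch below assumes the cable power is the product of the constant µ, the cable length L and the data transfer rate Cf, and that the bus power is the sum of cable power and driver power. The function names, the keying of the driver lookup table by data rate, and every numeric value are hypothetical placeholders, not measured FireWire figures.

```python
def cable_power(mu, length_m, rate_mbps):
    """Cable power Pc as the product of mu, cable length and data transfer rate (assumed form)."""
    return mu * length_m * rate_mbps

def bus_power(mu, length_m, rate_mbps, driver_table):
    """Bus power Pbus as cable power plus driver power read from a lookup table."""
    return cable_power(mu, length_m, rate_mbps) + driver_table[rate_mbps]

# Hypothetical driver power lookup table, in watts, keyed by data rate in Mb/s.
DRIVER_POWER_W = {100: 0.2, 200: 0.3, 400: 0.5}

p_cable = cable_power(0.001, 4.5, 400)
p_bus = bus_power(0.001, 4.5, 400, DRIVER_POWER_W)
```

A table lookup keeps the driver term cheap to evaluate, which matters when the estimate is called once per event in a system-level simulation.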
61Timing Constraint graph
- Min/max timing constraints
- between pairs of events
- Vertices
- Represent events
- A task has a Start and an End event, e.g. A.s: start event of task A, B.e: end event of task B
- Directed edges
- Weights on edges
- Nonnegative weight: min constraint
- Negative weight: max constraint (weight = -max)
- Example: weight 10 between A.s and B.e: end event of B should be no earlier than 10 time units after the start event of A
- Example: weight -10 between A.s and B.s: start event of B should be no later than 10 time units after the start event of A
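One common way to work with such a constraint graph is to treat each edge (u, v, w) as the inequality time[v] - time[u] >= w, so that a max constraint becomes a negative-weight edge in the reverse direction. The Python sketch below uses that encoding, which is an assumption about the slide's convention, and a Bellman-Ford style longest-path pass as a stand-in technique; the function name and the example tasks are hypothetical.

```python
def earliest_times(events, edges, source):
    """Earliest event times satisfying time[v] - time[u] >= w for every edge (u, v, w).

    Returns None when the constraints contain a positive cycle, i.e. are infeasible.
    """
    time = {event: float("-inf") for event in events}
    time[source] = 0
    for _ in range(len(events)):
        changed = False
        for u, v, w in edges:
            if time[u] + w > time[v]:
                time[v] = time[u] + w
                changed = True
        if not changed:
            return time
    return None  # still relaxing after len(events) passes: positive cycle

events = ["A.s", "A.e", "B.s", "B.e"]
edges = [
    ("A.s", "A.e", 4),    # task A takes at least 4 time units
    ("A.e", "B.s", 0),    # B starts no earlier than A ends
    ("B.s", "B.e", 3),    # task B takes at least 3 time units
    ("A.s", "B.e", 10),   # min: B ends no earlier than 10 after A starts
    ("B.s", "A.s", -10),  # max: B starts no later than 10 after A starts
]
feasible = earliest_times(events, edges, "A.s")

# Adding a min constraint of 12 between A.s and B.s contradicts the max constraint of 10.
infeasible = earliest_times(events, edges + [("A.s", "B.s", 12)], "A.s")
```

The gap between these earliest times and the latest times allowed by the max constraints is the slack a scheduler can use when moving tasks.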
62System Timing Modeling Example
Haz: hazard detector; Str: steering motor; Drv: driving motor; Cam: camera; Ppc: processor; Sci: scientific device; Rf: radio frequency modem
- Micro Rover example
- Multiple resources
- Timing constraints between tasks
sci.s
rf.e
1
1
-30
-20
-5
ppc1.s
ppc2.s
ppc1.e
str.s
str.s
1
1
-10
5
1
Haz.e
drv.e
cam.s
drv.s
63Mode Dependency Modeling
- Functional modes
- examples: ATR -- short range, middle range
- behavior choice as dictated by functional requirements (i.e., not controllable by power management)
- Component modes
- examples: processor full-on, sleep, doze, voltage/clock scaling
- operational setting of component (i.e., open to mode selection for meeting power/timing constraints)
- Dependencies
- Among functional modes (of different activities)
- Among component modes
- Between functional and component modes
- e.g., ATR in short-range mode, processor running at high clock rate
64Mode dependency graph
- Directed acyclic graph
- Mode Vertices
- modes of component
- Edges
- mode dependency: "only if"
- mode A chosen implies B may be chosen
- mode B NOT chosen => NOT mode A
- Operator vertices
- AND, OR, MUTEX
- (C op D) implies E may be chosen
- not E => (C op D) must be false
- op imposes constraint on combination of C, D
mode
A
B
C
op
op
E
D
65Mode dependency example Rover
- Components
- hazard detector, driving motor, steering motor
- Constraints on modes
- hazard detector and the motors should not be working at the same time
- Mode combinations
[Figure: mode dependency graph with mode vertices str.on, drv.on, haz.on and operator vertices OR, MUTEX]
haz: hazard detector; str: steering motor; drv: driving motor
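The rover constraint can be enumerated directly for a system this small. The sketch below is hypothetical Python, not the IMPACCT mode selector: it assumes each component has only an on and an off mode, reads the constraint as a MUTEX between haz.on and the OR of str.on and drv.on, and brute-forces all combinations instead of pruning through the dependency graph.

```python
from itertools import product

COMPONENTS = ("haz", "str", "drv")

def is_legal(on):
    """`on` maps a component name to True when that component is in its 'on' mode."""
    motors_on = on["str"] or on["drv"]      # OR vertex over the two motors
    return not (on["haz"] and motors_on)    # MUTEX vertex: never both sides at once

def legal_combinations():
    """All on/off assignments of the three components that satisfy the constraint."""
    combos = []
    for values in product((False, True), repeat=len(COMPONENTS)):
        on = dict(zip(COMPONENTS, values))
        if is_legal(on):
            combos.append(on)
    return combos

combos = legal_combinations()
```

Under these assumptions 5 of the 8 on/off combinations survive. Brute force is fine for three components, but the count doubles with every added two-mode component, which is why pruning by the dependency graph matters for larger systems.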
66Mode Modeling Example: µAMPS sensors
- Components
- processor, memory, RF, sensor
- Constraints on modes
- Processor is active when both radio and sensor are active
- Memory is active only when processor is active
- Microsensor architecture
S.on
A.sleep
AND
R.on
A.sleep
S.on
XOR
R.rx
A.idle
MUTEX
R.rx_tx
A.active
M.on
A.active
M.on
A: ARM; M: memory; R: radio; S: sensor
67Mode Modeling of µAMPS sensors (cont'd)
- Mode combinations considered
- by MIT group: 5 combinations
- manual grouping, ad hoc
- Our method
- 3 more combinations
- systematically generated from dependency graph
- Add constraint
- When sensor is off, all other components should be off (proactive)
- Automatically obtain same results as MIT group
Not given by MIT group
R.on
S.on
68Mode Combination Enumeration -- Using Dependency Graph
Radio
- Component level mode dep. graph
- Group modes by component
- Show mode dependency between components
- Enumerating reachable modes
- Topological sorting
- Graph helps prune out infeasible mode combinations
- Break cycle in comp. graph
- Removing an edge in cycle
- Keep track of the last dependent successor component
ARM
Memory
Sensor
Radio
Sensor
ARM
Memory
off
off
sleep
off
on
off
sleep
off
idle
off
on
idle
off
active
on
69External Parameters and Constraints
- Parameters in system model
- Temperature, terrain
- Used to characterize components and their overhead
- System Constraints
- Maximum power constraint
- Constant or power profile (function of time)
- Minimum power constraint
- Constant or power profile (function of time)
- Total energy constraint (work in progress)
- Mission time (mission deadline)
[Figure: power consumption of driving motor at different temperatures]
70System Power Representation
- Schedule
- Gantt Chart
- Time view
- Power view
- Mode selection
- Gantt chart
- Tasks marked with mode settings
- Added non-operating tasks
- Idle intervals
- mode change overheads
- Power profile view
72Mode selection: Problem statement
- Input
- initial schedule (timing, power)
- component model, system model
- initial selection of modes
- Objective
- Model mode change overhead (timing, power)
- Capture sequence of mode changes
- Minimize energy cost by considering overhead tradeoffs
- Output
- Schedule for power and timing, with overhead
- Augmented schedule with selected mode
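The overhead tradeoff in the objective can be seen on a single idle interval: a deeper low-power mode only pays off when the interval is long enough to absorb its mode change time and energy. The following Python sketch is illustrative only. The `Mode` record, the function name, and all power and overhead numbers are hypothetical, and it treats one idle interval of one component in isolation rather than a full schedule with mode dependencies.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Mode:
    name: str
    power: float          # watts drawn while resident in the mode
    switch_time: float    # seconds needed to enter and leave the mode
    switch_energy: float  # joules spent on entering and leaving the mode

def best_idle_mode(modes, idle_time):
    """Pick the mode that minimizes energy over one idle interval, overhead included.

    Returns (mode, energy), or None when no mode fits in the interval.
    """
    best = None
    for mode in modes:
        if mode.switch_time > idle_time:
            continue  # the mode change alone would overrun the interval
        energy = mode.switch_energy + mode.power * (idle_time - mode.switch_time)
        if best is None or energy < best[1]:
            best = (mode, energy)
    return best

# Hypothetical processor modes; the values are placeholders, not data sheet numbers.
MODES = [
    Mode("on", 4.0, 0.0, 0.0),
    Mode("doze", 1.0, 0.001, 0.004),
    Mode("sleep", 0.04, 0.1, 0.5),
]

short_choice = best_idle_mode(MODES, 0.05)
long_choice = best_idle_mode(MODES, 1.0)
```

With these placeholder numbers the short interval selects doze and the long one selects sleep, so the best mode depends on the schedule and not only on each mode's resident power.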
73Application Example Rover
- Behaviors and tasks
- Moving around on Mars surface
- Hazard detection, driving and steering
- Communicating with the Lander
- Taking pictures (IMP)
- Performing scientific experiments (APXS, ASI/MET)
- Components in the entire system
- Hazard detector (HAZ)
- Driving motor (DRV)
- Steer motor (STR)
- Radio frequency modem (RF)
- Camera (CAM)
- Microprocessor (PowerPC)
- Microcontroller (ARM)
A schedule of the electronic subsystem of micro
rover
74Mode selection results: Energy savings
- Traditional approach
- Only two modes: On, Off
- Timing constraints ONLY
- Power constraints may be violated
- Considers mode change overhead
- Our approach with mode selection
- All legal mode combinations
- Both timing and power constraints
- Detailed mode change overhead
- Results
- Energy saving: 3.7% to 11.9%
- Average saving: 8.7%
75Results for mode selection: Cost savings
- Cost vs. energy saving
- Cost defined as energy above minimum constraints
- Savings
- From 6.9% to 49.3%
- Average: 26.5%
76Exploring Different Working Scenarios
- Three tasks
- Moving around (MOV)
- Taking picture (CAM)
- Scientific experiment (SCI)
- Three scenarios
- A: MOV, CAM, SCI
- B: CAM, MOV, SCI
- C: CAM, SCI, MOV
- Temperature profile is given as [Figure: temperature profile]
77Result III
- Scenarios consume different amounts of energy
- Scenario C consumes 12% more energy than scenario A (by mode selection)
- Mode selection always does better
- compared to (on, off) only
- up to 11.7% energy saving
78Mode selection Issues
- Challenges
- Explosion of state space -- grows exponentially
- Modeling restrictions in mode change sequence
- Solution / novelty
- Formalism for mode dependency at component level and system level
- Systematically prune search space
- Experimental results
- Energy and time saved
- More accurate modeling of overhead
79Accomplishments to date
- Power-aware scheduling
- Multi-processor/domain, Min / Max power and timing constraints
- 3 classes of system-level pipelining techniques
- Mode selection
- Component and system model
- Captures power and timing overhead on mode change
- Incorporating power models and simulators
- SMT simulator for advanced microarchitectural exploration
- FireWire, DRAM, cache, PowerPC
- Tool prototype and integration
- GUI for power-aware Gantt chart scheduling and mode selection
- Power-aware visualization tool for benchmarks
- Interface to COPPER project
80Lessons learned
- Challenges
- Not all applications fit a given model
- Alternative design flows may be required for different applications
- Manually extract parallelism and dependency in benchmarks
- Capture mode dependency in components and applications
- Integration of good power models for PowerPC
- Right level of abstraction
- Many low-level power models available, not always usable
- Need system-level power estimations
- Details of the architecture model
- Memory / bus power models
- Overhead for voltage/frequency scaling
81Fulfilled Milestones
- Power-aware scheduling: 3 papers
- Multi-scenario
- System-level pipelining
- Mode selection
- encompasses power management (voltage/freq scaling)
- UI prototype
- scheduling, mode selection, benchmark visualization
- Initial tool integration
- interface to COPPER
- Processor power simulation models
- SMT simulator
82Upcoming Milestones
- Dynamic optimization
- Scheduling and planning -- using the Deep Impact example
- Pipeline depth/width tuning at run-time
- Additional static optimization
- component selection/assignment
- bus topology optimization
- Simulation
- Bus simulation models
- SMT -- thermal dissipation profiling, dynamic power/thermal management
- Tool integration
- Simulation models from other groups
- IMPACCT tools and library
- tighter integration between IMPACCT and COPPER
83Ideas: dynamic optimization
- More dynamic scenarios
- Power suddenly cut off, with small power reserve before shutdown
- Mission replanning, changing objectives
- Solutions required
- Division between static preparation and dynamic handling
- Ability to decide most important actions to take under extreme time constraint
- Need feedback/notification mechanism in execution model
- Decentralized power management
- Need new benchmark examples
84Future planned evaluation
- Deep Impact from JPL
- Mission planning and scheduling example
- Image compression (wavelet) algorithm
- Architectural mapping
- JPL Testbed
- PPC750 board to measure actual power
- PPC750 to simulate instrumentation in real-time
- advanced board with real instrumentation
- Validation through simulation
- Scheduler output fed to COPPER for compilation
- Simulation via COPPER and our own SMT
- Compare estimated power with refined version
85Applications
- Space
- Mars Rover (scheduling, mode selection)
- Deep Impact (planning)
- UAV
- DAATR (pipelined scheduling) (mode selection under investigation)
- Distributed sensors
- MIT µAMPS sensor (mode selection)
- Need apps requiring dynamic planning/reconfig!
86Development plans
- Scripting and web-based tool
- Jython (Java + Python), Tkinter for GUI prototype
- Core scheduler
- Modular, detachable from GUI
- Option to run on separate server or same process as UI
- CGI scripts for arch. configuration (unix/web based)
- Latest version distributed thru WebCVS
- Interface with commercial CAD backend
- Detailed power estimation tools
- Functional simulation with proprietary models
- Rationale
- Open source, runs on any platform
- All publicly available development tools
- Trivial to install, no compilation, encourage modification
87Technology Transition -- Consystant Design Technologies
- Version 1 released Apr.11
- shown at ESC
- runs on Linux
- will support Solaris, Win2k
- Extensible system
- platform plugin for synthesis
- targets Linux, vxWorks,
- Simulator
- selective focus
- coordination centric
- Active collaboration confirmed
- Installation in week of June 25
- Designated application engineer
88http://www.ece.uci.edu/impacct/
89Metrics
- Source-aware energy model
- Takes free energy into account
- Cost for not using free energy
- Profile-aware
- Total energy dependent on consumers' power profile
- Smoothness of power draw
- Scenario-aware
- Cost function tracks external factors (e.g. temperature, solar level)
- Stage in mission
- Timing/performance
- Makespan (length of an iteration)
- Dynamic planning cost
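As a rough illustration of the source-aware and profile-aware metrics above, the following sketch compares two power profiles with the same total energy against a constant free (e.g. solar) supply. The function names, the sampled-profile representation, and all numbers are assumptions made for this example; they are not taken from the IMPACCT tools.

```python
def energy_cost(profile, free_power, dt=1.0):
    """Return (battery_energy, wasted_free_energy) in joules.

    profile:    power drawn by the consumers in each time step (watts)
    free_power: free power available in each time step (watts)
    dt:         length of one time step (seconds)
    """
    battery = 0.0
    wasted = 0.0
    for drawn, free in zip(profile, free_power):
        if drawn > free:
            # Demand above the free supply must come from stored energy.
            battery += (drawn - free) * dt
        else:
            # Free energy that is not consumed is lost (cost of not using it).
            wasted += (free - drawn) * dt
    return battery, wasted


def smoothness(profile):
    """Largest step-to-step change in power draw; smaller is smoother."""
    return max((abs(b - a) for a, b in zip(profile, profile[1:])), default=0.0)


# Hypothetical profiles: both draw 40 J in total over four 1 s steps.
flat = [10, 10, 10, 10]
bursty = [2, 18, 2, 18]
solar = [12, 12, 12, 12]

print(energy_cost(flat, solar), smoothness(flat))
print(energy_cost(bursty, solar), smoothness(bursty))
```

Under these assumed numbers the flat profile draws nothing from the battery, while the bursty one does, even though the total energy consumed is identical. That is the sense in which the cost depends on the profile and not only on the total.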
90Architectural Configuration
- Mode selection
- Power consumption level (doze, nap, sleep, etc.)
- Low power design techniques
- Clock scaling, voltage scaling
- Memory/cache configurations, bus encoding
- Communication protocols, compression, algorithm transformations
- Optimize feasible solutions for energy/timing costs
- Power, real-time, inter-resource mode constraints
- Constraints between functionality modes and resource modes
- Bus topology optimization
- Static clustering and bus partitioning
- Dynamic reclustering with shutdown
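The mode selection step can be pictured as a small search problem: choose one mode per task so that power and timing constraints hold and energy is minimized. The sketch below is a minimal backtracking search with pruning, assuming tasks run one after another and each mode is a (name, power, time) triple. The function name, data layout, and numbers are hypothetical and only illustrate the idea; this is not the IMPACCT scheduler.

```python
def select_modes(tasks, max_power, deadline):
    """Pick one mode per task by backtracking.

    tasks: list of mode lists; each mode is (name, power_watts, time_seconds)
    Returns (energy_joules, [mode names]) with minimum energy,
    or None if no selection meets the constraints.
    """
    # Try low-power modes first so a good bound is found early.
    tasks = [sorted(modes, key=lambda m: m[1]) for modes in tasks]

    # Shortest possible time for tasks i..end, used to prune on the deadline.
    min_rest = [0.0] * (len(tasks) + 1)
    for i in range(len(tasks) - 1, -1, -1):
        min_rest[i] = min_rest[i + 1] + min(m[2] for m in tasks[i])

    best = None

    def search(i, elapsed, energy, chosen):
        nonlocal best
        if elapsed + min_rest[i] > deadline:
            return  # cannot meet the deadline from here
        if best is not None and energy >= best[0]:
            return  # cannot beat the best solution found so far
        if i == len(tasks):
            best = (energy, list(chosen))
            return
        for name, power, time in tasks[i]:
            if power > max_power:
                continue  # violates the max power constraint
            chosen.append(name)
            search(i + 1, elapsed + time, energy + power * time, chosen)
            chosen.pop()

    search(0, 0.0, 0.0, [])
    return best


# Hypothetical two-task example.
example = [
    [("full", 10.0, 2.0), ("half", 4.0, 4.0)],
    [("full", 8.0, 3.0), ("half", 3.0, 6.0)],
]
print(select_modes(example, max_power=10.0, deadline=8.0))
```

With an 8 s deadline, running both tasks in their low-power modes takes too long, so the search settles on the mixed selection with the lowest energy among the feasible ones.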
91Application - Mars Rover
- Mission-critical embedded system
- Hard real-time system
- Composed of COTS components
- Electronic: µprocessor, µcontroller, memory, camera, scientific devices, ...
- Mechanics/thermal: driving motor, steering motor, heaters
- Power sources: solar panel, battery
- Power/energy and performance constraints
- Stringent max power constraint
- Flexible min power constraint
- Limited non-rechargeable energy sources
- Global timing requirement
- Limited working window during sol daytime
- Timing constraint among tasks
- Harsh and uncertain working environment
- Extremely low temperature: affects component behaviors
- Uncertain environment: winds/obstacles/rugged terrain
92Example Platform - X2000
- COTS component modeling
- Processors (PowerPC 603e, 750)
- Memory organization (cache, memory)
- System interconnects (FireWire bus driver/controller)
- Scientific equipment
- Sensors/actuators
- Mechanics/Thermals (driving/steering motors/heaters)
- System-level architecture modeling
- Tree topology for FireWire bus architecture
- Component clustering for bus segmentation
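To make the link between a tree topology and bus segmentation concrete, the sketch below computes which bus segments must stay powered for a given set of active components. It assumes a simplified model in which every active component needs a path to the root segment; the node names and this rule are hypothetical, and real FireWire behavior is not modeled.

```python
def active_segments(parent, active_components):
    """Return the set of bus segments that must stay powered.

    parent: dict mapping each node (component or segment) to its parent
            segment; the root segment maps to None.
    active_components: components that need a path to the root segment.
    """
    powered = set()
    for node in active_components:
        seg = parent[node]
        # Walk up the tree, stopping once an already-powered segment is hit.
        while seg is not None and seg not in powered:
            powered.add(seg)
            seg = parent[seg]
    return powered


# Hypothetical clustering: compute on one segment, mechanics on another.
parent = {
    "root": None,
    "segA": "root",
    "segB": "root",
    "cpu": "segA",
    "mem": "segA",
    "camera": "segB",
    "motor": "segB",
}

powered = active_segments(parent, {"cpu", "mem"})
idle = {s for s in ("root", "segA", "segB") if s not in powered}
print(sorted(powered), sorted(idle))
```

In this example the segment carrying the camera and motor has no active component, so it is a candidate for shutdown. Clustering components that are active at the same time onto the same segment makes such idle segments more likely.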
93Testing Methodologies
- A
- "Activity" for given duration (5 s, 10 s, 15 s)
- repeated 6 times
- record both I-cache and D-cache misses (recorded in separate runs)
- B
- Recording 90 seconds worth of an Activity till its completion
- 1 minute gap between runs
- also I-cache and D-cache misses
- C -- what is measurement C?
94User Input
- Attributes
- tasks, resources, timing constraints,
- power budgets
- Unique features
- power as constraint
- scheduling, system-level mission planning, power-aware loop pipelining
- timing constraint classification
- subsumes deadline, dataflow
- Language
- mix of graphical and custom constraint language
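One common way to express timing constraints that subsume both deadlines and dataflow is as minimum and maximum separations between task start times, stored as a constraint graph. The sketch below assumes that representation (the deck does not spell out its own format) and uses a Bellman-Ford style longest-path relaxation to find earliest start times or detect conflicting constraints. All names are hypothetical.

```python
def earliest_starts(n_tasks, constraints):
    """Earliest start times satisfying min/max separation constraints.

    constraints: list of (u, v, kind, value) with kind "min" or "max":
      "min": start[v] - start[u] >= value   (precedence, dataflow)
      "max": start[v] - start[u] <= value   (deadline relative to u)
    Task 0 is the anchor and starts at time 0.
    Returns a list of start times, or None if the constraints conflict.
    """
    edges = []
    for u, v, kind, value in constraints:
        if kind == "min":
            edges.append((u, v, value))
        else:
            # start[v] - start[u] <= value  is  start[u] - start[v] >= -value
            edges.append((v, u, -value))
    # No task may start before the anchor.
    for t in range(1, n_tasks):
        edges.append((0, t, 0))

    start = [0] * n_tasks
    for _ in range(n_tasks):
        changed = False
        for u, v, w in edges:
            if start[u] + w > start[v]:
                start[v] = start[u] + w
                changed = True
        if not changed:
            return start
    return None  # still changing after n passes: positive cycle, infeasible


# Hypothetical example: 1 starts >= 3 after 0, 2 starts >= 4 after 1,
# and 2 must start within 10 of the anchor.
ok = [(0, 1, "min", 3), (1, 2, "min", 4), (0, 2, "max", 10)]
too_tight = [(0, 1, "min", 3), (1, 2, "min", 4), (0, 2, "max", 5)]
print(earliest_starts(3, ok))
print(earliest_starts(3, too_tight))
```

A deadline is simply a "max" constraint from the anchor, and a dataflow dependency is a "min" constraint whose value is the producer's execution time, which is why a single constraint type can cover both.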
95Methodology and Work Flow
- Exploration techniques
- Backtracking
- Cutting exploration space with multi-dimensional constraints
- Two steps in design exploration
- Find feasible mode selection for operating tasks
- Timing constraints
- Constraint graph
- Resource slacks
- Mission deadline
- Dependency between tasks
- Dependency graph
- Find feasible mode selections for idle intervals
- System power/energy constraints: min, max, or power profile
- Mode change overhead, both time and power overheads
- Speedup techniques
- Sorting component modes with power numbers