Power Analyzer Program Review

1
Power Analyzer Program Review
  • Dirk Grunwald, University of Colorado

2
Overview
  • Clustered Voltage Scaling
  • Motivator to ensure our power analyzer can model
    new microarchitectural mechanisms
  • Operating System Voltage Scaling
  • Memory system control for low power

3
Lessons from Physics
4
Dynamic Voltage Scaling in SA-2
2 billion instructions at 750 MIPS takes 2.7
seconds and consumes 1200 J.
5
Running just fast enough cuts energy 3-fold
  • 432 J instead of 1200 J
  • 12 second duty cycle
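The figures on slides 4 and 5 can be checked with a few lines of arithmetic. This is only a sketch: the variable names are illustrative, and pairing 432 J with the "just fast enough" case is an assumption drawn from the slide title, not something the slides state explicitly.

```python
# Figures quoted on slides 4 and 5; variable names are illustrative assumptions.
instructions = 2e9        # 2 billion instructions
full_speed_mips = 750     # peak execution rate
duty_cycle_s = 12         # time available to finish the work
energy_full_j = 1200      # energy quoted for running at 750 MIPS
energy_slow_j = 432       # assumed to be the "just fast enough" energy

time_at_full_speed_s = instructions / (full_speed_mips * 1e6)
just_fast_enough_mips = instructions / duty_cycle_s / 1e6
energy_ratio = energy_full_j / energy_slow_j

print(f"time at full speed: {time_at_full_speed_s:.2f} s")
print(f"rate that fills the duty cycle: {just_fast_enough_mips:.0f} MIPS")
print(f"energy ratio: {energy_ratio:.2f}x")
```

Spreading the same work over the whole 12 second duty cycle needs roughly 167 MIPS instead of 750, and the quoted energies differ by a factor of about 2.8, which the slide rounds to 3-fold.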
6
You can exploit slack at many levels
  • Circuits: dual Vdd or dual Vt designs
  • "Design Methodology of Ultra Low-Power MPEG4
    Codec Core Exploiting Voltage Scaling Techniques,"
    Igarashi et al., DAC '98: 50% power reduction, no
    performance loss
  • Architecture
  • Macro-level clustered voltage scaling,
    multi-voltage multithreading
  • Operating Systems
  • Applications / Runtime Systems

7
Why are we looking at system-level savings and
modeling?
  • Computer architecture folks are the only people
    who would watch an MPEG movie at 120 fps
  • We ignore interactive applications
  • But that's the future of computing
  • And we ignore human response times...
  • Do credit card transactions need to be faster
    than 1/10th of a second?

8
Clustered Voltage Scaling
  • Voltage Scaling normally refers to varying
    voltage over time.
  • CVS is voltage scaling in space
  • Run part of a processor at a different V and f
  • Historically, done at circuit level
  • We're trying to exploit at component level

9
Clustered Voltage Scaling
  • CVS already applied at circuit level
  • Mitsubishi designed MPACT media processor w/ 2
    voltage levels for 43% power savings, 10% area
    increase, no performance hit

10
Slack Scheduling
  • Use inherent instruction dependencies and
    operational latencies to form alternate schedules
  • Slack is the minimum time difference, in cycles,
    between when an instruction's output is produced
    and when it is consumed
  • We want to exploit slack within a cluster of
    functional units
  • Schedule instructions with slack onto slower
    pipelines that run at ½ speed and reduced power

11
Exploiting Slack
add   r0, r1, r2      (A)
sub   r3, r4, r5      (B)
and   r9, 0x1, r9     (C)
ornot r5, r9, r10     (D)
xor   r2, r10, r11    (E)
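To make the slack definition concrete, the five instructions above can be run through a small dependency analysis. This is a minimal sketch under stated assumptions, not the simulator used in the study: it assumes the last operand is the destination register, every operation takes one cycle, and issue width is unlimited. The function name `compute_slack` is hypothetical.

```python
# Each entry: (label, source registers, destination register).
# Assumption: the last operand in the listing is the destination.
program = [
    ("A", ["r0", "r1"], "r2"),
    ("B", ["r3", "r4"], "r5"),
    ("C", ["r9"], "r9"),
    ("D", ["r5", "r9"], "r10"),
    ("E", ["r2", "r10"], "r11"),
]
LATENCY = 1  # assumed: every operation takes one cycle


def compute_slack(program, latency=LATENCY):
    """Return cycles between each result being ready and its first use."""
    last_writer = {}  # register -> label of its most recent producer
    issue = {}        # label -> earliest issue cycle
    consumers = {}    # label -> labels of instructions that read its result
    for label, sources, dest in program:
        producers = [last_writer[r] for r in sources if r in last_writer]
        issue[label] = max((issue[p] + latency for p in producers), default=0)
        for p in producers:
            consumers.setdefault(p, []).append(label)
        last_writer[dest] = label
    slack = {}
    for label, _, _ in program:
        ready = issue[label] + latency
        users = consumers.get(label, [])
        # None marks a result that is not consumed inside this block.
        slack[label] = min(issue[u] - ready for u in users) if users else None
    return slack


slack = compute_slack(program)
print(slack)
```

Under these assumptions only A has slack: its result is ready after the first cycle, but E cannot issue until D has finished, so A could run on a half-speed pipeline without delaying E.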
12
Simulation Methodology
  • Simulation architecture
  • SimpleScalar 3.0a w/CaiWattch mode
  • 4-wide 21264 16-entry RUU
  • SPECint95 benchmarks

13
Results and Potential
  • Over 90% of issue cycles have at least one
    instruction that has 1 cycle of slack
  • This means that 90% of the time, we could run one
    instruction on a slow pipe without impacting
    performance
  • Between 1% and 7% have 2 cycles
  • 68-87% of instructions with slack are integer

14
Story Gets Better With More Aggressive Processor
  • Slack is affected by deeper RUU
  • More opportunities to find slack
  • More slack values > 1 cycle available

[Figure: instructions U, V, W, X, and Y shown against RUU sizes 3, 4, and 5]
15
Operating System Scheduling
  • Goals: control power using clock / voltage
    scheduling
  • Real systems
  • Real apps
  • To date: comparison study showing that previously
    proposed heuristics don't really work well
  • We know why
  • How to fix it

16
What are some challenges?
  • How slow is fast enough?
  • How do I tell the architecture?
  • Enforce constraints?
  • Not miss deadlines?
  • Can I define benchmarks and evaluation
    methodologies for human-scale computing where
    voltage?

17
Difficult to predict application demands
Goal: don't disturb application behavior.
Inelastic performance.
Speech Rendering
Audio Rendering
18
Prior work
  • Weiser et al. and Govil et al.
  • Used intervals
  • Selected average / weighted average
  • Reported great success
  • Pering et al.
  • Tried intervals
  • Switched to RTOS, which places high demands
    on applications
  • Is RTOS really needed?
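The interval-based policies mentioned above predict the next interval's CPU demand from recent history. The slides do not give the formulas, so the following is only a sketch of two common forms: one that repeats the previous interval and one that keeps a weighted running average. The function names, the `weight` parameter, and the utilization trace are all hypothetical.

```python
def past_prediction(previous_utilization):
    """Predict that the next interval looks exactly like the previous one."""
    return previous_utilization


def weighted_average_prediction(previous_estimate, observed_utilization, weight=3):
    """Blend the running estimate with the newest observation.

    `weight` (hypothetical) controls how much history is retained.
    """
    return (weight * previous_estimate + observed_utilization) / (weight + 1)


# Hypothetical per-interval CPU utilization trace (fraction of each interval busy).
trace = [1.0, 0.0, 1.0, 0.0]
estimate = 0.0
estimates = []
for observed in trace:
    estimate = weighted_average_prediction(estimate, observed)
    estimates.append(estimate)
print(estimates)
```

On a bursty trace like this one, the weighted average lags well behind the observed demand, while repeating the previous interval swings between extremes. Either behavior makes the predicted clock setting a poor match for the next interval.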

19
Evaluation
  • Implement clock scheduling module in Linux 2.0.35
    kernel
  • Extensible: can model all practical prior
    policies
  • StrongARM SA-1100 provides 15 clock steps from
    56 MHz to beyond 206 MHz
  • Used modified motherboard
  • Useful, but not critical in early study
  • Drop from 1.5 V to 1.23 V only provides 10% power
    reduction
  • Measured reasonable applications
  • Text speaker, chess player, Web browser, MPEG
    video player

20
Widely Varying Power Usage
  • Better evaluation metric given resources is
    scheduling stability.
  • E.g. MPEG-1 player runs at 80% utilization at
    206 MHz
  • Should be able to settle at 176 MHz at 93%
    utilization
  • But, the best policy has widely varying power
    settings from
  • Best policy is to assume the next scheduling
    interval is the same as the prior one
  • Only run at fastest or slowest setting
  • This is awful, but we know why this happens

21
Leading to bursty energy demands...
22
What we're doing now
  • All this work done in old O/S
  • Upgrading to Linux 2.4.0, interoperation with
    iPAQ, Itsy
  • Determine minimal O/S mechanism
  • Simple: "Go fast" vs. "I went too slow"
  • More complex: soft real-time system
  • Application-specific behavior state
  • Via queue length for events in e.g. Java
    applications
  • Trying to implement control mechanism in a number
    of processor families
  • AMD K6-III Mobile
  • SpeedStep
  • XScale

23
Memory Management and Energy Efficiency
  • Dynamic memory management is a large part of
    complex applications
  • Implemented four memory management mechanisms on
    Itsy
  • No allocation, explicit allocation, conservative
    allocation, incremental conservative allocation
  • Measured processor, system with DAQ
  • Energy not always correlated with performance
  • Interaction of CPU and memory system fairly
    complicated
  • SA-1 places CPU in sleep mode on memory traffic,
    slows clock.
  • Thus, memory traffic can take much time but less
    energy
  • Plan to exploit interaction with O/S for powering
    down memory pages

24
Collateral Related Projects
  • NSF ITR proposal funded on integrated power
    management of wireless and system resources
  • Management of 802.11b performance for location,
    trajectory, real time constraints
  • Ad hoc routing for global energy minimization
  • Broader effort leading to an inter-departmental
    center
  • Colorado Center for low-power, ubiquitous,
    mobile and pervasive systems (CCLUMPS)
  • Circuits, architecture, telecom, computer
    services, applications