Full-System Chip Multiprocessor Power Evaluations Using FPGA-Based Emulation

1
Full-System Chip Multiprocessor Power
Evaluations Using FPGA-Based Emulation
  • Abhishek BHATTACHARJEE
  • Gilberto CONTRERAS
  • Margaret MARTONOSI
  • Princeton University

Intl. Symp. on Low Power Electronics and Design
(ISLPED) August 13, 2008
2
Problem: SW Simulators for Architectural Power
Estimation
Full-System Chip Multiprocessor Power Evaluations
Using FPGA-Based Emulation
  • Power has become a first-class design problem
  • Affects power density, thermal behavior,
    packaging constraints
  • Early-stage µ-arch perf/power evaluation is
    crucial
  • Conventional SW simulators (Wattch, SimplePower,
    HotSpot)
  • Flexible, low development time
  • But SW simulations are too slow: 10-100 KIPS!
  • Chips getting more complex: core counts, uncore,
    etc.
  • Design space getting more complex:
    perf/power/thermal
  • Must consider OS, workload interaction

Abhishek Bhattacharjee, Gilberto Contreras,
Margaret Martonosi, PRINCETON UNIVERSITY
3
Alternatives to Long Simulations
  • Run application snippets, ignore OS
  • Compromises result accuracy and credibility
  • Parallelize simulator (Chidester et al. 02)
  • Shared structures (LLC, coherence) limit
    scalability
  • Hardware runtime monitoring (Joseph et al. 01,
    Bellosa et al. 02, Isci et al. 03, Contreras et
    al. 03)
  • Fast evaluation time
  • Restricted view of components
  • Requires existing design
4
Our Approach: FPGA-Based Full-System Emulation
  • Develop FPGA-based perf./power emulator of a
    proposed CMP machine
  • 50-300 MHz → run full apps, OS
  • Similar to HW monitoring
  • Programmable → insert relevant monitors, model
    various designs
  • Similar to SW simulations
  • Bottom line: get the detail and full-system
    effects of real measurements before the design
    is built
5
Past and Ongoing FPGA-Based Emulation Work
  • Purely performance emulation
  • HASim (Emer et al. 06), RAMP (Wawrzynek et al.
    06)
  • Modular, parameterizable perf. models on FPGAs
  • Purely power emulation (Coburn et al. 05)
  • RTL with power models on FPGA (area/latency
    overhead analysis)
  • Performance and power emulation (Atienza et al.
    06)
  • Performance and thermal emulation of MPSoCs for
    existing cores
  • Runs OS on host and communicates with FPGA
6
Presentation Outline
  • Designing the emulator
  • Validating emulator power models
  • Evaluating emulator speedup
  • Profiling application runtime power behavior
  • Case study: Activity migration
  • Conclusion
7
Steps in Designing Emulator
Emulator Design Steps
  1. Choose target platform
  2. Choose candidate core design
  3. Design event counters
  4. Design power models
  5. Boot OS and run full apps
8
Target Emulation Platform
  • Target FPGA platform: BEE2
  • 5 Xilinx V2P 70 FPGAs (1 control / 4 user)
  • Current design on control unit
  • Methodology extensible to other platforms
9
Candidate Core Design: Leon3 SparcV8 CMP
Candidate Core:    Leon3 Sparc V8 VHDL core
Clock Rate:        65 MHz
Organization:      2-core, L1 snoopy cache coherence (ARM bus)
Pipeline:          Single-issue, in-order, 7-stage
Functional Units:  Adder, Shifter, Pipelined Mul/Div
L1 I-Cache:        4 KB, 2-way, 32-byte lines, LRR
L1 D-Cache:        4 KB, 2-way, 32-byte lines, LRR, write-through, virtually addressed
MMU:               8-entry I and D TLBs, LRU
  • Currently use 60% LUTs, 20% BRAM
  • Future: scale core count, L1 caches, add LLC, FPU
  • Methodology extensible to other core designs
10
Inserting Event Counters
Memory-mapped counters
Add to ISA: start/stop/reset counters
36 counters → 3% LUTs, no impact on freq.
[Block diagram labels: SparcV8 Core 0 and SparcV8 Core 1, each with a 3-port reg. file, a 7-stage integer pipeline, a 4 KB I-cache and a 4 KB D-cache; 64-bit event counters; AHB controller; AHB bus.]
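The counters on this slide are described as 64-bit and memory-mapped, 36 in total. A minimal Python sketch of how software might turn two successive raw readings into per-interval event counts, including wraparound of a 64-bit register, could look like this. The function name and the sample values are illustrative assumptions, not taken from the talk.

```python
COUNTER_BITS = 64           # counter width from the slide
NUM_COUNTERS = 36           # counter count from the slide
MASK = (1 << COUNTER_BITS) - 1

def counter_deltas(prev, curr):
    """Return events seen between two snapshots of free-running counters.

    Subtraction modulo 2**64 stays correct even if a counter wrapped once
    between the two reads.
    """
    if len(prev) != len(curr):
        raise ValueError("snapshots must have the same number of counters")
    return [(c - p) & MASK for p, c in zip(prev, curr)]

# Hypothetical snapshots: counter 0 advanced normally, counter 1 wrapped.
prev = [1_000, MASK - 4] + [0] * (NUM_COUNTERS - 2)
curr = [1_750, 5] + [0] * (NUM_COUNTERS - 2)
deltas = counter_deltas(prev, curr)
print(deltas[:2])  # prints [750, 10]
```

Working on deltas rather than raw values means the counters never need to be reset between samples.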
11
Power Model Development
  • General form of component power model
  • How to assign event Ei?
  • Want power of emulated machine, not FPGA!
  • Calibrate with gate-level simulations and
    microbenchmarks
  • Please refer to paper for details

Assumed 0.13 µm technology requires just switching
power term (negligible leakage)
Large idle power: no clock gating
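The slide names a general form of the component power model without spelling it out here. One common form for counter-based models, assumed here rather than quoted from the talk, is idle power plus the sum of event counts times per-event energy, divided by the sampling interval. The Python sketch below evaluates that form. The register file figures are the ones reported later in the talk; the event counts are made up for illustration.

```python
def component_power_mw(idle_mw, energy_nj, counts, interval_s):
    """Average power (mW) of one component over a sampling interval.

    Assumed model: P = P_idle + sum(N_i * E_i) / T, with switching energy
    only (the talk treats leakage as negligible at 0.13 um).
    """
    dynamic_nj = sum(counts[event] * energy_nj[event] for event in counts)
    # nJ / s = nW, so scale by 1e-6 to get mW.
    return idle_mw + dynamic_nj / interval_s * 1e-6

# Register file parameters reported in the talk.
REGFILE_IDLE_MW = 18.83
REGFILE_ENERGY_NJ = {"write": 0.53, "single_read": 0.29, "double_read": 0.39}

# Hypothetical event counts for one 10 ms sample.
counts = {"write": 100_000, "single_read": 200_000, "double_read": 50_000}
power = component_power_mw(REGFILE_IDLE_MW, REGFILE_ENERGY_NJ, counts, 0.010)
print(f"{power:.2f} mW")  # prints 31.88 mW
```

With these counts the dynamic term contributes 13.05 mW on top of the 18.83 mW idle power, which illustrates why the idle term dominates when there is no clock gating.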
12
Register File Power Model
  • Write 500-instruction microbenchmarks
  • Vary event/nop ratio
  • Idle power: 18.83 mW; Write: 0.53 nJ;
    Single read: 0.29 nJ; Double read: 0.39 nJ
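Varying the event/nop ratio yields power readings at different event rates. Under a linear model, a straight-line fit then gives idle power as the intercept and per-event energy as the slope. The Python sketch below illustrates this with synthetic points generated from the register file write figures on this slide, assuming one instruction per cycle at 65 MHz (an assumption made only to produce plausible event rates). It is not the authors' calibration procedure, and the data are generated, not measured.

```python
CLOCK_MHZ = 65.0  # emulated core clock from the talk

def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Synthetic microbenchmark sweep: fraction of instructions that are writes.
# Event rate is in millions of events per second (assumes CPI = 1), so
# energy in nJ times rate gives mW directly.
write_fractions = [0.0, 0.25, 0.5, 0.75, 1.0]
rates = [f * CLOCK_MHZ for f in write_fractions]
powers_mw = [18.83 + 0.53 * r for r in rates]   # generated, not measured

energy_nj, idle_mw = fit_line(rates, powers_mw)
print(f"idle = {idle_mw:.2f} mW, write energy = {energy_nj:.2f} nJ")
```

Because the points are generated from a perfect line, the fit simply recovers the inputs; with real gate-level power numbers the residuals would indicate how well the linear assumption holds.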
13
Full-System Emulator with OS and Applications
[System diagram labels: FPGA platform (BEE2 control unit); emulated CMP with SparcV8 Core 0 and SparcV8 Core 1, AHB bus, event counters for all modules, and main memory; I/O; RS-232; Ethernet; host PC.]
Software: Linux 2.6 and applications (Spec2006, Splash-2, PARSEC), with knowledge of power models
14
Presentation Outline
  • Designing the emulator
  • Validating emulator power models
  • Evaluating emulator speedup
  • Profiling application runtime power behavior
  • Case study: Activity migration
  • Conclusion
15
Validating Emulator Power Models
  • Extensive validation with Synopsys PrimeTime PX
    using:
  • Validation µ-benchmarks
  • 2x calibration µ-benchmarks, multiple event types
  • Spec 2006 benchmarks
  • Mcf, Libquantum, Bzip2, Gcc, Sjeng (train problem
    size)
  • Run 5 distinct 1-million instruction snapshots

Module      µ-benchmarks (%)   Spec 2006 (%)
Pipeline    7.51               7.58
Reg. File   6.03               6.23
I-Cache     6.81               7.21
D-Cache     7.21               7.41
AHB         5.66               7.30
16
Results: Emulation Speedup
  • Speedup over architectural simulator, Multifacet
    GEMS
  • 2-core, 4 KB L1 caches
  • Mcf, Libquantum, Bzip2, Gcc, Sjeng on each core
    with train size
  • With Ruby: Max. 35x
  • With Ruby + Opal: Max. 452x
  • Even greater speedup expected for:
  • Modeling greater core counts
  • Collecting power/thermal data
  • Greater FPGA clock
  • Bigger caches

NOTE: GEMS host uses a 64-bit, 2-GHz dual-core
AMD Athlon processor
17
Presentation Outline
  • Designing the emulator
  • Validating emulator power models
  • Evaluating emulator speedup
  • Profiling application runtime power behavior
  • Case study: Activity migration
  • Conclusion
18
Runtime Power Profiling
  • Important for OS-controlled power-aware
    scheduling
  • Modify Linux kernel to feed counter values to
    power models
  • Read counters within 10 ms timer interrupt
  • Sampling rate: multiples of 10 ms
  • Access 36 counters in 5700 cycles → Max. 0.87%
    perturbation
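The perturbation figure can be checked from the numbers on this slide together with the 65 MHz clock quoted elsewhere in the talk. The Python sketch below is only that arithmetic; the function name and the `sample_every_n_ticks` parameter are illustrative.

```python
CLOCK_HZ = 65e6          # emulated core clock from the talk
READ_CYCLES = 5700       # cycles to access all 36 counters
TIMER_PERIOD_S = 0.010   # Linux timer interrupt period

def perturbation_percent(sample_every_n_ticks=1):
    """Share of run time spent reading counters, in percent."""
    read_time_s = READ_CYCLES / CLOCK_HZ
    return 100.0 * read_time_s / (sample_every_n_ticks * TIMER_PERIOD_S)

worst_case = perturbation_percent(1)
print(f"{worst_case:.3f}%")                  # sampling on every 10 ms tick
print(f"{perturbation_percent(10):.4f}%")    # sampling every 100 ms
```

Reading the counters takes about 88 µs, so sampling on every 10 ms tick costs just under 0.9% of run time, in line with the slide's 0.87 maximum; sampling at a multiple of 10 ms lowers the overhead proportionally.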
19
Runtime Power for LU (2-threads)
[Plot annotations: runtime power profile for LU]
CPU1 master, CPU0 idle (380 mW)
Barrier: CPU0 spin-waiting
Possible Reg. File hotspot cannot be tracked on
CPU composite profile
Low power numbers and swing: 65 MHz clock, no
L2, no FPU, no gating, simple pipeline
20
Case Study: Activity Migration
  • Why study activity migration (AM)?
  • Effective at handling hotspots (Choi et al. 07,
    Heo et al. 03)
  • Our emulator is the ideal platform for AM studies
  • Hotspots depend on component power
  • Emulator directly provides this
  • On-chip temperature rise/fall times: 100 ms
  • Emulator fast enough to run OS and apps. beyond
    this time range
21
Linux Kernel Scheduler for AM
Avg. migration time ≈ 300 ms (65 MHz clock and
small caches)
2 s interval for max. 15% migration penalty
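The relation between the migration time, the interval, and the penalty on this slide is a simple ratio. The Python sketch below reproduces it; the function names are illustrative, and it assumes at most one migration per interval.

```python
def migration_penalty(migration_time_s, interval_s):
    """Fraction of each interval lost to one migration."""
    return migration_time_s / interval_s

def min_interval_s(migration_time_s, max_penalty):
    """Shortest migration interval that keeps overhead within max_penalty."""
    return migration_time_s / max_penalty

penalty = migration_penalty(0.300, 2.0)
interval = min_interval_s(0.300, 0.15)
print(f"penalty = {penalty:.0%}, interval = {interval:.1f} s")
```

A 300 ms migration every 2 s costs 15% of run time, so 2 s is the shortest interval that stays within a 15% budget.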
22
Case Study: Activity Migration
  • Example: AM on Bzip2, Mcf
  • Power surge on pipeline triggers swap
  • Bzip2: small working set, pipeline active most
    of run
  • Mcf: large working set, lower power phases
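The slide says a power surge on the pipeline triggers a swap but does not give the exact policy. The following Python sketch shows one hypothetical policy of that kind: swap when sampled pipeline power exceeds a threshold, but no more often than the 2 s interval from the talk. The 20 mW threshold, the trace values, and the function name are invented for illustration, and the sketch ignores how the power trace itself would change after a swap.

```python
SAMPLE_PERIOD_S = 0.010   # sampling period from the talk
MIN_INTERVAL_S = 2.0      # minimum time between migrations from the talk

def plan_swaps(pipeline_mw, threshold_mw):
    """Return sample indices at which the two cores' tasks would be swapped.

    pipeline_mw is the monitored pipeline power per sample; a swap fires
    when it exceeds threshold_mw and the minimum interval has elapsed
    since the previous swap. (Hypothetical policy.)
    """
    min_gap = round(MIN_INTERVAL_S / SAMPLE_PERIOD_S)  # samples between swaps
    swaps, last = [], None
    for i, power in enumerate(pipeline_mw):
        ready = last is None or i - last >= min_gap
        if ready and power > threshold_mw:
            swaps.append(i)
            last = i
    return swaps

# Invented trace: quiet for 50 samples, then a sustained surge.
trace = [10.0] * 50 + [30.0] * 450
print(plan_swaps(trace, threshold_mw=20.0))  # prints [50, 250, 450]
```

Rate-limiting the swaps is what bounds the migration penalty; without it, a sustained surge would trigger a migration on every sample.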
23
Presentation Outline
  • Designing the emulator
  • Validating emulator power models
  • Evaluating emulator speedup
  • Profiling application runtime power behavior
  • Case study: Activity migration
  • Conclusion
24
Conclusion
  • Emulator combines HW speeds (65 MHz) with SW
    programmability: 432x speedup over GEMS (Ruby +
    Opal)
  • Power models accurate within 10% of Synopsys
    simulations
  • Can model range of proposed designs
  • Moore's Law applies to FPGAs too!
  • Ongoing/future work
  • Scaling design with higher core counts, larger
    caches
  • GHz emulation
  • DVFS emulation
  • Thermal models