Title: Parameterized Embedded Systems Platforms
1Parameterized Embedded Systems Platforms
- Frank Vahid
- Students: Tony Givargis, Roman Lysecky, Susan Cotterell
- Dept. of Computer Science and Engineering, University of California, Riverside
- Member, Center for Embedded Computer Systems, UC Irvine
- Supported by NSF, NEC
- The Dalton Project
2Outline
- Introduction
- Parameterized SOC platforms
- Exploring parameter configurations
- Future direction: self-optimizing platforms
- Conclusions
3Introduction
- Advent of system-on-a-chip
[Figure: board carrying separate microprocessor, memory, peripheral, and FPGA ICs]
Introduction
4System-on-a-chip (SOC)
5The Productivity Gap
ITRS99
6Programmable Platforms (ITRS99)
[Figure: programmable platform comprising microprocessor, cache, memory, DMA, bridge, system bus, peripheral bus, two FPGAs, and two peripherals]
- Pre-fabricated IC, synthesizable HDL, or both
- Reference designs (VLSI), silicon platforms (Philips), fig chips (Vahid/Givargis99)
7Targeted to Embedded Systems
- May drive future architecture design (Patterson98)
- Varied power/performance/size constraints
- Programmable platforms must adapt
8Adapting platforms to constraints
- One solution: architectural parameters
[Figure: cache alongside Application2 code (main(), while())]
9Related work
- Pleiades project (Rabaey97)
[Figure: architecture and applications]
10Outline
- Introduction
- Parameterized SOC platforms
- Exploring parameter configurations
- Future direction: self-optimizing platforms
- Conclusions
11Basic parameters -- cache
[Figure: programmable platform (microprocessor, caches, memory, DMA, bridge, system bus, peripheral bus, FPGAs, peripherals)]
Parameterized Systems-on-a-chip
12Basic parameters -- cache
13Basic parameters -- bus
14Basic parameters -- Bus
- Change bus width (Givargis98)
- C1 > C2
15Basic parameters -- Bus
- Consecutive bus values: 0 1 0 0 1 0 1 1, then 1 0 0 1 0 1 1 0
- Binary encoding: Hamming distance 6
- Bus-invert encoding: Hamming distance 3
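The two Hamming distances on this slide can be reproduced with a short sketch of bus-invert encoding. The function name `bus_invert_step` and the assumption that the invert line starts at 0 are illustrative choices, not taken from the slides; the count of 3 includes the toggle of the invert line itself.

```python
def bus_invert_step(prev_bus, prev_invert, data, width):
    """Encode one word; return (bus, invert, transitions).

    prev_bus / prev_invert: values currently on the data lines and invert line.
    data: next word to send. width: number of data lines.
    """
    mask = (1 << width) - 1
    # Toggles if the word were sent unmodified (invert line driven to 0).
    distance = bin((prev_bus ^ data) & mask).count("1") + prev_invert
    if distance > width / 2:
        bus, invert = (~data) & mask, 1   # send the complement and flag it
    else:
        bus, invert = data & mask, 0
    # Count the toggles that actually occur, including the invert line.
    transitions = bin((prev_bus ^ bus) & mask).count("1") + (prev_invert ^ invert)
    return bus, invert, transitions


prev, nxt = 0b01001011, 0b10010110
binary_transitions = bin(prev ^ nxt).count("1")
bus, invert, encoded_transitions = bus_invert_step(prev, 0, nxt, 8)
print(binary_transitions, encoded_transitions)  # 6 3
```

Because more than half of the eight lines would toggle, the encoder sends the complement 0 1 1 0 1 0 0 1 and raises the invert line, so only two data lines and the invert line change.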
16Parameter definitions
- Parameter
  - An architectural feature that can be varied, with a small set of possible values, without changing the application's essential functionality.
- Configuration
  - A selection of a particular value for every architecture parameter
- Static vs. dynamic parameter
  - Static: Value is set before fabricating the IC.
  - Dynamic: Value is set after fabricating the IC.
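These definitions map naturally onto a small data structure. The sketch below is one possible representation; the class name `Parameter`, the helper `is_configuration`, and the choice of which example parameter is static or dynamic are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Parameter:
    name: str
    values: tuple   # small set of allowed values
    dynamic: bool   # False: set before fabrication; True: set afterwards


def is_configuration(params, selection):
    """True if `selection` picks exactly one legal value for every parameter."""
    return (set(selection) == {p.name for p in params}
            and all(selection[p.name] in p.values for p in params))


# Hypothetical parameters; the static/dynamic split here is an assumption.
params = [
    Parameter("cache_assoc", (2, 4, 8), dynamic=False),
    Parameter("bus_code", ("binary", "bus-invert"), dynamic=True),
]
print(is_configuration(params, {"cache_assoc": 4, "bus_code": "binary"}))  # True
print(is_configuration(params, {"cache_assoc": 4}))                        # False
```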
17Potential tradeoffs experiment ICCAD99
18Potential tradeoffs experiment ICCAD99
- Cache: Dinero (Edler, Hill)
- ISS: Tiwari96
[Figure: flow from C program through cache simulator to per-component power and total power]
19Potential tradeoffs experiment
- Computed power for all 45,568 configurations
- For each of four C applications
- Used microprocessor, cache, and bus simulators (1 week of CPU time)
[Figure: tradeoff between performance and power; x-axis: execution time (sec)]
20Potential tradeoffs experiment
Narrower bus required a larger cache size
21Potential tradeoffs experiment
- Performance varied by 11x
- Power varied by 13x
- Area varied by 1x
- Energy consumption varied by 2x
22Potential tradeoffs experiment
23Potential tradeoffs experiment
- Performance varied by 2.5x
- Power varied by 9.5x
- Area varied by 1x
- Energy consumption varied by 4x
24Potential tradeoffs experiment
- How much variation in total system power and performance can we obtain just by varying the cache and bus parameters?
  - 9 to 14x improvement in power/performance
- How interdependent are these two types of parameters?
  - Fixing cache parameter values, then selecting bus parameter values, results in non-optimal solutions
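The interdependence point can be illustrated with a toy design space. The power numbers below are made up purely to show the effect and do not come from the experiment; the function names `sequential` and `joint` are likewise hypothetical.

```python
# Made-up power numbers (arbitrary units) for a 2x2 design space; illustration only.
power = {
    ("small_cache", "narrow_bus"): 10,
    ("small_cache", "wide_bus"): 7,
    ("large_cache", "narrow_bus"): 6,
    ("large_cache", "wide_bus"): 8,
}
caches = ["small_cache", "large_cache"]
buses = ["narrow_bus", "wide_bus"]


def sequential(default_bus):
    """Pick the cache first (with a default bus), then pick the bus for that cache."""
    cache = min(caches, key=lambda c: power[(c, default_bus)])
    bus = min(buses, key=lambda b: power[(cache, b)])
    return cache, bus


def joint():
    """Search all (cache, bus) pairs together."""
    return min(power, key=power.get)


seq = sequential("wide_bus")
best = joint()
print(seq, power[seq])    # ('small_cache', 'wide_bus') 7
print(best, power[best])  # ('large_cache', 'narrow_bus') 6
```

In this toy case the sequential search settles on a pairing that the joint search beats, because the best cache depends on which bus it is paired with.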
25Many more parameters possible
- Some examples include
  - Code compression (Henkel/Wolf)
  - Address bus encoding
  - Multiple levels of memory hierarchy
  - CPU parameters (e.g., voltage scale, DP width)
  - Peripheral core parameters (our current focus)
- Fertile research area
- Can yield even larger tradeoffs if we
  - Create parameter-aware compiler
  - Adapt OS?
26Outline
- Introduction
- Parameterized SOC platforms
- Exploring parameter configurations
- Future direction: self-optimizing platforms
- Conclusions
27Exploring parameter configurations
- Low-level simulation
  - Gate-level simulation: far too slow, days per configuration
  - RT-level simulation: still slow, hours per configuration
- Our approach
  - System-level simulation: minutes per configuration
  - System-level trace simulation: seconds per configuration
  - System-level trace analysis: milliseconds per configuration
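To see why the per-configuration cost matters, the arithmetic below scales each level to the 45,568 configurations quoted for the tradeoff experiment. The exact costs (one day, one hour, one minute, one second, one millisecond) are assumed round numbers standing in for the slide's orders of magnitude, and the estimate covers a single application.

```python
CONFIGS = 45_568  # configuration count quoted in the tradeoff experiment

# Assumed representative costs: exactly one unit per configuration at each level.
seconds_per_config = {
    "gate-level simulation": 24 * 3600,   # 1 day
    "RT-level simulation": 3600,          # 1 hour
    "system-level simulation": 60,        # 1 minute
    "trace simulation": 1,                # 1 second
    "trace analysis": 0.001,              # 1 millisecond
}


def total_days(method):
    """Total time in days to evaluate every configuration with `method`."""
    return CONFIGS * seconds_per_config[method] / 86_400


for method in seconds_per_config:
    print(f"{method:26s} {total_days(method):14.4f} days")
```

Under these assumptions exhaustive gate-level evaluation would take tens of thousands of days, while trace analysis finishes in well under a minute.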
28Evaluation by gate-level simulation
- Capture each core in HDL, synthesize, simulate
Exploring Parameter Configurations
29Evaluation by system-level simulation
30Evaluation by trace-simulation
- Note that the cache simulator is non-functional
- Same approach for others
- Get traces from a small number of system simulations
31System simulation vs. trace simulation
[Figure: two flows compared. System simulation: system-level model (uP, DMA, UART), execute, parameter evaluation, power. Trace simulation: system-level model (uP, DMA, UART), execute, traces, trace simulators, parameter evaluation, power]
32Evaluation by trace-analysis
- Further speedup: statistically characterize traces
- Still only a small number of system simulations
33Trace-analysis approach for cache
- Given a trace of memory references
- Cache parameters
  - Size (S)
  - Line/block size (L)
  - Associativity (A)
- Compute the number of misses (N)
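As a reference point for what N means, the sketch below counts misses by directly simulating a set-associative cache over a trace. This is plain trace simulation, not the statistical trace-analysis method the slides describe, and the LRU replacement policy, byte addressing, and the function name `count_misses` are assumptions.

```python
from collections import OrderedDict


def count_misses(trace, size, line, assoc):
    """Count misses N for the byte addresses in `trace`.

    size: cache size S in bytes, line: line size L in bytes,
    assoc: associativity A. LRU replacement is an assumption.
    """
    num_sets = size // (line * assoc)
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in trace:
        block = addr // line
        ways = sets[block % num_sets]
        if block in ways:
            ways.move_to_end(block)        # mark as most recently used
        else:
            misses += 1
            if len(ways) == assoc:
                ways.popitem(last=False)   # evict the least recently used block
            ways[block] = True
    return misses


# Hypothetical trace: two sweeps over 64 consecutive byte addresses.
trace = list(range(64)) * 2
print(count_misses(trace, size=128, line=8, assoc=2))  # 8
print(count_misses(trace, size=32, line=8, assoc=2))   # 16
```

With the larger cache the second sweep hits entirely; with the smaller one every block is evicted before it is reused, so the miss count doubles.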
34Trace-analysis approach for cache
35Trace-analysis approach for cache
- Capture improvements obtainable by
  - changing line size at small/large values of cache size
  - changing associativity at small/large values of cache size
36Trace-analysis approach for bus
[Figure: bus model quantities: number of transfers per item, bus width, capacitance, items/second, random data]
37Trace-analysis approach for bus
- Bus equation
  - m items/second (denotes the traffic N on the bus)
  - n bits/item
  - k-bit-wide bus
  - bus-invert encoding
  - random data assumption
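The slide's equation itself did not survive in this transcript. The sketch below is therefore not the authors' formula, only one way to estimate switching activity from the same inputs (m, n, k, encoding) under the random-data assumption. It treats all k+1 lines of a bus-invert bus as independent fair coin flips, which is the usual idealization, and the function names are hypothetical.

```python
from math import ceil, comb


def expected_transitions(k, bus_invert):
    """Expected line toggles per transfer on a k-bit bus carrying random data."""
    if not bus_invert:
        return k / 2                       # each line toggles with probability 1/2
    lines = k + 1                          # data lines plus the invert line
    # With d of the k+1 lines differing, bus-invert toggles min(d, k+1-d) of them.
    total = sum(min(d, lines - d) * comb(lines, d) for d in range(lines + 1))
    return total / 2 ** lines


def transitions_per_second(m, n, k, bus_invert):
    """m items/second, n bits/item, k-bit-wide bus."""
    transfers_per_item = ceil(n / k)
    return m * transfers_per_item * expected_transitions(k, bus_invert)


print(expected_transitions(8, False))            # 4.0
print(round(expected_transitions(8, True), 2))   # 3.27
```

Switching power would additionally scale with line capacitance and supply voltage, which are left out here.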
38Trace-analysis experiments
- Cache parameters
  - size: 128, 256, 512, 1k, 2k, 4k, 8k, 16k, 32k
  - assoc: 2, 4, 8
  - line: 8, 16, 32
- Bus parameters
  - width: 4, 8, 16, 32
  - code: binary/bus-invert
- Analyzed 45K sets exhaustively for each of 4 examples.
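Exhaustive exploration amounts to enumerating the Cartesian product of the parameter value lists. The sketch below does so for the values listed on this slide, for one cache and one bus. That yields 648 combinations, fewer than the 45K quoted, so the full experiment varied more than this single list shows; the transcript does not spell out how.

```python
from itertools import product

cache_sizes = [128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768]
assocs = [2, 4, 8]
line_sizes = [8, 16, 32]
bus_widths = [4, 8, 16, 32]
bus_codes = ["binary", "bus-invert"]

# Every combination of one value per parameter is one configuration.
configs = list(product(cache_sizes, assocs, line_sizes, bus_widths, bus_codes))
print(len(configs))   # 648
print(configs[0])     # (128, 2, 8, 4, 'binary')
```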
39Experiment Results
- Diesel application's performance
- Blue (light-gray) is system-simulation-based
- Red (dark-gray) is trace-analysis-based
- 4% error, 320x faster
40Experiment Results
- Diesel application's energy consumption
- Blue (light-gray) is obtained using full simulation
- Red (dark-gray) is obtained using our equations
- 2% error, 420x faster
41Experiment Results
- CKey application's performance
- Blue (light-gray) is obtained using full simulation
- Red (dark-gray) is obtained using our equations
- 8% error, 125x faster
42Experiment Results
- CKey application's energy consumption
- Blue (light-gray) is obtained using full simulation
- Red (dark-gray) is obtained using our equations
- 3% error, 125x faster
43Experiment Results
- 125-400x speedup
- 1-18% absolute error (power and performance)
- 2% average power error
44Techniques for general cores
- Earlier experiments were for uP/cache/bus
- System simulation for other cores (ISSS00)
  - Isolate instructions in system-level model
  - Gate-level simulation per instruction
  - Back-annotate system-level model's instructions
- Similar to technique for microprocessors, but must consider power modes
45Trace approach for general cores
- Full trace: Reset --, Quantize P1,P2,...,P64, IDCT P1,P2,...,P64, Quantize P1,P2,...,P64, IDCT P1,P2,...,P64
- Reduced trace with characterized data: Reset --, Quantize .80, IDCT .72, Quantize .93, IDCT .63
- Reduced trace with instructions only: Reset --, Quantize --, IDCT --, Quantize --, IDCT --
- Reduced trace with instruction frequencies: Reset 1, Quantize 2, IDCT 2
[Figure: system-level model (uP, DMA, UART), execute, traces, trace simulators, parameter evaluation, power]
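The two simplest reductions on this slide, instructions only and instruction frequencies, can be sketched in a few lines. The data payloads and the per-instruction energy values are placeholders invented for illustration; in the approach described they would come from gate-level characterization. The data-characterized variant (.80, .72, ...) is not reproduced because the slides do not say how those values are computed.

```python
from collections import Counter

# Full trace as (instruction, data) pairs; the 64-element payloads are placeholders.
full_trace = [
    ("Reset", None),
    ("Quantize", list(range(64))),
    ("IDCT", list(range(64))),
    ("Quantize", list(range(64))),
    ("IDCT", list(range(64))),
]

# Reduced trace with instructions only: drop the data.
instructions_only = [instr for instr, _ in full_trace]

# Reduced trace with instruction frequencies: drop the ordering as well.
frequencies = Counter(instructions_only)
print(dict(frequencies))  # {'Reset': 1, 'Quantize': 2, 'IDCT': 2}

# Hypothetical per-instruction energies (arbitrary units), not measured values.
energy_per_instr = {"Reset": 1, "Quantize": 20, "IDCT": 30}
total_energy = sum(n * energy_per_instr[i] for i, n in frequencies.items())
print(total_energy)  # 101
```

Each reduction shrinks the trace and speeds up evaluation, at the cost of ignoring data-dependent power variation.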
46Experiments with general cores JPEG
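Trace file size (Kb) and CPU time for power evaluation (sec):
pixel size (bits) | trace size ftrc | trace size rtrc_cd | trace size rtrc_i | CPU gate | CPU sys | CPU ftrc | CPU rtrc_cd | CPU rtrc_i
10 | 32 | 3.6 | 0.5 | 290000 | 48 | 26 | 4.9 | 4.6
12 | 39 | 3.6 | 0.5 | 330000 | 49 | 27 | 5.1 | 4.6
average speedup (vs. gate) | | | | | 6K | 12K | 62K | 67K

Energy (mJ) and error (%):
pixel size (bits) | gate mJ | ftrc mJ | ftrc error | rtrc_cd mJ | rtrc_cd error | rtrc_i mJ | rtrc_i error
10 | 420 | 443 | 5 | 451 | 7 | 491 | 17
12 | 531 | 569 | 7 | 576 | 8 | 632 | 19
average error | | | 6 | | 7.5 | | 18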
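The error figures in the JPEG energy results appear to be percentage differences relative to the gate-level value; that reading is an inference, but the short check below reproduces them from the energy numbers given. The helper name `percent_error` is illustrative.

```python
# Energy (mJ) from the JPEG results: gate-level reference and three trace methods.
energy = {
    10: {"gate": 420, "ftrc": 443, "rtrc_cd": 451, "rtrc_i": 491},
    12: {"gate": 531, "ftrc": 569, "rtrc_cd": 576, "rtrc_i": 632},
}
methods = ("ftrc", "rtrc_cd", "rtrc_i")


def percent_error(estimate, reference):
    """Absolute difference from the reference, as a percentage of the reference."""
    return 100 * abs(estimate - reference) / reference


errors = {
    bits: {m: round(percent_error(row[m], row["gate"])) for m in methods}
    for bits, row in energy.items()
}
for bits, row in errors.items():
    print(bits, row)
# 10 {'ftrc': 5, 'rtrc_cd': 7, 'rtrc_i': 17}
# 12 {'ftrc': 7, 'rtrc_cd': 8, 'rtrc_i': 19}

# Averaging the rounded per-row errors gives the "average error" row.
averages = {m: (errors[10][m] + errors[12][m]) / 2 for m in methods}
print(averages)  # {'ftrc': 6.0, 'rtrc_cd': 7.5, 'rtrc_i': 18.0}
```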
47Experiments with general cores UART
48Outline
- Introduction
- Parameterized SOC platforms
- Exploring parameter configurations
- Future direction: self-optimizing platforms
- Conclusions
49Future directions
- Earlier work
  - Used software on workstation to explore parameter configurations
- Self-optimizing platform
  - Can we build the exploration ability into the platform itself?
  - Transparent to the user
  - Ease of use, more accurate metrics, wider acceptance, ...
- Embedded CAD
[Figure: exploration software (earlier work) and exploration ability built into the platform]
50Conclusions
- Parameters can improve usefulness of programmable platforms
  - by adapting platform to particular application and to power/performance constraints
- Good tradeoff range even for basic parameters
- Fast and accurate evaluation seems possible
- Much work remains
  - More parameters
  - Better exploration
  - Self-optimizing platforms