Compiler based Optimization Techniques for Scratchpad Memory

1
Compiler based Optimization Techniques for
Scratchpad Memory
  • Manish Verma, Peter Marwedel
  • Department of Computer Science XII,
  • University of Dortmund,
  • Germany

2
Outline
  • Introduction
  • Motivation
  • Static Allocation Approach
  • Scratchpad only architecture
  • Cache + Scratchpad architecture
  • Dynamic Allocation Approach
  • Scratchpad only architecture
  • Conclusion & Future Work

[1] S. Steinke, DATE, 2002
3
Embedded Systems
  • Embedded systems (ES): information processing
    systems embedded into a larger product

Main reason for buying is not information
processing
  • Transportation (e.g. ABS)
  • Telecommunication (e.g. mobile phone)
  • Manufacturing (incl. robotics)
  • Medical instruments (e.g. artificial eye)

www.dobelle.com
4
Power Issues
Power is considered the most important
constraint in embedded systems [Eggermont
(ed.), Embedded Systems Roadmap 2002, STW]
5
Power Distribution
  • Memory subsystem consumes > 50% of the total
    energy budget [1]
  • Memory hierarchy
  • Cache vs. scratchpad
  • Power [2]
  • Performance [2]
  • Predictability [3]
  • Software Support

[1] S. Segars, ISSCC, 2001
[2] S. Steinke, DATE, 2002
[3] P. Marwedel, ASPDAC, 2004
6
Low Power/Energy Techniques are Essential
  • Low energy dissipation is imperative for
    battery-driven embedded systems
  • Low power techniques are essential to both
    embedded systems and high performance processors

Skadron et al., 30th ISCA
Hot enough to cook an egg.
  • High performance processors are going to be too
    hot to work

7
Outline
  • Introduction
  • Motivation
  • Static Allocation Approach
  • Scratchpad only architecture [1]
  • Cache + Scratchpad architecture
  • Dynamic Allocation Approach
  • Scratchpad only architecture
  • Conclusion & Future Work

[1] S. Steinke, DATE, 2002
8
Focus on memory- and energy-aware
compilation: scratchpad memories (SPM)
  • Fast,
  • energy-efficient,
  • timing-predictable

Small; no tag memory
9
Scratchpad vs. main memory energy
Example: Atmel ARM evaluation board
[Figure: energy for four combinations of program and data placement relative to the SPM; annotations: energy reduction / 7.06, 100% predictable]
10
Static Allocation (Scratchpad only)
Example objects: int nat(), real sin(), char ch(), int wh(), int p, real a, int c
[Figure: objects in "main" memory and an SPM of capacity K]
Which objects (functions, variables) should be stored
in the SPM? Gain $g_m$ and size $s_m$ for each object
$m$. Maximise the gain $G = \sum_m g_m$, respecting the constraint
$K \ge \sum_m s_m$ (sums taken over the objects placed in the SPM).
Static memory allocation = knapsack problem
11
Static Allocation (Scratchpad only)
Symbols:
  • $s(var_m)$ = size of variable $m$
  • $n(var_m)$ = number of accesses to variable $m$
  • $e(var_m)$ = energy saved per variable access, if $var_m$ is migrated
  • $E(var_m)$ = energy saved if $var_m$ is migrated ($= e(var_m) \cdot n(var_m)$)
  • $x(var_m)$ = 1 if variable $m$ is migrated to SPM, else 0
  • $M$ = set of variables
Similar for functions ($F_i$, $i \in I$).
Integer programming formulation:
Maximize $\sum_{i \in I} x(F_i)\,E(F_i) + \sum_{m \in M} x(var_m)\,E(var_m)$
Subject to the constraint $\sum_{i \in I} s(F_i)\,x(F_i) + \sum_{m \in M} s(var_m)\,x(var_m) \le K$
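The formulation above is a 0/1 knapsack problem, so for small inputs it can be solved exactly without an ILP solver. The following is a minimal sketch using dynamic programming over the SPM capacity; the function name `allocate_spm` and all object sizes, access counts and per-access savings are hypothetical values chosen for illustration, not data from the slides.

```python
def allocate_spm(objects, capacity):
    """Pick the objects to place in the SPM (0/1 knapsack, dynamic programming).

    objects:  list of (name, size, accesses, energy_saved_per_access)
    capacity: SPM capacity K, in the same unit as the sizes
    Returns (total energy saved, names of the migrated objects).
    """
    # best[k] = (largest saving, chosen names) using at most k units of SPM
    best = [(0, [])] * (capacity + 1)
    for name, size, accesses, e_per_access in objects:
        gain = accesses * e_per_access  # E(m) = e(m) * n(m)
        # Walk capacities downwards so each object is used at most once.
        for k in range(capacity, size - 1, -1):
            candidate = best[k - size][0] + gain
            if candidate > best[k][0]:
                best[k] = (candidate, best[k - size][1] + [name])
    return best[capacity]


# Hypothetical objects: (name, size, accesses, energy saved per access)
objects = [("nat", 4, 40, 1), ("sin", 3, 50, 1), ("p", 2, 15, 2), ("a", 1, 5, 1)]
saved, migrated = allocate_spm(objects, capacity=5)
print(saved, migrated)
```

With these made-up numbers the sketch migrates `sin` and `p`, which together fill the capacity of 5 and save 80 energy units. Functions and variables are treated alike here, which mirrors the "similar for functions" remark.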
12
Results (Energy & Runtime)
Feasible with standard compiler postpass
optimization
[Chart: cycles for Multi_sort (mix of sort algorithms)]
13
Outline
  • Introduction
  • Motivation
  • Static Allocation Approach
  • Scratchpad only architecture
  • Cache + Scratchpad architecture
  • Dynamic Allocation Approach
  • Scratchpad only architecture
  • Conclusion & Future Work

14
Static Allocation (Cache + Scratchpad)
  • Caches + Scratchpads
  • I-Mem subsystem
  • Trace Generation
  • memory objects (MO)
  • Conflict Graph
  • models I-Cache behavior
  • interaction of MOs
  • Fine Grained Energy Model
  • cache hits
  • cache misses

15
Example
  • B1 ((B2 B5 B6 B7)^9 (B2 B3 B4 B7))^10 B8

[Figure: basic blocks B1 to B7 mapped onto the I-cache, labelled with the pairs 100, 0; 10, 10; 100, 0; 90, 10]
Total cache misses: 40
16
Trace Generation
  • Minimize jumps across traces
  • NP-complete problem
  • Greedy approach
  • Coalesce the most frequently executed basic blocks (BB)
  • Size of trace < scratchpad size
  • Append NOPs
  • Reduce I-cache misses
  • Improve processor cycles
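One way to read the greedy approach is to visit control-flow edges from most to least frequently executed and glue the end of one trace to the start of another while the result stays smaller than the scratchpad. The sketch below follows that reading; it is not the authors' implementation. The edge frequencies are derived from the block sequence B1 ((B2 B5 B6 B7)^9 (B2 B3 B4 B7))^10 B8 used in the example, while the unit block sizes, the scratchpad size of 3 and the name `greedy_traces` are assumptions. NOP padding is not modelled.

```python
def greedy_traces(block_size, edges, spm_size):
    """Greedily coalesce basic blocks into traces smaller than the scratchpad.

    block_size: dict block -> size
    edges:      list of (src, dst, execution frequency)
    spm_size:   scratchpad size; every trace must stay strictly below it
    """
    # Every basic block starts as its own single-block trace.
    trace_of = {b: [b] for b in block_size}
    # Visit control-flow edges from most to least frequently executed.
    for src, dst, _freq in sorted(edges, key=lambda e: -e[2]):
        head, tail = trace_of[src], trace_of[dst]
        if head is tail:
            continue  # both blocks already share a trace
        # Only glue the end of one trace to the start of another.
        if head[-1] != src or tail[0] != dst:
            continue
        if sum(block_size[b] for b in head + tail) >= spm_size:
            continue  # merged trace would not be smaller than the scratchpad
        merged = head + tail
        for b in merged:
            trace_of[b] = merged
    return sorted({tuple(t) for t in trace_of.values()})


# Assumed sizes (one unit per block) and frequencies derived from the example.
block_size = {f"B{i}": 1 for i in range(1, 9)}
edges = [
    ("B1", "B2", 1), ("B2", "B5", 90), ("B5", "B6", 90), ("B6", "B7", 90),
    ("B2", "B3", 10), ("B3", "B4", 10), ("B4", "B7", 10),
    ("B7", "B2", 99), ("B7", "B8", 1),
]
traces = greedy_traces(block_size, edges, spm_size=3)
print(traces)
```

Under these assumptions the sketch produces five traces, the same count as T1 to T5 in the slides, although the slides do not spell out which blocks each trace contains.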

17
Conflict Graph
T4 ((T1 T2 T1)^9 (T1 T3 T1))^10 T5
[Figure: conflict graph; node labels 180,20; 200,0; 20,20]
  • Weighted Directed Graph
  • Nodes (traces)
  • Execution frequency
  • Edges (conflict relationship)
  • conflict misses

18
Energy Model
[Equation not recovered: energy split into a constant part and a variable part that depends on program layout]
19
Energy Model (Example)
  • $E_{Cache\_hit} = 1$
  • $E_{Cache\_miss} = 10$
  • $E_{Cache}(T2) = 180 \cdot 1 + 20 \cdot (10 - 1) = 360$
  • $E(Total) = 760$
  • Energy consumption in Cache
  • program layout
  • execution frequencies insufficient
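The arithmetic in this example can be checked with a few lines of code. This is a minimal sketch, assuming each trace is described by a pair (accesses, misses) as in the conflict graph labels 200,0 / 180,20 / 20,20, and that T4 and T5 are left out of the total, which is the reading that reproduces the figure of 760. The function name `cache_energy` is hypothetical.

```python
E_CACHE_HIT = 1
E_CACHE_MISS = 10


def cache_energy(accesses, misses):
    """Every access pays the hit energy; each miss pays the extra miss penalty."""
    return accesses * E_CACHE_HIT + misses * (E_CACHE_MISS - E_CACHE_HIT)


# (accesses, misses) per trace, taken from the conflict graph labels.
traces = {"T1": (200, 0), "T2": (180, 20), "T3": (20, 20)}
per_trace = {name: cache_energy(n, m) for name, (n, m) in traces.items()}
total = sum(per_trace.values())
print(per_trace, total)
```

T3 is executed far less often than T1 yet costs the same energy because all of its accesses miss, which illustrates why execution frequencies alone are insufficient.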

20
Problem Formulation
  • NP-complete
  • Knapsack (no edges)
  • Maximum Independent Set
  • (ESP_Hit ECache_Hit)
  • Integer Linear Programming /
  • Greedy Heuristic

[Figure: conflict graph with nodes T1 (200), T2 (180), T3 (20), T4 (1), T5 (1); further labels 20, 20, 360, 200, 200]
  • Formal Problem Formulation
  • Given: conflict graph (G), scratchpad, I-cache,
    energy model
  • Determine: minimum-energy mapping
  • Assumption: copying traces creates no new edges

21
Solution (Example)
T4 ((T1 T2 T1)^9 (T1 T3 T1))^10 T5
[Figure: conflict graph with nodes T1 (200), T2 (180), T3 (20), T4 (1), T5 (1), next to the resulting I-Mem layout across the I-cache and the scratchpad, with NOP padding]
  • $E_{SP\_hit} = 0.5$
  • $E(Total) = 310$
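For a graph this small, the minimum-energy mapping can be found by trying every subset of traces that fits in the scratchpad. The sketch below does that by brute force; it stands in for the ILP or greedy heuristic named in the problem formulation rather than reproducing either. It assumes that the two edge weights of 20 are the misses T2 and T3 inflict on each other, that a conflict disappears once either trace moves to the scratchpad, that all traces have unit size and that the scratchpad holds one trace. T4 and T5 are ignored, as in the energy example.

```python
from itertools import combinations

E_CACHE_HIT, E_CACHE_MISS, E_SP_HIT = 1, 10, 0.5

accesses = {"T1": 200, "T2": 180, "T3": 20}
# conflict[(victim, evictor)] = misses of `victim` caused by `evictor` (assumed)
conflict = {("T2", "T3"): 20, ("T3", "T2"): 20}
size = {"T1": 1, "T2": 1, "T3": 1}  # hypothetical unit sizes
SPM_CAPACITY = 1                    # hypothetical capacity


def total_energy(in_spm):
    """Energy of a mapping; `in_spm` is the set of traces placed in the scratchpad."""
    energy = 0.0
    for trace, n in accesses.items():
        energy += n * (E_SP_HIT if trace in in_spm else E_CACHE_HIT)
    for (victim, evictor), misses in conflict.items():
        # A conflict miss only occurs while both traces still share the cache.
        if victim not in in_spm and evictor not in in_spm:
            energy += misses * (E_CACHE_MISS - E_CACHE_HIT)
    return energy


def best_mapping():
    """Enumerate all subsets that fit and return (energy, traces in scratchpad)."""
    names = list(accesses)
    candidates = []
    for r in range(len(names) + 1):
        for subset in combinations(names, r):
            if sum(size[t] for t in subset) <= SPM_CAPACITY:
                candidates.append((total_energy(set(subset)), subset))
    return min(candidates)


best = best_mapping()
print(best)
```

Under these assumptions, moving T2 to the scratchpad is the cheapest choice and yields a total of 310, compared with 760 when everything stays in the cache. Moving the most frequently executed trace T1 instead would give 660, so the conflict edges matter more than raw frequency here.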
22
Energy Consumption (I-Cache)
MPEG benchmark
23
Energy Consumption (Cache + Scratchpad)
8kB DM I-Cache
MPEG benchmark
24
Energy Consumption (Cache + Scratchpad)
Static Allocation (Scratchpad only)
MPEG 20kB Cache 2K DM
25
Outline
  • Introduction
  • Motivation
  • Static Allocation Approach
  • Scratchpad only architecture
  • Cache + Scratchpad architecture
  • Dynamic Allocation Approach (Scratchpad Overlay)
  • Scratchpad only architecture
  • Conclusion & Future Work

26
Motivation (Dynamic Allocation)
SPILL_LOAD(A)
for (i = 0; i < 100; i++) A[i]
for (j = 0; j < 100; j++) A[j]
SPILL_STORE(A)
SPILL_LOAD(B)
for (i = 0; i < 100; i++) B[i]
for (j = 0; j < 100; j++) B[j]
SPILL_STORE(B)
Main Memory
Scratchpad Memory
  • Dynamic Allocation (Scratchpad Overlay)
  • increased scratchpad utilization
  • overhead due to spill routines
  • similar to register allocation
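Whether overlaying an object pays off depends on how the saving per access compares with the cost of the spill routines. The following sketch makes that trade-off concrete with a deliberately simple cost model: each spill copies the object word by word, paying one access on each memory. The function name `overlay_gain` and the per-access energies are hypothetical; only the 100-element array and its two loops come from the example above.

```python
def overlay_gain(accesses, size_words, e_main, e_spm):
    """Net energy gain of keeping an object in the scratchpad for one live range.

    Assumed cost model: SPILL_LOAD and SPILL_STORE each copy `size_words`
    words, and every copied word costs one main-memory access plus one
    scratchpad access.
    """
    saving = accesses * (e_main - e_spm)
    spill_cost = 2 * size_words * (e_main + e_spm)  # load + store
    return saving - spill_cost


# Array A: 100 words, accessed in two loops of 100 iterations (200 accesses).
print(overlay_gain(accesses=200, size_words=100, e_main=10, e_spm=1))
# The same array accessed 1000 times between the spill routines.
print(overlay_gain(accesses=1000, size_words=100, e_main=10, e_spm=1))
```

With these assumed energies, 200 accesses do not amortise the spill overhead, while 1000 accesses do. An allocator therefore has to weigh the spill cost for every candidate object, much as a register allocator weighs spill code.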

27
Comparison against Register Allocation
[Figure: RISC processor data path with register file and scratchpad]
  • Scarce Resource (Register File / Scratchpad)
  • Life-time of variables (temp. regs. / variables and
    code)
  • Similar to register allocation (RA) for CISC, not for RISC processors
  • Memory objects (variables and code) are of various sizes

28
Workflow (Scratchpad Overlay)
  • Memory Object Determination
  • Liveness Analysis
  • Code Generation
  • Scratchpad Overlay
  • Memory Assignment
  • Onchip Address Assignment

29
Memory Object Determination
  • Memory Objects
  • Global Variables (A)
  • Non-Scalar Local Variables
  • Traces (T1, T2, T3, T4)

[Figure: control flow graph with basic blocks B1 to B8]
MO = {A, T1, T2, T3, T4}
30
Liveness Analysis
  • DEF-MOD-USE
  • Variables: profiling information
  • Traces: static analysis

[Figure: control flow graph with basic blocks B1 to B8, annotated with DEF A, MOD A, USE A, USE T3 and USE T4]
LiveRange: fixed-point iterative method
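A fixed-point iteration for live ranges can be sketched as a standard backward dataflow analysis. This is a generic textbook formulation, not the authors' exact method. The control flow graph follows the block sequence B1 ((B2 B5 B6 B7)^9 (B2 B3 B4 B7))^10 B8 from the earlier example. The placement of DEF A in B1, MOD A in B2 and USE A in B6 and B8 is an assumption, since the figure did not survive extraction, and MOD is treated as a use that does not kill the object.

```python
def live_ranges(succ, use, define):
    """Backward liveness analysis, iterated until nothing changes.

    succ:   dict block -> list of successor blocks
    use:    dict block -> set of objects read (USE or MOD) in the block
    define: dict block -> set of objects fully defined (DEF) in the block
    Returns (live_in, live_out), each a dict block -> set of live objects.
    """
    live_in = {b: set() for b in succ}
    live_out = {b: set() for b in succ}
    changed = True
    while changed:
        changed = False
        for b in succ:
            # Live at exit: anything live at the entry of some successor.
            out = set().union(*(live_in[s] for s in succ[b]))
            # Live at entry: used here, or live at exit and not redefined here.
            inn = use.get(b, set()) | (out - define.get(b, set()))
            if out != live_out[b] or inn != live_in[b]:
                live_out[b], live_in[b] = out, inn
                changed = True
    return live_in, live_out


succ = {
    "B1": ["B2"], "B2": ["B3", "B5"], "B3": ["B4"], "B4": ["B7"],
    "B5": ["B6"], "B6": ["B7"], "B7": ["B2", "B8"], "B8": [],
}
use = {"B2": {"A"}, "B6": {"A"}, "B8": {"A"}}  # assumed MOD/USE placement
define = {"B1": {"A"}}                         # assumed DEF placement
live_in, live_out = live_ranges(succ, use, define)
print(live_in)
```

Under these assumptions A is live on entry to every block except B1, so its live range spans the whole loop. The same iteration applies to traces once their USE sets are known from static analysis.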
31
Memory Assignment
  • Given: MOs, live ranges, scratchpad
  • Determine: memory assignment of MOs
  • Assumption: on-chip addresses can be assigned to the MOs
  • Discussion: NP-complete, reduces to register
    allocation
  • Solutions
  • Optimal: ILP formulation (16 sec.)
  • Near-optimal: heuristic

Processor
Scratchpad
Main Memory
32
Memory Assignment (Solution)
  • MO = {A, T1, T2, T3, T4}
  • SP Size A T1 T4

[Figure: control flow graph with basic blocks B1 to B8 and inserted blocks B9 (SPILL_STORE(A), SPILL_LOAD(T3)) and B10 (SPILL_LOAD(A)); annotations DEF A, MOD A, USE A, USE T3]
Solution: A → SP, T3 → SP
33
Onchip Address Assignment
Fragmentation problem
  • Given: memory assignment, scratchpad
  • Determine: on-chip address (offset) of MOs
  • Discussion: NP-complete, reduces to the
    Ship-Building problem
  • Solution
  • Optimal: MIP formulation (4 hours)
  • Near-optimal: first-fit, best-fit heuristics

[Figure: scratchpad address axis with ticks 0, 20, 40, 60]
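A first-fit heuristic for offset assignment can be sketched as follows: objects are placed one at a time at the lowest offset that does not collide with any already placed object whose live range overlaps. This is an illustrative version under assumed inputs, not the heuristic evaluated in the slides. Live ranges are modelled as half-open intervals on an abstract timeline, and the objects A to D with their sizes are hypothetical; the scratchpad size of 60 matches the address axis in the figure.

```python
def first_fit(objects, spm_size):
    """Assign scratchpad offsets with a first-fit scan.

    objects:  list of (name, size, start, end), live range is [start, end)
    spm_size: scratchpad size
    Returns dict name -> offset, or None when the object cannot be placed.
    """
    placed = []   # (offset, size, start, end) of objects already assigned
    offsets = {}
    for name, size, start, end in objects:
        # Address ranges held by objects that are live at the same time.
        busy = sorted(
            (off, off + sz) for off, sz, s, e in placed if s < end and start < e
        )
        offset = 0
        for lo, hi in busy:
            if offset + size <= lo:
                break  # the gap in front of this busy range is large enough
            offset = max(offset, hi)
        if offset + size > spm_size:
            offsets[name] = None  # no contiguous hole of sufficient size
            continue
        placed.append((offset, size, start, end))
        offsets[name] = offset
    return offsets


# Hypothetical memory objects: (name, size, live-range start, live-range end)
objects = [("A", 20, 0, 10), ("B", 20, 0, 5), ("C", 10, 0, 10), ("D", 30, 5, 10)]
offsets = first_fit(objects, spm_size=60)
print(offsets)
```

In this example D cannot be placed even though 30 units are free while it is live, because the free space is split into a hole of 20 and a hole of 10. That is the fragmentation problem the slide refers to, and it is why the order of placement matters for first-fit and best-fit.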
34
Onchip Address Assignment (MIP)
$O_{ij}$ = offset of memory object $mo_j$ at edge $e_i$
[Figure: offsets $O_{ik}$ and $O_{ij}$ of two memory objects]
Non-Overlap Constraints
Invariance Constraints
35
Results (Edge Detection)
1/8th Scratchpad
36
Results (SO vs. SA)
[Chart: scratchpad overlay (SO) vs. static allocation (SA) for Edge Detection; data labels 21, 22, 43, 64]
37
Results (SO vs. SA)
[Chart: scratchpad overlay (SO) vs. static allocation (SA); data labels 36, 34]
38
Conclusion & Future Work
  • Scratchpads are energy-efficient memories.
  • Software allocation methods
  • Static Allocation Approach
  • avg. 30% reduction in energy consumption
  • SP + I-Cache is better than the best I-Cache
  • Dynamic Allocation Approach
  • avg. 30% reduction in energy consumption
  • Future Work
  • Multi-memory / Multi-Process.
  • Near-optimal solutions.

39
Multi-process Scratchpad Allocation Strategies
  • Static Allocation (SAMP)
  • Distributes SPM into non-overlapping regions
  • Good for large scratchpads

Static Region
  • Dynamic Allocation (DAMP)
  • Single common region for all processes
  • Good for small scratchpads

Dynamic Region
  • Hybrid Allocation (HAMP)
  • Static + dynamic regions
  • Good for all scratchpads

Scratchpad
40
Results
adpcm, g721, mpeg, edge_detection
41
Objective Function (ILP)
Objective function: energy savings
Maximize:
energy reduction by assigning $mo_k$ to the scratchpad at edge $e_i$
minus the energy overhead of loading $mo_k$ to the scratchpad at edge $e_i$
minus the energy overhead of storing $mo_k$ from the scratchpad to main memory at edge $e_i$
42
Constraints (ILP)
  • Flow Constraints

DEF Constraint
DEF mok
USE/MOD/CONT Constraint
ei
STORE mok
CONT mok
ej
  • Size Constraints

43
Memory Assignment (ILP)
  • ILP inequations for edges, not for basic block
    nodes
  • Attributes on edges: AttribSTATIC and AttribSPILL

AttribSTATIC = {DEF, MOD, USE, CONT}
AttribSPILL = {LOAD, STORE}
DEF > MOD > USE > CONT
44
Edge Attributes (example)
  • AttribSPILL = {STORE, LOAD}
  • STORE attribute
  • edges with DEF attribute
  • in-edges of a merge node
  • LOAD attribute
  • edges with USE, MOD, CONT attribute
  • out-edges of a diverge node

[Figure: control flow graph with basic blocks B1 to B8, annotated with DEF A, MOD A and USE A; edges labelled DEF, MOD, USE, CONT, in places combined with STORE or LOAD]
45
Preloaded Loop Cache
  • Nice balance between cache and scratchpad
  • Little or no software support
  • Predictable I-Cache behavior
  • Energy overhead of Controller
  • Predetermined number of memory objects (2-8)
  • Strong dependence on application

Processor
Controller
Loop Cache
I-Cache
46
Motivation
  • Static Scratchpad Allocation

[Figure: I-cache and scratchpad layout with B7 in the scratchpad; labels 100, 0; 10, 10; 100, 0; 90, 10]
Total cache misses: 200
47
Cache Aware Scratchpad Allocation (CASA)
Algorithm
  • Trace Generation
  • memory objects (MO)
  • Conflict Graph
  • models cache behavior
  • interaction of MO
  • Energy Model
  • Integer Linear Inequations

48
Trace Generation
B1 ((B2 B5 B6 B7)^9 (B2 B3 B4 B7))^10 B8
T4 ((T1 T2 T1)^9 (T1 T3 T1))^10 T5
[Figure: conflict graph; node labels 180,20; 200,0; 20,20]
49
SP (CASA) vs. LC (Ross) Energy
MPEG 20kB Cache 2K DM 4 Memory Obj.
Loop Cache (Ross)
50
SP (CASA), SP (Steinke) vs. LC (Ross) Energy
Loop Cache (Ross)
51
Characteristics of Embedded Systems
  • Dependability
  • Reliability
  • Safety
  • Security/privacy
  • Meeting real-time constraints
  • Reactive (→ finite state machine)
  • Specialized user interface
  • Efficiency (weight, energy, price)
  • Analogue and digital components
  • Sensors, connected to physical environment