1
Memory Optimizations in Embedded Systems
  • Preeti Ranjan Panda
  • Synopsys, Inc.
  • Email: panda@synopsys.com
  • Embedded Systems Design Workshop
  • 2-4 January, 2002, I.I.T. Delhi

2
Why is memory important?
Memory access is just another instruction...
...so why treat memory differently?
3
Rate of Performance Improvement is different
(Chart: CPU speed vs. memory speed over the years; CPU performance improves much faster than memory performance)
4
Impact on Processor Pipeline
Clock cycle determined by slowest pipeline stage
5
Memory Hierarchy
  • To retain a short clock cycle, we keep a small
    memory in the pipeline
  • This leads to a Memory Hierarchy

Main Memory
6
Impact of Memory Architecture Decisions
  • Area
  • 50-70% of an ASIC/ASIP may be memory
  • Performance
  • 10-90% of system performance may be memory
    related
  • Power
  • 25-40% of system power may be memory related

7
Important Memory Decisions in Embedded Systems
  • What is a good memory architecture for an
    application?
  • Total memory requirement
  • Delay due to memory
  • Power dissipation due to memory access
  • Compiler and Synthesis tool (Exploration tools)
    should make informed decisions on
  • Registers and Register files
  • Cache parameters
  • Number and size of memory banks

8
Embedded Systems Path to Implementation
Specification/ Program
HW/Software Partitioning
HW
SW
  • Synthesis Flow
  • High Level Synthesis
  • RTL Synthesis
  • Logic Synthesis
  • Compiler Flow
  • Parsing
  • Optimizations
  • Code Generation

9
Outline
  • Memory Modeling in High Level Synthesis
  • Registers and Register Files
  • Modeling SRAM Access
  • Modeling DRAM Access
  • Optimizations
  • Data Placement
  • Register Allocation
  • Storing Data in SRAM
  • Data Cache
  • Memory Customisation and Exploration
  • Scratch Pad Memory
  • Memory Banking

10
Outline
  • Memory Modeling in High Level Synthesis
  • Registers and Register Files
  • Modeling SRAM Access
  • Modeling DRAM Access
  • Optimizations
  • Data Placement
  • Register Allocation
  • Storing Data in SRAM
  • Data Cache
  • Memory Customisation and Exploration
  • Scratch Pad Memory
  • Memory Banking

11
High Level Synthesis
  • Under Constraints
  • Total Delay
  • Limited Resources

12
High Level Synthesis: Scheduling
13
High Level Synthesis: Resource Allocation and Binding
14
Registers in High Level Synthesis
(Figure: DFG over values A, B, C, D bound to registers)
Resource constraint: 2 adders
15
Register Access Model
16
Limitation of Registers
  • Complex Interconnect
  • Every register connects to every FU

(Figure: registers R1-R4 each connected to every functional unit FU1-FU4 through a full crossbar)
17
Register Files
(Figure: register file containing R1, R2, R3)
  • Modular architecture
  • Limited connectivity
  • New optimization opportunities

18
Access Model of Register Files
Register File
19
Outline
  • Memory Modeling in High Level Synthesis
  • Registers and Register Files
  • Modeling SRAM Access
  • Modeling DRAM Access
  • Optimizations
  • Data Placement
  • Register Allocation
  • Storing Data in SRAM
  • Data Cache
  • Memory Customisation and Exploration
  • Scratch Pad Memory
  • Memory Banking

20
Motivation for SRAM
  • Limitation of Register File
  • OK for scalar variables
  • NOT OK for array variables
  • Need to handle large address space
  • But retain fast access to scalar variables

21
SRAM Access
(Figure: SRAM read and write transfers over the address and data buses)
22
SRAM-based Architecture
(Figure: datapath connected to SRAM through address and data buses)
Similar to a processor, but predictability is necessary
23
Memory Model in HLS
Multicycle Operations
24
Behavioral Templates
1. Defines precedence constraints between stages
2. Templates are used directly by the scheduler
(Figure: a template spanning Stage 1, Stage 2, Stage 3)
25
Templates for Memory Access
3-Cycle MEMORY READ
26
Using Memory Templates
  • Operation can be scheduled into Cycle 1
  • No change to scheduling algorithm
  • Used in Synopsys Behavioral Compiler

27
Outline
  • Memory Modeling in High Level Synthesis
  • Registers and Register Files
  • Modeling SRAM Access
  • Modeling DRAM Access
  • Optimizations
  • Data Placement
  • Register Allocation
  • Storing Data in SRAM
  • Data Cache
  • Memory Customisation and Exploration
  • Scratch Pad Memory
  • Memory Banking

28
Motivation for DRAM
  • Large arrays stored in off-chip DRAM
  • Embedded DRAM technology

29
DRAM-based Architecture
(Figure: Reg File and SRAM reachable in 1 cycle over the address/data buses; off-chip DRAM takes on the order of 10 cycles)
DRAM access times are not constant!
30
Typical DRAM Organization
(Figure: the address bus feeds the Row Addr to the Row Decoder and the Col Addr to the Column Decoder; the selected row of the Cell Array forms the Page, and data moves over the Data Bus)
31
Memory Read Operation
(Figure: read timing, row decode then column decode, and the corresponding synthesis model)
32
Reading Multiple Words
Sample behavior FindAverage: Av = (b[0] + b[1] + b[2] + b[3]) / 4
Memory Read: 7 cycles; Add: 1 cycle; Shift: 1 cycle
Schedule Length = 7 x 4 = 28 cycles
(Figure: each memory read issues Row Addr, then Col Addr, then transfers Data)
33
Page Mode Read
Behavior FindAverage: Av = (b[0] + b[1] + b[2] + b[3]) / 4
Schedule Length = 14 cycles (50% faster)
(Figure: one row decode followed by four page-mode column accesses)
34
Modeling Memory Operations
(Figure: memory operation building blocks; Setup, Row Decode, Col Decode (Read) with Col Addr and Data, Col Decode (Write) with Col Addr and Data, Precharge)
35
Memory Write Operation
Sequence: Row Address, Row Decode; then Column Address and Data, Column Decode; then Precharge
36
Read-Modify-Write (R-M-W)
(Figure: R-M-W at address A0; a single row/column decode, after which the value is read, modified, and written back)
Schedule Length = 10 cycles; a separate Read and Write would cost 14 cycles
37
Page Mode Write
Example behavior: for i = 0 to 7: b[i] = 0
(Figure: one Row Decode, eight page-mode writes, one Precharge)
38
Page Mode R-M-W
Example behavior: for i = 0 to 7: b[i] = b[i] + 1
(Figure: one Row Decode, eight page-mode read-modify-writes, one Precharge)
39
Outline
  • Memory Modeling in High Level Synthesis
  • Registers and Register Files
  • Modeling SRAM Access
  • Modeling DRAM Access
  • Optimizations
  • Data Placement
  • Register Allocation
  • Storing Data in SRAM
  • Data Cache
  • Memory Customisation and Exploration
  • Scratch Pad Memory
  • Memory Banking

40
Clustering of Scalars
(Figure: scalars a and b stored in the same page P; the 2 reads for c = a + b become one page-mode read)
Problem: cluster variables into groups of size P
to maximize page-mode operations
Analogous to the Cache Line Clustering Problem
41
Reordering Memory Accesses
a, b in different pages
a[i] = a[i] + b[i]
Naive order (Read a[i]; Read b[i]; Write a[i]): 21 cycles
Reordered so the read and write of a[i] form an R-M-W: 16 cycles
42
Reordering Memory Accesses
(Figure: statements reading and writing b[i], c[i], d[i], e.g. t = b[i] + c[i]; c[i] = t; s = s + d[i]; ..., with candidate R-M-W paths drawn from each array's read to its write)
R-M-W Possible Only for Non-Intersecting Paths
43
Hoisting
p = d + 2; c = p + 1;
if (c > 0) y = a[0]; else y = a[1];
The reads of a[0] and a[1] can be hoisted above the branch, overlapping the row decode with the evaluation of the condition.
44
Loop Transformations 1
45
Loop Transformations 2
Multiple Arrays Accessed in Loop (a, b, c in different pages)

for (i = 0; i < N; i++)
    Read a[i]; Read b[i]; Write c[i], t1;
end

Unroll:

for (i = 0; i < N; i += 2)
    Read a[i]; Read a[i+1];
    Read b[i]; Read b[i+1];
    Write c[i], t1; Write c[i+1], t2;
end

Grouping accesses to the same array enables page mode.
46
Loop Transformations 3
Loops with Disjoint Sub-graphs in Body: split the loop into one loop per sub-graph
(Original loop: no page mode without unroll; after the split, each loop stays within one page)
47
CDFG Transformation
  • Cluster Scalars and Assign Addresses
  • For each Basic Block
  • Perform R-M-W Reordering
  • For each Conditional
  • Perform Hoisting, if applicable
  • For each Inner Loop L
  • Perform Loop Splitting, if applicable
  • For each Loop L in new CDFG
  • Perform Loop Restructuring/Unroll

48
DRAM Optimization Experiments
(Chart: normalized schedule lengths)
On average, the Optimized schedules are 40% faster than Fine Grain
49
Outline
  • Memory Modeling in High Level Synthesis
  • Registers and Register Files
  • Modeling SRAM Access
  • Modeling DRAM Access
  • Optimizations
  • Data Placement
  • Register Allocation
  • Storing Data in SRAM
  • Data Cache
  • Memory Customisation and Exploration
  • Scratch Pad Memory
  • Memory Banking

50
Life-time of Variables
Life-time: from the definition to the last use of a variable
51
Conflict Graph of Life-times
(Figure: life-time intervals of variables x, y, z, p, q, r and the resulting conflict graph; variables whose life-times overlap are connected by an edge)
52
Colouring the Conflict Graph
Minimum number of registers = chromatic number of the conflict graph
53
Colouring determines Register Allocation
(Figure: a colouring of the conflict graph for x, y, z, p, q, r; variables with the same colour share a register)
54
Minimizing Register Count
  • Graph Colouring is NP-complete
  • Heuristics (Growing clusters)
  • Polynomial time solution exists for straight line
    code (no branches)
  • Left-edge algorithm (a sketch follows this list)
  • Possible to incorporate other factors
  • Interconnect cost annotated as edge-weight
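
As an illustration of the left-edge idea, here is a minimal C sketch for straight-line code; the Interval type, the fixed register bound, and all names are assumptions for this example, not from the slides.

    #include <stdlib.h>

    typedef struct { int start, end; } Interval;   /* a variable's life-time */

    static int by_start(const void *a, const void *b) {
        return ((const Interval *)a)->start - ((const Interval *)b)->start;
    }

    /* Assigns each interval a register index in reg[]; returns register count. */
    int left_edge(Interval *v, int n, int *reg) {
        int busy[64];          /* end time of the last interval on each register
                                  (bounded at 64 registers for this sketch)     */
        int nregs = 0;
        qsort(v, n, sizeof(Interval), by_start);    /* sort by left edge */
        for (int i = 0; i < n; i++) {
            int r = 0;
            while (r < nregs && busy[r] > v[i].start)
                r++;                                /* first register free at start */
            if (r == nregs)
                nregs++;                            /* open a new register */
            busy[r] = v[i].end;
            reg[i] = r;
        }
        return nregs;
    }

For non-overlapping intervals the greedy "first free register" choice packs the intervals into the minimum number of registers, which is why the left-edge algorithm is optimal for straight-line code.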

55
Register Files/Multiport Memories
  • Scalar approach infeasible for 100s of registers
  • interconnect delays dominate
  • Need to store variables in Register Files
  • Limited Bandwidth

Problem: How can register allocation to register files be done efficiently?
56
Which variables go into Multiport Memory?
Problem: Given a schedule and a multiport memory,
which variables should be stored in the memory?

Schedule: State 1: R1 = R2 + R3; State 2: R2 = R1 + R1

Which registers should go into the dual-port memory?
57
ILP Formulation
Maximize x1 + x2 + x3 (maximize registers stored in memory)
Constraints (at most 2 memory accesses per state for a dual-port memory):
State 1: x1 + x2 + x3 <= 2
State 2: x1 + x2 <= 2
Solution: x1 = 1, x2 = 1, x3 = 0 (store R1 and R2 in memory)
58
Outline
  • Memory Modeling in High Level Synthesis
  • Registers and Register Files
  • Modeling SRAM Access
  • Modeling DRAM Access
  • Optimizations
  • Data Placement
  • Register Allocation
  • Storing Data in SRAM
  • Data Cache
  • Memory Customisation and Exploration
  • Scratch Pad Memory
  • Memory Banking

59
Storing Array Data
  • Usually, storage layout is dictated by the language
  • In embedded systems, we can reorder data
  • the entire application is visible
  • New challenges and optimisation opportunities
  • Data storage strategies
  • Memory architecture customisation

60
Storing Multi-dimensional Arrays: Row-major
int X[4][4]
(Figure: the logical 4x4 array mapped to physical memory addresses 0..15 in row-major order)
61
Storing Multi-dimensional Arrays: Column-major
int X[4][4]
(Figure: the logical 4x4 array mapped to physical memory addresses 0..15 in column-major order)
62
Storing Multi-dimensional Arrays: Tile-based
int X[4][4]
(Figure: the logical 4x4 array mapped to physical memory addresses 0..15 tile by tile)
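
The three storage schemes correspond to three address computations. A minimal C sketch, assuming an R x C array and a tile edge T that divides both dimensions (all names are illustrative):

    int addr_row_major(int i, int j, int C) { return i * C + j; }
    int addr_col_major(int i, int j, int R) { return j * R + i; }

    /* Tile-based: number the T x T tiles row by row, then offset inside a tile. */
    int addr_tiled(int i, int j, int C, int T) {
        int tile  = (i / T) * (C / T) + (j / T);   /* which tile */
        int inner = (i % T) * T + (j % T);         /* offset within the tile */
        return tile * T * T + inner;
    }

For the 4x4 array X above with T = 2, element (0,2) maps to address 4, the first word of the second tile.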
63
Impact of Storage Strategy
  • Successive memory references should be local
    (independent of memory architecture)
  • Better data cache performance/energy
  • Reduced address bus switching

64
Array Access Pattern Determines Storage: Row-major

for (i = 0; i < N; i++)
    for (j = 0; j < N; j++)
        A[i][j] = 0;

(Figure: row-by-row traversal of the logical array)
65
Array Access Pattern Determines Storage: Column-major

for (i = 0; i < N; i++)
    for (j = 0; j < N; j++)
        B[j][i] = B[j][i] + 1;

(Figure: column-by-column traversal of the logical array)
Note: the effect can also be achieved by Loop Interchange, as sketched below.
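
A hedged sketch of the loop-interchange alternative mentioned in the note (array name and bound are illustrative): interchanging the loops gives the row-major layout the same unit-stride behaviour without changing the storage scheme.

    /* Before: the inner loop walks down a column (stride N under row-major). */
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            B[j][i] = B[j][i] + 1;

    /* After interchange: the inner loop walks along a row (unit stride). */
    for (j = 0; j < N; j++)
        for (i = 0; i < N; i++)
            B[j][i] = B[j][i] + 1;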
66
Array Access Pattern Determines Storage: Tile-based

Simplified kernel of the Successive Over-Relaxation algorithm:

for (i = 1; i < N-1; i++)
    for (j = 1; j < N-1; j++)
        ... u[i-1][j] + u[i][j-1] + u[i][j] + u[i][j+1] + u[i+1][j] ...

(Figure: the five-point stencil of accesses around u[i][j])
67
Determining Tile Shape
Execution trace
68
Determining Tile Shape
New elements in each iteration
Tile = smallest rectangle enclosing the access pattern
69
Address Switching Analysis
Definitions:
Minimal Transition: small address offset, fewer address bits switching
Maximal Transition: large address offset, more address bits switching
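
The switching cost can be made concrete by counting the address bits that flip between consecutive accesses. A small C illustration (the addresses are made-up examples):

    #include <stdio.h>

    /* Number of address bits that flip between two consecutive addresses. */
    int transitions(unsigned prev, unsigned next) {
        return __builtin_popcount(prev ^ next);    /* GCC/Clang builtin */
    }

    int main(void) {
        printf("%d\n", transitions(0x0200, 0x0201)); /* offset 1:   1 bit flips  */
        printf("%d\n", transitions(0x0200, 0x032C)); /* offset 300: 4 bits flip  */
        return 0;
    }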
70
Address Bus Switching: Row-major
3 maximal transitions per iteration
Maximal transitions = number of rows accessed in an iteration
71
Address Bus Switching: Tile-based (1)
Case 1: No maximal transition
72
Address Bus Switching: Tile-based (2)
Each iteration spans at most 2 tiles
Case 1: no maximal transition
Case 2: no more than 2 maximal transitions per iteration
73
Mapping Strategy
If (outer loop increment <= tile width)        /* Case 1: no maximal transition   */
   or (tile has >= 2 rows and >= 2 columns)    /* Case 2: <= 2 maximal transitions */
    use Tile-based
else if (tile has <= 2 columns)
    use Column-major                           /* fewer transitions */
else
    use Row-major
74
Outline
  • Memory Modeling in High Level Synthesis
  • Registers and Register Files
  • Modeling SRAM Access
  • Modeling DRAM Access
  • Optimizations
  • Data Placement
  • Register Allocation
  • Storing Data in SRAM
  • Data Cache
  • Memory Customisation and Exploration
  • Scratch Pad Memory
  • Memory Banking

75
Array Layout and Data Cache
int a[1024]; int b[1024]; int c[1024];
...
for (i = 0; i < N; i++)
    c[i] = a[i] + b[i];

Data Cache: direct-mapped, 512 words
(Figure: a[i], b[i], and c[i] all map to the same cache line)
Problem: every access leads to a cache miss
76
Data Alignment
int a[1024]; int b[1024]; int c[1024];
...
for (i = 0; i < N; i++)
    c[i] = a[i] + b[i];

Data Cache: direct-mapped, 512 words
(Figure: the arrays are offset in memory so a[i], b[i], c[i] map to different cache lines)
Data alignment avoids cache conflicts
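
A hedged C sketch of one way to realise the alignment above; the pad sizes are illustrative, and a struct is used so the relative placement of the arrays is guaranteed:

    /* With a 512-word direct-mapped cache, back-to-back 1024-word arrays
       put a[i], b[i], c[i] in the same cache line. The pads stagger them. */
    struct {
        int a[1024];
        int pad1[16];   /* shifts b[] by 16 words in the cache index */
        int b[1024];
        int pad2[16];   /* shifts c[] by another 16 words */
        int c[1024];
    } d;

    void vadd(int n) {
        for (int i = 0; i < n; i++)
            d.c[i] = d.a[i] + d.b[i];  /* a[i], b[i], c[i] now hit distinct lines */
    }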
77
Motivating Example
struct x { int a; int b; } p[1000];
int q[1000];
...
avg = 0;
for (i = 0; i < 1000; i++)        /* Loop 1 */
    avg = avg + p[i].a;
avg = avg / 1000;
...
for (i = 0; i < 1000; i++)        /* Loop 2 */
    if (p[i].b > avg)
        q[i] = p[i].b - 1;
78
Cache Performance: Loop 1
Data Cache: direct-mapped, 4 lines, 2 words/line
(Code as on the previous slide; Loop 1 executing)
(Figure: contents of cache lines 0-3 during Loop 1)
79
Cache Performance: Loop 2
Data Cache: direct-mapped, 4 lines, 2 words/line
(Code as above; Loop 2 executing)
(Figure: contents of cache lines 0-3 during Loop 2)
80
Cache Performance
(Code as above)
Loop 1: 1000 cache misses
Loop 2: 1500 cache misses (1000 misses for p[i].b, 500 misses for q[i])
Cache miss rate: 62.5%
81
Transformed Data Layout
struct y { int q;        /* originally q   */
           int b;        /* originally x.b */
} r[1000];
int a[1000];             /* originally x.a */
...
avg = 0;
for (i = 0; i < 1000; i++)        /* Loop 1 */
    avg = avg + a[i];
avg = avg / 1000;
...
for (i = 0; i < 1000; i++)        /* Loop 2 */
    if (r[i].b > avg)
        r[i].q = r[i].b - 1;

(Original declarations: struct x { int a; int b; } p[1000]; int q[1000];)
82
Cache Performance: Loop 1
Data Cache: direct-mapped, 4 lines, 2 words/line
(Transformed code as above; Loop 1 executing)
(Figure: contents of cache lines 0-3 during Loop 1)
No useless data in cache
83
Cache Performance: Loop 2
Data Cache: direct-mapped, 4 lines, 2 words/line
(Transformed code as above; Loop 2 executing)
(Figure: contents of cache lines 0-3 during Loop 2)
No useless data in cache
84
Cache Performance
(Transformed code as above)
Loop 1: 500 cache misses
Loop 2: 1000 cache misses
Cache miss rate: 37.5%
85
Data Layout Transformation
  • Splitting structs into individual arrays
  • Account for pointer arithmetic, dereferencing
  • Clustering of arrays

86
Representing Array Accesses
for i = 1 to 100      // Loop L1
    Read A[i]; Read B[i]; Read C[i]
for i = 1 to 2000     // Loop L2
    Read B[i]; Read C[i]
for i = 1 to 500      // Loop L3
    Read A[i]; Read D[i]

(Figure: bipartite graph with loops L1 (100), L2 (2000), L3 (500) on one side and arrays A, B, C, D on the other; edges mark accesses)
87
Clustering Algorithm
  • Start with empty cluster set
  • Sort all loops in decreasing order of array
    access count
  • For each loop
  • for each unassigned array in loop
  • determine cost of assigning array to each
    existing cluster (including Ø)
  • assign array to cluster with least cost (a sketch
    follows this list)
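
A hedged C sketch of this greedy loop; the penalty functions stand in for the correlation-based cost on the next slide, and every name here is illustrative:

    #define NARRAYS     8
    #define NO_CLUSTER (-1)

    extern int penalty_together(int a, int b); /* uncorrelated arrays, same cluster */
    extern int penalty_apart(int a, int b);    /* correlated arrays, split clusters */

    int cluster_of[NARRAYS];   /* initialise all entries to NO_CLUSTER first */

    /* Cost of putting array arr into cluster c, given current assignments. */
    int assign_cost(int arr, int c) {
        int cost = 0;
        for (int b = 0; b < NARRAYS; b++) {
            if (b == arr || cluster_of[b] == NO_CLUSTER) continue;
            cost += (cluster_of[b] == c) ? penalty_together(arr, b)
                                         : penalty_apart(arr, b);
        }
        return cost;
    }

    /* Called per unassigned array, visiting loops in decreasing access count;
       cluster index nclusters stands for the empty (new) cluster.           */
    void place_array(int arr, int nclusters) {
        int best_c = nclusters;
        int best = assign_cost(arr, nclusters);
        for (int c = 0; c < nclusters; c++) {
            int cost = assign_cost(arr, c);
            if (cost < best) { best = cost; best_c = c; }
        }
        cluster_of[arr] = best_c;
    }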

88
Cost Computation
Correlated arrays: array indices are affine expressions that
differ by a constant
  • Penalty for assigning two correlated arrays into
    separate clusters
  • Penalty for assigning two uncorrelated arrays
    into the same cluster

89
Experiments on DSP/Image/Scientific examples
(Chart) Average reduction in cycle time: 44%
90
Motivating Example FFT
  • double sigreal[2048];
  • ...
  • le = le / 2;
  • for (i = j; i ... ) {
  •     . . . = sigreal[i];
  •     . . . = sigreal[i + le];
  •     . . .
  •     sigreal[i] = . . .;
  •     sigreal[i + le] = . . .;
  • }
91
Example FFT-Padded
  • double sigreal[2048 + 16];
  • ...
  • le = le / 2;  le = le + le / 128;
  • for (i = j; i ... ) {
  •     i = i + i / 128;
  •     . . . = sigreal[i];
  •     . . . = sigreal[i + le];
  •     . . .
  •     sigreal[i] = . . .;
  •     sigreal[i + le] = . . .;
  • }

15% speed-up on Sparc5 due to padding
92
Loop Blocking
Original Code:
  • for i = 1 to N
  •   for k = 1 to N
  •     r = X[i,k]
  •     for j = 1 to N
  •       Z[i,j] = Z[i,j] + r * Y[k,j]

Blocked Code:
  • for kk = 1 to N by B
  •   for jj = 1 to N by B
  •     for i = 1 to N
  •       for k = kk to min(kk + B - 1, N)
  •         r = X[i,k]
  •         for j = jj to min(jj + B - 1, N)
  •           Z[i,j] = Z[i,j] + r * Y[k,j]

(Figure: a B x B block within the N x N iteration space)
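
The blocked nest above compiles directly into C; a runnable sketch with illustrative sizes (N, B) follows:

    #define N 64
    #define B 16
    #define MIN(a, b) ((a) < (b) ? (a) : (b))

    double X[N][N], Y[N][N], Z[N][N];

    void matmul_blocked(void) {
        for (int kk = 0; kk < N; kk += B)
            for (int jj = 0; jj < N; jj += B)
                for (int i = 0; i < N; i++)
                    for (int k = kk; k < MIN(kk + B, N); k++) {
                        double r = X[i][k];          /* reused across the j loop   */
                        for (int j = jj; j < MIN(jj + B, N); j++)
                            Z[i][j] += r * Y[k][j];  /* B x B block of Y stays hot */
                    }
    }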
93
Terminology
  • Compulsory Cache Miss - Data Never Brought into
    Cache
  • Capacity Cache Miss - Cache Too Small
  • Conflict Cache Miss - Competition for Cache Space
  • Self-Interference - Within Same Tile
  • Cross-Interference - Across Different Tiles

(Figure: a tile within an array mapped into the data cache, showing self-interference within the tile and cross-interference across tiles)
94
Self-Interference Conflicts
(Figure: a 30 x 30 tile of an array with 256-word rows; in a 1024-element direct-mapped cache, rows 4 apart map to the same lines (4 x 256 = 1024), causing self-interference conflicts)
95
Padding Avoids Self-Interference
(Figure: padding each 256-word row of the array makes the 30 rows of the tile map to distinct regions of the 1024-element direct-mapped cache, eliminating self-interference)
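
A hedged sketch of the padding arithmetic, assuming a 1024-word direct-mapped cache and a 256-column array as in the figure; the pad of 8 words is illustrative:

    #define ROWS 256
    #define COLS 256
    #define PAD    8
    #define CACHE_WORDS 1024

    /* Unpadded, rows 4 apart collide (4 * 256 = 1024). With a row stride
       of 264 the mapping is staggered, so the 30 rows of a tile land in
       distinct cache regions. */
    double A[ROWS][COLS + PAD];   /* only the first COLS words of each row are used */

    int cache_index(int i, int j) {
        return (i * (COLS + PAD) + j) % CACHE_WORDS;
    }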
96
Multiple Tiled Arrays
(Figure: tiles of arrays X, Y, R in the initial layout, and the new padded tile layout that separates them in the cache)
97
Stability of Cache Performance
Matrix Multiplication (array sizes 35-350)
(Chart: miss rates for TSS, ESS, LRW, DAT)
DAT uses fixed tile dimensions; the others use widely varying sizes
98
Outline
  • Memory Modeling in High Level Synthesis
  • Registers and Register Files
  • Modeling SRAM Access
  • Modeling DRAM Access
  • Optimizations
  • Data Placement
  • Register Allocation
  • Storing Data in SRAM
  • Data Cache
  • Memory Customisation and Exploration
  • Scratch Pad Memory
  • Memory Banking

99
Embedded System Synthesis
(Figure: a specification, e.g. a[i] = b[i] + c[i], goes through Hw/Sw partitioning onto a processor core plus synthesized HW, with on-chip instruction memory, on-chip data memory, and on-chip/off-chip DRAM)
100
Scratch-Pad Memory
  • Embedded Processor-based System
  • Processor Core
  • Embedded Memory
  • Instruction and Data Cache
  • Embedded SRAM
  • Embedded DRAM
  • Problem: partition program data into on-chip and
    off-chip memory

Scratch Pad Memory
101
Memory Address Space
(Figure: addresses 0 to P-1 map to on-chip memory with 1-cycle access; addresses P to N-1 map to off-chip memory reached through the on-chip data cache: 1 cycle on a hit, 10-20 cycles on a miss)
102
Architecture
(Figure: the CPU core issues address/data to both the Scratch-Pad Memory and the Data Cache; on a miss, the external memory interface fetches from DRAM)
103
Illustrative Example - 1
Procedure Histogram_Evaluation
    char BrightnessLevel[512][512];
    int Hist[256];
    for (i = 0; i < 512; i++)
        for (j = 0; j < 512; j++) {
            /* for each pixel (i,j) in image */
            level = BrightnessLevel[i][j];
            Hist[level] = Hist[level] + 1;
        }

Regular access (BrightnessLevel): assign to cache
Irregular access (Hist): assign to SRAM
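
One hedged way to realise this split on a GCC-style toolchain is a linker-section attribute; the section name .spm and the corresponding linker-script region are assumptions, not from the slides:

    char BrightnessLevel[512][512];   /* regular accesses: serviced by the cache */

    /* Irregular, data-dependent accesses: pin the table in scratch-pad SRAM. */
    int Hist[256] __attribute__((section(".spm")));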
104
Illustrative Example - 2
int source[128][128], dest[128][128], mask[4][4];
Procedure CONV
    loop i
        loop j
            dest[i][j] = Mult(source[i][j], mask);

mask: small and reused in every iteration ((0,0), (0,1), ...), so it goes to on-chip SRAM
source: large and regularly accessed, so it stays off-chip, accessed through the cache
105
Data Partitioning
  • Pre-Partitioning
  • Scalar Variables and Constants to SRAM
  • Large Arrays to DRAM
  • SRAM/Cache Partitioning
  • Identify critical Data for Mapping to SRAM
  • Criteria
  • Life-times of arrays
  • Access frequency of arrays (IF)
  • Loop conflicts (LCF)

106
Data Partitioning Experiments
(Chart) Average 30% improvement in memory cycles
107
Reuse Analysis
Which Memory References Reuse a Cache Line?
loop i
    loop j
        . . . a[i][j]  a[i][j+1]  a[i][j-1]    (group spatial)
              a[i-1][j]  a[i+1][j]             (self spatial)
              b[i]                             (self temporal)
              c[j][i]                          (no reuse)
  • Divide Memory References into Reuse Equivalence
    Classes
  • Volume Analysis

108
Architecture Exploration
Algorithm MemExplore: for each on-chip memory size T
(in powers of 2), for each cache size C (in powers of 2,
C <= T; scratch-pad size S = T - C), for each line size L
(in powers of 2, L <= C): select the (C, L) that maximizes performance
109
Variation with Cache Line Size
Example: Histogram
Cache size: 1 KB (chart: performance vs. cache line size)
110
Variation with Cache/SP-RAM Ratio
Example: Histogram
(Chart: effect of different ratios of SRAM and cache; total on-chip memory size: 2 KB)
111
Variation with Total On-chip Memory Space
Example: Histogram
(Chart: variation of memory performance with total on-chip memory)
112
Outline
  • Memory Modeling in High Level Synthesis
  • Registers and Register Files
  • Modeling SRAM Access
  • Modeling DRAM Access
  • Optimizations
  • Data Placement
  • Register Allocation
  • Storing Data in SRAM
  • Data Cache
  • Memory Customisation and Exploration
  • Scratch Pad Memory
  • Memory Banking

113
Memory Banking Motivation
for i = 0 to 1000: A[i] = A[i] + B[i] + C[2i]

(Figure: single-bank DRAM; arrays A, B, C share one page buffer, with Row Address = Addr[15..8] and Col Address = Addr[7..0]; interleaved accesses to A, B, and C keep evicting the open page)
114
Memory Banking Motivation
for i = 0 to 1000: A[i] = A[i] + B[i] + C[2i]

(Figure: three banks, one each for A[i], B[i], C[2i]; every bank has its own row/column decode and page buffer, so each array's accesses stay within an open page)
115
Exploration Algorithm
  • Algorithm Partition (G)
  • for k = 1 to M   /* do k-way partitioning */
  • 1. Generate initial partition P
  • 2. Generate n-move sequence into any of k
    partitions
  • 3. Retain partition Pk with minimum Delay(Pk)
  • 4. Plot (k, Area(Pk)) and (k, Delay(Pk)) on the
    exploration graph
  • end Algorithm

116
Initial Partition
(Figure: clusters A, B, C, D, E arranged in a line; cut lines assign the clusters to 1, 2, 3, 4, or 5 banks)
117
Cost Function Computation
  • Area (Pk)
  • Total Memory Area
  • Delay (Pk)
  • Estimate Schedule Length (list scheduling)
  • Memory access delays unknown at this stage
  • Page hits/misses unknown
  • Determine ordering that minimises page misses for
    given partition

118
Memory Dependence Graph
(Figure: a DFG over array accesses A[i], A[i+1], C[i], D[i], E[i], G[i], H[i] and the Memory Dependence Graph (MDG) derived from it)
119
Partitioned MDG (PMDG)
The MDG is the basis for bank-partitioning exploration
(Figure: the MDG over accesses to A, C, D, E, G, H partitioned into Bank 1, Bank 2, and Bank 3)
Analogy: the MDG plays the same role for memory accesses that the DFG plays for operations
120
Scheduling the PMDG
Need an ordering of the PMDG that minimises page misses
(Figure: accesses A, A, D, D, G within the partitioned banks)
  • Topological sort with minimum page misses
  • Greedy heuristic
121
List Scheduling Heuristic
At each step, select the access that leads to the longest
sequence of page-mode accesses
(Figure: among schedulable accesses A, A, A, D, D, G, the run of A accesses is chosen first)
  • Propagate the schedulable condition
  • Select the largest set of page-mode accesses
122
Experiments: Exploration Results
(Charts: area and delay vs. number of banks for IDCT, SOR, EQN_OF_STATE, and 2D-HYDRO)
123
Summary
  • Memory in High Level Synthesis
  • Registers, Register Files, and SRAM memory
    modeled adequately in Synthesis tools today
  • More complex memory (DRAM)
  • New modeling methodologies
  • New Optimizations
  • Memory in Embedded Processors
  • Optimizations tailored to Data Cache
  • Data Layout
  • Memory architecture customized to a given
    application
  • Scratch Pad Memory
  • Memory Banking

124
The End
Thank You For Attending!
125
References
Books
  • P. Panda, N. Dutt, A. Nicolau, Memory Issues in
    Embedded Systems-on-Chip: Optimization and
    Exploration, Kluwer Academic Publishers, 1999
  • F. Catthoor, S. Wuytack, E. De Greef, F. Balasa,
    L. Nachtergaele, A. Vandecappelle, Custom Memory
    Management Methodology, Kluwer Academic
    Publishers, 1998

Survey Paper
  • P. Panda, F. Catthoor, N. Dutt, K. Danckaert, E.
    Brockmeyer, C. Kulkarni, A. Vandecappelle, Data
    and Memory Optimization Techniques for Embedded
    Systems, ACM Transactions on Design Automation of
    Electronic Systems, April 2001