Title: Mobile Memory: Improving Memory Locality in Very Large Reconfigurable Fabrics
Slide 1: Mobile Memory: Improving Memory Locality in Very Large Reconfigurable Fabrics
Rong Yan, Seth C. Goldstein
- Carnegie Mellon University
- {yanrong, seth}@cs.cmu.edu
- 3/22/2002
Slide 2: Outline
- Motivation
- Mobile Memory vs. Cache-Only Memory Architecture
- Design Issues
- Implementation Cost
- Conclusion
Slide 3: Outline (next up: Motivation)
Slide 4: Increasing FPGA Density
Source: http://www.xilinx.com/support/techxclusives/evolution-techX20.htm
Slide 5: Increasing FPGA Density
- Configurable Logic Blocks and Embedded Memory
- Expect entire applications to be mapped onto a very large reconfigurable fabric (VLRF)
Source: http://www.xilinx.com/support/techxclusives/evolution-techX20.htm
Slide 6: Abstraction for VLRF
[Figure: the very large reconfigurable fabric abstracted as computation cores plus embedded memory]
Slide 7: Problem in VLRF
- Long idle time for some benchmarks
- One of the main reasons: large memory latency
- [Chart: idle time for all MediaBench and SpecInt95 benchmarks that run on our simulator]
8Possible Solutions
- Possible solutions exploit the reference
locality - introduce cache
- move memory data at run time
-
- Our Choice Mobile memory
- Move the memory data closer to accessor at run
time, inspired by cache-only memory
architecture(COMA) - Investigate whether it is enough or we need more
complex solution, e.g. replication
Slide 9: Outline (next up: Mobile Memory vs. Cache-Only Memory Architecture)
Slide 10: Quick Review: COMA
- Key points:
  - Shared-memory multiprocessor, connected by a network
  - Main memory acts as a cache
  - Data is automatically replicated/migrated to the accessing processor at run time
Slide 11: Mobile Memory vs. COMA
- Similar idea in different contexts
- Analogy (VLRF : Multiprocessors):
  - Code : Processor
Slide 12: Mobile Memory vs. COMA
[Table: comparison of mobile memory and COMA]
Slide 13: Limit Study
- Purpose: examine whether mobile memory is beneficial in the context of a VLRF
- Definitions in our computational model (sketched below):
  - Unit: the area of a 32-bit memory or a 32-bit adder (assumed equal in size)
  - Cluster: a number of units grouped together
- Assumptions:
  - Huge, effectively infinite resources are available
  - Memory data can move to any position at run time, even overlapping the code region
  - No additional cost for memory movement
  - No replacement policy
  - Only one memory word is moved at a time
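To make the limit-study model concrete, here is a minimal Python sketch. The 2D integer-grid layout and the Manhattan-distance cost function are illustrative assumptions for this summary, not the exact cost model from the paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Cluster:
    """A group of units at a fixed position on the fabric."""
    x: int
    y: int

def access_cycles(accessor: Cluster, data: Cluster) -> int:
    # Assumed cost model: access cycles grow with the Manhattan
    # distance between the accessing cluster and the data's cluster.
    return abs(accessor.x - data.x) + abs(accessor.y - data.y)

# Limit-study assumptions carried into the sketches that follow:
# moving a word is free, there is no replacement policy, and only
# one memory word moves at a time.
```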
Slide 14: Outline (next up: Design Issues)
- Motivation
- Mobile Memory vs. Cache-Only Memory Architecture
- Design Issues
  - Analytical Model
- Implementation Cost
- Conclusion
Slide 15: Mobile Memory
- Goal: reduce memory latency by exploiting memory locality
- Approach: move memory data at run time, without replication
  - Mobile memory policies
Slide 16: Mobile Memory Policies
- Three main design axes:
  - When to move
  - Where to move (our focus)
  - How much to move
- Proposed policies: Greedy, N-Best, Centroid
Slide 17: Greedy Policy
- Always move memory data to the most recent accessor (sketch below)
- Example: [Figure: after each access, memory word M moves to accessor A]
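A minimal sketch of the greedy policy under the limit-study assumptions; `Cluster` and `access_cycles` come from the model sketch above, and the move-after-every-access granularity is taken from the stated assumptions.

```python
def run_greedy(trace, start):
    """Replay an accessor trace, moving the word to the most
    recent accessor after every access (movement is free here)."""
    location, total = start, 0
    for accessor in trace:
        total += access_cycles(accessor, location)
        location = accessor  # greedy: chase the last accessor
    return total
```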
Slide 19: Bad Case for Greedy Policy
- Example: ping-pong access, where two accessors alternate accesses to a memory location; each access finds the data at the other accessor, so every access pays the full distance (demonstrated below)
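A quick worked instance of the ping-pong pattern using the sketch above, with two hypothetical accessors eight units apart.

```python
a1, a2 = Cluster(0, 0), Cluster(8, 0)
# Accessors alternate; after each access the word chases the most
# recent accessor, so the next access always pays the full distance.
print(run_greedy([a1, a2] * 4, start=a1))  # 7 of 8 accesses pay 8 -> 56
```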
Slide 20: N-Best Policy
- Keep a history of the last N accessors, shared by the whole cluster
- Assume the access pattern repeats
- Move to the accessor in the history that minimizes memory access cycles (sketch below)
- Example (N = 3): [Figure: M moves to whichever of accessors A, A1, A2 minimizes total access cycles]
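A sketch of the N-Best policy in the same model. It scores each recorded accessor as if the last-N pattern repeated and moves the word to the best one; keeping the history in a shared deque and re-deciding after every access are implementation guesses, since the talk treats "when to move" as a separate axis.

```python
from collections import deque

def n_best_target(history):
    """Pick, among the last N accessors, the location that would
    minimize total access cycles if the recorded pattern repeated."""
    return min(history,
               key=lambda loc: sum(access_cycles(a, loc) for a in history))

def run_n_best(trace, start, n=3):
    history = deque(maxlen=n)  # shared for the whole cluster
    location, total = start, 0
    for accessor in trace:
        total += access_cycles(accessor, location)
        history.append(accessor)
        location = n_best_target(history)
    return total
```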
Slide 22: Centroid Policy
- Keep a history of the last N accessors, shared by the whole cluster
- Move to the centroid of the N accessors (sketch below)
- Example (N = 3): [Figure: M moves to the centroid of accessors A, A1, A2]
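A sketch of the centroid policy in the same assumed grid model; rounding the centroid to the nearest grid position is an implementation guess.

```python
def centroid_target(history):
    """Move the word to the (rounded) centroid of the last N accessors."""
    cx = round(sum(a.x for a in history) / len(history))
    cy = round(sum(a.y for a in history) / len(history))
    return Cluster(cx, cy)
```

On the ping-pong trace, the centroid hovers midway between the two accessors, so each access pays roughly half the distance instead of the full distance greedy pays.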
Slide 24: Comparison
- An offline algorithm is used to estimate the optimal performance
- Please refer to our paper for more details
Slide 25: Memory Access Cycles
[Chart: memory access cycles for the different policies, normalized to baseline cycles]
Slide 26: Total Cycles
[Chart: total cycles for the different policies, normalized to baseline cycles]
Slide 27: Outline (next up: Implementation Cost)
Slide 28: Implementation Cost
- Directories
  - The cost of locating moved memory data
  - Assume each cluster is coupled with two directories
- [Figure: a cluster with its Local DIR and Home DIR]
Slide 29: Implementation Cost
- Local directory misses (sketch below)
- [Figure: on a local DIR miss, accessor A asks the home cluster's Home DIR for the current location of M]
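A sketch of how a lookup through the two directories might proceed: the local DIR acts as a cache of recent locations, and on a miss the request goes to the word's home cluster, whose home DIR always tracks the current location. The protocol details here are illustrative assumptions, not the paper's design.

```python
def locate(addr, local_dir, home_dir_for):
    """Return the cluster currently holding `addr`.

    local_dir    -- this cluster's cache: {addr: current Cluster}
    home_dir_for -- function mapping addr to its home cluster's
                    directory, updated whenever the word moves
    """
    if addr in local_dir:
        return local_dir[addr]        # local DIR hit: fast path
    home_dir = home_dir_for(addr)     # local DIR miss: extra hop
    location = home_dir[addr]         # home DIR knows the location
    local_dir[addr] = location        # cache it for later accesses
    return location
```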
Slide 30: Sensitivity to Directory Size
[Chart: effect of local directory size on memory access cycles]
Slide 31: Implementation Cost
- Cost of making room
  - Must ensure enough room to accommodate incoming data
  - Reserve a portion of memory in each cluster, which expands (dilates) the placement graph
  - Increases both control transfer cycles and memory access cycles (rough estimate below)
- [Figure: four clusters spread out at dilation factor 2]
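One plausible back-of-the-envelope reading of the dilation cost, assuming a square fabric where reserving room multiplies the occupied area by the dilation factor, so linear distances (and distance-proportional cycles) grow by its square root. This scaling argument is our assumption, not a number from the talk.

```python
import math

def dilated_cycles(base_cycles: float, dilation_factor: float) -> float:
    # Assumption: occupied area scales by the dilation factor, so
    # linear distances, and hence distance-proportional cycles,
    # scale by its square root on a 2D fabric.
    return base_cycles * math.sqrt(dilation_factor)

print(dilated_cycles(100.0, 2.0))  # dilation factor 2 -> ~141 cycles
```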
Slide 32: Effect of Implementation Cost
[Chart: total execution cycles for the centroid policy at different dilation factors, normalized to baseline cycles]
Slide 33: Conclusion
- Mobile memory aims to relieve the memory bottleneck in VLRFs, inspired by COMA
- Simple heuristics are enough
- Keeping the implementation cost low is a key issue
- Mobile memory alone may not be sufficient
  - Replication is probably required