1
Exploiting Memory Hierarchy 7.1, 7.2
  • John Ashman

2
Memory, The More the Merrier
  • This introduction explores ways of creating, for
    the programmer, the illusion of unlimited, fast
    memory.
  • The laboriously long library analogy, summed up:
    being able to look at more items (books) at once
    saves time.
  • The same idea applies to memory.

3
Principle of Locality
  • Programs access a relatively small portion of
    their address space at any given moment.
  • Temporal locality (in time): if an item is
    referenced, it will probably be referenced again
    soon.
  • Spatial locality (in space): if an item is
    referenced, items with nearby addresses will
    probably be referenced soon.

4
Why This Applies
  • Memory accesses follow from natural program
    structures.
  • Loops exhibit temporal locality.
  • Programs in general show high levels of spatial
    locality, since instructions are mostly fetched
    sequentially.
  • Accesses to arrays also often show spatial
    locality, as in the sketch below.
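  A small C example (not from the slides) makes both kinds of
  locality concrete:

      /* A summation loop exhibits both kinds of locality. */
      int sum(const int a[], int n) {
          int total = 0;              /* total is reused every iteration:
                                         temporal locality */
          for (int i = 0; i < n; i++) /* the loop's own instructions are
                                         refetched each iteration: temporal */
              total += a[i];          /* a[0], a[1], ... sit at adjacent
                                         addresses: spatial locality */
          return total;
      }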

5
Memory Hierarchy
  • This is a structure that uses multiple levels of
    memory.
  • Each contain different speeds and sizes
    naturally faster memory is more expensive they
    are also smaller.

6
Memory Types
  • DRAM (dynamic random access memory)
  • This is the main memory of the system. Slower, but
    less costly, with less area per bit of memory.
  • SRAM (static random access memory)
  • This is the level closest to the CPU, namely the
    caches.
  • The magnetic disk
  • The lowliest (slowest, largest, and cheapest)
    memory of them all.

7
Comparison
8
Memory Hierarchy
9
Accessing Data/Memory
  • Let's consider one higher and one lower level of
    memory, as data can only be copied between two
    adjacent levels at a given time.
  • The minimum unit of information that can be present
    or absent in the two-level hierarchy is called a
    block or line, i.e. one book in the library
    analogy.

10
Block Access
11
Hit Rate
  • This is the fraction of memory accesses found in
    the cache: the rate of success in finding
    requested data in the upper level of memory.
  • Miss rate (1 − hit rate): the fraction of accesses
    not found.
  • Hit time: the time taken to access memory in the
    upper level.
  • Miss penalty: the time taken to replace a block in
    the upper level with the correct data from the
    lower level, plus the time to deliver it to the
    CPU.
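  These quantities combine in the standard average-access-time
  formula (the numbers below are illustrative, not from the
  slides):

      average access time = hit time + miss rate × miss penalty

  For example, a 1-cycle hit time, a 5% miss rate, and a
  20-cycle miss penalty give 1 + 0.05 × 20 = 2 cycles on
  average.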

12
Structure of Hierarchy
13
7.2 The Basics of Caches
  • Caches first appeared in the 1960s as the first
    level of the memory hierarchy between the CPU and
    main memory.
  • Today the term also refers to any storage managed
    so as to take advantage of locality of access.

14
1-Word Cache
15
Direct-Mapped Caches
  • In this structure, each memory location is mapped
    to exactly one location in the cache.
  • Mapping
  • (block address) modulo (number of blocks in the
    cache)
  • This method is cheap since the number of entries
    will be a power of two.
  • Thus, the cache can be indexed directly with the
    low-order log2(number of blocks) bits of the block
    address, as in the sketch below.
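  A minimal C sketch of the mapping (the 8-block cache size is
  an assumption for illustration):

      #include <stdint.h>
      #include <stdio.h>

      #define NUM_BLOCKS 8   /* assumed; must be a power of two */

      int main(void) {
          uint32_t block_address = 21;
          /* The general mapping from the slide... */
          uint32_t index = block_address % NUM_BLOCKS;
          /* ...which, because NUM_BLOCKS is a power of two, is just
             the low-order log2(NUM_BLOCKS) = 3 bits of the address. */
          uint32_t index2 = block_address & (NUM_BLOCKS - 1);
          printf("%u %u\n", index, index2);   /* prints "5 5" */
          return 0;
      }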

16
Searching Within a Cache
  • How do we know whether the cache contains the
    requested data?
  • Tag: a field containing the address information
    that identifies whether the word within the cache
    is the requested one.
  • Note: it only needs to contain the upper portion of
    the address, the part that is not used as an index
    into the cache.

17
  • The upper 2 of the 5 address bits go in the tag;
    the lower 3 select the block.

18
Checking Validity
  • One problem, especially upon initial execution of a
    program, is that the tags will be meaningless.
    Even after the program has run for a while, some
    cache entries may still be empty.
  • Valid bit: used to tell the CPU whether a cache
    entry is valid. If the bit is not set, the entry
    cannot be a match. A full lookup is sketched
    below.
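  Putting the tag, index, and valid bit together, a
  direct-mapped lookup might look like this sketch (the field
  widths and names are assumptions, not from the slides):

      #include <stdbool.h>
      #include <stdint.h>

      #define NUM_BLOCKS 8          /* assumed: 3 index bits */
      #define INDEX_BITS 3

      struct cache_entry {
          bool     valid;           /* does this entry hold real data yet? */
          uint32_t tag;             /* upper bits of the block address */
          uint32_t data;            /* one-word block for simplicity */
      };
      static struct cache_entry cache[NUM_BLOCKS];

      /* Returns true on a hit and places the word in *word. */
      bool cache_lookup(uint32_t block_address, uint32_t *word) {
          uint32_t index = block_address & (NUM_BLOCKS - 1);
          uint32_t tag   = block_address >> INDEX_BITS;
          if (cache[index].valid && cache[index].tag == tag) {
              *word = cache[index].data;
              return true;          /* hit */
          }
          return false;             /* miss: fetch from the lower level */
      }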

19
Accessing a Cache
  • Refer to pages 476 and 477.
  • A cache can use temporal locality to its advantage:
    more recently (and thus more commonly) run
    instructions replace less recently run
    instructions in the cache.

21
Sizes
  • 2^n values means the total number of entries is a
    power of two.
  • Since MIPS words are multiples of 4 bytes, the 2
    least significant bits of an address select a byte
    within a word and are ignored when indexing the
    cache.
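  For example, byte address 36 (binary 100100) refers to word
  address 36 / 4 = 9; the low two bits (00) are the ignored
  byte offset.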

22
Specific Measurements
  • A direct-mapped cache of 2^n blocks with 2^m-word
    blocks requires a tag field of size
    32 − (n + m + 2) bits:
  • n bits for the index, m for the word within the
    block, and 2 for the byte part of the address.
  • Total number of bits = 2^n × (block size + tag size
    + valid field size)
  • = 2^n × (2^m × 32 + (32 − n − m − 2) + 1)
  • = 2^n × (2^m × 32 + 31 − n − m)
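  Worked instance: a cache holding 16 KB of data in 4-word
  blocks has 2^12 words / 2^2 words per block = 2^10 blocks, so
  n = 10 and m = 2. The total is
  2^10 × (2^2 × 32 + (32 − 10 − 2 − 2) + 1) = 1024 × 147 =
  150,528 bits, about 1.15 times the 128 Kbits of raw data.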

23
Block Size vs. Miss Rate
24
Cost Of A Miss
  • Simply increasing block size eventually raises the
    cost of a miss: with larger blocks, the time to
    transfer a block increases.
  • The benefit of a lower miss rate is eventually
    overshadowed by the higher miss cost when the
    block size is increased without a proportional
    increase in cache size.
  • Early restart: resume execution as soon as the
    requested word returns, without waiting for the
    rest of the block.

25
Handling Cache Misses
  • A cache miss is simply a request for data that
    fails because the data is not in the cache.
  • On a miss, a stall occurs that freezes all
    execution until the needed memory is accessed.
    This is in contrast to an interrupt, which
    requires other instructions to keep moving through
    the pipeline.

26
Instructions To Be Taken
  • Send the original PC value (the current PC − 4) to
    memory.
  • Instruct main memory to perform a read and wait for
    the memory to complete its access.
  • Write the cache entry, filling in the data and the
    address (tag) field and turning the valid bit on.
  • Restart the instruction execution at the first
    step; this refetches the instruction, which this
    time is found in the cache.

27
Handling Writes
  • Suppose a store changes only the cache. This would
    leave the cache and main memory inconsistent.
  • Write-through: this method writes to both the cache
    and main memory at the same time on every store.
  • This method is inefficient and, as the arithmetic
    below suggests, can reduce performance by as much
    as a factor of 10.
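  To see where a factor of 10 can come from (the numbers here
  are illustrative): with a base CPI of 1.0, stores making up
  10% of instructions, and 100 cycles per write to main memory,
  the effective CPI becomes 1.0 + 0.10 × 100 = 11.0, a slowdown
  of more than 10x.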

28
Improvement
  • Write buffer: stores data while it is waiting to be
    written to memory, letting the processor continue
    as soon as the cache and buffer are updated.
  • This helps so long as the processor generates
    writes more slowly than memory can complete them.
  • Stalls can still occur even at a lower write rate
    if the writes come in bursts that fill the buffer.

29
Write-Back
  • In this alternative to write-through, data is
    written only to the cache. A modified block is
    written back to main memory only when it is
    replaced.
  • This is, of course, a more complicated method to
    implement; a sketch follows.
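  A minimal C sketch of the write-back idea, assuming the usual
  dirty-bit mechanism (the dirty bit is not named on this
  slide, but it is the standard way to track modified blocks):

      #include <stdbool.h>
      #include <stdint.h>

      #define NUM_BLOCKS 8
      #define INDEX_BITS 3
      #define MEM_WORDS  1024

      static uint32_t memory[MEM_WORDS];  /* stand-in for main memory */

      struct wb_entry {
          bool     valid, dirty;          /* dirty = modified since fetch */
          uint32_t tag, data;             /* one-word blocks for simplicity */
      };
      static struct wb_entry cache[NUM_BLOCKS];

      /* Write-back store: the write itself touches only the cache. */
      void wb_store(uint32_t addr, uint32_t value) {
          uint32_t index = addr & (NUM_BLOCKS - 1);
          uint32_t tag   = addr >> INDEX_BITS;
          struct wb_entry *e = &cache[index];
          if (!e->valid || e->tag != tag) {      /* miss: replace the block */
              if (e->valid && e->dirty) {
                  uint32_t old = (e->tag << INDEX_BITS) | index;
                  memory[old] = e->data;         /* write back on eviction only */
              }
              e->tag   = tag;
              e->valid = true;
          }
          e->data  = value;
          e->dirty = true;                       /* mark the block modified */
      }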

30
The Intrinsity FastMath Processor
  • MIPS architecture, simple cache.
  • 12-stage pipeline.
  • Can request an instruction word and a data word
    every clock cycle, so it has separate instruction
    and data caches.
  • Each cache is 16 KB (4K words), with 16-word
    blocks.
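  Working out the fields from these numbers: 4K words / 16
  words per block = 256 blocks, so each cache needs 8 index
  bits, 4 bits to select the word within a block, 2 byte-offset
  bits, and a 32 − 8 − 4 − 2 = 18-bit tag.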

31
Intrinsity Diagram
32
Designing Memory Systems to Support Caches
  • Cache misses are satisfied from main memory, built
    from DRAMs (designed for density rather than
    access time).
  • The miss penalty can be reduced if the bandwidth
    from memory to cache is increased.
  • This allows larger block sizes while keeping the
    miss penalty low.

33
Example
  • 1 memory bus clock cycle (mbcc) to send the address
  • 15 mbccs for each DRAM access initiated
  • 1 mbcc to send a word of data
  • Cache block of 4 words, one-word-wide bank:
  • 1 + 4 × 15 + 4 × 1 = 65 mbccs; bandwidth =
    (4 × 4 bytes) / 65 ≈ 0.25 bytes per clock cycle
  • Cache block of 4 words, two-word-wide bank:
  • 1 + 2 × 15 + 2 × 1 = 33 mbccs; bandwidth =
    (4 × 4 bytes) / 33 ≈ 0.48 bytes per clock cycle

34
Interleaving
  • This scheme sends an address to multiple banks at
    once; the width of the bus and of the cache is not
    increased.
  • With four banks:
  • 1 cycle to transmit the address, 15 for all four
    banks to access memory in parallel, and 4 cycles
    to send the words back
  • 1 + 1 × 15 + 4 × 1 = 20 mbccs
  • (4 × 4 bytes) / 20 = 0.80 bytes per clock cycle
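  The savings come from overlap: with word i of the block
  stored in bank i mod 4, the four 15-cycle accesses proceed in
  parallel, and only the word transfers back to the cache
  remain serialized.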

35
Summary
  • Direct-mapped caches are the simplest.
  • Write-through writes/updates both main memory and
    the cache simultaneously.
  • Write-back copies a block back to memory only when
    it is replaced. (This will be elaborated upon
    later.)
  • Larger blocks decrease the miss rate, but can also
    increase the miss penalty.