Title: Computer Architecture, Chapter 7: Large and Fast: Exploiting Memory Hierarchy
1 Computer Architecture, Chapter 7: Large and Fast: Exploiting Memory Hierarchy
2 Who Cares About the Memory Hierarchy?
[Figure: Processor-DRAM memory gap (latency). Log-scale performance (1 to 1000) vs. time, 1980-2000. Processor performance grows ~60%/yr (2X per 1.5 years); DRAM performance grows ~9%/yr (2X per 10 years); the processor-memory performance gap grows ~50% per year.]
3 An Expanded View of the Memory System
[Figure: Processor (Control, Datapath) connected to a chain of memory levels. Moving away from the processor, speed goes from fastest to slowest, size from smallest to biggest, and cost per byte from highest to lowest.]
4 Why Hierarchy Works: Locality
- The Principle of Locality: programs access a relatively small portion of the address space at any instant of time.
- Two kinds of locality. If an item is referenced:
- Temporal locality: it will tend to be referenced again soon.
- Spatial locality: nearby items will tend to be referenced soon.
5 Memory Hierarchy: How Does It Work?
- Temporal Locality (Locality in Time) => keep most recently accessed data items closer to the processor.
- Spatial Locality (Locality in Space) => move blocks consisting of contiguous words to the upper levels. (Both kinds are illustrated in the sketch after this list.)
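As a concrete illustration (not from the slides), the C loop below exhibits both kinds of locality: it reuses sum on every iteration (temporal) and walks a row-major array in address order (spatial). The array size N is arbitrary.

#include <stddef.h>

#define N 1024

/* Row-major traversal: consecutive iterations touch adjacent
 * addresses (spatial locality), and sum is reused on every
 * iteration (temporal locality). Swapping the loop order would
 * stride by N * sizeof(double) per access and miss far more often. */
double sum_matrix(const double a[N][N]) {
    double sum = 0.0;
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            sum += a[i][j];
    return sum;
}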
6 Memory Hierarchy Terminology
- Hit: the data appears in some block in the upper level (example: Block X).
- Hit Rate: the fraction of memory accesses found in the upper level.
- Hit Time: time to access the upper level, which consists of RAM access time + time to determine hit/miss.
- Miss: the data needs to be retrieved from a block in the lower level (Block Y).
- Miss Rate = 1 - (Hit Rate).
- Miss Penalty: time to replace a block in the upper level + time to deliver the block to the processor.
- Hit Time << Miss Penalty. (These terms combine as sketched after the figure below.)
[Figure: Upper Level Memory holding Blk X and Lower Level Memory holding Blk Y, with arrows to/from the processor through the upper level.]
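These terms combine into the standard average memory access time formula, AMAT = Hit Time + Miss Rate x Miss Penalty (not stated explicitly on the slide); a minimal sketch with made-up numbers:

#include <stdio.h>

/* AMAT = hit time + miss rate * miss penalty */
double amat(double hit_time, double miss_rate, double miss_penalty) {
    return hit_time + miss_rate * miss_penalty;
}

int main(void) {
    /* illustrative values only: 1-cycle hit, 5% miss rate, 50-cycle penalty */
    printf("AMAT = %.2f cycles\n", amat(1.0, 0.05, 50.0));   /* 3.50 */
    return 0;
}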
7 Memory Hierarchy of a Modern Computer System
- By taking advantage of the principle of locality:
- Present the user with as much memory as is available in the cheapest technology.
- Provide access at the speed offered by the fastest technology.
[Figure: the levels of a modern system, from the processor outward: Registers and Datapath/Control, On-Chip Cache, Second Level Cache (SRAM), Main Memory (DRAM), Secondary Storage (Disk), Tertiary Storage (Disk). Speed (ns) grows from 1s through 10s and 100s to 10,000,000s (10s ms) for disk and 10,000,000,000s (10s sec) for tertiary storage; Size (bytes) grows from 100s through Ks, Ms, and Gs to Ts.]
8 How Is the Hierarchy Managed?
- Registers <-> Memory: by the compiler (programmer?)
- Cache <-> Memory: by the hardware
- Memory <-> Disks: by the hardware and operating system (virtual memory), and by the programmer (files)
9 Memory Hierarchy Technology
- Random Access
- "Random" is good: access time is the same for all locations.
- DRAM (Dynamic Random Access Memory): high density, low power, cheap, slow. Dynamic: it needs to be refreshed regularly.
- SRAM (Static Random Access Memory): low density, high power, expensive, fast. Static: its content lasts until power is lost.
- Sequential Access Technology: access time is linear in location (e.g., tape).
- The next two lectures concentrate on random access technology: main memory uses DRAMs, caches use SRAMs.
10 Cache
- Two issues:
- How do we know if a data item is in the cache?
- If it is, how do we find it?
- Our first example:
- block size is one word of data
- "direct mapped": for each item of data at the lower level, there is exactly one location in the cache where it might be, i.e., lots of items at the lower level share locations in the upper level.
11 Direct Mapped Cache
- Mapping: the address is modulo the number of blocks in the cache, as in the sketch below.
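For one-word (4-byte) blocks, the mapping can be written as below; a sketch assuming NUM_BLOCKS is a power of two, so the modulo is just the low-order bits of the word address.

#define NUM_BLOCKS 8   /* cache capacity in one-word blocks (assumed) */

/* Direct mapped: cache index = word address modulo the number of
 * blocks; the remaining upper bits form the tag that identifies
 * which of the many aliasing words currently occupies the block. */
unsigned cache_index(unsigned byte_addr) {
    unsigned word_addr = byte_addr >> 2;   /* drop the 2-bit byte offset */
    return word_addr % NUM_BLOCKS;
}

unsigned cache_tag(unsigned byte_addr) {
    return (byte_addr >> 2) / NUM_BLOCKS;
}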
12 Direct Mapped Cache
13 Direct Mapped Cache
- Taking advantage of spatial locality with multi-word blocks, as sketched below.
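With multi-word blocks the address gains a block-offset field between the byte offset and the index; a sketch with assumed sizes (4-byte words, 4 words per block, 8 blocks):

#define WORDS_PER_BLOCK 4   /* larger blocks exploit spatial locality */
#define NUM_BLOCKS      8

/* Address fields, low bits to high: byte offset | block offset | index | tag */
void split_address(unsigned byte_addr,
                   unsigned *tag, unsigned *index, unsigned *offset) {
    unsigned word_addr  = byte_addr >> 2;            /* 4-byte words */
    unsigned block_addr = word_addr / WORDS_PER_BLOCK;
    *offset = word_addr % WORDS_PER_BLOCK;           /* word within the block */
    *index  = block_addr % NUM_BLOCKS;               /* which cache block */
    *tag    = block_addr / NUM_BLOCKS;               /* disambiguates aliases */
}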
14 Hits vs. Misses
- Read hits:
- this is what we want!
- Read misses:
- stall the CPU, fetch the block from memory, deliver it to the cache, restart.
- Write hits:
- replace the data in both cache and memory (write-through), or
- write the data only into the cache, writing it back to memory later (write-back).
- Write misses:
- read the entire block into the cache, then write the word. (The two write-hit policies are contrasted in the sketch after this list.)
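A minimal sketch contrasting the two write-hit policies; the line structure and function names are hypothetical, not from the book:

#include <stdbool.h>

/* Hypothetical cache line, just enough state to contrast policies. */
struct line { unsigned tag; unsigned data; bool valid; bool dirty; };

/* Write-through: update the cache and memory together; memory stays
 * consistent, but every store pays the memory latency (or needs a
 * write buffer). */
void write_through(struct line *l, unsigned *mem_word, unsigned v) {
    l->data   = v;
    *mem_word = v;
}

/* Write-back: update only the cache and mark the line dirty; memory
 * is updated later, when the dirty line is evicted. */
void write_back(struct line *l, unsigned v) {
    l->data  = v;
    l->dirty = true;
}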
15 Performance
- Simplified model:
  execution time = (execution cycles + stall cycles) × cycle time
  stall cycles = # of accesses × miss ratio × miss penalty
  (written out as code after this list)
- Two ways of improving performance:
- decreasing the miss ratio
- decreasing the miss penalty
- What happens if we increase the block size?
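The model written out as code (parameter names are mine; the formula is the slide's):

/* execution time = (execution cycles + stall cycles) * cycle time
 * stall cycles   = #accesses * miss ratio * miss penalty           */
double exec_time(double exec_cycles, double accesses,
                 double miss_ratio, double miss_penalty,
                 double cycle_time) {
    double stall_cycles = accesses * miss_ratio * miss_penalty;
    return (exec_cycles + stall_cycles) * cycle_time;
}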
16 Impact on Performance
- Suppose a processor executes at:
- Clock Rate = 200 MHz (5 ns per cycle)
- CPI = 1.1
- 50% arith/logic, 30% ld/st, 20% control
- Suppose that 10% of memory operations incur a 50-cycle miss penalty.
- CPI = ideal CPI + average stalls per instruction = 1.1 (cyc) + (0.30 (data mops/ins) × 0.10 (miss/data mop) × 50 (cycle/miss)) = 1.1 cycles + 1.5 cycles = 2.6
- 58% of the time the processor is stalled waiting for memory (1.5/2.6 ≈ 58%; reproduced in code below)!
- A 1% instruction miss rate would add an additional 0.5 cycles to the CPI (0.01 × 50 = 0.5)!
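Reproducing the slide's arithmetic (all values from the slide):

#include <stdio.h>

int main(void) {
    double ideal_cpi    = 1.1;    /* base CPI */
    double ld_st_frac   = 0.30;   /* data memory ops per instruction */
    double miss_rate    = 0.10;   /* misses per data memory op */
    double miss_penalty = 50.0;   /* cycles per miss */

    double stalls = ld_st_frac * miss_rate * miss_penalty;   /* 1.5 */
    double cpi    = ideal_cpi + stalls;                      /* 2.6 */

    printf("CPI = %.1f, stalled %.0f%% of the time\n",
           cpi, 100.0 * stalls / cpi);                       /* ~58% */
    return 0;
}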
17 Decreasing Miss Ratio with Associativity
- Compared to direct mapped, give a series of references that:
- results in a lower miss ratio using a 2-way set associative cache
- results in a higher miss ratio using a 2-way set associative cache
- assuming we use the least recently used (LRU) replacement strategy. (One way to check candidate sequences is sketched after this list.)
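One way to check candidate sequences is to simulate both organizations; a sketch assuming 4 one-word blocks in each cache and references given as block addresses:

#include <stdio.h>
#include <string.h>

#define BLOCKS 4   /* assumed: 4 blocks in both organizations */

/* Misses in a direct-mapped cache of BLOCKS one-word blocks. */
static int dm_misses(const int *refs, int n) {
    int tags[BLOCKS];
    memset(tags, -1, sizeof tags);
    int misses = 0;
    for (int i = 0; i < n; i++) {
        int idx = refs[i] % BLOCKS;
        if (tags[idx] != refs[i]) { tags[idx] = refs[i]; misses++; }
    }
    return misses;
}

/* Misses in a 2-way set-associative cache (BLOCKS/2 sets, LRU). */
static int sa2_misses(const int *refs, int n) {
    int way[BLOCKS / 2][2], lru[BLOCKS / 2];   /* lru = way to evict next */
    memset(way, -1, sizeof way);
    memset(lru, 0, sizeof lru);
    int misses = 0;
    for (int i = 0; i < n; i++) {
        int set = refs[i] % (BLOCKS / 2), hit = -1;
        for (int w = 0; w < 2; w++)
            if (way[set][w] == refs[i]) hit = w;
        if (hit < 0) { hit = lru[set]; way[set][hit] = refs[i]; misses++; }
        lru[set] = 1 - hit;   /* the other way is now least recently used */
    }
    return misses;
}

int main(void) {
    int a[] = {0, 4, 0, 4, 0, 4};   /* 2-way wins: 0 and 4 conflict in DM */
    int b[] = {0, 2, 4, 0, 2, 4};   /* DM wins: all three share one 2-way set */
    printf("A: DM=%d, 2-way=%d\n", dm_misses(a, 6), sa2_misses(a, 6));   /* 6 vs 2 */
    printf("B: DM=%d, 2-way=%d\n", dm_misses(b, 6), sa2_misses(b, 6));   /* 5 vs 6 */
    return 0;
}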
18 An Implementation
19 Virtual Memory
- Main memory can act as a cache for the secondary storage (disk).
- Advantages:
- illusion of having more physical memory
- program relocation
- protection
20 Pages: Virtual Memory Blocks
- Page faults: the data is not in memory, so retrieve it from disk.
- The miss penalty is huge, so pages should be fairly large (e.g., 4 KB).
- Reducing page faults is important (LRU is worth the price).
- The faults can be handled in software instead of hardware.
- Write-through is too expensive, so we use write-back. (The page-number/offset split is sketched after this list.)
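With 4 KB pages, a virtual address splits into a virtual page number and a page offset; a minimal sketch (32-bit addresses assumed):

#include <stdint.h>

#define PAGE_SIZE   4096u   /* 4 KB pages, as on the slide */
#define OFFSET_BITS 12      /* log2(4096) */

/* Virtual address = virtual page number | page offset. Only the page
 * number is translated; the offset passes through unchanged. */
uint32_t vpn(uint32_t vaddr)         { return vaddr >> OFFSET_BITS; }
uint32_t page_offset(uint32_t vaddr) { return vaddr & (PAGE_SIZE - 1); }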
21 Page Tables
22 Page Tables
23 Making Address Translation Fast
- A cache for address translations: the translation lookaside buffer (TLB). A lookup sketch follows.
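A hedged sketch of the lookup order this implies: consult the TLB first, and walk the page table only on a TLB miss. The structures and names below are hypothetical (fully associative TLB, 4 KB pages, identity-stubbed page table walk).

#include <stdint.h>
#include <stdbool.h>

#define TLB_ENTRIES 16
#define OFFSET_BITS 12

struct tlb_entry { uint32_t vpn, ppn; bool valid; };
static struct tlb_entry tlb[TLB_ENTRIES];

/* Hypothetical slow path: walk the in-memory page table (stubbed). */
static uint32_t page_table_walk(uint32_t vpn) { return vpn; }

uint32_t translate(uint32_t vaddr) {
    uint32_t vpn    = vaddr >> OFFSET_BITS;
    uint32_t offset = vaddr & ((1u << OFFSET_BITS) - 1);
    for (int i = 0; i < TLB_ENTRIES; i++)        /* fast path: recently   */
        if (tlb[i].valid && tlb[i].vpn == vpn)   /* used translations hit */
            return (tlb[i].ppn << OFFSET_BITS) | offset;
    /* TLB miss: walk the page table, then (not shown) refill the TLB. */
    return (page_table_walk(vpn) << OFFSET_BITS) | offset;
}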
24 Thanks.