Title: Computer Architecture, Chapter 7: Large and Fast: Exploiting Memory Hierarchy
1 Computer Architecture, Chapter 7: Large and Fast: Exploiting Memory Hierarchy
2 Who Cares About the Memory Hierarchy?
[Figure: Processor-DRAM memory gap (latency). Log-scale performance (1 to 1000) vs. time, 1980-2000. Processor performance grows ~60%/yr (2X per 1.5 years); DRAM performance grows ~9%/yr (2X per 10 years); the processor-memory performance gap grows ~50% per year.]
3 An Expanded View of the Memory System
[Figure: Processor (Control, Datapath) connected to a chain of memory levels. Moving away from the processor, speed goes from fastest to slowest, size from smallest to biggest, and cost per byte from highest to lowest.]
4 Why Hierarchy Works: Locality
- The Principle of Locality: programs access a relatively small portion of the address space at any instant of time.
- Two kinds of locality. If an item is referenced:
- Temporal locality: it will tend to be referenced again soon.
- Spatial locality: nearby items will tend to be referenced soon.
5 Memory Hierarchy: How Does It Work?
- Temporal Locality (Locality in Time) => keep most recently accessed data items closer to the processor.
- Spatial Locality (Locality in Space) => move blocks consisting of contiguous words to the upper levels. (Both kinds are illustrated in the sketch after this list.)
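As a concrete illustration (not from the slides), the C loop below exhibits both kinds of locality: it reuses sum on every iteration (temporal) and walks a row-major array in address order (spatial). The array size N is arbitrary.

#include <stddef.h>

#define N 1024

/* Row-major traversal: consecutive iterations touch adjacent
 * addresses (spatial locality), and sum is reused on every
 * iteration (temporal locality). Swapping the loop order would
 * stride by N * sizeof(double) per access and miss far more often. */
double sum_matrix(const double a[N][N]) {
    double sum = 0.0;
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            sum += a[i][j];
    return sum;
}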
6 Memory Hierarchy Terminology
- Hit: the data appears in some block in the upper level (example: Block X).
- Hit Rate: the fraction of memory accesses found in the upper level.
- Hit Time: time to access the upper level, which consists of RAM access time + time to determine hit/miss.
- Miss: the data needs to be retrieved from a block in the lower level (Block Y).
- Miss Rate = 1 - (Hit Rate).
- Miss Penalty: time to replace a block in the upper level + time to deliver the block to the processor.
- Hit Time << Miss Penalty. (These terms combine as sketched after the figure below.)
[Figure: Upper Level Memory holding Blk X and Lower Level Memory holding Blk Y, with arrows to/from the processor through the upper level.]
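These terms combine into the standard average memory access time formula, AMAT = Hit Time + Miss Rate x Miss Penalty (not stated explicitly on the slide); a minimal sketch with made-up numbers:

#include <stdio.h>

/* AMAT = hit time + miss rate * miss penalty */
double amat(double hit_time, double miss_rate, double miss_penalty) {
    return hit_time + miss_rate * miss_penalty;
}

int main(void) {
    /* illustrative values only: 1-cycle hit, 5% miss rate, 50-cycle penalty */
    printf("AMAT = %.2f cycles\n", amat(1.0, 0.05, 50.0));   /* 3.50 */
    return 0;
}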
7 Memory Hierarchy of a Modern Computer System
- By taking advantage of the principle of locality:
- Present the user with as much memory as is available in the cheapest technology.
- Provide access at the speed offered by the fastest technology.
[Figure: the levels of a modern system, from the processor outward: Registers and Datapath/Control, On-Chip Cache, Second Level Cache (SRAM), Main Memory (DRAM), Secondary Storage (Disk), Tertiary Storage (Disk). Speed (ns) grows from 1s through 10s and 100s to 10,000,000s (10s ms) for disk and 10,000,000,000s (10s sec) for tertiary storage; Size (bytes) grows from 100s through Ks, Ms, and Gs to Ts.]
8 How Is the Hierarchy Managed?
- Registers <-> Memory: by the compiler (programmer?)
- Cache <-> Memory: by the hardware
- Memory <-> Disks: by the hardware and operating system (virtual memory), and by the programmer (files)
9 Memory Hierarchy Technology
- Random Access
- "Random" is good: access time is the same for all locations.
- DRAM (Dynamic Random Access Memory): high density, low power, cheap, slow. Dynamic: it needs to be refreshed regularly.
- SRAM (Static Random Access Memory): low density, high power, expensive, fast. Static: its content lasts until power is lost.
- Sequential Access Technology: access time is linear in location (e.g., tape).
- The next two lectures concentrate on random access technology: main memory uses DRAMs, caches use SRAMs.
10 Cache
- Two issues:
- How do we know if a data item is in the cache?
- If it is, how do we find it?
- Our first example:
- block size is one word of data
- "direct mapped": for each item of data at the lower level, there is exactly one location in the cache where it might be, i.e., lots of items at the lower level share locations in the upper level.
11 Direct Mapped Cache
- Mapping: the address is modulo the number of blocks in the cache, as in the sketch below.
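For one-word (4-byte) blocks, the mapping can be written as below; a sketch assuming NUM_BLOCKS is a power of two, so the modulo is just the low-order bits of the word address.

#define NUM_BLOCKS 8   /* cache capacity in one-word blocks (assumed) */

/* Direct mapped: cache index = word address modulo the number of
 * blocks; the remaining upper bits form the tag that identifies
 * which of the many aliasing words currently occupies the block. */
unsigned cache_index(unsigned byte_addr) {
    unsigned word_addr = byte_addr >> 2;   /* drop the 2-bit byte offset */
    return word_addr % NUM_BLOCKS;
}

unsigned cache_tag(unsigned byte_addr) {
    return (byte_addr >> 2) / NUM_BLOCKS;
}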
12 Direct Mapped Cache
13 Direct Mapped Cache
- Taking advantage of spatial locality with multi-word blocks, as sketched below.
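With multi-word blocks the address gains a block-offset field between the byte offset and the index; a sketch with assumed sizes (4-byte words, 4 words per block, 8 blocks):

#define WORDS_PER_BLOCK 4   /* larger blocks exploit spatial locality */
#define NUM_BLOCKS      8

/* Address fields, low bits to high: byte offset | block offset | index | tag */
void split_address(unsigned byte_addr,
                   unsigned *tag, unsigned *index, unsigned *offset) {
    unsigned word_addr  = byte_addr >> 2;            /* 4-byte words */
    unsigned block_addr = word_addr / WORDS_PER_BLOCK;
    *offset = word_addr % WORDS_PER_BLOCK;           /* word within the block */
    *index  = block_addr % NUM_BLOCKS;               /* which cache block */
    *tag    = block_addr / NUM_BLOCKS;               /* disambiguates aliases */
}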
14 Hits vs. Misses
- Read hits:
- this is what we want!
- Read misses:
- stall the CPU, fetch the block from memory, deliver it to the cache, restart.
- Write hits:
- replace the data in both cache and memory (write-through), or
- write the data only into the cache, writing it back to memory later (write-back).
- Write misses:
- read the entire block into the cache, then write the word. (The two write-hit policies are contrasted in the sketch after this list.)
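A minimal sketch contrasting the two write-hit policies; the line structure and function names are hypothetical, not from the book:

#include <stdbool.h>

/* Hypothetical cache line, just enough state to contrast policies. */
struct line { unsigned tag; unsigned data; bool valid; bool dirty; };

/* Write-through: update the cache and memory together; memory stays
 * consistent, but every store pays the memory latency (or needs a
 * write buffer). */
void write_through(struct line *l, unsigned *mem_word, unsigned v) {
    l->data   = v;
    *mem_word = v;
}

/* Write-back: update only the cache and mark the line dirty; memory
 * is updated later, when the dirty line is evicted. */
void write_back(struct line *l, unsigned v) {
    l->data  = v;
    l->dirty = true;
}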
15 Performance
- Simplified model:
  execution time = (execution cycles + stall cycles) × cycle time
  stall cycles = # of accesses × miss ratio × miss penalty
  (written out as code after this list)
- Two ways of improving performance:
- decreasing the miss ratio
- decreasing the miss penalty
- What happens if we increase the block size?
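The model written out as code (parameter names are mine; the formula is the slide's):

/* execution time = (execution cycles + stall cycles) * cycle time
 * stall cycles   = #accesses * miss ratio * miss penalty           */
double exec_time(double exec_cycles, double accesses,
                 double miss_ratio, double miss_penalty,
                 double cycle_time) {
    double stall_cycles = accesses * miss_ratio * miss_penalty;
    return (exec_cycles + stall_cycles) * cycle_time;
}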
16 Impact on Performance
- Suppose a processor executes at:
- Clock Rate = 200 MHz (5 ns per cycle)
- CPI = 1.1
- 50% arith/logic, 30% ld/st, 20% control
- Suppose that 10% of memory operations incur a 50-cycle miss penalty.
- CPI = ideal CPI + average stalls per instruction = 1.1 (cyc) + (0.30 (data mops/ins) × 0.10 (miss/data mop) × 50 (cycle/miss)) = 1.1 cycles + 1.5 cycles = 2.6
- 58% of the time the processor is stalled waiting for memory (1.5/2.6 ≈ 58%; reproduced in code below)!
- A 1% instruction miss rate would add an additional 0.5 cycles to the CPI (0.01 × 50 = 0.5)!
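Reproducing the slide's arithmetic (all values from the slide):

#include <stdio.h>

int main(void) {
    double ideal_cpi    = 1.1;    /* base CPI */
    double ld_st_frac   = 0.30;   /* data memory ops per instruction */
    double miss_rate    = 0.10;   /* misses per data memory op */
    double miss_penalty = 50.0;   /* cycles per miss */

    double stalls = ld_st_frac * miss_rate * miss_penalty;   /* 1.5 */
    double cpi    = ideal_cpi + stalls;                      /* 2.6 */

    printf("CPI = %.1f, stalled %.0f%% of the time\n",
           cpi, 100.0 * stalls / cpi);                       /* ~58% */
    return 0;
}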
17 Decreasing Miss Ratio with Associativity
- Compared to direct mapped, give a series of references that:
- results in a lower miss ratio using a 2-way set associative cache
- results in a higher miss ratio using a 2-way set associative cache
- assuming we use the least recently used (LRU) replacement strategy. (One way to check candidate sequences is sketched after this list.)
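One way to check candidate sequences is to simulate both organizations; a sketch assuming 4 one-word blocks in each cache and references given as block addresses:

#include <stdio.h>
#include <string.h>

#define BLOCKS 4   /* assumed: 4 blocks in both organizations */

/* Misses in a direct-mapped cache of BLOCKS one-word blocks. */
static int dm_misses(const int *refs, int n) {
    int tags[BLOCKS];
    memset(tags, -1, sizeof tags);
    int misses = 0;
    for (int i = 0; i < n; i++) {
        int idx = refs[i] % BLOCKS;
        if (tags[idx] != refs[i]) { tags[idx] = refs[i]; misses++; }
    }
    return misses;
}

/* Misses in a 2-way set-associative cache (BLOCKS/2 sets, LRU). */
static int sa2_misses(const int *refs, int n) {
    int way[BLOCKS / 2][2], lru[BLOCKS / 2];   /* lru = way to evict next */
    memset(way, -1, sizeof way);
    memset(lru, 0, sizeof lru);
    int misses = 0;
    for (int i = 0; i < n; i++) {
        int set = refs[i] % (BLOCKS / 2), hit = -1;
        for (int w = 0; w < 2; w++)
            if (way[set][w] == refs[i]) hit = w;
        if (hit < 0) { hit = lru[set]; way[set][hit] = refs[i]; misses++; }
        lru[set] = 1 - hit;   /* the other way is now least recently used */
    }
    return misses;
}

int main(void) {
    int a[] = {0, 4, 0, 4, 0, 4};   /* 2-way wins: 0 and 4 conflict in DM */
    int b[] = {0, 2, 4, 0, 2, 4};   /* DM wins: all three share one 2-way set */
    printf("A: DM=%d, 2-way=%d\n", dm_misses(a, 6), sa2_misses(a, 6));   /* 6 vs 2 */
    printf("B: DM=%d, 2-way=%d\n", dm_misses(b, 6), sa2_misses(b, 6));   /* 5 vs 6 */
    return 0;
}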
18 An Implementation
19 Virtual Memory
- Main memory can act as a cache for the secondary storage (disk).
- Advantages:
- illusion of having more physical memory
- program relocation
- protection
20 Pages: Virtual Memory Blocks
- Page faults: the data is not in memory, so retrieve it from disk.
- The miss penalty is huge, so pages should be fairly large (e.g., 4 KB).
- Reducing page faults is important (LRU is worth the price).
- The faults can be handled in software instead of hardware.
- Write-through is too expensive, so we use write-back. (The page-number/offset split is sketched after this list.)
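With 4 KB pages, a virtual address splits into a virtual page number and a page offset; a minimal sketch (32-bit addresses assumed):

#include <stdint.h>

#define PAGE_SIZE   4096u   /* 4 KB pages, as on the slide */
#define OFFSET_BITS 12      /* log2(4096) */

/* Virtual address = virtual page number | page offset. Only the page
 * number is translated; the offset passes through unchanged. */
uint32_t vpn(uint32_t vaddr)         { return vaddr >> OFFSET_BITS; }
uint32_t page_offset(uint32_t vaddr) { return vaddr & (PAGE_SIZE - 1); }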
21 Page Tables
22 Page Tables
23 Making Address Translation Fast
- A cache for address translations: the translation lookaside buffer (TLB). A lookup sketch follows.
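A hedged sketch of the lookup order this implies: consult the TLB first, and walk the page table only on a TLB miss. The structures and names below are hypothetical (fully associative TLB, 4 KB pages, identity-stubbed page table walk).

#include <stdint.h>
#include <stdbool.h>

#define TLB_ENTRIES 16
#define OFFSET_BITS 12

struct tlb_entry { uint32_t vpn, ppn; bool valid; };
static struct tlb_entry tlb[TLB_ENTRIES];

/* Hypothetical slow path: walk the in-memory page table (stubbed). */
static uint32_t page_table_walk(uint32_t vpn) { return vpn; }

uint32_t translate(uint32_t vaddr) {
    uint32_t vpn    = vaddr >> OFFSET_BITS;
    uint32_t offset = vaddr & ((1u << OFFSET_BITS) - 1);
    for (int i = 0; i < TLB_ENTRIES; i++)        /* fast path: recently   */
        if (tlb[i].valid && tlb[i].vpn == vpn)   /* used translations hit */
            return (tlb[i].ppn << OFFSET_BITS) | offset;
    /* TLB miss: walk the page table, then (not shown) refill the TLB. */
    return (page_table_walk(vpn) << OFFSET_BITS) | offset;
}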
24 Thanks.