Computer Architecture Chapter 7, Large and Fast: Exploiting Memory Hierarchy (Presentation Transcript)

1
Computer Architecture, Chapter 7
Large and Fast: Exploiting Memory Hierarchy
2
Who Cares About the Memory Hierarchy?
(Figure: the processor-DRAM latency gap, 1980-2000, performance on a log scale vs. time. Processor performance grows about 60%/yr (2x every 1.5 years) while DRAM performance grows about 9%/yr (2x every 10 years), so the processor-memory performance gap grows about 50% per year.)
3
An Expanded View of the Memory System
(Figure: the processor (control + datapath) backed by a chain of memory levels. Moving away from the processor, each level is slower, bigger, and cheaper: speed runs from fastest to slowest, size from smallest to biggest, and cost from highest to lowest.)
4
Why hierarchy works: Locality
  • The Principle of Locality: programs access a relatively small portion
    of the address space at any instant of time.
  • Two types of locality. If an item is referenced:
  • Temporal locality: it will tend to be referenced again soon.
  • Spatial locality: nearby items will tend to be referenced soon
    (both kinds appear in the sketch below).

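Not on the slide, but a minimal C sketch of both kinds of locality; the array "a" and its size are illustrative assumptions:

    #include <stdio.h>

    #define N 1024

    int main(void) {
        static int a[N];
        int sum = 0;                  /* reused every iteration: temporal locality */
        for (int i = 0; i < N; i++)
            a[i] = i;
        for (int i = 0; i < N; i++)
            sum += a[i];              /* consecutive addresses: spatial locality */
        printf("sum = %d\n", sum);
        return 0;
    }
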
5
Memory Hierarchy: How Does It Work?
  • Temporal Locality (Locality in Time):
  • ⇒ Keep most recently accessed data items closer
    to the processor.
  • Spatial Locality (Locality in Space):
  • ⇒ Move blocks consisting of contiguous words to the
    upper levels.

6
Memory Hierarchy Terminology
  • Hit: data appears in some block in the upper
    level (example: Block X)
  • Hit Rate: the fraction of memory accesses found in
    the upper level
  • Hit Time: time to access the upper level, which
    consists of
  • RAM access time + time to determine hit/miss
  • Miss: data needs to be retrieved from a block in
    the lower level (Block Y)
  • Miss Rate: 1 - (Hit Rate)
  • Miss Penalty: time to replace a block in the
    upper level
  • + time to deliver the block to the processor
  • Hit Time << Miss Penalty

(Figure: two adjacent levels of the hierarchy. The upper-level memory, nearer the processor, holds Block X; the lower-level memory holds Block Y. Data moves to and from the processor through the upper level.)
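
These terms combine into an average access time. A small check with made-up numbers; the formula follows from the definitions above:

    #include <stdio.h>

    int main(void) {
        double hit_time = 1.0;        /* ns: time to access the upper level */
        double miss_rate = 0.05;      /* 1 - hit rate */
        double miss_penalty = 100.0;  /* ns: time to fetch from the lower level */
        /* average access time = hit time + miss rate x miss penalty */
        double avg = hit_time + miss_rate * miss_penalty;
        printf("average access time = %.1f ns\n", avg);  /* 6.0 ns */
        return 0;
    }
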
7
Memory Hierarchy of a Modern Computer System
  • By taking advantage of the principle of
    locality, we can:
  • Present the user with as much memory as is
    available in the cheapest technology.
  • Provide access at the speed offered by the
    fastest technology.

(Figure: the hierarchy of a modern system, fastest and smallest nearest the processor: Registers, On-Chip Cache, Second Level Cache (SRAM), Main Memory (DRAM), Secondary Storage (Disk), Tertiary Storage (Disk). Speed runs from 1s of ns at the registers through 10s and 100s of ns for the caches and DRAM, to 10,000,000s of ns (10s of ms) for disk and 10,000,000,000s of ns (10s of seconds) for tertiary storage; size grows from 100s of bytes through Ks, Ms, and Gs to Ts.)
8
How is the hierarchy managed?
  • Registers ↔ Memory
  • by the compiler (programmer?)
  • Cache ↔ Memory
  • by the hardware
  • Memory ↔ Disks
  • by the hardware and operating system (virtual
    memory)
  • by the programmer (files)

9
Memory Hierarchy Technology
  • Random Access:
  • "Random" is good: access time is the same for all
    locations.
  • DRAM: Dynamic Random Access Memory
  • High density, low power, cheap, slow
  • Dynamic: needs to be refreshed regularly
  • SRAM: Static Random Access Memory
  • Low density, high power, expensive, fast
  • Static: content lasts until power is lost
  • Sequential Access Technology: access time is linear
    in location (e.g., tape)
  • The next two lectures will concentrate on random
    access technology:
  • Main Memory: DRAMs; Caches: SRAMs

10
Cache
  • Two issues:
  • How do we know if a data item is in the cache?
  • If it is, how do we find it?
  • Our first example:
  • block size is one word of data
  • "direct mapped"

For each item of data at the lower level, there is
exactly one location in the cache where it might be,
i.e., lots of items at the lower level share
locations in the upper level.
11
Direct Mapped Cache
  • Mapping: the cache index is the address modulo the
    number of blocks in the cache (a sketch follows below)

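A minimal sketch of the mapping, assuming an 8-block cache with one-word (4-byte) blocks; the sizes are illustrative:

    #include <stdio.h>

    #define NUM_BLOCKS 8                /* assumed cache size */

    int main(void) {
        unsigned int byte_addr = 29 * 4;          /* word address 29 */
        unsigned int block_addr = byte_addr / 4;  /* one-word blocks */
        /* direct mapped: exactly one possible location per block */
        unsigned int index = block_addr % NUM_BLOCKS;
        printf("word address 29 maps to cache block %u\n", index); /* 29 mod 8 = 5 */
        return 0;
    }
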
12
Direct Mapped Cache
  • For MIPS

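The slide's figure splits a 32-bit MIPS address into cache fields. A sketch, assuming the textbook's 1024-block cache with one-word blocks (2 byte-offset bits, 10 index bits, 20 tag bits):

    #include <stdio.h>

    int main(void) {
        unsigned int addr = 0x12345678;           /* arbitrary example address */
        unsigned int byte_offset = addr & 0x3;    /* bits 1..0 */
        unsigned int index = (addr >> 2) & 0x3FF; /* bits 11..2: selects the block */
        unsigned int tag = addr >> 12;            /* bits 31..12: compared on lookup */
        printf("tag=0x%05X index=%u byte offset=%u\n", tag, index, byte_offset);
        return 0;
    }
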
13
Direct Mapped Cache
  • Taking advantage of spatial locality

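With multiword blocks the address also carries a block-offset field selecting the word within the block. A sketch assuming 4-word (16-byte) blocks and 256 blocks; the sizes are illustrative:

    #include <stdio.h>

    int main(void) {
        unsigned int addr = 0x12345678;
        unsigned int word_in_block = (addr >> 2) & 0x3; /* 2 block-offset bits */
        unsigned int index = (addr >> 4) & 0xFF;        /* 8 index bits */
        unsigned int tag = addr >> 12;
        /* a miss fetches the whole 4-word block, so nearby words then hit */
        printf("tag=0x%05X index=%u word=%u\n", tag, index, word_in_block);
        return 0;
    }
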
14
Hits vs. Misses
  • Read hits:
  • this is what we want!
  • Read misses:
  • stall the CPU, fetch the block from memory, deliver
    it to the cache, restart
  • Write hits (two policies, sketched below):
  • replace the data in both cache and memory
    (write-through)
  • write the data only into the cache, and write it
    back to memory later (write-back)
  • Write misses:
  • read the entire block into the cache, then write
    the word

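A minimal sketch of the two write-hit policies; the arrays and names are illustrative, not the slides' implementation:

    #include <stdbool.h>
    #include <stdio.h>

    enum { NUM_BLOCKS = 8, MEM_WORDS = 1024 };
    static int cache_data[NUM_BLOCKS];
    static bool dirty[NUM_BLOCKS];
    static int mem[MEM_WORDS];

    /* write-through: update cache and memory together on a write hit */
    void write_through(unsigned idx, unsigned addr, int value) {
        cache_data[idx] = value;
        mem[addr] = value;
    }

    /* write-back: update only the cache and mark the block dirty;
       memory is updated later, when the block is evicted */
    void write_back(unsigned idx, int value) {
        cache_data[idx] = value;
        dirty[idx] = true;
    }

    int main(void) {
        write_through(3, 100, 42);
        write_back(5, 7);
        printf("mem[100]=%d, dirty[5]=%d\n", mem[100], dirty[5]);
        return 0;
    }
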
15
Performance
  • Simplified model:
    execution time = (execution cycles + stall cycles) × cycle time
    stall cycles = # of accesses × miss ratio × miss penalty
  • Two ways of improving performance
  • decreasing the miss ratio
  • decreasing the miss penalty
  • What happens if we increase block size?

16
Impact on Performance
  • Suppose a processor executes at:
  • Clock Rate = 200 MHz (5 ns per cycle)
  • CPI = 1.1
  • 50% arith/logic, 30% ld/st, 20% control
  • Suppose that 10% of memory operations incur a
    50-cycle miss penalty:
  • CPI = ideal CPI + average stalls per instruction
    = 1.1 (cycles) + (0.30 (data mops/ins)
    × 0.10 (miss/data mop) × 50 (cycles/miss))
    = 1.1 cycles + 1.5 cycles = 2.6 cycles
  • 58% of the time the processor is stalled
    waiting for memory!
  • a 1% instruction miss rate would add an
    additional 0.5 cycles to the CPI!

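The slide's arithmetic, reproduced as a check:

    #include <stdio.h>

    int main(void) {
        double ideal_cpi = 1.1;
        double data_mops = 0.30;   /* loads/stores per instruction */
        double miss_rate = 0.10;   /* misses per data memory op */
        double penalty = 50.0;     /* cycles per miss */
        double cpi = ideal_cpi + data_mops * miss_rate * penalty;  /* 2.6 */
        printf("CPI = %.1f, stalled %.0f%% of the time\n",
               cpi, 100.0 * (cpi - ideal_cpi) / cpi);              /* 58% */
        /* a 1% instruction miss rate adds 1.0 * 0.01 * 50 = 0.5 cycles */
        return 0;
    }
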
17
Decreasing miss ratio with associativity
  • Compared to direct mapped, give a series of
    references that:
  • results in a lower miss ratio using a 2-way set
    associative cache
  • results in a higher miss ratio using a 2-way set
    associative cache
  • assuming we use the least recently used (LRU)
    replacement strategy (one answer is simulated below)

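One possible answer, sketched as a small C simulation (not from the slides): a 4-block direct-mapped cache vs. a 2-way set-associative cache of the same total size with LRU replacement, fed block addresses directly. All sizes are illustrative.

    #include <stdio.h>

    #define BLOCKS 4

    /* direct mapped: index = address mod BLOCKS */
    int dm_misses(const int *refs, int n) {
        int tags[BLOCKS];
        for (int i = 0; i < BLOCKS; i++) tags[i] = -1;
        int misses = 0;
        for (int i = 0; i < n; i++) {
            int idx = refs[i] % BLOCKS;
            if (tags[idx] != refs[i]) { tags[idx] = refs[i]; misses++; }
        }
        return misses;
    }

    /* 2-way: 2 sets of 2 blocks, LRU replacement within each set */
    int twoway_misses(const int *refs, int n) {
        int tags[2][2], lru[2];            /* lru[s] = way to evict next */
        for (int s = 0; s < 2; s++) { tags[s][0] = tags[s][1] = -1; lru[s] = 0; }
        int misses = 0;
        for (int i = 0; i < n; i++) {
            int s = refs[i] % 2;
            if (tags[s][0] == refs[i])      lru[s] = 1;  /* hit way 0 */
            else if (tags[s][1] == refs[i]) lru[s] = 0;  /* hit way 1 */
            else {                                       /* miss: fill LRU way */
                tags[s][lru[s]] = refs[i];
                lru[s] = 1 - lru[s];
                misses++;
            }
        }
        return misses;
    }

    int main(void) {
        int a[] = {0, 4, 0, 4};        /* 2-way wins: 0 and 4 share set 0 */
        int b[] = {0, 2, 4, 0, 2, 4};  /* direct mapped wins: 2-way thrashes set 0 */
        printf("seq a: DM=%d misses, 2-way=%d misses\n", dm_misses(a, 4), twoway_misses(a, 4));
        printf("seq b: DM=%d misses, 2-way=%d misses\n", dm_misses(b, 6), twoway_misses(b, 6));
        return 0;
    }

Sequence a gives 4 misses direct mapped but only 2 misses 2-way; sequence b gives 5 misses direct mapped but 6 misses 2-way under LRU.
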
18
An implementation
19
Virtual Memory
  • Main memory can act as a cache for the secondary
    storage (disk)
  • Advantages
  • illusion of having more physical memory
  • program relocation
  • protection

20
Pages: virtual memory blocks
  • Page faults: the data is not in memory; retrieve
    it from disk
  • huge miss penalty, thus pages should be fairly
    large (e.g., 4 KB)
  • reducing page faults is important (LRU is worth
    the price)
  • the faults can be handled in software instead of
    hardware
  • write-through is too expensive, so we use
    write-back

21
Page Tables
22
Page Tables

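The page-table figures show how a virtual address is translated. A minimal sketch, assuming 4 KB pages and a one-level table; the structures and names are illustrative:

    #include <stdint.h>
    #include <stdio.h>

    #define PAGE_BITS 12              /* 4 KB pages */
    #define NUM_PAGES 1024            /* assumed virtual page count */

    typedef struct { int valid; uint32_t ppn; } PTE;
    static PTE page_table[NUM_PAGES];

    uint32_t translate(uint32_t va) {
        uint32_t vpn = va >> PAGE_BITS;                 /* virtual page number */
        uint32_t offset = va & ((1u << PAGE_BITS) - 1);
        if (!page_table[vpn].valid) {
            /* page fault: the OS would fetch the page from disk and
               update the table; here we just install a dummy mapping */
            printf("page fault on vpn %u\n", vpn);
            page_table[vpn].ppn = vpn;
            page_table[vpn].valid = 1;
        }
        return (page_table[vpn].ppn << PAGE_BITS) | offset;
    }

    int main(void) {
        printf("pa = 0x%08X\n", translate(0x00003ABC));
        return 0;
    }
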
23
Making Address Translation Fast
  • A cache for address translations: the translation
    lookaside buffer (TLB); a sketch follows below

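A minimal sketch of a TLB in front of the page-table walk; the direct-mapped organization and sizes are assumptions:

    #include <stdint.h>
    #include <stdio.h>

    #define PAGE_BITS 12
    #define TLB_ENTRIES 16            /* assumed TLB size */

    typedef struct { int valid; uint32_t vpn, ppn; } TLBEntry;
    static TLBEntry tlb[TLB_ENTRIES];

    /* stand-in for the page-table walk sketched above (identity mapping) */
    static uint32_t walk_page_table(uint32_t vpn) { return vpn; }

    uint32_t translate_fast(uint32_t va) {
        uint32_t vpn = va >> PAGE_BITS;
        uint32_t offset = va & ((1u << PAGE_BITS) - 1);
        TLBEntry *e = &tlb[vpn % TLB_ENTRIES];    /* direct-mapped TLB */
        if (!(e->valid && e->vpn == vpn)) {       /* TLB miss: walk the table */
            e->valid = 1;
            e->vpn = vpn;
            e->ppn = walk_page_table(vpn);
        }
        return (e->ppn << PAGE_BITS) | offset;    /* hit path is just this line */
    }

    int main(void) {
        printf("pa = 0x%08X\n", translate_fast(0x00003ABC));
        printf("pa = 0x%08X\n", translate_fast(0x00003ABC)); /* second call hits the TLB */
        return 0;
    }
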
24
Thanks.