EEM 486: Computer Architecture - Lecture 6: Memory Systems and Caches


1
EEM 486 Computer Architecture - Lecture 6: Memory
Systems and Caches
2
The Big Picture: Where Are We Now?
  • The Five Classic Components of a Computer

3
The Art of Memory System Design
Workload or Benchmark programs
Processor
reference stream: <op,addr>, <op,addr>, <op,addr>,
<op,addr>, . . . where op = i-fetch, read, or write
Memory
Optimize the memory system organization to
minimize the average memory access time for
typical workloads
SRAM
Cache
Main Memory
DRAM
4
Technology Trends
5
Processor-DRAM Memory Gap
6
The Goal: the illusion of large, fast, cheap memory
  • Facts
  • Large memories are slow but cheap (DRAM)
  • Fast memories are small but expensive (SRAM)
  • How do we create a memory that is large, fast,
    and cheap?
  • Memory hierarchy
  • Parallelism

7
The Principle of Locality
  • The principle of locality: programs access a
    relatively small portion of their address space
    at any instant of time
  • Temporal Locality (Locality in Time)
  • => If an item is referenced, it will tend to be
    referenced again soon
  • => Keep most recently accessed data items closer
    to the processor
  • Spatial Locality (Locality in Space)
  • => If an item is referenced, nearby items will
    tend to be referenced soon
  • => Move blocks of contiguous words to the upper
    levels
  • Q: Why does code have locality?

8
Memory Hierarchy
  • Based on the principle of locality
  • A way of providing large, cheap, and fast memory

9
Cache Memory
10
Elements of Cache Design
  • Cache size
  • Mapping function
  • Direct
  • Set Associative
  • Fully Associative
  • Replacement algorithm
  • Least recently used (LRU)
  • First in first out (FIFO)
  • Random
  • Write policy
  • Write through
  • Write back
  • Line size
  • Number of caches
  • Single or two level
  • Unified or split

11
Terminology
  • Hit: data appears in some block in the upper
    level
  • Hit Rate: the fraction of memory accesses found
    in the upper level
  • Hit Time: time to access the upper level, which
    consists of
  • RAM access time + time to determine hit/miss

12
Terminology
  • Miss: data needs to be retrieved from a block
    in the lower level
  • Miss Rate = 1 - (Hit Rate)
  • Miss Penalty: time to replace a block in the
    upper level + time to deliver
    the block to the processor
  • Hit Time << Miss Penalty

13
Direct Mapped Cache
Each memory location is mapped to exactly one
location in the cache:
Cache block = (Block address) modulo (# of cache blocks)
            = low-order log2(# of cache blocks) bits
              of the address
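As a quick illustration, the mapping above can be sketched in Python (the 8-block cache and block address 12 are the example values used again on the block-placement slide):

```python
def direct_mapped_index(block_address, num_blocks):
    """Cache block index = (block address) modulo (# of cache blocks).
    When num_blocks is a power of two, this equals the low-order
    log2(num_blocks) bits of the block address."""
    return block_address % num_blocks

# Block address 12 in an 8-block cache maps to index 4 (12 mod 8)
assert direct_mapped_index(12, 8) == 4
```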
14
64 KByte Direct Mapped Cache
  • Why do we need a Tag field?
  • Why do we need a Valid bit field?
  • What kind of locality are we taking
    care of?
  • Total number of bits in a cache
  • = 2^n x (valid + tag + block)
  • 2^n = # of cache blocks
  • valid = 1 bit
  • tag = 32 - (n + 2), for a 32-bit byte address
    and 1-word blocks
  • block = 32 bits

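The bit count can be checked with a short sketch (assuming, as on the slide, a 32-bit byte address and 1-word blocks; n = 14 gives the 64 KByte data array):

```python
def total_cache_bits(n):
    """Total bits = 2^n x (valid + tag + block) for a direct mapped
    cache with 2^n one-word blocks and 32-bit byte addresses."""
    valid = 1                  # 1 valid bit per block
    tag = 32 - (n + 2)         # address bits minus n index bits and 2 byte-offset bits
    block = 32                 # one 32-bit word of data
    return 2**n * (valid + tag + block)

# n = 14: 2^14 blocks x 4 bytes = 64 KByte of data, but more total storage
assert total_cache_bits(14) == 2**14 * (1 + 16 + 32)
```

Note that the tag and valid bits make the real storage noticeably larger than the nominal 64 KByte of data.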
15
Reading from Cache
  • Address the cache by PC or ALU
  • If the cache signals hit, we have a read hit
  • The requested word will be on the data lines
  • Otherwise, we have a read miss
  • stall the CPU
  • fetch the block from memory and write into cache
  • restart the execution

16
Writing to Cache
  • Address the cache by PC or ALU
  • If the cache signals hit, we have a write hit
  • We have two options
  • write-through write the data into both cache and
    memory
  • write-back write the data only into cache and
  • write it into memory only
    when it is replaced
  • Otherwise, we have a write miss
  • Handle write miss as if it were a write hit

17
64 KByte Direct Mapped Cache
  • Taking advantage of spatial locality

18
Writing to Cache
  • Address the cache by PC or ALU
  • If the cache signals hit, we have a write hit
  • Write-through cache write the data into both
    cache and memory
  • Otherwise, we have a write miss
  • stall the CPU
  • fetch the block from memory and write into cache
  • restart the execution and rewrite the word

19
Associativity in Caches
  • Compute the set number
  • (Block number) modulo (Number of sets)
  • Choose one of the blocks in the computed set

20
Set Associative Cache
  • N-way set associative
  • N direct mapped caches operate in parallel
  • N entries for each cache index
  • N comparators and an N-to-1 mux
  • Data comes AFTER Hit/Miss decision and set
    selection

A four-way set associative cache
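A minimal Python sketch of this lookup (tags kept as plain lists per set; the N parallel hardware comparisons become a list search here):

```python
def set_lookup(block_address, sets, num_sets):
    """N-way set associative lookup: index one set, then compare the
    block's tag against every tag stored in that set (done in parallel
    by N comparators in hardware; a list search here)."""
    set_index = block_address % num_sets
    tag = block_address // num_sets
    hit = tag in sets[set_index]
    return set_index, tag, hit

# 2 sets, 2-way: block address 5 -> set 1, tag 2, and tag 2 is present
assert set_lookup(5, [[0, 3], [2]], 2) == (1, 2, True)
```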
21
Fully Associative Cache
  • A block can be anywhere in the cache => no cache
    index
  • Compare the cache tags of all cache entries in
    parallel
  • Practical only for a small number of cache blocks

22
Four Questions for Caches
  • Q1: Block placement?
  • Where can a block be placed in the upper
    level?
  • Q2: Block identification?
  • How is a block found if it is in the
    upper level?
  • Q3: Block replacement?
  • Which block should be replaced on a
    miss?
  • Q4: Write strategy?
  • What happens on a write?

23
Q1: Block Placement?
  • Block 12 to be placed in an 8-block cache

Direct mapped: one place -
(Block address) mod (# of cache blocks)
Set associative: a few places -
(Block address) mod (# of cache sets)
# of cache sets = # of cache blocks / degree of associativity
Fully associative: any place
24
Q2: Block Identification?
Direct mapped: indexing - index, then 1
comparison
N-way set associative: limited search -
index the set, then N comparisons
Fully associative: full search - compare all cache entries
25
Q3: Replacement Policy on a Miss?
  • Easy for Direct Mapped
  • Set Associative or Fully Associative
  • Random: randomly select one of the blocks in the
    set
  • LRU (Least Recently Used): select the block in
    the set which has been
    unused for the longest time
  Miss rates for LRU vs. random replacement:

  Size    | 2-way LRU | 2-way Random | 4-way LRU | 4-way Random | 8-way LRU | 8-way Random
  16 KB   | 5.2       | 5.7          | 4.7       | 5.3          | 4.4       | 5.0
  64 KB   | 1.9       | 2.0          | 1.5       | 1.7          | 1.4       | 1.5
  256 KB  | 1.15      | 1.17         | 1.13      | 1.13         | 1.12      | 1.12

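The LRU policy can be sketched with an ordered dictionary (a software simplification; real caches typically only approximate LRU beyond 2-way):

```python
from collections import OrderedDict

class LRUSet:
    """One cache set with LRU replacement among `ways` blocks."""
    def __init__(self, ways):
        self.ways = ways
        self.tags = OrderedDict()          # least recently used tag first

    def access(self, tag):
        """Return True on hit; on a miss, evict the least recently used tag."""
        if tag in self.tags:
            self.tags.move_to_end(tag)     # mark as most recently used
            return True
        if len(self.tags) == self.ways:
            self.tags.popitem(last=False)  # evict the LRU tag
        self.tags[tag] = None
        return False

s = LRUSet(ways=2)
# Third access hits tag 1; tag 3 then evicts tag 2, so the final access misses
assert [s.access(t) for t in (1, 2, 1, 3, 2)] == [False, False, True, False, False]
```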
26
Q4: Write Policy?
  • Write through: the information is written both to
    the block in the cache and to the block in the
    lower-level memory
  • Write back: the information is written only to
    the block in the cache. The modified cache block
    is written to main memory only when it is
    replaced
  • Is the block clean or dirty?
  • Pros and cons of each?
  • WT: read misses cannot result in writes
  • WB: no memory writes for repeated writes to a block
  • WT is always combined with write buffers to avoid
    waiting for lower-level memory

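The write-back behavior, including the dirty bit, can be sketched for a single block (class and counter names are illustrative):

```python
class WriteBackBlock:
    """A single cache block under a write-back policy: writes set the
    dirty bit; memory is updated only when a dirty block is replaced."""
    def __init__(self):
        self.tag, self.data, self.dirty = None, None, False
        self.memory_writes = 0             # count of write-backs to memory

    def write(self, tag, data):
        self.tag, self.data, self.dirty = tag, data, True

    def replace(self, new_tag, new_data):
        if self.dirty:
            self.memory_writes += 1        # flush the modified block first
        self.tag, self.data, self.dirty = new_tag, new_data, False

b = WriteBackBlock()
b.write(7, "x"); b.write(7, "y")           # repeated writes: no memory traffic
assert b.memory_writes == 0
b.replace(9, "z")                          # replacement flushes the dirty block once
assert b.memory_writes == 1
```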
27
Cache Performance
  • CPU time = (CPU execution clock cycles +
    Memory stall clock cycles) x Cycle time
  • Note: memory hit time is included in execution
    cycles
  • Stalls due to cache misses:
  • Memory stall clock cycles = Read-stall clock
    cycles + Write-stall clock cycles
  • Read-stall clock cycles = Reads x Read miss
    rate x Read miss penalty
  • Write-stall clock cycles = Writes x Write miss
    rate x Write miss penalty
  • If read miss penalty = write miss penalty,
    Memory stall clock cycles = Memory accesses x
    Miss rate x Miss penalty

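These stall equations can be sketched numerically (the access counts and rates below are made-up illustration values):

```python
def memory_stall_cycles(reads, read_miss_rate, read_penalty,
                        writes, write_miss_rate, write_penalty):
    """Memory stall clock cycles = read-stall cycles + write-stall cycles."""
    read_stalls = reads * read_miss_rate * read_penalty
    write_stalls = writes * write_miss_rate * write_penalty
    return read_stalls + write_stalls

# With equal penalties this collapses to accesses x miss rate x penalty:
# 150 accesses x 0.05 x 20 cycles = 150 stall cycles
assert memory_stall_cycles(100, 0.05, 20, 50, 0.05, 20) == 150.0
```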
28
Cache Performance
  • CPU time = Instruction count x CPI x Cycle time
    = Inst count x Cycle time x
    (ideal CPI + Memory stalls/Inst +
    Other stalls/Inst)
  • Memory stalls/Inst =
    Instruction miss rate x Instruction
    miss penalty
    + Loads/Inst x Load miss rate x Load miss
    penalty
    + Stores/Inst x Store miss rate x Store
    miss penalty
  • Average Memory Access Time (AMAT)
    = Hit Time + (Miss Rate x Miss Penalty)
    = (Hit Rate x Hit Time) + (Miss Rate x Miss Time)

29
Example
  • Suppose a processor executes at
  • Clock rate = 200 MHz (5 ns per cycle)
  • Base CPI = 1.1
  • 50% arith/logic, 30% ld/st, 20% control
  • Suppose that 10% of memory operations get a
    50-cycle miss penalty
  • Suppose that 1% of instructions get the same
    miss penalty
  • CPI = Base CPI + average stalls per instruction
    = 1.1 (cycles/ins) + 0.30 (Data
    Mops/ins) x 0.10 (miss/Data Mop) x 50
    (cycle/miss) + 1 (Inst Mop/ins) x
    0.01 (miss/Inst Mop) x 50 (cycle/miss)
    = (1.1 + 1.5 + 0.5) cycle/ins = 3.1
  • AMAT = (1/1.3) x (1 + 0.01 x 50)
    + (0.3/1.3) x (1 + 0.1 x 50) = 2.54

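The example's arithmetic can be verified with a few lines of Python:

```python
base_cpi = 1.1
data_ops_per_inst = 0.30          # 30% of instructions are loads/stores
inst_fetches_per_inst = 1.0       # one instruction fetch per instruction
miss_penalty = 50                 # cycles

cpi = (base_cpi
       + data_ops_per_inst * 0.10 * miss_penalty        # data-access stalls
       + inst_fetches_per_inst * 0.01 * miss_penalty)   # instruction-fetch stalls
assert abs(cpi - 3.1) < 1e-9

accesses = inst_fetches_per_inst + data_ops_per_inst    # 1.3 accesses/inst
amat = ((inst_fetches_per_inst / accesses) * (1 + 0.01 * miss_penalty)
        + (data_ops_per_inst / accesses) * (1 + 0.10 * miss_penalty))
assert round(amat, 2) == 2.54
```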
30
Improving Cache Performance
CPU Time = IC x CT x (ideal CPI + memory
stalls)
Average Memory Access Time = Hit Time +
(Miss Rate x Miss Penalty)
= (Hit Rate x Hit Time) + (Miss Rate x Miss Time)
  • Options to reduce AMAT
  • 1. Reduce the miss rate,
  • 2. Reduce the miss penalty, or
  • 3. Reduce the time to hit in the cache

31
Reduce Misses Larger Block Size
Increasing block size also increases the miss
penalty!
32
Reduce Misses Higher Associativity
Increasing associativity also increases both hit
time and hardware cost!
33
Reducing Penalty Second-Level Cache
  • L2 Equations
  • AMAT = Hit Time_L1 + Miss Rate_L1 x Miss
    Penalty_L1
  • Miss Penalty_L1 = Hit Time_L2 + Miss Rate_L2 x
    Miss Penalty_L2
  • AMAT = Hit Time_L1 +
    Miss Rate_L1 x (Hit Time_L2 + Miss Rate_L2 x
    Miss Penalty_L2)

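These equations translate directly into a small function (the cycle counts in the example call are illustrative, not from the slides):

```python
def amat_two_level(hit_l1, miss_rate_l1, hit_l2, miss_rate_l2, penalty_l2):
    """AMAT = HitTime_L1 + MissRate_L1 x (HitTime_L2 + MissRate_L2 x MissPenalty_L2)."""
    miss_penalty_l1 = hit_l2 + miss_rate_l2 * penalty_l2  # L2 absorbs most L1 misses
    return hit_l1 + miss_rate_l1 * miss_penalty_l1

# 1-cycle L1 hit, 5% L1 misses; 10-cycle L2 hit, 20% L2 misses, 100-cycle memory
assert amat_two_level(1, 0.05, 10, 0.20, 100) == 2.5
```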
34
Designing the Memory System to Support Caches
  • Wide
  • CPU/Mux: 1 word; Mux/Cache, Bus, Memory: N words
  • Interleaved
  • CPU, Cache, Bus: 1 word
  • N memory modules
  • Simple
  • CPU, Cache, Bus, Memory: same width (32 bits)

35
Main Memory Performance
  • DRAM (Read/Write) Cycle Time >>
    DRAM (Read/Write) Access Time
  • DRAM (Read/Write) Cycle Time:
    how frequently can you initiate an access?
  • DRAM (Read/Write) Access Time:
    how quickly will you get what you want once you
    initiate an access?
  • DRAM bandwidth limitation

36
Increasing Bandwidth - Interleaving
Access Pattern without Interleaving
CPU
Memory
Memory Bank 0
Access Pattern with 4-way Interleaving
Memory Bank 1
CPU
Memory Bank 2
Memory Bank 3
Access Bank 1
Access Bank 0
Access Bank 2
Access Bank 3
We can Access Bank 0 again
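A rough timing model of this effect (the cycle and access times are illustrative, and the model assumes perfectly sequential, round-robin bank accesses):

```python
def total_access_time(n, cycle_time, access_time, banks=1):
    """Time to complete n sequential word accesses.
    A single bank can only start a new access every `cycle_time`;
    with `banks`-way interleaving, consecutive addresses hit different
    banks, so a new access can start every cycle_time / banks."""
    issue_interval = cycle_time / banks
    return (n - 1) * issue_interval + access_time

# 4 accesses, cycle time 8, access time 6 (arbitrary units):
assert total_access_time(4, 8, 6, banks=1) == 30   # one bank: accesses serialize
assert total_access_time(4, 8, 6, banks=4) == 12   # 4-way interleaving overlaps them
```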
37
Summary 1/2
  • The Principle of Locality
  • Programs likely access a relatively small
    portion of the address space at any instant of
    time
  • Temporal Locality: locality in time
  • Spatial Locality: locality in space
  • Three Major Categories of Cache Misses
  • Compulsory misses: sad facts of life. Example:
    cold-start misses
  • Conflict misses: increase cache size and/or
    associativity. Nightmare scenario: ping-pong
    effect!
  • Capacity misses: increase cache size
  • Cache Design Space
  • total size, block size, associativity
  • replacement policy
  • write-hit policy (write-through, write-back)
  • write-miss policy

38
Summary 2/2 The Cache Design Space
  • Several interacting dimensions
  • cache size
  • block size
  • associativity
  • replacement policy
  • write-through vs write-back
  • write allocation
  • The optimal choice is a compromise
  • depends on access characteristics
  • workload
  • use (I-cache, D-cache, TLB)
  • depends on technology / cost
  • Simplicity often wins

(Figure: the cache design space - axes for cache size, associativity,
and block size; moving a factor from "Less" to "More" can shift
performance between "Bad" and "Good")