CENG 450 Computer Systems and Architecture Lecture 16
Transcript and Presenter's Notes
1
CENG 450 Computer Systems and Architecture
Lecture 16
  • Amirali Baniasadi
  • amirali@ece.uvic.ca

2
The Motivation for Caches
  • Motivation
  • Large memories (DRAM) are slow
  • Small memories (SRAM) are fast
  • Make the average access time small by servicing
    most accesses from a small, fast memory
  • Reduce the bandwidth required of the large memory

3
The Principle of Locality
  • The Principle of Locality
  • Programs access a relatively small portion of the
    address space at any instant of time.
  • Example: 90% of time in 10% of the code
  • Two Different Types of Locality
  • Temporal Locality (Locality in Time): If an item
    is referenced, it will tend to be referenced
    again soon.
  • Spatial Locality (Locality in Space): If an item
    is referenced, items whose addresses are close by
    tend to be referenced soon.

4
The Simplest Cache: Direct Mapped Cache
[Figure: a 16-location memory (addresses 0x0 ... 0xF) mapped onto a
4-byte direct mapped cache (cache indexes 0 ... 3)]
  • Cache Index = (Block Address) MOD (# of blocks in
    cache)
  • Location 0 can be occupied by data from
  • Memory location 0, 4, 8, ... etc.
  • In general, any memory location whose 2 LSBs of
    the address are 0s
  • Address<1:0> => cache index
  • Which one should we place in the cache?
  • How can we tell which one is in the cache?
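The mapping rule above can be sketched in a few lines (a hypothetical helper, assuming the 4-block, one-byte-per-block cache of this slide):

```python
# Sketch: which cache index a memory location maps to in a 4-block
# direct mapped cache (assumption: 1-byte blocks, so block address
# equals byte address).
NUM_BLOCKS = 4

def cache_index(block_address: int) -> int:
    # Cache Index = (Block Address) MOD (# of blocks in cache)
    return block_address % NUM_BLOCKS

# Memory locations 0, 4, 8, C all compete for cache index 0:
print([cache_index(a) for a in (0x0, 0x4, 0x8, 0xC)])  # -> [0, 0, 0, 0]
```

Because many locations share one index, the cache must also answer the slide's second question ("which one is in the cache?"); that is what the tag on the next slide is for.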
5
Cache Tag and Cache Index
  • Assume a 32-bit memory (byte) address
  • A 2^N byte direct mapped cache
  • Cache Index: the lower N bits of the memory
    address
  • Cache Tag: the upper (32 - N) bits of the memory
    address

[Figure: bits <31:N> of the address form the Cache Tag (e.g. 0x50) and
bits <N-1:0> the Cache Index (e.g. 0x03); the tag is stored as part of
the cache state, alongside a Valid Bit, for each of the 2^N cache
bytes (Byte 0 ... Byte 2^N - 1)]
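The tag/index split is plain bit manipulation (a hypothetical helper; N = 10, i.e. a 1 KB cache, is an assumed size matching the slide's 0x50 / 0x03 example):

```python
# Sketch: split a 32-bit byte address into Cache Tag and Cache Index
# for a 2^N-byte direct mapped cache (assumption: N = 10).
N = 10

def split_address(addr: int):
    index = addr & ((1 << N) - 1)  # Cache Index: the lower N bits
    tag = addr >> N                # Cache Tag: the upper 32 - N bits
    return tag, index

tag, index = split_address((0x50 << N) | 0x03)
print(hex(tag), hex(index))  # -> 0x50 0x3
```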
6
Example: 1 KB Direct Mapped Cache with 32 B Blocks
  • For a 2^N byte cache
  • The uppermost (32 - N) bits are always the Cache
    Tag
  • The lowest M bits are the Byte Select (Block Size
    = 2^M)

[Figure: for the 1 KB cache with 32 B blocks, bits <31:10> are the
Cache Tag (e.g. 0x50), bits <9:5> the Cache Index (e.g. 0x01), and
bits <4:0> the Byte Select (e.g. 0x00); each of the 32 cache lines
(index 0 ... 31) stores a Valid Bit, a Cache Tag, and 32 data bytes
(Byte 0 ... Byte 1023 across the whole cache)]
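The field widths can be checked with a small decoder (a sketch; the positions follow from 32 B blocks, so 5 byte-select bits, and 1 KB / 32 B = 32 lines, so 5 index bits):

```python
# Sketch: decode an address for a 1 KB direct mapped cache with 32 B
# blocks: Byte Select = bits <4:0>, Cache Index = bits <9:5>,
# Cache Tag = bits <31:10>.
BLOCK_BITS = 5  # 32 B per block = 2^5
INDEX_BITS = 5  # 1 KB / 32 B = 32 lines = 2^5

def decode(addr: int):
    byte_select = addr & ((1 << BLOCK_BITS) - 1)
    index = (addr >> BLOCK_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (BLOCK_BITS + INDEX_BITS)
    return tag, index, byte_select

# The slide's example fields: tag 0x50, index 0x01, byte select 0x00
print(decode((0x50 << 10) | (0x01 << 5) | 0x00))  # -> (80, 1, 0)
```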
7
Block Size Tradeoff
  • In general, larger block sizes take advantage of
    spatial locality, BUT
  • Larger block size means larger miss penalty
  • Takes longer to fill up the block
  • If block size is too big relative to cache size,
    miss rate will go up
  • Average Access Time =
  • Hit Time x (1 - Miss Rate) + Miss Penalty x
    Miss Rate

[Figure: three plots against Block Size: Miss Rate first falls
(exploits spatial locality) then rises (fewer blocks compromises
temporal locality); Miss Penalty grows with block size; Average Access
Time therefore has a minimum, rising at large block sizes from the
increased miss penalty and miss rate]
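The formula above can be made concrete (a sketch with assumed numbers: 1-cycle hit time, 50-cycle miss penalty, 5% miss rate; these are illustrative, not from the slide):

```python
# Sketch of the slide's average access time formula.
def avg_access_time(hit_time, miss_penalty, miss_rate):
    return hit_time * (1 - miss_rate) + miss_penalty * miss_rate

# Assumed numbers: hit 1 cycle, penalty 50 cycles, miss rate 5%.
print(round(avg_access_time(1, 50, 0.05), 2))  # -> 3.45
```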
8
A Summary on Sources of Cache Misses
  • Compulsory (cold start, first reference): first
    access to a block
  • Cold fact of life: not a whole lot you can do
    about it
  • Conflict (collision):
  • Multiple memory locations mapped to the same
    cache location
  • Solution 1: increase cache size
  • Solution 2: increase associativity
  • Capacity:
  • Cache cannot contain all blocks accessed by the
    program
  • Solution: increase cache size
  • Invalidation: other process (e.g., I/O) updates
    memory

9
Conflict Misses
  • True: if an item is accessed, it is likely to be
    accessed again soon
  • But it is unlikely that it will be accessed again
    immediately!!!
  • The next access will likely be a miss again
  • Continually loading data into the cache but
    discarding (forcing out) it before it is used
    again
  • Worst nightmare of a cache designer: the Ping
    Pong Effect
  • Conflict Misses are misses caused by:
  • Different memory locations mapped to the same
    cache index
  • Solution 1: make the cache size bigger
  • Solution 2: multiple entries for the same Cache
    Index
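The ping pong effect can be demonstrated with a tiny simulation (a sketch, reusing the earlier 4-block direct mapped cache; `direct_mapped_misses` is a hypothetical helper):

```python
# Sketch: the ping pong effect in a 4-block direct mapped cache.
NUM_BLOCKS = 4

def direct_mapped_misses(trace):
    cache = [None] * NUM_BLOCKS          # one stored tag per block
    misses = 0
    for block_addr in trace:
        index = block_addr % NUM_BLOCKS  # Cache Index = addr MOD # blocks
        tag = block_addr // NUM_BLOCKS
        if cache[index] != tag:
            misses += 1
            cache[index] = tag           # force out the previous occupant
    return misses

# Blocks 0 and 4 both map to index 0, so alternating between them
# evicts the other block every time: every access is a miss.
print(direct_mapped_misses([0, 4] * 4))  # -> 8
```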

10
A Two-way Set Associative Cache
  • N-way set associative: N entries for each Cache
    Index
  • N direct mapped caches operate in parallel
  • Example: Two-way set associative cache
  • Cache Index selects a set from the cache
  • The two tags in the set are compared in parallel
  • Data is selected based on the tag comparison result

[Figure: the Cache Index selects a set; each way's Valid bit and Cache
Tag are compared with the address tag in parallel, the compare outputs
are ORed to form Hit, and a mux (Sel1/Sel0) selects the matching way's
Cache Block]
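A behavioral sketch of the two-way lookup (hypothetical helper `two_way_misses`; LRU replacement within a set is an assumption, the slide does not name a policy):

```python
# Sketch: a two-way set associative cache. Each Cache Index selects a
# set of two entries whose tags are compared in parallel.
NUM_SETS = 2

def two_way_misses(trace):
    sets = [[] for _ in range(NUM_SETS)]  # up to 2 tags per set, MRU first
    misses = 0
    for block_addr in trace:
        index = block_addr % NUM_SETS
        tag = block_addr // NUM_SETS
        ways = sets[index]
        if tag in ways:
            ways.remove(tag)              # hit: move tag to MRU position
        else:
            misses += 1
            if len(ways) == 2:
                ways.pop()                # evict the LRU way
        ways.insert(0, tag)
    return misses

# The ping pong trace from the direct mapped case now fits in one set,
# so only the two compulsory misses remain:
print(two_way_misses([0, 4] * 4))  # -> 2
```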
11
Disadvantage of Set Associative Cache
  • N-way Set Associative Cache versus Direct Mapped
    Cache
  • N comparators vs. 1
  • Extra MUX delay for the data
  • Data comes AFTER Hit/Miss
  • In a direct mapped cache, Cache Block is
    available BEFORE Hit/Miss
  • Possible to assume a hit and continue. Recover
    later if miss.

12
And Yet Another Extreme Example: Fully Associative
  • Fully Associative Cache: push the set
    associative idea to its limit!
  • Forget about the Cache Index
  • Compare the Cache Tags of all cache entries in
    parallel
  • Example: with 32 B blocks, we need N 27-bit
    comparators
  • By definition: Conflict Misses = 0 for a fully
    associative cache

[Figure: the address supplies only a 27-bit Cache Tag (bits <31:5>)
and a Byte Select (bits <4:0>, e.g. 0x01); every entry's Valid Bit and
Cache Tag are compared against the address tag in parallel]
13
Cache performance
14
Impact on Performance
  • Suppose a processor executes at
  • Clock Rate 1 GHz (1 ns per cycle), Ideal (no
    misses) CPI 1.1
  • 50% arith/logic, 30% ld/st, 20% control
  • Suppose that 10% of memory operations get a 100
    cycle miss penalty
  • Suppose that 1% of instructions get the same miss
    penalty

78% of the time the processor is stalled waiting for
memory!
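The 78% figure can be reconstructed from the bullets above (a sketch; it assumes the 100-cycle penalty applies both to the 10% of the 30% load/store instructions that miss and to the 1% of instruction fetches that miss):

```python
# Reconstructing the slide's stall fraction.
ideal_cpi = 1.1
mem_stalls = 0.30 * 0.10 * 100    # 30% ld/st, 10% miss, 100-cycle penalty
fetch_stalls = 1.00 * 0.01 * 100  # 1% of instruction fetches miss
cpi = ideal_cpi + mem_stalls + fetch_stalls

print(round(cpi, 2))                                  # -> 5.1
print(round((mem_stalls + fetch_stalls) / cpi, 2))    # -> 0.78
```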
15
Example: Harvard Architecture
  • Unified vs. Separate I&D (Harvard)
  • 16 KB I&D: Inst miss rate = 0.64%, Data miss
    rate = 6.47%
  • 32 KB unified: Aggregate miss rate = 1.99%
  • Which is better (ignore L2 cache)?
  • Assume 33% data ops => 75% of accesses are from
    instructions (1.0/1.33)
  • hit time = 1, miss time = 50
  • Note that a data hit has 1 extra stall for the
    unified cache (only one port)
  • AMAT(Harvard) = 75% x (1 + 0.64% x 50) + 25% x
    (1 + 6.47% x 50) = 2.05
  • AMAT(Unified) = 75% x (1 + 1.99% x 50) + 25% x
    (1 + 1 + 1.99% x 50) = 2.24
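The two AMAT lines can be recomputed as a quick check (a sketch using the slide's numbers; the unified cache charges data accesses one extra cycle for the structural hazard on its single port):

```python
# Recomputing the slide's Harvard vs. unified AMAT comparison.
miss_penalty = 50

# Separate I&D (Harvard): 75% instruction accesses, 25% data accesses.
amat_harvard = (0.75 * (1 + 0.0064 * miss_penalty)
                + 0.25 * (1 + 0.0647 * miss_penalty))

# Unified: one aggregate miss rate; data accesses stall 1 extra cycle.
amat_unified = (0.75 * (1 + 0.0199 * miss_penalty)
                + 0.25 * (1 + 1 + 0.0199 * miss_penalty))

print(round(amat_harvard, 2))  # slide reports 2.05
print(round(amat_unified, 2))  # slide reports 2.24
```

The Harvard split wins despite its much worse data miss rate, because it avoids the unified cache's port conflict on every data hit.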

16
IBM POWER4 Memory Hierarchy
  • L1 (Instr.): 64 KB, direct mapped
  • L1 (Data): 32 KB, 2-way, FIFO, write allocate
  • L1: 4 cycles to load to a floating point register;
    128-byte blocks divided into 32-byte sectors
  • L2 (Instr. + Data): 1440 KB, 3-way, pseudo-LRU
    (shared by two processors); 14 cycles to load to a
    floating point register; 128-byte blocks
  • L3 (Instr. + Data): 128 MB, 8-way (shared by two
    processors); about 340 cycles; 512-byte blocks
    divided into 128-byte sectors
17
Intel Itanium Processor
  • L1 (Data): 16 KB, 4-way, dual-ported, write through
  • L1 (Instr.): 16 KB, 4-way
  • L1: 32-byte blocks; 2 cycles
  • L2 (Instr. + Data): 96 KB, 6-way; 64-byte blocks;
    write allocate; 12 cycles
  • L3: 4 MB (on package, off chip); 64-byte blocks;
    128-bit bus at 800 MHz (12.8 GB/s); 20 cycles
18
3rd Generation Itanium
  • 1.5 GHz
  • 410 million transistors
  • 6MB 24-way set associative L3 cache
  • 6-level copper interconnect, 0.13 micron
  • 130W (i.e. lasts 17s on an AA NiCd)

19
Summary
  • The Principle of Locality
  • Programs access a relatively small portion of the
    address space at any instant of time.
  • Temporal Locality: Locality in Time
  • Spatial Locality: Locality in Space
  • Three Major Categories of Cache Misses
  • Compulsory Misses: sad facts of life. Example:
    cold start misses.
  • Conflict Misses: increase cache size and/or
    associativity. Nightmare Scenario: ping pong
    effect!
  • Capacity Misses: increase cache size
  • Write Policy
  • Write Through: needs a write buffer. Nightmare:
    WB saturation
  • Write Back: control can be complex
  • Cache Performance