CS1104: Computer Organisation (http://www.comp.nus.edu.sg/cs1104) - PowerPoint PPT Presentation

1
CS1104 Computer Organisation
http://www.comp.nus.edu.sg/cs1104
  • School of Computing
  • National University of Singapore

2
Part II Lecture 9: Cache
  • Direct Mapped Cache
  • Addressing Cache Tag, Index, Offset Fields
  • Accessing Data in Direct Mapped Cache
  • Block Size Trade-off
  • Type of Cache Misses
  • Fully Associative Cache
  • Multi-Level Cache Hierarchy

3
Part II Lecture 9: Cache
  • Reading
  • Section 5.5.1 of Chapter 8 of the textbook, which
    corresponds to Chapter 5 in Computer Organization by
    Hamacher, Vranesic and Zaky.

4
Recap: Current Memory Hierarchy

Technology    Regs     SRAM    SRAM    DRAM        Disk
Speed (ns)    0.5      2       6       100         10,000,000
Size (MB)     0.0005   0.05    1-4     100-1000    100,000
Cost ($/MB)   --       100     30      1           0.05
5
Another View of Memory Hierarchy
6
Cache: 1st Level of Memory Hierarchy
  • How do you know if something is in the cache?
  • How to find it if it is in the cache?
  • In a direct mapped cache, each memory address is
    associated with one possible block (also called
    line) within the cache.
  • Therefore, we only need to look in a single
    location in the cache for the data if it exists
    in the cache.

7
Simplest Cache: Direct Mapped
(Figure: a 4-byte direct-mapped cache; memory locations 0-F map to cache locations 0-3 via the cache index.)
  • Cache location 0 can be occupied by data from
    memory locations 0, 4, 8, ...
  • In general, any memory location whose 2 rightmost
    bits of the address are 0s will go into cache
    location 0.
  • Cache index = last 2 bits of address (i.e.
    address AND 00...011), as in the sketch below.
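
A minimal sketch (mine, not from the slides) of the index computation for this 4-byte cache: masking an address with 00...011 keeps its last 2 bits, which give the cache location it maps to.

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        /* 4-entry direct-mapped cache with 1-byte blocks:
           the cache index is just the last 2 bits of the address. */
        const uint32_t INDEX_MASK = 0x3;             /* binary 00...011 */
        const uint32_t addrs[] = { 0x0, 0x4, 0x8, 0x5, 0xF };

        for (int i = 0; i < 5; i++) {
            printf("address 0x%X -> cache location %u\n",
                   (unsigned)addrs[i], (unsigned)(addrs[i] & INDEX_MASK));
        }
        return 0;
    }

Addresses 0x0, 0x4 and 0x8 all map to location 0, while 0x5 maps to 1 and 0xF maps to 3.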


8
Tag, Index, Offset Fields
  • Which memory block is in the cache? What if the
    block size is > 1 byte?
  • Divide the memory address into 3 portions: tag,
    index, and byte offset within the block.
  • The index tells where in the cache to look, the
    offset tells which byte in the block is the start of
    the desired data, and the tag tells whether the data
    in the cache corresponds to the memory address being
    looked for.

9
Tag, Index, Offset Fields (2)
  • Assume
  • 32-bit Memory Address
  • Cache size = 2^N bytes
  • Block (line) size = 2^M bytes
  • Then
  • The leftmost (32 - N) bits are the Cache Tag.
  • The rightmost M bits are the Byte Offset.
  • Remaining bits are the Cache Index.

10
Tag, Index, Offset Fields (3)
  • Example: A 16KB direct-mapped cache with blocks
    of 4 words each. Determine the size of the tag,
    index and offset fields, assuming a 32-bit
    architecture.
  • Offset
  • To identify the correct byte within a block.
  • A block contains 4 words. Each word contains 4
    bytes (because of the 32-bit architecture).
  • Therefore a block contains 16 bytes = 2^4 bytes.
  • Hence we need 4 bits for the offset field.

11
Tag, Index, Offset Fields (4)
  • Index
  • To identify the correct block/line in the cache.
  • The cache contains 16KB = 2^14 bytes.
  • A block contains 16 bytes = 2^4 bytes.
  • Therefore the cache contains 2^14 / 2^4 = 2^10
    blocks.
  • Hence we need 10 bits for the index field.

12
Tag, Index, Offset Fields (5)
  • Tag
  • To identify one of the blocks from main memory
    that is mapped into each block in the cache.
  • Tag size = address size - offset size - index
    size = 32 - 4 - 10 bits = 18 bits.
  • Verify: Main memory contains 2^32 / 2^4 = 2^28 blocks,
    and the cache contains 2^10 blocks. Therefore, there are
    2^28 / 2^10 = 2^18 blocks in memory that can be
    mapped to the same block in the cache.
  • Hence we need 18 bits for the tag field (a small
    sketch follows).
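
A small sketch (mine, not part of the slides) that reproduces this worked example in code: it derives the offset, index and tag widths for the 16KB cache with 16-byte blocks and 32-bit addresses, then splits a sample address into the three fields.

    #include <stdio.h>
    #include <stdint.h>

    /* log2 for exact powers of two */
    static unsigned log2u(uint32_t x) {
        unsigned n = 0;
        while (x > 1) { x >>= 1; n++; }
        return n;
    }

    int main(void) {
        const uint32_t ADDR_BITS  = 32;
        const uint32_t CACHE_SIZE = 16 * 1024;  /* 16 KB   = 2^14 bytes */
        const uint32_t BLOCK_SIZE = 16;         /* 4 words = 2^4 bytes  */

        unsigned offset_bits = log2u(BLOCK_SIZE);                    /* 4  */
        unsigned index_bits  = log2u(CACHE_SIZE / BLOCK_SIZE);       /* 10 */
        unsigned tag_bits    = ADDR_BITS - index_bits - offset_bits; /* 18 */
        printf("offset=%u index=%u tag=%u bits\n",
               offset_bits, index_bits, tag_bits);

        /* Split a sample address into tag | index | offset. */
        uint32_t addr   = 0x00000014;            /* index 1, byte offset 4 */
        uint32_t offset = addr & (BLOCK_SIZE - 1);
        uint32_t index  = (addr >> offset_bits) & ((CACHE_SIZE / BLOCK_SIZE) - 1);
        uint32_t tag    = addr >> (offset_bits + index_bits);
        printf("addr 0x%08X -> tag=%u index=%u offset=%u\n",
               (unsigned)addr, (unsigned)tag, (unsigned)index, (unsigned)offset);
        return 0;
    }

The sample address 0x14 is the address used in Example 1 later: tag 0, index 1, offset 4.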

13
Direct Mapped Cache
A 64-KB cache using 4-word (16-byte) blocks.
(Figure: address breakdown into tag, index and byte-offset bit positions; 1 word = 4 bytes.)
14
Direct Mapped Cache: Accessing Data
  • Let's go through accessing some data in a direct
    mapped, 16KB cache
  • 16-byte blocks x 1024 cache blocks.
  • Examples: 4 addresses divided (for convenience)
    into Tag, Index, Byte Offset fields.

15
16 KB Direct Mapped Cache, 16B Blocks
  • Valid bit: to check if the block is valid (see the
    lookup sketch below).
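
For illustration only (names and structure are my own, not the slides' hardware diagram), a software sketch of a lookup in this 16KB direct-mapped cache: each of the 1024 entries holds a valid bit, an 18-bit tag and a 16-byte block, and a hit requires the valid bit to be set and the tags to match.

    #include <stdbool.h>
    #include <stdint.h>

    #define NUM_BLOCKS 1024   /* 16 KB cache / 16-byte blocks */
    #define BLOCK_SIZE 16

    typedef struct {
        bool     valid;
        uint32_t tag;                  /* 18-bit tag kept in a 32-bit field */
        uint8_t  data[BLOCK_SIZE];
    } CacheLine;

    static CacheLine cache[NUM_BLOCKS];

    /* Returns true on a hit and copies the requested byte into *out. */
    bool cache_read_byte(uint32_t addr, uint8_t *out) {
        uint32_t offset = addr & (BLOCK_SIZE - 1);        /* last 4 bits       */
        uint32_t index  = (addr >> 4) & (NUM_BLOCKS - 1); /* next 10 bits      */
        uint32_t tag    = addr >> 14;                     /* remaining 18 bits */

        CacheLine *line = &cache[index];
        if (line->valid && line->tag == tag) {            /* valid AND tag match */
            *out = line->data[offset];
            return true;                                  /* hit  */
        }
        return false;                                     /* miss: fetch block from memory */
    }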

16
Address: 000000000000000000 0000000001 0100 (Tag, Index, Offset)
Example 1
So we read block 1 (0000000001)
17
Address 000000000000000000 0000000001 0100
Example 1
18
Address 000000000000000000 0000000001 0100
Example 1
19
Address 000000000000000000 0000000001 1100
Example 2
20
Address 000000000000000000 0000000011 0100
Example 3
21
Address 000000000000000010 0000000001 1000
Example 4
22
Block Size Trade-off
  • In general, larger block size takes advantage of
    spatial locality, but
  • Larger block size also means larger miss penalty
    (takes longer time to fill block)
  • If block size is too big relative to cache size,
    miss rate will go up (too few cache blocks)
  • In general, minimize average access time
  • (Hit time x Hit rate) + (Miss penalty x Miss rate)

23
Extreme Case: Single Big Block!
  • Cache size = 4 bytes, Block size = 4 bytes
  • Only one entry in the cache!
  • If an item is accessed, it is likely to be accessed
    again soon
  • But it is unlikely to be accessed again immediately!
  • The next access will likely be a miss again
  • Continually loading data into the cache but
    discarding it (forced out) before it is used
    again.
  • Nightmare for the cache designer: the Ping-Pong Effect.

24
Block Size Trade-off (2)
25
Types of Cache Misses
  • Compulsory Misses
  • occur when a program is first started
  • the cache does not contain any of that program's data
    yet, so misses are bound to occur
  • cannot be avoided easily, so we won't focus on these
    in this course

26
Types of Cache Misses (2)
  • Conflict Misses
  • miss that occurs because two distinct memory
    addresses map to the same cache location
  • two blocks (which happen to map to the same
    location) can keep overwriting each other
  • big problem in direct-mapped caches
  • how do we lessen the effect of these?

27
Dealing with Conflict Misses
  • Solution 1: Make the cache size bigger
  • fails at some point
  • Solution 2: Multiple distinct blocks can fit in
    the same Cache Index?

28
Fully Associative Cache
  • Memory address fields
  • Tag: same as before
  • Offset: same as before
  • Index: non-existent
  • What does this mean?
  • no rows: any block can go anywhere in the cache
  • must compare with all tags in the entire cache to see
    if the data is there

29
Fully Associative Cache (2)
  • Fully Associative Cache (e.g., 32 B block)
  • Compare tags in parallel

No Conflict Misses (since data can go anywhere); a lookup sketch follows.
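
A software sketch (mine; the number of entries is an assumed example, and real hardware compares all tags in parallel rather than in a loop) of a fully associative lookup with 32-byte blocks: there is no index field, so every entry's tag must be checked.

    #include <stdbool.h>
    #include <stdint.h>

    #define NUM_ENTRIES 64    /* assumed size for the example */
    #define BLOCK_SIZE  32    /* 32-byte blocks, as on the slide */

    typedef struct {
        bool     valid;
        uint32_t tag;
        uint8_t  data[BLOCK_SIZE];
    } Entry;

    static Entry cache[NUM_ENTRIES];

    /* No index field: the tag is the address without its offset bits. */
    bool fa_read_byte(uint32_t addr, uint8_t *out) {
        uint32_t offset = addr & (BLOCK_SIZE - 1);
        uint32_t tag    = addr / BLOCK_SIZE;

        for (int i = 0; i < NUM_ENTRIES; i++) {   /* hardware: all compares in parallel */
            if (cache[i].valid && cache[i].tag == tag) {
                *out = cache[i].data[offset];
                return true;                      /* hit */
            }
        }
        return false;                             /* miss */
    }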
30
Third Type of Cache Miss
  • Capacity Misses
  • miss that occurs because the cache has a limited
    size
  • miss that would not occur if we increase the size
    of the cache
  • This is the primary type of miss for Fully
    Associative caches.

31
Fully Associative Cache (3)
  • Drawbacks of a Fully Associative Cache
  • need a hardware comparator for every single entry:
    if we have 64KB of data in the cache with 4B
    entries, we need 16K comparators: infeasible
  • Set-Associative Cache: combines the features of the
    direct-mapped cache and the fully associative cache.

32
Cache Replacement Algorithms
  • In a fully associative cache, when the cache is
    full and a new block is to be loaded into the
    cache, which block should it replace? An
    algorithm is needed.
  • LRU (Least Recently Used) algorithm: replace the
    block that was accessed least recently.
  • LFU (Least Frequently Used) algorithm: replace
    the block that is accessed least frequently.

33
Cache Replacement Algorithms (2)
  • Replace-Oldest-Block algorithm: replace the block
    that has been in the cache longest.
  • Random algorithm: replace any block at random (a
    small LRU sketch follows).
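
A minimal LRU sketch (mine, with assumed names; real caches track recency with far less state in hardware): each entry records when it was last used, and the entry with the oldest timestamp becomes the victim when the cache is full.

    #include <stdint.h>

    #define NUM_ENTRIES 8     /* assumed size for the example */

    typedef struct {
        int      valid;
        uint32_t tag;
        uint64_t last_used;   /* logical time of the most recent access */
    } Entry;

    static Entry    cache[NUM_ENTRIES];
    static uint64_t now;      /* incremented on every cache access */

    /* Choose the victim: a free entry if any, else the least recently used one. */
    int lru_victim(void) {
        int      victim = 0;
        uint64_t oldest = cache[0].last_used;
        for (int i = 0; i < NUM_ENTRIES; i++) {
            if (!cache[i].valid)
                return i;                 /* empty slot, no replacement needed */
            if (cache[i].last_used < oldest) {
                oldest = cache[i].last_used;
                victim = i;
            }
        }
        return victim;
    }

    /* On every access (hit or fill), refresh the entry's recency. */
    void lru_touch(int i) {
        cache[i].last_used = ++now;
    }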

34
Improving Caches
  • In general, minimize average access time
  • (Hit time x Hit rate) + (Miss penalty x Miss
    rate)
  • So far, we have looked at improving the Hit Rate
  • Larger block size
  • Larger cache
  • Higher associativity
  • What about Miss Penalty?

35
Improving Miss Penalty
  • When caches started becoming popular, the Miss
    Penalty was about 10 processor clock cycles.
  • Today: a 500 MHz processor (2 nanoseconds per clock
    cycle) and 200 ns to go to DRAM means about 100 processor
    clock cycles!
  • Solution: Place another cache between memory and
    the processor cache: a Second Level (L2) Cache.

36
Multi-Level Cache Hierarchy
  • We consider the L2 hit and miss times to include
    the cost of not finding the data in the L1 cache.
  • Similarly, the L2 cache hit rate is only for
    accesses which actually make it to the L2 cache.

37
Multi-Level Cache Hierarchy: Calculations for L1 Cache
  • Access time = L1 hit time x L1 hit rate + L1 miss
    penalty x L1 miss rate
  • We simply calculate the L1 miss penalty as being
    the access time for the L2 cache.
  • Access time = L1 hit time x L1 hit rate + (L2 hit
    time x L2 hit rate + L2 miss penalty x L2 miss
    rate) x L1 miss rate.

38
Multi-Level Cache Hierarchy: Calculations for L1 Cache (2)
  • Assumptions
  • L1 hit time = 1 cycle, L1 hit rate = 90%
  • L2 hit time (also the L1 miss penalty) = 4 cycles, L2
    miss penalty = 100 cycles, L2 hit rate = 90%
  • Access time = L1 hit time x L1 hit rate + (L2 hit
    time x L2 hit rate + L2 miss penalty x (1 - L2
    hit rate)) x L1 miss rate
  • = 1 x 0.9 + (4 x 0.9 + 100 x 0.1) x (1 - 0.9)
    = 0.9 + 13.6 x 0.1 = 2.26 clock cycles (see the
    sketch below)
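
A small sketch (mine) that reproduces the slide's arithmetic for the two-level access time, using the same assumed numbers; it also prints the no-L2 figure used on the next slide.

    #include <stdio.h>

    int main(void) {
        /* Assumptions from the slide */
        double l1_hit_time = 1.0;    /* cycles */
        double l1_hit_rate = 0.9;
        double l2_hit_time = 4.0;    /* also the L1 miss penalty */
        double l2_hit_rate = 0.9;
        double l2_miss_pen = 100.0;  /* cycles to reach main memory */

        double l1_miss_rate = 1.0 - l1_hit_rate;
        double l1_miss_pen  = l2_hit_time * l2_hit_rate
                            + l2_miss_pen * (1.0 - l2_hit_rate);   /* 13.6  */

        double access_time  = l1_hit_time * l1_hit_rate
                            + l1_miss_pen * l1_miss_rate;          /* 2.26  */
        printf("Average access time with L2    = %.2f cycles\n", access_time);

        /* For comparison: no L2 cache, L1 miss penalty = 100 cycles. */
        printf("Average access time without L2 = %.2f cycles\n",
               l1_hit_time * l1_hit_rate + 100.0 * l1_miss_rate);  /* 10.90 */
        return 0;
    }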

39
What Would It Be Without L2 Cache?
  • Assume that the L1 miss penalty would be 100
    clock cycles
  • 1 x 0.9 + 100 x 0.1
  • = 10.9 clock cycles vs. 2.26 with L2
  • So we gain a benefit from having the second, larger
    cache before main memory.
  • Today's L1 cache size: 16 KB-64 KB; an L2 cache may
    be 512 KB to 4096 KB.

40
Conclusion
  • Tag, index, offset: used to find matching data, support
    larger blocks, and reduce misses
  • Where in the cache? Direct Mapped Cache
  • Conflict Misses occur if memory addresses compete
  • Fully Associative: lets memory data be in any
    block, so no Conflict Misses.
  • Set Associative: a compromise, with simpler hardware
    than Fully Associative and fewer misses than Direct
    Mapped.
  • LRU: use history to predict replacement.
  • Improving miss penalty? Add an L2 cache.

41
Virtual Memory
  • If the Principle of Locality allows caches to offer
    (usually) the speed of cache memory with the size of DRAM
    memory, then why not use it at the next level to give the
    speed of DRAM memory with the size of disk memory?
  • This is called Virtual Memory.
  • It also allows the OS to share memory and protect programs
    from each other.
  • Today, it is more important for protection than as just
    another level of the memory hierarchy.
  • Historically, it predates caches.

42
End of file