Cache Memories

1
Cache Memories
  • The effectiveness of a cache is based on a
    property of computer programs called locality
    of reference.
  • Most of a program's time is spent in loops or
    in procedures that are called repeatedly; the
    remainder of the program is accessed
    infrequently.
  • Temporal locality: a recently executed
    instruction is likely to be executed again soon.
  • Spatial locality: instructions in close
    proximity to a recently executed instruction
    are likely to be executed soon.

2
Cache Memories
  • Based on locality of reference
  • Temporal
  • Recently executed instructions are likely to be
    executed again soon.
  • Spatial
  • Instructions in close proximity to a recently
    executed instruction (with respect to address)
    are also likely to be executed soon.
  • Cache block: a set of contiguous address
    locations (cache block = cache line)

3
Conceptual Operation of Cache
  • Memory control circuitry is designed to take
    advantage of locality of reference.
  • Temporal
  • Whenever an item of information (instruction or
    data) is first needed, it is brought into the
    cache, where it will hopefully remain until it
    is needed again.
  • Spatial
  • Instead of fetching just one item from main
    memory into the cache, it is useful to fetch
    several items that reside at adjacent addresses
    as well.
  • A set of contiguous addresses is called a block
    (cache block or cache line).

4
Cache Memories
  • Running example: a cache of 128 blocks of 16
    words each (a total of 2048, or 2K, words).
  • Main memory is addressable by a 16-bit address
    bus: 64K words, viewed as 4K blocks of 16 words
    each (the numbers are checked in the sketch
    below).
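
A quick arithmetic check of this geometry, as a minimal C sketch (the
5/7/4 bit split anticipates the direct-mapping slides that follow):

    #include <stdio.h>

    int main(void) {
        int words_per_block = 16;          /* -> 4-bit word field  */
        int cache_blocks    = 128;         /* -> 7-bit block field */
        int memory_words    = 1 << 16;     /* 16-bit address space */

        /* 128 blocks x 16 words = 2048 (2K) words of cache */
        printf("cache size:  %d words\n", cache_blocks * words_per_block);
        /* 64K words / 16 words per block = 4096 (4K) memory blocks */
        printf("main memory: %d blocks\n", memory_words / words_per_block);
        /* 16 address bits = 5 (tag) + 7 (block) + 4 (word) */
        return 0;
    }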

5
  • Write-through protocol
  • Cache and main memory are updated
    simultaneously.
  • Write-back protocol
  • Update only the cache and mark the block with an
    associated flag bit (the dirty or modified bit).
  • Main memory is updated later, when the block
    containing the marked word is removed from the
    cache to make room for a new block. (Both
    write-hit cases are sketched below.)
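
A minimal sketch of the two write-hit cases in C; the structure and
function names are illustrative, not from the slides:

    #include <stdint.h>

    #define WORDS_PER_BLOCK 16

    static uint16_t main_mem[1 << 16];     /* simulated 64K-word memory */

    struct cache_line {
        int      valid;
        int      dirty;                    /* used only by write-back */
        uint16_t tag;
        uint16_t data[WORDS_PER_BLOCK];
    };

    /* Write-through hit: cache and main memory updated together. */
    void write_through_hit(struct cache_line *line, uint16_t addr,
                           uint16_t value) {
        line->data[addr % WORDS_PER_BLOCK] = value;
        main_mem[addr] = value;            /* memory always up to date */
    }

    /* Write-back hit: only the cache is updated; the dirty bit defers
       the memory update until the block is evicted. */
    void write_back_hit(struct cache_line *line, uint16_t addr,
                        uint16_t value) {
        line->data[addr % WORDS_PER_BLOCK] = value;
        line->dirty = 1;
    }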

6
Write Protocols
  • Write-through
  • Simpler, but results in unnecessary write
    operations in main memory when a cache word is
    updated several times during its cache
    residency.
  • Write-back
  • Can also result in unnecessary write operations,
    because when a cache block is written back to
    memory all words of the block are written back,
    even if only a single word was changed while the
    block was in the cache.

7
Mapping Algorithms
  • The processor does not need to know explicitly
    that there is a cache.
  • On each read or write operation, the cache
    control circuitry determines whether the
    requested word currently exists in the cache
    (a hit).
  • On a read hit, main memory is not involved. For
    write operations, the system can use either the
    write-through protocol or the write-back
    protocol.

8
Mapping Functions
  • Specification of the correspondence between
    main memory blocks and those in the cache.
  • Hit or miss
  • Write-through protocol
  • Write-back protocol (uses the dirty bit)
  • Read miss
  • Load-through (early restart) on a read miss
  • Write miss

9
Read Protocols
  • Read miss
  • The addressed word is not in the cache.
  • The block of words containing the requested
    word is copied from main memory into the cache.
  • After the entire block is written to the cache,
    the particular word is forwarded to the
    processor.
  • Alternatively, the word may be sent to the
    processor as soon as it is read from main memory
    (load-through or early restart).
  • This reduces the processor's wait time but
    requires more complex circuitry (see the sketch
    below).
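
A minimal C sketch of read-miss handling, with load-through as an
option (the names and the simulated memory are illustrative
assumptions):

    #include <stdint.h>
    #include <stdio.h>

    #define WORDS_PER_BLOCK 16

    static uint16_t main_mem[1 << 16];     /* simulated 64K-word memory */

    static void deliver_to_processor(uint16_t word) {
        printf("processor receives %u\n", (unsigned)word);
    }

    /* On a read miss, the whole block containing miss_addr is copied
       into the cache.  With load-through (early restart) the requested
       word is forwarded as soon as it is read; otherwise the processor
       waits for the full block transfer. */
    void handle_read_miss(uint16_t block_buf[WORDS_PER_BLOCK],
                          uint16_t miss_addr, int load_through) {
        unsigned base = miss_addr & ~(WORDS_PER_BLOCK - 1u);
        for (unsigned i = 0; i < WORDS_PER_BLOCK; i++) {
            block_buf[i] = main_mem[base + i];
            if (load_through && base + i == miss_addr)
                deliver_to_processor(block_buf[i]);
        }
        if (!load_through)
            deliver_to_processor(block_buf[miss_addr % WORDS_PER_BLOCK]);
    }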

10
Write Miss
  • If the addressed word is not in the cache for a
    write operation, a write miss occurs.
  • Write-through
  • The information is written directly into main
    memory.
  • Write-back
  • The block containing the word is brought into
    the cache; then the desired word in the cache is
    overwritten with the new information. (Both
    cases are sketched below.)
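
The two write-miss cases, continuing the same illustrative C sketch
(write-through without allocation; write-back with allocation):

    #include <stdint.h>
    #include <string.h>

    #define WORDS_PER_BLOCK 16

    static uint16_t main_mem[1 << 16];

    struct cache_line {
        int      valid, dirty;
        uint16_t tag;
        uint16_t data[WORDS_PER_BLOCK];
    };

    /* Write-through miss: the word goes straight to main memory; the
       block is not brought into the cache. */
    void write_miss_through(uint16_t addr, uint16_t value) {
        main_mem[addr] = value;
    }

    /* Write-back miss: bring the block into the cache, then overwrite
       the desired word there and mark the line dirty. */
    void write_miss_back(struct cache_line *line, uint16_t addr,
                         uint16_t value, uint16_t tag) {
        unsigned base = addr & ~(WORDS_PER_BLOCK - 1u);
        memcpy(line->data, &main_mem[base], sizeof line->data);
        line->valid = 1;
        line->tag   = tag;
        line->data[addr % WORDS_PER_BLOCK] = value;
        line->dirty = 1;
    }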

11
Mapping Functions
[Figure: the cache, drawn as Block 0 through Block 127, each block with
an associated tag. The cache consists of 128 blocks of 16 words each,
a total of 2048 (2K) words.]
12
Main Memory
[Figure: main memory, drawn as Block 0 through Block 4095. Main memory
has 64K words, viewed as 4K blocks of 16 words each.]
Main memory address (16 bits): Tag (5) | Block (7) | Word (4)
13
Direct Mapping
  • Block j of main memory maps to block (j modulo
    128) of the cache.
  • Main memory blocks 0, 128, 256, ... map to
    block 0 of the cache.
  • Blocks 1, 129, 257, ... map to block 1.
  • Contention can arise for a position even if the
    cache is not full.
  • Contention is resolved by allowing the new block
    to overwrite the currently resident block.

14
Placement of block in Cache
  • Direct mapping is easy to implement but not
    very flexible.
  • The position is determined from the memory
    address:
  • The low-order 4 bits select one of the 16 words
    in a block.
  • When a new block enters the cache, the 7-bit
    block field determines its cache position.
  • The 5 high-order bits are stored in the block's
    tag. They identify which of the 32 memory blocks
    mapped to this position is currently resident
    (see the field-extraction sketch below).

Main memory address (16 bits): Tag (5) | Block (7) | Word (4)
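
A minimal C sketch of extracting these fields for direct mapping (the
example address is chosen arbitrarily):

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        uint16_t addr  = 0xABCD;               /* arbitrary 16-bit address  */
        unsigned word  =  addr        & 0xF;   /* low 4 bits: word in block */
        unsigned block = (addr >> 4)  & 0x7F;  /* next 7 bits: cache block  */
        unsigned tag   =  addr >> 11;          /* high 5 bits: tag          */
        printf("word %u, cache block %u, tag %u\n", word, block, tag);

        /* Equivalently, memory block j maps to cache block j mod 128: */
        unsigned mem_block = addr >> 4;        /* 12-bit memory block number */
        printf("memory block %u -> cache block %u\n",
               mem_block, mem_block % 128);
        return 0;
    }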
15
Associative Mapping
  • Much more flexible, but higher cost (must
    compare all 128 tag patterns to determine
    whether a given block is in the cache).
  • All tags must be searched in parallel (sketched
    below).
  • A main memory block can be placed into any
    cache block position.
  • An existing block needs to be ejected only if
    the cache is full.

Main memory address (16 bits): Tag (12) | Word (4)
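
A minimal C sketch of an associative lookup; the software loop stands
in for what the hardware does in parallel with 128 comparators:

    #include <stdint.h>

    #define NUM_BLOCKS 128

    struct cache_line {
        int      valid;
        uint16_t tag;                   /* 12-bit tag in this scheme */
    };

    /* Compare the address tag against every resident block's tag;
       return the block position on a hit, -1 on a miss. */
    int assoc_lookup(const struct cache_line cache[NUM_BLOCKS],
                     uint16_t addr) {
        uint16_t tag = addr >> 4;       /* drop the 4-bit word field */
        for (int i = 0; i < NUM_BLOCKS; i++)
            if (cache[i].valid && cache[i].tag == tag)
                return i;
        return -1;
    }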
16
Set Associative Mapping
  • The blocks of the cache are grouped into sets.
  • A block of main memory can reside in any block
    of one specific set.
  • This reduces the contention problem of direct
    mapping, and reduces the hardware needed for
    searching tags compared with associative
    mapping.
  • A cache with k blocks per set is a k-way
    set-associative cache. (With 128 blocks grouped
    into 64 sets of 2, the address splits as shown
    below.)

Main memory address (16 bits): Tag (6) | Set (6) | Word (4)
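
A minimal C sketch for the 2-way case (128 blocks grouped into 64
sets of 2), matching the 6/6/4 field split above:

    #include <stdint.h>

    #define NUM_SETS 64
    #define WAYS      2

    struct cache_line {
        int      valid;
        uint16_t tag;                    /* 6-bit tag in this scheme */
    };

    /* The set field selects one set; only that set's tags need to be
       compared (in parallel, in hardware).  Returns the way on a hit,
       -1 on a miss. */
    int set_assoc_lookup(const struct cache_line cache[NUM_SETS][WAYS],
                         uint16_t addr) {
        unsigned set = (addr >> 4) & 0x3F;   /* 6-bit set field */
        uint16_t tag =  addr >> 10;          /* 6-bit tag field */
        for (int w = 0; w < WAYS; w++)
            if (cache[set][w].valid && cache[set][w].tag == tag)
                return w;
        return -1;
    }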
17
Valid Bit
  • A valid bit is provided for each block.
  • It indicates whether the block contains valid
    data.
  • It is not the same as the dirty bit (used with
    the write-back method), which indicates whether
    the block has been modified during its cache
    residency.
  • Transfers from disk to main memory are normally
    handled with DMA transfers, bypassing the cache
    for both cost and performance reasons.
  • The valid bit is set to 1 the first time the
    block is loaded into the cache from main memory.
    Whenever a main memory block is updated by a
    source that bypasses the cache, a check is made
    to determine whether the block being updated is
    in the cache. If it is, its valid bit is cleared
    to 0.

18
Cache Coherence
  • Also, before a DMA transfer out of main memory,
    the system must ensure that main memory is up to
    date with the information in the cache (an issue
    under the write-back protocol).
  • One solution is to flush the cache by forcing
    all dirty data to be written back to memory
    before the DMA transfer takes place, as sketched
    below.
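
A minimal C sketch of such a flush, assuming the direct-mapped layout
from the earlier slides (5-bit tag, 7-bit block, 4-bit word):

    #include <stdint.h>
    #include <string.h>

    #define NUM_BLOCKS      128
    #define WORDS_PER_BLOCK 16

    static uint16_t main_mem[1 << 16];

    struct cache_line {
        int      valid, dirty;
        uint16_t tag;
        uint16_t data[WORDS_PER_BLOCK];
    };

    /* Copy every dirty block back to main memory so a DMA device sees
       up-to-date data.  Each block's memory address is rebuilt from
       its tag and its position in the cache. */
    void flush_cache(struct cache_line cache[NUM_BLOCKS]) {
        for (unsigned i = 0; i < NUM_BLOCKS; i++) {
            if (cache[i].valid && cache[i].dirty) {
                unsigned base = ((unsigned)cache[i].tag << 11) | (i << 4);
                memcpy(&main_mem[base], cache[i].data,
                       sizeof cache[i].data);
                cache[i].dirty = 0;
            }
        }
    }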

19
Replacement Algorithms
  • Direct mapped
  • No replacement algorithm is necessary; the
    position of each block is predetermined.
  • Otherwise, when the cache is full, which block
    should be ejected?
  • LRU (least recently used) replacement
  • Overwrite the block that has gone the longest
    time without being referenced.
  • The cache controller must keep a record of
    references to all blocks (one common counter
    scheme is sketched below).
  • The algorithm performs well for many access
    patterns.
  • Performance is poor when accesses are made to
    sequential elements of an array that is slightly
    too large to fit in the cache.
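
One counter scheme the controller can use, sketched in C for a 4-way
set (an illustrative assumption; the slides do not fix the mechanism):

    #define WAYS 4

    struct way {
        int      valid;
        unsigned age;                  /* 0 = most recently used */
    };

    /* Victim for replacement: the way with the largest age, i.e. the
       one unreferenced for the longest time. */
    int lru_victim(const struct way set[WAYS]) {
        int victim = 0;
        for (int w = 1; w < WAYS; w++)
            if (set[w].age > set[victim].age)
                victim = w;
        return victim;
    }

    /* On a reference to 'used': every valid way younger than it ages
       by one, and 'used' becomes the most recent (age 0). */
    void lru_touch(struct way set[WAYS], int used) {
        for (int w = 0; w < WAYS; w++)
            if (set[w].valid && set[w].age < set[used].age)
                set[w].age++;
        set[used].age = 0;
    }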

20
Caches in Commercial Processors
  • 68040 caches
  • 2 caches of 4K bytes each (1 instruction, 1
    data)
  • Set-associative organization (64 sets of 4
    blocks each)
  • Each block holds 4 long words; each long word is
    4 bytes.

21
Caches in Commercial Processors: Pentium III
(high-performance processor)
  • Requires fast access to instructions and data
  • 2 cache levels
  • Level 1
  • 16KB instruction cache
  • 2-way set-associative organization (instructions
    are not normally modified during execution)
  • 16KB data cache
  • 4-way set-associative organization
  • Can use either a write-back or a write-through
    policy
  • Level 2
  • Much larger

22
Level 2 Cache of Pentium III
  • Can be implemented external to the processor
  • Katmai
  • 512KB
  • Implemented using SRAM memory
  • 4-way set-associative organization
  • Uses either the write-back or the write-through
    protocol, programmable on a per-block basis
  • Cache bus is 64 bits wide

23
Level 2 Cache of Pentium III
  • Can be integrated with the processor
  • Coppermine
  • 256KB
  • 8-way set-associative organization
  • Cache bus is 256 bits wide

24
Which method is better?
  • External cache
  • Allows a larger cache
  • A larger data path width is not available,
    because of the pins needed and the increased
    power consumption of the output drivers
  • Runs at a slower clock speed (the Katmai L2 is
    driven at half the processor speed; the
    Coppermine L2 at full processor speed)
  • Internal cache
  • Reduces latency and increases bandwidth because
    of the wider path
  • The processor chip becomes much larger, making
    it much more difficult to fabricate

25
Pentium 4 Caches
  • Can have up to 3 levels of cache
  • L1
  • Data cache (8 Kbytes)
  • 4-way set-associative organization
  • Cache block is 64 bytes
  • A write-through policy is used on writes
  • Integer data can be accessed from the data cache
    in 2 clock cycles (less than 2 ns)
  • The instruction cache does not hold normal
    instructions, but rather already decoded
    versions of instructions.

26
L2 of Pentium 4
  • Unified cache of 256K bytes
  • 8-way set-associative
  • Write-back policy
  • Access latency is 7 clock cycles
  • Implemented on the processor chip
  • An on-chip L3 cache is also available, intended
    for servers rather than desktops.