Measuring and Improving Cache Performance
Transcript and Presenter's Notes

1
Measuring and Improving Cache Performance
  • Chapter 7.3
  • Austin Orgah

2
In this section
  • We explore two different techniques for improving cache performance:
  • 1. Reducing the miss rate, by reducing the probability that two different memory blocks will contend for the same cache location.
  • 2. Reducing the miss penalty, by adding an additional level to the memory hierarchy (multilevel caching).

3
  • CPU time can be divided into:
  • 1. Clock cycles that the CPU spends executing the program.
  • 2. Clock cycles that the CPU spends waiting for the memory system.
  • Normally we assume that the costs of cache accesses that are hits are part of the normal CPU execution cycles.
  • Thus,
  • CPU time = (CPU execution clock cycles + Memory-stall clock cycles) x Clock cycle time
  • Memory-stall clock cycles come primarily from cache misses.

4
  • Memory-stall clock cycles are the sum of the stall cycles coming from reads and those coming from writes:
  • Memory-stall clock cycles = Read-stall cycles + Write-stall cycles
  • Read-stall clock cycles are the number of read accesses per program, times the read miss rate, times the miss penalty in clock cycles for a read:
  • Read-stall cycles = (Reads / Program) x Read miss rate x Read miss penalty

5
  • Write stalls
  • With writes, the write-through scheme has two sources of stalls:
  • 1. Write misses, which require that the block be fetched before continuing the write.
  • 2. Write buffer stalls, which occur when the write buffer is full when a write occurs.
  • Therefore, the cycles stalled for writes equal the sum of these two:
  • Write-stall cycles = (Writes / Program) x Write miss rate x Write miss penalty + Write buffer stalls

6
  • Write stalls (continued)
  • Write buffer stalls are difficult to compute, but in systems with a reasonable write buffer depth (say, four or more words) and a memory capable of accepting writes at a rate that significantly exceeds the average write frequency in programs (e.g., by a factor of 2), the write buffer stalls are minimal and can safely be ignored. (Otherwise the design is poor.)
  • In most write-through caches, the read and write miss penalties are the same, so:

7
  • Write stalls (continued)
  • The formula
  • Memory-stall clock cycles = Read-stall cycles + Write-stall cycles
  • becomes
  • Memory-stall clock cycles = (Memory accesses / Program) x Miss rate x Miss penalty
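  • As a minimal sketch, the formulas above in Python (all function and variable names are mine, not from the slides):

    def memory_stall_cycles(mem_accesses, miss_rate, miss_penalty):
        # Memory-stall clock cycles = Memory accesses x Miss rate x Miss penalty
        return mem_accesses * miss_rate * miss_penalty

    def cpu_time(exec_cycles, stall_cycles, cycle_time):
        # CPU time = (execution cycles + memory-stall cycles) x clock cycle time
        return (exec_cycles + stall_cycles) * cycle_time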

8
  • Calculating cache performance
  • Example: the instruction cache miss rate for a program is 2%, and the data cache miss rate is 4%. If the processor has a CPI of 2 without any memory stalls and a miss penalty of 100 cycles for all misses, determine how much faster the processor would run with a perfect cache that never missed. (Take the frequency of all loads and stores in SPECint2000 to be 36%. A worked sketch follows.)
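  • One way to work this, per instruction (a sketch; only the numbers are from the slide):

    cpi_base     = 2.0    # CPI without memory stalls
    miss_penalty = 100    # cycles, for all misses
    i_miss_rate  = 0.02   # instruction cache miss rate (2%)
    d_miss_rate  = 0.04   # data cache miss rate (4%)
    ls_freq      = 0.36   # loads/stores per instruction (36%, SPECint2000)

    i_stalls = i_miss_rate * miss_penalty            # 2.00 stall cycles/instruction
    d_stalls = ls_freq * d_miss_rate * miss_penalty  # 1.44 stall cycles/instruction
    cpi_real = cpi_base + i_stalls + d_stalls        # 5.44
    print(f"perfect cache is {cpi_real / cpi_base:.2f}x faster")  # 2.72x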

9
  • Cache performance with increased clock rate
  • Suppose we increase the performance of the computer in the previous example by doubling its clock rate. Since main memory speed is unlikely to change, assume that the absolute time to handle a cache miss doesn't change. How much faster will the computer be with the faster clock, assuming the same miss rates as in the previous example? (Worked below.)
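  • A sketch of the answer, continuing the numbers above: the absolute miss time is unchanged, so at twice the clock rate the miss penalty doubles to 200 cycles.

    miss_penalty = 200                                          # same time, 2x the cycles
    stalls = 0.02 * miss_penalty + 0.36 * 0.04 * miss_penalty   # 6.88 cycles/instruction
    cpi_fast = 2.0 + stalls                                     # 8.88
    # execution-time ratio: (5.44 x t) / (8.88 x t/2), about 1.23
    print(f"only {5.44 / (cpi_fast / 2):.2f}x faster, not 2x")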

10
  • As the examples show:
  • Relative cache penalties increase as the processor becomes faster. A processor that improves both clock rate and CPI suffers a double hit:
  • The lower the CPI, the more pronounced the impact of stall cycles.
  • The main memory system is unlikely to improve as fast as the processor cycle time, because DRAM isn't getting faster at the same rate. Between two processors whose main memories have the same absolute access time, the one with the higher clock rate has the larger miss penalty in cycles.
  • If the hit time increases, the total access time for a word from memory increases, possibly causing an increase in the processor cycle time.
  • Also, a larger cache results in a longer access time.

11
  • Reducing Cache Misses by More Flexible Placement of Blocks
  • Note: so far, when we place a block in a cache, a simple placement scheme has been used.
  • Direct-mapped cache - a direct mapping from any block address in memory to a single location in the upper level of the hierarchy.
  • Fully associative cache - a structure in which a block can be placed in any location in the cache.
  • Finding a block then requires searching all entries in the cache, since the block could be placed anywhere.
  • A comparator associated with each cache entry makes the search practical by doing it in parallel, but this is not cost effective: per-entry comparators are only practical for caches with a small number of blocks.

12
  • Set-associative cache - a middle ground between direct mapped and fully associative. This structure has a fixed number of locations (at least 2) where each block can be placed. Each block in memory maps to a unique set in the cache, given by the index field, and a block can be placed in any element of that set. Put another way, a block is directly mapped into a set, and then all the blocks in the set are searched for a match.
  • Recall - for a direct-mapped cache:
  • (Block number) mod (Number of blocks in the cache)
  • For a set-associative cache:
  • (Block number) mod (Number of sets in the cache)
  • (A small demo follows.)
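  • A small demo of the two mappings (the block number and cache size here are illustrative, not from the slides):

    block_number = 12
    cache_blocks = 8

    print(block_number % cache_blocks)          # direct mapped: block 4
    print(block_number % (cache_blocks // 2))   # two-way: set 0 (4 sets of 2)
    # fully associative: no index at all; the block may go anywhere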

13
(No Transcript)
14
  • A direct-mapped cache is a one-way set-associative cache.
  • A fully associative cache with m entries is an m-way set-associative cache.
  • Advantage: increasing associativity usually decreases the miss rate.
  • Disadvantage: it increases the hit time.

15
(No Transcript)
16
  • Misses and Associativity in Caches
  • Example: take three small caches, each consisting of four one-word blocks. One cache is fully associative, a second is two-way set-associative, and the third is direct mapped. Find the number of misses for each, given the following sequence of block addresses: 0, 8, 0, 6, 8.
  • Hint:
  • For the direct-mapped cache:
  • (Block number) mod (Number of cache blocks)
  • For the set-associative cache:
  • (Block number) mod (Number of sets in the cache)

17
Direct Mapped
18
2-way set Associative
19
2-way set Associative
  • Note: this cache has 2 sets (with indices 0 and 1), with two elements per set.
  • This cache also replaces the least recently used block within a set.

20
Fully Associative
21
Fully Associative
  • This cache has 4 blocks in a single set.
  • It has the best performance, with only three misses. (A short simulation reproducing all three counts follows.)
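  • A minimal LRU miss simulator (a sketch of my own) for the sequence 0, 8, 0, 6, 8; it prints five misses for direct mapped, four for two-way, and three for fully associative:

    def count_misses(addresses, num_blocks, ways):
        sets = [[] for _ in range(num_blocks // ways)]  # each set is an LRU list
        misses = 0
        for addr in addresses:
            s = sets[addr % len(sets)]      # (block number) mod (number of sets)
            if addr in s:
                s.remove(addr)              # hit: refresh its LRU position
            else:
                misses += 1
                if len(s) == ways:          # set full: evict least recently used
                    s.pop(0)
            s.append(addr)                  # most recently used goes last
        return misses

    for ways in (1, 2, 4):                  # direct, two-way, fully associative
        print(ways, count_misses([0, 8, 0, 6, 8], 4, ways))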

22
(No Transcript)
23
Locating a Block in the Cache
  • Each block in a set-associative cache includes an address tag that gives the block address. The tag of every cache block within the appropriate set is checked to see if it matches the block address from the processor. The index value is used to select the set containing the address of interest.
  • A sequential search of the tags, as a fully associative lookup would naively require, would make the hit time of a set-associative cache too slow, so the tags within the set are compared in parallel.
  • In a fully associative cache there's effectively only one set, and all the blocks must be checked in parallel. There's no index, and hence the entire address, excluding the block offset, is compared against the tag of every block.
  • In a direct-mapped cache the entry can be in only one block, so access is simply by indexing.
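  • The lookup path, sketched in Python (a serial loop stands in for the parallel comparators; all names are illustrative):

    from dataclasses import dataclass

    @dataclass
    class Line:
        valid: bool = False
        tag: int = 0
        data: bytes = b""

    def lookup(cache, addr, num_sets, block_bytes):
        block_addr = addr // block_bytes    # strip the block offset
        index = block_addr % num_sets       # select the set directly, no search
        tag = block_addr // num_sets        # upper bits, stored as the tag
        for line in cache[index]:           # hardware compares all ways at once
            if line.valid and line.tag == tag:
                return line.data            # hit
        return None                         # miss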

24
  • 4-way set-associative cache implementation.

25
  • For the 4-way set-associative cache, 4 comparators are needed, together with a 4-to-1 multiplexor to choose among the 4 members of the selected set.
  • The choice among direct-mapped, set-associative, and fully associative placement in any memory hierarchy will depend on the cost of a miss versus the cost of implementing associativity, both in time and in extra hardware.

26
  • Size of Tags vs. Set Associativity
  • Increasing associativity requires more comparators and more tag bits per cache block. Example: assuming a cache of 4K blocks, a four-word block size, and a 32-bit address, find the total number of sets and the total number of tag bits for caches that are direct mapped, two-way and four-way set associative, and fully associative. (Worked below.)
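  • Working it in Python: four-word blocks are 16 bytes, so the block address is 32 - 4 = 28 bits. The printed totals are 64, 68, 72, and 112 Kbits of tags.

    import math

    blocks, block_addr_bits = 4 * 1024, 28
    for ways in (1, 2, 4, blocks):          # direct, 2-way, 4-way, fully assoc.
        sets = blocks // ways
        index_bits = int(math.log2(sets))   # fully associative: 1 set, no index
        total_tag_bits = (block_addr_bits - index_bits) * blocks
        print(f"{ways}-way: {sets} sets, {total_tag_bits // 1024} Kbits of tags")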

27
  • Choosing Which Block to Replace
  • Direct-mapped cache - the requested block can go in exactly one position, and the block occupying that position must be replaced.
  • Fully associative cache - all blocks are candidates for replacement.
  • Set-associative cache - we must choose among the blocks in the selected set.
  • An associative cache has a choice of where to place the requested block, and hence a choice of which block to replace.
  • Least recently used (LRU) - a replacement scheme in which the block replaced is the one that has been unused for the longest time. (A small sketch follows.)
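  • A minimal sketch of LRU bookkeeping for one set (my own code, not a hardware design):

    from collections import OrderedDict

    class LRUSet:
        def __init__(self, ways):
            self.ways = ways
            self.lines = OrderedDict()          # tag -> block, oldest first

        def access(self, tag):
            if tag in self.lines:
                self.lines.move_to_end(tag)     # hit: now most recently used
                return True
            if len(self.lines) == self.ways:
                self.lines.popitem(last=False)  # evict the least recently used
            self.lines[tag] = None              # install the new block
            return False                        # miss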

28
  • Reducing the Miss Penalty Using Multilevel Caches
  • Multilevel cache - a memory hierarchy with multiple levels of caches, rather than just a cache and main memory.
  • To close the gap between the fast clock rates of modern processors and the relatively long time required to access DRAM, many microprocessors support an additional level of caching.
  • In particular, a two-level cache structure allows the primary cache to focus on minimizing hit time to yield a shorter clock cycle, while allowing the secondary cache to focus on miss rate to reduce the penalty of long memory access times. (A numeric sketch follows.)
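  • A back-of-the-envelope sketch of the effect (all numbers here are hypothetical, not from the slides):

    cpi_base     = 1.0
    l1_miss_rate = 0.02    # primary cache misses per instruction
    mem_penalty  = 100     # cycles to main memory
    l2_penalty   = 5       # cycles to the secondary cache
    global_miss  = 0.005   # misses per instruction that also miss in L2

    cpi_l1_only = cpi_base + l1_miss_rate * mem_penalty    # 3.0
    cpi_two_lvl = (cpi_base + l1_miss_rate * l2_penalty
                   + global_miss * mem_penalty)            # 1.6
    print(f"speedup from adding L2: {cpi_l1_only / cpi_two_lvl:.2f}x")  # 1.88x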

29
  • The miss penalty of the primary cache is significantly reduced by the presence of the secondary cache, allowing the primary cache to be smaller and to have a higher miss rate.
  • The secondary cache's access time becomes less important once a primary cache is present, since the access time of the secondary cache affects the miss penalty of the primary cache rather than directly affecting the primary cache hit time or the processor cycle time.
  • The primary cache uses a smaller block size and cache size, matching its reduced miss penalty.
  • The secondary cache uses a larger total size and block size, matching its less critical access time.