Data Cache Prefetching using a Global History Buffer

1
Data Cache Prefetching using a Global History
Buffer
Written by Kyle Nesbit and James Smith,
Department of Electrical and Computer
Engineering, University of Wisconsin, Madison
  • Presented by
  • Chuck (Chengyan) Zhao
  • Mar 30, 2004

2
  • Introduction
  • Cache hierarchy
  • CPU registers: very small in number, fastest
  • L1 cache: usually 8 KB; larger than the CPU
    registers, slower than the registers
  • L2 cache: usually 256 or 512 KB; larger than L1,
    slower than L1
  • L3 cache (optional): usually 1 or 2 MB; larger
    than L2, slower than L2
  • Main memory
  • Usually 256 or 512 MB or more
  • Larger than L3, slowest

CPU-Memory Cache Hierarchy
3
  • Each level of the cache hierarchy
  • Latency grows by around 10 times per level
  • Problems with the cache hierarchy architecture
  • Limited capacity (size)
  • Limited associativity
  • Solution to these problems: effective
    prefetching
  • 2. Prefetching techniques
  • Sequential prefetching
  • What: access the cache lines that immediately
    follow the current cache line (the one that
    missed)
  • Algorithm
  • Early: prefetch after each cache miss
  • Mature: issue a prefetch only after a sequential
    access pattern has been established
  • Degree of prefetching
  • Maximum number of cache lines prefetched in
    response to a single prefetch request
  • The goal is to completely hide the latency of a
    miss to main memory
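
As a rough illustration of the "degree of prefetching" idea above, the sketch below lists which cache lines a sequential prefetcher would request after a miss. The function name and the use of plain cache-line numbers are assumptions made for this example; they do not come from the slides.

```python
def sequential_prefetch(miss_line, degree):
    """Return the cache-line numbers to prefetch after a miss on miss_line.

    Hypothetical helper: a sequential prefetcher simply requests the
    `degree` lines that immediately follow the line that missed.
    """
    return [miss_line + i for i in range(1, degree + 1)]


# A miss on line 100 with degree 4 requests the next four lines.
print(sequential_prefetch(100, 4))  # [101, 102, 103, 104]
```

A larger degree hides more of the main-memory latency but also risks fetching lines that are never used.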

4
  • 2. Table-based prefetching
  • What
  • Records history information related to data
    accesses
  • Operation
  • The table is accessed with a key (the program
    counter of the load instruction, or the miss
    address)
  • History information is used to predict what to
    prefetch
  • Evaluation
  • Pro: simple
  • Con: inefficient
  • Fixed amount of history for each prefetching key
  • Stale data: an entry can sit in the table for a
    very long time; by the time it is used, the
    memory access behavior has changed
  • 3. Global History Buffer (GHB) prefetching
  • Organization: Fig. 1(b)
  • Features
  • FIFO table: cache misses enter at the bottom and
    move up toward the top
  • Separate IT and GHB
  • Fixed table size
  • Circular table: existing items are overwritten
    when overflow happens

6
  • Benefits of GHB
  • Reduces stale data
  • More accurate construction of history access
    patterns
  • More effective prefetching algorithms
  • 4. Table-based prefetching techniques
  • Stride prefetching: Fig. 2
  • The following addresses are prefetched
  • a + s, a + 2s, ..., a + ds
  • where a = target address, s = detected stride,
    d = degree of prefetching
  • Note: in this case, the stride s is a constant
  • Correlation prefetching (Markov prefetching):
    explain with Fig. 3
  • Uses a history table to record cache misses
  • The miss address indexes the correlation table
  • Each entry holds
  • A list of addresses that have immediately
    followed the current miss address
  • Most recent miss first
  • Markov graph
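
To make the correlation table concrete, here is a minimal sketch of a Markov-style prefetcher, assuming a small software table keyed by miss address. The class name, the unbounded dictionary, and the default of two successors per entry are illustrative assumptions, not details taken from the presentation.

```python
from collections import defaultdict


class MarkovPrefetcher:
    """Toy correlation table: miss address -> addresses that followed it."""

    def __init__(self, successors_per_entry=2):
        self.width = successors_per_entry   # fixed history per entry
        self.table = defaultdict(list)      # successors, most recent first
        self.last_miss = None

    def on_miss(self, addr):
        """Record the miss and return the addresses to prefetch."""
        if self.last_miss is not None:
            succ = self.table[self.last_miss]
            if addr in succ:
                succ.remove(addr)
            succ.insert(0, addr)            # most recent successor first
            del succ[self.width:]           # keep only `width` successors
        self.last_miss = addr
        return list(self.table.get(addr, []))


p = MarkovPrefetcher()
for a in [0xA0, 0xB0, 0xC0, 0xA0, 0xD0]:
    p.on_miss(a)
# 0xA0 has been followed by 0xB0 and then, more recently, by 0xD0.
print([hex(x) for x in p.on_miss(0xA0)])  # ['0xd0', '0xb0']
```

The fixed `width` per entry is exactly the "fixed amount of history" limitation that the GHB design tries to relax.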

8
  • 3. Distance prefetching: explain with Fig. 4
  • Generalized correlation prefetching
  • Uses the distance (between 2 consecutive global
    miss addresses) to index the correlation table
  • Problems with table-based prefetching
  • Table data becomes stale: neither used nor
    refreshed
  • Table entry conflicts: multiple access keys map
    to the same table entry
  • Fixed, small amount of history per entry: in
    Fig. 3, 2 pieces of history per data item
  • 5. Global History Buffer (GHB) based prefetching
  • Table structure: Fig. 1(b)
  • IT: Index Table
  • Accessed by key, as in traditional table-based
    prefetching
  • Key: program counter, cache miss address, or a
    combination of them
  • Holds pointers into the GHB

9
  • GHB (cont.)
  • GHB: n-entry FIFO circular table
  • Holds the n most recent misses
  • Each entry holds
  • Global miss address
  • Pointer: chains other GHB entries into an address
    list (access info for the same address)
  • Notation used later
  • Prefetching method X / Y
  • X
  • PC: program counter based indexing
  • G: global address
  • Y
  • CS: constant stride
  • DC: delta correlation
  • AC: address correlation
  • Different combinations of X and Y create
    different prefetching methods
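
The two-table organization described above can be sketched in a few lines. This is only a software model under stated assumptions: the Index Table is an unbounded dictionary (a real IT has a fixed size), pointers are ever-increasing sequence numbers, and the key is a load PC; the class and method names are made up for the example.

```python
class GlobalHistoryBuffer:
    """Toy model of an Index Table (IT) plus an n-entry circular GHB."""

    def __init__(self, size):
        self.size = size
        self.entries = [None] * size   # each slot: (miss_addr, link)
        self.head = 0                  # sequence number of the next write
        self.index_table = {}          # key -> sequence number of newest entry

    def insert(self, key, addr):
        """Record a miss; link it to the previous miss with the same key."""
        link = self.index_table.get(key)
        self.entries[self.head % self.size] = (addr, link)  # overwrite oldest
        self.index_table[key] = self.head
        self.head += 1

    def history(self, key):
        """Walk the linked list for key; return miss addresses, newest first."""
        out = []
        ptr = self.index_table.get(key)
        # Stop at the end of the chain or when the link points at an
        # entry that has already been overwritten.
        while ptr is not None and self.head - ptr <= self.size:
            addr, ptr = self.entries[ptr % self.size]
            out.append(addr)
        return out


ghb = GlobalHistoryBuffer(size=4)
for pc, addr in [(0x400, 100), (0x404, 500), (0x400, 104), (0x400, 108)]:
    ghb.insert(pc, addr)
print(ghb.history(0x400))  # [108, 104, 100]
```

Each call to `history` touches one GHB entry per list element, which is the "multiple accesses" cost of the design.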

10
  • 2. GHB for correlation prefetching
  • Fig. 5
  • Explain: breadth first, shaded area
  • 3. GHB for stride prefetching
  • PC / CS
  • Use Fig. 5 again to explain (depth first)
  • 6. Global History Buffer (GHB) error handling
  • Errors can occur
  • How
  • When the GHB array is overwritten
  • Pointers become obsolete, because the information
    they referenced has been rewritten
  • Solution
  • Store pointers with extra bits; the low-order
    bits reference the entry
  • Compare
  • If (head pointer - ref pointer) > table size,
    then it is an error
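
The comparison above can be written out as a small check. This is a sketch under assumptions not stated on the slide: the head pointer counts entries written so far, both pointers are `ptr_bits` wide and wrap around like hardware counters, and the table size is a power of two so that the low-order bits select the slot. Function names are hypothetical.

```python
def is_stale(head, ptr, table_size, ptr_bits):
    """True if the entry that ptr referred to has been overwritten."""
    distance = (head - ptr) % (1 << ptr_bits)  # wrap-around subtraction
    return distance > table_size


def slot(ptr, table_size):
    """Low-order bits of a wide pointer select the GHB slot."""
    return ptr % table_size


# 4-entry table, 4-bit pointers (values 0..15).
print(is_stale(head=6, ptr=2, table_size=4, ptr_bits=4))  # False
print(is_stale(head=7, ptr=2, table_size=4, ptr_bits=4))  # True
print(is_stale(head=1, ptr=14, table_size=4, ptr_bits=4)) # False (head wrapped)
```

The check only works while a pointer is less than `2 ** ptr_bits` writes old, which is why the extra bits are needed.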

12
  • 7. GHB evaluation
  • GHB benefits
  • FIFO
  • First in, first out buffer
  • Naturally gives table space to the most recent
    history
  • Separation of IT and GHB
  • IT: Index Table
  • Holds the working set of prefetching lists
  • Relatively small
  • GHB
  • Larger than the IT
  • Sized to hold the miss address stream
  • Benefit of this design
  • Enables more sophisticated prefetching methods
    (shown later)
  • GHB drawback
  • Multiple accesses to collect prefetching info
    (internal linked-list traversal)

13
  • 7. GHB evaluation (cont.)
  • 3. Types of GHB prefetching
  • Width prefetching
  • Prefetch only the immediately adjacent nodes
  • E.g. in Fig. 5
  • Depth prefetching
  • Begin with the current miss
  • Follow the sequence of most likely nodes along
    its path
  • Prefetch at each node
  • E.g. in Fig. 5
  • Hybrid
  • Mix of width prefetching and depth prefetching
  • 4. New prefetching technique: Global / Delta
    Correlation
  • What: non-constant-stride prefetching
  • Example: Table 1
  • Pattern: 0, 1, 1, 62, 1, 1, ... (accessing the
    first 3 elements of a 2-dimensional array)
  • Constant-stride prefetching ends up at incorrect
    addresses: 1, 1, 1, 1, ...
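
The difference between constant-stride and delta-correlation prediction can be shown on a small example. The address stream below is an assumption chosen to produce deltas of 1, 1, 62 (three consecutive elements, then a jump to the next row 64 elements later); the function names and the simple backward search over a Python list are illustrative, not the paper's hardware mechanism.

```python
def constant_stride_prefetch(miss_addrs, degree):
    """Predict a + s, a + 2s, ... using only the most recent delta."""
    stride = miss_addrs[-1] - miss_addrs[-2]
    return [miss_addrs[-1] + stride * k for k in range(1, degree + 1)]


def delta_correlation_prefetch(miss_addrs, degree):
    """Find the last delta pair earlier in the history and replay what followed."""
    deltas = [b - a for a, b in zip(miss_addrs, miss_addrs[1:])]
    if len(deltas) < 3:
        return []
    key = (deltas[-2], deltas[-1])
    # Search backward for an earlier occurrence of the same delta pair.
    for i in range(len(deltas) - 2, 0, -1):
        if (deltas[i - 1], deltas[i]) == key:
            addr, out = miss_addrs[-1], []
            for d in deltas[i + 1:i + 1 + degree]:
                addr += d
                out.append(addr)
            return out
    return []


stream = [0, 1, 2, 64, 65, 66, 128, 129]     # assumed example addresses
print(constant_stride_prefetch(stream, 3))   # [130, 131, 132]
print(delta_correlation_prefetch(stream, 3)) # [130, 192, 193]
```

On this stream the real next accesses are 130, 192, 193, so the constant-stride guess is right once and wrong twice, while the delta-pair lookup follows the jump to the next row.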

14
Non-const address stream
15
  • 4. New prefetching technique (cont.)
  • Using the GHB
  • Sequence of the load's miss addresses
  • Detects variable stride steps
  • Uses delta pairs (Table 1) to predict
  • 8. Simulation and testing
  • Simulator and its configuration
  • Configuration: Table 4
  • SimpleScalar 3.0
  • Other details
  • Each access to the IT: 1 cycle
  • Each access to the GHB: 1 cycle
  • Degree of prefetching: 4
  • Benchmarks under an ideal L2 cache: Table 2,
    Table 3
  • GHB training set
  • Some benchmarks are used to decide the optimal
    table size for
  • IT
  • GHB

16
  • 4. GHB testing: Global / Delta Correlation

20
  • 5. GHB testing: PC / local prefetching
  • Compares GHB PC / CS and GHB PC / DC with
    table-based PC / CS

22
Conclusion
  • Global History Buffer based prefetching
  • 2-level table hierarchy
  • IT: Index Table
  • GHB: Global History Buffer
  • Performance improvements
  • Generally performs as well or better on 14 out of
    15 tested benchmarks
  • Increases IPC
  • Reduces memory traffic
  • Advantages
  • Reduces stale data
  • Increases prediction accuracy
  • Reduces memory traffic
  • Enables further prediction opportunities:
    variable-stride prefetching
  • Disadvantage
  • Multiple table accesses to build history
    information
  • But the extra delay is relatively small and
    tolerable