2K papers on caches by Y2K: Do we need more? - PowerPoint PPT Presentation

Transcript and Presenter's Notes


1
2K papers on caches by Y2K: Do we need more?
  • Jean-Loup Baer
  • Dept. of Computer Science & Engineering
  • University of Washington

2
A little bit of history
  • The Y0K problem

3
A little bit of history
  • The Y0K problem
  • The Y1K problem

4
A little bit of history
  • The Y0K problem
  • The Y1K problem
  • For the French version: who was King of France
    in the year 1000?

5
Outline
  • More history
  • Anthology
  • Challenges
  • Conclusion

6
More history
  • Caches introduced (commercially) more than 30
    years ago in the IBM 360/85
  • already a processor-memory gap
  • Oblivious to the ISA
  • caches were organization, not architecture
  • Sector caches
  • to minimize tag area
  • Single level off-chip

7
Terminology
  • One of the original designers (Gibson) had first
    coined the name muffer
  • When papers were submitted, the authors (Conti,
    Gibson, Liptay, Pitkovsky) used the term
    high-speed buffer
  • The Editor-in-Chief of the IBM Systems Journal
    (R. L. Johnson) suggested a sexier name, namely
    cache, after consulting a thesaurus

8
Today
  • Caches are ubiquitous
  • On-chip, off-chip
  • But also, disk caches, web caches, trace caches
    etc.
  • Multilevel cache hierarchy
  • With inclusion or exclusion
  • Many different organizations
  • direct-mapped, set-associative,
    skewed-associative, sector, decoupled sector etc.
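
For the simpler organizations above, the address decomposition into tag, set index, and block offset can be sketched as follows; the geometry (16 KB, 32-byte lines, 4-way) is an illustrative assumption, and setting WAYS = 1 gives the direct-mapped case.

```python
# Sketch: splitting an address into tag / set index / block offset
# for a set-associative cache. All sizes are illustrative assumptions.

CACHE_BYTES = 16 * 1024
LINE_BYTES = 32
WAYS = 4                                   # 1 = direct-mapped
NUM_SETS = CACHE_BYTES // (LINE_BYTES * WAYS)

OFFSET_BITS = LINE_BYTES.bit_length() - 1  # log2(32) = 5
INDEX_BITS = NUM_SETS.bit_length() - 1     # log2(128) = 7

def split_address(addr: int):
    """Return (tag, set index, block offset) for an address."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

print(split_address(0x12345))   # → (18, 26, 5)
```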

9
Today (cont'd)
  • Cache exposed to the ISA
  • Prefetch, Fence, Purge etc.
  • Cache exposed to the compiler
  • Code and data placement
  • Cache exposed to the O.S.
  • Page coloring
  • Many different write policies
  • copy-back, write-through, fetch-on-write,
    write-around, write-allocate etc.

10
Today (cont'd)
  • Numerous cache assists, for example
  • For storage: write buffers, victim caches,
    temporal/spatial caches
  • For overlap: lock-up free caches
  • For latency reduction: prefetch
  • For better cache utilization: bypass mechanisms,
    dynamic line sizes
  • etc ...

11
Caches and Parallelism
  • Cache coherence
  • Directory schemes
  • Snoopy protocols
  • Synchronization
  • Test-and-test-and-set
  • load-linked / store-conditional
  • Models of memory consistency
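
The test-and-test-and-set idiom above can be sketched with Python threads standing in for processors; `threading.Lock` models the atomic exchange, and the class name is invented for illustration.

```python
# Sketch of test-and-test-and-set: spin on an ordinary read (the first
# "test", served from the local cache) and attempt the atomic exchange
# only when the lock looks free, reducing coherence traffic compared
# with spinning directly on test-and-set. Illustrative only.
import threading

class TTSLock:
    def __init__(self):
        self._held = False
        self._atomic = threading.Lock()     # models the atomic RMW

    def _test_and_set(self) -> bool:
        """Atomically set _held to True, returning its old value."""
        with self._atomic:
            old, self._held = self._held, True
            return old

    def acquire(self):
        while True:
            while self._held:               # test: plain cached read
                pass
            if not self._test_and_set():    # and-set: atomic exchange
                return                      # lock was free: we own it

    def release(self):
        self._held = False
```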

12
When were the 2K papers being written?
  • A few facts
  • 1980 textbook
  • 1996 textbook: 120 pages on caches (20%)
  • Smith survey (1982)
  • About 40 references on caches
  • Uhlig and Mudge survey on trace-driven simulation
    (1997)
  • About 25 references specific to cache performance
    only
  • Many more on tools for performance etc.

13
Cache research vs. time
[Chart: number of cache papers vs. time; annotations:
largest number (14), 1st session on caches]
14
Outline
  • More history
  • Anthology
  • Challenges
  • Conclusion

15
Some key papers - Cache Organization
  • Conti (Computer 1969): direct-mapped (cf. slave
    memory and tags in Wilkes 1965),
    set-associativity
  • Bell et al. (IEEE TC 1974): cache design for
    small machines (advocated unified caches;
    pipelining nullified that)
  • Hill (Computer 1988): the case for direct-mapped
    caches (technology has made the case obsolete)
  • Smith (Computing Surveys 1982): virtual vs.
    physical addressing (first cogent discussion)

16
Some key papers - Qualitative Properties
  • Smith (Computing Surveys 1982): spatial and
    temporal locality
  • Hill (Ph.D. 1987): the three Cs
  • Baer and Wang (ISCA 1988): multi-level inclusion

17
Some key papers - Cache Evaluation Methodology
  • Belady (IBM Systems J. 1966): MIN and OPT
  • Mattson et al. (IBM Systems J. 1970): the stack
    property
  • Trace collection
  • Hardware: Clark (ACM TOCS 1983)
  • Microcode: Agarwal, Sites and Horowitz (ISCA
    1986), ATUM
  • Software: M. Smith (1991), Pixie
  • Very long traces: Borg, Kessler and Wall (ISCA
    1990)
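
The stack property of Mattson et al. is what makes one-pass cache evaluation possible: a reference at LRU stack distance d hits in every fully-associative LRU cache larger than d, so a single pass over a trace yields miss counts for all sizes at once. A minimal sketch (function names are invented):

```python
# Sketch: one-pass LRU stack-distance simulation (Mattson et al.).
# Because LRU has the stack (inclusion) property, the distances from
# a single pass give the miss count of every cache size at once.

def stack_distances(trace):
    """LRU stack distance of each reference (inf = compulsory miss)."""
    stack = []                      # position 0 = most recently used
    dists = []
    for block in trace:
        if block in stack:
            d = stack.index(block)  # depth in the LRU stack
            stack.pop(d)
        else:
            d = float("inf")        # first touch
        stack.insert(0, block)      # promote to MRU
        dists.append(d)
    return dists

def misses(dists, capacity):
    """Misses of a fully-associative LRU cache of `capacity` blocks."""
    return sum(1 for d in dists if d >= capacity)

dists = stack_distances(["a", "b", "a", "c", "b", "a"])
print(misses(dists, 2), misses(dists, 3))   # → 5 3
```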

18
Some key papers - Cache Performance
  • Kaplan and Winder (Computer 1973): 8 to 16K
    caches with block sizes of 64 to 128 bytes and
    set-associativity 2 or 4 will yield hit ratios of
    over 95%
  • Strecker (ISCA 1976): design of the PDP 11/70 --
    2KB, 2-way set-associative, 4-byte (2-word)
    block size
  • Smith (Computing Surveys 1982): the most
    comprehensive study of its time (prefetching,
    replacement, associativity, line size, etc.)
  • Przybylski et al. (ISCA 1988): comprehensive
    study 6 years later
  • Woo et al. (ISCA 1995): Splash-2

19
Some key papers - Cache Assists
  • IBM ??: write buffers
  • Gindele (IBM Tech. Disclosure Bull. 1977): OBL
    prefetch (OBL coined by Smith?)
  • Kroft (ISCA 1981): lock-up free caches
  • Jouppi (ISCA 1990): victim caches, stream buffers
  • Pettis and Hansen (PLDI 1990): code placement

20
Some key papers - Cache Coherence
  • Censier and Feautrier (IEEE TC 1978): directory
    scheme
  • Goodman (ISCA 1983): the first snoopy protocol
  • Archibald and Baer (TOCS 1986): snoopy
    terminology
  • Dubois, Scheurich and Briggs (ISCA 1986): memory
    consistency
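
The directory and snoopy papers above all revolve around a small per-line state machine; a minimal MSI invalidate protocol, in the spirit of the Archibald-Baer taxonomy, can be tabulated as follows (a sketch, not a complete protocol: bus requests and data supply are omitted).

```python
# Sketch: per-line state transitions of a minimal MSI invalidate
# protocol. Events are local processor accesses and snooped bus
# transactions from other caches. The bus actions each transition
# would issue are omitted for brevity.

MSI = {
    ("I", "proc_read"):  "S",   # read miss: fetch a shared copy
    ("I", "proc_write"): "M",   # write miss: fetch with ownership
    ("S", "proc_read"):  "S",
    ("S", "proc_write"): "M",   # upgrade: invalidate other copies
    ("S", "bus_read"):   "S",
    ("S", "bus_write"):  "I",   # another cache writes: invalidate
    ("M", "proc_read"):  "M",
    ("M", "proc_write"): "M",
    ("M", "bus_read"):   "S",   # supply the data, downgrade
    ("M", "bus_write"):  "I",   # supply the data, invalidate
}

def next_state(state: str, event: str) -> str:
    return MSI[(state, event)]
```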

21
Outline
  • More history
  • Anthology
  • Challenges
  • Conclusion

22
Caches are great. Yes, but ...
  • Caches are poorly utilized
  • Lots of dead lines (only 20% efficiency -- Burger
    et al. 1995)
  • Squandering of memory bandwidth
  • The memory wall
  • At the limit, it will take longer to load a
    program on-chip than to execute it (Wulf and
    McKee 1995)

23
Solution Paradigms
  • Revolution
  • Evolution
  • Enhancements

24
Revolution
25
Evolution (processor in memory / application
specific)
  • IRAM (Patterson et al. 1997)
  • Vector processor; data-stream apps; low power
  • FlexRAM (Torrellas et al. 1999)
  • Memory chip: simple multiprocessor + superscalar,
    banks of DRAM; memory-intensive apps
  • Active Pages (Chong et al. 1998)
  • Co-processor paradigm: reconfigurable logic in
    memory; apps such as scatter-gather
  • FBRAM (Deering et al. 1994)
  • Graphics in memory

26
Enhancements
  • Hardware and software cache assists
  • Examples: hardware tables; most common case
    resolved in hardware, less common in software
  • Use real estate on-chip to provide intelligence
    for managing the on-chip and off-chip hierarchy
  • Examples: memory controller, prefetch engines for
    L2 on the processor chip

27
General Approach
  • Identify a cache parameter/enhancement whose
    tuning will lead to better performance
  • Assess potential margin of improvement
  • Propose and design an assist
  • Measure efficiency of the scheme

28
Identify a cache parameter/enhancement
  • The creative part!
  • Our current projects
  • Dynamic line sizes
  • Modified LRU policies using detection of temporal
    locality
  • Prefetching in L2

29
Assess potential margin of improvement
  • Metrics?
  • Miss rate, bandwidth, average memory access time
  • Weighted combination of some of the above
  • Execution time
  • Compare to optimal (off-line) algorithm
  • Easy for replacement algorithms
  • OK for some other metrics (e.g., cost of a
    cache miss depending on line size; oracle for
    prefetching)
  • Hard for execution time
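
The weighted combination mentioned above is typically the average memory access time; a sketch for a two-level hierarchy (all latencies and miss rates below are illustrative assumptions):

```python
# Sketch: average memory access time (AMAT) for a two-level hierarchy:
#   AMAT = tL1 + mL1 * (tL2 + mL2 * tMem)
# where t* are access times in cycles and m* are local miss rates.
# The numbers used below are illustrative assumptions.

def amat(t_l1, m_l1, t_l2, m_l2, t_mem):
    return t_l1 + m_l1 * (t_l2 + m_l2 * t_mem)

# 1-cycle L1, 5% L1 misses, 10-cycle L2, 20% local L2 misses, 100-cycle DRAM
print(amat(1, 0.05, 10, 0.20, 100))   # → 2.5
```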

30
Measure efficiency of the scheme
  • Same problem: metrics?
  • The further from the processor, the more
    relaxed the metric
  • For L1-L2, you need to see impact on execution
    speed
  • For L2- DRAM, you can get away with average
    memory access time

31
Anatomy of a Predictor
[Diagram: Exec. → Event selec. → Pred. Index. →
Pred. Mechan. → Feedback, plus a Recovery? stage]
32
Anatomy of a Cache Predictor
[Diagram: Exec. → Event selec. → Pred. Index. →
Pred. Mechan. → Feedback; no Recovery stage]
33
Anatomy of a Cache Predictor
[Diagram: Exec. → Pred. trigger. → Pred. Index. →
Pred. Mechan. → Feedback; the trigger is a
load/store cache miss]
34
Anatomy of a Cache Predictor
[Diagram: same stages; the prediction index comes
from the PC, the effective address (EA), and
global/local history]
35
Anatomy of a Cache Predictor
[Diagram: same stages; prediction structures:
one-level table, two-level tables, associative
buffers, specialized caches]
36
Anatomy of a Cache Predictor
[Diagram: same stages; prediction mechanisms:
counters, stride predictors, finite-context Markov
predictors]
37
Anatomy of a Cache Predictor
[Diagram: same stages; the feedback is often
imprecise]
38
Applying the Model
  • Modified LRU policies for L2 caches
  • Identify a cache parameter
  • L2 cache miss rate

39
Applying the Model
  • Modified LRU policies for L2 caches
  • Identify a cache parameter
  • Assess potential margin of improvement
  • OPT vs. LRU

40
Applying the Model
  • Modified LRU policies for L2 caches
  • Identify a cache parameter
  • Assess potential margin of improvement
  • Propose a design
  • On-line detection of lines exhibiting temporal
    locality

41
Propose a Design
[Diagram: Exec. → Event selec. → Pred. Index. →
Pred. Mechan. → Feedback; event = L1 cache miss;
index = EA, PC; structure = metadata in an L2
Locality Table; feedback = LRU stack + locality bit]
42
Applying the Model
  • Modified LRU policies for L2 caches
  • Identify a cache parameter
  • Assess potential margin of improvement
  • Propose a design
  • Measure efficiency of the scheme
  • How much of the margin of improvement was reduced
    (i.e., compare with OPT and LRU)
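
One way to sketch the modified policy: each L2 line carries a temporal-locality bit, and replacement victimizes the oldest line without the bit before falling back to plain LRU. The bit-setting heuristic below (a re-reference while not most-recently-used) is an illustrative assumption, not the actual design.

```python
# Sketch of a modified-LRU set: lines that have shown temporal
# locality (re-referenced while not MRU) are protected, and the
# replacement victim is the least-recent line WITHOUT the locality
# bit, falling back to plain LRU when every line has it.

class ModifiedLRUSet:
    def __init__(self, ways: int):
        self.ways = ways
        self.lines = []              # index 0 = MRU; entries: [tag, bit]

    def access(self, tag) -> bool:
        """Touch `tag`; return True on hit, False on miss (with fill)."""
        for i, line in enumerate(self.lines):
            if line[0] == tag:
                if i > 0:
                    line[1] = True   # re-reference: temporal locality
                self.lines.insert(0, self.lines.pop(i))
                return True
        if len(self.lines) == self.ways:
            for i in range(len(self.lines) - 1, -1, -1):
                if not self.lines[i][1]:  # oldest unprotected line
                    self.lines.pop(i)
                    break
            else:
                self.lines.pop()     # all protected: plain LRU victim
        self.lines.insert(0, [tag, False])
        return False
```

Measuring such a sketch against plain LRU and OPT on the same traces would mirror the comparison proposed on the slide.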

43
Conclusion
  • Do we need more?
  • "We need substantive research on the design of
    memory hierarchies that reduce or hide access
    latencies while they deliver the memory
    bandwidths required by current and future
    applications" (PITAC Report, Feb. 1999)

44
Possible important areas of research
  • L2- DRAM interface
  • Prefetching
  • Better cache utilization
  • Data placement
  • Caches for low-power design
  • Caches for real-time systems

45
With many thanks to
  • Jim Archibald
  • Wen-Hann Wang
  • Sang Lyul Min
  • Rick Zucker
  • Tien-Fu Chen
  • Craig Anderson
  • Xiaohan Qin
  • Dennis Lee
  • Peter Vanvleet
  • Wayne Wong
  • Patrick Crowley

46
  • For the French version: who was King of France
    in the year 1000?
  • Robert II le Pieux (Robert the Pious), eldest son
    of Hugues Capet