1
Predicting Inter-Thread Cache Contention on a
Chip Multi-Processor Architecture
  • Dhruba Chandra, Fei Guo, Seongbeom Kim
  • Yan Solihin
  • Electrical and Computer Engineering
  • North Carolina State University
  • HPCA-2005

2
Cache Sharing in CMP
[Diagram: two processor cores, each with a private L1 cache,
sharing a single L2 cache]

3
Impact of Cache Space Contention
  • Application-specific (what)
  • Coschedule-specific (when)
  • Significant: up to 4X more cache misses, 65% IPC
    reduction

4
Related Work
  • Uniprocessor miss estimation
  • Cascaval et al., LCPC 1999; Chatterjee et al.,
    PLDI 2001
  • Fraguela et al., PACT 1999; Ghosh et al., TOPLAS
    1999
  • J. Lee et al., HPCA 2001; Vera and Xue, HPCA
    2002
  • Wassermann et al., SC 1997
  • Context switch impact on time-shared processors
  • Agarwal, ACM Trans. on Computer Systems, 1989
  • Suh et al., ICS 2001
  • No model for cache sharing impact
  • Relatively new phenomenon (SMT, CMP)
  • Many possible access interleaving scenarios

5
Contributions
  • Inter-thread cache contention models
  • 2 heuristic models (refer to the paper)
  • 1 analytical model
  • Input: circular sequence profile for each
    thread
  • Output: predicted number of cache misses per thread in
    a co-schedule
  • Validation
  • Against a detailed CMP simulator
  • 3.9% average error for the analytical model
  • Insight
  • Temporal reuse patterns → impact of cache sharing

6
Outline
  • Model Assumptions
  • Definitions
  • Inductive Probability Model
  • Validation
  • Case Study
  • Conclusions

7
Outline
  • Model Assumptions
  • Definitions
  • Inductive Probability Model
  • Validation
  • Case Study
  • Conclusions

8
Assumptions
  • One circular sequence profile per thread
  • An average (whole-run) profile yields high prediction accuracy
  • Phase-specific profiles may improve accuracy further
  • LRU replacement policy
  • Other policies are usually LRU approximations
  • Threads do not share data
  • Mostly true for serial apps
  • For parallel apps, threads are likely to be impacted
    uniformly

9
Outline
  • Model Assumptions
  • Definitions
  • Inductive Probability (Prob) Model
  • Validation
  • Case Study
  • Conclusions

10
Definitions
  • seqX(dX, nX): a sequence of nX accesses to dX
    distinct addresses, all by thread X and all to the same
    cache set
  • cseqX(dX, nX) (circular sequence): a sequence in
    which the first and the last accesses are to the
    same address

Example access stream: A B C D A E E B
(contains cseq(4,5) = A B C D A, cseq(1,2) = E E, and
cseq(5,7) = B C D A E E B)
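To make the profiling input concrete, here is a minimal sketch of
circular-sequence counting for one cache set. It is our own
illustration of the definitions above, not the authors' tool; the
names cseq_profile, trace, and profile are hypothetical.

    from collections import defaultdict

    def cseq_profile(trace):
        # For each access that revisits an address, record the circular
        # sequence it closes: the span from the previous access to that
        # address through the current one.
        profile = defaultdict(int)   # (d, n) -> F(cseq(d, n))
        last_pos = {}                # address -> index of its last access
        for i, addr in enumerate(trace):
            if addr in last_pos:
                span = trace[last_pos[addr]:i + 1]
                d = len(set(span))   # distinct addresses in the span
                n = len(span)        # total accesses in the span
                profile[(d, n)] += 1
            last_pos[addr] = i
        return profile

    # The stream from this slide: A B C D A E E B
    print(dict(cseq_profile("ABCDAEEB")))
    # {(4, 5): 1, (1, 2): 1, (5, 7): 1}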
11
Circular Sequence Properties
  • Thread X runs alone in the system
  • Given a circular sequence cseqX(dX, nX), the last
    access is a cache miss iff dX > Assoc
  • Thread X shares the cache with thread Y
  • If a sequence of intervening accesses seqY(dY, nY)
    occurs during cseqX(dX, nX)'s lifetime, the last
    access of thread X is a miss iff dX + dY > Assoc
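Both properties can be checked against a toy LRU set model. The
sketch below is our own illustration under the stated assumptions
(LRU replacement, a single cache set), not the authors' code.

    def lru_last_access_hits(accesses, assoc):
        # Simulate a single LRU set; return True iff the final access hits.
        stack = []                   # most-recently-used address first
        hit = False
        for addr in accesses:
            hit = addr in stack
            if hit:
                stack.remove(addr)
            elif len(stack) == assoc:
                stack.pop()          # evict the least-recently-used line
            stack.insert(0, addr)
        return hit

    ASSOC = 4
    # Alone: last access of cseqX(dX, nX) hits iff dX <= Assoc.
    print(lru_last_access_hits("ABCDA", ASSOC))    # dX = 4 <= 4 -> True
    print(lru_last_access_hits("ABCDEA", ASSOC))   # dX = 5 > 4  -> False
    # Shared: miss iff dX + dY > Assoc (see the example two slides ahead).
    print(lru_last_access_hits("AUBVVWA", ASSOC))  # 2 + 3 > 4 -> False
    print(lru_last_access_hits("AUBVVA", ASSOC))   # 2 + 2 <= 4 -> True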

12
Example
  • Assume a 4-way associative cache

No cache sharing: A is a cache hit. With cache sharing,
is A a cache hit or a miss?
13
Example
  • Assume a 4-way associative cache

A U B V V W A  →  seqY(3,4) intervening in cseqX(2,3)'s
lifetime: dX + dY = 2 + 3 > 4, so A is a miss
A U B V V A W  →  seqY(2,3) intervening in cseqX(2,3)'s
lifetime: dX + dY = 2 + 2 ≤ 4, so A is a hit
14
Outline
  • Model Assumptions
  • Definitions
  • Inductive Probability Model
  • Validation
  • Case Study
  • Conclusions

15
Inductive Probability Model
  • For each cseqX(dX, nX) of thread X
  • Compute Pmiss(cseqX): the probability that the last
    access is a miss
  • Steps
  • Compute E(nY): the expected number of intervening
    accesses from thread Y during cseqX's lifetime
  • For each possible dY, compute P(seq(dY, E(nY))):
    the probability of occurrence of seq(dY, E(nY))
  • If dX + dY > Assoc, add P(seq(dY, E(nY))) to
    Pmiss(cseqX)
  • Misses = old_misses + Pmiss(cseqX) × F(cseqX)

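A hedged sketch of this loop follows. Here prob_seq (the
P(seq(d, n)) computed on the next slide) and E_nY (which scales
thread Y's access rate to cseqX's lifetime) are assumed helper
interfaces of our own, not the paper's exact code.

    def predicted_extra_misses(profile_X, assoc, E_nY, prob_seq):
        # profile_X: {(dX, nX): F(cseqX)} circular-sequence counts.
        # Returns the expected number of additional misses for thread X
        # caused by sharing the cache with thread Y.
        misses = 0.0
        for (dX, nX), freq in profile_X.items():
            if dX > assoc:
                continue             # a miss even when running alone
            nY = round(E_nY(nX))     # expected intervening Y accesses,
                                     # rounded for this sketch
            # Pmiss(cseqX): sum P(seq(dY, nY)) over every dY that makes
            # dX + dY exceed the associativity (dY cannot exceed nY).
            p_miss = sum(prob_seq(dY, nY)
                         for dY in range(assoc - dX + 1, nY + 1))
            misses += p_miss * freq
        return misses

    # A prob_seq such as the one sketched on the next slide can be
    # plugged in; E_nY could be, e.g., lambda nX: 2 * nX if thread Y
    # issues accesses twice as fast (purely illustrative numbers).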
16
Computing P(seq(dY, E(nY)))
  • Basic Idea
  • P(seq(d,n)) = A × P(seq(d-1,n-1)) + B ×
    P(seq(d,n-1))
  • where A and B are transition probabilities
  • Detailed steps in the paper

[Diagram: seq(d,n) is reached from seq(d-1,n-1) by one access to a
distinct address, or from seq(d,n-1) by one access to a
non-distinct address]
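One plausible realization of the recurrence, memoized in Python.
The paper derives the transition probabilities A and B from the
measured profile; purely for illustration, this sketch assumes a
single fixed probability p_dist that an access touches a
not-yet-seen address.

    from functools import lru_cache

    def make_prob_seq(p_dist):
        @lru_cache(maxsize=None)
        def prob_seq(d, n):
            if d == 0 and n == 0:
                return 1.0
            if d <= 0 or d > n:
                return 0.0           # impossible sequence shapes
            # The last access went to a new distinct address
            # (prob A = p_dist) or to an already-seen one
            # (prob B = 1 - p_dist).
            return (p_dist * prob_seq(d - 1, n - 1)
                    + (1 - p_dist) * prob_seq(d, n - 1))
        return prob_seq

    prob_seq = make_prob_seq(p_dist=0.3)
    print(prob_seq(2, 3))   # P(3 accesses touch 2 distinct lines)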
17
Outline
  • Model Assumptions
  • Definitions
  • Inductive Probability Model
  • Validation
  • Case Study
  • Conclusions

18
Validation
  • SESC simulator
  • Detailed CMP memory hierarchy
  • 14 co-schedules of benchmarks (SPEC2000 and Olden)
  • A co-schedule terminates when one app completes

19
Validation
Error = (PM - AM) / AM, where PM = predicted misses and
AM = actual (simulated) misses
  • Larger errors occur when the miss increase is very
    large
  • Overall, the model is accurate

20
Other Observations
  • Apps grouped by vulnerability to cache sharing impact
  • Highly vulnerable (mcf, gzip)
  • Not vulnerable (art, apsi, swim)
  • Somewhat / sometimes vulnerable (applu, equake,
    perlbmk, mst)
  • Prediction error
  • Very small, except for highly vulnerable apps
  • 3.9% (average), 25% (maximum)
  • Also small across different cache associativities
    and sizes

21
Outline
  • Model Assumptions
  • Definitions
  • Inductive Probability Model
  • Validation
  • Case Study
  • Conclusions

22
Case Study
  • Profile approximated by a geometric progression
  • F(cseq(1,·)) : F(cseq(2,·)) : F(cseq(3,·)) : ... :
    F(cseq(A,·)) = Z : Zr : Zr^2 : ... : Zr^(A-1)
  • Z: amplitude
  • r: ratio, 0 < r < 1
  • Larger r → larger working set
  • Impact of the interfering thread on the base thread?
  • Fix the base thread
  • Vary the interfering thread
  • Miss frequency = misses / time
  • Reuse frequency = hits / time
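A short sketch of this setup; the helper name geometric_profile and
the working-set proxy below are our own, while Z, r, and the profile
shape are from the slide.

    def geometric_profile(Z, r, assoc):
        # F(cseq(d, .)) = Z * r**(d - 1) for d = 1 .. A
        return {d: Z * r ** (d - 1) for d in range(1, assoc + 1)}

    for r in (0.5, 0.9):
        F = geometric_profile(Z=1000, r=r, assoc=8)
        total = sum(F.values())
        # Average number of distinct addresses per circular sequence:
        # a rough proxy for working-set size. Larger r shifts weight
        # toward large d, i.e., a larger working set.
        avg_d = sum(d * f for d, f in F.items()) / total
        print(f"r = {r}: average d per cseq = {avg_d:.2f}")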

23
Base Thread: r = 0.5 (Small Working Set)
  • Base thread
  • Not vulnerable to the interfering thread's miss
    frequency
  • Vulnerable to the interfering thread's reuse
    frequency

24
Base Thread: r = 0.9 (Large Working Set)
  • Base thread
  • Vulnerable to the interfering thread's miss and reuse
    frequencies

25
Outline
  • Model Assumptions
  • Definitions
  • Inductive Probability Model
  • Validation
  • Case Study
  • Conclusions

26
Conclusions
  • New inter-thread cache contention models
  • Simple to use
  • Input: circular sequence profile per thread
  • Output: number of misses per thread in a
    co-schedule
  • Accurate
  • 3.9% average error
  • Useful
  • Temporal reuse patterns → cache sharing impact
  • Future work
  • Predict and avoid problematic co-schedules
  • Release the tool at http://www.cesr.ncsu.edu/solihin