Title: Predicting Inter-Thread Cache Contention on a Chip Multi-Processor Architecture
Predicting Inter-Thread Cache Contention on a Chip Multi-Processor Architecture
Dhruba Chandra, Fei Guo, Seongbeom Kim, Yan Solihin
Electrical and Computer Engineering, North Carolina State University
HPCA-2005
Cache Sharing in CMP
[Diagram: Processor Core 1 and Processor Core 2, each with a private L1 cache, sharing a single L2 cache]
Impact of Cache Space Contention
- Application-specific (what)
- Co-schedule-specific (when)
- Significant: up to 4X cache misses, 65% IPC reduction
Related Work
- Uniprocessor miss estimation:
  - Cascaval et al., LCPC 1999
  - Chatterjee et al., PLDI 2001
  - Fraguela et al., PACT 1999
  - Ghosh et al., TOPLAS 1999
  - J. Lee et al., HPCA 2001
  - Vera and Xue, HPCA 2002
  - Wassermann et al., SC 1997
- Context switch impact on a time-shared processor:
  - Agarwal, ACM Trans. on Computer Systems, 1989
  - Suh et al., ICS 2001
- No model for cache sharing impact
  - Relatively new phenomenon (SMT, CMP)
  - Many possible access interleaving scenarios
Contributions
- Inter-thread cache contention models:
  - 2 heuristic models (refer to the paper)
  - 1 analytical model
- Input: circular sequence profile for each thread
- Output: predicted number of cache misses per thread in a co-schedule
- Validation:
  - Against a detailed CMP simulator
  - 3.9% average error for the analytical model
- Insight:
  - Temporal reuse patterns → impact of cache sharing
Outline
- Model Assumptions
- Definitions
- Inductive Probability Model
- Validation
- Case Study
- Conclusions
Assumptions
- One circular sequence profile per thread
  - An average profile yields high prediction accuracy
  - Phase-specific profiles may improve accuracy
- LRU replacement algorithm
  - Other policies are usually LRU approximations
- Threads do not share data
  - Mostly true for serial apps
  - Parallel apps: threads are likely to be impacted uniformly
Outline
- Model Assumptions
- Definitions
- Inductive Probability Model
- Validation
- Case Study
- Conclusions
Definitions
- seqX(dX, nX): a sequence of nX accesses to dX distinct addresses, made by thread X to the same cache set
- cseqX(dX, nX) (circular sequence): a sequence in which the first and the last accesses are to the same address
- Example stream: A B C D A E E B
  - A B C D A is cseq(4,5); E E is cseq(1,2); B C D A E E B is cseq(5,7)
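A minimal sketch, in Python, of how such a circular sequence profile could be collected from one thread's access trace to a single cache set (the function name and the dict-based profile format are mine; the paper's profiling tool may differ):

    from collections import defaultdict

    def circular_sequence_profile(trace):
        """Map (d, n) -> count of circular sequences cseq(d, n) in a
        thread's access trace to one cache set."""
        profile = defaultdict(int)
        last_pos = {}  # address -> index of its most recent access
        for i, addr in enumerate(trace):
            if addr in last_pos:
                # Accesses from the previous occurrence of addr through
                # this one form a circular sequence.
                window = trace[last_pos[addr]: i + 1]
                profile[(len(set(window)), len(window))] += 1
            last_pos[addr] = i
        return profile

    # The stream above: A...A is cseq(4,5), E E is cseq(1,2), B...B is cseq(5,7).
    print(circular_sequence_profile(list("ABCDAEEB")))
    # {(4, 5): 1, (1, 2): 1, (5, 7): 1}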
Circular Sequence Properties
- Thread X runs alone in the system:
  - Given a circular sequence cseqX(dX, nX), the last access is a cache miss iff dX > Assoc
- Thread X shares the cache with thread Y:
  - If a sequence of intervening accesses seqY(dY, nY) occurs during cseqX(dX, nX)'s lifetime, the last access of thread X is a miss iff dX + dY > Assoc
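Both properties reduce to one LRU survival test: the reused block stays in the set iff the distinct blocks touched during the circular sequence's lifetime fit in the associativity. A sketch (the helper name and the dY=0 convention for running alone are mine):

    def last_access_hits(dX, dY=0, assoc=4):
        """Under LRU, the last access of cseqX(dX, nX) hits iff the dX
        distinct blocks from thread X plus the dY distinct blocks from
        intervening thread-Y accesses fit in the set."""
        return dX + dY <= assoc

    print(last_access_hits(dX=4))        # True: hit when X runs alone
    print(last_access_hits(dX=2, dY=3))  # False: miss with seqY(3,4) intervening
    print(last_access_hits(dX=2, dY=2))  # True: hit with seqY(2,3) intervening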
Example
- Assume a 4-way associative cache
- With no cache sharing, A is a cache hit
- With cache sharing, is A a cache hit or a miss?
Example
- Assume a 4-way associative cache
- A U B V V W A: seqY(3,4) intervenes in cseqX's lifetime, so dX + dY = 2 + 3 > 4 and the final A is a miss
- A U B V V A W: seqY(2,3) intervenes in cseqX's lifetime, so dX + dY = 2 + 2 ≤ 4 and the second A is a hit
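The two interleavings can be replayed with a small LRU set simulation to confirm these outcomes (simulate_lru_set is an illustrative helper, not the paper's tooling):

    from collections import OrderedDict

    def simulate_lru_set(accesses, assoc=4):
        """Simulate one cache set under LRU; return (address, 'hit'/'miss')
        for each access in order."""
        lru = OrderedDict()  # keys kept in LRU -> MRU order
        results = []
        for addr in accesses:
            if addr in lru:
                lru.move_to_end(addr)        # refresh recency
                results.append((addr, 'hit'))
            else:
                if len(lru) >= assoc:
                    lru.popitem(last=False)  # evict the LRU block
                lru[addr] = None
                results.append((addr, 'miss'))
        return results

    # seqY(3,4) = U V V W intervening: 2 + 3 > 4, the final A misses.
    print(simulate_lru_set(list("AUBVVWA"))[-1])  # ('A', 'miss')
    # seqY(2,3) = U V V intervening: 2 + 2 <= 4, the second A hits.
    print(simulate_lru_set(list("AUBVVAW"))[5])   # ('A', 'hit')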
Outline
- Model Assumptions
- Definitions
- Inductive Probability Model
- Validation
- Case Study
- Conclusions
Inductive Probability Model
- For each cseqX(dX, nX) of thread X, compute Pmiss(cseqX): the probability that its last access is a miss
- Steps:
  - Compute E(nY): the expected number of intervening accesses from thread Y during cseqX's lifetime
  - For each possible dY, compute P(seq(dY, E(nY))): the probability of occurrence of seq(dY, E(nY))
  - If dY + dX > Assoc, add P(seq(dY, E(nY))) to Pmiss(cseqX)
- Misses = old_misses + Σ Pmiss(cseqX) × F(cseqX), where F(cseqX) is the frequency of cseqX in the profile
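These steps fit together as in the sketch below. The profile and P(seq(dY, nY)) are taken as inputs, and scaling E(nY) by the threads' relative access rates is an assumption of this sketch, not necessarily the paper's exact formula:

    def expected_extra_misses(profile_X, rate_X, rate_Y, p_seq_Y, assoc=4):
        """Estimate thread X's additional misses due to cache sharing.
        profile_X maps (dX, nX) -> F(cseqX); p_seq_Y(dY, nY) gives the
        probability that nY intervening Y accesses touch dY distinct blocks."""
        extra = 0.0
        for (dX, nX), freq in profile_X.items():
            if dX > assoc:
                continue  # already a miss without sharing (in old_misses)
            # E(nY): expected intervening Y accesses during cseqX's lifetime
            # (assumed proportional to the threads' access rates).
            e_nY = round((nX - 1) * rate_Y / rate_X)
            p_miss = sum(p_seq_Y(dY, e_nY)
                         for dY in range(1, e_nY + 1)
                         if dX + dY > assoc)
            extra += p_miss * freq  # Pmiss(cseqX) x F(cseqX)
        return extra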
Computing P(seq(dY, E(nY)))
- Basic Idea:
  - P(seq(d, n)) = A × P(seq(d-1, n-1)) + B × P(seq(d, n-1))
  - A: transition probability of 1 access to a distinct address (seq(d-1, n-1) → seq(d, n))
  - B: transition probability of 1 access to a non-distinct address (seq(d, n-1) → seq(d, n))
  - Detailed steps in the paper
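A memoized version of this induction, with the paper's profile-derived transition probabilities reduced to a single assumed function p_distinct(d) (the chance that the next access goes to a new address after d distinct ones have been seen):

    from functools import lru_cache

    def make_p_seq(p_distinct):
        @lru_cache(maxsize=None)
        def p_seq(d, n):
            if d < 1 or n < d:
                return 0.0  # impossible: need at least d accesses
            if d == 1 and n == 1:
                return 1.0  # base case: one access, one address
            a = p_distinct(d - 1)    # seq(d-1, n-1) -> seq(d, n)
            b = 1.0 - p_distinct(d)  # seq(d, n-1)   -> seq(d, n)
            return a * p_seq(d - 1, n - 1) + b * p_seq(d, n - 1)
        return p_seq

    # Toy transition model, for illustration only.
    p_seq = make_p_seq(lambda d: 0.5 ** d)
    print(p_seq(2, 3))                          # 0.625
    print(sum(p_seq(d, 3) for d in (1, 2, 3)))  # 1.0: sanity check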
Outline
- Model Assumptions
- Definitions
- Inductive Probability Model
- Validation
- Case Study
- Conclusions
Validation
- SESC simulator
  - Detailed CMP memory hierarchy
- 14 co-schedules of benchmarks (SPEC2K and Olden)
- Co-schedule terminated when an app completes
Validation
- Error = (PM − AM) / AM, where PM is the predicted and AM the actual (simulated) number of misses
- Larger error occurs when the miss increase is very large
- Overall, the model is accurate
Other Observations
- Apps grouped by how vulnerable they are to cache sharing impact:
  - Highly vulnerable (mcf, gzip)
  - Not vulnerable (art, apsi, swim)
  - Somewhat / sometimes vulnerable (applu, equake, perlbmk, mst)
- Prediction error:
  - Very small, except for highly vulnerable apps
  - 3.9% (average), 25% (maximum)
  - Also small across different cache associativities and sizes
Outline
- Model Assumptions
- Definitions
- Inductive Probability Model
- Validation
- Case Study
- Conclusions
Case Study
- Profile approximated by a geometric progression:
  - F(cseq(1,*)) : F(cseq(2,*)) : F(cseq(3,*)) : ... : F(cseq(A,*)) = Z : Zr : Zr^2 : ... : Zr^(A-1)
  - Z: amplitude
  - r: ratio, 0 < r < 1
  - Larger r → larger working set
- Impact of the interfering thread on the base thread?
  - Fix the base thread
  - Vary the interfering thread
  - Miss frequency = misses / time
  - Reuse frequency = hits / time
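A sketch of the case-study profile (geometric_profile is my own helper; it just materializes the progression above):

    def geometric_profile(Z, r, assoc):
        """F(cseq(d, *)) = Z * r**(d-1) for d = 1..assoc; larger r shifts
        weight toward long reuse distances, i.e. a larger working set."""
        return {d: Z * r ** (d - 1) for d in range(1, assoc + 1)}

    print(geometric_profile(Z=1000, r=0.5, assoc=4))  # small working set
    print(geometric_profile(Z=1000, r=0.9, assoc=4))  # large working set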
Base Thread: r = 0.5 (Small Working Set)
- The base thread is:
  - Not vulnerable to the interfering thread's miss frequency
  - Vulnerable to the interfering thread's reuse frequency
Base Thread: r = 0.9 (Large Working Set)
- The base thread is:
  - Vulnerable to both the interfering thread's miss and reuse frequency
Outline
- Model Assumptions
- Definitions
- Inductive Probability Model
- Validation
- Case Study
- Conclusions
Conclusions
- New inter-thread cache contention models
- Simple to use:
  - Input: circular sequence profile per thread
  - Output: number of misses per thread in co-schedules
- Accurate:
  - 3.9% average error
- Useful:
  - Temporal reuse patterns → cache sharing impact
- Future work:
  - Predict and avoid problematic co-schedules
  - Release the tool at http://www.cesr.ncsu.edu/solihin