Title: Predicting Inter-Thread Cache Contention on a Chip Multi-Processor Architecture
Predicting Inter-Thread Cache Contention on a Chip Multi-Processor Architecture
Dhruba Chandra, Fei Guo, Seongbeom Kim, Yan Solihin
Electrical and Computer Engineering, North Carolina State University
HPCA-2005
Cache Sharing in CMP
[Diagram: Processor Core 1 and Processor Core 2, each with a private L1 cache, sharing a single L2 cache]
Impact of Cache Space Contention
- Application-specific (what)
- Co-schedule-specific (when)
- Significant: up to 4X cache misses, 65% IPC reduction
Related Work
- Uniprocessor miss estimation:
  - Cascaval et al., LCPC 1999
  - Chatterjee et al., PLDI 2001
  - Fraguela et al., PACT 1999
  - Ghosh et al., TOPLAS 1999
  - J. Lee et al., HPCA 2001
  - Vera and Xue, HPCA 2002
  - Wassermann et al., SC 1997
- Context switch impact on a time-shared processor:
  - Agarwal, ACM Trans. on Computer Systems, 1989
  - Suh et al., ICS 2001
- No model for cache sharing impact
  - Relatively new phenomenon (SMT, CMP)
  - Many possible access interleaving scenarios
Contributions
- Inter-thread cache contention models:
  - 2 heuristic models (refer to the paper)
  - 1 analytical model
- Input: circular sequence profile for each thread
- Output: predicted number of cache misses per thread in a co-schedule
- Validation:
  - Against a detailed CMP simulator
  - 3.9% average error for the analytical model
- Insight:
  - Temporal reuse patterns → impact of cache sharing
Outline
- Model Assumptions
- Definitions
- Inductive Probability Model
- Validation
- Case Study
- Conclusions
Assumptions
- One circular sequence profile per thread
  - An average profile yields high prediction accuracy
  - Phase-specific profiles may improve accuracy
- LRU replacement algorithm
  - Other policies are usually LRU approximations
- Threads do not share data
  - Mostly true for serial apps
  - Parallel apps: threads are likely to be impacted uniformly
Outline
- Model Assumptions
- Definitions
- Inductive Probability Model
- Validation
- Case Study
- Conclusions
Definitions
- seqX(dX, nX): a sequence of nX accesses to dX distinct addresses, made by thread X to the same cache set
- cseqX(dX, nX) (circular sequence): a sequence in which the first and the last accesses are to the same address
- Example stream: A B C D A E E B
  - A B C D A is cseq(4,5); E E is cseq(1,2); B C D A E E B is cseq(5,7)
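A minimal sketch, in Python, of how such a circular sequence profile could be collected from one thread's access trace to a single cache set (the function name and the dict-based profile format are mine; the paper's profiling tool may differ):

    from collections import defaultdict

    def circular_sequence_profile(trace):
        """Map (d, n) -> count of circular sequences cseq(d, n) in a
        thread's access trace to one cache set."""
        profile = defaultdict(int)
        last_pos = {}  # address -> index of its most recent access
        for i, addr in enumerate(trace):
            if addr in last_pos:
                # Accesses from the previous occurrence of addr through
                # this one form a circular sequence.
                window = trace[last_pos[addr]: i + 1]
                profile[(len(set(window)), len(window))] += 1
            last_pos[addr] = i
        return profile

    # The stream above: A...A is cseq(4,5), E E is cseq(1,2), B...B is cseq(5,7).
    print(circular_sequence_profile(list("ABCDAEEB")))
    # {(4, 5): 1, (1, 2): 1, (5, 7): 1}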
Circular Sequence Properties
- Thread X runs alone in the system:
  - Given a circular sequence cseqX(dX, nX), the last access is a cache miss iff dX > Assoc
- Thread X shares the cache with thread Y:
  - If a sequence of intervening accesses seqY(dY, nY) occurs during cseqX(dX, nX)'s lifetime, the last access of thread X is a miss iff dX + dY > Assoc
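Both properties reduce to one LRU survival test: the reused block stays in the set iff the distinct blocks touched during the circular sequence's lifetime fit in the associativity. A sketch (the helper name and the dY=0 convention for running alone are mine):

    def last_access_hits(dX, dY=0, assoc=4):
        """Under LRU, the last access of cseqX(dX, nX) hits iff the dX
        distinct blocks from thread X plus the dY distinct blocks from
        intervening thread-Y accesses fit in the set."""
        return dX + dY <= assoc

    print(last_access_hits(dX=4))        # True: hit when X runs alone
    print(last_access_hits(dX=2, dY=3))  # False: miss with seqY(3,4) intervening
    print(last_access_hits(dX=2, dY=2))  # True: hit with seqY(2,3) intervening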
Example
- Assume a 4-way associative cache
- With no cache sharing, A is a cache hit
- With cache sharing, is A a cache hit or a miss?
Example
- Assume a 4-way associative cache
- A U B V V W A: seqY(3,4) intervenes in cseqX's lifetime, so dX + dY = 2 + 3 > 4 and the final A is a miss
- A U B V V A W: seqY(2,3) intervenes in cseqX's lifetime, so dX + dY = 2 + 2 ≤ 4 and the second A is a hit
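The two interleavings can be replayed with a small LRU set simulation to confirm these outcomes (simulate_lru_set is an illustrative helper, not the paper's tooling):

    from collections import OrderedDict

    def simulate_lru_set(accesses, assoc=4):
        """Simulate one cache set under LRU; return (address, 'hit'/'miss')
        for each access in order."""
        lru = OrderedDict()  # keys kept in LRU -> MRU order
        results = []
        for addr in accesses:
            if addr in lru:
                lru.move_to_end(addr)        # refresh recency
                results.append((addr, 'hit'))
            else:
                if len(lru) >= assoc:
                    lru.popitem(last=False)  # evict the LRU block
                lru[addr] = None
                results.append((addr, 'miss'))
        return results

    # seqY(3,4) = U V V W intervening: 2 + 3 > 4, the final A misses.
    print(simulate_lru_set(list("AUBVVWA"))[-1])  # ('A', 'miss')
    # seqY(2,3) = U V V intervening: 2 + 2 <= 4, the second A hits.
    print(simulate_lru_set(list("AUBVVAW"))[5])   # ('A', 'hit')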
Outline
- Model Assumptions
- Definitions
- Inductive Probability Model
- Validation
- Case Study
- Conclusions
Inductive Probability Model
- For each cseqX(dX, nX) of thread X, compute Pmiss(cseqX): the probability that its last access is a miss
- Steps:
  - Compute E(nY): the expected number of intervening accesses from thread Y during cseqX's lifetime
  - For each possible dY, compute P(seq(dY, E(nY))): the probability of occurrence of seq(dY, E(nY))
  - If dY + dX > Assoc, add P(seq(dY, E(nY))) to Pmiss(cseqX)
- Misses = old_misses + Σ Pmiss(cseqX) × F(cseqX), where F(cseqX) is the frequency of cseqX in the profile
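These steps fit together as in the sketch below. The profile and P(seq(dY, nY)) are taken as inputs, and scaling E(nY) by the threads' relative access rates is an assumption of this sketch, not necessarily the paper's exact formula:

    def expected_extra_misses(profile_X, rate_X, rate_Y, p_seq_Y, assoc=4):
        """Estimate thread X's additional misses due to cache sharing.
        profile_X maps (dX, nX) -> F(cseqX); p_seq_Y(dY, nY) gives the
        probability that nY intervening Y accesses touch dY distinct blocks."""
        extra = 0.0
        for (dX, nX), freq in profile_X.items():
            if dX > assoc:
                continue  # already a miss without sharing (in old_misses)
            # E(nY): expected intervening Y accesses during cseqX's lifetime
            # (assumed proportional to the threads' access rates).
            e_nY = round((nX - 1) * rate_Y / rate_X)
            p_miss = sum(p_seq_Y(dY, e_nY)
                         for dY in range(1, e_nY + 1)
                         if dX + dY > assoc)
            extra += p_miss * freq  # Pmiss(cseqX) x F(cseqX)
        return extra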
Computing P(seq(dY, E(nY)))
- Basic Idea:
  - P(seq(d, n)) = A × P(seq(d-1, n-1)) + B × P(seq(d, n-1))
  - A: transition probability of 1 access to a distinct address (seq(d-1, n-1) → seq(d, n))
  - B: transition probability of 1 access to a non-distinct address (seq(d, n-1) → seq(d, n))
  - Detailed steps in the paper
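A memoized version of this induction, with the paper's profile-derived transition probabilities reduced to a single assumed function p_distinct(d) (the chance that the next access goes to a new address after d distinct ones have been seen):

    from functools import lru_cache

    def make_p_seq(p_distinct):
        @lru_cache(maxsize=None)
        def p_seq(d, n):
            if d < 1 or n < d:
                return 0.0  # impossible: need at least d accesses
            if d == 1 and n == 1:
                return 1.0  # base case: one access, one address
            a = p_distinct(d - 1)    # seq(d-1, n-1) -> seq(d, n)
            b = 1.0 - p_distinct(d)  # seq(d, n-1)   -> seq(d, n)
            return a * p_seq(d - 1, n - 1) + b * p_seq(d, n - 1)
        return p_seq

    # Toy transition model, for illustration only.
    p_seq = make_p_seq(lambda d: 0.5 ** d)
    print(p_seq(2, 3))                          # 0.625
    print(sum(p_seq(d, 3) for d in (1, 2, 3)))  # 1.0: sanity check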
Outline
- Model Assumptions
- Definitions
- Inductive Probability Model
- Validation
- Case Study
- Conclusions
Validation
- SESC simulator
  - Detailed CMP memory hierarchy
- 14 co-schedules of benchmarks (SPEC2K and Olden)
- Co-schedule terminated when an app completes
Validation
- Error = (PM − AM) / AM, where PM is the predicted and AM the actual (simulated) number of misses
- Larger error occurs when the miss increase is very large
- Overall, the model is accurate
Other Observations
- Apps grouped by how vulnerable they are to cache sharing impact:
  - Highly vulnerable (mcf, gzip)
  - Not vulnerable (art, apsi, swim)
  - Somewhat / sometimes vulnerable (applu, equake, perlbmk, mst)
- Prediction error:
  - Very small, except for highly vulnerable apps
  - 3.9% (average), 25% (maximum)
  - Also small across different cache associativities and sizes
Outline
- Model Assumptions
- Definitions
- Inductive Probability Model
- Validation
- Case Study
- Conclusions
Case Study
- Profile approximated by a geometric progression:
  - F(cseq(1,*)) : F(cseq(2,*)) : F(cseq(3,*)) : ... : F(cseq(A,*)) = Z : Zr : Zr^2 : ... : Zr^(A-1)
  - Z: amplitude
  - r: ratio, 0 < r < 1
  - Larger r → larger working set
- Impact of the interfering thread on the base thread?
  - Fix the base thread
  - Vary the interfering thread
  - Miss frequency = misses / time
  - Reuse frequency = hits / time
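A sketch of the case-study profile (geometric_profile is my own helper; it just materializes the progression above):

    def geometric_profile(Z, r, assoc):
        """F(cseq(d, *)) = Z * r**(d-1) for d = 1..assoc; larger r shifts
        weight toward long reuse distances, i.e. a larger working set."""
        return {d: Z * r ** (d - 1) for d in range(1, assoc + 1)}

    print(geometric_profile(Z=1000, r=0.5, assoc=4))  # small working set
    print(geometric_profile(Z=1000, r=0.9, assoc=4))  # large working set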
Base Thread: r = 0.5 (Small Working Set)
- The base thread is:
  - Not vulnerable to the interfering thread's miss frequency
  - Vulnerable to the interfering thread's reuse frequency
Base Thread: r = 0.9 (Large Working Set)
- The base thread is:
  - Vulnerable to both the interfering thread's miss and reuse frequency
Outline
- Model Assumptions
- Definitions
- Inductive Probability Model
- Validation
- Case Study
- Conclusions
Conclusions
- New inter-thread cache contention models
- Simple to use:
  - Input: circular sequence profile per thread
  - Output: number of misses per thread in co-schedules
- Accurate:
  - 3.9% average error
- Useful:
  - Temporal reuse patterns → cache sharing impact
- Future work:
  - Predict and avoid problematic co-schedules
  - Release the tool at http://www.cesr.ncsu.edu/solihin