Title: Line Distillation
1 Line Distillation: Increasing Cache Capacity by Filtering Unused Words in Cache Lines
Moinuddin K. Qureshi, M. Aater Suleman, Yale N. Patt
HPCA 2007
2 Introduction
- Caches are organized at linesize granularity
- Helps when spatial locality is high
- Leaves unused words when spatial locality is low
- Unused words occupy space without contributing to cache hits
- Filtering unused words allows the cache to store more cache lines
3 Problem: Not all words are useful
- Cache line (64B) divided into 8 words of 8B each
- (1 MB 8-way L2 cache)
[Figure: words used per line (avg)]
On average, fewer than 60% of the words in a line are used (4.7 of 8)
4 Goal: Improving cache performance
- Smaller linesize can result in fewer unused words
- Smaller linesize degrades cache performance
- Linesize of 32B increases MPKI for 14 of 16 benchmarks
- Average MPKI increases by 25%
Goal: Improve cache performance by filtering unused words
Insight: Word usage stabilizes as a line traverses from MRU to LRU
5 Insight
Footprint: 8 bits per line that track word usage
[Figure: max recency position before footprint update]
Most footprint updates occur early in the recency stack
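The footprint described above can be sketched as a per-line bitmask. This is a minimal illustration, assuming 8B words in a 64B line as stated on the Problem slide; the function names `touch` and `used_words` are hypothetical and not taken from the talk.

```python
WORDS_PER_LINE = 8   # 64B line / 8B words, as on the slides
WORD_BYTES = 8

def touch(footprint: int, byte_offset: int) -> int:
    """Return the footprint with the bit for the accessed word set."""
    word = (byte_offset // WORD_BYTES) % WORDS_PER_LINE
    return footprint | (1 << word)

def used_words(footprint: int) -> list[int]:
    """List the word indices whose footprint bit is set."""
    return [w for w in range(WORDS_PER_LINE) if footprint & (1 << w)]

fp = 0
for offset in (0, 4, 56):   # two accesses to word 0, one access to word 7
    fp = touch(fp, offset)
print(f"{fp:08b}", used_words(fp))
```

Only the words listed by `used_words` would be candidates for retention once the line leaves the line-organized part of the cache.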
6 Outline
- Background
- Line Distillation
- Experimental Evaluation
- Interaction with Compression
- Related Work and Summary
7 Framework for LDIS
[Figure: the L2 cache is organized as a Distill Cache consisting of a Line Organized Cache (LOC) and a Word Organized Cache (WOC)]
8 Distill Cache (Operation)
- Four cases
- Cache Miss: access to line D. Install line D in LOC and update LRU state.
- LOC Hit: access to line B. Same as traditional cache.
- WOC Hit: access to line A (word A0). Send A0 and A7 to L1, along with valid bits.
- Hole Miss: access to line A (word A1). Invalidate all words of A in WOC, fetch A from memory, and install it in LOC.
[Figure: a traditional 4-way cache set ordered from MRU to LRU, shown next to a distill cache set with its LOC and WOC; lines A, B, C, and D appear in the example, and only words A0 and A7 of line A were used]
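The four cases can be sketched as a toy model of a single distill-cache set. This is an illustration only: the class name `DistillSet`, the oldest-first WOC eviction, and the word-count capacity are assumptions made for the sketch, and the model does not track WOC recency or the valid bits sent to L1.

```python
from collections import OrderedDict

class DistillSet:
    """One set of a distill cache: LOC holds whole lines, WOC holds used words."""

    def __init__(self, loc_ways=6, woc_words=16):
        self.loc = OrderedDict()   # tag -> used word indices; last item is MRU
        self.woc = OrderedDict()   # tag -> valid word indices; first item is oldest
        self.loc_ways = loc_ways
        self.woc_words = woc_words

    def _distill(self, tag, used):
        # Assumption: drop the oldest WOC entries until the used words fit.
        if len(used) > self.woc_words:
            return
        while sum(map(len, self.woc.values())) + len(used) > self.woc_words:
            self.woc.popitem(last=False)
        self.woc[tag] = set(used)

    def _install_in_loc(self, tag, word):
        if len(self.loc) >= self.loc_ways:
            victim, used = self.loc.popitem(last=False)   # evict the LRU line
            self._distill(victim, used)                   # keep only its used words
        self.loc[tag] = {word}

    def access(self, tag, word):
        if tag in self.loc:                    # LOC hit: same as a traditional cache
            self.loc[tag].add(word)
            self.loc.move_to_end(tag)
            return "LOC hit"
        if tag in self.woc:
            if word in self.woc[tag]:          # WOC hit: requested word is present
                return "WOC hit"
            del self.woc[tag]                  # hole miss: invalidate, then refetch
            self._install_in_loc(tag, word)
            return "hole miss"
        self._install_in_loc(tag, word)        # cache miss
        return "cache miss"

s = DistillSet(loc_ways=2, woc_words=4)
results = [s.access(t, w) for t, w in
           [("A", 0), ("A", 7), ("B", 0), ("C", 0), ("A", 0), ("A", 1)]]
print(results)
```

In this trace, line A is evicted from the LOC when C arrives, so only words 0 and 7 of A survive in the WOC; the later access to word 1 of A is therefore a hole miss.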
9 Median Threshold Filtering
A line with many used words can evict several lines from the WOC
[Figure: WOC example in which line X has all 8 words used and 8 lines are evicted from the WOC]
Increase the number of lines in the WOC by not installing lines for which used words > threshold K
K = median number of words used per LOC line (computed at runtime)
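The filtering rule can be sketched as follows. The sketch recomputes an exact median over a list of footprints, which is an assumption made for clarity; how the median is tracked at runtime in hardware is not specified here, and the function names are hypothetical.

```python
from statistics import median

def popcount(footprint: int) -> int:
    """Number of used words recorded in an 8-bit footprint."""
    return bin(footprint).count("1")

def median_threshold(loc_footprints) -> float:
    """K = median number of used words across the given LOC lines."""
    return median(popcount(fp) for fp in loc_footprints)

def should_install_in_woc(evicted_footprint: int, k: float) -> bool:
    """Install only if the evicted line does not exceed the threshold K."""
    return popcount(evicted_footprint) <= k

loc = [0b00000001, 0b00000011, 0b11111111, 0b00001111, 0b00000111]
k = median_threshold(loc)   # used-word counts are 1, 2, 8, 4, 3
print(k, should_install_in_woc(0b11111111, k), should_install_in_woc(0b00000101, k))
```

A line with all 8 words used is filtered out here, so it cannot displace several smaller entries from the WOC.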
11 Methodology
- Configuration
- L2 cache 1MB 8-way 64B linesize
- (Distill cache gives 6 ways to LOC and 2 ways to WOC)
- Out-of-order processor with 16KB 2-way L1s
- 400-cycle memory
- Benchmarks
- 15 SPEC2K benchmarks + health from the Olden suite
- (A 250M-instruction slice using SimPoint for SPEC2K)
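The geometry implied by this configuration can be worked out with a few lines of arithmetic. The 8B word size comes from the Problem slide; the variable names are illustrative only.

```python
CACHE_BYTES = 1 << 20        # 1MB L2
WAYS = 8
LINE_BYTES = 64
WORD_BYTES = 8               # word size from the Problem slide
LOC_WAYS, WOC_WAYS = 6, 2    # distill cache split from the Methodology slide

sets = CACHE_BYTES // (WAYS * LINE_BYTES)
words_per_line = LINE_BYTES // WORD_BYTES
woc_words_per_set = WOC_WAYS * words_per_line

print(sets, words_per_line, woc_words_per_set)
```

Each set therefore holds 6 full lines in the LOC plus 16 word slots in the WOC, so the number of distinct lines a set can track depends on how many words each distilled line keeps.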
12 Results
[Figure: (%) reduction in L2 MPKI]
LDIS (MT) reduces MPKI by 25%
13 Reverter Circuit (RC)
- Tournament selection: distill cache vs. traditional cache
- Dynamic set sampling with 32 sets [Qureshi, ISCA06]
(storage overhead of ATD: 1KB)
For sets A, C, D, F, H: if (SCTR > 75) enable LDIS; if (SCTR < 25) disable LDIS
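The tournament selection can be sketched with a saturating counter and two thresholds. Several details here are assumptions, not facts from the slides: the counter range of 0 to 100 (chosen so the thresholds 75 and 25 apply directly), the update rule (count up when only the distill cache hits in a sampled set, down when only the traditional tag directory hits), and the initial enabled state. The class name `Reverter` is hypothetical.

```python
class Reverter:
    """Saturating counter with hysteresis that enables or disables LDIS."""

    def __init__(self, max_value=100, enable_above=75, disable_below=25):
        self.max_value = max_value          # assumed counter range: 0..100
        self.enable_above = enable_above
        self.disable_below = disable_below
        self.sctr = max_value // 2          # assumed starting point
        self.ldis_enabled = True            # assumed initial state

    def observe_sampled_set(self, distill_hit: bool, traditional_hit: bool) -> bool:
        # Only accesses where the two organizations disagree move the counter.
        if distill_hit and not traditional_hit:
            self.sctr = min(self.max_value, self.sctr + 1)
        elif traditional_hit and not distill_hit:
            self.sctr = max(0, self.sctr - 1)
        # Between the thresholds the previous decision is kept.
        if self.sctr > self.enable_above:
            self.ldis_enabled = True
        elif self.sctr < self.disable_below:
            self.ldis_enabled = False
        return self.ldis_enabled

rc = Reverter()
for _ in range(26):                         # traditional cache wins 26 times
    rc.observe_sampled_set(distill_hit=False, traditional_hit=True)
print(rc.sctr, rc.ldis_enabled)
```

The gap between the two thresholds keeps the decision from flipping on every small change in the counter.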
14 Results with RC
[Figure: (%) reduction in L2 MPKI for LDIS (MT, No RC) and LDIS (MT, RC)]
RC disables LDIS when it increases MPKI. LDIS (MT, RC) reduces MPKI by 30%
15 Overheads
- Storage
- Tags for WOC + footprint bits: 12.2% overhead
- Latency
- Tag-access (LOC+WOC) increases by one cycle
- WOC hits incur two cycles to rearrange words
- Power
- Additional power of WOC tag-store
16 IPC Results
[Figure: (%) IPC improvement]
LDIS improves average IPC by 12%
18 Compression vs. LDIS
- Several proposals to increase capacity via compression
- Compression and LDIS are fundamentally different
- Compression exploits redundancy in stored data
- LDIS leverages unused words for spare capacity
- Footprint Aware Compression (FAC) combines both
- FAC compresses used words before installing in WOC
19 Results for FAC
[Figure: (%) reduction in L2 MPKI for Compression, LDIS, and FAC]
Compression and LDIS interact positively. FAC reduces MPKI by 50%
21 Related work
- Spatial-Temporal Cache [Gonzalez, ICS95]
- Spatial Locality Prediction [Johnson, ISCA97]
- Variable Linesize Cache [Veidenbaum, ICS99]
- Spatial Footprint Prediction [Kumar, ISCA98; Pujara, HPCA06]
- Spatial Pattern Prediction [Chen, HPCA05]
LDIS is particularly suited for large caches and outperforms predictor-based techniques without requiring a separate structure for tracking spatial footprint
22 Contributions
- Line Distillation: Filter unused words without a separate footprint predictor
- Distill cache: Utilize extra capacity created by LDIS
- Median Threshold Filtering and Reverter Circuit: Improve performance and robustness of LDIS
- Result: LDIS (MT+RC) reduces MPKI by 30%
- Footprint Aware Compression: LDIS + compression
- Result: FAC reduces MPKI by 50%
23 Questions
24 Result comparing capacity
25 Line Size vs. MPKI
26 Distribution of Hit-Miss
27 Average word usage (detailed)
28 Result for 3 types of LDIS
29 Replacement
- LRU in LOC
- WOC needs variable-sized replacement
- Only power-of-two sizes allowed in WOC
- Placement constrained to alignment boundary
- Random selection in case of multiple candidates
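The WOC placement constraints can be sketched as follows. The sketch assumes 16 word slots per set (2 WOC ways of 8 words) and treats every aligned position as an equally valid candidate; a real design may narrow the candidates further, for example by preferring invalid slots. All function names are hypothetical.

```python
import random

def slot_size(used_words: int) -> int:
    """Round the number of used words up to a power of two (1, 2, 4, or 8)."""
    size = 1
    while size < used_words:
        size *= 2
    return size

def candidate_offsets(size: int, total_slots: int = 16) -> list[int]:
    """Aligned starting positions for a block of `size` word slots."""
    return list(range(0, total_slots, size))

def choose_placement(used_words: int, total_slots: int = 16, rng=random):
    """Pick a power-of-two block at a random aligned offset."""
    size = slot_size(used_words)
    return size, rng.choice(candidate_offsets(size, total_slots))

size, offset = choose_placement(3, rng=random.Random(0))
print(size, offset, candidate_offsets(4))
```

A line with 3 used words occupies a 4-slot block under these rules, so one slot is wasted in exchange for simpler placement.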
30 Background (pictorial)
31 Result: LDIS vs. FAC (detailed)
32 Comparison with SFP
33 Appendix A: Other SPEC Benchmarks
34 Appendix B: Cache Size vs. Density
35 Summary
- Many words in cache lines remain unused
- Unused words are unlikely to be accessed in the less recent part of the LRU stack → Line Distillation (LDIS)
- Distill cache utilizes extra capacity created by LDIS
- LDIS reduces MPKI by 30% and improves IPC by 12%
- Footprint Aware Compression combines LDIS and compression to reduce MPKI by 50%