A Low-Power Instruction Cache Architecture Exploiting Program Execution Footprints

1
A Low-Power Instruction Cache Architecture
Exploiting Program Execution Footprints
Koji Inoue and Kazuaki Murakami
Kyushu University
2
Introduction
Increase in cache size
Power consumed in on-chip caches (% of total):
  • DEC 21164 CPU: 25%
  • StrongARM SA-110 CPU: 43%
  • Bipolar ECL CPU: 50%

Kamble et al., "Analytical Energy Dissipation Models for Low Power Caches," ISLPED '97
Jouppi et al., "A 300-MHz 115-W 32-b Bipolar ECL Microprocessor," IEEE Journal of Solid-State Circuits, '93
3
Breakdown of Cache Energy
Word: 64 bits, Cache Size: 64 KB, Line Size: 64 B
Energy consumed in cache: Ecache = Edecode + Esram + Eio
[Figure: breakdown of Esram per access into Etag (tag bit-lines), Edata (data bit-lines), and others, for a cache with subbanking of the data memory (cache lines) next to the tag memory. Horizontal axis: # of words in a subbank entry (total # of subbanks).]
This calculation is based on Kamble et al., "Analytical Energy Dissipation Models for Low Power Caches," ISLPED '97
4
History-Based Tag-Comparison (HBTC) Instruction Cache -Motivation-
Hit rate of the instruction cache (I-$) is quite HIGH!
Most of the tag comparisons result in a HIT
5
Can We Know Existence of Instructions in Cache
without Tag-Comparison?
YES!
Consider the case where both of the following hold:
  • An instruction has been executed at least once.
  • No cache miss has occurred since the last
    execution of the instruction.

Then we know that the instruction exists in the cache
without any tag comparison.
6
But, How?
Keep track of instruction execution by leaving footprints in the BTB!
  1. Execute an instruction block A at time T. Leave the execution footprint in the corresponding BTB entry.
  2. (If a cache miss occurs, then erase all the footprints.)
  3. Try to execute the instruction block A at time T+X. If the footprint is detected in the BTB, then omit the tag comparisons for all the instructions in A!
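The three steps above can be sketched as a small Python model. This is only an illustration under stated assumptions, not the authors' hardware: the names `FootprintTracker`, `fetch_block`, and `miss` are hypothetical, a plain set stands in for the footprint bits kept in the BTB, and the sketch assumes a block's footprint is recorded once its tag comparisons have been performed, even if one of them missed.

```python
class FootprintTracker:
    """Toy model of execution footprints (hypothetical names throughout)."""

    def __init__(self):
        self.footprints = set()  # blocks executed since the last cache miss
        self.performed = 0       # tag comparisons actually performed
        self.omitted = 0         # tag comparisons skipped

    def fetch_block(self, block, n_instructions, miss=False):
        """Fetch one instruction block; return True if tag checks were omitted.

        `miss` is only consulted when tags are compared: a block that still
        has a footprint is in the cache by construction.
        """
        if block in self.footprints:
            # Step 3: footprint found, omit every tag comparison in the block.
            self.omitted += n_instructions
            return True
        self.performed += n_instructions
        if miss:
            # Step 2: a miss may have evicted anything, so erase all footprints.
            self.footprints.clear()
        # Step 1: leave a footprint for the block that was just executed.
        self.footprints.add(block)
        return False


tracker = FootprintTracker()
tracker.fetch_block("A", 4)             # first execution: tags compared
tracker.fetch_block("A", 4)             # footprint present: tags omitted
tracker.fetch_block("B", 2, miss=True)  # miss: all earlier footprints erased
tracker.fetch_block("A", 4)             # footprint gone: tags compared again
```

Erasing every footprint on a miss is conservative: it never skips a comparison that could have failed, at the cost of re-checking blocks that were not actually evicted.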

7
HBTC I-$ Architecture
[Figure: block diagram of the I-$ and the BTB. Labels: instruction block (top, tail), branch instruction address, target address, not-taken path, branch prediction result, and the "Tag comparison enable?" signal.]
  • EFT: Execution Footprint on Taken
  • EFN: Execution Footprint on Not-taken
  • TCO: Tag Comparison Omitting flag

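A minimal sketch of how the EFT, EFN, and TCO flags could interact, assuming (this is not stated on the slide) that the predicted direction selects which footprint is copied into TCO and that the footprint for that direction is left at the same moment. `BTBEntry`, `on_branch`, and `on_cache_miss` are hypothetical names, and misprediction recovery is not modeled.

```python
from dataclasses import dataclass


@dataclass
class BTBEntry:
    branch_addr: int   # address of the branch instruction
    target_addr: int   # top of the instruction block on the taken path
    eft: bool = False  # Execution Footprint on Taken
    efn: bool = False  # Execution Footprint on Not-taken


def on_branch(entry, predicted_taken):
    """Return the TCO flag for the next block and leave a footprint.

    Assumption: TCO takes the old footprint of the predicted direction,
    and that footprint is then set to 1.
    """
    if predicted_taken:
        tco, entry.eft = entry.eft, True
    else:
        tco, entry.efn = entry.efn, True
    return tco


def on_cache_miss(btb):
    """Erase all footprints after an instruction cache miss."""
    for entry in btb:
        entry.eft = False
        entry.efn = False


entry = BTBEntry(branch_addr=0x100, target_addr=0x40)
first = on_branch(entry, predicted_taken=True)   # no footprint yet
second = on_branch(entry, predicted_taken=True)  # footprint found
on_cache_miss([entry])                           # both footprints erased
```

Keeping separate bits for the taken and not-taken paths lets one BTB entry track two instruction blocks, one starting at the target address and one following the branch.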
8
Operation Example
[Figure: execution flow over time for iterations 1 to 7 through instruction blocks A (top), B (branch to F), C (branch to A), D (branch to A), and F (top), drawn next to the state of the Branch Target Buffer with EFT, EFN, and TCO columns. BTB annotations are labeled "(iteration count)-(address of branch)", e.g. 1-C.]
9-21
Operation Example (animation frames of the figure on slide 8)
Slide  BTB annotation added         Tag comparison
9      (BTB state not shown)        Performing
10     1-C (Branch-C)               Performing
11     (no change)                  Performing
12     2-C (Branch-C)               Performing
13     (no change)                  Omitting
14     3-C (Branch-C)               Omitting
15     (no change)                  Omitting
16     4-C (Branch-C)               Omitting
17     (no change)                  Performing
18     4-D (Branch-D)               Performing
19     (no change)                  Performing
20     5-C, "EFN of Branch-C"       Performing
21     (no change)                  Omitting
22
Evaluation
Simulation Results
[Figure: normalized tag-comparison count per benchmark.]
Integer programs: 099.go, 124.m88ksim, 126.gcc, 129.compress, 130.li, 132.ijpeg, 134.perl, 147.vortex
FP programs: 102.swim, 107.mgrid, 110.applu, 125.turb3d, 141.apsi
Simulator: SimpleScalar. Cache size: 32 KB. Block size: 32 B. Branch predictor: 2-bit counter. # of BPT entries: 2 K. # of BTB entries: 2 K. BTB associativity: 4. RAS: 8.
23
Conclusions
History-Based Tag-Comparison Instruction Cache
  1. Exploits execution footprints recorded in BTB.
  2. Reduces tag-comparison count.
  3. Reduces tag-comparison count by 99% (107.mgrid).

Future work
  • Analyze energy consumption with more accurate
    cache-energy models.
  • Evaluate performance with cycle-based simulation.

24
Backup Slides (History-Based Tag-Comparison Cache)
25
Outline
  • Introduction
  • History-Based Tag-Comparison Cache
  • Motivation
  • Mechanism
  • Architecture
  • Operation
  • Evaluations
  • Conclusions

26
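Conventional Direct-Mapped Cache
Ecache = Edecode + Esram + Eio, where Esram includes Etag and Edata
[Figure: direct-mapped cache. The reference address is split into tag, index, and offset. The index selects a tag in the tag memory and a line in the data memory; the stored tag is compared with the address tag to produce "Hit?", and the offset selects the word data.]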
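The tag/index/offset split and the per-access tag comparison can be sketched as follows. The 32 KB cache and 32 B line sizes are taken from the evaluation settings in this deck; the names `split_address` and `DirectMappedCache` are hypothetical, and the model tracks only tags, not data.

```python
CACHE_SIZE = 32 * 1024                    # bytes
LINE_SIZE = 32                            # bytes per cache line
NUM_LINES = CACHE_SIZE // LINE_SIZE       # 1024 lines
OFFSET_BITS = LINE_SIZE.bit_length() - 1  # 5
INDEX_BITS = NUM_LINES.bit_length() - 1   # 10


def split_address(addr):
    """Split a reference address into (tag, index, offset)."""
    offset = addr & (LINE_SIZE - 1)
    index = (addr >> OFFSET_BITS) & (NUM_LINES - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


class DirectMappedCache:
    """Conventional cache model: one tag comparison on every access."""

    def __init__(self):
        self.tags = [None] * NUM_LINES
        self.tag_comparisons = 0

    def access(self, addr):
        tag, index, _ = split_address(addr)
        self.tag_comparisons += 1  # the comparison HBTC tries to omit
        hit = self.tags[index] == tag
        if not hit:
            self.tags[index] = tag  # refill the line on a miss
        return hit


cache = DirectMappedCache()
cache.access(0x1000)               # cold miss
cache.access(0x1004)               # hit in the same line
cache.access(0x1000 + CACHE_SIZE)  # same index, different tag: conflict miss
```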
27-47
History-Based Tag-Comparison Cache -Operation Example-
[Animation frames of the operation example: iterations 1 to 7 through instruction blocks A (top), B (branch to F), C (branch to A), D (branch to A), and F (top), with the EFT, EFN, and TCO state. Slides 27 to 39 repeat the frames of slides 9 to 21; slides 40 to 47 continue the sequence.]
Slide  BTB annotation added         Tag comparison
40     5-D (Branch-D)               Omitting
41     (no change)                  Omitting
42     6-C (Branch-C)               Omitting
43     (no change)                  Omitting
44     6-D (Branch-D)               Omitting
45     (no change)                  Omitting
46     7-B (Branch-B)               Performing
47     "RCN of Branch-C" (label)    Performing, then Omitting
48
Low Power Caches -Reducing both Etag and Edata-
Adding a small L0 cache (between the processor and the L1 cache)
  • Filter Cache
  • S-Cache
  • Block Buffering

Dividing cache module
  • MDM Cache

Multiple accessing: sequential way-access (way0 to way3)
  • MRU Cache
  • Hash-Rehash Cache

49
Low Power Caches -Reducing Edata-
Dividing cache module (tag and line)
  • Cache Sub-Banking

Accessing sequentially (tag, then line; figure labels: Hit!, Miss!, Replace)
  • Phased Cache
  • Pipelined Cache

50
Low Power Caches -Reducing Etag-
Conditional Tag Compare
  • Inter-Line Tag Comparison

Flow between successive instructions i and j:
  • Intra-line sequential flow: consecutive addresses, same cache line
  • Intra-line non-sequential flow: non-consecutive addresses, same cache line
  • Inter-line sequential flow: consecutive addresses, different cache lines
  • Inter-line non-sequential flow: non-consecutive addresses, different cache lines

Perform tag comparison only on inter-line flows
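The four flow types can be told apart with two checks, sketched below. The 32 B line size matches the evaluation settings in this deck; the 4-byte instruction size and the function names `classify_flow` and `needs_tag_comparison` are assumptions made for illustration.

```python
LINE_SIZE = 32  # bytes per cache line
INST_SIZE = 4   # bytes per instruction (assumed fixed-length ISA)


def classify_flow(i, j):
    """Classify the flow from instruction address i to its successor j."""
    same_line = (i // LINE_SIZE) == (j // LINE_SIZE)
    sequential = j == i + INST_SIZE
    return (
        "intra-line" if same_line else "inter-line",
        "sequential" if sequential else "non-sequential",
    )


def needs_tag_comparison(i, j):
    """Inter-line tag comparison: compare tags only when the line changes."""
    return classify_flow(i, j)[0] == "inter-line"


flows = [
    classify_flow(0x100, 0x104),  # next instruction in the same line
    classify_flow(0x100, 0x110),  # short jump inside the same line
    classify_flow(0x11C, 0x120),  # fall through into the next line
    classify_flow(0x100, 0x200),  # jump to a different line
]
```

Only the last two flows change the cache line, so only they require a tag comparison under this scheme.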
51
Breakdown of Esram
[Figure: breakdown of energy into Esram_tag_bit, Esram_data_bit, and Esram_others for a 32-bit CPU (CS 32 KB, LS 32 B) and a 64-bit CPU (CS 64 KB, LS 64 B). Horizontal axis: # of words in a subbank (total # of subbanks). CS: cache size, LS: line size.]
This calculation is based on Kamble et al., "Analytical Energy Dissipation Models for Low Power Caches," ISLPED '97
52
History-Based Tag-Comparison Cache -Operation Flow-
[Flowchart: on BTB access.]
53
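History-Based Tag-Comparison Cache -Operation Flow-
[Flowchart: on PC recovery. From "Start", the decisions "BTB update?", "Replacement?", and "Wrong prediction?" (Y/N branches) lead to actions on TCO, RCT, and RCN, then "Go to start". Action labels as extracted, with the assignment arrows lost: "TCO RCT", "TCO RCN", "RCN TCO", "RCT TCO", "1 RCN", "1 RCT".]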
54
Evaluations -Simulation Environment-
8 integer and 5 FP programs from SPEC95
[Figure: the SimpleScalar processor performs functional execution and produces address traces; the cache simulator, with a branch target buffer and a branch prediction table, reports the total count of tag comparisons.]
55
Evaluation -Cache Models-
  • C-TC (Conventional Tag-Comparison, base): perform tag comparison in every cache access
  • IL-TC (Interline Tag-Comparison): perform tag comparison only on inter-line flow
  • H-TC (History-based Tag-Comparison): perform tag comparison only when the TCO flag is 0
  • H-TCideal (History-based Tag-Comparison): nearly ideal H-TC (perfect instruction cache and fully associative BTB)
  • HIL-TC (History-based Interline Tag-Comparison): combination of IL-TC and H-TC
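As a rough illustration of how a normalized tag-comparison count is obtained, the sketch below counts comparisons for the C-TC and IL-TC models on a synthetic address trace. The trace (a 16-instruction loop run 10 times), the 4-byte instruction size, and the function names are made up for the example; H-TC is not modeled here because it also needs BTB state.

```python
LINE_SIZE = 32  # bytes per cache line, as in the evaluation settings


def count_c_tc(trace):
    """C-TC: one tag comparison for every instruction fetch."""
    return len(trace)


def count_il_tc(trace):
    """IL-TC: compare tags only when the fetch moves to a different line."""
    count = 0
    prev_line = None
    for addr in trace:
        line = addr // LINE_SIZE
        if line != prev_line:
            count += 1
        prev_line = line
    return count


# Synthetic trace: a loop of 16 four-byte instructions (two cache lines).
loop_body = list(range(0x1000, 0x1040, 4))
trace = loop_body * 10

normalized_il_tc = count_il_tc(trace) / count_c_tc(trace)
```

Normalizing against C-TC makes the models directly comparable: a value of 1.0 means no comparisons were saved.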
56
Evaluation-Simulation Results-
Normalized Total Count of Tag-Comparisons
57
Evaluation -Effect of BTB Associativity-
[Figure: normalized total count of tag comparisons for BTB associativity of 2, 8, 32, 128, 512, and 2048 ways, and for H-TCideal.]
58
Evaluation -Effect of Cache Size-
[Figure: normalized total count of tag comparisons for cache sizes of 4 KB, 8 KB, 16 KB, 32 KB, 64 KB, and 512 KB, for a perfect cache, and for H-TCideal.]
59
Evaluation -Energy Overhead-
[Figure: average # of erased footprints per I-fetch [bit] and average # of erased footprints per erase operation [bit]. One axis runs from 0.00 to 0.1.]