Title: PEEP: Exploiting Predictability of Memory Dependences in SMT Processors
1. PEEP: Exploiting Predictability of Memory Dependences in SMT Processors
- Samantika Subramaniam, Milos Prvulovic, Gabriel H. Loh
2. Simplified view of SMT execution
[Diagram: front-end with Icache feeding reservation stations and execution units]
- Stores per-thread state
- Enough work from all threads put together yields high throughput
3. Something bad happens
[Diagram: a producer instruction stalls in the execution units, backing up the reservation stations]
- A low-ILP thread eventually uses up the CPU resources
- Other, independent high-ILP threads are forced to stall
- This defeats the purpose of SMT
- Tackle the problem at the source: the FETCH UNIT
4. Previously proposed solution
ICOUNT (Instruction Count), Tullsen et al., ISCA 1996
- Count the number of instructions in the pipeline per thread
- Fetch policy: give less priority to the thread with more instructions in flight
[Diagram: clogged resources ("OOPS!") - the front-end reacts only after the reservation stations have already filled up]
REACTIVE EXCLUSION!
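As a concrete illustration, the ICOUNT priority rule above can be sketched in a few lines of Python; the thread names and in-flight counts here are made up for the example, not from the paper:

```python
# Minimal sketch of the ICOUNT fetch policy (Tullsen et al., ISCA 1996):
# each cycle, fetch from the thread with the fewest instructions in flight.

def icount_pick(in_flight: dict[str, int]) -> str:
    """Return the thread id with the lowest in-pipeline instruction count."""
    return min(in_flight, key=in_flight.get)

# Example: thread T1 has clogged the pipeline with 48 stalled instructions,
# so ICOUNT deprioritizes it -- but only *after* the clog has formed.
counts = {"T0": 12, "T1": 48, "T2": 9, "T3": 15}
print(icount_pick(counts))  # -> "T2"
```

The rule is purely reactive: T1 is punished only once its instruction count is already high, which is exactly the weakness the slide calls out.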
5. So can we do better?
[Diagram: an oracle in front of the fetch unit steers fetch away from threads that are about to stall]
PROACTIVE EXCLUSION!
6. Proactive Exclusion (PE) Strategies
- Load misses: Moursy et al., ISCA 2003
  - predicted load miss → GATE the thread
- MLP: Eyerman et al., HPCA 2007
  - all available MLP exposed → GATE the thread
7. A Brief Overview of Memory Dependences
[Diagram: LSQ holding ST 1 (addr 0xF023), LD 1 (addr 0xF380), ST 2 (addr unresolved), and LD 2 (addr 0xF060); a PC-indexed Memory Dependence Predictor marks LD 2 (PRED = 1) as dependent on the unresolved store]
- Memory dependences are predictable
- The predictor can indicate future stalls
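A minimal sketch of such a predictor, assuming a load-wait-table-style design (a PC-indexed table of wait bits; the table size, hashing, and training interface are illustrative assumptions):

```python
# Hedged sketch of a PC-indexed memory dependence predictor, in the spirit
# of a load wait table: a load's PC maps to a bit saying "this load has
# conflicted with an earlier store before, so it will likely stall on an
# unresolved store address again."

TABLE_SIZE = 1024  # illustrative size

class MemDepPredictor:
    def __init__(self):
        self.wait_bit = [False] * TABLE_SIZE

    def _index(self, pc: int) -> int:
        return pc % TABLE_SIZE

    def predict(self, load_pc: int) -> bool:
        """True = this load is predicted to depend on an in-flight store."""
        return self.wait_bit[self._index(load_pc)]

    def train(self, load_pc: int, conflicted: bool) -> None:
        """Set the wait bit when the load was stalled/squashed by a store."""
        if conflicted:
            self.wait_bit[self._index(load_pc)] = True

mdp = MemDepPredictor()
mdp.train(0xF060, conflicted=True)  # the load at 0xF060 once conflicted
print(mdp.predict(0xF060))          # -> True
print(mdp.predict(0xF380))          # -> False
```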
8. Proactive Exclusion using Memory Dependences
[Diagram: four threads T0-T3 at the fetch unit; ST and LD instructions flow per thread, and the thread with a predicted ST→LD dependence is gated]
- Learn ST-LD relationships: once "ST A; LD A" is observed, a later "ST ?; LD A" (store address still unresolved) predicts a stall
9. Starvation Problem with Proactive Exclusion
[Diagram: for threads T0-T3, the gated thread's instructions enter the reservation stations only well after its stall resolves, leaving the RS underutilized]
- Exclusion (under any strategy) can cause temporary STARVATION
- Especially bad for short-duration stalls!
10. Short Duration Stall
[Diagram: original schedule "ST A; LD A; ADD; SUB" vs. PE, where the Memory Dependence Predictor sees "ST ?; LD A", gates the thread, and the independent ADD/SUB are needlessly delayed]
11. Can we avoid starvation?
- With PE based on memory dependences, we can
[Diagram: the Memory Dependence Predictor entry for the load at 0xF060 is augmented with an observed stall delay of 20 cycles]
12. Delay Predictor Details
[Diagram: Memory Dependence Predictor entry indexed by PC, with PRED = 1 and DELAY = 20]
- Conservative: maximum observed delay
- Aggressive: last observed delay
- Adaptive: average of the last n observed delays
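The three strategies above can be sketched as follows; the history length n and the observe/predict interface are assumptions for illustration:

```python
# Sketch of the three delay-prediction strategies named on the slide:
# conservative (max observed), aggressive (last observed), and
# adaptive (average of the last n observed delays).

from collections import deque

class DelayPredictor:
    def __init__(self, strategy: str, n: int = 4):
        self.strategy = strategy
        self.history = deque(maxlen=n)  # recent observed stall delays
        self.max_delay = 0

    def observe(self, delay: int) -> None:
        self.history.append(delay)
        self.max_delay = max(self.max_delay, delay)

    def predict(self) -> int:
        if not self.history:
            return 0
        if self.strategy == "conservative":  # maximum observed delay
            return self.max_delay
        if self.strategy == "aggressive":    # last observed delay
            return self.history[-1]
        # adaptive: average of the last n observed delays
        return sum(self.history) // len(self.history)

p = DelayPredictor("adaptive", n=4)
for d in (10, 20, 30, 20):
    p.observe(d)
print(p.predict())  # -> 20
```

Conservative never under-predicts a stall it has seen before, at the cost of over-gating; aggressive tracks the most recent behavior; adaptive smooths over recent history.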
13. How does this help us?
[Diagram: original schedule "ST A; LD A; ADD; SUB" vs. PE with the Memory Dependence Predictor; with delay information, gating is applied only until the store address resolves, so the independent ADD/SUB are not needlessly delayed]
- Choose an appropriate delay threshold
14. Performance Impact of Delay Information: Phase 1
[Diagram: the MDP (P = prediction bit, D = delay) predicts that LD 1 (to 0xF060) depends on ST 1, whose address is still unresolved, with an observed delay of 20 cycles; the dependent instructions wait in the reservation stations ahead of the execution units]
15. Phase 2
[Diagram: the predicted delay (20 cycles) exceeds the delay threshold (front-end depth = 5), so the front-end gates fetch from this thread]
16. PE without delay information: Phase 3
[Diagram: fetch restarts only when the stall resolves; with a front-end depth of 5, instructions enter the reservation stations 5 cycles after the 20-cycle stall, 25 cycles in total]
- Instructions enter the RS only after the stall resolves
17. PE with delay information: Phase 3
[Diagram: fetch restarts at cycle 15, 5 cycles before the predicted 20-cycle stall resolves, so instructions enter the reservation stations at cycle 20]
- Instructions enter the RS right in time as the stall resolves
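The timing arithmetic of the two cases above can be sketched directly, using the front-end depth of 5 and the 20-cycle stall from the slides:

```python
# Early-parole timing from the two Phase 3 slides: with a front-end depth
# of 5 and a predicted 20-cycle stall, restarting fetch 5 cycles before
# the stall resolves lets instructions reach the reservation stations
# exactly as the stall ends.

FE_DEPTH = 5
PREDICTED_DELAY = 20

def rs_arrival_without_delay_info(stall: int, depth: int) -> int:
    # Fetch restarts only when the stall resolves;
    # instructions arrive `depth` cycles later.
    return stall + depth

def rs_arrival_with_early_parole(stall: int, depth: int) -> int:
    # Fetch restarts (stall - depth) cycles into the stall;
    # instructions arrive `depth` cycles after that.
    restart = max(stall - depth, 0)
    return restart + depth

print(rs_arrival_without_delay_info(PREDICTED_DELAY, FE_DEPTH))  # -> 25
print(rs_arrival_with_early_parole(PREDICTED_DELAY, FE_DEPTH))   # -> 20
```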
18. What does this give us? PEEP
- Proactive Exclusion
  - When a memory dependence stall is predicted
- Avoid starvation
  - Ignore short stalls
- Give the thread a head start
  - Restart fetch of the gated thread a few cycles before the stall resolves
  - Early Parole!
PROACTIVE EXCLUSION AND EARLY PAROLE
19. PEEP In Our Context
[Diagram: the Memory Dependence and Delay Predictor predicts a 20-cycle stall; the gated thread's fetch restarts after (predicted delay - FE pipeline depth) = 15 cycles]
20. Simulation Parameters
- Aggressive four-way SMT processor
- MDP modeled on a Load Wait Table
- SPEC2000, MediaBench and other benchmarks
- 32 four-thread application mixes evaluated
- Application classification:
  - S: sensitive to memory dependences
  - N: non-sensitive to memory dependences
  - L: low ILP; M: medium ILP; H: high ILP
21. Proactive Exclusion Strategies
(S: sensitive, N: non-sensitive, L: low ILP, M: medium ILP, H: high ILP)
- PE using memory dependences shows a 13% speedup
- Maximum benefit comes from mixes of both sensitive (S) and non-sensitive (N) threads
- With all-sensitive threads, all PE strategies perform comparably
22. PEEP
17% speedup
- PEEP using delay prediction outperforms both the MLP-based and the memory-dependence-only PE strategies
- With all-sensitive threads, PEEP does better since it can predict stall durations accurately
- PEEP with an oracle-based MDP shows a performance speedup of 19%
23. 2-threaded Workloads
12% speedup
- Fewer threads → fewer opportunities to fetch from non-stalled threads
- A 12% speedup over 25 application mixes shows there is potential benefit even in a 2-way SMT
- An Intel simulator shows an 8% speedup over 150 application mixes
24. Relationship with OOO Load Scheduling
- Hypothesis: the performance benefit is purely due to a more efficient fetch policy based on a highly predictable attribute
- Experiment: run PEEP on a processor without OOO memory scheduling, so the prediction is used only for controlling the fetch policy
- Result: avg. speedup over ICOUNT is 17% (same as PEEP!)
- Conclusion: memory dependences are a very good indicator of future stalls; even a machine without load reordering benefits from predicting these stalls
25. Why does it work so well?
[Diagram: LMP vs. PEEP on the same stream LD 1, ST 1, LD 2, LD 3, LD 4, showing how differently each policy fills the reservation stations]
26. Why does it work so well? (continued)
[Diagram: LMP vs. PEEP vs. MLP gating decisions on the stream LD 1, ST 1, LD 2, ADD, SUB, and the resulting reservation station occupancy]
- PEEP can expose more ILP
27. Key Points
- We need a mechanism for efficient resource management in SMT: improve the fetch unit
- Memory dependences and their associated latencies are predictable
- Proactively Exclude bad threads, but give them Early Parole to avoid temporary starvation
- Performance improvements on both 4-way and 2-way SMT machines
28. Thank You (www.cc.gatech.edu/samantik)
[Cartoon: a queue of gated LD instructions asking "When will I get paroled?"]
29. B1: Sensitivity Analysis
30. [Charts: sensitivity to predictor size and delay threshold]
31. B2: PEEP
17.3% speedup
- Memory dependences are a very good indicator of future stalls
- Performance shows that PEEP works because it leverages knowledge of future stalls to improve instruction fetch
32. B3: Fairness
19% speedup
- Speedup is computed as the harmonic mean of weighted IPCs
- Since all PE strategies run on top of ICOUNT, they inherit its fairness
- SDS (standard deviation of speedup) is 0.17 for PEEP and 0.11 for ICOUNT
33. B4: OOO memory scheduling on an SMT machine
34. B5: Accuracy of the MDP
35. B6: Delays associated with PEEP
36. B7: Delay Predictors
- Conservative: maximum observed delay
- Aggressive: last observed delay
- Adaptive: average of the last n observed delays
37. B8: Simulator Configuration
38. 4-threaded mixes
39. 2-threaded mixes