Title: Transparent Threads
1 TRANSPARENT THREADS
- Gautham K. Dorai and
- Dr. Donald Yeung
- ECE Dept., Univ. of Maryland, College Park
2 SMT Processors
- Multiple threads share one pipeline
- Fetch priority mechanisms: ICOUNT, IQPOSN, MISSCOUNT, BRCOUNT
  - Tullsen, ISCA-96
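ICOUNT (Tullsen, ISCA-96) gives fetch priority to the threads with the fewest instructions in the decode, rename, and instruction-queue stages. The Python sketch below models only that ordering rule; `ThreadState`, `pre_issue`, and `icount_order` are hypothetical names, ties are broken by thread id as an assumption, and IQPOSN, MISSCOUNT, and BRCOUNT are not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ThreadState:
    """Hypothetical per-thread fetch state."""
    tid: int
    pre_issue: int  # ICOUNT: instructions in decode, rename, and the issue queues


def icount_order(threads: List[ThreadState]) -> List[int]:
    """Return thread ids from highest to lowest fetch priority under ICOUNT.

    Fewer pre-issue instructions means higher priority; ties go to the lower
    thread id (an assumption, not part of the policy as stated on the slide).
    """
    ranked = sorted(threads, key=lambda t: (t.pre_issue, t.tid))
    return [t.tid for t in ranked]


# Thread 1 has the fewest pre-issue instructions, so it fetches first.
print(icount_order([ThreadState(0, 12), ThreadState(1, 3), ThreadState(2, 7)]))  # prints [1, 2, 0]
```

Because the count drops as instructions issue, ICOUNT naturally favors threads that are moving through the pipeline quickly.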
3 Individual Threads Run Slower!
- Chart: individual thread performance (31% degradation)
4 Single-Thread Performance
- Multiprogramming (process scheduling)
- Subordinate threading (prefetching/pre-execution, cache management, branch prediction, etc.)
- Performance monitoring (dynamic profiling)
5 Transparent Threads
- Foreground thread
- Background thread (transparent)
- 0% slowdown of the foreground thread
6 Single-Thread Performance
- Multiprogramming
- Latency of critical high-priority process
- Subordinate Threading
- Performance Monitoring
- Benefit vs Cost (Overhead) Tradeoff
7 Road Map
- Motivation
- Transparent Threads
- Experimental Evaluation
- Transparent Software Prefetching
- Conclusion
8 Shared vs. Private
- Transparency: no stealing of shared resources
- Diagram: PC, Fetch Queue, Register Map, Issue Queues, Register File, Functional Units, ROB, I-Cache, D-Cache, Predictor
9 Slots, Buffers and Memories
- Slots: allocation based on the current cycle only
- Buffers: allocation based on future cycles
- Memories: allocation based on future cycles
- Diagram: PC, Fetch Queue, Register Map, Issue Queues, Issue Units, Register File, ROB, I-Cache, D-Cache, Predictor
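10 Slot Prioritization
- ICOUNT(background) ← ICOUNT(background) + Inst-Window Size
- Because the foreground thread's ICOUNT cannot exceed the instruction-window size, this bias gives the foreground thread fetch priority; the background thread fetches only into slots the foreground thread leaves unused
- ICOUNT 2.N fetch policy: up to 2 threads fetch each cycle
- Diagram: foreground and background I-cache blocks filling the fetch slots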
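A minimal sketch of slot prioritization under an ICOUNT 2.N fetch policy, assuming N fetch slots per cycle that the two chosen threads fill in priority order, each limited by how many instructions its current I-cache block supplies. The function `fill_fetch_slots` and its parameters are hypothetical; the only rule taken from the slide is the bias ICOUNT(background) + Inst-Window Size.

```python
from typing import Dict, List, Set


def fill_fetch_slots(
    icount: Dict[int, int],       # tid -> ICOUNT (pre-issue instruction count)
    block_len: Dict[int, int],    # tid -> instructions available in its I-cache block
    background: Set[int],         # tids of transparent (background) threads
    window_size: int,             # instruction-window size used as the bias
    threads_per_cycle: int = 2,   # the "2" in ICOUNT 2.N
    n_slots: int = 8,             # the "N" in ICOUNT 2.N
) -> List[int]:
    """Return the thread id occupying each filled fetch slot this cycle."""

    def priority(tid: int) -> int:
        # Slot prioritization: ICOUNT(background) + Inst-Window Size.
        return icount[tid] + (window_size if tid in background else 0)

    chosen = sorted(icount, key=lambda t: (priority(t), t))[:threads_per_cycle]
    slots: List[int] = []
    for tid in chosen:
        # Each chosen thread fills as many remaining slots as its block allows.
        take = min(block_len[tid], n_slots - len(slots))
        slots.extend([tid] * take)
    return slots


# Background thread 1 has the smaller raw ICOUNT but still fetches second,
# taking only the 3 slots the 5-instruction foreground block leaves free.
print(fill_fetch_slots({0: 20, 1: 2}, {0: 5, 1: 8}, {1}, window_size=128))
# prints [0, 0, 0, 0, 0, 1, 1, 1]
```

When the foreground block fills all N slots, the background thread gets nothing that cycle, which is what keeps slot usage transparent.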
13 Buffer Transparency
- Diagram: fetch hardware (PC1, PC2) feeding the fetch queue, issue queue, and ROB, which hold both foreground and background instructions
14 Background Thread Window Partitioning
- Limit on background thread instructions
- Stops background fetch when its ICOUNT reaches the limit
- Diagram: fetch hardware (PC1, PC2); partition of the fetch queue, issue queue, and ROB between foreground and background
15 Background Thread Window Partitioning
- Foreground thread can occupy all available entries
- Diagram: fetch hardware (PC1, PC2); partition of the fetch queue, issue queue, and ROB between foreground and background
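The partitioning rule can be sketched as a per-cycle fetch gate. This assumes the limit is compared against the background thread's ICOUNT, as the slide states, and defaults to the 32-entry limit used in the evaluation; `fetch_gate` is a hypothetical name.

```python
from typing import Dict, Set


def fetch_gate(icount: Dict[int, int], background: Set[int], limit: int = 32) -> Set[int]:
    """Return the threads allowed to fetch this cycle.

    Background threads stop fetching once their ICOUNT reaches `limit`;
    the foreground thread is never gated, so it may use every free entry.
    """
    return {tid for tid, count in icount.items() if tid not in background or count < limit}


print(sorted(fetch_gate({0: 100, 1: 32}, {1})))  # prints [0]: background is at its limit
print(sorted(fetch_gate({0: 100, 1: 31}, {1})))  # prints [0, 1]
```

The asymmetry is the point of the mechanism: only the background thread is capped, so the foreground thread never loses window entries to it beyond the partition.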
16 Background Thread Flushing
- No limit on background thread
- Diagram: ROB, fetch queue, and issue queue with head and tail pointers; fetch hardware (PC1, PC2)
17 Background Thread Flushing
- No limit on background thread
- Flush triggered on conflict
- Diagram: ROB, fetch queue, and issue queue with head and tail pointers; fetch hardware (PC1, PC2)
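A minimal sketch of background thread flushing, assuming a "conflict" means the foreground thread needs ROB entries that are not free, and that the youngest background instructions (those nearest the tail) are flushed first so the background thread can later refetch from the oldest flushed instruction. `make_room_for_foreground` and the `(thread, seq)` entry format are hypothetical.

```python
from typing import List, Optional, Tuple

Entry = Tuple[str, int]  # hypothetical ROB entry: (thread, sequence number)


def make_room_for_foreground(rob: List[Entry], capacity: int, needed: int) -> Optional[int]:
    """Flush the youngest background entries until `needed` ROB slots are free.

    `rob` is ordered head (oldest) to tail (youngest) and is modified in place.
    Returns the background sequence number to refetch from, or None if no
    background entry was flushed.
    """
    restart = None
    while capacity - len(rob) < needed:
        # Scan from the tail for the youngest remaining background instruction.
        for idx in range(len(rob) - 1, -1, -1):
            if rob[idx][0] == "bg":
                restart = rob.pop(idx)[1]
                break
        else:
            break  # only foreground entries remain; the foreground thread waits
    return restart


rob = [("fg", 0), ("bg", 0), ("fg", 1), ("bg", 1), ("bg", 2)]
print(make_room_for_foreground(rob, capacity=5, needed=2))  # prints 1
print(rob)  # prints [('fg', 0), ('bg', 0), ('fg', 1)]
```

Flushing from the tail removes a suffix of the background thread's program order, so the background thread stays consistent and simply resumes fetching at the returned sequence number.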
19 Foreground Thread Flushing
- Instructions remain stagnant in the ROB behind a load miss at the head
- Flush triggered on load miss at head
- Diagram: ROB, fetch queue, and issue queue; fetch hardware (PC1, PC2)
20 Foreground Thread Flushing
- Load miss at the ROB head
- Flush F entries from the tail
- Block foreground fetch for T cycles
- Diagram: ROB, fetch queue, and issue queue with head and tail pointers; fetch hardware (PC1, PC2)
21 Foreground Thread Flushing
- After T cycles, allow fetch again
- F and T depend on R (residual cache latency)
- Diagram: load miss at the ROB head; fetch queue and issue queue; fetch hardware (PC1, PC2)
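The foreground flushing steps (load miss at the ROB head, flush F entries from the tail, block fetch for T cycles, then resume) can be sketched as follows. The slides say only that F and T depend on the residual cache latency R, so the mappings `flush_count` and `block_cycles` are passed in as hypothetical parameters; the functions used in the usage line are illustrative, not the values from the evaluation, and `foreground_flush` is a hypothetical name.

```python
from typing import Callable, List, Tuple


def foreground_flush(
    fg_rob: List[int],                    # foreground entries, head (index 0) to tail
    residual_latency: int,                # R: cycles until the head load's miss returns
    now: int,                             # current cycle
    flush_count: Callable[[int], int],    # hypothetical F(R)
    block_cycles: Callable[[int], int],   # hypothetical T(R)
) -> Tuple[List[int], int]:
    """Handle a load miss at the ROB head for the foreground thread.

    Flushes up to F entries from the tail (never the missing load at the head)
    and returns (flushed entries, cycle at which foreground fetch may resume).
    """
    f = max(0, min(flush_count(residual_latency), len(fg_rob) - 1))
    cut = len(fg_rob) - f
    flushed = fg_rob[cut:]
    del fg_rob[cut:]  # free the entries, e.g. for use by the background thread
    return flushed, now + block_cycles(residual_latency)


rob = list(range(10))  # entry 0 is the load that missed
flushed, resume = foreground_flush(rob, 50, now=100,
                                   flush_count=lambda r: r // 10,
                                   block_cycles=lambda r: r // 2)
print(flushed, resume)  # prints [5, 6, 7, 8, 9] 125
```

Tying F and T to R lets a long remaining miss release more entries for longer, while a miss that is about to return costs the foreground thread little.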
22 SimpleScalar-based SMT
23 Benchmark Suites
- Evaluate Transparency Mechanisms
- Transparent Software Prefetching
24 Transparency Mechanisms
- EP: Equal Priority
- SP: Slot Prioritization
- BP: Background Thread Window Partitioning (32 entries)
- BF: Background Thread Flushing
- PC: Private Caches
- PP: Private Predictor
25 Transparency Mechanisms
- Equal Priority: 30% slowdown
- Slot Prioritization: 16% slowdown
- Background Window Partitioning: 9% slowdown
- Background Thread Flushing: 3% slowdown
- Chart: EP, SP, BP, BF, PC, PP
26 Performance Mechanisms
- EP: Equal Priority
- 2B: ICOUNT 2.8
- 2F: ICOUNT 2.8 with Flushing
- 2P: Foreground Thread Window Partitioning (112 foreground / 32 background entries)
27 Performance Mechanisms
- Equal Priority: 31% degradation
- ICOUNT 2.8: 41% slower than EP
- ICOUNT 2.8 with Foreground Thread Flushing: 23% slower than EP
- Foreground Thread Window Partitioning: 13% slower than EP
- Chart: normalized IPC for 2B, 2F, 2P, EP
28 Transparent Software Prefetching
Conventional (profiling required):
    for (I = 0; I < N-PD; I += 8) {
        prefetch(&b[I]);
        b[I] = z + b[I];
    }
Transparent Software Prefetching:
    /* Computation thread */
    for (I = 0; I < N-PD; I += 8)
        b[I] = z + b[I];
    /* Transparent prefetch thread */
    for (I = 0; I < N-PD; I += 8)
        prefetch(&b[I]);
- Offload prefetch code to transparent threads
- Profitability of prefetching
  - Zero overhead: no profiling required
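The transformation above can be sketched by listing the operations each thread executes. This Python model is an illustration under simplifying assumptions (one operation per prefetch or update, with the slide's loop bound N-PD and stride of 8); `conventional`, `transparent`, and `loop_indices` are hypothetical names. It shows the point of the slide: the prefetches leave the foreground instruction stream entirely, while the transparent thread still prefetches every element the computation thread updates.

```python
from typing import List, Tuple

Op = Tuple[str, int]  # (operation, array index)


def loop_indices(n: int, pd: int, step: int = 8) -> List[int]:
    """Indices visited by: for (I = 0; I < N-PD; I += 8)."""
    return list(range(0, n - pd, step))


def conventional(n: int, pd: int) -> Tuple[List[Op], List[Op]]:
    """One thread: prefetch and update interleaved in the same loop."""
    fg: List[Op] = []
    for i in loop_indices(n, pd):
        fg.append(("prefetch", i))
        fg.append(("update", i))  # b[I] = z + b[I]
    return fg, []  # no background thread


def transparent(n: int, pd: int) -> Tuple[List[Op], List[Op]]:
    """Updates stay in the computation thread; prefetches move to the transparent thread."""
    idx = loop_indices(n, pd)
    return [("update", i) for i in idx], [("prefetch", i) for i in idx]


conv_fg, _ = conventional(64, 16)
tsp_fg, tsp_bg = transparent(64, 16)
print(len(conv_fg), len(tsp_fg), len(tsp_bg))  # prints 12 6 6
```

In the conventional version, every prefetch is an extra foreground instruction, which is why profiling is needed to decide where prefetching pays off; moving them to a transparent thread removes that cost from the foreground thread.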
29 Transparent Software Prefetching
- NP: No Prefetching
- PF: Naive Conventional Software Prefetching
- PS: Profiled Conventional Software Prefetching
- TSP: Transparent Software Prefetching
- Chart: normalized execution time for VPR
30 Transparent Software Prefetching
- Naïve software prefetching: 19.6% overhead, 0.8% performance gain
- Selective software prefetching: 14.13% overhead, 2.47% performance gain
- Transparent software prefetching: 1.38% overhead, 9.52% performance gain
- Chart: NP, PF, PS, TSP for VPR, BZIP, GAP, EQUAKE, ART, AMMP, IRREG
31 Conclusions
- 3% overhead on foreground thread
  - Less than 1% without cache and predictor contention
- Within 23% of Equal Priority
- Transparent Software Prefetching
  - 9.52% gain with 1.38% overhead
  - Eliminates the need for profiling
- Availability of spare bandwidth
  - Can be used transparently for interesting applications
32 Related Work
- Tullsen's work on flushing mechanisms
  - Tullsen, MICRO-2001
- Raasch's work on prioritization
  - Raasch, MTEAC Workshop 1999
- Snavely's work on job scheduling
  - Snavely, ICMM-2001
- Chappell's work on subordinate multithreading and Dubois's work on assisted execution
  - Chappell, ISCA-1999; Dubois, Tech Report, Oct. 1998
33 Foreground Thread Window Partitioning
- Advantages
  - Minimal guaranteed entries
- Disadvantages
  - Transparency minimized
- Diagram: fetch hardware (PC1, PC2); partition of the fetch queue and issue queue between foreground and background
35 Transparency Mechanisms
- Chart: EP, SP, BF
36 Transparency Mechanisms
- Chart: EP, SP, BF
37 Transparency Mechanisms
38 Transparent Software Prefetching
- Chart: NP, PF, PS, TSP, NF