Title: Thread-level Speculation
1Thread-level Speculation
- Presenter Minglong Shao
- Rong Yan
2Outline
- Motivation of Thread-level Speculation
- Design Issues
- Case study
- Summary
4Thread Level Parallelism
- Break up computation into threads
- Assign the threads to processors
[Figure: a Program split into Thread1, Thread2, and Thread3]
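The two steps above (break up the computation, assign the pieces to workers) can be sketched as follows. This is a minimal illustration, not something from the slides: `partial_sum` and `parallel_sum` are hypothetical names, and the summation workload is an assumption. CPython threads do not execute bytecode truly in parallel, so the sketch shows the decomposition rather than a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(chunk):
    """Work done by one thread: sum its own slice of the data."""
    return sum(chunk)


def parallel_sum(data, num_threads=3):
    """Break the computation into chunks and hand each chunk to a worker thread."""
    # Ceiling division so that every element lands in some chunk.
    size = max(1, (len(data) + num_threads - 1) // num_threads)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # pool.map assigns the chunks to the worker threads.
        return sum(pool.map(partial_sum, chunks))


total = parallel_sum(list(range(1, 101)))
```

The chunks here are fully independent, which is the easy case; the following slides deal with what happens when they are not.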
5How to deal with dependency
Original: i=1; j=2; a=3; b=c; ...
Thread i: i=1; j=2
Thread i+1: a=3; b=c
Vs.
Original: i=1; j=2; a=3; b=j; ...
Thread i: i=1; j=2
Thread i+1: a=3; b=j (later in Time: b=j reads the j written by Thread i)
6False Data Dependency
Sometimes the compiler cannot fully exploit the potential
parallelism
- Example pseudo-code
- while (continue_condition) {
- ...
- x = hash[index1];
- ...
- hash[index2] = y;
- ...
- }
Time
Thread i: hash[3], hash[10] ...
Thread i+1: hash[19], hash[21] ...
Do you have any ideas to obtain run-time
parallelism in this case?
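To make the point concrete, the sketch below checks which loop iterations truly depend on an earlier one once the indices are known. It assumes each iteration reads one hash-table slot and writes another, as in the pseudo-code, and it reads the slide's index pairs as (slot read, slot written), which is an interpretation. `find_true_dependences` is a hypothetical helper, not part of any TLS system.

```python
def find_true_dependences(accesses):
    """Return (earlier, later) iteration pairs where `later` reads what `earlier` wrote.

    accesses[k] is a (read_index, write_index) pair for loop iteration k.
    These indices are only known at run time, which is why the compiler
    cannot rule the dependence out statically.
    """
    deps = []
    for later, (read_idx, _) in enumerate(accesses):
        for earlier in range(later):
            if accesses[earlier][1] == read_idx:
                deps.append((earlier, later))
    return deps


# Iterations touching hash[3]/hash[10] and hash[19]/hash[21] never conflict.
independent = find_true_dependences([(3, 10), (19, 21)])
# A third iteration that reads hash[10] depends on the first one.
dependent = find_true_dependences([(3, 10), (19, 21), (10, 21)])
```

With most index pairs there is no conflict at all, so serializing every iteration to stay safe gives up parallelism that is available in practice.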
7Thread-Level Speculation
- What is thread level speculation?
- An approach that enables the compiler to create parallel threads
despite the existence of ambiguous data
dependences
[Figure labels: Compile Time, Run Time]
Parallelize without detection of dependency
8Thread-Level Speculation(Cont.)
Dynamically detect dependency at run time
Time
Processor 0, Processor 1, Processor 2
Thread 1: hash[3], hash[10] ...
Thread 2: hash[19], hash[21] ...
Thread 3: hash[10], hash[21] ...
Violation!
9Thread-Level Speculation(Cont.)
Dynamically detect dependency at run time
Time
Processor 0, Processor 1, Processor 2
Thread 1: hash[3], hash[10] ...
Thread 2: hash[19], hash[21] ...
Thread 3: hash[10], hash[21] ...
Violation!
Thread 4: hash[30], hash[40] ...
Thread 5: hash[50], hash[60] ...
Redo
Thread 3: hash[10], hash[21] ...
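The violation-and-redo behaviour can be imitated with a small sequential simulation. This is a simplified sketch under stated assumptions: every thread reads one slot and writes `value + 1` to another, all threads first execute against the pre-loop memory, and violations are checked at in-order commit. Real TLS hardware typically detects the violation when the conflicting store happens. `run_speculatively` and `run_sequentially` are hypothetical names.

```python
def run_sequentially(memory, epochs):
    """Reference result: execute the threads one after another."""
    memory = dict(memory)
    for read_addr, write_addr in epochs:
        memory[write_addr] = memory.get(read_addr, 0) + 1
    return memory


def run_speculatively(memory, epochs):
    """Execute all threads speculatively, then commit in order and redo violators."""
    memory = dict(memory)
    # Phase 1: every thread runs "in parallel" and only sees the initial memory.
    speculative_values = [memory.get(read_addr, 0) + 1 for read_addr, _ in epochs]
    # Phase 2: commit in program order, checking each thread's load.
    committed_writes = set()
    redone = []
    for thread_id, (read_addr, write_addr) in enumerate(epochs, start=1):
        if read_addr in committed_writes:
            # Violation: the speculative load saw a stale value, so re-execute.
            redone.append(thread_id)
            value = memory.get(read_addr, 0) + 1
        else:
            value = speculative_values[thread_id - 1]
        memory[write_addr] = value
        committed_writes.add(write_addr)
    return memory, redone


epochs = [(3, 10), (19, 21), (10, 21), (30, 40), (50, 60)]
final_memory, redone = run_speculatively({}, epochs)
```

With these index pairs only Thread 3 is re-executed, and the final memory matches the sequential result.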
11Design Issue
- Hardware/software must provide methods for
- -- Detecting the true memory dependences
- -- Backing up and re-executing instructions
- -- Buffering any data written during the
speculative region, for later committing /
discarding
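The three requirements above can be pictured as one small software data structure. This is an illustrative sketch only: `SpeculativeRegion` and its method names are hypothetical, memory is modelled as a plain dict, and a real design would implement this in caches or write buffers rather than in Python objects.

```python
class SpeculativeRegion:
    """Tracks one thread's speculative loads and stores."""

    def __init__(self, memory):
        self.memory = memory        # shared, committed state
        self.write_buffer = {}      # buffered speculative stores
        self.read_set = set()       # addresses loaded from committed memory
        self.violated = False

    def load(self, addr):
        if addr in self.write_buffer:
            # Forward from this thread's own buffered store.
            return self.write_buffer[addr]
        self.read_set.add(addr)     # remember the load for dependence detection
        return self.memory.get(addr, 0)

    def store(self, addr, value):
        self.write_buffer[addr] = value   # not visible to others until commit

    def notify_store(self, addr):
        """Called when a logically earlier thread writes `addr`."""
        if addr in self.read_set:
            self.violated = True    # a true dependence was broken

    def finish(self):
        """Commit the buffer if no violation occurred, otherwise discard it.

        Returns False when the caller has to re-execute the region.
        """
        ok = not self.violated
        if ok:
            self.memory.update(self.write_buffer)
        self.write_buffer.clear()
        self.read_set.clear()
        self.violated = False
        return ok
```

A caller would loop on the region body until `finish()` returns True, which corresponds to the "back up and re-execute" requirement.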
13Case study
- Multiscalar architecture: all the dynamic control
is performed by hardware at run time
- TLDS architecture: all thread control is handled
by software routines
- Hydra architecture (CMP): speculative write
buffers with a write-through coherence scheme
- Scalable speculation approach: applies to all kinds of
architectures; built on writeback invalidation-based cache
coherence
14Hydra Architecture
- Chip multiprocessor (CMP) with 4 MIPS processors
- Speculation coprocessor: executes software
exception handlers
- L1 data cache with write-through,
invalidation-based policy
- L2 cache with speculation write buffers
FOR MORE INFO...
Please refer to the paper "Data Speculation Support for a
Chip Multiprocessor"
15Data Cache Modification
16Downside of Hydra
- Only reasonable within a single chip
- Not scalable to larger systems, because of
- -- The write-through scheme
- -- Snooping the write buffers on every store
17Scalable Thread-level Speculation
- Built on writeback invalidation-based cache
coherence
- Scales to architectures of arbitrary size
FOR MORE INFO...
Refer to the paper "A Scalable Approach to
Thread-Level Speculation"
18Example
Time
Processor 2 (Epoch 6): become_speculative(), LOAD a = *p, attempt_commit()
Processor 1 (Epoch 5): STORE *q = 2
p = q = &x
L1 Cache (Epoch 5)
L1 Cache (Epoch 6)
Violation? FALSE
Violation? FALSE
20Example
Time
Processor 2 (Epoch 6): become_speculative(), LOAD a = *p, attempt_commit()
Processor 1 (Epoch 5): STORE *q = 2
p = q = &x
L1 Cache (Epoch 5)
L1 Cache (Epoch 6)
Violation? FALSE
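One way this example could play out on an invalidation-based protocol is sketched below. It assumes that each speculative load marks its cache line, that a store sends an invalidation carrying the sender's epoch number, and that a violation is flagged when a marked line is invalidated by a logically earlier epoch. `SpeculativeCache` and `store` are hypothetical names, and memory is modelled as a dict keyed by address.

```python
class SpeculativeCache:
    """Per-processor L1 cache state relevant to speculation."""

    def __init__(self, epoch):
        self.epoch = epoch
        self.speculative = False
        self.spec_loaded = set()    # lines marked as speculatively loaded
        self.violation = False

    def become_speculative(self):
        self.speculative = True

    def load(self, memory, addr):
        if self.speculative:
            self.spec_loaded.add(addr)
        return memory[addr]

    def receive_invalidation(self, addr, sender_epoch):
        # Only a store from a logically earlier epoch breaks a true dependence.
        if addr in self.spec_loaded and sender_epoch < self.epoch:
            self.violation = True

    def attempt_commit(self):
        """Return True if the epoch may commit, False if it must be redone."""
        return not self.violation


def store(memory, addr, value, sender_epoch, other_caches):
    """Write `addr` and send an invalidation, tagged with the epoch, to other caches."""
    memory[addr] = value
    for cache in other_caches:
        cache.receive_invalidation(addr, sender_epoch)


memory = {"x": 1}                       # p and q both point at x
cache6 = SpeculativeCache(epoch=6)
cache6.become_speculative()
a = cache6.load(memory, "x")            # Epoch 6: LOAD a = *p
violation_before = cache6.violation
store(memory, "x", 2, sender_epoch=5, other_caches=[cache6])   # Epoch 5: STORE *q = 2
committed = cache6.attempt_commit()
```

Under these assumptions the flag is clear after the load, is set by the store from Epoch 5, and the commit attempt of Epoch 6 fails, so Epoch 6 has to be re-executed.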
21When is TLS not desired?
- Not desirable when dependences occur
frequently, e.g. on a scalar variable
- Solution
- -- Accommodate the dependence through
synchronization
- -- Turn off the speculation
support when necessary
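For a dependence that occurs in every iteration, explicit forwarding avoids repeated violations. The sketch below is an assumption-laden illustration: each epoch does some independent work, then waits for a running scalar from its predecessor, updates it, and passes it on. `run_epochs_with_forwarding` is a hypothetical name, and the per-epoch work (`k * k`) is made up for the example.

```python
import queue
import threading


def run_epochs_with_forwarding(num_epochs, start=0):
    """Run epochs as threads, forwarding one scalar from each epoch to the next."""
    # channels[k] carries the scalar into epoch k.
    channels = [queue.Queue(maxsize=1) for _ in range(num_epochs + 1)]
    results = [None] * num_epochs

    def epoch(k):
        local = k * k                   # independent work, may overlap with other epochs
        counter = channels[k].get()     # wait for the predecessor's value
        counter += local
        channels[k + 1].put(counter)    # signal: forward the value to the successor
        results[k] = counter

    threads = [threading.Thread(target=epoch, args=(k,)) for k in range(num_epochs)]
    for t in threads:
        t.start()
    channels[0].put(start)              # the first epoch has no predecessor
    for t in threads:
        t.join()
    return results


running_totals = run_epochs_with_forwarding(4)
```

Only the update of the scalar is serialized; the independent work in each epoch can still overlap, and no epoch ever has to be squashed because of this variable.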
23Summary
- TLS enables the compiler to create parallel threads
despite uncertainty about the actual dependences
- Dependences are detected at run time
- Several implementations: single-chip / scalable,
write-through / write-back
- Never count on TLS alone: combine it with
synchronization and superscalar mechanisms
24( Thanks )