Dynamic Region Selection for Thread Level Speculation

Learn more at: https://www.eecg.toronto.edu


1
Dynamic Region Selection for Thread Level Speculation

Presented by
Jeff Da Silva
Stanley Fung
Martin Labrecque

Feb 6, 2004

Builds on research done by
Chris Colohan from CMU
Greg Steffan

2
Multithreading on a Chip is here TODAY!
[Figure: threads of execution; supercomputers]
3
Improving Performance with a Chip Multiprocessor
With a bunch of independent applications
[Figure: independent applications mapped onto processors and caches; axis: execution time]
→ improves throughput (total work per second)
4
Improving Performance with a Chip Multiprocessor
With a single application
[Figure: a single application; axis: execution time]
→ need parallel threads to reduce execution time
5
Thread-Level Speculation: the Basic Idea
6
Support for TLS: What Do We Need?

Break programs into speculative threads to maximize thread-level parallelism
Track data dependences to determine whether speculation was safe
Recover from failed speculation to ensure correct execution

→ three key elements of every TLS system
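The second element, dependence tracking, can be illustrated in software. The sketch below is a simplified model, not how TLS hardware does it: it assumes each speculative thread records the sets of addresses it read and wrote, that threads are ordered as in the sequential program, and that no values are forwarded between threads. Under those assumptions, any address written by a logically earlier thread and read by a later one counts as a read-after-write violation that forces the later thread to be squashed and re-executed. `SpecThread` and `find_violations` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class SpecThread:
    """One speculative thread (e.g. one loop iteration), in program order."""
    reads: set[int] = field(default_factory=set)   # addresses loaded
    writes: set[int] = field(default_factory=set)  # addresses stored


def find_violations(threads: list[SpecThread]) -> list[int]:
    """Return indices of threads that read an address written by a
    logically earlier thread (a read-after-write violation).

    Simplifying assumption: no value forwarding, so every such overlap
    means the later thread may have read a stale value.
    """
    violated = []
    earlier_writes: set[int] = set()
    for i, t in enumerate(threads):
        if t.reads & earlier_writes:
            violated.append(i)
        earlier_writes |= t.writes
    return violated


# Hypothetical iterations: iteration 1 reads address 100, which
# iteration 0 writes, so speculation on iteration 1 fails.
iters = [
    SpecThread(reads={10}, writes={100}),
    SpecThread(reads={100, 11}, writes={101}),
    SpecThread(reads={12}, writes={102}),
]
print(find_violations(iters))  # [1]
```

Many TLS designs also squash every thread after a violated one, and forwarding can avoid some violations; both are left out here to keep the check itself visible.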
7
Support for TLS: What Do We Need?

Lots of research has been done on TLS hardware:
    Tracking data dependences
    Recovering from violations
We focus on how to select regions to run in parallel
A region is any segment of code that you want to speculatively parallelize
For this work, region = loop, iterations = speculative threads

8
Why is static region selection hard?

Extensive profiling information is required
Regions can be nested:
    for (i = 1 to N)          <- 2x faster in parallel
        ...
        for (j = 1 to N)      <- 3x faster in parallel
            ...
            for (k = 1 to N)  <- 4x faster in parallel
                ...
    Which loop should we parallelize?
Dynamic behaviour

→ Dynamic Region Selection is a potential solution
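One reason profiling is needed: the loop with the largest parallel speedup is not necessarily the best choice. What matters is the overall program speedup, which also depends on how much of the execution time each loop covers. The sketch below applies Amdahl's law to the three nested loops; the coverage fractions are hypothetical numbers invented for illustration, and it assumes only one loop in the nest is parallelized at a time.

```python
def program_speedup(coverage: float, loop_speedup: float) -> float:
    """Amdahl's law: overall speedup when a fraction `coverage` of the
    sequential execution time is sped up by `loop_speedup`."""
    return 1.0 / ((1.0 - coverage) + coverage / loop_speedup)


# Hypothetical profile: fraction of total execution time spent inside each
# loop (an inner loop cannot cover more time than the loop enclosing it),
# paired with the per-loop parallel speedups from the slide.
loops = {
    "i": (0.90, 2.0),
    "j": (0.60, 3.0),
    "k": (0.30, 4.0),
}

for name, (cov, s) in loops.items():
    print(f"parallelize {name}: {program_speedup(cov, s):.2f}x overall")
# parallelize i: 1.82x overall
# parallelize j: 1.67x overall
# parallelize k: 1.29x overall

best = max(loops, key=lambda n: program_speedup(*loops[n]))
print("best choice:", best)  # best choice: i
```

With these numbers the outermost loop wins despite having the smallest per-loop speedup; if loop i covered only 35% of the time, loop k would win instead. The answer hinges on accurate profile data, which is what makes static selection hard.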
9
Dynamic Region Selection

Compiler transforms all candidate regions into parallel and sequential versions
Through dynamic profiling, we decide which regions are run in parallel
Key questions:
    Is there any dynamic behaviour between region instances?
    What is a good algorithm for selecting regions?
    Are there performance trade-offs for doing dynamic profiling?
    Is there any dynamic behaviour within region instances? (not the focus of this research)

10
Outline

The role of the TLS compiler
Characterizing dynamic behaviour
Dynamic Region Selection (DRS) algorithms
Results
Conclusions
Open questions and future work

11
Current Compilation for TLS

LoopA
    LoopB
    EndB
    LoopC
        LoopD
        EndD
    EndC
EndA
LoopE
    LoopF
    EndF
EndE
LoopG
    LoopH
    EndH
EndG
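The listing is a flattened view of the program's loop nests. As an illustrative sketch (not part of the TLS compiler), the token sequence can be turned into each candidate region's nesting path with a stack, exactly like matching brackets; the region nesting path is also what path-sensitive sampling keys on. The `LoopX`/`EndX` token format comes from the slide, the final token is assumed to close LoopG (`EndG`), and `nesting_paths` is a hypothetical name.

```python
def nesting_paths(tokens: list[str]) -> dict[str, str]:
    """Map each loop name to its nesting path, e.g. 'D' -> 'A/C/D'.

    Tokens are 'Loop<name>' or 'End<name>'; an End must close the most
    recently opened loop.
    """
    stack: list[str] = []
    paths: dict[str, str] = {}
    for tok in tokens:
        if tok.startswith("Loop"):
            stack.append(tok[len("Loop"):])
            paths[stack[-1]] = "/".join(stack)
        elif tok.startswith("End"):
            name = tok[len("End"):]
            if not stack or stack[-1] != name:
                raise ValueError(f"unbalanced nest at {tok}")
            stack.pop()
        else:
            raise ValueError(f"unknown token {tok}")
    if stack:
        raise ValueError(f"unclosed loops: {stack}")
    return paths


program = ("LoopA LoopB EndB LoopC LoopD EndD EndC EndA "
           "LoopE LoopF EndF EndE LoopG LoopH EndH EndG").split()
print(nesting_paths(program))
# {'A': 'A', 'B': 'A/B', 'C': 'A/C', 'D': 'A/C/D', 'E': 'E', 'F': 'E/F', 'G': 'G', 'H': 'G/H'}
```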

12
DRS Compilation
LoopA LoopB EndB LoopC LoopD EndD EndC EndA LoopE LoopF EndF EndE LoopG LoopH EndH EndG
LoopA LoopB EndB LoopC LoopD EndD EndC EndA LoopE LoopF EndF EndE LoopG LoopH EndH EndG
13
DRS Compilation
14
DRS Compilation
15
DRS Compilation
16
DRS Compilation
17
DRS Compilation
→ DRS compilation by Colohan
18
Characterizing TLS Region Behaviour
19
Characterizing TLS Region Behaviour
20
DRS Algorithms

Sample Twice
Continuous Monitoring
Continuous Resample
Path Sensitive Sampling

21
Sample Twice Algorithm

Effective if behaviour is constant.
When a region is encountered:
    1st time: run the sequential version and record its execution time t1
    2nd time: run the parallel version (if possible) and record its execution time tp
    Subsequent instances:
        if tp < t1 then run the parallel version
        else run the sequential version
Note that using execution time as the metric assumes that the amount of work done stays relatively constant from instance to instance. Using throughput (IPC) as the metric removes the need for this assumption but adds complexity.
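A minimal Python sketch of Sample Twice for a single region, assuming the two compiler-generated versions can be modelled as callables and timed with a clock function. The class name, the injectable `clock` parameter and the fake-clock usage are illustrative assumptions; the "(if possible)" case, where no parallel version can run, is omitted.

```python
import time
from typing import Callable, Optional


class SampleTwiceRegion:
    """Sample Twice selection for one candidate region (illustrative sketch).

    `sequential` and `parallel` stand in for the two versions the DRS
    compiler generates; here they are ordinary Python callables.
    """

    def __init__(self, sequential: Callable[[], None],
                 parallel: Callable[[], None],
                 clock: Callable[[], float] = time.perf_counter) -> None:
        self.sequential = sequential
        self.parallel = parallel
        self.clock = clock
        self.t1: Optional[float] = None  # time of the sequential sample
        self.tp: Optional[float] = None  # time of the parallel sample

    def _timed(self, version: Callable[[], None]) -> float:
        start = self.clock()
        version()
        return self.clock() - start

    def run(self) -> str:
        """Execute one instance of the region and return which version ran."""
        if self.t1 is None:                  # 1st instance: sample sequential
            self.t1 = self._timed(self.sequential)
            return "sequential"
        if self.tp is None:                  # 2nd instance: sample parallel
            self.tp = self._timed(self.parallel)
            return "parallel"
        if self.tp < self.t1:                # later instances: decision is fixed
            self.parallel()
            return "parallel"
        self.sequential()
        return "sequential"


# Deterministic usage with a fake clock: the sequential version "takes"
# 10 time units and the parallel version 4.
now = [0.0]


def fake_clock() -> float:
    return now[0]


def seq() -> None:
    now[0] += 10.0


def par() -> None:
    now[0] += 4.0


region = SampleTwiceRegion(seq, par, clock=fake_clock)
print([region.run() for _ in range(4)])
# ['sequential', 'parallel', 'parallel', 'parallel']
```

Because the decision is fixed after the second instance, a region whose parallel version later slows down keeps running in parallel; Continuous Monitoring targets exactly that case.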

22
Sample Twice Example
23
Continuous Monitoring

Effective if behaviour is continuously degrading.

Extension of the sample twice method: continuously monitor all regions and re-evaluate the decision if the speedup changes.
We do little more than monitor continuously → the overhead is essentially free.
When a region is encountered:
    1st time: run the sequential version and record its execution time t1
    2nd time: run the parallel version (if possible) and record its execution time tp
    Subsequent instances:
        if tp < t1 then run the parallel version and update tp
        else run the sequential version and update t1
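A minimal Python sketch of Continuous Monitoring for one region, assuming the two versions are modelled as callables and timed by an injectable clock (both illustrative choices, as is the class name). Every instance after the first two is timed, and the measurement overwrites t1 or tp for whichever version ran.

```python
import time
from typing import Callable, Optional


class ContinuousMonitorRegion:
    """Continuous Monitoring selection for one candidate region (sketch)."""

    def __init__(self, sequential: Callable[[], None],
                 parallel: Callable[[], None],
                 clock: Callable[[], float] = time.perf_counter) -> None:
        self.sequential = sequential
        self.parallel = parallel
        self.clock = clock
        self.t1: Optional[float] = None  # most recent sequential time
        self.tp: Optional[float] = None  # most recent parallel time

    def _timed(self, version: Callable[[], None]) -> float:
        start = self.clock()
        version()
        return self.clock() - start

    def run(self) -> str:
        """Execute one instance of the region and return which version ran."""
        if self.t1 is None:                        # 1st instance
            self.t1 = self._timed(self.sequential)
            return "sequential"
        if self.tp is None or self.tp < self.t1:   # 2nd instance, or parallel ahead
            self.tp = self._timed(self.parallel)   # run parallel, update tp
            return "parallel"
        self.t1 = self._timed(self.sequential)     # run sequential, update t1
        return "sequential"


# Fake clock: sequential always costs 10 units; the parallel version
# degrades over time (4, 6, then 12 units), hypothetically because of
# more dependence violations.
now = [0.0]
par_costs = iter([4.0, 6.0, 12.0, 12.0])


def fake_clock() -> float:
    return now[0]


def seq() -> None:
    now[0] += 10.0


def par() -> None:
    now[0] += next(par_costs)


region = ContinuousMonitorRegion(seq, par, clock=fake_clock)
print([region.run() for _ in range(6)])
# ['sequential', 'parallel', 'parallel', 'parallel', 'sequential', 'sequential']
```

Note that tp is refreshed only while the parallel version is chosen. Once sequential wins, a parallel version that has since improved goes unnoticed unless t1 itself grows past the stale tp.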

24
Continuous Monitoring Example
25
Continuous Resample

Effective if behaviour is continuously changing.

Continuously resample by periodically flushing the recorded values t1 and tp.
Adds new overhead.
This algorithm has not yet been explored.
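Since Continuous Resample has not been explored, the following is only a hypothetical sketch of one way it could work. It assumes Sample Twice as the underlying policy and a fixed flush period `period` (an invented tuning parameter), and it models the versions as callables timed by an injectable clock.

```python
import time
from typing import Callable, Optional


class ResampleRegion:
    """Hypothetical Continuous Resample: Sample Twice plus a periodic flush."""

    def __init__(self, sequential: Callable[[], None],
                 parallel: Callable[[], None], period: int,
                 clock: Callable[[], float] = time.perf_counter) -> None:
        self.sequential = sequential
        self.parallel = parallel
        self.period = period      # assumed knob: instances between flushes
        self.clock = clock
        self.count = 0            # instances executed so far
        self.t1: Optional[float] = None
        self.tp: Optional[float] = None

    def _timed(self, version: Callable[[], None]) -> float:
        start = self.clock()
        version()
        return self.clock() - start

    def run(self) -> str:
        """Execute one instance of the region and return which version ran."""
        if self.count > 0 and self.count % self.period == 0:
            self.t1 = self.tp = None             # flush: forces two new samples
        self.count += 1
        if self.t1 is None:
            self.t1 = self._timed(self.sequential)
            return "sequential"
        if self.tp is None:
            self.tp = self._timed(self.parallel)
            return "parallel"
        if self.tp < self.t1:
            self.parallel()
            return "parallel"
        self.sequential()
        return "sequential"


# Fake clock: sequential costs 10 units; the parallel version costs 4 for
# its first three runs and 12 afterwards (its behaviour changes mid-run).
now = [0.0]
par_costs = iter([4.0, 4.0, 4.0, 12.0, 12.0, 12.0])


def fake_clock() -> float:
    return now[0]


def seq() -> None:
    now[0] += 10.0


def par() -> None:
    now[0] += next(par_costs)


region = ResampleRegion(seq, par, period=4, clock=fake_clock)
print([region.run() for _ in range(8)])
# ['sequential', 'parallel', 'parallel', 'parallel',
#  'sequential', 'parallel', 'sequential', 'sequential']
```

Without the flush, the decision made at the second instance would stick and runs 5 to 8 would all use the now-slower parallel version. The cost is one forced sample of each version per period, which is the new overhead the slide mentions.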

26
Path Sensitive Sampling

If the behaviour is periodic, some means of filtering is required.
One intuitive solution is to sample when the invocation path or region nesting path changes.

27
Path Sensitive Sampling

Sample when the region nesting path changes
Assumes that the state stays the same as long as the invocation path does not change

void foo() {
    while (cond)
        moo();
}
void bar() {
    while (cond)
        moo();
}
void moo() {
    while (cond)
        moo();
}
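One possible reading of path-sensitive sampling, sketched in Python: keep a separate Sample Twice state (t1, tp) for each region nesting path, so reaching the loop in moo along a new path triggers fresh sampling, while a path seen before reuses its earlier decision. The path strings such as `"foo/moo"`, the class and helper names, and the execution times are all hypothetical.

```python
from typing import Optional


class PathSensitiveSelector:
    """Sample Twice kept separately for each region nesting path (sketch)."""

    def __init__(self) -> None:
        self.times: dict[str, list[Optional[float]]] = {}  # path -> [t1, tp]

    def choose(self, path: str) -> str:
        """Return the version to run for this instance of the region."""
        t1, tp = self.times.setdefault(path, [None, None])
        if t1 is None:
            return "sequential"    # 1st instance seen on this path
        if tp is None:
            return "parallel"      # 2nd instance seen on this path
        return "parallel" if tp < t1 else "sequential"

    def record(self, path: str, version: str, elapsed: float) -> None:
        """Store a sample if it is the first one for that version and path."""
        t = self.times.setdefault(path, [None, None])
        slot = 0 if version == "sequential" else 1
        if t[slot] is None:
            t[slot] = elapsed


def run_instance(sel: PathSensitiveSelector, path: str,
                 seq_time: float, par_time: float) -> str:
    """Simulate one region instance with made-up execution times."""
    version = sel.choose(path)
    sel.record(path, version, seq_time if version == "sequential" else par_time)
    return version


# Hypothetical times: moo's loop speeds up when reached via foo,
# but slows down when reached via bar.
sel = PathSensitiveSelector()
via_foo = [run_instance(sel, "foo/moo", 10.0, 4.0) for _ in range(3)]
via_bar = [run_instance(sel, "bar/moo", 10.0, 15.0) for _ in range(3)]
print(via_foo)  # ['sequential', 'parallel', 'parallel']
print(via_bar)  # ['sequential', 'parallel', 'sequential']
```

A single, path-insensitive Sample Twice state would have applied the decision learned via foo to the calls from bar as well.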
28
Results: Static analysis
Average number of per-path instances for all
regions
29
Interesting Region in IJPEG
[Figure: number of speculative threads per region instance over program execution]
30
Interesting Region in Perl
[Figure: number of instructions per region instance over program execution]
31
Experimental Framework

SPEC benchmarks
TLS compiler
MIPS architecture
TLS profiler and simulator

32
Outline

The role of the TLS compiler
Characterizing dynamic behaviour
Dynamic Region Selection (DRS) algorithms
Results
Conclusions
Open questions and future work

→ Is there any dynamic behavior between region instances?

34
Results: Dynamic behavior
→ Regions with high coverage have low instruction-count variance between instances
35
Results: Dynamic behavior
→ Regions with high coverage have low violation variance between instances
36
Results: Dynamic behavior
→ Regions with high coverage have low speculative-thread-count variance between instances
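These results relate two per-region quantities: coverage and the variance of a per-instance metric. A sketch of how one might compute both from a per-instance profile follows. The data layout, region names and numbers are hypothetical. Coverage is taken here as a share of executed instructions, and the coefficient of variation (standard deviation divided by mean) stands in for raw variance so regions of different sizes can be compared; both are assumptions of this sketch.

```python
from statistics import mean, pstdev


def region_stats(instances: dict[str, list[int]],
                 total_instructions: int) -> dict[str, tuple[float, float]]:
    """Return region -> (coverage, coefficient of variation).

    instances[r] holds one value per dynamic instance of region r (here,
    its instruction count). Coverage is the region's share of all executed
    instructions; the coefficient of variation (population standard
    deviation divided by the mean) measures how much instances differ.
    """
    stats = {}
    for region, counts in instances.items():
        coverage = sum(counts) / total_instructions
        cv = pstdev(counts) / mean(counts)
        stats[region] = (coverage, cv)
    return stats


# Hypothetical profile of two regions in a 5000-instruction run.
profile = {
    "hot_loop": [1000, 1010, 990, 1000],   # large and regular
    "cold_loop": [10, 200, 5, 85],         # small and erratic
}
for region, (cov, cv) in region_stats(profile, 5000).items():
    print(f"{region}: coverage {cov:.0%}, variation {cv:.2f}")
# hot_loop: coverage 80%, variation 0.01
# cold_loop: coverage 6%, variation 1.05
```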
37

→ What is a good algorithm for selecting regions?

38
[Figure: performance of the selection algorithms (slower vs. faster)]
→ Continuous monitoring is 1% better on average than sample twice
→ About 10% worse than static optimal selection
39

→ How often did we agree with the optimal selection?

40
→ Sample twice agrees 57% of the time, on average
→ Continuous monitoring agrees 43% of the time, on average
→ Levels of agreement are close → no dynamic behavior?
41
→ Agreeing with the static optimal gives better performance?
→ Another sign of no dynamic behaviour?
42
→ Sample twice often leaves regions undecided
→ Overall, undecided regions represent low coverage
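The agreement figures compare each algorithm's per-region choice with the static optimal choice. Below is a sketch of that bookkeeping under assumptions the slides do not state: agreement is counted only over regions the algorithm actually decided, and coverage is each region's fraction of execution time. The region names and numbers are made up. A region presumably stays undecided under Sample Twice when it executes fewer than two times, since its parallel version never gets timed.

```python
def agreement(choices: dict[str, str], optimal: dict[str, str],
              coverage: dict[str, float]) -> tuple[float, float]:
    """Compare a DRS algorithm's per-region choices with the static optimal.

    choices[r] is 'parallel', 'sequential' or 'undecided'. Returns the
    fraction of decided regions whose choice matches optimal[r], and the
    total coverage of the regions left undecided.
    """
    decided = [r for r, c in choices.items() if c != "undecided"]
    matches = sum(choices[r] == optimal[r] for r in decided)
    rate = matches / len(decided) if decided else 0.0
    undecided_cov = sum(cov for r, cov in coverage.items()
                        if choices.get(r) == "undecided")
    return rate, undecided_cov


# Hypothetical per-region data for one benchmark.
choices = {"r1": "parallel", "r2": "sequential", "r3": "parallel", "r4": "undecided"}
optimal = {"r1": "parallel", "r2": "parallel", "r3": "parallel", "r4": "sequential"}
coverage = {"r1": 0.5, "r2": 0.2, "r3": 0.25, "r4": 0.01}

rate, ucov = agreement(choices, optimal, coverage)
print(f"agreement {rate:.0%}, undecided coverage {ucov:.0%}")
# agreement 67%, undecided coverage 1%
```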
43
Outline

The role of the TLS compiler
Characterizing dynamic behaviour
Dynamic Region Selection (DRS) algorithms
Results
Conclusions
Open questions and future work

44
Conclusions

This is an unexplored research topic (as far as we know)
→ Is there any dynamic behavior between region instances?
    We have good indications that there isn't much of it
→ What is the best algorithm for selecting regions?
    Continuous monitoring does 1% better than sample twice
    Within 10% of the static optimal without any sampling done!
→ Any performance trade-offs for doing dynamic profiling?
    Code size increases by at most 30%
    The runtime performance overhead is believed to be negligible
→ Is there any dynamic behavior within a region instance?
    We don't know yet

45
Open Questions

The dynamic optimal is the theoretical optimum
How close are we to the dynamic optimal?
How close is the static optimal to the dynamic optimal?
How do the other proposed algorithms perform?
What should be implemented in hardware vs. software?

Questions?

47
AUXILIARY SLIDES
48
Results: Potential Study
Execution time versus invocation (IJPEG)
49
Results: Potential Study
Execution time versus invocation (CRAFTY)
50
Results: Potential Study
Execution time versus invocation (LI)
51
Results: Potential Study
Execution time versus invocation (PERL)
52
Results: Static analysis
53
Results: Dynamic behavior
54
Results: Dynamic behavior
55
Results: Dynamic behavior
