Alias Speculation using Atomic Regions presentation

1
Alias Speculation using Atomic Regions

(To appear at ASPLOS 2013)
Wonsun Ahn, Yuelu Duan, Josep Torrellas
University of Illinois at Urbana-Champaign

2
Disclaimer

This talk is not about parallelism.
This talk is about decreasing the amount of work
that needs to be done through better code
generation.
We want to do this by making the
software-hardware barrier more porous.

[Figure labels: Compiler, Hardware, Assumptions, Information]
3
What prevents good code generation?

Many popular optimizations require code motion
Loop Invariant Code Motion (LICM): from the body
to the preheader of a loop
Redundancy elimination: from the location of the
redundant computation to the first computation
Memory aliasing prevents code motion

r1 = a op b; c = a op b
r1 = a op b; r2 = a op b; c = r2
r1 = a op b; r2 = r1; c = r2
r1 = a op b; store p; c = a op b
r1 = a op b; store p; r2 = a op b; c = r2
r1 = a op b; r2 = a op b; store p; c = r2
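
The effect of an intervening store on redundancy elimination can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: memory is modeled as a dictionary, addition is an assumed stand-in for the operation on a and b, and the function names are hypothetical.

```python
# Hypothetical memory model: a dict from location name to value.
# "+" is an assumed stand-in for the operation applied to a and b.

def original(mem, p_target):
    r1 = mem["a"] + mem["b"]      # first computation
    mem[p_target] = 0             # store through pointer p
    c = mem["a"] + mem["b"]       # recomputed after the store
    return r1, c

def optimized(mem, p_target):
    r1 = mem["a"] + mem["b"]
    mem[p_target] = 0
    c = r1                        # reuse: only valid if p aliases neither a nor b
    return r1, c

# p points to an unrelated location x: both versions agree.
no_alias_same = original({"a": 1, "b": 2, "x": 9}, "x") == \
                optimized({"a": 1, "b": 2, "x": 9}, "x")

# p points to a: the reused value is stale and the versions disagree.
alias_same = original({"a": 1, "b": 2, "x": 9}, "a") == \
             optimized({"a": 1, "b": 2, "x": 9}, "a")
```

When the compiler can only prove may-alias for p, it has to keep the recomputation even though the two versions agree whenever p points elsewhere.
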
4
Alias Analysis is Difficult

Alias analysis returns one of three results
Must-Alias, No-Alias, May-Alias
Accurate static analysis is fundamentally
difficult
Requires points-to analysis, heap modeling, etc.
Quickly becomes intractable in space/time
complexity
Alternative: insert runtime checks
Software checks
Hardware checks (e.g. Itanium ALAT, Transmeta)
We propose to leverage atomic regions to do
runtime checks and automatic recovery
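
A software runtime check can be sketched as an explicit address comparison guarding the optimized path. This is an illustrative sketch only: memory is a Python list indexed by integer addresses, addition stands in for the reused computation, and `sum_with_check` is a hypothetical name.

```python
def sum_with_check(mem, a, b, p, value):
    """Reuse mem[a] + mem[b] across a store to p only if p aliases neither."""
    r1 = mem[a] + mem[b]
    mem[p] = value
    if p != a and p != b:        # software alias check: two extra compares
        c = r1                   # speculation held: reuse the earlier result
    else:
        c = mem[a] + mem[b]      # alias detected: recompute
    return c

result_no_alias = sum_with_check([1, 2, 0], 0, 1, 2, 7)   # p is address 2
result_alias = sum_with_check([1, 2, 0], 0, 1, 0, 7)      # p is address 0, same as a
```

The comparisons execute on every pass through the code, which is the instruction overhead that hardware-assisted checking aims to avoid.
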

5
Background: Atomic Regions (aka Transactions)

Sections of code demarcated in software that are
either committed atomically on success or rolled
back on failure
Atomic regions are here and now
Intel TSX, AMD ASF, IBM Blue Gene/Q, IBM Power
Originally meant to ease parallel programming, but
again, that's not what this talk is about
They do two things well that software finds
difficult
Checkpointing to guarantee atomic commit of the
transaction
Exposed to software through begin_atomic,
end_atomic
Memory alias detection to guarantee isolation of
transaction
Hidden from software

6
Proposal: Leverage Atomic Regions for Alias
Speculation

Expose alias checking HW to SW through ISA
extensions
Use HW support for Atomic Regions to perform
alias speculation in a compiler for optimizations
Cover path of code motion in an Atomic Region
Speculate may-aliases in code motion path are
no-aliases
Check speculated aliases using alias checking HW
Recover from failure by rolling back to
checkpoint
Apply this to optimizations such as
Loop Invariant Code Motion (LICM)
Partial Redundancy Elimination (PRE)
Global Value Numbering (GVN)

7
Modifications to Atomic Regions

Key insight
Atomic regions maintain a read set and a write
set
Speculative Read (SR), Speculative Written (SW)
bits in speculative cache
Only SW bits are needed for checkpointing
Repurpose SR bits to mark certain load locations
for monitoring alias speculation failures
Do not mark SR bits for regular loads
Add ISA extensions to manipulate and check SR and
SW bits to do alias checks

8
Extensions to the ISA (for Checkpointing):
already supported

begin_atomic PC / end_atomic / abort_atomic
Starts / ends / aborts an atomic region
PC is the address of the Safe-Version of the
atomic region:
the atomic region code without speculative
optimizations
abort_atomic jumps to the Safe-Version after rollback
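
The begin / abort / Safe-Version control flow can be modeled in software as follows. This is a rough sketch under stated assumptions: a dictionary copy stands in for the hardware checkpoint, a Python exception stands in for abort_atomic, and `run_atomic`, `speculative` and `safe` are hypothetical names.

```python
class Abort(Exception):
    """Stands in for abort_atomic."""

def run_atomic(mem, speculative, safe):
    checkpoint = dict(mem)       # begin_atomic: software stand-in for the HW checkpoint
    try:
        speculative(mem)         # optimized code inside the atomic region
    except Abort:
        mem.clear()
        mem.update(checkpoint)   # roll back every speculative write
        safe(mem)                # jump to the Safe-Version
    return mem                   # end_atomic: state is now committed

def speculative(mem):
    mem["x"] = 99                # speculative write, discarded on abort
    raise Abort()                # a speculated alias check failed

def safe(mem):
    mem["x"] = 1                 # same work without speculative optimizations

mem = run_atomic({"x": 0}, speculative, safe)
```
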

9
Extensions to the ISA (for Alias Checking):
newly added

load.add.sr r1, addr
Loads location addr into r1 just like a regular
load
Marks the SR bit in the cache line containing addr
Used for marking monitored loads
clear.sr addr
Clears the SR bit in the cache line containing addr
Used to mark the end of load monitoring
store.chk.(sr / sw / srsw) addr, r1
Stores r1 to location addr just like a regular
store
sr: if the SR bit is set, the atomic region is aborted
sw: if the SW bit is set, the atomic region is aborted
srsw: if either bit is set, the atomic region is aborted
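
The semantics of these instructions can be mimicked with a small software model. This is only a sketch: the class and method names are hypothetical, the 64-byte line size is an assumption, and a real implementation would keep the bits in the speculative cache rather than in Python sets.

```python
LINE_BYTES = 64  # assumed cache line size

class Abort(Exception):
    """Stands in for an atomic region abort."""

class SpecCache:
    def __init__(self):
        self.mem = {}      # address -> value
        self.sr = set()    # cache lines whose SR bit is set
        self.sw = set()    # cache lines whose SW bit is set

    @staticmethod
    def line(addr):
        return addr // LINE_BYTES

    def load(self, addr):                  # regular load: SR bit is not marked
        return self.mem.get(addr, 0)

    def load_add_sr(self, addr):           # load.add.sr
        self.sr.add(self.line(addr))
        return self.mem.get(addr, 0)

    def clear_sr(self, addr):              # clear.sr
        self.sr.discard(self.line(addr))

    def store(self, addr, value):          # regular store: marks the SW bit
        self.sw.add(self.line(addr))
        self.mem[addr] = value

    def store_chk(self, mode, addr, value):    # store.chk.(sr | sw | srsw)
        ln = self.line(addr)
        if "sr" in mode and ln in self.sr:
            raise Abort("SR bit set")
        if "sw" in mode and ln in self.sw:
            raise Abort("SW bit set")
        self.store(addr, value)
```

Because the bits are tracked per cache line, two distinct addresses that share a line are treated as aliasing in this model, so a check can abort conservatively.
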

10
How are these Instructions Used?

Instrumentation goals
Minimize alias checking instruction overhead
Allow alias checks on a subset of accesses in AR
A single AR can enable multiple optimizations
Each code motion involves only a subset of
accesses
Two cases of code motion that involve alias
checks
Moving (hoisting) loads
Moving (sinking) stores

11
Code Motion 1: Hoisting Loads
Original: begin_atomic; store x; load a; store y; end_atomic
After hoisting: begin_atomic; load.add.sr a; store.chk.sr x; clear.sr a; store y; end_atomic
After sinking and removing clear.sr a: begin_atomic; load.add.sr a; store.chk.sr x; store y; end_atomic

Assume a may-alias with x and y
Hoist load a above store x and set up monitoring
of a
store.chk.sr x will roll back the AR on alias check
failure
Sink clear.sr a to end of AR (if possible)
store y will not trigger rollback on alias with a
Now clear.sr a can be removed

Can selectively check against stores in path of
code motion
(Often) no instruction overhead for checking
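
The load-hoisting case can be traced with a toy model. This is a sketch under assumptions: locations are tracked by name instead of by cache line, a dictionary copy stands in for the checkpoint, the stored values 1 and 2 are arbitrary, and all function names are hypothetical.

```python
class Abort(Exception):
    """Stands in for an atomic region abort."""

def hoisted(mem, a, x, y):
    sr = {a}                 # load.add.sr a: mark a as monitored
    r = mem[a]
    if x in sr:              # store.chk.sr x
        raise Abort()
    mem[x] = 1
    mem[y] = 2               # plain store y: never checks SR bits
    return r

def safe(mem, a, x, y):
    mem[x] = 1
    r = mem[a]               # load in its original position
    mem[y] = 2
    return r

def run(mem, a, x, y):
    checkpoint = dict(mem)
    try:
        return hoisted(mem, a, x, y)
    except Abort:
        mem.clear()
        mem.update(checkpoint)
        return safe(mem, a, x, y)

no_alias = run({"A": 5, "X": 0, "Y": 0}, "A", "X", "Y")   # speculation holds
x_aliases_a = run({"A": 5, "Y": 0}, "A", "A", "Y")        # aborts, Safe-Version runs
y_aliases_a = run({"A": 5, "X": 0}, "A", "X", "A")        # no abort needed
```

The third call shows why store y needs no check: the load was never moved across it, so an alias between y and a cannot change the loaded value.
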

12
Code Motion 2: Sinking Stores
Original: begin_atomic; store a; load x; store y; end_atomic
After sinking: begin_atomic; load.add.sr x; store y; store.chk.srsw a; end_atomic

Assume a may-alias with x and y
Sink store a below load x and store y
Alias with x is checked when SR bits are checked
in store.chk.srsw a
Alias with y is checked when SW bits are checked
in store.chk.srsw a

Can selectively check only loads in path of code
motion
Must check against all previous stores in atomic
region
Because SW bits cannot be set selectively
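
The store-sinking case can be traced the same way. Again this is a sketch, not the authors' implementation: locations are tracked by name, the stored values 1 and 2 are arbitrary, and the function names are hypothetical.

```python
class Abort(Exception):
    """Stands in for an atomic region abort."""

def sunk(mem, a, x, y):
    sr, sw = {x}, set()      # load.add.sr x: mark x as monitored
    r = mem[x]
    mem[y] = 2
    sw.add(y)                # a regular store always sets the SW bit
    if a in sr or a in sw:   # store.chk.srsw a
        raise Abort()
    mem[a] = 1
    return r

def safe(mem, a, x, y):
    mem[a] = 1               # store in its original position
    r = mem[x]
    mem[y] = 2
    return r

def run(mem, a, x, y):
    checkpoint = dict(mem)
    try:
        r = sunk(mem, a, x, y)
    except Abort:
        mem.clear()
        mem.update(checkpoint)   # undo the speculative store to y
        r = safe(mem, a, x, y)
    return r, mem

no_alias = run({"A": 0, "X": 7, "Y": 0}, "A", "X", "Y")
a_aliases_x = run({"A": 7, "Y": 0}, "A", "A", "Y")   # caught by the SR check
a_aliases_y = run({"A": 0, "X": 7}, "A", "X", "A")   # caught by the SW check
```

In the last case the sunk store would otherwise overwrite the value written by store y, so the SW check is what preserves the original store order.
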

13
Illustrative Example: LICM and GVN
// a, b may alias with p, q, s.
// p, q, s may alias with each other.
for (i = 0; i < 100; i++) {
  a = b op 10
  p = q op 20
  s = q op 20
}

// PC points to the original loop
begin_atomic PC
for (i = 0; i < 100; i++) {
  a = b op 10
  p = q op 20
  s = q op 20
}
end_atomic

Put atomic region around loop
Perform optimizations after inserting appropriate
checks

14
Illustrative Example: LICM and GVN
// a aliases with p, q
// b aliases with p
// p, q, s alias with each other
for (i = 0; i < 100; i++) {
  a = b op 10
  p = q op 20
  s = q op 20
}

// PC points to the original loop
register int r1, r2
begin_atomic PC
ld.add.sr r1, b
r2 = r1 op 10
for (i = 0; i < 100; i++) {
  store a, r2
  store.chk.sr p, q op 20
  store s, q op 20
}
clear.sr b
end_atomic

Put atomic region around loop
Perform optimizations after inserting appropriate
checks
Hoist b op 10 (LICM)

15
Illustrative Example: LICM and GVN
// a aliases with p, q
// b aliases with p
// p, q, s alias with each other
for (i = 0; i < 100; i++) {
  a = b op 10
  p = q op 20
  s = q op 20
}

// PC points to the original loop
register int r1, r2, r3, r4
begin_atomic PC
ld.add.sr r1, b
r2 = r1 op 10
for (i = 0; i < 100; i++) {
  store a, r2
  ld.add.sr r3, q
  r4 = r3 op 20
  store.chk.sr p, r4
  clear.sr q
  store s, r4
}
clear.sr b
end_atomic

Put atomic region around loop
Perform optimizations after inserting appropriate
checks
Hoist b op 10 (LICM)
Eliminate second q op 20 (GVN)

16
Illustrative Example: LICM and GVN
// a aliases with p, q
// b aliases with p
// p, q, s alias with each other
for (i = 0; i < 100; i++) {
  a = b op 10
  p = q op 20
  s = q op 20
}

// PC points to the original loop
register int r1, r2, r3, r4
begin_atomic PC
ld.add.sr r1, b
r2 = r1 op 10
for (i = 0; i < 100; i++) {
  store a, r2
  ld.add.sr r3, q
  r4 = r3 op 20
  store.chk.sr p, r4
  store s, r4
}
clear.sr q
clear.sr b
end_atomic

Put atomic region around loop
Perform optimizations after inserting appropriate
checks
Hoist b op 10 (LICM)
Eliminate second q op 20 (GVN)
Sink clear.sr q

17
Illustrative Example: LICM and GVN
// a aliases with p, q
// b aliases with p
// p, q, s alias with each other
for (i = 0; i < 100; i++) {
  a = b op 10
  p = q op 20
  s = q op 20
}

// PC points to the original loop
register int r1, r2, r3, r4
begin_atomic PC
ld.add.sr r1, b
r2 = r1 op 10
for (i = 0; i < 100; i++) {
  ld.add.sr r3, q
  r4 = r3 op 20
  store.chk.sr p, r4
  store s, r4
}
store.chk.srsw a, r2
clear.sr q
clear.sr b
end_atomic

Put atomic region around loop
Perform optimizations after inserting appropriate
checks
Hoist b op 10 (LICM)
Eliminate second q op 20 (GVN)
Sink clear.sr q
Sink store a, r2 (LICM)

Checked needlessly but is fine since it does
not alias with a
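
The fully instrumented loop can be checked against the original with a toy simulation. This is a sketch under stated assumptions: addition stands in for the arithmetic in the loop body, locations are tracked by name, a dictionary copy stands in for the checkpoint, and the function names are hypothetical.

```python
class Abort(Exception):
    """Stands in for an atomic region abort."""

def original_loop(mem, a, b, p, q, s, n=100):
    for _ in range(n):
        mem[a] = mem[b] + 10
        mem[p] = mem[q] + 20
        mem[s] = mem[q] + 20

def speculative_loop(mem, a, b, p, q, s, n=100):
    sr, sw = set(), set()
    r1 = mem[b]; sr.add(b)               # ld.add.sr r1, b
    r2 = r1 + 10
    for _ in range(n):
        r3 = mem[q]; sr.add(q)           # ld.add.sr r3, q
        r4 = r3 + 20
        if p in sr:                      # store.chk.sr p, r4
            raise Abort()
        mem[p] = r4; sw.add(p)
        mem[s] = r4; sw.add(s)           # store s, r4
    if a in sr or a in sw:               # store.chk.srsw a, r2
        raise Abort()
    mem[a] = r2

def run(mem, *locs):
    checkpoint = dict(mem)
    try:
        speculative_loop(mem, *locs)
        return "committed"
    except Abort:
        mem.clear()
        mem.update(checkpoint)           # roll back, then run the original loop
        original_loop(mem, *locs)
        return "aborted"

mem = {k: 1 for k in "ABPQS"}
outcome = run(mem, "A", "B", "P", "Q", "S")   # all five locations distinct
```

With five distinct locations the region commits and the final memory matches the original loop. Passing the same name for two parameters (for example p and q) models an actual alias and exercises the rollback path.
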
18
Where should we Place Atomic Regions?

We chose to focus on loops
Where most of the execution time is spent
Loops provide ample range for opts such as LICM
or PRE to perform large scale redundancy
elimination
Can amortize cost of atomic region
instrumentation over multiple iterations for a
given optimization
When loops can potentially overflow speculation
resources, loops are blocked into nested
sub-loops appropriately
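
Blocking a loop so that each atomic region fits in the speculative cache can be sketched as simple strip-mining. The function name, parameters and the example numbers are hypothetical; the talk does not describe how the footprint is estimated.

```python
def blocked_ranges(n_iters, lines_per_iter, capacity_lines):
    """Split iterations [0, n_iters) into sub-loops that fit the speculative cache.

    lines_per_iter: estimated cache lines written per iteration (assumed input).
    capacity_lines: speculative lines available to one atomic region.
    """
    block = max(1, capacity_lines // lines_per_iter)
    return [(lo, min(lo + block, n_iters)) for lo in range(0, n_iters, block)]

# Example: 100 iterations, 8 lines per iteration, 512 lines of capacity.
ranges = blocked_ranges(100, 8, 512)
```

Each returned (start, end) pair would be wrapped in its own begin_atomic / end_atomic pair, so an abort only discards the work of one sub-loop.
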

19
Memory Consistency Issues

In a multiprocessor system, disabling conflict
checks on speculative read lines can change
access ordering
Stores commit out of order at the end of an
atomic region even when loads read values from
remote processors
Conventionally, this causes a rollback
Not a problem in reality
Compiler code motion causes access re-orderings
anyway.
If it is legal for the compiler to re-order, it
is legal for HW
If it was illegal for the compiler to re-order
(e.g. due to synchronization), the atomic region
would not be placed there

20
Compiler Toolchain

Run loop blocking pass that uses loop footprint
estimation
Run application instrumented with alias check
instructions to profile how many Atomic Region
aborts a particular speculation would have
caused.
Run Atomic Region instrumentation pass for loops
that would benefit according to a cost-benefit
model and the abort profile information.
Run modified optimization passes (e.g. LICM, PRE,
GVN) that perform the code movements deemed
beneficial by the cost-benefit model. Insert
appropriate alias checks.
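
One way to picture a cost-benefit decision driven by the abort profile is an expected-gain estimate. The formula below is purely an illustrative assumption, not the model used in this work, and every name and number in it is hypothetical.

```python
def worth_speculating(saved_per_iter, iters, region_overhead,
                      abort_rate, abort_penalty):
    """Return True if the expected cycles saved exceed the expected cost.

    Assumed model: savings are realized only when the region commits,
    the begin/end overhead is always paid, and each abort costs a fixed penalty.
    """
    expected_gain = (1.0 - abort_rate) * saved_per_iter * iters
    expected_cost = region_overhead + abort_rate * abort_penalty
    return expected_gain > expected_cost

rarely_aborts = worth_speculating(2, 100, 50, 0.0, 1000)
often_aborts = worth_speculating(2, 100, 50, 0.5, 1000)
```

Under this assumed model, a speculation that aborted often in the profiling run is rejected even when its per-iteration savings look attractive.
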

21
Experimental Setup

Compare three environments using LICM and GVN/PRE
optimizations
BaselineAA
Unmodified LLVM-2.8 using basic alias analysis
Default alias analysis used by O3 optimization
DSAA
Unmodified LLVM-2.8 using data structure alias
analysis
Experimental alias analysis with high time/space
complexity
LAS
Modified LLVM-2.8 using loop-based alias
speculation
Applications
SPEC INT2006, SPEC FP2006
Simulation
SESC with Pin-based front end with Atomic Region
support
32KB 8-way associative speculative L1 cache w/
64B lines

22
Alias Analysis Results

Breakdown of alias analysis results when run with
LICM pass
LAS is able to convert almost all may-aliases to
no-aliases using profile information

23
Speedups

Speedups normalized to BaselineAA

24
Atomic Region Characterization

Low L1 cache occupancy due to not buffering
speculatively read lines
Overhead amortized over large atomic region

25
Summary

Proposed exposing HW Atomic Region alias checking
primitive to SW using ISA extensions
Proposed loop-based Atomic Region instrumentation
To maximize speculation opportunity
To minimize instrumentation overhead
Proposed an alias speculation framework
leveraging Atomic Regions and evaluated using
LICM and GVN/PRE
May-alias results: 56% → 4% for SPECINT2006,
43% → 1% for SPECFP2006
Speedup: 3% for SPECINT2006, 9% for SPECFP2006
