Title: Alias Speculation using Atomic Regions
1. Alias Speculation using Atomic Regions
- (To appear at ASPLOS 2013)
- Wonsun Ahn, Yuelu Duan, Josep Torrellas
- University of Illinois at Urbana-Champaign
2. Disclaimer
- This talk is not about parallelism.
- This talk is about decreasing the amount of work that needs to be done through better code generation.
- We want to do this by making the software-hardware barrier more porous.
[Figure labels: Assumptions, Compiler, Hardware, Information]
3. What prevents good code generation?
- Many popular optimizations require code motion
- Loop Invariant Code Motion (LICM): from the body to the preheader of a loop
- Redundancy elimination: from the location of the redundant computation to the first computation
- Memory aliasing prevents code motion
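Without an intervening store, the redundant computation can be eliminated:
  r1 = a + b;  c = a + b
  r1 = a + b;  r2 = a + b;  c = r2
  r1 = a + b;  r2 = r1;  c = r2
With a store through p in between, the same code motion is unsafe if p may alias a or b:
  r1 = a + b;  *p = ...;  c = a + b
  r1 = a + b;  *p = ...;  r2 = a + b;  c = r2
  r1 = a + b;  r2 = a + b;  *p = ...;  c = r2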
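To make the aliasing hazard concrete, here is a minimal Python sketch. It is not taken from the talk: memory is modeled as a list, `a`, `b`, `c` and `p` are hypothetical addresses (list indices), the stored value 0 is an arbitrary choice, and the function names are made up for the illustration.

```python
def original(mem, a, b, c, p):
    """r1 = a + b; store through p; c = a + b (sum recomputed after the store)."""
    mem = list(mem)              # work on a copy so calls do not interfere
    r1 = mem[a] + mem[b]
    mem[p] = 0                   # store through p, which may alias a or b
    mem[c] = mem[a] + mem[b]     # recomputed, so it observes the store
    return r1, mem[c]


def optimized(mem, a, b, c, p):
    """Same code after redundancy elimination: c reuses r1."""
    mem = list(mem)
    r1 = mem[a] + mem[b]
    mem[p] = 0
    mem[c] = r1                  # only valid if p aliases neither a nor b
    return r1, mem[c]


memory = [3, 4, 0, 99]           # addresses: a=0, b=1, c=2, spare slot=3

# No alias: p points at the spare slot, so both versions agree.
no_alias_same = original(memory, 0, 1, 2, 3) == optimized(memory, 0, 1, 2, 3)

# Alias: p points at a, so the optimized version keeps a stale sum.
alias_same = original(memory, 0, 1, 2, 0) == optimized(memory, 0, 1, 2, 0)
```

A compiler that cannot prove `p` is distinct from `a` and `b` has to keep the recomputation, which is the lost optimization the slide refers to.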
4. Alias Analysis is Difficult
- Alias analysis returns one of three results
- Must-Alias, No-Alias, May-Alias
- Accurate static analysis is fundamentally difficult
- Requires points-to analysis, heap modeling, etc.
- Quickly becomes intractable in space/time complexity
- Alternative: insert runtime checks
- Software checks
- Hardware checks (e.g. Itanium ALAT, Transmeta)
- We propose to leverage atomic regions to do runtime checks and automatic recovery
5. Background: Atomic Regions (aka Transactions)
- Sections of code demarcated in software that are either committed atomically on success or rolled back on failure
- Atomic regions are here and now
- Intel TSX, AMD ASF, IBM Blue Gene/Q, IBM POWER
- Originally intended to ease parallel programming, but again, that's not what the talk is about today
- Does two things well that software finds difficult
- Checkpointing to guarantee atomic commit of transaction
- Exposed to software through begin atomic, end atomic
- Memory alias detection to guarantee isolation of transaction
- Hidden from software
6. Proposal: Leverage Atomic Regions for Alias Speculation
- Expose alias checking HW to SW through ISA extensions
- Use HW support for Atomic Regions to perform alias speculation in a compiler for optimizations
- Cover path of code motion in an Atomic Region
- Speculate that may-aliases in the code motion path are no-aliases
- Check speculated aliases using alias checking HW
- Recover from failure by rolling back to checkpoint
- Apply this to optimizations such as
- Loop Invariant Code Motion (LICM)
- Partial Redundancy Elimination (PRE)
- Global Value Numbering (GVN)
7. Modifications to Atomic Regions
- Key insight
- Atomic regions maintain a read set and a write set
- Speculative Read (SR), Speculative Written (SW) bits in speculative cache
- Only SW bits are needed for checkpointing
- Repurpose SR bits to mark certain load locations for monitoring alias speculation failures
- Do not mark SR bits for regular loads
- Add ISA extensions to manipulate and check SR and SW bits to do alias checks
8. Extensions to the ISA (for Checkpointing): already supported
- begin_atomic PC / end_atomic / abort_atomic
- Starts / ends / aborts atomic region
- PC is the address of the Safe-Version of the atomic region: the atomic region code without speculative optimizations
- abort_atomic jumps to Safe-Version after rollback
9. Extensions to the ISA (for Alias Checking): newly added
- load.add.sr r1, addr
- Loads location addr to r1 just like a regular load
- Marks SR bit in cache line containing addr
- Used for marking monitored loads
- clear.sr addr
- Clears SR bit in cache line containing addr
- Used to mark end of load monitoring
- store.chk.(sr / sw / srsw) addr, r1
- Stores r1 to location addr just like a regular store
- sr: If SR bit is set, atomic region is aborted
- sw: If SW bit is set, atomic region is aborted
- srsw: If either the SR or the SW bit is set, atomic region is aborted
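The semantics of these instructions can be approximated with a small software model. The Python sketch below is only an illustration under stated assumptions: the class and method names (`SpeculativeCache`, `load_add_sr`, `store_chk`, and so on) are hypothetical, the 64-byte line size is borrowed from the experimental setup, and the check is assumed to happen before the store takes effect.

```python
LINE_SIZE = 64  # bytes per cache line (assumption, matching the 64B lines in the setup)


class Abort(Exception):
    """Raised when an alias check fails and the atomic region must roll back."""


class SpeculativeCache:
    """Toy model of per-line SR/SW bits inside a single atomic region."""

    def __init__(self, memory):
        self.memory = memory     # dict: address -> value
        self.sr = set()          # cache lines with the SR bit set
        self.sw = set()          # cache lines with the SW bit set

    @staticmethod
    def line(addr):
        return addr // LINE_SIZE

    def load(self, addr):
        # Regular load: does NOT mark the SR bit.
        return self.memory.get(addr, 0)

    def load_add_sr(self, addr):
        # load.add.sr: regular load plus SR marking of the containing line.
        self.sr.add(self.line(addr))
        return self.memory.get(addr, 0)

    def clear_sr(self, addr):
        # clear.sr: stop monitoring the line containing addr.
        self.sr.discard(self.line(addr))

    def store(self, addr, value):
        # Regular store: always sets SW, since SW bits back the checkpoint.
        self.sw.add(self.line(addr))
        self.memory[addr] = value

    def store_chk(self, addr, value, check="sr"):
        # store.chk.(sr|sw|srsw): abort if a checked bit is set on the line.
        ln = self.line(addr)
        if "sr" in check and ln in self.sr:
            raise Abort(f"SR hit on line {ln}")
        if "sw" in check and ln in self.sw:
            raise Abort(f"SW hit on line {ln}")
        self.store(addr, value)


cache = SpeculativeCache({0: 5})
cache.load_add_sr(0)                 # monitor the line holding address 0
try:
    cache.store_chk(8, 1, "sr")      # address 8 shares that 64-byte line
    aborted = False
except Abort:
    aborted = True                   # the check fires in this toy model
```

In this model the check works at cache-line granularity, so two distinct addresses on the same line also trigger an abort. That is conservative but safe, because the abort only falls back to the unoptimized code.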
10. How are these Instructions Used?
- Instrumentation goals
- Minimize alias checking instruction overhead
- Allow alias checks on a subset of accesses in AR
- A single AR can enable multiple optimizations
- Each code motion involves only a subset of accesses
- Two cases of code motion that involve alias checks
- Moving (hoisting) loads
- Moving (sinking) stores
11. Code Motion 1: Hoisting Loads
Original:
  begin_atomic
  store x
  load a
  store y
  end_atomic
After hoisting load a:
  begin_atomic
  load.add.sr a
  store.chk.sr x
  clear.sr a
  store y
  end_atomic
After sinking clear.sr a to the end of the AR and removing it:
  begin_atomic
  load.add.sr a
  store.chk.sr x
  store y
  end_atomic
- Assume a may-alias with x and y
- Hoist load a above store x and set up monitoring of a
- store.chk.sr x will roll back AR on alias check failure
- Sink clear.sr a to end of AR (if possible)
- store y will not trigger rollback on alias with a
- Now clear.sr a can be removed
- Can selectively check against stores in path of code motion
- (Often) no instruction overhead for checking
12. Code Motion 2: Sinking Stores
Original:
  begin_atomic
  store a
  load x
  store y
  end_atomic
After sinking store a:
  begin_atomic
  load.add.sr x
  store y
  store.chk.srsw a
  end_atomic
- Assume a may-alias with x and y
- Sink store a below load x and store y
- Alias with x is checked when SR bits are checked in store.chk.srsw a
- Alias with y is checked when SW bits are checked in store.chk.srsw a
- Can selectively check only loads in path of code motion
- Must check against all previous stores in atomic region
- Because SW bits cannot be set selectively
13. Illustrative Example: LICM and GVN
  // a,b may alias with p,q,s.
  // p,q,s may alias with each other.
  for (i = 0; i < 100; i++) {
    a = b + 10;
    p = q + 20;
    s = q + 20;
  }

  // PC points to the original loop
  begin_atomic PC
  for (i = 0; i < 100; i++) {
    a = b + 10;
    p = q + 20;
    s = q + 20;
  }
  end_atomic
- Put atomic region around loop
- Perform optimizations after inserting appropriate checks
14. Illustrative Example: LICM and GVN
  // a aliases with p,q
  // b aliases with p
  // p,q,s alias with each other
  for (i = 0; i < 100; i++) {
    a = b + 10;
    p = q + 20;
    s = q + 20;
  }

  // PC points to the original loop
  register int r1, r2;
  begin_atomic PC
  ld.add.sr r1, b
  r2 = r1 + 10;
  for (i = 0; i < 100; i++) {
    store a, r2
    store.chk.sr p, q + 20
    store s, q + 20
  }
  clear.sr b
  end_atomic
- Put atomic region around loop
- Perform optimizations after inserting appropriate checks
- Hoist b + 10 (LICM)
15. Illustrative Example: LICM and GVN
  // a aliases with p,q
  // b aliases with p
  // p,q,s alias with each other
  for (i = 0; i < 100; i++) {
    a = b + 10;
    p = q + 20;
    s = q + 20;
  }

  // PC points to the original loop
  register int r1, r2, r3, r4;
  begin_atomic PC
  ld.add.sr r1, b
  r2 = r1 + 10;
  for (i = 0; i < 100; i++) {
    store a, r2
    ld.add.sr r3, q
    r4 = r3 + 20;
    store.chk.sr p, r4
    clear.sr q
    store s, r4
  }
  clear.sr b
  end_atomic
- Put atomic region around loop
- Perform optimizations after inserting appropriate checks
- Hoist b + 10 (LICM)
- Eliminate second q + 20 (GVN)
16. Illustrative Example: LICM and GVN
  // a aliases with p,q
  // b aliases with p
  // p,q,s alias with each other
  for (i = 0; i < 100; i++) {
    a = b + 10;
    p = q + 20;
    s = q + 20;
  }

  // PC points to the original loop
  register int r1, r2, r3, r4;
  begin_atomic PC
  ld.add.sr r1, b
  r2 = r1 + 10;
  for (i = 0; i < 100; i++) {
    store a, r2
    ld.add.sr r3, q
    r4 = r3 + 20;
    store.chk.sr p, r4
    store s, r4
  }
  clear.sr q
  clear.sr b
  end_atomic
- Put atomic region around loop
- Perform optimizations after inserting appropriate checks
- Hoist b + 10 (LICM)
- Eliminate second q + 20 (GVN)
- Sink clear.sr q
17. Illustrative Example: LICM and GVN
  // a aliases with p,q
  // b aliases with p
  // p,q,s alias with each other
  for (i = 0; i < 100; i++) {
    a = b + 10;
    p = q + 20;
    s = q + 20;
  }

  // PC points to the original loop
  register int r1, r2, r3, r4;
  begin_atomic PC
  ld.add.sr r1, b
  r2 = r1 + 10;
  for (i = 0; i < 100; i++) {
    ld.add.sr r3, q
    r4 = r3 + 20;
    store.chk.sr p, r4
    store s, r4
  }
  store.chk.srsw a, r2
  clear.sr q
  clear.sr b
  end_atomic
- Put atomic region around loop
- Perform optimizations after inserting appropriate checks
- Hoist b + 10 (LICM)
- Eliminate second q + 20 (GVN)
- Sink clear.sr q
- Sink a = r2 (LICM)
Checked needlessly but is fine since it does not alias with a
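The checkpoint, check, and fallback behavior of this example can be imitated in software. The Python sketch below is a rough model, not the authors' implementation: memory is a dict, SR/SW bits are tracked per address rather than per cache line, rollback is done by restoring a copy, the store to a is assumed to end up after the loop, and all function names are hypothetical.

```python
class Abort(Exception):
    """Alias check failed inside the atomic region."""


def safe_loop(mem, a, b, p, q, s, n=100):
    """Safe-Version: the original loop without speculative optimizations."""
    for _ in range(n):
        mem[a] = mem[b] + 10
        mem[p] = mem[q] + 20
        mem[s] = mem[q] + 20


def speculative_loop(mem, a, b, p, q, s, n=100):
    """Optimized loop, with SR/SW bits modeled as sets of addresses."""
    sr, sw = set(), set()

    def store(addr, val, chk_sr=False, chk_sw=False):
        if (chk_sr and addr in sr) or (chk_sw and addr in sw):
            raise Abort(addr)
        sw.add(addr)                 # every store sets the SW bit
        mem[addr] = val

    sr.add(b)                        # ld.add.sr r1, b
    r2 = mem[b] + 10                 # r2 = r1 + 10 (hoisted by LICM)
    for _ in range(n):
        sr.add(q)                    # ld.add.sr r3, q
        r4 = mem[q] + 20             # r4 = r3 + 20 (shared via GVN)
        store(p, r4, chk_sr=True)    # store.chk.sr p, r4
        store(s, r4)                 # store s, r4
    store(a, r2, chk_sr=True, chk_sw=True)   # store.chk.srsw a, r2 (sunk)


def run_atomic(mem, *addrs):
    checkpoint = dict(mem)           # begin_atomic PC
    try:
        speculative_loop(mem, *addrs)
        return "committed"           # end_atomic
    except Abort:
        mem.clear()
        mem.update(checkpoint)       # roll back to the checkpoint
        safe_loop(mem, *addrs)       # continue in the Safe-Version
        return "aborted"


# Distinct addresses: the speculation holds and the region commits.
mem_ok = {0: 0, 1: 1, 2: 0, 3: 2, 4: 0}
status_ok = run_atomic(mem_ok, 0, 1, 2, 3, 4)

# p and q are the same location: the check on p fires and the loop reruns safely.
mem_alias = {0: 0, 1: 1, 3: 2, 4: 0}
status_alias = run_atomic(mem_alias, 0, 1, 3, 3, 4)
```

Either way the final memory matches what the original loop would produce. The speculation only changes how much work is done on the common, alias-free path.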
18. Where should we Place Atomic Regions?
- We chose to focus on loops
- Where most of the execution time is spent
- Loops provide ample range for opts such as LICM or PRE to perform large scale redundancy elimination
- Can amortize cost of atomic region instrumentation over multiple iterations for a given optimization
- When loops can potentially overflow speculation resources, loops are blocked into nested sub-loops appropriately
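Loop blocking of this kind can be sketched as follows. The Python code is an illustration only: the footprint rule in `max_block` (cache lines divided by lines written per iteration, ignoring associativity conflicts) is an assumption, not the talk's estimator, and the callback names are hypothetical.

```python
def max_block(cache_bytes, line_bytes, lines_per_iter):
    """Largest iteration count whose written lines fit in the speculative cache.

    Assumed rule of thumb: total lines // lines written per iteration.
    """
    return max(1, (cache_bytes // line_bytes) // lines_per_iter)


def blocked_ranges(n_iters, block):
    """Split range(n_iters) into consecutive sub-loops of at most `block` iterations."""
    return [range(start, min(start + block, n_iters))
            for start in range(0, n_iters, block)]


def run_blocked(n_iters, block, body, begin_atomic, end_atomic):
    """Run each sub-loop inside its own atomic region."""
    for sub in blocked_ranges(n_iters, block):
        begin_atomic()
        for i in sub:
            body(i)
        end_atomic()


# 32KB cache with 64B lines, assuming 4 lines written per iteration.
block = max_block(32 * 1024, 64, 4)
trace = []
run_blocked(300, block,
            body=lambda i: None,
            begin_atomic=lambda: trace.append("begin"),
            end_atomic=lambda: trace.append("end"))
```

Smaller blocks lower the risk of overflowing speculative state, but they amortize the begin/end overhead over fewer iterations, so the block size is a trade-off.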
19. Memory Consistency Issues
- In a multiprocessor system, disabling conflict checks on speculative read lines can change access ordering
- Stores commit out of order at the end of an atomic region even when loads read values from remote processors
- Conventionally, this causes a rollback
- Not a problem in reality
- Compiler code motion causes access re-orderings anyway
- If it is legal for the compiler to re-order, it is legal for HW
- If it was illegal for the compiler to re-order (e.g. due to synchronization), the atomic region would not be placed there
20. Compiler Toolchain
- Run loop blocking pass that uses loop footprint estimation
- Run application instrumented with alias check instructions to profile how many Atomic Region aborts a particular speculation would have caused
- Run Atomic Region instrumentation pass for loops that would benefit according to a cost-benefit model and the abort profile information
- Run modified optimization passes (e.g. LICM, PRE, GVN) that perform the code movements deemed beneficial by the cost-benefit model. Insert appropriate alias checks.
21. Experimental Setup
- Compare three environments using LICM and GVN/PRE optimizations
- BaselineAA
- Unmodified LLVM-2.8 using basic alias analysis
- Default alias analysis used by O3 optimization
- DSAA
- Unmodified LLVM-2.8 using data structure alias analysis
- Experimental alias analysis with high time/space complexity
- LAS
- Modified LLVM-2.8 using loop-based alias speculation
- Applications
- SPEC INT2006, SPEC FP2006
- Simulation
- SESC with Pin-based front end with Atomic Region support
- 32KB 8-way associative speculative L1 cache w/ 64B lines
22. Alias Analysis Results
- Breakdown of alias analysis results when run with LICM pass
- LAS is able to convert almost all may-aliases to no-aliases using profile information
23. Speedups
- Speedups normalized to BaselineAA
24. Atomic Region Characterization
- Low L1 cache occupancy due to not buffering speculatively read lines
- Overhead amortized over large atomic region
25. Summary
- Proposed exposing HW Atomic Region alias checking primitive to SW using ISA extensions
- Proposed loop-based Atomic Region instrumentation
- To maximize speculation opportunity
- To minimize instrumentation overhead
- Proposed an alias speculation framework leveraging Atomic Regions and evaluated using LICM and GVN/PRE
- May-alias results: 56% to 4% for SPECINT2006, 43% to 1% for SPECFP2006
- Speedup: 3% for SPECINT2006, 9% for SPECFP2006