Title: Evaluating the Impact of Thread Escape Analysis on Memory Consistency Optimizations
- Chi-Leung Wong, Zehra Sura, Xing Fang, Kyungwoo Lee, Samuel P. Midkiff, Jaejin Lee and David Padua
- University of Illinois at Urbana-Champaign
- IBM T.J. Watson Research Center
- Purdue University
- Seoul National University
2 Outline
- Memory Models
- The Pensieve System
- Escape Analyses
- Qualitative Impact of Escape Analyses on Delay Set Analysis and Synchronization Analysis
- Experimental Results
- Conclusion
3 Memory Models
- Consider the following code segments (data and data_ready are shared, initially 0 and false)
- Thread 1: data = 100; data_ready = true;
- Thread 2: while (!data_ready); t = data;
- Can t == 0?
- Yes, if reordering happens
- Thread 1: data_ready = true; data = 100;
- Such reordering can be done by both the compiler and the hardware
- Memory models tell us the answer
- Sequential Consistency says no
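The question above can be checked by brute force. The sketch below (Java 16 or later; the class LitmusSC and its methods are hypothetical, not part of Pensieve) enumerates every interleaving of the two threads, once with Thread 1's stores in program order, as SC requires, and once with them swapped. It models the spin loop by its final iteration, keeping only executions in which Thread 2's read of data_ready returns true.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Brute-force model of the data / data_ready litmus test (hypothetical names).
class LitmusSC {
    // A store by Thread 1 or a load by Thread 2 on one shared variable.
    record Op(boolean isStore, String var, int value) {}

    // Collects every merge of a and b that keeps each thread's own order.
    static void interleave(List<Op> a, int i, List<Op> b, int j,
                           List<Op> prefix, List<List<Op>> out) {
        if (i == a.size() && j == b.size()) {
            out.add(new ArrayList<>(prefix));
            return;
        }
        if (i < a.size()) {
            prefix.add(a.get(i));
            interleave(a, i + 1, b, j, prefix, out);
            prefix.remove(prefix.size() - 1);
        }
        if (j < b.size()) {
            prefix.add(b.get(j));
            interleave(a, i, b, j + 1, prefix, out);
            prefix.remove(prefix.size() - 1);
        }
    }

    // Values Thread 2 can assign to t, given the order of Thread 1's stores.
    // Only executions whose read of data_ready returns true (1) are kept,
    // i.e. the final iteration of the spin loop.
    static Set<Integer> possibleT(List<Op> thread1) {
        List<Op> thread2 = List.of(
            new Op(false, "data_ready", 0),  // while (!data_ready): the exiting read
            new Op(false, "data", 0));       // t = data
        List<List<Op>> runs = new ArrayList<>();
        interleave(thread1, 0, thread2, 0, new ArrayList<>(), runs);

        Set<Integer> results = new TreeSet<>();
        for (List<Op> run : runs) {
            Map<String, Integer> mem = new HashMap<>(Map.of("data", 0, "data_ready", 0));
            int ready = 0, t = 0;
            for (Op op : run) {
                if (op.isStore()) mem.put(op.var(), op.value());
                else if (op.var().equals("data_ready")) ready = mem.get("data_ready");
                else t = mem.get("data");
            }
            if (ready == 1) results.add(t);
        }
        return results;
    }

    // Thread 1 in program order.
    static List<Op> programOrder() {
        return List.of(new Op(true, "data", 100), new Op(true, "data_ready", 1));
    }

    // Thread 1 after the compiler or hardware swaps the two stores.
    static List<Op> reordered() {
        return List.of(new Op(true, "data_ready", 1), new Op(true, "data", 100));
    }

    public static void main(String[] args) {
        System.out.println(possibleT(programOrder())); // [100]
        System.out.println(possibleT(reordered()));    // [0, 100]
    }
}
```

With the stores in program order only t == 100 is reachable; once they are swapped, an execution with t == 0 appears, which is exactly the outcome SC rules out.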
4 Objective of the Pensieve Project
- Sequential consistency (SC) on top of the Intel x86 memory model
- Implementation based on Jikes RVM
- All analyses done at JIT time
- Need to minimize both analysis time and application execution time
5 Enforcing SC
- Done by enforcing the order of memory accesses
- Not all orderings need to be enforced
- Only enforce the orders that are really needed
- Delay Set Analysis (DSA) [SS88] computes such orders
- Our approach: an approximation of DSA
- Orders are enforced by inserting fences in the generated code
6 Original DSA
- Program edge
- x executes before y in the same thread
- Conflict edge
- x and x′ are conflicting accesses
- The order of the accesses affects the program outcome
- In this paper, two accesses conflict if
- they access the same memory location, and
- at least one of them is a write
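A small sketch of the two edge kinds, applied to the data/data_ready example: program edges connect accesses of the same thread in execution order, and conflict edges connect accesses that satisfy the definition above. Restricting conflict edges to accesses in different threads is an assumption of this sketch (within one thread the order is already captured by program edges); the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Program and conflict edges for the data / data_ready example (hypothetical names).
class DsaEdges {
    // One memory access: issuing thread, position in that thread, location, kind.
    record Access(String name, int thread, int index, String location, boolean isWrite) {}

    // Conflict: same location and at least one write. Requiring different
    // threads is an assumption of this sketch.
    static boolean conflicts(Access a, Access b) {
        return a.thread() != b.thread()
            && a.location().equals(b.location())
            && (a.isWrite() || b.isWrite());
    }

    // Program edge "x -> y": x executes before y in the same thread.
    static List<String> programEdges(List<Access> all) {
        List<String> edges = new ArrayList<>();
        for (Access x : all)
            for (Access y : all)
                if (x.thread() == y.thread() && x.index() < y.index())
                    edges.add(x.name() + " -> " + y.name());
        return edges;
    }

    // Conflict edge "a -- b": undirected, so each unordered pair appears once.
    static List<String> conflictEdges(List<Access> all) {
        List<String> edges = new ArrayList<>();
        for (int i = 0; i < all.size(); i++)
            for (int j = i + 1; j < all.size(); j++)
                if (conflicts(all.get(i), all.get(j)))
                    edges.add(all.get(i).name() + " -- " + all.get(j).name());
        return edges;
    }

    static List<Access> example() {
        return List.of(
            new Access("x",  1, 0, "data",       true),   // data = 100
            new Access("y",  1, 1, "data_ready", true),   // data_ready = true
            new Access("y'", 2, 0, "data_ready", false),  // while (!data_ready)
            new Access("x'", 2, 1, "data",       false)); // t = data
    }

    public static void main(String[] args) {
        System.out.println(programEdges(example()));  // [x -> y, y' -> x']
        System.out.println(conflictEdges(example())); // [x -- x', y -- y']
    }
}
```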
7 Original DSA (Contd)
- Critical cycle
- Minimal
- Cannot form a smaller cycle using a subset of its nodes
- Mixed
- Contains both program edges and conflict edges
- Enforce the program edges on a critical cycle
8 Approximate DSA
- Approximation of a critical cycle
- x precedes y
- Conflicting accesses:
- x and x′
- y and y′
- y′ precedes x′
- Enforce the program edges on an approximate critical cycle
(Figure: approximate critical cycle through x, y, y′ and x′)
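The approximate check can be sketched as follows, assuming accesses are listed per thread in program order and that y′ and x′ come from the same thread. The class and method names are hypothetical and this is not Pensieve's implementation: it flags every program edge x -> y for which some y′ precedes some x′ with y conflicting with y′ and x conflicting with x′, without checking minimality, so as an approximation it can flag edges that exact DSA would not.

```java
import java.util.ArrayList;
import java.util.List;

// Approximate delay computation (hypothetical sketch, not Pensieve's code).
class ApproxDsa {
    record Access(String name, int thread, int index, String location, boolean isWrite) {}

    // Same location, at least one write, different threads (assumption).
    static boolean conflicts(Access a, Access b) {
        return a.thread() != b.thread()
            && a.location().equals(b.location())
            && (a.isWrite() || b.isWrite());
    }

    // a precedes b in program order within one thread.
    static boolean precedes(Access a, Access b) {
        return a.thread() == b.thread() && a.index() < b.index();
    }

    // x -> y lies on an approximate critical cycle if some y' precedes some x'
    // with y conflicting with y' and x conflicting with x'.
    static boolean onApproxCycle(Access x, Access y, List<Access> all) {
        for (Access yp : all)
            for (Access xp : all)
                if (precedes(yp, xp) && conflicts(y, yp) && conflicts(x, xp))
                    return true;
        return false;
    }

    // Program edges that must be enforced (delays), e.g. by a fence.
    static List<String> delays(List<Access> all) {
        List<String> result = new ArrayList<>();
        for (Access x : all)
            for (Access y : all)
                if (precedes(x, y) && onApproxCycle(x, y, all))
                    result.add(x.name() + " -> " + y.name());
        return result;
    }

    public static void main(String[] args) {
        List<Access> litmus = List.of(
            new Access("x",  1, 0, "data",       true),   // data = 100
            new Access("y",  1, 1, "data_ready", true),   // data_ready = true
            new Access("y'", 2, 0, "data_ready", false),  // while (!data_ready)
            new Access("x'", 2, 1, "data",       false)); // t = data
        System.out.println(delays(litmus)); // [x -> y, y' -> x']
    }
}
```

For the litmus test both program edges are delays, so the order must be enforced between the two stores of Thread 1 and between the two loads of Thread 2. If Thread 2 instead read data before data_ready, no y′ would precede an x′ and the sketch would report no delays.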
9 The Pensieve System
10 Escape Analyses
- Identify objects that may be accessed by two or more threads
- Output: a set of variables
- { v | v points to an object that may be accessed by ≥ 2 threads }
11 Impact on Delay Set Analysis
- x, y, y′ and x′ must all be escaping accesses
- A cycle cannot be formed if any one of them is not an escaping access
- Fewer escaping accesses imply fewer possible (x, y) pairs
- Fewer checks to be done
- Fewer delays
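The effect on the number of checks can be illustrated with a toy count. In this hypothetical sketch each access carries an escaping flag produced by some escape analysis; DSA only needs to examine (x, y) pairs in which both accesses escape, since a non-escaping access cannot conflict with an access in another thread.

```java
import java.util.List;

// Toy count of DSA candidate pairs with and without escape information (hypothetical).
class EscapeFilter {
    record Access(String name, int thread, int index, boolean escaping) {}

    // Counts ordered pairs (x, y), x before y in the same thread. With escape
    // information, pairs containing a non-escaping access are skipped.
    static int candidatePairs(List<Access> accesses, boolean useEscapeInfo) {
        int count = 0;
        for (Access x : accesses)
            for (Access y : accesses) {
                boolean ordered = x.thread() == y.thread() && x.index() < y.index();
                boolean relevant = !useEscapeInfo || (x.escaping() && y.escaping());
                if (ordered && relevant) count++;
            }
        return count;
    }

    public static void main(String[] args) {
        List<Access> method = List.of(
            new Access("a", 1, 0, false),  // e.g. a field of a thread-local object
            new Access("b", 1, 1, true),   // e.g. a field of an object in a static field
            new Access("c", 1, 2, false),
            new Access("d", 1, 3, true));
        System.out.println(candidatePairs(method, false)); // 6
        System.out.println(candidatePairs(method, true));  // 1
    }
}
```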
12 Impact on Synchronization Analysis
- Synchronization analysis reduces the number of conflict edges considered by DSA
- Considers synchronization constructs
- Calls to start() and join()
- Our system only considers a t1.join() call
- if it can be matched with some t2.start() call
- where t1 and t2 are not escaping
- More precise escape information
- more join() calls matched
- more precise DSA result
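A minimal Java example of the pattern the matching targets. The class name is hypothetical, and the comments describe what an analysis could conclude, not Pensieve's actual code. The Thread object is created, started and joined within one method and is never stored or passed elsewhere, so it does not escape and its join() can be matched with its start().

```java
// Hypothetical example of a start()/join() pair that an analysis could match.
class StartJoinExample {
    static int compute() {
        int[] box = new int[1];                          // shared with the child thread
        Thread worker = new Thread(() -> box[0] = 100);  // worker itself never escapes
        worker.start();                                  // the start() call...
        try {
            worker.join();                               // ...that this join() matches
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        // Because worker does not escape, an analysis can tell that this join()
        // waits for the thread started above, treat the child's write to box[0]
        // as ordered before this read, and drop the conflict edge between them.
        return box[0];
    }

    public static void main(String[] args) {
        System.out.println(compute()); // 100
    }
}
```

Independently of any compiler analysis, the Java memory model guarantees that all actions in a thread happen before another thread successfully returns from join() on it, so compute() always returns 100.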
13 Escape Analyses Comparison
- In this study, we compare 4 algorithms
- Connectivity Analysis (Pensieve)
- Field-Based Analysis (Pensieve)
- For comparison purposes
- Bogda's Analysis
- Removing Unnecessary Synchronization in Java (OOPSLA 1999)
- Ruf's Analysis
- Effective Synchronization Removal for Java (PLDI 2000)
14 Connectivity Escape Analysis
- An object is escaping if both
- It is reachable by more than one thread, in one of two possible cases:
- Reachable from a static field
- Passed to a thread constructor
- It is accessed by more than one thread
- Do not assume that 'this' is escaping in run() by default
- Field-insensitive for most memory accesses
- I.e., do not distinguish x.f from x.g
- Except accesses to Runnable objects
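The escaping condition above can be sketched as a reachability computation over a field-insensitive points-to graph, where an edge records only that one object points to another, not through which field. The inputs, in particular the per-object set of accessing threads, are hypothetical and would come from other parts of the analysis; the special treatment of run() and of Runnable objects is not modeled.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Connectivity-style escape check (hypothetical names and inputs).
class ConnectivityEscape {
    // All objects reachable from the roots, following any field.
    static Set<String> reachable(Map<String, List<String>> pointsTo, Set<String> roots) {
        Set<String> seen = new HashSet<>(roots);
        Deque<String> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            String obj = work.pop();
            for (String next : pointsTo.getOrDefault(obj, List.of()))
                if (seen.add(next)) work.push(next);
        }
        return seen;
    }

    // Escaping = reachable from a static field or a thread-constructor argument,
    // AND accessed by more than one thread.
    static Set<String> escaping(Map<String, List<String>> pointsTo,
                                Set<String> staticRoots,
                                Set<String> threadCtorArgs,
                                Map<String, Set<Integer>> accessedBy) {
        Set<String> roots = new HashSet<>(staticRoots);
        roots.addAll(threadCtorArgs);
        Set<String> result = new TreeSet<>();
        for (String obj : reachable(pointsTo, roots))
            if (accessedBy.getOrDefault(obj, Set.of()).size() > 1) result.add(obj);
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> pointsTo = Map.of(
            "cfg", List.of("buf"),   // some field of cfg refers to buf
            "counter", List.of(),
            "local", List.of());
        Map<String, Set<Integer>> accessedBy = Map.of(
            "cfg", Set.of(1, 2), "buf", Set.of(1),
            "counter", Set.of(1, 2), "local", Set.of(1));
        // cfg is stored in a static field; counter is passed to a thread constructor.
        System.out.println(escaping(pointsTo, Set.of("cfg"), Set.of("counter"), accessedBy));
        // [cfg, counter]
    }
}
```

Here buf is reachable from a static field but touched by only one thread, so the second condition keeps it non-escaping.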
15 Field-Based Escape Analysis
- An object is escaping if it is
- Reachable from a static field, or
- Passed to a thread constructor
- Do not assume that 'this' is escaping in run() by default
- Similar to the connectivity-based analysis, but
- Field-sensitive
- Suppose O1 and O2 are of the same type
- O1.f is distinguished from O1.g
- O1.f is treated as the same as O2.f
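The difference between the two abstractions can be shown by the key each one assigns to an access o.f. In this sketch all names are hypothetical; objects are identified by abstract names such as O1 and O2 of an assumed type Node.

```java
import java.util.function.Function;

// Per-object (field-insensitive) vs. field-based keys for an access o.f (hypothetical).
class AccessAbstractions {
    // An access o.f, where o names an abstract object.
    record FieldAccess(String object, String type, String field) {}

    // Abstract location in the field-based view.
    record FieldKey(String type, String field) {}

    // Field-insensitive, per-object view: all fields of one object collapse.
    static Object objectKey(FieldAccess a) {
        return a.object();
    }

    // Field-based view: the object is dropped and (type, field) is kept.
    static Object fieldKey(FieldAccess a) {
        return new FieldKey(a.type(), a.field());
    }

    // True if the two accesses map to the same abstract location.
    static boolean same(Function<FieldAccess, Object> key, FieldAccess a, FieldAccess b) {
        return key.apply(a).equals(key.apply(b));
    }

    public static void main(String[] args) {
        FieldAccess o1f = new FieldAccess("O1", "Node", "f");
        FieldAccess o1g = new FieldAccess("O1", "Node", "g");
        FieldAccess o2f = new FieldAccess("O2", "Node", "f");
        System.out.println(same(AccessAbstractions::objectKey, o1f, o1g)); // true
        System.out.println(same(AccessAbstractions::fieldKey, o1f, o1g));  // false
        System.out.println(same(AccessAbstractions::objectKey, o1f, o2f)); // false
        System.out.println(same(AccessAbstractions::fieldKey, o1f, o2f));  // true
    }
}
```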
16 Bogda's Escape Analysis
- An object is escaping if it is reachable
- From a static field
- From a Runnable object
- Via more than one field reference
17 Ruf's Escape Analysis
- An object is escaping if both
- It is reachable from either
- A static field, or
- A Runnable object
- It is synchronized on by more than one thread
- Adapted for our own use
- "synchronized" → "accessed"
18 Experimental Settings (Machine)
- Intel (Dell PowerEdge 6600 SMP)
- 4 Intel hyperthreaded 1.5 GHz Xeon processors
- with 1 MB cache each
- 6 GB system memory
19 Experimental Settings (Software)
- Original
- default Jikes RVM implementation
- base case for performance comparison
- Enforcing SC
- Empty
- Arg Escaping
- Connectivity analysis
- Field-based analysis
- Bogda's analysis (bogda)
- Ruf's analysis
20 Measurements
- Escape Analysis Time
- Impact on Delay Set Analysis Time
- Impact on Synchronization Analysis Time
- Slowdown due to fence insertion
- Delay Set Analysis only
- Delay Set Analysis with Synchronization Analysis
21 Escape Analysis Time
22 Impact on Delay Set Analysis Time
23 Impact on Synchronization Analysis Time
24 (Escape + DSA + Synchronization Analysis Time) / Compilation Time
25 Slowdown (DSA Only)
26 Slowdown (DSA + Sync Analysis)
27 Slowdown of connect (DSA + Sync Analysis)
28 Conclusions
- Evaluated the interaction between escape analysis and synchronization/delay set analysis
- Montecarlo and jbb motivate enabling field sensitivity for the connectivity-based analysis
29 Backup Slides Follow
30 Number of Delay Checks Performed
31 Total Compilation Time
32 Number of Delays Found (DSA Only)
33 Number of Delays Found (DSA + Sync Analysis)