Title: Hybrid Analysis
Slide 1: Hybrid Analysis for Loop-level Parallelization
Kiran Kumar (vkirankr_at_iitk.ac.in)
Slide 2: Outline
- Core of Hybrid Analysis.
- Terminology Revision.
- Motivating Example.
- Working of Hybrid Analysis.
- Execution of Hybrid Analyzed Program.
- Experimental Results.
Slide 3: Hybrid Analysis (Relative Definition)
1. Core of Hybrid Analysis.
[Figure: Conservative Compiler Analysis and Hybrid Analysis compared by run-time overhead and compile-time overhead.]
Slide 4: Aggressiveness of Hybrid Analysis
Question asked for each loop: Any cross-iteration dependency? (Dependence distance is not considered.)
- Conservative Compiler Analysis: No -> Generate Parallel Code; Yes or Maybe -> Generate Sequential Code.
- Hybrid Analysis: No -> Generate Parallel Code; Yes -> Generate Sequential Code; Maybe -> resolved at run time by a parallel loop guarded by a run-time test.
Slide 5: Loop Parallelization
2. Terminology Revision.
- Loops are a major source of parallelism.
- One loop iteration can become one thread.
- Thread granularity: the number of iterations in a thread.
[Figure: panels Sequential Loop (for(i=1; i<=3; i++) ..), Thread Extraction (T1, T2, T3), Thread Dependence, and Thread Execution.]
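As a minimal sketch of thread granularity, assuming iterations are handed out in contiguous blocks (the slide does not say how iterations are assigned to threads), the hypothetical helper make_threads groups an iteration space into threads of a given number of iterations:

```python
from typing import List


def make_threads(first: int, last: int, granularity: int) -> List[List[int]]:
    """Split the iterations first..last (inclusive) into threads.

    Each thread gets `granularity` consecutive iterations; the last thread
    may get fewer. Contiguous blocking is an assumption of this sketch.
    """
    if granularity < 1:
        raise ValueError("granularity must be at least 1")
    iterations = list(range(first, last + 1))
    return [iterations[k:k + granularity] for k in range(0, len(iterations), granularity)]


# One iteration per thread: for(i=1; i<=3; i++) gives T1, T2, T3.
print(make_threads(1, 3, 1))   # [[1], [2], [3]]
# Coarser threads: 10 iterations, 4 per thread.
print(make_threads(1, 10, 4))  # [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10]]
```

Coarser granularity means fewer threads and less scheduling overhead, at the cost of less available parallelism.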
Slide 6: Types of Data Dependence
- Flow:   X = ..  followed by  .. = X
- Anti:   .. = X  followed by  X = ..
- Output: X = ..  followed by  X = ..
- Anti and output dependences can be removed by Privatization.
Cross-iteration Dependency:
  DO j = 1, 50
    a(j) = a(j+40)
  ENDDO
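For the loop above, a brute-force enumeration shows which iterations conflict. It is only feasible when the loop bound is a known number, which is exactly what compile-time analysis often lacks. This is a minimal Python sketch; cross_iteration_deps is a hypothetical helper, not part of Hybrid Analysis:

```python
from typing import List, Tuple


def cross_iteration_deps(n: int, offset: int = 40) -> List[Tuple[int, int, int, str]]:
    """Brute-force dependence check for:  DO j = 1, n:  a(j) = a(j + offset)

    Returns (reader, writer, address, kind) for every pair of distinct
    iterations that touch the same element of a.
    """
    writer = {j: j for j in range(1, n + 1)}   # iteration j writes a(j)
    deps = []
    for j in range(1, n + 1):
        addr = j + offset                      # iteration j reads a(j + offset)
        w = writer.get(addr)
        if w is not None and w != j:
            # in sequential order the read comes before the write (anti) or after it (flow)
            kind = "anti" if j < w else "flow"
            deps.append((j, w, addr, kind))
    return deps


deps = cross_iteration_deps(50)
print(len(deps), deps[0])            # 10 (1, 41, 41, 'anti')
print(cross_iteration_deps(40))      # []
```

With 50 iterations, iterations 1 to 10 read a(41) to a(50) before iterations 41 to 50 overwrite them, so every conflict is an anti dependence; with 40 or fewer iterations the loop is independent.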
Slide 7: Compile-time Analysis
3. Motivating Example.
Source: http://parasol.tamu.edu/
Slide 8: Weakness of Compile-time Analysis
Source: http://parasol.tamu.edu/
Slide 9: Run-time Analysis: LRPD Test
Speculative Parallelism
Time Complexity: O(n)
Source: http://parasol.tamu.edu/
Slide 10: Hybrid Analysis
Time Complexity: O(1)
Source: http://parasol.tamu.edu/
Slide 11: Hybrid Analysis
Source: http://parasol.tamu.edu/
Slide 12: 4-step Process for Hybrid Analysis
4. Working Procedure of Hybrid Analysis.
- Collect references as an expression tree.
- Aggregate references symbolically.
- Formulate the independence test.
- Extract the lowest-cost run-time test.
Slide 13: Sample Tracing of Hybrid Analysis
Source: http://parasol.tamu.edu/
Slide 14: Expression Tree Grammar
1. Collect references as an expression tree.
USR (formerly RT_LMAD)
Sample Trace of Expr Tree
Slide 15: Array Symbolic Aggregation
2. Aggregate references symbolically.
[Figure: the reference X(j+40) for j = 1 to n aggregates to the section X[41 : 40+n].]
- Triplet Notation: X(lb1:ub1:incr1, lb2:ub2:incr2, ...)
- LMAD Notation.
Example: for(i=1; i<=n; i++) for(j=1; j<=m; j++) .. = A[i][j]
[Figure: the accesses of A[i][j] in Triplet Notation and in LMAD Notation.]
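The aggregation of X(j+40) over j = 1 to n into X[41 : 40+n] can be sketched as follows, assuming subscripts of the form c*j + d with c > 0 and loop bounds that are affine in a single unknown n. The Affine class and aggregate function are hypothetical illustrations, not the USR or LMAD machinery itself:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Affine:
    """Symbolic value const + coeff * n, for a single unknown loop bound n."""
    const: int
    coeff: int = 0

    def __add__(self, k: int) -> "Affine":
        return Affine(self.const + k, self.coeff)

    def scale(self, c: int) -> "Affine":
        return Affine(c * self.const, c * self.coeff)

    def __str__(self) -> str:
        if self.coeff == 0:
            return str(self.const)
        n_part = "n" if self.coeff == 1 else f"{self.coeff}n"
        return n_part if self.const == 0 else f"{self.const}+{n_part}"


def aggregate(c: int, d: int, lo: Affine, hi: Affine) -> tuple:
    """Section touched by X(c*j + d) for j = lo..hi, as (first, last, stride); assumes c > 0."""
    return (lo.scale(c) + d, hi.scale(c) + d, c)


first, last, stride = aggregate(1, 40, Affine(1), Affine(0, 1))
print(f"X[{first}:{last}:{stride}]")   # X[41:40+n:1]
```

The point of symbolic aggregation is that the section is computed once, with n left unknown, instead of enumerating every iteration.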
Slide 16: Independence Test
3. Formulate independence test.
- DS: Dependence Set (all loop-carried dependences).
- DS = DS - (dependences eliminated by transformations such as privatization and reduction).
- Case (DS = Ø) of:
  - True: Generate Parallel Loop.
  - False: Generate Sequential Loop.
  - Maybe (not decidable at compile time):
    - Extract a condition P under which DS = Ø.
    - Generate Parallel Loop guarded by P.
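The three cases can be sketched as a hypothetical code generator in Python. Parallel execution is simulated with a thread pool, and for the loop a(j) = a(j+40), j = 1..n, the predicate P is assumed to be n <= 40 (the write section [1:n] and the read section [41:40+n] are disjoint exactly when n <= 40). All names here are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional

Body = Callable[[List[int], int], None]


def run_sequential(body: Body, a: List[int], n: int) -> None:
    for j in range(1, n + 1):
        body(a, j)


def run_parallel(body: Body, a: List[int], n: int) -> None:
    # Iterations are submitted independently; only correct when DS is empty.
    with ThreadPoolExecutor(max_workers=4) as pool:
        list(pool.map(lambda j: body(a, j), range(1, n + 1)))


def generate_loop(body: Body, ds_empty: Optional[bool],
                  predicate: Optional[Callable[[int], bool]] = None):
    """ds_empty is True/False when decided at compile time, None for 'Maybe'."""
    def loop(a: List[int], n: int) -> str:
        if ds_empty is True or (ds_empty is None and predicate is not None and predicate(n)):
            run_parallel(body, a, n)
            return "parallel"
        run_sequential(body, a, n)
        return "sequential"
    return loop


def shift_body(a: List[int], j: int) -> None:
    a[j] = a[j + 40]          # a(j) = a(j+40), 1-based; a[0] is unused


guarded = generate_loop(shift_body, None, predicate=lambda n: n <= 40)
a = list(range(100))
print(guarded(a, 30), a[1:4])   # parallel [41, 42, 43]
```

Only the "Maybe" case pays for a run-time check, and here that check is a single scalar comparison.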
Slide 17: Independence Test
- R set: all addresses that are read in an iteration.
- W set: all addresses that are written in an iteration.
$DS = \Big( \bigcup_{i=1}^{n} W_i \Big) \cap \Big( \bigcup_{i=1}^{n} R_i \Big)$
$DS = DS \cup \bigcup_{i=1}^{n} \Big( W_i \cap \bigcup_{j=1}^{i-1} W_j \Big)$
- RO, WF, RW set-based dependence test: pre-elimination.
- R, W set-based dependence test: post-elimination.
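On concrete per-iteration sets, the dependence set from the R, W formulas on this slide can be evaluated directly. This is a minimal Python sketch applied to the loop a(j) = a(j+40); in Hybrid Analysis the sets are symbolic, so the same test is turned into a predicate instead of being evaluated element by element. The function name is hypothetical:

```python
from typing import List, Set


def dependence_set(R: List[Set[int]], W: List[Set[int]]) -> Set[int]:
    """DS from per-iteration read sets R[i] and write sets W[i]."""
    ds = set().union(*W) & set().union(*R)    # reads vs. writes over the whole loop
    earlier_w: Set[int] = set()
    for w_i in W:                              # writes vs. writes of earlier iterations
        ds |= w_i & earlier_w
        earlier_w |= w_i
    return ds


# a(j) = a(j+40), j = 1..n: iteration j reads {j+40} and writes {j}
for n in (40, 50):
    R = [{j + 40} for j in range(1, n + 1)]
    W = [{j} for j in range(1, n + 1)]
    print(n, sorted(dependence_set(R, W)))
# 40 []
# 50 [41, 42, 43, 44, 45, 46, 47, 48, 49, 50]
```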
Slide 18: RO, WF, RW Sets Based Test
- RO set: all memory locations that are only read (not written).
- WF set: all memory locations that are written first and then possibly read and written.
- RW set: all memory locations that are read first and written later.
$DS = \bigcup_{i=1}^{n} \Big( WF_i \cap \bigcup_{j=1}^{n} RO_j \Big)$
$DS = DS \cup \bigcup_{i=1}^{n} \Big( WF_i \cap \bigcup_{j=1}^{n} RW_j \Big)$
$DS = DS \cup \bigcup_{i=1}^{n} \Big( RW_i \cap \bigcup_{j=1}^{i-1} RW_j \Big)$
(RO, WF and RW are disjoint within one iteration, so including j = i in the first two unions adds nothing.)
Memory Set Aggregation
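The same per-iteration view can be sketched with RO, WF and RW sets. This minimal Python sketch assumes each iteration's accesses are available as an ordered trace; classify and dependence_set are hypothetical helpers. Because conflicts between two written-first (WF) sets are left out, a temporary that every iteration writes before reading does not create a dependence, which reflects that such locations can be privatized:

```python
from typing import List, Set, Tuple

Access = Tuple[str, int]          # ("r" or "w", address), in program order


def classify(accesses: List[Access]) -> Tuple[Set[int], Set[int], Set[int]]:
    """Split one iteration's addresses into RO, WF and RW sets."""
    first, written = {}, set()
    for kind, x in accesses:
        first.setdefault(x, kind)
        if kind == "w":
            written.add(x)
    ro = {x for x, k in first.items() if k == "r" and x not in written}
    wf = {x for x, k in first.items() if k == "w"}
    rw = {x for x, k in first.items() if k == "r" and x in written}
    return ro, wf, rw


def dependence_set(trace: List[List[Access]]) -> Set[int]:
    sets = [classify(it) for it in trace]
    all_ro = set().union(*(ro for ro, _, _ in sets))
    all_rw = set().union(*(rw for _, _, rw in sets))
    ds: Set[int] = set()
    earlier_rw: Set[int] = set()
    for _, wf, rw in sets:
        ds |= wf & all_ro             # written first here, read-only elsewhere
        ds |= wf & all_rw             # written first here, read-then-written elsewhere
        ds |= rw & earlier_rw         # read-then-written in two iterations
        earlier_rw |= rw
    return ds


T = 0                                  # address of a scalar temporary (hypothetical)
temp_loop = [[("r", 100 + i), ("w", T), ("r", T), ("w", 200 + i)] for i in range(1, 4)]
sum_loop = [[("r", T), ("r", 100 + i), ("w", T)] for i in range(1, 4)]
print(dependence_set(temp_loop))   # set()
print(dependence_set(sum_loop))    # {0}
```

The first loop (t = b(i); c(i) = t) only writes t before reading it, so it is independent; the second (s = s + b(i)) reads s before writing it in every iteration and keeps its dependence.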
Slide 19: Proof System
4. Extract lowest-cost runtime test.
Source: http://parasol.tamu.edu/
Slide 20: PDAG Extraction from RT_LMAD
Sample Trace of PDAG extraction
Slide 21: Pattern-based Analysis
[Figure: Empty?([1:n] ∩ [41:40+n]) reduces to the condition n ≤ 40.]
- The programmer sorts the tests in ascending order of complexity.
- The programmer defines a set of code patterns for each test.
- The compiler checks for these patterns in the given program and generates the minimum-cost condition.
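A single code pattern of this kind can be sketched as follows: if the write section W = [1:n] lies entirely below the read section R = [41:40+n], the emptiness test Empty?(W ∩ R) reduces to a scalar condition on n. This is a hypothetical Python illustration assuming section bounds that are affine in one unknown n; Affine, below_bound and disjoint_bound are not from the original work:

```python
from dataclasses import dataclass
from fractions import Fraction
from math import ceil
from typing import Optional


@dataclass(frozen=True)
class Affine:
    """const + coeff * n for one unknown loop bound n."""
    const: int
    coeff: int = 0


def below_bound(x: Affine, y: Affine) -> Optional[int]:
    """Largest n with x(n) < y(n), if that condition has the form n <= K."""
    a = x.coeff - y.coeff          # x < y  <=>  a * n < b
    b = y.const - x.const
    if a <= 0:
        return None                # pattern does not apply; try a costlier test
    return ceil(Fraction(b, a)) - 1


def disjoint_bound(w_lo: Affine, w_hi: Affine, r_lo: Affine, r_hi: Affine) -> Optional[int]:
    """Pattern 'write section entirely below read section': w_hi < r_lo.

    w_lo and r_hi are not needed for this particular pattern.
    """
    return below_bound(w_hi, r_lo)


# W = [1:n], R = [41:40+n]  ->  W and R are disjoint when n <= 40
print(disjoint_bound(Affine(1), Affine(0, 1), Affine(41), Affine(40, 1)))   # 40
```

The pattern yields a sufficient condition only; when it does not apply, the compiler would fall back to a more expensive test.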
Slide 22: Lowest-cost Runtime Test
5. Execution of Hybrid Analyzed Program.
- Scalar value condition (Min. Cost Runtime test): Satisfied -> Execute Parallel version; Unsatisfied -> next test.
- Vector Inspection (Greater Cost Runtime test): Satisfied -> Execute Parallel version; Unsatisfied -> next test.
- LRPD test (Max Cost Runtime test): Speculative Parallelism.
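The cascade can be sketched in Python as trying hypothetical run-time tests from cheapest to most expensive and stopping at the first one that proves independence. The cost numbers, test bodies and the choice to treat LRPD speculation as the final fallback are assumptions for illustration:

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RuntimeTest:
    name: str
    cost: int                        # relative cost estimate (hypothetical units)
    check: Callable[[], bool]        # True means the loop is proven independent


def select_version(tests: List[RuntimeTest]) -> str:
    """Try the run-time tests from cheapest to most expensive."""
    for test in sorted(tests, key=lambda t: t.cost):
        if test.check():
            return f"parallel ({test.name})"
    return "speculative (LRPD)"      # last resort: run speculatively and validate


calls = []

def scalar_check() -> bool:
    calls.append("scalar")
    return False                     # e.g. a condition like n <= 40 does not hold

def vector_check() -> bool:
    calls.append("vector")
    return True                      # e.g. inspecting an index array succeeds

tests = [RuntimeTest("vector inspection", 10, vector_check),
         RuntimeTest("scalar condition", 1, scalar_check)]
print(select_version(tests), calls)  # parallel (vector inspection) ['scalar', 'vector']
```

Sorting by cost means an expensive test runs only when every cheaper one has failed.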
Slide 23: Experimental Results
- Code Coverage.
- Speedup for ADM.
- Speedup for DYFESM.
- Speedup for MDG.
- Speedup for TRACK.
- Speedup w.r.t. multi-cores.
Slide 24: References
- Paper: Hybrid Analysis: Static & Dynamic Memory Reference Analysis.
- Paper: Hybrid Dependence Analysis for Automatic Parallelization.
- PhD Thesis: Inter-procedural Parallelization Using Memory Classification Analysis.
- Lectures: http://web.cse.iitk.ac.in/cs738/
- Textbook: Advanced Compiler Design and Implementation.
- Textbook: Compilers: Principles, Techniques, and Tools.
Slide 25: Thank You
Questions?
Slide 26: Privatization Technique
- This technique removes output and anti dependences.
  Original code:     After Privatization:
  A = expr1          t1 = expr1
  .. = A             .. = t1
  A = expr2          t2 = expr2
  .. = A             .. = t2
                     A = t2
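The same idea applied to a scalar temporary inside a loop can be sketched in Python. The loop (t = a(i)*2; b(i) = t + 1) is a hypothetical example: with one shared t, every iteration writes and reads the same location (output and anti dependences); giving each iteration its own t removes them, and copying out the last value preserves the sequential result:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def sequential(a: List[int]) -> Tuple[List[int], int]:
    b = [0] * len(a)
    t = 0                          # one shared scalar, written and read by every iteration
    for i in range(len(a)):
        t = a[i] * 2
        b[i] = t + 1
    return b, t


def privatized(a: List[int]) -> Tuple[List[int], int]:
    def iteration(i: int) -> Tuple[int, int]:
        t = a[i] * 2               # private copy: no output/anti dependence on t across iterations
        return t + 1, t
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(iteration, range(len(a))))
    b = [value for value, _ in results]
    last_t = results[-1][1] if results else 0   # copy-out: t as left by the last iteration
    return b, last_t


print(sequential([1, 2, 3]), privatized([1, 2, 3]))   # ([3, 5, 7], 6) ([3, 5, 7], 6)
```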
Slide 27: Speculative Parallelism
- All threads read the current value or speculate the required value.
- All threads execute with the values available to them.
- Threads commit in sequential order.
- Before committing its values, each thread checks the value in memory against the speculated value.
- If the speculation is correct, the thread commits; otherwise it rolls back.
[Figure: the loop for(i=1; i<=3; i++) split into threads T1, T2, T3 on Core 1, Core 2, Core 3; each thread's speculated value of A is compared with the value of A in Main Memory before it commits.]
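The commit protocol above can be simulated in a few lines. This is a minimal, single-threaded Python sketch with hypothetical names: every iteration first runs against a snapshot of memory while recording the values it read, then iterations commit in order, and an iteration whose recorded values no longer match memory is rolled back and re-executed:

```python
from typing import Callable, Dict, List, Tuple

Load = Callable[[int], int]
Body = Callable[[int, Load], Dict[int, int]]   # iteration i -> {address: new value}


def speculative_run(mem: List[int], iterations: List[int], body: Body) -> int:
    """Speculatively execute iterations, then commit them in sequential order.

    Returns the number of iterations that had to roll back and re-execute.
    """
    snapshot = list(mem)
    pending: List[Tuple[int, Dict[int, int], Dict[int, int]]] = []
    for i in iterations:                       # "parallel" phase, simulated
        reads: Dict[int, int] = {}

        def load(addr: int, reads: Dict[int, int] = reads) -> int:
            reads[addr] = snapshot[addr]       # remember the value speculated on
            return snapshot[addr]

        pending.append((i, reads, body(i, load)))

    rollbacks = 0
    for i, reads, writes in pending:           # commit phase, in iteration order
        if any(mem[addr] != value for addr, value in reads.items()):
            rollbacks += 1                     # misspeculation: discard and re-execute
            writes = body(i, lambda addr: mem[addr])
        for addr, value in writes.items():
            mem[addr] = value
    return rollbacks


mem = [0, 0, 0, 0]
print(speculative_run(mem, [1, 2, 3], lambda i, load: {i: load(i - 1) + 1}), mem)
# 2 [0, 1, 2, 3]
```

In the usage example, the hypothetical loop body A(i) = A(i-1) + 1 carries a true dependence, so iterations 2 and 3 misspeculate and are re-executed; the final memory still matches sequential execution.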
Slide 28:
- The DS = Ø condition is necessary because we consider in this paper only DOALL parallelization (no synchronization).
- Ref: Page 7 of Report 2
Slide 29:
- We have changed its name from RT_LMAD (run-time LMAD) to USR because it is used mainly as an intermediate representation subject to our predicate extraction analysis.
- Ref: Page 5 of Report 2
Slide 30: Sample Tracing of Expr Tree
  a = expr
  Loop (j = 1 to 10)
    If (a > 5)
      B(j+5) = ..
    End If
  End Loop
[Figure: expression tree for this code, built from the loop j = 1 to 10, the reference j+5 and the condition a > 5.]
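A minimal Python sketch of such a tree, assuming three hypothetical node kinds (Ref for a subscript, Gate for an If condition, Loop for a loop recurrence) rather than the actual USR grammar, evaluates the sample above to the set of elements of B it touches:

```python
from dataclasses import dataclass
from typing import Callable, Dict, Set, Union

Env = Dict[str, int]


@dataclass
class Ref:                      # leaf: one subscript, e.g. B(j+5)
    subscript: Callable[[Env], int]


@dataclass
class Gate:                     # If (pred) ... End If
    pred: Callable[[Env], bool]
    child: "Node"


@dataclass
class Loop:                     # Loop (var = lo to hi) ... End Loop
    var: str
    lo: int
    hi: int
    child: "Node"


Node = Union[Ref, Gate, Loop]


def addresses(node: Node, env: Env) -> Set[int]:
    """Set of array elements the tree touches under the given variable values."""
    if isinstance(node, Ref):
        return {node.subscript(env)}
    if isinstance(node, Gate):
        return addresses(node.child, env) if node.pred(env) else set()
    touched: Set[int] = set()
    for v in range(node.lo, node.hi + 1):
        touched |= addresses(node.child, {**env, node.var: v})
    return touched


tree = Loop("j", 1, 10, Gate(lambda e: e["a"] > 5, Ref(lambda e: e["j"] + 5)))
print(sorted(addresses(tree, {"a": 7})))   # [6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
print(addresses(tree, {"a": 3}))           # set()
```

Evaluating the tree needs the value of a; keeping the gate a > 5 in the tree is what later lets a run-time predicate be extracted from it.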
Slide 31: Triplet Notation
Precise when dimension indexing is independent:
  for(i=0; i<=2; i++)
    for(j=0; j<=3; j++)
      a[i][j] = ..
  A(0:2:1, 0:3:1)
Imprecise when dimension indexing is dependent:
  for(i=0; i<=2; i++) for(j=0; j<=3; j++)
    a[i][i+j] = ..
A(0 2 1, 3 1)
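The precision difference can be checked by brute force. This hypothetical Python sketch summarizes each dimension by its bounding triplet (lb:ub:incr) with unit stride, an assumption made for illustration, and compares that summary with the exact set of elements touched. It reads the loop bounds as i from 0 to 2 and j from 0 to 3, consistent with the triplet A(0:2:1, 0:3:1):

```python
from typing import Callable, Set, Tuple

Cell = Tuple[int, int]


def accessed(index: Callable[[int, int], Cell], i_max: int, j_max: int) -> Set[Cell]:
    """Exact (row, column) pairs touched by for(i=0; i<=i_max; i++) for(j=0; j<=j_max; j++)."""
    return {index(i, j) for i in range(i_max + 1) for j in range(j_max + 1)}


def triplet(cells: Set[Cell]) -> Tuple[Tuple[int, int, int], Tuple[int, int, int]]:
    """Per-dimension (lb, ub, incr) bounding box; unit stride assumed."""
    rows = [r for r, _ in cells]
    cols = [c for _, c in cells]
    return (min(rows), max(rows), 1), (min(cols), max(cols), 1)


def expand(box) -> Set[Cell]:
    (r_lb, r_ub, r_inc), (c_lb, c_ub, c_inc) = box
    return {(r, c) for r in range(r_lb, r_ub + 1, r_inc) for c in range(c_lb, c_ub + 1, c_inc)}


independent = accessed(lambda i, j: (i, j), 2, 3)       # a[i][j]
dependent = accessed(lambda i, j: (i, i + j), 2, 3)     # a[i][i+j]
print(triplet(independent), expand(triplet(independent)) == independent)
# ((0, 2, 1), (0, 3, 1)) True
print(triplet(dependent), len(expand(triplet(dependent))), len(dependent))
# ((0, 2, 1), (0, 5, 1)) 18 12
```

For a[i][i+j] the per-dimension summary covers 18 elements although only 12 are accessed, because the column range depends on the row.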
Slide 32: LMAD Notation
  for(i=0; i<=2; i++) for(j=0; j<=3; j++)
    a[i][i+j] = ..
Let the size of A be A[3][10].
Address computation: 10i + (j+i) = 11i + j
Eliminate variable j:
  Stride for j: (11i + (j+1)) - (11i + j) = 1
  Span for j: (11i + 3) - (11i + 0) + 1 = 4
  Offset for j: 11i + 0 = 11i
Eliminate variable i:
  Stride for i: 11(i+1) - 11i = 11
  Span for i: (11*2) - (11*0) + 1 = 23
  Offset for i: 11*0 = 0
LMAD: A(1, 4, 11, 23)
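The derivation above can be checked mechanically. This hypothetical Python sketch (lmad_from_affine and lmad_addresses are illustrative names) computes one stride, span and offset per loop variable for an affine address, using this slide's span convention (last - first + 1), and verifies by enumeration that the resulting descriptor covers exactly the elements a[i][i+j] touches:

```python
from itertools import product
from typing import Dict, List, Set, Tuple


def lmad_from_affine(coeffs: Dict[str, int], bounds: Dict[str, Tuple[int, int]],
                     base: int = 0) -> Tuple[List[int], List[int], int]:
    """Strides, spans and offset of base + sum(c * var) over inclusive bounds.

    Span follows this slide's convention: last - first + 1. Coefficients
    are assumed positive.
    """
    strides, spans, offset = [], [], base
    for var, c in coeffs.items():
        lo, hi = bounds[var]
        strides.append(c)
        spans.append(c * (hi - lo) + 1)
        offset += c * lo
    return strides, spans, offset


def lmad_addresses(strides: List[int], spans: List[int], offset: int) -> Set[int]:
    """Enumerate the addresses an LMAD describes (for checking only)."""
    steps = [range((span - 1) // stride + 1) for stride, span in zip(strides, spans)]
    return {offset + sum(s * k for s, k in zip(strides, ks)) for ks in product(*steps)}


ROW = 10                                   # A[3][10]: 10 elements per row
lmad = lmad_from_affine({"j": 1, "i": ROW + 1}, {"j": (0, 3), "i": (0, 2)})
print(lmad)                                # ([1, 11], [4, 23], 0)
exact = {ROW * i + (i + j) for i in range(3) for j in range(4)}
print(lmad_addresses(*lmad) == exact)      # True
```

Unlike the triplet summary, the LMAD works on linearized addresses, so the dependence between the two subscripts does not cost precision here.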
Slide 33: Memory Set Aggregation
[Figure: the sets RO, WF, RW of a code region aggregated from Section 1 (RO1, WF1, RW1) and Section 2 (RO2, WF2, RW2).]
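One way to combine the sets of two consecutive sections is sketched below in Python, assuming Section 1 executes before Section 2 (MemSets, compose and classify are hypothetical names). A location written first in Section 1 stays WF; a location that is only read in Section 1 and then written in Section 2 becomes RW; a location stays RO only if neither section writes it. The check compares compose against a direct classification of the concatenated access trace:

```python
from typing import FrozenSet, List, NamedTuple, Tuple

Access = Tuple[str, int]          # ("r" or "w", address), in program order


class MemSets(NamedTuple):
    ro: FrozenSet[int]            # only read
    wf: FrozenSet[int]            # written first
    rw: FrozenSet[int]            # read first, written later


def classify(trace: List[Access]) -> MemSets:
    """Classify the locations of one straight-line access trace."""
    first, written = {}, set()
    for kind, x in trace:
        first.setdefault(x, kind)
        if kind == "w":
            written.add(x)
    return MemSets(
        frozenset(x for x, k in first.items() if k == "r" and x not in written),
        frozenset(x for x, k in first.items() if k == "w"),
        frozenset(x for x, k in first.items() if k == "r" and x in written),
    )


def compose(s1: MemSets, s2: MemSets) -> MemSets:
    """Sets of Section 1 followed by Section 2, from the two sections' sets."""
    touched1 = s1.ro | s1.wf | s1.rw
    written = s1.wf | s1.rw | s2.wf | s2.rw
    wf = s1.wf | (s2.wf - touched1)       # written first overall
    ro = (s1.ro | s2.ro) - written        # never written by either section
    rw = (touched1 | s2.ro | s2.wf | s2.rw) - wf - ro
    return MemSets(ro, wf, rw)


t1 = [("r", 1), ("w", 2), ("r", 3), ("w", 3)]
t2 = [("w", 1), ("r", 2), ("r", 4), ("w", 5), ("r", 3)]
print(compose(classify(t1), classify(t2)) == classify(t1 + t2))   # True
```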
Slide 34: Sample Trace of PDAG Extraction
[Figure: sample trace of PDAG extraction. Inputs include the sections [1:n], [41:40+n] and [21:20+n] and the condition x > 0; the Empty? tests yield n ≤ 40 and n ≤ 20, and the extracted predicate combines n ≤ 20, x > 0 and n ≤ 40.]
Slide 35: Vector Inspection Example
  Main()
    Read C(1:N), L, lim
    Do i = 1 to L
      If C(i) < lim then
        C(i) = ..
      End If
      .. = C(i)
    End Do
Slide 36: LRPD Test Example
  Main()
    Read C(1:N), A(1:N), L
    Do i = 1 to L
      C(A(i)) = ..
      .. = C(i)
    End Do
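A simplified, single-threaded Python sketch of the LRPD idea for this loop is given below. It records, per iteration, which elements of C are written, read without being written, and read before being written, then applies the usual checks; the reduction part of the test is omitted and the access trace is built sequentially instead of being marked during speculative parallel execution. All names are hypothetical:

```python
from typing import List, Set, Tuple

Access = Tuple[str, int]   # ("r" or "w", element index), in program order


def lrpd(trace: List[List[Access]]) -> str:
    """Simplified LRPD analysis over the accesses of each iteration."""
    a_w: Set[int] = set()    # written in some iteration
    a_r: Set[int] = set()    # read but not written in some iteration
    a_np: Set[int] = set()   # read before written in some iteration: not privatizable
    tw = 0                   # writes, counted once per element per iteration
    for accesses in trace:
        written: Set[int] = set()
        read: Set[int] = set()
        for kind, x in accesses:
            if kind == "w":
                written.add(x)
            else:
                read.add(x)
                if x not in written:
                    a_np.add(x)
        tw += len(written)
        a_w |= written
        a_r |= read - written
    if a_w & a_r:
        return "not parallel"                  # written in one iteration, read in another
    if tw == len(a_w):
        return "parallel"                      # each element written by at most one iteration
    if a_w & a_np:
        return "not parallel"                  # shared write on an element that is read first
    return "parallel after privatization"


def trace_for(A: List[int], L: int) -> List[List[Access]]:
    # iteration i:  C(A(i)) = ..   then   .. = C(i)   (1-based as in the example)
    return [[("w", A[i - 1]), ("r", i)] for i in range(1, L + 1)]


print(lrpd(trace_for([1, 2, 3, 4], 4)))   # parallel
print(lrpd(trace_for([2, 3, 4, 5], 4)))   # not parallel
print(lrpd(trace_for([5, 5, 5, 5], 4)))   # parallel after privatization
```

With the identity index array A(i) = i the loop passes; with A(i) = i + 1 it fails because an element written by one iteration is read by another; with A(i) = 5 for all i, only the shared write remains, which privatization can remove.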
Slide 37: Speculative Execution Example
  Main()
    Read C(1:N), A(1:N), L
    Do i = 1 to L
      A(i-1)
      C(A(i)) = ..
      .. = C(i)
    End Do
Slide 38: Code Coverage
Source: http://parasol.tamu.edu/
Slide 39: Speedup for ADM Benchmark
Source: http://parasol.tamu.edu/
Slide 40: Speedup for DYFESM Benchmark
Source: http://parasol.tamu.edu/
Slide 41: Speedup for MDG
Source: http://parasol.tamu.edu/
Slide 42: Speedup for TRACK
Source: http://parasol.tamu.edu/
Slide 43: Advanced Compiler Architecture
- Common subexpression elimination
- Copy propagation
- Dead code elimination
- Code motion
- Strength reduction
- Constant folding