The Potential of Trace-Level Parallelism in Java Programs

1
The Potential of Trace-Level Parallelism in Java
Programs

Borys J. Bradel
Tarek S. Abdelrahman
University of Toronto
Principles and Practices of Programming in Java
September 7th 2007

2
Motivation

Gap exists between hardware and software
Hardware
Majority of computer chips contain multiple cores
Athlon X2, Core 2 Duo, Power5, Cell, Niagara
Software
Writing parallel software is difficult
Bridging the gap may lead to better utilization
of hardware and therefore improved performance

3
Automatic Parallelization

Traditional compile time
Perform analysis at compile time
Divide program based on analysis
Limited success
Runtime
New approach to automatic parallelization is
needed
Combine analysis with runtime information
What information to use?
Trace-Based
Our solution is to use traces

4
How successful can using traces be?

We answer this question by simulating trace
execution
Monitor a program's execution
Simulate the execution of traces in parallel
Measure a practical upper bound on parallelism
Not an accurate measurement of performance

5
Outline

Traces
Execution Model
Simulation Platform
Experimental Evaluation
Conclusion

6
Trace Definition

A trace is a frequently executed sequence of
unique basic blocks or instructions
Identified by a trace collection system at runtime

public static int foo() {
    int a = 0;
    for (int i = 0; i < n; i++)
        a += i;
    return a;
}
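The definition above can be illustrated with a minimal sketch of a trace collector. This is not RedSpot's algorithm: the class name, the string block labels, and the hotThreshold parameter are assumptions made for illustration. The sketch counts how often each basic block executes, starts recording once a block becomes hot, and closes the trace when a block would repeat, so every trace holds unique blocks. The block order it produces may differ from what a real trace collection system records.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of runtime trace collection; not RedSpot's actual algorithm.
class TraceCollectorSketch {
    private final int hotThreshold;                        // assumed tuning parameter
    private final Map<String, Integer> counts = new HashMap<>();
    private final Set<String> covered = new HashSet<>();   // blocks already inside a trace
    private final List<List<String>> traces = new ArrayList<>();
    private Set<String> recording = null;                  // trace being recorded, if any

    TraceCollectorSketch(int hotThreshold) {
        this.hotThreshold = hotThreshold;
    }

    // Called once per executed basic block, in execution order.
    void onBlock(String block) {
        if (recording != null) {
            if (recording.add(block)) {
                return;                                    // new unique block: extend the trace
            }
            traces.add(new ArrayList<>(recording));        // block repeated: close the trace
            covered.addAll(recording);
            recording = null;
        }
        if (covered.contains(block)) {
            return;                                        // already part of a collected trace
        }
        int count = counts.merge(block, 1, Integer::sum);
        if (count == hotThreshold) {                       // block became frequently executed
            recording = new LinkedHashSet<>();
            recording.add(block);
        }
    }

    List<List<String>> traces() {
        return traces;
    }

    public static void main(String[] args) {
        TraceCollectorSketch collector = new TraceCollectorSketch(3);
        collector.onBlock("B0");                           // loop setup
        for (int k = 0; k < 5; k++) {                      // loop test B2, loop body B1
            collector.onBlock("B2");
            collector.onBlock("B1");
        }
        System.out.println(collector.traces());
    }
}
```

A lower threshold identifies traces sooner but risks recording paths that are not actually frequent.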
7
Benefits

Source code is not required
Granularity of parallelism can vary
Traces simplify control flow and analysis
Traces are simple to identify

8
Execution Model
[Figure labels: Method, CFG, sequential, parallel]
9
Dependence Communication
Method
Dependences limit parallelism

a += i
10
Dependence Communication
Different types of communication
Instruction-Instruction
Trace-Trace
Trace-Instruction
Communication Delay
[Figure: example dependences for each communication type]
11
Requirements

Java Virtual Machine
Execute bytecode
Interpreted or compiled
Trace Collection System
monitor control flow
create traces

[Figure labels: JVM, Code Execution, control flow, TCS]
12
Parallel Identification Engine

Records memory information
Keeps track of dependences
Ignore instructions that read and write to the
same variable, e.g. the dependence between i++ and
itself is ignored
Schedules instructions
Instruction Window
Communication
Processor Count
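The dependence bookkeeping listed above can be sketched as follows. This is a minimal illustration, not the engine's implementation: the class and method names are hypothetical, memory locations are modeled as strings, and each executed trace is identified by an integer. A read creates a dependence on the trace that last wrote the location, while an instruction that reads and writes the same variable (such as i++) records only the write, so it does not depend on itself across traces.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of dependence tracking between executed traces.
class DependenceTrackerSketch {
    // memory location -> id of the trace that last wrote it
    private final Map<String, Integer> lastWriter = new HashMap<>();
    // trace id -> ids of earlier traces it depends on
    private final Map<Integer, Set<Integer>> deps = new HashMap<>();

    // A plain read depends on the last writer, unless that is the same trace.
    void onRead(int traceId, String location) {
        Integer writer = lastWriter.get(location);
        if (writer != null && writer != traceId) {
            deps.computeIfAbsent(traceId, k -> new LinkedHashSet<>()).add(writer);
        }
    }

    void onWrite(int traceId, String location) {
        lastWriter.put(location, traceId);
    }

    // An instruction such as i++ reads and writes the same variable:
    // the read is ignored, so only the write is recorded.
    void onReadModifyWrite(int traceId, String location) {
        onWrite(traceId, location);
    }

    Set<Integer> dependsOn(int traceId) {
        return deps.getOrDefault(traceId, Collections.<Integer>emptySet());
    }

    public static void main(String[] args) {
        DependenceTrackerSketch tracker = new DependenceTrackerSketch();
        tracker.onWrite(1, "x");             // trace 1 writes x
        tracker.onReadModifyWrite(1, "i");   // trace 1 executes i++
        tracker.onRead(2, "x");              // trace 2 reads x: depends on trace 1
        tracker.onReadModifyWrite(2, "i");   // trace 2 executes i++: no dependence added
        System.out.println(tracker.dependsOn(2));
    }
}
```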

[Figure labels: JVM, Code Execution, control flow, instruction info, traces]
13
Scheduling
Record trace information when traces execute
sequentially
Schedule when instruction window is full
14
Schedule around Dependences
4 processors, 12 traces per window

Dependent traces are scheduled far enough apart
to have correct execution
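A minimal sketch of scheduling one window of traces around dependences follows. It uses a simple greedy list scheduler, not the Modified Critical-Path algorithm used in the evaluation. The class name, the array-based inputs, and the rule that the communication delay is paid only when a dependence crosses processors are all assumptions for illustration.

```java
// Hypothetical greedy scheduler for one window of traces; not the Modified Critical-Path algorithm.
class WindowSchedulerSketch {
    /**
     * cycles[t]    length of trace t in cycles
     * dependsOn[t] indices of earlier traces (less than t) that trace t depends on
     * Returns the number of cycles needed to finish every trace in the window.
     */
    static int makespan(int[] cycles, int[][] dependsOn, int processors, int commDelay) {
        int n = cycles.length;
        int[] finish = new int[n];          // cycle at which each trace completes
        int[] procOf = new int[n];          // processor each trace was assigned to
        int[] procFree = new int[processors];
        int makespan = 0;
        for (int t = 0; t < n; t++) {
            int bestProc = 0;
            int bestStart = Integer.MAX_VALUE;
            for (int p = 0; p < processors; p++) {
                int start = procFree[p];
                for (int d : dependsOn[t]) {
                    // Assumed: the delay applies only across processors.
                    int ready = finish[d] + (procOf[d] == p ? 0 : commDelay);
                    start = Math.max(start, ready);
                }
                if (start < bestStart) {
                    bestStart = start;
                    bestProc = p;
                }
            }
            procOf[t] = bestProc;
            finish[t] = bestStart + cycles[t];
            procFree[bestProc] = finish[t];
            makespan = Math.max(makespan, finish[t]);
        }
        return makespan;
    }

    public static void main(String[] args) {
        int[] cycles = {2, 2, 2, 2};
        int[][] independent = {{}, {}, {}, {}};
        int[][] chain = {{}, {0}, {1}, {2}};
        System.out.println(makespan(cycles, independent, 4, 1));
        System.out.println(makespan(cycles, chain, 4, 1));
    }
}
```

Independent traces spread across the processors, while a chain of dependent traces is serialized no matter how many processors are available.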

15
Speedup

Ratio of the cycles over all scheduled traces on a
one-processor system to the cycles aggregated over all
scheduled traces on the parallel system
Each trace executes sequentially on one processor
A cycle represents the write of one memory
location
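As a small worked sketch of this metric, the code below takes speedup in the usual sense: cycles on a one-processor system divided by cycles on the parallel system. The class and method names are hypothetical, and the parallel cycle count is assumed to come from a scheduler. On one processor the traces run back to back, so their cycle counts are summed.

```java
// Hypothetical sketch of the speedup computation.
class SpeedupSketch {
    // On one processor every trace runs sequentially, so cycles add up.
    static long oneProcessorCycles(int[] traceCycles) {
        long total = 0;
        for (int c : traceCycles) {
            total += c;
        }
        return total;
    }

    // Speedup = one-processor cycles / parallel cycles.
    static double speedup(int[] traceCycles, long parallelCycles) {
        if (parallelCycles <= 0) {
            throw new IllegalArgumentException("parallelCycles must be positive");
        }
        return (double) oneProcessorCycles(traceCycles) / parallelCycles;
    }

    public static void main(String[] args) {
        // Twelve traces of 2 cycles each (two memory writes per trace).
        int[] traceCycles = new int[12];
        java.util.Arrays.fill(traceCycles, 2);
        // Assumed schedule: four processors each run three traces, finishing in 6 cycles.
        System.out.println(speedup(traceCycles, 6));
    }
}
```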

B1: a += i; i++ (2 cycles)
B2: if (i < n) goto B1
16
Experimental Evaluation

Jupiter (Patrick Doyle)
RedSpot (Borys Bradel)
Modified Critical-Path scheduler (Min-You Wu)
Benchmarks
Java Grande Section 3
SPECjvm98

17
Effect of Window Size
18
Effect of Communication Cost
19
Effect of Communication Type
20
Effect of Processor Count
21
Conclusion

How successful can using traces be?
Built simulator to measure parallel execution of
traces
Traces have the potential to be used to
parallelize programs
Some benchmarks do not scale well
Some benchmarks scale very well
Most benchmarks have at least 2x speedup on four
processors
Future work: create a system that performs
trace-based parallelization

22
Jupiter and RedSpot
Interpreter:
  emulate a = 0
  emulate i = 0
  emulate goto B2; call RedSpot
  emulate if (i < n) goto B1; call RedSpot
  emulate a += i
  emulate i++
  emulate if (i < n) goto B1; call RedSpot
Trace 1:
  emulate a += i
  emulate i++
  emulate if (i < n) goto B1; call RedSpot
23
Parallel Identification Engine
Interpreter:
  emulate if (i < n) goto B1; call RedSpot; call PIE
  emulate a += i; call PIE
  emulate i++; call PIE
  emulate if (i < n) goto B1; call RedSpot; call PIE
  emulate a += i; call PIE
  emulate i++; call PIE
  emulate if (i < n) goto B1; call RedSpot; call PIE
Call PIE for each instruction and each memory access
24
Processor Count
Maximum number of processors limits performance
2 processors
25
Scheduling Window
Can only schedule a limited number of traces at a
time
4 traces per window
