Compactly Representing Parallel Program Executions

Ankit Goel, Abhik Roychoudhury, Tulika Mitra (National University of Singapore)

Transcript and Presenter's Notes

Title: Compactly Representing Parallel Program Executions


1
Compactly Representing Parallel Program Executions
  • Ankit Goel Abhik Roychoudhury Tulika Mitra
  • National University of Singapore

2
Path profiles
  • Profiling a program's execution
  • Count based
  • Path based
  • Count based profiles are more aggregate
  • Number of executions of the program's basic blocks
  • Number of accesses to various memory locations
  • Path based profiles are more accurate
  • Sequence of basic blocks executed
  • Sequence of memory locations accessed
  • Use online compression to generate compact path
    profiles.
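The difference between the two kinds of profile can be illustrated with a small sketch. The basic-block IDs and the two runs below are hypothetical, chosen only so that both runs execute each block the same number of times in a different order.

```python
from collections import Counter

def count_profile(path):
    """Count-based profile: how many times each basic block executed."""
    return Counter(path)

def path_profile(path):
    """Path-based profile: the exact sequence of basic blocks executed."""
    return list(path)

# Two hypothetical runs over basic blocks 1, 2, 3.
run_a = "1231"
run_b = "1321"

# The aggregate counts cannot tell the runs apart; the paths can.
same_counts = count_profile(run_a) == count_profile(run_b)
same_paths = path_profile(run_a) == path_profile(run_b)
```

Here `same_counts` is true while `same_paths` is false, which is the sense in which the path-based profile carries more information.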

3
Organization
  • Compressed Path Profiles in Sequential Programs
  • Parallel Program Path Profiles
  • Compression Efficiency and Overheads
  • Data race detection over path profiles

4
Compressed Path - Example
Uncompressed path: 123123
Compressed representation: S -> AA, A -> 123
[Figure: Control Flow Graph with basic blocks 1, 2, 3]
5
Online Path Compression
  • A program path is a string over a finite alphabet
  • Alphabet decided by what we instrument
  • Control flow (Basic Blocks executed)
  • Data flow (Memory Locations accessed)
  • A string s is represented by a context-free
    grammar Gs whose language is {s}
  • Construction of Gs is online, not post-mortem
  • Start with a trivial grammar and modify it for each
    symbol
  • No recursive rules (DAG representation)
  • Compression scheme: Nevill-Manning & Witten 97
  • Application to program paths: Larus 99
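A minimal sketch of what "the language of the grammar is exactly one string" means in practice. The dictionary encoding of the grammar and the function names are assumptions made for this illustration; the grammar itself is the S -> AA, A -> 123 example from the earlier slide.

```python
def expand(grammar, symbol="S"):
    """Expand a non-recursive grammar into the single string it derives.

    `grammar` maps each nonterminal to its right-hand side (a list of
    symbols). Any symbol that is not a key is treated as a terminal.
    """
    out = []
    for sym in grammar[symbol]:
        if sym in grammar:
            out.extend(expand(grammar, sym))  # shared rule: a DAG node
        else:
            out.append(sym)
    return out

def grammar_size(grammar):
    """Total number of symbols on all right-hand sides."""
    return sum(len(body) for body in grammar.values())

grammar = {"S": ["A", "A"], "A": ["1", "2", "3"]}
path = "".join(expand(grammar))
size = grammar_size(grammar)
```

Because no rule refers to itself, directly or indirectly, the recursion always terminates. In this example `path` is the 6-symbol string while the grammar stores 5 symbols; the saving grows as paths get longer and more repetitive.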

6
Online Compression in action
Path executed    Compressed representation
1                S -> 1
12               S -> 12
123              S -> 123
1231             S -> 1231
12312            S -> 12312
                 S -> A3A, A -> 12
7
Online Compression in action
Path executed    Compressed representation
123123           S -> A3A3, A -> 12
                 S -> BB, B -> A3, A -> 12
                 S -> BB, B -> 123
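The steps in the two tables can be reproduced with a short sketch. This is a naive illustration in the spirit of the cited compression scheme, not the authors' implementation: it rescans the whole grammar after every symbol, whereas a practical version would use incremental bookkeeping. It assumes terminals never collide with rule names (here terminals are digits and rule names are letters); all function names are made up for this example.

```python
from itertools import count

def fresh_names():
    for ch in "ABCDEFGHIJKLMNOPQRTUVWXYZ":  # "S" is reserved for the start rule
        yield ch
    for i in count(1):
        yield f"N{i}"

def find_repeated_digram(rules):
    """Return a pair of adjacent symbols that occurs twice without overlap."""
    seen = {}
    for name, body in rules.items():
        for i in range(len(body) - 1):
            d = (body[i], body[i + 1])
            if d not in seen:
                seen[d] = (name, i)
            elif seen[d][0] != name or i - seen[d][1] >= 2:
                return d
    return None

def replace_digram(body, d, name):
    out, i = [], 0
    while i < len(body):
        if tuple(body[i:i + 2]) == d:
            out.append(name)
            i += 2
        else:
            out.append(body[i])
            i += 1
    return out

def compress_online(path):
    rules, names = {"S": []}, fresh_names()
    for sym in path:                      # one symbol at a time (online)
        rules["S"].append(sym)
        changed = True
        while changed:
            changed = False
            d = find_repeated_digram(rules)
            if d is not None:
                # Reuse a rule whose body is exactly this digram, if any.
                name = next((n for n, b in rules.items()
                             if n != "S" and tuple(b) == d), None)
                if name is None:
                    name = next(names)
                for n in list(rules):
                    if n != name:
                        rules[n] = replace_digram(rules[n], d, name)
                rules[name] = list(d)
                changed = True
                continue
            # Inline any rule that is now used only once.
            for n in [r for r in rules if r != "S"]:
                users = [(m, b) for m, b in rules.items() if n in b]
                if sum(b.count(n) for _, b in users) == 1:
                    m, b = users[0]
                    k = b.index(n)
                    rules[m] = b[:k] + rules.pop(n) + b[k + 1:]
                    changed = True
                    break
    return rules

after_five = compress_online("12312")
after_six = compress_online("123123")
```

After five symbols the grammar is S -> A3A, A -> 12. After the sixth, the repeated digram A3 becomes rule B, and A, now used only once, is inlined to give S -> BB, B -> 123.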
8
Organization
  • Compressed Path Profiles in Sequential Programs
  • Parallel Program Path Profiles
  • Compression Efficiency and Overheads
  • Data race detection over path profiles

9
What to represent?
  • Control/data flow in each program thread
  • Communication among threads
  • Synchronization (locks, barriers)
  • Unsynchronized shared variable accesses
  • Too costly to observe/record order of all shared
    variable accesses
  • We will represent
  • Compressed flow in each thread (via Grammar)
  • Communication via synchronizations (How?)

10
Synchronization Pattern (Locks)
[Figure: Message Sequence Chart (MSC) for a program with threads P1 and P2 and shared Memory; events shown: lock, Compute, unlock]
11
Synchronization Pattern (Barrier)
[Figure: Message Sequence Chart for a program with threads P1 and P2 and shared Memory; events shown: ready, Blocked, go, Compute]
12
Connection to MSCs
Partial Order of MSC
  • Matches Observed Ordering
  • Total order in each thread
  • Ordering across threads visible via
    synchronization (msg. exchange)

[Figure: MSC with Th. 1, Th. 2 and Shared Mem.; events shown: unlock, lock]
All synchronization ops. form a total order
13
A first cut
  • Instrument each thread to observe local
    control/data flow and global synch.
  • Represent path profile of P1 P2
  • Each thread's flow as a grammar (G1, G2)
  • Contains synch. ops. as well.
  • All synchronization ops. as a list.
  • Associate entries in this list with the occurrences
    of synch. ops. in (G1, G2)
  • How to navigate the path profile?
  • Zoom in to a specific lock-unlock segment of P1
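One way to picture the global list and its association with per-thread occurrences is sketched below. The tuple layout, the function names, and the event sequence are all assumptions made for illustration; the slides do not specify a concrete encoding.

```python
def build_synch_list(events):
    """events: observed (thread, op) pairs in global order.

    Returns entries (thread, op, k), where the entry is the k-th
    synchronization operation of that thread (1-based). The ordinal k
    is what links a list entry to an occurrence in the thread's grammar.
    """
    seen, out = {}, []
    for thread, op in events:
        k = seen.get(thread, 0) + 1
        seen[thread] = k
        out.append((thread, op, k))
    return out

def lock_unlock_segments(synch_list, thread):
    """Per-thread ordinals (k_lock, k_unlock) of each lock-unlock segment."""
    segments, open_k = [], None
    for t, op, k in synch_list:
        if t != thread:
            continue
        if op == "lock":
            open_k = k
        elif op == "unlock" and open_k is not None:
            segments.append((open_k, k))
            open_k = None
    return segments

# Hypothetical observed order of synchronization operations.
events = [("P1", "lock"), ("P1", "unlock"),
          ("P2", "lock"), ("P2", "unlock"),
          ("P1", "lock"), ("P1", "unlock")]
synch = build_synch_list(events)
p1_segments = lock_unlock_segments(synch, "P1")
```

Zooming in on the second lock-unlock segment of P1 then amounts to finding the 3rd and 4th synchronization operations inside P1's grammar.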

14
Edge annotations
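Thread trace: a b (lock) c (unlock) x b (lock) c (unlock) y
[Figure: Grammar for one thread, drawn as a DAG with nodes S and A and leaves a, x, y, b, c; edge annotations 4, 0, 2, 2 at S and 0, 1 at A]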
15
Locating synch. operations
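[Figure: the annotated grammar DAG (nodes S, A; leaves a, x, y, b, c; edge annotations 4, 0, 2, 2 and 0, 1) with additional labels X, Y and n (n synch ops.)]
Locating the 3rd synchronization operation.
Can find synch. segments by looking up the global list.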
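A sketch of how edge annotations could guide the search without decompressing the grammar. It rests on assumptions not confirmed by the transcript: that the trace a b c x b c y is stored as S -> a A x A y, A -> b c, and that each edge is annotated with the number of synchronization operations preceding that child within its rule. All names are illustrative.

```python
SYNCH = {"b", "c"}  # assumption: b is the lock, c is the unlock
GRAMMAR = {"S": ["a", "A", "x", "A", "y"], "A": ["b", "c"]}

def synch_count(sym, grammar, memo):
    """Number of synch ops in the expansion of `sym`."""
    if sym not in grammar:
        return 1 if sym in SYNCH else 0
    if sym not in memo:
        memo[sym] = sum(synch_count(s, grammar, memo) for s in grammar[sym])
    return memo[sym]

def annotate(grammar):
    """For each rule, the count of synch ops preceding each child."""
    memo, ann = {}, {}
    for rule, body in grammar.items():
        total, ann[rule] = 0, []
        for s in body:
            ann[rule].append(total)
            total += synch_count(s, grammar, memo)
    return ann, memo

def locate(k, grammar, root="S"):
    """Root-to-leaf path, as (rule, child index) pairs, of the k-th synch op."""
    ann, memo = annotate(grammar)
    k -= 1  # switch to a 0-based offset
    rule, path = root, []
    while True:
        for i, s in enumerate(grammar[rule]):
            lo = ann[rule][i]
            if lo <= k < lo + synch_count(s, grammar, memo):
                break
        else:
            raise IndexError("fewer synch ops than requested")
        path.append((rule, i))
        k -= ann[rule][i]
        if s not in grammar:
            return path, s
        rule = s

annotations = annotate(GRAMMAR)[0]
third = locate(3, GRAMMAR)
```

Under these assumptions the annotations come out as 0, 0, 2, 2, 4 for S and 0, 1 for A, and the third synchronization operation is reached through the second occurrence of A. The search visits one child per level of the DAG, so the cost depends on the grammar's depth rather than on the length of the trace.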
16
So far
  • Control flow of each thread stored as a grammar
  • Synchronization ops. form a global list
  • Grammar of each thread annotated with counts
  • Easy searching of synchronization operations
  • What about shared data accesses?
  • Sequence of memory locations accessed by a single
    LD/ST instruction can be compressed
  • Use a Grammar representation for this seq. as well

17
Further compression
  • Locations accessed by a memory operation:
    10,14,18,22,26,54,58,62,66,70,98
  • Online compression of the string as a grammar
  • Difference representation and run-length encoding:
    10(1), 4(4), 28(1), 4(4), 28(1)
  • Useful for detecting regularity of array accesses
  • Sweep through an array: a run of constant diffs.
  • Accessing a sub-grid of a multidimensional array
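The difference representation followed by run-length encoding can be checked on the address sequence from the slide. The function names are illustrative, and the first address is treated as a difference from 0, which is what the slide's leading 10(1) suggests.

```python
def diff_rle(addresses):
    """Encode addresses as run-length-encoded successive differences.

    Returns (difference, run_length) pairs. The first address is
    encoded as a difference from 0.
    """
    runs, prev = [], 0
    for addr in addresses:
        d = addr - prev
        prev = addr
        if runs and runs[-1][0] == d:
            runs[-1][1] += 1
        else:
            runs.append([d, 1])
    return [(d, n) for d, n in runs]

def undo_diff_rle(runs):
    """Inverse of diff_rle: rebuild the original address sequence."""
    out, cur = [], 0
    for d, n in runs:
        for _ in range(n):
            cur += d
            out.append(cur)
    return out

addresses = [10, 14, 18, 22, 26, 54, 58, 62, 66, 70, 98]
encoded = diff_rle(addresses)
```

The result has five pairs instead of eleven addresses, and the pair (4, 4) followed by (28, 1) repeats, which is the kind of regularity a grammar-based compressor can then exploit.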

18
Organization
  • Compressed Path Profiles in Sequential Programs
  • Parallel Program Path Profiles
  • Compression Efficiency and Overheads
  • Data race detection over path profiles

19
Any better than gzip?
[Figure: Compression (2 Processors)]
20
Scalability of Compression
[Figure: Compression for our scheme]
21
Concerns about Timing Overheads
  • Our scheme does not add substantial time overhead
    over grammar based string compression
  • Our experiments conducted using RSIM
  • Tracing overheads can be higher in a real
    multiprocessor
  • Can tracing distort program behavior?
  • Possible solution
  • Trace minimal number of operations in a parallel
    program execution (Netzer 1993) to ensure
    deterministic replay
  • Collect compressed path profile during replay.

22
Organization
  • Compressed Path Profiles in Sequential Programs
  • Parallel Program Path Profiles
  • Compression Efficiency and Overheads
  • Data race detection over path profiles

23
Apparent Data races
  • Last unlock in Th. 1 (first unlock)
  • Next lock in Th. 1 (second lock)
  • Locate root-to-leaf paths of these ops.
  • Tree rooted at the least common ancestor of these
    ops.

[Figure: MSC with Th. 1, Th. 2, Th. 3 and Mem.; four lock/unlock pairs]
No decompression of the grammar of Th. 1
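For orientation, a generic happens-before check over synchronization segments is sketched below. It is not the grammar-based search described on the slide; it only illustrates what "apparent data race" means once the segments, their read/write sets, and the ordering edges are known. The segment names, variable names, and edges are hypothetical.

```python
from itertools import combinations

def reachable(edges, src):
    """All segments ordered after `src` by the given ordering edges."""
    succ = {}
    for a, b in edges:
        succ.setdefault(a, []).append(b)
    seen, stack = set(), [src]
    while stack:
        n = stack.pop()
        for m in succ.get(n, []):
            if m not in seen:
                seen.add(m)
                stack.append(m)
    return seen

def apparent_races(segments, edges):
    """segments: name -> (thread, read_set, write_set).

    Two segments race if they belong to different threads, neither is
    ordered before the other, and at least one writes a location the
    other reads or writes.
    """
    races = []
    for p, q in combinations(sorted(segments), 2):
        tp, rp, wp = segments[p]
        tq, rq, wq = segments[q]
        if tp == tq:
            continue
        if q in reachable(edges, p) or p in reachable(edges, q):
            continue
        if wp & (rq | wq) or wq & rp:
            races.append((p, q))
    return races

segments = {
    "t1a": (1, set(), {"x"}),
    "t1b": (1, set(), {"y"}),
    "t2a": (2, {"x"}, set()),
    "t2b": (2, set(), {"y"}),
}
edges = [("t1a", "t1b"), ("t2a", "t2b"),  # program order in each thread
         ("t1b", "t2b")]                  # an unlock followed by a lock
races = apparent_races(segments, edges)
```

In this example only the accesses to `x` are reported: the two writes to `y` are ordered through the lock, while nothing orders the write and the read of `x`.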
24
Data race artifacts
Sub = 1; A[1] = 0
X = Sub; Y = A[X]   (artifact)
X decides which addr. is accessed in Y = A[X]. X is set by Sub = 1,
which is also in a data race.
Detecting artifacts requires data flow: not captured by rd/wr sets
in synch. segments, but captured in our compact path profiles.
25
Summary
  • Compressed representation of the execution
    profile of shared memory parallel programs
  • Control and shared data flow per thread
  • Synchronization patterns across threads
  • Overall compression efficiency: 0.25 -- 9.81
  • Compression efficiency scalable with increasing
    number of processors
  • Application: post-mortem debugging, such as
    detecting data races

26
Other Applications
  • We do not capture actual order of unsynchronized
    shared memory accesses across processors
  • Can be useful in making architectural decisions
    such as choice of cache coherence protocol
  • Sufficient to maintain (Netzer 1993):
  • transitive reduction of program order on each
    proc.
  • shared variable conflict orders
  • Can we capture transitive reduction relation via
    annotations of WPP edges?