Speculative Parallelization of Partially Parallel Loops
1
Speculative Parallelization of Partially Parallel Loops
  • Francis Dang and Dr. Lawrence Rauchwerger
  • Department of Computer Science,
    Texas A&M University
  • http://www.cs.tamu.edu/people/fhd4244
  • http://www.cs.tamu.edu/faculty/rwerger

2
Motivation
  • Static compiler methods cannot always extract all
    the parallelism in loops because:
    • Access patterns are too complex.
    • Required information is not available at
      compile-time.
  • Run-time methods can be used to parallelize more
    loops.

3
Partially Parallel Loops
  • Loops can be:
    • Fully parallel (doall)
    • Fully sequential
    • Partially parallel
  • Partially parallel loops:
    • Not all iterations can be executed independently.
    • May still have enough parallelism to exploit.

4
Partially Parallel Loops Example
      do i = 1, 8
        z = A(K(i))
        A(L(i)) = z + C(i)
      enddo

  K(1:8) = (1, 2, 3, 1, 4, 2, 1, 1)
  L(1:8) = (4, 5, 5, 4, 3, 5, 3, 3)
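The dependence structure of this loop can be checked mechanically. The following sketch (mine, not from the talk) finds the first iteration involved in a flow (read-after-write) dependence, assuming anti and output dependences are removed by privatization as the implementation slide later describes:

```python
# Index arrays from the slide (1-based iterations).
K = [1, 2, 3, 1, 4, 2, 1, 1]   # A(K(i)) is read
L = [4, 5, 5, 4, 3, 5, 3, 3]   # A(L(i)) is written

def first_flow_dependence(K, L):
    """Return the 1-based number of the first iteration that reads an
    element of A written by an earlier iteration, or None if the loop
    has no cross-iteration flow dependence."""
    written = set()
    for i, (r, w) in enumerate(zip(K, L), start=1):
        if r in written:      # read-after-write across iterations
            return i
        written.add(w)
    return None

print(first_flow_dependence(K, L))   # iteration 5 reads A(4), written by iteration 1
```

Here iterations 1 through 4 are independent, so the loop is partially parallel: it is neither a doall nor fully sequential.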

5
Related Work
  • Inspector/Executor (Saltz, Zhu and Yew,
    Rauchwerger, Mellor-Crummey, et al.)
    • Advantage: works well for partially parallel
      loops.
    • Disadvantages:
      • A proper side-effect-free inspector is not
        always available.
      • Can require large additional data structures.
  • LRPD Test (Rauchwerger)
    • Advantage: works well for fully parallel loops.
    • Disadvantage: slowdown is proportional to the
      speculative parallel execution time.

6
Recursive LRPD
  • Main idea:
    • Transform a partially parallel loop into a
      sequence of fully parallel loops.
    • Iterations before the first data dependence are
      correct and committed.
    • Reapply the LRPD test on the remaining
      iterations.
    • Blocked scheduling.
  • Worst case:
    • Sequential time plus testing overhead.
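The staging above can be sketched as follows (a simplified model of my own, not the authors' code): each stage runs the remaining iterations as a doall under the LRPD test, commits the dependence-free prefix, and reapplies the test to the rest. `first_failure` stands in for the run-time test and detects only flow dependences within a block, assuming anti and output dependences are handled by privatization:

```python
# Index arrays from the example slide (1-based iterations).
K = {i: k for i, k in enumerate([1, 2, 3, 1, 4, 2, 1, 1], start=1)}
L = {i: l for i, l in enumerate([4, 5, 5, 4, 3, 5, 3, 3], start=1)}

def first_failure(block):
    """Position of the first iteration in `block` that reads an element
    written by an earlier iteration of the same block, else None."""
    written = set()
    for pos, i in enumerate(block):
        if K[i] in written:
            return pos
        written.add(L[i])
    return None

def recursive_lrpd(iters):
    """Split the iteration space into a sequence of stages, each of
    which ran as a fully parallel loop before being committed."""
    stages = []
    while iters:
        p = first_failure(iters)
        if p is None:             # whole block passed the test
            p = len(iters)
        elif p == 0:              # defensive: commit one iteration
            p = 1
        stages.append(iters[:p])  # prefix is correct: commit it
        iters = iters[p:]         # reapply the test to the rest
    return stages

print(recursive_lrpd(list(range(1, 9))))   # [[1, 2, 3, 4], [5, 6, 7, 8]]
```

For the example loop, two fully parallel stages suffice; in the worst case every stage commits a single iteration, giving sequential time plus testing overhead.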

7
Recursive LRPD Algorithm
8
Recursive LRPD Implementation
  • Implemented with our run-time pass in Polaris and
    with hand-inserted code.
  • Privatization for arrays under test.
  • Replicated buffers for reductions.
  • Checkpoint shared arrays.
  • Record memory references during execution.
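One of these mechanisms, replicated buffers for reductions, can be sketched as follows (a minimal model with invented names, not the Polaris-generated code): each thread accumulates into a private copy of the reduction array, and the private copies are merged after the parallel phase, so the speculative doall performs no conflicting writes.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_sum_reduction(n_elems, contributions, n_threads=4):
    """contributions: list of (element, value) pairs produced by loop
    iterations. Each thread accumulates into a private buffer; the
    buffers are merged into the shared array after the parallel phase."""
    buffers = [[0.0] * n_elems for _ in range(n_threads)]
    chunks = [contributions[t::n_threads] for t in range(n_threads)]

    def work(t):
        for e, v in chunks[t]:
            buffers[t][e] += v   # race-free: thread-private buffer

    with ThreadPoolExecutor(n_threads) as ex:
        list(ex.map(work, range(n_threads)))

    # Merge phase: combine the replicated buffers element-wise.
    return [sum(buffers[t][e] for t in range(n_threads))
            for e in range(n_elems)]
```

The merge costs one pass over the array per thread, which is the usual price of replicated reduction buffers.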

9
Recursive LRPD Example
      do i = 1, 8
        z = A(K(i))
        A(L(i)) = z + C(i)
      enddo

  K(1:8) = (1, 2, 3, 1, 4, 2, 1, 1)
  L(1:8) = (4, 5, 5, 4, 2, 5, 3, 3)
10
Work Redistribution
  • Redistribute remaining iterations across all
    processors.
  • Advantage:
    • Execution time for each stage will decrease.
  • Disadvantages:
    • May uncover new dependences across processors.
    • May incur remote cache misses from data
      redistribution.
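Under blocked scheduling, redistribution amounts to re-splitting the uncommitted iterations into contiguous chunks, one per processor. A sketch (the function name is mine):

```python
def block_partition(iters, n_procs):
    """Blocked scheduling: split the remaining iterations into n_procs
    contiguous chunks of near-equal size (earlier chunks absorb the
    extra iterations when the count does not divide evenly)."""
    base, extra = divmod(len(iters), n_procs)
    chunks, start = [], 0
    for p in range(n_procs):
        size = base + (1 if p < extra else 0)
        chunks.append(iters[start:start + size])
        start += size
    return chunks

# After committing iterations 1-4 of the example, redistribution gives
# every processor part of the remaining work:
print(block_partition([5, 6, 7, 8], 4))   # [[5], [6], [7], [8]]
```

Without redistribution, the processors that owned the committed iterations would sit idle in the next stage; with it, each stage shrinks, at the cost of moving data between caches.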

11
Work Redistribution Illustration
With Redistribution
Without Redistribution
12
Work Redistribution Example
      do i = 1, 8
        z = A(K(i))
        A(L(i)) = z + C(i)
      enddo

  K(1:8) = (1, 2, 3, 1, 4, 2, 1, 1)
  L(1:8) = (4, 5, 5, 4, 2, 5, 3, 3)
13
Redistribution Model
  • Redistribution may not always be beneficial.
  • Stop redistributing when the cost of moving the
    data outweighs the benefit of the rebalanced
    schedule.
  • Used a synthetic loop to model this adaptive
    method.
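The stopping rule might take a form like the following (a hypothetical model of my own; the talk calibrates the trade-off with a synthetic loop, and its actual cost terms are not reproduced here):

```python
def should_redistribute(remaining_time_est, n_procs, move_cost_est):
    """Hypothetical adaptive decision: spreading the remaining work
    over n_procs instead of leaving it on one processor saves roughly
    remaining_time_est * (1 - 1/n_procs); redistribute only when that
    saving exceeds the estimated cost of moving the data."""
    saving = remaining_time_est * (1.0 - 1.0 / n_procs)
    return saving > move_cost_est

print(should_redistribute(100.0, 4, 50.0))   # True: saving 75 > cost 50
print(should_redistribute(10.0, 4, 50.0))    # False: saving 7.5 < cost 50
```

As the remaining iteration count shrinks across stages, the saving term shrinks with it, so late stages naturally stop redistributing.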

14
Redistribution Model
15
Experiments
  • Setup:
    • 16-processor HP V-Class
    • 4 GB memory
    • HP-UX 11.0
  • Codes and loops (table on slide)

16
Input Profile
17
Experimental Results
18
Experimental Results
19
Experimental Results
20
Experimental Results
21
Issues
  • May have load imbalance due to blocked
    scheduling.
  • Checkpointing can be expensive.
  • Work redistribution:
    • May uncover more dependences.
    • May cause remote cache misses from data
      redistribution.

22
Feedback-Guided Block Scheduling
  • Use the timing information from the previous
    instantiation (Bull, EuroPar '98).
  • Estimate per-processor chunk sizes that minimize
    load imbalance.
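The chunk-size estimate can be sketched as follows (my own simplification of Bull's scheme, with invented names): each processor receives a chunk inversely proportional to its per-iteration time observed in the previous instantiation, so all chunks should finish at roughly the same time.

```python
def feedback_guided_chunks(prev_iter_times, total_iters):
    """prev_iter_times: per-processor time per iteration measured in
    the previous instantiation of the loop.  Returns chunk sizes
    proportional to each processor's observed speed."""
    speeds = [1.0 / t for t in prev_iter_times]   # iterations per unit time
    total_speed = sum(speeds)
    chunks = [round(total_iters * s / total_speed) for s in speeds]
    chunks[-1] += total_iters - sum(chunks)       # absorb rounding error
    return chunks

# A processor that was 3x slower per iteration gets a 3x smaller chunk:
print(feedback_guided_chunks([1.0, 1.0, 1.0, 3.0], 100))   # [30, 30, 30, 10]
```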

23
Conclusion
  • Contributions:
    • Any loop can be speculatively parallelized.
    • The concern is no longer whether to parallelize,
      but how to optimize the parallelization.
  • Future work:
    • Use feedback-guided block scheduling to minimize
      load imbalance.
    • Decrease run-time overhead.
    • Use dependence distribution information for
      adaptive redistribution and scheduling.