Transcript and Presenter's Notes

Title: CS 584 Lecture 19


1
CS 584 Lecture 19
  • Test?
  • Assignment
  • Glenda program
  • Project Proposal is coming up! (March 13)
  • 2 pages
  • 3 references
  • 1 page plan of action with goal dates.

2
Modular Programming
  • Control Program Complexity
  • Encapsulation
  • Each module provides an interface
  • Limit data access except through the interface.
  • Composition
  • Develop programs by combining modules
  • Reuse
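
A minimal sketch of this kind of encapsulation in C, assuming a hypothetical grid module (not from the lecture): the header file is the module's interface, and callers can reach the data only through these functions because the struct's fields are visible only inside the module's own source file.

    /* grid.h -- the module's interface (hypothetical example) */
    typedef struct grid grid_t;           /* opaque type: fields defined only in grid.c */

    grid_t *grid_create(int nx, int ny);  /* construct a grid                           */
    double  grid_get(const grid_t *g, int i, int j);
    void    grid_set(grid_t *g, int i, int j, double v);
    void    grid_destroy(grid_t *g);      /* release the grid                           */

Other modules include grid.h and link against grid.c; swapping in a different internal representation later changes no caller, which is what makes composition and reuse practical.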

3
Modular Design and Parallel Programming
  • Different from traditional sequential modular
    programming.
  • We must consider additional issues
  • Data distribution
  • Module Composition

4
Data Distribution
  • No simple answer.
  • Data distribution changes may necessitate
    different module structures and vice-versa.
  • Best Solution
  • Design your code to be data distribution neutral
  • Not necessarily easy!
  • Different data distribution schemes sometimes
    dictate totally different algorithms.
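
One hedged way to move toward data-distribution neutrality is to pass a small descriptor alongside the data, so the same routine can serve row, column, or block layouts; the names here (dist_t, DIST_ROW, and so on) are illustrative, not part of any particular library.

    /* Hypothetical distribution descriptor: code asks the descriptor which
     * global indices it owns instead of hard-coding one layout.            */
    typedef enum { DIST_ROW, DIST_COL, DIST_BLOCK } dist_kind;

    typedef struct {
        dist_kind kind;    /* how the matrix is split across processes */
        int nprocs, rank;  /* process count and this process's rank    */
        int n;             /* global matrix dimension                  */
    } dist_t;

    /* First global row owned by this process under a row distribution
     * (assumes n divides evenly; other kinds would add their own cases). */
    static int dist_first_row(const dist_t *d) {
        return (d->kind == DIST_ROW) ? d->rank * (d->n / d->nprocs) : 0;
    }

As the slide notes, this only goes so far: some distributions demand genuinely different algorithms, not just different index arithmetic.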

5
Module Composition
6
Sequential Composition
  • Sequentially move from one parallel module or
    operation to the next.
  • SPMD
  • Great target for parallel library functions
  • ScaLAPACK
  • Not necessarily very flexible.
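
A minimal SPMD sketch of sequential composition in C with MPI: every process runs the same program and moves from one parallel phase to the next. The phase routines here are empty placeholders standing in for parallel library calls such as those in ScaLAPACK.

    #include <mpi.h>

    static void phase1(void) { /* e.g., a parallel library routine: all ranks participate */ }
    static void phase2(void) { /* the next parallel phase                                  */ }

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        phase1();                        /* everyone does phase 1 ...          */
        MPI_Barrier(MPI_COMM_WORLD);     /* ... then moves together to phase 2 */
        phase2();

        MPI_Finalize();
        return 0;
    }

The barrier is shown only to make the phase boundary explicit; in practice the collective operations inside each library call usually provide the synchronization.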

7
Parallel Composition
  • Different sets of processors execute different
    program components.
  • Can enhance scalability
  • locality
  • Can also decrease memory requirements
  • less code and data replication
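
With MPI, parallel composition is often expressed by splitting the world communicator so that different groups of processes run different components; a sketch assuming two components, with the half-and-half split chosen only for illustration.

    #include <mpi.h>

    int main(int argc, char **argv) {
        int rank, nprocs;
        MPI_Comm part;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        /* First half of the processes form component 0, the rest component 1. */
        int color = (rank < nprocs / 2) ? 0 : 1;
        MPI_Comm_split(MPI_COMM_WORLD, color, rank, &part);

        if (color == 0) {
            /* component A runs here, communicating only within 'part' */
        } else {
            /* component B runs here */
        }

        MPI_Comm_free(&part);
        MPI_Finalize();
        return 0;
    }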

8
Concurrent Composition
  • Components are data-driven.
  • More directly matches task-channel model
  • Since the components are data-driven, overlapping
    communication with computation is easier.
  • Can simplify design decisions.
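
A small data-driven sketch: a component that acts on whichever message arrives next, regardless of which producer sent it. The buffer size and message count are illustrative.

    #include <mpi.h>

    /* Consume nmsgs messages in arrival order; the sender's rank in the
     * status plays the role of the task-channel model's channel identity. */
    static void consume(MPI_Comm comm, int nmsgs) {
        double buf[128];
        MPI_Status st;
        for (int i = 0; i < nmsgs; i++) {
            MPI_Recv(buf, 128, MPI_DOUBLE, MPI_ANY_SOURCE, MPI_ANY_TAG, comm, &st);
            /* process buf; st.MPI_SOURCE says which component produced it */
        }
    }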

9
Communication and Computation
  • Overlap communication and computation.
  • Two basic methods:
  • Send, then compute, then receive.
  • Send, post an asynchronous receive, and compute
    something else until the receive completes (sketched below).
  • Don't do send-receive pairs unless you must.
  • Receive-send pairs are the worst.
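
One way to realize the second method in MPI; here the asynchronous receive is posted before the send (which also avoids deadlock when both partners exchange), and the partner rank, tag, and buffer sizes are illustrative.

    #include <mpi.h>

    static void exchange_and_compute(double *sendbuf, double *recvbuf, int n,
                                     int partner, MPI_Comm comm) {
        MPI_Request req;

        /* Post the receive so the incoming message has somewhere to land. */
        MPI_Irecv(recvbuf, n, MPI_DOUBLE, partner, 0, comm, &req);
        MPI_Send(sendbuf, n, MPI_DOUBLE, partner, 0, comm);

        /* ... compute on data that does not depend on recvbuf ... */

        MPI_Wait(&req, MPI_STATUS_IGNORE);  /* recvbuf is safe to use after this */
    }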

10
Case Study: Image Processing
Data flow diagram for an image processing pipeline
11
FFT Algorithm Choices
  • The 2-D FFT is done by rows and then by columns.
  • Sequential composition
  • Everybody does the row FFTs, then the column FFTs
    (see the skeleton below).
  • Parallel composition
  • Some processors do row FFTs while others do column FFTs.
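
A structural skeleton of the sequential-composition choice, assuming each process owns a block of rows; fft_1d() and transpose() are placeholders (the transpose would typically be an MPI_Alltoall-style redistribution), so this shows only the control flow, not a working FFT.

    #include <complex.h>
    #include <mpi.h>

    void fft_1d(double complex *x, int n);                          /* placeholder */
    void transpose(double complex *x, int rows, int n, MPI_Comm c); /* placeholder */

    /* Every process does the row FFTs on its rows, the data is transposed,
     * then every process does the column FFTs (now laid out as rows).      */
    void fft2d_sequential(double complex *local, int local_rows, int n, MPI_Comm comm) {
        for (int i = 0; i < local_rows; i++)
            fft_1d(&local[i * n], n);          /* row FFTs              */

        transpose(local, local_rows, n, comm); /* global redistribution */

        for (int i = 0; i < local_rows; i++)
            fft_1d(&local[i * n], n);          /* column FFTs           */
    }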

12
(No Transcript)
13
Performance Results
14
Case Study: Matrix Multiply
  • Goal: data-distribution neutral
  • Three basic ways to distribute:
  • row
  • column
  • submatrix
  • Question:
  • Does our library need different algorithms?

15
One Dimensional Decomposition
  • Each processor "owns" the black portion shown in the figure.
  • To compute the owned portion of the answer, each
    processor requires all of A (sketched below).
  • This affects the data distribution.
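
A sketch of the 1-D case in C with MPI, assuming A is distributed by rows, B and C by columns, and that n divides evenly by the number of processes: each process first gathers the full A it needs, then computes only its owned columns of C. The layout and names are illustrative, not the lecture's code.

    #include <mpi.h>
    #include <stdlib.h>

    void matmul_1d(const double *A_rows, int local_rows,  /* owned rows of A    */
                   const double *B_cols, int local_cols,  /* owned columns of B */
                   double *C_cols, int n, MPI_Comm comm) {
        double *A = malloc((size_t)n * n * sizeof *A);

        /* Row blocks of A concatenate into the full matrix in row-major order. */
        MPI_Allgather(A_rows, local_rows * n, MPI_DOUBLE,
                      A,      local_rows * n, MPI_DOUBLE, comm);

        for (int i = 0; i < n; i++)                   /* every row of C         */
            for (int j = 0; j < local_cols; j++) {    /* only the owned columns */
                double s = 0.0;
                for (int k = 0; k < n; k++)
                    s += A[i * n + k] * B_cols[k * local_cols + j];
                C_cols[i * local_cols + j] = s;
            }
        free(A);
    }

The point the slide makes is visible here: however the multiply itself is arranged, every process ends up holding all of A.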

16
Two Dimensional Decomposition
  • Requires less data per processor
  • Algorithm can be performed stepwise.

17
  • Broadcast an A sub-matrix to the other processors in its row.
  • Compute.
  • Rotate the B sub-matrix upwards.
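
One illustrative realization of the broadcast/compute/rotate step on a q x q process grid (a Fox-style scheme), not the lecture's exact code: 'row' and 'col' are communicators for this process's grid row and grid column (ranks assumed to equal mycol and myrow respectively), b is the sub-matrix dimension, Atmp is a b x b work buffer, and multiply_add() is a placeholder local C += A*B.

    #include <mpi.h>
    #include <string.h>

    void multiply_add(double *C, const double *A, const double *B, int b); /* placeholder */

    void bmr_matmul(double *A, double *B, double *C, double *Atmp,
                    int b, int q, int myrow, int mycol,
                    MPI_Comm row, MPI_Comm col) {
        int up   = (myrow + q - 1) % q;   /* neighbor above in the grid column */
        int down = (myrow + 1) % q;       /* neighbor below                    */

        for (int step = 0; step < q; step++) {
            /* 1. One process per grid row broadcasts its A sub-matrix.        */
            int root = (myrow + step) % q;
            if (mycol == root) memcpy(Atmp, A, (size_t)b * b * sizeof *A);
            MPI_Bcast(Atmp, b * b, MPI_DOUBLE, root, row);

            /* 2. Compute: local multiply-accumulate with the broadcast block. */
            multiply_add(C, Atmp, B, b);

            /* 3. Rotate the B sub-matrix upwards within the grid column.      */
            MPI_Sendrecv_replace(B, b * b, MPI_DOUBLE, up, 0, down, 0,
                                 col, MPI_STATUS_IGNORE);
        }
    }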
18
Analysis
  • Performance analysis reveals that the two-dimensional
    decomposition is always better.
  • So our matrix multiply needs only one algorithm.
  • It might need a redistribution algorithm to be totally
    data-distribution neutral.
  • However, this is not the best algorithm.

19
(No Transcript)
20
Systolic Matrix Multiply
  • Replace the A row broadcast with a rotation
    similar to the B column rotation.
  • Eliminates the expensive broadcast and replaces
    it with nearest-neighbor communication (sketched below).
  • Communication costs much less.
  • Changes data distribution.
  • Should we include it in a library?
  • Redistribution costs?
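
A sketch of the systolic inner loop (Cannon-style), again with illustrative names: after an initial skew of A and B (not shown), every step is a local multiply-accumulate followed by a one-position rotation of A along the grid row and of B up the grid column, so all communication is nearest-neighbor.

    #include <mpi.h>

    void multiply_add(double *C, const double *A, const double *B, int b); /* placeholder */

    void systolic_steps(double *A, double *B, double *C, int b, int q,
                        int myrow, int mycol, MPI_Comm row, MPI_Comm col) {
        int left = (mycol + q - 1) % q, right = (mycol + 1) % q;
        int up   = (myrow + q - 1) % q, down  = (myrow + 1) % q;

        for (int step = 0; step < q; step++) {
            multiply_add(C, A, B, b);     /* local C += A*B                    */

            /* Rotate A one position left along the grid row ...               */
            MPI_Sendrecv_replace(A, b * b, MPI_DOUBLE, left, 0, right, 0,
                                 row, MPI_STATUS_IGNORE);
            /* ... and B one position up the grid column: no broadcast needed. */
            MPI_Sendrecv_replace(B, b * b, MPI_DOUBLE, up, 0, down, 0,
                                 col, MPI_STATUS_IGNORE);
        }
    }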

21
Conclusion
  • Modular design is good.
  • Parallelism introduces different issues
  • Data distribution
  • Module composition
  • Sequential composition is easy but inflexible.
  • Parallel composition can improve locality.
  • Concurrent composition is most general.