Parallel Programming in C with MPI and OpenMP - PowerPoint PPT Presentation

About This Presentation
Title:

Parallel Programming in C with MPI and OpenMP

Slides: 26
Provided by: kents90
Learn more at: https://www.cs.kent.edu

Transcript and Presenter's Notes

Title: Parallel Programming in C with MPI and OpenMP


1
Parallel Programming in C with MPI and OpenMP
  • Michael J. Quinn

2
Chapter 3
  • Parallel Algorithm Design

3
Outline
  • Task/channel model
  • Algorithm design methodology
  • Case studies

4
Task/Channel Model
  • Parallel computation = set of tasks
  • Task
    • Program
    • Local memory
    • Collection of I/O ports
  • Tasks interact by sending messages through
    channels
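The task/channel abstraction can be made concrete with a small sketch. The names and the fixed-capacity FIFO below are our own illustration, not code from Quinn's text: a channel is modeled as a message queue connecting one task's output port to another task's input port.

```c
/* Minimal sketch of a channel (our own illustration, not Quinn's code):
   a FIFO message queue through which one task sends values to another. */
#define CHANNEL_CAP 16

typedef struct {
    int buf[CHANNEL_CAP];   /* messages in flight */
    int head, tail, count;  /* FIFO bookkeeping   */
} channel;

/* Send: enqueue a message on the channel; fails if the channel is full. */
int channel_send(channel *c, int msg) {
    if (c->count == CHANNEL_CAP) return -1;
    c->buf[c->tail] = msg;
    c->tail = (c->tail + 1) % CHANNEL_CAP;
    c->count++;
    return 0;
}

/* Receive: dequeue the oldest message; in the model a task would
   block here until a message arrives. */
int channel_recv(channel *c, int *msg) {
    if (c->count == 0) return -1;
    *msg = c->buf[c->head];
    c->head = (c->head + 1) % CHANNEL_CAP;
    c->count--;
    return 0;
}
```

In MPI this pattern maps onto point-to-point `MPI_Send`/`MPI_Recv` between ranks; the sketch just makes the channel's FIFO semantics explicit.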

5
Task/Channel Model
6
Multiprocessors
  • Definition unique to Quinn (see Pg 43)
  • Multiple asynchronous CPUs with a common shared
    memory.
  • Usually called a
    • shared-memory multiprocessor or
    • shared-memory MIMD
  • An example is
    • the symmetric multiprocessor (SMP)
    • also called a centralized multiprocessor
  • Quinn feels his terminology is more logical.

7
Multicomputer
  • Definition unique to Quinn (See pg 49)
  • Multiple CPUs with local memory that are
    connected together.
  • Connection can be by interconnection network,
    bus, Ethernet, etc.
  • Usually called a
    • distributed-memory multiprocessor or
    • distributed-memory MIMD
  • Quinn feels his terminology is more logical.

8
Foster's Design Methodology
  • Partitioning
  • Communication
  • Agglomeration
  • Mapping

9
Foster's Methodology
10
Partitioning
  • Dividing computation and data into pieces
  • Domain decomposition
    • Divide data into pieces
    • Determine how to associate computations with the
      data
  • Functional decomposition
    • Divide computation into pieces
    • Determine how to associate data with the
      computations
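Domain decomposition of an n-element array over p tasks is often done with the block formulas used throughout Quinn's book (the `BLOCK_LOW`/`BLOCK_HIGH` style); a small sketch, with function names of our choosing:

```c
/* Block domain decomposition: task id owns array indices
   block_low(id) .. block_high(id). Sizes differ by at most one,
   so primitive tasks stay roughly the same size. */
int block_low(int id, int p, int n)  { return id * n / p; }
int block_high(int id, int p, int n) { return block_low(id + 1, p, n) - 1; }
int block_size(int id, int p, int n) {
    return block_high(id, p, n) - block_low(id, p, n) + 1;
}
```

For example, 10 elements over 4 tasks gives blocks of sizes 2, 3, 2, 3, covering the array exactly once.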

11
Example Domain Decompositions
12
Example Functional Decomposition
13
Partitioning Checklist
  • At least 10x more primitive tasks than processors
    in target computer
  • Minimize redundant computations and redundant
    data storage
  • Primitive tasks roughly the same size
  • Number of tasks an increasing function of problem
    size

14
Communication
  • Determine values passed among tasks
  • Local communication
    • Task needs values from a small number of other
      tasks
    • Create channels illustrating data flow
  • Global communication
    • Significant number of tasks contribute data to
      perform a computation
    • Don't create channels for them early in design
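As an illustration of local communication (our own sketch, not from the slides): in a 1-D block decomposition each task opens channels only to its left and right neighbors to exchange boundary values, so the number of channels per task is a small constant regardless of p.

```c
/* Local communication in a 1-D decomposition: task id needs channels
   only to its immediate neighbors; boundary tasks have just one. */
int neighbor_count(int id, int p) {
    int n = 0;
    if (id > 0)     n++;   /* channel to left neighbor  */
    if (id < p - 1) n++;   /* channel to right neighbor */
    return n;
}
```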

15
Communication Checklist
  • Communication operations balanced among tasks
  • Each task communicates with only small group of
    neighbors
  • Tasks can perform communications concurrently
  • Tasks can perform computations concurrently

16
Agglomeration
  • Grouping tasks into larger tasks
  • Goals
  • Improve performance
  • Maintain scalability of program
  • Simplify programming
  • In MPI programming, goal often to create one
    agglomerated task per processor

17
Agglomeration Can Improve Performance
  • Eliminate communication between primitive tasks
    agglomerated into consolidated task
  • Combine groups of sending and receiving tasks

18
Agglomeration Checklist
  • Locality of parallel algorithm has increased
  • Replicated computations take less time than
    communications they replace
  • Data replication doesn't affect scalability
  • Agglomerated tasks have similar computational and
    communications costs
  • Number of tasks increases with problem size
  • Number of tasks suitable for likely target
    systems
  • Tradeoff between agglomeration and code
    modifications costs is reasonable

19
Mapping
  • Process of assigning tasks to processors
  • Centralized multiprocessor: mapping done by
    operating system
  • Distributed-memory system: mapping done by user
  • Conflicting goals of mapping
  • Maximize processor utilization
  • Minimize interprocessor communication

20
Mapping Example
21
Optimal Mapping
  • Finding optimal mapping is NP-hard
  • Must rely on heuristics

22
Mapping Decision Tree
  • Static number of tasks
    • Structured communication
      • Constant computation time per task
        • Agglomerate tasks to minimize communications
        • Create one task per processor
      • Variable computation time per task
        • Cyclically map tasks to processors
    • Unstructured communication
      • Use a static load balancing algorithm
  • Dynamic number of tasks
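The cyclic mapping in the tree above is one line of arithmetic; a minimal sketch (function name is ours):

```c
/* Cyclic (round-robin) mapping: task i goes to processor i mod p,
   so tasks whose cost varies with position are spread evenly --
   each processor receives within one task of every other. */
int cyclic_owner(int task, int p) { return task % p; }
```

With p = 4, tasks 0..7 map to processors 0, 1, 2, 3, 0, 1, 2, 3.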

23
Mapping Decision Tree (cont.)
  • Static number of tasks
  • Dynamic number of tasks
    • Frequent communications between tasks
      • Use a dynamic load balancing algorithm
    • Many short-lived tasks
      • Use a run-time task-scheduling algorithm

24
Mapping Checklist
  • Considered designs based on one task per
    processor and multiple tasks per processor
  • Evaluated static and dynamic task allocation
  • If dynamic task allocation chosen, the task
    allocator (i.e., manager) is not a bottleneck to
    performance
  • If static task allocation chosen, ratio of tasks
    to processors is at least 10:1

25
Case Studies
  • Boundary value problem
  • Finding the maximum
  • The n-body problem
  • Adding data input