1
CS 584
2
Designing Parallel Algorithms
  • Designing a parallel algorithm is not easy.
  • There is no recipe or magical ingredient
  • except creativity
  • We can benefit from a methodical approach
  • a framework for algorithm design
  • Most problems have several parallel solutions,
    which may be totally different from the best
    sequential algorithm.

3
PCAM Algorithm Design
  • 4 Stages to designing a parallel algorithm
  • Partitioning
  • Communication
  • Agglomeration
  • Mapping
  • P & C focus on concurrency and scalability.
  • A & M focus on locality and performance.

4
PCAM Algorithm Design
  • Partitioning
  • Computation and data are decomposed.
  • Communication
  • Coordinate task execution
  • Agglomeration
  • Combining of tasks for performance
  • Mapping
  • Assignment of tasks to processors

5
(No Transcript)
6
Partitioning
  • Ignore the number of processors and the target
    architecture.
  • Expose opportunities for parallelism.
  • Divide up both the computation and data
  • Can take two approaches
  • domain decomposition
  • functional decomposition

7
Domain Decomposition
  • Start algorithm design by analyzing the data
  • Divide the data into small pieces
  • Approximately equal in size
  • Then partition the computation by associating it
    with the data.
  • Communication issues may arise as one task needs
    the data from another task.

8
Domain Decomposition
  • Evaluate the definite integral ∫₀¹ f(x) dx.

9
Split up the domain
10
Split up the domain
11
Split up the domain
Now each task simply evaluates the integral over
its own subrange.
All that remains is to sum each task's result
for the total (see the sketch below).
[Figure: the interval [0, 1] split at 0.25, 0.5, and 0.75 into four subdomains]
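
A minimal MPI sketch of this decomposition (the integrand
f(x) = 4/(1 + x*x) is our illustrative choice, whose integral
over [0, 1] is π; the slides leave f unspecified):

    #include <mpi.h>
    #include <stdio.h>

    /* Illustrative integrand: integral over [0,1] equals pi. */
    static double f(double x) { return 4.0 / (1.0 + x * x); }

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Partition the domain [0,1]: each task owns one subinterval. */
        const long n = 1000000;        /* rectangles per task */
        double width = 1.0 / size;     /* width of this task's subdomain */
        double a = rank * width;       /* left end of this task's range */
        double h = width / n;

        /* Each task evaluates the integral over its own range
           (midpoint rule). */
        double local = 0.0;
        for (long i = 0; i < n; i++)
            local += f(a + (i + 0.5) * h) * h;

        /* Sum every task's answer for the total. */
        double total;
        MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0,
                   MPI_COMM_WORLD);

        if (rank == 0)
            printf("integral = %.12f\n", total);
        MPI_Finalize();
        return 0;
    }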
12
Domain Decomposition
  • Consider dividing up a 3-D grid
  • What issues arise?
  • Other issues?
  • What if your problem has more than one data
    structure?
  • Different problem phases?
  • Replication?

13
(No Transcript)
14
(No Transcript)
15
Functional Decomposition
  • Focus on the computation
  • Divide the computation into disjoint tasks
  • Avoid data dependency among tasks
  • After dividing the computation, examine the data
    requirements of each task.

16
Functional Decomposition
  • Not as natural as domain decomposition
  • Consider search problems
  • Often functional decomposition is very useful at
    a higher level.
  • Climate modeling
  • Ocean simulation
  • Hydrology
  • Atmosphere, etc.

17
Partitioning Checklist
  • Did you define a LOT of tasks?
  • Did you avoid redundant computation and storage?
  • Are the tasks approximately equal in size?
  • Does the number of tasks scale with the problem
    size?
  • Have you identified several alternative
    partitioning schemes?

18
Communication
  • The information flow between tasks is specified
    in this stage of the design
  • Remember
  • Tasks execute concurrently.
  • Data dependencies may limit concurrency.

19
Communication
  • Define channels
  • Link producers with consumers.
  • Consider the costs
  • Intellectual
  • Physical
  • Distribute the communication.
  • Specify the messages that are sent.

20
Communication Patterns
  • Local vs. Global
  • Structured vs. Unstructured
  • Static vs. Dynamic
  • Synchronous vs. Asynchronous

21
Local Communication
  • Communication within a neighborhood.

Algorithm choice determines communication.
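
A common form of local communication is a ghost-cell exchange
between neighboring subdomains; a minimal MPI sketch for a 1-D
decomposition (the array size and layout are our assumptions):

    #include <mpi.h>

    #define N 128   /* interior cells per task (illustrative size) */

    int main(int argc, char **argv)
    {
        int rank, size;
        double u[N + 2] = {0};   /* u[0] and u[N+1] are ghost cells */

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
        int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

        /* Send my first interior cell left, receive the right
           neighbor's boundary cell into my right ghost cell. */
        MPI_Sendrecv(&u[1], 1, MPI_DOUBLE, left, 0,
                     &u[N + 1], 1, MPI_DOUBLE, right, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        /* Send my last interior cell right, receive the left
           neighbor's boundary cell into my left ghost cell. */
        MPI_Sendrecv(&u[N], 1, MPI_DOUBLE, right, 1,
                     &u[0], 1, MPI_DOUBLE, left, 1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        MPI_Finalize();
        return 0;
    }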
22
Global Communication
  • Not localized.
  • Examples
  • All-to-All
  • Master-Worker

[Figure: a master task gathers the values 1, 5, 2, 3, 7 from the other tasks to form a global sum]
23
Avoiding Global Communication
  • Distribute the communication and computation

[Figure: the same values summed along a chain of tasks, each adding its own value to a running partial sum]
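
A minimal MPI sketch of the chain of partial sums (the local
values are illustrative stand-ins):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        double val = rank + 1.0;   /* stand-in for this task's value */
        double sum = val;

        if (rank > 0) {            /* receive running total so far */
            MPI_Recv(&sum, 1, MPI_DOUBLE, rank - 1, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            sum += val;            /* add my own contribution */
        }
        if (rank < size - 1)       /* forward the partial sum */
            MPI_Send(&sum, 1, MPI_DOUBLE, rank + 1, 0,
                     MPI_COMM_WORLD);
        else
            printf("total = %f\n", sum);

        MPI_Finalize();
        return 0;
    }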
24
Divide and Conquer
  • Partition the problem into two or more
    subproblems
  • Partition each subproblem, etc.
  • Results in a structured nearest-neighbor
    communication pattern.

25
Structured Communication
  • Each task's communication resembles every other
    task's communication.
  • Is there a pattern?

26
Unstructured Communication
  • No regular pattern that can be exploited.
  • Examples
  • Unstructured Grid
  • Resolution changes
  • Complicates the next stages of design

27
Synchronous Communication
  • Both consumers and producers are aware when
    communication is required
  • Explicit and simple

[Figure: producer and consumer exchange data at known times t1, t2, t3]
28
Asynchronous Communication
  • Timing of send/receive is unknown.
  • No pattern
  • Consider a very large data structure
  • Distribute it among computational tasks that
    poll for requests (see the sketch below)
  • Define a set of read/write tasks
  • Shared Memory
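
A minimal MPI sketch of the polling approach (the tags, array
size, and one-request-per-task protocol are our assumptions,
not from the slides):

    #include <mpi.h>
    #include <stdio.h>

    /* Task 0 owns a large array and interleaves its own work with
       MPI_Iprobe checks for read requests; each other task reads
       one element. */
    enum { TAG_REQ = 1, TAG_REP = 2 };

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0) {
            double data[1024];
            for (int i = 0; i < 1024; i++) data[i] = i * i;

            int served = 0, flag;
            MPI_Status st;
            while (served < size - 1) {
                /* ... a unit of local computation goes here ... */
                MPI_Iprobe(MPI_ANY_SOURCE, TAG_REQ, MPI_COMM_WORLD,
                           &flag, &st);
                if (flag) {        /* someone asked for an element */
                    int idx;
                    MPI_Recv(&idx, 1, MPI_INT, st.MPI_SOURCE, TAG_REQ,
                             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                    MPI_Send(&data[idx], 1, MPI_DOUBLE, st.MPI_SOURCE,
                             TAG_REP, MPI_COMM_WORLD);
                    served++;
                }
            }
        } else {
            int idx = rank;        /* read one remote element */
            double v;
            MPI_Send(&idx, 1, MPI_INT, 0, TAG_REQ, MPI_COMM_WORLD);
            MPI_Recv(&v, 1, MPI_DOUBLE, 0, TAG_REP, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("task %d read data[%d] = %f\n", rank, idx, v);
        }
        MPI_Finalize();
        return 0;
    }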

29
Problems to Avoid
  • A centralized algorithm
  • Distribute the computation
  • Distribute the communication
  • A sequential algorithm
  • Seek concurrency
  • Divide and conquer
  • Small, equal-sized subproblems

30
Communication Design Checklist
  • Is communication balanced?
  • All tasks communicate about the same amount.
  • Is communication limited to neighborhoods?
  • Restructure global to local if possible.
  • Can communications proceed concurrently?
  • Can the computation proceed concurrently?
  • Find the algorithm with the most concurrency.
  • Be careful!!!

31
Agglomeration
  • The Partitioning and Communication steps were
    abstract.
  • Agglomeration moves to the concrete.
  • Combine tasks to execute efficiently on some
    parallel computer.
  • Consider replication.

32
Agglomeration Goals
  • Reduce communication costs by
  • increasing computation
  • decreasing/increasing granularity
  • Retain flexibility for mapping and scaling.
  • Reduce software engineering costs.

33
Changing Granularity
  • A large number of tasks does not necessarily
    produce an efficient algorithm.
  • We must consider the communication costs.
  • Reduce communication by
  • having fewer tasks
  • sending fewer messages (batching)

34
Surface to Volume Effects
  • Communication is proportional to the surface of
    the subdomain.
  • Computation is proportional to the volume of the
    subdomain.
  • Increasing computation will often decrease
    communication.

35
How many messages total? How much data is sent?
36
How many messages total? How much data is sent?
37
Replicating Computation
  • Trade off replicated computation for reduced
    communication.
  • Replication will often reduce execution time as
    well.

38
Summation of N Integers
s = sum, b = broadcast
How many steps?
39
Using Replication (Butterfly)
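
A minimal MPI sketch of the butterfly summation (assumes a
power-of-two number of tasks; the local values are
illustrative). In log₂ P steps every task ends up holding the
full sum, so no separate broadcast is needed; MPI's
MPI_Allreduce collective is typically implemented with
patterns like this:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        double sum = rank + 1.0;     /* stand-in local value */

        for (int mask = 1; mask < size; mask <<= 1) {
            int partner = rank ^ mask;  /* exchange across this dim */
            double other;
            MPI_Sendrecv(&sum, 1, MPI_DOUBLE, partner, 0,
                         &other, 1, MPI_DOUBLE, partner, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            sum += other;            /* both sides replicate the add */
        }

        printf("task %d: sum = %f\n", rank, sum);
        MPI_Finalize();
        return 0;
    }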
40
Using Replication
Butterfly to Hypercube
41
Avoid Communication
  • Look for tasks that cannot execute concurrently
    because of communication requirements.
  • Replication can help accomplish two operations
    at the same time, such as
  • Summation
  • Broadcast

42
Preserve Flexibility
  • Create more tasks than processors.
  • Overlap communication and computation.
  • Don't incorporate unnecessary limits on the
    number of tasks.

43
Agglomeration Checklist
  • Reduce communication costs by increasing
    locality.
  • Do benefits of replication outweigh costs?
  • Does replication compromise scalability?
  • Does the number of tasks still scale with problem
    size?
  • Is there still sufficient concurrency?

44
Mapping
  • Specify where each task is to operate.
  • Mapping may need to change depending on the
    target architecture.
  • The general mapping problem is NP-complete.

45
Mapping
  • Goal: reduce execution time
  • Concurrent tasks --> different processors
  • High communication --> same processor
  • Mapping is a game of trade-offs.

46
Mapping
  • Many domain-decomposition problems make mapping
    easy.
  • Grids
  • Arrays
  • etc.

47
Mapping
  • Algorithms based on unstructured or complex
    domain decompositions are difficult to map.

48
Other Mapping Problems
  • Variable amounts of work per task
  • Unstructured communication
  • Heterogeneous processors
  • different speeds
  • different architectures
  • Solution: LOAD BALANCING

49
Load Balancing
  • Static
  • Determined a priori
  • Based on work, processor speed, etc.
  • Probabilistic
  • Random
  • Dynamic
  • Restructure load during execution
  • Task Scheduling (functional decomp.)

50
Static Load Balancing
  • Based on a priori knowledge.
  • Goal: equal WORK on all processors
  • Algorithms
  • Basic
  • Recursive Bisection

51
Basic
  • Divide up the work based on
  • Work required
  • Processor speed

Give each processor work in proportion to its speed: with total
work R and processor speeds p_i,

    r_i = R · ( p_i / Σ_j p_j )

where r_i is the amount of work assigned to processor i.
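
A minimal sketch of that computation in C (the speeds and total
work are illustrative values):

    #include <stdio.h>

    int main(void)
    {
        double p[] = {1.0, 1.0, 2.0, 4.0};  /* relative speeds */
        int    n   = 4;
        double R   = 1000.0;                /* total work */

        double ptotal = 0.0;
        for (int i = 0; i < n; i++) ptotal += p[i];

        for (int i = 0; i < n; i++) {
            double r = R * p[i] / ptotal;   /* work share r_i */
            printf("processor %d gets %.1f units\n", i, r);
        }
        return 0;
    }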
52
Recursive Bisection
  • Divide work in half recursively.
  • Based on physical coordinates.
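
A minimal sketch of the idea in one dimension (the coordinates
and processor count are illustrative; real implementations
bisect the longest dimension of a 2-D or 3-D region):

    #include <stdio.h>
    #include <stdlib.h>

    static int cmp(const void *a, const void *b)
    {
        double d = *(const double *)a - *(const double *)b;
        return (d > 0) - (d < 0);
    }

    /* Assign points x[lo..hi) to processors proc..proc+nproc-1
       by recursively halving the (sorted) coordinate range. */
    static void bisect(double *x, int lo, int hi,
                       int proc, int nproc, int *owner)
    {
        if (nproc == 1) {
            for (int i = lo; i < hi; i++) owner[i] = proc;
            return;
        }
        int mid = lo + (hi - lo) / 2;   /* median split */
        bisect(x, lo, mid, proc, nproc / 2, owner);
        bisect(x, mid, hi, proc + nproc / 2, nproc - nproc / 2, owner);
    }

    int main(void)
    {
        double x[] = {0.9, 0.1, 0.5, 0.3, 0.7, 0.2, 0.8, 0.4};
        int owner[8], n = 8;

        qsort(x, n, sizeof x[0], cmp);  /* order by coordinate */
        bisect(x, 0, n, 0, 4, owner);   /* 4 processors */

        for (int i = 0; i < n; i++)
            printf("x=%.1f -> proc %d\n", x[i], owner[i]);
        return 0;
    }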

53
Dynamic Algorithms
  • Adjust load when an imbalance is detected.
  • Local or Global

54
Task Scheduling
  • Many tasks with weak locality requirements.
  • Manager-Worker model.

55
Task Scheduling
  • Manager-Worker (see the sketch after this list)
  • Hierarchical Manager-Worker
  • Uses submanagers
  • Decentralized
  • No central manager
  • A task pool on each processor
  • Less of a bottleneck
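
A minimal MPI sketch of the manager-worker model (the task
count, tags, and process_task are our stand-ins; a task id of
-1 tells a worker to stop):

    #include <mpi.h>
    #include <stdio.h>

    static double process_task(int t) { return t * 2.0; }

    int main(int argc, char **argv)
    {
        enum { NTASKS = 100, TAG_WORK = 1, TAG_DONE = 2 };
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0) {                    /* manager */
            int next = 0, active = size - 1;
            MPI_Status st;
            double result;
            for (int w = 1; w < size; w++) { /* seed each worker */
                int t = (next < NTASKS) ? next++ : -1;
                if (t < 0) active--;
                MPI_Send(&t, 1, MPI_INT, w, TAG_WORK, MPI_COMM_WORLD);
            }
            while (active > 0) {             /* reissue on completion */
                MPI_Recv(&result, 1, MPI_DOUBLE, MPI_ANY_SOURCE,
                         TAG_DONE, MPI_COMM_WORLD, &st);
                int t = (next < NTASKS) ? next++ : -1;
                if (t < 0) active--;         /* -1 retires a worker */
                MPI_Send(&t, 1, MPI_INT, st.MPI_SOURCE, TAG_WORK,
                         MPI_COMM_WORLD);
            }
        } else {                             /* worker */
            int t;
            for (;;) {
                MPI_Recv(&t, 1, MPI_INT, 0, TAG_WORK, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                if (t < 0) break;
                double r = process_task(t);
                MPI_Send(&r, 1, MPI_DOUBLE, 0, TAG_DONE,
                         MPI_COMM_WORLD);
            }
        }
        MPI_Finalize();
        return 0;
    }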

56
Mapping Checklist
  • Is the load balanced?
  • Are there communication bottlenecks?
  • Is it necessary to adjust the load dynamically?
  • Can you adjust the load if necessary?
  • Have you evaluated the costs?

57
PCAM Algorithm Design
  • Partition
  • Domain or Functional Decomposition
  • Communication
  • Link producers and consumers
  • Agglomeration
  • Combine tasks for efficiency
  • Mapping
  • Divide up the tasks for balanced execution

58
Example: Atmosphere Model
  • Simulate atmospheric processes
  • Wind
  • Clouds, etc.
  • Solves a set of partial differential equations
    describing the fluid behavior

59
Representation of Atmosphere
60
Data Dependencies
61
Partition & Communication
62
Agglomeration
63
Mapping
64
Mapping