Title: Lecture 10 - Patterns for Parallel Programming III
1 Lecture 10: Patterns for Parallel Programming III
- John Cavazos
- Dept. of Computer & Information Sciences
- University of Delaware
- www.cis.udel.edu/cavazos/cisc879
2 Lecture 10 Overview
- Cell B.E. Clarification
- Design Patterns for Parallel Programs
- Finding Concurrency
- Algorithmic Structure
- Organize by Tasks
- Organize by Data
- Supporting Structures
3 LS-LS DMA transfer (PPU)
int main() {
    /* N, struct thread_args, and my_spe_thread are defined earlier in the example */
    pthread_t pts[N];
    spe_context_ptr_t spe[N];
    struct thread_args t_args[N];
    int i;
    spe_program_handle_t *program;
    program = spe_image_open("../spu/hello");
    for (i = 0; i < N; i++) {
        spe[i] = spe_context_create(0, NULL);
        spe_program_load(spe[i], program);
        t_args[i].spe = spe[i];
        t_args[i].spuid = i;
        pthread_create(&pts[i], NULL, my_spe_thread, &t_args[i]);
    }
    void *ls = spe_ls_area_get(spe[1]);
    unsigned int mbox_data = (unsigned int)ls;
    printf("mbox_data %x\n", mbox_data);
    int rc;
    rc = spe_in_mbox_write(spe[0], &mbox_data, 1, SPE_MBOX_ALL_BLOCKING);
    rc = spe_out_intr_mbox_read(spe[0], &mbox_data, 1, SPE_MBOX_ALL_BLOCKING);
    for (i = 0; i < N; i++)
        rc = spe_in_mbox_write(spe[i], &mbox_data, 1, SPE_MBOX_ALL_BLOCKING);
    for (i = 0; i < N; i++)
        pthread_join(pts[i], NULL);
    spe_image_close(program);
    for (i = 0; i < N; i++)
        spe_context_destroy(spe[i]);
    return 0;
}
6 LS-LS DMA transfer (SPU)
int main(unsigned long long spuid, unsigned long long argp, unsigned long long envp) {
    struct timeval tv;
    gettimeofday(&tv, NULL);
    printf("spu %lld t.tv_usec %ld\n", spuid, tv.tv_usec);
    if (spuid == 0) {
        unsigned int ea;
        unsigned int tag = 0;
        unsigned int mask = 1;
        ea = spu_read_in_mbox();
        printf("ea %p\n", (void *)ea);
        mfc_put(&tv, ea + (unsigned int)&tv, sizeof(tv), tag, 1, 0);
        mfc_write_tag_mask(mask);
        mfc_read_tag_status_all();
        spu_write_out_intr_mbox(0);
    }
    spu_read_in_mbox();
    printf("spu %lld tv.tv_usec %ld\n", spuid, tv.tv_usec);
    return 0;
}
7 LS-LS Output
-bash-3.2$ ./a.out
spu 0 t.tv_usec 875360
spu 1 t.tv_usec 876446
spu 2 t.tv_usec 877443
spu 3 t.tv_usec 878459
mbox_data f7764000
ea 0xf7764000
spu 0 tv.tv_usec 875360
spu 1 tv.tv_usec 875360
spu 2 tv.tv_usec 877443
spu 3 tv.tv_usec 878459
8 Organize by Data
- Operations on core data structure
- Geometric Decomposition
- Recursive Data
9 Geometric Decomposition
- Arrays and other linear structures
- Divide into contiguous substructures
- Example: matrix multiply
- A data-centric algorithm on a linear data structure (array) implies geometric decomposition
10 Recursive Data
- Lists, trees, and graphs
- Structures where you would use divide-and-conquer
- May seem that you can only move sequentially through the data structure
- But there are ways to expose concurrency
11 Recursive Data Example
- Find the Root: given a forest of directed trees, find the root of each node
- Parallel approach: for each node, replace its successor with its successor's successor
- Repeat until no changes
- O(log n) vs. O(n)
Slide Source: Dr. Rabbah, IBM, MIT Course 6.189 IAP 2007
12 Organize by Flow of Data
Organize By Flow of Data:
- Regular -> Pipeline
- Irregular -> Event-Based Coordination
13 Organize by Flow of Data
- Computation can be viewed as a flow of data going through a sequence of stages
- Pipeline: one-way, predictable communication
- Event-Based Coordination: unrestricted, unpredictable communication
14 Pipeline performance
- Concurrency limited by pipeline depth
- Balance computation and communication (architecture dependent)
- Stages should be equally computationally intensive
- Slowest stage creates a bottleneck
- Combine lightly loaded stages or decompose heavily loaded stages
- Time to fill and drain the pipe should be small
15 Supporting Structures
- Single Program Multiple Data (SPMD)
- Loop Parallelism
- Master/Worker
- Fork/Join
16 SPMD Pattern
- Create a single program that runs on each processor
- Initialize
- Obtain a unique identifier
- Run the same program on each processor
- Identifier and input data can differentiate behavior
- Distribute data (if any)
- Finalize
17 SPMD Challenges
- Split the data correctly
- Correctly combine the results
- Achieve an even work distribution
- If the program requires dynamic load balancing, another pattern (e.g., Job Queue) may be more suitable
18 Loop Parallelism Pattern
- Many programs are expressed as iterative constructs (loops)
- Programming models like OpenMP provide pragmas to automatically assign loop iterations to processors
19 Master/Worker Pattern
(Figure: master/worker diagram)
20 Master/Worker Pattern
- Relevant where tasks have no dependencies
- Embarrassingly parallel
- The challenge is determining when the entire problem is complete
21 Fork/Join Pattern
- Parent creates new tasks (fork), then waits until they complete (join)
- Tasks are created dynamically
- Tasks can create more tasks
- Tasks are managed according to their relationships