A comparison of CC-SAS, MP and SHMEM on SGI Origin2000



Slides: 20

Provided by: xin57

Learn more at: https://cs.login.cmu.edu



1
A comparison of CC-SAS, MP and SHMEM on SGI
Origin2000
2
Three Programming Models

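CC-SAS (cache-coherent shared address space)
A single linear address space shared by all processes
MP (message passing)
Processes communicate with each other explicitly
through a message-passing interface
SHMEM
One-sided communication via get and put primitives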
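To make the contrast concrete, here is a minimal Python sketch (not taken from the presentation) in which threads stand in for processes and each model performs the same neighbor shift: rank r ends up holding rank r+1's value. Everything in it is hypothetical: NPROCS, the function names, a queue.Queue per rank as the MP mailbox, and a plain list standing in for SHMEM's symmetric memory. On the Origin2000 the MP and SHMEM versions would use real library calls (for example MPI_Send/MPI_Recv, or SHMEM puts followed by shmem_barrier_all) instead of these stand-ins.

```python
import queue
import threading
from typing import Callable, List

NPROCS = 4  # hypothetical process count for the illustration


def run(worker: Callable[[int], None]) -> None:
    """Start one thread per simulated process (rank) and wait for all."""
    threads = [threading.Thread(target=worker, args=(r,)) for r in range(NPROCS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def ccsas_shift() -> List[int]:
    """CC-SAS: ordinary loads and stores on one shared array."""
    data = [0] * NPROCS              # visible to every rank
    result = [0] * NPROCS
    barrier = threading.Barrier(NPROCS)

    def worker(rank: int) -> None:
        data[rank] = rank * rank                      # plain store
        barrier.wait()                                # all stores done before any read
        result[rank] = data[(rank + 1) % NPROCS]      # plain load of the neighbor's value

    run(worker)
    return result


def mp_shift() -> List[int]:
    """MP: private data plus explicit, two-sided send/receive."""
    mailboxes = [queue.Queue() for _ in range(NPROCS)]  # one inbox per rank
    result = [0] * NPROCS

    def worker(rank: int) -> None:
        local = rank * rank                           # private to this rank
        mailboxes[(rank - 1) % NPROCS].put(local)     # "send" to the left neighbor
        result[rank] = mailboxes[rank].get()          # "receive" from the right neighbor

    run(worker)
    return result


def shmem_shift() -> List[int]:
    """SHMEM: one-sided put into a remote PE's symmetric buffer."""
    symmetric = [0] * NPROCS         # symmetric[pe] stands for PE pe's copy of the buffer
    result = [0] * NPROCS
    barrier = threading.Barrier(NPROCS)

    def worker(pe: int) -> None:
        symmetric[(pe - 1) % NPROCS] = pe * pe        # put; the target posts no receive
        barrier.wait()                                # stands in for a barrier that completes all puts
        result[pe] = symmetric[pe]                    # read the local buffer

    run(worker)
    return result
```

All three functions return [1, 4, 9, 0]. What differs is who initiates the data movement: nobody explicitly (CC-SAS loads and stores), both sides (MP send and receive), or only the sender (SHMEM put).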

3
Platforms

Tightly coupled multiprocessors
SGI Origin2000: a cache-coherent, distributed
shared-memory machine
Less tightly coupled clusters
A cluster of workstations connected by Ethernet

4
Purpose

Compare the three programming models on the
Origin2000, a modern 64-processor, hardware
cache-coherent machine
We focus on scientific applications that access
data regularly or predictably.

5
Questions to be answered

Can parallel algorithms be structured in the same
way for good performance in all three models?
If performance differs substantially across the
three models, where are the key bottlenecks?
Do we need to change the data structures or
algorithms substantially to remove those
bottlenecks?

6
Applications and Algorithms

FFT
All-to-all communication (regular)
Ocean
Nearest-neighbor communication
Radix
All-to-all communication (irregular)
LU
One-to-many communication
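The four communication patterns can be sketched as "who sends to whom" maps. This is an illustrative Python sketch under stated assumptions, not the applications' actual decompositions: the FFT transpose is modeled as a full all-to-all, Ocean as a 2D block partition with north/south/east/west border exchange, Radix with a hypothetical rule that digit buckets are split evenly over processes, and LU as a blocked factorization with an assumed 2D scatter of blocks over a pr x pc process grid. All function names are hypothetical.

```python
from typing import Dict, List, Set

Pattern = Dict[int, Set[int]]  # sender rank -> set of receiver ranks


def fft_all_to_all(p: int) -> Pattern:
    """FFT transpose: every process sends one block to every other process."""
    return {src: {dst for dst in range(p) if dst != src} for src in range(p)}


def ocean_nearest_neighbor(rows: int, cols: int) -> Pattern:
    """2D block partition of a grid: exchange borders with N/S/E/W neighbors."""
    pattern: Pattern = {}
    for r in range(rows):
        for c in range(cols):
            nbrs = set()
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    nbrs.add(rr * cols + cc)
            pattern[r * cols + c] = nbrs
    return pattern


def radix_send_counts(keys_per_proc: List[List[int]], shift: int,
                      radix_bits: int, p: int) -> List[List[int]]:
    """Radix: number of keys each process sends to each process in one pass.
    Hypothetical rule: digit buckets are split evenly over processes.
    The counts depend on the key values, hence 'irregular'."""
    buckets = 1 << radix_bits
    counts = [[0] * p for _ in range(p)]
    for src, keys in enumerate(keys_per_proc):
        for k in keys:
            digit = (k >> shift) & (buckets - 1)
            counts[src][digit * p // buckets] += 1
    return counts


def lu_diag_receivers(k: int, nblocks: int, pr: int, pc: int) -> Set[int]:
    """Blocked LU: the owner of factored diagonal block (k, k) sends it to the
    owners of the blocks in block-row k and block-column k (one-to-many).
    Assumes a 2D scatter of blocks over a pr x pc process grid."""
    def owner(i: int, j: int) -> int:
        return (i % pr) * pc + (j % pc)

    receivers = {owner(k, j) for j in range(k + 1, nblocks)}
    receivers |= {owner(i, k) for i in range(k + 1, nblocks)}
    receivers.discard(owner(k, k))   # no message to itself
    return receivers
```

The FFT and Ocean maps depend only on the process count and grid shape, so they are known in advance; the Radix counts change with the keys, which is what makes that all-to-all irregular.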

7
Performance Result
8
Question

Why is MP much worse than CC-SAS and SHMEM?

9
Analysis

Execution time = BUSY + LMEM + RMEM + SYNC
where
BUSY: CPU computation time
LMEM: CPU stall time for local cache misses
RMEM: CPU stall time for sending/receiving remote
data
SYNC: CPU time spent at synchronization events
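The breakdown above can be captured in a small record type. This Python sketch is illustrative only: the class and function names are hypothetical, averaging across processes is one assumed way to summarize per-process data, and the numbers in the usage note are made up, not measurements from the study.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TimeBreakdown:
    """One process's execution time split into the four categories."""
    busy: float  # CPU computation time
    lmem: float  # stall time for local cache misses
    rmem: float  # stall time for sending/receiving remote data
    sync: float  # time spent at synchronization events

    @property
    def total(self) -> float:
        """Execution time = BUSY + LMEM + RMEM + SYNC."""
        return self.busy + self.lmem + self.rmem + self.sync

    def fractions(self) -> Dict[str, float]:
        """Share of the total time spent in each category."""
        t = self.total
        return {name: getattr(self, name) / t
                for name in ("busy", "lmem", "rmem", "sync")}


def mean_breakdown(per_proc: List[TimeBreakdown]) -> TimeBreakdown:
    """Average each category across processes (hypothetical summary)."""
    n = len(per_proc)
    return TimeBreakdown(
        busy=sum(b.busy for b in per_proc) / n,
        lmem=sum(b.lmem for b in per_proc) / n,
        rmem=sum(b.rmem for b in per_proc) / n,
        sync=sum(b.sync for b in per_proc) / n,
    )
```

For example, with made-up values TimeBreakdown(busy=50.0, lmem=10.0, rmem=25.0, sync=15.0), total is 100.0 and fractions()["rmem"] is 0.25, which would point at remote communication as a quarter of that process's time.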

10
Where does the time go in MP?
11
Improving MP performance

Remove extra data copies
Allocate all data involved in communication in the
shared address space
Reduce SYNC time
Use lock-free queue management for communication
instead of lock-based queues
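The presentation does not say which lock-free design was used. Purely for illustration, this Python sketch assumes a single-producer/single-consumer ring buffer (one per sender/receiver pair): only the producer writes tail and only the consumer writes head, so neither side takes a lock, and payloads are passed by reference so the receiver reads the sender's object without a copy. SpscRing, try_push, try_pop and transfer are hypothetical names. The sketch relies on CPython's global interpreter lock for ordering; a C version would need release/acquire atomics (for example, C11 atomic_store_explicit with memory_order_release on the index update).

```python
import threading
import time
from typing import Any, List, Tuple


class SpscRing:
    """Lock-free single-producer/single-consumer ring buffer (a sketch).

    Only the producer writes `tail`; only the consumer writes `head`.
    Items are stored by reference, so the consumer reads the producer's
    object directly instead of a copy.
    """

    def __init__(self, capacity: int) -> None:
        # One extra slot stays empty so that "full" and "empty" differ.
        self.slots: List[Any] = [None] * (capacity + 1)
        self.head = 0  # next slot to read (consumer-owned)
        self.tail = 0  # next slot to write (producer-owned)

    def try_push(self, item: Any) -> bool:
        nxt = (self.tail + 1) % len(self.slots)
        if nxt == self.head:
            return False              # full
        self.slots[self.tail] = item  # write the payload first...
        self.tail = nxt               # ...then publish it
        return True

    def try_pop(self) -> Tuple[bool, Any]:
        if self.head == self.tail:
            return False, None        # empty
        item = self.slots[self.head]
        self.slots[self.head] = None  # drop the reference
        self.head = (self.head + 1) % len(self.slots)
        return True, item


def transfer(n: int, capacity: int) -> List[int]:
    """Send 0..n-1 from a producer thread to a consumer thread."""
    ring = SpscRing(capacity)
    received: List[int] = []

    def producer() -> None:
        for i in range(n):
            while not ring.try_push(i):
                time.sleep(0)         # spin, yielding to the consumer

    def consumer() -> None:
        while len(received) < n:
            ok, item = ring.try_pop()
            if ok:
                received.append(item)
            else:
                time.sleep(0)         # spin, yielding to the producer

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received
```

transfer(500, 8) delivers the values in order. Restricting each ring to one producer and one consumer is what lets both index updates avoid locks; a queue shared by many senders would need atomic read-modify-write operations instead.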

12
Speedups under Improved MP
13
Why does CC-SAS perform best?
14
Why does CC-SAS perform best?

MP and SHMEM need extra packing/unpacking operations
MP needs extra packet-queue management
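The packing cost is easy to see in a grid code like Ocean, assuming each process stores its subgrid in row-major order and shares a column boundary with a neighbor. The border column is strided in memory, so MP and SHMEM must gather it into a contiguous buffer before sending and scatter it after receiving, while CC-SAS simply reads the elements in place. This Python sketch uses a flat list as the subgrid; the function names are hypothetical.

```python
from typing import List


def pack_column(grid: List[float], ncols: int, col: int) -> List[float]:
    """Sender side (MP/SHMEM): gather a strided column of a row-major grid
    into a contiguous buffer. This is one extra copy."""
    nrows = len(grid) // ncols
    return [grid[r * ncols + col] for r in range(nrows)]


def unpack_column(grid: List[float], ncols: int, col: int, buf: List[float]) -> None:
    """Receiver side (MP/SHMEM): scatter a contiguous buffer into a column
    (e.g. a ghost column). This is a second extra copy."""
    for r, value in enumerate(buf):
        grid[r * ncols + col] = value


def neighbor_value_ccsas(neighbor: List[float], ncols: int, row: int, col: int) -> float:
    """CC-SAS: read the neighbor's element where it lives; the coherence
    hardware brings the line into the cache, with no pack or unpack step."""
    return neighbor[row * ncols + col]
```

For a 3 x 4 grid holding 0.0 to 11.0, pack_column(grid, 4, 2) gathers [2.0, 6.0, 10.0], whereas the CC-SAS version reads grid[1 * 4 + 2] directly when the stencil needs it.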

15
Speedups for Ocean
16
Speedups for Radix
17
Speedups for LU
18
Conclusions

Good algorithm structures are portable across
programming models.
MP performs much worse than CC-SAS and SHMEM on a
hardware-coherent machine. However, it can achieve
similar performance once the extra data copies
and queue synchronization are addressed.
Something about programmability

19
Future work

What about applications that genuinely have
irregular, unpredictable, and naturally
fine-grained data access and communication
patterns?
What about machines with software-based coherence
(i.e., clusters)?
