A comparison of CC-SAS, MP and SHMEM on SGI Origin2000

Slides: 20
Provided by: xin57
Learn more at: https://cs.login.cmu.edu
1
A comparison of CC-SAS, MP and SHMEM on SGI
Origin2000
2
Three Programming Models
  • CC-SAS (cache-coherent shared address space)
    • Linear address space for shared memory
  • MP (message passing)
    • Communicate with other processes explicitly via
      a message passing interface
  • SHMEM
    • Communicate via get and put primitives
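
To make the contrast concrete, here is a toy, single-threaded sketch of how the same three values move under each model. It is not taken from the slides: the class names (`SharedAddressSpace`, `MessagePassingNode`, `ShmemNode`) and the `demo` function are hypothetical, and no real MPI or SHMEM library is used.

```python
from collections import deque


class SharedAddressSpace:
    """CC-SAS style: every process loads and stores the same array directly."""

    def __init__(self, size):
        self.data = [0] * size


class MessagePassingNode:
    """MP style: data reaches another process only through send/recv."""

    def __init__(self):
        self.inbox = deque()

    def send(self, dest, payload):
        # The payload is copied into the receiver's queue, like a message buffer.
        dest.inbox.append(list(payload))

    def recv(self):
        return self.inbox.popleft()


class ShmemNode:
    """SHMEM style: one-sided put/get on memory another node exposes."""

    def __init__(self, size):
        self.symmetric = [0] * size

    def put(self, target, offset, values):
        # Write into the target's memory; the target makes no matching call.
        target.symmetric[offset:offset + len(values)] = values

    def get(self, source, offset, count):
        # Read from the source's memory; the source makes no matching call.
        return source.symmetric[offset:offset + count]


def demo():
    values = [1, 2, 3]

    # CC-SAS: the writer stores, the reader loads, no explicit transfer.
    shared = SharedAddressSpace(3)
    shared.data[0:3] = values
    sas_result = shared.data[0:3]

    # MP: both sides take part (send on one, recv on the other).
    a, b = MessagePassingNode(), MessagePassingNode()
    a.send(b, values)
    mp_result = b.recv()

    # SHMEM: only the initiating side acts.
    p, q = ShmemNode(3), ShmemNode(3)
    p.put(q, 0, values)
    shmem_result = p.get(q, 0, 3)

    return sas_result, mp_result, shmem_result
```

All three paths deliver the same data; what differs is who has to act and how many copies are made along the way.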

3
Platforms
  • Tightly-coupled multiprocessors
    • SGI Origin2000: a cache-coherent distributed
      shared memory machine
  • Less tightly-coupled clusters
    • A cluster of workstations connected by Ethernet

4
Purpose
  • Compare the three programming models on
    Origin2000, a modern 64-processor hardware
    cache-coherent machine
  • We focus on scientific applications that access
    data regularly or predictably.

5
Questions to be answered
  • Can parallel algorithms be structured in the same
    way for good performance in all three models?
  • If there are substantial differences in
    performance under the three models, where are the
    key bottlenecks?
  • Do we need to change the data structures or
    algorithms substantially to solve those
    bottlenecks?

6
Applications and Algorithms
  • FFT
    • All-to-all communication (regular)
  • Ocean
    • Nearest-neighbor communication
  • Radix
    • All-to-all communication (irregular)
  • LU
    • One-to-many communication
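
The communication patterns named above can be sketched as lists of (source, destination) process pairs. This is an illustration only: the function names are hypothetical, and the 2D process grid used for the nearest-neighbor case is an assumption, not something the slides specify.

```python
def all_to_all(p):
    """Every process sends to every other process (FFT and Radix pattern)."""
    return [(s, d) for s in range(p) for d in range(p) if s != d]


def nearest_neighbor(rows, cols):
    """Each process on an assumed rows x cols grid talks to its N/S/W/E neighbors."""
    pairs = []
    for r in range(rows):
        for c in range(cols):
            src = r * cols + c
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    pairs.append((src, nr * cols + nc))
    return pairs


def one_to_many(root, receivers):
    """A single owner sends the same data to several processes (LU pattern)."""
    return [(root, d) for d in receivers if d != root]
```

The pair lists capture only who talks to whom. A natural reading of the regular/irregular distinction between FFT and Radix is that it concerns how much data each pair exchanges and whether that is known in advance, which this sketch does not model.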

7
Performance Result
8
Question
  • Why is MP much worse than CC-SAS and SHMEM?

9
Analysis
  • Execution time = BUSY + LMEM + RMEM + SYNC
  • where
    • BUSY: CPU computation time
    • LMEM: CPU stall time for local cache misses
    • RMEM: CPU stall time for sending/receiving remote
      data
    • SYNC: CPU time spent at synchronization events
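
Assuming the four components simply add up to the total execution time, as the breakdown above suggests, the share of each can be computed as follows. The function names and the sample numbers in the usage note are made up for illustration; they are not measurements from the presentation.

```python
def execution_time(busy, lmem, rmem, sync):
    """Total time as the sum of the four components (all in the same unit)."""
    return busy + lmem + rmem + sync


def breakdown(busy, lmem, rmem, sync):
    """Return the fraction of total execution time spent in each component."""
    total = execution_time(busy, lmem, rmem, sync)
    parts = {"BUSY": busy, "LMEM": lmem, "RMEM": rmem, "SYNC": sync}
    return {name: value / total for name, value in parts.items()}
```

For example, `breakdown(5.0, 2.0, 2.0, 1.0)` attributes half of a hypothetical 10-second run to BUSY and one tenth to SYNC.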

10
Where does the time go in MP?
11
Improving MP performance
  • Remove extra data copy
    • Allocate all data involved in communication in
      shared address space
  • Reduce SYNC time
    • Use lock-free queue management in communication
      instead
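
The slides do not say which lock-free scheme was used. One common option, shown here purely as an assumed example, is a single-producer, single-consumer ring buffer: the producer is the only writer of `tail` and the consumer is the only writer of `head`, so neither side needs a lock. The class name `SpscRingQueue` is hypothetical, and the Python version shows only the index discipline; a real implementation would also need memory barriers or atomics.

```python
class SpscRingQueue:
    """Single-producer, single-consumer ring buffer that takes no lock.

    Only the producer updates `tail`; only the consumer updates `head`.
    """

    def __init__(self, capacity):
        # One slot always stays empty so "full" and "empty" are distinguishable.
        self.slots = [None] * (capacity + 1)
        self.head = 0  # next slot to read (consumer-owned)
        self.tail = 0  # next slot to write (producer-owned)

    def try_push(self, item):
        """Producer side: return False instead of blocking when full."""
        nxt = (self.tail + 1) % len(self.slots)
        if nxt == self.head:
            return False
        self.slots[self.tail] = item
        self.tail = nxt  # publish only after the slot has been written
        return True

    def try_pop(self):
        """Consumer side: return (False, None) instead of blocking when empty."""
        if self.head == self.tail:
            return False, None
        item = self.slots[self.head]
        self.head = (self.head + 1) % len(self.slots)
        return True, item
```

Because neither operation waits on a lock, a sender and a receiver never stall on each other just to manage the queue, which is the kind of SYNC cost the slide aims to reduce.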

12
Speedups under Improved MP
13
Why does CC-SAS perform best?
14
Why does CC-SAS perform best?
  • Extra packing/unpacking operations in MP and SHMEM
  • Extra packet queue management in MP
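
To illustrate where packing and unpacking come from, the sketch below transposes a row-distributed matrix in two ways. The transpose is an assumed example (a typical all-to-all step in FFT); the slides do not say which data is packed. In the shared-address-space version each process reads remote elements directly, while the message-passing version first copies each non-contiguous sub-block into a contiguous buffer and copies it out again on arrival. All function names are hypothetical, and the actual send/receive is elided.

```python
def transpose_shared(a, p):
    """Each of p processes fills its own rows of t by reading a directly."""
    n = len(a)
    b = n // p  # rows per process; n is assumed divisible by p
    t = [[0] * n for _ in range(n)]
    for d in range(p):
        for i in range(d * b, (d + 1) * b):
            for j in range(n):
                t[i][j] = a[j][i]  # direct read of remotely owned data
    return t


def pack(a, s, d, b):
    """Source s copies the sub-block destined for d into a contiguous buffer."""
    return [a[j][i]
            for j in range(s * b, (s + 1) * b)
            for i in range(d * b, (d + 1) * b)]


def unpack(t, buf, s, d, b):
    """Destination d scatters the received buffer into its rows of t."""
    k = 0
    for j in range(s * b, (s + 1) * b):
        for i in range(d * b, (d + 1) * b):
            t[i][j] = buf[k]
            k += 1


def transpose_message_passing(a, p):
    """Same result, but every sub-block is packed and then unpacked."""
    n = len(a)
    b = n // p
    t = [[0] * n for _ in range(n)]
    for s in range(p):
        for d in range(p):
            buf = pack(a, s, d, b)   # extra copy on the sending side
            unpack(t, buf, s, d, b)  # extra copy on the receiving side
    return t
```

Both functions produce the same transposed matrix; the second simply touches every element two extra times, which is the overhead the slide attributes to MP and SHMEM.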

15
Speedups for Ocean
16
Speedups for Radix
17
Speedups for LU
18
Conclusions
  • Good algorithm structures are portable among
    programming models.
  • MP is much worse than CC-SAS and SHMEM on a
    hardware-coherent machine. However, we can
    achieve similar performance if the extra data
    copy and queue synchronization are addressed.
  • Something about programmability

19
Future work
  • What about applications that have inherently
    irregular, unpredictable and naturally
    fine-grained data access and communication
    patterns?
  • What about software-based coherent machines (e.g.,
    clusters)?