1
Yasmin: Shared memory programming
  • Enno Rehling
  • Universität Paderborn

2
Overview
  • Shared memory programming on SCI
  • The Yasmin library
  • Common pitfalls and caveats
  • Conclusion and Outlook

3
SCI shared memory
  • SCI can do more than message passing.
  • The shared memory programming model is more
    intuitive.
  • It's more efficient, too: SCI offers hardware
    support for distributed shared memory
    programming.
  • Remote memory and local memory have very
    different properties.

4
SCI shared memory
  • Memory from remote nodes is mapped into the PCI
    address space
  • All processes can handle local and remote mapped
    memory in the same way

[Diagram: for each of process A and process B, segments are mapped between the
physical address space and the PCI address space, so remote memory appears in
a process's address space alongside local memory]
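As the diagram suggests, a mapped remote segment is used through ordinary
pointers. A minimal sketch, assuming a hypothetical helper
sci_map_remote_segment(), which is not part of the Yasmin API shown in these
slides:

    /* sci_map_remote_segment() is an assumed, illustrative helper. */
    void poke_remote(sci_group_p group, int peer)
    {
        volatile int *remote = sci_map_remote_segment(group, peer);
        remote[0] = 42;       /* a plain store travels over SCI to the peer */
        int x = remote[0];    /* a plain load works too, but is far slower */
        (void)x;
    }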
5
Performance Data
  • Bandwidth
  • writing at 65 MB/sec
  • reading at 1.7 MB/sec
  • Latency
  • 4.7 µsec (MPI, zero byte ping-pong)
  • 2.7 µsec (Yasmin, 4 byte ping-pong)
  • Benchmarks were done on outdated hardware.
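The 4-byte Yasmin ping-pong above can be pictured as two processes bouncing a
counter through one shared, mapped word. A minimal sketch, assuming flag
points into a segment both processes have mapped and starts at 0 (mapping and
timing code omitted):

    /* One round trip = one ping plus one pong through shared memory. */
    void pingpong(volatile int *flag, int rank, int rounds)
    {
        for (int r = 1; r <= rounds; ++r) {
            if (rank == 0) {
                *flag = 2 * r - 1;            /* ping */
                while (*flag != 2 * r) ;      /* spin until the pong arrives */
            } else {
                while (*flag != 2 * r - 1) ;  /* spin until the ping arrives */
                *flag = 2 * r;                /* pong */
            }
        }
    }

In practice each process would spin on a word in its own local segment and
write into the peer's segment, since remote reads are slow; the round-trip
logic stays the same.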

6
Yasmin Shared memory API
  • Developed at Paderborn
  • API layer on top of the SCI driver
  • Runs on Linux but not Solaris
  • Reliable, fast and extensively tested

7
Distributed segments
  • User creates and exports distributed segments
    void *foo(size_t block_size, sci_group_p group)
    {
        int procs = sci_get_groupsize(group);
        sci_distr_seg_p segment;
        void *base = NULL;
        size_t *sizes = malloc(procs * sizeof(size_t));
        for (int i = 0; i != procs; ++i) sizes[i] = block_size;
        /* base and segment are presumably out-parameters */
        sci_create_distr_seg(group, &base, sizes, &segment);
        free(sizes);
        return base;
    }
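A possible call site for foo(), illustrative only; it assumes all processes in
the group call it collectively and that block i of the returned region is
homed on node i:

    /* Every process sees procs blocks of 4096 bytes each. */
    char *base = foo(4096, group);
    int rank = sci_get_rank(group);
    base[rank * 4096] = 1;   /* touch the block homed on this node */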

8
Consistent view
  • Same view of shared segments for each process
    void *bar(sci_group_p group, size_t size)
    {
        int rank = sci_get_rank(group);
        int procs = sci_get_groupsize(group);
        /* one block per process; block i is homed on node i and
           holds one size-byte slot per sender */
        char *base = foo(size * procs, group);
        char *msg = base + size * (procs * rank + rank);  /* own slot, local block */
        do_some_work(msg, size);
        for (int i = 0; i != procs; ++i) if (i != rank)
            memcpy(base + size * (i * procs + rank), msg, size);  /* push to peers */
        sci_barrier(group);
        return base + size * rank * procs;  /* local block: one message per process */
    }
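After sci_barrier() returns, each process reads all procs messages from its
own local block, avoiding slow remote reads. Illustrative use, with the names
from above:

    char *all = bar(group, 64);   /* one 64-byte message slot per process */
    /* all + 64 * i now holds the message from process i, in local memory */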

9
Allocating memory
  • Allocation in both local and remote segments

    void foobar(char *s, sci_group_p group, char **msg)
    {
        int rank = sci_get_rank(group);
        int procs = sci_get_groupsize(group);
        int error = sci_heap_create(4096);   /* error code unchecked on the slide */
        int next = (rank + 1) % procs, prev = (rank + procs - 1) % procs;
        char *dst = (char *)sci_heap_malloc(rank + 1, strlen(s) + 1);
        msg[next] = strcpy(dst, s);          /* hand the string to the next node */
        while (msg[rank] == NULL) ;          /* spin until prev's message arrives
                                                (an earlier slide build used
                                                sci_barrier(group) here instead) */
        printf("message from %d: %s\n", prev, msg[rank]);
        sci_heap_free(msg[rank]);
    }
10
What else?
  • Synchronization primitives (usage sketch after this list)
  • synchronization barriers
  • mutexes
  • reader/writer locks
  • condition variables
  • Group operations
  • create static subgroups
  • all functions work on subgroups
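A usage sketch for the primitives above. Only sci_barrier() appears elsewhere
in these slides; the mutex type and functions are assumed names, for
illustration only:

    /* sci_mutex_p, sci_mutex_lock() and sci_mutex_unlock() are assumed names. */
    void add_sample(sci_mutex_p m, volatile int *counter, sci_group_p group)
    {
        sci_mutex_lock(m);      /* protect an update in a shared segment */
        *counter += 1;
        sci_mutex_unlock(m);
        sci_barrier(group);     /* all processes in the group rendezvous */
    }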

11
Startup mechanism
  • Hostfile (example after this list)
  • contains list of hostnames to run program on
  • defines number of processes per node on SMP
  • Processes are created using rsh or ssh
  • output returned to shell or into file
  • easy debugging
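A hostfile might look like the following. The exact syntax is an assumption;
the slides only state that it lists hostnames and defines the number of
processes per node on SMP machines:

    # hostname   processes-per-node   (assumed format)
    node01  2
    node02  2
    node03  1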

12
Debugging SCI Applications
13
Profiling
  • Diploma Thesis at Paderborn
  • Dynamic program analysis
  • Gathers information about access patterns of SCI
    programs, helps identify performance bottlenecks

14
SCI pitfalls and caveats
  • Read access to remote memory is slow.
  • Use sci_memcpy(), not memcpy(): MMX instructions
    boost performance by a factor of 2 (see the
    sketch below)
  • Sometimes, access to remote memory can fail.
  • Hardware problems lead to errors
  • Driver functions can lead to errors
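A sketch of the recommended copy path. sci_memcpy() is named on the slide; its
exact signature is an assumption here (taken to be memcpy-like):

    /* Assumed memcpy-like signature for sci_memcpy(). */
    void fetch_block(void *local_buf, const void *remote_src, size_t n)
    {
        /* memcpy(local_buf, remote_src, n);  -- slow word-by-word remote reads */
        sci_memcpy(local_buf, remote_src, n); /* MMX-assisted bulk transfer */
    }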

15
SCI pitfalls and caveats
  • Awkward consistency model
    void awkward(volatile int *base)
    {
        *base = 7;
        sci_flush();
        *base = 2;
        fprintf(stdout, "%d\n", *base);   /* undefined: may print 7 or 2 */
    }
  • Programming without explicit knowledge of memory
    layout is never efficient
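A hedged fix for the undefined read in awkward() above, assuming sci_flush()
forces buffered remote writes to complete before the next access:

    *base = 2;
    sci_flush();                      /* drain the write buffer first */
    fprintf(stdout, "%d\n", *base);   /* now well-defined: prints 2 */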

16
Conclusions
  • Yasmin greatly simplifies use of shared segments
  • Raw performance plus easier development
  • Transparent shared memory programming on SCI is
    not yet achieved

17
Outlook
  • Integration with CCS
  • Yasmin goes open source
  • Inclusion in SuSE cluster CD