Cluster OpenMP Benchmark of 64-bit PC Cluster
Yao-Huan Tseng, Institute of Astronomy and Astrophysics, Academia Sinica, ROC

Abstract: Although MPI has become accepted as a portable style of parallel programming, it has several significant weaknesses that limit its effectiveness and scalability. MPI is in general difficult to program and does not support incremental parallelization of an existing sequential program. The current trend in parallel computer architectures is towards clusters of shared-memory symmetric multi-processor (SMP) nodes, e.g. dual-core, quad-core, and dual dual-core machines. Clusters of SMP nodes support a wide range of parallel programming paradigms, and the shared address space within each node is suitable for OpenMP parallelization. I have implemented a direct N-body program with OpenMP and tested it on the IAA PC Cluster to demonstrate its feasibility and efficiency. The main benefit is ease of use.
  • Hardware
  • 8 Intel single-cpu PCs (3GHz P4, 2GB DDR, 80GB
    IDE, Gbps NIC)
  • 4 AMD2 PCs (1.0GHz AMD dual core 5000, 2GB
    DDR2, 120GB IDE, Gbps NIC)
  • 4 dual AMD PCs (1.8GHz AMD Opteron 244, 2GB DDR,
    120GB IDE, Gbps NIC)
  • 2 dual AMD2 PCs (2GHz Opteron 270 dual core, 8GB
    DDR2, 120GB IDE, Gbps NIC)
  • 2 Intel dual-core PCs (3GHz Pentium, 2GB DDR,
    80GB IDE, Gbps NIC)
  • Software
  • a simple direct nbody program (particle number
    100,000)
  • OpenMP directives
  • OpenMP is a specification for a set of compiler
    directives, library routines, and environment
    variables that can be used to specify shared
    memory parallelism in Fortran and C/C++ programs.
  • Intel Fortran compiler v9.1 with Cluster OpenMP
  • Cluster OpenMP is an implementation of OpenMP
    that can make use of multiple SMP machines
    without resorting to MPI.
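
To make the idea of adding directives to a direct N-body code concrete, here is a minimal sketch in C++ rather than the Fortran used for the benchmark. It is not the author's program: the `Particle` struct, the `compute_accelerations` function, the softening parameter `eps2`, and the choice of units with G = 1 are all assumptions made for illustration. It shows a single `parallel for` directive on the outer loop of the O(N^2) pair sum.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical particle layout; the poster does not show its data structures.
struct Particle {
    double x, y, z;  // position
    double m;        // mass
};

// Direct-summation gravitational accelerations (O(n^2) pair loop, G = 1).
// eps2 is an assumed softening term added to the squared separation.
std::vector<std::array<double, 3>>
compute_accelerations(const std::vector<Particle>& p, double eps2)
{
    assert(eps2 >= 0.0);
    const int n = static_cast<int>(p.size());
    std::vector<std::array<double, 3>> acc(p.size());

    // Iteration i writes only acc[i], so the outer loop has no data races.
    // A compiler without OpenMP enabled ignores the pragma and runs serially.
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; ++i) {
        double ax = 0.0, ay = 0.0, az = 0.0;
        for (int j = 0; j < n; ++j) {
            if (j == i) continue;  // skip self-interaction
            const double dx = p[j].x - p[i].x;
            const double dy = p[j].y - p[i].y;
            const double dz = p[j].z - p[i].z;
            const double r2 = dx * dx + dy * dy + dz * dz + eps2;
            const double inv_r3 = 1.0 / (r2 * std::sqrt(r2));
            ax += p[j].m * dx * inv_r3;
            ay += p[j].m * dy * inv_r3;
            az += p[j].m * dz * inv_r3;
        }
        acc[i] = {ax, ay, az};
    }
    return acc;
}
```

Because the directive is a pragma, the same source still builds and runs as a sequential program when OpenMP is switched off (with GCC it is enabled by `-fopenmp`), which is what makes incremental parallelization of an existing code practical.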

node x thread
1 x 2: 1 node with 2 threads
2 x 1: 2 nodes, each with 1 thread
2 x 2: 2 nodes, each with 2 threads
4 x 1: 4 nodes, each with 1 thread
1 x 4: 1 node with 4 threads
4 x 2: 4 nodes, each with 2 threads
2 x 4: 2 nodes, each with 4 threads
Fig6
  • Summary
  • With only six OpenMP directives added to the code,
    a direct N-body simulation speeds up by a factor
    of 7 with 8 AMD CPUs running simultaneously
    (Fig4).
  • Although the clock frequency of the AMD CPUs is
    lower than that of the Intel CPUs, the performance
    per GHz is almost the same across the different
    CPUs. The exception is the AMD2 dual-core, whose
    performance per GHz is about 3 times that of the
    others (Fig6).
  • Because of limited memory bandwidth, the
    efficiency of a dual-CPU machine is better than
    that of a dual-core CPU (Fig2 and Fig4). Intel's
    hyper-threading technology is the worst case. Do
    not run more than one thread on a single CPU.
  • In the Linpack test, the performance of two CPUs
    is twice that of one CPU on a dual AMD machine
    (Fig10), but the ratio is only 1.01 on an Intel
    dual-core machine (Fig8).
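
The speedup figures in the summary can be put on a common scale by converting them to parallel efficiency, i.e. speedup divided by the number of CPUs or cores used. The helpers below are a small sketch of that arithmetic; the names `speedup` and `parallel_efficiency` are illustrative and do not come from the benchmark code.

```cpp
#include <cassert>

// Speedup from two wall-clock timings of the same problem.
double speedup(double t_serial, double t_parallel)
{
    assert(t_parallel > 0.0);
    return t_serial / t_parallel;
}

// Parallel efficiency: measured speedup divided by the number of
// processing units used. A value of 1.0 means perfect scaling.
double parallel_efficiency(double measured_speedup, int units)
{
    assert(units > 0);
    return measured_speedup / units;
}
```

Applied to the numbers quoted above, a speedup of 7 on 8 AMD CPUs gives an efficiency of 7 / 8 = 0.875. In the Linpack test the dual AMD machine reaches 2 / 2 = 1.0, while the Intel dual-core machine reaches only 1.01 / 2 = 0.505.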

2007/3/20