PARALLEL PROCESSING - PowerPoint PPT Presentation

Title:

PARALLEL PROCESSING

Slides: 26
Provided by: gro70
Transcript and Presenter's Notes

Title: PARALLEL PROCESSING


1
PARALLEL PROCESSING
  • The NAS Parallel Benchmarks
  • Daniel Gross
  • Chen Haiout

2
NASA (NAS Division)
3
NASA (NAS Division) Aims
  • NASA Advanced Supercomputing Division
  • Develop, demonstrate, and deliver innovative
    computing capabilities to enable NASA projects
    and missions
  • Demonstrate by the next millennium an
    operational computing system capable of
    simulating, in one to several hours, an entire
    aerospace vehicle system throughout its mission
    and life cycle.

4
NPB Introduction
  • The NAS Parallel Benchmarks (NPB) suite has been
    used widely to evaluate modern parallel systems
  • Measure objectively the performance of highly
    parallel computers and to compare their
    performance with that of conventional
    supercomputers
  • NPB is based on Fortran 77 and the MPI message
    passing standard
  • Consists of eight benchmark problems derived
    from important classes of aerophysics
    applications.

5
Benchmark Problems
  • EP Embarrassingly Parallel
  • IS Integer sort
  • CG Conjugate gradient
  • MG Multigrid method for Poisson eqn
  • FT Spectral method (FFT) for Laplace eqn
  • BT ADI Block-Tridiagonal systems
  • SP ADI Scalar Pentadiagonal systems
  • LU Lower-Upper symmetric Gauss-Seidel

6
  • The Embarrassingly Parallel Benchmark (EP)
  • In this benchmark, 2-dimensional statistics
    are accumulated from a large number of Gaussian
    pseudo-random numbers. This problem requires
    almost no communication, so in some sense this
    benchmark provides an estimate of the upper
    achievable limits for floating-point performance
    on a particular system.
  • Scalar Pentadiagonal (SP) Benchmark
  • In this benchmark, multiple independent systems
    of non-diagonally dominant, scalar pentadiagonal
    equations are solved. A complete solution of SP
    requires 400 iterations.
  • MultiGrid (MG) Benchmark
  • MG uses a multigrid method to compute the
    solution of the three-dimensional scalar Poisson
    equation.
  • This code is a good test of both short and
    long distance highly structured communication.
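
The EP description above can be illustrated with a minimal serial sketch. It assumes the usual polar (acceptance-rejection) method for turning uniform pairs into Gaussian pairs and tallies them by the square annulus they fall in. Python's built-in generator stands in for the benchmark's own random number generator, and the function name `ep_sketch` is hypothetical, so the numbers will not match official NPB results.

```python
import math
import random


def ep_sketch(n_pairs, seed=0):
    """Tally Gaussian pairs into square annuli (serial sketch, not NPB code)."""
    rng = random.Random(seed)          # stand-in for the benchmark's generator
    counts = [0] * 10                  # counts[l]: l <= max(|X|, |Y|) < l + 1
    sum_x = sum_y = 0.0
    for _ in range(n_pairs):
        # Uniform pair in the square (-1, 1) x (-1, 1)
        x = 2.0 * rng.random() - 1.0
        y = 2.0 * rng.random() - 1.0
        t = x * x + y * y
        if t > 1.0 or t == 0.0:
            continue                   # reject points outside the unit circle
        factor = math.sqrt(-2.0 * math.log(t) / t)
        gx, gy = x * factor, y * factor   # independent Gaussian deviates
        sum_x += gx
        sum_y += gy
        ring = int(max(abs(gx), abs(gy)))
        if ring < len(counts):
            counts[ring] += 1
    return counts, sum_x, sum_y


counts, sum_x, sum_y = ep_sketch(10000, seed=1)
print(counts, sum_x, sum_y)
```

In a parallel run each process would work through its own share of the pairs and only the final tallies and sums would be combined, which is why the benchmark needs almost no communication.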

7
  • 3-D FFT PDE (FT) Benchmark
  • FT contains the computational kernel of a
    three dimensional FFT-based spectral method.
  • BT Simulated CFD benchmark
  • BT solves systems of equations resulting from
    an approximately factored finite difference
    discretization of the Navier-Stokes equations.

8
(No Transcript)
9
Class Benchmarks
  • Since the 1991 specifications of NPB 1.0,
    computer speed and memory sizes have grown and
    correspondingly so have representative problem
    sizes.
  • NPB 1.0 specifies two problem sizes for each
    benchmark: class A and a larger class B.
  • The class A benchmarks can now be run on a
    moderately powerful workstation, and class B
    benchmarks on high-end workstations or small
    parallel systems.
  • To retain the focus on high-end supercomputing,
    we now add a class C for all of the NAS
    benchmarks.

10
Weaknesses
  • Implementations of the NAS Benchmarks are usually
    highly tuned by computer vendors
  • The largest problems (class B) no longer reflect
    the largest problems being run on present-day
    supercomputers

11
Why 8 Different Benchmarks ?
12
(No Transcript)
13
Comparing World Wide Clusters
  • Loki and Hyglac
  • In September 1996 two medium-scale parallel
    systems called Loki and Hyglac were
    installed.
  • Each consisted of sixteen Pentium Pro (200
    MHz) PCs with 16 Mbytes of memory and 3.2 and 2.5
    Gbytes of disks per node, respectively. Each
    system was integrated using two fast Ethernet
    NICs in each node.
  • Both sites performed a complex N-body
    gravitational simulation of 2 million particles
    using an advanced tree-code algorithm. The two
    systems achieved sustained performance of
    1.19 Gflops and 1.26 Gflops, respectively. When
    the systems were connected together, the same code
    was run again and achieved a sustained capability
    of over 2 Gflops without further optimization of
    the code for this new configuration.
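
As a rough worked example of the figures above, the combined run can be compared against the sum of the two separate runs. This is only a sketch: "over 2 Gflops" is treated as a lower bound of 2.0, and it assumes the separate and combined runs are directly comparable, which the slide does not state.

```python
# Sustained rates quoted on the slide (Gflops)
loki_gflops = 1.19
hyglac_gflops = 1.26
combined_lower_bound_gflops = 2.0   # "over 2 Gflops", taken as a lower bound

# Ideal combined rate if joining the clusters cost nothing
sum_of_parts = loki_gflops + hyglac_gflops

# Fraction of the ideal rate that the joined system retained (lower bound)
efficiency_lower_bound = combined_lower_bound_gflops / sum_of_parts

print(f"sum of parts: {sum_of_parts:.2f} Gflops")
print(f"efficiency of joined system: at least {efficiency_lower_bound:.1%}")
```

Under these assumptions the joined system kept at least about 80% of the summed rate, even though the code was not tuned for the new configuration.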

14
  • Berkeley NOW
  • The hardware configuration of the Berkeley NOW
    (Network of Workstations) system comprises 105 Sun
    Ultra 170 workstations connected by Myricom
    networks. Each node includes a 167 MHz Ultra 1
    microprocessor with 512 KB of cache, 128 MB of RAM,
    and two 2.3 GB disks.

15
  • Cray T3E
  • The Cray T3E-1200 is a scalable shared-memory
    multiprocessor based on the DEC Alpha 21164
    microprocessor. It provides a shared physical
    address space of up to 2048 processors over a 3D
    torus interconnect. Each node of the system
    contains an Alpha 21164 processor capable of
    1200 Mflops. The system logic runs at 75 MHz,
    and the processor runs at some multiple of this,
    such as 600 MHz for the Cray T3E-1200. Torus links
    provide a raw bandwidth of 650 MBps in each
    direction to maintain system balance with the
    faster processors and memory.
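
The clock and Mflops figures above are consistent with a simple peak-rate calculation, sketched below. The value of two floating-point operations per cycle (one add and one multiply issued together) is an assumption added here, not a figure from the slide, and the helper name `peak_mflops` is hypothetical.

```python
def peak_mflops(clock_mhz, flops_per_cycle):
    """Theoretical peak rate: cycles per second times flops per cycle."""
    return clock_mhz * flops_per_cycle


system_logic_mhz = 75     # from the slide
processor_mhz = 600       # from the slide (Cray T3E-1200)
flops_per_cycle = 2       # assumption: one add plus one multiply per cycle

clock_multiple = processor_mhz // system_logic_mhz
node_peak = peak_mflops(processor_mhz, flops_per_cycle)

print(f"processor clock is {clock_multiple}x the system logic clock")
print(f"peak per node: {node_peak} Mflops")
```

This is a theoretical ceiling per node; sustained benchmark rates are normally well below it.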

16
NPB Graph Results
17
(No Transcript)
18
(No Transcript)
19
(No Transcript)
20
(No Transcript)
21
(No Transcript)
22
(No Transcript)
23
The Dwarves Hardware
  • Old PII processors at 300 MHz (will be
    removed soon)
  • 8 PIII processors at 450 MHz
  • 4 PIII processors at 733 MHz
  • The new machines
  • Dual AMD Athlon(tm) MP 2000 at
    1,666 MHz, 1 GB memory

24
In The Next 2 Weeks
  • Install the NPB 2.2 on the Dwarves cluster
  • Run the Benchmark tests on the Dwarves Cluster
  • Run tests on several different configurations
    (different number of dwarves)
  • Estimate Network Bandwidth and latency.
  • Compare the Dwarves cluster performance to
    similar clusters in the world
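
One common way to estimate network latency and bandwidth is to time messages of several sizes and fit the linear model time = latency + size / bandwidth. The sketch below shows only the fitting step. The timings are synthetic values generated from made-up parameters (100 microseconds, 10 MB/s), not measurements from the Dwarves cluster, and `fit_latency_bandwidth` is a hypothetical helper name.

```python
def fit_latency_bandwidth(samples):
    """Least-squares fit of time = latency + size / bandwidth.

    samples: list of (message_size_bytes, one_way_time_seconds) pairs.
    Returns (latency_seconds, bandwidth_bytes_per_second).
    """
    n = len(samples)
    mean_size = sum(size for size, _ in samples) / n
    mean_time = sum(time for _, time in samples) / n
    sxx = sum((size - mean_size) ** 2 for size, _ in samples)
    sxy = sum((size - mean_size) * (time - mean_time) for size, time in samples)
    slope = sxy / sxx                  # seconds per byte
    latency = mean_time - slope * mean_size
    return latency, 1.0 / slope


# Synthetic data from assumed parameters, for illustration only
true_latency = 100e-6                  # 100 microseconds (made up)
true_bandwidth = 10e6                  # 10 MB/s (made up)
sizes = [1_000, 10_000, 100_000, 1_000_000]
synthetic = [(s, true_latency + s / true_bandwidth) for s in sizes]

latency, bandwidth = fit_latency_bandwidth(synthetic)
print(f"latency ~ {latency * 1e6:.1f} us, bandwidth ~ {bandwidth / 1e6:.2f} MB/s")
```

With real measurements, the (size, time) pairs would come from a ping-pong test between two nodes, with the round-trip time halved to obtain a one-way time.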

25
Questions will not be answered !!!
GOOD NIGHT