Communication Performance Measurement and Analysis on Commodity Clusters - PowerPoint PPT Presentation

1
Communication Performance Measurement and
Analysis on Commodity Clusters
Research Proposal
Name: Nor Asilah Wati Abdul Hamid
Supervisors: Dr. Paul Coddington, Dr. Francis Vaughan
2
Table of Contents
  • Introduction
  • Message-Passing Parallel Computers
  • Previous Research to Improve Communication Over
    Ethernet
  • Communication Performance Measurement
  • Previous Benchmark Software
  • Performance Analysis with MPIBench
  • Motivation
  • Methodology
  • Value of the Research

3
Introduction
  • The proposed research is in parallel computing,
    focusing on message-passing parallel computers.
  • This research will study communications benchmark
    software and performance measurement and analysis
    for message-passing parallel computers.
  • The proposed research aims to gain a clearer
    understanding of communications performance
    problems and how they can be improved,
    particularly for commodity clusters using Linux
    PCs and Ethernet networks.

4
Message-Passing Parallel Computers
  • There are various types of message-passing
    parallel computers, ranging from high-end to
    low-end systems.
  • Beowulf clusters are high-performance computers
    built from off-the-shelf commodity components:
    PCs running Linux, connected by a Fast Ethernet
    network.
  • However, some clusters use high-end Unix
    workstations (such as Compaq Alpha or Sun
    UltraSPARC machines) and/or high-end gigabit
    networks (such as Myrinet or QsNet).

(Images: Hydra, APAC NF)
5
Message-Passing Parallel Computers
  • A low-end commodity cluster consists of PCs
    running Linux connected by a Fast Ethernet
    network, e.g. Perseus.
  • Such clusters use MPI message-passing libraries,
    e.g. MPICH or LAM/MPI.
  • MPI: the standard library specification for
    message-passing computers.
  • MPICH: a freely available implementation of MPI.
  • The proposed research is mainly focussed on
    low-end commodity clusters.

(Image: Perseus)
6
Message-Passing Parallel Computers
  • Beowulf clusters have become very popular over
    the past couple of years, due to rapid
    improvements in the performance of commodity
    processors and networking infrastructure, and the
    development of Linux for PCs.
  • For most applications, Beowulf clusters offer
    much better price/performance than standard
    supercomputers.
  • Beowulf clusters commonly use Ethernet networks
    with TCP/IP for communication, and MPICH as the
    MPI library.
  • Ethernet networks are much cheaper than
    high-speed networks.
  • However, Ethernet-based clusters have several
    performance inadequacies due to TCP/IP and the
    MPI implementation.

7
Network Cost Comparison (Clustervision.com)
Interconnect         Bandwidth (MB/s)   Latency (µs)   Cost/port (EUR)
QsNet (Quadrics)     360                5              4770
Myrinet (Myricom)    245                10             2050
Gigabit Ethernet     90                 100            200
Megabit Ethernet     12                 100            28
InfiniBand           560-610            13-17          2000
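
A quick way to read the table's price/performance trade-off is to divide cost per port by bandwidth. The sketch below does this with the figures from the table. Replacing the InfiniBand ranges with their midpoints is an assumption made only for this illustration.

```python
# Figures copied from the Clustervision table above:
# name -> (bandwidth in MB/s, latency in us, cost per port in EUR).
# The InfiniBand ranges (560-610 MB/s, 13-17 us) are replaced by their
# midpoints, which is an assumption made only for this illustration.
NETWORKS = {
    "QsNet (Quadrics)": (360.0, 5.0, 4770.0),
    "Myrinet (Myricom)": (245.0, 10.0, 2050.0),
    "Gigabit Ethernet": (90.0, 100.0, 200.0),
    "Megabit Ethernet": (12.0, 100.0, 28.0),
    "InfiniBand": (585.0, 15.0, 2000.0),
}


def euro_per_mb_per_s(bandwidth_mb_s, cost_per_port):
    """Euros paid per MB/s of per-port bandwidth."""
    return cost_per_port / bandwidth_mb_s


def rank_by_bandwidth_cost(networks):
    """Return (name, EUR per MB/s, latency in us), cheapest bandwidth first."""
    rows = [(name, euro_per_mb_per_s(bw, cost), latency)
            for name, (bw, latency, cost) in networks.items()]
    return sorted(rows, key=lambda row: row[1])


if __name__ == "__main__":
    for name, ratio, latency in rank_by_bandwidth_cost(NETWORKS):
        print(f"{name:20s} {ratio:6.2f} EUR per MB/s  latency {latency:5.1f} us")
```

On these figures Gigabit Ethernet is the cheapest per MB/s (about 2.2 EUR) and QsNet the most expensive (about 13 EUR). Ethernet's 100 µs latency is 10 to 20 times that of Myrinet or QsNet, so latency, rather than bandwidth, is where commodity Ethernet falls furthest behind.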
8
Ethernet Problems
  • TCP/IP was designed specifically for Internet
    use, so several of its features cause problems
    for parallel computing.
  • Examples: its mechanisms for packet loss and
    congestion control, timeouts, etc.
  • Problems in the MPI implementation occur because:
  • TCP/IP detects errors and data loss, and
    retransmits until the data are received
    correctly,
  • BUT
  • the MPI implementation assumes a network with
    reliable data transfer.
  • There is much research on improving the
    performance of TCP/IP, but it has mostly focussed
    on the Internet and local-area networks.
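
One concrete form of the timeout problem is TCP's retransmission timeout (RTO). The RTO is computed from smoothed round-trip estimates using the RFC 6298 rules and is then clamped to a minimum. RFC 6298 recommends 1 s; Linux uses 200 ms by default. On a cluster with round trips of a few hundred microseconds, one lost packet can therefore stall a message for roughly a thousand times its normal latency. The sketch below follows the RFC 6298 update rules. The 1 ms clock granularity and the sample round-trip time are assumed values.

```python
ALPHA = 1 / 8  # gain for the smoothed RTT (RFC 6298)
BETA = 1 / 4   # gain for the RTT variation (RFC 6298)
K = 4          # variance multiplier (RFC 6298)


class RtoEstimator:
    """Retransmission timeout (RTO) estimator using the RFC 6298 rules.

    All times are in seconds. min_rto is the floor applied to the result:
    RFC 6298 recommends 1.0 s, Linux defaults to 0.2 s. granularity is the
    clock granularity G; 1 ms is an assumed value.
    """

    def __init__(self, min_rto=1.0, granularity=0.001):
        self.min_rto = min_rto
        self.granularity = granularity
        self.srtt = None
        self.rttvar = None

    def update(self, rtt):
        """Feed one round-trip measurement and return the new RTO."""
        if self.srtt is None:
            # First sample: SRTT = R, RTTVAR = R / 2.
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            # RTTVAR must be updated with the old SRTT, so it comes first.
            self.rttvar = (1 - BETA) * self.rttvar + BETA * abs(self.srtt - rtt)
            self.srtt = (1 - ALPHA) * self.srtt + ALPHA * rtt
        return self.rto()

    def rto(self):
        computed = self.srtt + max(self.granularity, K * self.rttvar)
        return max(computed, self.min_rto)


if __name__ == "__main__":
    # Hypothetical 200 us round trips on a cluster network.
    est = RtoEstimator(min_rto=0.2)
    for _ in range(10):
        timeout = est.update(200e-6)
    print(f"RTO = {timeout * 1e3:.0f} ms, about {timeout / 200e-6:.0f}x the RTT")
```

Without the floor the computed RTO here stays near 1.2 ms. The minimum RTO, not the measured network, is what sets the recovery time after a loss.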

9
Previous Research to Improve Communication Over
Ethernet
  • Active Messages: aims to reduce communication
    overhead and to allow communication and
    computation to overlap.
  • GAMMA: an extension to the Linux communication
    layer for clusters of PCs.
  • BIP (Basic Interface for Parallelism): an
    interface for network communication in
    message-passing parallel computing.
  • VIA (Virtual Interface Architecture): a standard
    communication infrastructure for System Area
    Networks (SANs) that provides protected,
    zero-copy, user-space inter-process
    communication.
  • MVICH: an MPICH-based implementation of MPI for
    VIA.

10
Protocol Comparison (Ping-Pong Application)
Platform                    Latency (µs)   Bandwidth (MB/s)
BIP - Myrinet               5.0            108.0
TCP - Myrinet               103.0          42.0
GAMMA - Gigabit Ethernet    9.6            90.0
TCP - Gigabit Ethernet      103.0          62.0
GAMMA - Fast Ethernet       12.7           12.2
VIA - Fast Ethernet         27.0           -
TCP - Fast Ethernet         105.0          10.0
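
The slide does not use it, but a common way to read a latency/bandwidth table like this one is the linear (Hockney) model T(n) ≈ L + n/B. Here L is the small-message latency and B the asymptotic bandwidth. The model also gives n½ = L·B, the message size at which half the peak bandwidth is reached. The sketch treats the measured values as exact model parameters, which is an assumption, since real curves are not perfectly linear. It also takes 1 MB = 10^6 bytes.

```python
# Platform -> (latency in us, bandwidth in MB/s), from the table above.
# VIA over Fast Ethernet is left out because no bandwidth is given for it.
PLATFORMS = {
    "BIP - Myrinet": (5.0, 108.0),
    "TCP - Myrinet": (103.0, 42.0),
    "GAMMA - Gigabit Ethernet": (9.6, 90.0),
    "TCP - Gigabit Ethernet": (103.0, 62.0),
    "GAMMA - Fast Ethernet": (12.7, 12.2),
    "TCP - Fast Ethernet": (105.0, 10.0),
}


def transfer_time_us(n_bytes, latency_us, bandwidth_mb_s):
    """Linear model T(n) = L + n / B.

    Taking 1 MB = 10**6 bytes, 1 MB/s is exactly 1 byte per microsecond,
    so n / B is already in microseconds.
    """
    return latency_us + n_bytes / bandwidth_mb_s


def half_bandwidth_bytes(latency_us, bandwidth_mb_s):
    """n_1/2 = L * B: the message size at which n / T(n) reaches B / 2."""
    return latency_us * bandwidth_mb_s


if __name__ == "__main__":
    for name, (lat, bw) in PLATFORMS.items():
        print(f"{name:26s} T(1000 B) = {transfer_time_us(1000, lat, bw):6.1f} us"
              f"   n_1/2 = {half_bandwidth_bytes(lat, bw):7.0f} B")
```

For a 1000-byte message the model gives 205 µs for TCP over Fast Ethernet, against about 95 µs for GAMMA. TCP over Gigabit Ethernet only reaches half its peak bandwidth at around 6.4 KB, compared with under 1 KB for GAMMA.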
11
Previous Research to Improve Communication Over
Ethernet
  • Previous research has focused mainly on
    designing new protocols to replace TCP/IP.
  • However, a new protocol requires new software
    (e.g. drivers) for all Ethernet hardware.
  • The MPI implementation must also be ported to
    the new protocol, e.g. MVICH.
  • TCP/IP and MPICH are widely used in existing
    Beowulf clusters, so a more flexible TCP/IP and
    an improved MPICH would be preferable to a new
    protocol.
  • Pope et al. are an example of research aiming
    to design a more flexible TCP/IP, using a
    compliant systems approach.
  • They argue for separating policy from mechanism,
    and examine which policies suit TCP/IP stacks
    for different types of communication.

12
Communication Performance Measurement
  • Communication performance measurement is
    important, for example:
  • To improve the performance of the machine and the
    MPI implementation.
  • As input to performance modeling tools for
    parallel programs.
  • To compare the performance of different
    machines, e.g. to find the fastest one.
  • Benchmark software includes SKaMPI, MPBench,
    Mpptest, the Pallas MPI Benchmark and the
    recently developed MPIBench.

13
Previous Benchmark Software
  • SKaMPI, MPBench, Pallas MPI Benchmark, Mpptest.
  • Existing benchmark software has several
    weaknesses that can make its timing measurements
    inaccurate:
  • It uses relatively coarse-grained clocks, which
    forces a benchmark to average results over a
    large number of test repetitions.
  • It relies on MPI_Wtime for timing and uses
    ping-pong tests, which measure the total
    round-trip time rather than the time of a single
    communication.
  • None of the communication patterns used in
    existing benchmarks take clusters of SMP nodes
    into account.
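
The first two weaknesses can be illustrated with a small simulation. A clock that only ticks every `resolution` seconds cannot time one fast message. A benchmark therefore times a whole batch of ping-pong repetitions, divides by the count, and reports half the average round trip as the one-way time. This sketch simulates the clock instead of calling MPI_Wtime. The 10 ms tick and 200 µs round trip are hypothetical values.

```python
import math


def read_coarse_clock(true_time, resolution):
    """A clock that only advances in whole ticks of `resolution` seconds."""
    return math.floor(true_time / resolution) * resolution


def one_way_estimate(round_trips, resolution, start=0.0):
    """Time a batch of back-to-back ping-pong round trips with a coarse clock,
    then report half the average round trip as the one-way time, as a
    ping-pong benchmark does."""
    t_begin = read_coarse_clock(start, resolution)
    t_end = read_coarse_clock(start + sum(round_trips), resolution)
    mean_round_trip = (t_end - t_begin) / len(round_trips)
    return mean_round_trip / 2


if __name__ == "__main__":
    tick = 0.01       # hypothetical 10 ms clock resolution
    rtt = 200e-6      # hypothetical 200 us round trip
    for reps in (1, 100, 10_000):
        est = one_way_estimate([rtt] * reps, tick)
        print(f"{reps:6d} repetitions -> one-way estimate {est * 1e6:7.2f} us")
```

With a single repetition the coarse clock reports zero. Only by averaging thousands of repetitions does the estimate approach the true 100 µs. By then the individual times, and any difference between the two directions of the round trip, have been averaged away.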

14
MPIBench
  • MPIBench has been developed by Duncan Grove as
    part of his PhD research.
  • Extra functionality in MPIBench:
  • Topology-aware, specifically designed to ensure
    meaningful results on clusters of SMP nodes.
  • Uses an accurate globally synchronized clock to
    measure the performance of all the processes
    involved.
  • Can measure times of single communications - not
    just averages.
  • Can generate histograms (distributions) of
    communication times.
  • The proposed research will use MPIBench for
    performance measurement and will also improve
    MPIBench.
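
Once every process's clock is corrected to a common global time, each message's duration is its receive timestamp minus its send timestamp. Those durations can then be binned into a histogram instead of averaged. The sketch assumes the per-process clock offsets are already known; it does not show how MPIBench actually synchronizes its clocks. The function names and sample timestamps are hypothetical, and times are integer microseconds.

```python
from collections import Counter


def one_way_times(messages, offsets):
    """Per-message one-way times on a common global clock.

    messages: (src, send_local, dst, recv_local) tuples, where each timestamp
    is read from that process's own clock (integer microseconds here).
    offsets[p]: amount to add to process p's clock to obtain global time;
    assumed to be known already from some clock-synchronization step.
    """
    times = []
    for src, send_local, dst, recv_local in messages:
        send_global = send_local + offsets[src]
        recv_global = recv_local + offsets[dst]
        times.append(recv_global - send_global)
    return times


def histogram(times, bin_width):
    """Count times per bin; bin k covers [k * bin_width, (k + 1) * bin_width)."""
    return Counter(t // bin_width for t in times)


if __name__ == "__main__":
    # Hypothetical timestamps: process 1's clock runs 500 us behind process 0's.
    msgs = [(0, 1000, 1, 600), (0, 2000, 1, 1620), (1, 3000, 0, 3610)]
    offsets = {0: 0, 1: 500}
    times = one_way_times(msgs, offsets)
    print(times)
    for k, count in sorted(histogram(times, 20).items()):
        print(f"[{k * 20}, {(k + 1) * 20}) us: {'#' * count}")
```

Without the offset correction, the first message would appear to arrive 400 µs before it was sent. This is why timing individual messages needs a globally synchronized clock rather than independent local clocks.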

15
Performance Analysis with MPIBench
  • Comparison of communication performance of
    different networks.
  • Beowulf-type cluster of PCs connected by Fast
    Ethernet (Perseus and Bunyip).
  • Perseus vs. Bunyip, to analyse the effects of
    different network topologies.
  • Sun Technical Compute Farm connected with Myrinet
    (Orion).
  • Compaq AlphaServer SC connected with QsNet (APAC
    NF).

16
Performance Analysis with MPIBench
  • The MPIBench performance analysis revealed
    several inadequacies, for example:
  • Problems caused by TCP/IP timeouts and
    congestion control.
  • Problems with MPI implementations.
  • Problems caused by network congestion.
  • Distributions with long tails, including
    outliers with very long communication times,
    due to:
  • Spurious interference from unrelated operating
    system services.
  • Cluster management system daemons.
  • Outlier: an extreme point that is much longer
    than the average value of the distribution.
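
The slide defines an outlier only as a point much longer than the distribution's average. One way to make that operational is a fixed multiple of the mean; this rule is an assumption, not MPIBench's definition. The sketch then computes the statistic plotted on the following Perseus slides: the percentage of processes that saw at least one outlier. The factor of 10 is a hypothetical choice.

```python
from statistics import mean


def find_outliers(times, factor=10.0):
    """Times longer than `factor` x the mean (hypothetical threshold rule)."""
    threshold = factor * mean(times)
    return [t for t in times if t > threshold]


def percent_processes_with_outliers(times_by_process, factor=10.0):
    """times_by_process maps each rank to its list of communication times.

    The mean is taken over all samples from all ranks, and a rank counts as
    'experiencing an outlier' if any of its times exceeds factor x that mean.
    """
    all_times = [t for ts in times_by_process.values() for t in ts]
    threshold = factor * mean(all_times)
    affected = sum(1 for ts in times_by_process.values()
                   if any(t > threshold for t in ts))
    return 100.0 * affected / len(times_by_process)
```

A fixed multiple of the mean is crude, because an extreme outlier also inflates the mean. A median- or percentile-based threshold would be a natural alternative to evaluate.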

17
Perseus: Average time for MPI_Bcast
18
Perseus: Percentage of processes experiencing
outliers during MPI_Bcast
19
Distribution of times for MPI_Bcast
20
Perseus: Average times for MPI_Alltoall
21
Perseus: Percentage of processes experiencing
outliers during MPI_Alltoall
22
Motivation 1
  • MPIBench is a new communication benchmark with
    capabilities that existing benchmark software
    lacks.
  • HOWEVER, there has been no detailed comparison
    between MPIBench and the existing MPI
    benchmarks. Such a comparison is also needed to
    identify any inadequacies in MPIBench, so that
    it can be improved.
  • Research Aims
  • 1. To compare MPIBench with other existing
    benchmark software, testing the scalability,
    functionality and usability of MPIBench against
    the existing software.
  • 2. Based on the comparison results, to improve
    and change MPIBench.

23
Methodology
  • 1. Comparison of different benchmark software
    for message-passing parallel computers.
  • The comparison is divided into a theoretical
    part and an experimental part.
  • The theoretical part will involve a study of
    conference and journal papers and of the
    documentation for each benchmark.
  • The experimental part will involve installing
    the benchmark software on the Hydra cluster and
    testing its functionality.
  • Then, a standard test procedure, specifying
    parameters such as message size, MPI routine
    and number of iterations, will be defined to
    standardize the experiments. All data obtained
    from the experiments will be recorded and
    compared.
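
One way to pin down the standard procedure is to enumerate the test matrix explicitly. Every benchmark is then run with exactly the same routines, message sizes, process counts and repetition counts. The class name and all parameter values below are hypothetical placeholders, not values fixed by this proposal.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class BenchmarkCase:
    routine: str        # MPI routine under test
    message_bytes: int  # message size in bytes
    processes: int      # number of MPI processes
    iterations: int     # repetitions per measurement


# Placeholder values for illustration; the real matrix is still to be decided.
ROUTINES = ["MPI_Send/MPI_Recv", "MPI_Bcast", "MPI_Alltoall"]
MESSAGE_SIZES = [2 ** k for k in range(0, 21, 4)]  # 1 B up to 1 MiB
PROCESS_COUNTS = [2, 8, 32]
ITERATIONS = 1000


def build_test_matrix():
    """Every (routine, size, process count) combination in a fixed order, so
    each benchmark package is run on exactly the same set of cases."""
    return [BenchmarkCase(r, n, p, ITERATIONS)
            for r, n, p in product(ROUTINES, MESSAGE_SIZES, PROCESS_COUNTS)]


if __name__ == "__main__":
    cases = build_test_matrix()
    print(len(cases), "cases, e.g.", cases[0])
```

Keeping the matrix in one place also makes it easy to reuse unchanged when MPIBench is retested after modification.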

24
Methodology
  • 2. Improvement to MPIBench
  • This second part will require a detailed
    understanding of the MPIBench code.
  • Required changes will then be identified and
    made to the code.
  • After the changes, it is crucial to test
    MPIBench with the same tests as in the first
    part of the methodology, to ensure the program
    is still correct.

25
Motivation 2
  • Previously, Grove used MPIBench to compare two
    clusters, Perseus and Bunyip, which have similar
    commodity components but different topologies.
  • HOWEVER, there has not been any experimental work
    with MPIBench on a machine with similar
    components and similar topology that differs
    only in network type.
  • Research Aims
  • 3. To analyze and compare the performance of
    Myrinet and Ethernet networks on a large Linux
    PC cluster (Hydra). The results may provide
    ideas on how to improve the communication
    performance of Ethernet networks in Beowulf
    clusters.
26
Methodology
  • 3. Performance Analysis and Investigation of
    Communication Performance on Different Networks.
  • Design a method for selecting whether the
    program runs over the Ethernet or the Myrinet
    network.
  • A standard set of procedures and parameters is
    required for the experiments, for example the
    number of iterations, MPI routine, number of
    processors and message size.
  • The performance results will be recorded and
    analysed.
  • The results will then be used to investigate
    the problems in the Ethernet network.
  • The investigation will involve studying,
    analysing and discussing the comparative
    communication performance of the Myrinet and
    Ethernet networks.
  • This stage is expected to yield ideas about the
    problems that occur in the Ethernet network,
    particularly in TCP/IP and the MPI
    implementation.
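
After the same tests have been run over both networks, a simple first comparison is the ratio of Ethernet time to Myrinet time for each case. The sketch assumes results are stored as mean times in microseconds, keyed by (routine, message size, processes). The function names and data layout are hypothetical.

```python
def slowdown_ratios(ethernet, myrinet):
    """Ethernet time / Myrinet time for every case measured on both networks.

    Both arguments map (routine, message_bytes, processes) to a mean time in
    microseconds; a ratio above 1 means Ethernet was slower for that case.
    """
    common = ethernet.keys() & myrinet.keys()
    return {case: ethernet[case] / myrinet[case] for case in sorted(common)}


def worst_cases(ratios, top=3):
    """The cases where Ethernet lags furthest behind Myrinet."""
    return sorted(ratios, key=ratios.get, reverse=True)[:top]
```

The cases with the largest ratios would be natural starting points for investigating the TCP/IP and MPI implementation problems.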

27
Motivation 3
  • Several previous research efforts have tried to
    overcome the communication performance problems
    of Ethernet networks in Beowulf clusters.
  • However, previous research has focused mainly on
    designing new protocols. A new protocol requires
    new software (e.g. drivers) for all Ethernet
    hardware, and the MPI implementation must also
    be ported to it.
  • It would be more valuable to fix the problems in
    TCP/IP and MPICH themselves.
  • Research Aims
  • 4. To propose or develop solutions to
    communication problems in Beowulf clusters using
    Ethernet networks, particularly in TCP/IP and the
    MPI implementation.

28
Methodology
  • 4. Propose or Develop Solutions for the
    Ethernet Network Problems in Beowulf Cluster
    Computers.
  • This will involve study, analysis, comparison
    of results and experiments.
  • Based on the study done so far, several
    problems are expected to occur in TCP/IP, for
    example packet loss and congestion.
  • Possible TCP/IP improvements include decreasing
    the timeout or improving the algorithm for the
    resend (retransmission) mechanism.
  • Problems also occur in MPICH, such as the poor
    performance and unusual time distribution of
    MPI_Alltoall.
  • Suggest or develop optimised code for some MPI
    routines, suited to TCP/IP and Ethernet
    networks.
  • Re-run experiments to test changes to the MPICH
    code or TCP, in order to check for performance
    improvement.
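
One example of restructuring that might suit a switched Ethernet network is to organise MPI_Alltoall as p - 1 pairwise exchange steps. At step k, each rank sends to (rank + k) mod p and receives from (rank - k) mod p. Every rank then sends and receives exactly one message per step, rather than all ranks posting all their sends at once. This is a known alternative all-to-all algorithm, not a description of MPICH's actual MPI_Alltoall. The sketch only builds and checks the schedule and makes no MPI calls.

```python
def pairwise_schedule(p):
    """Pairwise-exchange schedule for an all-to-all among p ranks.

    Returns p - 1 steps; each step is a list of (sender, receiver) pairs.
    At step k, rank r sends its block for rank (r + k) % p and receives from
    rank (r - k) % p, so each step could be done with one MPI_Sendrecv per rank.
    """
    return [[(r, (r + k) % p) for r in range(p)] for k in range(1, p)]


def check_schedule(p, steps):
    """True if every rank sends once and receives once in every step and each
    ordered pair (i, j) with i != j appears exactly once in the whole schedule."""
    pairs = []
    for step in steps:
        senders = sorted(s for s, _ in step)
        receivers = sorted(d for _, d in step)
        if senders != list(range(p)) or receivers != list(range(p)):
            return False
        pairs.extend(step)
    expected = {(i, j) for i in range(p) for j in range(p) if i != j}
    return len(pairs) == len(expected) and set(pairs) == expected


if __name__ == "__main__":
    for k, step in enumerate(pairwise_schedule(4), start=1):
        print(f"step {k}: {step}")
```

Whether this schedule actually helps on Perseus or Hydra would have to be established with the re-run experiments described above.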

29
Motivation 4
  • Grove previously used MPIBench to benchmark
    several machines; his analysis recorded outliers
    with very long communication times.
  • The main causes of the outliers are:
  • Spurious interference from unrelated operating
    system services.
  • Cluster management system daemons.
  • However, there has been no further work to find
    solutions to these problems.
  • Research Aims
  • 5. To find solutions to this loss of performance
    in Beowulf clusters of Linux PCs.
  • 6. Possibly to develop a customized installation
    of Linux.

30
Methodology
  • 5. Investigation of the Outlier Problem.
  • Repeat the MPIBench experiments previously run
    on Perseus.
  • Based on the expected main causes of the
    outliers, the experiments will involve:
  • Removing operating system and cluster
    management system processes.
  • Reducing how often interfering processes
    execute.
  • Identifying the causes of the outliers and
    proposing solutions.

31
Value of the Research
  • This research will provide:
  • Improvements to MPIBench, which can be used to
    analyze communication networks and MPI
    implementations.
  • Results that can be used in future work on
    PEVPM, a new performance modelling technique.
  • Improved communication performance for Beowulf
    clusters using Ethernet networks, providing a
    solution for cheap high-performance computing.

32
END.