High-Performance Computations and the Technologies of Microsoft (PowerPoint transcript)

1
Nizhni Novgorod State University
High-performance computations and the
technologies of Microsoft
Prof. Gergel V.P., D.Sc., Software Department, NNSU
www.unn.ru www.software.unn.ac.ru
2
  • Needs of High-Performance Computations (HPC)
  • Windows-based Clusters: Microsoft Compute Cluster Server
  • HPC Simplicity vs. Complexity: A Brief Introduction to MPI
  • How to Overcome the Complexity: HPC Curriculum

3
Needs of High-Performance Computations
  • The time-consuming nature of many scientific and engineering problems ("Grand Challenge" problems)
  • Growth of serial computer performance is limited
  • The cost of parallel computational systems is falling (clusters, ...)
  • Parallelism at the processor level: Hyper-Threading, multi-core (70% of the market in 2006)

4
Needs of High-Performance Computations
Supercomputing Goes Personal (the price has been reduced more than 10,000 times!):
  1991: Cray Y-MP C916
  1998: Sun HPC 10000
  2005: Shuttle@NewEgg.com
5
Needs of High-Performance Computations
MIMD
  Multiprocessors (shared memory):
    UMA: PVP, SMP (incl. multi-core)
    NUMA: COMA, CC-NUMA, NCC-NUMA
  Multicomputers (distributed memory):
    NORMA: MPP, Cluster
6
Needs of High-Performance Computations
  • Cluster:
  • a group of computers (a local network) capable of working as a unified computational resource,
  • higher reliability and efficiency than a plain local network,
  • essentially lower cost compared to other types of parallel computational systems (through the use of commodity off-the-shelf hardware and software)

7
  • Needs of High-Performance Computations (HPC)
  • Windows-based Clusters: Microsoft Compute Cluster Server
  • HPC Simplicity vs. Complexity: A Brief Introduction to MPI
  • How to Overcome the Complexity: HPC Curriculum

8
Windows-based Clusters: Microsoft Compute Cluster Server
  • Microsoft's vision in the HPC area
  • Compute Cluster Server (CCS) consists of:
  • a dedicated release of the OS, Windows Server 2003 Cluster Edition,
  • the Compute Cluster Pack (CCP):
  • MS MPI, an implementation of the MPI-2 standard,
  • a cluster management system,
  • GUI, CUI, COM and other interfaces for job submission
  • Current release: Community Preview Release 3
  • The first release of CCS became available in November 2005
  • Download: http://www.connect.microsoft.com

9
Microsoft Compute Cluster Server
  • Computational nodes:
  • 64-bit processors of the x86 family,
  • 512 Mb RAM,
  • 4 Gb HDD,
  • 64-bit Microsoft Windows Server 2003
  • Parallel software development:
  • a PC under MS Windows XP, 2003, or Vista,
  • MS Compute Cluster Pack SDK,
  • Recommended IDE: MS Visual Studio 2005

10
Microsoft Compute Cluster Server
  • Job management
  • CCS provides job management and efficient use of the resources of the compute cluster,
  • The interfaces for scheduling jobs include:
  • Command Line Interface (CLI) (see the example below),
  • GUI,
  • Web UI,
  • Web services, COM
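
Job submission from the CLI looks roughly as follows (a sketch: CCS ships a job command-line utility, but the exact flag syntax shown here is an assumption, not taken from the slides):

    rem queue an MPI run on 4 processors via the CCS scheduler
    job submit /numprocessors:4 mpiexec mypi.exe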

11
Microsoft Compute Cluster Server
  • Job management provides the ability to:
  • schedule job execution,
  • inspect the current state of jobs,
  • terminate jobs,
  • etc.

12
Microsoft Compute Cluster Server
  • Development and execution of MPI programs:
  • IDE: VS 6.0, VS 2003, VS 2005,
  • Language: C,
  • MS MPI is compatible with MPICH-2 (at the source code level),
  • mpiexec is used to run MPI programs in the same way as for MPICH-2 (see the example below)
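
For instance, the parallel program from the later slides could be started on four processes with (the executable name pi.exe is a placeholder):

    mpiexec -n 4 pi.exe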

13
Microsoft Compute Cluster Server
  • Debugging MPI programs
  • Visual Studio 2005 and CCP have a built-in MPI debugger!

14
Microsoft Compute Cluster Server
  • Debugging MPI programs

15
Microsoft Compute Cluster Server
16
  • Needs of High-Performance Computations (HPC)
  • Windows-based Clusters: Microsoft Compute Cluster Server
  • HPC Simplicity vs. Complexity: A Brief Introduction to MPI
  • How to Overcome the Complexity: HPC Curriculum

17
HPC Simplicity vs. Complexity: A Brief Introduction to MPI
The processors in computer systems with distributed memory operate independently. It is therefore necessary to be able:
  • to distribute the computational load,
  • to organize information communication (data transmission) among the processors.

The solution of both problems is provided by MPI (the Message Passing Interface).
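
As a minimal sketch (our example, not from the slides), every MPI program addresses both needs through the rank/size pair obtained at startup:

#include "mpi.h"
#include <stdio.h>

int main(int argc, char* argv[]) {
    int rank, size;
    MPI_Init(&argc, &argv);                 /* start the MPI runtime */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's own number */
    /* the rank is used to distribute the load; messages organize communication */
    printf("process %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}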
18
Brief Introduction to MPI
  • Example: computing the constant π
  • The value of the constant π can be computed by means of the integral
        π = ∫₀¹ 4 / (1 + x²) dx
  • To compute this integral, the method of rectangles (the midpoint rule) can be used for numerical integration

19
Brief Introduction to MPI
// Serial program
#include <stdio.h>
#include <math.h>

double f(double a) { return (4.0 / (1.0 + a*a)); }

int main(int argc, char* argv[]) {
    int n, i;
    double PI25DT = 3.141592653589793238462643;
    double mypi, h, sum, x;
    printf("Enter the number of intervals: ");
    scanf("%d", &n);
    // calculating
    h = 1.0 / (double) n;                 // width of one rectangle
    sum = 0.0;
    for (i = 0; i < n; i++) {
        x = h * ((double)i + 0.5);        // midpoint of the i-th interval
        sum += f(x);
    }
    mypi = h * sum;
    printf("pi is approximately %.16f, error is %.16f\n", mypi, fabs(mypi - PI25DT));
    return 0;
}
20
Brief Introduction to MPI
  • Parallel method:
  • a cyclic scheme can be used to distribute the calculations among the processors,
  • the partial sums calculated on the different processors have to be summed

(Figure: the intervals are assigned round-robin to Processor 0, Processor 1, Processor 2)
21
Brief Introduction to MPI
#include "mpi.h"
#include <stdio.h>
#include <math.h>

double f(double a) { return (4.0 / (1.0 + a*a)); }

int main(int argc, char* argv[]) {
    int ProcRank, ProcNum, n, i;
    double PI25DT = 3.141592653589793238462643;
    double mypi, pi, h, sum, x, t1, t2;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &ProcNum);
    MPI_Comm_rank(MPI_COMM_WORLD, &ProcRank);
    if (ProcRank == 0) {
        printf("Enter the number of intervals: ");
        scanf("%d", &n);
        t1 = MPI_Wtime();
    }

22
Brief Introduction to MPI
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
    // calculating the local sums (cyclic distribution of intervals)
    h = 1.0 / (double) n;
    sum = 0.0;
    for (i = ProcRank + 1; i <= n; i += ProcNum) {
        x = h * ((double)i - 0.5);
        sum += f(x);
    }
    mypi = h * sum;
    // reduction
    MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (ProcRank == 0) {   // printing results
        t2 = MPI_Wtime();
        printf("pi is approximately %.16f, error is %.16f\n", pi, fabs(pi - PI25DT));
        printf("wall clock time %f\n", t2 - t1);
    }
    MPI_Finalize();
    return 0;
}
23
  • Needs of High-Performance Computations (HPC)
  • Windows-based Clusters: Microsoft Compute Cluster Server
  • HPC Simplicity vs. Complexity: A Brief Introduction to MPI
  • How to Overcome the Complexity: HPC Curriculum

24
How to Overcome the Complexity: HPC Curriculum
  • HPC required skills and knowledge:
  • Architecture of parallel computer systems,
  • Computation models and methods for analyzing the complexity of calculations,
  • Parallel computation methods,
  • Parallel programming (languages, development environments, libraries)

It is important to have an integrated teaching course on parallel programming.
25
HPC Curriculum
  • The essential part of the curriculum is the integrated course "HPC and Parallel Programming", which provides:
  • studying the models of parallel computations,
  • mastering parallel algorithms, and
  • getting practical experience in parallel programming.
  • The course gives students solid knowledge of many parallel programming areas (models, methods, technologies, programs). Learning combines theoretical classes and laboratory works.

26
HPC Curriculum
  • Components:
  • Course syllabus,
  • Syllabus of laboratory works,
  • E-textbook,
  • Program system for supporting laboratory works,
  • User manual for the program system,
  • Function library,
  • Function library reference guide,
  • PowerPoint presentations for all lectures
  • http://www.software.unn.ac.ru/ccam
  • Development of the HPC curriculum has been supported by Microsoft

27
HPC Curriculum
  • Highlights of the course
  • Comprehensive coverage of the spectrum of parallel programming issues (models, methods, technologies, programs)
  • Organic combination of theoretical classes and
    laboratory training
  • Intensive use of research and educational
    software systems for carrying out computational
    experiments

28
HPC Curriculum
  • Syllabus
  • Architecture of parallel computers and their
    classification,
  • Modeling and analysis of parallel computations,
  • Analysis of communication complexity of parallel
    programs,
  • Technologies for developing parallel programs:
  • Parallel extensions of industrial algorithmic languages (OpenMP),
  • Developer libraries for parallel programming (MPI),
  • Principles of parallel algorithm design,
  • Parallel computation methods

29
HPC Curriculum
  • Modeling and analysis of parallel computations:
  • A computation model in the form of an "operations-operands" graph,
  • Description of the scheme for parallel execution of an algorithm,
  • Predicting the execution time of a parallel algorithm,
  • Efficiency criteria of a parallel algorithm

30
HPC Curriculum: Modeling and analysis of parallel computations
  • Characteristics of parallel algorithm efficiency:
  • Speedup
  • Efficiency
  • Very often these criteria are antagonistic! (see the definitions below)
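
Both characteristics appeared as formulas on the slide; in the standard definitions (assumed here), for a problem solved in time T1 on one processor and in time Tp on p processors:

    Speedup:    Sp = T1 / Tp
    Efficiency: Ep = Sp / p = T1 / (p * Tp)

Raising p typically increases Sp but decreases Ep, which is the antagonism noted above.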

31
HPC Curriculum: Modeling and analysis of parallel computations
  • Example: total sum computation
  • The computation of the total sum of an available set of values, S = x1 + x2 + ... + xn (a particular case of the general reduction problem)
32
HPC Curriculum: Modeling and analysis of parallel computations
  • Example: total sum computation
  • Sequential summation of the elements of a series of values

This standard sequential summation algorithm allows only strictly serial execution and cannot be parallelized: each partial sum depends on the previous one.
33
HPC Curriculum: Modeling and analysis of parallel computations
  • Example: total sum computation
  • Cascade summation scheme: the values are summed pairwise, so each stage halves the number of operands and the total is obtained in ⌈log2 n⌉ parallel steps (see the sketch below)
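
A minimal serial sketch of the cascade scheme (our illustration, not from the slides; n is assumed to be a power of two, and the input array is overwritten):

#include <stdio.h>

/* pairwise (cascade) summation: within each step the additions are
   independent of one another, so they could all run in parallel */
double cascade_sum(double x[], int n) {
    for (int step = 1; step < n; step *= 2)
        for (int i = 0; i < n; i += 2 * step)
            x[i] += x[i + step];
    return x[0];
}

int main(void) {
    double v[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    printf("sum = %f\n", cascade_sum(v, 8));   /* prints 36.000000 */
    return 0;
}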
34
HPC Curriculum: Modeling and analysis of parallel computations
  • Example: total sum computation
  • Modified cascade scheme: the input is first split into n/log2 n groups that are summed sequentially, and the cascade scheme is then applied to the partial sums, which preserves the speedup while improving processor utilization (efficiency)
35
HPC Curriculum: Analysis of communication complexity
  • Characteristics of the topology of the data communication network,
  • General description of data communication techniques,
  • Analysis of the time complexity of data communication operations (see the cost model below),
  • Methods of logical representation of the communication topology
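
As a reference point for such analyses, the usual point-to-point cost model (the Hockney model; using it here is our assumption) estimates the time to transmit a message of m bytes as

    t(m) = t_s + m * t_b

where t_s is the startup latency and t_b is the per-byte transfer time of the channel.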

36
HPC Curriculum: Principles of parallel algorithm design
The general scheme of parallel algorithm design (proposed by I. Foster): partitioning, communication analysis, agglomeration, and mapping.
37
HPC Curriculum: Parallel algorithms
  • Matrix-vector multiplication
  • Matrix multiplication
  • Sorting
  • Graph processing
  • Partial differential equations
  • Optimization

38
HPC Curriculum: Parallel algorithms
  • Example: matrix multiplication by Cannon's method
  • Data distribution: checkerboard block scheme
  • The basic subtask is a procedure that calculates all elements of one block of matrix C

39
HPC Curriculum: Parallel algorithms
  • Example: matrix multiplication by Cannon's method
  • Analysis of information dependencies:
  • The subtask with the number (i,j) calculates the block Cij of the result matrix C; as a result, the subtasks form a q×q two-dimensional grid,
  • The initial distribution of matrix blocks in Cannon's algorithm is selected in such a way that the first block multiplication can be performed without additional data transmission:
  • at the beginning each subtask (i,j) holds the blocks Aij and Bij,
  • for the i-th row of the subtask grid, the matrix A blocks are shifted (i-1) positions to the left,
  • for the j-th column of the subtask grid, the matrix B blocks are shifted (j-1) positions upward,
  • These data transmission operations are an example of circular shift communication

40
HPC Curriculum: Parallel algorithms
  • Example: matrix multiplication by Cannon's method
  • Analysis of information dependencies:
  • After the redistribution performed at the first stage, the matrix blocks can be multiplied without additional data transmission operations,
  • To obtain all of the remaining blocks, after each block multiplication operation:
  • matrix A blocks are shifted one position to the left along the grid row,
  • matrix B blocks are shifted one position upward along the grid column (see the sketch below).
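
The two stages can be sketched compactly in MPI (our illustration, not code from the course materials; it assumes a periodic q×q Cartesian communicator in which each process already holds one m×m row-major block of A and B, and the local kernel multiply_add is a hypothetical helper):

#include <mpi.h>

/* hypothetical local kernel: C += A * B for m-by-m row-major blocks */
static void multiply_add(const double *A, const double *B, double *C, int m) {
    for (int i = 0; i < m; i++)
        for (int k = 0; k < m; k++)
            for (int j = 0; j < m; j++)
                C[i*m + j] += A[i*m + k] * B[k*m + j];
}

void cannon(double *blockA, double *blockB, double *blockC, int m,
            MPI_Comm grid /* periodic q-by-q Cartesian communicator */) {
    int dims[2], periods[2], coords[2], src, dst, q;
    MPI_Status st;
    MPI_Cart_get(grid, 2, dims, periods, coords);
    q = dims[0];

    /* stage 1 (initial skew): row i shifts its A block i positions left,
       column j shifts its B block j positions up (0-based coordinates) */
    MPI_Cart_shift(grid, 1, -coords[0], &src, &dst);
    MPI_Sendrecv_replace(blockA, m*m, MPI_DOUBLE, dst, 0, src, 0, grid, &st);
    MPI_Cart_shift(grid, 0, -coords[1], &src, &dst);
    MPI_Sendrecv_replace(blockB, m*m, MPI_DOUBLE, dst, 0, src, 0, grid, &st);

    /* stage 2: q iterations of "multiply, then shift A left / B up by one" */
    for (int step = 0; step < q; step++) {
        multiply_add(blockA, blockB, blockC, m);
        MPI_Cart_shift(grid, 1, -1, &src, &dst);
        MPI_Sendrecv_replace(blockA, m*m, MPI_DOUBLE, dst, 0, src, 0, grid, &st);
        MPI_Cart_shift(grid, 0, -1, &src, &dst);
        MPI_Sendrecv_replace(blockB, m*m, MPI_DOUBLE, dst, 0, src, 0, grid, &st);
    }
}

The circular shifts fall out of the periodic grid: MPI_Cart_shift with a negative displacement names the left/upper neighbor, and MPI_Sendrecv_replace performs the send and receive in place.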

41
HPC Curriculum: Parallel algorithms
  • Example: matrix multiplication by Cannon's method
  • Scaling and distributing the subtasks among the processors:
  • The sizes of the matrix blocks can be selected so that the number of subtasks coincides with the number of available processors p,
  • The most efficient execution of the parallel Cannon's algorithm is achieved when the communication network topology is a two-dimensional grid,
  • In this case the subtasks can be distributed among the processors in a natural way: the subtask (i,j) has to be placed on the processor p_{i,j}

42
HPC Curriculum: Parallel algorithms
  • Example: matrix multiplication by Cannon's method
  • Efficiency analysis:
  • Generalized speedup and efficiency estimates

The developed method of parallel computations achieves nearly ideal speedup (Sp ≈ p) and efficiency (Ep ≈ 1) characteristics.
43
HPC Curriculum: Parallel algorithms
  • Example: matrix multiplication by Cannon's method
  • Efficiency analysis (detailed estimates):
  • Cannon's algorithm differs from Fox's algorithm only in the types of communication operations, namely:
  • the time of the initial redistribution of the matrix blocks,
  • the time of shifting the matrix blocks after every multiplication operation.
  • The total time of parallel algorithm execution is the sum of the computation time and these communication times.
44
HPC Curriculum: Parallel algorithms
  • Example: matrix multiplication by Cannon's method
  • Results of computational experiments:
  • Comparison of theoretical estimations with the results of computational experiments

45
HPC Curriculum: Parallel algorithms
  • Example: matrix multiplication by Cannon's method
  • Results of computational experiments:
  • Speedup

46
HPC Curriculum: Laboratory classes
  • Methods of developing parallel programs for multiprocessor systems with shared and distributed memory using the OpenMP and MPI technologies,
  • Training in developing parallel algorithms and programs for solving computational problems,
  • Training in using parallel method libraries for solving complex scientific and engineering problems

47
HPC Curriculum: Laboratory classes
  • Computational experiments on parallel systems

48
HPC Curriculum: Laboratory classes
  • Intensive use of research and educational software systems for modeling computations on various multiprocessor systems and visualizing parallel computation processes
  • The Parallel Laboratory (ParaLab) system: a software environment for studying and investigating parallel methods for solving time-consuming problems

49
HPC Curriculum: Laboratory classes with ParaLab
  • Modeling a parallel computing system,
  • Choosing computational problems and methods to solve them,
  • Carrying out computational experiments,
  • Visualizing parallel computations,
  • Gathering information and analyzing the results (the "experiment log"),
  • Data archive

50
HPC Curriculum: Laboratory classes with ParaLab
(Screenshot: experiment results, the area for experiment data visualization, and visualization of the processors' operations)
51
HPC Curriculum: Laboratory classes with ParaLab
  • Modeling a parallel computing system

52
HPC Curriculum: Laboratory classes with ParaLab
  • Choosing computational problems and methods to solve them

53
HPC Curriculum: Laboratory classes with ParaLab
  • Computational experiments and parallel
    computation visualization

Matrix computations
Sorting
54
HPC Curriculum: Laboratory classes with ParaLab
  • Gathering information and analyzing the results (the "experiment log")

55
HPC Curriculum: Laboratory classes with ParaLab
  • Usage experience shows that ParaLab can be useful both for novices who are just starting to learn parallel computing and, at times, even for experts in this promising sphere of strategic computer technology

56
Winter School on Parallel Computing: 2004, 2005, 2006
  • January 25 to February 7, 2004,
  • 39 participants from 11 cities of the CIS,
  • 6 lecture courses given by leading specialists in parallel computing,
  • A scientific seminar

57
Winter School on Parallel Computing: 2004, 2005, 2006
  • School syllabus:
  • Technologies of parallel programming (Gergel V., NNSU; Popova N., MSU),
  • Parallel databases (Sokolinsky L., ChelSU),
  • Parallel computation models, on the basis of the DVM system (Krukov V., IPM RAN),
  • Parallel computational algorithms (Yakobovski M., IMM RAN)

58
Winter School on Parallel Computing: 2004, 2005, 2006
  • School highlights:
  • Intensive schedule of classes (from 9:00 to 18:00 daily, with self-instruction work until 21:00),
  • Predominance of practical classes and laboratory works,
  • Remote access to many Russian high-performance resources (clusters of NNSU, MSU, RCC MSU, ICC RAN, SPbSU, IAP RAN),
  • Training on parallel software development tools (Intel),
  • A research and educational seminar for students and scientists
  • The Winter School has been supported by Intel

59
Conclusions
  • High-performance computing: a challenge for CS and IT
  • Microsoft's vision: clusters under Compute Cluster Server
  • The UNN HPC Curriculum provides the easiest entry into the HPC world

60
Contacts
University of Nizhni Novgorod
23 Gagarin Avenue, 603950, Nizhni Novgorod
Tel: +7 (8312) 65-48-59
E-mail: gergel@unn.ac.ru
61
Questions, remarks, something to add?