Automatic Optimisation of Parallel Linear Algebra Routines in Systems with Variable Load


Title: Automatic Optimisation of Parallel Linear Algebra Routines in Systems with Variable Load


1
Automatic Optimisation of Parallel Linear Algebra
Routines in Systems with Variable Load
  • Javier Cuenca
  • Domingo Giménez
  • José González

Jack Dongarra Kenneth Roche
2
Optimisation of Linear Algebra Routines
  • Traditional method: hand-optimisation for each
    platform
  • Time-consuming
  • Incompatible with hardware evolution
  • Incompatible with changes in the system
    (architecture and basic libraries)
  • Unsuitable for systems with variable load
  • Misuse by non-expert users

3
Solutions to this situation?
Some groups and projects: ATLAS, GrADS, LAWRA,
FLAME, I-LIB. But the problem is very complex.
4
Our Approach
DESIGN
  • Modelling the Linear Algebra Routine (LAR):
    Texec = f(SP, AP, n)
    SP: System Parameters
    AP: Algorithmic Parameters
    n: Problem size
INSTALLATION
  • Estimation of SP
RUN-TIME
  • Selection of AP values
  • Execution of LAR
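The three stages above can be sketched as a search for the AP values that minimise a modelled execution time. This is a minimal illustration only: `toy_model`, the SP names `k_arith` and `k_comm`, and the candidate lists are hypothetical and do not come from the slides.

```python
from itertools import product


def select_ap(model, sp, n, candidates):
    """Return the AP tuple that minimises model(sp, ap, n)."""
    return min(candidates, key=lambda ap: model(sp, ap, n))


def toy_model(sp, ap, n):
    # Hypothetical cost: arithmetic shared among p processes plus a
    # communication term that grows with p and shrinks with block size b.
    # This is NOT the model used in the presentation.
    p, b = ap
    return sp["k_arith"] * n**3 / p + sp["k_comm"] * n**2 * p / b


sp = {"k_arith": 1.0, "k_comm": 50.0}  # hypothetical system parameters
candidates = list(product([1, 2, 4, 8], [16, 32, 64]))  # (p, b) pairs

best_large = select_ap(toy_model, sp, 1024, candidates)
best_small = select_ap(toy_model, sp, 16, candidates)
```

With these made-up numbers the large problem prefers all eight processes, while the small one stops at four because the communication term starts to dominate.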
5
Our Approach
DESIGN
  • LAR
  • Modelling the LAR -> MODEL
  • Implementation of SP-Estimators -> SP-Estimators
INSTALLATION
  • Basic Libraries, Installation-File
  • Estimation of Static-SP -> Static-SP-File
RUN-TIME
  • Call to NWS -> NWS Information
  • Dynamic Adjustment of SP -> Current-SP
  • Selection of Optimum AP -> Optimum-AP
  • Execution of LAR
6
Our Approach
Static Model of LAR: situation of the platform at
installation time
LARs:
  • Jacobi methods for the symmetric eigenvalue
    problem
  • Gauss elimination
  • LU factorisation
  • QR factorisation
Platforms:
  • Cluster of Workstations
  • Cluster of PCs
  • SGI Origin 2000
  • IBM SP2
7
Our Approach
Static Model of LAR: situation of the platform at
installation time
Dynamic Model of LAR: situation of the platform at
run-time
LARs:
  • Jacobi methods for the symmetric eigenvalue
    problem
  • Gauss elimination
  • LU factorisation
  • QR factorisation
Platforms:
  • Cluster of Workstations
  • Cluster of PCs
  • SGI Origin 2000
  • IBM SP2
8
DESIGN PROCESS
[Diagram. DESIGN: LAR.]
LAR: Linear Algebra Routine, made by the LAR
Designer
Example of LAR: Parallel Block LU factorisation
9
Modelling the LAR
[Diagram. DESIGN: LAR; Modelling the LAR -> MODEL.]
10
Modelling the LAR
[Diagram. DESIGN: LAR; Modelling the LAR -> MODEL.]
Made by the LAR-Designer, only once per LAR
MODEL: Texec = f(SP, AP, n)
  • SP: System Parameters
  • AP: Algorithmic Parameters
  • n: Problem size
11
Modelling the LAR
[Diagram. DESIGN: LAR; Modelling the LAR -> MODEL.]
MODEL for the LAR Parallel Block LU factorisation:
  • SP: k3, k2, ts, tw
  • AP: p, b
  • n: Problem size
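The slide names the parameters of the LU model but the transcript does not preserve the formula. The sketch below only shows how such a model could be organised in code. The reading of k3 and k2 as per-operation costs of block and lower-level arithmetic, of ts as message start-up time, and of tw as word-sending time is an assumption, and the cost expression itself is a simplified stand-in, not the authors' model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemParams:
    k3: float  # assumed: cost per operation in block (matrix-matrix) kernels
    k2: float  # assumed: cost per operation in lower-level kernels
    ts: float  # assumed: message start-up time
    tw: float  # word-sending time


@dataclass(frozen=True)
class AlgParams:
    p: int  # number of processes
    b: int  # block size


def t_exec(sp: SystemParams, ap: AlgParams, n: int) -> float:
    """Illustrative T_exec = f(SP, AP, n); the terms are hypothetical."""
    arith3 = (2 * n**3 / (3 * ap.p)) * sp.k3      # bulk of the factorisation
    arith2 = (n**2 * ap.b / ap.p) * sp.k2         # panel work, grows with b
    steps = n / ap.b                              # number of block steps
    comm = steps * (sp.ts + n * ap.b * sp.tw)     # one panel message per step
    return arith3 + arith2 + comm


unit_sp = SystemParams(k3=1.0, k2=1.0, ts=1.0, tw=1.0)
example = t_exec(unit_sp, AlgParams(p=2, b=2), 4)
```

Keeping SP and AP in separate immutable records mirrors the split on the slide: SP is measured for a platform, AP is chosen per run.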
12
Implementation of SP-Estimators
[Diagram. DESIGN: LAR; Modelling the LAR -> MODEL; Implementation of SP-Estimators -> SP-Estimators.]
13
Implementation of SP-Estimators
[Diagram. DESIGN: LAR; Modelling the LAR -> MODEL; Implementation of SP-Estimators -> SP-Estimators.]
Estimators of Arithmetic-SP: Computation Kernel of
the LAR
  • Similar storage scheme
  • Similar quantity of data
Estimators of Communication-SP: Communication
Kernel of the LAR
  • Similar kind of communication
  • Similar quantity of data
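An arithmetic-SP estimator of the kind described above can be sketched by timing a small computation kernel and dividing by its operation count. A real estimator would time the LAR's own kernel from the installed basic libraries; the pure-Python triple loop here is a stand-in chosen so the example is self-contained, and the function names are hypothetical.

```python
import time


def matmul_kernel(a, b_mat, size):
    """Plain triple-loop product of two size x size matrices (lists of lists)."""
    c = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for k in range(size):
            aik = a[i][k]
            for j in range(size):
                c[i][j] += aik * b_mat[k][j]
    return c


def estimate_k3(block_size, repeats=3):
    """Estimate seconds per floating-point operation for one block product."""
    a = [[1.0] * block_size for _ in range(block_size)]
    b_mat = [[2.0] * block_size for _ in range(block_size)]
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        matmul_kernel(a, b_mat, block_size)
        best = min(best, time.perf_counter() - start)
    flops = 2 * block_size**3  # one multiply and one add per inner iteration
    return best / flops


k3_estimate = estimate_k3(16)
```

Taking the minimum over a few repeats reduces the influence of transient load on the measurement, which matters because the static values are meant to describe the unloaded platform.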
14
INSTALLATION PROCESS
[Diagram. DESIGN: LAR; Modelling the LAR -> MODEL; Implementation of SP-Estimators -> SP-Estimators. INSTALLATION.]
Installation Process: only once per platform, done
by the System Manager
15
Estimation of Static-SP
[Diagram. DESIGN: LAR; Modelling the LAR -> MODEL; Implementation of SP-Estimators -> SP-Estimators. INSTALLATION: Basic Libraries, Installation-File -> Estimation of Static-SP -> Static-SP-File.]
16
Estimation of Static-SP
[Diagram. DESIGN: LAR; Modelling the LAR -> MODEL; Implementation of SP-Estimators -> SP-Estimators. INSTALLATION: Basic Libraries, Installation-File -> Estimation of Static-SP -> Static-SP-File.]
Basic Libraries:
  • Basic Communication Library: MPI, PVM
  • Basic Linear Algebra Library: reference-BLAS,
    machine-specific-BLAS, ATLAS
Installation File: SP values are obtained using
the information (n and AP values) of this file.
17
Estimation of Static-SP
[Diagram. DESIGN: LAR; Modelling the LAR -> MODEL; Implementation of SP-Estimators -> SP-Estimators. INSTALLATION: Basic Libraries, Installation-File -> Estimation of Static-SP -> Static-SP-File.]
Platform: Cluster of Pentium III, Fast Ethernet
Basic Libraries: ATLAS and MPI
Estimation of the Static-SP k3-static (in µsec):
  Block size   16      32      64      128
  k3-static    0.0038  0.0033  0.0030  0.0027
Estimation of the Static-SP tw-static (in µsec):
  Message size (Kbytes)  32     256    1024   2048
  tw-static              0.700  0.690  0.680  0.675
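The measured values above could be stored and queried as small lookup tables. The sketch below uses the figures from the slide; the linear interpolation between measured points and the clamping outside the measured range are assumptions made for illustration, since the slides do not say how intermediate sizes are handled.

```python
import bisect

# Values from the slide, in microseconds.
K3_STATIC = {16: 0.0038, 32: 0.0033, 64: 0.0030, 128: 0.0027}   # by block size
TW_STATIC = {32: 0.700, 256: 0.690, 1024: 0.680, 2048: 0.675}   # by message size (Kbytes)


def lookup(table, x):
    """Interpolate linearly between measured points; clamp outside the range.

    Interpolation is an assumption of this sketch, not stated in the source.
    """
    keys = sorted(table)
    if x <= keys[0]:
        return table[keys[0]]
    if x >= keys[-1]:
        return table[keys[-1]]
    i = bisect.bisect_right(keys, x)
    lo, hi = keys[i - 1], keys[i]
    weight = (x - lo) / (hi - lo)
    return table[lo] + weight * (table[hi] - table[lo])


k3_for_b48 = lookup(K3_STATIC, 48)
tw_for_512k = lookup(TW_STATIC, 512)
```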
18
RUN-TIME PROCESS
[Diagram. DESIGN: LAR; Modelling the LAR -> MODEL; Implementation of SP-Estimators -> SP-Estimators. INSTALLATION: Basic Libraries, Installation-File -> Estimation of Static-SP -> Static-SP-File. RUN-TIME.]
19
RUN-TIME PROCESS: Static approach
[Diagram. DESIGN: LAR; Modelling the LAR -> MODEL; Implementation of SP-Estimators -> SP-Estimators. INSTALLATION: Basic Libraries, Installation-File -> Estimation of Static-SP -> Static-SP-File. RUN-TIME: Selection of Optimum AP -> Optimum-AP.]
20
LU on IBM SP2
Quotient between the execution time obtained with
the parameters provided by the model and the
optimum execution time, in the sequential case
and in parallel with 4 and 8 processors.
21
RUN-TIME PROCESS: Static approach
[Diagram. DESIGN: LAR; Modelling the LAR -> MODEL; Implementation of SP-Estimators -> SP-Estimators. INSTALLATION: Basic Libraries, Installation-File -> Estimation of Static-SP -> Static-SP-File. RUN-TIME: Selection of Optimum AP -> Optimum-AP; Execution of LAR.]
22
RUN-TIME PROCESS: Static approach
[Diagram. DESIGN: LAR; Modelling the LAR -> MODEL; Implementation of SP-Estimators -> SP-Estimators. INSTALLATION: Basic Libraries, Installation-File -> Estimation of Static-SP -> Static-SP-File. RUN-TIME: Selection of Optimum AP -> Optimum-AP; Execution of LAR.]
p = 4
  n      Static opt   Static MODEL   dev MODEL
  512    0.25         0.25           0
  1024   1.36         1.36           0
  1536   3.22         3.22           0
  2048   6.76         6.76           0
  2560   11.81        11.81          0
  3072   19.28        19.41          1
23
RUN-TIME PROCESS: Static approach
[Diagram. DESIGN: LAR; Modelling the LAR -> MODEL; Implementation of SP-Estimators -> SP-Estimators. INSTALLATION: Basic Libraries, Installation-File -> Estimation of Static-SP -> Static-SP-File. RUN-TIME: Selection of Optimum AP -> Optimum-AP; Execution of LAR.]
p = 8
  n      Static opt   Static MODEL   dev MODEL
  1024   0.93         0.99           6
  2048   4.98         4.98           0
  3072   13.81        13.81          0
  4096   27.65        29.31          6
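The slides do not define the "dev" column. One reading that reproduces the listed values is the relative excess of the model-selected time over the optimum, expressed as a percentage and rounded to the nearest integer. The sketch below checks that reading against the p = 8 figures; the function name is hypothetical.

```python
def deviation_percent(t_model, t_opt):
    """Relative excess of the model-selected time over the optimum, in percent."""
    return round(100.0 * (t_model - t_opt) / t_opt)


# (n, optimum time, time with model-selected parameters) for p = 8, from the slide.
rows_p8 = [
    (1024, 0.93, 0.99),
    (2048, 4.98, 4.98),
    (3072, 13.81, 13.81),
    (4096, 27.65, 29.31),
]

devs_p8 = [deviation_percent(t_model, t_opt) for _, t_opt, t_model in rows_p8]
```

The same formula gives 1 for the last p = 4 row (19.28 against 19.41), which matches the value reported there.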
24
RUN-TIME PROCESS: Dynamic approach
[Diagram. DESIGN: LAR; Modelling the LAR -> MODEL; Implementation of SP-Estimators -> SP-Estimators. INSTALLATION: Basic Libraries, Installation-File -> Estimation of Static-SP -> Static-SP-File. RUN-TIME.]
25
Call to NWS
[Diagram. DESIGN: LAR; Modelling the LAR -> MODEL; Implementation of SP-Estimators -> SP-Estimators. INSTALLATION: Basic Libraries, Installation-File -> Estimation of Static-SP -> Static-SP-File. RUN-TIME: Call to NWS -> NWS Information.]
26
Call to NWS
[Diagram. RUN-TIME: Call to NWS -> NWS Information.]
The NWS is called and it reports:
  • the fraction of available CPU (fCPU)
  • the current word sending time (tw-current) for
    specific n and AP values (n0, AP0)
Then the fraction of available network is
calculated.
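The formula for the fraction of available network did not survive in the transcript. A plausible reading, stated here as an assumption, is the ratio between the word-sending time measured at installation for (n0, AP0) and the current one. The sketch takes both values as plain inputs and does not call NWS; the function name is hypothetical.

```python
def fraction_available_network(tw_static, tw_current):
    """Assumed definition: static word-sending time over the current one.

    A loaded network has a larger tw-current, so the fraction drops below 1.
    The result is capped at 1.0 in case the current measurement is faster.
    """
    if tw_static <= 0 or tw_current <= 0:
        raise ValueError("word-sending times must be positive")
    return min(1.0, tw_static / tw_current)


f_net_unloaded = fraction_available_network(0.7, 0.7)
f_net_loaded = fraction_available_network(0.7, 1.4)
```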
27
Call to NWS
[Diagram. DESIGN: LAR; Modelling the LAR -> MODEL; Implementation of SP-Estimators -> SP-Estimators. INSTALLATION: Basic Libraries, Installation-File -> Estimation of Static-SP -> Static-SP-File. RUN-TIME: Call to NWS -> NWS Information.]
28
Dynamic Adjustment of SP
[Diagram. DESIGN: LAR; Modelling the LAR -> MODEL; Implementation of SP-Estimators -> SP-Estimators. INSTALLATION: Basic Libraries, Installation-File -> Estimation of Static-SP -> Static-SP-File. RUN-TIME: Call to NWS -> NWS Information; Dynamic Adjustment of SP -> Current-SP.]
29
Dynamic Adjustment of SP
[Diagram. RUN-TIME: Static-SP-File and NWS Information -> Dynamic Adjustment of SP -> Current-SP.]
The values of the SP are adjusted according to
the current situation.
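The adjustment rule itself is not preserved in the transcript. A minimal sketch, assuming arithmetic parameters are divided by the fraction of available CPU and communication parameters by the fraction of available network, could look like this. The grouping of k3 and k2 as arithmetic and ts and tw as communication parameters, and the scaling rule, are assumptions of this example.

```python
ARITHMETIC_SP = {"k3", "k2"}      # assumed to scale with CPU availability
COMMUNICATION_SP = {"ts", "tw"}   # assumed to scale with network availability


def adjust_sp(static_sp, f_cpu, f_net):
    """Return current SP values derived from static ones (assumed rule)."""
    if not (0 < f_cpu <= 1 and 0 < f_net <= 1):
        raise ValueError("fractions must lie in (0, 1]")
    current = {}
    for name, value in static_sp.items():
        if name in ARITHMETIC_SP:
            current[name] = value / f_cpu
        elif name in COMMUNICATION_SP:
            current[name] = value / f_net
        else:
            current[name] = value  # unknown parameters are left unchanged
    return current


# k3 and tw are values from the static tables; f_cpu and f_net are made up.
current_sp = adjust_sp({"k3": 0.0030, "tw": 0.7}, f_cpu=0.6, f_net=0.5)
```

Under this rule a node with 60% of its CPU available behaves as if each operation cost 1/0.6 times its static cost.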
30
Dynamic Adjustment of SP
[Diagram. DESIGN: LAR; Modelling the LAR -> MODEL; Implementation of SP-Estimators -> SP-Estimators. INSTALLATION: Basic Libraries, Installation-File -> Estimation of Static-SP -> Static-SP-File. RUN-TIME: Call to NWS -> NWS Information; Dynamic Adjustment of SP -> Current-SP.]
31
Selection of Optimum AP
[Diagram. DESIGN: LAR; Modelling the LAR -> MODEL; Implementation of SP-Estimators -> SP-Estimators. INSTALLATION: Basic Libraries, Installation-File -> Estimation of Static-SP -> Static-SP-File. RUN-TIME: Call to NWS -> NWS Information; Dynamic Adjustment of SP -> Current-SP; Selection of Optimum AP -> Optimum-AP.]
32
Selection of Optimum AP
[Diagram. RUN-TIME: Static-SP-File and NWS Information -> Dynamic Adjustment of SP -> Current-SP; Selection of Optimum AP -> Optimum-AP.]
33
Execution of LAR
[Diagram. DESIGN: LAR; Modelling the LAR -> MODEL; Implementation of SP-Estimators -> SP-Estimators. INSTALLATION: Basic Libraries, Installation-File -> Estimation of Static-SP -> Static-SP-File. RUN-TIME: Call to NWS -> NWS Information; Dynamic Adjustment of SP -> Current-SP; Selection of Optimum AP -> Optimum-AP; Execution of LAR.]
34
Execution of LAR
[Diagram. DESIGN: LAR; Modelling the LAR -> MODEL; Implementation of SP-Estimators -> SP-Estimators. INSTALLATION: Basic Libraries, Installation-File -> Estimation of Static-SP -> Static-SP-File. RUN-TIME: Call to NWS -> NWS Information; Dynamic Adjustment of SP -> Current-SP; Selection of Optimum AP -> Optimum-AP; Execution of LAR.]
35
Platform load: different situations studied
              node1  node2  node3  node4  node5  node6  node7  node8
Situation A
  CPU avail.  100    100    100    100    100    100    100    100
  tw-current  0.7 µsec
Situation B
  CPU avail.  80     80     80     80     100    100    100    100
  tw-current  0.8 µsec                    0.7 µsec
Situation C
  CPU avail.  60     60     60     60     100    100    100    100
  tw-current  1.8 µsec                    0.7 µsec
Situation D
  CPU avail.  60     60     60     60     100    100    80     80
  tw-current  1.8 µsec                    0.7 µsec      0.8 µsec
Situation E
  CPU avail.  60     60     60     60     100    100    50     50
  tw-current  1.8 µsec                    0.7 µsec      4.0 µsec
36
Platform load: different situations studied
37
Optimum AP for the different situations studied
Block size
         Situations of the Platform Load
  n      A     B     C     D     E
  1024   32    32    64    64    64
  2048   64    64    64    128   128
  3072   64    64    128   128   128
Number of nodes to use, p = r × c
         Situations of the Platform Load
  n      A     B     C     D     E
  1024   4×2   4×2   2×2   2×2   2×1
  2048   4×2   4×2   2×2   2×2   2×1
  3072   4×2   4×2   2×2   2×2   2×1
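The tables show fewer nodes being selected as the load becomes more uneven. One way to see why, offered here only as an interpretation, is to assume that a synchronous routine advances at the pace of its slowest participating node. The sketch below computes that limiting CPU fraction for Situation E when the p most available nodes are used; the helper name and the slowest-node assumption are not from the slides.

```python
def slowest_fraction(cpu_avail, p):
    """Lowest CPU fraction among the p most available nodes (values in percent)."""
    if not 1 <= p <= len(cpu_avail):
        raise ValueError("p must be between 1 and the number of nodes")
    chosen = sorted(cpu_avail, reverse=True)[:p]
    return min(chosen) / 100.0


# CPU availability per node in Situation E, from the load table.
SITUATION_E = [60, 60, 60, 60, 100, 100, 50, 50]

limiting = {p: slowest_fraction(SITUATION_E, p) for p in (2, 4, 8)}
```

With two nodes the routine can run on fully available processors, while using all eight ties it to nodes that are only half available, which also have the slowest network in that situation.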
38
Experimental Time: deviations from the Optimum
39
Experimental Time: deviations from the Optimum
40
Experimental Time: deviations from the Optimum
41
Conclusions and Future Work
  • The use of the proposed methodology is viable in
    systems where the load is stable or variable.
  • Software like NWS is suitable for adjusting the
    system parameter values obtained at
    installation time.
  • The heterogeneous load case offers many more
    possibilities than the one studied.