Title: Automatic Optimization in Parallel Dynamic Programming Schemes
1. Automatic Optimization in Parallel Dynamic Programming Schemes
- Domingo Giménez
- Departamento de Informática y Sistemas
- Universidad de Murcia, Spain
- domingo_at_dif.um.es
- dis.um.es/domingo
- Juan-Pedro Martínez
- Departamento de Estadística y Matemática Aplicada
- Universidad Miguel Hernández de Elche, Spain
- jp.martinez_at_uhm.es
2. Our Goal
- General goal: to obtain parallel routines with autotuning capacity
- Previous work: linear algebra routines
- This communication: parallel dynamic programming schemes
- In the future: apply the techniques to hybrid, heterogeneous and distributed systems
3. Outline
- Modelling Parallel Routines for Autotuning
- Parallel Dynamic Programming Schemes
- Autotuning in Parallel Dynamic Programming Schemes
- Experimental Results
4. Modelling Parallel Routines for Autotuning
- It is necessary to predict accurately the execution time and to select:
- The number of processes
- The number of processors
- Which processors
- The number of rows and columns of processes (the topology)
- The assignment of processes to processors
- The computational block size (in linear algebra algorithms)
- The communication block size
- The algorithm (polyalgorithms)
- The routine or library (polylibraries)
5. Modelling Parallel Routines for Autotuning
- Cost of a parallel program (a sketch of how these terms combine follows this list):
- arithmetic time
- communication time
- overhead, for synchronization, imbalance, process creation, ...
- overlapping of communication and computation
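As a minimal sketch of how these components might combine (the slide's own formula was not preserved in this text version, so the exact form is an assumption):

    % Hedged sketch: additive cost model, with overlapped communication discounted
    T(n,p) = T_{\mathrm{arith}}(n,p) + T_{\mathrm{comm}}(n,p)
           + T_{\mathrm{overhead}}(n,p) - T_{\mathrm{overlap}}(n,p)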
6. Modelling Parallel Routines for Autotuning
- Estimation of the time:
- Computation and communication are considered as divided into a number of steps
- For each part of the formula, the value of the process which gives the highest value is taken (see the sketch below)
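A hedged reconstruction of this estimate, assuming s computation/communication steps and taking, for each term, the maximum over the processes j:

    % Assumption: per-step arithmetic and communication times of process j in step k
    T = \sum_{k=1}^{s} \left( \max_{j} t_{\mathrm{arith}}^{(k,j)}
                            + \max_{j} t_{\mathrm{comm}}^{(k,j)} \right)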
7. Modelling Parallel Routines for Autotuning
- The time depends on the problem size (n) and the system size (p)
- But also on some ALGORITHMIC PARAMETERS, like the block size (b) and the number of processors (q) used from the total available
8. Modelling Parallel Routines for Autotuning
- And on some SYSTEM PARAMETERS which reflect the computation and communication characteristics of the system
- Typically the cost of an arithmetic operation (tc), and the start-up (ts) and word-sending (tw) times (see below)
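These parameters enter the model in the usual way (the standard convention, not a formula taken from the slide): an operation count is multiplied by tc, and a message of m words costs

    t_{\mathrm{comm}}(m) = t_s + m\,t_w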
9. Modelling Parallel Routines for Autotuning
- The values of the System Parameters could be obtained:
- With installation routines associated to the routine we are installing
- From information stored when the library was installed in the system
- At execution time, by testing the system conditions prior to the call to the routine
10. Modelling Parallel Routines for Autotuning
- These values can be obtained as simple values (the traditional method) or as functions of the Algorithmic Parameters.
- In the latter case, a multidimensional table of values, as a function of the problem size and the Algorithmic Parameters, is stored.
- When a problem of a particular size is solved, the execution time is estimated with the values of the stored size closest to the real size,
- And the problem is solved with the values of the Algorithmic Parameters which predict the lowest execution time (see the sketch below).
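A minimal C sketch of this selection step. The table sp_table, the stored sizes and the function predicted_time are hypothetical names for illustration; they are not taken from the paper:

    #include <stdlib.h>
    #include <math.h>

    #define NUM_SIZES 2  /* stored problem sizes (illustrative values) */
    #define MAX_P 8      /* maximum number of processors available */

    typedef struct { double tc, ts, tw; } sp_values;

    /* Hypothetical installation data: SPs measured for each stored
       problem size and each number of processors. */
    static int stored_sizes[NUM_SIZES] = { 10000, 100000 };
    static sp_values sp_table[NUM_SIZES][MAX_P];

    /* Hypothetical model: n stages of C/p computations plus one
       communication step per stage (an assumption, not the paper's formula). */
    static double predicted_time(int C, int n, int p, sp_values sp) {
        return (double)n * C / p * sp.tc + n * (sp.ts + C * sp.tw);
    }

    /* Select the number of processors minimizing the predicted time,
       using the SP values stored for the size closest to the real C. */
    int select_p(int C, int n, int max_p) {
        int s = 0;
        for (int i = 1; i < NUM_SIZES; i++)
            if (abs(stored_sizes[i] - C) < abs(stored_sizes[s] - C)) s = i;
        int best_p = 1; double best_t = INFINITY;
        for (int p = 1; p <= max_p; p++) {
            double t = predicted_time(C, n, p, sp_table[s][p - 1]);
            if (t < best_t) { best_t = t; best_p = p; }
        }
        return best_p;
    }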
11. Parallel Dynamic Programming Schemes
- There are different Parallel Dynamic Programming Schemes.
- The simple scheme of the coins problem is used:
- Given a quantity C and n coin types of values v = (v1, v2, ..., vn), with a quantity q = (q1, q2, ..., qn) of each type, minimize the number of coins used to give C.
- But the granularity of the computation has been varied, to study the scheme and not the problem.
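For reference, one standard recurrence for this bounded coin-changing problem (an assumption; the slides do not show the paper's exact table-filling formula). With S(i, j) the minimum number of coins needed to give the quantity j using the first i coin types:

    S(i,j) = \min_{0 \le k \le q_i,\; k v_i \le j} \bigl( S(i-1,\, j - k v_i) + k \bigr),
    \qquad S(0,0) = 0, \quad S(0,j) = \infty \;\text{for}\; j > 0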
12. Parallel Dynamic Programming Schemes
- Sequential scheme:
for i = 1 to number_of_decisions
    for j = 1 to problem_size
        obtain the optimum solution with i decisions and problem size j
    endfor
    Complete the table with the formula
endfor
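A runnable C sketch of this sequential scheme, instantiated for the coins problem above (all names are illustrative):

    #include <stdio.h>
    #include <limits.h>

    #define INF (INT_MAX / 2)  /* "no solution" marker that survives +k */

    /* coins[i][j]: minimum number of coins giving quantity j with the
       first i coin types (one row per decision, one column per size).
       Stack VLA: fine for small C; real sizes would use the heap. */
    int min_coins(int C, int n, const int v[], const int q[]) {
        int coins[n + 1][C + 1];
        for (int j = 0; j <= C; j++) coins[0][j] = (j == 0) ? 0 : INF;
        for (int i = 1; i <= n; i++)           /* decisions */
            for (int j = 0; j <= C; j++) {     /* problem sizes */
                coins[i][j] = coins[i - 1][j]; /* zero coins of type i */
                for (int k = 1; k <= q[i - 1] && k * v[i - 1] <= j; k++) {
                    int t = coins[i - 1][j - k * v[i - 1]] + k;
                    if (t < coins[i][j]) coins[i][j] = t;
                }
            }
        return coins[n][C];
    }

    int main(void) {
        int v[] = { 1, 2, 5 }, q[] = { 10, 10, 10 };
        printf("%d\n", min_coins(13, 3, v, q)); /* prints 4: 5+5+2+1 */
        return 0;
    }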
13. Parallel Dynamic Programming Schemes
- Parallel scheme:
for i = 1 to number_of_decisions
    In Parallel
        for j = 1 to problem_size
            obtain the optimum solution with i decisions and problem size j
        endfor
    endInParallel
endfor
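One possible shared-memory realization of the "In Parallel" block, sketched with OpenMP (the paper's experiments use message passing; this only illustrates that the j-loop of a stage is independent once row i-1 is complete):

    /* Stage i of the coins table: prev is row i-1, cur is row i.
       Compile with an OpenMP-capable C compiler (e.g. cc -fopenmp). */
    void stage(int C, int vi, int qi, const int *prev, int *cur) {
        #pragma omp parallel for
        for (int j = 0; j <= C; j++) {
            int best = prev[j];
            for (int k = 1; k <= qi && k * vi <= j; k++)
                if (prev[j - k * vi] + k < best) best = prev[j - k * vi] + k;
            cur[j] = best;
        }
    }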
14. Parallel Dynamic Programming Schemes
- Message-passing scheme:
In each processor Pj
    for i = 1 to number_of_decisions
        communication step
        obtain the optimum solution with i decisions and the problem sizes Pj has assigned
    endfor
endInEachProcessor
[Figure: the N subproblem sizes are distributed in blocks among the processors P0, P1, P2, ..., PK-1, PK]
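A hedged MPI sketch of one stage of this scheme, assuming the block distribution of the figure and an all-gather as the communication step (the slides do not specify the actual MPI calls; names are illustrative):

    #include <mpi.h>

    /* Stage i on process P_rank: update the block of columns assigned
       to this process, reading the complete previous row. */
    void dp_stage(int block, int vi, int qi,
                  const int *prev_row, int *my_block) {
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        int lo = rank * block;           /* first column owned by P_rank */
        for (int j = lo; j < lo + block; j++) {
            int best = prev_row[j];
            for (int k = 1; k <= qi && k * vi <= j; k++)
                if (prev_row[j - k * vi] + k < best)
                    best = prev_row[j - k * vi] + k;
            my_block[j - lo] = best;
        }
    }

    /* Communication step after each stage: assemble the full row on
       all processes before the next decision, e.g.
       MPI_Allgather(my_block, block, MPI_INT,
                     next_row, block, MPI_INT, MPI_COMM_WORLD); */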
15. Autotuning in Parallel Dynamic Programming Schemes
- Theoretical model:
- Sequential cost
- Computational parallel cost (for large qi)
- Communication cost (of one step)
- The only AP is p
- The SPs are tc, ts and tw
(a reconstruction of these costs is sketched below)
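The formulas of this slide were lost in the text conversion; a plausible reconstruction, under the assumptions that there are n decision stages of C subproblems each, that the computation is evenly distributed among the p processes, and that each stage ends with one communication step of roughly C words:

    % Assumption, not the slide's exact formulas:
    T_{\mathrm{seq}}(C,n) \approx n\,C\,t_c, \qquad
    T_{\mathrm{par}}(C,n,p) \approx \frac{n\,C}{p}\,t_c + n\,(t_s + C\,t_w)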
16. Autotuning in Parallel Dynamic Programming Schemes
- How to estimate the arithmetic SPs:
- Solving a small problem
- How to estimate the communication SPs:
- Using a ping-pong (CP1) (see the sketch below)
- Solving a small problem, varying the number of processors (CP2)
- Solving problems of selected sizes in systems of selected sizes (CP3)
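A minimal MPI ping-pong sketch for CP1, estimating ts and tw from the timings of two message sizes (illustrative; the paper's installation routines are not shown):

    #include <mpi.h>
    #include <stdio.h>

    #define M1 1000
    #define M2 100000

    /* One-way time for a message of m doubles between ranks 0 and 1. */
    static double one_way(int m, double *buf, int rank) {
        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        if (rank == 0) {
            MPI_Send(buf, m, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, m, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, m, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, m, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
        }
        return (MPI_Wtime() - t0) / 2;   /* half a round trip */
    }

    int main(int argc, char **argv) {    /* run with: mpirun -np 2 ... */
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        static double buf[M2];
        double t1 = one_way(M1, buf, rank);
        double t2 = one_way(M2, buf, rank);
        double tw = (t2 - t1) / (M2 - M1); /* word-sending time */
        double ts = t1 - M1 * tw;          /* start-up time */
        if (rank == 0) printf("ts = %g s, tw = %g s/word\n", ts, tw);
        MPI_Finalize();
        return 0;
    }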
17. Experimental Results
- Systems:
- SUNEt: five SUN Ultra 1 and one SUN Ultra 5 (2.5 times faster), Ethernet
- PenFE: seven Pentium III, FastEthernet
- Varying:
- The problem size: C = 10000, 50000, 100000, 500000
- A large value of qi
- The granularity of the computation (the cost of a computational step)
18. Experimental Results
- CP1:
- Ping-pong (point-to-point communication)
- Does not reflect the characteristics of the system
- CP2:
- Executions with the smallest problem (C = 10000), varying the number of processors
- Reflects the characteristics of the system, but the time also changes with C
- Larger installation time (6 and 9 seconds)
- CP3:
- Executions with selected problem sizes (C = 10000, 100000) and system sizes (p = 2, 4, 6), and linear interpolation for other sizes
- Larger installation time (76 and 35 seconds)
19. Experimental Results
- Parameter selection
[Figures: parameters selected by each method on SUNEt and PenFE]
20. Experimental Results
- Quotient between the execution time with the parameter selected by each of the selection methods and the lowest execution time, in SUNEt
21. Experimental Results
- Quotient between the execution time with the parameter selected by each of the selection methods and the lowest execution time, in PenFE
22. Experimental Results
- Three types of users are considered:
- GU (greedy user): uses all the available processors
- CU (conservative user): uses half of the available processors
- EU (expert user): uses a different number of processors depending on the granularity:
- 1 for low granularity
- half of the available processors for middle granularity
- all the processors for high granularity
23. Experimental Results
- Quotient between the execution time with the parameter selected by each type of user and the lowest execution time, in SUNEt
24. Experimental Results
- Quotient between the execution time with the parameter selected by each type of user and the lowest execution time, in PenFE
25. Conclusions and Future Work
- The inclusion of autotuning capabilities in a Parallel Dynamic Programming Scheme has been considered.
- Different ways of modelling the scheme, and of selecting the parameters, have been studied.
- Experimentally, the selection proves to be satisfactory, and useful in providing users with routines capable of reduced execution times.
- In the future we plan to apply this technique:
- to other algorithmic schemes,
- in hybrid, heterogeneous and distributed systems.