Automatic Optimization in Parallel Dynamic Programming Schemes
1
Automatic Optimization in Parallel Dynamic Programming Schemes
  • Domingo Giménez
  • Departamento de Informática y Sistemas
  • Universidad de Murcia, Spain
  • domingo@dif.um.es
  • dis.um.es/domingo

  • Juan-Pedro Martínez
  • Departamento de Estadística y Matemática Aplicada
  • Universidad Miguel Hernández de Elche, Spain
  • jp.martinez@uhm.es
2
Our Goal
  • General Goal: to obtain parallel routines with autotuning capacity
  • Previous works: Linear Algebra Routines
  • This communication: Parallel Dynamic Programming Schemes
  • In the future: apply the techniques to hybrid, heterogeneous and distributed systems

3
Outline
  • Modelling Parallel Routines for Autotuning
  • Parallel Dynamic Programming Schemes
  • Autotuning in Parallel Dynamic Programming
    Schemes
  • Experimental Results

4
Modelling Parallel Routines for Autotuning
  • It is necessary to predict the execution time accurately and to select:
  • The number of processes
  • The number of processors
  • Which processors
  • The number of rows and columns of processes (the
    topology)
  • The assignment of processes to processors
  • The computational block size (in linear algebra
    algorithms)
  • The communication block size
  • The algorithm (polyalgorithms)
  • The routine or library (polylibraries)

5
Modelling Parallel Routines for Autotuning
  • Cost of a parallel program:
  • arithmetic time
  • communication time
  • overhead, due to synchronization, imbalance, process creation, ...
  • overlapping of communication and computation

6
Modelling Parallel Routines for Autotuning
  • Estimation of the time:
  • Computation and communication are considered as divided into a number of steps,
  • and for each part of the formula, the value of the process with the highest cost is taken.

7
Modelling Parallel Routines for Autotuning
  • The time depends on the problem size (n) and the system size (p)
  • But also on some ALGORITHMIC PARAMETERS like the
    block size (b) and the number of processors (q)
    used from the total available

8
Modelling Parallel Routines for Autotuning
  • And some SYSTEM PARAMETERS which reflect the
    computation and communication characteristics of
    the system.
  • Typically the cost of an arithmetic operation (tc), and the start-up (ts) and word-sending (tw) times
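The combination of Algorithmic and System Parameters can be sketched as a predicted-time function. This is our own minimal sketch, not the authors' model: the cost functions `flops`, `steps` and `words` are placeholders that a real routine would supply.

```python
def predicted_time(n, q, t_c, t_s, t_w,
                   flops=lambda n, q: n * n / q,   # arithmetic operations per process (placeholder)
                   steps=lambda n, q: n,           # number of communication steps (placeholder)
                   words=lambda n, q: n):          # words sent per step (placeholder)
    """Predicted execution time = arithmetic + communication, from the
    SPs t_c (arithmetic operation), t_s (start-up) and t_w (word-sending)."""
    arithmetic = flops(n, q) * t_c
    communication = steps(n, q) * (t_s + words(n, q) * t_w)
    return arithmetic + communication

# choosing the AP q that minimises the predicted time for n = 1000
best_q = min(range(1, 9), key=lambda q: predicted_time(1000, q, 1e-8, 1e-4, 1e-6))
```

With these placeholder costs the communication term does not depend on q, so the model always prefers the largest q; a realistic model would trade the two terms off.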

9
Modelling Parallel Routines for Autotuning
  • The values of the System Parameters can be obtained:
  • With installation routines associated with the routine being installed
  • From information stored when the library was installed in the system
  • At execution time, by testing the system conditions prior to the call to the routine

10
Modelling Parallel Routines for Autotuning
  • These values can be obtained as simple values (the traditional method) or as functions of the Algorithmic Parameters.
  • In the latter case, a multidimensional table of values, indexed by the problem size and the Algorithmic Parameters, is stored.
  • When a problem of a particular size is solved, the execution time is estimated with the values of the stored size closest to the real size,
  • and the problem is solved with the values of the Algorithmic Parameters which predict the lowest execution time.
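The stored-table selection just described can be sketched as follows. The sizes, parameter values and model shape are invented for illustration; only the mechanism (nearest stored size, then argmin over the AP) follows the text.

```python
stored_sps = {              # hypothetical installation data: size -> (t_c, t_s, t_w)
    1000:  (1.0e-8, 2.0e-4, 1.0e-6),
    10000: (1.2e-8, 2.5e-4, 1.1e-6),
}

def sps_for(n):
    """System Parameters stored for the size closest to n."""
    closest = min(stored_sps, key=lambda size: abs(size - n))
    return stored_sps[closest]

def choose_p(n, max_p):
    """Pick the number of processors with the lowest predicted time."""
    t_c, t_s, t_w = sps_for(n)
    def predicted(p):       # assumed model shape: n*n/p operations, n steps of n words
        return (n * n / p) * t_c + n * (t_s + n * t_w)
    return min(range(1, max_p + 1), key=predicted)
```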

11
Parallel Dynamic Programming Schemes
  • There are different Parallel Dynamic Programming
    Schemes.
  • The simple scheme of the coins problem is used:
  • Given a quantity C and n coin types of values v = (v1, v2, ..., vn), with a quantity q = (q1, q2, ..., qn) of each type, minimise the number of coins used to make up C.
  • But the granularity of the computation has been varied to study the scheme, not the problem.

12
Parallel Dynamic Programming Schemes
  • Sequential scheme:

    for i = 1 to number_of_decisions
      for j = 1 to problem_size
        obtain the optimum solution with i decisions and problem size j
      endfor
      complete the table with the formula
    endfor
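The sequential scheme can be made concrete for the coins problem. This is our own minimal sketch (names and layout are ours): row i of the table holds, for each amount j, the minimum number of coins using only the first i coin types.

```python
def min_coins(C, v, q):
    """Minimum number of coins summing to C, with q[i] coins of
    value v[i] available (inf where C cannot be reached)."""
    INF = float("inf")
    best = [0] + [INF] * C          # row for i = 0 coin types
    for i in range(len(v)):         # for i = 1 to number_of_decisions
        row = best[:]               # k = 0 coins of type i
        for j in range(1, C + 1):   # for j = 1 to problem_size
            for k in range(1, q[i] + 1):   # use k coins of value v[i]
                if k * v[i] > j:
                    break
                if best[j - k * v[i]] + k < row[j]:
                    row[j] = best[j - k * v[i]] + k
        best = row                  # complete the table with the formula
    return best[C]
```

For example, with C = 11, values (1, 2, 5) and quantities (11, 5, 2), the optimum uses three coins (5 + 5 + 1).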

13
Parallel Dynamic Programming Schemes
  • Parallel scheme:

    for i = 1 to number_of_decisions
      In Parallel
        for j = 1 to problem_size
          obtain the optimum solution with i decisions and problem size j
        endfor
      endInParallel
    endfor

14
Parallel Dynamic Programming Schemes
  • Message-passing scheme:

    In each processor Pj
      for i = 1 to number_of_decisions
        communication step
        obtain the optimum solution with i decisions and the problem sizes Pj has assigned
      endfor
    endInEachProcessor
[Figure: the N problem sizes distributed in blocks among processors P0, P1, P2, ..., PK]
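The message-passing scheme and the block distribution above can be sketched as a sequential simulation (ours, not the authors' code): each simulated processor owns a contiguous block of problem sizes, and the communication step is modelled as an all-gather of the previous row before each decision.

```python
def min_coins_mp(C, v, q, p):
    """Simulated message-passing version of the coins scheme on p processors."""
    INF = float("inf")
    prev = [0] + [INF] * C                      # shared row for i = 0
    # block distribution of the sizes 1..C over the p processors
    blocks = [range(1 + r * C // p, 1 + (r + 1) * C // p) for r in range(p)]
    for i in range(len(v)):                     # for i = 1 to number_of_decisions
        local_rows = []
        for block in blocks:                    # "in each processor Pj"
            local = {}
            for j in block:                     # the problem sizes Pj has assigned
                best = prev[j]                  # k = 0 coins of type i
                for k in range(1, q[i] + 1):
                    if k * v[i] > j:
                        break
                    best = min(best, prev[j - k * v[i]] + k)
                local[j] = best
            local_rows.append(local)
        # communication step: gather the new row from all processors
        new = [0] + [INF] * C
        for local in local_rows:
            for j, val in local.items():
                new[j] = val
        prev = new
    return prev[C]
```

A real implementation would replace the gather loop with message passing (e.g. an MPI all-gather); the simulation only shows which data each processor needs at each step.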
15
Autotuning in Parallel Dynamic Programming Schemes
  • Theoretical model:
  • Sequential cost
  • Computational parallel cost (qi large)
  • Communication cost (one communication step per decision)
  • The only AP is p
  • The SPs are tc, ts and tw
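The slide's formulas are not in the transcript; the forms below are a hedged reconstruction, plausible for the coins scheme with n decisions, problem size C, p processors and large quantities qi:

```latex
% sequential cost: one table update per (i, j) pair, each of cost O(q_i) t_c
T_{\mathrm{seq}} \approx t_c \, C \sum_{i=1}^{n} q_i
% computational parallel cost: the same work divided among the p processors
T_{\mathrm{comp}} \approx \frac{t_c \, C \sum_{i=1}^{n} q_i}{p}
% communication cost: one communication step per decision
T_{\mathrm{comm}} \approx n \left( t_s + C \, t_w \right)
```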
16
Autotuning in Parallel Dynamic Programming Schemes
  • How to estimate the arithmetic SPs:
  • By solving a small problem
  • How to estimate the communication SPs:
  • Using a ping-pong (CP1)
  • Solving a small problem while varying the number of processors (CP2)
  • Solving problems of selected sizes on systems of selected sizes (CP3)
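The arithmetic estimation can be sketched as timing a small run of known operation count; the routine and sizes below are hypothetical.

```python
import time

def estimate_tc(solve_small, ops):
    """Time a small run of the routine and divide by its operation count."""
    start = time.perf_counter()
    solve_small()
    return (time.perf_counter() - start) / ops

# arithmetic SP: a small stand-in workload of 100000 operations
tc = estimate_tc(lambda: sum(x * x for x in range(100000)), 100000)

# For the communication SPs: CP1 would time a ping-pong between two processes
# (t_s from an empty message, t_w from the per-word slope); CP2 and CP3 instead
# time whole runs while varying p and the problem size, and fit the model.
```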

17
Experimental Results
  • Systems:
  • SUNEt: five SUN Ultra 1 workstations and one SUN Ultra 5 (2.5 times faster), connected by Ethernet
  • PenFE: seven Pentium III nodes, connected by FastEthernet
  • Varying:
  • The problem size: C = 10000, 50000, 100000, 500000
  • Large values of qi
  • The granularity of the computation (the cost of a computational step)

18
Experimental Results
  • CP1:
  • A ping-pong (point-to-point communication).
  • Does not reflect the characteristics of the system.
  • CP2:
  • Executions with the smallest problem (C = 10000), varying the number of processors.
  • Reflects the characteristics of the system, but the time also changes with C.
  • Larger installation time (6 and 9 seconds).
  • CP3:
  • Executions with selected problem (C = 10000, 100000) and system (p = 2, 4, 6) sizes, with linear interpolation for other sizes.
  • The largest installation time (76 and 35 seconds).

19
Experimental Results
Parameter selection
[Tables: parameters selected by each method on SUNEt and PenFE]
20
Experimental Results
  • Quotient between the execution time with the
    parameter selected by each one of the selection
    methods and the lowest execution time, in SUNEt

21
Experimental Results
  • Quotient between the execution time with the
    parameter selected by each one of the selection
    methods and the lowest execution time, in PenFE

22
Experimental Results
  • Three types of users are considered:
  • GU (greedy user):
  • Uses all the available processors.
  • CU (conservative user):
  • Uses half of the available processors.
  • EU (expert user):
  • Uses a different number of processors depending on the granularity:
  • 1 for low granularity
  • Half of the available processors for middle granularity
  • All the processors for high granularity
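The three strategies can be sketched as a small selection function (ours; the granularity labels are assumptions):

```python
def processors_for(user, available, granularity):
    """Number of processors each type of user would choose."""
    if user == "GU":      # greedy: all the available processors
        return available
    if user == "CU":      # conservative: half of them
        return available // 2
    if user == "EU":      # expert: depends on the granularity
        return {"low": 1,
                "middle": available // 2,
                "high": available}[granularity]
    raise ValueError(user)
```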

23
Experimental Results
  • Quotient between the execution time with the
    parameter selected by each type of user and the
    lowest execution time, in SUNEt

24
Experimental Results
  • Quotient between the execution time with the
    parameter selected by each type of user and the
    lowest execution time, in PenFE

25
Conclusions and future work
  • The inclusion of autotuning capacities in a Parallel Dynamic Programming Scheme has been considered.
  • Different ways of modelling the scheme and of selecting the parameters have been studied.
  • Experimentally, the selection proves to be satisfactory, and useful in providing users with routines capable of reduced execution times.
  • In the future we plan to apply this technique:
  • to other algorithmic schemes,
  • in hybrid, heterogeneous and distributed systems.