1
Shortest Path Algorithms
  • Jim E. Jones

2
Talk Outline
  • Background for the problem
  • Algorithms
  • Code Listings
  • Numerical Results
  • Conclusions and Issues

3
Talk Outline
  • Background for the problem
  • Algorithms
  • Code Listings
  • Numerical Results
  • Conclusions and Issues

4
Weighted Directed Graph
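[Figure: an example weighted directed graph; the vertex labels and edge weights are not recoverable from this transcript.]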
5
Distance Matrix
[Figure: the example graph shown with its distance matrix; the entries are not recoverable from this transcript.]
6
Shortest Path Problem
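  • Given the weighted adjacency matrix A, where A[i][j] is
    the weight of the edge from vertex i to vertex j.
  • Compute the distance matrix D, where D[i][j] is the
    length of the shortest path from i to j.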

7
Talk Outline
  • Background for the problem
  • Algorithms
  • Code Listings
  • Numerical Results
  • Conclusions and Issues

8
Floyd's Algorithm
for (k = 0; k < n; k++)
  for (i = 0; i < n; i++)
    for (j = 0; j < n; j++)
      D[i][j] = min(D[i][j], D[i][k] + D[k][j]);
9
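Floyd's Algorithm
$D_{ij} = \min(D_{ij},\; D_{ik} + D_{kj})$
[Figure: the matrix D with rows i and k and columns j and k marked; entry $D_{ij}$ is updated from $D_{ik}$ and $D_{kj}$.]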
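
The update above can be sketched as a small, runnable C function. This is a minimal sketch, not the presenter's code listing: the flat row-major layout, the function name floyd, and the INF sentinel for "no edge" are assumptions made for illustration. The checks against INF are there only to avoid integer overflow when two "infinite" entries would be added.

```c
#include <limits.h>

#define INF INT_MAX  /* assumption: sentinel meaning "no edge / no path" */

/* In-place Floyd update on an n x n row-major matrix d.
   On entry d holds edge weights (INF where there is no edge, 0 on the
   diagonal); on exit d[i*n + j] is the shortest distance from i to j.
   Assumes non-negative weights whose sums stay below INT_MAX. */
void floyd(int n, int *d)
{
    for (int k = 0; k < n; k++)
        for (int i = 0; i < n; i++) {
            if (d[i * n + k] == INF)
                continue;               /* no path i -> k, nothing to relax */
            for (int j = 0; j < n; j++) {
                if (d[k * n + j] == INF)
                    continue;           /* no path k -> j */
                int via_k = d[i * n + k] + d[k * n + j];
                if (via_k < d[i * n + j])
                    d[i * n + j] = via_k;
            }
        }
}
```

For a 3-vertex cycle with edges 0→1 (weight 4), 1→2 (weight 1), and 2→0 (weight 2), calling floyd(3, d) fills in the two-hop distances, for example 5 from vertex 0 to vertex 2.
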
10
Floyd Parallel 1
  • Give each processor a contiguous set of rows of A
    and D. (Row-wise partition)
  • Can use at most N processors.
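
One way to compute each processor's contiguous row range is sketched below. The function name row_range and the policy of giving the leftover rows to the lowest ranks are assumptions for illustration; the slides do not say how rows are divided when P does not divide N.

```c
/* Contiguous row-wise partition: compute the inclusive row range
   [*first, *last] owned by process `rank` out of `p`, for n rows.
   The first n % p processes get one extra row (an assumed policy). */
void row_range(int n, int p, int rank, int *first, int *last)
{
    int base = n / p;        /* rows every process gets */
    int extra = n % p;       /* leftover rows, one each to the low ranks */
    int count = base + (rank < extra ? 1 : 0);

    *first = rank * base + (rank < extra ? rank : extra);
    *last = *first + count - 1;   /* inclusive end of the range */
}
```

With n = 10 and p = 4 this yields the ranges 0..2, 3..5, 6..7, and 8..9, so no processor holds more than one row beyond the average.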

11
Floyd Parallel 1
for (k = 0; k < n; k++)
  for (i = i_local_start; i < i_local_end + 1; i++)
    for (j = 0; j < n; j++)
      D[i][j] = min(D[i][j], D[i][k] + D[k][j]);
12
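Floyd Parallel 1 (P3's view)
for (k = 0; k < n; k++)
  for (i = i_local_start; i < i_local_end + 1; i++)
    for (j = 0; j < n; j++)
      D[i][j] = min(D[i][j], D[i][k] + D[k][j]);
[Figure: the matrix D with P3's own rows ("my rows") and the kth row marked.]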
13
Floyd Parallel 1 costs
  • Computation
  • $T = (N^3/P)\, t_c$
  • Communication (broadcasts of kth row)
  • $T = N \log(P)\,(a + bN)$
  • Overall
  • $T = (N^3/P)\, t_c + N \log(P)\,(a + bN)$
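
The cost model for the row-wise version can be evaluated with a few lines of C. This is a sketch under stated assumptions: the slides write only log(P), so taking it as the base-2 logarithm rounded up (the depth of a binary-tree broadcast) is an assumption, and the function names are hypothetical.

```c
/* ceil(log2(p)) by repeated doubling.
   Assumption: log(P) in the model is the depth of a binary-tree broadcast. */
static int log2_ceil(int p)
{
    int steps = 0;
    for (int reach = 1; reach < p; reach *= 2)
        steps++;
    return steps;
}

/* Modelled run time of the row-wise parallel Floyd:
   T = (N^3/P) tc + N log(P) (a + b N),
   with tc the time per inner-loop update, a the per-message latency,
   and b the transfer time per matrix entry. */
double floyd1_model(double n, int p, double tc, double a, double b)
{
    double compute = n * n * n / p * tc;
    double comm = n * log2_ceil(p) * (a + b * n);
    return compute + comm;
}
```

With p = 1 the communication term vanishes and the model reduces to the serial cost N^3 tc.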

14
Floyd Parallel 2
  • Give each processor a contiguous block of A and
    D. (row/column partition)
  • Can use up to $N^2$ processors.
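
A block partition can be sketched as a mapping from a processor's rank to its row and column ranges. The sketch below assumes a square q x q processor grid with P = q*q, ranks numbered row by row, and q dividing N evenly; the Block type and the function block_of are hypothetical names, not taken from the slides.

```c
/* Inclusive index ranges of one processor's block of A and D. */
typedef struct {
    int i_start, i_end;   /* row range */
    int j_start, j_end;   /* column range */
} Block;

/* Block owned by `rank` on a q x q grid (P = q*q), row-major rank order.
   Assumes q divides n, so every block is (n/q) x (n/q). */
Block block_of(int n, int q, int rank)
{
    int size = n / q;     /* block edge length */
    int prow = rank / q;  /* grid row of this processor */
    int pcol = rank % q;  /* grid column of this processor */
    Block b;

    b.i_start = prow * size;
    b.i_end = b.i_start + size - 1;
    b.j_start = pcol * size;
    b.j_end = b.j_start + size - 1;
    return b;
}
```

On a 4 x 4 grid with n = 8, rank 14 sits in grid row 3 and grid column 2 and owns rows 6..7 and columns 4..5. In step k it needs only the part of row k that lies in its column range and the part of column k that lies in its row range.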

15
Floyd Parallel 2
for (k = 0; k < n; k++)
  for (i = i_local_start; i < i_local_end + 1; i++)
    for (j = j_local_start; j < j_local_end + 1; j++)
      D[i][j] = min(D[i][j], D[i][k] + D[k][j]);

16
Floyd Parallel 2 (P14's view)
for (k = 0; k < n; k++)
  for (i = i_local_start; i < i_local_end + 1; i++)
    for (j = j_local_start; j < j_local_end + 1; j++)
      D[i][j] = min(D[i][j], D[i][k] + D[k][j]);
[Figure: the matrix D with P14's block ("my block") marked.]
17
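Floyd Parallel 2 (P14's view)
for (k = 0; k < n; k++)
  for (i = i_local_start; i < i_local_end + 1; i++)
    for (j = j_local_start; j < j_local_end + 1; j++)
      D[i][j] = min(D[i][j], D[i][k] + D[k][j]);
[Figure: the matrix D with P14's block ("my block"), the kth row, and the kth column marked.]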
18
Floyd Parallel 2 costs
  • Computation
  • T (N3/P) tc
  • Communication (broadcasts of kth row and kth column)
  • $T = 2N \log(\sqrt{P})\,(a + bN/\sqrt{P})$
  • $\phantom{T} = N \log(P)\,(a + bN/\sqrt{P})$
  • Overall
  • $T = (N^3/P)\, t_c + N \log(P)\,(a + bN/\sqrt{P})$
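
The block-partition cost model can be evaluated in the same way. In this sketch the grid edge q = sqrt(P) is passed directly so that no square root is needed; treating log as a base-2 logarithm rounded up is an assumption, and the function names are hypothetical.

```c
/* ceil(log2(q)): assumed depth of a binary-tree broadcast over q processors. */
static int tree_depth(int q)
{
    int steps = 0;
    for (int reach = 1; reach < q; reach *= 2)
        steps++;
    return steps;
}

/* Modelled run time of the block-partitioned parallel Floyd on a
   q x q grid (P = q*q):
   T = (N^3/P) tc + 2 N log(q) (a + b N / q).
   Each of the N steps broadcasts a row segment and a column segment of
   N/q entries among q processors, hence the factor 2. */
double floyd2_model(double n, int q, double tc, double a, double b)
{
    double compute = n * n * n / (q * q) * tc;
    double comm = 2.0 * n * tree_depth(q) * (a + b * n / q);
    return compute + comm;
}
```

Since 2 log(q) equals log(P) when q = sqrt(P), this is the same expression as the overall cost on the slide; the shorter messages (N/q entries instead of N) are what reduce the bandwidth term.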

19
Dijkstra's Algorithm: find shortest paths from
vertex s to all others (d is row s of D)
d[s] = 0; d[i] = inf for all i != s;
T = V;  /* set of all vertices */
for (k = 0; k < n; k++) {
  find i in T with min d[i];
  for each edge (i,j) with j in T
    if (d[j] > d[i] + a[i][j])
      d[j] = d[i] + a[i][j];
  T = T - {i};
}
20
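Dijkstra's Algorithm
$d_j = \min(d_j,\; d_i + a_{ij})$
[Figure: source s reaches vertex i at distance $d_i$; the edge (i, j) with weight $a_{ij}$ is used to update $d_j$.]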
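
The pseudocode for a single source can be made runnable as follows. This is a minimal sketch of the dense, array-based O(n^2) variant, which matches the loop structure on the slide; the INF sentinel, the MAX_N bound, and the flat row-major layout of A are assumptions for illustration.

```c
#include <limits.h>
#include <stdbool.h>

#define INF INT_MAX   /* assumption: sentinel for "no edge" / "not reached" */
#define MAX_N 1024    /* assumption: upper bound on n for the in_t array */

/* Dense Dijkstra from source s on an n x n row-major matrix a
   (non-negative weights, INF where there is no edge).
   Fills d[0..n-1] with shortest distances from s. Requires 1 <= n <= MAX_N. */
void dijkstra(int n, const int *a, int s, int *d)
{
    bool in_t[MAX_N];                 /* in_t[v]: v is still in the set T */

    for (int v = 0; v < n; v++) {
        d[v] = INF;
        in_t[v] = true;
    }
    d[s] = 0;

    for (int k = 0; k < n; k++) {
        int i = -1;                   /* vertex in T with minimum d */
        for (int v = 0; v < n; v++)
            if (in_t[v] && (i < 0 || d[v] < d[i]))
                i = v;
        if (d[i] == INF)
            break;                    /* the rest of T is unreachable */
        in_t[i] = false;              /* T = T - {i} */
        for (int j = 0; j < n; j++)
            if (in_t[j] && a[i * n + j] != INF && d[i] + a[i * n + j] < d[j])
                d[j] = d[i] + a[i * n + j];
    }
}
```

Running it once per source vertex s fills row s of D, which is how the all-pairs distance matrix is obtained from a single-source algorithm.
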
21
Dijkstra Parallel
  • Give each processor all of A and have it run
    serial Dijkstra to compute contiguous rows of D.
  • Can use at most N processors. Local memory must
    hold all of A.

[Figure: each processor holds all of A and computes its own contiguous rows of D.]
22
Dijkstra's Algorithm Parallel
for (s = local_firstrow; s < local_lastrow; s++) {
  D[s][s] = 0; D[s][i] = inf for all i != s;
  T = V;  /* set of all vertices */
  for (k = 0; k < n; k++) {
    find i in T with min D[s][i];
    for each edge (i,j) with j in T
      if (D[s][j] > D[s][i] + A[i][j])
        D[s][j] = D[s][i] + A[i][j];
    T = T - {i};
  }
}
23
Dijkstra Parallel costs
  • From the literature, Dijkstra's is slower than Floyd's by
    a factor F = 1.6.
  • Computation
  • $T = (N^3/P)\, F\, t_c$
  • No Communication

24
Cost summary
25
Talk Outline
  • Background for the problem
  • Algorithms
  • Code Listings
  • Numerical Results
  • Conclusions and Issues

26
Talk Outline
  • Background for the problem
  • Algorithms
  • Code Listings
  • Numerical Results
  • Conclusions and Issues

27
Run Times on Bluemarlin
  • On each run, the maximum time over all the
    processors was recorded as the time for that run.
  • Three runs were made, and the median run time is
    recorded in the following tables.
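
The timing procedure described above amounts to a maximum over processors followed by a median of three. A small sketch, with hypothetical function names and the per-processor times passed in as a plain array:

```c
/* Time for one run: the slowest of the p processors sets the run time. */
double run_time(const double *proc_times, int p)
{
    double worst = proc_times[0];
    for (int r = 1; r < p; r++)
        if (proc_times[r] > worst)
            worst = proc_times[r];
    return worst;
}

/* Median of three run times. */
double median3(double x, double y, double z)
{
    if ((x <= y && y <= z) || (z <= y && y <= x))
        return y;
    if ((y <= x && x <= z) || (z <= x && x <= y))
        return x;
    return z;
}
```

Using the median rather than the mean keeps a single unusually slow or fast run from shifting the reported time.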

28
Serial Run Times
  • In serial, the Floyd code was faster than the
    Dijkstra code.
  • The speed advantage for Floyd was smaller than the
    reported value: we measured F = 1.15.

29
Parallel Run Times
30
N = 720: Run Times and Speed-up
Floyd 2 is fastest up to P = 4; Dijkstra is fastest thereafter.
Dijkstra has the best speed-ups.
31
Models of performance
  • Ping pong test code to generate data
  • Least squares fit
  • a = 0.002 sec/message
  • b = $8 \times 10^{-7}$ sec/double
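
A least-squares fit of this kind can be sketched as an ordinary straight-line fit of message time against message length, t = a + b n. The sketch assumes each sample pairs a message length in doubles with a measured one-way time in seconds; the function name fit_line and the sample layout are hypothetical, and the presenter's actual ping-pong and fitting code is not shown in the transcript.

```c
/* Ordinary least-squares fit of t = a + b*n to m samples (n[i], t[i]).
   a estimates the per-message latency, b the time per double.
   Returns 0 on success, -1 if the fit is undetermined
   (fewer than two samples, or all message lengths equal). */
int fit_line(const double *n, const double *t, int m, double *a, double *b)
{
    double sn = 0.0, st = 0.0, snn = 0.0, snt = 0.0;

    for (int i = 0; i < m; i++) {
        sn += n[i];
        st += t[i];
        snn += n[i] * n[i];
        snt += n[i] * t[i];
    }

    double denom = m * snn - sn * sn;
    if (m < 2 || denom == 0.0)
        return -1;

    *b = (m * snt - sn * st) / denom;   /* slope */
    *a = (st - *b * sn) / m;            /* intercept */
    return 0;
}
```

The intercept corresponds to a (cost of an empty message) and the slope to b (additional cost per double), which is why a range of message lengths is needed for the fit to be meaningful.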

32
Dijkstra Model Comparison
  • Model
  • $T = (N^3/P)\, F\, t_c$
  • No communication, no a or b terms.

33
Floyd Model Comparison
  • Poor match between model and results
  • Model appears to overestimate the cost of
    communication.

34
Talk Outline
  • Background for the problem
  • Algorithms
  • Code Listings
  • Numerical Results
  • Conclusions and Issues

35
The actual run times for the codes confirmed
expectations
  • Floyd faster than Dijkstra in serial
  • With increasing number of processors, Dijkstra
    eventually becomes faster because no
    communication occurs.
  • Speed-ups were good for Floyd 1, better for
    Floyd 2, and best (near perfect) for Dijkstra.
  • The worst speed-up was with Floyd 1 and N = 576, where
    the speed-up was 9.3 for the 16-processor run.
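
For reference, speed-up and parallel efficiency follow the usual definitions: serial time divided by parallel time, and speed-up divided by the number of processors. A minimal sketch with hypothetical function names:

```c
/* Speed-up: how many times faster the parallel run is than the serial run. */
double speedup(double t_serial, double t_parallel)
{
    return t_serial / t_parallel;
}

/* Parallel efficiency: fraction of the ideal speed-up p that was achieved. */
double efficiency(double s, int p)
{
    return s / p;
}
```

By this definition, a speed-up of 9.3 on 16 processors corresponds to an efficiency of about 0.58.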

36
Model gave poor quantitative prediction for run
times of Floyd in parallel.
  • Communication is only broadcasts.
  • The a and b terms are computed from MPI_Send and
    MPI_Recv code.
  • Plugging these a and b into
    $T = N \log(P)\,(a + bN)$
    evidently does not give a good prediction of broadcast
    time.
  • Actual broadcast times seem to be quite a bit
    smaller.

37
Reconciling Model to Result
  • It is clear that the cost of the broadcasts is being
    overestimated.
  • Much better fit with model if latency factor a is
    reduced by a factor of 10.

38
Reconciling Model to Result
  • It is clear that the cost of the broadcasts is being
    overestimated.
  • Much better fit with model if latency factor a is
    reduced by a factor of 10.
  • Better still if bandwidth parameter b is also
    reduced by a factor of 10.

39
Future Work?
  • Time broadcast and see how it matches model
    (expect poor match).
  • Adjust a and b to fit.
  • Find another model with better match.