Title: All-Pairs Shortest Paths for Large Graphs on the GPU
1. All-Pairs Shortest Paths for Large Graphs on the GPU
- Gary J. Katz (1,2), Joe Kider (1)
- (1) University of Pennsylvania
- (2) Lockheed Martin IS&GS
2. What Will We Cover?
- Quick overview of Transitive Closure and All-Pairs Shortest Path
- Uses for Transitive Closure and All-Pairs
- GPUs: what are they and why do we care?
- The GPU problem with performing Transitive Closure and All-Pairs
- The solution: the Block Processing Method
- Memory formatting in global and shared memory
- Results
3. Previous Work
- A Blocked All-Pairs Shortest-Paths Algorithm (Venkataraman et al.)
- Parallel FPGA-based All-Pairs Shortest Path in a Directed Graph (Bondhugula et al.)
- Accelerating Large Graph Algorithms on the GPU Using CUDA (Harish)
4. NVIDIA GPU Architecture
- Issues:
  - No access to main memory
  - The programmer needs to explicitly manage the L1-like shared cache
  - Cannot synchronize across multiprocessors
  - Compute cores are not as smart as CPU cores and do not handle if statements (divergent branches) well
5. Background
- A graph G with vertices V and edges E: G = (V, E)
- For every pair of vertices u, v in V, a shortest path from u to v, where the weight of a path is the sum of the weights of its edges
6. Adjacency Matrix
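The slide's matrix figure did not survive extraction; as a minimal illustration in plain C, with a made-up 4-vertex graph (not the slide's example), an adjacency matrix looks like:

```c
/* Adjacency-matrix representation of a directed graph:
 * A[i][j] == 1 iff there is an edge from vertex i to vertex j.
 * This hypothetical example encodes the edges 0->1, 1->2, 3->0. */
enum { N = 4 };

int A[N][N] = {
    {0, 1, 0, 0},   /* vertex 0: edge to 1 */
    {0, 0, 1, 0},   /* vertex 1: edge to 2 */
    {0, 0, 0, 0},   /* vertex 2: no outgoing edges */
    {1, 0, 0, 0},   /* vertex 3: edge to 0 */
};
```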
7. Quick Overview of Transitive Closure
- The transitive closure of G is defined as the graph G* = (V, E*), where E* = {(i, j) : there is a path from vertex i to vertex j in G}. (Introduction to Algorithms, T. Cormen)

Simply stated: the transitive closure of a graph is the list of edges for any vertices that can reach each other.
(Figure: an 8-vertex example graph and its closure.)
Edges of G: (1,5) (2,1) (4,2) (4,3) (6,3) (8,6)
Edges of G*: (1,5) (2,1) (4,2) (4,3) (6,3) (8,6) (2,5) (8,3) (7,6) (7,3)
8. Warshall's Algorithm: Transitive Closure
- Computes the transitive closure of a relation
- (Alternatively: all paths in a directed graph)
- (Figure: an example adjacency matrix and its transitive closure.)
Design and Analysis of Algorithms - Chapter 8
9. Warshall's Algorithm
- Main idea: a path exists between two vertices i, j iff
  - there is an edge from i to j, or
  - there is a path from i to j going through vertex 1, or
  - there is a path from i to j going through vertices 1 and/or 2, or
  - ...
  - there is a path from i to j going through vertices 1, 2, ..., and/or k, or
  - ...
  - there is a path from i to j going through any of the other vertices
10. Warshall's Algorithm
- Idea: dynamic programming
- Let V = {1, ..., n} and, for k <= n, V_k = {1, ..., k}
- For any pair of vertices i, j in V, consider all paths from i to j whose intermediate vertices are all drawn from V_k: P_ij(k) = {p1, p2, ...}; if P_ij(k) is non-empty, then R(k)[i, j] = 1
- For any pair of vertices i, j, the answer is R(n)[i, j]; that is, R(n) is the transitive closure
- Starting with R(0) = A, the adjacency matrix, how do we get R(1) -> ... -> R(k-1) -> R(k) -> ... -> R(n)?
(Figure: paths p1, p2 from i to j with intermediate vertices in V_k.)
11. Warshall's Algorithm
- Idea: dynamic programming
- p in P_ij(k): p is a path from i to j with all intermediate vertices in V_k
- If k is not on p, then p is also a path from i to j with all intermediate vertices in V_{k-1}: p in P_ij(k-1)
(Figure: a path p from i to j that avoids vertex k, so its intermediate vertices lie in V_{k-1}.)
12. Warshall's Algorithm
- Idea: dynamic programming
- p in P_ij(k): p is a path from i to j with all intermediate vertices in V_k
- If k is on p, then we break p into p1 and p2, where
  - p1 is a path from i to k with all intermediate vertices in V_{k-1}
  - p2 is a path from k to j with all intermediate vertices in V_{k-1}
(Figure: p split at vertex k into p1 (i to k) and p2 (k to j).)
13. Warshall's Algorithm
- In the kth stage, determine whether a path exists between two vertices i, j using just vertices among 1, ..., k:
  R(k)[i, j] = R(k-1)[i, j]                          (path using just 1, ..., k-1)
               or (R(k-1)[i, k] and R(k-1)[k, j])    (path from i to k and from k to j, using just 1, ..., k-1)
(Figure: the kth stage considers paths from i to j through vertex k.)
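The recurrence above translates directly into three nested loops; a minimal C sketch (the function name and fixed size N are illustrative, not from the slides):

```c
/* Warshall's algorithm: on return, R[i][j] == 1 iff a path from i
 * to j exists. Stage k admits vertex k as an intermediate, so the
 * in-place update R[i][j] = R[i][j] || (R[i][k] && R[k][j])
 * implements R(k)[i,j] = R(k-1)[i,j] or (R(k-1)[i,k] and R(k-1)[k,j]). */
enum { N = 4 };

void transitive_closure(int R[N][N]) {
    for (int k = 0; k < N; k++)
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                R[i][j] = R[i][j] || (R[i][k] && R[k][j]);
}
```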
14. Quick Overview of All-Pairs Shortest Path
- The all-pairs shortest path of G is defined, for every pair of vertices u, v in V, as the shortest (least-weight) path from u to v, where the weight of a path is the sum of the weights of its constituent edges. (Introduction to Algorithms, T. Cormen)

Simply stated: the all-pairs shortest path of a graph is the optimal list of vertices connecting any two vertices that can reach each other.
(Figure: the same 8-vertex example graph.)
Paths: 1->5, 2->1, 4->2, 4->3, 6->3, 8->6, 2->1->5, 8->6->3, 7->8->6, 7->8->6->3
15. Uses for Transitive Closure and All-Pairs
16. Floyd-Warshall Algorithm
(Figure: the example graph, swept one vertex at a time.)
Pass 1 finds all connections that are connected through vertex 1.
Pass 6 finds all connections that are connected through vertex 6.
Pass 8 finds all connections that are connected through vertex 8.
Running time: O(V^3)
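The pass-by-pass sweep above is Floyd-Warshall in its weighted form; a CPU reference version in C (the INF sentinel and fixed size are illustrative):

```c
/* Floyd-Warshall all-pairs shortest paths. Pass k admits vertex k
 * as an intermediate: W[i][j] = min(W[i][j], W[i][k] + W[k][j]).
 * Three nested loops over V vertices give the O(V^3) running time. */
enum { N = 4 };
#define INF 1000000   /* "no edge"; small enough that INF + INF fits in an int */

void floyd_warshall(int W[N][N]) {
    for (int k = 0; k < N; k++)
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                if (W[i][k] + W[k][j] < W[i][j])
                    W[i][j] = W[i][k] + W[k][j];
}
```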
17. Parallel Floyd-Warshall
There is a shortcoming to this algorithm, though: each processing element needs global access to memory. This can be an issue for GPUs.
18. The Question
- How do we calculate the transitive closure on the GPU so as to
  - take advantage of shared memory, and
  - accommodate data sizes that do not fit in memory?
Can we perform partial processing of the data?
19. Block Processing of Floyd-Warshall
(Figure: the data matrix and an organizational structure for block processing.)
20. Block Processing of Floyd-Warshall
21. Block Processing of Floyd-Warshall
N = 4
22. Block Processing of Floyd-Warshall
K = 1:
  (i,j) updated from (i,k), (k,j)
  (5,1)  <-  (5,1), (1,1)
  (8,1)  <-  (8,1), (1,1)
  (5,4)  <-  (5,1), (1,4)
  (8,4)  <-  (8,1), (1,4)
K = 4:
  (5,1)  <-  (5,4), (4,1)
  (8,1)  <-  (8,4), (4,1)
  (5,4)  <-  (5,4), (4,4)
  (8,4)  <-  (8,4), (4,4)
W[i,j] = W[i,j] or (W[i,k] and W[k,j])
For each pass k, the cells retrieved must already have been processed to at least pass k-1.
23. Block Processing of Floyd-Warshall
Putting it all together: processing k = 1-4
  Pass 1: i = 1-4, j = 1-4
  Pass 2: i = 5-8, j = 1-4 and i = 1-4, j = 5-8
  Pass 3: i = 5-8, j = 5-8
W[i,j] = W[i,j] or (W[i,k] and W[k,j])
24. Block Processing of Floyd-Warshall
Computing k = 5-8 (N = 8)
Range: i = 5-8, j = 5-8, k = 5-8
25. Block Processing of Floyd-Warshall
Putting it all together: processing k = 5-8
  Pass 1: i = 5-8, j = 5-8
  Pass 2: i = 5-8, j = 1-4 and i = 1-4, j = 5-8
  Pass 3: i = 1-4, j = 1-4
The transitive closure is now complete for k = 1-8.
W[i,j] = W[i,j] or (W[i,k] and W[k,j])
26-33. Increasing the Number of Blocks
- Primary blocks are along the diagonal
- Secondary blocks are the rows and columns of the primary block
- Tertiary blocks are all remaining blocks
(Figures: passes 1 through 8 sweep the primary block down the diagonal.)
In total: N passes, with 3 sub-passes per pass
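The primary/secondary/tertiary ordering can be sketched on the CPU as follows (the tile helper and names are illustrative; the paper's GPU version runs each tile as a CUDA block staged through shared memory):

```c
/* Blocked Floyd-Warshall over an n x n matrix split into B x B
 * tiles. For each diagonal tile t (the primary block):
 *   sub-pass 1: relax the primary block against itself;
 *   sub-pass 2: relax the secondary blocks (tile-row/column of t);
 *   sub-pass 3: relax all remaining (tertiary) blocks.
 * That is n/B passes with 3 sub-passes each, as on the slides. */
#define INF 1000000   /* "no edge"; INF + INF still fits in an int */

static void relax_tile(int n, int *W, int bi, int bj, int bk, int B) {
    /* Floyd-Warshall update for tile (bi,bj), drawing intermediate
     * vertices k from tile bk. */
    for (int k = bk * B; k < (bk + 1) * B; k++)
        for (int i = bi * B; i < (bi + 1) * B; i++)
            for (int j = bj * B; j < (bj + 1) * B; j++)
                if (W[i * n + k] + W[k * n + j] < W[i * n + j])
                    W[i * n + j] = W[i * n + k] + W[k * n + j];
}

void blocked_floyd_warshall(int n, int *W, int B) {
    int T = n / B;                        /* tiles per side */
    for (int t = 0; t < T; t++) {
        relax_tile(n, W, t, t, t, B);     /* sub-pass 1: primary */
        for (int m = 0; m < T; m++) {     /* sub-pass 2: secondary */
            if (m == t) continue;
            relax_tile(n, W, t, m, t, B); /* tile-row of t */
            relax_tile(n, W, m, t, t, B); /* tile-column of t */
        }
        for (int i = 0; i < T; i++)       /* sub-pass 3: tertiary */
            for (int j = 0; j < T; j++)
                if (i != t && j != t)
                    relax_tile(n, W, i, j, t, B);
    }
}
```

Each sub-pass only reads tiles already processed up to the current pass, which is exactly the "processed to at least k-1" constraint from slide 22.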
34. Running It on the GPU
- Using CUDA
  - Written by NVIDIA to access the GPU as a parallel processor
  - No need to use a graphics API
- Memory indexing: CUDA provides
  - Grid dimension
  - Block dimension
  - Block id
  - Thread id
(Figure: grid dimension, block dimension, block id, and thread id in the CUDA thread hierarchy.)
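From those four values each thread typically derives a flat element index; a plain-C rendition of the standard 1-D mapping (in CUDA itself the built-ins blockIdx.x, blockDim.x, and threadIdx.x supply the arguments):

```c
/* Flat global index of a thread in a 1-D CUDA grid: each block
 * holds blockDim threads, so a thread's position is its block's
 * offset plus its index within the block. */
int global_thread_index(int blockId, int blockDim, int threadId) {
    return blockId * blockDim + threadId;
}
```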
35. Partial Memory Indexing
(Figure: partial memory indexing for the sub-passes, e.g. SP2 and SP3, over index ranges 0 to N-1.)
36. Memory Format for the All-Pairs Solution
- All-Pairs requires twice the memory footprint of Transitive Closure
(Figure: the example graph alongside the solution matrix, which stores for each pair both a connecting node and a distance: N entries of each per row, 2N values in total, from which the shortest path is reconstructed.)
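A plain-C sketch of why the footprint doubles: transitive closure needs one reachability flag per pair, while the all-pairs solution stores both a distance and a connecting node per pair (the struct and field names are illustrative, not from the paper):

```c
/* One entry of the all-pairs solution matrix. Recording the
 * connecting (intermediate) node alongside the distance lets the
 * full shortest path be reconstructed pair by pair, at twice the
 * per-cell storage of a closure flag held in an int. */
typedef struct {
    int distance;         /* least path weight from i to j */
    int connecting_node;  /* intermediate vertex on that path, or -1 */
} apsp_entry;
```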
37. Results
Shared-memory cache-efficient GPU implementation compared to a standard GPU implementation
38. Results
Shared-memory cache-efficient GPU implementation compared to a standard CPU implementation and a cache-efficient CPU implementation
39. Results
Shared-memory cache-efficient GPU implementation compared to the best variant of Han et al.'s tuned code
40. Conclusion
- Advantages of the algorithm:
  - Relatively easy to implement
  - Cheap hardware
  - Much faster than the standard CPU version
  - Works for any data size
Special thanks to NVIDIA for supporting our research.
41. Backup
42. CUDA
- Compute Unified Device Architecture
- An extension of C
- Automatically creates thousands of threads to run on a graphics card
- Can be used to create non-graphical applications
- Pros:
  - Lets the user design algorithms that will run in parallel
  - Easy to learn: an extension of C
  - Has a CPU version, implemented by kicking off threads
- Cons:
  - Low-level, C-like language
  - Requires an understanding of the GPU architecture to exploit fully