All-Pairs Shortest Paths for Large Graphs on the GPU

1
All-Pairs-Shortest-Paths for Large Graphs on the GPU
  • Gary J Katz (1, 2), Joe Kider (1)
  • (1) University of Pennsylvania
  • (2) Lockheed Martin ISGS

2
What Will We Cover?
  • Quick overview of Transitive Closure and
    All-Pairs Shortest Path
  • Uses for Transitive Closure and All-Pairs
  • GPUs: what are they and why do we care?
  • The GPU problem with performing Transitive
    Closure and All-Pairs
  • Solution: the Block Processing Method
  • Memory formatting in global and shared memory
  • Results

3
Previous Work
  • A Blocked All-Pairs Shortest-Paths Algorithm
    (Venkataraman et al.)
  • Parallel FPGA-based All-Pairs Shortest Paths in
    a Directed Graph (Bondhugula et al.)
  • Accelerating large graph algorithms on the GPU
    using CUDA (Harish)

4
NVIDIA GPU Architecture
  • Issues
  • No access to main memory
  • Programmer needs to explicitly reference the L1
    shared cache
  • Cannot synchronize multiprocessors
  • Compute cores are not as smart as CPU cores and
    do not handle if statements well

5
Background
  • Some graph G with vertices V and edges E
  • G = (V, E)
  • For every pair of vertices u, v in V, find a
    shortest path from u to v, where the weight of a
    path is the sum of the weights of its edges

6
Adjacency Matrix
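
As a minimal sketch of the representation named on this slide, the snippet below builds a 0/1 adjacency matrix from a directed edge list. The helper name `adjacency_matrix` is hypothetical, and the edge list is the six-edge example that appears on the transitive closure slide; vertices are assumed to be numbered 1..n.

```python
# Hypothetical helper (not from the slides): build a 0/1 adjacency matrix.
def adjacency_matrix(n, edges):
    """Return an n x n matrix with A[i-1][j-1] = 1 for each directed edge (i, j)."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i - 1][j - 1] = 1  # vertices are 1-based, rows and columns are 0-based
    return A

# Six-edge example graph on vertices 1..8 from the transitive closure slide.
EDGES = [(1, 5), (2, 1), (4, 2), (4, 3), (6, 3), (8, 6)]
A = adjacency_matrix(8, EDGES)

for row in A:
    print(" ".join(str(x) for x in row))
```

Row i of the printout lists the vertices that vertex i points to directly, which is the starting matrix for the algorithms on the following slides.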
7
Quick Overview of Transitive Closure
  • The Transitive Closure of G is defined as the
    graph $G^* = (V, E^*)$, where
    $E^* = \{(i, j) : \text{there is a path from vertex } i \text{ to vertex } j \text{ in } G\}$
  • -Introduction to Algorithms, T. Cormen

Simply stated: the Transitive Closure of a graph
is the list of edges (i, j) for every pair of
vertices where i can reach j
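[Figure: the example graph on vertices 1-8, before and after taking the transitive closure]
Edges before: (1, 5), (2, 1), (4, 2), (4, 3), (6, 3), (8, 6)
Edges after: (1, 5), (2, 1), (4, 2), (4, 3), (6, 3), (8, 6), (2, 5), (8, 3), (7, 6), (7, 3)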
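
The definition above can be checked directly with a reachability search from every vertex. This is a sketch under assumptions, not the method the talk goes on to use: the function name `transitive_closure` is hypothetical, and it takes only the six edges listed as the input graph.

```python
from collections import deque

def transitive_closure(n, edges):
    """Return the set of pairs (i, j) such that j is reachable from i."""
    succ = {v: [] for v in range(1, n + 1)}
    for i, j in edges:
        succ[i].append(j)
    closure = set()
    for start in range(1, n + 1):
        seen = set()
        queue = deque(succ[start])      # breadth-first search from `start`
        while queue:
            v = queue.popleft()
            if v in seen:
                continue
            seen.add(v)
            queue.extend(succ[v])
        closure.update((start, v) for v in seen)
    return closure

EDGES = [(1, 5), (2, 1), (4, 2), (4, 3), (6, 3), (8, 6)]
closure = transitive_closure(8, EDGES)
new_edges = sorted(closure - set(EDGES))
print(new_edges)
```

For these six edges the search adds (2, 5) and (8, 3), as on the slide, and also (4, 1) and (4, 5), since 4 reaches 2, 2 reaches 1, and 1 reaches 5. The slide's (7, 6) and (7, 3) need an edge leaving vertex 7, which the six-edge list does not contain; the later shortest-path slide routes vertex 7 through vertex 8.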
8
Warshall's algorithm: transitive closure
  • Computes the transitive closure of a relation
  • (Alternatively: all paths in a directed graph)
  • Example of transitive closure:

Adjacency matrix:
0 0 1 0
1 0 0 1
0 0 0 0
0 1 0 0

Transitive closure:
0 0 1 0
1 1 1 1
0 0 0 0
1 1 1 1

Design and Analysis of Algorithms - Chapter 8
9
Warshall's algorithm
  • Main idea: a path exists between two vertices i,
    j, iff
  • there is an edge from i to j or
  • there is a path from i to j going through vertex
    1 or
  • there is a path from i to j going through vertex
    1 and/or 2 or
  • ...
  • there is a path from i to j going through vertices
    1, 2, ... and/or k or
  • ...
  • there is a path from i to j going through any of
    the other vertices

10
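Warshall's algorithm
  • Idea: dynamic programming
  • Let $V = \{1, \dots, n\}$ and, for $k \le n$, $V_k = \{1, \dots, k\}$
  • For any pair of vertices $i, j \in V$, identify all
    paths from i to j whose intermediate vertices are
    all drawn from $V_k$: $P_{ij}^{k} = \{p_1, p_2, \dots\}$;
    if $P_{ij}^{k} \ne \emptyset$ then $R^{(k)}[i, j] = 1$
  • For any pair of vertices i, j, the answer is
    $R^{(n)}[i, j]$; that is, the transitive closure is $R^{(n)}$
  • Starting with $R^{(0)} = A$, the adjacency matrix, how do we
    get $R^{(1)} \to \dots \to R^{(k-1)} \to R^{(k)} \to \dots \to R^{(n)}$?

[Figure: paths p1 and p2 from i to j with intermediate vertices in V_k]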
11
Warshall's algorithm
  • Idea: dynamic programming
  • $p \in P_{ij}^{k}$: p is a path from i to j with all
    intermediate vertices in $V_k$
  • If k is not on p, then p is also a path from i to
    j with all intermediate vertices in $V_{k-1}$:
    $p \in P_{ij}^{k-1}$

[Figure: path p from i to j that stays inside V_{k-1} and avoids vertex k]
12
Warshall's algorithm
  • Idea: dynamic programming
  • $p \in P_{ij}^{k}$: p is a path from i to j with all
    intermediate vertices in $V_k$
  • If k is on p, then we break down p into p1 and p2
    where
  • p1 is a path from i to k with all intermediate
    vertices in $V_{k-1}$
  • p2 is a path from k to j with all intermediate
    vertices in $V_{k-1}$

[Figure: path p from i to j split at vertex k into p1 and p2, each inside V_{k-1}]
13
Warshall's algorithm
  • In the kth stage, determine if a path exists
    between two vertices i, j using just vertices
    among 1, ..., k

$$R^{(k)}[i,j] = R^{(k-1)}[i,j] \;\lor\; \bigl(R^{(k-1)}[i,k] \land R^{(k-1)}[k,j]\bigr)$$

  • $R^{(k-1)}[i,j]$: a path using just 1, ..., k-1
  • $R^{(k-1)}[i,k] \land R^{(k-1)}[k,j]$: a path from i to k
    and from k to j, each using just 1, ..., k-1

[Figure: the kth stage, with vertices i, j, and k]
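
The kth-stage rule translates almost line for line into three nested loops. The sketch below is a plain sequential version, with the hypothetical function name `warshall`, run on the 4-vertex example from the earlier "transitive closure" slide, read here as four rows of four entries.

```python
def warshall(A):
    """Transitive closure of a 0/1 adjacency matrix by Warshall's algorithm."""
    n = len(A)
    R = [row[:] for row in A]            # R^(0) = A
    for k in range(n):                   # stage k allows vertex k as an intermediate
        for i in range(n):
            for j in range(n):
                # R^(k)[i][j] = R^(k-1)[i][j] or (R^(k-1)[i][k] and R^(k-1)[k][j])
                R[i][j] = R[i][j] or (R[i][k] and R[k][j])
    return R

A = [
    [0, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 0, 0],
]
T = warshall(A)
for row in T:
    print(" ".join(str(x) for x in row))
```

Updating R in place is safe because stage k never changes row k or column k, so every read of R[i][k] and R[k][j] during stage k sees the value from stage k-1.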
14
Quick Overview All-Pairs-Shortest-Path
  • The All-Pairs Shortest-Path of G is defined for
    every pair of vertices $u, v \in V$ as the shortest
    (least weight) path from u to v, where the weight
    of a path is the sum of the weights of its
    constituent edges.
  • -Introduction to Algorithms, T. Cormen

Simply stated: the All-Pairs-Shortest-Path of a
graph gives, for every pair of vertices where one
can reach the other, the least-weight sequence of
vertices connecting them

[Figure: example graph on vertices 1-8]
Paths: 1 → 5, 2 → 1, 4 → 2, 4 → 3, 6 → 3, 8 → 6, 2 → 1 → 5,
8 → 6 → 3, 7 → 8 → 6, 7 → 8 → 6 → 3
15
Uses for Transitive Closure and All-Pairs
16
Floyd-Warshall Algorithm
[Figure: example graph on vertices 1-8]
Pass 1: finds all connections that are connected
through vertex 1
Pass 6: finds all connections that are connected
through vertex 6
Pass 8: finds all connections that are connected
through vertex 8
Running time: $O(V^3)$
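
A minimal sequential sketch of the pass structure described here: pass k relaxes every pair (i, j) through vertex k, which gives the three nested loops behind the cubic running time. The function name `floyd_warshall`, the unit edge weights, and the edge 7 → 8 (inferred from the path 7 → 8 → 6 on the previous slide) are assumptions made for illustration.

```python
INF = float("inf")

def floyd_warshall(n, weighted_edges):
    """Return the n x n matrix of shortest-path distances (vertices are 1-based)."""
    D = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for u, v, w in weighted_edges:
        D[u - 1][v - 1] = min(D[u - 1][v - 1], w)
    for k in range(n):                    # pass k: connect pairs through vertex k
        for i in range(n):
            for j in range(n):
                if D[i][k] + D[k][j] < D[i][j]:
                    D[i][j] = D[i][k] + D[k][j]
    return D

# Assumed unit weights; (7, 8) is inferred from the path list, not stated as an edge.
EDGES = [(1, 5, 1), (2, 1, 1), (4, 2, 1), (4, 3, 1), (6, 3, 1), (8, 6, 1), (7, 8, 1)]
D = floyd_warshall(8, EDGES)
print(D[6][2])   # distance from vertex 7 to vertex 3
```

Unreachable pairs keep the value `INF`, so the same matrix also encodes the transitive closure.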
17
Parallel Floyd-Warshall
There's a shortcoming to this algorithm, though:
each processing element needs global access to
memory.
This can be an issue for GPUs.
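
To make the access pattern concrete, the sketch below computes each pass from a snapshot of the previous matrix, so every cell of a pass could be handled by a separate processing element. It is a sequential Python emulation with hypothetical names (`relax_pass`, `floyd_warshall_by_passes`) and a made-up 4-vertex graph, not GPU code.

```python
INF = float("inf")

def relax_pass(D, k):
    """Pass k, with every cell computed independently from the old matrix D."""
    n = len(D)
    # Cell (i, j) reads D[i][k] and D[k][j]: entries from column k and row k,
    # which may lie anywhere in the matrix relative to (i, j).
    return [[min(D[i][j], D[i][k] + D[k][j]) for j in range(n)] for i in range(n)]

def floyd_warshall_by_passes(D0):
    D = [row[:] for row in D0]
    for k in range(len(D)):       # passes run one after another...
        D = relax_pass(D, k)      # ...but cells within a pass are independent
    return D

# Hypothetical 4-vertex example: 0->1 (5), 1->2 (2), 0->2 (9), 2->3 (1).
D0 = [
    [0, 5, 9, INF],
    [INF, 0, 2, INF],
    [INF, INF, 0, 1],
    [INF, INF, INF, 0],
]
D = floyd_warshall_by_passes(D0)
print(D[0])
```

Because any cell may need data from row k and column k, each processing element needs a view of the whole matrix during every pass, which is the global-access requirement noted above.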
18
The Question
  • How do we calculate the transitive closure on the
    GPU to:
  • Take advantage of shared memory
  • Accommodate data sizes that do not fit in memory

Can we perform partial processing of the data?
19
Block Processing of Floyd-Warshall
Organizational structure for block processing?
Data Matrix
20
Block Processing of Floyd-Warshall
21
Block Processing of Floyd-Warshall
N = 4
22
Block Processing of Floyd-Warshall
K = 1

(i,j)      (i,k)   (k,j)
(5,1)  ->  (5,1)   (1,1)
(8,1)  ->  (8,1)   (1,1)
(5,4)  ->  (5,1)   (1,4)
(8,4)  ->  (8,1)   (1,4)

K = 4

(i,j)      (i,k)   (k,j)
(5,1)  ->  (5,4)   (4,1)
(8,1)  ->  (8,4)   (4,1)
(5,4)  ->  (5,4)   (4,4)
(8,4)  ->  (8,4)   (4,4)

$W[i,j] = W[i,j] \lor (W[i,k] \land W[k,j])$
For each pass k, the cells retrieved must be
processed to at least k-1
23
Block Processing of Floyd-Warshall
Putting it all together: processing k = 1-4
Pass 1: i = 1-4, j = 1-4
Pass 2: i = 5-8, j = 1-4 and i = 1-4, j = 5-8
Pass 3: i = 5-8, j = 5-8
$W[i,j] = W[i,j] \lor (W[i,k] \land W[k,j])$
24
Block Processing of Floyd-Warshall
Range: i = [5, 8], j = [5, 8], k = [5, 8]
N = 8
Computing k = 5-8
25
Block Processing of Floyd-Warshall
Putting it all together: processing k = 5-8
Pass 1: i = 5-8, j = 5-8
Pass 2: i = 5-8, j = 1-4 and i = 1-4, j = 5-8
Pass 3: i = 1-4, j = 1-4
Transitive closure is complete for k = 1-8
$W[i,j] = W[i,j] \lor (W[i,k] \land W[k,j])$
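
The two rounds of three passes can be sketched as a sequential Python emulation. The names `process_block`, `span`, and `blocked_closure` are hypothetical, the input graph assumes the edge 7 → 8 in addition to the six edges listed earlier, and nothing here models shared memory; it only follows the block order given on the slides.

```python
def process_block(W, rows, cols, ks):
    """Apply W[i][j] = W[i][j] | (W[i][k] & W[k][j]) over one block."""
    for k in ks:
        for i in rows:
            for j in cols:
                W[i][j] = W[i][j] | (W[i][k] & W[k][j])

def span(b, s):
    """Index range covered by block number b when blocks have side s."""
    return range(b * s, (b + 1) * s)

def blocked_closure(A, s):
    n = len(A)
    if n % s != 0:
        raise ValueError("block size must divide the matrix size")
    W = [row[:] for row in A]
    nb = n // s
    for p in range(nb):                   # one round per range of k
        ks = span(p, s)
        process_block(W, ks, ks, ks)      # pass 1: diagonal block for this k range
        for b in range(nb):               # pass 2: blocks sharing its row or column
            if b != p:
                process_block(W, span(b, s), ks, ks)
                process_block(W, ks, span(b, s), ks)
        for bi in range(nb):              # pass 3: all remaining blocks
            for bj in range(nb):
                if bi != p and bj != p:
                    process_block(W, span(bi, s), span(bj, s), ks)
    return W

# Assumed input: the six listed edges plus 7 -> 8.
EDGES = [(1, 5), (2, 1), (4, 2), (4, 3), (6, 3), (8, 6), (7, 8)]
A = [[0] * 8 for _ in range(8)]
for i, j in EDGES:
    A[i - 1][j - 1] = 1

blocked = blocked_closure(A, 4)   # two blocks per side, as on the slides
plain = blocked_closure(A, 8)     # a single block is ordinary Warshall
print(blocked == plain)
```

With a block side of 4 the loop visits exactly the blocks listed for k = 1-4 and then for k = 5-8, and each pass only reads blocks that an earlier pass of the same round has already finished.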
26
Increasing the Number of Blocks
  • Primary blocks are along the diagonal
  • Secondary blocks are the blocks in the same row
    or column as the primary block
  • Tertiary blocks are all remaining blocks

Pass 1
27
Increasing the Number of Blocks
Pass 2
28
Increasing the Number of Blocks
Pass 3
29
Increasing the Number of Blocks
Pass 4
30
Increasing the Number of Blocks
Pass 5
31
Increasing the Number of Blocks
Pass 6
32
Increasing the Number of Blocks
Pass 7
33
Increasing the Number of Blocks
Pass 8
In total: N passes, 3 sub-passes per pass
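
The block classification can be enumerated directly. The sketch below assumes N counts the blocks along one side of the matrix (8 in the walkthrough above) and uses a hypothetical generator name, `block_schedule`; it lists which blocks fall into each sub-pass without doing any of the computation.

```python
def block_schedule(nb):
    """Yield (primary, secondary, tertiary) block coordinates for each pass."""
    for p in range(nb):
        primary = [(p, p)]                                    # on the diagonal
        secondary = [(p, b) for b in range(nb) if b != p]     # same block row
        secondary += [(b, p) for b in range(nb) if b != p]    # same block column
        tertiary = [(bi, bj) for bi in range(nb) for bj in range(nb)
                    if bi != p and bj != p]                   # everything else
        yield primary, secondary, tertiary

schedule = list(block_schedule(8))
for number, (pri, sec, ter) in enumerate(schedule, start=1):
    print(f"Pass {number}: {len(pri)} primary, {len(sec)} secondary, {len(ter)} tertiary")
```

Every pass touches all blocks exactly once, and the blocks inside one sub-pass do not depend on each other, which is what would let each sub-pass be spread across many processing elements.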
34
Running it on the GPU
  • Using CUDA
  • Written by NVIDIA to access the GPU as a parallel
    processor
  • Do not need to use a graphics API

  • Memory Indexing
  • CUDA Provides
  • Grid Dimension
  • Block Dimension
  • Block Id
  • Thread Id

[Figure: grid of thread blocks labeled with Grid Dimension, Block Dimension, Block Id, and Thread Id]
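
In CUDA kernels these four values appear as the built-in variables `gridDim`, `blockDim`, `blockIdx`, and `threadIdx`, and a thread usually finds its data element with an expression of the form `blockIdx.x * blockDim.x + threadIdx.x`. The Python sketch below only emulates that arithmetic on the host; the function names and the 2 x 2 grid of 4 x 4 blocks are assumptions chosen to match the 8 x 8 example.

```python
def global_index(block_id, block_dim, thread_id):
    """Same arithmetic as the CUDA expression blockIdx.x * blockDim.x + threadIdx.x."""
    return block_id * block_dim + thread_id

def cell_for_thread(block_id, thread_id, block_dim):
    """Map a 2D (block id, thread id) pair to a (row, col) cell of the matrix."""
    bx, by = block_id
    tx, ty = thread_id
    dim_x, dim_y = block_dim
    row = global_index(by, dim_y, ty)
    col = global_index(bx, dim_x, tx)
    return row, col

GRID_DIM = (2, 2)     # assumed: 2 x 2 blocks
BLOCK_DIM = (4, 4)    # assumed: 4 x 4 threads per block

cells = {
    cell_for_thread((bx, by), (tx, ty), BLOCK_DIM)
    for bx in range(GRID_DIM[0]) for by in range(GRID_DIM[1])
    for tx in range(BLOCK_DIM[0]) for ty in range(BLOCK_DIM[1])
}
print(len(cells))
```

Each (block, thread) pair lands on a distinct cell, so the 64 threads cover the 8 x 8 matrix with no overlap.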
35
Partial Memory Indexing
[Figure: memory index ranges for SP1, SP2, and SP3, with axis labels running up to N - 1]
36
Memory Format for All-Pairs Solution
  • All-Pairs requires twice the memory footprint of
    Transitive Closure

[Figure: the example graph on vertices 1-8 and its memory layout, with labels Connecting Node, Distance, N, 2N, and Shortest Path]
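
One way to read the two fields is that each vertex pair stores a distance and the vertex through which its best path was found, which is enough to rebuild the path afterwards. The sketch below follows that reading with two separate N x N arrays rather than the single layout in the figure; the function names, unit weights, and the edge 7 → 8 are assumptions.

```python
INF = float("inf")

def all_pairs(n, edges):
    """Return (dist, via): distances and connecting nodes, both 0-based."""
    dist = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    via = [[None] * n for _ in range(n)]   # None: direct edge, same vertex, or no path
    for u, v in edges:
        dist[u - 1][v - 1] = 1             # assumed unit weight
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
                    via[i][j] = k          # remember the connecting node
    return dist, via

def rebuild(dist, via, i, j):
    """Rebuild the 0-based vertex sequence from i to j, or None if unreachable."""
    if dist[i][j] == INF:
        return None
    k = via[i][j]
    if k is None:
        return [i] if i == j else [i, j]
    return rebuild(dist, via, i, k) + rebuild(dist, via, k, j)[1:]

def shortest_path(dist, via, u, v):
    """1-based wrapper around rebuild."""
    path = rebuild(dist, via, u - 1, v - 1)
    return None if path is None else [x + 1 for x in path]

EDGES = [(1, 5), (2, 1), (4, 2), (4, 3), (6, 3), (8, 6), (7, 8)]
dist, via = all_pairs(8, EDGES)
print(shortest_path(dist, via, 7, 3))
```

Storing the connecting node next to the distance doubles the storage per pair, which is consistent with the slide's note that All-Pairs needs twice the memory footprint of Transitive Closure.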
37
Results
SM cache efficient GPU implementation compared to
standard GPU implementation
38
Results
SM cache efficient GPU implementation compared to
standard CPU implementation and cache-efficient
CPU implementation
39
Results
SM cache efficient GPU implementation compared to
best variant of Han et al.'s tuned code
40
Conclusion
  • Advantages of Algorithm
  • Relatively Easy to Implement
  • Cheap Hardware
  • Much Faster than standard CPU version
  • Can work for any data size

Special thanks to NVIDIA for supporting our
research
41
Backup
42
CUDA
  • Compute Unified Device Architecture
  • Extension of C
  • Automatically creates thousands of threads to run
    on a graphics card
  • Used to create non-graphical applications
  • Pros
  • Allows user to design algorithms that will run in
    parallel
  • Easy to learn, extension of C
  • Has CPU version, implemented by kicking off
    threads
  • Cons
  • Low-level, C-like language
  • Requires understanding of GPU architecture to
    fully exploit