Answering Distance Queries in directed graphs using fast matrix multiplication

1
Answering Distance Queries in directed graphs
using fast matrix multiplication
  • Seminar in Algorithms
  • Prof. Haim Kaplan
  • Lecture by Lior Eldar
  • 1/07/2007

2
Structure of Lecture
  • Introduction and history
  • Alg1: APSP
  • Alg2: preprocess and query
  • Alg3: hybrid
  • Summary

3
Problem Definition
  • Given a weighted directed graph,
    we are asked to solve one of:
  • APSP (all-pairs shortest paths): find the distance for every
    pair of nodes.
  • SSSP (single-source shortest paths): find all
    distances from a source node s.
  • A hybrid problem comes to mind:
  • Preprocess the graph faster than APSP.
  • Answer ANY two-node distance query faster than
    SSSP.
  • What is it good for?

4
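Previously known results - APSP
  • Undirected graphs
  • Approximation algorithm by Thorup and Zwick:
  • Preprocess an undirected weighted graph in
  • $O(k m n^{1/k})$ expected time.
  • Generate a data structure of size $O(k n^{1+1/k})$.
  • Answer any query in O(1) time (for constant k).
  • BUT the answer is approximate, within a factor of 2k-1.
  • For non-negative integer weights of at most M,
    Shoshan and Zwick developed an algorithm with running
    time $\tilde{O}(M n^{\omega})$, where $\omega$ is the matrix multiplication exponent.
  • Directed graphs: Zwick's algorithm runs in $\tilde{O}(M^{0.68} n^{2.58})$.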

5
Previously known results - SSSP
  • Positive weights
  • Directed graphs with positive weights: Dijkstra,
    with $O(m + n \log n)$ time.
  • Undirected graphs with positive integer edge
    weights: Thorup, with $O(m)$ time.
  • Negative weights: much harder
  • Bellman-Ford, with $O(mn)$ time.
  • Goldberg and Tarjan: assumes edge weight values
    are at least -N.

6
New Algorithm by Yuster / Zwick
  • Solves the hybrid pre-processing/query problem
    for:
  • Directed graphs
  • Integer weights from -M to M
  • Achieves the following performance:
  • Pre-processing: $\tilde{O}(M n^{\omega})$
  • Query answering: O(n)
  • Faster than the previously known APSP (Zwick) so long
    as the number of queries is below roughly $n^{1.58}$ (for small M, since each query costs O(n)).
  • Better than SSSP performance (Goldberg/Tarjan)
    for dense graphs with a small weight range.

7
Beyond the numbers
  • An extension of this algorithm allows complete
    freedom in optimizing the pre-processing/query
    trade-off.
  • To optimize an algorithm for an arbitrary number
    of queries q, we want (preprocessing time) + q x
    (query time) to be minimal.
  • This defines the ratio between query time and
    pre-processing time, which is completely controlled by
    the algorithm inputs.
  • Meaning: if we know the number of queries in advance,
    we can fine-tune the algorithm as we wish.

8
Before we begin - scope
  • Assumptions:
  • No negative cycles
  • Inputs:
  • Directed weighted graph G = (V, E, w)
  • Weights are in {-M, ..., 0, ..., M}
  • Outputs:
  • A data structure that, given any two nodes,
    produces the shortest distance between them (and
    not the path itself) with high probability.

9
Matrix Multiplication
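  • The matrix product C = AB, where A is an $a \times b$
    matrix, B is $b \times c$, and C is an $a \times c$ matrix, is
    defined as follows: $c_{ij} = \sum_{k=1}^{b} a_{ik} b_{kj}$
  • Define $M(a, b, c)$ as the minimal number of
    algebraic operations for computing the matrix
    product.
  • Define $\omega(r, s, t)$ as the smallest
    exponent such that $M(n^r, n^s, n^t) = O(n^{\omega(r,s,t)})$, and write $\omega = \omega(1,1,1)$.
  • Theorem by Coppersmith and Winograd: $\omega < 2.376$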

10
Distance Products
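  • The distance product $C = A \star B$, where A is an $a \times b$
    matrix, B is $b \times c$, and C is an $a \times c$ matrix, is
    defined as follows: $c_{ij} = \min_{1 \le k \le b} \{ a_{ik} + b_{kj} \}$
  • Recall: if W is an n x n matrix of the edge
    weights of a graph (with zeros on the diagonal), then $W^n$, the n-th power
    under the distance product, is the distance
    matrix of the graph.
  • Lemma by Alon: the distance product of n x n matrices with entries in
    {-M, ..., M, +∞} can be computed in $\tilde{O}(M n^{\omega})$ time, where $\omega$ is the
    matrix multiplication exponent, i.e. almost as fast as regular matrix
    multiplication.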
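
As a minimal sketch of the definition above, the distance product can be written naively in Python. This is the cubic-time version, not the fast one from the lemma; the names `distance_product` and `distance_matrix` and the 3-node example graph are illustrative assumptions, not part of the lecture.

```python
INF = float("inf")  # stands for "no edge"


def distance_product(A, B):
    """Naive min-plus product: C[i][j] = min over k of A[i][k] + B[k][j]."""
    cols = len(B[0])
    C = [[INF] * cols for _ in A]
    for i, row in enumerate(A):
        for k, a in enumerate(row):
            if a == INF:
                continue  # no path through k from i
            for j in range(cols):
                C[i][j] = min(C[i][j], a + B[k][j])
    return C


def distance_matrix(W):
    """Square W repeatedly until paths of up to n-1 edges are covered.

    Assumes W has zeros on the diagonal and the graph has no negative cycles.
    """
    n = len(W)
    D = [row[:] for row in W]
    covered = 1  # D currently accounts for paths of at most `covered` edges
    while covered < n - 1:
        D = distance_product(D, D)
        covered *= 2
    return D


# Hypothetical example: edges 0->1 (3), 1->2 (-2), 2->0 (4).
W = [
    [0, 3, INF],
    [INF, 0, -2],
    [4, INF, 0],
]
D = distance_matrix(W)
print(D[0][2])  # distance from node 0 to node 2 via node 1
```

Repeated squaring needs only about log n products, which is why a fast distance product translates directly into a fast distance-matrix computation.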

11
State-of-the-art APSP
  • Randomized algorithm by Zwick that runs in time $\tilde{O}(M^{0.68} n^{2.58})$
  • Intuition:
  • Computing all short paths is expensive.
  • BUT long paths are made up of short paths: once
    we pay the initial price, we can leverage this
    work to compute longer paths with less effort.
  • Strategy: give up on certainty. With a small
    number of distance updates we can be almost sure
    that any long-enough path has at least one
    representative node that is updated.

12
Basic Operations
  • Truncation:
  • Replace any entry larger than t with $+\infty$.
  • Selection:
  • Extract from D the elements whose row indices are
    in A and whose column indices are in B, written D[A,B].
  • Min-Assignment:
  • Assign to each element the smaller of the
    two corresponding elements of D and D'.
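
The three operations can be sketched on plain Python lists as follows. The function names `truncate`, `select` and `min_assign` are assumptions chosen for illustration, and matrices are stored as lists of rows.

```python
INF = float("inf")


def truncate(D, t):
    """Truncation: replace any entry larger than t with +infinity."""
    return [[x if x <= t else INF for x in row] for row in D]


def select(D, A, B):
    """Selection: keep the rows whose indices are in A and the columns in B."""
    return [[D[i][j] for j in B] for i in A]


def min_assign(D, D2):
    """Min-assignment: element-wise minimum of two equally sized matrices."""
    return [[min(x, y) for x, y in zip(r1, r2)] for r1, r2 in zip(D, D2)]


D = [
    [0, 5, 9],
    [2, 0, 7],
    [INF, 1, 0],
]
print(truncate(D, 6))
print(select(D, [0, 2], [1]))
print(min_assign(D, [[1, 1, 1]] * 3))
```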

13
Pseudo-code
  • Simply sample nodes and multiply decimated
    matrices
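
The slide's pseudo-code did not survive in this transcript. The sketch below is a reconstruction of the sampling scheme the following slides analyse, under stated assumptions: iteration $\ell$ uses $s = (3/2)^\ell$, the sample B has $(9 n \ln n)/s$ nodes, and a naive min-plus product stands in for the fast distance product. The names `rand_short_path` and `min_plus` are illustrative.

```python
import math
import random

INF = float("inf")


def min_plus(A, B):
    """Naive distance (min-plus) product; stands in for the fast product."""
    cols = len(B[0])
    C = [[INF] * cols for _ in A]
    for i, row in enumerate(A):
        for k, a in enumerate(row):
            if a == INF:
                continue
            for j in range(cols):
                C[i][j] = min(C[i][j], a + B[k][j])
    return C


def rand_short_path(W, M, rng=random):
    """Randomized APSP sketch: sample nodes, multiply decimated matrices.

    W: n x n weight matrix, zeros on the diagonal, INF for missing edges,
       integer weights of absolute value at most M, no negative cycles.
    """
    n = len(W)
    V = list(range(n))
    D = [row[:] for row in W]
    iterations = math.ceil(math.log(n, 1.5)) if n > 1 else 0
    for l in range(1, iterations + 1):
        s = 1.5 ** l
        # Assumed sample size (9 n ln n) / s, capped at n.
        size = min(n, math.ceil(9 * n * math.log(n) / s))
        B = rng.sample(V, size)
        cap = s * M  # entries are limited to sM before the product
        left = [[D[i][k] if abs(D[i][k]) <= cap else INF for k in B] for i in V]
        right = [[D[k][j] if abs(D[k][j]) <= cap else INF for j in V] for k in B]
        P = min_plus(left, right)  # D[V,B] * D[B,V]
        D = [[min(D[i][j], P[i][j]) for j in V] for i in V]
    return D


# Hypothetical example: edges 0->1 (3), 1->2 (-2), 2->0 (4), so M = 4.
W = [[0, 3, INF], [INF, 0, -2], [4, INF, 0]]
D = rand_short_path(W, M=4)
print(D)
```

For graphs this small the sample size is capped at n in every iteration, so B is all of V and the result is exact; the probabilistic guarantee only matters once $(9 n \ln n)/s$ drops below n.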

14
On matrices and nodes
  • Column-decimated matrix D[V,B]
  • [Figure: D holds the distance between any two nodes; D[V,B] holds the
    shortest directed path from any node to any node in B.]
15
On matrices and nodes(2)
  • Row-decimated matrix D[B,V]
  • [Figure: D holds the distance between any two nodes; D[B,V] holds the
    shortest directed path from any node in B to any node.]
16
What do we prove?
  • Lemma: if there is a shortest path between nodes
    i and j in G that uses at most $(3/2)^\ell$ edges, then
    after the $\ell$-th iteration of the algorithm, with
    high probability we have $d_{ij} = \delta(i,j)$.
  • Meaning: at each iteration we update, with high
    probability, all the paths in the graph up to a
    certain length. This serves as a basis for the
    next iteration.

17
Proof Outline
  • By induction
  • Base case: easy, since the input W contains all paths
    of length 1.
  • Induction step:
  • Suppose that the claim holds for $\ell - 1$ and
    show that it also holds for $\ell$.
  • Take any two nodes whose shortest path uses
    at least $(3/2)^{\ell-1}$ edges. The $\ell$-th iteration
    matrix product will (almost certainly) plug in
    their shortest distance at location (i,j) of D.

18
Why?
  • Set $s = (3/2)^\ell$.
  • The path p from i to j has at least 2s/3 edges (and at most s).
  • This divides p into three subsections:
  • Left: at most s/3 edges
  • Right: at most s/3 edges
  • Middle: exactly s/3 edges

19
The Details
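  • The left and right thirds help attain the
    induction step.
  • For a node k in the middle third, the paths p(i,k) and p(k,j) are short enough, at
    most 2s/3 edges, so the previous step applies to them.
  • The middle third ensures that the failure
    probability is low enough.
  • Prob(no k is selected) $\le (1 - (9 \ln n)/s)^{s/3} \le n^{-3}$,
    for a random sample B of $(9 n \ln n)/s$ nodes.
  • The probability still goes to 0 (as n tends to
    infinity) after the computation of
  • $n^2$ entries
  • $\log_{3/2} n$ iterations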
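
The bound above can be spelled out in two steps. This is a sketch under the assumption that each node enters B independently with probability $(9 \ln n)/s$ (when that is below 1), which is the usual way to analyse a sample of about $(9 n \ln n)/s$ nodes; the middle third contains at least $s/3$ nodes.

```latex
% Step 1: a fixed pair (i,j) in a fixed iteration, using 1 - x \le e^{-x}
\Pr[\text{no node of the middle third is in } B]
  \le \left(1 - \frac{9 \ln n}{s}\right)^{s/3}
  \le \exp\!\left(-\frac{9 \ln n}{s} \cdot \frac{s}{3}\right)
  = e^{-3 \ln n} = n^{-3}

% Step 2: union bound over at most n^2 pairs and \lceil \log_{3/2} n \rceil iterations
\Pr[\text{some update fails}]
  \le n^2 \cdot \lceil \log_{3/2} n \rceil \cdot n^{-3}
  = \frac{\lceil \log_{3/2} n \rceil}{n} \longrightarrow 0
  \quad (n \to \infty)
```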

20
So
  • Assuming all previous steps were good enough:
  • With high probability, each long-enough path has a
    representative in B.
  • The update of D using the product $D[V,B] \star D[B,V]$
  • plugs in the correct result.
  • Note that:
  • Each element is first limited to sM.
  • This is necessary for the fast matrix multiplication
    algorithm.

21
Complexity
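  • Where does the trick hide?
  • The matrix alphabet (sM) grows with the
    iteration number.
  • The product size decreases with the iteration number.
  • For each iteration:
  • Alphabet size: sM
  • Product complexity: $\tilde{O}(sM \cdot n^{\omega(1,r,1)})$, where $n^r = |B|$ and
    $n^{\omega(1,r,1)}$ is the cost of multiplying an $n \times n^r$ matrix by an $n^r \times n$ matrix.
  • Total: the sum of these costs over all iterations.
  • Disregarding the log function, and optimizing
    between fast and naïve matrix products, we get $\tilde{O}(M^{0.68} n^{2.58})$.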

22
Fast Product versus Naive
  • [Figure: complexity of the fast product versus the naive product, assuming small M.]
23
Complexity Behavior
  • For a given matrix alphabet M, we find the
    cross-over point between the two matrix algorithms.
  • For high r (above an M-dependent threshold) we use FMM.
  • Complexity depends on M.
  • For low r (below the threshold) we use naïve
    multiplication.
  • Complexity does not depend on M.
  • Q: How does complexity change over the iteration
    number?

24
Pre-processing algorithm
  • Motivation:
  • We rarely query all node pairs.
  • Strategy:
  • Replace the costly matrix product $D[V,B] \star D[B,V]$
  • with 2 smaller products.
  • Generate a data structure such that each
    query costs only O(n).

25
Starting with the query
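  • Pseudo-code: query(i, j) returns $\min_{k \in V} \{ d_{ik} + d_{kj} \}$.
  • What is a sufficient trait of D, such that the
    returned value will be, with high probability, $\delta(i,j)$?
  • Answer: with high probability, some node k on the
    shortest path from i to j should have $d_{ik} = \delta(i,k)$ and $d_{kj} = \delta(k,j)$.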
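
A minimal sketch of such an O(n) query, assuming D is stored as a list of rows with `float("inf")` for unknown entries and zeros on the diagonal; the function name `query` and the example matrix are illustrative.

```python
INF = float("inf")


def query(D, i, j):
    """Return min over all k of D[i][k] + D[k][j], a single O(n) scan."""
    return min(D[i][k] + D[k][j] for k in range(len(D)))


# Hypothetical data structure: d(0,2) itself was never stored,
# but node 1 on the shortest path has both d(0,1) and d(1,2) correct.
D = [
    [0, 3, INF],
    [INF, 0, -2],
    [4, INF, 0],
]
print(query(D, 0, 2))
```

Because the scan includes k = i and k = j, an entry d(i,j) that is already correct is picked up as well.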

26
Preprocessing algorithm
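
The pseudo-code on this slide was lost in the transcript. The following is a hedged reconstruction from the surrounding slides, not the lecture's own listing: it assumes nested samples (each B drawn from the previous one, with the same $(9 n \ln n)/s$ size), two rectangular products per iteration with the previous sample as the inner dimension, and a naive min-plus product in place of the fast one. Truncation to sM is omitted because the naive product does not need it. All function names are illustrative.

```python
import math
import random

INF = float("inf")


def min_plus(A, B):
    """Naive distance (min-plus) product; stands in for the fast product."""
    cols = len(B[0])
    C = [[INF] * cols for _ in A]
    for i, row in enumerate(A):
        for k, a in enumerate(row):
            if a == INF:
                continue
            for j in range(cols):
                C[i][j] = min(C[i][j], a + B[k][j])
    return C


def preprocess(W, rng=random):
    """Build the query matrix D from W (zeros on the diagonal, no negative cycles)."""
    n = len(W)
    V = list(range(n))
    D = [row[:] for row in W]
    B = V
    iterations = math.ceil(math.log(n, 1.5)) if n > 1 else 0
    for l in range(1, iterations + 1):
        s = 1.5 ** l
        prev = B
        size = min(len(prev), math.ceil(9 * n * math.log(n) / s))
        B = rng.sample(prev, size)  # nested sample: B is a subset of prev
        # First product: D[V,B] <- min(D[V,B], D[V,prev] * D[prev,B])
        P = min_plus([[D[i][k] for k in prev] for i in V],
                     [[D[k][j] for j in B] for k in prev])
        for i in V:
            for c, j in enumerate(B):
                D[i][j] = min(D[i][j], P[i][c])
        # Second product: D[B,V] <- min(D[B,V], D[B,prev] * D[prev,V])
        P = min_plus([[D[i][k] for k in prev] for i in B],
                     [[D[k][j] for j in V] for k in prev])
        for r, i in enumerate(B):
            for j in V:
                D[i][j] = min(D[i][j], P[r][j])
    return D


def query(D, i, j):
    """O(n) distance query on the preprocessed matrix."""
    return min(D[i][k] + D[k][j] for k in range(len(D)))


# Hypothetical example: edges 0->1 (3), 1->2 (-2), 2->0 (4).
W = [[0, 3, INF], [INF, 0, -2], [4, INF, 0]]
D = preprocess(W)
print(query(D, 0, 2))
```

Each product here has one side of size |B| instead of n, which is where the saving over the APSP iteration comes from; the price is that a distance is only assembled at query time.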
27
New matrix type
  • Row/column-decimated matrix D[B,B]
  • [Figure: D is the query data structure for any two nodes; D[B,B] is the
    query data structure for any two nodes in B.]
28
What do we prove?
  • Lemma 4.1: If $i \in B_\ell$ or $j \in B_\ell$, and there is a
    shortest path from i to j in G that uses at most
    $(3/2)^\ell$ edges, then after the $\ell$-th iteration
    of the preprocessing algorithm, with high
    probability we have $d_{ij} = \delta(i,j)$.
  • Meaning: D has the necessary trait. For any path
    p, if we iterate long enough, then with high
    probability, for at least one node k (in p(i,j))
    the entries d(i,k), d(k,j) will contain shortest
    distances. Hence, the query will return the correct
    result.

29
Proof Outline - preprocess
  • By induction
  • Base case: easy, since B = V and the input W contains
    all paths of length 1.
  • Induction step:
  • Suppose that the claim holds for $\ell - 1$ and
    show that it also holds for $\ell$.
  • Take any two nodes whose shortest path uses
    at most $(3/2)^\ell$ edges. The two matrix products of
    the $\ell$-th iteration will (almost certainly) plug in
    their shortest distance at location (i,j) of D,
    provided that EITHER $i \in B_\ell$ or
    $j \in B_\ell$.

30
Why?
  • Set $s = (3/2)^\ell$.
  • The path p from i to j has at least 2s/3 edges (and at most s).
  • This divides p into three subsections:
  • Left: at most s/3 edges
  • Right: at most s/3 edges
  • Middle: exactly s/3 edges

31
The Details
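  • Assume that one endpoint, say j, is in $B_\ell$ (the other case is symmetric).
  • With high probability there will be a node k
    in p(i,j) such that $k \in B_{\ell-1}$ (remember why?).
  • Both k and that endpoint are in $B_{\ell-1}$, since $B_\ell \subseteq B_{\ell-1}$.
  • We therefore attain the induction step:
  • The paths p(i,k) and p(k,j) are short enough, at
    most 2s/3 edges, so the previous step applies to them.
  • The end-points of these paths (k) are in $B_{\ell-1}$.
  • Therefore their shortest distance is in D.
  • The corresponding product then updates d(i,j) correctly
    (the assumption is critical here).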

32
Where's the catch?
  • In APSP, we ensure that:
  • At every iteration $\ell$ we compute the shortest paths
    of length at most $(3/2)^\ell$.
  • BUT we had to update all pairs each time.
  • In the preprocess algorithm, we ensure that:
  • At every iteration $\ell$, we compute the shortest
    paths of length at most $(3/2)^\ell$ only for a
    selected subset.
  • BUT this subset covers all possible subsequent
    queries, with high probability.

33
Complexity
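  • Matrix product: instead of multiplying an n x |B| matrix
    by a |B| x n matrix, we only multiply an n x |B| matrix by a roughly |B| x |B| matrix.
  • As before, for each iteration with $s = (3/2)^\ell$,
    the alphabet size is sM.
  • Total complexity: $\tilde{O}(M n^{\omega})$
  • No matrix-product switch here!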

34
Performance
  • For small M, as long as the number of queries is
    less than about $n^{1.58}$, we get better results
    than APSP.
  • For small M:
  • The algorithm overtakes Goldberg's algorithm if
    the graph is dense.
  • For a dense-enough graph, we can
    run many SSSP queries and still be faster.

35
The larger picture
  • We saw
  • Alg1: heavy pre-processing, light query
  • Alg2: light pre-processing, heavy query
  • Alg3: ?
  • [Figure labels: query-oriented (APSP); preprocess-oriented (pre-process).]
36
The Third Way
  • Suppose we know in advance that we require no more
    than q queries.
  • We use the following:
  • Stage 1: perform a number of iterations of the
    APSP algorithm.
  • Stage 2: perform a number of iterations of the
    pre-process algorithm.
  • Take the sample B from the last iteration of stage 1.
    The product $D[i,B] \star D[B,j]$ returns $\delta(i,j)$
    in any shortest-distance query.

37
Huh?
  • After the first stage, D holds the shortest
    distances of all short paths, up to the length covered by that stage,
    with high probability.
  • When the second stage starts, it can be sure that
    the induction hypothesis holds for all the iterations already performed.
  • The second stage takes care of the long paths,
    with respect to querying. Meaning:
  • If the path is long, it will have a representative
    in one of the second-stage iterations.
  • If it is too short, it will fall under the
    jurisdiction of the first stage.

38
Complexity
  • The first stage ( updates) costs at most
  • The second stage costs only
  • The query costs
  • For example if want to answer a distance query
    in , we can pre-process in time

39
Q&A (I ask - you answer)
  • Q: Why couldn't we sample B in the query step of
    Alg2, the one that initially costs O(n)?
  • A: Because if the path is too short, we
    have no guarantee that it will have a
    representative in B. Alg3 solves this because
    short distances are computed rigorously.
  • Conclusion: the less we sample out of V when we
    query, the more steps of APSP we need to run to
    begin with.

40
Final Procedure
  • Given q queries, determine the query complexity
    using .
  • This assumes M is small enough so that we use
    fast product. Otherwise compare to
  • Execute alg3 using steps of
    APSP and steps of pre-process
  • Query all q queries.

41
Summary
  • For the problem we defined (a directed graph with
    integer weights whose absolute value is at most
    M), we have seen:
  • Alg1: state-of-the-art APSP in $\tilde{O}(M^{0.68} n^{2.58})$
  • Alg2: state-of-the-art SSSP in $\tilde{O}(M n^{\omega})$
  • Alg3: a method to calibrate between the two, for
    a known number of queries.

42
Thank You!