1
A Fine-Grain Hypergraph Model for 2D
Decomposition of Sparse Matrices
Umit V. Catalyurek and Cevdet Aykanat
Department of Computer Engineering, Bilkent University
2
Outline
  • Graph Partitioning
  • Hypergraph Partitioning
  • Standard Graph Model for Sparse Matrix
    Representation
  • Fine-Grain Hypergraph Model for Sparse Matrix
    Representation
  • Experimental Results
  • Applicability of the Fine-Grain Hypergraph Model

3
Graph Partitioning
  • Graph G = (V, E): a set of vertices V and a set of edges E
  • every edge eij ∈ E connects a pair of distinct vertices vi and vj
  • K-way graph partition by edge separator: Π = {V1, V2, ..., VK}
  • each Vk is a nonempty subset of V, i.e., Vk ⊆ V
  • parts are pairwise disjoint, i.e., Vk ∩ Vl = ∅ for k ≠ l
  • the union of the K parts is equal to V, i.e., ∪_{k=1..K} Vk = V
  • an edge eij is said to be
  • cut if vi ∈ Vk and vj ∈ Vl with k ≠ l
  • uncut if vi ∈ Vk and vj ∈ Vk
  • a partition is said to be balanced if
  • Wk ≤ Wavg (1 + ε) for every part
  • Wk: weight of part Vk; ε: maximum imbalance ratio
  • cost of a partition (see the sketch below)
  • cutsize(Π) = Σ_{eij ∈ EE} w(eij)
  • where EE is the set of cut edges
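
As an illustration, a minimal Python sketch of these definitions (not from the talk; the 4-vertex graph, unit weights, and 2-way partition are made-up examples):

    # Edge-cut cost and the balance test Wk <= Wavg * (1 + eps).
    def cutsize(edges, part, weight):
        # sum w(eij) over edges whose endpoints lie in different parts
        return sum(weight[e] for e in edges if part[e[0]] != part[e[1]])

    def is_balanced(vertex_weight, part, K, eps):
        # check Wk <= Wavg * (1 + eps) for every part
        totals = [0.0] * K
        for v, w in vertex_weight.items():
            totals[part[v]] += w
        w_avg = sum(totals) / K
        return all(wk <= w_avg * (1 + eps) for wk in totals)

    edges = [(0, 1), (1, 2), (2, 3), (0, 3)]               # a 4-cycle
    part = {0: 0, 1: 0, 2: 1, 3: 1}                        # 2-way partition
    print(cutsize(edges, part, {e: 1 for e in edges}))     # -> 2
    print(is_balanced({v: 1 for v in part}, part, 2, 0.1)) # -> True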

4
Hypergraph Partitioning
  • Hypergraph H = (V, N): a set of vertices V and a set of nets N
  • nets (hyperedges) connect two or more vertices
  • every net nj ∈ N is a subset of vertices, i.e., nj ⊆ V
  • a graph is a special instance of a hypergraph
  • K-way hypergraph partition: Π = {V1, V2, ..., VK}
  • a net that has at least one pin in a part is said to connect that part
  • connectivity set C(nj) of a net nj: the set of parts connected by nj
  • connectivity c(nj) = |C(nj)| of a net nj: the number of parts connected by nj
  • a net nj is said to be
  • cut if c(nj) > 1
  • uncut if c(nj) = 1
  • two cutsize definitions widely used in the VLSI community (see the sketch below)
  • net-cut metric: cutsize(Π) = Σ_{nj ∈ NE} w(nj)
  • connectivity-1 metric: cutsize(Π) = Σ_{nj ∈ NE} w(nj) (c(nj) - 1)
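
A minimal Python sketch of both metrics (not from the talk); the pins below are hypothetical but reproduce the connectivity values of the example on the next slide:

    # `nets` maps a net id to its set of pins; `part` maps vertex -> part.
    def connectivity_set(pins, part):
        # C(nj): the set of parts connected by the net
        return {part[v] for v in pins}

    def net_cut(nets, part, w):
        return sum(w[n] for n, pins in nets.items()
                   if len(connectivity_set(pins, part)) > 1)

    def conn_minus_1(nets, part, w):
        return sum(w[n] * (len(connectivity_set(pins, part)) - 1)
                   for n, pins in nets.items())

    nets = {"n1": {0, 1}, "n8": {0, 1, 2}, "n15": {3, 4, 5}, "n2": {0, 3}}
    part = {0: 0, 1: 1, 2: 2, 3: 0, 4: 1, 5: 2}
    w = {n: 1 for n in nets}
    print(net_cut(nets, part, w))       # -> 3 cut nets
    print(conn_minus_1(nets, part, w))  # -> 1 + 2 + 2 = 5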

5
Hypergraph Partitioning
  • cut nets: NE = {n1, n8, n15}
  • connectivity sets
  • C(n1) = {V1, V2}
  • C(n8) = C(n15) = {V1, V2, V3}
  • connectivity values
  • c(n1) = 2, c(n8) = c(n15) = 3
  • cutsize values, assuming unit net weights
  • net-cut metric: cutsize(Π) = |NE| = 3
  • connectivity-1 metric: cutsize(Π) = 1 + 2 + 2 = 5

6
Parallel Matrix-Vector Multiplication y = Ax
  • Parallel iterative solvers
  • 1D rowwise or columnwise partitioning of A
  • symmetric partitioning to avoid communication during linear vector operations
  • all vectors are divided conformally with the row or column partitioning
  • symmetric row/column permutation on A
  • processor Pk computes linear vector operations on the k-th blocks of vectors
  • rowwise: Pk computes yk = A_r^k x (see the sketch below)
  • entries of the x-vector are communicated
  • columnwise: Pk computes y^k = A_c^k xk, where y = Σ_k y^k
  • entries of the y^k vectors are communicated
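
A minimal NumPy sketch of the rowwise scheme (not from the talk), assuming x is partitioned conformally with the rows, so Pk owns x_j for its own rows j:

    import numpy as np

    def rowwise_spmv(A, x, row_blocks):
        # row_blocks: one list of row indices per "processor" Pk
        y = np.zeros(A.shape[0])
        volume = 0
        for rows in row_blocks:
            cols = {j for i in rows for j in np.nonzero(A[i])[0]}
            volume += len(cols - set(rows))  # x entries Pk must receive
            y[rows] = A[rows] @ x            # local product yk = A_r^k x
        return y, volume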

7
Graph Model for Representing Sparse Matrices
  • standard graph model G = (V, E) for matrix A
  • vertex set V: one vertex vi for each row/column i of A
  • vi ∈ V ↔ task i of computing the inner product yi = <ri, x>
  • node weighting: w(vi) = number of nonzeros in row ri
  • edge set E: (vi, vj) ∈ E ⟺ aij ≠ 0 and aji ≠ 0
  • each edge denotes a bidirectional interaction between tasks i and j
  • edge (vi, vj) ∈ E ⟹ yi ← yi + aij xj and yj ← yj + aji xi
  • exchange of xi and xj values before the local matrix-vector products
  • rows ri and rj assigned to different processors
  • ⟹ communication of two words
  • edge weighting: w(vi, vj) = 2 (see the sketch below)
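
A minimal Python sketch of this construction (not from the talk), assuming a structurally symmetric A given as a dense NumPy array:

    import numpy as np

    def standard_graph_model(A):
        n = A.shape[0]
        vweight = {i: int(np.count_nonzero(A[i])) for i in range(n)}
        edges = [(i, j) for i in range(n) for j in range(i + 1, n)
                 if A[i, j] != 0 and A[j, i] != 0]
        eweight = {e: 2 for e in edges}  # two words exchanged if cut
        return vweight, edges, eweight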

8
Graph Model Minimizes the Wrong Metric
  • a 4-way rowwise partition of a sample symmetric matrix in the graph model
  • cost(Π) = 2 × 5 = 10 words, but the actual communication volume is 7 words
  • P1 sends xi to both P2 and P4
  • P2 and P4 send {xj, xk, xl} and {xm, xh}, respectively, to P1
  • the graph model tries to minimize the total number of off-block-diagonal nonzeros
  • it treats each off-block-diagonal nonzero entry as if it incurs a distinct communication
  • but nonzeros in the same column of an off-diagonal block necessitate the communication of only a single x value (see the sketch after the figure)

[Figure: the sample matrix's 4-way rowwise partition (rows i, j, k, l, m, h across parts P1-P4) and its standard graph model with vertices vi, vj, vk, vl, vm, vh]
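
The gap can be checked directly; a minimal sketch (not from the talk), where part[i] is the processor of row i and x is distributed conformally with the rows:

    import numpy as np

    def graph_model_cost(A, part):
        # 2 words per cut edge, i.e., per symmetric off-diagonal pair
        n = A.shape[0]
        return sum(2 for i in range(n) for j in range(i + 1, n)
                   if A[i, j] != 0 and A[j, i] != 0 and part[i] != part[j])

    def true_volume(A, part):
        # each x_j is sent at most once to every part that needs it
        n = A.shape[0]
        transfers = {(j, part[i]) for i in range(n) for j in range(n)
                     if A[i, j] != 0 and part[i] != part[j]}
        return len(transfers)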
9
Fine-Grain Hypergraph Model
  • an M × M matrix A with Z nonzeros is represented by H = (V, N)
  • Z vertices: one vertex vij for each aij ≠ 0
  • 2M nets: one net for each row and one for each column of A
  • N = NR ∪ NC
  • row nets NR = {m1, m2, ..., mM}
  • column nets NC = {n1, n2, ..., nM}
  • vij ∈ mi and vij ∈ nj iff aij ≠ 0 (see the sketch below)
  • column net nj represents the dependency of the atomic tasks on xj
  • row net mi represents the dependency of computing yi on the partial yi' results
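
A minimal Python sketch of the construction (not from the talk), with A as a dense NumPy array; `fine_grain_hypergraph` is a hypothetical helper name:

    import numpy as np

    def fine_grain_hypergraph(A):
        vertices, row_nets, col_nets = [], {}, {}
        for i, j in zip(*np.nonzero(A)):
            v = (int(i), int(j))                       # vertex vij
            vertices.append(v)
            row_nets.setdefault(int(i), set()).add(v)  # pin vij in mi
            col_nets.setdefault(int(j), set()).add(v)  # pin vij in nj
        return vertices, row_nets, col_nets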

10
Fine-Grain Hypergraph Model
[Figure: the fine-grain hypergraph of a sample matrix, with one vertex for each nonzero]
11
Fine-Grain Hypergraph Model for 2D Decomposition
  • unit net weighting: w(n) = 1 for each net n ∈ N
  • use the connectivity-1 metric: cutsize(Π) = Σ_{n ∈ NE} (c(n) - 1)
  • minimizing cutsize corresponds to minimizing the total volume of communication
  • consistency of the model
  • exact correspondence between cutsize and communication volume
  • maintain symmetric partitioning: yi and xi assigned to the same processor
  • consistency condition
  • vii ∈ ni and vii ∈ mi for each vertex vii (holds iff aii ≠ 0)
  • consider a K-way partition Π = {V1, V2, ..., VK} of H(V, N)
  • Π induces a partition on the nonzeros of matrix A
  • decode: vii ∈ Vk ⟹ assign yi and xi to processor Pk (see the sketch below)
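
A minimal sketch (not from the talk) that computes this volume from the row/column nets built by the earlier construction sketch; `part` maps each nonzero vertex vij to its processor:

    def comm_volume(row_nets, col_nets, part):
        # with unit weights, total volume = sum of (c(n) - 1) over all nets
        total = 0
        for nets in (row_nets, col_nets):
            for pins in nets.values():
                total += len({part[v] for v in pins}) - 1  # c(n) - 1
        return total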

12
Fine-Grain Hypergraph Model for 2D Decomposition
[Figure: a fine-grain partition of an 8 × 8 sample matrix with its cut row and column nets]
cutsize(Π) = 8 = communication volume (8 words)
13
Experimental Results
14
Experimental Results
15
Applicability of the Model
  • Parallel reduction
  • columns / x-vector inputs
  • rows / y-vector outputs
  • nonzeros / input-to-output mapping computation
  • Fine-grain hypergraph model
  • models the workload partitioning
  • does not restrict the place of computation to the owner of the input or the output
  • for each input/output pair, the computation can be done on any of the processors
  • communication volume is minimized and the workload is balanced!
  • directly and exactly models the total communication volume (see the sketch below)
  • column nets (dependency on inputs) model pre-communication
  • row nets (dependency of an output on partial outputs) model post-communication
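
A minimal sketch of the two phases on an entry-wise partition (not from the talk); `entries`, `part`, and `owner` are hypothetical names, with owner[i] holding xi and yi per the symmetric-partitioning condition:

    def spmv_two_phase(entries, x, part, owner):
        # entries maps (i, j) -> aij; part[(i, j)] is the processor of
        # that nonzero; owner[i] is the processor holding xi and yi.
        # pre-communication: xj expands once to each non-owner part
        pre = {(j, part[(i, j)]) for (i, j) in entries
               if part[(i, j)] != owner[j]}
        # local multiply: each processor accumulates partial yi results
        partial = {}
        for (i, j), a in entries.items():
            key = (i, part[(i, j)])
            partial[key] = partial.get(key, 0.0) + a * x[j]
        # post-communication: partial yi folds back to the owner of row i
        post = {(i, p) for (i, p) in partial if p != owner[i]}
        y = {}
        for (i, p), val in partial.items():
            y[i] = y.get(i, 0.0) + val
        return y, len(pre), len(post)  # volumes of the two phases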

16
End of Talk