Sparse Matrix Techniques (Tutorial)

About This Presentation

Title:

Sparse Matrix Techniques (Tutorial)

Description:

Computer representations of sparse matrices. Sparse matrix-vector ... 'triplets' format ({i, j, val}) is not sufficient . . . Storage: 2*NNZ integers, NNZ reals ... – PowerPoint PPT presentation

Number of Views:569

Avg rating:3.0/5.0

Slides: 19

Provided by: csBer

Category:

more less

Transcript and Presenter's Notes

Title: Sparse Matrix Techniques (Tutorial)

1
Sparse Matrix Techniques(Tutorial)

X. Sherry Li
Lawrence Berkeley National Lab
Math 290 / CS 298, UCB
Jan. 31, 2007

2
Outline

Part I
Computer representations of sparse matrices
Sparse matrix-vector multiply with various
storages
Performance optimizations
Part II
Techniques for sparse factorizations
(e.g., SuperLU solver)

3
Sparse Storage Schemes

Notation
N dimension
NNZ number of nonzeros
Assume arbitrary sparsity pattern
triplets format (i, j, val) is not sufficient
. . .
Storage 2NNZ integers, NNZ reals
Not easy to randomly access one row or column
Linked list format provides flexibility, but not
friendly on modern architectures . . .
Cannot call BLAS directly

4
Compressed Row Storage (CRS)

Store nonzeros row by row contiguously
Example N 7, NNZ 19
3 arrays
Storage NNZ reals, NNZN1 integers

5
SpMV (y Ax) with CRS

dot product
No locality for x
Vector length usually short
Memory-bound 3 reads, 2 flops

6
Compressed Column Storage (CCS)

Also known as Harwell-Boeing format
Store nonzeros columnwise contiguously
3 arrays
Storage NNZ reals, NNZN1 integers

7
SpMV (y Ax) with CCS

SAXPY
No locality for y
Vector length usually short
Memory-bound 3 reads, 1 write, 2 flops

y(i) 0.0, i 1N do j 1, N . . .
column j of A t x(j) do i colptr(j),
colptr(j1) 1 y(rowind(i))
y(rowind(i)) nzval(i) t enddo enddo
8
Jagged Diagonal Storage (JDS)

Also known as ITPACK, or Ellpack storage Saad,
Kincaid et al.
Force all rows to have the same length as the
longest row,
then columns are stored contiguously
2 arrays nzval(N,L) and colind(N,L), where L
max row length
NL reals, NL integers
Usually L ltlt N

9
SpMV with JDS

Neither dot nor SAXPY
Good for vector processor long vector length (N)
Extra memory, flops for padded zeros, especially
bad if row lengths vary a lot

y(i) 0.0, i 1N do j 1, L do i 1, N
y(i) y(i) nzval(i, j) x(colind(i,
j)) enddo enddo
10
Segmented-Sum Blelloch et al.

Data structure is an augmented form of CRS
Computational structure is similar to JDS
Each row is treated as a segment in a long vector
Underlined elements denote the beginning of each
segment
(i.e., a row in A)
Dimension S L NNZ, where L is chosen to
approximate the hardware vector length

11
SpMV with Segmented-Sum

2 arrays nzval(S, L) and colind(S, L), where SL
NNZ
NNZ reals, NNZ integers
Good for vector processors
SpMV is performed bottom-up, with each row-sum
(dot) of Ax stored in the beginning of each
segment
Similar to JDS, but with more control logic in
inner-loop

do i S, 1 do j 1, L . . .
enddo enddo
12
Performance (megaflop rate) Gaeke et al.

Test matrix N 10000, NNZ 177782, random
pattern
18 nonzeros per row on average
JDS does 4.6x more operations

machine Ultra 2i Pentium 4 VIRAM
Clock rate 333 MHz 1.5 GHz 200 MHz
Peak flop rate 667 Mflops 1.5 Gflops 1.6 Gflops
CRS 29 209 110
JDS (effective) 27 6 17 4 632 137
Seg-Sum 5 29 165
13
Optimization Techniques

Matrix reordering
For CRS SpMV, can improve x-vector locality by
reducing the bandwidth of matrix A
Example reverse Cuthill-McKee (breadth-first
search)
Observed 2-3x improvement Toledo, et al.

14
Optimization Techniques

Register blocking
Find dense blocks of size r-by-c in A
(If needed, allow some zeros to be filled in)
Ax is proceeded block by block
keep c elements of x and r elements of y in
registers
x element re-used r times, y element re-used c
times
Amount of indexed load and store is reduced
Obtained up to 2.5x improvement Vuduc et al.

15
SPARSITY Im, Yelick
16
Performance Improvement Vuduc et al.
17
Other Representations

Block entry formats (e.g., multiple degrees of
freedom are associated with a single physical
location)
Constant block size (BCRS)
Varying block sizes (VBCRS)
Skyline (or profile) storage (SKS)
Lower triangle stored row by row
Upper triangle stored column by column
In each row (column), first nonzero
defines a profile
All entries within the profile
(some may be zeros) are stored

18
References

Templates for the solution of linear systems
Barrett, et al., SIAM, 1994
BeBOP http//bebop.cs.berkeley.edu/
Sparse BLAS standard
http//www.netlib.org/blas/blast-forum

Write a Comment

User Comments (0)

About PowerShow.com

Sparse Matrix Techniques (Tutorial) - PowerPoint PPT Presentation

Sparse Matrix Techniques (Tutorial)

Computer representations of sparse matrices. Sparse matrix-vector ... 'triplets' format ({i, j, val}) is not sufficient . . . Storage: 2*NNZ integers, NNZ reals ... – PowerPoint PPT presentation