Title: Matrix Vector Multiplication Summary
1. Matrix-Vector Multiplication (Summary)
2. Parallel Matrix-Vector Multiplication
- We want to multiply a matrix by a vector (a minimal sequential sketch follows this list)
- If the matrix has m rows and n columns, then the vector must have n elements
- Parallelization
  - We can divide the vector
    - Into vectors of smaller sizes
    - Or replicate it on each processor (vectors require less memory than the matrix)
  - We can divide the matrix
    - Along the rows
    - Along the columns
    - Into small blocks
Dividing the data among processors
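As a point of reference, here is a minimal sequential sketch of the operation being parallelized. The names (matvec, A, x, y) are illustrative, not from the slides.

/* Sequential matrix-vector product y = A x, where A is m x n (row-major)
 * and x has n elements. */
void matvec(const double *A, const double *x, double *y, int m, int n)
{
    for (int i = 0; i < m; i++) {          /* one inner product per row */
        y[i] = 0.0;
        for (int j = 0; j < n; j++)
            y[i] += A[i * n + j] * x[j];
    }
}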
3. Three Algorithms
- Decompose the matrix along rows and replicate the vector
- Decompose the matrix along columns and divide the vector
- Decompose the matrix into blocks and divide the vector
NOTE: It makes sense to divide up the vector when the matrix has fewer rows
4. Analyzing a Parallel Algorithm
- Algorithmic characteristics
  - Partitioning
  - Communication
  - Agglomeration and mapping
- Performance evaluation
  - Computational complexity
  - Communication complexity
  - Scalability
5. Method 1: Matrix Divided by Rows, Vector Replicated
[Figure: example vector (9, 7, 2, 5) replicated on every process]
6. Function MPI_Allgatherv
7. Header for MPI_Allgatherv
int MPI_Allgatherv (
    void         *send_buffer,
    int           send_cnt,
    MPI_Datatype  send_type,
    void         *receive_buffer,
    int          *receive_cnt,
    int          *receive_disp,
    MPI_Datatype  receive_type,
    MPI_Comm      communicator)
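A hedged sketch of how the rowwise method might use this call: each process computes the inner products for its block of rows and then an all-gather assembles the full result on every process. The names (rowwise_matvec, local_A, recv_cnts, recv_disps, and so on) are illustrative assumptions, not from the slides.

#include <mpi.h>
#include <stdlib.h>

/* Rowwise block decomposition with a replicated vector (sketch). */
void rowwise_matvec(double *local_A,   /* local_rows x n block of rows        */
                    double *x,         /* full vector of length n, replicated */
                    double *y,         /* output: full result of length n     */
                    int local_rows, int n,
                    int *recv_cnts, int *recv_disps, /* per-process counts/offsets */
                    MPI_Comm comm)
{
    /* Each process computes the inner products for its own rows. */
    double *local_y = malloc(local_rows * sizeof(double));
    for (int i = 0; i < local_rows; i++) {
        local_y[i] = 0.0;
        for (int j = 0; j < n; j++)
            local_y[i] += local_A[i * n + j] * x[j];
    }

    /* All-gather the partial results so every process holds the full y. */
    MPI_Allgatherv(local_y, local_rows, MPI_DOUBLE,
                   y, recv_cnts, recv_disps, MPI_DOUBLE, comm);
    free(local_y);
}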
8. Agglomeration and Mapping
- Static number of tasks
- Regular communication pattern (all-gather)
- Computation time per task is constant
- Strategy
  - Agglomerate groups of rows
  - Create one task per MPI process
9. Complexity Analysis
- Sequential algorithm complexity: Θ(n²)
- Parallel algorithm computational complexity: Θ(n²/p)
  - Each processor has n/p rows of width n
  - Each inner product takes n operations, giving n(n/p) in total
- Communication complexity of the all-gather: Θ(log p + n)
  - The gather takes log(p) steps
  - Each processor holds n/p elements
  - It sends them to p-1 processors
  - Total: log(p)·latency + (n/p)(p-1)/bandwidth
- Overall complexity: Θ(n²/p + log p)
10. Isoefficiency Analysis
- Sequential time complexity: Θ(n²)
- The only parallel communication is the all-gather
- Communication complexity: log(p)·latency + (n/p)(p-1)/bandwidth
- When n is large, message transmission time dominates message latency
- Parallel communication time: Θ(n)
- Isoefficiency relation: T(1,n) ≥ C·T0(n,p)
- n² ≥ Cpn ⇒ n ≥ Cp, with M(n) = n²
- The system is not highly scalable (a compact derivation follows this list)
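A compact sketch of that conclusion, using the standard scalability function M(f(p))/p; the resulting C²p is the value quoted later on slide 20 for the rowwise algorithm.

% Isoefficiency of the rowwise algorithm (sketch)
% T(1,n): sequential time; T0(n,p): total parallel overhead; C: efficiency constant
\begin{align*}
T(1,n) &= \Theta(n^2), \qquad
T_0(n,p) = p \cdot \Theta(n) = \Theta(pn)
  && \text{each of the $p$ processes spends $\Theta(n)$ in the all-gather} \\
T(1,n) &\ge C\,T_0(n,p)
  \;\Longrightarrow\; n^2 \ge C p n
  \;\Longrightarrow\; n \ge C p \\
\frac{M(Cp)}{p} &= \frac{(Cp)^2}{p} = C^2 p
  && \text{memory per processor grows linearly with $p$: not highly scalable}
\end{align*}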
11. Method 2: Matrix Divided by Columns, Vector Decomposed
[Figure: example — the vector (1, 0, 1, 2) is decomposed across the processes, and the columns of the 4×4 matrix
   2 0 6 1
   1 0 0 1
   0 0 2 3
   2 0 2 3
 are distributed among them]
12. Getting Columns of the Matrix
- In memory, matrices are stored as rows
- Let one process handle the I/O
- Read the matrix from memory and distribute it as columns
13. Function MPI_Scatterv
14. Header for MPI_Scatterv
int MPI_Scatterv (
    void         *send_buffer,
    int          *send_cnt,
    int          *send_disp,
    MPI_Datatype  send_type,
    void         *receive_buffer,
    int           receive_cnt,
    MPI_Datatype  receive_type,
    int           root,
    MPI_Comm      communicator)
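One possible way to realize the column distribution with this call, written as a sketch: since C stores the matrix by rows, the root packs the columns contiguously before scattering. All names (scatter_columns, packed, send_cnts, send_disps) are illustrative; a derived datatype would be an alternative to the explicit packing.

#include <mpi.h>
#include <stdlib.h>

/* Distribute columns of a row-major n x n matrix with MPI_Scatterv (sketch). */
void scatter_columns(double *A,          /* full matrix, valid on root only        */
                     double *local_cols, /* output: n x local_n block, column-major */
                     int n, int local_n, /* local_n = columns owned by this process */
                     int *send_cnts, int *send_disps, /* element counts/offsets, root only */
                     int root, MPI_Comm comm)
{
    int rank;
    MPI_Comm_rank(comm, &rank);

    double *packed = NULL;
    if (rank == root) {
        /* Pack column j of A into a contiguous run of n elements. */
        packed = malloc((size_t)n * n * sizeof(double));
        for (int j = 0; j < n; j++)
            for (int i = 0; i < n; i++)
                packed[j * n + i] = A[i * n + j];
    }

    /* Each process receives send_cnts[rank] = n * local_n elements. */
    MPI_Scatterv(packed, send_cnts, send_disps, MPI_DOUBLE,
                 local_cols, n * local_n, MPI_DOUBLE, root, comm);

    if (rank == root) free(packed);
}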
15. Communication
- After calculating its inner products, each processor sends its entire partial vector to the other processes
[Figure: partial result vectors being exchanged among the processes]
16. Function MPI_Alltoallv
17. Header for MPI_Alltoallv
int MPI_Alltoallv (
    void         *send_buffer,
    int          *send_cnt,
    int          *send_disp,
    MPI_Datatype  send_type,
    void         *receive_buffer,
    int          *receive_cnt,
    int          *receive_disp,
    MPI_Datatype  receive_type,
    MPI_Comm      communicator)
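A sketch of how the columnwise method might use this call: each process holds a length-n partial vector, sends block j of it to process j, and then sums the p pieces it receives for its own block. Names (exchange_and_reduce, partial, cnts, disps) are illustrative assumptions.

#include <mpi.h>
#include <stdlib.h>

/* Redistribute and reduce partial results in the columnwise algorithm (sketch). */
void exchange_and_reduce(double *partial,  /* length n: this process's partial sums   */
                         double *local_y,  /* output: owned block of the result       */
                         int local_n, int p,
                         int *cnts, int *disps, /* cnts[j], disps[j]: size and offset of
                                                   process j's block in the full vector */
                         MPI_Comm comm)
{
    /* Receive p contributions of size local_n, one from every process. */
    double *recv  = malloc((size_t)p * local_n * sizeof(double));
    int *rcnts    = malloc(p * sizeof(int));
    int *rdisps   = malloc(p * sizeof(int));
    for (int j = 0; j < p; j++) { rcnts[j] = local_n; rdisps[j] = j * local_n; }

    /* Send block j of the partial vector to process j. */
    MPI_Alltoallv(partial, cnts, disps, MPI_DOUBLE,
                  recv, rcnts, rdisps, MPI_DOUBLE, comm);

    /* Sum the p partial contributions for each owned element. */
    for (int i = 0; i < local_n; i++) {
        local_y[i] = 0.0;
        for (int j = 0; j < p; j++)
            local_y[i] += recv[j * local_n + i];
    }
    free(recv); free(rcnts); free(rdisps);
}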
18. Agglomeration and Mapping
- Static number of tasks
- Regular communication pattern (all-to-all)
- Computation time per task is constant
- Strategy
  - Agglomerate groups of columns
  - Create one task per MPI process
19. Complexity Analysis
- Sequential algorithm complexity: Θ(n²)
- Parallel algorithm computational complexity: Θ(n²/p)
  - Each processor has n/p columns
  - Each column has n elements
- Communication complexity: Θ(p + n log(p))
  - The scatter takes log(p) steps, sending n/p elements to (p-1) processors
    - log(p)·latency + (n/p)(p-1)/bandwidth
  - All-to-all
    - Option 1: each process sends a message to the rest, sending each destination only the part it requires
      - Number of messages: (p-1)
      - Total amount sent: O(n)
      - Complexity: (p-1)·latency + n/bandwidth
- Total: Θ(p + n²/p)
20. Isoefficiency Analysis
- Sequential time complexity: Θ(n²)
- The only parallel overhead is the all-to-all
- When n is large, message transmission time dominates message latency
- Parallel communication time: Θ(n + p log(p))
- n² ≥ Cpn ⇒ n ≥ Cp
- Scalability function is the same as for the rowwise algorithm: C²p
21. Printing the Result Vector
- To view the vector in order of its indices, only one processor should print it
- This is the opposite of a scatter: a gather operation
[Figure: the blocks of the result vector collected onto a single process]
22. Function MPI_Gatherv
23. Header for MPI_Gatherv
int MPI_Gatherv (
    void         *send_buffer,
    int           send_cnt,
    MPI_Datatype  send_type,
    void         *receive_buffer,
    int          *receive_cnt,
    int          *receive_disp,
    MPI_Datatype  receive_type,
    int           root,
    MPI_Comm      communicator)
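A sketch of the printing step described above, assuming a block-decomposed result vector; the names (print_result, local_y, cnts, disps) are illustrative.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* Gather the distributed result onto one process and print it in index order. */
void print_result(double *local_y, int local_n,   /* this process's block            */
                  int *cnts, int *disps, int n,   /* per-process counts/offsets, total length */
                  int root, MPI_Comm comm)
{
    int rank;
    MPI_Comm_rank(comm, &rank);

    double *y = NULL;
    if (rank == root)
        y = malloc((size_t)n * sizeof(double));

    /* Collect every block at the root in index order. */
    MPI_Gatherv(local_y, local_n, MPI_DOUBLE,
                y, cnts, disps, MPI_DOUBLE, root, comm);

    if (rank == root) {
        for (int i = 0; i < n; i++)
            printf("%g\n", y[i]);
        free(y);
    }
}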
24. Count/Displacement Arrays
- Scatterv, Gatherv, and Alltoallv require two kinds of count/displacement arrays (a sketch of building them follows this list)
- First pair for the values being sent
  - send_cnt: number of elements (always)
  - send_disp: index of the first element (if breaking up an array)
- Second pair for the values being received
  - recv_cnt: number of elements (always)
  - recv_disp: index of the first element (if joining into an array)
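A minimal sketch of how such arrays might be built for a block decomposition of n elements over p processes; the helper name make_cnts_disps and the low/high formulas follow the usual block-distribution convention and are assumptions, not taken from the slides.

#include <stdlib.h>

/* Build count and displacement arrays for a block decomposition. */
void make_cnts_disps(int n, int p, int **cnts_out, int **disps_out)
{
    int *cnts  = malloc(p * sizeof(int));
    int *disps = malloc(p * sizeof(int));
    for (int k = 0; k < p; k++) {
        /* Process k owns elements [k*n/p, (k+1)*n/p). */
        int low  = (k * n) / p;
        int high = ((k + 1) * n) / p;
        cnts[k]  = high - low;   /* number of elements sent to / received from k */
        disps[k] = low;          /* index of k's first element in the full array */
    }
    *cnts_out  = cnts;
    *disps_out = disps;
}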
25. Method 3: Matrix Divided into Cross-Sections (Blocks), Vector Decomposed
26. Decomposing the Vector
27. Agglomeration and Mapping
- Static number of tasks
- Regular communication pattern (all-to-all)
- Computation time per task is constant
- Strategy
  - Agglomerate groups of columns
  - Create one task per MPI process
28. Complexity Analysis
- Assume p is a square number
- Each process does its share of the computation: Θ(n²/p)
  - Each block is an (n/√p) × (n/√p) matrix
- Redistributing b: Θ(log(√p)·(n/√p))
  - Sending n/√p elements
  - Can be done in log(√p) steps (sends to log(√p) processors)
- Reduction of the partial result vectors: Θ(log(√p)·(n/√p))
  - log(√p) communication steps
  - Have to add n/√p elements at each step
- Overall parallel complexity: Θ(n²/p + log(√p)·(n/√p))
29. Isoefficiency Analysis
- Sequential complexity: Θ(n²)
- Parallel communication complexity: Θ(log(√p)·(n/√p)) = Θ(n log p / √p)
- Isoefficiency function: n² ≥ C n √p log p ⇒ n ≥ C √p log p
- This system is much more scalable than the previous two implementations (see the sketch after this list)
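A short sketch of why this decomposition scales better, using the same memory function M(n) = n² as before; the C²p figure for the striped algorithms is the one quoted on slide 20.

% Scalability of the checkerboard (block) algorithm, assuming M(n) = n^2
\begin{align*}
n \ge C\sqrt{p}\,\log p
  \;\Longrightarrow\;
  \frac{M(C\sqrt{p}\,\log p)}{p}
  = \frac{C^2\,p\,\log^2 p}{p}
  = C^2 \log^2 p
\end{align*}
% C^2 log^2 p grows far more slowly with p than the C^2 p of the rowwise and
% columnwise algorithms, so memory per processor grows much more slowly and the
% block decomposition is the most scalable of the three.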