Matrix Multiplication - PowerPoint PPT Presentation

Provided by: scie251
1
Matrix Multiplication
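  • To make this discussion easier, we will assume
    square matrices
  • The product C = AB of two n by n matrices A and B is
    given by $c_{ij} = \sum_{k=0}^{n-1} a_{ik}\, b_{kj}$, for $0 \le i, j \le n-1$
  • Note that all valid products are of the form $a_{ik}\, b_{kj}$:
    the column index of the A entry matches the row index of the B entry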

2
Sequential Matrix Multiplication
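MODULE matrix1;
CONST n = 3;
TYPE matrix = ARRAY [0..n-1], [0..n-1] OF INTEGER;
VAR a, b, c: matrix; i, j, k: INTEGER;
BEGIN
  FOR i := 0 TO n-1 DO            (* row of c *)
    FOR j := 0 TO n-1 DO          (* column of c *)
      c[i,j] := 0;
      FOR k := 0 TO n-1 DO        (* row i of a times column j of b *)
        c[i,j] := c[i,j] + a[i,k] * b[k,j]
      END
    END
  END
END matrix1.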
The complexity of this algorithm is clearly
$O(n^3)$: for the 3 by 3 case it requires
$3^3 = 27$ multiplications. Gee, wouldn't it be neat to do
this in parallel?
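A minimal Python sketch of the same triple loop (0-based indices, as in the ARRAY declaration), counting multiplications to confirm the $n^3$ figure. The function name matmul_count and the sample matrices are illustrative only, not taken from the slides.

```python
def matmul_count(a, b):
    """Multiply two n x n matrices with the triple loop; also count multiplications."""
    n = len(a)
    c = [[0] * n for _ in range(n)]
    mults = 0
    for i in range(n):              # row of C
        for j in range(n):          # column of C
            for k in range(n):      # row i of A times column j of B
                c[i][j] += a[i][k] * b[k][j]
                mults += 1
    return c, mults


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
B = [[9, 8, 7], [6, 5, 4], [3, 2, 1]]
C, mults = matmul_count(A, B)
print(C)      # [[30, 24, 18], [84, 69, 54], [138, 114, 90]]
print(mults)  # 27
```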
3
Dissection Time
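$$
\begin{bmatrix} a_{00} & a_{01} & a_{02} \\ a_{10} & a_{11} & a_{12} \\ a_{20} & a_{21} & a_{22} \end{bmatrix}
\times
\begin{bmatrix} b_{00} & b_{01} & b_{02} \\ b_{10} & b_{11} & b_{12} \\ b_{20} & b_{21} & b_{22} \end{bmatrix}
=
\begin{bmatrix}
a_{00}b_{00}+a_{01}b_{10}+a_{02}b_{20} & a_{00}b_{01}+a_{01}b_{11}+a_{02}b_{21} & a_{00}b_{02}+a_{01}b_{12}+a_{02}b_{22} \\
a_{10}b_{00}+a_{11}b_{10}+a_{12}b_{20} & a_{10}b_{01}+a_{11}b_{11}+a_{12}b_{21} & a_{10}b_{02}+a_{11}b_{12}+a_{12}b_{22} \\
a_{20}b_{00}+a_{21}b_{10}+a_{22}b_{20} & a_{20}b_{01}+a_{21}b_{11}+a_{22}b_{21} & a_{20}b_{02}+a_{21}b_{12}+a_{22}b_{22}
\end{bmatrix}
$$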
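The same expansion can be generated for any n by writing each $c_{ij}$ as its sum over k. This is a small illustrative sketch; the helper name dissect is hypothetical, and its single-digit labels are only readable for n ≤ 10.

```python
def dissect(n):
    """Symbolic entries of C = AB: entry (i, j) is the sum over k of a_ik * b_kj.

    Labels use single-digit indices, so they are only readable for n <= 10.
    """
    return [["+".join(f"a{i}{k}b{k}{j}" for k in range(n)) for j in range(n)]
            for i in range(n)]


for row in dissect(3):
    print("  ".join(row))

# Every entry has n terms, so an n x n product needs n * n * n multiplications.
print(sum(len(entry.split("+")) for row in dissect(3) for entry in row))  # 27
```

All 27 products are independent of one another; only the final additions within each entry depend on them.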
4
Parallelize
  • Organize the PE (processing element) grid as an n x n x n cube
  • Place the data in the processors so that each PE
    computes one product term $a_{ik}\, b_{kj}$ of the sum for one of the
    $c_{ij}$'s, so all the multiplications can be done in one step
  • All that is left is to sum the products

5
Parallelize
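Plane k = 2 (back):
    a02b20  a02b21  a02b22
    a12b20  a12b21  a12b22
    a22b20  a22b21  a22b22
Plane k = 1:
    a01b10  a01b11  a01b12
    a11b10  a11b11  a11b12
    a21b10  a21b11  a21b12
Plane k = 0 (front):
    a00b00  a00b01  a00b02
    a10b00  a10b01  a10b02
    a20b00  a20b01  a20b02
Sum reduction: adding the planes position by position, from back to front,
leaves c_ij at position (i, j) of the front plane.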
6
The Algorithm
  • The algorithm for parallel matrix multiplication:
    1. Load the arrays into the cube
    2. Everyone multiplies
    3. Do a REDUCE.SUM from back to front
    4. The result is in the front 3 x 3 plane of the cube
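To make the four steps concrete, here is a sequential Python simulation of the cube. It is a sketch, not the slides' program: the nested list planes[k][i][j] stands in for the PEs, and REDUCE.SUM is modelled as a pairwise tree summation (our assumption about how the reduction is carried out), which needs ⌈log2 n⌉ rounds. The name cube_matmul is hypothetical.

```python
def cube_matmul(a, b):
    """Simulate the n x n x n PE cube; return (C, number of reduction rounds)."""
    n = len(a)
    # Step 2: every PE (k, i, j) multiplies its pair a[i][k] * b[k][j] at once.
    planes = [[[a[i][k] * b[k][j] for j in range(n)] for i in range(n)]
              for k in range(n)]
    # Step 3: tree REDUCE.SUM toward plane 0; each round halves the active planes.
    rounds = 0
    stride = 1
    while stride < n:
        for k in range(0, n, 2 * stride):
            if k + stride < n:
                for i in range(n):
                    for j in range(n):
                        planes[k][i][j] += planes[k + stride][i][j]
        stride *= 2
        rounds += 1
    # Step 4: the result sits in the front plane.
    return planes[0], rounds


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
B = [[9, 8, 7], [6, 5, 4], [3, 2, 1]]
C, rounds = cube_matmul(A, B)
print(C)       # [[30, 24, 18], [84, 69, 54], [138, 114, 90]]
print(rounds)  # 2
```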

7
Parallel Matrix Multiplication
MODULE matrix2;
CONST n = 3;
TYPE matrix = ARRAY [0..n-1], [0..n-1] OF INTEGER;
CONFIGURATION grid [0..n-1], [0..n-1], [0..n-1];
CONNECTION front: grid[i,j,k] -> grid[0,j,k];
VAR a, b, c: grid OF INTEGER;
    i, j, k: INTEGER;
BEGIN
  (* load the processor planes *)
  c := a * b;               (* every PE multiplies its pair in one step *)
  SEND.front:SUM(c, c);     (* sum along the first index into plane 0 *)
  (* retrieve the result *)
END matrix2.
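The complexity of this algorithm is clearly
$O(\log_2 n)$: after the single parallel multiplication, the n products
for each c_ij are summed by the reduction in about log2 n steps.
However, the number of processors required is $O(n^3)$.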
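A back-of-the-envelope count for the cube program, assuming one time step for the parallel multiplication and one per round of a pairwise (tree) summation; the slides do not spell out how the reduction is carried out internally.

$$
T(n) = 1 + \lceil \log_2 n \rceil, \qquad P(n) = n^3, \qquad P(n)\,T(n) = n^3\left(1 + \lceil \log_2 n \rceil\right) = \Theta(n^3 \log n)
$$

For n = 3 this is 1 + 2 = 3 steps on 27 PEs, i.e. 81 PE-steps, against 27 multiply-add steps for the sequential loop: the speed comes at the price of many processors that sit idle during the reduction.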
8
Using Fewer Processors
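Initial state, 3 x 3 grid of PEs (all idle):
  Column j of B waits above PE column j; row i of A waits to the left of PE row i.
  B, columns 0-2 (first to enter listed first): b00 b10 b20 | b01 b11 b21 | b02 b12 b22
  A, rows 0-2 (first to enter listed first):    a00 a01 a02 | a10 a11 a12 | a20 a21 a22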
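The slides that follow can be summarized by one index rule. This is our reading of the skewed input order, not a formula stated on the slides: row i of A is delayed i steps and column j of B is delayed j steps, so with steps numbered from t = 0 (the first step on the next slide),

$$
\text{PE}(i,j) \text{ at step } t \text{ computes } a_{ik}\, b_{kj} \text{ with } k = t - i - j, \quad \text{provided } 0 \le k \le n-1 .
$$

Each PE is therefore busy for n consecutive steps, and the last one, PE(n-1, n-1), finishes at t = 3n - 3, so the whole product takes 3n - 2 steps on $n^2$ processors (7 steps on 9 PEs for n = 3).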
9
Using Fewer Processors
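Step 1 (PE grid, row i, column j; "." = idle):
    a00b00  .       .
    .       .       .
    .       .       .
  Still waiting, B columns 0-2: b10 b20 | b01 b11 b21 | b02 b12 b22
  Still waiting, A rows 0-2:    a01 a02 | a10 a11 a12 | a20 a21 a22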
10
Using Fewer Processors
Step 2:
    a01b10  a00b01  .
    a10b00  .       .
    .       .       .
  Still waiting, B columns 0-2: b20 | b11 b21 | b02 b12 b22
  Still waiting, A rows 0-2:    a02 | a11 a12 | a20 a21 a22
11
Using Fewer Processors
Step 3:
    a02b20  a01b11  a00b02
    a11b10  a10b01  .
    a20b00  .       .
  Still waiting, B columns 1-2: b21 | b12 b22
  Still waiting, A rows 1-2:    a12 | a21 a22
12
Using Fewer Processors
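Step 4 (PE (0,0) has finished; every pair is again of the form a_ik b_kj):
    .       a02b21  a01b12
    a12b20  a11b11  a10b02
    a21b10  a20b01  .
  Still waiting, B column 2: b22
  Still waiting, A row 2:    a22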
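A Python sketch that replays this schedule symbolically and numerically. It simulates the grid sequentially under the index rule k = t - i - j (with t counted from 0), which is our reading of the slides rather than something they state; the names pair_at and systolic_matmul are hypothetical.

```python
def pair_at(i, j, t, n):
    """Product label PE (i, j) forms at step t (0-based), or None if it is idle."""
    k = t - i - j  # a_ik and b_kj both reach PE (i, j) at step i + j + k
    return f"a{i}{k}b{k}{j}" if 0 <= k < n else None


def systolic_matmul(a, b):
    """Simulate an n x n grid of PEs; return (C, number of busy time steps)."""
    n = len(a)
    c = [[0] * n for _ in range(n)]
    busy_steps = 0
    for t in range(3 * n):  # generous bound; steps with no work are not counted
        busy = False
        for i in range(n):
            for j in range(n):
                k = t - i - j
                if 0 <= k < n:
                    c[i][j] += a[i][k] * b[k][j]  # each PE accumulates its own c_ij
                    busy = True
        if busy:
            busy_steps += 1
    return c, busy_steps


# Step 4 of the slides is t = 3:
print([[pair_at(i, j, 3, 3) for j in range(3)] for i in range(3)])
# [[None, 'a02b21', 'a01b12'], ['a12b20', 'a11b11', 'a10b02'], ['a21b10', 'a20b01', None]]

A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
B = [[9, 8, 7], [6, 5, 4], [3, 2, 1]]
print(systolic_matmul(A, B))  # ([[30, 24, 18], [84, 69, 54], [138, 114, 90]], 7)
```

The 7 busy steps match 3n - 2 for n = 3: fewer processors than the cube, but more time.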
13
Improving Efficiency
Target: the same nine sums c_ij as on the Dissection Time slide.
Start by aligning A and B on the 3 x 3 PE grid so that every PE already holds a matching pair:
    a00b00  a01b11  a02b22
    a11b10  a12b21  a10b02
    a22b20  a20b01  a21b12
  B, columns 0-2: b00 b10 b20 | b01 b11 b21 | b02 b12 b22
  A, rows 0-2:    a00 a01 a02 | a10 a11 a12 | a20 a21 a22
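The alignment on this slide follows a simple index pattern (our reading of the grid; the slide does not state it). Rotating row i of A left by i places and column j of B up by j places leaves

$$
\text{PE}(i,j) \text{ holding } a_{i,k} \text{ and } b_{k,j}, \qquad k = (i + j) \bmod n ,
$$

so every PE starts with a valid term of its own $c_{ij}$; for example, PE(1, 2) gets k = 3 mod 3 = 0, the pair a10, b02. The same initial skew appears in Cannon's algorithm.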
14
Improving Efficiency
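Step 1 (initial alignment):
    a00b00  a01b11  a02b22
    a11b10  a12b21  a10b02
    a22b20  a20b01  a21b12
Step 2 (each row of A rotated one place right, each column of B one place down):
    a02b20  a00b01  a01b12
    a10b00  a11b11  a12b22
    a21b10  a22b21  a20b02
Step 3 (rotated once more):
    a01b10  a02b21  a00b02
    a12b20  a10b01  a11b12
    a20b00  a21b11  a22b22
Adding the three products at each position gives c_ij after n = 3 steps on only n x n = 9 PEs.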
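A sequential Python sketch of the whole scheme on an n x n grid, following the slides' direction of movement: each row of A rotates one place right and each column of B one place down between steps, so the term index k at every PE drops by one each time. It is a simulation for checking the arithmetic, not a parallel implementation, and the name rotate_matmul is hypothetical.

```python
def rotate_matmul(a, b):
    """Align A and B on an n x n grid, then do n multiply-accumulate-rotate steps."""
    n = len(a)
    # Initial alignment: PE (i, j) holds a[i][k] and b[k][j] with k = (i + j) % n.
    ga = [[a[i][(i + j) % n] for j in range(n)] for i in range(n)]
    gb = [[b[(i + j) % n][j] for j in range(n)] for i in range(n)]
    c = [[0] * n for _ in range(n)]
    for _ in range(n):
        # Every PE multiplies its current pair and adds it to its own c[i][j].
        for i in range(n):
            for j in range(n):
                c[i][j] += ga[i][j] * gb[i][j]
        # Rotate each row of A one place right and each column of B one place down.
        ga = [[ga[i][(j - 1) % n] for j in range(n)] for i in range(n)]
        gb = [[gb[(i - 1) % n][j] for j in range(n)] for i in range(n)]
    return c


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
B = [[9, 8, 7], [6, 5, 4], [3, 2, 1]]
print(rotate_matmul(A, B))  # [[30, 24, 18], [84, 69, 54], [138, 114, 90]]
```

Because the rotations keep the A and B indices matched at every PE, each position sees all n values of k exactly once, which is why n steps suffice.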