MatrixOps: Transcript and Presenter's Notes
1
MatrixOps
  • Operations on matrices are quite important to
    applications, whether the applications come from
    Physics or from Economics. Good models require
    very large sizes: many econometric models use
    systems of linear equations containing 10,000 or
    more equations and variables; weather models,
    airplane designs, and Formula 1 cars are not far
    behind, if not well ahead, and usually have to
    deal with substantial non-linearities.
  • A matrix is a two-dimensional array of items from
    a set with some algebraic structure - usually at
    least a ring with multiplicative identity,
    sometimes more, e.g. a division ring (skew field)
    or a field.

2
MatrixOps
  • See http://en.wikipedia.org/wiki/Matrix_multiplication
  • Matrix: a rectangular array of numbers
    A = (a_ij), i = 1..m, j = 1..n.
  • R^(m×n) usually denotes the set of all m-by-n
    matrices with entries in the real numbers.
  • A^T denotes the transpose of the matrix A: if A
    is an m×n matrix, A^T is the n×m matrix obtained by
    exchanging the rows and columns of A.
  • A vector is a one-dimensional array of numbers.
    It can be a row vector or a column vector. The
    transpose of a column vector is a row vector and
    vice-versa.
  • The unit vector e_i is the vector whose ith
    element is 1 and all others are 0.
  • A zero matrix is one whose every entry is zero.

3
MatrixOps
  • Square Matrices: m = n.
  • diagonal matrix: a_ij = 0 whenever i ≠ j.
  • identity matrix I_n: the diagonal matrix with 1's
    on the diagonal.
  • tridiagonal matrix: a_ij = 0 if |i - j| > 1.
  • upper-triangular matrix: a_ij = 0 if i > j.
  • unit-upper-triangular matrix: upper-triangular
    with a_ii = 1 for all i.
  • lower-triangular matrix: a_ij = 0 if i < j.
  • unit-lower-triangular matrix: lower-triangular
    matrix with a_ii = 1 for all i.
  • permutation matrix: has exactly one 1 in each
    row and column (all other entries 0).
  • symmetric matrix: A = A^T.

4
MatrixOps
  • There are three primary operations that we wish
    to support, followed by some others that are
    appropriate only under the assumption that the
    elements of the matrix come from more structured
    algebraic entities.
  • Addition: M + N
  • Multiplication by a scalar: a·M
  • Multiplication of two matrices: M·N
  • where a matrix is a two-dimensional array of m·n
    elements; we lay them out in m rows of n columns
    each. It is possible to construct
    multi-dimensional arrays (m×n×p, ...,
    m1×m2×...×mq), but we will not pursue that study.

5
MatrixOps
  • Matrix Addition: (M + N)_ij = m_ij + n_ij,
    defined elementwise.
  • The operation makes sense only for matrices of
    the same dimensions. It is applicable to
    matrices whose elements are taken from a group;
    closure under addition is enough, but not very
    useful.

6
MatrixOps
  • 2. Scalar Multiplication: (a·M)_ij = a·m_ij,
  • where a is a member of the underlying algebraic
    structure, which now supports multiplication.

7
MatrixOps
  • 3. Matrix Multiplication: (M·N)_ij = Σ_k m_ik·n_kj,
    defined when the number of columns of M equals
    the number of rows of N: if M is m×n and N is
    n×p, then M·N is m×p.
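  • A minimal sketch of this definition in Python
    (mat_mul is our illustrative name, not part of
    the slides; numpy is used only for storage):

    import numpy as np

    def mat_mul(A, B):
        # Schoolbook product: c_ij = sum over k of a_ik * b_kj.
        # Requires columns(A) == rows(B).
        m, n = A.shape
        n2, p = B.shape
        assert n == n2, "inner dimensions must agree"
        C = np.zeros((m, p))
        for i in range(m):
            for j in range(p):
                for k in range(n):
                    C[i, j] += A[i, k] * B[k, j]
        return C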

8
MatrixOps
  • We will concentrate on operations where m = n = p
    (the case of square matrices), since that is, for
    us, the more important case.
  • A quick analysis of n×n matrix operations:
  • Sum: there are n² elements, added in pairs. Cost:
    Θ(n²). Addition is associative and
    commutative as long as it satisfies the same
    properties in the underlying algebraic structure.
  • Scalar Multiplication: there are n² elements,
    each multiplied by the same scalar. Cost: Θ(n²).
  • Matrix Multiplication: each element of the
    product seems to require n multiplications and
    n-1 additions. Cost: O(n³) for the complete
    multiplication. This is not too bad, since,
    letting m = n² (the number of elements), the cost
    is O(m^(3/2)). Can we do better? (but no better
    than Ω(n²), the time needed to read the input)

9
MatrixOps
  • Matrix multiplication is associative and
    distributive if the operations are so in the
    underlying algebraic structure. It is not
    commutative, regardless of the commutativity of
    the operations in the underlying algebraic
    structure.
  • Two products are of some special importance:
  • Let x^T be a (1×n) row vector, y an (n×1)
    column vector. The inner product of x and y is
    defined as x^T·y = Σ_(i=1..n) x_i·y_i, a scalar.
  • x·y^T is the outer product: an n×n matrix whose
    (i, j) entry is x_i·y_j.
  • The Euclidean norm: ||x|| = (x^T·x)^(1/2).
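  • A quick numeric illustration (Python/numpy; the
    vector values are made up):

    import numpy as np

    x = np.array([1.0, 2.0, 3.0])
    y = np.array([4.0, 5.0, 6.0])

    inner = x @ y            # x^T y = 32.0, a scalar
    outer = np.outer(x, y)   # x y^T, a 3×3 matrix with entries x_i*y_j
    norm = np.sqrt(x @ x)    # Euclidean norm ||x|| = (x^T x)^(1/2)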

10
MatrixOps
  • Matrix Inverse: given a matrix A, its inverse, if
    it exists, is a matrix denoted by A^(-1) and with
    the property that A·A^(-1) = A^(-1)·A = I.
  • Matrices need not have inverses; they are then
    called non-invertible or singular. If a matrix
    has an inverse, it is called invertible or
    non-singular.
  • Property: matrix inverses are unique (Ex. D.2-1).
  • Property: if A and B are nonsingular matrices,
    (BA)^(-1) = A^(-1)·B^(-1).
  • Property: inverse commutes with transpose:
    (A^(-1))^T = (A^T)^(-1).

11
MatrixOps
  • Def. Linear Dependence. The vectors x1, x2, ...,
    xn are linearly dependent if there exist scalar
    constants c1, c2, ..., cn, not all 0, such that
    c1·x1 + c2·x2 + ... + cn·xn = 0. If the vectors
    are not linearly dependent, they are linearly
    independent.
  • Column Rank. For an m×n matrix A, this is the
    size of the largest set of linearly independent
    columns of A.
  • Row Rank. For an m×n matrix A, this is the size
    of the largest set of linearly independent rows
    of A.
  • Property: row-rank(A) = column-rank(A) = rank(A).
  • Property: 0 ≤ rank(A) ≤ min(m, n).

12
MatrixOps
  • Alternate Def. of Rank: for A m×n, rank(A) = the
    smallest number r such that there exist matrices
    B and C of respective sizes m×r and r×n with
    A = BC.
  • Proof. ???
  • Def. A square (n×n) matrix A has full rank if
    rank(A) = n. An m×n matrix A has full row rank
    if rank(A) = m and has full column rank if
    rank(A) = n.

13
MatrixOps
  • Theorem. A square matrix has full rank iff it is
    nonsingular.
  • Theorem. A square matrix A has full rank iff
    there does not exist a non-zero vector x such
    that Ax = 0.
  • Corollary. A square matrix A is singular iff it
    has a null vector, i.e., if there exists a
    non-zero vector x such that Ax = 0.

14
MatrixOps
  • Strassen's Algorithm. One can count, and observe
    that the naïve algorithm requires 8
    multiplications and 4 additions to multiply two
    2×2 matrices. As it turns out, multiplications
    are expensive, much more expensive than additions;
    any cleverness that reduces the number of
    multiplications without changing the number of
    additions too much is to be preferred. Prof.
    Strassen managed to come up with an algorithm
    that requires only 7 multiplications. This
    algorithm allows us to multiply any two matrices
    whose dimensions are compatible (and powers of
    2) in time O(n^(lg 7)), which is substantially
    faster than O(n³). More recent results keep
    bringing the exponent farther down, but with
    mixed possibilities for real applications.

15
MatrixOps
  • Strassen's Algorithm: 7 multiplications and 18
    additions. The entries can be matrices of
    compatible sizes; just be careful of
    non-commutativity.
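  • A sketch of the standard Strassen identities
    (the textbook form, which matches the count of 7
    multiplications and 18 additions above), written
    in Python over blocks so that A-factors stay on
    the left of every product and B-factors on the
    right, respecting non-commutativity:

    import numpy as np

    def strassen_step(A11, A12, A21, A22, B11, B12, B21, B22):
        # 7 block products (10 additions/subtractions here) ...
        M1 = (A11 + A22) @ (B11 + B22)
        M2 = (A21 + A22) @ B11
        M3 = A11 @ (B12 - B22)
        M4 = A22 @ (B21 - B11)
        M5 = (A11 + A12) @ B22
        M6 = (A21 - A11) @ (B11 + B12)
        M7 = (A12 - A22) @ (B21 + B22)
        # ... recombined with 8 more additions/subtractions.
        C11 = M1 + M4 - M5 + M7
        C12 = M3 + M5
        C21 = M2 + M4
        C22 = M1 - M2 + M3 + M6
        return C11, C12, C21, C22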

16
MatrixOps
  • The recurrence is T(n) = 7·T(n/2) + d·n²,
    where d is a fixed constant and d·n² represents
    the time for the matrix additions and
    subtractions: c·18·(n/2)², where c is the
    constant relative to one such operation. The
    relevant version of the Master Theorem states
    that T(n) can be bounded asymptotically:
    T(n) = Θ(n^(lg 7)) = O(n^2.81).
    One could also check directly that T(n) = Θ(n^(lg 7))
    solves the recurrence relation. The lowest
    currently known bound is O(n^2.376).
  • Although matrix multiplication is interesting,
    one also needs to solve large systems of linear
    equations. Is the complexity of their solution
    related to that of matrix multiplication?

17
MatrixOps
  • Systems of Linear Equations. Basically, we will
    look at methods for solving matrix-vector
    equations Ax = b, where A is n×n and b is n×1.
  • If A is nonsingular, then A^(-1) exists and
    A^(-1)·Ax = A^(-1)·b => x = A^(-1)·b.
  • Claim: if A is nonsingular, the solution is
    unique. Assume not: there exist x and x' solving.
    Then
  • x = (A^(-1)A)x = A^(-1)(Ax) = A^(-1)(b) = A^(-1)(Ax')
    = (A^(-1)A)x' = x'.

18
MatrixOps
  • Rank(A) < n: A is underdetermined; typically the
    system has infinitely many solutions (sometimes
    none).
  • If the number of equations is larger than the
    number of unknowns, the system is overdetermined:
    typically, no solution (sometimes one or more).
  • If A is nonsingular, we can compute A^(-1), but the
    theoretical method (based on a recursive
    combinatorial expression for A^(-1)) is very
    expensive and suffers from numerical
    instabilities (Maple or Mathematica won't help -
    you just cannot guarantee that you can dial
    yourself enough decimal digits). This problem
    was recognized long ago, and reasonable methods
    have long been known; what is the cost of those
    methods?

19
MatrixOps
  • LUP Decomposition. Given a nonsingular matrix A,
    we look for three other matrices, L, U, and P,
    such that PA = LU, and
  • L is a unit-lower-triangular matrix (0's in upper
    right half, 1's on diagonal);
  • U is an upper-triangular matrix (0's in lower
    left half);
  • P is a permutation matrix (a single 1 in each
    row and column).
  • If we find such a decomposition, then
  • Ax = b => PAx = Pb => LUx = Pb
  • and we solve a problem with two triangular
    matrices, which is simpler than the original one.

20
MatrixOps
  • A first intermediate step is to define y = Ux,
    where we still don't know what x is, and solve
    Ly = Pb.
  • This first problem can be solved in time Θ(n²)
    via a method known as forward substitution. Let
    P = [P_(i,p[i])] denote the permutation matrix,
    where P_(i,j) = 0 for j ≠ p[i], and P_(i,j) = 1
    for j = p[i]; p is the permutation array
    associated with P. Since L is
    unit-lower-triangular, we are now solving the
    system
    y_i + Σ_(j=1..i-1) l_ij·y_j = b_p[i], i = 1..n.

21
MatrixOps
  • The general solution is given by
    y_i = b_p[i] - Σ_(j=1..i-1) l_ij·y_j,
    which involves no extra divisions. Now we
    solve Ux = y. Since U is upper triangular (not
    unit, though),
  • solving from the bottom (backsubstitution) we
    have the general formula
    x_i = (y_i - Σ_(j=i+1..n) u_ij·x_j) / u_ii,
    which involves n divisions. A pseudocode
    algorithm follows.

22
MatrixOps
  • LUP-Solve(L, U, p, b)
  • n = L.rows
  • let x and y be new vectors of length n
  • for i = 1 to n
  •     do y_i = b_p[i] - Σ_(j=1..i-1) l_ij·y_j
  • for i = n downto 1
  •     do x_i = (y_i - Σ_(j=i+1..n) u_ij·x_j) / u_ii
  • return x
  • The total cost is, clearly, Θ(n²): the ith
    iteration of the forward-substitution loop takes
    i-1 multiplications and i-1 additions; the ith
    iteration of the backsubstitution loop takes n-i
    additions, n-i multiplications and one
    division.
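  • A sketch of the same procedure in Python/numpy
    (the names are ours, not the slides'):

    import numpy as np

    def lup_solve(L, U, p, b):
        # Solve Ax = b given PA = LU, where p is the permutation
        # array: row i of P has its single 1 in column p[i].
        n = L.shape[0]
        y = np.zeros(n)
        x = np.zeros(n)
        for i in range(n):                # forward substitution
            y[i] = b[p[i]] - L[i, :i] @ y[:i]
        for i in range(n - 1, -1, -1):    # backsubstitution
            x[i] = (y[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
        return x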

23
MatrixOps
  • Computing an LU decomposition. We assume that no
    entries in the matrix that will be used for
    division are 0. This cannot be expected in
    general - and small entries are almost as bad -
    but will provide us a starting point for an
    algorithm. If n = 1, the matrix A = (a_11) is
    already decomposed. Otherwise, decompose A into
    four parts:

    A = [ a_11  w^T ]
        [ v     A'  ]

v is a size-(n-1) column vector, w^T is a size-(n-1)
row vector and A' is an (n-1)×(n-1) matrix.
We can now factor A as follows:

    A = [ 1       0       ] [ a_11  w^T             ]
        [ v/a_11  I_(n-1) ] [ 0     A' - v·w^T/a_11 ]
24
MatrixOps
  • You need only multiply the right-hand side to
    see. The (n-1)×(n-1) matrix A' - v·w^T/a_11 is
    called the Schur complement of A with respect to
    a_11.
  • Claim: if A is nonsingular, its Schur complement
    is also nonsingular.
  • Pf. Suppose it is singular. It is an (n-1)×(n-1)
    matrix, and thus its row rank must be strictly
    less than n-1. The last n-1 rows of the second
    factor of A have a first column of all zeros, and
    thus must have rank strictly less than n-1. This
    implies that the row rank of the second factor is
    less than n => factor is singular => A singular,
    contradicting the assumption.

25
MatrixOps
  • The nonsingularity of the Schur complement allows
    us recursively to apply the same ideas to it: IT
    has an LU decomposition, say
    L'U' = A' - v·w^T/a_11. Replacing in the matrix
    equation:

    A = [ 1       0  ] [ a_11  w^T ]  = LU
        [ v/a_11  L' ] [ 0     U'  ]

  • If a_11 = 0, then everything breaks. If the upper
    left entry in the Schur complement vanishes at
    any point, everything breaks. If a_11 (or any
    of the upper left entries in the Schur
    complement) is very small, we could have
    numerical instability.

26
MatrixOps
  • Claim: if A is symmetric positive-definite then
    the upper left-hand entry in the Schur complement
    is never 0.
  • Pf. Later.
  • We can now present an algorithm to compute the LU
    decomposition of any matrix for which the method
    just presented will work; note that L has 0's in
    the upper right and 1's on the diagonal, while U
    has zeros in the bottom left. It is thus
    sufficient to keep the information in a SINGLE
    n×n matrix, where U will be stored in the
    diagonal and the upper right, and the non-0,
    non-1 elements of L are stored in the bottom
    left.

27
MatrixOps
  • LU-Decomposition(A)
  • 1  n = rows[A]
  • 2  for k = 1 to n
  • 3      do u_kk = a_kk
  • 4         for i = k+1 to n
  • 5             do l_ik = a_ik/u_kk   // l_ik holds v_i
  • 6                u_ki = a_ki        // u_ki holds w_i^T
  • 7         for i = k+1 to n
  • 8             do for j = k+1 to n
  • 9                    do a_ij = a_ij - l_ik·u_kj
  • 10 return L and U
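  • The same algorithm as a Python sketch (it
    assumes no zero pivots arise, exactly as stated
    above):

    import numpy as np

    def lu_decomposition(A):
        # Returns L (unit lower triangular) and U (upper triangular)
        # with A = LU; works on a copy of A.
        A = A.astype(float).copy()
        n = A.shape[0]
        L = np.eye(n)
        U = np.zeros((n, n))
        for k in range(n):
            U[k, k] = A[k, k]
            for i in range(k + 1, n):
                L[i, k] = A[i, k] / U[k, k]   # l_ik holds v_i/a_kk
                U[k, i] = A[k, i]             # u_ki holds w_i^T
            # update the trailing submatrix to the Schur complement
            for i in range(k + 1, n):
                for j in range(k + 1, n):
                    A[i, j] -= L[i, k] * U[k, j]
        return L, U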

28
MatrixOps
  • Since line 9 is triply nested, the algorithm runs
    in time Θ(n³). All other lines are no more than
    doubly nested, contributing no more than O(n²).
  • We still have a fairly serious problem: we made a
    number of unwarranted assumptions about the size
    of certain crucial entries. We now need to
    remove those assumptions (to the extent
    possible), show that we can still construct L and
    U, and, at least as important, that we will not
    make the asymptotic complexity any worse.

29
MatrixOps
  • Example for LU Decomposition

30
MatrixOps
  • Computing an LUP Decomposition. We are now ready
    to remove the restrictions. We change the problem
    to one of finding three matrices, L and U as
    before, and a permutation matrix P such that
    PA = LU. Before we decompose A into the four
    submatrices, we examine the first column. At
    least one element of the first column must be
    non-zero, otherwise A would be singular. If there
    is more than one non-zero element, find the index
    of the one with largest absolute value. If this
    element resides in row k (a_k1), swap it with the
    element in row 1 (a_k1 <--> a_11). This action can
    be accomplished by pre-multiplying A by a
    permutation matrix Q, obtained from the identity
    matrix by exchanging the first and kth columns.
  • Note that Q = Q^(-1).

31
MatrixOps
  • QA is a non-singular matrix with the element of
    largest absolute value among those in the first
    column appearing in position (1, 1). Using the
    notation from the original A:

    QA = [ a_k1  w^T ]
         [ v     A'  ]

  • We know from the earlier discussion that the
    Schur complement A' - v·w^T/a_k1 is also
    non-singular. By the induction hypothesis, we
    can obtain an LUP decomposition of
    A' - v·w^T/a_k1 with (n-1)×(n-1) matrices P', L'
    and U': P'(A' - v·w^T/a_k1) = L'U'. P' is an
    (n-1)×(n-1) permutation matrix. Define:

    P = [ 1  0  ] Q
        [ 0  P' ]

32
MatrixOps
  • We have:

    PA = [ 1  0  ] QA = [ 1          0  ] [ a_k1  w^T ]  = LU
         [ 0  P' ]      [ P'v/a_k1   L' ] [ 0     U'  ]

  • We need to translate this into a pseudo-code
    algorithm.

33
MatrixOps
  • The algorithm will re-use A rather than allocate
    space for both L and U. As we observed earlier,
    L and U can be both stored in a single n×n
    matrix, and, it turns out, we do not need to keep
    a copy of A. Furthermore, we do not need to keep
    a full copy of P: all we need is an array
    p[1..n], where p[i] = j means that the ith row of
    P contains a 1 in the jth column. When the
    procedure terminates we will have a_ij = l_ij
    if i > j, and a_ij = u_ij if i ≤ j.

34
MatrixOps
  • LUP-Decomposition(A)
  • n <-- rows(A)
  • for i <-- 1 to n
  •     do p[i] <-- i
  • for k <-- 1 to n
  •     do piv <-- 0
  •        for i <-- k to n
  •            do if |a_ik| > piv
  •                   then piv <-- |a_ik|
  •                        k' <-- i
  •        if piv = 0
  •            then error "singular matrix"
  •        exchange p[k] <--> p[k']
  •        for i <-- 1 to n
  •            do exchange a_ki <--> a_k'i
  •        for i <-- k+1 to n
  •            do a_ik <-- a_ik/a_kk
  •               for j <-- k+1 to n
  •                   do a_ij <-- a_ij - a_ik·a_kj
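  • A Python sketch of the same in-place procedure
    (our names; the pivoting mirrors the pseudocode
    above):

    import numpy as np

    def lup_decomposition(A):
        # On return, the matrix holds L strictly below the diagonal
        # (unit diagonal implied) and U on/above it; p is the
        # permutation array with PA = LU.
        A = A.astype(float).copy()
        n = A.shape[0]
        p = list(range(n))
        for k in range(n):
            # choose the largest |a_ik|, i >= k, as the pivot
            kp = max(range(k, n), key=lambda i: abs(A[i, k]))
            if A[kp, k] == 0.0:
                raise ValueError("singular matrix")
            p[k], p[kp] = p[kp], p[k]
            A[[k, kp], :] = A[[kp, k], :]
            for i in range(k + 1, n):
                A[i, k] /= A[k, k]
                A[i, k + 1:] -= A[i, k] * A[k, k + 1:]
        return A, p

    Together with lup_solve above, LU, p = lup_decomposition(A)
    solves Ax = b via
    lup_solve(np.tril(LU, -1) + np.eye(len(p)), np.triu(LU), p, b).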

35
MatrixOps
  • Example for LUP
  • Decomposition

36
MatrixOps
  • The analysis is similar to that in the previous
    case we observe that the deepest nesting level
    of loops is 3, which means a time complexity of
    O(n3) - in fact . Because of the reuse
    of A, the extra space needed is 0.
  • The algorithms introduced for the solutions of
    systems of linear equations exhibit the time
    complexity of naïve matrix multiplication. We
    can use these algorithms for the computation of a
    matrix inverse if we can compute an LUP
    decomposition of A, PA LU, we can solve an
    equation of the form Ax b in time If
    we replace b by the column matrices ei, i 1, ,
    n, solving all n equations Ax ei will take time

37
MatrixOps
  • Since the LUP decomposition itself can be
    computed in time Θ(n³), the computation of A^(-1)
    will take time Θ(n³).
  • The question that remains is: since Strassen's
    algorithm (and later developments) shows that one
    can multiply two n×n matrices strictly faster
    than O(n³), does this speedup have implications
    (theoretical at least) for matrix inversion? Can
    we invert matrices just as fast as we can
    multiply them, or does something else stop us?
    We now develop the (unsurprising) answer.

38
MatrixOps
  • Theorem 28.7. (Multiplication is no harder than
    inversion.) If we can invert an n×n matrix in
    time I(n), where I(n) = Ω(n²) and I(n)
    satisfies the regularity condition
    I(3n) = O(I(n)), then we can multiply two n×n
    matrices in time O(I(n)).
  • Claim: I(n) satisfies the condition above
    whenever I(n) = Θ(n^c·lg^d n) for any constants
    c > 0 and d ≥ 0.
  • Pf. of Claim: I(3n) = Θ((3n)^c·lg^d(3n)) =
    Θ(3^c·n^c·lg^d n) = Θ(n^c·lg^d n) = O(I(n)).
  • Meaning: any reasonable time complexity formula
    for inversion will satisfy the condition.

39
MatrixOps
  • Proof (of theorem). Let A and B be given n×n
    matrices.
  • Consider the 3n×3n matrix

    D = [ I_n  A    0   ]      D^(-1) = [ I_n  -A   A·B ]
        [ 0    I_n  B   ]               [ 0    I_n  -B  ]
        [ 0    0    I_n ]               [ 0    0    I_n ]

  • We can compute A·B by just taking the upper right
    n×n submatrix of D^(-1). D can be constructed
    from A and B in Θ(n²) time, and D^(-1) can be
    computed from D in O(I(3n)) = O(I(n)) time -
    which is where the regularity condition comes in.
    The uniqueness of inverses guarantees the
    contents of the upper right-hand block. Thus
    M(n) = O(I(n)), as claimed.

40
MatrixOps
  • Theorem 28.8. (Inversion is no harder than
    multiplication.) Assume the time to multiply two
    n×n matrices is M(n) = Ω(n²) and that M(n)
    satisfies the regularity conditions
    M(n+k) = O(M(n)) for 0 ≤ k ≤ n, and
    M(n/2) ≤ c·M(n) for some constant c < 1/2. Then
    the computation of the inverse for any
    nonsingular n×n matrix can be carried out in time
    O(M(n)).
  • Proof. First, assume n is a power of 2. This can
    be easily shown not to be a restriction: the
    first regularity condition implies that the
    asymptotics can get worse only by a constant.
  • Temporary assumption: A is symmetric and positive
    definite. This will be removed later.

41
MatrixOps
  • Partition A into four (n/2)×(n/2) matrices:

    A = [ B  C^T ]
        [ C  D   ]

    where B and D are also symmetric positive
    definite.
  • Let S = D - C·B^(-1)·C^T be the Schur complement
    of A with respect to B (more details later). One
    can check that

    A^(-1) = [ B^(-1) + B^(-1)C^T·S^(-1)·C·B^(-1)   -B^(-1)C^T·S^(-1) ]
             [ -S^(-1)·C·B^(-1)                      S^(-1)           ]

    by an explicit multiplication. The existence of
    B^(-1) and S^(-1) follows from the fact that A
    (and hence B and S - to be proven) is symmetric
    positive definite. Using the relationships among
    multiplication, inversion and transposition:
    B^(-1)C^T = (C·B^(-1))^T and
    B^(-1)C^T·S^(-1) = (S^(-1)·C·B^(-1))^T.

42
MatrixOps
  • We can now use the Schur complement and the form
    of A^(-1) just given to specify a recursive
    algorithm involving four multiplications of
    (n/2)×(n/2) matrices:
  • C·B^(-1), (C·B^(-1))·C^T,
  • S^(-1)·(C·B^(-1)), (C·B^(-1))^T·(S^(-1)·C·B^(-1)).
  • We need to invert B and S (two (n/2)×(n/2)
    matrices); we need to multiply four (n/2)×(n/2)
    matrices - which we can multiply with an n×n
    algorithm - plus an O(n²) cost to extract
    submatrices from A and perform a fixed number of
    additions and subtractions on these matrices.
    A sketch of the recursion in code follows.
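  • A minimal Python sketch of this recursion (numpy
    for the block products; it assumes A symmetric
    positive definite with power-of-2 size, exactly
    as the proof does):

    import numpy as np

    def invert_spd(A):
        # Recursive inversion via the Schur complement
        # S = D - C B^(-1) C^T, using the block formula above.
        n = A.shape[0]
        if n == 1:
            return np.array([[1.0 / A[0, 0]]])
        h = n // 2
        B, CT = A[:h, :h], A[:h, h:]
        C, D = A[h:, :h], A[h:, h:]
        Binv = invert_spd(B)
        W = C @ Binv                  # C B^(-1)
        S = D - W @ CT                # Schur complement
        Sinv = invert_spd(S)
        V = Sinv @ W                  # S^(-1) C B^(-1)
        upper_left = Binv + W.T @ V   # B^(-1) + B^(-1) C^T S^(-1) C B^(-1)
        return np.block([[upper_left, -V.T], [-V, Sinv]])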

43
MatrixOps
  • The recurrence is

    I(n) ≤ 2·I(n/2) + 4·M(n) + O(n²)
         ≤ 2·I(n/2) + O(M(n))
         = O(M(n)),

    where the fact that M(n) = Ω(n²) allows
    passing from the first to the second line. Case
    3 of the Master Theorem states (in this case)
    that if f(n) = Ω(n^(1+ε)) for
    some constant ε > 0 and if 2·f(n/2) ≤ c·f(n) for
    some constant c < 1 and all large n, then
    T(n) = Θ(f(n)).
  • The final result follows from the observation
    that any solution of the inequality with I(n) is
    bounded above by a solution of the equation
    resulting by replacing ≤ with =. The
    satisfaction of the asymptotic bound on f(n) is
    trivial, since f(n) = k·M(n) = Ω(n²). The
    regularity condition follows from the theorem
    hypotheses, which are reasonable: doubling the
    (linear) size of the matrix should more than
    double the time for multiplication - you must
    deal with four times the number of items.

44
MatrixOps
  • We must still remove the condition that A be
    symmetric positive definite. Since A is
    invertible, A^T·A is symmetric (which does not
    need invertibility) and positive definite (Thm.
    28.6, which needs invertibility). Next, we
    observe that A^(-1) = (A^T·A)^(-1)·A^T. Inverting
    A can then be accomplished by a) computing A^T·A;
    b) inverting A^T·A via the procedure just
    outlined; c) post-multiplying the result by A^T.
    Each of the three steps takes O(M(n)) time: any
    nonsingular matrix with real entries can be
    inverted in O(M(n)) time.

45
MatrixOps
  • Symmetric Positive Definite Matrices. There are
    a few results we need to establish to complete
    the details of the proof of Thm. 28.8.
  • Lemma 28.9. Any positive definite matrix is
    non-singular.
  • Proof. Suppose A is singular. Then there exists a
    non-zero vector x such that Ax = 0. Hence
    x^T·A·x = 0, and A cannot be positive definite.
  • Def. The kth leading submatrix of A is the
    matrix A_k consisting of the intersection of the
    first k rows and k columns of A.

46
MatrixOps
  • Lemma 28.10. If A is a symmetric positive
    definite matrix, then every leading submatrix A_k
    of A is symmetric positive definite.
  • Proof. That A_k is symmetric is obvious. Assume
    it is not positive definite. Then there exists a
    k-vector x_k ≠ 0 such that x_k^T·A_k·x_k ≤ 0.
    Extend x_k to an n-vector by padding with 0's:
    x = (x_k^T, 0)^T. A can be decomposed into four
    matrices of appropriate sizes (and symmetries) so
    that x^T·A·x = x_k^T·A_k·x_k ≤ 0,
  • contradicting the hypothesis that A is positive
    definite.

47
MatrixOps
  • Generalization of the Schur Complement. Given a
    symmetric positive definite matrix A, with A_k
    the leading k×k submatrix, partition A as

    A = [ A_k  B^T ]
        [ B    C   ]

  • We define the Schur complement of A with respect
    to A_k as S = C - B·A_k^(-1)·B^T.
  • Since A_k is symmetric and positive definite,
    A_k^(-1) exists, and thus S is well-defined. We
    now show that S is itself symmetric and positive
    definite, thus allowing us to complete the proof
    of Theorem 28.8.

48
MatrixOps
  • Lemma 28.11 (Schur complement lemma). Let A be
    symmetric positive definite, A_k a leading k×k
    submatrix of A. Then the Schur complement of A
    w.r.t. A_k is symmetric positive definite.
  • Proof. A symmetric => C symmetric. One can easily
    show that B·A_k^(-1)·B^T is symmetric, and hence
    that S is symmetric. We still need to show that S
    is positive definite. Let x be any nonzero
    vector, x^T = (y^T, z^T), where the components
    are size compatible with the decomposition of A.
    Recall also that A_k^(-1) exists. Then

    x^T·A·x = y^T·A_k·y + 2·z^T·B·y + z^T·C·z
            = (y + A_k^(-1)·B^T·z)^T·A_k·(y + A_k^(-1)·B^T·z)
              + z^T·(C - B·A_k^(-1)·B^T)·z.

49
MatrixOps
  • We can verify the equalities by simple
    multiplication. Note that the term in the last
    set of parentheses is the Schur complement.
    Since x^T·A·x > 0 for any nonzero x, pick any
    nonzero z, and then choose y = -A_k^(-1)·B^T·z,
    which makes the first term of the last line on
    the previous slide vanish, leaving
    x^T·A·x = z^T·(C - B·A_k^(-1)·B^T)·z = z^T·S·z > 0,
    which immediately implies that the Schur
    complement is positive definite.
  • We are now finished with the details of the proof
    of Theorem 28.8, and we have proven that matrix
    inversion is asymptotically equivalent to
    matrix multiplication, at least under some
    reasonable assumptions.

50
MatrixOps
  • A useful result is
  • Corollary 28.12. LU decomposition of a symmetric
    positive definite matrix never causes a division
    by 0.
  • Proof. Let A be the matrix. We will prove that
    every pivot is strictly positive. The element
    a11 is the first pivot, and a11 e1TAe1 gt 0. The
    first step of the LU decomposition gives us the
    Schur complement of A w.r.t a11. Since the Schur
    complement is positive definite symmetric, and
    its upper left hand element is the new pivot, it
    must be positive. An induction gives that all
    pivots are positive.

51
MatrixOps
  • Least-Squares Approximation. We are given a set
    of m points (say, in the plane) (x1, y1), (x2,
    y2), ..., (xm, ym), where the yi's are subject to
    measurement error. We want to construct a
    function F(x) such that F(xi) = yi + ηi for
    i = 1, 2, ..., m, where the errors ηi are small.
  • There are several mathematical theorems (all
    variants of the same most general one) that allow
    us to conclude the existence (under suitable
    assumptions) of a countable family of basis
    functions f1, f2, f3, ... such that every
    reasonable function can be arbitrarily
    approximated by a finite weighted sum of such
    basis functions: F(x) = Σ_(j=1..n) cj·fj(x).
  • We choose fj(x) = x^(j-1), so F(x) = c1 + c2·x +
    ... + cn·x^(n-1).

52
MatrixOps
  • The approximation gets better as n gets larger,
    but we also know that our data are noisy; we have
    to play the accuracy of the approximation against
    the cost of computation in the context of
    guaranteed partial ignorance.
  • We have m data points, and so we can deal with at
    most m unknown quantities: the m coefficients of
    an approximation that uses m basis functions. We
    could be using a different set of basis functions
    (e.g., trigonometric polynomials, wavelets), but
    we choose standard polynomials. If n = m, we
    have the best fit: the curve can be made to go
    exactly through each of the data points, and all
    the errors can be made to disappear.

53
MatrixOps
  • The problem is that the errors are there
    regardless of our wishing them away: fitting the
    curve exactly to the (noisy) data will not give
    us the best approximation to reality. The usual
    approach involves using a coarser approximation
    (fewer basis functions, and thus less computation
    both in the determination of F and the
    computation of interpolated values) while trying
    to minimize some reasonable function of the
    total error.
  • Consider the m×n matrix A whose (i, j) entry is
    the jth basis function evaluated at the ith data
    point: a_ij = fj(xi).
54
MatrixOps
  • When we include the (as yet unknown) coefficients
    of the basis functions we have the linear
    (usually overdetermined) system (m > n)
    A·c ≈ y, with the errors given by η = A·c - y.
  • The error is given by the m-vector formula
    ηi = Σ_(j=1..n) a_ij·cj - yi.
  • To minimize the errors, we choose to minimize the
    norm of the error vector ||η|| = ||A·c - y|| as a
    function of the cj's. A necessary condition for
    an extremum is the simultaneous vanishing of all
    derivatives.

55
MatrixOps
  • We have
    ||η||² = ||A·c - y||² = Σ_(i=1..m) (Σ_(j=1..n) a_ij·cj - yi)²
  • and, differentiating with respect to each ck,
    d||η||²/dck = Σ_(i=1..m) 2·(Σ_(j=1..n) a_ij·cj - yi)·a_ik = 0,
    k = 1, ..., n.
  • The equations on the last line are equivalent to
    the matrix equation (A·c - y)^T·A = 0, equivalent
    to A^T·(A·c - y) = 0, which implies
    A^T·A·c = A^T·y. A^T·A is clearly symmetric. If A
    has full column rank, a previous result implies
    that A^T·A is positive definite, and thus has an
    inverse (A^T·A)^(-1). We can solve:
    c = (A^T·A)^(-1)·A^T·y. (A^T·A)^(-1)·A^T is
    called the pseudoinverse of A; since A is
    nonsquare it cannot have a true inverse.

56
MatrixOps
  • An Example: five points and F(x) = c1 + c2·x +
    c3·x².
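  • A Python sketch of this computation with made-up
    sample points (the actual data are not part of
    the text): build A column by column from the
    basis functions, then solve the normal equations
    A^T·A·c = A^T·y.

    import numpy as np

    # Hypothetical data standing in for the five points.
    xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
    ys = np.array([1.1, 1.9, 4.2, 8.9, 17.3])

    # a_ij = f_j(x_i) = x_i^(j-1): columns are 1, x, x^2.
    A = np.vander(xs, 3, increasing=True)

    # Normal equations: c = (A^T A)^(-1) A^T y (the pseudoinverse).
    c = np.linalg.solve(A.T @ A, A.T @ ys)

    eta = A @ c - ys   # the error vector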

57
MatrixOps
  • With the pseudoinverse of A: c = (A^T·A)^(-1)·A^T·y.

58
MatrixOps
  • The approximating function and the errors.