Transcript and Presenter's Notes

Title: Matrix Algebra


1

Matrix Algebra

2
A matrix is a rectangular array of numbers with n
rows and m columns. It is symbolized with a bold,
upper case letter, and subscripted to indicate
its order.
G3,2
(the first subscript gives the number of rows, 3; the second gives the number of columns, 2)
3
G3,2
The individual elements in a matrix are called
scalars, subscripted to indicate their position
in the matrix.
4
G3,2
The columns in a matrix are called vectors, and
are symbolized with lower case, bold letters.
This matrix has two vectors, each containing
three scalars.
5
This vector is also a 3 x 1 matrix, symbolized
g3,1. When displayed as a row it is symbolized
differently, as g1,3. This is called the
transpose of the vector.
6
This matrix is rectangular. When the numbers of
rows and columns are equal, the matrix is square:
G3,2
F3,3
7
A square matrix has a main diagonal. The sum of
the elements of the main diagonal is called the
trace of the matrix.
F3,3
8
If fi,j = fj,i for all i and j, the matrix is
symmetric.
F3,3
F is a symmetric matrix with a trace of 14.
9
If all elements of a symmetric matrix except the
main diagonal are zero, the matrix is a diagonal
matrix
F3,3
F is a symmetric, diagonal matrix with a trace of
14.
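A minimal NumPy sketch of these definitions. The element values are illustrative assumptions, since the entries of F are not reproduced in the transcript:

import numpy as np

# An assumed symmetric 3 x 3 matrix (f[i, j] == f[j, i]) with trace 14
F_sym = np.array([[3., 2., 1.],
                  [2., 4., 5.],
                  [1., 5., 7.]])
print(np.allclose(F_sym, F_sym.T))   # True: the matrix is symmetric
print(np.trace(F_sym))               # 14.0: sum of the main diagonal

# An assumed diagonal matrix with the same trace
F_diag = np.diag([3., 4., 7.])
print(np.trace(F_diag))              # 14.0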
10
Matrices of the same order can be added and
subtracted. These operations take place element
by element.
11
F3,2
H3,2
F + H = K3,2

12
F3,2
H3,2
F - H = K3,2

13
Matrix addition is commutative and associative:
A + B = B + A
A + B + C = (A + B) + C = A + (B + C)
Matrix subtraction is distributive:
A - (B + C) = A - B - C
A - (B - C) = A - B + C
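A short NumPy sketch of element-by-element addition and subtraction and of these properties. The matrices are illustrative assumptions, since F, H, and K are not shown in the transcript:

import numpy as np

F = np.array([[1., 2.], [3., 4.], [5., 6.]])   # 3 x 2
H = np.array([[6., 5.], [4., 3.], [2., 1.]])   # 3 x 2

K_add = F + H          # element-by-element addition
K_sub = F - H          # element-by-element subtraction

print(np.array_equal(F + H, H + F))              # True: commutative
A, B, C = F, H, K_add
print(np.array_equal((A + B) + C, A + (B + C)))  # True: associative
print(np.array_equal(A - (B + C), A - B - C))    # True: the sign distributes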
14
A matrix of zeros is called a zero matrix or a
null matrix. It is used in solving equations:
X + Y = Z
15


X + Y = Z
-X + X + Y = -X + Z
16
-X + X = 0 (the null matrix), so
0 + Y = Z - X
Y = Z - X
17
Scalar multiplication is accomplished by
multiplying every element of a matrix by a
constant scalar:
F3,3
k = 5
kF
18
Matrix multiplication requires attending to a few
important rules
  • The order of multiplication is important.
  • Matrices can only be multiplied if the number of
    columns of the first matrix is equal to the
    number of rows of the second matrix.
  • The resulting matrix has an order equal to the
    number of rows of the first matrix and the number
    of columns of the second matrix.

19
These two must be the same.
Aj,i Bi,k = Cj,k
The elements of C are defined as
cj,k = Σi aj,i bi,k (row j of A times column k of B, summed over i)
20

A3,2
B2,3
C3,3
(4 x 1) + (-1 x 4) = 0
(6 x 0) + (2 x 6) = 12
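A small NumPy sketch of the multiplication rule. Only two of the original element computations are visible in the transcript, so the full matrices below are assumptions chosen to reproduce them:

import numpy as np

A = np.array([[4., -1.],
              [6.,  2.],
              [0.,  3.]])        # 3 x 2
B = np.array([[1., 0., 2.],
              [4., 6., 1.]])     # 2 x 3

C = A @ B                        # 3 x 3: rows of A times columns of B
print(C[0, 0])                   # (4 x 1) + (-1 x 4) = 0
print(C[1, 1])                   # (6 x 0) + (2 x 6) = 12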
21
  • Matrix multiplication is not commutative.
  • It may not be possible: A2,3 B3,5 = C2,5, but B3,5 A2,3
    is impossible.
  • When possible, the results may not be equal:
    A1,3 B3,1 = C1,1, but B3,1 A1,3 = C3,3.

22
Matrix multiplication is associative:
ABC = (AB)C = A(BC)
Matrix multiplication is distributive:
A(B + C) = AB + AC
But order is important:
XA + BX ≠ X(A + B)
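A brief NumPy sketch of these order rules; the matrices are random stand-ins, assumed only for illustration:

import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(2, 2))
B = rng.normal(size=(2, 2))
C = rng.normal(size=(2, 2))
X = rng.normal(size=(2, 2))

print(np.allclose((A @ B) @ C, A @ (B @ C)))   # True: associative
print(np.allclose(A @ (B + C), A @ B + A @ C)) # True: distributive
print(np.allclose(A @ B, B @ A))               # usually False: not commutative
print(np.allclose(X @ A + B @ X, X @ (A + B))) # usually False: order matters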
23
Every matrix has a transpose that is obtained by
exchanging the rows and columns:
X
X′
24
Transposes are useful for arranging a matrix so
that matrix multiplication is possible. Example:
A common statistical requirement is to generate
the sums of squares and cross-products for a data
matrix. If Xn,v is a matrix of deviation scores,
then Xn,v Xn,v is not possible. But X′v,n Xn,v can be
carried out.
25
Sum of deviation cross-products
?
D

D is a matrix of deviation scores for 5
individuals on 2 variables.
Sum of squared deviations
D
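A NumPy sketch of this computation. The deviation scores below are assumed (the original data are not reproduced in the transcript), but they match the 5 x 2 layout described:

import numpy as np

# Assumed deviation scores: 5 people x 2 variables (each column sums to zero)
D = np.array([[ 2., -1.],
              [ 1.,  0.],
              [ 0.,  1.],
              [-1., -2.],
              [-2.,  2.]])

SSCP = D.T @ D    # 2 x 2: squared deviations on the diagonal,
print(SSCP)       # deviation cross-products off the diagonal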
26
Xnv is a People x Variables matrix of deviation
scores.
X′v,n Xn,v
This matrix is one step away from what other
matrix?
27
The identity matrix is a diagonal matrix with
ones on the main diagonal
I
The identity matrix is often a useful target
matrix in statistics.
28
Multiplication by diagonal matrices is especially
important in statistics and is used to accomplish
rescaling (expanding, shrinking, standardizing).
Post-multiplication of a matrix X by a diagonal
matrix D results in the columns of X being
multiplied by the corresponding diagonal element
in D.
29
XD = Y
The first column of X is multiplied by the
diagonal element in the first column of D
The second column of X is multiplied by the
diagonal element in the second column of D
30
Pre-multiplication of a matrix X by a diagonal
matrix D results in the rows of X being
multiplied by the corresponding diagonal element
in D.
31

DX = Y
The first row of X is multiplied by the diagonal
element in the first row of D
The second row of X is multiplied by the diagonal
element in the second row of D
32
Scalar multiplication is just multiplication by a
diagonal matrix with a constant in the diagonal.
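A NumPy sketch of rescaling with diagonal matrices; X and the diagonal values are assumed for illustration:

import numpy as np

X = np.array([[1., 2.],
              [3., 4.]])
D = np.diag([10., 0.5])

print(X @ D)                   # post-multiplication: the columns of X are rescaled
print(D @ X)                   # pre-multiplication: the rows of X are rescaled
print(5.0 * X)                 # scalar multiplication ...
print(np.diag([5., 5.]) @ X)   # ... is multiplication by a constant diagonal matrix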


33
Variance-covariance matrices and correlation
matrices can be characterized by a single number
called the determinant that represents the
generalized variance. For the correlation
matrix, this number can take on values from 0 to
1. When all variables are independent (an
identity matrix), the determinant is 1. As
variables increase in their interdependence, the
determinant approaches 0. The determinant thus
indexes the redundancy among variables in a
correlation matrix.
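A NumPy sketch of the determinant as generalized variance, using two assumed 2 x 2 correlation matrices:

import numpy as np

R_independent = np.eye(2)                 # uncorrelated variables (identity matrix)
R_redundant = np.array([[1.0, 0.95],
                        [0.95, 1.0]])     # highly correlated variables

print(np.linalg.det(R_independent))   # 1.0: no redundancy
print(np.linalg.det(R_redundant))     # 0.0975: close to 0, much redundancy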
34
Some square matrices have an inverse such that
AA-1 = I. The inverse is useful in solving matrix
equations:
Y = BR
Solving for B:
YR-1 = BRR-1
YR-1 = B
RR-1 is equal to an identity matrix and so drops
out.
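A NumPy sketch of solving Y = BR with the inverse. Y and R are assumed random matrices here, and R is assumed to be invertible; in practice np.linalg.solve is usually preferred to forming the inverse explicitly:

import numpy as np

rng = np.random.default_rng(1)
R = rng.normal(size=(3, 3))          # assumed square, invertible matrix
B_true = rng.normal(size=(2, 3))
Y = B_true @ R                       # Y = BR

R_inv = np.linalg.inv(R)
print(np.allclose(R @ R_inv, np.eye(3)))   # RR-1 = I
B = Y @ R_inv                              # B = YR-1
print(np.allclose(B, B_true))              # True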
35
X10,20 = (Z10,5 + B10,5) A5,20; solve for B
(A20,5 is the transpose of A5,20):
X10,20 = Z10,5 A5,20 + B10,5 A5,20
X10,20 - Z10,5 A5,20 = B10,5 A5,20
(X10,20 - Z10,5 A5,20) A20,5 = B10,5 A5,20 A20,5
(X10,20 - Z10,5 A5,20) A20,5 (A5,20 A20,5)-1 = B10,5 (A5,20 A20,5)(A5,20 A20,5)-1
The final product, (A5,20 A20,5)(A5,20 A20,5)-1, is an
identity matrix, so the left-hand side equals B10,5.
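A NumPy sketch of the same manipulation with the dimensions used above. Random matrices stand in for the data, and A is assumed to have full row rank so that (AA′)-1 exists:

import numpy as np

rng = np.random.default_rng(2)
Z = rng.normal(size=(10, 5))
B_true = rng.normal(size=(10, 5))
A = rng.normal(size=(5, 20))
X = (Z + B_true) @ A                 # X = (Z + B)A

# (X - ZA) A' (AA')^-1 = B
B = (X - Z @ A) @ A.T @ np.linalg.inv(A @ A.T)
print(np.allclose(B, B_true))        # True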
36
Raw data for 20 women showing the relation
between height and weight.
37
Premultiplying the raw data matrix, X, by the
transpose of a vector of ones produces the sums.
The pre-multiplication of X by the transpose of
Ones produces two linear combinations of people.
38
Post-multiplying the vector, Sums, by the
diagonal matrix, Sample_Size_D (containing the
reciprocal of the sample size), transforms the
columns of Sums into means.

Pre-multiplying the new vector, Means, by the
vector, Ones, produces a matrix of means.
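A NumPy sketch of these steps with a small assumed data matrix (the original height-weight data for the 20 women are not reproduced in the transcript):

import numpy as np

# Assumed raw data: 4 people x 2 variables (e.g., height, weight)
X = np.array([[62., 120.],
              [66., 140.],
              [70., 155.],
              [64., 130.]])
n = X.shape[0]

ones = np.ones((n, 1))
sums = ones.T @ X                        # 1 x 2 vector of column sums
means = sums @ np.diag([1 / n, 1 / n])   # diagonal of reciprocal n turns sums into means
mean_matrix = ones @ means               # n x 2 matrix of means
print(sums, means, mean_matrix, sep="\n")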
39

40

41
Deviation scores are created by subtracting the
matrix of means from the raw data matrix.
Multiplying the transpose of the matrix of
differences by the matrix of differences produces
the sums of squares and cross-products.
42
Pre-multiplying the sums of squares and
cross-products matrix by a diagonal matrix with
the reciprocal of the degrees of freedom on the
main diagonal produces the variance-covariance
matrix.
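Continuing the sketch: deviation scores, the sums of squares and cross-products, and the variance-covariance matrix, using the same assumed data:

import numpy as np

X = np.array([[62., 120.],
              [66., 140.],
              [70., 155.],
              [64., 130.]])            # assumed raw data, 4 x 2
n = X.shape[0]

D = X - X.mean(axis=0)                 # deviation scores
SSCP = D.T @ D                         # sums of squares and cross-products
S = np.diag([1 / (n - 1)] * 2) @ SSCP  # pre-multiply by reciprocal degrees of freedom
print(np.allclose(S, np.cov(X, rowvar=False)))   # True: matches the covariance matrix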
43
Standardized scores can be obtained by
post-multiplying the matrix of difference scores
by a diagonal matrix with the reciprocal of the
standard deviations on the main diagonal.
44
Standardization is simply a rescaling of the
variables. It does not affect the position of
people relative to each other.
The correlation between the standardized scores
is simply the sum of the cross-products divided
by the degrees of freedom.
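And the standardization and correlation steps, again as matrix operations on the same assumed data:

import numpy as np

X = np.array([[62., 120.],
              [66., 140.],
              [70., 155.],
              [64., 130.]])            # assumed raw data, 4 x 2
n = X.shape[0]

D = X - X.mean(axis=0)                 # deviation scores
sd = X.std(axis=0, ddof=1)             # standard deviations
Z = D @ np.diag(1.0 / sd)              # post-multiply by reciprocal standard deviations
R = (Z.T @ Z) / (n - 1)                # cross-products divided by degrees of freedom
print(np.allclose(R, np.corrcoef(X, rowvar=False)))   # True: the correlation matrix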
45
The standardized data (now in a common metric)
can be transformed with weight vectors to produce
two new variables that are linear combinations.
What do they represent?
46
The two new linear combinations retain the
relative position of people, but the location of
the set is rotated.
47
The transformations have created two new
variables that are unrelated but that now have
unequal variances. Note that the trace of this
matrix still equals 2.
48
These new linear combinations can be
restandardized to put them on the same metric.
49
The independence of the new linear combinations
is now quite apparent.
50
The two weight vectors constitute a matrix that
produces an orthogonal rotation.
51
Elements of a data matrix can be displayed
spatially.
One way to illustrate the location of data points
is to place them in relation to a set of
reference vectors or axes.
[Figure: points P1 and P2 located relative to reference axes x1 and x2]
It is usually convenient to use axes that are
orthogonal (a Cartesian system).
52
Often we choose the original variables and their
scales to serve as the reference axes.
53
It is usually more convenient to use reference
axes that have an origin of zero and are scaled
to have unit length (standardized). These are
known as standard basis vectors.
54
The location of a data point is determined by the
perpendicular projections onto the reference
vectors.
[Figure: perpendicular projections of P1 and P2 onto the x1 and x2 axes]
55
(No Transcript)
56
Data points can also be represented as vectors
from the origin with an angular displacement from
the x1 axis of q.
[Figure: vectors to P1 and P2 at angles q1 and q2 from the x1 axis]
It is most typical to use cos(q) rather than q.
Cosines take on values between -1 and 1 and
resemble correlations in the way they mark
orthogonality.
57
[Figure: cos(q) = 1.0 for a vector along x1, 0 for a vector orthogonal to x1, and -1.0 for a vector pointing opposite x1]
58
The cosine of the angle between two vectors
indicates the extent to which they fall on a line
through the origin (cos q close to -1 or 1),
representing a single dimension, or whether they
have relatively independent orientations (cos q
close to 0).
[Figure: the angle q between P1 and P2 has cos(q) close to 0; P3 points nearly opposite, with cos(q) close to -1]
59
With the angular displacement method, the length
of the vector becomes important.
[Figure: vector to P1 with length l and angle q1 from the x1 axis]
60
[Figure: the vector to P1 is the hypotenuse c of a right triangle with legs a (along x1) and b (parallel to x2)]
l = c = (a2 + b2)½
The Pythagorean theorem can be used to find the
hypotenuse of the triangle formed by the vector
and either axis.
61
More generally, the length of a vector in any
number of dimensions can be found from the vector
of coefficients that define its position relative
to the reference axes:
l = (x12 + x22 + ... + xn2)½ = (x′x)½
62
l = ((-1.774)2 + (-1.965)2)½ ≈ 2.65
63
The origin is the group centroid for standardized
variables. Length tells us how different each
person is from the average person.
64
It is often convenient to scale vectors to have
unit length, called normalization. This is
accomplished by dividing each element of a vector
by the length of the vector.
w = (1, -1)
Length of w = (12 + (-1)2)½ = √2
Normalized vector: w = (1/√2, -1/√2) = (.707, -.707)
This is just scalar multiplication and either
shrinks or expands the vector to achieve unit
length.
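A small NumPy sketch of the length and normalization computations shown above:

import numpy as np

w = np.array([1.0, -1.0])
length = np.sqrt(np.sum(w ** 2))      # (1^2 + (-1)^2)^0.5 = sqrt(2)
w_unit = w / length                   # (.707, -.707)
print(length, w_unit, np.linalg.norm(w_unit))   # the norm of w_unit is 1.0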
65
Linear combinations are created by vector
multiplication and produce a new vector. The
result of the vector multiplication is the
perpendicular projection of points onto the new
vector.
[Figure: point a and the weight vector w defining the new linear combination axis LC]
66
Linear combinations are created by vector
multiplication and produce a new vector. The
result of the vector multiplication is the
perpendicular projection of points onto the new
vector.
[Figure: the same projection, with the angle q between LC and the x1 axis marked]
67
The result of the vector multiplication is the
distance from the origin to the projection of the
point on the new vector
[Figure: the projection of point a onto the new vector LC, measured from the origin]
68
This particular vector multiplication is called a
scalar product, a dot product, or the inner
product.
[Figure: point a, the weight vector w, and the projection onto LC at angle q from x1]
69
(No Transcript)
70
(No Transcript)
71
The elements of a normalized vector used to
create a linear combination are the cosines of
the angles between the old reference vectors and
the new vector. A normalized transformation
vector thus has a length of one and elements that
indicate the angle of rotation relative to the
original reference vectors. For w = (.707, .707),
the elements indicate that the new vector has a
45° angle with the original reference vectors.
72
If a second linear combination is formed, and
this vector is kept orthogonal to the first, then
a rigid rotation of the original reference
vectors occurs.
[Figure: point a with two orthogonal weight vectors, w1 and w2, serving as new reference axes]
The result of the matrix multiplication is to
give the projections of Point a onto these new
vectors.
73
The elements of the weight matrix, W, for
creating the two new linear combinations are the
cosines of the angles between the old reference
vectors and the new vectors. They indicate the
angle of rotation necessary for moving the old
reference axes rigidly into the new position. The
result of the matrix multiplication is to produce
the projections onto these new vectors.
W =
[ cos q1,1   cos q1,2 ]
[ cos q2,1   cos q2,2 ]
Each column of W is a weight vector for producing
a new linear combination.
74
(No Transcript)
75
(No Transcript)
76
For two reference vectors, any orthogonal
rotation of angle q can be accomplished by
constructing a weight matrix with the following
elements:
W =
[ cos q   -sin q ]
[ sin q    cos q ]
(each element is the cosine of the angle between an old reference vector and a new one)
This rotation creates two independent linear
combinations of the original reference vectors,
with angle of rotation q.
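A NumPy sketch of building such a weight matrix for an arbitrary angle, here 45° to match the example on the next slide:

import numpy as np

theta = np.deg2rad(45)
W = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
print(W)                                # [[.707, -.707], [.707, .707]]
print(np.allclose(W.T @ W, np.eye(2)))  # True: the columns are orthonormal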
77
[Figure: the old (1, 2) and new (1, 2) reference axes for a 45° rotation, with the angles between them (45°, 135°, 225°) marked]
W =
[ cos q1,1   cos q1,2 ]     [ .707   -.707 ]
[ cos q2,1   cos q2,2 ]  =  [ .707    .707 ]
78
(No Transcript)
79
The variances of linear combinations created by
orthonormal weight matrices have some interesting
properties. The variances of the linear
combinations can be obtained by applying the same
weight matrix to the variance-covariance matrix
of untransformed data. In our example, the data
were standardized prior to forming the linear
combinations. The variance-covariance matrix for
the untransformed scores is just the correlation
matrix. Pre-multiplying the correlation matrix, R,
by the transpose of W and then post-multiplying by
W gives W′RW, the variance-covariance matrix of
the linear combinations.
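A NumPy sketch of this computation, using r = .867 from the example and the 45° weight matrix above:

import numpy as np

r = 0.867
R = np.array([[1.0, r],
              [r, 1.0]])        # correlation matrix of the standardized data
W = np.array([[0.707, -0.707],
              [0.707,  0.707]]) # orthonormal weight matrix (45 degrees)

V = W.T @ R @ W                 # variance-covariance of the linear combinations
print(np.round(V, 3))           # approximately diag(1.867, 0.133)
print(np.trace(V))              # approximately 2: information is only redistributed
print(np.linalg.det(V))         # approximately .248: the generalized variance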
80
The trace of this matrix is 2, as it was in the
untransformed variance-covariance matrix. No
information has been gained or lost in the
transformation. The information has simply been
redistributed. Now most of the information
resides in the first linear combination (the
size variable). Its variance is 1.867. Of the
total variability contained in the data, the
size linear combination accounts for
100(1.867/2) = 93.4%.
81
Clearly the first linear combination (a simple
sum) would be sufficient to capture the important
information in these data. We could use it in
other analyses instead of the original two
variables. This kind of simplification is often a
target of the analysis. In those cases, we let
the statistical procedure derive the weights so
that they produce a linear combination that
accounts for as much of the original information
as possible.
82
The determinant provides an index of generalized
variance. It is often used to diagnose
redundancy. In a correlation matrix it can take
on values between 0 and 1, with values
approaching 0 indicating redundancy. Ordinarily,
the determinant is cumbersome to calculate. But,
in a diagonal matrix, the determinant is simply
the product of all the terms on the main
diagonal. The orthonormal transformation created
by W produces a diagonal matrix. The determinant
is equal to .248 (i.e., 1.867 x .133), suggesting
considerable redundancy. That was clear from the
correlation between the original two variables
(r = .867).
83
[Figure: points P1 and P2 in the x1, x2 reference system; the distance l between them is unknown]
It is often desirable to know the distance
between points in a reference system. This index
of similarity can be used as the basis for
clustering or multidimensional scaling.
84
q3 = q2 - q1
[Figure: vectors to the two points at angles q1 and q2 from the x1 axis, separated by angle q3]
First we find the angle between the two data
points. It can be found through subtraction of
the angles between the data points and the
reference vectors.
85
By finding the lengths of a and b, c can be found
using the Pythagorean theorem.
[Figure: sides a and b, the distance c between the points, and angle q3 at the origin]
86
First we find the length of the vector from the
origin to Point 2. It is simply the square root
of the sum of squared values that index the
location of the point in the reference system:
l2 = (x212 + x222)½
[Figure: the vector to P2 with length l2, angle q3, and segment b]
87
The cosine of q3 is equal to the side adjacent
(labeled sa) divided by the hypotenuse, which is
the length of the vector:
cos(q3) = sa / l2, so sa = l2 cos(q3)
[Figure: right triangle with hypotenuse l2, side adjacent sa, and side b]
88
The segments b and sa must have the same relation
to l2 as do x21 and x22:
l2 = (x212 + x222)½ = (b2 + sa2)½
b = (l22 - sa2)½
[Figure: the same triangle, with b and sa as the legs and l2 as the hypotenuse]
89
The segment a can now be found by subtracting sa
from the length of the vector for Point 1:
l1 = (x112 + x122)½
a = l1 - sa
[Figure: vectors to P1 (length l1) and P2 (length l2), with segments a, b, and sa]
90
The distance (c) between Points 1 and 2 is then
equal to (a2 + b2)½.
[Figure: the completed construction, with c connecting P1 and P2]
91
[Figure: the full construction for the distance c between P1 and P2]
The distance between points in a multidimensional
space can be found using a generalized formula:
c = (Σj (x1j - x2j)2)½
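A NumPy sketch of the generalized distance computation for two points in any number of dimensions; the coordinates of the second point are assumed:

import numpy as np

p1 = np.array([-1.774, -1.965])        # coordinates of Point 1 (from the earlier slide)
p2 = np.array([0.5, 1.2])              # coordinates of Point 2 (assumed)

c = np.sqrt(np.sum((p1 - p2) ** 2))    # square root of the summed squared differences
print(c)
print(np.linalg.norm(p1 - p2))         # same result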
92
For two points, the distance formula can also be
shown as
c2 = l12 + l22 - 2 l1 l2 cos(q3)
The standard deviation of a difference score is
s(x-y) = (sx2 + sy2 - 2 sxy)½, where sxy is the covariance
93
The cosines of the angles between the vectors for
pairs of points indicate whether they lie along a
line.
94
The distances make quite clear how close
different people are in this two dimensional
space.
95
The mean of the distances for each person can be
used to summarize how close a person is to the
two-dimensional centroid. It provides information
about multidimensional deviance.
96
The cosines of the angles between different
points do not change with a rotation of the
reference vectors.
97
The distances do not change either. A rotation of
the reference vectors does not change the
relative position of the original data points. It
only changes the reference system for describing
those points.
98
A rotation of the reference vectors does not
change the relations among the data points. Here
is the relationship between the distance scores
for Person 1 using the data in the original
reference system and the data in the rotated
reference system.
99
We can also calculate the correlations between
the distance profiles for pairs of people to
further capture similarity in a standardized
way.
100
In a Cartesian coordinate system, the axes are
displayed as orthogonal. In this case that is not
really true to the nature of the data.
101
q = 29.9°
cos(q) = .867
We could make use of information about the
correlation between the variables to construct
reference vectors that better match the data.
With standardized data, the correlation is the
cosine of the angle between the reference vectors
defined by the variables.
102
The key matrix operations of multiplication by
diagonal matrices and weight matrices serve to
take an original matrix of data, scaled according
to a set of reference vectors, and to shrink or
expand the scale and change the reference axes.
This general approach is called singular value
decomposition and can be represented by
Zs = XWD-1
103
This formula can be rearranged to give
X = ZsDW′
Any data matrix can be decomposed into three
parts (a small numerical sketch follows the list
below):
  • A matrix of uncorrelated variables with unit
    variance
  • A transformation that changes the scale
    (stretches or shrinks)
  • An orthogonal transformation that rotates the
    reference vectors.
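A NumPy sketch of this decomposition on an assumed deviation-score matrix. np.linalg.svd returns X = U S W′; the deck's Zs and D are assumed here to differ from U and diag(S) only by a factor of √(n - 1), so that the columns of Zs have unit variance:

import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(20, 2))           # assumed data matrix
X = X - X.mean(axis=0)                 # deviation scores

U, s, Wt = np.linalg.svd(X, full_matrices=False)
print(np.allclose(X, U @ np.diag(s) @ Wt))       # X = (unit part)(scale)(rotation)

n = X.shape[0]
Zs = U * np.sqrt(n - 1)                # rescale so each column has unit variance
D = np.diag(s / np.sqrt(n - 1))
print(np.allclose(X, Zs @ D @ Wt))                     # same reconstruction
print(np.allclose(np.var(Zs, axis=0, ddof=1), 1.0))    # unit variances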

104
W and D can be derived or imposed by the
investigator to achieve some desirable
representation of the data.