Covariance Matrix Applications - PowerPoint PPT Presentation

Provided by: cha128


1
Covariance Matrix Applications
  • Dimensionality Reduction

2
Outline
  • What is the covariance matrix?
  • Example
  • Properties of the covariance matrix
  • Spectral Decomposition
  • Principal Component Analysis

3
Covariance Matrix
  • The covariance matrix captures the variance and
    linear correlation in multivariate/multidimensional
    data.
  • If the data is an N x D matrix, the covariance matrix
    is a D x D square matrix.
  • Think of N as the number of data instances
    (rows) and D as the number of attributes (columns).
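As a rough illustration of the shapes involved, here is a minimal pure-Python sketch that computes the covariance matrix of a small N x D table. The function name covariance_matrix, the 1/(N - 1) normalization and the sample data are assumptions made for this example, not taken from the slides.

```python
def covariance_matrix(data):
    """Return the D x D sample covariance matrix of an N x D table.

    Rows are data instances and columns are attributes. Divides by N - 1
    (an assumption: some texts divide by N instead).
    """
    n = len(data)
    d = len(data[0])
    # Column means, one per attribute.
    means = [sum(row[j] for row in data) / n for j in range(d)]
    # Mean-centre every row.
    centred = [[row[j] - means[j] for j in range(d)] for row in data]
    # Entry (j, k) is the covariance between attribute j and attribute k.
    return [
        [sum(r[j] * r[k] for r in centred) / (n - 1) for k in range(d)]
        for j in range(d)
    ]


# Hypothetical data: N = 4 instances (rows), D = 3 attributes (columns).
data = [
    [2.0, 1.0, 0.0],
    [4.0, 3.0, 1.0],
    [6.0, 5.0, 0.0],
    [8.0, 7.0, 1.0],
]
cov = covariance_matrix(data)
print(len(cov), len(cov[0]))  # 3 3: the shape is D x D, whatever N is
for row in cov:
    print([round(v, 3) for v in row])
```

Adding more rows changes the entries but never the 3 x 3 shape.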

4
Covariance Formula
  • Let Data be an N x D matrix.
  • The Cov(Data)
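One standard way to write the covariance of the N x D Data matrix, assuming the common 1/(N - 1) normalization (some texts divide by N instead), is:

```latex
\bar{x}_j = \frac{1}{N}\sum_{i=1}^{N}\mathrm{Data}_{ij},
\qquad
\mathrm{Cov}(\mathrm{Data})_{jk}
  = \frac{1}{N-1}\sum_{i=1}^{N}
    \bigl(\mathrm{Data}_{ij}-\bar{x}_j\bigr)\bigl(\mathrm{Data}_{ik}-\bar{x}_k\bigr),
\qquad 1 \le j,k \le D
```

Equivalently, if Data_c is Data with each column mean subtracted, Cov(Data) = Data_c^T Data_c / (N - 1), which is D x D and symmetric.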

5
Example
COV(R)
6
Moral: covariance can only capture linear
relationships.
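A small numerical check of this moral, using a hypothetical covariance helper (1/(n - 1) normalization assumed): a perfectly linear relationship gives a large covariance, while y = x^2 on points symmetric about zero gives a covariance of exactly zero even though y is completely determined by x.

```python
def covariance(xs, ys):
    """Sample covariance of two equal-length sequences (divides by n - 1)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


xs = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * x + 1 for x in xs]  # y depends linearly on x
quadratic = [x * x for x in xs]   # y is fully determined by x, but not linearly

print(covariance(xs, linear))     # 9.333...: the linear relationship shows up
print(covariance(xs, quadratic))  # 0.0: the dependence is invisible to covariance
```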
7
Dimensionality Reduction
  • If you work in data analytics, it is common
    these days to be handed a data set that has lots
    of variables (dimensions).
  • The information in these variables is often
    redundant: there are only a few sources of
    genuine information.
  • Question: how can we identify these sources
    automatically?

8
Hidden Sources of Variance
[Figure: hidden sources H1 and H2 behind the observed variables X1, X2, X3, X4, which form the columns of the DATA table]
Model: hidden sources are linear combinations of the
original variables.
9
Hidden Sources
  • If each known variable provided different (non-redundant)
    information, then the covariance matrix between the
    variables would be a diagonal matrix, i.e., the non-zero
    entries would appear only on the diagonal.
  • In particular, if Hi and Hj are independent, then
    E[(Hi - μi)(Hj - μj)] = 0.
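To make the diagonal-versus-redundant picture concrete, the sketch below uses two hypothetical hidden sources H1 and H2 chosen to have zero covariance, then builds three observed variables as linear combinations of them (the mixing weights are made up for illustration). Zero covariance is the condition stated above; independence implies it, although the converse does not hold in general.

```python
def covariance_matrix(rows):
    """D x D sample covariance of an N x D table (divides by N - 1)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    c = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(r[j] * r[k] for r in c) / (n - 1) for k in range(d)]
            for j in range(d)]


# Two hypothetical hidden sources with zero covariance.
h1 = [1, -1, 1, -1]
h2 = [1, 1, -1, -1]

# Covariance of the sources themselves: the off-diagonal entries are 0.
sources = [[a, b] for a, b in zip(h1, h2)]
source_cov = covariance_matrix(sources)

# Hypothetical observed variables: X1 = H1 + H2, X2 = H1 - H2, X3 = 2 * H1.
observed = [[a + b, a - b, 2 * a] for a, b in zip(h1, h2)]
observed_cov = covariance_matrix(observed)

print(source_cov)    # diagonal
print(observed_cov)  # X3 now covaries with X1 and X2: redundant information
```

Three observed columns but only two genuine sources is exactly the redundancy the previous slide describes.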

10
Hidden Sources
  • So the question is: what should the hidden
    sources be?
  • It turns out that the best hidden sources
    correspond to the eigenvectors of the covariance matrix.
  • If A is a d x d matrix, then <λ, x> is an
    eigenvalue-eigenvector pair if
  • Ax = λx (with x ≠ 0)
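One simple way to find the leading eigenvalue-eigenvector pair numerically is power iteration: repeatedly multiply a vector by A and renormalize. The sketch below is a swapped-in illustration (power iteration is not mentioned on the slides) with hypothetical helper names and a made-up 2 x 2 symmetric matrix; it then checks Ax = λx for the result.

```python
def mat_vec(a, x):
    """Multiply a square matrix (given as a list of rows) by a vector."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def power_iteration(a, iters=1000, tol=1e-12):
    """Approximate the dominant eigenvalue-eigenvector pair <lam, x> of a.

    Sketch only: assumes a is symmetric with a unique, positive largest
    eigenvalue (as for a typical covariance matrix).
    """
    d = len(a)
    x = [1.0 / (i + 1) for i in range(d)]  # arbitrary non-zero start vector
    for _ in range(iters):
        y = mat_vec(a, x)
        norm = sum(v * v for v in y) ** 0.5
        y = [v / norm for v in y]  # keep the iterate at unit length
        delta = max(abs(u - v) for u, v in zip(x, y))
        x = y
        if delta < tol:
            break
    # For a unit vector x, the Rayleigh quotient x^T A x estimates lambda.
    lam = sum(u * v for u, v in zip(x, mat_vec(a, x)))
    return lam, x


A = [[2.0, 1.0],
     [1.0, 2.0]]
lam, x = power_iteration(A)
residual = [ax - lam * xi for ax, xi in zip(mat_vec(A, x), x)]
print(round(lam, 6), [round(v, 6) for v in x])  # 3.0 [0.707107, 0.707107]
print(max(abs(r) for r in residual) < 1e-9)     # True: Ax = lambda x
```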

11
Explanation
[Figure: data in the X1-X2 plane with a projection direction a]
We have two axes, X1 and X2. We want to project
the data onto the direction a of maximum variance.
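A brute-force version of this picture, as a sketch under assumptions: the 2-D points are hypothetical, the direction is parameterized as the unit vector a = (cos t, sin t), and every whole-degree angle is tried. The winning angle lies close to the X1 = X2 diagonal along which the data is stretched.

```python
import math

# Hypothetical mean-centred 2-D data (columns X1, X2), stretched along the diagonal.
points = [(-2.0, -1.8), (-1.0, -1.2), (0.0, 0.1), (1.0, 0.9), (2.0, 2.0)]


def projected_variance(points, angle):
    """Sample variance of the data projected onto a = (cos angle, sin angle)."""
    a = (math.cos(angle), math.sin(angle))
    proj = [x1 * a[0] + x2 * a[1] for x1, x2 in points]
    mean = sum(proj) / len(proj)
    return sum((p - mean) ** 2 for p in proj) / (len(proj) - 1)


# Try every direction from 0 to 179 degrees (a and -a give the same variance).
best = max(range(180), key=lambda deg: projected_variance(points, math.radians(deg)))
print(best)  # 44 for this data, close to the X1 = X2 diagonal
```

Scanning angles only works in two dimensions; the eigenvector characterization on the later slides gives the same direction in any dimension.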
12
Covariance Matrix Properties
  • The covariance matrix is symmetric.
  • It is positive semi-definite, so its eigenvalues are non-negative:
  • 0 ≤ λ1 ≤ λ2 ≤ … ≤ λd
  • Corresponding eigenvectors:
  • u1, u2, …, ud
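Why the eigenvalues cannot be negative: for any vector a, a^T Cov a equals the sample variance of the projected data Xa, which is never negative; for a unit eigenvector u with eigenvalue λ this quantity is exactly λ. The sketch below checks both properties on hypothetical random data (the helper names and the seed are assumptions).

```python
import random


def covariance_matrix(rows):
    """D x D sample covariance of an N x D table (divides by N - 1)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    c = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(r[j] * r[k] for r in c) / (n - 1) for k in range(d)]
            for j in range(d)]


def quadratic_form(c, a):
    """Compute a^T C a."""
    d = len(a)
    return sum(a[j] * c[j][k] * a[k] for j in range(d) for k in range(d))


def sample_variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


random.seed(0)
# Hypothetical data: 50 instances with 3 attributes.
rows = [[random.gauss(0.0, 1.0) for _ in range(3)] for _ in range(50)]
cov = covariance_matrix(rows)

# Property 1: symmetry.
symmetric = all(cov[j][k] == cov[k][j] for j in range(3) for k in range(3))

# Property 2: a^T C a equals the variance of the projected data, so it is >= 0.
gaps, forms = [], []
for _ in range(100):
    a = [random.uniform(-1.0, 1.0) for _ in range(3)]
    q = quadratic_form(cov, a)
    proj = [sum(r[j] * a[j] for j in range(3)) for r in rows]
    forms.append(q)
    gaps.append(abs(q - sample_variance(proj)))

print(symmetric, min(forms) >= 0, max(gaps) < 1e-9)  # True True True
```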

13
Principal Component Analysis
  • Closely related to:
  • Singular Value Decomposition
  • Latent Semantic Indexing
  • A technique for data reduction: it reduces
    the number of columns while losing as little
    information as possible.
  • It can also be thought of as lossy compression.
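A minimal sketch of PCA as data reduction, assuming the usual recipe of projecting mean-centred data onto the leading eigenvector of the covariance matrix (found here by power iteration, a swapped-in numerical method). The function name and data are hypothetical. Two nearly redundant columns are reduced to one, and the reconstruction error measures the information lost.

```python
def first_principal_component(rows, iters=1000):
    """Reduce N x D data to one column by projecting onto the top eigenvector.

    Sketch only: assumes the covariance matrix has a unique largest
    eigenvalue. Returns the column means, the unit direction u1 and the
    N scores (the new single column).
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    c = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(r[j] * r[k] for r in c) / (n - 1) for k in range(d)]
           for j in range(d)]
    # Power iteration for the leading eigenvector of the covariance matrix.
    u = [1.0 / (j + 1) for j in range(d)]
    for _ in range(iters):
        w = [sum(cov[j][k] * u[k] for k in range(d)) for j in range(d)]
        norm = sum(t * t for t in w) ** 0.5
        u = [t / norm for t in w]
    scores = [sum(r[j] * u[j] for j in range(d)) for r in c]
    return means, u, scores


# Hypothetical data: the two columns carry almost the same information.
rows = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.8]]
means, u1, scores = first_principal_component(rows)

# Lossy reconstruction from one stored number per row instead of two.
approx = [[means[j] + s * u1[j] for j in range(2)] for s in scores]
sq_error = sum((x - y) ** 2 for r, a in zip(rows, approx) for x, y in zip(r, a))
print([round(s, 3) for s in scores])
print(round(sq_error, 4))  # small compared with the total squared spread
```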

14
Motivation
  • Much of the data we deal with has a time component
  • For example, retail transactions and stock prices
  • The data set can be organized as an N x M table
  • E.g., N customers (rows) and the price of the calls
    they made on each of M = 365 days (columns)
  • M << N

15
Objective
  • Compress the data matrix X into Xc such that
  • the compression ratio is high and the average
    error between the original and the compressed
    matrix is low.
  • N could be on the order of millions and M on the
    order of hundreds.

16
Example database
Customer  Wed 7/10  Thu 7/11  Fri 7/12  Sat 7/13  Sun 7/14
ABC       1         1         1         0         0
DEF       2         2         2         0         0
GHI       1         1         1         0         0
KLM       5         5         5         0         0
smith     0         0         0         2         2
john      0         0         0         3         3
tom       0         0         0         1         1
17
Decision Support Queries
  • What was the amount of sales to GHI on July 11?
  • What were the total sales to business customers for
    the week ending July 12th?
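Both queries can be answered directly from the uncompressed table, as in the sketch below; a compressed representation would need to reproduce these answers, at least approximately. The dictionary layout, treating the upper-case codes as the business customers, and reading "week ending July 12th" as the table's columns up to 7/12 are all assumptions.

```python
days = ["7/10", "7/11", "7/12", "7/13", "7/14"]
sales = {
    "ABC":   [1, 1, 1, 0, 0],
    "DEF":   [2, 2, 2, 0, 0],
    "GHI":   [1, 1, 1, 0, 0],
    "KLM":   [5, 5, 5, 0, 0],
    "smith": [0, 0, 0, 2, 2],
    "john":  [0, 0, 0, 3, 3],
    "tom":   [0, 0, 0, 1, 1],
}

# Assumption: the upper-case codes are the business customers.
business = [name for name in sales if name.isupper()]

# Query 1: sales to GHI on July 11 (a single cell).
q1 = sales["GHI"][days.index("7/11")]

# Query 2: total sales to business customers over the table's days up to and
# including July 12 (assumption: earlier days of that week are not in the table).
last = days.index("7/12")
q2 = sum(sales[name][j] for name in business for j in range(last + 1))

print(q1, q2)  # 1 27
```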

18
Intuition behind SVD
[Figure: two panels plotting the customers as points on x and y axes]
Customers are 2-D points.
19
SVD Definition
  • An N x M matrix X can be expressed as

where Lambda is a diagonal r x r matrix and r is the rank of X.
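For reference, the standard compact form of the SVD, written with an r x r diagonal matrix Lambda (r = rank of X), is as follows; the names U and V for the factor matrices are the usual convention and an assumption here:

```latex
X = U\,\Lambda\,V^{T},
\qquad U \in \mathbb{R}^{N \times r},\quad
\Lambda = \operatorname{diag}(\lambda_1,\dots,\lambda_r) \in \mathbb{R}^{r \times r},\quad
V \in \mathbb{R}^{M \times r},
\qquad U^{T}U = V^{T}V = I_r
```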
20
SVD Definition
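  • More importantly, X can be written as

where the singular values (the diagonal entries of Lambda) are in decreasing order.
Keeping only the first k terms, with k < r, gives an approximation of X.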
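Written out with u_i and v_i as the i-th columns of U and V (notation assumed, as in the standard compact SVD), the expansion and its truncation after k terms are shown below. The last identity says the squared Frobenius error of the truncation equals the sum of the discarded squared singular values (by the Eckart-Young theorem, no rank-k matrix does better).

```latex
X = \sum_{i=1}^{r} \lambda_i\, u_i v_i^{T},
\qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_r > 0,
\qquad
X_k = \sum_{i=1}^{k} \lambda_i\, u_i v_i^{T}\ (k < r),
\qquad
\lVert X - X_k \rVert_F^{2} = \sum_{i=k+1}^{r} \lambda_i^{2}
```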
21
Example
22
Compression
Where k < r ≤ M
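A minimal sketch of this compression on the example database, assuming the truncated SVD is computed by power iteration with deflation (a swapped-in numerical method; in practice a library SVD routine would be used). Helper names are hypothetical. Because the table has rank 2, keeping k = 2 terms reproduces it exactly, while k = 1 keeps only the weekday pattern.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def top_singular_triplet(x, iters=500):
    """Leading (sigma, u, v) of x via power iteration on x^T x.

    Sketch only: assumes x is non-zero and its top singular value is unique.
    """
    xtx = matmul(transpose(x), x)
    m = len(xtx)
    v = [1.0 / (j + 1) for j in range(m)]  # arbitrary non-zero start vector
    for _ in range(iters):
        w = [sum(xtx[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = sum(t * t for t in w) ** 0.5
        v = [t / norm for t in w]
    xv = [sum(a * b for a, b in zip(row, v)) for row in x]
    sigma = sum(t * t for t in xv) ** 0.5  # sigma = ||X v||
    u = [t / sigma for t in xv]
    return sigma, u, v


def truncated_svd(x, k):
    """First k terms sigma_i u_i v_i^T of the SVD, found by deflation."""
    residual = [[float(t) for t in row] for row in x]
    terms = []
    for _ in range(k):
        sigma, u, v = top_singular_triplet(residual)
        terms.append((sigma, u, v))
        # Subtract the rank-1 term so the next pass finds the next one.
        residual = [[residual[i][j] - sigma * u[i] * v[j]
                     for j in range(len(v))] for i in range(len(u))]
    return terms


def reconstruct(terms, n, m):
    return [[sum(s * u[i] * v[j] for s, u, v in terms) for j in range(m)]
            for i in range(n)]


def frobenius(a, b):
    return sum((p - q) ** 2 for ra, rb in zip(a, b)
               for p, q in zip(ra, rb)) ** 0.5


# The example database: 7 customers x 5 days.
X = [[1, 1, 1, 0, 0], [2, 2, 2, 0, 0], [1, 1, 1, 0, 0], [5, 5, 5, 0, 0],
     [0, 0, 0, 2, 2], [0, 0, 0, 3, 3], [0, 0, 0, 1, 1]]
n, m = len(X), len(X[0])

terms = truncated_svd(X, 2)
print([round(s, 4) for s, _, _ in terms])  # [9.6437, 5.2915]
for k in (1, 2):
    err = frobenius(X, reconstruct(terms[:k], n, m))
    stored = k * (n + m + 1)  # numbers kept: k values, k u-vectors, k v-vectors
    print(k, round(err, 4), stored, "of", n * m)
```

Storing k(N + M + 1) numbers instead of NM is where the savings come from once N is in the millions and M in the hundreds.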
23
Explanation
  • Let X be a mean-centered N x d matrix.
  • Let a be a d x 1 unit vector, arbitrary for now.
  • The projection of X onto a is given by Xa.
  • We want to maximize the variance of Xa.
  • The constraint is that a^T a = 1.
  • It can be shown that a is given by the solution
    of the equation (X^T X - λI) a = 0.
  • In other words, a is an eigenvector of the
    covariance matrix and λ is the corresponding eigenvalue;
    the variance is largest for the eigenvector with the largest λ.
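The step behind "it can be shown" is a Lagrange-multiplier argument. Since X is mean-centred, the variance of Xa is a^T X^T X a / (N - 1), and the constant factor does not change the maximizer:

```latex
\begin{aligned}
&\max_{a}\; a^{T}X^{T}X\,a \quad \text{subject to} \quad a^{T}a = 1,\\
&\mathcal{L}(a,\lambda) = a^{T}X^{T}X\,a - \lambda\,(a^{T}a - 1),\\
&\nabla_{a}\mathcal{L} = 2X^{T}X\,a - 2\lambda\,a = 0
  \;\Longrightarrow\; (X^{T}X - \lambda I)\,a = 0,\\
&a^{T}X^{T}X\,a = \lambda\,a^{T}a = \lambda .
\end{aligned}
```

The last line shows that the objective at a unit eigenvector equals its eigenvalue, so the best direction is the eigenvector with the largest λ. Because the covariance matrix is X^T X / (N - 1), it has the same eigenvectors as X^T X.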
