EE3J2 Data Mining, Lecture 9: Data Analysis (Martin Russell)

1
EE3J2 Data Mining
Lecture 9: Data Analysis
Martin Russell
2
Objectives
  • To review basic data analysis
  • To review the notions of mean, variance and
    covariance
  • To explain Principal Components Analysis (PCA)

3
Example from speech processing
  • Plot of high-frequency energy vs low-frequency
    energy, for 25 ms speech segments, sampled every
    10 ms

4
Basic statistics
5
Basic statistics
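  • Denote the samples by
    $X = \{x_1, x_2, \ldots, x_T\}$,
    where each $x_t = (x_{t1}, x_{t2}, \ldots, x_{tN})$
    is an N-dimensional vector
  • The sample mean $\mu(X)$ is given by
    $\mu(X) = \frac{1}{T} \sum_{t=1}^{T} x_t$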
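
As a minimal sketch of the sample mean, the snippet below averages each component over the T samples. The function name `sample_mean` and the data values are hypothetical, chosen only for illustration (think of each pair as low- and high-frequency energy for one speech segment).

```python
def sample_mean(samples):
    """Return the component-wise mean of a list of equal-length vectors."""
    T = len(samples)      # number of samples
    N = len(samples[0])   # dimension of each sample
    return [sum(x[n] for x in samples) / T for n in range(N)]


# Hypothetical 2-dimensional data, T = 3 samples
X = [(1.0, 2.0), (3.0, 6.0), (5.0, 10.0)]
mu = sample_mean(X)
print(mu)
```

The result has one entry per component, so it is itself an N-dimensional vector like the samples.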

6
More basic statistics
  • The sample variance $\sigma^2(X)$ is the average squared
    deviation of the samples from the sample mean $\mu(X)$
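
The formula on this slide did not survive extraction, so the following is only a sketch under a stated assumption: the variance of each component is taken as the mean squared deviation from the sample mean, normalised by T (some texts divide by T - 1 instead). The function name and data are hypothetical.

```python
import statistics


def sample_variance(samples):
    """Per-component variance, ASSUMING normalisation by T (not T - 1)."""
    T = len(samples)
    N = len(samples[0])
    mu = [sum(x[n] for x in samples) / T for n in range(N)]
    return [sum((x[n] - mu[n]) ** 2 for x in samples) / T for n in range(N)]


# Hypothetical 2-dimensional data, T = 3 samples
X = [(1.0, 2.0), (3.0, 6.0), (5.0, 10.0)]
var = sample_variance(X)

# Cross-check the first component against the standard library's
# population variance, which also divides by T.
reference = statistics.pvariance([x[0] for x in X])
print(var, reference)
```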

7
Covariance
  • As the x value increases, the y value also
    increases
  • This is (positive) co-variance
  • If y decreases as x increases, the result is
    negative covariance
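
To make the sign of the covariance concrete, here is a small sketch with two made-up data sets: one where y rises with x and one where y falls as x rises. The `covariance` helper and the normalisation by T are assumptions for illustration, not taken from the slides.

```python
def covariance(xs, ys):
    """Covariance of two equal-length sequences, ASSUMING normalisation by T."""
    T = len(xs)
    mean_x = sum(xs) / T
    mean_y = sum(ys) / T
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / T


x = [1.0, 2.0, 3.0, 4.0]
y_up = [1.1, 1.9, 3.2, 3.8]    # y increases as x increases
y_down = [3.9, 3.0, 2.1, 0.8]  # y decreases as x increases

pos = covariance(x, y_up)
neg = covariance(x, y_down)
print(pos > 0, neg < 0)
```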

8
Definition of covariance
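  • The covariance between the mth and nth components
    of the sample data is defined as the average, over
    the samples, of the product of the two components'
    deviations from their means
  • In practice it is useful to subtract the mean
    $\mu(X)$ from each of the data points $x_t$. The sample
    mean is then 0 and the covariance reduces to the
    average of the products $x_{tm} x_{tn}$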
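
A sketch of this two-step recipe: subtract the mean first, then form every pairwise covariance as an average of products. The helper names, the data, and the normalisation by T are assumptions made for this example.

```python
def centre(samples):
    """Subtract the sample mean from every data point."""
    T = len(samples)
    N = len(samples[0])
    mu = [sum(x[n] for x in samples) / T for n in range(N)]
    return [[x[n] - mu[n] for n in range(N)] for x in samples]


def covariance_matrix(centred):
    """N x N covariance matrix of zero-mean data, ASSUMING division by T."""
    T = len(centred)
    N = len(centred[0])
    return [[sum(x[m] * x[n] for x in centred) / T for n in range(N)]
            for m in range(N)]


# Hypothetical 2-dimensional data, T = 4 samples
X = [(2.0, 1.0), (4.0, 3.0), (6.0, 2.0), (8.0, 6.0)]
Z = centre(X)              # zero-mean version of X
C = covariance_matrix(Z)   # C[m][n] is the covariance of components m and n
print(C)
```

The diagonal entries of `C` are the variances of the individual components, and the matrix is symmetric because the product of two components does not depend on their order.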

9
Data with mean subtracted
10
Sample data rotated through 2?
11
Data with covariance removed
12
Principal Components Analysis
  • PCA is the technique which I used to diagonalise
    the sample covariance matrix
  • The first step is to write the covariance matrix
    $C$ in the form
    $C = U D U^T$
  • where D is diagonal and U is a matrix
    corresponding to a rotation
  • Can do this using SVD (see lecture 8) or
    eigenvalue decomposition
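
For two-dimensional data the factorisation can be written down in closed form, which avoids a general SVD or eigenvalue routine. The sketch below is one way to do it, not necessarily the method used in the lecture: it picks the rotation angle that zeroes the off-diagonal term. The function name and the example matrix are hypothetical.

```python
import math


def diagonalise_2x2(C):
    """Return (theta, U, D) with C = U D U^T for a symmetric 2 x 2 matrix C."""
    (a, b), (_, c) = C
    # Angle that makes the off-diagonal entry of U^T C U vanish
    theta = 0.5 * math.atan2(2.0 * b, a - c)
    cs, sn = math.cos(theta), math.sin(theta)
    U = [[cs, -sn], [sn, cs]]  # rotation through theta
    # Entries of D = U^T C U, written out explicitly
    d11 = a * cs * cs + 2.0 * b * cs * sn + c * sn * sn
    d22 = a * sn * sn - 2.0 * b * cs * sn + c * cs * cs
    d12 = (c - a) * cs * sn + b * (cs * cs - sn * sn)
    return theta, U, [[d11, d12], [d12, d22]]


# Hypothetical sample covariance matrix
C = [[5.0, 3.5], [3.5, 3.5]]
theta, U, D = diagonalise_2x2(C)
print(theta, D)
```

With this choice of angle the larger variance ends up in the first diagonal entry of D, and the off-diagonal entries are zero up to rounding error.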

13
PCA continued
  • U implements a rotation through angle $\theta$
  • $e_1$ is the first column of U; $d_{11}$ is the
    variance in the direction $e_1$
  • $e_2$ is the second column of U; $d_{22}$ is the
    variance in the direction $e_2$
(Figure: the axes $e_1$ and $e_2$, rotated through angle $\theta$)
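
The claim that the diagonal entries are the variances along the rotated axes can be checked numerically. In this sketch, made-up zero-mean 2-D data are projected onto two orthogonal unit vectors at angle theta, and the variances and covariance of the projections are computed. The closed-form angle and the normalisation by T are assumptions for illustration.

```python
import math

# Hypothetical zero-mean 2-dimensional data, T = 4 samples
Z = [(-3.0, -2.0), (-1.0, 0.0), (1.0, -1.0), (3.0, 3.0)]
T = len(Z)

# Entries of the sample covariance matrix (ASSUMING division by T)
a = sum(x * x for x, _ in Z) / T
b = sum(x * y for x, y in Z) / T
c = sum(y * y for _, y in Z) / T

# Rotation angle that diagonalises the 2 x 2 covariance matrix
theta = 0.5 * math.atan2(2.0 * b, a - c)
e1 = (math.cos(theta), math.sin(theta))    # first column of U
e2 = (-math.sin(theta), math.cos(theta))   # second column of U

# Coordinates of each data point along e1 and e2
Y = [(x * e1[0] + y * e1[1], x * e2[0] + y * e2[1]) for x, y in Z]

var1 = sum(p * p for p, _ in Y) / T    # variance in direction e1
var2 = sum(q * q for _, q in Y) / T    # variance in direction e2
cov12 = sum(p * q for p, q in Y) / T   # covariance of the rotated data
print(var1, var2, cov12)
```

The covariance of the rotated data is zero up to rounding error, and the two variances sum to the total variance of the original data.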
14
Summary
  • Basic data analysis
  • Means, variance and covariance
  • Principal Components Analysis