Statistical Regression and Correlation - PowerPoint PPT Presentation

About This Presentation
Title:

Statistical Regression and Correlation

Slides: 26
Provided by: kahun

Transcript and Presenter's Notes



1
Statistical Regression and Correlation

2
Introduction
We are given a list of observations
And we are asked to draw some type of conclusion
from these observations. How are we going to do
that?
Let's see what the data looks like by plotting it
with a scatter plot first.
3
Scatter Diagram
4
The Problem: Our Conclusions
[Figure: two candidate lines through the scatter diagram, labeled "My Line" and "Your Line"]
5
Scatter Diagram and Best Line
[Figure: scatter diagram with the best line; labels mark the points (x1, y1) and (xi, yi), the residual ei, and the centroid (mx, my)]
6
Linear Assumption
Now let's assume the relationship is linear. In
this case we already know one point on the line:
the mean.
The trick: can we find a general a and b for the
line? The answer is YES (in some sense). In
fact, we are looking for estimators of a and b
(how do we estimate a and b from the data
given?). The problem then is to find the
relation
7
Method of Least Squares
But if there is error in the relationship, then we
have
where ei is the residual error at each
observation. Now we actually have n such
observations, so that
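The equations on this slide did not survive in the transcript. A plausible reconstruction, assuming the line is written with intercept a and slope b (the roles the later MATLAB example gives them):

```latex
\begin{align*}
y   &= a + b\,x                                  && \text{(error-free linear relation)} \\
y_i &= a + b\,x_i + e_i, \quad i = 1, \dots, n   && \text{(with residual error } e_i \text{)}
\end{align*}
```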
8
Method of Least Squares (2)
The power in the residuals is
But P is also
We want to minimize P (the residual error power)
with
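The expressions for P are not preserved in this transcript. A reconstruction, assuming the model y_i = a + b x_i + e_i with intercept a and slope b:

```latex
\begin{align*}
P &= \sum_{i=1}^{n} e_i^{2} \\
P &= \sum_{i=1}^{n} \left( y_i - a - b\,x_i \right)^{2}
\end{align*}
```

Under this reading, P is minimized with respect to the two unknowns a and b.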
9
Least Squares Error Minimization
Minimization of P occurs when
And differentiating, we find the conditions are
(1)
(2)
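Equations (1) and (2) are missing from the transcript. Assuming P = Σ (y_i − a − b x_i)², the standard minimization conditions would be as follows; the numbering is matched to the slide on that assumption:

```latex
\begin{align*}
\frac{\partial P}{\partial a} &= 0, \qquad \frac{\partial P}{\partial b} = 0 \\[4pt]
\frac{\partial P}{\partial a} &= -2 \sum_{i=1}^{n} \left( y_i - \hat{a} - \hat{b}\,x_i \right) = 0
  \;\Longrightarrow\; \sum_{i=1}^{n} \left( y_i - \hat{a} - \hat{b}\,x_i \right) = 0 \tag{1} \\
\frac{\partial P}{\partial b} &= -2 \sum_{i=1}^{n} x_i \left( y_i - \hat{a} - \hat{b}\,x_i \right) = 0
  \;\Longrightarrow\; \sum_{i=1}^{n} x_i \left( y_i - \hat{a} - \hat{b}\,x_i \right) = 0 \tag{2}
\end{align*}
```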
10
Least Squares Error Minimization (2)
Rewrite (1) as
And since â is constant, we find
(3)
Or
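The rewritten forms are not preserved here. Assuming condition (1) reads Σ (y_i − â − b̂ x_i) = 0, the steps would be:

```latex
\begin{align*}
\sum_{i=1}^{n} y_i - \sum_{i=1}^{n} \hat{a} - \hat{b} \sum_{i=1}^{n} x_i &= 0 \\
\sum_{i=1}^{n} y_i &= n\,\hat{a} + \hat{b} \sum_{i=1}^{n} x_i \tag{3} \\
m_y &= \hat{a} + \hat{b}\,m_x
\end{align*}
```

The last line follows from dividing (3) by n, and says that the fitted line passes through the centroid (m_x, m_y).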
11
Least Squares Error Minimization (3)
This result is exactly what we conjectured
before. Now we need another point to make the
line. From (2) we have
(4)
Equations (3) and (4) are called the normal
equations. They must be solved simultaneously to
find â and b̂. Solving, we find
and
(5)
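The normal equations and their solution are missing from the transcript. A reconstruction under the usual assumptions (line y = a + bx, sums over i = 1..n); which of the two solutions the slide labels (5) cannot be recovered:

```latex
\begin{align*}
\sum y_i     &= n\,\hat{a} + \hat{b} \sum x_i \tag{3} \\
\sum x_i y_i &= \hat{a} \sum x_i + \hat{b} \sum x_i^{2} \tag{4} \\[4pt]
\hat{a} &= \frac{\sum y_i \sum x_i^{2} - \sum x_i \sum x_i y_i}{n \sum x_i^{2} - \left( \sum x_i \right)^{2}},
\qquad
\hat{b} = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^{2} - \left( \sum x_i \right)^{2}}
\end{align*}
```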
12
Least Squares Error Minimization (4)
And noting that
since
We find
(6)
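What this slide notes is not recoverable from the transcript. One reading that stays consistent with the general slope b̂ = (n Σ x_i y_i − Σ x_i Σ y_i) / (n Σ x_i² − (Σ x_i)²) is a rewrite in terms of the means; treat it as an assumption:

```latex
\begin{align*}
\sum x_i = n\,m_x, \qquad \sum y_i &= n\,m_y \\
\hat{b} &= \frac{\sum x_i y_i - n\,m_x m_y}{\sum x_i^{2} - n\,m_x^{2}}
\end{align*}
```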
13
Least Squares Error Minimization (5)
Equations (5) and (6) are most general and
computationally cumbersome. By recognizing that
(mx,my) is the point in the middle, we can
perform an axis transformation by centering this
centroidal point at the origin. The new
coordinates are
Also we note that now
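The new coordinates are not shown in the transcript. Assuming upper-case X, Y denote the centered points (as the later correlation slides suggest), the transformation would be:

```latex
\begin{align*}
X_i &= x_i - m_x, \qquad Y_i = y_i - m_y \\
\sum_{i=1}^{n} X_i &= 0, \qquad \sum_{i=1}^{n} Y_i = 0
\end{align*}
```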
14
Least Squares Error Minimization (6)
So from (5) and (6) we have
(7)
Which is equivalent to writing
(8)
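Equations (7) and (8) are missing. Assuming centered coordinates X_i = x_i − m_x and Y_i = y_i − m_y, so that Σ X_i = Σ Y_i = 0, the normal equations would reduce to:

```latex
\begin{align*}
\hat{b} &= \frac{\sum X_i Y_i}{\sum X_i^{2}}, \qquad \text{intercept in centered coordinates} = 0 \tag{7} \\
Y = \hat{b}\,X \;&\Longleftrightarrow\; y - m_y = \hat{b}\,(x - m_x) \tag{8}
\end{align*}
```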
15
Example
We will use the method of least squares in MATLAB
in order to fit the equation of a line to data with
random errors. Given a vector of observations
x = 0:1:10;
y = [-2.1129 4.8303 2.3898 7.0575 8.4386 ...
     8.1562 7.6587 13.8816 13.9787 19.2289 21.0155];
plot(x, y, 'o')
hold on
plot(x, 2*x, 'r')        % this was the ideal curve y = 2x
ls = polyfit(x, y, 1);   % this fits a line (degree 1) with least squares
b = ls(1);               % slope
a = ls(2);               % intercept
plot(x, b.*x + a, 'b')
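As a cross-check on the example, the slope and intercept can also be computed directly from centered sums and compared with polyfit. This is a minimal sketch, not the presenter's code; the variable names (mx, my, X, Y, b_hat, a_hat, p) are illustrative, and the centered-sum formula is the standard one, assumed to match the slides' missing equations.

```matlab
% Sketch: closed-form least squares line for the example data.
% All variable names here are illustrative, not from the slides.
x = 0:1:10;
y = [-2.1129 4.8303 2.3898 7.0575 8.4386 ...
     8.1562 7.6587 13.8816 13.9787 19.2289 21.0155];

mx = mean(x);                        % centroid, x coordinate
my = mean(y);                        % centroid, y coordinate
X  = x - mx;                         % centered coordinates
Y  = y - my;

b_hat = sum(X .* Y) / sum(X .^ 2);   % slope from centered sums
a_hat = my - b_hat * mx;             % intercept: line passes through centroid

p = polyfit(x, y, 1);                % p(1) = slope, p(2) = intercept
fprintf('closed form: b = %.4f, a = %.4f\n', b_hat, a_hat);
fprintf('polyfit:     b = %.4f, a = %.4f\n', p(1), p(2));
```

Both lines of output should agree to within rounding, and the slope should land near the true value of 2.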
16
Example (continued)
True Line y = 2x (red)
Least Squares Estimate (blue)
17
Correlation
Correlation measures the degree of relationship
between the independent and dependent variables
[Figure: a point (xi, yi) plotted on x and y axes with the centroid (mx, my); the total deviation (yi - my) is split into an explained deviation and an unexplained deviation]
18
Coefficient of Determination
Write the total deviation as
(9)
Square and sum both sides
(10)
Note, the cross term (from the result of normal
equations)
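Equations (9) and (10) are not preserved. A reconstruction, assuming ŷ_i = â + b̂ x_i is the fitted value and e_i = y_i − ŷ_i the residual:

```latex
\begin{align*}
(y_i - m_y) &= (y_i - \hat{y}_i) + (\hat{y}_i - m_y) \tag{9} \\
\sum (y_i - m_y)^{2} &= \sum (y_i - \hat{y}_i)^{2} + \sum (\hat{y}_i - m_y)^{2}
  + 2 \sum (y_i - \hat{y}_i)(\hat{y}_i - m_y) \tag{10} \\
\sum (y_i - \hat{y}_i)(\hat{y}_i - m_y) &= \hat{b} \sum e_i\,(x_i - m_x) = 0
\end{align*}
```

The cross term vanishes because the normal equations give Σ e_i = 0 and Σ x_i e_i = 0, so the total variation splits into unexplained plus explained variation.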
19
Coefficient of Determination (2)
The ratio of explained variation to total
variation, which explains how well the regression
line fits the observed data, is now written as
(11)
Note, this quantity lies between 0 and 1. If r² =
1, then all points lie exactly on the regression
line. If r² = 0, then the regression line does
not explain the data at all (there is no linear
relationship that can be drawn in this case).
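A short MATLAB sketch of the ratio described here, using the example data from the earlier slide. It assumes the coefficient of determination is explained variation divided by total variation, r² = Σ (ŷ_i − m_y)² / Σ (y_i − m_y)²; the variable names are illustrative.

```matlab
% Sketch: coefficient of determination for the example data.
% Names (y_fit, SS_total, SS_explained, SS_unexplained, r2) are illustrative.
x = 0:1:10;
y = [-2.1129 4.8303 2.3898 7.0575 8.4386 ...
     8.1562 7.6587 13.8816 13.9787 19.2289 21.0155];

p     = polyfit(x, y, 1);            % least squares line
y_fit = polyval(p, x);               % fitted values on the line
my    = mean(y);

SS_total       = sum((y - my) .^ 2);      % total variation
SS_explained   = sum((y_fit - my) .^ 2);  % explained by the line
SS_unexplained = sum((y - y_fit) .^ 2);   % residual variation

r2 = SS_explained / SS_total;        % coefficient of determination
fprintf('r^2 = %.4f\n', r2);
```

The three sums also give a quick numerical check that total variation equals explained plus unexplained variation.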
20
Correlation Coefficient
Recall the slope of the least squares best fit
line is
(6)
If this quantity is 0, then the line is
horizontal. By swapping y for x in the
denominator, the slope measures regression of x
on y. If x is not a function of y, then the
quantity
(12)
21
Correlation Coefficient (2)
But the slope measured with y vertical and x
horizontal is
(13)
If no correlation exists between the two
variables being studied, the product of (11) and
(12) is zero
(14)
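The slope expressions are missing from the transcript. A reconstruction in centered coordinates X_i = x_i − m_x, Y_i = y_i − m_y, with b for the regression of y on x and b′ for the regression of x on y (b′ is a label introduced here, not from the slides):

```latex
\begin{align*}
b  &= \frac{\sum X_i Y_i}{\sum X_i^{2}} && \text{(regression of } y \text{ on } x\text{)} \\
b' &= \frac{\sum X_i Y_i}{\sum Y_i^{2}} && \text{(regression of } x \text{ on } y\text{, fitted as } X = b' Y\text{)} \\
\frac{1}{b'} &= \frac{\sum Y_i^{2}}{\sum X_i Y_i} && \text{(same line, slope with } y \text{ vertical and } x \text{ horizontal)} \\
b\,b' &= \frac{\left( \sum X_i Y_i \right)^{2}}{\sum X_i^{2} \sum Y_i^{2}} = 0 && \text{when } \sum X_i Y_i = 0
\end{align*}
```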
22
Correlation Coefficient (3)
For perfect correlation, the regression on x and
the regression on y line up: the two lines are
equal, so that
(15)
Or we may equally write (for perfect correlation)
(16)
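Equations (15) and (16) are not preserved. Assuming the two slopes in the (x, y) plane are Σ X_i Y_i / Σ X_i² for the regression of y on x and Σ Y_i² / Σ X_i Y_i for the regression of x on y, the perfect-correlation condition would read:

```latex
\begin{align*}
\frac{\sum X_i Y_i}{\sum X_i^{2}} &= \frac{\sum Y_i^{2}}{\sum X_i Y_i} \tag{15} \\
\frac{\left( \sum X_i Y_i \right)^{2}}{\sum X_i^{2} \sum Y_i^{2}} &= 1 \tag{16}
\end{align*}
```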
23
Correlation Coefficient (4)
The correlation coefficient is the square root of
(16)
(17)
Which can also be written in terms of centered
points, X, Y as
(18)
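The formulas for r are missing. The standard sample correlation coefficient, assumed here to be what (17) and (18) express, is:

```latex
\begin{align*}
r &= \frac{\sum (x_i - m_x)(y_i - m_y)}{\sqrt{\sum (x_i - m_x)^{2} \, \sum (y_i - m_y)^{2}}} \tag{17} \\
r &= \frac{\sum X_i Y_i}{\sqrt{\sum X_i^{2} \, \sum Y_i^{2}}}, \qquad X_i = x_i - m_x, \; Y_i = y_i - m_y \tag{18}
\end{align*}
```

Written this way, r keeps the sign of Σ X_i Y_i, which a bare square root would lose.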
24
Correlation Coefficient (5)
The correlation coefficient can also be derived
from the coefficient of determination as
(19)
Note, the correlation coefficient must lie in the
range -1 ≤ r ≤ 1.
But in practice, realistically, 0 < r < 1.
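A minimal MATLAB sketch that computes r for the example data in two ways: from centered sums and with the built-in corrcoef. It assumes r = Σ XY / sqrt(Σ X² Σ Y²) in centered coordinates; the names b_yx and b_xy for the two regression slopes are illustrative.

```matlab
% Sketch: correlation coefficient for the example data.
% Names (X, Y, b_yx, b_xy, r, R) are illustrative, not from the slides.
x = 0:1:10;
y = [-2.1129 4.8303 2.3898 7.0575 8.4386 ...
     8.1562 7.6587 13.8816 13.9787 19.2289 21.0155];

X = x - mean(x);                     % centered coordinates
Y = y - mean(y);

b_yx = sum(X .* Y) / sum(X .^ 2);    % slope, regression of y on x
b_xy = sum(X .* Y) / sum(Y .^ 2);    % slope, regression of x on y

r = sum(X .* Y) / sqrt(sum(X .^ 2) * sum(Y .^ 2));  % correlation coefficient
R = corrcoef(x, y);                  % 2x2 matrix; R(1,2) is the coefficient

fprintf('r (centered sums) = %.4f\n', r);
fprintf('r (corrcoef)      = %.4f\n', R(1, 2));
fprintf('b_yx * b_xy       = %.4f (equals r^2)\n', b_yx * b_xy);
```

The product of the two regression slopes reproduces r², which ties the slope argument on the preceding slides to the coefficient of determination.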