Title: Action Research Correlation and Regression
1. Action Research: Correlation and Regression

2. Measures of Association
- Measures of association are used to determine how strong the relationship is between two variables or measures, and how we can predict such a relationship
- Only applies for interval or ratio scale variables
- Everything this week only applies to interval or ratio scale variables!
3. Measures of Association
- For example, I have GRE and GPA scores for a random sample of graduate students
- How strong is the relationship between GRE scores and GPA? Do these variables relate to each other in some way?
- If there is a strong relationship, how well can we predict the values of one variable when values of the other variable are known?
4. Strength of Prediction
- Two techniques are used to describe the strength of a relationship, and to predict values of one variable when another variable's value is known
- Correlation: describes the degree (strength) to which the two variables are related
- Regression: used to predict the values of one variable when values of the other are known
5. Strength of Prediction
- Correlation and regression are linked -- the ability to predict one variable when another variable is known depends on the degree and direction of the variables' relationship in the first place
- We find correlation before we calculate regression
- So generating a regression without checking for a correlation first is pointless (though we'll do both at once)
6. Correlation
- There are different types of statistical measures of correlation
- They give us a measure known as the correlation coefficient
- The most common procedure used is known as the Pearson Product Moment Correlation, or Pearson's r
7. Pearson's r
- Can only be calculated for interval or ratio scale data
- Its value is a real number from -1 to 1
- Strength: as the value of r approaches -1 or 1, the relationship is stronger. As the magnitude of r approaches zero, we see little or no relationship
8. Pearson's r
- For example, r might equal 0.89, -0.9, 0.613, or -0.3
- Which would be the strongest correlation?
- Direction: positive or negative correlation is indicated by the sign of r, not by its magnitude
- The direction of the correlation also appears in the regression equation, in the sign of the slope constant obtained for it
9. Example of Relationships
- Positive direction -- as the independent variable increases, the dependent variable tends to increase
- Student  GRE (X)  GPA1 (Y)
- 1        1500     4.0
- 2        1400     3.8
- 3        1250     3.5
- 4        1050     3.1
- 5        950      2.9
10. Example of Relationships
- Negative direction -- as the independent variable increases, the dependent variable tends to decrease
- Student  GRE (X)  GPA2 (Y)
- 1        1500     2.9
- 2        1400     3.1
- 3        1250     3.4
- 4        1050     3.7
- 5        950      4.0
11. Positive and Negative Correlation
(Scatter plot of the data from slide 9)
(Scatter plot of the data from slide 10)
Notice that the magnitude of r alone doesn't tell whether the correlation is positive or negative!
12. Important Note
- An association value provided by a correlation analysis, such as Pearson's r, tells us nothing about causation
- In this case, high GRE scores don't necessarily cause high or low GPA scores, and vice versa
13. Significance of r
- We can test for the significance of r (to see whether our relationship is statistically significant) by consulting a table of critical values for r (Action Research p. 41/42)
- Table: VALUES OF THE CORRELATION COEFFICIENT FOR DIFFERENT LEVELS OF SIGNIFICANCE
- Where df = (number of data pairs) - 2
14. Significance of r
- We test the null hypothesis that the correlation between the two variables is equal to zero (there is no relationship between them)
- Reject the null hypothesis (H0) if the absolute value of r is greater than the critical r value
- Reject H0 if |r| > r_crit
- This is similar to evaluating actual versus critical t values
15. Significance of r Example
- So if we had 20 pairs of data
- For two-tail 95% confidence (P = .05), the critical r value at df = 20 - 2 = 18 is 0.444
- So reject the null hypothesis (hence the correlation is statistically significant) if
- r > 0.444 or r < -0.444
16. Strength of r
- The absolute value of Pearson's r indicates the strength of a correlation
- 1.0 to 0.9: very strong correlation
- 0.9 to 0.7: strong
- 0.7 to 0.4: moderate to substantial
- 0.4 to 0.2: moderate to low
- 0.2 to 0.0: low to negligible correlation
- Notice that a correlation can be strong, but still not be statistically significant! (especially for small data sets)
17. Important Notes
- The stronger the r, the smaller the standard error of the estimate, and the better the prediction!
- A significant r does not necessarily mean that you have a strong correlation
- A significant r means that whatever correlation you do have is not due to random chance
18. Coefficient of Determination
- By squaring r, we can determine the amount of variance the two variables share (called explained variance)
- R Square is the coefficient of determination
- So, an R Square of 0.94 means that 94% of the variance in the Y variable is explained by the variance of the X variable
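As a quick check in plain Python (the r value here is illustrative), squaring drops the sign, so a strong negative correlation explains just as much variance as a strong positive one:

```python
r = -0.97                    # illustrative correlation coefficient
r_squared = r ** 2           # coefficient of determination
print(round(r_squared, 4))   # 0.9409 -> about 94% explained variance
```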
19. What is R Squared?
- The coefficient of determination, R2, is a measure of the goodness of fit
- R2 ranges from 0 to 1
- R2 = 1 is a perfect fit (all data points fall on the estimated line or curve)
- R2 = 0 means that the variable(s) have no explanatory power
20. What is R Squared?
- Having R2 closer to 1 helps choose which regression model is best suited to a problem
- Having R2 actually equal zero is very difficult
- A sample of ten random numbers from Excel still obtained an R2 of 0.006
21. Scatter Plots
- It's nice to use R2 to determine the strength of a relationship, but visual feedback helps verify whether the model fits the data well
- Also helps look for data fliers (outliers)
- A scatter plot (or scattergram) allows us to compare any two interval or ratio scale variables, and see how data points are related to each other
22. Scatter Plots
- Scatter plots are two-dimensional graphs with an axis for each variable (independent variable X and dependent variable Y)
- To construct: place a mark on the graph for each X and Y value from the data
- Seeing data this way can help choose the correct mathematical model for the data
23. Scatter Plots
24. Models
- Allow us to focus on select elements of the problem at hand, and ignore irrelevant ones
- May show how parts of the problem relate to each other
- May be expressed as equations, mappings, or diagrams
- May be chosen or derived before or after measurement (theory vs. empirical)
25. Modeling
- Often we look for a linear relationship - one described by fitting a straight line as well to the data as possible
- More generally, any equation could be used as the basis for regression modeling, or describing the relationship between two variables
- You could have Y = aX^2 + b*ln(X) + c*sin(dX - e)
26. Linear Model
27. Linear Model
- Pearson's r for linear regression is calculated per (Action Research p. 29/30)
- Define:
  N   = number of data pairs
  SX  = sum of all X values
  SX2 = sum of all (X values squared)
  SY  = sum of all Y values
  SY2 = sum of all (Y values squared)
  SXY = sum of all (X values times Y values)
- Pearson's r = [N(SXY) - (SX)(SY)] / sqrt{[N(SX2) - (SX)^2] * [N(SY2) - (SY)^2]}
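The raw-sums formula can be checked directly in plain Python, using the slide 9 data (GRE vs GPA1, which happens to be perfectly linear):

```python
import math

X = [1500, 1400, 1250, 1050, 950]   # GRE scores (slide 9)
Y = [4.0, 3.8, 3.5, 3.1, 2.9]       # GPA1 values (slide 9)

N = len(X)
SX, SY = sum(X), sum(Y)
SX2 = sum(x * x for x in X)         # sum of squared X values
SY2 = sum(y * y for y in Y)         # sum of squared Y values
SXY = sum(x * y for x, y in zip(X, Y))

r = (N * SXY - SX * SY) / math.sqrt((N * SX2 - SX ** 2) * (N * SY2 - SY ** 2))
print(round(r, 6))  # 1.0 -- this sample is perfectly linear
```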
28. Linear Model
- For the linear model, you could find the slope m and Y-intercept b from:
- m = (r) * (standard deviation of Y) / (standard deviation of X)
- b = (mean of Y) - (m)(mean of X)
- But it's a lot easier to use SPSS: slope = b1 and Y intercept = b0
29. Regression Analysis
- Allows us to predict the likely value of one variable from knowledge of another variable
- The two variables should be fairly highly correlated (close to a straight line)
- The regression equation is a mathematical expression of the relationship between two variables - for example, a straight line
30. Regression Equation
- Y = mX + b
- In this linear equation, you predict Y values (the dependent variable) from known values of X (the independent variable) - this is called the regression of Y on X
- The regression equation is fundamentally an equation for plotting a straight line, so the stronger our correlation, the closer our variables will fall to a straight line, and the better our prediction will be
31. Linear Regression
(Scatter plot with fitted line y = a + bx; each point's vertical distance from the line is its residual e = y - predicted y)
Choose the best line by minimizing the sum of the squares of the vertical distances between the data points and the regression line
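The least-squares idea can be verified numerically in plain Python using the slide 10 data: with the intercept re-optimized for each candidate slope, the closed-form slope yields a smaller sum of squared errors than any nearby slope.

```python
X = [1500, 1400, 1250, 1050, 950]   # GRE (slide 10)
Y = [2.9, 3.1, 3.4, 3.7, 4.0]       # GPA2 (slide 10)

n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n

# Closed-form least-squares slope: covariance of X,Y over variance of X
m = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / \
    sum((x - x_bar) ** 2 for x in X)

def sse(slope):
    """Sum of squared vertical distances, with the best intercept for this slope."""
    intercept = y_bar - slope * x_bar
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(X, Y))

# The fitted slope beats slightly steeper and shallower lines
print(sse(m) < sse(m + 0.001) and sse(m) < sse(m - 0.001))  # True
```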
32. Standard Error of the Estimate
- Is the standard deviation of data around the regression line
- Tells how much the actual values of Y deviate from the predicted values of Y
33. Standard Error of the Estimate
- After you calculate the standard error of the estimate, you add and subtract the value from your predicted values of Y to get an area around the regression line within which you would expect repeated actual values to occur or cluster if you took many samples (sort of like a sampling distribution for the mean)
34. Standard Error of Estimate
- The standard error of estimate for Y predicted by X is:
  sy/x = sqrt[ sum of (Y - predicted Y)^2 / (N - 2) ]
  where Y is each actual Y value, predicted Y is the Y value predicted by the linear regression, and N is the number of data pairs
- For the example on (Action Research p. 33/34), sy/x = sqrt(2.641 / (10 - 2)) = 0.574
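That formula translates directly to plain Python. This is a minimal sketch; the toy actual/predicted lists at the end are made up for illustration:

```python
import math

def std_error_of_estimate(y_actual, y_predicted):
    """sy/x = sqrt( sum of (Y - predicted Y)^2 / (N - 2) )"""
    n = len(y_actual)
    sse = sum((a - p) ** 2 for a, p in zip(y_actual, y_predicted))
    return math.sqrt(sse / (n - 2))

# Reproduce the book's numbers: sum of squared residuals 2.641, N = 10
print(math.sqrt(2.641 / (10 - 2)))  # ~0.5746, the slide's 0.574

# Toy example (hypothetical data): one residual of 1, N = 3
print(std_error_of_estimate([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))  # 1.0
```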
35. Standard Error of the Estimate
- So, if the standard error of the estimate is equal to 0.574, and if you have a predicted Y value of 4.560, then 68% of your actual values, with repeated sampling, would fall between 3.986 and 5.134 (predicted Y +/- 1 std error)
- The smaller the standard error, the closer your actual values are to the regression line, and the more confident you can be in your prediction
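Computing that +/- 1 standard error band directly (values taken from the slide):

```python
std_error = 0.574      # standard error of the estimate (slide value)
y_predicted = 4.560    # predicted Y (slide value)

# About 68% of repeated actual values are expected inside this band
low, high = y_predicted - std_error, y_predicted + std_error
print(round(low, 3), round(high, 3))  # 3.986 5.134
```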
36. SPSS Regression Equations
- Instead of constants called m and b, b0 and b1 are used for most equations
- The meaning of b0 and b1 varies, depending on the type of equation which is being modeled
- Can suppress the use of b0 by unchecking "Include constant in equation"
37. SPSS Regression Models
- Linear model: Y = b0 + b1*X
- Logarithmic model: Y = b0 + b1*ln(X), where ln = natural log
- Inverse model: Y = b0 + b1/X - similar to the form XY = constant, which is a hyperbola
38. SPSS Regression Models
- Power model: Y = b0*(X^b1)
- Compound model: Y = b0*(b1^X)
- A variant of this is the Logistic model, which requires a constant input u which is larger than Y for any actual data point: Y = 1 / [1/u + b0*(b1^X)]
- Where ^ indicates "to the power of"
39. SPSS Regression Models
- exp means "e to the power of"; e = 2.7182818...
- Exponential model: Y = b0*exp(b1*X)
- Other exponential functions:
- S model: Y = exp(b0 + b1/X)
- Growth model (almost identical to the exponential model): Y = exp(b0 + b1*X)
40. SPSS Regression Models
- Polynomials beyond the Linear model (linear is a first order polynomial)
- Quadratic (second order): Y = b0 + b1*X + b2*X^2
- Cubic (third order): Y = b0 + b1*X + b2*X^2 + b3*X^3
- These are the only equations which use constants b2 and b3
- Higher order polynomials require the Regression module of SPSS, which can do regression using any equation you enter
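To keep the formulas on slides 37-40 straight, here they are as plain Python functions (a reference sketch, not SPSS output; the b0/b1/b2/b3/u arguments mirror the SPSS constant names, and ^ becomes Python's ** operator):

```python
import math

models = {
    "linear":      lambda x, b0, b1: b0 + b1 * x,
    "logarithmic": lambda x, b0, b1: b0 + b1 * math.log(x),
    "inverse":     lambda x, b0, b1: b0 + b1 / x,
    "power":       lambda x, b0, b1: b0 * x ** b1,
    "compound":    lambda x, b0, b1: b0 * b1 ** x,
    "logistic":    lambda x, b0, b1, u: 1 / (1 / u + b0 * b1 ** x),
    "exponential": lambda x, b0, b1: b0 * math.exp(b1 * x),
    "s":           lambda x, b0, b1: math.exp(b0 + b1 / x),
    "growth":      lambda x, b0, b1: math.exp(b0 + b1 * x),
    "quadratic":   lambda x, b0, b1, b2: b0 + b1 * x + b2 * x ** 2,
    "cubic":       lambda x, b0, b1, b2, b3: b0 + b1 * x + b2 * x ** 2 + b3 * x ** 3,
}

print(models["linear"](3, 1, 2))        # 7 = 1 + 2*3
print(models["quadratic"](2, 1, 0, 1))  # 5 = 1 + 0*2 + 1*2^2
```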
41. Y = whattheflock?
- To help picture these equations:
- Make an X variable over some typical range (0 to 10 in a small increment, maybe 0.01)
- Define a Y variable
- Calculate the Y variable using Transform > Compute and whatever equation you want to see
- Pick values for b0 and b1 that aren't 0, 1, or 2
- Have SPSS plot the results of a regression of Y vs X for that type of equation
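The same recipe can be sketched without SPSS in plain Python (Transform > Compute becomes a list comprehension; the choice of the compound model and of b0 = 0.5, b1 = 1.7 is arbitrary):

```python
# X over a typical range, 0.01 to 10.00 in steps of 0.01
# (starting just above 0 so log and inverse models would also be defined)
xs = [i * 0.01 for i in range(1, 1001)]

b0, b1 = 0.5, 1.7                  # arbitrary constants that aren't 0, 1, or 2
ys = [b0 * b1 ** x for x in xs]    # compound model: Y = b0 * (b1 ^ X)

# Feed xs/ys to any plotting tool to see the curve's shape
print(len(xs), ys[0] < ys[-1])  # 1000 True -- the compound curve grows
```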
42. How to Apply This?
- Given a set of data containing two variables of interest, generate a scatter plot to get some idea of what the data looks like
- Choose which types of models are most likely to be useful
- For only linear models, use Analyze / Regression / Linear...
43. How to Apply This?
- Select the Independent (X) and Dependent (Y) variables
- Rules may be applied to limit the scope of the analysis, e.g. gender = 1
- Dozens of other characteristics may also be obtained, which are beyond our scope here
44. How to Apply This?
- Then check for the R Square value in the Model Summary
- Check the Coefficients to make sure they are all significant (e.g. Sig. < 0.050)
- If so, use the b0 and b1 coefficients from under the B column (see the Statistics for Software Process Improvement handout), plus or minus the standard errors (SE B)
45. Regression Example
- For example, go back to the GSS91 political.sav data set
- Generate a linear regression (Analyze > Regression > Linear) for age as the Independent variable, and partyid as the Dependent variable
- Notice that R2 and the ANOVA summary are given, with F and its significance
46. Regression Example
(SPSS Model Summary and ANOVA output shown)
47. Regression Example
- The R Square of 0.006 means there is a very slight correlation (little strength)
- But the ANOVA Significance well under 0.050 confirms there is a statistically significant relationship here - it's just a really weak one
48. Regression Example
(SPSS Coefficients output shown)
49. Regression Example
- The heart of the regression analysis is in the Coefficients section
- We could look up t on a critical values table, but it's easier to:
- See if all values of Sig are < 0.050 - if they are, reject the null hypothesis, meaning there is a significant relationship
- If so, use the values under B for b0 and b1
- If any coefficient has Sig > 0.050, don't use that regression (the coefficient might be zero)
50. Regression Example
- The answer for "what is the effect of age on political view?" is that there is a very weak but statistically significant linear relationship, with a reduction of 0.009 political-view categories per year (b1 = -0.009)
- From the Variable View of the data, since low values are liberal and large values conservative, this means that people tend to get slightly more liberal as they get older
51. Curve Estimation Example
- For the other regression options, choose Analyze / Regression / Curve Estimation
- Define the Dependent(s) and the Independent variable - note that multiple Dependents may be selected
- Check which math models you want used
- Display the ANOVA table for reference
52. Curve Estimation Example
- SPSS Tip: up to three regression models can be plotted at once, so don't select more than that if you want a scatter plot to go with the data and the regressions
- For the same example just used, get a summary for the linear and quadratic models (Analyze > Regression > Curve Estimation)
- Find R Square for each model
- Generally pick the model with the largest R Square
- Already saw the Linear output, now see the Quadratic
53. Curve Estimation Example
- For the quadratic regression, R Square is slightly higher, and the ANOVA is still significant
54. Curve Estimation Example
- The Quadratic coefficients are all significant at the 0.050 level
- Interpret as: partyid = (4.191 +/- 0.412) + (-0.048 +/- 0.018)*age + (0.0003918 +/- 0.0001754)*age^2
- Edit the data table, then double click on the cells to get the values of b2 and its std error
55. Curve Estimation Example
- The data set will be plotted as the Observed points, with the regression models shown for comparison
- Look to see which model most closely matches the data
- Look for regions of data which do or don't match the model well (if any)
56. Curve Estimation Example
(Plot of the Observed data points with the fitted regression curves shown)

57. Curve Estimation Procedure
- See which models are significant (throw out the rest!)
- Compare the R Square values to see which provides the best fit
- Use the graph to verify visually that the correct model was chosen
- Use the model equations' B values and their standard errors to describe and predict the data's behavior