Action Research Correlation and Regression - PowerPoint PPT Presentation

1
Action Research: Correlation and Regression
  • INFO 515
  • Glenn Booker

2
Measures of Association
  • Measures of association are used to determine how
    strong the relationship is between two variables
    or measures, and how we can predict such a
    relationship
  • Only applies for interval or ratio scale
    variables
  • Everything this week only applies to interval or
    ratio scale variables!

3
Measures of Association
  • For example, I have GRE and GPA scores for a
    random sample of graduate students
  • How strong is the relationship between GRE scores
    and GPA? Do these variables relate to each other
    in some way?
  • If there is a strong relationship, how well can
    we predict the values of one variable when values
    of the other variable are known?

4
Strength of Prediction
  • Two techniques are used to describe the strength
    of a relationship, and predict values of one
    variable when another variable's value is known
  • Correlation: describes the degree (strength) to
    which the two variables are related
  • Regression: used to predict the values of one
    variable when values of the other are known

5
Strength of Prediction
  • Correlation and regression are linked -- the
    ability to predict one variable when another
    variable is known depends on the degree and
    direction of the variables' relationship in the
    first place
  • We find correlation before we calculate
    regression
  • So generating a regression without checking for a
    correlation first is pointless (though we'll do
    both at once)

6
Correlation
  • There are different types of statistical measures
    of correlation
  • They give us a measure known as the correlation
    coefficient
  • The most common procedure used is known as
    Pearson's Product Moment Correlation, or
    Pearson's r

7
Pearson's r
  • Can only be calculated for interval or ratio
    scale data
  • Its value is a real number from -1 to 1
  • Strength: as the value of r approaches -1 or
    1, the relationship is stronger. As the
    magnitude of r approaches zero, we see little
    or no relationship

8
Pearson's r
  • For example, r might equal 0.89, -0.9, 0.613,
    or -0.3
  • Which would be the strongest correlation?
  • Direction: the sign of r shows whether the
    correlation is positive or negative, but the
    magnitude of r alone (or R Square) does not
  • In a regression, the direction also shows up in
    the type of equation used, and the signs of the
    constants obtained for it

9
Example of Relationships
  • Positive direction -- as the independent variable
    increases, the dependent variable tends to
    increase

    Student   GRE (X)   GPA1 (Y)
    1         1500      4.0
    2         1400      3.8
    3         1250      3.5
    4         1050      3.1
    5          950      2.9

10
Example of Relationships
  • Negative direction -- as the independent variable
    increases, the dependent variable tends to
    decrease

    Student   GRE (X)   GPA2 (Y)
    1         1500      2.9
    2         1400      3.1
    3         1250      3.4
    4         1050      3.7
    5          950      4.0

11
Positive and Negative Correlation
[Figure: scatter plots of the data from slide 9 and the data from slide 10]
Notice that a high magnitude of r (or a high R Square) doesn't tell
whether the correlation is positive or negative!
12
Important Note
  • An association value provided by a correlation
    analysis, such as Pearson's r, tells us nothing
    about causation
  • In this case, high GRE scores don't necessarily
    cause high or low GPA scores, and vice versa

13
Significance of r
  • We can test for the significance of r (to see
    whether our relationship is statistically
    significant) by consulting a table of critical
    values for r (Action Research p. 41/42)
  • Table: VALUES OF THE CORRELATION COEFFICIENT FOR
    DIFFERENT LEVELS OF SIGNIFICANCE
  • Where df = (number of data pairs) - 2

14
Significance of r
  • We test the null hypothesis that the correlation
    between the two variables is equal to zero (there
    is no relationship between them)
  • Reject the null hypothesis (H0) if the absolute
    value of r is greater than the critical r value
  • Reject H0 if |r| > r_crit
  • This is similar to evaluating actual versus
    critical t values

15
Significance of r Example
  • So if we had 20 pairs of data
  • For two-tail 95% confidence (P = .05), the critical
    r value at df = 20 - 2 = 18 is 0.444
  • So reject the null hypothesis (hence correlation
    is statistically significant) if
  • r > 0.444 or r < -0.444
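
The critical r values in such a table are tied to the critical t values mentioned on the previous slide. A minimal sketch of that link, assuming the standard relation r_crit = t_crit / sqrt(t_crit^2 + df); the function names are illustrative, and the critical t of 2.101 (df = 18, two-tail 0.05) comes from a standard t table, not from the slides:

```python
import math

def critical_r(t_crit, df):
    """Convert a critical t value into the matching critical r (df = N - 2)."""
    return t_crit / math.sqrt(t_crit ** 2 + df)

def is_significant(r, n_pairs, t_crit):
    """Reject H0 (no correlation) when |r| exceeds the critical r."""
    df = n_pairs - 2
    return abs(r) > critical_r(t_crit, df)

# Assumption: 2.101 is the two-tail 0.05 critical t for df = 18 (standard t table).
T_CRIT_DF18 = 2.101

r_crit = critical_r(T_CRIT_DF18, 18)
print(round(r_crit, 3))                        # 0.444, matching the slide
print(is_significant(0.50, 20, T_CRIT_DF18))   # True: |0.50| > 0.444
print(is_significant(-0.30, 20, T_CRIT_DF18))  # False: |-0.30| < 0.444
```

Taking the absolute value of r covers both rejection regions (r > 0.444 and r < -0.444) in one comparison.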

16
Strength of r
  • Absolute value of Pearson's r indicates the
    strength of a correlation
  • 1.0 to 0.9: very strong correlation
  • 0.9 to 0.7: strong
  • 0.7 to 0.4: moderate to substantial
  • 0.4 to 0.2: moderate to low
  • 0.2 to 0.0: low to negligible correlation
  • Notice that a correlation can be strong, but
    still not be statistically significant!
    (especially for small data sets)
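
The strength bands above can be sketched as a small lookup. This is only an illustration: the function name is made up, and since the slide's bands share their end points, the code assumes a boundary value (such as exactly 0.9) belongs to the stronger band.

```python
def strength_label(r):
    """Map |r| onto the strength bands listed on the slide."""
    magnitude = abs(r)
    if magnitude > 1.0:
        raise ValueError("Pearson's r must lie between -1 and 1")
    # Assumption: a value on a boundary falls in the stronger band.
    if magnitude >= 0.9:
        return "very strong"
    if magnitude >= 0.7:
        return "strong"
    if magnitude >= 0.4:
        return "moderate to substantial"
    if magnitude >= 0.2:
        return "moderate to low"
    return "low to negligible"

# The example values from the earlier Pearson's r slide.
for r in (0.89, -0.9, 0.613, -0.3):
    print(r, strength_label(r))
```

Only the magnitude is used, so -0.9 ranks as the strongest of the four example values even though it is negative.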

17
Important Notes
  • The stronger the r, the smaller the standard
    error of the estimate, the better the prediction!
  • A significant r does not necessarily mean that
    you have a strong correlation
  • A significant r means that whatever correlation
    you do have is unlikely to be due to random
    chance

18
Coefficient of Determination
  • By squaring r, we can determine the amount of
    variance the two variables share (called
    explained variance)
  • R Square is the coefficient of determination
  • So, an R Square of 0.94 means that 94% of the
    variance in the Y variable is explained by the
    variance of the X variable

19
What is R Squared?
  • The coefficient of determination, R^2, is a
    measure of the goodness of fit
  • R^2 ranges from 0 to 1
  • R^2 = 1 is a perfect fit (all data points fall on
    the estimated line or curve)
  • R^2 = 0 means that the variable(s) have no
    explanatory power
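
To make the two end points concrete, here is a minimal sketch that computes R^2 as 1 - (residual sum of squares) / (total sum of squares), a common definition that is assumed here rather than taken from the slides. It uses the GRE/GPA1 data from slide 9; the line GPA = 1.0 + 0.002 * GRE is an observation about that table, not something the slides state.

```python
def r_squared(actual, predicted):
    """R^2 = 1 - SS_residual / SS_total for a set of predictions."""
    mean_y = sum(actual) / len(actual)
    ss_total = sum((y - mean_y) ** 2 for y in actual)
    ss_residual = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    return 1.0 - ss_residual / ss_total

gre = [1500, 1400, 1250, 1050, 950]
gpa1 = [4.0, 3.8, 3.5, 3.1, 2.9]

# Assumption: this line happens to pass through every point in the slide 9 table.
on_the_line = [1.0 + 0.002 * x for x in gre]
# A "model" that ignores X and always predicts the mean of Y.
mean_only = [sum(gpa1) / len(gpa1)] * len(gpa1)

print(r_squared(gpa1, on_the_line))  # essentially 1: perfect fit
print(r_squared(gpa1, mean_only))    # 0.0: no explanatory power
```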

20
What is R Squared?
  • Having R^2 closer to 1 helps choose which
    regression model is best suited to a problem
  • Having R^2 actually equal zero is very difficult
  • A sample of ten random numbers from Excel still
    obtained an R^2 of 0.006

21
Scatter Plots
  • It's nice to use R^2 to determine the strength of
    a relationship, but visual feedback helps verify
    whether the model fits the data well
  • Also helps look for data fliers (outliers)
  • A scatter plot (or scatter gram) allows us to
    compare any two interval or ratio scale
    variables, and see how data points are related to
    each other

22
Scatter Plots
  • Scatter plots are two-dimensional graphs with an
    axis for each variable (independent variable X
    and dependent variable Y)
  • To construct one, place a mark on the graph for
    each pair of X and Y values from the data
  • Seeing data this way can help choose the correct
    mathematical model for the data

23
Scatter Plots
24
Models
  • Allow us to focus on select elements of the
    problem at hand, and ignore irrelevant ones
  • May show how parts of the problem relate to each
    other
  • May be expressed as equations, mappings, or
    diagrams
  • May be chosen or derived before or after
    measurement (theory vs. empirical)

25
Modeling
  • Often we look for a linear relationship: one
    described by fitting a straight line as well as
    possible to the data
  • More generally, any equation could be used as the
    basis for regression modeling, or describing the
    relationship between two variables
  • You could have Y = a*X^2 + b*ln(X) +
    c*sin(d*X - e)

26
Linear Model
27
Linear Model
  • Pearson's r for linear regression is calculated
    per (Action Research p. 29/30)
  • Define: N = number of data pairs; SX = sum of all
    X values; SX2 = sum of all (X values squared); SY
    = sum of all Y values; SY2 = sum of all (Y values
    squared); SXY = sum of all (X values times Y
    values)
  • Pearson's r = [N(SXY) - (SX)(SY)] /
    sqrt([N(SX2) - (SX)^2] [N(SY2) - (SY)^2])
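
The sums formula translates almost line for line into code. A minimal sketch, applied to the GRE/GPA tables from slides 9 and 10 (the function and variable names are illustrative):

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from the raw sums N, SX, SY, SX2, SY2 and SXY."""
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sx2 = sum(x * x for x in xs)
    sy2 = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sxy - sx * sy
    denominator = math.sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))
    return numerator / denominator

gre = [1500, 1400, 1250, 1050, 950]
gpa1 = [4.0, 3.8, 3.5, 3.1, 2.9]  # slide 9: positive direction
gpa2 = [2.9, 3.1, 3.4, 3.7, 4.0]  # slide 10: negative direction

r1 = pearson_r(gre, gpa1)
r2 = pearson_r(gre, gpa2)
print(round(r1, 3), round(r2, 3))
```

Both results have a magnitude close to 1, and the sign separates the positive example from the negative one.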

28
Linear Model
  • For the linear model, you could find the slope
    m and Y-intercept b from
  • m = (r) (standard deviation of Y) / (standard
    deviation of X)
  • b = (mean of Y) - (m)(mean of X)
  • But it's a lot easier to use SPSS: slope = b1 and
    Y intercept = b0
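
For readers without SPSS at hand, the two formulas above can be sketched with Python's standard library. The function name is illustrative, and the sample standard deviation (N - 1) is assumed; the choice does not change the slope as long as both deviations use the same divisor. The data is the GRE/GPA2 table from slide 10.

```python
import math
import statistics

def slope_intercept(xs, ys):
    """Slope m = r * sd(Y) / sd(X) and intercept b = mean(Y) - m * mean(X)."""
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    sd_x = statistics.stdev(xs)  # sample standard deviation (N - 1)
    sd_y = statistics.stdev(ys)
    # Pearson's r from deviations about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    r = sxy / math.sqrt(sxx * syy)
    m = r * sd_y / sd_x
    b = mean_y - m * mean_x
    return m, b

gre = [1500, 1400, 1250, 1050, 950]
gpa2 = [2.9, 3.1, 3.4, 3.7, 4.0]

m, b = slope_intercept(gre, gpa2)
print(round(m, 5), round(b, 3))  # negative slope, since r is negative here
```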

29
Regression Analysis
  • Allows us to predict the likely value of one
    variable from knowledge of another variable
  • The two variables should be fairly highly
    correlated (close to a straight line)
  • The regression equation is a mathematical
    expression of the relationship between 2
    variables on, for example, a straight line

30
Regression Equation
  • Y = mX + b
  • In this linear equation, you predict Y values
    (the dependent variable) from known values of X
    (the independent variable); this is called the
    regression of Y on X
  • The regression equation is fundamentally an
    equation for plotting a straight line, so the
    stronger our correlation -- the closer our
    variables will fall to a straight line, and the
    better our prediction will be

31
Linear Regression
[Figure: data points (x, y) around the regression line ŷ = a + bx, where each
point is y = ŷ + e]
Choose best line by minimizing the sum of the
squares of the vertical distances between the
data points and the regression line
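
A quick way to see the least-squares criterion at work is to fit the line and then nudge it. This is a sketch only: the helper names are made up, the closed-form fit is the usual textbook one rather than anything taken from the slides, and the data is the GRE/GPA2 table from slide 10.

```python
def sum_squared_errors(xs, ys, a, b):
    """Sum of squared vertical distances from the points to y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

def least_squares_line(xs, ys):
    """Closed-form intercept a and slope b that minimize the sum above."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return a, b

gre = [1500, 1400, 1250, 1050, 950]
gpa2 = [2.9, 3.1, 3.4, 3.7, 4.0]

a, b = least_squares_line(gre, gpa2)
best = sum_squared_errors(gre, gpa2, a, b)

# Shift the intercept and slope a little in every direction and re-measure.
nudged = [sum_squared_errors(gre, gpa2, a + da, b + db)
          for da in (-0.05, 0.05) for db in (-0.0001, 0.0001)]
print(best, min(nudged))  # every nudged line has a larger sum than the fit
```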
32
Standard Error of the Estimate
  • Is the standard deviation of data around the
    regression line
  • Tells how much the actual values of Y deviate
    from the predicted values of Y

33
Standard Error of the Estimate
  • After you calculate the standard error of the
    estimate, you add it to and subtract it from
    your predicted values of Y to get an area around
    the regression line within which you would expect
    repeated actual values to occur or cluster if you
    took many samples (sort of like a sampling
    distribution for the mean)

34
Standard Error of Estimate
  • The Standard Error of Estimate for Y predicted by
    X is
    s_y/x = sqrt( sum of (Y - predicted Y)^2 / (N - 2) )
    where Y is each actual Y value, predicted Y is
    the Y value predicted by the linear regression,
    and N is the number of data pairs
  • For example on (Action Research p. 33/34), s_y/x
    = sqrt(2.641/(10-2)) = 0.574

35
Standard Error of the Estimate
  • So, if the standard error of the estimate is
    equal to 0.574, and if you have a predicted Y
    value of 4.560, then 68% of your actual values,
    with repeated sampling, would fall between 3.986
    and 5.134 (predicted Y +/- 1 std error)
  • The smaller the standard error, the closer your
    actual values are to the regression line, and
    the more confident you can be in your prediction
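
The standard error formula from the previous slide and the one-standard-error band above can be sketched as follows. The function name is illustrative; the numbers 2.641, 10, 0.574 and 4.560 are the ones quoted on the slides.

```python
import math

def standard_error_of_estimate(actual, predicted):
    """sqrt( sum of (Y - predicted Y)^2 / (N - 2) ) for N data pairs."""
    n = len(actual)
    if n <= 2:
        raise ValueError("need more than two data pairs")
    squared_residuals = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    return math.sqrt(squared_residuals / (n - 2))

# Book example: squared residuals sum to 2.641 over 10 data pairs.
s_book = math.sqrt(2.641 / (10 - 2))
print(s_book)  # about 0.5746; the slide quotes it as 0.574

# One-standard-error band around a predicted Y of 4.560, using the slide's 0.574.
predicted_y = 4.560
low = predicted_y - 0.574
high = predicted_y + 0.574
print(round(low, 3), round(high, 3))  # 3.986 5.134
```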

36
SPSS Regression Equations
  • Instead of constants called m and b, b0
    and b1 are used for most equations
  • The meaning of b0 and b1 varies, depending on
    the type of equation which is being modeled
  • Can suppress the use of b0 by unchecking
    "Include constant in equation"

37
SPSS Regression Models
  • Linear model: Y = b0 + b1*X
  • Logarithmic model: Y = b0 + b1*ln(X), where ln =
    natural log
  • Inverse model: Y = b0 + b1/X. Similar to the form
    X*Y = constant, which is a hyperbola

38
SPSS Regression Models
  • Power model: Y = b0*(X^b1)
  • Compound model: Y = b0*(b1^X)
  • A variant of this is the Logistic model, which
    requires a constant input u which is larger
    than Y for any actual data point:
    Y = 1 / (1/u + b0*(b1^X))

Where ^ indicates "to the power of"
39
SPSS Regression Models
exp means "e to the power of"; e = 2.7182818...
  • Exponential model: Y = b0*exp(b1*X)
  • Other exponential functions
  • S model: Y = exp(b0 + b1/X)
  • Growth model (is almost identical to the
    exponential model): Y = exp(b0 + b1*X)

40
SPSS Regression Models
  • Polynomials beyond the Linear model (linear is a
    first order polynomial)
  • Quadratic (second order): Y = b0 + b1*X + b2*X^2
  • Cubic (third order): Y = b0 + b1*X + b2*X^2 +
    b3*X^3. These are the only equations which use
    constants b2 and b3
  • Higher order polynomials require the Regression
    module of SPSS, which can do regression using any
    equation you enter

41
Y = whattheflock?
  • To help picture these equations
  • Make an X variable over some typical range (0 to
    10 in a small increment, maybe 0.01)
  • Define a Y variable
  • Calculate the Y variable using Transform >
    Compute and whatever equation you want to see
  • Pick values for b0 and b1 that aren't 0, 1, or 2
  • Have SPSS plot the results of a regression of Y
    vs X for that type of equation

42
How to Apply This?
  • Given a set of data containing two variables of
    interest, generate a scatter plot to get some
    idea of what the data looks like
  • Choose which types of models are most likely to
    be useful
  • For only linear models, use Analyze / Regression
    / Linear...

43
How to Apply This?
  • Select the Independent (X) and Dependent (Y)
    variables
  • Rules may be applied to limit the scope of the
    analysis, e.g. gender = 1
  • Dozens of other characteristics may also be
    obtained, which are beyond our scope here

44
How to Apply This?
  • Then check for the R Square value in the Model
    Summary
  • Check the Coefficients to make sure they are all
    significant (e.g. Sig. < 0.050)
  • If so, use the b0 and b1 coefficients from
    under the B column (see Statistics for Software
    Process Improvement handout), plus or minus the
    standard errors SE B

45
Regression Example
  • For example, go back to the GSS91
    political.sav data set
  • Generate a linear regression (Analyze >
    Regression > Linear) for age as the Independent
    variable, and partyid as the Dependent variable
  • Notice that R^2 and the ANOVA summary are given,
    with F and its significance

46
Regression Example
47
Regression Example
  • The R Square of 0.006 means there is a very
    slight correlation (little strength)
  • But the ANOVA Significance well under 0.050
    confirms there is a statistically significant
    relationship here - it's just a really weak one

48
Regression Example
49
Regression Example
  • The heart of the regression analysis is in the
    Coefficients section
  • We could look up t on a critical values table,
    but it's easier to
  • See if all values of Sig are < 0.050 - if they
    are, reject the null hypothesis, meaning there is
    a significant relationship
  • If so, use the values under B for b0 and b1
  • If any coefficient has Sig > 0.050, don't use
    that regression (coeff might be zero)

50
Regression Example
  • The answer for "what is the effect of age on
    political view?" is that there is a very weak but
    statistically significant linear relationship,
    with a reduction of 0.009 (b1) political view
    categories per year
  • From the Variable View of the data, since low
    values are liberal and large values conservative,
    this means that people tend to get slightly more
    liberal as they get older

51
Curve Estimation Example
  • For the other regression options, choose Analyze
    / Regression / Curve Estimation
  • Define the Dependents (variable) and the
    Independent variable - note that multiple
    Dependents may be selected
  • Check which math models you want used
  • Display the ANOVA table for reference

52
Curve Estimation Example
  • SPSS Tip: up to three regression models can be
    plotted at once, so don't select more than that
    if you want a scatter plot to go with the data
    and the regressions
  • For the same example just used, get a summary for
    the linear and quadratic models (Analyze >
    Regression > Curve Estimation)
  • Find R Square for each model
  • Generally pick the model with largest R Square
  • Already saw Linear output, now see Quadratic

53
Curve Estimation Example
  • For the quadratic regression, R Square is
    slightly higher, and the ANOVA is still
    significant

54
Curve Estimation Example
  • The Quadratic coefficients are all significant at
    the 0.050 level

Interpret as partyid = (4.191 +/- 0.412)
+ (-0.048 +/- 0.018)*age
+ (0.0003918 +/- 0.0001754)*age^2
Edit the data table, then double click on the cells to get
the values of b2 and its std error.
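
Taking the quoted B values at face value, the quadratic can be evaluated directly. This is a sketch: the names are illustrative, the coefficients are the rounded ones from the slide, and the ratio B / SE is the usual coefficient t statistic, recomputed here from rounded numbers, so it will not match the SPSS output exactly.

```python
# Rounded coefficients and standard errors as quoted on the slide.
B0, B1, B2 = 4.191, -0.048, 0.0003918
SE_B = (0.412, 0.018, 0.0001754)

def predict_partyid(age):
    """Quadratic model: partyid = b0 + b1*age + b2*age^2."""
    return B0 + B1 * age + B2 * age ** 2

# Coefficient divided by its standard error (approximate t statistic).
t_ratios = [b / se for b, se in zip((B0, B1, B2), SE_B)]

# Age at which the fitted parabola stops falling and starts rising.
turning_age = -B1 / (2 * B2)

for age in (20, 40, 60, 80):
    print(age, round(predict_partyid(age), 3))
print([round(t, 2) for t in t_ratios])
print(round(turning_age, 1))
```

With these rounded values the predicted partyid falls with age until roughly the early sixties and then turns upward, which is the curvature the linear model cannot show.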
55
Curve Estimation Example
  • The data set will be plotted as the Observed
    points, with the regression models shown for
    comparison
  • Look to see which model most closely matches the
    data
  • Look for regions of data which do or don't match
    the model well (if any)

56
Curve Estimation Example
57
Curve Estimation Procedure
  • See which models are significant (throw out the
    rest!)
  • Compare the R Square values to see which provides
    the best fit
  • Use the graph to verify visually that the correct
    model was chosen
  • Use the model equation's B values and their
    standard errors to describe and predict the
    data's behavior
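
The comparison step of this procedure can be sketched without SPSS for models that are linear in b0 and b1 once X is transformed. Everything here is illustrative: the helper names and the data are made up (Y is generated from 2 + 3*ln(X)), and the significance check from the first bullet is left out, so this covers only the "compare R Square" step.

```python
import math

def fit_line(us, ys):
    """Least-squares b0 and b1 for Y = b0 + b1*U, plus the fit's R Square."""
    n = len(us)
    mean_u = sum(us) / n
    mean_y = sum(ys) / n
    suu = sum((u - mean_u) ** 2 for u in us)
    syy = sum((y - mean_y) ** 2 for y in ys)
    suy = sum((u - mean_u) * (y - mean_y) for u, y in zip(us, ys))
    b1 = suy / suu
    b0 = mean_y - b1 * mean_u
    r_square = suy ** 2 / (suu * syy)
    return b0, b1, r_square

def compare_models(xs, ys):
    """Fit the linear, logarithmic and inverse models; report the best R Square."""
    transforms = {
        "linear": lambda x: x,
        "logarithmic": math.log,
        "inverse": lambda x: 1.0 / x,
    }
    results = {name: fit_line([t(x) for x in xs], ys)
               for name, t in transforms.items()}
    best = max(results, key=lambda name: results[name][2])
    return best, results

# Hypothetical data that follows a logarithmic curve exactly.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.0 + 3.0 * math.log(x) for x in xs]

best, results = compare_models(xs, ys)
for name, (b0, b1, r_square) in results.items():
    print(f"{name:12s} b0 = {b0:.3f}  b1 = {b1:.3f}  R Square = {r_square:.4f}")
print("best:", best)
```

A plot of the data against each fitted curve would still be needed for the visual check in the third bullet.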