Lab 13 - PowerPoint PPT Presentation

Provided by: lisawil5

1
Lab 13
  • Partial and Semipartial Correlations,
    Collinearity, and Nonlinear Trends

2
Partial and Semipartial Correlation
  • Partial correlation: the correlation between two
    variables with the effects of a 3rd variable
    removed.
  • To compute it, remove the variance attributable
    to the 3rd variable from each of the two
    variables, then correlate what remains.

3
Partial and Semipartial Correlation (cont.)
  • Run a regression of IV1 on IV3.
  • Regression partitions the variance in the DV
    (in this case IV1) into two parts: variance due
    to the IV (R-squared) and variance not due to the
    IV (residuals).
  • Run a second regression of IV2 on IV3.
  • Compute the correlation between the residuals
    from the first regression and the residuals from
    the second regression. This is the correlation
    between IV1 and IV2 with IV3 partialed out.
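
The residual procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the lab's SAS program: the helper names (`pearson`, `residuals`, `partial_corr`) and the six data points are made up for the example, and only a single control variable is handled.

```python
from math import sqrt

def mean(xs):
    return sum(xs) / len(xs)

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def residuals(ys, xs):
    """Residuals from the simple regression of ys on xs (with intercept)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

def partial_corr(iv1, iv2, iv3):
    """Correlation between iv1 and iv2 with iv3 partialed out of both."""
    return pearson(residuals(iv1, iv3), residuals(iv2, iv3))

# Made-up illustrative values (hypothetical, not the lab data).
iv1 = [2.0, 4.0, 5.0, 7.0, 9.0, 10.0]
iv2 = [1.0, 3.0, 2.0, 6.0, 7.0, 9.0]
iv3 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

r12_3 = partial_corr(iv1, iv2, iv3)
print("partial r(IV1, IV2 | IV3) =", round(r12_3, 4))
```

With one control variable, the result should agree with the textbook formula (r12 - r13*r23) / sqrt((1 - r13^2)(1 - r23^2)), which is a convenient way to check the residual approach.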

4
Example of Partial Correlation
  • We want to know the correlation between
    education and salary. We expect the gender and
    minority status of the employees to influence
    this correlation, so we partial out their
    influence.
  • Compute the correlation between education and
    salary controlling for gender and minority
    status.

5
Example of Partial Correlation
  • data d1;
  • infile 'C:\WINDOWS\Desktop\lab13.txt';
  • input id sex $ hiredat educ title salary
    startsal jobtime prevexp minority $;
  • if sex = "Male" then gender = 1;
  • if sex = "Female" then gender = 2;
  • if minority = "Yes" then minor = 1;
  • if minority = "No" then minor = 2;
  • proc reg data=d1;
  • model salary = gender minor;
  • output out=data2 r=r1;
  • proc reg data=d1;
  • model educ = gender minor;
  • output out=data3 r=r2;
  • data merged;
  • merge data2 data3;
  • proc corr data=merged;
  • var salary educ gender minor r1 r2;
  • run;

6
Output for regressing salary on gender and
minority status
    Model: MODEL1
    Dependent Variable: salary

    Analysis of Variance
                                Sum of           Mean
    Source             DF      Squares         Square    F Value    Pr > F
    Model               2  34116446497    17058223249      77.40    <.0001
    Error             471     1.038E11      220382270
    Corrected Total   473  1.379165E11

    Root MSE            14845    R-Square    0.2474
    Dependent Mean      34420    Adj R-Sq    0.2442
    Coeff Var        43.13034

    Parameter Estimates
                       Parameter      Standard
    Variable    DF      Estimate         Error    t Value    Pr > |t|
    Intercept    1         42051    3496.63766      12.03      <.0001
    gender       1        -15961    1373.05406     -11.62      <.0001
    minor        1    8762.76693    1652.36821       5.30      <.0001

7
Output for regressing education on gender and
minority
    Model: MODEL1
    Dependent Variable: educ

    Analysis of Variance
                                Sum of         Mean
    Source             DF      Squares       Square    F Value    Pr > F
    Model               2    599.98412    299.99206      42.35    <.0001
    Error             471   3336.48213      7.08383
    Corrected Total   473   3936.46624

    Root MSE          2.66155    R-Square    0.1524
    Dependent Mean   13.49156    Adj R-Sq    0.1488
    Coeff Var        19.72749

    Parameter Estimates
                       Parameter     Standard
    Variable    DF      Estimate        Error    t Value    Pr > |t|
    Intercept    1      14.59945      0.62690      23.29      <.0001
    gender       1      -2.13024      0.24617      -8.65      <.0001
    minor        1       1.11934      0.29625       3.78      0.0002

8
Example of Partial Correlation - Output
    Simple Statistics
    Variable     N        Mean    Std Dev         Sum     Minimum     Maximum
    salary     474       34420      17076    16314875       15750      135000
    educ       474    13.49156    2.88485        6395     8.00000    21.00000
    gender     474     1.45570    0.49856     690.000     1.00000     2.00000
    minor      474     1.78059    0.41428     844.000     1.00000     2.00000
    r1         474           0      14814           0      -22315       91385
    r2         474           0    2.65591           0    -6.70790     6.29210

9
Example of Partial Correlation Output (cont.)
    Pearson Correlation Coefficients, N = 474
    Prob > |r| under H0: Rho=0

                 salary       educ     gender      minor         r1         r2
    salary      1.00000    0.66056   -0.44992    0.17734    0.86754    0.50662
                            <.0001     <.0001     0.0001     <.0001     <.0001
    educ        0.66056    1.00000   -0.35599    0.13289    0.53763    0.92064
                 <.0001                <.0001     0.0038     <.0001     <.0001
    gender     -0.44992   -0.35599    1.00000    0.07567    0.00000    0.00000
                 <.0001     <.0001                0.0999     1.0000     1.0000
    minor       0.17734    0.13289    0.07567    1.00000    0.00000    0.00000
                 0.0001     0.0038     0.0999                1.0000     1.0000
    r1          0.86754    0.53763    0.00000    0.00000    1.00000    0.58397
    Residual     <.0001     <.0001     1.0000     1.0000                <.0001

  • The correlation between r1 and r2 (0.58397) is
    the partial correlation between salary and
    education with gender and minority status
    partialed out. The zero-order correlation is
    0.66056.

10
Collinearity
  • Collinearity means that within the set of IVs,
    some of the IVs are (nearly) totally predicted by
    the other IVs.
  • Diagnostics:
  • Correlation matrix: look for large correlations
    between IVs.
  • Variance Inflation Factor (VIF): look for values
    greater than 10.
  • Tolerance: look for small values close to zero.
  • Condition indices: look for values greater than
    30. Collinearity is indicated when 2 or more
    variables have large proportions of variance
    (.50 or more) on a component with a large
    condition index.
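
To make the VIF and tolerance diagnostics concrete, here is a minimal Python sketch. It relies on a standard identity: the VIF of each IV is the matching diagonal element of the inverse of the correlation matrix among the IVs, and tolerance is 1/VIF (that is, 1 minus the R-squared from regressing that IV on the others). The function names are hypothetical, and the second matrix is typed in from the rounded Proc Corr output reported later in this lab, so its results are only approximate.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: perfect collinearity among IVs")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vif_and_tolerance(corr):
    """Return (VIFs, tolerances) for the IVs whose correlation matrix is corr."""
    inv = invert(corr)
    vifs = [inv[j][j] for j in range(len(corr))]
    tols = [1.0 / v for v in vifs]
    return vifs, tols

# Two IVs correlated at .90 (made-up value): VIF = 1 / (1 - .90**2).
vifs, tols = vif_and_tolerance([[1.0, 0.9], [0.9, 1.0]])
print([round(v, 3) for v in vifs], [round(t, 3) for t in tols])

# Rounded correlations among bmi, percent, anxiety, image from this lab.
lab_corr = [
    [1.00000, 0.97992, 0.43771, 0.65529],
    [0.97992, 1.00000, 0.40085, 0.68914],
    [0.43771, 0.40085, 1.00000, 0.33633],
    [0.65529, 0.68914, 0.33633, 1.00000],
]
lab_vifs, lab_tols = vif_and_tolerance(lab_corr)
for name, v, t in zip(["bmi", "percent", "anxiety", "image"], lab_vifs, lab_tols):
    print(name, round(v, 2), round(t, 4))
```

The values printed for the lab matrix can be compared against the Tolerance and Variance Inflation columns that PROC REG reports with the tol and vif options.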

11
Example of Collinearity analysis
  • Research on eating disorders.
  • BMI is used to approximate body fat.
  • Percent overweight
  • Appearance anxiety
  • Body image
  • Eating disorder: measures the number of
    behaviors that signal an eating disorder
  • Check for collinearity by running a correlation
    matrix and regression analysis.

12
Program
  • data d1;
  • input (bmi percent anxiety image disorder)(5.0);
  • cards;
  • proc corr;
  • proc reg;
  • model disorder = bmi percent anxiety image / vif
    tol collin;
  • run;

13
Proc Corr Output
    Pearson Correlation Coefficients, N = 235
    Prob > |r| under H0: Rho=0

                    bmi    percent    anxiety      image   disorder
    bmi         1.00000    0.97992    0.43771    0.65529    0.54376
                            <.0001     <.0001     <.0001     <.0001
    percent     0.97992    1.00000    0.40085    0.68914    0.48138
                 <.0001                <.0001     <.0001     <.0001
    anxiety     0.43771    0.40085    1.00000    0.33633    0.14723
                 <.0001     <.0001                <.0001     0.0240
    image       0.65529    0.68914    0.33633    1.00000    0.36574
                 <.0001     <.0001     <.0001                <.0001
    disorder    0.54376    0.48138    0.14723    0.36574    1.00000
                 <.0001     <.0001     0.0240     <.0001

14
Proc Reg Output
    Analysis of Variance
                            Sum of          Mean
    Source             DF   Squares       Square    F Value    Pr > F
    Model               4     10628   2657.02322      37.79    <.0001
    Error             230     16171     70.30887
    Corrected Total   234     26799

    Root MSE          8.38504    R-Square    0.3966
    Dependent Mean   80.18298    Adj R-Sq    0.3861
    Coeff Var        10.45738

    Parameter Estimates
                      Parameter    Standard
    Variable    DF     Estimate       Error    t Value    Pr > |t|
    Intercept    1     56.92129     8.20539       6.94      <.0001
    bmi          1      2.11256     0.27077       7.80      <.0001
    percent      1     -1.61429     0.27513      -5.87      <.0001
    anxiety      1     -0.19021     0.06199      -3.07      0.0024
    image        1      0.18510     0.08071       2.29      0.0227

15
Collinearity Output (tol and VIF)
    Parameter Estimates
                                    Variance
    Variable    DF    Tolerance    Inflation
    Intercept    1            .            0
    bmi          1      0.03632     27.53589
    percent      1      0.03460     28.90430
    anxiety      1      0.77532      1.28979
    image        1      0.50634      1.97497

16
Collinearity Output (collin)
    Collinearity Diagnostics
                            Condition
    Number    Eigenvalue        Index
    1            4.97499      1.00000
    2            0.01424     18.69085
    3            0.00840     24.33043
    4            0.00207     48.99494
    5         0.00029641    129.55302

    Collinearity Diagnostics
              ------------Proportion of Variation------------
    Number    Intercept       bmi   percent   anxiety     image
    1           0.00017   0.00003   0.00002   0.00044  0.000115
    2           0.06825   0.01779   0.00724   0.16035   0.00312
    3           0.13010   0.00157   0.00003   0.77013   0.04982
    4           0.70537   0.00941   0.00277   0.01361   0.87007
    5           0.09609   0.97120   0.98993   0.05546   0.07688
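
Each condition index is the square root of the largest eigenvalue divided by that row's eigenvalue. The short Python sketch below recomputes the indices from the eigenvalues reported in this output and flags those above the cutoff of 30; the function name is hypothetical, and small differences from the SAS values come from the eigenvalues being rounded in the printout.

```python
from math import sqrt

def condition_indices(eigenvalues):
    """Condition index for each eigenvalue: sqrt(largest / eigenvalue)."""
    largest = max(eigenvalues)
    return [sqrt(largest / ev) for ev in eigenvalues]

# Eigenvalues as printed in the collin output (rounded by SAS).
eigenvalues = [4.97499, 0.01424, 0.00840, 0.00207, 0.00029641]
indices = condition_indices(eigenvalues)

# Component numbers (1-based) whose condition index exceeds 30.
flagged = [i + 1 for i, ci in enumerate(indices) if ci > 30]

for number, ci in enumerate(indices, start=1):
    print(number, round(ci, 2))
print("components with index > 30:", flagged)
```

Of the flagged components, only number 5 also carries large variance proportions for two IVs (bmi at 0.97120 and percent at 0.98993), which is the pattern that points to those two variables as the collinear pair.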

17
Add collinear variables and rerun
  • data d1;
  • input (bmi percent anxiety image disorder)(5.0);
  • bmiperc = bmi + percent;
  • cards;
  • proc corr;
  • proc reg;
  • model disorder = bmiperc anxiety image / vif
    tol collin;
  • run;

18
Proc Corr Output
    Pearson Correlation Coefficients, N = 235
    Prob > |r| under H0: Rho=0

                bmiperc    anxiety      image
    bmiperc     1.00000    0.42132    0.67569
                            <.0001     <.0001
    anxiety     0.42132    1.00000    0.33633
                 <.0001                <.0001
    image       0.67569    0.33633    1.00000
                 <.0001     <.0001

19
Proc Reg Output
                               Sum of          Mean
    Source             DF      Squares       Square    F Value    Pr > F
    Model               3   7291.63332   2430.54444      28.78    <.0001
    Error             231        19507     84.44805
    Corrected Total   234        26799

    Root MSE          9.18956    R-Square    0.2721
    Dependent Mean   80.18298    Adj R-Sq    0.2626
    Coeff Var        11.46074

    Parameter Estimates
                      Parameter    Standard
    Variable    DF     Estimate       Error    t Value    Pr > |t|
    Intercept    1     39.18791     8.53866       4.59      <.0001
    bmiperc      1      0.26428     0.03998       6.61      <.0001
    anxiety      1     -0.09313     0.06615      -1.41      0.1605
    image        1      0.04590     0.08563       0.54      0.5924

20
Collinearity Diagnostics

                                    Variance
    Variable    DF    Tolerance    Inflation
    Intercept    1            .            0
    bmiperc      1      0.50098      1.99609
    anxiety      1      0.81758      1.22313
    image        1      0.54020      1.85116

                            Condition
    Number    Eigenvalue        Index
    1            3.98060      1.00000
    2            0.00943     20.54641
    3            0.00799     22.32086
    4            0.00198     44.82460

              ---------------Proportion of Variation---------------
    Number     Intercept       bmiperc       anxiety         image
    1         0.00030972    0.00052415    0.00072670    0.00019231
    2            0.01865       0.41736       0.56626       0.01061
    3            0.26238       0.19662       0.41635       0.03812
    4            0.71866       0.38550       0.01666       0.95108

21
Testing Interactions with Regression
  • data d1;
  • input id sex $ hiredat educ title salary
    startsal jobtime prevexp minority $;
  • if sex = "Male" then gender = 1;
  • if sex = "Female" then gender = 2;
  • inter = gender*prevexp;
  • cards;
  • proc reg;
  • model salary = gender prevexp;
  • proc reg;
  • model salary = gender prevexp inter;
  • run;

22
Proc Reg Output w/out interaction

                                Sum of           Mean
    Source             DF      Squares         Square    F Value    Pr > F
    Model               2  32095090228    16047545114      71.43    <.0001
    Error             471  1.058214E11      224673896
    Corrected Total   473  1.379165E11

    Root MSE            14989    R-Square    0.2327
    Dependent Mean      34420    Adj R-Sq    0.2295
    Coeff Var        43.54827

    Parameter Estimates
                       Parameter      Standard
    Variable    DF      Estimate         Error    t Value    Pr > |t|
    Intercept    1         61063    2340.43528      26.09      <.0001
    prevexp      1     -28.80629       6.68120      -4.31      <.0001
    gender       1        -16406    1401.56098     -11.71      <.0001

23
Proc Reg Output with interaction

                                Sum of           Mean
    Source             DF      Squares         Square    F Value    Pr > F
    Model               3  32501255237    10833751746      48.30    <.0001
    Error             470  1.054152E11      224287745
    Corrected Total   473  1.379165E11

    Root MSE            14976    R-Square    0.2357
    Dependent Mean      34420    Adj R-Sq    0.2308
    Coeff Var        43.51083

    Parameter Estimates
                       Parameter      Standard
    Variable    DF      Estimate         Error    t Value    Pr > |t|
    Intercept    1         63525    2969.20179      21.39      <.0001
    prevexp      1     -54.37887      20.14155      -2.70      0.0072
    gender       1        -18074    1870.07488      -9.66      <.0001
    inter        1      18.45578      13.71463       1.35      0.1790
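
In a model with a product term, the slope of prevexp depends on gender: it equals the prevexp coefficient plus the inter coefficient times the gender code. The Python sketch below works through that arithmetic with the estimates from this output; the function name `simple_slope` is hypothetical, and the gender codes (1 and 2) are the ones assigned in the data step.

```python
def simple_slope(b_x, b_inter, moderator):
    """Slope of x at a given moderator value in y = b0 + b_x*x + b_m*m + b_inter*x*m."""
    return b_x + b_inter * moderator

# Estimates taken from the Parameter Estimates table with the interaction term.
b_prevexp = -54.37887
b_inter = 18.45578

slope_g1 = simple_slope(b_prevexp, b_inter, 1)  # gender = 1
slope_g2 = simple_slope(b_prevexp, b_inter, 2)  # gender = 2

print("prevexp slope, gender 1:", round(slope_g1, 5))
print("prevexp slope, gender 2:", round(slope_g2, 5))
```

Because the inter coefficient is not significant here (p = 0.1790), the difference between the two slopes should not be read as a reliable effect.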

24
If the interaction is significant, run proc corr
and proc gplot
  • data d1;
  • input id sex $ hiredat educ title salary
    startsal jobtime prevexp minority $;
  • if sex = "Male" then gender = 1;
  • if sex = "Female" then gender = 2;
  • inter = gender*prevexp;
  • cards;
  • symbol1 color=blue interpol=rl value=none;
  • symbol2 color=black interpol=rl value=none;
  • proc sort; by gender;
  • proc gplot;
  • plot salary*prevexp=gender;
  • proc corr;
  • var salary prevexp;
  • by gender;
  • run;

25
(No Transcript)
26
Correlations by gender
    ------------------- gender=1 -------------------
    Pearson Correlation Coefficients, N = 258
    Prob > |r| under H0: Rho=0

                 salary     prevexp
    salary      1.00000    -0.20208
                             0.0011
    prevexp    -0.20208     1.00000
                 0.0011

    ------------------- gender=2 -------------------
    Pearson Correlation Coefficients, N = 216
    Prob > |r| under H0: Rho=0

                 salary     prevexp
    salary      1.00000    -0.21958
                             0.0012
    prevexp    -0.21958     1.00000
                 0.0012

27
In Class Examples
  • Download data8lab. Compute a partial correlation
    between iq and age controlling for knwldge.
  • Download data8lab. Regress iq on knwldge and
    age. Then run the regression again and include
    the interaction term of knwldge and age.
  • Download dataset assign10.txt and check for
    multicollinearity.