Regression

Slides: 37

Provided by: martyg

Transcript and Presenter's Notes

1
Regression
2
(No Transcript)
3
Regression

Reading Assignments
TG Online: http://mhhe.com/thorne4
Ch. 13: Overview, Defs., Formulas, SPSS
Your favorite stats book chapter on correlation
and regression
GS Lesson 31, 33
Multiple Regression and Partial Correlation
(After Break)
GS Lessons 34, 32 (Read over break)
Howell, Chs. 9, 15
Garson online on Multiple Regression (MR) and
ref.
http://www2.chass.ncsu.edu/garson/PA765/regress.htm

4
(No Transcript)
5
Linear Regression

What determines a line? (Two things.)
In geometry?
What else?
The regression equation you already know-
Fitting the regression line on the scatterplot
Y-hat
The key deviation in regression
The least squares line of best fit
Also called OLS estimation
OLS means what? (Ordinary least squares)
Minimum of the sum of the squared deviations of
each y(i) from the regression line
Formulas guarantee this.
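
The least squares criterion above can be written out as follows. The closed-form solutions are the standard textbook ones for simple regression, not anything specific to these slides; $b_0$ and $b_1$ are the estimated intercept and slope, and $\bar{X}$, $\bar{Y}$ are the sample means.

```latex
\begin{aligned}
\text{Minimize}\quad SSE(b_0, b_1) &= \sum_{i=1}^{n} \left(Y_i - \hat{Y}_i\right)^2,
  \qquad \hat{Y}_i = b_0 + b_1 X_i \\
b_1 &= \frac{\sum_{i}(X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i}(X_i - \bar{X})^2},
  \qquad b_0 = \bar{Y} - b_1 \bar{X}
\end{aligned}
```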

6
(No Transcript)
7
Simple Regression

Sample Data and Analysis-Car weight and mpg
J:\PSYCH\MARTY\Cmpu3103\Ch13-Corr&Reg\RegEx.sav
http://highered.mcgraw-hill.com/sites/0072832517/student_view0/chapter13/spss_exercises.html
Regression Model (Parameters)
$Y_i = \beta_0 + \beta_1 X_i + e_i$
$Y_i$ is the response variable on the ith trial
The betas ($\beta_0$, $\beta_1$) are parameters
$X_i$ is the known value of the independent variable in
the ith trial
$e_i$ is a random error term (residual) with mean
0 and variance $\sigma^2$
$e_i$ and $e_j$ are uncorrelated ($\operatorname{cov}(e_i, e_j) = 0$) for
all $i, j$ with $i \neq j$
Later we will add that the $e_i$ are Normal
What does random imply? Where is variability?
Uncorrelated errors means what?
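
A small simulation can make the model above concrete. This is only a sketch: the parameter values, the sample size, and the choice of Normal errors (an assumption the slides add later) are all hypothetical, picked for illustration.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical parameter values, chosen only for illustration
BETA0, BETA1, SIGMA = 2.0, 0.5, 1.0
n = 5000

# X values are treated as known; errors are independent draws with mean 0
x = [random.uniform(0, 10) for _ in range(n)]
e = [random.gauss(0, SIGMA) for _ in range(n)]
y = [BETA0 + BETA1 * xi + ei for xi, ei in zip(x, e)]

# Least squares estimates b0, b1 of the parameters
mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = my - b1 * mx

print(f"b0 = {b0:.3f} (true {BETA0}), b1 = {b1:.3f} (true {BETA1})")
```

The variability is all in the error term: rerunning with a different seed moves b0 and b1 around the fixed parameter values.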

8
Regression coefficients: parameters, estimates

Parameters for (unstandardized) coefficients:
$\beta_0$, $\beta_1$
Estimates (unstandardized): $b_0$, $b_1$
B column in SPSS
Standardized estimates
Called beta coefficients (beta weights)
Neter and Wasserman (reg stat book) use B
Green uses Z (other variations)
Beta column in output: estimates!!
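
The difference between the B column and the Beta column can be sketched as below. The weight and mpg values are made up for illustration (they are not the RegEx.sav data); the point is only that the standardized slope is the unstandardized slope rescaled by the two standard deviations, and that in simple regression it equals r.

```python
from math import sqrt

# Hypothetical car data: weight (in 1000 lb) and miles per gallon
weight = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
mpg = [34.0, 30.0, 27.0, 22.0, 20.0, 16.0]

n = len(weight)
mx, my = sum(weight) / n, sum(mpg) / n
sxx = sum((x - mx) ** 2 for x in weight)
syy = sum((y - my) ** 2 for y in mpg)
sxy = sum((x - mx) * (y - my) for x, y in zip(weight, mpg))

b1 = sxy / sxx                 # unstandardized slope (the "B" column)
b0 = my - b1 * mx              # unstandardized intercept
sx, sy = sqrt(sxx / (n - 1)), sqrt(syy / (n - 1))
beta1 = b1 * sx / sy           # standardized slope (the "Beta" column)
r = sxy / sqrt(sxx * syy)      # Pearson correlation

print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, beta = {beta1:.3f}, r = {r:.3f}")
```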

9
Regression

Regression Simulation
Assumptions (GS)
Fixed-Effects
Experimental study where you (the E) exercise
some control over the IV values (predictor).
Some number of participants get treatments at
different levels.
E.g., 5, 10, 15, 20 mg of caffeine on digit span
Linear or non-linear relations possible
Assumption 1: The DV is normal in the population for
each level of the IV.
Assumption 2: Population variances of the DV are
the same for all levels of the IV

10
(No Transcript)
11
Assumptions

Assumption 3
The cases are a random sample
Scores are independent of each other from one
individual to the next (already stated)
Random-Effects Model
Non-experimental study: just go out and measure
what's there
A1: X and Y are bivariate normal in the
population
Sketch
If true, the only possible relationship is linear
Usually unrealistic
A2: The cases are a random sample and scores on each
variable are independent of other scores on the
same variable
Same as in the fixed-effects model

12
Effect sizes

Assumptions GS
Based on R and $R^2$
Rs of .10, .30, and .50 are small, medium, and large
$R^2$s

13
Path Diagrams
Correlation
[Path diagram: x and y, labeled r]
14
Doing Normal Statistics
T-Test
[Path diagram: x and y]
15
Doing Normal Statistics
Simple Regression
[Path diagram: y and x]
16
Weight and MPG

Never run a regression without examining the
scatterplot.
Never run a regression or multiple regression
without examining the residual plot.
Residuals vs. predictor (in some form, usually
standardized)
Statisticians debate which residual is best to
use.
Residual plots allow checking of assumptions
Run scatterplot and regression
J:\PSYCH\MARTY\Cmpu3103\Ch13-Corr&Reg\RegEx.sav
Examine plots and outputs

17
GS Ex. 1

Bobo doll analysis
J:\PSYCH\MARTY\4123\GreenSalkin5EdDATA\GreenSalkind\GreenSalkind\Lesson 33\Lesson 33 Exercise File 1.sav
Run scatterplot and regression; examine plots and
outputs.
Assignment: run the above Ex. 1 file with and
without the two high scores and compare the
results by writing an APA results section (with
residual plot) both ways, and comment on the
differences.
SAS run next

* Data from GS Lesson 33 Ex1: Bobo doll modeling data;
Data GSL33Ex2;  * Data can be pasted in from SPSS Editor window;
Input bobo peer;
datalines;
1 4
0 1
2 1
1 2
18 59
1 2
2 3
22 38
0 2
2 4
;
Symbol value=dot i=r;  /* This changes the default symbol to dot and I (interpolation) to regression line. */
Proc Gplot Data=GSL33Ex2;
Plot peer*bobo / regeqn;
Run;
19
Annotation of SAS Output

Model 1; DV: peer
ANOVA table
Model1 (sometimes called Regression) tests the
H0 that there is no (multiple) linear regression
relationship between the DV (peer aggression) and the
IV (Bobo hits).
There is a significant simple linear relationship
between amount of peer aggression and witnessed
Bobo aggression, $F(1, 8) = 51.20$, $MSE = 61.39$, $p < .0001$,
adjusted $R^2 = .848$.
Note: MSE estimates error or residual variance
and is an index of error of prediction; look at it
to compare models.
Adjusted $R^2$ is preferred over $R^2$ as the basic effect
size measure. Why? Adjusted $R^2$ estimates the squared
population correlation $\rho^2(Y, \hat{Y})$.
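
The ANOVA quantities reported above can be reproduced by hand from the ten data lines in the SAS run. The following is a plain-Python sketch, not the SAS procedure itself; the variable names are illustrative.

```python
# Data lines from the SAS run above (GS Lesson 33 Bobo doll data)
bobo = [1, 0, 2, 1, 18, 1, 2, 22, 0, 2]   # IV: Bobo hits
peer = [4, 1, 1, 2, 59, 2, 3, 38, 2, 4]   # DV: peer aggression

n = len(bobo)
mx, my = sum(bobo) / n, sum(peer) / n
sxx = sum((x - mx) ** 2 for x in bobo)
sxy = sum((x - mx) * (y - my) for x, y in zip(bobo, peer))
ss_total = sum((y - my) ** 2 for y in peer)

b1 = sxy / sxx                      # slope
b0 = my - b1 * mx                   # intercept
ss_model = b1 * sxy                 # regression (model) sum of squares
ss_error = ss_total - ss_model      # residual sum of squares

df_model, df_error = 1, n - 2
mse = ss_error / df_error
f_stat = (ss_model / df_model) / mse
r2 = ss_model / ss_total
r2_adj = 1 - (1 - r2) * (n - 1) / df_error

print(f"F({df_model}, {df_error}) = {f_stat:.2f}, MSE = {mse:.2f}, "
      f"adjusted R^2 = {r2_adj:.3f}")
# prints: F(1, 8) = 51.20, MSE = 61.39, adjusted R^2 = 0.848
```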

20
SAS Output

Parameter Estimates
Estimated Unstandardized regression coefficients
Write model
H0s for t tests, notation for standard errors
T and F relationship in simple regression
Estimated standardized coefficients
Scatterplot with reg line plus marginal info.
Residual plot with plain residual
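
The t and F relationship in simple regression noted above is that the squared t for the slope equals the overall F. A minimal check on the Bobo data from the SAS run, assuming the usual standard error of the slope, sqrt(MSE / Sxx):

```python
from math import sqrt

bobo = [1, 0, 2, 1, 18, 1, 2, 22, 0, 2]   # IV: Bobo hits
peer = [4, 1, 1, 2, 59, 2, 3, 38, 2, 4]   # DV: peer aggression

n = len(bobo)
mx, my = sum(bobo) / n, sum(peer) / n
sxx = sum((x - mx) ** 2 for x in bobo)
sxy = sum((x - mx) * (y - my) for x, y in zip(bobo, peer))
ss_total = sum((y - my) ** 2 for y in peer)

b1 = sxy / sxx
ss_model = b1 * sxy
mse = (ss_total - ss_model) / (n - 2)

se_b1 = sqrt(mse / sxx)     # standard error of the slope
t_b1 = b1 / se_b1           # t test of H0: slope = 0, df = n - 2
f_stat = ss_model / mse     # overall F, df = (1, n - 2)

print(f"t = {t_b1:.2f}, t^2 = {t_b1 ** 2:.2f}, F = {f_stat:.2f}")
```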

21
What are residuals?

How can we get them (compute)? SAS OUTPUT statement:
OUT=SAS-data-set gives the name of the new data
set. By default, the procedure uses the DATAn
convention to name the new data set. In the
output data set, the first variable listed after
a keyword in the OUTPUT statement contains that
statistic for the first dependent variable listed
in the MODEL statement; the second variable
contains the statistic for the second dependent
variable in the MODEL statement, and so on. The
list of variables following the equal sign can be
shorter than the list of dependent variables in
the MODEL statement. In this case, the procedure
creates the new names in order of the dependent
variables in the MODEL statement. For example,
the SAS statements
proc reg data=a;
   model y z=x1 x2;
   output out=b
      p=yhat zhat
      r=yresid zresid;
run;
create an output data set named b. In addition to
the variables in the input data set, b contains
the following variables:
yhat, with values that are predicted values of
the dependent variable y
zhat, with values that are predicted values of
the dependent variable z
yresid, with values that are the residual values
of y
zresid, with values that are the residual values
of z
You can specify the following keywords in the
OUTPUT statement. See the "Model Fit and
Diagnostic Statistics" section for computational
formulas.
Table 61.3 Keywords for OUTPUT Statement
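
As a rough Python analogue of the p= and r= keywords, predicted values and residuals for the Bobo data can be computed directly. This is a sketch of the arithmetic only, not of what PROC REG does internally.

```python
bobo = [1, 0, 2, 1, 18, 1, 2, 22, 0, 2]   # IV: Bobo hits
peer = [4, 1, 1, 2, 59, 2, 3, 38, 2, 4]   # DV: peer aggression

n = len(bobo)
mx, my = sum(bobo) / n, sum(peer) / n
b1 = (sum((x - mx) * (y - my) for x, y in zip(bobo, peer))
      / sum((x - mx) ** 2 for x in bobo))
b0 = my - b1 * mx

y_hat = [b0 + b1 * x for x in bobo]             # predicted values (like p=)
resid = [y - f for y, f in zip(peer, y_hat)]    # residuals (like r=)

for x, y, f, e in zip(bobo, peer, y_hat, resid):
    print(f"bobo={x:2d}  peer={y:2d}  yhat={f:6.2f}  resid={e:7.2f}")
```

Least squares residuals sum to zero and are uncorrelated with the predictor, which is why a residual plot with a visible pattern signals a problem with the model rather than with the arithmetic.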

22
In SPSS

SPSS syntax
What do residuals represent?
In path diagrams

23
Multiple Regression

Find a linear combination of two or more
independent variables (predictors, explanatory
variables) that has the highest correlation with
some DV or criterion variable.
All variables are assumed quantitative (for now).
$Y = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_k X_k + e$
The b's are regression coefficients
Each X is the value of some IV
e is a residual: the difference between Y and Y-hat, $e = Y - \hat{Y}$

24
The Best Model

Find the values of the b's that minimize the sum of
squared residuals.
Model of Observed Score, Y
$Y = \hat{Y} + e$
where $\hat{Y}$ is the model fitted value, and
$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_k X_k$
Data = Model + Residual
Minimize the residual
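
The "Data = Model + Residual" idea can be sketched for two predictors by solving the normal equations directly. The data and the small `solve` helper are hypothetical, written only for this illustration; real analyses would use a statistics package.

```python
def solve(a, rhs):
    """Solve a linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [v] for row, v in zip(a, rhs)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

# Hypothetical data: two predictors and one criterion
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [3.1, 3.9, 7.2, 7.8, 11.1, 11.9]

rows = [[1.0, a, b] for a, b in zip(x1, x2)]   # design matrix with intercept
k = 3
xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
b = solve(xtx, xty)                            # b0, b1, b2

def sse_for(coefs):
    """Sum of squared residuals for a given set of coefficients."""
    return sum((yi - sum(c * v for c, v in zip(coefs, r))) ** 2
               for r, yi in zip(rows, y))

y_hat = [sum(c * v for c, v in zip(b, r)) for r in rows]   # Model
resid = [yi - fi for yi, fi in zip(y, y_hat)]              # Residual
sse = sse_for(b)
print("b =", [round(c, 3) for c in b], " SSE =", round(sse, 4))
```

Nudging any coefficient away from the least squares solution, for example `sse_for([b[0] + 0.1, b[1], b[2]])`, gives a larger SSE.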

25
Uses of MR

Predict the DV from multiple IVs
Although we use prediction language,
social-behavioral sciences usually don't do much
sheer prediction.
Strength of association between DV and set of
IVs.
Explanation: measure correlation between one
variable and a set of other variables; use
variance-explained language.
Is the prediction or explanation statistically
significant?
Is some variance accounted for? How much? (Effect
size)

26
Assumptions Lite

Linear association between the variables in the
linear combination and the DV
Examine scatterplots between each IV and DV
Scatterplot matrix
Values of the IVs are measured without error!
Residuals (e's) are normally distributed,
independent, and have equal variances along all
values of the variate.
Normality and HOV of arrays (previous diagram)

27
Choosing Predictors (Model Identification)

Your theory tells you: a priori selection of IVs
Sequential (hierarchical) entry of IVs
Sequence is theory-driven
Allows evaluation of explained variance over and
above variables already in the model
Example from Green
Stepwise (for nonthinkers)
Program chooses best predictors
Don't do (or ever tell)
Stepwise is unwise: a fishing expedition
All possible subsets (best 2 predictors, best 3,
...)
For applications in sheer prediction where we
don't care what the predictors are.

28
Interpretation Issues

Did I leave something out that's really
important? (Or include junk predictors?)
Specification Error
How would you know?
Will the TV work outside the store?
Issue of generalizability beyond the present
sample.
Naturally, your model works well with your data,
on which the model was derived
What about new data?
Is the $R^2$ going to hold up with new data?
Coefficients still good?
All models work well on their data.
Stepwise and all possible subsets methods
capitalize on chance.
Models need to be Validated (somehow)

29
Validation Procedures

Cross-validation with new data
Resampling methods (new, but not really)
Calculating shrinkage of $R^2$
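
One common way to estimate shrinkage is the usual adjusted R-squared formula; the sketch below assumes that formula and uses made-up values of R-squared, n, and number of predictors. Other shrinkage estimates exist and may be what a given text recommends.

```python
def adjusted_r2(r2, n, k):
    """Adjusted (shrunken) R-squared for n cases and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical: the same sample R-squared of .30 with 5 predictors
# shrinks much more in a small sample than in a large one.
for n in (20, 50, 200):
    print(f"n = {n:3d}: adjusted R^2 = {adjusted_r2(0.30, n, 5):.3f}")
```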

30
Green - Issues and Examples

Research Questions
1. How accurately can a physical injury index be
predicted from a linear combination of strength
measures for elderly women?
Really explanation (accounting for variance)
rather than actual prediction. Typical language.
Accounting for variance in what?
Accounting for variance with what?
Model is: Injury = Quads + Gluts + Abdoms + Arms + Grip
All predictors put in the model
Is there significant variance explained? How
much? (effect size)
What predictors are doing the work (effective,
important)?
J:\PSYCH\MARTY\4123\GreenSalkin5EdDATA\GreenSalkind\GreenSalkind\Lesson 34\Lesson 34 Data File 1.sav

GET
  FILE='J:\PSYCH\MARTY\4123\GreenSalkin5EdDATA\GreenSalkind\GreenSalkind\Lesson 34\Lesson 34 Data File 1.sav'.
DATASET NAME DataSet1 WINDOW=FRONT.
REGRESSION
  /DESCRIPTIVES MEAN STDDEV CORR SIG N
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS CI R ANOVA COLLIN TOL ZPP
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT injury
  /METHOD=ENTER quads gluts abdoms arms grip
  /SCATTERPLOT=(*SDRESID ,*ZPRED).
Highlights of syntax
Examination of output
Correlation matrix: promise and problems?
If the predictors have explanatory power, we
expect some correlations with the DV. Do we have any?
THE BIG PROBLEM IN MULTIPLE REGRESSION:
PREDICTORS ARE USUALLY CORRELATED

32
Checking Results

Assessment of correlation matrix
Significant relationship and effect size?
Write the regression equation and standardized
equation
Path model (shows correlations of predictors)
Relative importance of Predictors
Which predictors are statistically significant?
Examine standardized coefficients (beta weights),
partial and part correlations.
Partial and part correlation
Partial correlation is the correlation between
the DV and a particular IV controlling for all
the other predictors on both the DV and the
particular IV
Part correlation is the correlation between the
DV and a particular IV controlling for all other
predictors on just the particular IV
Part correlation is actually the best because
it reflects the unique contribution of a
particular predictor.
Standardized coefficients, partial and part
correlations will usually indicate the same thing
as to relative importance.
The squared part correlation is a good effect
size measure for each predictor
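
The two definitions above can be sketched by residualizing. To keep it short, this example uses made-up scores and only one other predictor (x2) as the control, whereas the slide's definitions control for all other predictors; the helper names are illustrative.

```python
from math import sqrt

def mean(v):
    return sum(v) / len(v)

def corr(a, b):
    """Pearson correlation between two lists."""
    ma, mb = mean(a), mean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / sqrt(saa * sbb)

def residualize(target, control):
    """Residuals of target after a simple regression on control."""
    mt, mc = mean(target), mean(control)
    slope = (sum((c - mc) * (t - mt) for c, t in zip(control, target))
             / sum((c - mc) ** 2 for c in control))
    return [t - (mt + slope * (c - mc)) for t, c in zip(target, control)]

# Hypothetical scores: DV y, predictor of interest x1, other predictor x2
y = [12.0, 15.0, 11.0, 18.0, 20.0, 14.0, 22.0, 17.0]
x1 = [3.0, 5.0, 2.0, 6.0, 8.0, 4.0, 9.0, 5.0]
x2 = [7.0, 6.0, 8.0, 5.0, 6.0, 9.0, 4.0, 7.0]

x1_res = residualize(x1, x2)   # x1 with x2 removed
y_res = residualize(y, x2)     # y with x2 removed

part = corr(y, x1_res)         # x2 removed from the IV only
partial = corr(y_res, x1_res)  # x2 removed from both the DV and the IV

print(f"part = {part:.3f}, squared part = {part ** 2:.3f}, partial = {partial:.3f}")
```

Because the partial correlation also removes x2 from the DV, it is never smaller in absolute value than the part correlation.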

Multicollinearity: the problem of correlated
predictors
A problem if Tolerance is < .10 (Mertler, very
liberal)
Leech, Barrett, & Morgan (2008) suggest < 1 - $R^2$
Check residual plot for patterns indicating
assumption violation (what assumptions?)
Patterns in the residual plot could indicate:
Nonnormality of residuals
Residuals correlated with the IVs (via Y-hat)
Variances of residuals are not constant across
values of Y-hat (or Y)
Normality of residuals? Histogram, normal plot
Are we done yet?
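
Tolerance for a predictor is 1 minus the R-squared from regressing that predictor on the other predictors. With only two predictors this reduces to 1 minus their squared correlation, which is the simplified case sketched here; the scores are invented, with the first pair deliberately made nearly redundant.

```python
def r_squared(a, b):
    """Squared Pearson correlation between two lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab ** 2 / (saa * sbb)

# Hypothetical strength scores; quads and gluts nearly duplicate each other
quads = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0, 24.0]
gluts = [10.5, 11.8, 14.2, 15.9, 18.3, 19.7, 22.1, 24.2]
grip = [30.0, 25.0, 33.0, 28.0, 31.0, 26.0, 32.0, 29.0]

# Two-predictor models: tolerance = 1 - r^2 between the two predictors
tol_quads_gluts = 1 - r_squared(quads, gluts)
tol_quads_grip = 1 - r_squared(quads, grip)

for label, tol in (("quads with gluts", tol_quads_gluts),
                   ("quads with grip", tol_quads_grip)):
    flag = "PROBLEM" if tol < 0.10 else "ok"
    print(f"{label}: tolerance = {tol:.3f}, VIF = {1 / tol:.1f} ({flag})")
```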

34
Other Types of Research Questions

2. Unordered sets of predictors
How well do the lower-body strength measures
predict the total injury index for elderly women?
How well do the upper-body strength measures
predict the total injury index?
How well do the lower-body measures predict over
and above the upper body measures and vice
versa?
Because predictors are correlated, contributions
depend on what predictors are already in the
model! Plus we have no theory to guide us as to
which to enter first.
Run multiple models
Lower and upper separately
Lower then upper and look at $R^2$ change
Upper then lower and look at $R^2$ change
Discuss incremental contributions in Results
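
The "lower then upper" and "upper then lower" comparisons can be sketched as follows. To stay short, each set is collapsed to a single hypothetical composite score (the slides use several measures per set), the data are invented, and the two-predictor R-squared comes from the standard formula in terms of the three pairwise correlations.

```python
from math import sqrt

def corr(a, b):
    """Pearson correlation between two lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return sab / sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))

# Hypothetical data: injury index, lower-body and upper-body composites
injury = [5.0, 7.0, 4.0, 9.0, 10.0, 6.0, 12.0, 8.0]
lower = [20.0, 18.0, 22.0, 15.0, 14.0, 19.0, 11.0, 16.0]
upper = [30.0, 28.0, 33.0, 27.0, 24.0, 31.0, 22.0, 26.0]

r_yl, r_yu, r_lu = corr(injury, lower), corr(injury, upper), corr(lower, upper)

r2_lower = r_yl ** 2                     # lower-body model alone
r2_upper = r_yu ** 2                     # upper-body model alone
r2_both = (r_yl ** 2 + r_yu ** 2 - 2 * r_yl * r_yu * r_lu) / (1 - r_lu ** 2)

change_upper_after_lower = r2_both - r2_lower
change_lower_after_upper = r2_both - r2_upper

# F test of the R-squared change when one predictor is added to a
# model that then has two predictors: df = (1, n - 2 - 1)
n = len(injury)
f_change = change_upper_after_lower / ((1 - r2_both) / (n - 2 - 1))

print(f"R2 lower = {r2_lower:.3f}, R2 upper = {r2_upper:.3f}, R2 both = {r2_both:.3f}")
print(f"change (upper after lower) = {change_upper_after_lower:.3f}, F change = {f_change:.2f}")
print(f"change (lower after upper) = {change_lower_after_upper:.3f}")
```

Because the two composites are correlated, each one's incremental contribution is smaller than its R-squared when entered alone.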

3. Ordered sets of predictors: the order is based
on some theory, past research, or informed
judgement
How well do previous medical difficulties and age
predict total injuries for elderly women?
How well do the strength measures predict total
injuries controlling for previous medical
difficulties and age? -- !!
Run the control model, then add strength measures
and look at change in $R^2$
Examine Green's APA Results sections carefully
and Tips for MR