Regression - PowerPoint PPT Presentation

Slides: 37
Provided by: martyg
Tags: regression

Transcript and Presenter's Notes


1
Regression
2
(No Transcript)
3
Regression
  • Reading Assignments
  • TG Online: http://mhhe.com/thorne4
  • Ch. 13: Overview, Defs., Formulas, SPSS
  • Your favorite stats book chapter on correlation
    and regression
  • GS Lessons 31, 33
  • Multiple Regression and Partial Correlation
    (After Break)
  • GS Lessons 34, 32 (Read over break)
  • Howell, Chs. 9, 15
  • Garson online on Multiple Regression (MR) and
    ref.
  • http://www2.chass.ncsu.edu/garson/PA765/regress.htm

4
(No Transcript)
5
Linear Regression
  • What determines a line? (Two things.)
  • In geometry?
  • What else?
  • The regression equation you already know-
  • Fitting the regression line on the scatterplot
  • Y-hat
  • The key deviation in regression
  • The least squares line of best fit
  • Also called OLS (ordinary least squares) estimation
  • Means what?
  • Minimum of the sum of the squared deviations of
    each y(i) from the regression line
  • Formulas guarantee this.
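A minimal Python sketch of the least squares idea on this slide. The five data points are invented for illustration and are not from the course files. It computes the slope and intercept from the usual formulas, then checks that nudging either coefficient away from those values never lowers the sum of squared deviations.

```python
# Toy data, invented purely for illustration (not from the slides).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

def ols_line(x, y):
    """Return (b0, b1) for the least squares line y-hat = b0 + b1*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

def sse(b0, b1, x, y):
    """Sum of squared deviations of each y_i from the line b0 + b1*x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

b0, b1 = ols_line(x, y)
best = sse(b0, b1, x, y)

# Try a small grid of nudged coefficients around the OLS solution.
nudged = [sse(b0 + d0, b1 + d1, x, y)
          for d0 in (-0.1, 0.0, 0.1) for d1 in (-0.1, 0.0, 0.1)]

print(b0, b1, best, min(nudged) >= best)
```

The grid check is only a spot check, not a proof; the formulas themselves are what guarantee the minimum.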

6
(No Transcript)
7
Simple Regression
  • Sample Data and Analysis-Car weight and mpg
  • J:\PSYCH\MARTY\Cmpu3103\Ch13-Corr&Reg\RegEx.sav
  • http://highered.mcgraw-hill.com/sites/0072832517/student_view0/chapter13/spss_exercises.html
  • Regression Model (Parameters)
  • $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
  • $Y_i$ is the response variable on the $i$th trial
  • The betas are parameters
  • $X_i$ is the known value of the independent variable in
    the $i$th trial
  • $\varepsilon_i$ is a random error term (residual) with mean
    0 and variance $\sigma^2$
  • $\varepsilon_i$ and $\varepsilon_j$ are uncorrelated
    ($\operatorname{cov}(\varepsilon_i, \varepsilon_j) = 0$) for all $i \ne j$
  • Later we will add that the $\varepsilon_i$ are normal
  • What does random imply? Where is variability?
    What do uncorrelated errors mean?

8
Regression coefficients: parameters, estimates
  • Parameters for (unstandardized) coefficients
  • $\beta_0$, $\beta_1$
  • Estimates (unstandardized): $b_0$, $b_1$
  • B column in SPSS
  • Standardized estimates
  • Called beta coefficients (beta weights)
  • Neter and Wasserman (regression stats book) use B
  • Green uses Z (other variations)
  • Beta column in output: these are estimates!
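To make the B versus Beta distinction concrete, here is a small Python sketch on invented toy data (not the course data). It rescales the unstandardized slope by the ratio of standard deviations to get the standardized slope. In simple regression that value should equal Pearson's r.

```python
import math

# Toy data, invented for illustration only (not from the slides).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)

mean_x, mean_y = sum(x) / n, sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

b1 = sxy / sxx                      # unstandardized slope (SPSS "B" column)
sd_x = math.sqrt(sxx / (n - 1))     # sample SD of the predictor
sd_y = math.sqrt(syy / (n - 1))     # sample SD of the response
beta = b1 * sd_x / sd_y             # standardized slope (SPSS "Beta" column)
r = sxy / math.sqrt(sxx * syy)      # Pearson correlation

print(round(b1, 3), round(beta, 3), round(r, 3))
```

With more than one predictor, the standardized coefficients no longer equal the simple correlations, which is part of why the multiple regression slides treat them separately.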

9
Regression
  • Regression Simulation
  • Assumptions (GS)
  • Fixed-Effects
  • Experimental study where you (the E) exercise
    some control over the IV values (predictor).
  • Some number of participants get treatments at
    different levels.
  • 5, 10, 15, 20 mg of caffeine on digit span
  • Linear or non-linear relations possible
  • Assumption 1: The DV is normal in the population for
    each level of the IV.
  • Assumption 2: Population variances of the DV are
    the same for all levels of the IV.

10
(No Transcript)
11
Assumptions
  • Assumption 3:
  • Cases are a random sample
  • Scores are independent of each other from one
    individual to the next (already stated)
  • Random-Effects Model
  • Non-experimental study: just go out and measure
    what's there
  • A1: X and Y are bivariate normal in the
    population
  • Sketch
  • If true, the only relationship is linear
  • Usually unrealistic
  • A2: Cases are a random sample, and scores on each
    variable are independent of other scores on the
    same variable
  • Same as in the fixed-effects model

12
Effect sizes
  • Assumptions GS
  • Based on $R$ and $R^2$
  • $R$s of .10, .30, and .50 are small to large
  • $R^2$s

13
Path Diagrams
[Path diagram: correlation r between x and y]
14
Doing Normal Statistics
[Path diagram: t-test, variables x and y]
15
Doing Normal Statistics
[Path diagram: simple regression, variables y and x]
16
Weight and MPG
  • Never run a regression without examination of the
    scatterplot.
  • Never run a regression or multiple regression
    without examination of the residual plot
  • Residuals vs. predictor (in some form, usually
    standardized)
  • Statisticians debate about the best residual to
    use.
  • Residual plots allow checking of assumptions
  • Run scatterplot and regression
  • J:\PSYCH\MARTY\Cmpu3103\Ch13-Corr&Reg\RegEx.sav
  • Examine plots and outputs

17
GS Ex. 1
  • Bobo doll analysis
  • J:\PSYCH\MARTY\4123\GreenSalkin5EdDATA\GreenSalkind\GreenSalkind\Lesson 33\Lesson 33 Exercise File 1.sav
  • Run scatterplot and regression; examine plots and
    outputs.
  • Assignment: run the above Ex. 1 file with and
    without the two high scores and compare the
    results by writing an APA Results section (with
    residual plot) both ways, and comment on the
    differences.
  • SAS run next

18
  • Data from GS Lesson 33 Ex1 Bobo doll modeling
    data.

  • Data GSL33Ex2; /* Data can be pasted in from SPSS
    Editor window */
  • Input bobo peer;
  • datalines;
  • 1 4
  • 0 1
  • 2 1
  • 1 2
  • 18 59
  • 1 2
  • 2 3
  • 22 38
  • 0 2
  • 2 4
  • ;
  • Symbol value=dot I=R; /* This changes default
    symbol to dot and I (interpolation) to regression
    line. */
  • Proc Gplot Data=GSL33Ex2;
  • Plot peer*bobo / regeqn;
  • run;

19
Annotation of SAS Output
  • Model 1: DV = peer
  • Anova table
  • Model 1 (sometimes called Regression) tests the
    H0 that there is no (multiple) linear regression
    relationship between the DV (peer aggression) and the
    IV (Bobo hits).
  • There is a significant simple linear relationship
    between amount of peer aggression and witnessed
    Bobo aggression, $F(1, 8) = 51.20$, $MSE = 61.39$,
    $p < .0001$, adjusted $R^2 = .848$.
  • Note: MSE estimates the error (residual) variance
    and is an index of error of prediction; look at
    it to compare models.
  • Adjusted $R^2$ is preferred over $R^2$ as the basic
    effect size measure. Why? Adjusted $R^2$ estimates the
    population $\rho^2(Y, \hat{Y})$.
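The ANOVA quantities reported above can be recomputed by hand. The Python sketch below uses the ten (bobo, peer) pairs from the SAS datalines on slide 18, with peer as the DV. The variable names are my own, and it is a cross-check under those assumptions, not a replacement for the SAS output.

```python
# (bobo, peer) pairs copied from the SAS datalines on slide 18.
data = [(1, 4), (0, 1), (2, 1), (1, 2), (18, 59),
        (1, 2), (2, 3), (22, 38), (0, 2), (2, 4)]
x = [b for b, _ in data]   # IV: Bobo hits
y = [p for _, p in data]   # DV: peer aggression
n, k = len(data), 1        # k = number of predictors

mean_x, mean_y = sum(x) / n, sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
ss_total = sum((yi - mean_y) ** 2 for yi in y)

ss_model = sxy ** 2 / sxx          # regression sum of squares, df = k
ss_error = ss_total - ss_model     # residual sum of squares, df = n - k - 1
mse = ss_error / (n - k - 1)       # estimate of the error variance
f_stat = (ss_model / k) / mse
r2 = ss_model / ss_total
r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(f"F({k},{n - k - 1}) = {f_stat:.2f}, MSE = {mse:.2f}, "
      f"R2 = {r2:.3f}, R2-adj = {r2_adj:.3f}")
```

The printed F, MSE, and adjusted R-squared should agree with the values quoted in the results sentence to the reported precision.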

20
SAS Output
  • Parameter Estimates
  • Estimated Unstandardized regression coefficients
  • Write model
  • H0s for t tests, notation for standard errors
  • T and F relationship in simple regression
  • Estimated standardized coefficients
  • Scatterplot with regression line plus marginal info.
  • Residual plot with 'plain' residuals
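The t and F relationship in simple regression can be checked numerically. This Python sketch reuses the Bobo doll pairs from slide 18 (peer as DV). It computes the standard error of the slope and the t test of H0: slope = 0, then compares t squared with the model F. Variable names are illustrative.

```python
import math

# (bobo, peer) pairs copied from the SAS datalines on slide 18.
data = [(1, 4), (0, 1), (2, 1), (1, 2), (18, 59),
        (1, 2), (2, 3), (22, 38), (0, 2), (2, 4)]
x = [b for b, _ in data]
y = [p for _, p in data]
n = len(data)

mean_x, mean_y = sum(x) / n, sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

b1 = sxy / sxx                     # estimated slope
b0 = mean_y - b1 * mean_x          # estimated intercept
ss_error = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
mse = ss_error / (n - 2)

se_b1 = math.sqrt(mse / sxx)       # standard error of the slope
t_b1 = b1 / se_b1                  # t test of H0: beta1 = 0, df = n - 2
f_stat = (b1 ** 2 * sxx) / mse     # model mean square / MSE, df = (1, n - 2)

print(round(t_b1, 3), round(t_b1 ** 2, 2), round(f_stat, 2))
```

With a single predictor the two tests address the same hypothesis, so the squared t and the F agree up to rounding.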

21
What are residuals?
  • How can we get them (compute)? SAS OUTPUT statement
  • OUT=SAS-data-set gives the name of the new data
    set. By default, the procedure uses the DATAn
    convention to name the new data set. In the
    output data set, the first variable listed after
    a keyword in the OUTPUT statement contains that
    statistic for the first dependent variable listed
    in the MODEL statement; the second variable
    contains the statistic for the second dependent
    variable in the MODEL statement, and so on. The
    list of variables following the equal sign can be
    shorter than the list of dependent variables in
    the MODEL statement. In this case, the procedure
    creates the new names in order of the dependent
    variables in the MODEL statement. For example,
    the SAS statements
  • proc reg data=a;
  • model y z=x1 x2;
  • output out=b
  • p=yhat zhat
  • r=yresid zresid;
  • run;
  • create an output data set named b. In addition to
    the variables in the input data set, b contains
    the following variables:
  • yhat, with values that are predicted values of
    the dependent variable y
  • zhat, with values that are predicted values of
    the dependent variable z
  • yresid, with values that are the residual values
    of y
  • zresid, with values that are the residual values
    of z
  • You can specify the following keywords in the
    OUTPUT statement. See the "Model Fit and
    Diagnostic Statistics" section for computational
    formulas.
  • Table 61.3: Keywords for OUTPUT Statement

22
In SPSS
  • SPSS syntax
  • What do residuals represent?
  • In path diagrams
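As a language-neutral illustration of what saved residuals are, the Python sketch below computes predicted values and residuals for the Bobo doll pairs from slide 18. The names y_hat and resid are illustrative and loosely mirror the idea of the P= and R= keywords. Each residual is the observed score minus the fitted score. With an intercept in the model, the residuals sum to zero and are uncorrelated with the predictor.

```python
# (bobo, peer) pairs copied from the SAS datalines on slide 18.
data = [(1, 4), (0, 1), (2, 1), (1, 2), (18, 59),
        (1, 2), (2, 3), (22, 38), (0, 2), (2, 4)]
x = [b for b, _ in data]
y = [p for _, p in data]
n = len(data)

mean_x, mean_y = sum(x) / n, sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = mean_y - b1 * mean_x

y_hat = [b0 + b1 * xi for xi in x]               # predicted values
resid = [yi - yh for yi, yh in zip(y, y_hat)]    # residuals: Y minus Y-hat

for xi, yi, yh, e in zip(x, y, y_hat, resid):
    print(f"bobo={xi:2d} peer={yi:2d} yhat={yh:6.2f} resid={e:7.2f}")

sum_resid = sum(resid)                              # ~0 by construction
sum_x_resid = sum(xi * e for xi, e in zip(x, resid))  # ~0: uncorrelated with X
print(round(sum_resid, 8), round(sum_x_resid, 8))
```

The two cases with the largest absolute residuals are the two high scorers singled out in the slide 17 assignment, which is one reason to inspect the residual plot both with and without them.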

23
Multiple Regression
  • Find a linear combination of two or more
    independent variables (predictors, explanatory
    variables) that has the highest correlation with
    some DV or criterion variable.
  • All variables are assumed quantitative (for now).
  • $Y = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_k X_k + e$
  • The $b$s are regression coefficients
  • $X$ is the value of some IV
  • $e$ is a residual: the difference between $Y$ and $\hat{Y}$

24
The Best Model
  • Find the values of the $b$s that minimize the
    sum of the squared residuals.
  • Model of Observed Score, $Y$
  • $Y = \hat{Y} + e$
  • Where $\hat{Y}$ is the model fitted value, and
  • $\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_k X_k$
  • Data = Model + Residual
  • Minimize the residual
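One way to find the coefficients that minimize the squared residuals is to solve the normal equations, X'X b = X'y. The Python sketch below does this with a small Gauss-Jordan solver written only for illustration. The data are hypothetical and built so that Y = 1 + 2*X1 + 3*X2 exactly, which makes the recovered coefficients easy to check. It is not how SPSS or SAS compute the estimates internally.

```python
def solve(a, b):
    """Solve a * coef = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit(xs, y):
    """Least squares [b0, b1, ..., bk] from the normal equations X'X b = X'y."""
    rows = [[1.0] + list(r) for r in xs]             # leading 1 carries b0
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
           for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)

# Hypothetical data: Y is an exact linear combination, so residuals are zero.
xs = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in xs]

b = fit(xs, y)
fitted = [b[0] + b[1] * x1 + b[2] * x2 for x1, x2 in xs]   # Model
resid = [yi - fi for yi, fi in zip(y, fitted)]             # Data - Model
print([round(v, 6) for v in b])
```

With real data the residuals would not vanish; the same code would then return the coefficients that make their sum of squares as small as possible.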

25
Uses of MR
  • Predict the DV from multiple IVs
  • Although we use prediction language,
    social-behavioral sciences usually don't do much
    sheer prediction.
  • Strength of association between DV and set of
    IVs.
  • Explanation: measure correlation between one
    variable and a set of other variables; use
    "variance explained" language.
  • Is the prediction or explanation statistically
    significant?
  • Is some variance accounted for? How much? (Effect
    size)

26
Assumptions Lite
  • Linear association between the variables in the
    linear combination and the DV
  • Examine scatterplots between each IV and DV
  • Scatterplot matrix
  • Values of the IVs are measured without error!
  • Residuals ($e$s) are normally distributed and all
    independent and have equal variances along all
    values of the variate.
  • Normality and HOV of arrays (previous diagram)

27
Choosing Predictors (Model Identification)
  • Your theory tells you: a priori selection of IVs
  • Sequential (hierarchical) entry of IVs
  • Sequence is theory-driven
  • Allows evaluation of explained variance over and
    above variables already in the model
  • Example from Green
  • Stepwise (for nonthinkers)
  • Program chooses best predictors
  • Don't do (or ever tell)
  • Stepwise is unwise: a fishing expedition
  • All possible subsets (best 2 predictors, best 3,
    ...)
  • For applications in sheer prediction where we
    don't care what the predictors are.

28
Interpretation Issues
  • Did I leave something out that's really
    important? (Or include junk predictors?)
  • Specification Error
  • How would you know?
  • Will the TV work outside the store?
  • Issue of generalizability beyond the present
    sample.
  • Naturally, your model works well with your data,
    on which the model was derived
  • What about new data?
  • Is the $R^2$ going to hold up with new data?
    Coefficients still good?
  • All models work well on their data.
  • Stepwise and all possible subsets methods
    capitalize on chance.
  • Models need to be Validated (somehow)

29
Validation Procedures
  • Cross-validation with new data
  • Resampling methods (new, but not really)
  • Calculating shrinkage of $R^2$
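One common way to estimate shrinkage is the usual adjusted R-squared formula, 1 - (1 - R²)(n - 1)/(n - k - 1). The Python sketch below assumes that formula is what is meant here; other shrinkage estimates exist. The sample R-squared, sample sizes, and number of predictors are hypothetical.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for sample R-squared r2, n cases, k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical values: the same sample R-squared at two sample sizes.
r2, k = 0.50, 5
small_n = adjusted_r2(r2, n=30, k=k)
large_n = adjusted_r2(r2, n=300, k=k)

# Shrinkage is the drop from the sample value to the adjusted value.
print(f"n=30:  adjusted = {small_n:.3f}, shrinkage = {r2 - small_n:.3f}")
print(f"n=300: adjusted = {large_n:.3f}, shrinkage = {r2 - large_n:.3f}")
```

The pattern to notice is that expected shrinkage grows as the number of predictors gets large relative to the number of cases.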

30
Green - Issues and Examples
  • Research Questions
  • 1. How accurately can a physical injury index be
    predicted from a linear combination of strength
    measures for elderly women?
  • Really explanation (accounting for variance)
    rather than actual prediction. Typical language.
  • Accounting for variance in what?
  • Accounting for variance with what?
  • Model is Injury = Quads Gluts Abdoms Arms Grip
  • All predictors put in the model
  • Is there significant variance explained? How
    much? (effect size)
  • What predictors are doing the work (effective,
    important)?
  • J:\PSYCH\MARTY\4123\GreenSalkin5EdDATA\GreenSalkind\GreenSalkind\Lesson 34\Lesson 34 Data File 1.sav

31
  • GET
  • FILE='J:\PSYCH\MARTY\4123\GreenSalkin5EdDATA\GreenSalkind\GreenSalkind\L'+
  • 'esson 34\Lesson 34 Data File 1.sav'.
  • DATASET NAME DataSet1 WINDOW=FRONT.
  • REGRESSION
  • /DESCRIPTIVES MEAN STDDEV CORR SIG N
  • /MISSING LISTWISE
  • /STATISTICS COEFF OUTS CI R ANOVA COLLIN TOL
    ZPP
  • /CRITERIA=PIN(.05) POUT(.10)
  • /NOORIGIN
  • /DEPENDENT injury
  • /METHOD=ENTER quads gluts abdoms arms grip
  • /SCATTERPLOT=(*SDRESID ,*ZPRED ) .
  • Highlights of syntax
  • Examination of output
  • Correlation matrix: promise and problems?
  • If the predictors have explanatory power, we
    expect some correlations with the DV. Have?
  • THE BIG PROBLEM IN MULTIPLE REGRESSION:
  • PREDICTORS ARE USUALLY CORRELATED

32
Checking Results
  • Assessment of correlation matrix
  • Significant relationship and effect size?
  • Write the regression equation and standardized
    equation
  • Path model (shows correlations of predictors)
  • Relative importance of Predictors
  • Which predictors are statistically significant?
  • Examine standardized coefficients (beta weights),
    partial and part correlations.
  • Partial and part correlation
  • Partial correlation is the correlation between
    the DV and a particular IV controlling for all
    the other predictors on both the DV and the
    particular IV
  • Part correlation is the correlation between the
    DV and a particular IV controlling for all other
    predictors on just the particular IV
  • Part correlation is actually the best because
    it reflects the unique contribution of a
    particular predictor.
  • Standardized coefficients, partial and part
    correlations will usually indicate the same thing
    as to relative importance.
  • The squared part correlation is a good effect
    size measure for each predictor
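For the two-predictor case, the part and partial correlations defined above can be written directly in terms of the three pairwise correlations. The Python sketch below uses hypothetical correlations (not values from the injury data). It also checks that the squared part correlation equals the gain in R-squared from adding that predictor, which is why it works as a per-predictor effect size.

```python
import math

def part_and_partial(r_y1, r_y2, r_12):
    """Part and partial correlation of Y with X1, controlling for X2."""
    unique = r_y1 - r_y2 * r_12
    # Part: X2 is removed from X1 only.
    part = unique / math.sqrt(1 - r_12 ** 2)
    # Partial: X2 is removed from both Y and X1.
    partial = unique / math.sqrt((1 - r_y2 ** 2) * (1 - r_12 ** 2))
    return part, partial

# Hypothetical correlations, chosen only for illustration.
r_y1, r_y2, r_12 = 0.50, 0.40, 0.30
part, partial = part_and_partial(r_y1, r_y2, r_12)

# R-squared for Y on both predictors, from the same three correlations.
r2_full = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
r2_gain = r2_full - r_y2 ** 2      # increase from adding X1 after X2

print(round(part, 4), round(partial, 4),
      round(part ** 2, 4), round(r2_gain, 4))
```

The partial correlation is never smaller in absolute value than the part correlation, because its denominator also removes the variance in Y explained by the other predictor.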

33
  • Multicollinearity: problem of correlated
    predictors
  • Problem if tolerance is < .10 (Mertler, very
    liberal)
  • Leech, Barrett, Morgan (2008) suggest < 1 - $R^2$
  • Check residual plot for patterns indicating
    assumption violation (what assumptions?)
  • Patterns in the residual plot could indicate
  • Nonnormality of residuals
  • Residuals correlated with the IVs (via Y-hat)
  • Variances of residuals are not constant across
    values of Y-hat (or Y)
  • Normality of residuals? Histogram, normal plot
  • Are we done yet?
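Tolerance for a predictor is 1 minus the R-squared from regressing that predictor on all the other predictors; its reciprocal is the variance inflation factor (VIF). The Python sketch below applies the two cutoffs mentioned on this slide. It assumes the R² in the Leech, Barrett, and Morgan rule is the overall model R-squared, which is my reading of the slide. All numbers are hypothetical.

```python
def tolerance(r2_j):
    """Tolerance: 1 - R-squared of predictor j regressed on the others."""
    return 1 - r2_j

def vif(r2_j):
    """Variance inflation factor, the reciprocal of tolerance."""
    return 1 / tolerance(r2_j)

def flags(r2_j, model_r2):
    """Apply both cutoffs from the slide to one predictor."""
    tol = tolerance(r2_j)
    return {
        "tolerance": tol,
        "below_.10": tol < 0.10,                  # liberal cutoff
        "below_1_minus_R2": tol < 1 - model_r2,   # assumed: overall model R2
    }

# Hypothetical predictors in a model with overall R-squared of .50.
strongly_redundant = flags(r2_j=0.92, model_r2=0.50)
moderately_redundant = flags(r2_j=0.60, model_r2=0.50)

print(strongly_redundant, round(vif(0.92), 2))
print(moderately_redundant, round(vif(0.60), 2))
```

In this made-up case the second predictor passes the .10 cutoff but is still flagged by the stricter rule, which illustrates why the slide calls the .10 criterion very liberal.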

34
Other Types of Research Questions
  • 2. Unordered sets of predictors
  • How well do the lower-body strength measures
    predict the total injury index for elderly women?
  • How well do the upper-body strength measures
    predict the total injury index?
  • How well do the lower-body measures predict over
    and above the upper body measures and vice
    versa?
  • Because predictors are correlated, contributions
    depend on what predictors are already in the
    model! Plus we have no theory to guide us as to
    which to enter first.
  • Run multiple models
  • Lower and upper separately
  • Lower then upper and look at $R^2$ change
  • Upper then lower and look at $R^2$ change
  • Discuss incremental contributions in Results
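The change in R-squared between two nested models is usually tested with an F ratio. The Python sketch below shows that computation for the "lower then upper" order, taking quads and gluts as the two lower-body predictors and adding the three upper-body ones. The sample size and both R-squared values are hypothetical, not results from the Lesson 34 data.

```python
def r2_change_f(r2_reduced, r2_full, n, k_reduced, k_full):
    """F test for the R-squared increase when predictors are added.

    Returns (F, df1, df2) where df1 is the number of added predictors
    and df2 is the residual df of the full model.
    """
    df1 = k_full - k_reduced
    df2 = n - k_full - 1
    f = ((r2_full - r2_reduced) / df1) / ((1 - r2_full) / df2)
    return f, df1, df2

# Hypothetical: n = 100, lower-body set alone gives R2 = .20,
# adding the three upper-body measures raises it to .30.
f, df1, df2 = r2_change_f(r2_reduced=0.20, r2_full=0.30,
                          n=100, k_reduced=2, k_full=5)
print(f"R2 change = {0.30 - 0.20:.2f}, F({df1}, {df2}) = {f:.2f}")
```

Running the same function with the upper-body set entered first gives the increment for the lower-body set; because the predictors are correlated, the two increments generally do not add up to the full-model R-squared.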

35
  • 3. Ordered sets of predictors: the order is based
    on some theory, past research, or informed
    judgement
  • How well do previous medical difficulties and age
    predict total injuries for elderly women?
  • How well do the strength measures predict total
    injuries controlling for previous medical
    difficulties and age? -- !!
  • Run the control model, then add strength measures
    and look at change in $R^2$
  • Examine Green's APA Results sections carefully
    and Tips for MR

36
Assignment
  • GS Exercises 1-5. Include a residual plot with
    your Results sections.