MARE 250 - PowerPoint PPT Presentation

Provided by: drjason1

Transcript and Presenter's Notes



1
Multiple Regression
MARE 250 Dr. Jason Turner
2
Linear Regression
$y = b_0 + b_1 x$
where $y$ is the dependent variable, $b_0$ and $b_1$ are constants ($b_0$ = y-intercept, $b_1$ = slope), and $x$ is the independent variable.
Example: $\text{Urchin density} = b_0 + b_1(\text{salinity})$
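The constants $b_0$ and $b_1$ are usually estimated by ordinary least squares; the slide does not name the fitting method, so treat that as an assumption. A minimal Python sketch of the closed-form estimates (the function name `fit_simple_linear` is hypothetical):

```python
def fit_simple_linear(x, y):
    """Ordinary least-squares estimates (b0, b1) for y = b0 + b1*x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through the point (x_bar, y_bar).
    b0 = y_bar - b1 * x_bar
    return b0, b1
```

For the urchin example, `x` would hold the salinity measurements and `y` the matching urchin densities.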
3
Multiple Regression
Multiple regression allows us to learn more about the relationship between several independent (predictor) variables and a dependent (criterion) variable. For example, we might be looking for a reliable way to estimate the age of AHI at the dock instead of waiting for laboratory analyses.
Simple: $y = b_0 + b_1 x$
Multiple: $y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n$
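Computing $b_0, \dots, b_n$ for several predictors is the same least-squares problem with more columns. A minimal sketch using NumPy (an assumption; the slides do not name any software), with one column of `X` per predictor, e.g. SL, BM, OP, PF in the AHI example. The function name is hypothetical:

```python
import numpy as np

def fit_multiple_regression(X, y):
    """Least-squares coefficients [b0, b1, ..., bn] for y = b0 + b1*x1 + ... + bn*xn.

    X: array of shape (N, n), one column per predictor; y: array of shape (N,).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept b0.
    A = np.column_stack([np.ones(len(y)), X])
    # lstsq minimizes the sum of squared residuals ||y - A @ b||^2.
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    return b
```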
4
Multiple Regression
In the social and natural sciences, multiple regression procedures are very widely used in research. Multiple regression allows the researcher to ask, "What is the best predictor of ...?" For example, educational researchers might want to learn what the best predictors of success in high school are. Psychologists may want to determine which personality variable best predicts social adjustment. Sociologists may want to find out which of multiple social indicators best predict whether or not a new immigrant group will adapt and be absorbed into society.
5
Multiple Regression
The general computational problem that needs to be solved in multiple regression analysis is to fit a straight line to a number of points.
In the simplest case (one dependent and one independent variable), this can be visualized in a scatterplot.
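The slides do not spell out what "fit" means; the usual criterion, assumed here, is ordinary least squares, which chooses the coefficients that minimize the sum of squared vertical distances between the points and the fitted line (or plane). With $N$ observations and predictors $x_1, \dots, x_n$:

```latex
\min_{b_0,\dots,b_n} \; \sum_{i=1}^{N} \bigl( y_i - b_0 - b_1 x_{i1} - \dots - b_n x_{in} \bigr)^2
\quad\Longrightarrow\quad
\mathbf{X}^{\top}\mathbf{X}\,\mathbf{b} = \mathbf{X}^{\top}\mathbf{y}
```

Here $\mathbf{X}$ is the $N \times (n+1)$ matrix whose first column is all ones and whose remaining columns hold the predictor values, and $\mathbf{b} = (b_0, \dots, b_n)^{\top}$. In the simplest case ($n = 1$) the solution reduces to $b_1 = \sum_i (x_i - \bar{x})(y_i - \bar{y}) / \sum_i (x_i - \bar{x})^2$ and $b_0 = \bar{y} - b_1 \bar{x}$.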
6
The Regression Equation
A line in a two-dimensional (two-variable) space is defined by the equation $Y = a + bX$.
[Animation: a two-dimensional regression equation plotted with three different confidence intervals (90%, 95%, and 99%)]
In the multivariate case, when there is more than one independent variable, the regression line cannot be visualized in two-dimensional space, but it can be computed rather easily.
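Confidence bands like those in the animation are usually confidence intervals for the mean response at each X, built from the standard error of the fitted value. The sketch below uses the standard formula for simple linear regression; the function name and the `t_crit` parameter are illustrative assumptions. `t_crit` is the two-sided t critical value with N - 2 degrees of freedom (for example `scipy.stats.t.ppf(0.975, n - 2)` for 95%), so a higher confidence level gives a wider band.

```python
import math

def mean_response_ci(x, y, x0, t_crit):
    """Confidence interval for the mean of Y at X = x0, fitting Y = a + bX."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx              # slope
    a = y_bar - b * x_bar      # intercept
    # Residual standard deviation s, with n - 2 degrees of freedom.
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    # Standard error of the fitted mean; smallest at x0 = x_bar.
    se_fit = s * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)
    y_hat = a + b * x0
    return y_hat - t_crit * se_fit, y_hat + t_crit * se_fit
```

Evaluating it over a grid of `x0` values with the 90%, 95%, and 99% critical values traces out three nested bands that widen away from the mean of X.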
7
Residual Variance and R-square
The smaller the variability of the residual values around the regression line relative to the overall variability, the better our prediction.
Coefficient of determination ($R^2$): if we have an $R^2$ of 0.4, we have explained 40% of the original variability and are left with 60% residual variability. Ideally, we would like to explain most, if not all, of the original variability.
Therefore, the $R^2$ value is an indicator of how well the model fits the data (e.g., an $R^2$ close to 1.0 indicates that we have accounted for almost all of the variability with the variables specified in the model).
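The "explained versus left-over variability" idea is the ratio of two sums of squares; an $R^2$ of 0.4 means the residual sum of squares is 60% of the total sum of squares. A minimal sketch (function name hypothetical), valid for least-squares fits that include an intercept:

```python
def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    y_bar = sum(y) / len(y)
    ss_total = sum((yi - y_bar) ** 2 for yi in y)               # overall variability
    ss_resid = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # residual (left-over) variability
    return 1 - ss_resid / ss_total
```

For example, `r_squared([1, 2, 3, 4, 5], [1.5, 2, 3, 4, 4.5])` gives 0.95: the residual sum of squares (0.5) is 5% of the total (10).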
8
Assumptions, Assumptions
Assumption of linearity: it is assumed that the relationship between variables is linear. Always look at a bivariate scatterplot of the variables of interest.
Normality assumption: it is assumed in multiple regression that the residuals (observed minus predicted values) are normally distributed (i.e., follow the normal distribution). Most tests (specifically the F-test) are quite robust with regard to violations of this assumption. Review the distributions of the major variables with histograms.
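A quick numeric companion to the histogram check: compute the residuals, their skewness (about 0 for a symmetric distribution such as the normal), and equal-width histogram counts. This is a rough screening sketch, not a formal normality test, and the function name is hypothetical:

```python
import statistics

def residual_summary(observed, predicted, bins=5):
    """Residuals (observed - predicted), their moment skewness, and histogram counts."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    mean = statistics.fmean(residuals)
    sd = statistics.pstdev(residuals)
    # Moment skewness: near 0 for symmetric residuals, positive for a long right tail.
    skew = 0.0 if sd == 0 else sum(((r - mean) / sd) ** 3 for r in residuals) / len(residuals)
    # Equal-width bins spanning the observed range of the residuals.
    lo, hi = min(residuals), max(residuals)
    width = (hi - lo) / bins or 1.0
    counts = [0] * bins
    for r in residuals:
        counts[min(int((r - lo) / width), bins - 1)] += 1
    return residuals, skew, counts
```

A crude text histogram is then `for c in counts: print("#" * c)`.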
9
Effects of Outliers
Outliers may be influential observations.
Influential observation: a data point whose removal causes the regression equation (line) to change considerably.
Consider removing it, much as you would an outlier; if there is no explanation for it, the decision is up to the researcher.
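The definition above can be checked directly in the two-variable case: refit the line without each point in turn and record how much the slope moves. This is a hand-rolled illustration with hypothetical function names; standard influence measures such as Cook's distance formalize the same idea for multiple regression.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def slope_change_without_each_point(x, y):
    """For each point i, the absolute change in slope when point i is left out."""
    _, full_slope = fit_line(x, y)
    changes = []
    for i in range(len(x)):
        # Refit on all points except i (x and y are lists).
        _, slope_i = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        changes.append(abs(slope_i - full_slope))
    return changes
```

A point whose removal produces a much larger change than the others is a candidate influential observation.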

10
Stepwise Regression: When is too much too much?
  • Building Models via Stepwise Regression
  • Stepwise model-building techniques for regression
  • The basic procedures involve:
      • identifying an initial model;
      • iteratively "stepping," that is, repeatedly
        altering the model at the previous step by adding
        or removing a predictor variable in accordance
        with the "stepping criteria"; and
      • terminating the search when stepping is no
        longer possible given the stepping criteria.
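The three steps above can be sketched as a loop. Minitab-style stepwise regression compares p-values with Alpha-to-Enter and Alpha-to-Remove; to stay dependency-light, this sketch instead compares each predictor's partial F statistic (the square of its t statistic) with fixed F-to-enter / F-to-remove thresholds, the classic F-based form of the same criterion. The defaults 4.0 and 3.9 are assumptions, not values from the slides, and the function names are hypothetical. Keeping F-to-remove below F-to-enter stops a variable from being added and removed forever.

```python
import numpy as np

def _t_stats(X, y, cols):
    """OLS fit of y on an intercept plus columns `cols` of X; returns their t statistics."""
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ b
    s2 = resid @ resid / (len(y) - A.shape[1])          # residual mean square
    se = np.sqrt(s2 * np.diag(np.linalg.inv(A.T @ A)))  # coefficient standard errors
    return (b / se)[1:]                                  # drop the intercept

def stepwise(X, y, f_enter=4.0, f_remove=3.9, max_steps=50):
    """Forward-backward stepwise selection; returns indices of the selected columns."""
    selected = []  # initial model: intercept only
    for _ in range(max_steps):
        changed = False
        # Step forward: add the candidate with the largest partial F, if it beats f_enter.
        best, best_f = None, f_enter
        for c in range(X.shape[1]):
            if c not in selected:
                f = _t_stats(X, y, selected + [c])[-1] ** 2
                if f > best_f:
                    best, best_f = c, f
        if best is not None:
            selected.append(best)
            changed = True
        # Step backward: drop the weakest predictor if its partial F is below f_remove.
        if selected:
            f_vals = _t_stats(X, y, selected) ** 2
            worst = int(np.argmin(f_vals))
            if f_vals[worst] < f_remove:
                selected.pop(worst)
                changed = True
        # Terminate when no predictor can be added or removed.
        if not changed:
            break
    return selected
```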

11
For Example
We are interested in predicting values for Y based upon several Xs: the age of AHI based upon SL, BM, OP, and PF. We run a multiple regression and get the equation
$\text{Age} = -2.64 + 0.0382\,\text{SL} + 0.209\,\text{BM} + 0.136\,\text{OP} + 0.467\,\text{PF}$
We then run a STEPWISE regression to determine the best subset of these variables.
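Using the fitted equation is plain arithmetic. The input values below are hypothetical (the slides give neither units nor the range of the data), and predictions for measurements outside the range used to fit the model are extrapolations to be treated with caution.

```python
def predict_age(sl, bm, op, pf):
    """Predicted age from the full-model equation on this slide."""
    return -2.64 + 0.0382 * sl + 0.209 * bm + 0.136 * op + 0.467 * pf
```

For example, `predict_age(100, 10, 5, 3)` gives -2.64 + 3.82 + 2.09 + 0.68 + 1.401 = 5.351.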
12
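How does it work?
Response is Age

Vars  R-Sq  R-Sq(adj)  Mallows C-p        S  Predictors marked X
   1  77.7       77.4          8.0  0.96215  BM
   1  60.3       59.8         76.6  1.2839   (one predictor; marker position lost)
   2  78.9       78.3          5.4  0.94256  BM, OP
   2  78.6       78.0          6.6  0.94962  (two predictors; marker positions lost)
   3  79.8       79.1          3.6  0.92641  SL, BM, OP
   3  79.1       78.3          6.5  0.94353  (three predictors; marker positions lost)
   4  80.0       79.0          5.0  0.92897  SL, BM, OP, PF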
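A table like this comes from fitting every combination of predictors and summarizing each fit by R-Sq, R-Sq(adj), Mallows C-p, and S (the residual standard deviation). A minimal NumPy sketch, assuming ordinary least squares and Mallows $C_p = \mathrm{SSE}_p / \mathrm{MSE}_{\text{full}} - (N - 2p)$ with $p$ counting the intercept; it lists every subset rather than only the best two per size, and the function names are hypothetical. For the full model this formula always gives $C_p = p$, which matches the last row above (four predictors plus intercept, C-p = 5.0).

```python
import itertools
import numpy as np

def _sse(X, y, cols):
    """Residual sum of squares and parameter count for y ~ intercept + X[:, cols]."""
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ b
    return float(resid @ resid), A.shape[1]

def best_subsets(X, y, names):
    """One row per subset: (vars, R-Sq %, R-Sq(adj) %, Mallows C-p, S, predictor names)."""
    n, k = X.shape
    sst = float(((y - y.mean()) ** 2).sum())
    sse_full, p_full = _sse(X, y, list(range(k)))
    mse_full = sse_full / (n - p_full)  # error variance estimated from the full model
    rows = []
    for size in range(1, k + 1):
        for cols in itertools.combinations(range(k), size):
            sse, p = _sse(X, y, list(cols))
            r2 = 1 - sse / sst
            r2_adj = 1 - (sse / (n - p)) / (sst / (n - 1))
            cp = sse / mse_full - (n - 2 * p)
            s = (sse / (n - p)) ** 0.5
            rows.append((size, 100 * r2, 100 * r2_adj, cp, s, [names[c] for c in cols]))
    # Within each subset size, list the highest R-Sq first.
    rows.sort(key=lambda r: (r[0], -r[1]))
    return rows
```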
13
How does it work?
Stepwise Regression: Age versus SL, BM, OP, PF
Alpha-to-Enter: 0.15   Alpha-to-Remove: 0.15
Response is Age on 4 predictors, with N = 84

Step              1         2         3
Constant    -0.8013   -1.1103   -5.4795

BM            0.355     0.326     0.267
T-Value       16.91     13.17      6.91
P-Value       0.000     0.000     0.000

OP                      0.096     0.101
T-Value                  2.11      2.26
P-Value                 0.038     0.027

SL                                0.087
T-Value                            1.96
P-Value                           0.053

S             0.962     0.943     0.926
R-Sq          77.71     78.87     79.84
R-Sq(adj)     77.44     78.35     79.08
Mallows C-p     8.0       5.4       3.6
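As a consistency check, the R-Sq(adj) row can be recomputed from R-Sq, N = 84, and the number of predictors in the model at each step, using the standard formula. Adjusted R-Sq penalizes each added predictor, so unlike R-Sq it can fall when a weak predictor enters; that is why it is preferred when comparing models of different sizes. The function name is hypothetical.

```python
def adjusted_r_squared(r_sq_percent, n, k):
    """Adjusted R-Sq (%) from R-Sq (%), sample size n, and number of predictors k."""
    r2 = r_sq_percent / 100
    return 100 * (1 - (1 - r2) * (n - 1) / (n - k - 1))

# R-Sq and number of predictors at steps 1-3 of the stepwise output (N = 84).
for step, (r_sq, k) in enumerate([(77.71, 1), (78.87, 2), (79.84, 3)], start=1):
    print(step, round(adjusted_r_squared(r_sq, 84, k), 2))
# 1 77.44
# 2 78.35
# 3 79.08
```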
14
Who Cares?
Stepwise analysis allows you (i.e., the computer) to determine which predictor variables (or combination of them) best explain Y (i.e., can be used to predict it). This becomes much more important as the number of predictor variables increases, and it helps to make better sense of complicated multivariate data.