Title: Prediction
1. Prediction
- Confidence Intervals, Cross-validation, and Predictor Selection
2. Skill Set
- Why is the confidence interval for an individual point larger than for the regression line?
- Describe the steps in forward (backward, stepwise, blockwise, all possible regressions) predictor selection.
- What is cross-validation? Why is it important?
- What are the main problems as far as R-square and prediction are concerned with forward (backward, stepwise, blockwise, all possible regressions) predictor selection?
3. Prediction vs. Explanation
- Prediction is important for practice
  - WWII pilot training: ability tests (e.g., eye-hand coordination) and items such as "built an airplane that flew," "fear of heights," and "favorite flavor of ice cream"
  - Age and driving accidents
- Explanation is crucial for theory. Highly correlated variables may not help predict, but may help explain, e.g., team outcomes as a function of team resources and team backup.
4. Confidence Intervals
CI for the line, i.e., the mean score at a given value X':
  Y' ± t(α/2, N−k−1) · √{ MSres [ 1/N + (X' − X̄)² / Σ(X − X̄)² ] }
Note the shape: the band is narrowest at X̄ and curves away from the line as X' moves from the mean.
MSres is the variance of the residuals; N is the sample size. The df are those for MSres (N − k − 1).
CI for a single person's score:
  Y' ± t(α/2, N−k−1) · √{ MSres [ 1 + 1/N + (X' − X̄)² / Σ(X − X̄)² ] }
The extra 1 under the radical reflects the variance of individual scores about the line.
5. Computing Confidence Intervals
Suppose we have a sample with N = 20 and k = 1 predictor. Find the CI for the line (mean) at X1.
df = N − k − 1 = 20 − 1 − 1 = 18.
CI: 3.81 to 7.79.
For an individual at X1, what is the CI?
CI: 0.29 to 11.31.
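As a minimal sketch of how both intervals are computed: the values below (Y' = 5.8, MSres = 6, X̄ = 3, Σ(X − X̄)² = 40) are assumed for illustration, not taken from the original example; they approximately reproduce the intervals above.

```python
import numpy as np
from scipy import stats

# Illustrative values only (assumed, not the slide's original data):
y_hat = 5.80    # predicted score Y' at X1
ms_res = 6.0    # mean square residual (variance of residuals)
n, k = 20, 1    # sample size and number of predictors
x1, x_bar, ss_x = 1.0, 3.0, 40.0   # X1, mean of X, sum of squares of X

t = stats.t.ppf(0.975, n - k - 1)  # t for a 95% CI, df = 18

# CI for the line (mean score) at X1
se_mean = np.sqrt(ms_res * (1 / n + (x1 - x_bar) ** 2 / ss_x))
print(f"mean:       {y_hat - t * se_mean:.2f} to {y_hat + t * se_mean:.2f}")

# CI for a single person's score at X1 (note the extra 1)
se_ind = np.sqrt(ms_res * (1 + 1 / n + (x1 - x_bar) ** 2 / ss_x))
print(f"individual: {y_hat - t * se_ind:.2f} to {y_hat + t * se_ind:.2f}")
```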
6. Review
- Why is the confidence interval for the individual wider than a similar interval for the regression line?
- Why are the confidence intervals for the regression line curved instead of being straight lines?
7. Shrinkage
R² is biased (the sample value is too large) because the regression capitalizes on chance to minimize SSe in the sample.
If the population value of R² is zero, the expected value in the sample is E(R²) = k/(N − 1), where k is the number of predictors and N is the number of people in the sample. If you have many predictors, you can make R² as large as you want.
What is the expected value of R-square if N = 101 and k = 10? (Here, 10/100 = .10.) There is an ethical issue here.
A common adjustment or shrinkage formula:
  Adj R² = 1 − (1 − R²)(N − 1)/(N − k − 1)
This is reported by SAS (PROC REG) under "Adj R-Sq." It adjusts for both k and N and the size of the initial R².
8. Shrinkage Examples
Suppose R² = .6405 with k = 4 predictors and a sample size of 30. Then Adj R² = .583. Varying N (with k = 4):

R² = .6405:
  N     Adj R²
  15    .497
  30    .583
  100   .625

R² = .30:
  N     Adj R²
  15    .020
  30    .188
  100   .271

Note: small N means lots of shrinkage, but a smaller initial R² also shrinks more.
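The table above follows directly from the adjustment formula; a minimal sketch:

```python
def adjusted_r2(r2, n, k):
    """Shrinkage adjustment reported by SAS PROC REG as Adj R-Sq."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Reproduce the table above (k = 4 predictors throughout)
for r2 in (.6405, .30):
    for n in (15, 30, 100):
        print(f"R2 = {r2:.4f}, N = {n:3d} -> Adj R2 = {adjusted_r2(r2, n, 4):.3f}")
```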
9. Cross-Validation
- Compute a and b(s) (can have one or more IVs) on an initial sample.
- Find a new sample; do not re-estimate a and b, but use the original a and b to find Y' (the predicted scores).
- Compute the correlation between Y and Y' in the new sample and square it. Ta da! Cross-validation R².
- Cross-validation R² does not capitalize on chance and estimates the operational R² (see the sketch below).
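A minimal sketch of these steps, using simulated data in place of the two real samples (the data, sample sizes, and weights below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def make_sample(n):
    """Simulated (X, y) pairs standing in for a real sample."""
    X = rng.normal(size=(n, 2))                       # two IVs
    y = X @ np.array([0.5, 0.3]) + rng.normal(size=n)
    return X, y

X1, y1 = make_sample(100)   # derivation sample
X2, y2 = make_sample(100)   # cross-validation sample

# Step 1: estimate a and b(s) on the initial sample only
D1 = np.column_stack([np.ones(len(y1)), X1])
coef, *_ = np.linalg.lstsq(D1, y1, rcond=None)

# Step 2: apply those same weights to the new sample (no re-estimation)
D2 = np.column_stack([np.ones(len(y2)), X2])
y_prime = D2 @ coef

# Step 3: correlate Y with Y' in the new sample and square it
r = np.corrcoef(y2, y_prime)[0, 1]
print(f"cross-validation R2 = {r ** 2:.3f}")
```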
10. Cross-validation (2)
- Double cross-validation
- Data splitting
- Expert judgment weights (don't try this at home)
- Mathematical estimates
  - Fixed
  - Random
11. Review
- What is shrinkage in the context of multiple regression? What are the things that affect the expected amount of shrinkage?
- What is cross-validation? Why is it important?
12. Predictor Selection
- Widely misunderstood and widely misused.
- Algorithms labeled forward, backward, stepwise, etc.
- NEVER use them for work involving theory or explanation (hint: this clearly means your thesis and dissertation).
- NEVER use them for estimating the importance of variables.
- Use them SOLELY for economy (tossing out predictors).
13. All Possible Regressions
Data from a Pedhazur example. GPA (the criterion) is grade point average. GREQ is the Graduate Record Exam, Quantitative. GREV is GRE Verbal. MAT is the Miller Analogies Test. AR is an Arithmetic Reasoning test.
14. All Possible Regressions (2)
Note how easy it is to choose the model with the highest R² for any given number of predictors (a sketch follows below). In predictor selection, you also need to worry about cost: you get both V and Q GRE scores from a single test. Also consider what a change in R² means in practical terms, e.g., accuracy in predicting dropout.
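A rough sketch of the all-possible-regressions idea, with simulated stand-ins for the Pedhazur variables (the data and coefficients below are assumptions, not the original data set):

```python
from itertools import combinations
import numpy as np

rng = np.random.default_rng(1)

# Simulated stand-ins for GREQ, GREV, MAT, AR predicting GPA
n = 60
names = ["GREQ", "GREV", "MAT", "AR"]
X = rng.normal(size=(n, 4))
gpa = X @ np.array([0.4, 0.3, 0.3, 0.2]) + rng.normal(size=n)

def r_squared(cols):
    """R-square from regressing GPA on the given predictor columns."""
    D = np.column_stack([np.ones(n), X[:, list(cols)]])
    coef, *_ = np.linalg.lstsq(D, gpa, rcond=None)
    resid = gpa - D @ coef
    return 1 - resid @ resid / ((gpa - gpa.mean()) @ (gpa - gpa.mean()))

# For each model size, show the subset with the highest R-square
for size in range(1, 5):
    best = max(combinations(range(4), size), key=r_squared)
    print(size, [names[j] for j in best], f"R2 = {r_squared(best):.3f}")
```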
15. Predictor Selection Algorithms
- Forward: build up from scratch, entering variables by p value. End when no remaining variables meet PIN (the p-to-enter criterion). May include duds. (A sketch of the forward step follows this list.)
- Backward: start with all variables and pull them out by POUT (the p-to-remove criterion). May lose gems.
- Stepwise: start forward, but check backward at each step. Not guaranteed to give the best R².
- Blockwise: not used much. Forward by blocks, then any method (e.g., stepwise) within each block to choose the best predictors.
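A rough sketch of the forward algorithm with a PIN-style entry criterion (the function and threshold are illustrative, not a reproduction of SPSS or SAS output):

```python
import numpy as np
from scipy import stats

def forward_select(X, y, pin=0.05):
    """Forward selection: enter the predictor with the smallest p value
    at each step; stop when no candidate meets the PIN criterion."""
    n, p = X.shape
    chosen, remaining = [], list(range(p))
    while remaining:
        best_p, best_j = 1.0, None
        for j in remaining:
            cols = chosen + [j]
            D = np.column_stack([np.ones(n), X[:, cols]])
            coef, *_ = np.linalg.lstsq(D, y, rcond=None)
            resid = y - D @ coef
            df = n - len(cols) - 1
            mse = resid @ resid / df
            se = np.sqrt(mse * np.linalg.inv(D.T @ D)[-1, -1])
            # two-sided t test on the candidate's coefficient
            p_val = 2 * stats.t.sf(abs(coef[-1] / se), df)
            if p_val < best_p:
                best_p, best_j = p_val, j
        if best_j is None or best_p > pin:
            break  # no remaining variable meets PIN
        chosen.append(best_j)
        remaining.remove(best_j)
    return chosen
```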
16. Things to Consider in Predictor Selection
- Algorithms consider statistical significance, but you have to consider practical significance and cost; i.e., the algorithms by themselves don't work well.
- Surviving variables are often there by chance. Do the analysis again and you would choose a different set. That is OK for prediction.
- The value of correlated variables is quite different when considered in path analysis and SEM.
17. Hierarchical Regression
- An alternative to predictor selection algorithms
- Theory-based (a priori) tests of increments to R-square
18. Example of Hierarchical Regression
Does personality increase the prediction of medical school success beyond that afforded by cognitive ability? Collect data on 250 med students over their first two years.

Model 1 (cognitive ability only): R² = .10, p < .05
Model 2 (cognitive ability plus personality): R² = .13, p < .05
Test of the increment: F(2, 245) = 4.22, p < .05
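The model comparison can be reproduced from the two R² values. A minimal sketch, assuming (from the reported df) that Model 1 has two cognitive predictors and Model 2 adds two personality measures:

```python
from scipy import stats

def r2_increment_test(r2_full, r2_reduced, n, k_full, k_reduced):
    """F test for the increase in R-square when predictors are added."""
    df1 = k_full - k_reduced
    df2 = n - k_full - 1
    f = ((r2_full - r2_reduced) / df1) / ((1 - r2_full) / df2)
    return f, df1, df2, stats.f.sf(f, df1, df2)

# Slide values: N = 250; predictor counts (2 -> 4) are inferred from
# the reported df, F(2, 245)
f, df1, df2, p = r2_increment_test(.13, .10, 250, 4, 2)
print(f"F({df1},{df2}) = {f:.2f}, p = {p:.4f}")
```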
19. Review
- Describe the steps in forward (backward, stepwise, blockwise, all possible regressions) predictor selection.
- What are the main problems as far as R-square and prediction are concerned with forward (backward, stepwise, blockwise, all possible regressions) predictor selection?
- Why avoid predictor selection algorithms when doing substantive research (when you want to explain variance in the DV)?