Title: Football in the 90s
1Football in the 90s
- Curtis Olswold
- University of Iowa
- 22S Honors Project
2Sunday Afternoon Ritual
- A team's strategy, or tactical approach to a game, is not unique.
- There are three dominant types of offense in the NFL:
- 1) Pass oriented
- 2) Run oriented
- 3) Balanced (Run and Pass oriented)
4Purpose
- Construct a model to predict the points a team scores.
- Determine the probability that a team wins given certain factors.
- Investigate whether there is a significant difference in the points a team scores by year and week.
5Why Am I Doing This?
- Determine which variables affect the number of points a team scores, and how.
- Determine how effective the variables are at predicting the outcome of the game (Win or Lose).
- Rules change almost annually in the NFL to increase the number of points a team scores. Is it really working?
6The Variables
- Score
- Rushing Yards
- Passing Yards
- Completions
- Outcome
- Passing Attempts
- Rushing Attempts
- Interceptions
- Fumbles
7The Sample
- A random sample was drawn from the population of every regular season week from the 1990 season to the 1999 season.
- Individual team names were not identified.
- From each week of each year, a sample of 5 teams was randomly chosen. This gave a sample of 850 observations.
8Regression Model
- Score = 4.92593
- + 0.15441 (Rushing Attempts)
- + 0.05858 (Rushing Yards)
- - 0.43734 (Passing Attempts)
- + 0.20470 (Pass Completions)
- + 0.08080 (Passing Yards)
- - 0.56074 (Interceptions)
- - 1.09088 (Fumbles)
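The fitted equation can be applied directly to a stat line. The following is a minimal sketch, not the author's code: the function and argument names are assumptions made for illustration, while the coefficients and the sample inputs (25 rushes for 110 yards with 1 fumble, 22 passes with 12 completions for 145 yards and 2 interceptions) come from the slides.

```python
# Coefficients copied from the regression model above.
# The dictionary keys are illustrative names, not the original variable names.
COEFFICIENTS = {
    "rush_attempts": 0.15441,
    "rush_yards": 0.05858,
    "pass_attempts": -0.43734,
    "completions": 0.20470,
    "pass_yards": 0.08080,
    "interceptions": -0.56074,
    "fumbles": -1.09088,
}
INTERCEPT = 4.92593


def predict_score(**stats):
    """Return the model's predicted score for one team in one game."""
    missing = set(COEFFICIENTS) - set(stats)
    if missing:
        raise ValueError(f"missing statistics: {sorted(missing)}")
    return INTERCEPT + sum(COEFFICIENTS[name] * stats[name] for name in COEFFICIENTS)


# Stat line for a balanced offense taken from a later slide in this deck.
balanced = predict_score(
    rush_attempts=25, rush_yards=110, fumbles=1,
    pass_attempts=22, completions=12, pass_yards=145, interceptions=2,
)
print(round(balanced, 2))  # 17.57
```

The printed value agrees with the 17.57 points quoted for the same stat line in the balanced offense example.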
9Statistics of the Model
- R2 is the proportion of variability in Score that is explained by the model.
- Adjusted R2 is a measure of how efficient the predictor variables are. It penalizes for overcomplicating the model.
- For this model:
- R-Square 0.5020
- Adj R-Sq 0.4978
- This indicates the model explains just over 50% of the variability in score and is not overly complex.
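The penalty that Adjusted R2 applies can be sketched with the standard textbook formula. This is an illustration under stated assumptions: the sample size of 850 and the count of 7 predictors come from the slides, and the function name is hypothetical.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Standard adjusted R-squared: 1 - (1 - R2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


# R-Square as reported (rounded to four places), 850 observations, 7 predictors.
adj = adjusted_r_squared(0.5020, n_obs=850, n_predictors=7)
print(round(adj, 4))
```

Because the input R-Square is itself rounded, the result only matches the reported 0.4978 to about three decimal places. The small gap between R2 and Adjusted R2 is what supports the claim that the model is not overly complex.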
10Significance of Predictors
Variable             DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept             1         4.92593            1.66448         2.96     0.0032
ratt                  1         0.15441            0.04725         3.27     0.0011
ryds                  1         0.05858            0.00785         7.47     --
patt                  1        -0.43734            0.06043        -7.24     --
(Pass Completions)    1         0.20470            0.10039         2.04     0.0418
pyds                  1         0.08080            0.00534        15.12     --
(Interceptions)       1        -0.56074            0.23748        -2.36     0.0184
fumble                1        -1.09088            0.25782        -4.23     --
11Interpretation of Significance
- Every parameter is significantly different from zero. This means that each of the variables contributes to the precision of the model.
- The significance level is 0.05, meaning there is only a 5% chance of wrongly rejecting the hypothesis that the parameter is zero, or does not help in prediction.
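Each t value in the table is the parameter estimate divided by its standard error. The sketch below recomputes them; the pairing of standard errors with Passing Attempts, Pass Completions, and Interceptions is inferred from the order of the table fragments, and 1.96 is the large-sample approximation to the two-sided 5% critical value (an assumption; the exact t critical value for this sample size is slightly larger).

```python
# (estimate, standard error) pairs; row labels are descriptive, not the
# original variable names.
ROWS = {
    "Intercept": (4.92593, 1.66448),
    "Rushing Attempts": (0.15441, 0.04725),
    "Rushing Yards": (0.05858, 0.00785),
    "Passing Attempts": (-0.43734, 0.06043),
    "Pass Completions": (0.20470, 0.10039),
    "Passing Yards": (0.08080, 0.00534),
    "Interceptions": (-0.56074, 0.23748),
    "Fumbles": (-1.09088, 0.25782),
}
CRITICAL_VALUE = 1.96  # approximate two-sided 5% cutoff for a large sample

t_values = {name: est / se for name, (est, se) in ROWS.items()}

for name, t in t_values.items():
    verdict = "significant" if abs(t) > CRITICAL_VALUE else "not significant"
    print(f"{name:18s} t = {t:6.2f}  {verdict}")
```

Pass Completions has the smallest ratio in absolute value (about 2.04), which is why its p-value of 0.0418 sits closest to the 0.05 cutoff.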
12Interpretation of the Model
- For every rush attempt a team makes, the model predicts they will score about 3/20 of a point more.
- Every yard that a team gains on the ground suggests an increase of about 3/50 of a point.
- When a rush attempt results in a fumble, the team's score will decrease by about 1 and 9/100 points.
13Interpretation of the Model
- As a team throws the ball more, each pass attempt decreases their predicted score by about 11/25 of a point.
- However, for every completion, they will increase their point total by about 1/5 of a point.
- For every yard that is gained from the completion of a pass, a team's score increases by about 2/25 of a point.
- If a pass attempt is intercepted, then their points scored will decrease by just over ½ of a point.
14Interpretation of the Model
- All of this can be summed up quite simply: a rushing team is superior to a passing team. This is magnified if the team is able to gain substantial yardage per rush.
- Conversely, if a team passes many times, but completes a good percentage of them for good yardage, the effects of the pass attempt statistic are not as pronounced.
15Examples of Prediction
Actual Score   Predicted Value   95% Confidence Limits for the Mean
     19            22.4272          21.1840    23.6705
     24            26.3893          24.8230    27.9555
     13             5.8283           4.3773     7.2793
     14             8.6153           6.3462    10.8845
     11            14.5838          13.7462    15.4215
     13            14.6197          13.3944    15.8451
     20            24.8587          23.7494    25.9680
     45            31.7385          30.6684    32.8086
     20            15.1015          13.7705    16.4324
16Explanation of the Predictions
- The above are the predictions for 9 observations of the sample.
- None are exact, which should not be expected.
- Six of the nine are, however, within 5 points of the actual values; the largest miss is about 13 points (45 actual vs. 31.7 predicted).
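How close the predictions are can be checked by computing the prediction errors for the nine observations listed. This is a small sketch using only the values from the slide; the 5-point cutoff is an arbitrary choice made for illustration.

```python
actual = [19, 24, 13, 14, 11, 13, 20, 45, 20]
predicted = [22.4272, 26.3893, 5.8283, 8.6153, 14.5838,
             14.6197, 24.8587, 31.7385, 15.1015]

# Residual = actual score minus predicted value.
residuals = [a - p for a, p in zip(actual, predicted)]

within_five = sum(1 for r in residuals if abs(r) <= 5)   # illustrative cutoff
largest_miss = max(abs(r) for r in residuals)

print(f"within 5 points: {within_five} of {len(residuals)}")
print(f"largest absolute error: {largest_miss:.2f}")
```

The largest error belongs to the 45-point game, which the model underpredicts by roughly 13 points.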
17Residual Error vs. Predicted
Test of First and Second Moment Specification
DF   Chi-Square   Pr > ChiSq
35      37.52       0.3543
18Diagnostic Checking
- Residual, or Prediction Error
- 1) Constant Variance
- The plot shows that the variance is slightly shaped like a megaphone.
- The Cook and Weisberg (1) formal test indicates that the null hypothesis of constant variance cannot be rejected.
- 2) Normality
- The regression coefficients do not rely upon the residual normality assumption to be asymptotically normal (2).
19Diagnostics Continued
- Variance Inflation
- 0
- 2.71820
- 2.49829
- 4.37648
- 5.67120
- 2.57837
- 1.18387
- 1.02425
- The Variance Inflation is a measure of the multicollinearity (linear relationship among 2 or more predictors) of the variables.
- None of these indicates severe multicollinearity; all are below 10, a common rule-of-thumb threshold.
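A variance inflation factor is defined as 1 / (1 - R2), where R2 comes from regressing one predictor on all the others, so each value can be converted back into that R2. The sketch below does this for the seven non-zero values listed; the threshold of 10 is a common rule of thumb and an assumption here, not something stated on the slide.

```python
# Variance inflation values from the slide, excluding the 0 listed first.
vifs = [2.71820, 2.49829, 4.37648, 5.67120, 2.57837, 1.18387, 1.02425]
RULE_OF_THUMB = 10.0  # assumed cutoff for "severe" multicollinearity

# Invert VIF = 1 / (1 - R2) to get the R2 of each predictor on the others.
implied_r2 = [1.0 - 1.0 / v for v in vifs]

for v, r2 in zip(vifs, implied_r2):
    print(f"VIF {v:.5f} -> implied R2 {r2:.3f}")

print("any severe:", any(v >= RULE_OF_THUMB for v in vifs))
```

Even the largest value, 5.67120, corresponds to a predictor that is about 82% explained by the others, which is noticeable but below the usual level of concern.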
20Example of a Run vs. Pass Oriented Offense
- If a team rushes the ball 35 times in a game, gaining 150 yards with 1 fumble, and passes 12 times for 115 yards and no interceptions, then on average it will score 23.5 points. If they throw an interception, then the points scored drops to 22.9.
- Now suppose a team rushes 12 times for 75 yards without fumbling and passes 35 times, completing 19 for 315 yards with 2 interceptions. They will average 24.085 points.
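These figures can be reproduced from the regression coefficients. In the sketch below the function name is hypothetical, and one input is an assumption: the slide does not give the number of completions for the run oriented team, and 7 completions is the value that reproduces the quoted 23.5 points.

```python
def predict_score(rush_att, rush_yds, pass_att, completions, pass_yds,
                  interceptions, fumbles):
    """Predicted score using the coefficients from the Regression Model slide."""
    return (4.92593
            + 0.15441 * rush_att
            + 0.05858 * rush_yds
            - 0.43734 * pass_att
            + 0.20470 * completions
            + 0.08080 * pass_yds
            - 0.56074 * interceptions
            - 1.09088 * fumbles)


ASSUMED_COMPLETIONS = 7  # not stated on the slide; chosen to match 23.5

run_team = predict_score(35, 150, 12, ASSUMED_COMPLETIONS, 115, 0, 1)
run_team_one_int = predict_score(35, 150, 12, ASSUMED_COMPLETIONS, 115, 1, 1)
pass_team = predict_score(12, 75, 35, 19, 315, 2, 0)

print(round(run_team, 1), round(run_team_one_int, 1), round(pass_team, 3))
```

The pass oriented team needs 315 passing yards to edge past the run oriented team, which illustrates how much yardage it takes to offset the penalty on pass attempts.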
21Example of a Balanced Offense
- A team that rushes the ball 25 times for 110 yards with 1 fumble and passes 22 times, completing 12 for 145 yards with 2 interceptions, will score on average 17.57 points per game.
- If they only throw one interception, then the points scored becomes 18.13.
22Statistical Comparison of Offenses
- Definitions
- 1) An offense is run oriented if its rush attempts are at least 1.5 times its pass attempts.
- 2) An offense is pass oriented if its pass attempts are at least 1.5 times its rush attempts.
- 3) If a team's passing and rushing attempts are within a factor of 1.5 of each other, then it is balanced.
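The three definitions amount to a simple rule on the ratio of attempts. A minimal sketch, with a hypothetical function name and labels:

```python
def classify_offense(rush_attempts, pass_attempts, ratio=1.5):
    """Label an offense as Run, Pass, or Balanced from its attempt counts."""
    if rush_attempts >= ratio * pass_attempts:
        return "Run"
    if pass_attempts >= ratio * rush_attempts:
        return "Pass"
    return "Balanced"


print(classify_offense(35, 12))  # Run
print(classify_offense(12, 35))  # Pass
print(classify_offense(25, 22))  # Balanced
```

Comparing with multiplication rather than division avoids a divide-by-zero when a team has no attempts of one kind. A game with zero attempts of both kinds would be labeled Run by this sketch, a case that does not arise in practice.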
23The ANOVA Procedure: Tukey's Studentized Range (HSD) Test for score
This test controls the Type I experimentwise error rate.
Alpha                                      0.05
Error Degrees of Freedom                    847
Error Mean Square                      89.94897
Critical Value of Studentized Range     3.32034
Comparisons significant at the 0.05 level are indicated by ***.
orient Comparison   Difference Between Means   Simultaneous 95% Confidence Limits
Run - Bala                 5.2083                  2.8845     7.5320   ***
Run - Pass                 9.2933                  6.7978    11.7888   ***
Bala - Pass                4.0850                  2.3737     5.7963   ***
24Interpretation of Comparisons
- There is a difference between the orientations of teams.
- In fact, they are all different from each other!
- Run oriented teams actually score more points than both pass oriented and balanced offenses.
- Balanced offenses score more points than pass oriented teams.
- Why? Possibly due to the fact that more time is used by running the football than by passing.
25Determining the Probability of Winning the Game
- A win is given a value of 1. If a team ties or loses, they are given a value of 0.
- The only variables that are in a coach's immediate control are whether they run or pass the ball on offense.
- For this reason, only rushing and passing attempts will be used as independent variables.
26Distribution of Wins and Losses
27Frequencies of Game Outcomes
Outcome   Frequency   Percent   Cumulative Frequency   Cumulative Percent
Loss         426       50.12            426                  50.12
Tie           23        2.71            449                  52.82
Win          401       47.18            850                 100.00
28Method of Analysis
- Logistic Regression will be used to model the probability that a team wins.
- The form of the model is
- $P(\text{Win}) = \dfrac{e^{\beta_0 + \beta_1\,\text{rushes} + \beta_2\,\text{passes}}}{1 + e^{\beta_0 + \beta_1\,\text{rushes} + \beta_2\,\text{passes}}}$
29Estimation of Parameters
Analysis of Maximum Likelihood Estimates
Parameter   DF   Estimate   Standard Error   Chi-Square   Pr > ChiSq
Intercept    1   -2.1140        0.5403         15.3082       --
ratt         1    0.1259        0.0119        112.5931       --
patt         1   -0.0481        0.0107         20.1301       --
30Interpretation of the Model
- Both Rushing and Passing Attempts are significant factors in determining the probability of winning a game.
- The parameter estimates are on the scale of the natural logarithm of the odds.
- Odds Ratios will give more insight into how the probability of winning is affected by rushing and passing.
31Fit of the Model
Hosmer and Lemeshow Goodness-of-Fit Test
Chi-Square   DF   Pr > ChiSq
  9.2815      8     0.3191
This shows that there is no evidence of lack of model fit.
32Odds Ratios
Effect   Point Estimate   95% Wald Confidence Limits
ratt         1.134            1.108     1.161
patt         0.953            0.933     0.973
33Interpretation of the Odds Ratios
- For a one attempt increase in Rushing Attempts, the odds in favor of winning are multiplied by 1.134.
- For a one attempt increase in Passing Attempts, the odds in favor of winning are multiplied by 0.953.
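An odds ratio is the exponential of the corresponding logistic regression coefficient, and a Wald interval exponentiates the coefficient plus or minus about 1.96 standard errors. The sketch below assumes the rushing estimate of 0.1259 (taken from the probability equation in this deck) pairs with the standard error 0.0119, and the passing estimate of -0.0481 with 0.0107; the function name is hypothetical.

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% interval


def odds_ratio_with_ci(estimate, std_error, z=Z_95):
    """Return (odds ratio, lower limit, upper limit) for one coefficient."""
    return (math.exp(estimate),
            math.exp(estimate - z * std_error),
            math.exp(estimate + z * std_error))


ratt = odds_ratio_with_ci(0.1259, 0.0119)    # rushing attempts
patt = odds_ratio_with_ci(-0.0481, 0.0107)   # passing attempts

print("ratt", [round(x, 3) for x in ratt])
print("patt", [round(x, 3) for x in patt])
```

Neither interval contains 1, which is the odds-ratio counterpart of each coefficient being significantly different from zero.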
34Conversion into Probabilities
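- The equation for the probability of winning a game:
- $P(\text{Win}) = \dfrac{1}{1 + e^{-(\beta_0 + \beta_1\,\text{rushes} + \beta_2\,\text{passes})}}$
- This yields
- $P(\text{Win}) = \dfrac{1}{1 + e^{2.114 - 0.1259\,\text{rushes} + 0.0481\,\text{passes}}}$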
35Some Examples
- For a team rushing 25 times and passing 25 times, the model yields a probability of .458 that they will win the game, or a 45.8% chance they will win.
- If a team rushes 35 times and passes only 15 times, their probability of winning is .827, or nearly an 83% chance of victory.
- Now, say that team rushes only 15 times and passes 35 times. The probability changes to .129, or a 13% chance of winning.
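These probabilities follow from plugging the attempt counts into the fitted logistic model. A minimal sketch, using the estimates quoted in the deck (intercept -2.114, rushing 0.1259, passing -0.0481) and a hypothetical function name:

```python
import math

B0, B_RUSH, B_PASS = -2.114, 0.1259, -0.0481  # estimates as quoted in the deck


def win_probability(rushes, passes):
    """Logistic model: P(Win) = 1 / (1 + exp(-(b0 + b1*rushes + b2*passes)))."""
    log_odds = B0 + B_RUSH * rushes + B_PASS * passes
    return 1.0 / (1.0 + math.exp(-log_odds))


for rushes, passes in [(25, 25), (35, 15), (15, 35)]:
    print(rushes, passes, round(win_probability(rushes, passes), 3))
```

Because the coefficients are rounded to four decimal places, the results can differ from the quoted probabilities in the third decimal place.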
36What the Model Does Not Suggest
- Given the model predicts a higher success rate if a team rushes the ball, it may seem that a team should never pass. If this is done, the model gives a 98.5% chance of victory for 50 rushes and no passes.
- Obviously, if the other team knows you are never going to pass, you won't be able to move the ball 10 yards on 3 plays very consistently. This shows how real world circumstances aren't always modeled perfectly.
37Another Consideration
- The model also does not take into account mid-game strategies. In other words, the farther you are behind, the more passes your team will attempt. Why? Less time is taken off of the game clock by passing.
38Does the Year and Week affect Points Scored?
- Year in and year out, rules are changed to increase scoring.
- Rule changes include:
- 2 point conversions allowed
- Defensive Line Encroachment Rules
- The 5-Yard Bump Rule on Receivers
- Etc.
392 Way ANOVA for Points Scored, Year and Week
Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model             25      321.405283       12.856211      1.11      0.3186
Error            824     9509.705603       11.540905
Corrected Total  849     9831.110886

R-Square    Coeff Var    Root MSE    tscore Mean
0.032693    38.39034     3.397191     8.849077

Source    DF    Type III SS    Mean Square    F Value    Pr > F
year       9    133.2107197    14.8011911      1.28      0.2423
week      16    188.1945631    11.7621602      1.02      0.4333
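The summary figures in the ANOVA table are linked by simple arithmetic: each mean square is a sum of squares divided by its degrees of freedom, F is the ratio of the model mean square to the error mean square, and R-Square is the model share of the total sum of squares. A short sketch that recomputes them from the sums of squares shown (variable names are illustrative):

```python
import math

ss_model, df_model = 321.405283, 25
ss_error, df_error = 9509.705603, 824

ms_model = ss_model / df_model
ms_error = ss_error / df_error

f_value = ms_model / ms_error
r_squared = ss_model / (ss_model + ss_error)
root_mse = math.sqrt(ms_error)

print(f"F = {f_value:.2f}, R-Square = {r_squared:.6f}, Root MSE = {root_mse:.6f}")
```

An F value this close to 1 means year and week together explain little more variation than would be expected by chance, consistent with the R-Square of about 3%.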
40Interpretation of the 2 Way ANOVA
- The model indicates there is not sufficient evidence to conclude that Year and Week have an effect on how many points are scored.
- This means that for any given week in any given year, the points scored by a team are not detectably affected, in this model.
41Conclusion
- All 3 statistical models point towards Rushing Attempts as being the important statistic in determining the points a team scores, and whether or not they win the game.
- Ball control is thus the essence of winning a football game. This is most readily seen in a team that rushes with consistency. A team is in a better position to win if they can run consistently and pass only occasionally.
42References
(1) Sanford Weisberg, Applied Linear Regression, 2nd Edition, pp. 135-136. John Wiley and Sons, 1985.
(2) Neter, Kutner, Nachtsheim, Wasserman, Applied Linear Statistical Models, 4th Edition, pp. 54-55. Irwin (Chicago), 1996.
(3) Professor Kate Cowles, University of Iowa Department of Statistics and Actuarial Science.
(4) Data collected from http://www.mrncaa.com