Title: Generalized Linear Mixed Model
1Generalized Linear Mixed Model
- English Premier League Soccer 2003/2004 Season
2Introduction
- English Premier League Soccer (Football)
- 20 Teams: each plays all others twice (home/away)
- Games consist of two halves (45 minutes each)
- No overtime
- Each team is on offense and defense for 38 games (38 first and second halves)
- Response Variable: Goals in a half
- Potential Independent Variables
- Fixed Factors: Home Dummy, Half2 Dummy, Game (1-38)
- Random Factors: Offensive Team, Defensive Team
- Distribution of Response: Poisson?
3Preliminary Summary
4Summary of Previous Slide
- Teams vary extensively on offense and defense
- Offense (goals scored): min = 38, max = 73, mean = 50.6, SD = 8.85
- Defense (goals allowed): min = 26, max = 79, mean = 50.6, SD = 13.75
- Strong negative correlation between offense and defense: r = -0.80
- Home teams outscore away teams 1.3:1
- Second half outscores first half 1.2:1
- No evidence of autocorrelation in total goals scored over weeks (Durbin-Watson statistic = 2.03)
5Marginal Analysis No Team Effects
- Break Down Goals by Home/Half2 (380 Games)
6Summary of Previous Slide
- Means (Variances) for 4 Half Types
- Home/1st Half: Mean = 0.692, Variance = 0.689
- Away/1st Half: Mean = 0.521, Variance = 0.514
- Home/2nd Half: Mean = 0.813, Variance = 0.912
- Away/2nd Half: Mean = 0.637, Variance = 0.628
- Thus, means and variances are in strong agreement
- Chi-Square statistics for testing for Poisson
- df = (4 categories - 1) - (1 parameter estimated) = 2
- P-values all exceed 0.50 (.8505, .5440, .7353, .6957)
- Goals scored are consistent with a Poisson distribution
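The chi-square test described above can be sketched as follows. This is a minimal illustration, not the slides' own computation: the counts and total goals are hypothetical placeholders (the per-category counts are not given in the transcript), and the function names are assumptions. It pools halves into 0, 1, 2, and 3+ goals, which gives the 4 categories and 2 degrees of freedom stated above.

```python
import math

def poisson_pmf(k, mu):
    """P(Y = k) for a Poisson(mu) variable."""
    return math.exp(-mu) * mu ** k / math.factorial(k)

def poisson_gof(counts, total_goals):
    """Chi-square goodness-of-fit statistic for a Poisson distribution.

    counts[k] is the number of halves with k goals; the last entry pools
    all halves with at least len(counts) - 1 goals.
    """
    n = sum(counts)
    mu_hat = total_goals / n                      # the one estimated parameter
    probs = [poisson_pmf(k, mu_hat) for k in range(len(counts) - 1)]
    probs.append(1.0 - sum(probs))                # pooled upper tail
    expected = [n * p for p in probs]
    stat = sum((o - e) ** 2 / e for o, e in zip(counts, expected))
    df = (len(counts) - 1) - 1                    # (categories - 1) - (1 parameter)
    return stat, df

def chi2_sf_df2(x):
    """Upper-tail probability of a chi-square variable with 2 df (closed form)."""
    return math.exp(-x / 2.0)

# Hypothetical data for one half type (NOT the league data):
# halves with 0, 1, 2, and 3+ goals, and the total goals in those halves.
counts = [190, 130, 45, 15]
total_goals = 267

stat, df = poisson_gof(counts, total_goals)
p_value = chi2_sf_df2(stat)
print(f"X2 = {stat:.3f}, df = {df}, p = {p_value:.4f}")
```

The closed form for the tail probability only holds for 2 degrees of freedom; a different pooling of categories would need a general chi-square routine.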
8Generalized Linear Models
- Dependent Variable: Goals Scored
- Distribution: Poisson
- Link Function: log
- Independent Variables: Home, Half2 Dummy Variables
- Models: Model 1 (Home and Half2 main effects) and Model 2 (adds the Home x Half2 interaction)
- Models fit using generalized linear model software packages
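Written out, the two models implied by the parameter tables that follow would be as below. The equations are a reconstruction from the listed parameters (Intercept, home, half2, and the interaction), and the $\beta$ subscripts are notation assumed here.

```latex
\begin{align*}
\text{Model 1: } \log(\mu) &= \beta_0 + \beta_{Home}\,Home + \beta_{Half2}\,Half2 \\
\text{Model 2: } \log(\mu) &= \beta_0 + \beta_{Home}\,Home + \beta_{Half2}\,Half2
                              + \beta_{HomeHalf2}\,(Home \times Half2)
\end{align*}
```

Here $\mu$ is the expected number of goals in a half, and Home and Half2 are 0/1 dummy variables.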
9Parameter Estimates / Model Fit Model 1
Distribution: Poisson
Link Function: Log
Dependent Variable: goals
Number of Observations Read: 1520
Number of Observations Used: 1520

Criteria For Assessing Goodness Of Fit
Criterion            DF     Value        Value/DF
Deviance             1517   1650.4574    1.0880
Scaled Deviance      1517   1650.4574    1.0880
Pearson Chi-Square   1517   1549.2570    1.0213
Scaled Pearson X2    1517   1549.2570    1.0213
Log Likelihood              -1411.0226
Algorithm converged.
10Parameter Estimates / Model Fit Model 1
Analysis Of Parameter Estimates
Parameter   DF   Estimate   Standard Error   Wald 95% Confidence Limits   Chi-Square   Pr > ChiSq
Intercept   1    -0.6397    0.0588           -0.7549   -0.5245            118.48       <.0001
home        1     0.2624    0.0634            0.1381    0.3866             17.12       <.0001
half2       1     0.1783    0.0631            0.0546    0.3020              7.98       0.0047
Scale       0     1.0000    0.0000            1.0000    1.0000
NOTE: The scale parameter was held fixed.
11Parameter Estimates / Model Fit Model 2
Criteria For Assessing Goodness Of Fit
Criterion            DF     Value        Value/DF
Deviance             1516   1650.3613    1.0886
Scaled Deviance      1516   1650.3613    1.0886
Pearson Chi-Square   1516   1549.7072    1.0222
Scaled Pearson X2    1516   1549.7072    1.0222
Log Likelihood              -1410.9745
Algorithm converged.
12Parameter Estimates / Model Fit Model 2
Analysis Of Parameter Estimates
Parameter    DF   Estimate   Standard Error   Wald 95% Confidence Limits   Chi-Square   Pr > ChiSq
Intercept    1    -0.6519    0.0711           -0.7912   -0.5126             84.15       <.0001
home         1     0.2839    0.0941            0.0995    0.4683              9.10       0.0026
half2        1     0.2007    0.0958            0.0129    0.3885              4.39       0.0363
home*half2   1    -0.0395    0.1274           -0.2891    0.2101              0.10       0.7566
Scale        0     1.0000    0.0000            1.0000    1.0000
NOTE: The scale parameter was held fixed.
13Testing for Home/Half2 Interaction
- $H_0$: No Home x Half2 interaction ($\beta_{HomeHalf2} = 0$)
- $H_A$: Home x Half2 interaction ($\beta_{HomeHalf2} \ne 0$)
- Test 1: Wald Test
- Test 2: Likelihood Ratio Test
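Both tests can be reproduced from the Model 1 and Model 2 output reported earlier. The sketch below uses only the reported interaction estimate, its standard error, and the two log likelihoods; the variable and function names are illustrative.

```python
import math

def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))

# Values reported in the Model 1 and Model 2 output.
b_inter, se_inter = -0.0395, 0.1274          # Home x Half2 estimate and standard error
loglik_m1, loglik_m2 = -1411.0226, -1410.9745

wald = (b_inter / se_inter) ** 2             # Wald chi-square, 1 df
lrt = 2.0 * (loglik_m2 - loglik_m1)          # likelihood ratio chi-square, 1 df

p_wald = chi2_sf_df1(wald)
p_lrt = chi2_sf_df1(lrt)
print(f"Wald = {wald:.4f}, p = {p_wald:.4f}")
print(f"LRT  = {lrt:.4f}, p = {p_lrt:.4f}")
```

The two statistics agree closely with each other and with the reported Chi-Square of 0.10 (Pr > ChiSq = 0.7566), so neither test rejects the hypothesis of no interaction.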
14Testing for Main Effects for Home and Half2
- Wald tests only reported here (both effects are very significant)
- Tests based on Model 1 (no interaction model)
15Interpreting the GLM
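Because the link is the log, each Model 1 coefficient exponentiates to a multiplicative effect on expected goals. The following is a minimal sketch of that reading, using the Model 1 estimates reported earlier; the function and variable names are illustrative.

```python
import math

# Model 1 estimates on the log scale (from the parameter estimates table).
b0, b_home, b_half2 = -0.6397, 0.2624, 0.1783

def expected_goals(home, half2):
    """Fitted mean goals in a half under Model 1 (log link)."""
    return math.exp(b0 + b_home * home + b_half2 * half2)

home_ratio = math.exp(b_home)      # home vs away scoring rate
half2_ratio = math.exp(b_half2)    # second vs first half scoring rate

print(f"Home/Away rate ratio:    {home_ratio:.3f}")
print(f"2nd/1st half rate ratio: {half2_ratio:.3f}")
for home in (1, 0):
    for half2 in (0, 1):
        label = f"{'Home' if home else 'Away'}/{'2nd' if half2 else '1st'}"
        print(f"{label} half fitted mean: {expected_goals(home, half2):.3f}")
```

The rate ratios correspond to the roughly 1.3:1 home advantage and 1.2:1 second-half advantage in the preliminary summary, and the four fitted means can be compared with the observed means for the four half types.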
16Incorporating Random (Team) Effects
- Teams clearly vary in terms of offensive and defensive skills (see slide 3)
- Since many factors are inputs into team abilities (players, coaches, chemistry), we will treat team offensive and defensive effects as Random
- There will be 20 random offensive effects (one per team) and 20 defensive effects
17Random Team Effects
- All effects are on the log scale for goals scored
- Offense Effects: $o_i \sim NID(0, \sigma_o^2)$
- Defense Effects: $d_i \sim NID(0, \sigma_d^2)$
- In the estimation process we assume $COV(o_i, d_i) = 0$, which seems a stretch (but we can still observe the covariance of the estimated random effects)
18Mixed Effects Model
- Fixed Effects: Intercept, Home, Half2 ($\alpha$)
- Random Effects: Offteam, Defteam ($\beta$)
- Conditional Model (on Random Effects)
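A plausible form of the conditional model, reconstructed from the fixed and random effects listed above and the indexing used later for the BLUPs (home status $i$, half $j$, offensive team $k$, defensive team $l$), is sketched below. The matrix form with $X$ and $Z$ is standard GLMM notation assumed here, not copied from the slide.

```latex
\begin{align*}
Y_{ijkl} \mid o_k, d_l &\sim \text{Poisson}(\mu_{ijkl}) \\
\log(\mu_{ijkl}) &= \alpha_0 + \alpha_{Home}\,HOME_i + \alpha_{Half2}\,HALF2_j + o_k + d_l \\
\text{or, in matrix form: } \log(\mu) &= X\alpha + Z\beta,
\quad \beta = (o_1, \dots, o_{20}, d_1, \dots, d_{20})'
\end{align*}
```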
19Model in Matrix Notation - Example
- League has 3 Teams: A, B, C
- Order of Entry of Games: A at B, A at C, B at C, B at A, C at A, C at B
- Order of Entry of Scores within Game: Home/1st, Away/1st, Home/2nd, Away/2nd
- 3 Offense Effects, 3 Defense Effects, 24 Observations
20Model Based on 3 Teams
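The design matrices for the 3-team example can be built mechanically from the stated order of entry. The sketch below is one way to do it, assuming the fixed-effects columns are (Intercept, Home, Half2) and the random-effects columns are (Offense A, B, C, Defense A, B, C); that column ordering and all names are assumptions for illustration.

```python
teams = ["A", "B", "C"]
# (away, home) pairs in the stated order of entry: A at B, A at C, B at C, ...
games = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "A"), ("C", "A"), ("C", "B")]

X, Z = [], []   # fixed-effects and random-effects design matrices
for away, home in games:
    # Within a game: Home/1st, Away/1st, Home/2nd, Away/2nd
    for half2 in (0, 1):
        for is_home in (1, 0):
            # The scoring (offensive) team is the home team when is_home == 1.
            off, dfn = (home, away) if is_home else (away, home)
            X.append([1, is_home, half2])                 # Intercept, Home, Half2
            off_cols = [int(t == off) for t in teams]     # offense indicators
            def_cols = [int(t == dfn) for t in teams]     # defense indicators
            Z.append(off_cols + def_cols)

for x_row, z_row in zip(X, Z):
    print(x_row, z_row)
```

Each row of Z has exactly two ones, one selecting the offensive team's effect and one selecting the defensive team's effect, so the linear predictor for a row is the fixed part plus $o_k + d_l$.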
21Sequence of Potential Models
- No fixed or random effects (common mean)
- Fixed home and second half effects, no random effects
- Fixed home and second half effects, random offense team effects
- Fixed home and second half effects, random defense team effects
- Fixed home and second half effects, random offense and defense team effects
22Results Estimates (P-Values)
Model  $\alpha_0$       $\alpha_{Home}$  $\alpha_{Half2}$  $\sigma_o^2$     $\sigma_d^2$    $\sigma_{Res}^2$  -2lnL    AIC      BIC
1      -.407 (.0001)    N/A              N/A               N/A              N/A             1.044             5001.9   5003.9   5009.3
2      -.6397 (.0001)   .2624 (.0001)    .1783 (.0052)     N/A              N/A             1.0213            4992.3   4994.3   4999.6
3      -.6413 (.0001)   .2624 (.0001)    .1783 (.0050)     .01004 (.143)    N/A             1.0099            4985.6   4989.6   4991.6
4      -.6592 (.0001)   .2624 (.0001)    .1783 (.0040)     N/A              .0588 (.012)    0.9630            4958.6   4962.6   4964.6
5      -.6605 (.0001)   .2624 (.0001)    .1783 (.0039)     .0084 (.162)     .0549 (.012)    0.9531            4951.9   4957.9   4960.9
- P-values for the variance components are based on a Z-test, not the preferred Likelihood Ratio Test
- Likelihood Ratio Test for offense effects: $H_0: \sigma_o^2 = 0$ vs $H_A: \sigma_o^2 > 0$; TS = 4958.6 - 4951.9 = 6.7; $P = 0.5\,P(\chi_1^2 \ge 6.7) = .005$
- Based on AIC and BIC, the model with both offense and defense effects is best
- No interaction found between team effects and home or half2
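The likelihood ratio test for a variance component can be sketched from the -2lnL column of the results table. Because the null value of a variance (zero) lies on the boundary of the parameter space, the usual 1-df chi-square p-value is halved, as in the test statistic above. The defense-effect comparison (Model 3 vs Model 5) is not reported on the slide; it is included here only as the analogous calculation. Names are illustrative.

```python
import math

def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))

# -2 ln L values from the results table, keyed by model number.
neg2lnl = {3: 4985.6, 4: 4958.6, 5: 4951.9}

def boundary_lrt(neg2lnl_reduced, neg2lnl_full):
    """LRT for H0: variance component = 0, with the p-value halved
    because the null value is on the boundary of the parameter space."""
    stat = neg2lnl_reduced - neg2lnl_full
    return stat, 0.5 * chi2_sf_df1(stat)

# Offense variance: Model 4 (defense only) vs Model 5 (offense and defense).
stat_off, p_off = boundary_lrt(neg2lnl[4], neg2lnl[5])
# Defense variance: Model 3 (offense only) vs Model 5 (analogous test).
stat_def, p_def = boundary_lrt(neg2lnl[3], neg2lnl[5])

print(f"Offense: TS = {stat_off:.1f}, p = {p_off:.4f}")
print(f"Defense: TS = {stat_def:.1f}, p = {p_def:.2e}")
```

Unlike the Z-test p-value of .162 in the table, this test finds the offense variance component significant, which is why the likelihood ratio test is described as preferred.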
23Goodness of Fit
- We test whether the Poisson GLMM is an appropriate model by means of the Deviance
- $H_0$: Model fits; $H_A$: Model lacks fit
- Deviance = 1570.7
- DF = N - (number of fixed parameters) = 1520 - 3 = 1517
- P-value = $P(\chi^2_{1517} \ge 1570.7) = 0.1646$
- No evidence of lack of fit
- If we use the Scaled Deviance instead, we do reject: scaled deviance = 1570.7/0.9531 = 1647.9
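The two p-values can be checked without statistical software. The sketch below swaps in the Wilson-Hilferty normal approximation to the chi-square distribution, which is accurate for large degrees of freedom. It is an approximation chosen for this illustration, not necessarily how the slide's p-value was obtained, and the names are illustrative.

```python
import math

def chi2_sf_wilson_hilferty(x, df):
    """Approximate P(chi-square with df degrees of freedom >= x) using the
    Wilson-Hilferty cube-root normal approximation (good for large df)."""
    c = 2.0 / (9.0 * df)
    z = ((x / df) ** (1.0 / 3.0) - (1.0 - c)) / math.sqrt(c)
    return 0.5 * math.erfc(z / math.sqrt(2.0))   # standard normal upper tail

n_obs, n_fixed = 1520, 3
df = n_obs - n_fixed                 # 1517
deviance = 1570.7
scaled_deviance = 1647.9             # 1570.7 / 0.9531, as reported

p_dev = chi2_sf_wilson_hilferty(deviance, df)
p_scaled = chi2_sf_wilson_hilferty(scaled_deviance, df)
print(f"Deviance:        p = {p_dev:.4f}")
print(f"Scaled deviance: p = {p_scaled:.4f}")
```

The unscaled deviance gives a p-value close to the reported 0.1646, while the scaled deviance gives a p-value below 0.05, matching the two conclusions above.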
24Best Linear Unbiased Predictors (BLUPs)
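Estimated Team (Random) Effects (teams with high Defense values allow more goals)
Estimated Fixed Effects
For each half $ijkl$, compute $\exp(-0.6605 + 0.2624\,HOME_i + 0.1783\,HALF2_j + o_k + d_l)$ as the BLUP, using the Model 5 fixed-effect estimates and the estimated team effects $o_k$ (offense) and $d_l$ (defense)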
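As a minimal sketch of the prediction step, the function below combines the fixed-effect estimates for the model with both offense and defense effects (from the results table) with team effects on the log scale. The team names and effect values are hypothetical placeholders, since the estimated team effects are not included in the transcript.

```python
import math

# Fixed-effect estimates for the offense + defense model (results table, Model 5).
FIXED = {"intercept": -0.6605, "home": 0.2624, "half2": 0.1783}

# Hypothetical team effects on the log scale (placeholders, NOT the estimates).
off_effect = {"Team X": 0.05, "Team Y": -0.03}
def_effect = {"Team X": -0.20, "Team Y": 0.15}

def predicted_goals(off_team, def_team, home, half2):
    """Predicted goals in a half: exp(fixed part + offense effect + defense effect)."""
    eta = (FIXED["intercept"]
           + FIXED["home"] * home
           + FIXED["half2"] * half2
           + off_effect[off_team]
           + def_effect[def_team])
    return math.exp(eta)

# Team X attacking Team Y's defense, at home, in the first half.
pred_home_first = predicted_goals("Team X", "Team Y", home=1, half2=0)
# The same matchup away from home, in the second half.
pred_away_second = predicted_goals("Team X", "Team Y", home=0, half2=1)
print(f"{pred_home_first:.3f} {pred_away_second:.3f}")
```

A positive defense effect for the defending team raises the prediction, consistent with the note that teams with high Defense values allow more goals.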
25Comparison of BLUPs with Actual Scores
- For each team half, we have Actual and BLUP
- Correlation between Actual and BLUP = 0.2655
- Concordant pairs of halves (one scores higher on both Actual and BLUP than the other) = 452471
- Discordant pairs of halves = 355617
- Gamma = (452471 - 355617)/(452471 + 355617) = 0.1199
- Evidence of some positive association between actual and predicted scores
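The Gamma statistic can be computed directly from pairs of observations. The sketch below counts concordant and discordant pairs by brute force on a tiny hypothetical data set, then checks the arithmetic for the pair counts reported above; the function name and toy data are illustrative.

```python
def goodman_kruskal_gamma(actual, predicted):
    """Count concordant and discordant pairs and return (C, D, gamma).
    Pairs tied on either variable are ignored."""
    concordant = discordant = 0
    n = len(actual)
    for i in range(n):
        for j in range(i + 1, n):
            sign = (actual[i] - actual[j]) * (predicted[i] - predicted[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    gamma = (concordant - discordant) / (concordant + discordant)
    return concordant, discordant, gamma

# Tiny hypothetical example: actual goals and predicted means for four halves.
toy_actual = [0, 1, 2, 1]
toy_predicted = [0.5, 0.7, 0.6, 0.9]
toy_result = goodman_kruskal_gamma(toy_actual, toy_predicted)
print(toy_result)

# Arithmetic check using the pair counts reported above.
reported_gamma = (452471 - 355617) / (452471 + 355617)
print(f"{reported_gamma:.4f}")
```

With 1520 halves there are 1,154,440 pairs in total, so the brute-force double loop is still practical at the scale of this data set.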
26Sources
- Data: SoccerPunter.com
- Methods: Littell, Milliken, Stroup, Wolfinger (1996). SAS System for Mixed Models.
- Wolfinger, R. and M. O'Connell (1993). Generalized Linear Mixed Models: A Pseudo-Likelihood Approach, J. Statist. Comput. Simul., Vol. 48, pp. 233-243.
27SAS Code
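/* Read one record per team-half: home team, road team, goals, and dummies */
data one;
  infile 'engl2003d.dat';
  input hteam $ 1-20 rteam $ 21-40 goals 47-48 half2 56 home 64 round 71-73;
  /* The scoring team is on offense; its opponent is on defense */
  if home=1 then do;
    offteam=hteam; defteam=rteam;
  end;
  else do;
    offteam=rteam; defteam=hteam;
  end;
run;

/* Load the GLIMMIX macro and fit the Poisson GLMM with a log link */
%include 'glmm800.sas';
%glimmix(data=two,   /* dataset name as given in the source */
  procopt=method=reml,
  stmts=%str(
    class offteam defteam;
    model goals = home half2 / s;
    random offteam defteam / s;
  ),
  error=poisson,
  link=log);
run;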