An Intelligence Approach to Evaluation of Sports Teams presentation

1
An Intelligence Approach to Evaluation of Sports Teams
by Edward Kambour, Ph.D.
2
Agenda

College Football
Linear Model
Generalized Linear Model
Intelligence (Bayesian) Approach
Results
Other Sports
Future Work

3
General Background

Goals
Forecast winners of future games
Beat the Bookie!
Estimate the outcome of unscheduled games
What's the probability that Iowa would have beaten Ohio St?
Generate reasonable rankings

4
Major College Football

No playoff system
Computer rankings are an element of the BCS
114 teams
12 games for each in a season

5
Linear Model

Rothman (1970s), Harville (1977), Stefani (1977), ..., Kambour (1991), ..., Sagarin???
Response, Y, is the net result (point-spread)
Parameter, $\theta$, is the vector of ratings
For a game involving teams i and j,
$E[Y] = \theta_i - \theta_j$

6
Linear Model (cont.)

Let X be a row vector with +1 in column i, -1 in column j, and 0 elsewhere
$E[Y] = X\theta$

7
Regression Model Notes

Least squares assumes normality and homogeneity (equal variances)
College Football
Estimate 100 parameters
Sample size for a full season is about 600
Design Matrix is sparse and not full rank
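
A minimal sketch of the rating regression described on the last few slides, using a toy schedule. The team names, margins and the helper name `fit_ratings` are hypothetical. Every row of X sums to zero, so X cannot be full rank; this sketch resolves that with the minimum-norm solution from `numpy.linalg.lstsq`, which is one reasonable choice and not necessarily the one used in the talk.

```python
import numpy as np

def fit_ratings(games, teams):
    """Least-squares ratings from (team_i, team_j, margin) tuples.

    margin is team_i's points minus team_j's points.
    """
    index = {name: k for k, name in enumerate(teams)}
    X = np.zeros((len(games), len(teams)))
    y = np.zeros(len(games))
    for row, (i, j, margin) in enumerate(games):
        X[row, index[i]] = 1.0   # +1 in the column of team i
        X[row, index[j]] = -1.0  # -1 in the column of team j
        y[row] = margin
    # Rows sum to zero, so rank(X) is at most L - 1. lstsq returns the
    # minimum-norm solution, whose ratings sum to zero.
    theta, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
    return theta, rank

teams = ["A", "B", "C"]                                 # hypothetical teams
games = [("A", "B", 7), ("B", "C", 3), ("A", "C", 10)]  # hypothetical results
theta, rank = fit_ratings(games, teams)
```

Only rating differences are identified, so any constant can be added to all ratings without changing the fitted margins.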

8
Home-field Advantage

Generic Advantage (Stefani, 1980)
Force i to be home team and j the visiting team
Add an intercept term to X
Adds one more parameter to estimate
UAB Alabama
Rice Texas A&M
Team Specific Advantage
Doubles the number of parameters to estimate
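
The generic home-field advantage can be sketched by ordering each game as (home, visitor) and appending a column of ones to X. The helper name `design_with_home` and the two-game series below are hypothetical, chosen so the answer can be checked by hand.

```python
import numpy as np

def design_with_home(games, teams):
    """Rows are (home, visitor, margin); the last column is the intercept."""
    index = {name: k for k, name in enumerate(teams)}
    X = np.zeros((len(games), len(teams) + 1))
    y = np.zeros(len(games))
    for row, (home, visitor, margin) in enumerate(games):
        X[row, index[home]] = 1.0
        X[row, index[visitor]] = -1.0
        X[row, -1] = 1.0  # generic home-field advantage
        y[row] = margin   # home score minus visitor score
    return X, y

teams = ["A", "B"]
# Hypothetical home-and-home series: A wins by 10 at home, then by 4 on the road.
games = [("A", "B", 10), ("B", "A", -4)]
X, y = design_with_home(games, teams)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
rating_gap = coef[0] - coef[1]  # A's rating minus B's rating
home_adv = coef[2]              # estimated generic home advantage
```

In this toy series the two margins average to the rating gap and half their difference is the home advantage. A team-specific version would need one such column per team, which is where the doubling of parameters comes from.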

9
Linear Model Issues

Normality
Homogeneity
Lots of parameters, with relatively small sample
size
Overfitting
The bookie takes you to the cleaners!

10
Linear Model Issues (cont.)

Should we model point differential?
A and B play twice
A by 34 in the first, B by 14 in the second (net: A by 20)
A by 10 each time (net: A by 20)
Running up the score (or lack thereof)
BCS: "Thou shalt not use margin of victory in thy ratings!"

11
Logistic Regression

Rothman (1970s)
Linear Model
Use binary variable
Winning is all that matters
Avoid margin of victory
Coin Flips

12
Logistic Regression Issues

Still have sample size issues
Throw away a lot of information
Undefeated teams
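
A small sketch of why undefeated teams are a problem for the win/loss model. It assumes the usual logistic link on the rating difference. The season below is hypothetical, and the function names are made up for illustration.

```python
import math

def win_prob(theta_i, theta_j):
    """Logistic probability that team i beats team j."""
    return 1.0 / (1.0 + math.exp(-(theta_i - theta_j)))

def log_likelihood(ratings, results):
    """results holds (winner, loser) pairs; ratings maps team -> rating."""
    return sum(math.log(win_prob(ratings[w], ratings[l])) for w, l in results)

# Hypothetical season: team "U" is undefeated, A and B split their games.
results = [("U", "A"), ("U", "B"), ("A", "B"), ("B", "A")]

# Hold A and B at 0 and keep raising U's rating. The log-likelihood keeps
# improving, so no finite maximum-likelihood rating exists for U.
lls = [log_likelihood({"U": u, "A": 0.0, "B": 0.0}, results)
       for u in (1.0, 5.0, 10.0, 20.0)]
```

The same thing happens in reverse for a winless team, whose rating is pushed toward minus infinity.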

13
Transformations

Transform the differentials to normality
Power transformations
Rothman logistic transform
Transforms points to probabilities for logistic
regression
Diminishing returns transforms
Downweights runaway scores

14
Power Transforms

Transform the point-spread
$Y = \operatorname{sign}(Z)\,|Z|^a$
a = 1: straight margin of victory
a = 0: just win baby
0 < a < 1: Poisson- or Gamma-ish
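
The signed power transform is short enough to write out. This is a sketch only. The function name is hypothetical, and mapping a tie (Z = 0) to 0 is an assumption, because Python evaluates `0 ** 0` as 1.

```python
import math

def power_transform(z, a):
    """Signed power transform of a point spread z: sign(z) * |z|**a."""
    if z == 0:
        return 0.0  # assumption: a tie maps to 0 for every power a
    return math.copysign(abs(z) ** a, z)

straight = power_transform(-14, 1)    # a = 1 keeps the margin as is
win_only = power_transform(34, 0)     # a = 0 keeps only who won
damped = power_transform(-9, 0.5)     # 0 < a < 1 shrinks large margins
```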

15
Maximum Likelihood Transform

1995-2002 seasons
MLE: a = 0.98

Power    -2ln(likelihood)
0.1      52487
0.3      41213
0.5      35128
0.67     32597
0.8      31418
1        31193

16
Predicting the Score

Model point differential
$Y_1 = S_i - S_j$
Additionally model the sum of the points scored
$Y_2 = S_i + S_j$
Fit a similar linear model (different parameter estimates)
Forecast home and visitors score
$H = (Y_1 + Y_2)/2$, $V = (Y_2 - Y_1)/2$
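
As a worked example of recovering both scores from the two fitted quantities, with team i taken as the home team. The function name and the forecast values are hypothetical.

```python
def forecast_scores(y1, y2):
    """y1: forecast home-minus-visitor differential; y2: forecast total points."""
    home = (y1 + y2) / 2
    visitor = (y2 - y1) / 2
    return home, visitor

# Hypothetical forecasts: home team favored by 7, 51 total points expected.
home, visitor = forecast_scores(7.0, 51.0)
```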

17
Another Transformation Idea

Scores (touchdowns or field goals) are arrivals,
maybe Poisson
Final score = 7 times a Poisson + 3 times a Poisson
Transform the scores to homogeneity and normality
first
The differences (and sums) should follow suit

18
Square Root Transform

Since the score is similar to a linear
combination of Poissons, square root should work
Transformation: $Z = \sqrt{S + k}$
Why k?
For small Poisson arrival rates, get better
performance (Anscombe, 1948)
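
A quick simulation sketch of the variance-stabilizing effect, assuming the transform has the form sqrt(X + k). It uses pure Poisson counts and Anscombe's constant k = 3/8, not football scores or the fitted k from the talk. The function name, rates and sample size are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def transformed_variance(rate, k, n=200_000):
    """Sample variance of sqrt(X + k) for X ~ Poisson(rate)."""
    x = rng.poisson(rate, size=n)
    return float(np.sqrt(x + k).var())

# Small arrival rate: compare the plain square root with the shifted one.
raw_small = transformed_variance(2, 0.0)
anscombe_small = transformed_variance(2, 3 / 8)

# Larger rates: the shifted square root should stay near 1/4, whereas
# the raw counts have variance equal to the rate.
anscombe_large = [transformed_variance(rate, 3 / 8) for rate in (10, 30)]
```

At the small rate the shifted version lands closer to the target variance of 1/4 than the plain square root, which is the motivation for carrying a constant k.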

19
Likelihood Test

LRT: no transformation vs. square root with fitted k
Used College Football results from 1995-2002
k = 21
Transformation was significantly better
p-value = 0.0023, chi-square = 9.26
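
The reported p-value can be checked from the chi-square statistic. The sketch assumes 1 degree of freedom (one extra parameter, k), which the slide does not state explicitly. The helper name is hypothetical.

```python
import math

def chi2_sf_1df(stat):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    # P(chi2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)) for standard normal Z.
    return math.erfc(math.sqrt(stat / 2.0))

p_value = chi2_sf_1df(9.26)
```

Under that assumption the result rounds to the 0.0023 quoted on the slide.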

20
Predicting the Score with Transform

Model point differential
Additionally model the sum of the points scored
Forecast home and visitors score
$H = ((Y_1 + Y_2)/2)^2$, $V = ((Y_2 - Y_1)/2)^2$
Note the point differential is the product $Y_1 Y_2$
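
The product form follows from a difference of squares. Here $Y_1$ and $Y_2$ are the forecast difference and sum on the square-root scale, and H and V are the untransformed scores as written on the slide:

```latex
\begin{align*}
H - V &= \left(\frac{Y_1 + Y_2}{2}\right)^2 - \left(\frac{Y_2 - Y_1}{2}\right)^2 \\
      &= \frac{(Y_1 + Y_2)^2 - (Y_2 - Y_1)^2}{4} \\
      &= \frac{4\,Y_1 Y_2}{4} = Y_1 Y_2
\end{align*}
```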

21
Unresolved Linear Model Issues

Overfitting
History
Going into the season, we have a good idea as to
how teams will do
The best teams tend to stay the best
The worst teams tend to stay the worst
Changes happen
Kansas State

22
Intelligence Model

Concept
The ratings and home-field advantages for year t are similar to those of year t-1. There is some drift from one year to the next.
Model

23
Intelligence Model (Details)

Notation
L teams
M seasons of data
$N_i$ games in the ith season
$X_i$: the $N_i$ by 2L X matrix for season i
$Y_i$: the $N_i$-vector of results for season i
$\theta_i$: the 2L-vector of ratings and home advantages for season i

24
Details (cont.)

Data Distribution
For all i = 1, 2, ..., M

25
Details (cont.)

Prior Distribution

26
Details (finally, the end)

The posterior distribution of $\theta_M$ and $\sigma^{-2}$ is closed form and can be calculated by an iterative method
The predictive distribution for future results (transformed sum or difference) is straightforward: correlated normal (given the variance)
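
The slides do not show the model equations, so the following is only a sketch of the general idea: a season-by-season normal update in which last season's posterior, widened by a drift term, becomes this season's prior. It assumes a known error variance `sigma2` and a known drift covariance, whereas the talk treats the variance as unknown. All names and numbers are hypothetical.

```python
import numpy as np

def season_update(mean, cov, drift_cov, X, y, sigma2):
    """One season of a normal-normal update with known variance sigma2.

    mean, cov: posterior mean and covariance of last season's parameters.
    drift_cov: covariance of the year-to-year drift (assumed known here).
    X, y: this season's design matrix and results.
    """
    prior_mean = mean              # drift is assumed to have mean zero
    prior_cov = cov + drift_cov    # uncertainty grows between seasons
    prior_prec = np.linalg.inv(prior_cov)
    post_cov = np.linalg.inv(prior_prec + X.T @ X / sigma2)
    post_mean = post_cov @ (prior_prec @ prior_mean + X.T @ y / sigma2)
    return post_mean, post_cov

# Hypothetical two-team example with a single game won by team 0 by 6.
mean = np.zeros(2)
cov = 0.5 * np.eye(2)
drift_cov = 0.5 * np.eye(2)
X = np.array([[1.0, -1.0]])
y = np.array([6.0])
post_mean, post_cov = season_update(mean, cov, drift_cov, X, y, sigma2=1.0)
```

The posterior gap between the two teams comes out smaller than the observed margin because the prior pulls both ratings toward last season's values. That shrinkage is what addresses the overfitting issue raised earlier.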

27
Forecasts

For Scores
Simply untransform
$E[Z^2] = \mathrm{Var}(Z) + (E[Z])^2$
For the point-spread
Product of two normals
Simulate 10000 results
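
Both forecasting steps can be sketched in a few lines. The predictive means and covariance below are hypothetical placeholders for the values the fitted model would supply, and the function names are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def expected_score(mean_z, var_z):
    """E[Z^2] = Var(Z) + (E[Z])^2 for a transformed score Z."""
    return var_z + mean_z ** 2

def simulate_spread(mean_diff, mean_sum, cov, n=10_000):
    """Simulate point spreads as the product of two correlated normals.

    mean_diff, mean_sum: predictive means of the transformed difference
    and sum; cov: their 2x2 predictive covariance (assumed known here).
    """
    draws = rng.multivariate_normal([mean_diff, mean_sum], cov, size=n)
    return draws[:, 0] * draws[:, 1]

# Hypothetical predictive distribution on the transformed scale.
spreads = simulate_spread(0.8, 10.0, [[1.0, 0.0], [0.0, 1.0]])
home_win_prob = float((spreads > 0).mean())
```

The simulated spreads can be compared with a betting line in the same way, for example `(spreads > 6.5).mean()` for a 6.5-point favorite.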

28
Enhanced Model

Fit the prior parameters
Hierarchical models
Drifts and initial variances
No closed form for posterior and predictive
distributions (at least as far as I know)
The complete conditionals are straightforward, so Gibbs sampling will work (eventually)

29
Results (www.geocities.com/kambour/football.html)

2002 Final Rankings

Team            Rating          Home
Miami           72.23 (1.03)    0.21 (0.04)
Kansas St       72.04 (1.04)    0.44 (0.03)
USC             71.95 (1.03)    0.04 (0.03)
Oklahoma        71.85 (1.02)    0.18 (0.03)
Texas           71.57 (1.03)    0.36 (0.03)
Georgia         71.49 (1.03)    0.02 (0.03)
Alabama         71.45 (1.03)   -0.09 (0.03)
Iowa            71.30 (1.03)    0.21 (0.04)
Florida St      71.29 (1.02)    0.43 (0.03)
Virginia Tech   71.25 (1.03)    0.12 (0.03)
Ohio St         71.18 (1.03)    0.27 (0.03)

30
Results

2002 Final Rankings

Team            Rating    Home
Miami           72.23     0.21
Kansas St       72.04     0.44
USC             71.95     0.04
Oklahoma        71.85     0.18
Texas           71.57     0.36
Georgia         71.49     0.02
Alabama         71.45    -0.09
Iowa            71.30     0.21
Florida St      71.29     0.43
Virginia Tech   71.25     0.12
Ohio St         71.18     0.27

31
Results

2002 Final Rankings

Ohio St 17          Miami Fl (-13) 31     0.8255    0.5228
Washington St 21    Oklahoma (-6.5) 31    0.7347    0.5797
Iowa 21             USC (-6) 30           0.7174    0.5721
NC State (E) 20     Notre Dame 17         0.5639    0.5639
Florida St (4) 24   Georgia 27            0.5719    0.5320

33
2002 Final Record

Picking Winners
522 - 157 (0.769)
Against the Vegas lines
367 - 307 - 5 (0.544)
Best Bets
9 - 7 (0.563)
In 2001, 11 - 4

34
ESPN College Pickem (http://games.espn.go.com/cpickem/leader)

1. Barry Schultz 5830
2. Jim Dobbs 5687
3. Michael Reeves 5651
4. Fup Biz 5594
5. Joe 5587
6. Rising Cream 5562
7. Intelligence Ratings 5559

35
Ratings System Comparison (http://tbeck.freeshell.org/fb/awards2002.html)

Todd Beck
Ph.D. Statistician
Rush Institute
Intelligence Ratings: Best Predictors

36
College Football Conclusions

Can forecast the outcome of games
Capture the random nature
High variability
Sparse design
Scientists should avoid BCS
Statistical significance is impossible
Problem Complexity
Other issues

37
NFL

Similar to College Football
Square root transform is applicable
Drift is a little higher than College Football
Better design matrix
Small sample size
Playoff

38
NFL Results (www.geocities.com/kambour/NFL.html)

2002 Final Rankings (after the Super Bowl)

Team            Rating    Home
Tampa Bay       70.72     0.29
Oakland         70.57     0.28
Philadelphia    70.55     0.10
New England     70.16     0.12
Atlanta         70.13     0.20
NY Jets         70.10    -0.01
Pittsburgh      69.95     0.28
Green Bay       69.92     0.28
Kansas City     69.90     0.51
Denver          69.89     0.50
Miami           69.89     0.49

39
2002 Final NFL Record

Picking Winners
162 - 104 - 1 (0.609)
Against the Vegas lines
135 - 128 - 4 (0.513)
Best Bets
9 - 8 (0.529)

40
NFL Europe

Similar to College and NFL
Square root transform
Dramatic drift
Teams change dramatically in mid-season
Few teams
Better design matrix

41
College Basketball

Transform?
Much more normal (Central Limit Theorem)
A lot more games
Intersectional games
Less emphasis on programs than in College
Football
More drift
NCAA tournament

42
NCAA Basketball Pre-tournament Ratings

Team            Rating    Home
Arizona         100.06    3.97
Kentucky         99.33    4.32
Kansas           95.89    3.85
Texas            93.42    4.44
Duke             92.90    4.66
Oklahoma         90.19    4.31
Florida          90.65    3.99
Wake Forest      88.70    3.65
Syracuse         88.50    3.49
Xavier           87.89    3.37
Louisville       87.88    4.16

43
NBA

Similar to College Basketball
Normal: no transformation
A lot more games, fewer teams
Playoffs are completely different from regular
season
Regular season very balanced, strong home court
Post season less balanced, home court lessened

44
Hockey

Transform
Rare events: Poisson-ish
Square root with k around 1
A lot more games
History matters
Playoffs seem similar to regular season
Balance

45
Soccer

Similar to hockey
Transform
Square root with low k
Not a lot of games
Friendlies versus cup play
Home pitch is pronounced
Varies widely

46
Soccer Results

Correctly forecasted 2002 World Cup final
Brazil over Germany
Correctly forecasted US run to quarter-finals
Won the PROS World Cup Soccer Pool

47
Future Enhancements

Hierarchical Approaches
Conferences
More complicated drift models
Correlations
Individual drifts
Drift during the season
Mean correcting drift
More informative priors
