1
An Intelligence Approach to Evaluation of Sports
Teams
by Edward Kambour, Ph.D.
1
2
Agenda
  1. College Football
  2. Linear Model
  3. Generalized Linear Model
  4. Intelligence (Bayesian) Approach
  5. Results
  6. Other Sports
  7. Future Work

3
General Background
  • Goals
  • Forecast winners of future games
  • Beat the Bookie!
  • Estimate the outcome of unscheduled games
  • What's the probability that Iowa would have
    beaten Ohio St?
  • Generate reasonable rankings

4
Major College Football
  • No playoff system
  • Computer rankings are an element of the BCS
  • 114 teams
  • 12 games for each in a season

5
Linear Model
  • Rothman (1970s), Harville (1977), Stefani
    (1977), ..., Kambour (1991), ..., Sagarin???
  • Response, Y, is the net result (point-spread)
  • Parameter, $\theta$, is the vector of ratings
  • For a game involving teams i and j,
  • $E[Y] = \theta_i - \theta_j$

6
Linear Model (cont.)
  • Let X be a row vector with +1 in column i, -1 in
    column j, and 0 elsewhere
  • $E[Y] = X\theta$
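As a rough illustration of the row-vector encoding, the sketch below builds X for one game and evaluates X times the rating vector. It assumes the usual +1/-1 coding (+1 for team i, -1 for team j). The function names and toy ratings are hypothetical and do not come from the presentation.

```python
def design_row(i, j, n_teams):
    """Row of the design matrix for a game between teams i and j.

    Assumed coding: +1 in column i, -1 in column j, 0 elsewhere.
    """
    row = [0.0] * n_teams
    row[i] = 1.0
    row[j] = -1.0
    return row


def expected_margin(row, ratings):
    """E[Y] = X * theta for a single game (a dot product)."""
    return sum(x * r for x, r in zip(row, ratings))


# Toy ratings for three hypothetical teams.
ratings = [10.0, 3.0, 7.0]
row = design_row(0, 2, n_teams=3)
margin = expected_margin(row, ratings)  # rating of team 0 minus rating of team 2
```

Stacking one such row per game gives the full design matrix for a season.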

7
Regression Model Notes
  • Least Squares relies on Normality, Homogeneity
  • College Football
  • Estimate more than 100 parameters (one rating
    per team)
  • Sample size for a full season is about 600
  • Design Matrix is sparse and not full rank
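The rank deficiency is easy to see: adding a constant to every rating leaves every predicted margin unchanged, so some constraint is needed. The sketch below is one hedged way to fit ratings by least squares on a toy schedule, using Gauss-Seidel sweeps on the normal equations and re-centring the ratings to sum to zero. The schedule, the function name, and the sum-to-zero constraint are assumptions for illustration, not the fitting method used in the presentation.

```python
def fit_ratings(games, n_teams, sweeps=200):
    """Least-squares ratings from (i, j, margin) games, margin = score_i - score_j.

    Each sweep sets a team's rating to the average of (opponent rating +
    signed margin) over its games, which is the normal equation for that team.
    Ratings are re-centred to sum to zero because the design is not full rank.
    """
    ratings = [0.0] * n_teams
    for _ in range(sweeps):
        for t in range(n_teams):
            total, count = 0.0, 0
            for i, j, margin in games:
                if i == t:
                    total += ratings[j] + margin
                    count += 1
                elif j == t:
                    total += ratings[i] - margin
                    count += 1
            if count:
                ratings[t] = total / count
        mean = sum(ratings) / n_teams
        ratings = [r - mean for r in ratings]
    return ratings


# Toy schedule whose margins are exactly consistent with ratings (7, 0, -7).
games = [(0, 1, 7.0), (1, 2, 7.0), (0, 2, 14.0)]
ratings = fit_ratings(games, n_teams=3)
```

With real results the margins are noisy and the schedule is sparse, so the fitted ratings carry much more uncertainty than this toy case suggests.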

8
Home-field Advantage
  • Generic Advantage (Stefani, 1980)
  • Force i to be home team and j the visiting team
  • Add an intercept term to X
  • Adds one more parameter to estimate
  • UAB Alabama
  • Rice Texas A&M
  • Team Specific Advantage
  • Doubles the number of parameters to estimate
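A minimal sketch of the team-specific version follows, assuming the parameter vector holds the ratings first and the home advantages second, so that each row has twice as many columns as there are teams. The column layout and the function name are hypothetical. The generic version would instead append a single intercept column shared by all home teams.

```python
def design_row_home(home, visitor, n_teams):
    """Row with 2 * n_teams columns: ratings first, then home advantages.

    Assumed layout: +1 / -1 in the rating columns for home / visitor, and +1
    in the home-advantage column that belongs to the home team.
    """
    row = [0.0] * (2 * n_teams)
    row[home] = 1.0
    row[visitor] = -1.0
    row[n_teams + home] = 1.0
    return row


# Team 1 hosts team 0 in a hypothetical three-team league.
row = design_row_home(home=1, visitor=0, n_teams=3)
```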

9
Linear Model Issues
  • Normality
  • Homogeneity
  • Lots of parameters, with relatively small sample
    size
  • Overfitting
  • The bookie takes you to the cleaners!

10
Linear Model Issues (cont.)
  • Should we model point differential?
  • A and B play twice
  • A by 34 in first, B by 14 in the second
  • A by 10 each time
  • Running up the score (or lack thereof)
  • BCS: Thou shalt not use margin of victory in thy
    ratings!
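The two-game example can be checked with a line of arithmetic. If A and B only play each other and there is no home-field term, the least-squares estimate of the rating difference is just the mean margin, so both scenarios look identical to the linear model. The variable names below are illustrative only.

```python
def mean_margin(margins):
    """Average margin, which is the least-squares rating gap for two teams."""
    return sum(margins) / len(margins)


# Margins from A's point of view (positive means A won).
split_series = [34, -14]  # A by 34, then B by 14
sweep_series = [10, 10]   # A by 10 each time

split_avg = mean_margin(split_series)
sweep_avg = mean_margin(sweep_series)
```

Both averages come out to 10 points, even though A lost a game in one scenario and swept the series in the other.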

11
Logistic Regression
  • Rothman (1970s)
  • Linear Model
  • Use binary variable
  • Winning is all that matters
  • Avoid margin of victory
  • Coin Flips
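A hedged sketch of the win/loss version: each game is treated as a coin flip whose probability depends on the rating difference through the logistic function. The function names and the exact parameterisation are assumptions, since the slide does not spell them out.

```python
import math


def win_probability(rating_i, rating_j):
    """P(team i beats team j) under a logistic model on the rating difference."""
    return 1.0 / (1.0 + math.exp(-(rating_i - rating_j)))


def log_likelihood(games, ratings):
    """Bernoulli log-likelihood; games are (i, j, i_won) with i_won in {0, 1}."""
    total = 0.0
    for i, j, i_won in games:
        p = win_probability(ratings[i], ratings[j])
        total += math.log(p) if i_won else math.log(1.0 - p)
    return total


even_game = win_probability(1.0, 1.0)
favourite = win_probability(2.0, 0.0)
```

Only the sign of each result enters the likelihood, so a one-point win and a blowout contribute exactly the same information.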

12
Logistic Regression Issues
  • Still have sample size issues
  • Throw away a lot of information
  • Undefeated teams

13
Transformations
  • Transform the differentials to normality
  • Power transformations
  • Rothman logistic transform
  • Transforms points to probabilities for logistic
    regression
  • Diminishing returns transforms
  • Downweights runaway scores

14
Power Transforms
  • Transform the point-spread
  • $Y = \operatorname{sign}(Z)\,|Z|^a$
  • a = 1 ⇒ straight margin of victory
  • a = 0 ⇒ just win, baby
  • a 0 ? Poisson or Gamma ish
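The power transform on this slide can be sketched as follows, reading it as sign(Z) times |Z| raised to the power a. Taking the absolute value is an assumption made here so that fractional powers are defined for negative spreads, and the function name is hypothetical.

```python
import math


def power_transform(z, a):
    """sign(z) * |z|**a for a point spread z and a power a >= 0."""
    if z == 0:
        return 0.0  # a tie stays at zero for every power
    return math.copysign(abs(z) ** a, z)


straight = power_transform(-14, 1)    # the margin itself
just_win = power_transform(-14, 0)    # only the sign survives
damped = power_transform(9, 0.5)      # square root shrinks large margins
```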

15
Maximum Likelihood Transform
  • 1995-2002 seasons
  • MLE: a = 0.98

Power   -2 ln(likelihood)
0.1     52487
0.3     41213
0.5     35128
0.67    32597
0.8     31418
1       31193

16
Predicting the Score
  • Model point differential
  • $Y_1 = S_i - S_j$
  • Additionally model the sum of the points scored
  • $Y_2 = S_i + S_j$
  • Fit a similar linear model (different parameter
    estimates)
  • Forecast home and visitors score
  • $H = (Y_1 + Y_2)/2$, $V = (Y_2 - Y_1)/2$
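A small worked example of recovering the two scores from a forecast difference and a forecast sum. The numbers are made up for illustration, and the function name is hypothetical.

```python
def forecast_scores(y1, y2):
    """Recover home and visitor scores from the difference y1 and the sum y2."""
    home = (y1 + y2) / 2
    visitor = (y2 - y1) / 2
    return home, visitor


# A forecast of "home by 7" with 45 total points.
home, visitor = forecast_scores(y1=7.0, y2=45.0)
```

Here the home forecast is 26 and the visitor forecast is 19, which differ by 7 and add up to 45.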

17
Another Transformation Idea
  • Scores (touchdowns or field goals) are arrivals,
    maybe Poisson
  • Final score ≈ 7 times a Poisson + 3 times a
    Poisson
  • Transform the scores to homogeneity and normality
    first
  • The differences (and sums) should follow suit
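To see why a transform is needed, the sketch below computes the mean and variance of a score made of 7 points per touchdown and 3 per field goal. It assumes the two counts are independent Poisson variables, which is a simplification, and the rates are invented for the example.

```python
def score_mean_and_variance(td_rate, fg_rate):
    """Mean and variance of 7 * Poisson(td_rate) + 3 * Poisson(fg_rate).

    Assumes the touchdown and field-goal counts are independent.
    """
    mean = 7 * td_rate + 3 * fg_rate
    variance = 49 * td_rate + 9 * fg_rate
    return mean, variance


strong_offense = score_mean_and_variance(td_rate=3.0, fg_rate=2.0)
weak_offense = score_mean_and_variance(td_rate=1.5, fg_rate=1.0)
```

The variance rises with the mean, so raw scores are not homogeneous across teams of different strength.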

18
Square Root Transform
  • Since the score is similar to a linear
    combination of Poissons, square root should work
  • Transformation
  • Why k?
  • For small Poisson arrival rates, get better
    performance (Anscombe, 1948)

19
Likelihood Test
  • LRT: No transformation vs. square root with
    fitted k
  • Used College Football results from 1995-2002
  • k = 21
  • Transformation was significantly better
  • p-value = 0.0023, chi-square = 9.26
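The reported p-value can be reproduced from the chi-square statistic if the test has one degree of freedom, which is an assumption here (one extra fitted parameter, k). The sketch uses only the standard library and the identity between a chi-square with one degree of freedom and a squared standard normal.

```python
import math


def chi_square_1df_pvalue(statistic):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    Uses P(chi2_1 > x) = P(|N(0, 1)| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(statistic / 2.0))


p_value = chi_square_1df_pvalue(9.26)
```

The result agrees with the 0.0023 on the slide to the precision shown.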

20
Predicting the Score with Transform
  • Model point differential
  • Additionally model the sum of the points scored
  • Forecast home and visitors score
  • $H = ((Y_1 + Y_2)/2)^2$, $V = ((Y_2 - Y_1)/2)^2$
  • Note the point differential $H - V$ is the product
    $Y_1 Y_2$
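The product identity can be checked numerically. In this sketch Y1 and Y2 are the difference and sum on the square-root scale, and the back-transform is a plain square as written on the slide; any offset k is ignored, which is an assumption. The function name and the inputs are illustrative.

```python
def forecast_scores_sqrt(y1, y2):
    """Square the half-sum and half-difference to get back to points."""
    home = ((y1 + y2) / 2) ** 2
    visitor = ((y2 - y1) / 2) ** 2
    return home, visitor


y1, y2 = 1.0, 9.0
home, visitor = forecast_scores_sqrt(y1, y2)
point_differential = home - visitor
```

With these inputs the scores are 25 and 16, and their difference of 9 equals the product of Y1 and Y2.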

21
Unresolved Linear Model Issues
  • Overfitting
  • History
  • Going into the season, we have a good idea as to
    how teams will do
  • The best teams tend to stay the best
  • The worst teams tend to stay the worst
  • Changes happen
  • Kansas State

22
Intelligence Model
  • Concept
  • The ratings and home-field advantages for year t
    are similar to those of year t-1. There is some
    drift from one year to the next.
  • Model
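The drift idea can be illustrated with a single rating. The sketch below assumes a random-walk drift between seasons and a normal observation with known variance, which is far simpler than the multivariate model with unknown variance described in the presentation. All names and numbers are hypothetical.

```python
def drift_prior(mean, variance, drift_variance):
    """Carry last season's posterior forward: same mean, inflated variance."""
    return mean, variance + drift_variance


def update(prior_mean, prior_var, observation, obs_var):
    """Conjugate normal update with a known observation variance."""
    precision = 1.0 / prior_var + 1.0 / obs_var
    post_var = 1.0 / precision
    post_mean = post_var * (prior_mean / prior_var + observation / obs_var)
    return post_mean, post_var


# Last season's belief about one rating, then drift, then one new result.
m, v = drift_prior(mean=0.0, variance=1.0, drift_variance=1.0)
m, v = update(m, v, observation=3.0, obs_var=2.0)
```

A larger drift variance makes last season count for less, which is how such a model would let a team's rating move quickly when results demand it.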

23
Intelligence Model (Details)
  • Notation
  • L teams
  • M seasons of data
  • $N_i$ games in the i-th season
  • $X_i$: the $N_i$ by 2L design matrix for season i
  • $Y_i$: the $N_i$-vector of results for season i
  • $\theta_i$: the 2L-vector of ratings and home
    advantages for season i

24
Details (cont.)
  • Data Distribution
  • For all i = 1, 2, ..., M

25
Details (cont.)
  • Prior Distribution

26
Details (finally, the end)
  • The Posterior Distribution of $\theta_M$ and
    $\sigma^{-2}$ is closed form and can be calculated
    by an iterative method
  • The Predictive Distribution for future results
    (transformed sum or difference) is
    straightforward correlated normal (given the
    variance)

27
Forecasts
  • For Scores
  • Simply untransform
  • $E[Z^2] = \mathrm{Var}[Z] + (E[Z])^2$
  • For the point-spread
  • Product of two normals
  • Simulate 10000 results
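Both forecasting steps can be sketched in a few lines. The score forecast uses the second-moment identity, and the point-spread forecast simulates the product of two normals 10000 times. The two normals are drawn independently here, which is a simplification of the correlated predictive distribution, and the means and standard deviations are invented.

```python
import random


def second_moment(mean, variance):
    """E[Z^2] = Var[Z] + E[Z]^2, the expected score when the score is Z^2."""
    return variance + mean ** 2


def simulate_spread(mean1, sd1, mean2, sd2, n=10000, seed=0):
    """Draw n point spreads as products Y1 * Y2 of two normals.

    Y1 and Y2 are drawn independently, which is a simplifying assumption.
    """
    rng = random.Random(seed)
    return [rng.gauss(mean1, sd1) * rng.gauss(mean2, sd2) for _ in range(n)]


expected_score = second_moment(mean=5.0, variance=1.0)
spreads = simulate_spread(mean1=0.0, sd1=1.0, mean2=9.0, sd2=1.0)
home_win_prob = sum(s > 0 for s in spreads) / len(spreads)
```

The same simulated spreads can be compared with a betting line to estimate the chance of covering it.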

28
Enhanced Model
  • Fit the prior parameters
  • Hierarchical models
  • Drifts and initial variances
  • No closed form for posterior and predictive
    distributions (at least as far as I know)
  • The complete conditionals are straight-forward,
    so Gibbs sampling will work (eventually)
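As a reminder of how Gibbs sampling uses complete conditionals, here is a toy sampler for the mean and precision of normal data under the prior proportional to 1/tau. It is a generic textbook example, not the hierarchical ratings model from the presentation, and the data are made-up margins.

```python
import random


def gibbs_normal(data, n_iter=5000, burn_in=1000, seed=0):
    """Gibbs sampler for a normal mean and precision under the prior 1/tau.

    Complete conditionals:
      mu  | tau, data ~ Normal(ybar, 1 / (n * tau))
      tau | mu,  data ~ Gamma(shape=n / 2, rate=sum((y - mu)^2) / 2)
    """
    rng = random.Random(seed)
    n = len(data)
    ybar = sum(data) / n
    mu, tau = ybar, 1.0
    draws = []
    for it in range(n_iter):
        mu = rng.gauss(ybar, (1.0 / (n * tau)) ** 0.5)
        rate = sum((y - mu) ** 2 for y in data) / 2.0
        tau = rng.gammavariate(n / 2.0, 1.0 / rate)  # second argument is a scale
        if it >= burn_in:
            draws.append((mu, tau))
    return draws


margins = [3, -7, 10, 14, -3, 21, 7, 0, -10, 17]  # hypothetical results
draws = gibbs_normal(margins)
posterior_mean_mu = sum(mu for mu, _ in draws) / len(draws)
```

Each pass draws one parameter given the current value of the other, which is the pattern a sampler for the enhanced model would follow with more blocks.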

29
Results (www.geocities.com/kambour/football.html)
  • 2002 Final Rankings

Team            Rating          Home
Miami           72.23 (1.03)    0.21 (0.04)
Kansas St       72.04 (1.04)    0.44 (0.03)
USC             71.95 (1.03)    0.04 (0.03)
Oklahoma        71.85 (1.02)    0.18 (0.03)
Texas           71.57 (1.03)    0.36 (0.03)
Georgia         71.49 (1.03)    0.02 (0.03)
Alabama         71.45 (1.03)    -0.09 (0.03)
Iowa            71.30 (1.03)    0.21 (0.04)
Florida St      71.29 (1.02)    0.43 (0.03)
Virginia Tech   71.25 (1.03)    0.12 (0.03)
Ohio St         71.18 (1.03)    0.27 (0.03)

30
Results
  • 2002 Final Rankings

Team            Rating   Home
Miami           72.23    0.21
Kansas St       72.04    0.44
USC             71.95    0.04
Oklahoma        71.85    0.18
Texas           71.57    0.36
Georgia         71.49    0.02
Alabama         71.45    -0.09
Iowa            71.30    0.21
Florida St      71.29    0.43
Virginia Tech   71.25    0.12
Ohio St         71.18    0.27
32
Bowl Predictions

Ohio St          17   Miami Fl (-13)    31   0.8255   0.5228
Washington St    21   Oklahoma (-6.5)   31   0.7347   0.5797
Iowa             21   USC (-6)          30   0.7174   0.5721
NC State (E)     20   Notre Dame        17   0.5639   0.5639
Florida St (4)   24   Georgia           27   0.5719   0.5320

33
2002 Final Record
  • Picking Winners
  • 522-157 (0.769)
  • Against the Vegas lines
  • 367-307-5 (0.544)
  • Best Bets
  • 9-7 (0.563)
  • In 2001, 11-4
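The fractions on this slide can be recomputed from the win, loss, and push counts. Counting each push as half a win is an assumption made here; it happens to reproduce the 0.544 shown against the Vegas lines. The function name is hypothetical.

```python
def win_fraction(wins, losses, pushes=0):
    """Winning fraction, counting each push (tie) as half a win."""
    return (wins + 0.5 * pushes) / (wins + losses + pushes)


straight_up = win_fraction(522, 157)
against_spread = win_fraction(367, 307, pushes=5)
best_bets = win_fraction(9, 7)
```

For context, a bettor laying the common 11-to-10 odds needs to win more than 11 of every 21 bets, about 52.4 percent, to break even.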

34
ESPN College Pickem (http://games.espn.go.com/cpickem/leader)
  • 1. Barry Schultz 5830
  • 2. Jim Dobbs 5687
  • 3. Michael Reeves 5651
  • 4. Fup Biz 5594
  • 5. Joe 5587
  • 6. Rising Cream 5562
  • 7. Intelligence Ratings 5559

35
Ratings System Comparison (http://tbeck.freeshell.org/fb/awards2002.html)
  • Todd Beck
  • Ph.D. Statistician
  • Rush Institute
  • Intelligence Ratings: Best Predictors

36
College Football Conclusions
  • Can forecast the outcome of games
  • Capture the random nature
  • High variability
  • Sparse design
  • Scientists should avoid BCS
  • Statistical significance is impossible
  • Problem Complexity
  • Other issues

37
NFL
  • Similar to College Football
  • Square root transform is applicable
  • Drift is a little higher than College Football
  • Better design matrix
  • Small sample size
  • Playoff

38
NFL Results (www.geocities.com/kambour/NFL.html)
  • 2002 Final Rankings (after the Super Bowl)

Team           Rating   Home
Tampa Bay      70.72    0.29
Oakland        70.57    0.28
Philadelphia   70.55    0.10
New England    70.16    0.12
Atlanta        70.13    0.20
NY Jets        70.10    -0.01
Pittsburgh     69.95    0.28
Green Bay      69.92    0.28
Kansas City    69.90    0.51
Denver         69.89    0.50
Miami          69.89    0.49

39
2002 Final NFL Record
  • Picking Winners
  • 162-104-1 (0.609)
  • Against the Vegas lines
  • 135-128-4 (0.513)
  • Best Bets
  • 9-8 (0.529)

40
NFL Europe
  • Similar to College and NFL
  • Square root transform
  • Dramatic drift
  • Teams change dramatically in mid-season
  • Few teams
  • Better design matrix

41
College Basketball
  • Transform?
  • Much more normal (Central Limit Theorem)
  • A lot more games
  • Intersectional games
  • Less emphasis on programs than in College
    Football
  • More drift
  • NCAA tournament

42
NCAA Basketball Pre-tournament Ratings

Team          Rating   Home
Arizona       100.06   3.97
Kentucky      99.33    4.32
Kansas        95.89    3.85
Texas         93.42    4.44
Duke          92.90    4.66
Oklahoma      90.19    4.31
Florida       90.65    3.99
Wake Forest   88.70    3.65
Syracuse      88.50    3.49
Xavier        87.89    3.37
Louisville    87.88    4.16

43
NBA
  • Similar to College Basketball
  • Normal: no transformation
  • A lot more games, fewer teams
  • Playoffs are completely different from regular
    season
  • Regular season very balanced, strong home court
  • Post season less balanced, home court lessened

44
Hockey
  • Transform
  • Rare events: Poisson-ish
  • Square root with k around 1
  • A lot more games
  • History matters
  • Playoffs seem similar to regular season
  • Balance

45
Soccer
  • Similar to hockey
  • Transform
  • Square root with low k
  • Not a lot of games
  • Friendlies versus cup play
  • Home pitch is pronounced
  • Varies widely

46
Soccer Results
  • Correctly forecasted 2002 World Cup final
  • Brazil over Germany
  • Correctly forecasted US run to quarter-finals
  • Won the PROS World Cup Soccer Pool

47
Future Enhancements
  • Hierarchical Approaches
  • Conferences
  • More complicated drift models
  • Correlations
  • Individual drifts
  • Drift during the season
  • Mean correcting drift
  • More informative priors