Chapter 4 Prediction and Bayesian Inference


  • 4.1 Estimators versus predictors
  • 4.2 Prediction for one-way ANOVA models
  • Shrinkage estimation, types of predictions
  • 4.3 Best linear unbiased predictors (BLUPs)
  • 4.4 Mixed model predictors
  • 4.5 Bayesian inference
  • 4.6 Case study: Forecasting lottery sales
  • 4.7 Credibility Theory
  • Appendix 4A Linear unbiased predictors

4.1 Estimators versus predictors
  • In the longitudinal data model, $y_{it} = z_{it}' \alpha_i +
    x_{it}' \beta + \varepsilon_{it}$, the variables $\alpha_i$ describe
    subject-specific effects.
  • Given the data $\{y_{it}, z_{it}, x_{it}\}$, in some problems
    it is of interest to summarize subject effects.
  • We have discussed how to estimate fixed, unknown
    parameters.
  • It is also of interest to summarize
    subject-specific effects, such as those described
    by the random variable $\alpha_i$.
  • Predictors are estimators of random variables.
  • Like estimators, predictors are said to be linear
    if they are formed from a linear combination of
    the response y.

Applications of prediction
  • In animal and plant breeding, one wishes to
    predict the production of milk for cows based on
    (1) their lineage (random) and (2) herds (fixed)
  • In credibility theory, one wishes to predict
    expected claims for a policyholder given exposure
    to several risk factors
  • In sample surveys, one wishes to predict the size
    of a specific age-sex-race cohort within a small
    geographical area (known as small area estimation).
  • In a survey article, Robinson (1991) also cites
    (1) ore reserve estimation in geological surveys,
    (2) measuring quality of a production plan and
    (3) ranking baseball players' abilities.

4.2. Prediction for one-way ANOVA models
  • Consider the traditional one-way random effects
    ANOVA (analysis of variance) model
  • $y_{it} = \mu_\alpha + \alpha_i + \varepsilon_{it}$.
  • Suppose that we wish to summarize the
    subject-specific conditional mean, $\mu_\alpha + \alpha_i$.
  • For contrast, first consider using the fixed
    effects model with $\mu_\alpha = 0$.
  • Here, we have that $\bar{y}_i$ is the best
    (Gauss-Markov) estimate of $\alpha_i$.
  • This estimate is unbiased, that is, $E\,\bar{y}_i = \alpha_i$.
  • This estimate has minimum variance among all
    linear unbiased estimators (BLUE).

Shrinkage estimator
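  • We now use the one-way random effects model.
  • Consider an estimator of $\mu_\alpha + \alpha_i$ that is a
    linear combination of $\bar{y}_i$ and $\bar{y}$, that is,
  • $c_1 \bar{y}_i + c_2 \bar{y}$, for constants $c_1$ and $c_2$.
  • Calculations show that the best values of $c_1$ and
    $c_2$, those that minimize the mean square error
    $E\,(c_1 \bar{y}_i + c_2 \bar{y} - (\mu_\alpha + \alpha_i))^2$,
    satisfy $c_2 = 1 - c_1$, with $c_1$ determined by the
    variance components and the sample sizes.
  • For large n, we have the shrinkage estimator, or
    predictor, of $\mu_\alpha + \alpha_i$ to be
    $\bar{y}_{i,s} = \zeta_i \bar{y}_i + (1 - \zeta_i) \bar{y}$, where
    $\zeta_i = T_i / (T_i + \sigma_\varepsilon^2 / \sigma_\alpha^2)$.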

Example of shrinkage estimator: Hypothetical Run
Times for Three Machines
  • Machine   Run Times        Average Run Time
  • 1         14, 12, 10, 12   $\bar{y}_1 = 12$
  • 2         9, 16, 15, 12    $\bar{y}_2 = 13$
  • 3         8, 10, 7, 7      $\bar{y}_3 = 8$
  • Notation: $y_{ij}$ means the jth run from the ith
    machine.
  • For example, $y_{21} = 9$ and $y_{23} = 15$.
  • Are there real differences among machines?

Example - Continued
  • To see the shrinkage effect, consider Figure 4.1.
  • Figure 4.1: Comparison of Subject-Specific Means
    to Shrinkage Estimators.

More on shrinkage estimators
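  • Under the random effects model, $\bar{y}_i$ is an
    unbiased predictor of $\mu_\alpha + \alpha_i$ in the sense that
    $E\,(\bar{y}_i - (\mu_\alpha + \alpha_i)) = 0$.
  • However, $\bar{y}_i$ is inefficient in the sense that
    $\bar{y}_{i,s}$ has a smaller mean square error than $\bar{y}_i$.
  • Here, $\bar{y}_{i,s}$ has been shrunk towards the stable
    estimator $\bar{y}$.
  • The estimator $\bar{y}_{i,s}$ is said to borrow strength
    from the stable estimator $\bar{y}$.
  • Recall $\zeta_i = T_i / (T_i + \sigma_\varepsilon^2 / \sigma_\alpha^2)$.
  • Note that $\zeta_i \to 1$ as either (i) $T_i \to \infty$ or
    (ii) $\sigma_\alpha^2 / \sigma_\varepsilon^2 \to \infty$.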
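
To make the shrinkage weight concrete, the following minimal Python sketch applies $\bar{y}_{i,s} = \zeta_i \bar{y}_i + (1 - \zeta_i) \bar{y}$, with $\zeta_i = T_i / (T_i + \sigma_\varepsilon^2 / \sigma_\alpha^2)$, to the three-machine run times. The variance components are assumed values chosen only for illustration (the slides do not give them), and the function names are hypothetical.

```python
# Hypothetical run times from the example (three machines, four runs each).
run_times = {
    1: [14, 12, 10, 12],
    2: [9, 16, 15, 12],
    3: [8, 10, 7, 7],
}

# ASSUMED variance components (not from the source): the ratio
# sigma_eps^2 / sigma_alpha^2 = 4 gives zeta = 0.5 when T_i = 4.
SIGMA2_ALPHA = 1.0
SIGMA2_EPS = 4.0


def shrinkage_weight(t_i, sigma2_alpha, sigma2_eps):
    """zeta_i = T_i / (T_i + sigma_eps^2 / sigma_alpha^2)."""
    return t_i / (t_i + sigma2_eps / sigma2_alpha)


def shrinkage_estimates(data, sigma2_alpha, sigma2_eps):
    """Return {subject: (ybar_i, zeta_i, shrunk mean)}, shrinking toward the overall mean."""
    all_obs = [y for ys in data.values() for y in ys]
    grand_mean = sum(all_obs) / len(all_obs)
    out = {}
    for i, ys in data.items():
        ybar_i = sum(ys) / len(ys)
        zeta_i = shrinkage_weight(len(ys), sigma2_alpha, sigma2_eps)
        out[i] = (ybar_i, zeta_i, zeta_i * ybar_i + (1 - zeta_i) * grand_mean)
    return out


estimates = shrinkage_estimates(run_times, SIGMA2_ALPHA, SIGMA2_EPS)
for i, (ybar_i, zeta_i, shrunk) in estimates.items():
    print(f"machine {i}: mean={ybar_i:.2f} zeta={zeta_i:.2f} shrunk={shrunk:.2f}")
```

Under these assumed values each machine mean moves halfway toward the overall mean of 11. A larger variance ratio $\sigma_\alpha^2 / \sigma_\varepsilon^2$ or more runs per machine would push $\zeta_i$ toward one and leave the means closer to where they started.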

Best predictors
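  • From Section 3.1, it is easy to check that the
    generalized least squares estimator of $\mu_\alpha$ is
    $m_{\alpha,GLS} = \sum_i \zeta_i \bar{y}_i / \sum_i \zeta_i$.
  • The linear predictor of $\mu_\alpha + \alpha_i$ that has minimum
    variance is $\bar{y}_{i,BLUP} = \zeta_i \bar{y}_i + (1 - \zeta_i)
    m_{\alpha,GLS}$.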
  • Here, the acronym BLUP stands for best linear
    unbiased predictor.

Types of Predictors
  • We have now introduced the BLUP of $\mu_\alpha + \alpha_i$. This
    quantity is a linear combination of global
    parameters and subject-specific effects.
  • Two other types of predictors are of interest.
  • Residuals. Here, we wish to predict $\varepsilon_{it}$. The
    BLUP residual turns out to be
    $e_{it,BLUP} = y_{it} - \bar{y}_{i,BLUP}$.
  • Forecasts. Here, we wish to predict, for L lead
    time units into the future, $y_{i,T_i+L}$.
  • Without serial correlation, the predictor is the
    same as the predictor of $\mu_\alpha + \alpha_i$. However, we
    will see that the mean square error turns out to
    be larger.

4.3 Best linear unbiased predictors
  • This section develops best linear unbiased
    predictors in the context of mixed linear models,
    then specializes the consideration to
    longitudinal data mixed models.
  • BLUPs are developed by examining the minimum mean
    square error predictor of a random variable, w.
  • We give a development due to Harville (1976).
  • The argument is originally due to Goldberger
    (1962), who coined the phrase best linear
    unbiased predictor.
  • The acronym was first used by Henderson (1973).
  • BLUPs can also be developed as conditional
    expectations using multivariate normality.
  • BLUPs can also be developed in a Bayesian context.

Mixed linear models
  • Suppose that we observe an $N \times 1$ random vector $y$
    with mean $E\,y = X\beta$ and variance $\mathrm{Var}\,y = V$.
  • We wish to predict a random variable $w$ that has
    mean $E\,w = \lambda'\beta$ and $\mathrm{Var}\,w = \sigma_w^2$.
  • Denote the covariance between $w$ and $y$ as the
    $N \times 1$ vector $\mathrm{cov}_{wy}$.
  • Assuming known regression parameters ($\beta$), the
    best linear (in $y$) predictor of $w$ is
  • $w^* = E\,w + \mathrm{cov}_{wy}' V^{-1}(y - E\,y) = \lambda'\beta +
    \mathrm{cov}_{wy}' V^{-1}(y - X\beta)$.
  • If $w, y$ are multivariate normal, then $w^*$ equals
    $E\,(w \mid y)$ and hence is a minimum mean square
    predictor of $w$.
  • The predictor $w^*$ is also a minimum mean square
    predictor of $w$ without the assumption of
    normality. See Appendix 4A.1.

BLUPs as predictors
  • To develop the BLUP,
  • define $b_{GLS} = (X' V^{-1} X)^{-1} X' V^{-1} y$ to be the
    generalized least squares (GLS) estimator of $\beta$.
  • This is the best linear unbiased estimator (BLUE).
  • Replace $\beta$ by $b_{GLS}$ in the best linear predictor of
    $w$ to get the BLUP
  • $w_{BLUP} = \lambda' b_{GLS} + \mathrm{cov}_{wy}' V^{-1}(y - X b_{GLS})$
  • $= (\lambda' - \mathrm{cov}_{wy}' V^{-1} X)\, b_{GLS} + \mathrm{cov}_{wy}' V^{-1} y$.
  • See Appendix 4A.2 for a check, establishing $w_{BLUP}$
    as the best linear unbiased predictor of $w$.
  • From Appendix 4A.3, we also have the form for the
    minimum mean square error
  • $\mathrm{Var}(w_{BLUP} - w) = (\lambda' - \mathrm{cov}_{wy}' V^{-1} X)
    (X' V^{-1} X)^{-1} (\lambda' - \mathrm{cov}_{wy}' V^{-1} X)'$
  • $\quad - \mathrm{cov}_{wy}' V^{-1} \mathrm{cov}_{wy} + \sigma_w^2$.
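
The BLUP and its mean square error can be computed directly from these matrix expressions. The sketch below is one possible NumPy implementation, not a reference one: the function name `blup` and its argument names are hypothetical, and the check at the end uses the three-machine data from Section 4.2 with assumed variance components ($\sigma_\alpha^2 = 1$, $\sigma_\varepsilon^2 = 4$) that do not come from the text.

```python
import numpy as np


def blup(X, V, y, lam, cov_wy, var_w):
    """BLUP of w and its mean square error in the mixed linear model.

    X: (N, K) design matrix, V: (N, N) Var y, y: (N,) responses,
    lam: (K,) with E w = lam' beta, cov_wy: (N,) Cov(w, y), var_w: Var w.
    """
    Vinv_X = np.linalg.solve(V, X)
    Vinv_y = np.linalg.solve(V, y)
    Vinv_c = np.linalg.solve(V, cov_wy)
    XtVinvX = X.T @ Vinv_X
    b_gls = np.linalg.solve(XtVinvX, X.T @ Vinv_y)
    # w_BLUP = lam' b_GLS + cov_wy' V^-1 (y - X b_GLS)
    w_blup = lam @ b_gls + Vinv_c @ (y - X @ b_gls)
    # d is the transpose of (lam' - cov_wy' V^-1 X); V is symmetric.
    d = lam - X.T @ Vinv_c
    mse = d @ np.linalg.solve(XtVinvX, d) - cov_wy @ Vinv_c + var_w
    return w_blup, mse, b_gls


# One-way model check: predict mu_alpha + alpha_1 for machine 1.
y = np.array([14, 12, 10, 12, 9, 16, 15, 12, 8, 10, 7, 7], dtype=float)
n, T = 3, 4
sigma2_alpha, sigma2_eps = 1.0, 4.0          # ASSUMED values, for illustration
V_i = sigma2_alpha * np.ones((T, T)) + sigma2_eps * np.eye(T)
V = np.kron(np.eye(n), V_i)                  # block diagonal (V_1, ..., V_n)
X = np.ones((n * T, 1))
lam = np.array([1.0])
cov_wy = np.zeros(n * T)
cov_wy[:T] = sigma2_alpha                    # nonzero only for subject 1

w_blup, mse, b_gls = blup(X, V, y, lam, cov_wy, sigma2_alpha)
print(w_blup, mse, b_gls)
```

With these assumed variances $\zeta_i = 0.5$ for every machine, so the general formula should reproduce the shrinkage form $\zeta_1 \bar{y}_1 + (1 - \zeta_1) m_{\alpha,GLS}$. Solving linear systems rather than forming $V^{-1}$ explicitly is a numerical-stability choice of this sketch.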

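Example: One-way model
  • Recall, $y_{it} = \mu_\alpha + \alpha_i + \varepsilon_{it}$.
  • Thus, $y_i = 1_i (\mu_\alpha + \alpha_i) + \varepsilon_i$, where $1_i$ is a
    $T_i \times 1$ vector of ones. Thus,
  • $X_i = 1_i$ and $V_i = \sigma_\alpha^2 1_i 1_i' + \sigma_\varepsilon^2 I_i$.
  • With this, we note that $V_i^{-1} (y_i - X_i b_{GLS}) =
    \sigma_\varepsilon^{-2} \left( (y_i - 1_i m_{\alpha,GLS}) - \zeta_i 1_i
    (\bar{y}_i - m_{\alpha,GLS}) \right)$, where $\zeta_i = T_i / (T_i +
    \sigma_\varepsilon^2 / \sigma_\alpha^2)$.
  • Thus, for predicting $w = \mu_\alpha + \alpha_i$ we have $\lambda = 1$ and
    $\mathrm{Cov}(w, y_i) = \sigma_\alpha^2 1_i'$ for the ith subject, 0
    otherwise. Thus,
  • $w_{BLUP} = m_{\alpha,GLS} + \zeta_i (\bar{y}_i - m_{\alpha,GLS}) =
    \zeta_i \bar{y}_i + (1 - \zeta_i) m_{\alpha,GLS}$.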

Random effect ANOVA model
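  • For predicting residuals $\varepsilon_{it}$ we have $\lambda = 0$ and
    $\mathrm{Cov}(w, y_i) = \sigma_\varepsilon^2 1_{it}'$ for the ith subject, tth time
    period, 0 otherwise.
  • Here $1_{it}$ is a $T_i \times 1$ vector with a 1 in the tth
    position, 0 otherwise. Thus,
  • $e_{it,BLUP} = y_{it} - (\zeta_i \bar{y}_i + (1 - \zeta_i) m_{\alpha,GLS})$
    is our BLUP residual.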

4.4 Mixed model predictors
  • Recall the longitudinal data mixed model
  • $y_i = Z_i \alpha_i + X_i \beta + \varepsilon_i$.
  • As described in Section 3.3, this is a special
    case of the mixed linear model. We use
  • $V$ = block diagonal $(V_1, \ldots, V_n)$,
  • where $V_i = Z_i D Z_i' + R_i$, and
  • $X = (X_1', \ldots, X_n')'$.
  • For BLUP calculations, note that
  • $\mathrm{cov}_{wy} = (\mathrm{Cov}(w, y_1), \ldots, \mathrm{Cov}(w, y_n))'$.

Longitudinal data mixed model BLUP
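  • Recall that the random variable $w$ has mean $E\,w = \lambda'\beta$ and
    $\mathrm{Var}\,w = \sigma_w^2$.
  • The BLUP is
    $w_{BLUP} = \lambda' b_{GLS} + \sum_{i=1}^n \mathrm{Cov}(w, y_i) V_i^{-1}
    (y_i - X_i b_{GLS})$.
  • The mean square error is $\mathrm{Var}(w_{BLUP} - w) =
    (\lambda' - \sum_i \mathrm{Cov}(w, y_i) V_i^{-1} X_i)
    (\sum_i X_i' V_i^{-1} X_i)^{-1}
    (\lambda' - \sum_i \mathrm{Cov}(w, y_i) V_i^{-1} X_i)'
    - \sum_i \mathrm{Cov}(w, y_i) V_i^{-1} \mathrm{Cov}(w, y_i)' + \sigma_w^2$.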

BLUP special cases
  • Global parameters and subject-specific effects.
  • Suppose that the interest is in predicting linear
    combinations of global parameters $\beta$ and
    subject-specific effects $\alpha_i$.
  • Consider linear combinations of the form
  • $w = c_1' \alpha_i + c_2' \beta$.
  • Residuals. Here, $w = \varepsilon_{it}$.
  • Forecasts. Suppose that the ith subject is
    included in the data set and that we wish to predict
  • $w = y_{i,T_i+L}$
  • for L lead time units in the future.

Predicting global parameters and subject-specific effects
  • Consider linear combinations of the form $w = c_1' \alpha_i +
    c_2' \beta$.
  • Straightforward calculations show that
  • $E\,w = c_2' \beta$ so that $\lambda = c_2$,
  • $\mathrm{Cov}(w, y_j) = c_1' D Z_i'$ for $j = i$, and
  • $\mathrm{Cov}(w, y_j) = 0$ for $j \ne i$.
  • Thus, $w_{BLUP} = c_2' b_{GLS} + c_1' D Z_i' V_i^{-1} (y_i -
    X_i b_{GLS})$.

Special case 1
  • Take $c_2 = 0$. Because the mean and variance
    expressions are true for all vectors $c_1$, we may
    write this in vector notation to get the BLUP of
    $\alpha_i$, the vector
  • $a_{i,BLUP} = D Z_i' V_i^{-1} (y_i - X_i b_{GLS})$.
  • This is unbiased in the sense that $E\,(a_{i,BLUP} - \alpha_i) = 0$.
  • This predictor has minimum variance among all
    linear unbiased predictors (BLUP).
  • In the case of the error components model ($z_{it} =
    1$), this reduces to $a_{i,BLUP} = \zeta_i (\bar{y}_i - \bar{x}_i' b_{GLS})$.
  • For comparison, recall the fixed effects
    parameter estimate, $a_i = \bar{y}_i - \bar{x}_i' b$.
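
As a numerical check of the error components special case, the sketch below simulates a small unbalanced panel and compares the general expression $D Z_i' V_i^{-1} (y_i - X_i b_{GLS})$ with the shortcut $\zeta_i (\bar{y}_i - \bar{x}_i' b_{GLS})$, where $\zeta_i = T_i / (T_i + \sigma_\varepsilon^2 / \sigma_\alpha^2)$. All numbers (variance components, panel sizes, regression coefficients, seed) are hypothetical choices made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2_alpha, sigma2_eps = 2.0, 1.5      # ASSUMED variance components
beta_true = np.array([1.0, 0.5])         # used only to simulate data
T_list = [3, 5, 4]                       # hypothetical unbalanced panel

subjects = []
for T_i in T_list:
    X_i = np.column_stack([np.ones(T_i), rng.normal(size=T_i)])
    alpha_i = rng.normal(scale=np.sqrt(sigma2_alpha))
    eps_i = rng.normal(scale=np.sqrt(sigma2_eps), size=T_i)
    y_i = X_i @ beta_true + alpha_i + eps_i
    V_i = sigma2_alpha * np.ones((T_i, T_i)) + sigma2_eps * np.eye(T_i)
    subjects.append((X_i, y_i, V_i))

# b_GLS = (sum_i X_i' V_i^-1 X_i)^-1 sum_i X_i' V_i^-1 y_i
A = sum(X_i.T @ np.linalg.solve(V_i, X_i) for X_i, y_i, V_i in subjects)
c = sum(X_i.T @ np.linalg.solve(V_i, y_i) for X_i, y_i, V_i in subjects)
b_gls = np.linalg.solve(A, c)

a_general, a_shortcut = [], []
for X_i, y_i, V_i in subjects:
    T_i = len(y_i)
    Z_i = np.ones(T_i)                   # z_it = 1, so D = sigma2_alpha
    # General form: D Z_i' V_i^-1 (y_i - X_i b_GLS)
    a_general.append(sigma2_alpha * Z_i @ np.linalg.solve(V_i, y_i - X_i @ b_gls))
    # Error components shortcut: zeta_i (ybar_i - xbar_i' b_GLS)
    zeta_i = T_i / (T_i + sigma2_eps / sigma2_alpha)
    a_shortcut.append(zeta_i * (y_i.mean() - X_i.mean(axis=0) @ b_gls))

print(np.round(a_general, 4))
print(np.round(a_shortcut, 4))
```

The two printed vectors should agree up to rounding. The shortcut is the fixed effects estimate evaluated at $b_{GLS}$ and multiplied by $\zeta_i$, which is where the shrinkage enters.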

Motivating BLUPs
  • We can also motivate BLUPs using normal theory.
  • Consider the case where $\alpha_i$ and $\varepsilon_i$ are multivariate
    normally distributed.
  • Then, it can be shown that $E\,(\alpha_i \mid y_i) = D Z_i'
    V_i^{-1} (y_i - X_i \beta)$.
  • To motivate this, consider asking the question:
    what realization of $\alpha_i$ could be associated with
    $y_i$? The expectation!
  • The BLUP is the BLUE of $E\,(\alpha_i \mid y_i)$. (That is,
    replace $\beta$ by $b_{GLS}$.)

Special case 2
  • As another example, it is of interest to predict
  • Choose and
  • This yields
  • This predictor is of interest in actuarial
    science, where it is known as the credibility estimator.

BLUP Residuals
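  • Here, $w = \varepsilon_{it}$. Because $E\,w = 0$, it follows that
    $\lambda = 0$.
  • Straightforward calculations show that
  • $\mathrm{Cov}(w, y_j) = \sigma_\varepsilon^2 1_{it}'$ for $j = i$ and
  • $\mathrm{Cov}(w, y_j) = 0$ for $j \ne i$.
  • Here, the symbol $1_{it}$ denotes a $T_i \times 1$ vector
    that has a one in the tth position and is zero otherwise.
  • Thus
  • $e_{it,BLUP} = \sigma_\varepsilon^2 1_{it}' V_i^{-1} (y_i - X_i b_{GLS})$.
  • This can also be expressed as
    $e_{it,BLUP} = y_{it} - (z_{it}' a_{i,BLUP} + x_{it}' b_{GLS})$.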

Predicting future observations
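  • Suppose that the ith subject is included in the
    data set and that we wish to predict
  • $w = y_{i,T_i+L} = z_{i,T_i+L}' \alpha_i + x_{i,T_i+L}' \beta + \varepsilon_{i,T_i+L}$
  • for L lead time units in the future.
  • We will assume that $z_{i,T_i+L}$ and $x_{i,T_i+L}$
    are known.
  • It follows that $E\,w = x_{i,T_i+L}' \beta$, so that $\lambda = x_{i,T_i+L}$.
  • Straightforward calculations show that
    $\mathrm{Cov}(w, y_i) = z_{i,T_i+L}' D Z_i' + \mathrm{Cov}(\varepsilon_{i,T_i+L}, \varepsilon_i)$.
  • Thus, the forecast of $y_{i,T_i+L}$ is
    $\hat{y}_{i,T_i+L} = x_{i,T_i+L}' b_{GLS} + z_{i,T_i+L}' a_{i,BLUP} +
    \mathrm{Cov}(\varepsilon_{i,T_i+L}, \varepsilon_i) V_i^{-1} (y_i - X_i b_{GLS})$.
  • Thus, the forecast is the estimate of the
    conditional mean plus the serial correlation
    correction factor.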

Predicting future observations
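  • To illustrate, consider the special case where we
    have autoregressive of order 1 (AR(1)), serially
    correlated errors.
  • Thus, we have $\mathrm{Cov}(\varepsilon_{is}, \varepsilon_{it}) = \sigma_\varepsilon^2 \rho^{|t-s|}$.
  • After some algebra, the L step forecast is
    $\hat{y}_{i,T_i+L} = x_{i,T_i+L}' b_{GLS} + z_{i,T_i+L}' a_{i,BLUP} +
    \rho^L e_{iT_i,BLUP}$.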
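
The algebra behind the AR(1) forecast can be checked numerically. The sketch below assumes errors with covariance $\sigma_\varepsilon^2 \rho^{|t-s|}$ (here $\sigma_\varepsilon^2$ is taken to be the marginal error variance, an assumption of this example) and verifies that $\mathrm{Cov}(\varepsilon_{i,T_i+L}, \varepsilon_i) R_i^{-1}$ puts weight $\rho^L$ on the last period and zero on all earlier ones, so that the serial correlation correction reduces to $\rho^L$ times the last BLUP residual. Function names and the values of T, L and $\rho$ are hypothetical.

```python
import numpy as np


def ar1_matrix(T, rho, sigma2=1.0):
    """R with entries sigma2 * rho**|t - s| (sigma2 = marginal error variance)."""
    idx = np.arange(T)
    return sigma2 * rho ** np.abs(idx[:, None] - idx[None, :])


def future_error_cov(T, L, rho, sigma2=1.0):
    """Cov(eps_{T+L}, eps_t) for t = 1, ..., T under the same AR(1) structure."""
    t = np.arange(1, T + 1)
    return sigma2 * rho ** (T + L - t)


def ar1_forecast(mean_part, last_residual, rho, L):
    """Conditional-mean estimate plus the correction rho**L * last BLUP residual."""
    return mean_part + rho ** L * last_residual


T, L, rho = 5, 3, 0.6                    # hypothetical values
R = ar1_matrix(T, rho)
# R is symmetric, so solving R w = cov gives the entries of cov' R^-1.
weights = np.linalg.solve(R, future_error_cov(T, L, rho))
print(np.round(weights, 6))
print(ar1_forecast(mean_part=10.0, last_residual=2.0, rho=0.5, L=2))
```

Because the weights vanish for every period except the last, only the most recent residual matters for the forecast, and its influence decays geometrically in the lead time L.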

4.5 Bayesian Inference
  • With Bayesian statistical models, one views both
    the model parameters and the data as random variables.
  • We assume distributions for each type of random variable.
  • Given the parameters $\beta$ and $\alpha$, the response model
    is $y = Z\alpha + X\beta + \varepsilon$.
  • Specifically, we assume that the responses $y$
    conditional on $\alpha$ and $\beta$ are normally distributed
    and that
  • $E\,(y \mid \alpha, \beta) = Z\alpha + X\beta$ and $\mathrm{Var}(y \mid \alpha, \beta) = R$.
  • Assume that $\alpha$ is distributed normally with mean
    $\mu_\alpha$ and variance $D$ and that $\beta$ is distributed
    normally with mean $\mu_\beta$ and variance $\Sigma_\beta$, each
    independent of the other.

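  • The joint distribution of $(\alpha', \beta')'$ is known as
    the prior distribution.
  • To summarize, the joint distribution of $(\alpha', \beta',
    y')'$ is
    $\begin{pmatrix} \alpha \\ \beta \\ y \end{pmatrix} \sim N \left(
    \begin{pmatrix} \mu_\alpha \\ \mu_\beta \\ Z\mu_\alpha + X\mu_\beta \end{pmatrix},
    \begin{pmatrix} D & 0 & D Z' \\ 0 & \Sigma_\beta & \Sigma_\beta X' \\
    Z D & X \Sigma_\beta & V + X \Sigma_\beta X' \end{pmatrix} \right)$,
  • where $V = R + Z D Z'$.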

Posterior Distribution
  • The distribution of parameters given the data is
    known as the posterior distribution.
  • The posterior distribution of $(\alpha', \beta')'$ given $y$
    is normal.
  • The conditional moments are

Relation with BLUPs
  • In longitudinal data applications, one typically
    has more information about the global parameters
    $\beta$ than subject-specific parameters $\alpha$.
  • Consider first the case $\Sigma_\beta = 0$, so that $\beta = \mu_\beta$
    with probability one.
  • Intuitively, this means that $\beta$ is precisely
    known, generally from collateral information.
  • Assuming that $\mu_\alpha = 0$, it is easy to check that
    the best linear unbiased estimator (BLUE) of
    $E\,(\alpha \mid y)$ is
  • $a_{BLUP} = D Z' V^{-1} (y - X b_{GLS})$.
  • Recall from equation (4.11) that $a_{BLUP}$ is also
    the best linear unbiased predictor in the
    frequentist (non-Bayesian) model framework.

Relation with BLUPs
  • Consider second the case where $\Sigma_\beta^{-1} = 0$.
  • In this case, prior information about the
    parameter $\beta$ is vague; this is known as using a
    diffuse prior.
  • Assuming $\mu_\alpha = 0$, one can show that
  • $E\,(\alpha \mid y) = a_{BLUP}$.
  • It is interesting that in both extreme cases, we
    arrive at the statistic $a_{BLUP}$ as a predictor of $\alpha$.
  • This analysis assumes $D$ and $R$ are matrices of
    fixed parameters.
  • It is also possible to assume distributions for
    these parameters; typically, independent Wishart
    distributions are used for $D^{-1}$ and $R^{-1}$ as these
    are conjugate priors.
  • The general strategy of substituting point
    estimates for certain parameters in a posterior
    distribution is called empirical Bayes estimation.
Example: One-way random effects ANOVA model
  • The posterior means turn out to be
  • where
  • Note that $\zeta_\beta$ measures the precision of knowledge
    about $\beta$. Specifically, we see that $\zeta_\beta$ approaches
    one as $\sigma_\beta^2 \to \infty$, and approaches zero as
    $\sigma_\beta^2 \to 0$.

4.6 Wisconsin Lottery Sales
  • T = 40 weeks of sales from n = 50 zip codes

Lottery Sales Data Analysis
  • Cross-sectional analysis shows that population
    size heavily influences sales, with Kenosha as an outlier.
  • Multiple time series plots show
    (1) the effect of jackpots that is common to all
    postal codes,
    (2) the heterogeneity among postal codes
    (reaffirmed by a pooling test) and
    (3) the heteroscedasticity that is accommodated
    through a logarithmic transformation.

Lottery Sales Model Selection
  • In-sample results show that
    (1) the one-way error components model dominates pooled
    cross-sectional models,
    (2) an AR(1) error specification significantly
    improves the fit and
    (3) the best model is probably the two-way error
    component model, with an AR(1) error
    specification (not yet documented).
  • Out-of-sample analysis suggests that
    logarithmic sales is the preferred choice of
    response; it outperforms sales and percentage change.
4.7. What is Credibility?
  • Hickman's (1975) Analogy
  • In politics, leaders begin with a reservoir of
    credibility which decreases as executive
    experience is compiled.
  • Insurance behaves in a reverse fashion!
  • Here, credibility increases as experience accumulates.

Credibility Theory
  • Credibility is a technique for predicting future
    expected claims for a risk class, given past
    claims of that and related risk classes.
  • Importance
  • Credibility is widely used for pricing property
    and casualty, workers' compensation and health
    care coverages.
  • According to Rodermund (1989), the concept of
    credibility has been the casualty actuaries' most
    important and enduring contribution to casualty
    actuarial science.

  • Mowbray (1914 - PCAS)
  • Asked the question, how extensive an exposure is
    necessary to give a dependable pure premium?
  • This approach is now known as the limited
    fluctuation, or American, credibility approach.
  • Question 1: do we have enough exposure to give
    full weight to the risk class under consideration?
  • Question 2: if not, how can we combine
    information from this and related risk classes?

More History
  • Whitney (1918 - PCAS)
  • introduced the idea of using a weighted average
    of average claims of (1) a given risk class and
    (2) all risk classes.
  • The weight is known as the credibility factor.
  • It is of the form
  • New Premium
  • = Z × Claims Experience + (1 - Z) × Old Premium.

Example - Balanced Bühlmann
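  • Consider the model
  • $y_{it} = \mu + \alpha_i + \varepsilon_{it}$.
  • The credibility factor is
    $\zeta = T / (T + \sigma_\varepsilon^2 / \sigma_\alpha^2)$.
  • The traditional credibility estimator is
    $\bar{y}_{i,s} = \zeta \bar{y}_i + (1 - \zeta) \bar{y}$.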

Example: Hypothetical Claims for Three Towns
  • Town   Claims           Average Claim
  • 1      14, 12, 10, 12   $\bar{y}_1 = 12$
  • 2      9, 16, 15, 12    $\bar{y}_2 = 13$
  • 3      8, 10, 7, 7      $\bar{y}_3 = 8$
  • Are there real differences among towns?
  • Mowbray - does Town 3 have enough data to support
    its own estimator of pure premiums?
  • Whitney - how can I use the information in Towns
    1 and 2 to help determine my rate for Town 3?

Response to Whitney
  • Known as the shrinkage effect
  • Comparison of Subject-Specific Means to
    Credibility Estimators.
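
One way to put numbers on Whitney's question for the three towns is sketched below. It computes a balanced credibility factor $\zeta = T / (T + \sigma_\varepsilon^2 / \sigma_\alpha^2)$ and the weighted averages $\zeta \bar{y}_i + (1 - \zeta) \bar{y}$. The variance components are replaced by simple ANOVA moment estimates (within-town and between-town mean squares); that estimation step is an assumption of this example, since the slides do not say how the variances are obtained, and the function name is hypothetical.

```python
claims = {
    1: [14, 12, 10, 12],
    2: [9, 16, 15, 12],
    3: [8, 10, 7, 7],
}


def balanced_credibility(data):
    """Return (zeta, {town: credibility estimate}) for balanced data.

    ASSUMPTION: variance components are estimated by ANOVA moment estimators,
    sigma_eps^2 = within mean square and
    sigma_alpha^2 = (between mean square - within mean square) / T.
    """
    n = len(data)
    T = len(next(iter(data.values())))
    if any(len(v) != T for v in data.values()):
        raise ValueError("balanced data required")
    means = {i: sum(v) / T for i, v in data.items()}
    grand = sum(means.values()) / n
    within_ms = sum((y - means[i]) ** 2 for i, v in data.items() for y in v) / (n * (T - 1))
    between_ms = T * sum((m - grand) ** 2 for m in means.values()) / (n - 1)
    sigma2_eps = within_ms
    sigma2_alpha = max((between_ms - within_ms) / T, 0.0)
    # If no between-town variation is detected, give no weight to own experience.
    zeta = 0.0 if sigma2_alpha == 0.0 else T / (T + sigma2_eps / sigma2_alpha)
    estimates = {i: zeta * m + (1 - zeta) * grand for i, m in means.items()}
    return zeta, estimates


zeta, estimates = balanced_credibility(claims)
print(f"credibility factor: {zeta:.4f}")
for town, est in estimates.items():
    print(f"town {town}: credibility estimate = {est:.3f}")
```

Each town's estimate is pulled from its own average toward the overall average of 11, so Town 3's rate borrows information from Towns 1 and 2. Truncating the between-town variance estimate at zero is a design choice of this sketch that keeps $\zeta$ between zero and one.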

Why study credibility theory?
  • Long history of applications in business
  • More recently, many theoretical advances with
    fewer innovative applications
  • Credibility techniques required in legal statutes
    and standards of practice
  • Standard of Practice 25 by the Actuarial
    Standards Board of the American Academy of Actuaries
  • Wisconsin statutes on credibility insurance and
    disability income
  • Advanced techniques are critical for keeping up
    with competition (health insurance health
  • Innovative techniques enhance the credibility
    of the profession
