Chapter 4: Prediction and Bayesian Inference
1
Chapter 4: Prediction and Bayesian Inference
  • 4.1 Estimators versus predictors
  • 4.2 Prediction for one-way ANOVA models
  • Shrinkage estimation, types of predictions
  • 4.3 Best linear unbiased predictors (BLUPs)
  • 4.4 Mixed model predictors
  • 4.5 Bayesian inference
  • 4.6 Case study: Forecasting lottery sales
  • 4.7 Credibility Theory
  • Appendix 4A Linear unbiased predictors

2
4.1 Estimators versus predictors
  • In the longitudinal data model, y_it = z_it′ α_i + x_it′ β + ε_it, the variables α_i describe subject-specific effects.
  • Given the data {y_it, z_it, x_it}, in some problems it is of interest to summarize subject effects.
  • We have discussed how to estimate the fixed, unknown parameters β.
  • It is also of interest to summarize subject-specific effects, such as those described by the random variable α_i.
  • Predictors are estimators of random variables.
  • Like estimators, predictors are said to be linear if they are formed from a linear combination of the responses y.

3
Applications of prediction
  • In animal and plant breeding, one wishes to predict the production of milk for cows based on (1) their lineage (random) and (2) herds (fixed).
  • In credibility theory, one wishes to predict
    expected claims for a policyholder given exposure
    to several risk factors
  • In sample surveys, one wishes to predict the size
    of a specific age-sex-race cohort within a small
    geographical area (known as small area
    estimation).
  • In a survey article, Robinson (1991) also cites (1) ore reserve estimation in geological surveys, (2) measuring the quality of a production plan, and (3) ranking baseball players' abilities.

4
4.2 Prediction for one-way ANOVA models
  • Consider the traditional one-way random effects ANOVA (analysis of variance) model
  • y_it = μ + α_i + ε_it.
  • Suppose that we wish to summarize the subject-specific conditional mean, μ + α_i.
  • For contrast, first consider using the fixed effects model with μ = 0.
  • Here, we have that the subject-specific mean ȳ_i is the best (Gauss-Markov) estimate of α_i.
  • This estimate is unbiased, that is, E ȳ_i = α_i.
  • This estimate has minimum variance among all linear unbiased estimators (BLUE).

5
Shrinkage estimator
  • We use the one-way random effects model.
  • Consider an estimator of μ + α_i that is a linear combination of ȳ_i and ȳ, that is, c1 ȳ_i + c2 ȳ, for constants c1 and c2.
  • Calculations show that the best values of c1 and c2, those minimizing the mean square error E[c1 ȳ_i + c2 ȳ - (μ + α_i)]², satisfy c2 = 1 - c1.
  • For large n, we have the shrinkage estimator, or predictor, of μ + α_i:
  • ȳ_i,s = ζ_i ȳ_i + (1 - ζ_i) ȳ, where ζ_i = T_i σ_α² / (T_i σ_α² + σ_ε²).

6
Example of shrinkage estimator: Hypothetical Run Times for Three Machines
  • Machine   Run Times         Average Run Time
  •   1       14, 12, 10, 12    ȳ_1 = 12
  •   2       9, 16, 15, 12     ȳ_2 = 13
  •   3       8, 10, 7, 7       ȳ_3 = 8
  • Notation: y_ij means the jth run from the ith machine.
  • For example, y_21 = 9 and y_23 = 15.
  • Are there real differences among machines?
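To make the shrinkage computation concrete, here is a minimal Python sketch (not part of the original slides) that estimates the variance components with the usual one-way ANOVA method-of-moments formulas and reproduces the shrinkage estimators of Figure 4.1; all variable names are illustrative.

```python
# A minimal sketch: ANOVA variance components and shrinkage estimators
# for the hypothetical machine run-time data above.
import numpy as np

runs = {1: [14, 12, 10, 12],
        2: [9, 16, 15, 12],
        3: [8, 10, 7, 7]}

n = len(runs)                        # number of machines (subjects)
T = 4                                # runs per machine (balanced design)
subject_means = {i: np.mean(y) for i, y in runs.items()}
grand_mean = np.mean([y for ys in runs.values() for y in ys])   # 11.0

# Within- and between-machine mean squares (method of moments)
sse = sum(np.sum((np.array(y) - subject_means[i]) ** 2) for i, y in runs.items())
mse = sse / (n * T - n)                             # estimates sigma_eps^2
msb = T * sum((m - grand_mean) ** 2 for m in subject_means.values()) / (n - 1)
sigma2_alpha = (msb - mse) / T                      # estimates sigma_alpha^2

# Shrinkage (credibility) factor: zeta = T*sa2 / (T*sa2 + se2)
zeta = T * sigma2_alpha / (T * sigma2_alpha + mse)  # about 0.825

for i, m in subject_means.items():
    shrunk = zeta * m + (1 - zeta) * grand_mean
    print(f"machine {i}: mean {m:.1f} -> shrinkage estimator {shrunk:.3f}")
# Prints values matching Figure 4.1 (11.825, 12.650, 8.525) up to rounding.
```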

7
Example - Continued
  • To see the shrinkage effect, consider
  • Figure 4.1, Comparison of Subject-Specific Means to Shrinkage Estimators.

[Figure 4.1: the subject-specific means ȳ_1 = 12, ȳ_2 = 13, and ȳ_3 = 8 are shrunk toward the overall mean 11, giving shrinkage estimators 11.825, 12.650, and 8.525.]
8
More on shrinkage estimators
  • Under the random effects model, ȳ_i is an unbiased predictor of μ + α_i in the sense that E[ȳ_i - (μ + α_i)] = 0.
  • However, ȳ_i is inefficient in the sense that ȳ_i,s has a smaller mean square error than ȳ_i.
  • Here, ȳ_i has been shrunk towards the stable estimator ȳ.
  • The estimator ȳ_i,s is said to "borrow strength" from the stable estimator ȳ.
  • Recall ζ_i = T_i σ_α² / (T_i σ_α² + σ_ε²).
  • Note that ζ_i → 1 as either (i) T_i → ∞ or (ii) σ_α²/σ_ε² → ∞.

9
Best predictors
  • From Section 3.1, it is easy to check that the generalized least squares estimator of μ is m_α,GLS = (Σ_i ζ_i ȳ_i) / (Σ_i ζ_i).
  • The linear predictor of μ + α_i that has minimum variance is ζ_i ȳ_i + (1 - ζ_i) m_α,GLS.
  • Here, the acronym BLUP stands for best linear unbiased predictor.

10
Types of Predictors
  • We have now introduced the BLUP of μ + α_i. This quantity is a linear combination of global parameters and subject-specific effects.
  • Two other types of predictors are of interest.
  • Residuals. Here, we wish to predict ε_it. The BLUP residual turns out to be e_it,BLUP = y_it - (ζ_i ȳ_i + (1 - ζ_i) m_α,GLS).
  • Forecasts. Here, we wish to predict, for L lead time units into the future, y_i,Ti+L = μ + α_i + ε_i,Ti+L.
  • Without serial correlation, the predictor is the same as the predictor of μ + α_i. However, we will see that the mean square error turns out to be larger.

11
4.3 Best linear unbiased predictors
  • This section develops best linear unbiased predictors in the context of mixed linear models, then specializes them to longitudinal data mixed models.
  • BLUPs are developed by examining the minimum mean square error predictor of a random variable, w.
  • We give a development due to Harville (1976).
  • The argument is originally due to Goldberger (1962), who coined the phrase "best linear unbiased predictor."
  • The acronym was first used by Henderson (1973).
  • BLUPs can also be developed as conditional expectations using multivariate normality.
  • BLUPs can also be developed in a Bayesian context.

12
Mixed linear models
  • Suppose that we observe an N × 1 random vector y with mean E y = X β and variance Var y = V.
  • We wish to predict a random variable w that has mean E w = λ′ β and Var w = σ_w².
  • Denote the covariance between w and y as Cov(w, y) = cov_wy.
  • Assuming known regression parameters β, the best linear (in y) predictor of w is
  • w* = E w + cov_wy′ V^{-1}(y - E y) = λ′ β + cov_wy′ V^{-1}(y - X β).
  • If (w, y′)′ is multivariate normal, then w* equals E(w | y) and hence is a minimum mean square predictor of w.
  • The predictor w* is also a minimum mean square predictor of w without the assumption of normality. See Appendix 4A.1.

13
BLUPs as predictors
  • To develop the BLUP,
  • define b_GLS = (X′ V^{-1} X)^{-1} X′ V^{-1} y to be the generalized least squares (GLS) estimator of β.
  • This is the best linear unbiased estimator (BLUE).
  • Replace β by b_GLS in the definition of w* to get the BLUP:
  • w_BLUP = λ′ b_GLS + cov_wy′ V^{-1}(y - X b_GLS)
  •        = (λ′ - cov_wy′ V^{-1} X) b_GLS + cov_wy′ V^{-1} y.
  • See Appendix 4A.2 for a check, establishing w_BLUP as the best linear unbiased predictor of w.
  • From Appendix 4A.3, we also have the form for the minimum mean square error:
  • Var(w_BLUP - w) = (λ′ - cov_wy′ V^{-1} X)(X′ V^{-1} X)^{-1}(λ′ - cov_wy′ V^{-1} X)′ - cov_wy′ V^{-1} cov_wy + σ_w².
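As a concrete illustration, here is a minimal NumPy sketch of these two formulas; the design matrix, covariance vector, and responses below are made-up inputs, not data from the chapter.

```python
# A minimal numerical sketch of b_GLS and the BLUP formula on this slide;
# all arrays are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
N, p = 6, 2
X = rng.normal(size=(N, p))            # design matrix, E y = X beta
V = np.eye(N) + 0.5 * np.ones((N, N))  # a valid (positive definite) Var y
y = rng.normal(size=N)                 # observed responses
lam = np.array([1.0, 0.0])             # lambda, with E w = lambda' beta
cov_wy = 0.3 * np.ones(N)              # Cov(w, y), assumed known

Vinv = np.linalg.inv(V)
# GLS estimator: b_GLS = (X' V^{-1} X)^{-1} X' V^{-1} y
b_gls = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)

# BLUP: w_BLUP = lambda' b_GLS + cov_wy' V^{-1} (y - X b_GLS)
w_blup = lam @ b_gls + cov_wy @ Vinv @ (y - X @ b_gls)
print(b_gls, w_blup)
```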

14
Example: One-way model
  • Recall y_it = μ + α_i + ε_it.
  • Thus, y_i = 1_i (μ + α_i) + ε_i, so that X_i = 1_i and V_i = σ_α² J_i + σ_ε² I_i, where J_i is a T_i × T_i matrix of ones and I_i is the T_i × T_i identity matrix.
  • With this, we note that V_i^{-1} = σ_ε^{-2} (I_i - (ζ_i/T_i) J_i), which gives V_i^{-1}(y_i - X_i b_GLS) in closed form.
  • Thus, for predicting w = μ + α_i we have λ = 1 and Cov(w, y_i) = σ_α² 1_i for the ith subject, 0 otherwise. Thus,
  • w_BLUP = ζ_i ȳ_i + (1 - ζ_i) m_α,GLS.

15
Random effects ANOVA model
  • For predicting residuals ε_it we have λ = 0 and Cov(w, y_i) = σ_ε² 1_it for the ith subject, tth time period, 0 otherwise.
  • Let 1_it be a T_i × 1 vector with a 1 in the tth position, 0 otherwise. Thus,
  • e_it,BLUP = σ_ε² 1_it′ V_i^{-1}(y_i - X_i b_GLS)
  • is our BLUP residual.

16
4.4 Mixed model predictors
  • Recall the longitudinal data mixed model
  • y_i = Z_i α_i + X_i β + ε_i.
  • As described in Section 3.3, this is a special case of the mixed linear model. We use
  • V = block diagonal(V_1, ..., V_n), where V_i = Z_i D Z_i′ + R_i, and
  • X = (X_1′, ..., X_n′)′.
  • For BLUP calculations, note that
  • cov_wy = (Cov(w, y_1′), ..., Cov(w, y_n′))′.

17
Longitudinal data mixed model BLUP
  • Recall that the random variable w has mean E w = λ′ β and Var w = σ_w².
  • The BLUP is w_BLUP = λ′ b_GLS + Σ_i Cov(w, y_i′) V_i^{-1}(y_i - X_i b_GLS).
  • The mean square error Var(w_BLUP - w) takes the same form as on the previous slide, with the block diagonal V.

18
BLUP special cases
  • Global parameters and subject-specific effects. Suppose that the interest is in predicting linear combinations of the global parameters β and the subject-specific effects α_i.
  • Consider linear combinations of the form w = c1′ α_i + c2′ β.
  • Residuals. Here, w = ε_it.
  • Forecasts. Suppose that the ith subject is included in the data set; predict y_i,Ti+L for L lead time units in the future.

19
Predicting global parameters and subject-specific
effects
  • Consider linear combinations of the form w = c1′ α_i + c2′ β.
  • Straightforward calculations show that
  • E w = c2′ β, so that λ = c2,
  • Cov(w, y_j′) = c1′ D Z_i′ for j = i, and
  • Cov(w, y_j′) = 0 for j ≠ i.
  • Thus, w_BLUP = c2′ b_GLS + c1′ D Z_i′ V_i^{-1}(y_i - X_i b_GLS).

20
Special case 1
  • Take c2 = 0. Because the mean and variance expressions hold for all vectors c1, we may write this in vector notation to get the BLUP of α_i, the vector
  • a_i,BLUP = D Z_i′ V_i^{-1}(y_i - X_i b_GLS).
  • This is unbiased in the sense that E(a_i,BLUP - α_i) = 0.
  • This predictor has minimum variance among all linear unbiased predictors (BLUP).
  • In the case of the error components model (z_it = 1), this reduces to a_i,BLUP = ζ_i (ȳ_i - x̄_i′ b_GLS), as shown in the sketch below.
  • For comparison, recall the fixed effects parameter estimate, a_i = ȳ_i - x̄_i′ b.
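A minimal sketch of the error components special case, assuming the variance components and b_GLS have already been estimated (function and argument names are illustrative):

```python
# Sketch: a_{i,BLUP} = zeta_i * (ybar_i - xbar_i' b_GLS) for the error
# components model (z_it = 1); inputs are assumed to be precomputed.
import numpy as np

def a_blup_error_components(y_i, X_i, b_gls, sigma2_alpha, sigma2_eps):
    """BLUP of the random intercept a_i for one subject."""
    T_i = len(y_i)
    zeta_i = T_i * sigma2_alpha / (T_i * sigma2_alpha + sigma2_eps)
    return zeta_i * (np.mean(y_i) - np.mean(X_i, axis=0) @ b_gls)
```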

21
Motivating BLUPs
  • We can also motivate BLUPs using normal theory.
  • Consider the case where α_i and ε are multivariate normally distributed.
  • Then, it can be shown that E(α_i | y_i) = D Z_i′ V_i^{-1}(y_i - X_i β).
  • To motivate this, consider asking the question: what realization of α_i could be associated with y_i? The expectation!
  • The BLUP is the BLUE of E(α_i | y_i). (That is, replace β by b_GLS.)

22
Special case 2
  • As another example, it is of interest to predict the conditional mean of the response, z_it′ α_i + x_it′ β.
  • Choose c1 = z_it and c2 = x_it.
  • This yields w_BLUP = z_it′ a_i,BLUP + x_it′ b_GLS.
  • This predictor is of interest in actuarial science, where it is known as the credibility estimator.

23
BLUP Residuals
  • Here, w = ε_it. Because E w = 0, it follows that λ = 0.
  • Straightforward calculations show that
  • Cov(w, y_j′) = σ_ε² 1_it′ for j = i, and
  • Cov(w, y_j′) = 0 for j ≠ i.
  • Here, the symbol 1_it denotes a T_i × 1 vector that has a one in the tth position and is zero otherwise.
  • Thus,
  • e_it,BLUP = σ_ε² 1_it′ V_i^{-1}(y_i - X_i b_GLS).
  • This can also be expressed as e_it,BLUP = y_it - (z_it′ a_i,BLUP + x_it′ b_GLS).
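In code, the second expression is the more convenient one; a minimal sketch, assuming a_i,BLUP and b_GLS have already been computed:

```python
# Sketch: e_{it,BLUP} = y_it - (z_it' a_blup + x_it' b_gls).
import numpy as np

def blup_residual(y_it, z_it, x_it, a_blup, b_gls):
    """BLUP residual for subject i at time t."""
    return y_it - (z_it @ a_blup + x_it @ b_gls)
```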

24
Predicting future observations
  • Suppose that the ith subject is included in the data set; predict y_i,Ti+L for L lead time units in the future.
  • We will assume that x_i,Ti+L and z_i,Ti+L are known.
  • It follows that y_i,Ti+L = z_i,Ti+L′ α_i + x_i,Ti+L′ β + ε_i,Ti+L.
  • Straightforward calculations show that the forecast of y_i,Ti+L is
  • ŷ_i,Ti+L = x_i,Ti+L′ b_GLS + z_i,Ti+L′ a_i,BLUP + Cov(ε_i,Ti+L, ε_i)′ R_i^{-1} e_i,BLUP.
  • Thus, the forecast is the estimate of the conditional mean plus the serial correlation correction factor.

25
Predicting future observations
  • To illustrate, consider the special case where we have autoregressive of order 1 (AR(1)) serially correlated errors.
  • Thus, we have Cov(ε_ir, ε_is) = σ² ρ^|r-s|.
  • After some algebra, the L-step forecast is
  • ŷ_i,Ti+L = x_i,Ti+L′ b_GLS + z_i,Ti+L′ a_i,BLUP + ρ^L e_i,Ti,BLUP, as sketched below.
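A sketch of the resulting forecast rule, assuming ρ, b_GLS, a_i,BLUP, and the last BLUP residual have already been estimated (names are illustrative):

```python
# Sketch: L-step forecast under AR(1) errors,
# yhat_{i,Ti+L} = x' b_gls + z' a_blup + rho^L * e_{i,Ti,BLUP}.
import numpy as np

def forecast_ar1(x_future, z_future, b_gls, a_blup, rho, last_resid, L):
    """Estimated conditional mean plus the serial correlation correction."""
    return x_future @ b_gls + z_future @ a_blup + rho**L * last_resid
```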

26
4.5 Bayesian Inference
  • With Bayesian statistical models, one views both the model parameters and the data as random variables.
  • We assume distributions for each type of random variable.
  • Given the parameters β and α, the response model is y = Z α + X β + ε.
  • Specifically, we assume that the responses y conditional on α and β are normally distributed and that
  • E(y | α, β) = Z α + X β and Var(y | α, β) = R.
  • Assume that α is distributed normally with mean μ_α and variance D and that β is distributed normally with mean μ_β and variance Σ_β, each independent of the other.

27
Distributions
  • The joint distribution of (α′, β′)′ is known as the prior distribution.
  • To summarize, the joint distribution of (α′, β′, y′)′ is multivariate normal, with mean (μ_α′, μ_β′, (Z μ_α + X μ_β)′)′ and with variance-covariance blocks Var α = D, Var β = Σ_β, Var y = V + X Σ_β X′, Cov(α, y′) = D Z′, and Cov(β, y′) = Σ_β X′,
  • where V = R + Z D Z′.

28
Posterior Distribution
  • The distribution of parameters given the data is known as the posterior distribution.
  • The posterior distribution of (α′, β′)′ given y is normal.
  • The conditional moments follow from the usual rules for conditioning in the multivariate normal distribution, recalled below.
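As a reminder, the standard conditioning identity for multivariate normal vectors, applied with u = (α′, β′)′, E y = Z μ_α + X μ_β, and Var y = V + X Σ_β X′:

```latex
% Standard multivariate normal conditioning identity (a known result,
% applied here with u = (alpha', beta')'):
\begin{aligned}
\mathrm{E}(u \mid y)   &= \mathrm{E}\,u + \mathrm{Cov}(u, y)\,\bigl[\mathrm{Var}\,y\bigr]^{-1}\bigl(y - \mathrm{E}\,y\bigr), \\
\mathrm{Var}(u \mid y) &= \mathrm{Var}\,u - \mathrm{Cov}(u, y)\,\bigl[\mathrm{Var}\,y\bigr]^{-1}\,\mathrm{Cov}(y, u).
\end{aligned}
```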

29
Relation with BLUPs
  • In longitudinal data applications, one typically has more information about the global parameters β than about the subject-specific parameters α.
  • Consider first the case Σ_β = 0, so that β = μ_β with probability one.
  • Intuitively, this means that β is precisely known, generally from collateral information.
  • Assuming that μ_α = 0, it is easy to check that the best linear unbiased estimator (BLUE) of E(α | y) is
  • a_BLUP = D Z′ V^{-1}(y - X b_GLS).
  • Recall from equation (4.11) that a_BLUP is also the best linear unbiased predictor in the frequentist (non-Bayesian) model framework.

30
Relation with BLUPs
  • Consider second the case where Σ_β^{-1} = 0.
  • In this case, prior information about the parameter β is vague; this is known as using a diffuse prior.
  • Assuming μ_α = 0, one can show that
  • E(α | y) = a_BLUP.
  • It is interesting that in both extreme cases, we arrive at the statistic a_BLUP as a predictor of α.
  • This analysis assumes D and R are matrices of fixed parameters.
  • It is also possible to assume distributions for these parameters; typically, independent Wishart distributions are used for D^{-1} and R^{-1}, as these are conjugate priors.
  • The general strategy of substituting point estimates for certain parameters in a posterior distribution is called empirical Bayes estimation.

31
Example: One-way random effects ANOVA model
  • For this model, the posterior means can be computed in closed form.
  • A factor in these expressions measures the precision of knowledge about μ. Specifically, it approaches one as the prior variance σ_μ² → ∞, and approaches zero as σ_μ² → 0.

32
4.6 Wisconsin Lottery Sales
  • T = 40 weeks of sales from n = 50 zip codes

33
Lottery Sales Data Analysis
  • Cross-sectional analysis shows that population size heavily influences sales, with Kenosha as an outlier.
  • Multiple time series plots:
  • show the effect of jackpots that is common to all postal codes,
  • show the heterogeneity among postal codes (reaffirmed by a pooling test), and
  • show the heteroscedasticity that is accommodated through a logarithmic transformation.

34
Lottery Sales Model Selection
  • In-sample results show that:
  • One-way error components dominates pooled cross-sectional models.
  • An AR(1) error specification significantly improves the fit.
  • The best model is probably the two-way error components model with an AR(1) error specification (not yet documented).
  • Out-of-sample analysis suggests that:
  • logarithmic sales is the preferred choice of response; it outperforms sales and percentage change.

35
4.7 What is Credibility?
  • Hickman's (1975) analogy:
  • In politics, leaders begin with a reservoir of credibility, which decreases as executive experience is compiled.
  • Insurance behaves in the reverse fashion!
  • Here, credibility increases as experience increases.

36
Credibility Theory
  • Credibility is a technique for predicting future
    expected claims for a risk class, given past
    claims of that and related risk classes.
  • Importance
  • Credibility is widely used for pricing property and casualty, workers' compensation, and health care coverages.
  • According to Rodermund (1989), the concept of credibility has been the casualty actuaries' most important and enduring contribution to casualty actuarial science.

37
History
  • Mowbray (1914, PCAS)
  • asked the question: how extensive an exposure is necessary to give a dependable pure premium?
  • This approach is now known as the limited fluctuation or "American" credibility approach.
  • Question 1: do we have enough exposure to give full weight to the risk class under consideration?
  • Question 2: if not, how can we combine information from this and related risk classes?

38
More History
  • Whitney (1918, PCAS)
  • introduced the idea of using a weighted average of the average claims of (1) a given risk class and (2) all risk classes.
  • The weight is known as the credibility factor.
  • It is of the form
  • New Premium = Z × Claims Experience + (1 - Z) × Old Premium.

39
Example - Balanced Bühlmann
  • Consider the model
  • y_it = μ + α_i + ε_it.
  • The credibility factor is ζ = T / (T + σ_ε²/σ_α²).
  • The traditional credibility estimator is ζ ȳ_i + (1 - ζ) ȳ, as sketched below.
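A one-line sketch of this estimator, using the same shrinkage machinery as Section 4.2 (function and argument names are illustrative):

```python
# Sketch: balanced Buhlmann credibility estimate for risk class i.
def credibility_estimate(ybar_i, ybar_all, T, sigma2_alpha, sigma2_eps):
    zeta = T / (T + sigma2_eps / sigma2_alpha)   # credibility factor
    return zeta * ybar_i + (1 - zeta) * ybar_all
```

With the ANOVA variance estimates from the machine example of Section 4.2 (the towns data below are identical), this reproduces the credibility estimators 11.825, 12.650, and 8.525.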

40
Example: Hypothetical Claims for Three Towns
  • Town   Claims            Average Claim
  •   1    14, 12, 10, 12    ȳ_1 = 12
  •   2    9, 16, 15, 12     ȳ_2 = 13
  •   3    8, 10, 7, 7       ȳ_3 = 8
  • Are there real differences among towns?
  • Mowbray: does Town 3 have enough data to support its own estimator of pure premiums?
  • Whitney: how can I use the information in Towns 1 and 2 to help determine my rate for Town 3?

41
Response to Whitney
  • Known as the shrinkage effect.
  • Comparison of subject-specific means to credibility estimators.

[Figure: the town means ȳ_1 = 12, ȳ_2 = 13, and ȳ_3 = 8 are shrunk toward the overall mean 11, giving credibility estimators 11.825, 12.650, and 8.525.]
42
Why study credibility theory?
  • Long history of applications; a business necessity.
  • More recently, many theoretical advances with fewer innovative applications.
  • Credibility techniques are required in legal statutes and standards of practice:
  • Standard of Practice 25 by the Actuarial Standards Board of the American Academy of Actuaries
  • Wisconsin statutes on credibility for insurance and disability income
  • Advanced techniques are critical for keeping up with competition (health insurance, health economists).
  • Innovative techniques enhance the "credibility" of the profession.