Cost is the Dependent Variable

1
Cost is the Dependent Variable
  • Issues in Econometric Modeling
  • Wei Yu

2
Focus of the Analysis
  • Costs in a fixed period (e.g., annual
    person-level cost for patients with substance
    use disorders)
  • Costs without a large number of zeros (e.g., VA
    users)

3
Patterns of Cost Data
  • Skewed distribution
  • Nonlinearity in response to covariates
  • Cost response varies by type of care (e.g.,
    outpatient vs. inpatient)

4
Potential Problems in OLS
  • OLS may yield biased and/or less precise
    estimates of means and marginal effects

5
Alternative Estimation Methods
  • Log transformation
  • Generalized Linear Models (GLM)

6
Log Transformation
  • $\ln(y) = X\beta + \varepsilon$
  • where $E(\varepsilon) = 0$ and $E(X'\varepsilon) = 0$

7
Log Transformation (cont'd)
  • Advantages
    • Improved precision
  • Disadvantages
    • Results on the log scale are not of direct interest
    • Retransformation problems
    • May not achieve linearity

8
Retransformation
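  • $\ln(y) = X\beta + \varepsilon$
  • where $E(\varepsilon) = 0$ and $E(X'\varepsilon) = 0$
  • $E(y \mid x) = E[\exp(X\beta + \varepsilon) \mid x] = \exp(X\beta)\, E[\exp(\varepsilon) \mid x]$
  • If $\varepsilon$ is normally distributed, $\varepsilon \sim N(0, \sigma^2)$, then
  • $E(y \mid x) = \exp(X\beta)\, \exp(0.5\,\sigma^2)$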
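
The normal-theory correction factor in the last bullet comes from completing the square in the Gaussian integral. A short derivation, assuming the error is exactly normal with mean zero and variance $\sigma^2$:

```latex
\begin{aligned}
E[\exp(\varepsilon)]
  &= \int_{-\infty}^{\infty} e^{\varepsilon}\,
     \frac{1}{\sqrt{2\pi\sigma^2}}\,
     e^{-\varepsilon^2/(2\sigma^2)}\, d\varepsilon \\
  &= e^{\sigma^2/2} \int_{-\infty}^{\infty}
     \frac{1}{\sqrt{2\pi\sigma^2}}\,
     e^{-(\varepsilon-\sigma^2)^2/(2\sigma^2)}\, d\varepsilon
   = e^{\sigma^2/2}
\end{aligned}
```

The remaining integral is the total mass of a normal density centered at $\sigma^2$, which equals one.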

9
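Retransformation (cont'd)
  • If $\varepsilon$ is not normally distributed, but is i.i.d.,
  • or $\exp(\varepsilon)$ has constant mean and variance, then
  • $E(y \mid x) = \exp(X\beta) \times s$,
  • where, for the smearing retransformation (Duan, 1983),
  • $s = \frac{1}{n} \sum_i \exp(\hat{\varepsilon}_i)$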
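
The two retransformation factors can be compared on simulated data. The sketch below is illustrative only: the data-generating process, the coefficients, and the helper name `fit_ols` are assumptions, not part of the presentation, and NumPy stands in for the Stata workflow the slides refer to.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients for y = X b + e (X already has a constant)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(0)
n = 50_000
x = rng.uniform(0.0, 2.0, size=n)
X = np.column_stack([np.ones(n), x])
# Simulated (hypothetical) costs: ln(y) = 1 + 0.5 x + e, e ~ N(0, 0.8^2)
eps = rng.normal(0.0, 0.8, size=n)
y = np.exp(1.0 + 0.5 * x + eps)

b = fit_ols(X, np.log(y))
resid = np.log(y) - X @ b

naive = np.exp(X @ b)                           # ignores E[exp(e)]
sigma2_hat = resid @ resid / (n - X.shape[1])   # residual variance on the log scale
normal_factor = np.exp(0.5 * sigma2_hat)        # appropriate if e is normal
smear_factor = np.mean(np.exp(resid))           # smearing factor: mean of exp(residuals)

print("mean cost:            ", y.mean())
print("naive prediction mean:", naive.mean())
print("normal-theory mean:   ", (naive * normal_factor).mean())
print("smearing mean:        ", (naive * smear_factor).mean())
```

Because the simulated errors are normal, the two factors should nearly coincide here; with skewed or heavy-tailed errors only the smearing factor would be expected to stay close to the observed mean.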

10
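Retransformation (cont'd)
  • If $\varepsilon$ is heteroscedastic in $x$:
  • $E(y \mid x) = f(x) \times \exp(X\beta)$
  • $E(y \mid x) \neq \text{const} \times \exp(X\beta)$
  • $\partial E(y \mid x) / \partial x \neq \beta \times \exp(X\beta)$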
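
A small simulation can show why a single constant factor fails under heteroscedasticity. This is a sketch under stated assumptions: a hypothetical binary covariate `g` whose error spread differs by group, with all numbers invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
g = rng.integers(0, 2, size=n)              # hypothetical binary covariate
sigma = np.where(g == 1, 1.2, 0.5)          # error spread depends on g
y = np.exp(1.0 + 0.3 * g + rng.normal(0.0, 1.0, size=n) * sigma)

X = np.column_stack([np.ones(n), g])
b, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
xb = X @ b
resid = np.log(y) - xb

pooled = np.mean(np.exp(resid))             # one factor applied to everyone
results = {}
for k in (0, 1):
    m = g == k
    by_group = np.mean(np.exp(resid[m]))    # factor estimated within group k
    results[k] = (
        y[m].mean(),                         # observed mean cost
        (np.exp(xb[m]) * pooled).mean(),     # prediction with the pooled factor
        (np.exp(xb[m]) * by_group).mean(),   # prediction with the group factor
    )
    print(k, results[k])
```

In this setup the pooled factor overstates mean cost in the low-variance group and understates it in the high-variance group, while the group-specific factor reproduces each group mean. With continuous covariates the factor would have to be modeled as a function of x, which is one motivation for moving to a GLM.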

11
Solution
  • Using an appropriate Generalized Linear Model
    (GLM)

12
GLM
  • Model specification
  • A link function
  • A mean-variance relationship

13
GLM: Picking a Link Function
  • Box-Cox test
  • Find the MLE value of $\lambda$, where
  • $y^{(\lambda)} = (y^{\lambda} - 1)/\lambda$ when $\lambda \neq 0$
  • $y^{(\lambda)} = \ln(y)$ when $\lambda = 0$
  • Stata: boxcox
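
For readers without Stata, the Box-Cox search can be approximated with a grid over the profile log-likelihood. This is a minimal sketch, not Stata's implementation: the function names, the grid, and the simulated data are assumptions made for illustration.

```python
import numpy as np

def boxcox_transform(y, lam):
    """Box-Cox transform: (y**lam - 1)/lam for lam != 0, ln(y) for lam == 0."""
    if abs(lam) < 1e-12:
        return np.log(y)
    return (y ** lam - 1.0) / lam

def boxcox_profile_loglik(y, X, lam):
    """Profile log-likelihood of lam (up to a constant) for a normal linear
    model fitted to the transformed outcome."""
    n = len(y)
    z = boxcox_transform(y, lam)
    coef, *_ = np.linalg.lstsq(X, z, rcond=None)
    rss = np.sum((z - X @ coef) ** 2)
    # The second term is the log Jacobian of the transformation.
    return -0.5 * n * np.log(rss / n) + (lam - 1.0) * np.sum(np.log(y))

rng = np.random.default_rng(2)
n = 20_000
x = rng.uniform(0.0, 2.0, size=n)
X = np.column_stack([np.ones(n), x])
# Simulated data that are log-linear by construction, so lambda should be near 0.
y = np.exp(1.0 + 0.5 * x + rng.normal(0.0, 0.7, size=n))

grid = np.linspace(-1.0, 2.0, 301)
loglik = np.array([boxcox_profile_loglik(y, X, lam) for lam in grid])
lam_hat = grid[np.argmax(loglik)]
print("lambda_hat:", lam_hat)
```

A grid search is used for transparency; a one-dimensional optimizer would give a finer estimate of the maximizing value.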

14
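GLM: Picking a Link Function (cont'd)
  • Examples
  • If $\lambda = -1$: inverse, $1/y = X\beta + \varepsilon$
  • If $\lambda = 0$: log, $\ln(y) = X\beta + \varepsilon$
  • If $\lambda = 0.5$: square root, $y^{1/2} = X\beta + \varepsilon$
  • If $\lambda = 1$: linear, $y = X\beta + \varepsilon$
  • If $\lambda = 2$: square, $y^{2} = X\beta + \varepsilon$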

15
GLM: Test for Linearity
  • Pregibon's link test
  • $y = \delta_0 + \delta_1 (X\hat{\beta}) + \delta_2 (X\hat{\beta})^2 + \varepsilon$
  • Test $\hat{\delta}_2 = 0$
  • Stata: linktest
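
The link test regression can be reproduced by hand. The sketch below follows the formula on this slide rather than Stata's `linktest` internals; the helper names and the simulated data are assumptions, and conventional (homoscedastic) standard errors are used.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients and their conventional standard errors."""
    n, k = X.shape
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    s2 = resid @ resid / (n - k)
    cov = s2 * np.linalg.inv(X.T @ X)
    return coef, np.sqrt(np.diag(cov))

def link_test(X, y):
    """t statistic on the squared linear predictor."""
    b, _ = ols(X, y)
    xb = X @ b
    Z = np.column_stack([np.ones(len(y)), xb, xb ** 2])
    d, se = ols(Z, y)
    return d[2] / se[2]

rng = np.random.default_rng(3)
n = 5_000
x = rng.uniform(0.0, 3.0, size=n)
X = np.column_stack([np.ones(n), x])
e = rng.normal(0.0, 1.0, size=n)

t_ok = link_test(X, 1.0 + 2.0 * x + e)         # outcome truly linear in x
t_bad = link_test(X, 1.0 + 2.0 * x ** 2 + e)   # curvature the model misses
print("t (correct form):", t_ok, " t (misspecified):", t_bad)
```

A large absolute t statistic on the squared term signals that the fitted form misses curvature; it does not say whether the problem lies on the left or the right side of the equation.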

16
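GLM: Test for Linearity (cont'd)
  • Ramsey's RESET test
  • $y = \delta_0 + \delta_1 (X\hat{\beta}) + \delta_2 (X\hat{\beta})^2 + \delta_3 (X\hat{\beta})^3 + \delta_4 (X\hat{\beta})^4 + \varepsilon$
  • Test $\hat{\delta}_2 = \hat{\delta}_3 = \hat{\delta}_4 = 0$
  • Stata: ovtest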
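
The joint test on the higher powers is an ordinary F-test comparing nested regressions. The following sketch implements the formula as written on this slide, which may differ in detail from Stata's `ovtest`; the function names and simulated data are assumptions.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares from an OLS fit of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ coef
    return r @ r

def reset_f(X, y, max_power=4):
    """F statistic for delta_2 = ... = delta_max_power = 0 in a regression of
    y on powers of the fitted linear predictor. Returns (F, q, df_resid)."""
    n = len(y)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    xb = X @ coef
    # Standardizing xb leaves the span of its powers unchanged (so F is
    # unaffected) and keeps the design matrix well conditioned.
    xb = (xb - xb.mean()) / xb.std()
    restricted = np.column_stack([np.ones(n), xb])
    powers = [xb ** p for p in range(2, max_power + 1)]
    unrestricted = np.column_stack([restricted] + powers)
    q = max_power - 1                       # number of restrictions
    df_resid = n - unrestricted.shape[1]
    rss_r, rss_u = rss(restricted, y), rss(unrestricted, y)
    return ((rss_r - rss_u) / q) / (rss_u / df_resid), q, df_resid

rng = np.random.default_rng(4)
n = 5_000
x = rng.uniform(0.0, 2.0, size=n)
X = np.column_stack([np.ones(n), x])
e = rng.normal(0.0, 1.0, size=n)

f_ok, q, df_resid = reset_f(X, 1.0 + 2.0 * x + e)   # correctly specified
f_bad, _, _ = reset_f(X, np.exp(x) + e)             # nonlinear in x
print("F (correct form):", f_ok, " F (misspecified):", f_bad, " df:", (q, df_resid))
```

The statistic would be referred to an F distribution with `q` and `df_resid` degrees of freedom to obtain a p-value.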

17
GLM: Test for Linearity (cont'd)
  • Modified Hosmer-Lemeshow test
  • Estimate the model (e.g., $\ln(y) = X\beta + \varepsilon$)
  • Retransform to get $\hat{y}$ on the raw scale
  • Compute $\hat{e} = y - \hat{y}$ on the raw scale
  • Create 10 groups, sorted by $X\hat{\beta}$
  • F-test of whether the group mean residuals differ from
    zero
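
The steps above can be sketched as follows. This is an assumption-laden illustration: the retransformation uses a smearing factor, the F statistic treats raw-scale residuals as homoscedastic, and the function names and simulated data are hypothetical.

```python
import numpy as np

def log_ols_raw_residuals(X, y):
    """Fit ln(y) = X b + e by OLS, retransform with a smearing factor, and
    return raw-scale residuals together with the linear predictor."""
    b, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
    xb = X @ b
    smear = np.mean(np.exp(np.log(y) - xb))
    return y - np.exp(xb) * smear, xb

def modified_hosmer_lemeshow(resid, xb, groups=10):
    """Group raw-scale residuals by quantiles of xb and F-test whether all
    group means are zero. Returns (F, group_means)."""
    n = len(resid)
    order = np.argsort(xb)
    chunks = np.array_split(order, groups)          # near-equal-size groups
    means = np.array([resid[idx].mean() for idx in chunks])
    sizes = np.array([len(idx) for idx in chunks])
    # Unrestricted model: one mean per group. Restricted: all means are zero.
    rss_u = sum(np.sum((resid[idx] - resid[idx].mean()) ** 2) for idx in chunks)
    explained = np.sum(sizes * means ** 2)          # restricted RSS minus rss_u
    f_stat = (explained / groups) / (rss_u / (n - groups))
    return f_stat, means

rng = np.random.default_rng(5)
n = 20_000
x = rng.uniform(0.0, 2.0, size=n)
X = np.column_stack([np.ones(n), x])
e = rng.normal(0.0, 0.6, size=n)
y_ok = np.exp(1.0 + 0.5 * x + e)          # log-linear in x
y_bad = np.exp(0.5 + 0.5 * x ** 2 + e)    # log-quadratic in x, fitted as linear

f_ok, means_ok = modified_hosmer_lemeshow(*log_ols_raw_residuals(X, y_ok))
f_bad, means_bad = modified_hosmer_lemeshow(*log_ols_raw_residuals(X, y_bad))
print("F (correct form):", f_ok, " F (misspecified):", f_bad)
print("group mean residuals (misspecified):", np.round(means_bad, 2))
```

Inspecting the group means, not just the F statistic, shows where along the range of predictions the model misfits.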

18
GLM: Test for Linearity (cont'd)
  • All of the above tests are diagnostic, not
    constructive.
  • If the null is rejected, look for problems on either the
    • Left side (wrong power function) or
    • Right side (wrong functional form of x)

19
GLM: Determining a Mean-Variance Relationship
  • GLM family test (Park test)
  • 1. Regress y (raw scale) on x
  • 2. Save the raw-scale residuals $\hat{e}$ and predictions $\hat{y}$
  • 3. Regress $\ln(\hat{e}^2)$ on $\ln(\hat{y})$ and a constant
  • Alternative to step 1:
  • GLM of y on x with gamma family and log link.

20
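GLM Family Test (cont'd)
  • The coefficient on $\ln(\hat{y})$, $\hat{\gamma}$, gives the family
  • If $\hat{\gamma} = 0$: Gaussian (variance unrelated to mean)
  • If $\hat{\gamma} = 1$: Poisson (variance proportional to mean)
  • If $\hat{\gamma} = 2$: Gamma (variance proportional to mean squared)
  • If $\hat{\gamma} = 3$: Wald or inverse Gaussian (variance proportional to mean cubed)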
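
The Park test steps and the family lookup can be sketched on simulated data. Everything here is illustrative: the mean function, the gamma shape, and the helper name `park_test` are assumptions, and step 1 uses a raw-scale OLS fit whose predictions happen to be positive in this setup.

```python
import numpy as np

def park_test(y, y_hat):
    """Slope from regressing ln((y - y_hat)^2) on ln(y_hat) and a constant."""
    resid2 = (y - y_hat) ** 2
    Z = np.column_stack([np.ones(len(y)), np.log(y_hat)])
    coef, *_ = np.linalg.lstsq(Z, np.log(resid2), rcond=None)
    return coef[1]

rng = np.random.default_rng(6)
n = 20_000
x = rng.uniform(0.0, 4.0, size=n)
mu = 5.0 + 3.0 * x                       # hypothetical mean cost, linear in x
shape = 2.0
y = rng.gamma(shape, mu / shape)         # gamma draws: variance = mu^2 / shape

X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)   # step 1: raw-scale regression
y_hat = X @ b                               # step 2: raw-scale predictions
slope = park_test(y, y_hat)                 # step 3: log squared residuals on log predictions

families = {0: "Gaussian", 1: "Poisson", 2: "gamma", 3: "inverse Gaussian"}
nearest = min(families, key=lambda k: abs(slope - k))
print("Park slope:", slope, "->", families[nearest])
```

If raw-scale OLS produced non-positive predictions, the logarithm in step 3 would fail, which is one practical reason to use the gamma GLM with a log link for step 1 instead.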

21
GLM: Test for Over-Fitting
  • Copas test
  • Randomly split the sample into two groups (half-half,
    2/3-1/3, etc.)
  • Estimate the model on group 1
  • Forecast to group 2:
  • $\hat{y}_2 = X_2 \hat{\beta}_1$
  • Regress $y_2$ against $\hat{y}_2$:
  • $y_2 = \alpha_0 + \alpha_1 \hat{y}_2 + \varepsilon$
  • Test $\alpha_1 = 1$
  • Repeat 1000 times to get a distribution.
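
The split-sample procedure above can be sketched in a few lines. This is a minimal illustration on a linear model with simulated data; the function name `copas_slopes`, the default of 200 repetitions (chosen for speed), and the coefficients are assumptions.

```python
import numpy as np

def copas_slopes(X, y, reps=200, frac=0.5, seed=0):
    """Slope of y2 on y2_hat over repeated random splits of the sample."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n1 = int(frac * n)
    slopes = np.empty(reps)
    for r in range(reps):
        perm = rng.permutation(n)
        i1, i2 = perm[:n1], perm[n1:]
        b1, *_ = np.linalg.lstsq(X[i1], y[i1], rcond=None)   # estimate on group 1
        y2_hat = X[i2] @ b1                                   # forecast to group 2
        Z = np.column_stack([np.ones(len(i2)), y2_hat])
        a, *_ = np.linalg.lstsq(Z, y[i2], rcond=None)         # y2 = a0 + a1 * y2_hat + e
        slopes[r] = a[1]
    return slopes

rng = np.random.default_rng(7)
n = 2_000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 4))])
beta = np.array([1.0, 0.5, -0.3, 0.8, 0.2])    # hypothetical coefficients
y = X @ beta + rng.normal(size=n)

slopes = copas_slopes(X, y)
low, high = np.percentile(slopes, [2.5, 97.5])
print("mean slope:", slopes.mean(), " 95% interval:", (low, high))
```

For a parsimonious, correctly specified model the slopes should cluster near 1; an interval that excludes 1 suggests the model does not forecast well out of sample.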

22
GLM: Test for Over-Fitting (cont'd)
  • If the null hypothesis is rejected, over-fitting may be a
    problem
  • Examine
    • The model
    • Outliers
  • Use it to compare models

23
GLM: Example
  • Sample: 300,000 randomly selected VA patients
  • Dependent variable: annual person-level cost
  • Independent variables: age, race, common chronic
    conditions

24
GLM: Example
  • Graphs of the cost distribution

25
Box-Cox Test
  • Variable: total cost
  • $\hat{\lambda} = 0.04$
  • Link function: $\ln(y)$

26
Link and RESET Tests
  • Model: $\ln(y) = \alpha + \beta X + \varepsilon$
  • P-values
    • Link test: p < 0.001 (coefficient on the squared term = -0.14)
    • RESET: p < 0.001
  • Both tests indicate problems

27
Hosmer-Lemeshow Test
  • F-statistic: 497.9
  • p-value < 0.001
  • Problem in upper groups
  • (Graph shown here)

28
GLM Family Test
  • Coefficient on $\ln(\hat{y})$: $\hat{\gamma} = 1.96$ (p < 0.001)
  • Family: Gamma

29
Copas Test for Ln(y)
Variable   Obs    Mean     Std. Dev.   Min        Max
copas1     1000   .06572   .0069411    .0500775   .095947

95% confidence interval for test of slope = 1: .05407852 to .08041483
Conclude: The model failed the Copas over-fitting test

30
Methods Tried to Fix the Model
  • Fixed outliers (both 1% and 10%)
  • Took a double log transformation of total cost
  • Neither method improved the model fit
  • Consider functional forms for right-side
    variables

31
Other Things We May Try
  • Consider functional forms for the right-side
    variables

32
The Final Model
  • GLM with Gamma family and log link function
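
A gamma GLM with a log link can be fitted with a few lines of iteratively reweighted least squares (IRLS). The sketch below is not the presenter's code or Stata's `glm`; the function name, the simulated data, and the stopping rule are assumptions made to show how the model predicts on the raw scale without retransformation.

```python
import numpy as np

def gamma_log_glm(X, y, iters=25, tol=1e-10):
    """Fit E(y|x) = exp(X b) with a gamma variance function by IRLS.
    With a log link and variance proportional to mu^2 the IRLS weights are
    constant, so each step is an unweighted least-squares fit."""
    b, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)   # log-OLS starting values
    for _ in range(iters):
        eta = X @ b
        mu = np.exp(eta)
        z = eta + (y - mu) / mu                         # working response
        b_new, *_ = np.linalg.lstsq(X, z, rcond=None)
        converged = np.max(np.abs(b_new - b)) < tol
        b = b_new
        if converged:
            break
    return b

rng = np.random.default_rng(8)
n = 20_000
x = rng.uniform(0.0, 2.0, size=n)
X = np.column_stack([np.ones(n), x])
mu_true = np.exp(1.0 + 0.5 * x)          # hypothetical mean cost
y = rng.gamma(2.0, mu_true / 2.0)        # gamma draws with mean mu_true

b_hat = gamma_log_glm(X, y)
y_hat = np.exp(X @ b_hat)                # raw-scale prediction, no smearing needed
print("coefficients:", b_hat)
print("mean cost:", y.mean(), " mean prediction:", y_hat.mean())
```

Because the model is specified for the mean of y directly, the coefficients describe proportional changes in expected cost and predictions need no smearing or normal-theory adjustment.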

33
Discussion
  • With a large number of observations, tests are more
    likely to reject any hypothesis that a coefficient
    is zero
  • OLS may provide reasonably accurate estimates
  • When right-side variables are all indicators,
    linearity may not be a major problem

34
Questions?

35
References
Ai, C. and E.C. Norton. "Standard Errors for the Retransformation Problem with Heteroscedasticity," Journal of Health Economics 19(5): 697-718, 2000.
Blough, D.K., C.W. Madden, and M.C. Hornbrook. "Modeling Risk Using Generalized Linear Models," Journal of Health Economics 18: 153-171, 1999.
Duan, N. "Smearing Estimate: A Nonparametric Retransformation Method," Journal of the American Statistical Association 78: 605-610, 1983.
Duan, N., W.G. Manning, et al. "A Comparison of Alternative Models for the Demand for Medical Care," Journal of Business and Economic Statistics 1: 115-126, 1983.
McCullagh, P. and J.A. Nelder. Generalized Linear Models, 2nd Edition. London: Chapman and Hall, 1989.
Manning, W.G. "The Logged Dependent Variable, Heteroscedasticity, and the Retransformation Problem," Journal of Health Economics 17: 283-295, 1998.

36
References (cont'd)
Manning, W.G., A. Basu, and J. Mullahy. "Generalized Modeling Approaches to Risk Adjustment of Skewed Outcomes Data," Journal of Health Economics 24: 465-488, 2005.
Manning, W.G. and J. Mullahy. "Estimating Log Models: To Transform or Not to Transform?" Journal of Health Economics 20(4): 461-494, 2001.
Mullahy, J. "Much Ado about Two: Reconsidering Retransformation and the Two-part Model in Health Econometrics," Journal of Health Economics 17: 247-281, 1998.
Park, R. "Estimation with Heteroscedastic Error Terms," Econometrica 34: 888, 1966.
Pregibon, D. "Goodness of Link Tests for Generalized Linear Models," Applied Statistics 29: 15-24, 1980.