1
Chapter 6: Transformed Linear Regression
  • www.mysmu.edu/faculty/zlyang

2
Contents
1. Introduction
2. Transformed Linear Regression
3. Estimate Transformation Parameter
4. Applications
3
6.1 Introduction
Recall the required conditions for the linear
regression model:
  • The error εi is normally distributed.
  • The standard deviation σ of εi is constant for
    all values of Yi.
  • The errors are independent.

4
6.1 Introduction
  • When one or more of these conditions are not met,
    one may consider transforming the response Y.
  • The aims of transformation (Box and Cox 1964):
  • to induce near normality
  • to achieve constancy of error variance
  • to obtain a simpler model (e.g., remove
    interactions)

5
6.1 Introduction
  • A brief list of transformations:
  • Y → log Y (for Y > 0)
  • use when σ increases with Y, or
  • use when the error distribution is positively
    skewed
  • Y → Y²
  • use when σ² is proportional to E(Y), or
  • use when the error distribution is negatively
    skewed
  • Y → Y^(1/2) (for Y > 0)
  • use when σ² is proportional to E(Y)
  • Y → 1/Y
  • use when σ² increases sharply once Y
    increases beyond some critical value.

6
6.1 Introduction
  • All of the above transformations are special
    cases of the Box-Cox power transformation

      h(Y, λ) = (Y^λ − 1)/λ   if λ ≠ 0,
      h(Y, λ) = log Y         if λ = 0.

Note: λ = 2 gives (up to a linear rescaling) the
Y² transformation; λ = 1/2 gives Y^(1/2); λ = −1
gives 1/Y; etc.
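The Box-Cox special cases noted above can be checked numerically. A minimal sketch (plain Python with an arbitrary test value; the slides themselves use R):

```python
import math

def boxcox(y, lam):
    """Box-Cox power transformation h(y, lambda)."""
    return math.log(y) if lam == 0 else (y ** lam - 1.0) / lam

y = 3.7  # an arbitrary positive value

# As lambda -> 0, (y^lambda - 1)/lambda -> log(y)
print(abs(boxcox(y, 1e-8) - math.log(y)) < 1e-6)       # True

# lambda = 2 gives (y^2 - 1)/2, an affine function of y^2
print(abs(boxcox(y, 2.0) - (y ** 2 - 1) / 2) < 1e-12)  # True

# lambda = -1 gives 1 - 1/y, an affine function of 1/y
print(abs(boxcox(y, -1.0) - (1 - 1 / y)) < 1e-12)      # True
```

Since an affine map does not change the shape of a regression relationship, each λ value is equivalent to the corresponding simple transformation.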
7
6.2 Transformed Linear Regression
Assume there exists a λ such that the transformed
responses satisfy the usual linear regression
model
  • h(Yi, λ) = β0 + β1 Xi1 + β2 Xi2 + … + βp Xip + εi,
  • i = 1, …, n, such that the errors εi
  • are normally distributed,
  • have constant variance, and
  • are independent.
  • Such a model is called a transformed linear
    regression model.

8
How to find λ?
  • Based on the maximum likelihood estimation (MLE)
    method.
  • Since εi ~ N(0, σ²), the probability density
    function (pdf) of Zi = h(Yi, λ) is

      f(zi) = (2πσ²)^(−1/2) exp{−(zi − μi)² / (2σ²)},

  • where μi = β0 + β1 Xi1 + β2 Xi2 + … + βp Xip.
  • An application of the change-of-variable technique,
    with Jacobian ∂h(yi, λ)/∂yi = yi^(λ−1),
    gives the pdf of Yi as

      f(yi) = (2πσ²)^(−1/2) exp{−[h(yi, λ) − μi]² / (2σ²)} · yi^(λ−1).
9
How to find λ?
This leads to the log-likelihood function

  ℓ(β, σ², λ) = −(n/2) log(2πσ²) − (1/(2σ²)) Σi [h(Yi, λ) − μi]² + (λ − 1) Σi log Yi.

For a given λ, the function is maximized at

  β̂(λ) = (X′X)^(−1) X′ h(Y, λ),
  σ̂²(λ) = h(Y, λ)′ (I − P) h(Y, λ) / n,

where I is the n×n identity matrix and P = X(X′X)^(−1)X′ is the hat matrix.
10
How to find λ?
Substituting back into the log-likelihood function
gives the partially maximized log-likelihood
function

  ℓmax(λ) = −(n/2) log[h(Y, λ)′ (I − P) h(Y, λ)] + (λ − 1) Σi log Yi + constant.

Finally, maximizing this function gives the
optimal value of λ, which can be used to
transform the Yi's. Then one can perform the
regular regression analysis on the transformed
Yi's.
Note: maximization of ℓmax(λ) can only be
done numerically using computer software, e.g., R.
11
How to find λ?
  • Steps for finding the value of λ that maximizes ℓmax(λ):
  • Choose a grid of values λ1, λ2, . . . , λK from
    [−2, 2].
  • Compute ℓmax(λk) for k = 1, 2, . . . , K.
  • Find the λk whose ℓmax(λk) is the maximum among all
    K values;
  • this λk is the optimal value.

This method is called the grid search method.
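The steps above can be sketched in code for a one-predictor model. This is an illustrative Python version with synthetic data (the function names and dataset are my own, not from the slides; the slides implement the same search in R). Since the data are generated with log Y linear in x, the search should select λ̂ near 0:

```python
import math
import random

def boxcox_list(y, lam):
    """Apply the Box-Cox transformation to a list of positive responses."""
    return [math.log(v) for v in y] if lam == 0 else [(v ** lam - 1.0) / lam for v in y]

def profile_loglik(y, x, lam):
    """Partially maximized log-likelihood for h(Y, lam) = b0 + b1*x + e:
    -(n/2) log(SSE(lam)) + (lam - 1) * sum(log Y), additive constants dropped."""
    z = boxcox_list(y, lam)
    n = len(y)
    xbar, zbar = sum(x) / n, sum(z) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxz = sum((xi - xbar) * (zi - zbar) for xi, zi in zip(x, z))
    b1 = sxz / sxx
    # Residual sum of squares of the OLS fit, i.e. z'(I - P)z
    sse = sum((zi - zbar - b1 * (xi - xbar)) ** 2 for xi, zi in zip(x, z))
    return -n / 2.0 * math.log(sse) + (lam - 1.0) * sum(math.log(v) for v in y)

# Synthetic data: log(Y) is linear in x
random.seed(1)
x = [i / 10.0 for i in range(100)]
y = [math.exp(0.5 + 0.3 * xi + random.gauss(0, 0.1)) for xi in x]

grid = [k / 100.0 for k in range(-200, 201)]  # lambda in [-2, 2], step 0.01
lam_hat = max(grid, key=lambda lam: profile_loglik(y, x, lam))
print(lam_hat)  # close to 0 for this log-linear data
```

A coarse grid like this is usually refined around the maximizer, or replaced by a finer grid as in the R code later in the chapter.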
12
6.3 Applications
  • Example 6.1 (The Salary Survey Data)
  • Developed from a salary survey of computer
    professionals
  • To identify and quantify the variables determining
    salary differentials, and
  • to check whether the corporation's salary
    administration guidelines were being followed.
    The variables are:
  • S = Salary (the response variable)
  • X = Years of experience
  • E = Education (1 = high school, 2 = bachelor,
    3 = advanced degree)
  • M = whether the person has management
    responsibility

13
Example 6.1, cont'd: Regular Linear Regression
# R code for linear regression
salary <- read.table("P122.txt", header = TRUE)
y  <- salary[, 1]       # Salary
x1 <- salary[, 2]       # Years of experience
x2 <- salary[, 3] == 1  # Dummy for high school diploma
x3 <- salary[, 3] == 2  # Dummy for bachelor degree
x4 <- salary[, 4] == 1  # Dummy for management responsibility
Fit <- lm(y ~ x1 + x2 + x3 + x4)
(summary(Fit))
(anova(Fit))
14
Example 6.1, cont'd: Regular Linear Regression
According to the R² value, the model fits the data
very well. But the residual analysis that follows
tells a different story.
15
Example 6.1, cont'd: Regular Linear Regression
# R code for residual analysis
n  <- length(y)
p  <- 4
x0 <- matrix(1, n, 1)
X  <- cbind(x0, x1, x2, x3, x4)
SSE  <- deviance(Fit)            # Sum of squares due to errors
sigh <- sqrt(SSE / (n - p - 1))  # Estimate of error standard deviation
y_hat <- predict(Fit)            # Predicted values
e_hat <- residuals(Fit)          # OLS residuals
# Compute the hat matrix
P <- X %*% solve(t(X) %*% X) %*% t(X)
h <- diag(P)                     # Diagonal elements of P, as a vector
16
Example 6.1, cont'd: Regular Linear Regression
# R code for residual analysis
# Studentized residuals
r_hat <- e_hat / (sigh * sqrt(1 - h))
# Externally studentized residuals
r_star <- r_hat * sqrt((n - p - 2) / (n - p - 1 - r_hat^2))
plot(x1, y)      # Scatter plot of y versus x1
plot(e_hat)      # Index plot of OLS residuals
plot(r_hat)      # Index plot of studentized residuals
plot(x1, r_hat)  # Studentized residuals against x1
hist(r_hat)      # Histogram of studentized residuals
17
Example 6.1, cont'd: Box-Cox Linear Regression
# R code for Box-Cox linear regression:
# search for the lambda value that maximizes the likelihood function
I <- diag(1, n, n)  # n x n identity matrix
lam <- seq(-2, 2, by = 0.0001)
ns <- length(lam)
llik <- matrix(0, 1, ns)
for (i in 1:ns) {
  if (lam[i] == 0) ly <- log(y) else ly <- (y^lam[i] - 1) / lam[i]
  llik[i] <- -n * log(t(ly) %*% (I - P) %*% ly) / 2 + (lam[i] - 1) * sum(log(y))
}
ind  <- order(llik, lam, decreasing = TRUE)
lmax <- rbind(llik, lam)[, ind]
(c("lambda hat =", lmax[2, 1]))
18
Example 6.1, cont'd: Box-Cox Linear Regression
# R code for Box-Cox linear regression:
# perform regression analysis based on the Box-Cox transformation
lamh <- lmax[2, 1]
if (lamh == 0) ly <- log(y) else ly <- (y^lamh - 1) / lamh
fitbc <- lm(ly ~ x1 + x2 + x3 + x4)
(summary(fitbc))
(anova(fitbc))
e_hat <- residuals(fitbc)
sighbc <- sqrt(deviance(fitbc) / (n - p - 1))  # Error SD of the transformed fit
r_hat <- e_hat / (sighbc * sqrt(1 - h))
r_star <- r_hat * sqrt((n - p - 2) / (n - p - 1 - r_hat^2))
plot(x1, r_hat)  # Studentized residuals against x1
hist(r_hat)      # Histogram of studentized residuals
19
Example 6.1, cont'd: Box-Cox Linear Regression
The model fit improved, and the problem of
heteroscedasticity is alleviated.
20
Example 6.2: Box-Cox Linear Regression for
Education Expenditure Data (1960, 1970, 1975)
Y  = Per capita expenditure on public education
X1 = Per capita personal income
X2 = Number of residents per thousand under 18
     years of age
X3 = Number of people per thousand residing in
     urban areas
G  = Geographical region (1 = Northeast,
     2 = North Central, 3 = South, 4 = West)
21
Example 6.2 Box-Cox Linear Regression for
Education Expenditure Data (1960, 1970, 1975)
  • Questions
  • How is the education expenditure related to the
    other variables?
  • Does the education expenditure differ among the
    regions?
  • Is the expenditure relationship stable with
    respect to time?
  • Is heteroscedasticity an issue?
  • We use the Box-Cox linear regression technique to
    address these issues. (Download the R script for
    this analysis.)

22
Example 6.2 Box-Cox Linear Regression for
Education Expenditure Data (1960, 1970, 1975)
Response: Y; Year: 1960
23
Example 6.2 Box-Cox Linear Regression for
Education Expenditure Data (1960, 1970, 1975)
Response: Y; Year: 1970
24
Example 6.2 Box-Cox Linear Regression for
Education Expenditure Data (1960, 1970, 1975)
Response: Y; Year: 1975
25
Example 6.2 Box-Cox Linear Regression for
Education Expenditure Data (1960, 1970, 1975)
Response: h(Y, λ̂), λ̂ = −0.1352; Year: 1960
26
Example 6.2 Box-Cox Linear Regression for
Education Expenditure Data (1960, 1970, 1975)
Response: h(Y, λ̂), λ̂ = 0.1939; Year: 1970
27
Example 6.2 Box-Cox Linear Regression for
Education Expenditure Data (1960, 1970, 1975)
Response: h(Y, λ̂), λ̂ = −1.3395; Year: 1975