Practical Sheet 6 Solutions

Slides: 41

Provided by: jphil

Transcript and Presenter's Notes

1
Practical Sheet 6 Solutions
The R data frame whiteside, which deals with
gas consumption, is made available in R by
> data(whiteside, package = "MASS")
It records the weekly gas consumption and average
external temperature at a house in south-east
England during two heating seasons, one before
and one after cavity-wall insulation was
installed. The variables are:
Variable   Description
Gas        weekly gas consumption
Temp       average external temperature during week
Insul      (binary factor) Before (insulation) or After
2
(No Transcript)
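
The output on this slide was not transcribed. A minimal sketch of how a fit of this kind might be reproduced is given here, assuming that the regression uses only the Before weeks (an inference from the 26 points quoted in the test that follows); the names before and fit are illustrative.

```r
library(MASS)  # provides the whiteside data frame

# Assumption: the fit discussed in the slides uses only the "Before" weeks
before <- subset(whiteside, Insul == "Before")

# Simple linear regression of gas consumption on external temperature
fit <- lm(Gas ~ Temp, data = before)

# Coefficient table with standard errors, t values and R-squared
summary(fit)
```

The Temp row of the coefficient table is the place to look for the slope b and its standard error.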
3
We check whether b and $\rho$ are significantly
different from 0. (This is clear for b from the
print-out, so the manual calculation is not
normally required.)
4

The value of b is -0.3932. Now carry out a
hypothesis test.
H0: $b = 0$   H1: $b \neq 0$
The standard error of b is calculated in R as
0.01959.
5
The test statistic is $t = (b - 0)/\mathrm{SE}(b)$.
This calculates as
$(-0.3932 - 0)/0.01959 = -20.071$
6
[Figure: t distribution with cut-off points marked at -2.064 and 2.064]
t tables using 24 degrees of freedom (there are
26 points) give a cut-off point of 2.064 for 2.5%
in each tail.
7
Since -20.071 is less than -2.064, we reject H0
in favour of H1. There is evidence at the 5%
level of a significant negative relationship. In
fact, the two-sided t values associated with
significance levels of 1% and 0.1% are 2.797 and
3.745, and so b is also significant at the 0.1%
level (very highly significant). This corresponds
to the three stars on the R output.
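
The hand calculation above can be checked in a few lines. This is only a sketch using the two figures quoted in the slides; the variable names are illustrative.

```r
b  <- -0.3932   # slope estimate quoted above
se <- 0.01959   # its standard error, as reported by R
df <- 26 - 2    # 26 points, two estimated parameters

t_stat  <- (b - 0) / se               # test statistic under H0: b = 0
cutoff  <- qt(0.975, df)              # upper 2.5% point of t with 24 d.f.
p_value <- 2 * pt(-abs(t_stat), df)   # two-sided p-value

c(t = t_stat, cutoff = cutoff, p = p_value)
```

Replacing 0.975 by 0.995 or 0.9995 in qt() gives the two-sided cut-off points for the 1% and 0.1% levels.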
8
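We now check the significance of r. The computer
output gives $R^2 = 0.9438$. The magnitude of r
is the square root of this, 0.9714, and r takes
the sign of the slope, so $r = -0.9714$. It is
fairly clear that this will be significantly
different from 0, but we test anyway.
9
We know that $t = r\sqrt{n-2}/\sqrt{1-r^2}$
follows a t distribution with $n-2$ degrees of
freedom when the true correlation is zero.
In this case the test statistic calculates as
$-0.9714 \times \sqrt{24}/\sqrt{1-0.9438} \approx -20.07$,
matching the t value for b. Let the true
correlation coefficient be $\rho$.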
10
H0: $\rho = 0$   H1: $\rho \neq 0$
As seen previously, the cut-off points for the t
distribution with 24 degrees of freedom, with
2.5% in each tail, are ±2.064.
11
The t value lies far outside these cut-off
points, so H0 is rejected in favour of H1. There
is evidence of a non-zero correlation between Gas
and Temp.
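
As a cross-check, the correlation test can be run directly with cor.test(). The sketch below assumes, as the 26 points suggest, that only the Before weeks are used, and compares the built-in statistic with the usual form $t = r\sqrt{n-2}/\sqrt{1-r^2}$; the names before, ct and t_manual are illustrative.

```r
library(MASS)  # provides the whiteside data frame

# Assumption: the 26 observations are the "Before" weeks
before <- subset(whiteside, Insul == "Before")

# Built-in test of H0: rho = 0 (Pearson correlation, two-sided)
ct <- cor.test(before$Temp, before$Gas)

# The same statistic computed by hand from r and n
n <- nrow(before)
r <- unname(ct$estimate)
t_manual <- r * sqrt(n - 2) / sqrt(1 - r^2)

c(r = r, t_builtin = unname(ct$statistic), t_manual = t_manual,
  df = unname(ct$parameter))
```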
12
Fisher's Transformation
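
Only the title of this slide was transcribed. As a hedged illustration of what Fisher's transformation is commonly used for, the sketch below builds an approximate 95% confidence interval for the correlation via $z = \mathrm{atanh}(r)$ with standard error $1/\sqrt{n-3}$. Whether the original slide did this is an assumption; the values of r and n are those quoted earlier.

```r
r <- -0.9714   # sample correlation (negative, like the fitted slope)
n <- 26        # number of observations

z    <- atanh(r)          # Fisher's z = 0.5 * log((1 + r) / (1 - r))
se_z <- 1 / sqrt(n - 3)   # approximate standard error of z

# Approximate 95% interval on the z scale, mapped back to the r scale
ci_z <- z + c(-1, 1) * qnorm(0.975) * se_z
ci_r <- tanh(ci_z)
ci_r
```

An interval that excludes 0 is consistent with the conclusion of the t test.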
13
Use of Weighted Least Squares
14
In fitting models of the form
$y_i = f(x_i) + \varepsilon_i$, $i = 1, \dots, n$,
least squares is optimal under the condition that
$\varepsilon_1, \dots, \varepsilon_n$ are i.i.d.
$N(0, \sigma^2)$, and is a reasonable fitting
method when this condition is at least
approximately satisfied. (Most importantly, we
require here that there should be no significant
outliers.)
15
In the case where we have instead that
$\varepsilon_1, \dots, \varepsilon_n$ are
independent $N(0, \sigma_i^2)$, it is natural to
use weighted least squares: choose f from within
the permitted class of functions to minimise
$\sum_i w_i (y_i - f(x_i))^2$
where we take $w_i$ proportional to
$1/\sigma_i^2$ (clearly only relative weights
matter).
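
The remark that only relative weights matter can be illustrated with a small simulation. Everything in this sketch (the data, the true line, the error structure) is hypothetical and chosen only for illustration.

```r
set.seed(1)

# Hypothetical data: the error standard deviation grows with x
x     <- 1:30
sigma <- 0.5 * x
y     <- 2 + 3 * x + rnorm(30, sd = sigma)

# Weights proportional to 1 / sigma_i^2
w <- 1 / sigma^2

fit_w   <- lm(y ~ x, weights = w)        # weighted least squares
fit_10w <- lm(y ~ x, weights = 10 * w)   # same weights, rescaled by 10

# The fitted coefficients agree: rescaling all weights changes nothing
all.equal(coef(fit_w), coef(fit_10w))
```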

16
Example: Scottish hill races data. These data
are made available in R as
> data(hills, package = "MASS")
They give record times (minutes)
in 1984 of 35 Scottish hill races, against
distance (miles) and total height climbed (feet).
We regard time as the response variable, and seek
to model how its conditional distribution depends
on the explanatory variables distance and climb.
17
The R code pairs(hills) produces the plots shown.
18
The fitted model is
$\text{time} = 5.62 \times \text{distance} + 0.0323 \times \text{distance}^2 + 0.000262 \times \text{climb} + 0.00000180 \times \text{climb}^2 + e$
19
For the hill races data, it is natural to assume
greater variability in the times for the longer
races, with the standard deviation perhaps
proportional to the distance. We therefore try
refitting the quadratic model with weights
proportional to 1/distance^2:
> model2w <- lm(time ~ -1 + dist + I(dist^2) + climb + I(climb^2),
+               data = hills[-18, ], weights = 1/dist^2)
20
(No Transcript)
21
The fitted model is now
$\text{time} = 4.94 \times \text{distance} + 0.0548 \times \text{distance}^2 + 0.00349 \times \text{climb} + 0.00000134 \times \text{climb}^2 + \varepsilon$

22
Note that the residual summary above is on a
reweighted scale, and cannot be directly
compared with the earlier residual summaries.
23
While the coefficients here appear to
have changed somewhat from those in the earlier,
unweighted, fit of Model 2, the fitted model is
not really very different.
24
This is confirmed by the plot of the residuals
from the weighted fit against those from the
unweighted fit, produced by
> plot(resid(model2w) ~ resid(model2))
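
The unweighted fit model2 is not defined anywhere in the transcript. A sketch of the whole comparison follows, assuming model2 is the same quadratic, no-intercept model fitted without weights to the same data (row 18 dropped); the name hills_sub is illustrative.

```r
library(MASS)  # provides the hills data frame

hills_sub <- hills[-18, ]   # drop row 18, as in the weighted fit

# Assumed form of the unweighted Model 2: same formula, no weights
model2  <- lm(time ~ -1 + dist + I(dist^2) + climb + I(climb^2),
              data = hills_sub)

# Weighted refit with weights proportional to 1 / distance^2
model2w <- lm(time ~ -1 + dist + I(dist^2) + climb + I(climb^2),
              data = hills_sub, weights = 1 / dist^2)

# Coefficients side by side
cbind(unweighted = coef(model2), weighted = coef(model2w))

# Residuals from the weighted fit against those from the unweighted fit
plot(resid(model2w) ~ resid(model2))
abline(0, 1, lty = 2)   # reference line y = x
```

Points lying close to the dashed line indicate that the two fits give similar residuals.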
25
(No Transcript)
26
Resistant Regression
27
As already observed, least squares fitting is
very sensitive to outlying observations. However,
there are also a large number of resistant
fitting techniques available. One such is least
trimmed squares: choose f from within the
permitted class of functions to minimise

28
(No Transcript)
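
The criterion on this slide was not transcribed. The standard definition of least trimmed squares, given here as an assumption about what the slide showed, minimises the sum of the $q$ smallest squared residuals:

```latex
\min_{f} \; \sum_{i=1}^{q} r^2_{(i)},
\qquad r_i = y_i - f(x_i),
\qquad r^2_{(1)} \le r^2_{(2)} \le \dots \le r^2_{(n)}
```

Here $r^2_{(i)}$ denotes the $i$-th smallest squared residual and $q < n$ is typically a little over $n/2$, so up to roughly half of the observations can be outliers without affecting the fit.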
29
Example: phones data. The R dataset phones in
the package MASS gives the annual number of phone
calls (millions) in Belgium over the period
1950-73. Consider the model
$\text{calls} = a + b \times \text{year}$
The following two graphs plot the data and show
the result of fitting the model by least squares
and then fitting the same model by least trimmed
squares.
30
(No Transcript)
31
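These graphs are produced by the following code:
> library(MASS)  # phones data and lqs(); lqs was formerly a separate package
> plot(calls ~ year, data = phones)
> phonesls <- lm(calls ~ year, data = phones)    # least squares fit
> abline(phonesls)
> plot(calls ~ year, data = phones)
> phoneslts <- lqs(calls ~ year, data = phones)  # least trimmed squares (lqs default)
> abline(phoneslts)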
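
To see numerically how much the outlying years pull the least squares line, the two sets of coefficients can be put side by side. This is a sketch only; it assumes a current version of MASS, where lqs() lives, and states the method explicitly rather than relying on the default.

```r
library(MASS)  # provides phones and lqs()

phonesls  <- lm(calls ~ year, data = phones)                   # least squares
phoneslts <- lqs(calls ~ year, data = phones, method = "lts")  # least trimmed squares

# Intercept and slope for each fitting method
rbind(ls = coef(phonesls), lts = coef(phoneslts))
```

The resistant fit has the much smaller slope, since it largely ignores the block of inflated values.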
32
The explanation for the data is that, for a
period of time, the total length of all phone
calls in each year was accidentally recorded
instead.
33
Nonparametric Regression
34
Sometimes we simply wish to fit a smooth model
without specifying any particular functional form
for f. Again there are very many techniques here.
One such is called loess. This constructs the
fitted value $f(x_i)$ for each observation i by
performing a local regression using only those
observations with x values in the neighbourhood
of $x_i$ (and attaching most weight to the closest
observations).
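
The idea of a single local fit can be sketched by hand. The function below is a simplified illustration, not loess()'s exact algorithm: it assumes a local straight line and tricube weights over the nearest fraction span of the points. The name local_fit and its arguments are hypothetical.

```r
# Simplified illustration of one local regression (not loess() itself)
local_fit <- function(x0, x, y, span = 0.75) {
  k <- ceiling(span * length(x))   # number of neighbours to use
  d <- abs(x - x0)                 # distances from the target point
  h <- sort(d)[k]                  # distance to the k-th nearest point
  # Tricube weights: largest near x0, zero at distance h and beyond
  w <- ifelse(d < h, (1 - (d / h)^3)^3, 0)
  fit <- lm(y ~ x, weights = w)    # weighted local straight line
  unname(predict(fit, newdata = data.frame(x = x0)))
}

# Local estimate of stopping distance at a speed of 15 mph
local_fit(15, cars$speed, cars$dist)
```

Repeating this at every observed x value and joining the results gives a smooth curve of the kind loess produces.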

35
Example: cars data. The R data frame cars (in
the datasets package) records 50 observations of
speed (mph) and stopping distance (ft). These
observations were collected in the 1920s! We
treat stopping distance as the response variable
and seek to model its dependence on speed.
36
(No Transcript)
37
We try to fit a model using loess. Possible R
code is
> data(cars)
> attach(cars)
> plot(cars)
> carslo <- loess(dist ~ speed)  # loess() is in stats; library(modreg) is no longer needed
> lines(fitted(carslo) ~ speed)
38
(No Transcript)
39
An optional argument span can be increased from
its default value of 0.75 to give more
smoothing:
> plot(cars)
> carslo2 <- loess(dist ~ speed, span = 1)
> lines(fitted(carslo2) ~ speed)
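
One way to quantify the extra smoothing is the equivalent number of parameters stored in each loess fit. The sketch below passes data = cars explicitly instead of relying on attach(); it reads the enp and pars components of the fitted objects.

```r
# Default span (0.75) versus a larger span (1)
carslo  <- loess(dist ~ speed, data = cars)
carslo2 <- loess(dist ~ speed, data = cars, span = 1)

# A larger span uses more neighbours per local fit, giving a smoother
# curve and a smaller equivalent number of parameters
c(span_default = carslo$pars$span, enp_default = carslo$enp,
  enp_span1 = carslo2$enp)
```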
40
(No Transcript)
