Title: Practical Sheet 6 Solutions
1 Practical Sheet 6 Solutions
The R data frame whiteside, which deals with
gas consumption, is made available in R by
> data(whiteside, package = "MASS")
It records the weekly gas consumption and average
external temperature at a house in south-east England
during two heating seasons, one before and one
after cavity-wall insulation was installed. The
variables are:
Variable   Description
Gas        weekly gas consumption
Temp       average external temperature during week
Insul      (binary factor) Before (insulation) or After
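The figures quoted on the following slides (26 points, b = -0.3932) are consistent with a straight-line fit of Gas on Temp using only the Before season. A minimal sketch of such a fit, assuming that subset is the one intended; the names `before` and `fit_before` are chosen here for illustration:

```r
library(MASS)                        # provides the whiteside data frame
data(whiteside, package = "MASS")

# Assumption: the analysis uses only the 26 weeks before insulation
before <- subset(whiteside, Insul == "Before")

# Simple linear regression of gas consumption on external temperature
fit_before <- lm(Gas ~ Temp, data = before)

# Prints the estimates, standard errors, t values, stars and R-squared
summary(fit_before)
```

The `Temp` row of the coefficient table in this summary is the print-out that the manual test below reproduces.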
3 We check whether b and $\rho$ are significantly
different from 0. (For b this is clear from the
print-out, so this manual calculation is not
normally required.)
4 The value of b is -0.3932. Now carry out a
hypothesis test. $H_0: b = 0$, $H_1: b \neq 0$. The
standard error of b is calculated in R as 0.01959.
5 The test statistic is $t = (b - 0)/\mathrm{SE}(b)$.
This calculates as $(-0.3932 - 0)/0.01959 = -20.071$.
6 [Figure: t distribution with the cut-off points
-2.064 and 2.064 marked.]
t tables using 24 degrees of freedom (there are
26 points) give a cut-off point of 2.064 for 2.5%
in each tail.
7 Since -20.071 is less than -2.064, we reject $H_0$
and accept $H_1$. There is evidence at the 5% level of
a significant negative relationship. In fact, the
two-tailed t cut-off points for the 1% and 0.1%
significance levels are 2.797 and 3.745, so b is also
significant at the 0.1% level (very highly
significant). This corresponds to the three
stars on the R output.
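The manual test of the slope can be reproduced in a few lines of R. This is a sketch using only the estimate and standard error quoted above; the variable names are illustrative, and the cut-off points are computed with `qt()` for a two-tailed test rather than read from tables.

```r
b    <- -0.3932        # slope estimate quoted above
se_b <- 0.01959        # its standard error, as reported by R
df   <- 26 - 2         # n - 2 degrees of freedom for 26 points

# Test statistic for H0: b = 0
t_stat <- (b - 0) / se_b

# Two-tailed cut-off points at the 5%, 1% and 0.1% levels
crit <- qt(c(0.975, 0.995, 0.9995), df)

# Two-sided p-value
p_val <- 2 * pt(-abs(t_stat), df)

round(t_stat, 3)
round(crit, 3)
p_val
```

Because |t| exceeds every cut-off point, the p-value is far below 0.001, which is what the three stars indicate.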
8 We now check the significance of r. The computer
output gives $R^2 = 0.9438$. The magnitude of r is the
square root of this, i.e. 0.9714, and r is negative
because the slope b is negative, so r = -0.9714. It
is fairly clear that this will be significantly
different from 0, but we test anyway.
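9 We know that, when the true correlation is zero,
$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$
has a t distribution with n - 2 degrees of freedom.
In this case, with r = -0.9714 (negative, like the
slope b) and n = 26, the test statistic calculates as
$-0.9714 \times \sqrt{24} / \sqrt{1 - 0.9438} = -20.07$,
which agrees with the t value for b up to rounding, as
it must in simple linear regression. Let the true
correlation coefficient be $\rho$.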
10 $H_0: \rho = 0$, $H_1: \rho \neq 0$.
As seen previously, the cut-off points for the t
distribution with 24 degrees of freedom, with 2.5%
in each tail, are ±2.064.
11 The t value lies far outside these cut-off
points, so $H_0$ is rejected and $H_1$ is accepted.
There is evidence of a non-zero correlation between
Gas and Temp.
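The same test is available directly through `cor.test()`. The sketch below assumes, as the 26 points suggest, that the Before season of whiteside is the data in use; it computes the usual t statistic for a correlation by hand and checks it against R's built-in test.

```r
library(MASS)

# Assumption: the 26 pre-insulation weeks are the data being analysed
before <- subset(whiteside, Insul == "Before")

r <- cor(before$Temp, before$Gas)   # sample correlation
n <- nrow(before)

# t statistic for H0: rho = 0, on n - 2 degrees of freedom
t_by_hand <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Built-in Pearson correlation test
ct <- cor.test(before$Temp, before$Gas)

c(r = r, t_by_hand = t_by_hand, t_cor_test = unname(ct$statistic))
```

In simple linear regression this statistic coincides with the t value for the slope, so the two tests always lead to the same conclusion.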
12 Fisher's Transformation
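A minimal sketch of how Fisher's transformation is commonly used, namely to build an approximate confidence interval for the true correlation. It assumes the usual form z = atanh(r) with approximate standard error 1/sqrt(n - 3) and the Before subset of whiteside; it is not a reproduction of the slide's own working.

```r
library(MASS)

# Assumption: the 26 pre-insulation weeks are the data being analysed
before <- subset(whiteside, Insul == "Before")
r <- cor(before$Temp, before$Gas)
n <- nrow(before)

z    <- atanh(r)             # Fisher's z = 0.5 * log((1 + r) / (1 - r))
se_z <- 1 / sqrt(n - 3)      # approximate standard error of z

# Approximate 95% interval on the z scale, then back-transform with tanh
ci_z <- z + c(-1, 1) * qnorm(0.975) * se_z
ci_r <- tanh(ci_z)

ci_r
```

The interval reported by `cor.test()` is built in the same way, so the two should agree.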
13 Use of Weighted Least Squares
14 In fitting models of the form
$$y_i = f(x_i) + \varepsilon_i, \quad i = 1, \dots, n,$$
least squares is optimal under the condition that
$\varepsilon_1, \dots, \varepsilon_n$ are i.i.d.
$N(0, \sigma^2)$, and is a reasonable
fitting method when this condition is at least
approximately satisfied. (Most importantly, we
require here that there should be no significant
outliers.)
15 In the case where we have instead that
$\varepsilon_1, \dots, \varepsilon_n$ are independent
$N(0, \sigma_i^2)$, it is natural to use instead
weighted least squares: choose f from within the
permitted class of functions to minimise
$$\sum_{i=1}^{n} w_i \, (y_i - f(x_i))^2,$$
where we take $w_i$ proportional to $1/\sigma_i^2$
(clearly only relative weights matter).
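The weighted criterion above can be sketched in R on simulated data. Everything in this example is an assumption made for illustration (the straight-line form of f, the error standard deviation growing with x, the variable names); it shows that `lm()` with a `weights` argument minimises the weighted sum of squares, and that rescaling all the weights leaves the fit unchanged.

```r
set.seed(1)
n <- 50
x <- runif(n, 1, 10)
sigma_i <- 0.5 * x                      # assumed: error s.d. proportional to x
y <- 2 + 3 * x + rnorm(n, sd = sigma_i)

w <- 1 / sigma_i^2                      # weights proportional to 1 / sigma_i^2

fit_w      <- lm(y ~ x, weights = w)
fit_scaled <- lm(y ~ x, weights = 100 * w)   # same relative weights

# Direct solution of the weighted normal equations (X'WX) b = X'Wy
X <- cbind(1, x)
b_manual <- solve(t(X) %*% (w * X), t(X) %*% (w * y))

rbind(lm = coef(fit_w), scaled = coef(fit_scaled), manual = drop(b_manual))
```

All three rows of the printed table should agree, which is the sense in which only relative weights matter.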
16 Example: Scottish hill races data. These data
are made available in R as
> data(hills, package = "MASS")
They give record times (minutes) in 1984 of 35
Scottish hill races, against distance (miles) and
total height climbed (feet).
We regard time as the response variable, and seek
to model how its conditional distribution depends
on the explanatory variables distance and climb.
17 The R code
> pairs(hills)
produces the plots shown.
18 The fitted model is
$$\text{time} = 5.62\,\text{distance} + 0.0323\,\text{distance}^2 + 0.000262\,\text{climb} + 0.00000180\,\text{climb}^2 + \varepsilon$$
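The call that produces this unweighted quadratic fit is not shown in the transcript. A plausible sketch, assuming the same formula and the same exclusion of observation 18 as the weighted refit on the next slide, and using the name `model2` that appears later:

```r
library(MASS)                         # provides the hills data frame

# Assumption: no intercept, quadratic in both dist and climb,
# with race 18 left out as in the weighted fit
model2 <- lm(time ~ -1 + dist + I(dist^2) + climb + I(climb^2),
             data = hills[-18, ])

coef(model2)        # estimated coefficients
summary(model2)     # standard errors and residual summary
```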
19 For the hill races data, it is natural to assume
greater variability in the times for the longer
races, with the standard deviation perhaps
proportional to the distance. We therefore try
refitting the quadratic model with weights
proportional to $1/\text{distance}^2$:
> model2w <- lm(time ~ -1 + dist + I(dist^2) + climb + I(climb^2), data = hills[-18, ], weights = 1/dist^2)
21 The fitted model is now
$$\text{time} = 4.94\,\text{distance} + 0.0548\,\text{distance}^2 + 0.00349\,\text{climb} + 0.00000134\,\text{climb}^2 + \varepsilon$$
22 Note that the residual summary above is on a
reweighted scale, and cannot be directly
compared with the earlier residual summaries.
23 While the coefficients here appear to
have changed somewhat from those in the earlier,
unweighted, fit of Model 2, the fitted model is
not really very different.
24 This is confirmed by the plot of the residuals
from the weighted fit against those from the
unweighted fit, produced by
> plot(resid(model2w) ~ resid(model2))
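For completeness, a self-contained sketch that builds both fits and draws the comparison. The definition of `model2` is an assumption (the same formula as the weighted fit, without weights), and the dashed reference line and the correlation are additions for illustration only.

```r
library(MASS)

f <- time ~ -1 + dist + I(dist^2) + climb + I(climb^2)
h <- hills[-18, ]                            # observation 18 excluded

model2  <- lm(f, data = h)                   # assumed unweighted fit
model2w <- lm(f, data = h, weights = 1 / dist^2)

# Residuals of the weighted fit against those of the unweighted fit
plot(resid(model2w) ~ resid(model2))
abline(0, 1, lty = 2)                        # line of equality, for reference

# A single-number summary of how closely the two sets of residuals agree
cor(resid(model2w), resid(model2))
```

Points lying close to the dashed line indicate that the two fits give similar residuals race by race.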
26 Resistant Regression
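27 As already observed, least squares fitting is
very sensitive to outlying observations. However,
there are also a large number of resistant
fitting techniques available. One such is least
trimmed squares: choose f from within the
permitted class of functions to minimise the sum of
the q smallest squared residuals, for some q < n
(typically a little over half the observations), so
that the largest residuals, and hence any outliers,
do not influence the fit.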
29 Example: phones data. The R dataset phones in
the package MASS gives the annual number of phone
calls (millions) in Belgium over the period
1950-73. Consider the model
$\text{calls} = a + b \times \text{year}$.
The following two graphs plot the data
and show the result of fitting the model by
least squares and then fitting the same model by
least trimmed squares.
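31 These graphs are achieved by the following
code (lqs() is now part of MASS, so library(MASS)
replaces the older library(lqs)):
> library(MASS)
> plot(calls ~ year, data = phones)
> phonesls <- lm(calls ~ year, data = phones)
> abline(phonesls)
> plot(calls ~ year, data = phones)
> phoneslts <- lqs(calls ~ year, data = phones)
> abline(phoneslts)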
32 The explanation for the outlying observations is
that, for a period of time, the total length of all
phone calls in each year was accidentally recorded
instead of the number of calls.
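The effect of the mis-recorded years can also be seen numerically by putting the two sets of coefficients side by side. A short sketch, assuming the current MASS implementation of `lqs()` with `method = "lts"` stated explicitly:

```r
library(MASS)                 # provides phones and lqs()

# Ordinary least squares: pulled upwards by the outlying years
phonesls <- lm(calls ~ year, data = phones)

# Least trimmed squares: largely ignores the outlying years
set.seed(1)                   # lqs() may use random subsampling
phoneslts <- lqs(calls ~ year, data = phones, method = "lts")

rbind(least_squares = coef(phonesls),
      least_trimmed = coef(phoneslts))
```

The least squares slope is much steeper than the least trimmed squares slope, because the handful of inflated values dominates the ordinary sum of squares.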
33 Nonparametric Regression
34 Sometimes we simply wish to fit a smooth model
without specifying any particular functional form
for f. Again there are very many techniques here.
One such is called loess. This constructs the
fitted value $f(x_i)$ for each observation i by
performing a local regression using only those
observations with x values in the neighbourhood
of $x_i$ (and attaching most weight to the closest
observations).
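The idea of a local, distance-weighted regression can be sketched by hand. The helper `local_fit` below is hypothetical and deliberately simplified: it fits a local straight line with tricube weights over the nearest fraction `span` of the points, whereas R's `loess()` defaults to a local quadratic and has further refinements, so the two will not give identical values. The simulated data are purely illustrative.

```r
# Hypothetical helper: local linear fit at x0 using the nearest
# fraction 'span' of the observations, with tricube weights
local_fit <- function(x0, x, y, span = 0.75) {
  k <- ceiling(span * length(x))             # neighbourhood size
  d <- abs(x - x0)                           # distances from x0
  h <- sort(d)[k]                            # distance to k-th nearest point
  w <- ifelse(d < h, (1 - (d / h)^3)^3, 0)   # closest points weigh most
  fit <- lm(y ~ x, weights = w)              # weighted local regression
  unname(predict(fit, newdata = data.frame(x = x0)))
}

# Simulated example data (an assumption, for illustration only)
set.seed(1)
x <- seq(0, 10, length.out = 60)
y <- sin(x) + rnorm(60, sd = 0.3)

# One local regression per observation gives the smooth curve
smooth <- vapply(x, function(x0) local_fit(x0, x, y), numeric(1))

plot(x, y)
lines(x, smooth)
```

Each fitted value comes from its own regression, which is why no single functional form for f has to be specified.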
35 Example: cars data. The R data frame cars (in
the datasets package, which is loaded by default)
records 50 observations of speed (mph) and stopping
distance (ft). These observations were collected in
the 1920s! We treat stopping distance as the
response variable and seek to model its dependence
on speed.
37 We try to fit a model using loess. Possible R
code is (loess() is now in the stats package, so
library(modreg) is no longer needed):
> data(cars)
> attach(cars)
> plot(cars)
> carslo <- loess(dist ~ speed)
> lines(fitted(carslo) ~ speed)
39 An optional argument span can be increased from
its default value of 0.75 to give more smoothing:
> plot(cars)
> carslo2 <- loess(dist ~ speed, span = 1)
> lines(fitted(carslo2) ~ speed)
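One way to quantify "more smoothing" is the equivalent number of parameters that `loess()` stores in the `enp` component of its result: a smoother curve uses fewer. A short sketch comparing the two spans, with `data = cars` passed explicitly instead of relying on `attach()`:

```r
# Default span (0.75) and a larger span (1)
carslo  <- loess(dist ~ speed, data = cars)
carslo2 <- loess(dist ~ speed, data = cars, span = 1)

# Equivalent number of parameters: lower means a smoother fit
c(span_0.75 = carslo$enp, span_1 = carslo2$enp)

# Overlay both curves for a visual comparison
plot(cars)
lines(fitted(carslo)  ~ speed, data = cars)
lines(fitted(carslo2) ~ speed, data = cars, lty = 2)
```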