Regression III: Robust regressions

1
Regression III: Robust regressions
  • Outliers
  • Tests for outlier detection
  • Robust regressions
  • Breakdown point
  • Least trimmed squares
  • M-estimators

2
Outliers
  • Outliers are observations that deviate significantly from
    all the others. They may occur by accident or be the result
    of measurement errors. The presence of outliers may lead to
    misleading results. In the example shown in the figure, the
    mean without the outlier is -0.41 and with the outlier it is
    -0.01. If we want to test the hypothesis H0: mean = 0, then
    without the outlier we conclude that H0 can be rejected
    (p-value is 0.009); with the outlier, however, we cannot
    reject the null hypothesis (p-value is 0.98; see the quick
    check below).
  • Analysing and dealing with outliers is an important
    ingredient of modern statistical analysis. Sometimes careful
    analysis of outliers, and their removal or down-weighting,
    may change the conclusions considerably.

Figure: the sample with the outlier vs. without the outlier.
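A minimal R check using the sample listed on slide 8; it
reproduces the quoted p-values:

    x <- c(-0.8, -0.6, -0.3, 0.1, -1.1, 0.2, -0.3, -0.5, -0.5, -0.3, 4.0)
    t.test(x[1:10])$p.value  # without the outlier: 0.009, reject H0: mean = 0
    t.test(x)$p.value        # with the outlier: 0.98, cannot reject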
3
Dealing with outliers
  • For simple cases such as mean and variance calculations, one
    way of dealing with outliers is to compute the statistic
    from trimmed data. For example, in the case above the mean
    without the outlier is -0.41, with the outlier -0.01, and
    the trimmed mean with 10% removed is -0.33. The trimmed mean
    gives a better estimate than the mean based on all the data
    (see the sketch after this list).
  • Often the mean and other statistics are used for testing
    some hypothesis. In these cases non-parametric tests
    (wilcox.test) may be a better alternative to t.test
    (wilcox.test gives p-value 0.09 where t.test gives p-value
    0.98).
  • For simple cases like mean and covariance calculations, and
    tests based on them, the usual approach is to use the ranks
    of the observations instead of their values. Obviously, when
    ranks are used the power of the tests is reduced, but the
    conclusions may be more reliable.
  • Before carrying out analysis and tests it is always a good
    idea to visualise and explore the data to see if there are
    outliers.
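A short R sketch of both ideas, using the same sample as above:

    x <- c(-0.8, -0.6, -0.3, 0.1, -1.1, 0.2, -0.3, -0.5, -0.5, -0.3, 4.0)
    mean(x)              # about -0.01: pulled towards the outlier
    mean(x, trim = 0.1)  # -0.33: 10% trimmed from each end
    wilcox.test(x)       # rank-based test; p-value 0.09
    t.test(x)            # p-value 0.98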

4
Outlier detection: Grubbs' test
  • It may be a good idea to test for outliers and remove them,
    if possible, before starting to analyse the data and doing
    hypothesis testing. One of the techniques for doing this is
    Grubbs' test. It tests
  • H0: there is no outlier versus H1: there is an outlier
  • To do this Grubbs suggested using the statistic
  • G = (max(y) - mean(y)) / sd(y)
  • to test if the maximum value is an outlier,
  • G = (mean(y) - min(y)) / sd(y)
  • to test if the minimum value is an outlier, and
  • G = max(|yi - mean(y)|) / sd(y)
  • to test if either the maximum or the minimum is an outlier.
    There are also versions for two outliers.
  • Obviously the distribution of the test statistic depends on
    the number of observations.
  • Grubbs' test (grubbs.test) is available from the package
    outliers. It is not part of the standard R distribution (see
    the sketch below).
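A minimal usage sketch (del1 is the name given to the sample on
the next slide):

    install.packages("outliers")  # not part of base R
    library(outliers)
    grubbs.test(del1)             # tests the most extreme value by default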

5
Outlier detection: Grubbs' test
  • Applying grubbs.test to the example we get:
  •     Grubbs test for one outlier
  •   data:  del1
  •   G = 2.9063, U = 0.0709, p-value = 9.858e-06
  •   alternative hypothesis: highest value 4 is an outlier
  • Since the p-value is very small, we can reject the null
    hypothesis that there is no outlier.
  • Once we are confident that there is an outlier, we can
    remove it and continue with the tests and/or analysis.

6
Outlier detection: Simulated distribution
  • If the outliers package is not available then we can
    generate the distribution of the statistics given above
    ourselves. Let us write a function for one of them (H0: the
    maximum value is not an outlier):
  • outdist <- function(nsample, n) {
  •   ff <- vector(length = n)
  •   for (i in 1:n) { rr <- rnorm(nsample); ff[i] <- max(rr - mean(rr))/sd(rr) }
  •   ff }
  • Now we can generate the distribution of the statistic for
    samples of different sizes. For example, the distributions
    for samples of sizes 10, 15 and 20 are shown in the figure.
    To generate the figure the following commands are used
    (add = TRUE, supplied here, overlays the later curves on the
    first plot):
  • oo10 <- outdist(10, 10000)
  • curve(ecdf(oo10)(x), from = 0, to = 15, lwd = 3)
  • oo15 <- outdist(15, 10000)
  • curve(ecdf(oo15)(x), from = 0, to = 15, lwd = 3, add = TRUE)
  • oo20 <- outdist(20, 10000)
  • curve(ecdf(oo20)(x), from = 0, to = 15, lwd = 3, add = TRUE)
  • Obviously, as the sample size increases, the probability of
    genuinely observing large values increases. That is why the
    distribution shifts to the right as the sample size grows.

7
Outlier detection: Simulated distribution
  • Once we have the desired distributions we can calculate
    p-values. For example, for the case we considered the sample
    size is 11. Let us generate the empirical cumulative
    distribution and use it for outlier detection:
  • oo11 <- outdist(11, 10000)
  • ec11 <- ecdf(oo11)
  • st <- max(del1 - mean(del1)) / sd(del1)
  • 1 - ec11(st)
  • This sequence of commands produces the p-value. If the
    maximum value is 4 the p-value is 0, if the maximum value is
    2 the p-value is 0.001, and when the maximum value is 1 the
    p-value is 0.04. We can reject the null hypothesis in the
    first and second cases; in the third case we should be
    careful.

8
Breakdown point
  • The breakdown point of an estimator is the largest fraction
    of the sample that can be changed arbitrarily without
    changing the estimate arbitrarily. For example, for the
    mean, by changing a single value arbitrarily we can change
    the estimate as much as we want. Let us take an example:
  • -0.8 -0.6 -0.3 0.1 -1.1 0.2 -0.3 -0.5 -0.5 -0.3 4.0
  • The sample size is 11 and the mean is -0.01. If we change
    the last value to 100 then the mean becomes 8.72. The
    breakdown point of the mean is 0.
  • The other limiting case is the median. The median of the
    above sample is -0.3. If we change one value and make it
    extremely large then the median will not change much. For
    example, if we change the last value to -100 then the median
    becomes -0.5. The breakdown point of the median is 0.5, i.e.
    more than 50% of the sample must be changed arbitrarily to
    change the median arbitrarily. A breakdown point of 0.5 is
    the theoretical limit (see the sketch below).
  • The efficiency of estimators with a high breakdown point is
    usually worse than that of estimators with a lower breakdown
    point. In other words, the variances of estimators with a
    high breakdown point are larger.
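A quick R check of both claims on the same sample:

    x <- c(-0.8, -0.6, -0.3, 0.1, -1.1, 0.2, -0.3, -0.5, -0.5, -0.3, 4.0)
    mean(x); median(x)           # about -0.01 and -0.3
    mean(replace(x, 11, 100))    # one wild value drives the mean to 8.72
    median(replace(x, 11, -100)) # the median only moves to -0.5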

9
Outliers and regression
Figure: regression with no outliers.
  • Let us recall the form of the least-squares equations for
    regression: minimise S(ß) = Σi (yi - g(ß, xi))². Again, x is
    a vector of input (predictor) variables, ß is a vector of
    parameters, y is the output, and the number of sample points
    is n.
  • As we know, in the special case when g(ß, x) = ß and ß is a
    single value, least-squares estimation gives the mean value
    of y. We can therefore consider the estimation above as an
    extension of mean value estimation. The breakdown point of
    this estimation is 0, so least squares is very sensitive to
    outliers.
  • There are several approaches to dealing with outliers in
    regression analysis. We will consider only two of them:
    1) least trimmed squares, 2) M-estimators.

Figure: regression with an outlier.
10
Least trimmed squares
  • Least trimmed squares (LTS) works iteratively:
  • 1) Set up initial values for the model parameters (for
    example using the simple least-squares method implemented in
    lm)
  • 2) Calculate the squared residuals ri² = (yi - g(ß, xi))²
  • 3) Sort the squared residuals
  • 4) Remove the fraction of observations for which the squared
    residuals are largest
  • 5) Minimise the sum of squares using the remaining
    observations only
  • 6) Repeat steps 2)-5) until convergence is achieved.
  • The function lqs in R (package MASS) performs LTS, and
    several other methods as special cases of LTS (least median
    and least quantile squares). The number of residuals used
    differs between the methods: for least median it is
    ⌊(n+1)/2⌋, for least quantile ⌊(n+p+1)/2⌋, and for LTS it is
    ⌊n/2⌋ + ⌊(p+1)/2⌋, where ⌊·⌋ is the integer part of the
    argument. A usage sketch follows the figure below.

Figure: the result of the default lqs fit.
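A minimal sketch of lqs on made-up data with one planted outlier
(the data and variable names are illustrative, not from the
slides):

    library(MASS)
    set.seed(2)
    x <- 1:20
    y <- 2 + 0.5 * x + rnorm(20, sd = 0.5)
    y[20] <- 30                 # plant an outlier
    lqs(y ~ x)                  # default method is "lts"
    lqs(y ~ x, method = "lms")  # least median of squares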
11
Robust M-estimators
  • The general extension of least squares has the form:
    minimise S(ß) = Σi ρ(yi - g(ß, xi)).
  • The form of the function ρ defines the various robust
    M-estimators. When ρ(z) = z² it becomes simple least
    squares.
  • Let us first analyse this function. To minimise it let us
    use the Gauss-Newton method. To use this method we need the
    first derivative and the second derivative (more precisely,
    an approximation to the second derivative), which involve ψ
    and ψ', the first and second derivatives of ρ. In
    Gauss-Newton methods the second term of the second
    derivative is usually ignored. Usually the notations ρ' = ψ
    and ψ(z)/z = w are used. If we look at the resulting
    equations, they are an extension of the least-squares
    equations with weights w. The minimisation is done
    iteratively using iteratively reweighted least squares (IRLS
    or IWLS), as sketched below.
  • The ψ function is the influence function. Analysing its
    values at the observations may help us understand the
    outliers in the data and how they are dealt with.
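A minimal IRLS sketch for a linear model with Huber weights,
assuming the standard w(z) = psi(z)/z weighting (illustrative
only; in practice use MASS::rlm, shown later):

    irls_huber <- function(X, y, k = 1.345, iters = 20) {
      b <- qr.solve(X, y)             # ordinary least-squares start
      for (i in 1:iters) {
        r <- as.vector(y - X %*% b)   # current residuals
        s <- median(abs(r)) / 0.6745  # robust (MAD-based) scale estimate
        w <- pmin(1, k / abs(r / s))  # Huber weights psi(z)/z, capped at 1
        b <- solve(t(X) %*% (w * X), t(X) %*% (w * y))  # weighted LS step
      }
      b
    }
    # Example: irls_huber(cbind(1, x), y) for an intercept-slope model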

12
Forms of robust regression
Figure: example of ρ and ψ (Geman-McClure function).
  • Robust M-estimators are usually chosen so as to make the
    contribution of the gradients for large residuals small, in
    other words to weight down large deviations. They can be
    specified through either ρ or ψ.
  • The basic idea behind robust estimators is: for small
    deviations the behaviour of the function should be similar
    to least squares, and for large deviations the contributions
    should be weighted down. Different functions differ in the
    degree of weighting.

13
Forms of robust regression
  • The most popular forms of robust estimators are:
  • Huber (see below)
  • Tukey's bisquare
  • Geman-McClure
  • Welsch
  • t-distribution (actually a slightly modified form of the
    log t-distribution)
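The formulas themselves appeared as images on the original slide.
For reference, the standard Huber function (a textbook fact, not
recovered from the slide) is

    \rho(z) = \begin{cases} z^2/2 & |z| \le k \\ k|z| - k^2/2 & |z| > k \end{cases},
    \qquad \psi(z) = \max(-k, \min(k, z))

so that ρ is quadratic near zero and grows only linearly for
large residuals.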

14
Robust estimators
  • The result of robust regression with the Huber function is
    very good. In practice the choice of the function will
    depend on the number and severity of the outliers. (A short
    rlm sketch follows.)
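A minimal sketch with MASS::rlm, reusing the made-up data from
the lqs example above:

    library(MASS)
    fit <- rlm(y ~ x)  # Huber psi is the default
    summary(fit)
    fit$w              # final IRLS weights; small for outlying points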

15
Robust estimators: Diagnostic plots
16
Boxplots for residuals and weighted residuals
17
R commands for robust estimation
  • lqs: least trimmed squares and least median squares
    estimation (package MASS)
  • rlm: robust linear model estimation using M-estimators
    (package MASS)
  • The outliers package may also be helpful.

18
References
  • P. J. Huber (1981) Robust Statistics.
  • F. R. Hampel, E. M. Ronchetti, P. J. Rousseeuw and W. A.
    Stahel (1986) Robust Statistics: The Approach Based on
    Influence Functions.
  • A. Marazzi (1993) Algorithms, Routines and S Functions for
    Robust Statistics.
  • W. N. Venables and B. D. Ripley (2002) Modern Applied
    Statistics with S.