1
ROBUST STATISTICS

2
INTRODUCTION
  • Robust statistics provides an alternative
    approach to classical statistical methods. The
    motivation is to produce estimators that are not
    excessively affected by small departures from
    model assumptions. These departures may include
    departures from an assumed sample distribution or
    observations departing from the rest of the data
    (i.e. outliers).
  • The goal is to produce statistical procedures
    which still behave fairly well under deviations
    from the assumed model.

3
INTRODUCTION
  • Mosteller and Tukey define two types of
    robustness:
  • Resistance means that changing a small part of
    the data, even by a large amount, does not cause
    a large change in the estimate.
  • Robustness of efficiency means that the statistic
    has high efficiency in a variety of situations
    rather than in any one situation. Efficiency
    means that the estimate is close to the optimal
    estimate given that we know the distribution
    the data come from. A useful measure of
    efficiency is
  • Efficiency = (lowest variance feasible) /
    (actual variance)
  • Many statistics have one of these properties.
    However, it can be difficult to find statistics
    that are both resistant and have robustness of
    efficiency.
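As an illustration (a minimal sketch in base R, not part of the original deck), the efficiency of the median relative to the mean at the normal model can be estimated by simulation:

  ## Monte Carlo sketch: at the normal model the mean attains the
  ## lowest feasible variance, so this ratio is the median's efficiency.
  set.seed(1)
  n <- 100
  means   <- replicate(10000, mean(rnorm(n)))
  medians <- replicate(10000, median(rnorm(n)))
  var(means) / var(medians)  # about 0.64, the classical 2/pi

Under heavy-tailed contamination the ranking reverses, which is the point of robustness of efficiency.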

5
MEAN VS MEDIAN
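As a minimal illustration in R (not from the original slide; the data are reused from the sensitivity-curve example later in the deck), one large observation moves the mean but barely moves the median:

  x <- c(9.52, 9.68, 10.16, 9.96, 10.08, 9.99, 10.47, 9.91, 9.92)
  mean(x); median(x)                      # 9.97 and 9.96: nearly identical
  mean(c(x, 15.21)); median(c(x, 15.21))  # mean jumps to 10.49, median only to 9.98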

7
ROBUST MEASURE OF VARIABILITY
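The standard robust alternatives to the standard deviation are the median absolute deviation (MAD) and the interquartile range (IQR), both in base R. A minimal sketch (not from the original slide) on the same income data:

  x <- c(9.52, 9.68, 10.16, 9.96, 10.08, 9.99, 10.47, 9.91, 9.92)
  sd(x); mad(x); IQR(x)  # mad() rescales by 1.4826 for consistency at the normal
  sd(c(x, 15.21))        # a single outlier inflates sd strongly
  mad(c(x, 15.21))       # while the MAD hardly changes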

8
ROBUST MEASURE OF CORRELATION
  • The Pearson correlation coefficient is an optimal
    estimator for Gaussian data. However, it is not
    resistant and it does not have robustness of
    efficiency.
  • The percentage bend correlation estimator,
    discussed by Shoemaker and Hettmansperger and
    also by Wilcox, is both resistant and has
    robustness of efficiency.
  • When the underlying data are bivariate normal,
    ρpb gives essentially the same values as
    Pearson's ρ. In general, ρpb is more robust to
    slight changes in the data than ρ.

9
Example
  • We use a simple dataset from Wright and London
    (2009) where we are interested in whether the
    length and heat of a chile are related. The
    length was measured in centimeters, the heat on
    a scale from 0 (for sissys) to 11 (nuclear).
  • head(chile)
  • name length heat
  • 1 Afric 5.00 5.0
  • 2 Aji 7.50 7.0
  • 3 Aji_A 11.43 7.5
  • 4 Aji_C 3.81 9.0
  • 5 Aji_F 6.00 2.0
  • 6 Aji_L 6.35 7.0
  • library(vioplot)
  • library(WRS2)  # provides pbcor()
  • attach(chile)
  • plot(length, heat, xlim = c(0, 32), ylim = c(0, 11))
  • vioplot(length, col = "red", at = 0,
    horizontal = TRUE, add = TRUE)
  • vioplot(heat, col = "blue", at = 0,
    horizontal = FALSE, add = TRUE)
  • with(chile, pbcor(length, heat))
  • Call:
  • pbcor(x = length, y = heat)
  • Robust correlation coefficient: -0.3785

10
ROBUST MEASURE OF CORRELATION
  • A second robust correlation measure is the
    Winsorized correlation ρw, which requires the
    specification of the amount of Winsorization.
  • The computation is simple: it uses Pearson's
    correlation formula applied to the Winsorized
    data, as sketched below.
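A minimal sketch of this computation (20% Winsorization assumed; WRS2's wincor() Winsorizes via order statistics, so this quantile-clamping variant is only a close approximation):

  ## Winsorize each variable, then apply Pearson's formula.
  winsorize <- function(x, tr = 0.2) {
    lo <- quantile(x, tr); hi <- quantile(x, 1 - tr)
    pmin(pmax(x, lo), hi)                      # clamp the tails inward
  }
  wincor_manual <- function(x, y, tr = 0.2)
    cor(winsorize(x, tr), winsorize(y, tr))    # Pearson on Winsorized data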

11
Example
  • In a study on the effect of consuming alcohol,
    the number of hangover symptoms was measured for
    two independent groups, with each subject
    consuming alcohol and being measured on three
    different occasions. One group consisted of sons
    of alcoholics and the other one was a control
    group. Here we are interested in the Winsorized
    correlations across the three time points for
    the participants in the alcoholic group:
  • library(WRS2)
  • library(reshape)
  • hangctr <- subset(hangover, subset = group ==
    "alcoholic")
  • hangctr
  • symptoms group time id
  • 61 0 alcoholic 1 21
  • 62 0 alcoholic 1 22
  • 63 0 alcoholic 1 23
  • 64 0 alcoholic 1 24
  • 65 0 alcoholic 1 25
  • 66 0 alcoholic 1 26
  • 67 0 alcoholic 1 27
  • 68 0 alcoholic 1 28
  • hangwide <- cast(hangctr, id ~ time, value =
    "symptoms")[,-1]
  • hangwide
  • 1 2 3
  • 1 0 2 1
  • 2 0 0 0

Cast a molten data frame into the reshaped or
aggregated form you want
12
  • winall(hangwide)
  • Call:
  • winall(x = hangwide)
  • Robust correlation matrix
  • 1 2 3
  • 1 1.0000 0.2651 0.4875
  • 2 0.2651 1.0000 0.6791
  • 3 0.4875 0.6791 1.0000
  • p-values
  • 1 2 3
  • 1 NA 0.27046 0.03935
  • 2 0.27046 NA 0.00284
  • 3 0.03935 0.00284 NA

13
ORDER STATISTICS AND ROBUSTNESS
  • Order statistics and their functions are usually
    somewhat robust (e.g. median, MAD, IQR), but not
    all order statistics are robust (e.g. X(1),
    X(n), and the range R = X(n) - X(1)).

15
  • REMARK: mean, α-trimmed mean, α-Winsorized mean
    and median are particular cases of L-statistics.
  • L-statistics: linear combinations of order
    statistics.
  • Classical estimators are highly influenced by
    outliers.
  • A robust estimate computed from all observations
    is comparable with the classical estimate
    applied to the non-outlying data (see the sketch
    after this list).
  • How to compare robust estimators?
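A minimal sketch of this comparison on the income data used in the sensitivity-curve example below (the outlier is 15.21):

  x <- c(9.52, 9.68, 10.16, 9.96, 10.08, 9.99,
         10.47, 9.91, 9.92, 15.21)
  mean(x)              # 10.49, pulled up by the outlier
  mean(x, trim = 0.1)  # 10.02: trims the smallest and largest observation
  median(x)            # 9.98
  mean(x[-10])         # 9.97: classical estimate on the non-outlying data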

16
(Standardized) sensitivity curve (SC)
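In the standard convention (adding one observation at x to a sample of size n - 1), the standardized sensitivity curve of an estimator T is

\[
  \mathrm{SC}_n(x) = \frac{T_n(x_1,\dots,x_{n-1},x) - T_{n-1}(x_1,\dots,x_{n-1})}{1/n},
\]

i.e. the change in the estimate caused by the added point, scaled by the contamination fraction 1/n.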

17
Sensitivity curve (example)
  • Data x1, ..., x10 are the natural logs of the
    annual incomes of 10 people:
  • 9.52 9.68 10.16 9.96 10.08 9.99 10.47 9.91 9.92
    15.21
  • x9 consists of the 9 regular observations; the
    10th value (15.21) is an outlier.
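A minimal sketch of the sensitivity curves of the mean and the median for this data set (x9 holds the 9 regular observations, so n = 10 after the point x is added):

  x9 <- c(9.52, 9.68, 10.16, 9.96, 10.08, 9.99, 10.47, 9.91, 9.92)
  sc <- function(xs, est)
    sapply(xs, function(x) (est(c(x9, x)) - est(x9)) / (1 / 10))
  xs <- seq(5, 16, by = 0.1)
  plot(xs, sc(xs, mean), type = "l", xlab = "x", ylab = "SC(x)")  # unbounded straight line
  lines(xs, sc(xs, median), lty = 2)                              # bounded step function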

18
(Finite sample) breakdown point
  • Given a data set with n obs.
  • If we replace m of the obs. by arbitrary outliers
    and the estimator stays in a bounded set, but
    does not when we replace (m+1), the breakdown
    point of the estimator at that data set is m/n.
  • Breakdown point of the mean: 0.

19
(Finite sample) breakdown point of the median
  • n is even: m = n/2 - 1, so the breakdown point
    is (n/2 - 1)/n
  • n is odd: m = (n - 1)/2, so the breakdown point
    is (n - 1)/(2n)
  • In both cases the breakdown point tends to 1/2
    as n grows.

20
INFLUENCE FUNCTION (Hampel, 1974)
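Hampel's influence function of a functional T at a distribution F is, with δx the point mass at x,

\[
  \mathrm{IF}(x; T, F) = \lim_{\varepsilon \downarrow 0}
  \frac{T\big((1-\varepsilon)F + \varepsilon\,\delta_x\big) - T(F)}{\varepsilon},
\]

the infinitesimal-contamination counterpart of the sensitivity curve.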

23
PLOT OF THE INFLUENCE FUNCTION AT N(0,1)

24
SC and IF
  • IF: small fraction ε of identical outliers.
  • SC: fraction of contamination is 1/n.

25
M-ESTIMATORS
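In the standard notation, a location M-estimator T_n solves

\[
  T_n = \arg\min_{\theta} \sum_{i=1}^{n} \rho(x_i - \theta)
  \qquad\text{or, with } \psi = \rho', \qquad
  \sum_{i=1}^{n} \psi(x_i - T_n) = 0.
\]

Taking ρ = -log f recovers the maximum likelihood estimator, so M-estimators generalize MLEs; the choices of ρ discussed below determine the robustness properties.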

29
M-ESTIMATOR
  • When an estimator is robust, it may be inferred
    that the influence of any single observation is
    insufficient to yield any significant offset.
    There are several constraints that a robust
    M-estimator should meet
  • 1. The first is of course to have a bounded
    influence function.
  • 2. The second is naturally the requirement of the
    robust estimator to be unique.

32
  • Briefly, we give a few indications of the common
    ρ-functions:
  • L2 (least-squares) estimators are not robust
    because their influence function is not bounded.
  • L1 (absolute value) estimators are not stable
    because the ρ-function |x| is not strictly
    convex in x. Indeed, the second derivative at
    x = 0 is unbounded, and an indeterminate
    solution may result.
  • L1-L2 estimators reduce the influence of large
    errors, but these errors still have an influence
    because the influence function has no cutoff
    point.
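A minimal sketch of these three ρ-functions; the L1-L2 form below, ρ(x) = 2(sqrt(1 + x²/2) - 1), is one common parameterization and is an assumption, not taken from the deck:

  rho_L2   <- function(x) x^2 / 2
  rho_L1   <- function(x) abs(x)
  rho_L1L2 <- function(x) 2 * (sqrt(1 + x^2 / 2) - 1)
  x <- seq(-4, 4, by = 0.01)
  plot(x, rho_L2(x), type = "l", ylab = "rho(x)")  # quadratic growth: not robust
  lines(x, rho_L1(x), lty = 2)                     # linear growth, kink at 0
  lines(x, rho_L1L2(x), lty = 3)                   # smooth at 0, slower but unbounded growth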

33
EXAMPLES OF M-ESTIMATORS
  • The mean corresponds to ρ(x) = x², and the
    median to ρ(x) = |x|. (For even n any median
    will solve the problem.) The function
  • ρ(x) = x² if |x| ≤ c, and c² otherwise
  • corresponds to metric trimming, and large
    outliers have no influence at all. The function
  • ρ(x) = x² if |x| ≤ c, and c(2|x| - c) otherwise
  • is known as metric Winsorizing and brings in
    extreme observations to µ ± c.

34
EXAMPLES OF M-ESTIMATORS
  • The corresponding -log f is
  • -log f(x) ∝ x² if |x| ≤ c, and c(2|x| - c)
    otherwise,
  • and corresponds to a density with a Gaussian
    center and double-exponential tails. This
    estimator is due to Huber.

35
EXAMPLES OF M-ESTIMATORS
  • Tukey's biweight has
  • ψ(x) = x [(1 - (x/R)²)+]²,
  • where (u)+ denotes the positive part of u. This
    implements soft trimming. The value R = 4.685
    gives 95% efficiency at the normal.
  • Hampel's ψ has several linear pieces:
  • ψ(x) = x for 0 ≤ |x| < a,
    ψ(x) = a sign(x) for a ≤ |x| < b,
    ψ(x) = a sign(x) (c - |x|)/(c - b) for
    b ≤ |x| < c,
    ψ(x) = 0 for |x| ≥ c,

for example, with a = 2.2s, b = 3.7s, c = 5.9s.
37
ROBUST REGRESSION
  • Robust estimation procedures dampen the influence
    of outlying cases, as compared to ordinary LSE,
    in an effort to provide a better fit for the
    majority of cases.
  • LEAST ABSOLUTE RESIDUALS (LAR) REGRESSION
    estimates the regression coefficients by
    minimizing the sum of absolute deviations of the
    Y observations from their fitted means.
  • Since absolute deviations rather than squared
    ones are involved, LAR places less emphasis on
    outlying observations than does the method of
    LS. Residuals ordinarily will not sum to 0, and
    the solution for the estimated coefficients may
    not be unique. A sketch follows.
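LAR is also known as L1 or least absolute deviations regression and amounts to median regression; a minimal sketch, assuming the quantreg package and the Belgian phones data used in the example below:

  library(MASS)        # phones data
  library(quantreg)
  fit.lar <- rq(calls ~ year, tau = 0.5, data = phones)  # minimizes sum of |residuals|
  coef(fit.lar)        # compare with coef(lm(calls ~ year, data = phones))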

38
ROBUST REGRESSION
  • ITERATIVELY REWEIGHTED LEAST SQUARES (IRLS)
    ROBUST REGRESSION uses a weighted least squares
    procedure.
  • This regression uses weights based on how far
    outlying a case is, as measured by the residual
    for that case. The weights are revised with each
    iteration until a robust fit has been obtained,
    as sketched below.
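A minimal hand-rolled sketch of IRLS with Huber weights (illustrative only; MASS::rlm, used in the example below, is the production implementation):

  library(MASS)                      # for the phones data
  huber_w <- function(r, c = 1.345) pmin(1, c / abs(r))  # weight 1 near 0, c/|r| in the tails
  X <- cbind(1, phones$year); y <- phones$calls
  b <- qr.solve(X, y)                # start from the OLS fit
  for (i in 1:25) {
    r <- as.vector(y - X %*% b)
    w <- huber_w(r / mad(r))         # scale residuals by a robust scale estimate
    b <- qr.solve(X * sqrt(w), y * sqrt(w))  # weighted least-squares step
  }
  b                                  # robust intercept and slope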

39
ROBUST REGRESSION
  • LEAST MEDIAN OF SQUARES (LMS) REGRESSION
    estimates the coefficients by minimizing the
    median of the squared residuals (see the sketch
    below).
  • Other robust regression methods: some involve
    trimming extreme squared deviations before
    applying LSE, others are based on ranks. Many of
    the robust regression procedures require
    intensive computing.
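LMS regression is available in MASS via lqs(); a minimal sketch on the phones data from the example below:

  library(MASS)
  fit.lms <- lqs(calls ~ year, data = phones,
                 method = "lms")  # minimizes the median of the squared residuals
  coef(fit.lms)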

40
EXAMPLE
  • This data set gives n = 24 observations on the
    annual number of telephone calls made (calls, in
    millions of calls) in Belgium against the last
    two digits of the year (year); see Rousseeuw and
    Leroy (1987) and Venables and Ripley (2002). As
    can be noted in the figure, there are several
    outliers in the y-direction in the late 1960s.

41
  • Let us start the analysis with the classical OLS
    fit.
  • library(MASS)
  • attach(phones)
  • plot(year, calls)
  • fit.ols <- lm(calls ~ year)
  • summary(fit.ols, cor = F)
  • Residuals:
  • Min 1Q Median 3Q Max
  • -78.97 -33.52 -12.04 23.38 124.20
  • Coefficients:
  • Estimate Std. Error t value Pr(>|t|)
  • (Intercept) -260.059 102.607 -2.535 0.0189
  • year 5.041 1.658 3.041 0.0060
  • ---
  • Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05
    '.' 0.1 ' ' 1
  • Residual standard error: 56.22 on 22 degrees of
    freedom
  • Multiple R-squared: 0.2959, Adjusted
    R-squared: 0.2639

42
  • abline(fit.ols$coef)
  • par(mfrow = c(1, 4))
  • plot(fit.ols, 1:2)
  • plot(fit.ols, 4)
  • hmat.p <- hat(model.matrix(fit.ols))
  • h.phone <- hat(hmat.p)
  • cook.d <- cooks.distance(fit.ols)
  • plot(h.phone/(1 - h.phone), cook.d,
    xlab = "h/(1-h)", ylab = "Cook distance")

43
  • In order to take into account the observations
    with large residuals, i.e. the outliers in the
    late 1960s, consider a robust regression based
    on Huber-type estimates:
  • fit.hub <- rlm(calls ~ year, maxit = 50)
  • fit.hub2 <- rlm(calls ~ year,
    scale.est = "proposal 2")
  • summary(fit.hub, cor = F)
  • Residuals:
  • Min 1Q Median 3Q Max
  • -18.314 -5.953 -1.681 26.460 173.769
  • Coefficients:
  • Value Std. Error t value
  • (Intercept) -102.6222 26.6082 -3.8568
  • year 2.0414 0.4299 4.7480
  • Residual standard error: 9.032 on 22 degrees of
    freedom
  • summary(fit.hub2, cor = F)
  • Residuals:
  • Min 1Q Median 3Q Max
  • -68.15 -29.46 -11.52 22.74 132.67
  • Coefficients:

44
  • > summary(rlm(calls ~ year, psi = psi.bisquare),
    cor = F)
  • Residuals:
  • Min 1Q Median 3Q Max
  • -1.6585 -0.4143 0.2837 39.0866 188.5376
  • Coefficients:
  • Value Std. Error t value
  • (Intercept) -52.3025 2.7530 -18.9985
  • year 1.0980 0.0445 24.6846
  • Residual standard error: 1.654 on 22 degrees of
    freedom

45
  • From the results, and also from the previous
    plot, we note that there are some differences
    from the OLS estimates; in particular this is
    true for the Huber-type estimator with MAD
    scale.
  • Consider again some classic diagnostic plots for
    the robust fit: the plot of the observed values
    versus the fitted values, the plot of the
    residuals versus the fitted values, the normal
    QQ-plot of the residuals, and the fit weights of
    the robust estimator (a sketch follows). Note
    that there are some observations with low
    Huber-type weights which were not identified by
    the classical Cook's statistics.
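A minimal sketch of these four plots (assuming fit.hub from above and the attached phones data):

  par(mfrow = c(1, 4))
  plot(fitted(fit.hub), calls); abline(0, 1)  # observed vs fitted
  plot(fitted(fit.hub), resid(fit.hub))       # residuals vs fitted
  qqnorm(resid(fit.hub))                      # normal QQ-plot of the residuals
  plot(fit.hub$w, ylab = "Huber weight")      # final IRLS weight of each observation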