Title: Quadratic Mixed Integer Programming (QMIP) Models for Robust Regression Estimators
1. Penalized Trimmed Squares and e-Insensitive Loss Function for Unmasking Multiple Outliers in Regression (ICORS06, Lisbon)
G. Zioutas, A. Avramidis and L. Pitsoulis
Dept. of Mathematics and Physical Sciences, Faculty of Technology, Aristotle University of Thessaloniki, Greece
e-mail: zioutas_at_eng.auth.gr
2. Outline
- Regression Problem
- Least Trimmed Squares (LTS)
- Penalized Trimmed Squares (PTS)
- Unmasking Multiple High Leverage Points
- QMIP formulation for PTS
- Support Vectors, e-insensitive Regression
- Detecting Outliers with the e-Insensitive PTS procedure
- Numerical comparisons
- Conclusions and Future Work
3. Regression Problem
Consider the linear regression model y = Xβ + u, where
- y is the n×1 vector of the dependent variable
- X is an n×p matrix of regressor variables
- β is the p×1 vector of unknown parameters
- u is the n×1 vector of random errors
We observe a sample (x1, y1),...,(xn, yn) and we wish to construct an estimator for the parameters β.
4. Standard Linear Regression
[Figure: data points with a fitted regression line; the errors u are the vertical deviations]
Find β such that Xβ approximates y.
There are n points in R^p, represented by an n×p matrix X; y in R^n is the vector to be approximated.
5. Least Squares Estimator
- Defined by minimizing the squared-error loss function (written out below)
- Points which are far from the predicted line (outliers) are overemphasized.
- Least Squares estimators are very sensitive to outliers.
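For reference, the squared-error loss referred to above (the slide's equation is an image and is not reproduced here) is the familiar least-squares objective:

$$\hat{\beta}_{LS} = \arg\min_{\beta} \sum_{i=1}^{n} \left(y_i - x_i^{\top}\beta\right)^2 .$$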
6. Outliers in Linear Regression
An outlier affects the regression line.
[Figure: the fitted line is pulled toward the outlier; the error u of the outlying point is shown]
7. Least Trimmed Squares (LTS): High-Breakdown Estimator (Rousseeuw and Leroy, 1987)
- Fits the best subset of k observations
- Removes the remaining (n-k) observations
- The LTS estimator is defined by minimizing the sum of the k smallest squared residuals (see below), where k is the coverage, k ≥ n/2, given a priori
In real applications the coverage k is unknown.
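A standard way to write the LTS objective mentioned above, with $r_i(\beta) = y_i - x_i^{\top}\beta$ and $r_{(1)}^2(\beta) \le \dots \le r_{(n)}^2(\beta)$ the ordered squared residuals, is

$$\hat{\beta}_{LTS} = \arg\min_{\beta} \sum_{i=1}^{k} r_{(i)}^2(\beta), \qquad k \ge n/2 .$$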
8. LTS Fitting with Coverage k
[Figure: fit b of k points in the y-X plane, after deleting the n-k potential outliers]
In practice the coverage k is unknown.
9. Penalized Trimmed Squares (PTS)
- The proposed PTS estimator minimizes the total sum of
  - the k squared residuals in the clean data (k is not given a priori)
  - and the penalties for deleting the remaining n-k observations.
PTS is the OLS fit of the clean data subset of size k (a schematic objective is sketched below).
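A schematic form of the PTS objective described above (the notation is mine, not the slide's: δ_i ∈ {0,1} indicates deletion of observation i and p_i is its deletion penalty):

$$\min_{\beta,\ \delta \in \{0,1\}^n}\ \sum_{i=1}^{n} (1-\delta_i)\left(y_i - x_i^{\top}\beta\right)^2 \;+\; \sum_{i=1}^{n} \delta_i\, p_i .$$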
10. Penalized Trimmed Squares (PTS): Some Regression Diagnostics
- Deleting (xi, yi), the sum of squared errors is reduced by the squared adjusted residual of that point.
- Under Gaussian conditions, (xi, yi) is considered an outlier if this reduction is significant.
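A standard diagnostic consistent with this slide (an assumption on my part, since the slide's formulas are images): deleting observation i from an OLS fit reduces the residual sum of squares by its squared adjusted residual, and under Gaussian conditions the point is flagged when this exceeds the (3σ)² threshold introduced on the next slide:

$$\Delta_i = \frac{r_i^2}{1-h_i}, \qquad h_i = x_i^{\top}(X^{\top}X)^{-1}x_i, \qquad \text{flag } (x_i, y_i) \text{ if } \Delta_i > (3\sigma)^2 .$$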
11. The General Principles of the PTS Procedure
- Delete an observation if it causes a significant reduction in the sum of squared errors.
- Penalize the loss function with a deletion penalty for deleting (xi, yi).
(3σ)² is considered a significant reduction in the loss function.
12. The PTS Robust Loss Function
- PTS can be defined equivalently by solving a penalized minimization problem, with a constant for penalizing large adjusted residuals.
- In the final solution (the clean data subset), all errors are bounded by the deletion threshold.
13. Performance of the PTS Estimator
- Given as parameters:
  - a robust error scale estimate of σ (from LTS)
  - the deleting penalty
- For data contaminated with
  - y-outliers
  - a few x-outliers (high-leverage outliers)
- the performance of PTS is successful.
PTS cannot handle groups of x-outliers, i.e. groups of high-leverage outliers, in the data.
14. Masking Problem: A Group of High-Leverage Points in the Same Direction
- When a masked leverage point is deleted, the reduction in the sum of squared residuals may be too small.
15. Reduction of Penalties for Masked High-Leverage Points
- To overcome the masking problem, reduce the deleting penalty for the masked leverage points.
- For the choice of weights we use information from
  - the initial leverage of each data point (xi, yi)
  - the clean leverage of each point (xi, yi) as it joins the clean data subset, based on MCD (Rousseeuw and Van Driessen, 1999)
16. Our Proposal for Penalty Weights for Unmasking Outliers
- In a robust estimate we impose a requirement on every data point (xi, yi).
- This can be done by down-weighting the penalty.
- Applying the down-weighting to the initial data set gives the penalty weights.
17. Our Proposal for Deleting Penalties in PTS
- For each data point (xi, yi):
  - given the initial and final (clean) leverages,
  - down-weight the deleting penalty.
- Finally, for each data point (xi, yi) the deleting penalty is the down-weighted threshold (3σ wi)², as sketched below.
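A minimal Python sketch of this leverage-based down-weighting, assuming standard hat-matrix leverages and MCD-style clean leverages; the specific weight rule `w` is a hypothetical placeholder, since the slides do not give the exact formula:

```python
import numpy as np

def initial_leverages(X):
    """Diagonal of the hat matrix H = X (X'X)^{-1} X' for the full data set."""
    Q, _ = np.linalg.qr(X)
    return np.sum(Q**2, axis=1)

def clean_leverages(X, clean_idx):
    """Leverage of every point measured against a clean subset (e.g. from MCD)."""
    Xc = X[clean_idx]
    G = np.linalg.inv(Xc.T @ Xc)
    return np.einsum('ij,jk,ik->i', X, G, X)   # x_i' G x_i for each row i

def deletion_penalties(X, clean_idx, sigma):
    h0 = initial_leverages(X)
    hc = clean_leverages(X, clean_idx)
    # Hypothetical down-weighting rule (NOT the authors' formula):
    # shrink the penalty for points that look masked, i.e. whose clean
    # leverage is much larger than their initial leverage.
    w = np.minimum(1.0, h0 / hc)
    return (3.0 * sigma * w) ** 2
```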
18. The QMIP Formulation for PTS: a Convex Formulation
Penalty for deleting potential outliers (an illustrative sketch follows).
- The QMIP formulation is convex.
- A unique global optimum solution exists.
- A PTS estimate is obtained.
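One common way to cast such a problem as a quadratic mixed integer program, shown here only as an illustrative sketch (the big-M constraints and notation are my assumptions, not necessarily the authors' exact formulation):

$$\min_{\beta,\ r,\ \delta}\ \sum_{i=1}^{n} r_i^2 + \sum_{i=1}^{n} \delta_i\,(3\sigma w_i)^2 \quad \text{s.t.}\quad -r_i - M\delta_i \le y_i - x_i^{\top}\beta \le r_i + M\delta_i,\quad r_i \ge 0,\quad \delta_i \in \{0,1\},$$

where M is a sufficiently large constant; the continuous relaxation is a convex quadratic program.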
19. Computational Load of PTS
- The exact computation of PTS is difficult.
- The exact algorithm for PTS is a combinatorial one.
- The exact algorithm is suitable only for small data sets, because of the computational load.
- Faster probabilistic algorithms could be developed for larger samples.
In the present work we consider an alternative approach to PTS, bringing in ideas from Support Vector Regression.
20. Support Vector Regression
- An alternative robust procedure against noisy data
- Reduces the complexity of the regression model, by reducing the magnitude of the parameters
- Attempts to fit a tube of radius e to the data set, by ignoring (tolerating) small errors u
- Gains sparseness and even better robustness and efficiency
- Christmann, A. and Steinwart, I. (2004)
- Vapnik, V. N. (1998)
- Smola, A. and Schölkopf, B. (1998, 2000)
21. Vapnik's e-Insensitive Loss Function
- Vapnik (1998) devised the so-called e-insensitive loss function (written out below).
- Small errors (below some e > 0) are not penalized in the loss function.
- Consider a tube with radius e to fit the data, and measure the stochastic error u outside the tube.
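The standard form of Vapnik's e-insensitive loss referred to above is

$$|u|_{\varepsilon} = \max\{0,\ |u| - \varepsilon\} = \begin{cases} 0 & \text{if } |u| \le \varepsilon, \\ |u| - \varepsilon & \text{otherwise.} \end{cases}$$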
22. Vapnik's e-Insensitive Regression
- Only the points outside the tube enter the stochastic term.
- Errors from points that are close are ignored.
- Points close to the actual regression have zero loss.
23. Our e-Insensitive Loss Function (Square Form)
- Our adaptation of the e-insensitive loss function:
  - The regression error u contributes in a quadratic fashion, instead of linearly.
  - Allow an interval of size e with uniform error.
  - Errors below some e contribute a uniform value instead of zero.
Consider a tube to fit the data and measure the error u from the axis of the tube.
24. Our e-Insensitive Regression
A tube is fitted to the data.
- Only the points outside the tube enter the stochastic term.
- For points close to the actual regression a uniform error e is measured.
25. Quadratic Program for e-Insensitive Regression
- The tolerance e should be as large as possible, while preserving accuracy.
Under Gaussian conditions, good efficiency can be obtained for e ≈ 0.612σ (Schölkopf and Smola, 2002).
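For reference, the standard quadratic program for linear e-insensitive support vector regression (Vapnik 1998; Smola and Schölkopf), which this slide builds on, is

$$\min_{w,\,b,\,\xi,\,\xi^*}\ \tfrac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}(\xi_i + \xi_i^*) \quad \text{s.t.}\quad y_i - w^{\top}x_i - b \le \varepsilon + \xi_i,\quad w^{\top}x_i + b - y_i \le \varepsilon + \xi_i^*,\quad \xi_i,\ \xi_i^* \ge 0 .$$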
26. e-Insensitive PTS Loss Function (IPTS)
- Brings together
  - the e-insensitive loss function
  - and Penalized Trimmed Squares (PTS).
- In our empirical results, e ≈ 0.8σ was a good choice for faster computation and efficiency.
27. e-Insensitive PTS Regression (IPTS): QMIP Program
Penalty for deleting outliers; tolerance as a constraint (a schematic sketch follows).
- The IPTS formulation is convex.
- An IPTS estimate can be obtained.
- The tolerance yields sparseness.
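Schematically, the two ingredients can be combined as follows, with ℓ_ε denoting the square-form e-insensitive loss of slide 23 and the down-weighted penalties of slide 17; this is only a sketch, not the slide's exact QMIP, which also imposes the tolerance as a constraint:

$$\min_{\beta,\ \delta \in \{0,1\}^n}\ \sum_{i=1}^{n} (1-\delta_i)\,\ell_{\varepsilon}\!\left(y_i - x_i^{\top}\beta\right) \;+\; \sum_{i=1}^{n} \delta_i\,(3\sigma w_i)^2 .$$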
28. e-Insensitive PTS Regression
[Figure: a tube of radius e around the fitted line in the y-X plane]
- Appropriate emphasis on medium residuals (the risk part)
- De-emphasize
  - small errors u,
  - big errors u.
29. Numerical Testing
- Compare our methods PTS and IPTS with the LTS and MM methods for robustness and efficiency
- Datasets
  - Contaminated artificial data, 50 points in R2
  - Hawkins dataset, 75 points in R3
  - Contaminated artificial data, 500 points in R2
- Solutions obtained using
  - S-PLUS 6.1 for LTS, MM
  - Fort/QMIP for the PTS and IPTS formulations
- Hardware: all experiments run on a 1200 MHz AMD Athlon.
30. Experimental Results: 6 good leverage points, number of observations n = 50
The e-insensitive PTS approach improves the performance.
31. Experimental Results: 6 x-outliers, 4 good leverage points, 6 y-outliers, number of observations n = 50
The e-insensitive PTS approach improves the performance.
32. Experimental Results: 6 bad high-leverage points, number of observations n = 50
The e-insensitive PTS approach improves the performance.
33. IPTS Regression on a Large Data Set
[Figure: IPTS fit with a wide tube on a large data set in the y-X plane]
- Increasing the radius e
  - saves computation time,
  - with reasonable efficiency.
34. e-Insensitive PTS Procedure for Detecting Outliers
- PHASE A
  - Step 1. Obtain robust σ and hi (via LTS, MCD).
  - Step 2. Calculate the penalties 3σwi; choose e.
  - Step 3. Solve the QMIP of the e-insensitive PTS formulation.
  - Step 4. Find the basic clean subset of size k after removing the n-k points deleted in the QMIP solution.
35. Re-inclusion of Good Leverage Points
- PHASE B
  - Step 5. Apply OLS to the clean subset k; obtain estimates βk, σk.
  - Step 6. Compute the scaled prediction errors based on the clean subset k.
  - Step 7. Re-include (xi, yi) into the clean data subset if its scaled prediction error is small.
  - Step 8. Compute the OLS estimate from the new data subset.
A Python sketch of Phases A and B follows.
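A high-level Python sketch of Phases A and B, assuming a solver for the QMIP step is available as a black box; `solve_qmip` and the 2.5 cut-off in Phase B are illustrative assumptions, not taken from the slides:

```python
import numpy as np

def ipts_outlier_detection(X, y, sigma, w, eps, solve_qmip):
    """Phase A: solve the e-insensitive PTS QMIP and extract the clean subset.
    Phase B: re-include good leverage points via scaled prediction errors."""
    penalties = (3.0 * sigma * w) ** 2           # Step 2: deletion penalties 3*sigma*w_i, squared
    deleted = solve_qmip(X, y, penalties, eps)   # Step 3: assumed to return a boolean mask of deleted points
    clean = ~deleted                             # Step 4: basic clean subset

    # Phase B
    Xc, yc = X[clean], y[clean]
    beta_k, *_ = np.linalg.lstsq(Xc, yc, rcond=None)      # Step 5: OLS on the clean subset
    sigma_k = (yc - Xc @ beta_k).std(ddof=X.shape[1])     # Step 5: scale estimate from clean residuals
    pred_err = np.abs(y - X @ beta_k) / sigma_k           # Step 6: scaled prediction errors
    clean |= pred_err < 2.5                               # Step 7: re-include (cut-off is an assumption)
    beta_final, *_ = np.linalg.lstsq(X[clean], y[clean], rcond=None)  # Step 8: final OLS estimate
    return beta_final, ~clean                              # estimate and flagged outliers
```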
36. Experimental Results: heavy contamination, 10 high-leverage outliers, 6 y-outliers, number of observations n = 50
The e-insensitive PTS approach improves the performance.
37. Experimental Results: 6 x-outliers, 4 good leverage points, 6 y-outliers, number of observations n = 50
The e-insensitive PTS approach improves efficiency and computation time.
38. Comparison Results: large artificial data set, 500 points in R2, including 120 outliers
The e-insensitive PTS approach significantly reduces the computation time.
39. Comparison Results: Hawkins, Bradu and Kass artificial data, 75 points in R3, including 10 bad x-outliers
The IPTS procedure significantly improves the computation time.
40. Conclusions and Future Work
- Conclusions
  - The PTS estimator, based on MCD results, can be used successfully in unmasking regression outliers.
  - The e-insensitive PTS approach has improved
    - the robustness significantly, without sacrificing much efficiency,
    - the computational load significantly.
  - Support Vector Machines combined with PTS are a new approach for detecting outliers in large data sets.
- Future Work
  - Develop statistical models for the penalties and the insensitivity parameter e.
41. End
Thank you very much
42. Huber-Type Robust Procedure
[Figure: an outlier with initial error u is pulled a distance s toward the fitted line b in the y-X plane, leaving a down-weighted error u]
- The robust procedure pulls the outlier towards the regression line.
43. The Over-fitting Problem
- Close points may be wrong due to noise only.
- The line should be influenced by real data, not noise (Mangasarian and Musicant, 2000).
- Ignore errors from those points which are close!