Title: Quadratic Mixed Integer Programming (QMIP) Models for Robust Regression Estimators
1. Penalized Trimmed Squares and e-Insensitive Loss Function for Unmasking Multiple Outliers in Regression (ICORS06, Lisbon)
G. Zioutas, A. Avramidis and L. Pitsoulis
Dept. of Mathematics and Physical Sciences, Faculty of Technology, Aristotle University of Thessaloniki, Greece
e-mail: zioutas_at_eng.auth.gr
2. Outline
- Regression Problem
- Least Trimmed Squares (LTS)
- Penalized Trimmed Squares (PTS)
- Unmasking Multiple High Leverage Points
- QMIP formulation for PTS
- Support Vectors, e-insensitive Regression
- Detecting Outliers with the e-Insensitive PTS procedure
- Numerical comparisons
- Conclusions and Future Work
3. Regression Problem
Consider the linear regression model y = Xβ + u, where
- y is the n×1 vector of the dependent variable
- X is an n×p matrix of regressor variables
- β is the p×1 vector of unknown parameters
- u is the n×1 vector of random errors
We observe a sample (x1, y1),...,(xn, yn) and we wish to construct an estimator for the parameters β.
4. Standard Linear Regression
[Figure: data points with a fitted regression line; the errors u are the vertical deviations]
Find β such that Xβ approximates y.
There are n points in R^p, represented by an n×p matrix X; y in R^n is the vector to be approximated.
5. Least Squares Estimator
- Defined by minimizing the squared-error loss function (written out below)
- Points which are far from the predicted line (outliers) are overemphasized.
- Least Squares estimators are very sensitive to outliers.
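For reference, the squared-error loss referred to above (the slide's equation is an image and is not reproduced here) is the familiar least-squares objective:

$$\hat{\beta}_{LS} = \arg\min_{\beta} \sum_{i=1}^{n} \left(y_i - x_i^{\top}\beta\right)^2 .$$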
6. Outliers in Linear Regression
An outlier affects the regression line.
[Figure: the fitted line is pulled toward the outlier; the error u of the outlying point is shown]
7. Least Trimmed Squares (LTS): High-Breakdown Estimator (Rousseeuw and Leroy, 1987)
- Fits the best subset of k observations
- Removes the remaining (n-k) observations
- The LTS estimator is defined by minimizing the sum of the k smallest squared residuals (see below), where k is the coverage, k ≥ n/2, given a priori
In real applications the coverage k is unknown.
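A standard way to write the LTS objective mentioned above, with $r_i(\beta) = y_i - x_i^{\top}\beta$ and $r_{(1)}^2(\beta) \le \dots \le r_{(n)}^2(\beta)$ the ordered squared residuals, is

$$\hat{\beta}_{LTS} = \arg\min_{\beta} \sum_{i=1}^{k} r_{(i)}^2(\beta), \qquad k \ge n/2 .$$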
8. LTS Fitting with Coverage k
[Figure: fit b of k points in the y-X plane, after deleting the n-k potential outliers]
In practice the coverage k is unknown.
9. Penalized Trimmed Squares (PTS)
- The proposed PTS estimator minimizes the total sum of
  - the k squared residuals in the clean data (k is not given a priori)
  - and the penalties for deleting the remaining n-k observations.
PTS is the OLS fit of the clean data subset of size k (a schematic objective is sketched below).
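A schematic form of the PTS objective described above (the notation is mine, not the slide's: δ_i ∈ {0,1} indicates deletion of observation i and p_i is its deletion penalty):

$$\min_{\beta,\ \delta \in \{0,1\}^n}\ \sum_{i=1}^{n} (1-\delta_i)\left(y_i - x_i^{\top}\beta\right)^2 \;+\; \sum_{i=1}^{n} \delta_i\, p_i .$$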
10. Penalized Trimmed Squares (PTS): Some Regression Diagnostics
- Deleting (xi, yi), the sum of squared errors is reduced by the squared adjusted residual of that point.
- Under Gaussian conditions, (xi, yi) is considered an outlier if this reduction is significant.
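A standard diagnostic consistent with this slide (an assumption on my part, since the slide's formulas are images): deleting observation i from an OLS fit reduces the residual sum of squares by its squared adjusted residual, and under Gaussian conditions the point is flagged when this exceeds the (3σ)² threshold introduced on the next slide:

$$\Delta_i = \frac{r_i^2}{1-h_i}, \qquad h_i = x_i^{\top}(X^{\top}X)^{-1}x_i, \qquad \text{flag } (x_i, y_i) \text{ if } \Delta_i > (3\sigma)^2 .$$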
11. The General Principles of the PTS Procedure
- Delete an observation if it causes a significant reduction in the sum of squared errors.
- Penalize the loss function with a deletion penalty for deleting (xi, yi).
(3σ)² is considered a significant reduction in the loss function.
12. The PTS Robust Loss Function
- PTS can be defined equivalently by solving a penalized minimization problem, with a constant for penalizing large adjusted residuals.
- In the final solution (the clean data subset), all errors are bounded by the deletion threshold.
13. Performance of the PTS Estimator
- Given as parameters:
  - a robust error scale estimate of σ (from LTS)
  - the deleting penalty
- For data contaminated with
  - y-outliers
  - a few x-outliers (high-leverage outliers)
- the performance of PTS is successful.
PTS cannot handle groups of x-outliers, i.e. groups of high-leverage outliers, in the data.
14. Masking Problem: A Group of High-Leverage Points in the Same Direction
- When a masked leverage point is deleted, the reduction in the sum of squared residuals may be too small.
15. Reduction of Penalties for Masked High-Leverage Points
- To overcome the masking problem, reduce the deleting penalty for the masked leverage points.
- For the choice of weights we use information from
  - the initial leverage of each data point (xi, yi)
  - the clean leverage of each point (xi, yi) as it joins the clean data subset, based on MCD (Rousseeuw and Van Driessen, 1999)
16. Our Proposal for Penalty Weights for Unmasking Outliers
- In a robust estimate we impose a requirement on every data point (xi, yi).
- This can be done by down-weighting the penalty.
- Applying the down-weighting to the initial data set gives the penalty weights.
17. Our Proposal for Deleting Penalties in PTS
- For each data point (xi, yi):
  - given the initial and final (clean) leverages,
  - down-weight the deleting penalty.
- Finally, for each data point (xi, yi) the deleting penalty is the down-weighted threshold (3σ wi)², as sketched below.
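A minimal Python sketch of this leverage-based down-weighting, assuming standard hat-matrix leverages and MCD-style clean leverages; the specific weight rule `w` is a hypothetical placeholder, since the slides do not give the exact formula:

```python
import numpy as np

def initial_leverages(X):
    """Diagonal of the hat matrix H = X (X'X)^{-1} X' for the full data set."""
    Q, _ = np.linalg.qr(X)
    return np.sum(Q**2, axis=1)

def clean_leverages(X, clean_idx):
    """Leverage of every point measured against a clean subset (e.g. from MCD)."""
    Xc = X[clean_idx]
    G = np.linalg.inv(Xc.T @ Xc)
    return np.einsum('ij,jk,ik->i', X, G, X)   # x_i' G x_i for each row i

def deletion_penalties(X, clean_idx, sigma):
    h0 = initial_leverages(X)
    hc = clean_leverages(X, clean_idx)
    # Hypothetical down-weighting rule (NOT the authors' formula):
    # shrink the penalty for points that look masked, i.e. whose clean
    # leverage is much larger than their initial leverage.
    w = np.minimum(1.0, h0 / hc)
    return (3.0 * sigma * w) ** 2
```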
18. The QMIP Formulation for PTS: a Convex Formulation
Penalty for deleting potential outliers (an illustrative sketch follows).
- The QMIP formulation is convex.
- A unique global optimum solution exists.
- A PTS estimate is obtained.
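One common way to cast such a problem as a quadratic mixed integer program, shown here only as an illustrative sketch (the big-M constraints and notation are my assumptions, not necessarily the authors' exact formulation):

$$\min_{\beta,\ r,\ \delta}\ \sum_{i=1}^{n} r_i^2 + \sum_{i=1}^{n} \delta_i\,(3\sigma w_i)^2 \quad \text{s.t.}\quad -r_i - M\delta_i \le y_i - x_i^{\top}\beta \le r_i + M\delta_i,\quad r_i \ge 0,\quad \delta_i \in \{0,1\},$$

where M is a sufficiently large constant; the continuous relaxation is a convex quadratic program.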
19. Computational Load of PTS
- The exact computation of PTS is difficult.
- The exact algorithm for PTS is a combinatorial one.
- The exact algorithm is suitable only for small data sets, because of the computational load.
- Faster probabilistic algorithms could be developed for larger samples.
In the present work we consider an alternative approach to PTS, bringing in ideas from Support Vector Regression.
20. Support Vector Regression
- An alternative robust procedure against noisy data
- Reduces the complexity of the regression model, by reducing the magnitude of the parameters
- Attempts to fit a tube of radius e to the data set, by ignoring (tolerating) small errors u
- Gains sparseness and even better robustness and efficiency
- Christmann, A. and Steinwart, I. (2004)
- Vapnik, V. N. (1998)
- Smola, A. and Schölkopf, B. (1998, 2000)
21. Vapnik's e-Insensitive Loss Function
- Vapnik (1998) devised the so-called e-insensitive loss function (written out below).
- Small errors (below some e > 0) are not penalized in the loss function.
- Consider a tube with radius e to fit the data, and measure the stochastic error u outside the tube.
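The standard form of Vapnik's e-insensitive loss referred to above is

$$|u|_{\varepsilon} = \max\{0,\ |u| - \varepsilon\} = \begin{cases} 0 & \text{if } |u| \le \varepsilon, \\ |u| - \varepsilon & \text{otherwise.} \end{cases}$$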
22. Vapnik's e-Insensitive Regression
- Only the points outside the tube enter the stochastic term.
- Errors from points that are close are ignored.
- Points close to the actual regression have zero loss.
23. Our e-Insensitive Loss Function (Square Form)
- Our adaptation of the e-insensitive loss function:
  - The regression error u contributes in a quadratic fashion, instead of linearly.
  - Allow an interval of size e with uniform error.
  - Errors below some e contribute a uniform value instead of zero.
Consider a tube to fit the data and measure the error u from the axis of the tube.
24. Our e-Insensitive Regression
A tube is fitted to the data.
- Only the points outside the tube enter the stochastic term.
- For points close to the actual regression a uniform error e is measured.
25. Quadratic Program for e-Insensitive Regression
- The tolerance e should be as large as possible, while preserving accuracy.
Under Gaussian conditions, good efficiency can be obtained for e ≈ 0.612σ (Schölkopf and Smola, 2002).
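For reference, the standard quadratic program for linear e-insensitive support vector regression (Vapnik 1998; Smola and Schölkopf), which this slide builds on, is

$$\min_{w,\,b,\,\xi,\,\xi^*}\ \tfrac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}(\xi_i + \xi_i^*) \quad \text{s.t.}\quad y_i - w^{\top}x_i - b \le \varepsilon + \xi_i,\quad w^{\top}x_i + b - y_i \le \varepsilon + \xi_i^*,\quad \xi_i,\ \xi_i^* \ge 0 .$$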
26. e-Insensitive PTS Loss Function (IPTS)
- Brings together
  - the e-insensitive loss function
  - and Penalized Trimmed Squares (PTS).
- In our empirical results, e ≈ 0.8σ was a good choice for faster computation and efficiency.
27. e-Insensitive PTS Regression (IPTS): QMIP Program
Penalty for deleting outliers; tolerance as a constraint (a schematic sketch follows).
- The IPTS formulation is convex.
- An IPTS estimate can be obtained.
- The tolerance yields sparseness.
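Schematically, the two ingredients can be combined as follows, with ℓ_ε denoting the square-form e-insensitive loss of slide 23 and the down-weighted penalties of slide 17; this is only a sketch, not the slide's exact QMIP, which also imposes the tolerance as a constraint:

$$\min_{\beta,\ \delta \in \{0,1\}^n}\ \sum_{i=1}^{n} (1-\delta_i)\,\ell_{\varepsilon}\!\left(y_i - x_i^{\top}\beta\right) \;+\; \sum_{i=1}^{n} \delta_i\,(3\sigma w_i)^2 .$$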
28. e-Insensitive PTS Regression
[Figure: a tube of radius e around the fitted line in the y-X plane]
- Appropriate emphasis on medium residuals (the risk part)
- De-emphasize
  - small errors u,
  - big errors u.
29. Numerical Testing
- Compare our methods PTS and IPTS with the LTS and MM methods for robustness and efficiency
- Datasets
  - Contaminated artificial data, 50 points in R2
  - Hawkins dataset, 75 points in R3
  - Contaminated artificial data, 500 points in R2
- Solutions obtained using
  - S-PLUS 6.1 for LTS, MM
  - Fort/QMIP for the PTS and IPTS formulations
- Hardware: all experiments run on a 1200 MHz AMD Athlon.
30. Experimental Results: 6 good leverage points, number of observations n = 50
The e-insensitive PTS approach improves the performance.
31. Experimental Results: 6 x-outliers, 4 good leverage points, 6 y-outliers, number of observations n = 50
The e-insensitive PTS approach improves the performance.
32. Experimental Results: 6 bad high-leverage points, number of observations n = 50
The e-insensitive PTS approach improves the performance.
33. IPTS Regression on a Large Data Set
[Figure: IPTS fit with a wide tube on a large data set in the y-X plane]
- Increasing the radius e
  - saves computation time,
  - with reasonable efficiency.
34. e-Insensitive PTS Procedure for Detecting Outliers
- PHASE A
  - Step 1. Obtain robust σ and hi (via LTS, MCD).
  - Step 2. Calculate the penalties 3σwi; choose e.
  - Step 3. Solve the QMIP of the e-insensitive PTS formulation.
  - Step 4. Find the basic clean subset of size k after removing the n-k points deleted in the QMIP solution.
35. Re-inclusion of Good Leverage Points
- PHASE B
  - Step 5. Apply OLS to the clean subset k; obtain estimates βk, σk.
  - Step 6. Compute the scaled prediction errors based on the clean subset k.
  - Step 7. Re-include (xi, yi) into the clean data subset if its scaled prediction error is small.
  - Step 8. Compute the OLS estimate from the new data subset.
A Python sketch of Phases A and B follows.
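A high-level Python sketch of Phases A and B, assuming a solver for the QMIP step is available as a black box; `solve_qmip` and the 2.5 cut-off in Phase B are illustrative assumptions, not taken from the slides:

```python
import numpy as np

def ipts_outlier_detection(X, y, sigma, w, eps, solve_qmip):
    """Phase A: solve the e-insensitive PTS QMIP and extract the clean subset.
    Phase B: re-include good leverage points via scaled prediction errors."""
    penalties = (3.0 * sigma * w) ** 2           # Step 2: deletion penalties 3*sigma*w_i, squared
    deleted = solve_qmip(X, y, penalties, eps)   # Step 3: assumed to return a boolean mask of deleted points
    clean = ~deleted                             # Step 4: basic clean subset

    # Phase B
    Xc, yc = X[clean], y[clean]
    beta_k, *_ = np.linalg.lstsq(Xc, yc, rcond=None)      # Step 5: OLS on the clean subset
    sigma_k = (yc - Xc @ beta_k).std(ddof=X.shape[1])     # Step 5: scale estimate from clean residuals
    pred_err = np.abs(y - X @ beta_k) / sigma_k           # Step 6: scaled prediction errors
    clean |= pred_err < 2.5                               # Step 7: re-include (cut-off is an assumption)
    beta_final, *_ = np.linalg.lstsq(X[clean], y[clean], rcond=None)  # Step 8: final OLS estimate
    return beta_final, ~clean                              # estimate and flagged outliers
```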
36. Experimental Results: heavy contamination, 10 high-leverage outliers, 6 y-outliers, number of observations n = 50
The e-insensitive PTS approach improves the performance.
37. Experimental Results: 6 x-outliers, 4 good leverage points, 6 y-outliers, number of observations n = 50
The e-insensitive PTS approach improves efficiency and computation time.
38. Comparison Results: large artificial data set, 500 points in R2, including 120 outliers
The e-insensitive PTS approach significantly reduces the computation time.
39. Comparison Results: Hawkins, Bradu and Kass artificial data, 75 points in R3, including 10 bad x-outliers
The IPTS procedure significantly improves the computation time.
40. Conclusions and Future Work
- Conclusions
  - The PTS estimator, based on MCD results, can be used successfully in unmasking regression outliers.
  - The e-insensitive PTS approach has improved
    - the robustness significantly, without sacrificing much efficiency,
    - the computational load significantly.
  - Support Vector Machines combined with PTS are a new approach for detecting outliers in large data sets.
- Future Work
  - Develop statistical models for the penalties and the insensitivity parameter e.
41. End
Thank you very much
42. Huber-Type Robust Procedure
[Figure: an outlier with initial error u is pulled a distance s toward the fitted line b in the y-X plane, leaving a down-weighted error u]
- The robust procedure pulls the outlier towards the regression line.
43. The Over-fitting Problem
- Close points may be wrong due to noise only.
- The line should be influenced by real data, not noise (Mangasarian and Musicant, 2000).
- Ignore errors from those points which are close!