Title: Quadratic Mixed Integer Programming Models (QMIP) for Robust Regression Estimators


1
Penalized Trimmed Squares and ε-Insensitive Loss
Function for Unmasking Multiple Outliers in
Regression
ICORS'06, Lisbon
G. Zioutas, A. Avramidis and L. Pitsoulis
Dept. of Mathematical and Physical Sciences, Faculty of
Technology, Aristotle University of Thessaloniki, Greece
e-mail: zioutas@eng.auth.gr
2
Outline
  • Regression Problem
  • Least Trimmed Squares (LTS)
  • Penalized Trimmed Squares (PTS)
  • Unmasking Multiple High Leverage Points
  • QMIP formulation for PTS
  • Support Vectors, ε-Insensitive Regression
  • Detecting Outliers with the ε-Insensitive PTS
    procedure
  • Numerical comparisons
  • Conclusions and Future Work

3
Regression Problem
Consider the linear regression model y = Xβ + u,
  • where
  • y is the n×1 vector of the dependent variable
  • X is the n×p matrix of regressor variables
  • β is the p×1 vector of unknown parameters
  • u is the n×1 vector of random errors

We observe a sample (x1, y1), ..., (xn, yn) and we
wish to construct an estimator for the
parameters β.
4
Standard Linear Regression
[Figure: data points, fitted line and errors u]
Find β such that the errors u = y - Xβ are small.
n points in R^p are represented by an n×p matrix
X; y in R^n is the vector to be approximated.
5
  • Least Squares Estimator
  • Defined by minimizing the squared-error
    loss function Σ ui² = Σ (yi - xiᵀβ)²
  • Points which are far from the predicted line
    (outliers) are overemphasized.
  • Least Squares Estimators are very sensitive to
    outliers
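
As an aside (not from the slides), a minimal numpy sketch showing how a single contaminated observation shifts the least-squares fit; the data below are made up for illustration.

    import numpy as np

    # Simple linear data y = 2x + noise, plus one gross y-outlier.
    rng = np.random.default_rng(0)
    x = np.linspace(0, 10, 20)
    y = 2 * x + rng.normal(0, 0.5, size=x.size)
    y[-1] += 30  # contaminate the last observation

    X = np.column_stack([np.ones_like(x), x])  # design matrix with intercept

    # OLS minimizes the sum of squared residuals, so the outlier drags the fit.
    beta_all, *_ = np.linalg.lstsq(X, y, rcond=None)
    beta_clean, *_ = np.linalg.lstsq(X[:-1], y[:-1], rcond=None)
    print("slope with the outlier:   ", beta_all[1])
    print("slope without the outlier:", beta_clean[1])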

6
Outliers in Linear Regression
A single outlier shifts the fitted regression line.
[Figure: regression line pulled toward the outlier]
7
Least Trimmed Squares (LTS): High-Breakdown
Estimator (Rousseeuw and Leroy, 1987)
  • Fits the best subset of k observations
  • Removes the remaining n-k observations
  • The LTS estimator is defined by minimizing the
    sum of the k smallest squared residuals,
  • where k is the coverage, k ≥ n/2, given a priori

In real applications the coverage k is unknown
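
For concreteness, a short Python sketch of the LTS criterion (an illustration, not the authors' code): for a given β only the k smallest squared residuals are summed, and the LTS estimate is the β minimizing this sum.

    import numpy as np

    def lts_objective(beta, X, y, k):
        """Sum of the k smallest squared residuals (the LTS criterion)."""
        r2 = (y - X @ beta) ** 2
        return np.sort(r2)[:k].sum()

    # The LTS estimate minimizes lts_objective over beta with k >= n/2;
    # exact minimization is combinatorial, hence the usual resampling algorithms.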
8
LTS fitting with coverage k
[Figure: fit the k points, delete the n-k potential outliers]
In practice the coverage k is unknown
9
Penalized Trimmed Squares (PTS)
  • The proposed PTS estimator minimizes the total
    sum of
  • the k squared residuals in the clean data
    (k is not given a priori)
  • and the penalties for deleting the remaining n-k
    observations

PTS is the OLS fit on the clean data subset of size k
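
A minimal sketch of this criterion in Python (illustrative only; the deleting penalties are passed in as a generic array): for a candidate set of deleted observations, the objective is the OLS sum of squares on the kept points plus the penalties of the deleted points.

    import numpy as np

    def pts_objective(X, y, deleted, penalties):
        """Squared residuals of the OLS fit on the kept points
        plus the deleting penalties (numpy array) of the removed points."""
        keep = np.setdiff1d(np.arange(len(y)), deleted)
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        rss = ((y[keep] - X[keep] @ beta) ** 2).sum()
        return rss + penalties[deleted].sum()

    # PTS chooses the deleted set (and hence the coverage k) minimizing this
    # total, so k is not fixed in advance.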
10
Penalized Trimmed Squares (PTS): some regression
diagnostics
  • Deleting (xi, yi), the sum of squared errors is
    reduced by the adjusted squared residual
    ri² / (1 - hi)
  • Under Gaussian conditions, (xi, yi) is considered
    an outlier if this reduction exceeds (3σ)²

11
The general principles of PTS procedure
  • Delete an observation if it causes a significant
    reduction in the sum of squared errors.
  • Penalize the loss function with a deleting penalty
    for removing (xi, yi)

(3σ)² is considered a significant reduction in
the loss function
12
The PTS robust loss function
  • PTS can be defined equivalently by solving a
    penalized least-squares problem, with a constant
    for penalizing large adjusted residuals
  • In the final solution (the clean data subset), all
    errors are bounded by the deletion threshold


13
Performance of PTS estimator
  • Given parameters:
  • a robust error-scale estimate of σ (from LTS)
  • the deleting penalty
  • For data with contamination
  • of y-outliers
  • of a few x-outliers (high-leverage outliers)
  • the performance of PTS is successful

PTS cannot handle groups of x-outliers, i.e. groups
of high-leverage outliers, in the data
14
Masking Problem: a group of high-leverage points in
the same direction
  • When a masked leverage point is deleted, the
    reduction in the sum of squared residuals may be
    too small.

15
Reduction of penalties for masked high-leverage
points
  • To overcome the masking problem, reduce the
    deleting penalty for the masked leverage points
  • For the choice of weights we use information
    from:
  • the initial leverage of each data point (xi, yi)
  • the clean leverage of each point (xi, yi) as it
    joins the clean data subset, via MCD (Rousseeuw and
    Van Driessen, 1999)
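
As background, the leverage quantities the weights draw on are the diagonal of the hat matrix; a small sketch follows (the MCD-based "clean" leverages and the weighting formula itself are not reproduced here):

    import numpy as np

    def leverages(X):
        """Diagonal of the hat matrix H = X (X'X)^{-1} X'."""
        H = X @ np.linalg.solve(X.T @ X, X.T)
        return np.diag(H)

    # h_initial: leverages computed on the full data set.
    # h_clean:   leverages recomputed as each point joins a clean
    #            (e.g. MCD-selected) subset -- not shown here.
    # The penalty weights wi are then built from these two leverages.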

16
Our proposal for penalty weights for unmasking
outliers
  • In a robust estimate we wish, for every data point
    (xi, yi), a bounded influence on the fit
  • This can be done by down-weighting the penalty
  • since applying the down-weighted penalty to
    the initial data set gives the desired bound

17
Our proposal for deleting penalties in PTS
  • For each data point (xi, yi),
  • given the initial and final leverages,
  • down-weight the deleting penalty
  • Finally, for each data point (xi, yi) the
    deleting penalty is 3σwi

18
The QMIP formulation for PTS (a convex formulation)
Penalty term for deleting potential outliers
  • The QMIP formulation is convex
  • A unique global optimum solution exists
  • A PTS estimate is obtained
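
The program itself did not survive in the transcript; the following is a sketch of one common way to cast PTS as a mixed-integer quadratic program with binary deletion indicators and a big-M constraint (my own illustration, not necessarily the authors' exact formulation; cvxpy needs a MIQP-capable solver to run it).

    import cvxpy as cp

    def pts_qmip(X, y, penalties, M=1e3):
        """Big-M MIQP sketch for PTS: z_i = 1 deletes observation i."""
        n, p = X.shape
        beta = cp.Variable(p)
        e = cp.Variable(n, nonneg=True)   # residual magnitude of kept points
        z = cp.Variable(n, boolean=True)  # deletion indicators
        r = y - X @ beta
        cons = [r <= e + M * z, -r <= e + M * z]
        obj = cp.Minimize(cp.sum_squares(e) + penalties @ z)
        cp.Problem(obj, cons).solve()     # requires a mixed-integer QP solver
        return beta.value, z.value

    # If z_i = 0 the constraint forces e_i >= |r_i|, so the point contributes
    # its squared residual; if z_i = 1 the constraint is relaxed and only the
    # deleting penalty is paid.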

19
Computation load of PTS
  • The exact computation of PTS is difficult.
  • The exact algorithm for PTS is a combinatorial one.
  • The exact algorithm is suitable only for small data
    sets (small n) because of the computation load.
  • Faster probabilistic algorithms could be
    developed for larger samples.

In the present work we consider an alternative
approach to PTS, bringing in ideas from Support
Vector Regression
20
Support Vector Regression
  • An alternative robust procedure against noisy data
  • Reduces the complexity of the regression model
  • by reducing the magnitude of the parameters.
  • Attempts to fit a tube of radius ε to the data
    set,
  • ignoring (tolerating) small errors u
  • Gains sparseness and even better robustness and
    efficiency.
  • Christmann A. and Steinwart I. (2004)
  • Vapnik, V. N. (1998)
  • Smola A. and Schölkopf B. (1998, 2000)

21
Vapnik's ε-insensitive loss function
  • Vapnik (1998) devised the so-called ε-insensitive
    loss function
  • Small errors (below some ε > 0) are not penalized
    in the loss function
  • Consider a tube with radius ε to fit the data
  • and measure the stochastic error u outside the
    tube
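
For reference, the ε-insensitive loss is a one-liner (a generic illustration, not code from the talk):

    import numpy as np

    def eps_insensitive(u, eps):
        """Vapnik's loss: zero inside the tube |u| <= eps, linear outside."""
        return np.maximum(0.0, np.abs(u) - eps)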

22
Vapnik's ε-insensitive regression
  • Only the points outside the tube enter the
    stochastic term.
  • Ignore the errors of those points which are close.
  • Points close to the actual regression have zero loss.

23
Our ε-insensitive loss function (squared form)
  • Our adaptation of the ε-insensitive loss function
  • The regression error u contributes in a quadratic
    fashion, instead of linearly.
  • Allow an interval of size ε with uniform error.
  • Errors below some ε (|u| < ε) contribute a uniform
    amount instead of zero.

Consider a tube to fit the data and measure the
error u from the axis of the tube.
24
Our ε-insensitive regression
It is attempted to fit a tube to the data.
  • Only the points outside the tube enter the
    stochastic term.
  • For points close to the actual regression the error
    is measured as u = ε

25
Quadratic Program for ε-Insensitive Regression
Tolerance
  • The tolerance ε should be as large as possible,
    while preserving accuracy

Under Gaussian conditions good efficiency can
be obtained for ε ≈ 0.612 σ (B. Schölkopf and
A. Smola, 2002).
26
ε-Insensitive PTS loss function (IPTS)
  • Brings together
  • the ε-insensitive loss function
  • and Penalized Trimmed Squares (PTS)
  • From our empirical results, ε ≈ 0.8σ was a good
    choice for faster computation and efficiency.

27
ε-Insensitive PTS regression: the IPTS QMIP program
Penalty term for deleting outliers
The tolerance enters as a constraint
  • The IPTS formulation is convex
  • An IPTS estimate can be obtained.
  • The tolerance yields sparseness
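
Again the program itself is not in the transcript; here is a speculative variation on the earlier pts_qmip sketch, assuming the tolerance simply widens the big-M constraints so that residuals inside the ε-tube cost nothing.

    import cvxpy as cp

    def ipts_qmip(X, y, penalties, eps, M=1e3):
        """Speculative sketch: the PTS MIQP with an epsilon tolerance."""
        n, p = X.shape
        beta = cp.Variable(p)
        e = cp.Variable(n, nonneg=True)   # error in excess of the tolerance
        z = cp.Variable(n, boolean=True)  # deletion indicators
        r = y - X @ beta
        cons = [r <= eps + e + M * z, -r <= eps + e + M * z]
        cp.Problem(cp.Minimize(cp.sum_squares(e) + penalties @ z), cons).solve()
        return beta.value, z.value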

28
ε-Insensitive PTS regression
[Figure: ε-tube around the fitted regression line]
  • Appropriate emphasis on medium residuals (the risk
    part)
  • De-emphasize
  • small errors u,
  • big errors u.

29
Numerical Testing
  • Compare our methods PTS and IPTS with the LTS and
    MM methods for
  • robustness and efficiency
  • Datasets
  • contaminated artificial data, 50 points in R2
  • Hawkins dataset, 75 points in R3
  • contaminated artificial data, 500 points in R2
  • Solutions obtained using
  • S-PLUS 6.1 for LTS, MM
  • Fort/QMIP for the PTS and IPTS formulations
  • Hardware: all experiments were run on a 1200 MHz
    AMD Athlon.

30
Experimental Results: good leverage points = 6,
number of observations n = 50
The ε-Insensitive PTS approach improves the
performance
31
Experimental Results: x-outliers = 6, good
leverage points = 4, y-outliers = 6, number of
observations n = 50
The ε-insensitive PTS approach improves the
performance
32
Experimental Results: bad high-leverage points = 6,
number of observations n = 50
The ε-Insensitive PTS approach improves the
performance
33
IPTS regression on a large data set
[Figure: IPTS fit on a large data set]
  • Increasing the radius ε
  • saves computation time,
  • with reasonable efficiency.

34
ε-Insensitive PTS procedure for detecting outliers
  • PHASE A
  • Step 1. Obtain robust σ and hi (from LTS, MCD)
  • Step 2. Calculate the penalties 3σwi and choose
    ε
  • Step 3. Solve the QMIP of the ε-Insensitive PTS
    formulation
  • Step 4. Find the basic clean subset of size k after
    removing the n-k deleted points in the QMIP
    solution
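
Read as pseudocode, Phase A might be organized as below. This is only a schematic outline: the robust scale and the leverage weights are crude stand-ins, the squaring of the 3σwi penalty is my assumption about its scale, and ipts_qmip refers to the earlier sketch.

    import numpy as np

    def phase_a(X, y):
        """Schematic Phase A with stand-in robust-scale and weight steps."""
        # Step 1: robust scale and leverage weights (stand-ins for LTS / MCD).
        beta0, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta0
        sigma = 1.4826 * np.median(np.abs(resid - np.median(resid)))  # MAD scale
        w = np.ones(len(y))              # placeholder for leverage-based weights
        # Step 2: deleting penalties from 3*sigma*w_i (squared scale assumed),
        # and the choice of epsilon.
        penalties = (3 * sigma * w) ** 2
        eps = 0.8 * sigma
        # Step 3: solve the epsilon-insensitive PTS QMIP (sketched earlier).
        beta, z = ipts_qmip(X, y, penalties, eps)
        # Step 4: the clean subset = points not deleted in the QMIP solution.
        clean = np.flatnonzero(z < 0.5)
        return beta, clean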

35
Re-inclusion of good leverage points
  • PHASE B
  • Step 5. Apply OLS to the clean subset of size k to
    obtain estimates βk, σk
  • Step 6. Compute the scaled prediction errors
    based on the clean subset k.
  • Step 7. Re-include (xi, yi) into the
    clean data subset if its scaled prediction error
    is small
  • Step 8. Compute the OLS estimate from the new
    data subset
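
A schematic of Phase B under the same caveats; the re-inclusion cutoff below is an assumed value and the leverage adjustment of the prediction error is omitted for brevity.

    import numpy as np

    def phase_b(X, y, clean, cutoff=3.0):
        """Schematic Phase B; the cutoff is an assumed value."""
        # Step 5: OLS on the clean subset, with its own scale estimate.
        beta_k, *_ = np.linalg.lstsq(X[clean], y[clean], rcond=None)
        sigma_k = np.std(y[clean] - X[clean] @ beta_k, ddof=X.shape[1])
        # Step 6: scaled prediction errors of all points against the clean fit.
        scaled = np.abs(y - X @ beta_k) / sigma_k
        # Step 7: re-include points (e.g. good leverage points) with small error.
        new_subset = np.flatnonzero(scaled <= cutoff)
        # Step 8: final OLS on the enlarged subset.
        beta_final, *_ = np.linalg.lstsq(X[new_subset], y[new_subset], rcond=None)
        return beta_final, new_subset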

36
Experimental Results: heavy contamination, 10
high-leverage outliers, 6 y-outliers, number of
observations n = 50
The ε-Insensitive PTS approach improves the
performance
37
Experimental Results: x-outliers = 6, good
leverage points = 4, y-outliers = 6, number of
observations n = 50
The ε-Insensitive PTS approach improves
efficiency and computation time
38
Comparison Results: large artificial data set, 500
points in R2, including 120 outliers
The ε-Insensitive PTS approach significantly
reduces the computation time
39
Comparison Results: Hawkins, Bradu and Kass
artificial data, 75 points in R3, including 10
bad x-outliers
The IPTS procedure significantly improves the
computation time
40
Conclusions and Future Work
  • Conclusions
  • The PTS estimator based on MCD results can be
    used successfully for unmasking regression
    outliers.
  • The ε-Insensitive PTS approach has improved
  • the robustness significantly, without sacrificing
    much efficiency.
  • the computation load significantly.
  • Support Vector Machines with PTS are a new
    approach for detecting outliers in large data
    sets.
  • Future Work
  • Develop statistical models for the penalties and
    the insensitivity parameter ε

41
End
Thank you very much
42
Huber-type robust procedure
[Figure: outlier with initial error u, pulling
distance σ, and down-weighted error u]
  • The robust procedure pulls the outlier towards the
    regression line

43
The over-fitting problem
  • Close points may be wrong due to noise only
  • The line should be influenced by the real data, not
    the noise
  • (Mangasarian and Musicant, 2000)
  • Ignore errors from those points which are close!