Title: Comparative analysis of best regression subset search methods
1 Comparative analysis of best regression subset search methods
2 Goal of research
- To compare the efficiency of methods that search for the best linear regression subset.
- To develop a modification of these methods.
3 Description of the modeling problem
- F: set of models
- CR(f): quality criterion for model f
- m: total number of arguments
- k: maximal number of arguments in a model
- Task: to find the model f*(k) such that
  $f^{*} = \arg\min_{f \in F} CR(f)$
4 Quality criteria for models
- Residual sum of squares (RSS)
- Akaike criterion
- Mallows criterion
- An important feature of the Akaike and Mallows criteria: when the complexity of the model is fixed, both reach their minimum on the same model.
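The last point can be checked numerically. The sketch below assumes the commonly used forms AIC = n ln(RSS/n) + 2p and Cp = RSS/sigma2 - n + 2p; the slides do not state which variants were used, and the RSS values, the model names and `sigma2` are hypothetical. With n and p fixed, both expressions grow monotonically with RSS, so all three criteria pick the same model.

```python
import math

def aic(rss, n, p):
    """Akaike criterion for a Gaussian linear model, up to an additive constant (assumed form)."""
    return n * math.log(rss / n) + 2 * p

def mallows_cp(rss, n, p, sigma2):
    """Mallows criterion (assumed form); sigma2 is an estimate of the noise variance."""
    return rss / sigma2 - n + 2 * p

# Hypothetical RSS values of three models that share the same complexity p = 3.
n, p, sigma2 = 25, 3, 1.5
rss_by_model = {"x1,x2,x3": 41.0, "x1,x2,x4": 37.5, "x2,x3,x4": 52.3}

# With n and p fixed, each criterion is an increasing function of RSS,
# so the three searches below must agree.
best_by_rss = min(rss_by_model, key=rss_by_model.get)
best_by_aic = min(rss_by_model, key=lambda name: aic(rss_by_model[name], n, p))
best_by_cp = min(rss_by_model, key=lambda name: mallows_cp(rss_by_model[name], n, p, sigma2))
print(best_by_rss, best_by_aic, best_by_cp)
```

This is why a search at a fixed complexity level can rank models by RSS alone and apply the Akaike or Mallows criterion only when models of different complexity are compared.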
5 Investigated Methods
- Full Check method
- La Motte-Hocking method
- Stepwise regression
6 Full Check Method
- Description: step-by-step checking of all models.
- Number of structures
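A minimal sketch of the full check at one fixed complexity level, in plain Python so that it runs without third-party packages. For m candidate regressors there are C(m, k) models of complexity k and 2^m - 1 non-empty models in total. The helper `rss`, the function `full_check`, the absence of an intercept term and the synthetic data are assumptions made for illustration only.

```python
import itertools
import math
import random

def rss(X, y, cols):
    """Residual sum of squares of the least-squares fit of y on the chosen columns (no intercept)."""
    p = len(cols)
    # Augmented normal equations [A^T A | A^T y] restricted to the chosen columns.
    M = [[sum(row[i] * row[j] for row in X) for j in cols]
         + [sum(row[i] * t for row, t in zip(X, y))] for i in cols]
    for c in range(p):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(p):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    beta = [M[i][p] / M[i][i] for i in range(p)]
    return sum((t - sum(b * row[j] for b, j in zip(beta, cols))) ** 2
               for row, t in zip(X, y))

def full_check(X, y, k):
    """Fit every model with exactly k regressors and return the one with the smallest RSS."""
    m = len(X[0])
    best_rss, best_cols, checked = float("inf"), None, 0
    for cols in itertools.combinations(range(m), k):
        checked += 1
        r = rss(X, y, cols)
        if r < best_rss:
            best_rss, best_cols = r, cols
    return best_cols, best_rss, checked

# Hypothetical data: 15 observations, 6 candidate regressors, true model uses columns 0 and 3.
rng = random.Random(0)
n, m = 15, 6
X = [[rng.gauss(0, 1) for _ in range(m)] for _ in range(n)]
y = [2 * row[0] - 3 * row[3] + rng.gauss(0, 0.01) for row in X]

best_cols, best_rss, checked = full_check(X, y, k=2)
print(best_cols, checked, math.comb(m, 2), 2 ** m - 1)
```

The count grows exponentially with m, which is the practical limit of the method.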
7 La Motte-Hocking Method
- Method basis: RSS cannot decrease when additional arguments are excluded from the model.
- Method description:
  1. All models of complexity m-s (s-models, with s regressors excluded) are built and estimated. Then they are ordered by their RSS values.
  2. A further (m-k-s) regressors are excluded from the best s-model. All such models are enumerated (the same s regressors are excluded as in the best s-model, plus (m-k-s) more), and the best model among them is found.
  3. If the best model of complexity k has a smaller RSS than the next s-model, the search is finished. Otherwise steps 2 and 3 are repeated for the next s-model.
- Number of structures: total number of checked structures = number of models of complexity 1 + number of models of complexity (m-s) + number of models of complexity k in the group that corresponds to the best s-model.
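The steps above can be sketched as follows. This is an illustrative reading of the slide text, not the authors' implementation: the names `rss` and `lamotte_hocking`, the no-intercept model, the synthetic data and the way checked structures are counted (every fitted s-model and every fitted k-model, with no caching between groups) are all assumptions.

```python
import itertools
import random

def rss(X, y, cols):
    """Residual sum of squares of the least-squares fit of y on the chosen columns (no intercept)."""
    p = len(cols)
    M = [[sum(row[i] * row[j] for row in X) for j in cols]
         + [sum(row[i] * t for row, t in zip(X, y))] for i in cols]
    for c in range(p):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(p):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    beta = [M[i][p] / M[i][i] for i in range(p)]
    return sum((t - sum(b * row[j] for b, j in zip(beta, cols))) ** 2
               for row, t in zip(X, y))

def lamotte_hocking(X, y, k, s):
    """Best model of complexity k, found through s-models as described in the steps above."""
    m = len(X[0])
    assert 0 <= s <= m - k
    # Step 1: fit all s-models (m - s regressors kept) and order them by RSS.
    s_models = sorted((rss(X, y, kept), kept)
                      for kept in itertools.combinations(range(m), m - s))
    checked = len(s_models)
    best_rss, best_cols = float("inf"), None
    for idx, (_, kept) in enumerate(s_models):
        # Step 2: search the group of k-models nested in the current s-model.
        for cols in itertools.combinations(kept, k):
            checked += 1
            r = rss(X, y, cols)
            if r < best_rss:
                best_rss, best_cols = r, cols
        # Step 3: no k-model in a later group can have a smaller RSS than its
        # own s-model, so the search stops once the next s-model is no better.
        next_rss = s_models[idx + 1][0] if idx + 1 < len(s_models) else float("inf")
        if best_rss <= next_rss:
            break
    return best_cols, best_rss, checked

# Hypothetical data: 15 observations, 6 candidate regressors, true model uses columns 0 and 3.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(15)]
y = [2 * row[0] - 3 * row[3] + rng.gauss(0, 0.01) for row in X]
print(lamotte_hocking(X, y, k=2, s=2))
```

On a small problem like this one the sketch can fit more structures than a full check would, because several s-models contain the true regressors and their groups overlap; the saving depends on m, k and s.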
8 Stepwise Regression
- Method description: at each step, the element whose exclusion gives the smallest increase of the quality criterion is excluded from the model. Then the element whose inclusion gives the largest decrease of the quality criterion is included in the model. The algorithm stops when the change of the quality criterion is not significant or when the required complexity is reached.
- Number of structures: not larger than mk (for backward/forward methods)
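As an illustration of the stepwise family, the sketch below implements plain forward selection only, the simplest variant: it omits the exclusion step and the significance test and stops when the required complexity k is reached. RSS is used as the quality criterion. The names `rss` and `forward_selection`, the no-intercept model and the synthetic data are assumptions.

```python
import random

def rss(X, y, cols):
    """Residual sum of squares of the least-squares fit of y on the chosen columns (no intercept)."""
    p = len(cols)
    M = [[sum(row[i] * row[j] for row in X) for j in cols]
         + [sum(row[i] * t for row, t in zip(X, y))] for i in cols]
    for c in range(p):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(p):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    beta = [M[i][p] / M[i][i] for i in range(p)]
    return sum((t - sum(b * row[j] for b, j in zip(beta, cols))) ** 2
               for row, t in zip(X, y))

def forward_selection(X, y, k):
    """Greedy forward selection: add the regressor that lowers RSS the most until k are chosen."""
    assert k >= 1
    m = len(X[0])
    selected, checked = [], 0
    while len(selected) < k:
        candidates = [j for j in range(m) if j not in selected]
        # Fit one model per remaining candidate and keep the best of them.
        scores = [(rss(X, y, selected + [j]), j) for j in candidates]
        checked += len(scores)
        best_rss, best_j = min(scores)
        selected.append(best_j)
    return tuple(sorted(selected)), best_rss, checked

# Hypothetical data: 15 observations, 6 candidate regressors, true model uses columns 0 and 3.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(15)]
y = [2 * row[0] - 3 * row[3] + rng.gauss(0, 0.01) for row in X]
cols, value, checked = forward_selection(X, y, k=2)
print(cols, value, checked)
```

Forward selection fits m + (m-1) + ... + (m-k+1) models, which never exceeds mk. Because each choice is final, the returned model is not guaranteed to match the full-check optimum.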
9 Comparison of best regression subset search methods
- Full check (combinatorial algorithm)
  - The result is the best model over the whole set of models
  - Simple algorithm
  - Does not work for complexities above 20-25
  - Too many calculations
- Algorithms that exclude part of the structures from the models to be checked (La Motte-Hocking, Furnival)
  - The result is the same as for the full check method
  - Far fewer models are checked
  - Many preparatory calculations
  - Complex algorithms, hard to modify
- Stepwise methods (stepwise regression, backward regression, forward regression)
  - Small amount of calculations
  - Simple algorithm
  - The result does not always match the result of the full check for the same input data
10 Modification of La Motte-Hocking method
- Decreasing the number of k-structures to be built
  - This approach was proposed by the authors of the method. Main disadvantage: the search may still have to be performed in all groups.
- Decreasing the number of structures that are checked in each group
  - Stepwise methods are used for the search within a group. Disadvantage: stepwise methods are not accurate.
11 Comparative Analysis Methodology
- Compared factors
  - running time
  - number of checked models
  - accuracy of result
- Input data for research
  - Sample length: 25
  - Total number of arguments in the model: 20
  - Noise: 10
  - For the La Motte-Hocking algorithm: k = 3
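A test sample of this size could be generated along the following lines. The slides give only the dimensions, so everything else here is an assumption: the function name `make_sample`, Gaussian regressors, the hypothetical true coefficients, and the reading of "Noise 10" as noise with a standard deviation equal to 10% of the standard deviation of the signal.

```python
import random

def make_sample(n=25, m=20, noise_ratio=0.10, true_coefs=None, seed=0):
    """Generate n observations of m regressors and a noisy linear response."""
    rng = random.Random(seed)
    if true_coefs is None:
        # Hypothetical true model: only three of the m arguments matter.
        true_coefs = {0: 1.0, 1: -2.0, 2: 0.5}
    X = [[rng.gauss(0, 1) for _ in range(m)] for _ in range(n)]
    signal = [sum(c * row[j] for j, c in true_coefs.items()) for row in X]
    mean = sum(signal) / n
    sd = (sum((v - mean) ** 2 for v in signal) / n) ** 0.5
    # Assumed noise model: Gaussian, scaled relative to the spread of the signal.
    y = [v + rng.gauss(0, noise_ratio * sd) for v in signal]
    return X, y

X, y = make_sample()
print(len(X), len(X[0]), len(y))
```

Fixing the seed keeps the comparison repeatable, so every method is run on the same data.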
12 Number of checked structures
- Comparative graph of the number of checked structures for search at fixed complexity levels.
- Standard and modified La Motte-Hocking algorithms are better only for complexities close to m/2.
- Stepwise regression is better for almost all levels of complexity, but its results are usually worse.
13 Number of checked structures
- Comparative graph of the number of checked structures for full check.
- Standard and modified La Motte-Hocking methods decrease the number of structures.
- The modified method is better only for large complexities.
14 Comparison of running time of algorithms
- Comparative graph of running time for search at fixed levels of complexity.
- Comparative graph of running time for full check.
- The running-time graphs are proportional to the graphs of the number of checked structures, so the additional calculations do not slow the methods down.
- The modified La Motte-Hocking method is slow; its calculations should be optimized.
15 Accuracy of results
- Graph of the accuracy of results for different methods. The search was performed at fixed levels of complexity.
- The result of the La Motte-Hocking method is the same as the result of the full check.
- The result of the modified La Motte-Hocking method may be inaccurate for small complexities.
- Stepwise regression is not an accurate method.
16 Number of checked models for different values of parameter k
- La Motte-Hocking algorithm. Complexity of model: 10; total number of arguments: 20.
17 Comparison of efficiency of methods
18 Results
- Results of the comparative analysis of best linear regression subset search methods were obtained
- A modification of the La Motte-Hocking method was implemented and investigated
- Recommendations for improving the La Motte-Hocking method were developed