Comparative analysis of best regression subset search methods

1
Comparative analysis of best regression subset
search methods
2
Goal of research
  • To carry out a comparative analysis of the efficiency of best
    linear regression subset search methods.
  • To develop a modification of the best linear regression subset
    search methods.

3
Description of the modeling problem
  • F - the set of models
  • CR(f) - quality criterion for model f
  • m - total number of arguments in a model
  • k - maximal number of arguments in a model
  • Task: to find the model f(k) such that
    f(k) = arg min CR(f), f ∈ F

4
Quality Criteria for Models
  • Residual sum of squares (RSS)
  • Akaike criterion (AIC)
  • Mallows' criterion (Cp)
  • An important feature of the Akaike and Mallows criteria: when the
    complexity of the model is fixed, they attain their minimum on the
    same model as RSS, since for a fixed number of arguments both are
    monotone functions of RSS alone.
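As an illustration, here is a minimal Python sketch of these criteria. The slides give no formulas, so the standard textbook forms are assumed: AIC for a Gaussian linear model (up to an additive constant) and Mallows' Cp with the error variance taken from the full model.

```python
import numpy as np

def rss(y, X, coef):
    """Residual sum of squares of the linear model y ~ X @ coef."""
    resid = y - X @ coef
    return float(resid @ resid)

def aic(rss_value, n, p):
    """Akaike criterion for a Gaussian linear model with p arguments,
    up to an additive constant: n * log(RSS / n) + 2 * p."""
    return n * np.log(rss_value / n) + 2 * p

def mallows_cp(rss_value, sigma2, n, p):
    """Mallows' Cp: RSS / sigma^2 - n + 2 * p, where sigma^2 is
    usually estimated from the full model."""
    return rss_value / sigma2 - n + 2 * p
```

For a fixed number of arguments p, both `aic` and `mallows_cp` are increasing functions of the RSS alone, which is exactly why all three criteria select the same model once the complexity is fixed.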

5
Investigated Methods
  • Full Check method
  • La Motte-Hocking method
  • Stepwise regression

6
Full Check Method
  • Description: step-by-step checking of all models.
  • Number of structures: C(m,1) + C(m,2) + ... + C(m,k), i.e. every
    model with at most k of the m arguments.
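The full check can be sketched directly with `itertools.combinations`; this is an illustrative NumPy implementation, not the code used in the study:

```python
from itertools import combinations
import numpy as np

def full_check(X, y, k):
    """Exhaustive best-subset search: fit every subset of at most k
    columns of X by least squares and keep, for each complexity level,
    the subset with the smallest RSS."""
    m = X.shape[1]
    best = {}  # complexity -> (rss, subset)
    for size in range(1, k + 1):
        for subset in combinations(range(m), size):
            cols = list(subset)
            coef, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
            r = float(np.sum((y - X[:, cols] @ coef) ** 2))
            if size not in best or r < best[size][0]:
                best[size] = (r, subset)
    return best
```

The number of fitted models grows combinatorially in m, which is why the method stops being practical beyond roughly 20-25 arguments.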

7
La Motte-Hocking Method
  • Method basis: RSS increases when additional arguments are excluded
    from a model.
  • Method description:
  • All models of complexity m-s (s-models) are built and estimated,
    then ordered by their RSS values.
  • From the best model of complexity m-s, (m-k-s) further regressors
    are excluded: all models obtained by excluding the same s
    regressors as in that best model plus (m-k-s) additional ones are
    built, and the best model among them is found.
  • If the best model of complexity k has a smaller RSS than the next
    s-model, the search is finished; otherwise steps 2-3 are repeated
    for the next s-model.
  • Number of structures: the total number of checked structures is the
    number of models of complexity (m-s) plus the number of models of
    complexity k in each group that corresponds to a best s-model.
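A rough sketch of the scheme above, under the assumption that each group (the k-models nested in one s-model) is refined exhaustively; the function names are illustrative and this is not the authors' implementation:

```python
from itertools import combinations
import numpy as np

def rss_of(X, y, cols):
    """Least-squares RSS of the model built on the given columns of X."""
    cols = list(cols)
    coef, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
    return float(np.sum((y - X[:, cols] @ coef) ** 2))

def la_motte_hocking_sketch(X, y, k, s=1):
    """Illustrative version of the scheme above: order the s-models
    (complexity m - s) by RSS, exhaustively refine each one down to
    complexity k, and stop once the best k-model found beats the next
    unexplored s-model.  The stop is valid because excluding regressors
    can only increase RSS, so every k-model nested in a later s-model
    is at least as bad as that s-model."""
    m = X.shape[1]
    s_models = sorted(combinations(range(m), m - s),
                      key=lambda cols: rss_of(X, y, cols))
    best_rss, best_cols = np.inf, None
    for i, cols in enumerate(s_models):
        for sub in combinations(cols, k):  # the group of this s-model
            r = rss_of(X, y, sub)
            if r < best_rss:
                best_rss, best_cols = r, sub
        if i + 1 < len(s_models) and best_rss <= rss_of(X, y, s_models[i + 1]):
            break  # no remaining group can contain a better k-model
    return best_rss, best_cols
```

Because every k-subset is contained in some (m-s)-subset, the search is exact: it returns the same model as the full check while usually skipping most groups.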

8
Stepwise Regression
  • Method description: at each step, the element whose exclusion least
    increases the quality criterion is removed from the model; then the
    element whose inclusion most decreases the criterion is added. The
    algorithm stops when the change of the quality criterion is not
    significant, or when the required complexity is reached.
  • Number of structures: not larger than m·k (for the
    backward/forward methods).
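The backward variant of this procedure can be sketched as follows (a greedy illustration that uses RSS as the quality criterion and a fixed target complexity k rather than a significance-based stop):

```python
import numpy as np

def rss_of(X, y, cols):
    """Least-squares RSS of the model built on the given columns of X."""
    coef, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
    return float(np.sum((y - X[:, cols] @ coef) ** 2))

def backward_stepwise(X, y, k):
    """Greedy backward elimination: starting from the full model, drop
    the regressor whose removal increases the RSS the least until only
    k regressors remain.  Far fewer models are checked than in the full
    check, but the result may miss the true best subset."""
    cols = list(range(X.shape[1]))
    while len(cols) > k:
        # candidate RSS after dropping each remaining regressor
        drops = [(rss_of(X, y, [c for c in cols if c != d]), d) for d in cols]
        cols.remove(min(drops)[1])
    return cols, rss_of(X, y, cols)
```

Each pass fits one model per remaining regressor, which is what keeps the total number of checked structures small compared with exhaustive search.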

9
Comparison of best regression subset search methods
  • Full check (combinatorial algorithm)
  • Result corresponds to the best model over the whole set of models
  • Simple algorithm
  • Doesn't work for complexities larger than 20-25
  • Too many calculations
  • Algorithms that exclude part of the structures from checking
    (La Motte-Hocking, Furnival)
  • Result is the same as for the full check method
  • Far fewer models are checked
  • Many preparatory calculations
  • Complex algorithms, hard to modify
  • Stepwise methods (stepwise regression, backward regression,
    forward regression)
  • Small amount of calculations
  • Simple algorithm
  • Result does not always correspond to the result of the full check
    for the same input data.

10
Modification of the La Motte-Hocking Method
  • Decreasing the number of k-structures to be built
  • This approach was proposed by the authors of the method. Main
    disadvantage: the search may still have to be performed in all
    groups.
  • Decreasing the number of structures that are checked in each group
  • Stepwise methods are used for the search within a group. Main
    disadvantage: stepwise methods aren't accurate.

11
Comparative Analysis Methodology
  • Compared factors:
  • running time
  • number of checked models
  • accuracy of the result
  • Input data for the research:
  • Sample length: 25
  • Total number of arguments in the model: 20
  • Noise: 10
  • For the La Motte-Hocking algorithm: k = 3

12
Number of checked structures
  • Comparative graph of the number of checked structures for search at
    fixed complexity levels.
  • The standard and modified La Motte-Hocking algorithms are better
    only for complexities near m/2.
  • Stepwise regression is better for almost all complexity levels, but
    its results are usually worse.
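The m/2 peak is a direct consequence of the binomial coefficient C(m, k) being largest near k = m/2. For the slides' setting of m = 20 the counts can be checked with Python's `math.comb`:

```python
from math import comb

m = 20                      # total number of arguments (slide 11)
for k in (3, 10, 17):
    exact = comb(m, k)      # models of exactly complexity k
    stepwise = m * k        # slide 8's bound for backward/forward search
    print(k, exact, stepwise)
```

The exhaustive count is symmetric and peaks sharply at k = 10, while the stepwise bound stays linear in k, which matches the crossover pattern described above.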

13
Number of checked structures
Comparative graph of the number of checked structures for the full
check. The standard and modified La Motte-Hocking methods decrease the
number of structures; the modified method is better only for large
complexities.
14
Comparison of running time of the algorithms
Comparative graph of running time for search at fixed complexity
levels. Comparative graph of running time for the full check. The
running-time graphs are proportional to the graphs of the number of
checked structures, so the additional calculations don't slow the
methods down. The modified La Motte-Hocking method is slow; its
calculations should be optimized.
15
Accuracy of results
Graph of the accuracy of results for the different methods; the search
was performed at fixed complexity levels. The result of the
La Motte-Hocking method is the same as the result of the full check.
The result of the modified La Motte-Hocking method may be inaccurate
for small complexities. Stepwise regression is not an accurate method.
16
Number of checked models for different values of the parameter k
La Motte-Hocking algorithm; model complexity: 10, total number of
arguments: 20.
17
Comparison of efficiency of methods
18
Results
  • Results of the comparative analysis of best linear subset search
    methods are obtained
  • A modification of the La Motte-Hocking method is implemented and
    investigated
  • Recommendations for improving the La Motte-Hocking method are
    developed