Variable Selection and Enterprise Miner - PowerPoint PPT Presentation

Slides: 11
Provided by: SusanW155

1
Section 4.1
  • Variable Selection and Enterprise Miner

2
Objectives
  • Discuss the need for variable selection.
  • Explain the methods of variable selection
    available in Enterprise Miner.
  • Demonstrate the use of different variable
    selection methods.

3
The Curse of Dimensionality
  • As the number of input variables to a model
    increases, there is an exponential increase in
    the data required to densely populate the model
    space.
  • If all input variables are used to populate the
    model space, it is hard to fit a model to (real)
    noisy data.
  • Furthermore, some variables may be redundant and
    others irrelevant.
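
The exponential growth described above can be made concrete with a small calculation. This is only an illustrative sketch: it assumes that "densely populated" means a fixed number of sample points along each input axis, and the function name points_needed is made up for this example.

```python
def points_needed(points_per_axis, n_inputs):
    """Grid points required to keep the same density in every dimension.

    Assumption for illustration: density is measured as a fixed number
    of points along each input axis, so the total is a simple power.
    """
    return points_per_axis ** n_inputs


# Ten points per axis: each added input multiplies the requirement by ten.
for n_inputs in (1, 2, 3, 10):
    print(n_inputs, points_needed(10, n_inputs))
```

With ten points per axis, one input needs 10 points, two inputs need 100, and three need 1,000, which is the 1D, 2D, 3D progression.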

4
The Curse of Dimensionality
  [Figure: panels labeled 1D, 2D, and 3D]

5
Methods of Variable Selection
  • Stepwise Regression
  • Decision Trees
  • Variable Selection Node

6
Stepwise Regression
  • Uses multiple regression p-values to eliminate
    variables.
  • Weakness: may not perform well with many
    potential input variables.

7
Decision Trees
  • Calculate the measure of variable importance.
  • Variables with importance less than 0.05 are
    rejected from the model.
  • Retain only the variables important in growing
    the tree for further modeling.
  • Grow a large tree: a bushier tree is often more
    useful in variable selection.
  • Severe pruning often results in too few variables
    being selected.
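
The importance cutoff above amounts to a simple filter. The sketch below assumes importance scores have already been computed by a tree; the variable names and score values are hypothetical and exist only to show the rule, and the function is not part of Enterprise Miner.

```python
def select_by_importance(importance, threshold=0.05):
    """Split variables into kept and rejected lists by importance score.

    A variable is rejected when its importance is below the threshold
    (0.05 is the cutoff quoted on the slide).
    """
    kept = [name for name, score in importance.items() if score >= threshold]
    rejected = [name for name, score in importance.items() if score < threshold]
    return kept, rejected


# Hypothetical importance scores, made up for this illustration.
scores = {"income": 1.00, "age": 0.42, "region": 0.07, "zip_digit": 0.01}
kept, rejected = select_by_importance(scores)
print("kept:", kept)
print("rejected:", rejected)
```

Only the kept variables would be passed on to later modeling nodes.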

8
Variable Selection Node
  • Selection is based on one of two criteria.
  • a) R-square (for binary and non-binary targets)
  • Computes the squared correlation with the target
    for each variable. Variables with a value less
    than the specified criterion (default 0.005) are
    rejected.
  • The remaining variables are evaluated using
    forward stepwise R-square regression. Variables
    with less than the threshold criterion (default
    0.05) are rejected.
  • For binary targets, logistic regression is
    performed using the predicted value from the
    forward stepwise regression.
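
The two-stage R-square procedure above can be sketched in plain Python. This is a minimal illustration, not the Variable Selection node's actual implementation. It assumes numeric inputs only, reads the second threshold as the minimum R-square improvement a variable must add, uses the defaults quoted on the slide, and omits the final logistic regression step. The function name r_square_selection and the demo data are made up for this example.

```python
import math


def center(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def r_square_selection(inputs, target, min_r2=0.005, stop_r2=0.05):
    """Return the names of the selected inputs, in order of entry.

    inputs: dict mapping variable name to a list of numbers.
    min_r2 and stop_r2 are the two defaults quoted on the slide.
    """
    y = center(target)
    sst = dot(y, y)
    if sst == 0:
        return []  # a constant target cannot be explained
    cols = {name: center(col) for name, col in inputs.items()}

    # Stage 1: screen on squared correlation with the target.
    candidates = {}
    for name, x in cols.items():
        sxx = dot(x, x)
        if sxx > 0 and dot(x, y) ** 2 / (sxx * sst) >= min_r2:
            candidates[name] = x

    # Stage 2: forward stepwise selection on R-square improvement.
    # Each candidate is orthogonalized against the inputs already chosen,
    # so the gain is the extra share of target variance it explains.
    selected, basis, resid = [], [], y
    while candidates:
        best_name, best_gain, best_q = None, 0.0, None
        for name, x in candidates.items():
            q = x
            for b in basis:
                proj = dot(q, b)
                q = [qi - proj * bi for qi, bi in zip(q, b)]
            qq = dot(q, q)
            if qq < 1e-12:
                continue  # redundant with inputs already selected
            gain = dot(resid, q) ** 2 / (qq * sst)
            if gain > best_gain:
                best_name, best_gain, best_q = name, gain, q
        if best_name is None or best_gain < stop_r2:
            break
        norm = math.sqrt(dot(best_q, best_q))
        unit = [qi / norm for qi in best_q]
        proj = dot(resid, unit)
        resid = [ri - proj * ui for ri, ui in zip(resid, unit)]
        basis.append(unit)
        selected.append(best_name)
        del candidates[best_name]
    return selected


# Made-up demo data: x2 duplicates x1, and noise is unrelated to the target.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
z = [1, -1, 1, -1, -1, 1, -1, 1]
demo_inputs = {
    "x1": x1,
    "x2": [2 * v for v in x1],
    "z": z,
    "noise": [1, 1, -1, -1, -1, -1, 1, 1],
}
demo_target = [3 * a + 2 * b for a, b in zip(x1, z)]
demo_selected = r_square_selection(demo_inputs, demo_target)
print(demo_selected)
```

In the demo, noise fails the correlation screen, and x2 never enters because it adds nothing once x1 is in the model. That illustrates how the procedure drops both irrelevant and redundant inputs.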

9
Variable Selection Node
  • Selection is based on one of two criteria.
  • b) Chi-square (for binary targets only)
  • Binary splits are made to maximize the chi-square
    value of a 2x2 frequency table.
  • For nominal and ordinal variables: each level is
    decomposed into binary dummy variables.
  • For interval variables: the range of each
    variable is divided into a number of categories
    for splitting (bins of equal size; default
    50 bins).
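
For an interval input, the chi-square criterion above can be sketched as a search over bin boundaries. This is a hedged illustration rather than the node's real algorithm: it assumes "equal size" means equal-width bins, it uses the plain Pearson chi-square statistic for a 2x2 table, and the function names are invented for this example.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # an empty row or column carries no information
    return n * (a * d - b * c) ** 2 / denom


def best_binary_split(values, target, n_bins=50):
    """Find the bin boundary that maximizes chi-square against a 0/1 target.

    Returns (cut_point, chi_square); cut_point is None if no split helps.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return None, 0.0
    width = (hi - lo) / n_bins
    # Assign each value to an equal-width bin; the maximum goes in the last bin.
    bins = [min(int((v - lo) / width), n_bins - 1) for v in values]

    best_cut, best_chi2 = None, 0.0
    for k in range(1, n_bins):  # bins below k go left, the rest go right
        a = b = c = d = 0
        for bin_index, t in zip(bins, target):
            if bin_index < k:
                if t == 1:
                    a += 1
                else:
                    b += 1
            else:
                if t == 1:
                    c += 1
                else:
                    d += 1
        chi2 = chi_square_2x2(a, b, c, d)
        if chi2 > best_chi2:
            best_cut, best_chi2 = lo + k * width, chi2
    return best_cut, best_chi2


# Made-up data: the target switches from 0 to 1 between the values 5 and 6.
demo_values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
demo_binary_target = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
demo_cut, demo_chi2 = best_binary_split(demo_values, demo_binary_target, n_bins=9)
print(demo_cut, demo_chi2)
```

Nominal and ordinal inputs would first be expanded into 0/1 dummy variables, one per level, and each dummy would then be scored with the same 2x2 statistic.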

10
Demonstration
  • This demonstration illustrates variable selection
    using the decision tree node and the variable
    selection node.