New software facilities

Slides: 19
Provided by: evan60

Transcript and Presenter's Notes



1
New software facilities
  • Developed between Dec 2004 and Feb 2005

2
New implemented facilities
  • Automatic and manual splits for:
  • Gini index of impurity (used for classification).
  • Variance measure of impurity (used for
    regression).
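
The two impurity measures named on this slide can be sketched as follows. This is a minimal illustration, not the presented software's code: the function names `gini_impurity`, `variance_impurity` and `split_gain` are hypothetical, and the gain is assumed to be the usual weighted decrease in impurity for a binary split.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini index of impurity: 1 - sum(p_k ** 2) over class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def variance_impurity(values):
    """Population variance of the numeric target values in a node."""
    n = len(values)
    if n == 0:
        return 0.0
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


def split_gain(parent, left, right, impurity):
    """Impurity decrease obtained by splitting `parent` into `left` and `right`.

    Assumption: child impurities are weighted by their share of the cases.
    """
    n = len(parent)
    weighted = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(parent) - weighted
```

An automatic split would pick the candidate with the largest gain, while a manual split simply applies the user's chosen partition and reports its gain.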

3
Manual Split
4
Manual Split
5
Installer
6
General Options
7
CRT options
Note: for large datasets, use higher numbers.
8
New implemented facilities
  • Pruning with the Pessimistic Error Pruning (PEP)
    algorithm of Ross Quinlan (Quinlan, 1987). PEP
    can be used only for pruning classification
    trees.
  • Data transfer from any node to another one.
  • Removal of the sub-nodes of any node, except leaves.

9
New implemented facilities
  • Data filter and selection of blocks of data.
  • Uses the set operations Union and
    Intersection.
  • The variables can be either categorical or
    numerical.
  • The comparison operators are =, <, <=, >, >= and
    <> (different).
  • The null keyword matches blank values.
  • Foolproof functions.
  • The data block can be shown before it is transferred.
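
A data filter of this kind can be sketched as selections combined with set union and intersection. This is an illustrative sketch only: the operator spellings, the `select` function, the use of `None` for the null keyword and the sample rows are all assumptions, not the tool's actual interface.

```python
import operator

# Assumed operator spellings; the tool's own syntax may differ.
OPS = {
    "=": operator.eq, "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge, "<>": operator.ne,
}


def is_blank(value):
    """A blank value is a missing entry or an empty string."""
    return value is None or value == ""


def select(rows, column, op, operand):
    """Return the set of row indices where `column op operand` holds.

    Passing operand=None plays the role of the null keyword: it matches
    blank values with "=" and non-blank values with "<>".
    """
    selected = set()
    for i, row in enumerate(rows):
        value = row.get(column)
        if operand is None:
            blank = is_blank(value)
            if (op == "=" and blank) or (op == "<>" and not blank):
                selected.add(i)
        elif not is_blank(value) and OPS[op](value, operand):
            selected.add(i)
    return selected


# Hypothetical sample rows, one numerical and two categorical variables.
rows = [
    {"age": 70, "sex": "F", "outcome": "died"},
    {"age": 45, "sex": "M", "outcome": ""},
    {"age": None, "sex": "F", "outcome": "alive"},
]
older = select(rows, "age", ">=", 50)
female = select(rows, "sex", "=", "F")
both = older & female      # Intersection
either = older | female    # Union
```

Representing each selection as a set of row indices keeps Union and Intersection as single operators, and the resulting block can be listed for review before it is moved to another node.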

10
Data filter and the selection of blocks of data
11
Spartacus: ICU database, fully grown tree (target: outcome, 49 nodes)
12
Answer tree: ICU database, fully grown tree (target: outcome, 11 nodes)
13
Spartacus: ICU database, pruned tree (target: outcome, 11 nodes)
14
Answer tree: ICU database, pruned tree (target: outcome, 1 node only)
15
Coming soon
  • Tree misclassification tests.
  • C4.5 implementation.
  • More prune methods.
  • Globalized numeric formats.
  • Recalculation of the gain if a custom split is
    applied.
  • Full support for missing values.

16
Q&A
  • Any specific facility for the next meeting?
  • See you by the end of May?
  • http://evandro.org/calendar

17
(No Transcript)
18
PEP
  • PEP (pessimistic error pruning) is a top-down
    post-pruning algorithm. It does not require a
    separate pruning set. PEP tries to compensate for
    the overly optimistic estimates based on
    resubstitution error (the error measured by
    testing the tree on its own training sample).
    These estimates are overly optimistic because
    error rates on training data are typically lower
    than on test data. PEP attempts to get a more
    accurate error estimate by applying a continuity
    correction for the binomial distribution: it adds
    0.5 to the number of errors associated with each
    node. A subtree is kept (not pruned) only if its
    corrected error estimate is at least one standard
    error less than the estimated error of its root
    node.
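
The pruning rule described above can be sketched as below. This is a hedged illustration rather than the presented implementation: the `Node` class and function names are hypothetical, and two details not spelled out on the slide are assumed from the usual description of Quinlan's method, namely that the subtree's corrected error sums 0.5 per leaf and that the standard error uses the binomial form sqrt(e * (n - e) / n).

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    n_cases: int    # training cases reaching this node
    n_errors: int   # cases misclassified if this node were made a leaf
    children: list = field(default_factory=list)


def leaves(node):
    """Collect the leaves of the subtree rooted at `node`."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def pep_prune(node):
    """Top-down PEP: replace a subtree by a leaf unless it is clearly better."""
    if not node.children:
        return node
    subtree_leaves = leaves(node)
    # Corrected error if the node is collapsed into a single leaf.
    leaf_error = node.n_errors + 0.5
    # Corrected error of the subtree: 0.5 added for each of its leaves (assumed).
    subtree_error = sum(l.n_errors for l in subtree_leaves) + 0.5 * len(subtree_leaves)
    # Binomial standard error of the subtree's corrected error (assumed form).
    variance = subtree_error * (node.n_cases - subtree_error) / node.n_cases
    std_error = math.sqrt(max(0.0, variance))
    if leaf_error <= subtree_error + std_error:
        node.children = []  # subtree is not at least one SE better: prune it
    else:
        node.children = [pep_prune(child) for child in node.children]
    return node
```

Because the test runs from the root downwards, a pruned node's descendants are never examined, which is what makes the method top-down and fast.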