Title: New software facilities
1. New software facilities
- Developed between Dec 2004 and Feb 2005
2. Newly implemented facilities
- Automatic and manual splits for:
  - Gini index of impurity (used for classification).
  - Variance measure of impurity (used for regression).
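The two impurity measures above can be sketched as follows. This is an illustrative sketch, not the program's actual code; the function names are assumptions.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini index for a classification node: 1 - sum of squared
    class proportions. 0 means the node is pure."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def variance_impurity(values):
    """Variance of the target values for a regression node.
    0 means all cases in the node share the same target value."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n
```

An automatic split would evaluate candidate splits and keep the one that most reduces the weighted impurity of the child nodes; a manual split lets the user choose the split point instead.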
3. Manual Split
4. Manual Split
5. Installer
6. General Options
7. CRT options
Note: for large datasets, use higher numbers.
8. Newly implemented facilities
- Pruning using the Pessimistic Error Pruning (PEP) algorithm by Ross Quinlan (Quinlan, 1987); it can be used only for pruning classification trees.
- Data transfer from any node to another.
- Disposal of the sub-nodes of any node, except leaves.
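The two node operations above might look like this. The `TreeNode` class and its method names are hypothetical, chosen only to illustrate the behavior described on the slide:

```python
class TreeNode:
    """Minimal tree node holding data rows and child nodes."""

    def __init__(self, rows=None):
        self.rows = list(rows or [])   # data records held at this node
        self.children = []

    def is_leaf(self):
        return not self.children

    def transfer_data(self, target):
        """Move this node's data rows to another node."""
        target.rows.extend(self.rows)
        self.rows = []

    def dispose_subnodes(self):
        """Remove all sub-nodes; leaves have none, so this is disallowed."""
        if self.is_leaf():
            raise ValueError("a leaf has no sub-nodes to dispose of")
        self.children = []
```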
9. Newly implemented facilities
- Data filter and the selection of blocks of data:
  - Uses the set operations union and intersection.
  - The variables can be either categorical or numerical.
  - The comparison operators are =, <, <=, >, >= and <> (different).
  - The null keyword matches blank values.
  - Fool-proof functions.
  - The data block can be shown before transfer.
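One way to picture this filter is as conditions that select sets of row indices, which are then combined with union and intersection. This is an assumed design, not the program's actual implementation; the `select` function and the dict-based rows are illustrative only.

```python
# Map the slide's comparison operators to Python predicates.
OPS = {
    "=":  lambda a, b: a == b,
    "<":  lambda a, b: a < b,
    "<=": lambda a, b: a <= b,
    ">":  lambda a, b: a > b,
    ">=": lambda a, b: a >= b,
    "<>": lambda a, b: a != b,   # "different"
}

def select(rows, column, op, value):
    """Return the set of row indices matching `column <op> value`.
    Passing value=None stands for the `null` keyword (blank cells)."""
    if value is None:
        return {i for i, r in enumerate(rows) if r.get(column) in (None, "")}
    return {i for i, r in enumerate(rows)
            if r.get(column) not in (None, "") and OPS[op](r[column], value)}

rows = [
    {"age": 70, "sex": "F"},
    {"age": 45, "sex": "M"},
    {"age": None, "sex": "F"},   # blank value, matched by null
]

# Union: elderly patients OR patients with a blank age.
block = select(rows, "age", ">=", 60) | select(rows, "age", "=", None)
# Intersection: restrict the block to women.
result = block & select(rows, "sex", "=", "F")
```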
10. Data filter and the selection of blocks of data
11. Spartacus, ICU database, fully grown tree (target: outcome; 49 nodes)
12. Answer Tree, ICU database, fully grown tree (target: outcome; 11 nodes)
13. Spartacus, ICU database, pruned tree (target: outcome; 11 nodes)
14. Answer Tree, ICU database, pruned tree (target: outcome; 1 node only)
15. Coming soon
- Tree misclassification tests.
- C4.5 implementation.
- More pruning methods.
- Globalized numeric formats.
- Recalculation of the gain if a custom split is applied.
- Full support for missing values.
16. Q&A
- Any specific facility for the next meeting?
- See you by the end of May?
- http://evandro.org/calendar
18. PEP
- PEP (Pessimistic Error Pruning) is a top-down post-pruning algorithm. It does not require a separate pruning set. PEP tries to compensate for the overly optimistic estimates based on the resubstitution error (the error obtained by testing the training sample on itself). These estimates are overly optimistic because error rates on the training data are typically lower than on test data. PEP attempts to get a more accurate error estimate by imposing a continuity correction for the binomial distribution, adding 0.5 to the number of errors associated with each node. Also, a subtree is kept (not pruned) only if its corrected error estimate is at least one standard error less than the corrected error estimate of its root node.
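The pruning test described above can be sketched numerically. This is a simplified sketch of PEP's corrected-error comparison, assuming error counts are taken over the training cases reaching the node; the function names and parameter layout are assumptions, not Quinlan's code.

```python
import math

def should_prune(node_errors, leaf_errors, n_cases):
    """PEP pruning test for one internal node.

    node_errors: training errors if the node were collapsed to a leaf.
    leaf_errors: list of training error counts, one per leaf of the subtree.
    n_cases: number of training cases reaching the node.
    """
    # Continuity correction: add 0.5 per node (0.5 per leaf for the subtree).
    e_node = node_errors + 0.5
    e_tree = sum(e + 0.5 for e in leaf_errors)
    # Standard error of the subtree's corrected error count (binomial).
    se = math.sqrt(e_tree * (n_cases - e_tree) / n_cases)
    # Keep the subtree only if its corrected error is at least one
    # standard error below the node's corrected error; otherwise prune.
    return e_node <= e_tree + se
```

Because the test compares corrected training-set errors, no separate pruning set is needed, which is the property the slide highlights.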