Scalable Decision Tree SPRINT

1
Scalable Decision Tree SPRINT
  • Project Members
  • Kaushal Mittal
  • Abhishek Seth
  • Amar Agrawal

2
Problem Statement
  • The current decision tree implementation in Weka
    fails on large datasets.
  • Goal: a scalable implementation of decision trees
    in Weka.
  • Support for disk-resident data.

3
Challenges
  • The Instance class in Weka loads the entire
    training data into memory.
  • Multiple copies of the instance data are made at
    several points during training.
  • Other classes assume that the instance data is
    memory-resident.

4
Changes in Weka
  • Extended the Instance class to support
    disk-resident data.
  • Use of a cache and random access files.
  • Changes to the Evaluation class to work with the
    new SInstance class.
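
The combination of a cache and random access files described on this slide can be illustrated with a small sketch. The code below is not the project's SInstance class and uses no Weka API; the class name DiskRecordStore, its fixed-width all-numeric record layout, and the diskReads counter are assumptions made only for illustration. It keeps records in a RandomAccessFile and serves repeated reads from a bounded LRU cache.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: fixed-width numeric records stored on disk,
// with a small least-recently-used cache in front of the file.
// None of these names come from Weka or from the SInstance class.
class DiskRecordStore implements AutoCloseable {
    private final RandomAccessFile file;
    private final int numAttributes;
    private final long recordBytes;
    private final Map<Long, double[]> cache;
    private long numRecords = 0;
    long diskReads = 0; // exposed only so the effect of the cache can be observed

    DiskRecordStore(File path, int numAttributes, final int cacheSize) throws IOException {
        this.file = new RandomAccessFile(path, "rw");
        this.file.setLength(0); // start from an empty file
        this.numAttributes = numAttributes;
        this.recordBytes = (long) numAttributes * Double.BYTES;
        // accessOrder = true keeps entries ordered from least to most recently used
        this.cache = new LinkedHashMap<Long, double[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, double[]> eldest) {
                return size() > cacheSize; // evict once the cache is over capacity
            }
        };
    }

    // Appends one record at the end of the file and returns its index.
    long append(double[] values) throws IOException {
        if (values.length != numAttributes) {
            throw new IllegalArgumentException("wrong record width");
        }
        file.seek(numRecords * recordBytes);
        for (double v : values) {
            file.writeDouble(v);
        }
        return numRecords++;
    }

    // Returns the record at the given index, reading from disk only on a cache miss.
    double[] get(long index) throws IOException {
        if (index < 0 || index >= numRecords) {
            throw new IndexOutOfBoundsException("record " + index);
        }
        double[] cached = cache.get(index);
        if (cached != null) {
            return cached;
        }
        file.seek(index * recordBytes); // fixed-width records allow direct seeks
        double[] values = new double[numAttributes];
        for (int i = 0; i < numAttributes; i++) {
            values[i] = file.readDouble();
        }
        diskReads++;
        cache.put(index, values);
        return values;
    }

    long size() {
        return numRecords;
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```

With a cache of size 1, reading the same record twice touches the disk once, while alternating between two records touches it on every read. That trade-off between memory and I/O is what the cache size controls in a design like this.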

5
Decision Tree Classifier
  • Design similar to Weka classifier J48.
  • SPRINT algorithm implemented.
  • Use of disk-resident attribute lists.
  • Generates a binary classifier tree.
  • Uses the Gini index as the split criterion.
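
As a rough illustration of how a Gini-based binary split can be chosen from a numeric attribute list, the sketch below scans a value-sorted list once while maintaining class counts on either side of each candidate threshold. It is a minimal in-memory sketch, not the project's code: the names GiniSplitSketch, Entry and bestSplit are hypothetical, and the list is sorted on each call for self-containment, whereas SPRINT presorts each attribute list once and keeps it on disk.

```java
import java.util.Arrays;

// Hypothetical sketch of Gini split selection over one numeric attribute list.
class GiniSplitSketch {

    // One attribute-list entry: attribute value, class label, record id.
    static final class Entry {
        final double value;
        final int classLabel;
        final int recordId;

        Entry(double value, int classLabel, int recordId) {
            this.value = value;
            this.classLabel = classLabel;
            this.recordId = recordId;
        }
    }

    // Gini index of a class histogram: 1 - sum over classes of p_c^2.
    static double gini(int[] counts, int total) {
        if (total == 0) {
            return 0.0;
        }
        double sumSquares = 0.0;
        for (int c : counts) {
            double p = (double) c / total;
            sumSquares += p * p;
        }
        return 1.0 - sumSquares;
    }

    // Returns {threshold, weighted Gini} for the best split "value <= threshold",
    // or null when all values are equal and no split exists.
    static double[] bestSplit(Entry[] list, int numClasses) {
        Entry[] sorted = list.clone();
        // Assumption for this sketch only: sort here instead of presorting once.
        Arrays.sort(sorted, (a, b) -> Double.compare(a.value, b.value));
        int n = sorted.length;
        int[] below = new int[numClasses]; // class counts at or below the candidate
        int[] above = new int[numClasses]; // class counts above the candidate
        for (Entry e : sorted) {
            above[e.classLabel]++;
        }
        double bestGini = Double.POSITIVE_INFINITY;
        double bestThreshold = Double.NaN;
        for (int i = 0; i < n - 1; i++) {
            below[sorted[i].classLabel]++;
            above[sorted[i].classLabel]--;
            if (sorted[i].value == sorted[i + 1].value) {
                continue; // cannot split between equal values
            }
            int nBelow = i + 1;
            int nAbove = n - nBelow;
            double weighted =
                (nBelow * gini(below, nBelow) + nAbove * gini(above, nAbove)) / n;
            if (weighted < bestGini) {
                bestGini = weighted;
                bestThreshold = (sorted[i].value + sorted[i + 1].value) / 2.0;
            }
        }
        return Double.isNaN(bestThreshold) ? null : new double[] {bestThreshold, bestGini};
    }
}
```

The record ids are unused in this scan; in SPRINT they are what allows the lists of the other attributes to be partitioned consistently once the winning split has been chosen.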

6
Results
  • Accuracy comparable to J48.
      • Glass (214 instances): J48 100%, SPRINT 91.667%
      • Adult: J48 83.3%, SPRINT 79.8%
  • Execution time: higher than default J48 on small
    data sets (due to I/O). On large data sets,
    default Weka fails.