Scalable Decision Tree SPRINT


Title: Scalable Decision Tree SPRINT


1
Scalable Decision Tree SPRINT
  • Project Members
  • Kaushal Mittal
  • Abhishek Seth
  • Amar Agrawal

2
Problem Statement
  • Current decision tree implementation in Weka
    fails for large datasets.
  • Goal: a scalable implementation of decision trees
    in Weka.
  • Goal: support for disk-resident data.

3
Challenges
  • Instance class in Weka loads the entire training
    data in memory.
  • Multiple copies of the instance data are made at
    several points during training.
  • Other classes assume the existence of
    memory-resident instance data.

4
Changes in Weka
  • Extended the Instance class (as the new SInstance
    class) to support disk-resident data.
  • Use of cache and random access files.
  • Changes to the Evaluation class to work with the
    new SInstance class.
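
The combination of a cache and a random access file can be sketched roughly as below. This is only an illustration under stated assumptions, not the project's SInstance code: the class name DiskInstanceStore, its methods, the fixed-width layout (each instance stored as numAttributes doubles), and the LRU eviction policy are all hypothetical. Only java.io.RandomAccessFile and java.util.LinkedHashMap are real library classes.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical disk-backed instance store; not part of Weka or the project. */
final class DiskInstanceStore implements AutoCloseable {
    private final RandomAccessFile file;
    private final int numAttributes;
    private final Map<Long, double[]> cache;

    DiskInstanceStore(String path, int numAttributes, final int cacheSize) throws IOException {
        this.file = new RandomAccessFile(path, "rw");
        this.numAttributes = numAttributes;
        // An access-ordered LinkedHashMap behaves as a small LRU cache (assumed policy).
        this.cache = new LinkedHashMap<Long, double[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, double[]> eldest) {
                return size() > cacheSize;
            }
        };
    }

    /** Byte offset of a record, assuming fixed-width records of doubles. */
    private long offset(long index) {
        return index * numAttributes * (long) Double.BYTES;
    }

    /** Writes one instance to disk and keeps a copy in the cache. */
    void write(long index, double[] values) throws IOException {
        file.seek(offset(index));
        for (int i = 0; i < numAttributes; i++) {
            file.writeDouble(values[i]);
        }
        cache.put(index, values.clone());
    }

    /** Returns one instance, touching the disk only on a cache miss. */
    double[] read(long index) throws IOException {
        double[] row = cache.get(index);
        if (row == null) {
            file.seek(offset(index));
            row = new double[numAttributes];
            for (int i = 0; i < numAttributes; i++) {
                row[i] = file.readDouble();
            }
            cache.put(index, row);
        }
        return row.clone();
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```

With a design like this, only cacheSize instances are held in memory at any time; everything else is fetched by seeking to a computed offset.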

5
Decision Tree Classifier
  • Design similar to Weka classifier J48.
  • SPRINT algorithm implemented.
  • Use of disk resident attribute lists.
  • Generates a binary classifier tree.
  • Uses the Gini index as the split criterion.
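
As a rough illustration of a Gini-based binary split over a sorted attribute list, the sketch below scans one numeric attribute and picks the threshold with the lowest weighted Gini index. It is a simplified in-memory version, not the project's implementation: the class GiniSplit, its method names, and the choice of midpoints between adjacent distinct values as candidate thresholds are assumptions made for the example.

```java
/** Hypothetical helper for illustration; not part of Weka or the project. */
final class GiniSplit {
    /** Gini index of a class-count histogram: 1 - sum_j p_j^2. */
    static double gini(int[] counts) {
        int n = 0;
        for (int c : counts) {
            n += c;
        }
        if (n == 0) {
            return 0.0;
        }
        double sumSq = 0.0;
        for (int c : counts) {
            double p = (double) c / n;
            sumSq += p * p;
        }
        return 1.0 - sumSq;
    }

    /**
     * Scans one numeric attribute list that is already sorted by value and
     * returns the threshold with the lowest weighted Gini index, or NaN if
     * all values are equal. labels[i] is the class of the i-th sorted entry.
     */
    static double bestThreshold(double[] sortedValues, int[] labels, int numClasses) {
        int n = sortedValues.length;
        int[] below = new int[numClasses];
        int[] above = new int[numClasses];
        for (int y : labels) {
            above[y]++;
        }
        double bestGini = Double.POSITIVE_INFINITY;
        double bestThreshold = Double.NaN;
        for (int i = 0; i < n - 1; i++) {
            // Move entry i from the "above" histogram to the "below" histogram.
            below[labels[i]]++;
            above[labels[i]]--;
            if (sortedValues[i] == sortedValues[i + 1]) {
                continue; // cannot split between equal values
            }
            int nBelow = i + 1;
            double g = (nBelow * gini(below) + (n - nBelow) * gini(above)) / n;
            if (g < bestGini) {
                bestGini = g;
                bestThreshold = (sortedValues[i] + sortedValues[i + 1]) / 2.0;
            }
        }
        return bestThreshold;
    }
}
```

For example, with sorted values {1, 2, 3, 4} and labels {0, 0, 1, 1}, the scan settles on the threshold 2.5, which separates the two classes cleanly. Because the list is pre-sorted, each attribute needs only one sequential pass, which is what makes the approach suitable for disk-resident lists.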

6
Results
  • Accuracy comparable to J48.
  • Glass (214 instances), accuracy in %:
    J48 100, SPRINT 91.667
  • Adult, accuracy in %:
    J48 83.3, SPRINT 79.8
  • Execution time: slower than default J48 on small
    data sets because of disk I/O. On large data sets,
    default Weka fails.