Linear Clustering Algorithm - PowerPoint PPT Presentation

Provided by: cseMsuEd3
Learn more at: http://www.cse.msu.edu

1
Linear Clustering Algorithm
By Horne Ken, Khan Farhana, and Padubidri Shweta

2
Overview
  • Introduction
  • Data Preprocessing
  • Data Mining
  • Data Visualization
  • Experiment
  • Conclusion

3
Responsibility
  • Data Preprocessing: Farhana, Ken
  • Data Mining: Ken
  • Data Visualization: Shweta

4
Overview
  • A Linear Clustering Algorithm
  • Applications
    • Feature selection: choose features based on
      information gain
    • Discretization: partition based on data set
      characteristics
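
As a point of reference for the feature-selection bullet, here is a minimal sketch of how information gain is commonly computed for a categorical feature. The slides do not show the authors' code; the function names and the toy data below are illustrative assumptions only.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data, not taken from the CPS extract.
labels = ["a", "a", "b", "b"]
perfect = ["x", "x", "y", "y"]   # separates the classes completely
useless = ["x", "y", "x", "y"]   # tells us nothing about the class
print(information_gain(perfect, labels))  # 1.0
print(information_gain(useless, labels))  # 0.0
```

Features with higher gain would be kept first; where to cut the ranking off is a choice the slides do not specify.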

5
Data Preprocessing
  • DataFerrett (Federated Electronic Research,
    Review, Extraction, and Tabulation Tool)
    • Install the software
    • Web version
    • http://www.thedataweb.org/what_ferrett.html

6
Data Pre-processing Step
  • Extracted data from the CPS (Current Population
    Survey)
  • Pre-processing
    • Number of features: 43
    • Years: 2007-2008
    • 115,000 rows per month across 50 states
    • After preprocessing: 23 features
    • Normalization
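
The slide does not say which normalization was applied. As one common possibility, the sketch below shows min-max scaling of a single numeric column to the range [0, 1]; the function name and the sample values are assumptions for illustration.

```python
def min_max_normalize(values):
    """Rescale a list of numbers linearly so the minimum maps to 0 and the maximum to 1.

    Assumption: min-max scaling; the slides only say "Normalization".
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map every entry to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical column values.
print(min_max_normalize([10, 20, 30]))  # [0.0, 0.5, 1.0]
```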

7
Data Mining
  • Algorithm
    • Choose an ordinal attribute (X)
    • Order data points by that attribute
    • List potential partition points (between
      successive values of X)
    • For each potential partition point P, calculate
      the distance from the data points where X < P to
      those where X > P
  • Results
    • Can partition data points
    • Order data points by information gain
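
The algorithm steps on this slide can be sketched roughly as follows. Two points are assumptions, since the slide does not state them: the distance between the two groups is taken to be the Euclidean distance between their centroids, and the partition point with the largest distance is selected. The names `centroid` and `best_partition` are hypothetical.

```python
from math import dist  # available in Python 3.8+


def centroid(rows):
    """Component-wise mean of a non-empty list of equal-length numeric tuples."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def best_partition(points, x_index):
    """Return (P, distance) for the candidate split with the largest group distance.

    points  : list of numeric tuples
    x_index : position of the ordinal attribute X within each tuple
    Returns None when X has fewer than two distinct values.
    """
    # Order the data points by the chosen attribute.
    ordered = sorted(points, key=lambda row: row[x_index])
    xs = sorted({row[x_index] for row in ordered})
    best = None
    # Candidate partition points lie midway between successive distinct values of X.
    for lower, upper in zip(xs, xs[1:]):
        p = (lower + upper) / 2
        left = [row for row in ordered if row[x_index] < p]
        right = [row for row in ordered if row[x_index] > p]
        # Assumption: group distance = Euclidean distance between centroids.
        d = dist(centroid(left), centroid(right))
        if best is None or d > best[1]:
            best = (p, d)
    return best


# Hypothetical toy data: the second attribute jumps between X = 2 and X = 3.
points = [(1, 0.0), (2, 0.1), (3, 5.0), (4, 5.2)]
p, d = best_partition(points, x_index=0)
print(p)  # 2.5
```

Applying the same search recursively to each side would yield more than two intervals, which is one way such a split could serve as a discretization step.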

8
Data Mining
  • Test dataset

9
Data Mining
  • Test dataset 2

10
Experimental Setup
  • Environment
    • DataFerrett: data pre-processing
    • Java platform: implementation of the data mining
      algorithm
    • Data visualization
      • Google App Engine
      • Datastore API
      • Python, JavaScript, and the Django framework
      • Google Chart API
  • Hardware
    • Windows XP laptop, Core 2, 2.16 GHz
    • 2.00 GB RAM (that hurt)

11
Visualization Demo
  • Link to the website:
    http://householdstructure-project.appspot.com/

12
Conclusions
  • Preliminary results are encouraging
    • Discretization was successful
  • Lessons learnt and future work
    • Compare with other methods on well-known
      datasets
    • Evaluate performance in feature selection
    • OPTIMIZE
    • Don't pick a novel dataset and a novel algorithm
      at the same time

13
Thank you
  • Questions?