Title: Running Clustering Algorithm in Weka
1Running Clustering Algorithm in Weka
Presented by Rachsuda Jiamthapthaksin
Computer Science Department, University of Houston
2What is Weka?
- Data mining software in Java
- Supervised learning (classification)
- Unsupervised learning (clustering)
- Tools
- Exploration
- Visualization
- Experiment
- Statistical summary
3Download Weka
- http://www.cs.waikato.ac.nz/ml/weka/
- Windows (weka-3-5-6jre.exe)
- Linux
4Getting Started
5Memory Limitation in Weka
- Run the GUI Chooser from the DOS prompt to increase the memory available to Weka
- C:\> java -Xmx128m -classpath ./progra~1/weka-3-5/weka.jar weka.gui.GUIChooser
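To confirm that a -Xmx setting such as the one in the command above took effect, a small Java program can report the maximum heap the JVM is willing to use. This is a minimal sketch; the class name HeapCheck is illustrative and is not part of Weka.

```java
// HeapCheck is a hypothetical helper for illustration; it is not part of Weka.
class HeapCheck {
    /** Returns the maximum heap size this JVM will attempt to use, in megabytes. */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Launch with the same flag used for Weka, e.g.: java -Xmx128m HeapCheck
        System.out.println("Max heap (MB): " + maxHeapMb());
    }
}
```

Depending on the JVM, the reported figure can be slightly below the value passed to -Xmx.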
6Weka GUI
7Explorer
8Open Files (.csv, .arff)
9Dataset Description
Dataset statistics
Attributes
10Remove Class Attribute
Non-class attributes
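Weka records this step in the relation name of later output, for example Remove-R5 (drop attribute 5) or Remove-R1-2,5 (drop attributes 1, 2 and 5), using 1-based attribute indices. The sketch below illustrates how such a range selects columns. It is plain Java rather than Weka's own filter code, the class ColumnRemover is hypothetical, and it handles only numeric indices and simple ranges.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// ColumnRemover is a hypothetical illustration, not Weka's Remove filter.
class ColumnRemover {
    /** Parses a 1-based range list such as "1-2,5" into the set {1, 2, 5}. */
    static Set<Integer> parseRange(String spec) {
        Set<Integer> indices = new TreeSet<>();
        for (String part : spec.split(",")) {
            String[] bounds = part.trim().split("-");
            int first = Integer.parseInt(bounds[0].trim());
            int last = Integer.parseInt(bounds[bounds.length - 1].trim());
            for (int i = first; i <= last; i++) {
                indices.add(i);
            }
        }
        return indices;
    }

    /** Returns a copy of the row without the columns named in the range. */
    static List<String> remove(List<String> row, String spec) {
        Set<Integer> drop = parseRange(spec);
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < row.size(); i++) {
            if (!drop.contains(i + 1)) { // ranges are 1-based
                kept.add(row.get(i));
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Column order assumed: sepallength, sepalwidth, petallength, petalwidth, class
        List<String> row = List.of("5.1", "3.5", "1.4", "0.2", "Iris-setosa");
        System.out.println(remove(row, "5"));     // class attribute removed
        System.out.println(remove(row, "1-2,5")); // only the petal attributes remain
    }
}
```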
11Select A Clustering Algorithm
12Select A Clustering Algorithm
13Select A Clustering Algorithm
14Parameters Setting
15Run A Clustering Algorithm
16DBSCAN Results
- Run information
- Scheme: weka.clusterers.DBScan -E 0.9 -M 6 -I weka.clusterers.forOPTICSAndDBScan.Databases.SequentialDatabase -D weka.clusterers.forOPTICSAndDBScan.DataObjects.EuclidianDataObject
- Relation: iris-weka.filters.unsupervised.attribute.Remove-R5
- Instances: 150
- Attributes: 4
- sepallength
- sepalwidth
- petallength
- petalwidth
- Test mode: evaluate on training data
- Model and evaluation on training set
- DBScan clustering results
- Clustered DataObjects: 150
- Number of attributes: 4
17Simplify A Tested Dataset
18Simplify A Tested Dataset
19Parameters Setting
20DBSCAN Clustering Results
- Run information
- Scheme: weka.clusterers.DBScan -E 0.3 -M 50 -I weka.clusterers.forOPTICSAndDBScan.Databases.SequentialDatabase -D weka.clusterers.forOPTICSAndDBScan.DataObjects.EuclidianDataObject
- Relation: iris-weka.filters.unsupervised.attribute.Remove-R1-2,5
- Instances: 150
- Attributes: 2
- petallength
- petalwidth
- Test mode: evaluate on training data
- Model and evaluation on training set
- DBScan clustering results
- Clustered DataObjects: 150
- Number of attributes: 2
- Epsilon: 0.3; minPoints: 50
- Index: weka.clusterers.forOPTICSAndDBScan.Databases.SequentialDatabase
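The two DBSCAN parameters in this output, Epsilon (-E) and minPoints (-M), control how clusters form: a point is a core point if at least minPoints points (counting itself) lie within distance Epsilon of it, clusters grow outward from core points, and points reachable from no core point are labeled noise. The following is a textbook sketch of that procedure in plain Java. MiniDbscan is a hypothetical class, not Weka's DBScan, and Weka's implementation may differ in details such as attribute scaling and indexing.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Textbook DBSCAN sketch; MiniDbscan is hypothetical and is not Weka's DBScan.
class MiniDbscan {
    static final int NOISE = -1;
    static final int UNVISITED = -2;

    /** Indices of all points within Euclidean distance eps of point i (including i). */
    static List<Integer> neighbors(double[][] pts, int i, double eps) {
        List<Integer> result = new ArrayList<>();
        for (int j = 0; j < pts.length; j++) {
            double sum = 0.0;
            for (int d = 0; d < pts[i].length; d++) {
                double diff = pts[i][d] - pts[j][d];
                sum += diff * diff;
            }
            if (Math.sqrt(sum) <= eps) {
                result.add(j);
            }
        }
        return result;
    }

    /** Returns one label per point: 0, 1, ... for clusters, NOISE for noise. */
    static int[] cluster(double[][] pts, double eps, int minPoints) {
        int[] label = new int[pts.length];
        Arrays.fill(label, UNVISITED);
        int clusterId = 0;
        for (int i = 0; i < pts.length; i++) {
            if (label[i] != UNVISITED) continue;
            List<Integer> seeds = neighbors(pts, i, eps);
            if (seeds.size() < minPoints) {
                label[i] = NOISE; // may be relabeled later as a border point
                continue;
            }
            label[i] = clusterId; // i is a core point: start a new cluster
            Deque<Integer> queue = new ArrayDeque<>(seeds);
            while (!queue.isEmpty()) {
                int q = queue.poll();
                if (label[q] == NOISE) label[q] = clusterId; // border point
                if (label[q] != UNVISITED) continue;
                label[q] = clusterId;
                List<Integer> qNeighbors = neighbors(pts, q, eps);
                if (qNeighbors.size() >= minPoints) {
                    queue.addAll(qNeighbors); // q is also a core point: keep expanding
                }
            }
            clusterId++;
        }
        return label;
    }

    public static void main(String[] args) {
        // Illustrative points, not taken from the iris data.
        double[][] pts = {
            {1.0, 1.0}, {1.1, 1.0}, {1.0, 1.1}, // dense group A
            {5.0, 5.0}, {5.1, 5.0}, {5.0, 5.1}, // dense group B
            {9.0, 1.0}                          // isolated point
        };
        System.out.println(Arrays.toString(cluster(pts, 0.3, 3)));
        // prints [0, 0, 0, 1, 1, 1, -1]
    }
}
```

Raising minPoints or lowering Epsilon makes the core-point condition harder to satisfy, so more points end up labeled as noise.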
21Run k-Means in Weka
22Parameters Setting
23k-Means Clustering Results
- Run information
- Scheme: weka.clusterers.SimpleKMeans -N 2 -S 10
- Relation: iris-weka.filters.unsupervised.attribute.Remove-R1-2,5
- Instances: 150
- Attributes: 2
- petallength
- petalwidth
- Test mode: evaluate on training data
- Model and evaluation on training set
- kMeans
- Number of iterations: 6
- Within cluster sum of squared errors: 5.179687509974782
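The within cluster sum of squared errors is the sum, over all instances, of the squared distance between an instance and the centroid of the cluster it is assigned to. The sketch below runs plain Lloyd-style k-means from fixed starting centroids and then computes that sum. MiniKMeans is a hypothetical class, not Weka's SimpleKMeans. Weka may scale attribute values before measuring distances, so a computation on raw values need not reproduce the figure reported above.

```java
import java.util.Arrays;

// Plain k-means sketch; MiniKMeans is hypothetical and is not Weka's SimpleKMeans.
class MiniKMeans {
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return sum;
    }

    /** Index of the centroid closest to point p. */
    static int nearest(double[] p, double[][] centroids) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (squaredDistance(p, centroids[c]) < squaredDistance(p, centroids[best])) {
                best = c;
            }
        }
        return best;
    }

    /** Runs k-means from the given centroids (updated in place); returns assignments. */
    static int[] fit(double[][] pts, double[][] centroids, int maxIterations) {
        int[] assign = new int[pts.length];
        Arrays.fill(assign, -1);
        for (int iter = 0; iter < maxIterations; iter++) {
            boolean changed = false;
            for (int i = 0; i < pts.length; i++) {
                int c = nearest(pts[i], centroids);
                if (c != assign[i]) {
                    assign[i] = c;
                    changed = true;
                }
            }
            if (!changed) break; // assignments are stable: converged
            // Move each centroid to the mean of the points assigned to it.
            for (int c = 0; c < centroids.length; c++) {
                double[] sum = new double[centroids[c].length];
                int count = 0;
                for (int i = 0; i < pts.length; i++) {
                    if (assign[i] != c) continue;
                    for (int d = 0; d < sum.length; d++) sum[d] += pts[i][d];
                    count++;
                }
                if (count == 0) continue; // leave an empty cluster's centroid unchanged
                for (int d = 0; d < sum.length; d++) centroids[c][d] = sum[d] / count;
            }
        }
        return assign;
    }

    /** Sum over all points of the squared distance to the assigned centroid. */
    static double withinClusterSse(double[][] pts, double[][] centroids, int[] assign) {
        double sse = 0.0;
        for (int i = 0; i < pts.length; i++) {
            sse += squaredDistance(pts[i], centroids[assign[i]]);
        }
        return sse;
    }

    public static void main(String[] args) {
        // Illustrative points and starting centroids, not taken from the iris data.
        double[][] pts = {{0, 0}, {0, 2}, {10, 0}, {10, 2}};
        double[][] centroids = {{0, 0}, {10, 0}};
        int[] assign = fit(pts, centroids, 100);
        System.out.println(Arrays.toString(assign));                 // prints [0, 0, 1, 1]
        System.out.println(withinClusterSse(pts, centroids, assign)); // prints 4.0
    }
}
```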
24ArffViewer: Convert Dataset File Extension
25Open A Dataset File
26Select A Dataset File
27View the Dataset
28Manipulate the Dataset (Optional)
29Save As .arff File
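An .arff file is plain text: an @relation line, one @attribute line per column, then an @data section holding the comma-separated rows. The sketch below shows roughly what the conversion from .csv produces. It assumes a header row and all-numeric columns, and the class CsvToArff is hypothetical. It is not the code ArffViewer runs, and it ignores nominal attributes, quoting and missing values.

```java
// CsvToArff is a hypothetical illustration, not part of Weka.
class CsvToArff {
    /** Converts CSV text (header row, all-numeric columns assumed) into ARFF text. */
    static String convert(String relationName, String csv) {
        String[] lines = csv.trim().split("\\R"); // split on any line break
        String[] header = lines[0].split(",");
        StringBuilder arff = new StringBuilder();
        arff.append("@relation ").append(relationName).append("\n\n");
        for (String name : header) {
            arff.append("@attribute ").append(name.trim()).append(" numeric\n");
        }
        arff.append("\n@data\n");
        for (int i = 1; i < lines.length; i++) {
            if (lines[i].trim().isEmpty()) continue; // skip blank lines
            arff.append(lines[i].trim()).append("\n");
        }
        return arff.toString();
    }

    public static void main(String[] args) {
        // Two illustrative rows using the petal attributes from the slides.
        String csv = "petallength,petalwidth\n1.4,0.2\n4.7,1.4\n";
        System.out.print(convert("iris", csv));
    }
}
```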
30Weka Documentation