Earthquake Prediction using Data Mining Tools presentation

1

Earthquake Prediction using Data Mining Tools
Mrinalini Kabbur
Ritu Chinya
Progress Report

2
Introduction

An earthquake is a sudden movement of the Earth,
caused by the abrupt release of strain that has
accumulated over a long time.
Earthquakes remain one of the least predictable
natural hazards.
The goal of earthquake prediction is to give
warning of potentially damaging earthquakes early
enough to allow appropriate response to the
disaster, enabling people to minimize loss of
life and property.

3
Project Design

This project deals with earthquake classification
and prediction using data mining tools.
Weka was used to develop the model.
The Naïve Bayesian classifier was used to predict
unknown class labels.
Used C4.5 with a 66% split to classify the data and
10-fold cross-validation to evaluate accuracy.

4
Method

Installation of Weka
Weka is a suite of machine learning and data
mining software
Developed at the University of Waikato in New
Zealand
Available for free
Easy to use Graphical User Interface

Learning Weka
Both of us were new to Weka
Used the tutorial by Svetlana Aksenova
Searched the internet for additional information
on Weka

Gathering Earthquake Data Set
Consists of the earthquakes that occurred in
the Northern California region during 2005.
Data gathered from United States Geological
Survey (USGS) website.

Data preprocessing
Weka algorithms work on data in ARFF format.
However, the data was in HTML format, as shown below.

7
[Screenshot: the earthquake data in HTML format]
8
Data Preprocessing (Contd)

So the data had to be transferred to an Excel
file.
Converting directly from HTML to Excel was difficult,
so the data was first saved in Word format.

9
Excel Format
10

Conversion from Excel to ARFF format
Saved the Excel file as CSV.
Used awk commands to format the data.
Keyed in some missing data.
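
The CSV to ARFF step can be sketched in Python instead of awk. This is only an illustrative sketch, not the script used in the project: the column names follow the attributes of interest named in this presentation, the sample rows are made up, and all attributes are assumed to be numeric. Missing values are written as `?`, which is how ARFF marks them.

```python
import csv
import io

def csv_to_arff(csv_text, relation):
    """Convert CSV text with a header row into ARFF text (numeric attributes only)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    lines = ["@relation " + relation, ""]
    for name in header:
        lines.append("@attribute " + name.strip() + " numeric")
    lines += ["", "@data"]
    for row in data:
        # ARFF marks a missing value with a question mark.
        cells = [cell.strip() or "?" for cell in row]
        lines.append(",".join(cells))
    return "\n".join(lines) + "\n"

# Hypothetical sample rows, for illustration only.
sample = (
    "Latitude,Longitude,Depth,Magnitude\n"
    "38.81,-122.82,2.1,3.2\n"
    "36.58,-121.18,,3.5\n"
)
arff = csv_to_arff(sample, "Earthquake")
print(arff)
```

Nominal or date attributes would need a different `@attribute` declaration, which this sketch does not handle.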

Data Cleansing
The earthquake data contained many parameters.
They include
Date and time
Longitude
Latitude
Depth
Magnitude
Event ID
Source
Magt
Nst
Gap
Clo
Attributes of interest include
Date and Time
Longitude
Latitude
Depth
Magnitude

12
Date and time fields are not considered while
applying the classification algorithm. The filter
weka.filters.unsupervised.attribute.Remove is
applied to remove the date and time attribute.
This is shown below.
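
The effect of removing an attribute can be sketched in plain Python. This is not Weka's implementation of the Remove filter, only an illustration of what it does to the data. The column name `DateTime` and the sample row are hypothetical.

```python
def remove_attributes(header, rows, to_remove):
    """Drop the named columns from the header and from every data row."""
    keep = [i for i, name in enumerate(header) if name not in to_remove]
    new_header = [header[i] for i in keep]
    new_rows = [[row[i] for i in keep] for row in rows]
    return new_header, new_rows

# Hypothetical column name and made-up row, for illustration only.
header = ["DateTime", "Latitude", "Longitude", "Depth", "Magnitude"]
rows = [["2005/01/02 10:15:00", 38.81, -122.82, 2.1, 3.2]]

new_header, new_rows = remove_attributes(header, rows, {"DateTime"})
print(new_header)
print(new_rows)
```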
13

Discretize
Attributes contain numeric data.
Some Weka algorithms, such as ID3, require nominal
attribute values.
Numeric attributes are therefore converted to nominal ones.
The attributes Longitude, Latitude, Depth and
Magnitude are all discretized by using the
filter weka.filters.unsupervised.attribute.Discretize.
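
Equal-width binning, one common way to discretize a numeric attribute, can be sketched as follows. The bin count of 10 matches the `-B10` option visible in the relation name in the Naive Bayes run information. The range 3.0 to 7.1 is an assumption, chosen because it reproduces the Magnitude interval edges (3.41, 3.82, ..., 6.69) that appear in the results of this presentation. The function names are illustrative and are not part of Weka.

```python
def equal_width_cut_points(lo, hi, bins):
    """Return the bins - 1 interior cut points that split [lo, hi] into equal widths."""
    width = (hi - lo) / bins
    return [lo + width * k for k in range(1, bins)]

def bin_index(value, cuts):
    """Index of the interval (previous cut, cut] that contains value."""
    for i, cut in enumerate(cuts):
        if value <= cut:
            return i
    return len(cuts)

# Assumed range for Magnitude; see the note above.
cuts = equal_width_cut_points(3.0, 7.1, 10)
print([round(c, 2) for c in cuts])
print(bin_index(3.2, cuts), bin_index(5.0, cuts))
```

Each numeric value is replaced by the label of its interval, so the attribute becomes nominal with at most 10 distinct values.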

Apply classification rules to come up with
Decision trees
Rule sets
Algorithms used for modelling
C4.5
Naïve Bayesian

15
C4.5

We have considered two cases.
Cross-validation: evaluates the classifier by
cross-validation, using the number of folds
entered in the Folds text field.
Percentage split: evaluates the classifier on how
well it predicts a certain percentage of the
data, which is held out for testing. The amount
of data held out depends on the value entered in
the % field.
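
The two evaluation modes differ only in how the instances are partitioned. The sketch below shows both partitions for 113 instances, the size of the earthquake relation. It is a simplified illustration: the shuffling, the seed and the rounding rule are assumptions, the folds are not stratified by class, and Weka's own splits may differ.

```python
import random

def percentage_split(n, train_percent, seed=1):
    """Shuffle indices 0..n-1 and split them into training and test lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = round(n * train_percent / 100)
    return indices[:n_train], indices[n_train:]

def k_folds(n, k, seed=1):
    """Shuffle indices 0..n-1 and deal them into k folds of near-equal size."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

train, test = percentage_split(113, 66)
folds = k_folds(113, 10)
print(len(train), len(test))
print([len(fold) for fold in folds])
```

With a percentage split the model is tested once on the held-out part. With k folds it is trained k times, each time tested on a different fold, so every instance is used for testing exactly once.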

16
First we evaluate the classifier with a 66%
percentage split, as shown below.
17
Run Analysis
18
Run Information gives you the following
information:
the algorithm you used: J48
the relation name: Earthquake
the number of instances in the relation: 113
the number of attributes in the relation: 4
the list of the attributes: Longitude, Latitude,
Depth, Magnitude
the test mode you selected: split 66%
Classifier model is an unpruned decision tree in
textual form that was produced on the full
training data. As you can see, the first split
is on the Longitude attribute; at the second
level, the splits are on Latitude and
Longitude.
Below the tree structure are the number of
leaves (10) and the size of the tree, that is,
the number of nodes (19). The program also
reports the time it took to build the model,
which is 0.06 seconds.
In this case only 67% of the instances
were classified correctly. This indicates
that the results obtained from the training data
are not optimistic compared with what might
be obtained from an independent test set from
the same source.
19
Weka also lets you visualize the decision tree
20

Accuracy Estimation
Ten-fold cross-validation
Snapshot of Naïve Bayesian classification using Weka

21
Run Information

Run information
Scheme: weka.classifiers.bayes.NaiveBayes
Relation: Earthquake-weka.filters.unsupervised.attribute.Discretize-B10-M-1.0-Rlast
Instances: 113
Attributes: 4
  Latitude
  Longitude
  Depth
  Magnitude
Test mode: 10-fold cross-validation
Classifier model (full training set)
Naive Bayes Classifier
Time taken to build model: 0.06 seconds
Stratified cross-validation
Summary
Correctly Classified Instances      69    61.0619 %
Incorrectly Classified Instances    44    38.9381 %
Kappa statistic                     -0.0061
Mean absolute error                  0.1187
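
The two percentages in the summary follow directly from the instance counts. A short check, using only the numbers reported above:

```python
correct = 69
incorrect = 44
total = correct + incorrect

accuracy = 100 * correct / total
error_rate = 100 * incorrect / total
print(f"{total} instances: {accuracy:.4f}% correct, {error_rate:.4f}% incorrect")
```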

22
Run Information (Cont)

Detailed Accuracy By Class
TP Rate  FP Rate  Precision  Recall  F-Measure  Class
0.972    0.976    0.627      0.972   0.762      '(-inf-3.41]'
0        0        0          0       0          '(3.41-3.82]'
0        0.019    0          0       0          '(3.82-4.23]'
0        0        0          0       0          '(4.23-4.64]'
0        0        0          0       0          '(4.64-5.05]'
0        0        0          0       0          '(5.05-5.46]'
0        0        0          0       0          '(5.46-5.87]'
0        0        0          0       0          '(5.87-6.28]'
0        0        0          0       0          '(6.28-6.69]'
0        0.009    0          0       0          '(6.69-inf)'
Confusion Matrix
  a  b  c  d  e  f  g  h  i  j   <-- classified as
 69  0  1  0  0  0  0  0  0  1 |  a = '(-inf-3.41]'
 24  0  1  0  0  0  0  0  0  0 |  b = '(3.41-3.82]'
  8  0  0  0  0  0  0  0  0  0 |  c = '(3.82-4.23]'
  6  0  0  0  0  0  0  0  0  0 |  d = '(4.23-4.64]'
  2  0  0  0  0  0  0  0  0  0 |  e = '(4.64-5.05]'
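
Row a of the confusion matrix is enough to reproduce the TP rate reported for the lowest magnitude interval, and to compare the classifier with a trivial baseline that always predicts that interval. The sketch uses only row a and the summary counts; the matrix in this transcript is cut off after row e, so nothing that needs the missing rows is computed.

```python
# Row a of the confusion matrix: instances whose true class is the lowest interval.
row_a = [69, 0, 1, 0, 0, 0, 0, 0, 0, 1]
total_instances = 113
correct_overall = 69  # "Correctly Classified Instances" from the summary

class_a_size = sum(row_a)                   # instances that truly belong to class a
recall_a = row_a[0] / class_a_size          # TP rate (recall) for class a
baseline = class_a_size / total_instances   # accuracy of always predicting class a
accuracy = correct_overall / total_instances

print(f"recall(a) = {recall_a:.3f}")
print(f"always-predict-a baseline = {baseline:.4f}, Naive Bayes = {accuracy:.4f}")
```

Class a holds 71 of the 113 instances, so always predicting it would already be right more often than the Naive Bayes model was. That is consistent with the Kappa statistic being close to zero.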

23
Learnings from the project

We were both new to Weka and learnt to use the Weka
software.
It was challenging to analyze a large amount of
data compared to what we did in our
homework.
We realized that data pre-processing indeed takes
a long time.
We got a clear understanding of the C4.5 and Naïve
Bayesian classification algorithms.

24
Division of work

We worked together on all the tasks.

Conclusion
We realized that data mining tools are very
powerful and save a lot of time when classifying
large amounts of data. We found that using the C4.5
algorithm with 66% of the data as training data gave
an accuracy of 67%, whereas 10-fold
cross-validation gave an accuracy of 62% in the
case of the earthquake data. The Naïve Bayesian
algorithm correctly classified 61% of the
test data, so the results were close. All
in all, the project was very interesting and
challenging, and we enjoyed working on it.
25
Reference

http://www.studentprogress.com/appln/colleges/cogrec/Papers/D_05.pdf
www.meteoquake.org/our.html
http://www.cs.waikato.ac.nz/ml/weka/index.html
http://gaia.ecs.csus.edu/mei/215/tutorial.html
http://www.ngdc.noaa.gov/seg/hazard/sig_srch_idb.shtml
Weka Explorer tutorial by Svetlana Aksenova
