DR. UREERAT SUKSAWATCHON - PowerPoint PPT Presentation

Provided by: csus98


1
Introducing Weka
  • DR. UREERAT SUKSAWATCHON
  • 321641 DATA MINING

2
What is Weka?
  • Waikato Environment for Knowledge Analysis
  • Developed since 1997 at the University of Waikato, New Zealand
  • Open-source software for analyzing data with data
    mining techniques
  • The system is written in Java and distributed
    under the terms of the GNU General Public License
  • It runs on any platform that supports Java
  • Linux, Windows, Mac OS, PDA

3
How do we get it?
  • http://www.cs.waikato.ac.nz/ml/weka
  • Free Weka tutorials on the Web, and a book

4
Starting Weka
  • Weka's GUI

5
Weka Explorer
Tabs (one for each task)
Workspace
Status Bar
6
Weka Explorer
  • Preprocess
  • Classify
  • Cluster
  • Associate
  • Select Attributes
  • Visualize

7
Preparing the data
  • Terminology
  • Instance
  • Attribute
  • Data file formats supported by Weka
  • CSV (Comma-Separated Value) files
  • ARFF (Attribute-Relation File Format) files
  • ?????????
  • Data Preprocessing with Weka
  • Converting data types
  • Handling missing values
  • Detecting outliers

8
Ex Customer Data
  • ??????????????????????????????????????????????????
    ???????

Customer_ID  Name    Sex     Age  Income
1            [name]  Male    25   12,000
2            [name]  Female  18   7,000
3            [name]  Female  35   35,000
4            [name]  Female  15   4,000
5            [name]  Female  300  20,000
Instance: one row of the table (one customer record)
Attribute: one column of the table (one property of the data, such as Age)
9
Ex Customer Data
  • Types of attribute
  • Numeric
  • Data that are numbers
  • Can be used in calculations
  • e.g., the attributes Customer_ID, Age, and Income
  • Non-numeric or Categorical
  • Data that are categories (labels), not numbers
  • Cannot be used in calculations
  • e.g., the attributes Name and Sex

10
Ex Customer Data
  • Ways to load data into Weka
  • Open file: open a CSV or ARFF file
    stored on this computer
  • Open URL: open a CSV or ARFF file from a web address
  • Open DB: load data from a database
  • Generate: create artificial data with a data generator,
    specifying the number of instances and attributes

11
Ex Customer Data
  • CSV (Comma-Separated Value) file format
  • A comma (,) separates the values of the attributes
  • Can be created in Excel and saved as CSV (choose Save
    As and pick CSV format)

[Figures: an example CSV data file, shown in Excel and as plain text]
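A CSV file is plain text, so its structure can be checked without Weka. The following is a minimal Python sketch using the standard `csv` module; the attribute names follow the customer table, while the one-letter names are placeholders and the incomes are written without thousands separators (an unquoted comma inside a value would be read as a separator).

```python
import csv
import io

# Hypothetical CSV text: a header row with the attribute names, then one
# instance per line. Names A-E are placeholders; incomes have no thousands
# separators because an unquoted comma would start a new value.
CSV_TEXT = """Customer_ID,Name,Sex,Age,Income
1,A,Male,25,12000
2,B,Female,18,7000
3,C,Female,35,35000
4,D,Female,15,4000
5,E,Female,300,20000
"""


def read_csv_rows(text):
    """Return (attribute names, list of instances) parsed from CSV text."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)                   # first row: attribute names
    rows = [row for row in reader if row]   # remaining rows: instances
    return header, rows


header, rows = read_csv_rows(CSV_TEXT)
print(header)     # ['Customer_ID', 'Name', 'Sex', 'Age', 'Income']
print(len(rows))  # 5
```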
12
Ex Customer Data
  • ARFF (Attribute-Relation File Format)
  • The file format designed for Weka
  • Consists of 2 parts
  • Header: contains information about
  • the name of the dataset (relation)
  • the name of each attribute
  • the data type of each attribute (data type)
  • Data: the values of each attribute,
    one instance per line

13
Ex Customer Data
  • Tags used in the header of an ARFF file
  • @relation <relation-name>
  • Gives the name of the dataset
  • @attribute <attribute-name> <data type>
  • Declares one attribute and its data type (one line per attribute)
  • Tags used in the data section of an ARFF file
  • @data
  • Marks the start of the data: each following line is one instance,
    with attribute values separated by commas in the declared order
  • Lines starting with % are comments
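To show how the header tags and the data section fit together, here is a minimal Python sketch of an ARFF reader. It is not Weka's loader: it assumes unquoted attribute names, dense (non-sparse) data, and no string or date attributes, and the `customer` relation is a made-up example.

```python
# Hypothetical ARFF text for a few customers.
ARFF_TEXT = """% Customer data (lines starting with % are comments)
@relation customer

@attribute Customer_ID numeric
@attribute Sex {Male,Female}
@attribute Age numeric
@attribute Income numeric

@data
1,Male,25,12000
2,Female,18,7000
3,Female,35,35000
"""


def parse_arff(text):
    """Split simple ARFF text into (relation name, attributes, data rows).

    Assumes unquoted names, dense data and no string/date attributes;
    this is an illustration, not a full ARFF reader.
    """
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue                      # blank line or comment
        keyword = line.lower()            # ARFF keywords are case-insensitive
        if in_data:
            # One instance per line, values in the declared attribute order.
            rows.append([v.strip() for v in line.split(",")])
        elif keyword.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif keyword.startswith("@attribute"):
            _, name, dtype = line.split(None, 2)   # "@attribute <name> <type>"
            attributes.append((name, dtype))
        elif keyword.startswith("@data"):
            in_data = True                # everything after this is data
    return relation, attributes, rows


relation, attributes, rows = parse_arff(ARFF_TEXT)
print(relation)    # customer
print(rows[0])     # ['1', 'Male', '25', '12000']
```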

14
Ex Customer Data
15
Ex Customer Data
  • Specifying the data type of an attribute
  • Numeric data
  • Integers or real numbers
  • Use the keyword numeric
  • Nominal (categorical) data
  • ???? ?????? ??????? ??????? ???
  • Declare the set of allowed values in braces, e.g.
    sex {0,1,2}
  • Exercise: create the customer data file in both CSV and ARFF format
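The type rules above can be applied automatically when converting CSV rows to ARFF. In this sketch the function `to_arff` is hypothetical, not part of Weka: it declares a column `numeric` if every known value parses as a number and nominal otherwise, listing the values seen in braces. `?` is treated as missing and left out of the type decision; names containing spaces would need quoting, which the sketch does not handle.

```python
def to_arff(relation, header, rows):
    """Build ARFF text from CSV-style rows of strings (hypothetical helper).

    A column is declared numeric when every known value parses as a number;
    otherwise it is declared nominal, listing the distinct values in braces.
    '?' marks a missing value and is ignored when choosing the type.
    """
    def is_number(value):
        try:
            float(value)
            return True
        except ValueError:
            return False

    lines = [f"@relation {relation}", ""]
    for i, name in enumerate(header):
        known = [row[i] for row in rows if row[i] != "?"]
        if all(is_number(v) for v in known):
            lines.append(f"@attribute {name} numeric")
        else:
            values = list(dict.fromkeys(known))   # distinct, first-seen order
            lines.append(f"@attribute {name} {{{','.join(values)}}}")
    lines += ["", "@data"]
    lines += [",".join(row) for row in rows]     # one instance per line
    return "\n".join(lines) + "\n"


header = ["Customer_ID", "Sex", "Age", "Income"]
rows = [["1", "Male", "25", "12000"], ["2", "Female", "18", "7000"]]
print(to_arff("customer", header, rows))
```

For these two rows the Sex column is declared as `@attribute Sex {Male,Female}` and the other three columns as `numeric`.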

16
Ex Weather Data
  • Open the file weather.arff in C:\Program
    Files\Weka-3-6\data
  • Weather conditions (outlook, temperature, humidity,
    windy) and whether a game was played (play)
  • Weather records for 14 days (14 instances)
  • Source: http://www.theweatherprediction.com/habyhints/285/

17
Ex Weather Data
  • Open weather.arff in a text editor

18
Ex Weather Data
  • Open Weka, choose Explorer -> click Open file ->
    select the file weather.arff

19
Ex Weather Data
  • The Preprocess tab offers Filters, for example:
  • Converting data types, e.g. converting
    attributes to nominal
  • Discretize: converts numeric (or real) data
    to nominal data
  • StringToNominal: converts string data
    to nominal data
    ?????????????????????
  • Handling missing values
  • ReplaceMissingValues: replaces missing values
  • Handling outliers
  • InterquartileRange: detects outliers and extreme
    values

20
Ex Weather Data
  • Discretization converts numeric (or real) data
    to nominal data

Discretize
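One simple way to discretize is equal-width binning: split the range [min, max] into k intervals of the same width and replace each value with the label of its interval. To our understanding this is the default strategy of Weka's unsupervised Discretize filter, but the sketch below is only an illustration: the bin count of 3, the `binN` labels, and the sample values are assumptions, and Weka names its bins by their interval boundaries.

```python
def equal_width_discretize(values, bins=3):
    """Replace numeric values with nominal labels 'bin1'..'binK'.

    The range [min, max] is cut into `bins` intervals of equal width.
    Label format and default bin count are illustrative assumptions.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    labels = []
    for v in values:
        if width == 0:
            index = 0                      # all values equal: one bin
        else:
            # The maximum would land in bin `bins`; clamp it into the last bin.
            index = min(int((v - lo) / width), bins - 1)
        labels.append(f"bin{index + 1}")
    return labels


temperatures = [64, 70, 75, 85]              # hypothetical sample values
print(equal_width_discretize(temperatures))  # ['bin1', 'bin1', 'bin2', 'bin3']
```

Here the range 64 to 85 is split into three intervals of width 7, so 70 still falls in the first bin and 85 is clamped into the last one.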
21
Ex Weather Data
  • Missing Value
  • ?????????????????? ??????????
  • ??????????????????????????
  • ??????????????????????????
  • ???????????????????????????????????????????
  • Example from the Customer data:

Customer_ID  Name    Sex     Age  Income
1            [name]  Male    25   12,000
2            [name]  Female  18   7,000
3            [name]  Female  35   35,000
4            [name]  Female  15   4,000
5            [name]  Female  300  20,000
22
Ex Weather Data
  • Using Replace missing value
  • Each missing value (?) is replaced with a substitute
    value
  • For a numeric attribute, the substitute is the mean
    of that attribute
  • For a nominal attribute, the substitute is the most
    frequent value (mode) of that attribute
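The replacement rule above (mean for numeric, mode for nominal) can be sketched in a few lines of Python. Applied to the Age and Sex columns of the customer table, it fills Age with (25 + 18 + 35 + 15) / 4 = 23.25 and Sex with Female. The function name `replace_missing` is hypothetical; Weka's ReplaceMissingValues filter processes every attribute of the dataset at once.

```python
from statistics import mean, mode

MISSING = "?"   # ARFF marks a missing value with '?'


def replace_missing(column, numeric):
    """Fill missing entries: mean for numeric columns, mode for nominal ones."""
    known = [v for v in column if v != MISSING]
    if numeric:
        fill = mean(float(v) for v in known)     # average of known values
        return [fill if v == MISSING else float(v) for v in column]
    fill = mode(known)                           # most frequent known value
    return [fill if v == MISSING else v for v in column]


age = ["25", "18", "35", "15", "?"]
sex = ["Male", "Female", "Female", "?", "Female"]
print(replace_missing(age, numeric=True))   # [25.0, 18.0, 35.0, 15.0, 23.25]
print(replace_missing(sex, numeric=False))  # ['Male', 'Female', 'Female', 'Female', 'Female']
```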

23
Ex Weather Data
  • Using Replace missing value

Customer_ID  Name    Sex     Age  Income
1            [name]  Male    25   12,000
2            [name]  Female  18   7,000
3            [name]  Female  35   35,000
4            [name]  ?       15   4,000
5            [name]  Female  ?    20,000
24
Ex Weather Data
  • Using Replace missing value

Customer_ID  Name    Sex     Age    Income
1            [name]  Male    25     12,000
2            [name]  Female  18     7,000
3            [name]  Female  35     35,000
4            [name]  Female  15     4,000
5            [name]  Female  23.25  20,000
25
Ex Weather Data
  • Edit the data in the file CustomerData.arff
  • Save it as CustomerData_wmissing.arff
  • Click Choose -> filters -> unsupervised -> attribute ->
    ReplaceMissingValues, then click Apply

26
Ex Weather Data
  • Detecting outliers
  • An outlier is a value that differs markedly from the
    rest of the data; it may be caused by factors such as
    noise
  • Outliers can be detected with the Interquartile range (IQR)
  • Example data containing outliers: customer_outlier.arff
  • Edit ????????????
  • Before detecting outliers, remove the attributes
    Customer_ID and Name
  • Click Choose -> filters -> unsupervised -> attribute ->
    InterquartileRange, then click Apply
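The IQR rule compares each value with fences built from the first and third quartiles (Q1, Q3). This Python sketch labels a value an outlier when it lies more than `outlier_factor` × IQR outside [Q1, Q3], and an extreme value beyond `extreme_factor` × IQR; the factors 3.0 and 6.0 are assumed to match the InterquartileRange filter's defaults. The quartile method (`statistics.quantiles` with `method="inclusive"`) may differ from Weka's, so borderline values can be labeled differently, and the sketch returns one label per value instead of adding attributes to the dataset.

```python
from statistics import quantiles


def iqr_flags(values, outlier_factor=3.0, extreme_factor=6.0):
    """Label each value 'normal', 'outlier' or 'extreme' with the IQR rule."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")  # quartiles
    iqr = q3 - q1
    flags = []
    for v in values:
        # Extreme: more than extreme_factor * IQR outside [Q1, Q3].
        if v > q3 + extreme_factor * iqr or v < q1 - extreme_factor * iqr:
            flags.append("extreme")
        # Outlier: beyond outlier_factor * IQR but not extreme.
        elif v > q3 + outlier_factor * iqr or v < q1 - outlier_factor * iqr:
            flags.append("outlier")
        else:
            flags.append("normal")
    return flags


ages = [25, 18, 35, 15, 300]
print(iqr_flags(ages))  # ['normal', 'normal', 'normal', 'normal', 'extreme']
```

For the customer ages, Q1 = 18 and Q3 = 35, so IQR = 17 and the extreme-value fence is 35 + 6 × 17 = 137; the age of 300 lies far beyond it.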

27
Memory Error
  • When Weka processes a large dataset, the memory it
    needs may exceed the memory available to it
  • Weka then reports an out-of-memory error
  • The memory needed can be estimated as follows
  • Approx_mem = number of attributes × number of
    instances × 8 bytes
  • Example: 10,000,000 instances with 10 attributes
    require
  • 10,000,000 × 10 × 8 = 800,000,000 bytes = 800 MB
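The estimate above is plain arithmetic. This sketch assumes, as the formula does, one 8-byte (double) value per attribute per instance and ignores any other overhead; the function name is hypothetical.

```python
def approx_mem_bytes(num_instances, num_attributes, bytes_per_value=8):
    """Approx_mem = attributes x instances x 8 bytes (one double per value)."""
    return num_attributes * num_instances * bytes_per_value


needed = approx_mem_bytes(10_000_000, 10)
print(needed)                  # 800000000
print(needed / 10**6, "MB")    # 800.0 MB
```

Note that 800,000,000 bytes is 800 MB in decimal units, or about 763 MiB.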

28
Memory Error
  • Try generating a large dataset with
    Generate
  • Set numExamples to 1,000,000 and click
    Generate

[Figure: click here to set the parameters]
29
Memory Error
  • ?????????????????????

30
Memory Error
  • Fix: increase the memory available to Weka
  • Edit C:\Program Files\Weka-3-6\RunWeka.ini
  • Change the value of maxheap to a larger amount
    of memory
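Changing maxheap is a one-line edit, which can be scripted when the same setup is repeated. This Python sketch assumes RunWeka.ini contains a line of the form `maxheap=<size>` (for example `maxheap=128m`) and rewrites only that line; the helper `set_maxheap` is hypothetical and works on the file's text, not on the file itself.

```python
import re

# Matches a line such as "maxheap=128m" (the format is an assumption),
# keeping any spacing around '=' and any line ending untouched.
MAXHEAP_LINE = re.compile(r"^([ \t]*maxheap[ \t]*=[ \t]*)[^\r\n]*", re.MULTILINE)


def set_maxheap(ini_text, new_value):
    """Return ini_text with the maxheap value replaced (hypothetical helper)."""
    updated, count = MAXHEAP_LINE.subn(lambda m: m.group(1) + new_value, ini_text)
    if count == 0:
        raise ValueError("no maxheap=... line found")
    return updated


sample = "# maximum heap size\nmaxheap=128m\n"
print(set_maxheap(sample, "1024m"))
```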