Title: Data mining exercise with SPSS Clementine Lab 2
1 Data mining exercise with SPSS Clementine, Lab 2
- Winnie Lam
- Email: cswinnie_at_comp.polyu.edu.hk
- Website: http://www.comp.polyu.edu.hk/cswinnie/
- The Hong Kong Polytechnic University
- Department of Computing
Last update: 22/09/2005
2 REVIEW
- KDD process
- Differences between nodes
- How to build streams in Clementine
- How to do data preparation with Clementine
- How to perform Association modeling
- Questions?
3 Practical Tryout
4 Data Understanding
Data file: http://www.comp.polyu.edu.hk/cswinnie/data/MyData_lab2.mdb
- Tables
- MyData_lab2a: Total no. of records: 1500; Total no. of products: 10
- MyData_lab2b: Total no. of records: 500; Total no. of products: 10
5 Data Understanding
Goal: Import data to Clementine
- Add Data Source (ODBC) in Control Panel
- Open Administrative Tools in Control Panel
- Add user data source
- Choose Microsoft Access Driver
- Fill in the Data Source Name and select the database file
6 Data Understanding
Goal: Import data to Clementine (cont.)
Add Node: Database (in Source Palette)
- Add new database connection
- Choose Data Source and Connect
- Select Table .MyData_lab2a
Repeat for the other table (MyData_lab2b)
7 Data Understanding
Goal: Combine the two sets of data
Add Node: Append (in Record Ops Palette)
Discover data characteristics with these nodes
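What the Append step does can be sketched outside Clementine in a few lines of Python. This is only an illustration: the records and field names below are hypothetical, and keeping only the fields of the first input is an assumption about how the node is configured.

```python
# Hypothetical records standing in for the two tables; field names are made up.
lab2a = [{"id": 1, "book_a": "T"}, {"id": 2, "book_a": "F"}]
lab2b = [{"id": 3, "book_a": "T", "extra": "x"}]


def append_records(*tables):
    """Stack record sets on top of each other, matching fields by name.

    Assumption: only the fields of the first (main) table are kept.
    """
    fields = list(tables[0][0].keys())
    combined = []
    for table in tables:
        for row in table:
            # Missing fields become None; fields not in the main table are dropped.
            combined.append({name: row.get(name) for name in fields})
    return combined


combined = append_records(lab2a, lab2b)
print(len(combined))
```

The result has one row per input record, so the record counts of the two tables simply add up.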
8 Data Understanding
Goal: Discover data characteristics
9 Data Understanding
Goal: Discover data characteristics
10 Data Understanding
Goal: Discover data characteristics
Double click
Result
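The slides do not name the nodes used for this step. One common check, sketched here in Python with hypothetical records, is to count how often each value occurs in a field. Inconsistent spellings of the same value show up immediately in such a count.

```python
from collections import Counter

# Hypothetical records; the mixed "T" / "true" spellings are deliberate.
records = [
    {"book_a": "T", "book_b": "F"},
    {"book_a": "true", "book_b": "F"},
    {"book_a": "T", "book_b": "T"},
]


def value_counts(rows, field):
    """Count how many records carry each distinct value of one field."""
    return Counter(row[field] for row in rows)


counts = value_counts(records, "book_a")
for value, count in counts.most_common():
    print(value, count)
```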
11 Data Preparation
12 Data Preparation
Goal: Unify values that mean the same thing under one symbol
Add Node: Filler (in Field Ops Palette)
Repeat the steps for False
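The idea behind the Filler step can be sketched in Python. The lists of spellings below are assumptions for illustration only; the slides do not say which variants occur in the lab data.

```python
# Assumed spellings of "true" and "false"; adjust to what the data contains.
TRUE_VARIANTS = {"t", "true", "yes", "y", "1"}
FALSE_VARIANTS = {"f", "false", "no", "n", "0"}


def unify_flag(value):
    """Map every known spelling of true/false to the single symbol T or F."""
    text = str(value).strip().lower()
    if text in TRUE_VARIANTS:
        return "T"
    if text in FALSE_VARIANTS:
        return "F"
    # Unknown values are left untouched so they can be inspected later.
    return value


raw = ["T", " True ", "yes", 0, "F", "maybe"]
cleaned = [unify_flag(v) for v in raw]
print(cleaned)
```

Handling True first and then repeating the same logic for False mirrors the two passes described on the slide.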
13 Data Preparation
Goal: Sampling (select 10% randomly)
Add Node: Sample (in Record Ops Palette)
Set max. size
Result: 1000 → 100 records (roughly 10%)
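A rough Python equivalent of random percentage sampling with a size cap is shown below. The function name, parameters, and seed are hypothetical. Because each record is kept independently with probability 0.10, the sample size is only roughly 10% of the input, which matches the "roughly" on the slide.

```python
import random


def sample_records(rows, fraction=0.10, max_size=None, seed=42):
    """Keep each record with probability `fraction`, stopping at `max_size`."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    sampled = []
    for row in rows:
        if max_size is not None and len(sampled) >= max_size:
            break
        if rng.random() < fraction:
            sampled.append(row)
    return sampled


rows = list(range(1000))
sample = sample_records(rows, fraction=0.10, max_size=150)
print(len(sample))
```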
14 Data Preparation
Goal: Save data in Cache to speed up the running process
- Right-click on the Sample node
- Choose Cache > Enable
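Why a cache speeds things up can be illustrated in Python with memoization. This is an analogy only, not how Clementine implements its node cache: the upstream work runs once, and later requests reuse the stored result.

```python
import functools

calls = {"upstream": 0}  # counts how often the expensive step really runs


@functools.lru_cache(maxsize=None)
def sampled_data():
    """Stand-in for the import, append, filler, and sample steps."""
    calls["upstream"] += 1
    return tuple(x for x in range(1000) if x % 10 == 0)


first = sampled_data()   # runs the upstream steps
second = sampled_data()  # served from the cache
print(calls["upstream"])
```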
15 Data Mining
16 Association
Goal: Define IN and OUT fields
Add Node: Type (in Field Ops Palette)
17 Association
Goal: Study relationships between attributes
Add Node: Web (in Graph Palette)
Result
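A web graph draws stronger links between values that occur together more often. The underlying counts can be sketched in Python as pairwise co-occurrence counts. The baskets below are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets; each set holds the products bought in one record.
baskets = [
    {"A", "B", "C"},
    {"A", "B"},
    {"B", "C"},
    {"A", "B", "D"},
]


def pair_counts(transactions):
    """Count how many records contain each pair of products."""
    counts = Counter()
    for basket in transactions:
        # sorted() gives every pair one canonical order, e.g. ("A", "B").
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


links = pair_counts(baskets)
strongest = links.most_common(1)[0]
print(strongest)
```

Pairs with high counts correspond to the thick lines in the graph and are natural candidates for association rules.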
18 Association
Goal: Perform Association modeling
Add Node: Apriori (in Modeling Palette)
We don't need to select the Consequents and Antecedents, because we have already defined them in the Type node.
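For readers who want to see what Apriori does, here is a minimal Python sketch. It is a teaching version, not Clementine's implementation. It assumes every item may appear on either side of a rule, restricts rules to a single-item consequent, and uses hypothetical baskets and thresholds.

```python
from itertools import combinations


def apriori(transactions, min_support=0.5):
    """Return {itemset: support} for every frequent itemset."""
    n = len(transactions)
    items = {item for t in transactions for item in t}
    current = [frozenset([i]) for i in items]
    frequent = {}
    k = 1
    while current:
        # Keep the candidates that meet the minimum support.
        level = {}
        for cand in current:
            support = sum(1 for t in transactions if cand <= t) / n
            if support >= min_support:
                level[cand] = support
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-item candidates.
        k += 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune candidates that have an infrequent subset.
        current = [c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent


def rules(frequent, min_confidence=0.7):
    """List (antecedent, consequent, rule_support, confidence) tuples."""
    found = []
    for itemset, supp in frequent.items():
        if len(itemset) < 2:
            continue
        for item in itemset:
            antecedent = itemset - {item}
            confidence = supp / frequent[antecedent]
            if confidence >= min_confidence:
                found.append((antecedent, item, supp, confidence))
    return found


baskets = [{"bread", "milk"}, {"bread", "butter", "milk"},
           {"bread", "butter"}, {"milk"}]
found = rules(apriori(baskets, min_support=0.5), min_confidence=0.7)
print(found)
```

Lowering `min_support` or `min_confidence` lets more rules through; raising them keeps only the strongest ones.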
19 Association
- Try adjusting the support value and the confidence value
- Other than Apriori, you can try other association modeling nodes, such as GRI and CARMA
20 Quiz
Find a 4-item interesting rule with the highest "Rule Support" and "Lift Ratio" in the dataset.
Hint: Data preprocessing is needed.
Dataset: TestData in MyData_lab2.mdb
Interesting rule: IF Buy book(Storyteller's Daughter) AND Buy book(Harry Potter and the Half-Blood Prince) AND Buy book(A Million Little Pieces) THEN Buy book(Undead and Unreturnable)
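To compare candidate rules, it helps to compute the measures by hand. The Python sketch below uses the usual textbook definitions, which are assumed here to match the tool's labels. Rule support is the share of records containing the whole rule, confidence is rule support divided by antecedent support, and lift is confidence divided by the consequent's overall frequency. The purchases are hypothetical, not taken from TestData.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (rule_support, confidence, lift) for IF antecedent THEN consequent."""
    n = len(transactions)
    antecedent = set(antecedent)
    n_ante = sum(1 for t in transactions if antecedent <= t)
    n_cons = sum(1 for t in transactions if consequent in t)
    n_both = sum(1 for t in transactions
                 if antecedent <= t and consequent in t)
    rule_support = n_both / n
    confidence = n_both / n_ante if n_ante else 0.0
    lift = confidence / (n_cons / n) if n_cons else 0.0
    return rule_support, confidence, lift


# Hypothetical purchases: a 4-item rule has three antecedents and one consequent.
purchases = [
    {"A", "B", "C", "D"}, {"A", "B", "C", "D"}, {"A", "B", "C"}, {"D"},
    {"A"}, {"B", "D"}, {"C"}, {"A", "B"},
]
support, confidence, lift = rule_metrics(purchases, {"A", "B", "C"}, "D")
print(support, round(confidence, 3), round(lift, 3))
```

A lift above 1 means the antecedents make the consequent more likely than it is in the data overall.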
21 SUMMARY
- Today, you've learnt:
- How to analyze another type of data source
- How to combine data from different sources
- How to do random sampling
- How to unify data values
- How to discover relationships between attributes
- How to perform Association modeling