Data mining exercise with SPSS Clementine Lab 2

Slides: 22

Provided by: LAP7

1
Data mining exercise with SPSS Clementine Lab 2

Winnie Lam
Email: cswinnie_at_comp.polyu.edu.hk
Website: http://www.comp.polyu.edu.hk/cswinnie/
The Hong Kong Polytechnic University
Department of Computing

Last update: 22/09/2005
2
REVIEW

KDD process
Differences between nodes
How to build streams in Clementine
How to do data preparation with Clementine
How to perform Association modeling
Questions?

3
Practical Tryout
4
Data Understanding
Data file: http://www.comp.polyu.edu.hk/cswinnie/data/MyData_lab2.mdb

Tables:
MyData_lab2a: 1500 records in total, 10 products
MyData_lab2b: 500 records in total, 10 products
5
Data Understanding
Goal: Import data into Clementine

Add an ODBC data source in Control Panel:

Open Administrative Tools in Control Panel
Add a user data source
Choose the Microsoft Access Driver
Fill in the data source name and select the database file

6
Data Understanding
Import data into Clementine (cont.)
Add node: Database (in Source palette)

Add a new database connection
Choose the data source and connect
Select table .MyData_lab2a

Repeat for the other table (MyData_lab2b)
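Outside Clementine, the same import step can be sketched in Python. This is a minimal sketch, not part of the lab: it uses an in-memory SQLite database (Python's built-in `sqlite3`) as a stand-in for the Access file behind the ODBC data source, and the column names `id`, `book_a` and `book_b` are hypothetical. With a real ODBC source, a DB-API driver such as `pyodbc` would take the place of `sqlite3`.

```python
import sqlite3


def load_table(conn, table):
    """Read every row of `table` into a list of dicts keyed by column name."""
    # Table names cannot be passed as SQL parameters, so accept plain identifiers only.
    if not table.isidentifier():
        raise ValueError(f"unexpected table name: {table!r}")
    cur = conn.execute(f"SELECT * FROM {table}")
    columns = [desc[0] for desc in cur.description]
    return [dict(zip(columns, row)) for row in cur.fetchall()]


# Hypothetical stand-in for MyData_lab2.mdb: two small tables with flag columns.
conn = sqlite3.connect(":memory:")
for name in ("MyData_lab2a", "MyData_lab2b"):
    conn.execute(f"CREATE TABLE {name} (id INTEGER, book_a TEXT, book_b TEXT)")
conn.executemany("INSERT INTO MyData_lab2a VALUES (?, ?, ?)", [(1, "T", "F"), (2, "T", "T")])
conn.executemany("INSERT INTO MyData_lab2b VALUES (?, ?, ?)", [(3, "F", "T")])

table_a = load_table(conn, "MyData_lab2a")
table_b = load_table(conn, "MyData_lab2b")
print(len(table_a), len(table_b))  # 2 1
```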
7
Data Understanding
Goal: Combine the two sets of data
Add node: Append (in Record Ops palette)
Discover data characteristics with these nodes
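Appending can be sketched as stacking one record list after another, roughly what the Append node does when fields are matched by name. This is a minimal sketch under assumptions: records are dicts, fields from all inputs are kept, and a record that lacks a field gets `None` (standing in for a missing value). The field names are hypothetical. Applied to the two lab tables, 1500 + 500 records would give 2000 records.

```python
def append(main, *others):
    """Stack record lists one after another, matching fields by name.

    The output has every field seen in any input; a record that lacks a
    field gets None for it (standing in for a missing value).
    """
    inputs = (main, *others)
    # dict.fromkeys keeps the first-seen order of field names without duplicates.
    fields = list(dict.fromkeys(key for records in inputs for record in records for key in record))
    return [{key: record.get(key) for key in fields} for records in inputs for record in records]


# Hypothetical records; the second table has an extra field.
table_a = [{"id": 1, "book_a": "T"}, {"id": 2, "book_a": "F"}]
table_b = [{"id": 3, "book_a": "T", "book_c": "T"}]
combined = append(table_a, table_b)
print(len(combined))  # 3
print(combined[0])    # {'id': 1, 'book_a': 'T', 'book_c': None}
```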
8
Data Understanding
Goal: Discover data characteristics
9
Data Understanding
Goal: Discover data characteristics
10
Data Understanding
Goal: Discover data characteristics
Double-click to view the result
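A rough Python analogue of inspecting a field's value distribution: counting how often each value occurs quickly exposes inconsistent symbols and missing values before any cleaning. This is a minimal sketch assuming records are dicts; the field name and the values (`T` and `Y` both meaning true) are hypothetical.

```python
from collections import Counter


def value_counts(records, field):
    """Count how often each value of `field` occurs, including missing values (None)."""
    return Counter(record.get(field) for record in records)


# Hypothetical flag column in which "true" is written in two different ways.
records = [{"book_a": "T"}, {"book_a": "Y"}, {"book_a": "F"}, {"book_a": "T"}, {"book_a": None}]
for value, count in value_counts(records, "book_a").most_common():
    print(value, count)
```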
11
Data Preparation
12
Data Preparation
Goal: Unify values that mean the same thing under a single symbol
Add node: Filler (in Field Ops palette), first for the True values
Repeat the steps for False
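The Filler step can be sketched in Python as a function that maps every spelling of true or false to one symbol. The variant spellings listed (such as `Y`, `yes` or `1`) are hypothetical; the actual ones have to be read off the data. The slide handles True and False in two passes with the Filler node, while this sketch does both in one function.

```python
# Hypothetical spellings found in the raw data; replace with the ones you actually see.
TRUE_VARIANTS = {"T", "Y", "YES", "TRUE", "1"}
FALSE_VARIANTS = {"F", "N", "NO", "FALSE", "0"}


def unify_flag(value):
    """Map the different spellings of a boolean to one symbol, 'T' or 'F'."""
    text = str(value).strip().upper()
    if text in TRUE_VARIANTS:
        return "T"
    if text in FALSE_VARIANTS:
        return "F"
    return value  # anything unrecognised (including None) is left unchanged


def fill_fields(records, fields):
    """Return copies of the records with unify_flag applied to the given fields."""
    return [
        {key: (unify_flag(val) if key in fields else val) for key, val in record.items()}
        for record in records
    ]


records = [{"id": 1, "book_a": "yes"}, {"id": 2, "book_a": 0}, {"id": 3, "book_a": "T"}]
cleaned = fill_fields(records, {"book_a"})
print([r["book_a"] for r in cleaned])  # ['T', 'F', 'T']
```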
13
Data Preparation
Goal: Sampling (randomly select 10% of the records)
Add node: Sample (in Record Ops palette)
1000 records → 100 records
Result: roughly 10%
Set the maximum sample size
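The "roughly 10%" result is what one would expect if each record is kept independently with probability 10%: for 1000 records the sample size then varies around 100 instead of being exactly 100. The sketch below samples that way. Whether Clementine's Sample node draws exactly like this, and how it enforces the maximum size, is an assumption; here the sample simply stops growing once `max_size` records have been kept (which favours earlier records).

```python
import random


def random_percent_sample(records, percent, max_size=None, seed=None):
    """Keep each record independently with probability percent / 100.

    The sample size is therefore only roughly percent% of the input. If
    max_size is given, sampling stops once that many records are kept.
    """
    rng = random.Random(seed)  # a seed makes the sample reproducible
    sample = []
    for record in records:
        if rng.random() < percent / 100:
            sample.append(record)
            if max_size is not None and len(sample) >= max_size:
                break
    return sample


records = list(range(1000))
sample = random_percent_sample(records, 10, seed=42)
print(len(sample))  # close to 100, but usually not exactly 100
```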
14
Data Preparation
Goal: Cache the data to speed up running the stream

Right-click on the Sample node
Choose Cache > Enable
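Why a cache speeds things up can be sketched with a small wrapper that runs its upstream computation once and hands back the stored result afterwards. `CachedNode` and its `flush` method are hypothetical names used only for illustration, not Clementine's API. One consequence visible in the sketch: because the sampled records are stored, repeated runs see the same sample until the cache is flushed.

```python
import random


class CachedNode:
    """Wrap an upstream computation so it runs once and its output is reused."""

    def __init__(self, compute):
        self._compute = compute  # zero-argument function producing the node's output
        self._data = None
        self._filled = False
        self.runs = 0  # how many times the upstream computation actually ran

    def output(self):
        # Only run the upstream steps when the cache is empty.
        if not self._filled:
            self._data = self._compute()
            self._filled = True
            self.runs += 1
        return self._data

    def flush(self):
        """Empty the cache so the next output() call recomputes the data."""
        self._data = None
        self._filled = False


rng = random.Random(0)
# Hypothetical upstream work: draw a roughly 10% random sample of 1000 record ids.
sample_node = CachedNode(lambda: [i for i in range(1000) if rng.random() < 0.1])
first = sample_node.output()
second = sample_node.output()
print(sample_node.runs, first is second)  # 1 True
```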

15
Data Mining

Association

16
Association
Goal: Define IN and OUT fields
Add node: Type (in Field Ops palette)
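The effect of the Type node's field directions on association rules can be sketched as a lookup table. Assumptions: the directions used are IN, OUT, BOTH and NONE; IN fields may appear only in the IF part of a rule, OUT fields only in the THEN part, BOTH fields on either side, and NONE fields (such as an ID) are ignored. The field names and their assignment are hypothetical.

```python
# Hypothetical field directions, mirroring the Type node's IN / OUT / BOTH / NONE settings.
DIRECTIONS = {
    "id": "NONE",  # identifiers should not take part in rules
    "book_a": "BOTH",
    "book_b": "BOTH",
    "book_c": "IN",
    "book_d": "OUT",
}


def antecedent_fields(directions):
    """Fields allowed on the IF side of a rule."""
    return sorted(f for f, d in directions.items() if d in ("IN", "BOTH"))


def consequent_fields(directions):
    """Fields allowed on the THEN side of a rule."""
    return sorted(f for f, d in directions.items() if d in ("OUT", "BOTH"))


print(antecedent_fields(DIRECTIONS))  # ['book_a', 'book_b', 'book_c']
print(consequent_fields(DIRECTIONS))  # ['book_a', 'book_b', 'book_d']
```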
17
Association
Goal: Study relationships between attributes
Add node: Web (in Graph palette)
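A web graph links values that occur together, and a link's strength can be thought of as the number of records in which both values occur. A minimal sketch of that count for flag fields, assuming records are dicts, `T` is the true value, and the field names are hypothetical:

```python
from collections import Counter
from itertools import combinations


def co_occurrence(records, fields, true_value="T"):
    """Count, for every pair of flag fields, the records in which both are true.

    Each pair corresponds to one link in a web graph; a higher count means a stronger link.
    """
    links = Counter()
    for record in records:
        present = [f for f in fields if record.get(f) == true_value]
        for pair in combinations(sorted(present), 2):
            links[pair] += 1
    return links


records = [
    {"book_a": "T", "book_b": "T", "book_c": "F"},
    {"book_a": "T", "book_b": "T", "book_c": "T"},
    {"book_a": "F", "book_b": "T", "book_c": "F"},
]
links = co_occurrence(records, ["book_a", "book_b", "book_c"])
print(links.most_common(1))  # [(('book_a', 'book_b'), 2)]
```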
18
Association
Goal: Perform association modeling
Add node: Apriori (in Modeling palette)
We don't need to select the consequents and antecedents, because we have already defined them in the Type node.
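To show what an Apriori model computes in principle, here is a compact sketch of the classic algorithm: a level-wise search for frequent itemsets followed by rule generation. It is not Clementine's implementation. Items are plain strings, consequents are limited to a single item, and the baskets are made up for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support.

    Support is the fraction of transactions that contain all items of the itemset.
    """
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    frequent = {}
    level = {frozenset([item]) for t in transactions for item in t}  # 1-item candidates
    k = 1
    while level:
        # Keep the size-k candidates that meet the minimum support.
        current = {s: support(s) for s in level}
        current = {s: v for s, v in current.items() if v >= min_support}
        frequent.update(current)
        k += 1
        # Join: unions of two frequent (k-1)-itemsets that contain exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        level = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
    return frequent


def rules(frequent, min_confidence):
    """Yield (antecedent, consequent, rule_support, confidence, lift) with a one-item consequent."""
    for itemset, rule_support in frequent.items():
        if len(itemset) < 2:
            continue
        for item in itemset:
            consequent = frozenset([item])
            antecedent = itemset - consequent
            # Subsets of a frequent itemset are frequent, so both lookups succeed.
            confidence = rule_support / frequent[antecedent]
            if confidence >= min_confidence:
                lift = confidence / frequent[consequent]
                yield antecedent, consequent, rule_support, confidence, lift


# Hypothetical baskets; each set lists the books bought in one record.
baskets = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"}]
frequent = apriori(baskets, min_support=0.4)
for ante, cons, sup, conf, lift in rules(frequent, min_confidence=0.6):
    print(sorted(ante), "=>", sorted(cons), f"support={sup:.2f} confidence={conf:.2f} lift={lift:.2f}")
```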
19
Association

Try adjusting the values of:
support
confidence

Besides Apriori, you can also try other association modeling nodes, such as:
GRI
CARMA
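For reference when adjusting the thresholds, the usual textbook definitions for a rule $X \Rightarrow Y$ over $N$ records are given below, where $r$ ranges over the records, each viewed as the set of items it contains. Since both thresholds act as filters, raising either one can only remove rules, and lowering it can only add rules. As far as we know, the column Clementine labels "Support" for Apriori rules is the support of the antecedent $X$ alone, while "Rule Support" corresponds to $\operatorname{supp}(X \cup Y)$; this is worth checking against the documentation for the version in use.

```latex
\begin{aligned}
\operatorname{supp}(X) &= \frac{\lvert \{\, r : X \subseteq r \,\} \rvert}{N} \\
\text{rule support}(X \Rightarrow Y) &= \operatorname{supp}(X \cup Y) \\
\operatorname{conf}(X \Rightarrow Y) &= \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)} \\
\operatorname{lift}(X \Rightarrow Y) &= \frac{\operatorname{conf}(X \Rightarrow Y)}{\operatorname{supp}(Y)}
\end{aligned}
```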

20
Quiz
Find a 4-item interesting rule with the highest "Rule Support" and "Lift Ratio" in the dataset. Hint: data preprocessing is needed.
Dataset: TestData in MyData_lab2.mdb
Interesting rule: IF Buy book(Storyteller's Daughter) AND Buy book(Harry Potter and the Half-Blood Prince) AND Buy book(A Million Little Pieces) THEN Buy book(Undead and Unreturnable)
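A 4-item rule has three IF items and one THEN item, like the rule above. Given candidate rules as hypothetical `(antecedent, consequent, rule_support, confidence, lift)` tuples, the selection can be sketched as a filter followed by a sort. Reading "highest Rule Support and Lift Ratio" as "sort by rule support, break ties by lift" is one interpretation among several, and the rule values below are made up for illustration.

```python
def best_four_item_rules(rules, top=5):
    """Keep rules with exactly four items in total and rank them.

    Rules are sorted by rule support, with lift used to break ties.
    """
    four_item = [r for r in rules if len(r[0]) + len(r[1]) == 4]
    return sorted(four_item, key=lambda r: (r[2], r[4]), reverse=True)[:top]


# Hypothetical rules: (antecedent, consequent, rule_support, confidence, lift).
candidate_rules = [
    (frozenset({"A", "B", "C"}), frozenset({"D"}), 0.05, 0.80, 2.1),
    (frozenset({"A", "B", "E"}), frozenset({"D"}), 0.05, 0.70, 2.6),
    (frozenset({"A", "B"}), frozenset({"D"}), 0.09, 0.75, 1.9),  # only 3 items, dropped
    (frozenset({"B", "C", "E"}), frozenset({"A"}), 0.03, 0.90, 3.0),
]
best = best_four_item_rules(candidate_rules)
for ante, cons, sup, conf, lift in best:
    print(sorted(ante), "=>", sorted(cons), sup, lift)
```

Here the `A, B, E => D` rule ranks first: it ties with `A, B, C => D` on rule support but has the higher lift.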
21
SUMMARY

Today, you've learnt:
How to analyze another type of data source
How to combine data from different sources
How to do random sampling
How to unify data values
How to discover relationships between attributes
How to perform association modeling
