Data mining exercise with SPSS Clementine Lab 2

1
Data mining exercise with SPSS Clementine Lab 2
  • Winnie Lam
  • Email: cswinnie_at_comp.polyu.edu.hk
  • Website: http://www.comp.polyu.edu.hk/cswinnie/
  • The Hong Kong Polytechnic University
  • Department of Computing

Last update: 22/09/2005
2
REVIEW
  • KDD process
  • Differences between nodes
  • How to build streams in Clementine
  • How to do data preparation with Clementine
  • How to perform Association modeling
  • Questions?

3
Practical Tryout
4
Data Understanding
Data file: http://www.comp.polyu.edu.hk/cswinnie/data/MyData_lab2.mdb
  • Tables: MyData_lab2a, MyData_lab2b

Total no. of records: 1500. Total no. of products: 10.
Total no. of records: 500. Total no. of products: 10.
5
Data Understanding
Goal: Import data to Clementine
  • Add Data Source (ODBC) in Control Panel
  • Open Administrative Tools in Control Panel
  • Add user data source
  • Choose Microsoft Access Driver
  • Fill in the Data Source Name and select the database file

6
Data Understanding
Import data to Clementine (cont.)
Add node: Database (in Source Palette)
  • Add new database connection
  • Choose Data Source and Connect
  • Select Table .MyData_lab2a

Repeat for the other table
7
Data Understanding
Goal: Combine the two sets of data
Add node: Append (in Record Ops Palette)
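As a rough illustration of what appending does, the sketch below stacks the records of two tables one after another. It is plain Python, not Clementine, and the field names and rows are made up for the example; they are not taken from MyData_lab2a or MyData_lab2b.

```python
# Hypothetical rows standing in for the two lab tables (invented for illustration).
table_a = [{"id": 1, "milk": "T"}, {"id": 2, "milk": "F"}]
table_b = [{"id": 3, "milk": "T"}]


def append_records(*tables):
    """Stack the rows of several tables that share the same fields."""
    combined = []
    for table in tables:
        combined.extend(table)
    return combined


combined = append_records(table_a, table_b)
print(len(combined))  # row count of the combined data
```

The combined set simply has as many rows as the two inputs together, which is a quick check to make after an append step.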
Discover data characteristics with these nodes
8
Data Understanding
Goal: Discover data characteristics
9
Data Understanding
Goal: Discover data characteristics
10
Data Understanding
Goal: Discover data characteristics
Double click
Result
11
Data Preparation
12
Data Preparation
Goal: Unify values that mean the same thing under one symbol
Add node: Filler (in Field Ops Palette)
Repeat the steps for False
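The idea behind this step can be sketched in Python as a small mapping from inconsistent spellings to one symbol per truth value. The spellings listed here are assumptions for illustration; the slides do not list the variants that actually occur in the lab data.

```python
# Assumed spellings; the real variants in the lab data are not given in the slides.
TRUE_VARIANTS = {"t", "true", "yes", "y", "1"}
FALSE_VARIANTS = {"f", "false", "no", "n", "0"}


def unify_flag(value):
    """Map any known spelling of true/false to the single symbol 'T' or 'F'."""
    text = str(value).strip().lower()
    if text in TRUE_VARIANTS:
        return "T"
    if text in FALSE_VARIANTS:
        return "F"
    return value  # leave unrecognised values untouched


cleaned = [unify_flag(v) for v in ["True", "t", "NO", "F", "?"]]
print(cleaned)  # ['T', 'T', 'F', 'F', '?']
```

Leaving unknown values unchanged, rather than guessing, makes leftover inconsistencies easy to spot afterwards.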
13
Data Preparation
Goal: Sampling (select 10% randomly)
Add node: Sample (in Record Ops Palette)
1000 → 100 records
Result
Roughly 10%
Set max. size
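A minimal Python sketch of percentage sampling with a size cap follows. It assumes each record is kept independently with the given probability, which is why the result is only roughly 10%, and that sampling stops once the cap is reached. Whether the Sample node works exactly this way is an assumption; the function name and parameters are hypothetical.

```python
import random


def sample_records(records, fraction=0.10, max_size=None, seed=None):
    """Keep each record with probability `fraction`, up to `max_size` records."""
    rng = random.Random(seed)  # a seed makes the sample repeatable
    sampled = []
    for record in records:
        if max_size is not None and len(sampled) >= max_size:
            break  # cap reached: later records are never considered
        if rng.random() < fraction:
            sampled.append(record)
    return sampled


records = list(range(1000))
subset = sample_records(records, fraction=0.10, max_size=150, seed=42)
print(len(subset))  # close to 100, but not exactly 100 in general
```

Note that a cap set too low favours records near the start of the data, so it is best kept comfortably above the expected sample size.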
14
Data Preparation
Goal: Save data in cache to speed up the running process
  • Right click on the Sample node
  • Choose Cache > Enable

15
Data Mining
  • Association

16
Association
Goal: Define IN and OUT fields
Add node: Type (in Field Ops Palette)
17
Association
Goal: Study relationships between attributes
Add node: Web (in Graph Palette)
Result
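A web-style graph summarizes how often pairs of values occur together. The Python sketch below counts those co-occurrences for a handful of made-up baskets; the items and baskets are invented for illustration and do not come from the lab data.

```python
from collections import Counter
from itertools import combinations

# Invented baskets, one set of purchased items per customer.
baskets = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"milk", "eggs"},
    {"bread"},
]


def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives every pair one canonical (alphabetical) order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


for pair, count in pair_counts(baskets).most_common():
    print(pair, count)
```

Pairs with high counts correspond to the strong links one would look for in the graph before moving on to association modeling.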
18
Association
Goal: Perform Association modeling
Add node: Apriori (in Modeling Palette)
We don't need to select the consequents and antecedents, because we have already defined them in the Type node.
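For intuition about what Apriori does internally, here is a compact Python sketch of its frequent-itemset search on made-up baskets. It is a teaching sketch, not Clementine's implementation: it stops at frequent itemsets and does not generate rules, and all names and data are invented for the example.

```python
from itertools import combinations


def apriori(baskets, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    baskets = [frozenset(b) for b in baskets]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item of the itemset.
        return sum(itemset <= b for b in baskets) / n

    items = {item for b in baskets for item in b}
    current = {frozenset([i]) for i in items}
    current = {s for s in current if support(s) >= min_support}
    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s)
        k += 1
        # Join: merge frequent (k-1)-itemsets into candidates of size k.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


toy_baskets = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"milk", "eggs"},
    {"bread", "milk"},
]
frequent = apriori(toy_baskets, min_support=0.5)
for itemset, sup in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

The prune step is what keeps the search manageable: an itemset can only be frequent if all of its smaller subsets are.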
19
Association
  • Try adjusting different values:
    • support value
    • confidence value
  • Other than Apriori, you can try other association modeling nodes, like:
    • GRI
    • CARMA
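To see what the support and confidence settings act on, the Python sketch below computes the usual measures for a single rule on made-up baskets. It assumes the common textbook definitions; the exact names and definitions Clementine uses for its thresholds may differ, and the function and data here are hypothetical.

```python
def rule_metrics(baskets, antecedent, consequent):
    """Compute rule support, confidence and lift for antecedent -> consequent."""
    antecedent, consequent = frozenset(antecedent), frozenset(consequent)
    n = len(baskets)
    n_ante = sum(antecedent <= set(b) for b in baskets)
    n_cons = sum(consequent <= set(b) for b in baskets)
    n_both = sum((antecedent | consequent) <= set(b) for b in baskets)
    # Share of baskets where the whole rule holds.
    rule_support = n_both / n
    # Share of antecedent baskets that also contain the consequent.
    confidence = n_both / n_ante if n_ante else 0.0
    # Confidence relative to how common the consequent is overall.
    lift = confidence / (n_cons / n) if n_cons else 0.0
    return {"rule_support": rule_support, "confidence": confidence, "lift": lift}


example_baskets = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"milk", "eggs"},
    {"bread", "milk"},
]
metrics = rule_metrics(example_baskets, {"eggs"}, {"bread"})
print(metrics)
```

Raising the minimum support or confidence discards rules whose measures fall below the threshold, so fewer but stronger rules remain; a lift above 1 suggests the antecedent makes the consequent more likely than it is overall.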

20
Quiz
Find a 4-item interesting rule with the highest "Rule Support" and "Lift Ratio" in the dataset.
Hint: Data preprocessing is needed.
Dataset: TestData in MyData_lab2.mdb
Interesting rule: IF Buy book(Storyteller's Daughter) AND Buy book(Harry Potter and the Half-Blood Prince) AND Buy book(A Million Little Pieces) THEN Buy book(Undead and Unreturnable)
21
SUMMARY
  • Today, you've learnt:
    • How to analyze another type of data source
    • How to combine data from different sources
    • Random sampling
    • How to unify data values
    • How to discover relationships between attributes
    • How to perform Association modeling