Title: Data mining exercise with SPSS Clementine Lab 4
1. Data mining exercise with SPSS Clementine Lab 4
- Winnie Lam
- Email: cswinnie_at_comp.polyu.edu.hk
- Website: http://www.comp.polyu.edu.hk/cswinnie/
- The Hong Kong Polytechnic University
- Department of Computing
Last update: 22/09/2005
2. Data mining process
- Data Understanding: define the target and discover useful data
- Data Preprocessing: obtain clean, useful data
- Modeling (Data Mining): discover patterns
3. Modeling Tools: Clustering
- K-means. An approach to clustering that defines k
clusters and iteratively assigns records to
clusters based on distances from the mean of each
cluster until a stable solution is found.
- TwoStep. A clustering method that involves
preclustering the records into a large number of
subclusters and then applying a hierarchical
clustering technique to those subclusters to
define the final clusters.
- Kohonen Networks. A type of neural network used
for clustering. Also known as a self-organizing
map (SOM).
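The K-means description above can be sketched in a few lines of Python. This is a minimal illustration and not how Clementine implements the node: the function name `kmeans`, the sample points, and the choice of the first k records as initial centres are all assumptions made for the example.

```python
import math

def kmeans(points, k, max_iter=100):
    """Plain k-means on a list of equal-length numeric tuples."""
    # Assumption: the first k points serve as the initial cluster centres.
    centres = [tuple(p) for p in points[:k]]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: attach each record to its nearest centre.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centres[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # stable solution: no record changed cluster
        assignment = new_assignment
        # Update step: move each centre to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centre if a cluster is empty
                centres[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment, centres

# Hypothetical two-dimensional records forming three obvious groups.
points = [(1.0, 1.0), (5.0, 5.0), (9.0, 1.0), (1.2, 0.8), (5.2, 4.8), (9.2, 1.2)]
labels, centres = kmeans(points, k=3)
```

The result depends on the initial centres, so a different record order may give a different (and possibly worse) grouping.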
4. Classification
With predefined classes!
5. Clustering
No class is defined beforehand!
[Figure: example with k = 4 clusters, labelled CROSS, STAR, Red TRIANGLE and Yellow TRIANGLE]
6. Data Understanding
- Data file is located at
- http://www.comp.polyu.edu.hk/cswinnie/lab.html
7. Data Understanding
- Given data file: MyData_lab4.mdb
Answer by yourself:
1. How many tables are there?
2. How many attributes are in each table?
3. How many records are in each table?
4. Which field(s) contain(s) dirty data (e.g. invalid or missing values)?
Hint: ODBC, Table, Data Audit, Quality, Type
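Outside Clementine, the same audit questions (number of records, number of attributes, fields with missing values) can be answered with a short script. The sketch below is only an illustration: the function `audit` and the sample rows are hypothetical, and the rows would in practice be read from the database rather than typed in.

```python
def audit(rows):
    """Return record count, attribute count, and blank count per field."""
    fields = list(rows[0].keys()) if rows else []
    missing = {f: 0 for f in fields}
    for row in rows:
        for f in fields:
            value = row.get(f)
            # Treat None and empty or whitespace-only strings as missing.
            if value is None or str(value).strip() == "":
                missing[f] += 1
    return {"records": len(rows), "attributes": len(fields), "missing": missing}

# Hypothetical sample rows; the values are made up for illustration.
sample = [
    {"TID": 1, "dt": "2005-09-22 14:30", "prod_cd": "A"},
    {"TID": 2, "dt": None, "prod_cd": "B"},
    {"TID": 3, "dt": "2005-09-23 09:10", "prod_cd": ""},
]
report = audit(sample)
```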
8. Data Preparation
- Data Cleaning
- Data Transformation
9. Data Cleaning
Goal: Discard all records with blank/null values
Useful Nodes: Filler and Type (in Field Ops Palette)
Answer by yourself: How many records are left?
Result: 1241 → 1195 records
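The cleaning goal above amounts to keeping only records in which every field has a value. As a minimal sketch, assuming records are held as Python dictionaries (the helper names and sample rows are hypothetical, not part of the lab data):

```python
def is_blank(value):
    """True for None and for empty or whitespace-only strings."""
    return value is None or str(value).strip() == ""

def drop_incomplete(rows):
    """Keep only the records in which no field is blank."""
    return [row for row in rows if not any(is_blank(v) for v in row.values())]

# Hypothetical records; only the first one is complete.
rows = [
    {"TID": 1, "dt": "2005-09-22 14:30", "prod_cd": "A"},
    {"TID": 2, "dt": None, "prod_cd": "B"},
    {"TID": 3, "dt": "2005-09-23 09:10", "prod_cd": "  "},
]
clean = drop_incomplete(rows)
```

Comparing `len(rows)` with `len(clean)` gives the before and after record counts.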
10. Data Transformation
Goal: Merge tables lab4 and Shop_Info
Add Node: Merge (in Record Ops Palette)
Tables and their fields:
- lab4: TID, dt, gp1, gp2, ref_no, cl, prod_cd
- link: TID, SHOP_CD
- Shop_Info: dist_cd, shop_cd, staffs, manager, Area
Answer by yourself: What is/are the key(s) for
merging?
11. Data Transformation
Goal: Merge tables lab4 and Shop_Info
Add Node: Merge (in Record Ops Palette)
- Step 1
- Merge table lab4 and link
[Figure: result of Step 1, showing the field from link]
12. Data Transformation
Goal: Merge tables lab4, link and Shop_Info
Add Node: Merge (in Record Ops Palette)
- Step 2
- Merge the result of Step 1 with table Shop_Info
[Figure: result of Step 2, showing the fields from Shop_Info]
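The two-step merge can be pictured as two joins chained together. The sketch below is an assumption-laden illustration: it assumes the keys are TID (lab4 to link) and SHOP_CD/shop_cd (link to Shop_Info), assumes an inner join that keeps only matching records, and uses a hypothetical helper `inner_merge` with made-up sample rows.

```python
def inner_merge(left, right, left_key, right_key):
    """Inner join two lists of dicts; left fields win on a name clash."""
    index = {}
    for r in right:
        index.setdefault(r[right_key], []).append(r)
    merged = []
    for l in left:
        for r in index.get(l[left_key], []):
            merged.append({**r, **l})
    return merged

# Hypothetical rows using a subset of the field names listed for each table.
lab4 = [{"TID": 1, "prod_cd": "A"}, {"TID": 2, "prod_cd": "B"}, {"TID": 3, "prod_cd": "C"}]
link = [{"TID": 1, "SHOP_CD": "S1"}, {"TID": 2, "SHOP_CD": "S2"}]
shop_info = [
    {"shop_cd": "S1", "staffs": 12, "Area": "Kowloon"},
    {"shop_cd": "S2", "staffs": 5, "Area": "Central"},
]

step1 = inner_merge(lab4, link, "TID", "TID")                 # lab4 + link
step2 = inner_merge(step1, shop_info, "SHOP_CD", "shop_cd")   # + Shop_Info
```

In this sample the record with TID 3 has no entry in link, so it drops out after Step 1.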
13. Data Transformation
Goal: Add new attributes Weekday and Time
Useful Node: Derive (in Field Ops Palette)
Remember the formula? Try to use the formula
calculator
Weekday = datetime_day_name(datetime_weekday(dt))
Time = datetime_hour(dt)
[Figure: result table with the newly derived fields]
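For comparison, the two derived fields can be computed in Python with the standard `datetime` module. This is a sketch, not a translation of the Clementine functions: `derive_fields` and `DAY_NAMES` are names introduced here, and the sample record assumes that dt is already a `datetime` value.

```python
from datetime import datetime

# Fixed English day names, indexed by datetime.weekday() (Monday = 0).
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def derive_fields(record):
    """Return a copy of the record with Weekday and Time added."""
    dt = record["dt"]
    return {**record, "Weekday": DAY_NAMES[dt.weekday()], "Time": dt.hour}

# Hypothetical record with a made-up timestamp.
row = derive_fields({"TID": 1, "dt": datetime(2005, 9, 22, 14, 30)})
```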
14. Data Transformation
Goal: Divide the Time field into 3 intervals
(fixed-width)
Steps:
1. Add a Binning node and specify the number of bins
2. Add a Type node to update the information on the
newly added field
3. Add a Reclassify node to rename the bins to Morning,
Afternoon, Evening
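Fixed-width binning splits the range between the smallest and largest value into equally wide intervals. The following sketch shows one way to do this and label the bins in a single pass; the function `fixed_width_bins` and the sample hours are hypothetical, and the Binning node may place its boundaries differently.

```python
def fixed_width_bins(values, labels):
    """Assign each value to one of len(labels) equal-width intervals."""
    n = len(labels)
    lo, hi = min(values), max(values)
    width = (hi - lo) / n
    result = []
    for v in values:
        if width == 0:
            idx = 0  # all values identical: everything falls in the first bin
        else:
            # The maximum value would land in bin n, so clamp it to the last bin.
            idx = min(int((v - lo) / width), n - 1)
        result.append(labels[idx])
    return result

# Hypothetical hours of the day, as produced by a Time field.
hours = [0, 7, 8, 15, 16, 23]
bins = fixed_width_bins(hours, ["Morning", "Afternoon", "Evening"])
```

The same function covers other fixed-width cases by passing a different list of labels, for example five labels for five intervals.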
15. Data Transformation
Goal: Divide the Staff field into 5 intervals
(fixed-width)
[Figure: Staff field divided into 5 intervals]
16. Data Mining - Classification
Goal: Classify the product attribute
Add Node: C5.0 (in Modeling Palette)
Answer by yourself: Is anything missing?
Action: Update the data values by adding a Type
node
17. Data Mining - Classification
Goal: Classify the product attribute using a
decision tree
[Figure: resulting decision tree]
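Decision-tree learners in the C4.5/C5.0 family choose splits with an information-based criterion. The sketch below computes plain information gain for a categorical field to show the idea; it is not the C5.0 algorithm itself, and the field names `Time_bin`, `Weekday`, `prod_cd` and all values in the sample rows are assumptions for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Reduction in target entropy obtained by splitting on attribute."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    # Weighted average entropy of the groups produced by the split.
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder

# Hypothetical records: Time_bin separates the classes, Weekday does not.
rows = [
    {"Time_bin": "Morning", "Weekday": "Monday", "prod_cd": "A"},
    {"Time_bin": "Morning", "Weekday": "Tuesday", "prod_cd": "A"},
    {"Time_bin": "Evening", "Weekday": "Monday", "prod_cd": "B"},
    {"Time_bin": "Evening", "Weekday": "Tuesday", "prod_cd": "B"},
]
gain_time = information_gain(rows, "Time_bin", "prod_cd")
gain_weekday = information_gain(rows, "Weekday", "prod_cd")
```

A tree built on this sample would split on the field with the larger gain first.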
18. Data Mining - Clustering
Goal: Divide the data into 3 clusters
Add Node: K-means (in Modeling Palette)
Pick suitable inputs for clustering
[Figure: resulting clusters]
Note: You may adjust the value of k (no. of
clusters)
19. SUMMARY
- Today, you've learnt how to
- Discard records with missing values
- Derive new attributes
- Merge tables
- Perform discretization (binning)
- Build classification models
- Build clustering models