Data mining exercise with SPSS Clementine Lab 4 - PowerPoint PPT Presentation

1
Data mining exercise with SPSS Clementine: Lab 4
  • Winnie Lam
  • Email: cswinnie_at_comp.polyu.edu.hk
  • Website: http://www.comp.polyu.edu.hk/cswinnie/
  • The Hong Kong Polytechnic University
  • Department of Computing

Last update: 22/09/2005
2
Data mining process
  • Data Understanding: define the target and discover useful data
  • Data Preprocessing: obtain clean, useful data
  • Modeling (Data Mining): discover patterns
3
Modeling Tools: Clustering
  • K-means. An approach to clustering that defines k
    clusters and iteratively assigns records to
    clusters based on distances from the mean of each
    cluster until a stable solution is found.
  • TwoStep. A clustering method that involves
    preclustering the records into a large number of
    subclusters and then applying a hierarchical
    clustering technique to those subclusters to
    define the final clusters.
  • Kohonen Networks. A type of neural network used
    for clustering. Also known as a self-organizing
    map (SOM).
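
The K-means description above can be illustrated outside Clementine with a minimal Python sketch. This is not how the Clementine K-means node is implemented; it only shows the assign-then-update loop. The sample points are made up, and using the first k records as the initial means is a simplifying assumption.

```python
import math

def kmeans(points, k, max_iter=100):
    """Plain k-means on a list of equal-length numeric tuples."""
    # Assumption for this sketch: the first k records serve as the initial means.
    centers = [tuple(p) for p in points[:k]]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each record goes to the cluster with the nearest mean.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # stable solution: no record changed cluster
        assignment = new_assignment
        # Update step: recompute each mean from its current members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old mean if a cluster ends up empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment, centers

# Hypothetical 2-D records forming two obvious groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (8.2, 7.9), (0.9, 1.1), (7.8, 8.1)]
labels, centers = kmeans(points, k=2)
print(labels)
```

Changing k changes how many means are tracked; the loop itself stays the same.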

4
Classification
With predefined classes!
5
Clustering
No classes are defined in advance!
[Figure: records grouped into k=4 clusters: CROSS, STAR, Red TRIANGLE, Yellow TRIANGLE]
6
Data Understanding
  • Data file is located at:
  • http://www.comp.polyu.edu.hk/cswinnie/lab.html

7
Data Understanding
  • Given: Data file (MyData_lab4.mdb)

Answer by yourself:
1. How many tables are there?
2. How many attributes are in each table?
3. How many records are in each table?
4. Which field(s) contain dirty data (e.g. invalid/missing values)?
Hint: ODBC, Table, Data Audit, Quality, Type
8
Data Preparation
  • Data Cleaning
  • Data Transformation

9
Data Cleaning
Goal: Discard all records with blank/null values
Useful Nodes: Filler and Type (in Field Ops Palette)
Answer by yourself: How many records are left?
Result: 1241 → 1195
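
The cleaning goal above (drop every record that has a blank or null value) can be sketched in plain Python for comparison. This is not the Clementine stream itself; the field names come from the lab4 table, while the sample values and the helper names `is_blank` and `drop_incomplete` are hypothetical.

```python
def is_blank(value):
    """True for None and for empty or whitespace-only strings."""
    return value is None or (isinstance(value, str) and value.strip() == "")

def drop_incomplete(records):
    """Keep only the records in which no field is blank."""
    return [row for row in records if not any(is_blank(v) for v in row.values())]

# Hypothetical sample rows using field names from the lab4 table.
rows = [
    {"TID": 1, "cl": "A", "prod_cd": "P01"},
    {"TID": 2, "cl": "", "prod_cd": "P02"},    # blank string
    {"TID": 3, "cl": None, "prod_cd": "P01"},  # null value
]
clean = drop_incomplete(rows)
print(len(rows), "->", len(clean))
```

Comparing the record count before and after, as the slide does, is a quick check that the filter removed only the incomplete rows.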
10
Data Transformation
Goal: Merge tables lab4 and Shop_Info
Add Node: Merge (in Record Ops Palette)
Tables and fields:
  • lab4: TID, dt, gp1, gp2, ref_no, cl, prod_cd
  • link: TID, SHOP_CD
  • Shop_Info: dist_cd, shop_cd, staffs, manager, Area
Answer by yourself: What is/are the key(s) for merging?
11
Data Transformation
Goal: Merge tables lab4 and Shop_Info
Add Node: Merge (in Record Ops Palette)
  • Step 1
  • Merge table lab4 and link

Result: Field from link
12
Data Transformation
Goal: Merge tables lab4, link and Shop_Info
Add Node: Merge (in Record Ops Palette)
  • Step 2
  • Merge the result of step 1 with table Shop_Info

Result: Fields from Shop_Info
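
The two merge steps can be sketched in plain Python as two key-based joins. This is an illustration only: it assumes TID joins lab4 to link and SHOP_CD/shop_cd joins the result to Shop_Info, it uses an inner join (an assumption, not necessarily the Merge node's setting), and the sample rows and the helper `merge_on` are hypothetical.

```python
def merge_on(left, right, left_key, right_key):
    """Inner join of two lists of dicts where left[left_key] == right[right_key]."""
    index = {}
    for r in right:
        index.setdefault(r[right_key], []).append(r)
    merged = []
    for l in left:
        for r in index.get(l[left_key], []):
            merged.append({**l, **r})  # fields of both records in one row
    return merged

# Hypothetical sample rows; only a few fields of each table are shown.
lab4 = [{"TID": 1, "prod_cd": "P01"}, {"TID": 2, "prod_cd": "P02"}]
link = [{"TID": 1, "SHOP_CD": "S1"}, {"TID": 2, "SHOP_CD": "S2"}]
shop_info = [{"shop_cd": "S1", "Area": "North"}, {"shop_cd": "S2", "Area": "South"}]

step1 = merge_on(lab4, link, "TID", "TID")                 # adds SHOP_CD from link
step2 = merge_on(step1, shop_info, "SHOP_CD", "shop_cd")   # adds the Shop_Info fields
print(step2[0])
```

The link table is needed because lab4 and Shop_Info share no field directly; it carries the key of each side.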
13
Data Transformation
Goal: Add new attributes Weekday and Time
Useful Node: Derive (in Field Ops Palette)
Result: Newly derived fields
Remember the formula? Try to use the formula calculator.
Weekday: datetime_day_name(datetime_weekday(dt))
Time: datetime_hour(dt)
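
For readers who want to check the derived values outside Clementine, a rough Python equivalent of the two expressions above is sketched here. The function name `derive_weekday_and_time` and the sample timestamp are hypothetical; the day-name list is spelled out so the result does not depend on the system locale.

```python
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def derive_weekday_and_time(dt):
    """Return (weekday name, hour of day) for a datetime value."""
    # datetime.weekday() counts from Monday = 0 to Sunday = 6.
    return DAY_NAMES[dt.weekday()], dt.hour

# Hypothetical transaction timestamp.
weekday, hour = derive_weekday_and_time(datetime(2005, 9, 22, 14, 30))
print(weekday, hour)
```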
14
Data Transformation
Goal: Divide the Time field into 3 intervals (Fixed-width)
  • Steps
  • Add a Binning node and specify the no. of bins
  • Add a Type node to update the information on the
    newly added field
  • Add a Reclassify node to rename the bins to Morning,
    Afternoon, Evening
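
Fixed-width binning followed by renaming the bins can be sketched in Python as below. The range of 9 to 21 hours is an assumption made for this example (the slides do not state the range of the Time field), and `fixed_width_bin` is a hypothetical helper, not a Clementine function.

```python
def fixed_width_bin(value, low, high, n_bins):
    """Return the 0-based index of the equal-width bin that contains value."""
    width = (high - low) / n_bins
    index = int((value - low) // width)
    # Clamp so that value == high falls into the last bin rather than a new one.
    return min(max(index, 0), n_bins - 1)

LABELS = ["Morning", "Afternoon", "Evening"]   # names given to bins 1 to 3

# Assumed range of the Time field: 9 to 21, so each bin is 4 hours wide.
hours = [9, 12, 13, 16, 17, 21]
binned = [LABELS[fixed_width_bin(h, 9, 21, 3)] for h in hours]
print(list(zip(hours, binned)))
```

With fixed-width bins the boundaries depend only on the range and the number of bins, not on how many records fall into each interval.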

15
Data Transformation
Goal: Divide the Staff field into 5 intervals (Fixed-width)
Result
16
Data Mining - Classification
Goal: Classification on the product attribute
Add Node: C5.0 (in Modeling Palette)
Answer by yourself: Is anything missing?
Action: Update the data values by adding a Type node
17
Data Mining - Classification
Goal: Classification on the product attribute, shown as a decision tree
Result
18
Data Mining - Clustering
Goal: Divide the data into 3 clusters
Add Node: K-means (in Modeling Palette)
Pick suitable inputs for clustering
Result
Note: You may adjust the value of k (no. of clusters)
19
SUMMARY
  • Today, you've learnt:
  • Discard missing values
  • Derive new attributes
  • Merge tables
  • Perform discretization (binning)
  • Classification modeling
  • Clustering modeling