Basic Data Mining Techniques - PowerPoint PPT Presentation

Title: Basic Data Mining Techniques
Description: Title: Data Mining in CRM. Author: Lee, Yue-Shi. Last modified by: sjyen. Created date: 12/9/2001 3:01:49 AM.

Slides: 24
Provided by: LeeYu8

Transcript and Presenter's Notes


1
Basic Data Mining Techniques
2
Contents
  • Query Tools
  • Statistical Techniques
  • Visualization Techniques
  • Case-Based Learning (K-Nearest Neighbor)

3
Query Tools and Statistical Techniques

5
Query Tools and Statistical Techniques
Naive Predictions
6
Query Tools and Statistical Techniques
7
Query Tools and Statistical Techniques
8
Query Tools and Statistical Techniques
9
Query Tools and Statistical Techniques
10
Query Tools and Statistical Techniques
11
Visualization Techniques (Scatter Diagram)
Music Magazine
12
Distance between Data Points
13
K-Nearest Neighbor
  • Records that are close to each other live in each
    other's neighborhood
  • Customers of the same type (cluster) will show
    the same behavior
  • Do as your neighbors do
  • Not really a learning technique
  • Disadvantages
      • Inefficiency
      • It is hard to explain why k-nearest neighbor
        performs better than naïve prediction
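
The "do as your neighbors do" idea can be sketched as a majority vote among the k closest records. This is a minimal sketch, not the presenter's implementation: the Euclidean distance, the function name `knn_predict`, and the customer data (age, income) are all assumptions made for illustration.

```python
import math
from collections import Counter

def knn_predict(records, labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest records."""
    # Euclidean distance from the query to every stored record.
    # Every prediction scans all records, which is why k-NN is inefficient.
    distances = [(math.dist(rec, query), lab) for rec, lab in zip(records, labels)]
    # Keep the k closest records and count their labels.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(lab for _, lab in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical customers described by (age, income in thousands).
records = [(25, 30), (27, 32), (45, 80), (50, 85), (48, 90)]
labels = ["no", "no", "yes", "yes", "yes"]

prediction_older = knn_predict(records, labels, (47, 82), k=3)
prediction_younger = knn_predict(records, labels, (26, 31), k=3)
```

No model is built in advance here; the stored records themselves are searched at prediction time, which fits the remark that k-NN is not really a learning technique.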

14
K-Nearest Neighbor
15
Result of the K-Nearest Neighbor Process
Figure values: 67.1, 70.2, 55.3, 85.4, 91.9
18
K-Nearest Neighbors for 036
  • C1: 1 0 0 1 0 0 1
  • M1: 0 1 1 1 0 0 1
  • Distance = 3, Similarity = 4
  • C1: 1 0 0 1 0 0 1
  • M2: 0 1 1 1 0 1 1
  • Distance = 4, Similarity = 3
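
The two comparisons above are consistent with counting mismatched positions (a Hamming distance) and matching positions between two binary vectors. The slides do not name the measure, so this reading is an assumption; the function names are illustrative only.

```python
def distance(a, b):
    """Number of positions where the two binary vectors differ (Hamming distance)."""
    return sum(x != y for x, y in zip(a, b))

def similarity(a, b):
    """Number of positions where the two binary vectors agree."""
    return sum(x == y for x, y in zip(a, b))

# Vectors copied from the slide.
c1 = [1, 0, 0, 1, 0, 0, 1]
m1 = [0, 1, 1, 1, 0, 0, 1]
m2 = [0, 1, 1, 1, 0, 1, 1]

c1_m1 = (distance(c1, m1), similarity(c1, m1))
c1_m2 = (distance(c1, m2), similarity(c1, m2))
```

For vectors of length 7 the two numbers always add up to 7, so ranking by highest similarity and ranking by lowest distance give the same neighbors.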

19
K-Nearest Neighbors for 036
Similarity:
M1  4    M8  3    M15 4    M22 2
M2  3    M9  4    M16 6    M23 4
M3  6    M10 4    M17 4    M24 4
M4  5    M11 3    M18 5    M25 6
M5  4    M12 5    M19 6    M26 4
M6  4    M13 7    M20 7
M7  5    M14 6    M21 3
If Similarity_Threshold is 6, then the 7 neighbors with a similarity of at least 6
(M3, M13, M14, M16, M19, M20, M25) are selected.
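
The selection step can be sketched as a filter over the similarity scores. The scores are copied from the slide; reading the threshold as "similarity of at least 6" is an assumption, chosen because it reproduces the seven neighbors listed.

```python
# Similarity scores for M1..M26, in order, as given on the slide.
scores = [4, 3, 6, 5, 4, 4, 5,    # M1-M7
          3, 4, 4, 3, 5, 7, 6,    # M8-M14
          4, 6, 4, 5, 6, 7, 3,    # M15-M21
          2, 4, 4, 6, 4]          # M22-M26
similarity = {f"M{i}": s for i, s in enumerate(scores, start=1)}

similarity_threshold = 6
# Keep every record whose similarity reaches the threshold (assumed inclusive).
neighbors = [name for name, s in similarity.items() if s >= similarity_threshold]
```

Unlike a fixed k, a threshold lets the number of neighbors vary from one query to another.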
20
Summarize these 7 Neighbors
Liked Movies (by ID)
  • Neighbor 1: 111 134 388 262 261 266 268 012 260 184 238 091
    104 142 038
  • Neighbor 2: 240 256 290 441 442 442 510 518 518 520 522 001
    005 016 184
  • Neighbor 3: none
  • Neighbor 4: 402 193 228 179 227 111 204 364
  • Neighbor 5: 280
  • Neighbor 6: 193
  • Neighbor 7: 186 189 193 214 239 179 227 263 240

21
Liked Movies for 036
  • Count 3: Movie 193
  • Count 2: Movie 184
  • Count 2: Movie 240
  • Count 2: Movie 442
  • Count 2: Movie 518
  • Count 2: Movie 111
  • Count 2: Movie 179
  • Count 2: Movie 227
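
The counts above can be reproduced by tallying every movie ID across the seven neighbors' lists. In this sketch the cutoff of two occurrences is an assumption inferred from the slide, which lists only movies counted at least twice. IDs are kept as strings to preserve leading zeros.

```python
from collections import Counter

# Liked-movie IDs per neighbor, copied from the slide.
liked = {
    "Neighbor 1": "111 134 388 262 261 266 268 012 260 184 238 091 104 142 038".split(),
    "Neighbor 2": "240 256 290 441 442 442 510 518 518 520 522 001 005 016 184".split(),
    "Neighbor 3": [],
    "Neighbor 4": "402 193 228 179 227 111 204 364".split(),
    "Neighbor 5": ["280"],
    "Neighbor 6": ["193"],
    "Neighbor 7": "186 189 193 214 239 179 227 263 240".split(),
}

# Count raw occurrences; an ID repeated within one list (442, 518) counts twice.
counts = Counter(movie for movies in liked.values() for movie in movies)

min_count = 2  # assumed cutoff
recommended = {movie: n for movie, n in counts.items() if n >= min_count}
```

Because 442 and 518 each appear twice in Neighbor 2's list alone, the slide's counts are occurrence counts rather than counts of distinct neighbors.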

22
Data Mining Tool vs. Query Tool
  • Suppose a large database contains millions of
    records that describe customers' purchases
  • Who bought which product on what date?
  • What is the average turnover in July?
  • What is an optimal segmentation of clients?
  • What are the most important trends in customer
    behavior?
  • If you know exactly what you are looking for, use
    a query tool
  • If you know only vaguely what you are looking
    for, use a data mining tool
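
The first two questions are the kind a query tool answers directly. A minimal sketch using Python's built-in `sqlite3` module follows; the `purchases` table, its columns, and the sample rows are hypothetical, and "turnover" is assumed to mean the purchase amount.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases (customer TEXT, product TEXT, purchase_date TEXT, amount REAL)"
)
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?, ?)",
    [
        ("Ann", "CD", "2001-07-03", 20.0),
        ("Bob", "Magazine", "2001-07-15", 10.0),
        ("Ann", "Magazine", "2001-08-01", 12.0),
    ],
)

# Who bought which product on what date?
who_bought = conn.execute(
    "SELECT customer, product, purchase_date FROM purchases ORDER BY purchase_date"
).fetchall()

# What is the average turnover in July?
july_avg = conn.execute(
    "SELECT AVG(amount) FROM purchases WHERE strftime('%m', purchase_date) = '07'"
).fetchone()[0]
```

Questions about segmentation or trends have no single query of this form, which is where a data mining tool comes in.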

23
Data Mining Tool vs. Query Tool