Title: Basic Data Mining Techniques
1Basic Data Mining Techniques
2Contents
- Query Tools
- Statistical Techniques
- Visualization Techniques
- Case-Based Learning (K-Nearest Neighbor)
3Query Tools and Statistical Techniques
5Query Tools and Statistical Techniques
Naive Predictions
11Visualization Techniques (Scatter Diagram)
Music Magazine
12Distance between Data Points
13K-Nearest Neighbor
- Records that are close to each other live in each other's neighborhood
- Customers of the same type (cluster) will show the same behavior
- Do as your neighbors do
- Not really a learning technique
- Disadvantages
  - Inefficiency
  - It is difficult to see why k-nearest neighbor performs better than naïve prediction
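The "do as your neighbors do" idea can be sketched as a majority vote among the k closest records. This is only an illustration, not the procedure used on the slides: the Euclidean distance, the function name `knn_predict`, and the toy records are all assumptions made for the example.

```python
import math
from collections import Counter

def knn_predict(records, labels, query, k=3):
    """Predict a label for `query` by majority vote of its k nearest records."""
    # Rank every known record by Euclidean distance to the query (assumed metric).
    ranked = sorted(
        zip(records, labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    # Let the k closest records vote; the most common label wins.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data: (age, income in thousands) -> buys the magazine?
records = [(25, 30), (27, 32), (45, 80), (50, 85), (23, 28)]
labels = ["yes", "yes", "no", "no", "yes"]

prediction = knn_predict(records, labels, query=(26, 31), k=3)
print(prediction)
```

Each prediction scans and sorts all stored records, which is one way to see the inefficiency listed as a disadvantage.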
15Result of the K-Nearest Neighbor Process
Results: 67.1, 70.2, 55.3, 85.4, 91.9
18K-Nearest Neighbors for 036
- C1 1 0 0 1 0 0 1
- M1 0 1 1 1 0 0 1
- Distance 3 or Similarity 4
- C1 1 0 0 1 0 0 1
- M2 0 1 1 1 0 1 1
- Distance 4 or Similarity 3
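The distance and similarity figures above can be reproduced by counting the positions where two binary vectors differ or agree. The sketch below assumes that is how the slide computes them; the function names are illustrative.

```python
def hamming_distance(a, b):
    """Number of positions where two equal-length binary vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))

def similarity(a, b):
    """Number of positions where the two vectors agree."""
    return len(a) - hamming_distance(a, b)

# Vectors copied from the slide.
c1 = [1, 0, 0, 1, 0, 0, 1]
m1 = [0, 1, 1, 1, 0, 0, 1]
m2 = [0, 1, 1, 1, 0, 1, 1]

print(hamming_distance(c1, m1), similarity(c1, m1))  # 3 4
print(hamming_distance(c1, m2), similarity(c1, m2))  # 4 3
```

With 7 positions per vector, distance and similarity always sum to 7, so either one can be used to rank neighbors.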
19K-Nearest Neighbors for 036
Similarity
M1 4    M8 3     M15 4    M22 2
M2 3    M9 4     M16 6    M23 4
M3 6    M10 4    M17 4    M24 4
M4 5    M11 3    M18 5    M25 6
M5 4    M12 5    M19 6    M26 4
M6 4    M13 7    M20 7
M7 5    M14 6    M21 3
If Similarity_Threshold is 6 (a similarity of 6 or more), then 7 neighbors (M3, M13, M14, M16, M19, M20, M25) are selected.
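The selection step can be sketched as a simple filter over the similarity scores. The scores are copied from the table; reading the threshold as "at least 6" is an assumption, chosen because it yields exactly the seven neighbors named on the slide.

```python
# Similarity of each candidate (M1..M26) to customer 036, from the slide.
similarities = {
    "M1": 4, "M2": 3, "M3": 6, "M4": 5, "M5": 4, "M6": 4, "M7": 5,
    "M8": 3, "M9": 4, "M10": 4, "M11": 3, "M12": 5, "M13": 7, "M14": 6,
    "M15": 4, "M16": 6, "M17": 4, "M18": 5, "M19": 6, "M20": 7, "M21": 3,
    "M22": 2, "M23": 4, "M24": 4, "M25": 6, "M26": 4,
}

SIMILARITY_THRESHOLD = 6  # assumed to mean "similarity >= 6"

# Keep every candidate whose similarity reaches the threshold.
neighbors = [
    name for name, score in similarities.items()
    if score >= SIMILARITY_THRESHOLD
]
print(neighbors)
```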
20Summarize these 7 Neighbors
Like Movies
- Neighbor 1: 111 134 388 262 261 266 268 012 260 184 238 091 104 142 038
- Neighbor 2: 240 256 290 441 442 442 510 518 518 520 522 001 005 016 184
- Neighbor 3: none
- Neighbor 4: 402 193 228 179 227 111 204 364
- Neighbor 5: 280
- Neighbor 6: 193
- Neighbor 7: 186 189 193 214 239 179 227 263 240
21Like Movies for 036
- Count 3: Movie 193
- Count 2: Movie 184
- Count 2: Movie 240
- Count 2: Movie 442
- Count 2: Movie 518
- Count 2: Movie 111
- Count 2: Movie 179
- Count 2: Movie 227
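The counts above can be reproduced by tallying every movie ID across the seven neighbors' lists. In this sketch the IDs are kept as strings to preserve leading zeros, repeated IDs within one neighbor's list are counted each time (which matches the slide's counts for 442 and 518), and the cutoff of 2 is an assumption based on the movies the slide chooses to list.

```python
from collections import Counter

# Liked movies per neighbor, copied from the slide (Neighbor 3 has none).
liked = {
    "Neighbor 1": "111 134 388 262 261 266 268 012 260 184 238 091 104 142 038".split(),
    "Neighbor 2": "240 256 290 441 442 442 510 518 518 520 522 001 005 016 184".split(),
    "Neighbor 3": [],
    "Neighbor 4": "402 193 228 179 227 111 204 364".split(),
    "Neighbor 5": ["280"],
    "Neighbor 6": ["193"],
    "Neighbor 7": "186 189 193 214 239 179 227 263 240".split(),
}

# Count every occurrence of every movie ID across all neighbors.
counts = Counter(movie for movies in liked.values() for movie in movies)

MIN_COUNT = 2  # assumed cutoff for a movie to be suggested to customer 036
recommended = [(movie, n) for movie, n in counts.most_common() if n >= MIN_COUNT]

for movie, n in recommended:
    print(f"Count {n}: Movie {movie}")
```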
22Data Mining Tool Query Tool
- Suppose a large database containing millions of records that describe customers' purchases
- Who bought which product on what date?
- What is the average turnover in July?
- What is an optimal segmentation of clients?
- What are the most important trends in customer behavior?
- If you know exactly what you are looking for, use a query tool
- If you know only vaguely what you are looking for, use a data mining tool
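The first two questions are the kind a query tool answers directly. As a rough sketch, they can be written as SQL against an in-memory SQLite database; the `purchases` table, its columns, and the rows are hypothetical, and "turnover" is interpreted here as the average purchase amount.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases (customer TEXT, product TEXT, date TEXT, amount REAL)"
)
# Hypothetical sample rows, for illustration only.
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?, ?)",
    [
        ("alice", "magazine", "2024-07-03", 10.0),
        ("bob", "book", "2024-07-15", 30.0),
        ("carol", "magazine", "2024-08-01", 12.0),
    ],
)

# Who bought which product on what date?
who_bought = conn.execute(
    "SELECT customer, product, date FROM purchases ORDER BY date"
).fetchall()

# What is the average turnover in July?
july_avg = conn.execute(
    "SELECT AVG(amount) FROM purchases WHERE strftime('%m', date) = '07'"
).fetchone()[0]

print(who_bought)
print(july_avg)
conn.close()
```

The last two questions (optimal segmentation, most important trends) have no such direct query, which is where a data mining tool comes in.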
23Data Mining Tool Query Tool