Title: What is Data Mining?
1 What is Data Mining?
- Jinseog Kim
- Department of Statistics Information Science
- Dongguk University
- jinseog.kim_at_gmail.com
2 ????? ?? ?? ??
???
??
????
- ????
- Point of Sale
- ATM
- ????
- ????
- ??
- ????
- ????
- ??????
- A?? ???? 80? B??? ????
- ????? ??? ???? 6??? ??
- A??? ?? ??? B??? 2?
- ?? ??? ??? ??
- ????? ?
- ??? ??
- ??? ?? ??? ?
- ????? ????? ?
- ??? ?? ???? ?
- ??? ?
3 Data Mining ?? ?
- ???? ??????
- ??? ??? ????
- ???? ?? ??? ????
- ?? ??? ????? ???? ??? ??
- ??? ????? ??, ??, ??, ??,??? ???
4 ? ?
- ???? ??? ?? ??? ??
- ?????? ???? ?? ??
- ??? ??? ???-POS data, Internet Log
- ??, ??? ?? (???)
- ??? ??? ??
- ????? ?? ??
- ????(Machine Learning) ??? ??
- Knowledge Discovery, Knowledge Extraction, Machine Learning, Data/Pattern Analysis
5 Data Mining ??
- ??? ??
- ??? ??? ?? ??
- ??? ??
- ?? ?? ??? ?? ?? ??
- ???, ???, ???,
- ?? ??
- ?? ??
- ??? (??), ?? ??
- ??, ???
6 Data Mining ??
[Figure: data mining process. Steps: Select, Transform, Mine, Assimilate. Data labels: Database, Extracted Data, Selected Data, Transformed Data, Assimilated Data, Visualization]
7 ??????(CRM)? ?
????
????
????
?? ? ??
DATABASE
??? ???
Targeting for Sales
?????? ??? ????
- ????
- POS Data
- Survey data
60? ??? ??? ?? ??
?????? (buys the same brand 80% of the time)
8 Data Mining?? ??
- ??? ??, ??? ??? ???
- ??? ??????? ??? ???
- ??? ?? ??? ???
9 Data Mining?? ?? ??? ??? ??? ???
- Summarization
- Association
- Classification
- Clustering
- Characterization
- Sequential Pattern Discovery
- Trend
- Deviation Detection
10 Data Mining?? ?? ??? DB? ??? ???
- Relational DB
- Transactional DB
- Object-oriented DB
- Spatial DB
- Temporal DB
- Textual vs Multimedia
- Heterogeneous
11 Data Mining?? ?? ?? ??? ???
- ????, ???? ??
- ??? ??, rule induction
- ????? ??? functional mapping? ??
- ??? ?? algorithm? ??
- ??? ??/ ????
- Statistical Classification (supervised learning)
- Clustering Techniques (unsupervised learning)
- Time Series Analysis
12 ??? ?? ??
- Transaction DB? ????
- <??????>???? ?
- RULE ??? ??
- A => B: support, confidence
- support = (transactions containing both A and B) / (total transactions)
- confidence = (transactions containing both A and B) / (transactions containing A)
- ? ?? => ??? (Agrawal, ??? ?????? ??)
- ?? 1 ??????? ????
- ?? 2 AMAZON.COM
- ????? ??
- ?? 3 ??? ??????
- ???? ?? ? ???? ?? ??(??????)
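The support and confidence ratios for a rule A => B can be sketched in a few lines of Python. This is a minimal illustration, not a full rule miner: the function names `support` and `confidence` and the market-basket data are hypothetical and do not come from the slides.

```python
def support(transactions, itemset):
    """Fraction of all transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Among transactions containing A, the fraction that also contain B."""
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    count_a = sum(1 for t in transactions if a <= t)
    count_both = sum(1 for t in transactions if both <= t)
    return count_both / count_a if count_a else 0.0


# Hypothetical market-basket data (illustration only).
baskets = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "diapers", "beer"}),
    frozenset({"milk", "diapers", "beer"}),
    frozenset({"bread", "milk", "diapers", "beer"}),
    frozenset({"bread", "milk", "diapers"}),
]

# Rule: diapers => beer
rule_support = support(baskets, {"diapers", "beer"})        # 3 of 5 baskets
rule_confidence = confidence(baskets, {"diapers"}, {"beer"})  # 3 of 4 diaper baskets
print(rule_support, rule_confidence)
```

A rule is normally kept only when both values clear user-chosen thresholds, so a real miner would wrap these two functions in a search over candidate itemsets.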
13 ??? ?? ??
Association Rules with Maximum support of 50%
?? ??
14 Classification
- ?????? ??? ??? ??
- ????? Class-label ? feature set?? ??
- ????(Supervised Learning)? ??
- ????? ??? ??, ??? ??
- ??? ??? ??? ? ??? ?? ??
- ?? Credit Approval, ?? ??
- ? ??? ???? ? ????? ?? ???? ?? ?? ??
- Decision Tree, ???, ??? ???(logistic model, LDA, QDA)
15 Classification Example
??, ???, ??, ???, ??????
Classifier
Class 1 ??? ??
Class 2 ??? ??
Class 3 ??? ??
16 Decision Tree Classifier
- ?????? Decision Tree ???? ??
- ID3, CART, C5.0
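As a rough illustration of how an ID3-style learner picks the attribute to split on, the sketch below computes entropy and information gain. It covers only the split-selection step, not tree construction or pruning. CART and C5.0 use different criteria. The credit-approval records and the helper names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by partitioning the rows on one feature."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_split(rows, labels):
    """Feature with the highest information gain (the ID3 split criterion)."""
    return max(rows[0].keys(), key=lambda f: information_gain(rows, labels, f))


# Hypothetical credit-approval training data (illustration only).
rows = [
    {"income": "high", "owns_home": "yes"},
    {"income": "high", "owns_home": "no"},
    {"income": "low", "owns_home": "yes"},
    {"income": "low", "owns_home": "no"},
]
labels = ["approve", "approve", "reject", "reject"]

print(best_split(rows, labels))
```

In this toy data `income` separates the classes perfectly while `owns_home` carries no information, so the root split falls on `income`.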
17 Neural Network Classifier
- ??? ?????? ??? ???? ??
- ??? Neuron? ????? ???
- ?? ???? ??
- Error-back-propagation ????????
- ??? Functional Mapping? ?? ???
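Error back-propagation can be sketched with a very small network. The example below is an assumption-laden illustration, not the configuration used in the slides: one hidden layer, sigmoid units, squared error, per-sample updates, and the XOR problem as a stand-in for a nonlinear mapping. The layer size, learning rate, epoch count, and seed are arbitrary choices.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, w_out):
    """Forward pass; the last weight in each row is the bias."""
    hidden = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden + [1.0])))
    return hidden, output


def mean_squared_error(data, w_hidden, w_out):
    return sum((forward(x, w_hidden, w_out)[1] - y) ** 2 for x, y in data) / len(data)


def train(data, n_hidden=4, epochs=5000, lr=0.5, seed=0):
    """Train with per-sample gradient descent using back-propagated errors."""
    rng = random.Random(seed)
    n_in = len(data[0][0])
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    for _ in range(epochs):
        for x, y in data:
            hidden, out = forward(x, w_hidden, w_out)
            # Error signal at the output unit (squared error, sigmoid activation).
            delta_out = (out - y) * out * (1 - out)
            # Propagate the error signal back to each hidden unit.
            delta_hidden = [delta_out * w_out[j] * hidden[j] * (1 - hidden[j])
                            for j in range(n_hidden)]
            # Gradient step on output weights, then on hidden weights.
            for j, h in enumerate(hidden + [1.0]):
                w_out[j] -= lr * delta_out * h
            for j in range(n_hidden):
                for i, v in enumerate(x + [1.0]):
                    w_hidden[j][i] -= lr * delta_hidden[j] * v
    return w_hidden, w_out


# XOR: a mapping that no single linear unit can represent.
xor_data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]

initial = train(xor_data, epochs=0)   # same seed, no updates: the starting weights
trained = train(xor_data)
loss_before = mean_squared_error(xor_data, *initial)
loss_after = mean_squared_error(xor_data, *trained)
print(loss_before, loss_after)
```

Comparing the loss before and after training shows the effect of the weight updates. How close the final loss gets to zero depends on the random start.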
18 Neural Network Classifier
19 Sequential Pattern Discovery
- Transaction ????? ??? ?? ??
- ??
- ??????? ?? ?? ??
- ???? ?? ??
- ?? ??? ?? ?? ??, ??
- ??? ??? ?? ??, ??
- ???
- ??? ??? ??
- Hidden Markov Model for doubly stochastic process modeling
20 Sequential Pattern Example
Sequential Pattern in Database
21 Similar Time Series
Matching Curve Found
22 Clustering
- ?? ???? ?? ???? ???? ??? ??? ?? ???? ??
- ????? ??? ???
- Unsupervised Learning Algorithms
- Symbolic, Neural Network based (Kohonen Feature Map)
- Statistical clustering ???
- ??
- ???? ??? ???
- ?? ??? ??
- ??? ???, ????? ?? ?? ????
23 Clustering Example
24 Symbolic Clustering
[Figure: example cluster partition labeled with similarity values 2, 2, 3 and difference values 3, 2.83, 3]
Total score for this cluster partition = average similarity + average difference = 2.33 + 2.94 = 5.27
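The score arithmetic can be checked directly from the similarity and difference values listed on the slide. This sketch assumes the total is the sum of the two averages, each rounded to two decimals first, which is what reproduces the slide's 5.27. The variable names are hypothetical.

```python
# Values as listed on the slide.
similarities = [2, 2, 3]
differences = [3, 2.83, 3]

avg_similarity = sum(similarities) / len(similarities)
avg_difference = sum(differences) / len(differences)

# The slide adds the two averages after rounding each to two decimals.
total_score = round(round(avg_similarity, 2) + round(avg_difference, 2), 2)
print(round(avg_similarity, 2), round(avg_difference, 2), total_score)
```

Summing the unrounded averages gives roughly 5.277, so the last digit depends on where the rounding happens.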
25 Data Mining Interface
- Interactive Mining
- GUI? ?? Task? ??
- Data Mining Query Language
- find association rules
- related to gpa, birth_place, family_income
- from student
- where major = "CS" and birth_place = "Seoul"
- with support threshold = 0.05
- with confidence threshold = 0.7
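To make the query's meaning concrete, here is a rough Python equivalent of its evaluation: filter the `student` records by the `where` clause, then keep one-attribute-to-one-attribute rules that clear both thresholds. This is a sketch under assumptions, not how a DMQL engine works internally. The `mine_rules` helper, the discretized attribute values, and the student records are all hypothetical.

```python
from itertools import permutations


def mine_rules(records, attributes, min_support, min_confidence):
    """Return rules (A=a) => (B=b) between single attributes that meet both thresholds."""
    n = len(records)
    rules = []
    for a, b in permutations(attributes, 2):
        value_pairs = sorted({(r[a], r[b]) for r in records})
        for va, vb in value_pairs:
            count_a = sum(1 for r in records if r[a] == va)
            count_ab = sum(1 for r in records if r[a] == va and r[b] == vb)
            sup = count_ab / n
            conf = count_ab / count_a
            if sup >= min_support and conf >= min_confidence:
                rules.append(((a, va), (b, vb), sup, conf))
    return rules


# Hypothetical student table with discretized gpa and family_income.
students = [
    {"major": "CS", "birth_place": "Seoul", "gpa": "high", "family_income": "high"},
    {"major": "CS", "birth_place": "Seoul", "gpa": "high", "family_income": "high"},
    {"major": "CS", "birth_place": "Seoul", "gpa": "low", "family_income": "low"},
    {"major": "CS", "birth_place": "Busan", "gpa": "high", "family_income": "low"},
    {"major": "EE", "birth_place": "Seoul", "gpa": "low", "family_income": "high"},
    {"major": "CS", "birth_place": "Seoul", "gpa": "high", "family_income": "high"},
]

# where major = "CS" and birth_place = "Seoul"
selected = [s for s in students if s["major"] == "CS" and s["birth_place"] == "Seoul"]

# related to gpa, birth_place, family_income; support >= 0.05, confidence >= 0.7
rules = mine_rules(selected, ["gpa", "birth_place", "family_income"], 0.05, 0.7)
for antecedent, consequent, sup, conf in rules:
    print(antecedent, "=>", consequent, sup, conf)
```

Because the `where` clause fixes `birth_place`, every rule ending in `birth_place = Seoul` has confidence 1 here. A real system would likely suppress such trivial rules.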
26 Kohonen's Feature Map
- ???? ??? ??? ??
- ??? ??? ??? ???? ???? ??
- ??? Feature Map??? ? ??? ??
- ???? ??
- Feature Map ?? ??? ?? Difference
- ????? ?? ??
- 1) ??? ?? X? ?? ? ?? ?? N? ??
- 2) N? ? N? ???? ????? X? ???? ??
- 3) ?? ??? ??? ??? ??? ?? ?? ??
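The three training steps can be sketched as below, assuming they follow the usual Kohonen procedure: find the node N nearest to input X, move N and its map neighbors toward X, and repeat over the data. The one-dimensional map layout, the decaying learning rate, the fixed neighborhood radius, and all names and data are assumptions made for illustration.

```python
import math
import random


def nearest_node(weights, x):
    """Step 1: index of the node whose weight vector is closest to input x."""
    return min(range(len(weights)), key=lambda i: math.dist(weights[i], x))


def train_som(data, n_nodes=4, epochs=50, lr=0.5, radius=1, seed=0):
    """Train a one-dimensional feature map (nodes arranged on a line)."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_nodes)]
    for epoch in range(epochs):
        rate = lr * (1 - epoch / epochs)  # assumed schedule: shrink the step over time
        for x in data:
            winner = nearest_node(weights, x)
            # Step 2: pull the winner and its neighbors on the map toward x.
            first = max(0, winner - radius)
            last = min(n_nodes - 1, winner + radius)
            for i in range(first, last + 1):
                weights[i] = [w + rate * (v - w) for w, v in zip(weights[i], x)]
        # Step 3: repeat for the next pass over the data.
    return weights


# Hypothetical two-dimensional inputs forming two loose groups.
data = [[0.0, 0.1], [0.1, 0.0], [0.9, 1.0], [1.0, 0.9]]
som_weights = train_som(data)

for x in data:
    print(x, "-> node", nearest_node(som_weights, x))
```

After training, each input is assigned to its nearest node, so nodes act as cluster prototypes and neighboring nodes tend to represent similar inputs.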
27 ???? ??? ?? ?? (customer segmentation)
- ?????? ?
- ??? ?? ???? ??
- ? ??? ???? ?
- ?? ??? ??? ???? ??? ????? ?
- ?? ??? ?? ??? ?????
- ??? ?? ?? ? ?? ??? ??????
- ?? ??? ??? ????? ?
- ?? ??
- ??? ?????(mass marketing)?? ????? ???? ?????(personalization or target marketing)?? ??
- ?? ??, ????, ?? ??, ?? ??
28 ??? ?? ??
Scoring???
???? ???? ???? ??? ??
?? ?? ?? ?? ????
????
? ??? ????
29 ??????? ??? Overview
???? ???? ???? ?????
???? DB
Credit ???
Decision Tree
??? ??
???? ??
?? ??? Scoring (Neural Network)
Scoring ???
Credit ?? ? ???? ??
30 ?? ?? ???? ????
- LG?????
- ???? ????? ??? ??
- ?? ???? ???? ???? ?? ?? ??
- ????? ?? ??
- ????, ????, ??? ??, ??? ??
- ??? ???? Fraud Score ??
- 1995? LG???? ???? 14???? ??
- ?? ??? ??
31 Data Mining Tools
- IBM Intelligent Miner
- SAS Enterprise Miner
- S-PLUS (Insightful)
32 ?? ????
?   ? ?    ?    ?    ?
??/??? ??? ????? ??? DM (Direct Mail)? ??? ???? ?? ?? ?? ??/??? ?? ?? ????? ??? ??? ?? ?? ????, ??? ???? ??, ????, ???
??/?? ???? ?? ?? ?? ?? ???? ?? ? ???? ???? ?? ? ?? ?? ???? ?? ???? ?? ???? ?? ?? ?? ?? ?? ??
?? ????? ?? ??? ?? ?? ?? ??? ?? ?? ?? ?? ??? ?? ??? ??? ??? ?? ??
?? ??? ??/?? ??? ??? ?? ?? ?? ?? ?? ?? ?? ? ?? ?? ????? ?? ?? ?? ?? ?? ?? ? ?? ??
33 ?? ??
- Mining Business Databases, Brachman et al., CACM, Vol. 39, No. 11, 1996
- Mining Scientific Data, Fayyad et al., CACM, Vol. 39, No. 11, 1996
- Quest (IBM Almaden)
- http://www.almaden.ibm.com/cs/quest
- DBMiner (Simon Fraser Univ.)
- http://db.cs.sfu.ca/DBMiner
- KDD (GTE)
- http://info.gte.com/kdd/index.html
- International Conference on Knowledge Discovery and Data Mining
- Advances in Knowledge Discovery and Data Mining, MIT Press, 1996
34 ? ?
- ??? ?? ?? => ??, ??? ?? ??
- ??????? ??? ??
- ??? ??????? ??? ??
- ???? ??? ??? ?? ??? ??
- ?? ?? ??? ?? ?
- ??? ?????? ?? ?? ??
- Hot Research Item