Knowledge Discovery in Database (KDD) - PowerPoint PPT Presentation

Slides: 17
Provided by: leeyu2

Transcript and Presenter's Notes

1
Knowledge Discovery in Database (KDD)
2
Knowledge Discovery Process
  • The whole process of extraction of implicit,
    previously unknown and potentially useful
    knowledge from a large database
  • It includes data selection, cleaning, enrichment,
    coding, data mining, and reporting
  • Data mining is the key stage of the knowledge
    discovery process: the process of finding the
    desired information in a large database

3
Knowledge Discovery Process
  • Example: the database of a magazine publisher
    that sells five types of magazines: cars,
    houses, sports, music, and comics
  • Data mining: find interesting customer properties
    • What is the profile of a reader of a car
      magazine?
    • Is there any correlation between an interest in
      cars and an interest in comics?
  • Apply the knowledge discovery process
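The second question (cars vs. comics) can be answered once the data is in a one-row-per-client, 0/1-per-magazine form. As a minimal sketch, assuming that layout and using invented data, the phi coefficient (Pearson correlation for two binary variables) measures how strongly the two interests go together; the slides do not say which measure the author had in mind.

```python
import math

# Hypothetical flattened data (invented for illustration): one row per
# client, 1 = subscribes to that magazine type, 0 = does not.
readers = [
    {"car": 1, "comics": 1},
    {"car": 1, "comics": 0},
    {"car": 0, "comics": 0},
    {"car": 1, "comics": 1},
    {"car": 0, "comics": 1},
    {"car": 0, "comics": 0},
]


def phi_coefficient(rows, a, b):
    """Phi coefficient (Pearson correlation) between two 0/1 columns."""
    n11 = sum(1 for r in rows if r[a] == 1 and r[b] == 1)
    n10 = sum(1 for r in rows if r[a] == 1 and r[b] == 0)
    n01 = sum(1 for r in rows if r[a] == 0 and r[b] == 1)
    n00 = sum(1 for r in rows if r[a] == 0 and r[b] == 0)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        # One of the columns is constant, so the correlation is undefined.
        return 0.0
    return (n11 * n00 - n10 * n01) / denom


print(round(phi_coefficient(readers, "car", "comics"), 3))  # 0.333
```

A value near +1 means the two interests usually occur together, near 0 means no linear relationship, and near -1 means they tend to exclude each other.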

4
Data Selection
  • Select the information about people who have
    subscribed to a magazine
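Data selection amounts to filtering the operational data down to the relevant people. A minimal sketch, assuming hypothetical records with a `subscriptions` list per client:

```python
# Hypothetical client records; the field names are assumptions for illustration.
clients = [
    {"client_no": 1, "name": "J. Smith", "subscriptions": ["car", "music"]},
    {"client_no": 2, "name": "M. Jones", "subscriptions": []},
    {"client_no": 3, "name": "A. Brown", "subscriptions": ["comics"]},
]


def select_subscribers(records):
    """Keep only the people who subscribe to at least one magazine."""
    return [r for r in records if r["subscriptions"]]


print([r["client_no"] for r in select_subscribers(clients)])  # [1, 3]
```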

5
Cleaning
  • Pollution: typing errors, people moving from one
    place to another without reporting a change of
    address, people giving incorrect information
    about themselves
  • Cleaning tool: pattern recognition algorithms
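One simple pattern-matching technique for spotting pollution such as a misspelled name is fuzzy string comparison. The sketch below uses Python's `difflib.SequenceMatcher` to flag probable duplicate clients; the records and the 0.85 threshold are hypothetical, and this is only one possible approach, not necessarily the algorithm the slides refer to.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical client records: (client number, name, address).
records = [
    (1, "John Smith", "12 Oak Street"),
    (2, "Jon Smith", "12 Oak Street"),
    (3, "Mary Jones", "4 Elm Road"),
]


def similarity(a, b):
    """Similarity ratio in [0, 1] between two strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def likely_duplicates(rows, threshold=0.85):
    """Return pairs of client numbers whose name AND address are both similar."""
    pairs = []
    for (id1, name1, addr1), (id2, name2, addr2) in combinations(rows, 2):
        if similarity(name1, name2) >= threshold and similarity(addr1, addr2) >= threshold:
            pairs.append((id1, id2))
    return pairs


print(likely_duplicates(records))  # [(1, 2)]
```

Flagged pairs would normally be reviewed or merged rather than deleted automatically, since two different people can share similar names.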

6
Cleaning
  • Lack of domain consistency
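Domain consistency means every value lies within the set of values its field is allowed to take. A minimal sketch, assuming hypothetical field names and domain rules chosen only for illustration:

```python
from datetime import date

# Hypothetical domain rules: each field maps to a test its values must pass.
DOMAINS = {
    "birth_date": lambda d: date(1900, 1, 1) <= d <= date.today(),
    "income": lambda x: x >= 0,
    "owns_car": lambda x: x in ("yes", "no"),
}


def domain_violations(record):
    """Return the fields of `record` whose values fall outside their domain."""
    return [f for f, is_valid in DOMAINS.items() if f in record and not is_valid(record[f])]


print(domain_violations({"birth_date": date(1875, 3, 2), "income": -10, "owns_car": "maybe"}))
# ['birth_date', 'income', 'owns_car']
```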

7
Enrichment
  • We need extra information about the clients,
    namely date of birth, income, amount of credit,
    and whether or not each client owns a car or a
    house

8
Enrichment
  • The new information needs to be easy to join to
    the existing client records
  • Enrichment lets us extract more knowledge
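Joining the purchased information to the existing client records is easiest when both sources share a key. A minimal sketch, assuming a hypothetical shared `client_no` key and a left join so that clients without extra data are kept:

```python
# Existing client records (hypothetical).
clients = [
    {"client_no": 1, "name": "J. Smith"},
    {"client_no": 2, "name": "M. Jones"},
]

# Hypothetical extra data from another source, keyed by the same client number.
extra_info = {
    1: {"birth_date": "1961-04-13", "income": 45000, "credit": 12000,
        "owns_car": "yes", "owns_house": "no"},
}

EXTRA_FIELDS = ["birth_date", "income", "credit", "owns_car", "owns_house"]


def enrich(base, additional):
    """Left join on client_no: every client is kept; unknown values become None."""
    enriched = []
    for rec in base:
        info = additional.get(rec["client_no"], {})
        enriched.append({**rec, **{f: info.get(f) for f in EXTRA_FIELDS}})
    return enriched


for row in enrich(clients, extra_info):
    print(row["client_no"], row["income"], row["owns_car"])
# 1 45000 yes
# 2 None None
```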

9
Coding
  • We select only those records that have enough
    information to be of value (row selection)
  • Project the fields in which we are interested
    (column projection)
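Row selection and column projection correspond to the relational select and project operations. A minimal sketch on hypothetical enriched records, where `None` marks a missing value:

```python
# Hypothetical enriched records; None marks a value that could not be found.
records = [
    {"client_no": 1, "name": "J. Smith", "income": 45000, "credit": 12000},
    {"client_no": 2, "name": "M. Jones", "income": None, "credit": None},
    {"client_no": 3, "name": "A. Brown", "income": 38000, "credit": 5000},
]


def select_rows(rows, required):
    """Row selection: keep records in which every required field is known."""
    return [r for r in rows if all(r.get(f) is not None for f in required)]


def project(rows, columns):
    """Column projection: keep only the fields we are interested in."""
    return [{c: r[c] for c in columns} for r in rows]


useful = select_rows(records, ["income", "credit"])
print(project(useful, ["client_no", "income", "credit"]))
# [{'client_no': 1, 'income': 45000, 'credit': 12000}, {'client_no': 3, 'income': 38000, 'credit': 5000}]
```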

10
Coding
11
Coding
  • Code information that is too detailed:
    • Address to region
    • Birth date to age
    • Divide income by 1000
    • Divide credit by 1000
    • Convert car ownership yes/no to 1/0
    • Convert purchase dates to month numbers,
      counting from 1990
  • The way in which we code the information will
    determine the type of patterns we find
  • Coding has to be performed repeatedly in order to
    get the best results
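The coding rules above can be expressed as one function applied to each record. This is a sketch under several stated assumptions: the city-to-region table is hypothetical (standing in for "address to region"), ages are computed on an assumed reference date, division by 1000 rounds down, and January 1990 is taken as month 1.

```python
from datetime import date

# Hypothetical city-to-region lookup; a real system would need a full table.
REGION_OF_CITY = {"Amsterdam": "north-west", "Utrecht": "centre"}

# Assumed reference date for computing ages.
REFERENCE_DATE = date(1995, 1, 1)


def age_on(birth, ref):
    """Age in whole years on date `ref`."""
    return ref.year - birth.year - ((ref.month, ref.day) < (birth.month, birth.day))


def month_number(d):
    """Month count from 1990, assuming January 1990 is month 1."""
    return (d.year - 1990) * 12 + d.month


def code_record(r):
    """Replace overly detailed values with coarser codes."""
    return {
        "region": REGION_OF_CITY.get(r["city"], "unknown"),
        "age": age_on(r["birth_date"], REFERENCE_DATE),
        "income": r["income"] // 1000,  # thousands, rounded down
        "credit": r["credit"] // 1000,  # thousands, rounded down
        "car": 1 if r["owns_car"] == "yes" else 0,
        "purchase_month": month_number(r["purchase_date"]),
    }


raw = {"city": "Utrecht", "birth_date": date(1961, 4, 13), "income": 45500,
       "credit": 12000, "owns_car": "yes", "purchase_date": date(1993, 3, 5)}
print(code_record(raw))
# {'region': 'centre', 'age': 33, 'income': 45, 'credit': 12, 'car': 1, 'purchase_month': 39}
```

Because each rule is a separate line, it is easy to change one coding (for example, age bands instead of exact ages) and rerun the mining step, which matches the point that coding is repeated to get the best results.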

12
Coding
  • We are interested in the relationships between
    readers of different magazines
  • Perform a flattening operation: one record per
    client, with a 0/1 attribute for each magazine
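Flattening turns many subscription rows per client into a single row per client. A minimal sketch, assuming hypothetical (client number, magazine) pairs and the five magazine types from the example:

```python
# Hypothetical subscription records: one row per (client number, magazine).
subscriptions = [(1, "car"), (1, "comics"), (2, "music"), (3, "car"), (3, "sports")]
MAGAZINES = ["car", "house", "sports", "music", "comics"]


def flatten(rows, magazines):
    """One record per client with a 0/1 attribute for every magazine type."""
    table = {}
    for client_no, magazine in rows:
        record = table.setdefault(client_no, {m: 0 for m in magazines})
        record[magazine] = 1
    return table


for client_no, record in flatten(subscriptions, MAGAZINES).items():
    print(client_no, record)
# 1 {'car': 1, 'house': 0, 'sports': 0, 'music': 0, 'comics': 1}
# 2 {'car': 0, 'house': 0, 'sports': 0, 'music': 1, 'comics': 0}
# 3 {'car': 1, 'house': 0, 'sports': 1, 'music': 0, 'comics': 0}
```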

13
Knowledge Discovery Process
14
Business-Question-Driven Process
15
Steps of a KDD Process
  • Learning the Application Domain
    • Relevant Prior Knowledge and Goals of Application
  • Creating a Target Data Set
    • Data Selection
  • Data Cleaning and Enrichment
    • May Take 60% of the Effort
  • Data Reduction and Transformation (Coding)
    • Find Useful Features, Dimensionality Reduction
  • Choosing Functions of Data Mining
    • Summarization, Association, Classification,
      Regression, Clustering
  • Choosing the Mining Algorithm(s)
  • Data Mining
    • Search for Patterns of Interest
  • Pattern Evaluation and Knowledge Presentation
    • Visualization, Transformation, Removing Redundant
      Patterns, etc.
  • Use of Discovered Knowledge
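"Find useful features" can start with very simple heuristics. One of the simplest, shown here as an illustration rather than as the method the slides prescribe, is a variance threshold: a column that never changes cannot help distinguish clients. The data and threshold below are hypothetical.

```python
from statistics import pvariance

# Hypothetical flattened records (0/1 per magazine type).
rows = [
    {"car": 1, "house": 0, "comics": 1},
    {"car": 0, "house": 0, "comics": 1},
    {"car": 1, "house": 0, "comics": 0},
]


def useful_features(records, threshold=0.0):
    """Keep the columns whose population variance exceeds `threshold`."""
    columns = records[0].keys()
    return [c for c in columns if pvariance([r[c] for r in records]) > threshold]


print(useful_features(rows))  # ['car', 'comics']
```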

16
Exercises 1
  1. … RFM …
  2. … (Data Mining) …
  3. …
  4. …
  5. … (…).
  6. … Knowledge Discovery in Database (KDD) ….