Some working definitions - PowerPoint PPT Presentation

Slides: 27
Provided by: leid4

1
Some working definitions
  • The terms data mining and Knowledge Discovery
    in Databases (KDD) are used interchangeably
  • Data mining: the discovery of interesting,
    meaningful and actionable patterns hidden in
    large amounts of data
  • A multidisciplinary field originating from
    artificial intelligence, pattern recognition,
    statistics, machine learning, econometrics, …

2
Data mining is a process
  • Business objectives
  • Model development
      – Model objective
      – Data collection and preparation
      – Model construction
      – Model evaluation
  • Combining models with business knowledge into
    decision logic
  • Model / decision logic deployment
  • Model / decision logic monitoring

3
Data mining is a process: a marketing example
  • Business objectives
      – Cross-sell the MMS bundle to lapsed users /
        non-users
  • Model development
      – Model objective
          ◦ For consumers with no MMS bundle in the past
            6 months, predict MMS bundle ownership
            (yes/no) in the next three months
      – Data collection and preparation
          ◦ All fields for all active customers as of end
            APR05; remove all customers with an MMS
            bundle in NOV04-APR05; left join the MMS
            bundle field from MAY05, JUNE05 and JULY05
      – Model construction
          ◦ Build various models to predict MMS bundle
            ownership in MAY, JUNE or JULY (yes/no) on
            70% of the data
      – Model evaluation
          ◦ Evaluate predictive power on the 70% of the
            data used for model development and on the
            30% test set
  • Combining models with business knowledge into
    decision logic
      – Target the top 30% and randomly test two
        propositions (50 MMS for 5 Euro, 100 MMS for
        7.50 Euro) across two channels (direct mail and
        SMS)
  • Model / decision logic deployment
      – Run the campaign
  • Model / decision logic monitoring
      – Compare predictions against actual responses
        to evaluate model quality and robustness
      – Determine which propositions / channels work
        best
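The data preparation, 70/30 split and top-30% targeting steps of this example can be sketched in plain Python. This is a minimal sketch, assuming each customer is a dict; all field names (`had_bundle_nov04_apr05`, `bundle_may05`, `score`, ...) are hypothetical, and the `score` is assumed to come from whatever model was built in the construction step.

```python
import random

def build_dataset(customers):
    """Drop customers who had an MMS bundle in NOV04-APR05; label = bundle in MAY05, JUNE05 or JULY05.
    Field names are hypothetical placeholders for the real customer table."""
    rows = []
    for c in customers:
        if c["had_bundle_nov04_apr05"]:
            continue
        label = c["bundle_may05"] or c["bundle_jun05"] or c["bundle_jul05"]
        rows.append({**c, "label": int(label)})
    return rows

def split_train_test(rows, train_frac=0.7, seed=42):
    """Randomly split rows into a development (train) set and a test set."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

def select_targets(scored, top_frac=0.3):
    """Return the top `top_frac` share of customers ranked by (hypothetical) model score."""
    ranked = sorted(scored, key=lambda c: c["score"], reverse=True)
    return ranked[:round(len(ranked) * top_frac)]

def assign_test_cells(targets, propositions, channels, seed=7):
    """Randomly assign each targeted customer one proposition and one channel."""
    rng = random.Random(seed)
    return [
        {**c, "proposition": rng.choice(propositions), "channel": rng.choice(channels)}
        for c in targets
    ]
```

Random assignment of propositions and channels is what later lets the monitoring step compare response per cell fairly.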

4
Data mining tasks
  • Undirected, explorative, descriptive,
    unsupervised data mining
      – Matching and search
      – Profile and rule extraction
      – Clustering, segmentation, dimension reduction
  • Directed, predictive, supervised data mining
      – Predictive modeling

5
Data mining task example: clustering / segmentation
6
Data mining task example: clustering / segmentation
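The figures for these clustering slides did not survive. As an illustration only, here is a minimal k-means sketch in plain Python, assuming segmentation is done with k-means on numeric customer attributes (k-means is a swapped-in example technique; the slides do not name an algorithm).

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means on a list of numeric tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda j: sum((p - c) ** 2 for p, c in zip(pt, centroids[j])))
            for pt in points
        ]
        # Update step: move each centroid to the mean of its members (keep it if the cluster is empty).
        for j in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels
```

In practice the attributes should be scaled to comparable ranges first, otherwise the attribute with the largest range dominates the distance.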
7
Start Looking Glass
Source: Sentient Information Systems
(www.sentient.nl)
8
Intermediate result: Looking Glass
Source: Sentient Information Systems
(www.sentient.nl)
9
Result: Looking Glass
Source: Sentient Information Systems
(www.sentient.nl)
10
Result: Looking Glass
Source: Sentient Information Systems
(www.sentient.nl)
11
Data mining task example: predictive modeling
12
Data mining task example: predictive modeling
Collected data
13
Data mining task example: predictive modeling
$\text{score} = (0 \times \text{Income}) + (-1 \times \text{Age}) + (25 \times \text{Children})$
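The score on this slide can be applied directly to rank customers. A minimal sketch, assuming the three terms are added (the operators were lost in the transcript) and that customers are dicts with hypothetical keys `income`, `age` and `children`:

```python
def score(income, age, children):
    """Linear score read from the slide, assuming the terms are summed:
    (0 x Income) + (-1 x Age) + (25 x Children)."""
    return 0 * income + (-1) * age + 25 * children

def rank_customers(customers):
    """Sort customer dicts (hypothetical keys: income, age, children) from highest to lowest score."""
    return sorted(
        customers,
        key=lambda c: score(c["income"], c["age"], c["children"]),
        reverse=True,
    )
```

With a weight of 0, income has no influence on the ranking; each extra child adds 25 points and each year of age subtracts one.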
14
Data mining techniques for predictive modeling
  • Linear and logistic regression
  • Decision trees
  • Neural networks
  • Nearest neighbor
  • Genetic algorithms
  • …

15
Linear Regression Models
$\text{score} = (0 \times \text{Income}) + (-1 \times \text{Age}) + (25 \times \text{Children})$
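Coefficients like these are normally estimated from collected data with ordinary least squares. A minimal NumPy sketch, assuming a hypothetical synthetic data set generated from the slide's coefficients plus noise (not real customer data), shows the fit recovering them:

```python
import numpy as np

def fit_linear_regression(X, y):
    """Ordinary least squares with an intercept column; returns (intercept, coefficients)."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta[0], beta[1:]

# Hypothetical synthetic data: columns are income, age, children.
rng = np.random.default_rng(0)
n = 500
income = rng.uniform(10_000, 100_000, n)
age = rng.uniform(18, 80, n)
children = rng.integers(0, 5, n)
X = np.column_stack([income, age, children])
y = 0 * income - 1 * age + 25 * children + rng.normal(0, 1.0, n)

intercept, coef = fit_linear_regression(X, y)
```

`coef` should come out close to `[0, -1, 25]`; with real data the coefficients are whatever best fits the observed responses.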
16
Regression in pattern space
Only a single line is available in pattern space to
separate the classes
[Figure: class "square" and class "circle" in the age/income plane]
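To see what "only a single line" means, the sketch below brute-forces a grid of lines $w_1 x + w_2 y + b = 0$ and checks whether one of them puts each class on its own side. It is an illustration under assumed toy data (the four corners of the unit square), not a proof: an AND-like pattern is separable, while the XOR pattern is the classic case where no line works.

```python
import itertools

def find_separating_line(points, labels, grid):
    """Search a grid of (w1, w2, b) for a line w1*x + w2*y + b = 0 that puts every
    True-labelled point on the positive side and every False-labelled point on the other."""
    for w1, w2, b in itertools.product(grid, repeat=3):
        if all((w1 * x + w2 * y + b > 0) == label for (x, y), label in zip(points, labels)):
            return w1, w2, b
    return None

grid = [v / 2 for v in range(-4, 5)]  # -2.0, -1.5, ..., 2.0
corners = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_line = find_separating_line(corners, [False, False, False, True], grid)
xor_line = find_separating_line(corners, [False, True, True, False], grid)
```

`and_line` finds a separating line; `xor_line` is `None` because no straight line separates the XOR classes.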
17
Decision Trees
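[Figure: example decision tree]
  • Root: 20,000 customers, response 1%
  • Split: Income > 150,000?
      – no: 18,800 customers, split further on
        Purchases > 10? (etc.)
      – yes: 1,200 customers, split further on
        Balance > 50,000? into 800 and 400 customers
  • Response rates shown at the leaves: 1.8% and 0.1%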
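Each split in such a tree answers a yes/no question like "Income > 150,000?". A minimal sketch of how one split could be chosen, assuming Gini impurity as the split criterion (a common choice, but the slides do not name one) and a single numeric attribute; a full tree repeats this over all attributes and recursively on each branch.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)

def best_split(values, labels):
    """Find threshold t for the question 'value > t?' that minimises weighted Gini impurity."""
    best_t, best_score = None, gini(labels)
    n = len(labels)
    for t in sorted(set(values))[:-1]:  # the largest value cannot split anything off
        yes = [lab for v, lab in zip(values, labels) if v > t]
        no = [lab for v, lab in zip(values, labels) if v <= t]
        score = (len(yes) * gini(yes) + len(no) * gini(no)) / n
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score
```

For example, `best_split([20000, 40000, 60000, 160000, 200000], [0, 0, 0, 1, 1])` picks the threshold 60000, which separates responders from non-responders perfectly.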
18
Decision Trees in Pattern Space
Line segments perpendicular to the axes: each line
is a split in the tree, i.e. the two answers to one
question
[Figure: axes age and income]
19
Decision Trees in Pattern Space
Goal of the classifier: separate the classes (circle,
square) on the basis of the attributes age and
income. Each line corresponds to a split in the
tree; the decision areas are tiles in pattern space.
[Figure: axes age and weight]
20
Nearest Neighbour
  • The data itself is the classification model, so
    there is no abstraction such as a tree
  • For a given instance x, find the k instances
    that are most similar to x
  • Classify x as the most frequent class among
    these k most similar instances
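The three steps above translate almost line for line into code. A minimal sketch, assuming Euclidean distance as the similarity measure and training data given as `(features, label)` pairs; the age/weight values in the usage note are hypothetical.

```python
import math
from collections import Counter

def knn_classify(train, x, k=3):
    """train: list of (features, label) pairs. Return the majority label among the k instances nearest to x."""
    # No model is built: the stored instances themselves are searched at prediction time.
    neighbours = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```

With hypothetical `(age, weight)` instances such as `((25, 60), "circle")` and `((60, 90), "square")`, a new instance at `(27, 64)` is classified by the labels of its three nearest neighbours. An odd k avoids ties between two classes.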

21
Nearest Neighbor in Pattern Space
Classification
Any decision area is possible, on condition that
enough data is available
[Figure: a new instance among existing instances; axes e.g. weight and age]
22
Nearest Neighbor in Pattern Space
Prediction
Any decision area is possible, on condition that
enough data is available
[Figure: axes e.g. weight and age]
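For predicting a numeric value instead of a class, nearest neighbor typically averages the target values of the k nearest instances. A minimal sketch, assuming Euclidean distance and a plain (unweighted) mean; predicting weight from age is a hypothetical example suggested by the figure axes.

```python
import math

def knn_predict(train, x, k=3):
    """train: list of (features, numeric target) pairs. Predict the mean target of the k nearest instances."""
    neighbours = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    return sum(target for _, target in neighbours) / len(neighbours)
```

With hypothetical `(age,) -> weight` pairs such as `((20,), 60.0)`, `((22,), 62.0)`, `((24,), 64.0)`, the prediction for age 21 with k = 3 is the average of those three weights. A distance-weighted mean is a common variant.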
23
Example classification algorithm 3: Neural Networks
  • Inspired by neuronal computation in the brain
    (McCulloch & Pitts 1943 (!))
  • The input (attributes) is coded as activation on
    the input-layer neurons; activation feeds forward
    through a network of weighted links between
    neurons and causes activations on the output
    neurons (for instance, diabetic yes/no)
  • The algorithm learns to find optimal weights from
    the training instances using a general learning
    rule.
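As an example of "a general learning rule", the sketch below trains a single sigmoid output neuron by batch gradient descent on the cross-entropy loss (the delta rule). The sigmoid activation, learning rate, epoch count and the AND-shaped toy data are all assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_neuron(X, y, lr=0.5, epochs=5000):
    """Learn the weights and bias of one sigmoid output neuron by batch gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)   # forward pass: activation of the output neuron
        error = p - y            # gradient of cross-entropy w.r.t. the neuron's input sum
        w -= lr * X.T @ error / len(y)
        b -= lr * error.mean()
    return w, b

# Hypothetical toy training instances (logical AND).
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 0.0, 0.0, 1.0])
w, b = train_neuron(X, y)
```

After training, thresholding `sigmoid(X @ w + b)` at 0.5 reproduces the training labels.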

24
Neural Networks
  • Example simple network (2 layers)
  • Probability of being diabetic:
    $P(\text{diabetic}) = f(\text{age} \times w_{\text{age}} + \text{body mass index} \times w_{\text{body mass index}})$

[Figure: input neurons age and body_mass_index, linked by the weights weightage and weightbody mass index to one output neuron: probability of being diabetic]
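The forward pass of this two-layer network is a weighted sum passed through a function f. A minimal sketch, assuming f is the logistic sigmoid and adding an optional bias term (the slide shows neither); the weight values in the usage note are hypothetical, not learned from data.

```python
import math

def prob_diabetic(age, bmi, weight_age, weight_bmi, bias=0.0):
    """Two-layer network: output = f(age * weight_age + bmi * weight_bmi + bias),
    with f assumed to be the logistic sigmoid."""
    z = age * weight_age + bmi * weight_bmi + bias
    return 1.0 / (1.0 + math.exp(-z))
```

For example, with hypothetical weights `weight_age=0.04`, `weight_bmi=0.1` and `bias=-6.0`, a 50-year-old with a body mass index of 30 gets an input sum of about -1, i.e. a probability of about 0.27.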
25
Neural Networks in Pattern Space
Classification
Simple network: only a line is available (why?) to
separate the classes. Multilayer network: any
classification boundary is possible.
[Figure: axes e.g. weight and age]
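One answer to the "why?": a simple network computes f(w1·x1 + w2·x2 + b) with a monotone f, so the boundary where the output crosses its threshold is the line w1·x1 + w2·x2 + b = constant. A hidden layer can combine several such lines. The sketch below uses hand-chosen weights and a step activation (both assumptions, chosen for readability rather than learned) to solve XOR, which no single line can separate.

```python
def step(z):
    """Threshold activation: fires (1) when the weighted input sum is positive."""
    return 1 if z > 0 else 0

def xor_network(x1, x2):
    """Two-layer network with hand-chosen weights: hidden units draw two lines (OR and AND),
    and the output unit fires for 'OR but not AND', which is XOR."""
    h_or = step(1.0 * x1 + 1.0 * x2 - 0.5)
    h_and = step(1.0 * x1 + 1.0 * x2 - 1.5)
    return step(1.0 * h_or - 1.0 * h_and - 0.5)
```

Each hidden unit contributes one line in pattern space; the output unit combines the two half-planes into a band, a boundary no single-line network can produce.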
26
Dilbert's Perspective on Data Mining