Title: Data Mining and Knowledge Discovery
1 Data Mining and Knowledge Discovery for Strategic Business Optimization
- Peter van der Putten
- ALP Group, LIACS KiQ Ltd
- November 2004
2 Why is a business in business?
- Successful businesses create a lot of added value for their customers and capture it
- Maximize long-term profit
- Optimize: maximize sales, minimize costs, minimize risk
3 Challenges
- Businesses are bigger
- Fragmentation of products, customer interaction channels, market segments
- Fierce competition, chaotic economic climate and dynamic customer behavior
- Data glut, information overflow
- Solution: data mining and knowledge discovery for strategic business optimization
4 Credit scoring case: minimizing loan risk while maximizing loan acceptance
5 Marketing case: maximizing direct mail response while minimizing cost
A model was created that predicts the probability that a customer will respond to a mailing. By using the model to select customers to mail, we could reach 50% of the responders by mailing only 20% of all customers.
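The "50% of responders from 20% of customers" figure is a point on a cumulative gains curve. A minimal sketch of how such a point could be computed is given here; the function name `cumulative_gains` and the ten-customer data set are hypothetical and not taken from the case itself.

```python
def cumulative_gains(scores, responded, fraction):
    """Share of all responders reached when mailing the top `fraction` of customers by score."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # Rank customers from highest to lowest predicted response probability.
    ranked = sorted(zip(scores, responded), key=lambda pair: pair[0], reverse=True)
    # Number of customers we actually mail (at least one).
    n_mailed = max(1, round(fraction * len(ranked)))
    reached = sum(r for _, r in ranked[:n_mailed])
    total = sum(responded)
    return reached / total if total else 0.0


# Hypothetical data: 10 customers, 4 of whom responded (1 = responded).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
responded = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

gain_at_20 = cumulative_gains(scores, responded, 0.20)
print(gain_at_20)  # 0.5: mailing the top 20% reaches half of the responders
```

Without a model, mailing a random 20% would be expected to reach about 20% of the responders, so the gap between the two numbers is what the model contributes.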
6 Siebel
OMEGA predicts a slight preference for general
insurance and offers a one-click cross-sell
button.
Although the next customer might have preferences
as well, the exit risk is overriding. Using a
combination of predictive models and business
rules, OMEGA suggests to Siebel an immediate
attempt to retain the customer.
OMEGA offers Siebel the appropriate text for its
script engine.
Within general insurance, OMEGA predicts a
preference for car insurance and offers one-click
access to the appropriate script.
OMEGA again offers Siebel the appropriate text to
execute a retention script.
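The callouts above describe decision logic in which a business rule (exit risk overrides everything else) sits on top of predictive model scores. The sketch below illustrates that idea only; it is not OMEGA's or Siebel's API, and the function name, the 0.5 threshold and the product names other than general insurance are assumptions.

```python
def next_best_action(exit_risk, product_scores, exit_threshold=0.5):
    """Combine model scores with one business rule to pick an action.

    exit_risk      -- predicted probability that the customer leaves (model output)
    product_scores -- dict mapping product name to predicted preference (model output)
    exit_threshold -- hypothetical cut-off above which retention takes priority
    """
    # Business rule: a high exit risk overrides any cross-sell preference.
    if exit_risk >= exit_threshold:
        return ("retain", None)
    # Otherwise offer the product with the highest predicted preference.
    best_product = max(product_scores, key=product_scores.get)
    return ("cross-sell", best_product)


# Hypothetical scores for two customers.
loyal = next_best_action(0.1, {"general insurance": 0.4, "life insurance": 0.3})
at_risk = next_best_action(0.8, {"general insurance": 0.6, "life insurance": 0.2})
print(loyal)    # ('cross-sell', 'general insurance')
print(at_risk)  # ('retain', None)
```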
7 Overview
- Why Data Mining?
- The Data Mining Process
- Data Mining Tasks
- Data Mining Techniques
- Future Outlook
- Data Mining Opportunities by Sector and Function
- Q&A
8 Some working definitions
- Data Mining and Knowledge Discovery in Databases (KDD) are used interchangeably
- Data mining: the discovery of interesting, meaningful and actionable patterns hidden in large amounts of data
- Multidisciplinary field originating from artificial intelligence, pattern recognition, statistics, machine learning, econometrics, ...
9 Data mining is a process
- Model development
- Objective
- Data collection and preparation
- Model construction
- Model evaluation
- Combining models with business knowledge into decision logic
- Model / decision logic deployment
- Model / decision logic monitoring
10 Data mining tasks
- Undirected, explorative, descriptive, unsupervised data mining
- Matching and search
- Profile and rule extraction
- Clustering and segmentation
- Directed, predictive, supervised data mining
- Predictive modeling
11 Data mining task example: clustering and segmentation
12 Data mining task example: clustering and segmentation
13 Start Looking Glass
14 Intermediate result Looking Glass
15 Result Looking Glass
16 Result Looking Glass
17 Data mining task example: predictive modeling
18 Data mining task example: predictive modeling
Collected data
19 Data mining task example: predictive modeling
Known customer behaviour
20 Data mining task example: predictive modeling
score = (0 x Income) + (-1 x Age) + (25 x Children)
21 Data mining task example: predictive modeling
- Recruitment
- Who will respond to a mailing campaign?
- To whom can we cross-sell which products?
- What will be the customer value one year from now?
- Retention
- Who is going to cancel his/her mobile phone subscription? Should I attempt to keep this customer?
- Which customers have accounts that will go dormant?
- Risk
- Should I sell a loan to this person?
- How much money will someone claim on a policy?
- Is this caller going to pay his bills?
22 Data mining techniques for predictive modeling
- Linear and logistic regression
- Decision trees
- Neural Networks
- Genetic Algorithms
- ...
23 Linear Regression Models
score = (0 x Income) + (-1 x Age) + (25 x Children)
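Assuming the score on this slide is a weighted sum of the three terms shown, with weights 0, -1 and 25, it could be computed as in the sketch below. The dictionary layout and the example customer are hypothetical.

```python
# Weights as shown on the slide: income 0, age -1, children 25.
WEIGHTS = {"income": 0, "age": -1, "children": 25}


def score(customer, weights=WEIGHTS):
    """Linear score: the sum of weight times predictor value."""
    return sum(weights[name] * customer[name] for name in weights)


# Hypothetical customer.
example = {"income": 30000, "age": 40, "children": 2}
example_score = score(example)
print(example_score)  # 0*30000 + (-1)*40 + 25*2 = 10
```

With these weights income has no influence on the score, each year of age lowers it by 1 and each child raises it by 25.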
24 Regression in pattern space
Only a single line is available in pattern space to separate the classes
[Figure: pattern space with axes income and age; classes: square and circle]
25 Decision Trees
[Figure: decision tree]
- Root: 20000 customers, response 1%
- Split: Income > 150000? no: 18800 customers; yes: 1200 customers
- Next-level splits: Purchases > 10? and balance > 50000?, leading to nodes of 800 and 400 customers, etc.
- Leaf responses shown: 1.8% and 0.1%
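Each node in such a tree asks one yes/no question and reports the size and response rate of each branch. The sketch below shows how those two numbers could be computed for a single split; the function name `split_stats` and the six-customer data set are hypothetical and much smaller than the 20000 customers on the slide.

```python
def split_stats(customers, question):
    """Return {answer: (count, response rate)} for the 'no' and 'yes' branches of one split."""
    branches = {"no": [], "yes": []}
    for customer in customers:
        branches["yes" if question(customer) else "no"].append(customer)
    stats = {}
    for answer, group in branches.items():
        responders = sum(c["responded"] for c in group)
        rate = responders / len(group) if group else 0.0
        stats[answer] = (len(group), rate)
    return stats


# Hypothetical customers (responded: 1 = yes, 0 = no).
customers = [
    {"income": 200000, "responded": 1},
    {"income": 180000, "responded": 0},
    {"income": 40000, "responded": 0},
    {"income": 60000, "responded": 0},
    {"income": 30000, "responded": 1},
    {"income": 20000, "responded": 0},
]

stats = split_stats(customers, lambda c: c["income"] > 150000)
print(stats)  # {'no': (4, 0.25), 'yes': (2, 0.5)}
```

A tree-building algorithm would try many candidate questions like this one and keep the split whose branches differ most in response rate, then repeat the procedure inside each branch.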
26 Decision Trees in Pattern Space
Line pieces perpendicular to the axes. Each line is a split in the tree: two answers to a question.
[Figure: pattern space with axes income and age]
27 Infotrees (Genetic Programming)
- Nested regression formulas
- sum(average(region, spend), max(age, children))
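A nested formula like the one above can be represented as a tree and evaluated recursively. The sketch below is an illustration only: the tuple representation, the two-argument definitions of `sum`, `average` and `max`, and the assumption that `region` is numerically encoded are not taken from the slides.

```python
# Hypothetical two-argument operators for the nested formula.
OPERATORS = {
    "sum": lambda a, b: a + b,
    "average": lambda a, b: (a + b) / 2,
    "max": max,
}


def evaluate(tree, customer):
    """Evaluate a formula tree: a string is a predictor, a tuple is (operator, left, right)."""
    if isinstance(tree, str):
        return customer[tree]
    operator, left, right = tree
    return OPERATORS[operator](evaluate(left, customer), evaluate(right, customer))


# sum(average(region, spend), max(age, children))
INFOTREE = ("sum", ("average", "region", "spend"), ("max", "age", "children"))

# Hypothetical customer with a numerically encoded region.
customer = {"region": 2, "spend": 100, "age": 35, "children": 1}
result = evaluate(INFOTREE, customer)
print(result)  # (2 + 100) / 2 + max(35, 1) = 86.0
```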
28 Infotrees in Pattern Space
Infotrees can separate any class in pattern space, even if the class boundary is non-linear, so they can model complex customer behavior.
[Figure: pattern space with axes income and age]
29 Genetic Algorithms / Programming
- How to find the best Infotree? Genetic algorithms
- Based on the idea of evolution
- Start with (random) Infotrees
- Build a new generation
- Fittest models can reproduce to create offspring, worst models die
- Small amount of mutation occurs to keep exploring
- Repeat process
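The loop described above can be sketched in a few lines. To keep the example short it evolves plain lists of numbers rather than Infotrees, which is a simplification and not the method from the talk; the function name `evolve`, all parameter defaults and the toy fitness function are assumptions.

```python
import random


def evolve(fitness, n_genes, pop_size=20, generations=30, mutation_rate=0.1, seed=0):
    """Minimal genetic algorithm over lists of floats; higher fitness is better."""
    rng = random.Random(seed)
    # Start with random models.
    population = [[rng.uniform(-1, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        # The fittest half survives unchanged and may reproduce; the worst half dies.
        survivors = population[: pop_size // 2]
        offspring = []
        while len(survivors) + len(offspring) < pop_size:
            mother, father = rng.sample(survivors, 2)
            # Cross-over: take the head of one parent and the tail of the other.
            cut = rng.randrange(1, n_genes) if n_genes > 1 else 0
            child = mother[:cut] + father[cut:]
            # Mutation: occasionally nudge a gene to keep exploring.
            child = [g + rng.gauss(0, 0.1) if rng.random() < mutation_rate else g
                     for g in child]
            offspring.append(child)
        population = survivors + offspring
    population.sort(key=fitness, reverse=True)
    history.append(fitness(population[0]))
    return population[0], history


# Toy fitness: closeness to a hypothetical target weight vector.
target = [0.5, -0.25, 0.1]


def fitness(genes):
    return -sum((g - t) ** 2 for g, t in zip(genes, target))


best, history = evolve(fitness, n_genes=3)
print(history[0], history[-1])  # best fitness in the first and the final population
```

Because the surviving half is carried over unchanged, the best fitness never gets worse from one generation to the next; that is a design choice of this sketch, not a property of every genetic algorithm.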
30 Notes about Infotree models: Cross-over
- New models can be created by cross-over
- part of one model is swapped with part of another
- parts may be chosen randomly or intelligently
31 Notes about Infotree models: Mutation
- New models can be created by mutation
- part of a model (a sub-tree, operator or predictor) is changed
- part and type of change may be chosen randomly or intelligently
[Figure: three example mutations (sub-tree, operator, predictor) of a tree with nodes convex, concave, children and age]
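Cross-over and mutation on tree-shaped models can both be expressed as replacing the part found at some position in the tree. The sketch below assumes a model is a nested tuple `(operator, left, right)` with predictor names as leaves, and that a position is a path of tuple indices; this representation and all names are hypothetical. The paths are passed in explicitly here, whereas a real search would choose them randomly or intelligently.

```python
def get_subtree(tree, path):
    """Follow a path of tuple indices (0 = operator, 1 = left, 2 = right)."""
    for index in path:
        tree = tree[index]
    return tree


def replace_subtree(tree, path, new_part):
    """Return a copy of `tree` with the part at `path` replaced by `new_part`."""
    if not path:
        return new_part
    children = list(tree)
    children[path[0]] = replace_subtree(tree[path[0]], path[1:], new_part)
    return tuple(children)


def crossover(a, path_a, b, path_b):
    """Swap the part of `a` at `path_a` with the part of `b` at `path_b`."""
    part_a, part_b = get_subtree(a, path_a), get_subtree(b, path_b)
    return replace_subtree(a, path_a, part_b), replace_subtree(b, path_b, part_a)


def mutate(tree, path, new_part):
    """Change one part of a model: a sub-tree, an operator or a predictor."""
    return replace_subtree(tree, path, new_part)


parent_a = ("sum", ("average", "region", "spend"), ("max", "age", "children"))
parent_b = ("max", "income", ("sum", "age", "spend"))

# Cross-over: swap the left sub-tree of parent_a with the right sub-tree of parent_b.
child_a, child_b = crossover(parent_a, (1,), parent_b, (2,))

# Mutations: change the root operator, then change one predictor.
operator_mutant = mutate(parent_a, (0,), "max")
predictor_mutant = mutate(parent_a, (2, 2), "income")
print(child_a)
print(child_b)
```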
32 Short Demo (if time allows)
- Model to predict caravan policy ownership
- Combining this model with other models and
business rules
33 Data Mining: the Future
- Business (marketing)
- More fine-grained segmentation down to the cluster or individual level
- More personalised actions, inbound and outbound, in all customer contact channels
- Optimization of both value for the business and the customer
- Privacy
- Technical
- From Data Mining to Decisioning, combining multiple models with business rules
- Monitoring business and model performance
- Data Mining Process Automation
34 Let's discuss: Data Mining Opportunities by Function
- Marketing, Sales, CRM
- Product Development, R&D
- Manufacturing, Production, Logistics
- Customer service
- Finance
- Procurement
- Human Resources
- IT
- ...
35 Let's discuss: Data Mining Opportunities by Sector
- Retail
- Telco
- Pharma
- Government
- Automotive
- Oil
- Charity
- Consumers / Citizens
- ...
36 The Paper: Requirements
- 2500 words +/- 10%, APA style references
- No plagiarism / copying! Rephrase in your own words, reference, cite and quote
- Two parts of 1250 words each
- Your grasp of the research topic: what is data mining? Own interpretation, clear, put into context
- Memo to CEO/CIO of a specific company / industry: what are the benefits/changes/opportunities and next steps (best practice, proof of concept)? Impact, convincing, plan of action.
37 The Paper: Suggestions
- Suggestions for companies
- KPN Mobile, Marketing: how to reduce loss of customers to competitors
- Dutch Police, Strategic Innovation: opportunities for law enforcement, privacy implications
- Pfizer, Drug Discovery: using data mining to find new drugs
- Google, Product Management / R&D: opportunities for new data mining features to enhance customer experience
- Your Idea!
38 The Paper: Resources
- Webpage for this talk
- http://www.liacs.nl/putten/ictvision.html
- General Writing Resources
- http://www.liacs.nl/putten/writingpapers.html
- Homepage
- www.liacs.nl/putten, mail: putten_at_liacs.nl
39 Dilbert's Perspective on Data Mining