Title: Chapter 6 slides, Computer Networking, 3rd edition
1 Data Mining and Its Applications
- Data Mining Techniques: For Marketing, Sales, and Customer Support, by Michael J.A. Berry and Gordon Linoff, John Wiley & Sons, Inc., 1997.
- Discovering Data Mining: From Concept to Implementation, by Cabena, Hadjinian, Stadler, Verhees and Zanasi, Prentice Hall, 1997.
- Building Data Mining Applications for CRM, by Alex Berson, Stephen Smith and Kurt Thearling, McGraw-Hill, 1999.
- Data Mining Cookbook: Modeling Data for Marketing, Risk, and Customer Relationship Management, by Olivia Parr Rud, John Wiley & Sons, Inc., 2001.
- Mastering Data Mining: The Art and Science of Customer Relationship Management, by Michael J.A. Berry and Gordon S. Linoff, John Wiley & Sons, Inc., 2000.
- Machine Learning, by Tom M. Mitchell, McGraw-Hill, 1997.
- Data Mining: Concepts and Techniques, by Jiawei Han and Micheline Kamber, Morgan Kaufmann, 2001.
- Introduction to Data Mining, by Pang-Ning Tan, Michael Steinbach, and Vipin Kumar, Addison-Wesley, 2005.
2 Why Mine Data?
- Lots of data is being collected and warehoused:
  - Web data, e-commerce
  - Purchases at department/grocery stores
  - Bank/credit card transactions
- Computers have become cheaper and more powerful.
- Competitive pressure is strong:
  - Provide better, customized services for an edge (e.g., in Customer Relationship Management)
3 Mining Large Data Sets: Motivation
- There is often information hidden in the data.
- Human analysts may take weeks to discover useful information.
- Much of the data is never analyzed at all.
[Figure: The Data Gap, total new disk capacity (TB) since 1995 vs. number of analysts]
4 What is Data Mining?
- Many definitions:
  - Non-trivial extraction of implicit, previously unknown and potentially useful information from data
  - Exploration and analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns
5 What is (not) Data Mining?
- What is Data Mining?
  - Certain names are more prevalent in certain US locations (O'Brien, O'Rourke, O'Reilly in the Boston area)
  - Group together similar documents returned by a search engine according to their context (e.g., Amazon rainforest, Amazon.com)
- What is not Data Mining?
  - Look up a phone number in a phone directory
  - Query a Web search engine for information about "Amazon"
6 Origins of Data Mining
- Draws ideas from machine learning/AI, pattern recognition, statistics, and database systems
- Traditional techniques may be unsuitable due to:
  - Enormity of data
  - High dimensionality of data
  - Heterogeneous, distributed nature of data
[Figure: Data Mining at the intersection of Statistics/AI, Machine Learning/Pattern Recognition, and Database systems]
7 Data Mining Tasks
- Prediction methods
  - Use some variables to predict unknown or future values of other variables.
- Description methods
  - Find human-interpretable patterns that describe the data.
From Fayyad et al., Advances in Knowledge Discovery and Data Mining, 1996
8 Data Mining Tasks...
- Classification
- Clustering
- Association Rule Discovery
- Sequential Pattern Discovery
9 The Virtuous Cycle of Data Mining
- Identify business problems and areas where analyzing data can provide value.
- Act on the information.
- Measure the results of your efforts to provide insight on how to exploit your data.
Taken from a talk given by Michael J.A. Berry on Data Mining for CRM.
10 Some Typical Business Problems
- Customer profiling
- Customer segmentation
- Direct marketing
- Customer retention
- Basket analysis (retail)
- Cross selling
- Fraud detection
11 Customer Profiling
- Question
  - What kinds of customers were profitable last year?
- Data
  - Customer details such as age, gender, occupation, salary level, account, etc.
  - Earnings from each customer last year
- Data mining
  - Divide customers into profitability categories according to earnings, such as highly profitable, profitable, non-profitable, and loss.
  - Find rules using data mining techniques.
  - Analyze the rules and take actions.
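The categorization step can be made concrete with a minimal Python sketch that buckets customers by last-year earnings. The slide names the categories but gives no cut-offs, so the dollar thresholds in `THRESHOLDS`, the function name `profitability_category`, and the sample customers are all hypothetical.

```python
from collections import Counter

# Hypothetical lower bounds on last-year earnings (dollars), checked top to bottom.
THRESHOLDS = [
    (10_000, "highly profitable"),
    (1_000, "profitable"),
    (0, "non-profitable"),
]

def profitability_category(earnings: float) -> str:
    """Map one customer's last-year earnings to a profitability category."""
    for lower_bound, label in THRESHOLDS:
        if earnings >= lower_bound:
            return label
    return "loss"  # negative earnings: the customer cost more than they brought in

# Hypothetical customers and their last-year earnings.
earnings = {"c1": 12_500.0, "c2": 3_200.0, "c3": 150.0, "c4": -800.0}
categories = {cid: profitability_category(e) for cid, e in earnings.items()}
print(categories)
print(Counter(categories.values()))
```

The resulting labels can then act as the class that rule-finding techniques try to explain from the customer details.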
12 Customer Profiling Rules
- IF age > 30 and age < 45 and
  occupation is professional and
  salary level is between 50,000 and 70,000
  THEN this customer is profitable
- Each rule comes with statistical measures such as support and confidence.
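Support and confidence for a rule like the one above can be computed by counting. This sketch uses the usual definitions:

- Support is the fraction of all customers who match both the IF part and the THEN part.
- Confidence is the fraction of customers matching the IF part who are also profitable.

The `Customer` record and the five customers are hypothetical toy data.

```python
from dataclasses import dataclass

@dataclass
class Customer:
    age: int
    occupation: str
    salary: int
    profitable: bool

def antecedent(c: Customer) -> bool:
    """IF part of the profiling rule."""
    return 30 < c.age < 45 and c.occupation == "professional" and 50_000 <= c.salary <= 70_000

def support_and_confidence(customers: list[Customer]) -> tuple[float, float]:
    matches = [c for c in customers if antecedent(c)]
    both = sum(1 for c in matches if c.profitable)           # IF and THEN both hold
    support = both / len(customers)
    confidence = both / len(matches) if matches else 0.0
    return support, confidence

# Hypothetical toy data.
customers = [
    Customer(35, "professional", 60_000, True),
    Customer(40, "professional", 55_000, True),
    Customer(42, "professional", 65_000, False),
    Customer(28, "professional", 60_000, True),   # fails the age condition
    Customer(50, "clerical", 40_000, False),
]
support, confidence = support_and_confidence(customers)
print(f"support={support:.2f}, confidence={confidence:.2f}")  # support=0.40, confidence=0.67
```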
13 Customer Segmentation
- Consumers are not all the same, so they need to be treated differently. Segmentation is essential in marketing. Customers have:
  - Different spending capabilities
  - Different spending potentials
  - Different behaviors
  - Different profitability
  - Different preferences
  - Different hobbies
  - Different lifestyles
14 Customer Segmentation
- Customer segmentation is a process that divides customers into different groups, or segments. Customers in the same segment have similar needs or behaviors, so similar marketing strategies or service policies can be applied to them.
- Customer segments are required in several business areas, including:
  - Marketing
  - Customer services
  - Product and service development
  - Sales promotion
  - Customer retention
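Clustering is one common way to form such segments. The slides do not name a method, so this sketch swaps in k-means (Lloyd's algorithm) with a deterministic farthest-point initialization. The two features (annual spend in thousands, visits per month) and the six customers are hypothetical.

```python
import math

Point = tuple[float, ...]

def init_centroids(points: list[Point], k: int) -> list[Point]:
    """Farthest-point start: the first customer, then repeatedly the customer
    farthest from every centroid chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids

def kmeans(points: list[Point], k: int, max_iter: int = 100):
    """Lloyd's k-means; returns (segment index per customer, centroids)."""
    centroids = init_centroids(points, k)
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each customer joins the segment with the nearest centroid.
        new_assignment = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_assignment == assignment:
            break  # converged: no customer changed segment
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the previous centroid if a segment is empty
                centroids[j] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return assignment, centroids

# Hypothetical customers: (annual spend in $1000s, store visits per month).
customers = [(1.0, 2.0), (1.2, 1.8), (0.8, 2.2), (8.0, 9.0), (8.5, 8.7), (7.9, 9.4)]
segments, centers = kmeans(customers, k=2)
print(segments)  # [0, 0, 0, 1, 1, 1]
```

In practice, features are usually standardized first so that no single variable dominates the distance.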
15 Direct Marketing
- Question
  - Select a customer mailing list for a product campaign.
- Purposes
  - Reduce the campaign cost and obtain a high response rate.
- Data
  - Customer details and previous campaign data.
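Suppose a model scores each customer's likelihood of responding, for example one trained on previous campaign data. Selecting the mailing list then reduces to ranking. This minimal sketch assumes hypothetical response scores and a hypothetical cap, `max_pieces`, on the number of mailings. It compares the expected response rate of the selected list with that of mailing everyone.

```python
def select_mailing_list(scores: dict[str, float], max_pieces: int) -> list[str]:
    """Return the max_pieces customers with the highest predicted response."""
    return sorted(scores, key=scores.get, reverse=True)[:max_pieces]

def expected_response_rate(scores: dict[str, float], customers: list[str]) -> float:
    """Average predicted response probability over a list of customers."""
    return sum(scores[c] for c in customers) / len(customers)

# Hypothetical model scores: estimated probability that each customer responds.
scores = {"c1": 0.30, "c2": 0.05, "c3": 0.22, "c4": 0.02, "c5": 0.15, "c6": 0.01}

mailing_list = select_mailing_list(scores, max_pieces=3)
print(mailing_list)                                            # ['c1', 'c3', 'c5']
print(round(expected_response_rate(scores, mailing_list), 3))  # 0.223
print(round(expected_response_rate(scores, list(scores)), 3))  # 0.125
```

Under these toy scores, mailing half the customers halves the cost. The expected response rate rises from 12.5% to about 22.3%.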
16 Life Cycle of a Loan Product
17 Business Objectives
- Mellon Bank Corporation is a major financial services company headquartered in Pittsburgh.
- Build an extendible loan secured by the value of a client's own property.
- Achieve the highest possible Return On Investment (ROI).
- Based on customers with DDAs (demand deposit accounts), build a model for HELOCs (home equity lines of credit).
18 Data Preparation
- The primary data source was the approximately 40,000 Mellon customers who had (or once had) HELOCs and DDAs.
- Data
  - Demographic data sourced both internally and externally (age, income, length of residence, and other indicators of economic condition)
  - DDA data (history of loan balance over 3, 6, 9, 12, and 18 months, history of returned checks, history of interest rates)
  - Property data sourced externally (home purchase price, loan-to-value ratio)
  - Other data related to creditworthiness
- 120 variables were used.
20 Responders
21 Classification
22 Customer Retention
- Question
  - Find out what kinds of customers tend to churn, and build a model that can predict which customers are likely to churn.
- Data mining solution
  - Collect data about the customers who have churned.
  - Select a set of customers who have been loyal.
  - Merge the two data sets to form training, testing, and evaluation data sets.
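The merge-and-split step can be sketched as follows. The sketch labels churned customers 1 and loyal customers 0, shuffles the combined list reproducibly, and cuts it into the three sets. The 60/20/20 proportions, the seed, and the customer IDs are hypothetical choices.

```python
import random

def build_datasets(churned, loyal, fractions=(0.6, 0.2, 0.2), seed=42):
    """Label, merge and shuffle the two groups, then split into three sets.

    Label 1 marks a churned customer, 0 a loyal one.
    """
    labelled = [(c, 1) for c in churned] + [(c, 0) for c in loyal]
    random.Random(seed).shuffle(labelled)      # reproducible shuffle
    n = len(labelled)
    n_train = round(n * fractions[0])
    n_test = round(n * fractions[1])
    train = labelled[:n_train]
    test = labelled[n_train:n_train + n_test]
    evaluation = labelled[n_train + n_test:]   # the remainder, roughly fractions[2]
    return train, test, evaluation

churned = [f"churned_{i}" for i in range(20)]  # hypothetical customer IDs
loyal = [f"loyal_{i}" for i in range(30)]
train, test, evaluation = build_datasets(churned, loyal)
print(len(train), len(test), len(evaluation))  # 30 10 10
```

A plain random split does not guarantee the same churn ratio in every set. A stratified split, which shuffles and cuts each group separately, is a common refinement.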
23 Basket Analysis
24 Basket Analysis
[Figure: five example transactions containing items A, B, C, D, and E]

Rule        Support   Confidence
A => D      2/5       2/3
C => A      2/5       2/4
A => C      2/5       2/3
B, C => D   1/5       1/3
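The support and confidence figures for these rules can be checked by counting. The original transaction layout did not survive, so this sketch uses a hypothetical set of five transactions chosen to reproduce every support and confidence value on this slide. `Fraction` keeps the results exact; note that it reduces 2/4 to 1/2.

```python
from fractions import Fraction

# Hypothetical transactions consistent with the support/confidence figures.
transactions = [
    {"A", "B", "C"},
    {"A", "C", "D"},
    {"B", "C", "D"},
    {"A", "D", "E"},
    {"B", "C", "E"},
]

def support(itemset: set[str], transactions: list[set[str]]) -> Fraction:
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))

def confidence(lhs: set[str], rhs: set[str], transactions: list[set[str]]) -> Fraction:
    """Support of lhs together with rhs, divided by support of lhs."""
    return support(lhs | rhs, transactions) / support(lhs, transactions)

rules = [({"A"}, {"D"}), ({"C"}, {"A"}), ({"A"}, {"C"}), ({"B", "C"}, {"D"})]
for lhs, rhs in rules:
    s = support(lhs | rhs, transactions)
    c = confidence(lhs, rhs, transactions)
    print(f"{sorted(lhs)} => {sorted(rhs)}: support={s}, confidence={c}")
```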
25 Cross Selling: Citicorp/Travelers Group Merger
- Online Newshour, April 7, 1998, reports:
  - In the largest proposed corporate merger in history, the banking giant Citicorp and insurance titan Travelers will join forces. The new company, to be called Citigroup, would be the largest financial services company in the world.
- One of the rationales for the merger:
  - Marcus Alexis, Northwestern University, commented: "Well, they have competitive issues. There are certain synergies. Not only do customers like to get a full range of services from a single vendor but also there are certain economies in cross selling by them."
(Online Newshour, April 7, 1998)
26 Opportunities of Cross Selling
- Travelers Group can increase sales of insurance products to the Citicorp customer base.
- Citicorp can increase sales of financial services to the Travelers Group customer base.
- Customers gain the convenience of one-stop shopping for both financial services and insurance products.
27 Cross Selling and Up Selling
- Cross selling is the process of selling current customers new products in categories different from those they have already purchased.
  - E.g., sell car maintenance products to customers who have just bought new cars.
- Up selling is the process of selling current customers upgraded products or services in the same category as those they have already purchased.
  - E.g., sell data services to mobile voice service users.
28 Understanding Customers
[Figure: profit, revenue, and loss over time; more efficient acquisition, a longer lasting relationship, more frequent up/cross sell, and less loss lead to more profit]
Taken from an SPSS talk.
29 Understanding Customers
[Figure: profit, revenue, and loss over time; more efficient acquisition, a longer lasting relationship, more frequent up/cross sell, and less loss lead to even more profit]
Taken from an SPSS talk.
30 How Cross Selling Works
- Assume a marketing manager in a bank has the following products for customers:
  - Savings account
  - Checking account
  - Standard credit card
  - Gold credit card
  - Primary mortgage
  - Secondary mortgage
- The manager wants to design a new campaign targeting customers who are preparing to:
  - Buy a new home
  - Refinance an existing home
  - Add a second mortgage
31 How to Match Customers with Offers
- Three offers are to be made to customers:
  - New first mortgage
  - Refinance of first mortgage
  - Second mortgage
- Each customer is made only one offer.
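With one offer per customer, a simple matching rule is to give each customer the offer with the highest expected value. This sketch assumes each customer has a per-offer acceptance probability, for example from three hypothetical response models. It also assumes a hypothetical profit per accepted offer. The scores, profits, and names such as `best_offer` are invented for illustration.

```python
OFFERS = ("new first mortgage", "refinance of first mortgage", "second mortgage")

# Hypothetical profit to the bank if a customer accepts each offer.
PROFIT = {"new first mortgage": 1500, "refinance of first mortgage": 800, "second mortgage": 600}

def best_offer(propensity: dict[str, float]) -> str:
    """Choose the single offer with the highest expected profit
    (probability of acceptance times profit if accepted)."""
    return max(OFFERS, key=lambda offer: propensity[offer] * PROFIT[offer])

# Hypothetical per-offer acceptance probabilities for three customers.
propensities = {
    "alice": {"new first mortgage": 0.10, "refinance of first mortgage": 0.35, "second mortgage": 0.05},
    "bob":   {"new first mortgage": 0.30, "refinance of first mortgage": 0.05, "second mortgage": 0.20},
    "carol": {"new first mortgage": 0.02, "refinance of first mortgage": 0.04, "second mortgage": 0.25},
}
assignments = {customer: best_offer(p) for customer, p in propensities.items()}
print(assignments)
```

Ranking by raw probability alone would ignore that offers differ in value. Weighting by profit is one way to account for this, but the right weighting depends on the bank's objectives.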
32 The Impact of Fraud
- The GAO (United States General Accounting Office) cited $19.1 billion in improper government payments in 17 major programs for fiscal year 1998:
  - Medicare: $12.6 billion
  - Supplemental Security Income: $1.6 billion
  - The Food Stamp Program: $1.4 billion
  - Old Age and Survivors Insurance: $1.2 billion
  - Disability Insurance: $941 million
  - Housing Subsidies: $847 million
  - Veterans Benefits, Unemployment Insurance, and others: $514 million
33 Background
- The HIC (Health Insurance Commission) is a federal government agency in Australia.
- HIC pays insurance claims of more than 20 million Australian dollars and pays out about A$8 billion in funds every year.
- More than 300 million transactions are processed and stored every year, amounting to 1.3 TB over five years.
34 Preventing Fraud and Abuse
- Business objectives
  - The focus of the HIC project was the recent, steady 10% annual rise in the cost of pathology claims for clinical tests.
- Approaches
  - To identify potential fraudulent claims or claims arising from inappropriate practice, and
  - To develop general profiles of GP practices in order to compare the practice behaviors of individual GPs.
35 Data Preprocessing
- Two databases
  - Episode database
    - Each episode record represents one patient visit.
    - In total, 6.8 million records.
    - There were 227 different pathology tests.
  - GP (doctor) database
    - There are 17,000 records related to active GPs.
- The behavior of 10,409 GPs was to be studied.
- This gives a matrix of 10,409 by 227 elements (GPs by pathology tests).
- The elements were then scaled from 0 to 1 with respect to the total number of tests of each kind.
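One reading of the scaling step is that each GP's count for a test is divided by that test's total count across all GPs. Every column then sums to 1 and each value lies between 0 and 1. Per-column min-max scaling would be another possible reading. This is a minimal sketch of the first interpretation, on a hypothetical 3 × 4 matrix.

```python
def scale_by_test_totals(counts: list[list[int]]) -> list[list[float]]:
    """Divide each entry by its column total so every value lies in [0, 1].

    counts[i][j] = number of times GP i ordered pathology test j.
    """
    n_tests = len(counts[0])
    totals = [sum(row[j] for row in counts) for j in range(n_tests)]
    return [
        [row[j] / totals[j] if totals[j] else 0.0 for j in range(n_tests)]
        for row in counts
    ]

# Hypothetical counts for 3 GPs x 4 tests (the real matrix was 10,409 x 227).
counts = [
    [10, 0, 5, 1],
    [30, 2, 5, 0],
    [60, 2, 0, 0],
]
scaled = scale_by_test_totals(counts)
for row in scaled:
    print(row)
```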
36 Input to Segmentation
37 Overview
38 Data Mining
- The team conducted association rule mining. At a support threshold of 0.25, they found that the presence of some tests in the input database, the Pathology Episode Initiation (PEI) tests, was causing spurious rules to be revealed.
- PEI tests depend on who ordered them and where they were ordered.
- When the PEI tests were removed, the number of rules dropped significantly.
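The effect of removing the PEI tests can be reproduced on toy data. This sketch mines single-item rules from frequent pairs, a simplified stand-in for full association rule mining. It compares the rule count before and after dropping an item called "PEI" that appears in every episode. The episodes, the test codes "T3" and "T4", and both thresholds are hypothetical.

```python
from collections import Counter
from itertools import combinations

def pair_rules(episodes, min_support, min_confidence):
    """Return single-item rules (lhs, rhs) mined from frequent pairs of tests."""
    n = len(episodes)
    item_counts = Counter(item for ep in episodes for item in ep)
    pair_counts = Counter(pair for ep in episodes for pair in combinations(sorted(ep), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        if count / n < min_support:
            continue  # pair is not frequent enough
        for lhs, rhs in ((a, b), (b, a)):
            if count / item_counts[lhs] >= min_confidence:
                rules.append((lhs, rhs))
    return rules

# Hypothetical episodes: "PEI" appears in every visit, as an ordering artifact would.
episodes = [
    {"PEI", "OCP", "FCS"},
    {"PEI", "OCP", "FCS"},
    {"PEI", "T3"},
    {"PEI", "T4"},
    {"PEI", "T3", "T4"},
]
with_pei = pair_rules(episodes, min_support=0.2, min_confidence=0.6)
without_pei = pair_rules([ep - {"PEI"} for ep in episodes], 0.2, 0.6)
print(len(with_pei), len(without_pei))  # 6 2
```

Four of the six rules found with PEI present have the form "X => PEI". They reflect how episodes are recorded rather than how GPs practice.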
39 Result Analysis
- A request for a microscopic examination of feces for parasites (OCP) was associated with a culture examination of feces (FCS) in 0.85% of cases.
- There was a 92.6% chance that if OCP tests were requested, they would be done with FCS.
- There was a 0.61% chance that OCP was associated with a different, more expensive test called MCS32, which costs A$13.55 per test.
40 GP Profiles
41 Discussions
- Segment 13
  - Represents the majority of traditional GPs, who practice conventionally: 5,450 GPs, 52% of all GPs.
  - Only 6.2% of the medical pathology tests.
- Segment 4
  - 54 GPs, only 0.51% of GPs.
  - 2.7% of the medical pathology tests.
42?? 2004.4.21