Title: Data analysis and data mining
1 Data analysis and data mining
2 DATA ANALYSIS
- Successful data analysis requires progressing
through the different stages in the analysis
process.
- Problem formulation: identify it!
- Preparations
- Final analysis using statistical techniques or
data mining techniques.
- Visualisation or reporting
3 THE PROCESS FOR DATA MINING
4 Consumer insight lies at the heart of all
marketing and communication strategy. Consumers
are multi-faceted and complex creatures, and true
consumer insight comes only with a 360-degree
view.
5 - Data
- Data are any facts, numbers, or text that can be
processed by a computer. Today, organizations are
accumulating vast and growing amounts of data in
different formats and different databases. This
includes:
- operational or transactional data, such as sales,
cost, inventory, payroll, and accounting
- nonoperational data, such as industry sales,
forecast data, and macroeconomic data
- metadata: data about the data itself, such as
logical database design or data dictionary
definitions
6 - Information
- The patterns, associations, or relationships
among all this data can provide information. For
example, analysis of retail point-of-sale
transaction data can yield information on which
products are selling and when.
- Knowledge
- Information can be converted into knowledge about
historical patterns and future trends. For
example, summary information on retail
supermarket sales can be analyzed in light of
promotional efforts to provide knowledge of
consumer buying behavior. Thus, a manufacturer or
retailer could determine which items are most
susceptible to promotional efforts.
7 - Data Warehouses
- Dramatic advances in data capture, processing
power, data transmission, and storage
capabilities are enabling organizations to
integrate their various databases into data
warehouses.
- Data warehousing is defined as a process of
centralized data management and retrieval.
- Data warehousing represents an ideal vision of
maintaining a central repository (storage) of all
organizational data. Centralization of data is
needed to maximize user access and analysis.
- Dramatic technological advances are making this
vision a reality for many companies. Equally
dramatic advances in data analysis software are
allowing users to access this data freely.
- The data analysis software is what supports data
mining.
8 Data Mining
- Data mining (sometimes called data or knowledge
discovery) is the process of analyzing data from
different perspectives and summarizing it into
useful information: information that can be used
to increase revenue, cut costs, or both.
- Data mining software is one of a number of
analytical tools for analyzing data. It allows
users to analyze data from many different
dimensions or angles, categorize it, and
summarize the relationships identified.
- Technically, data mining is the process of
finding correlations or patterns among dozens of
fields in large relational databases.
- Extremely large datasets
- Discovery of the non-obvious
- Useful knowledge that can improve processes
- Cannot be done manually.
9 Data Mining (cont.)
10 Data Mining (cont.)
- Data mining is a step of the Knowledge Discovery
in Databases (KDD) process:
- Data Warehousing
- Data Selection
- Data Preprocessing
- Data Transformation
- Data Mining
- Interpretation/Evaluation
- Data mining is sometimes referred to as KDD;
DM and KDD tend to be used as synonyms.
11 Data Mining Evaluation
12 Data Mining is Not
- Data warehousing
- SQL / Ad Hoc Queries / Reporting
- Software Agents
- Online Analytical Processing (OLAP)
- Data Visualization
13 - What can data mining do?
- Data mining is primarily used today by companies
with a strong consumer focus: retail, financial,
communication, and marketing organizations.
- It enables these companies to determine
relationships among "internal" factors such as
price, product positioning, or staff skills, and
"external" factors such as economic indicators,
competition, and customer demographics.
- It enables them to determine the impact on
sales, customer satisfaction, and corporate
profits.
- Finally, it enables them to "drill down" into
summary information to view detailed transactional
data.
- With data mining, a retailer could use
point-of-sale records of customer purchases to
send targeted promotions based on an individual's
purchase history.
- By mining demographic data from comment or
warranty cards, the retailer could develop
products and promotions to appeal to specific
customer segments.
14 - For example, Blockbuster Entertainment mines its
video rental history database to recommend
rentals to individual customers. American Express
can suggest products to its cardholders based on
analysis of their monthly expenditures.
- WalMart is pioneering massive data mining to
transform its supplier relationships. WalMart
captures point-of-sale transactions from over
2,900 stores in 6 countries and continuously
transmits this data to its massive 7.5
terabyte Teradata data warehouse. WalMart allows
more than 3,500 suppliers to access data on
their products and perform data analyses.
- These suppliers use this data to identify
customer buying patterns at the store display
level. They use this information to manage local
store inventory and identify new merchandising
opportunities. In 1995, WalMart computers
processed over 1 million complex data queries.
15 - Data mining consists of five major elements:
- 1. Extract, transform, and load transaction data
onto the data warehouse system.
- 2. Store and manage the data in a
multidimensional database system.
- 3. Provide data access to business analysts and
information technology professionals.
- 4. Analyze the data by application software.
- 5. Present the data in a useful format, such as a
graph or table.
16 Terms
- Web mining: searching and processing data on the
internet.
- Three types of web mining are:
- Web structure mining
- Web usage mining
- Web content mining
17 Types
- Web structure mining places websites, and the
pages or items they contain, in a network of
connected websites.
- Web usage mining focuses on browsing behavior.
- Web content mining is all about discovering
useful content on the World Wide Web.
19 Data Mining Motivation
- Changes in the Business Environment
- Customers becoming more demanding
- Markets are saturated
- Databases today are huge
- More than 1,000,000 entities/records/rows
- From 10 to 10,000 fields/attributes/variables
- Gigabytes and terabytes
- Databases are growing at an unprecedented rate
- Decisions must be made rapidly
- Decisions must be made with maximum knowledge
20 Data Mining Motivation
- "The key in business is to know something that
nobody else knows." (Aristotle Onassis)
- "To understand is to perceive patterns."
(Sir Isaiah Berlin)
21 Data Mining Applications
22 Data Mining Applications: Retail
- Performing basket analysis
- Which items customers tend to purchase together.
This knowledge can improve stocking, store layout
strategies, and promotions.
- Sales forecasting
- Examining time-based patterns helps retailers
make stocking decisions. If a customer purchases
an item today, when are they likely to purchase a
complementary item?
- Database marketing
- Retailers can develop profiles of customers with
certain behaviors, for example, those who
purchase designer-label clothing or those who
attend sales. This information can be used to
focus cost-effective promotions.
- Merchandise planning and allocation
- When retailers add new stores, they can improve
merchandise planning and allocation by examining
patterns in stores with similar demographic
characteristics. Retailers can also use data
mining to determine the ideal layout for a
specific store.
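Basket analysis can be sketched, in its simplest form, as counting how often each pair of items shows up in the same basket. The baskets and the helper name `pair_counts` below are made up for illustration; they are not from the slides.

```python
from collections import Counter
from itertools import combinations


def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sort so that ("bread", "milk") and ("milk", "bread") are one key.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


# Hypothetical point-of-sale baskets.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "eggs"],
]

counts = pair_counts(baskets)
top_pair, top_count = counts.most_common(1)[0]
print(top_pair, top_count)
```

Pairs that co-occur often are candidates for shared shelf space or joint promotions; a real analysis would also check that the pair is more frequent than chance would suggest.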
23 SALES FORECASTING
25 Data Mining Applications: Banking
- Card marketing
- By identifying customer segments, card issuers
and acquirers can improve profitability with more
effective acquisition and retention programs,
targeted product development, and customized
pricing.
- Cardholder pricing and profitability
- Card issuers can take advantage of data mining
technology to price their products so as to
maximize profit and minimize loss of customers.
Includes risk-based pricing.
- Fraud detection
- Fraud is enormously costly. By analyzing past
transactions that were later determined to be
fraudulent, banks can identify patterns.
- Predictive life-cycle management
- DM helps banks predict each customer's lifetime
value and service each segment appropriately
(for example, offering special deals and
discounts).
26 Data Mining Applications: Telecommunication
- Call detail record analysis
- Telecommunication companies accumulate detailed
call records. By identifying customer segments
with similar use patterns, the companies can
develop attractive pricing and feature
promotions.
- Customer loyalty
- Some customers repeatedly switch providers, or
"churn", to take advantage of attractive
incentives by competing companies. The companies
can use DM to identify the characteristics of
customers who are likely to remain loyal once
they switch, thus enabling the companies to
target their spending on customers who will
produce the most profit.
27 Data Mining Applications: Other Applications
- Customer segmentation
- All industries can take advantage of DM to
discover discrete segments in their customer
bases by considering additional variables beyond
traditional analysis.
- Manufacturing
- Through choice boards, manufacturers are
beginning to customize products for customers;
therefore, they must be able to predict which
features should be bundled to meet customer
demand.
- Warranties
- Manufacturers need to predict the number of
customers who will submit warranty claims and the
average cost of those claims.
- Frequent flier incentives
- Airlines can identify groups of customers that
can be given incentives to fly more.
28 Data Mining in CRM: Customer Life Cycle
- Customer Life Cycle
- The stages in the relationship between a customer
and a business
- Key stages in the customer life cycle
- Prospects: people who are not yet customers but
are in the target market
- Responders: prospects who show an interest in a
product or service
- Active Customers: people who are currently using
the product or service
- Former Customers: may be "bad" customers who did
not pay their bills or who incurred high costs
- It's important to know life cycle events (e.g.
retirement)
29 Data Mining in CRM: Customer Life Cycle
- What marketers want: increasing customer revenue
and customer profitability
- Up-sell
- Cross-sell
- Keeping the customers for a longer period of time
- Solution: applying data mining
30 THE DIFFERENCE
- An upsell gets the customer to spend more money
by buying a more expensive model of the same type
of product, or by adding features or warranties
that relate to the product in question.
- A cross-sell gets the customer to spend more
money by adding products from categories other
than that of the product being viewed or
purchased.
31 - There's no stock way to present product
recommendations. Common labels for
recommendations are:
- Recommended products
- You may also like
- Customers who bought X also bought
- Customers who viewed X also viewed
- Frequently bought together
- Stuff you need (Radio Shack, for accessories)
- Stuff you may want (Radio Shack, for items in
other categories)
- More from this (category, brand, author, artist)
- Looks hot with
- Complete the look
33 Data Mining in CRM
- DM helps to
- Determine the behavior surrounding a particular
life cycle event
- Find other people in similar life stages and
determine which customers are following similar
behavior patterns
34 Data Mining in CRM (cont.)
[Diagram with the elements: Data Warehouse, Data Mining, Customer Profile, Customer Life Cycle Info., Campaign Management]
35 Data Mining Techniques
36 - Predictive modelling leverages statistics to
predict outcomes.
- Most often the event one wants to predict is in
the future, but predictive modelling can be
applied to any type of unknown event, regardless
of when it occurred. For example, predictive
models are often used to detect crimes and
identify suspects after the crime has taken
place.
- In many cases the model is chosen on the basis
of detection theory to try to guess the
probability of an outcome given a set amount of
input data, for example, given an email,
determining how likely it is to be spam.
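One simple way to turn "how likely is this email to be spam" into a number is Bayes' rule applied to a single observed word. This is an illustrative sketch, not the model the slides have in mind; the function name and all three probabilities are assumptions.

```python
def spam_probability(p_spam, p_word_given_spam, p_word_given_ham):
    """Bayes' rule: P(spam | word appears in the email)."""
    p_ham = 1.0 - p_spam
    # Total probability of seeing the word in any email.
    evidence = p_word_given_spam * p_spam + p_word_given_ham * p_ham
    return p_word_given_spam * p_spam / evidence


# Hypothetical inputs: 40% of mail is spam; the word appears in
# 50% of spam emails and in 5% of legitimate emails.
p = spam_probability(p_spam=0.4, p_word_given_spam=0.5, p_word_given_ham=0.05)
print(round(p, 3))
```

A practical spam filter combines many such word probabilities and estimates them from labelled mail rather than fixing them by hand.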
37 - A decision tree is a decision support tool that
uses a tree-like graph or model of decisions and
their possible consequences, including chance event
outcomes, resource costs, and utility. It is one
way to display an algorithm.
- Decision trees are commonly used in operations
research, specifically in decision analysis, to
help identify a strategy most likely to reach
a goal.
38 Predictive Data Mining
39 Prediction
Honest has round eyes and a smile
40 Decision Trees
height  hair   eyes   class
short   blond  blue   A
tall    blond  brown  B
tall    red    blue   A
short   dark   blue   B
tall    dark   blue   B
tall    blond  blue   A
tall    dark   brown  B
short   blond  brown  B
41 Decision Trees (cont.)
[Tree diagram: root node "hair" with branches dark, blond, red]
- Completely classifies dark-haired and red-haired
people.
- Does not completely classify blond-haired
people. More work is required.
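42 Decision Trees (cont.)
[Tree diagram: root node "hair" with branches dark, blond, red; the blond branch is split again on "eye" with branches blue and brown; the cases shown at the brown node are tall B and short B]
Decision tree is complete because
1. All 8 cases appear at nodes
2. At each node, all cases are in the same class
(A or B)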
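The two completeness conditions can be checked mechanically on the eight cases from slide 40. The sketch below splits on hair first and then splits the blond group on eyes, as the slides do; the helper names `split` and `is_pure` are illustrative assumptions.

```python
from collections import defaultdict

# The eight cases from slide 40: (height, hair, eyes, class).
CASES = [
    ("short", "blond", "blue", "A"),
    ("tall", "blond", "brown", "B"),
    ("tall", "red", "blue", "A"),
    ("short", "dark", "blue", "B"),
    ("tall", "dark", "blue", "B"),
    ("tall", "blond", "blue", "A"),
    ("tall", "dark", "brown", "B"),
    ("short", "blond", "brown", "B"),
]
HAIR, EYES = 1, 2  # column positions within a case


def split(cases, column):
    """Group cases by the value found in the given column."""
    groups = defaultdict(list)
    for case in cases:
        groups[case[column]].append(case)
    return dict(groups)


def is_pure(cases):
    """A node is pure when every case in it has the same class."""
    return len({case[-1] for case in cases}) == 1


by_hair = split(CASES, HAIR)
hair_purity = {hair: is_pure(group) for hair, group in by_hair.items()}

# Blond is the only mixed node, so it is split again on eye colour.
blond_by_eyes = split(by_hair["blond"], EYES)
eye_purity = {eyes: is_pure(group) for eyes, group in blond_by_eyes.items()}

print(hair_purity)
print(eye_purity)
```

A tree-learning algorithm would choose the split attribute automatically (for example by information gain); here the split order is simply copied from the slides.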
43 Decision Trees: Learned Predictive Rules
44 Decision Trees: Another Example
45 Rule Induction
- Try to find rules of the form
- IF <left-hand-side> THEN <right-hand-side>
- This is the reverse of a rule-based agent, where
the rules are given and the agent must act. Here
the actions are given and we have to discover the
rules!
- Prevalence: probability that LHS and RHS occur
together (usually called support; lift and
leverage are related measures that compare this
probability with what independence would predict)
- Predictability: probability of RHS given LHS
(sometimes called confidence or strength)
46 - In data mining, association rules are useful for
analyzing and predicting customer behavior. They
play an important part in shopping basket data
analysis, product clustering, catalog design and
store layout.
- Association rules are if/then statements that
help uncover relationships between seemingly
unrelated data in a relational database or other
information repository. An example of an
association rule would be:
- "If a customer buys a dozen eggs, he is 80%
likely to also purchase milk."
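The prevalence and predictability of a rule such as "eggs, then milk" can be computed directly from a list of baskets. The sketch below uses ten made-up baskets chosen so that the rule comes out at 80% predictability; the function name `rule_metrics` is an assumption for illustration.

```python
def rule_metrics(baskets, lhs, rhs):
    """Return (prevalence, predictability) for the rule IF lhs THEN rhs.

    prevalence     = P(lhs and rhs in the same basket)
    predictability = P(rhs | lhs)
    """
    total = len(baskets)
    lhs_count = sum(1 for basket in baskets if lhs <= basket)
    both_count = sum(1 for basket in baskets if (lhs | rhs) <= basket)
    prevalence = both_count / total
    predictability = both_count / lhs_count if lhs_count else 0.0
    return prevalence, predictability


# Hypothetical baskets, each a set of purchased items.
baskets = [
    {"eggs", "milk", "bread"},
    {"eggs", "milk"},
    {"eggs", "milk", "butter"},
    {"eggs", "milk", "cheese"},
    {"eggs", "bread"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk"},
    {"cheese"},
    {"butter", "cheese"},
]

prevalence, predictability = rule_metrics(baskets, {"eggs"}, {"milk"})
print(prevalence, predictability)
```

Eggs appear in five baskets and four of those also contain milk, so the rule holds in 4 of 10 baskets overall and in 4 of the 5 baskets where it applies.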
47 Use of Rule Associations
- Coupons, discounts
- Don't give discounts on 2 items that are
frequently bought together. Use the discount on
1 to "pull" the other.
- Product placement
- Offer correlated products to the customer at the
same time. Increases sales.
- Timing of cross-marketing
- Send camcorder offer to VCR purchasers 2-3 months
after VCR purchase
- Discovery of patterns
- People who bought X, Y and Z (but not any pair)
bought W over half the time
48 Product placement
50GOADANA
51 Clustering
- The art of finding groups in data
- Objective: gather items from a database into sets
according to (unknown) common characteristics
- Much more difficult than classification since the
classes are not known in advance (no training)
- Technique: unsupervised learning
52 The K-Means Clustering Method
[Figure: scatter plots on axes from 0 to 10 showing successive steps of the method for K = 2]
- K = 2: arbitrarily choose K objects as the initial
cluster centers
- Assign each of the objects to the most similar
center
- Update the cluster means
- Reassign the objects and update the cluster means
again
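The assign/update loop can be sketched in a few lines of plain Python. This is a minimal illustration on made-up 2-D points: it takes the first K points as the "arbitrary" initial centers and stops when no object changes cluster, both of which are assumptions rather than details from the slide.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def kmeans(points, k, max_iter=100):
    # Arbitrary initial centers: simply the first k objects.
    centers = list(points[:k])
    labels = None
    for _ in range(max_iter):
        # Assign each object to the most similar (nearest) center.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # no object was reassigned, so the clusters are stable
        labels = new_labels
        # Update each cluster mean from its current members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return centers, labels


# Hypothetical data: two loose groups on a 0-10 grid.
points = [(1, 1), (9, 9), (1, 2), (2, 1), (8, 9), (9, 8)]
centers, labels = kmeans(points, k=2)
print(centers, labels)
```

The result depends on the initial centers, which is why practical implementations usually run several random starts and keep the best clustering.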
53 Chapter 8: Customer segmentation
- Segmentation is a research process in which the
market is divided up into homogeneous customer
groups that respond in the same way to marketing
stimuli from the supplier.
54 CUSTOMER SEGMENTATION
55 Bonoma and Shapiro (1983): B2B
- 5 criteria
- Demographic factors: industrial classification,
company size and location.
- Operating variables: technology, user status,
customer capabilities.
- Purchasing approaches: how purchasing is
organised, ..
- Situational factors: the urgency, the
specific application and the order size.
- Personal characteristics: the values and norms of
the employees working for the prospect or
customer, their general loyalty and attitude to
risk.
56 Segmentation technique
- Markets can be segmented in a large number of
ways.
- The guidelines for the segmentation solution
process:
- Measurable: the size, purchasing power and
characteristics of the segment can be measured.
- Substantial: the segments are large and
profitable enough to serve.
- Accessible: the segments can be reached and
served effectively.
- Differentiable: the segments are conceptually
distinguishable and respond differently to
different marketing stimuli.
- Actionable: effective programs can be formulated
for attracting and serving the segments.
57 Segmentation research used in compiling the list
- RFM: recency, frequency, monetary value
- CHAID: chi-squared automatic interaction
detection
- CART: classification and regression trees
58 RFM
- It was developed first.
- Developed to identify the most attractive
prospects.
- Focusing on the frequency and the most recent
transaction date, in addition to the annual amount
spent, produces better selections and higher
response percentages.
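The three RFM values can be derived from a plain transaction list. The sketch below uses made-up customers and dates, and ranks by recency first, then frequency, then amount; that ordering and the name `rfm_table` are assumptions for illustration, not a prescribed scoring scheme.

```python
from datetime import date


def rfm_table(transactions, today):
    """Map customer -> (days since last purchase, purchase count, total spent)."""
    table = {}
    for customer, day, amount in transactions:
        recency, frequency, monetary = table.get(customer, (None, 0, 0.0))
        days_ago = (today - day).days
        recency = days_ago if recency is None else min(recency, days_ago)
        table[customer] = (recency, frequency + 1, monetary + amount)
    return table


# Hypothetical transactions: (customer, purchase date, amount).
transactions = [
    ("ann", date(2024, 1, 10), 50.0),
    ("ann", date(2024, 3, 1), 70.0),
    ("bob", date(2023, 6, 15), 200.0),
    ("cat", date(2024, 2, 20), 30.0),
    ("cat", date(2024, 2, 25), 20.0),
    ("cat", date(2024, 3, 5), 40.0),
]

table = rfm_table(transactions, today=date(2024, 3, 11))

# Most attractive first: most recent, then most frequent, then highest spend.
ranked = sorted(table, key=lambda c: (table[c][0], -table[c][1], -table[c][2]))
print(ranked)
```

Here the customer with the largest single purchase ranks last because the purchase is old and was never repeated, which is the point of looking beyond the amount spent.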
59 CHAID and CART
- A CHAID analysis produces a tree diagram.
- At the top of the diagram, the response to the
marketing campaigns is shown for the entire
customer database. (8.2)
- The organisation has 240,000 customers, of which
an average of 4.36% respond to a marketing
activity. On the level below, these customers are
split according to the most discriminating
significant segmentation criterion.
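Finding the "most discriminating" criterion can be illustrated by computing a chi-squared statistic for each candidate split and picking the largest. The sketch below shows only that ranking idea with made-up counts; a full CHAID procedure also merges categories and tests significance, which is omitted here.

```python
def chi_squared(table):
    """Pearson chi-squared statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected if the criterion had no effect on response.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical counts per segment: [responders, non-responders].
candidates = {
    "region": [[50, 950], [50, 950]],  # same response rate in both segments
    "age": [[80, 920], [20, 980]],     # response rate differs by segment
}

scores = {name: chi_squared(table) for name, table in candidates.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

The criterion whose segments differ most in response rate gets the highest statistic and would be used for the first split below the root.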
60 CART
- It is often compared to CHAID.
- CART is not limited in the number of variables
and classes that can be included.
61 - Customer
- Organizational market
64 Not all customers are the same
[Table: products (rows) against customers (columns)]
Columns: Highly profitable customer, Mixed-profitability customer, Losing customer
Rows: Highly profitable product, Profitable product, Mixed-profitability product, Losing product
65 Chapter 9
- Retention and cross-sell analyses
66 Retention
- Holding on to the customers.
- Companies must arrive at definitions of former
and current customers.
- Does someone become a departing customer at the
moment they no longer buy a certain product?
- A consumer may, for example, stop buying fresh
meat at a particular market but continue to shop
there for a variety of packaged goods.
67 Customer Retention Strategies
- Welcome
- Reliability
- Responsiveness
- Recognition
- Personalization
- Reward Strategies
68 A welcome strategy
- The organization's appreciation for the
initiation of a relationship.
- Creating a delightful surprise, making a good
first impression
- First touch: additional customer information
- Reassure the buyers that they have made the
correct choices.
- Treat it like a first date. Don't overdo it!
69 Reliability
- The organization can repeat the exchange time
and time again with the same satisfying results.
- Keep promises
- Ensure consistent quality
- Continuous promotion is still the key.
70 Responsiveness
- The organization shows customers it really cares
about their needs and feelings.
- Loyal employees create loyal customers. Internal
marketing.
- Customer-contact employees should have the
authority as well as the responsibility for
day-to-day operational activities and CRM
decisions.
71 Recognition
- Special attention or appreciation that identifies
someone as having been known before.
- People respond to recognition.
- Recognition and appreciation help maintain and
reinforce relationships.
72 Personalization
- Use the CRM system to tailor promotions and
products to specific customers.
- An offer engine takes customer data after it is
analyzed and applies it to create the offer or
message that is appropriate to the individual
customer. E.g., My site, click stream analysis,
free ride, etc.
73 Access strategy
- Identify how customers will be able to interact
with the organization.
- General contact, product return, technical
report, service representative, change a mailing
address
- Is the access quick and easy?
74 A Communication process
75 Cross-sell
- This is all about offering your customer items
that can complement their purchase. A retailer
could offer software such as Microsoft Office, or
perhaps a keyboard.
- Think about when you are on Amazon.com and you
see "Best Value" with the book you selected (for
example, The Time Traveler's Wife) and can get
another book (A Long, Long Time) at a bundled
price: a great cross-sell. Amazon also uses
"Customers Who Bought This Item Also Bought",
which is another cross-selling opportunity.
76 Upsell vs cross-sell
- An upsell occurs during a purchase, where the
customer is made aware of the ability to get even
more of what he or she was looking for. For
example, you can book an economy class trip to
New York for 750, but for an additional 200,
you can upgrade to business class and get more
comfort.
- A cross-sell occurs either during or immediately
after a purchase, where the customer is made
aware of ways to accessorize the deal. For
example, now that you've booked your trip to New
York, you can, for an additional 350, get four
nights at an upscale hotel on the beach along
with a rental car.
77 UPSELL
- Suggesting your customer buys the more expensive
model of the same product or service, or that
they add a feature that would make it more
expensive. With an upsell you're suggesting they
pay more in exchange for a better product or
service.
- For example:
- Buying a 42-inch TV instead of a 40-inch one
- Upgrading from economy to business class for a
flight
- Adding an extended warranty
78 Examples of Common Upselling Techniques
- Jewelry: recommending a higher-quality and more
expensive brand of the same product
- Fast food: asking a customer if they would like
to super-size their meal
- Fine dining: asking a customer if they would like
a higher-quality alcohol instead
- Computers: asking a customer if they would like
the same laptop with more hard drive space or
more RAM
- Electronics: asking customers if they would like
an extended warranty plan to go along with their
purchase
- Electronics: asking a customer if they would like
to upgrade from a 40-inch television to a 42-inch
television
- SaaS: providing website customers a checkout
option whereby they can pay for an entire year's
worth of service upfront at a lower per-month
cost instead of signing up for the typical
month-to-month service
- Travel: asking a customer if they would like to
upgrade from coach to first class
- Night clubs: asking a customer if they would like
to upgrade their cover charge to VIP level.
82 THE ONLINE ENVIRONMENT
83 WWW: WORLD WIDE WEB
- Web 1.0
- At first it was a read-only medium.
- Webpages
- Web 2.0
- Web platforms: GeoCities, WordPress, Facebook
- People can share their ideas, photos, videos,
status
85 Google AdWords
86 - Lego factory story. Pages 304-305
87 Search engines