Introduction to Data Mining 2 - PowerPoint PPT Presentation

Provided by: ronn165

1
Introduction to Data Mining 2
2
Data Warehouse
  • For organizational learning to take place, data
    from many sources must be gathered together and
    organized in a consistent and useful way; hence,
    Data Warehousing (DW)
  • DW allows an organization (enterprise) to
    remember what it has noticed about its data
  • Data Mining techniques make use of the data in a
    DW

3
Data Warehouse
[Diagram: the Enterprise Database (Customers, Orders,
Transactions, Vendors, etc.) is copied, organized and
summarized into the Data Warehouse, which feeds Data Mining]
  • Data Miners
  • Farmers - they know
  • Explorers - unpredictable

4
Data Warehouse
  • A data warehouse is a copy of transaction data
    specifically structured for querying, analysis
    and reporting.
  • Note that the data warehouse contains a copy of
    the transactions which are not updated or changed
    later by the transaction system.
  • Also note that this data is specially structured,
    and may have been transformed when it was copied
    into the data warehouse.
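
As a rough illustration of "copied and specially structured", the sketch below copies a few transaction rows into a per-customer summary that is easier to query. The record fields (customer, product, amount) and the function name are hypothetical and are not taken from the slides.

```python
from collections import defaultdict

# Hypothetical transaction rows as they might sit in an operational system.
transactions = [
    {"customer": "C1", "product": "milk", "amount": 3.0},
    {"customer": "C1", "product": "bread", "amount": 2.5},
    {"customer": "C2", "product": "milk", "amount": 3.0},
]

def build_customer_summary(rows):
    """Copy transaction rows into a per-customer summary for querying."""
    summary = defaultdict(lambda: {"n_orders": 0, "total": 0.0})
    for row in rows:
        entry = summary[row["customer"]]
        entry["n_orders"] += 1
        entry["total"] += row["amount"]
    return dict(summary)

# The source rows are only read, never modified; the summary is a separate copy.
warehouse = build_customer_summary(transactions)
print(warehouse["C1"])
```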

5
Data Mart
  • A Data Mart is a smaller, more focused Data
    Warehouse: a mini-warehouse.
  • A Data Mart typically reflects the business rules
    of a specific business unit within an enterprise.

6
Data Mining Flavors
  • Supervised or Directed: attempts to explain or
    categorize some particular target field such as
    income or response (e.g. regression)
  • Unsupervised or Undirected: attempts to find
    patterns or similarities among groups of records
    without the use of a particular target field or
    collection of predefined classes (e.g. clustering)

7
Terminologies
  • Independent variable: input, predictor,
    attribute, x
  • Dependent variable: output, target, outcome,
    response, y

8
Terminologies
  • Supervised learning
  • Given D, a set of (x, y) pairs, find f such that
    y ≈ f(x). x can be a vector.
  • Classification: when y is categorical, e.g. yes/no
  • Regression or Prediction: when y is continuous
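
The two cases can be sketched in a few lines of Python. This is only a minimal illustration with made-up data: the regression part uses an ordinary least-squares line, and the classification part uses a nearest-class-mean rule. Both techniques are chosen here for brevity and are not named in the slides.

```python
def fit_line(pairs):
    """Least-squares fit of y ~ a*x + b to a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return lambda x: a * x + b

def fit_nearest_mean(pairs):
    """Classifier that returns the label whose mean x is closest."""
    groups = {}
    for x, y in pairs:
        groups.setdefault(y, []).append(x)
    means = {label: sum(xs) / len(xs) for label, xs in groups.items()}
    return lambda x: min(means, key=lambda label: abs(x - means[label]))

# Regression: y is continuous (hypothetical data).
D_reg = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
f = fit_line(D_reg)
print(f(5.0))

# Classification: y is categorical (hypothetical data).
D_cls = [(1.0, "no"), (2.0, "no"), (8.0, "yes"), (9.0, "yes")]
g = fit_nearest_mean(D_cls)
print(g(7.0), g(3.0))
```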

9
Terminologies
  • Danger of overfitting
  • T-shirts for all SNU students, fit to a sample of
    students (sample vs. population)
  • Fig 2.1 and 2.2 of textbook
  • A complex model (e.g. a higher-order polynomial)
    overfits
  • Training / Validation / Test split of D for
    supervised learning
  • Training: used to find candidate fs
  • Validation: used to choose the final f (avoid
    overfitting)
  • Test: used to evaluate the final f
  • Fig 2.3
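
The split can be sketched as follows. This is an illustration under assumptions: the data are synthetic, the 60/20/20 proportions are arbitrary, and the candidate models are k-nearest-neighbour averages (swapped in for the polynomial example because they need no extra library). With k = 1 the model reproduces the training data exactly, which is the overfitting case.

```python
import random

def split(data, seed=0, frac_train=0.6, frac_val=0.2):
    """Shuffle D and cut it into training, validation and test parts."""
    rows = list(data)
    random.Random(seed).shuffle(rows)
    n_train = int(len(rows) * frac_train)
    n_val = int(len(rows) * frac_val)
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]

def knn_predict(train, x, k):
    """Average the y values of the k training points nearest to x."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def mse(train, data, k):
    """Mean squared error on `data` of the k-NN model built from `train`."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)

# Synthetic data: a straight line plus noise (hypothetical).
rng = random.Random(1)
D = [(i / 10, 2 * (i / 10) + rng.gauss(0, 1)) for i in range(100)]

train, val, test = split(D)
candidates = [1, 3, 5, 9, 15]              # one candidate f per k, built from train
best_k = min(candidates, key=lambda k: mse(train, val, k))  # validation picks f
print("chosen k:", best_k)
print("training MSE for k=1:", mse(train, train, 1))        # 0: memorised
print("test MSE of final f:", mse(train, test, best_k))     # honest estimate
```

The test set is touched only once, after the final f has been chosen, so its error is not biased by the selection step.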

10
Terminologies
  • Unsupervised learning
  • Given D, a set of x only (no target y)
  • Clustering is to find a partition where each
    subset contains similar xs while different
    subsets contain dissimilar xs.
  • Association Rule mining, or Affinity Grouping,
    is to find an association rule/pattern among xs.
  • Overfitting? Train/Validation/Test split?
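
Both tasks can be illustrated with a short sketch. The clustering part is a plain one-dimensional k-means with an assumed, deterministic initialization. The association part computes support and confidence, two standard measures for a rule that the slides themselves do not name. All data are made up.

```python
def kmeans_1d(xs, k=2, iters=20):
    """Plain k-means on numbers; returns the list of clusters (k >= 2)."""
    xs = sorted(xs)
    # Assumed initialization: k points spread evenly over the sorted data.
    centers = [xs[i * (len(xs) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda j: abs(x - centers[j]))
            clusters[nearest].append(x)
        # Move each center to its cluster mean; keep it if the cluster is empty.
        centers = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
    return clusters

def rule_stats(baskets, lhs, rhs):
    """Support and confidence of the rule lhs -> rhs over a list of item sets."""
    n_lhs = sum(1 for b in baskets if lhs <= b)
    n_both = sum(1 for b in baskets if (lhs | rhs) <= b)
    support = n_both / len(baskets)
    confidence = n_both / n_lhs if n_lhs else 0.0
    return support, confidence

# Clustering: similar ages end up in the same subset.
ages = [21, 64, 23, 61, 25, 67]
clusters = kmeans_1d(ages, k=2)
print(clusters)

# Association rule: how often does "milk" come with "bread"?
baskets = [{"milk", "bread"}, {"milk", "bread", "eggs"}, {"milk"}, {"bread"}]
support, confidence = rule_stats(baskets, {"milk"}, {"bread"})
print(support, confidence)
```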

11
Terminologies
  • Variable selection or dimensionality reduction
  • Parsimony or compactness
  • Data/Pattern selection/reduction, sampling
  • Outliers
  • Data that lie outside the usual range
  • Error? Or important pattern?
  • Missing value
  • Remove
  • Impute or replace with a value
  • Normalizing or standardization or scaling
  • Age (0–100) vs salary (0–10M)
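
A minimal sketch of three of these preparation steps, using only the standard library. The values, the mean-imputation choice, the min-max scaling and the two-standard-deviation outlier rule are all illustrative assumptions, not prescriptions from the slides.

```python
from statistics import mean, pstdev

def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def min_max(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def flag_outliers(values, z=2.0):
    """Mark values more than z population standard deviations from the mean."""
    m, s = mean(values), pstdev(values)
    return [abs(v - m) > z * s for v in values]

# Hypothetical data: one missing age, one extreme salary.
ages = [20, 40, None, 60]
salaries = [30_000, 50_000, 40_000, 45_000, 35_000, 55_000, 10_000_000]

ages_filled = impute_mean(ages)
ages_scaled = min_max(ages_filled)   # now comparable with other scaled fields
salary_flags = flag_outliers(salaries)

print(ages_filled)
print(ages_scaled)
print(salary_flags)
```

Whether a flagged value is an error or an important pattern still has to be decided by someone who knows the data.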

12
Data Mining's Biggest Challenge
  • The largest challenge a data miner may face is
    the sheer volume of data in the data warehouse.
  • It is quite important, then, that summary data
    also be available to get the analysis started.
  • A major problem is that this sheer volume may
    mask the important relationships the data miner
    is interested in.
  • The ability to overcome the volume and be able to
    interpret the data is quite important.

13
But
  • Finding patterns is not enough
  • Business must respond to the patterns by taking
    action
  • Turning data into information, information into
    action, and action into value
  • Hence, the Virtuous Cycle of DM

14
Data Mining's Virtuous Cycle
  • Identify the business opportunity
  • Mining data to transform it into actionable
    information
  • Acting on the information
  • Measuring the results

Note: the textbook uses "problem" and "opportunity"
interchangeably
15
1. Identify the Business Opportunity
  • Many business processes are good candidates
  • New product introduction
  • Direct marketing campaign
  • Understanding customer attrition/churn
  • Evaluating the results of a test market
  • Measurements from past DM efforts
  • What types of customers responded to our last
    campaign?
  • Where do the best customers live?
  • Are long waits in check-out lines a cause of
    customer attrition?
  • What products should be promoted with our XYZ
    product?
  • TIP: When talking with business users about data
    mining opportunities, make sure you focus on the
    business problems/opportunities and not on
    technology and algorithms.

16
2. Mining data to transform it into actionable information
  • Success is making business sense of the data
  • Numerous data issues
  • Bad data formats (alpha vs numeric, missing,
    null, bogus data)
  • Confusing data fields (synonyms and differences)
  • Lack of functionality ("I wish I could")
  • Legal ramifications (privacy, etc.)
  • Organizational factors ("unwilling to change our
    ways")
  • Lack of timeliness

17
3. Acting on the Information
  • This is the purpose of Data Mining, with the
    hope of adding value
  • What type of action?
  • Interactions with customers, prospects, suppliers
  • Modifying service procedures
  • Adjusting inventory levels
  • Consolidating
  • Expanding
  • Etc

18
4. Measuring the Results
  • Assesses the impact of the action taken
  • Often overlooked, ignored, skipped
  • Planning for the measurement should begin when
    analyzing the business opportunity, not after it
    is all over
  • Assessment questions (examples)
  • Did this ____ campaign do what we hoped?
  • Did some offers work better than others?
  • Did these customers purchase additional products?
  • Tons of others
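
A question such as "Did some offers work better than others?" comes down to simple arithmetic once responses have been counted. The sketch below assumes hypothetical counts and a control group that received no offer. Neither the numbers nor the control group come from the slides, and "lift" is used in its usual sense of response rate relative to the baseline.

```python
# Hypothetical campaign counts: name -> (customers contacted, customers who responded).
campaign = {
    "offer_A": (1000, 50),
    "offer_B": (1000, 80),
    "control": (1000, 20),   # assumed group that received no offer
}

# Response rate per group.
rates = {name: responded / contacted
         for name, (contacted, responded) in campaign.items()}

# Lift: how many times better each offer did than the control group.
baseline = rates["control"]
lift = {name: rate / baseline
        for name, rate in rates.items() if name != "control"}

best_offer = max(lift, key=lift.get)
print(rates)
print(lift)
print("best offer:", best_offer)
```

Planning the measurement up front, as the slide advises, is what makes a control group like this available when the campaign ends.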

19
Data Mining's Virtuous Cycle
  • Identify the business opportunity
  • Mining data to transform it into actionable
    information
  • Acting on the information
  • Measuring the results

Note: the textbook uses "problem" and "opportunity"
interchangeably
20
Learning Things that are not True
  • Patterns may not represent any underlying rule
  • Sample may not reflect its parent population,
    hence bias
  • Data may be at the wrong level of detail
    (granularity, aggregation)
  • Examples?

21
Example
22
Things that are True, but not Useful
  • Learning things that cannot be used
  • Examples?
  • result of marketing campaign

23
Data Mining Steps
  • Translate biz opportunity (problem) into DM
    opportunity (problem)
  • Select appropriate data
  • Get to know the data
  • Create a model set
  • Fix problems with the data
  • Transform data to bring information to the
    surface
  • Build models
  • Assess models
  • Deploy models
  • Assess results
  • Begin again

24
Data mining is not a linear process
25
Data Mining in Press
  • the 2008 technologies by Technology Review
  • Read two articles
  • Reality Mining
  • Surprise Modeling