Mining Case Bases for Action Recommendation

1
Mining Case Bases for Action Recommendation
  • Qiang Yang and Hong Cheng
  • HKUST

2
Motivation
  • Data Mining
  • Given a dataset, can discover who is a good
    customer
  • But
  • What to do with a bad customer?
  • What to do with a good customer who's turning
    bad?
  • Our answer: mine actionable plans

(Diagram labels: Data Mining, Actions)

3
Example 1
  • What to do to help Sammy get accepted to
    postgraduate school?
  • Plan 1 (Dylan): improve Rec to above 4 AND GPA to
    3.6
  • Plan 2 (Steve): improve TOEFL to 610

4
Example 2
  • What to do to help Sammy get loan approval?
  • Plan 1 (Dylan): get higher income, to 80K
  • Plan 2 (Beatrice): get married!!

5
Example 3
  • How to prevent Sammy from leaving the mobile
    phone company?
  • Plan 1 (Dylan): change Fee Reduction to 30, and
    increase free calls by 20 min
  • Plan 2 (Beatrice): M → F !! ??

6
Action Recommendation Problem
  • Recognize who the (potential) negative-class
    members are: a segmentation problem
  • Recommend near-optimal actions to help them
    switch to the positive class: planning to
    achieve goals
  • What does near-optimal mean?
  • Cost
  • Probability of success
  • Utilities

7
Our Solution
  • First, discover who the negative members are
  • May need to build classifiers using machine
    learning
  • Second, discover goals, or positive cases
  • May need different role models for different
    negative members
  • Need to balance efficiency, optimality, and cost
  • Utilities
  • Focus of this paper: case mining

8
Case Based Reasoning Cycle
  • Create
  • Maintain
  • Retrieve
  • Revise
  • Key point: how to find case bases?

9
Our Solution
  • find cluster centroids
  • find class boundary
  • Need distance metric
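
The centroid idea above can be sketched as follows. This is only an illustration, not the authors' implementation: it assumes the case base is built from positive instances, that instances are numeric tuples, that plain k-means is the clustering method, and that Euclidean distance is the metric. The function names and the toy data are hypothetical.

```python
import math
import random


def euclidean(a, b):
    """Assumed distance metric: Euclidean distance between numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans_centroids(points, k, iterations=20, seed=0):
    """Plain k-means; the returned centroids serve as the mined cases."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: euclidean(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return centroids


def nearest_case(x, centroids):
    """Retrieve the centroid closest to a negative instance x."""
    return min(centroids, key=lambda c: euclidean(x, c))


# Hypothetical positive instances forming two well-separated groups.
positives = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
case_base = kmeans_centroids(positives, k=2)
role_model = nearest_case((9.0, 9.0), case_base)
```

A negative instance is then matched to its nearest centroid, which plays the part of the role model it should move toward.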

10
Related Work
  • Mining Optimal Actions for Intelligent CRM.
    Ling, Chen, Yang, Chen. Industry Applications
    Paper (ICDM'02)
  • Sequential Cost-Sensitive Decision Making with
    Reinforcement Learning. Pednault, Abe, and
    Zadrozny. (KDD'02)
  • MetaCost: A general method for making classifiers
    cost-sensitive. P. Domingos. (KDD'99)

11
Data Cleaning
  • In a database with a large number of
    attributes, many attributes are irrelevant
  • We remove the irrelevant attributes using the
    OddsLogRatio method
  • Mladenić and Grobelnik, 1999
  • If an attribute-value pair A = v occurs
    frequently in +ve instances but infrequently in
    -ve instances, or vice versa, the pair has
    discriminative power
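
A rough sketch of scoring one attribute-value pair with a log odds ratio is given below. It uses the common definition log[ P(pair | +ve) (1 - P(pair | -ve)) / ((1 - P(pair | +ve)) P(pair | -ve)) ]. The clipping constant eps, the function name, and the toy loan-style rows are assumptions made for illustration and do not come from the slides.

```python
import math


def odds_log_ratio(rows, labels, attribute, value, eps=1e-6):
    """Log odds ratio of (attribute == value) between +ve and -ve rows.

    rows   : list of dicts mapping attribute name to value
    labels : list of 1 (+ve) or 0 (-ve), aligned with rows
    eps    : assumed clipping constant that avoids division by zero
    """
    pos = [r for r, y in zip(rows, labels) if y == 1]
    neg = [r for r, y in zip(rows, labels) if y == 0]

    def clipped_rate(group):
        hits = sum(1 for r in group if r.get(attribute) == value)
        return min(max(hits / len(group), eps), 1 - eps)

    p_pos = clipped_rate(pos)
    p_neg = clipped_rate(neg)
    return math.log(p_pos * (1 - p_neg) / ((1 - p_pos) * p_neg))


# Hypothetical toy data: "married" separates the classes, "color" does not.
rows = [
    {"married": "yes", "color": "red"},
    {"married": "yes", "color": "blue"},
    {"married": "yes", "color": "red"},
    {"married": "no", "color": "blue"},
    {"married": "no", "color": "red"},
    {"married": "no", "color": "blue"},
    {"married": "no", "color": "red"},
    {"married": "yes", "color": "blue"},
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

married_score = odds_log_ratio(rows, labels, "married", "yes")
color_score = odds_log_ratio(rows, labels, "color", "red")
```

Pairs whose score is far from zero in either direction are the discriminative ones; an attribute whose values all score near zero is a candidate for removal.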

12
Solution 1: Clustering → Centroids

13
Solution 2: SVM → Boundary Points

14
Case Retrieval and Adaptation
  • x: a negative instance; t: a positive instance
  • p(t): the probability density around an
    instance t
  • cost(x, t): the cost of switching from x to the
    target case t
  • maxCost: the maximum among the costs of
    switching from x to every possible case y in
    the case base
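
The transcript does not preserve how these quantities are combined, so the following is only a guess at the shape of the retrieval step. It assumes that cost(x, t) is the sum of per-attribute costs over the attributes that must change, and that a target is ranked by p(t) * (1 - cost(x, t) / maxCost). Both the ranking formula and every number in the toy data are assumptions, not the authors' definitions.

```python
def switch_cost(x, t, attribute_costs):
    """Assumed cost(x, t): sum of the costs of attributes that must change."""
    return sum(attribute_costs[a] for a in t if x[a] != t[a])


def best_target(x, case_base, densities, attribute_costs):
    """Return the case with the highest assumed utility, and that utility.

    densities[i] stands in for p(t) of case_base[i].
    """
    costs = [switch_cost(x, t, attribute_costs) for t in case_base]
    max_cost = max(costs)  # maxCost over every case y in the case base

    def utility(i):
        # Assumed combination: dense neighbourhoods and cheap switches win.
        normalised = costs[i] / max_cost if max_cost > 0 else 0.0
        return densities[i] * (1.0 - normalised)

    best = max(range(len(case_base)), key=utility)
    return case_base[best], utility(best)


# Hypothetical loan-approval example.
attribute_costs = {"income": 5.0, "married": 8.0, "savings": 2.0}
x = {"income": "50K", "married": "no", "savings": "low"}
case_base = [
    {"income": "80K", "married": "no", "savings": "low"},
    {"income": "50K", "married": "yes", "savings": "low"},
    {"income": "50K", "married": "no", "savings": "high"},
]
densities = [0.6, 0.9, 0.2]

target, score = best_target(x, case_base, densities, attribute_costs)
```

The attributes on which x and the chosen target differ are the recommended actions; in this toy run the suggestion is to raise income rather than to change marital status.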

(Diagram label: Cost of an attribute)

15
Artificial Two-Class Data (Min Cost)
  • Task: compare the centroid-based and SVM methods
  • Conjecture: performance depends on the
    distribution (well-separated versus closely
    mixed)

16
Artificial Two-Class Data (Min Cost)

17
Artificial Two-Class Data (CPU Time)

18
UCI Data Set: German Data

19
IBM Synthetic Data: 100K records

20
Experiment on KDDCUP 98
  • 479 attributes, 95,412 training instances

21
Ongoing Work: Multi-step Planning
  • Construct the association graph from the
    frequent subsequences.

22
Conclusions
  • Objectives: case base mining from databases
  • Purpose: generating plans for CRM
  • Algorithms
  • SVM is OK for small, well-separated data
  • The centroid-based method is better for large,
    mixed-distribution data
  • Utility-based: more realistic
  • Future work
  • Better utility models, apply to clustering
  • Negative member identification