Affinity Analysis for Selecting "Next Best Activity"

1
Decision Trees: divide & conquer
Tom Breur, London, 23 April 2008
tombreur_at_xlntconsulting.com, www.xlntconsulting.com, 31-6-463 468 75
2
Agenda
  • Features of decision trees
  • Overview algorithms
  • Exploration and prediction
  • Automatic vs. manual
  • Pitfalls & best practices
  • Decision trees vs. regression

3
Features of decision trees
  • Symbolic analysis / recursive partitioning /
    decision trees (inductive learning)
  • Split the record set arriving at each node, using
    the best variable, then rinse & repeat
  • Stop when:
      • All records in a leaf belong to the same class
      • No variable can be found for splitting
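
The split-and-repeat loop with its two stopping rules can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular product: the Gini impurity criterion, the function names, and the four sample records are all assumptions made for the example.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, target, variables):
    """Return the variable that lowers weighted impurity the most, or None."""
    best_var, best_score = None, gini([r[target] for r in rows])
    for var in variables:
        groups = {}
        for r in rows:
            groups.setdefault(r[var], []).append(r[target])
        if len(groups) < 2:
            continue  # a single value cannot split the records
        score = sum(len(g) / len(rows) * gini(g) for g in groups.values())
        if score < best_score:  # require a strict improvement
            best_var, best_score = var, score
    return best_var

def grow(rows, target, variables):
    """Recursively partition rows; leaves are class labels, nodes are dicts."""
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:            # stop: all records in the same class
        return labels[0]
    var = best_split(rows, target, variables)
    if var is None:                      # stop: no variable found for splitting
        return Counter(labels).most_common(1)[0][0]
    remaining = [v for v in variables if v != var]
    branches = {}
    for r in rows:
        branches.setdefault(r[var], []).append(r)
    return {var: {value: grow(subset, target, remaining)
                  for value, subset in branches.items()}}

# Hypothetical records, invented for this sketch (not the slide's data).
rows = [
    {"occupation": "White collar", "debt": "Low",  "credit": "Yes"},
    {"occupation": "White collar", "debt": "High", "credit": "Yes"},
    {"occupation": "Blue collar",  "debt": "Low",  "credit": "Yes"},
    {"occupation": "Blue collar",  "debt": "High", "credit": "No"},
]
tree = grow(rows, "credit", ["occupation", "debt"])
print(tree)
```

On these four records both variables reduce impurity equally at the root, so the first one listed wins; that tie is a small reminder that the result is a possible tree rather than the only one.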

4
Example decision tree
[Figure: a table of credit records with columns buroscore, credit? (Yes/No), occupation (Blue collar/White collar) and debt, with the values Low, Medium and High appearing for debt and buroscore, next to a decision tree with split nodes "occupation?" (White collar / Blue collar) and "debt?" (Low, Medium / High). The individual rows are not recoverable from the transcript.]
5
Overview algorithms (1)
  • Usually categorical target, sometimes continuous
    (e.g. CART)
  • Usually binary target, sometimes multiple
    categories (e.g. CHAID)
  • Usually statistical loss function, sometimes
    information theory (e.g. ID3, C4.5, C5.0)
  • Binary or multiple splits; nominal, ordinal or
    continuous predictors
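
To make the "information theory" family concrete, here is a small sketch of entropy and information gain (the kind of criterion ID3 is built on), with Gini impurity shown alongside for comparison. The function names and the toy labels are assumptions for illustration only; actual products differ in the details of their criteria.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent, children):
    """Entropy of the parent node minus the size-weighted entropy of its children."""
    n = len(parent)
    return entropy(parent) - sum(len(ch) / n * entropy(ch) for ch in children)

# Toy labels, invented for this sketch: a split that separates the classes perfectly.
parent = ["Yes", "Yes", "No", "No"]
children = [["Yes", "Yes"], ["No", "No"]]

print(entropy(parent))                     # impurity before the split
print(gini(parent))                        # same node under the Gini criterion
print(information_gain(parent, children))  # gain achieved by the split
```

Both measures are zero for a pure node and largest for an evenly mixed one, so in many cases they rank candidate splits similarly.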

6
Overview algorithms (2)
7
Exploration and prediction
  • Segmented prediction
  • Insight into complex structures
  • Discover noteworthy interactions
  • Sanity check
  • Foster adoption
  • Spur data-driven business innovation

Every predictive model must be accompanied by insight
8
Automatic vs. manual
  • Manual: apply domain expertise (aka model
    engineering)
  • Variable selection
  • Business problem
  • Implementation specification
  • Short- vs. long-term lift characteristics
  • Transient/behavioural vs. stable variables
  • Manual tree building drives variable development

9
Pitfalls & best practices (1)
  • Pitfall 1
      • "Leakers" / anachronistic variables: when the
        model looks too good to be true, it probably is
  • Best practice 1
      • Plot the sorted univariate relation between input
        and output and look for a drop (suspect), e.g. the
        first two variables on the next slide
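
One way to read this best practice is: score each input on its own against the target, sort the scores, and look for a conspicuous gap after the top few. The sketch below assumes information gain as the univariate measure and uses an invented churn table in which "cancel_code" is only filled in after the outcome is known; both choices are illustrative assumptions, not taken from the slides.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def univariate_ranking(rows, target, variables):
    """Information gain of each variable taken alone, sorted from high to low."""
    labels = [r[target] for r in rows]
    scores = {}
    for var in variables:
        groups = {}
        for r in rows:
            groups.setdefault(r[var], []).append(r[target])
        remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
        scores[var] = entropy(labels) - remainder
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

def suspects_before_drop(ranking):
    """Variables ranked above the largest gap between consecutive scores."""
    gaps = [ranking[i][1] - ranking[i + 1][1] for i in range(len(ranking) - 1)]
    cut = gaps.index(max(gaps))
    return [name for name, _ in ranking[:cut + 1]]

# Hypothetical data: "cancel_code" is recorded after the churn event (a leaker).
churn  = ["Y", "Y", "Y", "Y", "N", "N", "N", "N"]
cancel = ["set", "set", "set", "set", "none", "none", "none", "none"]
age    = ["young", "young", "young", "old", "young", "old", "old", "old"]
region = ["N", "N", "S", "S", "N", "N", "S", "S"]
rows = [{"churn": c, "cancel_code": k, "age_band": a, "region": g}
        for c, k, a, g in zip(churn, cancel, age, region)]

ranking = univariate_ranking(rows, "churn", ["region", "age_band", "cancel_code"])
for name, score in ranking:
    print(f"{name:12s} {score:.3f}")
print("inspect first:", suspects_before_drop(ranking))
```

A variable flagged this way is only a candidate for inspection; whether it is truly anachronistic depends on when its value becomes known relative to the target.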

10
Example best practice 1
11
Pitfalls & best practices (2)
  • Pitfall 2
      • Never assume it is "the" tree; it is always "a"
        (possible) tree
  • Best practice 2
      • Describe associations between the most important
        input variables and the target, even if the variables
        do not appear in the (eventual) tree

12
Pitfalls & best practices (3)
  • Pitfall 3
      • Overtraining, overly optimistic prognosis
  • Best practice 3
      • Divide the mining set into 3 parts: training,
        test and evaluation sets (50-40-10 feels about
        right)
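
A 50-40-10 division can be sketched as a shuffled three-way split. The helper name, the fixed seed, and the use of plain integers as stand-in records are assumptions for illustration.

```python
import random

def three_way_split(records, fractions=(0.5, 0.4, 0.1), seed=42):
    """Shuffle the records and cut them into training, test and evaluation sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)   # fixed seed keeps the split reproducible
    n_train = round(len(shuffled) * fractions[0])
    n_test = round(len(shuffled) * fractions[1])
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    evaluation = shuffled[n_train + n_test:]  # whatever remains (about 10%)
    return train, test, evaluation

# Stand-in records: the integers 0..999 play the role of the mining set.
train, test, evaluation = three_way_split(range(1000))
print(len(train), len(test), len(evaluation))
```

The evaluation set is touched only once, at the end, so that the final performance estimate is not flattered by repeated tuning against the test set.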

13
Pitfalls & best practices (4)
  • Pitfall 4
      • Replacing missing values by a constant (mean/mode)
  • Best practice 4
      • Identify: rightfully missing, yes/no?
      • If replacing, append a boolean "was previously
        missing"
      • Avoid adding bias; use intelligent imputation
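
The "was previously missing" flag can be sketched as follows. The column name, the sample rows, and the helper are hypothetical; the mean is used only as a placeholder fill value to keep the example short, and the slide's advice is to prefer a more intelligent imputation where one is available.

```python
from statistics import mean

def impute_with_flag(rows, column):
    """Fill missing values in one column and record which rows were filled."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = mean(observed)  # placeholder choice; swap in a smarter imputation
    result = []
    for r in rows:
        new = dict(r)  # copy, so the original records stay untouched
        new[column + "_was_missing"] = r[column] is None
        if r[column] is None:
            new[column] = fill
        result.append(new)
    return result

# Hypothetical records with one missing income.
rows = [{"income": 30000}, {"income": None}, {"income": 50000}]
filled = impute_with_flag(rows, "income")
for r in filled:
    print(r)
```

Keeping the flag lets a tree split on missingness itself, which matters when a value is rightfully missing rather than lost.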

14
Decision trees vs. regression
  • Few sound comparative studies
  • Most familiar technique works best
  • On average regression predicts more accurately
  • Alternative considerations
      • Spur development of variables
      • Innovate business

15
Conclusion
  • Decision trees are
      • Flexible
      • Versatile
      • Gentle learning curve
  • Superior insight drives
      • Development of (better) predictive variables
      • Innovation of business
  • Manual tree building enhances data learning

16
Resources - history of decision trees
  • AID: Morgan & Sonquist (1963)
  • THAID: Messenger & Mandell (1972), Morgan &
    Messenger (1973), Morgan & Sonquist (1973)
  • CHAID: Hartigan (1975)
  • CHAID: Kass (1980)
  • CART: Breiman, Friedman, Olshen & Stone (1984)
  • ID3: Quinlan (1986)
  • FACT: Loh & Vanichsetakul (1988)
  • Exhaustive CHAID: Biggs, de Ville & Suen (1991)
  • MARS: Friedman (1991)
  • C4.5: Quinlan (1992)
  • CHAID: Magidson (1993, 1994)
  • FIRM: Hawkins (1995)
  • QUEST: Loh & Shih (1997)
  • C5.0: Quinlan (1998)
  • CRUISE: Kim & Loh (2001)
  • GUIDE: Loh (2002)

17
Resources - software
  • www.angoss.com
  • www.dtreg.com
  • www.lumenaut.com
  • www.microsoft.com
  • www.portraitsoftware.com
  • www.rulequest.com
  • www.salford-systems.com
  • www.sas.com
  • www.spss.com
  • www.vanguardsw.com
  • www.xlstat.com
  • www.xpertrule.com

18
Resources - references
  • Breiman, Friedman, Olshen & Stone (1984)
    Classification and Regression Trees.
    ISBN 0412048418
  • Berry & Linoff (1999) Mastering Data Mining.
    ISBN 0471331236
  • Pyle (1999) Preparing Data for Data Mining.
    ISBN 1558605290
  • Pyle (2003) Business Modeling and Data Mining.
    ISBN 155860653X