Title: Profiting from Data Mining
1Profiting from Data Mining
- Bob Stine
- Department of Statistics
- The Wharton School, Univ of Pennsylvania
- April 5, 2002
- www-stat.wharton.upenn.edu/bob
2Overview
- Critical stages of data mining process
- Choosing the right data, people, and problems
- Modeling
- Validation
- Automated modeling
- Feature creation and selection
- Exploiting expert knowledge, insights
- Applications
- Little detail: Biomedical, finding predictive risk factors
- More detail: Financial, predicting returns on the market
- Lots of detail: Credit, anticipating the onset of bankruptcy
3Predicting Health Risk
- Who is at risk for a disease?
- Example: detect osteoporosis without the expense of an x-ray
- Goals
- Improving public health
- Savings on medical care
- Confirm an informal model with data mining
- Many types of features, interested groups
- Clinical observations of doctors
- Laboratory measurements, genetic
- Self-reported behavior
- Missing data
4Predicting the Stock Market
- Small, hands-on example
- Goals
- Better retirement savings?
- Money for that special vacation? College?
- Trade-offs risk vs return
- Lots of free data
- Access to accurate historical time trends, macro factors
- Recent data more useful than older data
- Simple modeling technique
- Validation
5Predicting the Market: Specifics
- Build a regression model
- Response is the return on the value-weighted S&P index
- Use standard forward/backward stepwise
- Battery of 12 predictors with interactions
- Train the model during 1992-1996 (training data)
- Model captures most of the variation in 5 years of returns
- Retain only the most significant features (Bonferroni); a sketch of this kind of search follows below
- Predict returns in 1997 (validation data)
- Another version in Foster, Stine, and Waterman
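The search just described can be sketched roughly as follows. This is a minimal illustration, not the authors' code: X is assumed to hold the twelve predictors and their interactions over the 1992-1996 training months, y the corresponding returns, and the cutoff uses the Sqrt(2 log p) approximation to Bonferroni mentioned later in the deck (a pure forward search is used here for brevity).

```python
import numpy as np

def t_ratio_of_last(X, y):
    """OLS t-ratio for the last column of X (an intercept is added here)."""
    Z = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    s2 = resid @ resid / (len(y) - Z.shape[1])
    se = np.sqrt(s2 * np.linalg.inv(Z.T @ Z)[-1, -1])
    return beta[-1] / se

def forward_stepwise(X, y, names):
    """Add the predictor with the largest |t| at each step, but keep it only
    if it clears the Bonferroni-style cutoff sqrt(2 log p)."""
    p = X.shape[1]
    cutoff = np.sqrt(2 * np.log(p))
    selected = []
    while len(selected) < p:
        candidates = [j for j in range(p) if j not in selected]
        t_vals = [t_ratio_of_last(X[:, selected + [j]], y) for j in candidates]
        best = int(np.argmax(np.abs(t_vals)))
        if abs(t_vals[best]) < cutoff:
            break          # nothing left that is clearly significant
        selected.append(candidates[best])
    return [names[j] for j in selected]
```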
6Historical patterns?
7Fitted model predicts...
Exceptional Feb return?
8What happened?
Training Period
9Claimed versus Actual Error
- (chart: claimed versus actual squared prediction error)
10Over-confidence?
- Over-fitting
- Model fits the training data too well, better than it can predict the future.
- Greedy fitting procedure: optimization capitalizes on chance
- Some intuition
- Coincidences
- Cancer clusters, the birthday problem
- Illustration with an auction
- What is the value of the coins in this jar?
11Auctions and Over-fitting
- What is the value of these coins?
12Auctions and Over-fitting
- Auction jar of coins to a class of MBA students
- Histogram shows the bids of 30 students
- Most were suspicious, but a few were not!
- Actual value is $3.85
- Known as the Winner's Curse
- Similar to over-fitting: the best model is like the high bidder (see the simulation sketch below)
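A small simulation of the winner's curse, assuming each of the 30 bidders makes an unbiased but noisy guess at the $3.85 jar and bids that guess (the noise level here is made up). The average guess is about right, but the winning bid is systematically too high, just as the best-looking of many candidate models looks better than it really is.

```python
import numpy as np

rng = np.random.default_rng(0)
true_value = 3.85                      # value of the coins in the jar
n_bidders, n_auctions = 30, 10_000

# Each bidder's guess is unbiased but noisy; everyone bids their guess.
guesses = true_value + rng.normal(0.0, 1.0, size=(n_auctions, n_bidders))
winning_bids = guesses.max(axis=1)

print(f"average guess:       {guesses.mean():.2f}")        # close to 3.85
print(f"average winning bid: {winning_bids.mean():.2f}")   # well above 3.85
```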
13Profiting from data mining?
- Where's the profit in this?
- Mining the miners vs. getting value from your data
- Lost opportunities
- Importance of domain knowledge
- Validation as a measure of success
- Prediction provides an explicit check
- Does your application predict something?
14Pitfalls and Role of Management
- Over-fitting is dominated by other issues
- Management support
- Life in silos
- Coordination across domains
- Responsibility and reward
- Accountability
- Who gets the credit when it succeeds? Who suffers if the project is not successful?
15Specific Potholes
- Moving targets
- "Let's try this with something else."
- Irrational expectations
- "I could have done better than that."
- Not with my data
- "It's our data. You can't use it."
- "You did not use our data properly."
16Back to a real application
- Emphasis on the statistical issues
17Predicting Bankruptcy
- Goal
- Reduce losses stemming from personal bankruptcy
- Possible strategies
- If we can identify those with the highest risk of bankruptcy, take some action
- Call them for a friendly chat about circumstances
- Unilaterally reduce credit limit
- Trade-off
- Good customers borrow lots of money
- Bad customers also borrow lots of money
18Predicting Bankruptcy
- Needle in a haystack
- 3,000,000 months of credit-card activity
- 2244 bankruptcies
- Simple predictor that all are OK looks pretty good (see the arithmetic after this list).
- What factors anticipate bankruptcy?
- Spending patterns? Payment history?
- Demographics? Missing data?
- Combinations of factors?
- Cash Advance Las Vegas Problem
- We consider more than 100,000 predictors!
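The arithmetic behind the "needle in a haystack" point, using the counts above:

```python
months, bankruptcies = 3_000_000, 2244
rate = bankruptcies / months
print(f"bankruptcy rate per account-month: {rate:.4%}")       # about 0.07%
print(f"accuracy of predicting 'all OK':   {1 - rate:.4%}")   # about 99.93%
```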
19Modeling Predictive Models
- Build the model: identify patterns in training data that predict future observations.
- Which features are real? Coincidental?
- Evaluate the model: how do you know that it works?
- During the model construction phase
- Only incorporate meaningful features
- After the model is built
- Validate by predicting new observations
20Are all prediction errors the same?
- Symmetry
- Is over-predicting as costly as under-predicting?
- Managing inventories and sales
- Visible costs versus hidden costs
- Does a false positive cost the same as a false negative?
- Classification in data mining
- Credit modeling, flagging risky customers
- False positive: call a good customer bad
- False negative: fail to identify a bad customer
- Differential costs for different types of errors (a cost sketch follows this list)
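A sketch of how differential error costs change the choice of cutoff. The cost figures and the threshold grid are illustrative placeholders, not numbers from the talk; p_hat and actual would come from the fitted model and the held-out data.

```python
import numpy as np

def average_cost(p_hat, actual, threshold, cost_fp=1.0, cost_fn=50.0):
    """Average per-account cost of flagging everyone with p_hat >= threshold.
    cost_fp: bothering or losing a good customer; cost_fn: missing a bankruptcy.
    Both costs are made-up placeholders."""
    flagged = p_hat >= threshold
    n_fp = np.sum(flagged & (actual == 0))
    n_fn = np.sum(~flagged & (actual == 1))
    return (cost_fp * n_fp + cost_fn * n_fn) / len(actual)

def best_threshold(p_hat, actual, grid=np.linspace(0.01, 0.99, 99), **costs):
    """Pick the cutoff with the lowest average cost on held-out data."""
    return min(grid, key=lambda t: average_cost(p_hat, actual, t, **costs))
```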
21Building a Predictive Model
- So many choices
- Structure: What type of model?
- Neural net
- CART, classification tree
- Additive model or regression spline
- Identification: Which features to use?
- Time lags, natural transformations
- Combinations of other features
- Search: How does one find these features?
- Brute force has become cheap.
22Our Choices
- Structure
- Linear regression with nonlinearity via interactions
- All 2-way and some 3-way, 4-way interactions (feature construction sketched below)
- Missing data handled with indicators
- Identification
- Conservative standard error
- Comparison of conservative t-ratio to adaptive threshold
- Search
- Forward stepwise regression
- Coming: dynamically changing list of features
- Good choice affects where you search next.
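A sketch of the kind of feature construction described above: missing-data indicators plus all two-way interactions. The fill rule (column mean) and the naming are my assumptions; higher-order interactions would be built the same way from the expanded columns.

```python
import pandas as pd
from itertools import combinations

def expand_features(raw: pd.DataFrame) -> pd.DataFrame:
    """Missing-data indicators, filled base columns, and all 2-way interactions.
    Assumes the raw columns are numeric."""
    cols = {}
    for name in raw.columns:
        x = raw[name]
        cols[f"{name}_miss"] = x.isna().astype(float)  # indicator for missingness
        cols[name] = x.fillna(x.mean())                # fill so interactions can be formed
    out = pd.DataFrame(cols)
    for a, b in combinations(raw.columns, 2):          # every pairwise product
        out[f"{a}*{b}"] = out[a] * out[b]
    return out
```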
23Identifying Predictive Features
- Classical problem of variable selection
- Thresholding methods (compare t-ratio to threshold)
- Akaike information criterion (AIC)
- Bayes information criterion (BIC)
- Hard thresholding and Bonferroni
- Arguments for adaptive thresholds
- Empirical Bayes
- Information theory
- Step-up/step-down tests
24Adaptive Thresholding
- Threshold changes to conform to attributes of the data
- Easier to add features as more are found.
- Threshold for first predictor
- Compare conservative t-ratio to Bonferroni.
- Bonferroni is about Sqrt(2 log p)
- If something significant is found, continue.
- Threshold for second predictor
- Compare t-ratio to reduced threshold
- New threshold is about Sqrt(2 log(p/2)); the full schedule is sketched below
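The schedule itself, in the Sqrt(2 log(p/q)) form used on the later slides. The printed values assume the roughly 100,000 candidate predictors mentioned earlier; that count is the only input taken from the deck.

```python
import numpy as np

def adaptive_thresholds(p, q_max):
    """Cutoff for the q-th feature admitted: roughly sqrt(2 log(p/q)).
    Starts at the Bonferroni level and eases as real signal is found."""
    return [float(np.sqrt(2 * np.log(p / q))) for q in range(1, q_max + 1)]

print([round(t, 2) for t in adaptive_thresholds(100_000, 5)])
# [4.8, 4.65, 4.56, 4.5, 4.45] -- the bar drops slowly as features are accepted
```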
25Adaptive Thresholding Benefits
- Easy: as easy and fast as implementing the standard criterion that is used in stepwise regression.
- Theory: the resulting model is provably as good as the best Bayes model for the problem at hand.
- Real world: it works! Finds models with real signal, and stops when the signal runs out.
26Bankruptcy Model Construction
- Data: reserve 80% for validation (split sketched below)
- Training data
- 600,000 months
- 458 bankruptcies
- Validation data
- 2,400,000 months
- 1786 bankruptcies
- Selection via adaptive thresholding
- Compare sequence of t-statistics to Sqrt(2 log(p/q))
- Dynamic expansion of feature space
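A rough sketch of how the 20/80 reservation might be done; the per-row random assignment and the seed are my assumptions, not details from the talk. The row count mirrors the 3,000,000 account-months above.

```python
import numpy as np

def reserve_for_validation(n_rows, train_frac=0.2, seed=0):
    """Randomly keep about 20% of the account-months for training and
    reserve the remaining 80% for validation."""
    rng = np.random.default_rng(seed)
    in_train = rng.random(n_rows) < train_frac
    return in_train, ~in_train

train_mask, valid_mask = reserve_for_validation(3_000_000)
print(train_mask.sum(), valid_mask.sum())   # roughly 600,000 and 2,400,000
```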
27Bankruptcy Model Preview
- Predictors
- Initial search identifies 39
- Validation SS monotonically falls to 1650
- Linear fit can do no better than 1735
- Expanded search of higher interactions finds a bit more
- Nature of predictors comprising the interactions
- Validation SS drops 10 more
- Validation: lift chart
- Top 1000 candidates have 351 bankrupt
- More validation: calibration
- Close to actual Pr(bankrupt) for most groups.
28Bankruptcy Model Fitting
- Where should the fitting process be stopped?
29Bankruptcy Model Fitting
- Our adaptive selection procedure stops at a model with 39 predictors.
30Bankruptcy Model Validation
- The validation indicates that the fit keeps getting better as the model expands, avoiding over-fitting (validation path sketched below).
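A sketch of the curve behind this slide: refit on the training data after each feature is added, in selection order, and track the validation sum of squares. The argument names are placeholders for the training/validation design matrices and responses.

```python
import numpy as np

def validation_ss_path(X_tr, y_tr, X_va, y_va, order):
    """Validation sum of squared errors after each step of the selection,
    refitting on the training data with the first k chosen predictors."""
    path = []
    for k in range(1, len(order) + 1):
        cols = list(order[:k])
        Z_tr = np.column_stack([np.ones(len(y_tr)), X_tr[:, cols]])
        Z_va = np.column_stack([np.ones(len(y_va)), X_va[:, cols]])
        beta, *_ = np.linalg.lstsq(Z_tr, y_tr, rcond=None)
        path.append(float(np.sum((y_va - Z_va @ beta) ** 2)))
    return path   # should keep falling if the added features carry real signal
```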
31Bankruptcy Model Linear?
- Choosing from linear predictors (no interactions) does not match the performance of the full search.
32Bankruptcy Model More?
- Searching higher-order interactions offers modest improvement.
33Lift Chart
- Measures how well the model classifies the sought-for group
- Depends on the rule used to label customers
- Very high threshold: lots of lift, but few bankrupt customers are found.
- Lower threshold: lift drops, but finds more bankrupt customers (computation sketched below).
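A sketch of one point on such a chart: the lift of the k highest-scoring accounts is their bankruptcy rate divided by the overall rate, so a random ordering has lift 1.

```python
import numpy as np

def lift_at_top_k(p_hat, actual, k):
    """Bankruptcy rate among the k accounts with the highest predicted risk,
    relative to the overall bankruptcy rate."""
    top = np.argsort(-p_hat)[:k]
    return actual[top].mean() / actual.mean()

# With the validation figures quoted above (351 bankrupt in the top 1000 against
# a base rate of 1786 / 2,400,000), the lift at k = 1000 is roughly 470.
```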
34Generic Lift Chart
- (chart: lift of the model versus a random ordering)
35Bankruptcy Model Lift
- Much better than diagonal!
36Calibration
- Classifier assigns a Prob(BR) rating to a customer.
- Weather forecast
- Among those classified as a 2/10 chance of BR, how many are BR?
- Closer to diagonal is better (a calibration check is sketched below).
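A sketch of the "weather forecast" check, assuming p_hat holds the model's claimed Pr(BR) and actual the observed 0/1 outcomes; the equal-width bins are my choice.

```python
import numpy as np

def calibration_table(p_hat, actual, n_bins=10):
    """For each predicted-probability bin, compare the claimed Pr(BR)
    with the observed bankruptcy rate; well calibrated means the two agree."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (p_hat >= lo) & (p_hat < hi)
        if in_bin.any():
            rows.append((float(p_hat[in_bin].mean()),   # claimed probability
                         float(actual[in_bin].mean()),  # observed rate
                         int(in_bin.sum())))            # group size
    return rows
```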
37Bankruptcy Model Calibration
- Over-predicts risk above claimed probability 0.4
38Summary of Bankruptcy Model
- Automatic, adaptive selection
- Finds patterns that predict new observations
- Predictive, but not easy to explain
- Dynamic feature set
- Current research
- Information theory allows changing search space
- Finds more structure than direct search could find
- Validation
- Essential only for judging fit.
- Better than hand-made models that take years to create.
39So, where's the profit in DM?
- Automated modeling has become very powerful, avoiding problems of over-fitting.
- Role for expert judgment remains
- What data to use?
- Which features to try first?
- What are the economics of the prediction errors?
- Collaboration
- Data sources
- Data analysis
- Strategic decisions