Title: Profiting from Data Mining
1Profiting from Data Mining
- Bob Stine
- Department of Statistics
- The Wharton School, Univ of Pennsylvania
- April 5, 2002
- www-stat.wharton.upenn.edu/bob
2Overview
- Critical stages of data mining process
- Choosing the right data, people, and problems
- Modeling
- Validation
- Automated modeling
- Feature creation and selection
- Exploiting expert knowledge, insights
- Applications
- Little detail: Biomedical, finding predictive risk factors
- More detail: Financial, predicting returns on the market
- Lots of detail: Credit, anticipating the onset of bankruptcy
3Predicting Health Risk
- Who is at risk for a disease?
- Example: detect osteoporosis without the expense of an x-ray
- Goals
- Improving public health
- Savings on medical care
- Confirm an informal model with data mining
- Many types of features, interested groups
- Clinical observations of doctors
- Laboratory measurements, genetic
- Self-reported behavior
- Missing data
4Predicting the Stock Market
- Small, hands-on example
- Goals
- Better retirement savings?
- Money for that special vacation? College?
- Trade-offs risk vs return
- Lots of free data
- Access to accurate historical time trends, macro factors
- Recent data more useful than older data
- Simple modeling technique
- Validation
5Predicting the Market: Specifics
- Build a regression model
- Response is the return on the value-weighted S&P index
- Use standard forward/backward stepwise
- Battery of 12 predictors with interactions
- Train the model during 1992-1996 (training data)
- Model captures most of the variation in 5 years of returns
- Retain only the most significant features (Bonferroni); a sketch of this kind of search follows below
- Predict returns in 1997 (validation data)
- Another version in Foster, Stine, and Waterman
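The search just described can be sketched roughly as follows. This is a minimal illustration, not the authors' code: X is assumed to hold the twelve predictors and their interactions over the 1992-1996 training months, y the corresponding returns, and the cutoff uses the Sqrt(2 log p) approximation to Bonferroni mentioned later in the deck (a pure forward search is used here for brevity).

```python
import numpy as np

def t_ratio_of_last(X, y):
    """OLS t-ratio for the last column of X (an intercept is added here)."""
    Z = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    s2 = resid @ resid / (len(y) - Z.shape[1])
    se = np.sqrt(s2 * np.linalg.inv(Z.T @ Z)[-1, -1])
    return beta[-1] / se

def forward_stepwise(X, y, names):
    """Add the predictor with the largest |t| at each step, but keep it only
    if it clears the Bonferroni-style cutoff sqrt(2 log p)."""
    p = X.shape[1]
    cutoff = np.sqrt(2 * np.log(p))
    selected = []
    while len(selected) < p:
        candidates = [j for j in range(p) if j not in selected]
        t_vals = [t_ratio_of_last(X[:, selected + [j]], y) for j in candidates]
        best = int(np.argmax(np.abs(t_vals)))
        if abs(t_vals[best]) < cutoff:
            break          # nothing left that is clearly significant
        selected.append(candidates[best])
    return [names[j] for j in selected]
```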
6Historical patterns?
7Fitted model predicts...
Exceptional Feb return?
8What happened?
Training Period
9Claimed versus Actual Error
- (chart: claimed versus actual squared prediction error)
10Over-confidence?
- Over-fitting
- Model fits the training data too well, better than it can predict the future.
- Greedy fitting procedure: optimization capitalizes on chance
- Some intuition
- Coincidences
- Cancer clusters, the birthday problem
- Illustration with an auction
- What is the value of the coins in this jar?
11Auctions and Over-fitting
- What is the value of these coins?
12Auctions and Over-fitting
- Auction jar of coins to a class of MBA students
- Histogram shows the bids of 30 students
- Most were suspicious, but a few were not!
- Actual value is $3.85
- Known as the Winner's Curse
- Similar to over-fitting: the best model is like the high bidder (see the simulation sketch below)
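A small simulation of the winner's curse, assuming each of the 30 bidders makes an unbiased but noisy guess at the $3.85 jar and bids that guess (the noise level here is made up). The average guess is about right, but the winning bid is systematically too high, just as the best-looking of many candidate models looks better than it really is.

```python
import numpy as np

rng = np.random.default_rng(0)
true_value = 3.85                      # value of the coins in the jar
n_bidders, n_auctions = 30, 10_000

# Each bidder's guess is unbiased but noisy; everyone bids their guess.
guesses = true_value + rng.normal(0.0, 1.0, size=(n_auctions, n_bidders))
winning_bids = guesses.max(axis=1)

print(f"average guess:       {guesses.mean():.2f}")        # close to 3.85
print(f"average winning bid: {winning_bids.mean():.2f}")   # well above 3.85
```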
13Profiting from data mining?
- Where's the profit in this?
- Mining the miners vs. getting value from your data
- Lost opportunities
- Importance of domain knowledge
- Validation as a measure of success
- Prediction provides an explicit check
- Does your application predict something?
14Pitfalls and Role of Management
- Over-fitting is dominated by other issues
- Management support
- Life in silos
- Coordination across domains
- Responsibility and reward
- Accountability
- Who gets the credit when it succeeds? Who suffers if the project is not successful?
15Specific Potholes
- Moving targets
- "Let's try this with something else."
- Irrational expectations
- "I could have done better than that."
- Not with my data
- "It's our data. You can't use it."
- "You did not use our data properly."
16Back to a real application
- Emphasis on the statistical issues
17Predicting Bankruptcy
- Goal
- Reduce losses stemming from personal bankruptcy
- Possible strategies
- If we can identify those with the highest risk of bankruptcy, take some action
- Call them for a friendly chat about circumstances
- Unilaterally reduce credit limit
- Trade-off
- Good customers borrow lots of money
- Bad customers also borrow lots of money
18Predicting Bankruptcy
- Needle in a haystack
- 3,000,000 months of credit-card activity
- 2244 bankruptcies
- Simple predictor that all are OK looks pretty good (see the arithmetic after this list).
- What factors anticipate bankruptcy?
- Spending patterns? Payment history?
- Demographics? Missing data?
- Combinations of factors?
- Cash Advance Las Vegas Problem
- We consider more than 100,000 predictors!
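The arithmetic behind the "needle in a haystack" point, using the counts above:

```python
months, bankruptcies = 3_000_000, 2244
rate = bankruptcies / months
print(f"bankruptcy rate per account-month: {rate:.4%}")       # about 0.07%
print(f"accuracy of predicting 'all OK':   {1 - rate:.4%}")   # about 99.93%
```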
19Modeling Predictive Models
- Build the model: identify patterns in training data that predict future observations.
- Which features are real? Coincidental?
- Evaluate the model: how do you know that it works?
- During the model construction phase
- Only incorporate meaningful features
- After the model is built
- Validate by predicting new observations
20Are all prediction errors the same?
- Symmetry
- Is over-predicting as costly as under-predicting?
- Managing inventories and sales
- Visible costs versus hidden costs
- Does a false positive cost the same as a false negative?
- Classification in data mining
- Credit modeling, flagging risky customers
- False positive: call a good customer bad
- False negative: fail to identify a bad customer
- Differential costs for different types of errors (a cost sketch follows this list)
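A sketch of how differential error costs change the choice of cutoff. The cost figures and the threshold grid are illustrative placeholders, not numbers from the talk; p_hat and actual would come from the fitted model and the held-out data.

```python
import numpy as np

def average_cost(p_hat, actual, threshold, cost_fp=1.0, cost_fn=50.0):
    """Average per-account cost of flagging everyone with p_hat >= threshold.
    cost_fp: bothering or losing a good customer; cost_fn: missing a bankruptcy.
    Both costs are made-up placeholders."""
    flagged = p_hat >= threshold
    n_fp = np.sum(flagged & (actual == 0))
    n_fn = np.sum(~flagged & (actual == 1))
    return (cost_fp * n_fp + cost_fn * n_fn) / len(actual)

def best_threshold(p_hat, actual, grid=np.linspace(0.01, 0.99, 99), **costs):
    """Pick the cutoff with the lowest average cost on held-out data."""
    return min(grid, key=lambda t: average_cost(p_hat, actual, t, **costs))
```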
21Building a Predictive Model
- So many choices
- Structure: What type of model?
- Neural net
- CART, classification tree
- Additive model or regression spline
- Identification: Which features to use?
- Time lags, natural transformations
- Combinations of other features
- Search: How does one find these features?
- Brute force has become cheap.
22Our Choices
- Structure
- Linear regression with nonlinearity via interactions
- All 2-way and some 3-way, 4-way interactions (feature construction sketched below)
- Missing data handled with indicators
- Identification
- Conservative standard error
- Comparison of conservative t-ratio to adaptive threshold
- Search
- Forward stepwise regression
- Coming: dynamically changing list of features
- Good choice affects where you search next.
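A sketch of the kind of feature construction described above: missing-data indicators plus all two-way interactions. The fill rule (column mean) and the naming are my assumptions; higher-order interactions would be built the same way from the expanded columns.

```python
import pandas as pd
from itertools import combinations

def expand_features(raw: pd.DataFrame) -> pd.DataFrame:
    """Missing-data indicators, filled base columns, and all 2-way interactions.
    Assumes the raw columns are numeric."""
    cols = {}
    for name in raw.columns:
        x = raw[name]
        cols[f"{name}_miss"] = x.isna().astype(float)  # indicator for missingness
        cols[name] = x.fillna(x.mean())                # fill so interactions can be formed
    out = pd.DataFrame(cols)
    for a, b in combinations(raw.columns, 2):          # every pairwise product
        out[f"{a}*{b}"] = out[a] * out[b]
    return out
```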
23Identifying Predictive Features
- Classical problem of variable selection
- Thresholding methods (compare t-ratio to threshold)
- Akaike information criterion (AIC)
- Bayes information criterion (BIC)
- Hard thresholding and Bonferroni
- Arguments for adaptive thresholds
- Empirical Bayes
- Information theory
- Step-up/step-down tests
24Adaptive Thresholding
- Threshold changes to conform to attributes of the data
- Easier to add features as more are found.
- Threshold for first predictor
- Compare conservative t-ratio to Bonferroni.
- Bonferroni is about Sqrt(2 log p)
- If something significant is found, continue.
- Threshold for second predictor
- Compare t-ratio to reduced threshold
- New threshold is about Sqrt(2 log(p/2)); the full schedule is sketched below
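The schedule itself, in the Sqrt(2 log(p/q)) form used on the later slides. The printed values assume the roughly 100,000 candidate predictors mentioned earlier; that count is the only input taken from the deck.

```python
import numpy as np

def adaptive_thresholds(p, q_max):
    """Cutoff for the q-th feature admitted: roughly sqrt(2 log(p/q)).
    Starts at the Bonferroni level and eases as real signal is found."""
    return [float(np.sqrt(2 * np.log(p / q))) for q in range(1, q_max + 1)]

print([round(t, 2) for t in adaptive_thresholds(100_000, 5)])
# [4.8, 4.65, 4.56, 4.5, 4.45] -- the bar drops slowly as features are accepted
```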
25Adaptive Thresholding Benefits
- Easy: as easy and fast as implementing the standard criterion that is used in stepwise regression.
- Theory: the resulting model is provably as good as the best Bayes model for the problem at hand.
- Real world: it works! Finds models with real signal, and stops when the signal runs out.
26Bankruptcy Model Construction
- Data: reserve 80% for validation (split sketched below)
- Training data
- 600,000 months
- 458 bankruptcies
- Validation data
- 2,400,000 months
- 1786 bankruptcies
- Selection via adaptive thresholding
- Compare sequence of t-statistics to Sqrt(2 log(p/q))
- Dynamic expansion of feature space
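A rough sketch of how the 20/80 reservation might be done; the per-row random assignment and the seed are my assumptions, not details from the talk. The row count mirrors the 3,000,000 account-months above.

```python
import numpy as np

def reserve_for_validation(n_rows, train_frac=0.2, seed=0):
    """Randomly keep about 20% of the account-months for training and
    reserve the remaining 80% for validation."""
    rng = np.random.default_rng(seed)
    in_train = rng.random(n_rows) < train_frac
    return in_train, ~in_train

train_mask, valid_mask = reserve_for_validation(3_000_000)
print(train_mask.sum(), valid_mask.sum())   # roughly 600,000 and 2,400,000
```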
27Bankruptcy Model Preview
- Predictors
- Initial search identifies 39
- Validation SS monotonically falls to 1650
- Linear fit can do no better than 1735
- Expanded search of higher interactions finds a bit more
- Nature of predictors comprising the interactions
- Validation SS drops 10 more
- Validation: lift chart
- Top 1000 candidates have 351 bankrupt
- More validation: calibration
- Close to actual Pr(bankrupt) for most groups.
28Bankruptcy Model Fitting
- Where should the fitting process be stopped?
29Bankruptcy Model Fitting
- Our adaptive selection procedure stops at a model with 39 predictors.
30Bankruptcy Model Validation
- The validation indicates that the fit keeps getting better as the model expands, avoiding over-fitting (validation path sketched below).
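A sketch of the curve behind this slide: refit on the training data after each feature is added, in selection order, and track the validation sum of squares. The argument names are placeholders for the training/validation design matrices and responses.

```python
import numpy as np

def validation_ss_path(X_tr, y_tr, X_va, y_va, order):
    """Validation sum of squared errors after each step of the selection,
    refitting on the training data with the first k chosen predictors."""
    path = []
    for k in range(1, len(order) + 1):
        cols = list(order[:k])
        Z_tr = np.column_stack([np.ones(len(y_tr)), X_tr[:, cols]])
        Z_va = np.column_stack([np.ones(len(y_va)), X_va[:, cols]])
        beta, *_ = np.linalg.lstsq(Z_tr, y_tr, rcond=None)
        path.append(float(np.sum((y_va - Z_va @ beta) ** 2)))
    return path   # should keep falling if the added features carry real signal
```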
31Bankruptcy Model Linear?
- Choosing from linear predictors (no interactions) does not match the performance of the full search.
32Bankruptcy Model More?
- Searching higher-order interactions offers modest improvement.
33Lift Chart
- Measures how well the model classifies the sought-for group
- Depends on the rule used to label customers
- Very high threshold: lots of lift, but few bankrupt customers are found.
- Lower threshold: lift drops, but finds more bankrupt customers (computation sketched below).
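A sketch of one point on such a chart: the lift of the k highest-scoring accounts is their bankruptcy rate divided by the overall rate, so a random ordering has lift 1.

```python
import numpy as np

def lift_at_top_k(p_hat, actual, k):
    """Bankruptcy rate among the k accounts with the highest predicted risk,
    relative to the overall bankruptcy rate."""
    top = np.argsort(-p_hat)[:k]
    return actual[top].mean() / actual.mean()

# With the validation figures quoted above (351 bankrupt in the top 1000 against
# a base rate of 1786 / 2,400,000), the lift at k = 1000 is roughly 470.
```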
34Generic Lift Chart
- (chart: lift of the model versus a random ordering)
35Bankruptcy Model Lift
- Much better than diagonal!
36Calibration
- Classifier assigns a Prob(BR) rating to a customer.
- Weather forecast
- Among those classified as a 2/10 chance of BR, how many are BR?
- Closer to diagonal is better (a calibration check is sketched below).
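A sketch of the "weather forecast" check, assuming p_hat holds the model's claimed Pr(BR) and actual the observed 0/1 outcomes; the equal-width bins are my choice.

```python
import numpy as np

def calibration_table(p_hat, actual, n_bins=10):
    """For each predicted-probability bin, compare the claimed Pr(BR)
    with the observed bankruptcy rate; well calibrated means the two agree."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (p_hat >= lo) & (p_hat < hi)
        if in_bin.any():
            rows.append((float(p_hat[in_bin].mean()),   # claimed probability
                         float(actual[in_bin].mean()),  # observed rate
                         int(in_bin.sum())))            # group size
    return rows
```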
37Bankruptcy Model Calibration
- Over-predicts risk above claimed probability 0.4
38Summary of Bankruptcy Model
- Automatic, adaptive selection
- Finds patterns that predict new observations
- Predictive, but not easy to explain
- Dynamic feature set
- Current research
- Information theory allows changing search space
- Finds more structure than direct search could find
- Validation
- Essential only for judging fit.
- Better than hand-made models that take years to create.
39So, where's the profit in DM?
- Automated modeling has become very powerful, avoiding problems of over-fitting.
- Role for expert judgment remains
- What data to use?
- Which features to try first?
- What are the economics of the prediction errors?
- Collaboration
- Data sources
- Data analysis
- Strategic decisions