Rule Discovery for Fraud Detection

Description:

Given a set of page views, predict whether the visitor will view ... Non-crawlers. Hand selected rules with near perfect accuracy. Rule Generator. Applying ...

Provided by: gadip
1
KDD Cup 2000 Question 1
2
Overview
  • Objective
      • Given a set of page views, predict whether the
        visitor will view another page or not
  • Data
      • Raw Data - Clicks
      • Aggregated Data - Sessions
      • Some sessions clipped in the middle
      • Indicator: Session continues
  • Methods and Tools
      • Exploratory Data Analysis - SAS
      • Classification Tree - Amdocs Business Insight
        Tool
      • Decision tree
      • Rules Extraction
      • Modeling
      • Combining models

3
The Winning Model - Introduction
This model combines artificial intelligence
(automated procedures) with human intuition and
domain-knowledge decisions.
4
The Winning Model - general scheme
5
Building Main Model
Decision Tree: 5 trees, built on 34000 cases
6
Description of sub-models
Each model captures a different aspect of the
overall behavior in the data. Combining or
ensembling the models provides the best
prediction results.
Best Rule
Chooses the most accurate rule satisfied by each
record.
Hybrid Model
Logistic regression on the rule set and raw field
values, which combine to define a score for each
record.
Merged Rules
Logistic regression on the rule set defines a score
for each record as a combination of the rules the
record satisfies.
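The Best Rule sub-model can be illustrated with a minimal sketch. The slides do not show any implementation, so the Rule class, the field names (page_views, seconds) and the accuracy figures below are all hypothetical. The rule_indicators helper shows the kind of 0/1 input a logistic regression on the rule set could take; the regression itself is not shown.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Record = Dict[str, float]

@dataclass
class Rule:
    name: str
    condition: Callable[[Record], bool]  # True when the record satisfies the rule
    accuracy: float                      # accuracy measured on training data (assumed)
    predicts_continue: bool              # class the rule predicts

def best_rule(record: Record, rules: List[Rule]) -> Optional[Rule]:
    """Return the most accurate rule the record satisfies, or None if none apply."""
    satisfied = [r for r in rules if r.condition(record)]
    return max(satisfied, key=lambda r: r.accuracy, default=None)

def rule_indicators(record: Record, rules: List[Rule]) -> List[int]:
    """0/1 vector marking which rules the record satisfies."""
    return [int(r.condition(record)) for r in rules]

# Hypothetical rules, for illustration only
rules = [
    Rule("many_pages", lambda r: r["page_views"] >= 5, 0.80, True),
    Rule("short_visit", lambda r: r["seconds"] < 30, 0.90, False),
    Rule("long_visit", lambda r: r["seconds"] >= 300, 0.70, True),
]
record = {"page_views": 6, "seconds": 20}
chosen = best_rule(record, rules)
indicators = rule_indicators(record, rules)
```

Here the record satisfies two rules, and the more accurate one decides the prediction. A record that satisfies no rule returns None and would need a fallback, for example the majority class.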
7
Applying Main Model
Decision Tree: 5 trees, built on 34000 cases
Rule Generator: 1466 rules, 111 continue rules
Sub-models: Best Rule, Hybrid Model, Merged Rules
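The slides do not say how the Rule Generator turns trees into rules. One common approach, sketched here purely as an assumption, is to treat every root-to-leaf path of a decision tree as one rule. The Node and Leaf classes, the feature names and the accuracies are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

@dataclass
class Leaf:
    prediction: str   # class predicted at this leaf: "continue" or "stop"
    accuracy: float   # share of training cases at the leaf in that class (assumed)

@dataclass
class Node:
    feature: str
    threshold: float
    left: Union["Node", Leaf]    # branch taken when feature < threshold
    right: Union["Node", Leaf]   # branch taken when feature >= threshold

ExtractedRule = Tuple[List[str], str, float]

def extract_rules(tree: Union[Node, Leaf], path: Tuple[str, ...] = ()) -> List[ExtractedRule]:
    """Turn each root-to-leaf path into (conditions, prediction, accuracy)."""
    if isinstance(tree, Leaf):
        return [(list(path), tree.prediction, tree.accuracy)]
    left = extract_rules(tree.left, path + (f"{tree.feature} < {tree.threshold}",))
    right = extract_rules(tree.right, path + (f"{tree.feature} >= {tree.threshold}",))
    return left + right

# Hypothetical two-level tree
tree = Node("page_views", 3,
            Leaf("stop", 0.85),
            Node("seconds", 60, Leaf("stop", 0.60), Leaf("continue", 0.75)))

all_rules = extract_rules(tree)
continue_rules = [r for r in all_rules if r[1] == "continue"]
```

Filtering on the predicted class mirrors the split between all rules and continue rules in the diagram, although the actual counts there come from the authors' tool, not from this sketch.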
8
The Winning Model - general scheme
9
Small Whitebox
10
Small Whitebox
Decision Tree
Applying The Model
11
The prediction
The prediction is not that much better than
choosing the majority class. But it is enough to
win first place!
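To make the comparison with the majority class concrete, the sketch below computes a majority-class baseline and a model's gain over it. The labels and predictions are made up for illustration and are not the competition data.

```python
from collections import Counter
from typing import List

def majority_baseline_accuracy(labels: List[int]) -> float:
    """Accuracy obtained by always predicting the most frequent class."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)

def accuracy(labels: List[int], predictions: List[int]) -> float:
    """Share of records whose prediction matches the label."""
    return sum(y == p for y, p in zip(labels, predictions)) / len(labels)

# Hypothetical data: 0 = session does not continue, 1 = session continues
labels      = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
predictions = [0, 0, 0, 0, 0, 1, 0, 0, 1, 1]

baseline = majority_baseline_accuracy(labels)
model_acc = accuracy(labels, predictions)
gain = model_acc - baseline
```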
12
Final Considerations
  • Since both types of errors (false positives and
    false negatives) are given the same weight, a
    segment must have a very high probability of
    continuing to justify not being classified as the
    majority class.
  • The ratio of continue / not continue in the test
    set must be estimated as accurately as possible.
  • The cutoff point (which score threshold divides
    the two classes) must be carefully chosen.
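The last two points can be sketched in code. This is not the authors' procedure: adjust_for_prior applies the standard prior-shift correction to a score when the continue rate differs between training and test data, and best_cutoff scans candidate thresholds for the one with the highest accuracy, which matches equal weights for both error types. All names and numbers are hypothetical.

```python
from typing import List, Tuple

def adjust_for_prior(p: float, train_rate: float, test_rate: float) -> float:
    """Rescale a probability of 'continue' from the training class ratio
    to an estimated test-set class ratio (standard prior-shift correction)."""
    pos = p * test_rate / train_rate
    neg = (1 - p) * (1 - test_rate) / (1 - train_rate)
    return pos / (pos + neg)

def best_cutoff(scores: List[float], labels: List[int]) -> Tuple[float, float]:
    """Return (cutoff, accuracy) maximizing accuracy, where a record is
    predicted 'continue' when score >= cutoff. Ties keep the lowest cutoff."""
    best_acc, best_cut = -1.0, float("inf")
    # The extra candidate float("inf") means "predict nobody continues".
    for cut in sorted(set(scores)) + [float("inf")]:
        correct = sum((s >= cut) == bool(y) for s, y in zip(scores, labels))
        acc = correct / len(labels)
        if acc > best_acc:
            best_acc, best_cut = acc, cut
    return best_cut, best_acc

# Hypothetical validation scores and labels (1 = session continues)
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 0, 0, 1, 1]

cutoff, cutoff_accuracy = best_cutoff(scores, labels)
shifted = adjust_for_prior(0.5, train_rate=0.5, test_rate=0.25)
```

In this toy data the chosen cutoff sits well above 0.5, which illustrates the first point: a record needs a high score before it is worth predicting the minority class.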

13

The End