Investigative Data Mining in Fraud Detection - PowerPoint PPT Presentation

Provided by: Clift151

1
Investigative Data Mining in Fraud Detection
2
Overview (1)

Investigative Data Mining and Problems in Fraud Detection
- Definitions
- Technical and Practical Problems
Existing Fraud Detection Methods
- Widely used methods
The Crime Detection Method
- Comparisons with Minority Report
- Classifiers as Precogs
- Combining Output as Integration Mechanisms
- Cluster Detection as Analytical Machinery
- Visualisation Techniques as Visual Symbols

3
Overview (2)

Implementing the Crime Detection System - Preparation Component
- Investigation objectives
- Collected data
- Preparation of collected data to achieve objectives
Implementing the Crime Detection System - Action Component
- Which experiments generate the best predictions?
- Which is the best insight?
- How can the new models and insights be deployed within an organisation?
Contributions and Recommendations
- Significant research contributions
- Proposed solutions

4
Literature and Acknowledgements
Dick P K (1956) Minority Report, Orion Publishing Group, London, Great Britain.
Abagnale F (2001) The Art of the Steal: How to Protect Yourself and Your Business from Fraud, Transworld Publishers, NSW, Australia.
Mena J (2003) Investigative Data Mining for Security and Criminal Detection, Butterworth-Heinemann, MA, USA.
Elkan C (2001) Magical Thinking in Data Mining: Lessons From CoIL Challenge 2000, Department of Computer Science and Engineering, University of California, San Diego, USA.
Prodromidis A (1999) Management of Intelligent Learning Agents in Distributed Data Mining Systems, Unpublished PhD thesis, Columbia University, USA.
Berry M and Linoff G (2000) Mastering Data Mining: The Art and Science of Customer Relationship Management, John Wiley and Sons, New York, USA.
Han J and Kamber M (2001) Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers.
Witten I and Frank E (1999) Data Mining: Practical Machine Learning Tools and Techniques with Java, Morgan Kaufmann Publishers, CA, USA.
5
Investigative Data Mining and Problems in Fraud Detection
6
Investigative Data Mining - Definitions

Investigative
- Official attempt to extract some truth, or insights, about criminal activity from data
Data Mining
- Process of discovering, extracting, and analysing meaningful patterns, structures, models, and rules from large quantities of data
- Spans several research areas, such as databases, machine learning, neural networks, data visualisation, statistics, and distributed data mining
Investigative Data Mining
- Applied to law enforcement, industry, and private databases

7
Fraud Detection - Definitions

Fraud
- Criminal deception: the use of false representations to obtain an unjust advantage, or to injure the rights and interests of another
Diversity of Fraud
- Against organisations, governments, and individuals
- Committed by external parties, internal management, and non-management employees
- Caused by customers, service providers, and suppliers
- Prevalent in the insurance, credit card, and telecommunications industries
- Most common in automobile, travel, and household contents insurance
Cost of Fraud
- Automobile insurance fraud alone: AUD 32 million for nine Australian companies

8
Fraud Detection Problems - Technical

Imperfect data
- Usually not collected for data mining
- Inaccurate, incomplete, and irrelevant data attributes
Highly skewed data
- Many more legitimate than fraudulent examples
- Higher chances of overfitting
Black-box predictions
- Numerical outputs incomprehensible to people

9
Fraud Detection Problems - Practical

Lack of domain knowledge
- Important attributes, likely relationships, and known patterns
- Three types of fraud offenders and their modus operandi
Great variety of fraud scenarios over time
- Soft fraud: cost of investigation > cost of fraud
- Hard fraud: circumvents anti-fraud measures
Assessing data mining potential
- Predictive accuracy is useless for skewed data sets
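To make the skewed-data problem concrete, the following minimal Python sketch scores a trivial classifier that labels every claim as legal. It uses the claim counts reported later in the slides (15,420 claims, 923 fraudulent); the function name is hypothetical, and the point is only that accuracy looks high while no fraud is caught.

```python
# Claim counts reported in the slides: 15,420 claims, 923 of them fraudulent.
TOTAL_CLAIMS = 15420
FRAUD_CLAIMS = 923


def evaluate_always_legal(total: int, fraud: int) -> dict:
    """Score a trivial classifier that labels every claim as legal."""
    legal = total - fraud
    true_negatives = legal   # every legal claim is correctly called legal
    true_positives = 0       # no fraudulent claim is ever flagged
    accuracy = (true_positives + true_negatives) / total
    hit_rate = true_positives / fraud  # share of fraud caught (recall)
    return {"accuracy": accuracy, "hit_rate": hit_rate}


result = evaluate_always_legal(TOTAL_CLAIMS, FRAUD_CLAIMS)
print(f"accuracy={result['accuracy']:.3f} hit_rate={result['hit_rate']:.3f}")
```

This prints accuracy=0.940 hit_rate=0.000. The classifier is about 94% accurate yet catches no fraud, which is why hits, false alarms, and cost savings are more informative measures here.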

10
Existing Fraud Detection Methods
11
Widely Used Methods in Fraud Detection

Insurance Fraud
- Cluster detection -> decision tree induction -> domain knowledge, statistical summaries, and visualisations
- Special case: neural network classification -> cluster detection
Credit Card Fraud
- Decision tree and naive Bayesian classification -> stacking
Telecommunications Fraud
- Cluster detection -> scores and rules

12
The Crime Detection Method
13
Comparisons with Minority Report

Precogs
- Foresee and prevent crime
- Each precog contains multiple classifiers
Integration Mechanisms
- Combine predictions
Analytical Machinery
- Record, study, compare, and represent predictions in simple terms
- Single computer
Visual Symbols
- Explain the final predictions
- Graphical visualisations, numerical scores, and descriptive rules

14
The Crime Detection Method
15
Classifiers as Precogs

Precog One: Naive Bayesian Classifiers
- Statistical paradigm
- Simple and fast
- Problems with redundant and non-normally distributed attributes
Precog Two: C4.5 Classifiers
- Computer metaphor
- Explains patterns and is quite fast
- Scalability and efficiency issues
Precog Three: Backpropagation Classifiers
- Brain metaphor
- Long training times and extensive parameter tuning
For details on the advantages and disadvantages of each precog, and on how the problems were tackled, please refer to the thesis
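A precog is a set of classifiers trained with one algorithm. The sketch below is a minimal, from-scratch categorical naive Bayes in Python with add-one smoothing, standing in for a single classifier inside Precog One. It is not the implementation used in the thesis, and the class name, attribute values, and toy claims are hypothetical. Unlike the precogs described here, it takes categorical values directly rather than numerical inputs.

```python
import math
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    """Naive Bayes over categorical attributes with add-one (Laplace) smoothing."""

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.total = len(labels)
        self.value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
        self.attr_values = defaultdict(set)       # attribute index -> values seen
        for row, cls in zip(rows, labels):
            for i, value in enumerate(row):
                self.value_counts[(i, cls)][value] += 1
                self.attr_values[i].add(value)
        return self

    def log_posterior(self, row, cls):
        """log P(cls) + sum of log P(value | cls), up to a shared constant."""
        score = math.log(self.class_counts[cls] / self.total)
        for i, value in enumerate(row):
            seen = self.value_counts[(i, cls)][value]
            n_values = len(self.attr_values[i])
            score += math.log((seen + 1) / (self.class_counts[cls] + n_values))
        return score

    def predict(self, row):
        return max(self.class_counts, key=lambda cls: self.log_posterior(row, cls))


# Hypothetical toy claims: (age band, vehicle category) -> label
rows = [("21-25", "sport"), ("21-25", "sport"), ("21-25", "sedan"),
        ("41-50", "sedan"), ("41-50", "utility"), ("31-35", "sport")]
labels = ["fraud", "fraud", "legal", "legal", "legal", "legal"]

model = CategoricalNaiveBayes().fit(rows, labels)
print(model.predict(("21-25", "sport")))  # fraud
print(model.predict(("41-50", "sedan")))  # legal
```

Precog One would hold several such models, each trained on a different data partition. Precogs Two and Three would swap in C4.5 and backpropagation learners.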

16
Combining Output as Integration Mechanisms

Cross Validation
- Divides training data into eleven data partitions
- Each data partition is used for training, testing, and evaluation once
- Slightly better success rate
Bagging
- Unweighted majority voting on each example or instance
- Combines predictions from the same algorithm or from different algorithms
- Increases success rate
For details on how the technique works, please refer to the thesis

Classifier       1      2      3      4      5      6      7      8      9      10     11     Main Prediction
First instance   fraud  fraud  legal  fraud  legal  fraud  legal  fraud  fraud  legal  fraud  fraud
Second instance  fraud  fraud  fraud  legal  legal  fraud  legal  legal  legal  fraud  legal  legal
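The unweighted majority vote used by bagging can be sketched in a few lines of Python. The two example rows above are reproduced as input, and the function name is hypothetical.

```python
from collections import Counter


def main_prediction(votes):
    """Unweighted majority vote over one instance's classifier predictions."""
    label, _ = Counter(votes).most_common(1)[0]
    return label


# The two example rows from the slide: predictions of classifiers 1 to 11
row_1 = ["fraud", "fraud", "legal", "fraud", "legal", "fraud",
         "legal", "fraud", "fraud", "legal", "fraud"]
row_2 = ["fraud", "fraud", "fraud", "legal", "legal", "fraud",
         "legal", "legal", "legal", "fraud", "legal"]
print(main_prediction(row_1), main_prediction(row_2))  # fraud legal
```

With eleven voters and two classes a tie cannot occur, which is one practical reason for using an odd number of partitions.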
17
Combining Output as Integration Mechanisms

Stacking
- Meta-classifier
- Base classifiers present their predictions to the meta-classifier
- Determines the most reliable classifiers
For details on how the technique works, please refer to the thesis
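Stacking trains a meta-classifier on the base classifiers' predictions. The slides do not say which learner served as the meta-classifier. The sketch below therefore assumes a logistic regression, fitted by batch gradient descent on 0/1 base predictions from a held-out set. All data and function names are hypothetical. In this toy set the first base classifier is always right and the other two are uninformative, so the meta-classifier should give the first one the largest weight.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_meta_classifier(base_preds, labels, lr=0.5, epochs=5000):
    """Fit a logistic-regression meta-classifier by batch gradient descent.

    base_preds: one row per held-out instance, one 0/1 column per base classifier
    labels: 1 for fraud, 0 for legal
    Returns (weights, bias); a large positive weight marks a reliable classifier.
    """
    n, n_models = len(labels), len(base_preds[0])
    weights, bias = [0.0] * n_models, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_models, 0.0
        for x, y in zip(base_preds, labels):
            error = sigmoid(sum(w * xj for w, xj in zip(weights, x)) + bias) - y
            for j, xj in enumerate(x):
                grad_w[j] += error * xj
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def meta_predict(weights, bias, x):
    """Final prediction: 1 (fraud) if the meta-classifier's logit is positive."""
    return 1 if sum(w * xj for w, xj in zip(weights, x)) + bias > 0 else 0


# Hypothetical held-out predictions from three base classifiers (columns)
labels = [1, 1, 1, 1, 0, 0, 0, 0]
base_preds = [
    [1, 1, 1], [1, 0, 1], [1, 1, 0], [1, 0, 0],   # fraudulent instances
    [0, 1, 1], [0, 0, 1], [0, 1, 0], [0, 0, 0],   # legal instances
]
weights, bias = train_meta_classifier(base_preds, labels)
print([round(w, 2) for w in weights])
```

Ranking base classifiers by their learned weights is one simple way a meta-classifier can expose the most reliable classifiers.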

18
Combining Output as Integration Mechanisms

Stacking (2)

19
Cluster Detection as Analytical Machinery and Visualisation Techniques as Visual Symbols

Analytical Machinery: Self Organising Maps
- Clusters high dimensional elements into simpler, low dimensional maps
- Automatically groups similar instances together
- Does not specify an easy-to-understand model
Visual Symbols: Classification and Clustering Visualisations
- Classification visualisation: confusion matrix and naive Bayesian visualisation
- Clustering visualisation: column graph
For details on how the problems were tackled, please refer to the thesis
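The analytical machinery relies on a Self Organising Map. The sketch below is a minimal one-dimensional SOM with five nodes, echoing the five clusters of Experiment X, written from scratch in Python. The grid shape, decay schedules, seed, function names, and toy data points are all assumptions for illustration. A practical SOM would normally use a two-dimensional grid.

```python
import math
import random


def best_matching_unit(nodes, x):
    """Index of the node whose weight vector is closest to x (squared Euclidean)."""
    return min(range(len(nodes)),
               key=lambda i: sum((w - v) ** 2 for w, v in zip(nodes[i], x)))


def train_som(data, n_nodes=5, iterations=2000, lr0=0.5, radius0=2.0, seed=0):
    """Train a 1-D self-organising map and return its node weight vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    nodes = [[rng.random() for _ in range(dim)] for _ in range(n_nodes)]
    for t in range(iterations):
        frac = t / iterations
        lr = lr0 * (1.0 - frac)                    # learning rate decays towards 0
        radius = max(radius0 * (1.0 - frac), 0.5)  # neighbourhood shrinks over time
        x = data[rng.randrange(len(data))]
        winner = best_matching_unit(nodes, x)
        for i, node in enumerate(nodes):
            # Gaussian neighbourhood measured along the row of node indices
            h = math.exp(-((i - winner) ** 2) / (2 * radius ** 2))
            for d in range(dim):
                node[d] += lr * h * (x[d] - node[d])
    return nodes


# Hypothetical scaled attributes forming two well-separated groups of claims
data = [(0.1 + 0.01 * k, 0.1) for k in range(5)] + [(0.9, 0.9 - 0.01 * k) for k in range(5)]
nodes = train_som(data)
print(best_matching_unit(nodes, (0.1, 0.1)), best_matching_unit(nodes, (0.9, 0.9)))
```

Each instance is assigned to its best-matching node, so similar instances fall into the same cluster without any class labels being used.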

20
Steps in the Crime Detection Method
21
Implementing the Crime Detection System - Preparation Component
22
The Crime Detection System
23
The Crime Detection System Preparation Component

Problem Understanding
Determine investigation objectives
- Choose
- Explain
Assess situation
- Available tools
- Available data set
- Cost model
Determine data mining objectives
- Max hits/Min false alarms
Produce project plan
- Time
- Tools
For details, refer to the thesis
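The slides name a cost model and a "Max hits/Min false alarms" objective but do not give the cost formula. The sketch below assumes a simple model in which each hit saves the average claim cost minus the investigation cost, and each false alarm wastes one investigation. The cost figures, labels, and function names are hypothetical.

```python
def confusion_counts(actual, predicted, positive="fraud"):
    """Return (hits, false_alarms, misses, true_negatives) for one classifier."""
    pairs = list(zip(actual, predicted))
    hits = sum(a == positive and p == positive for a, p in pairs)
    false_alarms = sum(a != positive and p == positive for a, p in pairs)
    misses = sum(a == positive and p != positive for a, p in pairs)
    true_negatives = len(pairs) - hits - false_alarms - misses
    return hits, false_alarms, misses, true_negatives


def cost_savings(hits, false_alarms, avg_claim_cost, investigation_cost):
    """Hypothetical cost model: a hit saves the claim cost less the cost of
    investigating it; a false alarm wastes one investigation."""
    return hits * (avg_claim_cost - investigation_cost) - false_alarms * investigation_cost


actual = ["fraud", "fraud", "legal", "legal", "legal"]
predicted = ["fraud", "legal", "fraud", "legal", "legal"]
hits, false_alarms, misses, true_negatives = confusion_counts(actual, predicted)
print(cost_savings(hits, false_alarms, avg_claim_cost=2000, investigation_cost=200))  # 1600
```

Under a model like this, a classifier with many false alarms can produce negative savings even when its overall success rate is high.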

24
The Crime Detection System Preparation Component

Data Understanding
Describe data
- 11550 examples (1994 and 1995)
- 3870 instances (1996)
- 33 attributes
- 6% fraudulent
Explore data
- Claim trends by month
- Age of vehicles
- Age of policy holder
Verify data
- Good data quality
- Duplicate attribute, highly skewed attributes

25
The Crime Detection System Preparation Component

Data Preparation
Select data
- All attributes except one are retained for analysis
Clean data
- Missing values replaced
- Spelling mistakes corrected
Format data
- All characters converted to lowercase
- Underscore symbol

26
The Crime Detection System Preparation Component

Data Preparation
Construct data
- Derived attributes
- weeks_past
- is_holidayweek_claim
- age_price_wsum
- Numerical input
- 14 attributes scaled between 0 and 1
- 19 attributes represented by one-of-N or
binary encoding
For details, refer to the thesis
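The construction steps can be sketched in Python: deriving a weeks_past attribute, scaling numeric attributes into [0, 1], and one-of-N encoding categorical ones. The slides do not give the exact derivations. Here weeks_past is assumed to count whole weeks between the accident and the claim, and the function names and sample values are hypothetical.

```python
from datetime import date


def weeks_past(accident_date: date, claim_date: date) -> int:
    """Hypothetical derivation: whole weeks between the accident and the claim."""
    return (claim_date - accident_date).days // 7


def min_max_scale(values):
    """Scale numeric values linearly into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # a constant attribute carries no information
    return [(v - lo) / (hi - lo) for v in values]


def one_of_n(value, categories):
    """One-of-N encoding: a 1 in the position of the matching category."""
    return [1 if value == category else 0 for category in categories]


print(weeks_past(date(1996, 1, 5), date(1996, 3, 15)))               # 10
print(min_max_scale([2, 4, 6]))                                      # [0.0, 0.5, 1.0]
print(one_of_n("honda", ["acura", "chevrolet", "honda", "toyota"]))  # [0, 0, 1, 0]
```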

27
The Crime Detection System Preparation Component

Data Preparation
Partition data
- Data multiplication or oversampling
- For example, 50/50 distribution
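Data multiplication can be sketched as replicating the fraudulent examples until they reach a target share of the partition. This sketch assumes that a distribution such as 40/60 means 40% fraud and 60% legal. The function name and the 94/6 toy split, chosen to echo the roughly 6% fraud rate, are hypothetical.

```python
def oversample(legal, fraud, fraud_share):
    """Replicate fraudulent examples until they make up `fraud_share` of the set.

    Assumes 'x/y' distributions mean x% fraud and y% legal. If the target is
    smaller than the number of frauds, only the first `target` are kept.
    """
    target = round(len(legal) * fraud_share / (1.0 - fraud_share))
    replicated = [fraud[i % len(fraud)] for i in range(target)]
    return legal + replicated


legal = [("legal", i) for i in range(94)]
fraud = [("fraud", i) for i in range(6)]
for share in (0.5, 0.4, 0.3):
    sample = oversample(legal, fraud, share)
    n_fraud = sum(1 for label, _ in sample if label == "fraud")
    print(f"{share:.0%} target -> {n_fraud}/{len(sample)} fraudulent")
```

Replication adds no new information about fraud, so heavily oversampled partitions raise the overfitting risk mentioned under the technical problems.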

28
Implementing the Crime Detection System - Action Component
29
The Crime Detection System Action Component

Modelling
Generate experiment design (1)

Experiment Number  Technique or Algorithm  Data Distribution
I                  Naive Bayes             50/50
II                 Naive Bayes             40/60
III                Naive Bayes             30/70
IV                 Backpropagation         Determined by Experiments I, II, III
V                  C4.5                    Determined by Experiments I, II, III
VI                 Bagging                 -
VII                Stacking                -
VIII               Stacking and Bagging    -
IX                 Backpropagation         5/95
X                  Self Organising Map     5/95
30
The Crime Detection System Action Component

Modelling
Generate experiment design (2)

Test                            A   B   C   D   E   F   G   H   I   J   K   Overall Success Rate
Training Set Partition          1   2   3   4   5   6   7   8   9   10  11
Testing Set Partition           2   3   4   5   6   7   8   9   10  11  1
Evaluation Set Partition        3   4   5   6   7   8   9   10  11  1   2
Evaluating Success Rate         A   B   C   D   E   F   G   H   I   J   K   Average W
Bagging Predictions             A   B   C   D   E   F   G   H   I   J   K   Bagged X
Producing Classifier            1   2   3   4   5   6   7   8   9   10  11
Scoring Set Success Rate        A   B   C   D   E   F   G   H   I   J   K   Average Y
Bagging Main Score Predictions  A   B   C   D   E   F   G   H   I   J   K   Bagged Z
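The rotation in the experiment-design table can be generated programmatically. Test A uses partitions 1, 2, and 3 for training, testing, and evaluation, and each later test shifts every role by one partition, wrapping around after partition 11. A minimal Python sketch (the function name is hypothetical):

```python
import string


def rotation_schedule(k=11):
    """Map each test (A, B, ...) to its (training, testing, evaluation) partitions.

    Test number t (0-based) uses partitions t+1, t+2 and t+3, wrapping around
    after partition k, so every partition plays each role exactly once.
    """
    return {string.ascii_uppercase[t]: tuple((t + offset) % k + 1 for offset in range(3))
            for t in range(k)}


for test, (train, test_part, evaluation) in rotation_schedule().items():
    print(test, train, test_part, evaluation)
```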
31
The Crime Detection System Action Component

Modelling
Build models (1)
- Bagged X outperformed Average W
- Bagged Z performed marginally better than Average Y
- Experiment II achieved higher cost savings than I and III
- The 40/60 distribution is the most appropriate under the cost model
- Experiment V achieved higher cost savings than II and IV
- C4.5 is the best algorithm for the data set

32
The Crime Detection System Action Component

Modelling
Build models (2)
- Experiment VIII achieved slightly better cost
savings than V
- Combining models from different algorithms is better than using a single algorithm
- The top 15 classifiers from stacking consisted
of 9 C4.5, 4 backpropagation, and 2 naive
Bayesian classifiers
For details, refer to the thesis

33
The Crime Detection System Action Component

Modelling
Build models (3)
- No scores from the D2K software
- Experiment IX demonstrates that sorted scores and predefined thresholds result in focused investigations
- Satisfies Pareto's Law
- Rules did not provide insights
- Already known from domain knowledge and data attribute exploration
- Experiment X requires 5 clusters for visualisation
- age_of_policyholder
- weeks_past, is_holidayweek_claim
- make, accident_area, vehicle_category,
age_price_wsum, number_of_cars, base_policy
For details, refer to the thesis
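Ranking by sorted scores and cutting at a predefined threshold, as in Experiment IX, might look like the following Python sketch. The scores, claim identifiers, threshold, function name, and 20% cut-off (a nod to Pareto's Law) are hypothetical.

```python
def select_for_investigation(scored_claims, threshold=None, top_fraction=None):
    """Rank claims by score (highest risk first) and return those to investigate.

    scored_claims: list of (claim_id, score) pairs
    threshold: keep claims scoring at or above this value
    top_fraction: keep only this fraction of all claims from the top of the ranking
    """
    ranked = sorted(scored_claims, key=lambda pair: pair[1], reverse=True)
    if threshold is not None:
        ranked = [pair for pair in ranked if pair[1] >= threshold]
    if top_fraction is not None:
        ranked = ranked[:max(1, round(len(scored_claims) * top_fraction))]
    return [claim_id for claim_id, _ in ranked]


claims = [("c1", 0.10), ("c2", 0.85), ("c3", 0.40), ("c4", 0.92), ("c5", 0.05)]
print(select_for_investigation(claims, threshold=0.5))     # ['c4', 'c2']
print(select_for_investigation(claims, top_fraction=0.2))  # ['c4']
```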

34
The Crime Detection System Action Component

Modelling
Assess models (1)
- Training and score data sets too small
- Student's t-test with k-1 degrees of freedom
- McNemar's hypothesis test
For details, refer to the thesis

Rank  Experiment Number  Technique or Algorithm  Cost Savings  Overall Success Rate (%)  Percentage Saved (%)
1     VIII               Stacking and Bagging    167,069       60                        29.71
2     V                  C4.5 40/60              165,242       60                        29.38
3     VI                 Bagging                 127,454       64                        22.66
4     VII                Stacking                104,887       70                        18.65
5     II                 Naive Bayes 40/60       94,734        70                        16.85
6     IX                 Backpropagation 5/95    89,232        75                        15.87
7     IV                 Backpropagation 40/60   -6,488        92                        -1.15
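The two significance tests named in the assessment step can be computed with the Python standard library. The sketch assumes a paired t statistic over k per-partition differences and McNemar's test with continuity correction. The input values and function names are hypothetical. The p-value uses the identity that a chi-square variable with one degree of freedom exceeds x with probability erfc(sqrt(x/2)).

```python
import math
import statistics


def paired_t_statistic(diffs):
    """Paired t statistic for k per-partition differences (k - 1 degrees of freedom)."""
    k = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(k))


def mcnemar(b, c):
    """McNemar's test with continuity correction.

    b: instances the first classifier gets right and the second gets wrong
    c: instances the first classifier gets wrong and the second gets right
    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    stat = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(stat / 2))  # P(chi-square with 1 df > stat)
    return stat, p_value


# Hypothetical per-partition differences in success rate between two experiments
print(paired_t_statistic([1.0, 2.0, 3.0]))
# Hypothetical disagreement counts between two classifiers on a scoring set
print(mcnemar(10, 2))
```

The t statistic still has to be compared against a t table with k-1 degrees of freedom, because the standard library has no t distribution.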
35
The Crime Detection System Action Component

Modelling
Assess models (2)
- Clusters 1, 2, and 3 have higher occurrences
of fraud in 1996
- Clusters 1, 3, and 5 consist of several makes
of inexpensive cars
- Utility vehicles, rural areas, and liability
policies
- Clusters 2 and 4 contain claims submitted many
weeks after the accidents
- Toyota, sport cars, and multiple policies

Cluster  Number of Instances  Descriptive Cluster Profile
1        215                  Cluster 1 contains a large number of 21 to 25 year olds. The insured vehicles are relatively new.
2        166                  Cluster 2 also contains a large number of 21 to 25 year olds. The claims are usually reported 10 weeks past the accident. The insured vehicles are usually sport cars.
3        268                  Cluster 3 contains almost all of the 16 to 17 year old fraudsters. The insured vehicles are mainly Acuras, Chevrolets, and Hondas, usually utility cars.
4        103                  In Cluster 4, claims are usually reported 20 weeks past the accident. Almost all insured cars are Toyotas, and the fraudster has a high probability of getting 3 to 4 cars insured. Claims are unlikely to be submitted during holiday periods.
5        171                  Cluster 5 consists mainly of Fords, Mazdas, and Pontiacs. Rural accidents are more likely, and the base policy type is likely to be liability.
36
The Crime Detection System Action Component

Modelling
Assess models (3)
- Statistical evaluation of descriptive cluster profiles
- Cluster 4
- 3121 Toyota car claims, 6%, or 187, fraudulent
- 2148 Toyota sedan car claims; expect 6%, or 129, to be fraudulent, with a standard deviation of 10
- Actual: 171 fraudulent Toyota sedan car claims, a z-score of 3.8 standard deviations
- This is an insight because it is statistically reliable, not known previously, and actionable

Cluster  Group               Claims  No. and % of Fraud  Sub-Group                        Claims  Expected No. of Fraud  Actual No. of Fraud  z-Score
1        All claims          15420   923 (6%)            21 to 25 year olds               108     2                      16                   5
2        Sport cars          5358    84 (1.6%)           21 to 25 year olds, sport cars   32      1                      10                   9.5
3        16 to 17 year olds  320     31 (9.7%)           Honda, 16 to 17 year olds        31      3                      31                   9.3
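The z-scores above compare a sub-group's fraud count with what the parent group's fraud rate would predict. The slides do not state the exact formula. Assuming the normal approximation to the binomial, the Cluster 4 figures reproduce the quoted z-score of 3.8. The function name is hypothetical.

```python
import math


def fraud_z_score(group_claims, group_fraud, subgroup_claims, subgroup_fraud):
    """z-score of a sub-group's fraud count against the group's fraud rate,
    using the normal approximation to the binomial (an assumption)."""
    rate = group_fraud / group_claims
    expected = subgroup_claims * rate
    std_dev = math.sqrt(subgroup_claims * rate * (1.0 - rate))
    return (subgroup_fraud - expected) / std_dev


# Cluster 4 figures from the slides: Toyota claims vs. Toyota sedan claims
z = fraud_z_score(3121, 187, 2148, 171)
print(round(z, 1))  # 3.8
```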
37
The Crime Detection System Action Component

Modelling
Assess models (4)
- Append main predictions from 3 algorithms and
final predictions from bagging to 615 fraudulent
instances
- 25 cannot be detected by any algorithm; highest lift in Clusters 1 and 2
- All can be detected by at least 1 algorithm in
Cluster 3
- Not all fraudulent instances can be detected
- Domain knowledge, cluster detection, and
statistics offer explanation
- 101 cannot be detected by 2 algorithms
- Weakness of bagging
- Other alternatives
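Appending each algorithm's main prediction and the bagged final prediction to the fraudulent instances, then counting how many columns flagged each instance, shows which frauds no algorithm detects. A minimal Python sketch with a hypothetical function name, column names, and labels:

```python
def detection_coverage(predictions):
    """Tally how many prediction columns flagged each fraudulent instance.

    predictions: maps a column name (an algorithm's main prediction or the
    bagged final prediction) to one label per fraudulent instance.
    Returns {number of columns that said "fraud": number of instances}.
    """
    columns = list(predictions.values())
    tally = {}
    for labels in zip(*columns):  # one tuple of labels per instance
        detected = sum(label == "fraud" for label in labels)
        tally[detected] = tally.get(detected, 0) + 1
    return tally


# Hypothetical predictions for four fraudulent instances
predictions = {
    "naive_bayes": ["fraud", "legal", "legal", "fraud"],
    "c45":         ["fraud", "legal", "fraud", "fraud"],
    "backprop":    ["legal", "legal", "fraud", "fraud"],
    "bagged":      ["fraud", "legal", "fraud", "fraud"],
}
print(detection_coverage(predictions))  # {3: 2, 0: 1, 4: 1}
```

The entry for 0 counts the frauds that every column missed. Joining those instances back to their SOM clusters is what allows cluster-level explanations of the missed frauds.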

38
The Crime Detection System Action Component

Evaluation
Evaluate results
- Experiment VIII generates the best predictions, with cost savings of about 168,000. This is almost 30% of the total cost savings possible
- The most statistically reliable insight is the knowledge of 21 to 25 year olds who drive sport cars
Review process
- Unsupervised learning to derive clusters first
- More training data partitions
- More skewed distributions
- Cost model too simplistic
- Probabilistic Neural Networks

39
The Crime Detection System Action Component

Deployment
Plan deployment
- Manage geographically distributed databases
using distributed data mining
- Take time into account
Plan monitoring and maintenance
- Determined by rate of change in external
environment and organisational requirements
- Rebuild models when cost savings are below a
certain percentage of maximum cost savings
possible

40
Contributions and Recommendations
41
Contributions

New Crime Detection Method
Crime Detection System
Cost Model
Visualisations
Statistics
Score-based Feature
Extensive Literature Review
In-depth Analysis of Algorithms

42
Recommendations - Technical Problems

Imperfect data
- Statistical evaluation and confidence intervals
- Preparation component of crime detection system
- Derived attributes
- Cross validation
Highly skewed data
- Partitioned data with most appropriate distribution
- Cost model
Black-box predictions
- Classification and clustering visualisation
- Sorted scores and predefined thresholds, rules

43
Recommendations - Practical Problems

Lack of domain knowledge
- Action component of crime detection system
- Extensive literature review
Great variety of fraud scenarios over time
- SOM
- Crime detection method
- Choice of algorithms
Assessing data mining potential
- Quality and quantity of data
- Cost model
- z-scores

44
Transforming Minority Report from Science Fiction to Science Fact
INVESTIGATIVE DATA MINING IN FRAUD DETECTION

1 INTRODUCTION
The world is overwhelmed with terabytes of data, but there are only a few effective and efficient ways to analyse and interpret it.
The purpose of the research is to simulate the Precrime System from the science fiction short story Minority Report, using data mining methods and techniques to extract insights from enormous amounts of data to detect white-collar crime.
The application is in uncovering fraudulent claims in automobile insurance.
The objectives are to overcome the technical and practical problems of data mining in fraud detection.

3 RESULTS ON AUTOMOBILE INSURANCE DATA
Through the use of integration mechanisms, the highest cost savings are achieved.
The analytical machinery facilitated the interesting discovery of 21 to 25 year old fraudsters who used sport cars as their crime tool.

4 DISCUSSION
The black-box approach of the precogs is transformed into a semi-transparent approach by using analytical machinery and visual symbols to analyse and interpret the predictions.
Precogs can be shared between organisations to increase the accuracy of the predictions, without violating competitive and legal requirements.
The analytical machinery transforms multidimensional data into two-dimensional clusters of similar data, enabling the data analyst to easily differentiate the groups of fraud. It also allows the data analyst to assess the algorithms' ability to cope with evolving fraud.
The crime detection method provides a flexible step-by-step approach to generating predictions from any three algorithms, and uses some form of integration mechanism to increase the likelihood of correct final predictions.

2 THE CRIME DETECTION METHOD

Precogs, or precognitive elements, are entities which have the knowledge to predict that something will happen. Figure 1 uses three precogs to foresee and prevent crime by stopping potentially guilty criminals.
Each precog contains multiple classification models, or classifiers, trained with one data mining technique to extrapolate the future.
The three precogs differ from each other because they are trained by different data mining algorithms. For example, the first, second, and third precogs are trained using the naive Bayesian, C4.5, and backpropagation algorithms respectively.
The precogs require numerical inputs of past examples to output corresponding predictions for new instances.

Integration mechanisms are needed to combine the precogs' outputs. As each precog outputs its many predictions for each instance, all are counted and the class with the highest tally is chosen as the main prediction.
Figure 1 shows that the main predictions can be combined either by majority count (bagging) or by feeding them back into one of the precogs (stacking) to derive a final prediction.

5 CONCLUSION
Other possible applications of this crime detection method are:
Anti-terrorism
Burglary
Customs declaration fraud
Drug-related homicides
Drug smuggling
Government financial transactions
Sexual offences

Figure 1: Predictions using Precogs, Analytical Machinery, and Visual Symbols

Analytical Machinery, or cluster detection, records, studies, compares, and represents the precogs' predictions in easily understood terms.
The analytical machinery is represented by the Self Organising Map (SOM), which clusters similar data into groups.
Figure 1 demonstrates that main predictions and final predictions are appended to the clustered data to determine the fraud characteristics which cannot be detected, and that the most important attributes are selected for visualisation.

Scores are numbers within a specified range which indicate the relative risk that a particular data instance may be fraudulent; they are used to rank instances.
Rules are expressions of the form Body -> Head, where Body describes the conditions under which the rule is generated and Head is the class label.
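A rule of the form Body -> Head can be represented as a set of attribute conditions plus a class label. This Python sketch is illustrative only. The class name, the attribute values, and the example rule (loosely echoing the 21 to 25 year old sport-car insight) are hypothetical and are not rules produced in the thesis.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """A classification rule of the form Body -> Head."""
    body: dict  # attribute -> required value; every condition must hold
    head: str   # class label predicted when the body holds

    def matches(self, instance: dict) -> bool:
        return all(instance.get(attr) == value for attr, value in self.body.items())

    def __str__(self) -> str:
        conditions = " AND ".join(f"{attr} = {value}" for attr, value in self.body.items())
        return f"{conditions} -> {self.head}"


# Hypothetical rule and claim, for illustration only
rule = Rule({"age_of_policyholder": "21 to 25", "vehicle_category": "sport"}, "fraud")
claim = {"age_of_policyholder": "21 to 25", "vehicle_category": "sport", "make": "toyota"}
print(rule)                 # age_of_policyholder = 21 to 25 AND vehicle_category = sport -> fraud
print(rule.matches(claim))  # True
```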

Visual Symbols, or visualisations, integrate human perceptual abilities into the data analysis process by presenting the data in some visual and interactive form.
The naive Bayesian and C4.5 visualisations facilitate analysis of classifier predictions and performance, and column graphs aid the interpretation of clustering results.