Investigative Data Mining in Fraud Detection - PowerPoint PPT Presentation

Slides: 46
Provided by: Clift151

1
Investigative Data Mining in Fraud Detection
2
Overview (1)
  • Investigative Data Mining and Problems in Fraud
    Detection
  • Definitions
  • Technical and Practical Problems
  • Existing Fraud Detection Methods
  • Widely used methods
  • The Crime Detection Method
  • Comparisons with Minority Report
  • Classifiers as Precogs
  • Combining Output as Integration Mechanisms
  • Cluster Detection as Analytical Machinery
  • Visualisation Techniques as Visual Symbols

3
Overview (2)
  • Implementing the Crime Detection System
    Preparation Component
  • Investigation objectives
  • Collected data
  • Preparation of collected data to achieve
    objectives
  • Implementing the Crime Detection System Action
    Component
  • Which experiments generate best predictions?
  • Which is the best insight?
  • How can the new models and insights be deployed
    within an organisation?
  • Contributions and Recommendations
  • Significant research contributions
  • Proposed solutions

4
Literature and Acknowledgements
Dick P K (1956) Minority Report, Orion Publishing
Group, London, Great Britain.
Abagnale F (2001) The Art of the Steal: How to
Protect Yourself and Your Business from Fraud,
Transworld Publishers, NSW, Australia.
Mena J (2003) Investigative Data Mining for
Security and Criminal Detection, Butterworth
Heinemann, MA, USA.
Elkan C (2001) Magical Thinking in Data Mining:
Lessons From CoIL Challenge 2000, Department of
Computer Science and Engineering, University of
California, San Diego, USA.
Prodromidis A (1999) Management of Intelligent
Learning Agents in Distributed Data Mining
Systems, Unpublished PhD thesis, Columbia
University, USA.
Berry M and Linoff G (2000) Mastering Data Mining:
The Art and Science of Customer Relationship
Management, John Wiley and Sons, New York, USA.
Han J and Kamber M (2001) Data Mining: Concepts
and Techniques, Morgan Kaufmann Publishers.
Witten I and Frank E (1999) Data Mining: Practical
Machine Learning Tools and Techniques with Java,
Morgan Kaufmann Publishers, CA, USA.
5
Investigative Data Mining and Problems in Fraud
Detection
6
Investigative Data Mining - Definitions
  • Investigative
  • Official attempt to extract some truth, or
    insights, about criminal activity from data
  • Data Mining
  • Process of discovering, extracting, and analysing
    meaningful patterns, structure, models, and
    rules from large quantities of data.
  • Spans several research areas such as databases,
    machine learning, neural networks, data
    visualisation, statistics, and distributed data
    mining.
  • Investigative Data Mining
  • Applied to law enforcement,
  • Industry, and
  • Private databases

7
Fraud Detection - Definitions
  • Fraud
  • Criminal deception, use of false representations
    to obtain an unjust advantage, or to injure the
    rights and interests of another
  • Diversity of Fraud
  • Against organisations, governments, and
    individuals
  • Committed by external parties, internal
    management, and non-management employees
  • Caused by customers, service providers, and
    suppliers
  • Prevalent in insurance, credit card, and
    telecommunications
  • Most common in automobile, travel, and household
    contents
  • Cost of Fraud
  • Automobile insurance fraud alone costs AUD 32
    million for nine Australian companies

8
Fraud Detection Problems - Technical
  • Imperfect data
  • Usually not collected for data mining
  • Inaccurate, incomplete, and irrelevant data
    attributes
  • Highly skewed data
  • Many more legitimate than fraudulent examples
  • Higher chances of overfitting
  • Black-box predictions
  • Numerical outputs incomprehensible to people

9
Fraud Detection Problems - Practical
  • Lack of domain knowledge
  • Important attributes, likely relationships, and
    known patterns
  • Three types of fraud offenders and their modus
    operandi
  • Great variety of fraud scenarios over time
  • Soft fraud: Cost of investigation > Cost of
    fraud
  • Hard fraud: Circumvents anti-fraud measures
  • Assessing data mining potential
  • Predictive accuracy is useless for skewed data
    sets
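
To illustrate why predictive accuracy misleads on skewed data, the sketch below scores a trivial classifier that labels every claim as legal. The 6% fraud rate mirrors the data set described later in the slides; the function name and the 1,000-claim sample are illustrative assumptions, not part of the original work.

```python
def accuracy(actual, predicted):
    """Fraction of instances whose predicted label equals the actual label."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)

# Hypothetical sample: 1,000 claims, 6% of them fraudulent.
actual = ["fraud"] * 60 + ["legal"] * 940
# A useless classifier that never flags anything.
always_legal = ["legal"] * len(actual)

acc = accuracy(actual, always_legal)
# Count the frauds that were actually predicted as fraud.
hits = sum(1 for a, p in zip(actual, always_legal) if a == p == "fraud")
print(f"accuracy = {acc:.2f}, frauds caught = {hits}")
```

The classifier scores 94% accuracy while catching no fraud at all, which is why the slides later rank experiments by cost savings instead.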

10
Existing Fraud Detection Methods
11
Widely Used Methods in Fraud Detection
  • Insurance Fraud
  • Cluster detection -> decision tree induction ->
    domain knowledge, statistical summaries, and
    visualisations
  • Special case: neural network classification ->
    cluster detection
  • Credit Card Fraud
  • Decision tree and naive Bayesian classification
    -> stacking
  • Telecommunications Fraud
  • Cluster detection -> scores and rules

12
The Crime Detection Method
13
Comparisons with Minority Report
  • Precogs
  • Foresee and prevent crime
  • Each precog contains multiple classifiers
  • Integration Mechanisms
  • Combine predictions
  • Analytical Machinery
  • Record, study, compare, and represent predictions
    in simple terms
  • Single computer
  • Visual Symbols
  • Explain the final predictions
  • Graphical visualisations, numerical scores, and
    descriptive rules

14
The Crime Detection Method
15
Classifiers as Precogs
  • Precog One: Naive Bayesian Classifiers
  • Statistical paradigm
  • Simple and fast
  • Redundant and not normally distributed
    attributes
  • Precog Two: C4.5 Classifiers
  • Computer metaphor
  • Explain patterns and quite fast
  • Scalability and efficiency issues
  • Precog Three: Backpropagation Classifiers
  • Brain metaphor
  • Long training times and extensive parameter
    tuning
  • Advantages and disadvantages
  • For details on how the problems were tackled,
    please refer to the thesis

16
Combining Output as Integration Mechanisms
  • Cross Validation
  • Divides training data into eleven data partitions
  • Each data partition used for training, testing,
    and evaluation once
  • Slightly better success rate
  • Bagging
  • Unweighted majority voting on each example or
    instance
  • Combine predictions from same algorithm or
    different algorithms
  • Increases success rate
  • For details on how the technique works, please
    refer to the thesis

1     | 2     | 3     | 4     | 5     | 6     | 7     | 8     | 9     | 10    | 11    | Main Prediction
fraud | fraud | legal | fraud | legal | fraud | legal | fraud | fraud | legal | fraud | fraud
fraud | fraud | fraud | legal | legal | fraud | legal | legal | legal | fraud | legal | legal
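
The unweighted majority vote in the table above can be sketched in a few lines. This is a minimal illustration using the two rows shown on the slide; the function name is an assumption, and the thesis may implement the vote differently.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the class predicted most often (unweighted majority voting)."""
    return Counter(predictions).most_common(1)[0][0]

# The eleven classifier predictions for each of the two instances on the slide.
row_1 = "fraud fraud legal fraud legal fraud legal fraud fraud legal fraud".split()
row_2 = "fraud fraud fraud legal legal fraud legal legal legal fraud legal".split()

print(majority_vote(row_1))  # 7 fraud votes against 4 legal votes
print(majority_vote(row_2))  # 5 fraud votes against 6 legal votes
```

With eleven voters and two classes a tie cannot occur, so no tie-breaking rule is needed here.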
17
Combining Output as Integration Mechanisms
  • Stacking
  • Meta-classifier
  • Base classifiers present predictions to
    meta-classifier
  • Determines the most reliable classifiers
  • For details on how the technique works, please
    refer to the thesis
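
As a rough sketch of the stacking idea, the code below trains a very simple meta-classifier on the predictions of three base classifiers. The lookup-table meta-learner, the function names, and the held-out predictions are all assumptions made for illustration; the slides do not say which meta-classifier the thesis uses.

```python
from collections import Counter, defaultdict

def train_meta(base_predictions, actual):
    """Learn, for each combination of base predictions, the most common true class."""
    table = defaultdict(Counter)
    for preds, label in zip(base_predictions, actual):
        table[tuple(preds)][label] += 1
    return {combo: counts.most_common(1)[0][0] for combo, counts in table.items()}

def predict_meta(meta, preds, default="legal"):
    """Final prediction for one instance; unseen combinations fall back to a default."""
    return meta.get(tuple(preds), default)

# Hypothetical held-out predictions from three base classifiers
# (naive Bayes, C4.5, backpropagation), one tuple per instance.
held_out = [
    ("fraud", "fraud", "legal"),
    ("legal", "fraud", "legal"),
    ("legal", "fraud", "fraud"),
    ("fraud", "legal", "legal"),
    ("legal", "legal", "legal"),
    ("fraud", "legal", "fraud"),
]
actual = ["fraud", "fraud", "fraud", "legal", "legal", "legal"]

meta = train_meta(held_out, actual)
print(predict_meta(meta, ("legal", "fraud", "legal")))
```

In this made-up data the second classifier is always right, so the meta-classifier follows it even when a majority vote would disagree. That is the sense in which stacking "determines the most reliable classifiers".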

18
Combining Output as Integration Mechanisms
  • Stacking (2)

19
Cluster Detection as Analytical Machinery;
Visualisation Techniques as Visual Symbols
  • Analytical Machinery: Self Organising Maps
  • Clusters high dimensional elements into simpler,
    low dimensional maps
  • Automatically groups similar instances together
  • Do not specify an easy-to-understand model
  • Visual Symbols: Classification and Clustering
    Visualisations
  • Classification visualisation - confusion matrix
  • - naive Bayesian visualisation
  • Clustering visualisation - column graph
  • For details on how the problems were tackled,
    please refer to the thesis
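
To show how a Self Organising Map groups similar instances, here is a minimal one-dimensional SOM in plain Python. The map size, learning rate, neighbourhood width, initialisation scheme, and the six two-attribute instances are all assumptions for illustration; the experiments in the slides use a separate tool and a five-cluster map.

```python
import math

def best_unit(weights, x):
    """Index of the unit whose weights are closest to x (squared Euclidean distance)."""
    dists = [sum((wk - xk) ** 2 for wk, xk in zip(w, x)) for w in weights]
    return dists.index(min(dists))

def train_som(data, n_units=2, epochs=20, lr0=0.5, sigma=0.5):
    """Train a one-dimensional SOM (n_units >= 2) and return the unit weight vectors."""
    # Assumption: units are initialised from evenly spaced training instances.
    step = (len(data) - 1) / (n_units - 1)
    weights = [list(data[round(i * step)]) for i in range(n_units)]
    for epoch in range(epochs):
        lr = lr0 * (1 - epoch / epochs)  # learning rate decays linearly
        for x in data:
            bmu = best_unit(weights, x)
            for j, w in enumerate(weights):
                # Units near the winner on the map also move towards x.
                h = math.exp(-((j - bmu) ** 2) / (2 * sigma ** 2))
                for k in range(len(w)):
                    w[k] += lr * h * (x[k] - w[k])
    return weights

# Hypothetical instances with two attributes already scaled between 0 and 1.
claims = [(0.0, 0.1), (0.1, 0.0), (0.05, 0.05),
          (0.9, 1.0), (1.0, 0.9), (0.95, 0.95)]

weights = train_som(claims)
clusters = [best_unit(weights, x) for x in claims]
print(clusters)
```

Each instance is assigned to the map unit closest to it, so the first three and last three instances end up in different clusters. The trained weights themselves are not an easy-to-understand model, which matches the limitation noted on the slide.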

20
Steps in the Crime Detection Method
21
Implementing the Crime Detection System:
Preparation Component
22
The Crime Detection System
23
The Crime Detection System Preparation Component
  • Problem Understanding
  • Determine investigation objectives
  • - Choose
  • - Explain
  • Assess situation
  • - Available tools
  • - Available data set
  • - Cost model
  • Determine data mining objectives
  • - Max hits/Min false alarms
  • Produce project plan
  • - Time
  • - Tools
  • For details, refer to the thesis

24
The Crime Detection System Preparation Component
  • Data Understanding
  • Describe data
  • - 11550 examples (1994 and 1995)
  • - 3870 instances (1996)
  • - 33 attributes
  • - 6% fraudulent
  • Explore data
  • - Claim trends by month
  • - Age of vehicles
  • - Age of policy holder
  • Verify data
  • - Good data quality
  • - Duplicate attribute, highly skewed attributes

25
The Crime Detection System Preparation Component
  • Data Preparation
  • Select data
  • - All, except one attribute, are retained for
    analysis
  • Clean data
  • - Missing values replaced
  • - Spelling mistakes corrected
  • Format data
  • - All characters converted to lowercase
  • - Underscore symbol

26
The Crime Detection System Preparation Component
  • Data Preparation
  • Construct data
  • - Derived attributes
  • - weeks_past
  • - is_holidayweek_claim
  • - age_price_wsum
  • - Numerical input
  • - 14 attributes scaled between 0 and 1
  • - 19 attributes represented by one-of-N or
    binary encoding
  • For details, refer to the thesis
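
The two numerical-input steps on this slide, scaling between 0 and 1 and one-of-N encoding, can be sketched as follows. The helper names and sample values are hypothetical; only the attribute names weeks_past and accident_area come from the slides.

```python
def min_max_scale(values):
    """Rescale numbers linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant attribute: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def one_of_n(values):
    """Encode each categorical value as a 0/1 vector containing a single 1."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

weeks_past = [0, 5, 10, 20]                   # hypothetical values
accident_area = ["urban", "rural", "urban"]   # hypothetical values

print(min_max_scale(weeks_past))   # [0.0, 0.25, 0.5, 1.0]
print(one_of_n(accident_area))     # [[0, 1], [1, 0], [0, 1]]
```

Categories are sorted alphabetically here so that the column order is reproducible; any fixed order would serve.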

27
The Crime Detection System Preparation Component
  • Data Preparation
  • Partition data
  • - Data multiplication or oversampling
  • - For example, 50/50 distribution
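
One simple way to reach a target distribution is to replicate the fraudulent examples. The sketch below assumes that "50/50" means 50% fraud and 50% legal and that oversampling is done by plain duplication; the function name and the 94/6 sample are illustrative, and the thesis may partition the data differently.

```python
def oversample(legal, fraud, fraud_share=0.5):
    """Replicate fraud examples until they make up about fraud_share of the partition."""
    # Number of fraud rows needed so that fraud / (fraud + legal) is about fraud_share.
    target = round(len(legal) * fraud_share / (1 - fraud_share))
    copies, remainder = divmod(target, len(fraud))
    return legal + fraud * copies + fraud[:remainder]

legal = [("legal", i) for i in range(94)]   # hypothetical 94 legal claims
fraud = [("fraud", i) for i in range(6)]    # hypothetical 6 fraudulent claims

balanced = oversample(legal, fraud, fraud_share=0.5)
n_fraud = sum(1 for label, _ in balanced if label == "fraud")
print(len(balanced), n_fraud)
```

Calling `oversample(legal, fraud, fraud_share=0.4)` gives a 40/60 partition in the same way.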

28
Implementing the Crime Detection System: Action
Component
29
The Crime Detection System Action Component
  • Modelling
  • Generate experiment design (1)

Experiment Number | Technique or Algorithm | Data Distribution
I                 | Naive Bayes            | 50/50
II                | Naive Bayes            | 40/60
III               | Naive Bayes            | 30/70
IV                | Backpropagation        | Determined by Experiments I, II, III
V                 | C4.5                   | Determined by Experiments I, II, III
VI                | Bagging                | -
VII               | Stacking               | -
VIII              | Stacking and Bagging   | -
IX                | Backpropagation        | 5/95
X                 | Self Organising Map    | 5/95
30
The Crime Detection System Action Component
  • Modelling
  • Generate experiment design (2)

Test                           | A | B | C | D | E | F | G | H  | I  | J  | K  | Overall Success Rate
Training Set Partition         | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8  | 9  | 10 | 11 |
Testing Set Partition          | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9  | 10 | 11 | 1  |
Evaluation Set Partition       | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 1  | 2  |
Evaluating Success Rate        | A | B | C | D | E | F | G | H  | I  | J  | K  | Average W
Bagging Predictions            | A | B | C | D | E | F | G | H  | I  | J  | K  | Bagged X
Producing Classifier           | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8  | 9  | 10 | 11 |
Scoring Set Success Rate       | A | B | C | D | E | F | G | H  | I  | J  | K  | Average Y
Bagging Main Score Predictions | A | B | C | D | E | F | G | H  | I  | J  | K  | Bagged Z
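
The rotation of training, testing, and evaluation partitions in the table above follows a simple modular pattern. The sketch below regenerates the first three rows; the function name is an assumption.

```python
def rotation_schedule(n_partitions=11):
    """Return (test_name, training, testing, evaluation) partition numbers per test."""
    schedule = []
    for i in range(n_partitions):
        name = chr(ord("A") + i)                    # tests are labelled A, B, C, ...
        training = i + 1
        testing = (i + 1) % n_partitions + 1        # next partition, wrapping round
        evaluation = (i + 2) % n_partitions + 1     # the one after that
        schedule.append((name, training, testing, evaluation))
    return schedule

for row in rotation_schedule():
    print(row)
```

Each partition is used exactly once for training, once for testing, and once for evaluation, as the cross validation slide describes.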
31
The Crime Detection System Action Component
  • Modelling
  • Build models (1)
  • - Bagged X outperformed Averaged W
  • - Bagged Z performed marginally better than
    Averaged Y
  • - Experiment II achieved higher cost savings
    than I and III
  • - 40/60 distribution most appropriate under the
    cost model
  • - Experiment V achieved higher cost savings
    than II and IV
  • - C4.5 algorithm is the best algorithm for the
    data set

32
The Crime Detection System Action Component
  • Modelling
  • Build models (2)
  • - Experiment VIII achieved slightly better cost
    savings than V
  • - Combining models from different algorithms is
    better than using a single algorithm
  • - The top 15 classifiers from stacking consisted
    of 9 C4.5, 4 backpropagation, and 2 naive
    Bayesian classifiers
  • For details, refer to the thesis

33
The Crime Detection System Action Component
  • Modelling
  • Build models (3)
  • - No scores from D2K software
  • - Experiment IX demonstrates that sorted scores
    and predefined thresholds result in focused
    investigations
  • - Satisfies Pareto's Law
  • - Rules did not provide insights
  • - Already in domain knowledge and data attribute
    exploration
  • - Experiment X requires 5 clusters for
    visualisation
  • - age_of_policyholder
  • - weeks_past, is_holidayweek_claim
  • - make, accident_area, vehicle_category,
    age_price_wsum, number_of_cars, base_policy
  • For details, refer to the thesis
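
The idea of sorted scores with a predefined threshold can be sketched as below. The claim identifiers, scores, and threshold are hypothetical; the slides do not give the actual score range or cut-off.

```python
def rank_for_investigation(scores, threshold):
    """Return (claim_id, score) pairs at or above threshold, highest score first."""
    flagged = [(cid, s) for cid, s in scores.items() if s >= threshold]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)

# Hypothetical fraud scores between 0 and 1 for four claims.
scores = {"c1": 0.12, "c2": 0.91, "c3": 0.47, "c4": 0.78}

print(rank_for_investigation(scores, threshold=0.5))
```

Investigators then work down the ranked list, so effort goes to the small share of claims with the highest risk.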

34
The Crime Detection System Action Component
  • Modelling
  • Assess models (1)
  • - Training and score data sets too small
  • - Student's t-test with k-1 degrees of freedom
  • - McNemar's hypothesis test
  • For details, refer to the thesis

Rank | Experiment Number | Technique or Algorithm | Cost Savings | Overall Success Rate (%) | Percentage Saved (%)
1    | VIII              | Stacking and Bagging   | 167,069      | 60                       | 29.71
2    | V                 | C4.5 40/60             | 165,242      | 60                       | 29.38
3    | VI                | Bagging                | 127,454      | 64                       | 22.66
4    | VII               | Stacking               | 104,887      | 70                       | 18.65
5    | II                | Naive Bayes 40/60      | 94,734       | 70                       | 16.85
6    | IX                | Backpropagation 5/95   | 89,232       | 75                       | 15.87
7    | IV                | Backpropagation 40/60  | -6,488       | 92                       | -1.15
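
For readers unfamiliar with McNemar's test mentioned on this slide, the sketch below computes the usual chi-square statistic with continuity correction for two classifiers evaluated on the same instances. The function name and the twenty-instance sample are hypothetical; the slides do not report the actual counts.

```python
def mcnemar_statistic(actual, pred_a, pred_b):
    """McNemar's chi-square statistic (with continuity correction) for two classifiers."""
    # b: instances only classifier A gets wrong; c: instances only B gets wrong.
    b = sum(1 for y, a, p in zip(actual, pred_a, pred_b) if a != y and p == y)
    c = sum(1 for y, a, p in zip(actual, pred_a, pred_b) if a == y and p != y)
    if b + c == 0:
        return 0.0
    return (abs(b - c) - 1) ** 2 / (b + c)

# Hypothetical labels: 1 = fraud. Classifier A errs on the first 12 instances,
# classifier B errs on 2 other instances.
actual = [1] * 20
pred_a = [0] * 12 + [1] * 8
pred_b = [1] * 12 + [0] * 2 + [1] * 6

stat = mcnemar_statistic(actual, pred_a, pred_b)
print(round(stat, 2))
```

A statistic above 3.841 (chi-square with one degree of freedom at the 5% level) suggests the two classifiers' error rates genuinely differ.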
35
The Crime Detection System Action Component
  • Modelling
  • Assess models (2)
  • - Clusters 1, 2, and 3 have higher occurrences
    of fraud in 1996
  • - Clusters 1, 3, and 5 consist of several makes
    of inexpensive cars
  • - Utility vehicles, rural areas, and liability
    policies
  • - Clusters 2 and 4 contain claims submitted many
    weeks after the accidents
  • - Toyota, sport cars, and multiple policies

Cluster | Number of instances | Descriptive Cluster Profile
1 | 215 | Cluster 1 contains a large number of 21 to 25 year olds. The insured vehicles are relatively new.
2 | 166 | Cluster 2 also contains a large number of 21 to 25 year olds. The claims are usually reported 10 weeks past the accident. The insured vehicles are usually sport cars.
3 | 268 | Cluster 3 has almost all 16 to 17 year old fraudsters. The insured vehicles are mainly Acuras, Chevrolets, and Hondas. The insured vehicles are usually utility cars.
4 | 103 | Claims in Cluster 4 are usually reported 20 weeks past the accident. Almost all insured cars are Toyotas and the fraudster has a high probability of getting 3 to 4 cars insured. Claims are unlikely to be submitted during holiday periods.
5 | 171 | Cluster 5 consists of mainly Fords, Mazdas, and Pontiacs. Higher chances of rural accidents, and the base policy type is likely to be liability.
36
The Crime Detection System Action Component
  • Modelling
  • Assess models (3)
  • - Statistical evaluation of descriptive cluster
    profiles
  • - Cluster 4
  • - 3121 Toyota car claims, 6% or 187 fraudulent
  • - 2148 Toyota sedan car claims, expect 6% or 129
    to be fraudulent with a standard deviation of 10
  • - Actual 171 fraudulent Toyota sedan car claims,
    z-score of 3.8 standard deviations
  • - This is an insight because it is statistically
    reliable, not known previously, and actionable

Cluster | Group              | Claims | No. and % of Fraud | Sub-Group                      | Claims | Expected No. of Fraud | Actual No. of Fraud | z-Score
1       | All claims         | 15420  | 923 (6%)           | 21 to 25 year olds             | 108    | 2                     | 16                  | 5
2       | Sport cars         | 5358   | 84 (1.6%)          | 21 to 25 year olds, Sport cars | 32     | 1                     | 10                  | 9.5
3       | 16 to 17 year olds | 320    | 31 (9.7%)          | Honda, 16 to 17 year olds      | 31     | 3                     | 31                  | 9.3
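
The Cluster 4 figures can be approximately reproduced with a z-score against a binomial expectation. The slides do not state how the standard deviation was derived, so the binomial formula below is an assumption, and its standard deviation (about 11) need not match the rounded figure on the slide.

```python
import math

def fraud_z_score(n_claims, fraud_rate, actual_fraud):
    """Expected frauds, standard deviation, and z-score under a binomial model."""
    expected = n_claims * fraud_rate
    std_dev = math.sqrt(n_claims * fraud_rate * (1 - fraud_rate))
    return expected, std_dev, (actual_fraud - expected) / std_dev

# Toyota sedan claims from the slide: 2148 claims, 6% base rate, 171 actual frauds.
expected, std_dev, z = fraud_z_score(2148, 0.06, 171)
print(round(expected), round(std_dev), round(z, 1))
```

Under this assumption the expected count is about 129 and the z-score about 3.8, in line with the slide.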
37
The Crime Detection System Action Component
  • Modelling
  • Assess models (4)
  • - Append main predictions from 3 algorithms and
    final predictions from bagging to 615 fraudulent
    instances
  • - 25 cannot be detected by any algorithm,
    highest lift in Clusters 1 and 2
  • - All can be detected by at least 1 algorithm in
    Cluster 3
  • - Not all fraudulent instances can be detected
  • - Domain knowledge, cluster detection, and
    statistics offer explanation
  • - 101 cannot be detected by 2 algorithms
  • - Weakness of bagging
  • - Other alternatives

38
The Crime Detection System Action Component
  • Evaluation
  • Evaluate results
  • - Experiment VIII generates the best predictions
    with cost savings of about 168,000. This is
    almost 30% of total cost savings possible
  • - Most statistically reliable insight is the
    knowledge of 21 to 25 year olds who drive sport
    cars
  • Review process
  • - Unsupervised learning to derive clusters first
  • - More training data partitions
  • - More skewed distributions
  • - Cost model too simplistic
  • - Probabilistic Neural Networks

39
The Crime Detection System Action Component
  • Deployment
  • Plan deployment
  • - Manage geographically distributed databases
    using distributed data mining
  • - Take time into account
  • Plan monitoring and maintenance
  • - Determined by rate of change in external
    environment and organisational requirements
  • - Rebuild models when cost savings are below a
    certain percentage of maximum cost savings
    possible
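
The rebuild rule above amounts to a single comparison. In the sketch below, the function name, the 25% cut-off, and the figure used for maximum possible savings are all hypothetical, since the slides leave the percentage unspecified.

```python
def needs_rebuild(cost_savings, max_cost_savings, min_share=0.25):
    """True when savings fall below min_share of the maximum possible savings."""
    return cost_savings < min_share * max_cost_savings

# Hypothetical monitoring checks against an assumed maximum of 560,000.
print(needs_rebuild(100_000, 560_000))
print(needs_rebuild(167_069, 560_000))
```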

40
Contributions and Recommendations
41
Contributions
  • New Crime Detection Method
  • Crime Detection System
  • Cost Model
  • Visualisations
  • Statistics
  • Score-based Feature
  • Extensive Literature Review
  • In-depth Analysis of Algorithms

42
Recommendations Technical Problems
  • Imperfect data
  • Statistical evaluation and confidence intervals
  • Preparation component of crime detection system
  • Derived attributes
  • Cross validation
  • Highly skewed data
  • Partitioned data with most appropriate
    distribution
  • Cost model
  • Black-box predictions
  • Classification and clustering visualisation
  • Sorted scores and predefined thresholds, rules

43
Recommendations Practical Problems
  • Lack of domain knowledge
  • Action component of crime detection system
  • Extensive literature review
  • Great variety of fraud scenarios over time
  • SOM
  • Crime detection method
  • Choice of algorithms
  • Assessing data mining potential
  • Quality and quantity of data
  • Cost model
  • z-scores

44
Transforming Minority Report from Science
Fiction to Science Fact
INVESTIGATIVE DATA MINING IN FRAUD DETECTION

1 INTRODUCTION
  • The world is overwhelmed with terabytes of
    data, but there are only a few effective and
    efficient ways to analyse and interpret it.
  • The purpose of the research is to simulate the
    Precrime System from the science fiction novel,
    Minority Report, using data mining methods and
    techniques, to extract insights from enormous
    amounts of data to detect white-collar crime.
  • The application is in uncovering fraudulent
    claims in automobile insurance.
  • The objectives are to overcome the technical
    and practical problems of data mining in fraud
    detection.

2 THE CRIME DETECTION METHOD
  • Precogs, or precognitive elements, are entities
    which have the knowledge to predict that
    something will happen. Figure 1 uses three
    precogs to foresee and prevent crime by stopping
    potentially guilty criminals.
  • Each precog contains multiple classification
    models, or classifiers, trained with one data
    mining technique to extrapolate the future.
  • The three precogs differ from each other
    because they are trained by different data
    mining algorithms. For example, the first,
    second, and third precogs are trained using the
    naive Bayesian, C4.5, and backpropagation
    algorithms respectively.
  • The precogs require numerical inputs of past
    examples to output corresponding predictions for
    new instances.
  • Integration Mechanisms are needed. As each
    precog outputs its many predictions for each
    instance, all are counted and the class with the
    highest tally is chosen as the main prediction.
  • Figure 1 shows that the main predictions can
    be combined either by majority count (bagging)
    or by feeding the predictions back into one of
    the precogs (stacking), to derive a final
    prediction.
  • Analytical Machinery, or cluster detection,
    records, studies, compares, and represents the
    precogs' predictions in easily understood terms.
  • The analytical machinery is represented by the
    Self Organising Map (SOM), which clusters
    similar data into groups.
  • Figure 1 demonstrates that main predictions
    and final predictions are appended to the
    clustered data to determine the fraud
    characteristics which cannot be detected, and
    the most important attributes are selected for
    visualisation.
  • Scores are numbers within a specified range
    which indicate the relative risk that a
    particular data instance may be fraudulent, and
    are used to rank instances.
  • Rules are expressions in the form of Body ->
    Head, where Body describes the conditions under
    which the rule is generated and Head is the
    class label.
  • Visual Symbols, or visualisations, integrate
    human perceptual abilities in the data analysis
    process by presenting the data in some visual
    and interactive form.
  • The naive Bayesian and C4.5 visualisations
    facilitate analysis of classifier predictions
    and performance, and column graphs aid the
    interpretation of clustering results.

Figure 1: Predictions using Precogs, Analytical
Machinery, and Visual Symbols

3 RESULTS ON AUTOMOBILE INSURANCE DATA
  • Through the use of integration mechanisms, the
    highest cost savings are achieved.
  • The analytical machinery facilitated the
    interesting discovery of 21 to 25 year old
    fraudsters who used sport cars as their crime
    tool.

4 DISCUSSION
  • The black-box approach of the precogs is
    transformed into a semi-transparent approach by
    using analytical machinery and visual symbols
    to analyse and interpret the predictions.
  • Precogs can be shared between organisations to
    increase the accuracy of the predictions,
    without violating competitive and legal
    requirements.
  • The analytical machinery transforms
    multidimensional data into two-dimensional
    clusters which contain similar data, enabling
    the data analyst to easily differentiate the
    groups of fraud. It also allows the data analyst
    to assess the algorithms' ability to cope with
    evolving fraud.
  • The crime detection method provides a flexible
    step-by-step approach to generating predictions
    from any three algorithms, and uses some form of
    integration mechanism to increase the likelihood
    of correct final predictions.

5 CONCLUSION
  • Other possible applications of this crime
    detection method are:
  • Anti-terrorism
  • Burglary
  • Customs declaration fraud
  • Drug-related homicides
  • Drug smuggling
  • Government financial transactions
  • Sexual offences

REFERENCES
  • Dick P K (1956) Minority Report, Orion Publishing
    Group, London, Great Britain.

Done by Clifton Phua for Honours 2003
Supervised by Dr. Damminda Alahakoon

45
Questions?