Title: CS490D: Introduction to Data Mining Prof. Chris Clifton
1 CS490D: Introduction to Data Mining, Prof. Chris Clifton
- April 14, 2004
- Fraud and Misuse Detection
2 What is Fraud Detection?
- Identify wrongful actions
- Is right and wrong universal?
- If so, why not just prevent wrong actions?
- Identify actions by the wrong people
- Identify suspect actions
- Legal
- But probably not right
3 In Data Mining terms
- Classification?
- Classify into fraudulent and non-fraudulent behavior
- What do we need to do this?
- Outlier Detection
- Assume non-fraudulent behavior is normal
- Find the exceptions
- Problems?
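The outlier-detection framing above can be sketched in a few lines. This is a minimal illustration, assuming behavior is summarized as one numeric value per event and that "normal" means close to the overall mean; the function name `zscore_outliers`, the threshold, and the sample amounts are all hypothetical.

```python
from statistics import mean, stdev

def zscore_outliers(values, threshold=2.0):
    """Return (index, value, z) for values far from the overall mean."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # no spread, so nothing stands out
    return [(i, v, (v - mu) / sigma)
            for i, v in enumerate(values)
            if abs(v - mu) / sigma > threshold]

# Hypothetical transaction amounts; the last one is far from the rest.
amounts = [20, 22, 19, 21, 23, 20, 18, 22, 21, 250]
outliers = zscore_outliers(amounts)
```

A single global notion of "normal" like this one treats every individual the same, which is one possible reading of the "Problems?" bullet.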
4 Solution: Differential Profiling
- Determine individual behavior
- What is normal for the individual
- What separates one individual from another
- Gives profile of individual behavior
- How do we do this?
Classification Mining
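One way to picture "what is normal for the individual" is to keep a separate baseline per user and judge new activity against that user's own history. The sketch below assumes a single numeric feature per event; `build_profiles`, `is_unusual`, the threshold, and the sample histories are hypothetical and only illustrate the idea.

```python
from statistics import mean, stdev

def build_profiles(history):
    """Map each user to (mean, stdev) of their own past amounts."""
    return {user: (mean(vals), stdev(vals)) for user, vals in history.items()}

def is_unusual(profiles, user, amount, threshold=3.0):
    """Compare an amount with the user's own profile, not a global one."""
    mu, sigma = profiles[user]
    # Guard against a zero spread in the user's history.
    return abs(amount - mu) > threshold * max(sigma, 1e-9)

# Hypothetical spending histories for two very different users.
history = {
    "alice": [12, 15, 11, 14, 13],
    "bob": [480, 520, 510, 495, 505],
}
profiles = build_profiles(history)
```

Here a charge of 500 is unusual for "alice" but ordinary for "bob", which a single global threshold could not express.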
5 Has this been done? Intrusion Detection (Lane & Brodley)
- Profiled computer users based on command sequences
- Command
- Some (but not all) argument information
- Sequence information
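A rough sketch of profiling a user from command sequences follows. It is not the similarity measure from the cited work; it simply assumes a profile is the set of command n-grams seen in a user's past sessions and scores a new session by the fraction of its n-grams already in the profile. All names and sample sessions are hypothetical, and argument information is ignored.

```python
def ngrams(commands, n=3):
    """All length-n windows over a command sequence."""
    return [tuple(commands[i:i + n]) for i in range(len(commands) - n + 1)]

def build_profile(sessions, n=3):
    """Profile = set of n-grams observed in the user's past sessions."""
    profile = set()
    for session in sessions:
        profile.update(ngrams(session, n))
    return profile

def similarity(profile, session, n=3):
    """Fraction of the session's n-grams that appear in the profile."""
    grams = ngrams(session, n)
    if not grams:
        return 0.0
    return sum(g in profile for g in grams) / len(grams)

# Hypothetical training sessions for one user.
training = [
    ["cd", "ls", "vi", "make", "ls"],
    ["cd", "ls", "vi", "make", "gdb"],
]
profile = build_profile(training)
familiar = ["cd", "ls", "vi", "make"]
strange = ["wget", "chmod", "sh", "rm"]
```

A session whose score stays low for long enough would raise an alarm; how long to wait is the accuracy versus time-to-alarm trade-off.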
6 Results: Accuracy, Time to Alarm
7 Scaling Issues
- What happens with millions of users?
- Credit card
- Cell phone
- What about new users?
- Ideas?
8 Multi-user profiles
- Cluster users
- Develop profiles for clusters
- E.g., differential profiling
- Old customers: Do they match the profile for their cluster?
- Allows wider range of acceptable behavior
- New customers: Do they match any profile?
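The cluster-profile idea can be sketched as follows, assuming the clustering step has already been done (for example by k-means) and that each cluster is summarized by a centroid plus a radius. The feature choice, the `slack` factor, and every name below are hypothetical.

```python
from math import dist

def cluster_profile(points):
    """Profile = centroid plus the largest member distance (radius)."""
    dims = len(points[0])
    centroid = tuple(sum(p[d] for p in points) / len(points) for d in range(dims))
    radius = max(dist(p, centroid) for p in points)
    return centroid, radius

def matches(profile, point, slack=1.5):
    """True if the point lies within slack * radius of the centroid."""
    centroid, radius = profile
    return dist(point, centroid) <= slack * radius

# Hypothetical features: (calls per day, average call minutes).
clusters = {
    "light": [(2, 3), (3, 2), (2, 2), (3, 3)],
    "heavy": [(40, 10), (42, 12), (38, 11), (41, 9)],
}
profiles = {name: cluster_profile(pts) for name, pts in clusters.items()}

def matching_clusters(point):
    """For a new customer: which cluster profiles, if any, fit?"""
    return [name for name, prof in profiles.items() if matches(prof, point)]
```

An old customer is checked with `matches` against their own cluster's profile; a new customer is checked with `matching_clusters`, and an empty result means no known profile fits.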
9 Data mining for detection and prevention
10 Data mining defined
- The process of discovering meaningful new
relationships, patterns and trends by sifting
through data using pattern recognition
technologies as well as statistical and
mathematical techniques.
- The Gartner Group
11 Matching known fraud/non-compliance
- Which new cases are similar to known cases?
- How can we define similarity?
- How can we rate or score similarity?
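One possible answer to the similarity questions above is a set-overlap score. The sketch assumes each case is described by a set of attribute flags and uses Jaccard similarity, scoring a new case by its best match among known fraud cases; the flag names and functions are hypothetical, and other similarity measures would serve equally well.

```python
def jaccard(a, b):
    """Shared attributes divided by all attributes seen in either case."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def fraud_score(case, known_fraud):
    """Score = similarity to the closest known fraud case."""
    return max(jaccard(case, k) for k in known_fraud)

# Hypothetical attribute flags for previously confirmed fraud cases.
known_fraud = [
    {"po_box_address", "round_amount", "weekend_billing"},
    {"duplicate_invoice", "round_amount"},
]
suspicious = {"round_amount", "weekend_billing"}
ordinary = {"itemized_receipt"}
```

With these inputs `fraud_score(suspicious, known_fraud)` is 2/3 and `fraud_score(ordinary, known_fraud)` is 0, so cases can be ranked rather than only labeled.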
12 Anomalies and irregularities
- How can we detect anomalous or unusual behavior?
- What do we mean by usual?
- Can we rate or score cases on their degree of
anomaly?
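A degree-of-anomaly score can be sketched with nearest-neighbor distances: a case is unusual if even its k-th closest neighbor is far away. This assumes cases are numeric points and that Euclidean distance is meaningful for them; `knn_anomaly_score` and the sample points are hypothetical.

```python
from math import dist

def knn_anomaly_score(point, others, k=2):
    """Distance to the k-th nearest other case; larger means more unusual."""
    distances = sorted(dist(point, o) for o in others)
    return distances[k - 1]

# Hypothetical cases: four similar ones and one far from the rest.
cases = [(1, 1), (1, 2), (2, 1), (2, 2), (9, 9)]
scores = {
    c: knn_anomaly_score(c, [o for o in cases if o != c])
    for c in cases
}
most_anomalous = max(scores, key=scores.get)
```

Sorting cases by this score gives a ranking, so investigators can start with the most unusual cases instead of applying a fixed cut-off.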
13 Data mining is not
- Blind application of analysis/modeling algorithms
- Brute-force crunching of bulk data
- Black box technology
- Magic
14 How do you mine data?
- Use the Cross Industry Standard Process for Data Mining (CRISP-DM)
- Based on real-world lessons
- Focus on business issues
- User-centric, interactive
- Full process
- Results are used
15 Techniques used to identify fraud
- Predict and Classify
- Regression algorithms (predict numeric outcome): neural networks, CART, regression, GLM
- Classification algorithms (predict symbolic outcome): CART, C5.0, logistic regression
- Group and Find Associations
- Clustering/grouping algorithms: K-means, Kohonen, 2Step, factor analysis
- Association algorithms: Apriori, GRI, Capri, Sequence
16 Techniques for finding fraud
- Predict the expected value for a claim, compare that with the actual value of the claim.
- Those cases that fall far outside the expected range should be evaluated more closely.
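The expected-versus-actual comparison can be sketched with a one-variable least-squares fit. This assumes the claim amount depends roughly linearly on a single hypothetical feature (days in hospital) and that a fixed tolerance defines "far outside"; a real model would use more features and a data-driven range.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical history: days in hospital vs. paid claim amount.
days = [1, 2, 3, 4, 5]
claims = [1100, 1900, 3050, 3950, 5000]
slope, intercept = fit_line(days, claims)

def flag_claims(new_claims, tolerance=500):
    """Return (days, actual, expected) for claims far from the expected value."""
    flagged = []
    for days_in, actual in new_claims:
        expected = slope * days_in + intercept
        if abs(actual - expected) > tolerance:
            flagged.append((days_in, actual, expected))
    return flagged

flagged = flag_claims([(3, 9000), (4, 4100)])
```

Only the 9000 claim for a three-day stay is flagged, since the fitted line expects about 3000 for that length of stay.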
17 Techniques for finding fraud: Decision Trees and Rules
- Build a profile of the characteristics of fraudulent behavior.
- Pull out the cases that meet the historical characteristics of fraud.
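Pulling out cases that match a fraud profile can be sketched with explicit rules. The rules below are hand-written stand-ins for what a tree or rule learner such as CART or C5.0 might induce from historical fraud; the field names, thresholds, and cases are hypothetical.

```python
# Each rule is (name, predicate). Hand-written stand-ins for learned rules.
RULES = [
    ("large claim soon after enrollment",
     lambda c: c["amount"] > 5000 and c["days_since_enrollment"] < 30),
    ("many claims in one month",
     lambda c: c["claims_this_month"] >= 5),
]

def matching_rules(case):
    """Names of all fraud-profile rules the case satisfies."""
    return [name for name, predicate in RULES if predicate(case)]

# Hypothetical new cases to screen.
cases = [
    {"id": 1, "amount": 8000, "days_since_enrollment": 10, "claims_this_month": 1},
    {"id": 2, "amount": 300, "days_since_enrollment": 400, "claims_this_month": 2},
    {"id": 3, "amount": 900, "days_since_enrollment": 200, "claims_this_month": 6},
]
flagged = {c["id"]: matching_rules(c) for c in cases if matching_rules(c)}
```

Keeping the rule name with each flagged case tells the investigator why the case was pulled, which is one practical advantage of rules over opaque scores.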
18 Techniques for finding fraud: Clustering and Associations
- Group behavior using a clustering algorithm
- Find groups of events using the association algorithms
- Identify outliers and investigate
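The association side of this slide can be sketched by counting item pairs. This is a simplified pair-only version of the support and confidence idea behind Apriori-style algorithms, not a full implementation; the billing items and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def pair_rules(transactions, min_support=0.5):
    """Return (a, b, support, confidence of a -> b) for frequent pairs.

    Only the direction a -> b is reported, where a sorts before b.
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support >= min_support:
            rules.append((a, b, support, count / item_counts[a]))
    return rules

def violations(transactions, rules, min_confidence=0.9):
    """Transactions that contain a but lack b for a high-confidence rule."""
    out = []
    for a, b, _, confidence in rules:
        if confidence >= min_confidence:
            out.extend(t for t in transactions if a in t and b not in t)
    return out

# Hypothetical billed items per visit.
history = [["xray", "cast"], ["xray", "cast", "crutches"],
           ["xray", "cast"], ["checkup"]]
rules = pair_rules(history)
odd = violations([["cast"], ["cast", "xray"]], rules)
```

In this toy history a cast is always billed with an x-ray, so a new visit billing a cast alone breaks the pattern and is returned for investigation.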
19 Fraud detection using CRISP-DM
- Provides a systematic way to detect fraud and abuse
- Ensures auditing and investigative efforts are maximized
- Continually assesses and updates models to identify new emerging fraud patterns
- Leads to higher recoupments
20 Data mining in action: Fraud, waste and abuse case studies
21 How can data mining help?
- Payment error prevention
- Billing and payment fraud
- Audit selection
22 Payment Error Prevention
The US Health Care Financing Administration needed to isolate the likely causes of payment error by developing a profile of acceptable billing practices and...
used this information to focus their auditing effort
23 Payment error prevention solution
- Clementine
- Using audited discharge records, built profiles of appropriate decisions such as diagnosis coding and admission
- Matched new cases
- Cases not matching are audited
24 Payment error prevention results
- Detected 50 of past incorrect payments resulting in significant recovery of funding lost to payment errors
- PRO analysts able to use resultant Clementine models to prevent future error
25 Billing and payment fraud
The US Defense Finance and Accounting Service needed to find fraud in millions of Dept of Defense transactions and...
Identified suspicious cases to focus investigations
26 Billing and payment fraud solution
- Clementine
- Detection models based on known fraud patterns
- Analyzed all transactions, scored based on similarity to these known patterns
- High-scoring transactions flagged for investigation
27 Billing and payment fraud results
- Identified over 1,200 payments for further investigation
- Integrated the detection process
- Anomaly detection methods (e.g., clustering) will serve as sentinel systems for previously undetected fraud patterns
28 Audit selection
The Washington State Department of Revenue needed to detect erroneous tax returns and...
Focused audit investigations on cases with the highest likely adjustments
29 Audit selection solution
- Clementine
- Using previously audited returns
- Model adjustment (recovery) per auditor hour based on return information
- Models will then score future returns showing highest potential adjustment
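The scoring step can be pictured with a very small stand-in model. The case study used Clementine models; the sketch below instead assumes the only return information is a hypothetical business segment and estimates adjustment per auditor hour as a simple segment average over previously audited returns.

```python
from collections import defaultdict

def rate_by_segment(audited):
    """Average adjustment per auditor hour for each segment of audited returns."""
    totals = defaultdict(lambda: [0.0, 0.0])  # segment -> [adjustment, hours]
    for segment, adjustment, hours in audited:
        totals[segment][0] += adjustment
        totals[segment][1] += hours
    return {seg: adj / hrs for seg, (adj, hrs) in totals.items() if hrs > 0}

# Hypothetical audited returns: (segment, adjustment recovered, auditor hours).
audited = [
    ("retail", 4000, 10), ("retail", 2000, 10),
    ("construction", 9000, 10), ("construction", 3000, 5),
    ("services", 500, 10),
]
rates = rate_by_segment(audited)

def rank_returns(new_returns):
    """Order new returns by expected adjustment per hour; unknown segments score 0."""
    return sorted(new_returns, key=lambda r: rates.get(r[1], 0.0), reverse=True)

# Hypothetical unaudited returns: (return id, segment).
new_returns = [("R1", "services"), ("R2", "construction"), ("R3", "retail")]
ranked = rank_returns(new_returns)
```

Auditors would then work down the ranked list until their available hours run out, so effort goes first to the returns with the highest expected recovery per hour.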
30 Audit selection results
- Maximizes auditors' time by focusing on cases likely to yield the highest return
- Closes the tax gap
31 Data mining: key to detecting and preventing fraud, waste and abuse
- Learn from the past
- High quality, evidence based decisions
- Predict
- Prevent future instances
- React to changing circumstances
- Models kept current, from latest data