PKDD Discovery Challenge (not only) on Financial Data

1
PKDD Discovery Challenge (not only) on Financial
Data
  • Petr Berka
  • Laboratory for Intelligent Systems
  • University of Economics, Prague
  • berka_at_vse.cz

2
Cups, Challenges, Competitions
  • KDD Cups (since 1997)
  • KDD Sisyphus at ECML 1998
  • PKDD Discovery Challenges (since 1999)
  • COIL Competition 2000
  • PAKDD Challenge 2000
  • PT Challenge 2000, 2001
  • JSAI KDD Challenge 2001
  • EUNITE Competition 2001, 2002
  • . . .

3
PKDD Discovery Challenge Idea
  • Realistic data mining conditions
  • collaborative rather than competitive nature
  • rather vague specification of the problem
  • Differences to real KDD projects
  • short time for analysis (2-3 months)
  • only indirect access to domain and data experts
    during KDD process

4
Challenge Settings
  • Data and their full description available on the
    web for all participants
  • Submissions evaluated by domain experts (but no
    ordering, no winners and losers)
  • Workshop at PKDD to present the results and
    discuss them with domain experts
  • Results and comments of experts available on the
    web (after the workshop)

5
PKDD Challenges (http://lisp.vse.cz/challenge)
  • 1999, Prague
  • financial data, thrombosis data
  • 2000, Lyon
  • financial data, modified thrombosis data
  • 2001, Freiburg
  • modified thrombosis data
  • 2002, Helsinki
  • atherosclerosis data, hepatitis data

6
Financial Challenge Background
  • Czech bank offering private accounts
  • Available data for pilot study (29000 clients)
  • personal characteristics
  • basic info about accounts
  • transactions for three months
  • Proposed tasks
  • segmentation (defining different types of clients
    w.r.t. debt)
  • early detection of debts

7
Financial Challenge Data

8
Contributions
  • Method oriented
  • show a method/system working on the data
  • Problem oriented (prototype solutions)
  • loan and/or credit cards description
  • loan and/or credit cards classification
  • initial exploration
  • relation between branches
  • clients segmentation

9
Description of loans
  • Relations between loan category and account
    characteristics
  • Coufal et al, 1999 - GUHA
  • Mikšovský et al, 1999 - EXCEL

10
Classification of loans
  • Detecting risky clients before they are granted a
    loan
  • Mikšovský et al, 1999 - C5.0
  • decision tree to find the relevance of attributes
  • decision tree for classification (using
    misclassification costs)
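
The idea of classifying with misclassification costs can be sketched as follows. This is a minimal illustration, not C5.0's own procedure: it takes the class probabilities at a tree leaf and picks the label with the lowest expected cost. The cost values and the class names "good" and "bad" are assumptions chosen for the example; the slides do not give them.

```python
# Hypothetical cost matrix, COST[actual][predicted]. The numbers are
# assumptions for illustration, not values from the challenge.
COST = {
    "good": {"good": 0.0, "bad": 1.0},  # refusing a good client loses business
    "bad": {"good": 5.0, "bad": 0.0},   # granting a loan to a risky client costs more
}


def expected_cost(probs, predicted, cost=COST):
    """Expected cost of predicting `predicted` given class probabilities."""
    return sum(p * cost[actual][predicted] for actual, p in probs.items())


def min_cost_class(probs, cost=COST):
    """Return the label with the lowest expected misclassification cost."""
    return min(cost, key=lambda label: expected_cost(probs, label, cost))


# A leaf where 80% of the training clients were good payers.
leaf = {"good": 0.8, "bad": 0.2}
print(min_cost_class(leaf))
```

With these assumed costs the leaf is labelled "bad" even though most of its clients are good, because the expected cost of a wrongly granted loan (0.2 × 5.0 = 1.0) exceeds that of a wrongly refused one (0.8 × 1.0 = 0.8).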

11
Credit Cards Promotion
  • Description - find characteristics of a card
    holder
  • deviation detection
  • Classification - predict score for card value
  • k-nearest neighbour
  • Putten, 1999
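
A k-nearest-neighbour score prediction of the kind named above might look like the sketch below. It is only an illustration under stated assumptions: the two features, their scaling to the range 0..1, and the example scores are made up, and the slides do not describe the distance measure or the value of k that was actually used.

```python
import math


def knn_score(query, examples, k=3):
    """Average the scores of the k examples nearest to `query`.

    `examples` is a list of (feature_vector, score) pairs; distance is Euclidean.
    """
    by_distance = sorted(examples, key=lambda ex: math.dist(query, ex[0]))
    nearest = by_distance[:k]
    return sum(score for _, score in nearest) / len(nearest)


# Hypothetical, already normalised features:
# (average balance, transactions per month); score 1.0 = holds a card.
holders = [
    ((0.9, 0.8), 1.0),
    ((0.8, 0.9), 1.0),
    ((0.2, 0.1), 0.0),
    ((0.1, 0.3), 0.0),
    ((0.7, 0.6), 1.0),
]

print(knn_score((0.85, 0.75), holders, k=3))
```

Features on very different scales (for example raw balances next to transaction counts) would need normalising first, otherwise the largest-valued feature dominates the distance.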

12
Clients Segmentation
  • Description - segmentation of clients according
    to transactions (Hotho, Maedche, 2000)
  • Kohonen map + decision trees
  • Rule 1 for Cluster 3
  • If ATTR5 > 9945 and ATTR13 > 0
  • Then -> Cluster 3 (115, 0.983)
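
To show how such a rule could be checked against data, the sketch below applies Rule 1 to a few records and reports how many it covers and how accurate it is. The records are made up. Reading the pair in parentheses as (cases covered, Laplace-corrected accuracy) is an assumption, since the slide does not explain it; under that reading, 114 correct out of 115 covered cases would give (114 + 1) / (115 + 2) ≈ 0.983.

```python
def rule_cluster3(record):
    """Rule 1 from the slide: ATTR5 > 9945 and ATTR13 > 0 -> Cluster 3."""
    return record["ATTR5"] > 9945 and record["ATTR13"] > 0


def rule_stats(records, rule, target):
    """Return (covered, laplace) for `rule` on `records`.

    covered = number of records the rule fires on
    laplace = (correct + 1) / (covered + 2), a Laplace-corrected accuracy
    """
    covered = [r for r in records if rule(r)]
    correct = sum(1 for r in covered if r["cluster"] == target)
    return len(covered), (correct + 1) / (len(covered) + 2)


# Made-up records; ATTR5 and ATTR13 are the opaque attribute names from the slide.
clients = [
    {"ATTR5": 12000, "ATTR13": 2, "cluster": 3},
    {"ATTR5": 10500, "ATTR13": 1, "cluster": 3},
    {"ATTR5": 9990, "ATTR13": 4, "cluster": 1},
    {"ATTR5": 8000, "ATTR13": 3, "cluster": 2},
    {"ATTR5": 15000, "ATTR13": 0, "cluster": 2},
]

covered, laplace = rule_stats(clients, rule_cluster3, target=3)
print(covered, round(laplace, 3))
```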

13
Challenge Organizing Lessons
  • To get and prepare real data is difficult
  • The time for analysis should be as long as
    possible
  • The response rate was rather low ( 10)
  • No synergy effect observed

14
DM Lessons (1/4)
  • Cooperate with experts
  • domain experts
  • data experts
  • . . .
  • and with users

15
DM Lessons (2/4)
  • Use knowledge intensive preprocessing methods
  • compute age and sex from birth_number
  • set flags for different types of operations
  • compute monthly characteristics of transactions
    (sum, avg, min, max)
  • $\mathrm{balance} = \frac{1}{30} \sum_i \mathrm{balance}(i) \cdot \mathrm{days}(i)$
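
Two of the preprocessing steps above can be sketched in a few lines. The sketch assumes the birth_number encoding documented for the financial data set (YYMMDD for men, YYMM+50DD for women) and assumes all clients were born in the 1900s. The function names, the reference date, and the example values are hypothetical.

```python
from datetime import date


def parse_birth_number(birth_number):
    """Split a six-digit birth_number into (birth date, sex).

    Assumes YYMMDD for men and YYMM+50DD for women, all born in the 1900s.
    """
    yy = birth_number // 10000
    mm = birth_number // 100 % 100
    dd = birth_number % 100
    sex = "F" if mm > 50 else "M"
    if mm > 50:
        mm -= 50  # remove the offset that marks a woman
    return date(1900 + yy, mm, dd), sex


def age_on(birth, ref):
    """Age in completed years on the reference date."""
    return ref.year - birth.year - ((ref.month, ref.day) < (birth.month, birth.day))


def weighted_balance(segments, period_days=30):
    """Time-weighted balance: (1 / period_days) * sum(balance_i * days_i).

    Each segment is (balance, number of days that balance was held).
    """
    return sum(balance * days for balance, days in segments) / period_days


birth, sex = parse_birth_number(706213)
print(birth, sex, age_on(birth, date(1999, 1, 1)))
print(weighted_balance([(1000, 10), (4000, 20)]))
```

Weighting each balance by the number of days it was held keeps a balance that lasted one day from counting as much as one that lasted most of the month, which a plain average over transactions would not do.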

16
DM Lessons (3/4)
  • Make the results understandable
  • Werner, Fogarty 2001

17
DM Lessons (4/4)
  • Show some (even preliminary) results soon
  • experts are interested in solutions not in
    applying sophisticated methods

18
Discovery Challenge Benefits
  • Experts
  • deeper insight into the data
  • Participants
  • experience with analyzing large real data
  • motivations for further research
  • ML/KDD Community
  • prototype tasks/solutions (like the MiningMart
    project?)
  • Organizers
  • invitation to DMLL Workshop :-)

19
Thank You

20
Contributions