Identifying Objects Using Cluster and Concept Analysis

Transcript and Presenter's Notes

1
Identifying Objects Using Cluster and Concept Analysis
  • Arie van Deursen
  • Tobias Kuipers
  • CWI, The Netherlands

2
Motivation
  • Legacy code incomprehensible
  • Lack of structure
  • Case: >100,000 LOC banking system
  • Cobol VSAM data files
  • Customer wanted OO redesign
  • Data central to the system

3
General Plan
  • Find interesting data
  • Data selection
  • Candidate attributes
  • Find interesting functionality
  • Program selection (procedure)
  • Candidate methods
  • Combine the two
  • Candidate classes

4
Input Selection
  • Domain-related vs. implementation-specific
  • Persistent data stores
  • Only records written to/read from file
  • Refine by CRUD (Create/Read/Update/Delete)
  • Records too big for one class
  • Analysis of Program Call Graph
  • High fan-out: control programs
  • High fan-in: low-level technical programs
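
The call-graph filter above can be sketched as follows. This is a minimal illustration, not the authors' tooling: the program names, the `calls` table, and the two thresholds are hypothetical, and the slide does not state which cut-off values were used.

```python
from collections import defaultdict

# Hypothetical call graph: caller -> set of called programs.
calls = {
    "MAIN01": {"CUST10", "MORT20", "ACCT30", "DATE99"},
    "CUST10": {"DATE99", "FILE98"},
    "MORT20": {"DATE99", "FILE98"},
    "ACCT30": {"DATE99", "FILE98"},
}


def fan_metrics(calls):
    """Return {program: (fan_in, fan_out)} for every program in the graph."""
    fan_in = defaultdict(int)
    programs = set(calls)
    for callees in calls.values():
        programs |= callees
        for callee in callees:
            fan_in[callee] += 1
    return {p: (fan_in[p], len(calls.get(p, ()))) for p in programs}


def select_candidates(calls, max_fan_in=2, max_fan_out=3):
    """Drop control programs (high fan-out) and technical helpers (high fan-in).

    The thresholds are illustrative assumptions only.
    """
    metrics = fan_metrics(calls)
    return sorted(
        p for p, (fi, fo) in metrics.items()
        if fi <= max_fan_in and fo <= max_fan_out
    )


print(select_candidates(calls))
```

With this toy graph, `MAIN01` is dropped as a control program and `DATE99` and `FILE98` are dropped as low-level utilities, which leaves the three programs in between as candidates for methods.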

5
Combining Data and Functionality
  • Cluster analysis -- technique for finding groups
    in data
  • Relies on metrics to compare distance between
    data items
  • Concept analysis -- for finding groups too
  • Relies on maximal subsets of data items sharing a
    set of features

6
Cluster Analysis
  • Calculate distance (similarity) number between
    all data items (record fields)
  • Use clustering to find hierarchy
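
A small sketch of these two steps, under stated assumptions: the slide does not name the distance measure or the linkage rule, so this example uses the Jaccard distance over the sets of programs that use each field, and single-linkage agglomerative clustering. The `usage` table is hypothetical.

```python
from itertools import combinations

# Hypothetical usage table: record field -> programs that use it.
usage = {
    "Name": {"P1", "P2"},
    "Street": {"P3", "P4"},
    "City": {"P3", "P4"},
    "ZipCd": {"P4"},
    "Amount": {"P5"},
}


def jaccard_distance(a, b):
    """0.0 for identical program sets, 1.0 for disjoint ones."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def linkage(a, b, usage):
    """Single linkage: distance between the closest members of two clusters."""
    return min(jaccard_distance(usage[x], usage[y]) for x in a for y in b)


def agglomerate(usage):
    """Merge the two closest clusters until one remains.

    Returns the merge history as (distance, cluster_a, cluster_b) tuples,
    which is the information a dendrogram draws.
    """
    clusters = [frozenset([field]) for field in sorted(usage)]
    merges = []
    while len(clusters) > 1:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: linkage(pair[0], pair[1], usage),
        )
        merges.append((linkage(a, b, usage), a, b))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


for distance, a, b in agglomerate(usage):
    print(distance, sorted(a), sorted(b))
```

Fields used by exactly the same programs merge first (distance 0), and fields that share no program only join at distance 1. Merges at equal distance are resolved by whichever pair comes first, so the resulting hierarchy is one of several possible ones.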

7-13
Dendrogram
[Figures: a sequence of dendrogram slides; the images are not preserved in this transcript. The only surviving annotation is "Distance is 1" (slide 9).]

14
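Dendrogram from Real Data
[Figure: dendrogram over record fields; the image is not preserved. Leaf labels as extracted, one line of the figure per bullet:]
  • Amount
  • Account
  • OfficeName, BankCity, IntAccount, OfficeType, PaymentKind, RelationNr, ChangeDate
  • MortSeqNr, MortNr
  • TitleCd, Prefix, Initial
  • Name
  • ZipCd, CountyCd, StreetNr
  • City
  • Street
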
15
Concept Analysis
  • Relies on maximal subsets of data items sharing a
    set of features
  • Concept analysis finds a lattice
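
To make "maximal subsets of data items sharing a set of features" concrete, here is a minimal sketch that enumerates all concepts of a tiny context. It assumes that the data items are record fields and the features are the programs using them; the `context` table is hypothetical, and the brute-force closure below only suits small inputs.

```python
# Hypothetical context: data item (record field) -> features (programs using it).
context = {
    "Name": {"P1", "P2"},
    "Street": {"P3", "P4"},
    "City": {"P3", "P4"},
    "Zipcode": {"P4"},
}


def concepts(context):
    """Return all (extent, intent) pairs of the context.

    An intent is a set of features; its extent is every item that has all of
    them. Intents are found by closing the items' feature sets under
    intersection, starting from the set of all features.
    """
    all_features = frozenset().union(*context.values())
    intents = {all_features}
    for features in context.values():
        intents |= {intent & features for intent in intents}
    result = []
    for intent in intents:
        extent = frozenset(
            item for item, feats in context.items() if intent <= feats
        )
        result.append((extent, intent))
    # Largest extent first: the top of the lattice leads, the bottom is last.
    return sorted(result, key=lambda c: (-len(c[0]), sorted(c[0])))


for extent, intent in concepts(context):
    print(sorted(extent), sorted(intent))
```

In this toy context `Street` and `City` appear both in the concept for `{P3, P4}` and in the larger concept for `{P4}`, so one item can belong to several groups. The first concept returned holds all items with no shared feature (top), and the last holds all features with no item (bottom).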

16-19
Concept Lattice
[Figures: a concept lattice shown over four slides; the images are not preserved. Surviving labels: "top", "All Variables", "bottom" and, from slide 18 on, a node labelled P4 with the variables Number, Nb-Ext, Zipcode, Street, City. One further label was extracted as "?" and cannot be recovered.]

20
Real Concept Lattice
[Figure: concept lattice from the case study; the image is not preserved. Only the node labels survive (numbers 1 to 14 and letters A to X); their layout is lost.]

21
Concluding Remarks
  • Variable Selection - Input filtering
  • Records are natural starting point in
    data-intensive applications
  • Legacy/Cobol domain
  • Records are too big: decompose them
  • Cluster analysis vs. concept analysis

22
Cluster vs. Concept Analysis
  • Multiple partitionings
  • Clustering does not show all possibilities
  • Items in multiple groups
  • Features and clusters
  • Origin of cluster decision is lost
  • Concept more efficient computationally
  • Clustering needs more filtering

23
Questions
24
Current Approaches
  • Subsystem classification techniques
  • Survey: Lakhotia (97). Don't work for Cobol:
    Cimitile (99)
  • Record as data part of a class
  • Newcomb & Kotik (95) take level 01 records,
    Fergen et al. (94) compare structure of records
    for reuse
  • Manual Methodology
  • Sneed (92) provides manual methodology for
    migration of code, Sneed & Nyári (95) derive
    OO documentation from legacy.