Predicting with Sparse Data

1
Predicting with Sparse Data
  • Martin Shepperd
  • Empirical Software Engineering Research Group
  • Bournemouth University
  • email mshepper_at_bmth.ac.uk
  • http://dec.bmth.ac.uk/ESERG/

2
Agenda
  • 1. Background
  • 2. Data Problems
  • 3. Pairwise methods
  • 4. Results
  • 5. Future avenues

3
1. Background
  • Software developers need to predict, e.g.
  • effort, duration, number of features
  • defects and reliability

But ...
  • noise and change
  • complex interactions between variables
  • poorly understood phenomena
  • little systematic data

4
Effort Prediction Systems
  • off the shelf e.g. COCOMO
  • DIY models
  • machine learning
  • case based reasoning
  • neural nets
  • rule induction
  • genetic programming

5
Problems with Off the shelf models
Model                 Researcher        MMRE (%)
Basic COCOMO          Kemerer           601
FP                    Kemerer           103
SLIM                  Kemerer           772
ESTIMACS              Kemerer           85
COCOMO                Miyazaki & Mori   166
Intermediate COCOMO   Kitchenham        255
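
MMRE here is the mean magnitude of relative error, conventionally quoted as a percentage. As a reminder of what the figures measure, a minimal Python sketch of the usual definition follows. The function name and the sample values are illustrative assumptions, not data from the studies listed.

```python
def mmre(actuals, predictions):
    """Mean magnitude of relative error, expressed as a percentage."""
    if len(actuals) != len(predictions) or not actuals:
        raise ValueError("need two equal-length, non-empty sequences")
    # Relative error of each prediction, measured against the actual value.
    errors = [abs(a - p) / a for a, p in zip(actuals, predictions)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical projects (illustrative numbers only).
actual_effort = [100.0, 200.0, 400.0]
predicted_effort = [150.0, 100.0, 400.0]
print(round(mmre(actual_effort, predicted_effort), 1))
```

An MMRE above 100 means that, on average, predictions miss by more than the actual value itself.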

6
A DIY Model
Predicting effort using number of files

7
Case Based Reasoning

8
Sensitivity Analysis

9
So Where Are We?
  • A major research topic
  • Poor results off the shelf
  • Accuracy improves with calibration, but results are
    still mixed
  • The majority of techniques need accurate local data
  • BUT data is problematic

10
Data Problems (1)
What does a person hour of effort mean?
  • training
  • sickness
  • unpaid overtime
  • when does a project start and finish?
  • politics
  • etc.

11
Data Problems (2)
  • Shortage of data - projects can be infrequent
  • Heterogeneity
  • Obsolescence
  • Trustworthiness

12
3. Pairwise Methods
A subjective technique is therefore unavoidable.
There is significant evidence that pairwise
judgements are more reliable than ranking or
judging groups. The only existing method was a
proprietary one (SSM from Lockheed). A general
purpose technique for decision making is Analytic
Hierarchy Processing (AHP), so ...

13
Analytic Hierarchy Processing
  • Multi-criteria decision making technique
  • From decision / management sciences
  • Decision has n alternatives (elements) and m
    criteria
  • Make pairwise comparisons

14
AHP Matrices
Each judgement reflects the perceived ratio of the
relative contributions of elements $i$ and $j$ to the
overall component. Assume that there are $n$
elements; then we require $n(n-1)/2$ pairwise
judgements to complete the matrix. This implies
redundancy, since $n(n-1)/2 > n-1$ for $n > 2$. So
$a_{ij} = w_i / w_j$, subject to the following
constraints: $a_{ij} > 0$, $a_{ii} = 1$ and
$a_{ij} = 1 / a_{ji}$.
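
The constraints above mean that only the upper triangle of the matrix has to be judged; the diagonal and lower triangle follow from it. The Python sketch below builds such a matrix and derives weights from it. The slides do not say how the weights are extracted, so the normalised row geometric mean used here is an assumption (the principal eigenvector is the other common choice), and the function names and sample judgements are hypothetical.

```python
import math


def reciprocal_matrix(n, judgements):
    """Build an n x n AHP matrix from upper-triangle judgements.

    judgements maps (i, j), with i < j, to the perceived ratio w_i / w_j.
    """
    expected = n * (n - 1) // 2
    if len(judgements) != expected:
        raise ValueError(f"expected {expected} judgements, got {len(judgements)}")
    a = [[1.0] * n for _ in range(n)]      # a_ii = 1
    for (i, j), ratio in judgements.items():
        if not (0 <= i < j < n) or ratio <= 0:
            raise ValueError("need 0 <= i < j < n and ratio > 0")
        a[i][j] = float(ratio)
        a[j][i] = 1.0 / ratio              # a_ji = 1 / a_ij
    return a


def ahp_weights(a):
    """Approximate the weights by normalised row geometric means (assumption)."""
    gm = [math.prod(row) ** (1.0 / len(row)) for row in a]
    total = sum(gm)
    return [g / total for g in gm]


# Hypothetical judgements for elements A, B, C (indices 0, 1, 2):
# A contributes twice as much as B, four times C; B twice as much as C.
matrix = reciprocal_matrix(3, {(0, 1): 2.0, (0, 2): 4.0, (1, 2): 2.0})
w = ahp_weights(matrix)
print([round(x, 3) for x in w])
```

Because these three judgements agree with each other, the weights reproduce the judged ratios exactly; with inconsistent judgements they are a compromise.
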
15
Example AHP Matrix
Subjective pairwise comparisons (A-B, A-C, B-C)
of ratio of contributions.
16
AHP and Effort Prediction
Our sparse data method (SDM) requires the
following steps:
  • Divide the project into n sub-components or
    sub-tasks (n > 1).
  • Identify at least one reference component R.
  • Use AHP to find each component's contribution to
    the overall (i.e. project + R).
  • Solve for all other components, since R is known.
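
The final step is a simple rescaling: once AHP has produced relative weights, the known effort of R fixes the scale for everything else. A minimal Python sketch, assuming the weights are already available; the function name is hypothetical, and the figures are our reading of the student project reported later in these slides (prototype as reference, weights 23.88 and 76.12, reference effort 120).

```python
def predict_from_reference(component_weights, reference, reference_effort):
    """Scale relative weights so the reference component matches its known effort."""
    w_ref = component_weights[reference]
    if w_ref <= 0:
        raise ValueError("reference weight must be positive")
    return {
        name: reference_effort * w / w_ref
        for name, w in component_weights.items()
    }


# Weights in percent; the prototype is the reference component R.
component_weights = {"prototype": 23.88, "TFS": 76.12}
estimates = predict_from_reference(component_weights, "prototype", 120.0)
print(round(estimates["TFS"], 2))  # 382.51
```

Only the ratio of each weight to the reference weight matters, so it makes no difference whether the weights are expressed as fractions or percentages.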

17
DataSalvage
  • Developed a PC based software tool
  • develop hierarchy
  • make and edit pairwise comparisons
  • add a reference component(s)
  • calculate consistency
  • generate predictions
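
The slides do not state how DataSalvage calculates consistency. One standard option in the AHP literature is Saaty's consistency ratio, sketched below in Python under that assumption. The function name and both sample matrices are hypothetical, and the weights are again approximated by row geometric means.

```python
import math

# Saaty's random consistency index for matrices of order 3 to 10.
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24,
                7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}


def consistency_ratio(a):
    """Consistency ratio CR = CI / RI of a reciprocal comparison matrix."""
    n = len(a)
    if n not in RANDOM_INDEX:
        raise ValueError("this sketch supports 3 <= n <= 10")
    # Weights from normalised row geometric means (an approximation).
    gm = [math.prod(row) ** (1.0 / n) for row in a]
    total = sum(gm)
    w = [g / total for g in gm]
    # Estimate the principal eigenvalue as the mean of (A w)_i / w_i.
    aw = [sum(a[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1)        # consistency index
    return ci / RANDOM_INDEX[n]


consistent = [[1, 2, 4], [0.5, 1, 2], [0.25, 0.5, 1]]
inconsistent = [[1, 2, 0.25], [0.5, 1, 2], [4, 0.5, 1]]
print(round(consistency_ratio(consistent), 3))
print(consistency_ratio(inconsistent) > 0.1)
```

A common rule of thumb treats a ratio above 0.1 as a signal that the judgements should be revisited.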

18
http://dec.bmth.ac.uk/ESERG/DataSalvage

19
4. Some Results
                      BT Dataset   Dataset2
Initial size          18           21
Projects removed      4            1
Final dataset size    14           20
Min. project (days)   97           109
Max. project (days)   573          912

20
Expert v SDM
Absolute residuals

Technique   Mean    Median   Min   Max
SDM         134.8   58.5     1     566
Expert      139.3   70       5     571

The Wilcoxon signed-rank test rejected $H_0$ in favour
of $H_a$, that the SDM median error is significantly
($\alpha = 0.1$) smaller than that of the unaided
expert ($p = 0.061$).
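
For readers unfamiliar with the test, the Python sketch below computes the two rank sums on which the Wilcoxon signed-rank statistic is based, using paired absolute residuals. The per-project residuals are not given in the slides, so the sample values are hypothetical and the function name is ours. In practice a statistics library (for example SciPy's `scipy.stats.wilcoxon`) would also supply the p-value.

```python
def signed_rank_sums(errors_a, errors_b):
    """Return (W_plus, W_minus) for paired samples, averaging ranks over ties."""
    # Paired differences; zero differences are dropped, as is conventional.
    diffs = [a - b for a, b in zip(errors_a, errors_b) if a != b]
    order = sorted(range(len(diffs)), key=lambda k: abs(diffs[k]))
    ranks = [0.0] * len(diffs)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next absolute difference is tied.
        while (end + 1 < len(order)
               and abs(diffs[order[end + 1]]) == abs(diffs[order[pos]])):
            end += 1
        average_rank = (pos + end) / 2.0 + 1.0   # ranks are 1-based
        for k in order[pos:end + 1]:
            ranks[k] = average_rank
        pos = end + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


# Hypothetical absolute residuals per project (not the study's data).
sdm_errors = [10, 40, 25, 60, 5]
expert_errors = [30, 35, 70, 90, 5]
print(signed_rank_sums(sdm_errors, expert_errors))
```

A small W_plus relative to W_minus indicates that the first technique's errors tend to be the smaller ones.
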
21
Robustness
22
Experience with Users (1)
Student project group effort

Project     Weight (%)   Predicted   Actual
prototype   23.88        n/a         120
TFS         76.12        382.51      318.5

23
Experience with Users (2)
  • Project manager at BT
  • Found pairwise comparisons easy
  • Liked the technique and tool
  • Found identifying criteria and weights difficult
    leading to poor predictions

24
5. Future Avenues
  • Great need for useful prediction systems
  • Cannot always assume high quality systematic data
    is available
  • We have shown our sparse data method to have
    potential on industrial data
  • Unresolved issues include the choice of reference
    task and attribute hierarchy