Title: A Systematic Review of Software Development Cost Estimation Studies
1 A Systematic Review of Software Development Cost Estimation Studies
- Authors: Magne Jørgensen, Martin Shepperd
- Source: IEEE Transactions on Software Engineering, Volume 33, Issue No. 1
- Date: January 2007
- Presented by: Adriana Ogasawara, Joshua Mahaz
2 Introduction
- The purpose of this study was to improve software estimation research through the examination of previous cost estimation studies
- In short, the authors:
  - Examined 304 journal papers by hand
  - Classified them according to a set of criteria they devised, based on estimation topic, estimation approach, research approach, study context, and data set
  - Observed trends in the data
  - Based on this, made suggestions on how software cost estimation might be improved
3 Introduction
- According to the authors, this paper distinguishes itself from others in that:
  - Its aim is to direct future research, not to discuss specific estimation models
  - It is a more comprehensive and systematic review
  - The classification of studies used is unique to this study
4 Research Questions
5 Inclusion: Identification of Papers
- Systematic search for papers by hand
  - Issue by issue, starting with Volume 1
  - Read titles and abstracts of all published papers from over 100 potentially relevant, peer-reviewed journals published in English
  - Ended up with 304 papers from 76 journals
- Journals were found through:
  - Reading reference lists in cost estimation papers
  - Internet searches
  - Authors' prior experience
6 Classification of Papers
- To facilitate answering the 8 research questions, papers were classified according to the following:
  - Research topic (estimation method, calibration of models, etc.)
  - Estimation approach (regression, analogy, etc.)
  - Research approach (theory, survey, etc.)
  - Study context (student projects, professional projects, etc.)
  - Data sets
7 Classification of Papers
- Initial classification was performed by one author
- Robustness of the classification was tested by the second author
  - Tested a random sample of 30 papers (10%)
- Results of the classification testing showed the initial categories were too vague
  - Disagreements on 39% of the classifications
8 Classification of Papers
- Most of the disagreements were due to different interpretations of the classification categories
  - Only approx. 3% of the papers were blatantly misclassified
- Authors agreed the classification categories were accurate enough as long as they:
  - Clarified the descriptions of the vague categories (12 total)
  - Reclassified the papers that fell into the vague categories
- Out of the 109 papers that were reread, only 21 were reclassified
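The robustness check above amounts to comparing two raters' labels on the same sample of papers. The Python sketch below shows one way such a disagreement rate could be computed. The labels and the function name `disagreement_rate` are hypothetical, since the review's actual classifications are not reproduced on these slides. The last lines simply work out two ratios from figures the slides do report (30 of 304 papers sampled, 21 of 109 reread papers reclassified).

```python
def disagreement_rate(first_rater, second_rater):
    """Share of items on which two raters assigned different categories."""
    if len(first_rater) != len(second_rater):
        raise ValueError("both raters must label the same items")
    if not first_rater:
        raise ValueError("no items to compare")
    differing = sum(a != b for a, b in zip(first_rater, second_rater))
    return differing / len(first_rater)


# Hypothetical labels for five papers (illustration only, not the study's data).
author_one = ["regression", "analogy", "expert", "regression", "theory"]
author_two = ["regression", "expert", "expert", "regression", "survey"]
rate = disagreement_rate(author_one, author_two)

# Ratios derived from figures reported on the slides.
sample_share = 30 / 304        # random sample relative to all 304 papers
reclassified_share = 21 / 109  # reread papers that changed category

print(f"hypothetical disagreement: {rate:.0%}")
print(f"sample share: {sample_share:.1%}")
print(f"reclassified share: {reclassified_share:.1%}")
```

A plain disagreement rate does not correct for agreement by chance. The slides do not say whether the authors applied such a correction.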
9 Results
10 Research Questions
- RQ1: Which and how many journals include papers on software cost estimation?
- Method: Determine which journals are the 10 most relevant by the proportion of cost estimation papers they contain.
- Support cost estimation researchers with a list of journals with potentially relevant papers.
11 RQ1 (contd)
- Found 76 journals with software cost estimation papers
- The top 10 journals still included only 2/3 of all the papers identified in this study
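The RQ1 ranking and the "top 10 cover 2/3" observation can be illustrated with a short Python sketch. It assumes one reading of the method: journals are ranked by how many of the identified papers they published, and coverage is the combined share of the top n. The journal labels and the helper `top_journals` are hypothetical and are not the study's data.

```python
from collections import Counter


def top_journals(paper_journals, n=10):
    """Return the n journals with the most papers and their combined share."""
    if not paper_journals:
        raise ValueError("no papers given")
    counts = Counter(paper_journals)
    top = counts.most_common(n)
    covered = sum(count for _, count in top)
    return top, covered / len(paper_journals)


# Hypothetical journal label for each identified paper (illustration only).
papers = ["A"] * 5 + ["B"] * 3 + ["C"] * 2 + ["D", "E"]
top, coverage = top_journals(papers, n=2)

print(top)
print(f"coverage of top 2: {coverage:.0%}")
```

With the study's figures, the same calculation over 304 papers and n=10 would give the roughly 2/3 coverage reported above.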
12 Research Question
- RQ2: To what extent are researchers aware of the breadth of potential estimation study sources?
- Method: Reference lists from 30 randomly selected cost estimation journal papers were analyzed.
- Identify possible shortcomings of cost estimation researchers' searches for related work.
13 RQ2 (contd)
Out of the top 10 most important software cost
estimation journals, on average, only three are
referenced in a typical paper. And 7 out of the
10 were referenced in 3 or less of papers.
14 RQ2 (contd)
15 RQ2 (contd)
Arrows define information flow
16 RQ2 (contd)
Journals outside the software engineering field were practically ignored, even though they contain highly relevant cost estimation data.
Intl Journal of Forecasting
Intl Journal of Project Management
Statistics
17 Research Question
- RQ3: Which journal is the dominant software cost estimation journal? To what extent does this journal have research topic biases?
- Method: Identify which journal contains:
  - A) the most cost estimation papers
  - B) the most references
- Dominant journals have the potential to introduce publication biases, wherein a researcher's focus may be directed towards topics favored by the journal
18 RQ3 (contd)
19 RQ3 (contd)
- The authors compared the distribution of topics within IEEE TSE with the total set of estimation papers
  - At a high level, IEEE TSE cost estimation papers reflect the total set of estimation papers quite well
  - However, the authors have no information on papers rejected by IEEE TSE, so publication biases that aren't readily apparent might still exist
20 Research Question
- RQ4: How easy is it to identify relevant software cost estimation journal papers? (Using digital libraries.)
- Method: Identify the recall rate of cost estimation papers in Google Scholar and Inspec using the search terms:
  - software cost estimation OR software effort estimation
  - software AND (cost OR effort)
- A manual issue-by-issue search of papers is accurate, but very time-consuming and should be replaced with an automated tool.
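The recall rate named in the RQ4 method is the fraction of known relevant papers that a search returns. The Python sketch below uses the manual search result as the reference set, which is an assumption about how recall would be measured. The paper identifiers and the helper `recall_rate` are hypothetical.

```python
def recall_rate(retrieved, relevant):
    """Fraction of the relevant papers that the search actually returned."""
    relevant = set(relevant)
    if not relevant:
        raise ValueError("the reference set of relevant papers is empty")
    return len(relevant & set(retrieved)) / len(relevant)


# Hypothetical paper identifiers (illustration only).
manual_search = {"p1", "p2", "p3", "p4", "p5"}  # treated as the complete set
library_hits = {"p1", "p3", "p4", "p9"}         # p9 is an irrelevant hit
recall = recall_rate(library_hits, manual_search)

print(f"recall: {recall:.0%}")
```

Irrelevant hits such as `p9` do not lower recall. They only add to the workload of screening the results, which is the trade-off the RQ4 slides discuss.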
21 RQ4 (contd)
22 RQ4 (contd)
- However, the most typical reason for missing papers was that the papers used more specific terms, or synonyms for estimation and software, than the search terms did
  - Need for standardized use of search keywords
- Authors suggest that a sufficiently wide search for cost estimation papers with digital libraries can result in a greater workload than a manual search
  - software AND (cost OR effort) resulted in 278,000 records in Google Scholar
23 The Bigger Picture of RQs 1-4
- Researchers need to increase the breadth of their search for relevant studies
  - It is not sufficient to conduct searches in digital libraries or manual searches of the most important journals
- Where completeness is essential to research, manually search for papers in a selected set of journals
- Where completeness is not essential, combine manual searches and digital libraries
24 Research Questions
- RQ5: How many researchers are there who have a long-term interest in software cost estimation? To what extent do the interests of these researchers affect the distribution of research topics?
- Method: Gather data on the authors of the different journal papers, including papers published, recent activity, and topics covered.
- Assess the vulnerability of software cost estimation research.
25 RQ5 (contd)
- Only 13 researchers have more than 5 journal papers published.
  - 9 out of the 13 are still active, with publications between 2000 and 2004.
- The ratio of research topics/estimation approaches to active researchers is high.
  - The active researchers are generally covering a wide spectrum of topics.
26 RQ5 (contd)
- With so few researchers with a long-term focus, topics requiring wide breadths of experience are at risk:
  - Measures of estimation performance
  - Data set properties
- Both require long-term experience but are also essential for creating methods for meaningful analysis and evaluation of cost estimation techniques.
27 Research Question
- RQ6: What are the most investigated software cost estimation research topics, and how has this changed over time?
- Method: Separate papers into those published in 1989 and earlier, 1990-1999, and 2000-2004, and sort them by research topic (e.g., estimation method, organizational issue, measure of performance)
- Identify trends in papers and any shortcomings in research topic focus
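The period-by-topic breakdown in the RQ6 method can be sketched as a simple cross-tabulation in Python. The period boundaries come from the slide. The records, the one-topic-per-paper simplification, and the helper names `period_of` and `topic_counts_by_period` are assumptions made for illustration.

```python
from collections import Counter


def period_of(year):
    """Map a publication year to one of the three periods named on the slide."""
    if year <= 1989:
        return "1989 and earlier"
    if year <= 1999:
        return "1990-1999"
    if year <= 2004:
        return "2000-2004"
    raise ValueError(f"year {year} is outside the reviewed range")


def topic_counts_by_period(papers):
    """Count papers for each (period, topic) pair."""
    return Counter((period_of(year), topic) for year, topic in papers)


# Hypothetical (year, topic) records (illustration only, not the study's data).
papers = [
    (1985, "estimation method"),
    (1994, "estimation method"),
    (1997, "measure of performance"),
    (2002, "estimation method"),
    (2003, "organizational issue"),
    (2003, "estimation method"),
]
counts = topic_counts_by_period(papers)

for (period, topic), n in sorted(counts.items()):
    print(period, "|", topic, "|", n)
```

Replacing the topic with the estimation approach gives the equivalent breakdown for the question about estimation methods.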
28 RQ6 (contd)
29 RQ6 (contd)
30 Research Question
- RQ7: What are the most investigated estimation methods, and how has this changed over time?
- Method: Separate papers into those published in 1989 and earlier, 1990-1999, and 2000-2004, and sort them by estimation approach (e.g., regression, analogy, expert judgement)
- Identify trends in papers and any shortcomings in research topic focus
31 RQ7 (contd)
32 RQ7 (contd)
33 Research Question
- RQ8: What are the most frequently applied research methods, and in what study context? How has this changed over time?
- Method: Examine papers that proposed a new estimation method or evaluated an existing approach.
- Identify trends and shortcomings in research topic focus.
34 RQ8 (contd)
35 RQ8 (contd)
- Historical data does not contain the same realism as industry data
  - Evaluation of historical data depends on the availability of the data, since not all companies keep this information
- The lack of inclusion of conference papers from professionals and professional projects is an important shortcoming in cost estimation research
36 Pros Of This Study
- The eight research questions, used to root out the key underlying trends in software development cost estimation, are the greatest strength of this paper
- All the papers reviewed for the study were uploaded into a freely available database, providing a dense source of relevant information
- The authors' cost estimation database can help younger researchers leapfrog into more complex topics sooner
37 Cons Of This Study
- The extent of the material excluded from the study was the paper's resounding technical weakness
- The final system of classification led to a sufficiently high accuracy, but only one author's results from reclassification were provided . . . leaving one to wonder what the percentage change was between the first and second review
- Papers published by industry conferences were excluded, although they would include past experiences, results, and real-life data from the software industry itself
- Papers from other fields focusing on cost estimation were not used, although they would have provided a great opportunity to apply analogical reasoning to the topic
38 Thoughts
- We think software cost estimation research would be better improved by striving for more collaboration with the software industry
  - Researchers need to move away from the more redundant research elements and into lesser-studied topics, such as those in the software industry
  - The authors themselves mention the idea numerous times, but the lack of conformance on their own part hurts their argument
- Collaboration with the software industry would provide real-world data sets tenfold more relevant and accurate than the historical data sets that are so commonly referenced in papers on the subject matter