SoftLab: Boğaziçi University, Department of Computer Engineering, Software Engineering Research Lab


1
SoftLab
Boğaziçi University, Department of Computer Engineering
Software Engineering Research Lab
http://softlab.boun.edu.tr/
2
Research Challenges

The trend toward large, heterogeneous, distributed software
systems leads to an increase in system complexity.
Software and service productivity lags behind
requirements.
Increased complexity takes software developers further
from stakeholders.
The importance of interoperability, standardisation
and reuse of software is increasing.

3
Research Challenges

Service Engineering
Complex Software Systems
Open Source Software
Software Engineering Research

4
Software Engineering Research Approaches

Balancing theory and praxis
How engineering research differs from scientific
research
The role of empirical studies
Models for SE research

5
The need to link research with practice
Colin Potts, Software Engineering Research
Revisited, IEEE Software, September 1993

Why, after 25 years of SE, has SE research failed
to influence industrial practice and the quality
of the resulting software?
Potts argues that this failure is caused by
treating research and its application by industry
as separate, sequential activities, which he calls
the research-then-transfer approach.
The solution he proposes is the
industry-as-laboratory approach.

6
Research-then-Transfer
[Figure: research-then-transfer. The problem evolves in industry (Problem V1 to V4), invisibly to the research community, while research solutions (Research Solution V1 to V4) are refined incrementally. The wide gulf between the two is bridged by indirect, anecdotal knowledge, and the technology transfer gap is bridged by hard, but frequently inappropriate technology.]
7
Research-then-Transfer Problems

Both research and practice evolve separately
Match between current problems in industry and
research solutions is haphazard
No winners

8
Disadvantages of Research-then-Transfer

Research problems are described and understood in
terms of solution technology, whatever the
current research fashion is. Connection to practice
is tenuous.
Concentration is on technical refinement of the
research solution. This is acceptable, but it lacks
industrial need as a focus, so effort may be misplaced.
Evaluation is difficult because research solutions may
use technology that is not commonly used in
industry.
Delay in evaluation means the problem researchers are
solving has often evolved through changes in
business practice, technology, etc.
Transfer is difficult because industry has little
basis for confidence in the proposed research
solution.

9
Industry-as-Laboratory Approach to SE research
[Figure: industry-as-laboratory. Problem V1, Research Solution V1, Problem V2, Research Solution V2, Problem V3, Research Solution V3, Problem V4 and Research Solution V4 are shown in alternation.]
10
Advantages of Industry-as-Laboratory Approach

Stronger connection at start because knowledge of
problem is acquired from the real practitioners
in industry, often industrial partners in a
research consortium.
Connection is strengthened by practitioners and
researchers constantly interacting to develop the
solution
Early evaluation and usage by industry lessens
the Technology Transfer Gap.
Reliance on empirical research:
Shift from solution-driven SE to problem-focused
SE.
Solve problems that really do matter to
practitioners.

11
Early SEI industrial survey research

What an SEI survey learned from industry:
There was a thin spread of domain knowledge in
most projects.
Customer requirements were extremely volatile.
These findings point towards research that combines
work on requirements engineering with reuse,
instead of having separate SE research communities
study these topics in isolation, as
is still found today!
From "A Field Study of the Software Design
Process for Large Systems", CACM, November 1988.

12
Further Results from Potts et al Early 90s Survey

23 software development organizations (during
1990-92). (Survey focused on Requirements
Modeling process)
Requirements were invented not elicited.
Most development is maintenance.
Most specification is incremental.
Domain knowledge is important.
There is a gulf between the developer and user
User-interface requirements continually change.
There is a preference for office-automation tools
over CASE tools to support development, i.e.
developers found a word processor and a database more useful
than any CASE tool.

13
Industry-as-Laboratory emphasizes Real Case
Studies

Advantages of case studies over studying problems
in research lab.
Scale and complexity - small, simple (even
simplistic) cases avoided - these often bear
little relation to real problems.
Unpredictability - assumptions thrown out as
researchers learn more about real problems
Dynamism - a real case study is more vital than
a textbook account
The real-world complications of industrial case
studies are more likely to throw up
representative problems and phenomena than
research laboratory examples influenced by the
researchers' preconceptions.

14
Need to consider Human/Social Context in SE
research

Not all solutions in software engineering are
solely technical.
There is a need to examine organizational, social
and cognitive factors systematically as well.
Many problems are people problems, and require
people-orientated solutions.

15
Theoretical SE research

While there is still a place for innovative,
purely speculative research in Software
Engineering, research which studies real problems
in partnership with industry needs to be given a
higher profile.
These various forms of research ideally
complement one another.
Neither is particularly successful if it ignores
the other.
Too industrially focused research may lack
adequate theory!
Academically focused research may miss the
practice!

16
Research models for SE

Problem highlighted by Glass:
Most SE research in the 1990s was advocacy
research. Better research models are needed.
The software crisis provided the platform on
which most 90s research was founded.
SE research ignored practice for the most part;
lack of practical application and evaluation were
gaping holes in most SE research.
Appropriate research models for SE are needed.
Robert Glass, "The Software-Research Crisis",
IEEE Software, November 1994

17
Methods underlying Models

Scientific method
Engineering method
Empirical method
Analytical method
From W.R.Adrion, Research Methodology in Software
Engineering, ACM SE Notes, Jan. 1993

18
Scientific method

Observe real world
Propose a model or theory of some real world
phenomena
Measure and analyze above
Validate hypotheses of the model or theory
If possible, repeat
19
Engineering method

Observe existing solutions
Propose better solutions
Build or develop better solution
Measure, analyze, and evaluate
Repeat until no further improvements are possible
20
Empirical method

Propose a model
Develop statistical or other basis for the model
Apply to case studies
Measure and analyze
Validate and then repeat
21
Analytical method

Propose a formal theory or set of axioms
Develop a theory
Derive results
If possible, compare with empirical observations
Refine theory if necessary
22
Need to move away from purely analytical method

The analytical method was the most widely used in
mid-90s SE research, but the others need to be
considered and may be more appropriate in some SE
research.
Good research practice combines elements of all
these approaches.

23
4 important phases for any SE research project
(Glass)

Informational phase - Gather or aggregate
information via
reflection
literature survey
people/organization survey
case studies
Propositional phase - Propose and build
hypothesis, method or algorithm, model, theory or
solution
Analytical phase - Analyze and explore proposal
leading to demonstration and/or formulation of
principle or theory
Evaluation phase - Evaluate proposal or analytic
findings by means of experimentation (controlled)
or observation (uncontrolled, such as case study
or protocol analysis) leading to a substantiated
model, principle, or theory.

24
Software Engineering Research Approaches

The Industry-as-Laboratory approach links theory
and praxis
Engineering research aims to improve existing
processes and/or products
Empirical studies are needed to validate Software
Engineering research
Models for SE research need to shift from the
analytical to the empirical.

25
Empirical SE Research
26
SE Research

Intersection of AI and Software Engineering
An opportunity to use some of the most interesting
computational techniques to solve some of the most
important and rewarding questions.

27
AI Fields, Methods and Techniques
28
What Can We Learn From Each Other?
29
Software Development Reference Model
Intersection of AI and SE Research
Empirical Software Engineering
30
Intersection of AI and SE Research

Build oracles to predict: defects, cost and effort, refactoring
Measure: static code attributes, complexity and call graph structure
Data collection: open repositories (NASA, PROMISE), open source, Softlab Data Repository (SDR)

31
Software Engineering Domain

Classical ML applications
Data miner performance
The more data the better the performance
Little or no meaning behind the numbers, no
interesting stories to tell

32
Software Engineering Domain

Algorithm performance
Understanding Data
Change training data: over-, under-, and micro-sampling
Noise analysis
Increase information content of data
Feature analysis/ weighting
Learn what you will predict later
Cross company vs within company data
Domain Knowledge
SE
ML
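
The sampling options named on this slide can be illustrated with a small sketch. It assumes a binary defective/clean label and reads "micro-sampling" as keeping only a small, equal number of rows from each class; that reading, the function names, and the toy data are assumptions made for illustration, not the lab's implementation.

```python
import random


def split_by_class(rows, is_defective):
    """Separate rows into (defective, clean) lists."""
    defective = [r for r in rows if is_defective(r)]
    clean = [r for r in rows if not is_defective(r)]
    return defective, clean


def under_sample(rows, is_defective, rng):
    """Randomly discard majority-class rows until both classes are equal in size."""
    minority, majority = sorted(split_by_class(rows, is_defective), key=len)
    return minority + rng.sample(majority, len(minority))


def over_sample(rows, is_defective, rng):
    """Randomly repeat minority-class rows until both classes are equal in size."""
    minority, majority = sorted(split_by_class(rows, is_defective), key=len)
    if not minority:
        raise ValueError("over-sampling needs at least one row of each class")
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority + minority + extra


def micro_sample(rows, is_defective, m, rng):
    """Assumed reading of micro-sampling: keep only m rows of each class."""
    defective, clean = split_by_class(rows, is_defective)
    m = min(m, len(defective), len(clean))
    return rng.sample(defective, m) + rng.sample(clean, m)


# Hypothetical data: (module id, defective flag); 3 defective and 17 clean modules.
modules = [(i, i < 3) for i in range(20)]
rng = random.Random(0)
flag = lambda row: row[1]
under = under_sample(modules, flag, rng)
over = over_sample(modules, flag, rng)
micro = micro_sample(modules, flag, 2, rng)
```

All three variants change only the training data, so the learner itself stays untouched and the same test set can be used to compare them.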

33
In Practice

Product quality
Lower defect rates
Less costly testing times
Low maintenance cost
Process quality
Effort and cost estimation
Process improvement

34
Software Engineering Research

Predictive Models
Defect prediction and cost estimation
Bioinformatics
Process Models
Quality Standards
Measurement

35
Major Research Areas

Software Measurement
Defect Prediction/ Estimation
Effort Cost Estimation
Process Improvement (CMM)

36
Defect Prediction

Software development lifecycle
Requirements
Design
Development
Test (takes 50% of overall time)
Detect and correct defects before delivering
software.
Test strategies
Expert judgment
Manual code reviews
Oracles/ Predictors as secondary tools

37
A Testing Workbench
38
Static Code Attributes

void main()
{
    // This is a sample code
    // Declare variables
    int a, b, c;
    // Initialize variables
    a = 2;
    b = 5;
    // Find the sum and display c if greater than zero
    c = sum(a, b);
    if (c < 0)
        printf("%d\n", a);
    return;
}

LOC: lines of code
LOCC: lines of commented code
V: number of unique operands and operators
CC: cyclomatic complexity
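
As a rough illustration of how such attributes can be extracted, the sketch below counts code lines, comment lines, and an approximate cyclomatic complexity for a C-like snippet. It is a simplification under stated assumptions: only `//` line comments are recognised, complexity is approximated as one plus the number of decision tokens, and the operand/operator count V is left out. The function name and the embedded sample are hypothetical.

```python
import re

# Hypothetical C-like sample, similar in spirit to the one on the slide.
SAMPLE = '''void main()
{
    // Declare variables
    int a, b, c;
    a = 2;
    b = 5;
    // Find the sum
    c = sum(a, b);
    if (c < 0)
        printf("%d\\n", a);
    return;
}
'''

# Tokens treated as decision points (an approximation, not a real parser).
DECISION_TOKENS = re.compile(r"\b(?:if|for|while|case)\b|&&|\|\||\?")


def static_attributes(source):
    """Return LOC, commented lines, and an approximate cyclomatic complexity."""
    lines = [ln.strip() for ln in source.splitlines() if ln.strip()]
    comment_lines = [ln for ln in lines if ln.startswith("//")]
    code_lines = [ln for ln in lines if not ln.startswith("//")]
    decisions = sum(len(DECISION_TOKENS.findall(ln)) for ln in code_lines)
    return {"loc": len(code_lines), "locc": len(comment_lines), "cc": decisions + 1}


attributes = static_attributes(SAMPLE)
```

A production metrics tool would work on a parsed syntax tree rather than regular expressions, since keywords inside string literals or block comments would be miscounted here.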
39
Defect Prediction

Machine Learning based models.
Defect density estimation
Defect prediction between versions
Defect prediction for embedded systems

E. Ceylan, O. Kutlubay and A. Bener, "Software Defect Identification Using Machine Learning Techniques", EUROMICRO SEAA, Dubrovnik, Croatia, August 28th - September 1st, 2006.
B. Turhan and O. Kutlubay, "Mining Software Data", Data Mining and Business Intelligence Workshop in ICDE'07, Istanbul, April 2007.
O. Kutlubay, B. Turhan and A. Bener, "A Two-Step Model for Defect Density Estimation", EUROMICRO SEAA, Lübeck, Germany, August 2007.
A.D. Oral and A. Bener, "Defect Prediction for Embedded Software", ISCIS 2007, Ankara, November 2007.
Y. Kastro and A. Bener, "A Defect Prediction Method for Software Versioning", Software Quality Journal (in print).
A. Tosun, B. Turhan, A. Bener and N.I. Ulgur, "Ensemble of Defect Predictors: An Industrial Application in Embedded Systems Domain", ESEM 2008.
B. Turhan, A. Tosun and A. Bener, "An Industrial Application of Classifier Ensembles for Locating Software Defects", submitted to Information and Software Technology Journal, 2008.

40
Constructing Predictors

Baseline: Naive Bayes.
Why? Best reported results so far (Menzies et
al., 2007).
Remove assumptions and construct different
models:
Independent attributes -> multivariate distribution
Attributes of equal importance -> Weighted Naive
Bayes

B. Turhan and A. Bener, "Software Defect Prediction: Heuristics for Weighted Naïve Bayes", ICSOFT 2007, Barcelona, Spain, July 2007.
B. Turhan, "Software Defect Prediction Modeling", IDOESE 2007, Madrid, Spain, September 2007.
"Yazilim Hata Kestirimi için Kaynak Kod Ölçütlerine Dayali Bayes Siniflandirmasi" (in Turkish: "Bayesian Classification Based on Source Code Metrics for Software Defect Prediction"), UYMS 2007, Ankara, September 2007.
B. Turhan and A. Bener, "A Multivariate Analysis of Static Code Attributes for Defect Prediction", QSIC 2007, Portland, USA, October 2007.
B. Turhan and A. Bener, "Weighted Static Code Attributes for Defect Prediction", SEKE 2008, San Francisco, July 2008.
B. Turhan and A. Bener, "Analysis of Naive Bayes' Assumptions on Software Fault Data: An Empirical Study", Data and Knowledge Engineering Journal, 2008, in print.
B. Turhan, A. Tosun and A. Bener, "An Industrial Application of Classifier Ensembles for Locating Software Defects", submitted to Data and Knowledge Engineering Journal, 2008.
B. Turhan, A. Bener and G. Kocak, "Data Mining Source Code for Locating Software Bugs: A Case Study in Telecommunication Industry", submitted to Expert Systems with Applications Journal, 2008.
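
The Naive Bayes baseline and the weighted variant named on this slide can be sketched in a few lines. This is a minimal illustration, assuming Gaussian likelihoods per attribute and a weighting scheme in which each attribute's log-likelihood is multiplied by its weight; the function names, the weights, and the toy data are hypothetical, and the multivariate variant is not shown.

```python
import math


def fit(rows, labels):
    """Estimate a class prior and per-attribute mean/variance for each class."""
    model = {}
    for cls in sorted(set(labels)):
        members = [r for r, y in zip(rows, labels) if y == cls]
        stats = []
        for column in zip(*members):
            mean = sum(column) / len(column)
            variance = sum((v - mean) ** 2 for v in column) / len(column)
            stats.append((mean, variance + 1e-9))  # small floor avoids division by zero
        model[cls] = (len(members) / len(rows), stats)
    return model


def log_gaussian(x, mean, variance):
    """Log density of a normal distribution at x."""
    return -0.5 * math.log(2 * math.pi * variance) - (x - mean) ** 2 / (2 * variance)


def predict(model, row, weights=None):
    """Pick the class with the highest (optionally attribute-weighted) log posterior."""
    if weights is None:
        weights = [1.0] * len(row)  # plain Naive Bayes: all attributes count equally
    scores = {}
    for cls, (prior, stats) in model.items():
        likelihood = sum(w * log_gaussian(x, m, v)
                         for w, x, (m, v) in zip(weights, row, stats))
        scores[cls] = math.log(prior) + likelihood
    return max(scores, key=scores.get)


# Hypothetical modules described by [lines of code, cyclomatic complexity];
# label 1 marks a defective module.
rows = [[10, 1], [12, 2], [11, 1], [200, 15], [220, 18], [210, 14]]
labels = [0, 0, 0, 1, 1, 1]
model = fit(rows, labels)
small_module = predict(model, [15, 2])
large_module = predict(model, [205, 16], weights=[1.0, 0.5])
```

With all weights equal to one the weighted form reduces to the baseline, which makes the two models directly comparable on the same data.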

41
WC vs CC Data for Defects?

When to use within-company (WC) or cross-company (CC) data?
How much data do we need to construct a model?
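
One way to make cross-company data more relevant to a local project is nearest-neighbor sampling, which is the subject of one of the papers listed for this slide. The sketch below is only an assumed reading of that idea: for every within-company row, keep the k closest cross-company rows by Euclidean distance and train on their union. The function name, the value of k, and the data are hypothetical.

```python
import math


def nn_filter(cc_rows, wc_rows, k):
    """Keep the cross-company rows that are among the k nearest to any within-company row."""
    selected = set()
    for target in wc_rows:
        ranked = sorted(range(len(cc_rows)),
                        key=lambda i: math.dist(cc_rows[i], target))
        selected.update(ranked[:k])
    return [cc_rows[i] for i in sorted(selected)]


# Hypothetical attribute vectors for cross-company and within-company modules.
cross_company = [[0, 0], [1, 1], [10, 10], [50, 50], [11, 9]]
within_company = [[0, 1], [10, 9]]
training_set = nn_filter(cross_company, within_company, k=2)
```

In practice the attributes would usually be normalised first, so that no single metric dominates the distance.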

T. Menzies, B. Turhan, A. Bener, G. Gay, B. Cukic and Y. Jiang, "Implications of Ceiling Effects in Defect Predictors", PROMISE 2008, Leipzig, Germany, May 2008.
B. Turhan, A. Bener and T. Menzies, "Nearest Neighbor Sampling for Cross Company Defect Predictors", DEFECTS 2008, Seattle, USA, July 2008.
B. Turhan, T. Menzies, A. Bener and J. Distefano, "On the Relative Value of Cross-company and Within-Company Data for Defect Prediction", Empirical Software Engineering Journal, 2008, in print.
T. Menzies, Z. Milton, B. Turhan, Y. Jiang, G. Gay, B. Cukic and A. Bener, "Overcoming Ceiling Effects in Defect Prediction", submitted to IEEE Transactions on Software Engineering, 2008.
42
Module Structure vs Defect Rate

Fan-in, fan-out
Page Rank Algorithm
Dependency graph information
small is beautiful
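
Fan-in, fan-out and a PageRank-style score can all be derived from the same call graph. The sketch below is a generic power-iteration PageRank, not necessarily the ranking used in the papers cited on this slide; the damping factor of 0.85, the direction of rank flow (caller to callee), and the module names are assumptions.

```python
def graph_nodes(calls):
    """All modules that appear as a caller or a callee."""
    return sorted(set(calls) | {c for callees in calls.values() for c in callees})


def fan_in_out(calls):
    """Count distinct callers (fan-in) and distinct callees (fan-out) per module."""
    nodes = graph_nodes(calls)
    fan_out = {n: len(set(calls.get(n, []))) for n in nodes}
    fan_in = {n: sum(1 for callees in calls.values() if n in callees) for n in nodes}
    return fan_in, fan_out


def page_rank(calls, damping=0.85, iterations=50):
    """Power-iteration PageRank where rank flows from caller to callee."""
    nodes = graph_nodes(calls)
    rank = {n: 1 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            callees = calls.get(n, [])
            if callees:
                share = damping * rank[n] / len(callees)
                for callee in callees:
                    new_rank[callee] += share
            else:
                # A module that calls nothing spreads its rank over all modules.
                for m in nodes:
                    new_rank[m] += damping * rank[n] / len(nodes)
        rank = new_rank
    return rank


# Hypothetical call graph: caller -> list of callees.
calls = {"main": ["parse", "log"], "parse": ["log"], "report": ["log"]}
fan_in, fan_out = fan_in_out(calls)
ranks = page_rank(calls)
```

Here `log` is called by every other module, so it ends up with both the highest fan-in and the highest rank.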

G. Koçak, B. Turhan and A. Bener, "Software Defect Prediction Using Call Graph Based Ranking Algorithm", Euromicro 2008.
G. Kocak, B. Turhan and A. Bener, "Predicting Defects in a Large Telecommunication System", ICSOFT'08.
43
COST ESTIMATION

Cost estimation: predicting the effort required
to develop a new software project.
Effort: the number of months one person would
need to develop a given project (person-months,
PM).
CE assists project managers when they make
important decisions (bidding, planning, resource
allocation):
Underestimation leads to approving projects that would
then exceed their budgets.
Overestimation leads to a waste of resources.
Modeling accurate, robust cost estimators
Successful software project management
44
COST ESTIMATION

Understanding the data structure?
Cross- vs. within-application domain: embedded
software domain
Better predictor?
Point estimation: a single effort value is
estimated
Interval estimation: an effort interval is
estimated
COST CLASSIFICATION

dynamic intervals
classification algorithms
point estimates
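
Treating cost estimation as classification can be illustrated as follows. The sketch assumes that "dynamic intervals" means intervals derived from the data (here, equal-frequency bins over the training efforts), uses a 1-nearest-neighbor classifier as a stand-in for any classification algorithm, and turns a predicted interval back into a point estimate by taking the median effort of that interval. All names and numbers are hypothetical.

```python
import math
import statistics


def make_intervals(efforts, bins):
    """Split the sorted efforts into equal-frequency (low, high) intervals."""
    ordered = sorted(efforts)
    size = math.ceil(len(ordered) / bins)
    chunks = [ordered[i:i + size] for i in range(0, len(ordered), size)]
    return [(chunk[0], chunk[-1]) for chunk in chunks]


def interval_of(effort, intervals):
    """Index of the first interval whose upper bound covers the effort."""
    for index, (_, high) in enumerate(intervals):
        if effort <= high:
            return index
    return len(intervals) - 1


def classify(features, train_features, train_efforts, intervals):
    """1-nearest-neighbor classifier: predict the interval of the closest project."""
    nearest = min(range(len(train_features)),
                  key=lambda i: math.dist(features, train_features[i]))
    return interval_of(train_efforts[nearest], intervals)


def point_estimate(interval_index, train_efforts, intervals):
    """Median training effort inside the predicted interval."""
    members = [e for e in train_efforts if interval_of(e, intervals) == interval_index]
    return statistics.median(members)


# Hypothetical projects: one size feature each, effort in person-months.
train_features = [[1], [2], [3], [10], [12], [14]]
train_efforts = [5, 6, 8, 40, 45, 60]
intervals = make_intervals(train_efforts, bins=2)
predicted = classify([11], train_features, train_efforts, intervals)
estimate = point_estimate(predicted, train_efforts, intervals)
```

Reporting the interval alongside the point estimate gives a project manager a sense of the spread, which a single number hides.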
45
COST ESTIMATION

How can we achieve accurate estimates with a
limited amount of effort data?
Feature subset selection: save the cost of
extracting less important features.
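
A simple filter is one possible form of feature subset selection. The sketch below ranks features by the absolute Pearson correlation between each feature and the recorded effort, then keeps the top few; this particular ranking criterion, the feature names, and the data are assumptions for illustration and are not taken from the slides.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread = math.sqrt(sum((x - mean_x) ** 2 for x in xs)
                       * sum((y - mean_y) ** 2 for y in ys))
    return covariance / spread if spread else 0.0


def rank_features(rows, efforts, names):
    """Order feature names from most to least correlated with effort."""
    scores = {name: abs(pearson([row[i] for row in rows], efforts))
              for i, name in enumerate(names)}
    return sorted(scores, key=scores.get, reverse=True)


def select_features(rows, efforts, names, keep):
    """Keep only the highest-ranked features."""
    return rank_features(rows, efforts, names)[:keep]


# Hypothetical projects: [size in KLOC, team size, unrelated attribute].
names = ["kloc", "team", "noise"]
rows = [[10, 3, 7], [20, 4, 1], [30, 6, 5], [40, 7, 2]]
efforts = [12, 25, 33, 48]
ranking = rank_features(rows, efforts, names)
selected = select_features(rows, efforts, names, keep=2)
```

Features that fall below the cut no longer need to be collected for new projects, which is where the saving in extraction cost comes from.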

46
Cost Estimation

Comparison of ML based models with parametric
models
Feature ranking
COCOMO81, COCOMO2, COQUALMO
Cost estimation as a classification problem
(interval prediction)

B. Turhan and O. Kutlubay, "Mining Software Data", Data Mining and Business Intelligence Workshop in ICDE'07, Istanbul, April 2007.
B. Baskeles, B. Turhan and A. Bener, "Software Effort Estimation Using Machine Learning Methods", ISCIS 2007, Ankara, November 2007.
B. Turhan, O. Kutlubay and A. Bener, "Evaluation of Feature Extraction Methods on Software Cost Estimation", ESEM 2007, Madrid, Spain, September 2007.
Y. Kültür, B. Turhan and A. Bener, "ENNA: Software Effort Estimation Using Ensemble of Neural Networks with Associative Memory", FSE 2008.
A. Bakir, B. Turhan and A. Bener, "Software Cost Estimation as a Classification Problem", ICSOFT 2008.
B. Turhan, A. Bakir and A. Bener, "A Comparative Study for Estimating Software Development Effort Intervals", submitted to Knowledge Based Systems Journal, 2008.
B. Turhan, Y. Kultur and A. Bener, "Ensemble of Neural Networks with Associative Memory (ENNA) for Estimating Software Development Costs", submitted to Knowledge Based Systems Journal, 2008.
A. Tosun, B. Turhan and A. Bener, "Feature Weighting Heuristics for Analogy Based Effort Estimation Models", submitted to Expert Systems with Applications, 2007.
A. Bakir, B. Turhan and A. Bener, "A New Perspective on Data Homogeneity for Software Cost Estimation", submitted to Software Quality Journal, 2008.
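
For readers unfamiliar with the parametric side of this comparison, the basic COCOMO 81 model estimates effort in person-months as a power function of program size. The sketch below uses the coefficients from Boehm's published basic model; they are not given on the slides, and the function name is hypothetical.

```python
# Basic COCOMO 81 coefficients (a, b) per development mode, from Boehm's model:
# effort in person-months = a * KLOC ** b
COCOMO81_BASIC = {
    "organic": (2.4, 1.05),
    "semi-detached": (3.0, 1.12),
    "embedded": (3.6, 1.20),
}


def basic_cocomo_effort(kloc, mode="organic"):
    """Estimated effort in person-months for a project of the given size in KLOC."""
    a, b = COCOMO81_BASIC[mode]
    return a * kloc ** b


organic_10k = basic_cocomo_effort(10, "organic")
embedded_10k = basic_cocomo_effort(10, "embedded")
```

A machine learning model would instead learn the relation between project attributes and effort from historical data, which is the comparison the slide refers to.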

47
Prest

A tool developed by Softlab
Parser
C, Java, C++, JSP
Metric Collection
Data Analysis

48
Data Sources

Public Datasets
NASA (IV&V Facility, Metrics Program)
PROMISE (Software Engineering Repository)
Includes Softlab data now
Open Source Projects (Sourceforge, Linux, etc.)
Internet based small datasets
University of Southern California (USC) Dataset
Desharnais Dataset
ISBSG Dataset
NASA COCOMO and NASA 93 Datasets
Softlab Data Repository (SDR)
Local industry collaboration
Total 20 companies, 25 projects over 5 years

49
Process Automation

UML Refactoring
Class diagram source code
Tool
Algorithm (graph based)
What needs to be refactored
Complexity vs call graphs

Y. Kösker and A. Bener, "Synchronization of UML Based Refactoring with Graph Transformation", SEKE 2007, Boston, July 9-11, 2007.
B. Turhan, Y. Kosker and A. Bener, "An Expert System for Determining Candidate Software Classes for Refactoring", submitted to Expert Systems with Applications Journal, 2008.
Y. Kosker, A. Bener and B. Turhan, "Refactoring Prediction Using Class Complexity Metrics", ICSOFT'08, 2008.
B. Turhan, A. Bener and Y. Kosker, "Tekrar Tasarim Gerektiren Siniflarin Karmasiklik Olcutleri Kullanilarak Modellenmesi" (in Turkish: "Modeling Classes That Require Redesign Using Complexity Metrics"), 2. Ulusal Yazilim Mimarisi Konferansi (UYMK'08, 2nd National Software Architecture Conference), 2008.
50
Process Improvement and Assessment

A Case in health care industry
Process Improvement with CMMI
Requirements Management
Change Management
Comparison: a before-and-after evaluation
Lessons Learned

A. Tosun, B. Turhan and A. Bener, "The Benefits of a Software Quality Improvement Project in a Medical Software Company: A Before and After Comparison", invited paper and keynote speech, International Symposium on Health Informatics and Bioinformatics (HIBIT'08), 2008.

51
Metrics Program in Telecom

Metrics extraction from 25 Java applications
Static code attributes (McCabe, Halstead and LOC
metrics)
CallGraph information (caller-callee relation
between modules)
Information from four versions (one version every two
weeks)
Product and test defects (pre-release defects)
Various experimental designs for predicting
fault-prone files:
Discard versions: treat all applications in a
version as a single project
Predict fault-prone parts of each application
Using previous versions of all the applications
Using previous versions of the selected
application
Additionally,
Optimization of the local prediction model using
CallGraph metric
Refactoring prediction using class complexity
metrics

52
[Figure: lifecycle phases paired with research topics. Requirements Analysis: matching requirements with defects. Design: call graph / refactoring. Coding: test driven development. Test: defect prediction. Maintenance: refactoring.]
53
Emerging Research Topics

Adding organizational factors to local prediction
model
Information about the development team,
experience, coding practices, etc.
Adding file metrics from version history
Modified/added/deleted lines of code
Selecting only modified files from each version
in the prediction model
Confidence Factor
Using time factors
Dynamic prediction: constructing a model
for each application in a version,
for each module/package in an application,
for each developer by learning from his/her
coding habits
TDD
Measuring test coverage
Defect proneness
Company wide implementation process
Embedded systems
Cost/ Effort Estimation
Dynamic estimation per process
Bioinformatics
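
The version-history metrics mentioned on this slide (modified, added and deleted lines of code) can be approximated by diffing two versions of a file. The sketch below uses Python's standard `difflib`; counting a replaced block as "modified" up to the shorter side of the replacement is an assumption of this sketch, as are the function name and the sample text.

```python
import difflib


def churn(old_text, new_text):
    """Count added, deleted and modified lines between two versions of a file."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    counts = {"added": 0, "deleted": 0, "modified": 0}
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        removed, inserted = i2 - i1, j2 - j1
        if tag == "insert":
            counts["added"] += inserted
        elif tag == "delete":
            counts["deleted"] += removed
        elif tag == "replace":
            # Lines replaced one-for-one count as modified; the rest as added or deleted.
            common = min(removed, inserted)
            counts["modified"] += common
            counts["added"] += inserted - common
            counts["deleted"] += removed - common
    return counts


# Hypothetical file contents for two consecutive versions.
version_1 = "a\nb\nc"
version_2 = "a\nx\nc\nd"
file_churn = churn(version_1, version_2)
```

A file with zero churn between versions is unchanged, which is one way to select only the modified files for the prediction model.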
