1
SoftLab, Boğaziçi University, Department of Computer Engineering, Software Engineering Research Lab
http://softlab.boun.edu.tr/
2
Research Challenges
  • The trend towards large, heterogeneous, distributed software systems leads to an increase in system complexity
  • Software and service productivity lags behind requirements
  • Increased complexity takes software developers further from stakeholders
  • The importance of interoperability, standardisation and reuse of software is increasing.

3
Research Challenges
  • Service Engineering
  • Complex Software Systems
  • Open Source Software
  • Software Engineering Research

4
Software Engineering Research Approaches
  • Balancing theory and praxis
  • How engineering research differs from scientific
    research
  • The role of empirical studies
  • Models for SE research

5
The need to link research with practice
Colin Potts, "Software Engineering Research Revisited", IEEE Software, September 1993
  • Why, after 25 years of SE, has SE research failed to influence industrial practice and the quality of the resulting software?
  • Potts argues that this failure is caused by treating research and its application by industry as separate, sequential activities.
  • He calls this the research-then-transfer approach; the solution he proposes is the industry-as-laboratory approach.

6
Research-then-Transfer
(Diagram: problem versions V1-V4 evolve invisibly to the research community on one side, while research solutions V1-V4 are incrementally refined on the other; a wide gulf between them is bridged only by indirect, anecdotal knowledge, and the technology transfer gap is bridged by hard, but frequently inappropriate, technology.)
7
Research-then-Transfer Problems
  • Both research and practice evolve separately
  • Match between current problems in industry and
    research solutions is haphazard
  • No winners

8
Disadvantages of Research-then-Transfer
  • Research problems are described and understood in terms of solution technology (whatever is the current research fashion); the connection to practice is tenuous.
  • Concentration is on technical refinement of the research solution; this is fine in itself, but without industrial need as a focus, effort may be misplaced.
  • Evaluation is difficult, as research solutions may use technology that is not commonly used in industry.
  • Delay in evaluation means the problem researchers are solving has often evolved through changes in business practice, technology, etc.
  • Transfer is difficult because industry has little basis for confidence in the proposed research solution.

9
Industry-as-Laboratory Approach to SE research
(Diagram: problem versions V1-V4 and research solution versions V1-V4 evolve together as directly linked pairs.)
10
Advantages of Industry-as-Laboratory Approach
  • Stronger connection at the start, because knowledge of the problem is acquired from the real practitioners in industry, often industrial partners in a research consortium.
  • The connection is strengthened by practitioners and researchers constantly interacting to develop the solution.
  • Early evaluation and usage by industry lessens the technology transfer gap.
  • Reliance on empirical research
  • A shift from solution-driven SE to problem-focused SE
  • Solve problems that really do matter to practitioners

11
Early SEI industrial survey research
  • What an SEI survey learned from industry:
  • There was a thin spread of domain knowledge in most projects.
  • Customer requirements were extremely volatile.
  • These findings point towards research combining work on requirements engineering with reuse, instead of researching these topics in separate SE research communities, as is still found today!
  • From B. Curtis, H. Krasner and N. Iscoe, "A Field Study of the Software Design Process for Large Systems", CACM, November 1988.

12
Further Results from Potts et al. Early-90s Survey
  • 23 software development organizations (during 1990-92); the survey focused on the requirements modeling process.
  • Requirements were invented, not elicited.
  • Most development is maintenance.
  • Most specification is incremental.
  • Domain knowledge is important.
  • There is a gulf between the developer and the user.
  • User-interface requirements continually change.
  • There is a preference for office-automation tools over CASE tools to support development, i.e. developers found a word processor and a database more useful than any CASE tools.

13
Industry-as-Laboratory Emphasizes Real Case Studies
  • Advantages of case studies over studying problems in the research lab:
  • Scale and complexity: small, simple (even simplistic) cases are avoided; these often bear little relation to real problems.
  • Unpredictability: assumptions are thrown out as researchers learn more about real problems.
  • Dynamism: a real case study is more vital than a textbook account.
  • The real-world complications of industrial case studies are more likely to throw up representative problems and phenomena than research laboratory examples influenced by the researchers' preconceptions.

14
Need to consider Human/Social Context in SE
research
  • Not all solutions in software engineering are
    solely technical.
  • There is a need to examine organizational, social
    and cognitive factors systematically as well.
  • Many problems are people problems, and require people-oriented solutions.

15
Theoretical SE research
  • While there is still a place for innovative,
    purely speculative research in Software
    Engineering, research which studies real problems
    in partnership with industry needs to be given a
    higher profile.
  • These various forms of research ideally
    complement one another.
  • Neither is particularly successful if it ignores
    the other.
  • Too industrially focused research may lack
    adequate theory!
  • Academically focused research may miss the
    practice!

16
Research models for SE
  • Problem highlighted by Glass:
  • Most SE research in the 1990s was "advocacy research"; better research models are needed.
  • The software crisis provided the platform on which most 90s research was founded.
  • SE research ignored practice; for the most part, the lack of practical application and evaluation left gaping holes in most SE research.
  • Appropriate research models for SE are needed.
  • Robert Glass, "The Software-Research Crisis", IEEE Software, November 1994

17
Methods underlying Models
  • Scientific method
  • Engineering method
  • Empirical method
  • Analytical method
  • From W. R. Adrion, "Research Methodology in Software Engineering", ACM SE Notes, January 1993

18
Scientific method

  1. Observe the real world
  2. Propose a model or theory of some real-world phenomena
  3. Measure and analyze the above
  4. Validate the hypotheses of the model or theory
  5. If possible, repeat
19
Engineering method

  1. Observe existing solutions
  2. Propose better solutions
  3. Build or develop the better solution
  4. Measure, analyze, and evaluate
  5. Repeat until no further improvements are possible
20
Empirical method

  1. Propose a model
  2. Develop a statistical or other basis for the model
  3. Apply to case studies
  4. Measure and analyze
  5. Validate, then repeat
21
Analytical method

  1. Propose a formal theory or set of axioms
  2. Develop the theory
  3. Derive results
  4. If possible, compare with empirical observations
  5. Refine the theory if necessary
22
Need to move away from a purely analytical method
  • The analytical method was the most widely used in mid-90s SE research, but the others need to be considered and may be more appropriate for some SE research.
  • Good research practice combines elements of all these approaches.

23
4 important phases for any SE research project (Glass)
  • Informational phase: gather or aggregate information via reflection, literature survey, people/organization survey, or case studies.
  • Propositional phase: propose and build a hypothesis, method or algorithm, model, theory or solution.
  • Analytical phase: analyze and explore the proposal, leading to a demonstration and/or the formulation of a principle or theory.
  • Evaluation phase: evaluate the proposal or analytic findings by means of experimentation (controlled) or observation (uncontrolled, such as a case study or protocol analysis), leading to a substantiated model, principle, or theory.

24
Software Engineering Research Approaches
  • The Industry-as-Laboratory approach links theory
    and praxis
  • Engineering research aims to improve existing
    processes and/or products
  • Empirical studies are needed to validate Software
    Engineering research
  • Models for SE research need to shift from the analytical to the empirical.

25
Empirical SE Research
26
SE Research
  • The intersection of AI and Software Engineering
  • An opportunity to use some of the most interesting computational techniques to solve some of the most important and rewarding questions

27
AI Fields, Methods and Techniques
28
What Can We Learn From Each Other?
29
Software Development Reference Model
Intersection of AI and SE Research
Empirical Software Engineering
30
Intersection of AI and SE Research
  • Build oracles to predict: defects, cost and effort, refactoring
  • Measure: static code attributes, complexity and call graph structure
  • Data collection: open repositories (NASA, PROMISE), open source, Softlab Data Repository (SDR)

31
Software Engineering Domain
  • Classical ML applications
  • Data miner performance: the more data, the better the performance
  • Little or no meaning behind the numbers, no interesting stories to tell

32
Software Engineering Domain
  • Algorithm performance
  • Understanding the data
  • Change the training data: over-/under-/micro-sampling (a minimal resampling sketch follows this list)
  • Noise analysis
  • Increase the information content of the data
  • Feature analysis/weighting
  • Learn what you will predict later
  • Cross-company vs. within-company data
  • Domain knowledge: SE and ML
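Changing the training data by over- or under-sampling is one way to deal with the heavy class imbalance typical of defect data (defective modules are rare). A minimal sketch of random over-sampling using only NumPy; the toy data and the 1:1 target balance are illustrative assumptions, not the group's published experimental setup.

    import numpy as np

    def random_oversample(X, y, rng=None):
        # Duplicate minority-class rows until both classes are the same size.
        # Assumes binary labels in y and a 2-D feature matrix X.
        if rng is None:
            rng = np.random.default_rng(0)
        classes, counts = np.unique(y, return_counts=True)
        minority = classes[np.argmin(counts)]
        extra = rng.choice(np.flatnonzero(y == minority),
                           size=counts.max() - counts.min(), replace=True)
        keep = np.concatenate([np.arange(len(y)), extra])
        return X[keep], y[keep]

    # Hypothetical toy data: 10 clean modules, 2 defective ones.
    X = np.random.default_rng(1).normal(size=(12, 3))
    y = np.array([0] * 10 + [1] * 2)
    X_bal, y_bal = random_oversample(X, y)
    print(np.bincount(y_bal))  # -> [10 10]

Under-sampling would instead drop majority-class rows; either way, the resampling is applied to the training split only.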

33
In Practice
  • Product quality
  • Lower defect rates
  • Less costly testing times
  • Low maintenance cost
  • Process quality
  • Effort and cost estimation
  • Process improvement

34
Software Engineering Research
  • Predictive Models
  • Defect prediction and cost estimation
  • Bioinformatics
  • Process Models
  • Quality Standards
  • Measurement

35
Major Research Areas
  • Software Measurement
  • Defect Prediction/ Estimation
  • Effort/Cost Estimation
  • Process Improvement (CMM)

36
Defect Prediction
  • Software development lifecycle
  • Requirements
  • Design
  • Development
  • Test (takes 50% of overall time)
  • Detect and correct defects before delivering
    software.
  • Test strategies
  • Expert judgment
  • Manual code reviews
  • Oracles/ Predictors as secondary tools

37
A Testing Workbench
38
Static Code Attributes
  • void main() {
  •   // This is a sample code
  •   // Declare variables
  •   int a, b, c;
  •   // Initialize variables
  •   a = 2;
  •   b = 5;
  •   // Find the sum and display c if greater than zero
  •   c = sum(a, b);
  •   if (c < 0)
  •     printf("%d\n", a);
  •   return;
  • }

LOC: Lines of Code; LOCC: Lines of Commented Code; V: number of unique operands/operators (Halstead); CC: Cyclomatic Complexity
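For illustration, a rough sketch of how line-based attributes such as LOC and commented LOC could be counted for C-like source. Real metrics extractors (such as the Prest tool mentioned later) parse the code properly and also compute the Halstead and McCabe measures; the regular-expression branch count below is only a crude cyclomatic-complexity proxy.

    import re

    def simple_metrics(source: str) -> dict:
        # Crude illustration: count lines, comment lines and decision points.
        lines = [l.strip() for l in source.splitlines() if l.strip()]
        loc = len(lines)
        locc = sum(1 for l in lines if l.startswith("//") or l.startswith("/*"))
        # Decision points as a rough proxy: CC ~ number of branches + 1
        branches = len(re.findall(r"\b(?:if|for|while|case)\b", source))
        branches += source.count("&&") + source.count("||")
        return {"LOC": loc, "LOCC": locc, "CC_approx": branches + 1}

    sample_c = """
    void main() {
        // initialize variables
        int a = 2, b = 5, c;
        c = sum(a, b);
        if (c < 0)
            printf("%d\\n", a);
    }
    """
    print(simple_metrics(sample_c))  # {'LOC': 7, 'LOCC': 1, 'CC_approx': 2}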
39
Defect Prediction
  • Machine Learning based models.
  • Defect density estimation
  • Defect prediction between versions
  • Defect prediction for embedded systems
  • Software Defect Identification Using Machine
    Learning Techniques, E. Ceylan, O. Kutlubay, A.
    Bener, EUROMICRO SEAA, Dubrovnik, Croatia, August
    28th - September 1st, 2006
  • "Mining Software Data", B. Turhan and O.
    Kutlubay, Data Mining and Business Intelligence
    Workshop in ICDE'07 , Istanbul, April 2007
  • "A Two-Step Model for Defect Density Estimation",
    O. Kutlubay, B. Turhan and A. Bener, EUROMICRO
    SEAA, Lübeck, Germany, August 2007
  • Defect Prediction for Embedded Software, A.D.
    Oral and A. Bener, ISCIS 2007, Ankara, November
    2007
  • "A Defect Prediction Method for Software
    Versioning", Y. Kastro and A. Bener, Software
    Quality Journal (in print).
  • "Ensemble of Defect Predictors: An Industrial Application in Embedded Systems Domain", A. Tosun, B. Turhan, A. Bener, and N.I. Ulgur, ESEM 2008.
  • B.Turhan, A. Tosun and A. Bener, "An Industrial
    Application of Classifier Ensembles for Locating
    Software Defects". Submitted to Information and
    Software Technology Journal, 2008.

40
Constructing Predictors
  • Baseline: Naive Bayes.
  • Why? Best reported results so far (Menzies et al., 2007)
  • Remove assumptions and construct different models (a minimal baseline sketch follows the references below):
  • Independent attributes -> multivariate distributions
  • Attributes of equal importance -> weighted Naive Bayes
  • "Software Defect Prediction Heuristics for
    Weighted Naïve Bayes", B. Turhan and A. Bener,
    ICSOFT2007, Barcelona, Spain, July 2007.
  • Software Defect Prediction Modeling, B. Turhan,
    IDOESE 2007, Madrid, Spain, September 2007
  • "Yazilim Hata Kestirimi için Kaynak Kod Ölçütlerine Dayali Bayes Siniflandirmasi" (Bayes classification based on source code metrics for software defect prediction; in Turkish), UYMS 2007, Ankara, September 2007
  • A Multivariate Analysis of Static Code
    Attributes for Defect Prediction, B. Turhan and
    A. Bener QSIC 2007, Portland, USA, October 2007.
  • Weighted Static Code Attributes for Defect
    Prediction, B.Turhan and A. Bener, SEKE 2008,
    San Francisco, July 2008.
  • B. Turhan and A. Bener, "Analysis of Naive Bayes' Assumptions on Software Fault Data: An Empirical Study", Data and Knowledge Engineering Journal, 2008, in print.
  • B.Turhan, A. Tosun and A. Bener, "An Industrial
    Application of Classifier Ensembles for Locating
    Software Defects". Submitted to Data and
    Knowledge Engineering Journal, 2008.
  • B. Turhan, A. Bener and G. Kocak, "Data Mining Source Code for Locating Software Bugs: A Case Study in Telecommunication Industry", submitted to Expert Systems with Applications Journal, 2008.
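The baseline above is a Naive Bayes predictor over static code attributes. A minimal sketch of such a baseline using scikit-learn's GaussianNB; the synthetic attribute values, labels and evaluation setup are illustrative assumptions, not the published experiments, and the slide's weighted/multivariate variants are only noted in comments.

    import numpy as np
    from sklearn.naive_bayes import GaussianNB
    from sklearn.model_selection import cross_val_score

    # Hypothetical data: rows are modules, columns are static code attributes
    # (e.g. LOC, cyclomatic complexity, Halstead volume); label 1 = defective.
    rng = np.random.default_rng(42)
    X = rng.normal(size=(200, 3))
    y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 1).astype(int)

    # Baseline: Naive Bayes, the setting reported as best by Menzies et al. (2007).
    model = GaussianNB()
    print("AUC per fold:", cross_val_score(model, X, y, cv=5, scoring="roc_auc").round(2))

    # The slide's variants would replace this baseline:
    #  - drop the independence assumption -> model a multivariate distribution
    #  - drop equal attribute importance  -> weight attributes (weighted Naive Bayes)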

41
WC vs CC Data for Defects?
  • When to use WC (within-company) or CC (cross-company) data?
  • How much data do we need to construct a model? (A sketch of nearest-neighbour filtering of CC data follows the references below.)

"Implications of Ceiling Effects in Defect Predictors", T. Menzies, B. Turhan, A. Bener, G. Gay, B. Cukic, Y. Jiang, PROMISE 2008, Leipzig, Germany, May 2008.
"Nearest Neighbor Sampling for Cross Company Defect Predictors", B. Turhan, A. Bener, T. Menzies, DEFECTS 2008, Seattle, USA, July 2008.
"On the Relative Value of Cross-company and Within-Company Data for Defect Prediction", B. Turhan, T. Menzies, A. Bener, J. Distefano, Empirical Software Engineering Journal, 2008, in print.
T. Menzies, Z. Milton, B. Turhan, Y. Jiang, G. Gay, B. Cukic, A. Bener, "Overcoming Ceiling Effects in Defect Prediction", submitted to IEEE Transactions on Software Engineering, 2008.
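One concrete idea from the cross-company (CC) vs. within-company (WC) work cited above is to filter CC data, keeping only the instances closest to the local WC data before training. A minimal sketch of such a nearest-neighbour filter, assuming NumPy arrays of static code attributes; the value of k and the Euclidean distance are illustrative choices.

    import numpy as np

    def nn_filter(cc_X, wc_X, k=10):
        # For each within-company instance, keep its k nearest cross-company
        # instances (Euclidean distance); train on the union of the selections.
        selected = set()
        for row in wc_X:
            dist = np.linalg.norm(cc_X - row, axis=1)
            selected.update(np.argsort(dist)[:k].tolist())
        return np.array(sorted(selected))

    # Hypothetical data: 500 CC modules, 30 WC modules, 3 attributes each.
    rng = np.random.default_rng(0)
    cc_X = rng.normal(size=(500, 3))
    wc_X = rng.normal(loc=0.3, size=(30, 3))
    idx = nn_filter(cc_X, wc_X)
    print(f"training on {len(idx)} of {len(cc_X)} cross-company instances")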
42
Module Structure vs Defect Rate
  • Fan-in, fan-out
  • PageRank algorithm (a small illustration follows the references below)
  • Dependency graph information
  • "Small is beautiful"

"Software Defect Prediction Using Call Graph Based Ranking Algorithm", G. Koçak, B. Turhan, A. Bener, EUROMICRO 2008.
"Predicting Defects in a Large Telecommunication System", G. Kocak, B. Turhan and A. Bener, ICSOFT'08.
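The call-graph ranking above can be illustrated with a PageRank computation over caller-callee edges. A minimal sketch using networkx on a hypothetical call graph; the module names, edges and default PageRank parameters are illustrative, not the settings of the cited papers.

    import networkx as nx

    # Hypothetical call graph: an edge A -> B means module A calls module B.
    calls = [("ui", "core"), ("ui", "util"), ("core", "util"),
             ("core", "db"), ("report", "db"), ("report", "util")]
    G = nx.DiGraph(calls)

    # Rank modules by a PageRank-style score over the dependency structure;
    # heavily depended-on modules score higher and could be inspected first.
    rank = nx.pagerank(G, alpha=0.85)
    for module, score in sorted(rank.items(), key=lambda kv: -kv[1]):
        print(f"{module:8s} {score:.3f}")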
43
COST ESTIMATION
  • Cost estimation: predicting the effort required to develop a new software project
  • Effort: the number of months one person would need to develop a given project (person-months, PM)
  • CE assists project managers when they make important decisions (bidding, planning, resource allocation)
  • Underestimation -> approving projects that then exceed their budgets
  • Overestimation -> waste of resources
  • Modeling accurate, robust cost estimators -> successful software project management

44
COST ESTIMATION
  • Understanding the data structure
  • Cross- vs. within-application domain: embedded software domain
  • Which is the better predictor?
  • Point estimation: a single value of effort is estimated
  • Interval estimation: effort intervals are estimated
  • Cost classification (a minimal sketch follows below)

(Diagram: point estimates are converted into dynamic intervals and handled with classification algorithms.)
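A minimal sketch of treating cost estimation as a classification problem: hypothetical effort values are binned into person-month intervals and a classifier predicts the interval rather than a point value. The project features, bin edges and the choice of a decision tree are illustrative assumptions.

    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(7)
    # Hypothetical project features (e.g. size, team size, reuse ratio) and
    # "actual" effort in person-months generated for illustration.
    X = rng.uniform(0, 1, size=(120, 3))
    effort_pm = 8 + 40 * X[:, 0] + 10 * X[:, 1] + rng.normal(scale=2, size=120)

    # Turn the continuous effort into interval classes (the classification target).
    bins = [0, 20, 30, 40, np.inf]          # person-month intervals
    y = np.digitize(effort_pm, bins) - 1    # classes 0..3

    clf = DecisionTreeClassifier(max_depth=3, random_state=0)
    print("interval accuracy:", cross_val_score(clf, X, y, cv=5).mean().round(2))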
45
COST ESTIMATION
  • How can we achieve accurate estimations with a limited amount of effort data?
  • Feature subset selection: save the cost of extracting less important features

46
Cost Estimation
  • Comparison of ML-based models with parametric models (a basic COCOMO sketch follows the references below)
  • Feature ranking
  • COCOMO 81, COCOMO II, COQUALMO
  • Cost estimation as a classification problem (interval prediction)
  • "Mining Software Data", B. Turhan and O.
    Kutlubay, Data Mining and Business Intelligence
    Workshop in ICDE'07 , Istanbul, April 2007
  • Software Effort Estimation Using Machine
    Learning Methods, B. Baskeles, B.Turhan, A.
    Bener, ISCIS 2007,Ankara, November 2007.
  • "Evaluation of Feature Extraction Methods on
    Software Cost Estimation", B. Turhan, O.
    Kutlubay, A. Bener, ESEM2007, Madrid, Spain,
    September 2007 . ENNA Software Effort
    Estimation Using Ensemble of Neural Networks with
    Associative Memory Kültür Y., Turhan B., Bener
    A., FSE 2008.
  • Software Cost Estimation as a Classification
    Problem, Bakir, A., Turhan, B., Bener, A. ICSOFT
    2008.
  • B.Turhan, A. Bakir and A. Bener, "A Comparative
    Study for Estimating Software Development Effort
    Intervals". Submitted to Knowledge Based Systems
    Journal, 2008.
  • B.Turhan, Y. Kultur and A. Bener, "Ensemble of
    Neural Networks with Associative Memory (ENNA)
    for Estimating Software Development Costs",
    Submitted to Knowledge Based Systems Journal,
    2008.
  • A. Tosun, B. Turhan, A. Bener, "Feature
    Weighting Heuristics for Analogy Based Effort
    Estimation Models", Submitted to Expert Systems
    with Applications, 2007.
  • A. Bakir, B.Turhan and A. Bener, "A New
    Perspective on Data Homogeneity for Software Cost
    Estimation". Submitted to Software Quality
    Journal, 2008.
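As a reference point for the parametric models mentioned above, basic COCOMO 81 estimates effort as Effort = a * KLOC^b person-months, with coefficients depending on the project mode. A minimal sketch with the standard textbook coefficients; this is the classic model itself, not the group's estimators.

    # Basic COCOMO 81: effort (person-months) = a * KLOC ** b,
    # with (a, b) depending on the project mode.
    COCOMO_MODES = {
        "organic":      (2.4, 1.05),
        "semidetached": (3.0, 1.12),
        "embedded":     (3.6, 1.20),
    }

    def basic_cocomo_effort(kloc: float, mode: str = "organic") -> float:
        a, b = COCOMO_MODES[mode]
        return a * kloc ** b

    for mode in COCOMO_MODES:
        print(f"{mode:12s} 32 KLOC -> {basic_cocomo_effort(32, mode):6.1f} PM")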

47
Prest
  • A tool developed by Softlab
  • Parser for C, C++, Java and JSP
  • Metric collection
  • Data analysis

48
Data Sources
  • Public datasets
  • NASA (IV&V Facility, Metrics Program)
  • PROMISE (Software Engineering Repository); now includes Softlab data
  • Open source projects (SourceForge, Linux, etc.)
  • Internet-based small datasets
  • University of Southern California (USC) dataset
  • Desharnais dataset
  • ISBSG dataset
  • NASA COCOMO and NASA 93 datasets
  • Softlab Data Repository (SDR)
  • Local industry collaboration
  • In total: 20 companies, 25 projects over 5 years

49
Process Automation
  • UML refactoring
  • Class diagram <-> source code
  • Tool
  • Algorithm (graph-based)
  • What needs to be refactored?
  • Complexity vs. call graphs

Y. Kösker and A. Bener, "Synchronization of UML Based Refactoring with Graph Transformation", SEKE 2007, Boston, July 9-11, 2007.
B. Turhan, Y. Kosker and A. Bener, "An Expert System for Determining Candidate Software Classes for Refactoring", submitted to Expert Systems with Applications Journal, 2008.
Y. Kosker, A. Bener and B. Turhan, "Refactoring Prediction Using Class Complexity Metrics", ICSOFT'08, 2008.
B. Turhan, A. Bener and Y. Kosker, "Tekrar Tasarim Gerektiren Siniflarin Karmasiklik Olcutleri Kullanilarak Modellenmesi" (Modeling classes that require redesign using complexity metrics; in Turkish), 2. Ulusal Yazilim Mimarisi Konferansi (UYMK'08), 2008.
50
Process Improvement and Assessment
  • A case in the health care industry
  • Process improvement with CMMI
  • Requirements management
  • Change management
  • Comparison: a before-and-after evaluation
  • Lessons learned
  • A. Tosun, B. Turhan and A. Bener, "The Benefits of a Software Quality Improvement Project in a Medical Software Company: A Before and After Comparison", invited paper and keynote speech at the International Symposium on Health Informatics and Bioinformatics (HIBIT'08), 2008.

51
Metrics Program in Telecom
  • Metrics extraction from 25 Java applications
  • Static code attributes (McCabe, Halstead and LOC metrics)
  • Call graph information (caller-callee relations between modules)
  • Information from four versions (a version every two weeks)
  • Product and test defects (pre-release defects)
  • Various experimental designs for predicting fault-prone files:
  • Discard versions: treat all applications in a version as a single project
  • Predict fault-prone parts of each application
  • using previous versions of all the applications
  • using previous versions of the selected application
  • Additionally:
  • Optimization of the local prediction model using call graph metrics
  • Refactoring prediction using class complexity metrics

52
(Diagram mapping lifecycle phases to research activities: Requirements Analysis - matching requirements with defects; Design - call graph analysis / refactoring; Coding - test-driven development; Test - defect prediction; Maintenance - refactoring.)
53
Emerging Research Topics
  • Adding organizational factors to local prediction
    model
  • Information about the development team,
    experience, coding practices, etc.
  • Adding file metrics from version history
  • Modified/added/deleted lines of code
  • Selecting only modified files from each version
    in the prediction model
  • Confidence Factor
  • Using time factors
  • Dynamic prediction: constructing a model
  • for each application in a version
  • for each module/package in an application
  • for each developer by learning from his/her
    coding habits
  • TDD
  • Measuring test coverage
  • Defect proneness
  • Company wide implementation process
  • Embedded systems
  • Cost/ Effort Estimation
  • Dynamic estimation per process
  • Bioinformatics