Using Benchmarking to Advance Research: A Challenge to Software Engineering

Slides: 13
Provided by: gerryr

Transcript and Presenter's Notes

1
Using Benchmarking to Advance Research: A
Challenge to Software Engineering
  • Susan Elliott Sim
  • Steve Easterbrook
  • Richard C. Holt

2
Descriptive theory
  • A descriptive theory is an explanatory framework
    to help us better understand the past.
  • Scope of the theory: concerned primarily with
    benchmarks that are created and used by a
    technical research community
  • Definition of benchmark: a test or set of tests
    used to compare the performance of alternative
    tools or techniques.

3
Definition of a benchmark
  • Motivating comparison
  • The purpose at the heart of the benchmark: the
    need, the technical comparison, and the research
    agenda
  • Task sample
  • A representative sample
  • Performance measures
  • Performance is a measure of fitness for purpose

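The three components above can be sketched as a small data structure. This is only an illustration under stated assumptions: the `Benchmark` class, its field names, and the toy upper-casing tools are hypothetical and do not come from the presentation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Benchmark:
    """Hypothetical container for the three components of a benchmark."""
    motivating_comparison: str                        # what is compared, and why
    task_sample: List[str]                            # representative tasks
    performance_measure: Callable[[str, str], float]  # scores (output, task)

    def compare(self, tools: Dict[str, Callable[[str], str]]) -> Dict[str, float]:
        """Run every tool on the same task sample and total its scores."""
        return {
            name: sum(self.performance_measure(tool(task), task)
                      for task in self.task_sample)
            for name, tool in tools.items()
        }


# Toy benchmark with made-up tasks: which "tool" upper-cases text correctly?
toy = Benchmark(
    motivating_comparison="Which tool upper-cases text most accurately?",
    task_sample=["abc", "Hello", "x1"],
    performance_measure=lambda output, task: 1.0 if output == task.upper() else 0.0,
)
scores = toy.compare({"str_upper": str.upper, "identity": lambda s: s})
print(scores)
```

Because every tool runs on the same task sample and is scored by the same measure, the totals are directly comparable, which is the point of the motivating comparison.
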
4
The critical insight of the theory
  • Benchmarks operationalize scientific paradigms

  • Diagram labels: function, scientific discovery,
    benchmarks, consensus

5
Effectiveness (1)
  • Explain the success of benchmarks through both
    sociological and technical factors
  • Sociological factors

  • Benchmarks promote:
  • Frank, detailed, and technical communication
    among researchers
  • Collaborative, open, public research
  • Stages: during development, during deployment
    (evaluations)

6
Effectiveness (2)
  • Technical factors
  • Empirical method
  • Experiments: control of the task sample ->
    reduced variability in the results
  • Case study: little control over the selection of
    things to be evaluated
  • Replication
  • Accepted and familiar evaluation technique

7
Apply the theory
  • How to determine whether to begin developing a
    benchmark?
  • First precondition: a minimum level of maturity
    in the discipline
  • For example, an increasing concern with
    validation and comparison
  • Caveats
  • Significant cost
  • Too early? May hold back later progress and
    close off other directions
  • Second precondition
  • An ethos of collaboration within the community
  • Principles for the benchmark development process:
    three attributes
  • Seven requirements for the end product

8
Case Study CppETS (1)
  • CppETS: a benchmark for comparing the
    capabilities of C++ fact extractors
  • Motivating comparison: to find the most accurate
    and robust fact extractor for C++.
  • Task Sample

9
Case Study CppETS (2)
  • 2. What is the fourth enumeration constant in
    enum days?
  • Our parser does not keep this information.
    Enumeration constants are reported as
    "Variables", which seems to be a parser bug
    (or "feature" :-)).
  • 3. What is the (integer) value of enumeration
    constant MON?
  • The Visual Age IDE has a view called
    "Declarations" that contains all enum constants
    and their values. The relevant line reads:
    days MON 2
  • Performance measures
  • full answer,
  • partial answer,
  • no answer
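
A minimal sketch of how the three performance measures could be tallied for one extractor. The `Answer` enum, the `tally` helper, and the gradings below are hypothetical illustrations, not CppETS data or tooling.

```python
from collections import Counter
from enum import Enum


class Answer(Enum):
    """The three performance measures listed on the slide."""
    FULL = "full answer"
    PARTIAL = "partial answer"
    NONE = "no answer"


def tally(responses):
    """Count how many questions fall under each performance measure."""
    counts = Counter(responses)
    # Report every category, including those that never occurred.
    return {answer.value: counts.get(answer, 0) for answer in Answer}


# Made-up gradings for one extractor on four questions (illustration only).
example = [Answer.FULL, Answer.NONE, Answer.PARTIAL, Answer.FULL]
summary = tally(example)
print(summary)
```

Keeping the measure categorical, rather than collapsing it into a single number, preserves the distinction between an extractor that answers part of a question and one that answers nothing.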

10
Case Study CppETS (3)
  • A successful benchmark development process should
    have three attributes
  • The effort must be led by a small number of
    champions
  • Championed by Susan Elliott Sim
  • Design decisions for the benchmark need to be
    supported by laboratory work
  • Before creation, the benchmark was discussed with
    some researchers; during development, it was
    tested extensively.
  • The benchmark must be developed by consensus
  • The community was given two opportunities to
    participate

11
Case Study CppETS (4)
  • CppETS meets the requirements
  • Accessibility: can be downloaded and used by any
    interested person
  • Portability: the test programs were portable to a
    variety of platforms and compilers.
  • Also: affordability, scalability, clarity,
    relevance, solvability
  • Impact of CppETS
  • Has improved both the technical results and the
    cohesiveness of a community

12
Discussion
  • What is the development process, and how
    effective are benchmarks, when they are designed
    for business or marketing purposes?
  • What is the relationship among paradigm,
    benchmark, scientific discovery, and social
    consensus?
  • Should we have paradigms before we have
    benchmarks?
  • Do you think the benchmark development process
    must be led by a small number of champions,
    given that the benchmark must be developed by
    consensus? Why?
  • Is there a danger if development of a benchmark
    begins too early in a research community? If
    so, what may happen?