Using Benchmarking to Advance Research: A Challenge to Software Engineering

Slides: 13

Provided by: gerryr

Transcript and Presenter's Notes

1
Using Benchmarking to Advance Research: A
Challenge to Software Engineering

Susan Elliott Sim
Steve Easterbrook
Richard C. Holt

2
Descriptive theory

A descriptive theory is an explanatory framework
to help us better understand the past.
Scope of the theory: concerned primarily with
benchmarks that are created and used by a
technical research community
Definition of benchmark: a benchmark is a test or
set of tests used to compare the performance of
alternative tools or techniques.

3
Definition of a benchmark
Need

Task sample: a representative sample
Performance measures: performance is a measure
of fitness for purpose

Motivating comparison: the purpose, the heart of
the benchmark
Technical comparison, research agenda
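The three components on this slide can be sketched in code. This is only an illustrative sketch, not anything from the presentation: the `Benchmark` class, the toy word-counting task, and both tools are hypothetical names chosen to show how a motivating comparison, a task sample, and a performance measure fit together.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical toy task format: (input text, expected word count).
Task = Tuple[str, int]


@dataclass
class Benchmark:
    motivating_comparison: str                        # why the tools are compared
    task_sample: List[Task]                           # representative tasks
    performance_measure: Callable[[int, int], float]  # fitness for purpose


def exact_match(actual: int, expected: int) -> float:
    """Score 1.0 for a correct answer, 0.0 otherwise."""
    return 1.0 if actual == expected else 0.0


def run_benchmark(benchmark: Benchmark,
                  tools: Dict[str, Callable[[str], int]]) -> Dict[str, float]:
    """Run every tool on the same task sample and average its scores."""
    scores = {}
    for name, tool in tools.items():
        per_task = [benchmark.performance_measure(tool(text), expected)
                    for text, expected in benchmark.task_sample]
        scores[name] = sum(per_task) / len(per_task)
    return scores


word_count_benchmark = Benchmark(
    motivating_comparison="Which word counter is most accurate?",
    task_sample=[("a b c", 3), ("hello", 1), ("two  spaces", 2)],
    performance_measure=exact_match,
)

# Two alternative (hypothetical) tools under comparison.
tools = {
    "split_counter": lambda text: len(text.split()),
    "space_counter": lambda text: text.count(" ") + 1,
}

scores = run_benchmark(word_count_benchmark, tools)
for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

Because every tool sees the same task sample and is scored by the same measure, the resulting numbers are directly comparable, which is the point of the definition on the previous slide.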
4
The critical insight of the theory

Benchmarks and Scientific Paradigms

Function
scientific discovery
benchmarks
consensus
5
Effectiveness (1)

Explaining the success of benchmarks through both
sociological and technical factors
Sociological factors

Benchmarks promote:
Frank, detailed, and technical communication among
researchers
Collaborative, open, and public research
During development
During deployment
Evaluations
6
Effectiveness (2)

Technical factors
Empirical method
Experiments: control of the task sample -> reduced
variability in the results
Case study: little control over the selection of
things to be evaluated
Replication
Accepted and familiar evaluation technique

7
Apply the theory

How to determine whether to begin or not?
First precondition: a minimum level of maturity in
the discipline
For example, an increasing concern with
validation and comparison
Caveats
Significant cost
Too early? May hold back later progress and close
off other directions
Second precondition: an ethos of collaboration
within the community
Principles for the benchmark development process:
three attributes
Seven requirements for the end product

8
Case Study: CppETS (1)

CppETS: a benchmark for comparing the
capabilities of C++ fact extractors
Motivating comparison: to find the most accurate
and robust fact extractor for C++.
Task Sample

9
Case Study: CppETS (2)
2. What is the fourth enumeration constant in
enum days?
- Our parser does not keep this information.
Enumeration constants are reported as
"Variables", which seems to be a parser bug
(or "feature" :-))
3. What is the (integer) value of enumeration
constant MON?
- The Visual Age IDE has a view called
"Declarations" that contains all enum constants
and their values. The relevant line reads:
days MON 2

Performance measures
full answer,
partial answer,
no answer
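A minimal sketch of how answers graded on this three-level scale could be tallied per extractor. The extractor names and the judgements below are invented for illustration and are not CppETS results; the slide gives no numeric weights, so the sketch only counts categories.

```python
from collections import Counter
from enum import Enum
from typing import Dict, List


class Answer(Enum):
    FULL = "full answer"
    PARTIAL = "partial answer"
    NONE = "no answer"


def tally(judgements: List[Answer]) -> Dict[str, int]:
    """Count how many questions fell into each answer category."""
    counts = Counter(judgements)
    return {answer.value: counts[answer] for answer in Answer}


# Hypothetical judgements, one per benchmark question.
judgements = {
    "extractor_a": [Answer.FULL, Answer.PARTIAL, Answer.NONE, Answer.FULL],
    "extractor_b": [Answer.NONE, Answer.NONE, Answer.FULL, Answer.PARTIAL],
}

summary = {name: tally(answers) for name, answers in judgements.items()}
for name, counts in summary.items():
    print(name, counts)
```

Reporting counts per category, rather than a single weighted score, avoids having to decide how much a partial answer is worth.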

10
Case Study: CppETS (3)

A successful benchmark development process should
have three attributes:
The effort must be led by a small number of
champions
Championed by Susan Elliott Sim
Design decisions for the benchmark need to be
supported by laboratory work
Before creation, the design was discussed with
other researchers; during development, the
benchmark was tested extensively.
The benchmark must be developed by consensus
The community was given two opportunities to
participate

11
Case Study: CppETS (4)

CppETS meets the requirements
Accessibility: can be downloaded and used by any
interested person
Portability: the test programs are portable to a
variety of platforms and compilers.
Also: Affordability, Scalability, Clarity,
Relevance, Solvability
Impact of CppETS
Has improved both the technical results and the
cohesiveness of a community

12
Discussion

What is the development process for benchmarks
designed for business or marketing purposes, and
how effective are they?
What is the relationship among paradigms,
benchmarks, scientific discovery, and social
consensus?
Should we have paradigms before we have
benchmarks?
Do you think the benchmark development process
must be led by a small number of champions,
given that the benchmark must be developed
by consensus? Why?
Is there a danger if development of a benchmark
begins too early in a research community? If so,
what might happen?