Title: Using Benchmarking to Advance Research: A Challenge to Software Engineering
1 Using Benchmarking to Advance Research: A Challenge to Software Engineering
- Susan Elliott Sim
- Steve Easterbrook
- Richard C. Holt
2 Descriptive theory
- A descriptive theory is an explanatory framework to help us better understand the past.
- Scope of the theory: concerned primarily with benchmarks that are created and used by a technical research community.
- Definition of a benchmark: a test or set of tests used to compare the performance of alternative tools or techniques.
3 Definition of a benchmark
- Motivating comparison: the purpose, the heart of the benchmark (need, technical comparison, research agenda)
- Task sample: a representative sample
- Performance measures: performance is a measure of fitness for purpose
4 The critical insight of the theory
- Benchmarks operationalize scientific paradigms
[Diagram labels: function, scientific discovery, benchmarks, consensus]
5 Effectiveness (1)
- The success of benchmarks is explained by both sociological and technical factors
- Sociological factors
- Benchmarks promote collaborative, open, and public research
- Benchmarks promote frank, detailed, and technical communication among researchers
[Diagram labels: during development, during deployment, evaluations]
6 Effectiveness (2)
- Technical factors
- Empirical method
- Experiments: control of the task sample -> reduced variability in the results
- Case study: little control over the selection of things to be evaluated
- Replication
- Accepted and familiar evaluation technique
7 Apply the theory
- How to determine whether to begin or not?
- First precondition: a minimum level of maturity in the discipline
- For example, an increasing concern with validation and comparison
- Caveats
- Significant cost
- Too early? May hold back later progress and close off other directions
- Second precondition
- An ethos of collaboration within the community
- Principles for the benchmark development process: three attributes
- Seven requirements for the end product
8 Case Study: CppETS (1)
- CppETS: a benchmark for comparing the capabilities of C++ fact extractors
- Motivating comparison: to find the most accurate and robust fact extractor for C++
- Task sample
9 Case Study: CppETS (2)
2. What is the fourth enumeration constant in enum days?
- Our parser does not keep this information. Enumeration constants are reported as "Variables", which seems to be a parser bug (or "feature" :-))
3. What is the (integer) value of enumeration constant MON?
- The Visual Age IDE has a view called "Declarations" that contains all enum constants and their values. The relevant line reads: days MON 2
- Performance measures
- full answer,
- partial answer,
- no answer
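The three-level grading above can be sketched as a simple tally. This is only an illustrative sketch: the `Grade` enum, the `tally` helper, and the sample grades are hypothetical names and made-up data, not part of CppETS itself, and no numeric weighting of the levels is assumed.

```python
from collections import Counter
from enum import Enum


class Grade(Enum):
    """The three answer levels named on the slide."""
    FULL = "full answer"
    PARTIAL = "partial answer"
    NONE = "no answer"


def tally(grades):
    """Count how many answers fall into each of the three levels."""
    counts = Counter(grades)
    # Report every level, including those with zero answers.
    return {grade: counts.get(grade, 0) for grade in Grade}


# Hypothetical grades for one extractor on four questions (made-up data).
example = [Grade.FULL, Grade.NONE, Grade.PARTIAL, Grade.FULL]
summary = tally(example)
for grade, count in summary.items():
    print(f"{grade.value}: {count}")
```

Keeping the levels as categories, rather than converting them to scores, avoids assuming how much a partial answer is worth relative to a full one.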
10 Case Study: CppETS (3)
- A successful benchmark development process should have three attributes
- The effort must be led by a small number of champions
- CppETS was championed by Susan Elliott Sim
- Design decisions for the benchmark need to be supported by laboratory work
- For CppETS, the design was discussed with some researchers before creation, and extensive testing was used during development
- The benchmark must be developed by consensus
- CppETS provided two opportunities to participate
11 Case Study: CppETS (4)
- CppETS meets the requirements
- Accessibility: can be downloaded and used by any interested person
- Portability: the test programs were portable to a variety of platforms and compilers
- Also affordability, scalability, clarity, relevance, and solvability
- Impact of CppETS
- Has improved both the technical results and the cohesiveness of a community
12 Discussion
- What is the development process, and how effective are benchmarks, when they are designed for business or marketing purposes?
- What is the relationship among paradigms, benchmarks, scientific discovery, and social consensus?
- Should we have paradigms before we have benchmarks?
- Do you think the benchmark development process must be led by a small number of champions, given that the benchmark must be developed by consensus? Why?
- Is there a danger if benchmark development begins too early in a research community? If so, what may happen?