Title: Why and How Should We Experiment in Computer Science?
1. Why and How Should We Experiment in Computer Science?
- Gu Mingyang
- Dept. of Computer Science
- NTNU, 11.10.2002
2. The Three Topics
- "Should Computer Scientists Experiment More?" (why do we experiment?)
- "Experimental Models for Validating Technology" (validation models)
- "Case Studies for Method and Tool Evaluation" (how to use case studies?)
3. Why Do We Experiment: Summary
- Computer scientists and practitioners defend their lack of experimentation with a wide range of arguments.
- The article discusses several such arguments to illustrate the importance of experimentation for computer science.
4. Why Do We Experiment: Is CS Engineering or Science?
- Claim: CS is not a science but a synthetic, engineering discipline (centered on the computer, an artifact)
- On this view, testing theories by experiment would be misplaced in an engineering field
- Rebuttal: the primary subjects of CS are not merely computers, but information structures and information processes
- These resemble natural phenomena such as nervous systems, immune systems, and genetic processes
5. Why Do We Experiment: The Purpose of Experiment
- Testing theories
- An experiment can show that a theory has "bugs"
- Example: the failure probabilities of multi-version programs (see the sketch after this list)
- A community gradually accepts a theory if:
- all known facts within its domain can be deduced from the theory
- it has withstood numerous experimental tests
- it correctly predicts new phenomena
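The multi-version example is worth making concrete. A toy Monte Carlo sketch (with invented probabilities, not the data of any actual study) shows how correlated faults make a 3-version majority-vote system fail far more often than the independence theory predicts:

```python
# Toy Monte Carlo sketch: why experiment mattered for the multi-version
# programming theory. All probabilities are invented for illustration;
# they are NOT the data from the actual experiment.
import random

def simulate(trials=100_000, p_hard=0.01, p_fail_easy=0.001, p_fail_hard=0.5):
    """Majority vote over 3 versions whose failures are correlated
    through 'hard' inputs that trip up every version."""
    system_failures = 0
    per_version_failures = 0
    for _ in range(trials):
        hard = random.random() < p_hard
        p = p_fail_hard if hard else p_fail_easy
        fails = sum(random.random() < p for _ in range(3))
        per_version_failures += fails
        if fails >= 2:          # majority vote gives the wrong answer
            system_failures += 1
    p_version = per_version_failures / (3 * trials)
    # What the independence assumption would predict for >=2 of 3 failing:
    p_indep = 3 * p_version**2 * (1 - p_version) + p_version**3
    print(f"observed per-version failure rate: {p_version:.5f}")
    print(f"independence predicts system rate: {p_indep:.7f}")
    print(f"observed system failure rate:      {system_failures / trials:.5f}")

if __name__ == "__main__":
    random.seed(42)
    simulate()
```

With these invented numbers, the observed system failure rate comes out more than an order of magnitude above the independence prediction, which is the kind of gap only an experiment can expose.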
6. Why Do We Experiment: The Purpose of Experiment (cont.)
- Exploration
- Probe the influence of assumptions
- Eliminate alternative explanations of phenomena
- Unearth new phenomena
7. Why Do We Experiment: The Arguments Against Experimentation
- "The traditional scientific method is not applicable"
- There are plenty of untested CS theories: functional programming, object-oriented programming, formal software development processes, and so on
- As in other scientific fields, CS should test and explore such theories iteratively, both to validate them and to formulate new ones
8. Why Do We Experiment: The Arguments Against Experimentation
- "The current level of experimentation is good enough"
- Surveys by the author and others show otherwise
- 40 to 50 percent of all papers with claims needing empirical support had none at all; the corresponding rates in other fields are much smaller
- The data suggest that CS publishes many untested ideas; we should try to improve on this
9. Why Do We Experiment: The Arguments Against Experimentation
- "Experiments cost too much"
- Experimentation clearly requires more resources than theory does, but the expense is worth it
- Answering important questions is the aim of science; even a costly experiment such as the test of general relativity was not a waste
- We waste far more resources by accepting a wrong theory without experimental validation (for example, language and paradigm shifts such as C to C++, or OO)
- The software industry is beginning to value experiments, because results may give a company a three-to-five-year lead over the competition
10. Why Do We Experiment: The Arguments Against Experimentation
- "Demonstrations will suffice"
- A demonstration depends critically on the observers' imagination and their willingness to extrapolate; it cannot produce solid evidence
- To obtain solid evidence we need careful analysis involving experiments, data, and replication: evaluating SE methods, testing an algorithm's behavior, comparing the relative merits of parallel systems, and so on
11. Why Do We Experiment: The Arguments Against Experimentation
- "There is too much noise in the way"
- An effective way to simplify repeated experiments is benchmarking (see the sketch after this list)
- A benchmark provides a level playing field for competing ideas and allows repeatable, objective comparisons
- For human subjects, we can borrow methods used in medicine and psychology
- Such as control groups, random assignment, placebos, and so on
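As an illustration of how a benchmark levels the playing field, here is a minimal sketch; the two competing "contestants" and the workload are invented stand-ins for real research artifacts:

```python
# Minimal benchmark sketch: a fixed, repeatable workload plus repeated
# timing gives competing implementations a level playing field.
import random
import time

def contestant_builtin(data):
    return sorted(data)

def contestant_insertion(data):
    # deliberately slower competitor: binary-search insertion sort
    out = []
    for x in data:
        lo, hi = 0, len(out)
        while lo < hi:
            mid = (lo + hi) // 2
            if out[mid] < x:
                lo = mid + 1
            else:
                hi = mid
        out.insert(lo, x)
    return out

def benchmark(fn, workload, repetitions=5):
    """Run fn on a copy of the same workload several times; report the
    minimum wall-clock time, which is least distorted by system noise."""
    times = []
    for _ in range(repetitions):
        data = list(workload)           # identical input every run
        start = time.perf_counter()
        fn(data)
        times.append(time.perf_counter() - start)
    return min(times)

if __name__ == "__main__":
    random.seed(0)                      # fixed seed => repeatable workload
    workload = [random.random() for _ in range(10_000)]
    for fn in (contestant_builtin, contestant_insertion):
        print(f"{fn.__name__}: {benchmark(fn, workload):.4f}s")
```

The fixed seed and the repeated runs are the point: anyone can re-run the same comparison and get an objective, reproducible ranking.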
12. Why Do We Experiment: The Arguments Against Experimentation
- "Progress will slow"
- If everything must be experimentally supported, the pace of publication will slow
- In fact, papers with meaningful validation are accepted more readily than questionable ideas, so progress accelerates rather than slows
- Moreover, papers with good concepts or newly formulated hypotheses can be published first, with experimental testing to follow
13. Why Do We Experiment: The Arguments Against Experimentation
- "Technology changes too fast"
- Change in CS is so fast that by the time results are confirmed, they may no longer be relevant
- But behind many short-lived questions lurks a fundamental problem with a long lifetime; for example, behind the many SE methods lies the fundamental question of what the essential features of software development are
- Scientists should anticipate changes in assumptions and proactively employ experiments to explore the consequences of such changes
14. Why Do We Experiment: The Arguments Against Experimentation
- "You'll never get it published"
- Are papers about experiments difficult to publish?
- In fact, the author's experience shows that publishing experimental results is not difficult if one chooses the right outlet
- Rather than just building systems, experimenters should look for something new and contribute to current knowledge through the concepts and phenomena underlying their experiments
15. Why Do We Experiment: The Arguments Against Experimentation
- Why substitutes won't work
- Traditional paper types:
- work describing a new idea, prototyped perhaps in a small system; this type fits a radically new idea
- work claiming its place in science by feature comparison
- Instead, scientists should create models, formulate hypotheses, and test them using experiments
16. Why Do We Experiment: The Arguments Against Experimentation
- "Trust your intuition"
- For example, intuition held that meetings were essential for software reviews, an assumption later challenged by experiment
- "Trust the experts"
- Good practice is to check results carefully and not to accept them until they have been independently confirmed
- "Problems do exist"
- It is true that problems exist in the field of CS experimentation, but we should not discard experimentation because of them
17. Why Do We Experiment: The Arguments Against Experimentation
- Competing theories
- A prerequisite for competition among theories is falsifiability; a lack of observation and experiment may leave CS unable to discover the new and interesting phenomena worthy of better theories
- Unbiased results
- A mere list of merits can lead managers or funding agencies to a decision regardless of whether it is right
- That is very dangerous
18. Validation Models: Summary
- To determine whether a particular technique is effective, we need refined experimentation to measure it.
- This article presents a set of validation models and explains how to choose among them and how to evaluate their use.
19. Validation Models: Introduction to Experimentation
- Classification: scientific, engineering, empirical, analytical
- Aspects of data collection: replication, local control
- Further aspects specific to SE: influence of the experiment design (active or passive); temporal properties (historical or current data)
20. Validation Models: The Twelve Models
- The article lists 12 validation models in 3 categories:
- Observational (little control): collecting relevant data as a project develops
- Historical (no control): collecting data from projects that have already been completed
- Controlled (most control): providing many instances for statistical validation
21. Validation Models: Observational Category
- Project monitoring
- Feature: the lowest-level, passive model
- Shortcoming: difficulty in retrieving the information later
- Case study
- Feature: data collection is driven by a specific goal; an active model; little additional cost
- Shortcoming: each project is relatively unique, and the goal of process improvement can conflict with competing management goals
22. Validation Models: Observational Category (cont.)
- Assertion
- Feature: the developer and the experimenter of a technology are the same person
- Shortcoming: the experiment is not a real test but a biased selection
- Field study
- Feature: examining data collected from several projects simultaneously; less intrusive
- Fit for: measuring processes and products
23. Validation Models: Historical Methods
- Literature search
- Feature: the least invasive and most passive model; collects data from publications
- Shortcoming: selection bias (positive results get published); lack of quantitative data
- Legacy data
- Feature: quantitative data drawn from source programs, specifications, designs, testing documentation, and data collected during the program's development stages
- Shortcoming: lacks data about cost, schedule, and so on; cannot compare across projects
24. Validation Models: Historical Methods (cont.)
- Lessons learned
- Feature: collects data from lessons-learned documents; suited to improving future developments
- Shortcoming: lacks concrete data, and such documents are often written but never acted on
- Static analysis
- Feature: collects data from the completed product, analyzing its structure to determine its characteristics (see the sketch after this list)
- Shortcoming: the model's quantitative definition is hard to relate to the attribute of interest
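As a toy illustration of static analysis, the sketch below uses Python's standard ast module to pull simple structural metrics out of a finished source file. The metrics chosen here are arbitrary examples, not the quantitative models the paper has in mind:

```python
# Toy static analysis sketch: derive structural metrics from a completed
# product's source code without running it. The metrics are arbitrary
# illustrations, not validated quality models.
import ast

def structural_metrics(source: str) -> dict:
    tree = ast.parse(source)
    functions = [n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]
    branches = [n for n in ast.walk(tree)
                if isinstance(n, (ast.If, ast.For, ast.While, ast.Try))]
    return {
        "functions": len(functions),
        "branch_points": len(branches),          # crude complexity proxy
        "max_function_length": max(
            (len(f.body) for f in functions), default=0),
    }

if __name__ == "__main__":
    sample = (
        "def f(x):\n"
        "    if x > 0:\n"
        "        return x\n"
        "    return -x\n"
    )
    print(structural_metrics(sample))
```

The shortcoming in the bullet above shows up immediately: nothing in this code tells us whether "branch_points" actually tracks the attribute we care about, such as maintainability.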
25. Validation Models: Controlled Methods
- Replicated experiment
- Feature: several projects are staffed to perform a task in multiple ways
- Shortcoming: high cost; subjects may not take the task seriously
- Synthetic environment experiment
- Feature: performed in a smaller, artificial setting
- Shortcoming: results generated in a small artificial setting may not transfer to a real environment
26. Validation Models: Controlled Methods (cont.)
- Dynamic analysis
- Feature: adds instrumentation, debugging, or testing code to demonstrate the product's features; allows comparison between different products (see the sketch after this list)
- Shortcoming: perturbs the product's behavior; results may not carry over to a different data set
- Simulation
- Feature: uses a model of the real environment to evaluate a technology
- Shortcoming: we may not know how well the synthetic environment models reality
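A minimal sketch of dynamic analysis: a Python wrapper that instruments functions with call counts and timing at run time. The instrumented function is a made-up example, and the wrapper's own overhead is exactly the "perturbs the product's behavior" shortcoming noted above:

```python
# Dynamic analysis sketch: instrument a running product to collect call
# counts and wall-clock time. The wrapper's overhead itself perturbs the
# behavior being measured, as the shortcoming bullet warns.
import functools
import time

PROFILE = {}  # function name -> [call_count, total_seconds]

def instrument(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            stats = PROFILE.setdefault(fn.__name__, [0, 0.0])
            stats[0] += 1
            stats[1] += time.perf_counter() - start
    return wrapper

@instrument
def fib(n):
    # made-up workload standing in for real product code
    return n if n < 2 else fib(n - 1) + fib(n - 2)

if __name__ == "__main__":
    fib(20)
    for name, (calls, secs) in PROFILE.items():
        print(f"{name}: {calls} calls, {secs:.4f}s total")
```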
27. Validation Models: Choosing a Model
- When we design an experiment, we must select the data-collection type(s) that conform to one or more of the data-collection models above.
28. Validation Models: Model Validation
- Too many papers have no experimental validation at all (36%, 29%, 19%)
- Too many papers use an informal form of validation, the assertion (about one third)
- Researchers use lessons learned and case studies only about 19 percent of the time, a much smaller share than in other fields
- Experimentation terminology is sloppy (there are no standards)
(Figure: use of validation methods in 612 published papers)
29. How to Use Case Studies: Summary
- Case studies can be used to evaluate the benefits of methods and tools; but unlike formal experiments, case studies do not have a well-understood theoretical basis.
- This article provides guidelines for organizing and analyzing case studies.
30. How to Use Case Studies: Empirical Investigation Methods
- Classification: how to choose a design for an investigation
- Focused on a single project: case study
- Involving many projects, or a single type of project: formal experiment or case study
- Looking at many teams and many projects: formal experiment or survey (planned or not)
31. How to Use Case Studies: Empirical Investigation Methods (cont.)
- How to choose a method (the factors involved)
- Case studies are easier to plan, but harder to interpret and generalize
- Case studies fit the process improvement of particular organizations
- Special conditions that favor case studies:
- the process changes are very wide-ranging
- the effects of the change cannot be identified immediately
32. How to Use Case Studies: Empirical Investigation Methods (cont.)
- The results of a well-designed experiment can be applied to many types of projects.
- Formal experiments are useful for self-standing tasks:
- self-standing tasks can be isolated from the overall product-development process
- their results can be judged immediately
- their results can be isolated, so the small differences caused by individual variables can be identified
33. How to Use Case Studies: Empirical Investigation Methods (cont.)
- Surveys
- Surveys can be used to confirm that process changes have been successful
- But data collection takes a long time, and the results may not be available until many projects complete
- The most common form is based on questionnaires
34. How to Use Case Studies: Case Study Guidelines
- The seven guidelines:
- Define the hypothesis
- Select the pilot projects
- Identify the method of comparison
- Minimize the effect of confounding factors
- Plan the case study
- Monitor the case study against the plan
- Analyze and report the results
35. How to Use Case Studies: Case Study Guidelines
- Define the hypothesis
- Define the effects you expect the method to have, for example on quality or reliability
- The definition must be detailed enough
- Because it is easier to disprove something than to prove it, we usually state a null hypothesis and try to reject it (see the sketch after this list)
- The more clearly you define your hypothesis, the more likely you are to collect the right measures
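To make the null-hypothesis idea concrete, here is a minimal sketch with invented defect-density figures: H0 says the new method does not change defect density, and a simple permutation test asks how surprising the observed difference would be if H0 were true:

```python
# Null-hypothesis sketch with invented data. H0: the new method does not
# change defect density. A permutation test asks how often a difference
# at least this large appears when group labels are assigned by chance.
import random

old_method = [4.1, 3.8, 5.0, 4.6, 4.9, 4.4]   # defects/KLOC, invented
new_method = [3.2, 2.9, 3.7, 3.5, 3.0, 3.3]   # defects/KLOC, invented

def mean(xs):
    return sum(xs) / len(xs)

def permutation_test(a, b, permutations=10_000, seed=1):
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = a + b
    extreme = 0
    for _ in range(permutations):
        rng.shuffle(pooled)
        pa, pb = pooled[:len(a)], pooled[len(a):]
        if abs(mean(pa) - mean(pb)) >= observed:
            extreme += 1
    return extreme / permutations   # p-value under H0

if __name__ == "__main__":
    p = permutation_test(old_method, new_method)
    verdict = "reject H0 at the 0.05 level" if p < 0.05 else "cannot reject H0"
    print(f"p = {p:.4f}; {verdict}")
```

Note how the hypothesis dictated the measure: stating H0 in terms of defect density is what tells us to collect defects/KLOC in the first place.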
36. How to Use Case Studies: Case Study Guidelines
- Select the pilot projects
- The pilot project should be representative
- Identify the project by its significant characteristics, such as application domain, programming language, design method, and so on
37. How to Use Case Studies: Case Study Guidelines
- Identify the method of comparison
- Select a sister project with which to compare
- Compare the results of using the new method against a company baseline
- If the method applies to individual components, apply it at random to some product components and not to others (see the sketch after this list)
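A minimal sketch of the component-level randomization option; the component names are invented for illustration:

```python
# Random assignment sketch: when a method applies to individual
# components, assign it at random to some components and not others.
# Component names are invented.
import random

components = ["parser", "scheduler", "ui", "storage",
              "network", "auth", "logging", "exporter"]

def assign(components, seed=7):
    rng = random.Random(seed)          # fixed seed: a reproducible design
    shuffled = components[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"new_method": sorted(shuffled[:half]),
            "control": sorted(shuffled[half:])}

if __name__ == "__main__":
    groups = assign(components)
    for name, members in groups.items():
        print(f"{name}: {', '.join(members)}")
```

Randomizing the assignment, rather than letting developers pick, is what later justifies using standard statistical methods on the results.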
38. How to Use Case Studies: Case Study Guidelines
- Minimize the effect of confounding factors
- Separate learning the method from assessing it
- Do not staff the study with people who are either very enthusiastic or very skeptical about the method under evaluation
- Be careful when comparing different application types
39. How to Use Case Studies: Case Study Guidelines
- Plan the case study
- The plan identifies all the issues to be addressed, such as training requirements, the necessary measures, data-collection procedures, and so on
- The evaluation should have a budget, schedule, and staffing plan separate from the actual project's
40. How to Use Case Studies: Case Study Guidelines
- Monitor the case study against the plan
- The case study's progress and results should be compared against the plan, to ensure that the methods or tools are being used correctly and that any factors that could bias the results are recorded.
41. How to Use Case Studies: Case Study Guidelines
- Analyze and report the results
- The analysis procedures to follow depend on the number of data items and on the characteristics of the data. For example, if treatments were assigned at random, you can use standard statistical methods (see the sketch below); if you have only one value from each method or tool being evaluated, no analysis techniques are available.
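For the randomized case, a "standard statistical method" could be as simple as a two-sample t-test. The sketch below uses invented per-component effort figures and assumes SciPy is available:

```python
# Standard statistical analysis sketch for randomly assigned treatments:
# Welch's two-sample t-test on invented per-component effort data.
# Assumes SciPy is installed (pip install scipy).
from scipy.stats import ttest_ind

effort_new_method = [12.0, 9.5, 11.2, 10.1]   # person-days, invented
effort_control    = [14.3, 13.1, 15.0, 12.8]  # person-days, invented

t_stat, p_value = ttest_ind(effort_new_method, effort_control,
                            equal_var=False)   # Welch's variant
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Difference unlikely under the null hypothesis of equal means.")
else:
    print("Cannot reject the null hypothesis of equal means.")
```

With only one value per method, by contrast, there is no variance to work with, which is why the guideline says no analysis technique applies in that case.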
42. How to Use Case Studies: Analysis Methods for Case Studies
- Do not just compare the data that bear on the case study's goal; also report data about the environment, such as the development group's experience, the representativeness of the case within the research domain, and so on.
43. End of the Literature Review
44. Questions About Why to Experiment
- How should we choose research methods?
- Reading → Thinking → Idea → Discussion → Paper
- Reading → Thinking → Idea → Experiment → Paper
- Experiment → Reading → Thinking → Idea → Paper
- As a Ph.D. student in SE, how should we experiment?
- Time, papers, experiments, and the depth versus breadth of research
45. Questions About Validation Models
- How should we choose an experiment model?
- Data collection, goal, cost limitations, type of projects, number of projects involved, and so on
- As a Ph.D. student, which type of experiment should we select?
- Time, papers, thesis