Title: Modularity and the Evolution of Software Evolvability
1 Modularity and the Evolution of Software Evolvability
- Dissertation Talk
- Terry Van Belle
- August 11, 2004
2 Biological Modularity, example
- Halder et al. (1995)
- Mis-expression of Eyeless cDNA caused extra eyes to form on wings, legs, and antennae of Drosophila
- Eyes were structurally complete
- Cornea, bristles, photoreceptors, electrically responsive to light
- Master control for eye formation
3 Biological Modularity
- More complex organisms have modular genotypes (developmental biology)
- Module is
- A complex of genes
- Single purpose
- Limited pleiotropic and epistatic influence on other modules
- Open question: How does biological modularity evolve?
- Modularity improves evolvability
- Wagner/Altenberg (1995)
- Allows for independently evolving traits
- Equal fitness, different evolvabilities
4 Software Evolvability
- Software evolves?
- Software adapts to environmental changes
- Environment = User Requirements
- Beyond version 1.0
- Software Evolvability
- Ability to change software in response to changes in requirements
- Short-term success vs. Long-term success
- Software Archaeology
- Examination of software change histories for evolvability clues
- Evolutionary Metrics
- Orthogonal to traditional Static vs. Dynamic dichotomy
5 Dissertation Talk Structure
- Software modularity allows for the evolution of independently changing features
- Three Approaches
- Evolution of Code Factoring (Analysis)
- The Effectiveness of Interfaces (Analysis)
- Optimizing Code Modularity (Synthesis)
- Contributions
6 Evolution of Code Factoring
- Literal evolution of evolvability to improve modularity
- Factoring code minimizes the number of necessary changes
- Can Genetic Programming in a changing environment discover this fact?
- Supply the genomes with an Automatically Defined Function
- Symbolic regression on y = A sin(Ax)
- A varies every five generations
- A factored representation evolved
- Unfortunately, EC with a dynamic fitness function doesn't scale well
- y = A sin(Ax) + B sin(Bx)
- Van Belle and Ackley (GECCO 2002)
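The claim that factoring minimizes the number of necessary changes can be illustrated with a small sketch. This is a hypothetical illustration, not the genetic programming system from the talk: expression trees are assumed to be nested tuples, the value A = 2.0 is arbitrary, and `evaluate` and `count_constant` are helper names invented for this example. In the unfactored tree the constant appears twice; in the factored version it appears once, inside the ADF.

```python
import math

# Expression trees as nested tuples: (op, child, ...), a float constant,
# the variable "x", or a call to the automatically defined function "adf0".
A = 2.0  # arbitrary illustrative value
UNFACTORED = ("*", A, ("sin", ("*", A, "x")))
FACTORED_RPB = ("*", "adf0", ("sin", ("*", "adf0", "x")))
FACTORED_ADF0 = A  # the ADF body holds the only copy of the constant


def evaluate(tree, x, adf0=None):
    """Evaluate a tree at x; `adf0` is the tree substituted for "adf0" leaves."""
    if tree == "x":
        return x
    if tree == "adf0":
        return evaluate(adf0, x)
    if isinstance(tree, float):
        return tree
    op, *args = tree
    values = [evaluate(arg, x, adf0) for arg in args]
    if op == "*":
        return values[0] * values[1]
    if op == "sin":
        return math.sin(values[0])
    raise ValueError(f"unknown operator: {op}")


def count_constant(tree, value):
    """Count leaves equal to `value`: the edits needed if that constant changes."""
    if isinstance(tree, tuple):
        return sum(count_constant(child, value) for child in tree[1:])
    return 1 if tree == value else 0


edits_unfactored = count_constant(UNFACTORED, A)
edits_factored = count_constant(FACTORED_RPB, A) + count_constant(FACTORED_ADF0, A)
```

Both representations compute the same function, but when A changes the unfactored tree needs two coordinated edits while the factored one needs a single edit.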
7 The Effectiveness of Interfaces
- Interfaces limit the spread of changes
- Percolation network model of software change
- Interfaces improve evolvability, but
- They split work into small/frequent and rare/large changes (Highly Optimized Tolerance)
- Software Archaeology
- Used data from public-domain Java software projects
- CVS change history to find out what types of language elements changed
8 Optimizing Package Structure
- Can we generate a package structure better than the existing one?
- Elements are Java files from open-source projects
- Jikes RVM, Jakarta Tomcat, NetBeans
- Changed if added, deleted, or touched
- Hourly granularity
- Partition files into packages
- Compare results with current modularity, as expressed by unique directory names
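One way to turn a change history into per-hour change sets is sketched below. The input format (a list of timestamp and path pairs), the file names, and the function name `hourly_change_sets` are assumptions made for illustration; the slides do not say how the CVS logs were parsed, and bucketing by clock hour is only one reading of "hourly granularity".

```python
from collections import defaultdict
from datetime import datetime


def hourly_change_sets(events):
    """Group (timestamp, path) events into sets of files changed in the same clock hour."""
    buckets = defaultdict(set)
    for when, path in events:
        hour = when.replace(minute=0, second=0, microsecond=0)
        buckets[hour].add(path)
    # Return the change sets in chronological order.
    return [buckets[hour] for hour in sorted(buckets)]


# Hypothetical events: a file counts as changed if it was added, deleted, or touched.
events = [
    (datetime(2004, 1, 5, 10, 12), "a/Foo.java"),
    (datetime(2004, 1, 5, 10, 48), "b/Bar.java"),
    (datetime(2004, 1, 5, 13, 3), "a/Foo.java"),
]
change_sets = hourly_change_sets(events)
```

Here the first two events fall into one change set and the third forms its own.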
9 Clustering, Change Correlations
- Correlation of changes between files
- We want to group highly correlated files together
- Use a 2x2 contingency table

          F2    !F2
    F1    A     B
    !F1   C     D

- $r = \frac{AD - BC}{\sqrt{(A+B)(C+D)(A+C)(B+D)}}$
- Set a correlation threshold parameter
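The correlation from the 2x2 contingency table, r = (AD - BC) / sqrt((A+B)(C+D)(A+C)(B+D)), can be computed directly from change histories. In this sketch each file's history is assumed to be a set of time-period indices in which the file changed; the function name and the choice to return 0.0 when the denominator is zero are conventions chosen for this example, not something stated in the slides.

```python
import math


def change_correlation(changes_f1, changes_f2, n_periods):
    """Correlation r between two files' change histories (sets of period indices)."""
    a = len(changes_f1 & changes_f2)   # A: both F1 and F2 changed
    b = len(changes_f1 - changes_f2)   # B: F1 changed, F2 did not
    c = len(changes_f2 - changes_f1)   # C: F2 changed, F1 did not
    d = n_periods - a - b - c          # D: neither changed
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denominator == 0:
        return 0.0  # assumed convention: undefined correlation treated as 0
    return (a * d - b * c) / denominator
```

Two files that always change together get r = 1, and two files that never change in the same period while covering all periods between them get r = -1.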
10 Clustering Algorithm
[Figure: graph of files F1 through F6 with pairwise change correlations as edge labels, ranging from -0.3 to 1.0]
11 Clustering Algorithm
[Figure: files F1 through F6]
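The slides do not spell out the clustering algorithm beyond the use of a correlation threshold. One plausible reading, sketched here purely as an assumption, is to link every pair of files whose correlation meets the threshold and take the connected components as clusters. The function name, the dictionary input format, and the correlation values in the example are all hypothetical; the values are not those of the figure.

```python
def cluster_by_threshold(files, correlations, threshold):
    """Connected components of the graph whose edges have correlation >= threshold."""
    parent = {f: f for f in files}

    def find(f):
        # Union-find lookup with path halving.
        while parent[f] != f:
            parent[f] = parent[parent[f]]
            f = parent[f]
        return f

    for (f1, f2), r in correlations.items():
        if r >= threshold:
            parent[find(f1)] = find(f2)

    clusters = {}
    for f in files:
        clusters.setdefault(find(f), []).append(f)
    return sorted(clusters.values())


# Hypothetical correlations, chosen only to exercise the function.
files = ["F1", "F2", "F3", "F4"]
correlations = {("F1", "F2"): 0.9, ("F2", "F3"): 0.1, ("F3", "F4"): 0.7}
clusters = cluster_by_threshold(files, correlations, threshold=0.5)
```

With a threshold of 0.5 the weak F2-F3 link is ignored, leaving two clusters.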
12 Modularity Metrics
- Why do we use modules?
- Aggregation
- Segregation
- Module design lies in the tension between these forces
- Two metrics to capture these forces
- Breadth: average number of modules touched per change
- Weight: average total size of the modules touched per change
13 Modularity Metrics, continued
- Breadth is trivially minimized by putting all files in one module
- Weight is trivially minimized by giving every file its own module
- Ideally we want to minimize both
[Figure: Weight plotted against Breadth, with regions labeled Coarse-grained, Ideal, and Fine-grained]
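The two metrics and their trivial minima can be sketched as follows. The sketch assumes that a change is a set of file names, that a modularization is a dictionary from file to module id, and that module size is measured in number of files; the function name and the toy data are hypothetical.

```python
def breadth_and_weight(changes, module_of):
    """Return (breadth, weight) averaged over all changes.

    breadth: average number of modules touched per change
    weight:  average total size (in files) of the modules touched per change
    """
    sizes = {}
    for module in module_of.values():
        sizes[module] = sizes.get(module, 0) + 1
    modules_touched = 0
    size_touched = 0
    for change in changes:
        touched = {module_of[f] for f in change}
        modules_touched += len(touched)
        size_touched += sum(sizes[m] for m in touched)
    return modules_touched / len(changes), size_touched / len(changes)


# Hypothetical change history over four files.
changes = [{"a", "b"}, {"c"}, {"a", "d"}]
one_module = {"a": 0, "b": 0, "c": 0, "d": 0}
own_module = {"a": 0, "b": 1, "c": 2, "d": 3}

coarse = breadth_and_weight(changes, one_module)   # breadth is minimal, weight is maximal
fine = breadth_and_weight(changes, own_module)     # weight is minimal, breadth is maximal
```

With everything in one module, every change touches exactly one module (breadth 1) but always pays for all four files (weight 4). With one file per module, breadth and weight both equal the average change size.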
14 ModPartition Algorithm
- Variant of the Kernighan-Lin Algorithm
- A greedy algorithm, but able to move through fitness valleys
- Allows clusters to move across modules
- Adapted to generate module structure
- Use fitness instead of edge crossings
- Fitness combines breadth and weight
- Pre-set maximum number of modules
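The Kernighan-Lin idea of moving through fitness valleys can be sketched as a pass that always takes the best available move, even a worsening one, locks the moved element, and finally keeps the best state seen. This is a minimal sketch under stated assumptions, not the ModPartition implementation: the objective is assumed to be the product of breadth and weight (the exact fitness formula is not recoverable from the slide), individual files are moved rather than clusters, and all function names are hypothetical.

```python
def cost(changes, module_of):
    """Assumed objective, lower is better: breadth times weight."""
    sizes = {}
    for module in module_of.values():
        sizes[module] = sizes.get(module, 0) + 1
    breadth = weight = 0
    for change in changes:
        touched = {module_of[f] for f in change}
        breadth += len(touched)
        weight += sum(sizes[m] for m in touched)
    return (breadth / len(changes)) * (weight / len(changes))


def kl_pass(changes, module_of, n_modules):
    """One Kernighan-Lin-style pass: move each file at most once, keep the best state seen."""
    current = dict(module_of)
    best, best_cost = dict(current), cost(changes, current)
    unlocked = set(current)
    while unlocked:
        candidates = []
        for f in sorted(unlocked):
            for m in range(n_modules):
                if m != current[f]:
                    trial = dict(current)
                    trial[f] = m
                    candidates.append((cost(changes, trial), f, m))
        if not candidates:
            break
        # Take the best move even if it is worse than the current state;
        # this is what lets the search cross fitness valleys.
        move_cost, f, m = min(candidates)
        current[f] = m
        unlocked.remove(f)
        if move_cost < best_cost:
            best, best_cost = dict(current), move_cost
    return best, best_cost


def mod_partition_sketch(changes, module_of, n_modules):
    """Repeat passes until a pass no longer improves the cost."""
    best, best_cost = dict(module_of), cost(changes, module_of)
    while True:
        candidate, candidate_cost = kl_pass(changes, best, n_modules)
        if candidate_cost >= best_cost:
            return best, best_cost
        best, best_cost = candidate, candidate_cost


# Hypothetical history: files a,b always change together, as do c,d.
changes = [{"a", "b"}, {"c", "d"}, {"a", "b"}, {"c", "d"}]
start = {"a": 0, "b": 0, "c": 0, "d": 0}
result, result_cost = mod_partition_sketch(changes, start, n_modules=2)
```

In this toy run the first move makes the cost worse (from 4 to 5.25), but the second move reaches the two-module split with cost 2, which a strictly improving greedy search would never find from the single-module start.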
15 FastModPartition Algorithm
- ModPartition is too slow for real code
- Want an adaptive number of modules
- A quicker, recursive version of ModPartition
- First, divide into modules 0 and 1
- Divide module 0 into 0 and 2
- Divide module 1 into 1 and 3, and so on
- Stop after predetermined limit, or when modules don't split anymore
- Two orders of magnitude faster than ModPartition
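The module numbering described above (0 and 1, then 0 and 2, then 1 and 3) is consistent with a breadth-first sequence of two-way splits. The sketch below shows only that control flow. The `bisect` argument stands in for a two-module partitioning step; here it is replaced by a placeholder, `halve`, that simply cuts a list in half, and all names are hypothetical.

```python
from collections import deque


def fast_mod_partition_sketch(files, bisect, max_modules):
    """Split modules breadth-first; `bisect(items)` returns (kept, moved)."""
    modules = {0: list(files)}
    queue = deque([0])
    next_id = 1
    while queue and len(modules) < max_modules:
        module_id = queue.popleft()
        kept, moved = bisect(modules[module_id])
        if not kept or not moved:
            continue  # this module does not split anymore
        new_id = next_id
        next_id += 1
        modules[module_id] = kept
        modules[new_id] = moved
        queue.extend([module_id, new_id])
    return modules


def halve(items):
    """Placeholder bisection: first half stays, second half moves out."""
    mid = (len(items) + 1) // 2
    return items[:mid], items[mid:]


files = [f"f{i}" for i in range(8)]
modules = fast_mod_partition_sketch(files, halve, max_modules=4)
```

With eight files and a limit of four modules, module 0 first splits into 0 and 1, then 0 splits into 0 and 2, then 1 splits into 1 and 3, matching the order on the slide.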
16 Modularity Scores, Jikes
17 Modularity Scores, Jakarta Tomcat
18 Modularity Scores, NetBeans
19 Jikes Evolution
[Figure: files (alphabetical by directory) against time (changes); labels: package declarations, examples, jdp, on-stack replacement, JMTk]
20 Jikes Change Correlations
[Figure: change correlations; labels: examples, on-stack replacement, arch, JMTk]
21 Sample Module, Jikes RVM
- Module 4
- rvm/src/vm/arch/intel/runtime/VM_DynamicLinkerHelper.java
- rvm/src/vm/arch/powerPC/runtime/VM_DynamicLinkerHelper.java
- rvm/src/vm/compilers/optimizing/ir/util/OPT_BasicBlockEnumeration.java
- rvm/src/vm/compilers/optimizing/ir/util/OPT_IREnumeration.java
- rvm/src/vm/compilers/optimizing/ir/util/OPT_InstructionEnumeration.java
- Note the repeated names
- Intel and PowerPC architecture-specific files are grouped together
- Group by function, not implementation
22 Existing Structure
vm
  arch
    powerPC
      runtime
        VM_DLH.java
        . . .
    intel
      runtime
        VM_DLH.java
        . . .
    . . .
  . . .
23 Refactored Structure
vm
  runtime
    VM_IntelDLH.java
    VM_PowerPCDLH.java
    . . .
  . . .
24 Contributions
- Grounding software engineering in evolutionary history
- Made explicit the link between evolvability and code factoring, using EC
- Formed a link between HOT and Software engineering
- Developed automated techniques that improved software modularity
- Devised new metrics for measuring evolvability
- Techniques discovered a package design principle
25 Extra Material
26 Increasing Evolvability over Time
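27 Sample Solution
[Figure: evolved expression trees for ADF0 and the result-producing branch (RPB), built from the nodes -, /, exp, sin, cos, x, and adf0, with the constants 0.938, 0.645, 0.645, and 0.610]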
28 The Benefits of Encapsulation
[Figure: AreaAverager reads the fields of each shape directly: Circle (radius), Square (side), Rectangle (height, width)]
29 The Benefits of Encapsulation
[Figure: AreaAverager depends only on Shape and its area() method, which rarely changes; Circle, Square, and Rectangle sit behind Shape]
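The encapsulated design can be sketched in code. The class names, `area()`, and the field names come from the slides; the method name `average`, the constructors, and the use of an abstract base class are assumptions made for illustration.

```python
import math
from abc import ABC, abstractmethod


class Shape(ABC):
    """The interface: it rarely changes, so its clients rarely change."""

    @abstractmethod
    def area(self) -> float:
        ...


class Circle(Shape):
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2


class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2


class Rectangle(Shape):
    def __init__(self, height, width):
        self.height = height
        self.width = width

    def area(self):
        return self.height * self.width


class AreaAverager:
    """Depends only on Shape.area(), never on radius, side, height, or width."""

    def average(self, shapes):
        return sum(shape.area() for shape in shapes) / len(shapes)


mean_area = AreaAverager().average([Square(2), Rectangle(2, 4)])
```

Adding a new shape, or changing how an existing one stores its dimensions, leaves `AreaAverager` untouched, which is the sense in which the interface limits the spread of changes.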
30 Language Elements, Jikes RVM
31 Highly Optimized Tolerance
- Doyle and Carlson 1999
- Engineering a system produces a heavy-tailed distribution of failures
- Conservation of Fragility
- Encapsulation = Engineering the system
- Failure = Change
- Programming by interfaces induces a Conservation of Change
32 Evolvability Metrics
- Likelihood
- Probability that an element is part of a change
- Impact
- Expected change size, given element has changed
- Work
- Likelihood × Impact
- Acuteness
- Impact / Likelihood
- Acute Interfaces vs Chronic Implementations
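The four metrics can be estimated from a change history as sketched below. The sketch assumes that a change is a set of element names, that change size is the number of elements in the change, and that Work is the product of Likelihood and Impact (the operator is missing from the slide text and is inferred here). The function name and the example history are hypothetical.

```python
def evolvability_metrics(changes, element):
    """Estimate likelihood, impact, work, and acuteness for one element."""
    containing = [change for change in changes if element in change]
    if not containing:
        return None  # the element never changed, so impact is undefined
    likelihood = len(containing) / len(changes)
    impact = sum(len(change) for change in containing) / len(containing)
    return {
        "likelihood": likelihood,
        "impact": impact,
        "work": likelihood * impact,       # assumed: Likelihood x Impact
        "acuteness": impact / likelihood,
    }


# Hypothetical history: the interface changes rarely but drags others along;
# an implementation changes often but alone.
changes = [{"Shape", "Circle", "Square"}, {"Circle"}, {"Circle"}, {"Square"}]
shape = evolvability_metrics(changes, "Shape")
circle = evolvability_metrics(changes, "Circle")
```

In this toy history the interface comes out with high acuteness (rare, large changes) and the implementation with low acuteness (frequent, small changes).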
33 Calculating Breadth
[Figure: grid of files against time, with cells marked Unchanged, Changed (x), or Module Changed; modules touched per change: 2, 1, 2, 4, 1, 1, 3, 2]
Breadth = 2
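As a check on the result, averaging the eight values listed on this slide (2, 1, 2, 4, 1, 1, 3, 2), read here as the number of modules touched by each change, gives:

```latex
\[
\text{Breadth} = \frac{2 + 1 + 2 + 4 + 1 + 1 + 3 + 2}{8} = \frac{16}{8} = 2
\]
```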
34 Calculating Weight
[Figure: grid of files against time, with cells marked Unchanged, Changed (x), or Module Changed; total touched module size per change: 5, 4, 3, 10, 4, 1, 6, 6]
Weight = 4.875
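As a check on the result, averaging the eight values listed on this slide (5, 4, 3, 10, 4, 1, 6, 6), read here as the total size of the modules touched by each change, gives:

```latex
\[
\text{Weight} = \frac{5 + 4 + 3 + 10 + 4 + 1 + 6 + 6}{8} = \frac{39}{8} = 4.875
\]
```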