Title: Software Merging
1. Software Merging
- An Overview
- Dr. Tom Mens
- Programming Technology Lab
- Vrije Universiteit Brussel
- Course OOSE.RC
- EMOOSE 1999-2000
2. Problem Statement
- Collaborative software development
  - many different software developers work simultaneously on the same software
  - parallel changes are made to the same code
  - these changes need to be combined
- Software merging
  - automated tool support for combining parallel changes
  - detects inconsistencies or unexpected interactions between parallel changes
  - provides support for resolving these inconsistencies
3. Context
- Merge tools are usually part of a configuration management or version management system
- Definition
  - Software configuration management (SCM) is the discipline of managing and controlling change in the evolution of software systems (IEEE Standard 1042, 1987)
- Examples
  - Revision Control System (RCS) [Tichy 1985]
  - Concurrent Versions System (CVS) [Berliner 1990]
  - Perforce (www.perforce.com)
  - ClearCase [Leblang 1994]
  - Adele [Estublier et al. 1994]
4. Version Terminology
5. SCM
- Traditionally, SCM was seen purely as a management discipline [Bersoff et al. 1980]
- Nowadays, it is also treated as a software development support discipline
  - it provides automated help to reduce the complexity of making changes to large-scale software systems
- SCM is necessary in all phases of the software life-cycle
6. SCM Concepts
- Configuration item
  - a self-contained software artefact whose evolution needs to be tracked and controlled
  - some items can be composite, consisting of other items
- Version
  - identifies the state of a configuration item at a well-defined point in time
  - each state has a unique version number
  - variants are versions that are intended to coexist
    - e.g., Mac/Windows/Unix variants of a software application
    - e.g., Light/Standard/Professional editions of a software application
  - a promotion is a version made available to other developers
    - promotions are stored in a workspace (or dynamic library)
  - a release is a version made available to clients or users
    - releases are stored in a repository (or static library)
7. SCM Concepts ctd.
- Configuration
  - a version of a composite configuration item, containing a consistent set of versions of other configuration items
- Change request
  - a formal request for modifying a configuration item
- Baseline
  - a formally reviewed and agreed-upon configuration item that can only be changed through a change request
- Branch
  - a concurrent development path requiring independent SCM
  - different branches can be reconciled by merging their versions
See [Bruegge and Dutoit 2000].
8. Exercise 1
- SCM systems such as RCS and CVS use file names and their paths to identify configuration items.
- Explain why this feature prevents the configuration management of composite configuration items, even in the presence of labels.
9. SCM Activities
- Configuration item identification
  - each item has a unique version number
- Status accounting
  - record the status of individual components, work products and change requests
- Build management
  - enable automatic rebuilding of the system when new versions of components are created
  - minimise the amount of recompilation
- Process management
  - implement a change policy, e.g.
    - only syntactically correct code can be part of a version
    - builds should be made every week
    - relevant developers should be notified about new versions that have been created
- Promotion management
10. SCM Activities ctd.
- Release management
  - the creation of releases is decided at management level, based on marketing and quality control advice
  - creating a release includes
    - updating the user manual (documentation)
    - ensuring there are no inconsistencies
    - validating completeness and quality
- Change management
  - ensure consistency with project goals during changes
  - different steps
    - request a change
    - assess the request against project goals (may include cost analysis and impact analysis)
    - accept or reject the request
    - plan the accepted change, prioritise it, and assign it to a developer
    - audit the implemented change (quality control)
11. SCM Activities ctd.
- Branch management
  - merging is needed to coordinate overlapping or interacting parallel changes
  - detect and resolve conflicts between overlapping changes
  - heuristics to minimise merge conflicts
    - anticipate where overlapping changes can occur
    - merge frequently to identify overlaps early
    - communicate likely conflicts to the relevant developers
    - minimise changes in the main branch, and make important changes in separate development branches
    - minimise the number of branches
- Variant management
  - variants are needed when
    - software operates on different platforms (different OS or hardware)
    - software is delivered in variants with different levels of functionality
  - variants can be dealt with by
    - different teams for each variant -> reduced complexity / increased redundancy
    - a single project with variant-specific code
12. Roles in SCM
- Configuration manager
  - identifies configuration items
  - defines procedures for creating promotions and releases
- Change control board member
  - approves or rejects change requests
  - assesses the changes and plans accepted changes
- Developer
  - implements change requests
  - creates promotions
  - resolves merge conflicts
- Auditor
  - ensures the quality of changes
  - selects and evaluates promotions for a release
  - ensures consistency and completeness of a release
13. Storing subsequent versions
- Alternatives for storing subsequent versions of a software artefact
  - storing all versions integrally
  - using deltas, i.e., storing differences only
    - forward deltas record the original version and apply deltas to produce newer versions
      - e.g. SCCS [Rochkind 1975]
    - backward deltas record the latest version entirely and apply deltas to produce older versions
      - e.g. RCS [Tichy 1985]
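A minimal sketch of a backward-delta store, assuming revisions are plain lists of text lines. The deltas are computed with Python's standard `difflib`; the names `make_delta`, `apply_delta` and `BackwardDeltaStore` are hypothetical, and real systems such as RCS use their own delta format.

```python
import difflib

def make_delta(newer, older):
    """Edit script that turns `newer` (a list of lines) back into `older`."""
    sm = difflib.SequenceMatcher(a=newer, b=older, autojunk=False)
    # Keep only the changed regions: which lines of `newer` to replace, and with what.
    return [(i1, i2, older[j1:j2])
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]

def apply_delta(newer, delta):
    """Rebuild the older version by applying a backward delta to `newer`."""
    result, pos = [], 0
    for i1, i2, replacement in delta:
        result.extend(newer[pos:i1])   # copy the unchanged lines before this region
        result.extend(replacement)     # put back the lines of the older version
        pos = i2                       # skip the lines that only exist in `newer`
    result.extend(newer[pos:])
    return result

class BackwardDeltaStore:
    """Keeps the latest revision in full and one backward delta per older revision."""

    def __init__(self, first):
        self.latest = list(first)
        self.deltas = []               # deltas[k] turns revision k+1 into revision k

    def commit(self, new):
        new = list(new)
        self.deltas.append(make_delta(new, self.latest))
        self.latest = new

    def checkout(self, rev):
        """Return revision `rev` (0 = oldest) by walking back from the latest one."""
        lines = self.latest
        for delta in reversed(self.deltas[rev:]):
            lines = apply_delta(lines, delta)
        return list(lines)
```

Checking out the latest revision applies no delta at all, whereas each step back in history costs one more delta application.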
14. Exercise 2
- Most version control systems use backward deltas rather than forward deltas to store subsequent revisions of the same version.
- Explain why this is the case.
15. Kinds of Merging
- 2-way vs 3-way merging
- reuse versus evolution
- textual, syntactic or semantic merging
- state-based vs change-based
16. a) 2-way vs 3-way merging
17. b) reuse vs evolution
- merging is necessary
  - when an object-oriented framework is being customised by a framework user while it is also evolved by the framework developer
    - cf. reuse contracts
  - when two parallel changes to the same software artefact need to be combined
18. c) textual, syntactic, semantic
- textual merging
  - considers software artefacts as pure text files (or, alternatively, binary files)
- syntactic merging
  - uses more structured information about software artefacts (e.g. trees or graphs)
- semantic merging
  - uses behavioural information about software artefacts
19. text-based merging
- Different levels of granularity
  - line-based merging takes lines as primitive building blocks
    - e.g. Unix diff
  - using single characters as primitive building blocks is too inefficient
  - more efficient (two-way) approaches exist for merging binary files
    - e.g. bdiff [Tichy 1984] and vdelta
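The effect of granularity can be illustrated with Python's `difflib.SequenceMatcher`, which compares any two sequences of hashable items: given lists of lines it behaves like a line-based diff, given strings it compares single characters. This is only an illustration, not how Unix diff, bdiff or vdelta are implemented; the helper name `changes` is hypothetical.

```python
import difflib

def changes(old, new):
    """Non-equal regions between two sequences, as (old_part, new_part) pairs."""
    sm = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return [(old[i1:i2], new[j1:j2])
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]

old = "x = 1\ny = 2\n"
new = "x = 1\ny = 3\n"

# Line-based: the whole second line is reported as replaced.
print(changes(old.splitlines(), new.splitlines()))
# Character-based: only the single changed character is reported.
print(changes(old, new))
```

The character-based result is more precise, but the matcher has to work through many more (and much less distinctive) items, which is why lines are the usual unit for text.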
20. Exercise 3
- CVS uses a simple line-based merge rule to identify merge conflicts: there is a conflict if the same line was changed in both revisions. If no such line exists, no conflict is generated and the merge is performed automatically.
- a) Explain why this approach fails to detect certain types of conflicts. Provide an illustrative example of both a syntactic and a semantic conflict that goes undetected.
- b) Vice versa, try to find an example where the approach generates a conflict while there isn't one.
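A minimal sketch of the merge rule stated in the exercise, assuming each version is a list of lines: each revision's changes relative to the base are computed with `difflib`, and a conflict is reported only when both revisions change the same base line (or insert at the same position). The function names `hunks`, `clash` and `merge3` are hypothetical; this is a simplification, not CVS's actual diff3-based implementation, and real merge tools may treat edge cases such as adjacent edits differently.

```python
import difflib

def hunks(base, rev):
    """Changed regions of `rev` relative to `base`, as (start, end, new_lines).

    start/end index lines of `base`; start == end means a pure insertion.
    """
    sm = difflib.SequenceMatcher(a=base, b=rev, autojunk=False)
    return [(i1, i2, tuple(rev[j1:j2]))
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]

def clash(h, k):
    """True if two hunks change the same base line (or insert at the same spot)."""
    (a1, a2, _), (b1, b2, _) = h, k
    if a1 == a2 and b1 == b2:          # two insertions
        return a1 == b1
    if a1 == a2:                       # insertion strictly inside a changed range
        return b1 < a1 < b2
    if b1 == b2:
        return a1 < b1 < a2
    return max(a1, b1) < min(a2, b2)   # overlapping ranges of changed lines

def merge3(base, left, right):
    """Three-way line merge; returns (merged_lines, []) or (None, conflicts)."""
    hl, hr = hunks(base, left), hunks(base, right)
    conflicts = [(h, k) for h in hl for k in hr if h != k and clash(h, k)]
    if conflicts:
        return None, conflicts
    merged, pos = [], 0
    for i1, i2, new in sorted(set(hl) | set(hr)):   # identical hunks are kept once
        merged.extend(base[pos:i1])     # unchanged base lines
        merged.extend(new)              # lines contributed by one of the revisions
        pos = i2
    merged.extend(base[pos:])
    return merged, []

base = ["a", "b", "c"]
print(merge3(base, ["A", "b", "c"], ["a", "b", "C"]))   # different lines: merged
print(merge3(base, ["a", "X", "c"], ["a", "Y", "c"]))   # same line: conflict
```

Because the rule only looks at which lines were touched, anything that relates different lines (a call and a declaration, or two mutually dependent definitions) is invisible to it.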
21. Exercise 3 Solution a.1
- Base version:
    function F(a,b)
- Revision 1 (add function call):
    function F(a,b)
    x = F(1,2)
- Revision 2 (add third argument):
    function F(a,b,c)
- Merged version:
    function F(a,b,c)
    x = F(1,2)
- Syntactic conflict! The function is called with the wrong number of arguments.
22. Exercise 3 Solution a.2
- Base version:
    circumference(r) = 2·π·r
    area(r) = π·r·r
- Revision 1:
    circumference(r) = 2·area(r)/r
    area(r) = π·r·r
- Revision 2:
    circumference(r) = 2·π·r
    area(r) = circumference(r)·r/2
- Merged version:
    circumference(r) = 2·area(r)/r
    area(r) = circumference(r)·r/2
- Semantic conflict! Unexpected infinite recursion after the merge.
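The merged version can be reproduced directly; here it is transcribed to Python (the transcription is an assumption, the slide uses pseudocode). Each revision is correct on its own, since 2·(π·r·r)/r = 2·π·r and (2·π·r)·r/2 = π·r·r; only their combination never terminates, which Python reports as a `RecursionError`.

```python
# Merged version from Exercise 3 a.2, transcribed to Python.
# Revision 1 rewrote circumference in terms of area; revision 2 rewrote area in
# terms of circumference. Each touched a different line, so no conflict was reported.

def circumference(r):
    return 2 * area(r) / r             # line from revision 1

def area(r):
    return circumference(r) * r / 2    # line from revision 2

try:
    area(1.0)
except RecursionError:
    # The two definitions now call each other and there is no base case.
    print("semantic conflict: infinite recursion after the merge")
```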
23. syntactic merging
- Based on parse trees
  - essentially models the is-part-of relation between software entities
- Examples
  - [Westfechtel 1991]: domain-independent approach
  - [Asklund 1994]
  - Cdiff [Grass 1992]: for parse trees of C programs
24. syntactic merging
- Based on graphs
  - more flexible than trees
  - also models relations like invokes, calls, uses, accesses, defines, ...
- Examples
  - [Rho et al. 1998]
  - Reuse contracts
    - [Steyaert et al. 1996]: essentially method calls
    - [Mens 2000]: domain-independent formalism
25. semantic merging
- Finding all possible semantic conflicts is an undecidable problem in general
- Conservative approaches provide a safe approximation
  - no false negatives: all semantic conflicts are detected
  - e.g. [Horwitz et al. 1989], [Binkley et al. 1995]
- Lightweight approaches only consider part of the semantics
  - can give rise to false positives and false negatives
  - possible approaches
    - using predicates: pre/postconditions, invariants, obligations, exceptions [Hoare 1969], [Perry 1987]
    - using algebraic specifications: Larch [Guttag et al. 1985]
    - ...
26. d) state-based vs change-based
- state-based merging
  - only uses information in the original version and its revisions
- change-based merging
  - explicitly documents the changes that have been made to the versions
  - extensional change-based versioning annotates the changes inside the version
    - e.g. [Asklund 1994]
  - intensional change-based versioning describes the changes separately from the versions, in terms of the operations or transformations that have been used
    - e.g. EPOS [Gulla et al. 1991]
27. Exercise 4
- Explain why intensional change-based merging is more general or more expressive than state-based merging.
- Also give an example of a conflict that can be detected with change-based merging, but not with state-based merging.
28. Exercise 4 Solution
- 1. Changes can be separated from the versions to which they are applied.
  - a) In this way, the same changes can be applied more than once, for example to parallel versions of the software under development.
  - b) It also becomes very straightforward to implement a multiple undo/redo mechanism. For undo, apply the inverses of the most recently applied operations, in reverse order. For redo, simply reapply the operations.
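Both points can be sketched in a few lines, assuming a version is simply a set of entity names and that changes are recorded as hypothetical `Add`, `Remove` and `Rename` operations, each with a precondition and an inverse. `replay` reapplies a recorded change to another version (point a); `History` implements multiple undo/redo (point b). None of these names come from the slides.

```python
from dataclasses import dataclass

# Hypothetical primitive operations on a set of entity names (e.g. class names).
@dataclass(frozen=True)
class Add:
    name: str
    def apply(self, names):
        if self.name in names:
            raise ValueError(f"{self} not applicable")
        return names | {self.name}
    def inverse(self):
        return Remove(self.name)

@dataclass(frozen=True)
class Remove:
    name: str
    def apply(self, names):
        if self.name not in names:
            raise ValueError(f"{self} not applicable")
        return names - {self.name}
    def inverse(self):
        return Add(self.name)

@dataclass(frozen=True)
class Rename:
    old: str
    new: str
    def apply(self, names):
        if self.old not in names or self.new in names:
            raise ValueError(f"{self} not applicable")
        return (names - {self.old}) | {self.new}
    def inverse(self):
        return Rename(self.new, self.old)

def replay(ops, names):
    """Apply a recorded list of operations to another (e.g. parallel) version."""
    for op in ops:
        names = op.apply(names)
    return names

class History:
    """Multiple undo/redo on top of an operation log."""
    def __init__(self, names):
        self.names = frozenset(names)
        self.done, self.undone = [], []

    def do(self, op):
        self.names = op.apply(self.names)
        self.done.append(op)
        self.undone.clear()          # a new change invalidates the redo stack

    def undo(self):
        op = self.done.pop()         # most recent operation first
        self.names = op.inverse().apply(self.names)
        self.undone.append(op)

    def redo(self):
        op = self.undone.pop()
        self.names = op.apply(self.names)
        self.done.append(op)
```

A state-based store could only offer this by diffing whole versions, and would then see a rename as an unrelated removal and addition.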
29. Exercise 4 Solution ctd.
- 2. Improves conflict detection and conflict resolution
  - more conflicts can be detected efficiently (conflict table)
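A conflict table makes detection a matter of looking up each pair of parallel operations. The sketch below assumes operations on named entities (`add`, `remove`, `rename`); the table entries and conflict names are hypothetical illustrations, not the table used by reuse contracts or any cited approach.

```python
from typing import NamedTuple, Optional

class Op(NamedTuple):
    kind: str                   # "add", "remove" or "rename"
    name: str                   # entity the operation acts on
    new: Optional[str] = None   # new name, only used by "rename"

# Hypothetical conflict table: for each pair of operation kinds, a check that
# returns a conflict description, or None if the pair is considered harmless.
CONFLICT_TABLE = {
    ("add", "add"): lambda a, b: "double addition" if a.name == b.name else None,
    ("add", "rename"): lambda a, b: "name clash" if a.name == b.new else None,
    ("remove", "rename"): lambda a, b: "rename of removed entity" if a.name == b.name else None,
    ("rename", "rename"): lambda a, b: (
        "diverging renames" if a.name == b.name and a.new != b.new
        else "name capture" if a.name != b.name and a.new == b.new
        else None),
}

def conflict(a, b):
    """Look up the pair in either order; pairs absent from the table never conflict."""
    if (a.kind, b.kind) in CONFLICT_TABLE:
        return CONFLICT_TABLE[(a.kind, b.kind)](a, b)
    if (b.kind, a.kind) in CONFLICT_TABLE:
        return CONFLICT_TABLE[(b.kind, a.kind)](b, a)
    return None

def detect_conflicts(left, right):
    """Compare every operation of one change sequence with every one of the other."""
    return [(a, b, why) for a in left for b in right if (why := conflict(a, b))]

left = [Op("rename", "b", "c"), Op("add", "x")]
right = [Op("remove", "b"), Op("add", "y")]
print(detect_conflicts(left, right))
```

A conflict such as "rename of removed entity" is exactly the kind that a state-based comparison cannot name, because it never sees the rename as such.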
30. Two Definitions of Merging
- Two parallel modifications M1 and M2 of the same software artefact can be merged if
  - Definition 1: they can be serialised in either order (M1;M2 and M2;M1), and both serialisations lead to the same result
  - Definition 2: they can be serialised in at least one order (M1;M2 or M2;M1)
31. Exercise 5
- a) Give an example of a situation that can be merged by making use of definition 2, but not by means of definition 1.
- b) Give an example where the merge according to definition 2 leads to a counter-intuitive result.
32. Exercise 5 Solution (a)
- Can be merged according to definition 2: first apply AddEdge(e,a,b), then perform Rename(b,c).
- Cannot be merged according to definition 1: if we first apply Rename(b,c), we can no longer apply AddEdge(e,a,b).
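Both definitions can be checked mechanically by trying the two serialisations. This sketch assumes a graph of named nodes and labelled edges, and gives the operations `AddEdge` and `Rename` from the solution the preconditions they need (both endpoints must exist; the old name must exist and the new one must be free); these modelling choices are assumptions, not taken from the slides.

```python
from dataclasses import dataclass

class NotApplicable(Exception):
    """Raised when an operation's precondition does not hold."""

@dataclass(frozen=True)
class Graph:
    nodes: frozenset
    edges: frozenset              # triples (label, source, target)

@dataclass(frozen=True)
class AddEdge:
    label: str
    src: str
    tgt: str

    def apply(self, g):
        if self.src not in g.nodes or self.tgt not in g.nodes:
            raise NotApplicable(self)
        return Graph(g.nodes, g.edges | {(self.label, self.src, self.tgt)})

@dataclass(frozen=True)
class Rename:
    old: str
    new: str

    def apply(self, g):
        if self.old not in g.nodes or self.new in g.nodes:
            raise NotApplicable(self)

        def r(n):
            return self.new if n == self.old else n

        return Graph(frozenset(map(r, g.nodes)),
                     frozenset((lbl, r(s), r(t)) for lbl, s, t in g.edges))

def serialise(g, first, second):
    """Result of first;second, or None if that order is not applicable."""
    try:
        return second.apply(first.apply(g))
    except NotApplicable:
        return None

def mergeable_def1(g, m1, m2):
    """Definition 1: both orders apply and give the same result."""
    r12, r21 = serialise(g, m1, m2), serialise(g, m2, m1)
    return r12 is not None and r12 == r21

def mergeable_def2(g, m1, m2):
    """Definition 2: at least one order applies."""
    return serialise(g, m1, m2) is not None or serialise(g, m2, m1) is not None

g = Graph(frozenset({"a", "b"}), frozenset())
m1, m2 = AddEdge("e", "a", "b"), Rename("b", "c")
print(mergeable_def1(g, m1, m2), mergeable_def2(g, m1, m2))   # prints: False True
```

Under definition 2 the merged graph contains the edge e from a to c, i.e. the added edge silently follows the renamed node.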
33. Exercise 5 Solution (b)
34. Other Merge Issues
- Domain-independence
- Scalability
- Degree of formality
- Level of granularity
- Resolving conflicts
- Minimising conflicts
35. 1) domain-independence
- Most approaches are restricted to a particular programming language
  - Cdiff is restricted to C
  - Rational Rose Visual Differencing is restricted to UML
- Domain-independent approaches
  - [Westfechtel 1991], using parse trees
  - [Mens 1999], using graphs
36. 2) Scalability
- Text-based merge tools are not scalable
  - changes to multiple lines at once lead to a conflict for each line involved
- For operation-based merging
  - define composite transformations in terms of more primitive ones
  - this gives a higher-level view of the evolution
  - some basic conflicts can be ignored when they appear as part of a composite transformation
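How composites raise the level of conflict reporting can be sketched with a hypothetical example: renaming a method and adding a parameter to it both have to edit the declaration and each call site. At the primitive level every shared edit is a separate conflict; grouped into composites they become one. The classes `Primitive` and `Composite` and the target labels are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Primitive:
    kind: str       # e.g. "edit"
    target: str     # the line or program element that the primitive touches

@dataclass
class Composite:
    name: str
    steps: list     # primitives that together implement one evolution step

    def touched(self):
        return {p.target for p in self.steps}

def primitive_conflicts(c1, c2):
    """Primitive level: one conflict for every pair of steps touching the same target."""
    return [(p, q) for p in c1.steps for q in c2.steps if p.target == q.target]

def composite_conflicts(c1, c2):
    """Composite level: at most one conflict per pair of composites."""
    shared = c1.touched() & c2.touched()
    return [(c1.name, c2.name, sorted(shared))] if shared else []

# Hypothetical example: both changes edit the declaration of draw and its three call sites.
sites = ["decl:draw", "call:1", "call:2", "call:3"]
rename = Composite("RenameMethod(draw, render)", [Primitive("edit", s) for s in sites])
add_param = Composite("AddParameter(draw)", [Primitive("edit", s) for s in sites])

print(len(primitive_conflicts(rename, add_param)))   # one per shared target
print(composite_conflicts(rename, add_param))        # a single, higher-level conflict
```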
37. 3) Degree of formality
- Ad hoc
  - e.g. line-based merge tools
- Lightweight approaches
  - using conflict tables: [Feather 1989], [Steyaert et al. 1996]
  - using graph rewriting: [Mens 1999] (confluence, pushout property, parallel and sequential independence)
- Completely formal techniques
  - [Berzins 1994]: denotational semantics and Brouwerian algebras
  - [Horwitz et al. 1989], [Binkley et al. 1995]: program dependence graphs and program slicing
38. 4) Level of granularity
- text-based merge tools
- line-based
- block-based
- character-based
39. 5) Resolving Conflicts
- Use default conflict resolution strategies
  - cf. [Asklund 1994]
40. 6) Minimising Conflicts
- Small changes can have a large impact
  - a simple change can give rise to conflicts throughout the entire code
- Exercise 6: Try to find a number of different ways to reduce the number of detected conflicts to a manageable number.
41. Exercise 6 Solution
- Use information hiding techniques to localise the effect of changes
- Ignore temporary inconsistencies that are part of a large evolution step
- Use fine-grained revision control, where changes are as small as possible
- Keep parallel developers aware of each other's changes
- Only perform local merges
  - e.g. intraprocedural merging [Jackson et al. 1994]
42. Useful Algorithms
- Redundancy removal
  - reduces the number of detected conflicts
  - reduces space, increases speed, increases understandability
- Normalisation
  - canonical form
- Reconstruction
  - reconstruct the transformation given only the base version and the revised version
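Redundancy removal and reconstruction can be sketched on operations encoded as tuples such as `("add", x)`, `("remove", x)` and `("rename", a, b)`. The peephole rules below are a hypothetical rule set that only simplifies adjacent operations; `reconstruct` shows the limitation of working from two states only, where a rename is indistinguishable from a removal plus an addition.

```python
NO_RULE = object()   # sentinel: no simplification rule matches

def combine(prev, op):
    """Hypothetical peephole rules for two consecutive operations (tuples)."""
    if prev[0] == "add" and op == ("remove", prev[1]):
        return None                              # add x; remove x          => nothing
    if prev[0] == "add" and op[0] == "rename" and op[1] == prev[1]:
        return ("add", op[2])                    # add x; rename x->y       => add y
    if prev[0] == "rename" and op[0] == "rename" and op[1] == prev[2]:
        if op[2] == prev[1]:
            return None                          # rename a->b; rename b->a => nothing
        return ("rename", prev[1], op[2])        # rename a->b; rename b->c => rename a->c
    return NO_RULE

def remove_redundancy(ops):
    """Redundancy removal: simplify each new operation against its predecessor, repeatedly."""
    out = []
    for op in ops:
        while out:
            combined = combine(out[-1], op)
            if combined is NO_RULE:
                break
            out.pop()
            op = combined
            if op is None:                       # the pair cancelled out
                break
        if op is not None:
            out.append(op)
    return out

def reconstruct(base, revised):
    """Reconstruction: derive add/remove operations from two states only.

    Renames cannot be told apart from a removal plus an addition here.
    """
    return ([("remove", n) for n in sorted(base - revised)]
            + [("add", n) for n in sorted(revised - base)])

ops = [("rename", "a", "b"), ("rename", "b", "c"), ("add", "tmp"),
       ("rename", "tmp", "helper"), ("remove", "helper")]
print(remove_redundancy(ops))
print(reconstruct({"a", "b"}, {"a", "c"}))
```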
43. Classify existing approaches
| Considered approach         | ... | ... | Reuse contracts [Mens 2000]                           |
| 2-way / 3-way               |     |     | 3-way                                                 |
| text / syntactic / semantic |     |     | syntactic (uses typed graphs), light semantics        |
| state / change              |     |     | change-based (uses reuse contract (RC) operations)    |
| domain (in)dependence       |     |     | independent of the considered domain                  |
44. Assignment
- Classify a number of approaches according to the given criteria and answer the following questions:
  - Is the approach 2-way or 3-way?
  - Is the approach textual, syntactic or semantic?
    - Be as precise as possible: Is it line-based? Which kind of syntactic or semantic software artefacts does it address? Which kind of semantics (conservative/light)?
  - Is the approach state-based or change-based?
    - If change-based, is it extensional or intensional?
    - If intensional, which operations or transformations are available? Is it scalable to composite transformations?
  - Is the approach domain-independent or domain-specific?
    - If domain-specific, can the technique be generalised to more domain-independent artefacts? Why (not)?
  - Does the approach have a formal foundation?
    - Which? What are the benefits of this?
  - How are conflicts detected?
  - Are there any typical or special features of the approach?
45. Example: reuse contract approach
- 3-way
- syntactic approach: specialisation interfaces in [Steyaert et al. 1996], collaboration diagrams in [Lucas 1997], graphs in [Mens 1999]
- light semantics ...
- change-based merging
  - primitive transformations are Extension, Refinement, Cancellation, Coarsening
  - composite transformations can be defined
- [Steyaert et al. 1996] and [Lucas 1997] are domain-specific
  - they make use of a conflict table
  - they address class inheritance hierarchies and collaborating classes, respectively
- [Mens 1999] presents a domain-independent formalism based on graph rewriting
  - gives a formal characterisation of merge conflicts
- Special feature: originally designed for reuse versus evolution conflicts
46. Assignment ctd.
- Discuss 3 approaches from the following list
  - [Feather 1989]
  - Unix diff/diff3 utility, Emacs emerge tool, bdiff, vdelta, Sun's filemerge tool [Adams et al. 1986]
  - [Westfechtel 1991], [Asklund 1994], Cdiff [Grass 1992]
  - Rational Rose Visual Differencing tool
  - SCCS [Rochkind 1975], DSEE [Leblang et al. 1984], RCS [Tichy 1985]
  - commercial configuration management tools: ClearCase [Leblang et al. 1988], [Leblang 1994], Adele [Estublier et al. 1994]
  - [Horwitz et al. 1989], [Binkley et al. 1995], Semantic Diff [Jackson et al. 1994], [Berzins 1994]
  - [Lie et al. 1989], [Lippe et al. 1992]
47. References
- [Adams et al. 1986] E. Adams, W. Gramlich, S. Muchnick, S. Tirfing. SunPro: Engineering a practical program development environment. Proc. Int. Workshop on Advanced Programming Environments, LNCS 244: 86-96, Springer-Verlag, 1986.
- [Asklund 1994] U. Asklund. Identifying conflicts during structural merge. Proc. Nordic Workshop on Programming Environment Research '94, pp. 231-242, Lund University, 1994.
- [Berliner 1990] B. Berliner. CVS II: Parallelizing software development. Proc. USENIX Conf., pp. 22-26, 1990.
- [Bersoff et al. 1980] E. H. Bersoff, V. D. Henderson, S. G. Siegel. Software configuration management: an investment in product integrity. Prentice Hall, 1980.
- [Berzins 1994] V. Berzins. Software merge: semantics of combining changes to programs. ACM Transactions on Programming Languages and Systems, 16(6): 1875-1903, ACM Press, 1994.
- [Binkley et al. 1995] D. Binkley, S. Horwitz, T. Reps. Program integration for languages with procedure calls. ACM Transactions on Software Engineering and Methodology, 4(1): 3-35, ACM Press, 1995.
- [Estublier et al. 1994] J. Estublier, R. Casallas. The Adele configuration manager. In Configuration management, Trends in Software. John Wiley & Sons, 1994.
- [Feather 1989] M. Feather. Detecting interference when merging specification evolutions. ???, pp. 169-176, ACM Press, 1989.
- [Grass 1992] J. E. Grass. Cdiff: A syntax directed Diff for C programs. Proc. USENIX C Conf., pp. 181-193, 1992.
48. References ctd.
- [Gulla et al. 1991] B. Gulla, E.-A. Karlsson, D. Yeh. Change-oriented version descriptions in EPOS. Software Engineering Journal, 6(6): 378-386, 1991.
- [Horwitz et al. 1989] S. Horwitz, J. Prins, T. Reps. Integrating non-interfering versions of programs. ACM Transactions on Programming Languages and Systems, 11(3): 345-387, ACM Press, 1989.
- [Jackson et al. 1994] D. Jackson, D. A. Ladd. Semantic Diff: A tool for summarizing the effects of modifications. Int. Conf. on Software Maintenance, IEEE Press, 1994.
- [Leblang et al. 1984] D. Leblang, R. Chase. Computer-aided software engineering in a distributed workstation environment. SIGPLAN/SIGSOFT Software Engineering Symposium on Practical Software Development Environments, ACM SIGPLAN Notices, pp. 104-112, ACM Press, 1984.
- [Leblang et al. 1988] D. Leblang, R. Chase, H. Spilke. Increasing productivity with a parallel configuration manager. Proc. Int. Workshop on Software Version and Configuration Control, pp. 21-38, Teubner-Verlag, 1988.
- [Leblang 1994] D. Leblang. The CM challenge: configuration management that works. In Configuration management, Trends in Software. John Wiley & Sons, 1994.
- [Lie et al. 1989] A. Lie, R. Conradi, T. Didriksen, E.-A. Karlsson. Change-oriented versioning in a software engineering database. Proc. 2nd Int. Workshop on Software Configuration Management, ACM SIGSOFT Software Engineering Notes, 17: 56-65, ACM Press, October 1989.
49. References ctd.
- [Lippe et al. 1992] E. Lippe, N. van Oosterom. Operation-based merging. Proc. 5th ACM SIGSOFT Symposium on Software Development Environments, ACM SIGSOFT Software Engineering Notes, 17(5): 78-87, ACM Press, 1992.
- [Mens 1999] T. Mens. A Formal Foundation for Object-Oriented Software Evolution. PhD Dissertation, Vrije Universiteit Brussel, Belgium, September 1999.
- [Mens 2000] T. Mens. Conditional graph rewriting as a domain-independent formalism for software evolution. Proc. Int. AGTIVE 99 Conference, LNCS, Springer-Verlag, 2000.
- [Rho et al. 1998] J. Rho, C. Wu. An efficient version model of software diagrams. Proc. 5th Asia-Pacific Conf. on Software Engineering, pp. 236-243, 1998.
- [Rochkind 1975] M. Rochkind. The source code control system. IEEE Transactions on Software Engineering, 1(4): 364-370, IEEE Press, December 1975.
- [Tichy 1985] W. Tichy. RCS: a system for version control. Software: Practice and Experience, 15(7): 637-654, 1985.
- [Westfechtel 1991] B. Westfechtel. Structure-oriented merging of revisions of software documents. Proc. 3rd Int. Workshop on Software Configuration Management, pp. 68-79, ACM Press, 1991.