How does your software grow? Evolution and architectural change in open source software


1
How does your software grow? Evolution and
architectural change in open source software
  • Michael Godfrey
  • Software Architecture Group (SWAG)
  • University of Waterloo

2
What is software evolution?
  • Evolution is what happens while you're busy making other plans.
  • We distinguish between maintenance and evolution
  • Maintenance is the planned set of tasks to
    effect changes.
  • Evolution is what actually happens to the
    software.
  • All I want to know is
  • How and why does software evolve?

3
Why should we care?
  • Much of the commercial software world operates in
    perpetual crisis mode.
  • 'Fix it, don't try to understand it.'
  • Just-in-time program comprehension [Lethbridge]
  • but large software systems are major assets
    of many businesses
  • Getting it right more important than getting it
    done fast.
  • Budget and time for preventive maintenance, navel
    gazing.
  • Relatively little research on trying to
    understand how and why programs evolve.

4
Lehman's Laws of Software Evolution
  • Based on measurement of a few (commercially developed)
    systems, most notably IBM's OS/360
  • Originally three laws, now there are eight.
  • Controversial as laws
  • Has been criticized for strong claims based on
    limited data.
  • However, it's pioneering work on software
    evolution and software engineering.

5
Lehman's Laws of Software Evolution
  1. Continuing change: An E-type program that is
    used must be continually adapted else it becomes
    progressively less satisfactory.
  2. Increasing complexity: As a program is evolved,
    its complexity increases unless work is done to
    maintain or reduce it.
  3. Self regulation: The program evolution process
    is self-regulating with close to normal
    distribution of measures of product and process
    attributes.
  4. Invariant work rate: The average effective
    global activity rate on an evolving system is
    invariant over the product lifetime.

6
Lehman's Laws of Software Evolution
  5. Conservation of familiarity: During the active
    life of an evolving program, the content of
    successive releases is statistically invariant.
  6. Continuing growth: Functional content of a
    program must be continually increased to maintain
    user satisfaction over its lifetime.
  7. Declining quality: E-type programs will be
    perceived as of declining quality unless
    rigorously maintained and adapted to a changing
    operational environment.
  8. Feedback system: E-type programming processes
    constitute multi-loop, multi-level feedback
    systems and must be treated as such to be
    successfully modified or improved.

7
Lehman's Laws in a nutshell
  • Observations
  • (Most) useful software must evolve or die.
  • As a software system gets bigger, its resulting
    complexity tends to limit its ability to grow.
  • Development progress/effort is (more or less)
    constant
  • growth is at best constant.
  • Lehman/Turski's model: $y' = y + E/y^2 \approx (3Ex)^{1/3}$
  • where $y$ = # of modules, $x$ = release number
  • Advice
  • Need to manage complexity.
  • Do periodic redesigns.
  • Treat software and its development process as a
    feedback system (and not as a passive theorem).
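
The Lehman/Turski model above can be explored numerically. The following is a minimal sketch, reading the model as the recurrence y' = y + E/y^2 with closed-form approximation (3Ex)^(1/3). The function names, the starting size, and the value of the constant E are illustrative assumptions, not values from the talk.

```python
def turski_sizes(y0, effort, releases):
    """Iterate y' = y + E / y**2, returning the size after each release.

    y0 and effort (the constant E) are hypothetical inputs chosen by the caller.
    """
    sizes = [y0]
    for _ in range(releases):
        y = sizes[-1]
        sizes.append(y + effort / y**2)
    return sizes


def cube_root_approx(effort, release):
    """Closed-form approximation y = (3 E x)^(1/3) for release number x."""
    return (3 * effort * release) ** (1 / 3)


# Illustrative values only: start at 10 modules with E = 1000.
sizes = turski_sizes(10.0, 1000.0, 100)
increments = [b - a for a, b in zip(sizes, sizes[1:])]
```

Under this model each release adds less than the one before it, which is the sense in which growing complexity limits growth; for large release numbers the iterated sizes stay close to the cube-root curve.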

8
Lehman's examples
9
The S curve
(figure: size plotted against time)
10
A case study in evolution: The Linux OS kernel
[ICSM-00]
11
A case study in evolution: The Linux OS kernel
[ICSM-00]
  • "Evolution in Open Source Software: A Case Study"
  • Godfrey and Tu, ICSM 2000
  • It's Linux!
  • Large system, very stable, many releases over
    several years, many developers
  • Growing mainstream adoption (e.g., IBM S/390 port)
  • Commonly used within networked systems
  • Open source development model
  • Interesting phenomenon in itself
  • Easy to track, can publish results, many experts
  • Not much previous study

12
Evolution of Linux: Questions
  • How has Linux evolved over time?
  • Does it obey Lehman's laws?
  • What is the best way to characterize growth?
  • How has its (open source) process model affected
    its development?
  • How has the (high-level) architecture
  • changed over time?
  • affected the system's evolution?

13
Open source development
  • Open source development vs. open source software
  • GNU, Linux, Apache, vim, gcc, FreeBSD
  • vs.
  • Mozilla, JDK, Jikes, NetBeans
  • The Cathedral and the Bazaar [Raymond]
  • Usual goal: scratching an interesting itch, not
    filling a commercial void.
  • Anyone may contribute, though owner(s) have final
    say.
  • Usually, developers work part-time and for free.
  • Motivation is peer recognition and personal
    satisfaction, not money.
  • However, industrial participation also increasing
  • (e.g., Cygnus, IBM)

14
Open source development
  • Largely immune from time-to-market pressures
  • Can release when it's really ready
  • Can be hard to control/direct developers
  • Big egos, can't be fired
  • What's cool vs. what's needed
  • Less sexy development tasks often suffer
  • e.g., planned testing, preventive maintenance
  • Code quality varies widely
  • Some projects have coding standards
  • Unstable/experimental code common (and even
    encouraged)
  • Quality maintained via massively parallel
    debugging, not rigorous testing.

15
Linux background
  • Linux kernel v1.0 released March 1994
  • 487 source files, 165 KLOC, i386 only
  • Linux kernel v2.3.39 released January 2000
  • 4854 source files, 2.2 MLOC, 10 hardware
    architectures supported, over 300 developers
    credited
  • Maintained along two parallel paths
  • development and stable

16
Methodology
  • Examined 96 versions of Linux kernel
  • 34 of the 67 stable releases
  • 62 of the 369 development releases
  • All measures considered only .c/.h files
    contained in tarball
  • Counted LOC using wc -l and an awk script that
    ignored comments and blank lines
  • Counted # of fcns/vars/macros using ctags
  • Architectural model (subsystem (SS) hierarchy) based on
    default directory structure
  • We plotted growth against calendar time
  • Lehman suggests plotting growth against release
    number
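
The LOC measure described above (raw line count plus a count that ignores comments and blank lines) can be sketched as follows. This is an illustrative Python stand-in, not the wc/awk scripts used in the study; the names `COMMENT_RE`, `count_loc`, and `SAMPLE` are made up for the example, and comment markers inside string literals are deliberately not handled.

```python
import re

# Block comments (/* ... */, possibly multi-line) and // line comments.
# Simplification: comment markers inside string literals are not special-cased.
COMMENT_RE = re.compile(r"/\*.*?\*/|//[^\n]*", re.DOTALL)


def count_loc(source):
    """Return (raw_lines, code_lines) for the text of one .c/.h file."""
    raw_lines = len(source.splitlines())
    stripped = COMMENT_RE.sub("", source)
    # A line counts as code if anything other than whitespace remains.
    code_lines = sum(1 for line in stripped.splitlines() if line.strip())
    return raw_lines, code_lines


# Tiny made-up input to show the two counts diverging.
SAMPLE = "/* header\n   comment */\nint x = 1;\n\n// note\nint y = 2; /* trailing */\n"
```

Calling `count_loc(SAMPLE)` gives the raw and uncommented counts for the sample; summing the results over every .c/.h file in a release tarball would give one data point per version.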

17
Software architecture of Linux
[IWPC-00]
18
Growth of # of source files
19
Growth of # of global fcns, variables, and macros
20
Growth of compressed tar file
21
Growth of Lines of Code (LOC)
22
Average/median .c file size
23
Average/median .h file size
24
Growth of major SSs (dev. releases)
25
SS LOC as percentage of total system
26
SS LOC as percentage of total system (ignoring
drivers)
27
Growth of arch SSs
28
Growth of drivers SSs
29
Observations and hypotheses
  • Growth along devel. path is super-linear!
  • $y = 0.21x^2 + 252x + 90{,}055$, $r^2 = 0.997$
  • $y$ = size in LOC
  • $x$ = days since v1.0
  • $r^2$ is coefficient of determination using least
    squares
  • Lehman/Turski's model: $y' = y + E/y^2 \approx (3Ex)^{1/3}$
  • where $y$ = # of modules, $x$ = release number
  • Linux's strong growth is continuing.
  • This is stronger growth at MLOC level than
    observed by others (Lehman, Gall), even for other
    OSs.
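
The quadratic model above can be evaluated directly. The sketch below assumes the coefficients combine with plus signs (y = 0.21x^2 + 252x + 90,055) and adds a plain coefficient-of-determination helper; the function names are illustrative, and no measured data from the study is reproduced here.

```python
def linux_loc(days_since_v1):
    """Quadratic growth model from the slide: y = 0.21 x^2 + 252 x + 90,055."""
    x = days_since_v1
    return 0.21 * x**2 + 252 * x + 90_055


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1 - ss_res / ss_tot
```

Super-linear here means that equal spans of calendar time add more code later than earlier: `linux_loc(2000) - linux_loc(1000)` exceeds `linux_loc(1000) - linux_loc(0)`. Given measured sizes for a set of releases, `r_squared` would compare them against the model's predictions.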

30
Growth of fetchmail [Raymond]
31
Growth of pine
32
Growth of X Windows
(figure; data points labeled X10R3, X10R4, X11R1, X11R2, X11R3, X11R5, X11R6, X11R6.1, X11R6.3, X11R6.4)
33
Growth of gcc/g++/egcs
34
Growth of vim (text editor)
35
vim avg comments and blank lines per file
36
vim avg/median file size
37
vim's architecture
38
Some open questions
  • Philosophical
  • Does software evolve in the same way as frogs and
    social structures?
  • The Selfish Gene, by Richard Dawkins
  • The Nature of Economies, by Jane Jacobs
  • What are the recurring patterns and compelling
    metaphors of software evolution?
  • Methodological
  • How to measure size?
  • How to correlate size and quality?
  • How to measure change?
  • How to model architectural change?
  • What is the predictive power of such models?
  • Do the other phenomena dominate?

39
Some open questions
  • Practical
  • What information do developers need to know about
    how a software system has evolved?
  • What kinds of tools would be useful
  • to the front-line developer?
  • to the manager?
  • How best to deal with
  • Large data sets (large_system × many_versions)
  • Visualization and navigation

40
Change patterns and evolutionary narratives
  • Band-aid evolution (just add a layer)
  • quick way to add new functionality, esp. if
    system is not well understood
  • e.g., Y2K fixing, adding portability, new
    features
  • Vestigial features
  • design artifact persists after rationale dies
  • e.g., whale fin bone structure resembles hand
  • Adaptive radiation [Lehman]
  • when conditions permit, encourage wild variation
    for a while.
  • later, evaluate and let best ideas live on.
  • e.g., Linux kernel evolution
  • Convergent evolution
  • compare similar systems to reference arch. (or to
    each other)
  • e.g., everyone grows an XML generator in response
    to market pressure

41
Change patterns and evolutionary narratives
  • Cathedral style [Raymond]
  • careful control and management
  • debugging done before committing code
  • evolution is slow, planned, rarely undone
  • Bazaar style (OSD)
  • lots of low-level changes, frequent fixes
  • lots of building around rather than wholesale
    changing, occasional redesigns
  • creeping feature-itis, complete dependency
    graph

42
Change patterns and evolutionary narratives
  • Radical redesigns (localized and global)
  • aka refactoring
  • little new functionality added, but structure
    changes significantly, legacy cruft dissipates
  • likely goodness (design metrics) improves
  • Migration patterns
  • look out for known translation idioms, especially
    if migration is not one big bang
  • e.g., procedural-to-OO idioms

43
Change patterns and evolutionary narratives
  • OO evolutionary patterns
  • one recognizable design pattern transformed into
    another (or a variation of the original)
  • requires good OO extraction tools (dynamic
    binding, polymorphism, reflection, etc.)
  • Reuse patterns
  • components are (re)used in different systems
  • e.g., build COTS interface, throw out homebrew DB

44
Change patterns and evolutionary narratives
  • Phenomena observed in Linux evolution
  • Careful control of core code; more flexibility on
    contributed drivers, experimental features
  • Linus has many lieutenants
  • Aunt Tillie effect
  • Simplicity and scrutability of code, development
    processes, approval process, etc.
  • Mostly parallel enables sustained growth
  • Hard interfaces make good neighbours.
  • Loadable modules make feature development easier
  • Clone and hack makes sense!

45
Change patterns and evolutionary narratives
  • Phenomena observed in Linux evolution
  • Amazing social phenomenon of OSD
  • You can try this at home
  • and they did!
  • Anti-MS sentiments,
  • We can build it ourselves!
  • Enlightened self-interest for many large computer
    industry companies
  • If we can't own the standard, no one should.
  • Bandwagon effect (both OS developers and
    industry)
  • Support for Linux as deployed OS by IBM, Dell,
    Sun, ...
  • Lots of contributed production-quality third
    party code from industry (IBM S/390, drivers)

46
An observed evolutionary phenomenon
  • Code cloning!
  • Usually regarded as a bad sign
  • Usual solution
  • Abstract commonality into a single place, remove
    duplication
  • In an OO setting, can use inheritance
  • But as observed in Linux, it seems less
    problematic than one might think!

47
Case study: Cloning in Linux SCSI drivers
  • Nice, controlled experiment
  • Large body of code, multiple versions, well used
    system, open source
  • SCSI drivers all do similar tasks
  • Source comments show cloning has occurred!
  • Approx. 500 releases of Linux since 1994.
  • Kernel v2.3.39 (released Jan 2000)
  • 5000 source files, 2.2 MLOC, 10 hardware
    architectures
  • drivers/scsi has 212 source files, 166 KLOC,

48
Goals of case study
  • Examine real world cloning
  • How common is it?
  • Why is it done?
  • What do the cloning patterns look like?
  • Examine parallel evolution
  • What kinds of changes are common?
  • Do developers (need to) change clone relatives
    too?
  • Is there a better design structure lurking?
  • Compare against existing clone detection tools
  • Are detection tools looking for the right
    indications of cloning?

49
SCSI Subsystem - Size (rel. 2.2.16)
  • Number of source files: 211
  • Number of functions: 2512
  • Number of lines of code: 254,953
  • % of comments: 38
  • Number of low-level drivers: 80
  • File size
  • on average: 3000 lines
  • large multi-card drivers: 15,000 lines

50
SCSI Subsystem - Architecture
  • Upper Layer
  • Uniform way of handling devices
  • Hard Disk, CD-ROM Disk, Tape, Generic
  • Middle Layer
  • bridge between Upper Layer and Low-Level
    Devices
  • Low-Level Device Drivers
  • low-level driver functionality and management

51
Clones Expected?
  • Why did we expect to find clones?
  • Every driver must implement uniform interface
  • Design of subsystem does not support other forms
    of reuse
  • Driver logic is relatively simple (!)
  • Devices from same family → more cloning
  • Completely different hardware → less or no
    cloning
  • Open source → anyone can reuse code
  • Easier and more efficient to reuse existing code
  • Reused code already tested, so probably better
    quality than if we build it from scratch

52
Clones - Manual Inspection
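  • From source code comments, we have found evidence of cloning
    involving these driver files (figure; groupings not recoverable):
    esp.[ch], jazz_esp.[ch], cyberstorm.[ch], dec_esp.[ch],
    cyberstormII.[ch], mca_53c9x.[ch], blz2060.[ch], fastlane.[ch],
    qlogicisp.[ch], fdomain.[ch], sd.[ch], t128.[ch], qlogicpti.[ch],
    fd_mcs.[ch], sr.[ch], pas16.[ch]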
53
Types of Changes Detected
  • Names of variables
  • Initialization parameters and constants
  • Driver specific initialization logic
    removed/added
  • Small change in supporting functions
  • Small changes in driver management code
  • Comments are updated
  • Code changed is highly embedded into other code,
    which makes extraction of that code hard

54
Conclusion (Cloning)
  • Unclear that current clone detection tools do
    the right thing
  • Combination of different approaches should give
    the best detection results
  • Theory developed on clone management, detection,
    and removal is not universally applicable to all
    types of applications, languages, and designs
  • Need more qualitative analysis of cloning in the
    real world
  • As practised, code cloning in the Linux SCSI
    subsystem seems like a reasonable approach!

55
The past, present, and future of open source
software
  • Past
  • An outgrowth of the Unix sysadmin tradition
  • Bug fixes and evolution!
  • Present
  • Trendy ... but large companies see more
  • Apache is used by 63% of web servers [Aug
    02 Netcraft survey]
  • BIND used by vast majority of DNS servers
  • Sendmail is most widely used email transport
  • ...

56
The past, present, and future of open source
software
  • Future
  • Corporate prisoner's dilemma
  • Enforced open-ness allows companies to breathe
    easier, can concentrate on core strengths and
    real innovations
  • Governments are beginning to require open source
    and open standards when available
  • The German government and IBM recently announced
    a "far-reaching co-operation agreement"
    [NYTimes]
  • Concern over Microsoft, .NET, and monopolistic
    practices
  • Mono, an open source implementation of the .NET
    framework, being developed.

57
Open source infrastructure
  • Pros
  • Reliability, trust, confidence by users
  • Many users/readers aids in debugging
  • Can usually fork projects for specialized needs
  • Interoperability when component vendors conform
    to standards
  • Cons
  • Some open source licences can cause headaches
  • e.g., the GPL virus
  • Not in everyone's strategic best interests
  • Non-compliance is a recurring problem

58
Summary: Evolution and open source software
  • Open source software seems to break some of the
    rules of how successful software is built and
    evolved
  • Motivation for developers is fun, pride,
    professionalism, politics, rather than money
  • It violates some of Lehman's laws
  • yet this software is often of high quality and
    in wide use.
  • esp. common infrastructure-type systems
  • e.g., Linux/MacOS kernel, apache, ldap, samba,
    imap, ...