Title: Improving Dependency Structure of Large Software Projects
1 Improving Dependency Structure of Large Software
Projects
- Brown Bag Seminar
- Murat Gungor
- Friday, October 15, 2004
2 Goals
- Monitor progress in large software projects.
- Provide tools for continuous extraction of
structural quality from source code.
- Provide means to improve software systems'
dependency structure.
3 Introduction
- Software is an expensive product: it involves
intensive labor.
- Software projects typically consist of many
parts.
- Interdependency between parts of a project is
desirable: it is needed for one component to use
another. However, excessive dependency reduces
- Testability
- Maintainability
- Reusability
- Understandability
- Observing the current state of a project is
critically important, since early detection of
quality defects avoids the delays, difficulties,
and costs associated with development and evolution
later in the project lifecycle.
4 Problem Definition
- Dependencies between software files are essential
so that one component may provide services to
another.
- However, dependencies complicate the process of
making changes, perhaps to fix latent errors or
performance problems, because of the effects a change
may have on other files.
- When files each bind to many other files and
mutual dependencies exist between them,
maintenance and testing may become quite
difficult to carry out effectively.
- It is not uncommon for a change in one file to
precipitate a cascade of changes in other files,
especially in the presence of mutual dependencies.
5 Motivation
- Provide managers of large software projects
immediate views of the current state of their
projects' products.
- We study existing projects to try to understand
ways to do that.
- Our current work has shown that static dependency
structure is an important element of that
analysis.
6 Problem: Large Fan-out
[Figure: structure chart with large fan-out, after topological sort. Labels: Top. Sorted Files, Level Dependency]
- Depending on scores of other files
(large fan-out) may indicate a lack of cohesion:
the file is taking responsibility for too many,
perhaps only loosely related, tasks and needs the
services of many other files to manage that.
7 Problem: Large Strong Components
A strong component is a set of mutual dependencies.
[Figure: dependency chart after topological sorting, with strong components expanded. Labels: Top. Sorted Files, Level Dependency]
Files 2, 3, 4, and 5 cannot be ordered. The order
given is the best we can achieve.
- Ideal testing process:
- test those files with no dependencies, then test
all files depending only on files already tested.
- For testing, a strong component must be treated
as a unit. The larger a strong component becomes,
the more difficult it is to test adequately.
- Change management becomes tougher, due to
consequential changes made to fix latent errors or
performance problems.
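The testing order described on this slide can be sketched in a few lines of Python. This is only an illustration, not the DepAnal implementation: the function name `strong_components` and the six-file graph are hypothetical, and Tarjan's algorithm is used here as one standard way to find strong components. It emits components with dependencies first, which is the order in which they would be tested.

```python
def strong_components(deps):
    """Return strong components, dependencies first (Tarjan's algorithm).

    deps maps each file to the list of files it depends on.
    """
    index, low = {}, {}
    stack, on_stack = [], set()
    result = []
    counter = [0]

    def visit(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in deps.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            # v is the root of a strong component: pop its members.
            comp = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == v:
                    break
            result.append(sorted(comp))

    for v in sorted(deps):
        if v not in index:
            visit(v)
    return result


# Hypothetical graph: files 2, 3, 4 and 5 form a cycle.
deps = {1: [2], 2: [3], 3: [4], 4: [5], 5: [2, 6], 6: []}
test_order = strong_components(deps)
print(test_order)  # [[6], [2, 3, 4, 5], [1]]
```

File 6 can be tested alone, files 2 to 5 must be tested as a unit, and file 1 comes last. The recursive form is kept for readability; for thousands of files an iterative version would avoid Python's recursion limit.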
8 Problem: Large Fan-in
[Figure: structure chart with large fan-in, after topological sort. Labels: Top. Sorted Files, Level Dependency]
- High fan-in coupled with low quality creates a
high probability of consequential change. By
consequential change we mean a change induced in
a depending file by a change in the depended-upon
file.
9 Good Dependency Structure
[Figure: dependency chart after topological sorting, with strong components expanded. Labels: Top. Sorted Files, Level Dependency]
- Each component (file) depends only on its close
neighbors. All files have low fan-in and
fan-out. There is no call back to upper-level
components, or deep call forward.
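One way to check the "close neighbors" property is to assign each file a level and measure how many levels each dependency spans. The sketch below is an assumption-laden illustration, not the tool's method: the names `levels` and `edge_spans` and the four-file graph are hypothetical, the graph is assumed to be acyclic (strong components would first have to be collapsed), and it relies on the standard `graphlib` module from Python 3.9 or later.

```python
from graphlib import TopologicalSorter


def levels(deps):
    """Level 0 = no dependencies; otherwise 1 + highest level depended on."""
    level = {}
    # static_order() yields every file after all the files it depends on.
    for f in TopologicalSorter(deps).static_order():
        level[f] = 1 + max((level[d] for d in deps.get(f, ())), default=-1)
    return level


def edge_spans(deps, level):
    """Number of levels crossed by each (file, dependency) pair."""
    return {(f, d): level[f] - level[d] for f, ds in deps.items() for d in ds}


# Hypothetical project: "app" reaches past "ui" directly into "core".
deps = {"app": ["ui", "core"], "ui": ["core"], "core": ["util"], "util": []}
level = levels(deps)
spans = edge_spans(deps, level)
deep = sorted(edge for edge, span in spans.items() if span > 1)
print(level)  # {'util': 0, 'core': 1, 'ui': 2, 'app': 3}
print(deep)   # [('app', 'core')]
```

A span of 1 is a dependency on a close neighbor; larger spans are the deep calls the slide warns about. `TopologicalSorter` raises `CycleError` if the input contains a cycle.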
10 This is Mozilla, Version 1.4.1, Windows Build
Plot shows some very large mutual
dependencies
- This view is generated by our tools
- DepAnal
- DAView
- It shows all files that depend on one specific
file in largest strong component (Fan-In).
Green lines show Fan-Out of one file in a large
strong component. Note dependencies both inside
and outside component.
Size of bubble proportional to number of files in
strong component.
11 Is Complex Dependency Really a Problem?
- Mozilla was targeted for Apple OSX.10 (Panther)
but Apple switched to KHTML.
- "Apple snub stings Mozilla"
- Bourdon said Safari engineers looked at size,
speed and compatibility in choosing KHTML. In
addition to Mozilla, Apple also considered
building its own browser from scratch.
- "Translated through a de-weaselizer, (Melton's
e-mail) says 'Even though some of us used to
work on Mozilla, we have to admit that the
Mozilla code is a gigantic, bloated mess, not to
mention slow, and with an internal API so
flamboyantly baroque that frankly we can't even
comprehend where to begin,'" Zawinski wrote.
12 Visibility
- The dependencies shown on the previous slide are,
without our tools, invisible.
- Developers know only a small part of the
dependency structure based on their own reading
of the code. The rest they find by observing
breakage when they change something.
- Note that Mozilla 1.4.1 is composed of 6701
files! It is impossible to understand that dependency
structure without effective tools.
13 Project Monitoring
- Monitoring software quality in a development
project is an important task required of project
management, especially for large-scale projects.
- Constant feedback is an essential part of project
management.
- Watching progress manually is not effective
in terms of time or correctness of results.
- Up-to-date project documentation is not always
available, but source code is.
- Obtaining information from source code provides
instant feedback. How do you do that for 6701
files?
14 Static Source Code Analysis
- Provides an instant snapshot of a project's state
- Helps to diagnose the state of health of a software
project effectively.
- Provides (almost) accurate results
- Provides constant progress monitoring
- Helps to determine the effect of potential decisions
- Helps to improve control because we change based
on measurements, not guesses
15 Focus is dependencies among files
- Many engineering organizations use source code
files as the unit for analysis, management, and
testing.
- Because we seek to provide support, we
- Investigate dependency structure between files.
- Identify causes of dependency.
- Research possible ways to improve dependency
structure of existing software.
- Automate static source code analysis to extract
dependency structure and other software metrics.
- This isn't as easy as it sounds for large file
sets.
16 Importance of this Study
- Software's quality depends on the quality of its
parts.
- Future enhancements depend on the existing system.
- Maintainability depends on the quality of the current
foundation.
- Reuse is directly affected by dependencies.
- To reuse in a different context implies that we
can extract the reused code from its context.
- That can't be done when dependencies are out of
control.
17 Scope of the Study
- We are not analyzing syntactic correctness of
code.
- We are not analyzing logical correctness of code.
- Its applicability includes C-based procedural and
object-oriented languages: C, C++, C#, Java.
- Our tools only support C and C++.
- Much of the remaining work deals with repackaging
the content of existing code files to enhance
dependency quality.
- Research on repackaging techniques with
heuristics, optimization.
- Intent is to modify structure, not introduce new
code.
- Creating applications to automate obtaining
information from source files.
18 Progress till now
- Developed DepAnal, a C/C++ static source
code dependency analyzer tool.
- Developed DAView, which visualizes dependencies
among files and components in graphical
representations.
- Preparing paper for submission to
- ISCA 20th INTERNATIONAL CONFERENCE ON COMPUTERS
AND THEIR APPLICATIONS (CATA-2005), March
16-18, 2005, New Orleans, Louisiana,
USA, http://isca-hq.org/confr.htm
- Full paper submission deadline:
November 5, 2004
19 Dependency Model
- Focus is dependencies between files.
- Files are the unit of testing and configuration
management.
- Based on types, global functions and variables.
- Dependency Model - file A depends on file B if
- A creates and/or uses an instance of a type
declared or defined in B
- A is derived from a type declared or defined in B
- A is using the value of a global variable
declared and/or defined in B
- A defines a non-constant global variable modified
by B
- A uses a global function declared or defined in B
- A declares a type or global function defined in B
- A defines a type or global function declared in B
- A uses a template parameter declared in B
- Outputs are presented as direct dependencies.
(Transitive closures are not shown, for ease of
interpretation: they are too dense.)
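A much simplified sketch of how such a model can be turned into direct file dependencies is given below. It covers only the "A uses a name declared or defined in B" rules, and it assumes that the declared and used names per file have already been extracted. The function name `direct_dependencies`, the input layout, and the file names are all hypothetical; this is not how DepAnal is implemented.

```python
def direct_dependencies(declares, uses):
    """Map each file to the files that declare or define names it uses.

    declares: file -> set of type/function/global names it declares or defines
    uses:     file -> set of names it uses
    """
    # Index: name -> files that declare or define it.
    owners = {}
    for f, names in declares.items():
        for name in names:
            owners.setdefault(name, set()).add(f)

    deps = {}
    for f, names in uses.items():
        targets = set()
        for name in names:
            targets |= owners.get(name, set())
        targets.discard(f)  # a file does not depend on itself
        deps[f] = sorted(targets)
    return deps


# Hypothetical input for three files.
declares = {
    "shape.h": {"Shape"},
    "circle.h": {"Circle"},
    "draw.cpp": {"draw_all"},
}
uses = {
    "shape.h": set(),
    "circle.h": {"Shape"},  # Circle derives from Shape
    "draw.cpp": {"Shape", "Circle", "draw_all"},
}
deps = direct_dependencies(declares, uses)
print(deps)
# {'shape.h': [], 'circle.h': ['shape.h'], 'draw.cpp': ['circle.h', 'shape.h']}
```

Only direct dependencies are reported: draw.cpp lists shape.h because it uses Shape itself, not because circle.h does.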
20 Architectural view of DepAnal
- The goal is to build a tool that can be used to
constantly monitor evolution of the state of
large software systems.
- Makes two passes over each file in the project.
Finds dependencies based on static type analysis.
- DepAnal collects data from source code with the
help of a C/C++ tokenizer and semi-expression
composer.
21 Mozilla Project Version 1.4.1
- The Mozilla project is a very large project
developing browser tools for many different
platforms.
- Win32 configuration
- Number of executables: 94
- Number of dynamic link libraries: 111
- Number of static libraries: 303
- Number of source files for Win32, v 1.4.1:
6701
- Analysis took approximately 24 hours on a Dell
Dimension 8300 with 1 GB of memory
Wow!
22 Dependency Analysis Results
- Show different views of dependency data for a
project and draw conclusions about what such data
can disclose concerning a project's
implementation.
- The analysis results are presented for several
data sets, in six views:
- Fan-in: the number of files that depend on a
file, for each file in the analysis set, and the
related fan-in density histogram.
- Fan-out: the number of files that a file depends
on, for each file in the analysis set, and the
related fan-out density histogram.
- Strong components: groups of files that are all
mutually dependent, and the related strong
component density histogram.
- Topological sort of the strong components.
- Expansion of all strong components within the
sorted data.
- Cyclomatic complexity versus file size.
- We examine each of these views and interpret
their data with respect to the measures of project
implementation strengths and weaknesses they
reveal.
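The fan-in, fan-out and density views can be illustrated with a short Python sketch. It assumes the direct dependencies are already available as a mapping from each file to the files it depends on; the function names and the four-file example are hypothetical and are not taken from the tools.

```python
from collections import Counter


def fan_out(deps):
    """Number of distinct other files each file depends on."""
    return {f: len(set(targets) - {f}) for f, targets in deps.items()}


def fan_in(deps):
    """Number of distinct other files that depend on each file."""
    counts = {f: 0 for f in deps}
    for f, targets in deps.items():
        for t in set(targets):
            if t != f:
                counts[t] = counts.get(t, 0) + 1
    return counts


def density(values):
    """Histogram: metric value -> number of files with that value."""
    return dict(sorted(Counter(values.values()).items()))


# Hypothetical analysis set of four files.
deps = {
    "a.cpp": ["util.h", "b.h"],
    "b.cpp": ["b.h", "util.h"],
    "b.h": ["util.h"],
    "util.h": [],
}
print(fan_out(deps))          # {'a.cpp': 2, 'b.cpp': 2, 'b.h': 1, 'util.h': 0}
print(fan_in(deps))           # {'a.cpp': 0, 'b.cpp': 0, 'b.h': 2, 'util.h': 3}
print(density(fan_in(deps)))  # {0: 2, 2: 1, 3: 1}
```

The density histogram is what the density plots summarize: how many files share each fan-in or fan-out value.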
23 Fan-in Data: Mozilla GKGFX library
- Number of source files: 655.
- Dependencies from within the library.
- When we analyze the entire build many of these
fan-in numbers will increase.
High Fan-in coupled with low quality creates a
high probability for consequential change.
24 Fan-in Density: Mozilla GKGFX library
- Plot shows that a significant number of library
source code files have high fan-in,
characteristic of a widely used library.
A library with this profile should be given high
priority for analysis by the test team and
quality analysts.
25 Fan-out Data: Mozilla GKGFX library
- A file with large fan-out may be symptomatic of a
weak abstraction.
Fan-Out of 60!
We expect that a well-designed source file should
carry out its assigned tasks with the aid of a
few trusted delegates and perhaps a few
references to commonly used utilities.
26 Fan-out Density: Mozilla GKGFX library
- Large fan-out may be symptomatic of weak
abstraction. We've shown elsewhere that high
fan-out is correlated with a large number of
changes.
There are a significant number of files with
large fan-out.
27 Expanded Topological Sort: GKGFX Library
- If the file belongs to a strong component and any
other file in that component is changed, rigorous
testing dictates that it be retested. This makes
a compelling argument in favor of continuous
regression testing using test harnesses.
Approximately half the files in this library
cannot be put into a classic testing sequence.
This indicates a high probability of repeatedly
testing a given file.
Components below the diagonal are due to cycles
in the dependency graph, i.e. mutual dependencies.
28 Dependency Data for the Entire Windows-Based
Mozilla Build
- The plot below shows the dependency graph of the
entire Mozilla build for Windows after topological
sorting, with strong components expanded.
This plot is so dense that it is difficult to draw
conclusions, but it is consistent with the previous
figure for the GKGFX library.
29 This is Mozilla, Version 1.4.1, Windows Build
Plot for GKGFX Library shows some very
large mutual dependencies
- DAView shows that the GKGFX Library does indeed
have significant structural problems, as
predicted by the preceding views.
- Note that these problems, made visible by our
tools, are normally invisible!
30 Towards Improvement
- If we can identify (low-level) causes of
dependencies, we can reorganize file contents to
improve inter-file dependency structure.
- Intent is not to introduce new code, but to
redistribute existing code between files to get
better structure.
31 Future Research
- Future research's primary goal is to
- Provide means to improve software systems'
dependency structure.
End of presentation. Thanks for listening.
32 Backup Slides
- The following slides provide a little more detail
in a few areas.
33 Files - Unit for analysis
- In most development organizations, files are the
unit of testing and configuration management.
- Dependencies between software files are essential
so that one component may provide services to
another.
- If a file is using services of other files, it
cannot be tested alone.
- The larger the number of dependencies between
files, the harder they are to test,
manage, understand, and reuse. The situation gets
worse if there are mutual dependencies.
- Therefore, it is better to reduce dependencies
between files, especially mutual dependencies.
34 File Dependency Structural Problems
- Large Fan-out
- Large Fan-in
- Large Strong Components
- Many and inconsistent levels in structure chart
35 Fine grain level dependency
- One file depends on another file if it uses the
other file's services:
- Types
- Global Functions
- Global Variables
- To solve the file dependency problems we need to
find more than file-to-file dependencies. We check
type-to-type, type-to-global function or
variable, global function-to-type, and global
function-to-global function or variable.
- If we obtain this information, we have fine-grain
level dependencies. Now we can relocate some
existing code to reduce dependency density among
files.
36 Method Used for Improving Dependency
- One simple solution is to put the content of all
files into one file, but this is not what we
would like to do: uugggh, spaghetti code!
- Our target is to simplify dependency structure by
moving types, global functions and variables
among existing files, and/or introducing new
files, keeping file complexity essentially
constant.
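The effect of relocating one symbol can be sketched as follows. The example assumes fine-grain (symbol-to-symbol) dependencies and the file that holds each symbol are known. All names, including `file_deps`, `mutual_pairs`, the symbols and the files, are hypothetical and only illustrate the idea; this is not a repackaging algorithm from this work.

```python
def file_deps(location, symbol_deps):
    """Derive file-to-file dependency edges from symbol-level dependencies.

    location:    symbol -> file that contains it
    symbol_deps: symbol -> symbols it uses
    """
    edges = set()
    for sym, used in symbol_deps.items():
        for u in used:
            a, b = location[sym], location[u]
            if a != b:
                edges.add((a, b))
    return edges


def mutual_pairs(edges):
    """Pairs of files that depend on each other."""
    return {tuple(sorted(e)) for e in edges if (e[1], e[0]) in edges}


# Hypothetical code base: log_error lives in parse.cpp, so lex.cpp and
# parse.cpp depend on each other.
symbol_deps = {
    "Parser": ["Lexer", "log_error"],
    "Lexer": ["log_error"],
    "log_error": [],
}
location = {"Parser": "parse.cpp", "log_error": "parse.cpp", "Lexer": "lex.cpp"}
before = file_deps(location, symbol_deps)

# Move log_error, unchanged, into a new file.
relocated = dict(location, log_error="log.cpp")
after = file_deps(relocated, symbol_deps)

print(mutual_pairs(before))  # {('lex.cpp', 'parse.cpp')}
print(mutual_pairs(after))   # set()
```

No code was added or rewritten; only the placement of one function changed, and the mutual dependency between the two files disappeared.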
37 Comparing with previous dependency structure
- To see the improvement in dependency
structure, we will compare the original
dependency structure with the enhanced one by
comparing
- Dependency structure
- Fan-in, fan-out, strong component size
- Level analysis
- And the other graphics presented in the previous
slides