Title: The Software Matrix: An Architecture for Software Salvage
1 The Software Matrix: An Architecture for Software Salvage
- Riddhiman Ghosh
- Advisor: Dr. James Fawcett
- Master of Science Thesis, Dept. of Electrical Engineering and Computer Science, Syracuse University, December 15, 2004
2 Software Salvage
- Salvage: the lifting of significant blocks of existing systems and inserting them into a newly developed system.
3 Overview
- Our work centers on significantly managing the problem of software salvage.
- We propose an architecture for system construction to simplify salvage, and use message passing and mediator structures.
- Message passing has been used before to communicate between processes and machines.
- Our contribution is to show that Mediated Queued Message Passing at the module level is an effective way to support software salvage.
- Adding the mediator structure makes assembling a new application with salvaged parts much easier.
- Our work shows we gain a lot of flexibility at an acceptable performance cost.
4 Introduction
- Approximately 100 billion lines of source code are at work in the world today.
- Large fractions of code in systems are functionally equivalent.
- However, systems are often constructed without leveraging existing code bases: less than 15% of new code serves an original purpose.
- Reinvention of the wheel in the software industry.
5 Introduction
- Software construction is needlessly error-prone and expensive.
- We're maintaining multiple copies of essentially the same software.
- The time and cost of developing, testing and documenting a piece of software is multiplied by the number of equivalent copies in existence.
6 Software Reuse
- The study of reuse (the recycling of software assets) has been an important branch of the software engineering discipline.
- Effective reuse promises
  - Reduced development and maintenance costs
  - Gains in development schedule and quicker time-to-market
  - Increased robustness and quality
- Not a new idea.
7 Software Reuse
- Systematic Reuse: an institutionalized organizational approach to product development in which software assets are intentionally created or acquired to be reusable.
- However, few organizations practice systematic reuse, in spite of a recognition of its potential benefits.
- Reuse is hard!
8 Gap between theory and practice in Reuse
"Not all the theoretical oriented and sophisticated solutions presented by researchers have ...thrilled the practitioners..." (Zand, M.)
"...the reuse community has worked on complex technologies and methods with high ceremony, yet most of the software community seems to be looking for simpler solutions... High ceremony methods require an organization with high process maturity to achieve success." (Griss, M.)
9 Software Salvage
- What we see in the industry is not necessarily software reuse, but rather software salvage.
- Salvage: the lifting of significant blocks of existing systems and inserting them into a newly developed system.
- The Radar Systems Department of General Electric Company (Syracuse, NY) routinely attempted salvage in the building of a new radar.
10 Reuse vs. Salvage
- We make a distinction between the terms software reuse and software salvage.
- Reuse connotes immutability
  - the individual pieces meant for reuse cannot be modified: they're designed and implemented to be adaptable, but with no intent to change even a single character of source code.
- Salvage makes no guarantee of immutability
  - very often the source code of the individual pieces being salvaged is modified.
11 Problem
- Effective salvage is difficult to accomplish
  - The large pieces we want often have many dependencies on parts we don't want
  - Requires expensive changes
- Think of salvage as analogous to transplanting the heart from one living organism to another! Similar problems arise when we pull a part out of a software system, due to the connectedness inherent in typical software.
12 Problem
- It is difficult to salvage existing parts of systems and easily build new systems from them.
13 Goal
- Enable leveraging of major parts of systems
- Eliminate or reduce changes required by salvage so that salvage moves closer to the reuse model (immutability), without requiring a high-process organization.
- Simplify software salvage to make it a useful paradigm.
14 Prior Approaches
- Two major (and fairly recent) approaches towards encouraging reuse have been
  - object-oriented reuse
  - component-oriented reuse.
15 Object-Oriented Reuse
- Reuse has been one of the classic motivations of object-orientation
- Object-oriented theory
  - mandated discipline in writing code through best-practice guidelines such as separating interface from implementation
  - encouraged reuse through inheritance, composition, parameterization
16 Object-Oriented Reuse
- However, OO technologies didn't quite engender the reuse revolution that was hoped for.
- The only widely used OO libraries are user-interface frameworks (such as MFC) and libraries for data structures.
- OO techniques have made compiler libraries an effective means of reuse, but business organizations have had a harder time getting leverage from their pre-existing software assets solely through the creation and consumption of OO libraries.
17 Object-Oriented Reuse
- The definition of an object is purely technical
  - Defined as an encapsulation of state and behavior
  - No direct mapping to the physical unit that is actually deployed, versioned and potentially reused.
- Objects don't fit the salvage model well
  - In terms of granularity, they are of an inappropriate size to be mixed and matched.
  - They are the size of grains of sand, while what salvagers are typically looking for in building systems are bricks.
18 Component-Oriented Systems
- The logical next step was the notion of components
  - a coherent package of software implementation that can be independently deployed and composed with other components.
- Defining feature: a system can be improved by updating only its components
  - of coarser granularity than objects
  - independent and deployable implies executable or loadable code
19 Component-Oriented Systems
- There are several component technology offerings
  - COM, CORBA, JavaBeans/EJB, .NET Components
- Adopting a component-oriented approach (rather than only concentrating on programming language, as in OO theory) is beneficial.
- But the existence of these component technologies has not solved the problems that plague software salvage, discussed earlier
  - Packaging techniques alone are not enough
20 Our Approach
- We are of the view that components are useful only if there is a framework that actively supports and promotes their reuse.
- .NET and J2EE have framework support; however, it is focused on generic industry problems (network communication, web publishing, application security, etc.)
- We seek to support vertical, perhaps proprietary, applications, where building a huge framework is impractical from a return-on-investment point of view.
21 Our Approach
- Make salvage easier by
  - viewing applications as compositions of different pieces
  - having a framework of collaborating pieces from which applications can be composed dynamically
- Try to achieve the benefits of the Software-IC model: a plug-and-play approach to software construction.
22 Our Approach
- We are limiting the hard problem of general reuse to a smaller domain: salvage within an organization, to be used in the construction of modest-sized systems (about 200,000 lines of code).
- To address this domain would be to provide solutions of value to the small and medium-sized software shops that have been reluctant to adopt systematic reuse.
23 Software Matrix
- The Software Matrix is a framework that actively supports and promotes the salvage of components.
- We focus on the reuse of major blocks of code rather than low-level functionality
- By employing message-passing and mediator structures and by supporting the discovery of needed types, we've built a pluggable architecture that can gracefully adapt to salvage operations.
24 Software Matrix
- In particular, the Matrix is a runtime infrastructure that acts as a substrate into which the individual pieces of an application (different blocks of code) can plug.
25 Software Matrix
From these plugged-in individual pieces the Matrix dynamically composes applications.
26 Software Matrix
- This infrastructure was named the Software Matrix in order to connote a very structured pattern of building software.
- System elements are embedded in this substrate: a matrix/collection through which applications are dynamically composed.
- The image of endless banks of incubators of humans in a matrix, as from The Matrix motion picture, isn't entirely unintentional, if somewhat flippant.
27 Cells
- The individual blocks of code that plug in to the Matrix are called Cells and are the building blocks out of which applications are built.
28 Cells
- Cells represent the unit of composition and reuse in our system.
29 Cells
- An application is built through the collaboration of Cells
- Cells communicate with each other strictly through messages (we use XML-encoded messages)
30 Cells
- One of the problems of salvaging
  - extracting parts of monolithic (or very tightly-coupled) applications.
- We are insisting on loose coupling
  - Enforce a separation of concerns between Cells by inserting a message bus between them
31 Message Passing
- Advantages of a Message Passing Interface (MPI) over a Procedure Call Interface (PCI)
  - MPI is a universal interface. All entities using MPI use something equivalent to GetMessage() and PostMessage(msg). So any Cell providing MPI is guaranteed to be plug-compatible with another Cell accepting MPI.
  - Messages are self-describing command and state carriers. If a Cell does not understand all elements sent to it, it's free to ignore what it does not understand. PCI has no such support.
  - Messages can be easily intercepted and inspected. Therefore MP systems are easier to debug.
  - Message Passing makes it easier to use the Mediator pattern, since the Mediator does not need to know a lot of interfaces, just the small MPI.
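The two advantages listed first can be illustrated with a minimal sketch. It is written in Python rather than the .NET used by the Matrix, and it uses dictionaries instead of XML messages; the `Endpoint` class, the `SU.Math.Scale` message type and the `trace_id` field are hypothetical names chosen only for illustration.

```python
import queue


class Endpoint:
    """Hypothetical stand-in for any entity exposing the two-call message interface."""

    def __init__(self):
        self._inbox = queue.Queue()

    def post_message(self, msg):
        # Analogous to PostMessage(msg): the sender needs no knowledge of the receiver's type.
        self._inbox.put(msg)

    def get_message(self):
        # Analogous to GetMessage(): block until a message is available.
        return self._inbox.get()


def handle_scale(msg):
    """Use only the fields this receiver understands and report the ones it ignored."""
    known = {"type", "values", "factor"}
    ignored = sorted(set(msg) - known)
    result = [v * msg.get("factor", 1) for v in msg["values"]]
    return result, ignored


endpoint = Endpoint()
# The sender adds a field ("trace_id") that the receiver was never written to understand.
endpoint.post_message(
    {"type": "SU.Math.Scale", "values": [1, 2, 3], "factor": 2, "trace_id": "abc"}
)
result, ignored = handle_scale(endpoint.get_message())
```

With a procedure call, the extra argument would break the call signature; here the receiver simply skips it.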
32 Mediator Pattern
- The use of the Mediator pattern is significant in the Software Matrix
- About the Mediator pattern, Gamma et al. say:
  "Mediator promotes loose coupling by keeping objects from referring to each other explicitly, and it lets you vary their interaction independently."
  "...lots of interconnections make it less likely that an object can work without the support of others."
- Thus, proliferating interconnections between different partitions of a system reduces the potential for salvage. The Matrix acts as a mediator: the different Cells only know about the mediator.
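A minimal Python sketch of the idea, under the assumption that routing is keyed by message type: the `Mediator` and `Plotter` classes and the `SU.Signal.Filter` message type are hypothetical and are not taken from the Matrix source.

```python
class Mediator:
    """Hypothetical mediator: the only object that knows which handler serves a message type."""

    def __init__(self):
        self._handlers = {}

    def register(self, message_type, handler):
        # Registering again under the same type replaces the previous handler.
        self._handlers[message_type] = handler

    def send(self, message_type, payload):
        return self._handlers[message_type](payload)


class Plotter:
    """A participant that refers only to the mediator, never to the filter directly."""

    def __init__(self, mediator):
        self.mediator = mediator

    def points_to_plot(self, samples):
        return self.mediator.send("SU.Signal.Filter", samples)


mediator = Mediator()
mediator.register("SU.Signal.Filter", lambda xs: [x for x in xs if x >= 0])
plotter = Plotter(mediator)
first = plotter.points_to_plot([-1, 2, 3])

# Swap in a different filter; Plotter is untouched because it never named the old one.
mediator.register("SU.Signal.Filter", lambda xs: [abs(x) for x in xs])
second = plotter.points_to_plot([-1, 2, 3])
```

Because `Plotter` holds no reference to either filter, it could be lifted into another application that offers any handler for the same message type.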
33 A closer look at Cells
- Every Cell contains
  - a message queue
    - holds request and response messages during collaboration with other cells.
34 A closer look at Cells
- Every Cell contains
  - a capability list
    - used to advertise the capabilities of a cell to other cells, via the Matrix (e.g., SU.Math.Convolution)
    - The capability list is used by the Matrix in order to discover the right cells for system construction.
35 A closer look at Cells
- Every Cell contains
  - a globally unique identifier (GUID)
    - Cells can be uniquely identified using a GUID. This is used by the Matrix for several operations such as discovery and registration.
36 A closer look at Cells
- Every Cell contains
  - functionality
    - Cells also contain the functionality that allows them to be considered as software assets with potential for reuse.
37 A closer look at Cells
- All cells subscribe to a common protocol (ICell) that specifies how to
  - register and un-register with the Matrix
  - advertise capabilities
  - send and accept messages
  - collaborate with other cells (could be synchronous, asynchronous or one-way)
[Figure: MyCell]
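The protocol and the four parts of a Cell described on the preceding slides can be sketched as an abstract base class. This is a Python illustration under stated assumptions, not the .NET ICell definition: the method signatures and the plain dictionary used as a registry are invented for the example, while `MyCell` and `SU.Math.Convolution` are the names that appear on the slides.

```python
import queue
import uuid
from abc import ABC, abstractmethod


class ICell(ABC):
    """Sketch of the common Cell protocol; signatures are assumptions."""

    def __init__(self):
        self.guid = uuid.uuid4()      # globally unique identifier
        self.capabilities = []        # message types advertised via the Matrix
        self.inbox = queue.Queue()    # holds request and response messages

    def register(self, registry):
        registry[self.guid] = self

    def unregister(self, registry):
        registry.pop(self.guid, None)

    def accept(self, message):
        self.inbox.put(message)

    @abstractmethod
    def process(self, message):
        """Cell-specific functionality goes here."""

    def start(self):
        # Empty entry point: this cell behaves as a passive server.
        pass


class MyCell(ICell):
    def __init__(self):
        super().__init__()
        self.capabilities.append("SU.Math.Convolution")

    def process(self, message):
        return f"handled {message['type']}"


registry = {}
cell = MyCell()
cell.register(registry)
cell.accept({"type": "SU.Math.Convolution"})
reply = cell.process(cell.inbox.get())
```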
38 A closer look at Cells
- Every cell also has an entry-point (start), and is given a chance to execute once it is plugged in.
- This entry-point (empty/non-empty) decides whether a cell will be only a passive server, or will itself actively seek collaboration from other cells.
39 Example
- The sample application needs to read data samples from an input file, perform a signal processing operation on the data (e.g., filtering), plot the results of the operations on the system display, and log the results to a file.
- The major pieces of the application would be responsible for
  - file operations
  - signal processing
  - graphical plotting
40 Example
- These would be written as cells and plugged in to the Matrix.
- The Matrix would then assume the responsibility of constructing the application from these individual cells.
- In order to perform its task a cell may need the services of another cell. But cells do not explicitly bind to other cells: they only specify what message type they need handled, and the Matrix discovers cells capable of handling that message.
- If no suitable cell is found, a "not supported" message is generated.
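Discovery by message type can be sketched as follows. This is a simplified Python illustration, not the Matrix implementation: the `Cell` and `Matrix` classes, the reply dictionaries and the message types `SU.Signal.Filter` and `SU.Plot.Draw` are assumptions made for the example.

```python
class Cell:
    """Hypothetical cell: a name, a capability list and a handler function."""

    def __init__(self, name, capabilities, handler):
        self.name = name
        self.capabilities = set(capabilities)
        self.handler = handler


class Matrix:
    def __init__(self):
        self.cells = []

    def register(self, cell):
        self.cells.append(cell)

    def discover(self, message_type):
        # Return the first registered cell that advertises the message type.
        for cell in self.cells:
            if message_type in cell.capabilities:
                return cell
        return None

    def send(self, message_type, payload):
        cell = self.discover(message_type)
        if cell is None:
            return {"type": "not supported", "request": message_type}
        return {"type": "response", "from": cell.name, "result": cell.handler(payload)}


matrix = Matrix()
matrix.register(
    Cell("SignalProcessing", ["SU.Signal.Filter"], lambda xs: [x for x in xs if abs(x) < 10])
)
filtered = matrix.send("SU.Signal.Filter", [1, 50, -3])
missing = matrix.send("SU.Plot.Draw", [1, 2])
```

The sender names only the message type; which cell answers is decided by the registry at the time of the call.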
41 Example
42 Example
Sequence of events in the construction of the sample application
43 Example
Dynamic composition: we are building a system from pieces that exist on the Matrix at runtime.
The Matrix takes care of the compositional aspects (as opposed to only the computational aspects) of software.
The Matrix automatically connects the right pieces at run-time without having to bind to anything explicitly at compile-time.
Software is now amenable to salvage operations. The very same Cells could be used to build other applications.
44 How do cells plug in?
- Cells are implemented as plug-in modules, and are realized using .NET components.
- The Matrix uses the reflection and late-binding (Fusion) features of the .NET framework to discover and register plug-ins.
- Given a .NET assembly, the Matrix will reflect over the contained types to determine which of them implement the ICell interface in order to recognize valid plug-ins.
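The same check can be sketched in Python as an analogue of reflecting over a .NET assembly: a module object stands in for the assembly and `inspect` stands in for the .NET reflection services. The `find_cells` helper and the `Helper` class are hypothetical names introduced for the illustration.

```python
import inspect
import types
from abc import ABC, abstractmethod


class ICell(ABC):
    @abstractmethod
    def process(self, message):
        ...


def find_cells(module):
    """Return the concrete classes in `module` that implement ICell."""
    return [
        cls
        for _, cls in inspect.getmembers(module, inspect.isclass)
        if issubclass(cls, ICell) and cls is not ICell and not inspect.isabstract(cls)
    ]


class FileManager(ICell):
    def process(self, message):
        return "ok"


class Helper:
    """An ordinary class that does not implement the protocol."""


# Stand-in for a loaded assembly: a module holding a mix of types.
plugin = types.ModuleType("plugin")
plugin.ICell = ICell
plugin.FileManager = FileManager
plugin.Helper = Helper

found = [cls.__name__ for cls in find_cells(plugin)]
```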
45 How do cells plug in?
- Throughout its lifetime, the Matrix monitors the plug-in directory in order to discover new cells
- Valid cells are registered and available for system construction.
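One simple way to monitor a directory is to compare successive listings. The sketch below is a Python illustration of that approach and is not necessarily how the Matrix does it; the `scan_plugins` helper and the `.dll` file filter are assumptions.

```python
import os
import tempfile


def scan_plugins(directory, known):
    """Return plug-in file names that appeared since the last scan and record them in `known`."""
    current = {name for name in os.listdir(directory) if name.endswith(".dll")}
    new = sorted(current - known)
    known |= current
    return new


with tempfile.TemporaryDirectory() as plugin_dir:
    known = set()
    open(os.path.join(plugin_dir, "FileManager.dll"), "w").close()
    first = scan_plugins(plugin_dir, known)

    # A second binary is dropped in later, alongside an unrelated file.
    open(os.path.join(plugin_dir, "Plotter.dll"), "w").close()
    open(os.path.join(plugin_dir, "notes.txt"), "w").close()
    second = scan_plugins(plugin_dir, known)
```

A long-running monitor would call `scan_plugins` at regular intervals and hand each new file to the loader for validation.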
46 Steps to take advantage of the Matrix
- In order to enable salvage, pieces of an application (at the time of writing it, or existing pieces) are wrapped in a Cell.
- Create a wrapper that inherits from ICell (e.g., we wish to create a FileManager Cell responsible for common file operations):
    class FileManager : ICell
    {
        ...
    }
47 Steps to take advantage of the Matrix
- The Capability List of this FileManager Cell is then populated to indicate the types of messages it is capable of handling:
    CapabilityList.Add("SU.FileManager.Files.Read")
    CapabilityList.Add("SU.FileManager.Files.Write")
    CapabilityList.Add("SU.FileManager.Files.Search")
    CapabilityList.Add("SU.FileManager.Files.Compression")
48 Steps to take advantage of the Matrix
- Override the process method to add appropriate message processing
- Basically you specify what is to be done in response to a particular message type: delegate calls to your implementation.
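A minimal Python sketch of such a process method, assuming it receives a message type and a parameter dictionary (the real .NET signature is not shown on the slide). The in-memory `files` dictionary stands in for existing file-handling code, and the `_read` and `_search` helpers are hypothetical.

```python
class FileManagerCell:
    """Hypothetical wrapper Cell: `process` only maps message types onto existing code."""

    def __init__(self, files):
        self._files = files  # stand-in for the salvaged file-handling code
        self.capability_list = [
            "SU.FileManager.Files.Read",
            "SU.FileManager.Files.Search",
        ]

    # Existing implementation being wrapped.
    def _read(self, name):
        return self._files[name]

    def _search(self, text):
        return sorted(name for name, body in self._files.items() if text in body)

    def process(self, message_type, params):
        # Delegate each supported message type to the corresponding implementation.
        if message_type == "SU.FileManager.Files.Read":
            return self._read(params["name"])
        if message_type == "SU.FileManager.Files.Search":
            return self._search(params["text"])
        raise ValueError(f"not supported: {message_type}")


cell = FileManagerCell({"a.txt": "radar data", "b.txt": "notes"})
read_result = cell.process("SU.FileManager.Files.Read", {"name": "b.txt"})
search_result = cell.process("SU.FileManager.Files.Search", {"text": "radar"})
```

The wrapped code is not modified; only the thin dispatch layer is new.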
49 Steps to take advantage of the Matrix
- Compile, and copy the resulting binary into the plug-in directory of the Software Matrix. The Matrix will automatically detect and register the cell, and it will be available for composition.
50 Steps to take advantage of the Matrix
- If a cell wishes to use other cells (or is a program executive, for instance), it will probably say:
    ...
    Result = syncSend("SU.FileManager.Files.Search", Params)
    ...
- Here it is trying to locate a cell that can handle the named message type.
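What a synchronous send over queued messages might look like can be sketched in Python. This is an assumption-laden illustration rather than the Matrix's syncSend: the `Matrix` class, the per-request reply queue, the worker thread per message type and the `pattern` parameter are all invented for the example.

```python
import queue
import threading


class Matrix:
    """Hypothetical mediator that relays requests to the serving cell's queue."""

    def __init__(self):
        self._queues = {}  # message type -> request queue of the serving cell

    def register(self, message_type, handler):
        requests = queue.Queue()
        self._queues[message_type] = requests

        def serve():
            # The serving cell drains its own queue and answers on the reply queue.
            while True:
                params, reply_to = requests.get()
                reply_to.put(handler(params))

        threading.Thread(target=serve, daemon=True).start()

    def sync_send(self, message_type, params, timeout=5.0):
        requests = self._queues.get(message_type)
        if requests is None:
            return "not supported"
        reply_to = queue.Queue(maxsize=1)
        requests.put((params, reply_to))
        # Block the caller until the response arrives: this is what makes the send synchronous.
        return reply_to.get(timeout=timeout)


files = ["report.doc", "radar.log", "radar.cfg"]
matrix = Matrix()
matrix.register(
    "SU.FileManager.Files.Search",
    lambda params: [f for f in files if params["pattern"] in f],
)
result = matrix.sync_send("SU.FileManager.Files.Search", {"pattern": "radar"})
unsupported = matrix.sync_send("SU.Plot.Draw", {})
```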
51 Features
- Fine-grained message-passing
  - We are using message passing at a much finer level of granularity than is normally seen
  - The Matrix requires that messages be the only mode of collaboration between different parts of an application
  - Decreases the degree of coupling, which is critical to the success of salvage operations.
52 Features
- Dynamic Composition
  - The Matrix takes care of the compositional aspects of software by automatically discovering and connecting the right pieces and building an application at runtime
  - By adding a few more cells into the Matrix if needed, we can build new applications by reusing existing cells.
53 Features
- Support for System Evolution
  - The appropriate cell to serve a particular message type is selected at runtime
  - Evolving requirements can be accommodated easily by modifying only those cells that represent the affected part of the system.
  - Easy to field-replace cells: simply copy the new cell over the old cell in the plug-in directory.
  - Effective way to support program maintainability and bug fixes/upgrades.
54 Features
- Simplicity
  - The Matrix is a supporting infrastructure that is lightweight (the bare infrastructure is roughly only 2000 lines of code)
  - Simple to use. The infrastructure comes with documented full source (if needed) and sample applications.
  - The underlying technology used is the .NET component model, which is less complex than other models such as COM.
55 System Design
High-level partitioning of the Software Matrix
56 Class Relationships
57 Assessment
- The Software Matrix was used in the construction of real-world tools
  - File Synchronizer
  - NetView
- An assessment of our technique was made based on qualitative and quantitative factors such as
  - ease of salvage
  - cognitive distance
  - lines of source code
  - performance
- Built both the Matrix way and the traditional way.
58 File Synchronizer Application
- File Synchronizer is a useful tool that allows for remote directory synchronization
59 File Synchronizer Application
- What does it do?
  - Allows the contents of a directory on one machine to match those of a specified directory on another machine over the network/Internet.
  - Always have the latest versions of a specified set of files: newer versions update older versions
  - Files present on one side and not on the other are copied
  - Older files will not overwrite newer files without explicit user permission
- File Synchronizer has a peer-to-peer architecture
  - instances running on different machines are identical and can serve in both client and server roles.
60 Building File Synchronizer
- The Matrix is concerned with building blocks of applications rather than low-level pieces of functionality
- For this application 3 major blocks were identified
  - file upload and download block
  - file synchronization policy block
  - user-interface
- Each of these 3 blocks was built into an independent Cell with appropriate functionality.
[Figure: Synchronizer, UI and FileTransfer Cells]
61 Building File Synchronizer
- Capability lists of the different Cells were decided upon
  - FileTransfer Cell: SU.FileTransfer.GetLocalDirsFiles, SU.FileTransfer.RunServer, SU.FileTransfer.Connect, SU.FileTransfer.GetRemoteDirsFiles, SU.FileTransfer.UploadLocalFile, SU.FileTransfer.DownloadRemoteFile
  - Synchronizer Cell: SU.Synchronizer.Policy.Compare
  - UI Cell: Null
62 Building File Synchronizer
- The Cells were built by incorporating wizard-generated boiler-plate code for a generic Cell, and Cell-specific implementation code.
63 Building File Synchronizer
- The resulting binaries upon compilation were dropped into the Matrix plug-in directory
64 Building File Synchronizer
- The Matrix discovers and composes the right pieces to form the File Synchronizer application.
65Evaluation Cognitive Distance
- Cognitive Distance
  - the amount of intellectual effort needed to understand and adopt a new technique/methodology
- The cognitive distance of the Software Matrix approach is moderate
  - the steps required are neither too numerous nor too complicated for the average developer with a fair understanding of the platform/language being used.
66 Evaluation: SLOC
- Source Lines of Code (SLOC)
  - The File Synchronizer application was built both the traditional way and the Matrix way.
  - In terms of SLOC we notice an increase in the Matrix version of File Synchronizer
  - However, most of the increase constitutes boiler-plate code required by the infrastructure, which the developer does not write
  - The effective increase in SLOC is marginal (1729 vs. 1526)
67 Evaluation: SLOC
[Figures: Traditional File Synchronizer; Matrix File Synchronizer]
68 Evaluation: Performance
- In employing the Software Matrix supporting infrastructure we anticipate a performance impact on applications
- The traditional and Matrix versions of the File Synchronizer were compared to see how they fared in terms of performance
- No perceptible difference while running the 2 versions from a user's perspective
- To quantify the performance difference, common tasks in the Synchronizer were timed using OS high-resolution timers.
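A measurement of this kind can be sketched with Python's high-resolution clock. The `time_task` and `overhead_percent` helpers are hypothetical, and taking the fastest of several runs is an assumption about methodology that the slides do not state.

```python
import time


def time_task(task, repeats=5):
    """Return the fastest wall-clock time in seconds over `repeats` runs of `task`."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        task()
        best = min(best, time.perf_counter() - start)
    return best


def overhead_percent(baseline_seconds, candidate_seconds):
    """Relative slowdown of the candidate with respect to the baseline, in percent."""
    return 100.0 * (candidate_seconds - baseline_seconds) / baseline_seconds


# Time a small stand-in task; a real comparison would time the same task in both builds.
elapsed = time_task(lambda: sum(range(10_000)))
```

Given one timing for the traditional build and one for the Matrix build of the same task, `overhead_percent` expresses the difference as a percentage of the traditional timing.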
69 Timing remote file transfer
70 Timing remote file transfer
71 Evaluation: Performance
- The timing values of the Matrix File Synchronizer are only marginally higher in most cases
  - On average, 20% higher
  - Cells with low processing requirements may suffer higher overhead
- The performance impact of using the Software Matrix should usually not be prohibitive.
- The tradeoff here is between increased productivity and ease of salvage vs. slightly better performance using the traditional approach.
72 Evaluation: Ease of salvage
- We pull out approximately one-third of the File Synchronizer application, without any surgery, and use it through a minimalist interface.
- This interface will exercise the upload/download functionality of File Synchronizer.
- We simply
  - copy the Matrix infrastructure to a suitable location
  - drop the upload/download Cell into the plug-in directory
  - write a Cell that sends messages to the extracted part, and drop it into the plug-in directory
73 Evaluation: Ease of salvage
With a mere 25 lines of developer code we were able to extract a major chunk of an existing application, in this case 30%, with ease: a successful salvage operation.
74 NetView Application
- NetView is a simple conferencing tool that allows users to drive presentations, slide shows or similar content from their computer on other remote machines over the Internet/network.
75 NetView Application
- Built by salvaging from File Synchronizer.
76 NetView Application
[Figure: Matrix Admin Console and the NetView application, composed by the Software Matrix]
77 Comparison with related work
- There exists a combination of middleware and component technologies, such as COM and CORBA, that have reusability as one of their goals.
- COM and CORBA are considered highly complex and over-specified, and require heavy-weight supporting infrastructures.
- The Matrix was meant to address the specific problem of salvage and is lightweight (approx. 2000 SLOC) and simple.
78 Comparison with related work
- Most middleware technologies use the procedure-call model of collaboration. Having components bind to exact function signatures is not flexible enough: it leads to the tightly-coupled systems that make salvage operations difficult.
- The Matrix uses message-passing as the mode of collaboration between components. This leads to looser coupling of system elements, resulting in potentially easier salvage in the future.
79 Comparison with related work
- Message Oriented Middleware (MOM) and, to a certain extent, Web Services also use message-passing (most web services use XML-RPC, though they support a message-passing abstraction)
- The Matrix differs from them in the granularity and scale of message-passing, since we are focused on local compositions of cells (components) to build applications, rather than accessing remote functionality over the network.
80 Future Work
- Capabilities of application building blocks are identified by message types
- Useful to have catalogs that list the message type handlers available in a particular domain/organization
- Would aid developers in salvaging from existing software assets.
Message Catalogs
Versioning Support Schemes
Remote Cell Collaboration
Applications of the Matrix
Usability Studies
81 Future Work
- Two Cells handling exactly the same message type are considered to provide equivalent functionality
- Associating versioning information with message types will add more flexibility for the graceful evolution of systems.
- Leverage inherent .NET assembly versioning features
82 Future Work
- The Matrix aids in the composition of systems from locally available Cells
- Could be extended so that system composition includes Cells present on a remote machine
83 Future Work
- Investigate other scenarios apart from salvage where the Matrix infrastructure can be used
- E.g., in regression testing of systems, since the Matrix can act as an interceptor.
84 Future Work
- Unfamiliarity with a new technique is a stumbling block in the way of its adoption
- It would be very instructive to conduct a usability study amongst developers to obtain feedback on how the Matrix infrastructure services could be made more intuitive/simple.
85 Conclusion
- The Software Matrix is a runtime substrate with a plug-in architecture that enables simpler salvage
- We've applied message-passing at a much finer granularity than has been done before. We've used mediator structures to enforce loose coupling of system elements
- We've shown that mediated queued message-passing at the module level is an effective way to support software salvage
- Our work shows that we gain a lot of flexibility at an acceptable performance cost
- In the Software Matrix we've used a combination of techniques to solve a problem that has not been solved before: we are significantly managing the problems associated with software salvage
- Simplifying software salvage is a worthy goal and will have a positive impact on productivity in the software industry.
86 End of Presentation
87 Backup Slides
88 System Design
- Serves as the executive module: co-ordinates and uses the services of all helper modules
- Provides an administrative console interface for status and errors
89 System Design
- Identifies valid Cells through code reflection
- Loads valid Cells into the Matrix
90 System Design
- Registers Cells with the Matrix so they are reachable
- Associates Cells with a GUID.
91 System Design
- Represent the unit of composition and reuse
- Encapsulate functionality that makes them candidates for potential reuse
92 System Design
- Performs discovery to locate Cells of a particular capability
93 System Design
- Responsible for generating (sender side) and interpreting (receiver side) the well-formed XML messages being passed in the Matrix
94 System Design
- Serializes data such as arguments in requests and return values in responses so that they can be persisted in messages
- Also performs the reverse process of de-serialization
95 System Design
- Converts binary serialized data into the Base-64 encoding so that it can be easily encapsulated within XML tags
- Also performs decoding
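The serialize, encode and wrap pipeline of the last three slides can be sketched in Python. The element and attribute names (`request`, `type`, `args`) are assumptions, since the slides do not show the message schema, and `pickle` stands in for .NET binary serialization.

```python
import base64
import pickle
import xml.etree.ElementTree as ET


def build_request(message_type, args):
    """Serialize `args`, Base-64 encode the bytes, and wrap them in an XML request."""
    root = ET.Element("request", {"type": message_type})
    body = ET.SubElement(root, "args")
    body.text = base64.b64encode(pickle.dumps(args)).decode("ascii")
    return ET.tostring(root, encoding="unicode")


def extract(xml_text):
    """Reverse the steps: parse the XML, Base-64 decode, then de-serialize."""
    root = ET.fromstring(xml_text)
    # Only unpickle data from a trusted sender; pickle can execute code on load.
    args = pickle.loads(base64.b64decode(root.find("args").text))
    return root.get("type"), args


original = {"path": "a.bin", "data": b"\x00<&>\xff"}
xml_text = build_request("SU.FileTransfer.UploadLocalFile", original)
message_type, args = extract(xml_text)
```

The payload contains bytes that are not legal inside XML text, which is why the binary data is Base-64 encoded before it is placed between tags.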
96 System Design
- Used to maintain the blocking queues used for messaging
- Used for temporal decoupling of client and server Cells
97 System Design
- Collaboration between Cells can be synchronous, asynchronous or one-way
- Implements efficient synchronous collaboration
98 System Design
- Relays messages (request/response) between the message queues of collaborating Cells.
99 Class Relationships
- Exec
  - Executive of the Software Matrix.
  - Manages the administrative interface
100 Class Relationships
- Loader
  - At periodic intervals Loader.monitor checks the system plug-in directory
  - Uses services in System.Reflection and System.Type of the .NET FCL to identify and load Cells
101 Class Relationships
- Matrix
  - Represents the substrate that Cells plug in to.
  - Maintains a collection of all active Cells.
  - Matrix.register
  - Matrix.unregister
  - Matrix.discover
  - Matrix.send
102 Class Relationships
- BlockingQueue
  - Implements the efficient blocking queues used by Cells.
  - Uses the FCL AutoResetEvent
  - BlockingQueue.enQ
  - BlockingQueue.deQ
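A blocking queue with the same two operations can be sketched in Python. Note the substitution: this sketch uses `threading.Condition` rather than an auto-reset event, and the `timeout` parameter is an addition for the example; only the names `enQ` and `deQ` come from the slide.

```python
import threading
from collections import deque


class BlockingQueue:
    """Sketch of a blocking queue; deQ waits until an item has been enqueued."""

    def __init__(self):
        self._items = deque()
        self._cond = threading.Condition()

    def enQ(self, item):
        with self._cond:
            self._items.append(item)
            self._cond.notify()  # wake one waiting consumer

    def deQ(self, timeout=None):
        with self._cond:
            # wait_for releases the lock while waiting and re-checks the predicate on wake-up.
            if not self._cond.wait_for(lambda: self._items, timeout=timeout):
                raise TimeoutError("no message arrived")
            return self._items.popleft()


q = BlockingQueue()
received = []
# The consumer starts first and blocks until the producer enqueues a message.
consumer = threading.Thread(target=lambda: received.append(q.deQ(timeout=5)))
consumer.start()
q.enQ("request-1")
consumer.join()
```

The consumer does not poll: it sleeps until a producer signals, which is the temporal decoupling between client and server Cells that the queue provides.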
103 Class Relationships
- ICell
  - Defines the protocol to be followed by all Cells
  - Actual Cells are concrete classes deriving from ICell, and are provided with a default implementation.
  - ICell.register
  - ICell.unregister
  - ICell.accept
  - ICell.getClone
  - ICell.syncSend
  - ICell.start
  - ICell.queryCapability
  - ICell.process
104 Class Relationships
- ExecObject
- ReturnObject
  - Assist in the packaging of arguments, return values, exceptions, etc. in requests and responses between cooperating Cells.
105 Class Relationships
- Messaging
  - Provides services for the generation of well-formed XML messages in the correct format.
  - Messaging.buildRequest
  - Messaging.buildResponse
  - Messaging.extract
106 Class Relationships
- Clone
  - Provides a clone of a particular Cell, enabling stateful collaboration between Cells.
107 Class Relationships
- SyncWait
  - Allows for synchronous collaboration between Cells.
  - Collaboration could also be asynchronous or one-way.