Compression Techniques to Simplify the Analysis of Large Execution Traces presentation

1
Compression Techniques to Simplify the Analysis
of Large Execution Traces

Abdelwahab Hamou-Lhadj and Dr. Timothy C. Lethbridge
{ahamou, tcl}@site.uottawa.ca
University of Ottawa - Canada
IWPC 2002 - Paris

2
Introduction

Execution traces are important for understanding
the behavior, and sometimes the structure, of a
software system
Execution traces tend to be very large and need
to be compressed
In this presentation, we present techniques for
compressing traces of procedure calls
We also show the results of our techniques when
applied to two different software systems

3
Why Traces of Procedure Calls?

Many of today's legacy systems were developed
using the procedural paradigm
The flow of procedure calls can be useful to
comprehend the execution of a particular software
feature
The level of abstraction of traces of procedure
calls tends to be neither too low nor too high
Traces of method invocations become crucial when
it comes to understanding the behavior of
object-oriented systems

4
Traditional Compression Techniques

There are two types of compression techniques:
lossy and lossless compression
In information theory, most compression
algorithms are based on the same principle (David
Salomon, 2000):
compressing data by removing redundancy
These techniques produce good results; however,
the information, once compressed, is no longer
readable by humans
Such algorithms certainly will not help in
program comprehension

5
Trace Compression Steps

Preprocess the trace by removing the contiguous
redundancies due to loops and recursion
Represent the trace as a rooted ordered labeled
tree
Detect the non-contiguous redundancies and
represent them only once
(this problem is also known as the common
subexpression problem and can be solved in linear
time)
Analyze the compressed version and estimate the
gain
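
The second step can be sketched as follows. The slides do not give the trace format, so this sketch assumes each trace entry is a (nesting level, procedure name) pair and that the first entry is the root at level 0. Node, build_tree and size are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    label: str                                  # procedure name
    children: List["Node"] = field(default_factory=list)


def build_tree(trace: List[Tuple[int, str]]) -> Node:
    """Turn a list of (nesting level, procedure name) entries into a tree.

    Assumes the first entry is the root at level 0 and that the level
    never increases by more than one from one entry to the next.
    """
    root = Node(trace[0][1])
    stack = [root]                              # stack[i] is the open call at level i
    for level, name in trace[1:]:
        if not 1 <= level <= len(stack):
            raise ValueError(f"inconsistent nesting level {level} for {name}")
        del stack[level:]                       # close the calls that have returned
        node = Node(name)
        stack[-1].children.append(node)         # children keep the call order
        stack.append(node)
    return root


def size(node: Node) -> int:
    """Number of nodes (calls) in the tree."""
    return 1 + sum(size(child) for child in node.children)


trace = [(0, "main"), (1, "init"), (1, "draw"),
         (2, "line"), (2, "line"), (1, "exit")]
tree = build_tree(trace)
print([child.label for child in tree.children])   # ['init', 'draw', 'exit']
```

Keeping the children in call order is what makes the tree "ordered"; the label of each node is simply the procedure name.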

6
Preprocessing Stage

Redundant calls caused by loops and recursion
tend to encumber the trace and should be removed
(the number of occurrences is stored so that the
original trace can be reconstructed)
Removing the redundant calls is one form of
compression that could make the trace more
readable
If the trace is viewed as a tree, removing
contiguous redundancies reduces the depth of the
tree and the degree of its nodes
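
A minimal sketch of the loop case, under assumptions not stated in the slides: the trace is already a tree, and a contiguous redundancy is a run of identical adjacent sibling subtrees. The count field and the names same_subtree and collapse are hypothetical. Recursion (a procedure repeated along a path) is not handled here, and the repeated subtree comparison makes this sketch slower than a production version would need to be.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    count: int = 1              # number of contiguous repetitions this node stands for


def same_subtree(a: Node, b: Node, top: bool = True) -> bool:
    """Compare two subtrees; the repetition count of the two roots is ignored."""
    return (a.label == b.label
            and (top or a.count == b.count)
            and len(a.children) == len(b.children)
            and all(same_subtree(x, y, False)
                    for x, y in zip(a.children, b.children)))


def collapse(node: Node) -> Node:
    """Merge runs of identical adjacent sibling subtrees, bottom-up."""
    merged: List[Node] = []
    for child in map(collapse, node.children):
        if merged and same_subtree(merged[-1], child):
            merged[-1].count += 1       # keep one copy, remember the repetition
        else:
            merged.append(child)
    node.children = merged
    return node


root = Node("main", [Node("f", [Node("h")]), Node("f", [Node("h")]),
                     Node("f", [Node("h")]), Node("g"), Node("f", [Node("h")])])
collapse(root)
print([(c.label, c.count) for c in root.children])
# [('f', 3), ('g', 1), ('f', 1)]
```

Because the counts are kept, the original sequence of calls can be rebuilt by expanding each node count times. The last f is not merged with the first three because it is not contiguous with them; that case is left to the common subexpression step.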

7
The Common Subexpression Problem
Introduced by P.J. Downey, R. Sethi and R.E. Tarjan

"Any tree can be represented in a maximally
compact form as a directed acyclic graph where
common subtrees are factored and shared, being
represented only once" - Flajolet, Sipala and
Steyaert
The process of compacting the tree is known as
the common subexpression problem, also called
subtree factoring
"If we consider trees with a finite number of
nodes so that the degrees are bounded by some
constant ... The compacted form of a tree can be
computed in expected time O(n) using a top-down
recursive procedure in conjunction with
hashing..." - Flajolet, Sipala and Steyaert

8
Example
Input tree: 9 nodes and 8 links
Compressed form: 5 nodes and 6 links
9
The Algorithm
Introduced by P. Flajolet, P. Sipala and J.M. Steyaert, and improved by G. Valiente

The algorithm assigns a positive number, called
a certificate, to each node
Two nodes have the same certificate if, and only
if, the trees rooted at them are isomorphic
The certificate of a node n is obtained by
building a sequence (L(n), a1, ..., am), called
the signature of the node, where L(n) is the
label of the node and a1, ..., am are the
certificates of the children of the node
The certificates and signatures are stored in a
global table
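
The relation between signatures, certificates and the global table can be illustrated with a short recursive sketch. It is not the authors' code: the tuple encoding of trees, the name certify, and the example tree are all made up for illustration (the tree is not necessarily the one on the Example slide).

```python
from typing import Dict, List, Tuple

Tree = Tuple[str, List["Tree"]]     # (label, ordered children); hypothetical encoding


def certify(tree: Tree, table: Dict[tuple, int]) -> int:
    """Return the certificate of the root of `tree`, filling `table` on the way."""
    label, children = tree
    # Signature: the node label followed by the certificates of its children.
    signature = (label, *(certify(child, table) for child in children))
    if signature not in table:
        table[signature] = len(table) + 1   # certificates are positive integers
    return table[signature]


# A made-up tree with 9 nodes and 8 links.
tree = ("A", [("B", [("C", []), ("D", [])]),
              ("E", [("C", [])]),
              ("B", [("C", []), ("D", [])])])
table: Dict[tuple, int] = {}
root_certificate = certify(tree, table)

dag_nodes = len(table)                           # one DAG node per distinct signature
dag_links = sum(len(sig) - 1 for sig in table)   # one link per child slot
print(dag_nodes, dag_links)                      # 5 6
```

Each distinct signature in the table corresponds to one node of the compacted graph, so the two B subtrees and the three C leaves are each stored only once.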

10
Example
11
The Algorithm Steps (iterative version)

The algorithm performs a bottom-up traversal of
the tree using a queue
1. For each node n
2.   Build a signature for n
3.   If the signature already exists in the global
     table then
4.     Return the corresponding certificate
     Else
5.     Create a new certificate
6.     Update the table
7.   Assign the certificate to the node
If the degree of the tree is bounded by a
constant and a hash table is used to store the
certificates, then this algorithm runs in
linear time
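
One way the iterative version could look is sketched below. The slides only say that a queue drives a bottom-up traversal, so the details are assumptions: nodes carry a parent pointer, the queue starts with the leaves, and a parent is enqueued once all of its children have a certificate. The names Node and assign_certificates are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple


class Node:
    def __init__(self, label: str, children: Optional[List["Node"]] = None):
        self.label = label
        self.children = children or []
        self.parent: Optional["Node"] = None
        self.certificate = 0                # 0 means "not assigned yet"
        for child in self.children:
            child.parent = self


def assign_certificates(root: Node) -> Dict[Tuple, int]:
    """Give every node a certificate using a queue-driven bottom-up pass."""
    table: Dict[Tuple, int] = {}            # signature -> certificate
    pending: Dict[int, int] = {}            # id(node) -> children still uncertified
    queue = deque()

    # Find the leaves; they are the starting points of the traversal.
    stack = [root]
    while stack:
        node = stack.pop()
        pending[id(node)] = len(node.children)
        if node.children:
            stack.extend(node.children)
        else:
            queue.append(node)

    while queue:
        node = queue.popleft()
        signature = (node.label, *(c.certificate for c in node.children))
        if signature not in table:
            table[signature] = len(table) + 1   # create a new certificate
        node.certificate = table[signature]
        parent = node.parent
        if parent is not None:
            pending[id(parent)] -= 1
            if pending[id(parent)] == 0:        # all children done: parent is ready
                queue.append(parent)
    return table


b1 = Node("B", [Node("C"), Node("D")])
b2 = Node("B", [Node("C"), Node("D")])
e = Node("E", [Node("C")])
root = Node("A", [b1, e, b2])
table = assign_certificates(root)
print(b1.certificate == b2.certificate, len(table))   # True 5
```

The table is a Python dict, which is a hash table, so each lookup takes expected constant time when the signatures have bounded length, which is the bounded-degree condition stated above.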

12
Experiment

We experimented with traces of the following
systems:
XFIG (a drawing system under UNIX)
A real-world telecommunication system
We are interested in the following results:
The initial size of the trace, n
The size of the trace after preprocessing it, n1
The compression ratio r1, such that r1 = n1 / n
The size of the trace after using the common
subexpression algorithm, n2
The compression ratio r2, such that r2 = n2 / n
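
As a small worked example of the two ratios, reading them as r1 = n1 / n and r2 = n2 / n: the sizes below are made up for illustration and are not results from the experiment, and compression_ratio is a hypothetical helper name.

```python
def compression_ratio(compressed_size: int, original_size: int) -> float:
    """Compressed size divided by original size; smaller means more compression."""
    if original_size <= 0:
        raise ValueError("the original trace must contain at least one call")
    return compressed_size / original_size


# Made-up sizes, for illustration only.
n, n1, n2 = 10_000, 2_500, 400
r1 = compression_ratio(n1, n)   # after preprocessing
r2 = compression_ratio(n2, n)   # after the common subexpression algorithm
print(r1, r2)                   # 0.25 0.04
```

Both ratios are taken against the initial size n, so r2 measures the combined effect of the two stages rather than the second stage alone.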

13
Results of the Experiment (XFIG System)
14
Some Considerations Regarding the
Telecommunication System

It is a large legacy system
The traces are generated using an internal
mechanism
The traces tend to be incomplete. This is
reflected as an inconsistency in the trace with
respect to the nesting levels.
Our solution to this problem is to
complete the trace by filling up the gaps with
virtual procedure calls
estimate the error ratio e, which is the ratio of
the number of missing calls g to the size of the
original trace: e = g / (g + n)
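
A minimal sketch of the gap-filling idea, under several assumptions: trace entries are (nesting level, procedure name) pairs, a gap is a jump of more than one nesting level between consecutive entries, and the error ratio is read as e = g / (g + n) with g the number of virtual calls added and n the number of recorded calls. The placeholder label and the names fill_gaps and error_ratio are hypothetical; the internal tracing mechanism of the telecommunication system is not described in the slides.

```python
from typing import List, Tuple

Entry = Tuple[int, str]             # (nesting level, procedure name)
VIRTUAL = "<virtual>"               # hypothetical placeholder label


def fill_gaps(trace: List[Entry]) -> Tuple[List[Entry], int]:
    """Insert virtual calls wherever the nesting level jumps by more than one.

    Returns the completed trace and g, the number of virtual calls added.
    """
    completed: List[Entry] = []
    gaps = 0
    previous_level = -1             # so that the first call is expected at level 0
    for level, name in trace:
        # A call at `level` needs an open caller at every level below it.
        for missing in range(previous_level + 1, level):
            completed.append((missing, VIRTUAL))
            gaps += 1
        completed.append((level, name))
        previous_level = level
    return completed, gaps


def error_ratio(gaps: int, recorded: int) -> float:
    """e = g / (g + n): share of the completed trace that had to be guessed."""
    return gaps / (gaps + recorded)


trace = [(0, "main"), (1, "a"), (3, "c"), (1, "b"), (2, "d")]
completed, g = fill_gaps(trace)
print(completed[2], g)              # (2, '<virtual>') 1
e = error_ratio(g, len(trace))
```

In this example the call to c at level 3 has no recorded caller at level 2, so one virtual call is inserted and one call out of the six in the completed trace is a guess.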

15
Results of the Experiment (Telecom. System)
16
Variation of the degrees of the tree according to
depth (3 traces of XFIG)
Before the preprocessing step
After the preprocessing step
17
Variation of the degrees of the tree according to
depth (3 traces of the telecom. system)
Before the preprocessing step
After the preprocessing step
18
Discussion

Procedure-call traces could be considerably
compressed in a way that preserves the ability
for humans to understand them
Possible improvement:
look for procedures that are not of great
interest to software engineers
remove them before the compression process
The preprocessing stage could be very useful to
reduce the trace size
increase the performance of the common
subexpression algorithm

19
Conclusions and future directions

The results shown in this presentation can help
build better tools based on execution traces
We intend to conduct more experiments with this
framework to see how helpful it is to software
engineers
Future directions should focus on lossy
compression. Types of information eliminated can
include the number of repetitions, the order of
calls, and some lower-level utility procedures
The non-contiguous redundancies can be used to
determine other features of the system

21
Results of the Experiment (XFIG System) with
procedures and files
22
Results of the Experiment (Telecom. System) with
procedures and files
