
1
Compression Techniques to Simplify the Analysis
of Large Execution Traces
  • Abdelwahab Hamou-Lhadj and Dr. Timothy C.
    Lethbridge
  • {ahamou, tcl}@site.uottawa.ca
  • University of Ottawa - Canada
  • IWPC 2002 - Paris

2
Introduction
  • Execution traces are important for understanding
    the behavior, and sometimes the structure, of a
    software system
  • Execution traces tend to be very large and need
    to be compressed
  • In this presentation, we present techniques for
    compressing traces of procedure calls
  • We also show the results of applying our
    techniques to two different software systems

3
Why Traces of Procedure Calls?
  • Many of today's legacy systems were developed
    using the procedural paradigm
  • The flow of procedure calls can be useful for
    comprehending the execution of a particular
    software feature
  • The level of abstraction of traces of procedure
    calls tends to be neither too low nor too high
  • Traces of method invocations become crucial when
    it comes to understanding the behavior of
    object-oriented systems

4
Traditional Compression Techniques
  • There are two types of compression techniques:
    lossy and lossless compression
  • In information theory, most compression
    algorithms are based on the same principle (David
    Salomon, 2000): compressing data by removing
    redundancy
  • These techniques produce good results; however,
    the information, once compressed, is no longer
    readable by humans
  • Such algorithms therefore will not help with
    program comprehension

5
Trace Compression Steps
  • Preprocess the trace by removing the contiguous
    redundancies due to loops and recursion
  • Represent the trace as a rooted ordered labeled
    tree
  • Detect the non-contiguous redundancies and
    represent them only once
  • This problem is also known as the common
    subexpression problem and can be solved in linear
    time
  • Analyze the compressed version and estimate the
    gain
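The first two steps above can be sketched in a few lines of Python. The (depth, procedure) event format and the Node class are illustrative assumptions, not the representation used by the authors:

```python
# Sketch: represent a trace of procedure calls as a rooted ordered
# labeled tree. The (nesting depth, procedure name) event format is a
# hypothetical input format chosen for illustration.

class Node:
    def __init__(self, label):
        self.label = label
        self.children = []       # ordered list of callee subtrees

def trace_to_tree(events):
    """events: (depth, name) pairs; depth 0 is the top-level call."""
    root = Node("ROOT")
    stack = [root]               # stack[d] is the open call at depth d
    for depth, name in events:
        node = Node(name)
        del stack[depth + 1:]    # a shallower event closes deeper calls
        stack[depth].children.append(node)
        stack.append(node)
    return root

# main() calls init() once and draw(), which calls line() twice.
trace = [(0, "main"), (1, "init"), (1, "draw"), (2, "line"), (2, "line")]
tree = trace_to_tree(trace)
```

The stack mirrors the call stack at trace-generation time, which is what makes the trace an *ordered* tree: sibling order is call order.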

6
Preprocessing Stage
  • Redundant calls caused by loops and recursion
    tend to encumber the trace and should be removed
  • The number of occurrences is stored so that the
    original trace can be reconstructed
  • Removing the redundant calls is one form of
    compression that can make the trace more
    readable
  • If the trace is perceived as a tree, removing
    contiguous redundancies reduces the depth of the
    tree and the degree of its nodes
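A minimal sketch of this preprocessing step: collapse contiguous identical sibling subtrees into one occurrence plus a count, so the original trace stays reconstructible. The Node class and its `count` field are illustrative assumptions:

```python
# Preprocessing sketch: remove contiguous redundancies caused by loops,
# keeping the number of occurrences for later reconstruction.

class Node:
    def __init__(self, label, children=None):
        self.label = label
        self.children = children or []
        self.count = 1            # occurrences of this contiguous repeat

def same_subtree(a, b):
    # Compare labels and children recursively; child repeat counts must
    # also match so no reconstruction information is lost.
    return (a.label == b.label
            and len(a.children) == len(b.children)
            and all(x.count == y.count and same_subtree(x, y)
                    for x, y in zip(a.children, b.children)))

def collapse_repeats(node):
    for child in node.children:
        collapse_repeats(child)   # bottom-up: collapse subtrees first
    merged = []
    for child in node.children:
        if merged and same_subtree(merged[-1], child):
            merged[-1].count += child.count   # fold into previous sibling
        else:
            merged.append(child)
    node.children = merged

# A loop calling line() three times collapses to one child with count 3.
draw = Node("draw", [Node("line"), Node("line"), Node("line")])
collapse_repeats(draw)
```

Collapsing bottom-up first means a loop body that itself contains loops is simplified before the outer repetition is detected, which is what shrinks both the degree of the nodes and the depth of repeated patterns.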

7
The Common Subexpression Problem (Introduced by
J.P. Downey, R. Sethi and R.E. Tarjan)
  • Any tree can be represented in a maximally
    compact form as a directed acyclic graph where
    common subtrees are factored and shared, being
    represented only once - Flajolet, Sipala and
    Steyaert
  • The process of compacting the tree is known as
    the common subexpression problem also called
    subtree factoring
  • If we consider trees with a finite number of
    nodes so that the degrees are bounded by some
    constant ... The compacted form of a tree can be
    computed in expected time O(n) using a top-down
    recursive procedure in conjunction with
    hashing... - Flajolet, Sipala and Steyaert

8
Example
Input tree: 9 nodes and 8 links
Compressed form: 5 nodes and 6 links
9
The Algorithm Introduced by P. Flajolet, P.
Sipala, J.M. Steyaert and improved by G. Valiente
  • The algorithm assigns a positive number called
    certificate to each node
  • Two nodes have the same certificate if, and only
    if, the trees rooted at them are isomorphic
  • The certificate of a node n is obtained by
  • building a sequence L(n), a1, ..., am, called
    the signature of the node, where L(n) is the
    label of the node and a1, ..., am are the
    certificates of the children of the node
  • The certificates and signatures are stored in a
    global table

10
Example
11
The Algorithm Steps (iterative version)
  • The algorithm performs a bottom-up traversal of
    the tree using a queue
  • 1. For each node n
  • 2.   Build a signature for n
  • 3.   If the signature already exists in the
         global table then
  • 4.     Return the corresponding certificate
  •      Else
  • 5.     Create a new certificate
  • 6.     Update the table
  • 7.   Assign the certificate to the node
  • If the degree of the tree is bounded by a
    constant and a hash table is used to store the
    certificates, then this algorithm performs in
    linear time
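The certificate computation above can be sketched as follows. This version uses a recursive post-order traversal instead of the queue-based iterative one, and all names are illustrative; the expected linear time comes from signature lookups being hash-table operations:

```python
# Subtree-factoring sketch: a bottom-up traversal assigns each node a
# certificate via a hash table of signatures, so isomorphic subtrees
# share one certificate (and can be represented once, yielding a DAG).

class Node:
    def __init__(self, label, children=None):
        self.label = label
        self.children = children or []

def assign_certificates(root):
    table = {}                    # signature -> certificate (global table)
    cert = {}                     # id(node) -> certificate

    def visit(node):
        for child in node.children:
            visit(child)          # children first (bottom-up)
        # Signature = node label plus the children's certificates, in order.
        signature = (node.label, tuple(cert[id(c)] for c in node.children))
        if signature not in table:
            table[signature] = len(table)   # new certificate
        cert[id(node)] = table[signature]

    visit(root)
    return cert, table

# Two identical draw(line) subtrees receive the same certificate.
t = Node("main", [Node("draw", [Node("line")]),
                  Node("draw", [Node("line")])])
certs, table = assign_certificates(t)
```

Because the two `draw` subtrees get equal certificates, the compressed form needs only one copy of that subtree, with two incoming links from `main`.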

12
Experiment
  • We experimented with traces of the following
    systems
  • XFIG (a drawing program under UNIX)
  • A real-world telecommunication system
  • We are interested in the following results
  • The initial size of the trace, n
  • The size of the trace after preprocessing, n1
  • The compression ratio r1, such that r1 = n1 / n
  • The size of the trace after applying the common
    subexpression algorithm, n2
  • The compression ratio r2, such that r2 = n2 / n
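With these definitions, the ratios r1 = n1 / n and r2 = n2 / n are straightforward to compute; the trace sizes below are made-up numbers for illustration, not results from the experiment:

```python
# Compression ratios from the experiment setup; a smaller ratio means
# better compression. The sizes are hypothetical example values.

n  = 100000   # initial trace size (number of calls)
n1 = 20000    # size after the preprocessing step
n2 = 1500     # size after the common subexpression algorithm

r1 = n1 / n   # ratio after preprocessing
r2 = n2 / n   # ratio after subtree factoring
print(f"r1 = {r1:.3f}, r2 = {r2:.3f}")
```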

13
Results of the Experiment (XFIG System)
14
Some Considerations Regarding the
Telecommunication System
  • It is a large legacy system
  • The traces are generated using an internal
    mechanism
  • The traces tend to be incomplete. This is
    reflected as an inconsistency in the trace with
    respect to the nesting levels.
  • Our solution to this problem is to complete the
    trace by filling the gaps with virtual procedure
    calls
  • We estimate the error ratio e = g / (g + n),
    where g is the number of missing calls and n is
    the size of the original trace

15
Results of the Experiment (Telecom. System)
16
Variation of the degrees of the tree according to
depth (3 traces of XFIG)
Before the preprocessing step
After the preprocessing step
17
Variation of the degrees of the tree according to
depth (3 traces of the telecom. system)
Before the preprocessing step
After the preprocessing step
18
Discussion
  • Procedure-call traces can be considerably
    compressed in a way that preserves the ability
    of humans to understand them
  • Possible improvement
  • look for procedures that are not of great
    interest to software engineers
  • remove them before the compression process
  • The preprocessing stage can be very useful to
  • reduce the trace size
  • increase the performance of the common
    subexpression algorithm

19
Conclusions and future directions
  • The results shown in this presentation can help
    build better tools based on execution traces
  • We intend to conduct more experiments with this
    framework to see how helpful it is to software
    engineers
  • Future directions should focus on lossy
    compression. Types of information that can be
    eliminated include
  • the number of repetitions, the order of calls,
    and some lower-level utility procedures
  • The non-contiguous redundancies can be used to
    determine other features of the system

21
Results of the Experiment (XFIG System) with
procedures and files
22
Results of the Experiment (Telecom. System) with
procedures and files