Program Analysis and Transformation - PowerPoint PPT Presentation

Title:

Program Analysis and Transformation

Description:

In almost any language, we can find out information about variable usage ... Static analysis research tools typically get about 60% of the problems right. 8/22/09 ... – PowerPoint PPT presentation

Provided by: SpirosMa2


1
Program Analysis and Transformation
2
Program Analysis
  • Extracting information, in order to present
    abstractions of, or answer questions about, a
    software system
  • Static Analysis: Examines the source code
  • Dynamic Analysis: Examines the system as it is
    executing

3
What are we looking for?
  • Depends on our goals and the system
  • In almost any language, we can find out
    information about variable usage
  • In an OO environment, we can find out which
    classes use other classes, which are a base of an
    inheritance structure, etc.
  • We can also find potential blocks of code that
    can never be executed in running the program
    (dead code)
  • Typically, the information extracted is in terms
    of entities and relationships

4
Entities
  • Entities are individuals that live in the system,
    and attributes associated with them.
  • Some examples
  • Classes, along with information about their
    superclass, their scope, and where in the code
    they exist.
  • Methods/functions and what their return type or
    parameter list is, etc.
  • Variables and what their types are, and whether
    or not they are static, etc.

5
Relationships
  • Relationships are interactions between the
    entities in the system.
  • Relationships include
  • Classes inheriting from one another.
  • Methods in one class calling the methods of
    another class, and methods within the same class
    calling one another.
  • A method referencing an attribute.

6
Information format
  • Many different formats in use
  • Simple but effective: RSF, e.g. inherit TRIANGLE SHAPE
  • TA is an extension of RSF that includes a
    schema, e.g. INSTANCE SHAPE Class
  • GXL is an XML-based extension of TA. A blow-up factor
    of 10 or more makes it rather cumbersome
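
To give a feel for how little machinery RSF needs, here is a minimal Python sketch that reads whitespace-separated triples into a set and queries them. The function name parse_rsf and every sample fact other than inherit TRIANGLE SHAPE are assumptions made for this illustration; they do not come from any RSF tool.

```python
def parse_rsf(text):
    """Parse RSF text: one 'relation source target' triple per line."""
    facts = set()
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:          # skip blank or malformed lines
            facts.add(tuple(parts))
    return facts


# Hypothetical sample facts; only the first one appears on the slide.
sample = """
inherit TRIANGLE SHAPE
inherit SQUARE SHAPE
call draw area
"""

facts = parse_rsf(sample)

# Query: which entities inherit from SHAPE?
subclasses_of_shape = sorted(
    source for relation, source, target in facts
    if relation == "inherit" and target == "SHAPE"
)
print(subclasses_of_shape)
```

Because each fact is a flat triple, queries reduce to filtering on the relation name, which is a large part of why the format stays simple.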

7
Static Analysis
  • Involves parsing the source code
  • Usually creates an Abstract Syntax Tree
  • Borrows heavily from compiler technology but
    stops before code generation
  • Requires a grammar for the programming language
  • Can be very difficult to get right
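
The parse-then-extract idea can be sketched with Python's standard ast module, used here only as a stand-in for a real C/C++ front end. The sample classes and the extract_facts helper are hypothetical; the point is that the parser builds an AST and a small walker turns it into entity/relationship facts without ever generating code.

```python
import ast

# Hypothetical subject program, analysed as text and never executed.
SOURCE = '''
class Shape:
    def area(self):
        return 0

class Triangle(Shape):
    def area(self):
        return self.base() * 2

    def base(self):
        return 1
'''


def extract_facts(source):
    """Return (relation, source, target) triples for classes and methods."""
    facts = []
    tree = ast.parse(source)                      # parsing yields the AST
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            for base in node.bases:               # inheritance relationships
                if isinstance(base, ast.Name):
                    facts.append(("inherit", node.name, base.id))
            for item in node.body:                # methods defined in the class
                if isinstance(item, ast.FunctionDef):
                    facts.append(("method", node.name, item.name))
    return facts


facts = extract_facts(SOURCE)
for fact in facts:
    print(*fact)
```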

8
CppETS
  • CppETS is a benchmark for C++ extractors
  • It consists of a collection of C++ programs that
    pose various problems commonly found in parsing
    and reverse engineering
  • Static analysis research tools typically get
    about 60% of the problems right

9
Example program
    #include <iostream>
    class Hello {
    public:
        Hello();
        ~Hello();
    };
    Hello::Hello() { std::cout << "Hello, world.\n"; }
    Hello::~Hello() { std::cout << "Goodbye, cruel world.\n"; }
    int main() {
        Hello h;
        return 0;
    }

10
Example Q&A
  • How many member methods are in the Hello class?
    Answer: Two, the constructor (Hello::Hello()) and
    the destructor (Hello::~Hello()).
  • Where are these member methods used?
    Answer: The constructor is called implicitly when
    an instance of the class is created. The
    destructor is called implicitly when
    execution leaves the scope of the instance.

11
Static analysis in IDEs
  • High-level languages lend themselves better to
    static analysis needs
  • EiffelStudio automatically creates BON diagrams
    of the static structure of Eiffel systems
  • Rational Rose does the same with UML and Java
  • Unfortunately, most legacy systems are not
    written in either of these languages

12
Static analysis pipeline
[Diagram: Source code -> Parser -> Abstract Syntax Tree -> Fact extractor -> Fact base, with a Clustering algorithm, a Visualizer and a Metrics tool connected to the fact base]
13
Dynamic Analysis
  • Provides information about the run-time behaviour
    of software systems, e.g.
  • Component interactions
  • Event traces
  • Concurrent behaviour
  • Code coverage
  • Memory management
  • Can be done with a profiler or a debugger

14
Instrumentation
  • Augments the subject program with code that
    transmits events to a monitoring application, or
    writes relevant information to an output file
  • A profiler can be used to examine the output file
    and extract relevant facts from it
  • Instrumentation affects the execution speed and
    storage space requirements of the system
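
A minimal sketch of source-level instrumentation in Python: a decorator augments each function with code that records entry and exit events. The events list stands in for the output file or monitoring application, and the names instrument, helper and compute are hypothetical.

```python
import functools
import time

events = []  # stands in for the output file or monitoring application


def instrument(func):
    """Wrap func so that every call records an entry and an exit event."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        events.append(("enter", func.__name__))
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            events.append(("exit", func.__name__, elapsed))
    return wrapper


@instrument
def helper(n):
    return n * 2


@instrument
def compute(n):
    return helper(n) + 1


result = compute(20)
trace = [(event[0], event[1]) for event in events]
print(result, trace)
```

The extra list appends and timer calls are exactly the overhead the slide warns about: the instrumented program runs slower and stores more than the original.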

15
Instrumentation process
[Diagram: Source code and Annotation script -> Annotator -> Annotated program -> Compiler -> Instrumented executable]
16
Dynamic analysis pipeline
[Diagram: Instrumented executable -> CPU -> Dynamic analysis data -> Profiler -> Fact base, with a Clustering algorithm, a Visualizer and a Metrics tool connected to the fact base]
17
Non-instrumented approach
  • One can also use debugger log files to obtain
    dynamic information
  • Disadvantage: Limited amount of information
    provided
  • Advantage: Less intrusive approach, more accurate
    performance measurements

18
Dynamic analysis issues
  • Ensuring good code coverage is a key concern
  • A comprehensive test suite is required to ensure
    that all paths in the code will be exercised
  • Results may not generalize to future executions
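
The coverage concern can be made concrete with a small sketch that records which lines of a function one execution touches, using Python's sys.settrace hook. The classify function and the lines_executed helper are hypothetical. A single input exercises only one branch, so facts gathered from that run say nothing about the other.

```python
import sys


def classify(n):
    if n < 0:
        return "negative"
    return "non-negative"


def lines_executed(func, *args):
    """Return the line offsets (relative to the def line) that one call executes."""
    first = func.__code__.co_firstlineno
    seen = set()

    def tracer(frame, event, arg):
        if frame.f_code is func.__code__ and event == "line":
            seen.add(frame.f_lineno - first)
        return tracer

    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(None)      # always remove the hook again
    return seen


one_run = lines_executed(classify, 5)
two_runs = one_run | lines_executed(classify, -5)
print(sorted(one_run), sorted(two_runs))
```

The second test input is what brings the "negative" branch into view, which is why a comprehensive test suite matters for dynamic analysis.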

19
Static vs. Dynamic
Static analysis
  • Reasons over all possible behaviours (general
    results)
  • Conservative and sound
  • Challenge: Choose good abstractions
Dynamic analysis
  • Observes a small number of behaviours (specific
    results)
  • Precise and fast
  • Challenge: Select representative test cases

20
SWAGKit
  • SWAGKit is used to generate software landscapes
    from source code
  • Based on a pipeline architecture with three
    phases
  • Extract (cppx)
  • Manipulate (prep, linkplus, layoutplus)
  • Present (lsedit)
  • Currently usable for programs written in C/C++

21
The SWAGKit Pipeline
Source Code -> cppx -> prep -> linkplus -> layoutplus -> lsedit -> Landscape
22
The SWAGKit Pipeline
Function    Filter      Input       Output
Extract     cppx        source      .ta
Manipulate  prep        .ta         .o.ta
            linkplus    .o.ta       out.ln.ta
            layoutplus  out.ln.ta   out.ls.ta
Present     lsedit      out.ls.ta   picture
23
cppx & prep
  • C/C++ fact extractor based on gcc
    (http://swag.uwaterloo.ca/cppx)
  • Extracts facts from one source file at a time
  • Facts represent program information as a series
    of triples
  • INSTANCE x integer (x is an integer)
  • inherit Student Person (Student inherits from
    Person)
  • call foo bar (foo calls bar)
  • Produces .c.ta files, one per source file
  • Use g option for gcc parameters

24
cppx & prep
  • Prep is a series of scripts written in Grok
  • Function is to clean up facts from cppx so they
    are in a form which can be usable by the rest of
    the pipeline.
  • Produces one .o.ta for each .ta
  • Can replace manual use of cppx & prep with gce
  • Edit makefile, replace gcc with gce
  • Type make

25
Grok
  • A simple scripting language
  • A relational algebraic calculator
  • Powerful in manipulating binary relations
  • Widely used in architecture transformation
  • Online documentation

http://swag.uwaterloo.ca/j25wu/projects/grokdoc/index.html
http://swag.uwaterloo.ca/nsynytskyy/grokdoc/index.html
26
Grok Features
  • Set operations
  • Union (+), intersection (^), subtraction (-),
    cross-product (X)
  • Binary relation operations
  • Union (+), intersection (^), subtraction (-),
    composition (o, *), projection (.), domain (dom),
    range (rng), identity (id), inverse (inv), entity
    (ent), transitive closure (+), and reflexive
    transitive closure (*)

27
Grok Features Cont.
  • Programming constructs
  • if else
  • for, while
  • Arithmetic, comparison, logical operators
  • +, -, *, /, %
  • <, <=, ==, >=, >, !=
  • !, &&, ||

28
Grok Scripts (1)
  • Grok
    >> cat := {"Garfield", "Fluffy"}
    >> mouse := {"Mickey", "Nancy"}
    >> cheese := {"Roquefort", "Swiss"}
    >> animals := cat + mouse
    >> food := mouse + cheese
    >> animalsWhichAreFood := animals ^ food
    >> animalsWhichAreNotFood := animals - food
    >> animalsWhichAreFood
    Mickey
    Nancy
    >> animals - food
    Garfield
    Fluffy
    >> #food
    4
    >> mouse <= food
    True
    >>

    >> chase := cat X mouse
    >> chase
    Garfield Mickey
    Garfield Nancy
    Fluffy Mickey
    Fluffy Nancy
    >>
    >> eat := chase + mouse X cheese
    >> eat
    Garfield Mickey
    Garfield Nancy
    Fluffy Mickey
    Fluffy Nancy
    Mickey Roquefort
    Mickey Swiss
    Nancy Roquefort
    Nancy Swiss
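
For readers without Grok at hand, the same session can be approximated with Python's built-in set type, treating a binary relation as a set of pairs. This is only an analogy in plain Python, not Grok syntax; the snake_case variable names are adaptations of the names used in the session.

```python
from itertools import product

cat = {"Garfield", "Fluffy"}
mouse = {"Mickey", "Nancy"}
cheese = {"Roquefort", "Swiss"}

animals = cat | mouse                          # set union
food = mouse | cheese
animals_which_are_food = animals & food        # intersection
animals_which_are_not_food = animals - food    # subtraction

chase = set(product(cat, mouse))               # cross-product gives a relation
eat = chase | set(product(mouse, cheese))      # union of two relations

print(sorted(animals_which_are_food))
print(sorted(animals_which_are_not_food))
print(len(food), mouse <= food)                # cardinality and subset test
print(len(chase), len(eat))
```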
29
Grok Scripts (2)
    >> "Mickey" . eat
    Roquefort
    Swiss
    >> eat . "Mickey"
    Garfield
    Fluffy
    >>
    >> eater := dom eat
    >> food := rng eat
    >> chasedBy := inv chase
    >> topOfFoodChain := dom eat - rng eat
    >> bottomOfFoodChain := rng eat - dom eat
    >> bothEatAndChase := eat ^ chase
    >> eatButNotChase := eat - chase
    >> chaseButNotEat := chase - eat
    >> secondOrderEat := eat o eat
    >> anyOrderEat := eat+

Programming constructs
    if expression statements else statements
    while expression statements
    for variable in expression statements
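
The relational operators used in this session (domain, range, inverse, projection, composition, transitive closure) are easy to sketch over sets of pairs in Python. The helper functions below are illustrative definitions written for this example, not Grok's implementation.

```python
def dom(rel):
    return {a for a, _ in rel}


def rng(rel):
    return {b for _, b in rel}


def inv(rel):
    return {(b, a) for a, b in rel}


def compose(r1, r2):
    """All (a, c) such that (a, b) is in r1 and (b, c) is in r2."""
    return {(a, c) for a, b in r1 for b2, c in r2 if b == b2}


def transitive_closure(rel):
    closure = set(rel)
    while True:
        bigger = closure | compose(closure, rel)
        if bigger == closure:
            return closure
        closure = bigger


def project(node, rel):
    """Targets reached from node in one step."""
    return {b for a, b in rel if a == node}


cats = ["Garfield", "Fluffy"]
mice = ["Mickey", "Nancy"]
cheeses = ["Roquefort", "Swiss"]
chase = {(c, m) for c in cats for m in mice}
eat = chase | {(m, ch) for m in mice for ch in cheeses}

top_of_food_chain = dom(eat) - rng(eat)
bottom_of_food_chain = rng(eat) - dom(eat)
chased_by = inv(chase)
second_order_eat = compose(eat, eat)
any_order_eat = transitive_closure(eat)
print(sorted(project("Mickey", eat)), sorted(top_of_food_chain))
```

Here second_order_eat relates each cat to each cheese, and any_order_eat adds those pairs to the direct eat relation.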
30
A real example
    containFacts := $1
    getdb containFacts
    d := dom contain
    r := rng contain
    e := ent contain
    root := d - r
    leaves := r - d
    rootChildren := root . contain
    toKeep := leaves + rootChildren
    toDelete := e - toKeep
    cc := contain+
    delset toDelete
    delrel contain
    contain := cc
    relToFile contain $2

Input: A containment tree
Output: A flattened version of the containment tree
31
linkplus
  • Function is to link all facts into one large
    graph
  • Combine graphs from .o.ta files
  • Resolve inter-compilation unit relationships
  • Merge header files together
  • Do some cleanup to shrink final graph
  • Usage
  • linkplus list_of_files_to_link
  • Produces out.ln.ta

32
layoutplus
  • Adds
  • Clustering of facts based on contain.rsf (created
    manually or from a clustering algorithm)
  • Layout information so that graph can be displayed
  • Schema information
  • Usage
  • layoutplus contain_file out.ln.ta
  • Produces out.ls.ta

33
lsedit
  • View software landscape produced by previous
    parts of the pipeline
  • Can make changes to landscape and save them
  • Usage
  • lsedit out.ls.ta

34
Program Representation
  • Fundamental issue in re-engineering
  • Provides means to generate abstractions
  • Provides input to a computational model for
    analyzing and reasoning about programs
  • Provides means for translation and normalization
    of programs

35
Key questions
  • What are the strengths and weaknesses of various
    representations of programs?
  • What levels of abstraction are useful?

36
Abstract Syntax Trees
  • A translation of the source text in terms of
    operands and operators
  • Omits superficial details, such as comments,
    whitespace
  • All necessary information to generate further
    abstractions is maintained

37
AST production
  • Four necessary elements to produce an AST
  • Lexical analyzer (turn input strings into tokens)
  • Grammar (turn tokens into a parse tree)
  • Domain Model (defines the nodes and arcs
    allowable in the AST)
  • Linker (annotates the AST with global
    information, e.g. data types, scoping etc.)

38
AST example
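  • Input string: 1 + /* two */ 2
  • Parse Tree
  • AST (without global info)

[Figure: the parse tree of the input, and the corresponding AST: an Add node whose arg1 and arg2 are the int values 1 and 2]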
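
Python's standard ast module shows the same effect on an analogous input. In this sketch the expression 1 + 2 carries a comment reading "two"; the resulting AST keeps the operator and both operands but drops the comment and the whitespace. Python is used here only as a convenient stand-in for the language on the slide.

```python
import ast

source = "1   +   2  # two"

tree = ast.parse(source, mode="eval")   # lexing and parsing in one call
root = tree.body                        # the expression node

print(ast.dump(tree))
print(type(root).__name__, type(root.op).__name__,
      root.left.value, root.right.value)
```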
39
Program Transformation
  • A program is a structured object with semantics
  • Structure allows us to transform a program
  • Semantics allow us to compare programs and decide
    on the validity of transformations

40
Program Transformation
  • The act of changing one program into another
    (from a source language to a target language)
  • Used in many areas of software engineering
  • Compiler construction
  • Software visualization
  • Documentation generation
  • Automatic software renovation

41
Application examples
  • Converting to a new language dialect
  • Migrating from a procedural language to an
    object-oriented one, e.g. C to C++
  • Adding code comments
  • Requirement upgrading, e.g. using 4 digits for
    years instead of 2 (Y2K)
  • Structural improvements, e.g. changing GOTOs to
    control structures
  • Pretty printing

42
Simple program transformation
  • Modify all arithmetic expressions to reduce the
    number of parentheses using the formula
    (a+b)*c = a*c + b*c
  • Example: x = (2+5)*3 becomes x = 2*3 + 5*3
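
A minimal sketch of this transformation on Python source, assuming Python 3.9 or later for ast.unparse. The Distribute class and the distribute function are names invented for this example. The rewrite works on the AST rather than on text, so operator precedence is handled by the parser and the unparser.

```python
import ast


class Distribute(ast.NodeTransformer):
    """Rewrite (a + b) * c as a * c + b * c."""

    def visit_BinOp(self, node):
        self.generic_visit(node)            # transform sub-expressions first
        if (isinstance(node.op, ast.Mult)
                and isinstance(node.left, ast.BinOp)
                and isinstance(node.left.op, ast.Add)):
            a, b, c = node.left.left, node.left.right, node.right
            return ast.BinOp(
                left=ast.BinOp(left=a, op=ast.Mult(), right=c),
                op=ast.Add(),
                right=ast.BinOp(left=b, op=ast.Mult(), right=c),
            )
        return node


def distribute(source):
    tree = Distribute().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)                # requires Python 3.9+


print(distribute("x = (2 + 5) * 3"))
```

Note that c is duplicated, so the rewrite preserves meaning only when evaluating c has no side effects; this is the kind of semantic check a transformation has to pass.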

43
Two types of transformations
  • Translation
  • Source and target language are different
  • Semantics remain the same
  • Rephrasing
  • Source and target language are the same
  • Goal is to improve some aspect of the program
    such as its understandability or performance
  • Semantics might change

44
Translation
  • Program synthesis
  • Lowers the level of abstraction, e.g. compilation
  • Program migration
  • Transform to a different language
  • Reverse Engineering
  • Raises the level of abstraction, e.g. create
    architectural descriptions from the source code
  • Program Analysis
  • Reduces the program to one aspect, e.g. control
    flow

45
Translation taxonomy
46
Rephrasing
  • Program normalization
  • Decreases syntactic complexity (desugaring), e.g.
    algebraic simplification of expressions
  • Program optimization
  • Improves performance, e.g. inlining,
    common-subexpression and dead code elimination

47
Rephrasing
  • Program refactoring
  • Improves the design by restructuring while
    preserving the functionality
  • Program obfuscation
  • Deliberately makes the program harder to
    understand
  • Software renovation
  • Fixes bugs such as Y2K

48
Transformation tools
  • There are many transformation tools
  • Program-Transformation.org lists 90 of them
  • Most are based on term rewriting
  • Other solutions use functional programming,
    lambda calculus, etc.

49
Term rewriting
  • The process of simplifying symbolic expressions
    (terms) by means of a Rewrite System, i.e. a set
    of Rewrite Rules.
  • A Rewrite Rule is of the form lhs -> rhs, where lhs
    and rhs are term patterns

50
Example Rewrite System
  • 0 + x -> x
  • s(x) + y -> s(x + y)
  • (x + y) + z -> x + (y + z)
  • Under these rewrite rules, the term
  • ((s(s(a)) + s(b)) + c)
  • will be rewritten as
  • s(s(a + s(b + c)))
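
A small Python sketch of a rewrite engine for exactly these three rules, with terms encoded as nested tuples (an encoding chosen for this example). Applying only these rules to the term above, the sketch stops at s(s(a + s(b + c))); pulling the last s outward would need a further rule such as x + s(y) -> s(x + y), which is not among the three.

```python
def rewrite_step(term):
    """Apply one rule at the root if possible; return None otherwise."""
    if isinstance(term, tuple) and term[0] == "+":
        _, left, right = term
        if left == "0":                                  # 0 + x -> x
            return right
        if isinstance(left, tuple) and left[0] == "s":   # s(x) + y -> s(x + y)
            return ("s", ("+", left[1], right))
        if isinstance(left, tuple) and left[0] == "+":   # (x + y) + z -> x + (y + z)
            return ("+", left[1], ("+", left[2], right))
    return None


def normalize(term):
    """Rewrite innermost-first until no rule applies anywhere."""
    if isinstance(term, tuple):
        term = (term[0],) + tuple(normalize(t) for t in term[1:])
    result = rewrite_step(term)
    return term if result is None else normalize(result)


def show(term):
    if isinstance(term, str):
        return term
    if term[0] == "s":
        return f"s({show(term[1])})"
    return f"({show(term[1])} + {show(term[2])})"


# ((s(s(a)) + s(b)) + c)
start = ("+", ("+", ("s", ("s", "a")), ("s", "b")), "c")
normal_form = normalize(start)
print(show(normal_form))
```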

51
TXL
  • A generalized source-to-source translation system
  • Uses a context-free grammar to describe the
    structures to be transformed
  • Rule specification uses a by-example style
  • Has been used to process billions of lines of
    code for Y2K purposes

52
TXL programs
  • TXL programs consist of two parts
  • Grammar for the input language
  • Transformation Rules
  • Let's look at some examples

53
Calculator.Txl - Grammar
  • Part I. Syntax specification

    define program
        [expression]
    end define

    define expression
        [term]
      | [expression] [addop] [term]
    end define

    define term
        [primary]
      | [term] [mulop] [primary]
    end define

    define primary
        [number]
      | ( [expression] )
    end define

    define addop
        '+
      | '-
    end define

    define mulop
        '*
      | '/
    end define

54
Calculator.Txl - Rules
  • Part 2. Transformation rules

    rule main
        replace [expression]
            E [expression]
        construct NewE [expression]
            E [resolveAddition]
              [resolveSubtraction]
              [resolveMultiplication]
              [resolveDivision]
              [resolveParentheses]
        where not
            NewE [= E]
        by
            NewE
    end rule

    rule resolveAddition
        replace [expression]
            N1 [number] + N2 [number]
        by
            N1 [+ N2]
    end rule

    rule resolveSubtraction ...
    rule resolveMultiplication ...
    rule resolveDivision ...

    rule resolveParentheses
        replace [primary]
            ( N [number] )
        by
            N
    end rule

55
DotProduct.Txl
  • Form the dot product of two vectors,
    e.g., (1 2 3).(3 2 1) => 10

    define program
        ( [repeat number] ) . ( [repeat number] )
      | [number]
    end define

    rule main
        replace [program]
            ( V1 [repeat number] ) . ( V2 [repeat number] )
        construct Zero [number]
            0
        by
            Zero [addDotProduct V1 V2]
    end rule

    rule addDotProduct V1 [repeat number] V2 [repeat number]
        deconstruct V1
            First1 [number] Rest1 [repeat number]
        deconstruct V2
            First2 [number] Rest2 [repeat number]
        construct ProductOfFirsts [number]
            First1 [* First2]
        replace [number]
            N [number]
        by
            N [+ ProductOfFirsts]
              [addDotProduct Rest1 Rest2]
    end rule
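
The recursion in addDotProduct may be easier to follow in a conventional language. The Python sketch below mirrors its structure under one reading of the rule: split each vector into a first element and a rest, add the product of the firsts to the running number, and recurse on the rests until a vector is empty. It is an analogy, not a translation of TXL semantics.

```python
def add_dot_product(total, v1, v2):
    """Add the dot product of v1 and v2 to total, one element pair at a time."""
    if not v1 or not v2:            # nothing left to take apart: stop
        return total
    first1, rest1 = v1[0], v1[1:]
    first2, rest2 = v2[0], v2[1:]
    product_of_firsts = first1 * first2
    return add_dot_product(total + product_of_firsts, rest1, rest2)


dot = add_dot_product(0, [1, 2, 3], [3, 2, 1])
print(dot)  # 1*3 + 2*2 + 3*1 = 10
```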

56
Sort.Txl
  • Sort.Txl - simple numeric bubble sort

    define program
        [repeat number]
    end define

    rule main
        replace [repeat number]
            N1 [number] N2 [number] Rest [repeat number]
        where
            N1 [> N2]
        by
            N2 N1 Rest
    end rule
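
The single rule above sorts because it keeps firing until no adjacent out-of-order pair is left. A rough Python equivalent of that "apply until nothing matches" behaviour is sketched below; the function name is hypothetical and the loop order is one possible strategy, not necessarily the order in which TXL searches for matches.

```python
def sort_by_rewriting(numbers):
    """Swap adjacent out-of-order pairs until no such pair remains."""
    seq = list(numbers)
    changed = True
    while changed:                          # the rule applies until it no longer matches
        changed = False
        for i in range(len(seq) - 1):
            if seq[i] > seq[i + 1]:         # pattern: N1 N2 Rest where N1 > N2
                seq[i], seq[i + 1] = seq[i + 1], seq[i]   # replacement: N2 N1 Rest
                changed = True
    return seq


print(sort_by_rewriting([5, 1, 4, 2, 3]))
```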

57
Other TXL constructs
    compounds
        ->
    end compounds

    keys
        var procedure exists inout out
    end keys

    function isAnAssignmentTo X [id]
        match [statement]
            X := Y [expression]
    end function

58
www.txl.ca
  • Guided Tour
  • Many examples
  • Reference manual
  • Download TXL for many platforms

59
Example uses
  • HTML Pretty Printing of Source Code
  • Language to Language Translation
  • Design Recovery from Source
  • Improvement of security problems
  • Program instrumentation and measurement
  • Logical formula simplification and
    interpretation.