The Pivot: Static Analysis of C++ Applications

1
The Pivot: Static Analysis of C++ Applications
  • Bjarne Stroustrup
  • Texas A&M University
  • http://www.research.att.com/~bs

2
Overview
  • Static analysis of C++
  • What would be useful
  • Why it is hard
  • C++0x
  • The Pivot
  • Context
  • Aims
  • Organization
  • Basic representations
  • High-level program representation for HPC
  • Concept-based checking and transformation

3
What would be useful?
  • Direct representation of high-level ideas in code
  • E.g. no side effects, idempotent operation, always
    gives the same answer for the same element, no
    security violation, no memory leak, no race
    condition, no deadlock, being sorted, being
    band-diagonal, parallel application
  • Use of such direct representation
  • For providing guarantees
  • For information
  • For optimization
  • For program transformation

4
It's hard
  • C++ is
  • Large
  • Extremely flexible and general
  • Quite irregular
  • Has its type-unsafe C subset
  • High-level ideas tend to be represented as
    templated classes and functions
  • Generic programming, Template meta-programming,
    generative programming
  • We have little experience with tools representing
    and manipulating templates
  • Such templates tend to be provided as part of
    domain specific libraries

5
Bell Labs proverbs
  • Library design is language design
  • Language design is library design
  • But the devil is in the details

6
C++0x
  • 1998: ISO C++ standard
  • 2009 (estimated): ISO C++ standard
  • Better libraries and better support for library
    building
  • Hash maps, regular expressions, file system, ...
  • Threads and memory model
  • Concepts
  • A type system for types, integers, and operations
  • auto, template aliases, general initializer
    lists, ...

7
Concept: trivial example
  • // Caveat: likely C++0x
  • template<ForwardIterator Iter, ValueType Val>
  •   where Assignable<Iter::value_type,Val>
  • Iter find(Iter first, Iter last, Val v);
  • template<Container Cont, ValueType Val>
  •   where Assignable<Cont::value_type,Val>
  • Cont::iterator find(Cont& c, Val v);
  • vector<int> v = { 2, 3, 5, 8, 13, 21, 34 };
  • auto p1 = find(v.begin(), v.end(), 42);
  • auto p2 = find(v,42.3);
  • auto p3 = find(7,42); // error: 7 is not a Container
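The slide's syntax is a proposal, not something a compiler accepts today. As a rough sketch only, the two `find` overloads can be approximated in standard C++11 without concepts. The namespace `sketch` and the function `find_example_ok` are names invented for this illustration. The "Container" check is emulated crudely by mentioning `std::begin(c)` in the return type, which is much weaker than a real concept.

```cpp
#include <cassert>
#include <iterator>
#include <vector>

namespace sketch {  // hypothetical namespace, avoids clashing with std::find

// Iterator-pair version: plain linear search.
template<typename Iter, typename Val>
Iter find(Iter first, Iter last, const Val& v)
{
    for (; first != last; ++first)
        if (*first == v) return first;
    return last;
}

// Container version. The return type mentions std::begin(c), so a call
// such as find(7, 42) is rejected at compile time: an int is not
// something std::begin can be applied to. This only imitates a
// Container concept; it checks far less than a real concept would.
template<typename Cont, typename Val>
auto find(Cont& c, const Val& v) -> decltype(std::begin(c))
{
    return sketch::find(std::begin(c), std::end(c), v);
}

} // namespace sketch

// Mirrors the calls on the slide (minus the deliberately ill-formed one).
inline bool find_example_ok()
{
    std::vector<int> v = { 2, 3, 5, 8, 13, 21, 34 };
    auto p1 = sketch::find(v.begin(), v.end(), 42);  // 42 is not in v
    auto p2 = sketch::find(v, 42.3);                 // 42.3 is not in v
    auto p3 = sketch::find(v, 13);                   // 13 is in v
    assert(p3 != v.end());
    return p1 == v.end() && p2 == v.end() && *p3 == 13;
}
```

The error message for `sketch::find(7, 42)` would talk about failed template deduction rather than "7 is not a Container", which is exactly the kind of gap concepts are meant to close.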

8
Concepts
  • Can express many high-level abstractions
  • A type system for sets of types, integers, and
    operations
  • We have experimental implementations of concepts
  • A concept is a handle to which we can attach
  • some standard semantics within the language
  • essentially arbitrary semantics outside the
    language using tools
  • Until we get concepts, we can fake them with
    static analysis and transformation tools

9
Context for the Pivot
  • Semantically Enhanced Library (Language)
  • Enhanced notation through libraries
  • Restrict semantics through tools
  • And take advantage of that semantics

(Diagram labels: Domain Specific Library, C++, Semantic Restrictions)
10
Context for the Pivot
(Diagram labels: Domain Specific Library, C++, Semantic Restrictions)
  • Provide the advantages of specialized languages
  • Without introducing new special purpose
    languages
  • Without supporting special-purpose language tool
    chains
  • Avoiding the 99.?% language death rate
  • Provide general support for the SELL idea
  • Not just a specialized tool per
    application/library
  • The Pivot fits here

11
Example SELL: Safe C++
  • Add
  • Range-checked std::vector
  • iterators
  • Resource handles
  • Any (if needed) (a typesafe union type)
  • Subtract
  • Arrays
  • Pointers
  • New/delete
  • Unions
  • Excessively complex/obscure code
  • Uses of undefined constructs not caught by
    compilers (e.g. a[i] = i++)
  • Transforms
  • Pointers into iterators and resource handles (if
    porting)
  • New/delete into resource handle uses

12
Example SELL STAPL
  • Wait for Lawrence's talk

13
Aims
  • To allow fully general analysis of C++ source
    code
  • What a human can do
  • Foci
  • Templates (e.g. specialization)
  • C++0x features (e.g. concepts, generalized
    initializers)
  • Distributed programming
  • Embedded systems
  • Limitation: we work after macro expansion
  • To allow transformation of C++ code
  • i.e. production of new code from old source
  • Non-aim: handling other languages
  • e.g. Fortran, Java
  • but C and C++ dialects are relatively easy

14
Related work
  • Lots
  • 20 tools for analyzing C++
  • But
  • Most are specialized
  • E.g. alias analysis, flow analysis, numeric
    optimizations
  • Most are attached to a single compiler/parser
  • None handles all of C++
  • E.g. C plus classes, or C++ but not the standard libraries
  • (that requires full handling of templates)
  • Hardly two tools handle the same subset
  • None handles the key C++0x features (e.g.
    concepts)
  • Some are proprietary
  • No serious interoperability

15
The Pivot
16
The Pivot
(Diagram of the Pivot infrastructure. Labels: C++ source, Compiler (three), IDL, IPR, XPR, Tool 1, Tool 2, Tool 3, Tool 4, Object code, C++ source, Specialized representation (e.g. flow graph), information)
17
Why? The Original Project
  • Communication with remote mobile device
  • Calling interface
  • CORBA, DCOM, Java RMI, ..., homebrew interface
  • Transport
  • TCP/IP, XML, ..., homebrew protocol
  • Big, Ugly, Slow, Proprietary, ...
  • Why can't I just write ISO Standard C++?

18
The original Project: Distributed programs in ISO C++
// use local object:
X x;
A a;
std::string s("abc");
// ...
x.f(a, s); // a function call

// use remote object:
proxy<X> x; // remote at my host
x.connect("my_host");
A a;
std::string s("abc");
// ...
x.f(a, s); // a message send
  • as similar as possible to non-distributed
    programming, but no more similar
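One way to picture what sits behind `proxy<X>` is a class with the same member functions as `X` that turns each call into a message. The sketch below is purely illustrative: `fake_transport`, the message format, and the constructor taking a transport are all assumptions made for this example (the slide default-constructs the proxy), and no real network is involved. A real system would presumably generate such a class from `X`'s declaration rather than have it written by hand.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct A { int id = 0; };

// The ordinary, local class.
struct X {
    std::string f(const A& a, const std::string& s) { return s + "#" + std::to_string(a.id); }
};

// Stand-in for the network (hypothetical): just records outgoing messages.
struct fake_transport { std::vector<std::string> sent; };

template<typename T> class proxy;   // primary template, specialised per class

// Hand-written specialisation for X, with the same call syntax as X.
template<>
class proxy<X> {
public:
    explicit proxy(fake_transport& t) : transport_(t) {}
    void connect(const std::string& host) { host_ = host; }

    std::string f(const A& a, const std::string& s)
    {
        // Marshal the call into a message and "send" it.
        transport_.sent.push_back(host_ + ":X::f(" + std::to_string(a.id) + "," + s + ")");
        // There is no remote end in this sketch, so compute the reply locally.
        return X().f(a, s);
    }
private:
    fake_transport& transport_;
    std::string host_;
};

inline void proxy_demo()
{
    fake_transport net;
    proxy<X> x(net);
    x.connect("my_host");
    A a;
    std::string s("abc");
    std::string reply = x.f(a, s);           // looks like a call, is a message send
    assert(reply == X().f(a, s));
    assert(net.sent.size() == 1);
}
```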

19
IPR high-level principles
  • Complete: direct representation of C++
  • Built-in types, classes, templates, expressions,
    statements, translation units
  • Can represent erroneous and incomplete C++
    programs
  • Regular
  • The structure contains all of C++ but doesn't
    mimic irregularities
  • Programming effort proportional to complexity of
    task
  • IPR is not just a data structure
  • Extensible
  • Node types
  • Information associated with a node
  • Operations
  • No integration with compilers

20
IPR design choices
  • Type safe
  • IPR (not its users) handles memory management
  • Minimal (run-time and space)
  • Minimal number of nodes (unification)
  • Minimal number of checked indirections (usually,
    virtual function calls)
  • Expression-based regular superset of C++
  • E.g. statements, declarations are expressions too
  • C++0x features (most important: concepts; types
    have types)
  • Interfaces
  • Purely functional, abstract classes, for most
    users
  • No mutation operation on abstract classes
  • Users don't get pointers directly
  • Mutating (operates on concrete classes)
  • Users get to use pointers for in-place
    transformation
  • Traversals (and queries)
  • Several, most not in the Pivot core
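The split between a purely functional interface and mutable concrete classes can be sketched as follows. This is not IPR's actual class hierarchy: the namespaces `iface` and `impl`, the node kinds `Literal` and `Add`, and the `Evaluator` traversal are invented for the illustration. Most users would see only `iface`, which has no mutating operations; a transformation tool would work on `impl`.

```cpp
#include <cassert>

namespace iface {   // read-only view: abstract classes, const members only
struct Literal;
struct Add;

struct Visitor {
    virtual ~Visitor() {}
    virtual void visit(const Literal&) = 0;
    virtual void visit(const Add&) = 0;
};
struct Node {
    virtual ~Node() {}
    virtual void accept(Visitor&) const = 0;
};
struct Literal : Node { virtual int value() const = 0; };
struct Add : Node {
    virtual const Node& left() const = 0;
    virtual const Node& right() const = 0;
};
} // namespace iface

namespace impl {    // concrete classes: data is exposed and may be changed in place
struct Literal : iface::Literal {
    int val;
    explicit Literal(int v) : val(v) {}
    int value() const override { return val; }
    void accept(iface::Visitor& v) const override { v.visit(*this); }
};
struct Add : iface::Add {
    const iface::Node* lhs;
    const iface::Node* rhs;
    Add(const iface::Node& l, const iface::Node& r) : lhs(&l), rhs(&r) {}
    const iface::Node& left() const override { return *lhs; }
    const iface::Node& right() const override { return *rhs; }
    void accept(iface::Visitor& v) const override { v.visit(*this); }
};
} // namespace impl

// A traversal written only against the read-only interface.
struct Evaluator : iface::Visitor {
    int result = 0;
    void visit(const iface::Literal& n) override { result = n.value(); }
    void visit(const iface::Add& n) override
    {
        n.left().accept(*this);
        int l = result;
        n.right().accept(*this);
        result = l + result;
    }
};

inline void interface_demo()
{
    impl::Literal one(1), two(2);
    impl::Add sum(one, two);
    Evaluator e;
    sum.accept(e);
    assert(e.result == 3);
}
```

Each `accept` costs one virtual call per node, which is the kind of "checked indirection" the design tries to keep to a minimum.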

21
IPR is minimal
  • Necessary for dealing with real-world code
  • Multi-million line programs are not uncommon
  • Given the constraint of completeness
  • C++ is complex
  • especially when we use the advanced template
    features essential for high-performance work
  • Unified representation
  • E.g., there is only one int and only one 1
  • Type comparison becomes pointer comparison
  • Indirections are minimized
  • An indirection (only) when there is a choice of
    different types of information
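The claim that type comparison becomes pointer comparison follows from unification: if a factory hands out exactly one node per distinct type, two types are equal exactly when their node addresses are equal. A minimal sketch of that idea, using hypothetical names (`TypeFactory`, `Builtin`, `Pointer`) that are not taken from IPR:

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

struct Type { virtual ~Type() {} };

struct Builtin : Type {
    std::string name;
    explicit Builtin(std::string n) : name(std::move(n)) {}
};
struct Pointer : Type {
    const Type* pointee;
    explicit Pointer(const Type* p) : pointee(p) {}
};

// Owns every node and returns the existing node when the same type is
// requested again, so users never allocate or free nodes themselves.
class TypeFactory {
public:
    const Type* builtin(const std::string& name)
    {
        std::unique_ptr<Builtin>& slot = builtins_[name];
        if (!slot) slot.reset(new Builtin(name));
        return slot.get();
    }
    const Type* pointer_to(const Type* pointee)
    {
        // The pointee is already unified, so its address identifies it.
        std::unique_ptr<Pointer>& slot = pointers_[pointee];
        if (!slot) slot.reset(new Pointer(pointee));
        return slot.get();
    }
private:
    std::map<std::string, std::unique_ptr<Builtin>> builtins_;
    std::map<const Type*, std::unique_ptr<Pointer>> pointers_;
};

inline void unification_demo()
{
    TypeFactory types;
    const Type* i1 = types.builtin("int");
    const Type* i2 = types.builtin("int");
    assert(i1 == i2);                                        // only one int
    assert(types.pointer_to(i1) == types.pointer_to(i2));    // only one int*
    assert(types.builtin("char") != i1);
}
```

The same scheme would apply to literals such as `1`; it is shown for types only to keep the example short.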

22
Original idea (XTI)
  • Too large, too slow

23
Current hierarchy (IPR)
  • Compact
  • minimal call overhead

24
IPR Example 1
void foo(float b = 2.4);
25
IPR Example 2
26
XPR (eXternal Program Representation)
  • Can be thought of as a specialized portable
    object database
  • Easy/fast to parse
  • Easy/fast to write
  • Compact
  • About as compact as C++ source code
  • Robust
  • Read/write without using a symbol table
  • LR(1), strictly prefix declaration syntax
  • Human readable
  • Human writeable
  • Can represent almost all of C++ directly
  • No preprocessor directives
  • No multiple declarators in a declaration
  • No <, >, >>, or << in template arguments, except
    in parentheses

27
XPR
  • i : int // int i
  • C : class { // class C {
  • m : const int // const int m
  • mm : *const int // const int* mm
  • f : (:int,:char) double // double f(int,char)
  • f : (z:complex) C // C f(complex z)
  • // ...
  • } // };
  • vector : <T:class> class { // template<class T> class
    vector {
  • p : *T // T* p
  • sz : int // int sz
  • // ...
  • } // };

28
Extremely simple SELL example
  • template <Parallelizable T>
  • void f(const T v)
  • double d v2 // OK
  • double d v2 // not OK

29
Current and future work
  • Complete infrastructure
  • Complete EDG and GCC interfaces
  • Represent headers (modularity) directly
  • Complete type representation in XPR
  • Initial applications
  • Style analysis
  • including type safety and security
  • Analysis and transformation of STAPL programs
  • Build alliances

30
References
  • [GJS06] Gregor, Douglas; Järvi, Jaakko; Siek,
    Jeremy; Lumsdaine, Andrew; Dos Reis, Gabriel;
    Stroustrup, Bjarne: Concepts: Linguistic Support
    for Generic Programming in C++. To appear,
    OOPSLA'06.
  • [DRS05] Stroustrup, Bjarne; Dos Reis, Gabriel: A
    concept design. C++ Committee, paper N1782. April
    2005.
  • [SDR05] Stroustrup, Bjarne; Dos Reis, Gabriel:
    Supporting SELL for High Performance Computing.
    LCPC '05.
  • [Str05] Stroustrup, Bjarne: A rationale for
    semantically enhanced libraries. LCSD '05.