Title: The Pivot: Static Analysis of C++ Applications
1 The Pivot: Static Analysis of C++ Applications
- Bjarne Stroustrup
- Texas A&M University
- http://www.research.att.com/~bs
2 Overview
- Static analysis of C++
- What would be useful
- Why it is hard
- C++0x
- The Pivot
- Context
- Aims
- Organization
- Basic representations
- High-level program representation for HPC
- Concept-based checking and transformation
3 What would be useful?
- Direct representation of high-level ideas in code
- E.g. no side effects, idempotent operation, always
gives the same answer for the same element, no
security violation, no memory leak, no race
condition, no deadlock, being sorted, being
band-diagonal, parallel application
- Use of such direct representation
- For providing guarantees
- For information
- For optimization
- For program transformation
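As a rough illustration of representing one such idea ("being sorted") directly in code, the sketch below wraps a vector in a type whose constructor establishes the property, so later code can rely on it as a guarantee and exploit it for optimization (here, binary search). The name `Sorted_ints` and its interface are hypothetical and not taken from the talk.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical wrapper (not from the talk): holding a Sorted_ints is the
// guarantee that its elements are in nondecreasing order.
class Sorted_ints {
public:
    explicit Sorted_ints(std::vector<int> v) : elems(std::move(v))
    {
        std::sort(elems.begin(), elems.end());
        assert(std::is_sorted(elems.begin(), elems.end()));  // invariant
    }

    const std::vector<int>& values() const { return elems; }

    // The guarantee enables an optimization: O(log n) lookup.
    bool contains(int x) const
    {
        return std::binary_search(elems.begin(), elems.end(), x);
    }

private:
    std::vector<int> elems;  // never exposed for mutation
};
```

Because the elements are only reachable through a const reference, no user code can break the property after construction. A tool could check or exploit the same property without the wrapper, which is the direction the talk argues for.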
4 It's hard
- C++ is
- Large
- Extremely flexible and general
- Quite irregular
- Has its type-unsafe C subset
- High-level ideas tend to be represented as
templated classes and functions
- Generic programming, template meta-programming,
generative programming
- We have little experience with tools representing
and manipulating templates
- Such templates tend to be provided as part of
domain-specific libraries
5 Bell Labs proverbs
- Library design is language design
- Language design is library design
- But the devil is in the details
6 C++0x
- 1998 ISO C++ standard
- 2009 (estimated) ISO C++ standard
- Better libraries and better support for library
building
- Hash maps, regular expressions, file system, ...
- Threads and memory model
- Concepts
- A type system for types, integers, and operations
- Auto, template aliases, general initializer
lists, ...
7 Concept: trivial example
// Caveat: likely C++0x
template<ForwardIterator Iter, ValueType Val>
    where Assignable<Iter::value_type, Val>
Iter find(Iter first, Iter last, Val v);

template<Container Cont, ValueType Val>
    where Assignable<Cont::value_type, Val>
Cont::iterator find(Cont& c, Val v);

vector<int> v = { 2, 3, 5, 8, 13, 21, 34 };
auto p1 = find(v.begin(), v.end(), 42);
auto p2 = find(v, 42.3);
auto p3 = find(7, 42);   // error: 7 is not a Container
8 Concepts
- Can express many high-level abstractions
- A type system for sets of types, integers, and
operations
- We have experimental implementations of concepts
- A concept is a handle to which we can attach
- some standard semantics within the language
- essentially arbitrary semantics outside the
language using tools
- Until we get concepts, we can fake them with
static analysis and transformation tools
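For comparison, a purely syntactic part of a concept can also be approximated inside the language with type traits. The sketch below is that library-level approximation, not the tool-based approach the slide refers to. The names `is_container`, `fake_void_t`, and `find_in` are hypothetical, and the check covers only the presence of `value_type`, `begin()`, and `end()`, with none of the semantics a real concept could carry.

```cpp
#include <algorithm>
#include <cassert>
#include <type_traits>
#include <utility>
#include <vector>

// Helper that maps any well-formed list of types to void (C++11-compatible).
template<class...> struct fake_make_void { typedef void type; };
template<class... Ts>
using fake_void_t = typename fake_make_void<Ts...>::type;

// Hypothetical stand-in for a Container concept: true when T has a nested
// value_type and begin()/end() members. Purely syntactic, no semantics.
template<class T, class = void>
struct is_container : std::false_type {};

template<class T>
struct is_container<T, fake_void_t<typename T::value_type,
                                   decltype(std::declval<T&>().begin()),
                                   decltype(std::declval<T&>().end())>>
    : std::true_type {};

// Container-based find, only viable when Cont passes the check above.
template<class Cont, class Val>
typename std::enable_if<is_container<Cont>::value,
                        typename Cont::iterator>::type
find_in(Cont& c, const Val& v)
{
    return std::find(c.begin(), c.end(), v);
}

// Usage corresponding to the find(v, 42.3) call in the trivial example.
inline void find_in_demo()
{
    std::vector<int> v = {2, 3, 5, 8, 13, 21, 34};
    assert(find_in(v, 8) == v.begin() + 3);
    assert(find_in(v, 42.3) == v.end());
    // find_in(7, 42);  // would not compile: int is not a container
}
```

The error for a non-container argument is a generic "no matching function" message rather than "7 is not a Container", which is one reason direct language or tool support is attractive.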
9 Context for the Pivot
- Semantically Enhanced Library (Language), SELL
- Enhanced notation through libraries
- Restrict semantics through tools
- And take advantage of that semantics
[Diagram labels: Domain Specific Library, C++, Semantic Restrictions]
10 Context for the Pivot
[Diagram labels: Domain Specific Library, C++, Semantic Restrictions]
- Provide the advantages of specialized languages
- Without introducing new special-purpose
languages
- Without supporting special-purpose language tool
chains
- Avoiding the 99.?% language death rate
- Provide general support for the SELL idea
- Not just a specialized tool per
application/library
- The Pivot fits here
11 Example SELL: Safe C++
- Add
- Range-checked std::vector
- iterators
- Resource handles
- Any (if needed) (a type-safe union type)
- Subtract
- Arrays
- Pointers
- New/delete
- Unions
- Excessively complex/obscure code
- Uses of undefined constructs not caught by
compilers (e.g. a[i] = i++)
- Transforms
- Pointers into iterators and resource handles (if
porting)
- New/delete into resource handle uses
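A minimal sketch of two of the "Add" items, a range-checked vector and a resource handle, might look as follows. The names `Checked_vector`, `Tracked`, and `make_tracked` are hypothetical, and `std::unique_ptr` (a later standard facility) stands in for whatever handle type a Safe C++ library would actually provide.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <memory>
#include <stdexcept>
#include <vector>

// Hypothetical range-checked vector: operator[] throws std::out_of_range
// on a bad index instead of invoking undefined behavior.
template<class T>
class Checked_vector {
public:
    Checked_vector() {}
    Checked_vector(std::initializer_list<T> init) : elems(init) {}

    T& operator[](std::size_t i) { return elems.at(i); }
    const T& operator[](std::size_t i) const { return elems.at(i); }

    std::size_t size() const { return elems.size(); }
    void push_back(const T& x) { elems.push_back(x); }

private:
    std::vector<T> elems;
};

// Small type that counts live objects, so release can be observed.
struct Tracked {
    static int& live() { static int n = 0; return n; }
    Tracked() { ++live(); }
    ~Tracked() { --live(); }
};

// Resource handle in place of a naked new/delete pair: the object is
// destroyed when the returned handle goes out of scope.
inline std::unique_ptr<Tracked> make_tracked()
{
    return std::unique_ptr<Tracked>(new Tracked());
}

inline void safe_subset_demo()
{
    Checked_vector<int> v = {1, 2, 3};
    assert(v[2] == 3);

    bool caught = false;
    try { v[3] = 0; } catch (const std::out_of_range&) { caught = true; }
    assert(caught);

    {
        std::unique_ptr<Tracked> h = make_tracked();
        assert(Tracked::live() == 1);
    }
    assert(Tracked::live() == 0);  // released without an explicit delete
}
```

The "Subtract" and "Transforms" items are where a tool is needed: a library can offer safer alternatives, but only analysis can verify that arrays, naked pointers, and explicit `new`/`delete` no longer appear in user code.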
12 Example SELL: STAPL
13 Aims
- To allow fully general analysis of C++ source
code
- What a human can do
- Foci
- Templates (e.g. specialization)
- C++0x features (e.g. concepts, generalized
initializers)
- Distributed programming
- Embedded systems
- Limitation: we work after macro expansion
- To allow transformation of C++ code
- i.e. production of new code from old source
- Non-aim: handling other languages
- e.g. Fortran, Java
- but C and C++ dialects are relatively easy
14 Related work
- Lots
- 20 tools for analyzing C++
- But
- Most are specialized
- E.g. alias analysis, flow analysis, numeric
optimizations
- Most are attached to a single compiler/parser
- None handles all of C++
- E.g. C++ classes, C++ but not standard libraries
- (that requires full handling of templates)
- Hardly two tools handle the same subset
- None handles the key C++0x features (e.g.
concepts)
- Some are proprietary
- No serious interoperability
15 The Pivot
16 The Pivot
[Diagram labels: C++ source, Compiler (x3), Object code, IDL, IPR, XPR, Tool 1, Tool 2, Tool 3, Tool 4, Specialized representation (e.g. flow graph), information]
17 Why? The original project
- Communication with remote mobile device
- Calling interface
- CORBA, DCOM, Java RMI, ..., homebrew interface
- Transport
- TCP/IP, XML, ..., homebrew protocol
- Big, ugly, slow, proprietary, ...
- Why can't I just write ISO Standard C++?
18 The original project: distributed programs in ISO C++
// use local object:
X x;
A a;
std::string s("abc");
// ...
x.f(a, s);   // a function call

// use remote object:
proxy<X> x;   // remote at my host
x.connect("my_host");
A a;
std::string s("abc");
// ...
x.f(a, s);   // a message send
- as similar as possible to non-distributed
programming, but no more similar
19 IPR high-level principles
- Complete: direct representation of C++
- Built-in types, classes, templates, expressions,
statements, translation units
- Can represent erroneous and incomplete C++
programs
- Regular
- The structure contains all of C++ but doesn't
mimic irregularities
- Programming effort proportional to complexity of
task
- IPR is not just a data structure
- Extensible
- Node types
- Information associated with a node
- Operations
- No integration with compilers
20 IPR design choices
- Type safe
- IPR (not its users) handles memory management
- Minimal (run-time and space)
- Minimal number of nodes (unification)
- Minimal number of checked indirections (usually,
virtual function calls)
- Expression-based regular superset of C++
- E.g. statements, declarations are expressions too
- C++0x features (most important: concepts; types
have types)
- Interfaces
- Purely functional, abstract classes, for most
users
- No mutation operation on abstract classes
- Users don't get pointers directly
- Mutating (operates on concrete classes)
- Users get to use pointers for in-place
transformation
- Traversals (and queries)
- Several, most not in the Pivot core
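The split between a purely functional interface and a mutating implementation can be sketched roughly as below. This is an illustration of the design idea only: `Node`, `Node_impl`, `count_nodes`, and their members are hypothetical names, not the actual IPR classes.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Read-only interface: what most users see. No mutating operations,
// and children are handed out as references, not pointers.
struct Node {
    virtual ~Node() {}
    virtual std::string name() const = 0;
    virtual std::size_t child_count() const = 0;
    virtual const Node& child(std::size_t i) const = 0;
};

// Concrete class: adds mutation for tools that transform code in place.
// The tree owns its nodes, so users never manage memory themselves.
class Node_impl : public Node {
public:
    explicit Node_impl(std::string n) : label(std::move(n)) {}

    std::string name() const override { return label; }
    std::size_t child_count() const override { return kids.size(); }
    const Node& child(std::size_t i) const override
    {
        assert(i < kids.size());
        return *kids[i];
    }

    void rename(std::string n) { label = std::move(n); }
    Node_impl& add_child(std::string n)
    {
        kids.push_back(std::unique_ptr<Node_impl>(new Node_impl(std::move(n))));
        return *kids.back();
    }

private:
    std::string label;
    std::vector<std::unique_ptr<Node_impl>> kids;
};

// A traversal written purely against the read-only interface; it can
// live outside the core, as the slide suggests most traversals do.
inline std::size_t count_nodes(const Node& n)
{
    std::size_t total = 1;
    for (std::size_t i = 0; i < n.child_count(); ++i)
        total += count_nodes(n.child(i));
    return total;
}
```

Analysis tools would take a `const Node&` and so cannot modify the program representation by accident, while a transformation tool works with `Node_impl` and accepts the responsibility that comes with mutation.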
21 IPR is minimal
- Necessary for dealing with real-world code
- Multi-million line programs are not uncommon
- Given the constraint of completeness
- C++ is complex
- especially when we use the advanced template
features essential for high-performance work
- Unified representation
- E.g., there is only one int and only one 1
- Type comparison becomes pointer comparison
- Indirections are minimized
- An indirection (only) when there is a choice of
different types of information
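The unification idea can be sketched with a small interning table: each distinct type is created once, so asking for `int` twice yields the same node and type comparison reduces to an address comparison. `Type`, `Type_table`, and `same_type` are hypothetical names for illustration. This is a generic interning technique and may differ from how IPR implements unification.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical type node; a real representation would carry far more.
struct Type {
    std::string name;
};

// Interning table: at most one Type node per distinct name.
class Type_table {
public:
    const Type& get(const std::string& name)
    {
        std::map<std::string, Type>::iterator it = types.find(name);
        if (it == types.end())
            it = types.emplace(name, Type{name}).first;
        assert(it->second.name == name);  // postcondition
        return it->second;  // std::map references stay valid on insertion
    }

    std::size_t size() const { return types.size(); }

private:
    std::map<std::string, Type> types;
};

// With unified nodes, type equality is pointer equality.
inline bool same_type(const Type& a, const Type& b)
{
    return &a == &b;
}
```

The same scheme applies to literals such as `1`: sharing one node per distinct value keeps the representation compact for multi-million line programs.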
22 Original idea (XTI)
23 Current hierarchy (IPR)
- Compact
- minimal call overhead
24 IPR Example 1
void foo(float b = 2.4);
25 IPR Example 2
26 XPR (eXternal Program Representation)
- Can be thought of as a specialized portable
object database
- Easy/fast to parse
- Easy/fast to write
- Compact
- About as compact as C++ source code
- Robust
- Read/write without using a symbol table
- LR(1), strictly prefix declaration syntax
- Human readable
- Human writeable
- Can represent almost all of C++ directly
- No preprocessor directives
- No multiple declarators in a declaration
- No <, >, >>, or << in template arguments, except
in parentheses
27 XPR
i : int                    // int i
C : class {                // class C {
    m : const int          // const int m
    mm : *const int        // const int* mm
    f : (int,char) double  // double f(int,char)
    f : (z : complex) C    // C f(complex z)
}                          // };
vector : <T> class {       // template<class T> class vector {
    p : *T                 // T* p
    sz : int               // int sz
}                          // };
28 Extremely simple SELL example
template<Parallelizable T>
void f(const T& v)
{
    double d = v[2];     // OK
    double* p = &v[2];   // not OK
}
29 Current and future work
- Complete infrastructure
- Complete EDG and GCC interfaces
- Represent headers (modularity) directly
- Complete type representation in XPR
- Initial applications
- Style analysis
- including type safety and security
- Analysis and transformation of STAPL programs
- Build alliances
30 References
- [GJS06] Gregor, Douglas; Järvi, Jaakko; Siek,
Jeremy; Lumsdaine, Andrew; Dos Reis, Gabriel;
Stroustrup, Bjarne: Concepts: Linguistic Support
for Generic Programming in C++. To appear,
OOPSLA '06.
- [DRS05] Stroustrup, Bjarne; Dos Reis, Gabriel: A
concept design. C++ Committee, paper N1782. April
2005.
- [SDR05] Stroustrup, Bjarne; Dos Reis, Gabriel:
Supporting SELL for High Performance Computing.
LCPC '05.
- [Str05] Stroustrup, Bjarne: A rationale for
semantically enhanced libraries. LCSD '05.