Refactoring C to Safer C - PowerPoint PPT Presentation


1
Refactoring C to Safer C
One (1) ACME Refactoring Tool
C (OBSCURITAS TERRIBILIS, "terrible obscurity")
SAFER C (LINGUA SALVA, "safe language")
  • Bill McCloskey
  • Eric Brewer

2
Background
  • C is a powerful language with many users and a
    huge base of legacy software
  • But C is error-prone:
      • Buffer overflows
      • Memory/pointer errors
      • Concurrency errors/race conditions
      • Lack of proper error handling/cleanup
  • There are several ways to fix the problem for old code

3
Solution 1
  • Program analysis/transformation tools (Lint, Metal,
    PREfix, CQual, CCured)
      • Continue to maintain old programs in C
      • Occasionally use the tool to find bugs or
        generate safety checks
  • Downsides
      • Programs are still written in C, which is error-prone
      • Tools often report many false positives, and the user
        must sort through them

4
Solution 2
  • Another possibility (Cyclone):
      • Rewrite old programs in a safer language
        (Cyclone) that still resembles C
      • All future changes are made to the new (Cyclone)
        code, so they are guaranteed safe
  • Downsides
      • It's difficult to rewrite an entire program, even
        if the new language is close to C
      • As new checks are added to Cyclone (e.g., for
        data races), old code must be revised again

5
Proposal
  • The Cyclone approach has many benefits:
      • All new code is guaranteed safe
      • Cyclone can check for several classes of bugs
  • Why not use a tool to transform old code into a
    safer language automatically?
      • This kind of transformation is called a
        refactoring

6
Refactoring
  • Refactorings are code improvement transformations
  • The output of a refactoring is readable code
  • Idea: gradually transform C code into safe,
    readable code, with programmer intervention
    allowed at each step
  • Use different refactorings to solve different
    problems: memory safety, concurrency, etc.

7
Refactoring Example
Stages: existing C program P0 → refactored program P1 → refactored program P2 → final program P3
Refactorings between stages:
  • P0 → P1: eliminate buffer overflows
  • P1 → P2: guard against race conditions
  • P2 → P3: ensure clean-up after exceptions
Results: code that remains readable by the programmer
8
Difficulties
  • A refactoring must output readable code
  • But existing tools can't do that:
      • They start by running the C preprocessor, which
        destroys readability
          • Macros are expanded
          • Include files are merged
          • Conditional code is inlined or eliminated
  • For refactoring to work, the preprocessor problem
    must be solved

9
Outline
  • Replacing the C preprocessor
      • ASTEC: an improved macro language
      • Macroscope: a cpp-to-ASTEC translator
  • Refactoring C code
      • Asfact: a prototype refactoring tool
  • Future work

10
Cpp: Lost in Translation
  • Cpp operates at the token level
      • Analysis tools have difficulty parsing such
        macros directly, so they expand them
      • But expanding them destroys readability
  • Cpp must be replaced with something that operates
    on entire syntax trees: ASTEC

    #define ADD(x) x +
    ADD(3) 4            /* expands to: 3 + 4 */
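A minimal C illustration of why token-level macros resist analysis in isolation. The macro names DOUBLE_TOKENS and DOUBLE_EXPR and the two helper functions are hypothetical, not from the talk: a macro whose body is not a complete expression changes meaning depending on the tokens that follow its use.

```c
/* Token-level macro: the body "x + x" is not a complete expression
   node, so operators at the call site can bind into it. */
#define DOUBLE_TOKENS(x) x + x

/* Parenthesized macro: behaves like a single expression node. */
#define DOUBLE_EXPR(x) ((x) + (x))

/* Expands to 3 + 3 * 2, which C parses as 3 + (3 * 2) = 9. */
int doubled_by_tokens(void) { return DOUBLE_TOKENS(3) * 2; }

/* Expands to ((3) + (3)) * 2 = 12. */
int doubled_by_expr(void) { return DOUBLE_EXPR(3) * 2; }
```

Both macros read as "double x", yet doubled_by_tokens() returns 9 and doubled_by_expr() returns 12; a tool cannot tell which behavior it gets without looking at every use site.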
11
ASTEC Examples
  • Constants and expressions
  • Inline functions
  • Types (possibly parameterized)
  • Also modules, decorators, conditional
    compilation, include files

    @macro CACHE_INDEX(int n) n*2 + 1
    @macro WORK(int when) { begin(); do_work(when); end(); }
    @type LIST_VALUE() int
12
ASTEC
  • Main goal of ASTEC: enable analysis of macros in
    isolation, without expanding them
      • ASTEC macros are complete syntactic units, so they
        can be parsed without expansion
      • They also include type information, so they can be
        typechecked without expansion
  • ASTEC supports the most common kinds of macros,
    but more may be added as necessary
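ASTEC syntax is not standard C, so the closest plain-C analogy (an analogy only, not ASTEC itself) is a static inline function: a complete, typed unit that can be parsed and typechecked without looking at its call sites. The sketch below contrasts it with an untyped cpp macro; all names are hypothetical.

```c
/* Counts how many times next_value() has been called. */
static int calls = 0;

static int next_value(void) { return ++calls; }

/* cpp macro: untyped, and its argument tokens are pasted twice. */
#define SQUARE_MACRO(x) ((x) * (x))

/* Typed, self-contained unit: the argument is evaluated exactly once. */
static inline int square_inline(int x) { return x * x; }

/* Returns the number of next_value() calls made by the macro version. */
int calls_via_macro(void) {
    calls = 0;
    (void)SQUARE_MACRO(next_value());
    return calls;
}

/* Returns the number of next_value() calls made by the inline version. */
int calls_via_inline(void) {
    calls = 0;
    (void)square_inline(next_value());
    return calls;
}
```

calls_via_macro() returns 2 because the macro duplicates its argument, while calls_via_inline() returns 1; the inline version's behavior is visible from its definition alone.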

13
Outline
  • Replacing the C preprocessor
      • ASTEC: an improved macro language
      • Macroscope: a cpp-to-ASTEC translator
  • Refactoring C code
      • Asfact: a prototype refactoring tool
  • Future work

14
Macroscope
  • For ASTEC to be useful, we must be able to
    translate cpp constructs into ASTEC
  • Macroscope is an automatic translation tool
  • Example:

    #define ADD(x,y) x+y        ADD(3, 4)
      becomes
    @macro ADD(x, y) x+y        ADD(3, 4)
15
Macroscope Algorithm
  • Expand all macros in the program
      • Keep a record of the tokens involved
  • Parse the expanded code
  • Find the post tokens (the tokens produced by
    expansion) in the syntax tree and try to
    synthesize a macro from them
      • Extract cpp arguments as ASTEC arguments

Diagram: ADD(3, 4) (pre-expansion tokens) → 3+4 (post-expansion tokens)
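A minimal sketch of the first two steps (expand, and keep a record of the tokens involved), assuming a function-like macro whose body and arguments are single tokens. The names expand_with_provenance, post_token, and the origin tags are hypothetical, not Macroscope's actual data structures.

```c
#include <string.h>

/* Where a post-expansion token came from. */
enum origin { FROM_BODY, FROM_ARG };

struct post_token {
    const char *text;
    enum origin origin;
    int arg_index;          /* argument number; -1 when FROM_BODY */
};

/* Expands one invocation of a function-like macro. Each body token
   that names a parameter is replaced by the matching argument token,
   and every output token records its origin. Returns the number of
   tokens written to out (which must hold nbody entries). */
int expand_with_provenance(const char *const params[], int nparams,
                           const char *const body[], int nbody,
                           const char *const args[],
                           struct post_token out[]) {
    int n = 0;
    for (int i = 0; i < nbody; i++) {
        int p = -1;
        for (int j = 0; j < nparams; j++) {
            if (strcmp(body[i], params[j]) == 0) {
                p = j;
                break;
            }
        }
        if (p >= 0) {
            out[n].text = args[p];
            out[n].origin = FROM_ARG;
            out[n].arg_index = p;
        } else {
            out[n].text = body[i];
            out[n].origin = FROM_BODY;
            out[n].arg_index = -1;
        }
        n++;
    }
    return n;
}
```

For ADD(3, 4) with parameters x, y and body x + y, this yields 3 (argument 0), + (body), and 4 (argument 1): the pre/post correspondence the later steps rely on.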
16
Macroscope Example
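Diagram: ADD(3, 4) (pre) → 3+4 (post)

  • Steps
      • Expand macros
      • Parse expanded code
      • Identify arguments and do reverse substitution
      • Identify macro body (least common ancestor of
        post tokens)
      • Emit macro definition

Parse tree of the post tokens: a binary-operator (+) node whose operands, 3 and 4, are reverse-substituted by the parameters x and y, giving:

    @macro ADD(x, y) x+y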
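The remaining steps can be sketched in C on a hand-built parse tree, assuming each node records whether it came from the expansion and, for argument leaves, which argument it was: the body is the least common ancestor of the post tokens, and argument leaves are reverse-substituted by parameter names. This is an illustrative simplification (binary operators and leaves only), and all names are hypothetical rather than Macroscope's own.

```c
#include <stdio.h>
#include <string.h>

/* A tiny parse tree: leaves hold one token, interior nodes are
   binary operators. */
struct node {
    const char *token;            /* operator or leaf text */
    struct node *left, *right;    /* both NULL for leaves */
    struct node *parent;          /* NULL at the root */
    int from_expansion;           /* 1 if this is a post token */
    int arg;                      /* argument index, or -1 if from body */
};

static int depth(const struct node *n) {
    int d = 0;
    while (n->parent != NULL) { n = n->parent; d++; }
    return d;
}

/* Least common ancestor of two nodes, found by walking parent links. */
static struct node *lca(struct node *a, struct node *b) {
    int da = depth(a), db = depth(b);
    while (da > db) { a = a->parent; da--; }
    while (db > da) { b = b->parent; db--; }
    while (a != b) { a = a->parent; b = b->parent; }
    return a;
}

/* Appends s to the NUL-terminated string in buf, truncating at size. */
static void append(char *buf, size_t size, const char *s) {
    size_t used = strlen(buf);
    if (used + 1 < size)
        snprintf(buf + used, size - used, "%s", s);
}

/* Prints the subtree rooted at n, replacing each argument leaf by the
   matching parameter name (reverse substitution). */
static void print_body(const struct node *n, const char *const params[],
                       char *buf, size_t size) {
    if (n->left == NULL) {
        append(buf, size, n->arg >= 0 ? params[n->arg] : n->token);
        return;
    }
    print_body(n->left, params, buf, size);
    append(buf, size, " ");
    append(buf, size, n->token);
    append(buf, size, " ");
    print_body(n->right, params, buf, size);
}

/* Takes the LCA of all post tokens as the macro body and writes an
   ASTEC-style definition into out (size must be at least 1). */
void synthesize_macro(struct node *const nodes[], int count,
                      const char *name, const char *const params[],
                      int nparams, char *out, size_t size) {
    struct node *body = NULL;
    for (int i = 0; i < count; i++)
        if (nodes[i]->from_expansion)
            body = (body == NULL) ? nodes[i] : lca(body, nodes[i]);
    snprintf(out, size, "@macro %s(", name);
    for (int i = 0; i < nparams; i++) {
        append(out, size, params[i]);
        if (i + 1 < nparams)
            append(out, size, ", ");
    }
    append(out, size, ") ");
    if (body != NULL)
        print_body(body, params, out, size);
}
```

For the tree of 3 + 4 with parameters x and y, synthesize_macro writes "@macro ADD(x, y) x + y".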
17
Macroscope Examples
  • Using a slightly more advanced algorithm:

    #define ADD(x) x+            ADD(3) 4
      becomes
    @macro ADD(x, y) x+y         ADD(3, 4)

    #define FIELDS f.g.h         data.FIELDS
      becomes
    @macro FIELDS(a) a.f.g.h     FIELDS(data)
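Because both styles of the FIELDS macro can be written in plain C, it is easy to check that such a rewrite preserves meaning. The struct types and the name FIELDS_OF are hypothetical; FIELDS_OF stands in for the ASTEC macro FIELDS(a), since C cannot define FIELDS both ways at once.

```c
/* Hypothetical nested structs so that data.f.g.h is well-formed. */
struct inner  { int h; };
struct middle { struct inner g; };
struct outer  { struct middle f; };

/* cpp style: a bare token sequence, meaningful only after a '.'. */
#define FIELDS f.g.h

/* Complete-expression style: the base object is an argument. */
#define FIELDS_OF(a) ((a).f.g.h)

/* data.FIELDS expands to data.f.g.h. */
int read_with_tokens(struct outer data) { return data.FIELDS; }

/* FIELDS_OF(data) expands to ((data).f.g.h). */
int read_with_expr(struct outer data) { return FIELDS_OF(data); }
```

Both functions read the same field; only the second macro is a self-contained expression that a tool can parse without seeing its use site.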
18
Macroscope Results
  • Some preliminary results
  • All translated programs are semantically
    equivalent to their cpp counterparts
  • Imperfect translations occur when Macroscope
    synthesizes a macro that is less general than the
    original

19
Outline
  • Replacing the C preprocessor
      • ASTEC: an improved macro language
      • Macroscope: a cpp-to-ASTEC translator
  • Refactoring C code
      • Asfact: a prototype refactoring tool
  • Future work

20
Asfact
  • Refactors ASTEC code
      • Generates readable output
      • Understands ASTEC constructs
  • Supports standard refactorings/analyses
      • Search/rename for variables/functions/fields
      • Add arguments to a function
  • Also supports programmable refactorings
      • Buffer overflows

21
Asfact: Buffer Overflows
  • As a simple test case, we refactored gzip to
    eliminate a well-known strcpy overflow
  • Example:
  • Although Asfact (currently) can only recognize
    fixed-size buffers, it still succeeded in most
    cases for gzip

    int main(void) {
        char buffer[80];
        strcpy(buffer, some_data);
    }
      becomes
    int main(void) {
        char buffer[80];
        strcpy_safe(buffer, 80, some_data);
    }
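The slides do not show strcpy_safe itself. The sketch below assumes only the signature suggested by the call strcpy_safe(buffer, 80, some_data) (destination, destination size, source) and picks one plausible policy: truncate and always NUL-terminate. Aborting on overflow would be an equally reasonable choice.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical bounds-checked strcpy. Copies at most size - 1 bytes
   of src into dst and NUL-terminates whenever size > 0.
   Returns 1 if src fit completely, 0 if it was truncated. */
int strcpy_safe(char *dst, size_t size, const char *src) {
    if (size == 0)
        return 0;                     /* no room even for the NUL */
    size_t len = strlen(src);
    size_t n = len < size ? len : size - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';
    return len < size;
}
```

With char small[4], copying "abc" returns 1; copying "abcdef" stores "abc" and returns 0 instead of writing past the buffer.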
22
Incremental Refactoring
  • A more complex example requiring user guidance

    struct buffer {
        char *data;
        unsigned int len;
    };

    void foo(const char *text) {
        struct buffer *buf;
        /* ... */
        strcat(buf->data, text);
    }
A refactoring tool can accept information from the user
that len holds the size of data. CCured, by contrast,
would fatten data into a pointer carrying an extra,
unnecessary length field.
23
Incremental Refactoring
  • Giving the user the ability to control the
    refactoring can increase its power

    struct buffer {
        char *data;
        unsigned int len;
    };

    void foo(const char *text) {
        struct buffer *buf;
        /* ... */
        strcat_safe(buf->data, buf->len, text);
    }
The resulting code is more efficient and
cleaner. Code is also more likely to be
compatible with old libraries, since fewer data
structure changes are necessary.
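Likewise, strcat_safe is not defined on the slides. A minimal sketch, assuming (per the user's hint) that len is the capacity of data in bytes including the terminating NUL, and that overlong input is truncated rather than rejected:

```c
#include <stddef.h>
#include <string.h>

struct buffer {
    char *data;
    unsigned int len;   /* capacity of data, as stated by the user */
};

/* Hypothetical bounds-checked strcat: appends src to the string in dst
   without writing more than size bytes in total (including the NUL).
   Returns 1 on success, 0 if src was truncated or dst is not
   NUL-terminated within its first size bytes. */
int strcat_safe(char *dst, size_t size, const char *src) {
    const char *end = memchr(dst, '\0', size);
    if (end == NULL)
        return 0;                       /* dst already overflows size */
    size_t used = (size_t)(end - dst);
    size_t room = size - used;          /* >= 1: includes the NUL slot */
    size_t len = strlen(src);
    size_t n = len < room ? len : room - 1;
    memcpy(dst + used, src, n);
    dst[used + n] = '\0';
    return len < room;
}
```

Because the capacity comes from the existing len field, struct buffer keeps its original layout, which is the compatibility benefit described above.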
24
Future Work
  • Right now, Asfact converts ASTEC code into ASTEC
    code
  • Better idea: increase the power of the language
    via extensions, and refactor into this new
    language
  • Example: add bounds-checked arrays to C, then
    refactor old-style arrays to the new form to
    increase safety
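One way to picture bounds-checked arrays using only today's C is a struct that carries the element pointer together with its length, plus a checked accessor. This is a hypothetical illustration of the idea (struct int_array and int_array_at are invented names), not the language extension the talk proposes.

```c
#include <stddef.h>

/* The element pointer travels with its length. */
struct int_array {
    int *elems;
    size_t len;
};

/* Returns a pointer to element i, or NULL if i is out of bounds. */
int *int_array_at(struct int_array a, size_t i) {
    return i < a.len ? &a.elems[i] : NULL;
}
```

A refactoring in this spirit would rewrite an access such as arr[i] on a plain int array into a call like int_array_at(a, i) plus a check of the result.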

25
Conclusion
  • Goal: make existing C code safer via
    incremental refactoring
  • We are now one step closer to the goal
  • For later steps, many refactorings can be
    synthesized from existing analyses (e.g., CCured)

Diagram: C → (Macroscope) → ASTEC → (refactorings) → Safer C
26
Extra