Title: Delta: Heuristically Minimize
1 Delta: Heuristically Minimize Interesting Files
delta.tigris.org
- Daniel S. Wilkerson
- work with Scott McPeak
2 This quarter-million-line file crashes my tool!
- We had a quarter-million-line (preprocessed) C file that crashed our C front-end (Elsa)
- How long would it take you to minimize that by hand?
- Delta reduced it in a few hours to a page or two of code
- While we did something else!
3 Delta Debugging Algorithm
- Andreas Zeller's Delta Debugging Algorithm
- For file minimization, reduces to this (N = number of lines in the file):
- for each granularity g from 0 to log2 N
  - partition the file into 2^g parts
  - for each part
    - test if the file minus that part is still interesting
    - if so, permanently throw out that part
- Result is one-minimal
  - removing any one line will make the test fail
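The loop on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the delta script itself: the names `minimize` and `is_interesting` are hypothetical, the "file" is a list of lines, and the test is any function that returns True while the candidate is still interesting. The sketch makes a single pass over the granularities, which on its own does not guarantee a one-minimal result; repeating the finest pass until nothing more can be removed would be one way to get there.

```python
import math


def minimize(lines, is_interesting):
    """Greedy line-based minimization, one pass over granularities 0..log2(N).

    `is_interesting` is a caller-supplied predicate (an assumption of this
    sketch) that takes a list of lines and returns True if the candidate
    still shows the behavior we care about.
    """
    n = len(lines)
    if n == 0:
        return []
    keep = [True] * n  # keep[i] is False once line i is thrown out
    max_g = math.ceil(math.log2(n)) if n > 1 else 0
    for g in range(max_g + 1):
        parts = 2 ** g
        for p in range(parts):
            # Part p covers original line indices [lo, hi).
            lo = p * n // parts
            hi = (p + 1) * n // parts
            idx = [i for i in range(lo, hi) if keep[i]]
            if not idx:
                continue  # nothing left in this part
            trial = keep[:]
            for i in idx:
                trial[i] = False
            candidate = [ln for ln, k in zip(lines, trial) if k]
            if is_interesting(candidate):
                keep = trial  # permanently throw out this part
    return [ln for ln, k in zip(lines, keep) if k]
```

For example, with eight lines `a` through `h` and a test that needs both `b` and `e`, `minimize(list("abcdefgh"), lambda c: "b" in c and "e" in c)` keeps only `b` and `e`.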
4 Example: both blue needed
5 both blue needed: g = 0
can't delete the box, since it contains both b and e
6 both blue needed: g = 1
can't delete: contains b
can't delete: contains e
7 both blue needed: g = 2
can delete
can delete
8 both blue needed: g = 3
can delete
can delete
9 both blue needed: final
10 You could do this manually...
- and be much more clever
- ...but delta is often faster
- I find it surprising that, when minimizing a file exhibiting a certain behavior, brute force mostly wins over cleverness
- "Computers are as dumb as hell but they go like 60" -- Richard Feynman
11 Do a controlled experiment
- An experiment does many things
  - the interesting bit
  - and the boilerplate just to make it go
- A control is another experiment
  - that only does the boilerplate
- Do both and subtract: this finds the interesting bit
  - gcc -c F: the control; F passes gcc
  - oink F | grep 'error...: but not oink
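A controlled test of this shape can be sketched as one Python function. This is an illustration under assumptions: the function name, its parameters, and the idea of passing the two commands as argument lists are hypothetical, not part of delta. The control command must succeed on the file, and the tool's combined output must match the pattern; only then is the file "interesting".

```python
import re
import subprocess


def is_interesting(path, control_cmd, tool_cmd, pattern):
    """Return True if `path` passes the control but trips the tool.

    control_cmd and tool_cmd are argument lists (for example, something
    like ["gcc", "-c"]); `path` is appended to each. `pattern` is a regular
    expression searched for in the tool's stdout and stderr. All of these
    names are assumptions made for this sketch.
    """
    control = subprocess.run(control_cmd + [path],
                             capture_output=True, text=True)
    if control.returncode != 0:
        return False  # the control failed, so the file itself is broken
    tool = subprocess.run(tool_cmd + [path],
                          capture_output=True, text=True)
    return re.search(pattern, tool.stdout + tool.stderr) is not None
```

Requiring the control to pass keeps the minimizer from drifting toward files that fail for boring reasons, such as a syntax error any compiler would reject.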
12 topformflat: explaining hierarchical structure
- To delta, a file is a sequence of lines
- topformflat explains the nesting of C/C++
- Simple flex filter that copies input to output
  - but doesn't print newlines nested deeper than a nesting-depth argument
- Strategy: repeatedly minimize with increasing nesting depths
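The idea of the filter can be sketched in Python. This is a simplified illustration, not the flex source of topformflat: the function name `flatten` is hypothetical, nesting is tracked with curly braces only, and quotes and comments are ignored. Suppressed newlines are replaced by a space, a choice made here so that adjacent tokens do not run together.

```python
def flatten(text, level):
    """Copy `text`, dropping newlines nested deeper than `level`.

    Nesting depth is counted with '{' and '}' only; string literals and
    comments are not handled in this sketch.
    """
    out = []
    depth = 0
    for ch in text:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth = max(0, depth - 1)
        if ch == "\n" and depth > level:
            out.append(" ")  # suppress the newline, keep tokens apart
        else:
            out.append(ch)
    return "".join(out)
```

At level 0 each top-level form ends up on a single line, so a line-based minimizer removes whole functions at a time; raising the level exposes statements inside them.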
13 topformflat Example
void foo() for(...) x - 5 bar()
while(...) j void bar() z
17 foo() void baz() ...
14 topformflat Example, level 0
void foo() for(...)x - 5bar()while(...)j
void bar() z 17foo() void baz() ...
15 topformflat Example, level 1
void foo() for(...) x - 5 bar()
while(...) j void bar() z 17
foo() void baz() ...
deleted
16 topformflat Example, level 2
void foo() for(...) x - 5 bar()
while(...) j void bar() z
17 foo() void baz() ...
17 Science: Most bugs exhibitable by small inputs
- On any input size, the result is almost always small
  - for C input to a compiler, 1-2 pages of code.
- Seems to be a phenomenon of computation
  - there actually is Science in Computer Science!
- but not always
  - delta worked for a week and still had 50 files
  - a buffer had to fill up and then flush
18 The Configuration File Trick
- Delta generalizes to many situations if you
  - parameterize the process with a file
  - minimize the file.
- Simon Goldsmith was instrumenting Java system binaries
  - during class-loading the JVM would seg-fault; nothing really comprehensible would happen
  - wrote a script to read a config file for which instrumented classes to put into the jar file
  - use delta to minimize the config file
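The trick can be illustrated with a small Python sketch. Everything here is hypothetical (the function name, the one-class-name-per-line config format, and the dictionaries of file paths); the slides do not show the actual script. The point is only that the config file becomes the thing delta minimizes: each line it manages to delete switches one class back to its uninstrumented version.

```python
def choose_class_files(config_text, instrumented, original):
    """Pick, per class, the instrumented or the original class file.

    config_text:  one class name per line (assumed format); the names
                  listed are the ones to take from `instrumented`.
    instrumented: dict mapping class name -> path of instrumented file.
    original:     dict mapping class name -> path of original file.
    Returns a dict mapping class name -> path to pack into the jar.
    """
    selected = {ln.strip() for ln in config_text.splitlines() if ln.strip()}
    chosen = {}
    for name, path in original.items():
        if name in selected and name in instrumented:
            chosen[name] = instrumented[name]
        else:
            chosen[name] = path
    return chosen
```

An interestingness test would then build the jar from the chosen files, run the JVM, and report whether it still crashes.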
19 Simulated Annealing
- Simulated Annealing
  - Large, non-convex sub-space
  - Gradient of goodness
  - Random local moves
    - likely to find another point in the sub-space
  - Moves parameterizable by a temperature.
- Some say the ability to sometimes get worse is essential
- I say locality, randomness, and temperature
20 Delta as Simulated Annealing
- space: files that pass your test
- goodness: smaller file is better
- local moves: chop out a chunk of file
  - note that we never get worse
  - so delta is greedy
- temperature: chunk size
  - we have an exponential annealing schedule, which is not unusual, says Wikipedia anyway.
21 Delta: surprisingly effective
- Especially given how ignorant and general it is
- Most ideas for improvements are how to make the local moves better at staying in the space
- These ideas generally require knowing what the file means.
- Important point: But note how well delta already does knowing nothing!
  - and topformflat only knows nesting and quotes!
22 Improvement: use knowledge of dependencies to improve moves
If you know the language semantics, reject moves
that would violate it, or only make moves that
would produce a legal file
decl
use
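One way to picture a dependency-aware move filter is the sketch below. It is an assumption-laden illustration rather than anything delta ships: the function name and the `decl_of_use` map (from the line index of a use to the line index of its declaration) are hypothetical, and a real tool would need a parser to build that map. A move is rejected if it would delete a declaration while leaving one of its uses in the file.

```python
def move_is_legal(removed, decl_of_use):
    """Return True if deleting the lines in `removed` keeps every
    surviving use paired with a surviving declaration.

    removed:     set of line indices the move would delete.
    decl_of_use: dict mapping the line index of a use to the line index
                 of the declaration it depends on (assumed to be given).
    """
    for use, decl in decl_of_use.items():
        if decl in removed and use not in removed:
            return False  # the use would be left without its declaration
    return True
```

Skipping illegal moves saves a run of the (possibly slow) interestingness test on a candidate that would fail to compile anyway.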
23 Fan Mail
- From Flash Sheridan
- This is just a quick thank-you note for Delta. ... it immediately reduced a ... bug file from 16K lines to ten (GCC bug 22604).
- Oddly enough, it initially found a different bug (22603), since I'd only specified "internal compiler error", not "segmentation fault".
24 Fan Mail, p.2
- From Flash Sheridan
- Delta has become even more valuable since my initial thank-you note.
- I'm not sure it's helped with all of the GCC bugs I've been filing... but I couldn't have filed most of them without Delta.
- Delta has always been able to find a radically smaller file, which I have been able to attach to my bug report.
25 Fan Mail, p.3
- From Richard Guenther
- delta is saving a lot of gcc developers life :) I would guess 1 of 3 bugs submitted to the gcc bugzilla get their testcase reduced using delta.
- ... a little bit more accurate would be to say we're using delta to reduce all testcases from the gcc bugzilla in case they get entered unreduced.
26 Delta: This simple dumb script is everywhere!
- One class devoted to it in both Berkeley and Stanford Software Engineering Courses
- Berkeley: "We've just assigned a delta-related homework to the students today"
- Stanford: "I gave them a homework assignment for CS295 using delta. Feedback was positive but unquantified."
- Why did it take so long to think of this simple thing?