Two techniques for programming by sketching (Stanford, November 2004)

1
Two techniques for programming by sketching (Stanford, November 2004)
  • Rastislav Bodik, David Mandelin, Armando
    Solar-Lezama, Lin Xu (UC Berkeley)
  • Rodric Rabbah (MIT)
  • Kemal Ebcioglu, Doug Kimelman, Vivek Sarkar (IBM)

2
Synthesis
  • Program synthesis
  • given a specification, synthesize a program
    meeting this spec
  • synthesis is the inverse of verification
  • most work is in reactive systems (Pnueli,
    Kupferman, ...)
  • Synthesis vs. compilation
  • synthesis involves a search for the desired
    program
  • Benefits
  • less coding, more correctness

3
Programming by sketching
  • Our approach
  • apply synthesis to software
  • sketching: the specification is partial
    (underspecified)

4
Two sketching techniques
  • Sketch
  • partial implementation, provided by programmer
  • Sketch resolution
  • completing the sketch into a full implementation
  • which one? (sketch completes into many
    implementations!)
  • StreamBit
  • behavioral spec + sketch → full implementation
  • Prospector
  • sketch → several full implementations
  • user selects implementation with desired behavior

5
StreamBit: Sketching high-performance
implementations of bitstream programs
  • Project lead: Armando Solar-Lezama

6
Bitstream Programs
  • Bitstream programs: a growing domain
  • crypto: DES, Serpent, Rijndael, ...
  • coding in general, NSA/BitTwiddle
  • Bitstream programs operate under strict
    constraints
  • performance is very important
  • up to 95% of server cycles spent in
    security-related processing
  • correctness is crucial
  • a subtle bug in a Blowfish implementation allowed
    over half the keys to be cracked in less than 10
    minutes

7
Example
  • Drop every third bit in the bit stream.
  • exhibits many features of complicated
    permutations
  • exponentially many choices
  • greedy choice is suboptimal
  • fast implementation can be sketched
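For reference, the naïve way to drop every third bit moves one bit per loop iteration, so it costs O(w) operations per w-bit word. The Java sketch below assumes bits are numbered from the least significant end and that positions 2, 5, 8, ... are the ones dropped; `DropThirdNaive` and `dropThird` are illustrative names, not StreamBit output.

```java
/** Naive "drop every third bit" on one 32-bit word (illustrative helper, not StreamBit output). */
final class DropThirdNaive {
    /**
     * Assumption: bit 0 is the least significant bit, and bits 2, 5, 8, ... are dropped.
     * Surviving bits are packed, in order, into the low bits of the result,
     * so a 32-bit input yields 22 output bits.
     */
    static int dropThird(int x) {
        int out = 0;
        int dst = 0;                     // next free output position
        for (int i = 0; i < 32; i++) {
            if (i % 3 == 2) continue;    // dropped bit
            int bit = (x >>> i) & 1;     // extract input bit i
            out |= bit << dst;           // place it at the next output position
            dst++;
        }
        return out;
    }

    public static void main(String[] args) {
        // 0b111: bits 0 and 1 survive, bit 2 is dropped
        System.out.println(Integer.toBinaryString(dropThird(0b111)));  // prints 11
    }
}
```

Every bit costs a shift, a mask and an OR, which is the baseline the sketched O(log w) implementation improves on.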

(Figure: FAST implementation, O(log w))
8
Full sketch (13 lines of code)
WSIZE = 16;
subsequence = Unroll[WSIZE](subsequence);
subsequence = PermutFactor[
    [shift(1:2 by 0), shift(17:18 by 0), shift(33:34 by 0)],
    [shift(1:16 by ?), shift(17:32 by ?), shift(33:48 by ?)]
  ](subsequence);
subsequence.subsequence_1 = DiagSplit[WSIZE](subsequence);
for (i = 0; i < 3; i++)
  subsequence.subsequence_1.filter(i) = PermutFactor[
      [shift(1:16 by 0 || 1)],
      [shift(1:16 by 0 || 2)],
      [shift(1:16 by 0 || 4)]
    ](subsequence.subsequence_1.filter(i));
Size: 13 lines
  • Compare with 100 lines of such FORTRAN code
    (from BitTwiddle):
      ...
      DATA MASKB2 /Z'FFC003FF000FFC00', Z'3FF000FFC003FF00',
     &             Z'0FFC003FF000FFC0', Z'03FF000FFC003FC0'
      ...
c     Compress 5-bit groups together
      TB = IAND(TB + ISHFT(TB, SKIPBC), MASKB2(J))
      TC = IAND(TC + ISHFT(TC, SKIPBC), MASKC2(J))
      ...
  • ...

9
What you gain
  • DropThird benchmark
  • Speedups over naïve code with a 14-line sketch
  • 32 bit on a Pentium IV: 83.8%
  • 64 bit on an Itanium II: 233%
  • DES benchmark
  • 32 bit on a Pentium IV with a 30-line sketch
  • 634% speedup over naïve
  • within 11% of hand-optimized libDES
  • 64 bit: IA64 and IBM SP2
  • we beat libDES by 8%

10
What is sketching
  • Key idea: separation of concerns
  • specify behavior without concern for performance
  • create implementation without concern for bugs
  • domain expert
  • writes a behavioral specification of her crypto
    algorithm
  • as clean as possible, no optimizations
  • performance expert
  • describes an efficient implementation of the
    clean algorithm
  • neither reimplements nor describes it in full
  • he only sketches an outline of the
    implementation; the compiler fills in the details
  • if the sketch is wrong, the compiler complains → no
    bugs can be introduced

11
Compilation strategy
  • A sketch overrides a naïve compiler
  • naïve compiler translates the clean algorithm
    into target code,
  • with a simple sequence of semantics-preserving
    transformations
  • (1) make all filters word-size (unroll and split)
  • (2) decompose word-size filters into machine
    instructions
  • sketch inserts a step into the naïve sequence
  • Ex.: the sketch decomposes a filter into a pipeline
    of filters
  • after the sketch is applied, the naïve compiler continues

12
The behavioral spec (StreamIt)
  • StreamIt
  • synchronous dataflow language
  • filters represented internally as matrices
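A small sketch of the matrix view, under an assumed convention: rows are output bits, columns are input bits, and a filter computes y = M x over GF(2) (AND for multiplication, XOR for addition). For a bit permutation or selection such as DropThird, every row has a single 1. StreamIt/StreamBit's internal representation may differ in detail; the class and method names here are hypothetical.

```java
import java.util.Arrays;

/** Bit filters as 0/1 matrices over GF(2) (hypothetical helper; rows = outputs, columns = inputs). */
final class BitMatrixFilter {
    /** Matrix of DropThird unrolled over `groups` groups of 3 input bits (assumes 3rd bit of each group dropped). */
    static int[][] dropThirdMatrix(int groups) {
        int[][] m = new int[2 * groups][3 * groups];
        int row = 0;
        for (int col = 0; col < 3 * groups; col++) {
            if (col % 3 == 2) continue;   // dropped input bit: no output row reads it
            m[row++][col] = 1;            // output bit `row` copies input bit `col`
        }
        return m;
    }

    /** y = M x over GF(2): each output bit is the XOR of the input bits its row selects. */
    static int[] apply(int[][] m, int[] x) {
        int[] y = new int[m.length];
        for (int r = 0; r < m.length; r++) {
            int acc = 0;
            for (int c = 0; c < x.length; c++) acc ^= m[r][c] & x[c];
            y[r] = acc;
        }
        return y;
    }

    public static void main(String[] args) {
        int[][] m = dropThirdMatrix(2);                            // 6 input bits -> 4 output bits
        int[] y = apply(m, new int[] {1, 0, 1, 1, 1, 0});
        System.out.println(Arrays.toString(y));                    // [1, 0, 1, 1]
    }
}
```

Using GF(2) rather than plain OR costs nothing for permutations and also covers XOR-based filters.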

13
Naïve compilation
  • Example: Drop Third Bit (word size W = 4
    bits)
  • Unroll filter
  • decompose into filters operating on W = 4 bits of
    input
  • decompose into filters producing W = 4 bits of
    output
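The slides do not spell out the unrolling rule; a natural assumption is that the filter is unrolled until both the bits it consumes and the bits it produces are whole words. DropThird consumes 3 bits and produces 2 per firing, so under that assumption the number of firings k is a small gcd/lcm computation. The names below are hypothetical.

```java
/** Unroll factor for a bit filter (assumes: unroll until input and output are whole W-bit words). */
final class UnrollFactor {
    static int gcd(int a, int b) { return b == 0 ? a : gcd(b, a % b); }
    static int lcm(int a, int b) { return a / gcd(a, b) * b; }

    /** Smallest k such that pop*k and push*k are both multiples of w. */
    static int firings(int pop, int push, int w) {
        // pop*k is a multiple of w exactly when k is a multiple of w / gcd(w, pop); same for push
        return lcm(w / gcd(w, pop), w / gcd(w, push));
    }

    public static void main(String[] args) {
        int w = 4, pop = 3, push = 2;   // DropThird: consumes 3 bits, produces 2 per firing
        int k = firings(pop, push, w);
        System.out.println(k + " firings: " + pop * k + " bits in, " + push * k + " bits out");
        // 4 firings: 12 bits in, 8 bits out
    }
}
```

For W = 16 the same rule gives 48 input bits, which is consistent with the 48-bit ranges (1:16, 17:32, 33:48) in the full DropThird sketch.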

14
Naïve compilation (cont.)
  • Make each filter correspond to one basic
    operation available in the hardware
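One straightforward way to map a word-size bit permutation onto hardware operations is to group the surviving bits by how far they move: each group costs one AND with a mask, one shift and one OR. The sketch below is our illustration of that style of lowering for DropThird on a 32-bit word (same assumptions as before: bit 0 is the LSB, bits 2, 5, 8, ... are dropped); it is not StreamBit's actual code generator.

```java
import java.util.Map;
import java.util.TreeMap;

/** Lowers word-size DropThird to AND/shift/OR groups (illustrative, not StreamBit's code generator). */
final class ShiftGroups {
    /** Output position of input bit i, or -1 if dropped. Assumes bit 0 = LSB, bits 2, 5, 8, ... dropped. */
    static int dest(int i) {
        return i % 3 == 2 ? -1 : i - i / 3;
    }

    /** Maps each right-shift distance to the mask of input bits that move by exactly that distance. */
    static TreeMap<Integer, Integer> groups() {
        TreeMap<Integer, Integer> g = new TreeMap<>();
        for (int i = 0; i < 32; i++) {
            int d = dest(i);
            if (d < 0) continue;                        // dropped bit: in no group
            g.merge(i - d, 1 << i, (a, b) -> a | b);    // add bit i to its group's mask
        }
        return g;
    }

    /** One AND, one shift and one OR per group. */
    static int apply(int x, TreeMap<Integer, Integer> g) {
        int out = 0;
        for (Map.Entry<Integer, Integer> e : g.entrySet()) {
            out |= (x & e.getValue()) >>> e.getKey();
        }
        return out;
    }

    public static void main(String[] args) {
        TreeMap<Integer, Integer> g = groups();
        System.out.println(g.size() + " groups");                  // 11 groups
        System.out.println(Integer.toHexString(apply(-1, g)));     // 3fffff
    }
}
```

The number of groups grows with the word size (11 distinct distances for 32 bits), whereas the sketched implementation needs only O(log w) stages.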

15
The Full Picture
(Figure: Task Description)
16
Decomposition without sketching
(Figure: bit matrices of F and of its stages F.F_1, F.F_2, F.F_3, captioned "specify FAST bit shifting algorithm without sketching")
  • User provides a high-level decomposition of F into
    F.F_i
  • System takes care of compiling each F.F_i
  • Correctness is guaranteed as long as
  • F.F_3 ∘ F.F_2 ∘ F.F_1 = F
  • Avoid spelling out the decomposition
  • Sketch it!
  • (some properties) ∘ (some properties) ∘ (some
    properties) = F
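The correctness condition can be checked mechanically: with y = F x (rows = outputs, columns = inputs, arithmetic over GF(2)), applying F.F_1, then F.F_2, then F.F_3 corresponds to the matrix product F3 · F2 · F1, which must equal F. The sketch below uses a hypothetical three-stage decomposition of DropThird (clear the dropped bits, permute the kept bits to the front, truncate); it is not the FAST decomposition shown in the figure, and all names are illustrative.

```java
import java.util.Arrays;

/** Checks a staged decomposition against the whole filter (hypothetical stages; rows = outputs). */
final class ComposeCheck {
    static boolean kept(int c) { return c % 3 != 2; }   // assumption: bits 2, 5, 8, ... are dropped
    static int packed(int c) { return c - c / 3; }      // output position of a kept bit

    /** F: DropThird on n input bits (n a multiple of 3). */
    static int[][] spec(int n) {
        int[][] f = new int[2 * n / 3][n];
        for (int c = 0; c < n; c++) if (kept(c)) f[packed(c)][c] = 1;
        return f;
    }

    /** Stage 1 (n x n): clear the dropped bits, leave the others in place. */
    static int[][] stage1(int n) {
        int[][] m = new int[n][n];
        for (int c = 0; c < n; c++) if (kept(c)) m[c][c] = 1;
        return m;
    }

    /** Stage 2 (n x n): permutation moving kept bits to the front, dropped slots to the back. */
    static int[][] stage2(int n) {
        int[][] m = new int[n][n];
        for (int c = 0; c < n; c++) m[kept(c) ? packed(c) : 2 * n / 3 + c / 3][c] = 1;
        return m;
    }

    /** Stage 3 (2n/3 x n): keep the first 2n/3 bits. */
    static int[][] stage3(int n) {
        int[][] m = new int[2 * n / 3][n];
        for (int r = 0; r < 2 * n / 3; r++) m[r][r] = 1;
        return m;
    }

    /** Matrix product over GF(2): (A B)[i][j] = XOR over k of (A[i][k] AND B[k][j]). */
    static int[][] mul(int[][] a, int[][] b) {
        int[][] p = new int[a.length][b[0].length];
        for (int i = 0; i < a.length; i++)
            for (int k = 0; k < b.length; k++)
                if (a[i][k] == 1)
                    for (int j = 0; j < b[0].length; j++) p[i][j] ^= b[k][j];
        return p;
    }

    public static void main(String[] args) {
        int n = 6;
        int[][] composed = mul(stage3(n), mul(stage2(n), stage1(n)));   // F3 * F2 * F1
        System.out.println(Arrays.deepEquals(composed, spec(n)));        // true
        int[][] swapped = mul(stage3(n), mul(stage1(n), stage2(n)));    // stages 1 and 2 swapped
        System.out.println(Arrays.deepEquals(swapped, spec(n)));         // false
    }
}
```

Swapping the first two stages clears positions that now hold kept bits, so the check rejects it; this is the kind of mistake the correctness condition rules out.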
17
Sketching another example
A permutation from the DES cipher (64 bits → 64 bits)
Problem: when implemented as a table lookup, the
table is very large
  • Idea: decompose into a pipeline of two
    permutations
  • provided by the programmer: an inexpensive
    permutation
  • automatically derived from the sketch: two
    identical permutations (to be implemented as one
    smaller table)
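For context, a common way to implement an arbitrary 64-bit permutation by table lookup is to slice the input into 8 bytes and keep one 256-entry table of 64-bit words per byte position, ORing the 8 lookups together (a bit permutation distributes over OR). That is 8 × 256 × 8 bytes = 16 KiB of tables. The slides do not say which table layout was used; the sketch below is a generic illustration, and a seeded random permutation stands in for the DES permutation, whose table is not reproduced here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Byte-sliced table lookup for a 64-bit bit permutation (generic illustration, not DES tables). */
final class TablePermutation {
    final int[] perm;                        // perm[i] = output position of input bit i
    final long[][] table = new long[8][256]; // table[b][v]: where byte value v at byte position b lands

    TablePermutation(int[] perm) {
        this.perm = perm;
        for (int b = 0; b < 8; b++)
            for (int v = 0; v < 256; v++)
                table[b][v] = slow(((long) v) << (8 * b));
    }

    /** Reference: move one bit at a time. */
    long slow(long x) {
        long out = 0;
        for (int i = 0; i < 64; i++)
            if (((x >>> i) & 1L) != 0) out |= 1L << perm[i];
        return out;
    }

    /** Table version: 8 lookups ORed together. */
    long fast(long x) {
        long out = 0;
        for (int b = 0; b < 8; b++) out |= table[b][(int) ((x >>> (8 * b)) & 0xFF)];
        return out;
    }

    /** Stand-in permutation (assumption: any fixed permutation; not the DES one). */
    static int[] randomPermutation(long seed) {
        List<Integer> p = new ArrayList<>();
        for (int i = 0; i < 64; i++) p.add(i);
        Collections.shuffle(p, new Random(seed));
        int[] a = new int[64];
        for (int i = 0; i < 64; i++) a[i] = p.get(i);
        return a;
    }

    public static void main(String[] args) {
        TablePermutation t = new TablePermutation(randomPermutation(42));
        long x = 0x0123456789ABCDEFL;
        System.out.println(t.fast(x) == t.slow(x));                      // true
        System.out.println(8 * 256 * Long.BYTES + " bytes of tables");   // 16384 bytes of tables
    }
}
```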

18
Sketching: how it works
  • Start with a sketch
  • Define x_{i,j} as the amount bit i will move on step
    j
  • Semantic equivalence imposes linear constraints
    on the x_{i,j}
  • Many of the constraints in the sketch also impose
    linear constraints on the x_{i,j}
  • Solving the linear constraints produces a space
    of possible solutions
  • Map the nonlinear constraints to this solution
    space
  • Search
SketchDecomp[
    [shift(1:32 by 0 || 1)],
    [shift(1:32 by 0 || 2)],
    [shift(1:32 by 0 || 4)],
    [shift(1:32 by 0 || 8)]
  ](Filter)
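For this four-stage sketch applied to DropThird on one 32-bit word, resolution is simple enough to do by hand. Assuming bit 0 is the LSB and bits 2, 5, 8, ... are dropped, semantic equivalence says kept bit i must move d_i = ⌊i/3⌋ positions in total, and the sketch restricts stage j to x_{i,j} ∈ {0, 2^j}. Since d_i ≤ 10 < 16, the only solution is the binary expansion of d_i. The mask computation below is our own illustration of that solution, not StreamBit's solver, which handles the general case with linear constraints plus search.

```java
/** DropThird on a 32-bit word in 4 shift stages (masks derived from x_{i,j} = bit j of d_i; illustrative). */
final class LogShifter {
    static final int W = 32, STAGES = 4;          // max distance 31/3 = 10 < 2^4
    static final int KEEP;                        // surviving bits (assumption: i % 3 != 2 survives)
    static final int[] MOVE = new int[STAGES];    // MOVE[j]: current positions of bits shifted by 2^j at stage j

    static {
        int keep = 0;
        for (int i = 0; i < W; i++) {
            if (i % 3 == 2) continue;
            keep |= 1 << i;
            int d = i / 3;                                      // total distance = dropped bits below i
            for (int j = 0; j < STAGES; j++) {
                int pos = i - (d & ((1 << j) - 1));             // position after stages 0..j-1
                if (((d >>> j) & 1) != 0) MOVE[j] |= 1 << pos;  // x_{i,j} = 2^j
            }
        }
        KEEP = keep;
    }

    /** O(log w) stages, each a mask, a shift and a merge. */
    static int dropThird(int x) {
        x &= KEEP;                                   // clear the dropped bits first
        for (int j = 0; j < STAGES; j++) {
            int m = MOVE[j];
            x = (x & ~m) | ((x & m) >>> (1 << j));   // shift(1:32 by 0 || 2^j)
        }
        return x;
    }

    /** Bit-at-a-time reference for comparison. */
    static int reference(int x) {
        int out = 0, dst = 0;
        for (int i = 0; i < W; i++)
            if (i % 3 != 2) out |= ((x >>> i) & 1) << dst++;
        return out;
    }

    public static void main(String[] args) {
        int x = 0xDEADBEEF;
        System.out.println(dropThird(x) == reference(x));   // true
    }
}
```

Running the stages from the smallest shift upward keeps the surviving bits in their original order after every stage, so a moving bit never lands on a bit that stays put; that is why the nonlinear (no-collision) constraints are satisfied here without any search.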

19
User Study (time to first solution)
20
User Study (developing a good implementation)
21
Implementing the fastest DES
  • How quickly can we match the fastest DES
    implementation?
  • 6 different implementations in 4 hours
  • includes all but one trick used in libDES
  • so fast partly because sketching avoids bugs

22
Concluding Remarks
  • StreamBit allows for
  • Task specification oblivious to performance
  • Implementation specification without bugs
  • Same idea may apply in other domains
  • If people currently resort to very low level
    coding
  • If some algebraic structure can be imposed on the
    task
  • It may be amenable to implementation sketching.

23
Mining Jungloids: Helping to Navigate the API
Jungle
  • Project lead: David Mandelin

24
A software reuse problem
  • big components: reusable [Lampson99]
  • OS, DBMS, browser
  • small components: challenging
  • flexibility: functionality cut finely, for fine
    control
  • size: in J2SE, 21,000 methods in 1000s of classes
  • cost to understand and use
  • one of three obstacles to reuse [Lampson99]
  • searching for information
  • nearly ¼ of developer time [metallect.com]
  • often give up reuse and reimplement

25
Example
  • programming task: parse a Java file into an AST

IFile file = ...;
ICompilationUnit cu = JavaCore.createCompilationUnitFrom(file);
ASTNode node = AST.parseCompilationUnit(cu, false);

IFile file  →  ASTNode node = ?
  • Why so hard to find? (productivity: 2 LOC/hour)
  • class member browsers? two unknown classes used
  • follow expected design? two levels of file
    handlers
  • grep? method returns a subclass

26
The moral?
  • type signatures
  • not very useful in finding the desired code
  • but once found, can be used to verify it
  • so why not search the existing code base?
  • somebody must have written these two lines
    before!
  • yes, but not in the same method
  • for software engineering reasons
  • or even in the same program
  • e.g. parse an editor buffer, not a file
  • still, sample code is useful, as we will see

27
Our goal
  • We want a programmer's search engine that
  • doesn't merely find example code
  • instead, it synthesizes the desired code
  • from two favorite sources
  • type signatures
  • existing code examples

28
More precisely
  • mining input
  • the API (type signatures from class definitions)
  • corpus of API client code
  • search input
  • a query specifying the programmer's intent
  • output
  • synthesized code
  • ready for insertion into user program
  • give several candidates (user selects one)

29
Formulating the code search problem
  • We must decide on the structure of
  • input query (coding intent)
  • easy to express for the user
  • yet specific enough for the search engine
  • output code (synthesized code)
  • easy to understand and validate (by reading docs)
  • code should complete the program under
    construction

30
The query: from have to want
  • 1st observation
  • Reuse problems can usually be described with a
    have-one-want-one query q(h,w)
  • What code will transform a (single) object of
    (static) type h into a (single) object of
    (static) type w?
  • Our parsing example: q = (IFile, ASTNode)

IFile file = ...;
ICompilationUnit cu = JavaCore.createCompilationUnitFrom(file);
ASTNode node = AST.parseCompilationUnit(cu, false);
31
Output code: jungloid
  • 2nd observation
  • most queries can be answered with a jungloid
  • jungloid
  • a unary expression composed of unary expressions
  • field access
  • call to an instance method with 0 arguments
  • call to a static method or constructor with 1
    argument
  • conversion to supertype
  • (multi-argument methods decomposed into unary
    ones)

IFile file = ...;
ICompilationUnit cu = JavaCore.createCompilationUnitFrom(file);
ASTNode node = AST.parseCompilationUnit(cu, false);
32
Coverage
  • An informal experiment
  • using 16 coding headaches, collected by us
  • Can the query express interesting problems?
  • yes, for 12 out of 16 coding problems
  • Can queries be answered with a jungloid?
  • yes, all 12 queries answered with jungloids
  • 9 of them are simple jungloids
  • 3 of them use some multi-argument methods

33
Prospector: our prototype
  • Eclipse plugin
  • integrated with code completion assist
  • var.CTRL+SPACE
  • the want type w
  • WantType x = CTRL+SPACE
  • a set H of have types obtained from context
  • local variables, arguments, class fields, globals
  • issue queries (h,w) for each h ∈ H

(Figure: field, foo(), bar(int len, Object key))
34
Type signature graph
  • Any path from h to w is an (h,w)-jungloid
  • 3rd observation
  • the desired jungloid is typically among the k
    shortest paths (k = 5)

(Figure: type signature graph for the parsing example, with nodes IFile, IResource, IContainer, IJavaElement, IClassFile, ICompilationUnit, CompilationUnit and ASTNode, and edges labeled supertype, getResource(), getParent(), JavaCore.createClassFileFrom(), JavaCore.createCompilationUnitFrom() and AST.parseCompilationUnit())
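To illustrate the "k shortest paths" observation, here is a small search over a type signature graph. The edges are our reading of the figure labels for the parsing example and should be treated as assumptions; the search simply enumerates simple paths and keeps the k shortest, which is fine for a toy graph but is not Prospector's ranking (which the slides do not detail). All class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy (h,w)-jungloid search over a type signature graph (illustrative, not Prospector's algorithm). */
final class JungloidSearch {
    /** Edge: one elementary jungloid step from type `from` to type `to`. */
    static final class Edge {
        final String from, to, label;
        Edge(String from, String to, String label) { this.from = from; this.to = to; this.label = label; }
    }

    private final Map<String, List<Edge>> out = new HashMap<>();

    void add(String from, String to, String label) {
        out.computeIfAbsent(from, k -> new ArrayList<>()).add(new Edge(from, to, label));
    }

    /** All simple paths from h to w, shortest first, truncated to k. */
    List<List<Edge>> kShortest(String h, String w, int k) {
        List<List<Edge>> found = new ArrayList<>();
        List<String> onPath = new ArrayList<>();
        onPath.add(h);
        dfs(h, w, new ArrayList<>(), onPath, found);
        found.sort((a, b) -> Integer.compare(a.size(), b.size()));   // stable: ties keep DFS order
        return new ArrayList<>(found.subList(0, Math.min(k, found.size())));
    }

    private void dfs(String at, String w, List<Edge> path, List<String> onPath, List<List<Edge>> found) {
        if (at.equals(w)) { found.add(new ArrayList<>(path)); return; }
        for (Edge e : out.getOrDefault(at, new ArrayList<>())) {
            if (onPath.contains(e.to)) continue;   // simple paths only (no cycles)
            path.add(e);
            onPath.add(e.to);
            dfs(e.to, w, path, onPath, found);
            path.remove(path.size() - 1);
            onPath.remove(onPath.size() - 1);
        }
    }

    static String render(List<Edge> p) {
        StringBuilder sb = new StringBuilder(p.isEmpty() ? "" : p.get(0).from);
        for (Edge e : p) sb.append(" -[").append(e.label).append("]-> ").append(e.to);
        return sb.toString();
    }

    /** Assumed edges, read off the figure labels. */
    static JungloidSearch parsingExample() {
        JungloidSearch g = new JungloidSearch();
        g.add("IFile", "ICompilationUnit", "JavaCore.createCompilationUnitFrom()");
        g.add("IFile", "IClassFile", "JavaCore.createClassFileFrom()");
        g.add("ICompilationUnit", "CompilationUnit", "AST.parseCompilationUnit()");
        g.add("IClassFile", "CompilationUnit", "AST.parseCompilationUnit()");
        g.add("CompilationUnit", "ASTNode", "supertype");
        g.add("IClassFile", "IJavaElement", "supertype");
        g.add("IJavaElement", "IResource", "getResource()");
        g.add("IResource", "IContainer", "getParent()");
        return g;
    }

    public static void main(String[] args) {
        for (List<Edge> p : parsingExample().kShortest("IFile", "ASTNode", 5))
            System.out.println(render(p));
    }
}
```

On this graph the query (IFile, ASTNode) yields two three-step candidates, one through ICompilationUnit and one through IClassFile; the user (or a better ranking) picks between them.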
35
Jungloids with downcasts
IDebugView debugger = ...;
Viewer viewer = debugger.getViewer();
IStructuredSelection sel = (IStructuredSelection) viewer.getSelection();
JavaInspectExpression expr = (JavaInspectExpression) sel.getFirstElement();
36
Our solution
  • Besides downcasts, this problem appears in
  • method arguments of type Object (only accept a
    JavaBean)
  • String objects (strings are highly polymorphic)
  • Potential solutions
  • parametric type inference, alias analysis
  • Our solution
  • mine a corpus of API uses for legal downcasts

37
Mining jungloids with downcasts
  • Ideally, only correct jungloids are synthesized
  • correct: it must be possible to write client
    code in which the jungloid's downcast succeeds,
    for at least one input
  • This ideal can be approximated (overview):
  • use a corpus of API client code
  • extract jungloids with downcasts
  • use them to extend the signature graph
  • In the limit, we meet the ideal
  • limit: an infinitely large, bug-free corpus
  • bug-free corpus
  • a weak requirement: jungloids in the corpus need
    only succeed for one input

38
Mining jungloids with downcasts (example)
protected IJavaObject getObjectContext() {
    IWorkbenchPage page = ...;
    IWorkbenchPart part = page.getActivePart();
    IDebugView view = (IDebugView) part.getAdapter();
    ISelection s = view.getViewer().getSelection();
    IStructuredSelection sel = (IStructuredSelection) s;
    Object selection = sel.getFirstElement();
    JavaInspectExpression exp = (JavaInspectExpression) selection;
    ...
}

(Figure: signature graph extended with the mined downcasts, using the refined types Viewer<IStructuredSelection<JavaInspectExpression>> and IStructuredSelection<JavaInspectExpression>; nodes IDebugView, Viewer, ISelection, IStructuredSelection, Object, JavaInspectExpression; edges getViewer(), getSelection(), getInput(), getFirstElement())


39
The jungloid mining algorithm (key idea)
  • When extracting jungloids, how to determine the
    necessary downcast context (i.e., jungloid
    suffix)?
  • w.x.a.(T)
  • s.y.a.(S)
  • What if the context is too short?
  • unsound: a query may synthesize a jungloid that
    will throw an exception in any client code
  • What if the context is too long?
  • incomplete: a query may fail to synthesize the
    jungloid even though the corpus contains the
    necessary example
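To make the trade-off concrete, here is a deliberately simple heuristic, which is our assumption and not necessarily Prospector's mining algorithm: for each mined example, keep the shortest suffix that no example with a different cast target also ends with. On the slide's two examples it keeps x.a and y.a, which avoids the unsound context a while staying shorter than the full jungloids. Names and data structures are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy downcast-context selection (hypothetical heuristic, not Prospector's algorithm). */
final class DowncastContext {
    /** A mined example: elementary steps (e.g. method names) followed by a downcast to `cast`. */
    static final class Example {
        final List<String> steps;
        final String cast;
        Example(List<String> steps, String cast) { this.steps = steps; this.cast = cast; }
    }

    static boolean endsWith(List<String> steps, List<String> suffix) {
        int off = steps.size() - suffix.size();
        return off >= 0 && steps.subList(off, steps.size()).equals(suffix);
    }

    /** Shortest suffix of e.steps not shared by any example that casts to a different type. */
    static List<String> context(Example e, List<Example> corpus) {
        for (int len = 1; len <= e.steps.size(); len++) {
            List<String> suffix = e.steps.subList(e.steps.size() - len, e.steps.size());
            boolean ambiguous = false;
            for (Example o : corpus) {
                if (!o.cast.equals(e.cast) && endsWith(o.steps, suffix)) { ambiguous = true; break; }
            }
            if (!ambiguous) return new ArrayList<>(suffix);
        }
        return new ArrayList<>(e.steps);   // no distinguishing suffix: keep the whole example
    }

    public static void main(String[] args) {
        List<Example> corpus = Arrays.asList(
            new Example(Arrays.asList("w", "x", "a"), "T"),
            new Example(Arrays.asList("s", "y", "a"), "S"));
        for (Example e : corpus)
            System.out.println(String.join(".", context(e, corpus)) + ".(" + e.cast + ")");
        // x.a.(T)
        // y.a.(S)
    }
}
```

With only one of the two examples in the corpus, the heuristic would keep just a, which shows how the chosen context depends on what else the corpus contains.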

40
Experiment 1 (ranking test)
  • hypothesis
  • to find the desired code, the user needs to
    examine only the top 5 candidate jungloids
  • result
  • desired code in the top 5: 17 out of 20 times (in
    the top 1: 10 out of 20)
  • remaining three fixable
  • methodology
  • used 20 real-world coding tasks
  • collected from FAQs, newsgroups, our practice,
    emails to us

41
Experiment 2 (user study)
  • hypothesis
  • Prospector-equipped programmers are better at
    solving API programming problems than other
    programmers
  • methodology
  • 6 problems, each user did 3 with Prospector and 3
    without
  • problems formulated not to reveal the query
  • sample problem
  • The new Java channel IO system represents files
    as channels. How do I get a channel that
    represents a String filename?
  • somewhat sparse data (10 users), surveys still
    trickling in
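The slides do not show the expected answer to the sample problem. One jungloid that solves it with the standard library is String → FileInputStream → FileChannel: a one-argument constructor followed by a zero-argument instance method, matching the jungloid definition given earlier. The helper name and the temporary-file setup below are our own.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** One jungloid answering the sample problem (helper name is hypothetical). */
final class ChannelFromFilename {
    /** String -> FileInputStream (1-arg constructor) -> FileChannel (0-arg method). */
    static FileChannel open(String filename) throws IOException {
        return new FileInputStream(filename).getChannel();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("jungloid", ".txt");
        Files.write(tmp, "hello".getBytes(StandardCharsets.UTF_8));
        try (FileChannel ch = open(tmp.toString())) {   // closing the channel also closes the stream
            System.out.println(ch.size());              // 5
        } finally {
            Files.delete(tmp);
        }
    }
}
```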

42
Experiment 2 (user study). Results.
  • Prospector shortens development time
  • some problems solved only by Prospector users
  • when both groups succeeded, Prospector users were
    30% faster
  • Prospector may help enable reuse
  • non-Prospector users sometimes reimplemented
  • Prospector may help avoid making mistakes
  • mistakes when adapting code found on the internet
    into their own code
  • We expect even stronger results on a more robust
    infrastructure.

43
Future work
  • Coding task we currently can't handle
  • print an AST as Java source
  • The limitation
  • task is expressible as a (have,want) query
  • but result is not a jungloid (as defined in this
    talk)

Correct:
ASTNode ast = ...;
ASTFlattener visitor = new ASTFlattener();
ast.accept(visitor);
String result = visitor.getResult();

Incorrect variant (getResult() is called on a different visitor):
ASTNode ast = ...;
ASTFlattener visitor = new ASTFlattener();
ASTFlattener visitor2 = new ASTFlattener();
ast.accept(visitor);
String result = visitor2.getResult();
44
Try it!
  • Web demo
  • snobol.cs.berkeley.edu
  • Eclipse plugin
  • coming soon
  • want to alpha test it?

45
Conclusion
  • Sketch
  • partial implementation, provided by programmer
  • StreamBit
  • behavioral spec + sketch → full implementation
  • goal: total correctness and performance
  • Prospector
  • sketch → several full implementations
  • user selects implementation with desired behavior
  • goal: software reuse

46
Backup slides
47
Programming with jungloids
NodeItem node = (NodeItem) getModel();
GraphNodeFigure f = (GraphNodeFigure) getFigure();
f.getLabel().setName(node.getNodeName());
Rectangle r = new Rectangle(node.x, node.y, -1, -1);
GraphicalEditPart parent = (GraphicalEditPart) getParent();
parent.setLayoutConstraint(this, f, r);