Compiling Haskell to Java

Provided by: Eri6179

1
Compiling Haskell to Java
  • CPSC 510 Final Presentation
  • Eric Parsons

2
Part 1
  • The Core Language

3
The Core Language
  • Haskell source is preprocessed by GHC and
    compiled to Core
  • Core is a simplification of Haskell
  • Core is the typed λ-calculus plus:
  • data constructors
  • let(rec) expressions
  • case expressions

4
The Core Language Cont'd
  • Pattern matching and if then else expressions
    are translated to case expressions
  • Convenience syntax like do notation, records, and
    list comprehensions are desugared
  • Type classes are replaced by explicit dictionary
    passing
  • Special symbols are z-encoded (e.g. >= becomes
    zgze, z becomes zz)
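
The z-encoding step can be illustrated with a small sketch. The class and method names below are hypothetical, and the table covers only a handful of symbols; a complete encoder would need an entry for every symbol that is not legal in the target identifier syntax.

```java
// Minimal z-encoder sketch. The symbol table is a partial subset chosen for illustration.
final class ZEncodeSketch {
    static String zEncode(String name) {
        StringBuilder out = new StringBuilder();
        for (char ch : name.toCharArray()) {
            switch (ch) {
                case 'z': out.append("zz"); break; // the escape character itself is doubled
                case 'Z': out.append("ZZ"); break;
                case '>': out.append("zg"); break;
                case '<': out.append("zl"); break;
                case '=': out.append("ze"); break;
                case '+': out.append("zp"); break;
                case '-': out.append("zm"); break;
                case '$': out.append("zd"); break;
                default:  out.append(ch);          // anything not in this partial table passes through
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(zEncode(">="));  // prints zgze
        System.out.println(zEncode("zip")); // prints zzip
    }
}
```

Doubling z keeps the encoding reversible: a lone z in the output always starts an escape sequence.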

5
The Core Language Cont'd
  • Original type class:
  • class Foo a where
  •   foo :: a -> String
  •   bar :: a -> String
  • instance Foo Bool where
  •   foo True = "true"
  •   foo False = "false"
  •   bar _ = "Boolean"
  • twoFoo :: Foo a => a -> String
  • twoFoo x = foo x ++ foo x
  • test :: String
  • test = twoFoo True
  • Conversion to dictionary passing:
  • data Foo a =
  •   Foo (a -> String) (a -> String)
  • foo_Bool True = "true"
  • foo_Bool False = "false"
  • bar_Bool _ = "Boolean"
  • twoFoo :: Foo a -> a -> String
  • twoFoo (Foo foo bar) x =
  •   foo x ++ foo x
  • test :: String
  • test =
  •   twoFoo (Foo foo_Bool bar_Bool) True
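
For readers more comfortable with Java, the same dictionary-passing idea can be sketched as follows. This is only an illustration: Method, FooDict and DictionaryPassingSketch are hypothetical names, not classes produced by the compiler described here.

```java
// One-method interface standing in for a function of type a -> String.
interface Method<A> {
    String apply(A x);
}

// The dictionary: one field per method of the type class Foo.
final class FooDict<A> {
    final Method<A> foo;
    final Method<A> bar;

    FooDict(Method<A> foo, Method<A> bar) {
        this.foo = foo;
        this.bar = bar;
    }
}

final class DictionaryPassingSketch {
    // "instance Foo Bool" becomes an ordinary value.
    static final FooDict<Boolean> FOO_BOOL =
        new FooDict<Boolean>(b -> b ? "true" : "false", b -> "Boolean");

    // The class constraint "Foo a =>" becomes an explicit first argument.
    static <A> String twoFoo(FooDict<A> dict, A x) {
        return dict.foo.apply(x) + dict.foo.apply(x);
    }

    public static void main(String[] args) {
        System.out.println(twoFoo(FOO_BOOL, true)); // prints truetrue
    }
}
```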

6
Core Runtime Model
  • Core has lazy evaluation semantics
  • Delayed evaluations (thunks), partial application
    and data constructors are uniformly represented
    at runtime using closures
  • A closure represents an instance of a function
    (or constructor) applied to some values
  • A closure stores two things
  • a pointer to executable code, representing its
    definition
  • the values for the variables

7
Core Runtime Model Cont'd
  • Example: map f [1,2,3]

8
Core Runtime Model Cont'd
  • A closure is similar to an activation frame on a
    stack in traditional languages, except:
  • It's heap-allocated
  • It can be treated like an ordinary value and
    passed around
  • It can be evaluated at a later time, or not at
    all
  • Closures point to other closures, and form a
    graph at runtime

9
Core Runtime Model Cont'd
  • A thunk is a special type of closure that is
    reducible
  • Executing (entering, forcing) a thunk eventually
    returns a WHNF (constructor closure) that cannot
    be further reduced
  • The thunk is overwritten with the WHNF so that if
    its value is needed again, it doesn't need to be
    recalculated

10
Mapping Core to Closures
  • Lambda abstractions define new closure types
  • Application allocates a new closure instance
  • case expressions
  • Force evaluation of a closure
  • Select the correct execution path based on the
    result of evaluation (i.e. pattern matching)

11
Continuations
  • Before evaluating a closure, a continuation is
    pushed on a stack
  • A continuation is a vector of code pointers, one
    for each alternative in the case expression
  • When a closure for a constructor is evaluated,
    it
  • Pops the top continuation from the stack
  • Selects the corresponding pointer from the
    continuation and jumps
  • Similar to continuation passing style -- code
    never returns, it just keeps jumping to the
    next segment
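
The push, pop and select steps above can be sketched as follows. All names are hypothetical, and the sketch uses ordinary Java method calls where the slides speak of jumps, so it illustrates the data flow only, not the control-flow strategy.

```java
// A code pointer: one entry of a continuation vector.
interface Code {
    void run();
}

final class ContinuationVectorSketch {
    static final int NIL = 0, CONS = 1;            // index of each alternative of a list type
    static final Code[][] stack = new Code[64][];  // explicit continuation stack
    static int sp = 0;
    static String result;

    // What a constructor closure does when evaluated:
    // pop the top continuation and run the entry for its alternative.
    static void returnConstructor(int alternative) {
        Code[] continuation = stack[--sp];
        continuation[alternative].run();
    }

    // case xs of { Nil -> "empty"; Cons _ _ -> "non-empty" }
    static void describe(int constructorOfXs) {
        stack[sp++] = new Code[] {
            () -> result = "empty",       // code for the Nil alternative
            () -> result = "non-empty"    // code for the Cons alternative
        };
        returnConstructor(constructorOfXs); // stands in for entering the closure xs
    }

    public static void main(String[] args) {
        describe(CONS);
        System.out.println(result); // prints non-empty
    }
}
```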

12
Part 2
  • Mapping Core to Java

13
Closures in Java
  • Closures in Java are naturally represented as
    objects
  • Code pointers can be implemented using virtual
    methods
  • There are two basic approaches
  • A generic class that can represent any type of
    closure
  • A separate class for each type of closure

14
Approach 1 - Generic Class
  • public abstract class CodePointer {
  •   public abstract void run(Closure[] values);
  • }
  • public class Closure {
  •   public CodePointer cp;     // code for this closure
  •   public Closure[] values;   // values for its variables
  • }
  • public class map extends CodePointer {
  •   public static CodePointer map_cp = new map();
  •   public void run(Closure[] values) {
  •     // Code for map
  •   }
  • }
  • ...
  • Closure c = new Closure();
  • c.cp = map.map_cp;

15
Approach 1 - Generic Class
  • Advantages
  • Can be overwritten with its WHNF once evaluated
  • Disadvantages
  • Extra memory for array and code pointer
  • All primitives must be boxed
  • Extra indirection to run code and access values
  • Array access is slow in Java
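
The main advantage, overwriting a closure with its WHNF, can be shown with a small runnable sketch. It departs from the slide code in two ways that are assumptions of this example: the code pointer receives the closure itself rather than just its values, and the values are a plain Object[] holding boxed integers.

```java
abstract class CodePtr {
    abstract void run(GenericClosure self);
}

final class GenericClosure {
    CodePtr cp;        // code pointer
    Object[] values;   // boxed values

    GenericClosure(CodePtr cp, Object... values) {
        this.cp = cp;
        this.values = values;
    }
}

final class GenericClosureSketch {
    static int evaluations = 0;

    // Code for an evaluated integer: already in WHNF, nothing to do.
    static final CodePtr INT_VALUE = new CodePtr() {
        void run(GenericClosure self) { }
    };

    // Code for the thunk "a + b": compute, then overwrite the closure in place.
    static final CodePtr ADD = new CodePtr() {
        void run(GenericClosure self) {
            evaluations++;
            int sum = (Integer) self.values[0] + (Integer) self.values[1];
            self.cp = INT_VALUE;                 // same object, new code pointer
            self.values = new Object[] { sum };  // operands dropped, result kept
        }
    };

    static int force(GenericClosure c) {
        c.cp.run(c);
        return (Integer) c.values[0];
    }

    public static void main(String[] args) {
        GenericClosure c = new GenericClosure(ADD, 2, 3);
        System.out.println(force(c) + " " + force(c) + " " + evaluations); // prints 5 5 1
    }
}
```

Every reference to the closure sees the updated value because the object identity never changes; only its two fields do.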

16
Approach 2 - Specialized Classes
  • public abstract class Closure {
  •   public abstract void enter();
  • }
  • public class map extends Closure {
  •   public Closure f;
  •   public Closure xs;
  •   public map(Closure f, Closure xs) { this.f = f; this.xs = xs; }
  •   public void enter() {
  •     // Code for map
  •   }
  • }
  • ...
  • Closure c = new map(f_value, xs_value);
  • ...
  • c.enter();

17
Approach 2 - Specialized Classes
  • Advantages
  • Direct access to values, which can be unboxed
  • No explicit code pointer, so less memory overhead
  • Less indirection to run code
  • Disadvantages
  • Can't overwrite one object with another, so a
    different strategy is needed for updates

18
Control Flow
  • Functional approach presents problems
  • Java has no tail-call optimization
  • Java has no long jumps, only local jumps (within
    the same method)
  • Using the native Java stack and normal function
    call semantics would result in stack overflow
  • An explicit stack (Java array) is needed for
    continuations

19
Avoiding Stack Overflow: Solution 1
  • Each type of closure is tagged with a unique
    integer value
  • The entire program is compiled into a single method
  • The method has a loop that inspects the
    destination tag, and jumps to the appropriate
    code using a switch statement

20
Avoiding Stack Overflow: Solution 1
  • public static void main(String[] args) {
  •   int next = ENTRY;  // ENTRY: assumed tag of the program's entry point
  •   while (true) {
  •     switch (next) {
  •       case MAP:
  •         // Code for map
  •         // Set next to desired jump destination
  •         break;
  •       case FILTER:
  •         ...
  •       case ...:
  •       default:
  •         return;  // Exit loop
  •     }
  •   }
  • }
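
A complete, runnable instance of this loop might look like the sketch below. The tags and the summing program are made up for illustration; the point is that each "call" is just an assignment to next, so the Java stack never grows.

```java
final class SwitchLoopSketch {
    // Hypothetical tags, one per code block.
    static final int COUNT_DOWN = 0, ADD = 1, HALT = 2;

    // Sums 1..n using only jumps between tagged code blocks.
    static long sumTo(int n) {
        long acc = 0;
        int next = COUNT_DOWN;
        while (true) {
            switch (next) {
                case COUNT_DOWN:
                    next = (n == 0) ? HALT : ADD;  // choose the jump destination
                    break;
                case ADD:
                    acc += n;
                    n--;
                    next = COUNT_DOWN;             // "tail call" back to the first block
                    break;
                default:
                    return acc;                    // exit the loop
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(sumTo(1_000_000)); // prints 500000500000
    }
}
```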

21
Avoiding Stack Overflow: Solution 1
  • Advantages
  • Faster than method invocation
  • Disadvantages
  • Separate compilation not possible since whole
    program must be compiled at once
  • Java imposes a limit of 64K on the size of a single
    method, so this is not viable for large programs

22
Avoiding Stack Overflow: Solution 2
  • On execution, closures return the next closure
    they want to call
  • Function calls are bounced off a tiny
    interpretive loop called a trampoline
  • Closure next = initial_closure;
  • while (next != null)
  •   next = next.enter();
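
A self-contained version of the trampoline is sketched below. Bounce and CountDown are hypothetical names; CountDown plays the role of a tail-recursive function that would overflow the Java stack if it called itself directly a million times.

```java
abstract class Bounce {
    // Runs this closure's code and returns the next closure to run, or null to stop.
    abstract Bounce enter();
}

final class CountDown extends Bounce {
    final int n;
    final long acc;

    CountDown(int n, long acc) {
        this.n = n;
        this.acc = acc;
    }

    Bounce enter() {
        if (n == 0) {
            TrampolineSketch.result = acc;
            return null;                       // nothing left to call
        }
        return new CountDown(n - 1, acc + n);  // the tail call is returned, not invoked
    }
}

final class TrampolineSketch {
    static long result;

    static long run(Bounce initial) {
        Bounce next = initial;
        while (next != null) {
            next = next.enter();  // each call returns before the next one starts
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(new CountDown(1_000_000, 0))); // prints 500000500000
    }
}
```

Compared with the single-method approach, each closure keeps its own class, so separate compilation and the method size limit are no longer a problem, at the cost of one virtual call per bounce.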

23
Handling Function Passing and Partial Application
  • Example:
  • map f xs = case xs of
  •   Nil -> Nil
  •   Cons y ys ->
  •     let
  •       c1 = f y
  •       c2 = map f ys
  •     in Cons c1 c2
  • The closure c2 is of a known type (map)
  • The closure c1 is unknown
  • What code should execute when it's entered?
  • Does f take 1 parameter, or n parameters with n-1
    already applied?

24
Handling Function Passing and Partial Application
  • Solution: each closure definition has two
    implementations
  • One that reads all its values from the closure
    (saturated)
  • One that reads all its values from an argument
    stack (curried)
  • Unknown functions and partially applied functions
    are handled by special apply closures

25
Apply Closures
  • Apply closures store
  • The closure that is being applied
  • The values that it is being applied to
  • In the previous example, c1 is an apply closure
    that stores f and the value y

26
Apply Closures
  • When an apply closure is entered, it
  • Pushes the argument(s) onto the argument stack
  • Enters the applied closure
  • Apply closures can be chained
  • The last closure in the chain is a curried
    closure; when it executes, all of its arguments
    are now on the stack
  • Arguments must be pushed in reverse order (last
    argument pushed first)
  • In reality, separate stacks are used to handle
    closures vs. primitive arguments (int, double,
    etc.)
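
The mechanism can be sketched with a single argument stack of boxed values. Everything here (Node, Apply, SUBTRACT, the fixed-size stack) is a hypothetical simplification; in particular, the separate stacks for primitives mentioned above are not modelled.

```java
final class ArgStackSketch {
    static final Object[] stack = new Object[32];  // argument stack
    static int sp = 0;
    static Object result;

    static void push(Object v) { stack[sp++] = v; }
    static Object pop() { return stack[--sp]; }

    interface Node {
        void enter();
    }

    // Curried entry point of a two-argument function: both arguments come from the stack.
    static final Node SUBTRACT = () -> {
        int a = (Integer) pop();  // first argument is on top
        int b = (Integer) pop();
        result = a - b;
    };

    // Apply closure: the closure being applied plus the values it is applied to.
    static final class Apply implements Node {
        final Node function;
        final Object[] args;

        Apply(Node function, Object... args) {
            this.function = function;
            this.args = args;
        }

        public void enter() {
            for (int i = args.length - 1; i >= 0; i--) {
                push(args[i]);     // last argument pushed first
            }
            function.enter();      // may be another Apply, or the curried code
        }
    }

    public static void main(String[] args) {
        // (subtract 10) 3, written as a chain of two apply closures
        new Apply(new Apply(SUBTRACT, 10), 3).enter();
        System.out.println(result); // prints 7
    }
}
```

The outer apply closure pushes 3, the inner one pushes 10 on top of it, and only then does the curried code run with both arguments in the expected order.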

27
Pattern Matching and Continuations
  • When a case expression forces evaluation of a
    closure, it will eventually be reduced to a WHNF
    (data constructor)
  • Execution must resume at the correct branch
    depending on which constructor is returned

28
Pattern Matching and Continuations
  • Simple approach: tag each constructor closure
    with an integer, and select the correct branch
    based on the tag
  • Disadvantages
  • Extra memory overhead needed for tags
  • Extra branching instructions
  • Not in line with GHC's tagless approach

29
Pattern Matching and Continuations
  • Better approach: since each closure has
    associated code, the code for constructors can
    directly jump to the right alternative using
    continuations
  • No tags or branch instructions necessary

30
Continuations
  • Continuations can be represented in two different
    ways
  • As an array of code pointer objects
  • Each different constructor jumps to a different
    offset
  • This approach has several disadvantages, as
    already mentioned (slow array access, extra
    indirection, etc)
  • As an object with a method for each alternative
  • Each constructor calls the appropriate method
  • The method directly implements the continuation
    code

31
Continuations
  • Example:
  • // Base class for all continuations of type List.
  • // Continuations extend this class and override
  • // each method appropriately.
  • // c is the actual instance of the constructor.
  • public class List_Continuation {
  •   public Closure nil_branch(Closure c) {
  •     return default_branch(c);
  •   }
  •   public Closure cons_branch(Closure c) {
  •     return default_branch(c);
  •   }
  •   public Closure default_branch(Closure c) {
  •     throw new RuntimeException("pattern match failure");
  •   }
  • }
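
To show how such a base class might be used, the sketch below adds constructor closures and one concrete continuation. Machine, Nil, Cons and DescribeContinuation are hypothetical names, and the continuation stack is a plain array; only List_Continuation follows the slide.

```java
abstract class Closure {
    abstract Closure enter();  // returns the next closure to run, or null to stop
}

class List_Continuation {
    Closure nil_branch(Closure c)  { return default_branch(c); }
    Closure cons_branch(Closure c) { return default_branch(c); }
    Closure default_branch(Closure c) {
        throw new RuntimeException("pattern match failure");
    }
}

// Each constructor pops the top continuation and calls the method for its own alternative.
final class Nil extends Closure {
    Closure enter() { return Machine.stack[--Machine.sp].nil_branch(this); }
}

final class Cons extends Closure {
    final Closure head, tail;
    Cons(Closure head, Closure tail) { this.head = head; this.tail = tail; }
    Closure enter() { return Machine.stack[--Machine.sp].cons_branch(this); }
}

// case xs of { Nil -> "empty"; Cons _ _ -> "non-empty" }
final class DescribeContinuation extends List_Continuation {
    @Override Closure nil_branch(Closure c)  { Machine.result = "empty"; return null; }
    @Override Closure cons_branch(Closure c) { Machine.result = "non-empty"; return null; }
}

final class Machine {
    static final List_Continuation[] stack = new List_Continuation[64];
    static int sp = 0;
    static String result;

    static String describe(Closure xs) {
        stack[sp++] = new DescribeContinuation();  // push before forcing xs
        Closure next = xs;
        while (next != null) {
            next = next.enter();                   // trampoline
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(describe(new Cons(new Nil(), new Nil()))); // prints non-empty
    }
}
```

No tag is inspected anywhere: the choice of branch is made by which method the constructor's own code calls.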

32
Updates
  • Overwriting is not possible in the given implementation
  • Simple workaround: allocate an extra field in
    each thunk that stores the WHNF once it is
    entered
  • The code for the thunk checks this field, and if
    it is non-null it simply returns that instead of
    executing the closure body again

33
Updates
  • Disadvantages
  • Extra memory overhead for every thunk to store
    the WHNF
  • The thunk itself cannot be garbage-collected
  • The original values in the thunk must be set to
    null to avoid memory leaks
  • Update overhead is incurred even if the thunk is
    evaluated only once (in typical programs, this
    will be the case the majority of the time)
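
The workaround and its costs can be seen in a short sketch. The class names are hypothetical and the WHNF is restricted to an integer for brevity.

```java
final class IntWhnf {
    final int value;
    IntWhnf(int value) { this.value = value; }
}

abstract class UpdatableThunk {
    private IntWhnf whnf;  // the extra field: null until the thunk has been entered

    protected abstract IntWhnf evaluate();           // the closure body
    protected abstract void clearCapturedValues();   // drop references after evaluation

    final IntWhnf enter() {
        if (whnf != null) {
            return whnf;          // already evaluated: skip the body
        }
        whnf = evaluate();
        clearCapturedValues();    // avoid keeping the original values alive
        return whnf;
    }
}

final class SquareThunk extends UpdatableThunk {
    static int evaluations = 0;
    private IntWhnf x;            // captured value

    SquareThunk(IntWhnf x) { this.x = x; }

    protected IntWhnf evaluate() {
        evaluations++;
        return new IntWhnf(x.value * x.value);
    }

    protected void clearCapturedValues() { x = null; }

    public static void main(String[] args) {
        SquareThunk t = new SquareThunk(new IntWhnf(7));
        System.out.println(t.enter().value + " " + t.enter().value + " " + evaluations); // prints 49 49 1
    }
}
```

The null check and the extra field are paid by every thunk, including the many that are entered only once.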

34
Updates
  • A better solution: perform updates only when
    necessary by doing update analysis
  • Two ways to do update analysis
  • Closure creation time (used by GHC)
  • Closure sharing time (used by me)

35
Updates
  • GHC uses static analysis to determine which types
    of thunks don't need updating
  • Unfortunately, there are few situations where
    this can be detected statically, especially with
    separate compilation
  • Most thunks created at runtime will be updated,
    even if they are evaluated only once

36
Sharing Analysis
  • Observation: a closure can only be entered more
    than once if it is pointed to by more than one
    closure in the runtime graph
  • This can only happen if the closure is shared,
    i.e. if it is referenced more than once in the
    body of a function
  • Using these observations, we can do sharing
    analysis instead

37
Sharing Analysis
  • All closures are created as if they will only be
    entered once (no updating code)
  • If a closure is shared, then an updateable
    version of it is created before it is referenced
    by anything else
  • This is done by attaching special code to the
    original closure at runtime -- an indirection
    node and an updater object
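
One possible reading of this scheme is sketched below. It is an assumption-heavy illustration: Suspension and Indirection are hypothetical names, the updater's work is folded into the indirection node rather than kept as a separate object, and the decision to wrap is hard-coded where a compiler would emit it for variables referenced more than once.

```java
// A single-entry closure: no update code at all.
interface Suspension {
    int force();
}

// Indirection node: wraps a closure at the moment it becomes shared.
final class Indirection implements Suspension {
    private Suspension target;  // original closure, dropped after the first force
    private int value;
    private boolean evaluated;

    Indirection(Suspension target) { this.target = target; }

    public int force() {
        if (!evaluated) {
            value = target.force();
            evaluated = true;   // the update step
            target = null;
        }
        return value;
    }
}

final class SharingSketch {
    static int evaluations = 0;

    static Suspension expensive() {
        return () -> { evaluations++; return 6 * 7; };
    }

    // let x = expensive in x + x : x is referenced twice, so it is wrapped first
    static int shared() {
        Suspension x = new Indirection(expensive());
        return x.force() + x.force();
    }

    // let x = expensive in x + 1 : referenced once, so no wrapper and no update cost
    static int unshared() {
        return expensive().force() + 1;
    }

    public static void main(String[] args) {
        System.out.println(shared() + " " + evaluations); // prints 84 1
    }
}
```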