Title: Compiling Haskell to Java
1. Compiling Haskell to Java
- CPSC 510 Final Presentation
- Eric Parsons
2. Part 1
3. The Core Language
- Haskell source is preprocessed by GHC and compiled to Core
- Core is a simplification of Haskell
- Core is the typed λ-calculus plus:
  - data constructors
  - let(rec) expressions
  - case expressions
4. The Core Language (cont'd)
- Pattern matching and if-then-else expressions are translated to case expressions
- Convenience syntax like do notation, records, and list comprehensions is desugared
- Type classes are replaced by explicit dictionary passing
- Special symbols are z-encoded (e.g. >= becomes zgze, z becomes zz)
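As a rough illustration of z-encoding, the sketch below escapes a handful of symbol characters. The class name `ZEncodeSketch` and its `encode` method are hypothetical, and the table is only a small subset of GHC's encoding; characters not in the table pass through unchanged.

```java
import java.util.Map;

final class ZEncodeSketch {
    // Partial table (assumption: only a few of the characters GHC escapes).
    private static final Map<Character, String> TABLE = Map.of(
        'z', "zz",
        'Z', "ZZ",
        '>', "zg",
        '<', "zl",
        '=', "ze",
        '+', "zp",
        '-', "zm",
        '.', "zi",
        '$', "zd");

    // Replaces each character found in TABLE by its escape sequence.
    static String encode(String name) {
        StringBuilder out = new StringBuilder();
        for (char ch : name.toCharArray()) {
            String escaped = TABLE.get(ch);
            out.append(escaped != null ? escaped : String.valueOf(ch));
        }
        return out.toString();
    }
}
```

With this table, `ZEncodeSketch.encode(">=")` yields `zgze`, matching the example on the slide.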
5. The Core Language (cont'd)
- Original type class:
    class Foo a where
      foo :: a -> String
      bar :: a -> String

    instance Foo Bool where
      foo True  = "true"
      foo False = "false"
      bar _     = "Boolean"

    twoFoo :: Foo a => a -> String
    twoFoo x = foo x ++ foo x

    test :: String
    test = twoFoo True
- Conversion to dictionary passing:
    data Foo a
      = Foo (a -> String) (a -> String)

    foo_Bool True  = "true"
    foo_Bool False = "false"
    bar_Bool _     = "Boolean"

    twoFoo :: Foo a -> a -> String
    twoFoo (Foo foo bar) x
      = foo x ++ foo x

    test :: String
    test
      = twoFoo (Foo foo_Bool bar_Bool) True
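Since the target language is Java, it may help to see the same idea there: a dictionary is just an object holding one function per class method. The sketch below is an illustration only. `DictionarySketch`, `FooDict`, and `FOO_BOOL` are hypothetical names, it assumes twoFoo concatenates the two results of foo, and it uses ordinary strict Java functions rather than the closures described later.

```java
import java.util.function.Function;

final class DictionarySketch {
    // The dictionary: one field per method of the Foo class.
    static final class FooDict<A> {
        final Function<A, String> foo;
        final Function<A, String> bar;

        FooDict(Function<A, String> foo, Function<A, String> bar) {
            this.foo = foo;
            this.bar = bar;
        }
    }

    // The Bool instance becomes a concrete dictionary value.
    static final FooDict<Boolean> FOO_BOOL =
        new FooDict<>(b -> b ? "true" : "false", b -> "Boolean");

    // The class constraint becomes an explicit dictionary argument.
    static <A> String twoFoo(FooDict<A> dict, A x) {
        return dict.foo.apply(x) + dict.foo.apply(x);
    }

    // The call site passes the dictionary for the type it uses.
    static String test() {
        return twoFoo(FOO_BOOL, true);
    }
}
```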
6. Core Runtime Model
- Core has lazy evaluation semantics
- Delayed evaluations (thunks), partial applications, and data constructors are uniformly represented at runtime using closures
- A closure represents an instance of a function (or constructor) applied to some values
- A closure stores two things:
  - a pointer to executable code, representing its definition
  - the values for the variables
7. Core Runtime Model (cont'd)
8. Core Runtime Model (cont'd)
- A closure is similar to an activation frame on a stack in traditional languages, except:
  - It is heap-allocated
  - It can be treated like an ordinary value and passed around
  - It can be evaluated at a later time, or not at all
- Closures point to other closures, and form a graph at runtime
9. Core Runtime Model (cont'd)
- A thunk is a special type of closure that is reducible
- Executing (entering, forcing) a thunk eventually returns a value in weak head normal form (WHNF), a constructor closure that cannot be further reduced
- The thunk is overwritten with the WHNF so that if its value is needed again, it does not need to be recalculated
10. Mapping Core to Closures
- Lambda abstractions define new closure types
- Application allocates a new closure instance
- case expressions:
  - Force evaluation of a closure
  - Select the correct execution path based on the result of evaluation (i.e. pattern matching)
11. Continuations
- Before evaluating a closure, a continuation is pushed on a stack
- A continuation is a vector of code pointers, one for each alternative in the case expression
- When a closure for a constructor is evaluated, it:
  - Pops the top continuation from the stack
  - Selects the corresponding pointer from the continuation and jumps
- Similar to continuation-passing style: code never returns, it just keeps jumping to the next segment
12. Part 2
13. Closures in Java
- Closures in Java are naturally represented as objects
- Code pointers can be implemented using virtual methods
- There are two basic approaches:
  - A generic class that can represent any type of closure
  - A separate class for each type of closure
14. Approach 1 - Generic Class
    public abstract class CodePointer {
        public abstract void run(Closure[] values);
    }

    public class Closure {
        public CodePointer cp;     // pointer to the code
        public Closure[] values;   // values for the variables
    }

    public class map extends CodePointer {
        public void run(Closure[] values) {
            // Code for map
        }
    }

    public static CodePointer map_cp = new map();
    ...
    Closure c = new Closure();
    c.cp = map_cp;
15. Approach 1 - Generic Class
- Advantages:
  - Can be overwritten with its WHNF once evaluated
- Disadvantages:
  - Extra memory for array and code pointer
  - All primitives must be boxed
  - Extra indirection to run code and access values
  - Array access is relatively slow in Java (each access is bounds-checked)
16. Approach 2 - Specialized Classes
    public abstract class Closure {
        public abstract void enter();
    }

    public class map extends Closure {
        public Closure f;
        public Closure xs;

        public map(Closure f, Closure xs) {
            this.f = f;
            this.xs = xs;
        }

        public void enter() {
            // Code for map
        }
    }

    ...
    Closure c = new map(f_value, xs_value);
    ...
    c.enter();
17. Approach 2 - Specialized Classes
- Advantages:
  - Direct access to values, which can be unboxed
  - No explicit code pointer, so less memory overhead
  - Less indirection to run code
- Disadvantages:
  - Can't overwrite one object with another, so a different strategy is needed for updates
18. Control Flow
- The functional approach presents problems:
  - Java has no tail-call optimization
  - Java has no long jumps, only local jumps (within the same method)
- Using the native Java stack and normal function call semantics would result in stack overflow
- An explicit stack (Java array) is needed for continuations
19. Avoiding Stack Overflow: Solution 1
- Each type of closure is tagged with a unique integer value
- The entire program is compiled into a single method
- The method has a loop that inspects the destination tag, and jumps to the appropriate code using a switch statement
20. Avoiding Stack Overflow: Solution 1
    public static void main(String[] args) {
        int next = ...;   // tag of the program's entry point
        while (true) {
            switch (next) {
                case MAP:
                    // Code for map
                    // Set next to desired jump destination
                    break;
                case FILTER:
                    ...
                case ...:
                default:
                    // Exit loop (a plain break would only leave the switch)
                    return;
            }
        }
    }
21. Avoiding Stack Overflow: Solution 1
- Advantages:
  - Faster than method invocation
- Disadvantages:
  - Separate compilation is not possible, since the whole program must be compiled at once
  - The JVM limits the bytecode of a single method to 64 KB, so this is not viable for large programs
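To make the single-method dispatch loop concrete, here is a minimal runnable sketch. It is not generated compiler output: the tags `IS_EVEN`, `IS_ODD`, and `DONE`, and the class `SwitchLoopSketch`, are invented for illustration. Two mutually tail-recursive "functions" are merged into one method, and each "call" just sets the next tag, so the Java stack never grows.

```java
final class SwitchLoopSketch {
    // Hypothetical tags, one per code block.
    static final int IS_EVEN = 0;
    static final int IS_ODD = 1;
    static final int DONE = 2;

    // Decides whether a non-negative n is even.
    static boolean isEven(int n) {
        int next = IS_EVEN;       // tag of the code block to run next
        boolean result = false;
        while (true) {
            switch (next) {
                case IS_EVEN:
                    if (n == 0) { result = true; next = DONE; }
                    else { n--; next = IS_ODD; }     // "tail call" to isOdd
                    break;
                case IS_ODD:
                    if (n == 0) { result = false; next = DONE; }
                    else { n--; next = IS_EVEN; }    // "tail call" to isEven
                    break;
                default:
                    return result;                   // exit the loop
            }
        }
    }
}
```

A direct recursive version of the same pair of functions would need one Java stack frame per step, so it could overflow the stack for large inputs.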
22. Avoiding Stack Overflow: Solution 2
- On execution, closures return the next closure they want to call
- Function calls are bounced off a tiny interpretive loop called a trampoline:
    Closure next = initial_closure;
    while (next != null)
        next = next.enter();
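The trampoline loop can be exercised with a small self-contained sketch. The names `TrampolineSketch`, `SumDown`, and the static `total` field (standing in for a result register) are assumptions made for this example; only the shape of the loop comes from the slide.

```java
final class TrampolineSketch {
    abstract static class Closure {
        // Runs this closure's code and returns the closure to enter next,
        // or null when there is nothing left to do.
        abstract Closure enter();
    }

    static long total = 0;   // hypothetical result register

    // Adds n + (n-1) + ... + 1 to total, one step per enter() call.
    static final class SumDown extends Closure {
        final int n;

        SumDown(int n) { this.n = n; }

        @Override
        Closure enter() {
            if (n == 0) return null;
            total += n;
            return new SumDown(n - 1);   // returned to the loop, not invoked here
        }
    }

    // The trampoline: each enter() returns before the next one starts.
    static void run(Closure initial) {
        Closure next = initial;
        while (next != null) {
            next = next.enter();
        }
    }
}
```

Because every `enter()` returns to the loop before the next closure runs, a million chained calls use constant Java stack depth.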
23. Handling Function Passing and Partial Application
- Example:
    map f xs = case xs of
      Nil -> Nil
      Cons y ys ->
        let
          c1 = f y
          c2 = map f ys
        in Cons c1 c2
- The closure c2 is of a known type (map)
- The closure c1 is unknown:
  - What code should execute when it is entered?
  - Does f take 1 parameter, or n parameters with n-1 already applied?
24. Handling Function Passing and Partial Application
- Solution: each closure definition has 2 implementations:
  - One that reads all its values from the closure (saturated)
  - One that reads all its values from an argument stack (curried)
- Unknown functions and partially applied functions are handled by special apply closures
25. Apply Closures
- Apply closures store:
  - The closure that is being applied
  - The values that it is being applied to
- In the previous example, c1 is an apply closure that stores f (the closure being applied) and y (the value it is applied to)
26. Apply Closures
- When an apply closure is entered, it:
  - Pushes the argument(s) onto the argument stack
  - Enters the applied closure
- Apply closures can be chained
- The last closure in the chain is a curried closure; when it executes, all of its arguments are on the stack
- Arguments must be pushed in reverse order (last argument pushed first)
- In reality, separate stacks are used to handle closures vs. primitive arguments (int, double, etc.)
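The following sketch shows one way apply closures and a curried entry point could fit together. It is a simplification under stated assumptions: `ApplySketch`, `SubCurried`, and the `result` register are hypothetical, arguments are plain boxed values rather than closures, and a single argument stack is used instead of separate stacks for closures and primitives.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class ApplySketch {
    abstract static class Closure {
        abstract Closure enter();   // returns the next closure to enter, or null
    }

    static final Deque<Object> argStack = new ArrayDeque<>();
    static Object result;           // hypothetical return register

    // Curried entry for "sub a b = a - b": reads both arguments from the stack.
    static final class SubCurried extends Closure {
        @Override
        Closure enter() {
            int a = (Integer) argStack.pop();
            int b = (Integer) argStack.pop();
            result = a - b;
            return null;
        }
    }

    // Stores the closure being applied and the values it is applied to.
    static final class Apply extends Closure {
        final Closure function;
        final Object[] args;

        Apply(Closure function, Object... args) {
            this.function = function;
            this.args = args;
        }

        @Override
        Closure enter() {
            // Push in reverse order so the first argument ends up on top.
            for (int i = args.length - 1; i >= 0; i--) {
                argStack.push(args[i]);
            }
            return function;        // enter the applied closure next
        }
    }

    // Trampoline that drives the chain and returns the result register.
    static Object run(Closure initial) {
        Closure next = initial;
        while (next != null) {
            next = next.enter();
        }
        return result;
    }
}
```

For example, `new Apply(new Apply(new SubCurried(), 10), 3)` models `(sub 10) 3`: the outer apply pushes 3, the inner one pushes 10, and the curried code then finds both arguments on the stack in the right order.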
27. Pattern Matching and Continuations
- When a case expression forces evaluation of a closure, it will eventually be reduced to a WHNF (data constructor)
- Execution must resume at the correct branch depending on which constructor is returned
28. Pattern Matching and Continuations
- Simple approach: tag each constructor closure with an integer, and select the correct branch based on the tag
- Disadvantages:
  - Extra memory overhead needed for tags
  - Extra branching instructions
  - Not in line with GHC's tagless approach
29. Pattern Matching and Continuations
- Better approach: since each closure has associated code, the code for constructors can directly jump to the right alternative using continuations
- No tags or branch instructions necessary
30. Continuations
- Continuations can be represented in two different ways:
  - As an array of code pointer objects
    - Each different constructor jumps to a different offset
    - This approach has several disadvantages, as already mentioned (slow array access, extra indirection, etc.)
  - As an object with a method for each alternative
    - Each constructor calls the appropriate method
    - The method directly implements the continuation code
31. Continuations
- Example:
    // Base class for all continuations of type List.
    // Continuations extend this class and override each method appropriately.
    // c is the actual instance of the constructor.
    public class List_Continuation {
        public Closure nil_branch(Closure c) {
            return default_branch(c);
        }

        public Closure cons_branch(Closure c) {
            return default_branch(c);
        }

        public Closure default_branch(Closure c) {
            throw new RuntimeException("pattern match failure");
        }
    }
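To show how constructors and such a continuation class might interact, here is a hedged, self-contained sketch. The wrapper class `ContinuationSketch`, the `Nil` and `Cons` closures, the `HeadOrEmpty` continuation, and the `result` register are all assumptions for illustration; only the shape of `List_Continuation` follows the slide.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class ContinuationSketch {
    abstract static class Closure {
        abstract Closure enter();
    }

    // One method per alternative, mirroring List_Continuation.
    static class List_Continuation {
        Closure nil_branch(Closure c) { return default_branch(c); }
        Closure cons_branch(Closure c) { return default_branch(c); }
        Closure default_branch(Closure c) {
            throw new RuntimeException("pattern match failure");
        }
    }

    static final Deque<List_Continuation> contStack = new ArrayDeque<>();
    static String result;   // hypothetical return register

    // Constructor closures pop the top continuation and call their own branch.
    static final class Nil extends Closure {
        @Override
        Closure enter() { return contStack.pop().nil_branch(this); }
    }

    static final class Cons extends Closure {
        final String head;
        final Closure tail;

        Cons(String head, Closure tail) { this.head = head; this.tail = tail; }

        @Override
        Closure enter() { return contStack.pop().cons_branch(this); }
    }

    // Continuation for: case xs of { Nil -> "empty"; Cons y ys -> y }
    static final class HeadOrEmpty extends List_Continuation {
        @Override
        Closure nil_branch(Closure c) { result = "empty"; return null; }

        @Override
        Closure cons_branch(Closure c) { result = ((Cons) c).head; return null; }
    }

    static String headOrEmpty(Closure xs) {
        contStack.push(new HeadOrEmpty());   // push the continuation, then force xs
        Closure next = xs;
        while (next != null) {
            next = next.enter();
        }
        return result;
    }
}
```

No tag is inspected anywhere: the virtual call on the constructor object selects the branch.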
32. Updates
- Overwriting is not possible in the given implementation
- Simple workaround: allocate an extra field in each thunk that stores the WHNF once it is entered
- The code for the thunk checks this field, and if it is non-null it simply returns that instead of executing the closure body again
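The extra-field workaround might look roughly like the sketch below. It simplifies the runtime considerably: `enter()` returns the value directly instead of going through continuations, and `UpdateSketch`, `SquareThunk`, and the `evaluations` counter are hypothetical names used only to make the effect observable.

```java
final class UpdateSketch {
    static int evaluations = 0;   // counts how often a closure body runs

    abstract static class Thunk {
        private Object whnf;      // extra field: null until the thunk is first entered

        // The closure body.
        protected abstract Object compute();

        final Object enter() {
            if (whnf != null) {
                return whnf;      // already evaluated: skip the body
            }
            whnf = compute();
            return whnf;
        }
    }

    // A thunk for x * x.
    static final class SquareThunk extends Thunk {
        private Integer x;

        SquareThunk(int x) { this.x = x; }

        @Override
        protected Object compute() {
            evaluations++;
            int value = x * x;
            x = null;             // drop the captured value so it can be collected
            return value;
        }
    }
}
```

Entering the same `SquareThunk` twice runs its body once; the second call only reads the cached field.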
33. Updates
- Disadvantages:
  - Extra memory overhead for every thunk to store the WHNF
  - The thunk itself cannot be garbage-collected
  - The original values in the thunk must be set to null to avoid memory leaks
  - Update overhead is incurred even if the thunk is evaluated only once (in typical programs, this will be the case the majority of the time)
34. Updates
- A better solution: perform updates only when necessary by doing update analysis
- Two ways to do update analysis:
  - Closure creation time (used by GHC)
  - Closure sharing time (used by me)
35. Updates
- GHC uses static analysis to determine which types of thunks don't need updating
- Unfortunately, there are few situations where this can be detected statically, especially with separate compilation
- Most thunks created at runtime will be updated, even if they are evaluated only once
36. Sharing Analysis
- Observation: a closure can only be entered more than once if it is pointed to by more than one closure in the runtime graph
- This can only happen if the closure is shared, i.e. if it is referenced more than once in the body of a function
- Using these observations, we can do sharing analysis instead
37. Sharing Analysis
- All closures are created as if they will only be entered once (no updating code)
- If a closure is shared, then an updatable version of it is created before it is referenced by anything else
- This is done by attaching special code to the original closure at runtime: an indirection node and an updater object
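One possible reading of this scheme is sketched below. It is not the presenter's actual implementation: the names `SharingSketch`, `Indirection`, and `share` are invented, the updater is folded into the indirection node for brevity, and `enter()` returns values directly rather than using continuations. Plain closures carry no result field, and only a closure known to be shared is wrapped.

```java
final class SharingSketch {
    static int evaluations = 0;   // counts how often a closure body runs

    abstract static class Closure {
        abstract Object enter();
    }

    // Created as single-entry: no result field, no update code.
    static final class SquareThunk extends Closure {
        final int x;

        SquareThunk(int x) { this.x = x; }

        @Override
        Object enter() {
            evaluations++;
            return x * x;
        }
    }

    // Indirection node wrapped around a closure that turned out to be shared.
    static final class Indirection extends Closure {
        private Closure target;   // the original closure, until it is evaluated
        private Object whnf;      // filled in by the update

        Indirection(Closure target) { this.target = target; }

        @Override
        Object enter() {
            if (target != null) {
                whnf = target.enter();   // the update step
                target = null;           // the original closure can now be collected
            }
            return whnf;
        }
    }

    // What generated code might call before a closure gets a second reference.
    static Closure share(Closure c) {
        return new Indirection(c);
    }
}
```

An unshared `SquareThunk` pays nothing for updates, while one passed through `share` is evaluated at most once no matter how many references enter it.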