1
Compiling Haskell to Java

CPSC 510 Final Presentation
Eric Parsons

2
Part 1

The Core Language

3
The Core Language

Haskell source is preprocessed by GHC and
compiled to Core
Core is a simplification of Haskell
Core is the typed λ-calculus plus
data constructors
let(rec) expressions
case expressions

4
The Core Language Cont'd

Pattern matching and if then else expressions
are translated to case expressions
Convenience syntax like do notation, records, and
list comprehensions are desugared
Type classes are replaced by explicit dictionary
passing
Special symbols are z-encoded (e.g. >= becomes
zgze, z becomes zz)
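
The z-encoding rule can be sketched in Java as follows. This is only a minimal illustration: the class name ZEncodeSketch and the method encode are hypothetical, and the table is deliberately partial (GHC's real encoder covers many more characters).

```java
import java.util.Map;

// Hypothetical sketch: z-encodes a small subset of operator characters.
final class ZEncodeSketch {
    // Partial table only; GHC's real encoder handles many more characters.
    private static final Map<Character, String> TABLE = Map.of(
        'z', "zz",
        'Z', "ZZ",
        '>', "zg",
        '<', "zl",
        '=', "ze",
        '+', "zp"
    );

    static String encode(String name) {
        StringBuilder out = new StringBuilder();
        for (char ch : name.toCharArray()) {
            // Characters without a table entry are copied through unchanged.
            out.append(TABLE.getOrDefault(ch, String.valueOf(ch)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode(">="));  // zgze
        System.out.println(encode("zip")); // zzip
    }
}
```

Escaping z itself as zz is what keeps the encoding reversible: every z in the output starts an escape sequence.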

5
The Core Language Cont'd

Original Type Class
class Foo a where
  foo :: a -> String
  bar :: a -> String
instance Foo Bool where
  foo True = "true"
  foo False = "false"
  bar _ = "Boolean"
twoFoo :: Foo a => a -> String
twoFoo x = foo x ++ foo x
test :: String
test = twoFoo True

Conversion to Dictionary Passing
data Foo a =
  Foo (a -> String) (a -> String)
foo_Bool True = "true"
foo_Bool False = "false"
bar_Bool _ = "Boolean"
twoFoo :: Foo a -> a -> String
twoFoo (Foo foo bar) x =
  foo x ++ foo x
test :: String
test =
  twoFoo (Foo foo_Bool bar_Bool) True
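
For readers more at home in Java, the same dictionary-passing idea can be sketched as an ordinary object holding one function per class method. The names DictionaryPassingSketch, FooDict and FOO_BOOL are hypothetical, and the sketch assumes twoFoo concatenates the two results.

```java
import java.util.function.Function;

// Hypothetical sketch of dictionary passing: the type class becomes a record
// of functions, and the class constraint becomes an extra explicit argument.
final class DictionaryPassingSketch {
    // Dictionary for class Foo: one field per class method.
    static final class FooDict<A> {
        final Function<A, String> foo;
        final Function<A, String> bar;

        FooDict(Function<A, String> foo, Function<A, String> bar) {
            this.foo = foo;
            this.bar = bar;
        }
    }

    // Corresponds to "instance Foo Bool".
    static final FooDict<Boolean> FOO_BOOL = new FooDict<Boolean>(
        b -> b ? "true" : "false",
        b -> "Boolean"
    );

    // The dictionary is passed explicitly alongside the value.
    static <A> String twoFoo(FooDict<A> dict, A x) {
        return dict.foo.apply(x) + dict.foo.apply(x);
    }

    public static void main(String[] args) {
        System.out.println(twoFoo(FOO_BOOL, true)); // truetrue
    }
}
```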

6
Core Runtime Model

Core has lazy evaluation semantics
Delayed evaluations (thunks), partial application
and data constructors are uniformly represented
at runtime using closures
A closure represents an instance of a function
(or constructor) applied to some values
A closure stores two things
a pointer to executable code, representing its
definition
the values for the variables

7
Core Runtime Model Cont'd

Example: map f [1,2,3]

8
Core Runtime Model Cont'd

A closure is similar to an activation frame on a
stack in traditional languages, except:
It's heap-allocated
It can be treated like an ordinary value and
passed around
It can be evaluated at a later time, or not at
all
Closures point to other closures, and form a
graph at runtime

9
Core Runtime Model Cont'd

A thunk is a special type of closure that is
reducible
Executing (entering, forcing) a thunk eventually
returns a WHNF (constructor closure) that cannot
be further reduced
The thunk is overwritten with the WHNF so that if
its value is needed again, it doesn't need to be
recalculated

10
Mapping Core to Closures

Lambda abstractions define new closure types
Application allocates a new closure instance
case expressions:
Force evaluation of a closure
Select the correct execution path based on the
result of evaluation (i.e. pattern matching)

11
Continuations

Before evaluating a closure, a continuation is
pushed on a stack
A continuation is a vector of code pointers, one
for each alternative in the case expression
When a closure for a constructor is evaluated, it:
Pops the top continuation from the stack
Selects the corresponding pointer from the
continuation and jumps
Similar to continuation passing style -- code
never returns, it just keeps jumping to the
next segment

12
Part 2

Mapping Core to Java

13
Closures in Java

Closures in Java are naturally represented as
objects
Code pointers can be implemented using virtual
methods
There are two basic approaches
A generic class that can represent any type of
closure
A separate class for each type of closure

14
Approach 1 - Generic Class

public abstract class CodePointer {
    public abstract void run(Closure[] values);
}
public class Closure {
    public CodePointer cp;
    public Closure[] values;
}
public class map extends CodePointer {
    public void run(Closure[] values) {
        // Code for map
    }
    public static CodePointer map_cp = new map();
}
...
Closure c = new Closure();
c.cp = map.map_cp;

15
Approach 1 - Generic Class

Advantages
Can be overwritten with its WHNF once evaluated

Disadvantages
Extra memory for array and code pointer
All primitives must be boxed
Extra indirection to run code and access values
Array access is slow in Java

16
Approach 2 - Specialized Classes

public abstract class Closure {
    public abstract void enter();
}
public class map extends Closure {
    public Closure f;
    public Closure xs;
    public map(Closure f, Closure xs) {
        this.f = f;
        this.xs = xs;
    }
    public void enter() {
        // Code for map
    }
}
...
Closure c = new map(f_value, xs_value);
...
c.enter();

17
Approach 2 - Specialized Classes

Advantages
Direct access to values, which can be unboxed
No explicit code pointer, so less memory overhead
Less indirection to run code

Disadvantages
Can't overwrite one object with another, so a
different strategy is needed for updates

18
Control Flow

Functional approach presents problems
Java has no tail-call optimization
Java has no long jumps, only local jumps (within
the same method)
Using the native Java stack and normal function
call semantics would result in stack overflow
An explicit stack (Java array) is needed for
continuations

19
Avoiding Stack Overflow: Solution 1

Each type of closure is tagged with a unique
integer value
Entire Program is compiled into a single method
The method has a loop that inspects the
destination tag, and jumps to the appropriate
code using a switch statement

20
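Avoiding Stack Overflow: Solution 1

public static void main(String[] args) {
    int next = ...; // tag of the first closure to run
    while (true) {
        switch (next) {
            case MAP:
                // Code for map
                // Set next to desired jump destination
                break;
            case FILTER:
                ...
            case ...
            default:
                // Exit loop (a plain break would only leave the switch)
                return;
        }
    }
}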
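
A small runnable instance of this dispatch loop is sketched below. It is an illustration only: the even/odd functions, the tag constants and the class name SwitchLoopSketch are hypothetical stand-ins for compiled closures. Two mutually tail-recursive functions share one method, and each "call" is just an assignment to next.

```java
// Hypothetical sketch: two mutually tail-recursive "functions" compiled into
// one dispatch loop. The tags and the even/odd example are illustrative only.
final class SwitchLoopSketch {
    static final int IS_EVEN = 0;
    static final int IS_ODD = 1;
    static final int EXIT = 2;

    static boolean isEven(long n) {
        boolean result = false;
        int next = IS_EVEN;
        while (true) {
            switch (next) {
                case IS_EVEN:
                    // isEven 0 = True; isEven n = isOdd (n - 1)
                    if (n == 0) { result = true; next = EXIT; }
                    else { n--; next = IS_ODD; }
                    break;
                case IS_ODD:
                    // isOdd 0 = False; isOdd n = isEven (n - 1)
                    if (n == 0) { result = false; next = EXIT; }
                    else { n--; next = IS_EVEN; }
                    break;
                default:
                    return result; // exit the loop
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(isEven(10));        // true
        System.out.println(isEven(1_000_001)); // false
    }
}
```

Because no Java method call happens per step, the Java stack stays flat regardless of how many jumps are taken.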

21
Avoiding Stack Overflow: Solution 1

Advantages
Faster than method invocation
Disadvantages
Separate compilation not possible since whole
program must be compiled at once
Java imposes a limit of 64 KB on the bytecode size
of a single method, so this is not viable for large programs

22
Avoiding Stack Overflow: Solution 2

On execution, closures return the next closure
they want to call
Function calls are bounced off a tiny
interpretive loop called a trampoline
Closure next = initial_closure;
while (next != null)
    next = next.enter();
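
A self-contained version of the trampoline might look like the sketch below. The CountDown closure and the class name TrampolineSketch are hypothetical; the point is only that a million chained "calls" run at constant Java stack depth.

```java
// Hypothetical sketch of the trampoline; names are illustrative.
final class TrampolineSketch {
    // A closure returns the next closure to run, or null when finished.
    interface Closure {
        Closure enter();
    }

    // Adds n, n-1, ..., 1 to a running total held in a shared cell.
    static final class CountDown implements Closure {
        final long n;
        final long[] total;

        CountDown(long n, long[] total) {
            this.n = n;
            this.total = total;
        }

        @Override
        public Closure enter() {
            if (n == 0) {
                return null; // nothing left to call
            }
            total[0] += n;
            return new CountDown(n - 1, total); // "tail call" via the trampoline
        }
    }

    static long sumTo(long n) {
        long[] total = {0};
        Closure next = new CountDown(n, total);
        while (next != null) {
            next = next.enter();
        }
        return total[0];
    }

    public static void main(String[] args) {
        System.out.println(sumTo(1_000_000)); // 500000500000
    }
}
```

Written as direct recursion, the same computation would need a million nested Java frames, which typically overflows the default stack.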

23
Handling Function Passing and Partial Application

Example
map f xs = case xs of
  Nil -> Nil
  Cons y ys ->
    let
      c1 = f y
      c2 = map f ys
    in Cons c1 c2
The closure c2 is of a known type (map)
The closure c1 is unknown
What code should execute when it's entered?
Does f take 1 parameter, or n parameters with n-1
already applied?

24
Handling Function Passing and Partial Application

Solution: each closure definition has two
implementations
One that reads all its values from the closure
(saturated)
One that reads all its values from an argument
stack (curried)
Unknown functions and partially applied functions
are handled by special apply closures

25
Apply Closures

Apply closures store
The closure that is being applied
The values that it is being applied to
In the previous example, c1 is an apply closure that stores f (the closure being applied) and y (the value it is applied to)

26
Apply Closures

When an apply closure is entered, it
Pushes the argument(s) onto the argument stack
Enters the applied closure
Apply closures can be chained
The last closure in the chain is a curried
closure; when it executes, all of its arguments
are now on the stack
Arguments must be pushed in reverse order (last
argument pushed first)
In reality, separate stacks are used to handle
closures vs. primitive arguments (int, double,
etc.)
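
The chaining and push order described above can be sketched as follows. This is a simplified illustration under stated assumptions: ApplySketch, Sub, Apply, ARGS and run are hypothetical names, all arguments are boxed onto a single stack (rather than the separate closure and primitive stacks mentioned above), and results are passed through a static field.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of apply closures feeding a curried entry point.
final class ApplySketch {
    static final Deque<Object> ARGS = new ArrayDeque<>(); // argument stack
    static Object result;

    interface Closure {
        Closure enter(); // returns the next closure to run, or null
    }

    // Curried entry for a two-argument function: reads both arguments from
    // the stack. Subtraction is used so that argument order is observable.
    static final class Sub implements Closure {
        @Override
        public Closure enter() {
            int a = (Integer) ARGS.pop(); // first argument is on top
            int b = (Integer) ARGS.pop();
            result = a - b;
            return null;
        }
    }

    // Stores the closure being applied and the values it is applied to.
    static final class Apply implements Closure {
        final Closure fun;
        final Object[] args;

        Apply(Closure fun, Object... args) {
            this.fun = fun;
            this.args = args;
        }

        @Override
        public Closure enter() {
            for (int i = args.length - 1; i >= 0; i--) {
                ARGS.push(args[i]); // last argument pushed first
            }
            return fun; // enter the applied closure next
        }
    }

    static Object run(Closure c) {
        while (c != null) {
            c = c.enter();
        }
        return result;
    }

    public static void main(String[] args) {
        Closure partial = new Apply(new Sub(), 10); // sub 10
        Closure full = new Apply(partial, 3);       // (sub 10) 3
        System.out.println(run(full));              // 7
    }
}
```

Entering the outer apply pushes 3, entering the inner one pushes 10 on top of it, so Sub finds its arguments in source order.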

27
Pattern Matching and Continuations

When a case expression forces evaluation of a
closure, it will eventually be reduced to a WHNF
(data constructor)
Execution must resume at the correct branch
depending on which constructor is returned

28
Pattern Matching and Continuations

Simple approach: tag each constructor closure
with an integer, and select the correct branch
based on the tag
Disadvantages
Extra memory overhead needed for tags
Extra branching instructions
Not in line with GHC's tagless approach

29
Pattern Matching and Continuations

Better approach: since each closure has
associated code, the code for constructors can
directly jump to the right alternative using
continuations
No tags or branch instructions necessary

30
Continuations

Continuations can be represented in two different
ways:
As an array of code pointer objects
Each different constructor jumps to a different
offset
This approach has several disadvantages, as
already mentioned (slow array access, extra
indirection, etc)
As an object with a method for each alternative
Each constructor calls the appropriate method
The method directly implements the continuation
code

31
Continuations

Example
// Base class for all continuations of type List
// Continuations extend this class and override
// each method appropriately
// c is the actual instance of the constructor
public class List_Continuation {
    public Closure nil_branch(Closure c) {
        return default_branch(c);
    }
    public Closure cons_branch(Closure c) {
        return default_branch(c);
    }
    public Closure default_branch(Closure c) {
        throw new RuntimeException("pattern match failure");
    }
}
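
To show how such a base class might be used, the sketch below compiles one case expression into an anonymous subclass and lets each constructor closure call its own branch. Everything beyond List_Continuation and its three methods is an assumption made for illustration: the Nil and Cons closures, the global STACK, the result field and the describe function are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: constructors dispatch through a continuation object.
final class ContinuationSketch {
    interface Closure {
        Closure enter(); // returns the next closure to run, or null
    }

    // Base class as above: unhandled alternatives fall through to the default.
    static class List_Continuation {
        public Closure nil_branch(Closure c) { return default_branch(c); }
        public Closure cons_branch(Closure c) { return default_branch(c); }
        public Closure default_branch(Closure c) {
            throw new RuntimeException("pattern match failure");
        }
    }

    // Explicit continuation stack (assumed to be a simple global here).
    static final Deque<List_Continuation> STACK = new ArrayDeque<>();
    static String result;

    static final class Nil implements Closure {
        @Override
        public Closure enter() { return STACK.pop().nil_branch(this); }
    }

    static final class Cons implements Closure {
        final Object head;
        final Closure tail;

        Cons(Object head, Closure tail) {
            this.head = head;
            this.tail = tail;
        }

        @Override
        public Closure enter() { return STACK.pop().cons_branch(this); }
    }

    // Roughly: case xs of { Nil -> "empty"; Cons _ _ -> "non-empty" }
    static String describe(Closure xs) {
        STACK.push(new List_Continuation() {
            @Override
            public Closure nil_branch(Closure c) { result = "empty"; return null; }
            @Override
            public Closure cons_branch(Closure c) { result = "non-empty"; return null; }
        });
        for (Closure next = xs; next != null; ) {
            next = next.enter();
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(describe(new Nil()));              // empty
        System.out.println(describe(new Cons(1, new Nil()))); // non-empty
    }
}
```

No tag is inspected anywhere: the choice of branch is made by which method the constructor's own code calls.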

32
Updates

Overwriting is not possible in the given implementation
Simple workaround: allocate an extra field in
each thunk that stores the WHNF once it is
entered
The code for the thunk checks this field, and if
it is non-null it simply returns that instead of
executing the closure body again
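
The extra-field workaround can be sketched as below. It is a simplified model, not the presented compiler's code: UpdateSketch, Thunk and the evaluations counter are hypothetical, the thunk body is modelled as a Supplier, and the value is returned directly instead of going through continuations.

```java
import java.util.function.Supplier;

// Hypothetical sketch of a thunk that caches its WHNF in an extra field.
final class UpdateSketch {
    static int evaluations = 0; // counts how often a body actually runs

    static final class Thunk {
        private Supplier<Integer> body; // stands in for code + captured values
        private Integer whnf;           // extra field: null until evaluated

        Thunk(Supplier<Integer> body) {
            this.body = body;
        }

        Integer enter() {
            if (whnf != null) {
                return whnf; // already evaluated: skip the body
            }
            whnf = body.get();
            body = null; // release captured values so they can be collected
            return whnf;
        }
    }

    public static void main(String[] args) {
        Thunk t = new Thunk(() -> {
            evaluations++;
            return 6 * 7;
        });
        System.out.println(t.enter());   // 42
        System.out.println(t.enter());   // 42
        System.out.println(evaluations); // 1
    }
}
```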

33
Updates

Disadvantages
Extra memory overhead for every thunk to store
the WHNF
The thunk itself cannot be garbage-collected
The original values in the thunk must be set to
null to avoid memory leaks
Update overhead is incurred even if the thunk is
evaluated only once (in typical programs, this
will be the case the majority of the time)

34
Updates

A better solution: perform updates only when
necessary by doing update analysis
Two ways to do update analysis:
Closure creation time (used by GHC)
Closure sharing time (used by me)

35
Updates

GHC uses static analysis to determine which types
of thunks don't need updating
Unfortunately, there are few situations where
this can be detected statically, especially with
separate compilation
Most thunks created at runtime will be updated,
even if they are evaluated only once

36
Sharing Analysis

Observation: a closure can only be entered more
than once if it is pointed to by more than one
closure in the runtime graph
This can only happen if the closure is shared,
i.e. if it is referenced more than once in the
body of a function
Using these observations, we can do sharing
analysis instead

37
Sharing Analysis

All closures are created as if they will only be
entered once (no updating code)
If a closure is shared, then an updateable
version of it is created before it is referenced
by anything else
This is done by attaching special code to the
original closure at runtime -- an indirection
node and an updater object
