Title: Design, Implementierung und Evaluierung einer virtuellen Maschine für Oz
1Design, Implementierung und Evaluierung einer virtuellen Maschine für Oz
(Design, Implementation, and Evaluation of a Virtual Machine for Oz)
Ralf Scheidhauer PS Lab, DFKI May 18, 1999
2Oz
- Developed at DFKI since 1991
- DFKI Oz 1.0 (1995), DFKI Oz 2.0 (1998)
- Mozart 1.0 (1999)
- 180 000 lines of C
- 140 000 lines of Oz
- 65 000 lines documentation
- Since 1996 collaboration with SICS and UCL
- Application-strength system: multi-agent systems (DFKI, SICS), computer-bus scheduling (Daimler), gate scheduling (Singapore), NL (SFB), comp. biology (LMU), ...
3Related Work
- LP, CLP: Warren 77, Jaffar & Lassez 86
- Concurrency: Saraswat 93
- AKL: Janson & Haridi 90, Janson 94
- FP: Appel 92
4Overview
- Language L
- Virtual machine
- Implementation
- Evaluation
5The Language L
- Core language of Oz
- Presentation as extension of a sublanguage of SML
  - Logic variables
  - Threads
  - Synchronization
  - Dynamic type system
- Extensions via predefined functions
  - lvar(): logic variable
  - unify(x,y): unification
  - spawn(f): thread creation
6Graph Model
- Integers
- Tuples
- Functions
- Cells (references)
- Constructors
- Strict evaluation of expressions: e0 → e1 → ...
7Why Logic Variables?
- Programming techniques: backpatching, difference lists, ...
- Cyclic data structures
- Tail-recursive definition of many functions (append, map, ...)
- Synchronization of threads
- Search
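The tail-recursion point can be illustrated with a minimal Python sketch. It is not Oz code: `LVar`, `deref`, `append`, and `to_python_list` are hypothetical names, lists are modelled as cons cells `(head, tail)` with `None` as nil, and the loop stands in for a tail call. Each new cell is created with an unbound tail that the next step fills in, so nothing has to be done after the recursive step returns.

```python
class LVar:
    """Hypothetical single-assignment logic variable (not the Oz API)."""

    def __init__(self):
        self.bound = False
        self.value = None

    def bind(self, value):
        if self.bound:
            raise ValueError("logic variable is already bound")
        self.bound = True
        self.value = value


def deref(term):
    """Follow bound variables until a value or an unbound variable is reached."""
    while isinstance(term, LVar) and term.bound:
        term = term.value
    return term


def append(xs, ys):
    """Append two cons lists; None is nil, (head, tail) is a cons cell.

    The loop plays the role of a tail call: every new cell is built with an
    unbound tail (a hole) that the next iteration binds.
    """
    result = LVar()
    hole = result
    xs = deref(xs)
    while xs is not None:
        head, tail = xs
        next_hole = LVar()
        hole.bind((head, next_hole))
        hole = next_hole
        xs = deref(tail)
    hole.bind(ys)
    return result


def to_python_list(term):
    """Convert a (possibly variable-linked) cons list to a Python list."""
    out = []
    term = deref(term)
    while term is not None:
        head, tail = term
        out.append(head)
        term = deref(tail)
    return out
```

The same hole-filling idea underlies difference lists: a list is handed around together with its unbound tail, so extending it is a single binding.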
8Logic Variables Creation and Representation
- let val x = lvar() in (4,x,23) end
9Logic Variables Unification
[Figure: unify( , ) applied to two TUPLE graphs; node labels in extraction order: INT/3, VAR, INT/2, INT/3, INT/5, VAR]
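Unification over such a graph can be sketched in Python as follows. This is a simplified illustration, not the Mozart algorithm: `Var`, `deref`, and `unify` are hypothetical names, integers are Python ints, tuples are Python tuples, bindings are not undone on failure, and cyclic structures are not handled. The example values follow one possible reading of the node labels in the figure.

```python
class Var:
    """Logic variable node; `ref` is None while the variable is unbound."""

    def __init__(self):
        self.ref = None


def deref(node):
    """Follow the chain of bound variables to its end."""
    while isinstance(node, Var) and node.ref is not None:
        node = node.ref
    return node


def unify(a, b):
    """Make two graphs equal by binding variables; return True on success."""
    a, b = deref(a), deref(b)
    if a is b:
        return True
    if isinstance(a, Var):
        a.ref = b                      # bind the unbound variable
        return True
    if isinstance(b, Var):
        b.ref = a
        return True
    if isinstance(a, int) and isinstance(b, int):
        return a == b
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        return all(unify(x, y) for x, y in zip(a, b))
    return False


x, y = Var(), Var()
ok = unify((3, x, 2), (3, 5, y))
```

After the call, `x` is bound to 5 and `y` to 2; unifying tuples that disagree on an integer fails.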
10Threads
[Figure: threads thread1 ... threadn with expressions e1 ... en over a shared store]
- Synchronization: logic variables (xy)
- Fairness
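Synchronization through logic variables can be approximated in Python with an event per variable. This is only a sketch under assumptions: `LVar`, `bind`, `wait`, and `spawn` are hypothetical names, and operating-system threads stand in for the much lighter threads of the machine described here.

```python
import threading


class LVar:
    """Hypothetical dataflow variable: readers block until it is bound."""

    def __init__(self):
        self._bound = threading.Event()
        self._value = None

    def bind(self, value):
        self._value = value
        self._bound.set()              # wake every thread waiting in wait()

    def wait(self):
        self._bound.wait()             # suspend until the variable is bound
        return self._value


def spawn(f):
    """Start f in a new thread and return the thread object."""
    t = threading.Thread(target=f)
    t.start()
    return t


x, y, z = LVar(), LVar(), LVar()
# The consumer starts first and suspends until both inputs are available.
consumer = spawn(lambda: z.bind(x.wait() + y.wait()))
x.bind(4)
y.bind(23)
consumer.join()
result = z.wait()
```

The consumer needs no explicit lock or condition in its own code; reading an unbound variable is what makes it wait.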
11Virtual Machine
12Model
[Figure: machine model; labels: X-regs, stack, threads, heap, scheduler, code]
Code excerpt from the figure:
  ...
  move Y3 X0
  move G5 X1
  apply G2 2
  return
  ...
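How an emulator might step through the code excerpt can be sketched in Python. The instruction semantics are assumptions, since the slide shows only the mnemonics: `move` is taken to copy its first operand into its second, `apply Gi n` to call the function in Gi with X0..X(n-1) and leave the result in X0, and `return` to end the run. `run` and `banks` are hypothetical names, and call frames are left out.

```python
def run(code, banks):
    """Execute `code` until `return`; `banks` maps 'X', 'Y', 'G' to lists."""

    def read(reg):
        return banks[reg[0]][int(reg[1:])]

    def write(reg, value):
        banks[reg[0]][int(reg[1:])] = value

    pc = 0
    while True:
        op, *args = code[pc]
        pc += 1
        if op == "move":        # assumed operand order: source, destination
            src, dst = args
            write(dst, read(src))
        elif op == "apply":     # assumed: arguments in X0.., result in X0
            fn_reg, arity = args
            x = banks["X"]
            x[0] = read(fn_reg)(*x[:arity])
        elif op == "return":
            return banks["X"][0]
        else:
            raise ValueError(f"unknown instruction {op!r}")


code = [("move", "Y3", "X0"), ("move", "G5", "X1"), ("apply", "G2", 2), ("return",)]
banks = {
    "X": [None, None],
    "Y": [None, None, None, 4],
    "G": [None, None, lambda a, b: a + b, None, None, 23],
}
result = run(code, banks)
```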
13V-Addressing
- Address toplevel variables via V-registers
- Loader builds data on the heap → code contains direct references into heap
- Example: fun f(l,u) = map(fn(x) => h(x)+g(x)+u, l)
- h and g in V-registers → reduced memory consumption
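The loader step can be pictured with a short Python sketch. Everything here is an assumption made for illustration: operands of the form `("V", i)` stand for V-register references in the code file, `load` is a hypothetical loader, and a Python list stands in for data built on the heap.

```python
def load(code, constants):
    """Replace every ('V', i) operand by a direct reference to constants[i]."""
    heap = list(constants)      # stands in for data the loader builds on the heap
    loaded = []
    for op, *operands in code:
        patched = [
            heap[o[1]] if isinstance(o, tuple) and o[0] == "V" else o
            for o in operands
        ]
        loaded.append((op, *patched))
    return loaded


def h(x):
    return x + 1


def g(x):
    return x * 2


code = [("apply", ("V", 0), 1), ("apply", ("V", 1), 1), ("apply", ("V", 0), 1)]
loaded = load(code, [h, g])
```

After loading, both uses of V0 point at the same object, and no closure has to carry its own copy of `h` or `g`; in the slide's example the inner function would then presumably only need to capture `u`.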
14Dynamic Code Specialization
apply V3 2
15Unification in the Machine Model
[Figure: unify( , ) applied to two TUPLE graphs; node labels in extraction order: INT/3, VAR, INT/2, INT/3, INT/5, VAR]
16Synchronization Suspension Wakeup
[Figure: suspension and wakeup; labels: (xy), thread]
17Synchronization Suspension Wakeup
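Suspension and wakeup can be sketched with Python generators acting as threads. This is an illustration under assumptions, not the Mozart scheduler: `LVar`, `Scheduler`, `adder`, and `producer` are hypothetical names, and a thread signals that it needs a variable by yielding it.

```python
from collections import deque


class LVar:
    def __init__(self):
        self.bound = False
        self.value = None
        self.suspensions = []          # threads waiting for this variable


class Scheduler:
    def __init__(self):
        self.runnable = deque()

    def spawn(self, thread):
        self.runnable.append(thread)

    def bind(self, var, value):
        var.bound, var.value = True, value
        # Wakeup: every thread suspended on the variable becomes runnable.
        self.runnable.extend(var.suspensions)
        var.suspensions = []

    def run(self):
        while self.runnable:
            thread = self.runnable.popleft()
            try:
                needed = next(thread)
            except StopIteration:
                continue               # thread has terminated
            if needed is not None and not needed.bound:
                needed.suspensions.append(thread)   # suspension
            else:
                self.runnable.append(thread)


def adder(x, y, log):
    while not x.bound:
        yield x                        # ask to be suspended on x
    while not y.bound:
        yield y
    log.append(x.value + y.value)


def producer(sched, x, y):
    sched.bind(x, 4)
    yield None                         # give up the processor voluntarily
    sched.bind(y, 23)


sched, log = Scheduler(), []
x, y = LVar(), LVar()
sched.spawn(adder(x, y, log))
sched.spawn(producer(sched, x, y))
sched.run()
```

A suspended thread costs nothing while it waits: it sits on the variable's suspension list and is never polled by the scheduler.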
18Implementation
19Emulator vs. Native Code
[Figure: implementation of the virtual machine: native code vs. emulator]
20Threads
- X registers once per machine, not per thread
- Save live X registers upon preemption/suspension
  - Pessimistic guess per function
  - Exact determination during GC by code interpretation
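A minimal Python sketch of sharing one X register file between threads follows. The names `X`, `Thread`, `preempt`, and `resume` are hypothetical, and `live_count` plays the role of the pessimistic per-function guess: the first `live_count` registers are saved whether or not all of them are really live.

```python
X = [None] * 8                  # one register file for the whole machine


class Thread:
    def __init__(self, name):
        self.name = name
        self.saved_x = []       # copy of X registers, filled on preemption


def preempt(thread, live_count):
    """Save X[0:live_count]; live_count is a per-function upper bound."""
    thread.saved_x = X[:live_count]


def resume(thread):
    """Write the saved registers back before the thread continues."""
    X[:len(thread.saved_x)] = thread.saved_x


t1, t2 = Thread("t1"), Thread("t2")
X[0], X[1] = 1, 2               # t1 is running and uses X0, X1
preempt(t1, 2)
X[0], X[1] = "a", "b"           # t2 runs and overwrites the shared registers
preempt(t2, 2)
resume(t1)
```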
21Representation of the Graph Naive
[Figure: register and heap; labels: type, ...]
22Representation of the Graph Optimized
[Figure: register and heap; labels: INT 23, type, PTR, ...]
23Representation of the Graph Logic Variables
[Figure: register and heap; labels: INT 23, VAR, PTR, REF, PTR, ...]
24Logic Variables Optimized
[Figure: registers and heap; labels: INT 23, type, PTR, VAR, REF, REF, ...]
25Moving More Tags
[Figure: register and heap; labels: INT 23, type, PTR, REF, TPL, ...]
26Evaluation
27Comparison with Emulators
- Mozart is one of the fastest emulators
- Competitive with OCaml and Java
- Significantly faster than Moscow ML
- Twice as fast as SICStus Prolog and Erlang
28Comparison with Native Code Systems
- Few memory accesses (i.e. arithmetic) → Mozart is easily one order of magnitude slower
- Memory intensive (symbolic computation)
  - Difference only approx. factor 2-3
  - Mozart in single cases faster than native ML or C
29Threads
- Threads in Mozart are very lightweight
- Leading position both for creation and communication
- Up to nearly 2 orders of magnitude faster than Java (creation)
30Summary
- Extended sublanguage of SML by logic variables and threads
- Machine model
  - V-registers
  - Dynamic code specialization
  - Synchronization
- Implementation
  - Efficient implementation of threads
  - Tagging scheme
- Evaluation
  - Mozart is one of the fastest emulators
  - Compares well with native code systems on its target applications
  - Mozart has very lightweight threads
31Backup Slides for the Discussion
32Logic Variables vs. Functions
- Runtime (speedup): fibonacci 1.18, takeuchi 1.45
- Memory (large-scale applications)
  - Use approx. 18% of heap memory
  - Approx. twice as much as objects
  - Approx. as much as records
33Memory Profile
34Mandelbrot (Floats)
[Bar chart; values in extraction order: 1.00, 2.65, 1/1.11, 1/1.58, 1/8.77, 1/11.23, 1.37, 1/39.24]
35Quicksort with Lists
[Bar chart; values in extraction order: 1.00, 2.43, 1.57, 5.19, 1/2.59, 1/3.69, 1/2.99, 1/3.46]
36Quicksort with Arrays
[Bar chart; values in extraction order: 1.00, 1.25, 1/1.48, 1/4.01, 1/7.92, 1/1.52, 1/20.86]
37Naive Reverse
[Bar chart; values in extraction order: 1.00, 1.81, 1.59, 1.51, 11.82, 1.04, 1/1.60, 2.05, 1.70]
38Threads Creation
39Threads fib(20)
[Bar chart; values in extraction order: 1.0, 1.09, 4.73, 708.06, 1/1.14]
40Tagging Scheme of Mozart
- 4-bit tag, but only 2-bit loss for address space (1 GB): align structures on word boundaries
- Lists, tuples: no need to unmask before type test
- REF tag
  - no unmask before test necessary
  - no unmask before deref
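The arithmetic behind "4-bit tag, 2-bit loss" can be sketched in Python on 32-bit words. The concrete layout is an assumption made for illustration: a word-aligned address already has two zero low bits, so shifting it left by two frees four low bits for the tag and costs two high bits, leaving 30 address bits (1 GB). A REF is assumed to be a plain aligned address whose two low bits are zero, so it can be tested and followed without unmasking. The tag values and all function names are hypothetical.

```python
WORD_MASK = 0xFFFFFFFF
TAG_MASK = 0xF
TAG_TUPLE, TAG_LIST, TAG_VAR = 0b0001, 0b0010, 0b0011   # hypothetical values


def make_tagged(addr, tag):
    """Pack a word-aligned address below 1 GB together with a 4-bit tag."""
    if addr % 4 != 0 or not 0 <= addr < (1 << 30):
        raise ValueError("address must be word aligned and below 1 GB")
    if tag & 0b11 == 0:
        raise ValueError("tags with two zero low bits are reserved for REF")
    return ((addr << 2) | tag) & WORD_MASK


def tag_of(word):
    return word & TAG_MASK


def addr_of(word):
    return (word & ~TAG_MASK & WORD_MASK) >> 2


def make_ref(addr):
    """A REF is just the aligned address itself."""
    if addr % 4 != 0:
        raise ValueError("address must be word aligned")
    return addr


def is_ref(word):
    return word & 0b11 == 0            # test on the raw word, no unmasking


def deref(word, heap):
    """Follow REF words through `heap`, a dict from address to word."""
    while is_ref(word):
        word = heap[word]              # the word is used directly as address
    return word


w = make_tagged(0x1000, TAG_TUPLE)
heap = {0x40: make_ref(0x80), 0x80: w}
final = deref(make_ref(0x40), heap)
```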
41Threads
[Figure: thread and task; labels: PC, task, L, G, X, thread; code: move Y3 X0, move G5 X1, apply G2 2, ...]
42Emulators Optimization Techniques
- Threaded code
- Instruction collapsing
- Register access
- Specialization
- Example: move Y5 X3, move Y6 X1
  34 11 (SPARC)
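Instruction collapsing can be sketched as a peephole pass in Python. This is an assumption about how the example might be treated, not the actual Mozart instruction set: `collapse_moves` and the combined `move2` instruction are hypothetical, and the pass ignores jump targets, which a real implementation would have to respect.

```python
def collapse_moves(code):
    """Merge each pair of adjacent `move` instructions into one `move2`."""
    out = []
    i = 0
    while i < len(code):
        if (i + 1 < len(code)
                and code[i][0] == "move"
                and code[i + 1][0] == "move"):
            first, second = code[i], code[i + 1]
            out.append(("move2", first[1], first[2], second[1], second[2]))
            i += 2
        else:
            out.append(code[i])
            i += 1
    return out


code = [("move", "Y5", "X3"), ("move", "Y6", "X1"), ("return",)]
collapsed = collapse_moves(code)
```

The collapsed program has one instruction fewer, so the emulator fetches and dispatches once instead of twice for the pair.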
43Address Modes (Registers)
name      liveness   notation   usage
X         thread     Xi         temp. values, parameters
local     fct-body   Li         local variables
global    function   Gi         free variables
virtual   program    Vi         constants
44Threads
- Fairness: status register check on every function call (and return)
[Figure: status register with flags ..., GC, IO, PRE]
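The status register check can be sketched in Python. The flag names GC, IO, and PRE come from the slide; their numeric values, the `Machine` class, and the way requests are handled are assumptions made for illustration.

```python
# Hypothetical flag values; the slide only names GC, IO and PRE.
GC, IO, PRE = 1, 2, 4


class Machine:
    def __init__(self):
        self.status = 0          # status register, 0 means nothing is pending
        self.events = []         # handled requests, recorded for illustration

    def request(self, flag):
        """Called by e.g. a timer to ask for attention at the next call."""
        self.status |= flag

    def call(self, fn, *args):
        """Function call preceded by the status register check."""
        if self.status:          # a single test on the common path
            self.handle_status()
        return fn(*args)

    def handle_status(self):
        for flag, name in ((GC, "gc"), (IO, "io"), (PRE, "preempt")):
            if self.status & flag:
                self.events.append(name)
        self.status = 0


m = Machine()
first = m.call(lambda a, b: a + b, 4, 23)
m.request(PRE)
second = m.call(lambda a, b: a + b, 1, 2)
```

Because every loop in a functional program goes through calls, checking at calls (and returns) bounds how long a thread can run before a pending preemption is noticed.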
45L
e ::= x                                      variable
    | n                                      integer
    | (e1,...,en)                            tuple
    | fn (x1,...,xn) => e                    function
    | e0(e1,...,en)                          application
    | let val x = e in e end                 variable declaration
    | let con x in e end                     constructor declaration
    | case e of p1 => e1 | ... | pn => en    pattern matching

Operators
    lvar  : () -> α              logic variable
    unify : α × α -> ()          unification
    spawn : (() -> α) -> ()      thread creation
46add Xi Xk Xn
Tagged Xi = X(PC+1)                        2        0 (2)
DEREF(Xi)                                  2        0
if (isInt(getTag(Xi)))                     1+2      0
Tagged Xk = X(PC+2)                        2        2
DEREF(Xk)                                  2        0
if (isInt(getTag(Xk)))                     1+2      0
int aux = intValue(Xi)+intValue(Xk)        1+1+1    2
X(PC+3) = oz_int(aux)                      3+2+2    0 (2)
  (ovflw, shift, tag, store)
DISPATCH(4)                                3        3
---------------
                                           27       7 (11)

no derefs       23
no type tests   17
overflow        6
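The steps of the instruction (fetch, dereference, type test, add, overflow check, tag and store) can be mirrored in a Python sketch. The value representation is an assumption: small integers are `("INT", n)` pairs, references are `Ref` objects, and a 28-bit small-integer range is assumed because four of 32 bits go to the tag. `add`, `deref`, and `is_small_int` are hypothetical names, and the slow path is only signalled, not implemented.

```python
SMALL_MIN, SMALL_MAX = -(1 << 27), (1 << 27) - 1   # assumed 28-bit range


class Ref:
    """Reference cell pointing at another value."""

    def __init__(self, target):
        self.target = target


def deref(value):
    while isinstance(value, Ref):
        value = value.target
    return value


def is_small_int(value):
    return isinstance(value, tuple) and value[0] == "INT"


def add(X, i, k, n):
    """Fast path of `add Xi Xk Xn`; returns False if the slow path is needed."""
    a = deref(X[i])
    if not is_small_int(a):            # type test on the first operand
        return False
    b = deref(X[k])
    if not is_small_int(b):            # type test on the second operand
        return False
    aux = a[1] + b[1]
    if not SMALL_MIN <= aux <= SMALL_MAX:
        return False                   # overflow: result needs another format
    X[n] = ("INT", aux)                # tag and store
    return True


X = [("INT", 4), Ref(Ref(("INT", 23))), None]
ok = add(X, 0, 1, 2)
```

The sketch makes the slide's breakdown visible: dropping the two `deref` calls or the two type tests removes exactly the steps counted under "no derefs" and "no type tests".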
47Java JIT vs. Emulator
benchmark            speedup
quicksort (array)    18.8
fib (int)            14.2
fib (float)          4.9
queens               6.1
nrev                 2.0
quicksort (list)     2.3
fib (thread)         1.1
mandelbrot           5.4
deriv (virtual)      1.9