Title: Marmot
1Marmot
- Bob Fitzgerald
- Todd Knoblock
- Erik Ruf
- Bjarne Steensgaard
- David Tarditi
2Why Study Java?
- Good features: static typing, type safety, OO, exceptions, GC
- Normal enough that MS programmers might use it (versus ML, Scheme, ...)
- Real implementations available for bootstrap/comparison
- Benchmark code available
3Motivation
- The quality of C/C++ software is not as good as it could be.
- Features such as clean design, type safety, memory safety, and garbage collection can substantially improve software quality.
- But people are very concerned about the efficiency of Java.
4Counter Examples
- Office suite in Java
- Excel in Java
5Inherent in Java, or artifact of implementations?
- Applets
- interpreted/JIT
- Small memory
- Small code
- Mostly UI
- Less than 1/10 the speed of C.
- Applications
- ?
6Goals
- Well-balanced performance
- Research infrastructure
7The Marmot System
8Buzzwords
- static
- type-directed
- SSA-based
- native code generation
- research platform
9Language subset
- Correct thread, exception, memory semantics
- Most libraries, including awt, net
- No dynamic loading
- Minimal reflection
10Where did the time go?
11Architecture
12Standard Scalar Optimizations
- constant/copy prop with conditional branch elim
- dead variable/code elim
- CSE
- redundant load elim (fields and array elts)
- invariant code motion
- strength reduction
- control flow simplifications
- inlining
- operator lowering, reanalysis
- coloring register alloc
- instruction scheduling
- missing PRE, loop unrolling
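As a rough illustration of three of the passes listed above (invariant code motion, strength reduction, redundant load elimination), the sketch below shows a loop as written and a hand-transformed equivalent. The class and method names are hypothetical, and this is a source-level picture of what such passes aim for, not Marmot's actual output.

```java
final class ScalarOptDemo {
    // As written: scale * stride is recomputed on every iteration,
    // i * 4 multiplies by the induction variable, and a.length is reloaded.
    static int before(int[] a, int scale, int stride) {
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * (scale * stride) + i * 4;
        }
        return sum;
    }

    // Hand-optimized equivalent (assumed result of the passes, for illustration).
    static int after(int[] a, int scale, int stride) {
        int sum = 0;
        int k = scale * stride; // invariant code motion: hoisted out of the loop
        int n = a.length;       // redundant load elimination: length read once
        int i4 = 0;             // strength reduction: i * 4 becomes a running add
        for (int i = 0; i < n; i++) {
            sum += a[i] * k + i4;
            i4 += 4;
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3};
        System.out.println(before(data, 2, 3) == after(data, 2, 3));
    }
}
```

Both versions use wrapping int arithmetic, so they agree even when intermediate values overflow.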
13Standard OO Optimizations
- interprocedural allocation/invocation
- treeshaking
- intraprocedural type propagation
- static method binding
- check elimination (array store, cast, instanceof, null)
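To make the array store check concrete, here is a minimal sketch of why Java needs it and of one case where a compiler can prove it away. The class and method names are hypothetical; the reasoning about `String[]` is an illustration of the general idea, not a description of Marmot's analysis.

```java
final class StoreCheckDemo {
    // Java arrays are covariant: an Object[] reference may point at a String[].
    // Every store through it must therefore be checked at run time.
    static boolean storeFails(Object[] target, Object value) {
        try {
            target[0] = value;
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    // String is final, so a String[] reference can only point at an array
    // whose element type is exactly String. Storing a String cannot fail,
    // and a compiler may drop the store check here.
    static void fill(String[] target, String value) {
        for (int i = 0; i < target.length; i++) {
            target[i] = value;
        }
    }

    public static void main(String[] args) {
        System.out.println(storeFails(new String[1], Integer.valueOf(1)));
        System.out.println(storeFails(new Object[1], Integer.valueOf(1)));
    }
}
```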
14Bytecode-related optimizations
- type elaboration
- idiom recognition
- ints representing booleans
- control/value issues
- subroutines
- statically initialized arrays
15Intermediate Representation
- traditional CFG plus exception arcs
16Block splitting enables SSA
17Array Bound Check Optimization
- more difficult in Java
- precise exceptions limit code motion
- array use not limited to intraprocedural loop nests
- often performed incorrectly
- array indices are signed quantities
- i, i+1, ..., i+k doesn't increase monotonically
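The two pitfalls above (signed indices and non-monotonic increments) can be seen in a few lines of Java. The sketch below uses hypothetical names and shows the wraparound case plus a common way to fold the two-sided range test into one unsigned comparison; it is an illustration, not Marmot's implementation.

```java
final class BoundsDemo {
    // Java int arithmetic wraps, so i + 1 is not always greater than i.
    static boolean successorIsLarger(int i) {
        return i + 1 > i;
    }

    // The check Java semantics require: both a lower and an upper bound.
    static boolean inBoundsTwoCompares(int i, int length) {
        return i >= 0 && i < length;
    }

    // Equivalent single comparison: a negative i becomes a huge unsigned
    // value, so it fails the same test. Valid because length is never negative.
    static boolean inBoundsUnsigned(int i, int length) {
        return Integer.compareUnsigned(i, length) < 0;
    }

    public static void main(String[] args) {
        System.out.println(successorIsLarger(Integer.MAX_VALUE));
        System.out.println(inBoundsUnsigned(-1, 10));
    }
}
```

An optimizer that assumes `a[i+1]` is safe whenever `a[i]` and the loop bound were checked has to rule out the wraparound case first.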
18Interprocedural Flow Analysis
- non flow based
- RTA, TBAA, field analysis
- type inference based
- possibly-null,
- type propagation,
- escape frame/thread,
- thread specific synchronization/allocation
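As a small illustration of the non-flow-based end of this list, the sketch below shows the call-resolution step of rapid type analysis (RTA): a virtual call can only reach implementations inherited by classes that the program actually instantiates. The class model (string names, `addClass`, `markInstantiated`, `targets`) is entirely hypothetical, and the instantiated set is supplied by hand here rather than computed from reachable code as a full RTA would do.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class RtaSketch {
    private final Map<String, String> superOf = new HashMap<>();
    private final Map<String, Set<String>> declared = new HashMap<>();
    private final Set<String> instantiated = new HashSet<>();

    // Register a class, its superclass (null for a root), and the methods it declares.
    void addClass(String name, String superName, String... methods) {
        superOf.put(name, superName);
        declared.put(name, new HashSet<>(Arrays.asList(methods)));
    }

    void markInstantiated(String name) {
        instantiated.add(name);
    }

    private boolean isSubclassOf(String c, String ancestor) {
        for (String k = c; k != null; k = superOf.get(k)) {
            if (k.equals(ancestor)) return true;
        }
        return false;
    }

    // Nearest class on the superclass chain of c that declares the method, or null.
    private String implementor(String c, String method) {
        for (String k = c; k != null; k = superOf.get(k)) {
            if (declared.getOrDefault(k, Collections.<String>emptySet()).contains(method)) return k;
        }
        return null;
    }

    // Possible targets of a virtual call to method on a receiver of the given static type.
    Set<String> targets(String staticType, String method) {
        Set<String> result = new TreeSet<>();
        for (String c : instantiated) {
            if (isSubclassOf(c, staticType)) {
                String impl = implementor(c, method);
                if (impl != null) result.add(impl + "." + method);
            }
        }
        return result;
    }
}
```

When `targets` returns a single entry, the call is a candidate for static method binding and then inlining.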
19Flow analysis efficiency
- interprocedural
- must be near-linear time in practice
- intraprocedural
- O(n^3) typically ok
- are procedures the proper units for this distinction?
20Runtime
- Multiple GC implementations
- conservative, copying, generational, sliding, thread-specific, stack allocation
- card marking, SSB write barriers
- compact tables
- Efficient primitives
- type tests, synchronization, exception handling, interface dispatch (but heavyweight threads)
- exposure in IR lowering, redundancy elimination
21Backend
- coloring register allocation with lifetime splitting
- effective use of x86 instruction set
- basic instruction scheduling
22Libraries
- custom implementation
- written in Java wherever possible
23Performance
24Where does the time go now?
25Performance Of Other Systems Relative to Marmot
Higher is better
27Assessment
- performance is improving: 18% in 1999, 24% more in 2000
- 10x faster than JIT
- 85% of C on average for cognates
- what's an acceptable price?
28Some Open Questions
- (Things we know less about now than when we started)
29Lowerings
- How many?
- How different?
- Atomizing operators
- Null/sync/finally, etc.
30Is SSA a good idea?
- Expense and proliferation of locals
- Imputed definitions (Range, null, type, etc.)
- Arrays and fields
- Transforming the dominance relation
- Complications for exceptions
31Space/Time Tradeoffs
32What is Local, What is global?
- Basic block
- Method
- Class
- Application
- Processor
- System
- LAN
- World
- Universe
- O(n^3) local
- O(n) global
- Are the source program's divisions meaningful?
33Current Work
34Interprocedural Flow Analysis
- goal: model polymorphic containers
- e.g. Vector, Hashtable, Object
- primary sources of downcasts, virtual calls
- presence limits quality of interprocedural opts
- example: Vector<T>.elementAt(i) returns a T
- uses: call rebinding, synchronization elimination, stackframe/thread allocation
- expensive substitute for source-level generics
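The downcast problem described above is easy to see in source form. The sketch below (hypothetical class and method names) contrasts a container whose elements are only known to be Object with the same code written against source-level generics. It shows the shape of the problem, not the analysis that solves it.

```java
import java.util.Vector;

final class ContainerDemo {
    // Elements are statically Object, so each read needs a checked downcast
    // before the receiver type of length() is known.
    static int totalLengthUntyped(Vector<Object> v) {
        int total = 0;
        for (int i = 0; i < v.size(); i++) {
            String s = (String) v.elementAt(i); // downcast, checked at run time
            total += s.length();
        }
        return total;
    }

    // With source-level generics the element type is stated by the programmer,
    // so no cast appears in the source.
    static int totalLengthTyped(Vector<String> v) {
        int total = 0;
        for (int i = 0; i < v.size(); i++) {
            total += v.elementAt(i).length();
        }
        return total;
    }

    public static void main(String[] args) {
        Vector<String> v = new Vector<>();
        v.add("ab");
        v.add("cde");
        System.out.println(totalLengthTyped(v));
    }
}
```

A flow analysis that proves every element stored into a given Vector is a String recovers the same information for the first method without any source change, at the cost of a whole-program analysis.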
35Profile Based Optimization
- improve existing phases
- inlining
- field layout
- gc selection/tuning
- register allocation
- instruction scheduling
- issue: relating object-code results to IR at various optimization levels
36Backend
- register allocation
- allocation quality limits inlining, specialization
- especially important on 6-register CPU
- instruction scheduling
- CISCification, rematerialization
37Storage Management
- improve tables, metadata mechanisms
- object layout, splitting, inlining
- compile time optimization, staging of runtime operations
- compile time gc (reuse, regions)
38Language Issues
- annoyances: scoping, co/contravariance
- parametric polymorphism and abstract datatypes
- invariants/design by contract
- alternative typing disciplines
- module systems
39Q & A