Title: Titanium/Java Performance Analysis
1Titanium/Java Performance Analysis
- Ryan Huebsch
- ryan_at_huebsch.org
- Group: Boon Thau Loo, Matt Harren
- Joe Hellerstein, Ion Stoica, Scott Shenker
PIER: Peer-to-Peer Infrastructure for Information Exchange and Retrieval
1/29/02
2SciMark All Compilers Summary
Testing done on Millennium (550MHz, Katmai), Titanium version 1.910.
Except for Java testing, data collected 11/2/01; Java collected on 1/23/02
3SciMark Selected Compilers Small Dataset
Testing done on Millennium (550MHz, Katmai), Titanium version 1.910.
Except for Java testing, data collected 11/2/01; Java collected on 1/23/02
4SciMark Selected Compilers Large Dataset
Testing done on Millennium (550MHz, Katmai), Titanium version 1.910.
Except for Java testing, data collected 11/2/01; Java collected on 1/23/02
5SciMark Titanium Version Comparisons
Small Dataset
All data collected on mm62 (550MHz, Katmai) on
1/23/02
Large Dataset
6PIER Application Details
- Network/Database Discrete Event Simulator
- A query engine (relational join and group by) on
top of a distributed hash table
- Simulates end-to-end network communication
(latency, bandwidth divided among flows, etc.)
- Application written in Java for compatibility
with other Berkeley database research projects
- Software Engineering
- Over 200 class files, heavy use of inheritance,
polymorphism, etc.
- About 25,000 lines of code (and not too many
comments yet)
- Layered, easily ported to real, working
implementation
- Some parts of the simulation are faked for
performance reasons; tuples are kept small
(1Kb
- Primarily an object-moving program with some
processing (string manipulations, basic math,
etc.)
- All objects are kept in memory; disk I/O is
minimal (for result logging) and is not timed in
the following slides
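The simulator described above can be pictured with a minimal discrete event loop. This is only a sketch under stated assumptions, not PIER's code: the class `MiniEventSim`, its methods, and the rule that a link's bandwidth is split evenly among active flows are all hypothetical choices made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical names throughout; this is not PIER's actual code.
final class MiniEventSim {
    // One scheduled event: fires at `time` and runs `action`.
    private static final class Event implements Comparable<Event> {
        final double time;
        final long seq;          // tie-breaker so equal-time events run in FIFO order
        final Runnable action;

        Event(double time, long seq, Runnable action) {
            this.time = time;
            this.seq = seq;
            this.action = action;
        }

        @Override
        public int compareTo(Event other) {
            int byTime = Double.compare(time, other.time);
            return byTime != 0 ? byTime : Long.compare(seq, other.seq);
        }
    }

    private final PriorityQueue<Event> queue = new PriorityQueue<>();
    private double now = 0.0;    // current simulated time
    private long nextSeq = 0;

    double now() {
        return now;
    }

    // Schedule `action` to run `delay` simulated time units from now.
    void schedule(double delay, Runnable action) {
        if (delay < 0) {
            throw new IllegalArgumentException("delay must be >= 0");
        }
        queue.add(new Event(now + delay, nextSeq++, action));
    }

    // Simulated message delivery: fixed latency plus transfer time, with the
    // link bandwidth split evenly among the active flows (an assumption).
    void send(double latency, double bytes, double bandwidth,
              int activeFlows, Runnable onArrival) {
        double share = bandwidth / Math.max(1, activeFlows);
        schedule(latency + bytes / share, onArrival);
    }

    // Pop events in time order until none remain.
    void run() {
        Event e;
        while ((e = queue.poll()) != null) {
            now = e.time;
            e.action.run();
        }
    }

    // Small demo: one tuple transfer and one earlier local event.
    static List<String> demo() {
        MiniEventSim sim = new MiniEventSim();
        List<String> log = new ArrayList<>();
        sim.send(50.0, 1000.0, 100.0, 2,
                 () -> log.add("tuple arrived at t=" + sim.now()));
        sim.schedule(10.0, () -> log.add("query issued at t=" + sim.now()));
        sim.run();
        return log;
    }
}
```

In `demo()`, the transfer is scheduled first but completes last: 1000 bytes over a half share of 100 bytes per time unit takes 20 units, plus 50 units of latency, so it fires at t=70.0 after the local event at t=10.0.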
7PIER Language Summary
[Chart data labels: 83.3 (0.8 faster Java); 84.0 (5.1 faster Java); 63.4 faster; 62.7; 77.7; 83.1]
- Small simulation
- 64 Simulated Nodes
- 5000 Tuples per table
Testing done on Millennium (600MHz, 2G RAM),
collected on 1/28/02
8PIER Memory Footprint
Memory usage and runtime grow exponentially with the
primary simulation parameters (test parameters
same as previous slide)
9PIER Parallel Attempts ?
- Parallel attempt with Titanium failed miserably
- Negative speedup (our best almost matched
sequential execution)
- Simulated nodes were divided among processes;
the best version utilized out-of-order execution to
improve performance, while earlier versions used small
time steps to keep all processes synchronized.
- Problems we encountered
- Lots of small remote accesses (when using 8
processes on 2 hosts, the MPI performance
counters rolled over at least once)
- All small accesses due to the movement of our
objects, with sub objects, and sub objects, and
more sub objects.
- Globally, processes were load balanced; within
time steps they were not. Various allocations of
simulated nodes to processes were attempted.
- Application is more memory intensive than
computationally bound
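A small worked example may help show why processes that are balanced globally can still gain nothing when they synchronize at every time step. The numbers below are made up for illustration and are not measurements from PIER; `StepImbalance` and its methods are hypothetical names.

```java
// Illustrative only: hypothetical work units, not data from the PIER runs.
final class StepImbalance {
    // work[p][s] = work units done by process p during time step s.
    // With a barrier after every step, each step costs as much as its
    // slowest process, so the total cost is the sum of per-step maxima.
    static long synchronizedCost(long[][] work) {
        int steps = work[0].length;
        long total = 0;
        for (int s = 0; s < steps; s++) {
            long slowest = 0;
            for (long[] perProcess : work) {
                slowest = Math.max(slowest, perProcess[s]);
            }
            total += slowest;
        }
        return total;
    }

    // Total work done by one process across all steps.
    static long totalWork(long[] perProcess) {
        long sum = 0;
        for (long w : perProcess) {
            sum += w;
        }
        return sum;
    }

    // Two processes that alternate being busy and idle.
    static long[][] example() {
        return new long[][] {
            {10, 0, 10, 0},
            {0, 10, 0, 10},
        };
    }
}
```

In `example()`, each process does 20 units in total, so the allocation looks balanced globally. Yet one process is idle in every step, so the synchronized cost is 10 + 10 + 10 + 10 = 40 units, the same as running all 40 units sequentially, before any communication cost is counted.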
10Parallel Speedup Graph
11Parallel Execution Time Breakup
[Figure: execution time breakdown. Legend: Post Communication (Comm imbalance), Communication, Pre Communication (Execution imbalance), Execution. Labels: 10ms, Region, Async, 300ms Heap, 300ms List, 300ms Vect]
12Titanium Wish List
- Titanium Features that would be nice for our
application (yes, you can laugh at them)
- Serialization to move objects with encapsulated
objects
- Better Memory Management (Regions just were not
enough)
- Global Garbage Collection
- Directed memory deletion (i.e. delete object x)
- Performance counters/profiling
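For the serialization item on the wish list, the sketch below shows what moving an object together with its encapsulated objects looks like in standard Java, using `java.io` object streams. It is not Titanium code and does not show how Titanium would do it; the `Tuple` and `Field` classes are hypothetical stand-ins for the application's objects.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Standard Java serialization, shown for illustration; hypothetical classes.
final class GraphSerialization {
    // A sub-object carried inside a tuple.
    static final class Field implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final String value;

        Field(String name, String value) {
            this.name = name;
            this.value = value;
        }
    }

    // An object with encapsulated sub-objects.
    static final class Tuple implements Serializable {
        private static final long serialVersionUID = 1L;
        final int id;
        final List<Field> fields = new ArrayList<>();

        Tuple(int id) {
            this.id = id;
        }
    }

    // Flatten the whole object graph reachable from `root` into one buffer,
    // which could then be sent as a single message instead of many small ones.
    static byte[] pack(Serializable root) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(root);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Rebuild an independent copy of the object graph from the buffer.
    static Object unpack(byte[] buffer) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(buffer))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A caller would build a `Tuple`, add its `Field` objects, call `pack` once on the sending side, and call `unpack` once on the receiving side, so the tuple and all of its sub-objects travel in one buffer.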