Title: JavaTile: CMP-simulation with a twist
1 JavaTile: CMP-simulation with a twist
- Dan Greenfield
- Computer Architecture Group
Internal Presentation, 16th February 2007
2 Aim of Talk
- Introduce JavaTile
- Show benefits and problems of approach
- Spark interest in collaboration
- Invite expertise from multiple areas to solve CMP problems
3 Quick Background: Exciting Times!
Cisco 188-core (50 BIPS) [2]
Intel 80-core (1 TFLOPS) [1]
4 Parts of a CMP
- Q: How well does each of the components run?
- Q: How well does the network run?
From Pestana et al, 2004 [3]
5 Parts of a CMP (continued)
- The real Q: How well do applications run?
6 Motivations
- Need more realistic NoC traffic
- Current methods: synthetic, limited applications, low PE count, coarse-grain, OO superscalar internals
- How is the network used?
- What is needed in NoC for future CMP?
- Want system-level view of performance, power and fault-tolerance
- Most current metrics concern the NoC and 'guess' what this means for the system level
- Want to explore solutions at all levels
7 Some Existing CMP Approaches
- SimpleScalar-based CMP simulator
- Hydra (4-core MIPS CMP simulator)
- CMP-SIM (extension of SimpleScalar)
- SESC Superscalar (1.5 MIPS on a 3 GHz P4)
- GEMS (commercial Simics-based)
- ML-RSIM (SPARC RSIM-based)
8 Java Virtual Machine
- Platform with standard library
- Virtual processor executing the Java instruction set ('bytecode')
- Compilable to native platform
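To make the "virtual processor" bullet concrete, here is a minimal sketch of a stack machine that interprets four real JVM opcodes (bipush, iadd, imul, ireturn). The class name TinyBytecodeVm and its run method are hypothetical and not part of JavaTile; a real JVM also handles locals, objects, method calls and verification, all of which are omitted here.

```java
// Illustrative only: a toy interpreter for a tiny subset of JVM bytecode.
// TinyBytecodeVm is a hypothetical name, not a JavaTile or JDK class.
final class TinyBytecodeVm {
    // Opcode values as defined by the JVM instruction set.
    static final int BIPUSH = 0x10;   // push a signed byte operand
    static final int IADD = 0x60;     // pop two ints, push their sum
    static final int IMUL = 0x68;     // pop two ints, push their product
    static final int IRETURN = 0xAC;  // pop an int and return it

    private TinyBytecodeVm() {}

    static int run(byte[] code) {
        int[] stack = new int[16];  // operand stack (fixed size for the sketch)
        int sp = 0;                 // next free stack slot
        int pc = 0;                 // offset of the current instruction
        while (true) {
            int opcode = code[pc] & 0xFF;
            switch (opcode) {
                case BIPUSH:
                    stack[sp++] = code[pc + 1];  // operand byte is sign-extended
                    pc += 2;
                    break;
                case IADD: {
                    int b = stack[--sp];
                    int a = stack[--sp];
                    stack[sp++] = a + b;
                    pc += 1;
                    break;
                }
                case IMUL: {
                    int b = stack[--sp];
                    int a = stack[--sp];
                    stack[sp++] = a * b;
                    pc += 1;
                    break;
                }
                case IRETURN:
                    return stack[--sp];
                default:
                    throw new IllegalArgumentException(
                            "unsupported opcode 0x" + Integer.toHexString(opcode));
            }
        }
    }
}
```

For example, the byte sequence `bipush 2, bipush 3, iadd, bipush 4, imul, ireturn` evaluates (2 + 3) * 4 on the operand stack.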
9 Java Advantages
- A widely deployed standard platform
- Its 'machine code' is itself object-oriented, with type information
- Amenable to static code analysis
- Tools to run efficiently, or compile to a native executable
10 JavaTile Processing Element
11 JavaTile System
12 Bytecode Instrumentation
- Hook into all instructions that may cause NoC traffic
Fibonacci2()
  Code:
    0: bipush 0
    2: bipush -33
    4: invokestatic #23    //Method monitor/Monitor.methodStart(II)V
    7: sipush -29729
   10: sipush 0
   13: invokestatic #26    //Method monitor/Monitor.jumpMarker(II)V
   16: aload_0
   17: sipush 1
   20: invokestatic #30    //Method monitor/Monitor.syncCycleCount(I)V
   23: invokespecial #32   //Method java/lang/Object."<init>":()V
   26: sipush -29729
   29: sipush 4
   32: invokestatic #35    //Method monitor/Monitor.postMethodCall(II)V
   35: return
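The instrumented constructor above calls four static hooks on monitor/Monitor. The sketch below shows roughly what such a hook class could look like at source level. Only the method names and descriptors ((II)V and (I)V) come from the listing; the parameter meanings, the method bodies, the accessor methods and the InstrumentedFibonacci helper are assumptions made for illustration, and the package declaration is left out to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the monitor/Monitor class named in the listing.
// Names and descriptors match the listing; everything else is assumed.
final class Monitor {
    private static long cycles = 0;
    private static final List<String> events = new ArrayList<>();

    private Monitor() {}

    // (II)V: arguments assumed to identify the method being entered.
    static void methodStart(int a, int b) {
        events.add("methodStart(" + a + "," + b + ")");
    }

    // (II)V: arguments assumed to identify a method and a block within it.
    static void jumpMarker(int methodId, int blockIndex) {
        events.add("jumpMarker(" + methodId + "," + blockIndex + ")");
    }

    // (I)V: argument assumed to be a cycle count to add to the local clock.
    static void syncCycleCount(int delta) {
        cycles += delta;
    }

    // (II)V: arguments assumed to identify a method and a call site.
    static void postMethodCall(int methodId, int callIndex) {
        events.add("postMethodCall(" + methodId + "," + callIndex + ")");
    }

    // Accessors below are additions for the sketch, not taken from the listing.
    static long cycles() { return cycles; }
    static List<String> events() { return events; }
    static void reset() { cycles = 0; events.clear(); }
}

// Hand-written source-level equivalent of the hook calls in the listing.
final class InstrumentedFibonacci {
    private InstrumentedFibonacci() {}

    static void run() {
        Monitor.methodStart(0, -33);
        Monitor.jumpMarker(-29729, 0);
        Monitor.syncCycleCount(1);
        // The original constructor body (the call to Object.<init>) runs here.
        Monitor.postMethodCall(-29729, 4);
    }
}
```

Keeping the hooks as static methods with int arguments matches the listing, where each hook costs only a few pushes and one invokestatic.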
13 Current Flow
14 Problems
- Garbage Collection
- Local memory vs global memory allocation
- Passing by pointers (ownership)
- Push versus Pull
- No Inlining
- Auto-Parallelization
- Debugging
15 Auto-Parallelization
- Software Pipelining
  - e.g. MIT RAW Compiler [4]
  - e.g. Princeton DSWP (Decoupled SWP) [5]
- Thread-Level Speculation
  - Loop-level (e.g. Stanford Jrpm) [6]
  - Method-level (e.g. SableSpMT) [7]
- Affine Partitioning
  - e.g. Incorporated in Stanford SUIF [8]
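As a rough illustration of the decoupling idea behind software pipelining across threads, the sketch below splits a linked-list loop by hand into two stages joined by a bounded queue: one thread does the pointer chasing, the other does the per-node computation. This is not the output of the DSWP or RAW compilers, which perform such a split automatically. PipelineSketch, Node, END and the queue capacity of 16 are all hypothetical choices for the example.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustrative only: a hand-made two-stage pipeline over a linked list.
final class PipelineSketch {
    static final class Node {
        final int value;
        final Node next;
        Node(int value, Node next) { this.value = value; this.next = next; }
    }

    // Sentinel that tells the consumer stage the list is exhausted.
    private static final Node END = new Node(0, null);

    private PipelineSketch() {}

    // Original sequential loop: traversal and computation interleaved.
    static long sumOfSquaresSequential(Node head) {
        long total = 0;
        for (Node n = head; n != null; n = n.next) {
            total += (long) n.value * n.value;
        }
        return total;
    }

    // Same loop split into a traversal stage and a compute stage.
    static long sumOfSquaresPipelined(Node head) {
        BlockingQueue<Node> queue = new ArrayBlockingQueue<>(16);
        Thread traversal = new Thread(() -> {
            try {
                for (Node n = head; n != null; n = n.next) {
                    queue.put(n);  // stage 1: pointer chasing only
                }
                queue.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        traversal.start();
        try {
            long total = 0;
            for (Node n = queue.take(); n != END; n = queue.take()) {
                total += (long) n.value * n.value;  // stage 2: computation only
            }
            traversal.join();
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("pipeline interrupted", e);
        }
    }

    // Helper to build a list from values, in order.
    static Node listOf(int... values) {
        Node head = null;
        for (int i = values.length - 1; i >= 0; i--) {
            head = new Node(values[i], head);
        }
        return head;
    }
}
```

Data flows one way, from the traversal stage to the compute stage, so the first stage can run ahead of the second by up to the queue capacity. On a CMP this queue traffic would be carried by the NoC.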
16 References
- [1] Intel Polaris, from IDF 2006 slides, photo at http://www.tomshardware.com
- [2] W. Eatherton, The Push of Network Processing to the Top of the Pyramid, Keynote Slides at http://www.cesr.ncsu.edu/ancs/slides/eatherton/Keynote.pdf
- [3] Pestana et al, Cost-Performance Trade-Offs in Networks on Chip: A Simulation-Based Approach, DATE 2004
- [4] Waingold et al, Baring it All to Software: Raw Machines, Computer Vol 30, No 9, 1997
- [5] Ottoni et al, Automatic Thread Extraction with Decoupled Software Pipelining, MICRO 2005
- [6] Chen et al, The Jrpm System for Dynamically Parallelizing Sequential Java Programs, IEEE Micro Vol 23, No 6, Nov/Dec 2003
- [7] Pickett et al, SableSpMT: a software framework for analysing speculative multithreading in Java, PASTE Workshop 2006
- [8] Lim et al, An affine partitioning algorithm to maximize parallelism and minimize communications, ACM SIGARCH 1999