Title: Pr
1 Multi-Programming and Scheduling Design for Applications of Interactive Simulation
Jean-Louis Roch et al.
http://moais.imag.fr
Louvre, Musée de l'Homme. Sculpture (head). Artist: anonymous. Origin: Rapa Nui (Easter Island). Date: between the XIth and the XVth century. Dimensions: 1.70 m high.
EVALUATION SEMINAR - RESEARCH THEME Num B "Grids and high-performance computing"
March 27-28, 2008
2 Staff and Skills
- 1/1/2005: Creation of MOAIS team
- 1/1/2006: Creation of INRIA team-project MOAIS
- Vincent Danjean MdC 9/2005
- Pierre-François Dutot MdC 9/2006
- Thierry Gautier CR
- Guillaume Huard MdC
- Grégory Mounié MdC
- Bruno Raffin CR INRIA
- Jean-Louis Roch MdC, Team leader
- Denis Trystram Prof
- Frédéric Wagner MdC 9/2006
- 1 Invited Prof.: Alfredo Goldman, USP Sao Paulo
- 19 PhD students, 1 engineer
- 14 PhDs defended since 2005
- Parallel algorithms programming
- Scheduling
- Interactive applications
3 Evolution of parallel programming
- Parallelism everywhere
- Distributed, Heterogeneous
- Architectures: MPSoC, Grids, Cluster, SMP, multi-core, GPU
- Programming models and tools: MPI, OpenMP, Cuda (NVidia), MapReduce (Google), TBB (Intel), SPIRIT, Cilk (CilkArts), Fortress (Sun)
4 MOAIS objective
- End-to-end parallel programming solutions for high-performance interactive computing with provable performances
  - optimization, computational steering, VR, embedded
- Performance is multi-objective
QAP/Nugent on Grid5000 (PRISM, GSCOP, DOLPHIN)
Streaming on MPSoCs (ST)
INRIA Grimage platform (MOAIS, PERCEPTION, EVASION)
5 Approach
- To mutually adapt application and scheduling
  - Proactive/static to the platform: the devices evolve gradually
  - Online/dynamic to the execution context: data and resources
  - Tolerant to data variations, failures, perturbation by other applications, ...
- From algorithms to applications
  - Scheduling and parallel programming schemes
  - Programming interfaces and tools
  - Target applications: batch scheduling, combinatorial optimization, computational steering, stream encoding
6 Overview
[Diagram, top to bottom: Interactive application; adaptive control of execution (MOAIS) with a model (abstract representation) and algorithms (scheduling, fault tolerance); Architecture]
7 Research directions and achievements for 2005-2007
- Scheduling
- Interfaces for coordination
- Adaptive algorithms
- Interactive applications
8 1. Scheduling
- Objective A: modeling of scheduling problems for adaptive applications
  - Adaptable parallelism degree for efficient coarse-grain scheduling
  - Parallel task models: moldable tasks, divisible load
- Some results
  - Comparisons and coupling of models [IJFCS 06]
  - Off-line: improvement of performance ratio
    - 3/2-approximation [SIAM J. Comp 07] instead of 2 [Turek et al.] by strip-packing
    - (3?5) for moldable tasks on a grid of clusters [Europar06]
  - On-line: decrease of control overhead
    - work-first principle [Cilk]
    - Extension to general distributed data-flow computations [ICTTA06, ICCS07]
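The moldable task model named above can be illustrated with a small sketch. It is not the algorithm of the cited papers: `MoldableTask`, `min_allotment` and `may_fit` are hypothetical names, and the area test assumes monotone tasks (the work, processors times time, does not decrease when processors are added). For a target schedule length, each task gets the smallest allotment that meets it, and the total work is compared with the available area.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical moldable task: time[k - 1] is its execution time on k processors.
struct MoldableTask {
    std::vector<double> time;
};

// Smallest number of processors (at most m) on which the task finishes
// within `deadline`; returns 0 when no such allotment exists.
std::size_t min_allotment(const MoldableTask& task, std::size_t m, double deadline) {
    for (std::size_t k = 1; k <= task.time.size() && k <= m; ++k) {
        if (task.time[k - 1] <= deadline) return k;
    }
    return 0;
}

// Necessary (not sufficient) condition for a schedule of length `deadline`
// on m processors: every task has an allotment meeting the deadline, and the
// total work of these minimal allotments fits in the m x deadline area.
// Assumption: monotone tasks, so the minimal allotment also minimizes work.
bool may_fit(const std::vector<MoldableTask>& tasks, std::size_t m, double deadline) {
    assert(m > 0 && deadline > 0.0);
    double work = 0.0;
    for (const MoldableTask& task : tasks) {
        const std::size_t k = min_allotment(task, m, deadline);
        if (k == 0) return false;
        work += static_cast<double>(k) * task.time[k - 1];
    }
    return work <= static_cast<double>(m) * deadline;
}
```

A test of this kind can reject a guessed schedule length early; it says nothing about how to build the schedule when the test passes.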
9 1. Scheduling
- Objective B: design of multi-objective scheduling with provable guarantees
  - Simultaneous approximation for each objective
  - Approximated solutions of Pareto optimal solutions
    - Makespan/Reliability [SPAA07]
    - Makespan/Memory [IPDPS08]
  - Generic ?-relaxation scheme [Shmoys et al.]
    - Makespan/Minsum [WEA05]
- Idea: include a smart algorithm (e.g. one for Makespan, one for Minsum) inside a recursive doubling scheme
- For moldable tasks, this yields a bi-approximation with an arbitrary ratio between Cmax and Minsum [WEA05]
[Figure: time axis t with batch boundaries at 0, 2, 4, 8, 16]
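A minimal sketch of the recursive doubling idea, under simplifying assumptions that are not taken from the slides: tasks are sequential (not moldable), durations are finite and non-negative, and a greedy shortest-first, first-fit packing stands in for the "smart algorithm". The function `doubling_schedule` and its parameters are hypothetical. Time is cut into batches whose length doubles, and each batch receives as many remaining tasks as the packing can fit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Returns the start time of each sequential task on m identical processors.
// Batches are [0, unit), then [T, 2T) for T = unit, 2*unit, 4*unit, ...
std::vector<double> doubling_schedule(const std::vector<double>& duration,
                                      std::size_t m, double unit) {
    assert(m > 0 && unit > 0.0);
    std::vector<std::size_t> order(duration.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    // Shortest tasks first: a simple stand-in for the inner algorithm.
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t a, std::size_t b) { return duration[a] < duration[b]; });

    std::vector<double> start(duration.size(), -1.0);  // -1 means "not yet placed"
    std::size_t remaining = duration.size();
    double batch_start = 0.0;
    double batch_len = unit;
    while (remaining > 0) {
        std::vector<double> load(m, 0.0);  // busy time per processor in this batch
        for (std::size_t i : order) {
            if (start[i] >= 0.0) continue;
            // First fit: first processor with enough room left in the batch.
            for (std::size_t p = 0; p < m; ++p) {
                if (load[p] + duration[i] <= batch_len) {
                    start[i] = batch_start + load[p];
                    load[p] += duration[i];
                    --remaining;
                    break;
                }
            }
        }
        batch_start += batch_len;  // the next batch begins where this one ends
        batch_len = batch_start;   // and is as long as all the time elapsed so far
    }
    return start;
}
```

Short tasks land in early batches, which helps the sum of completion times, while each batch stays bounded by a constant times the elapsed time, which is the intuition behind a bound on both criteria.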
10 2. Interfaces for coordination
- Objective: provably efficient control at runtime of the coupling of components with various synchronization constraints
- Kaapi middleware
  - Provable performances
    - Efficient local serialization: work-first principle, zero-copy [J. CLSS07, ICCS07]
    - Scheduling: coarse-grain graph partitioning + work-stealing
  - Fault-tolerance protocols, from scheduling properties
    - coordinated protocol [ICTTA06], original TIC protocol [EIT05, TDSC08]
- Positioning
  - Multi-processor/multi-core architectures: Intel TBB, Cilk
  - Grid / global platforms: tolerate failure and falsification; Satin (FT)
    // Task adding two shared values: read access on a and b, write access on r.
    struct sum {
      void operator()(Shared_r<int> a,
                      Shared_r<int> b,
                      Shared_w<int> r) {
        r.write(a.read() + b.read());
      }
    };

    // Task computing the n-th Fibonacci number into r.
    struct fib {
      void operator()(int n, Shared_w<int> r) {
        if (n < 2) r.write(n);
        else {
          int r1, r2;
          Fork<fib>()(n-1, r1);    // child task for fib(n-1)
          Fork<fib>()(n-2, r2);    // child task for fib(n-2)
          Fork<sum>()(r1, r2, r);  // runs once r1 and r2 are produced
        }
      }
    };
[Figure labels: local stack; runtime; distributed nested macro-dataflow graph]
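The work-first principle and work-stealing named on this slide can be pictured with a per-worker task deque. The sketch below is a generic, mutex-protected illustration, not Kaapi's implementation (which the slides do not show); `WorkDeque` and `check_orders` are hypothetical names. The owner pushes and pops at one end, so its execution order matches a sequential depth-first run, and a thief takes from the other end.

```cpp
#include <cassert>
#include <deque>
#include <mutex>
#include <utility>

template <typename Task>
class WorkDeque {
public:
    // Owner side: newly created tasks go to the bottom of the deque.
    void push(Task task) {
        std::lock_guard<std::mutex> lock(mutex_);
        tasks_.push_back(std::move(task));
    }
    // Owner side: take the most recently pushed task (stack order).
    bool pop(Task& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (tasks_.empty()) return false;
        out = std::move(tasks_.back());
        tasks_.pop_back();
        return true;
    }
    // Thief side: take the oldest task from the opposite end.
    bool steal(Task& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (tasks_.empty()) return false;
        out = std::move(tasks_.front());
        tasks_.pop_front();
        return true;
    }

private:
    std::mutex mutex_;
    std::deque<Task> tasks_;
};

// Single-threaded check of the two access orders.
inline void check_orders() {
    WorkDeque<int> d;
    d.push(1);
    d.push(2);
    d.push(3);
    int t = 0;
    const bool popped = d.pop(t);
    assert(popped && t == 3);  // the owner sees the newest task
    const bool stolen = d.steal(t);
    assert(stolen && t == 1);  // a thief sees the oldest task
    (void)popped;
    (void)stolen;  // silence unused warnings when NDEBUG disables assert
}
```

In a recursive computation such as fib, the oldest task is usually the largest, so a steal moves a big piece of work and steals stay rare; a production runtime would replace the mutex with a cheaper protocol on the owner side.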
11 2. Kaapi: support and transfer
- Distributed implementation of the CAPE-Open standard for process engineering computations [IFP]
- Cluster implementation of compliant runtime [RSI/Indiss-RT]
- Quadratic assignment [ANR CHOC]
- Finite element computations [ANR DISCOGRID]
- Cryptographic S-Box selection [ANR SAFESCALE]
- Probabilistic inference engine [ProBayes]
12 3. Adaptive algorithms
Objective: to design and analyze algorithms that may obliviously adapt their execution under the control of the scheduling
[Figure: a sequential algorithm and parallel algorithms 1 (P = 2), 2 (P = 100), ..., k (P = 8). Which one to select?]
13 3. Adaptive algorithms
- Heterogeneous resources, variable speeds: work-stealing to obliviously self-tune granularity
- But work Wp increases when depth Dp decreases: multi-objective problem
- Adaptive recursive coupling of algorithms [Europar06, PASCO07, PDP08]
  - Relaxation: sequential / parallel work-stealing
  - Minimize both the work Wp and the depth Dp
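One way to picture the sequential / parallel coupling is a shared interval of remaining iterations: the owner consumes it from the front in small sequential chunks, and an idle processor takes half of what is left from the back. This is a simplified sketch, not the algorithms of the cited papers; `Range`, `take_front` and `steal_back_half` are hypothetical names, and the synchronization a real runtime needs between owner and thief is left out.

```cpp
#include <cassert>
#include <cstddef>

// Half-open interval [begin, end) of remaining loop iterations (begin <= end).
struct Range {
    std::size_t begin;
    std::size_t end;
    std::size_t size() const { return end - begin; }
};

// Owner: detach at most `grain` iterations from the front, to be run by the
// sequential algorithm with no parallel overhead.
Range take_front(Range& work, std::size_t grain) {
    assert(grain > 0);
    const std::size_t n = grain < work.size() ? grain : work.size();
    const Range taken{work.begin, work.begin + n};
    work.begin += n;
    return taken;
}

// Thief: detach the last half of whatever remains (rounded down).
Range steal_back_half(Range& work) {
    const std::size_t n = work.size() / 2;
    const Range stolen{work.end - n, work.end};
    work.end -= n;
    return stolen;
}
```

Parallel work is created only when a steal actually happens, so with no idle processor the execution stays close to the sequential algorithm, and the grain of the stolen parts follows the amount of work remaining rather than a fixed block size.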
14 3. Adaptive algorithms
- Cache- and processor-oblivious stream computations [PDP07]
- AWS: adaptive work-stealing for MPSoCs
- Use case: HDTV on MPSoCs, ST Microelectronics film grain technology
[Figure: MPSoC flow with an application description (potential parallelism, AWS API), an architecture description (SPIRIT / IP-XACT) and a simulator]
- Near-optimal experimental results [PDP08]
15 3. Adaptive algorithms
- Adaptive 3D-vision [VR07]
  - Real-time constraint: 30 frames per second
- Adaptive heterogeneous coupling with Kaapi: CPU + GPU [EGPV07]
[Figure labels: maximum precision; level of detail; 1 to 16 CPUs]
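The trade-off between the 30 frames per second constraint and the level of detail can be sketched as a budget check. This is an illustration only: the cost table, the function `pick_level` and the ideal linear speedup are assumptions, not measurements or methods from the slides, where the adaptation is driven by the scheduler rather than by a static table.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// cost_ms[l]: hypothetical estimate of the one-CPU time (in ms) needed to
// process one frame at level of detail l.
// Returns the highest level that fits the frame budget, or -1 if none does.
int pick_level(const std::vector<double>& cost_ms, unsigned cpus, double fps) {
    assert(cpus > 0 && fps > 0.0);
    const double budget_ms = 1000.0 / fps;  // about 33.3 ms at 30 frames per second
    int best = -1;
    for (std::size_t l = 0; l < cost_ms.size(); ++l) {
        // Assumes ideal linear speedup, which a real system will not reach.
        if (cost_ms[l] / cpus <= budget_ms) best = static_cast<int>(l);
    }
    return best;
}
```

With such a model, adding CPUs raises the precision reachable within the same frame time, which matches the axes listed for the figure (level of detail against 1 to 16 CPUs).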
16 4. Interactivity
- Motivation: parallelism for interactive applications
- Challenging application: multi-camera, multi-CPU, multi-GPU, multi-display
- Grimage platform (2004)
  - 30-node cluster
  - 15 cameras
  - 16 projectors
- Positioning: other platforms
  - Blue-C, ETH Zurich, 2005; Tele-Immersion at UC Berkeley, 2005
- Specificity: collaboration with
  - EVASION (real-time physics simulation)
  - PERCEPTION (computer vision)
17 4. Interactivity
- Middleware dedicated to interactive applications
  - Distributed components, moldable
  - Parallel code coupling
  - Static coarse-grain mapping
HDTV player on 12 Mpixels display wall (16 projectors), CPUs + GPUs
18 Summary of 2005-2007
- Multi-objective Adaptive Performance
- Applications are time-consuming but essential to
validate scientific approach
19 Some facts
- Publications
  - 127 in 3 years, 19 rank 1
  - 17 Int. Journal (SIAM J.Comp, IEEE TC, TPDS, TDSC, EJOR, FCS, ...)
  - 59 Int. Conf (SPAA, IPDPS, CCGrid, VR, VIS, Europar, ICCS, Siggraph)
- Contracts
  - Industry partners: STM, IFP, CEA, Bull, C-S, DCN, ...
  - 2 ARC, 5 ANRs
  - 1 pole MINALOGIC
  - 2 Europe, 1 Ass. team
- Software
  - Kaapi, FlowVR, Taktuk, AWS
20 Highlights
- 1st prize, Plugtest Nov. 2007, Nqueens challenge
  - Dec. 2006: special Jury prize
  - Nov. 2007: 1st prize
  - Nqueens(23) in 2107 s with 3654 cores
- SIGGRAPH Aug. 2007: Emerging Technologies demo
  - 4000 visitors
- Valorization: start-up (Sep. 2007)
  - co-founded by former PhD C. Menier (joint MOAIS / PERCEPTION)
  - transfer: parallel 3D modeling
21 Research directions 2008-2012 (1/3)
- To push the interactions to large scale
- Heterogeneous computing
- Complex memory hierarchy
- Provable performances vs adversary
- Game theory
22 Research directions 2008-2012 (2/3)
- Scheduling: multi-objective
  - Large systems, many users, various objectives: equity / fairness
  - Extra global objective to non-cooperative strategies
- Coordination interface -> Runtime for HIPC on demand
  - Work-stealing based runtime extended to complex memory hierarchy
  - Dependable computing on global computing platforms
23 Research directions (3/3)
- Adaptive algorithms
  - Large data sets, out-of-core issues
  - Framework / high-level library
- High-performance interactive computing
  - Interactive resolution of complex problems (scheduling)
  - Grimage: explore new 3D interactions [PERCEPTION]
  - Parallelism for adaptive interactive performance [EVASION, ALCOVE]
    - Kaapi: partitioning + work-stealing to balance load between heterogeneous resources (CPUs / GPUs)
24 Summary
- To provide parallel programming schemes, interfaces and tools for high-performance interactive computing that make it possible to achieve provable performance on distributed parallel architectures, from multi-processor systems-on-chip to lightweight grids and global computing platforms.
SIGGRAPH07 MOAIS - PERCEPTION - EVASION
25 Former members
- 13 PhDs defended in 2005-2007
  - Now: 2 at INRIA (Alcove, Cepage); 7 in universities (Reims, IKI Iran, Luxembourg, Vannes, Damascus, Warsaw, Colima); 1 postdoc (Iowa SU); 1 start-up co-founder (4DViews); 2 in industry (IFP, Amadeus)
- 2 postdocs
  - Now: Univ. Paris 6, Petrobras
- 1 long-term visit
  - Axel Krings, Idaho State Univ
- 3 engineers
  - Now: INRIA/PARIS, industry