Title: Bulk-Synchronous Parallel ML
1Frédéric Gava
Bulk-Synchronous Parallel ML: Implementation of
the Parallel Superposition
2Background
Parallel programming
3Projects
- 2002-2004
- ACI Grid
- LIFO, LACL, PPS, INRIA
- Design of parallel and Grid libraries for OCaml.
- 2004-2007
- ACI Young researchers
- LIFO, LACL
- Production of a programming environment in which
certified parallel programs can be written and
safely executed.
4Outline
- The BSML language
- Multi-programming (superposition)
- Implementation of the superposition
- Conclusion and future work
5The BSML language
6The BSML spirit
- "Bugs grow faster than Moore's law." (G. Berry)
- High-level language ⇒ fewer lines of code ⇒ fewer bugs
- Certified library ⇒ fewer bugs
- "Small is beautiful." (R. H. Bisseling)
- BSML uses only 5 primitives
- "Who would drive a non-deterministic car?" (G. Berry)
- Confluence property of the BSML semantics
- French proverb: "All roads lead to Rome, but
the best way is to choose the shortest."
- One can give BSP costs to BSML programs
- Differs from concurrent programming: costs and
confluence
7The BSP model
BSP architecture
- Characterized by
- p: number of processors
- r: processor speed
- L: global synchronization
- g: phase of communication (at most 1 word sent
or received by each processor)
8Model of execution
Beginning of the super-step i
Local computing on each processor
Global (collective) communications between
processors
Global synchronization: exchanged data available
for the next super-step
$\mathrm{Cost}(i) = \left(\max_{0 \le x < p} w_x^{i}\right) + h_i \times g + L$
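The cost of one super-step can be evaluated directly from the BSP parameters. The following is a minimal sketch: the record type `bsp_params`, the function `superstep_cost`, and the numeric values are illustrative assumptions, not part of BSML.

```ocaml
(* Illustrative BSP parameters: p processors, g communication cost per word,
   l cost of a global synchronization. Names are assumptions for this sketch. *)
type bsp_params = { p : int; g : float; l : float }

(* Cost of one super-step: (max over processors of local work) + h * g + L,
   where work.(x) is the local work of processor x and h is the maximum
   number of words sent or received by any processor. *)
let superstep_cost params (work : float array) (h : float) =
  assert (Array.length work = params.p);
  let w_max = Array.fold_left max 0.0 work in
  w_max +. h *. params.g +. params.l

let () =
  let params = { p = 4; g = 2.0; l = 10.0 } in
  let cost = superstep_cost params [| 3.0; 7.0; 5.0; 1.0 |] 6.0 in
  (* 7.0 + 6.0 * 2.0 + 10.0 *)
  Printf.printf "cost = %.1f\n" cost  (* prints: cost = 29.0 *)
```

The slowest processor determines the local computing term, which is why the maximum is taken over all p processors.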
9Example: broadcast
- Direct broadcast (one super-step)
BSP cost: $p \times n \times g + L$
- Broadcast with 2 super-steps
BSP cost: $2 \times n \times g + 2 \times L$
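The two broadcast costs can be compared numerically. This is a small sketch with hypothetical function names and made-up machine parameters; it only evaluates the two cost formulas of this slide.

```ocaml
(* n is the size of the broadcast value in words; p, g and l are the BSP
   parameters. Function names are illustrative. *)
let direct_broadcast_cost ~p ~n ~g ~l = float_of_int p *. n *. g +. l

let two_phase_broadcast_cost ~n ~g ~l = 2.0 *. n *. g +. 2.0 *. l

let () =
  (* Made-up parameters: 16 processors, 1000 words, g = 2, L = 500. *)
  let direct = direct_broadcast_cost ~p:16 ~n:1000.0 ~g:2.0 ~l:500.0 in
  let two_phase = two_phase_broadcast_cost ~n:1000.0 ~g:2.0 ~l:500.0 in
  Printf.printf "direct = %.0f, two-phase = %.0f\n" direct two_phase
  (* prints: direct = 32500, two-phase = 5000 *)
```

With these values the second variant pays one extra synchronization but avoids the factor p on the communication term, so it becomes cheaper as soon as the message is large enough.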
10The BSML language
λ-calculus
- Structured parallelism as an explicit parallel
extension of ML
- Functional language with BSP cost predictions
- Allows the implementation of skeletons
- Implemented as a parallel library for the
"Objective Caml" language
- Using a parallel data structure called parallel
vector
11A BSML program
Replicated part
Sequential part
12Parallel primitives of BSML
- Asynchronous primitives
- Creation of a vector (creation of local values)
- mkpar: (int → α) → α par
- Parallel pointwise application
- apply: (α → β) par → α par → β par
- Synchronous and communication primitives
- Communications
- put: (int → α) par → (int → α) par
- Projection of local values (to be replicated)
- proj: α par → (int → α)
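The types of the four primitives can be explored with a sequential model. The sketch below is not the BSML library: it assumes a fixed number of processors (`bsp_p`, a name chosen here) and models a parallel vector as an array with one slot per processor, only to illustrate what each primitive computes.

```ocaml
(* Sequential model of the four primitives; NOT the real BSML library. *)
let bsp_p = 4  (* assumed number of processors *)

(* A parallel vector is modelled as an array of bsp_p local values. *)
type 'a par = 'a array

(* mkpar f: processor i holds the local value (f i). *)
let mkpar (f : int -> 'a) : 'a par = Array.init bsp_p f

(* apply: pointwise application, with no communication. *)
let apply (fs : ('a -> 'b) par) (vs : 'a par) : 'b par =
  Array.init bsp_p (fun i -> fs.(i) vs.(i))

(* put: processor j holds a function giving the message for each destination;
   afterwards processor i holds a function giving the message received from
   each source j. *)
let put (sends : (int -> 'a) par) : (int -> 'a) par =
  Array.init bsp_p (fun i -> fun j -> sends.(j) i)

(* proj: turns a parallel vector into a replicated function on pids. *)
let proj (v : 'a par) : int -> 'a = fun i -> v.(i)

let () =
  let pids = mkpar (fun i -> i) in
  let squares = apply (mkpar (fun _ -> fun x -> x * x)) pids in
  let received = put (mkpar (fun src -> fun dst -> 10 * src + dst)) in
  (* processor 3 holds 3 * 3; processor 2 received 10 * 1 + 2 from processor 1 *)
  Printf.printf "%d %d\n" (proj squares 3) (received.(2) 1)  (* prints: 9 12 *)
```

In this model `put` and `proj` are the only functions that read values across array slots, which mirrors their role as the communication primitives.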
13Semantics
Programming model: easy for proofs (Coq)
Natural semantics
Easy for costs
Execution model: makes asynchronous steps
appear, close to a real implementation
14Natural semantics
- Semantics: a set of axioms and inference rules
- Easy to understand, makes proofs easier
- Example
15Small steps semantics
Local costs
- Semantics: a set of rewriting rules
- Using contexts for the strategy
- Easier understanding of costs and errors
- Example
Global cost
16Distributed semantics
- Semantics: a set of parallel rewriting rules
- SPMD style
Parallel vector
Distributed evaluation
17Multi-programming
18Parallel composition
- Several programs on the same machine
- Primitive of parallel composition: superposition
- Divide-and-conquer BSP algorithms
19Parallel Superposition
- super: (unit → α) → (unit → β) → α × β
- super E1 E2 = (E1 (), E2 ())
- Fusion of communications/synchronisations using
super-threads
- Keeps the BSP model
- Pure functional semantics
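The pure functional reading of the superposition fits in a few lines. This is only a sequential sketch of the result it computes: it evaluates both arguments one after the other and does not model super-threads or the fusion of communications and synchronisations.

```ocaml
(* Sequential sketch of the functional meaning of super: the result is the
   pair of both results. The real primitive also fuses the communication
   phases of the two computations, which is not modelled here. *)
let super (e1 : unit -> 'a) (e2 : unit -> 'b) : 'a * 'b =
  let r1 = e1 () in
  let r2 = e2 () in
  (r1, r2)

let () =
  let (a, b) = super (fun () -> 1 + 2) (fun () -> "bsml") in
  Printf.printf "%d %s\n" a b  (* prints: 3 bsml *)
```

Because both arguments are delayed computations of type `unit -> ...`, the primitive itself decides when each one runs, which is what lets an implementation interleave them.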
20Parallel Superposition
21Implementation of the superposition
22Semantics (1)
23Semantics (2)
24Semantics based implementation
- The semantics reveals 3 low-level primitives
- Send: to send the data of the communication
environment
- Rcv: to receive them
- Wait: to allow a super-thread to wait for its sibling
- BSML primitives are thus simple calls to them
(as in the small-step semantics)
- Super-threads can be implemented using threads
- A scheduler for these threads is thus needed for the
special management of our super-threads
- The communication environment is just a
Hashtable with the pids of super-threads as keys
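A communication environment keyed by super-thread pid could look like the following. This is a hypothetical single-process sketch: the `message` type and the functions `send`, `rcv` and `can_proceed` are names invented here, the network exchange is omitted, and the actual implementation may be organised differently.

```ocaml
(* Hypothetical toy model of the communication environment: a hash table
   keyed by super-thread pid. All names are illustrative assumptions. *)
type message = { dest : int; payload : string }

let env : (int, message list) Hashtbl.t = Hashtbl.create 16

(* Store the data that super-thread pid contributes to the next
   communication phase. *)
let send ~pid msgs = Hashtbl.replace env pid msgs

(* Fetch and remove the entry of super-thread pid (the real exchange
   between processors is not modelled). *)
let rcv ~pid =
  match Hashtbl.find_opt env pid with
  | Some msgs -> Hashtbl.remove env pid; msgs
  | None -> []

(* A super-thread waits until its sibling has also reached the
   communication phase, i.e. until both have an entry. *)
let can_proceed ~pid ~sibling = Hashtbl.mem env pid && Hashtbl.mem env sibling

let () =
  send ~pid:1 [ { dest = 0; payload = "left" } ];
  Printf.printf "%b\n" (can_proceed ~pid:1 ~sibling:2);  (* prints: false *)
  send ~pid:2 [ { dest = 3; payload = "right" } ];
  Printf.printf "%b\n" (can_proceed ~pid:1 ~sibling:2);  (* prints: true *)
  ignore (rcv ~pid:1);
  ignore (rcv ~pid:2)
```

Keying the table by super-thread pid keeps the data of sibling super-threads separate until a single fused communication phase can serve both.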
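25Example: prefix computation
scan: (α → α → α) → α par → α par
$\mathrm{scan}\,(\oplus)\,\langle v_0, \ldots, v_{p-1}\rangle = \langle v_0,\, v_0 \oplus v_1,\, \ldots,\, v_0 \oplus v_1 \oplus \cdots \oplus v_{p-1}\rangle$
$\mathrm{scan}\,(\oplus)\,\langle v_0, \ldots, v_m, \ldots\rangle = \langle w_0, \ldots, w_m, \ldots\rangle$
$\mathrm{scan}\,(\oplus)\,\langle \ldots, v_{m+1}, \ldots, v_{p-1}\rangle = \langle \ldots, w_{m+1}, \ldots, w_{p-1}\rangle$
$\langle w_0, \ldots, w_m,\, w_m \oplus w_{m+1},\, \ldots,\, w_m \oplus w_{p-1}\rangle = \langle v_0,\, v_0 \oplus v_1,\, \ldots,\, v_0 \oplus \cdots \oplus v_m,\, v_0 \oplus \cdots \oplus v_{m+1},\, \ldots,\, v_0 \oplus \cdots \oplus v_{p-1}\rangle$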
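The divide-and-conquer scheme of this slide can be checked with a sequential model. In the sketch below a parallel vector is modelled as an array and the two recursive calls run one after the other; in BSML they would be combined with the superposition. The function names `scan` and `scan_vector` and the choice of the midpoint as split index m are assumptions of this sketch, and the operator is assumed to be associative.

```ocaml
(* Sequential model of the divide-and-conquer prefix computation.
   Computes in place the prefixes of v.(lo) .. v.(hi) under op. *)
let rec scan op (v : 'a array) lo hi =
  if lo < hi then begin
    let m = (lo + hi) / 2 in
    scan op v lo m;            (* left half:  w_lo .. w_m      *)
    scan op v (m + 1) hi;      (* right half: w_(m+1) .. w_hi  *)
    let wm = v.(m) in
    for i = m + 1 to hi do
      v.(i) <- op wm v.(i)     (* combine: w_m op w_i *)
    done
  end

(* Non-destructive wrapper working on a copy of the input vector. *)
let scan_vector op v =
  let w = Array.copy v in
  scan op w 0 (Array.length w - 1);
  w

let () =
  let r = scan_vector ( + ) [| 1; 2; 3; 4; 5 |] in
  Array.iter (Printf.printf "%d ") r;  (* prints: 1 3 6 10 15 *)
  print_newline ()
```

The last element of the left half, w_m, already accumulates v_0 to v_m, so combining it with each element of the right half yields the complete prefixes.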
26Benchmarks
[Figure: benchmark plot. x-axis: size of the polynomials; y-axis: time (s)]
27Conclusion and future work
28Conclusion
- BSML = BSP + ML
- Superposition: primitive of parallel composition
- Small-step semantics of the superposition
- Distributed semantics as a small-step one
- Superposition implemented using threads as in the
small-step semantics
29Future work
- Implementation using continuations
(source-code transformation with the help of
a type checker) and proof of equivalence using
our semantics
- Implementation of bigger algorithms for better
benchmarks of BSML and its superposition
- Implementation of parallel skeletons (management
of tasks) using the superposition?
- BSP model-checking of high-level Petri nets
(M-nets). The main difficulty: finding a non-trivial
algorithm, as the concurrent programming
community does. Possible, but needs more
theoretical optimisations
30Thanks for your attention