Slide 1: Large-scale information sharing
Cambridge Distributed Systems Group
Slide 2: 1. Summary of previous episodes
Slide 3: Milestones
- 1978 Engineering degree, ENSEEIHT
- 1978--1980 PhD student, LAAS
- 1980--1982 Post-doc, MIT Lab. for Computer Science
- 1982--1985 Research scientist (Chargé de Recherches), CMIRH
- 1984--1985 INRIA
- 1986--1999 Research director (Directeur de recherche), scientific lead of the INRIA SOR project
- 1993--1994 Cornell
- 1997 Jini, Sun Research Labs
- 1999 Senior Researcher, MSR Cambridge
Slide 4: Theses supervised
- Yek Loong Chong. U. of Cambridge, 2003.
- Nicolas Richer. Paris 6, 2002.
- Fabrice le Fessant (co-supervision). Polytechnique, 2001.
- Xavier Blondel. CNAM, 2000.
- Aline Baggio. Paris 6, 1999.
- Georges Brun-Cottan. Paris 6, 1998.
- Julien Maisonneuve. Paris 6, 1996.
- Paulo Ferreira. Paris 6, 1996.
- Hervé Soulard. Paris 6, 1995.
- David Plainfossé. Paris 6, 1994.
- Daniel Edelson. UC Santa Cruz, 1993.
- Michel Ruffin. Paris 6, 1992.
- Yvon Gourhant. Paris 6, 1991.
- Sabine Habert. Paris 6, 1989.
- Mesaac Mounchili Makpangou. Paris 6, 1989.
Slide 5: Publications
- Computing Surveys, in progress
- PODC 2001
- Springer book 2000
- PLDI 1998
- ECOOP 1998
- ICDCS 1996
- IWMM 1995
- WDAG 1995
- IEEE book 1994
- OSDI 1994
- ICDCS 1994
- PODC 1992
- RDS 1991
- Computing Systems 1989
- SOSP 1989
- ICDCS 1986
Slide 6: Large-scale information sharing
- Complexity of distributed systems: parallelism, failures, events, latency
- Objects: semantics not predefined
- System
- common tools
- dynamic aspects
- trade-offs
- Address the fundamental problems, long-term impact
Slide 7: Fragmented Objects (1981-1990)
- Structuring
- Controlling transparency
- Proxy
- programmable
- client-specific
- protocol state
- SOS system
- See
- Web proxy
- Jini
- dynamic Web pages
Slide 8: Stub-Scion Pair Chains (1990-1999)
- Automate the management of distributed objects
- References
- Local GCs, asynchronous coordination
- Simplicity, efficiency
- Tolerates detectable failures
- Formal derivation
[Figure: diagram with nodes labelled x, z, y, t]
Slide 9: Asynchronous GC in replicated memory (1993-2000)
- Scalability ⇒ asynchrony
- Consistency
- Local GC
- Distributed GC
- Sufficient safety rules
- Union rule
- Clean propagation
- Comprehensive scanning
- Create before delete
- Causal delivery
- PerDiS
Slide 10: Optimistic replication (2001)
- Decentralised write sharing
- IceCube
- general-purpose reconciliation engine
- operation logging
- parameterised by semantics
- Eventual consistency
- model: operations, constraints
- safety, liveness
- global invariant
- describe and understand the solutions
- Very preliminary
Slide 11: 2. Optimistic replication and IceCube
Slide 12: Optimistic replication
- Replicas of shared objects on sites
- Without synchronisation
- peer-to-peer read
- and update!
- Applications
- high latency networks
- disconnected operation
- cooperative work
- Improves availability, performance
- Consistency a posteriori, offline
- Merge independent updates
Slide 13: Example: cooperative engineering with CVS
- CVS: developing shared code
- Local, disconnected replica: no interference
- Conflicts
- Write same file: syntactic
- Overlap in file: violates edit semantics
- Doesn't compile, fails test: violates application semantics
- Both sides of a conflict are excluded
- Manual repair
Slide 14: Example: Bayou
- General-purpose database
- Any replica can update, log actions
- action = (dependency check, operation, merge-procedure)
- Optimistic replication
- epidemic: exchange logs
- roll-back, replay, commit
- dep-check: semantic check for conflict
- merge-proc: semantic repair
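The action structure on this slide (dependency check, operation, merge procedure) can be illustrated with a toy Python sketch. It is not Bayou's actual API: `BayouWrite`, `apply_write`, `book`, `replay`, and the room-booking scenario are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, str]  # toy database: time slot -> owner (assumption)


@dataclass
class BayouWrite:
    dep_check: Callable[[State], bool]   # semantic check: True means no conflict
    operation: Callable[[State], None]   # the intended update
    merge_proc: Callable[[State], None]  # semantic repair, run on conflict


def apply_write(state: State, w: BayouWrite) -> None:
    """Run the operation if the dependency check passes, else the merge procedure."""
    if w.dep_check(state):
        w.operation(state)
    else:
        w.merge_proc(state)


def book(slot: str, alternative: str, who: str) -> BayouWrite:
    """Hypothetical booking: take `slot`, or fall back to `alternative` if taken."""
    def slot_is_free(s: State) -> bool:
        return slot not in s

    def take_slot(s: State) -> None:
        s[slot] = who

    def take_alternative(s: State) -> None:
        if alternative not in s:
            s[alternative] = who

    return BayouWrite(slot_is_free, take_slot, take_alternative)


def replay(log: List[BayouWrite]) -> State:
    """Roll back to the empty state and replay the whole log in order."""
    state: State = {}
    for w in log:
        apply_write(state, w)
    return state
```

Replaying the same log in the same order at every replica gives the same state, which is why the sketch models roll-back and replay as recomputation from scratch.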
Slide 15: Operation-based model
[Figure: scheduling, commitment]
Slide 16: Execution model
- operation: code, pre/post-conditions
- Schedule must satisfy the pre/post-conditions
- Violation ⇒ conflict
- But pre/post-conditions often unknown
- Conservative approximations
Slide 17: Happens-before
- True constraints unknown
- e1 precedes e2 in process
- e1 sends, e2 receives
- ⇒ e1 → e2
- ¬(e1 → e2) ∧ ¬(e2 → e1)
- ⇒ e1 ∥ e2
- e1 ∥ e2: e1 does not cause e2
- e1 → e2: e1 might cause e2
- Partial order, consistent with causal dependence
- Schedule consistent with →
Slide 18: Syntactic vs. semantic mechanisms
- Scalar timestamps
- no concurrency detection
- very conservative approx. of causality
- Vector timestamps
- detect concurrency
- conservative approx. of causality
- Alternative: explicit constraints
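To illustrate how vector timestamps detect concurrency, here is a minimal Python sketch. The function names (`tick`, `receive`, `leq`, `happens_before`, `concurrent`) and the dictionary representation are assumptions made for this example, not taken from the talk.

```python
from typing import Dict

VectorClock = Dict[str, int]  # site name -> number of events seen from that site


def tick(clock: VectorClock, site: str) -> VectorClock:
    """Return a copy of `clock` advanced by one local event at `site`."""
    new = dict(clock)
    new[site] = new.get(site, 0) + 1
    return new


def receive(local: VectorClock, message: VectorClock, site: str) -> VectorClock:
    """Merge a received timestamp (entry-wise max), then count the receive event."""
    merged = {p: max(local.get(p, 0), message.get(p, 0))
              for p in set(local) | set(message)}
    return tick(merged, site)


def leq(a: VectorClock, b: VectorClock) -> bool:
    """True if every entry of a is <= the matching entry of b."""
    return all(a.get(p, 0) <= b.get(p, 0) for p in set(a) | set(b))


def happens_before(a: VectorClock, b: VectorClock) -> bool:
    """a happened before b: a <= b entry-wise, and the two differ."""
    return leq(a, b) and not leq(b, a)


def concurrent(a: VectorClock, b: VectorClock) -> bool:
    """Neither timestamp dominates the other."""
    return not leq(a, b) and not leq(b, a)
```

A scalar timestamp always orders any two events, so it cannot express the `concurrent` case; this is the sense in which it is the more conservative approximation.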
Slide 19: Constraints between operations
- Not all schedules are acceptable
- Constraints, e.g.
- x > 50
- respect causal ordering
- all-or-nothing transactions
- alternative execution paths
- conflicting operations exclude each other
Slide 20: IceCube: primitive constraints
- Constraint: predicate (action, schedule)
- Declarative (static, binary)
- MustHave: a ⊲ b
- if a ∈ s and a ⊲ b then b ∈ s
- (not necessarily contiguous nor in order)
- Order: a → b
- if a, b ∈ s and a → b then a before b in s
- (not necessarily both nor contiguous)
- Imperative (dynamic): a.preCondition(State)
Slide 21: IceCube: log constraints
[Figure labels: alternatives, predecessor-successor, parcel]
- Express user intents (→ is Order, ⊲ is MustHave)
- Predecessor/successor: a → b ∧ b ⊲ a
- b uses effect of a: a causes b
- Parcel: a ⊲ b ∧ b ⊲ a
- cf. transaction
- Alternatives: a → b ∧ b → a
Slide 22: IceCube: object constraints
- Shared data type advertises static semantics
- mutually exclusive
- a → b ∧ b → a
- best order (e.g. bank credits before debits)
- a → b
- Only between concurrent actions
- Also dynamic constraints
[Figure labels: mutually exclusive, best order, commute]
Slide 23: Optimistic concurrency control: scheduling
- Two actions are either
- Dependent
- ⇒ schedule in dependence order
- Commutative
- ⇒ schedule in any order
- Concurrent with favourable order
- schedule in non-conflicting order
- Concurrent and conflicting
- or exclude one, the other, or both
Slide 24: IceCube scheduling
- Insight
- conflict = choice of which action to exclude
- maximise value
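The insight on this slide can be sketched as an exhaustive search in Python. This is a toy under stated assumptions, not IceCube's scheduler: the per-action `value` weights, the names `static_ok` and `best_schedule`, and the brute-force strategy are all hypothetical. Only static MustHave and Order constraints are checked.

```python
from itertools import combinations, permutations
from typing import Dict, List, Sequence, Set, Tuple

Pair = Tuple[str, str]


def static_ok(schedule: Sequence[str], must_have: Set[Pair], order: Set[Pair]) -> bool:
    """Check MustHave closure and Order consistency of one candidate schedule."""
    pos = {a: i for i, a in enumerate(schedule)}
    # (a, b) in must_have: if a is scheduled, b must be scheduled too.
    closed = all(b in pos for a, b in must_have if a in pos)
    # (a, b) in order: if both are scheduled, a must come before b.
    ordered = all(pos[a] < pos[b] for a, b in order if a in pos and b in pos)
    return closed and ordered


def best_schedule(actions: List[str], value: Dict[str, int],
                  must_have: Set[Pair], order: Set[Pair]) -> Tuple[List[str], int]:
    """Try every subset and ordering; keep the valid schedule of highest total value."""
    best: Tuple[str, ...] = ()
    best_value = 0
    for k in range(len(actions), 0, -1):
        for subset in combinations(actions, k):
            total = sum(value[a] for a in subset)
            if total <= best_value:
                continue  # cannot improve on what we already have
            for candidate in permutations(subset):
                if static_ok(candidate, must_have, order):
                    best, best_value = candidate, total
                    break
    return list(best), best_value
```

For example, if `a` and `b` exclude each other (an Order cycle) and `c` MustHave `a`, then excluding `a` also loses `c`, so the search weighs keeping `a` and `c` against keeping `b` alone. The search is exponential and only meant to make the choice explicit.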
Slide 25: IceCube scheduling model
[Figure: scheduling model, with dynamic constraints and object constraints]
Slide 26: Search vs. syntactic order
Slide 27: Performance of IceCube heuristics
Slide 28: 3. Eventual consistency
Slide 29: Eventual consistency
- Consistent with user intents
- Consistent with data invariants
- Replicas consistent with each other
- Eventual consistency
- Each site receives all actions
- Schedule that satisfies constraints
- Common stable prefix
- Equivalent results
Slide 30: Stability
- Peer-to-peer, indefinitely tentative updates, advisory reconciliation: OK
- But stability needed
- Users, external world depend on it
- Garbage collect multilog
- Only stable actions relevant for consistency
- Stable: eventually, decisions not changed
- Committed: definitely included in all schedules
- Aborted: definitely excluded
Slide 31: Eventual consistency: intuitive
- Liveness: sites receive all operations
- Epidemic multicast
- Quickly
- Safety: sites compute the same value
- Equivalent schedules
- Stability: actions eventually not undone
- Commit / abort
Slide 32: Sound schedules
- s sound: s satisfies constraints for all its actions
- Closed for MustHave
- a ∈ s ∧ a ⊲ b ⇒ b ∈ s
- Consistent with Order (→ acyclic)
- (a, b ∈ s ∧ a → b) ⇒ a before b in s
- Actions succeed
- a ∈ s ⇒ a.preCondition(state)
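The three soundness conditions on this slide can be checked mechanically. The Python sketch below is illustrative only: `is_sound`, the `run` table (each entry tests its precondition and, if it holds, applies the action), and the bank-account helpers `credit` and `debit` are hypothetical names. It assumes each action appears at most once in a schedule.

```python
from typing import Callable, Dict, List, Set, Tuple

Pair = Tuple[str, str]
Step = Callable[[dict], bool]  # returns False if the precondition fails


def is_sound(schedule: List[str], must_have: Set[Pair], order: Set[Pair],
             run: Dict[str, Step], initial: dict) -> bool:
    pos = {a: i for i, a in enumerate(schedule)}
    # Closed for MustHave: a in s and (a MustHave b) implies b in s.
    for a, b in must_have:
        if a in pos and b not in pos:
            return False
    # Consistent with Order: if both are in s, a comes before b.
    for a, b in order:
        if a in pos and b in pos and pos[a] > pos[b]:
            return False
    # Actions succeed: every precondition holds in the state where it runs.
    state = dict(initial)  # work on a copy, leave the caller's state intact
    for a in schedule:
        if not run[a](state):
            return False
    return True


def credit(amount: int) -> Step:
    def step(state: dict) -> bool:
        state["balance"] += amount
        return True
    return step


def debit(amount: int) -> Step:
    def step(state: dict) -> bool:
        if state["balance"] < amount:  # precondition: sufficient funds
            return False
        state["balance"] -= amount
        return True
    return step
```

With a starting balance of 50, scheduling a credit of 50 before a debit of 80 is sound, while the reverse order fails the debit's precondition.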
Slide 33: Maintaining local soundness
- ∀ site i, schedule s_i
- Legal
- committed_i ⊆ s_i ∧ aborted_i ∩ s_i = ∅
- Safe
- s_i sound
- When aborting a, also abort actions that MustHave a
- When committing a, also abort uncommitted actions that are Ordered before a
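The two maintenance rules can be sketched as cascading updates on a site's `committed` and `aborted` sets. This Python fragment is an assumption-laden illustration: the function names, the pair encoding of constraints, and the error raised when a cascade would reach a committed action (a case the slide does not cover) are all choices made for this example.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]


def abort(a: str, must_have: Set[Pair], committed: Set[str], aborted: Set[str]) -> None:
    """Abort a and, transitively, every action that MustHave an aborted action."""
    pending = [a]
    while pending:
        x = pending.pop()
        if x in aborted:
            continue
        if x in committed:
            # Not covered by the slide: treated as an error in this sketch.
            raise ValueError(f"cannot abort committed action {x!r}")
        aborted.add(x)
        # (d, x) in must_have means "d MustHave x": d cannot stay without x.
        pending.extend(d for d, target in must_have if target == x)


def commit(a: str, order: Set[Pair], must_have: Set[Pair],
           committed: Set[str], aborted: Set[str]) -> None:
    """Commit a, then abort uncommitted actions that are Ordered before a."""
    if a in aborted:
        raise ValueError(f"cannot commit aborted action {a!r}")
    committed.add(a)
    # (b, a) in order means "b before a": once a is committed, b is too late.
    for b, later in order:
        if later == a and b not in committed:
            abort(b, must_have, committed, aborted)
```

For instance, if `b` is Ordered before `a` and `c` MustHave `b`, committing `a` first aborts `b`, which in turn aborts `c`.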
Slide 34: Schedule equivalence
- Equivalence: s ≡ t
- s, t sound
- a ∈ s ⇔ a ∈ t
- ordering is irrelevant!
- Eventual consistency reduces to
- Same committed operations everywhere
- All committed operations in every schedule
- Schedules are sound
Slide 35: Eventual consistency
- ∀ action a, site i, k, schedule s_i
- Legal
- committed_i ⊆ s_i ∧ aborted_i ∩ s_i = ∅
- Safe
- s_i sound
- Live (◇ = eventually)
- ◇ a ∈ committed_i ∪ aborted_i
- a ∈ committed_i ⇒ ◇ a ∈ committed_k
- a ∈ aborted_i ⇒ ◇ a ∈ aborted_k
Slide 36: Global safety invariant
- sound( ⋃_{t ∈ time, i ∈ sites} committed_i(t) )
- Closed for MustHave ⊲
- Non-conflicting: acyclic in Order →
- Actions successful
- ∀s, ∀a: a.preCondition(state)
- Very strong!
- i commits a at t only if j won't commit conflicting b at t
- a will succeed everywhere, anytime
Slide 37: Maintaining global invariant
- Alternatives
- Common knowledge: deterministic abort rule (likewise for commit?)
- TWR (Thomas's write rule)
- Unilateral abort (likewise for commit?)
- CVS, Holliday 2000
- Single primary site decides
- Bayou, CVS
- Consensus before deciding
- Deno, Holliday 2000-2002
Slide 38: Stability with TWR
- Independent objects
- Independent writes (no MustHave nor Order)
- All sites take same decision
- Given two writes to same object, abort the earlier
- Whether concurrent or not
- Write stable when seen by all sites
- Disjointness committedi ?
- Soundness: no MustHave (no transactions)
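The rule "given two writes to the same object, abort the earlier" can be sketched as a last-writer-wins register in Python. The `Register` class and the `(timestamp, site id)` stamp used to break ties are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple

Stamp = Tuple[int, str]  # (timestamp, site id): compared as a tuple, so totally ordered


@dataclass
class Register:
    value: Any = None
    stamp: Optional[Stamp] = None

    def write(self, value: Any, stamp: Stamp) -> bool:
        """Apply the write unless a later (or equal) write was already applied.

        Returns True if the write took effect, False if it was discarded as
        the earlier of two writes to this object.
        """
        if self.stamp is not None and stamp <= self.stamp:
            return False
        self.value, self.stamp = value, stamp
        return True
```

Because the decision depends only on the stamps, two replicas that receive the same writes in different orders end with the same value, without exchanging any further messages.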
Slide 39: Stability in Bayou
- Databases
- Disjoint
- Independent: no multi-DB transaction
- 1 primary / database
- Log constraints: transactions, time order
- Disjointness: only 1 site decides about a, the primary for the database that a updates
- Soundness: whole transaction commits or aborts
Slide 40: Holliday's pre-commit protocol
- Log constraints
- multi-object transactions
- happens-before order
- Read transactions: commit locally
- Read-Write transactions: consensus to commit
- convert locks to intentions
- pre-commit, vote
- commit if quorum yes
- abort if anti-quorum no or conflict with committed
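The voting outcome at the end of this slide can be sketched as a small decision function. This is a simplification under explicit assumptions, not Holliday's protocol: the quorum is taken to be a simple majority, the anti-quorum is taken to be any number of "no" votes that makes a "yes" quorum unreachable, and the name `decide` and its parameters are hypothetical.

```python
from typing import List


def decide(votes: List[bool], n_sites: int, conflicts_with_committed: bool = False) -> str:
    """Return "commit", "abort" or "pending" for a read-write transaction.

    votes: the yes (True) / no (False) votes received so far.
    n_sites: total number of voting sites (assumed known and fixed).
    """
    quorum = n_sites // 2 + 1          # assumption: simple majority
    yes = sum(1 for v in votes if v)
    no = len(votes) - yes
    if conflicts_with_committed:
        return "abort"
    if yes >= quorum:
        return "commit"
    if no > n_sites - quorum:          # a yes quorum can no longer be reached
        return "abort"
    return "pending"
```

With five sites, three yes votes commit, three no votes abort, and anything short of either leaves the transaction pending until more votes arrive.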
Slide 41: Trade-offs
- No perfect solution
- Common knowledge
- syntactic: fast, inflexible
- aborts, doesn't commit
- Partition, primary
- single point of failure
- no MustHave across partition boundaries
- Consensus
- slow
- scalability
- impossibility of consensus in asynchronous systems with failure
Slide 42: 4. Conclusion
Slide 43: Scaling up?
- Write replication
- Pessimistic CC: wait
- Optimistic CC: speculate
- Progress despite failures
- Not transparent
- Limited by commit
- Possible trade-offs
- partition
- reduce granularity
- limit the number of writers
Slide 44: Outlook
- Growing importance of sharing
- Reading and writing
- E-commerce
- Relevance of the techniques
- Specific proxies encapsulating protocol state
- Java, .Net: distributed garbage collection
- Replication: Web centres, databases
- Disconnected work
Slide 45: The end