Title: Distributed Shared Memory for Large-Scale Dynamic Systems
1. Distributed Shared Memory for Large-Scale Dynamic Systems
- Vincent Gramoli
- supervised by Michel Raynal
2-5. My Thesis
- Implementing a distributed shared memory for large-scale dynamic systems is
- NECESSARY,
- DIFFICULT,
- DOABLE!
6-7. RoadMap
- Necessary? Communicating in Large-Scale Systems
- An Example of Distributed Shared Memory
- Difficult? Facing Dynamism is not trivial
- Difficult? Facing Scalability is tricky too
- Doable? Yes, here is a solution!
- Conclusion
8. Distributed Systems Grow Larger
- Internet explosion: IPv4 → IPv6
- Multiplication of personal devices
- 17 billion networked devices by 2012 (IDC prediction)
9. Distributed Systems Are Dynamic
- Independent computational entities act asynchronously and are affected by unpredictable events (joins and leaves).
- These sporadic activities make the system dynamic.
10. Massively Accessed Applications
- Web services handle large amounts of information
- eBay: auctioning service (increase an auction)
- Wikipedia: collaborative encyclopedia (modify an article)
- LastMinute: booking application (reserve tickets)
- but they require too much power and cost too much
11. Massively Distributed Applications
- Peer-to-peer applications share resources
- BitTorrent: file sharing
- Skype: voice over IP
- Joost: video streaming
- but they prevent large-scale collaboration.
12. Filling the Gap Is Necessary
- Providing distributed applications where entities (nodes) can fully collaborate
- P2Pedia: using P2P to build a collaborative encyclopedia
- P2P eBay: using P2P as an auctioning service
13. There Are Two Ways of Collaborating
- Using a shared memory
- A node writes information to the memory
- Another node reads information from the memory
- Using message passing
- A node sends a message to another node
- The second node receives the message from the first
[Figure: Nodes 1-3 communicating through a shared Memory (write v / read v) versus directly by messages (send v / recv v)]
14. Shared Memory Is Easier to Use
- Shared memory is easy to use
- If the information is written, collaboration progresses!
- Message passing is difficult to use
- To which node should the information be sent?
15. Message Passing Tolerates Failures
- Shared memory is failure-prone
- Communication relies on memory availability
- Message passing is fault-tolerant
- As long as there is a way to route a message
[Figure: the shared Memory becomes unavailable, while send v / recv v between Nodes 1-3 still succeeds]
16. The Best of the Two Ways
- Distributed Shared Memory (DSM)
- emulates a shared memory, to provide simplicity,
- in the message-passing model, to tolerate failures.
[Figure: DSM interface with read / write(v) operations and read-ack(v) / write-ack responses]
17. RoadMap
- Necessary? Communicating in Large-Scale Systems
- An Example of Distributed Shared Memory
- Difficult? Facing Dynamism is not trivial
- Difficult? Facing Scalability is tricky too
- Doable? Yes, here is a solution!
- Conclusion
18. Our DSM Consistency: Atomicity
- Atomicity (linearizability) defines an operation ordering
- If an operation ends before another starts, it cannot be ordered after it
- Write operations are totally ordered, and read operations are ordered with respect to write operations
- A read returns the last value written (or the default value if none exists)
19. Quorum-based DSM
"Sharing Memory Robustly in Message-Passing Systems", H. Attiya, A. Bar-Noy, D. Dolev, JACM 1995
- Quorums: mutually intersecting sets of nodes
- Ex.: 3 quorums of size q = 2 over a memory of size m = 3, with Q1 ∩ Q2 ≠ ∅, Q1 ∩ Q3 ≠ ∅, Q2 ∩ Q3 ≠ ∅
- Each node of the quorums maintains
- a local value v of the object,
- a unique tag t, the version number of this value.
20. Quorum-based DSM
- Read and write operations (a runnable sketch follows below)
- A node i reads the object value vk by
- asking for vj and tj from each node j of a quorum,
- choosing the value vk with the largest tag tk,
- replicating vk and tk to all nodes of a quorum.
- A node i writes a new object value vn by
- asking for tj from each node j of a quorum,
- choosing a tag tn larger than any returned tj,
- replicating vn and tn to all nodes of a quorum.
[Figure: read = Get ⟨vk,tk⟩ then Set ⟨vk,tk⟩; write = Get ⟨vk,tk⟩, pick tn > tk, then Set ⟨vn,tn⟩]
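The following is a minimal, in-memory sketch of these two operations, assuming replicas can be accessed directly as Python objects rather than over the network; the Replica class, the quorum lists, and the tag scheme (writer ids used for tie-breaking are omitted) are illustrative simplifications, not the algorithm as published.

```python
# Sketch of the quorum-based read/write above. Replicas are plain objects;
# a real system would query them by messages and wait for quorum replies.

class Replica:
    def __init__(self):
        self.value = None   # local value v of the object
        self.tag = 0        # version number t of that value

def read(query_quorum, update_quorum):
    # 1) ask a quorum for <v, t>, keep the pair with the largest tag
    value, tag = max(((r.value, r.tag) for r in query_quorum), key=lambda vt: vt[1])
    # 2) replicate that pair to a quorum before returning the value
    for r in update_quorum:
        if tag > r.tag:
            r.value, r.tag = value, tag
    return value

def write(query_quorum, update_quorum, new_value):
    # 1) ask a quorum for the largest tag and pick a strictly larger one
    new_tag = max(r.tag for r in query_quorum) + 1
    # 2) replicate <vn, tn> to a quorum
    for r in update_quorum:
        if new_tag > r.tag:
            r.value, r.tag = new_value, new_tag

# Example: memory of m = 3 replicas, quorums of size q = 2 that pairwise intersect.
memory = [Replica() for _ in range(3)]
q1, q2, q3 = [memory[0], memory[1]], [memory[1], memory[2]], [memory[0], memory[2]]
write(q1, q2, "v1")
print(read(q3, q1))   # -> "v1", because Q3 intersects Q2
```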
21-26. Quorum-based DSM (execution example)
[Figures: a read queries a quorum for its value and tag, collects ⟨v1,t1⟩, propagates ⟨v1,t1⟩ to a quorum, and outputs v1; then a write with input v2 queries a quorum for the maximum tag (t1) and propagates ⟨v2,t2⟩, with t2 > t1, to a quorum]
27. Quorum-based DSM
- Works well in a static system
- The number of failures f must satisfy f ≤ m − q (e.g., with m = 3 and q = 2, a single failure still leaves a complete quorum), so that Q1 ∩ Q2 ≠ ∅ and Q2 ∩ Q3 ≠ ∅ still hold
- All operations can still access a quorum
28. Quorum-based DSM
- Does not work in dynamic systems
- All quorums may fail if failures are unbounded
- Problem: Q1 ∩ Q2 = ∅ and Q1 ∩ Q3 = ∅ and Q2 ∩ Q3 = ∅
29. RoadMap
- Necessary? Communicating in Large-Scale Systems
- An Example of Distributed Shared Memory
- Difficult? Facing Dynamism is not trivial
- Difficult? Facing Scalability is tricky too
- Doable? Yes, here is a solution!
- Conclusion
30. Reconfiguring
- Dynamism produces an unbounded number of failures
- Solution: reconfiguration
- Replace the quorum configuration periodically
- Problem: Q1 ∩ Q2 = ∅ and Q1 ∩ Q3 = ∅ and Q2 ∩ Q3 = ∅
31. Agreeing on the Configuration
- All nodes must agree on the next configuration
- Quorum-based consensus algorithm: Paxos
- Previously, a consensus building block complemented the DSM service
- Paxos: a 3-phase leader-based algorithm
- Prepare a ballot (2 message delays)
- Propose a configuration to install (2 message delays)
- Propagate the decided configuration (1 message delay)
"RAMBO: Reconfigurable Atomic Memory Service for Dynamic Networks", N. Lynch, A. Shvartsman, DISC 2002
32. RDS: Reconfigurable Distributed Storage
- RDS integrates the consensus service into the reconfigurable DSM (a simplified sketch of the phase pattern follows below)
- Fast version of Paxos
- Remove the first phase (in some cases)
- Quorums also propagate the configuration
- Ensuring read/write atomicity
- Piggyback object information onto Paxos messages
- Parallelizing obsolete configuration removal
- Add an additional message to the propagate phase of Paxos
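A highly simplified sketch of the three-phase message pattern (prepare, propose, propagate), assuming a single uncontested leader and replicas simulated as local objects; competing ballots, failures, recovery, and the piggybacked object state that RDS adds are all omitted, so this illustrates the phase structure only, not RDS itself.

```python
# Skeleton of the 3-phase reconfiguration pattern with one uncontested leader.
# Real RDS/Paxos also resolves competing ballots and piggybacks the object's
# value and tag on these messages; none of that is modeled here.

class ConfigReplica:
    def __init__(self, config):
        self.promised = 0        # highest ballot promised so far
        self.accepted = None     # (ballot, config) accepted during propose
        self.installed = config  # configuration currently in use

def reconfigure(ballot, new_config, quorum):
    # Phase 1 -- prepare: a quorum promises the ballot
    # (the "fast" variant skips this phase in some cases).
    for r in quorum:
        if ballot > r.promised:
            r.promised = ballot

    # Phase 2 -- propose: the quorum accepts the proposed configuration.
    for r in quorum:
        if ballot >= r.promised:
            r.accepted = (ballot, new_config)

    # Phase 3 -- propagate: install the decision and retire the old config.
    for r in quorum:
        r.installed = new_config

replicas = [ConfigReplica(config="old") for _ in range(3)]
reconfigure(ballot=1, new_config="new", quorum=replicas)
print([r.installed for r in replicas])   # -> ['new', 'new', 'new']
```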
33. Contributions
- Operations are fast (sometimes optimal)
- 1 to 2 message delays
- Reconfiguration is fast (fault tolerance)
- 3 to 5 message delays
- while preserving
- operation atomicity, and
- operation independence
34. Facing Dynamism
- "Reconfigurable Distributed Storage", G. Chockler, S. Gilbert, V. Gramoli, P. Musial, A. Shvartsman, Proceedings of OPODIS 2005
35. RoadMap
- Necessary? Communicating in Large-Scale Systems
- An Example of Distributed Shared Memory
- Difficult? Facing Dynamism is not trivial
- Difficult? Facing Scalability is tricky too
- Doable? Yes, here is a solution!
- Conclusion
36. Facing Scalability Is Difficult
- Problems
- Large-scale participation induces load
- When the load is too high, requests can be lost
- Bandwidth resources are limited
- Goal: tolerate load while preventing communication overhead
- Solution: a DSM that adapts to load variations and restricts communication
37. Using a Logical Overlay
- Object replicas r1, ..., rk share a 2-dimensional coordinate space
[Figure: replicas r1, ..., rk laid out over the coordinate space]
38. Benefiting from Locality
- Each replica ri communicates only with its nearest neighbors
39. Repairing the Overlay
- Topology takeover mechanism: if a node ri fails, a takeover node rj replaces it (a toy sketch follows below)
"A Scalable Content-Addressable Network", S. Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker, SIGCOMM 2001
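Below is a toy sketch of this overlay idea, assuming replicas sit at integer grid coordinates rather than owning CAN-style rectangular zones; the Overlay class, its neighbor rule, and the takeover choice are illustrative, not the mechanism of the cited paper.

```python
# Toy 2-d overlay: replicas at integer grid coordinates, each replica only
# talks to its adjacent neighbors, and when a replica fails one neighbor
# takes over its coordinate. Real CAN zones are rectangles that get split
# and merged; this grid keeps the idea minimal.

class Overlay:
    def __init__(self, side):
        self.alive = {(x, y) for x in range(side) for y in range(side)}

    def neighbors(self, node):
        x, y = node
        candidates = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
        return [n for n in candidates if n in self.alive]

    def takeover(self, failed):
        """Remove a failed replica; one adjacent replica covers its spot."""
        self.alive.discard(failed)
        substitutes = self.neighbors(failed)
        return substitutes[0] if substitutes else None

overlay = Overlay(4)
print(overlay.neighbors((1, 1)))   # the only peers replica (1,1) talks to
print(overlay.takeover((1, 1)))    # e.g. (0, 1) now also covers (1,1)'s area
```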
40. Dynamic Bi-Quorums
- Bi-quorums
- Quorums of two types, where not all quorums intersect
- Quorums of different types do intersect
- Vertical quorum: all replicas responsible for abscissa x
- Horizontal quorum: all replicas responsible for ordinate y
- For any horizontal quorum H and any vertical quorum V: H ∩ V ≠ ∅
[Figure: a vertical quorum at abscissa x crossing a horizontal quorum at ordinate y]
41. Operation Execution (a sketch follows below)
- Read operation
- 1) Get the up-to-date value and largest tag from a horizontal quorum,
- 2) Propagate this value and tag on a vertical quorum.
- Write operation
- 1) Get the up-to-date value and largest tag from a horizontal quorum,
- 2) Propagate the value to write (and a higher tag) twice on the same vertical quorum.
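A minimal sketch of these bi-quorum operations, assuming the replicas form a k × k grid so that a horizontal quorum is one row and a vertical quorum is one column; the Cell class, the grid layout, and the quorum selection are illustrative simplifications, not SQUARE's actual data structures.

```python
# Bi-quorum sketch: replicas form a k x k grid; a horizontal quorum is a row,
# a vertical quorum is a column, and any row intersects any column.

class Cell:
    def __init__(self):
        self.value, self.tag = None, 0

k = 3
grid = [[Cell() for _ in range(k)] for _ in range(k)]
row = lambda y: [grid[y][x] for x in range(k)]   # horizontal quorum
col = lambda x: [grid[y][x] for y in range(k)]   # vertical quorum

def get(quorum):
    """Return the <value, tag> pair with the largest tag in a quorum."""
    best = max(quorum, key=lambda c: c.tag)
    return best.value, best.tag

def put(quorum, value, tag):
    for c in quorum:
        if tag > c.tag:
            c.value, c.tag = value, tag

def read(y, x):
    value, tag = get(row(y))     # 1) consult a horizontal quorum
    put(col(x), value, tag)      # 2) propagate on a vertical quorum
    return value

def write(y, x, value):
    _, tag = get(row(y))         # 1) consult a horizontal quorum
    put(col(x), value, tag + 1)  # 2) propagate a larger tag on a vertical
    put(col(x), value, tag + 1)  #    quorum (done twice, as described above)

write(0, 0, "v2")
print(read(2, 0))   # -> "v2": row 2 meets column 0 at grid[2][0]
```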
42. Load Adaptation
- Thwart: requests follow the diagonal until a non-overloaded node is found (sketched below).
- Expansion: a node is added to the memory if no non-overloaded node is found.
- Shrink: if underloaded, a node leaves the memory after notifying its neighbors.
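A small sketch of the thwart rule only, assuming loads are known locally and the walk wraps around a k × k torus; the threshold, the load values, and the wrap-around are illustrative, and expansion and shrink are not modeled.

```python
# Thwart: a request walks the diagonal of the coordinate space until it
# reaches a replica whose load is below a threshold. Expansion (adding a
# node when everyone is overloaded) and shrink are not modeled here.

def thwart(loads, start, threshold):
    """loads maps (x, y) -> current load on a k x k torus of replicas."""
    k = max(x for x, _ in loads) + 1
    x, y = start
    for _ in range(k):                      # at most one full diagonal tour
        if loads[(x, y)] < threshold:
            return (x, y)                   # this replica handles the request
        x, y = (x + 1) % k, (y + 1) % k     # follow the diagonal
    return None                             # all overloaded: trigger expansion

loads = {(x, y): 0 for x in range(3) for y in range(3)}
loads[(0, 0)] = 10                          # the entry point is overloaded
print(thwart(loads, (0, 0), threshold=5))   # -> (1, 1)
```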
43. Contributions
- SQUARE is a DSM that
- scales well by tolerating load variations,
- defines load-optimal quorums (under reasonable assumptions),
- uses communication-efficient reconfiguration.
44. Operation Latency

Request rate | Memory size | Read latency | Write latency
100  | 10 | 479  | 733
125  | 14 | 622  | 812
250  | 24 | 1132 | 1396
500  | 46 | 1501 | 2173
1000 | 98 | 2408 | 3501

Bad news: operation latency increases with the load (request rate).
45. Facing Scalability Is Difficult
- "P2P Architecture for Self-* Atomic Memory", E. Anceaume, M. Gradinariu, V. Gramoli, A. Virgillito, Proceedings of ISPAN 2005
- "SQUARE: Scalable Quorum-Based Atomic Memory with Local Reconfiguration", V. Gramoli, E. Anceaume, A. Virgillito, Proceedings of ACM SAC 2007
46. RoadMap
- Necessary? Communicating in Large-Scale Systems
- An Example of Distributed Shared Memory
- Difficult? Facing Dynamism is not trivial
- Difficult? Facing Scalability is tricky too
- Doable? Yes, here is a solution!
- Conclusion
47. Probability for Modeling Reality
- Motivations for probabilistic solutions
- A tradeoff prevents deterministic solutions from being efficient
- They allow more realistic models
- Any node can fail independently
- even if it is unlikely that many nodes fail at the same time
48. What Is Churn?
- Churn is the intensity of dynamism!
- Dynamic system
- n interconnected nodes
- Nodes join and leave the system
- A joining node is new
- Here, we model churn simply by a rate c (a small simulation follows below)
- At each time unit, c·n nodes leave the network
- At each time unit, c·n nodes enter the network
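The toy simulation below merely exercises this model: every time unit, c·n nodes leave and c·n fresh nodes join, so the system size stays n while its composition drifts; the node identifiers and parameter values are illustrative.

```python
# Toy churn simulation: each time unit, c*n nodes leave and c*n new nodes
# join, so the size stays n while the population gradually changes.
import random

def churn_step(nodes, c, next_id):
    n = len(nodes)
    leaving = random.sample(sorted(nodes), int(c * n))
    nodes.difference_update(leaving)                    # c*n departures
    nodes.update(range(next_id, next_id + int(c * n)))  # c*n fresh arrivals
    return next_id + int(c * n)

nodes, next_id = set(range(1000)), 1000
for _ in range(10):                     # 10 time units with churn c = 0.1
    next_id = churn_step(nodes, 0.1, next_id)
print(len(nodes), len(nodes & set(range(1000))))  # size stays 1000; originals dwindle
```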
49. Relaxing Consistency
- Every operation satisfies all atomicity rules with high probability!
- Unsuccessful operation: an operation that violates at least one of those rules
- Probabilistic atomicity
- If an operation Op1 ends before another operation Op2 starts, then Op1 is ordered after Op2 with probability at most ε = e^(−β²) (with β a constant); if this happens, Op2 is considered unsuccessful
- Write operations are totally ordered and read operations are ordered w.r.t. write operations
- A read returns the last value successfully written (or the default one if none exists) with probability 1 − e^(−β²) (with β a constant); if this does not hold, the read is unsuccessful
50. TQS: Timed Quorum System
- Intersection is guaranteed during a bounded period of time, with high probability
- Gossip-based algorithm running in parallel
- Shuffle the set of neighbors using a gossip-based algorithm
- Traditional read/write operations using two message round-trips between the client and a quorum
- Consult the value and tag of a quorum
- Create a new, larger tag (if writing)
- Propagate the value and tag to a quorum
51. TQS: Timed Quorum System
- Contacting a quorum (a sketch follows below)
- Disseminate the message with TTL l to k neighbors,
- decrement the TTL upon first reception,
- forward received messages to k neighbors while their TTL is not null,
- so that, at the end, the required number of nodes has been contacted,
- where Δ denotes the maximum period of time between 2 successful operations.
52. Complexity of Our Implementation
- Assumptions
- At least one operation succeeds every Δ time units
- The gossip-based protocol provides uniformity
- Operation time complexity (in expectation):
- where D = (1 − c)^(−Δ) is the dynamic parameter
53-54. Complexity of Our Implementation
- Operation communication complexity (in expectation):
- where D = (1 − c)^(−Δ) is the dynamic parameter
- If D is a constant, this matches the communication complexity for static systems presented in "Probabilistic Quorum Systems", D. Malkhi, M. Reiter, A. Wool, R. Wright, Information and Computation, 2001
55. Probability of Success
[Plot: probability of non-intersection as a function of quorum size, for n = 10,000 and failure rates of 10%, 30%, 50%, 70%, and 90%; a back-of-the-envelope computation follows below]
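As a back-of-the-envelope companion to this plot: for two uniformly random quorums of size q out of n nodes, the probability that they do not intersect is about (1 − q/n)^q ≈ e^(−q²/n), so q = β·√n yields roughly e^(−β²). The exact TQS model (churn, timed validity, failures) is not reproduced here; this only illustrates the shape of the curve.

```python
# Probability that two uniformly random quorums of size q (out of n nodes)
# are disjoint, compared to the e^(-beta^2) approximation with q = beta*sqrt(n).
# Churn and timed validity are ignored; this only shows the curve's shape.
import math

def prob_disjoint(n, q):
    # P(disjoint) = C(n-q, q) / C(n, q) = prod_{i<q} (n - q - i) / (n - i)
    p = 1.0
    for i in range(q):
        p *= (n - q - i) / (n - i)
    return p

n = 10_000
for beta in (1, 2, 3):
    q = int(beta * math.sqrt(n))
    print(q, round(prob_disjoint(n, q), 6), round(math.exp(-beta ** 2), 6))
```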
56. Contributions
- TQS relies on timed, probabilistic intersections
- Operation latency is low
- Operation communication complexity is low
- No reconfiguration is needed
- Replication is inherently performed by the operations
- Atomicity is ensured with high probability
57. A DSM to Face Scalability and Dynamism
- "Core Persistence in Peer-to-Peer Systems: Relating Size to Lifetime", V. Gramoli, A.-M. Kermarrec, A. Mostéfaoui, M. Raynal, B. Sericola, Proceedings of RDDS 2006 (in conjunction with OTM 2006)
- "Timed Quorum Systems for Large-Scale and Dynamic Environments", V. Gramoli, M. Raynal, Proceedings of OPODIS 2007
58. RoadMap
- Necessary? Communicating in Large-Scale Systems
- An Example of Distributed Shared Memory
- Difficult? Facing Dynamism is not trivial
- Difficult? Facing Scalability is tricky too
- Doable? Yes, here is a solution!
- Conclusion
59. Conclusion
- We have presented three DSMs
- Dynamism: RDS
- Scalability: SQUARE
- Dynamism and scalability: TQS
60. Conclusion

Solution | Latency | Communication | Guarantee
RDS    | Low  | High | Safe
SQUARE | High | Low  | Safe
TQS    | Low  | Low  | High probability
61. Open Questions
- Could we speed up operations further?
- Disseminating continuously up-to-date values
- Consulting values that have already been aggregated
- How should dynamism be modeled?
- Results differ for P2P file sharing
- What would it be for other applications?
62. END
63. Load Balancing
Good news: the load is well balanced over the replicas
64. Load Adaptation
Good news: the memory self-adapts well in the face of dynamism
65. Reconfigurable Distributed Storage
- Prepare phase
- The leader creates a new ballot and sends it to quorums
- A quorum of nodes sends back their candidate configurations
- The leader chooses the configuration for the ballot
- Propose phase
- The leader sends the ballot and its configuration to quorums; the leader also sends its tag and value and adds the current configuration
- A quorum of nodes can send their ballot vote, their tag, and their value to quorums
- These quorum nodes decide the next configuration
- Propagate phase
- These quorum nodes propagate the decided configuration to quorums
- These quorum nodes remove the old configuration if not done already