Title: Consistency and Replication
1Consistency and Replication
- Introduction (what's it all about)
- Data-centric consistency
- Client-centric consistency
- Replica management
- Consistency protocols
2Outline
- Data-centric consistency
- Continuous Consistency
- Sequential Consistency
- Causal Consistency
- Grouping Operations
- Client-centric consistency
3a and b are causally related, so not causally consistent
a and b are not related (concurrent), so causally consistent
4Write b happened before write a -- sequential
Write b happened before write a -- not sequential
5Grouping Operations (2)
6Outline
- Data-centric consistency
- Client-centric consistency
- Goal: perhaps avoid system-wide consistency by concentrating on what specific clients want, instead of what should be maintained by servers.
- Eventual Consistency
- Monotonic Reads
- Monotonic Writes
- Read Your Writes
- Writes Follow Reads
7WS(x1) sent to L2: monotonic reads hold
WS(x1) not sent to L2: monotonic reads violated
8WS(x1) sent to L2: monotonic writes hold
WS(x1) not sent to L2: monotonic writes violated
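The captions above turn on one question: has the write set WS(x1) reached local copy L2 before the client's next operation there? A minimal sketch of that check follows. The names `Replica`, `applied`, and `can_serve` are hypothetical and chosen only for illustration, and writes are modeled as plain string IDs.

```python
class Replica:
    """A local copy that records which write IDs it has applied (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.applied = set()  # IDs of the writes already applied at this copy

    def apply(self, write_ids):
        # Propagating a write set to this copy adds its writes to `applied`.
        self.applied |= set(write_ids)

    def can_serve(self, session_write_set):
        # The copy may serve the client only if it already holds every
        # write that the client's session depends on.
        return set(session_write_set) <= self.applied


l1, l2 = Replica("L1"), Replica("L2")
l1.apply({"x1"})
session = {"x1"}                 # the client has seen (or issued) x1 at L1

before = l2.can_serve(session)   # WS(x1) has not been sent to L2 yet
l2.apply({"x1"})                 # WS(x1) is propagated to L2
after = l2.can_serve(session)
print(before, after)
```

The same subset test covers both guarantees under this model: for monotonic reads the session set holds the writes the client has read, and for monotonic writes it holds the writes the client has issued.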
11Read Your Writes
- A data store is said to provide read-your-writes consistency if the following condition holds:
- The effect of a write operation by a process on data item x will always be seen by a successive read operation on x by the same process.
- This holds no matter where the read takes place.
- Suppose your web browser has a cache.
- You update your web page on the server.
- You refresh your browser.
- Do you have read-your-writes consistency?
12Read Your Writes (2)
W(x1) is part of WS(x1,x2): read-your-writes holds
The read does not include W(x1): not read-your-writes
- E.g., updating your Web page and guaranteeing that your Web browser shows the newest version instead of its cached copy.
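The browser-cache question can be made concrete with a small sketch. It assumes a hypothetical `Server` that numbers each write and a hypothetical `Browser` that remembers the version of its own last write. This is only one way to obtain read-your-writes, not a description of how real browsers behave.

```python
class Server:
    """Holds the page and a version counter (hypothetical model)."""

    def __init__(self):
        self.version = 0
        self.page = ""

    def write(self, content):
        self.version += 1
        self.page = content
        return self.version

    def read(self):
        return self.version, self.page


class Browser:
    """A client with a cache that refuses entries older than its own last write."""

    def __init__(self, server):
        self.server = server
        self.cache = None        # (version, page), or None if nothing is cached
        self.last_written = 0    # version returned by this client's latest write

    def update_page(self, content):
        self.last_written = self.server.write(content)

    def refresh(self):
        # Use the cached copy only if it reflects this client's last write.
        if self.cache is None or self.cache[0] < self.last_written:
            self.cache = self.server.read()
        return self.cache[1]


server = Server()
browser = Browser(server)
browser.update_page("v1")
first = browser.refresh()
browser.update_page("v2")
second = browser.refresh()
print(first, second)
```

Without the version comparison in `refresh`, the second call would return the cached "v1", which is the violation the slide asks about.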
13Writes Follow Reads (1)
- A data store is said to provide writes-follow-reads consistency if the following holds:
- A write operation by a process on a data item x following a previous read operation on x by the same process is guaranteed to take place on the same or a more recent value of x than the one that was read.
- Example: you see reactions to posted articles only if you have the original posting (a read pulls in the corresponding write operation).
14Writes Follow Reads (2)
is writes follow reads
Not writes follow reads
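The newsgroup example can be sketched as follows, assuming a hypothetical `Replica` that keeps an ordered log of write IDs and a client that carries the set of writes it has read. Before accepting the client's new write, the replica first pulls in any of those writes it is missing.

```python
class Replica:
    """A local copy with an ordered log of applied write IDs (hypothetical model)."""

    def __init__(self):
        self.log = []

    def apply(self, write_id):
        if write_id not in self.log:
            self.log.append(write_id)

    def write_after_read(self, write_id, read_set, source):
        # First bring in, in their original order, the writes the client
        # has read that are not yet present here.
        for w in source.log:
            if w in read_set:
                self.apply(w)
        # Only then apply the new write, so it follows what was read.
        self.apply(write_id)


l1, l2 = Replica(), Replica()
l1.apply("article")          # the original posting exists at L1
read_set = {"article"}       # the client read it there
l2.write_after_read("reaction", read_set, source=l1)
print(l2.log)
```

Any reader of L2 now sees the article before the reaction. Skipping the pull step would leave the reaction at L2 without the posting it responds to.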
15Outline
- Introduction (what's it all about)
- Data-centric consistency
- Client-centric consistency
- Replica management
- Consistency protocols
16Two Subproblems
- Your boss says to you, "Our system is too slow, make it faster."
- You decide that replication of servers is the answer. What do you do next? What are the questions that need to be answered?
- Where to place servers?
- Where to place content?
17Placing Servers
- Given a set of N locations, how do you place the K servers?
- Locations: network locations and geographic locations.
- A server may hold only part of the data store.
- What are the goals?
- What is the metric that is being optimized?
18Placing Servers
- One algorithm: each time you place a server, minimize the average remaining distance to clients.
- What is distance?
- Can we ignore the client locations?
- Yes, if they are uniformly distributed.
- Other ideas for algorithms?
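The greedy idea above can be sketched in a few lines. The slide leaves "distance" open, so this sketch assumes Euclidean distance between 2-D points. The function names and the sample coordinates are made up for illustration.

```python
import math


def avg_distance(clients, servers):
    """Average distance from each client to its nearest server."""
    return sum(min(math.dist(c, s) for s in servers) for c in clients) / len(clients)


def greedy_place(candidates, clients, k):
    """Pick k sites one at a time, each time minimizing the average distance."""
    chosen = []
    remaining = list(candidates)
    for _ in range(k):
        best = min(remaining, key=lambda site: avg_distance(clients, chosen + [site]))
        chosen.append(best)
        remaining.remove(best)
    return chosen


clients = [(0, 0), (1, 0), (2, 0), (10, 0)]
candidates = [(1, 0), (5, 0), (10, 0)]
placement = greedy_place(candidates, clients, 2)
print(placement, avg_distance(clients, placement))
```

Each round re-evaluates every remaining candidate against every client, which is why this approach gets expensive as N grows. A greedy choice is also not guaranteed to be globally optimal.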
19Possible approaches
20Example Clustering
- One idea: identify the K largest clusters, then put one server in each cluster.
- How do you find clusters?
- One way: divide the space into cells and pick the K most populated ones.
- Calculate an appropriate cell size as a simple function of the average distance.
- Complexity is reduced from O(N^2) to O(N x max(log(N), K)).
21Replica-Server Placement
- Choosing a proper cell size for server placement.
- It turns out that computing the cell size from the average distance between two nodes and the number of replicas works well.
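The cell-based idea can be sketched as below. The slides do not give the formula for the cell size, so this sketch takes `cell_size` as an input rather than computing it. The function name and the sample points are hypothetical.

```python
from collections import Counter


def top_k_cells(nodes, cell_size, k):
    """Bucket 2-D nodes into square cells and return the k most populated cells."""
    counts = Counter(
        (int(x // cell_size), int(y // cell_size)) for x, y in nodes
    )
    return [cell for cell, _ in counts.most_common(k)]


nodes = [(0.5, 0.5), (0.7, 0.2), (0.1, 0.9), (5.2, 5.1), (5.9, 5.5), (9.5, 0.3)]
cells = top_k_cells(nodes, cell_size=1.0, k=2)
print(cells)
```

One server would then be placed in each returned cell. The result depends on the cell size: cells that are too small split a cluster across several buckets, and cells that are too large merge separate clusters.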
22Placing Content
- Which server or servers to select to place an
object (data, code)?
23Permanent replicas
- E.g., mirror sites; database replicas on servers that do not share disks, memory, or processes.
- Initial set of replicas, static organization.
24Server-Initiated Replicas
- Created by the owner of the data store
- Temporary use
- Dynamic load
- E.g., Web hosting with dynamic replicas
- Specific files on a server can be migrated or
replicated to servers placed in the proximity of
clients that issue many requests for those files.
- Keep track of access counts per file, aggregated by considering the server closest to the requesting clients.
- Number of accesses drops below threshold D: drop file.
- Number of accesses exceeds threshold R: replicate file.
- Number of accesses between D and R: migrate file.
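The threshold rule above reduces to a three-way decision. A minimal sketch follows; the function name and the sample threshold values are hypothetical, and treating counts exactly equal to D or R as "migrate" is an assumption, since the slide does not say how boundaries are handled.

```python
def replica_action(access_count, drop_threshold, replicate_threshold):
    """Decide what to do with a file given its access count and thresholds D and R."""
    if access_count < drop_threshold:
        return "drop"
    if access_count > replicate_threshold:
        return "replicate"
    # Between D and R (boundaries included, by assumption).
    return "migrate"


D, R = 10, 100
decisions = [replica_action(n, D, R) for n in (3, 50, 250)]
print(decisions)
```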
25Server-Initiated Replicas
26Client-Initiated Replicas
- Client caches
- Temporary, to improve access time.
- Effectiveness is measured by cache hits.
- Used by one client or shared among clients.
- A client may request a nearby server to cache data.
27Content Replication and Placement
28Content Distribution
- Issue: propagation of (updated) content to the relevant replica servers.
- Possibilities for what is to be propagated, in terms of state versus operations:
- Propagate only a notification/invalidation of an update (often used for caches).
- Transfer data from one copy to another (distributed databases).
- Propagate the update operation to other copies (also called active replication).
- No single approach is best; the choice depends on available bandwidth and the read-to-write ratio at replicas.
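The three options differ in what a replica receives. The sketch below uses a hypothetical `ReplicaCopy` class with one handler per option; the method names are illustrative and do not come from any particular system.

```python
class ReplicaCopy:
    """A replica holding one value and a validity flag (hypothetical model)."""

    def __init__(self, value=0):
        self.value = value
        self.valid = True

    def on_invalidate(self):
        # Notification only: the copy is marked stale and must be refetched later.
        self.valid = False

    def on_state(self, new_value):
        # State transfer: the new data itself is shipped to the replica.
        self.value, self.valid = new_value, True

    def on_operation(self, op):
        # Operation transfer: the replica re-executes the update locally.
        self.value = op(self.value)


a, b, c = ReplicaCopy(10), ReplicaCopy(10), ReplicaCopy(10)
a.on_invalidate()
b.on_state(15)
c.on_operation(lambda v: v + 5)
print((a.value, a.valid), b.value, c.value)
```

Invalidation sends the least data but leaves the copy unusable until it is refreshed. State transfer costs bandwidth proportional to the data size. Operation transfer sends little data but requires every replica to execute the update.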
29Pull versus Push Protocols
- Pushing updates: server-initiated approach, in which an update is propagated regardless of whether the target asked for it.
- Pulling updates: client-initiated approach, in which the client requests to be updated.
- Best practices? Consistency needs? Other trade-offs?
- Hybrid approach, the lease: a contract in which the server promises to push updates to the client until the lease expires.
- E.g., multiple-client, single-server systems.
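A lease combines the two modes: the server pushes while the lease is valid, and the client falls back to pulling afterwards. The sketch below assumes hypothetical `LeaseServer` and `Client` classes and uses explicit integer timestamps instead of a real clock so the behavior is easy to follow.

```python
class Client:
    def __init__(self, name):
        self.name = name
        self.value = None

    def receive(self, value):
        # Called by the server when it pushes an update.
        self.value = value

    def pull(self, server):
        # Client-initiated refresh, needed once the lease has expired.
        self.value = server.value


class LeaseServer:
    def __init__(self):
        self.value = None
        self.leases = {}  # client -> expiry time

    def grant(self, client, now, duration):
        self.leases[client] = now + duration

    def update(self, value, now):
        # Push the new value to every client whose lease is still valid.
        self.value = value
        pushed = []
        for client, expiry in self.leases.items():
            if now < expiry:
                client.receive(value)
                pushed.append(client.name)
        return pushed


server = LeaseServer()
c = Client("c1")
server.grant(c, now=0, duration=10)
pushed_during = server.update("v1", now=5)    # lease active: update is pushed
pushed_after = server.update("v2", now=12)    # lease expired: nothing is pushed
stale = c.value
c.pull(server)
print(pushed_during, pushed_after, stale, c.value)
```

The lease duration is the tuning knob: a long lease keeps clients fresh at the cost of server state and push traffic, while a short lease shifts the work to client pulls.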
31Outline
- Introduction (what's it all about)
- Data-centric consistency
- Client-centric consistency
- Replica management
- Consistency protocols