Title: Consistency and Replication
1Consistency and Replication
- Introduction to Distributed Systems, CS 457/557, Fall 2008, Kenneth Chiu
2- Topics
- Consistency models
- Implementation
- Replica location and content distribution
- Maintaining consistency
3Why Replicate?
- Reliability
- If one goes down, the others can stay up.
- How can it address corrupted data?
- Compare multiple versions
- Performance
- Divide the work
- Place data closer to where it is used.
- What is the challenge?
- Consistency
- Consider a web cache in your browser.
4Costs
- As a scaling technique, may not always be applicable.
[Figure: process P accesses a replica N times per second; the replica is updated M times per second.]
5- [Figure: two "Withdraw 50" requests issued across a WAN.]
- A dilemma
- Scalability problems can be alleviated by replication and caching.
- But consistency requires global synchronization!
- Only real solution is to relax consistency requirements.
6Consistency Models Review
- Enforcing absolute ordering is too expensive, especially with replication and caching.
- So we need to allow for mis-ordering.
- We could just do it casually. Tell programmers, "Well, you might see things out of order a little bit, but only in ways that won't matter."
- They would say, "What do you mean?"
- So we need an exact, very precise way of specifying the kinds of inconsistencies that the application might see.
- That is the purpose and point of having consistency models.
7Data Centric Consistency Models
8Data Stores
- Consistency is viewed as read/write ops on shared data.
- A consistency model is a contract between the processes and the data store.
9Continuous Consistency
- Three axes for continuous consistency ranges
- Deviation in numerical values
- Deviation in staleness (age) between replicas
- Deviation with respect to ordering
- Numerical deviation
- Can be specified in terms of deviation in values.
- Can also be specified in terms of the number of updates that have been applied, but not yet seen by others. Deviation in value is then known as the weight.
- Staleness deviation
- A replica can be out-of-date, as long as it is not too out-of-date.
- For example, a weather report.
- Ordering deviation
- Can be specified as the number of ops that may need to be rolled back.
10Consistency Unit
- Conit: the unit of data over which consistency is to be measured. Examples?
- A single stock
- A single weather report
11- Each replica maintains a vector clock. So it can do causally ordered multicast.
- The notation means time t at replica i.
- Conit is data items x and y. Both initialized to 0. Replica A has committed one operation.
12- Replica A
- Ordering deviation is 3, since it has three uncommitted operations.
- Numerical deviation by operations is 1. Weight is 5.
- Replica B
- Ordering deviation is 2.
- Numerical deviation is 3, with weight of 6.
13Conit Granularity
- Why do some hotels have a sink outside?
- Should conits be coarse-grained (a whole database) or fine-grained (just one record in it)?
- In other words, should we try to keep large pieces of data consistent or small pieces?
14- Assume that two replicas may only differ in one outstanding update.
- In the top, the conit has two data items. In the bottom, it only has one.
- Two updates for the top will force propagation; on the bottom they will not.
[Figure: top, Replica 1 and Replica 2 with a conit containing two data items, where two updates force update propagation. Bottom, a conit containing one data item, where the updates are postponed.]
15- So should conits always be as small as possible?
- Higher overhead.
- Similar things in real life. For example, hotel rooms with sink outside.
[Figure: two replicas with a two-item conit (updates propagated) and a one-item conit (updates postponed).]
16Consistent Ordering
- A more traditional way to model consistency.
- From architecture and concurrent programming.
17Notation
- Processes execute to the right as time progresses.
- The notation W1(x)a means that the process P1 wrote the value a to the variable x.
- The notation R2(x)a means that the process P2 read the value a from the variable x.
- The subscript is often dropped.
18Sequential Consistency
- The result of any execution is the same as if the (read and write) operations by all processes on the data store were executed in some sequential order and the operations of each individual process appear in this sequence in the order specified by its program.
- There is some global order.
- Operations within each process must appear in the order given by its program.
Program A: A-OP1, A-OP2, A-OP3
Program B: B-OP1, B-OP2, B-OP3
Which of these are valid?
Global Order 1: A-OP1, A-OP2, A-OP3, B-OP1, B-OP2, B-OP3
Global Order 2: A-OP1, B-OP1, A-OP2, B-OP2, B-OP3, A-OP3
Global Order 3: A-OP1, B-OP1, A-OP2, B-OP3, B-OP2, A-OP3
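The program-order requirement can be checked mechanically. The following is a minimal sketch in Python; the function name `respects_program_order` and the list encoding of the programs are assumptions made for illustration, not part of the slides.

```python
def respects_program_order(global_order, programs):
    """Return True if each program's operations appear in global_order
    in the same relative order as in the program itself."""
    for ops in programs.values():
        # Keep only this program's operations, in the order they occur globally.
        seen = [op for op in global_order if op in ops]
        if seen != list(ops):
            return False
    return True


programs = {
    "A": ["A-OP1", "A-OP2", "A-OP3"],
    "B": ["B-OP1", "B-OP2", "B-OP3"],
}
order_1 = ["A-OP1", "A-OP2", "A-OP3", "B-OP1", "B-OP2", "B-OP3"]
order_2 = ["A-OP1", "B-OP1", "A-OP2", "B-OP2", "B-OP3", "A-OP3"]
order_3 = ["A-OP1", "B-OP1", "A-OP2", "B-OP3", "B-OP2", "A-OP3"]

for name, order in [("1", order_1), ("2", order_2), ("3", order_3)]:
    print("Global Order", name, respects_program_order(order, programs))
```

Global Order 3 fails the check because B-OP3 appears before B-OP2.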
19- Which of these is sequentially consistent?
20- Consider three concurrently executing processes P1, P2, and P3.
- The data items are x, y, and z.
- Assume all initialized to 0.
- Assignment is a write operation.
- Print is a simultaneous read operation.
- All operations are indivisible.
- What are some possible execution interleavings?
- Which ones are valid?
21- The signature is the value of the output of P1, P2, and P3, concatenated in that order.
- Not all signatures are valid.
- Which of these are valid?
[Figure: the programs of Process P1, Process P2, and Process P3.]
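The valid signatures can be found by brute force. The sketch below assumes the programs are the usual textbook ones, since they are not shown in this transcript: P1 runs x = 1 then print(y, z), P2 runs y = 1 then print(x, z), and P3 runs z = 1 then print(x, y). All names in the code are illustrative.

```python
from itertools import permutations

# Assumed programs (not shown in the transcript): each process writes one
# variable, then prints the other two in a single indivisible read.
PROGRAMS = {
    "P1": [("write", "x"), ("print", ("y", "z"))],
    "P2": [("write", "y"), ("print", ("x", "z"))],
    "P3": [("write", "z"), ("print", ("x", "y"))],
}


def interleavings():
    """Every schedule in which each process runs its two statements in program order."""
    slots = ["P1", "P1", "P2", "P2", "P3", "P3"]
    return set(permutations(slots))


def signature(schedule):
    """Execute one schedule and return the outputs of P1, P2, P3 concatenated."""
    store = {"x": 0, "y": 0, "z": 0}
    next_stmt = {p: 0 for p in PROGRAMS}
    output = {p: "" for p in PROGRAMS}
    for p in schedule:
        kind, arg = PROGRAMS[p][next_stmt[p]]
        next_stmt[p] += 1
        if kind == "write":
            store[arg] = 1
        else:
            output[p] = "".join(str(store[v]) for v in arg)
    return output["P1"] + output["P2"] + output["P3"]


valid_signatures = {signature(s) for s in interleavings()}
print(sorted(valid_signatures))
```

A signature such as 000000 never appears, because every print follows a write by the same process.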
22Sequential Consistency (From 2006)
- The result of any execution is the same as if the (read and write) operations by all processes on the data store were executed in some sequential order and the operations of each individual process appear in this sequence in the order specified by its program.
- There is some global order.
- Operations within each process must appear in the order given by its program.
Program A: A-OP1, A-OP2, A-OP3
Program B: B-OP1, B-OP2, B-OP3
Global Order 1: A-OP1, A-OP2, A-OP3, B-OP1, B-OP2, B-OP3
Global Order 2: A-OP1, B-OP1, A-OP2, B-OP2, B-OP3, A-OP3
Global Order 3: A-OP1, B-OP1, A-OP2, B-OP3, B-OP2, A-OP3
23Sequential Consistency (3) (From 2006)
24Sequential Consistency (4) (From 2006)
- Figure 7-6. Three concurrently-executing processes.
25Sequential Consistency (5) (From 2006)
- Figure 7-7. Four valid execution sequences for the processes of Fig. 7-6. The vertical axis is time.
26Causal Consistency
- For a data store to be considered causally consistent, it is necessary that the store obeys the following condition:
- Writes that are potentially causally related must be seen by all processes in the same order. Concurrent writes may be seen in a different order on different machines.
27- Allowed?
- This sequence is allowed with a
causally-consistent store, but not with a
sequentially consistent store.
28
29Grouping Operations
- Do SMP machines also need consistency models?
- Yes, there are many kinds.
- Why do we not care about these when writing MT programs?
- We do, if we are platform dependent and don't use locks.
- How do we handle consistency in MT programs?
- Use locks.
- As viewed by an external, data-centric process, what do locks do?
- They turn non-atomic operations into atomic ones (functionally).
- In other words, they group them.
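A small sketch of grouping with a lock, using Python's standard `threading` module. The account names, amounts, and thread counts are made up for illustration. The two writes in `transfer` form one group, so an observer that also takes the lock never sees only one of them.

```python
import threading

balance = {"checking": 100, "savings": 0}
lock = threading.Lock()


def transfer(amount):
    # Two writes, grouped by the lock into one functionally atomic operation.
    with lock:
        balance["checking"] -= amount
        balance["savings"] += amount


def total():
    # A reader that takes the same lock sees both writes or neither.
    with lock:
        return balance["checking"] + balance["savings"]


observed = []


def observer(reads):
    for _ in range(reads):
        observed.append(total())


threads = [threading.Thread(target=transfer, args=(1,)) for _ in range(50)]
threads.append(threading.Thread(target=observer, args=(200,)))
for t in threads:
    t.start()
for t in threads:
    t.join()

print(set(observed), balance)
```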
30Synchronization Variables
- Operations are grouped via synchronization variables (locks).
- Each synchronization variable protects an associated data set.
- Each kind of synchronization variable has some associated properties.
31Release Consistency
- Two operations
- Acquire: a critical section is about to be entered.
- Release: a critical section is about to be exited.
32Entry Consistency
- Entry Consistency: necessary criteria for correct synchronization
- An acquire access of a synchronization variable is not allowed to perform until all updates to guarded shared data have been performed with respect to that process.
- Before exclusive mode access to a synchronization variable by a process is allowed to perform with respect to that process, no other process may hold the synchronization variable, not even in nonexclusive mode.
- After exclusive mode access to a synchronization variable has been performed, any other process's next nonexclusive mode access to that synchronization variable may not be performed until it has performed with respect to that variable's owner.
33- An acquire access of a synchronization variable is not allowed to perform until all updates to guarded shared data have been performed with respect to that process.
- When a process does an acquire, the acquire may not complete until all remote changes to the guarded data have been made visible.
- Before exclusive mode access to a synchronization variable by a process is allowed to perform with respect to that process, no other process may hold the synchronization variable, not even in nonexclusive mode.
- Before updating a shared item, a process must enter the critical section in exclusive mode.
- After exclusive mode access to a synchronization variable has been performed, any other process's next nonexclusive mode access to that synchronization variable may not be performed until it has performed with respect to that variable's owner.
- If a process wants to enter a critical section in non-exclusive mode, it must first check with the owner of the synchronization variable to get the most recent copies of the shared data.
34- Is this valid for entry consistency?
- Yes, a valid event sequence for entry consistency.
35Consistency vs. Coherence
- Consistency model describes what happens to a set of data when a set of processes operate on that data.
- Coherence model only pertains to a single data item. So it is about a set of processes writing to a single data item.
36Client Centric Models
37Weaker Models
- Sometimes strong models are needed, if the results of race conditions are very bad.
- Banks
- Sometimes the results of races are just inefficiency, or inconvenience, etc.
- How strong is Orbitz's model?
- If it shows a ticket available, is it really?
- How does it prevent two people from reserving the same seat?
- One kind of weaker model is eventual consistency
- It eventually becomes consistent
38Eventual Consistency
[Figure: a laptop client performs read/write operations over a WAN on a distributed and replicated database. The client moves to another location and (transparently) connects to another replica. Replicas need to maintain client-centric consistency.]
- How well does EC work for mobile clients?
- Not very well. Things can disappear (go backwards, etc.).
- Client-centric is intended to address this. Consistent for a single client.
39Client-Centric Consistency
- Intended to address the issues in eventual consistency for mobile clients.
- Consistent for a single client.
- Notation
- x_i[t] is the version of x at local copy L_i at time t.
- Version x_i[t] is the result of a series of write operations at L_i that took place since initialization. This is WS(x_i[t]).
- If operations in WS(x_i[t1]) have also been performed at local copy L_j at a later time t2, we write WS(x_i[t1]; x_j[t2]).
40Monotonic Reads
- A data store is said to provide monotonic-read consistency if the following condition holds:
- If a process reads the value of a data item x, any successive read operation on x by that process will always return that same value or a more recent value.
- In other words, if a process has seen a value of x at time t, it will never see an older version of x at a later time.
- Example: Suppose a user opens his mailbox in San Francisco, then flies to New York. Should he see an earlier version of his mailbox?
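One way to picture monotonic reads is a client session that remembers the newest version it has read and refuses to read from a replica that is behind it. This is only a sketch under assumed names (`Replica`, `Session`, `StaleReplica`) and an assumed integer version number per data item; a real store might instead wait or forward the missing writes.

```python
class StaleReplica(Exception):
    """Raised when a replica is older than what this session has already read."""


class Replica:
    def __init__(self):
        self.version = 0
        self.value = None

    def apply(self, version, value):
        # Only move forward; older updates arriving late are ignored.
        if version > self.version:
            self.version, self.value = version, value


class Session:
    def __init__(self):
        self.last_read_version = 0

    def read(self, replica):
        if replica.version < self.last_read_version:
            raise StaleReplica("replica has not yet seen what this client read")
        self.last_read_version = replica.version
        return replica.value


san_francisco, new_york = Replica(), Replica()
san_francisco.apply(1, "mailbox with 3 messages")

user = Session()
print(user.read(san_francisco))
try:
    user.read(new_york)          # New York has not received version 1 yet
except StaleReplica as err:
    print("refused:", err)
new_york.apply(1, "mailbox with 3 messages")
print(user.read(new_york))
```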
41- Which one of these obeys this model?
42Monotonic Writes
- In a monotonic-write consistent store, the following condition holds:
- A write operation by a process on a data item x is completed before any successive write operation on x by the same process.
- In other words, a write operation must wait for all preceding write operations.
43- Which one of these obeys that?
44Read Your Writes
- A data store is said to provide read-your-writes consistency, if the following condition holds:
- The effect of a write operation by a process on data item x will always be seen by a successive read operation on x by the same process.
- In other words, a write operation is always completed before a successive read operation by the same process, no matter where the read operation takes place.
- Suppose your web browser has a cache.
- You update your web page on the server.
- You refresh your browser.
- Do you have read-your-writes consistency?
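The browser-cache question can be made concrete with a small sketch. The `Server` and `CachingClient` classes and the version counter are assumptions for illustration. The client remembers the version of its own last write and bypasses the cache whenever the cached copy is older.

```python
class Server:
    def __init__(self):
        self.version = 0
        self.page = "original page"

    def write(self, page):
        self.version += 1
        self.page = page
        return self.version

    def read(self):
        return self.version, self.page


class CachingClient:
    def __init__(self, server):
        self.server = server
        self.cached = server.read()   # (version, page) held in the local cache
        self.last_written = 0         # version produced by this client's last write

    def write(self, page):
        self.last_written = self.server.write(page)

    def read(self):
        version, page = self.cached
        if version < self.last_written:
            # The cache predates our own write: refetch instead of serving it.
            self.cached = self.server.read()
            version, page = self.cached
        return page


server = Server()
client = CachingClient(server)
client.write("updated page")
print(client.read())
```

Without the version check in `read`, the client would keep returning the cached original page after its own update.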
45- Which of these is read-your-writes?
46Writes Follow Reads
- A data store is said to provide writes-follow-reads consistency, if the following holds:
- A write operation by a process on a data item x following a previous read operation on x by the same process is guaranteed to take place on the same or a more recent value of x that was read.
- In other words, any successive write operation by a process on a data item x is guaranteed to take place on a copy of x that is up to date with the value most recently read.
- Example: Suppose we are replicating a database for a blog. Performing a write amounts to posting a response. If we do not use writes-follow-reads, then it would be possible for a user to read a response without the original.
47- Which of these obeys writes-follow-reads?
48Replica Management
49Two Subproblems
- Your boss says to you, "Our system is too slow, make it faster."
- You decide that replication of servers is the answer. What do you do next? What are the questions that need to be answered?
- Where to place servers?
- Where to place content?
50Placing Servers
- Given a set of N locations, how do you place the K servers?
- What are the goals?
- What is the metric that is being optimized?
- One algorithm: each time you place a server, minimize the average remaining distance to clients.
- What is distance?
- Is average the right thing to minimize? What if one client accesses a lot, the other not so much?
- Can we ignore the client locations?
- Yes, if they are uniformly distributed.
- Other ideas for algorithms?
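The greedy algorithm mentioned above can be sketched as follows. The function name, the one-dimensional coordinates, and the use of absolute difference as the distance are all assumptions for illustration. What "distance" should mean (latency, hops, bandwidth) is exactly the open question on the slide.

```python
def greedy_placement(locations, clients, k, distance):
    """Place k servers one at a time, each time choosing the location that
    minimizes the average distance from every client to its nearest server."""
    chosen = []
    for _ in range(k):
        def average_distance(candidate):
            servers = chosen + [candidate]
            nearest = [min(distance(c, s) for s in servers) for c in clients]
            return sum(nearest) / len(clients)

        candidates = [loc for loc in locations if loc not in chosen]
        chosen.append(min(candidates, key=average_distance))
    return chosen


# Hypothetical 1-D example: positions along a line.
locations = [0, 5, 10]
clients = [0, 1, 9, 10, 10]
placement = greedy_placement(locations, clients, 2, lambda a, b: abs(a - b))
print(placement)
```

In this example the first server goes to location 10, where most clients are, and the second goes to location 0.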
51Clustering
- One idea: identify the K largest clusters, then put one server in each cluster.
- How do you find clusters?
- One way: divide space up into cells, pick the K most populated ones.
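A minimal sketch of the cell approach, assuming nodes have two-dimensional coordinates and cells are squares of side `cell_size`. Both are assumptions; the function name `densest_cells` is hypothetical.

```python
from collections import Counter


def densest_cells(points, cell_size, k):
    """Bucket points into square cells and return the k most populated cells."""
    counts = Counter(
        (int(x // cell_size), int(y // cell_size)) for x, y in points
    )
    return [cell for cell, _ in counts.most_common(k)]


# Hypothetical node coordinates.
nodes = [(0.5, 0.5), (0.7, 0.2), (0.1, 0.9), (5.5, 5.5), (5.1, 5.9), (9.0, 1.0)]
cells = densest_cells(nodes, cell_size=1.0, k=2)
print(cells)
```

The result depends heavily on the cell size: cells that are too large merge distinct clusters, and cells that are too small split them.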
52Replica-Server Placement
- Choosing a proper cell size for server placement.
- It turns out that computing the cell size from the average distance between two nodes and the number of replicas works well.
- Close to optimum results, but takes much less time: O(N × max{log(N), K}).
- For example, computing the 20 best replica locations for 64,000 nodes is about 50,000 times faster.
53Content Replication and Placement
- The logical organization of different kinds of copies of a data store into three concentric rings.
[Figure labels: server-initiated replication, client-initiated replication.]
54Content Replication
- Permanent replicas
- Can be distributed across servers at a single location. (What problem does this address?)
- Can be distributed geographically. (What problem does this address?)
55- Server-initiated replicas
- Created more dynamically, at the request of the server.
- For example, imagine the traffic on a hypothetical Red Sox web site the night they won the World Series.
- Can be done to reduce load, and also to improve client performance.
- One algorithm: each server keeps track of requests for files, and where they come from.
- If the number of requests for F at Q drops below del(Q, F), the file is removed (if not the last replica).
- If the number of requests for F at Q goes above rep(Q, F), the file is replicated.
- If the number of requests for F is between del(Q, F) and rep(Q, F), the file will be migrated if for some server P, cnt_Q(P, F) exceeds half of the total requests for F.
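The three threshold rules can be sketched as a single decision function run by server Q for file F. The function name, the argument names, and the return values are assumptions for illustration. `requests_by_server` plays the role of cnt_Q(P, F), and the two thresholds stand for del(Q, F) and rep(Q, F).

```python
def decide(total_requests, requests_by_server, del_threshold, rep_threshold,
           is_last_replica):
    """Return (action, target) for file F at server Q."""
    if total_requests < del_threshold:
        # Too little demand: drop the copy unless it is the only one left.
        return ("keep", None) if is_last_replica else ("delete", None)
    if total_requests > rep_threshold:
        return ("replicate", None)
    # Between the thresholds: migrate if one server accounts for more than half.
    for server, count in requests_by_server.items():
        if count > total_requests / 2:
            return ("migrate", server)
    return ("keep", None)


print(decide(3, {"P": 2, "R": 1}, 5, 20, is_last_replica=False))
print(decide(30, {"P": 20, "R": 10}, 5, 20, is_last_replica=False))
print(decide(10, {"P": 7, "R": 3}, 5, 20, is_last_replica=False))
```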
56- Counting access requests from different clients.
57- If migration does not succeed for some reason, then replication is attempted. Server checks all other servers, starting with the one farthest away (why?). If some server R has cnt_Q(R, F) above a certain fraction of the requests for F, a replication attempt is made.
58- Client-Initiated Replicas (client-side caches)
- Client can cache at will.
- Can have different invalidation policies, etc.
59Content Distribution
- What to propagate? Possibilities
- Propagate only a notification of an update.
- Invalidation protocol.
- Transfer data from one copy to another.
- Propagate the update operation to other copies.
- When is each advantageous?
- Read/write ratio is small?
- Read/write ratio is high?
60Pull vs. Push
- Push is sent by servers without request.
- Pull is specifically requested by the client.
- When is each advantageous?
- One way of looking at efficiency is whether or
not a message is likely to be useless. For
example, an update message that is not read
before another one is sent.
61Leases
- Hybrid approach: a lease is a promise by the server to push for a specified amount of time. After that, the client must poll.
- Can distinguish three criteria
- If the data is rarely modified, should we give a long or short lease?
- If a client often requests an update, should we give long or short?
- If space is short at the server?
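A lease can be sketched as an expiry time kept on both sides. The class names are hypothetical, and time is passed in explicitly as a plain number so the example is deterministic; a real system would use clocks and would have to account for clock skew.

```python
class LeaseServer:
    def __init__(self, value):
        self.value = value
        self.leases = {}          # client -> lease expiry time

    def grant(self, client, now, duration):
        self.leases[client] = now + duration

    def update(self, value, now):
        self.value = value
        # Push only to clients whose lease is still valid.
        for client, expiry in self.leases.items():
            if now < expiry:
                client.cached = value


class LeaseClient:
    def __init__(self, server, now, duration):
        self.server = server
        self.cached = server.value
        self.expiry = now + duration
        server.grant(self, now, duration)

    def read(self, now):
        if now >= self.expiry:
            # Lease expired: the server no longer pushes, so poll (pull).
            self.cached = self.server.value
        return self.cached


server = LeaseServer("v1")
client = LeaseClient(server, now=0, duration=10)
server.update("v2", now=5)      # inside the lease: pushed to the client
server.update("v3", now=15)     # lease expired: not pushed
print(client.cached, client.read(now=16))
```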
62Unicasting vs. Multicasting
63Consistency Protocols
64Primary-Based Protocols
- In practice, consistency models are usually not too hard to understand.
- If it is too hard to understand, it is too hard to write correct applications.
- Note that this situation is somewhat different for hardware consistency models. Why?
- In primary-based protocols, each data item has an associated primary replica.
- Can be fixed or can move around.
65Remote-Write Protocols
- All write operations are forwarded to a single fixed primary server (also known as primary-backup).
- The primary does the update and forwards it to all the backups. Only when all have responded does the primary respond to the client.
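A blocking remote-write can be sketched in a few lines. The `Backup` and `Primary` classes are assumptions for illustration, and the "network" is just direct method calls, so failures and timeouts are not modeled.

```python
class Backup:
    def __init__(self):
        self.store = {}

    def update(self, key, value):
        self.store[key] = value
        return True               # acknowledge the update

    def read(self, key):
        # Reads can be served locally by any replica.
        return self.store.get(key)


class Primary(Backup):
    def __init__(self, backups):
        super().__init__()
        self.backups = backups

    def write(self, key, value):
        self.store[key] = value
        # Forward to every backup and wait for all acknowledgements
        # before telling the client that the write has completed.
        acks = [backup.update(key, value) for backup in self.backups]
        return all(acks)


backups = [Backup(), Backup()]
primary = Primary(backups)
done = primary.write("x", 42)
print(done, [b.read("x") for b in backups])
```

Because the write returns only after every backup has acknowledged, a read at any replica after that point sees the new value. The price is that the client waits for the slowest backup.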
66[Figure: remote-write protocol with two clients, the primary server for item x, backup servers, and the data store.]
W1. Write request
W2. Forward request to primary
W3. Tell backups to update
W4. Acknowledge update
W5. Acknowledge write completed
R1. Read request
R2. Response to read
67[Figure: the remote-write protocol diagram, with steps W1 to W5 and R1 to R2.]
- How is the performance of this protocol?
- Is it necessary to wait for the W5 to complete
before allowing the client to continue?
68Local-Write Protocols
- Primary copy migrates.
- Advantage is that multiple successive writes can be carried out locally.
- Reading processes can continue to read.
70- Also corresponds well with mobile computing.
- Before you disconnect, make your laptop the primary server.
- While disconnected, everything is updated locally.
- Also fits distributed file systems.
71Replicated-Write Protocols
- Active replication
- Writes may happen to any replica
- Need to handle ordering issues.
- One way is with totally ordered multicast.
- Another way is with a sequencer coordinator that
assigns sequence numbers.
72- Quorum-based: use voting.
- To do a write, a client must first get the approval of a majority of the servers.
- File is then updated, and a new version number is assigned.
- To do a read, a client also contacts a majority, and gets the current version number from them. If version numbers are the same, then it is the most recent version.
- Generalized
- To do a read, assemble a read quorum, NR.
- To modify, assemble a write quorum, NW.
- Constraints
- NR + NW > N, to prevent read-write conflicts.
- NW > N/2, to prevent write-write conflicts.
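The two constraints translate directly into a check. The function name and the sample configurations with N = 12 are hypothetical, chosen only to exercise both rules.

```python
def quorum_is_valid(n, n_r, n_w):
    """Check the two quorum constraints for n replicas."""
    no_read_write_conflict = n_r + n_w > n    # every read quorum overlaps every write quorum
    no_write_write_conflict = n_w > n / 2     # any two write quorums overlap
    return no_read_write_conflict and no_write_write_conflict


# Hypothetical configurations for N = 12 replicas.
for n_r, n_w in [(3, 10), (7, 6), (1, 12)]:
    print(f"NR={n_r}, NW={n_w}:", quorum_is_valid(12, n_r, n_w))
```

With NR = 7 and NW = 6, two disjoint sets of six servers could each accept a write, so that configuration is rejected. NR = 1 with NW = 12 is the read-one, write-all extreme.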
73[Figure: examples of read quorums and write quorums.]
- Which of these are valid?
74Cache-Coherence Protocols
- For hardware, broadcast or snooping is possible. Not for distributed systems.
- Three aspects
- Coherence detection strategy: when are inconsistencies detected?
- Static: a compiler analyzes the program and inserts instructions that avoid inconsistencies. What about for concurrency?
- Dynamic: inconsistencies are detected at runtime.
- When accessed, block the operation/transaction.
- When accessed, but do not block the transaction (optimistic).
- Only at commit.
- When is each of these good?
- Coherence enforcement strategy: how caches are kept consistent.
- Do not cache any shared data.
- If shared data can be cached:
- Send invalidation to all caches.
- Send the actual update.
- When is each of these going to be better?
- Modifications by clients: what happens when a client modifies data.
- Write-through
- Write-back
75- void foo(int *a, int *b) {
    // Does b[0] need to be reloaded?
    for (int i = 0; ...; i++) a[i] = b[i] + b[0];
}
76Client-Centric Consistency
- Straightforward, if we ignore performance issues.