Title: Transactions with replicated data. EEE465, 2001. Reference: CDK00 14.5
1 Transactions with replicated data
- Major Greg Phillips
- Royal Military College of Canada
- Electrical and Computer Engineering
- greg.phillips@rmc.ca
- 1-613-541-6000 ext. 6190
2 Transactions with replicated data
- a transaction on replicated objects should appear the same as one on non-replicated objects (one-copy serializability)
- each replica manager (RM) provides concurrency control and recovery of its own objects
- here, assume normal two-phase locking is used for concurrency control
- when an RM recovers from failure, it uses information from other RMs to restore its objects to their current values
- the last transaction comes from its own log, other transactions from other RMs
3 Architecture
- different schemes address different needs:
- can a client request be addressed to any RM, or only to a particular RM?
- how many RMs are required for the successful completion of an operation?
- can a contacted RM defer forwarding requests to other replicas until the transaction is committed?
- two-phase commit protocol
- each transaction becomes a two-level nested 2PC
- the coordinator communicates with the participants
- if a participant is replicated, it must communicate with the other replicas
- a participant votes No if any of its replicas is unable to commit
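The nested voting rule above can be sketched in a few lines. This is an illustrative sketch, not the course's code: class and method names (`Replica`, `ReplicaGroupParticipant`, `vote`) are assumptions, and real 2PC adds timeouts, logging, and a decision phase.

```python
# Sketch: a 2PC participant that fronts a replica group.
# The participant votes No if ANY of its replicas cannot commit.

class Replica:
    def __init__(self, can_commit):
        self._can_commit = can_commit

    def prepare(self):
        return self._can_commit  # this replica's local Yes/No vote

class ReplicaGroupParticipant:
    """A 2PC participant that must consult every replica it fronts."""
    def __init__(self, replicas):
        self.replicas = replicas

    def vote(self):
        # No from any replica forces a No vote for the whole group.
        return all(r.prepare() for r in self.replicas)

def coordinator_decides(participants):
    # Global commit only if every participant (and so every replica) votes Yes.
    return all(p.vote() for p in participants)

healthy = ReplicaGroupParticipant([Replica(True), Replica(True)])
degraded = ReplicaGroupParticipant([Replica(True), Replica(False)])
print(coordinator_decides([healthy]))            # True: commit
print(coordinator_decides([healthy, degraded]))  # False: abort
```

This is why each transaction becomes a two-level 2PC: the coordinator collects votes from participants, and each replicated participant first collects votes from its own replicas.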
4 Replication management schemes
- primary copy
- all reads and writes go to a single copy, which updates the others
- read-one/write-all
- simple: reads from any replica, writes to all replicas
- not fault tolerant
- available copies with validation
- continues to work through a network partition
- when the partitions reconnect, inconsistencies are detected and repaired
- quorum consensus
- operations are only carried out in the larger partition
- on reconnection, the smaller partition is brought up to date
- virtual partition
- a combination of available copies and quorum consensus
5 Primary copy replication
- all requests are directed to a distinguished primary copy
- the primary copy performs local concurrency control and updates the replicas
- useful for fault tolerance and availability: on primary copy failure we appoint a new primary from the backups
- no help with performance over a single server
- approaches
- eager: update the backups first, then respond to the client
- replicas are always consistent
- lazy: respond to the client first, then update the backups
- faster at the cost of possible inconsistency (visible on primary failure)
- one-copy serializable (why?)
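The eager/lazy distinction can be made concrete with a small sketch. Everything here is illustrative (the `PrimaryCopy` class, the dict-based backups, the `flush` method are assumptions, not the lecture's design); the point is only the ordering of propagation versus the client reply.

```python
# Sketch: eager vs. lazy update propagation at a primary copy.

class PrimaryCopy:
    def __init__(self, backups, eager=True):
        self.value = 0
        self.backups = backups   # dicts standing in for backup RMs
        self.eager = eager
        self.pending = []        # updates not yet pushed (lazy mode)

    def write(self, v):
        self.value = v
        if self.eager:
            self._propagate(v)   # backups updated BEFORE the client reply
        else:
            self.pending.append(v)  # reply first; propagate later
        return "ok"              # response to the client

    def _propagate(self, v):
        for b in self.backups:
            b["value"] = v

    def flush(self):             # lazy mode: later background propagation
        for v in self.pending:
            self._propagate(v)
        self.pending.clear()

backups = [{"value": 0}, {"value": 0}]
eager = PrimaryCopy(backups, eager=True)
eager.write(7)
# after an eager write returns, every backup already holds 7
```

With `eager=False`, the client gets `"ok"` while the backups still hold the old value; if the primary crashes before `flush`, that inconsistency becomes visible, which is exactly the lazy trade-off the slide names.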
6 Read-one/write-all
- a read request may be handled immediately by any replica
- a write request requires the concurrence of all replicas
- doesn't handle network partition
- one-copy serializable
- a transaction can succeed only if there are no read or write lock conflicts
- a write/write conflict will be detected, since there will be conflicting write locks at all replicas
- a read/write conflict will be detected, since there will be conflicting read and write locks at at least one replica
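The conflict-detection argument above can be sketched with per-replica lock tables. This is a simplified illustration (names like `ReplicaRM` and `try_write` are assumptions), and it rejects conflicting requests outright where real two-phase locking would block and wait.

```python
# Sketch: read-one/write-all detects conflicts because a write must
# take locks at EVERY replica, so it meets any reader's lock somewhere.

class ReplicaRM:
    def __init__(self):
        self.read_locks = set()   # transaction ids holding read locks
        self.write_lock = None    # at most one writer

    def try_read(self, tx):
        if self.write_lock not in (None, tx):
            return False          # read/write conflict at this replica
        self.read_locks.add(tx)
        return True

    def try_write(self, tx):
        if self.write_lock not in (None, tx) or self.read_locks - {tx}:
            return False          # conflict detected at this replica
        self.write_lock = tx
        return True

replicas = [ReplicaRM() for _ in range(3)]

def read_one(tx):                 # a read needs only one replica
    return any(r.try_read(tx) for r in replicas)

def write_all(tx):                # a write needs every replica
    return all(r.try_write(tx) for r in replicas)

read_one("T")                     # T takes a read lock at some replica
ok = write_all("U")               # U's write reaches that replica too
print(ok)                         # False: the read/write conflict was caught
```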
7 Available copies
- extends read-one/write-all to account for failure
- a read request is performed by any active replica
- a write request requires all available replicas
- when a replica manager fails, it is replaced by a recovered copy
- with no failures, equivalent to read-one/write-all, and therefore one-copy serializable
- the failure case requires an additional mechanism
8 Available copies: failure example
[Figure: transactions T (getBalance(A); deposit(B,3)) and U (getBalance(B); deposit(A,3)) running against two replica groups, with replica managers X, Y, M, N, and P holding copies of objects A and B.]
Consider: X fails just after T has performed getBalance(A), and N fails just after U has performed getBalance(B), but neither transaction has yet performed its deposit. Which replicas perform T's deposit? Which replicas perform U's deposit? Do the transactions succeed or fail? Is there an inconsistency?
9 Local validation
- local validation is added to available copies to prevent inconsistencies caused by RM failure
- rule: no failure or recovery event may appear to happen during a transaction
- before a transaction commits, it checks for failures and recoveries of the replica managers of every object it has accessed
- in our example, T detects N's failure when it writes to B
- on commit, T checks that N is still failed and that X, P, and M are still available; if so, it can commit
- this imposes a causal order on commits
- the algorithm fails under network partitions (working replica managers unable to communicate)
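The commit-time check can be written as a simple set comparison. A minimal sketch, assuming the transaction records which RMs it observed as failed and which it used; the function name and set representation are illustrative.

```python
# Sketch of the local-validation rule: no failure or recovery event
# may appear to have happened during the transaction.

def can_commit(failed_seen, available_used, failed_now, available_now):
    """Commit only if every RM the transaction saw as failed is still
    failed, and every RM it used is still available."""
    return failed_seen <= failed_now and available_used <= available_now

# T observed N as failed when writing B, and used X, P, and M.
print(can_commit({"N"}, {"X", "P", "M"}, {"N"}, {"X", "P", "M"}))
# True: N still failed, X, P, M still available, so T may commit.

# If N has recovered by commit time, the rule is violated and T aborts.
print(can_commit({"N"}, {"X", "P", "M"}, set(), {"X", "P", "M", "N"}))
# False
```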
10 Quorum consensus
- designed to reduce the number of RMs that must perform update operations, at the expense of increasing the number of RMs required for read-only operations
- key idea: timestamp each replica, and vote on which replicas hold the current timestamp
- a read requires contact with R replica managers
- a write requires contact with W replica managers
- set W > one half of the total number of RMs
- set R + W > the total number of RMs
- this ensures that any pair of read and write quorums contain some common copies, at least one of which is the most up to date
- using read and write locks, this can ensure one-copy serializability
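The quorum arithmetic above can be demonstrated directly. This is an illustrative sketch under stated assumptions (five replicas, dict-based replica state, caller-supplied quorums); real quorum consensus also needs version-number management, locking, and bringing stale quorum members up to date.

```python
# Sketch: quorum consensus with R + W > N and timestamped replicas.

N = 5
R, W = 3, 3            # satisfies both R + W > N and W > N/2
assert R + W > N and W > N / 2

replicas = [{"ts": 0, "value": None} for _ in range(N)]

def write(value, ts, quorum):
    assert len(quorum) >= W
    for i in quorum:
        replicas[i] = {"ts": ts, "value": value}

def read(quorum):
    assert len(quorum) >= R
    # Because R + W > N, any read quorum overlaps every write quorum,
    # so the highest timestamp in the quorum is the current value.
    return max((replicas[i] for i in quorum), key=lambda r: r["ts"])["value"]

write("balance=42", ts=1, quorum=[0, 1, 2])   # write quorum of W replicas
current = read(quorum=[2, 3, 4])              # overlaps the write at replica 2
print(current)                                # balance=42
```

Replicas 3 and 4 are stale, yet the read is still correct: the overlap guarantee, not any individual replica, is what makes the result current.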
11 Virtual partition
- a combination of available copies and quorum consensus
- quorum consensus works correctly in the presence of partitions
- available copies is more efficient for read operations
- key idea: if a virtual partition (a set of communicating RMs) has enough votes to make a quorum, use the available copies algorithm for reads within it
- if an RM fails and the partition changes during a transaction, abort
- ensures that all transactions that complete see the failures and recoveries of RMs in the same order
- managing virtual partitions is complex
12 Next class: Name Services