Title: Optimistic replication for Internet data services
1. Optimistic replication for Internet data services
http://porcupine.cs.washington.edu/
University of Washington, Department of Computer Science and Engineering, Seattle, WA, U.S.A.
2. Overview
- Simple and lightweight algorithm suitable for cluster-based Internet data services
- Dynamic replica addition/deletion
- Ensures eventual consistency of replicas
- Completely decentralized
- Tolerates multiple node failures, partitions, etc.
- Space- and cost-efficient
- Implemented on the Porcupine scalable email server
3. Outline
- Motivation
- Examples
- Correctness
- Practical Issues
- Performance
- Conclusion
4. Motivation
- Porcupine cluster-based mail server
- Manageability, availability, and performance via homogeneous architecture and dynamic data distribution
- Other applications: BBS, Web, calendar, naming/load-balancing services, ...
5. Goals and Non-goals
- Goals
  - Dynamic addition/removal of replicas
  - Space- and computational efficiency
  - Fault tolerance
  - Simplicity
- Non-goals
  - Single-copy consistency (it's the Internet, anyway)
6. Why a new algorithm?
- PC-based clusters present a new environment.
- Prior art focused on two extreme environments: mainframe+LAN and laptop+modem.
- Single-copy algorithms are not available enough.
- Mobile replication algorithms are not optimized for mostly-connected environments.
- Very few algorithms allow addition/deletion of replicas.
7. Algorithm Overview
- Contents-pushing (cf. Usenet, MS Active Directory)
  - ⇒ Computational efficiency
- Two-phase protocol (Apply, Retire)
  - ⇒ Space efficiency
- Unified treatment of contents updates and replica addition/deletion
- Thomas write rule + node discovery to resolve conflicting updates
  - ⇒ Simplicity, fault tolerance
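As a rough sketch of the per-update state the two-phase protocol keeps (field and class names here are illustrative, not taken from the Porcupine code):

```python
# Hypothetical model of an update record in the two-phase (Apply, Retire)
# protocol; names are my own, not from the Porcupine implementation.
from dataclasses import dataclass, field

@dataclass
class UpdateRecord:
    timestamp: str            # issue time from loosely synchronized clocks
    contents: bytes           # new object contents (or a replica-set change)
    targets: set              # replicas this update must reach
    acks: set = field(default_factory=set)  # replicas known to have applied it

    def fully_acked(self) -> bool:
        # Once every target has acknowledged, the record can be retired
        # (deleted everywhere); the contents then live only in the replicas.
        return self.targets <= self.acks

rec = UpdateRecord("3:10pm", b"new mail", targets={"A", "B", "C"})
rec.acks |= {"A", "B", "C"}
assert rec.fully_acked()
```

Because the record is deleted after retirement, its cost is paid only while an update is in flight.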
8. Outline
- Motivation
- Examples
  - Updating contents
  - Adding and deleting replicas
  - Resolving conflicting updates
- Correctness
- Practical Issues
- Performance
- Conclusion
9. Example: Updating contents
[Diagram: an object replicated on nodes A, B, C. Each replica stores the object contents and the replica set {A, B, C}. An update (timestamped 3:10pm) carries an ack set; the update record exists only during update propagation.]
10. Example: Update Propagation
[Diagram: node A pushes the 3:10pm update to B and C; each replica that applies it is added to the update's ack set.]
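The propagation step can be sketched as a toy in-memory simulation (a plain function call stands in for the real RPC; the Thomas-write-rule check at each replica is shown inline):

```python
# Toy simulation of contents pushing: the issuing node pushes the update to
# every target replica and records the acknowledgements.
def propagate(update, replicas):
    """replicas: dict mapping node name -> that node's local object store."""
    for node in sorted(update["targets"]):
        store = replicas[node]
        if store.get("ts") is None or update["ts"] > store["ts"]:
            store["contents"] = update["contents"]   # newer update wins
            store["ts"] = update["ts"]
        update["acks"].add(node)   # ack even a stale update, so it can retire

replicas = {n: {} for n in "ABC"}
upd = {"ts": 1, "contents": b"v1", "targets": set("ABC"), "acks": set()}
propagate(upd, replicas)
assert upd["acks"] == {"A", "B", "C"}
```

A real implementation would push asynchronously and retry until every target has acknowledged.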
11. Update Retirement
[Diagram: once all of A, B, C appear in the 3:10pm update's ack set, "Retire 3:10pm" messages are sent and every node deletes its copy of the update record.]
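Retirement, as illustrated above, amounts to: once the ack set covers the target set, every node discards its copy of the update record (a toy model; in practice the Retire messages are batched RPCs):

```python
# Toy retirement step: when all targets have acked, every replica deletes
# the update record; only the object contents remain on disk.
def retire(update, pending):
    """pending: dict mapping node name -> set of outstanding update ids."""
    if update["targets"] <= update["acks"]:
        for node in update["targets"]:
            pending[node].discard(update["id"])
        return True
    return False

pending = {n: {"u1"} for n in "ABC"}
upd = {"id": "u1", "targets": set("ABC"), "acks": set("ABC")}
assert retire(upd, pending)
assert all(not s for s in pending.values())
```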
12. Example: Final State
- Algorithm quiescent after update retirement
  - New contents absent from the update record
  - Contents are read from the replicas directly
  - Update stored only during propagation
  - ⇒ Computational & space efficiency
[Diagram: replicas A, B, C each hold the new contents and the replica set; no update records remain.]
13. Replica addition and removal
- A issues an update to delete C.
- Unified treatment of updates to contents and to the replica set.
[Diagram: the 3:10pm update carries the new replica set {A, B}, while its target set still includes C so that C learns to delete its copy; the ack set grows as the update propagates.]
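Because replica-set changes are just updates, deleting C can be sketched as an ordinary update whose payload is the new replica set; note the target set still contains the removed node so it learns to drop its copy (an illustrative model, not Porcupine's actual code):

```python
# Sketch: a replica-set change expressed as a normal update. The removed
# node stays in the target set so it receives the update and deletes its
# local copy; names and fields are illustrative.
def make_delete_replica_update(ts, old_set, node_to_remove):
    return {
        "ts": ts,
        "new_replica_set": old_set - {node_to_remove},  # the payload
        "targets": set(old_set),   # still includes the removed node
        "acks": set(),
    }

upd = make_delete_replica_update("3:10pm", {"A", "B", "C"}, "C")
assert upd["new_replica_set"] == {"A", "B"}
assert "C" in upd["targets"]
```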
14. What if updates conflict?
- Thomas write rule
  - Newest update always wins.
  - Older update is canceled by being overwritten by the newer one.
  - Same rule applied to replica addition/deletion.
- But some subtleties...
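A minimal sketch of the Thomas write rule as applied per object (timestamps are assumed totally ordered, e.g. synchronized-clock time plus a node-id tiebreaker):

```python
# Thomas write rule: an incoming update is applied only if its timestamp is
# newer than what the replica already stores; an older update is simply
# discarded, i.e. "canceled by overwriting".
def thomas_apply(store, new_contents, new_ts):
    if store.get("ts") is None or new_ts > store["ts"]:
        store["contents"], store["ts"] = new_contents, new_ts
        return True
    return False

store = {}
assert thomas_apply(store, b"v-new", 1)
assert not thomas_apply(store, b"v-stale", 0)   # stale update ignored
assert store["contents"] == b"v-new"
```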
15. Update conflict resolution
- A adds C (3:10pm update) while B adds D (3:20pm update) simultaneously.
- B must discover C and let C delete the replica contents.
[Diagram: A's 3:10pm update proposes new replica set {A, B, C}; B's newer 3:20pm update proposes {A, B, D} and wins, so C's copy must be removed.]
16. Node discovery protocol
[Diagram: while B propagates "Apply 3:20pm update" with new replica set {A, B, D}, it discovers that the 3:10pm update had added C and issues "Add targets: C"; C then applies the newer update and deletes its copy.]
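The discovery step can be sketched as: while an update propagates, any replica set a node learns about is merged into the update's target set, so a node named only by the losing update (C here) still receives the winning one (a toy model; names are mine):

```python
# Toy node discovery: merging every replica set observed during propagation
# into the update's target set guarantees that nodes added by an older,
# conflicting update still hear about the newer, winning update.
def discover(update, observed_replica_set):
    newly_found = observed_replica_set - update["targets"]
    update["targets"] |= newly_found
    return newly_found

upd = {"ts": "3:20pm", "new_replica_set": {"A", "B", "D"},
       "targets": {"A", "B", "D"}, "acks": set()}
assert discover(upd, {"A", "B", "C"}) == {"C"}   # C added as a target
assert "C" in upd["targets"]
```

Once C receives the 3:20pm update, it sees that the winning replica set excludes it and deletes its local copy.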
17. Proof of Correctness
- Claim: all live replicas will store the newest update, regardless of
  - the number of concurrent updates,
  - the number of replicas added or removed,
  - the number of node failures,
  provided that nodes can discover each other at least indirectly.
- E.g., when partitioned, each partition will become consistent.
18. Outline
- Motivation
- Examples
- Correctness
- Practical Issues
- Performance
- Conclusion
19. Practical Issues
- Handling long-dead nodes
  - The algorithm maintains consistency of the remaining replicas.
  - But updates will get stuck and clog nodes' disks.
  - Solution: erase dead nodes' names from replica sets & update records after the grace period.
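The grace-period cleanup might look like the following sketch (the grace-period length and liveness bookkeeping are stand-ins, not values from the talk):

```python
GRACE_PERIOD = 7 * 24 * 3600   # illustrative choice: one week, in seconds

# Prune nodes unreachable longer than the grace period from the replica set
# and from every outstanding update's target set, so stuck updates can
# finally retire instead of clogging disks.
def prune_dead_nodes(replica_set, updates, last_seen, now):
    dead = {n for n, t in last_seen.items() if now - t > GRACE_PERIOD}
    replica_set -= dead
    for u in updates:
        u["targets"] -= dead
    return dead

now = 10 * 24 * 3600
rs = {"A", "B", "C"}
ups = [{"targets": {"A", "B", "C"}, "acks": {"A", "B"}}]
seen = {"A": now - 60, "B": now - 60, "C": 0}   # C silent for 10 days
assert prune_dead_nodes(rs, ups, seen, now) == {"C"}
assert ups[0]["targets"] <= ups[0]["acks"]      # update can now retire
```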
20. Performance: Networking overhead
- Each update sends Apply and Retire msgs.
- Retire can be batched w/o affecting users.
- Actual # of msgs ≤ 2(N-1).
[Graph: measured networking overhead on a fully loaded Porcupine mail server.]
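The message bound stated above is just one Apply plus one Retire to each of the other N-1 replicas (sketch; batching Retire messages can only lower it):

```python
# Per-update message bound: Apply to the N-1 other replicas, then Retire to
# the same N-1; batched Retires reduce the actual count below this.
def max_messages(n_replicas):
    return 2 * (n_replicas - 1)

assert max_messages(3) == 4   # A, B, C: 2 Applies + 2 Retires
assert max_messages(5) == 8
```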
21. Performance: Space overhead
- Each update record is small (contents are read directly from the replicas).
- Updates are deleted quickly after retirement.
- # of outstanding updates is independent of # of objects on a node.
[Measured on a Porcupine node: 100K for update records vs. 2G for email messages.]
22. Conclusion
- Simple and lightweight algorithm suitable for cluster-based Internet data services
- Contributions
  - Simple dynamic replica addition protocol
  - Node discovery for resolving concurrent updates
  - Update retirement using synchronized clocks
- Code available at http://porcupine.cs.washington.edu/
23. Potential Applications
- This algorithm is not just for email...
- Imagine proxies for update-intensive web sites
  - Today, they use timeouts and polling
  - Dynamic replication improves availability
[Diagram: a master site replicating to proxies.]
25. Performance: Networking overhead (bytes)
- Each network message is mostly occupied by actual object contents.
- Overhead added by the replication service: ≤ 6%.