Dynamo: Amazon's Highly Available Key-value Store

About This Presentation

Slides: 22

Provided by: stevesc6

Transcript and Presenter's Notes

1
Dynamo: Amazon's Highly Available Key-value Store
Giuseppe DeCandia, Deniz Hastorun, Madan Jampani,
Gunavardhan Kakulapati, Avinash Lakshman, Alex
Pilchin, Swami Sivasubramanian, Peter
Vosshall, and Werner Vogels
Presented by Steve Schlosser, Big Data Reading
Group, October 1, 2007
2
What Dynamo is

Dynamo is a highly available distributed
key-value storage system
put(), get() interface
Sacrifices consistency for availability
Provides storage for some of Amazon's key
products (e.g., shopping carts, best seller
lists, etc.)
Uses synthesis of well known techniques to
achieve scalability and availability
Consistent hashing, object versioning, conflict
resolution, etc.

3
Scale

Amazon is busy during the holidays
Shopping cart: tens of millions of requests for 3
million checkouts in a single day
Session state system: 100,000s of concurrently
active sessions
Failure is common
Small but significant number of server and
network failures at all times
Customers should be able to view and add items
to their shopping cart even if disks are failing,
network routes are flapping, or data centers are
being destroyed by tornados.

4
Flexibility

Minimal need for manual administration
Nodes can be added or removed without manual
partitioning or redistribution
Apps can control availability, consistency,
cost-effectiveness, performance
Can developers know this up front?
Can it be changed over time?

5
Assumptions & requirements

Simple query model
Values are small (< 1 MB) binary objects
No ACID properties
Weaker consistency
No isolation guarantees
Single key updates
Stringent latency requirements
99.9th percentile
Non-hostile environment

6
Service level agreements

SLAs are used widely at Amazon
Sub-services must meet strict SLAs
e.g., 300 ms response time for 99.9% of requests
at peak load of 500 requests/s
Average-case SLAs are not good enough
Mentioned a cost-benefit analysis that said 99.9%
is the right number
Rendering a single page can make requests to 150
services
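
A minimal sketch of why an average-case SLA hides tail latency. The latency numbers are made up for illustration, and `percentile` is a hypothetical helper using the nearest-rank method; only the 300 ms / 99.9% target comes from the slide.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]

# Hypothetical workload: 9,980 fast requests and 20 slow ones (milliseconds).
latencies_ms = [20] * 9980 + [900] * 20

mean_ms = sum(latencies_ms) / len(latencies_ms)   # looks healthy
p999_ms = percentile(latencies_ms, 99.9)          # exposes the slow tail
meets_sla = p999_ms <= 300                        # 300 ms at the 99.9th percentile
```

In this made-up trace the mean sits far below 300 ms while the 99.9th percentile does not, so a service could pass an average-case SLA and still fail the percentile-based one.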

7
Consistency

Eventual consistency
Always writable
Can always write to shopping cart
Pushes conflict resolution to reads
Application-driven conflict resolution
e.g., merge conflicting shopping carts
Or Dynamo enforces last-writer-wins
How often does this work?
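
The two reconciliation styles can be sketched as follows. This is an illustration only: the cart layout (item name mapped to quantity), the timestamps, and both function names are assumptions, not Dynamo's actual data model.

```python
def merge_carts(versions):
    """Application-driven reconciliation: keep every item seen in any version,
    with the largest quantity recorded for it."""
    merged = {}
    for cart in versions:
        for item, qty in cart.items():
            merged[item] = max(qty, merged.get(item, 0))
    return merged

def last_writer_wins(versions_with_ts):
    """Store-level fallback: keep only the version with the newest timestamp."""
    return max(versions_with_ts, key=lambda pair: pair[0])[1]

# Two divergent versions of one cart; the second replica saw "lamp" removed.
cart_a = {"book": 1, "lamp": 1}
cart_b = {"book": 1}

merged = merge_carts([cart_a, cart_b])
newest = last_writer_wins([(100, cart_a), (105, cart_b)])
```

Merging never loses an added item, but the removed "lamp" comes back in `merged`. Last-writer-wins keeps the removal here, yet it would silently drop whichever concurrent update carries the older timestamp.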

8
Other stuff

Incremental scalability
Minimal management overhead
Symmetry
No master/slave nodes
Decentralized
Centralized control leads to too many failures
Heterogeneity
Exploit capabilities of different nodes

9
Interface

get(key) returns object replica(s) for key, plus
a context object
context encodes metadata, opaque to caller
put(key, context, object) stores object
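
A single-process stand-in for this interface, to show how the context ties a put() back to the versions a caller has read. `ToyStore` and its version-id context are hypothetical; the real context carries metadata such as vector clocks, and nothing here is replicated.

```python
class ToyStore:
    """In-memory sketch of get()/put() with an opaque context (no replication)."""

    def __init__(self):
        self._data = {}     # key -> list of (version_id, object)
        self._next_id = 0

    def get(self, key):
        versions = self._data.get(key, [])
        objects = [obj for _, obj in versions]
        context = frozenset(vid for vid, _ in versions)  # opaque to the caller
        return objects, context

    def put(self, key, context, obj):
        # Versions named in the context are superseded by this write;
        # versions the caller never saw stay behind as siblings.
        survivors = [(vid, o) for vid, o in self._data.get(key, []) if vid not in context]
        self._next_id += 1
        survivors.append((self._next_id, obj))
        self._data[key] = survivors

store = ToyStore()
_, empty_ctx = store.get("cart")
store.put("cart", empty_ctx, b"v1")
store.put("cart", empty_ctx, b"v2")     # stale context: kept as a sibling of v1
siblings, ctx = store.get("cart")
store.put("cart", ctx, b"merged")       # context covers both, so both are replaced
final, _ = store.get("cart")
```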

10
Variant of consistent hashing
[Figure: ring of nodes A through G, with key K placed on the ring]
Each node is assigned to multiple points in the
ring (e.g., B, C, D store key range (A, B])
Number of points can be assigned based on node's
capacity
If node becomes unavailable, load is distributed
to others
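
A minimal sketch of consistent hashing with virtual nodes, under some assumptions: the `Ring` class, the `node#i` token naming, and the token counts are invented for illustration. MD5 is used for ring positions, as in the Dynamo paper.

```python
import bisect
import hashlib

def ring_position(label):
    """Map a string to a point on the ring (MD5 digest as a 128-bit integer)."""
    return int.from_bytes(hashlib.md5(label.encode()).digest(), "big")

class Ring:
    def __init__(self):
        self._points = []   # sorted list of (position, physical node)

    def add_node(self, node, tokens):
        """Place `tokens` virtual nodes for `node`; more tokens means more load."""
        for i in range(tokens):
            bisect.insort(self._points, (ring_position(f"{node}#{i}"), node))

    def remove_node(self, node):
        self._points = [p for p in self._points if p[1] != node]

    def owner(self, key):
        """First virtual node at or clockwise from the key's position."""
        i = bisect.bisect_left(self._points, (ring_position(key), ""))
        return self._points[i % len(self._points)][1]

ring = Ring()
ring.add_node("A", 8)
ring.add_node("B", 8)
ring.add_node("C", 16)      # assumed to have twice the capacity

keys = [f"key{i}" for i in range(200)]
before = {k: ring.owner(k) for k in keys}
ring.remove_node("B")
after = {k: ring.owner(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

Only keys that B owned change hands when B leaves, and because B's virtual nodes are scattered around the ring, those keys can spread across the remaining nodes instead of landing on a single successor.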
11
Replication
[Figure: ring of nodes A through G, with key K and its coordinator marked]
B maintains a preference list for each data
item specifying nodes storing that item
Preference list skips virtual nodes in favor
of physical nodes
D stores (A, B], (B, C], (C, D]
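
One way to build such a preference list is to walk the ring clockwise from the key and keep the first N distinct physical nodes. The ring positions and the `preference_list` helper below are hypothetical; node B is given two virtual nodes to show the skipping.

```python
import bisect

# Hypothetical ring: (position, physical node). Node "B" owns two virtual nodes.
RING = sorted([(10, "A"), (20, "B"), (30, "B"), (40, "C"), (50, "D"), (60, "E")])

def preference_list(key_position, n, ring=RING):
    """Walk clockwise from the key and collect the first n distinct physical nodes."""
    start = bisect.bisect_left(ring, (key_position, ""))
    chosen = []
    for step in range(len(ring)):
        node = ring[(start + step) % len(ring)][1]
        if node not in chosen:      # skip further virtual nodes of a chosen host
            chosen.append(node)
        if len(chosen) == n:
            break
    return chosen

prefs = preference_list(15, 3)      # key falls in (10, 20], so "B" coordinates
wrapped = preference_list(55, 3)    # the walk wraps around the end of the ring
```

Skipping B's second virtual node keeps the three replicas on three different machines, which is the point of favoring physical nodes.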
12
Data versioning

put() can return before update is applied to all
replicas
Subsequent get()s can return older versions
This is okay for shopping carts
Branched versions are collapsed
Deleted items can resurface
A vector clock is associated with each object
version
Comparing vector clocks can determine whether two
versions are parallel branches or causally
ordered
Vector clocks passed by the context object in
get()/put()
Application must maintain this metadata?
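
A minimal sketch of vector clock comparison. Clocks are modeled as plain dicts from node name to counter, and the helper names are hypothetical. The node names Sx, Sy, Sz and the sequence of writes follow the example in the Dynamo paper.

```python
def increment(clock, node):
    """Return a copy of `clock` with `node`'s counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated

def descends(a, b):
    """True if version a has seen everything version b has."""
    return all(a.get(node, 0) >= count for node, count in b.items())

def compare(a, b):
    if a == b:
        return "equal"
    if descends(a, b):
        return "a supersedes b"
    if descends(b, a):
        return "b supersedes a"
    return "concurrent"     # parallel branches: the reader must reconcile

def merge(a, b):
    """Element-wise maximum, used when a client reconciles two branches."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in a.keys() | b.keys()}

d1 = increment({}, "Sx")                # written via Sx
d2 = increment(d1, "Sx")                # overwritten via Sx
d3 = increment(d2, "Sy")                # one client updates via Sy
d4 = increment(d2, "Sz")                # another client updates via Sz
d5 = increment(merge(d3, d4), "Sx")     # reconciled and written back via Sx
```

Here `compare(d3, d4)` reports the two versions as concurrent, so a get() would return both, while `d5` supersedes each of them once the reconciled value is written.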

13
Vector clock example
14
Quorum-likeness

get() & put() driven by two parameters
R: the minimum number of replicas to read
W: the minimum number of replicas to write
R + W > N yields a quorum-like system
Latency is dictated by the slowest R (or W)
replicas
Sloppy quorum to tolerate failures
Replicas can be stored on healthy nodes
downstream in the ring, with metadata specifying
that the replica should be sent to the intended
recipient later
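
A small sketch of the two quorum points above. Both helpers are hypothetical and model strict quorums only, not the sloppy variant. The first brute-force checks that every read set meets every write set, and the second treats operation latency as the reply time of the R-th (or W-th) fastest replica.

```python
from itertools import combinations

def quorums_overlap(n, r, w):
    """Does every read set of size r share a replica with every write set of size w?"""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )

def operation_latency(replica_latencies_ms, needed):
    """The coordinator answers once `needed` replicas reply,
    so latency is that of the needed-th fastest reply."""
    return sorted(replica_latencies_ms)[needed - 1]

overlap_322 = quorums_overlap(3, 2, 2)      # R + W = 4 > 3
overlap_311 = quorums_overlap(3, 1, 1)      # R + W = 2 <= 3
read_ms = operation_latency([5, 12, 240], needed=2)
all_ms = operation_latency([5, 12, 240], needed=3)
```

With made-up reply times of 5, 12, and 240 ms, waiting for two replicas hides the slow third one, while waiting for all three does not. That is the latency cost of raising R or W.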

15
Adding and removing nodes

Explicit commands issued via CLI or browser
Gossip-style protocol propagates changes among
nodes
New node chooses virtual nodes in the hash space

16
Implementation

Persistent store: either Berkeley DB Transactional
Data Store, BDB Java Edition, MySQL, or in-memory
buffer w/ persistent backend
All in Java!
Common N, R, W setting is (3, 2, 2)
Results are from several hundred nodes configured
as (3, 2, 2)
Not clear whether they run in a single datacenter

17
One tick = 12 hours
18
One tick = 1 hour
19
During periods of high load, popular objects
dominate
During periods of low load, fewer popular objects
are accessed
One tick = 30 minutes
20
Quantifying divergent versions

In a 24 hour trace
99.94% of requests saw exactly one version
0.00057% received 2 versions
0.00047% received 3 versions
0.00009% received 4 versions
Experience showed that divergence usually came
from concurrent writers due to automated client
programs (robots), not humans

21
Conclusions

Scalable
Easy to shovel in more capacity at Christmas
Simple
get()/put() maps well to Amazon's workload
Flexible
Apps can set N, R, W to match their needs
Inflexible
Apps have to set N, R, W to match their needs
Apps may have to do their own conflict resolution
They claim it's easy to set these; does this
mean that there aren't many interesting points?
Interesting?