Dynamo: Amazon's Highly Available Key-value Store

1
Dynamo: Amazon's Highly Available Key-value Store
Giuseppe DeCandia, Deniz Hastorun, Madan Jampani,
Gunavardhan Kakulapati, Avinash Lakshman, Alex
Pilchin, Swami Sivasubramanian, Peter Vosshall,
and Werner Vogels
Presented by Steve Schlosser, Big Data Reading
Group, October 1, 2007
2
What Dynamo is
  • Dynamo is a highly available distributed
    key-value storage system
  • put(), get() interface
  • Sacrifices consistency for availability
  • Provides storage for some of Amazon's key
    products (e.g., shopping carts, best seller
    lists, etc.)
  • Uses synthesis of well-known techniques to
    achieve scalability and availability
  • Consistent hashing, object versioning, conflict
    resolution, etc.

3
Scale
  • Amazon is busy during the holidays
  • Shopping cart: tens of millions of requests for 3
    million checkouts in a single day
  • Session state system: 100,000s of concurrently
    active sessions
  • Failure is common
  • Small but significant number of server and
    network failures at all times
  • Customers should be able to view and add items
    to their shopping cart even if disks are failing,
    network routes are flapping, or data centers are
    being destroyed by tornados.

4
Flexibility
  • Minimal need for manual administration
  • Nodes can be added or removed without manual
    partitioning or redistribution
  • Apps can control availability, consistency,
    cost-effectiveness, performance
  • Can developers know this up front?
  • Can it be changed over time?

5
Assumptions & requirements
  • Simple query model
  • Values are small (< 1 MB) binary objects
  • No ACID properties
  • Weaker consistency
  • No isolation guarantees
  • Single key updates
  • Stringent latency requirements
  • 99.9th percentile
  • Non-hostile environment

6
Service level agreements
  • SLAs are used widely at Amazon
  • Sub-services must meet strict SLAs
  • e.g., 300 ms response time for 99.9% of requests
    at peak load of 500 requests/s
  • Average-case SLAs are not good enough
  • Mentioned a cost-benefit analysis that said 99.9%
    is the right number
  • Rendering a single page can make requests to 150
    services
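
A small sketch of why an average-case SLA can hide the tail that a 99.9th-percentile SLA catches. The latency samples are made up for illustration, and `percentile` is a hypothetical nearest-rank helper, not anything from the paper.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]

# Hypothetical latencies in ms: 9,980 fast requests and 20 very slow ones.
latencies_ms = [20] * 9980 + [2000] * 20

mean_ms = sum(latencies_ms) / len(latencies_ms)
p999_ms = percentile(latencies_ms, 99.9)

# The mean sits far below a 300 ms target, while the 99.9th percentile misses it.
print(f"mean = {mean_ms:.2f} ms, p99.9 = {p999_ms} ms")
```

With these samples the mean looks healthy even though 1 request in 500 takes two seconds, which is the kind of case a percentile-based SLA is meant to expose.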

7
Consistency
  • Eventual consistency
  • Always writable
  • Can always write to shopping cart
  • Pushes conflict resolution to reads
  • Application-driven conflict resolution
  • e.g., merge conflicting shopping carts
  • Or Dynamo enforces last-writer-wins
  • How often does this work?
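
The two reconciliation options above can be sketched as follows. This is a minimal illustration, assuming a cart is a set of item names and each version carries a write timestamp; `merge_carts` and `last_writer_wins` are hypothetical names, not Dynamo APIs.

```python
def merge_carts(versions):
    """Application-driven reconciliation: take the union of all divergent carts."""
    merged = set()
    for cart in versions:
        merged |= cart
    return merged

def last_writer_wins(timestamped_versions):
    """Store-level fallback: keep only the version with the latest timestamp."""
    return max(timestamped_versions, key=lambda tv: tv[0])[1]

# Two hypothetical divergent versions of the same cart: (timestamp, items).
v1 = (100, {"book", "lamp"})
v2 = (105, {"book", "headphones"})

merged = merge_carts([v1[1], v2[1]])   # keeps every item from both versions
lww = last_writer_wins([v1, v2])       # silently drops "lamp"
print(sorted(merged), sorted(lww))
```

The union merge never loses an added item, at the cost that an item removed in only one branch reappears; last-writer-wins is simpler but discards the older branch entirely.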

8
Other stuff
  • Incremental scalability
  • Minimal management overhead
  • Symmetry
  • No master/slave nodes
  • Decentralized
  • Centralized control leads to too many failures
  • Heterogeneity
  • Exploit capabilities of different nodes

9
Interface
  • get(key) returns object replica(s) for key, plus
    a context object
  • context encodes metadata, opaque to caller
  • put(key, context, object) stores object
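
A toy, single-process sketch of how a caller might use this interface. `ToyStore` is hypothetical: here the opaque context is just the set of version ids the caller has seen, which is an assumption made for illustration and not how Dynamo encodes it.

```python
import itertools

class ToyStore:
    """In-memory stand-in for the get()/put() interface (not Dynamo's code)."""

    def __init__(self):
        self._versions = {}              # key -> {version_id: object}
        self._ids = itertools.count(1)

    def get(self, key):
        """Return (list of current versions, opaque context)."""
        current = self._versions.get(key, {})
        context = frozenset(current)     # ids of the versions the caller has seen
        return list(current.values()), context

    def put(self, key, context, obj):
        """Store obj, superseding only the versions named in context."""
        current = self._versions.setdefault(key, {})
        for version_id in context:
            current.pop(version_id, None)
        current[next(self._ids)] = obj

store = ToyStore()
_, ctx = store.get("cart:42")                  # empty context for a new key
store.put("cart:42", ctx, ["book"])
store.put("cart:42", ctx, ["lamp"])            # stale context: a second version appears
versions, ctx = store.get("cart:42")           # both versions are returned
store.put("cart:42", ctx, ["book", "lamp"])    # reconciled write replaces both
final_versions, _ = store.get("cart:42")
print(versions, final_versions)
```

The point of the sketch is only the calling pattern: the caller passes back the context from its last get(), so the store can tell which versions a write is meant to replace.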

10
Variant of consistent hashing
[Figure: ring of nodes A through G, with key K]
  • Each node is assigned to multiple points in the
    ring (e.g., B, C, D store key range (A, B))
  • # of points can be assigned based on node's
    capacity
  • If a node becomes unavailable, load is distributed
    to others
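
A minimal sketch of consistent hashing with multiple points per node, under stated assumptions: MD5 is used only as a convenient, well-distributed hash, the `node#i` labels for ring points are a hypothetical naming scheme, and the point counts stand in for node capacity.

```python
import bisect
import hashlib

def ring_position(label):
    """Hash a string to a 128-bit position on the ring (MD5 here)."""
    return int.from_bytes(hashlib.md5(label.encode("utf-8")).digest(), "big")

class Ring:
    def __init__(self, points_per_node):
        # points_per_node maps node name -> number of ring points (hypothetical capacities).
        self.points = sorted(
            (ring_position(f"{node}#{i}"), node)
            for node, count in points_per_node.items()
            for i in range(count)
        )
        self.positions = [position for position, _ in self.points]

    def owner(self, key):
        """Return the node at the first point clockwise from the key, wrapping at the end."""
        index = bisect.bisect_left(self.positions, ring_position(key))
        return self.points[index % len(self.points)][1]

full = Ring({"A": 8, "B": 8, "C": 16})    # C gets twice as many points as A or B
degraded = Ring({"A": 8, "C": 16})        # the same ring with B unavailable

keys = [f"key-{i}" for i in range(1000)]
moved = [k for k in keys if full.owner(k) != degraded.owner(k)]
# Only keys that B owned change hands; every other key keeps its owner.
print(len(moved), all(full.owner(k) == "B" for k in moved))
```

Because B's points are scattered around the ring, its keys are picked up by whichever node holds the next point clockwise rather than by one neighbor.
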
11
Replication
[Figure: ring of nodes A through G, with key K]
  • B is the coordinator for key K
  • B maintains a preference list for each data
    item, specifying the nodes storing that item
  • Preference list skips virtual nodes in favor of
    physical nodes
  • D stores (A, B], (B, C], (C, D]
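
One way to picture how a preference list skips repeated points of the same physical node is the sketch below. The ring positions are invented, `preference_list` is a hypothetical helper, and node B is given two adjacent points purely to show the skipping.

```python
import bisect

# Hypothetical ring of (position, physical node); B holds two adjacent points.
RING = [(10, "A"), (20, "B"), (30, "B"), (40, "C"),
        (50, "D"), (60, "E"), (70, "F"), (80, "G")]

def preference_list(ring, key_position, n):
    """Walk clockwise from the key and return the first n distinct physical nodes."""
    positions = [position for position, _ in ring]
    start = bisect.bisect_left(positions, key_position)
    nodes = []
    for step in range(len(ring)):
        node = ring[(start + step) % len(ring)][1]
        if node not in nodes:          # skip extra points of a node already chosen
            nodes.append(node)
        if len(nodes) == n:
            break
    return nodes

# A key at position 15 falls in range (A, B], so B comes first, then C and D.
print(preference_list(RING, 15, 3))    # ['B', 'C', 'D']
# A key past the last point wraps around to the start of the ring.
print(preference_list(RING, 85, 3))    # ['A', 'B', 'C']
```

The first entry of the returned list plays the role of the coordinator in this sketch.
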
12
Data versioning
  • put() can return before update is applied to all
    replicas
  • Subsequent get()s can return older versions
  • This is okay for shopping carts
  • Branched versions are collapsed
  • Deleted items can resurface
  • A vector clock is associated with each object
    version
  • Comparing vector clocks can determine whether two
    versions are parallel branches or causally
    ordered
  • Vector clocks passed by the context object in
    get()/put()
  • Application must maintain this metadata?
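
Comparing vector clocks can be sketched as below. A clock is modeled as a dict from node name to counter; the node names Sx, Sy, Sz and the helper names are illustrative assumptions.

```python
def increment(clock, node):
    """Return a copy of clock with node's counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated

def descends(a, b):
    """True if clock a has seen everything clock b has (a >= b for every node in b)."""
    return all(a.get(node, 0) >= count for node, count in b.items())

def compare(a, b):
    """Classify two versions as equal, causally ordered, or parallel branches."""
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a supersedes b"
    if descends(b, a):
        return "b supersedes a"
    return "concurrent"

def merge(a, b):
    """Element-wise maximum, used when a reconciled version is written back."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in a.keys() | b.keys()}

d1 = increment({}, "Sx")            # first write, handled by Sx
d2 = increment(d1, "Sx")            # second write, also via Sx
d3 = increment(d2, "Sy")            # one client updates d2 via Sy
d4 = increment(d2, "Sz")            # another client updates d2 via Sz
d5 = increment(merge(d3, d4), "Sx") # reconciled version written via Sx

print(compare(d2, d1), "|", compare(d3, d4), "|", compare(d5, d3))
```

Here d3 and d4 are parallel branches because neither clock dominates the other, while d5 dominates both and so replaces them.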

13
Vector clock example
14
Quorum-likeness
  • get() and put() are driven by two parameters
  • R: the minimum number of replicas to read
  • W: the minimum number of replicas to write
  • R + W > N yields a quorum-like system
  • Latency is dictated by the slowest of the R (or W)
    replicas
  • Sloppy quorum to tolerate failures
  • Replicas can be stored on healthy nodes
    downstream in the ring, with metadata specifying
    that the replica should be sent to the intended
    recipient later
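
The R + W > N condition can be checked by brute force, as in this sketch. It models a strict quorum over N fixed replicas only; a sloppy quorum, where writes may land on stand-in nodes, is deliberately not covered. `quorums_always_overlap` is a hypothetical helper name.

```python
from itertools import combinations

def quorums_always_overlap(n, r, w):
    """True if every possible read set of size r shares a replica with every write set of size w."""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )

for n, r, w in [(3, 2, 2), (3, 1, 2), (3, 1, 3), (3, 1, 1)]:
    print((n, r, w), "R + W > N:", r + w > n, "overlap:", quorums_always_overlap(n, r, w))
```

Whenever R + W > N, any read set must include at least one replica that took the latest write, which is what makes the system quorum-like; lowering R or W trades that guarantee for latency.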

15
Adding and removing nodes
  • Explicit commands issued via CLI or browser
  • Gossip-style protocol propagates changes among
    nodes
  • New node chooses virtual nodes in the hash space

16
Implementation
  • Persistent store: either Berkeley DB Transactional
    Data Store, BDB Java Edition, MySQL, or in-memory
    buffer w/ persistent backend
  • All in Java!
  • Common N, R, W setting is (3, 2, 2)
  • Results are from several hundred nodes configured
    as (3, 2, 2)
  • Not clear whether they run in a single datacenter

17
[Figure: one tick = 12 hours]
18
[Figure: one tick = 1 hour]
19
  • During periods of high load, popular objects
    dominate
  • During periods of low load, fewer popular objects
    are accessed
[Figure: one tick = 30 minutes]
20
Quantifying divergent versions
  • In a 24 hour trace
  • 99.94% of requests saw exactly one version
  • 0.00057% received 2 versions
  • 0.00047% received 3 versions
  • 0.00009% received 4 versions
  • Experience showed that divergence usually came
    from concurrent writers due to automated client
    programs (robots), not humans

21
Conclusions
  • Scalable
  • Easy to shovel in more capacity at Christmas
  • Simple
  • get()/put() maps well to Amazon's workload
  • Flexible
  • Apps can set N, R, W to match their needs
  • Inflexible
  • Apps have to set N, R, W to match their needs
  • Apps may have to do their own conflict resolution
  • They claim it's easy to set these; does this
    mean that there aren't many interesting points?
  • Interesting?