1
DStore: An Easy-to-Manage Persistent State Store
Andy Huang and Armando Fox, Stanford University
2
Outline
  • Project overview
  • Consistency guarantees
  • Failure detection
  • Benchmarks
  • Next steps and bigger picture

3
Background: Scalable CHTs
[Diagram: frontends, app servers, DBs]
Cluster hash tables (CHTs)
  • Single-key-lookup data
    • Yahoo! user profiles
    • Amazon catalog metadata
  • Underlying storage layer
    • Inktomi: wordID → docID list, docID → document metadata
    • DDS/Ninja: atomic compare-and-swap

4
DStore: An easy-to-manage CHT

C H A L L E N G E S
  • Capacity planning
    • High scaling costs necessitate accurate load prediction
  • Failure detection
    • Fast detection is at odds with accurate detection
  • Cheap recovery
    • Predictably fast and predictably small impact on availability/performance

B E N E F I T S
  • Our online repartitioning algorithm lowers scaling cost
  • Reactive scaling adjusts capacity to match current load
  • Lowers the cost of acting on false positives
  • Effective failure detection is not contingent on accuracy

Manage like stateless frontends
5
Cheap recovery: Principles and costs

T E C H N I Q U E S
  • Single-phase writes
    • No locking and no transactional logging
  • Quorums
    • No recovery code to freeze writes and copy missed updates

C O S T S
  • Sacrifice some consistency: well-defined guarantees that provide consistent ordering
  • Higher replication factor: 2N+1 bricks to tolerate N failures (vs. N+1 in ROWA)

Trade storage and consistency for cheap recovery
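The single-phase quorum write above can be sketched in a few lines. This is a minimal illustration, not DStore's actual code: in-memory dicts stand in for bricks, and the `QuorumStore` name and timestamp scheme are assumptions. A write succeeds once a majority of the 2N+1 bricks acknowledge it, with no prepare phase and no log.

```python
import time

class QuorumStore:
    """Sketch of single-phase quorum writes over 2N+1 bricks."""

    def __init__(self, bricks):
        self.bricks = bricks                    # dicts standing in for bricks
        self.majority = len(bricks) // 2 + 1    # N+1 of 2N+1

    def write(self, key, value):
        stamped = (time.time(), value)          # timestamp orders writes
        acks = 0
        for brick in self.bricks:
            try:
                brick[key] = stamped            # single phase: no prepare round
                acks += 1
            except Exception:
                pass                            # a failed brick is simply skipped
        return acks >= self.majority            # SUCCESS only with a quorum

    def read(self, key):
        replies = [b[key] for b in self.bricks if key in b]
        if len(replies) < self.majority:
            raise RuntimeError("no quorum")
        return max(replies)[1]                  # newest timestamp wins
```

Because both reads and writes touch a majority, any read quorum intersects the last successful write quorum, so the newest timestamp is always visible.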
6
Nothing new under the sun, but...

Technique | Prior work | DStore
CHT | Scalable performance | Ease of management
Quorums | Availability during network partitions and Byzantine faults | Availability during failures and recovery
Relaxed consistency | Availability and performance while nodes are unavailable | Availability during failures and recovery
Result | High availability and performance (end goal) | Cheap recovery (but that's just the start)
7
Cheap recovery simplifies state management

Challenge | Prior work | DStore
Failure detection | Difficult to make fast and accurate | Effective even if it is not highly accurate
Online repartitioning | Relatively new area (Aqueduct) | Duration and impact is predictably small
Capacity planning | Predict future load | Scale reactively based on current load
Data reconstruction | RAID | Future work
Result | State management is costly (administration- and availability-wise) | Manage state with techniques used for stateless frontends
8
Outline
  • Project overview
  • Consistency guarantees
  • Failure detection
  • Benchmarks
  • Next steps and bigger picture

9
Consistency guarantees
  • Usage model
  • Guarantee: For a key k, DStore enforces a global order of operations that is consistent with the order seen by individual clients.
  • C1 issues w1(k, vnew) to replace current hash table entry (k, vold)
    • w1 returns SUCCESS: subsequent reads return vnew
    • w1 returns FAIL: subsequent reads return vold
    • w1 returns UNKNOWN (due to Dlib failure): two cases

10
Case 1: Another user U2 performs a read
[Diagram: users U1, U2 issuing requests to bricks B1–B3 holding (k1, vold)]
U2 r(k1): if vold is returned, no user has read vnew; if vnew is returned, no user will later read vold
A Dlib failure can cause a partial write, violating the quorum property
If timestamps differ, read-repair restores the majority invariant
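Read-repair as described here might look like the following minimal sketch (bricks are modeled as dicts mapping key → (timestamp, value); the function name and brick representation are illustrative assumptions):

```python
def read_repair(bricks, key, majority):
    """Restore the majority invariant for `key` on a read.

    If a Dlib failure left the new value on fewer than a majority of
    bricks, the first subsequent read that sees it writes it back to the
    stale bricks before returning, so later reads cannot go backward.
    """
    replies = [b[key] for b in bricks if key in b]
    if len(replies) < majority:
        raise RuntimeError("no quorum")
    newest = max(replies)                       # (timestamp, value), newest wins
    for brick in bricks:                        # repair any brick holding an
        if brick.get(key) != newest:            # older (or missing) version
            brick[key] = newest
    return newest[1]
```

This spreads the cost of the missing second phase of 2PC across ordinary reads, and only on the reads that actually observe divergent timestamps.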
11
Case 2: U1 performs a read
[Diagram: users U1, U2 issuing requests to bricks B1–B3 holding (k1, vold)]
U1 r(k1): the write is immediately committed or aborted; all future readers see either vold or vnew
A write-in-progress cookie can be used to detect partial writes and commit/abort on the next read
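A write-in-progress cookie along these lines could be sketched as follows. The `(timestamp, value, in_progress)` record layout and the commit/abort rule shown are assumptions for illustration, not DStore's actual wire format:

```python
def read_with_cookie(bricks, key, majority):
    """Resolve a partial write on the next read via its cookie.

    Bricks map key -> (timestamp, value, in_progress). If the newest
    version still carries the in-progress cookie, the reader commits it
    when a majority already holds it, and otherwise aborts back to the
    prior value; either way the decision is made durable immediately.
    """
    replies = [b[key] for b in bricks if key in b]
    if len(replies) < majority:
        raise RuntimeError("no quorum")
    newest = max(replies)
    ts, value, in_progress = newest
    if in_progress:
        holders = sum(1 for r in replies if r == newest)
        if holders >= majority:
            chosen = (ts, value, False)         # commit: quorum already reached
        else:
            chosen = max(r for r in replies if r != newest)  # abort: fall back
        for b in bricks:
            b[key] = chosen                     # make the decision durable
        return chosen[1]
    return value
```

This spreads the responsibility of 2PC's second phase onto readers, which is exactly the trade listed in the comparison table that follows.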
12
Consistency guarantees
  • C1 issues w1(k, vnew) to replace current hash table entry (k, vold)
    • w1 returns SUCCESS: subsequent reads return vnew
    • w1 returns FAIL: subsequent reads return vold
    • w1 returns UNKNOWN (due to Dlib failure):
      • U1 reads: w1 is immediately committed or aborted
      • U2 reads: if vold is returned, no user has read vnew; if vnew is returned, no user will later read vold
13
Versus sequential consistency
[Diagram: users U1, U2 and bricks B1–B3 holding (k1, vold) and (k2, vold); U1 issues w1(k1, vnew)]
Conditions: atomicity and consistent ordering
UNKNOWN causes non-atomic writes
14
Two-phase commit vs. single-phase writes

Property | 2-phase commit | Single-phase writes
Consistency | Sequential consistency | Consistent ordering
Recovery | Read log to complete in-progress transactions | No special-case recovery
Availability | Locking may cause requests to block during failures | No locking
Performance | 2 synchronous log writes, 2 roundtrips | 1 synchronous update, 1 roundtrip
Other costs | None | Read-repair (spreads out the cost of 2PC to make the common case faster); write-in-progress cookie (spreads out the responsibility of 2PC)
15
Recovery behavior
Predictably fast and small impact
16
Application-generic failure detection
Failure detection techniques:
  • Operating statistics (CPU load, requests processed, etc.)
  • Anomalies
[Diagram: beacon listener compares bricks' median absolute deviation against a threshold and reboots anomalous bricks]
Simple detection techniques work because the resolution mechanism is cheap
17
Failure detection and repartitioning behavior
Aggressive failure detection
Low scaling cost
Low cost of acting on false positives
18
Bigger picture: What is self-managing?
Indicator: brick performance, a sign of system health
Monitoring: tests for potential problems
Treatment: low-impact resolution mechanism
19
Bigger picture: What is self-managing?
20
Bigger picture: What is self-managing?
  • Brick performance
  • System load
  • Disk failures
Simple detection mechanisms and policies
Key: low-cost mechanisms
Constant recovery
24
Bigger picture
25
Big picture
  • Use simple metrics to trigger scaling
    • Brick load
    • Cache hit rate
  • Online data reconstruction

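A reactive-scaling trigger on such simple metrics might look like the sketch below. The `should_scale` name and all thresholds are illustrative assumptions, not DStore's actual policy; the point is that cheap online repartitioning makes it safe to act on rough signals:

```python
def should_scale(brick_loads, cache_hit_rate, load_limit=0.8, hit_floor=0.9):
    """Decide a scaling action from per-brick load and cache hit rate."""
    if max(brick_loads) > load_limit:
        return "add brick"                      # a brick is saturated
    if cache_hit_rate < hit_floor:
        return "add brick"                      # working set outgrew memory
    if max(brick_loads) < load_limit / 4 and len(brick_loads) > 1:
        return "remove brick"                   # lots of headroom: scale in
    return "steady"
```

Because a wrong "add brick" decision only costs one cheap repartitioning, the policy does not need load prediction, only current measurements.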
26
Simple, aggressive failure detection
  • Bricks send operating statistics
    • CPU load, average queue delay, number of requests processed, etc.
  • Statistical methods
    • Median absolute deviation: compares one brick's behavior with the current behavior of the rest of the bricks
    • Tarzan: incorporates past behavior of each brick and detects anomalies in the operating statistics' patterns
  • Why these techniques are effective
    • Not the best failure detection mechanisms
    • Parameters are not highly tuned
    • Simple, application-generic techniques work because of the low cost of acting on false positives
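The median-absolute-deviation check can be sketched in a few lines (the threshold of 3 deviations is an illustrative assumption; as the slide notes, the exact tuning matters little because acting on a false positive is cheap):

```python
def mad_outliers(stats, threshold=3.0):
    """Flag bricks whose statistic deviates from the cluster median
    by more than `threshold` median absolute deviations (MADs)."""
    def median(xs):
        s = sorted(xs)
        n = len(s)
        return (s[n // 2] + s[(n - 1) // 2]) / 2

    m = median(stats)                           # cluster-wide typical value
    mad = median([abs(x - m) for x in stats])   # typical deviation from it
    if mad == 0:
        mad = 1e-9                              # all bricks identical: avoid /0
    return [i for i, x in enumerate(stats) if abs(x - m) / mad > threshold]
```

Each flagged index names a brick to reboot; unlike the mean and standard deviation, the median and MAD are not dragged toward the outlier itself.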