Title: The Single Node B-tree for Highly Concurrent Distributed Data Structures
1
The Single Node B-tree for Highly Concurrent
Distributed Data Structures
by Barbara Hohlt
2
Why a B-tree DDS?
  • To do range queries (the queries need NOT be degree-3 transaction protected)
  • Need only sequential scans for related indexed items (e.g., retrieve mail messages 3-50); see the sketch below
  • Performance impact is illustrated later
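
A toy sketch of the ordered range scan this buys, using a sorted list as a stand-in for a single-node B-tree (illustrative code, not the DDS API):

    import bisect

    # Toy stand-in for a single-node B-tree: a sorted list of (key,
    # value) pairs. A real B-tree gives the same ordered-scan property
    # on disk, which a hash-table DDS cannot offer.
    class ToyBtree:
        def __init__(self):
            self.items = []  # kept sorted by key

        def insert(self, key, value):
            bisect.insort(self.items, (key, value))

        def scan(self, lo, hi):
            # One descent to lo, then a sequential scan up to hi.
            start = bisect.bisect_left(self.items, (lo,))
            for key, value in self.items[start:]:
                if key > hi:
                    break
                yield key, value

    mailbox = ToyBtree()
    for n in range(1, 100):
        mailbox.insert(n, "message %d" % n)
    print([k for k, _ in mailbox.scan(3, 50)])  # messages 3..50, in order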

3
Prototype DDS Distributed B-tree
(Figure: clients, service front-ends, and B-tree bricks with per-brick storage, arranged as a cluster. The annotations read:)
  • Clients interact with any service front-end, as all persistent service state is in the DDS and is consistent throughout the entire cluster.
  • The service interacts with the DDS via a library; the library is the 2PC coordinator, handles partitions, replication, etc., and exports the B-tree API.
  • A brick is a durable single-node B-tree plus RPC skels for network access; a brick can be on the same node as the service.
  • Example of a distributed B-tree: a partition with 3 replicas in a group.
4
Architecture
The service interacts with the DDS via a library. The library is the 2PC coordinator; it handles partitioning, replication, etc., and exports the B-tree/HT API.
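
A minimal sketch of the library's 2PC-coordinator role for a write that must reach every replica of a partition. The brick methods (prepare/commit/abort) are illustrative names, not the real RPC skels:

    class ToyBrick:
        def __init__(self):
            self.store, self.pending = {}, {}
        def prepare(self, key, value):
            self.pending[key] = value    # a real brick logs this durably
            return True
        def commit(self, key):
            self.store[key] = self.pending.pop(key)
        def abort(self, key):
            self.pending.pop(key, None)

    class TwoPhaseCoordinator:
        def __init__(self, replicas):
            self.replicas = replicas     # brick stubs for one partition

        def write(self, key, value):
            # Phase 1: every replica must durably prepare the write.
            prepared = []
            for brick in self.replicas:
                if not brick.prepare(key, value):
                    for p in prepared:   # any refusal aborts the write
                        p.abort(key)
                    return False
                prepared.append(brick)
            # Phase 2: all replicas prepared, so the write commits.
            for brick in self.replicas:
                brick.commit(key)
            return True

    group = TwoPhaseCoordinator([ToyBrick() for _ in range(3)])
    print(group.write("key1", "value1"))  # True: committed on all 3 replicas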
5
(No Transcript)
6
Component Layers
The application layer makes search and insert requests to a btree instance. The btree determines which data blocks it needs and fetches them from the global buffer cache. If the cache does not have the needed blocks, it fetches them from the global I/O core; this fallback is transparent to the btree instance.
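
A minimal sketch of this layering, with illustrative class names (not the actual implementation): the btree asks the cache, and the cache falls back to the I/O core on a miss.

    class IOCore:
        """Bottom layer: reads blocks from disk (stubbed here)."""
        def read_block(self, block_id):
            return "<disk contents of block %s>" % block_id

    class BufferCache:
        """Middle layer: serves blocks, faulting in misses from the I/O core."""
        def __init__(self, io_core):
            self.io_core = io_core
            self.blocks = {}

        def get(self, block_id):
            if block_id not in self.blocks:       # miss: transparent to caller
                self.blocks[block_id] = self.io_core.read_block(block_id)
            return self.blocks[block_id]

    class BtreeInstance:
        """Top layer: decides which blocks a search needs, never touches disk."""
        def __init__(self, cache):
            self.cache = cache

        def search(self, path_of_block_ids):
            # The btree only names blocks; where they come from is the
            # cache's business.
            return [self.cache.get(b) for b in path_of_block_ids]

    btree = BtreeInstance(BufferCache(IOCore()))
    print(btree.search(["root", "inner-7", "leaf-42"]))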
7
(No Transcript)
8
API Flavor
  • SN_BtreeCloseRequest, SN_BtreeCloseComplete
  • SN_BtreeCreateRequest, SN_BtreeCreateComplete
  • SN_BtreeOpenRequest, SN_BtreeOpenComplete
  • SN_BtreeDestroyRequest, SN_BtreeDestroyComplete
  • SN_BtreeReadRequest, SN_BtreeReadComplete
  • SN_BtreeWriteRequest, SN_BtreeWriteComplete
  • SN_BtreeRemoveRequest, SN_BtreeRemoveComplete
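
The Request/Complete pairing suggests a split-phase, event-driven API: the caller issues a request and later receives a completion event. A minimal sketch of that pattern, with illustrative names (this is not the actual brick code):

    import queue

    events = queue.Queue()

    def sn_btree_read_request(tree, key):
        # Issue the request and return immediately; no thread blocks here.
        value = tree.get(key)                     # stand-in for the real work
        events.put(("SN_BtreeReadComplete", key, value))

    tree = {"k1": "v1"}
    sn_btree_read_request(tree, "k1")
    event, key, value = events.get()              # completion arrives later
    print(event, key, value)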

9
API Flavor, Contd.
  • Distributed_BtreeCreateRequest, Distributed_BtreeCreateComplete
  • Distributed_BtreeDestroyRequest, Distributed_BtreeDestroyComplete
  • Distributed_BtreeReadRequest, Distributed_BtreeReadComplete
  • Errors: timeout (even after retries), replica_dead, lockgrab_failed, doesnt_exist, etc.; see the sketch below
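
A sketch of how a caller might dispatch on those completion-time errors; the error codes follow the names above, but the handling logic is illustrative:

    # Which errors are worth retrying is an assumption made for this sketch.
    RETRYABLE = {"timeout", "lockgrab_failed"}

    def handle_complete(event, retries_left=3):
        err = event.get("error")
        if err is None:
            return event["value"]                 # success path
        if err in RETRYABLE and retries_left > 0:
            return handle_complete(resend(event), retries_left - 1)
        raise RuntimeError("distributed B-tree operation failed: %s" % err)

    def resend(event):
        # Stub: a real client would reissue the Request and wait for a
        # fresh Complete event.
        return {"error": None, "value": event.get("value")}

    print(handle_complete({"error": "timeout", "value": "v"}))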

10
Evaluation Metrics
  • Speedup: performance versus resources (data size fixed)
  • Scaleup: data size versus resources (performance fixed)
  • Sizeup: performance versus data size
  • Throughput: total number of reads/writes completed per second
  • Latency: time to satisfy a single request
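
In standard parallel-database terms (assumed here; the slides give no formulas), with T_N the elapsed time on N nodes:

    % T_N = elapsed time using N nodes; "job" is a fixed workload.
    \[
      \textrm{speedup}(N) = \frac{T_1(\textrm{job})}{T_N(\textrm{job})},
      \qquad
      \textrm{scaleup}(N) = \frac{T_1(\textrm{job})}{T_N(N \cdot \textrm{job})}
    \]
    % Linear speedup: speedup(N) = N (same data, N times the resources).
    % Linear scaleup: scaleup(N) = 1 (N nodes handle an N-times larger
    % job in the same elapsed time).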

11
Single Node B-tree Performance
12
Single Node B-tree Performance
13
FSM-based Data Scheduling
  • Scheduling is for:
  • Performance (including fairness and avoiding starvation)
  • Correctness/isolation
  • This functionality has traditionally resided in two different modules (the kernel schedules threads; the app/database schedules locks), and each module is optimized individually
  • Our claim: there can be significant performance wins from jointly optimizing both

14
How to Achieve Isolation?
  • Use threads and locks
  • Do careful scheduling (e.g., B-trees)
  • Unify all scheduling decisions
  • The problem: such globally optimal scheduling is hard
  • In restricted settings, this is similar to hardware scoreboarding techniques
  • A useful lesson for database concurrency:
  • You can choose the order of operations to avoid conflicts (add a prepare/prefetch phase) so that you never hold locks across blocking I/O (lesson: do not lock if you block)
  • This can be implemented more naturally with asynchronous FSMs than with straight-line threaded code; see the sketch below
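
A minimal sketch of the prepare/prefetch idea: the operation runs as a small FSM that prefetches every block it needs before taking any lock, so it never holds a lock across blocking I/O. The cache API is illustrative, not the brick's real interface:

    from enum import Enum, auto

    class State(Enum):
        PREFETCH = auto()    # issue non-blocking reads; hold no locks
        EXECUTE = auto()     # everything resident: lock, mutate, unlock
        DONE = auto()

    class ToyCache:
        def __init__(self):
            self.resident, self.inflight = set(), set()
        def prefetch(self, b):
            self.inflight.add(b)             # returns immediately
        def io_completes(self):              # driven by the event loop
            self.resident |= self.inflight
            self.inflight.clear()

    class InsertFSM:
        def __init__(self, cache, block_ids):
            self.cache, self.block_ids = cache, block_ids
            self.state = State.PREFETCH

        def step(self):                      # re-run on each I/O event
            if self.state is State.PREFETCH:
                missing = [b for b in self.block_ids
                           if b not in self.cache.resident]
                for b in missing:
                    self.cache.prefetch(b)   # non-blocking; no locks held yet
                if not missing:
                    self.state = State.EXECUTE
            if self.state is State.EXECUTE:
                # All blocks are in memory, so the critical section cannot
                # stall on I/O; locks would be held only across pure
                # computation here.
                self.state = State.DONE

    fsm = InsertFSM(ToyCache(), ["root", "leaf-9"])
    fsm.step()                 # issues prefetches, takes no locks
    fsm.cache.io_completes()   # event loop delivers the I/O completions
    fsm.step()
    print(fsm.state)           # State.DONE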

15
Benefits of Using FSMs/Events for Concurrency Control
  • Control-flow based concurrency control, as
    opposed to lock-based concurrency control
  • Can avoid wrong scheduling decisions
  • Unnecessary locks can be eliminated
  • Locks can be released faster
  • More flexibility for concurrency-control based on
    isolation requirements
  • Explicit concurrency-control also avoids
    deadlocks, priority inversions, race conditions,
    and convoy formations

16
Benefits of Using FSMs/Queues for Concurrency Control
  • Control-flow based concurrency control using FSMs
    and queues, as opposed to lock-based concurrency
    control
  • Can avoid wrong scheduling decisions
  • Unnecessary locks can be eliminated
  • Locks can be released faster
  • More flexibility for concurrency-control based on
    isolation requirements
  • Explicit scheduling also avoids deadlocks,
    priority inversions, race conditions, and convoy
    formations

17
The Convoy Problem Illustrated
  • Most tasks execute lock-coupling code like: lock(b); read(b); lock(b->next); unlock(b)
  • The problem: if task T1 blocks on I/O for b4, then task T2 cannot unlock b3 to acquire a lock on b4, task T3 cannot unlock b2 to acquire a lock on b3, and so on, forming a convoy, even though most blocks are in cache and each task may require only a finite number of locks (see the sketch after the figure)

(Figure: a chain of blocks b1-b4. b4 is locked by T1, which is blocked on I/O; b3 is locked by T2, waiting for the lock on b4; b2 is locked by T3, waiting for the lock on b3; b1 is locked by T4, waiting for the lock on b2.)
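
For concreteness, a toy sketch of the hand-over-hand (lock-coupling) traversal the first bullet describes; if read() can block on I/O while a lock is held, the convoy above follows (illustrative code, not the brick's):

    import threading

    class Block:
        def __init__(self, name, nxt=None):
            self.name, self.next = name, nxt
            self.lock = threading.Lock()

    def read(block):
        pass  # may block on I/O here -- while block.lock is held!

    def traverse(b):
        b.lock.acquire()
        while b.next is not None:
            read(b)                   # danger: I/O under a held lock
            b.next.lock.acquire()     # lock(b->next)
            b.lock.release()          # unlock(b)
            b = b.next
        read(b)
        b.lock.release()

    b4 = Block("b4"); b3 = Block("b3", b4)
    b2 = Block("b2", b3); b1 = Block("b1", b2)
    traverse(b1)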
18
Scheduling Based on Data Availability
  • Two transactions, T1 and T2, request blocks b1, b2 and b1, b3 respectively, and T1 acquires the lock on b1 first
  • The problem: if T1 acquires a lock on b2 and blocks, T2 cannot make progress, even though the blocks T2 needs, b1 and b3, are available
  • Lesson: schedule depending on how data becomes available, not on how requests enter the system

19
Scheduling Based on Data Availability (Example of Misordering)
  • Transferring funds from checking to savings:
  • Begin(transaction)
  • 1. read(checking_account)
  • 2. read(savings_account)
  • 3. read(teller) // in cache
  • 4. read(bank) // in cache
  • 5. update(savings_account)
  • 6. update(checking_account)
  • 7. update(teller)
  • 8. update(bank)
  • End(transaction)

If steps 3 and 4 were swapped with steps 1 and 2, we would block on I/O while holding locks on the bank and teller balances. In a global scheduling model, the ordering of the reads does not matter, because a request does not start execution unless all the required data on its most probable execution path is available (see the sketch below).
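
A minimal sketch of that global admission rule, using the transfer example above; the admission check is illustrative, not the talk's actual scheduler:

    # Admit a request only when every block on its most probable
    # execution path is already resident, so it never blocks on I/O
    # (and never stalls others) once it starts executing.

    resident = {"teller", "bank"}               # blocks currently cached

    def admit(request):
        """Run the request only if all its likely blocks are available."""
        return request["path"] <= resident      # set inclusion

    transfer = {"name": "transfer",
                "path": {"checking_account", "savings_account",
                         "teller", "bank"}}

    if not admit(transfer):
        # checking/savings are not cached yet: prefetch them and requeue
        # the request instead of starting it and blocking mid-transaction.
        missing = transfer["path"] - resident
        print("prefetching", missing)
        resident |= missing                     # stand-in for I/O completion

    print("admitted:", admit(transfer))         # now safe to execute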
20
Distributed Synchronization
  • Conventional lock-based implementations serialize on the lock manager code. In the example above, T1 serializes against T3, although T1 and T3 should ideally execute concurrently. Distributed synchronization on distinct queues is possible with FSMs running on multiprocessors, without requiring a static data partition; see the sketch below
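
A minimal sketch of that idea: operations are routed to one of several event queues, and a worker per queue runs its FSMs serially, so two operations synchronize only if they land on the same queue. The routing hash is a toy, not a static partition of the data:

    import queue, threading

    NQUEUES = 4
    queues = [queue.Queue() for _ in range(NQUEUES)]

    def submit(key, op):
        queues[hash(key) % NQUEUES].put((key, op))  # route; no global lock

    def worker(q):
        while True:
            key, op = q.get()
            if op is None:
                return
            op(key)                     # serial within this queue only

    threads = [threading.Thread(target=worker, args=(q,)) for q in queues]
    for t in threads:
        t.start()
    submit("b1", lambda k: print("T1 touches", k))
    submit("b3", lambda k: print("T3 touches", k))  # may run concurrently
    for q in queues:
        q.put((None, None))             # shut the workers down
    for t in threads:
        t.join()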

21
Single Node B-tree Brick
22
FSM for Non-blocking Fetch
23
Splitting node a into nodes a and b
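
The figure for this slide is not in the transcript; as a stand-in, here is a generic B-tree leaf split, dividing an overfull node a into a and a new right sibling b (textbook logic, not the talk's exact code):

    # Generic B-tree leaf split: the overfull node `a` keeps its lower
    # half, a new sibling `b` takes the upper half, and the separator
    # key is posted into the parent.

    def split_leaf(a):
        mid = len(a) // 2
        b = a[mid:]          # new right sibling gets the upper half
        del a[mid:]          # `a` keeps the lower half
        separator = b[0][0]  # first key of `b` goes up to the parent
        return separator, b

    a = [(1, "x"), (3, "y"), (5, "z"), (7, "w")]
    sep, b = split_leaf(a)
    print(a, sep, b)   # [(1,'x'),(3,'y')]  5  [(5,'z'),(7,'w')]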
24
A Single Node B-tree
25
(No Transcript)