Title: The Single Node B-tree for Highly Concurrent Distributed Data Structures
1 The Single Node B-tree for Highly Concurrent Distributed Data Structures
by Barbara Hohlt
2 Why a B-tree DDS?
- To do range queries (the queries need NOT be degree-3 transaction protected)
- Need only sequential scans for related indexed items (retrieve mail messages 3-50, etc.)
- Performance impact illustrated later
3 Prototype DDS Distributed B-tree
[Figure: example of a distributed B-tree. Clients interact with any service front-end, as all persistent service state is in the DDS and is consistent throughout the entire cluster. The service interacts with the DDS via a library; the library is the 2PC coordinator, handles partitions, replication, etc., and exports the B-tree API. A brick is a durable single-node B-tree plus RPC skels for network access; a brick can be on the same node as the service. Each brick has its own storage, and each partition has 3 replicas in its group.]
4 Architecture
The service interacts with the DDS via a library; the library is the 2PC coordinator, handles partitioning, replication, etc., and exports the B-tree/HT API.
6 Component Layers
The application layer makes search and insert requests to a B-tree instance. The B-tree determines which data blocks it needs and fetches them from the global buffer cache. If the cache does not have the needed blocks, it fetches them from the global I/O core, transparently to the B-tree instance.
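A minimal sketch of this layering, with assumed names (cache_get, io_core_read, btree_search are illustrative, not the actual DDS interfaces): the B-tree asks the cache, and the cache falls through to the I/O core on a miss.

#include <stdio.h>
#include <string.h>

#define BLOCK_SIZE 4096
#define CACHE_SLOTS 8

typedef struct {
    long blockno;                /* -1 means the slot is empty */
    char data[BLOCK_SIZE];
} cache_slot_t;

static cache_slot_t cache[CACHE_SLOTS];

/* Global I/O core: the B-tree never calls this directly. */
static void io_core_read(long blockno, char *buf) {
    printf("io core: fetching block %ld from disk\n", blockno);
    memset(buf, 0, BLOCK_SIZE);  /* stand-in for a real disk read */
}

/* Global buffer cache: returns the block, fetching it from the
 * I/O core on a miss -- transparently to the B-tree instance. */
static char *cache_get(long blockno) {
    cache_slot_t *slot = &cache[blockno % CACHE_SLOTS];
    if (slot->blockno != blockno) {           /* miss */
        io_core_read(blockno, slot->data);
        slot->blockno = blockno;
    }
    return slot->data;
}

/* B-tree layer: decides which blocks it needs and asks the cache. */
static void btree_search(long root_blockno) {
    char *node = cache_get(root_blockno);
    (void)node;  /* a real tree would descend through child blocks */
}

int main(void) {
    for (int i = 0; i < CACHE_SLOTS; i++) cache[i].blockno = -1;
    btree_search(7);   /* miss: the cache goes to the I/O core */
    btree_search(7);   /* hit: served from the cache */
    return 0;
}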
8 API Flavor
- SN_BtreeCloseRequest, SN_BtreeCloseComplete
- SN_BtreeCreateRequest, SN_BtreeCreateComplete
- SN_BtreeOpenRequest, SN_BtreeOpenComplete
- SN_BtreeDestroyRequest, SN_BtreeDestroyComplete
- SN_BtreeReadRequest, SN_BtreeReadComplete
- SN_BtreeWriteRequest, SN_BtreeWriteComplete
- SN_BtreeRemoveRequest, SN_BtreeRemoveComplete
9 API Flavor, Cont'd.
- Distributed_BtreeCreateRequest, Distributed_BtreeCreateComplete
- Distributed_BtreeDestroyRequest, Distributed_BtreeDestroyComplete
- Distributed_BtreeReadRequest, Distributed_BtreeReadComplete
- Errors: timeout (even after retries), replica_dead, lockgrab_failed, doesn't exist, etc.
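The slides give only the message names; the sketch below assumes a split-phase calling convention to match them (the struct layout, status codes, and sn_btree_read are hypothetical, not the real interfaces):

#include <stdio.h>

typedef enum { BT_OK, BT_TIMEOUT, BT_REPLICA_DEAD,
               BT_LOCKGRAB_FAILED, BT_DOESNT_EXIST } bt_status_t;

/* Hypothetical request record: the caller supplies a completion
 * upcall, matching the Request/Complete pairing in the API names. */
typedef struct {
    long key;
    void (*complete)(bt_status_t status, const char *value);
} SN_BtreeReadRequest_t;

static void on_read_complete(bt_status_t status, const char *value) {
    if (status != BT_OK) {
        printf("read failed with status %d\n", status);  /* retry or report */
        return;
    }
    printf("read ok: %s\n", value);
}

/* A real brick would enqueue the request and upcall the completion
 * later; here we complete synchronously for illustration. */
static void sn_btree_read(SN_BtreeReadRequest_t *req) {
    req->complete(BT_OK, "value-for-key");
}

int main(void) {
    SN_BtreeReadRequest_t req = { 42, on_read_complete };
    sn_btree_read(&req);   /* issue request; completion is an upcall */
    return 0;
}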
10 Evaluation Metrics
- Speedup: performance versus resources (data size fixed)
- Scaleup: data size versus resources (performance fixed)
- Sizeup: performance versus data size
- Throughput: total number of reads/writes completed per second
- Latency: time to satisfy a single request
11 Single Node B-tree Performance
12 Single Node B-tree Performance
13 FSM-based Data Scheduling
- Scheduling is for:
  - Performance (including fairness and avoiding starvation)
  - Correctness/isolation
- This functionality has traditionally resided in two different modules (the kernel schedules threads, the app/database schedules locks), and each module is optimized individually
- Our claim: there can be significant performance wins from jointly optimizing both
14 How to Achieve Isolation?
- Use threads and locks
- Do careful scheduling (e.g. B-trees)
- Unify all scheduling decisions
  - Problem: such globally optimal scheduling is hard
  - In restricted settings, this is similar to hardware scoreboarding techniques
- A useful lesson for database concurrency: choose the order of operations to avoid conflicts (have a prepare/prefetch phase) so that you never lock across blocking I/O (Lesson: do not lock if you block); see the sketch after this list
- This can be implemented more naturally with asynchronous FSMs than with straight-line threaded code
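A minimal sketch of the prepare/prefetch lesson, with assumed names: every block a request will touch is fetched before any lock is taken, so no lock is ever held across blocking I/O.

#include <stdbool.h>
#include <stdio.h>

#define NBLOCKS 4

static bool in_cache[NBLOCKS];

static void io_fetch(int b) { in_cache[b] = true; }  /* may block */

/* Phase 1: prepare/prefetch -- no locks held, so blocking is harmless. */
static void prefetch_phase(const int *blocks, int n) {
    for (int i = 0; i < n; i++)
        if (!in_cache[blocks[i]])
            io_fetch(blocks[i]);
}

/* Phase 2: execute -- all blocks are resident, so locks are held
 * only across short, non-blocking critical sections. */
static void execute_phase(const int *blocks, int n) {
    for (int i = 0; i < n; i++)
        printf("lock, read, unlock block %d without blocking\n", blocks[i]);
}

int main(void) {
    int needed[] = { 0, 2, 3 };
    prefetch_phase(needed, 3);
    execute_phase(needed, 3);
    return 0;
}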
15 Benefits of Using FSMs/Events for Concurrency Control
- Control-flow based concurrency control, as opposed to lock-based concurrency control
- Can avoid wrong scheduling decisions
- Unnecessary locks can be eliminated
- Locks can be released faster
- More flexibility for concurrency control based on isolation requirements
- Explicit concurrency control also avoids deadlocks, priority inversions, race conditions, and convoy formations
16 Benefits of Using FSMs/Queues for Concurrency Control
- Control-flow based concurrency control using FSMs and queues, as opposed to lock-based concurrency control
- Can avoid wrong scheduling decisions
- Unnecessary locks can be eliminated
- Locks can be released faster
- More flexibility for concurrency control based on isolation requirements
- Explicit scheduling also avoids deadlocks, priority inversions, race conditions, and convoy formations; a queue-based sketch follows
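A sketch of the queue-based alternative, under an assumed single-dispatcher design: each block has a FIFO of pending operations, so two tasks touching the same block are isolated by queue order rather than by locks.

#include <stdio.h>

#define QLEN 8

typedef void (*op_fn)(int task);

typedef struct {           /* per-block FIFO of pending operations */
    op_fn ops[QLEN];
    int tasks[QLEN];
    int head, tail;
} block_queue_t;

static void enqueue(block_queue_t *q, op_fn f, int task) {
    q->ops[q->tail % QLEN] = f;     /* no overflow check in this sketch */
    q->tasks[q->tail % QLEN] = task;
    q->tail++;
}

/* The dispatcher drains each queue in order; two tasks touching
 * the same block are serialized by queue position, never by locks. */
static void dispatch(block_queue_t *q) {
    while (q->head < q->tail) {
        q->ops[q->head % QLEN](q->tasks[q->head % QLEN]);
        q->head++;
    }
}

static void read_op(int task)  { printf("T%d reads the block\n", task); }
static void write_op(int task) { printf("T%d writes the block\n", task); }

int main(void) {
    block_queue_t q = { .head = 0, .tail = 0 };
    enqueue(&q, read_op, 1);    /* T1 */
    enqueue(&q, write_op, 2);   /* T2 runs strictly after T1 */
    dispatch(&q);
    return 0;
}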
17 The Convoy Problem Illustrated
- Most tasks execute code like: lock(b); read(b); lock(b->next); unlock(b)
- Problem: if task T1 blocks on I/O for b4, then task T2 cannot unlock b3 to acquire a lock on b4, task T3 cannot unlock b2 to acquire a lock on b3, and so on, forming a convoy even though most blocks are in cache and each task may require only a finite number of locks
[Figure: blocks b1-b4 in a chain. b4 is locked by T1, which is blocked on I/O; b3 is locked by T2, waiting for the lock on b4; b2 is locked by T3, waiting for the lock on b3; b1 is locked by T4, waiting for the lock on b2.]
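The same chain, simulated in a short sketch (names assumed): one I/O stall at the head of the lock-coupling chain leaves every other task holding one lock and waiting for the next.

#include <stdio.h>

#define NBLOCKS 4

typedef struct {
    int held_by;       /* task holding the lock, or 0 */
    int waited_on_by;  /* task waiting for it, or 0 */
} block_t;

static block_t b[NBLOCKS + 1];   /* b[1]..b[4] */

int main(void) {
    /* T1 holds b4 and is blocked on I/O; every later task holds
     * one block while waiting for the next, as in the figure. */
    b[4].held_by = 1;                          /* T1: blocked on I/O */
    b[3].held_by = 2;  b[4].waited_on_by = 2;  /* T2 waits for b4 */
    b[2].held_by = 3;  b[3].waited_on_by = 3;  /* T3 waits for b3 */
    b[1].held_by = 4;  b[2].waited_on_by = 4;  /* T4 waits for b2 */

    for (int i = 4; i >= 1; i--) {
        printf("b%d: held by T%d", i, b[i].held_by);
        if (b[i].waited_on_by)
            printf(", T%d waiting for it", b[i].waited_on_by);
        printf("\n");
    }
    /* The whole chain stalls behind T1's single I/O even though
     * b1..b3 are in cache: that is the convoy. */
    return 0;
}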
18 Scheduling Based on Data Availability
- Two transactions T1 and T2 request blocks b1, b2 and b1, b3 respectively, and T1 acquires the lock on b1 first
- Problem: if T1 acquires a lock on b2 and blocks, T2 cannot make progress, even though T2 can access both b1 and b3
- Lesson: schedule depending on how data is available, not on how requests enter the system
19 Scheduling Based on Data Availability (Example of Misordering)
- Transferring funds from checking to savings:
- Begin(transaction)
- 1 read(checking_account)
- 2 read(savings_account)
- 3 read(teller) // in cache
- 4 read(bank) // in cache
- 5 update(savings_account)
- 6 update(checking_account)
- 7 update(teller)
- 8 update(bank)
- End(transaction)
If steps 3 and 4 were swapped with 1 and 2, we would block while holding locks on the bank and teller balances. In a global scheduling model, the ordering of reads does not matter, because a request does not start execution unless all the required data on its most probable execution path is available. A sketch of this policy follows.
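A sketch of the data-availability policy (block numbering and names assumed): the transfer request is dispatched only once every block on its probable execution path is resident, so it never blocks while holding locks.

#include <stdbool.h>
#include <stdio.h>

#define NBLOCKS 4

/* Blocks: 0 = checking, 1 = savings, 2 = teller, 3 = bank.
 * Teller and bank start out in cache, as in the example above. */
static bool in_cache[NBLOCKS] = { false, false, true, true };

typedef struct {
    const char *name;
    int blocks[NBLOCKS];
    int nblocks;
} request_t;

/* A request is ready only when every block it needs is resident. */
static bool ready(const request_t *r) {
    for (int i = 0; i < r->nblocks; i++)
        if (!in_cache[r->blocks[i]]) return false;
    return true;
}

int main(void) {
    request_t xfer = { "transfer", { 0, 1, 2, 3 }, 4 };
    if (!ready(&xfer)) {
        printf("%s: not ready; prefetch misses, hold no locks\n", xfer.name);
        in_cache[0] = in_cache[1] = true;   /* async I/O completes */
    }
    if (ready(&xfer))
        printf("%s: all blocks resident; execute without blocking\n",
               xfer.name);
    return 0;
}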
20 Distributed Synchronization
- Conventional lock-based implementations serialize in the lock manager code. In the example above, T1 serializes against T3, although T1 and T3 should ideally execute concurrently. Distributed synchronization on distinct queues is possible with FSMs running on multiprocessors, without requiring a static data partition; a sketch follows.
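A sketch of this idea under an assumed two-queue design: operations on unrelated blocks land on distinct queues, which separate processors drain with no shared lock manager between them.

#include <pthread.h>
#include <stdio.h>

typedef struct {
    const char *name;
    int nops;
} queue_t;

/* Each worker drains one queue; T1's queue never serializes
 * against T3's, because they share no state. */
static void *drain(void *arg) {
    queue_t *q = (queue_t *)arg;
    for (int i = 0; i < q->nops; i++)
        printf("%s: op %d executed in FIFO order\n", q->name, i);
    return NULL;
}

int main(void) {
    queue_t q1 = { "queue-b1", 3 };  /* e.g. T1's blocks */
    queue_t q2 = { "queue-b3", 3 };  /* e.g. T3's blocks */
    pthread_t w1, w2;
    pthread_create(&w1, NULL, drain, &q1);
    pthread_create(&w2, NULL, drain, &q2);
    pthread_join(w1, NULL);
    pthread_join(w2, NULL);
    return 0;
}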
21 Single Node B-tree Brick
22 FSM for Non-blocking Fetch
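The FSM diagram itself is not in the transcript; the sketch below is an assumed reconstruction of a non-blocking fetch machine: the fetch advances on events (hit, miss, I/O completion) and never blocks its caller.

#include <stdbool.h>
#include <stdio.h>

typedef enum { ST_START, ST_CHECK_CACHE, ST_WAIT_IO, ST_DONE } state_t;
typedef enum { EV_RUN, EV_MISS, EV_HIT, EV_IO_COMPLETE } event_t;

typedef struct {
    state_t state;
    long blockno;
} fetch_fsm_t;

/* Advance the FSM by one event; returns true when the fetch is
 * complete. A scheduler would call this from an event loop. */
static bool fetch_step(fetch_fsm_t *f, event_t ev) {
    switch (f->state) {
    case ST_START:
        if (ev == EV_RUN) f->state = ST_CHECK_CACHE;
        break;
    case ST_CHECK_CACHE:
        if (ev == EV_HIT) f->state = ST_DONE;           /* no I/O needed */
        else if (ev == EV_MISS) f->state = ST_WAIT_IO;  /* issue async I/O */
        break;
    case ST_WAIT_IO:
        if (ev == EV_IO_COMPLETE) f->state = ST_DONE;
        break;
    case ST_DONE:
        break;
    }
    return f->state == ST_DONE;
}

int main(void) {
    fetch_fsm_t f = { ST_START, 7 };
    fetch_step(&f, EV_RUN);
    fetch_step(&f, EV_MISS);          /* cache miss: async I/O issued */
    if (fetch_step(&f, EV_IO_COMPLETE))
        printf("block %ld fetched without blocking the FSM\n", f.blockno);
    return 0;
}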
23 Splitting node a into nodes a and b
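The split figure is likewise not in the transcript; this is a generic B-tree leaf split sketch (ORDER and the node layout are assumed, not the deck's actual code), moving the upper half of a full node a into a new sibling b around the median key.

#include <stdio.h>

#define ORDER 4   /* max keys per node, assumed */

typedef struct node {
    int keys[ORDER];
    int nkeys;
} node_t;

/* Split full node a: keep the lower half in a, move the upper half
 * to the new node b, and return the separator key that the parent
 * must insert between them. */
static int split(node_t *a, node_t *b) {
    int mid = a->nkeys / 2;
    b->nkeys = 0;
    for (int i = mid; i < a->nkeys; i++)
        b->keys[b->nkeys++] = a->keys[i];
    a->nkeys = mid;
    return b->keys[0];   /* separator copied up to the parent */
}

int main(void) {
    node_t a = { { 10, 20, 30, 40 }, 4 }, b;
    int sep = split(&a, &b);
    printf("a has %d keys, b has %d keys, separator %d\n",
           a.nkeys, b.nkeys, sep);
    return 0;
}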
24 A Single Node B-tree