Title: Concurrency Control in Distributed MRA Index Structures
1 Concurrency Control in Distributed MRA Index Structures
18th December 2008
- Neha Singh, S. Sudarshan
- (IIT Bombay)
2 Introduction
- Problem statement: computing aggregate queries over a region in a multi-dimensional space containing mobile point data, when the data is stored in a distributed system
- Need for such aggregate queries
  - Networked Virtual Environments: widely used in online games and training simulators
  - E.g. "Run in fear if the number of marching enemy troops exceeds the number of friends around you" (aggregate count query)
  - Real-time traffic monitoring
- Issues
  - Large non-static data storage requires a distributed system and dynamic spatial partitioning
  - Synchronized system clocks are not feasible for large-scale distributed systems
  - Combining local state information from different peers can give inconsistent aggregates due to varied communication delay between them
3 Key contributions
- A distributed multi-resolution aggregate index structure to support a dynamic object set
  - The multi-resolution aggregate tree stores precomputed aggregates at each tree node in a centralized system, to speed up aggregate queries
  - We extend it to support non-static data in a distributed system
- Atomic updates/reads: our read protocol and aggregate tree update protocols ensure that updates are atomic to reads (aggregate queries)
- Highly concurrent update protocol: we present a highly concurrent multi-phase update protocol that
  - avoids blocking of reads
  - minimizes contention with concurrent updates
4 Agenda
- Introduction
- Problem statement, motivation
- System Model
- Readers protocol
- Maintenance of the index structure
- Definitions
- Naive update protocol
- Multi-phase update protocol
- Experimental Analysis
5 System Model: Partitioning the global space
- For partitioning, we use a quad tree based regular decomposition of space
- Benefits
  - Partitions are independent of the order of data insertion
  - Decomposition is implicitly known by all end systems
- Mapping the regions onto the P2P overlay
  - Each quad tree block has a unique centroid
  - Use this as the key in the DHT to map the regions onto the peer set
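The mapping from quad tree blocks to DHT keys can be illustrated with a minimal sketch. The quadrant numbering, the unit-square world, the helper names `block_centroid` and `dht_key`, and the use of SHA-1 over the centroid's text form are all assumptions made for illustration; the slides only state that the centroid is used as the key.

```python
import hashlib

def block_centroid(path, world=(0.0, 0.0, 1.0, 1.0)):
    """Centroid of the quad tree block reached from the root via `path`.

    `path` is a sequence of quadrant indices 0..3 (assumed numbering:
    bit 0 selects the east half, bit 1 selects the north half).
    """
    x0, y0, x1, y1 = world
    for quadrant in path:
        xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
        x0, x1 = (xm, x1) if quadrant & 1 else (x0, xm)
        y0, y1 = (ym, y1) if quadrant & 2 else (y0, ym)
    return ((x0 + x1) / 2, (y0 + y1) / 2)

def dht_key(centroid):
    """Hypothetical key derivation: hash the centroid's text form."""
    text = f"{centroid[0]!r},{centroid[1]!r}"
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

# Every peer can compute the same key without any coordination,
# because the decomposition is regular and known in advance.
key_for_root = dht_key(block_centroid(()))
key_for_child = dht_key(block_centroid((1, 2)))
```

Because each block has a distinct centroid, distinct blocks produce distinct hash inputs, which is what lets the centroid serve as a lookup key.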
6 System Model: MRA Tree
- An MRA-Tree (Multi-Resolution Aggregate Tree) is a modified multi-dimensional index structure that stores pre-computed aggregates at various resolutions in its intermediate nodes
- A leaf node contains the actual data points: <loc, value>
- A non-leaf node stores aggregates for all data points indexed by it: COUNT, SUM, MINARRAY, MAXARRAY, where MINARRAY and MAXARRAY are the min and max, respectively, of each child node
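A minimal sketch of the two node kinds, assuming an in-memory tree on one machine. The class names `Leaf` and `Inner`, the field names, and the `rebuild` helper are hypothetical; only the stored aggregates (COUNT, SUM, MINARRAY, MAXARRAY) come from the slide.

```python
from dataclasses import dataclass, field

@dataclass
class Leaf:
    points: list = field(default_factory=list)   # entries are (loc, value)

    def summary(self):
        """(count, sum, min, max) of the stored values; min/max None if empty."""
        values = [value for _loc, value in self.points]
        return (len(values), sum(values),
                min(values, default=None), max(values, default=None))

@dataclass
class Inner:
    children: list
    count: int = 0
    total: float = 0.0
    min_array: list = field(default_factory=list)  # one cached min per child
    max_array: list = field(default_factory=list)  # one cached max per child

    def rebuild(self):
        """Recompute aggregates from the children (children rebuilt first)."""
        parts = [child.summary() for child in self.children]
        self.count = sum(p[0] for p in parts)
        self.total = sum(p[1] for p in parts)
        self.min_array = [p[2] for p in parts]
        self.max_array = [p[3] for p in parts]
        return self

    def summary(self):
        mins = [m for m in self.min_array if m is not None]
        maxs = [m for m in self.max_array if m is not None]
        return (self.count, self.total,
                min(mins, default=None), max(maxs, default=None))

a = Leaf([((0.1, 0.1), 4), ((0.2, 0.3), 1)])
b = Leaf([((0.7, 0.2), 3)])
c = Inner([a, b, Leaf()]).rebuild()
root = Inner([c, Leaf([((0.9, 0.9), 10)])]).rebuild()
```

The node's own min and max are derived from the per-child arrays, so they never need to be stored separately.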
7 Readers Protocol
- Query traverses the index structure top-down, starting from the root node and selectively exploring the nodes
- Relation of the read query region Q to a node N
  - Q is contained in or partially overlaps N
  - Q encloses N: further traversal not required
  - Q is disjoint from N
[Figure: read query over region Q on the index tree, marking nodes intersecting with Q and read, nodes intersecting with Q but not read, and nodes where further traversal is not necessary]
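The top-down traversal can be sketched for a COUNT query as follows. This is a single-machine illustration under stated assumptions: boxes are `(x0, y0, x1, y1)` tuples, the `Node` class and helper names are hypothetical, and a partially overlapped leaf is answered by scanning its points, which the slides do not specify.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    box: tuple                                     # (x0, y0, x1, y1)
    count: int = 0                                 # precomputed COUNT
    children: list = field(default_factory=list)
    points: list = field(default_factory=list)     # leaf only: (x, y)

def encloses(q, box):
    return q[0] <= box[0] and q[1] <= box[1] and q[2] >= box[2] and q[3] >= box[3]

def disjoint(q, box):
    return q[2] <= box[0] or box[2] <= q[0] or q[3] <= box[1] or box[3] <= q[1]

def count_query(node, q, read):
    """COUNT of points inside q; `read` collects boxes whose data was read."""
    if disjoint(q, node.box):
        return 0                       # Q disjoint: node is not read
    if encloses(q, node.box):
        read.append(node.box)
        return node.count              # Q encloses: stored aggregate suffices
    if not node.children:
        read.append(node.box)          # assumption: scan a partially covered leaf
        return sum(1 for x, y in node.points
                   if q[0] <= x < q[2] and q[1] <= y < q[3])
    # Q partially overlaps an inner node: explore only its children
    return sum(count_query(child, q, read) for child in node.children)
```

A query that encloses the root reads one node regardless of how much data lies below it, which is the point of storing aggregates at every level.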
8 Readers Protocol
- Naïve Read Method
  - Get a lock on all nodes while traversing down the tree
  - Release the locks after the read is completed
- However
  - This reduces the concurrency of the index structure for concurrent updates
- Needed
  - Release of locks early
  - At the same time, prevent updates coming top-down from overtaking the read
- Solution: use the Crabbing Protocol: acquire locks on all the child nodes before releasing the lock on the parent
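The lock ordering of the crabbing protocol can be sketched on a local tree. This is only an illustration of the acquire-children-then-release-parent order: `threading.Lock` is an exclusive lock used as a stand-in for the S-locks of the read protocol, and the `Node` class and `crabbing_read` function are hypothetical names.

```python
import threading

class Node:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)
        self.lock = threading.Lock()    # stand-in for a shared (S) lock

def crabbing_read(root, trace):
    """Traverse top-down, locking all children before unlocking the parent."""
    root.lock.acquire()
    trace.append(("lock", root.name))
    held = [root]                       # locked nodes not yet processed
    while held:
        node = held.pop(0)
        for child in node.children:
            child.lock.acquire()        # children first ...
            trace.append(("lock", child.name))
        node.lock.release()             # ... then the parent
        trace.append(("unlock", node.name))
        held.extend(node.children)
    return trace

c = Node("c")
a = Node("a", [c])
b = Node("b")
root = Node("root", [a, b])
trace = crabbing_read(root, [])
```

At no point is a parent unlocked while its children are still unlocked, so an operation that follows the same top-down order cannot slip past the reader.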
9 Maintenance of Index Structure
- Aim: to update the distributed aggregate tree such that these operations are atomic while causing minimum blocking to reads
- We consider two types of updates
  - Move operation
    - Within the same node
    - Across different nodes
  - Insertion / deletion operation
- The index tree needs to be updated only in case of a transfer across different nodes or an insertion / deletion operation
- Since our application is data-driven, updates percolate from leaf nodes to the higher levels of the hierarchy
10 Definition: Update Tree
- Update Tree (UT): the set of all nodes of the distributed MRA tree whose stored aggregate values are affected by the transaction T
- Insertion / deletion operation: consists of all ancestor nodes of the leaf node
- Move operation: consists of ancestors up to the lowest common ancestor of the leaf nodes
[Figure: update trees for an insertion / deletion and for a move operation]
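The two cases of the definition can be sketched by identifying each node with its path of quadrant indices from the root, so the root is `()`. The path representation and the function names are assumptions for illustration; following the slide's wording, the leaf itself is left out of the update tree.

```python
def ancestors(leaf):
    """All proper ancestors of `leaf`, from the root () downwards."""
    return [leaf[:i] for i in range(len(leaf))]

def lowest_common_ancestor(a, b):
    i = 0
    while i < min(len(a), len(b)) and a[i] == b[i]:
        i += 1
    return a[:i]

def update_tree_insert_or_delete(leaf):
    """Insertion / deletion: every ancestor of the affected leaf."""
    return set(ancestors(leaf))

def update_tree_move(src, dst):
    """Move: ancestors of both leaves, up to their lowest common ancestor."""
    top = lowest_common_ancestor(src, dst)
    return {p for leaf in (src, dst)
            for p in ancestors(leaf) if len(p) >= len(top)}
```

A move within the same leaf yields an empty update tree, matching the observation that such moves need no index update.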
11 Definition: Update Tree
- Importance of Update Tree
  - Although updates propagate up from leaf nodes, only nodes in the update tree are affected
  - Hence locks can be acquired top-down from the root of the update tree
  - Going to the tree root each time would overload the site containing the root
  - We use the order of lock acquisition at the root node of the update tree to serialize concurrent intersecting read and update queries
12 Definition: Conflicting Updates
- Conflicting updates: two updates U1 and U2 are said to be conflicting if U1T ∩ U2T ≠ ∅, where U1T and U2T are the corresponding update trees
[Figure: examples of update trees for U1 and U2, with root nodes N1 and N2]
- Importance
  - The common part of two update trees is connected and has a unique highest node
  - The order of access to this node gives the serialization order of concurrent conflicting updates
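A small sketch of the conflict test, assuming each update tree is given as a set of node paths (tuples of quadrant indices from the tree root, an assumed representation). The function name `highest_common_node` is hypothetical. It relies on the property stated above that the common part has a unique highest node, which is then the shortest path in the intersection.

```python
def highest_common_node(tree1, tree2):
    """Return the highest node shared by two update trees, or None.

    None means the updates do not conflict. Otherwise the returned node
    is where the two updates would be ordered relative to each other.
    """
    common = tree1 & tree2
    if not common:
        return None
    return min(common, key=len)   # shortest path = closest to the root

# Hypothetical update trees: a move below node (0,) and an insertion
# into a leaf under (0, 1, 2).
u1_tree = {(0,), (0, 1), (0, 3)}
u2_tree = {(), (0,), (0, 1), (0, 1, 2)}
u3_tree = {(1,), (1, 2)}
```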
13 Naïve Update Protocol
- X-lock all update tree nodes and then update them
- Order of acquiring locks: top-down, as bottom-up can lead to deadlock with a read query
Step I: Acquire Lock Phase
- X-locks are acquired on all update tree nodes top-down, starting from the root node
Step II: Update and Release Lock
- Updates propagate bottom-up
- Nodes release locks after updating their aggregates
- The root node releases its lock only after the update is over in both legs
14 Naïve Update Protocol
Problem: Low concurrency
- Lock retained on the root node of the update tree for the entire duration
- It being X-locked results in low concurrency and higher read time
- Issues
  - Read query comes top-down, and updates go bottom-up
  - Root node is the last to be updated and the first to be read
  - Still need to ensure the update is atomic for reads
- We propose a highly concurrent multi-phase update protocol
Key modifications
- Allow concurrent reads while acquiring locks and updating other nodes
- Introduce a new locking mode, U-lock, compatible with S-lock
- Nodes updated top-down
- Split the update process in 3 phases
- Prevent a read from overtaking a top-down update and reading an inconsistent value
- Use the crabbing protocol while acquiring locks to update nodes
15 Multi Phase Update Protocol
- Update Lock Mode: a new locking mode compatible with read
  - Locks nodes for possible future modification
  - Can be upgraded to X-lock when needed
Compatibility Matrix
      S      U      X
S     True   True   False
U            False  False
X                   False
- U-S True => a read query can proceed while the update is modifying other nodes of the update tree
- U-U False => conflicting updates need to wait for each other
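The compatibility matrix can be written down as a lookup table. The sketch below stores only the upper triangle shown on the slide and treats the matrix as symmetric, which is an assumption; the function names `compatible` and `can_grant` are hypothetical.

```python
# Upper triangle of the compatibility matrix, as given on the slide.
COMPATIBLE = {
    ("S", "S"): True, ("S", "U"): True,  ("S", "X"): False,
    ("U", "U"): False, ("U", "X"): False,
    ("X", "X"): False,
}

def compatible(held, requested):
    """True if `requested` can coexist with `held` (assumed symmetric)."""
    if (held, requested) in COMPATIBLE:
        return COMPATIBLE[(held, requested)]
    return COMPATIBLE[(requested, held)]

def can_grant(requested, holders):
    """A lock is granted only if it is compatible with every current holder."""
    return all(compatible(held, requested) for held in holders)

reader_passes_update = can_grant("S", ["U"])      # read proceeds under a U-lock
second_update_waits = not can_grant("U", ["U"])   # conflicting updates wait
```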
16 Multi-phase Update Protocol: update split into 3 phases
Acquire Lock Phase
- U-locks acquired top-down starting from the root node
Propagate Phase
- Update gets propagated bottom-up from leaf nodes and is stored as pendingUpdates
Refresh Phase
- U-locks upgraded to X-locks
- Stored pendingUpdates get executed top-down
- X-locks acquired on child nodes, then lock released (crabbing protocol)
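The order of events in the three phases can be traced with a single-threaded sketch. It assumes the update tree is one chain (as for an insertion) and that the aggregate is a max; no real locking, messaging or waiting is modelled, and the `Node` class, its fields and `multi_phase_update` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    name: str
    max_value: int
    mode: Optional[str] = None      # lock held by the update: None, "U" or "X"
    pending: Optional[int] = None   # stored pendingUpdate (new max)

def multi_phase_update(chain, new_value):
    """chain: update tree nodes, root first. Returns the event trace."""
    trace = []
    # Acquire Lock Phase: U-locks top-down (reads may still proceed).
    for node in chain:
        node.mode = "U"
        trace.append(("U-lock", node.name))
    # Propagate Phase: bottom-up, results only stored, not yet visible.
    carried = new_value
    for node in reversed(chain):
        carried = max(node.max_value, carried)
        node.pending = carried
        trace.append(("pending", node.name))
    # Refresh Phase: upgrade to X and apply top-down with crabbing.
    chain[0].mode = "X"
    trace.append(("X-lock", chain[0].name))
    for i, node in enumerate(chain):
        node.max_value, node.pending = node.pending, None
        trace.append(("apply", node.name))
        if i + 1 < len(chain):
            chain[i + 1].mode = "X"          # lock the child first ...
            trace.append(("X-lock", chain[i + 1].name))
        node.mode = None                     # ... then release this node
        trace.append(("release", node.name))
    return trace

chain = [Node("N", 10), Node("A", 4)]
trace = multi_phase_update(chain, 7)
```

In the trace, the root is X-locked only during the refresh phase, whereas the naïve protocol would hold it exclusively from the first lock acquisition onward.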
17 Multi Phase Update Protocol: Correctness and efficiency
- Serialization Order
  - R - U Order: order of the read query S-lock and the update X-lock at the update tree root node
  - U - U Order: order of the U-lock point at the highest node of the common twig pattern
- Importance of separation of Acquire Lock and Propagation phases
  - U-locks acquired bottom-up
  - Acquiring U-locks top-down cannot lead to deadlock with read (as in the naïve case)
  - But, merging both phases can lead to deadlock between concurrent conflicting updates
- Importance of Crabbing protocol
  - Used for upgrading U-lock to X-lock top-down
  - Thus a read query cannot overtake an update
  - It sees the state either before or after refresh on all intersecting nodes
  - => Update is atomic for read
18 Multi Phase Update Protocol, Scenario I: Can the propagated leaf node value change before the update gets over?
- Consider a new max value (m) that was propagated up during the propagate phase
- What if this max gets changed between the time it is propagated up and the time it gets executed at the nodes?
Holding a U-lock for the entire duration between the propagate and refresh phases ensures that no other update can change the node's value being propagated up the tree
22 Multi Phase Update Protocol, Scenario II: Can the stored pendingUpdate value get stale?
- Assume U decreases the max aggregate value at node B to 3
- What if the max value of node A changes meanwhile?
- This cannot happen because
  - Any transaction attempting to modify the max value of A would intersect with U on at least node C
  - Thus the two would be executed serially
- Note: caching min/max values on the parent node helps reduce update latency by greatly reducing the number of nodes required to be locked
[Figure: example MRA tree with nodes A, B, C and D annotated with max values, before and after the update]
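The note about caching can be illustrated with a minimal sketch: a parent that keeps one cached max per child can recompute its own max when one child changes, without contacting the siblings. The class name `Parent`, its methods, and the numeric values are hypothetical and are not taken from the figure.

```python
class Parent:
    def __init__(self, child_max):
        self.max_array = dict(child_max)   # cached max of each child

    @property
    def max_value(self):
        return max(self.max_array.values())

    def child_max_changed(self, child, new_max):
        """Refresh one cached entry and return the parent's new max.

        Only the entry of the reporting child is touched; the cached
        values of the other children are reused as they are.
        """
        self.max_array[child] = new_max
        return self.max_value

c = Parent({"A": 4, "B": 6})
before = c.max_value
after = c.child_max_changed("B", 3)   # B's max decreases to 3
```

Without the cached array, a decrease at B would force the parent to read A as well, which would add A to the set of nodes that must be locked.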
23 Multi Phase Update Protocol, Scenario III: Multiple updates to an entity
- Consider an entity to be transferred by an update U1 from A to B and then by U2 from B to C
- Logically, U1 should get reflected on nodes B and D before U2
- Causal order of execution at node B makes sure that U1 completes before U2 begins
24 Divisible Aggregates
- Consider an update transaction which causes only a change in the sum and count of the leaf nodes and no change in min/max
- Observation
  - The change in the aggregate for all nodes in the update tree is known
  - No need to propagate these changes bottom-up
- Overview of update protocol
  - Such transactions can have only one update phase
  - X-locks acquired top-down using the crabbing protocol
  - Updates are executed and locks released
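For an insertion, the observation means every node in the update tree receives the same known delta, so it can be applied in a single top-down pass. The sketch below assumes nodes are identified by paths from the root and aggregates are stored as `(count, sum)` pairs in a dictionary; locking is left out, and the function names are hypothetical.

```python
def insertion_deltas(leaf, value):
    """(d_count, d_sum) for every ancestor of `leaf`, known up front."""
    return {leaf[:i]: (1, value) for i in range(len(leaf))}

def apply_top_down(aggregates, deltas):
    """Apply the deltas starting at the update tree root.

    Returns the order in which nodes were updated. In the protocol each
    step would hold an X-lock taken with crabbing; that is omitted here.
    """
    order = sorted(deltas, key=len)          # shortest path (root) first
    for path in order:
        count, total = aggregates[path]
        d_count, d_sum = deltas[path]
        aggregates[path] = (count + d_count, total + d_sum)
    return order

aggregates = {(): (10, 50.0), (2,): (4, 20.0), (2, 1): (1, 5.0)}
order = apply_top_down(aggregates, insertion_deltas((2, 1, 3), 7.0))
```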
25 Comparative Analysis of the Update Methods
Aim: estimate the difference in concurrency provided by the update protocols
- d: communication delay per link
- m: number of edges in the longer leg
- N: root node of the update tree
- Naïve Update Protocol: Acquire Lock Phase (dm), Update Phase (dm)
- Multi Phase Update Protocol: Acquire Lock Phase (dm), Propagate Phase (dm), Refresh Phase (2d, (m-2)d)
- Updating only Divisible Aggregates: Update Phase (2d, (m-2)d)
[Figure: timeline of the phases of each protocol, marking the interval during which N is locked for read queries]
26 Experimental Setup
- Synthetic data set with non-uniform distribution of data points
- DHT used: FreePastry implementation of the Pastry DHT
- Parameters for quadtree
  - fmin 14
  - Every peer node specifies its threshold (t) as the count of the number of entities it can support
  - Threshold can be as low as 0
- We study the distributed MRA tree's update protocol
  - Running time of read queries with and without updates
  - Running time of updates
27 For getting MRA index structures of varying depths, use the peer threshold as the lever
- Depth of the quad tree increases as the threshold approaches µ (entities / peers)
[Figure: variation of the min/max depth of the partitioning tree; x-axis: Threshold / µ, y-axis: Depth]
28 Read query time taken depends on the number of nodes read, and not on the query region size
[Figure: variation of the read query duration as the query region increases, with no updates; x-axis: Query region / Total area (in %), y-axis: Time (ms)]
[Figure: variation of the read query duration as the number of nodes read increases, with no updates; x-axis: Nodes covered, y-axis: Time (ms)]
29 Update time taken is directly proportional to the number of update tree nodes
[Figure: variation of average update time with increase in the number of update tree nodes, for the naive protocol and the multi-phase locking protocol; x-axis: Number of nodes, y-axis: Time (ms)]
30 Average read time taken increases much less for the multi-phase update protocol as compared to the naive protocol
[Figure: read query duration for different naïve update workloads, with F2 > F1 (frequency of updates); x-axis: Number of nodes, y-axis: Time (ms)]
[Figure: read query duration for different multi-phase (MP) update workloads, with F2 > F1 (frequency of updates); x-axis: Number of nodes, y-axis: Time (ms)]
31 Conclusion
- We propose a Distributed Multi-Resolution Aggregate Tree index structure for answering aggregate queries over mobile entities
- We point out problems with concurrent updates and propose the multi-phase update protocol, which
  - ensures that updates and aggregate queries are atomic with respect to each other
  - minimizes contention and avoids deadlock
- Analysis and experimental results show
  - The multi-phase update protocol requires a longer update time
  - But it offers high concurrency for read queries as compared to the naïve update protocol
32 Thank you!
Questions?