Title: Purva P. Joshi
1Asynchronous View Maintenance for VLSD
DatabasesUnder the guidance of Prof. S.
Sudarshan
2Introduction
- Large scale shared nothing databases (PNUTS,
BigTable, Dynamo, Cassandra) trade query
expressiveness for scalability and performance. - Support primary key operations like
- Range scans
- Single record look-ups
- Indexes and Materialised Views are the only
feasible way for such D/b s to handle - Equijoins
- Aggregate queries
- Look-ups on secondary attributes
-
3Motivation for Asynchronous Updates
- Asynchronous updates Updates to the view are
made after updates to the base table. - Synchronous updates Updates to the view are made
together with updates to the base table. - The data is highly distributed across different
geographical areas and hence synchronous updates
are extremely expensive and lead to higher query
response times.
4Asynchronous v/s Synchronous View Maintenance
- Advantages of Asynchronous View Maintenance
- Decrease in latency of writing to the database.
- Less cross-server communication.
- Approach followed Deferred View Maintenance
- Challenges
- Scalability
- View must be updated even in presence of failures
(aborting an update is not possible) - Efficient view replication across distant data
centres
5PNUTS Architectural Overview
API
Read Query Request
Storage server
Query Processor
Query Routers
Log Manager
Request forwarded to server
Partition Controller
Storage servers
6PNUTS Architectural Overview
To remote Data centers asynchronous- -ly
API
Write Query Request
Storage server
Query Processor
Query Routers
Log Manager
Request forwarded to server
Write-ahead log
Partition Controller
Storage servers
7Publish-Subscribe Model and Distribution
- Updates are committed when they are published to
the YMB - YMB propagates the update to different regions
and applies to the replicas - Steps to ensure no loss occurs before updates are
applied to database - Logging to multiple disks on different servers
- Wide area replication messages are relayed
across YMB clusters in separate geographical
areas for delivery to local subscribers.
8Data and Query Model
- Tables
- Items (ItemId, Name, Category, Description,
Price) - Reviews (ReviewId, ItemId, Rating, Text,
ReviewerId) - Queries not supported
- Join and Group-by aggregate queries against base
tables - Consistency
- Per-record consistency
- No ACID-style guarantee
- Range and table scans do not guarantee a
consistent snapshot of the data
9Record mastery and Record timelines
- Record mastery
- Per-record master-ship
- All updates first applied to master and then
propagated to replicas - Master record holds locks and prevents
conflicting writes - Log manager delivers updates made to a record to
all replicas in appropriate order w.r.t the
master record - Record timeline
- Each point on timeline identified by a record
version - Matser record latest version
10Mechanisms for View Maintenance RVTs
- Remote View Tables (RVTs)
- Each view is stored in an independent PNUTS table
separate from the base table (the table on
which the view is defined) - Partitioned based on its own key, which is
different from the base table key . - So, view records are likely to be stored on
different servers than the corresponding base
table records. - Maintained asynchronously can be stale
- E.g. CREATE VIEW ByPrice SELECT Price, ItemId,
Name, Category FROM Items - Partitioned on View key (Price, ItemId)
-
11Maintaining RVTs
To remote Data centers asynchronous- -ly
API
Update base table Request
Storage server S1
Query Processor
Query Routers
Log Manager
Request forwarded to server
Write-ahead log plus Information necessary to
update views (old field values)
RVT Maintainer
Partition Controller
Storage servers
RVT Mainainer of S1 subscribes to the log manager
to get updates for records in S1
12Mechanisms for View Maintenance LVTs
- Local View Tables (LVTs)
- Construct a view over each base table partition
- View records are on same partition as
corresponding base records. - Maintained synchronously High query cost
- E.g. CREATE VIEW ByPrice AS SELECT Price,
count() FROM Items GROUP BY Price - Items of a price range (say, 7000) will be
scattered across partitions. - Need to retrieve LVT records from all such
partitions and add them to get the total sum.
13Combining RVTs and LVTs
- Main application of LVTs materialize aggregates
over RVTs. - E.g.
Price ItemID Name Category
7000 2 T.V Electronics
2000 4 Chair Furniture
2000 1 Car Toys
50 3 Doll Toys
LVT on RVT
Price Count
7000 1
2000 2
50 1
ItemID Name Category Price
1 Car Toys 2000
2 T.V Electronics 7000
3 Doll Toys 50
4 Chair Furniture 2000
5 Table Furniture 5000
RVT on Base table
Base Table
14View Types
- Indexes
- Equi-joins
- Selections
- Group-by-aggregates
15View Types Indexes
- Index Projection and re-ordering of a base
table - E.g. RVT ByPrice(Price, ItemId, Name, Category)
- Name, Category extra attributes
- Addition of new record to base table
16View Types Indexes
- Updation of View key attribute
- Updation of non view-key attribute
- Index as LVT search expensive
17View Types Equi-joins
- E.g. Joining Items and Reviews tables
- RVT on two ItemId indexes (one each for tables
Items and Reviews), sorted on ItemId - Actual join performed at query time
ItemId Name Category Price Reviewer Id Text Rating
1 ABC XYZ 1000
1 123 GOOD 5
2 DEF PQR 200
3 LMN XYZ 500
3 896 BAD 1
18View Types Equi-joins
- Co-locating records to be joined in the same
partitions but deferring the actual join until
query time. - Outer-join
- Can join three or more tables on the same
attribute via view - LVTs are not appropriate different base table
partitions
19View Types Selections
- Subset of base table records
- E.g. CREATE VIEW ELECTRONICITEMS
- SELECT FROM Items WHERE CategoryElectronics
- Implemented as RVTs
- LVTs can be used , but cost is high
- Not widely used
20View Types Group-by aggregates
- RVTs replica of a single record cannot be used
- LVTs are useful synchronously updated
- Design issue
- LVTs on base table
- LVTs on RVT of base table
- Maintaining aggregates synchronous update when
base record changes - For min (max) queries, scan on partition after
update to get new min (max)
21Aggregate LVT and Query example
RVT on Base table
Price Count
7000 1
2000 2
50 1
Price ItemID Name Category
7000 2 T.V Electronics
2000 4 Chair Furniture
2000 1 Car Toys
50 3 Doll Toys
LVT on RVT
ItemID Name Category Price
1 Car Toys 2000
2 T.V Electronics 7000
3 Doll Toys 50
4 Chair Furniture 2000
5 Table Furniture 7000
Price Count
7000 2
2000 2
50 1
LVT on Base
Base Table
22Unsupported View Definitions
- Joins of three or more tables on different
attributes - Non-co-located records
- Joins other than equi-joins
- Expensive
- Full SQL-99 aggregate functions
- Only SQL-92 aggregates count, min, max, sum,
avg - No support for percentile, standard deviation
23Design Issues
- View Maintenance by Client or by System
- Maintenance by clients possible
- Hiding logs maintenance by system preferred
- Updates frequency
- Synchronous
- View update as part of base table transaction -
approach followed by LVTs - Lazy
- approach followed by RVTs
- Batched lazy
- Group commit
- Periodic view refresh
- high throughput
- high staleness
- wasted effort
24Consistency Model
- Review of record-level consistency model
- Figure modifed from the original at PNUTS
Yahoo!s Hosted Data Serving Platform slides by
Brian F. Cooper, Raghu Ramakrishnan, Utkarsh
Srivastava, Adam Silberstein, Philip Bohannon,et
al
Update(20)
Update(15)
Insert (toaster,10)
ReadAny10
ReadLatest15
ReadCritical(5)20
25Maintaining View Consistency
- Base-consistency model timelines of all
records are independent - Views Multiple records are connected to base
records - A view record vr is depedent on base record r
on which it is defined, while the base record r
is incident on vr - Indexes, selections, equi-joins one-to-one
- Group-by-aggregates many-to-one
26Maintaining View Consistency
Update Review(1,BAD,1)
Insert Review(1,GOOD,5)
ReadCritical(v.6) BAD
ItemId ReviewerId Review Rating
1 ABC
2 PQR AVERAGE 3
BAD
GOOD
5
1
27Cost of View Maintenance
- For an update to a base record br
- Indexes at most two updates (updates to view
key) - Equi-joins similar to indexes
- Selections single view update if selection
condition is satisfied/not fulfilled by br - Group-by-aggregates for sum/count, updation
needs only the change in the value of aggregate
field of br and value off grouped attribute - Log record for br holds required information
28Read Consistency for Views RVTs
- Single-record Reads one-to-one views
- consistency guarantees are same as base table
views ReadAny(vr), ReadCritical(vr,v),
ReadLatest(vr) - No need for separate version numbers for view
records - Routing to ensure readlatest returns correct
value. - Single-record Reads many-to-one views
- Multiple base records are incident on a single
view record - ReadAny any version of base records
- ReadCritical Specific versions of a certain
subset of base records (mentioned via a vector)
and ReadAny for all other base records - Easy since base record versions are available in
the RVT/base table on which the aggregate LVT is
defined - ReadLatest accessing base table
29Read Consistency for Views Cost
- ReadAny may return ver.0 (if record is absent)
- ReadCritical in case of stale record, read base
table master - ReadLatest
- High cost for RVTs
- Must access master base table every time
- Cheaper than scanning the entire table
- LVTs are always up-to-date w.r.t local replicas
- LVT-on-RVT
- ReadCritical cheap
- ReadLatest - expensive
30Read Consistency for Views Range Scans
- Range scanning over views
- Stale records, Missing records
- Challenges due to insert and delete during View
Maintenance asynchronous propagation - Insert arrives before delete a record may
appear twice - Delete appears before insert missing record
- Solution
- Filter out multiple records that correspond to
the same base record - Retain tombstones during deletes
- Look-up base-table record using key stored in
tombstones - Needs garbage collection
- Not yet implemented in PNUTS
31Evaluation
- View Maintenance cost measured on
- Latency reasonable impact
- Throughput decreases
- Staleness
- Setup C and Linux/FreeBSD
- Evaluation of costs
- 10 GB data on each server
- MySQL buffer pool 2GB
- 90 reads served from cache
- Thin views (indexed attribute and record primary
key) - I/O bound
-
32Experiment 1 Varying View Type
- Need to provide enough capacity to accommodate
extra view maintenance work
33Experiment 2 Varying Read/Write Workload
Latency increases with increase in write
percentage
34Experiment 3 Varying no. of views
- Effect is larger for RVTs than LVTs
35Query Evaluation
- Index plans
- Look-up on secondary attributes
- Cost of index scan increases with size of result
set - Aggregates
- Count
- Index scan
- LVT on base
- LVT on RVT
36Query Evaluation Aggregates
- LVT approaches constant across all group sizes
- LVT-on-base most expensive
- LVT on RVT cheapest
- Cost of index scan increases with group size
37Query Evaluation Aggregates
- Fixed group size 500
- Index scan and LVT on RVT unaffected by no of
partitions - For small partitions, LVT on base beats index
scan - LVT on RVT best strategy
38Conclusion
- Views are essential to enhance query power in
distributed systems. - RVTs index, equi-join and selection views
- LVTs group-by-aggregate views
- Deferred view maintenance easier since it uses
existing PNUTs mechanisms for replication and
recovery.
39References
- Asynchronous View maintenance for VLSD databases
- Parag Agrawal,Adam S, Brian C, Utkarsh S, Raghu
Ramakrishnan, SIGMOD 2009 - PNUTS Yahoo!s Hosted Data Serving Platform
- Brian F. Cooper, Raghu Ramakrishnan, Utkarsh
Srivastava, Adam Silberstein, Philip Bohannon,et
al
40Any Questions ???
41Thank You!