1
Caching with Good Enough Currency,
Consistency, and Completeness
  • Hongfei Guo University of Wisconsin
  • Per-Åke Larson Microsoft Research
  • Raghu Ramakrishnan University of Wisconsin

2
Motivation: Scaling Google

3
Motivation: Scaling A DBMS By Caching
  • How to tell whether the cached data is good
    enough for an application?
  • NO data quality requirements from the
    applications!
  • NO data quality guarantees from the caching DBMS!

[Diagram labels: Application Server (app-specific code), Caching DBMS, Backend DBMS, Updates, Asynchronous]
4
The Big Picture
  • Apps: specify data quality requirements in
    queries
  • Cache: enforces data quality constraints
  • SIGMOD 2004, SIGMOD 2004 Demo
  • Cache admin: specifies the local data quality to be
    maintained by the cache
  • (Data quality-aware database caching model)
  • This presentation
  • System performance evaluation
  • Dissertation

View-level granularity
Finer granularity (partitions of a view)
5
Data Quality Metrics (informal)
  • Currency: the time elapsed since this copy
    became stale
  • Consistency: a query result is (snapshot)
    consistent iff it is as if evaluated from a
    snapshot of the master database
  • CC: currency and consistency

6
Roadmap
  • Background
  • Cache data quality properties
  • Cache property specification
  • Enforcing data quality constraints
  • Experiments
  • Future directions and conclusions

7
Why Define Cache Properties?
[Diagram labels: Query processing, Cache Properties (contract), Cache maintenance]
8
Cache Properties (P3C)
  • Presence: per object
  • Consistency: a set of objects
  • Completeness: per predicate
  • Currency: object staleness

9
Basic Concepts
[Figure labels: Tables, Object, Snapshots (H1, H2), Master Database, Cache]
10
Cache Property Examples
Currency = now − stale point
Properties illustrated: Consistent, Complete, Present
[Figure labels: Master Database, Cache, Stale point, H1, H2]
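The currency definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `currency` and `within_bound` are hypothetical, and clamping a future stale point to zero staleness is an assumption, not something the slides specify.

```python
from datetime import datetime, timedelta

def currency(now: datetime, stale_point: datetime) -> timedelta:
    """Time elapsed since the cached copy became stale (now - stale point)."""
    # Assumption: a copy that has not yet become stale has zero staleness.
    return max(now - stale_point, timedelta(0))

def within_bound(now: datetime, stale_point: datetime, bound: timedelta) -> bool:
    """True if the copy is fresh enough for a query with this currency bound."""
    return currency(now, stale_point) <= bound

now = datetime(2005, 1, 1, 12, 0, 0)
stale_point = datetime(2005, 1, 1, 11, 52, 0)
print(currency(now, stale_point))                               # 0:08:00
print(within_bound(now, stale_point, timedelta(minutes=10)))    # True
```

A copy that went stale eight minutes ago satisfies a 10-minute bound but would fail a 5-minute one.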
11
Roadmap
  • Background
  • Cache data quality properties
  • Cache property specification
  • Enforcing data quality constraints
  • Experiments
  • Future directions and conclusions

12
Specifying Cache Properties
  • Specified as integrity constraints
  • Presence constraint
  • Consistency constraint
  • Completeness constraint
  • Presence correlation constraint
  • Consistency correlation constraint

13
Presence Constraint
AuthorCopy
authorId  name    city
1         Alice   Madison
2         Bob     Madison
3         Cedric  Seattle

AuthorList_PCT
authorId
1
2
3

[Diagram labels: Backend DBMS, Caching DBMS]
14
Presence Constraint
Partially materialized view (Zhou et al. 2005)

CREATE VIEW AuthorCopy AS SELECT * FROM Authors
CREATE TABLE AuthorList_PCT (authorId int)
ALTER VIEW AuthorCopy ADD PRESENCE ON authorId IN (SELECT authorId FROM AuthorList_PCT)

AuthorCopy
authorId  name    city
1         Alice   Madison
2         Bob     Madison
3         Cedric  Seattle

AuthorList_PCT (control-table; control-key: authorId)
authorId
1
2
3
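The effect of a presence constraint can be approximated in standard SQL. The sketch below uses Python's `sqlite3` module and makes several assumptions: a single in-memory database stands in for both the backend and the cache, an ordinary (non-materialized) view stands in for the partially materialized view, and the `ADD PRESENCE` clause is emulated with a `WHERE ... IN` filter. The helper `cached_ids` is hypothetical.

```python
import sqlite3

# Assumption: one in-memory database plays both backend and cache.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE Authors (authorId INTEGER PRIMARY KEY, name TEXT, city TEXT);
    INSERT INTO Authors VALUES
        (1, 'Alice', 'Madison'), (2, 'Bob', 'Madison'), (3, 'Cedric', 'Seattle');
    CREATE TABLE AuthorList_PCT (authorId INTEGER);
    -- Plain-SQL stand-in for ADD PRESENCE ON authorId IN (...)
    CREATE VIEW AuthorCopy AS
        SELECT * FROM Authors
        WHERE authorId IN (SELECT authorId FROM AuthorList_PCT);
""")

def cached_ids(conn):
    """Return the authorIds currently visible through AuthorCopy."""
    rows = conn.execute("SELECT authorId FROM AuthorCopy ORDER BY authorId")
    return [row[0] for row in rows]

# Only control-key values listed in the control table are present.
db.executemany("INSERT INTO AuthorList_PCT VALUES (?)", [(1,), (3,)])
print(cached_ids(db))   # [1, 3]

# Adding a control-key value brings the matching row into the view.
db.execute("INSERT INTO AuthorList_PCT VALUES (2)")
print(cached_ids(db))   # [1, 2, 3]
```

The point of the control table is that the cached subset changes by inserting or deleting control-key values, without redefining the view.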
15
Consistency Constraint
CREATE TABLE CityList_CsCT (city string)
ALTER VIEW AuthorCopy ADD CONSISTENCY ON city IN (SELECT city FROM CityList_CsCT)

AuthorCopy
authorId  name    city
1         Alice   Madison
2         Bob     Madison
3         Cedric  Seattle

CityList_CsCT
city
Madison

AuthorList_PCT
authorId
1
2
3

[Diagram labels: Cache Region, Backend DBMS]
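One way to picture a consistency constraint: all cached rows that share a listed control-key value (here, a city) form a cache region and must reflect the same snapshot of the master database. The sketch below is an assumption-laden illustration, not the prototype's mechanism: the per-row `snapshot` field and the function `inconsistent_regions` are hypothetical.

```python
from collections import defaultdict

def inconsistent_regions(rows, control_values, key="city"):
    """Return control-key values whose rows come from more than one snapshot."""
    snapshots = defaultdict(set)
    for row in rows:
        if row[key] in control_values:          # only listed regions are constrained
            snapshots[row[key]].add(row["snapshot"])
    return sorted(value for value, seen in snapshots.items() if len(seen) > 1)

# Hypothetical snapshot ids attached to each cached row.
author_copy = [
    {"authorId": 1, "name": "Alice",  "city": "Madison", "snapshot": 7},
    {"authorId": 2, "name": "Bob",    "city": "Madison", "snapshot": 7},
    {"authorId": 3, "name": "Cedric", "city": "Seattle", "snapshot": 5},
]
print(inconsistent_regions(author_copy, {"Madison"}))   # []

# Refreshing only one Madison row breaks the region's consistency.
author_copy[1]["snapshot"] = 8
print(inconsistent_regions(author_copy, {"Madison"}))   # ['Madison']
```

Seattle is never reported because it is not listed in the control table, so no consistency is promised for it.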
16
Completeness Constraint
CREATE TABLE CityList_CpCT (city string)
ALTER VIEW AuthorCopy ADD COMPLETENESS ON city IN (SELECT city FROM CityList_CpCT)

AuthorCopy
authorId  name    city
1         Alice   Madison
2         Bob     Madison
3         Cedric  Seattle

CityList_CpCT
city
Madison

AuthorList_PCT
authorId
1
3

[Diagram label: Backend DBMS]
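Completeness is stated per predicate: if a city is listed in the completeness control table, every master row with that city should be in the cache. A minimal sketch of that check follows, assuming rows are plain dictionaries and using a hypothetical function name `incomplete_values`; the sample data mirrors the author rows used on these slides.

```python
def incomplete_values(master_rows, cached_rows, control_values,
                      key="city", id_col="authorId"):
    """Return control-key values for which some master row is missing from the cache."""
    cached_ids = {row[id_col] for row in cached_rows}
    missing = {
        row[key]
        for row in master_rows
        if row[key] in control_values and row[id_col] not in cached_ids
    }
    return sorted(missing)

master = [
    {"authorId": 1, "name": "Alice",  "city": "Madison"},
    {"authorId": 2, "name": "Bob",    "city": "Madison"},
    {"authorId": 3, "name": "Cedric", "city": "Seattle"},
]
cached = [master[0], master[2]]                          # authors 1 and 3 only

print(incomplete_values(master, cached, {"Madison"}))    # ['Madison']
print(incomplete_values(master, master, {"Madison"}))    # []
```

With only authors 1 and 3 cached, a query asking for all Madison authors could not be answered from the cache, because author 2 is missing.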
17
Presence Correlation Constraint
AuthorCopy
authorId  name    city
1         Alice   Madison
2         Bob     Madison
3         Cedric  Seattle

AuthorList_PCT
authorId
1
2
3

BookCopy
isbn  authorId  title
111   1         aaa
222   1         bbb
333   2         ccc
444   3         ddd
555   3         eee

ALTER VIEW BookCopy ADD PRESENCE ON authorId IN (SELECT authorId FROM AuthorCopy)

[Diagram labels: Backend DBMS; edges labeled authorId]
18
Presence Correlation Constraint
AuthorCopy
authorId  name    city
1         Alice   Madison
2         Bob     Madison
3         Cedric  Seattle

AuthorList_PCT
authorId
1
2
3

BookCopy
isbn  authorId  title
111   1         aaa
222   1         bbb
333   2         ccc
444   3         ddd
555   3         eee

[Cache graph: AuthorList_PCT, AuthorCopy, BookCopy, connected by edges labeled authorId]
19
Consistency Correlation Constraint
AuthorCopy
authorId  name    city
1         Alice   Madison
2         Bob     Madison
3         Cedric  Seattle

AuthorList_PCT
authorId
1
2
3

BookCopy
isbn  authorId  title
111   1         aaa
222   1         bbb
333   2         ccc
444   3         ddd
555   3         eee

ALTER VIEW BookCopy ADD CONSISTENCY ROOT

[Diagram labels: Backend DBMS; edges labeled authorId]
20
Consistency Correlation Constraint
AuthorCopy
authorId  name    city
1         Alice   Madison
2         Bob     Madison
3         Cedric  Seattle

AuthorList_PCT
authorId
1
2
3

BookCopy
isbn  authorId  title
111   1         aaa
222   1         bbb
333   2         ccc
444   3         ddd
555   3         eee

[Cache graph: AuthorList_PCT, AuthorCopy, BookCopy, connected by edges labeled authorId]
21
Cache Schema Example
[Cache graph nodes: AuthorList_PCT, ReviewerList_PCT, AuthorCopy, ReviewerCopy, ReviewCopy, BookCopy; edge labels: authorId, reviewerId, authorId, reviewId, isbn]
22
Roadmap
  • Background
  • Cache data quality properties
  • Cache property specification
  • Enforcing data quality constraints
  • Experiments
  • Future directions and conclusions

23
Changing The Assumptions
Previous assumptions: fully materialized views, consistent views, push-based maintenance
New assumptions: partially materialized views, row-level consistency, pull-based maintenance
  • More general algorithms
  • Run-time check for consistency constraints that
    cannot be validated at compile time

24
Run-time CC Checking
  • When view V matches expression E

[Plan diagram: a ChoosePlan operator with a CC guard chooses between a local plan using V and a remote plan requesting E]
  • Currency guard: checks whether local view V satisfies the
    currency requirement
  • Consistency guard: checks whether local view V satisfies the
    consistency requirement
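The guard logic described on this slide can be sketched as a small decision function. This is a simplified illustration, not the prototype's optimizer code: the `LocalView` class, its fields, and `choose_plan` are hypothetical names, and the consistency check is reduced to a precomputed boolean.

```python
from dataclasses import dataclass
from datetime import timedelta

@dataclass
class LocalView:
    name: str
    staleness: timedelta        # time elapsed since the view's stale point
    region_consistent: bool     # assumed outcome of a run-time consistency check

def choose_plan(view: LocalView, currency_bound: timedelta,
                need_consistency: bool) -> str:
    """Pick the local plan only if both guards pass; otherwise go remote."""
    currency_ok = view.staleness <= currency_bound                   # currency guard
    consistency_ok = view.region_consistent or not need_consistency  # consistency guard
    return "local" if currency_ok and consistency_ok else "remote"

fresh = LocalView("V", timedelta(minutes=8), region_consistent=True)
stale = LocalView("V", timedelta(minutes=12), region_consistent=True)

print(choose_plan(fresh, timedelta(minutes=10), need_consistency=True))   # local
print(choose_plan(stale, timedelta(minutes=10), need_consistency=True))   # remote
```

Because the decision is made at run time, the same compiled plan can serve queries locally while the view is fresh and fall back to the backend once it is not.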
25
Performance Evaluation Goals
  • Consistency guards overhead
  • Simple checks
  • A spectrum of checks ranging from simple to
    complicated

26
Experimental Setting
  • Back-end hosts a TPCD database tpcd1gh with scale
    factor 1.0 (1GB)
  • Cache server has a shadow of tpcd1gh
  • Two local views: custCopy, orderCopy
  • LAN connection between cache and backend server

27
Queries Used
28
Simple Consistency Guards Overhead
[Bar chart: execution time (ms), local vs. remote plans; data labels in extraction order: 1.6, 1.72, 1.66, 1.59, 16.56, 14.00]
29
Single Table Consistency Guard Overhead
(Qa is used)
[Bar chart: execution time (ms), local vs. remote plans; data labels in extraction order: 8.79, 7.48, 2.33, 6.06, 4.95, 71.41, 62.85, 16.98, 58.32, 23.77]
30
Future Directions
  • Adaptive data quality aware caching policies
  • Control-table content?
  • Refresh intervals?
  • Improve current prototype
  • Read-write transactions?
  • Time-line constraints?
  • Automate cache design/tuning
  • How to get a good cache schema? (i.e., cache
    region granularity, assignment)
  • Apply "good enough" to other forms of
    replication
  • Indexing data?

31
Summary
  • Goal: fine-grained data quality-aware cache
    management
  • A comprehensive solution
  • How does the cache track data quality?
  • How does the admin specify cache properties?
  • How to maintain the cache efficiently?
  • How to enforce CC constraints for queries?

So long, and thanks for all the fish!
  • Four cache properties
  • Dynamic cache model
  • Efficient cache maintenance and safety
  • Efficiently enforce CC checking

Questions?
33
Proposed SQL Syntax
SELECT * FROM Books B, Reviews R
WHERE B.bid = R.bid AND B.title = 'Databases'
CURRENCY BOUND 10 min ON (B, R) BY B.bid

Clause annotations: currency bound (10 min), consistency class (ON (B, R)), group by (BY B.bid)

Other forms of the currency clause:
CURRENCY BOUND 10 min ON (B), 30 min ON (R)
CURRENCY BOUND 10 min ON (B, R)

bid  title      author  bid  rid  text
1    databases  Raghu   1    1
1    databases  Raghu   1    2
2    databases  Ullman  2    3
34
Pull-Maintenance
  • Refresh a region by pulling query results
  • When refreshing a region, also refresh the
    affected closure
  • All overlapping regions
  • All correlated regions
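The affected closure of a region can be computed as a graph reachability problem. The sketch below is an assumption-based illustration: region names are hypothetical strings, overlaps are stored symmetrically, correlations are followed in one direction only, and the function name `affected_closure` is not from the slides.

```python
from collections import deque

def affected_closure(start, overlaps, correlated):
    """Regions that must be refreshed together with `start` (breadth-first search)."""
    closure = {start}
    queue = deque([start])
    while queue:
        region = queue.popleft()
        neighbours = overlaps.get(region, set()) | correlated.get(region, set())
        for nxt in neighbours - closure:
            closure.add(nxt)
            queue.append(nxt)
    return sorted(closure)

# Hypothetical regions: two BookCopy regions share a row, so they overlap.
overlaps = {
    "BookCopy[authorId=1]": {"BookCopy[title=aaa]"},
    "BookCopy[title=aaa]": {"BookCopy[authorId=1]"},
}
# Assumption: correlation edges point from the controlling view to the controlled one.
correlated = {"AuthorCopy[authorId=1]": {"BookCopy[authorId=1]"}}

print(affected_closure("AuthorCopy[authorId=1]", overlaps, correlated))
# ['AuthorCopy[authorId=1]', 'BookCopy[authorId=1]', 'BookCopy[title=aaa]']
```

Refreshing one author region here drags in the correlated book region and, through the shared row, the title region as well.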

35
Theoretical Results
  • Definition (Safe partially materialized views)
  • A partially materialized view V is safe if the
    following two conditions hold for every instance
    of the cache that satisfies all integrity
    constraints
  • For any pair of regions in V, either they don't
    overlap or one is contained in the other.
  • If V is gray, let X denote the set of regions in
    V defined by presence control-key values. X is a
    partitioning of V and no pair of regions in X is
    contained in any one region defined on V.
  • Cache schema design rules
  • Rule 1: A cache graph is a DAG.
  • Rule 2: Only red nodes can have independent
    completeness or consistency control-tables.
  • Rule 3: Every PMV with more than one parent must
    be a red circle.
  • Rule 4: If a PMV has the shared-row problem
    according to Lemma 5.2, then it cannot be gray.
  • Rule 5: A PMV cannot have non-compatible
    control-tables.

Syntactically checkable conditions (polynomial)
Property held for every instance
  • Theorem
  • Given a cache schema <W, E>, if it satisfies the
    design rules, then every PMV in W is safe.
    Conversely, if the schema violates one of these
    rules, there is an instance of the cache
    satisfying all specified integrity constraints in
    which some PMV is unsafe.
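Two of the conditions above lend themselves to a direct check on a concrete instance: the first safety condition (any two regions are either disjoint or nested) and Rule 1 (the cache graph is a DAG). The sketch below assumes regions are represented as Python sets of row keys and the graph as a mapping from each node to its parents; both function names are hypothetical. Note that the theorem concerns every instance of a schema, while this only tests one instance.

```python
from graphlib import CycleError, TopologicalSorter   # Python 3.9+
from itertools import combinations

def regions_nested_or_disjoint(regions):
    """First safety condition: every pair of regions is disjoint or nested."""
    return all(
        a.isdisjoint(b) or a <= b or b <= a
        for a, b in combinations(regions, 2)
    )

def is_dag(parents):
    """Rule 1: `parents` maps each node to the set of its parent nodes."""
    try:
        TopologicalSorter(parents).prepare()   # raises CycleError on a cycle
        return True
    except CycleError:
        return False

# Regions as sets of isbn values; the second family overlaps without nesting.
print(regions_nested_or_disjoint([{111, 222, 333}, {444}, {555}]))   # True
print(regions_nested_or_disjoint([{111, 222, 333}, {111, 444}]))     # False

graph = {"AuthorCopy": {"AuthorList_PCT"}, "BookCopy": {"AuthorCopy"}}
print(is_dag(graph))                                                 # True
```

The failing example corresponds to a row shared between two regions that are defined by different control keys.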

36
Pull-Maintenance
BookCopy
isbn  authorId  title
111   1         aaa
222   1         bbb
333   1         ccc
444   3         aaa
555   4         eee

AuthorList_PCT
authorId
1
3
4

TitleList_CsCT
title
aaa

[Diagram edge label: authorId]
37
Pull-Maintenance
AuthorCopy
authorId  name    city
1         Alice   Madison
3         Cedric  Seattle

BookCopy
isbn  authorId  title
111   1         aaa
222   1         bbb
333   1         ccc
444   3         aaa
555   3         eee

[Cache graph: AuthorList_PCT, AuthorCopy, BookCopy, connected by edges labeled authorId]
38
Inefficient Pulling
Shared-row problem

AuthorCopy
authorId  name    city
1         Alice   Madison
3         Cedric  Seattle

AuthorBookCopy
authorId  isbn
1         111
1         222
1         333
3         111
3         555

BookCopy
isbn  price  title
111   10     aaa
222   20     bbb
333   30     ccc
555   50     eee

[Diagram edge labels: authorId, isbn]
39
Issues
  • Inefficient pulling
  • Calculation of the affected closure requires
    checking the rows
  • Efficient pulling
  • The affected closure does NOT depend on the
    instance of a view
  • Only requires forward pull among correlated views

40
Related Work
  • Relaxing data quality
  • Distributed databases
  • Read-only transactions Garcia-Molina et al. 1982
  • Demarcation protocol Barbará et al 1992
  • TACC Yu et al. 2000
  • Epsilon-serializability Pu et al. 1992
  • Warehousing and web views
  • WebViews Labrinidis et al 2003
  • FAS Röhm et al. 2002
  • Obsolescent views Gal 1999
  • Distributed views Segev et al 1990
  • Freshness-driven web caching Li et al 2003
  • Replica management
  • Quasi-copies Alonso et al. 1998, Gallersdörfer
    et al. 1995
  • Good-enough views Seligman et al. 1997
  • TRAPP Olston et al. 2000
  • Caching
  • Database caching
  • DBCache Altinel et al. 2003
  • Constraint-based database caching Härder et al.
    2004
  • Mid-Tier caching TimesTen 2002
  • Shared-storage caching Khalil et al 2002
  • Others
  • Semantic caching Dar et al 1996
  • Cache in Postgres Stonebraker et al 1990
  • Predicate-based caching Keller et al 1996
  • WATCHMAN Scheuermann et al 1996
  • Cache investment Kossmann et al 2000
  • DECAF Kiernan et al 2000
  • Proxy caching Luo et al 2001
  • Uniqueness of our approach (query-centric)
  • Query: specifies fine-grained CC constraints
  • Admin: flexible local data quality control in
    terms of granularity and properties
  • Caching DBMS: provides CC guarantees for
    individual queries