Putting the Scalability into Database Scalability Services

About This Presentation
Title:

Putting the Scalability into Database Scalability Services

Slides: 85
Provided by: charlie51

Transcript and Presenter's Notes

Title: Putting the Scalability into Database Scalability Services


1
Scalable Query Result Caching for Web Applications
Bruce Maggs, Carnegie Mellon University and Akamai Technologies
Joint work with Charlie Garrod and Amit Manjhi,
and Natassa Ailamaki, Phil Gibbons, Todd Mowry,
Chris Olston, and Anthony Tomasic.
2
Load at a Web site varies
3
Load can be unpredictable
4
The provisioning dilemma
  • Heavily overprovision systems?
  • Waste resources
  • Risk loss of availability?
  • Lose revenue, reputation

5
Static content vs. dynamic content
  • Static: changes infrequently
  • e.g., fixed web pages, images, movies
  • Dynamic: often tailored for each request

Execute code
Access DB
Web Server
6
Content Delivery Networks scale static content
Internet core
Users
CDN nodes
Content providers
7
CDN Application Services
CDNs can also run applications
Internet
DB
Users
  • ...but for data-intensive dynamic applications,
    the database server becomes the bottleneck!
8
Our goal: a Database Scalability Service (DBSS)
DBSS
DB
Users
9
Database Scalability Service
users
Content Delivery Network
DBSS
Internet
home server databases
10
Database Scalability Service
users
Web and application servers
DBSS
home server databases
11
Database Scalability Service
client apps
DBSS
Internet
home server databases
12
The challenges for a DBSS
  • Provide economical, on-demand scalability
  • New requests must reflect database updates
  • Provide data privacy for content providers
  • Should not increase end-user latency

13
One scalability approach: database query result
caching
  • Simple map from queries to query results
  • Advantages
  • Easy, well-understood
  • Compatible with our privacy goals
  • Problems
  • Scalability limited by cache miss rate
  • Hard to keep caches up-to-date

14
Our solution: the Ferdinand DBSS
part of shared cache
local cache
  • 2-tier caching: local and cooperative
  • All updates sent to home database server

15
Our solution: the Ferdinand DBSS
part of shared cache
local cache
  • Pub / sub for consistency management
  • Consistency is slightly relaxed

16
Outline
  • Need for on-demand scalability
  • Invalidation mechanism
  • Security-scalability tradeoff
  • Reducing latency

17
Addressing consistency
  • TTL is wasteful
  • Often refresh cached data unnecessarily
    (workloads dominated by reads)
  • Must set TTL=0 for strong consistency!
  • Solution: update or invalidate cached data only
    when affected by updates
  • Naïve approach: home organizations notify proxy
    servers of relevant updates → not scalable

Our approach: fully distributed,
proxy-to-proxy update notification mechanism
18
Publish / subscribe for cache consistency
management
  • On caching a query Q:
  • Subscribe to messages for updates that affect Q
  • On an update U:
  • Publish U to notify all affected query caches

The challenge: relate (current) query to
(future) possible updates
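
The subscribe-on-cache, publish-on-update rule can be sketched as below. This is a minimal in-process illustration, not Ferdinand's code: `Broker` is a toy stand-in for a multicast system, and `channels_for` is a hypothetical function that maps a query to the channels whose updates may affect it.

```python
from collections import defaultdict


class Broker:
    """Toy in-process stand-in for a publish / subscribe system."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # channel -> subscribed caches

    def subscribe(self, channel, cache):
        if cache not in self.subscribers[channel]:
            self.subscribers[channel].append(cache)

    def publish(self, channel, update):
        for cache in list(self.subscribers[channel]):
            cache.on_update(channel, update)


class QueryCache:
    def __init__(self, broker, channels_for):
        self.broker = broker
        self.channels_for = channels_for    # hypothetical: query -> channel names
        self.results = {}                   # query -> cached result
        self.by_channel = defaultdict(set)  # channel -> queries cached under it

    def put(self, query, result):
        # On caching a query Q: subscribe to updates that may affect Q.
        self.results[query] = result
        for channel in self.channels_for(query):
            self.by_channel[channel].add(query)
            self.broker.subscribe(channel, self)

    def get(self, query):
        return self.results.get(query)

    def on_update(self, channel, update):
        # On an update U: drop every cached query registered on this channel.
        for query in self.by_channel.pop(channel, set()):
            self.results.pop(query, None)
```

Here invalidation is by whole channel; how channels are chosen determines how precise it is.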
19
Our solution: analyze the Web app
  • Determine which updates affect which queries
  • If query and update are independent, saves
    consistency traffic
  • Query and update templates
  • E.g., SELECT name FROM emp WHERE
    salary > ?
  • UPDATE emp SET dept = ?
    WHERE id = ?

(each ? is a value set at run-time)
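
One simple way to decide template independence is a column-overlap test, sketched below. This is a conservative simplification assumed for illustration; the slides do not specify the analysis, and the `QueryTemplate`, `UpdateTemplate`, and `may_affect` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryTemplate:
    table: str
    columns_used: frozenset  # columns in the SELECT list and WHERE clause


@dataclass(frozen=True)
class UpdateTemplate:
    table: str
    columns_written: frozenset  # columns assigned by SET
    whole_row: bool = False     # True for INSERT / DELETE templates


def may_affect(update, query):
    """Conservatively report whether the update can change the query result."""
    if update.table != query.table:
        return False
    if update.whole_row:
        return True
    return bool(update.columns_written & query.columns_used)


# The two templates from the slide: the query reads name and salary,
# the update writes only dept, so they are independent.
q = QueryTemplate("emp", frozenset({"name", "salary"}))
u = UpdateTemplate("emp", frozenset({"dept"}))
```

With these definitions `may_affect(u, q)` is false, so no consistency traffic is needed between this pair.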
20
Distributed Consistency Mechanism
users
proxy node
  • Distributed app-level multicast environment,
    e.g., Scribe
  • Forward all updates to backend home servers

21
Configuring Multicast Channels
  • Key observation: Web applications typically
    interact with DB via a small, fixed set of
    query/update templates (usually 10-100)
  • Example:
  • SELECT qty FROM inv WHERE id = ?
  • UPDATE inv SET qty = ? WHERE id = ?

Templates: a natural way to configure channels
Options: channel-by-query or channel-by-update
22
Channel-by-Query Option
  • One channel per query template Q: C(Q)
  • Few subscriptions per cached result
  • Many invalidation notifications per update

Conflicts determined lazily (upon update)
23
Channel-by-Update Option
  • One channel per update template U: C(U)
  • Many subscriptions per cached result
  • Few invalidation notifications per update

Conflicts determined eagerly (when caching Q)
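
The two channel options can be compared with a small sketch. It assumes a precomputed set `conflicts` of (update template, query template) pairs; the function names and the tuple encoding of channels are hypothetical.

```python
def channels_by_query(conflicts, queries, updates):
    """One channel per query template: C(Q)."""
    # A cached result subscribes only to its own channel ...
    subscribe = {q: [("Q", q)] for q in queries}
    # ... so each update must publish to every query channel it conflicts with.
    publish = {u: [("Q", q) for q in queries if (u, q) in conflicts]
               for u in updates}
    return subscribe, publish


def channels_by_update(conflicts, queries, updates):
    """One channel per update template: C(U)."""
    # A cached result subscribes to every update channel that may affect it ...
    subscribe = {q: [("U", u) for u in updates if (u, q) in conflicts]
                 for q in queries}
    # ... so each update publishes on exactly one channel.
    publish = {u: [("U", u)] for u in updates}
    return subscribe, publish
```

In the first function the conflict check runs when an update is published (lazily); in the second it runs when a result is cached (eagerly).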
24
Parameter-Specific Channels
  • Optimization: consider parameter bindings
    supplied at runtime, for example:
  • Q5: SELECT qty FROM inv WHERE id = ?
  • When issued with id = 29, create extra
    parameter-specific channel C(5, 29)
  • Subscribe to both C(5) and C(5, 29)
  • Upon update:
  • If update affects a single item with id X, send
    notification on channel C(5, X)
  • Saves work if X ≠ 29
  • Updates affecting multiple items sent to C(5)
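
The channel selection for template 5 can be sketched as follows. Channels are encoded as tuples, so C(5) is `(5,)` and C(5, 29) is `(5, 29)`; this encoding and the function names are assumptions made for the example.

```python
def subscription_channels(template_id, param):
    """Channels a cache joins when it stores template_id issued with param."""
    return [(template_id,), (template_id, param)]


def publication_channels(template_id, affected_id=None):
    """Channels an update publishes on.

    affected_id is the single item id the update touches, or None when the
    update affects multiple items.
    """
    if affected_id is None:
        return [(template_id,)]          # broad channel C(template)
    return [(template_id, affected_id)]  # parameter-specific channel


def is_notified(subscribed, published):
    """True if any published channel is one the cache subscribed to."""
    return any(channel in subscribed for channel in published)
```

A cache holding the result for id 29 is not notified by an update to item 7, which is the work saved when X differs from 29.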

25
Recall the DB bottleneck
Argh!
Content provider's home DB server
26
The cache can be a bottleneck too
1 poor, lonely cache
Content provider's home DB server
27
Scalable query caching is hard
  • Reduces chance of query reuse
  • Sends extra queries to home server

2 caches
Content provider's home DB server
28
A solution: Ferdinand's 2-tier cache
  • If any node stores the current result for a
    query, the cooperative cache stores it too
  • Each query sent to home server at most once
    between updates
  • Possible drawbacks
  • Complicates consistency management
  • Checking the cooperative cache might introduce
    latency
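
A lookup through the two tiers can be sketched as below. Plain dictionaries stand in for the local cache and for the DHT-based cooperative cache, and `home_server` is a hypothetical callable that executes a query at the home database; none of these names come from Ferdinand itself.

```python
class TwoTierCache:
    def __init__(self, cooperative, home_server):
        self.local = {}                 # this node's local cache
        self.cooperative = cooperative  # shared dict standing in for the DHT
        self.home_server = home_server  # callable: query -> result

    def lookup(self, query):
        # Tier 1: local cache, no network hop.
        if query in self.local:
            return self.local[query]
        # Tier 2: cooperative cache shared by all nodes.
        if query in self.cooperative:
            result = self.cooperative[query]
        else:
            # Miss in both tiers: ask the home server and share the result.
            result = self.home_server(query)
            self.cooperative[query] = result
        self.local[query] = result
        return result
```

With two nodes sharing one cooperative cache, the second node's miss is served by the cooperative tier, so the home server sees the query only once until it is invalidated.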

29
Queries with Ferdinand
App server
query Q
Ferdinand
Publish / subscribe
Cooperative caching via a DHT
DBSS nodes
part of shared cache
local cache
Content provider's home DB server
30-38
Queries with Ferdinand (animation frames)
[Diagram, repeated across frames with the Q and Response markers moving between components. Labels: App server; Ferdinand; Publish / subscribe; Cooperative caching via a DHT; part of shared cache; Q's master node; local cache; Content provider's home DB server; Response]
39
Updates with Ferdinand
App server
Ferdinand
update U
Publish / subscribe
Q
Q
Cooperative caching via a DHT
Q
part of shared cache
U
local cache
Response
Content provider's home DB server
40
Consistency for a 2-tier cache
App server
Notify the cooperative cache first
Ferdinand
Publish / subscribe
Q
Q
U
Cooperative caching via a DHT
Q
part of shared cache
local cache
Content provider's home DB server
41
Consistency for a 2-tier cache
App server
and then notify the local caches.
Ferdinand
Publish / subscribe
Q
Q
U
Cooperative caching via a DHT
part of shared cache
local cache
Content provider's home DB server
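
The notification order on these two slides can be sketched as below, with dictionaries standing in for the caches. One plausible reason for the order, which the slides do not state, is that a local cache that misses after being invalidated should not be able to refill itself with a stale entry from the cooperative cache.

```python
def invalidate(query, cooperative_cache, local_caches):
    """Remove a query result from both tiers, cooperative tier first."""
    # Step 1: notify the cooperative cache.
    cooperative_cache.pop(query, None)
    # Step 2: then notify every local cache.
    for local_cache in local_caches:
        local_cache.pop(query, None)
```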
42
Evaluating Ferdinand's 2-tier cache
  • 3 competing scalability approaches
  • Benchmarks and metrics
  • Our evaluation goal
  • Impact on cache hit rates and scalability
  • Performance in higher latency environments

43
Competitor 1: 1-tier cache
  • No cooperative cache
  • Uses pub / sub like Ferdinand

44
Competitor 2: No cache
  • CDN-like proxy servers scale the Web and app
    servers

45
Competitor 3: No proxies
  • A good baseline to compare against

Argh!
Argh!
Banana?
46
Web application benchmarks
  • Simulate users as they browse online Web sites
  • TPC-W bookstore
  • Browsing mix (5% purchases)
  • Shopping mix (20% purchases)
  • RUBiS auction
  • RUBBoS bulletin board

47
Our scalability metric: WIPS
  • Web Interactions Per Second
  • 90% of responses must meet a latency threshold
  • End-user latency for the whole Web request
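
A run can be scored against this metric as sketched below. The sketch assumes the slide's figure means 90 percent of responses; the function names, the choice to report zero for a failed run, and any threshold value passed in are hypothetical.

```python
def meets_latency_goal(latencies_s, threshold_s, percent=90):
    """True if at least `percent` of responses finished within the threshold."""
    if not latencies_s:
        return False
    within = sum(1 for t in latencies_s if t <= threshold_s)
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return within * 100 >= percent * len(latencies_s)


def wips(latencies_s, duration_s, threshold_s):
    """Web interactions per second, or 0.0 if the latency goal is missed."""
    if not meets_latency_goal(latencies_s, threshold_s):
        return 0.0
    return len(latencies_s) / duration_s
```

Each latency is the end-user time for a whole Web request, so a run with many fast interactions still counts for nothing if too many responses are slow.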

48
Implementation details
  • Ferdinand: 100% Java
  • Interface is a JDBC driver
  • MySQL for home DB
  • Apache Tomcat as Web / app server
  • Scribe publish / subscribe
  • Pastry DHT for cooperative cache

49
Experiments run on Emulab
  • DBSS nodes also run Web and app server

benchmark node

DBSS node
home database node
50
Miss rates for 1- and 2-tier caches
misses sent to home database server
51
Ferdinand excels on a LAN
52
Ferdinand at higher latencies
53
Ferdinand OK for medium latencies
bookstore browsing mix
54
Scalable consistency matters
55
Outline
  • Need for on-demand scalability
  • Invalidation mechanism
  • Security-scalability tradeoff
  • Reducing latency

56
Guaranteeing security in a DBSS setting
  • Limit ability to observe an application's data
    by:
  • DBSS administrator
  • Unauthorized application through the DBSS
  • Security-Scalability tradeoff in the DBSS setting

Analyzing the code helps in managing this tradeoff
57
A simple solution for guaranteeing security
  • Outsource database scalability
  • Home server: master copies of all data; handles
    updates directly
  • No query execution on the DBSS
  • DBSS caches query results (read-only), kept
    consistent by invalidation

All data passing through the DBSS can be
encrypted: Query, Update, Query results
58
A Simple Example
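toys (toy_id, toy_name)
Q1: SELECT toy_id FROM toys WHERE toy_name = 'GI Joe'
U1: DELETE FROM toys WHERE toy_id = 5
  • Nothing is encrypted: the DBSS sees the cached
    result of Q1 (toy_id = 15), so U1 causes no
    invalidation
  • Results are encrypted: the DBSS cannot inspect
    the cached result of Q1, so U1 must invalidate it

[Diagram: Q1 and U1 pass through the DBSS, whose cache starts empty, to the home server database]
More encryption leads to more invalidations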
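
The invalidation decision in this example can be sketched as below. The function name is hypothetical, and an encrypted result is modeled simply as `None`, meaning the DBSS cannot read it.

```python
def must_invalidate(cached_toy_ids, deleted_toy_id):
    """Decide whether a DELETE by toy_id invalidates a cached list of toy_ids.

    cached_toy_ids is None when the cached result is encrypted.
    """
    if cached_toy_ids is None:
        # Conservative: the result cannot be inspected, so assume a conflict.
        return True
    # Precise: invalidate only if the deleted row appears in the result.
    return deleted_toy_id in cached_toy_ids
```

With the visible result [15], deleting toy_id 5 leaves the cache entry in place; with an encrypted result the same delete forces an invalidation.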
59
Challenge: providing scalability while
guaranteeing security
When updates occur, DBSS needs to invalidate
Application faces a dilemma in what data to
encrypt (secure)
  • More encryption → conservative invalidation
    (favors security)
  • Less encryption → precise invalidation
    (favors scalability)
Security-scalability tradeoff
60
Opportunity for managing the tradeoff
Not all data is equally sensitive
Data sensitivity:
  • Extremely sensitive: credit card information
    (secure at all costs)
  • Moderately sensitive: inventory records, customer
    records (care, but worried about scalability impact)
  • Completely insensitive: bestsellers list
    (don't care)
  • But for most data, nontrivial to assess:
  • Data sensitivity
  • Scalability impact of securing the data

61
Key insight: arbitrary queries and updates not
possible
function get_toy_id (toy_name)
  template = "SELECT toy_id FROM toys WHERE toy_name = ?"
  query = attach_to_template (template, toy_name)
  execute (query)
62
Data not useful for invalidation: examples
Example 1
Q1: SELECT toy_id FROM toys WHERE toy_name = ?
Q2: SELECT toy_name FROM toys WHERE toy_id = ?
No data is needed for precise invalidation
Example 2
Q1: SELECT toy_id FROM toys WHERE toy_name = ?
U1: DELETE FROM toys WHERE toy_id = ?
Query parameters are not needed for precise
invalidation (the query result is needed though)
63
Security without hurting scalability
Data not needed for invalidation
Can secure for free (without hurting
scalability)
Scalability Conscious Security Approach (SIGMOD
06)
As a result,
tradeoff has to be managed only over remaining
data
64
Sample experiment methodology
  • Scalability: max concurrent users with
    acceptable response times
  • Security: templates with encrypted results
  • California Privacy Law determined sensitive data
  • Non-transactional invalidation
  • Start with a cold cache

65
Benchmark Applications
  • Bookstore (TPC-W, from UW-Madison)
  • Online bookseller, a standard web benchmark
  • Changed the popularity of books
  • Auction (RUBiS, from Rice)
  • Modeled after eBay
  • Bulletin board (RUBBoS, from Rice)
  • Modeled after Slashdot

Benchmarks model popular websites
66
Security-Scalability Tradeoff
U1: DELETE FROM toys WHERE toy_id = 5
Security
Scalability
X denotes encrypted, visible
67
Magnitude of Security-Scalability tradeoff
[Chart: scalability (number of concurrent users supported) for each benchmark application]
68
Security Results
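[Chart: query data (and results) that can be encrypted for free, per benchmark (Bboard, Bookstore, Auction). Values in extraction order: 7, 7, 7, 6, 4, 17, 14, 18, 12; the mapping of values to series is not recoverable from the transcript]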
69
Security Results in Detail
  • Auction: the historical record of user bids was
    not exposed
  • Bboard: the rating users give one another based
    on the quality of their postings
  • Bookstore: book purchase association rules
    discovered by the vendor (customers who purchase
    book A also purchase book B)

70
Scalability Conscious Security Approach (SCSA)
to managing the tradeoff
[Chart: scalability (number of concurrent users supported, axis 0-900) versus security (number of query templates with encrypted results, axis 0-30); labeled end points: "Nothing encrypted" and "Everything encrypted"]
1. Easy to get either good scalability or good
security
2. SCSA presents a shortcut to manage
the tradeoff
71
Outline
  • Need for on-demand scalability
  • Invalidation mechanism
  • Security-scalability tradeoff
  • Reducing latency

72
Contributors to User Latency
Request, high latency
Database
Web server
App server
Response, high latency
Traditional architecture
high latency
Database
DBSS
CDN
DBSS architecture
A single HTTP request → multiple database requests
73
Sample Web Application Code
function find_comments (user_id)
  template = "SELECT from_id, body FROM comments WHERE to_id = ?"
  query = attach_to_template (template, user_id)
  result = execute (query)
  foreach (row in result)
    print (get_body (row), get_name (get_id (row)))
  • (N+1) queries are issued because:
  • Convenient for programmers to abstract database
    values
  • No effect in the traditional setting

Found many examples in the benchmark applications
74
Reducing User Latency in a DBSS Setting
  • Transformations to reduce number of round-trips
  • Group execution of queries: MERGING
    transformation
  • Overlap execution of queries: NONBLOCKING
    transformation

Web Application Code
Transformed Code
Procedural program with embedded SQL
Holistic transformations using src-to-src
compilers
75
The MERGING Transformation
www.ebay.com
John
Names of users who have posted comments about
John
Content Delivery Network
1 Query
  • Find user_ids who have made comments
  • For each user_id, find name of the user

Database Scalability Service
N Queries
High latency
76
The MERGING Transformation
Find names of users who have commented about John
Names of users who have posted comments about
John
  • SELECT from_id, u.name
  • FROM comments, users u
  • WHERE from_id = u.id AND to_id = ?

(replaces)
  • Find user_ids who have made comments
  • For each user_id, find name of the user

Assuming a constant cache hit rate, the number of
round-trips to the database decreases by a
factor of (N+1)
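
The effect of MERGING on round-trips can be illustrated with an in-memory SQLite database. The schema follows the `comments` and `users` tables named on the slide, while the sample rows and function names are made up for the example; the slides describe a source-to-source compiler, which this hand-written comparison does not implement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE comments (from_id INTEGER, to_id INTEGER, body TEXT);
    INSERT INTO users VALUES (1, 'John'), (2, 'Ann'), (3, 'Bob');
    INSERT INTO comments VALUES (2, 1, 'Great seller'), (3, 1, 'Fast shipping');
""")


def names_unmerged(conn, user_id):
    """Original pattern: 1 query for the comments, then 1 query per comment."""
    round_trips = 1
    rows = conn.execute(
        "SELECT from_id FROM comments WHERE to_id = ? ORDER BY from_id",
        (user_id,)).fetchall()
    names = []
    for (from_id,) in rows:
        round_trips += 1
        row = conn.execute(
            "SELECT name FROM users WHERE id = ?", (from_id,)).fetchone()
        names.append(row[0])
    return names, round_trips


def names_merged(conn, user_id):
    """Merged pattern: a single join returns the same names in 1 round-trip."""
    rows = conn.execute(
        "SELECT c.from_id, u.name FROM comments c, users u "
        "WHERE c.from_id = u.id AND c.to_id = ? ORDER BY c.from_id",
        (user_id,)).fetchall()
    return [name for _, name in rows], 1
```

For a user with N comments the unmerged version makes N + 1 round-trips and the merged version makes one.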
77
The NONBLOCKING Transformation
www.amazon.com
John
Home page
Content Delivery Network
  • Greet user
  • Get names of related books

Database Scalability Service
High latency
Issue queries concurrently to reduce latency
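
Issuing independent queries concurrently can be sketched with a thread pool. The two query functions are hypothetical stand-ins that sleep to imitate a high-latency round-trip; the slides do not describe how the transformed code is actually generated.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def greet_user(user_id):
    time.sleep(0.05)  # stands in for one high-latency query
    return f"Hello, user {user_id}"


def related_books(user_id):
    time.sleep(0.05)  # stands in for a second, independent query
    return ["Book A", "Book B"]


def home_page(user_id):
    """Start both queries before waiting on either result."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        greeting = pool.submit(greet_user, user_id)
        books = pool.submit(related_books, user_id)
        # The two waits overlap, so the page pays roughly one round-trip.
        return greeting.result(), books.result()
```

This only helps when the queries do not depend on each other's results, as with the greeting and the related-books list.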
78
Applicability of the Transformations
Either transformation applies to 25% (Auction),
75% (Bboard), and 50% (Bookstore) of dynamic
runtime interactions
79
BBOARD Application: Impact on Latency
[Chart: average latency in ms, by transformations applied]
Overall latency decreases by 38%; the DBSS-DB
latency decreases by 65%
80
Impact of Latency on Scalability
[Chart: latency versus simultaneous users supported, showing a latency curve, a reduced latency curve, and a latency threshold; the reduced curve meets the threshold at a higher user count (improved scalability)]
Reducing latency improves scalability
81
Effect of the Transformations on Scalability
[Chart: scalability (number of concurrent users supported)]
Applying both transformations yields the best
scalability
82
Related work: database scalability for Web
applications
  • Database caching
  • DBCache, DBProxy, MTCache, NEC Cache Portal,
    MySQL
  • Database replication
  • many
  • Database outsourcing
  • Hacigumus ICDE02, Hacigumus SIGMOD02, Amazon
    SimpleDB, Amazon S3

83
Related work: non-DB-oriented Web scalability
  • Caching Web application output
  • Challenger INFOCOM99, Challenger ACMTrans05,
    Chabbouh and Makpangou iiWAS05
  • Modifying the application design
  • Gao IEEETrans05, Wei WWW08

84
Conclusions
  • Ferdinand's 2-tier cache is very effective compared
    to a 1-tier cache
  • Better miss rates and scalability
  • Pub / sub can manage consistency for a 2-tier
    cache
  • Results suggest that neither Ferdinand nor a
    1-tier cache should be fully distributed in a
    high-latency environment without additional
    techniques