Putting the Scalability into Database Scalability Services


1
Scalable Query Result Caching for Web Applications
Bruce Maggs, Carnegie Mellon University and Akamai Technologies
Joint work with Charlie Garrod and Amit Manjhi, and Natassa Ailamaki, Phil Gibbons, Todd Mowry, Chris Olston, and Anthony Tomasic.
2
Load at a Web site varies
3
Load can be unpredictable
4
The provisioning dilemma

Heavily overprovision systems?
Waste resources
Risk loss of availability?
Lose revenue, reputation

5
Static content vs. dynamic content

Static content: changes infrequently
e.g., fixed web pages, images, movies

Dynamic content: often tailored for each request

Execute code
Access DB
Web server
6
Content Delivery Networks scale static content
Internet core
Users
CDN nodes
Content providers
7
CDN Application Services
CDNs can also run applications
Internet
DB
Users

...but for data-intensive dynamic applications, the database server becomes the bottleneck!
8
Our goal: a Database Scalability Service (DBSS)
DBSS
DB
Users
9
Database Scalability Service
users
Content Delivery Network
DBSS
Internet
home server databases
10
Database Scalability Service
users
Web and application servers
DBSS
home server databases
11
Database Scalability Service
client apps
DBSS
Internet
home server databases
12
The challenges for a DBSS

Provide economical, on-demand scalability
New requests must reflect database updates
Provide data privacy for content providers
Should not increase end-user latency

13
One scalability approach: database query result caching

Simple map from queries to query results
Advantages
Easy, well-understood
Compatible with our privacy goals
Problems
Scalability limited by cache miss rate
Hard to keep caches up-to-date

14
Our solution: the Ferdinand DBSS
part of shared cache
local cache

2-tier caching: local and cooperative
All updates sent to home database server

15
Our solution: the Ferdinand DBSS
part of shared cache
local cache

Pub / sub for consistency management
Consistency is slightly relaxed

16
Outline

Need for on-demand scalability
Invalidation mechanism
Security-scalability tradeoff
Reducing latency

17
Addressing consistency

TTL is wasteful
Often refreshes cached data unnecessarily (workloads dominated by reads)
Must set TTL=0 for strong consistency!
Solution: update or invalidate cached data only when affected by updates
Naïve approach: home organizations notify proxy servers of relevant updates → not scalable

Our approach: fully distributed, proxy-to-proxy update notification mechanism
18
Publish / subscribe for cache consistency management

On caching a query Q:
Subscribe to messages for updates that affect Q
On an update U:
Publish U to notify all affected query caches

The challenge: relate the (current) query to (future) possible updates
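
The subscribe-on-cache, publish-on-update protocol can be sketched in a few lines. This is a single-process toy under stated assumptions, not Ferdinand's implementation: the class name `PubSubCache`, its methods, and the use of plain strings as channel names are all hypothetical, and a real deployment would run over a distributed multicast layer.

```python
from collections import defaultdict


class PubSubCache:
    """Toy in-process stand-in for a publish/subscribe consistency layer."""

    def __init__(self):
        self.results = {}                    # query text -> cached result
        self.subscribers = defaultdict(set)  # channel -> subscribed query texts

    def cache_query(self, query, result, channels):
        # On caching Q: store the result and subscribe Q to every channel
        # that carries updates which might affect it.
        self.results[query] = result
        for channel in channels:
            self.subscribers[channel].add(query)

    def publish_update(self, channel):
        # On an update U: notify every query subscribed to U's channel and
        # drop its cached result. Returns the invalidated queries, sorted.
        invalidated = sorted(self.subscribers.pop(channel, set()))
        for query in invalidated:
            self.results.pop(query, None)
        return invalidated
```

Publishing on a channel with no subscribers invalidates nothing, which is how unrelated updates avoid generating consistency work in this sketch.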
19
Our solution: analyze the Web app

Determine which updates affect which queries
If a query and an update are independent, this saves consistency traffic
Query and update templates
E.g. SELECT name FROM emp WHERE salary > ?
UPDATE emp SET dept = ? WHERE id = ?

(? is a value set at run-time)
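
A minimal sketch of a template-level independence test follows. It assumes a deliberately simple rule that is not taken from the slides: a single-table query template and a single-table UPDATE template can conflict only if they name the same table and the update writes a column the query reads. The class and function names are hypothetical, and INSERT and DELETE templates (which affect whole rows) are out of scope here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryTemplate:
    table: str
    columns_read: frozenset      # columns in the SELECT list and WHERE clause


@dataclass(frozen=True)
class UpdateTemplate:
    table: str
    columns_written: frozenset   # columns assigned in the SET clause


def may_conflict(query, update):
    """Return False only when the pair is provably independent."""
    if query.table != update.table:
        return False
    return bool(query.columns_read & update.columns_written)


# The templates from the slide: the query reads name and salary,
# the update writes dept, so no notification is ever needed.
q = QueryTemplate("emp", frozenset({"name", "salary"}))
u_dept = UpdateTemplate("emp", frozenset({"dept"}))
u_salary = UpdateTemplate("emp", frozenset({"salary"}))  # hypothetical extra template
```

Under this rule the slide's example pair is independent, while the hypothetical salary update is not.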
20
Distributed Consistency Mechanism
users
proxy node

Distributed app-level multicast environment,
e.g., Scribe
Forward all updates to backend home servers

21
Configuring Multicast Channels

Key observation: Web applications typically interact with the DB via a small, fixed set of query/update templates (usually 10-100)
Example:
SELECT qty FROM inv WHERE id = ?
UPDATE inv SET qty = ? WHERE id = ?

Templates: a natural way to configure channels
Options: channel-by-query or channel-by-update
22
Channel-by-Query Option

One channel per query template Q: C(Q)
Few subscriptions per cached result
Many invalidation notifications per update

Conflicts determined lazily (upon update)
23
Channel-by-Update Option

One channel per update template U: C(U)
Many subscriptions per cached result
Few invalidation notifications per update

Conflicts determined eagerly (when caching Q)
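
The difference between the two options can be made concrete by counting channels. The sketch below assumes a hypothetical conflict table of (query template, update template) pairs; the template names and helper functions are illustrative only.

```python
# Hypothetical output of the template analysis: pairs that may conflict.
CONFLICTS = {("Q1", "U1"), ("Q1", "U2"), ("Q2", "U1"), ("Q3", "U1")}


def by_query_subscriptions(query):
    # Channel-by-query: a cached result listens only on its own channel.
    return {f"C({query})"}


def by_query_publications(update, conflicts):
    # ...so an update must publish on the channel of every query it may affect.
    return {f"C({q})" for q, u in conflicts if u == update}


def by_update_subscriptions(query, conflicts):
    # Channel-by-update: a cached result listens on every conflicting update's channel.
    return {f"C({u})" for q, u in conflicts if q == query}


def by_update_publications(update):
    # ...so an update publishes on exactly one channel.
    return {f"C({update})"}
```

With this table, caching Q1 needs one subscription under channel-by-query but two under channel-by-update, while update U1 publishes to three channels under the first option and one under the second.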
24
Parameter-Specific Channels

Optimization: consider parameter bindings supplied at runtime. For example:
Q5: SELECT qty FROM inv WHERE id = ?
When issued with id = 29, create an extra parameter-specific channel C(5, 29)
Subscribe to both C(5) and C(5, 29)
Upon update:
If the update affects a single item with id X, send the notification on channel C(5, X)
Saves work if X ≠ 29
Updates affecting multiple items are sent to C(5)
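
The routing rule for parameter-specific channels might look like the following sketch. Channels are modeled as tuples, `(5,)` for C(5) and `(5, 29)` for C(5, 29); the node names and function names are hypothetical.

```python
from collections import defaultdict

subscriptions = defaultdict(set)  # channel tuple -> names of subscribed cache nodes


def subscribe(node, template_id, param):
    # A cached query instance listens on the template-wide channel and on
    # the channel specific to its runtime parameter.
    subscriptions[(template_id,)].add(node)
    subscriptions[(template_id, param)].add(node)


def channel_for_update(template_id, affected_ids):
    # A single-item update goes to the parameter-specific channel;
    # anything else falls back to the template-wide channel.
    if len(affected_ids) == 1:
        return (template_id, affected_ids[0])
    return (template_id,)


def notified(template_id, affected_ids):
    # Nodes that receive the notification for this update.
    return set(subscriptions.get(channel_for_update(template_id, affected_ids), set()))
```

A node caching Q5 with id = 29 is not disturbed by a single-item update to id 30, but it still hears about updates that touch several items.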

25
Recall the DB bottleneck
Argh!
Content provider's home DB server
26
The cache can be a bottleneck too
1 poor, lonely cache
Content provider's home DB server
27
Scalable query caching is hard

Reduces chance of query reuse
Sends extra queries to home server

2 caches
Content provider's home DB server
28
A solution: Ferdinand's 2-tier cache

If any node stores the current result for a query, the cooperative cache stores it too
Each query is sent to the home server at most once between updates
Possible drawbacks:
Complicates consistency management
Checking the cooperative cache might introduce latency
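
The lookup path of a 2-tier cache can be sketched as below. This is a single-process model under assumptions: the class `TwoTierCache` is hypothetical, and hashing the query text to choose a master node merely stands in for a real DHT lookup.

```python
import hashlib


class TwoTierCache:
    def __init__(self, node_names, home_db):
        self.nodes = sorted(node_names)
        self.local = {n: {} for n in self.nodes}   # tier 1: per-node local cache
        self.shared = {n: {} for n in self.nodes}  # tier 2: each node's slice of the cooperative cache
        self.home_db = home_db                     # callable: query text -> result
        self.home_requests = 0

    def master_of(self, query):
        # Stand-in for a DHT lookup: hash the query text to pick its master node.
        digest = hashlib.sha256(query.encode()).digest()
        return self.nodes[int.from_bytes(digest[:4], "big") % len(self.nodes)]

    def lookup(self, node, query):
        if query in self.local[node]:              # local hit
            return self.local[node][query]
        master = self.master_of(query)
        if query not in self.shared[master]:       # cooperative miss: go to the home server
            self.home_requests += 1
            self.shared[master][query] = self.home_db(query)
        result = self.shared[master][query]
        self.local[node][query] = result           # fill the local tier on the way back
        return result
```

In this model, however many nodes ask for the same query, the home server is contacted once until the entry is invalidated, at the price of an extra hop to the master node on a local miss.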

29
Queries with Ferdinand
App server
query Q
Ferdinand
Publish / subscribe
Cooperative caching via a DHT
DBSS nodes
part of shared cache
local cache
Content provider's home DB server
30
Queries with Ferdinand
App server
Ferdinand
Publish / subscribe
Cooperative caching via a DHT
part of shared cache
local cache
Content provider's home DB server
31
Queries with Ferdinand
App server
Ferdinand
Publish / subscribe
Q
Cooperative caching via a DHT
part of shared cache
Q's master node
local cache
Content provider's home DB server
32
Queries with Ferdinand
App server
Ferdinand
Publish / subscribe
Cooperative caching via a DHT
part of shared cache
local cache
Content provider's home DB server
33
Queries with Ferdinand
App server
Ferdinand
Publish / subscribe
Cooperative caching via a DHT
part of shared cache
local cache
Q
Content provider's home DB server
34
Queries with Ferdinand
App server
Ferdinand
Publish / subscribe
Q
Cooperative caching via a DHT
Q
part of shared cache
local cache
Content provider's home DB server
Response
35
Queries with Ferdinand
App server
Q
Ferdinand
Publish / subscribe
Q
Cooperative caching via a DHT
Q
part of shared cache
local cache
Content provider's home DB server
36
Queries with Ferdinand
App server
Ferdinand
Publish / subscribe
Q
Cooperative caching via a DHT
Q
part of shared cache
local cache
Content provider's home DB server
37
Queries with Ferdinand
App server
Ferdinand
Publish / subscribe
Q
Q
Cooperative caching via a DHT
Q
part of shared cache
local cache
Content provider's home DB server
38
Queries with Ferdinand
App server
Ferdinand
Publish / subscribe
Q
Q
Cooperative caching via a DHT
Q
part of shared cache
Response
local cache
Content provider's home DB server
39
Updates with Ferdinand
App server
Ferdinand
update U
Publish / subscribe
Q
Q
Cooperative caching via a DHT
Q
part of shared cache
U
local cache
Response
Content provider's home DB server
40
Consistency for a 2-tier cache
App server
Notify the cooperative cache first
Ferdinand
Publish / subscribe
Q
Q
U
Cooperative caching via a DHT
Q
part of shared cache
local cache
Content provider's home DB server
41
Consistency for a 2-tier cache
App server
and then notify the local caches.
Ferdinand
Publish / subscribe
Q
Q
U
Cooperative caching via a DHT
part of shared cache
local cache
Content provider's home DB server
42
Evaluating Ferdinand's 2-tier cache

3 competing scalability approaches
Benchmarks and metrics
Our evaluation goal
Impact on cache hit rates and scalability
Performance in higher latency environments

43
Competitor 1: 1-tier cache

No cooperative cache
Uses pub / sub like Ferdinand

44
Competitor 2: No cache

CDN-like proxy servers scale the Web and app servers

45
Competitor 3: No proxies

A good baseline to compare against

Argh!
Argh!
Banana?
46
Web application benchmarks

Simulate users as they browse online Web sites
TPC-W bookstore
Browsing mix (5% purchases)
Shopping mix (20% purchases)
RUBiS auction
RUBBoS bulletin board

47
Our scalability metric: WIPS

Web Interactions Per Second
90% of responses must meet a latency threshold
End-user latency for the whole Web request
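
One way to read this metric is sketched below. The rule that scalability is the highest measured WIPS among load levels that still meet the latency target is an assumption made for illustration, as are the function names and the sample numbers.

```python
def meets_latency_target(latencies, threshold, fraction=0.90):
    """True if at least `fraction` of the responses finished within `threshold`."""
    if not latencies:
        return False
    within = sum(1 for t in latencies if t <= threshold)
    return within / len(latencies) >= fraction


def max_sustained_wips(runs, threshold):
    """Highest WIPS among runs that meet the latency target.

    `runs` is a list of (wips, latencies) pairs, one per load level.
    Returns 0.0 if no run qualifies.
    """
    passing = [wips for wips, latencies in runs
               if meets_latency_target(latencies, threshold)]
    return max(passing, default=0.0)
```

A run in which exactly 9 of 10 responses meet the threshold still counts; one with 8 of 10 does not.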

48
Implementation details

Ferdinand: 100% Java
Interface is a JDBC driver
MySQL for home DB
Apache Tomcat as Web / app server
Scribe publish / subscribe
Pastry DHT for cooperative cache

49
Experiments run on Emulab

DBSS nodes also run Web and app server

benchmark node

DBSS node
home database node
50
Miss rates for 1- and 2-tier caches
misses sent to home database server
51
Ferdinand excels on a LAN
52
Ferdinand at higher latencies
53
Ferdinand OK for medium latencies
bookstore browsing mix
54
Scalable consistency matters
55
Outline

Need for on-demand scalability
Invalidation mechanism
Security-scalability tradeoff
Reducing latency

56
Guaranteeing security in a DBSS setting

Limit the ability to observe an application's data by:
DBSS administrator
Unauthorized applications through the DBSS

Security-Scalability tradeoff in the DBSS setting

Analyzing the code helps in managing this tradeoff
57
A simple solution for guaranteeing security

Outsource database scalability
Home server: holds master copies of all data; handles updates directly
No query execution on the DBSS
DBSS: caches query results (read-only); kept consistent by invalidation

All data passing through the DBSS can be encrypted: queries, updates, query results
58
A Simple Example
Table: toys (toy_id, toy_name)
Q1: SELECT toy_id FROM toys WHERE toy_name = 'GI Joe'
U1: DELETE FROM toys WHERE toy_id = 5
The DBSS caches Q1 in front of the home server database
Nothing is encrypted: the DBSS sees Q1's cached result (toy_id = 15), so U1 causes no invalidation
Results are encrypted: the DBSS cannot see Q1's result, so U1 must invalidate Q1
More encryption leads to more invalidations
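
The example's invalidation decision reduces to a small rule, sketched here for the toys table only. The function name and signature are hypothetical; the point is that an encrypted result forces the conservative answer.

```python
def must_invalidate(cached_toy_ids, deleted_toy_id, result_visible):
    """Decide whether a DELETE on toy_id invalidates a cached toy_id result."""
    if not result_visible:
        # Encrypted result: the cache cannot inspect it, so it must
        # conservatively assume the deleted row is part of the result.
        return True
    # Visible result: invalidate only if the deleted row is actually in it.
    return deleted_toy_id in cached_toy_ids
```

With the slide's values (cached result toy_id = 15, deleted toy_id = 5), the visible case keeps the entry and the encrypted case discards it.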
59
Challenge: providing scalability while guaranteeing security
When updates occur, the DBSS needs to invalidate
Application faces a dilemma in what data to encrypt (secure)
More encryption: conservative invalidation, favors security
Less encryption: precise invalidation, favors scalability
Security-scalability tradeoff
60
Opportunity for managing the tradeoff
Not all data is equally sensitive
Data sensitivity:
Extremely sensitive: credit card information (secure at all costs)
Moderately sensitive: inventory records, customer records (care, but worried about scalability impact)
Completely insensitive: bestsellers list (don't care)

But for most data, it is nontrivial to assess:
Data sensitivity
Scalability impact of securing the data

61
Key insight: arbitrary queries and updates are not possible
function get_toy_id (toy_name)
  template = SELECT toy_id FROM toys WHERE toy_name = ?
  query = attach_to_template (template, toy_name)
  execute (query)
62
Data not useful for invalidation: examples
Example 1
Q1: SELECT toy_id FROM toys WHERE toy_name = ?
Q2: SELECT toy_name FROM toys WHERE toy_id = ?
No data is needed for precise invalidation
Example 2
Q1: SELECT toy_id FROM toys WHERE toy_name = ?
U1: DELETE FROM toys WHERE toy_id = ?
Query parameters are not needed for precise invalidation (the query result is needed, though)
63
Security without hurting scalability
Data not needed for invalidation can be secured for free (without hurting scalability)
Scalability Conscious Security Approach [SIGMOD '06]
As a result, the tradeoff has to be managed only over the remaining data
64
Sample experiment methodology

Scalability: max concurrent users with acceptable response times
Security: number of templates with encrypted results

California privacy law determined which data is sensitive
Non-transactional invalidation
Start with a cold cache

65
Benchmark Applications

Bookstore (TPC-W, from UW-Madison)
Online bookseller, a standard web benchmark
Changed the popularity of books
Auction (RUBiS, from Rice)
Modeled after eBay
Bulletin board (RUBBoS, from Rice)
Modeled after Slashdot

Benchmarks model popular websites
66
Security-Scalability Tradeoff
U1: DELETE FROM toys WHERE toy_id = 5
Security
Scalability
X denotes encrypted, visible
67
Magnitude of the security-scalability tradeoff
[Figure: scalability (number of concurrent users supported) for each benchmark application]
68
Security Results
Query data that can be encrypted for free
[Chart over Bboard, Bookstore, and Auction; extracted data labels, in order: 7, 7, 7, 6, 4, 17, 14, 18, 12; one series label survives only as "and result"]
69
Security Results in Detail

Auction: the historical record of user bids was not exposed
Bboard: the rating users give one another based on the quality of their postings
Bookstore: book purchase association rules discovered by the vendor (customers who purchase book A also purchase book B)

70
Scalability Conscious Security Approach (SCSA) to managing the tradeoff
[Figure: scalability (number of concurrent users supported, axis 0 to 900) against security (number of query templates with encrypted results, axis 0 to 30), with points labeled "Nothing encrypted" and "Everything encrypted"]
1. Easy to get either good scalability or good security
2. SCSA presents a shortcut to manage the tradeoff
71
Outline

Need for on-demand scalability
Invalidation mechanism
Security-scalability tradeoff
Reducing latency

72
Contributors to User Latency
Request, high latency
Database
Web server
App server
Response, high latency
Traditional architecture
high latency
Database
DBSS
CDN
DBSS architecture
A single HTTP request → multiple database requests
73
Sample Web Application Code
function find_comments (user_id)
  template = SELECT from_id, body FROM comments WHERE to_id = ?
  query = attach_to_template (template, user_id)
  result = execute (query)
  foreach (row in result)
    print (get_body (row), get_name (get_id (row)))

(N+1) queries are issued because:
Convenient for programmers to abstract database values
No effect in the traditional setting

Found many examples in the benchmark applications
74
Reducing User Latency in a DBSS Setting

Transformations to reduce the number of round-trips
Group execution of queries: MERGING transformation
Overlap execution of queries: NONBLOCKING transformation

Web Application Code
Transformed Code
Procedural program with embedded SQL
Holistic transformations using source-to-source compilers
75
The MERGING Transformation
www.ebay.com
John
Names of users who have posted comments about
John
Content Delivery Network
1 Query

Find user_ids who have made comments
For each user_id, find name of the user

Database Scalability Service
N Queries
High latency
76
The MERGING Transformation
Find names of users who have commented about John
Names of users who have posted comments about
John

SELECT from_id, u.name
FROM comments, users u
WHERE from_id = u.id AND to_id = ?

Find user_ids who have made comments
For each user_id, find name of the user

Assuming a constant cache hit rate, the number of round-trips to the database decreases by a factor of (N+1)
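
The effect of MERGING on the number of queries can be shown with Python's built-in sqlite3 module. The schema follows the comments and users tables used on the slides, but the rows, the function names, and the in-memory database are illustrative assumptions; the transformation described in the talk is applied by a source-to-source compiler, not by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE comments (from_id INTEGER, to_id INTEGER, body TEXT);
    INSERT INTO users VALUES (1, 'John'), (2, 'Ann'), (3, 'Bob');
    INSERT INTO comments VALUES (2, 1, 'great seller'), (3, 1, 'fast shipping');
""")


def commenter_names_unmerged(to_id):
    """One query for the commenter ids, then one more query per commenter."""
    queries = 1
    rows = conn.execute(
        "SELECT from_id FROM comments WHERE to_id = ?", (to_id,)).fetchall()
    names = []
    for (from_id,) in rows:
        queries += 1
        names.append(conn.execute(
            "SELECT name FROM users WHERE id = ?", (from_id,)).fetchone()[0])
    return sorted(names), queries


def commenter_names_merged(to_id):
    """A single join replaces the N+1 separate queries."""
    rows = conn.execute(
        "SELECT from_id, u.name FROM comments, users u "
        "WHERE from_id = u.id AND to_id = ?", (to_id,)).fetchall()
    return sorted(name for _, name in rows), 1
```

Both functions return the same names; with two commenters the unmerged version issues three queries and the merged version issues one.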
77
The NONBLOCKING Transformation
www.amazon.com
John
Home page
Content Delivery Network

Greet user
Get names of related books

Database Scalability Service
High latency
Issue queries concurrently to reduce latency
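
A rough illustration of the NONBLOCKING idea, using a thread pool to overlap two independent queries. The `run_query` function is a hypothetical stand-in that sleeps to mimic a high-latency round-trip; the query names and the 0.2 second delay are made up for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor

DELAY = 0.2  # simulated round-trip time to the database, in seconds


def run_query(name):
    # Stand-in for a remote query; sleeps to mimic network latency.
    time.sleep(DELAY)
    return f"result of {name}"


def home_page_blocking():
    # Each query waits for the previous one: total latency is about 2 * DELAY.
    return [run_query("greet user"), run_query("related books")]


def home_page_nonblocking():
    # Independent queries are issued together, so their latencies overlap
    # and the total is close to a single DELAY.
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(run_query, q)
                   for q in ("greet user", "related books")]
        return [f.result() for f in futures]
```

This only helps when the queries do not depend on each other's results; a query that needs the output of an earlier one still has to wait.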
78
Applicability of the Transformations
Either transformation applies to 25% (Auction), 75% (Bboard), and 50% (Bookstore) of dynamic runtime interactions
79
BBOARD Application Impact on Latency
Average latency in ms
Transformations
Overall latency decreases by 38%; the DBSS-DB latency decreases by 65%
80
Impact of Latency on Scalability
Improved scalability
Scalability
Threshold
Latency curve
Latency
Reduced latency curve
Simultaneous users supported
Reducing latency improves scalability
81
Effect of the Transformations on Scalability
Scalability (number of concurrent users supported)
Applying both transformations yields the best scalability
82
Related work: database scalability for Web applications

Database caching
DBCache, DBProxy, MTCache, NEC Cache Portal,
MySQL
Database replication
many
Database outsourcing
Hacigumus ICDE02, Hacigumus SIGMOD02, Amazon
SimpleDB, Amazon S3

83
Related work: non-DB-oriented Web scalability

Caching Web application output
Challenger INFOCOM99, Challenger ACMTrans05,
Chabbouh and Makpangou iiWAS05
Modifying the application design
Gao IEEETrans05, Wei WWW08

84
Conclusions

Ferdinand's 2-tier cache is very effective compared to a 1-tier cache
Better miss rates and scalability
Pub / sub can manage consistency for a 2-tier cache
Results suggest that neither Ferdinand nor a 1-tier cache should be fully distributed in a high-latency environment without additional techniques
