Caching Solutions to increase availability of Web Content - PowerPoint PPT Presentation

Slides: 41

Provided by: MonicaCR7

1
Caching Solutions to increase availability of Web Content
Krithi Ramamritham, IIT Bombay, krithi_at_cse.iitb.ernet.in
2
Web Content

Web sites have traditionally served static content
But dynamic content generation has come into vogue
Generated on the fly by running dynamic scripts, e.g., Active Server Pages (ASP), Java Server Pages (JSP), Servlets
Allows generation of different content for the same request

3
Dynamic Web Pages
Web Page
A News content site
4
IIT Bombay's aAQUA Community Forum
Farmers get information and get their questions answered
-- In the local context
-- In their local language
Capitalizes on existing human and infrastructural resources
Agri-extension center: KVK, Baramati
NGO: Vigyan Ashram, Pabal
Corporate infrastructure: ITC e-chaupal
Government: MCIT
www.aAQUA.org
5
Typical End-to-end Web Site Architecture
[Figure: web server cluster, application server cluster, data]
6
WS vs. AS

Web servers
Do well-defined and quantifiable local work, e.g., processing HTTP headers, serving static content
Application servers
Run multi-layer programs, e.g., scripts involving calls to backends

7
Application Layer Details
Servlets
8
The Problem: Page Generation Delays

Causes of page generation delays include (in addition to pure processing overhead)
Remote database accesses: heavy I/O loads, network delays
XML-HTML transformations: extensive processing delays
Personalization logic: e.g., Broadvision, Vignette, etc.
Interaction bottlenecks: e.g., database connection pools
=> serious performance and scalability problems for web sites due to increased load on server-side infrastructure

9
Access over Low Bandwidth: Resource Optimization
Resource constraints: low/unpredictable bandwidth => disconnected operation/access
Exploit caching, prefetching (through prediction of future needs)
Profiling: by user type, location
Data characteristics
Static data (text, images, land records, photos) can be cached/hoarded
Dynamic data (weather/price information): cached info needs to be refreshed carefully
Continuous media (VoIP, video data): QoS considerations
10
Reducing delays

Approaches fall into 3 broad categories
Database caching
Page level caching
Fragment level caching

11
Alternative: CDNs
Content Distribution Networks
12
Push Based Core Infrastructure

Resilient and efficient content distribution network (CDN) for dynamic data.
Existing CDNs: Akamai, Dynamai

13
Generic Architecture
[Figure: data sources (sensors, servers) connected through the network to end-hosts (wired hosts, mobile hosts)]
14
Generic Architecture
[Figure: data sources (sensors, servers) connected through the network and proxies/caches to end-hosts (wired host, mobile host)]
15
The Push Approach

Proxy registers the data item of interest and the coherency requirement with the server
Server pushes interesting changes
Achieves strong consistency
Keeps network overhead to a minimum
-- Poor scalability (server has to maintain state and keep connections open)
-- Low resiliency
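
The push approach above can be sketched as follows. This is only an illustration under stated assumptions: the class names `PushServer` and `Proxy`, the numeric data items, and the reading of the "coherency requirement" as an absolute tolerance are all hypothetical and not taken from the slides.

```python
class Proxy:
    """Hypothetical proxy that stores whatever the server pushes to it."""

    def __init__(self):
        self.cache = {}
        self.messages = 0  # number of pushes received

    def receive(self, item, value):
        self.cache[item] = value
        self.messages += 1


class PushServer:
    """Hypothetical server that keeps per-proxy state (the scalability cost)."""

    def __init__(self):
        self.values = {}         # item -> current value at the source
        self.subscriptions = {}  # item -> list of [proxy, tolerance, last_pushed]

    def register(self, proxy, item, tolerance):
        # The proxy states its coherency requirement when it registers.
        current = self.values.get(item, 0.0)
        self.subscriptions.setdefault(item, []).append([proxy, tolerance, current])
        proxy.receive(item, current)

    def update(self, item, new_value):
        self.values[item] = new_value
        for sub in self.subscriptions.get(item, []):
            proxy, tolerance, last_pushed = sub
            # Push only "interesting" changes: those that exceed the tolerance.
            if abs(new_value - last_pushed) >= tolerance:
                proxy.receive(item, new_value)
                sub[2] = new_value


server = PushServer()
server.values["price"] = 100.0
proxy = Proxy()
server.register(proxy, "price", tolerance=1.0)
server.update("price", 100.4)  # within tolerance: nothing is sent
server.update("price", 101.2)  # violates tolerance: pushed to the proxy
```

The `subscriptions` table is the state the slide refers to: it grows with the number of proxies and items, which is where the scalability concern comes from.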

16
The Pull Approach

Proxy pulls after
Time to Live (TTL)
Time To next Refresh (TTR / TNR)
Can be implemented using the HTTP protocol
Stateless and hence generally scalable with respect to state space and computation
Weak cache consistency
Heavy polling for stringent coherence requirements or highly dynamic data
Network overheads higher than for Push
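
A minimal sketch of the pull approach with a fixed TTL, assuming a hypothetical `PullProxy` class and a `fetch` callable that stands in for an HTTP request to the server. The clock is passed in explicitly so the behavior is easy to follow.

```python
class PullProxy:
    """Hypothetical proxy that re-fetches an item once its TTL has expired."""

    def __init__(self, fetch, ttl):
        self.fetch = fetch  # callable(item) -> value; stands in for an HTTP GET
        self.ttl = ttl      # time to live, in the same units as `now`
        self.cache = {}     # item -> (value, time_fetched)
        self.polls = 0      # number of requests sent to the server

    def get(self, item, now):
        entry = self.cache.get(item)
        if entry is None or now - entry[1] >= self.ttl:
            value = self.fetch(item)
            self.polls += 1
            self.cache[item] = (value, now)
            return value
        # Served from cache; may be stale if the source changed since the fetch.
        return entry[0]


source = {"price": 100}
proxy = PullProxy(fetch=source.__getitem__, ttl=10)
first = proxy.get("price", now=0)   # cache miss: polls the server
source["price"] = 105               # the source changes
stale = proxy.get("price", now=5)   # TTL not expired: old value is served
fresh = proxy.get("price", now=10)  # TTL expired: polls again
```

The server keeps no per-proxy state here, but the read at `now=5` returns an outdated value, which is the weak consistency the slide mentions. Shrinking the TTL tightens coherence at the cost of more polls.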

17
Database Caching

Two broad types
Query result caching
Middle tier database caching
caching database tables in main memory

18
Query result caching

Many application server products offer this feature
[Luo et al., 2000] proposed query result caching at Web proxy caches
-- mitigates only local database access latency
-- only a subset of query results may be reused in page generation
-- page fragments may not all be from databases
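
As an illustration of query result caching, here is a small sketch using Python's standard `sqlite3` module with an in-memory database. The `QueryResultCache` class, the `products` table, and its rows are invented for the example; they are not the products or the proxy-based design cited above.

```python
import sqlite3


class QueryResultCache:
    """Hypothetical cache keyed on the exact SQL text and its parameters."""

    def __init__(self, conn):
        self.conn = conn
        self.results = {}
        self.hits = 0
        self.misses = 0

    def query(self, sql, params=()):
        key = (sql, tuple(params))
        if key in self.results:
            self.hits += 1
            return self.results[key]
        self.misses += 1
        rows = self.conn.execute(sql, params).fetchall()
        self.results[key] = rows
        return rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER, name TEXT, category TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [(1, "puck", "hockey"), (2, "stick", "hockey"), (3, "ball", "tennis")],
)
cache = QueryResultCache(conn)
sql = "SELECT name FROM products WHERE category = ? ORDER BY id"
first = cache.query(sql, ("hockey",))   # miss: runs against the database
second = cache.query(sql, ("hockey",))  # hit: database is not touched
```

Only the database round trip is saved. The page script still has to format the rows into HTML on every request, and a query with different parameters gets no benefit from the cached result.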

19
Middle tier database caching

Caching database tables in main memory
Oracle 9i Cache
Main-memory databases, e.g., TimesTen
-- mitigates only database access latency
-- caching at table granularity results in poor cache utilization
-- main-memory databases are difficult to integrate and maintain and can be expensive

20
Page Level Caching

Dynamically generated HTML pages are cached
[Iyengar & Challenger, 1997], [Zhu & Yang, 2000]
Several commercially available products follow this approach, e.g., SpiderCache, Xcache, Dynamai
Can completely offload work from web/app server
Low reusability for highly personalized web pages
URL may not uniquely identify a page
-- increasing the risk of delivering incorrect pages
Often introduces excessive invalidations
-- e.g., the whole page is invalidated even if a single element on it changes
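
The risk that a URL does not uniquely identify a page can be shown with a toy page cache. Everything here is hypothetical (`PageCache`, `render_page`, the user names); the sketch only contrasts a URL-only cache key with a key that also includes the user.

```python
def render_page(url, user):
    """Stand-in for a dynamic script that personalizes its output."""
    return f"<html><body>Hello {user}, you requested {url}</body></html>"


class PageCache:
    """Hypothetical page-level cache with a pluggable cache-key function."""

    def __init__(self, key_fn):
        self.key_fn = key_fn
        self.pages = {}

    def get(self, url, user):
        key = self.key_fn(url, user)
        if key not in self.pages:
            self.pages[key] = render_page(url, user)
        return self.pages[key]


url_only = PageCache(key_fn=lambda url, user: url)
url_only.get("/home", "alice")
wrong = url_only.get("/home", "bob")      # returns the page built for alice

per_user = PageCache(key_fn=lambda url, user: (url, user))
per_user.get("/home", "alice")
right = per_user.get("/home", "bob")      # correct, but nothing was reused
```

Keying on the user fixes correctness but stores one full page per user, which is the low reusability for personalized pages noted above.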

21
Reducing page generation delays

Approaches fall into 3 broad categories
Database caching
Page level caching
Fragment level caching

22
How Dynamic Scripting Works
[Figure: page generation script with successive "Write to Out" steps]
23
Code Blocks Perform Work
[Figure: page generation script in which code blocks perform work between "Write to Out" steps]
24
Code Blocks <-> Components
[Figure: code blocks of the page generation script mapped to components of the web page (ad, headline, navigation, and personalized components); example: news content site]
Certain components can be cached
25
DCA: Our Solution
[Figure: a request runs the page generation script; the Dynamic Content Accelerator supplies cached code block output, so the work inside the code block (application logic, database calls, HTML formatting) is bypassed]
26
DCA in a Typical End-to-end Web Site Architecture

A single instance of the DCA serves a rack of
application servers
Application servers communicate with DCA through
a lightweight API
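
The slides do not show the DCA API, so the following is only a guess at the general shape of fragment-level caching: a script wraps each cacheable code block, and the block runs only on a cache miss. The names `FragmentCache`, `get_or_generate`, and `build_page` are hypothetical.

```python
class FragmentCache:
    """Hypothetical stand-in for a shared fragment cache; not the DCA API."""

    def __init__(self):
        self.fragments = {}
        self.generated = 0  # how many times a code block actually ran

    def get_or_generate(self, fragment_id, code_block):
        if fragment_id not in self.fragments:
            # Cache miss: run the code block (database calls, formatting, ...).
            self.fragments[fragment_id] = code_block()
            self.generated += 1
        # Cache hit: the code block's work is bypassed entirely.
        return self.fragments[fragment_id]


def build_page(cache, user):
    """Page generation script: two cacheable fragments, one personalized."""
    out = []
    out.append(cache.get_or_generate("nav", lambda: "<nav>News | Sports</nav>"))
    out.append(cache.get_or_generate("headlines", lambda: "<ul><li>Headline</li></ul>"))
    out.append(f"<p>Welcome back, {user}</p>")  # personalized, never cached
    return "".join(out)


cache = FragmentCache()
page_for_alice = build_page(cache, "alice")  # both fragments are generated
page_for_bob = build_page(cache, "bob")      # both fragments come from cache
```

Unlike a page-level cache, the shared fragments are reused across users while the personalized part is still generated per request.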

[Figure: web server cluster, application server cluster, Dynamic Content Accelerator, data]
27
Cache Management

A critical aspect of any caching solution
DCA supports novel cache management strategies
Prediction-based cache replacement
Observation-based cache invalidation

28
Cache Replacement

Prediction-based replacement
Fragments having the lowest probability of access are replaced
Least-Likely-to-be-Used (LLU)
Access probabilities based on
Current user navigational patterns over site graph (in the form of clickstreams)
Historical user navigational patterns over site graph (in the form of association rules)

(News, Sports, Hockey) -> Schedules 20%
(News, Sports, Hockey) -> Players 15%
(News, Sports, Hockey) -> Teams 10% (LLU)
(News, Sports, Hockey) -> Scores 55%
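
A rough sketch of Least-Likely-to-be-Used replacement, reading the four figures in the example rules as percentages (an assumption, since the percent signs are missing in the transcript). The `LLUCache` class is hypothetical, and the access probabilities are simply passed in rather than derived from clickstreams or association rules.

```python
class LLUCache:
    """Hypothetical cache that evicts the fragment least likely to be accessed."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.fragments = {}

    def put(self, fragment_id, content, access_probability):
        """Insert a fragment; return the evicted fragment id, or None.

        access_probability maps fragment_id -> predicted probability of access.
        """
        evicted = None
        if fragment_id not in self.fragments and len(self.fragments) >= self.capacity:
            evicted = min(self.fragments, key=lambda f: access_probability.get(f, 0.0))
            del self.fragments[evicted]
        self.fragments[fragment_id] = content
        return evicted


# Predicted next-page probabilities for a user at (News, Sports, Hockey).
probabilities = {"Schedules": 0.20, "Players": 0.15, "Teams": 0.10, "Scores": 0.55}

cache = LLUCache(capacity=3)
cache.put("Schedules", "<div>schedules</div>", probabilities)
cache.put("Players", "<div>players</div>", probabilities)
cache.put("Teams", "<div>teams</div>", probabilities)
victim = cache.put("Scores", "<div>scores</div>", probabilities)  # cache is full
```

With these numbers the victim is the Teams fragment, the one with the lowest predicted probability. A fuller version would also decide whether the incoming fragment is worth caching at all; this sketch always admits it.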
29
Cache Invalidation

DCA supports common cache invalidation techniques
Time-based: each cache element is assigned a TTL
Event-based: updates to the database send an invalidation message to the cache
On demand: manual invalidation of selected elements
DCA supports additional invalidation techniques.
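
Event-based invalidation can be sketched with a toy data store that notifies the cache on every update. The `Database` and `Cache` classes and the convention that a row key doubles as the cache key are assumptions made for the example.

```python
class Cache:
    """Hypothetical cache that accepts invalidation messages by key."""

    def __init__(self):
        self.elements = {}

    def put(self, key, content):
        self.elements[key] = content

    def invalidate(self, key):
        # Dropping a missing key is harmless, so messages can be sent blindly.
        self.elements.pop(key, None)


class Database:
    """Toy data store that sends a message to listeners on every update."""

    def __init__(self):
        self.rows = {}
        self.listeners = []

    def on_update(self, listener):
        self.listeners.append(listener)

    def update(self, key, value):
        self.rows[key] = value
        for listener in self.listeners:
            listener(key)


cache = Cache()
db = Database()
db.on_update(cache.invalidate)         # wire database updates to the cache
cache.put("price:42", "<b>10.00</b>")
db.update("price:42", 12.00)           # the cached element is invalidated
```

On-demand invalidation corresponds to calling `cache.invalidate` by hand for selected keys.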

30
Cache Invalidation

Other invalidation techniques supported
Observation-based
User-initiated updates are observed in scripts; each such update sends an invalidation message to the cache
Most appropriate for auction sites, online trading sites
Invalidation does not require communication with the databases
Keyword-based
Elements can be associated with keywords, e.g., a retailer may wish to invalidate all seasonal items
Regular expression-based
Elements can be invalidated based on regular expression matching
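
Keyword-based and regular expression-based invalidation might look like the sketch below. The `InvalidatingCache` class, the element ids, and the choice to match regular expressions against element ids are assumptions; the slides do not say what the expressions are matched against.

```python
import re


class InvalidatingCache:
    """Hypothetical cache supporting keyword and regex invalidation."""

    def __init__(self):
        self.elements = {}  # element_id -> content
        self.keywords = {}  # element_id -> set of keywords

    def put(self, element_id, content, keywords=()):
        self.elements[element_id] = content
        self.keywords[element_id] = set(keywords)

    def _drop(self, ids):
        for element_id in ids:
            del self.elements[element_id]
            del self.keywords[element_id]
        return sorted(ids)

    def invalidate_keyword(self, keyword):
        # Collect matches first so the dicts are not mutated while iterating.
        return self._drop([e for e, kws in self.keywords.items() if keyword in kws])

    def invalidate_regex(self, pattern):
        regex = re.compile(pattern)
        return self._drop([e for e in self.elements if regex.search(e)])


cache = InvalidatingCache()
cache.put("catalog/skis", "<div>skis</div>", keywords=["seasonal"])
cache.put("catalog/boots", "<div>boots</div>", keywords=["seasonal"])
cache.put("catalog/socks", "<div>socks</div>")
cache.put("news/today", "<div>news</div>")

seasonal = cache.invalidate_keyword("seasonal")  # all seasonal items at once
news = cache.invalidate_regex(r"^news/")         # everything under news/
```

Observation-based invalidation would call one of these methods from the script that handles a user's update (for example, placing a bid), so no message from the database is needed.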

31
Other Fragment Level Caching
App servers (e.g., BEA's WebLogic, IBM's WebSphere) cache fragments produced by JSP scripts

Can offload presentation layer tasks
Runs in the application server process space
=> competes for server resources
Application server cluster
=> multiple cache instances, duplication of content, additional synchronization overhead

32
Other Fragment Level Caching.

Weave system [VLDB 2000] caches XML fragments, as well as query results and HTML pages
Requires use of a declarative web site specification language

33
Performance Study

Metric: average page generation time, i.e., the time required to construct an HTML page

34
Performance Study

Test Site
Fictitious online retail site that allows browsing of a product catalog
Pages generated using JSP scripts
Site content stored in an Oracle database
Database schema based on the Dublin Core Metadata Open Standard
Contains 200,000 products and 44,000 categories
Each page consists of 3 components, each involving a database call

35
Performance Study

Test Setup
Content Database Server: Oracle 8.1.6
Web/Application Server: WebLogic 6.0 running on a cluster of 2 machines
Server machines have 1 GB RAM and dual Pentium III 933 MHz processors, and run Windows 2000 Advanced Server

36
Testing Methodology

DCA compared to 2 middle tier caching solutions
Main Memory Database: TimesTen used to cache the content database (entire database is cached, runs on database server machine)
Application Server Cache: WebLogic Server JSP caching (WLS Cache)
For both WLS and DCA, 2 (of 3) page components are cached
Usually, DCA runs on a separate machine (512 MB RAM, Pentium III 600 MHz processor, running Windows 2000 Advanced Server)

37
Testing Methodology...

Baseline Parameters
Cache size, i.e., percentage of fragments that fit into cache: 75%
Cache replacement policy: LLU for DCA
User load is varied by sending requests from client machines running RadView's WebLOAD
Simulated users navigate site according to Zipf 80-20 distribution (i.e., 80% of users follow 20% of navigation links)
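
To make the 80-20 navigation pattern concrete, here is a simplified two-tier sketch: 80% of clicks go to the first 20% of links, the rest are spread over the remaining links. This is an approximation chosen for illustration, not a true Zipf distribution and not how the load-testing tool was configured; `pick_link` and the link names are hypothetical.

```python
import random


def pick_link(links, rng, hot_fraction=0.2, hot_probability=0.8):
    """Pick a link so the first hot_fraction of links get hot_probability of clicks."""
    hot_count = max(1, int(len(links) * hot_fraction))
    # If every link is "hot" there is no cold set to choose from.
    if hot_count == len(links) or rng.random() < hot_probability:
        return rng.choice(links[:hot_count])
    return rng.choice(links[hot_count:])


rng = random.Random(0)  # fixed seed so a run is repeatable
links = [f"/link{i}" for i in range(10)]
clicks = [pick_link(links, rng) for _ in range(10000)]
hot_share = sum(click in links[:2] for click in clicks) / len(clicks)
```

Over many simulated clicks `hot_share` settles close to 0.8. A skewed workload like this is what makes caching a fraction of the fragments effective.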

38
Page Gen. Times vs. Number of Users
TimesTen vs. DCA: 3x to 9x improvement with DCA
TimesTen only mitigates local database access latency; it still requires query processing and formatting operations
39
Page Generation Times...
WLS vs. DCA: 2x to 5x improvement with DCA
WLS runs in application server process space, competes for server resources
WLS utilizes multiple caches, causing redundant caching
DCA runs as a single, standalone logical cache
40
Sensitivity to Cache Size
As expected, performance improves as cache size increases
Since cached elements are typically quite small (e.g., a few hundred bytes), larger cache sizes are feasible in practice
41
Conclusion

Increased use of dynamic page generation technologies
=> increases load on application servers
=> serious performance and scalability problems for e-business sites
DCA (Dynamic Content Acceleration)
=> significantly reduces the load on the server-side infrastructure, allows e-business sites to scale
=> significantly outperforms existing middle tier caching solutions