Data Dissemination on the Web - PowerPoint PPT Presentation

1
Data Dissemination on the Web
Krithi Ramamritham, IIT Bombay, krithi@cse.iitb.ernet.in
2
Web Content

Web sites have traditionally served static content
But dynamic content generation has come into vogue
Content is generated on the fly by running dynamic scripts, e.g., Active Server Pages (ASP), JavaServer Pages (JSP), servlets
This allows different content to be generated for the same request

3
Dynamic Web Pages
[Figure: a dynamically generated web page from a news content site]
4
Generic Architecture
[Figure: data sources and end-hosts connected through networks; labels include servers, sensors, wired hosts, and mobile hosts]
5
Coherency of Dynamic Data

Strong coherency
The client and the source are always in sync with each other
Strong coherency is expensive!
Relax strong coherency: Δ-coherency
Time domain: Δt-coherency
The client is never out of sync with the source by more than Δt time units
e.g., traffic data not stale by more than a minute
Value domain: Δv-coherency
The difference in the data values at the client and the source is bounded by Δv at all times
e.g., only interested in temperature changes larger than 1 degree
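The two relaxed coherency conditions can be written as simple predicates. The sketch below is a hypothetical illustration and does not come from the presentation. It assumes each copy of a data item is a `Snapshot` that holds a value and the time, in seconds, at which that value was current at the source.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Hypothetical copy of a data item: its value and the time (in seconds)
    at which that value was current at the source."""
    value: float
    time: float


def delta_t_coherent(source: Snapshot, client: Snapshot, delta_t: float) -> bool:
    """Time domain: the client's copy lags the source by at most delta_t time units."""
    return source.time - client.time <= delta_t


def delta_v_coherent(source: Snapshot, client: Snapshot, delta_v: float) -> bool:
    """Value domain: client and source values differ by at most delta_v."""
    return abs(source.value - client.value) <= delta_v


# Temperature example: only changes larger than 1 degree matter.
print(delta_v_coherent(Snapshot(21.4, 100.0), Snapshot(20.9, 70.0), delta_v=1.0))  # True
# Traffic example: data must not be stale by more than a minute (60 s).
print(delta_t_coherent(Snapshot(0.0, 100.0), Snapshot(0.0, 30.0), delta_t=60.0))  # False
```

Each predicate checks a single instant. To enforce "at all times", the condition has to be re-evaluated whenever the source changes.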

6
Generic Architecture
[Figure: the generic architecture with proxies/caches placed between data sources and end-hosts; labels include servers, sensors, wired host, and mobile host]
7
The Push Approach

Proxy registers the data item of interest and the coherency requirement with the server
Server pushes interesting changes, i.e., those relevant to the registered coherency requirement
Achieves strong consistency
Keeps network overhead to a minimum
-- Poor scalability (the server has to maintain state and keep connections open)
-- Low resiliency
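The push approach can be sketched as a server that stores each proxy's registration and forwards only those changes that would otherwise break the proxy's requirement. This is a minimal, hypothetical sketch that makes a few assumptions. The coherency requirement is a value bound Δv (`delta_v`). A Python callback stands in for an open connection. The names `PushServer` and `Subscription` are illustrative only.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, List

# Hypothetical stand-in for an open connection to a proxy.
PushCallback = Callable[[str, float], None]


@dataclass
class Subscription:
    callback: PushCallback
    delta_v: float       # coherency requirement registered by the proxy
    last_pushed: float   # value the proxy currently holds


class PushServer:
    def __init__(self) -> None:
        # Per-proxy state kept at the server: the root of push's scalability cost.
        self.subscriptions: DefaultDict[str, List[Subscription]] = defaultdict(list)

    def register(self, item: str, delta_v: float, callback: PushCallback,
                 current_value: float) -> None:
        self.subscriptions[item].append(Subscription(callback, delta_v, current_value))

    def update(self, item: str, new_value: float) -> None:
        # Push only the changes that would otherwise violate a proxy's requirement.
        for sub in self.subscriptions[item]:
            if abs(new_value - sub.last_pushed) > sub.delta_v:
                sub.callback(item, new_value)
                sub.last_pushed = new_value


received: List[float] = []
server = PushServer()
server.register("temperature", 1.0, lambda item, v: received.append(v), 20.0)
for value in (20.5, 21.5, 22.0, 23.0):
    server.update("temperature", value)
print(received)  # [21.5, 23.0]
```

With `delta_v = 0` every change is pushed, which corresponds to strong coherency.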

8
The Pull Approach

Proxy pulls after an interval given by
Time to Live (TTL), or
Time To next Refresh (TTR / TNR)
Can be implemented using HTTP
Stateless and hence generally scalable with respect to state space and computation
-- Weak cache consistency
-- Heavy polling for stringent coherency requirements or highly dynamic data
-- Network overheads higher than for Push
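To see why pulling gets expensive for highly dynamic data, here is a hypothetical adaptive-TTR poller. It is a deliberately simple policy: the TTR is halved when the polled value has moved by more than Δv and doubled when it has not, within fixed bounds. It is not a specific published TTR algorithm. The function name, the bounds, and the simulated `fetch(t)` source are all assumptions made for illustration.

```python
from typing import Callable


def poll_with_adaptive_ttr(fetch: Callable[[float], float], delta_v: float,
                           horizon: float, ttr_min: float = 1.0,
                           ttr_max: float = 60.0) -> int:
    """Simulate pulling `fetch(t)` until `horizon` seconds; return the number of polls."""
    t, ttr = 0.0, ttr_min
    cached = fetch(t)          # initial pull
    polls = 1
    while t + ttr <= horizon:
        t += ttr               # wait for the current TTR, then pull again
        value = fetch(t)
        polls += 1
        if abs(value - cached) > delta_v:
            ttr = max(ttr_min, ttr / 2)   # data moving fast: poll more often
        else:
            ttr = min(ttr_max, ttr * 2)   # data quiet: back off
        cached = value
    return polls


print(poll_with_adaptive_ttr(lambda t: 5.0, delta_v=1.0, horizon=300.0))         # 10
print(poll_with_adaptive_ttr(lambda t: 10.0 * t, delta_v=1.0, horizon=300.0))    # 301
```

The static source is polled only 10 times over 300 seconds. The fast-changing source keeps the poller at the minimum TTR throughout, which is the heavy-polling cost noted above.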

9
Typical End-to-end Web Site Architecture
[Figure: a web server cluster in front of an application server cluster, backed by the data tier]
10
WS vs. AS

Web servers
Do well-defined and quantifiable local work
e.g., processing HTTP headers, serving static
content
Application servers
Run multi-layer programs
e.g., scripts involving calls to backends

11
Application Layer Details
[Figure: application layer details, showing servlets]
12
The Problem: Page Generation Delays

Causes of page generation delays include (in addition to pure processing overhead):
Remote database accesses: heavy I/O loads, network delays
XML-HTML transformations: extensive processing delays
Personalization logic: e.g., BroadVision, Vignette
Interaction bottlenecks: e.g., database connection pools
=> serious performance and scalability problems for web sites, due to increased load on the server-side infrastructure

13
Reducing delays

Approaches fall into 3 broad categories
Database caching
Page level caching
Fragment level caching

14
Alternative: CDNs (Content Distribution Networks)
15
Push-Based Core Infrastructure

Resilient and efficient content distribution network (CDN) for dynamic data
Existing CDNs: Akamai, Dynamai

16
Database Caching

Two broad types
Query result caching
Middle tier database caching
caching database tables in main memory

17
Query result caching

Many application server products offer this
feature
Luo et al., 2000 proposed query result caching at Web proxy caches
-- mitigates only local database access latency
-- only a subset of query results may be reused
in page generation
-- page fragments may not all be from databases
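A query result cache can be pictured as a map from a query and its parameters to the rows it returned. This hypothetical sketch uses Python's built-in `sqlite3` as a stand-in for the backend database, and the class name `QueryResultCache` is illustrative. It omits invalidation, which a real cache would need. It also shows the limitation listed above: the database call is saved, but the formatting that turns rows into HTML still runs on every request.

```python
import sqlite3
from typing import Any, Dict, List, Tuple


class QueryResultCache:
    """Hypothetical query result cache keyed on (SQL text, parameters)."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.results: Dict[Tuple[str, Tuple[Any, ...]], List[tuple]] = {}
        self.db_calls = 0  # how often the database was actually queried

    def query(self, sql: str, params: Tuple[Any, ...] = ()) -> List[tuple]:
        key = (sql, params)
        if key not in self.results:
            self.db_calls += 1
            self.results[key] = self.conn.execute(sql, params).fetchall()
        return self.results[key]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO products VALUES (1, 'desk lamp')")
cache = QueryResultCache(conn)
for _ in range(3):
    rows = cache.query("SELECT name FROM products WHERE id = ?", (1,))
    # The HTML formatting below still runs on every request.
    html = "<li>" + rows[0][0] + "</li>"
print(cache.db_calls, html)  # 1 <li>desk lamp</li>
```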

18
Middle tier database caching

Caching database tables in main memory
Oracle 9i Cache
Main-memory databases, e.g., TimesTen
-- mitigates only database access latency
-- caching at table granularity results in poor
cache utilization
-- main-memory databases are difficult to
integrate and maintain and can be expensive

19
Page Level Caching

Dynamically generated HTML pages are cached
Iyengar and Challenger, 1997; Zhu and Yang, 2000
Several commercially available products follow this approach, e.g., SpiderCache, Xcache, Dynamai
Can completely offload work from the web/app server
Low reusability for highly personalized web pages
URL may not uniquely identify a page
-- increasing the risk of delivering incorrect pages
Often introduces excessive invalidations
-- e.g., the whole page is invalidated even if only a single element on it changes
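The page-level drawbacks above can be shown with a hypothetical whole-page cache. If the cache is keyed on the URL alone, one user can be served another user's personalized page. Adding request attributes to the key (here an assumed `user` attribute) fixes correctness but lowers reuse. Invalidation works at page granularity. The names `PageCache` and `greet` are illustrative and are not any product's API.

```python
from typing import Callable, Dict, Mapping, Tuple

PageGenerator = Callable[[str, Mapping[str, str]], str]


class PageCache:
    """Hypothetical whole-page cache keyed on the URL plus selected request attributes."""

    def __init__(self, vary_on: Tuple[str, ...] = ()) -> None:
        self.vary_on = vary_on  # e.g. ("user",) for personalized pages
        self.pages: Dict[Tuple[str, ...], str] = {}

    def _key(self, url: str, attrs: Mapping[str, str]) -> Tuple[str, ...]:
        return (url,) + tuple(attrs.get(name, "") for name in self.vary_on)

    def get_page(self, url: str, attrs: Mapping[str, str], generate: PageGenerator) -> str:
        key = self._key(url, attrs)
        if key not in self.pages:
            self.pages[key] = generate(url, attrs)
        return self.pages[key]

    def invalidate_url(self, url: str) -> None:
        # Page granularity: a change to any element drops every cached copy of the page.
        for key in [k for k in self.pages if k[0] == url]:
            del self.pages[key]


def greet(url: str, attrs: Mapping[str, str]) -> str:
    return "Hello " + attrs["user"]


url_only = PageCache()
url_only.get_page("/home", {"user": "alice"}, greet)
print(url_only.get_page("/home", {"user": "bob"}, greet))  # Hello alice (wrong page)

per_user = PageCache(vary_on=("user",))
print(per_user.get_page("/home", {"user": "bob"}, greet))  # Hello bob
```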

20
Reducing page generation delays

Approaches fall into 3 broad categories
Database caching
Page level caching
Fragment level caching

21
How Dynamic Scripting Works
[Figure: a page generation script executing a sequence of "Write to Out" statements]
22
Code Blocks Perform Work
[Figure: the same page generation script, highlighting the code blocks that perform the work between its "Write to Out" statements]
23
Code Blocks <-> Components
[Figure: each code block of the page generation script writes one component of the web page: an ad component, headline components, a navigation component, and a personalized component (example: news content site)]
Certain components can be cached
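The mapping from code blocks to components suggests caching at fragment granularity. In the hypothetical sketch below, a page generation script writes each code block's output to `Out` (a `StringIO`). Shared components are served from a fragment cache, so their work is bypassed, while the personalized component always runs. This illustrates the idea only. It is not the DCA's API, and every name in it is an assumption.

```python
import io
from typing import Callable, Dict, Optional

CodeBlock = Callable[[dict], str]  # a code block: request -> HTML fragment


class FragmentCache:
    """Hypothetical cache of code block outputs (fragments)."""

    def __init__(self) -> None:
        self.fragments: Dict[str, str] = {}
        self.executions = 0  # number of code blocks actually run

    def write_block(self, out: io.StringIO, block: CodeBlock, request: dict,
                    cache_key: Optional[str]) -> None:
        if cache_key is not None and cache_key in self.fragments:
            out.write(self.fragments[cache_key])  # hit: the block's work is bypassed
            return
        self.executions += 1
        html = block(request)                     # database calls, formatting, ...
        if cache_key is not None:
            self.fragments[cache_key] = html
        out.write(html)


def headline_block(request: dict) -> str:
    return "<div>Headlines</div>"


def personalized_block(request: dict) -> str:
    return "<div>Hi " + request["user"] + "</div>"


def generate_page(cache: FragmentCache, request: dict) -> str:
    out = io.StringIO()  # plays the role of "Out"
    cache.write_block(out, headline_block, request, cache_key="headlines")
    cache.write_block(out, personalized_block, request, cache_key=None)  # per-user
    return out.getvalue()


cache = FragmentCache()
for user in ("alice", "bob"):
    print(generate_page(cache, {"user": user}))
print(cache.executions)  # 3: the headline block ran only once
```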
24
DCA: Our Solution
[Figure: for a request, the page generation script obtains a code block's output from the Dynamic Content Accelerator instead of executing the code block's application logic; work bypassed: database calls, HTML formatting, ...]
25
DCA in a Typical End-to-end Web Site Architecture

A single instance of the DCA serves a rack of
application servers
Application servers communicate with DCA through
a lightweight API

[Figure: the end-to-end architecture (web server cluster, application server cluster, data) with the Dynamic Content Accelerator added]
26
Cache Management

A critical aspect of any caching solution
DCA supports novel cache management strategies
Prediction-based cache replacement
Observation-based cache invalidation

27
Cache Replacement

Prediction-based replacement
fragments with the lowest probability of access are replaced
Least-Likely-to-be-Used (LLU)
Access probabilities based on
current user navigational patterns over the site graph (in the form of clickstreams)
historical user navigational patterns over the site graph (in the form of association rules)

Example association rules (probability of accessing the next fragment):
(News, Sports, Hockey) → Schedules: 20%
(News, Sports, Hockey) → Players: 15%
(News, Sports, Hockey) → Teams: 10% (LLU: replaced first)
(News, Sports, Hockey) → Scores: 55%
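LLU replacement reduces to "evict the fragment with the lowest predicted access probability". The hypothetical sketch below looks up probabilities from association rules keyed on the current clickstream and reuses the example rules above. It makes two simplifying assumptions. Fragments without a matching rule are treated as probability 0. The incoming fragment is always admitted. The class name `LLUCache` is illustrative.

```python
from typing import Dict, Sequence, Tuple

# Association rules: clickstream prefix -> {next fragment: access probability}.
Rules = Dict[Tuple[str, ...], Dict[str, float]]


class LLUCache:
    """Hypothetical Least-Likely-to-be-Used cache driven by association rules."""

    def __init__(self, capacity: int, rules: Rules) -> None:
        self.capacity = capacity
        self.rules = rules
        self.fragments: Dict[str, str] = {}

    def access_probability(self, fragment: str, clickstream: Sequence[str]) -> float:
        # Fragments with no matching rule are treated as least likely (probability 0).
        return self.rules.get(tuple(clickstream), {}).get(fragment, 0.0)

    def put(self, fragment: str, content: str, clickstream: Sequence[str]) -> None:
        if fragment not in self.fragments and len(self.fragments) >= self.capacity:
            victim = min(self.fragments,
                         key=lambda f: self.access_probability(f, clickstream))
            del self.fragments[victim]
        self.fragments[fragment] = content


rules: Rules = {("News", "Sports", "Hockey"): {
    "Schedules": 0.20, "Players": 0.15, "Teams": 0.10, "Scores": 0.55}}
path = ["News", "Sports", "Hockey"]
llu = LLUCache(capacity=3, rules=rules)
for name in ("Schedules", "Players", "Teams", "Scores"):
    llu.put(name, "<div>" + name + "</div>", path)
print(sorted(llu.fragments))  # ['Players', 'Schedules', 'Scores']
```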
28
Cache Invalidation

DCA supports common cache invalidation techniques
Time-based: each cache element is assigned a TTL
Event-based: updates to the database send an invalidation message to the cache
On demand: manual invalidation of selected elements
DCA supports additional invalidation techniques.
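The three common techniques fit into one small interface: a per-entry TTL checked on lookup, plus an `invalidate` call that serves both event-based messages and manual requests. This is a hypothetical sketch, and the class name and clock injection are assumptions. The clock is a parameter so the example runs deterministically with a fake time source.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class InvalidatingCache:
    """Hypothetical cache supporting time-based, event-based and on-demand invalidation."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self.clock = clock
        self.entries: Dict[str, Tuple[str, float]] = {}  # key -> (value, expiry time)

    def put(self, key: str, value: str, ttl: float) -> None:
        self.entries[key] = (value, self.clock() + ttl)

    def get(self, key: str) -> Optional[str]:
        entry = self.entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:  # time-based: the TTL has run out
            del self.entries[key]
            return None
        return value

    def invalidate(self, key: str) -> None:
        # Entry point for event-based messages (e.g. sent when the database is
        # updated) and for manual, on-demand invalidation alike.
        self.entries.pop(key, None)


now = [0.0]  # fake clock so the example is deterministic
cache = InvalidatingCache(clock=lambda: now[0])
cache.put("headline", "<h2>Budget passed</h2>", ttl=30.0)
print(cache.get("headline"))  # <h2>Budget passed</h2>
now[0] = 30.0
print(cache.get("headline"))  # None
```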

29
Cache Invalidation

Other invalidation techniques supported
Observation-based
User-initiated updates are observed in scripts; each such update sends an invalidation message to the cache
Most appropriate for auction sites and online trading sites
Invalidation does not require communication with the databases
Keyword-based
Elements can be associated with keywords, e.g., a retailer may wish to invalidate all seasonal items
Regular expression-based
Elements can be invalidated based on regular
expression matching
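The additional techniques can be sketched on top of a plain key-value cache. Keyword tags map each keyword to the keys it covers. Regular expressions are matched against keys. For observation-based invalidation, the script that performs a user's update also invalidates the affected fragments itself, so no database round trip is needed. Everything here is hypothetical: the key layout (`auction/<id>/...`), the `place_bid` script, and the class name are assumptions for illustration.

```python
import re
from typing import Dict, Iterable, Set


class TaggedCache:
    """Hypothetical cache supporting keyword- and regular-expression-based invalidation."""

    def __init__(self) -> None:
        self.entries: Dict[str, str] = {}
        self.by_keyword: Dict[str, Set[str]] = {}  # keyword -> keys tagged with it

    def put(self, key: str, value: str, keywords: Iterable[str] = ()) -> None:
        self.entries[key] = value
        for kw in keywords:
            self.by_keyword.setdefault(kw, set()).add(key)

    def invalidate_keyword(self, keyword: str) -> None:
        for key in self.by_keyword.pop(keyword, set()):
            self.entries.pop(key, None)

    def invalidate_regex(self, pattern: str) -> None:
        rx = re.compile(pattern)
        for key in [k for k in self.entries if rx.search(k)]:
            del self.entries[key]


def place_bid(cache: TaggedCache, bids: Dict[int, float], auction_id: int,
              amount: float) -> None:
    """Observation-based: the script performing the user's update also
    invalidates the affected fragments, without contacting the database."""
    bids[auction_id] = amount
    cache.invalidate_regex("^auction/" + re.escape(str(auction_id)) + "/")


cache = TaggedCache()
cache.put("item/scarf", "<li>Scarf</li>", keywords=["seasonal"])
cache.put("item/lamp", "<li>Lamp</li>")
cache.put("auction/7/price", "$10")
cache.put("auction/8/price", "$5")
cache.invalidate_keyword("seasonal")  # retailer drops all seasonal items
bids: Dict[int, float] = {}
place_bid(cache, bids, auction_id=7, amount=12.0)
print(sorted(cache.entries))  # ['auction/8/price', 'item/lamp']
```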

30
Other Fragment Level Caching
Application servers (e.g., BEA's WebLogic, IBM's WebSphere) cache fragments produced by JSP scripts

Can offload presentation layer tasks
-- runs in the application server process space => competes for server resources
-- in an application server cluster => multiple cache instances, duplication of content, additional synchronization overhead

31
Other Fragment Level Caching...

The Weave system (VLDB 2000) caches XML fragments, as well as query results and HTML pages
Requires the use of a declarative web site specification language

32
Performance Study

Metric
Average page generation time: the time required to construct an HTML page
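As a rough illustration of the metric, the hypothetical helper below times each call of a page generator and averages the results. The presentation does not say where its timings were taken, so this in-process, wall-clock measurement is an assumption. `render_catalog_page` is a made-up stand-in for a JSP page.

```python
import statistics
import time
from typing import Callable, Iterable


def average_page_generation_time(generate: Callable[[str], str],
                                 requests: Iterable[str]) -> float:
    """Mean wall-clock seconds taken by `generate` to construct each page."""
    durations = []
    for request in requests:
        start = time.perf_counter()
        generate(request)
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations)  # raises StatisticsError if there are no requests


def render_catalog_page(path: str) -> str:  # hypothetical stand-in for a JSP page
    return "<html><body>" + path + "</body></html>"


print(average_page_generation_time(render_catalog_page, ["/c/1", "/c/2", "/c/3"]))
```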

33
Performance Study

Test Site
Fictitious online retail site that allows browsing of a product catalog
Pages generated using JSP scripts
Site content stored in an Oracle database
Database schema based on the Dublin Core Metadata open standard
Contains 200,000 products and 44,000 categories
Each page consists of 3 components, each involving a database call

34
Performance Study

Test Setup
Content database server: Oracle 8.1.6
Web/application server: WebLogic 6.0 running on a cluster of 2 machines
Server machines have 1 GB RAM and dual Pentium III 933 MHz processors, and run Windows 2000 Advanced Server

35
Testing Methodology

DCA compared to 2 middle-tier caching solutions:
Main-memory database: TimesTen used to cache the content database (the entire database is cached; runs on the database server machine)
Application server cache: WebLogic Server JSP caching (WLS Cache)
For both WLS and DCA, 2 (of 3) page components are cached
Usually, DCA runs on a separate machine (512 MB RAM, Pentium III 600 MHz processor, Windows 2000 Advanced Server)

36
Testing Methodology...

Baseline Parameters
Cache size (percentage of fragments that fit into the cache): 75%
Cache replacement policy: LLU for DCA
User load is varied by sending requests from client machines running RadView's WebLOAD
Simulated users navigate the site according to a Zipf 80-20 distribution (i.e., 80% of users follow 20% of navigation links)
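A load generator needs a rule for which link a simulated user follows next. The hypothetical sketch below uses a simplified two-tier 80/20 split: with probability 0.8 it picks uniformly among the first 20% of links, and otherwise it picks among the rest. This reproduces the stated "80% of users follow 20% of links" property but is not a true Zipf distribution, and it is not how WebLOAD is configured. The link names are illustrative.

```python
import random
from typing import Sequence


def choose_link(links: Sequence[str], rng: random.Random) -> str:
    """Pick a navigation link so that about 80% of choices fall on the first 20% of links."""
    hot = max(1, len(links) // 5)          # the "popular" 20% of links
    if rng.random() < 0.8 or hot == len(links):
        return rng.choice(links[:hot])
    return rng.choice(links[hot:])


rng = random.Random(42)  # fixed seed for repeatability
links = ["/category/" + str(i) for i in range(100)]
hot_links = set(links[:20])
picks = [choose_link(links, rng) for _ in range(10_000)]
hot_share = sum(p in hot_links for p in picks) / len(picks)
print(round(hot_share, 2))  # roughly 0.8
```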

37
Page Gen. Times vs. Number of Users
TimesTen vs. DCA: 3x to 9x improvement
TimesTen only mitigates local database access latency; it still requires query processing and formatting operations
38
Page Generation Times...
WLS vs. DCA: 2x to 5x improvement
WLS runs in the application server process space and competes for server resources
WLS utilizes multiple caches, causing redundant caching
DCA runs as a single, standalone logical cache
39
Sensitivity to Cache Size
As expected, performance improves as cache size increases
Since cached elements are typically quite small (e.g., a few hundred bytes), larger cache sizes are feasible in practice
40
Conclusion

Increased use of dynamic page generation technologies
=> increases load on application servers
=> serious performance and scalability problems for e-business sites
DCA (Dynamic Content Acceleration)
=> significantly reduces the load on the server-side infrastructure, allowing e-business sites to scale
=> significantly outperforms existing middle-tier caching solutions

41
IIT Bombay's aAQUA Community Forum
Farmers get information and get their questions answered
-- in the local context
-- in their local language
Capitalizes on existing human and infrastructural resources:
Agri-extension center: KVK, Baramati
NGO: Vigyan Ashram, Pabal
Corporate infrastructure: ITC e-chaupal
Government: MCIT
www.aAQUA.org
42
Access over Low Bandwidth: Resource Optimization
Resource constraints: low/unpredictable bandwidth => disconnected operation/access
Exploit caching and prefetching (through prediction of future needs)
Profiling by user type and location
Data characteristics:
Static data (text, images, land records, photos) can be cached/hoarded
Dynamic data (weather/price information): cached information needs to be refreshed carefully
Continuous media (VoIP, video data): QoS considerations
