Title: How to live with low/intermittent bandwidth/connectivity
1 How to live with low/intermittent bandwidth/connectivity
Krithi Ramamritham, IIT Bombay, krithi_at_cse.iitb.ernet.in
2 Web Content
- Web sites have traditionally served static content
- But dynamic content generation has come into vogue
- generated on the fly by running dynamic scripts, e.g., Active Server Pages (ASP), Java Server Pages (JSP), Servlets
- allows generation of different content for the same request
3 Dynamic Web Pages
[Figure: web page of a news content site]
4 Generic Architecture
[Figure: data sources (sensors, servers) connected through the network to end-hosts (wired hosts, mobile hosts)]
5 Coherency of Dynamic Data
- Strong coherency
- The client and source are always in sync with each other
- Strong coherency is expensive!
- Relax strong coherency: Δ-coherency
- Time domain: Δt-coherency
- The client is never out of sync with the source by more than Δt time units
- e.g., traffic data not stale by more than a minute
- Value domain: Δv-coherency
- The difference in the data values at the client and the source is bounded by Δv at all times
- e.g., only interested in temperature changes larger than 1 degree
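The two relaxed coherency definitions can be written as simple predicates. The following is a minimal sketch, not part of the original slides; the names `CachedItem`, `time_coherent`, and `value_coherent` are hypothetical, and the time check is conservative (a copy no older than Δt certainly satisfies Δt-coherency).

```python
from dataclasses import dataclass


@dataclass
class CachedItem:
    """A client-side copy of a data item (hypothetical structure)."""
    value: float        # value last copied from the source
    fetched_at: float   # time, in seconds, at which the copy was taken


def time_coherent(item: CachedItem, now: float, delta_t: float) -> bool:
    """Conservative Δt check: the copy is at most delta_t time units old."""
    return now - item.fetched_at <= delta_t


def value_coherent(item: CachedItem, source_value: float, delta_v: float) -> bool:
    """Δv check: the copy differs from the source by at most delta_v."""
    return abs(source_value - item.value) <= delta_v


# Traffic data fetched at t=0 must not be stale by more than a minute.
traffic = CachedItem(value=42.0, fetched_at=0.0)
fresh_enough = time_coherent(traffic, now=45.0, delta_t=60.0)
too_old = time_coherent(traffic, now=90.0, delta_t=60.0)

# Only temperature changes larger than 1 degree are of interest.
temperature = CachedItem(value=25.0, fetched_at=0.0)
small_drift = value_coherent(temperature, source_value=25.6, delta_v=1.0)
large_drift = value_coherent(temperature, source_value=26.5, delta_v=1.0)

print(fresh_enough, too_old, small_drift, large_drift)
```

Note that the value-domain check needs the current source value, which is why it is naturally evaluated at the source or server rather than at the client.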
6 Generic Architecture
[Figure: data sources (sensors, servers) connected through the network and proxies/caches to end-hosts (wired host, mobile host)]
7 The Push Approach
- Proxy registers the data item of interest and the coherency requirement with the server
- Server pushes interesting changes
- Achieves strong consistency
- Keeps network overhead to a minimum
- -- Poor scalability (has to maintain state and has to keep connections open)
- -- Low resiliency
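As an illustration of the push approach, here is a minimal in-process sketch, assuming a value-domain requirement Δv per registration. The classes `Subscription` and `PushServer` are hypothetical and stand in for a real networked server; the per-subscription state kept here is exactly the state that limits scalability.

```python
from typing import Callable, Dict, List, Optional, Tuple


class Subscription:
    """State the server must keep for each registered proxy (hypothetical)."""

    def __init__(self, delta_v: float, notify: Callable[[str, float], None]) -> None:
        self.delta_v = delta_v
        self.notify = notify
        self.last_pushed: Optional[float] = None  # value the proxy currently holds


class PushServer:
    def __init__(self) -> None:
        self._subscriptions: Dict[str, List[Subscription]] = {}

    def register(self, item: str, delta_v: float,
                 notify: Callable[[str, float], None]) -> None:
        """Proxy registers the item of interest and its coherency requirement."""
        self._subscriptions.setdefault(item, []).append(Subscription(delta_v, notify))

    def update(self, item: str, value: float) -> None:
        """Push only 'interesting' changes: those that would violate delta_v."""
        for sub in self._subscriptions.get(item, []):
            if sub.last_pushed is None or abs(value - sub.last_pushed) > sub.delta_v:
                sub.notify(item, value)
                sub.last_pushed = value


received: List[Tuple[str, float]] = []
server = PushServer()
server.register("temperature", 1.0, lambda item, value: received.append((item, value)))
for reading in (20.0, 20.4, 21.2, 21.5):
    server.update("temperature", reading)
print(received)  # [('temperature', 20.0), ('temperature', 21.2)]
```

Only two of the four source updates reach the proxy, which is how push keeps network overhead low while still meeting the registered requirement.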
8 The Pull Approach
- Proxy pulls after
- Time To Live (TTL)
- Time To next Refresh (TTR / TNR)
- Can be implemented using the HTTP protocol
- Stateless and hence generally scalable with respect to state space and computation
- Weak cache consistency
- Heavy polling for stringent coherency requirements or highly dynamic data
- Network overheads higher than for Push
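A TTL-driven pull can be sketched as follows. This is an illustrative sketch only: `PullProxy`, its `fetch` callback, and the injected `clock` are hypothetical names, and a real proxy would fetch over HTTP rather than from a dictionary.

```python
from typing import Callable, Dict, List, Tuple


class PullProxy:
    """Proxy that re-fetches an item only after its TTL has expired."""

    def __init__(self, fetch: Callable[[str], float], ttl: float,
                 clock: Callable[[], float]) -> None:
        self._fetch = fetch    # pulls the current value from the source
        self._ttl = ttl        # seconds a cached copy is considered usable
        self._clock = clock    # injected so the example is deterministic
        self._cache: Dict[str, Tuple[float, float]] = {}  # item -> (value, expires_at)
        self.polls = 0         # number of requests sent to the source

    def get(self, item: str) -> float:
        now = self._clock()
        entry = self._cache.get(item)
        if entry is not None and now < entry[1]:
            return entry[0]    # served from cache; may be stale
        value = self._fetch(item)
        self.polls += 1
        self._cache[item] = (value, now + self._ttl)
        return value


now: List[float] = [0.0]
source = {"price": 100.0}
proxy = PullProxy(fetch=lambda key: source[key], ttl=60.0, clock=lambda: now[0])

first = proxy.get("price")   # cache miss: pulls 100.0
source["price"] = 105.0      # source changes; the proxy is not told
now[0] = 30.0
stale = proxy.get("price")   # within TTL: still 100.0 (weak consistency)
now[0] = 61.0
fresh = proxy.get("price")   # TTL expired: pulls 105.0
print(first, stale, fresh, proxy.polls)
```

The server keeps no per-proxy state, but a shorter TTL (tighter coherency) directly translates into more polls.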
9 Typical End-to-end Web Site Architecture
[Figure: Web Server Cluster, Application Server Cluster, Data]
10 WS vs. AS
- Web servers (WS)
- Do well-defined and quantifiable local work
- e.g., processing HTTP headers, serving static content
- Application servers (AS)
- Run multi-layer programs
- e.g., scripts involving calls to backends
11 Inside the Application Layer: 3-tier model
[Figure: 3-tier application layer. Tiers: PRESENTATION, BUSINESS LOGIC, DATA CONNECTOR, backed by Legacy Systems and Databases. Data passed between tiers: HTML, Objects, Row Set. ADDTL SERVICES: Commerce, Content Mgt., Personalization]
12 Inside the Application Layer
[Figure: the same tiers, with Code Block(s) inside PRESENTATION and BUSINESS LOGIC. ADDTL SERVICES: Commerce, Content Mgt., Personalization. DATA CONNECTOR sits above Legacy Systems and Databases. Step shown: 4. DBMS calls storage system]
13 Performance and Scalability Issues
- Computationally intensive logic executed at multiple tiers
- Cross-tier communication
- Object instantiation and cleanup processing
- External I/O calls
- Database connection pool latencies
- Content conversion and formatting
14 Optimizing the Application Layer: Traditional Means
- Optimize each tier independently
- Presentation-level caches built inside application server processes
- Main-memory database employed over persistent DBMS
- Persistent object storage techniques employed inside content management systems, and so on
Local cache and optimization code
15 Query result caching
- Many application server products offer this feature
- -- mitigates only local database access latency
- -- only a subset of query results may be reused in page generation
- -- page fragments may not all be from databases
16 Middle tier database caching
- Caching database tables in main memory
- Oracle 9i Cache
- Main-memory databases, e.g., TimesTen
- -- mitigates only database access latency
- -- caching at table granularity results in poor cache utilization
- -- main-memory databases are difficult to integrate and maintain and can be expensive
17 Page Level Caching
- Dynamically generated HTML pages are cached
- Can completely offload work from web/app server
- Low reusability for highly personalized web pages
- URL may not uniquely identify a page
- -- increasing the risk of delivering incorrect pages
- Often introduces excessive invalidations
- -- e.g., even if a single element on the page changes
18 Optimizing the Application Layer: Issues
- Traditional techniques impact specific components within the application, but not the entire application
- No mitigation of component-to-component interaction latencies
- Different synchronization and invalidation policies risk data integrity
- Each optimization scheme consumes programmer time for development and maintenance
19 Key ideas
- Re-use program results to eliminate redundant work
- Facilitate single-point, architecture-wide optimization
- Apply to both programmatic objects and result fragments
20 Optimizing the Application Layer
Enables the results of programs to be re-used.
[Figure: PRESENTATION, BUSINESS LOGIC with ADDTL SERVICES (Commerce, Content Mgt., Personalization), and DATA CONNECTOR, above Legacy Systems and Databases]
21 Usually.
Legacy Systems
Plus, at each step there are communication delays and logic processing delays
22 Novel Solution
Can store any program output; most commonly an HTML fragment or a programmatic object.
Tags trigger calls to the storage engine.
When the Result of a Function with a specific Parameter set is already known (and up-to-date), the work normally necessary to produce that Result is bypassed.
[Figure: Code Block(s) in PRESENTATION and BUSINESS LOGIC, above DATA CONNECTOR, carry Chutney tags that reach the Real-time storage engine through the Appl. Programming Interface; the engine relates Function and Parameter(s) to Result]
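The Function / Parameter(s) / Result idea can be sketched as a store keyed by function name and parameter set. This is only an illustration of the principle and not the actual Chutney tag syntax or API; `ResultStore`, `call`, `invalidate`, and `render_headlines` are hypothetical names.

```python
from typing import Any, Callable, Dict, Hashable, List, Tuple


class ResultStore:
    """Maps (function, parameters) to a previously produced result."""

    def __init__(self) -> None:
        self._results: Dict[Tuple[str, Tuple[Hashable, ...]], Any] = {}
        self.hits = 0
        self.misses = 0

    def call(self, func: Callable[..., Any], *params: Hashable) -> Any:
        key = (func.__name__, params)
        if key in self._results:
            self.hits += 1            # result known: the work is bypassed
            return self._results[key]
        self.misses += 1
        result = func(*params)        # result unknown: do the work once
        self._results[key] = result
        return result

    def invalidate(self, func: Callable[..., Any], *params: Hashable) -> None:
        """Drop a stored result that is no longer up-to-date."""
        self._results.pop((func.__name__, params), None)


work_done: List[str] = []


def render_headlines(section: str) -> str:
    """Stand-in for a code block that queries a database and formats HTML."""
    work_done.append(section)
    return f"<ul><li>Top story in {section}</li></ul>"


store = ResultStore()
a = store.call(render_headlines, "sports")   # miss: code block runs
b = store.call(render_headlines, "sports")   # hit: code block skipped
store.invalidate(render_headlines, "sports")
c = store.call(render_headlines, "sports")   # miss again after invalidation
print(store.hits, store.misses, work_done)
```

Because the key includes the parameter set, the same code block can be reused across pages whenever it is called with the same parameters.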
23 Code Blocks Perform Work
[Figure: page generation script made of code blocks, each ending in "Write to Out"]
24 Code Blocks <-> Components
[Figure: page generation script whose code blocks ("Write to Out") correspond to components of the web page: Ad Component, Navigation Component, Headline Components, Personalized Component]
(Example: News content site)
Certain components can be cached
25 DCA: Our Solution
[Figure: a request runs the page generation script; for a code block, the Dynamic Content Accelerator returns the code block output, so the application logic is not executed]
Work bypassed: database calls, HTML formatting, . . .
26 DCA in a Typical End-to-end Web Site Architecture
- A single instance of the DCA serves a rack of application servers
- Application servers communicate with DCA through a lightweight API
[Figure: Web Server Cluster, Application Server Cluster, Dynamic Content Accelerator, Data]
27 Cache Management
- A critical aspect of any caching solution
- DCA supports novel cache management strategies
- Prediction-based cache replacement
- Observation-based cache invalidation
28 Cache Replacement
- Prediction-based replacement
- fragments having lowest probability of access replaced
- Least-Likely-to-be-Used (LLU)
- Access probabilities based on
- Current user navigational patterns over site graph (in the form of clickstreams)
- Historical user navigational patterns over site graph (in the form of association rules)
(News, Sports, Hockey) -> Schedules 20%
(News, Sports, Hockey) -> Players 15%
(News, Sports, Hockey) -> Teams 10% (LLU)
(News, Sports, Hockey) -> Scores 55%
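A minimal sketch of Least-Likely-to-be-Used replacement, reading the numbers in the slide's example as access percentages for a user currently at (News, Sports, Hockey). The `LLUCache` class is hypothetical, and the probabilities are passed in directly; how the DCA derives them from clickstreams and association rules is not shown in the slides.

```python
from typing import Dict


class LLUCache:
    """Fixed-capacity fragment cache that evicts the least likely fragment."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.fragments: Dict[str, str] = {}

    def put(self, name: str, fragment: str,
            access_probability: Dict[str, float]) -> None:
        if name not in self.fragments and len(self.fragments) >= self.capacity:
            # Evict the cached fragment with the lowest predicted access probability.
            victim = min(self.fragments,
                         key=lambda cached: access_probability.get(cached, 0.0))
            del self.fragments[victim]
        self.fragments[name] = fragment


# Predicted next-page probabilities from the slide's example.
probabilities = {"Schedules": 0.20, "Players": 0.15, "Teams": 0.10, "Scores": 0.55}

cache = LLUCache(capacity=3)
for page in ("Schedules", "Players", "Teams"):
    cache.put(page, f"<div>{page}</div>", probabilities)

# The cache is full, so inserting Scores evicts Teams (lowest probability).
cache.put("Scores", "<div>Scores</div>", probabilities)
print(sorted(cache.fragments))  # ['Players', 'Schedules', 'Scores']
```

Unlike LRU, the eviction decision here depends on predicted future access rather than on past access order.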
29 Cache Invalidation
- DCA supports common cache invalidation techniques
- Time-based: Each cache element assigned a TTL
- Event-based: Updates to the database send an invalidation message to the cache
- On demand: Manual invalidation of selected elements
- DCA supports additional invalidation techniques.
30 Cache Invalidation
- Other invalidation techniques supported
- Observation-based
- User-initiated updates are observed in scripts; each such update sends an invalidation message to the cache
- Most appropriate for auction sites, online trading sites
- Invalidation does not require communication with the databases
- Keyword-based
- Elements can be associated with keywords, e.g., a retailer may wish to invalidate all seasonal items
- Regular expression-based
- Elements can be invalidated based on regular expression matching
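Three of the invalidation techniques listed for the DCA (time-based, keyword-based, and regular expression-based) can be sketched on one small cache. This is a hypothetical illustration; `FragmentCache` and its methods are assumed names, not the DCA's interface, and the regular expression is matched against the element key.

```python
import re
from typing import Dict, Iterable, Optional, Set, Tuple


class FragmentCache:
    def __init__(self) -> None:
        # key -> (fragment, expires_at, keywords)
        self._entries: Dict[str, Tuple[str, float, Set[str]]] = {}

    def put(self, key: str, fragment: str, now: float, ttl: float,
            keywords: Iterable[str] = ()) -> None:
        self._entries[key] = (fragment, now + ttl, set(keywords))

    def get(self, key: str, now: float) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        if now >= entry[1]:          # time-based: the TTL has run out
            del self._entries[key]
            return None
        return entry[0]

    def invalidate_keyword(self, keyword: str) -> int:
        """Keyword-based: drop every element tagged with the keyword."""
        doomed = [k for k, e in self._entries.items() if keyword in e[2]]
        for k in doomed:
            del self._entries[k]
        return len(doomed)

    def invalidate_regex(self, pattern: str) -> int:
        """Regular expression-based: drop every element whose key matches."""
        rx = re.compile(pattern)
        doomed = [k for k in self._entries if rx.search(k)]
        for k in doomed:
            del self._entries[k]
        return len(doomed)


cache = FragmentCache()
cache.put("/catalog/skis", "<div>skis</div>", now=0.0, ttl=300.0, keywords={"seasonal"})
cache.put("/catalog/books", "<div>books</div>", now=0.0, ttl=300.0)
cache.put("/news/today", "<div>news</div>", now=0.0, ttl=60.0)

still_valid = cache.get("/news/today", now=10.0)         # within TTL
removed_seasonal = cache.invalidate_keyword("seasonal")   # removes the skis page
removed_catalog = cache.invalidate_regex(r"^/catalog/")   # removes the books page
expired = cache.get("/news/today", now=61.0)              # TTL elapsed
print(still_valid, removed_seasonal, removed_catalog, expired)
```

Event-based and observation-based invalidation would call the same invalidation methods, triggered by a database update or by an update observed in a script, respectively.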
31 Performance Study
- Test Site
- Fictitious online retail site, allows browsing of product catalog
- Pages generated using JSP scripts
- Site content stored in Oracle database
- Database schema based on Dublin Core Metadata Open Standard
- Contains 200,000 products and 44,000 categories
- Each page consists of 3 components, each involving a database call
32 Performance Study
- Test Setup
- Content Database Server
- Oracle 8.1.6
- Web/Application Server
- WebLogic 6.0 running on cluster of 2 machines
- Server machines
- have 1 GB RAM, dual P III-933 MHz processors
- run Windows 2K Advanced Server
33 Testing Methodology...
- Baseline Parameters
- Cache size, i.e., percentage of fragments that fit into cache: 75%
- Cache replacement policy: LLU
- User load is varied by sending requests from client machines running RadView's WebLoad
- Simulated users navigate site according to Zipf 80-20 distribution (i.e., 80% of users follow 20% of navigation links)
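The 80-20 navigation rule can be approximated with a simple two-bucket choice. This is a rough sketch of the skew only, not a true Zipf sampler and not how WebLoad is configured; `pick_link` and its parameters are hypothetical.

```python
import random
from typing import List


def pick_link(links: List[str], rng: random.Random,
              hot_fraction: float = 0.2, hot_weight: float = 0.8) -> str:
    """Pick a link so that the first 20% of links receive about 80% of visits."""
    n_hot = max(1, round(len(links) * hot_fraction))
    if n_hot == len(links) or rng.random() < hot_weight:
        return rng.choice(links[:n_hot])   # popular links
    return rng.choice(links[n_hot:])       # remaining links


links = [f"/link{i}" for i in range(10)]
rng = random.Random(0)                     # fixed seed for repeatability
visits = [pick_link(links, rng) for _ in range(10_000)]

hot_links = set(links[:2])
hot_share = sum(1 for v in visits if v in hot_links) / len(visits)
print(round(hot_share, 2))                 # close to 0.8
```

A skewed workload like this is what makes caching effective: a small set of fragments accounts for most requests.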
34 Performance Impact
80% faster response times through existing application infrastructure
Source: Fortune 100 client results
35 Chutney Throughput Impact
250% increase in transaction rates
Source: Fortune 100 client results
36 Alternative: CDNs
Content Distribution Networks
Push Based
Core Infrastructure
37 Conclusion
- Increased use of dynamic page generation technologies
- => increases load on application servers
- => serious performance and scalability problems for e-business sites
- DCA (Dynamic Content Acceleration)
- => significantly reduces the load on the server side infrastructure, allows e-business sites to scale
- => significantly outperforms existing middle tier caching solutions
38 IIT Bombay's aAQUA Community Forum
Farmers get information and get their questions answered
- In the local context
- In their local language
Capitalizes on existing human and infrastructural resources
- Agri-extension center: KVK, Baramati
- NGO: Vigyan Ashram, Pabal
- Government: MCIT
www.aAQUA.org
39 Access over low bandwidth: Resource Optimization
Resource constraints
- Low/unpredictable bandwidth => disconnected operation/access
- Exploit caching, prefetching (through prediction of future needs)
- Profiling by user type, location
- => offline aAQUA
Data characteristics
- Static data: text, images (land records, photos) can be cached/hoarded
- Dynamic data: weather/price information; cached info needs to be refreshed carefully
- Continuous media: VoIP, video data; QoS considerations