Title: Dynamic Web Application Deployment
1Dynamic Web Application Deployment
- Instructor: Dr. Zhang
- Presenter: Ningfang Mi
- Date: Nov. 3, 2004
2Outline
- Motivation
- Challenge of Dynamic Content
- Approaches
- ESI
- CSI
- ACDN
- Discussion
- Conclusion
- Reference
3Motivation
- Caching
- Important tool to deal with the rate of requests to Internet servers
- Reduce network congestion
- Reduce page display time
- Client-centric proxy caching
- Server-centric
- reverse proxy caching
- content delivery network (CDN)
- Limitation: mostly oriented toward static content
4Dynamic Web Pages
- Static Web pages are not ENOUGH!
- More and more pages contain dynamic content
- News headlines, stock information, current temperature
- Good news
- a more compelling experience for the end-user
- an easier development model for the application designer
- Bad news
- Bad for caching!
5Dynamic component
6Challenge of Dynamic Content
- Web developers frequently use technologies like JavaServer Pages (JSP) and Active Server Pages (ASP) to design their applications
- But when traffic on Web sites increases, the computing overhead can result in increasing delays and failures in data delivery
7Challenge of Dynamic Content (2)
- Dynamic content places significant strain on traditional Web site architectures
- The same infrastructure used to generate the content is used to deliver the content
www.esi.org
8Challenge of Dynamic Content (3)
- Generating dynamic content typically incurs
- network overhead as user requests are dispatched to appropriate software modules that service these requests
- processing overhead as these modules determine which data to fetch and present
- disk I/O as these modules query the back-end database
- In short, building dynamic Web pages is computationally expensive
9Challenge of Dynamic Content (4)
- Two major issues
- Site Experience and Effectiveness
- dynamic content, abandon rate, download speed
- Site Cost Structure
- investments to support scalability, reliability, performance, system management, etc.
- One more problem
- How to facilitate caching for dynamic Web pages?
10Approaches
- Caching dynamic responses, with mechanisms for timely invalidation of the cached copies
- Assembling a response at the edge from static and dynamic components
- Edge Side Includes (ESI) assembling
- Client Side Includes (CSI) assembling
- Application distribution networks, running complete applications at the edge of the network
- Application Content Delivery Network (ACDN)
11Fragment-based Technologies
- Dynamic pages are not all dynamic
- Most bytes are in a static page template
- Dynamic fragments are a small fraction of the entire page
- Different portions have different properties
- Template: slow-changing content
- Fragment: fast-changing content
12
- Full page: 30731 bytes
- News headlines: 927 bytes (3%), refetched every few hours
- Stock quotes: 1231 bytes (4%), refetched every minute
13Reassembling Fragmented Page
- Historically, pages are assembled at origin sites using ASP, JSP, or Server-Side Includes.
- Edge Side Includes (ESI) language
- An XML-based mark-up language
- A mechanism to assemble a page from different components at edge servers (reverse proxies)
- Separate cache control for each component
- Independently download changed fragments
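The edge-side assembly idea can be illustrated with a small sketch. It assumes a template that contains `<esi:include src="..."/>` tags and a hypothetical `origin` function standing in for a request to the origin server. The `FragmentCache` class, the TTL values, and the fragment URLs are made up for illustration and do not describe any particular ESI implementation.

```python
import re
import time

# Matches the self-closing form <esi:include src="..."/> only (a simplification).
ESI_INCLUDE = re.compile(r'<esi:include\s+src="([^"]+)"\s*/>')


class FragmentCache:
    """Caches each fragment separately, each with its own TTL in seconds."""

    def __init__(self, fetch, ttls, clock=time.time):
        self.fetch = fetch    # hypothetical callable: url -> fragment body
        self.ttls = ttls      # per-fragment TTL, i.e. separate cache control
        self.clock = clock
        self.store = {}       # url -> (expiry time, body)

    def get(self, url):
        now = self.clock()
        entry = self.store.get(url)
        if entry is not None and entry[0] > now:
            return entry[1]               # still fresh: no origin request
        body = self.fetch(url)            # expired or missing: refetch
        self.store[url] = (now + self.ttls.get(url, 0), body)
        return body


def assemble(template, cache):
    """Replace every include tag in the cached template with its fragment."""
    return ESI_INCLUDE.sub(lambda m: cache.get(m.group(1)), template)


# Usage: a fake origin that records which fragments were requested.
fetches = []

def origin(url):
    fetches.append(url)
    return f"[{url} v{len(fetches)}]"

now = [0.0]  # controllable clock for the example
cache = FragmentCache(origin, {"/news": 3600, "/stocks": 60}, clock=lambda: now[0])
template = '<h1>Home</h1><esi:include src="/news"/><esi:include src="/stocks"/>'

page1 = assemble(template, cache)
now[0] = 120.0                      # two minutes later
page2 = assemble(template, cache)   # only the stock fragment has expired
```

After the second assembly only `/stocks` is fetched again, which is the "independently download changed fragments" behavior from the slide.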
14Akamai's Approach for ESI-Encoded Content
[Figure: two request paths among browser, edge server, and origin server. Panel 1: no ESI. Panel 2: ESI with edge-side page assembly, where the edge server caches the template and performs page assembly.]
15ESI -- Benefits and Limitation
- Key Benefits of ESI
- extends the performance and cost-saving benefits of Web caching and content delivery services
- Bottleneck of the Last Mile
- A large majority of Web users still rely on dial-up connections.
- Network traffic revenue analysis: 79% of consumer subscribers as of March 2002.
- Jupiter Media Metrix, Aug 2001: 59% of the predicted on-line households in the US in 2006.
- ESI does NOT help dial-up customers!
- The speed of the last mile dominates the page display time.
16Client-Side Includes (CSI)
- Key idea
- Assemble page in the browsers
- Dramatically reduce user response time
- No browser modifications or configuration needed
- Use existing technologies inside Internet Explorer
- Page parsing and assembly: JavaScript
- Retrieval of page components: ActiveX
- Free to use or not use a CDN
- Without: client <-> origin server directly
- With: client <-> edge server <-> origin server
- scalable delivery of page template and fragments
- Traffic reduction between client and edge server
17Page Assembly Alternatives
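[Figure: request flows among browser, edge server, and origin server for three alternatives. No ESI: GET /www.att.com is passed through to the origin server and the full page is returned. ESI with edge-side assembly: the template is cached at the edge server, GET /frag1.html retrieves Frag1, and the edge server performs page assembly and returns the full page. ESI with client-side assembly: the template is cached and page assembly takes place in the browser.]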
18ESI vs. CSI
- Same markup language (ESI)
- ESI assembling
- Reduces bandwidth and server load
- CSI assembling
- Reduces connectivity costs at origin server (less load and bandwidth).
- Reduces CDN-related costs (less bandwidth from edge to clients).
- Reduces browser download times (less bandwidth at last mile).
19Implementation of CSI
- Implement CSI for the prevalent browser only (Microsoft Internet Explorer, MSIE)
- Resort to edge-side or server-side page assembly for all other browsers
20Implementation (with a CDN)
[Figure: browser, edge server, and origin server]
- JavaScript: assembles a Web page
- ActiveX: downloads page components
- Wrapper: invokes the JavaScript and passes it the URL of the requested page
21Performance Evaluation
- Synthetic pages: randomly generated content
- Sizes: 20K, 60K, 100K
- Template (80%) + four fragments (5% each)
- AT&T page: http://www.att.com
- One template, two fragments
- Wall Street Journal page: http://online.wsj.com
- One template, three fragments
22Display Time of Synthetic Pages
[Figure: display time of synthetic pages over dial-up links]
Conclusion: substantial reduction in display time across all page sizes
23Bandwidth Reduction
                  AT&T Page       WSJ Page
Full Page         30731 (100%)    79608 (100%)
Page Template     28661 (93%)     56324 (71%)
Current Time      N/A             55 (0%)
News Headlines    927 (3%)        20161 (25%)
Stock Quotes      1231 (4%)       3166 (4%)

All numbers are in bytes
Conclusion: CSI can achieve significant reduction in bandwidth when the templates are cached in the browser.
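The conclusion can be checked with a little arithmetic on the byte counts from the table. The sketch below assumes that, once the template is cached in the browser, only the fragments are downloaded; it ignores the wrapper and HTTP header overhead, which the slides do not quantify.

```python
# Component sizes in bytes, copied from the bandwidth table.
PAGES = {
    "AT&T": {"full": 30731, "template": 28661, "fragments": [927, 1231]},
    "WSJ": {"full": 79608, "template": 56324, "fragments": [55, 20161, 3166]},
}


def bytes_with_cached_template(page):
    """Bytes transferred when only the fragments must be fetched (assumption)."""
    return sum(page["fragments"])


def savings(page):
    """Fraction of the full-page download avoided by caching the template."""
    return 1 - bytes_with_cached_template(page) / page["full"]


for name, page in PAGES.items():
    print(f"{name}: {bytes_with_cached_template(page)} bytes, {savings(page):.0%} saved")
# AT&T: 2158 bytes, 93% saved
# WSJ: 23382 bytes, 71% saved
```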
24Limitation of CSI
- The need to download the wrapper increases latency on the first access to a page.
- Downloading fragments sequentially and synchronously may slow down page assembly.
- JavaScript can download the template and fragments only from the same Web site.
- Some pages that are well suited to ESI assembly may not be amenable to CSI.
- Accessed by very many clients
- Once per client over a long interval
25Application Content Delivery Networks (ACDNs)
- Currently CDNs provide access to static and streaming content
- Proxy caches can improve the delivery
- Unique CDN value
- Delivering dynamic content
- Proxies can't cache the dynamic content
- An Application CDN (ACDN)
- Deploy the application on a single computer
- Replicate or migrate the application as needed
26Issues of ACDN
- Application distribution framework
- Dynamically deploy a replica
- Keep consistency of replicas
- Content placement algorithm
- Decide which applications to deploy where and when
- Request distribution algorithm
- Decide how to distribute requests among replicas
- System stability: reach a steady state
- Bandwidth overhead: create replicas
27Architecture Overview
[Figure: clients, load-balancing DNS, central replicator, and ACDN servers]
- Standard Web server
- Keep track of application replicas
- Compute request distribution policy
28Architecture Overview
[Figure: clients, load-balancing DNS, and central replicator]
- Invoked by the system administrator when a new ACDN server comes on-line
29Architecture Overview
[Figure: clients, load-balancing DNS, and central replicator]
- Invoked by the central replicator; reports the load of the server
30Architecture Overview
[Figure: clients, load-balancing DNS, and central replicator]
- Periodically examines every application to decide whether to replicate or delete it
31ACDN Components
- Application distribution framework
- Dynamically create and delete application replicas based on demand
- Maintain replica consistency
- Content placement algorithms
- Request distribution algorithms
32Application Distributed Framework -- Metafile
- Two parts in a metafile
- A list of all files comprising the application along with their last-modified dates
- An initialization script (or a URL of the file with the script) run by the recipient server before accepting any request

Metafile
    FILE /home/applications/mapping/query_engine.cgi 1999.apr.14.084612
    FILE /home/applications/mapping/map_database 2000.oct.15.131559
    FILE /home/applications/mapping/user_preferences 2001.jan.30.180005
    SCRIPT
    mkdir /home/applications/mapping/access_stats
    setenv ACCESS_DIRECTORY /home/applications/mapping/access_stats
    ENDSCRIPT

- FILE entries: one executable file (query_engine.cgi) and two data files (map_database, user_preferences)
- SCRIPT: creates a directory and sets the environment variable
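A recipient server needs to split a metafile into its two parts. The following is a minimal sketch of such a parser, assuming one entry per line, whitespace-separated `FILE <path> <last-modified>` entries, and `SCRIPT` / `ENDSCRIPT` on lines of their own. The slide shows the example but does not spell out the exact syntax, and `parse_metafile` is a hypothetical name.

```python
def parse_metafile(text):
    """Return (files, script): files is a list of (path, last_modified)
    pairs and script is the list of initialization commands."""
    files = []
    script = []
    in_script = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "SCRIPT":
            in_script = True
        elif line == "ENDSCRIPT":
            in_script = False
        elif in_script:
            script.append(line)              # command for the recipient server
        elif line.startswith("FILE "):
            _, path, modified = line.split()  # assumes paths contain no spaces
            files.append((path, modified))
    return files, script


# The example metafile from the slide.
EXAMPLE = """\
FILE /home/applications/mapping/query_engine.cgi 1999.apr.14.084612
FILE /home/applications/mapping/map_database 2000.oct.15.131559
FILE /home/applications/mapping/user_preferences 2001.jan.30.180005
SCRIPT
mkdir /home/applications/mapping/access_stats
setenv ACCESS_DIRECTORY /home/applications/mapping/access_stats
ENDSCRIPT
"""

files, script = parse_metafile(EXAMPLE)
```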
33Application Metafile
- A metafile is treated as a static Web page with its own URL.
- Using a metafile, the application distribution framework can be implemented over standard HTTP.
- Operations of framework
- Replica creation
- Replica deletion
- Replica consistency
- Migration = creation + deletion
34Replica Creation
- Initiated by the decision process on the source server
[Figure: overloaded source server, central replicator, and target server]
35Replica Creation
- Initiated by the decision process on the source server
[Figure: the overloaded source server queries the central replicator for the least-loaded server; the central replicator returns the least-loaded server, which becomes the target server]
36Replica Creation
- Initiated by the decision process on the source server
[Figure: the overloaded source server queries the central replicator for the least-loaded server; the central replicator returns the least-loaded server; on the target server: unpack, install, execute initialization script]
37Replica Creation
- Initiated by the decision process on the source server
[Figure: the overloaded source server queries the central replicator for the least-loaded server; the central replicator returns the least-loaded server; once the replica is on the target server, the request distribution policy is computed]
38Replica Deletion
- Initiated by the decision process on a server with the replica
- Condition: not the last replica
- Compute the request distribution policy
- Mark the replica as deleted; delete it after the TTL
- TTL delay: allows for application requests still arriving due to earlier DNS responses
39Consistency Maintenance
- Only deals with developer updates
- Three issues
- Replica divergence: conflicting updates
- Only update the primary application replica
- Replica staleness and replica coherency
- Missing updates, and updates applied to only some of the files
- If a server detects that its cached metafile is no longer valid, it downloads the new metafile and copies all modified objects from the primary server
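The refresh step can be sketched as a comparison of last-modified stamps between the cached metafile and the new one. The function name and the dictionary representation (path mapped to last-modified string) are assumptions for illustration. The sketch only finds new or changed files; how files removed from the application are handled is not covered by the slide.

```python
def files_to_refresh(cached, current):
    """Return the paths whose last-modified stamp differs from the cached
    metafile, or which are missing from it. Both arguments map
    path -> last-modified string taken from a metafile."""
    return sorted(
        path for path, stamp in current.items() if cached.get(path) != stamp
    )


# Hypothetical example: the developer updated map_database on the primary.
cached = {
    "/home/applications/mapping/query_engine.cgi": "1999.apr.14.084612",
    "/home/applications/mapping/map_database": "2000.oct.15.131559",
}
current = {
    "/home/applications/mapping/query_engine.cgi": "1999.apr.14.084612",
    "/home/applications/mapping/map_database": "2002.mar.01.120000",
}

stale = files_to_refresh(cached, current)
```

Copying all files in `stale` from the primary in one step keeps the replica from mixing old and new versions of the application's files.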
40ACDN Algorithms
- Application distribution framework
- Content placement algorithm
- Decide which applications to deploy where and when
- Request distribution algorithm
41Content Placement Algorithm
- Executed periodically by each ACDN server
- Makes a local decision on deleting, replicating, or migrating its applications
- For each application app:
- (1) If demand is below the Deletion threshold, delete app unless it is the only replica
- (2) If demand from another server's region exceeds the Deletion threshold and replication benefits are likely to exceed transfer overhead, try to replicate there
- (3) If demand from another server's region exceeds 50% of total and migration benefits are likely to exceed transfer overhead, try to migrate there
- Goal: improve proximity of servers to client requests
42Content Placement Algorithm (2)
- If the server is overloaded:
- (1) Find the least-loaded server from the central replicator
- (2) Replicate some applications there if the load at the least-loaded server is above the deletion threshold
- (3) Otherwise, migrate some applications there if its projected load after receiving the application will remain acceptable (below LW)
- Goal: achieve load balancing among servers
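The per-application proximity rules (delete, replicate, migrate) can be sketched as a single decision function. This is an illustration under stated assumptions, not the authors' implementation: the rules are checked in the order the slide lists them, "demand" in rule (1) is taken to be the total demand seen by this replica, and `worth_transfer` is a hypothetical predicate standing in for "benefits are likely to exceed transfer overhead", which the slides do not define.

```python
from typing import Callable, Dict, Optional, Tuple


def placement_decision(
    demand_by_region: Dict[str, float],
    local_region: str,
    deletion_threshold: float,
    is_only_replica: bool,
    worth_transfer: Callable[[str, str], bool],
) -> Tuple[str, Optional[str]]:
    """Return (action, target_region) for one application on one server."""
    total = sum(demand_by_region.values())

    # (1) Too little demand: delete, unless this is the only replica.
    if total < deletion_threshold and not is_only_replica:
        return ("delete", None)

    # Look at remote regions, busiest first (ordering is an assumption).
    remote = {r: d for r, d in demand_by_region.items() if r != local_region}
    for region, demand in sorted(remote.items(), key=lambda kv: kv[1], reverse=True):
        # (2) Enough remote demand to sustain a replica: replicate there.
        if demand > deletion_threshold and worth_transfer("replicate", region):
            return ("replicate", region)
        # (3) Most of the demand comes from that region: migrate there.
        if demand > 0.5 * total and worth_transfer("migrate", region):
            return ("migrate", region)

    return ("keep", None)


def always(action, region):
    """Placeholder cost model: every transfer is assumed to pay off."""
    return True


d1 = placement_decision({"east": 5, "west": 2}, "east", 10, False, always)
d2 = placement_decision({"east": 20, "west": 40}, "east", 10, False, always)
d3 = placement_decision({"east": 3, "west": 5}, "east", 10, True, always)
d4 = placement_decision({"east": 50, "west": 5}, "east", 10, False, always)
```

With these inputs the four calls yield delete, replicate to west, migrate to west, and keep, respectively. The overload rules from the second slide would need the server loads known to the central replicator and are left out of this sketch.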
43ACDN Algorithms
- Application distribution framework
- Content placement algorithm
- Request distribution algorithm
- Decide how to distribute requests among replicas
44Request Distribution Algorithm
- Goal: never skip the nearest non-overloaded server, and yet reduce oscillations in request distribution
- iDNS: load-balancing DNS server
- Request distribution policy
- (R, Prob(1), ..., Prob(N))
- Prob(i) is the probability of selecting server i for a request from the region R
45Request Distribution Algorithm(2)
- Three phases
- Assign the probability to each server based on its load
- Examine all servers with a replica of the application in the order of the increasing distance from the region
- Normalize the probabilities of these servers so that they sum up to one
46
- Initial probabilities
    set prob(i) = 0 for all i
    loop through the replicas in order of decreasing proximity:
        if load(i) < LW:
            prob(i) = 1.0
            exit
        else if LW < load(i) < HW:
            prob(i) = (HW - load(i)) / (HW - LW)
- Adjustments to distance from region R
    remainder = 1.0
    loop through the servers with a replica of the application in order of increasing distance from region R:
        prob(i) = prob(i) * remainder
        remainder = remainder - prob(i)
- Final probabilities
    if sum of all prob(i) > 0:
        prob(i) = prob(i) / sum of all prob(i)
    else:
        prob(i) = 1/n, where n is the number of replicas
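The three phases translate almost line for line into code. The sketch below assumes the server loads for region R are given nearest-first, and that a load exactly equal to LW is handled by the linear formula (the slide leaves the boundary cases open). The function name is hypothetical.

```python
def request_distribution(loads, lw, hw):
    """Compute selection probabilities for the replicas of one application.

    loads: server loads ordered by increasing distance from region R.
    lw, hw: low and high load watermarks, with lw < hw.
    """
    n = len(loads)
    prob = [0.0] * n

    # Phase 1: initial probabilities based on load alone.
    for i, load in enumerate(loads):
        if load < lw:
            prob[i] = 1.0
            break                 # "exit": farther servers keep probability 0
        elif load < hw:
            prob[i] = (hw - load) / (hw - lw)
        # load >= hw: overloaded, probability stays 0

    # Phase 2: nearer servers claim their share of the requests first.
    remainder = 1.0
    for i in range(n):
        prob[i] = prob[i] * remainder
        remainder = remainder - prob[i]

    # Phase 3: normalize so that the probabilities sum to one.
    total = sum(prob)
    if total > 0:
        return [p / total for p in prob]
    return [1.0 / n] * n          # every replica overloaded: spread evenly


# Watermarks from the evaluation slide: LW = 200, HW = 1000 requests/second.
p_mixed = request_distribution([600, 100, 50], lw=200, hw=1000)
p_idle = request_distribution([100, 100], lw=200, hw=1000)
p_full = request_distribution([1200, 1500], lw=200, hw=1000)
```

In the first call the nearest server is moderately loaded and keeps half of the requests, the next one takes the other half, and the farthest is not used. In the second call the nearest server is lightly loaded and receives everything, so the nearest non-overloaded server is never skipped.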
47ACDN Performance -- Request Distribution
- Three servers with decreasing proximity to all clients
- Server 1 is the closest, server 2 is the next closest, server 3 is the farthest.
- HW = 1000 requests/second
- LW = 200 requests/second
- Start with 10 clients, gradually increase beyond server capacity, then decrease back to 10 clients
- Compared: ACDN, pure random, and CDN brokering
- CDN brokering: select the closest server with load < 80% of its capacity
48
- Random: perfect load balancing, but unnecessarily high latency
- ACDN: uses proximity information efficiently and avoids overloading the replicas
- CDN brokering: considers both load and proximity, but not as well as ACDN
49ACDN Performance -- Content Placement
- 10% of servers are in hot regions with 90% of demand
- 90% of servers are in cold regions with 10% of demand
- The set of hot regions changes every 400 seconds, to see how the system adapts.
- Two other algorithms
- Static: a replica is created when the simulation starts and stays fixed throughout the simulation
- Ideal: has instantaneous knowledge of the hot regions and replicates or deletes applications accordingly
50
[Figure: two charts comparing static, ACDN, and ideal placement: response latency and network bandwidth consumption]
Conclusion: ACDN quickly adapts to the set of hot regions and significantly reduces network bandwidth and response time
51ACDN Performance -- Redeployment Threshold
- Low threshold -> more replicas
- lower response latency, higher overhead
- High threshold -> fewer replicas
- higher response latency, lower overhead
Conclusion: thresholds that are either too high or too low result in increased bandwidth consumption
52Discussion
- Fragment-based techniques reduce bandwidth because only modified fragments need to be transferred and most of a dynamic page is still static. What about a totally dynamic page with frequently changing fragments?
- ACDN only considers read-only applications. How should consistency be handled when users update the application data?
53Conclusion
- Static Web pages are not ENOUGH!
- ESI
- reduces bandwidth and server load
- but does not help dial-up customers!
- CSI
- reduces load and bandwidth at the origin server, bandwidth from edge to clients, bandwidth consumption over the last mile, and browser download time
- but is not good for pages that are accessed by very many clients, each only once over a long interval
- ACDN
- A middleware platform for providing scalable access to Web applications
- Unique CDN value: dynamic Web pages
54Reference
- www.esi.org
- Michael Rabinovich, et al., "Moving Edge-Side Includes to the Real Edge: the Clients," Proceedings of the 4th USENIX Symposium on Internet Technologies and Systems, 2003.
- Michael Rabinovich and Zhen Xiao, "Computing on the Edge: A Platform for Replicating Internet Applications," Proceedings of WCW'03, 2003.
- Arun Iyengar, Jim Challenger, "Improving Web Server Performance by Caching Dynamic Data," USENIX Symposium on Internet Technologies and Systems, 1997.
- Fred Douglis, Antonio Haro, Michael Rabinovich, "HPP: HTML Macro-Preprocessing to Support Dynamic Document Caching," USENIX Symposium on Internet Technologies and Systems, 1997.