Title: Advanced Internet and Web Systems
Outline of the Talk
- Syllabus
- Introduction to WWW Systems
- Survey of Web Cluster Systems
- Survey of Caching Techniques
- Server Selection and Load Balancing
Introduction to WWW Systems
[Figure: a web authoring system, with input from a scanner, video capture, and sound card, creates web pages (documents written in HTML) and publishes them to a web server that hosts them; web client browsers retrieve the pages across the Internet using the HTTP protocol.]
What Is Unique in WWW?
- Hyperlinks: the HyperText Markup Language (HTML) describes the document in ASCII text (extended to ISO-8859-1).
- Naming scheme: objects on the web are named with a Uniform Resource Locator (URL), with the syntax protocol://domain_name/<URI or path name>
- HTTP, the HyperText Transfer Protocol: a simple request-response protocol for transferring HTML documents.
- ASCII-text based (not binary, and therefore easy to debug).
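The URL syntax above can be illustrated with a small Python sketch. It relies only on the standard urllib.parse.urlsplit function; the helper name url_parts is our own, hypothetical choice.

```python
from typing import Tuple
from urllib.parse import urlsplit


def url_parts(url: str) -> Tuple[str, str, str]:
    """Split protocol://domain_name/<uri or path name> into its three parts."""
    parts = urlsplit(url)
    # An empty path refers to the server's root document, "/".
    return parts.scheme, parts.hostname or "", parts.path or "/"


print(url_parts("http://news.netcraft.com/archives/web_server_survey.html"))
# ('http', 'news.netcraft.com', '/archives/web_server_survey.html')
```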
Web Authoring System
- Text editor: type in HTML <tag>s and content.
- HTML editor: works like a normal word processor, so the user does not need to know much HTML syntax; provides easy upload/download functions.
  - Dreamweaver
  - Netscape Page Composer, MS FrontPage
  - FrontPage goes a step further by providing templates and hyperlink management functions.
- Most desktop publishing software and word processors have built-in converters from their internal format to HTML, for example FrameMaker and Office 97 (requires a special viewer).
Web Delivery Systems
- Deliver web documents efficiently and reliably to the web clients.
- Content distribution and content delivery.
- Performance is determined by:
  - web server performance
  - network path performance
  - client browser performance.
- Use multiple physical servers (a server farm), and multiple server farms across a wide area.
- A new generation of proxy servers/content switches has emerged.
Content Delivery Network (CDN)
[Figure: clients on @Home, PSINet, and MindSpring all send huge numbers of requests to one origin server, causing slow responses and server crashes.]
Content Delivery Problems
(Source: http://www.akamai.com)
Use Client Cache/Client Side Cache Server
[Figure: client caches and client-side cache servers at ISPs (@Home, PSINet, Sprint, UUNET, Globix, MindSpring) mean fewer requests reach the origin server and clients get fast responses.]
Use Mirror Sites
- Mirror selection needs improvement: clients should be guided to a mirror server using server load and network bandwidth measurements.
[Figure: mirror sites serving clients on @Home, PSINet, and MindSpring give fast responses.]
Edge Network Cache Servers
[Figure: edge network cache servers, together with client caches and client-side cache servers, give clients on @Home, PSINet, and MindSpring fast responses.]
Architecture solutions for scalable Web-server systems (Fig. 1)
Fig. 2. Model architecture for a locally distributed Web system
Fig. 3. Architecture of a cluster-based Web system
Fig. 4. Architecture of a virtual Web cluster
Fig. 5. Architecture of a distributed Web system
Content Distribution
- Securely automate content/application distribution to a single site (with multiple servers) or to wide-area Internet sites.
- Provide replication, synchronization, staged rollout, and rollback.
- With revision control, transmit only the updates.
- User-defined file distribution profiles/rules.
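One way to "transmit only the updates" is to compare content digests between the source site and each target. The sketch below assumes both sides can produce a manifest mapping path to SHA-256 digest; the function names are hypothetical, and this is not how any particular product (such as GLOBAL-SITE) is known to work.

```python
import hashlib
from typing import Dict, List


def manifest(files: Dict[str, bytes]) -> Dict[str, str]:
    """Map each file path to the SHA-256 digest of its content."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}


def plan_update(source: Dict[str, str], target: Dict[str, str]) -> Dict[str, List[str]]:
    """List the files to send (new or changed) and to delete (gone from source)."""
    send = sorted(p for p, digest in source.items() if target.get(p) != digest)
    delete = sorted(p for p in target if p not in source)
    return {"send": send, "delete": delete}


src = manifest({"index.html": b"<html>v2</html>", "logo.gif": b"GIF89a"})
dst = manifest({"index.html": b"<html>v1</html>", "logo.gif": b"GIF89a", "old.html": b"x"})
print(plan_update(src, dst))  # {'send': ['index.html'], 'delete': ['old.html']}
```

Only index.html travels over the network; the unchanged logo.gif is skipped.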
Content Delivery Problem
- Cache location problem: where should cache servers be put?
- How many are needed?
- When, where, and how should content be pushed/delivered?
- What about dynamic content?
Akamai Edge Delivery Service
Date      Edge servers   Networks   Countries
11/2000   6,000          335        54
6/2001    9,700          650        56
- Peering bottleneck problem: access traffic is spread evenly over 7,400 networks (no single network carries over 5%, and most carry << 1%), so edge servers must be placed in many networks.
- 11/2000: 4 billion hits/day for 2,800 sites.
- Source: http://www.akamai.com
F5 Web System Product
BIG-IP: Delivers High Availability
- E-commerce: ensures sites are not only up and running, but also taking orders.
- Fault tolerance: eliminates single points of failure.
- Content availability: verifies that servers are responding with the correct content.
- Directory authentication: load balances multiple directory and/or authentication services (LDAP, RADIUS, and NDS).
- Portals/search engines: using EAV, administrators can perform keyword searches.
- Legacy systems: load balances services across multiple interactive services.
- Gateways: load balances gateways (SAA, SNA, etc.).
- E-mail (POP, IMAP, Sendmail): balances traffic across a large number of mail servers.
3DNS Intelligent Load Balancing
- Intelligent load balancing.
- QoS load balancing: Quality of Service load balancing is the ability to apply different load-balancing methods to different users or request types.
- Modes of load balancing: Round Robin, Ratio, Least Connections, Random, User-defined Quality of Service, Round Trip Time, Completion Rate (Packet Loss), BIG-IP Packet Rate, Global Availability, Hops, Topology Distribution, Access Control, LDNS Round Robin, Dynamic Ratio.
- E-Commerce
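Two of the modes listed above, Ratio and Least Connections, can be sketched in their generic textbook form. This is an illustration under our own assumptions (integer weights, ties broken by server name), not F5's implementation; the server names are hypothetical.

```python
from typing import Dict


def least_connections(active: Dict[str, int]) -> str:
    """Pick the server with the fewest active connections (ties: first by name)."""
    # min() keeps the first minimum it meets, so iterating sorted names breaks ties.
    return min(sorted(active), key=lambda server: active[server])


class Ratio:
    """Weighted round robin: server s receives weights[s] picks per cycle."""

    def __init__(self, weights: Dict[str, int]):
        self.schedule = [s for s in sorted(weights) for _ in range(weights[s])]
        if not self.schedule:
            raise ValueError("at least one server needs a positive weight")
        self.position = 0

    def pick(self) -> str:
        server = self.schedule[self.position % len(self.schedule)]
        self.position += 1
        return server


ratio = Ratio({"s1": 3, "s2": 1})
print([ratio.pick() for _ in range(4)])  # ['s1', 's1', 's1', 's2']
```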
GLOBAL-SITE: Replicate Multiple Servers and Sites
- File archiving engine and scheduler for automated site and server replication.
- BIG-IP controls server availability during replication and synchronization:
  - graceful shutdown for updates
  - updates in groups or on a schedule.
- FTP transfers files from GLOBAL-SITE to the target servers (agent-free, scalable).
- RCE for source control.
- No client-side software.
- Complete, turnkey system (appliance). (Adapted from an F5 presentation.)
Intel NetStructure
- Routing based on XML tags (e.g., giving preferred treatment to buyers or large-volume orders).
- http://www.intel.com/network/solutions/xml.htm
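XML-tag routing amounts to a content switch that inspects the request body before choosing a server pool. In this sketch the element names (order, customer/type, quantity), the volume threshold, and the pool names are all hypothetical; the slide does not specify Intel's rules.

```python
import xml.etree.ElementTree as ET


def route_order(xml_text: str, threshold: int = 1000) -> str:
    """Return the (hypothetical) server pool that should handle an XML order."""
    order = ET.fromstring(xml_text)
    customer = order.findtext("customer/type", default="")
    quantity = int(order.findtext("quantity", default="0"))
    # Preferred treatment for buyers and for large-volume orders.
    if customer == "buyer" or quantity >= threshold:
        return "premium-pool"
    return "standard-pool"


print(route_order("<order><customer><type>buyer</type></customer>"
                  "<quantity>5</quantity></order>"))  # premium-pool
```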
Note 1: Compared to a Sun E450 server.
Simple Web Access Example: Step 1
- Someone requests a document using a browser (web client) on a computer connected to the Internet.
- In the browser window, the user types in a URL, e.g., http://news.netcraft.com/archives/web_server_survey.html
- The equivalent manual request, shown for the older survey address http://www.netcraft.co.uk/survey/, is:
    telnet www.netcraft.co.uk 80 > out
    GET /survey/ HTTP/1.0<cr><cr>
- Here <cr> is a carriage return, entered by pressing the Enter key.
- The browser parses the URL:
  - obtains the domain name from the URL, e.g., www.netcraft.co.uk
  - asks the Domain Name System (DNS) server to translate the domain name to an IP address
  - with the IP address, the client computer sets up an HTTP connection to the server.
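The browser's steps can be sketched with Python's socket module: parse the URL, resolve the name through DNS, open a TCP connection, and send the request line. HTTP actually specifies CR LF line endings rather than the bare <cr> typed into telnet. A real browser would also send headers such as User-agent and Accept (see Step 2); build_request and fetch are hypothetical helper names.

```python
import socket
from typing import Tuple
from urllib.parse import urlsplit


def build_request(url: str) -> Tuple[str, int, bytes]:
    """Return (host, port, request bytes) for a minimal HTTP/1.0 GET."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    # The request line is followed by an empty line (CR LF CR LF ends the request).
    request = f"GET {path} HTTP/1.0\r\n\r\n".encode("ascii")
    return parts.hostname or "", parts.port or 80, request


def fetch(url: str) -> bytes:
    """Resolve the name via DNS, connect with TCP, send the request, read the reply."""
    host, port, request = build_request(url)
    ip = socket.gethostbyname(host)  # DNS lookup: domain name -> IP address
    with socket.create_connection((ip, port), timeout=10) as conn:
        conn.sendall(request)
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # an HTTP/1.0 server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks)
```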
Computer Network
- Local Area Network (LAN): a privately owned network within a single building or campus of up to a few kilometers in size (Tanenbaum).
- Wide Area Network (WAN): a network that spans a large geographical area, often a country or continent, and connects LANs or MANs. It consists of transmission lines (called circuits, channels, or trunks) and switching elements (called switching nodes, data switching exchanges, or routers).
[Figure: a web client and a web server, each with a DNS server, connected across the network.]
Protocol and Protocol Layer
- A set of rules for achieving a global objective
exercised by geographically distributed nodes.
(Robert Gallager, Prof. EE MIT)
Protocol Data Encapsulation
Internet Protocol Layer Interface
Simple Web Access Example: Step 2
- The browser sends the following character string to the server (ending with an empty line):
    GET /survey/ HTTP/1.0
    User-agent: Mosaic for X windows/2.4
    Accept: text/plain
    Accept: text/html
    Accept: image/*
- The httpd server:
  - parses the request according to the HTTP/1.0 protocol
  - interprets the rest of the metainformation to learn the browser's capabilities
  - maps /survey/ to c:/InetPub/wwwroot/survey/default.htm, a file path in its file system, according to the server configuration
  - retrieves c:/InetPub/wwwroot/survey/default.htm (or index.html)
  - sends the information back using the HTTP/1.0 format.
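The server-side parsing and URI-to-file mapping of Step 2 might look like this. The index file name default.htm follows the example above; the normalization step, which keeps ".." in a URI from escaping the document root, is our own addition and only a minimal safeguard. Function names are hypothetical.

```python
import posixpath
from typing import Dict, Tuple


def parse_request(raw: str) -> Tuple[str, str, str, Dict[str, str]]:
    """Split an HTTP/1.0 request into method, URI, version, and headers."""
    head = raw.split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")
    method, uri, version = lines[0].split(" ", 2)
    headers: Dict[str, str] = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        key, value = name.strip().lower(), value.strip()
        # Repeated headers such as Accept are combined into one comma-separated value.
        headers[key] = headers[key] + ", " + value if key in headers else value
    return method, uri, version, headers


def map_uri(doc_root: str, uri: str, index: str = "default.htm") -> str:
    """Map a URI to a file under doc_root; a trailing "/" selects the index file."""
    path = posixpath.normpath("/" + uri.split("?", 1)[0].lstrip("/"))  # drops ".."
    if uri.endswith("/"):
        path = posixpath.join(path, index)
    return doc_root.rstrip("/") + path
```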
Simple Web Access Example: Step 3
- The server replies using the HTTP/1.0 format:
    HTTP/1.0 200 Document follows
    Date: Tue, 19 Jan 1999 18:10:20 GMT
    Server: NCSA/1.5
    Content-type: text/html
    <empty line>
    <html>
    <head><title>Netcraft Web Server Survey</title></head>
- The server closes the file, sets a timeout, and waits for subsequent requests, such as the images/MIDI files referenced in the web page (this is called a keep-alive connection). When the timeout expires, it closes the connection.
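A reply like the one above is a status line, headers, an empty line, and the body. This sketch uses the standard email.utils.formatdate function to produce the HTTP date format; the default Server string is a made-up placeholder, and Content-Length is added even though the example reply omits it.

```python
from email.utils import formatdate
from typing import Optional


def build_response(status: int, reason: str, body: bytes,
                   content_type: str = "text/html",
                   server: str = "ExampleHTTPd/0.1",
                   now: Optional[float] = None) -> bytes:
    """Format an HTTP/1.0 reply; now is a UNIX timestamp (None = current time)."""
    headers = [
        f"HTTP/1.0 {status} {reason}",
        f"Date: {formatdate(now, usegmt=True)}",  # e.g. Tue, 19 Jan 1999 18:10:20 GMT
        f"Server: {server}",
        f"Content-Type: {content_type}",
        f"Content-Length: {len(body)}",
    ]
    # An empty line separates the headers from the body.
    return ("\r\n".join(headers) + "\r\n\r\n").encode("ascii") + body
```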
Simple Web Access Example: Step 3a
- The browser sends GET /sample.htm HTTP/1.0
- The server replies:
    HTTP/1.0 404 Object Not Found
    Content-Type: text/html
    <empty line>
    <body><h1>HTTP/1.0 404 Object Not Found</h1></body>
- The server closes the file and the network connection, and waits for the next request.
Simple Web Access Example: Step 4
- The browser receives the HTTP response, a web document with HTML tags, from the server.
- The browser parses/processes the HTML document and displays the content according to the tags.
- When other image/audio/video data are referenced by <img>, <object>, or <applet> tags, the browser initiates the retrieval of that data.
- Some of these are HTTP requests to the same web server. That is why keep-alive connections improve web server throughput: the server does not have to set up a new connection for each object.
- One URL request may trigger many HTTP requests to several web servers.
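How a browser finds the extra objects to fetch can be approximated with Python's html.parser module. The attribute choice (src for <img>, data for <object>, code for <applet>) is a simplification; real pages also use attributes such as codebase and archive.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class EmbeddedRefs(HTMLParser):
    """Collect the objects a page references through <img>, <object>, <applet>."""

    ATTRS = {"img": "src", "object": "data", "applet": "code"}

    def __init__(self) -> None:
        super().__init__()
        self.refs: List[Tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        wanted = self.ATTRS.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                self.refs.append((tag, value))


def embedded_refs(html: str) -> List[Tuple[str, str]]:
    parser = EmbeddedRefs()
    parser.feed(html)
    parser.close()
    return parser.refs
```

Each reference that points at the same server can then be fetched over the existing keep-alive connection.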
HTTP
- HTTP/1.0 and 1.1: http://www.w3.org/Protocols/rfc2068/rfc2068
- An HTTP request consists of:
  - a method: GET, HEAD, POST, PUT, DELETE, etc.
  - a Uniform Resource Identifier (URI)
  - the protocol version
  - other information to modify or supplement the request, e.g.:
    - If-Modified-Since (return the object only if it is newer than the given date)
    - Authorization (user password or other authentication, as required)
    - Accept: application/postscript
HTTP Response
- An HTTP response consists of:
  - a status line (success or failure), e.g., HTTP/1.1 400 Bad Request. Other common codes: 200 (Document Follows), 301 (Moved Permanently), 302 (Moved Temporarily), 304 (Not Modified), 401 (Unauthorized), 402 (Payment Required), 403 (Forbidden), 404 (Not Found), 500 (Internal Server Error)
  - a description of the information (metaheaders): Server, Date, Content-Length, Content-Type, Content-Encoding, Last-Modified
  - the actual information requested.
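A client typically looks first at the class of the status code (its first digit). A minimal parser, with a hypothetical function name:

```python
from typing import Tuple

CLASSES = {1: "informational", 2: "success", 3: "redirection",
           4: "client error", 5: "server error"}


def parse_status_line(line: str) -> Tuple[str, int, str, str]:
    """Split e.g. "HTTP/1.1 404 Not Found" into version, code, reason, and class."""
    # Pad with "" so a status line without a reason phrase still unpacks.
    version, code_text, reason = (line.strip().split(" ", 2) + [""])[:3]
    code = int(code_text)
    return version, code, reason, CLASSES.get(code // 100, "unknown")


print(parse_status_line("HTTP/1.1 400 Bad Request"))
# ('HTTP/1.1', 400, 'Bad Request', 'client error')
```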
Content-Type: MIME Types
MIME type                    File extension
text/plain                   txt; default for most servers
text/html                    htm, html
application/postscript       ps
application/ms-powerpoint    ppt
application/x-javascript     js
image/gif                    gif
image/jpeg                   jpg
audio/midi                   mid
video/mpeg                   mpg
x-world/x-vrml               wrl
Configure MIME Types
- To support new MIME types, both the web server and the web client may need to be reconfigured.
- For the web server:
  - add the new MIME type definition to the mime.types file in the web server's configuration directory
  - note that by default most servers deliver unknown types as text/plain, so the browser may display them as gibberish
  - restart the web server.
- For the web client:
  - specify an external viewer associated with the MIME type, or
  - install the plug-in associated with the MIME type.
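The mime.types file lists one MIME type per line followed by its extensions (the format used by Apache, among others). This sketch parses such lines and falls back to text/plain for unknown extensions, mirroring the default described above; the function names are hypothetical.

```python
import posixpath
from typing import Dict, Iterable


def load_mime_types(lines: Iterable[str]) -> Dict[str, str]:
    """Parse mime.types-style lines ("type ext1 ext2 ...") into an extension map."""
    table: Dict[str, str] = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        mime_type, *extensions = line.split()
        for ext in extensions:
            table[ext.lower()] = mime_type
    return table


def content_type(path: str, table: Dict[str, str], default: str = "text/plain") -> str:
    """Look up a file's MIME type by extension, falling back to the default."""
    ext = posixpath.splitext(path)[1].lstrip(".").lower()
    return table.get(ext, default)
```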
Brief Survey of Web Servers
- http://www.w3c.org/hypertext/WWW/Servers.html
- Jigsaw: http://www.w3c.org/Jigsaw/
- http://java.sun.com/products/java-servers/
- http://www.yahoo.com/computers_and_Internet/Internet/World_Wide_Web/HTTP/Servers
- http://www.netcraft.co.uk/Survey/
- Web Server Technologies by Nancy J. Yeager and Robert E. McGrath, Morgan Kaufmann, 1996.
CGI Script Example
- The client types http://owl.uccs.edu/cgi-bin/chow/uptime.pl
- or clicks on <A HREF="http://owl.uccs.edu/cgi-bin/chow/uptime.pl">Show the load on owl</A> in a web page.
- uptime.pl:
    #!/usr/bin/perl
    $UPTIME = '/usr/ucb/uptime';
    select(STDOUT); $| = 1;   # make output unbuffered so the header is sent before exec
    print "Content-type: text/plain\n\n";
    if (-x $UPTIME) { exec($UPTIME); }
    else { print "cannot find uptime command on this system.\n"; exit(1); }
CGI Script Example (Step 2)
- The web browser sends GET /cgi-bin/chow/uptime.pl HTTP/1.0 to owl.uccs.edu.
- The httpd server at owl parses the request and discovers that a Perl script needs to be executed.
- It locates the script in the file system.
- It creates the execution environment:
  - starting a process with the appropriate shell environment variables set
  - with STDIN from the httpd program
  - with STDOUT to httpd.
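Python's subprocess module can mimic how httpd launches a CGI program: request data goes into environment variables (GATEWAY_INTERFACE, REQUEST_METHOD, SCRIPT_NAME, QUERY_STRING, and CONTENT_LENGTH are standard CGI/1.1 names) and onto STDIN, and the reply is read from STDOUT. The demo script is a hypothetical stand-in for uptime.pl, and passing the whole server environment through is a simplification.

```python
import os
import subprocess
import sys
from typing import List

# Hypothetical stand-in for uptime.pl: a Python one-liner that prints a CGI reply.
DEMO_SCRIPT = ('import os; print("Content-type: text/plain"); print(); '
               'print("method=" + os.environ["REQUEST_METHOD"])')


def run_cgi(argv: List[str], method: str, script_name: str, query: str = "",
            body: bytes = b"", timeout: float = 10.0) -> bytes:
    """Run a CGI program as httpd does: request data in environment variables
    and on STDIN, the response read back from the program's STDOUT."""
    env = dict(os.environ)  # inherit PATH etc.; a real httpd passes a curated set
    env.update({
        "GATEWAY_INTERFACE": "CGI/1.1",
        "SERVER_PROTOCOL": "HTTP/1.0",
        "REQUEST_METHOD": method,
        "SCRIPT_NAME": script_name,
        "QUERY_STRING": query,
        "CONTENT_LENGTH": str(len(body)),
    })
    result = subprocess.run(argv, input=body, env=env, stdout=subprocess.PIPE,
                            timeout=timeout, check=True)
    return result.stdout


if __name__ == "__main__":
    print(run_cgi([sys.executable, "-c", DEMO_SCRIPT], "GET", "/cgi-bin/demo").decode())
```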
CGI Script Example (Step 3)
- uptime.pl generates:
    Content-type: text/plain
    <empty line>
    15:55 up 18 days, 7:15, 5 users, load average: 0.89, 0.81, 0.79
- This is sent over STDOUT back to httpd.
- httpd adds:
    HTTP/1.0 200 OK
    Server: Netscape-Communications/1.1
    Date: Tuesday, 27-Jan-98 23:12:45 GMT
- httpd relays the resulting text back to the web browser.
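The httpd step of adding its own status line and headers to the CGI output could be sketched as follows. The CGI Status: header, which lets a script choose a code other than 200, is part of the CGI convention; the default Server string is a placeholder and the function name is hypothetical.

```python
from email.utils import formatdate
from typing import Optional


def cgi_to_http(cgi_output: bytes, server: str = "ExampleHTTPd/0.1",
                now: Optional[float] = None) -> bytes:
    """Turn CGI output (headers, empty line, body) into a complete HTTP/1.0 reply."""
    # Scripts may end header lines with LF or CR LF; split at the first empty line.
    found = [(cgi_output.find(sep), sep) for sep in (b"\r\n\r\n", b"\n\n")]
    found = [(i, sep) for i, sep in found if i >= 0]
    if not found:
        raise ValueError("CGI output has no empty line after its headers")
    i, sep = min(found)
    head, body = cgi_output[:i], cgi_output[i + len(sep):]

    status = "200 OK"
    headers = []
    for line in head.decode("latin-1").splitlines():
        name, _, value = line.partition(":")
        if name.strip().lower() == "status":  # CGI "Status:" overrides 200 OK
            status = value.strip()
        elif line.strip():
            headers.append(line.strip())

    lines = [f"HTTP/1.0 {status}", f"Server: {server}",
             f"Date: {formatdate(now, usegmt=True)}"] + headers
    return ("\r\n".join(lines) + "\r\n\r\n").encode("latin-1") + body
```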
What Problems Can Occur?
- How can a script running in an infinite loop be detected?
- How can a hung script be detected?
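One common answer to both questions is a watchdog timer: run the script under a wall-clock limit and kill it when the limit expires. The sketch below uses the timeout parameter of subprocess.run, which kills the child on expiry. A wall-clock limit alone cannot tell a looping script from a hung one; on UNIX, a CPU-time limit (for example, resource.setrlimit with RLIMIT_CPU) catches the looping script but not the hung one, so the two limits can be combined. The test scripts and function name are hypothetical.

```python
import subprocess
import sys
from typing import List, Tuple

# Hypothetical misbehaving scripts used to exercise the watchdog.
LOOPING_SCRIPT = [sys.executable, "-c", "while True: pass"]            # burns CPU
HUNG_SCRIPT = [sys.executable, "-c", "import time; time.sleep(3600)"]  # blocks, idle


def run_with_timeout(argv: List[str], timeout: float) -> Tuple[str, bytes]:
    """Run a CGI-like program under a wall-clock limit.

    Returns ("ok" | "failed" | "timeout", stdout)."""
    try:
        result = subprocess.run(argv, stdin=subprocess.DEVNULL,
                                stdout=subprocess.PIPE, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed and reaped the child here; an httpd
        # would now send an error reply (e.g. 500) to the browser.
        return "timeout", b""
    return ("ok" if result.returncode == 0 else "failed"), result.stdout
```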
Handle Multiple Requests
- Sequential processing is not affordable, since some requested documents are big.
- Three basic approaches:
  1. Fork a new child process (a clone of httpd) for each request.
  2. Use multithreading (if the OS or language supports it), e.g., IIS, Java Web Server, Jigsaw.
  3. Spread the load among several helper programs, e.g., Apache.
- Apache allows the starting, minimum, and maximum number of child web server processes to be specified in a configuration file, so it can adjust dynamically to the load.
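Approach 2 (one thread per request) can be demonstrated with Python's standard http.server.ThreadingHTTPServer; socketserver.ForkingMixIn would give approach 1 on UNIX. This is a toy, not a production server; the handler class and helper names are hypothetical, and the handler just reports which thread served the request.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class ThreadName(BaseHTTPRequestHandler):
    """Replies with the name of the thread that handled the request."""

    def do_GET(self):
        body = f"served {self.path} by {threading.current_thread().name}\n".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request log line


def start_server() -> ThreadingHTTPServer:
    """Start a thread-per-request server on a free local port (port 0)."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), ThreadName)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def get(server: ThreadingHTTPServer, path: str) -> str:
    """Fetch a path from the running server."""
    host, port = server.server_address[:2]
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        return conn.getresponse().read().decode()
    finally:
        conn.close()
```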
More than One Web Service on the Same Server Platform
- Run different (or the same) httpd programs on different ports:
  - http://www.server.org/intro.html (port 80 by default)
  - http://www.server.org:8080/intro.html (port 8080)
  - http://www.server.org:8081/intro.html (port 8081)
- They may have different document trees, content, and access control, and serve different user groups (customers, sales, authorized users).
- Note that running a program on any port < 1024 requires root privilege (on UNIX).
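Which port a URL refers to, and whether binding that port needs root, can be worked out mechanically. A small sketch; the default-port table is limited to three common schemes and the function names are hypothetical.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def effective_port(url: str) -> int:
    """Return the TCP port a URL refers to, using the scheme's default if omitted."""
    parts = urlsplit(url)
    return parts.port if parts.port is not None else DEFAULT_PORTS[parts.scheme]


def needs_root(port: int) -> bool:
    """On UNIX, binding to a port below 1024 traditionally requires root."""
    return port < 1024


for u in ("http://www.server.org/intro.html", "http://www.server.org:8080/intro.html"):
    print(u, effective_port(u), needs_root(effective_port(u)))
```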
Virtual Hosting
- Allows one server to serve requests sent to multiple IP addresses.
- It is a low-cost option for clients that want their own identity but cannot afford a separate machine/connection.
- Host other domain names on the same machine, e.g.:
  - http://www.a.com/home.html
  - http://www.b.com/home.html
- Requires an OS with virtual host support.
- Assign multiple IP numbers to the same interface, using the ifconfig command in UNIX or the TCP/IP network settings in NT.
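With IP-based virtual hosting, the server picks the document tree from the local address on which the connection arrived (in a socket server, conn.getsockname()[0]). The mapping below is a hypothetical configuration. HTTP/1.1's Host header later allowed name-based virtual hosting, which needs no extra IP addresses.

```python
from typing import Dict, Optional

# Hypothetical configuration: one document tree per IP address on the interface.
DOC_ROOTS: Dict[str, str] = {
    "192.168.123.2": "/www/a.com",
    "192.168.123.3": "/www/b.com",
}


def doc_root_for(local_ip: str, default: Optional[str] = None) -> str:
    """Pick the document tree from the IP address the request arrived on."""
    root = DOC_ROOTS.get(local_ip, default)
    if root is None:
        raise LookupError(f"no virtual host configured for {local_ip}")
    return root
```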
Assign Multiple IP Addresses to the Same Interface
- On FreeBSD, execute:
    ifconfig ep0 192.168.123.2
    ifconfig ep0 192.168.123.3 netmask 0xffffffff alias
    ifconfig ep0 192.168.124.1 alias
  (The all-ones netmask on an alias in the same subnet as the primary address suppresses an error message.)
- On Linux, execute:
    ifconfig eth0:0 192.168.123.3
    ifconfig eth0:1 192.168.124.1
- You may also add host routes:
    route add -host 192.168.123.3 dev eth0:0
    route add -host 192.168.124.1 dev eth0:1
New Hosting Technique
- Set up a virtual machine for each customer.
- Related software packages:
  - User-mode Linux
  - VMware ESX and VirtualCenter/Infrastructure
  - Microsoft Virtual Server 2005
- Utility computing (on-demand computing).
Improving WWW Delivery Systems
- Currently the network is the bottleneck.
- The retrieval of web pages can be improved by:
  - increasing network bandwidth, e.g., ADSL links
  - reducing round trips, e.g., using client-side programming in Java/JavaScript to check data before it is sent
  - caching (both at the client and at proxy cache servers)
  - increasing the processing power of web servers
  - load balancing by partitioning client-server requests.
Large Web Sites
- Map requests for a single name, e.g., ftp.netscape.com, evenly across a set of servers, e.g., ftp1.netscape.com through ftp28.netscape.com.
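One common mechanism for this mapping is round-robin DNS, where successive lookups of the single name rotate through the server list; the slide does not say which mechanism Netscape actually used. A simplified sketch with hypothetical function names (a real DNS server rotates the order of the whole address list in each reply rather than returning one name):

```python
import itertools
from typing import Iterator, List


def mirror_names(prefix: str, domain: str, count: int) -> List[str]:
    """Build names like ftp1.netscape.com ... ftp28.netscape.com."""
    return [f"{prefix}{i}.{domain}" for i in range(1, count + 1)]


def round_robin(names: List[str]) -> Iterator[str]:
    """Hand out the names in rotation, one per lookup."""
    return itertools.cycle(names)


rr = round_robin(mirror_names("ftp", "netscape.com", 28))
print([next(rr) for _ in range(3)])
# ['ftp1.netscape.com', 'ftp2.netscape.com', 'ftp3.netscape.com']
```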
Cisco DistributedDirector
- DistributedDirector uses the Director Response Protocol (DRP), a UDP-based application protocol, to query DRP server agents for BGP and IGP routing-table metrics between distributed servers and clients, and uses these metrics to distribute the load.
Internet Caching
- Harvest/Squid cache: hierarchical; 42% FTP bandwidth reduction.
- Client/proxy cache: local, small; 65% bandwidth reduction.
- Server push cache: Gwertzman and Seltzer (Harvard).
- Distributed Internet Cache: Povey and Harrison (UQ); hierarchical index at the top of the tree, content at the leaves.
- Cachemesh: Wong and Crowcroft (UCL); cache routing table to reduce search overhead.
- WebWave: Heddaya and Mirdad (BU); caching on the route, tree load balancing, load diffusion.
- Adaptive Web Caching: Zhang, Floyd, and Jacobson; self-configuring cache groups, multicast.
Harvest/Squid Object Cache
- Hierarchical cache: Danzig, Hall, and Schwartz showed that it reduces FTP traffic by 42%. Place big caches between regional networks and the backbone. Byte-hops are used as the metric.
- Harvest object cache: a manually configured hierarchical cache system. A cache uses the Internet Cache Protocol (ICP) to query its sibling and parent caches (recursively).
- NLANR Squid object cache: an Internet-wide hierarchical cache system. Problems:
  - 14 separate Australian branches from the US
  - California (CA) content sources distribute content through the East Coast root cache and back to CA clients.
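The hierarchical lookup can be sketched as: check the local cache, ask the siblings, and on a miss let the parent resolve the request (recursively), up to the origin server. This in-memory model is our simplification; it sends no real ICP messages and ignores expiry and cache size. The class name is hypothetical.

```python
from typing import Dict, List, Optional, Tuple


class Cache:
    """One cache in a Harvest/Squid-style hierarchy (simplified: no real ICP)."""

    def __init__(self, name: str, parent: Optional["Cache"] = None,
                 siblings: Optional[List["Cache"]] = None):
        self.name = name
        self.parent = parent
        self.siblings = siblings or []
        self.store: Dict[str, bytes] = {}

    def get(self, url: str, origin: Dict[str, bytes]) -> Tuple[bytes, str]:
        """Return (object, where it was found), keeping a local copy."""
        if url in self.store:
            return self.store[url], self.name
        for sibling in self.siblings:
            # ICP-style query to a sibling: it reports a hit but never fetches.
            if url in sibling.store:
                data, where = sibling.store[url], sibling.name
                break
        else:
            if self.parent is not None:
                # A parent resolves the miss for us, recursively up the tree.
                data, where = self.parent.get(url, origin)
            else:
                data, where = origin[url], "origin"
        self.store[url] = data
        return data, where
```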
Server Push Cache
- Assumes a network with many push cache servers.
- Shows that server-initiated caching (push caching) can be combined with client caching to be very effective.
- Uses network topology information and access history to decide on which push cache server to place a replica.
Distributed Internet Cache
- Povey and Harrison (University of Queensland, Brisbane).
- Addresses the hierarchical cache problem.
- Uses the hierarchical structure for data searching only: mapping information on the non-leaf nodes, content on the leaves.
- After retrieving a new page, a leaf cache sends an advertisement up the tree hierarchy. Each non-leaf node on the path stores the advertisement (URL, cache location) in its table.
- Disadvantage: increased load on the leaf caches.
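The advertisement scheme can be sketched as a tree in which leaves store pages and each non-leaf node records which leaf holds which URL. Class and method names are hypothetical; this only illustrates the idea, not Povey and Harrison's protocol.

```python
from typing import Dict, Optional


class Node:
    """Tree node: leaves hold content; inner nodes hold url -> leaf-cache tables."""

    def __init__(self, name: str, parent: Optional["Node"] = None):
        self.name, self.parent = name, parent
        self.table: Dict[str, str] = {}       # inner nodes: url -> leaf holding it
        self.content: Dict[str, bytes] = {}   # leaves: cached pages

    def store_page(self, url: str, page: bytes) -> None:
        """A leaf caches a new page and advertises (url, location) up the tree."""
        self.content[url] = page
        node = self.parent
        while node is not None:
            node.table[url] = self.name
            node = node.parent

    def locate(self, url: str) -> Optional[str]:
        """Walk up from this node until some node knows where url is cached."""
        node: Optional[Node] = self
        while node is not None:
            if url in node.content:
                return node.name
            if url in node.table:
                return node.table[url]
            node = node.parent
        return None
```

Every hit is then served by a leaf, which is the extra leaf load the slide lists as the disadvantage.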
Cachemesh
- Wong and Crowcroft (University College London).
- A client searches the cache routing table for the cache location.
- A collection of cooperating caches uses the Cache Information Exchange Protocol (CIEP) to add/delete entries in the cache routing table.
- A web site is the unit for cache table entries.
- Collision resolution: when multiple cache servers claim responsibility (based on access frequency) for a web site, they use a random CIEP_ADD/DELETE sending delay.
- Realistic metrics are to be used for selecting the cache server.
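A Cachemesh-style routing table keyed by web site, plus the random announcement delay, might look like this. Only the CIEP_ADD/CIEP_DELETE names come from the slide; the table layout, the class name, and the uniform delay distribution are assumptions.

```python
import random
from typing import Dict, Optional
from urllib.parse import urlsplit


class CacheRoutingTable:
    """One entry per web site, naming the cache responsible for it."""

    def __init__(self) -> None:
        self.routes: Dict[str, str] = {}

    def add(self, site: str, cache: str) -> None:  # effect of a CIEP_ADD
        self.routes[site.lower()] = cache

    def delete(self, site: str) -> None:  # effect of a CIEP_DELETE
        self.routes.pop(site.lower(), None)

    def lookup(self, url: str) -> Optional[str]:
        """Return the cache responsible for the URL's site, or None to go direct."""
        return self.routes.get(urlsplit(url).hostname or "")


def announce_delay(max_delay: float, rng: random.Random) -> float:
    """Random back-off before sending CIEP_ADD/DELETE, so that competing caches
    rarely announce the same site at the same moment."""
    return rng.uniform(0.0, max_delay)
```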
WebWave
- Heddaya and Mirdad (Boston University).
- No directory lookup or cache search.
- Caches lie along the route to the source.
- Assumes a cache server can change filter rules in the router to intercept and serve web requests.
- Defines optimal Tree Load Balancing (TLB).
- Provides load diffusion algorithms that achieve TLB.
- Only addresses a single tree for now.
Adaptive Web Cache
- Zhang (UCLA), Floyd, and Jacobson (NRG, LBNL).
- A new DARPA-funded research project.
- Focuses on scalability and self-configuration.
- Self-configuring cache groups use the Cache Group Management Protocol (CGMP).
- IP multicast delivery.
- A cache server may join multiple cache groups (multi-homed hosts are selected as cache servers).
- Ideally, only one cache server forwards a request to the source.
Dynamic Server Selection: One Candidate Architecture
[Figure: servers push their load status; clients probe server response times.]
Novel Server Selection Technique: Fei et al. (Ammar), GIT-CC-97-24
- Uses application-layer anycast to select the best of several geographically separated web servers.
- Servers push their load status to the resolver, but only when the load changes by more than a threshold.
- The client (resolver) probes the response time of each server by retrieving a fixed-size document from it.
- Avoids oscillation by returning one server from a set of equivalent servers.
- Investigates the impact of push/probe frequency on response time.
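The push-only-on-change rule keeps server-to-resolver traffic low. A minimal sketch; the class name and threshold value are hypothetical, and the push callback stands in for whatever message the real system sends to the resolver.

```python
from typing import Callable, Optional


class LoadPusher:
    """Server side: push load to the resolver only when it moved by > threshold."""

    def __init__(self, threshold: float, push: Callable[[float], None]):
        self.threshold = threshold
        self.push = push
        self.last_pushed: Optional[float] = None

    def report(self, load: float) -> bool:
        """Called periodically with the current load; returns True if it pushed."""
        if self.last_pushed is None or abs(load - self.last_pushed) > self.threshold:
            self.push(load)
            self.last_pushed = load
            return True
        return False


sent = []
pusher = LoadPusher(0.2, sent.append)
print([pusher.report(x) for x in [0.5, 0.6, 0.65, 0.9, 0.8]], sent)
```

Here only 0.5 and 0.9 reach the resolver; the small changes in between are suppressed.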
Application-layer Anycast Architecture
Experimental Topology
Performance of Server Location Scheme
Response Time Varying with Push and Probe Frequency
[Figure: response time for server push twice/min with client probe once/6 min vs. server push 12 times/min with client probe once/10 min.]
Dynamic Server Selection vs. Load Balancing in Servers
- In Fei et al.'s work, once every client chooses the most lightly loaded server, that server becomes heavily loaded.
- In the next round, every client swings to the next lightest server, resulting in oscillation in server selection.
- How to damp the oscillation:
  - anycast resolvers return a set of good servers
  - a threshold is used to add servers to and delete servers from the good server set.
- Dynamic server selection targets user response time; load balancing in servers targets system throughput.
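Returning a set of good servers and letting each client choose randomly within it spreads clients out instead of sending all of them to the single best server. In this sketch the "good" set is every server whose probed response time is within a fixed slack of the best; the slack rule and function names are an assumed instance of the threshold mentioned above.

```python
import random
from typing import Dict, List


def good_set(response_times: Dict[str, float], slack: float) -> List[str]:
    """Servers whose probed response time is within slack of the best one."""
    best = min(response_times.values())
    return sorted(s for s, t in response_times.items() if t <= best + slack)


def choose(response_times: Dict[str, float], slack: float, rng: random.Random) -> str:
    """Pick randomly within the good set instead of always taking the single best,
    so that clients do not all swing to the same server at once."""
    return rng.choice(good_set(response_times, slack))
```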
WAN Load Balancing Architecture
[Figure: load-balanced (LBed) servers, probes, performance updates, server pushes (multicast), the LB coordination protocol, and client/server communication.]
WAN Load Balancing Architecture-2
[Figure: servers, probes, the LB coordination protocol, and client/server communication.]
WAN Load Balancing Architecture-3
[Figure: LB query/response.]
Functions of LB Coordinator
- Collect server load, network status, and traffic status from the probing and traffic control modules.
- Share the server and traffic status with other LB coordinators via the LB coordinating protocol.
- Run a load-balancing algorithm that:
  - directs the client requests (macro control)
  - dynamically regulates the client-server traffic (micro control).
- Control the probing frequency of the probing module.
- Regulate the client-server communication traffic.
Status Collection in WLB System
- Passive traffic monitoring of client-server communication.
- Server load reports from other LB coordinators.
- Active probing of server and network loads when there are no traffic status reports.
- Research issues:
  - traffic monitoring system design
  - efficiency, accuracy, and coordination of the probing system
  - deriving server and network load from traffic data.
Traffic Control in WLB System
- Admission control (macro-level control):
  - estimate the load of the requests
  - direct the requests.
- Traffic grooming/shaping (micro-level control):
  - At what protocol level (TCP, IP)?
  - At which module/interface (router? Layer 4/content/web switch)?
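For micro-level traffic shaping, a token bucket is one standard technique; the slides do not commit to a specific one. This sketch takes the current time as a parameter so that its behavior is deterministic; the rate and capacity values in the test are arbitrary.

```python
class TokenBucket:
    """Shape traffic: allow rate units per second, with bursts up to capacity."""

    def __init__(self, rate: float, capacity: float, now: float = 0.0):
        self.rate, self.capacity = rate, capacity
        self.tokens = capacity  # start full, so an initial burst is allowed
        self.last = now

    def allow(self, size: float, now: float) -> bool:
        """Refill for the elapsed time, then admit the request if enough tokens remain."""
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False
```

A request that is refused can be delayed or dropped; either choice is a shaping policy the coordinator would have to set.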
Important Web Sites
- http://www.w3c.org/
- http://developer.netscape.com/
- http://java.sun.com/
- http://www.microsoft.com/workshop/default.asp
- http://www.apache.org/
- http://www.netcraft.co.uk/Survey/
- http://web.mit.edu/afs/athena/user/w/s/wsmart/WEB/HTMLtutor.html
- ...
Useful References
- O'Reilly's Web series: HTML, CGI, Dynamic HTML, Programming Perl
- Web Server Technologies by Nancy J. Yeager and Robert E. McGrath, Morgan Kaufmann, 1996.
- HTML/CGI
- World Wide Web: Beyond the Basics, edited by Marc Abrams, Prentice Hall, 1998.
- MS Technical Support for IIS, self-learning manual.
- How to Set Up and Maintain a Web Site, L. Stein.
- Web Server Tuning