Advanced Internet and Web Systems - PowerPoint PPT Presentation

Slides: 74
Provided by: rudolp9
Learn more at: http://cs.uccs.edu


1
Advanced Internet and Web Systems
  • C. Edward Chow

2
Outline of the Talk
  • Syllabus
  • Introduction to WWW Systems
  • Survey of Web Cluster Systems
  • Survey of Caching Techniques
  • Server Selection and Load Balancing

3
Introduction to WWW Systems
(Figure: a Web Authoring System, fed by a scanner, video
capture, and a sound card, creates web page documents written
in HTML and publishes them to a Web Server hosting web pages;
a Web Client (browser) retrieves the pages across the Internet
using the HTTP protocol.)
4
What is Unique in WWW?
  • Hyperlink: use HyperText Markup Language (HTML) to
    describe the document in ASCII text (extended to
    ISO-8859-1)
  • Naming scheme: name objects in the web with a
    Uniform Resource Locator (URL), with syntax
    protocol://domain_name/<URI or path name>
  • HTTP: HyperText Transfer Protocol, a simple
    request-response protocol for transferring HTML
    documents
  • ASCII-text based (not binary, therefore easy to
    debug)
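The URL syntax on this slide maps directly onto Python's standard urllib.parse.urlsplit. A minimal sketch follows; the helper name describe_url is hypothetical, and it assumes http URLs, filling in port 80 when none is given.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Split an http URL into (protocol, domain name, port, path).

    Assumes the http scheme: port 80 is used when the URL names no port,
    and "/" is used when the path is empty.
    """
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else 80  # http default port
    path = parts.path or "/"
    return parts.scheme, parts.hostname, port, path


if __name__ == "__main__":
    print(describe_url("http://news.netcraft.com/archives/web_server_survey.html"))
    print(describe_url("http://www.server.org:8080/intro.html"))
```

The second call shows an explicit port, the form used when several web services share one machine.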

5
Web Authoring System
  • Text editor: type in HTML <tag>s and content
  • HTML editor: works like a normal word processor,
    so the user does not have to know much HTML
    syntax. Provides easy upload/download functions.
  • Dreamweaver
  • Netscape Page Composer, MS FrontPage
  • FrontPage goes a step further by providing
    templates and hyperlink management functions
  • Most desktop publishing software and word
    processors have built-in converters from their
    internal format to HTML, for
    example
  • FrameMaker, Office 97 (requires a special viewer)

6
Web Delivery Systems
  • Deliver web documents efficiently and reliably
    to the web clients.
  • Content Distribution and Content Delivery
  • Performance is determined by
  • Web server performance
  • Network path performance
  • Client browser performance.
  • Use multiple physical servers (server farm), and
    multiple server farms in a wide area.
  • A new generation of proxy servers/content switches
    is emerging.

7
Content Delivery Network (CDN)
(Figure: clients on @Home, PSINet, and MindSpring send huge
numbers of requests to a single server, leading to slow
response and server crashes.)
8
Content Delivery Problems
http://www.akamai.com
9
Use Client Cache/Client Side Cache Server
(Figure: client caches and client-side cache servers at ISPs
such as @Home, PSINet, Sprint, UUnet, Globix, and MindSpring
mean fewer requests reach the origin server, giving clients
fast response.)
10
Use Mirror Sites
Needs improvement: guide the selection of mirror
servers with server load/network bandwidth
measurements.
(Figure: clients on @Home, PSINet, and MindSpring get fast
response from a nearby mirror site.)
11
Edge Network Cache Servers
(Figure: an edge network cache server, together with client
caches and client-side cache servers, gives clients on @Home,
PSINet, and MindSpring fast response.)
12
Architecture solutions for scalable Web-server
systems (Fig. 1)
13
Fig. 2. Model architecture for a locally
distributed Web system
14
Fig. 3. Architecture of a cluster-based Web system
15
Fig. 4. Architecture of a virtual Web cluster
16
Fig. 5. Architecture of a distributed Web system
17
Content Distribution
  • Securely automate content/application distribution
    to a single site (multiple servers) or to wide-area
    Internet sites.
  • Provide replication, synchronization, staged
    rollout, and rollback.
  • With revision control, transmit only updates.
  • User-defined file distribution profiles/rules

18
Content Delivery Problem
  • Cache location problem: where to put cache
    servers?
  • How many are needed?
  • When/where/how to push/deliver the content?
  • How about dynamic content?

19
Akamai Edge Delivery Service
Date      # of Edge Servers   # of Networks   # of Countries
11/2000   6000                335             54
6/2001    9700                650             56
  • Peering bottleneck problem: access traffic is
    evenly spread over 7400 networks (no one network
    carries over 5%, most << 1%). Edge servers need to
    be placed in many networks.
  • 11/2000: 4 billion bits/day for 2800 sites.
  • Source: http://www.akamai.com

20
F5 Web System Product
21
BIG/ip - Delivers High Availability
  • E-commerce - ensures sites are not only
    up-and-running, but taking orders
  • Fault-tolerance - eliminates single points of
    failure
  • Content Availability - verifies servers are
    responding with the correct content
  • Directory Authentication - load balance
    multiple directory and/or authentication services
    (LDAP, Radius, and NDS)
  • Portals/Search Engines - using EAV, administrators
    perform keyword searches
  • Legacy Systems - load balance services to
    multiple interactive services
  • Gateways - load balance gateways (SAA, SNA, etc.)
  • E-mail (POP, IMAP, SendMail) - Balances traffic
    across a large number of mail servers

22
3DNS Intelligent Load Balancing
  • Intelligent Load Balancing
  • QoS Load Balancing
  • Quality of Service load balancing is the ability
    to select and apply different load balancing methods
    for different users or request types
  • Modes of Load Balancing
  • Round Robin, Ratio
  • Least Connections, Random
  • User-defined Quality-of-Service, Round Trip Time
  • Completion Rate (Packet Loss), BIG/ip Packet Rate
  • Global Availability, Hops
  • Topology Distribution, Access Control
  • LDNS Round Robin, Dynamic Ratio
  • E-Commerce

23
GLOBAL-SITE Replicate Multiple Servers and Sites
  • File archiving engine and scheduler for automated
    site and server replication
  • BIG-IP controls server availability during
    replication and synchronization
  • Graceful shutdown for updates
  • Updates in a grouped/scheduled manner
  • FTP transfers files from GLOBAL-SITE
    to target servers (agent-free, scalable)
  • RCE for source control
  • No client-side software
  • Complete, turnkey system (appliance) (adapted from
    F5 presentation)

24
Intel NetStructure
  • Routing based on XML tags (e.g., giving preferred
    treatment to large-volume buyers)
  • http://www.intel.com/network/solutions/xml.htm

25
1. Compared to SUN E450 server
26
Simple Web Access Example Step1
  • Someone requests a document using a browser (web
    client) on a computer connected to the Internet
  • In the browser window, type in a URL, e.g.,
    http://news.netcraft.com/archives/web_server_survey.html
  • Equivalent of telnet www.netcraft.co.uk 80 > out,
    then typing GET /survey/ HTTP/1.0<cr><cr>
  • Here <cr> is a carriage return, entered by
    pressing the Enter key
  • The browser parses the URL,
  • obtains the domain name of the URL, www.netcraft.co.uk
  • asks the Domain Name System (DNS) server to
    translate the domain name into an IP address
  • with the IP address, the client computer sets up an
    HTTP (TCP) connection to the server
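The steps above (parse the URL, resolve the name through DNS, open a connection, send the request) can be sketched with Python's socket module. This is an illustrative sketch, not how any particular browser is implemented; the function names are hypothetical, and unlike the telnet example it sends a Host header, which virtually hosted servers generally need in order to pick the right site.

```python
import socket


def build_get_request(host, path):
    """HTTP/1.0 GET request: CR LF ends each line, an empty line ends the headers."""
    return f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")


def fetch(host, path="/", port=80, timeout=10.0):
    """Resolve host via DNS, connect over TCP, send GET, read until the server closes."""
    # Ask DNS (through the system resolver) to translate the name into an address.
    family, socktype, proto, _, address = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM)[0]
    with socket.socket(family, socktype, proto) as sock:
        sock.settimeout(timeout)
        sock.connect(address)
        sock.sendall(build_get_request(host, path))
        chunks = []
        # Without keep-alive, an HTTP/1.0 server closes the connection after replying.
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)
```

For example, fetch("www.netcraft.co.uk", "/survey/") would return the raw status line, headers, and body; whether that server still serves this path today is not assumed.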

27
Computer Network
  • Local Area Network (LAN): a privately owned network
    within a single building or campus of up to a few
    kilometers in size (Tanenbaum).
  • Wide Area Network (WAN): a network that spans a
    large geographical area, often a country or
    continent, and connects LANs or MANs. It consists
    of transmission lines (called circuits, channels,
    or trunks) and switching elements (called
    switching nodes, data switching exchanges, or
    routers).

(Figure: a web client, a web server, and DNS servers on the
network.)
28
Protocol and Protocol Layer
  • A set of rules for achieving a global objective
    exercised by geographically distributed nodes.
    (Robert Gallager, Prof. EE MIT)

29
Protocol Data Encapsulation
30
Internet Protocol Layer Interface
31
Simple Web Access Example Step2
  • Browser sends the following character string to
    the server:
      GET /survey/ HTTP/1.0
      User-agent: Mosaic for X windows/2.4
      Accept: text/plain
      Accept: text/html
      Accept: image/
  • httpd server
  • parses the request according to HTTP protocol 1.0
  • interprets the rest of the metainformation for
    browser capabilities
  • maps /survey/ to c:/InetPub/wwwroot/survey/default.htm,
    a file path in its file system, according to the
    server configuration
  • retrieves c:/InetPub/wwwroot/survey/default.htm
    or index.html
  • sends the information back in HTTP/1.0 format
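The mapping step above can be sketched as follows, assuming a document root such as c:/InetPub/wwwroot and the two default file names from this slide. The function name is hypothetical; it assumes the request path has already been URL-decoded and stripped of any query string, and it adds a check (not on the slide) that refuses paths climbing out of the document root with "..".

```python
import os


def map_url_to_file(doc_root, url_path, index_names=("default.htm", "index.html")):
    """Map a request path such as /survey/ to a file under doc_root.

    A directory is resolved to its first existing index file. Returns None
    when the path escapes doc_root or no matching file exists.
    """
    root = os.path.realpath(doc_root)
    candidate = os.path.realpath(os.path.join(root, url_path.lstrip("/")))
    # Refuse requests such as /../../etc/passwd that climb out of the document root.
    if os.path.commonpath([root, candidate]) != root:
        return None
    if os.path.isdir(candidate):
        for name in index_names:
            index_file = os.path.join(candidate, name)
            if os.path.isfile(index_file):
                return index_file
        return None
    return candidate if os.path.isfile(candidate) else None
```

The order of index_names decides which default document wins when both exist, mirroring the "default.htm or index.html" choice a server configuration makes.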

32
Simple Web Access Example Step3
  • Server replies with information in HTTP/1.0 format:
      HTTP/1.0 200 Document follows
      Date: Tue, 19 Jan 1999 18:10:20 GMT
      Server: NCSA/1.5
      Content-type: text/html

      <html>
      <head><title>Netcraft Web Server Survey</title></head>
  • Server closes the file, sets a timeout, and waits
    for subsequent requests, such as images/MIDI
    files referenced in the web page (a so-called
    keep-alive connection). When the timeout expires,
    it closes the connection.
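To show the structure of the Step 3 reply (status line, header lines, blank line, body), here is a minimal parser sketch. It assumes the whole response is already in memory and that lines end in CR LF as the HTTP specification requires; on a keep-alive connection a real client would instead use Content-Length to know where the body ends. The name parse_response is hypothetical.

```python
def parse_response(raw):
    """Split a raw HTTP/1.0 response into (version, status, reason, headers, body).

    Header names are case-insensitive, so they are stored lower-cased.
    Continuation lines and repeated headers are not handled.
    """
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    parts = lines[0].split(" ", 2)  # e.g. "HTTP/1.0", "200", "Document follows"
    version, status = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")  # split at the first colon only
        headers[name.strip().lower()] = value.strip()
    return version, status, reason, headers, body
```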

33
Simple Web Access Example Step3a
  • Browser sends GET /sample.htm HTTP/1.0
  • Server replies:
      HTTP/1.0 404 Object Not Found
      Content-Type: text/html

      <body><h1>HTTP/1.0 404 Object Not Found</h1></body>
  • Server closes the file and network connection, and
    waits for the next request

34
Simple Web Access Example Step4
  • Browser receives the HTTP response, a web document
    with HTML tags, from the server.
  • Browser parses/processes the HTML document and
    displays the document content according to the tags.
  • When other image/audio/video data are referenced
    by <img>, <object>, or <applet> tags, the browser
    initiates the retrieval of that data.
  • Some of these are HTTP requests to the same web
    server. That is why a keep-alive connection
    improves web server throughput.
  • A single URL request may trigger many HTTP requests
    to several web servers.
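The browser's second pass over the document, collecting embedded objects to fetch, can be sketched with Python's html.parser. The tag-to-attribute mapping (src for <img>, data for <object>, code for <applet>) is an assumption based on common HTML usage, and the applet's codebase attribute is ignored for simplicity. The class and helper names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class EmbeddedResourceFinder(HTMLParser):
    """Collect URLs of embedded objects a browser would request after the page."""

    # Tag -> attribute naming the embedded resource (assumed mapping).
    RESOURCE_ATTRS = {"img": "src", "object": "data", "applet": "code"}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        wanted = self.RESOURCE_ATTRS.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative references against the page's own URL.
                self.resources.append(urljoin(self.base_url, value))


def servers_contacted(html, page_url):
    """Host names a browser would contact to display this page."""
    finder = EmbeddedResourceFinder(page_url)
    finder.feed(html)
    finder.close()
    hosts = {urlsplit(url).hostname for url in finder.resources}
    hosts.add(urlsplit(page_url).hostname)
    return sorted(hosts)
```

A page that embeds an image from a second host yields two host names, the "several web servers" case above; resources on the page's own host are the ones a keep-alive connection can reuse.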

35
HTTP
  • HTTP/1.0 and 1.1: http://www.w3.org/Protocols/rfc2068/rfc2068
  • An HTTP request consists of
  • method: GET, HEAD, POST, PUT, DELETE, ...
  • Uniform Resource Identifier (URI)
  • protocol version
  • other info to modify or supplement the request, e.g.,
  • If-Modified-Since (only return the object if it is
    newer than the date)
  • Authorization (user password or other
    authentication as required)
  • Accept: application/postscript
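A minimal sketch of how a server might honor If-Modified-Since, using only the Python standard library. The function names are hypothetical; the rule (reply 304 Not Modified when the object is not newer than the client's date, otherwise send it with 200) follows the description above, and treating an unparsable date as absent is an assumption.

```python
from datetime import timezone
from email.utils import format_datetime, parsedate_to_datetime


def http_date(dt):
    """Format an aware UTC datetime as an HTTP date, e.g. 'Tue, 19 Jan 1999 18:10:20 GMT'."""
    return format_datetime(dt, usegmt=True)


def conditional_status(last_modified, if_modified_since=None):
    """Server-side check: 304 if unchanged since the client's date, else 200."""
    if if_modified_since is None:
        return 200
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # an unparsable date is ignored; send the full object
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates carry whole seconds only, so drop sub-second precision.
    if last_modified.replace(microsecond=0) <= since:
        return 304
    return 200
```

A client would send the Last-Modified value it received earlier back as If-Modified-Since; http_date produces that header format.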

36
HTTP Response
  • consists of
  • status line (success or failure), e.g.,
    HTTP/1.1 400 Bad Request; other codes: 200 (Document
    Follows), 301 (Moved Permanently), 302 (Moved
    Temporarily), 304 (Not Modified), 401 (Unauthorized),
    402 (Payment Required), 403 (Forbidden), 404 (Not
    Found), 500 (Server Error)
  • description of the information (metaheaders)
  • Server, Date, Content-Length, Content-Type,
    Content-Encoding, Last-Modified
  • the actual information requested

37
Content-Type MIME Type
  • MIME Type                  File Extension
  • text/plain                 txt, default (most servers)
  • text/html                  htm, html
  • application/postscript     ps
  • application/ms-powerpoint  ppt
  • application/x-javascript   js
  • image/gif                  gif
  • image/jpeg                 jpg
  • audio/midi                 mid
  • video/mpeg                 mpg
  • x-world/x-vrml             wrl
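A lookup over this table, with the text/plain fallback the table describes, might look like the sketch below. The dictionary simply copies the table; the name content_type_for is hypothetical. Python's own mimetypes module could be used instead, but its answers depend on the mime.types files installed on the system.

```python
import os

# Extension -> MIME type, copied from the table above.
MIME_TYPES = {
    "txt": "text/plain",
    "htm": "text/html",
    "html": "text/html",
    "ps": "application/postscript",
    "ppt": "application/ms-powerpoint",
    "js": "application/x-javascript",
    "gif": "image/gif",
    "jpg": "image/jpeg",
    "mid": "audio/midi",
    "mpg": "video/mpeg",
    "wrl": "x-world/x-vrml",
}

DEFAULT_TYPE = "text/plain"  # what most servers send for unknown extensions


def content_type_for(filename):
    """Return the Content-Type for a file name, defaulting to text/plain."""
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    return MIME_TYPES.get(ext, DEFAULT_TYPE)
```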

38
Configure MIME Types
  • To support new MIME types, both the web server
    and the web client may need to be reconfigured.
  • For the web server,
  • include the new MIME type definition in the
    mime.types file in the web server's configuration
    directory
  • by default, most servers deliver unknown types as
    text/plain; the browser then may display them as
    'gibberish'
  • restart the web server
  • For the web client,
  • specify an external viewer associated with the MIME
    type
  • or install the plug-in associated with the MIME
    type
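Adding a type to mime.types can be illustrated by a small parser for that file's usual layout: a MIME type followed by whitespace-separated extensions, with "#" starting a comment. This is a sketch under that assumed format; application/x-new and the .new extension are hypothetical. A server that reads this table once at startup will not see new entries until it restarts, which fits the restart step above.

```python
def parse_mime_types(text):
    """Parse mime.types-style lines ("type ext1 ext2 ...") into {extension: type}.

    Text after '#' is treated as a comment; blank lines are skipped.
    """
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        mime_type, *extensions = line.split()
        for ext in extensions:
            table[ext.lower()] = mime_type
    return table
```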

39
Brief Survey of Web Servers
  • http://www.w3c.org/hypertext/WWW/Servers.html
  • Jigsaw, http://www.w3c.org/Jigsaw/
  • http://java.sun.com/products/java-servers/
  • http://www.yahoo.com/computers_and_Internet/Internet/World_Wide_Web/HTTP/Servers
  • http://www.netcraft.co.uk/Survey/
  • Web Server Technologies by Nancy J. Yeager and
    Robert E. McGrath, Morgan Kaufmann 1996.

40
CGI Script Example
  • Client types http://owl.uccs.edu/cgi-bin/chow/uptime.pl
  • or clicks on <A HREF="http://owl.uccs.edu/cgi-bin/chow/uptime.pl">Show
    the load on owl</A> in a web page.
  • uptime.pl
        #!/usr/bin/perl
        $UPTIME = '/usr/ucb/uptime';
        select(STDOUT); $| = 1;   # unbuffered, so the header is written before exec
        print "Content-type: text/plain\n\n";
        if (-x $UPTIME) { exec($UPTIME); }
        else { print "cannot find uptime command on this system.\n"; exit(1); }

41
CGI Script Example (Step 2)
  • Web browser sends GET /cgi-bin/chow/uptime.pl
    HTTP/1.0 to owl.uccs.edu
  • The httpd server at owl parses the request and
    discovers that a Perl script needs to be
    executed.
  • It locates the script in the file system.
  • It creates the execution environment:
  • starts a process with the appropriate shell
    environment variables set
  • with STDIN from the httpd program
  • with STDOUT to httpd
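The execution environment described above can be sketched with Python's subprocess module: request details go into environment variables, the request body (if any) goes to the script's STDIN, and its STDOUT is captured. This is an illustration, not httpd's actual code; the variable names follow the CGI/1.1 convention, a real server sets many more (SERVER_NAME, REMOTE_ADDR, and so on), and run_cgi is a hypothetical name.

```python
import os
import subprocess


def run_cgi(command, script_name, method="GET", query_string="", body=b""):
    """Run a CGI program the way httpd does and return its raw STDOUT bytes.

    command is the argv list used to start the script; the request is
    described through CGI environment variables, any request body is fed on
    STDIN, and STDOUT is captured for httpd to relay to the browser.
    """
    env = {
        "PATH": os.environ.get("PATH", ""),
        "GATEWAY_INTERFACE": "CGI/1.1",
        "SERVER_PROTOCOL": "HTTP/1.0",
        "REQUEST_METHOD": method,
        "SCRIPT_NAME": script_name,
        "QUERY_STRING": query_string,
    }
    if body:
        env["CONTENT_LENGTH"] = str(len(body))  # only set when a body is attached
    result = subprocess.run(command, input=body, stdout=subprocess.PIPE,
                            env=env, check=False)
    return result.stdout
```

For the slide's example this might be called as run_cgi(["/usr/bin/perl", "uptime.pl"], "/cgi-bin/chow/uptime.pl"), where the interpreter and script paths are assumptions about the host.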

42
CGI Script Example (Step 3)
  • uptime.pl generates
      Content-type: text/plain

      15:55 up 18 days, 7:15, 5 users, load average:
      0.89, 0.81, 0.79
  • It is sent over STDOUT back to httpd
  • httpd adds
      HTTP/1.0 200 OK
      Server: Netscape-Communications/1.1
      Date: Tuesday, 27-Jan-98 23:12:45 GMT
  • httpd relays the text string back to the web
    browser

43
What problems can occur?
  • How to detect a script running infinite loop?
  • How to detect a hung script?
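One common answer to both questions is a wall-clock timeout: if the script has not exited in time, the server kills it and reports an error. The sketch below assumes Python's subprocess.run with its timeout argument, which kills the child before raising TimeoutExpired; the function name and the choice of 504 versus 500 as status codes are assumptions. A timeout alone cannot tell an infinite loop from a script blocked on I/O; a CPU-time limit (for example RLIMIT_CPU on UNIX) would catch only the former.

```python
import subprocess


def run_cgi_with_timeout(command, timeout_seconds=5.0, env=None):
    """Run a CGI command, giving up on scripts that loop or hang.

    Returns (status, output): 504 if the script was killed for exceeding
    timeout_seconds, 500 if it exited with an error, otherwise 200.
    """
    try:
        result = subprocess.run(command, stdin=subprocess.DEVNULL,
                                stdout=subprocess.PIPE, env=env,
                                timeout=timeout_seconds, check=False)
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed and reaped the child at this point.
        return 504, b""
    if result.returncode != 0:
        return 500, result.stdout
    return 200, result.stdout
```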

44
Handle Multiple Requests
  • Can't afford sequential processing, since some
    requested documents are big.
  • Three basic approaches:
  • 1. Fork a new child process: clone a copy of
    httpd
  • 2. Use multithreading (if the OS or language supports
    it), e.g., IIS, Java Web Server, Jigsaw
  • 3. Spread the load among several helper
    programs, e.g., Apache
  • Apache allows the starting, minimum, and maximum
    number of child web server processes to be specified
    in a configuration file. It can dynamically adjust to
    the load.
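Approach 2 can be sketched with Python's standard http.server.ThreadingHTTPServer (Python 3.7 and later), which handles each connection in its own thread; on UNIX, socketserver.ForkingMixIn gives approach 1 in the same style. Apache's pool of pre-started helper processes (approach 3) is not shown. Handler and function names are hypothetical.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Answers every GET with a one-line plain-text body naming the worker thread."""

    def do_GET(self):
        body = f"served by {threading.current_thread().name}\n".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        """Silence the default per-request logging to stderr."""


def start_server(port=0):
    """Start a thread-per-connection HTTP server in the background and return it.

    Port 0 asks the OS for any free port; read it from server.server_address.
    """
    server = ThreadingHTTPServer(("127.0.0.1", port), HelloHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

A slow download in one thread does not block other clients, which is the point of moving away from sequential processing.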

45
More than One Web Service on the Same Server
Platform
  • Run different/same httpd programs on different
    ports
  • http://www.server.org/intro.html (port 80 by
    default)
  • http://www.server.org:8080/intro.html (port 8080)
  • http://www.server.org:8081/intro.html (port 8081)
  • They may have different document trees, content,
    and access control, and serve different user
    groups (customers, sales, authorized users)
  • Note that on UNIX, running a server on any port
    < 1024 requires root privilege.

46
Virtual Hosting
  • To allow one server to serve requests for
    multiple IP addresses.
  • It is a low-cost option for clients that want
    their own identity but cannot afford a separate
    machine/connection.
  • Hosting other domain names on the same machine:
  • http://www.a.com/home.html
  • http://www.b.com/home.html
  • Requires an OS with virtual host support.
  • Assign multiple IP numbers to the same
    interface using the ifconfig command in UNIX or
    the TCP/IP network settings in NT.

47
Assign Multiple IP Address to the Same Interface
  • On FreeBSD, execute
  • ifconfig ep0 192.168.123.2
  • ifconfig ep0 192.168.123.3 alias netmask
    0XFFFFFFFF
  • ifconfig ep0 192.168.124.1 alias
  • (the netmask option is used to suppress an error message)
  • On Linux, execute
  • ifconfig eth0:0 192.168.123.3
  • ifconfig eth0:1 192.168.124.1
  • you may add
  • route add -host 192.168.123.3 dev eth0:0
  • route add -host 192.168.124.1 dev eth0:1

48
New Hosting Technique
  • Set up virtual machines for each customer
  • Related software packages
  • User-Mode Linux
  • VMware ESX and VirtualCenter/Infrastructure
  • Microsoft Virtual Server 2005
  • Utility Computing (On-Demand Computing)

49
Improving WWW Delivery Systems
  • Currently the network is the bottleneck.
  • The retrieval of web pages can be improved by
  • increasing network bandwidth, e.g., ADSL links
  • reducing round trips, e.g., using client-side
    programming (Java/JavaScript) to check data
  • caching (both at the client and at proxy cache servers)
  • increasing the processing power of web servers
  • load balancing by partitioning client-server
    requests

50
Large Web Sites
  • Mapping the requests for one name, e.g.,
    ftp.netscape.com, evenly across a set of servers,
    e.g., ftp1.netscape.com through ftp28.netscape.com
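The slide does not say how requests are spread over the mirrors. Two simple possibilities are sketched below as assumptions: hashing the client address, so a given client always lands on the same mirror while many clients spread roughly evenly, and plain rotation, as round-robin DNS does. Mirror names follow the ftp1 to ftp28 pattern on the slide; the function names are hypothetical.

```python
import hashlib
import itertools


def mirror_name(index, prefix="ftp", domain="netscape.com"):
    """Name of mirror number index (1-based), e.g. ftp7.netscape.com."""
    return f"{prefix}{index}.{domain}"


def pick_mirror(client_ip, count=28):
    """Map a client address to one mirror; the same client always gets the same one."""
    digest = hashlib.sha256(client_ip.encode("ascii")).digest()
    return mirror_name(int.from_bytes(digest[:4], "big") % count + 1)


def round_robin_mirrors(count=28):
    """Hand out mirrors in turn, as round-robin DNS would."""
    return (mirror_name(i) for i in itertools.cycle(range(1, count + 1)))
```

Hashing keeps a client on one server (useful for caches and sessions); rotation balances request counts but ignores how heavy each request is.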

51
CISCO Distributed Director
  • Distributed Director uses the Director Response
    Protocol (DRP), a UDP-based application protocol, to
    query DRP server agents for BGP and IGP routing
    table metrics between distributed servers and
    clients, and performs load distribution.

52
Internet Caching
  • Harvest/Squid Cache: hierarchical, 42% FTP
    bandwidth reduction
  • Client/Proxy Cache: local, small, 65% bandwidth
    reduction
  • Server Push Cache: Gwertzman and Seltzer
    (Harvard)
  • Distributed Internet Cache: Povey and Harrison
    (UQ): hierarchical index at the top of the tree,
    content on the leaves
  • Cachemesh: Wong and Crowcroft (UCL): cache routing
    table for reducing search overhead
  • WebWave: Heddaya and Mirdad (BU): cache on route,
    tree load balancing, load diffusion
  • Adaptive Web Caching: Zhang, Floyd, Jacobson:
    self-configuring cache groups, multicast.

53
Harvest/Squid Object Cache
  • Hierarchical cache: Danzig, Hall, and Schwartz show
    it reduces FTP traffic by 42%. Place big caches
    between regional networks and the backbone. Byte-hops
    as the metric.
  • Harvest Object Cache: manually configured
    hierarchical cache system. The client uses the
    Internet Cache Protocol (ICP) to (recursively) query
    sibling and parent caches.
  • NLANR Squid Object Cache: Internet hierarchical
    cache system. Problems:
  • 14 separate Australian branches from the US
  • CA content sources distribute content through the
    East Coast root cache, back to CA clients.
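The sibling-then-parent search can be sketched as below. This is a simplified, in-memory model, not ICP itself: real ICP sends UDP queries to all siblings and parents in parallel and fetches from the first cache that reports a hit, whereas here siblings are checked in order and a miss goes to the parent (and recursively up) or to the origin server. Class and method names are hypothetical.

```python
class Cache:
    """A cache node with optional sibling and parent caches (Harvest-style hierarchy)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.siblings = []
        self.store = {}  # url -> content

    def lookup(self, url, fetch_from_origin):
        """Return (content, where_found), keeping a local copy on the way back."""
        if url in self.store:
            return self.store[url], self.name
        # Siblings only answer from their own store; they never forward the request.
        for sibling in self.siblings:
            if url in sibling.store:
                self.store[url] = sibling.store[url]
                return sibling.store[url], sibling.name
        # Miss nearby: ask the parent (which recurses upward) or go to the origin.
        if self.parent is not None:
            content, where = self.parent.lookup(url, fetch_from_origin)
        else:
            content, where = fetch_from_origin(url), "origin"
        self.store[url] = content
        return content, where
```

Because every level keeps a copy, one slow fetch from a distant origin can serve many later requests, which is also how a single East Coast root cache ends up relaying West Coast content.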

54
Server Push Cache
  • Assumes a network with many push cache
    servers.
  • Shows that server-initiated caching (push caching)
    can be combined with client caching to be very
    effective.
  • Uses network topology info and access history to
    decide on which push cache server to place a replica.

55
Distributed Internet Cache
  • Povey and Harrison (Univ. of Queensland, Brisbane)
  • Addresses the hierarchical cache problem
  • Hierarchical structure used for data searching
    only, with mapping info on non-leaf nodes and
    content on the leaves.
  • After retrieving a new page, a leaf cache sends an
    advertisement up the tree hierarchy. Each non-leaf
    node on the path stores the advertisement (URL,
    cache location) in its table.
  • Disadvantage: increased load on leaf caches.
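The advertisement scheme can be sketched as a tree in which leaves hold pages and inner nodes hold only (URL, cache location) entries. This is a minimal sketch under that reading of the slide; the class and method names are hypothetical. Every successful locate() points the requester back to a leaf, which is where the extra leaf-cache load comes from.

```python
class Node:
    """Tree node: leaf caches hold pages; non-leaf nodes hold url -> leaf-name hints."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.content = {}  # used by leaf caches: url -> page
        self.index = {}    # used by non-leaf nodes: url -> name of leaf holding it

    def store_and_advertise(self, url, page):
        """Leaf caches a new page, then advertises (url, location) up the tree."""
        self.content[url] = page
        node = self.parent
        while node is not None:
            node.index[url] = self.name
            node = node.parent

    def locate(self, url):
        """Search upward from this leaf for a node that knows where url is cached."""
        if url in self.content:
            return self.name
        node = self.parent
        while node is not None:
            if url in node.index:
                return node.index[url]
            node = node.parent
        return None  # not cached anywhere in the tree: fetch from the source
```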

56
Cachemesh
  • Wong and Crowcroft (University College London).
  • Clients search a cache routing table for the cache
    location.
  • A collection of cooperating caches uses the Cache
    Information Exchange Protocol (CIEP) to
    add/delete entries in the cache routing table.
  • Web site as the unit for cache table entries
  • Collision resolution when multiple cache servers
    claim responsibility (based on frequency) for a web
    site: use a random CIEP_ADD/DELETE sending delay.
  • Realistic metrics are needed for selecting a cache
    server.

57
WebWave
  • Heddaya and Mirdad (Boston Univ.)
  • No directory lookup or cache search.
  • Caches lie along the route to the source.
  • Assumes a cache server can change filter rules in
    the router to intercept and serve web requests.
  • Defines optimal Tree Load Balancing (TLB).
  • Provides load diffusion algorithms that achieve
    TLB.
  • Only addresses a single tree for now.

58
Adaptive Web Cache
  • Zhang (UCLA), Floyd, Jacobson (NRG, LBNL)
  • New DARPA-funded research project.
  • Focuses on scalability and self-configuration.
  • Self-configuring cache groups use the Cache Group
    Management Protocol (CGMP).
  • IP multicast delivery.
  • A cache server may join multiple cache groups
    (multi-homed hosts are selected as cache servers)
  • Ideally, only one cache server forwards a request
    to the source.

59
Dynamic Server Selection: One Candidate
Architecture
(Figure: servers push their load status; the client probes
server response time.)
60
Novel Server Selection Technique: Fei et al.
(Ammar), GIT-CC-97-24
  • Uses application-layer anycast to select the best
    of several geographically separated web servers.
  • Servers push their load status to the resolver.
  • A server pushes only when its load changes by more
    than a threshold.
  • The client (resolver) probes the response time of
    each server
  • by retrieving a fixed-size document from each server.
  • Avoids oscillation by returning one server from a
    set of equivalent servers.
  • Investigates the impact of push/probe frequency on
    response time.
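The push and probe sides of this scheme can be sketched as below, assuming an absolute load threshold for pushes and a caller-supplied function that retrieves the fixed-size test document. These names and the threshold semantics are assumptions, not Fei et al.'s code; the clock is injectable so the probe can be checked without real network delays.

```python
import time


class LoadReporter:
    """Server side: push load to the resolver only after a significant change."""

    def __init__(self, push, threshold):
        self.push = push            # callable that delivers a load report
        self.threshold = threshold  # absolute change that triggers a push
        self.last_pushed = None

    def observe(self, load):
        """Record a new load sample; push it if it moved more than threshold."""
        if self.last_pushed is None or abs(load - self.last_pushed) > self.threshold:
            self.push(load)
            self.last_pushed = load
            return True
        return False


def probe(fetch_fixed_document, clock=time.monotonic):
    """Resolver side: time the retrieval of a fixed-size test document."""
    start = clock()
    fetch_fixed_document()
    return clock() - start
```

Raising the threshold cuts push traffic at the cost of staler load information, the same trade-off the push/probe frequency experiments explore.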

61
Application-layer Anycast Architecture
62
Experimental Topology
63
Performance of Server Location Scheme
64
Response Time Varying with Push and Probe
Frequency
(Figure: server push twice/min with client probe once/6 min
vs. server push 12 times/min with client probe once/10 min.)
65
Dynamic Server Selection vs. Load Balancing in
Servers
  • In Fei et al.'s work, after every client chooses
    the most lightly loaded server, that server becomes
    heavily loaded.
  • In the next round, every client swings to the next
    most lightly loaded server, which results in
    oscillation in server selection.
  • How to damp the oscillation:
  • Anycast resolvers return a set of good servers
  • A threshold is used to add/delete servers in the
    good server set
  • User response time (dynamic server selection) vs.
    system throughput (load balancing in servers)
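The damping idea can be sketched as follows: keep every server whose measured response time is within a threshold of the best one, and let each client pick randomly inside that set. Measuring the threshold from the best response time is an assumption; the slide only says a threshold governs membership. Function names are hypothetical.

```python
import random


def good_servers(response_times, threshold):
    """Servers whose response time is within threshold of the best one."""
    best = min(response_times.values())
    return sorted(s for s, t in response_times.items() if t <= best + threshold)


def choose_server(response_times, threshold, rng=random):
    """Pick randomly within the good set so clients do not all pile onto one server."""
    return rng.choice(good_servers(response_times, threshold))
```

Spreading clients across the whole good set means no single "lightest" server absorbs everyone in the next round, which is what caused the oscillation.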

66
WAN Load Balancing Architecture
(Figure labels: LBed server, probes, performance update,
server pushes (multicast), LB coordination protocol,
client/server communication.)
67
WAN Load Balancing Architecture-2
(Figure labels: server, probes, LB coordination protocol,
client/server communication.)
68
WAN Load Balancing Architecture-3
(Figure label: LB query/response.)
69
Functions of LB coordinator
  • Collect server load, network status, and traffic
    status from the probing and traffic control modules
  • Share the server and traffic status with other LB
    coordinators via the LB coordinating protocol
  • Run a load balancing algorithm that
  • directs client requests (macro control)
  • dynamically regulates client-server traffic
    (micro control)
  • Control the probing frequency of the probing module
  • Regulate client-server communication
    traffic

70
Status Collection in WLB system
  • Passive traffic monitoring of client-server
    communication
  • Server load reports from other LB coordinators
  • Active probing of server and network loads when
    there are no traffic status reports
  • Research issues:
  • traffic monitoring system design
  • efficiency, accuracy, and coordination of the
    probing system
  • deriving server and network load from traffic data

71
Traffic Control in WLB System
  • Admission control (macro-level control)
  • Estimate the load of the requests
  • Direct the requests
  • Traffic grooming/shaping (micro-level control)
  • At what protocol level (TCP, IP?)
  • At which module/interface (router? Layer 4/content/
    web switch?)
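The slide leaves the shaping mechanism open. A token bucket is one standard way to shape traffic and is swapped in here as an assumption, not as the WLB system's design: it allows bursts up to a capacity and otherwise limits a flow to a sustained rate. Class and parameter names are hypothetical; the clock is injectable so the behavior can be checked without waiting.

```python
import time


class TokenBucket:
    """Allow bursts up to capacity; sustain at most rate units per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity    # start full
        self.clock = clock
        self.last = clock()

    def allow(self, cost=1.0):
        """Refill for the time elapsed, then admit the unit if enough tokens remain."""
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A request or packet that is refused could be queued (shaping) or dropped (policing); that choice, and where the bucket sits (router, Layer 4 switch, web switch), is exactly the open question above.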

72
Important Web Sites
  • http://www.w3c.org/
  • http://developer.netscape.com/
  • http://java.sun.com/
  • http://www.microsoft.com/workshop/default.asp
  • http://www.apache.org/
  • http://www.netcraft.co.uk/Survey/
  • http://web.mit.edu/afs/athena/user/w/s/wsmart/WEB/HTMLtutor.html
  • ...

73
Useful References
  • O'Reilly's Web series
  • HTML, CGI, Dynamic HTML, Programming Perl
  • Web Server Technologies by Nancy J. Yeager and
    Robert E. McGrath, Morgan Kaufmann, 1996.
  • HTMLCGI
  • World Wide Web: Beyond the Basics, edited by Marc
    Abrams, Prentice Hall, 1998
  • MS Technical Support for IIS, self-learning
    manual.
  • How to Set Up and Maintain a Web Site, L. Stein.
  • Web Server Tuning