1
  • Survey on Web Server Clusters

2
Web Server Architecture
3
C10K Problems?
  • http://www.kegel.com/c10k.html (Dan Kegel,
    California Institute of Tech.)
  • A $3,000 Web server
  • 500MHz CPU, 1GB RAM, six 100Mbps Ethernet cards
  • 50 KHz, 100 KB, and 60 Kbps per client at 10,000
    clients
  • Hardware is no longer the bottleneck
  • Sufficient to deliver 4 KB data per second to
    each of 10,000 clients.
  • However, most current web servers handle < 1,000
    concurrent requests/sec
  • How to configure OS and write code to support
    C10K?
  • Serve many clients with each server thread, and
    use non-blocking I/O
  • Serve one client with each server thread
  • Serve many clients with each server thread, and
    use asynchronous I/O
  • Build the server code into the kernel
  • kernel scalability: limits on open file handles
    and threads (see the sketch below)
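
The last bullet can be made concrete with a minimal sketch, assuming a POSIX system (the limit value is hypothetical), of raising the per-process file-descriptor limit that otherwise caps how many simultaneous connections one process can hold:

/* Sketch: raise the per-process open-file-descriptor limit so a single
 * process can hold on the order of 10,000 sockets (POSIX assumed). */
#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
        perror("getrlimit");
        return 1;
    }
    printf("soft limit: %llu, hard limit: %llu\n",
           (unsigned long long)rl.rlim_cur, (unsigned long long)rl.rlim_max);

    rl.rlim_cur = 16384;               /* hypothetical target for ~10K clients */
    if (rl.rlim_cur > rl.rlim_max)
        rl.rlim_cur = rl.rlim_max;     /* cannot exceed the hard limit unprivileged */

    if (setrlimit(RLIMIT_NOFILE, &rl) != 0) {
        perror("setrlimit");
        return 1;
    }
    return 0;
}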

4
Single-Process Event-Driven (SPED)
  • Examples: Zeus server, Harvest/Squid proxy
  • Similar to a state machine
  • Non-blocking system calls for asynchronous I/O
    operations
  • to avoid context switching and thread
    synchronization overheads
  • to overlap CPU, disk and network operations to
    serve many requests
  • Good performance for workloads cached in the
    server's main memory
  • Drawbacks
  • Non-blocking reads may actually block on disk
    files
  • select() calls waste CPU time when serving many
    concurrent connections
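
For concreteness, here is a minimal single-process event loop in the SPED style, using select() and non-blocking sockets. This is a sketch under assumed POSIX socket APIs, not the Zeus or Squid code; the port number is hypothetical, the reply is canned, and request parsing, caching, and file I/O are omitted:

/* Minimal single-process event-driven (SPED-style) server sketch. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <netinet/in.h>

static void set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

int main(void)
{
    int listener = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr;
    int on = 1, maxfd, fd;
    fd_set active, readable;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(8080);            /* hypothetical port */
    setsockopt(listener, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));
    bind(listener, (struct sockaddr *)&addr, sizeof(addr));
    listen(listener, 128);
    set_nonblocking(listener);

    FD_ZERO(&active);
    FD_SET(listener, &active);
    maxfd = listener;

    for (;;) {                              /* one process multiplexes every connection */
        readable = active;
        if (select(maxfd + 1, &readable, NULL, NULL, NULL) < 0)
            continue;
        for (fd = 0; fd <= maxfd; fd++) {
            if (!FD_ISSET(fd, &readable))
                continue;
            if (fd == listener) {           /* new connection */
                int c = accept(listener, NULL, NULL);
                if (c < 0) continue;
                if (c >= FD_SETSIZE) { close(c); continue; }  /* select() limit */
                set_nonblocking(c);
                FD_SET(c, &active);
                if (c > maxfd) maxfd = c;
            } else {                        /* client readable: read request, reply, close */
                char buf[4096];
                ssize_t n = read(fd, buf, sizeof(buf));
                if (n > 0) {
                    const char *resp =
                        "HTTP/1.0 200 OK\r\nContent-Length: 2\r\n\r\nok";
                    write(fd, resp, strlen(resp));
                }
                close(fd);
                FD_CLR(fd, &active);
            }
        }
    }
}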

5
SPED Overhead Measurement [usenix98-banga]
  • Under real workloads, SPED has to manage many open
    concurrent connections due to large WAN RTTs
  • Traditional UNIX kernel features (select, fdalloc)
    do not scale
  • Up to 53% of CPU time in select()-related routines
  • Up to 11% of CPU time in the user process collating
    information from the bitmaps returned by select()

select() costs measured on a Squid proxy server
running on an AlphaStation with a 400MHz 21164 CPU,
192 MB of memory, and Digital Unix 4.0B
6
Multi-Process/Multi-Threaded (MP/MT)
  • Example: Apache server
  • Assign a process to execute the sequential steps
    that serve a client request
  • overlap CPU, disk, and network operations by
    context switching
  • Drawbacks
  • context switching and synchronization overhead
  • difficult to apply optimizations that need global
    information (e.g., a shared URL cache)
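
A minimal thread-per-connection sketch of the MP/MT style (assumed POSIX threads and sockets, hypothetical port, not Apache's actual code) shows how blocking calls are hidden behind per-client threads at the cost of per-thread overhead:

/* Minimal thread-per-connection (MP/MT-style) server sketch.
 * Compile with -pthread. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <pthread.h>
#include <sys/socket.h>
#include <netinet/in.h>

static void *serve_client(void *arg)
{
    int fd = (int)(long)arg;
    char buf[4096];
    const char *resp = "HTTP/1.0 200 OK\r\nContent-Length: 2\r\n\r\nok";

    read(fd, buf, sizeof(buf));   /* a blocking read only stalls this thread */
    write(fd, resp, strlen(resp));
    close(fd);
    return NULL;
}

int main(void)
{
    int listener = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr;
    int on = 1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(8080);          /* hypothetical port */
    setsockopt(listener, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));
    bind(listener, (struct sockaddr *)&addr, sizeof(addr));
    listen(listener, 128);

    for (;;) {
        int c = accept(listener, NULL, NULL);
        pthread_t t;
        if (c < 0)
            continue;
        /* one thread per client: simple, but thread stacks and context
           switches become the overhead as the number of clients grows */
        if (pthread_create(&t, NULL, serve_client, (void *)(long)c) == 0)
            pthread_detach(t);
        else
            close(c);
    }
}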

7
Asymmetric Multi-Process Event-Driven (AMPED)
  • Example: Flash
  • The main SPED process handles all processing
    steps
  • Non-blocking read, write, and accept system calls
    on sockets and pipes
  • Helper processes perform potentially blocking
    operations
  • instructed by the main process via IPC when a
    disk operation is necessary
  • mincore system call to check whether a file is in
    the main-memory cache (see the sketch below)
  • mmap system call to access data from the file
    system
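
The mincore-based residency check can be sketched as follows (Linux-style mincore signature assumed; the vector element type differs on some systems). A main AMPED-style process could use such a check to decide whether to serve a file directly or hand the read to a helper:

/* Sketch: test whether every page of a file is resident in memory,
 * i.e. whether reading it is unlikely to block on disk. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>

/* Returns 1 if all pages are resident, 0 otherwise (or on error). */
static int file_is_resident(const char *path)
{
    int fd = open(path, O_RDONLY);
    struct stat st;
    void *map;
    size_t pagesize, pages, i;
    unsigned char *vec;
    int resident = 1;

    if (fd < 0 || fstat(fd, &st) < 0 || st.st_size == 0) {
        if (fd >= 0) close(fd);
        return 0;
    }
    map = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (map == MAP_FAILED)
        return 0;

    pagesize = (size_t)sysconf(_SC_PAGESIZE);
    pages = (st.st_size + pagesize - 1) / pagesize;
    vec = malloc(pages);
    if (vec && mincore(map, st.st_size, vec) == 0) {
        for (i = 0; i < pages; i++)
            if (!(vec[i] & 1)) { resident = 0; break; }   /* bit 0 = in core */
    } else {
        resident = 0;
    }
    free(vec);
    munmap(map, st.st_size);
    return resident;
}

int main(int argc, char **argv)
{
    if (argc > 1)
        printf("%s: %s\n", argv[1],
               file_is_resident(argv[1]) ? "in memory" : "may block on disk");
    return 0;
}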

8
Performance of Web Server Architectures
  • Performance evaluation [usenix99-vivek] on a
    333MHz Pentium II server (128 MB RAM, multiple
    100Mbps NICs, FreeBSD 2.2.6)

(Figures: throughput under a synthetic workload (repetitive, single-file read test) and under a real workload)
9
Cluster-based Web Servers
10
Changes in Web Server Environments with the
Explosive Growth of the World Wide Web
  • Bursts as well as perpetual increase in client
    requests
  • Variation of content
  • Small static files -> graphics or multimedia
    content of various sizes
  • Increase of dynamic content
  • due to new Web-based applications such as
    E-commerce
  • one or two orders of magnitude larger resource
    usage than static files
  • HTTP/1.1 protocol is prevalent
  • Persistent connections, request pipelining

11
Demand for Scalable Systems
  • Distributed or Cluster-based Network Servers
  • A clustered Web server is a viable approach
  • cost-effective and scalable
  • Bursty requests and variation of content
  • may cause skewed utilization among the servers
    within a cluster
  • Importance of load distribution
  • In order to actually attain scalable high
    performance
  • Distribute the requests to the servers best
    suited to respond

12
Request Distribution (Load Balancing) Strategies
  • Replication across mirrored servers
  • neither user-transparent nor controllable
  • Load-balancing over a distributed Web server
    system
  • Client-based approach
  • DNS-based approach
  • Dispatcher-based approach
  • Server-based approach

13
Client-based Approaches
  • Examples
  • Web clients that pick a server at random (Netscape
    Navigator)
  • Smart clients
  • Client-side proxies
  • Limited applicability due to
  • porting issues on the client (Java applet) or the
    network proxy
  • overhead of message exchanges with servers (load
    state, network delays)
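
As an illustration of client-side random selection, the sketch below (hypothetical hostname, assumed POSIX resolver API) resolves a replicated server name and picks one of the returned addresses at random:

/* Sketch: client-side "random select" over a replicated hostname. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>

int main(void)
{
    struct addrinfo hints, *res, *p;
    int count = 0, pick, i = 0;
    char ip[INET6_ADDRSTRLEN];

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo("www.example.com", "http", &hints, &res) != 0)
        return 1;                              /* hypothetical replicated name */

    for (p = res; p; p = p->ai_next)           /* count the candidate servers */
        count++;

    srand((unsigned)time(NULL));
    pick = rand() % count;                     /* client chooses a replica at random */
    for (p = res; p && i < pick; p = p->ai_next)
        i++;

    getnameinfo(p->ai_addr, p->ai_addrlen, ip, sizeof(ip),
                NULL, 0, NI_NUMERICHOST);
    printf("picked replica %d of %d: %s\n", pick + 1, count, ip);
    freeaddrinfo(res);
    return 0;
}

(Note: the standard headers above also require <string.h> for memset; the point is only that the selection logic lives entirely in the client.)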

14
DNS-based Approaches
  • Cluster DNS
  • translates the URL (name) to an IP address and
    specifies TTL values
  • distribution policies: constant or adaptive TTL
    algorithms
  • more user-transparent than the client-based approach
  • limited control due to address caching in
    intermediate name servers

15
Dispatcher-based Approaches
  • Dispatcher (IP sprayer)
  • packet single/double rewriting, packet forwarding,
    HTTP redirect
  • Examples
  • Magic Router, Cisco LocalDirector/DistributedDirector,
    IBM NetDispatcher, Linux Virtual Server (LVS)
  • user-transparent at the IP level (single virtual
    address); fine-grained control
  • dispatcher bottleneck; packet rewriting overhead

16
Dispatcher-based Approaches: L4 Switch
  • TCP router mechanism
  • Clients know only the IP address of the L4 switch
  • All client requests reach the front-end
  • Packet forwarding: the front-end L4 switch (TCP
    router) selects a back-end server for each connection
  • Scheduling granularity: connection (content-blind)
  • Scheduling algorithms
  • RR, LC, WRR, WLC
  • WRR/WLC schedule connections in proportion to each
    server's weight
  • weight: the excess capacity available for new
    connections

(Figure: client -> front-end L4 switch (TCP router) -> back-end servers)

17
L4 Switch
  • WRR Scheduling Algorithm

/* initially: index = 0, cw = 0 (current weight),
   server list L = (S0, S1, ..., Sn-1), W(Si) = weight of Si */
while (1) {
    i = index;
    if (i == 0) {
        cw--;
        if (cw <= 0) {
            cw = max of W(S0), ..., W(Sn-1);
            if (cw == 0) return NULL;   /* no server has capacity */
        }
    }
    index = (i + 1) mod n;
    if (W(Si) >= cw) return Si;
}
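
A compilable version of the same selection loop, with made-up weights, shows the proportional behavior (for weights 4/2/1, S0 is chosen four times, S1 twice, and S2 once per cycle):

/* Self-contained weighted round-robin demo with hypothetical weights. */
#include <stdio.h>

#define N 3
static const char *servers[N] = { "S0", "S1", "S2" };
static const int   weight[N]  = { 4, 2, 1 };   /* made-up excess capacities */

static int pos = 0;   /* server position, persists across calls */
static int cw  = 0;   /* current weight threshold */

static int max_weight(void)
{
    int i, m = 0;
    for (i = 0; i < N; i++)
        if (weight[i] > m) m = weight[i];
    return m;
}

/* Returns the index of the next server, or -1 if all weights are zero. */
static int wrr_next(void)
{
    for (;;) {
        int i = pos;
        if (i == 0) {
            cw--;
            if (cw <= 0) {
                cw = max_weight();
                if (cw == 0) return -1;
            }
        }
        pos = (i + 1) % N;
        if (weight[i] >= cw)
            return i;
    }
}

int main(void)
{
    int k;
    for (k = 0; k < 14; k++) {   /* two full cycles of 7 selections each */
        int s = wrr_next();
        printf("%s ", s >= 0 ? servers[s] : "none");
    }
    printf("\n");
    return 0;
}
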
18
L4 Switch
  • WRR weight estimation

(Figure: a user-level Collector beside the in-kernel TCP router gathers load information from Server 1, Server 2, ..., Server n)
  • Pros
  • high routing throughput
  • Cons
  • Content-blind scheduling
  • Difficult to compute weights accurately
  • insufficient load information; feedback delay and
    overhead of load reporting (a simple weight-estimation
    sketch follows)
  • Dynamic load imbalance across servers
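
One simple way (an assumption for illustration, not the mechanism of any cited system) for the Collector to turn load reports into WRR weights is to use each server's remaining capacity, which also shows why stale reports lead to imbalance:

/* Hypothetical weight estimation: weight = excess capacity for new connections.
 * capacity[] and current_load[] stand in for the Collector's feedback. */
#include <stdio.h>

#define N 3

int main(void)
{
    int capacity[N]     = { 100, 100, 50 };   /* made-up per-server capacities */
    int current_load[N] = {  80,  20, 10 };   /* made-up loads, stale by the feedback delay */
    int weight[N], i;

    for (i = 0; i < N; i++) {
        weight[i] = capacity[i] - current_load[i];
        if (weight[i] < 0)
            weight[i] = 0;                    /* overloaded server gets no new connections */
        printf("server %d: weight %d\n", i, weight[i]);
    }
    return 0;
}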

19
Content-aware Routers: L5/L7 Switch
  • Main idea
  • Dispatch a request to the server best suited to
    respond by inspecting URI

  • Scheduling granularity: connection
  • but content-aware scheduling

(Figure: client -> front-end dispatcher (httpd) over TCP/IP, with connection handoff to the back-end servers)
  • Benefits
  • Performance improvement of back-end servers
  • cache-affinity (locality) based scheduling: LARD
    [asplos98-vivek]
  • Easy to specialize the back-ends for certain
    types of content or services

20
Content-aware Routers
  • Routing Mechanisms

21
Content-aware Routers
  • Locality-Aware Request Distribution (LARD)
    [asplos98-vivek]
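
Below is a simplified sketch of the basic LARD decision rule described in [asplos98-vivek]: requests for a target stick to the server already caching it unless that server is overloaded. The thresholds and the tiny fixed-size mapping table are illustrative assumptions, not the paper's implementation.

/* Simplified basic-LARD sketch: target -> server mapping for cache affinity,
 * with reassignment to a lightly loaded server on overload. */
#include <stdio.h>
#include <string.h>

#define NSERV    4
#define NTARGETS 128
#define T_LOW    25      /* hypothetical low-load threshold  */
#define T_HIGH   65      /* hypothetical high-load threshold */

static int  load[NSERV];                    /* active connections per back-end */
static char target_name[NTARGETS][128];     /* tiny fixed mapping table */
static int  target_server[NTARGETS];
static int  ntargets;

static int least_loaded(void)
{
    int i, best = 0;
    for (i = 1; i < NSERV; i++)
        if (load[i] < load[best]) best = i;
    return best;
}

/* Pick a back-end server for a request on the given target (URL). */
static int lard_dispatch(const char *target)
{
    int i, s;
    for (i = 0; i < ntargets; i++)
        if (strcmp(target_name[i], target) == 0)
            break;

    if (i == ntargets) {                    /* first request for this target */
        s = least_loaded();
        if (ntargets < NTARGETS) {
            strncpy(target_name[ntargets], target, 127);
            target_server[ntargets++] = s;
        }
    } else {
        s = target_server[i];
        /* reassign if the owning server is overloaded and relief exists */
        if ((load[s] > T_HIGH && load[least_loaded()] < T_LOW) ||
            load[s] >= 2 * T_HIGH) {
            s = least_loaded();
            target_server[i] = s;
        }
    }
    load[s]++;                              /* connection handed off to server s */
    return s;
}

int main(void)
{
    const char *reqs[] = { "/a.html", "/b.html", "/a.html", "/c.gif", "/a.html" };
    int k;
    for (k = 0; k < 5; k++)
        printf("%s -> server %d\n", reqs[k], lard_dispatch(reqs[k]));
    return 0;
}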

22
Content-aware Routers
  • Drawbacks
  • Little benefit of LARD on persistent connections
  • Scheduling granularity is still per connection
  • Routing throughput degradation
  • Overhead due to L7 processing: front-end
    bottleneck, low scalability

(Figure: routing throughput for a 13 KB file)
23
Content-aware Routers
  • Harvard Array of Clustered Computers (HACC)
    [usenixNT99-zhang]
  • HACC services a request at the server containing
    the desired content, as long as the total load does
    not exceed that server's capacity
  • If it does, HACC replicates the content onto
    another server
  • load = weight_cpu x cpu_load (dynamic page
    generation) + weight_storage x storage_load

24
Server-based Approaches
  • DNS Redirection (HTTP redirection, distributed
    packet rewriting)
  • policies based on server capability, load state,
    network proximity
  • distributed, fine-grained control; extensible
    solution (LAN, WAN)
  • affected by address-mapping caches (at clients and
    intermediate name servers)
  • latency (HTTP redirection; see the redirect sketch
    below)
  • packet rewriting overhead (distributed packet
    rewriting)
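
To illustrate the HTTP-redirection mechanism (and the extra round trip it costs), a redirecting server's reply can be built as below; the target host and path are hypothetical:

/* Sketch: build the HTTP reply a loaded server would send to push the
 * client to another replica (HTTP/1.0-style 302 redirect). */
#include <stdio.h>

int main(void)
{
    char reply[512];
    const char *better_server = "www2.example.com";  /* chosen by load state / proximity */
    const char *path = "/index.html";

    snprintf(reply, sizeof(reply),
             "HTTP/1.0 302 Moved Temporarily\r\n"
             "Location: http://%s%s\r\n"
             "Content-Length: 0\r\n"
             "\r\n",
             better_server, path);

    /* A real server would write this buffer to the client socket; the client
     * then opens a second connection, which is the added latency. */
    fputs(reply, stdout);
    return 0;
}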

25
Server-based Approaches
  • TranSend: a proxy within a university or a company
    [sosp97-fox]
  • transformation (distillation, filtering, format
    conversion)
  • aggregation (search; collect and collate data from
    various sources)
  • caching (original or transformed content)
  • customization (maintaining a user-preference
    database)

26
Web Server Accelerator (1)
  • Factors limiting Web server performance
  • Copying the requested data several times across
    layers of software
  • Overheads such as OS scheduling and interrupt
    processing
  • Main idea
  • Servicing frequently requested pages from caches
    in front of a Web site
  • static pages, dynamic pages, or both

27
Web Server Accelerator (2)
  • IBM WSA (Olympic Winter Games '98 Web site)
  • Front-end (embedded OS with TCP/IP stack
    optimizations)
  • reduces scheduler and interrupt-processing
    overheads and duplicated copies
  • maintains persistent TCP connections with
    back-end servers
  • caches static as well as dynamic pages
  • Routing throughput measured on a 200MHz PowerPC
  • TCP router: 15K HTTP requests/sec; content router:
    9.8K HTTP requests/sec

(Figure: client, TCP router, cache accelerators, and back-end servers; TCP handoff is used for large pages and UDP for small pages < 2 KB; cache misses go to the back-end servers)
28
References
  • [ieeeIC99-cardellini] V. Cardellini, M. Colajanni, and P. S. Yu.
    Dynamic Load Balancing on Web Server Systems. IEEE Internet
    Computing, May 1999.
  • [usenix98-banga] Gaurav Banga and Jeff Mogul. Scalable Kernel
    Performance for Internet Servers Under Realistic Loads. In
    Proceedings of the USENIX 1998 Annual Technical Conference, 1998.
  • [asplos98-vivek] Vivek Pai, Gaurav Banga, Mohit Aron, Michael
    Svendsen, Peter Druschel, Willy Zwaenepoel, and Eric Nahum.
    Locality-Aware Request Distribution in Cluster-based Network
    Servers. In Proceedings of the 8th International Conference on
    Architectural Support for Programming Languages and Operating
    Systems (ASPLOS), October 1998.
  • [usenix99-vivek] Vivek Pai, Peter Druschel, and Willy Zwaenepoel.
    Flash: An Efficient and Portable Web Server. In Proceedings of the
    USENIX 1999 Annual Technical Conference, June 1999.
  • [www7-hunt] D. H. Hunt, G. S. Goldszmidt, R. King, and R. Mukherjee.
    Network Dispatcher: A Connection Router for Scalable Internet
    Services. In Proceedings of the 7th International World Wide Web
    Conference, 1998.
  • [sosp97-fox] Armando Fox, Steven Gribble, Yatin Chawathe, Eric
    Brewer, and Paul Gauthier. Cluster-based Scalable Network Services.
    In Proceedings of the 16th ACM Symposium on Operating Systems
    Principles, October 1997.
  • [usenixNT99-zhang] X. Zhang, M. Barrientos, J. B. Chen, and
    M. Seltzer. HACC: An Architecture for Cluster-based Web Servers. In
    Proceedings of the 3rd USENIX Windows NT Symposium, July 1999.
  • [wwwc99-song] J. Song, E. Levy, A. Iyengar, and D. Dias. A Scalable
    and Highly Available Web Server Accelerator. In Proceedings of the
    World Wide Web Conference, April 1999.
  • [infocom00-apostolopoulos] G. Apostolopoulos et al. Design,
    Implementation and Performance of a Content-Based Switch. In
    Proceedings of IEEE INFOCOM 2000.