Survey on Web Server Clusters - PowerPoint PPT Presentation

Provided by: cal54

1

Survey on Web Server Clusters

2
Web Server Architecture
3
C10K Problems?

http://www.kegel.com/c10k.html (Dan Kegel,
California Institute of Tech.)
$3,000 Web server
500 MHz CPU, 1 GB RAM, six 100 Mbps Ethernet cards
$0.30, 50 KHz, 100 KB, 60 Kbps per client at 10,000
clients
Hardware is no longer the bottleneck
Sufficient to deliver 4 KB of data per second to
each of 10,000 clients.
However, most current web servers handle < 1,000
concurrent requests/sec
How to configure OS and write code to support
C10K?
Serve many clients with each server thread, and
use non-blocking I/O
Serve one client with each server thread
Serve many clients with each server thread, and
use asynchronous I/O
Build the server code into the kernel
kernel scalability, limits on open file handles
and threads
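
The per-client figures on this slide follow from dividing the server's resources by 10,000 clients. A quick check of that arithmetic, assuming the price is in US dollars and decimal units (1 GB = 10^9 bytes); the function and constant names are made up for illustration:

```python
# Per-client share of the server described on the slide.
CLIENTS = 10_000
PRICE_USD = 3_000          # assumption: the quoted price is in US dollars
CPU_HZ = 500e6             # 500 MHz CPU
RAM_BYTES = 1e9            # 1 GB RAM, decimal approximation
NIC_BPS = 6 * 100e6        # six 100 Mbps Ethernet cards


def per_client_budget(clients=CLIENTS):
    """Divide each resource evenly among the clients."""
    return {
        "usd": PRICE_USD / clients,
        "cpu_khz": CPU_HZ / clients / 1e3,
        "ram_kb": RAM_BYTES / clients / 1e3,
        "net_kbps": NIC_BPS / clients / 1e3,
    }


budget = per_client_budget()
# Delivering 4 KB per second to one client needs about 4 * 8 = 32 Kbps,
# which fits inside the per-client network share.
needed_kbps = 4 * 8
print(budget, needed_kbps <= budget["net_kbps"])
```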

4
Single-Process Event-Driven (SPED)

Examples: Zeus server, Harvest/Squid proxy
Similar to a state machine
Non-blocking system calls for asynchronous I/O
operations
to avoid context switching and thread
synchronization overheads
to overlap CPU, disk and network operations to
serve many requests
Good performance for workloads cached in the
server main memory
Drawbacks
Non-blocking reads may actually block for disk
files
Select calls waste CPU time when serving a lot of
concurrent connections.
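
A minimal sketch of the SPED idea, assuming Python's standard `selectors` module as the readiness mechanism: one process watches all sockets and dispatches each ready socket to a handler that never blocks. The handler names and the upper-casing "service" are illustrative only, not taken from Zeus or Squid.

```python
import selectors
import socket


def on_readable(sel, conn):
    """Handle one readable connection without ever blocking on it."""
    try:
        data = conn.recv(4096)
    except BlockingIOError:
        return                      # nothing to read after all; retry later
    if data:
        conn.send(data.upper())     # sketch: assumes the reply fits the send buffer
    else:
        sel.unregister(conn)        # peer closed the connection
        conn.close()


def on_accept(sel, listener):
    """Accept a new client and add it to the set of watched sockets."""
    conn, _addr = listener.accept()
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ, on_readable)


def run_once(sel, timeout=1.0):
    """One pass of the event loop: dispatch every ready socket to its handler."""
    ready = sel.select(timeout)
    for key, _mask in ready:
        key.data(sel, key.fileobj)  # key.data holds the registered handler
    return len(ready)
```

A server would register a non-blocking listening socket with `on_accept` and call `run_once` in a loop. `selectors.DefaultSelector` picks epoll or kqueue where available, which sidesteps the per-call scanning cost of `select()` noted in the drawbacks.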

5
SPED Overhead Measurement [usenix98-banga]

Under real workloads, SPED has to manage many
concurrent open connections due to large WAN RTTs
Traditional UNIX kernel features (select,
fdalloc) do not scale
Up to 53% of CPU time: select()-related routines
Up to 11% of CPU time: user process collating
information from the bitmaps returned by select()

select() costs measured on a Squid proxy server
running on AlphaStation with 400MHz 21164 CPU,
192 MB memory and Digital Unix 4.0B
6
Multi-Process/Multi-Threads (MP/MT)

Example: Apache server
Assign a process to execute sequential steps to
serve a client request
overlap CPU, disk, and network operations by
context switching
Drawbacks
context switching and synchronization overhead
difficult to optimize using global information
(e.g., a shared URL cache)

7
Asymmetric Multi-Process Event-Driven (AMPED)

Example: Flash
The main SPED process handles all processing
steps
Non-blocking read, write, and accept system calls
on sockets and pipes
Helper processes to perform potentially blocking
operations
instructed by the main process via IPC when a
disk operation is necessary
mincore system call to check if a file is in main
memory cache
mmap system call to access data from the file
system
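
The division of labor above can be sketched as follows. This is a simplified illustration, not Flash's code: a thread pool stands in for the helper processes, a dictionary stands in for the mincore residency check, and a queue stands in for the IPC channel. All class and method names are hypothetical.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


class AmpedSketch:
    def __init__(self, read_blocking, helpers=2):
        self.read_blocking = read_blocking   # potentially blocking disk read
        self.resident = {}                   # path -> bytes; stand-in for mincore
        self.done = queue.Queue()            # stand-in for the IPC channel
        self.pool = ThreadPoolExecutor(max_workers=helpers)

    def request(self, client, path):
        """Main process: answer at once on a hit, else delegate to a helper."""
        if path in self.resident:
            return self.resident[path]
        self.pool.submit(self._helper, client, path)
        return None                          # reply is deferred

    def _helper(self, client, path):
        """Helper: perform the blocking read, then notify the main process."""
        data = self.read_blocking(path)
        self.done.put((client, path, data))

    def poll_completion(self, timeout=1.0):
        """Main process: collect one finished read and remember the data."""
        client, path, data = self.done.get(timeout=timeout)
        self.resident[path] = data
        return client, data
```

In this sketch only the helpers ever wait on the disk, so the main loop stays free to serve other clients while a read is outstanding.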

8
Performance of Web Server Architectures

Performance evaluation [usenix99-vivek] on a
333 MHz Pentium II server (128 MB, multiple
100 Mbps, FreeBSD 2.2.6)

Synthetic workload: repetitive single-file read
test
Real workload
9
Cluster-based Web Servers
10
Changes in Web Server Environments with the
Explosive Growth of World Wide Web

Bursts as well as perpetual increase in client
requests
Variation of content
Small static files -> graphics or multimedia
content of various sizes
Increase of dynamic content
due to new Web-based applications such as
E-commerce
one or two orders of magnitude more resource
usage than static files
HTTP/1.1 protocol is prevalent
Persistent connections, request pipelining

11
Demand on scalable system

Distributed or Cluster-based Network Servers
Clustered Web server is a viable approach
cost effective and scalable
Burst requests and variation of content
may cause skewed utilization among the servers
within a cluster
Importance of load distribution
In order to actually attain scalable high
performance
Distribute the requests to the servers best
suited to respond

12
Request Distribution (Load Balancing) Strategies

Replication across mirrored servers
neither user-transparent nor controllable
Load-balancing over a distributed Web server
system
Client-based approach
DNS-based approach
Dispatcher-based approach
Server-based approach

13
Client-based Approaches

Examples
Web clients based on random selection (Netscape
Navigator)
Smart clients
Client-side proxies
Limited applicability due to
Porting issues on the client (Java applet) or
network proxy
Overhead of message exchange with servers (load
state, network delays)

14
DNS-based Approaches

Cluster DNS
translate (URL-to-IP) and specify TTL values
distribution policies: constant or adaptive TTL
algorithms
more user-transparent than client-based approach
limited control due to address caching in
intermediate name servers

15
Dispatcher-based Approaches

Dispatcher (IP-Sprayer)
packet single/double-rewriting, packet
forwarding, HTTP redirect
Examples
Magic Router, Cisco LocalDirector/DistributedDirector,
IBM NetDispatcher, Linux Virtual Server (LVS)
user-transparent at IP-SVA level; fine-grained
control
dispatcher bottleneck; packet rewriting overhead

16
Dispatcher-based Approaches: L4 Switch

TCP router mechanism
Clients know only the IP address of L4 switch
All client requests reach the front-end
Packet forwarding

[Figure labels: Client, Front-end L4 switch (TCP router, selects a server), Back-end servers]

Scheduling granularity: connection
(content-blind)

Scheduling algorithms
RR, LC,
WLC, WRR
Weighted algorithms schedule the connections in
proportion to each server's weight
weight: the excess capacity available for new
connections
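
As an illustration of weighted least-connection (WLC) selection, the sketch below picks the server with the smallest ratio of active connections to weight, which is how LVS describes the policy. The function name and the list-based inputs are assumptions made for the example.

```python
def pick_wlc(active, weights):
    """Return the index of the server with the lowest active/weight ratio.

    Servers with weight 0 are treated as unavailable. Returns None when
    no server can accept connections.
    """
    best = None
    for i, (conns, weight) in enumerate(zip(active, weights)):
        if weight <= 0:
            continue
        # Compare conns/weight with active[best]/weights[best] by
        # cross-multiplying, which avoids floating-point division.
        if best is None or conns * weights[best] < active[best] * weight:
            best = i
    return best
```

For example, with active connections `[10, 3, 8]` and weights `[4, 1, 4]` the ratios are 2.5, 3 and 2, so the third server is chosen.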

17
L4 Switch

WRR Scheduling Algorithm

/* initially index = 0, L = (S0, S1, ..., Sn-1) */
while (1) {
    i = index;
    if (i == 0) {
        cw--;
        if (cw <= 0) {
            cw = max of W(Sn);
            if (cw == 0) return NULL;
        }
    }
    if (W(Si) >= cw) {
        index = (i + 1) % n;
        return Si;
    } else
        index = (i + 1) % n;
}
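
A runnable sketch of the WRR pseudocode above, under the reading that the current weight `cw` starts at 0, drops by one on each full pass over the server list, and resets to the maximum weight when it reaches zero. The class name and the use of list indices as server identifiers are assumptions for the example.

```python
class WeightedRoundRobin:
    def __init__(self, weights):
        self.weights = list(weights)   # W(S0), W(S1), ..., W(Sn-1)
        self.index = 0                 # next position to examine
        self.cw = 0                    # current weight threshold

    def next_server(self):
        """Return the index of the next server, or None if none is usable."""
        n = len(self.weights)
        if n == 0:
            return None
        while True:
            i = self.index
            if i == 0:
                # Starting a new pass: lower the threshold by one.
                self.cw -= 1
                if self.cw <= 0:
                    self.cw = max(self.weights)
                    if self.cw == 0:
                        return None    # every server has weight 0
            self.index = (i + 1) % n
            if self.weights[i] >= self.cw:
                return i
```

With weights `[3, 1, 2]`, six consecutive calls return servers 0, 0, 2, 0, 1, 2, so each server is picked in proportion to its weight within one cycle.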
18
L4 Switch

WRR weight estimation

[Figure labels: user, kernel, Collector, TCP router, Server 1, Server 2, ..., Server n]

Pros
high routing throughput
Cons
Content-blind scheduling
Difficult to compute weights accurately
insufficient load information; delay and overhead
of load information feedback
Dynamic load imbalance across servers

19
Content-aware Routers: L5/L7 Switch

Main idea
Dispatch a request to the server best suited to
respond by inspecting URI

[Figure labels: Client, Front-end, dispatcher (httpd), Handoff, TCP/IP, Back-end servers]

Scheduling granularity: connection,
but with content-aware scheduling

Benefits
Performance improvement of back-end servers
cache affinity (locality) based scheduling
[asplos98-vivek]
Easy to specialize the back-ends for certain
types of content or services

20
Content-aware Routers

Routing Mechanisms

21
Content-aware Routers

Locality-Aware Request Distribution
[asplos98-vivek]
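
A sketch of the basic LARD policy as commonly described: each target (for example a URL) stays with one back-end so that its cache remains warm, and is moved to the least-loaded node only when its current node is overloaded. Load is counted in active connections. The class name and the default threshold values are illustrative assumptions.

```python
T_LOW, T_HIGH = 25, 65   # illustrative thresholds on active connections


class BasicLard:
    def __init__(self, num_nodes, t_low=T_LOW, t_high=T_HIGH):
        self.load = [0] * num_nodes    # active connections per back-end
        self.server = {}               # target -> assigned back-end
        self.t_low, self.t_high = t_low, t_high

    def _least_loaded(self):
        return min(range(len(self.load)), key=self.load.__getitem__)

    def dispatch(self, target):
        """Choose a back-end for one request and account for its load."""
        node = self.server.get(target)
        if node is None:
            # First request for this target: assign the least-loaded node.
            node = self.server[target] = self._least_loaded()
        else:
            imbalance = (self.load[node] > self.t_high
                         and min(self.load) < self.t_low)
            if imbalance or self.load[node] >= 2 * self.t_high:
                # Give up locality for this target to relieve the node.
                node = self.server[target] = self._least_loaded()
        self.load[node] += 1
        return node

    def finished(self, node):
        """Call when a connection handled by node completes."""
        self.load[node] -= 1
```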

22
Content-aware Routers

Drawbacks
Little benefit of LARD on persistent connections
Scheduling granularity is still per connection
Routing throughput degradation
Overhead due to L7 processing: front-end
bottleneck, low scalability

[Figure: throughput (13 KB file)]
23
Content-aware Routers

Harvard Array of Clustered Computers (HACC)
[usenixNT99-zhang]
HACC services the requests at the server
containing the desired content as long as total
load does not exceed the server capacity
If it does, HACC replicates the content onto
another server
load: a weighted combination of cpu_load (dynamic
page generation) and storage_load

24
Server-based Approaches

DNS Redirection (HTTP redirection, distributed
packet rewriting)
policies based on server capability, load state,
network proximity
distributed fine-grained control, extensible
solution (LAN, WAN)
affected by address-mapping caches (client,
intermediate name servers)
latency (HTTP redirection)
packet rewriting overhead (distributed packet
rewriting)

25
Server-based Approaches

TranSend: proxy within a university or a company
[sosp97-fox]
transformation (distillation, filtering, format
conversion)
aggregation (search, collect and collate data from
various sources)
caching (original or transformed contents)
customization (maintaining user preference
database)

26
Web Server Accelerator (1)

Factors limiting the Web server performance
Copying the requested data several times across
layers of software
Overheads such as OS scheduling and interrupt
processing
Main idea
Servicing frequently requested pages from caches
in front of a Web site
static pages, dynamic pages, or both
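
The main idea can be sketched as a small cache placed in front of the back-end. The slide does not name a replacement policy, so the LRU eviction used here is an assumption, as are the class name and the `fetch_from_backend` callable.

```python
from collections import OrderedDict


class FrontCache:
    def __init__(self, fetch_from_backend, capacity=1024):
        self.fetch = fetch_from_backend   # called only on a cache miss
        self.capacity = capacity          # maximum number of cached pages
        self.pages = OrderedDict()        # url -> page, oldest first
        self.hits = 0
        self.misses = 0

    def get(self, url):
        """Serve from the cache when possible, else ask the back-end."""
        if url in self.pages:
            self.pages.move_to_end(url)   # mark as most recently used
            self.hits += 1
            return self.pages[url]
        self.misses += 1
        page = self.fetch(url)
        self.pages[url] = page
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # evict the least recently used
        return page
```

Every hit is a request the back-end never sees, which is where the accelerator's benefit comes from.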

27
Web Server Accelerator (2)

IBM WSA (Olympic Winter Games 98 Web site)
Front-end (embedded OS and TCP/IP stacks
optimization)
reduces scheduler and interrupt processing
overheads and duplicated copy
maintains persistent TCP connections with
back-end servers
caches static as well as dynamic pages
Routing throughput measured on a 200 MHz PowerPC:
TCP router (15K HTTP requests/sec), content router
(9.8K HTTP requests/sec)

[Figure labels: client, TCP router, cache accelerators (hit / miss), back-ends, TCP handoff (for large pages), UDP (for small pages < 2 KB)]
28
References

[ieeeIC99-cardellini] V. Cardellini, M.
Colajanni, and P. S. Yu. Dynamic Load Balancing
on Web Server Systems. IEEE Internet Computing,
May 1999.
[usenix98-banga] Gaurav Banga and Jeff Mogul.
Scalable Kernel Performance for Internet Servers
Under Realistic Loads. In Proceedings of the
USENIX 1998 Technical Conference.
[asplos98-vivek] Vivek Pai, Gaurav Banga, Mohit
Aron, Michael Svendsen, Peter Druschel, Willy
Zwaenepoel, and Eric Nahum. Locality-Aware
Request Distribution in Cluster-based Network
Servers. In Proceedings of the 8th
International Conference on Architectural Support
for Programming Languages and Operating Systems,
Oct 1998.
[usenix99-vivek] Vivek Pai, Peter Druschel, and
Willy Zwaenepoel. Flash: An Efficient and Portable
Web Server. In Proceedings of the USENIX 1999
Annual Technical Conference, June 1999.
[www7-hunt] D. H. Hunt, G. S. Goldszmidt, R. King,
and R. Mukherjee. Network Dispatcher: A
Connection Router for Scalable Internet Services.
In Proceedings of the 7th World Wide Web
Conference.
[sosp97-fox] Armando Fox, Steven Gribble, Yatin
Chawathe, Eric Brewer, and Paul Gauthier.
Cluster-based Scalable Network Services. In Proc.
of the 16th ACM Symp. on Operating Systems
Principles, October 1997.
[usenixNT99-zhang] X. Zhang, M. Barrientos, J. B.
Chen, and M. Seltzer. HACC: An Architecture for
Cluster-based Web Servers. In Proceedings of the
3rd USENIX Windows NT Symposium, July 1999.
[wwwc99-song] J. Song, E. Levy, A. Iyengar, and
D. Dias. A Scalable and Highly Available Web
Server Accelerator. In Proc. of World Wide Web
Conference, April 1999.
[infocom00-apostolopoulos] G. Apostolopoulos, et
al. Design, Implementation and Performance of a
Content-Based Switch. INFOCOM 2000.