1
Cooperative Web Caching Using Server-Directed
Proxy Sharing

Sandra G. Dykes
Ph.D. Dissertation Proposal
University of Texas at San Antonio

2
Talk Outline

Problem: Internet performance
Dissertation overview
Web server workloads
Taxonomy and related research
SDP cache mesh
Simulation
Prototype

3
Internet Performance

Slow, variable response time
Denial-of-service (cannot connect)
due to:
  Congestion at network routers, NAPs, MAEs
  Overloaded servers
  Interaction of TCP/IP with HTTP

4
Internet Caching
[Diagram: Users connect to Proxy Servers, which reach the Server across the Internet]
5
Internet Caching

The Internet needs a scalable cache system.
Cache designs should consider network traffic.
Simulations should consider network traffic.

6
Dissertation Proposal

Design, simulate and implement a protocol for cooperative web caching: Server-Directed Proxy Sharing (SDP).
Develop methods to improve simulation of network cache designs.

7
Phases of Work

Collect and analyze Web server traces
Specify web cache design
Simulate protocol
Implement prototype

8
Web Server Workloads (Arlitt & Williamson; Bestavros et al.; Mogul; Gwertzman & Seltzer)

Exponential growth
Skewed popularity: 90% of requests go to 10% of files
Small objects: mean size [value missing]
HTML and images: 90% of requests, 54-92% of bytes
Remote requests: 70%
Duplicate requests: 97%
Long lifetimes: 50 - 100 days
Bursty arrivals
Geography affects popularity
Flash crowds and hot spots
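
The popularity skew listed above can be measured on any request trace. A minimal sketch, assuming the trace is simply a list of requested URLs; the function name and the toy trace are illustrative and are not taken from the UTSA logs.

```python
from collections import Counter


def top_fraction_share(requests, top_fraction=0.10):
    """Share of all requests that go to the most popular `top_fraction` of distinct objects."""
    if not requests:
        raise ValueError("empty trace")
    counts = Counter(requests)
    ranked = sorted(counts.values(), reverse=True)
    # Always keep at least one object in the "popular" set.
    k = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:k]) / len(requests)


# Toy trace (made up): 10 distinct objects, one of them very popular.
toy_trace = ["/index.html"] * 91 + [f"/page{i}.html" for i in range(9)]
share = top_fraction_share(toy_trace)
```

For the toy trace, the single most popular object (10% of the 10 distinct objects) receives 91 of 100 requests.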

9
UTSA Web Server Traces

CS Division
May 1997 - Sept 1997
561,292 requests
3.8 GB (24 MB/day)
Visualization and Image Processing Lab
May 1996 - Sept 1997
552,046 requests
4.2 GB (8 MB/day)

10
Growth in Server Load: UTSA-CS
11
Popularity Skew: UTSA-CS, UTSA-VIS
12
Object Size - Transfers: UTSA-CS, UTSA-VIS
13
Workload Comparison

                         UTSA-CS     UTSA-VIS    Literature
Growth                   Yes         Yes         Exponential
10% of objects satisfy   69% req.    79% req.    90% req.
Object size
  Mean                   [missing]   [missing]   [missing]
  Distribution           [missing]   [missing]   [missing]
HTML and images
  Requests               92%         96%         90%
  Bytes                  77%         45%         52 - 94%
Remote requests          82%         78%         70%
Duplicate transfers      99%         99%         97%

14
Object Types

            UTSA                     Literature
            Transfers   Avg Size     Transfers   Avg Size
HTML        43%         4 KB         42%         4 KB
Image       51%         12 KB        54%         13 KB
Audio       0.3%        200 KB       5%          179 KB
Video       0.5%        452 KB       0.4%        2300 KB
Dynamic     2%          1 KB         0-9%        5 KB

Embedded images, images per page: 42   3.3   ( 3 - 5 ? )

15
Implications for Web caching

Web caching is viable
Small percent of objects satisfy most requests
Long object lifetimes
Dynamic objects are small fraction of load
Bandwidth and latency are both important
Consider geography and network topology
Adapt quickly to shifts in popularity
Use Web page structure (embedded images)

16
Current Internet Caching

Proxy server caching
Hierarchical proxy server caching

17
Proxy Server Caching
[Diagram: three Proxy Servers, each connected to the Web Server]
18
Proxy server caching is not enough

Hit rates depend upon overlap in user requests,
so need many users or users accessing the same
objects.
Maximum achievable hit rates (infinite cache, 2-4 months):
  Boston Univ.: 53%
  Virginia Tech.: 29%, 30%, 46%
  DEC - Glassman: 30 - 50%
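
An upper bound on hit rate can be estimated from a trace by counting repeat requests. A minimal sketch, assuming an unbounded cache, objects that never change, and a trace given as a list of URLs; these are simplifying assumptions for illustration, not the method used in the cited studies.

```python
def max_hit_rate(requests):
    """Fraction of requests that repeat an earlier request (hits in an unbounded cache)."""
    seen = set()
    hits = 0
    for url in requests:
        if url in seen:
            hits += 1      # object was requested before, so it would be cached
        else:
            seen.add(url)  # first request is a compulsory miss
    return hits / len(requests) if requests else 0.0


rate = max_hit_rate(["a", "b", "a", "c", "a", "b"])
```

In the six-request example, three requests repeat an earlier one, so the bound is 0.5. With few users or little overlap in what they request, most requests are first requests and the bound stays low.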

19
Hierarchical Proxy Server Caching (Harvest / Squid)
[Diagram: hierarchy of Proxies below NLANR, which connects to the Web Server]
20
Network Caching Research

Taxonomy for network caching
Research projects

21
Taxonomy for Network Caching
Discovery
  Fixed Cache
  Group Query: Manual, Automatic
  Cache Site Directory
Dissemination
  Client-initiated
  Server-initiated
Delivery
  Direct
  Indirect
25
Web Caching Projects

Project                    Delivery   Discovery                  Dissemination
Proxy server cache         Direct     Fixed cache                Client-initiated
Harvest / Squid            Indirect   Group query, manual        Client-initiated
Zhang, Floyd, Jacobson     Indirect   Group query, auto          Client-initiated
Malpani, Lorch, Berger     Indirect   Group query, auto          Client-initiated
Gwertzman & Seltzer        Direct     Directory, centralized     Server-initiated
Bestavros et al.           Direct     Directory, centralized     Server-initiated
Tewari, Dahlin, Vin, ...   Direct     Directory, hierarchical    Server-initiated
SDP                        Direct     Directory, lazy mesh       Client-initiated

26
SDP Design Choices

Discovery: Cache Site Directory
  O(1) discovery on a hit in the local cache site directory.
  Fixed cache and non-hierarchical group query do not scale.
  Cache hierarchies have a large miss penalty (response time).
Dissemination: Client-initiated
  Automatically adapts to shifts in popularity.
  Uses current data, while server-initiated uses historic data.
Delivery: Direct
  Less network traffic and lower response times.
  Indirect delivery under HTTP is store-and-forward.

27
Cache Site Directory

Discovery, dissemination & delivery of location info
  Similar to object caching, but smaller size makes prefetching viable.
Directory organization
Cache consistency
Overhead
  Keep low: don't spend more time getting location info than objects!
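
One way to picture a cache site directory is as an in-memory map from object URL to the cache sites known to hold it. A minimal sketch only: the class and method names are hypothetical, directory organization is not modeled, and consistency is reduced to a single `forget` call for a site that no longer holds an object.

```python
class CacheSiteDirectory:
    """Maps an object URL to the list of cache sites believed to hold it."""

    def __init__(self):
        self._sites = {}  # url -> list of site names, in the order learned

    def add_sites(self, url, sites):
        known = self._sites.setdefault(url, [])
        for site in sites:
            if site not in known:
                known.append(site)

    def lookup(self, url):
        # Hash lookup: constant time on average, independent of directory size.
        return list(self._sites.get(url, []))

    def forget(self, url, site):
        # Drop a stale entry, e.g. after a site reports it no longer has the object.
        known = self._sites.get(url)
        if known and site in known:
            known.remove(site)
            if not known:
                del self._sites[url]


directory = CacheSiteDirectory()
directory.add_sites("/logo.gif", ["proxy-i", "proxy-j"])
directory.forget("/logo.gif", "proxy-i")
```

Location entries are small compared with the objects themselves, which is why storing many of them in advance is cheap.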

28
Server-Directed Proxy Sharing (SDP)

Proxy server caches share objects
Flat mesh of caches

29
[Diagram: side-by-side comparison of Proxy Caching, Hierarchical Proxy Caching (with NLANR) and SDP; nodes are the Web Server and Proxies]
30
SDP Components
Cache
Server
Proxy
Proxy Table
Cache Site Directory
Popular List
31
SDP Protocol
1. Proxy looks for object in its cache. If found, then done.
[Diagram: Proxy and its Cache]
32
SDP Protocol
2. Proxy looks up object cache sites in its Cache Site Directory. If found, then skip next step.
[Diagram: Proxy and its Cache Site Directory]
33
SDP Protocol
3. Proxy sends request to Server. Server returns object and/or list of cache sites.
[Diagram: Proxy sends REQUEST to Server; Server returns OBJECT and/or Cache Sites. Labels: Cache: object; Proxy Table: request info; Cache Site Directory: cache sites for object]
34
SDP Protocol
4. Proxy determines fastest sites.
[Diagram: Proxy sends ECHO to each candidate cache site; the 1st REPLY identifies the fastest]
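
Step 4 can be sketched as a race between concurrent probes. This is an illustration under stated assumptions: `asyncio.sleep` stands in for an ECHO round trip, the site names and delays are made up, and the functions `probe` and `fastest_site` are hypothetical rather than part of the SDP prototype.

```python
import asyncio


async def probe(site, delay_s):
    # Simulated ECHO round trip; a real probe would send a packet and await the reply.
    await asyncio.sleep(delay_s)
    return site


async def fastest_site(delays):
    """Probe all sites at once and return the one whose reply arrives first."""
    tasks = [asyncio.create_task(probe(site, d)) for site, d in delays.items()]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()  # later replies are not needed
    await asyncio.gather(*pending, return_exceptions=True)
    return next(iter(done)).result()


winner = asyncio.run(fastest_site({"proxy-i": 0.05, "proxy-j": 0.01, "proxy-k": 0.2}))
```

Because all probes run concurrently, the cost of the selection is roughly one round trip to the fastest site, not the sum over all candidates.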
35
SDP Protocol
5. Proxy requests object from fastest site. Site returns object + Popular List.
[Diagram: Proxy sends REQUEST to Proxy i; Proxy i returns OBJECT + POPULAR LIST. Labels: Cache: object; Cache Site Directory: Proxy i Popular List]
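
Steps 1 to 5 can be read as one lookup routine on the proxy. A minimal sketch, assuming plain dicts for the cache and the cache site directory and caller-supplied callables for the network parts; `fetch`, `ask_server`, `pick_fastest` and `ask_site` are hypothetical names, and a site that has since dropped the object is not handled.

```python
def fetch(url, cache, directory, ask_server, pick_fastest, ask_site):
    """Return (object, source) following the five protocol steps."""
    # Step 1: local cache.
    if url in cache:
        return cache[url], "local cache"
    # Step 2: local cache site directory.
    sites = directory.get(url)
    if not sites:
        # Step 3: ask the server, which returns the object and/or cache sites.
        obj, sites = ask_server(url)
        if sites:
            directory[url] = list(sites)
        if obj is not None:
            cache[url] = obj
            return obj, "server"
        if not sites:
            raise LookupError(f"no object and no cache sites for {url}")
    # Step 4: choose the fastest candidate site.
    site = pick_fastest(sites)
    # Step 5: fetch from that site; it also returns its popular list.
    obj, popular = ask_site(site, url)
    cache[url] = obj
    for popular_url in popular:
        known = directory.setdefault(popular_url, [])
        if site not in known:
            known.append(site)
    return obj, site


# Stand-ins for the network (made up for this example).
cache, directory = {}, {}
server = lambda url: (None, ["proxy-i", "proxy-j"])
fastest = lambda sites: sites[0]
site_reply = lambda site, url: (f"<{url} from {site}>", ["/logo.gif"])

first = fetch("/a.html", cache, directory, server, fastest, site_reply)
second = fetch("/a.html", cache, directory, server, fastest, site_reply)
```

The popular list returned in step 5 is what fills the directory ahead of demand: after the first request, the proxy already knows a site for `/logo.gif` without having asked the server.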
36
SDP Protocol
For Web pages, Proxy requests embedded images in parallel from different cache sites, using the fastest sites.
[Diagram: REQUEST GIF 1 to Proxy i, which returns GIF 1 + POPULAR LIST; REQUEST GIF 2 to Proxy j, which returns GIF 2 + POPULAR LIST]

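
Fetching embedded images from several sites at once can be sketched with a thread pool. The round-robin assignment over a ranked site list is an assumption made for this example (the slide only says the fastest sites are used), and `assign_sites`, `fetch_all` and `fetch_one` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle


def assign_sites(images, ranked_sites):
    """Pair each image with a site, cycling through sites from fastest to slowest."""
    return list(zip(images, cycle(ranked_sites)))


def fetch_all(images, ranked_sites, fetch_one):
    """Fetch every image concurrently; `fetch_one(site, image)` does the transfer."""
    plan = assign_sites(images, ranked_sites)
    with ThreadPoolExecutor(max_workers=max(1, len(plan))) as pool:
        # map() keeps results in the same order as the plan.
        results = list(pool.map(lambda pair: fetch_one(pair[1], pair[0]), plan))
    return {image: result for (image, _), result in zip(plan, results)}


fake_fetch = lambda site, image: f"{image}@{site}"  # stand-in for a real transfer
page_images = fetch_all(["gif1", "gif2", "gif3"], ["proxy-i", "proxy-j"], fake_fetch)
```

Spreading the images over different sites means no single site or route carries the whole page.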
37
Advantages of Design

Technical
Reduces Denial-of-service.
Designed to reduce server load AND response time.
Network correct - designed to reduce
congestion.
Practical
No change to routers or HTTP protocol.
No change to local cache policy.
No central administration, equipment or
personnel.

38
Design Issues & Questions

Cache consistency
Stale object
Object removed from cache
What are hit rates at Cache Site Directories?
Use response-time measurements or static criteria to choose cache site?

39
Simulation

Analytical workload
Preliminary isolated single server model
Next step:
  Model network traffic, multiple servers and cache sites
  Consider using Berkeley ns network simulator

40
Why Analytical Workloads?

Model the functional dependence of performance
metrics on workload variables.
Separate the effects of workload variables.
Predict behavior for different workloads.

41
Single Server Simulation

Sessions: Ta varied, Exp(a)
Embedded images: Ta = 221 ms, Log-Normal(a,b)
Connection duration: Ts = 289 ms, Log-Normal(a,b)
Session probability: P = 0.59
  HTML with images: P = 0.13
  HTML no images: P = 0.30
  Non-HTML: P = 0.16
Embedded image: P = 0.41

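
The request mix above can drive a simple workload generator. A minimal sketch, reading Ta as the mean interarrival time and taking the four class probabilities from the slide; the function names, the seed and the 500 ms mean are made up, and the log-normal parameters (a, b) are not given on the slide, so only the exponential session arrivals are generated here.

```python
import random

# Class probabilities from the slide; the three session classes sum to 0.59.
REQUEST_CLASSES = {
    "HTML with images": 0.13,
    "HTML no images": 0.30,
    "Non-HTML": 0.16,
    "Embedded image": 0.41,
}


def sample_class(rng):
    """Draw one request class according to the slide's probabilities."""
    names = list(REQUEST_CLASSES)
    return rng.choices(names, weights=[REQUEST_CLASSES[n] for n in names])[0]


def session_arrival_times(mean_interarrival_ms, n, rng):
    """Arrival times (ms) of n sessions with exponential interarrival gaps."""
    t, times = 0.0, []
    for _ in range(n):
        t += rng.expovariate(1.0 / mean_interarrival_ms)
        times.append(t)
    return times


rng = random.Random(1)  # fixed seed (arbitrary) so a run is repeatable
arrivals = session_arrival_times(500.0, 5, rng)
classes = [sample_class(rng) for _ in range(5)]
```

Varying `mean_interarrival_ms` changes the offered load while leaving the request mix untouched, which is the separation of workload variables an analytical workload is meant to give.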
42
Size and Type Distributions

Object type    P_request   Mean Size   Size Distribution
HTML           0.430       4 KB        Pareto (a)
Image          0.506       11 KB       Pareto (a)
Audio          0.003       140 KB      Pareto (a)
Video          0.004       452 KB      Pareto (a)
Application    0.007       260 KB      Pareto (a)
Dynamic        0.019       1 KB        Pareto (a)
Other          0.031       11 KB       Pareto (a)

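
Object types and sizes can be sampled directly from this table. A minimal sketch: the shape parameter a is not given on the slide, so `alpha=1.5` is an assumed value, and the scale is chosen so that the Pareto mean, alpha * scale / (alpha - 1), equals the tabulated mean size. The function name is hypothetical.

```python
import random

# (P_request, mean size in KB) from the table above.
OBJECT_TYPES = {
    "HTML": (0.430, 4),
    "Image": (0.506, 11),
    "Audio": (0.003, 140),
    "Video": (0.004, 452),
    "Application": (0.007, 260),
    "Dynamic": (0.019, 1),
    "Other": (0.031, 11),
}


def sample_object(rng, alpha=1.5):
    """Return (type, size in KB); alpha is an assumed Pareto shape and must exceed 1."""
    names = list(OBJECT_TYPES)
    name = rng.choices(names, weights=[OBJECT_TYPES[n][0] for n in names])[0]
    mean_kb = OBJECT_TYPES[name][1]
    scale = mean_kb * (alpha - 1) / alpha  # minimum size that yields the desired mean
    # paretovariate(alpha) returns values >= 1 with shape alpha.
    return name, scale * rng.paretovariate(alpha)


rng = random.Random(7)  # arbitrary seed
samples = [sample_object(rng) for _ in range(100)]
```

With a shape this close to 1 the distribution is heavy-tailed, so sample means converge slowly toward the tabulated values.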
43
Server Load - MB Transferred
44
Server Load - Requests received
45
Connection Refusals
46
Effect on network congestion

47
Prototype Design
[Diagram labels: SDP Proxy Server, SDP Server, HTTP Proxy Server (two), HTTP Server; links labeled SDP PROTOCOL and HTTP]
48
Contributions

Taxonomy
Network cache design
  Scalable design: direct delivery AND Internet-wide sharing
  Cache site directory: lazy mesh organization
  Cache site selection: dynamic vs. static criteria
  Web page structure: concurrent requests from different sites
Network cache simulation
  Analytical workload
  Model network traffic, multiple servers & caches
  Estimate relative response times
Implementation

49

Clinton Jeffery
Samir Das
Garry Bernal
Shannon Williams

51
Cache Site Selection

Static
Geography
Hops
Capacity BW
Average BW
Dynamic
Latency measurement (ping)
BW measurement (? ping)

52
How is server load reduced?

Balance request load across servers and proxy
server caches.

53
How is congestion reduced?

Selecting cache sites from run-time response
measurements helps use less congested routes.
Retrieving embedded objects from multiple sites
helps distribute network traffic.

54
How is response time reduced?