Title: Cooperative Web Caching Using Server-Directed Proxy Sharing
1 Cooperative Web Caching Using Server-Directed Proxy Sharing
- Sandra G. Dykes
- Ph.D. Dissertation Proposal
- University of Texas at San Antonio
2 Talk Outline
- Problem: Internet performance
- Dissertation overview
- Web server workloads
- Taxonomy and related research
- SDP cache mesh
- Simulation
- Prototype
3 Internet Performance
- Slow, variable response time
- Denial of service (cannot connect)
- Due to:
- Congestion at network routers, NAPs, MAEs
- Overloaded servers
- Interaction of TCP/IP with HTTP
4 Internet Caching
[Figure: users reach the server across the Internet through shared proxy servers]
5 Internet Caching
- The Internet needs a scalable cache system.
- Cache designs should consider network traffic.
- Simulations should consider network traffic.
6 Dissertation Proposal
- Design, simulate and implement a protocol for cooperative web caching: Server-Directed Proxy Sharing (SDP).
- Develop methods to improve simulation of network cache designs.
7 Phases of Work
- Collect and analyze Web server traces
- Specify web cache design
- Simulate protocol
- Implement prototype
8 Web Server Workloads (Arlitt & Williamson; Bestavros et al.; Mogul; Gwertzman & Seltzer)
- Exponential growth
- Skewed popularity: 90% of requests go to 10% of files
- Small objects Mean
- HTML and images: 90% of requests, 54-92% of bytes
- Remote requests: 70%
- Duplicate requests: 97%
- Long lifetimes: 50 - 100 days
- Bursty arrivals
- Geography affects popularity
- Flash crowds and hot spots
9 UTSA Web Server Traces
- CS Division
  - May 1997 - Sept 1997
  - 561,292 requests
  - 3.8 GB (24 MB/day)
- Visualization and Image Processing Lab
  - May 1996 - Sept 1997
  - 552,046 requests
  - 4.2 GB (8 MB/day)
10 Growth in Server Load: UTSA-CS
11 Popularity Skew: UTSA-CS, UTSA-VIS
12 Object Size - Transfers: UTSA-CS, UTSA-VIS
13 Workload Comparison
Values are listed as UTSA-CS / UTSA-VIS / Literature.
- Growth: Yes / Yes / Exponential
- 10% of objects satisfy: 69% req. / 79% req. / 90% req.
- Object size: mean, distribution
- HTML and images, requests: 92% / 96% / 90%
- HTML and images, bytes: 77% / 45% / 52 - 94%
- Remote requests: 82% / 78% / 70%
- Duplicate transfers: 99% / 99% / 97%
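14 Object Types
Each entry gives share of transfers, then average size.
- HTML: UTSA 43%, 4 KB; Literature 42%, 4 KB
- Image: UTSA 51%, 12 KB; Literature 54%, 13 KB
- Audio: UTSA 0.3%, 200 KB; Literature 5%, 179 KB
- Video: UTSA 0.5%, 452 KB; Literature 0.4%, 2300 KB
- Dynamic: UTSA 2%, 1 KB; Literature 0-9%, 5 KB
- Embedded images: UTSA 42%, 3.3 images per page; Literature ( 3 - 5 ? ) images per page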
15 Implications for Web caching
- Web caching is viable
- Small percent of objects satisfy most requests
- Long object lifetimes
- Dynamic objects are small fraction of load
- Bandwidth and latency are both important
- Consider geography and network topology
- Adapt quickly to shifts in popularity
- Use Web page structure (embedded images)
16 Current Internet Caching
- Proxy server caching
- Hierarchical proxy server caching
17 Proxy Server Caching
[Figure: one Web server serving three independent proxy servers]
18 Proxy server caching is not enough
- Hit rates depend upon overlap in user requests, so a cache needs many users or users accessing the same objects.
- Maximum achievable hit rates (infinite cache, 2-4 months)
  - Boston Univ.: 53%
  - Virginia Tech.: 29%, 30%, 46%
  - DEC (Glassman): 30 - 50%
19 Hierarchical Proxy Server Caching (Harvest / Squid)
[Figure: tree of proxies between the Web server and users, with NLANR at the top of the proxy hierarchy]
20 Network Caching Research
- Taxonomy for network caching
- Research projects
21 Taxonomy for Network Caching
- Discovery
  - Fixed Cache
  - Group Query: Manual, Automatic
  - Cache Site Directory
- Dissemination: Client-initiated, Server-initiated
- Delivery: Direct, Indirect
25 Web Caching Projects
Each project is listed with its Delivery / Discovery / Dissemination choices.
- Proxy server cache: Direct / Fixed cache / Client-initiated
- Harvest / Squid: Indirect / Group query (manual) / Client-initiated
- Zhang, Floyd, Jacobson: Indirect / Group query (auto) / Client-initiated
- Malpani, Lorch, Berger: Indirect / Group query (auto) / Client-initiated
- Gwertzman & Seltzer: Direct / Centralized directory / Server-initiated
- Bestavros et al.: Direct / Centralized directory / Server-initiated
- Tewari, Dahlin, Vin, ...: Direct / Hierarchical directory / Server-initiated
- SDP: Direct / Lazy mesh directory / Client-initiated
26 SDP Design Choices
- Discovery: Cache Site Directory
  - O(1) discovery if hit in local cache site directory.
  - Fixed cache and non-hierarchical group query do not scale.
  - Cache hierarchies have large miss penalty (response time).
- Dissemination: Client-initiated
  - Automatically adapts to shifts in popularity.
  - Uses current data while server-initiated uses historic data.
- Delivery: Direct
  - Less network traffic and lower response times.
  - Indirect delivery under HTTP is store-and-forward.
27 Cache Site Directory
- Discovery, dissemination & delivery of location info
  - Similar to object caching, but smaller size makes prefetching viable.
- Directory organization
- Cache consistency
- Overhead
  - Keep low: don't spend more time getting location info than objects!
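A minimal sketch of what a Cache Site Directory could look like, assuming it is a bounded in-memory map from object URL to the proxies believed to hold that object. The class name, method names, and the per-object limit are hypothetical and not taken from the SDP design.

```python
from collections import defaultdict


class CacheSiteDirectory:
    """Maps an object URL to proxy caches believed to hold it (hypothetical layout)."""

    def __init__(self, max_sites_per_object=4):
        # Assumed bound: keeps location info small relative to the objects.
        self.max_sites = max_sites_per_object
        self.sites = defaultdict(list)  # url -> cache sites, newest first

    def lookup(self, url):
        """Return known cache sites for url; an empty list is a directory miss."""
        return list(self.sites.get(url, []))

    def add(self, url, site):
        """Record that site holds url, keeping only the newest entries."""
        entries = self.sites[url]
        if site in entries:
            entries.remove(site)
        entries.insert(0, site)
        del entries[self.max_sites:]

    def merge_popular_list(self, site, popular_urls):
        """Prefetch location info from a list of URLs that site advertises."""
        for url in popular_urls:
            self.add(url, site)

    def invalidate(self, url, site):
        """Drop a stale entry, e.g. when site no longer caches url."""
        entries = self.sites.get(url)
        if entries and site in entries:
            entries.remove(site)
```

Lookups are a single dictionary access, and the per-object bound is one way to keep directory overhead below the cost of fetching the objects themselves.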
28 Server-Directed Proxy Sharing (SDP)
- Proxy server caches share objects
- Flat mesh of caches
29 [Figure: three architectures side by side: Proxy Caching, Hierarchical Proxy Caching (Web server, NLANR, tree of proxies), and SDP (Web server with a flat mesh of proxies)]
30 SDP Components
[Figure labels: Server, Proxy, Cache, Proxy Table, Cache Site Directory, Popular List]
31 SDP Protocol
1. Proxy looks for object in its cache. If found, then done.
[Figure: Proxy and its Cache]
32 SDP Protocol
2. Proxy looks up object cache sites in its Cache Site Directory. If found, then skip next step.
[Figure: Proxy and its Cache Site Directory]
33 SDP Protocol
3. Proxy sends request to Server. Server returns object and/or list of cache sites.
[Figure: Proxy sends REQUEST to Server; Server replies with OBJECT and/or Cache Sites. Updates shown: Cache (object), Proxy Table (request info), Cache Site Directory (cache sites for object)]
34 SDP Protocol
4. Proxy determines fastest sites.
[Figure: Proxy sends ECHO to each candidate site; the 1st REPLY marks the fastest site]
35 SDP Protocol
5. Proxy requests object from fastest site. Site returns object and Popular List.
[Figure: Proxy sends REQUEST to Proxy i; Proxy i replies with OBJECT + POPULAR LIST. Updates shown: Cache (object), Cache Site Directory (Proxy i Popular List)]
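Steps 1 to 5 can be sketched as a small in-process model. This is an illustration under stated assumptions, not the SDP implementation: the class and method names are hypothetical, a table of round-trip times stands in for the ECHO probes, and the server policy (redirect whenever another proxy has already requested the object) is assumed, since the slides only say the server returns the object and/or cache sites.

```python
class Server:
    """Origin server that remembers, per object, which proxies requested it."""

    def __init__(self, objects):
        self.objects = objects      # url -> content
        self.proxy_table = {}       # Proxy Table: url -> requesting proxies

    def request(self, url, proxy, force=False):
        """Step 3: return (object, cache_sites); object is None on a redirect."""
        requesters = self.proxy_table.setdefault(url, [])
        sites = [p for p in requesters if p != proxy]
        if proxy not in requesters:
            requesters.append(proxy)
        if sites and not force:
            return None, sites      # assumed policy: send the proxy to a peer
        return self.objects[url], sites


class Proxy:
    def __init__(self, name, server, mesh, rtt_ms):
        self.name, self.server = name, server
        self.mesh = mesh            # name -> Proxy (flat mesh)
        self.rtt_ms = rtt_ms        # name -> round-trip time, stand-in for ECHO
        self.cache = {}
        self.directory = {}         # Cache Site Directory: url -> site names
        self.popular = []           # Popular List advertised to peers

    def _store(self, url, obj):
        self.cache[url] = obj
        if url not in self.popular:
            self.popular.append(url)

    def serve_peer(self, url):
        """Reply to a peer with (object or None, Popular List)."""
        return self.cache.get(url), list(self.popular)

    def get(self, url):
        if url in self.cache:                               # step 1
            return self.cache[url], "local"
        sites = list(self.directory.get(url, []))           # step 2
        if not sites:                                       # step 3
            obj, sites = self.server.request(url, self.name)
            if obj is not None:
                self._store(url, obj)
                return obj, "server"
            self.directory[url] = list(sites)
        for site in sorted(sites, key=self.rtt_ms.get):     # step 4
            obj, popular = self.mesh[site].serve_peer(url)  # step 5
            for p_url in popular:                           # piggybacked list
                known = self.directory.setdefault(p_url, [])
                if site not in known:
                    known.append(site)
            if obj is not None:
                self._store(url, obj)
                return obj, site
        # Every listed site had dropped the object: fall back to the server.
        obj, _ = self.server.request(url, self.name, force=True)
        self._store(url, obj)
        return obj, "server"
```

In this model a proxy that learns of an object through a peer's Popular List can later fetch it without contacting the server at all, which is the load reduction the directory is meant to provide.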
36 SDP Protocol
For Web pages, Proxy requests embedded images in parallel from different cache sites, using the fastest sites.
[Figure: Proxy sends REQUEST GIF 1 to Proxy i and REQUEST GIF 2 to Proxy j; each replies with its GIF + POPULAR LIST]
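One way to picture the parallel retrieval is to spread a page's embedded images round-robin over the fastest cache sites and fetch them concurrently. The sketch below assumes the sites are already ordered fastest first; `fetch(site, image)` is a caller-supplied stand-in for the actual request, and the `fanout` parameter is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def assign_images(images, sites_by_speed, fanout=2):
    """Spread embedded images round-robin over the `fanout` fastest sites."""
    chosen = sites_by_speed[:fanout]
    if not chosen:
        raise ValueError("no cache sites available")
    return {img: chosen[i % len(chosen)] for i, img in enumerate(images)}


def fetch_page_images(images, sites_by_speed, fetch, fanout=2):
    """Fetch all images concurrently, one request per image."""
    plan = assign_images(images, sites_by_speed, fanout)
    with ThreadPoolExecutor(max_workers=max(1, len(plan))) as pool:
        futures = {img: pool.submit(fetch, site, img) for img, site in plan.items()}
        return {img: fut.result() for img, fut in futures.items()}
```

Round-robin is only one possible assignment; weighting by measured response time would be an equally plausible choice.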
37 Advantages of Design
- Technical
  - Reduces denial of service.
  - Designed to reduce server load AND response time.
  - Network correct - designed to reduce congestion.
- Practical
  - No change to routers or HTTP protocol.
  - No change to local cache policy.
  - No central administration, equipment or personnel.
38 Design Issues & Questions
- Cache consistency
  - Stale object
  - Object removed from cache
- What are hit rates at Cache Site Directories?
- Use response-time measurements or static criteria to choose cache site?
39 Simulation
- Analytical workload
- Preliminary: isolated single server model
- Next step
  - model network traffic, multiple servers and cache sites
  - consider using Berkeley ns network simulator
40 Why Analytical Workloads?
- Model the functional dependence of performance metrics on workload variables.
- Separate the effects of workload variables.
- Predict behavior for different workloads.
41 Single Server Simulation
- Sessions: Ta varied, Exp (a)
- Embedded images: Ta 221 ms, Log-Normal (a,b)
- Connection duration: Ts 289 ms, Log-Normal (a,b)
- Session: probability P = 0.59
  - HTML with images: P = 0.13
  - HTML no images: P = 0.30
  - Non-HTML: P = 0.16
- Embedded image: P = 0.41
42 Size and Type Distributions
Each entry gives P_request, mean size, and size distribution.
- HTML: 0.430, 4 KB, Pareto (a)
- Image: 0.506, 11 KB, Pareto (a)
- Audio: 0.003, 140 KB, Pareto (a)
- Video: 0.004, 452 KB, Pareto (a)
- Application: 0.007, 260 KB, Pareto (a)
- Dynamic: 0.019, 1 KB, Pareto (a)
- Other: 0.031, 11 KB, Pareto (a)
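An analytical workload of this kind can be sampled with the Python standard library. The sketch below uses the request probabilities and mean sizes from slide 42 and the distribution families from slide 41. The Pareto shape and log-normal sigma are assumed values, since the slides give no parameters, and treating 221 ms and 289 ms as means is also an assumption.

```python
import math
import random

# (P_request, mean size in KB) per object type, from slide 42.
OBJECT_TYPES = {
    "HTML": (0.430, 4), "Image": (0.506, 11), "Audio": (0.003, 140),
    "Video": (0.004, 452), "Application": (0.007, 260),
    "Dynamic": (0.019, 1), "Other": (0.031, 11),
}
PARETO_SHAPE = 1.5      # assumed; must exceed 1 for the mean to exist
LOGNORMAL_SIGMA = 1.0   # assumed


def sample_request(rng):
    """Draw (object_type, size_kb) for one request."""
    names = list(OBJECT_TYPES)
    weights = [OBJECT_TYPES[n][0] for n in names]
    kind = rng.choices(names, weights=weights, k=1)[0]
    mean_kb = OBJECT_TYPES[kind][1]
    # Pareto with shape a > 1 and scale x_m has mean a * x_m / (a - 1),
    # so choose x_m to reproduce the tabulated mean size.
    scale = mean_kb * (PARETO_SHAPE - 1) / PARETO_SHAPE
    return kind, scale * rng.paretovariate(PARETO_SHAPE)


def session_arrivals(rng, mean_interarrival_ms, count):
    """Cumulative arrival times (ms) with exponential interarrival times."""
    t, times = 0.0, []
    for _ in range(count):
        t += rng.expovariate(1.0 / mean_interarrival_ms)
        times.append(t)
    return times


def lognormal_ms(rng, mean_ms, sigma=LOGNORMAL_SIGMA):
    """Log-normal sample whose mean is mean_ms: mean = exp(mu + sigma^2 / 2)."""
    mu = math.log(mean_ms) - sigma ** 2 / 2
    return rng.lognormvariate(mu, sigma)
```

Passing a seeded `random.Random` instance keeps runs reproducible, which makes it easier to vary one workload variable at a time.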
43 Server Load - MB Transferred
44 Server Load - Requests received
45 Connection Refusals
46 Effect on network congestion
47 Prototype Design
[Figure: SDP Proxy Server and SDP Server linked by the SDP PROTOCOL, alongside HTTP Proxy Servers and an HTTP Server linked by HTTP]
48 Contributions
- Taxonomy
- Network cache design
  - Scalable design: direct delivery AND Internet-wide sharing
  - Cache site directory: lazy mesh organization
  - Cache site selection: dynamic vs. static criteria
  - Web page structure: concurrent requests from different sites
- Network cache simulation
  - Analytical workload
  - Model network traffic, multiple servers & caches
  - Estimate relative response times
- Implementation
49 - Clinton Jeffery
- Samir Das
- Garry Bernal
- Shannon Williams
51 Cache Site Selection
- Static
- Geography
- Hops
- Capacity BW
- Average BW
- Dynamic
- Latency measurement (ping)
- BW measurement (? ping)
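The two families of criteria can be contrasted in a short sketch. Everything here is illustrative: the site records, the field names `hops` and `capacity_mbps`, and the `probe` callback (a stand-in for a ping-style latency measurement) are hypothetical, and taking the median of several probes is a design choice of this sketch, not something the slides specify.

```python
def select_static(sites):
    """Static criterion: fewest hops, ties broken by higher capacity bandwidth."""
    return min(sites, key=lambda s: (s["hops"], -s["capacity_mbps"]))


def select_dynamic(sites, probe, samples=3):
    """Dynamic criterion: lowest median round-trip time over a few probes.

    probe(name) returns a round-trip time in ms, or None on timeout.
    Sites that never answer are skipped; returns None if none answer.
    """
    best, best_rtt = None, float("inf")
    for site in sites:
        readings = (probe(site["name"]) for _ in range(samples))
        rtts = sorted(r for r in readings if r is not None)
        if not rtts:
            continue
        median = rtts[len(rtts) // 2]
        if median < best_rtt:
            best, best_rtt = site, median
    return best
```

The static rule would keep choosing a nearby site even when its route is congested, while the dynamic rule can switch to a farther site that is currently answering faster.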
52 How is server load reduced?
- Balance request load across servers and proxy server caches.
53 How is congestion reduced?
- Selecting cache sites from run-time response measurements helps use less congested routes.
- Retrieving embedded objects from multiple sites helps distribute network traffic.
54 How is response time reduced?
- Discovery
  - Web pages: piggyback Proxy List onto HTML text.
  - Cache Site Directories and piggybacked Popular Lists.
- Delivery
  - Direct
  - Choose cache site from run-time estimates of response time.
  - Retrieve embedded images in parallel from multiple sites.