Title: Squirrel: A peer-to-peer web cache


1
Squirrel: A peer-to-peer web cache
  • Sitaram Iyer (Rice University)
  • Joint work with
  • Ant Rowstron (MSR Cambridge)
  • Peter Druschel (Rice University)

PODC 2002 / Sitaram Iyer / Tuesday July 23 / Monterey, CA
2
Web Caching
  • Reduces latency,
  • external traffic, and
  • load on web servers and routers.
  • Deployed at Corporate network boundaries, ISPs,
    Web Servers, etc.

3
Web Cache
[Diagram: clients on the corporate LAN, each with a browser and a browser cache, a centralized web cache, and a web server on the Internet.]
4
Cooperative Web Cache
[Diagram: clients on the corporate LAN, each with a browser and a browser cache, several cooperating web caches, and a web server on the Internet.]
5
Decentralized Web Cache
[Diagram: clients on the corporate LAN, each with a browser and a browser cache, joined by Squirrel, and a web server on the Internet. No separate web cache appears.]
6
Distributed Hash Table
  • Peer-to-peer location service: Pastry

[Diagram: nodes holding <key,value> pairs (k1,v1) through (k6,v6) on top of a peer-to-peer routing and location substrate. Operations: Insert(k,v), Lookup(k).]
  • Completely decentralized and self-organizing
  • Fault-tolerant, scalable, efficient
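
As a rough illustration of the Insert(k,v) and Lookup(k) operations listed above, here is a minimal single-process sketch of a distributed hash table. It is not Pastry: the `ToyDHT` class, the 32-bit identifier space, and the "numerically closest node" rule computed from a global node list are simplifying assumptions made for this example. A real substrate routes between nodes without any global view.

```python
import hashlib


def key_id(key: str, bits: int = 32) -> int:
    """Hash a key into a fixed-size identifier space (32 bits is an assumption)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << bits)


class ToyDHT:
    """Hypothetical toy DHT: each key lives on the node whose ID is closest."""

    def __init__(self, node_ids):
        # Every node has its own local store; there is no central table.
        self.stores = {nid: {} for nid in node_ids}

    def _responsible_node(self, key: str) -> int:
        kid = key_id(key)
        return min(self.stores, key=lambda nid: abs(nid - kid))

    def insert(self, key: str, value) -> int:
        """Store the pair on the responsible node and return that node's ID."""
        nid = self._responsible_node(key)
        self.stores[nid][key] = value
        return nid

    def lookup(self, key: str):
        """Ask the responsible node for the value; None if it was never inserted."""
        nid = self._responsible_node(key)
        return self.stores[nid].get(key)


dht = ToyDHT(node_ids=[key_id(f"node-{i}") for i in range(8)])
dht.insert("k1", "v1")
print(dht.lookup("k1"))  # v1
print(dht.lookup("k9"))  # None
```

Because both operations compute the responsible node from the key alone, any node can answer a lookup without consulting a central index.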

7
Why peer-to-peer?
  • Cost of a dedicated web cache: no additional hardware
  • Administrative effort: self-organizing network
  • Scaling implies upgrading: resources grow with clients

8
Setting
  • Corporate LAN
  • 100 - 100,000 desktop machines
  • Located in a single building or campus
  • Each node runs an instance of Squirrel
  • Sets it as the browser's proxy

9
Mapping Squirrel onto Pastry
  • Two approaches
  • Home-store
  • Directory

10
Home-store model
[Diagram: the client computes the URL hash and the request goes to the home node.]
11
Home-store model
[Diagram: client and home node.]
That's how it works!
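
The home-store flow can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: SHA-1 as the URL hash, the 32-bit identifier space, the `HomeStoreCache` class, and the `fetch_from_origin` callback are all hypothetical, and freshness checks are left out.

```python
import hashlib

ID_BITS = 32  # assumed identifier width for this sketch


def url_key(url: str) -> int:
    """Hash a URL into the identifier space (SHA-1 is an assumption here)."""
    return int.from_bytes(hashlib.sha1(url.encode()).digest(), "big") % (1 << ID_BITS)


def home_node(url: str, node_ids) -> int:
    """The home is the node whose ID is numerically closest to the URL hash."""
    k = url_key(url)
    return min(node_ids, key=lambda nid: abs(nid - k))


class HomeStoreCache:
    def __init__(self, node_ids, fetch_from_origin):
        self.node_ids = list(node_ids)
        self.fetch_from_origin = fetch_from_origin  # hypothetical origin stub
        self.home_store = {nid: {} for nid in self.node_ids}
        self.local_cache = {nid: {} for nid in self.node_ids}

    def get(self, client_id, url):
        """Return (object, source); source is 'local', 'home' or 'origin'."""
        if url in self.local_cache[client_id]:
            return self.local_cache[client_id][url], "local"
        home = home_node(url, self.node_ids)
        if url in self.home_store[home]:
            obj, source = self.home_store[home][url], "home"
        else:
            obj, source = self.fetch_from_origin(url), "origin"
            self.home_store[home][url] = obj  # the home keeps a copy
        self.local_cache[client_id][url] = obj  # the client keeps one too
        return obj, source


nodes = [url_key(f"desktop-{i}") for i in range(16)]
cache = HomeStoreCache(nodes, fetch_from_origin=lambda url: f"<body of {url}>")
url = "http://example.com/index.html"
print(cache.get(nodes[0], url)[1])  # origin
print(cache.get(nodes[1], url)[1])  # home
print(cache.get(nodes[1], url)[1])  # local
```

Only the first request for a URL leaves the LAN. Later requests from other clients are answered by the home node, and repeat requests by the client's own cache.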
12
Directory model
  • Client nodes always cache objects locally.
  • Home-store: the home node also stores objects.
  • Directory: the home node only stores pointers to
    recent clients, and forwards requests.

13
Directory model
[Diagram: client and home node.]
14
Directory model
[Diagram: client and home node. The home randomly chooses an entry from its table.]
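
A minimal sketch of the directory idea, in which the home keeps pointers to recent clients and picks one at random. The `DirectoryHome` class, the directory size `MAX_DELEGATES`, and the rule of never pointing a requester at itself are assumptions for this example; the slides do not specify them.

```python
import random
from collections import deque

MAX_DELEGATES = 4  # assumed number of pointers kept per URL


class DirectoryHome:
    """Hypothetical home node: stores pointers to recent clients, not objects."""

    def __init__(self, rng=None):
        self.directory = {}  # url -> deque of client IDs that recently fetched it
        self.rng = rng or random.Random()

    def route(self, url, requester):
        """Return the client to fetch from, or None to go to the origin server."""
        delegates = self.directory.setdefault(url, deque(maxlen=MAX_DELEGATES))
        candidates = [c for c in delegates if c != requester]
        chosen = self.rng.choice(candidates) if candidates else None
        # The requester will hold a copy afterwards, so record a pointer to it.
        if requester not in delegates:
            delegates.append(requester)
        return chosen


home = DirectoryHome(rng=random.Random(0))
url = "http://example.com/logo.png"
print(home.route(url, "client-A"))  # None: nobody holds it yet, go to the origin
print(home.route(url, "client-B"))  # client-A: the only recorded holder
print(home.route(url, "client-C"))  # client-A or client-B, chosen at random
```

The bounded deque drops the oldest pointer once the directory is full, so the table for a popular object keeps changing as new clients fetch it.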
15
Directory Advantages
  • Avoids storing unnecessary copies of objects.
  • Rapidly changing directory for popular objects
    seems to improve load balancing.
  • Home-store scheme can incur hotspots.

16
Directory Disadvantages
  • Cache insertion only happens at clients, so
  • active clients store all the popular objects,
  • inactive clients waste most of their storage.
  • Implications
  • Reduced cache size.
  • Load imbalance.

17
Directory: load spike example
  • Web page with many embedded images, or
  • Periods of heavy browsing.
  • Many home nodes point to such clients!
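
The load spike can be made concrete with a small count. This is a sketch under assumed numbers: a LAN of 100 nodes, a page with 50 embedded images, and SHA-1 truncated to 32 bits for identifiers are all hypothetical choices. Client A has loaded the page once, and client B now loads the same page.

```python
import hashlib
from collections import Counter

NUM_NODES = 100   # assumed LAN size for the illustration
NUM_IMAGES = 50   # assumed number of embedded images on one page


def ident(text: str) -> int:
    """32-bit identifier derived from SHA-1 (an assumption of this sketch)."""
    return int.from_bytes(hashlib.sha1(text.encode()).digest()[:4], "big")


node_ids = [ident(f"node-{i}") for i in range(NUM_NODES)]
image_urls = [f"http://example.com/img/{i}.png" for i in range(NUM_IMAGES)]


def home_of(url: str) -> int:
    """Home node: the node ID numerically closest to the URL hash."""
    k = ident(url)
    return min(node_ids, key=lambda nid: abs(nid - k))


first_client = node_ids[0]  # client A, the only node holding the images so far

# Directory: every image's home points at the only known holder, client A.
directory_load = Counter(first_client for _ in image_urls)

# Home-store: each image is served by its own home node.
home_store_load = Counter(home_of(url) for url in image_urls)

print("directory : busiest node serves", max(directory_load.values()), "objects")
print("home-store: busiest node serves", max(home_store_load.values()), "objects")
```

Under the directory scheme client A serves all 50 images in one burst, while under home-store the same requests are spread over the images' home nodes.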

Evaluate
18
Trace characteristics
Microsoft in                  Redmond          Cambridge
Total duration                1 day            31 days
Number of clients             36,782           105
Number of HTTP requests       16.41 million    0.971 million
Peak request rate             606 req/sec      186 req/sec
Number of objects             5.13 million     0.469 million
Number of cacheable objects   2.56 million     0.226 million
Mean cacheable object reuse   5.4 times        3.22 times
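
A few derived figures help when reading the trace characteristics. The short script below only does arithmetic on the numbers listed for the two traces; the dictionary layout and the `derived` helper are hypothetical names introduced for this example.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Values copied from the trace characteristics above.
traces = {
    "Redmond": dict(days=1, clients=36_782, requests=16.41e6,
                    objects=5.13e6, cacheable=2.56e6),
    "Cambridge": dict(days=31, clients=105, requests=0.971e6,
                      objects=0.469e6, cacheable=0.226e6),
}


def derived(t):
    """Compute mean request rate, requests per client, and cacheable share."""
    return {
        "mean_req_per_sec": t["requests"] / (t["days"] * SECONDS_PER_DAY),
        "req_per_client": t["requests"] / t["clients"],
        "cacheable_fraction": t["cacheable"] / t["objects"],
    }


for name, t in traces.items():
    d = derived(t)
    print(f"{name}: {d['mean_req_per_sec']:.2f} req/s mean, "
          f"{d['req_per_client']:.0f} requests per client, "
          f"{d['cacheable_fraction']:.1%} of objects cacheable")
```

The Redmond mean works out to roughly 190 req/sec against a peak of 606 req/sec, and in both traces about half of the objects are cacheable.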
19
Total external traffic (Redmond)
[Plot: total external traffic (GB, about 85 to 105; lower is better) against per-node cache size (MB, log scale from 0.001 to 100). Curves: no web cache, directory, home-store, centralized cache.]
20
Total external traffic (Cambridge)
[Plot: total external traffic (GB, about 5.5 to 6.1; lower is better) against per-node cache size (MB, log scale from 0.001 to 100). Curves: no web cache, directory, home-store, centralized cache.]
21
LAN Hops (Redmond)
[Plot: percentage of cacheable requests (0 to 100) against total hops within the LAN (0 to 6). Series: centralized, home-store, directory.]
22
LAN Hops (Cambridge)
[Plot: percentage of cacheable requests (0 to 100) against total hops within the LAN (0 to 5). Series: centralized, home-store, directory.]
23
Load in requests per sec (Redmond)
[Plot: number of times observed (log scale, 1 to 100000) against max objects served per node per second (0 to 50). Series: home-store, directory.]
24
Load in requests per sec (Cambridge)
[Plot: number of times observed (log scale, 1 to 1e07) against max objects served per node per second (0 to 50). Series: home-store, directory.]
25
Load in requests per min (Redmond)
[Plot: number of times observed (log scale, 1 to 100) against max objects served per node per minute (0 to 350). Series: home-store, directory.]
26
Load in requests per min (Cambridge)
[Plot: number of times observed (log scale, 1 to 10000) against max objects served per node per minute (0 to 120). Series: home-store, directory.]
27
Fault tolerance
  • Sudden node failures result in partial loss of
    cached content.
  • Home-store: proportional to failed nodes.
  • Directory: more vulnerable.

28
Fault tolerance
If 1% of Squirrel nodes abruptly crash, the
fraction of lost cached content is:
             Home-store              Directory
Redmond      Mean 1%, Max 1.77%      Mean 1.71%, Max 19.3%
Cambridge    Mean 1%, Max 3.52%      Mean 1.65%, Max 9.8%
29
Conclusions
  • Possible to decentralize web caching.
  • Performance comparable to a centralized web
    cache,
  • Is better in terms of cost, scalability, and
    administration effort, and
  • Under our assumptions, the home-store scheme is
    superior to the directory scheme.

30
Other aspects of Squirrel
  • Adaptive replication: hotspot avoidance, improved robustness
  • Route caching: fewer LAN hops

31
Thanks.
32
(backup) Storage utilization
Redmond          Home-store    Directory
Total            97641 MB      61652 MB
Mean per-node    2.6 MB        1.6 MB
Max per-node     1664 MB       1664 MB
33
(backup) Fault tolerance
             Home-store                      Directory
Equations    Mean H/O, Max H_max/O           Mean (HS)/O, Max max(H_max, S_max)/O
Redmond      Mean 0.0027%, Max 0.0048%       Mean 0.198%, Max 1.5%
Cambridge    Mean 0.95%, Max 3.34%           Mean 1.68%, Max 12.4%
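
The home-store means above can be cross-checked with one line of arithmetic. Assuming cached objects are spread evenly over the nodes (an assumption of this sketch, as is the helper name `mean_loss_percent`), losing one node out of N loses about 1/N of the content. The node counts are the client counts from the trace characteristics.

```python
def mean_loss_percent(num_nodes: int, failed_nodes: int = 1) -> float:
    """Expected share of cached content lost, in percent, assuming objects
    are spread uniformly over the nodes (an assumption of this sketch)."""
    return 100.0 * failed_nodes / num_nodes


for site, n in {"Redmond": 36_782, "Cambridge": 105}.items():
    print(f"{site}: {mean_loss_percent(n):.4f}% of cached content per failed node")
```

This gives about 0.0027% for Redmond and about 0.95% for Cambridge, in line with the home-store means in the table. The directory figures depend on how objects are distributed among clients and do not follow from this simple estimate.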
34
(backup) Full home-store protocol
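[Diagram, steps 1 to 3: a client request (req) travels over the LAN to the home node, as do requests from other clients. Case a: the client receives the object or a not-modified reply from the home. Case b: the home sends a request over the WAN to the origin server, and the object or not-modified reply comes from the origin.]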
35
(backup) Full directory protocol
36
(backup) Peer-to-peer Computing
  • Decentralize a distributed protocol
  • Scalable
  • Self-organizing
  • Fault tolerant
  • Load balanced
  • Not automatic!!

37
Decentralized Web Cache
[Diagram: browsers with their browser caches on the LAN, and a web server on the Internet.]
38
Challenge
  • Decentralized web caching algorithm
  • Need to achieve those benefits in practice!
  • Need to keep overhead unnoticeably low.
  • Node failures should not become significant.

39
Peer-to-peer routing, e.g., Pastry
  • Peer-to-peer object location and routing
    substrate: a distributed hash table.
  • Reliably maps an object key to a live node.
  • Routes in log_16(N) steps
  • (e.g. 3-4 steps for 100,000 nodes)
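
To get a feel for the log_16(N) figure, the expression can be evaluated for a few network sizes. The helper name `log16_steps` is hypothetical, and the result is only the value of the formula, not a measured hop count.

```python
import math


def log16_steps(num_nodes: int) -> float:
    """Evaluate log base 16 of the number of nodes."""
    return math.log(num_nodes, 16)


for n in (100, 10_000, 100_000):
    print(f"N = {n:>7}: log_16(N) = {log16_steps(n):.2f}")
```

For N = 100,000 the formula gives about 4.15, which is consistent with the handful of steps quoted on the slide. Growing the network by a factor of 16 adds only one more step.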

40
Home-store is better!
  • Simpler home-store scheme achieves load balancing
    by hash function randomization.
  • Directory scheme implicitly relies on access
    patterns for load distribution.

41
Directory scheme seems better
  • Avoids storing unnecessary copies of objects.
  • Rapidly changing directory for popular objects
    results in load balancing.

42
Interesting difference
  • Consider
  • Web page with many images, or
  • Heavily browsing node
  • Directory: many pointers to some node.
  • Home-store: natural load balancing.

Evaluate
43
Fault tolerance
When a single Squirrel node crashes, the fraction
of lost cached content is:
             Home-store                      Directory
Redmond      Mean 0.0027%, Max 0.0048%       Mean 0.2%, Max 1.5%
Cambridge    Mean 0.95%, Max 3.34%           Mean 1.7%, Max 12.4%