Title: ZGP001 (zphddef.ppt - 07/15/03)
1 Performance Evaluation of URL Routing for Content Distribution Networks
PhD defense by Zornitza Genova Prodanoff
Committee members
- Dr. K. J. Christensen (Major Professor)
- Dr. M. Varanasi
- Dr. R. Perez
- Dr. Chari
- Dr. Labrador
2 Acknowledgements
- I would like to thank
  - My major professor, Dr. Ken Christensen
  - My committee: Dr. Varanasi, Dr. Perez, Dr. Chari, and Dr. Labrador
  - Dr. Suen, for his comments at my proposal defense
  - My colleagues K. Yoshigoe, A. Aslam, G. Perrera, and J. Shahbazian
  - My family
3 Topics
- Motivation
- Problem and contributions
- URL Routing
- Improvements to URL routing
- Evaluation of URL signatures
- Evaluation of hashing for URL routing
- Summary
- List of my publications
4 Motivation
"2.5 Billion Hours Spent Waiting on the Web in 1998." (John Roth, chief executive of Nortel Networks, at Telecom '99)
5 Problem and contributions
- Problem
  - Excessive delay in the Internet caused by the inability to efficiently access distributed content in the Web
- My contributions
  1) Architected a new URL router that uses HTTP redirection
  2) Investigated a new use of CRC32 for reducing the size of routing tables
  3) Investigated a new self-adjusting hashing method for faster URL routing look-up
  4) Performed the first queueing evaluation of hashing; effects of correlation were discovered
6 Topics
- Motivation
- Problem and contributions
- URL Routing
- Improvements to URL routing
- Evaluation of URL signatures
- Evaluation of hashing for URL routing
- Summary
- List of my publications
7 URL routing
- Next-generation Internet: Content Distribution Networks (CDNs)
  - A CDN is an overlay network on the Internet
  - A CDN co-locates content throughout the world
- CDNs are of great commercial and research interest
  - $15 million in NSF funding for Web services research
  - Akamai is one major CDN provider
8 URL routing continued
[Figure: Global content distribution in a CDN. Example URLs: http://www.some.com/page, http://214.29.2.15/page, http://334.249.2.8/page]
9 URL routing continued
- HTTP redirection in a CDN
  - (1) HTTP request and redirect
  - (2) HTTP re-request and response
[Figure: HTTP redirection in a CDN, showing clients, a proxy cache, a reverse cache, the origin site, and a distributed server]
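As a minimal sketch of step (1), assuming the URL router redirects with an HTTP/1.1 302 (Found) response whose Location header names the chosen server, the redirect could be formatted as below. The helper name build_redirect and the server address are hypothetical; the slides do not specify which status code or headers the router sends.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: format an HTTP 302 redirect that sends the client
   to the same path on the chosen server (step (1) in the figure).
   Returns the number of characters written, or -1 if buf is too small. */
static int build_redirect(char *buf, size_t size,
                          const char *server, const char *path)
{
    assert(buf != NULL && server != NULL && path != NULL);
    int n = snprintf(buf, size,
                     "HTTP/1.1 302 Found\r\n"
                     "Location: http://%s%s\r\n"
                     "Content-Length: 0\r\n"
                     "\r\n",
                     server, path);
    if (n < 0 || (size_t)n >= size)
        return -1;              /* formatting failed or output was truncated */
    return n;
}
```

A call such as build_redirect(buf, sizeof buf, "214.29.2.15", "/page") produces a response that sends the client on to http://214.29.2.15/page for the re-request in step (2).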
10 URL routing continued
[Figure: Architecture of a new URL router, showing a one-armed URL router, a Layer 3 switch, network links, and HTTP requests and redirects]
11 URL routing continued
- Need to exchange routing tables (digesting)
  - Summary Cache [17]
    - Uses Bloom filters to merge routing (hash) tables
    - A Bloom filter is probabilistic and does not support updates
    - False positives occur if hashes are non-unique
    - In the context of URLs, a false positive results in a routing collision
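To illustrate why a Bloom filter digest can report false positives, here is a minimal C sketch. It assumes a 1024-bit vector and four hash functions, and it substitutes seeded FNV-1a hashes for the MD5-derived hashes that Summary Cache uses; all names and parameters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BLOOM_BITS 1024u        /* assumed size of the bit vector */
#define BLOOM_K    4u           /* assumed number of hash functions */

typedef struct {
    uint8_t bits[BLOOM_BITS / 8];
} bloom;

/* Seeded FNV-1a: a stand-in for the MD5-derived hashes of Summary Cache. */
static uint32_t bloom_hash(const char *url, uint32_t seed)
{
    assert(url != NULL);
    uint32_t h = 2166136261u ^ seed;
    for (const unsigned char *p = (const unsigned char *)url; *p; p++) {
        h ^= *p;
        h *= 16777619u;
    }
    return h % BLOOM_BITS;
}

static void bloom_init(bloom *b) { memset(b->bits, 0, sizeof b->bits); }

/* Set the K bits selected by the URL. */
static void bloom_add(bloom *b, const char *url)
{
    for (uint32_t i = 0; i < BLOOM_K; i++) {
        uint32_t bit = bloom_hash(url, i * 0x9E3779B9u);
        b->bits[bit / 8] |= (uint8_t)(1u << (bit % 8));
    }
}

/* true = "possibly present" (other URLs may have set all K bits, which is
   a false positive); false = "definitely absent". */
static bool bloom_query(const bloom *b, const char *url)
{
    for (uint32_t i = 0; i < BLOOM_K; i++) {
        uint32_t bit = bloom_hash(url, i * 0x9E3779B9u);
        if (!(b->bits[bit / 8] & (1u << (bit % 8))))
            return false;
    }
    return true;
}
```

Because several URLs can set the same bit, clearing bits to remove one URL could also remove others, which is why a plain Bloom filter cannot be updated in place.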
12 URL routing continued
- Need to do look-ups in routing tables
- Why use hashing?
  - Build routing tables as hash tables for efficient look-up
- Idea of self-adjusting hashing
  - The most frequently used keys move closer to the head of the chain
  - With chained hashing, rearrange the chain after key accesses
  - Transposition rule for lists [50, 7]
  - Move-to-front rule for lists [33]
- Review of H1 hashing [74]
  - Self-adjusting by using transposition
13 URL routing continued
[Figure: Chained resolution of hash table collisions. A table with indices 0 to m-1; each index points to a chain of (key, record) entries k0/r0, k1/r1, ..., kn-1/rn-1]
The hashing collision at index 0 causes the chain to be created
14 URL routing continued
H1 and Simple hashing algorithms, based on [37]
C1. Create lists: For i ← 0 to m-1, set LIST[i] ← NULL.
C2. Hash: Set i ← h(KEY), j ← 0.
C3. Is there a list? If LIST[i] = NULL, go to C6.
C4. Compare: If KEY = LIST[i][j], terminate.
C5. Advance to next: If LIST[i][j] ≠ NULL, set j ← j+1 and go to step C4.
C6. Insert new key: Set LIST[i][j] ← KEY.
C4A. Compare and transpose (H1 hashing, replaces C4): If KEY = LIST[i][j] and j ≠ 0, swap LIST[i][j] with LIST[i][j-1] and terminate. Else, if KEY = LIST[i][j], terminate.
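Steps C1 to C6, with C4 optionally replaced by C4A, can be sketched in C as follows. This is an illustrative sketch, not the code used in the evaluation: chains are linked lists rather than arrays, the table size M_CHAINS and the FNV-1a hash h are hypothetical choices, and keys are stored by pointer, so the caller must keep them alive.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define M_CHAINS 8              /* hypothetical m: number of hash chains */

typedef struct node {
    const char *key;            /* caller keeps the key string alive */
    struct node *next;
} node;

static node *list[M_CHAINS];    /* C1: static storage starts as NULL */

/* Hypothetical h(KEY): 32-bit FNV-1a reduced modulo m. */
static unsigned h(const char *key)
{
    uint32_t x = 2166136261u;
    for (const unsigned char *p = (const unsigned char *)key; *p; p++) {
        x ^= *p;
        x *= 16777619u;
    }
    return (unsigned)(x % M_CHAINS);
}

/* C2-C6. transpose == 0 gives Simple hashing (C4);
   transpose != 0 gives H1 hashing (C4A).
   Returns the position j at which KEY was found, or -1 if it was inserted. */
static int lookup(const char *key, int transpose)
{
    unsigned i = h(key);                                  /* C2 */
    node *prev = NULL;
    int j = 0;
    /* C3/C5: walk the chain; an empty chain falls straight through to C6 */
    for (node *p = list[i]; p != NULL; prev = p, p = p->next, j++) {
        if (strcmp(p->key, key) == 0) {                   /* C4 / C4A */
            if (transpose && prev != NULL) {              /* j != 0 */
                const char *tmp = prev->key;              /* swap with LIST[i][j-1] */
                prev->key = p->key;
                p->key = tmp;
            }
            return j;
        }
    }
    node *n = malloc(sizeof *n);                          /* C6: append KEY */
    assert(n != NULL);
    n->key = key;
    n->next = NULL;
    if (prev == NULL)
        list[i] = n;
    else
        prev->next = n;
    return -1;
}
```

Each successful H1 look-up moves a key one position toward the head of its chain, so a key found at position j reaches the head after j look-ups; Simple hashing never reorders the chain.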
15 URL routing continued
My contributions in digesting and hashing (and their evaluation) begin here
16 Topics
- Motivation
- Problem and contributions
- URL routing
- Improvements to URL routing
- Evaluation of URL signatures
- Evaluation of hashing for URL routing
- Summary
- List of my publications
17 Improvements to URL routing
- Open problems
  - Select the best source based on state (and location of client)
  - Reduce the size of the routing table to update/share (my problem)
  - Perform fast routing look-ups (my problem)
18 Improvements to URL routing continued
- My idea
  - Use CRC32 for URL signatures
  - CRC32 circuitry is already part of an Ethernet adapter
    - Serial shift register with wrapped XOR terms
  - Use it to get CRC32 signatures for the URL in an HTTP request header
  - Need to calculate a CRC32 over a subfield [53]
    - The subfield is the URL in an HTTP request header
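In software, a CRC32 signature for a URL can be computed one byte at a time with a 256-entry table. The sketch below assumes the common reflected form of the Ethernet CRC32 (polynomial 0x04C11DB7 bit-reversed to 0xEDB88320, preset to all ones and inverted at the end, as in zlib); the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t crc_table[256];

/* Build the table for the reflected IEEE 802.3 polynomial 0xEDB88320. */
static void crc32_init(void)
{
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t c = n;
        for (int k = 0; k < 8; k++)
            c = (c & 1u) ? (c >> 1) ^ 0xEDB88320u : c >> 1;
        crc_table[n] = c;
    }
}

/* CRC32 signature of a URL: one table look-up per byte (8 bits). */
static uint32_t crc32_url(const char *url, size_t len)
{
    assert(url != NULL || len == 0);
    uint32_t c = 0xFFFFFFFFu;                   /* preset to all ones */
    for (size_t i = 0; i < len; i++)
        c = crc_table[(c ^ (unsigned char)url[i]) & 0xFFu] ^ (c >> 8);
    return c ^ 0xFFFFFFFFu;                     /* final inversion */
}
```

After crc32_init(), crc32_url("123456789", 9) returns the standard CRC32 check value 0xCBF43926, so the 32-bit signature can replace the full URL string in a routing table.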
19 Improvements to URL routing continued
- Define the following:
  - P is the CRC32 generator polynomial
  - Ai, i = 1, …, m, is a polynomial (bit sequence)
  - We store in a table, for all possible M, the remainders [equation], where M is the length of the subfield
[Figure: Packet layout: packet header, subfield, and rest of packet, with segments labeled A0, A2, and A1]
20 Improvements to URL routing continued
We have the following:
- [equation] Returned by the adapter, from the CRC32 shift register
- [equation] What we want (the CRC32 for the subfield)
21 Improvements to URL routing continued
The following properties apply: [equations]
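The equations on these slides did not survive extraction. The LaTeX below is a sketch of the standard remainder algebra they most likely rely on, under stated assumptions: R(A) denotes the remainder of A(x) modulo P(x) with coefficients in GF(2), so + is bitwise XOR, and A1 is read from the figure as the header A0 followed by the M-bit subfield A2.

```latex
\begin{aligned}
R(A + B) &= R(A) + R(B) \\
R(A\,B) &= R\bigl(R(A)\,R(B)\bigr) \\
A_1 &= A_0\,x^{M} + A_2 \\
R(A_2) &= R(A_1) + R\bigl(R(A_0)\,R(x^{M})\bigr)
\end{aligned}
```

Under these assumptions, the last line needs only R(A0) and R(A1) from the shift register plus a stored R(x^M), which is why a table indexed by M suffices.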
22 Improvements to URL routing continued
- Solve for R(A2) as follows:
  - Let A3 be A0 shifted left M bits.
  - Then [equation]
  - and [equation]
  - The computation requires a 32-bit multiply.
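The steps above can be sketched in C. Assumptions (not stated on the slides): the register holds the plain remainder A(x)·x^32 mod P, shifted in MSB first from a zero preset with no final inversion, and A1 is the header A0 followed by the M-bit subfield A2. The real Ethernet CRC32 is bit-reflected, preset to all ones, and inverted, so working from actual adapter registers would need matching corrections. R(x^M) is computed on demand instead of from a table, and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POLY 0x04C11DB7u        /* CRC32 generator P(x); x^32 term implied */

/* Multiply a 32-bit remainder by x and reduce modulo P. */
static uint32_t times_x(uint32_t r)
{
    return (r & 0x80000000u) ? (r << 1) ^ POLY : r << 1;
}

/* Plain remainder A(x)*x^32 mod P, MSB first, zero preset, no inversion.
   'state' continues from an earlier prefix, as a shift register does. */
static uint32_t crc_raw(uint32_t state, const uint8_t *data, size_t len)
{
    assert(data != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        state ^= (uint32_t)data[i] << 24;
        for (int b = 0; b < 8; b++)
            state = times_x(state);
    }
    return state;
}

/* R(x^M) for an M-bit shift (the slides precompute these in a table). */
static uint32_t x_pow_mod(size_t m_bits)
{
    uint32_t r = 1u;                            /* the polynomial x^0 */
    for (size_t i = 0; i < m_bits; i++)
        r = times_x(r);
    return r;
}

/* a(x)*b(x) mod P: the "32-bit multiply" (carry-less, Horner form). */
static uint32_t mul_mod(uint32_t a, uint32_t b)
{
    uint32_t r = 0;
    for (int i = 31; i >= 0; i--) {
        r = times_x(r);
        if ((b >> i) & 1u)
            r ^= a;
    }
    return r;
}

/* R(A2) from R(A0) (after the header) and R(A1) (after header + subfield). */
static uint32_t subfield_crc(uint32_t r_a0, uint32_t r_a1, size_t m_bits)
{
    uint32_t r_a3 = mul_mod(r_a0, x_pow_mod(m_bits));  /* R(A3), A3 = A0 << M */
    return r_a1 ^ r_a3;                                /* R(A2) = R(A1) + R(A3) */
}
```

With the R(x^M) values precomputed, recovering R(A2) costs one table look-up, one 32-bit multiply modulo P, and one XOR, whatever the header length.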
23 Improvements to URL routing continued
- My idea
  - Aggressive hashing to perform fast look-up
  - Self-adjusting chained collision resolution
    - A fast way to do hash table look-ups
    - Based on the move-to-front rule for lists [33, 50]
24 Improvements to URL routing continued
- The new Aggressive hashing algorithm
C1. Create lists: For i ← 0 to m-1, set LIST[i] ← NULL.
C2. Hash: Set i ← h(KEY), j ← 0.
C3. Is there a list? If LIST[i] = NULL, go to C6.
C4. Compare: If KEY = LIST[i][j], terminate.
C5. Advance to next: If LIST[i][j] ≠ NULL, set j ← j+1 and go to step C4.
C6. Insert new key: Set LIST[i][j] ← KEY.
C4B. Compare and move-to-front (new; Aggressive hashing, replaces C4): If KEY = LIST[i][j] and j ≠ 0, set TEMP ← LIST[i][j]; for k ← j down to 1, set LIST[i][k] ← LIST[i][k-1]; set LIST[i][0] ← TEMP; terminate. Else, if KEY = LIST[i][j], terminate.
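Step C4B can be sketched in the same style. With a linked list, moving the found key to the front is a constant-time relink rather than the shift of LIST[i][0..j-1] that the array form needs. As in the H1 sketch, M_CHAINS, h, and the linked-list layout are hypothetical choices for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define M_CHAINS 8              /* hypothetical m: number of hash chains */

typedef struct node {
    const char *key;            /* caller keeps the key string alive */
    struct node *next;
} node;

static node *list[M_CHAINS];    /* C1: static storage starts as NULL */

/* Hypothetical h(KEY): 32-bit FNV-1a reduced modulo m. */
static unsigned h(const char *key)
{
    uint32_t x = 2166136261u;
    for (const unsigned char *p = (const unsigned char *)key; *p; p++) {
        x ^= *p;
        x *= 16777619u;
    }
    return (unsigned)(x % M_CHAINS);
}

/* Aggressive hashing: C2-C6 with C4B (move-to-front).
   Returns the position j at which KEY was found, or -1 if it was inserted. */
static int lookup_mtf(const char *key)
{
    unsigned i = h(key);                                  /* C2 */
    node *prev = NULL;
    int j = 0;
    for (node *p = list[i]; p != NULL; prev = p, p = p->next, j++) {
        if (strcmp(p->key, key) == 0) {                   /* C4B */
            if (prev != NULL) {                           /* j != 0 */
                prev->next = p->next;                     /* unlink LIST[i][j] */
                p->next = list[i];                        /* relink at the head */
                list[i] = p;
            }
            return j;
        }
    }
    node *n = malloc(sizeof *n);                          /* C6: append KEY */
    assert(n != NULL);
    n->key = key;
    n->next = NULL;
    if (prev == NULL)
        list[i] = n;
    else
        prev->next = n;
    return -1;
}
```

Unlike transposition, a single successful look-up puts the key at the head of its chain, so the next look-up of the same key costs one comparison.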
25 Topics
- Motivation
- Problem and contributions
- URL routing
- Improvements to URL routing
- Evaluation of URL signatures
- Evaluation of hashing for URL routing
- Summary
- List of my publications
26 Evaluation of URL signatures
- Evaluation done with trace-driven simulation
- Response variables
  1) Probability of false hits due to signature collisions
  2) CPU time required to generate URL signatures
  3) Reduction in processing and memory resources for URL look-up
27 Evaluation of URL signatures continued
- Input data used in the evaluation
  - Obtained lists of URLs from 9 cache and server HTTP logs
    - Access lists
    - URL lists
    - CRC32 lists
  - Unique URLs range from 70 to 2.5 million (1.5 to 146 MBytes)
  - Logs spanned periods of months
  - Full URL string lists or CRC32 signature lists were built (generated by me)
  - 2.1 GBytes of ASCII-format raw data was used
28 Evaluation of URL signatures continued
Input data characteristics
29 Evaluation of URL signatures continued
- Experiments on the performance of CRC32
  - Experiment 1: Measured the number of CRC collisions
    - CRC32 generated for each URL
    - Non-unique CRC32s counted
  - Experiment 2: Measured the CPU time to generate a CRC32 URL list
    - Software CRC generation (8-bit look-up, coded in C)
  - Experiment 3: Measured the CPU time required for look-up
    - All entries from the access list were looked up in the URL list
    - The URL list is a Simple chained hash table
30 Evaluation of URL signatures continued
Results for experiment 1
Measured and theoretical collision counts are close
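The slide does not state which theoretical model was used. One common estimate, assuming CRC32 values behave like uniformly random 32-bit numbers, is the birthday approximation for the expected number of colliding pairs among n distinct URLs:

```latex
E[\text{colliding pairs}] \approx \binom{n}{2}\,\frac{1}{2^{32}} = \frac{n(n-1)}{2^{33}}
```

For the largest list, n = 2.5 million, this gives roughly 6.25 × 10^12 / 8.59 × 10^9, or about 728 colliding pairs.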
31 Evaluation of URL signatures continued
Results for experiment 2
Time per URL string is small (on the order of µsec)
32 Evaluation of URL signatures continued
Results for experiment 3
[Figure: Look-up time (sec), 0 to 0.6, vs. H value, 10 to 22, for full URLs and CRC32 URL signatures]
CRC32 URL signatures are better
33 Evaluation of URL signatures continued
- Experiments for CRC32 vs. MD5-Bloom filter digesting
  - Experiment 1: Measured digest size and generation CPU time
    - MD5-Bloom filter
    - CRC32
    - 32-bit checksum
    - Lempel-Ziv (LZ) compression (used pkzip25)
  - Experiment 2: Measured digest size and CPU time
    - MD5-Bloom
  - Experiment 3: Measured collisions
    - Control variable is URL length
    - MD5-Bloom vs. CRC32
    - Maximum URL length is 25, 30, …, 80 bytes
34 Evaluation of URL signatures continued
- Experiments for CRC32 vs. MD5-Bloom filter digesting (continued)
  - Experiment 4: Measured digest size of the hash chain method
    - Based on the number of components
    - Tree structure of 32 bits for a <depth, hash code> pair
35 Evaluation of URL signatures continued
Results for experiments 1 and 2
Similar CRC32 and Bloom filter collisions
36 Evaluation of URL signatures continued
Results for experiment 3
[Figure: Collisions (%), 0.00 to 0.10, vs. URL length (bytes), 25 to 75, for MD5-Bloom and CRC32]
Collisions are the same for CRC32 and Bloom filter
37 Evaluation of URL signatures continued
- Results from experiment 4
  - Hash chaining results in digests that are on average 212% larger than CRC32 digests
  - Substantially larger than the other methods
38 Evaluation of URL signatures continued
- Discussion of results
  - CRC32 URL signatures reduce the size of URL lists and speed up look-up in a hash table
    - Require less network bandwidth to transfer
    - Require less memory for storage in the URL router
  - For CRC32, the number of collisions was found to be small
  - CRC32 digests require less CPU and produce the same number of collisions as MD5-Bloom filters
39 Topics
- Motivation
- Problem and contributions
- URL routing
- Improvements to URL routing
- Evaluation of URL signatures
- Evaluation of hashing for URL routing
- Summary
- List of my publications
40 Evaluation of hashing for URL routing continued
- Look-up time experiments
  - Experiment 1: Effect of hash table size on look-up time (NASA access list)
  - Experiment 2: Effect of hash table size (in K) on look-up time (Clark.net access list)
41 Evaluation of hashing for URL routing continued
Hash table look-up time for experiment 1
[Figure: Mean look-up time, 0 to 60, vs. hash table size (K), 8 to 13, for Simple, H1, and Aggressive hashing]
For dense hash tables, Aggressive is better than H1
42 Evaluation of hashing for URL routing continued
Hash table look-up time for experiment 2
[Figure: Mean look-up time, 0 to 40, vs. K, 8 to 13, for Simple, H1, and Aggressive hashing]
Similar to the results of experiment 1
43 Evaluation of hashing for URL routing continued
- Evaluation model (single-server queue)
- Response variables
  - Mean queueing delay
  - Drop in utilization
[Figure: Single-server queue: arrivals are URLs to be looked up, queued URLs wait, and the server is a hash table look-up]
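The queueing model above can be sketched with the Lindley recursion for a single-server FIFO queue, where each URL's service time is its hash table look-up time. This is a toy illustration, not the simulator used in the evaluation; the function name and inputs are hypothetical. By Little's law, the mean number waiting equals the arrival rate times the mean wait, so the same reasoning carries over to queue length L.

```c
#include <assert.h>
#include <stddef.h>

/* Mean waiting time (excluding service) in a single-server FIFO queue,
   via the Lindley recursion W[k] = max(0, W[k-1] + S[k-1] - A[k]).
   service[k]:      look-up time of URL k (the server is the hash table)
   interarrival[k]: time between arrivals k-1 and k (interarrival[0] unused) */
static double mean_wait(const double *service, const double *interarrival,
                        size_t n)
{
    assert(n > 0);
    double w = 0.0, total = 0.0;      /* the first URL finds an empty queue */
    for (size_t k = 1; k < n; k++) {
        w = w + service[k - 1] - interarrival[k];
        if (w < 0.0)
            w = 0.0;                  /* the server went idle before arrival k */
        total += w;
    }
    return total / (double)n;
}
```

With one arrival per time unit and look-up times {1.5, 1.5, 0, 0} (utilization 75%), the mean wait is 0.375; reordering the same times as {1.5, 0, 1.5, 0} gives 0.25. Clustered slow look-ups lengthen the queue even though the mean look-up time is unchanged.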
44 Evaluation of hashing for URL routing continued
- Mean queue length experiments
  - Experiment 1: Effect of hash table size (K) on queue length (L) for utilization U = 80% (Simple chain) and exponential arrivals
  - Experiment 2: Effect of burstiness (Tmax) on L for U = 80% (Simple chain) and K = 8
  - Experiment 3: Effect of Tmax on L for U = 80% and K = 8
  - Experiment 4: Effect of autocorrelation (unshuffled and shuffled ordering of requests) on L for U = 80% and K = 8
  - Experiment 5: Effect of autocorrelation (unshuffled and shuffled ordering of requests) on L for U = 80% (Simple chain) and K = 8
45 Evaluation of hashing for URL routing continued
Results for experiment 1
[Figure: Mean queue length L, 0 to 6, vs. K, 8 to 13, for Simple, H1, and Aggressive hashing]
Self-adjusting methods show similar performance
46 Evaluation of hashing for URL routing continued
Results for experiment 2
[Figure: L, 0 to 40, vs. Tmax, 50 to 1000, for H1 and Aggressive hashing; the Simple hashing value range is 5500 to 34000]
H1 shows a faster increase in L
47 Evaluation of hashing for URL routing continued
Results for experiment 3
[Figure: L, 0 to 120K, vs. Tmax, 50 to 1000, for H1, Aggressive, and Simple hashing]
H1 has queue lengths orders of magnitude worse
48 Evaluation of hashing for URL routing continued
Results for experiment 4
H1 has queue lengths orders of magnitude worse
49 Evaluation of hashing for URL routing continued
Results for experiment 5
50 Evaluation of hashing for URL routing continued
- Discussion of results
  - Aggressive hashing improves upon H1 hashing
    - Modest look-up time improvement
    - Significant improvement from a queueing perspective
  - Queueing must be used for evaluating hashing algorithms
    - LRD (long-range dependence) in the look-up time of H1 results in extreme queueing delay
    - Catastrophic effects on any application
51 Topics
- Motivation
- Problem and contributions
- URL routing
- Improvements to URL routing
- Evaluation of URL signatures
- Evaluation of hashing for URL routing
- Summary
- List of my publications
52 Summary
- In summary, I have addressed the problem of
  - Excessive delay in the Internet caused by the inability to efficiently access distributed content in the Web
- My work has shown that
  1) A URL router that uses HTTP redirection is feasible
  2) CRC32 can be used for digesting of URL routing tables
  3) Aggressive hashing improves upon existing hashing algorithms for fast look-up
  4) Queueing behavior needs to be considered when evaluating hashing algorithms
- Four publications have resulted
53 List of my related publications
- Z. Genova and K. Christensen, "Managing Routing Tables for URL Routers in Content Distribution Networks," submitted to the International Journal of Network Management, June 2003.
- Z. Genova and K. Christensen, "Efficient Summarization of URLs using CRC32 for Implementing URL Switching," Proceedings of the 27th IEEE Conference on Local Computer Networks (LCN), pp. 343-344, November 2002.
- Z. Genova and K. Christensen, "Using Signatures to Improve URL Routing," Proceedings of the IEEE International Performance, Computing, and Communications Conference, pp. 45-52, April 2002.
- Z. Genova and K. Christensen, "Challenges in URL Switching for Implementing Globally Distributed Web Sites," Proceedings of the Workshop on Scalable Web Services, pp. 89-94, August 2000.