Title: Caching and Content Distribution Networks
Web Caching
- As an example, we use the web to illustrate
caching and other related issues
Web Browser Caching
- Web browsers have their own caches. When a page is downloaded from a site, it is stored in the browser cache.
- This is especially useful when the back button is pressed.
- If a fresh copy is needed, the page can be refreshed.
- No page stays in the cache permanently, because space is limited.
- A replacement algorithm is needed to determine which cached page should be purged.
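The slide does not name a replacement algorithm. As one common choice, the sketch below assumes least-recently-used (LRU) eviction over a fixed number of pages; the class name `LRUPageCache` and the capacity are illustrative, not taken from any real browser.

```python
from collections import OrderedDict
from typing import Optional


class LRUPageCache:
    """Fixed-size page cache that purges the least recently used page."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.pages: "OrderedDict[str, str]" = OrderedDict()  # url -> page, oldest first

    def get(self, url: str) -> Optional[str]:
        if url not in self.pages:
            return None  # miss: the page must be downloaded again
        self.pages.move_to_end(url)  # hit (e.g. back button): now most recently used
        return self.pages[url]

    def put(self, url: str, page: str) -> Optional[str]:
        """Store a page; return the URL that was purged, if any."""
        self.pages[url] = page
        self.pages.move_to_end(url)
        if len(self.pages) > self.capacity:
            purged, _ = self.pages.popitem(last=False)  # oldest entry goes first
            return purged
        return None
```

With a capacity of two, visiting `/a`, `/b`, going back to `/a`, and then loading `/c` purges `/b`, because `/a` was used more recently.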
Why Web Server Caching
- Reduce latency
  - The request does not need to go to the server
  - The request is served from the client side, so network communication is avoided
- Reduce traffic
Consistency
- What if the page changes after it has been saved in the cache?
- Then the cached copy is out of date: the copy and the original are no longer consistent.
- There are different strategies for dealing with this.
Web Browser Caching
- Client pull
  - The server provides the content along with instructions on whether it may be cached and when the client should ask for a refreshed copy.
- Server push
  - The server transmits page information to the browser.
  - The browser displays the information and leaves the connection to the server open.
  - With an open connection, the server can keep pushing updated pages for the browser to display on an ongoing basis. You can close the connection by closing the page.
  - The server is in control.
- Browser caches are different from proxy caches (discussed next).
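For client pull, HTTP carries the server's instructions in the `Cache-Control` response header. A minimal sketch, assuming only the `no-store` and `max-age` directives matter (real caches honour many more); `cache_policy` and `is_fresh` are hypothetical helper names.

```python
from typing import Optional, Tuple


def cache_policy(cache_control: str) -> Tuple[bool, Optional[int]]:
    """Return (cacheable, max_age_seconds) parsed from a Cache-Control value."""
    cacheable = True
    max_age: Optional[int] = None
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        name = name.lower()
        if name == "no-store":
            cacheable = False  # server says: never keep a copy
        elif name == "max-age" and value.isdigit():
            max_age = int(value)  # server says: copy is fresh for this many seconds
    return cacheable, max_age


def is_fresh(stored_at: float, max_age: Optional[int], now: float) -> bool:
    """A copy is fresh while its age is below max-age; without max-age, ask again."""
    return max_age is not None and now - stored_at < max_age
```

Treating a missing `max-age` as "not fresh" is a conservative choice for this sketch; it makes the client pull a new copy rather than guess a lifetime.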
Web Caching
- Proxy caches (also called proxy servers)
  - Intercept HTTP requests from clients
  - Serve an object if it is in the cache and its date is still valid
  - If not, go to the object's home server
  - On behalf of the user, fetch the object and possibly store it in the cache before returning it to the user
- Usually deployed at the edges of a network
- Wide-area bandwidth savings, improved response time, and increased availability of static web objects
- A browser may have to be configured to point to the proxy server.
- A proxy cache is usually purchased and installed by an organization.
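The proxy's request handling can be sketched as follows. This is a simplification under stated assumptions: `fetch_from_origin` is a hypothetical callback standing in for the HTTP request to the object's home server and returns the body together with an expiry time, and the clock is injectable so the behaviour can be tested.

```python
import time
from typing import Callable, Dict, Tuple

Fetch = Callable[[str], Tuple[bytes, float]]  # url -> (body, expires_at)


class ProxyCache:
    def __init__(self, fetch_from_origin: Fetch, clock: Callable[[], float] = time.time) -> None:
        self.fetch_from_origin = fetch_from_origin  # hypothetical origin fetcher
        self.clock = clock
        self.store: Dict[str, Tuple[bytes, float]] = {}
        self.hits = 0
        self.misses = 0

    def handle_request(self, url: str) -> bytes:
        entry = self.store.get(url)
        if entry is not None and self.clock() < entry[1]:
            self.hits += 1
            return entry[0]  # in the cache and still valid: serve locally
        self.misses += 1
        body, expires_at = self.fetch_from_origin(url)  # go to the home server
        self.store[url] = (body, expires_at)  # deposit a copy (a real proxy may skip this)
        return body
```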
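Web Caching
- Not all web pages can be cached
- A page whose response carries a Last-Modified header can be cached, since the cache can later ask the server whether the page has changed
- A refresh is often done when
  - there is a request for the page, and
  - its expiry time has passed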
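The refresh rule above can be combined with `Last-Modified` through a conditional request: once the expiry time has passed, the cache sends `If-Modified-Since`, and the origin answers `304 Not Modified` if the page is unchanged. A minimal sketch, assuming cache entries are dicts with hypothetical keys `expires_at` and `last_modified` (epoch seconds).

```python
from email.utils import formatdate
from typing import Dict, Optional, Tuple


def plan_request(entry: Optional[dict], now: float) -> Tuple[str, Dict[str, str]]:
    """Decide what to do with a request for a page, given its cache entry."""
    if entry is None:
        return "fetch", {}  # nothing cached: plain GET to the server
    if now < entry["expires_at"]:
        return "serve_cached", {}  # expiry time not reached: no refresh
    if entry.get("last_modified") is not None:
        # Conditional GET: the server replies 304 Not Modified if unchanged.
        since = formatdate(entry["last_modified"], usegmt=True)
        return "revalidate", {"If-Modified-Since": since}
    return "fetch", {}  # expired and nothing to validate against
```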
Cooperative Caching
- A caching infrastructure can have multiple web proxies.
- Proxies can be arranged in a hierarchy or other structures.
- Proxies can cooperate with one another to
  - answer client requests
  - propagate server notifications
- Cooperation uses a combination of HTTP and ICP (Internet Cache Protocol).
  - ICP can be used by one cache to quickly ask another cache whether it has an object.
  - HTTP is used to actually retrieve the object.
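The query-then-fetch pattern can be illustrated with an in-memory simulation. This is not the ICP wire format (ICP runs over UDP); `icp_query` and `http_get` are hypothetical stand-ins that only mirror the roles of an ICP hit/miss reply and an HTTP transfer.

```python
from typing import Callable, Dict, List, Optional


class CacheNode:
    """One proxy in a cooperative group (simulation, not a network implementation)."""

    def __init__(self, name: str, peers: Optional[List["CacheNode"]] = None) -> None:
        self.name = name
        self.objects: Dict[str, bytes] = {}
        self.peers = list(peers) if peers else []

    def icp_query(self, url: str) -> str:
        # Stand-in for an ICP query: a cheap HIT/MISS answer, no object data.
        return "HIT" if url in self.objects else "MISS"

    def http_get(self, url: str) -> bytes:
        # Stand-in for the HTTP transfer of the object itself.
        return self.objects[url]

    def lookup(self, url: str, origin: Callable[[str], bytes]) -> bytes:
        if url in self.objects:
            return self.objects[url]
        for peer in self.peers:
            if peer.icp_query(url) == "HIT":  # ask first
                body = peer.http_get(url)  # then retrieve over HTTP
                break
        else:
            body = origin(url)  # no peer has it: go to the origin server
        self.objects[url] = body
        return body
```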
Problems
- Caching proxies do not serve all Internet users.
- Content providers (say, web servers) cannot rely on the existence and correct implementation of caching proxies.
- Caching proxies raise accounting issues.
  - Example: www.cnn.com needs to know the number of hits on the advertisements displayed on its web page, but requests answered by a proxy never reach its server.
Content Distribution Networks (CDN)
- Business model: a content provider such as www.cnn.com or Yahoo pays a CDN company (such as Akamai) to deliver its content to requesting users with short delays.
- A CDN provides mechanisms for
  - replicating content on multiple servers in the Internet
  - providing clients with a means to determine the servers that can deliver the content fastest.
Terminology
- Content: any publicly accessible combination of text, images, applets, frames, MP3, video, Flash, virtual reality objects, etc.
- Content provider: any individual, organization, or company that has content it wishes to make available to users.
- Origin server: the content provider's server, where the content is first uploaded.
- Surrogate server (sometimes called an edge server): the content distributor's server, where the replicated content is kept.
Players
[Figure: content provider (e.g., Yahoo, MSNBC, CNN, CBC) sends content to a content distributor (e.g., Akamai); an H/W and S/W vendor sells servers; a hosting provider (e.g., Bell) installs servers]
CDN Distribution
- Content providers are CDN customers.
- Content replication
  - The CDN company installs thousands of servers throughout the Internet, either in large data centers or close to users.
  - The CDN replicates its customers' content.
  - When a provider updates its content, the CDN updates its servers.
[Figure: an origin server in North America sends content to a CDN distribution node, which replicates it to CDN servers in South America, Europe, and Asia]
CDN Functional Components
- Distribution Service
- Redirection Service
- Accounting and Billing system
CDN Distribution Service
- The content provider determines which of its objects it wants the CDN to distribute.
- The content provider tags this content and pushes it to a CDN node, which in turn replicates and pushes the content to all its CDN servers.
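A minimal sketch of the push-based distribution service, assuming in-memory servers; `SurrogateServer`, `DistributionNode`, and `publish` are hypothetical names. Re-publishing a tagged object overwrites every replica, which is how an update from the provider reaches all servers in this model.

```python
from typing import Dict, List, Set


class SurrogateServer:
    def __init__(self, name: str) -> None:
        self.name = name
        self.content: Dict[str, bytes] = {}  # path -> replicated object


class DistributionNode:
    """CDN node that receives tagged objects and replicates them to every surrogate."""

    def __init__(self, surrogates: List[SurrogateServer]) -> None:
        self.surrogates = surrogates

    def push(self, objects: Dict[str, bytes]) -> None:
        for server in self.surrogates:
            server.content.update(objects)  # each object goes to each server


def publish(site: Dict[str, bytes], tagged: Set[str], node: DistributionNode) -> None:
    """Content provider side: push only the objects tagged for CDN distribution."""
    node.push({path: body for path, body in site.items() if path in tagged})
```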
CDN Redirection
- When a browser in a user's host is instructed to retrieve a specific object (specified by a URL), how does the browser determine whether it should retrieve the object from the origin server or from one of the CDN servers?
- As an example, suppose the hostname of the content provider is www.cnn.com.
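How Akamai Works
[Figure: (1) the browser sends GET index.html over HTTP to cnn.com (content provider); (2) the returned page refers to replicated objects through Akamai URLs such as http://a73.g.akamai.net/7/23/cnn.com/af/foo.jpg. Also shown: DNS root server, Akamai global DNS server, Akamai regional DNS server, Akamai cluster, nearby Akamai cluster]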
CDN Redirection
- The user gets an HTML document from www.cnn.com; this could be index.html.
- The file index.html uses modified URLs for content that has been replicated.
- Example: if the JPEG files are what has been replicated, then
  <img src="http://cnn.com/af/foo.jpg">
  may be modified as follows:
  <img src="http://a73.g.akamai.net/7/23/cnn.com/af/foo.jpg">
- The browser then needs to resolve the hostname a73.g.akamai.net to retrieve the replicated content.
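The URL rewriting can be sketched as a text substitution over the page. The prefix `http://a73.g.akamai.net/7/23/` comes from the example above; the regular expression is a simplification that assumes double-quoted `src` attributes and rewrites only JPEG URLs, whereas a real tool would use an HTML parser. `rewrite_for_cdn` is a hypothetical name.

```python
import re

# Host and Akamai control part taken from the example above.
CDN_PREFIX = "http://a73.g.akamai.net/7/23/"
SRC_ATTR = re.compile(r'src="([^"]*)"')


def rewrite_for_cdn(html: str, extensions=(".jpg", ".jpeg")) -> str:
    """Point src URLs of replicated objects (here: JPEGs) at the CDN host."""

    def rewrite(match):
        url = match.group(1)
        if url.startswith("http://") and url.lower().endswith(extensions):
            # http://cnn.com/af/foo.jpg -> http://a73.g.akamai.net/7/23/cnn.com/af/foo.jpg
            url = CDN_PREFIX + url[len("http://"):]
        return 'src="' + url + '"'

    return SRC_ATTR.sub(rewrite, html)
```

Objects that were not replicated, such as scripts in this sketch, keep their original URLs and are still fetched from the origin server.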
CDN Redirection
- What does this mean?
  <img src="http://a73.g.akamai.net/7/23/cnn.com/af/foo.jpg">
- Host part: a73.g.akamai.net
- Akamai control part: /7/23
- Content URL: /af/foo.jpg, on the original host cnn.com
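Splitting the example URL with Python's `urllib.parse` shows the three parts. The sketch assumes the layout of this one example (exactly two control segments after the host), so the remainder is the original host cnn.com followed by the content path /af/foo.jpg; `split_akamai_url` is a hypothetical helper.

```python
from typing import Tuple
from urllib.parse import urlsplit


def split_akamai_url(url: str) -> Tuple[str, str, str]:
    """Split into (host part, control part, original host + content path)."""
    parts = urlsplit(url)
    segments = parts.path.lstrip("/").split("/")
    control = "/" + "/".join(segments[:2])  # assumed: two control segments
    content = "/".join(segments[2:])
    return parts.hostname or "", control, content
```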
CDN Redirection
- DNS is configured so that all queries about g.akamai.net that arrive at a DNS server are sent to an authoritative DNS server for g.akamai.net.
- This is referred to as an Akamai DNS server (an authoritative DNS server).
How Akamai Works
[Figure: a DNS lookup for cache.cnn.com (steps 1-4) returns ALIAS g.akamai.net. Also shown: cnn.com (content provider), DNS root server, Akamai global DNS server, Akamai regional DNS server, Akamai cluster, nearby Akamai cluster]
CDN Redirection
- When the Akamai DNS server receives the query, it extracts the IP address of the requester. Strictly, this is usually the address of the user's local DNS server, which is assumed to be close to the browser.
How Akamai Works
[Figure: a DNS lookup for g.akamai.net (steps 5-6) returns ALIAS a73.g.akamai.net. Also shown: Akamai global DNS server, Akamai regional DNS server]
CDN Redirection
- Based on the requester's IP address and the information it has about the Internet (called a map), the Akamai DNS server returns the IP address of an Akamai regional server, chosen according to a policy, e.g., select the server that is the fewest hops away.
- The regional server may choose a surrogate server for content retrieval.
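A minimal sketch of the fewest-hops policy. The map contents are invented for illustration: the client prefixes are documentation address ranges, 1.2.3.4 is the address used in the figures, and 5.6.7.8 and all hop counts are hypothetical. Real maps are built from live measurements.

```python
import ipaddress
from typing import Dict

# Hypothetical map: for each client network, hop count to each candidate server.
NETWORK_MAP: Dict[str, Dict[str, int]] = {
    "203.0.113.0/24": {"1.2.3.4": 3, "5.6.7.8": 9},
    "198.51.100.0/24": {"1.2.3.4": 12, "5.6.7.8": 4},
}


def select_server(requester_ip: str, network_map: Dict[str, Dict[str, int]] = NETWORK_MAP,
                  default: str = "1.2.3.4") -> str:
    addr = ipaddress.ip_address(requester_ip)
    for prefix, hops in network_map.items():
        if addr in ipaddress.ip_network(prefix):
            return min(hops, key=hops.get)  # policy: fewest hops away
    return default  # requester not in the map: fall back to a default server
```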
How Akamai Works
[Figure: a DNS lookup for a73.g.akamai.net at the Akamai regional DNS server (step 7) returns Address 1.2.3.4 (step 8). Also shown: nearby Akamai cluster]
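The alias chain in these figures can be simulated with a small record table. This collapses the separate servers (cnn.com's DNS, the Akamai global and regional DNS servers) into one dictionary and uses the names and the address 1.2.3.4 shown in the figures; the `resolve` helper is hypothetical and ignores caching and referrals.

```python
from typing import Dict, List, Tuple

# Each name maps to ("ALIAS", target name) or ("A", address).
RECORDS: Dict[str, Tuple[str, str]] = {
    "cache.cnn.com": ("ALIAS", "g.akamai.net"),
    "g.akamai.net": ("ALIAS", "a73.g.akamai.net"),
    "a73.g.akamai.net": ("A", "1.2.3.4"),
}


def resolve(name: str, records: Dict[str, Tuple[str, str]] = RECORDS,
            max_hops: int = 8) -> Tuple[str, List[str]]:
    """Follow alias (CNAME-style) records until an address is found."""
    chain = [name]
    for _ in range(max_hops):
        kind, value = records[name]
        if kind == "A":
            return value, chain  # address of the surrogate to contact
        name = value
        chain.append(name)
    raise RuntimeError("alias chain too long")
```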
How Akamai Works
[Figure: step 9: the browser sends GET /foo.jpg with Host: cache.cnn.com over HTTP to the nearby Akamai cluster]
How Akamai Works
[Figure: after step 9 (GET /foo.jpg, Host: cache.cnn.com), steps 11-12 show GET foo.jpg sent to cnn.com (content provider)]
CDN Redirection
- The Akamai DNS server's IP address is now in the cache of the local DNS server.
  - This implies that it is not always necessary to go to the root DNS server.
- The TTL associated with the IP address of an Akamai server (surrogate) is relatively small.
  - This is done for performance reasons: a short TTL makes the local DNS server ask again soon, so Akamai can redirect users to a different server as load and network conditions change.
- Akamai content distribution servers are caches.
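The effect of TTLs on the local DNS server's cache can be sketched as follows. `LocalDnsCache` is a hypothetical class; times are passed in explicitly so the expiry behaviour is easy to follow.

```python
from typing import Dict, Optional, Tuple


class LocalDnsCache:
    """Local DNS server cache: each answer is kept only for its TTL (seconds)."""

    def __init__(self) -> None:
        self.entries: Dict[str, Tuple[str, float]] = {}  # name -> (address, expires_at)

    def put(self, name: str, address: str, ttl: float, now: float) -> None:
        self.entries[name] = (address, now + ttl)

    def get(self, name: str, now: float) -> Optional[str]:
        entry = self.entries.get(name)
        if entry is None or now >= entry[1]:
            self.entries.pop(name, None)  # expired: must query the authoritative server
            return None
        return entry[0]
```

For example, with an assumed TTL of 3600 s for the Akamai DNS server's address (a hypothetical 192.0.2.53) and an assumed 20 s for a surrogate's address (1.2.3.4), a lookup 25 seconds later still finds the DNS server's address, so the root server is skipped, but the surrogate's address must be requested again, giving Akamai a chance to pick a different server.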
CDN Redirection
- What if the content is not there?
- If the requested content is not found, the surrogate asks other surrogates within a specified region for it.
- If the requested content is still not found, or is stale, a request is made to the origin web site.
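The lookup order (local copy, regional peers, origin) can be sketched as below. Assumptions: each stored object carries an expiry time, "stale" means past that time, `origin` is a hypothetical callback for the request to the origin web site, and the 60-second lifetime for objects fetched from the origin is arbitrary.

```python
from typing import Callable, Dict, List, Optional, Tuple

Entry = Tuple[bytes, float]  # (body, expires_at)


class Surrogate:
    def __init__(self, name: str) -> None:
        self.name = name
        self.store: Dict[str, Entry] = {}

    def fresh_entry(self, url: str, now: float) -> Optional[Entry]:
        entry = self.store.get(url)
        return entry if entry is not None and now < entry[1] else None


def serve(url: str, local: Surrogate, region: List[Surrogate],
          origin: Callable[[str], bytes], now: float, ttl: float = 60.0) -> Tuple[bytes, str]:
    """Return (body, where it came from): local cache, a regional peer, or the origin."""
    entry = local.fresh_entry(url, now)
    if entry is not None:
        return entry[0], local.name
    for peer in region:
        if peer is local:
            continue
        entry = peer.fresh_entry(url, now)
        if entry is not None:
            local.store[url] = entry  # keep the peer's copy with its expiry time
            return entry[0], peer.name
    body = origin(url)  # not found, or stale everywhere: ask the origin web site
    local.store[url] = (body, now + ttl)
    return body, "origin"
```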
CDN Selection
- The tricky issue is selecting which local content server to use for a particular request.
  - We want to spread the load evenly.
  - We want minimal impact when a server is added or removed.
- In Akamai, each surrogate server sends measurement results to the Network Operations Command Center (NOCC).
- Measurement results include the number of active TCP connections, the HTTP request arrival rate, available bandwidth, etc.
- This information is used by the Akamai DNS server.
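The slide does not say how Akamai meets the "minimal impact" goal. Consistent hashing is a standard technique with exactly that property, sketched here as an assumption rather than as Akamai's algorithm: each server is placed at many points on a hash ring, and a URL goes to the first server point at or after its own hash. MD5 is used only to spread keys, not for security; `ConsistentHashRing` and `points_per_server` are hypothetical names.

```python
import bisect
import hashlib
from typing import Dict, Iterable, List


def ring_position(key: str) -> int:
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    def __init__(self, servers: Iterable[str], points_per_server: int = 100) -> None:
        self.points_per_server = points_per_server  # more points = more even load
        self.points: List[int] = []  # sorted ring positions
        self.owner: Dict[int, str] = {}  # ring position -> server
        for server in servers:
            self.add(server)

    def add(self, server: str) -> None:
        for i in range(self.points_per_server):
            pos = ring_position(f"{server}#{i}")
            self.owner[pos] = server
            bisect.insort(self.points, pos)

    def remove(self, server: str) -> None:
        self.points = [p for p in self.points if self.owner[p] != server]
        self.owner = {p: s for p, s in self.owner.items() if s != server}

    def lookup(self, url: str) -> str:
        # First server point at or after the URL's position, wrapping around.
        i = bisect.bisect_left(self.points, ring_position(url)) % len(self.points)
        return self.owner[self.points[i]]
```

Adding a server moves only the URLs that now land on the new server's points; every other URL keeps its server, and removing the new server restores the previous mapping exactly.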
Accounting Mechanism
- Accounting mechanisms collect and track information related to request routing, distribution, and delivery.
- Information is gathered in real time and written to log files for each CDN component.
- These logs are sent to the Network Operations Command Center (NOCC).
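A minimal sketch of turning delivery logs into per-customer totals for billing. The `LogRecord` fields are a hypothetical log format, not Akamai's; counting per content provider is what lets a customer such as www.cnn.com learn how many requests were served on its behalf.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class LogRecord(NamedTuple):  # hypothetical per-request log entry
    server: str  # surrogate that served the request
    customer: str  # content provider being billed
    url: str
    bytes_sent: int


def bill_summary(records: Iterable[LogRecord]) -> Dict[str, Dict[str, int]]:
    """Total requests and bytes per content provider across all servers' logs."""
    totals: Dict[str, Dict[str, int]] = defaultdict(lambda: {"requests": 0, "bytes": 0})
    for r in records:
        totals[r.customer]["requests"] += 1
        totals[r.customer]["bytes"] += r.bytes_sent
    return dict(totals)
```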
Full Site Delivery vs. Partial Site Delivery
- Full site delivery: all content is delivered by the CDN (including HTML, images, and other objects).
- Partial site delivery: only images, streaming media, and other bandwidth-intensive objects are delivered by the CDN.
Current Akamai Customers
Summary
- We have examined replication and issues related to the design and implementation of a replicated system.
- There are many choices and trade-offs to consider.