Internet Internals - PowerPoint PPT Presentation

About This Presentation
Title:

Internet Internals

Description:

www.ebay.com ... same first-level domain, must have a unique name (we can't have two www.ebay.com) ... http://news.bbc.co.uk/1/hi/technology/4482292.stm ... – PowerPoint PPT presentation

Slides: 30
Provided by: SBr54
Category:
Tags: co | ebay | ebaycouk | internals | internet | uk


Transcript and Presenter's Notes

Title: Internet Internals


1
Internet Internals
  • November 30, Unit 11

2
HTTP: How Files are Transferred
  • Back at the beginning of the course we went over
    how our browser requests and receives web pages
    from a server
  • Browser asks the server for a particular page,
    then the server sends the HTML
  • Your browser then realizes it needs an image,
    which it then requests from the server
  • The image is sent and the web page is complete

3
HTTP In More Detail
  • The previous example is very simplified
  • The conversation between your browser and the
    server must follow strict conventions
  • We have the URL http://www.sfu.ca/about/index.html
  • The browser must first determine the server it
    needs to contact (just like in simplified
    example)
  • In this case www.sfu.ca

4
HTTP Request
  • Once it has the web server, it must send an HTTP
    request
  • GET /about/index.html HTTP/1.1
  • Host: www.sfu.ca
  • The first line is the request line
  • Tells the server that we want the page
    /about/index.html
  • The web server takes this request and finds the
    file that we want
  • If it is a regular web page, it sends it
  • If it is a CGI script, it executes it and sends
    the output
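
As a rough illustration of the two steps above (find the server, then send the request), here is a minimal Python sketch. The function name build_get_request is made up for this example, and the sketch only builds the request text; it does not open a network connection.

```python
from urllib.parse import urlsplit


def build_get_request(url):
    """Return (server, request_text) for a simple HTTP/1.1 GET of url."""
    parts = urlsplit(url)
    server = parts.hostname        # the server to contact, e.g. "www.sfu.ca"
    path = parts.path or "/"       # the page to ask for, e.g. "/about/index.html"
    # Request line first, then headers, then a blank line.
    # Every line in an HTTP request ends with CRLF ("\r\n").
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {server}",
        "",
        "",
    ]
    return server, "\r\n".join(lines)


server, request = build_get_request("http://www.sfu.ca/about/index.html")
print(server)
print(request)
```

A real browser sends more headers than this, but the request line and the Host header are the two pieces shown on the slide.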

5
HTTP Request cont.
  • GET /about/index.html HTTP/1.1
  • GET is the HTTP method that tells the server that
    we want to retrieve a page
  • There are other HTTP methods
  • HTTP/1.1 tells the server that we are speaking
    version 1.1 of HTTP
  • Host: www.sfu.ca
  • This is an example of an HTTP header
  • There can be many headers
  • Host here gives the server name
  • Why? This server might respond to requests for
    more than one domain
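
To see why the Host header matters, here is a small Python sketch of one server answering for two site names. The SITES table, its directory paths, and the function name document_root are all hypothetical; real web servers configure this in their own way.

```python
# Hypothetical table: one server machine answering for two site names.
SITES = {
    "www.sfu.ca": "/var/www/sfu",
    "www.cs.sfu.ca": "/var/www/cs",
}


def document_root(request_text):
    """Pick the site directory using the Host header of a raw request.

    Simplification: a Host value that includes a port number is not handled.
    """
    for line in request_text.split("\r\n")[1:]:   # skip the request line
        if not line:
            break                                  # blank line ends the headers
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            return SITES.get(value.strip().lower())
    return None


request = "GET /about/index.html HTTP/1.1\r\nHost: www.sfu.ca\r\n\r\n"
print(document_root(request))
```

The same path, /about/index.html, could exist on both sites; only the Host header tells the server which one is wanted.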

6
HTTP Server Response
  • Remember that when the server receives a request
    for a file, it must provide a response
  • In the simplified version we just said "OK"
  • HTTP/1.1 200 OK
  • Date: Wed, 30 Nov 2006 13:30:00 GMT
  • Last-Modified: Mon, 28 Nov 2005 12:27:23 GMT
  • Server: Apache/1.3.29 (Unix)
  • Content-Type: text/html

7
HTTP Server Response, cont.
  • HTTP/1.1 200 OK
  • Status Line
  • HTTP version 1.1 response
  • 200 OK indicates the page has been found and is
    being sent
  • Date
  • Gives the current time
  • It is a header
  • Last-Modified
  • Says when the file was last modified
  • Is also a header
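
The split between the status line and the headers can be sketched in Python as follows. The function name parse_response_head is made up for this example, and the sample text reuses header values from the slides.

```python
def parse_response_head(head):
    """Split the top of an HTTP response into its status line and headers."""
    lines = head.split("\r\n")
    # Status line: version, numeric code, then the reason phrase.
    version, code, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        if not line:
            break                       # blank line ends the headers
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers


head = (
    "HTTP/1.1 200 OK\r\n"
    "Last-Modified: Mon, 28 Nov 2005 12:27:23 GMT\r\n"
    "Server: Apache/1.3.29 (Unix)\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
)
version, code, reason, headers = parse_response_head(head)
print(code, reason)
print(headers["last-modified"])
```

Header names are stored in lowercase here because HTTP header names are not case-sensitive.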

8
HTTP Server Response, cont.
  • Server: Apache/1.3.29 (Unix)
  • Header
  • Indicates the type of web server software the
    server is running
  • Apache is the most common web server software on
    the Internet, running approximately 2/3 of web
    servers
  • Remember, one of the biggest differences between
    a home PC and a server is the software that runs
    on the computers

9
HTTP Server Response, cont.
  • You should be familiar with the rest of the
    server response
  • Content-Type: text/html
  • Indicates the MIME type
  • We generate this header manually when writing our
    Python script
  • The response when requesting an image is similar
  • MIME type is image/jpeg
  • The server also specifies the length of the
    content
  • Content-Length: 9325
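
Here is a minimal Python sketch of producing these two headers by hand, roughly as a web script would. The function name build_response is an assumption for this example, and the sketch leaves out the status line, which the web server normally adds for a script.

```python
def build_response(body_text, mime_type="text/html"):
    """Return headers, a blank line, then the body, as bytes."""
    body = body_text.encode("utf-8")
    head = (
        f"Content-Type: {mime_type}\r\n"
        f"Content-Length: {len(body)}\r\n"   # length in bytes, not characters
        "\r\n"                                # blank line separates headers from body
    )
    return head.encode("ascii") + body


print(build_response("<p>Hi</p>").decode("utf-8"))
```

For an image, the MIME type would be image/jpeg and the body would be the raw image bytes rather than encoded text.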

10
Caching
  • We are already familiar with the concept of
    caching from web crawlers
  • But what about caching on your own computer?
  • If you request the same page twice within a short
    period of time, should you reload the entire
    page?
  • What if all the pages you are looking at have the
    same image on them? Should you keep downloading
    it repeatedly?

11
Caching on Home Computer
  • Caching involves storing a copy of the web page
    locally on your machine
  • Decreases time to reload the page
  • Decreases your network traffic
  • But what if a web page changes between visits?
  • Your browser can check the last-modified date to
    determine if it needs to download a fresh copy
  • Also, your browser assumes a page is unchanged
    for only a few hours
  • After that, it automatically gets a new copy
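
The two checks described above can be sketched in Python like this. The three-hour limit is an assumed stand-in for "a few hours", and the function names is_fresh and has_changed are made up for this example; real browsers use more detailed rules.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

MAX_AGE = timedelta(hours=3)   # assumed value for "a few hours"


def is_fresh(fetched_at, now):
    """True if the cached copy is recent enough to reuse without asking."""
    return now - fetched_at < MAX_AGE


def has_changed(cached_last_modified, server_last_modified):
    """Compare two Last-Modified header values (HTTP date strings)."""
    cached = parsedate_to_datetime(cached_last_modified)
    current = parsedate_to_datetime(server_last_modified)
    return current > cached


fetched = datetime(2005, 11, 30, 13, 30, tzinfo=timezone.utc)
print(is_fresh(fetched, fetched + timedelta(hours=1)))
print(has_changed("Mon, 28 Nov 2005 12:27:23 GMT",
                  "Tue, 29 Nov 2005 08:00:00 GMT"))
```

If the copy is no longer fresh but the page has not changed, the browser can keep using its cached copy instead of downloading the page again.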

12
Caching, cont.
  • Web-scripts do not usually generate a
    last-modified header because typically their
    content changes frequently
  • Caching is important for speed, and you should
    keep it in mind when building a site
  • Using one style sheet for all of your pages is
    preferable to using separate style sheets for
    each page
  • The style sheet must be downloaded only once for
    an entire site
  • The same is true for images
  • Using the same logo image on all pages reduces
    download time

13
ISP Caching
  • Your ISP may also cache pages which you and other
    users visit
  • Why?
  • If the page you need is already cached on their
    servers, you reduce their network traffic to the
    outside Internet which reduces their expenses
  • Remember that you are probably connected to a
    single gateway run by your ISP. If it can get the
    page you need without having to contact the
    actual server hosting the page, then:
  • You get your page faster
  • They save money
  • Cache servers are invisible to you

14
Redirects
  • If you move your web page from one place to
    another either on the same server or on another
    server, it can break the links to your pages
  • You can use a redirect to fix this so that users
    do not get a "page not found" error message or a
    silly "Our site has moved to www.mypages.com.
    Please update your bookmarks"
  • Basically a redirect automatically tells your
    browser the new location of the page
  • Invisible to you
  • You don't have to do anything
  • Your browser follows the redirect automatically;
    it needs no special setup
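
Here is a small Python sketch of how a browser might follow a redirect. FAKE_SERVER is a hypothetical stand-in for real web servers, the example.com addresses are placeholders, and the function name fetch is made up; the sketch assumes the server answers with status 301 or 302 and a Location header giving the new address.

```python
# Hypothetical responses: url -> (status code, headers, body).
FAKE_SERVER = {
    "http://www.example.com/old.html":
        (301, {"Location": "http://www.example.com/new.html"}, ""),
    "http://www.example.com/new.html":
        (200, {}, "<html>New page</html>"),
}


def fetch(url, max_redirects=5):
    """Return (final_url, status, body), following redirects automatically."""
    for _ in range(max_redirects + 1):
        status, headers, body = FAKE_SERVER.get(url, (404, {}, ""))
        if status in (301, 302) and "Location" in headers:
            url = headers["Location"]   # ask again at the new location
            continue
        return url, status, body
    raise RuntimeError("too many redirects")


print(fetch("http://www.example.com/old.html"))
```

The limit on the number of redirects guards against two pages that redirect to each other forever.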

15
Content Negotiation
  • Not all browsers and users are the same
  • Not all can display SVG images or Unicode
    characters
  • Each person browsing the web has a set of the
    types of files they think are acceptable
  • Because of this, your browser and the web server
    can exchange information about which files are
    acceptable and send the best version
  • This process is called content negotiation

16
Content Negotiation, cont.
  • There are three areas in which your browser and
    the server can exchange information
  • File type
  • Language
  • Character set
  • File type
  • Browser sends a list of the MIME types it can
    handle internally and usually accepts any other
    type of file if that's all that's available
  • Ex. Browser might prefer html but only PDF is
    available. It will try to find an external
    program to handle the PDF file (like Acrobat)
  • Ex. Browsers that can handle SVG images will
    accept them, otherwise the server will send a PNG

17
Content Negotiation: Language
  • Browser has a list of languages the user can read
    in order of preference
  • Server tries to send the best page
  • The one that the browser can read
  • And the user can read
  • Means you can have multilingual sites so that
    users who speak different languages get different
    pages
  • Requires no effort on the part of the browser
  • No translation is required
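
Browsers send the language list in an Accept-Language header, where an optional q value between 0 and 1 marks how strongly each language is preferred. The Python sketch below is a simplified reading of that header; the function names and the fallback to "en" are assumptions for this example.

```python
def parse_accept_language(header):
    """Return language tags from an Accept-Language header, best first."""
    prefs = []
    for item in header.split(","):
        tag, _, params = item.strip().partition(";")
        q = 1.0                              # no q value means full preference
        params = params.strip()
        if params.startswith("q="):
            q = float(params[2:])
        if tag:
            prefs.append((tag.strip().lower(), q))
    # sorted() is stable, so equal q values keep the order from the header.
    return [tag for tag, q in sorted(prefs, key=lambda p: -p[1])]


def choose_language(header, available, default="en"):
    """Pick the first preferred language that the site actually has."""
    for tag in parse_accept_language(header):
        if tag in available:
            return tag
    return default                           # assumed fallback language


print(choose_language("fr-CA, fr;q=0.8, en;q=0.5", {"en", "fr"}))
```

Here the site has no Canadian French page, so the user gets the general French page rather than the English one.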

18
Content Negotiation, cont.
  • Character Set
  • Not all browsers can handle Unicode for example
  • Also other specific character sets that browsers
    may not be able to read
  • Why use content negotiation?
  • Different users can view the same site tailored
    to their preferences and capabilities

19
Implementing Content Negotiation
  • Once the server is capable of content negotiation
    it requires little effort on the part of the
    developer
  • The only major change is that files must be
    linked without their extensions
  • Instead of page.html, we just link to page
  • The most appropriate file is sent
  • Not used online much because people aren't aware
    of it

20
DNS
  • Whenever we browse online, we are using the
    Domain Name System (DNS), probably without
    realizing it
  • What do Name Servers do?
  • Tell us exactly where to find the server with a
    given name
  • Most websites have a human-readable name
  • www.sfu.ca
  • www.ebay.com
  • www.wikipedia.org

21
DNS, cont.
  • Human-readable names are great for us because
    they are easy to remember
  • But computers identify themselves and each other
    according to their IP address
  • 216.30.176.2
  • 192.168.1.1
  • These are much harder to remember
  • We use DNS name servers to translate the
    human-readable name to the IP address of the
    server we want

22
DNS as a Database
  • The DNS system is the world's largest database
  • No other database gets as many requests per day
    as the DNS system (billions)
  • It is a distributed system
  • No one DNS server has the address of every other
    server on the planet
  • There are lots of DNS servers
  • Every day some IP addresses and/or domain names
    change
  • Every day new domain names are added
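
To picture a distributed lookup, here is a toy Python sketch in which each server knows only its own small part of the database and points to the next server to ask. The ZONES table, the server names, and the address 192.0.2.10 (taken from a range reserved for documentation) are all made up; real DNS servers and messages are more involved.

```python
# Hypothetical data: each server knows only its own part of the name space.
ZONES = {
    "root":       {"ca": ("ask", "ca-server")},
    "ca-server":  {"sfu.ca": ("ask", "sfu-server")},
    "sfu-server": {"www.sfu.ca": ("address", "192.0.2.10")},
}


def resolve(name):
    """Follow referrals from the root until some server gives an address."""
    labels = name.lower().split(".")
    server = "root"
    # Ask about longer and longer endings: "ca", "sfu.ca", "www.sfu.ca".
    for i in range(len(labels) - 1, -1, -1):
        suffix = ".".join(labels[i:])
        kind, value = ZONES.get(server, {}).get(suffix, ("missing", None))
        if kind == "address":
            return value            # final answer
        if kind != "ask":
            return None             # nobody knows this name
        server = value              # referral: ask the next server
    return None


print(resolve("www.sfu.ca"))
print(resolve("www.mit.edu"))
```

No single entry in ZONES holds every name, yet the lookup still succeeds by asking the servers in turn.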

23
Domain Names
  • We are familiar with domain names
  • www.google.ca
  • www.google.com
  • www.mit.edu
  • www.cs.sfu.ca
  • Top-level or first-level domains are the last bit
    of the domain name we are familiar with
  • .ca, .com, .net, .mil, .edu, .biz, .de, etc.
  • Hundreds of these

24
Second-Level Domains
  • Within each top-level domain we can have any
    number of second-level domains
  • www.yahoo.com
  • www.google.com
  • www.ebay.com
  • Each second-level domain, within the same
    first-level domain, must have a unique name (we
    can't have two www.ebay.com)
  • And, www.yahoo.com and www.yahoo.ca are in
    different domains
  • .com and .ca are different top-level domains
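
Splitting a name into its levels can be sketched in a few lines of Python. The function name domain_levels is made up, and the sketch simply counts labels from the right, so it is too naive for names such as news.bbc.co.uk, where the registered name sits one level further down.

```python
def domain_levels(hostname):
    """Return (top_level, second_level) for a host name, counting from the right."""
    labels = hostname.lower().rstrip(".").split(".")
    top_level = labels[-1]
    second_level = ".".join(labels[-2:]) if len(labels) >= 2 else None
    return top_level, second_level


print(domain_levels("www.yahoo.com"))
print(domain_levels("www.yahoo.ca"))
```

The two yahoo names share a second-level label but fall under different top-level domains, so they are separate registrations.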

25
How DNS is Handled
  • Because each domain name must be unique, some
    central entity must control who gets which domain
    names
  • For instance, the company Verisign handles the
    .com top-level domain
  • Does not maintain a list of every single server
    in the .com domain
  • Too many
  • But it administrates the .com domain

26
Getting a Domain
  • When you want to register a domain name, you have
    to go through a company/organization that handles
    this sort of thing
  • Many registrars (Network Solutions is best known)
  • They work with the company who administrates the
    top-level domain you want to be a part of
  • Network Solutions maintains a database of all of
    the domains they have registered and to whom they
    are registered

27
ICANN
  • So how does a company like Verisign get to
    administrate a HUGE top-level domain like .com?
    (and .net)
  • They get a contract from ICANN
  • ICANN: Internet Corporation for Assigned Names
    and Numbers
  • They handle IP distribution as well as who
    administrates the top-level domains
  • They are non-profit but were given the contract
    to do this work from the US government where no
    competition for the contract was allowed

28
ICANN, cont.
  • So why the info about ICANN?
  • Recently in the news concerning a deal with
    Verisign breaking anti-trust laws
  • http://news.bbc.co.uk/1/hi/technology/4482292.stm
  • Most people around the world do not like the idea
    of the US controlling the Internet contracts like
    this
  • Part of recent talks at an Internet Summit
  • Many people want an independent UN body to handle
    this
  • So why does the US (or a US company) have
    control?
  • The early Internet was largely built and funded
    by the US government
  • They do not want to lose control of something
    they see as something they invented
  • BTW, Al Gore did NOT invent the Internet

29
Questions?