1
URLs and Resources
  • Herng-Yow Chen

2
Outline
  • Navigating the Internet's resources
  • URL syntax, and what the various parts of a URL mean and do
  • URL shortcuts that many web clients support, such as relative URLs and expanded URLs
  • URL encoding and character rules
  • Common URL schemes
  • The future of URLs, including URNs

3
Navigating a resource by URL, which tells a web client:
  • URL scheme: how to access the resource
  • Server location: where the resource is hosted
  • Resource path: which particular local resource on the server is being requested

Example (a web page): http://english.csie.ncnu.edu.tw/demo/index.html
  • Scheme (how): http
  • Host (where): english.csie.ncnu.edu.tw
  • Path (what): /demo/index.html
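A minimal sketch, assuming Python 3 and its standard `urllib.parse` module (the helper name `describe` is made up for illustration), that pulls the three parts out of the example URL above:

```python
from urllib.parse import urlsplit

def describe(url: str) -> dict:
    """Split a URL into the three parts a web client needs."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,   # how to access the resource
        "host": parts.hostname,   # where the resource is hosted
        "path": parts.path,       # which resource on that server
    }

for label, value in describe("http://english.csie.ncnu.edu.tw/demo/index.html").items():
    print(f"{label}: {value}")
```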
4
URLs
  • URLs can direct you to resources available through protocols other than HTTP, for example:
  • An email account: mailto:hychen@csie.ncnu.edu.tw
  • A file residing on an FTP server: ftp://ftp.ncnu.edu.tw/a_file.txt
  • A video streamed by a video server: rtsp://www.cnn.com/headline.rm
  • Most URLs have the same scheme://server location/path structure.
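To see which of the example URLs actually follow the scheme://server location/path shape, a small sketch (Python 3 `urllib.parse`; `EXAMPLES` and `has_server_location` are illustrative names) can check for a server-location part. The mailto URL is the exception: it has no "//" section, so everything after the colon lands in the path.

```python
from urllib.parse import urlsplit

EXAMPLES = [
    "mailto:hychen@csie.ncnu.edu.tw",
    "ftp://ftp.ncnu.edu.tw/a_file.txt",
    "rtsp://www.cnn.com/headline.rm",
    "http://english.csie.ncnu.edu.tw/demo/index.html",
]

def has_server_location(url: str) -> bool:
    """True if the URL carries a '//server location' part after the scheme."""
    return urlsplit(url).netloc != ""

for url in EXAMPLES:
    parts = urlsplit(url)
    # An empty netloc is shown as '-' to make the mailto case stand out.
    print(f"{parts.scheme:7} location={parts.netloc or '-':30} path={parts.path}")
```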

6
URL Syntax
  • <scheme>://<user>:<password>@<host>:<port>/<path>;<params>?<query>#<frag>
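Python's `urllib.parse.urlparse` maps closely onto this general syntax. The sketch below assumes Python 3; the URL is hypothetical and chosen only to exercise every component. One caveat: `urlparse` splits `;params` off the last path segment only, not off every segment.

```python
from urllib.parse import urlparse

def components(url: str) -> dict:
    """Break a URL into the components of the general URL syntax."""
    p = urlparse(url)
    return {
        "scheme": p.scheme,
        "user": p.username,
        "password": p.password,
        "host": p.hostname,
        "port": p.port,        # an int, or None when no port is given
        "path": p.path,
        "params": p.params,    # only the params of the last path segment
        "query": p.query,
        "frag": p.fragment,
    }

# Hypothetical URL that exercises every component.
example = "http://joe:joespasswd@www.joes-hardware.com:8080/tools/hammers.html;sale=false?item=12#price"
for name, value in components(example).items():
    print(f"{name:9}{value}")
```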

7
Scheme: what protocol to use
  • The scheme is really the main identifier of how to access a given resource.
  • It must start with an alphabetic character.
  • It is separated from the rest of the URL by the first ":" character.
  • Scheme names are case-insensitive.
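A sketch of these rules in Python 3. The character set allowed after the first letter (letters, digits, "+", "-", ".") is taken from RFC 3986 and goes slightly beyond what the slide states; `split_scheme` is an illustrative name.

```python
import re

# RFC 3986 scheme grammar: a letter, then letters, digits, '+', '-' or '.'.
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")

def split_scheme(url: str):
    """Return (scheme, rest) split at the first ':', or None if there is no valid scheme."""
    scheme, sep, rest = url.partition(":")
    if not sep or not SCHEME_RE.fullmatch(scheme):
        return None
    # Scheme names are case-insensitive, so normalize them to lowercase.
    return scheme.lower(), rest

print(split_scheme("HTTP://www.ncnu.edu.tw/"))
```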

8
Hosts and Ports
  • The host component (IP address or domain name) identifies the host machine on the Internet that has access to the resource.
  • The port component identifies the network port on which the server is listening.
  • Different services use different default ports on a machine:
  • HTTP: 80
  • FTP: 21
  • Telnet: 23
  • SMTP: 25
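When a URL omits the port, the client falls back to the scheme's default. A minimal Python 3 sketch, assuming the keys of the slide's table can be used directly as scheme names (`DEFAULT_PORTS` and `effective_port` are illustrative names):

```python
from urllib.parse import urlsplit

# Default ports from the table above, keyed by (assumed) scheme name.
DEFAULT_PORTS = {"http": 80, "ftp": 21, "telnet": 23, "smtp": 25}

def effective_port(url: str):
    """Return the explicit port if the URL has one, else the scheme's default (or None)."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS.get(parts.scheme)

print(effective_port("http://www.ncnu.edu.tw/"))
print(effective_port("http://www.ncnu.edu.tw:8080/"))
```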

9
Usernames and Passwords
  • Many servers require a username and password before you can access data through them. Here are a few examples:
  • ftp://ftp.prep.ai.mit.edu/pub/gnu
  • ftp://anonymous@ftp.prep.ai.mit.edu/pub/gnu
  • ftp://anonymous:my_passwd@ftp.prep.ai.mit.edu/pub/gnu
  • http://joe:joespasswd@www.joes-hardware.com/sales_info.txt
  • The default username and password:
  • "anonymous" for the username
  • Internet Explorer sends "IEUser" as the password, while Netscape sends "mozilla".
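A Python 3 sketch of how a client might pick credentials out of such URLs and fall back to the defaults. The `default_password` parameter is hypothetical and simply mimics the Internet Explorer behaviour described above.

```python
from urllib.parse import urlsplit

def credentials(url: str, default_password: str = "IEUser"):
    """Return (username, password) for a URL, using the conventional defaults.

    'anonymous' is the default user; default_password is a hypothetical
    stand-in for whatever string a particular browser chooses to send.
    """
    parts = urlsplit(url)
    user = parts.username or "anonymous"
    password = parts.password if parts.password is not None else default_password
    return user, password

print(credentials("ftp://ftp.prep.ai.mit.edu/pub/gnu"))
print(credentials("http://joe:joespasswd@www.joes-hardware.com/sales_info.txt"))
```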

10
Paths
  • The path component of the URL specifies where on the server machine the resource lives.
  • The path often resembles a hierarchical filesystem path. For example, in http://www.csie.ncnu.edu.tw/course/1998.html the path is /course/1998.html, which resembles a filesystem path on a UNIX filesystem.
  • The path component for HTTP URLs can be divided into path segments separated by "/". Each path segment can have its own params component (described later).
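Because each segment can carry its own params, splitting on "/" first and then on ";" recovers both. A Python 3 sketch (`path_segments` is an illustrative name; the second test URL is hypothetical):

```python
from urllib.parse import urlsplit

def path_segments(url: str):
    """Split an HTTP URL path into (segment, params) pairs.

    Each '/'-separated segment may carry its own ';'-separated params.
    """
    path = urlsplit(url).path          # urlsplit leaves ';params' inside the path
    result = []
    for segment in path.split("/")[1:]:  # skip the empty string before the leading '/'
        name, *params = segment.split(";")
        result.append((name, params))
    return result

print(path_segments("http://www.csie.ncnu.edu.tw/course/1998.html"))
```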

11
Parameters
  • For many schemes, a simple host and path to the object just aren't enough.
  • Aside from what port the server is listening on, and even whether or not you have access to the resource with a username and password, many protocols require more information to work.
  • For example:
  • ftp://ftp.ncnu.edu.tw/image.gif;type=a
  • ftp://ftp.ncnu.edu.tw/program.exe;type=i
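In FTP URLs the type parameter selects the transfer mode; RFC 1738 defines the typecodes a (ASCII), i (image, i.e. binary), and d (directory listing). A Python 3 sketch that reads it back out (`FTP_TYPECODES` and `ftp_transfer_mode` are illustrative names):

```python
from urllib.parse import urlparse

# RFC 1738 FTP typecodes.
FTP_TYPECODES = {"a": "ASCII (text) transfer", "i": "image (binary) transfer", "d": "directory listing"}

def ftp_transfer_mode(url: str) -> str:
    """Describe the ;type= parameter of an FTP URL, or 'unspecified' if absent."""
    params = urlparse(url).params       # urlparse splits ';params' off for ftp URLs
    for param in params.split(";"):
        name, _, value = param.partition("=")
        if name == "type":
            return FTP_TYPECODES.get(value, "unknown typecode")
    return "unspecified"

print(ftp_transfer_mode("ftp://ftp.ncnu.edu.tw/program.exe;type=i"))
```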

12
Query strings
  • Some resources, such as databases, can be queried according to input strings. For example:
  • http://www.xxx.tw/a.cgi?id=123&name=abc
  • There is no requirement for the format of the query component, except that some characters are illegal. By convention, many gateways expect the query to be formatted as a series of name=value pairs, separated by "&" characters.
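Python's standard library handles the name=value&name=value convention in both directions. A sketch assuming Python 3 (`query_pairs` and `build_query` are illustrative wrappers around `parse_qs` and `urlencode`):

```python
from urllib.parse import parse_qs, urlencode, urlsplit

def query_pairs(url: str) -> dict:
    """Decode a name=value&name=value query string into a dict of value lists."""
    return parse_qs(urlsplit(url).query)

def build_query(base: str, **fields) -> str:
    """Append keyword arguments to base as an '&'-separated query string."""
    return base + "?" + urlencode(fields)

print(query_pairs("http://www.xxx.tw/a.cgi?id=123&name=abc"))
print(build_query("http://www.xxx.tw/a.cgi", id=123, name="abc"))
```

Values come back as lists because a name may legally appear more than once in a query.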

13
Query Strings
Figure: the client sends http://english.csie.ncnu.edu.tw/course/NWSMLViewer.php?lectureid=rctlee-20030909125212 across the Internet; the server hands the query string lectureid=rctlee-20030909125212 to the viewer gateway.
14
Fragments
  • Finer-grained pieces of a resource, such as sections within a large HTML document, can be addressed conveniently with fragments. For example:
  • http://engquiz.csie.ncnu.edu.tw/e-book/html/B001.html#page10
  • Because HTTP servers generally deal only with entire objects, not with fragments of objects, clients don't pass fragments along to servers. That is, the whole object is retrieved, and the browser then displays only the part identified by the fragment.
  • Note that with the Range Request feature of HTTP/1.1, agents may request byte ranges of objects (covered in later lectures).
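The client-side split between what goes on the wire and what stays in the browser can be sketched with Python 3's `urllib.parse.urldefrag` (`request_target` is an illustrative name):

```python
from urllib.parse import urldefrag

def request_target(url: str):
    """Return (url_sent_to_server, fragment_kept_by_client)."""
    result = urldefrag(url)
    return result.url, result.fragment

sent, kept = request_target("http://engquiz.csie.ncnu.edu.tw/e-book/html/B001.html#page10")
print("request:", sent)
print("scroll to:", kept)
```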

15
Fragments
  • (a) The user selects a link to http://www.csie.ncnu.edu.tw/hychen/web_tech/#Resource
  • (b) The browser makes a request to http://www.csie.ncnu.edu.tw/hychen/web_tech/ (the fragment is NOT sent to the server).
  • (c) The server www.csie.ncnu.edu.tw returns the entire HTML page to the client over the Internet.
  • (d) The browser displays the HTML page, scrolling down to start at the named fragment "Resource".
16
URL shortcuts
  • Web clients understand and use a few URL
    shortcuts.
  • Many browsers also support automatic expansion of
    URLs, where the user can type in a key
    (memorable) part of a URL, and the browser fills
    in the rest.
  • Relative URLs
  • Base URLs
  • Resolving relative references
  • Expanded URLs

17
Relative URLs
  • URLs come in two flavors: absolute and relative.
  • So far, we have looked only at absolute URLs, which contain all the information you need to access a resource.
  • A relative URL, on the other hand, is incomplete. To get all the information needed to access a resource, a relative URL must be interpreted on the basis of another URL, called its base.

18
HTML snippet with relative URL
  <HTML>
  <HEAD><TITLE>Joe's Tools</TITLE></HEAD>
  <BODY>
  <H1>Tools page</H1>
  <H2>Hammers</H2>
  <P>Joe's HARDWARE online has the largest selection of <A HREF="./hammers.html">hammers</A> on earth.
  </BODY>
  </HTML>

19
Using a base URL
  • Relative URL: ./hammers.html
  • Base URL: http://www.joes-hardware.com/tools.html
  • New absolute URL: http://www.joes-hardware.com/hammers.html
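The conversion on this slide is exactly what Python 3's `urllib.parse.urljoin` does. A minimal sketch (`BASE` and `absolutize` are illustrative names):

```python
from urllib.parse import urljoin

BASE = "http://www.joes-hardware.com/tools.html"

def absolutize(relative: str, base: str = BASE) -> str:
    """Resolve a (possibly relative) URL against its base URL."""
    return urljoin(base, relative)

print(absolutize("./hammers.html"))  # http://www.joes-hardware.com/hammers.html
```

An absolute URL passed in comes back unchanged, since it needs no base.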
20
Base URLs
  • The first step in the conversion process is to find a base URL, which can come from a few places:
  • Explicitly provided in the resource
  • Use the <BASE> tag to define the base URL.
  • Base URL of the encapsulating resource
  • If the document does not explicitly specify a base URL, use the URL of the resource in which the document is embedded as the base, as in the example on the preceding slide.
  • No base URL
  • In some instances, there is no base URL. This often means that you have an absolute URL; however, sometimes you just have an incomplete or broken URL.
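The order of preference above can be sketched with Python 3's standard `html.parser`. `BaseFinder` and `choose_base` are illustrative names, and the sketch does not resolve a relative `<BASE HREF>` value against the document URL.

```python
from html.parser import HTMLParser

class BaseFinder(HTMLParser):
    """Record the href of the first <BASE> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.base_href = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <BASE HREF=...> matches.
        if tag == "base" and self.base_href is None:
            self.base_href = dict(attrs).get("href")

def choose_base(html_text: str, document_url):
    """Prefer an explicit <BASE href>, else the enclosing document's URL, else None."""
    finder = BaseFinder()
    finder.feed(html_text)
    return finder.base_href or document_url

page = '<HTML><HEAD><BASE HREF="http://www.joes-hardware.com/tools/"></HEAD></HTML>'
print(choose_base(page, "http://www.joes-hardware.com/tools.html"))
```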

21
Resolving relative references
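One step in resolving a relative reference is removing "." and ".." segments after the base and relative paths have been merged. The sketch below follows the remove_dot_segments algorithm of RFC 3986 (section 5.2.4), a later specification than the RFC 1808 listed at the end of these slides, so it may differ in detail from the procedure the lecture presented. It is a Python 3 illustration, not a full resolver.

```python
def remove_dot_segments(path: str) -> str:
    """RFC 3986 section 5.2.4: resolve '.' and '..' segments in a path."""
    output = []                              # each entry keeps its leading '/', if any
    while path:
        if path.startswith("../"):           # A: drop a leading "../" or "./"
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./"):         # B: "/./" becomes "/"
            path = path[2:]
        elif path == "/.":
            path = "/"
        elif path.startswith("/../"):        # C: "/../" becomes "/" and drops the last output segment
            path = path[3:]
            if output:
                output.pop()
        elif path == "/..":
            path = "/"
            if output:
                output.pop()
        elif path in (".", ".."):            # D: a lone "." or ".." disappears
            path = ""
        else:                                # E: move the first segment to the output
            start = 1 if path.startswith("/") else 0
            end = path.find("/", start)
            if end == -1:
                end = len(path)
            output.append(path[:end])
            path = path[end:]
    return "".join(output)

print(remove_dot_segments("/./hammers.html"))
```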
22
Expanded URLs
  • Some browsers try to expand URLs automatically, either after you submit the URL or while you're typing. This gives users a shortcut: they don't have to type in the complete URL.
  • Hostname expansion
  • Ex: yahoo → www.yahoo.com
  • History expansion
  • Ex: http://www.ncnu → http://www.ncnu.edu.tw
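Both kinds of expansion can be sketched in a few lines of Python 3. The rules here are hypothetical heuristics chosen to reproduce the two examples above, not the behaviour of any particular browser; `expand_hostname` and `expand_from_history` are illustrative names.

```python
def expand_hostname(typed: str) -> str:
    """Hypothetical hostname expansion: a bare word becomes www.<word>.com."""
    word = typed.strip()
    # Only plain words are expanded; anything that already looks like a host or URL is kept.
    if word and "." not in word and "/" not in word and ":" not in word:
        return f"www.{word}.com"
    return word

def expand_from_history(typed: str, history: list) -> list:
    """Hypothetical history expansion: previously visited URLs starting with the typed text."""
    return [url for url in history if url.startswith(typed)]

print(expand_hostname("yahoo"))
print(expand_from_history("http://www.ncnu", ["http://www.ncnu.edu.tw", "http://www.cnn.com"]))
```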

23
Shady characters in URLs
  • URLs were designed to be portable: to uniformly name all the resources on the Internet. This means that URLs will be transmitted through various protocols.
  • However, because different protocols (schemes) use different mechanisms for transmitting data, it is important that URLs can be transmitted safely, that is, without losing information, through any protocol on the network.
  • Some protocols, such as the Simple Mail Transfer Protocol (SMTP) for email, use a 7-bit encoding for messages; this can strip off certain characters if the source is encoded in 8 bits or more.
  • To get around this, URLs are permitted to contain only characters from a relatively small, universally safe alphabet.
  • In addition to being transportable, URLs should be readable. Hence, some invisible, nonprinting characters are also prohibited in URLs, even though these characters may pass through mailers.
  • To complicate matters further, URLs also need to be complete. Someday people would want URLs to contain binary data or characters outside of the universally safe alphabet, so an escape mechanism was added.

24
The URL Character Set
  • US-ASCII is very portable, due to its long legacy. It uses 7 bits to represent most keys available on an English typewriter and a few nonprinting control characters for text formatting and hardware signaling. But it doesn't support the inflected characters common in European languages or the characters of non-Romanic languages.
  • Some URLs also need to contain arbitrary binary data.
  • Escape sequences allow the encoding of arbitrary values using a restricted subset of the US-ASCII character set, yielding both portability and completeness.

25
Encoding mechanism
  • An unsafe character is simply represented by an escape notation, consisting of a percent sign (%) followed by two hexadecimal digits that give the character's value.
  • For example:
  • ~ (0x7E): http://www.ncnu.edu.tw/%7Ehychen
  • Space (0x20): http://www.abc.com/web%20tools.html
  • % (0x25): http://www.abc.com/100%25satisfaction.html
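A hand-written Python 3 sketch of the escape mechanism, meant to be applied to a path rather than a whole URL. Its safe alphabet (letters, digits, "-", ".", "_", plus "/") is an assumption chosen to reproduce the examples above; in particular it escapes "~" as the slide does, following RFC 1738, whereas RFC 3986 later made "~" safe and recent versions of Python's own `urllib.parse.quote` leave it unescaped.

```python
from urllib.parse import unquote

# Assumed safe alphabet: ASCII letters, digits and '-', '.', '_' ('~' deliberately excluded).
SAFE_ALPHABET = set("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._")

def percent_encode(text: str, safe: str = "/") -> str:
    """Escape every byte outside the safe alphabet as %XX (two uppercase hex digits)."""
    allowed = SAFE_ALPHABET | set(safe)
    out = []
    for byte in text.encode("utf-8"):      # non-ASCII characters become several escaped bytes
        ch = chr(byte)
        out.append(ch if ch in allowed else f"%{byte:02X}")
    return "".join(out)

print(percent_encode("/~hychen"))
print(unquote("/100%25satisfaction.html"))   # decoding reverses the escapes
```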

26
Character Restrictions
  • "%": escape token for encoded characters
  • "/": path delimiter
  • ".": reserved in the path component
  • "..": reserved in the path component
  • "#": fragment delimiter
  • "?": query-string delimiter
  • ";": params delimiter
  • ",": reserved
  • "@": reserved, because it has a special meaning in some schemes
  • "\": restricted, because of unsafe handling by various transport agents, such as gateways
  • "<" ">": unsafe; should be encoded because they often have meaning outside the scope of the URL
  • 0x00-0x1F, 0x7F: restricted, because they fall within the nonprintable range
  • > 0x7F: restricted, because they do not fall within the 7-bit range of US-ASCII
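The unsafe and restricted rows of this table translate into a simple character check. A Python 3 sketch with illustrative names; the space character is added from the encoding examples on the previous slide, and reserved characters such as "/" or "?" are not flagged because they are legal in their delimiter roles.

```python
# '<', '>' and '\' come from the table; the space comes from the encoding examples.
UNSAFE_CHARS = set('<>\\ ')

def needs_escaping(ch: str) -> bool:
    """Return True if a single character must be %-encoded in a URL."""
    code = ord(ch)
    if code <= 0x1F or code == 0x7F:   # nonprintable control characters
        return True
    if code > 0x7F:                    # outside 7-bit US-ASCII
        return True
    return ch in UNSAFE_CHARS

def offending_chars(url: str) -> list:
    """List the characters of url that should have been escaped."""
    return [ch for ch in url if needs_escaping(ch)]

print(offending_chars("http://www.abc.com/web tools.html"))
```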

27
Common scheme formats
  • http, https
  • mailto
  • ftp
  • rtsp, rtspu
  • file
  • news
  • telnet

28
The Future: URN?
Figure: resolving a persistent URL (PURL) in two steps.
  • STEP 1: Ask the resource resolver what the Joe's Hardware URL is. The client sends GET http://purl.oclc.org/jhardware/ over the Internet to purl.oclc.org and receives from the resolver the current location of the resource (actual: http://www.joes-hardware.com/).
  • STEP 2: Get the actual URL for the resource. The client sends GET http://www.joes-hardware.com over the Internet to www.joes-hardware.com.
29
URI: Universal Resource Identifier
  • URIs were defined in RFC 1630 (1994).
  • URI is a superset of URL and URN.
  • Full URI: proto://hostname/path (identifies the server)
  • http://www.csie.ncnu.edu.tw:80/hychen/
  • Partial URI: /path (no server mentioned)
  • /hychen/
30
URLs information
  • http://www.w3.org/Addressing/
  • The W3C page about naming and addressing URIs and URLs.
  • http://www.ietf.org/rfc/rfc1738.txt
  • RFC 1738, "Uniform Resource Locators (URL)," by T. Berners-Lee, L. Masinter, and M. McCahill.
  • http://www.ietf.org/rfc/rfc2396.txt
  • RFC 2396, "Uniform Resource Identifiers (URI): Generic Syntax," by T. Berners-Lee, R. Fielding, and L. Masinter.
  • http://www.ietf.org/rfc/rfc2141.txt
  • RFC 2141, "URN Syntax," by R. Moats.
  • http://purl.oclc.org
  • The persistent uniform resource locator web site.
  • http://www.ietf.org/rfc/rfc1808.txt
  • RFC 1808, "Relative Uniform Resource Locators," by R. Fielding.