LIS901N lecture 5: http URI and apache

1
LIS901N lecture 5: http URI and apache
  • Thomas Krichel
  • 2003-01-19

2
Structure
  • http
  • URI
  • apache

3
http
  • Stands for the hypertext transfer protocol. This
    is the most important application layer protocol
    on the Internet today, because it provides the
    foundation for the world wide web.
  • Defined in Fielding, Roy T., James Gettys,
    Jeffrey C. Mogul, Paul J. Leach and Tim
    Berners-Lee, "Hypertext Transfer Protocol --
    HTTP/1.1" (1999), RFC 2616.

4
history
  • 1990: version 0.9 allows for transfer of raw
    data.
  • 1996: RFC 1945 defines version 1.0 by adding
    attribute: value headers.
  • 1999: RFC 2616 defines version 1.1, which adds
    support for
  • hierarchical proxies,
  • caching,
  • virtual hosts and
  • some support for persistent connections,
  • and is more stringent.

5
http resource identification
  • identification of resources is assumed through
    Uniform Resource Identifiers (URI).
  • As far as http is concerned, URIs are strings.
  • http can use "absolute" and "relative" URIs.
  • A URL is a special case of a URI.

6
rfc about http
  • An application-level protocol for distributed,
    collaborative, hypermedia information systems.
  • HTTP is also used as a generic protocol for
    communication between user agents and
    proxies/gateways to other Internet systems,
    including those supported by the SMTP, NNTP, FTP,
    Gopher, and WAIS protocols. In this way, HTTP
    allows basic hypermedia access to resources
    available from diverse applications.

7
overall operation client side
  • Client sends request, required items are
  • method
  • request URI
  • protocol version
  • optional items are
  • request modifiers
  • client information

8
overall operation server side
  • Server sends response, required items are
  • status line
  • protocol version
  • success or error code
  • optional items are
  • server information
  • body

9
middleman
  • intermediaries come in three flavors
  • proxies, i.e. forwarding agents
  • gateways, i.e. receiving agents
  • tunnels, i.e. relay points that do not change the
    message such as an encryption and decryption
    device

10
http assumes transport
  • http assumes that there is a reliable way to
    transport data from one host on the Internet to
    another one.
  • In HTTP/1.0, each request and its response
    normally use a separate TCP connection; HTTP/1.1
    can reuse a persistent connection for several
    exchanges. The default is TCP port 80, but
    other ports can be used.

11
Absolute http URL
  • The absolute http URL is
  • http://host[:port][abs_path[?query]]
  • If abs_path is empty, it is /.
  • The scheme name "http" and the host name are
    case-insensitive.
  • Characters other than those in the "reserved"
    and "unsafe" sets of RFC 2396 are equivalent to
    their "% HEX HEX" encoding.
  • Optional components are in [ ].
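The equivalence rules above can be sketched with Python's standard urllib.parse. This is a minimal illustration, not a complete comparison algorithm: the function names normalize_http_url and same_resource are hypothetical, and the sketch decodes every escape, although escapes of reserved characters should strictly be left alone.

```python
from urllib.parse import urlsplit, unquote

def normalize_http_url(url):
    """Return a comparable (scheme, host, port, path, query) tuple."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()           # the scheme name is case-insensitive
    host = (parts.hostname or "").lower()   # the host name is case-insensitive
    port = parts.port or 80                 # a missing port means the default, 80
    path = parts.path or "/"                # an empty abs_path is equivalent to "/"
    # Simplification: decode all %HH escapes, including reserved characters.
    return (scheme, host, port, unquote(path), parts.query)

def same_resource(a, b):
    """True if two http URLs are equivalent under the rules sketched above."""
    return normalize_http_url(a) == normalize_http_url(b)
```

For instance, same_resource("http://abc.com:80/~smith/home.html", "http://ABC.com/%7Esmith/home.html") holds, because the two URLs differ only in host case, the explicit default port and one escape.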

12
character sets
  • A character set is a method used with one or more
    tables to convert a sequence of binary digits
    into a sequence of characters.
  • http shares the same registry as the MIME
    multimedia email extensions. It is based at the
    IANA, at
  • http://www.isi.edu/innotes/iana/assignments/media-types/media-types
  • The default character set is ISO-8859-1.

13
http messages
  • There are two types of messages.
  • Requests are sent from the client to the server.
  • Responses are sent from the server to the client.
  • The generic format is the same as for email
    messages
  • start line
  • message headers
  • empty line
  • body
  • Empty lines before the start line are ignored.
  • The request's start line is called the
    request-line
  • The response start line is called the
    status-line.
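The generic message format can be taken apart in a few lines of Python. This is a rough sketch under simplifying assumptions: the message is already a text string, header lines are not folded over several lines, and repeated headers simply overwrite each other. The function name parse_message is hypothetical.

```python
def parse_message(raw):
    """Split a raw HTTP message into (start line, headers, body)."""
    raw = raw.lstrip("\r\n")                    # empty lines before the start line are ignored
    head, _, body = raw.partition("\r\n\r\n")   # the first empty line ends the headers
    lines = head.split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()   # header names are case-insensitive
    return start_line, headers, body

request = (
    "GET /home/krichel HTTP/1.1\r\n"
    "Host: openlib.org\r\n"
    "\r\n"
)
start_line, headers, body = parse_message(request)
method, request_uri, version = start_line.split(" ", 2)   # parts of the request-line
```

Here the start line is a request-line, so it splits into method, request URI and protocol version; for a response it would be a status-line instead.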

14
The request headers
  • Accept Accept-Charset
  • Accept-Encoding Accept-Language
  • Authorization Expect
  • From Host
  • If-Match If-Modified-Since
  • If-None-Match If-Range
  • If-Unmodified-Since Max-Forwards
  • Proxy-Authorization Range
  • Referer TE
  • User-Agent

15
The status line
  • The status line is a single line of the form
  • HTTP-Version Status-Code Reason-Phrase
  • The status code is a 3-digit number used by the
    computer.
  • The reason phrase is a friendly note for a human
    to read.

16
Status code classes
  • 1xx Informational: Request received, continuing
    process
  • 2xx Success: The action was successfully received,
    understood, and accepted
  • 3xx Redirection: Further action must be taken in
    order to complete the request
  • 4xx Client Error: The request contains bad syntax
    or cannot be understood
  • 5xx Server Error: The request is valid but cannot
    be executed by the server
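Because the first digit of the status code defines the class, a client can react sensibly even to a code it does not know. A minimal sketch, with the hypothetical names STATUS_CLASSES and parse_status_line:

```python
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}

def parse_status_line(line):
    """Split 'HTTP-Version Status-Code Reason-Phrase' and classify the code."""
    version, _, rest = line.partition(" ")
    code_text, _, reason = rest.partition(" ")   # the reason phrase may contain spaces
    code = int(code_text)
    status_class = STATUS_CLASSES.get(code // 100, "Unknown")
    return version, code, reason, status_class
```

For example, parse_status_line("HTTP/1.1 404 Not Found") yields the class "Client Error".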

17
Error codes
  • 100 Continue
  • 101 Switching Protocols
  • 200 OK
  • 201 Created
  • 202 Accepted
  • 203 Non-Authoritative Information
  • 204 No Content
  • 205 Reset Content
  • 206 Partial Content

18
Error codes II
  • 300 Multiple Choices
  • 301 Moved Permanently
  • 302 Found
  • 303 See Other
  • 304 Not Modified
  • 305 Use Proxy
  • 307 Temporary Redirect

19
Error codes III
  • 400 Bad Request
  • 401 Unauthorized
  • 402 Payment Required
  • 403 Forbidden
  • 404 Not Found
  • 405 Method Not Allowed
  • 406 Not Acceptable
  • 407 Proxy Authentication Required
  • 408 Request Time-out

20
Error codes IV
  • 409 Conflict
  • 410 Gone
  • 411 Length Required
  • 412 Precondition Failed
  • 413 Request Entity Too Large
  • 414 Request-URI Too Large
  • 415 Unsupported Media Type
  • 416 Requested range not satisfiable
  • 417 Expectation failed

21
Error codes V
  • 500 Internal Server Error
  • 501 Not Implemented
  • 502 Bad Gateway
  • 503 Service Unavailable
  • 504 Gateway Time-out
  • 505 HTTP Version not supported

22
Response headers
  • Accept-Ranges
  • Age
  • ETag
  • Location
  • Proxy-Authenticate
  • Retry-After
  • Server
  • Vary
  • WWW-Authenticate

23
Entity headers, common to response and request
  • Allow
  • Content-Encoding
  • Content-Language
  • Content-Length
  • Content-Location
  • Content-MD5
  • Content-Range
  • Content-Type
  • Expires
  • Last-Modified

24
The body
  • The entity-body (if any) sent with an HTTP
    request or response is in a format and encoding
    defined by the entity-header fields.
  • When an entity-body is included with a message,
    the data type of that body is determined via the
    header fields Content-Type and Content-Encoding

25
GET and HEAD method
  • The GET method means retrieve whatever
    information (in the form of an entity) is
    identified by the Request-URI. If the Request-URI
    refers to a data-producing process, it is the
    produced data which shall be returned as the
    entity in the response and not the source text of
    the process, unless that text happens to be the
    output of the process.
  • The HEAD method is identical to GET except that
    the server MUST NOT return a message-body in the
    response.

26
Conditional partial GET
  • The semantics of the GET method change to a
    "conditional GET" if the request message
    includes an
  • If-Modified-Since,
  • If-Unmodified-Since,
  • If-Match,
  • If-None-Match or
  • If-Range header.
  • The semantics of the GET method change to a
    "partial GET" if the request message includes a
    Range header field. A partial GET requests that
    only part of the entity be transferred.
27
The POST method
  • The POST method is used to request that the
    origin server accept the entity enclosed in the
    request as a new subordinate of the resource
    identified by the Request-URI in the
    Request-Line. POST is designed to allow a uniform
    method to cover the following functions
  • Annotation of existing resources
  • Posting a message to a bulletin board, newsgroup,
    mailing list, or similar group of articles
  • Providing a block of data, such as the result of
    submitting a form, to a data-handling process
  • Extending a database through an append operation.

28
PUT and DELETE methods
  • The PUT method requests that the enclosed entity
    be stored under the supplied Request-URI. If the
    Request-URI refers to an already existing
    resource, the enclosed entity should be
    considered as a modified version of the one
    residing on the origin server.
  • The DELETE method requests that the origin server
    delete the resource identified by the Request-URI.

29
URIs (background)
  • URI: uniform resource identifier
  • Originally, a generalization of
  • URL (uniform resource locator),
  • URN (uniform resource name),
  • URC (uniform resource citation),
  • and potentially others,
  • but mainly, URL and URN

30
The difference (in theory) between URL and URN
  • a URL is bound to a location
  • when resource moves, url changes
  • a URN is a name
  • thus location independent, and, in theory,
    persistent (whatever persistent means)

31
The Other View
  • Distinction between URL and URN is artificial
  • Both terms should be abolished and replaced by
    URI
  • thus all identifier schemes would be URI
    schemes (even http) and no prefix would be
    necessary (URL, URN, or even URI).

32
Reasoning
  • Original URI philosophy
  • URLs were a short-term solution and URNs a
    long-term one.
  • URL would be a temporary identification mechanism
    until a location-independent, persistent
    identifier was developed, the URN.
  • Now it seems
  • URNs won't be any more persistent than URLs.
  • persistence is a social problem, not a technical
    problem

33
URI vs URL
  • The term URL or Uniform Resource Locator is
    not used in standards anymore. It generally means
    a URI that contains a domain-name but it is
    historical only.
  • This presentation uses the term URI exclusively.
  • The term URL is still sufficient to convey the
    meaning but should not be used when precision is
    necessary.

34
What does a URI identify?
  • A URI identifies a Resource.
  • A URI only comes into existence when it is bound
    to a Resource.
  • A Resource is defined as anything that is
    identified by a URI.
  • A Resource only comes into existence when a URI
    is bound to it.
  • A URI cannot exist without a Resource.
  • A Resource cannot exist without a URI.

35
it all comes from Plato
  • The "URI identifies an abstract Resource"
    formalism assumes the Platonic concept of form.
  • A Resource, once bound to a URI and brought into
    existence, is only the abstract essence of the
    real world thing we perceive.
  • Any physical or digital version of that Resource
    is only one of all possible physical
    representations of that Resource.
  • For example, http://openlib.org/home/krichel is a
    URI for a homepage. Using language and content
    negotiation it is possible to request that page
    in many languages and formats. Which version is
    the Resource?
  • Answer: none of them. Each is only a
    representation. It is possible to assign a URI to
    even the representations. But even still, each
    Resource is only the abstraction of the physical
    or digital thing, not the thing itself.

36
What is resolution?
  • Resolution means accessing some representation
    of the Resource that a URI identifies.
  • For http://foo.com/ it means accessing the
    homepage of foo.com.
  • For mailto:krichel@openlib.org it can mean
    sending an email message to that address.
  • For URIs that contain network location
    information it is simply a matter of visiting
    that location and doing some function. I.e.
    foo.com is the exact network host that can give
    you the web page.

37
The history
  • Tim Berners-Lee came to the IETF in 1992 to
    develop the WorldWideWeb standards. At the time
    URIs were known as Universal Resource Locators.
  • RFC 1738, "Uniform Resource Locators (URL)", was
    published in 1994.
  • RFC 1738 was updated by RFC 1808, RFC 2368, RFC
    2396.
  • RFC 2396, "Uniform Resource Identifiers (URI):
    Generic Syntax", is the current standard.
  • RFC 2396 may be updated to reflect developments
    in internationalization, terminology updates, and
    registration procedures.

38
Confusion
  • Due to misunderstandings and the formation of the
    W3C separately from the IETF, there was a long
    term disagreement on certain aspects of URIs,
    especially when it came to Uniform Resource Names
    (URNs).
  • A joint IETF/W3C URI Interest Group was formed in
    2000 to investigate work that needed to be done
    with URIs in general.
  • That group published "URIs, URLs, and URNs:
    Clarifications and Recommendations", a report from
    the joint W3C/IETF URI Planning Interest Group
    (draft-mealling-uri-ig-01.txt), which begins to
    clarify the problems and proposes solutions.

39
URN: Uniform Resource Names
  • Are defined by RFC 2141 as a particular URI
    scheme with these characteristics
  • Permanent: Once a URN is assigned to some
    Resource it can never be re-assigned to something
    else.
  • Location independent: The actual URN should not
    contain any network location information such as
    domain-names, IP addresses, file path-names, etc.

40
RFC2396
  • Berners-Lee, Tim, Roy T. Fielding and Larry
    Masinter (1998), "Uniform Resource Identifiers
    (URI): Generic Syntax", RFC 2396
  • A Uniform Resource Identifier (URI) is a compact
    string of characters for identifying an abstract
    or physical resource.
  • They provide a simple and extensible means for
    identifying a resource.

41
operations on a URI
  • There is a set of operations that can be applied
    to URIs. For example, for a URL, the access to
    the resource.
  • To understand if a given URI instance is valid,
    we have to study the operations applied to URIs.

42
benefits of uniformity
  • It allows different types of resource identifiers
    to be used in the same context, even when the
    mechanisms used to access those resources may
    differ
  • it allows uniform semantic interpretation of
    common syntactic conventions across different
    types of resource identifiers
  • it allows introduction of new types of resource
    identifiers without interfering with the way that
    existing identifiers are used
  • it allows the identifiers to be reused in many
    different contexts, thus permitting new
    applications or protocols to leverage a
    pre-existing, large, and widely-used set of
    resource identifiers.

43
Resources and Identity in the RFC
  • A resource can be anything that has identity.
    Not all resources are "network retrievable".
    The resource is the conceptual mapping to an
    entity or set of entities, not necessarily the
    entity which corresponds to that mapping at any
    particular instance in time.
  • An identifier is an object that can act as a
    reference to something that has identity. In the
    case of URI, the object is a sequence of
    characters with a restricted syntax.

44
URI, URL, URN in the RFC
  • A URI can be further classified as a locator, a
    name, or both. The term "Uniform Resource
    Locator" (URL) refers to the subset of URI that
    identify resources via a representation of their
    primary access mechanism (e.g., their network
    location), rather than identifying the resource
    by name or by some other attribute(s) of that
    resource.
  • The term "Uniform Resource Name" (URN)
    refers to the subset of URI that are required to
    remain globally unique and persistent even when
    the resource ceases to exist or becomes
    unavailable.

45
URN in the RFC
  • A URN differs from a URL in that its primary
    purpose is persistent labeling of a resource with
    an identifier. That identifier is drawn from one
    of a set of defined namespaces, each of which has
    its own set name structure and assignment
    procedures. The "urn" scheme has been reserved
    to establish the requirements for a standardized
    URN namespace, as defined in "URN Syntax"
    (RFC 2141) and its related specifications.

46
transcribability
  • The URI syntax was designed with global
    transcribability as one of its main concerns. A
    URI is a sequence of characters from a very
    limited set, i.e. the letters of the basic Latin
    alphabet, digits, and a few special characters.
    A URI may be represented in a variety of ways.

47
consequences of transcribability
  • A URI is a sequence of characters, which is not
    always represented as a sequence of octets.
  • A URI may be transcribed from a non-network
    source, and thus should consist of characters
    that are most likely to be able to be typed into
    a computer, within the constraints imposed by
    keyboards (and related input devices) across
    languages and locales.
  • A URI often needs to be remembered by people, and
    it is easier for people to remember a URI when it
    consists of meaningful components.

48
URI characters
  • URI consist of a restricted set of characters,
    not a sequence of octets. The allowable characters
    are primarily chosen to aid transcribability and
    usability both in computer systems and in
    non-computer communications. Characters used
    conventionally as delimiters around URI are
    excluded.
  • In the simplest case, the original character
    sequence contains only characters that are
    defined in US-ASCII, and the two levels of
    mapping are simple and easily invertible: each
    'original character' is represented as the octet
    for the US-ASCII code for it, which is, in turn,
    represented as either the US-ASCII character or
    else the "%" escape sequence for that octet.

49
reserved characters
  • Many URI include components consisting of, or
    delimited by, certain special characters. These
    characters are called "reserved", since their
    usage within the URI component is limited to
    their reserved purpose. If the data for a URI
    component would conflict with the reserved
    purpose, then the conflicting data must be
    escaped before forming the URI.
  • They are ; / ? : @ & = + $ ,
  • They are allowed within a URI, but may not
    be allowed within a particular component of the
    generic URI syntax.

50
unreserved and excluded characters
  • Unreserved characters are allowed and never take
    any special meaning. They are
  • the upper and lowercase letters a to z and A to
    Z
  • the decimal digits 0 to 9
  • the following: - _ . ! ~ * ' ( )
  • All characters that are neither reserved nor
    unreserved are excluded, for example
  • < >
  • and the blank.
  • They have to be escaped.

51
escaping
  • When you want to use a character in a URI that
    is one of the excluded characters, you have to
    escape it. The way that this is done is to write
    a construction of the form
  • %hex hex
  • where hex is a digit or one of the letters a to f
    (uppercase or lowercase). The two hex characters
    represent the value of the character in unicode
    in hex. For example, %7e is the character ~.
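Escaping can be tried out with Python's standard urllib.parse module, whose quote and unquote functions implement percent-escaping. The helper escape_char below is a hypothetical illustration of the rule itself; encoding non-ASCII characters as UTF-8 octets is an assumption of this sketch, not something the slide prescribes.

```python
from urllib.parse import quote, unquote

def escape_char(ch):
    """Write each octet of a character as '%' followed by two hex digits."""
    return "".join("%{:02X}".format(octet) for octet in ch.encode("utf-8"))

tilde = unquote("%7e")                        # decoding accepts lowercase hex digits
escaped_blank = escape_char(" ")              # the blank is an excluded character
escaped_path = quote("/home/my file<1>.html") # "/" is kept, blank and < > are escaped
```

Note that quote leaves the unreserved letters, digits and "/" untouched and only escapes what has to be escaped.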

52
The Semantic Web
  • The W3C has been developing a new architecture
    that applies knowledge representation technology
    to the WWW.
  • Using the Resource Description Framework (RDF),
    Statements are made using a Subject, Predicate
    and Object (very similar to Lisp and other
    predicate based languages).
  • Each Subject, Predicate or Object are Resources
    in the URI sense and are identified by URIs
    within an RDF Statement using XML Namespaces.

53
example
  • This statement says that the Resource identified
    by the URI http://openlib.org/home/krichel was
    created by the person Thomas Krichel
  • <?xml version="1.0"?>
    <RDF xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      <Description about="http://openlib.org/home/krichel">
        <Creator xmlns="http://description.org/schema/">Thomas Krichel</Creator>
      </Description>
    </RDF>
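The example statement can be read as a (Subject, Predicate, Object) triple. The sketch below uses Python's standard xml.etree.ElementTree on a cleaned-up copy of the statement, with the creator named in the text. The function name triples is hypothetical, and the code handles only this flat form, not RDF/XML in general.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

RDF_XML = """<?xml version="1.0"?>
<RDF xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <Description about="http://openlib.org/home/krichel">
    <Creator xmlns="http://description.org/schema/">Thomas Krichel</Creator>
  </Description>
</RDF>"""

def triples(xml_text):
    """Yield (subject, predicate, object) for each property of each Description."""
    root = ET.fromstring(xml_text)
    for description in root.findall("{%s}Description" % RDF_NS):
        subject = description.get("about")          # the URI being described
        for prop in description:
            # ElementTree writes tags as "{namespace}local"; join both into one URI.
            predicate = prop.tag[1:].replace("}", "")
            yield (subject, predicate, (prop.text or "").strip())

statements = list(triples(RDF_XML))
```

The subject and the predicate both come out as URIs, which is the point of the slide: every part of the statement that is a Resource is named by a URI.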

54
The Semantic Web
  • The combination of Web Services and the Semantic
    Web should give the Web the ability to turn any
    existing Web Resource into a full node in a
    purposefully built knowledge representation
    system with a functional component that allows
    that knowledge to be acted on.
  • And both are based on the simple Uniform Resource
    Identifier.

55
Apache
  • Is a free, open-source web server that is
    produced by the Apache Software Foundation, see
    http://www.apache.org
  • It has over 50% of the market share.
  • It runs best on UNIX systems but can run on a
    Mickeysoft OS as well.
  • I will cover it here because it is freely
    available.
  • I am covering version 1.3

56
Apache in debian
  • /etc/apache/httpd.conf is the main configuration
    file.
  • /etc/init.d/apache action, where action is one of
  • start
  • stop
  • restart
  • is used to fire the daemon up or down.
  • The daemon runs as user www-data.

57
Virtual host
  • On a single installation of apache several web
    servers can be supported.
  • That means the server can behave in a different
    way according to how it is being addressed.
  • The easiest way to implement addressing a server
    in different ways is through DNS host names.

58
Directives in httpd.conf
  • The configuration directives are grouped into
    three basic sections
  • Directives that control the operation of the
    Apache server process as a whole (the 'global
    environment').
  • Directives that define the parameters of the
    'main' or 'default' server, which responds to
    requests that aren't handled by a virtual host.
    These directives also provide default values for
    the settings of all virtual hosts.
  • Settings for virtual hosts, which allow Web
    requests to be sent to different IP addresses or
    hostnames and have them handled by the same
    Apache server process.

59
Server type
  • On a UNIX machine, the server can either be fired
    up on its own, or it can be run as part of the
    overall Internet daemon inetd.
  • Usually standalone is used.

60
Server root
  • Sets the directory where apache finds its own
    configuration files.
  • If log file names are not given as absolute
    paths, they will be placed in the server root
    directory.

61
Timeout
  • This sets the number of seconds that the server
    waits for the result of a request to be computed
    before sending a timeout.
  • On wotan this is set to 300 seconds. This is
    rather a long time; the user will have gone for
    coffee by then.

62
Listen
  • Tells the server which port and IP address to
    listen to. This can be used to have the server
    respond only to requests to a certain IP
    address or to listen to a non-standard port, i.e.
    not port 80.

63
Loadmodule
  • To extend apache, modules have been written. They
    have to be loaded explicitly
  • LoadModule module file
  • Where module is the name of the module and file
    is the name of the file that contains the module
  • Looking at this gives you vital information about
    what the server can do.

64
Server directives
  • User
  • Gives the user name apache runs under
  • Group
  • Gives the group name the server runs under
  • ServerAdmin
  • Email of a human who runs the default server
  • ServerName
  • The name of the default server
  • DocumentRoot
  • The top level directory of the default server

65
Directory options
  • Many options for a directory can be set with
  • <Directory name> instructions </Directory>
  • Name is the name of a directory.
  • Instructions can be a whole lot of stuff

66
Directory instructions
  • Options sets global options for the directory; it
    can be
  • None
  • All
  • or any of
  • Indexes (generate a listing for a directory that
    has no index file)
  • Includes (allow server-side includes)
  • FollowSymLinks (allow the server to follow
    symbolic links)
  • ExecCGI (allow CGI scripts to be executed)
  • MultiViews

67
Access control
  • Can be part of <Directory> to set directory level
    access control
  • Example
  • Allow from friendly.com
  • Deny from evil.com
  • Sometimes you have to set the order, example
  • Order allow,deny
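Why the order matters can be seen by modelling the evaluation in a few lines. This is a simplified sketch of the Order semantics as I understand them for Apache 1.3, not a reimplementation of the access module: it matches host names and the keyword all only, not IP ranges, and the function names host_matches and is_allowed are hypothetical.

```python
def host_matches(client, pattern):
    """True if client is the named host or lies inside the named domain."""
    if pattern == "all":
        return True
    client, pattern = client.lower(), pattern.lower()
    return client == pattern or client.endswith("." + pattern)

def is_allowed(client, order, allow=(), deny=()):
    """Evaluate Allow and Deny lists under the given Order."""
    allowed = any(host_matches(client, p) for p in allow)
    denied = any(host_matches(client, p) for p in deny)
    if order == "allow,deny":
        # Allow is checked first, Deny second; hosts matching neither are denied.
        return allowed and not denied
    if order == "deny,allow":
        # Deny is checked first, Allow second; hosts matching neither are allowed.
        return allowed or not denied
    raise ValueError("order must be 'allow,deny' or 'deny,allow'")
```

With the two directives from the example, a host that is neither in friendly.com nor in evil.com is refused under allow,deny but admitted under deny,allow.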

68
Authentication
  • This is used to enable password access. In that
    case the authentication is handled by a file
    .htaccess in the directory.
  • The AllowOverride instruction is used to state
    what the user can do within the .htaccess file.
    Depending on its values, you can password protect
    a web site.
  • We will not discuss this further here.

69
Userdir
  • This sets the directory that is created by the
    user in her home directory to be accessed by
    requests to ~user.
  • On wotan, we have
  • UserDir public_html
  • That is the default, actually.

70
Set up permission for user home directories
  • <Directory /home/*/public_html>
  • AllowOverride FileInfo AuthConfig Limit
  • Options Includes
  • Options MultiViews Indexes SymLinksIfOwnerMatch
    IncludesNoExec
  • <Limit GET POST OPTIONS PROPFIND>
  • Order allow,deny
  • Allow from all
  • </Limit>
  • <Limit PUT DELETE PATCH PROPPATCH MKCOL COPY
    MOVE LOCK UNLOCK>
  • Order deny,allow
  • Deny from all
  • </Limit>
  • </Directory>

71
Logs
  • The web server logs every transaction.
  • There are several types of logs that used to be
    kept separately in the early days.
  • 209.73.164.50 - - [26/Jan/2003:09:19:51 -0500]
    "GET /ramon/videos/ntsc175.html HTTP/1.1" 206 808
  • Additional information may be kept in the referer
    and user agent log.
  • The referer log may have some interesting
    information on who links to your pages.
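A log line in this layout (the common log format) is easy to pick apart with a regular expression. A minimal sketch, with the hypothetical names CLF and parse_clf; it assumes the default C/English locale for the month abbreviation and does not handle the extra fields of the combined format.

```python
import re
from datetime import datetime

CLF = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)

def parse_clf(line):
    """Return the fields of a common log format line as a dict, or None."""
    match = CLF.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    entry["bytes"] = 0 if entry["bytes"] == "-" else int(entry["bytes"])   # "-" means no body
    return entry

line = ('209.73.164.50 - - [26/Jan/2003:09:19:51 -0500] '
        '"GET /ramon/videos/ntsc175.html HTTP/1.1" 206 808')
entry = parse_clf(line)
```

In the sample line the status 206 shows that the client made a partial GET and received 808 bytes.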

72
Alias
  • Is a directive to make links between things that
    are seen at the URL level and the file structure
    on the physical machine.
  • Example
  • Alias /stuff /home/krichel/stuff
  • Will show the content of /home/krichel/stuff at
    the URL path /stuff on the server.
  • ScriptAlias works in the same way but allows for
    scripts to be executed.
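The mapping that an alias sets up, from URL path to file path, can be sketched like this. It is a toy model, not Apache's own algorithm: the function name translate and the document root /var/www are assumptions, and there is no protection against ".." in the path.

```python
def translate(url_path, aliases, document_root):
    """Map a URL path to a file system path using (url_prefix, directory) pairs."""
    for url_prefix, directory in aliases:
        prefix = url_prefix.rstrip("/")
        # Match the prefix itself or anything below it, but not "/stuffing".
        if url_path == prefix or url_path.startswith(prefix + "/"):
            return directory + url_path[len(prefix):]
    # No alias applies: the path is looked up under the document root.
    return document_root.rstrip("/") + url_path

ALIASES = [("/stuff", "/home/krichel/stuff")]
```

With this table, translate("/stuff/notes.html", ALIASES, "/var/www") points into /home/krichel/stuff, while every other path stays under the document root.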

73
Virtual hosts
  • Most apache directives can be wrapped in a
    <VirtualHost> </VirtualHost> grouping.
  • This implies that they only hold for the virtual
    host. Example, from wotan
  • <VirtualHost *>
  • ServerAdmin krichel@openlib.org
  • DocumentRoot /home/connect/public_html
  • ServerName connections2003.liu.edu
  • ErrorLog /var/log/apache/connections2003-error.log
  • CustomLog /var/log/apache/connectios2003-access.log
    common
  • </VirtualHost>
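Choosing a virtual host by DNS name comes down to looking at the Host header of the request. A minimal sketch of that lookup: the function name select_document_root and the default document root /var/www are assumptions, and IPv6 address literals in the Host header are not handled.

```python
VIRTUAL_HOSTS = {
    # ServerName -> DocumentRoot, taken from the example above
    "connections2003.liu.edu": "/home/connect/public_html",
}

def select_document_root(host_header, virtual_hosts, default_root):
    """Pick the document root for a request from its Host header."""
    name = host_header.split(":", 1)[0].strip().lower()   # drop an optional port
    return virtual_hosts.get(name, default_root)          # fall back to the main server
```

A request with Host: connections2003.liu.edu is served from /home/connect/public_html; any other name falls through to the main server.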

74
http://openlib.org/home/krichel
  • Thank you for your attention!