Title: LIS650 lecture 1: Major HTML
1 LIS650 lecture 1: Major HTML
- Thomas Krichel
- 2004-10-02
2 structure
- It's not just about HTML
- web
- web server
- markup
- XML
- HTML
- fairly general but abstract
3 literature
- I work from the text of the official standard at http://www.w3.org/TR/html4/
- To work with it faster, I made a copy at http://wotan.liu.edu/krichel/html4/
- You can work from any HTML book.
4 The World Wide Web
- The World Wide Web (Web) is a network of information resources. The Web relies on three mechanisms to make these resources readily available to the widest possible audience:
  - A uniform naming scheme for locating resources on the Web (i.e. URIs).
  - Protocols, for access to named resources over the Internet (e.g., http).
  - Hypertext, for easy navigation among resources (e.g., HTML).
5 URI introduction
- Every resource available on the Web -- HTML document, image, video clip, program, etc. -- has an address that may be encoded by a Uniform Resource Identifier, or "URI".
- URIs typically consist of three pieces:
  - The name of the mechanism used to access the resource, or to otherwise resolve it.
  - The name of the machine hosting the resource.
  - The name of the resource itself, given as a path.
6 example URI
- http://openlib.org/home/krichel
  - This URI may be read as follows: There is a document available via the HTTP protocol, residing on the site openlib.org, accessible via the path "/home/krichel".
- mailto:krichel@openlib.org
  - This URI may be read as follows: There is an email user krichel in the domain openlib.org to whom email may be sent.
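The reading of a URI as mechanism, machine and path can be checked mechanically. Here is a minimal sketch using Python's standard urllib.parse module; the helper name uri_pieces is made up for this illustration.

```python
from urllib.parse import urlsplit

def uri_pieces(uri):
    """Return (mechanism, machine, path) for a URI."""
    parts = urlsplit(uri)
    return parts.scheme, parts.netloc, parts.path

# The http URI has all three pieces.
print(uri_pieces("http://openlib.org/home/krichel"))
# The mailto URI names no machine; the address ends up in the path piece.
print(uri_pieces("mailto:krichel@openlib.org"))
```

Note that not every URI has all three pieces: the mailto example has an empty machine part.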
7 Internet application protocols
- On the Internet, machines use different application-level protocols to do things.
- Common protocols include
  - http, dns, telnet
  - smtp, ssh, ftp
- All of the ones cited are client/server protocols:
  - the client issues a request
  - the server gives a response
- Each of them uses a different default port. A port is a number that tells the machine which service should handle the incoming stream of data.
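To make the idea of a port concrete, here is a small sketch that lists the default port numbers registered for the protocols named on this slide. The table and the helper default_port are written out by hand for illustration; a server can be configured to listen on other ports.

```python
# Default port numbers for the protocols named above
# (for ftp this is the control connection).
DEFAULT_PORTS = {
    "ftp": 21,
    "ssh": 22,
    "telnet": 23,
    "smtp": 25,
    "dns": 53,
    "http": 80,
}

def default_port(protocol):
    """Look up the default port of a protocol, ignoring letter case."""
    return DEFAULT_PORTS[protocol.lower()]

print(default_port("http"))
print(default_port("ssh"))
```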
8 http
- The web operates mostly on http.
- The client software runs on the local PC that you are using. It is called
  - a web browser (not politically correct)
  - a user agent (that's better)
- Our server is a piece of hardware called wotan.liu.edu, wotan for short.
- It runs the Debian GNU/Linux operating system on an Intel architecture.
- It provides http daemon software that serves http requests. The particular software is called Apache.
9 main features of http
- http is insecure. The contents of http transactions (requests/responses) can be observed.
- http is stateless. Each transaction is self-contained and has no relationship to the previous one.
- http has a limited vocabulary of requests and responses. It is no good, say, for operating a machine remotely.
- We can therefore not use it to communicate with the server.
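To see what "stateless" and "limited vocabulary" mean in practice, here is a sketch that only builds the text of a minimal GET request; it does not contact any server. The function name build_get_request is invented for this example.

```python
def build_get_request(host, path):
    """Return the text of a minimal HTTP/1.0 GET request."""
    lines = [
        f"GET {path} HTTP/1.0",  # the request word, the resource, the version
        f"Host: {host}",         # the machine the request is meant for
        "",                      # an empty line ends the request headers
        "",
    ]
    return "\r\n".join(lines)

first = build_get_request("openlib.org", "/home/krichel")
second = build_get_request("openlib.org", "/home/krichel")
print(first)
# Nothing in the second request refers to the first one.
print(first == second)
```

The request is plain text, which is also why anybody who can watch the network can read it.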
10 working with a remote machine
- There are two traditional ways to work with a remote machine:
  - issue commands to it
    - used to be done with telnet
  - transfer files to and from it
    - used to be done with ftp
- Telnet and ftp servers are not available on wotan.liu.edu. Telnet and ftp do not encrypt the communication stream. Therefore they are not secure.
11 communication with wotan
- The protocol that we use for communicating with the server is the secure shell, ssh for short. It is based on public-key cryptography.
- There are two PC programs commonly used as ssh clients:
  - putty for issuing commands
  - winscp for file transfer.
- winscp is the one we will use. It offers a range of other facilities besides file transfer.
12 registration time
- As part of the course, you are being provided with web space on the server wotan.liu.edu, at the URL
  - http://wotan.liu.edu/username
  - where username is a user name that you will choose now.
- It is my intention to maintain this web space for you into the foreseeable future.
- You should also choose a password, now.
- I will now register you.
13 free software
- I maintain the wotan.liu.edu server, but you can build your own server if
  - you have Internet access
  - you have an old PC to spare
- All the server software, as well as putty and winscp, is free and open-source.
- It is one of my fundamental beliefs that free information should run on free software.
- The library community can learn a hell of a lot from the free software community.
- See my talk at http://openlib.org/home/krichel/presentations/new_york_2003-11-07.ppt
14 installing winscp
- http://winscp.sourceforge.net/eng/download.php has
  - an installation package, for use if you have administrator rights on the machine where you are installing
  - the application on its own, for use otherwise, i.e. to just download and run the application
- At installation time, when/if asked about the default interface, I suggest you use the Windows Explorer style, rather than the default Norton Commander style. You can change that later, so no panic.
15 other stuff: installing user agents
- Download and install a recent version of at least two browsers. I suggest
  - Mozilla Firefox at http://www.mozilla.org/products/firefox/
  - Netscape Navigator at http://channels.netscape.com/ns/browsers/download.jsp
  - Opera at http://www.opera.com
16 open a wotan session
- start winscp
- the host name is wotan.liu.edu
- give your user name
- click on save; this will save the session
- after ok, you will be led to the list of saved sessions
- double click to open the session
- Note
  - you can save the password as part of the session
  - it is risky to do that in a public classroom
17 initial remote files on wotan
- a set of files starting with a dot.
  - These are places where Linux Masters exert their black magic.
  - Leave them alone.
- a directory called public_html
  - This is the place where web masters exert their magic. You can go into that directory to see the files that you have on your web site at the moment.
  - There should be two files
    - empty.html
    - validated.html
18 public_html
- Imagine you are user user and you have a file file in public_html.
- The web server will map requests to http://wotan.liu.edu/user/file to show the file public_html/file.
- Here user stands for your user id, file is the file name, and / is the directory separator.
- If file ends with .html or .htm, the web browser will be told that the file is an HTML file. It will be rendered accordingly by the browser.
19 index.html
- The web server on wotan will map requests to http://wotan.liu.edu/user to show the file public_html/index.html
- If this file is not there, the server will prepare an HTML document from the list of files that it finds in the directory and send it to the user agent.
- Once you have a file index.html, the web user can no longer see the list of individual files in your directory.
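The mapping from request to file described on the last two slides can be sketched as a small function. This is only an illustration: the location of the home directories (HOME_ROOT) and the function name file_for_request are assumptions, and the real work is done by the configuration of the web server. The fallback to a generated file list is not modelled here.

```python
from pathlib import PurePosixPath

# Assumed location of the users' home directories on the server.
HOME_ROOT = PurePosixPath("/home")

def file_for_request(url_path):
    """Map the path part of a request URL to a file under public_html."""
    parts = [p for p in url_path.split("/") if p]
    if not parts:
        raise ValueError("the path must start with a user name")
    user, rest = parts[0], parts[1:]
    if not rest:
        # No file named in the request: serve index.html.
        rest = ["index.html"]
    return HOME_ROOT.joinpath(user, "public_html", *rest)

print(file_for_request("/user/tryout.html"))
print(file_for_request("/user"))
```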
20 HTML and XHTML
- HTML is the hypertext markup language.
- HTML is a markup language that is widely used on the Web.
- The latest, and probably last, version of HTML is at http://www.w3.org/TR/html4/
- The W3C, the standard-making body for the WWW, has issued XHTML, a replacement of HTML that is compatible with XML.
- We will work with XHTML.
21 SGML HTML XML
- You will probably have come across these terms.
- SGML was developed first. HTML and XML are developed from SGML in different ways.
  - HTML is defined by an SGML DTD
  - XML is a restricted form of SGML
- One common thing here is the ML. It stands for Markup Language.
- Markup is everything in a document that is not content.
  - (something to scratch your head about)
22 procedural/descriptive
- Markup can be given in two ways
- 1. Procedural
  - Codes identify point size, style, font, etc.
  - Usually only understood by the defining tool
  - Example: Microsoft Word
- 2. Descriptive
  - Describes the purpose of text within the document
  - Chapter head, paragraph, section head, TOC
  - Structure and style are kept separate
  - Example: LaTeX, SGML
23 SGML
- Standard Generalized Markup Language
- Descriptive approach with three separate layers
  - structure: types of information in the document
  - content: the information itself
  - style: defines how to typeset the document
- Developed for the publishing industry by a group around Goldfarb.
- So complicated that no software implements it fully.
- But an important idea that remains of it is the document type definition.
24 Document Type Definition (DTD)
- The DTD is written in a syntax of its own, different from document markup, and describes a class of SGML documents.
- It describes the information the document handles, e.g.
  - title
  - chapter
- It describes relationships between fields, e.g.
  - a chapter contains sections
- It ensures consistency and logical structure.
25 XML
- Since SGML is so complicated, it is not good for use on the Web.
- So the W3C has issued XML, the eXtensible Markup Language.
- Every XML document is SGML, but not the opposite.
- Thus XML is like SGML but with many features removed.
26 XML elements
- XML is based on elements. There are basically three ways of writing an element.
- The first way is to write <name/>
  - Here name is the name of the element.
- Example
  - <bang/>
- Such an element is called an empty element. Here its name is bang.
27 non-empty elements
- If name is the name of the element, you can give the element contents contents by writing <name>contents</name>.
- Examples
  - <greeting>bonjour</greeting>
  - <greeting>????????????</greeting>
  - <sentence>She says <greeting>hello</greeting> to you.</sentence>
- In fact <name/> is just a shortcut for <name></name>.
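The ways of writing an element can be tried out with any XML parser. A minimal sketch with Python's standard xml.etree.ElementTree module, using the examples from the last two slides:

```python
import xml.etree.ElementTree as ET

empty_short = ET.fromstring("<bang/>")
empty_long = ET.fromstring("<bang></bang>")
sentence = ET.fromstring(
    "<sentence>She says <greeting>hello</greeting> to you.</sentence>"
)

# Both spellings of the empty element give the same result.
print(empty_short.tag, empty_short.text, len(empty_short))
print(empty_long.tag, empty_long.text, len(empty_long))

# The sentence element contains text, then an element, then more text.
print(repr(sentence.text))
print(sentence[0].tag, sentence[0].text)
print(repr(sentence[0].tail))
```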
28 attributes to elements
- Elements can have attributes. Here is an element with two attributes:
  - <name attribute_name_one="value_one" attribute_name_two="value_two"/>
- Here attribute_name_one and attribute_name_two are attribute names and value_one and value_two are attribute values. The element itself is empty.
- Example: <greeting language="french">beaujour</greeting>
29 more on attributes
- No two attributes of the same element can have the same name.
- Attribute values are simple strings. You cannot have an element inside an attribute.
- Attribute names are separated from their values by the = sign.
- Attribute values can be enclosed in single or double quotes. It does not matter. Double quotes are more common, so I suggest you use those.
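These rules about attributes can be checked with a parser. The sketch below uses Python's xml.etree.ElementTree; the helper parses is made up for the example and simply reports whether the parser accepts a piece of XML.

```python
import xml.etree.ElementTree as ET

def parses(text):
    """Report whether the XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

double = ET.fromstring('<greeting language="french">bonjour</greeting>')
single = ET.fromstring("<greeting language='french'>bonjour</greeting>")

# Single and double quotes give the same attribute value.
print(double.attrib == single.attrib)
# The same attribute name twice is rejected.
print(parses('<greeting language="french" language="german"/>'))
# So is an attribute value without quotes.
print(parses("<greeting language=french/>"))
```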
30 XML document
- An XML document is a piece of data that is written in XML.
- But sometimes the author of a document makes a mistake, and in fact the XML is wrong in some way.
- If there is no mistake, the document is called well-formed.
- If a document is not well-formed, it really is not an XML document.
31 some rules for well-formedness
- There must be one single outermost element in the document.
  - It is called the root element.
  - All other elements are inside the root, as its children or their descendants.
- Whitespace that surrounds the root element is ignored.
- All elements must be properly nested. You can only close the outer element after all inner elements are closed. Examples
  - <a><b></a></b> not well-formed
  - <a><b></b></a> well-formed
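An XML parser is the simplest well-formedness checker: it refuses anything that is not well-formed. A minimal sketch, with the function name is_well_formed chosen for this illustration:

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

print(is_well_formed("<a><b></a></b>"))   # badly nested
print(is_well_formed("<a><b></b></a>"))   # properly nested
print(is_well_formed("<a/><b/>"))         # two root elements
print(is_well_formed("  <a/>  "))         # whitespace around the root
```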
32 other stuff: comments
- In an XML document, you can make comments about your code. These are notes to yourself.
- Comments start with <!--
- Comments end with -->
- Example: <!-- this is a comment -->
- Comments cannot be nested.
- Comments can appear almost anywhere in the document, but not inside a tag.
33 other stuff: XML declaration
- The XML declaration is a special line that says that what follows is XML and gives some very basic information about that XML. It is trendy to use it.
- It is optional, but if it is there it has to be on the first line.
- You will need to have an XML declaration if your character encoding is not UTF-8. We will come back to this point later.
34 other stuff: XML declaration
- Normally the XML declaration looks like
  - <?xml version="1.0" encoding="encoding"?>
  - where encoding is the character encoding. By default, the character encoding is UTF-8, so if you use that, you do not need to mention it.
- There is now a version 1.1 of XML around, but
  - it is not widely deployed
  - it is not much different from version 1.0
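The role of the encoding in the XML declaration can be seen with a small experiment. The sketch below saves a greeting with an accented letter in ISO-8859-1 and parses it with and without a declaration. The sample text is made up for the illustration.

```python
import xml.etree.ElementTree as ET

# "caf" followed by e with acute accent, stored as ISO-8859-1 bytes.
latin1_bytes = "<greeting>caf\u00e9</greeting>".encode("iso-8859-1")
declared = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + latin1_bytes

# With the declaration, the parser knows how to read the bytes.
print(ET.fromstring(declared).text == "caf\u00e9")

# Without it, the parser assumes UTF-8 and rejects the document.
try:
    ET.fromstring(latin1_bytes)
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False
print(undeclared_ok)
```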
35 other stuff: document type declaration
- XML documents, like any SGML documents, accept document type declarations.
- A document type declaration tells us something about the vocabulary of elements and attributes used in the document.
- It should appear before the root element, after the XML declaration, if you have one.
- It takes the form <!DOCTYPE mumbojumbo >
- We will come back to the document type declaration later.
36 HTML
- HyperText Markup Language
- HTML is an SGML DTD
- Head, Title, Body, Paragraph, etc.
- Headings, Bold, Italic, etc.
- Table, List, Image, etc.
- Links to other documents
- Forms
- and many others
37 HTML history
- HTML was a very bare-bones language when first invented by Tim Berners-Lee. It did not describe pages with much visual appeal.
- In the 90s, successful browsers invented extensions that aimed to stretch the visual boundaries of HTML.
- Some of these extensions found their way into the official HTML spec issued by the W3C.
- Later the W3C developed style sheets as a way to accommodate display requirements without having to extend HTML.
38 HTML versions
- HTML 4.01 is the last version of HTML. This version has different DTDs, including
  - the loose DTD
  - the strict DTD
- I only cover the elements of the strict DTD.
- The loose DTD has more elements, but all the functionality of these elements is best done with style sheets.
- Thus, the pages created with HTML only will look rather boring.
- But we do cover style sheets later.
39 XHTML
- XHTML is HTML written in an XML syntax.
- Every XHTML document has to be well-formed XML.
- non-XHTML HTML documents can violate some well-formedness constraints, including
  - HTML element names are not case sensitive
  - some HTML elements do not need closing.
  - there is no need for a single root element in an HTML document.
40 XHTML: pain without gain?
- In this course we study XHTML.
- When I say HTML in the following, I mean XHTML.
- Reasons to study XHTML rather than HTML
  - syntactic rules of XML are easier to understand.
  - any tool that can work with XML can be applied to XHTML, but cannot be applied to HTML.
  - in general XML documents are more computer understandable. This is crucial in the age of the search engine.
41 Example HTML snippet
- <a href="http://openlib.org/home/krichel" title="homepage of Thomas Krichel">Thomas Krichel</a>
- the whole thing is an <a> element. It creates an anchor. (I use < and > to surround element names.)
- href is an attribute name
- http://openlib.org/home/krichel is the value of the "href" attribute
  - (I surround attribute names with straight quotes)
- 'Thomas Krichel' is character data.
42 Characters concept
- A character set combines two things
  - Character repertoire: a set of characters, e.g. "A", "?", "?", "?"
  - Character code positions: defines a number for each character in the repertoire.
- Character encoding is a way to encode the code positions in bytes.
- To correctly display a document, the user agent needs to know both!
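The difference between a code position and an encoding can be seen on a single character. A minimal sketch in Python, using the letter e with acute accent as the sample character:

```python
char = "\u00e9"  # e with acute accent

# The code position is one number for the character.
print(ord(char))

# Different encodings turn that number into different bytes.
print(char.encode("iso-8859-1"))
print(char.encode("utf-8"))

# Reading UTF-8 bytes with the wrong encoding gives garbage.
print(char.encode("utf-8").decode("iso-8859-1"))
```

The last line is what a user agent shows when it guesses the encoding wrongly.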
43 playing safe with characters
- Only use the characters on the US keyboard, don't insert symbols.
- Save as ASCII or UTF-8. All ASCII files are also UTF-8 files.
- Never save as "Unicode" within MS Notepad.
- If you encounter a character that is not on your keyboard, use an SGML entity.
- The SGML entity is the last special SGML thing that we have to study.
44 SGML entities
- SGML entities are a way to represent non-ASCII characters when only ASCII input is possible.
- Codes can be names
  - Ex. &eacute;
  - Inserts an e with acute accent.
  - this is called a character entity
  - Codes are often abbreviations of the character names
- Codes can be numbers, in decimal or hex form
  - Ex. &#38; (decimal) or &#x26; (hex) to insert an ampersand
  - this is called a numeric entity
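What a user agent does with an entity can be imitated with the unescape function of Python's standard html module. This is only a sketch of the idea; it covers the HTML entities, both the named ones (written with an ampersand in front and a semicolon behind) and the numeric ones (with an additional # sign).

```python
from html import unescape

# A named (character) entity for e with acute accent.
named = unescape("caf&eacute;")
# The ampersand as a numeric entity, in decimal and in hex form.
decimal = unescape("&#38;")
hexadecimal = unescape("&#x26;")

print(named)
print(decimal, hexadecimal)
```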
45 XHTML entities
- They are officially defined in three files that are maintained by the W3C
  - http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent
  - http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent
  - http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent
- A sample line is
  - <!ENTITY ccedil "&#231;"> <!-- latin small letter c with cedilla, U+00E7 ISOlat1 -->
- <!ENTITY is DTD speak for defining an entity
- it is followed by the character form and the numeric form of the entity
- the rest of the line is a comment, of course
46 entities used in XML
- There are three that you need to know and use.
  - &lt; stands for <
  - &gt; stands for >
  - &amp; stands for &
- Every time you want to insert <, > or & in the documents, you have to use the entities instead.
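Replacing the three special characters by hand is error-prone, so it is usually left to a tool. A minimal sketch with the escape and unescape functions from Python's standard xml.sax.saxutils module; the sample sentence is made up.

```python
from xml.sax.saxutils import escape, unescape

text = "if a < b & b > c"

# Replace the three special characters by their entities.
safe = escape(text)
print(safe)

# Going back recovers the original text.
print(unescape(safe) == text)
```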
47 another look at empty.html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head>
<title></title>
<meta http-equiv="content-type" content="text/html; charset=UTF-8"/>
</head>
<body></body>
</html>
48 empty.html dissected
- the <!DOCTYPE ... > is an SGML document type declaration. It says that the document contains XHTML of the strict flavor.
- The document type declaration is the only thing that we have in the prolog. We could have placed an XML declaration before it but chose not to do so.
- <html> is the root element. It contains some other elements. Some of these we discuss now, others later.
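Since an XHTML document is XML, the dissection can also be done by an XML parser. The sketch below feeds the text of empty.html, as reconstructed from the slide, to Python's xml.etree.ElementTree and lists the root element and what it contains. The parser reads the document type declaration but does not fetch the DTD.

```python
import xml.etree.ElementTree as ET

EMPTY_HTML = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head>
<title></title>
<meta http-equiv="content-type" content="text/html; charset=UTF-8"/>
</head>
<body></body>
</html>"""

root = ET.fromstring(EMPTY_HTML)
head = root.find("head")

print(root.tag)                          # the root element
print([child.tag for child in root])     # its contents
print([child.tag for child in head])     # the contents of the head
print(head.find("meta").get("content"))  # value of the content attribute
```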
49 the <html> element
- It is the root element of an XHTML document.
- It has required contents <head> and <body>.
- It has two optional attributes
  - the "dir" attribute says in which direction the contents are rendered. The classic value is "ltr"; "rtl" is also valid.
  - the "lang" attribute says in which language the contents are. Use ISO 639 codes, e.g. lang="en-us"
  - these two attributes are known as the internationalization (i18n) attributes.
- Example: <html lang="en-us"> </html>
50 the <title> element
- appears in the <head>
- defines the title of the document
- takes the i18n attributes
- Example
  - <html><head lang="en-us">
  - <title>Thomas Krichel's favorite limerick</title>
  - </head>
  - <body><div>There was a young friar named Tuck
- it must not contain other HTML tags.
51 usability concerns with <title>
- The title is used by the user agent in a special manner
  - as bookmark default title
  - as the title for a window in which the user agent runs
- Google uses the title as anchor text to your web page.
  - It is a crucial ad for your page
  - Google may truncate the title.
- Bad ideas for titles
  - section 1
  - home page
52 the <body> element
- This encloses the contents of the page as opposed to its header.
- Validation requires one and only one body.
- It takes the i18n attributes, as well as some others. These fall into another group of attributes we call core attributes.
- We will study those core attributes now.
53 core attributes: "id"
- This attribute assigns a name to an element.
- This name must be unique in a document. In the <body> element, this requirement is superfluous, of course.
- The "id" attribute has several roles in HTML, including
  - As a style sheet selector
  - As a target anchor for hypertext links
54 core attributes: "class"
- The "class" attribute is a friend of the "id" attribute.
- It assigns one or more class names to an element. Class names are separated by white space. The element may be said to belong to these classes. A class name may be shared by several elements.
- The "class" attribute has several roles in HTML, but it is most useful as a style sheet selector, when you want to assign style information to a set of elements.
55 Example for "class" and "id"
- <p class="limerick" id="limerick_1">
- There was a young man from Peru<br/>
- Whose limericks stopped at line two.</p>
- <p>OK, that's a stupid limerick. Let us look at another</p>
- <p class="limerick" id="limerick_2">
- There was a young man from Japan<br/>
- Whose limericks would never scan<br/>
- And when they asked why<br/>
- He said it is because I<br/>
- Try to put as many words into the last line as I possibly can.</p>
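The difference between "id" (unique) and "class" (shared) can be checked on the limerick example. The sketch below wraps a shortened copy of the paragraphs in a div element, which is an addition of mine so that the snippet has a single root, and then looks for duplicate ids and groups the ids by class. The function names are made up.

```python
import xml.etree.ElementTree as ET
from collections import Counter

SNIPPET = """<div>
<p class="limerick" id="limerick_1">There was a young man from Peru<br/>
Whose limericks stopped at line two.</p>
<p>OK, that's a stupid limerick. Let us look at another</p>
<p class="limerick" id="limerick_2">There was a young man from Japan<br/>
Whose limericks would never scan</p>
</div>"""

def duplicate_ids(root):
    """Return the id values that occur more than once."""
    counts = Counter(
        el.get("id") for el in root.iter() if el.get("id") is not None
    )
    return sorted(name for name, n in counts.items() if n > 1)

def ids_by_class(root):
    """Group the ids of elements by each of their class names."""
    groups = {}
    for el in root.iter():
        # Class names are separated by white space.
        for name in el.get("class", "").split():
            groups.setdefault(name, []).append(el.get("id"))
    return groups

root = ET.fromstring(SNIPPET)
print(duplicate_ids(root))
print(ids_by_class(root))
```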
56 core attributes: "title"
- The "title" attribute sets a title for use with the element.
- There is no prescribed way in which the title is rendered by a user agent.
- Sometimes it is shown as a tool tip, i.e. something that flashes up when the mouse is rolled over the element.
- This is not to say that the "title" attribute is for flashers only.
57 core attributes: "style"
- Use the "style" attribute to give style information to a particular element.
- This will be discussed more when we do the style sheets.
- Usually there are better ways to attach style information than writing it onto every element. It is better to place the elements into a class by giving them the same "class" attribute, and then give style sheet information for the class.
- We will discuss this later.
58 summary: core attributes
- To summarize, we have a group of core attributes.
- These attributes can be used with almost all elements.
- There are other attributes that can be almost universally used, called "event attributes", but they have to do with scripting, so they are not studied in this course.
59 block-level vs text-level elements
- Block-level elements contain data that is aligned vertically by visual user agents.
- Text-level elements are aligned horizontally by visual user agents.
- The reasons behind this distinction
  - Block-level elements can contain other block-level elements and text-level elements.
  - Text-level elements cannot contain block-level elements.
  - Visual user agents start a new line at the beginning of block-level elements.
  - Multidirectional text would be impossible without it.
60 the <div> and <p> elements
- The <div> element allows you to set arbitrary block-level divisions in your document.
- It takes the core attributes.
- RULE: put all your contents that is vertically aligned into a <div>.
- The <p> element is like <div> but it signals the start and end of a paragraph.
61 the <br/> element
- is used to create a line break.
- Note its emptiness!
- It has the "clear" attribute that can take the values "left", "right", "all" and "none". This prevents textual contents from floating around other content.
62 The <span> element
- This is another element for arbitrary divisions, but it operates on inline content. This is content that is put in lines horizontally, as opposed to block-level content, which is stacked vertically.
- Admits core attributes.
- Put things in a <span> that belong together in a line.
63 <span> example
- A worse poet however was
- J<span class="r">enny</span>.<br/>
- Her limericks weren't worth a P<span class="r">enny</span><br/>
- Though the invention was
- s<span class="r">ound</span><br/>
- She always f<span class="r">ound</span><br/>
- That, whenever she tried to write <span class="r">any</span><br/>
- She always had one line to
- m<span class="r">any</span><br/>.
64 abstraction ends here
- Up until now, we have covered a lot of abstract elements and attributes that do not achieve much visual impact.
- Instead, they
  - point the style sheet to where things are
  - create a semantic design
- We will now turn to more physical descriptions.
65 try it out
- right click empty.html in your winscp window.
- you will see the option to copy the file.
- copy it, say to tryout.html.
- right-click tryout.html and choose edit.
- you can now edit tryout.html
- open a user agent to
  - http://wotan.liu.edu/user/tryout.html
  - where user is your user name. You should be able to see your changes, as last saved.
66 the <a> element I
- opens a hyperlink; the contents of the element is the anchor text. It is limited to text-level content
- "href" attribute has the target URL
- "hreflang" has the language of the target
- "type" attribute gives the MIME-type of the target
- Some other attributes for which we have no use
  - coords shape accesskey tabindex
- and of course, <a> takes the core attributes
67 the <a> element II
- It takes the "rel" attribute to specify the relationship between the current document and the link target, as well as the "rev" attribute to specify the reverse.
- This is not currently well supported by the browsers.
- I will come back to these relational attributes when discussing the <link> element.
- Ex: <a href="http://openlib.org/home/krichel">a nice man</a>.
68 linking within a document
- If the "id" attribute of an element in a document at a URL URL is set to id, you can make the element the target of a link.
- You use the URL URL#id for this purpose.
- If the document linked to is the current document, you don't need to reference its URL.
- example: <a href="http://openlib.org/home/krichel#joke">joke</a> links to the element with id "joke" in Thomas Krichel's homepage.
69 the <img> element I
- makes an image.
- "src" attribute says where the image is
- "alt" attribute gives a text to show for user agents that do not display images. It may be shown by the user agent as the user highlights the image. It is limited to 1024 characters.
- "longdesc" attribute gives the URL of a longer description that supplements "alt". It does not have the length limitation.
- Example: <img src="thomas_krichel.jpg" alt="picture of Thomas Krichel"/>
70 the <img> element II
- "width" attribute gives the user agent a suggestion for the width of the image.
- "height" attribute gives the user agent a suggestion for the height of the image.
- both can be expressed
  - in pixels, as a number
  - as a percentage of the current display width
- of course <img> supports the core attributes.
71 HTML checking
- validated.html has some additional code (as compared to empty.html), that we can now understand.
- <p>
- <a href="http://validator.w3.org/check?uri=referer">
- <img style="border: 0pt"
- src="http://wotan.liu.edu/valid-xhtml10.png"
- alt="Valid XHTML 1.0!" height="31"
- width="88" />
- </a></p>
- click on the icon to check your code. That's cool!
72 header elements
- Headers <h1> to <h6>
- Simple form of text formatting
- Vary text size based on the header's level.
- Actual size of text of header element is selected by browser.
- Results can vary significantly between user agents.
- All take the core attributes.
73 <hr/> element
- creates a horizontal rule
- admits the core attributes
- other attributes have been deprecated, i.e. are allowed in the loose DTD but not the strict one.
74 contents-based style elements
- <abbr> encloses abbreviations
- <acronym> encloses acronyms
- <cite> encloses citations
- <code> encloses computer code snippets
- <dfn> encloses things being defined
- <em> encloses emphasized text
- <kbd> encloses text typed on a keyboard
- <samp> encloses literal samples
- <strong> encloses strong text
- <var> encloses variables
- all admit the core attributes
75 physical style elements
- <b> encloses bold contents
- <big> encloses big contents
- <small> encloses small contents
- <i> encloses italics contents
- <sub> encloses subscripted contents
- <sup> encloses superscripted contents
- <tt> encloses typewriter-style contents
- all admit the core attributes
76 the <pre> element
- encloses contents that is to be rendered with the characters and line breaks just like in the source text. Markup is still allowed, but elements that do spacing should not be used, obviously.
- It takes the core attributes and a "width" attribute setting the number of characters per line.
77 <blockquote> and <q> elements
- <blockquote> quotes a paragraph
- <q> makes a short quote inside a paragraph
- both take a "cite" attribute whose value is the URL of the source of the quote.
- They also take the core attributes.
78 list elements
- <ol> creates an ordered list.
  - <li> encloses each item
- <ul> creates an unordered list.
  - <li> encloses each item
- <dl> encloses a definition list
  - <dt> encloses the term that is being defined
  - <dd> encloses the definition
- All take the core attributes and the i18n attributes.
79 http://openlib.org/home/krichel
- Thank you for your attention!