Title: SGML, HTML, XML: Do We Really Need All That?
- ISMT Multimedia
- Fall 2002
- Dr Vojislav B. Mišić
2 Lecture Overview
- What is a markup language?
- HTML markup: what's good, what's wrong
- Extensions to HTML (dHTML and style sheets, XML and XSL, ...)
- XML
- Basic elements
- Well-formed vs. valid XML
- Writing a DTD
- Examples of XML
3 Markup languages
- What is markup?
- Text (actual contents of the document)
- is interspersed with markings
- Markup is related to the text
- notes on the content
- notes on text presentation
- but virtually anything can be marked (remember Fermat's Last Theorem?)
- A markup language allows separation of concerns: content vs. presentation
4 Standards for markup
- SGML (developed from IBM's GML, standardized as ISO 8879): a standardized way to write other markup languages (actually, a meta-language)
- An SGML-based language is specified using a DTD (Document Type Definition)
- SGML is not a very user-friendly language, hence its use was rather limited, even though software support for it does exist
5 Other markup languages
- TeX (Knuth) is another widely used markup language
- Performs extremely well for complex texts with
- mathematical formulas and symbols
- cross-references
- different typefaces
- foreign languages
6 A TeX example
    \begin{equation}\label{coh1}
    \Psi (S) = \displaystyle
    \frac{\displaystyle
          \sum_{x \in R (S)}
          \left( \# S_w (x) - 1 \right)}
         {\displaystyle
          \sum_{x \in R (S)}
          \left( \# S - 1 \right)}
    \end{equation}
7 HTML
- HTML (HyperText Markup Language) is the language of the Internet
- Allows platform-independent browsing
- Text-only at first, media later
- Hyperlinks, limited visual formatting
- However, it is far from perfect, and is gradually being replaced (current version: 4.01)
8 HTML markup
- First you write the text, then add appropriate markup tags
- Tags can describe logical entities
- Headings of different levels (H1, H2, ...)
- Lists and list elements (UL, OL, LI)
- But tags can also describe visual effects (display rendering)
- Bold and italic text (B, I)
- Font and typeface changes
9 If you make an error
- Anything not recognized as correct HTML is essentially ignored
- The browser just treats the surrounding content as plain text and displays it directly
- In this manner, users are still able to see most of the document, albeit without proper formatting
- Your opinion: is this good or bad?
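A quick way to see this tolerance in action is Python's standard html.parser module, which, like a browser, never rejects a page. The sketch below (the sample markup is made up) collects only the text: the unclosed <B> and the invented <BLINKY> tag produce no error at all. It illustrates the idea, not how any particular browser is implemented.

```python
from html.parser import HTMLParser

class TextCollector(HTMLParser):
    """Collect the character data of a page, ignoring every tag."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

def visible_text(html: str) -> str:
    collector = TextCollector()
    collector.feed(html)
    collector.close()  # flush any text still buffered at the end
    return "".join(collector.chunks)

# <B> is never closed and <BLINKY> is not an HTML tag at all,
# yet the parser reports no error and the text survives.
print(visible_text("<P>Hello <B>brave <BLINKY>new</BLINKY> world"))
```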
10 HTML editing
- HTML source is ASCII text and essentially layout-independent
- Plain text editors can be used
- You can add extra white space to your heart's content, with no effect on what is displayed by the browser
- Most browsers allow you to view and save the HTML source of the displayed document: the quickest way to learn HTML
- HTML is interpreted, so editing changes are displayed (almost) instantly
11 HTML on the Internet
- HTML browsers can display graphics and other media objects
- ... although HTML by itself provides only the most primitive support for multimedia
- Tags can specify target URLs (hyperlinks)
- Error tolerance ensures that anyone with a browser (any browser) can access HTML documents
- All of which made HTML the language of choice for hypertext on the Internet
12 More HTML features
- Visual formatting is allowed but not enforced
- You can specify a typeface, but the browser will substitute one of its own choice if the specified one is not available
- The user can easily change the presentation
- Just resize the window and select different fonts/sizes
- Browser differences (IE vs. Navigator): actually, not very important any more
13 HTML Interactivity
- Interactivity at first limited to hyperlinks
- Forms introduced later (Navigator 3)
- Form support still limited: most often, client- or server-side scripting is required
- Proliferation of scripting languages
- CGI scripts
- JavaScript and JScript (more details later)
- VBScript, ASP
- Perl
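To give a feel for what a server-side script has to do with a submitted form, here is a minimal Python sketch that decodes the form data. It assumes the browser sent the fields URL-encoded (the usual default for forms); the field names and values are invented for illustration.

```python
from urllib.parse import parse_qs

def decode_form(body: str) -> dict:
    """Decode an application/x-www-form-urlencoded body into field -> value.

    parse_qs returns a list per field (a field may repeat); this sketch
    keeps only the first value of each field.
    """
    return {name: values[0] for name, values in parse_qs(body).items()}

# '+' encodes a space and %21 encodes '!'
fields = decode_form("name=Ann+Lee&course=ISMT&comment=HTML+is+fun%21")
print(fields)
```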
14 Is HTML a Good Markup Language?
- Logical and visual formatting capabilities are mixed together
- Some people argue for a cleaner separation of logical from visual formatting
- Others want more author control
- Many extensions (some proprietary)
- Changes generally lean towards greater author control over document rendering: more direct formatting instructions are included
15 Dynamic HTML
- A commercial term: there is no such thing as a dHTML standard
- Combination of HTML with new technologies
- Stylesheets add greater author control
- Scripting allows improved interactivity, including user input
- Even simple animations are possible
- As always, the extensions by Microsoft and Netscape are not quite compatible
16 HTML styles
- In standard HTML, logical markup tags (such as <H1>) have predefined properties for
- Typeface
- Font size
- Mode (e.g., bold or italic)
- Line spacing
- Properties cannot be changed, and we cannot define our own tags
- The only way is to use a (possibly way too long) sequence of appropriate primitive tags every time: not a very convenient solution
17 Stylesheets to the rescue
- Cascading Style Sheets (CSS): cleaner separation of presentation from actual content
- Style: a named set of properties that define the presentation of a chunk of text (character, paragraph, ...)
- Styles are present in text processing software (WinWord), but in some markup languages as well (TeX)
- CSS is used with HTML, but it's not HTML, although browsers know how to handle them together
18 CSS Syntax
- A CSS stylesheet contains a set of rules, each with a selector (name) and a number of properties and their values
- Rules can be
- Inline (within an HTML tag, in the document body)
- Embedded (in the head of an HTML document)
- External: in a separate file which is then linked or imported into an HTML document
- The position of a rule defines the scope of its effect on the document
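The rule structure (a selector followed by property: value pairs) is simple enough that a toy parser fits in a few lines. This Python sketch is illustrative only: it assumes there are no comments, no @-rules and no braces or semicolons inside values, all of which a real CSS parser must handle.

```python
import re

def parse_rules(css: str) -> list[tuple[str, dict[str, str]]]:
    """Split a stylesheet into (selector, {property: value}) pairs."""
    rules = []
    # Each match is "selector { declarations }"
    for selector, body in re.findall(r"([^{}]+)\{([^}]*)\}", css):
        props = {}
        for declaration in body.split(";"):
            if ":" in declaration:
                name, value = declaration.split(":", 1)
                props[name.strip()] = value.strip()
        rules.append((selector.strip(), props))
    return rules

sheet = "B { font: bold 18pt times, serif; text-decoration: underline }"
print(parse_rules(sheet))
```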
19 CSS Selectors
- HTML selectors: the names of HTML tags (e.g., B, P)
- Class selectors: can be applied to any HTML tag
- ID selectors: usually applied only once per page, to a particular HTML tag
- The type of HTML tag defines the scope of CSS properties
- Block level (DIV, LI, H1)
- Inline (B, FONT, TT)
- Replaced tags (IMG)
20 CSS Properties
- Always of the form property: value
- Categories of properties control
- Typefaces (fonts, size, mode)
- Text (kerning, leading, alignment)
- Lists (bullets, indentation)
- Colors (borders, text, rules, background)
- Margins
- Positioning of individual elements
21 CSS Rule with an HTML selector
- Effective redefinition of HTML tags, e.g.,
    B { font: bold 18pt times, serif;
        text-decoration: underline }
- Redefines the <B> (boldface) tag throughout the rest of the document
- Don't forget to close the brace!
22 CSS Rule with a class selector
- Independent style, applicable to any HTML tag
    .extra { font-size: 28pt }
    .huge { font-size: 48pt }
- The class selector must be referred to within the HTML tag
    <B class="extra">Extra</B>
    <B class="huge">HUGE</B>
23 CSS Rule with a class selector
- May be linked to a specific HTML tag
    p.mini { font-size: 8pt }
    p.big { font-size: 14pt }
- The class selector may be applied to this HTML tag only
    <P class="mini">mini</P>
    <P class="big">BIG</P>
24 CSS Rule with an ID selector
- Another independent style, applicable to any HTML tag
    #area1 { position: relative;
             margin-left: 9em;
             color: red }
- The ID is specified within the HTML tag
    <SPAN ID="area1"> ... </SPAN>
25 More on CSS selectors
- Several CSS selectors may share the same definition, and individual selectors may get additional properties separately
- CSS rules can refer to tags nested within other tags, e.g.,
    P B { background: pink }
- This redefines the <B> tag only when encountered within the <P> tag
26 Adding CSS to your document
- Within a style container in the document head
    <HEAD>
    <STYLE TYPE="text/css">
    <!--
    CSS rules go here
    -->
    </STYLE>
    </HEAD>
- HTML comment tags hide the CSS rules from non-CSS browsers
27 Importing CSS into your document
- Create a separate file, stylefile.css, then write
    <HEAD>
    <LINK REL="stylesheet" TYPE="text/css" HREF="stylefile.css">
    </HEAD>
- Several files may be added in this manner
28 More on CSS
- Comments are enclosed between /* and */ and may span several lines
- Note that // does not start a comment in CSS (unlike in C++ or JavaScript)
- A stylesheet file may import another stylesheet file (hence the name CSS) with the statement
    @import url(stylefile);
- But when two rules conflict, the last rule listed wins!
- Also beware of browser differences!
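The "last rule wins" behavior can be modeled as merging property dictionaries in source order. This is a deliberately reduced Python sketch: it assumes the competing rules use the same selector, and it ignores specificity, !important and inheritance, which real browsers also take into account.

```python
def cascade(rules):
    """Combine (selector, properties) rules in source order.

    Later declarations overwrite earlier ones for the same property;
    properties that are not redefined are kept.
    """
    computed = {}
    for selector, props in rules:
        computed.setdefault(selector, {}).update(props)
    return computed

rules = [
    ("B", {"color": "black", "text-decoration": "underline"}),  # e.g., from an imported file
    ("B", {"color": "red"}),                                     # listed later in the document
]
print(cascade(rules))
```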
29 More CSS capabilities
- Font selection
- Text control
- List properties
- Background properties
- Absolute and relative positioning (but this is very dangerous!)
- Visibility (which probably has little use by itself, but can be quite useful when changed through appropriate scripts)
- Stacking order (z-index: which of the overlapping elements is drawn on top)
30 Document Object Model
- The DOM describes the structure of an HTML document as a hierarchy
- This allows a script written in a suitable language to access and manipulate only the selected element (or elements) within that document
- document.images.b1.src = "button_on.gif" follows a path from the root or top (which is the document itself) to a particular element, an image, and replaces its source file
- Then, a script can manipulate this element (e.g., hide, show, replace, move, ...) in response to certain events
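The same idea (walk the tree to one element, then change just that element) can be tried outside a browser. The sketch below uses Python's standard xml.dom.minidom on a small, hypothetical XHTML-style page; only the name b1 and the file button_on.gif come from the slide, while b2, button_off.gif and logo.gif are invented.

```python
from xml.dom import minidom

page = """<html><body>
<img name="b1" src="button_off.gif"/>
<img name="b2" src="logo.gif"/>
</body></html>"""

doc = minidom.parseString(page)

# Walk from the document root to every <img> element and pick the one
# named "b1", roughly what document.images.b1 does in a browser.
b1 = next(img for img in doc.getElementsByTagName("img")
          if img.getAttribute("name") == "b1")

# Change one property of that single element; the rest of the tree is untouched.
b1.setAttribute("src", "button_on.gif")

print(b1.toxml())
```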
31 XML
- eXtensible Markup Language: a simplified (easier, more consistent) version of SGML
- XML-compliant languages are defined with appropriate DTDs
- XML parsers signal syntax errors (unlike HTML browsers): use of authoring tools is implied
- Current uses (with more to follow)
- SMIL for synchronized multimedia
- RDF (Resource Description Framework) for exchanging descriptions of resources (metadata)
32 What is XML?
- A method for putting structured data in a text file
- Data stored on disk can be in binary or text format
- Binary formats are often more concise
- Text format allows human inspection
- XML is a set of rules/guidelines/conventions for designing text formats for such data, to produce files that are
- Easy to generate and read (by a computer)
- Unambiguous and platform-independent
- Extensible, easy to localize/internationalize
33 XML looks like HTML but isn't HTML
- XML makes use of
- tags (words bracketed by '<' and '>') and
- attributes (of the form name="value")
- HTML specifies what each tag and attribute means (and often how the text between them will look in a browser)
- XML uses the tags only to delimit pieces of data, and leaves the interpretation to the application
34 XML is text, but isn't meant to be read
- XML files are text files, but they are not made for human readers
- Text format allows experts (such as programmers) to debug applications more easily
- Text format allows the use of a simple text editor to fix a broken XML file
- Rules for XML files are much stricter than for HTML
- Applications are not allowed to try to second-guess the creator of a broken XML file: if the file is broken, just stop and issue an error message
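Python's standard xml.etree.ElementTree parser behaves as described: it does not try to repair a broken file; it stops at the first fatal error and reports where it happened. The check function and the sample documents below are hypothetical.

```python
import xml.etree.ElementTree as ET

def check(text: str) -> str:
    """Parse text strictly; report the first fatal error instead of guessing."""
    try:
        ET.fromstring(text)
        return "well-formed"
    except ET.ParseError as err:
        line, column = err.position  # where the parser gave up
        return f"error at line {line}, column {column}: {err}"

print(check("<NOTE><B>bold</B></NOTE>"))
print(check("<NOTE><B>bold</NOTE>"))  # <B> is never closed
```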
35 XML is verbose, but that is not a problem
- XML is a text format and uses tags to delimit the data
- Therefore, XML files are nearly always larger than comparable binary formats
- But disk space isn't as expensive as it used to be, and compression/decompression can be fast and reliable
- Communication protocols can compress data on the fly, thus saving bandwidth as effectively as a binary format
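How much compression recovers depends on the data, so rather than quote figures, the following Python sketch lets you measure it for a synthetic, highly repetitive document (the PETS/PET structure is made up) using the standard zlib module.

```python
import zlib

# A repetitive XML document: the same tag names occur over and over.
records = "".join(
    f"<PET><NAME>pet{i}</NAME><TYPE>cat</TYPE></PET>" for i in range(200)
)
xml_bytes = f"<PETS>{records}</PETS>".encode("utf-8")

packed = zlib.compress(xml_bytes, level=9)
print(len(xml_bytes), "bytes raw,", len(packed), "bytes compressed")

# Compression is lossless: decompressing gives back the identical text.
assert zlib.decompress(packed) == xml_bytes
```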
36 XML is good
- XML is license-free
- XML is platform-independent
- XML is well-supported
- Choosing XML is a lot like choosing SQL
- You still have to build your own database and your own programs/procedures that manipulate it
- But there are many tools available and many people who can help you
- XML isn't always the best solution, but it is always worth considering
37 XML is a family of technologies
- XML: the specification that defines what "tags" and "attributes" are
- XLink: describes a standard way to add hyperlinks to an XML file
- CSS is applicable to XML as it is to HTML
- XSL: an advanced language for style sheets (presentation and manipulation)
- XSLT: a transformation language
- SMIL: Synchronized Multimedia Integration Language
- ... and others
38 Well-formed vs. valid XML
- Well-formed documents comply with the XML well-formedness constraints, which require that
- Elements properly nest within each other
- Elements use other markup syntax correctly
- XML allows you to use elements of your own naming: ESSAY, SECTION, PARAGRAPH, NOTE, IMPORTANT
- ... unlike HTML, which forces all documents into a fixed document type
39 Writing XML: One, Two
- The XML declaration declares the nature of XML documents to document readers
- <?xml version="1.0" standalone="yes"?>
- <?xml version="1.0" standalone="no"?>
- <?xml version="1.0" encoding="UTF-8" standalone="no"?>
- The root element contains all other elements (i.e., the rest of the document)
- The root element is synonymous with your document type
- The root element cannot be repeated
40 An XML example
    <?xml version="1.0" standalone="yes"?>
    <TRIVIA>
      <MATH>
        <QUESTION>What is the square root of 25?</QUESTION>
        <ANSWER>5</ANSWER>
      </MATH>
      <GENERAL>
        <QUESTION>What is the season after Summer?</QUESTION>
        <ANSWER>Fall</ANSWER>
        <ANSWER>Autumn</ANSWER>
      </GENERAL>
    </TRIVIA>
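As a sketch of how an application might read this document, the Python code below parses it with the standard ElementTree module. Treating several ANSWER elements as alternative correct answers is a decision of this (hypothetical) application, not something the markup itself says.

```python
import xml.etree.ElementTree as ET

trivia = """<?xml version="1.0" standalone="yes"?>
<TRIVIA>
  <MATH>
    <QUESTION>What is the square root of 25?</QUESTION>
    <ANSWER>5</ANSWER>
  </MATH>
  <GENERAL>
    <QUESTION>What is the season after Summer?</QUESTION>
    <ANSWER>Fall</ANSWER>
    <ANSWER>Autumn</ANSWER>
  </GENERAL>
</TRIVIA>"""

root = ET.fromstring(trivia)
print(root.tag)  # the single root element

# The tags only delimit the data; the meaning of several ANSWERs
# is decided here, in application code.
quiz = {}
for category in root:
    question = category.findtext("QUESTION")
    answers = [a.text for a in category.findall("ANSWER")]
    quiz[category.tag] = (question, answers)

print(quiz["GENERAL"])
```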
41 Rules for XML elements
- All elements must have opening and closing (start and end) tags
- <MATH> ... </MATH>
- There are exceptions: empty-element tags like
- <QUESTION ... />
- Case matters: XML is case-sensitive
- Proper tag nesting must be observed
- You can add whitespace between tags to your heart's content: applications usually ignore it
42 XML Writing
- Describe content with elements of your own naming
- Invent a new element each time you introduce content that significantly differs from any previous content
- The more elements, the greater control you will have later, when you use the document
- Add attributes to elements
- Attributes describe the content or behavior of elements
43 Another Example
    <?xml version="1.0" standalone="yes"?>
    <HELP>
      <TITLE>XML Help</TITLE>
      <QUERY area="XML">
        <QUESTION>Where do I start?</QUESTION>
        <ANSWER>Start with your root element. Break your document
        down into parts, fill them in, repeat.</ANSWER>
      </QUERY>
      <QUERY area="XML">
        <QUESTION>Are my element names well chosen?</QUESTION>
      </QUERY>
    </HELP>
44 XML Writing 4
- Parsing: checking well-formedness
    <PRICE>57.80</PRICE>
    <PET><CAT type="Cornish Rex">Cat nests properly within PET.</CAT></PET>
    <WEATHER>Foggy                      (no closing tag)
    <LEVEL>Intermediate<LEVEL>          (improper tag)
    <PASSWORD>planetB612</PASSWD>       (wrong spelling)
    <DISTANCE TYPE="KM" 120</DISTANCE>  (missing closing bracket)
    <CAR><engine>engine does not nest properly within CAR</CAR></engine>  (improper nesting)
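The fragments above can be fed one by one to a non-validating parser. In this Python sketch each fragment is parsed as a separate document with the standard ElementTree module; any ParseError means the fragment is not well-formed. The dictionary keys are labels chosen for the example.

```python
import xml.etree.ElementTree as ET

# Fragments from the slide, each checked as a document of its own.
fragments = {
    "price": "<PRICE>57.80</PRICE>",
    "pet": '<PET><CAT type="Cornish Rex">Cat nests properly within PET.</CAT></PET>',
    "weather": "<WEATHER>Foggy",
    "level": "<LEVEL>Intermediate<LEVEL>",
    "password": "<PASSWORD>planetB612</PASSWD>",
    "distance": '<DISTANCE TYPE="KM" 120</DISTANCE>',
    "car": "<CAR><engine>engine does not nest properly within CAR</CAR></engine>",
}

def is_well_formed(text: str) -> bool:
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

for name, text in fragments.items():
    print(f"{name:10} {'OK' if is_well_formed(text) else 'NOT well-formed'}")
```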
45 Valid XML
- Valid XML, unlike XML that is merely well-formed, requires a Document Type Definition (and must conform to it)
- DTD: a set of rules that a particular document type must follow
- The rules state the name and contents of each element, and the contexts in which a particular element can and must exist
- A DTD enables communication with databases
- Valid XML documents may be accompanied by style sheets for proper presentation
46 What's in a DTD
- Two essential structures: the element and the attribute
- The root element contains all other elements
- The contents of other elements are defined recursively, starting from the root, until you reach text-level elements, e.g.,
- <!ELEMENT NAME CONTENT>
- Elements may have attributes, which are declared in a separate attribute-list declaration (usually placed next to the element declaration), e.g.,
- <!ATTLIST ELEMENT-NAME NAME CDATA #IMPLIED>
47 Writing a DTD
    <!ELEMENT novel (preface,chapter,biography?,criticalessay)>
    <!ELEMENT preface (paragraph)>
    <!ELEMENT chapter (title,paragraph,section)>
    <!ELEMENT section (title,paragraph)>
    <!ELEMENT biography (title,paragraph)>
    <!ELEMENT criticalessay (title,section)>
    <!ELEMENT paragraph (#PCDATA|keyword)*>
    <!ELEMENT title (#PCDATA|keyword)*>
    <!ELEMENT keyword (#PCDATA)>
48 DTD Declarations (1): Element type declaration
- Each element type includes a name, content, and possibly a set of attributes
- A document can contain many conforming elements of that type
- Sequence: ordered list of components (,)
- Choice: alternative components (|)
- Components may be optional (?)
- Components may be required and repeatable (+)
- Components may be optional and repeatable (*)
- Mixed-content declarations must include #PCDATA, parsed character data (i.e., text), as their first member
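Because the operators above behave like those of regular expressions, an element-only content model can be checked by translating it into a regex over the sequence of child element names. The Python sketch below is a hypothetical helper built on that observation; it does not handle #PCDATA or mixed content, and real validating parsers are implemented differently.

```python
import re

def content_model_regex(model: str) -> str:
    """Translate an element-only DTD content model into a Python regex.

    Every child name becomes "name " (space-terminated), ',' (sequence)
    becomes concatenation, '|' (choice) stays alternation, and the
    ?, + and * occurrence indicators keep their regex meaning.
    """
    parts = []
    for token in re.findall(r"[A-Za-z_][\w.-]*|[(),|?*+]", model):
        if token == ",":
            continue                      # sequence = plain concatenation
        elif token == "(":
            parts.append("(?:")           # non-capturing group
        elif token in ")|?*+":
            parts.append(token)           # same meaning in a regex
        else:
            parts.append("(?:" + re.escape(token) + " )")
    return "".join(parts)

def children_match(model: str, children: list[str]) -> bool:
    """Check whether a sequence of child element names fits the model."""
    pattern = content_model_regex(model)
    return re.fullmatch(pattern, "".join(name + " " for name in children)) is not None

memo = "(TO,FROM,SUBJECT,BODY,SIGN)"
print(children_match(memo, ["TO", "FROM", "SUBJECT", "BODY", "SIGN"]))  # all present, in order
print(children_match(memo, ["FROM", "TO", "SUBJECT", "BODY", "SIGN"]))  # wrong order
```

For instance, a model such as (preface,chapter+,biography?,criticalessay*) accepts preface, chapter, chapter but rejects a lone preface, since chapter+ requires at least one chapter.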
49 DTD Declarations (2): Attribute List Declarations
- Much more variation here
- String type attributes (CDATA): virtually unconstrained text strings
- Enumeration attributes require a list of options to pick from
- Attribute defaults
- #REQUIRED: required
- #IMPLIED: optional
- #FIXED "value": a fixed value
- "value": a default but overridable value
- Usage
- <ELEMENT-NAME NAME="value">
50 An Attribute List Example
    <!ELEMENT MEMO (TO,FROM,SUBJECT,BODY,SIGN)>
    <!ATTLIST MEMO importance (HIGH|MEDIUM|LOW) "LOW">
    <!ELEMENT TO (#PCDATA)>
    <!ELEMENT FROM (#PCDATA)>
    <!ELEMENT SUBJECT (#PCDATA)>
    <!ELEMENT BODY (P)>
    <!ELEMENT P (#PCDATA)>
    <!ELEMENT SIGN (#PCDATA)>
    <!ATTLIST SIGN signatureFile CDATA #IMPLIED
                   email CDATA #REQUIRED>
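The default keywords decide what a validating parser does when an attribute is absent. The Python sketch below hand-encodes the two ATTLIST declarations of the memo example as a dictionary (that representation, the ATTLISTS table and the apply_attlist name are assumptions for illustration) and applies the rules for enumerations, #REQUIRED, #IMPLIED and plain defaults; #FIXED is left out.

```python
# Hypothetical encoding of the two ATTLIST declarations above:
# attribute -> (allowed values, or None for CDATA; default keyword or value)
ATTLISTS = {
    "MEMO": {"importance": (("HIGH", "MEDIUM", "LOW"), "LOW")},
    "SIGN": {"signatureFile": (None, "#IMPLIED"), "email": (None, "#REQUIRED")},
}

def apply_attlist(element: str, attrs: dict[str, str]) -> dict[str, str]:
    """Check an element's attributes and fill in defaults."""
    result = dict(attrs)
    for name, (allowed, default) in ATTLISTS.get(element, {}).items():
        if name in attrs:
            # Enumerated attributes must use one of the listed options
            if allowed is not None and attrs[name] not in allowed:
                raise ValueError(f"{element}/@{name}: {attrs[name]!r} not in {allowed}")
        elif default == "#REQUIRED":
            raise ValueError(f"{element}/@{name} is required")
        elif default != "#IMPLIED":
            result[name] = default  # a plain default value is supplied
        # #IMPLIED and absent: nothing is added
    return result

print(apply_attlist("MEMO", {}))
print(apply_attlist("SIGN", {"email": "someone@example.com"}))
```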
51 XML Writing
- Add an XML declaration
- Valid XML documents must include the appropriate DTD
- either as a set of internal definitions,
- <!DOCTYPE NAME [ definitions ]>
- or as a reference to an external DTD file,
- <!DOCTYPE NAME SYSTEM "file">
- or both simultaneously
- <!DOCTYPE NAME SYSTEM "file" [ definitions ]>
- The DTD enables a validating parser to check the validity of the document (errors are NOT permitted!)
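Only a validating parser performs this check. Python's standard library, for instance, ships only non-validating parsers: the sketch below reads a document whose internal DTD requires a FROM element that is missing, and ElementTree accepts it because the document is still well-formed. Third-party libraries such as lxml can validate against a DTD.

```python
import xml.etree.ElementTree as ET

# Internal DTD subset: a MEMO must contain TO followed by FROM.
doc = """<?xml version="1.0" standalone="yes"?>
<!DOCTYPE MEMO [
  <!ELEMENT MEMO (TO,FROM)>
  <!ELEMENT TO (#PCDATA)>
  <!ELEMENT FROM (#PCDATA)>
]>
<MEMO><TO>Ann</TO></MEMO>"""

# The built-in expat-based parser reads the DOCTYPE but does not validate
# against it, so this well-formed but invalid document parses fine.
root = ET.fromstring(doc)
print(root.tag, [child.tag for child in root])
```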
52 Writing and Parsing Valid XML
- First suggestion: use a specialized editor
- Lots of choices, some of which are free
- Second suggestion: use a validating parser
- Again, lots of choices are available, mostly in Java, some in C, Perl, JavaScript
- IE5 includes an XML parser (not quite up to the standard, yet)
- XML interfaces are to be included in standard database systems: Oracle, DB2, MS SQL Server
53 SMIL
- Synchronized Multimedia Integration Language
- Based on the XML specification, endorsed by the W3C: http://www.w3.org/TR/PR-smil
- Integration of a set of independent media objects into a synchronized presentation
- Enables authors to describe
- temporal behavior of a presentation
- spatial layout of the presentation
- hyperlinks between media objects
54 Basic elements of a SMIL specification
- The smil element can have an id attribute, and it can contain head and body child elements
- head contains information not related to temporal behavior
- head can contain the following children: layout or switch (but not both), and meta (zero or more)
- layout determines how the elements in the body are positioned on an abstract rendering surface (audio or visual)
- If no layout is specified, the rendering is implementation-dependent
- Alternative layouts are specified with a switch element
55 Basic elements (III)
- Each layout element has an id and a type
- The type specifies the layout language used in the layout element (default: text/smil-basic-layout)
- Content of the default type consists of region and root-layout elements
- Content of a non-default type is simply character data
- SMIL basic layout is a subset of the visual rendering model
- Only positionable media object elements are controlled by the SMIL basic layout
56 A region example
- A text element is placed 5 pixels from the top border of the rendering window
    <smil>
      <head>
        <layout>
          <region id="a" top="5" />
        </layout>
      </head>
      <body>
        <text region="a" ... />
      </body>
    </smil>
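A player has to resolve each media object's region attribute against the regions declared in the head. The Python sketch below does that lookup for the slide's example with ElementTree; since the elided attributes (...) cannot be parsed, it substitutes a hypothetical src="caption.txt".

```python
import xml.etree.ElementTree as ET

# The slide's example, with a hypothetical src in place of "..."
smil = """<smil>
  <head>
    <layout>
      <region id="a" top="5" />
    </layout>
  </head>
  <body>
    <text region="a" src="caption.txt" />
  </body>
</smil>"""

root = ET.fromstring(smil)

# Index the regions declared in head/layout by their id.
regions = {r.get("id"): r for r in root.iter("region")}

# Each positionable media object names its region; look up its offset.
for media in root.find("body"):
    region = regions[media.get("region")]
    print(media.tag, "placed", region.get("top"), "pixels from the top")
```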
57 Meta attributes
- Define properties of a document
- Each meta element specifies a single property/value pair
- The list of properties is open-ended
- Authoring tools should ensure that all meta elements have a title with a meaningful description
- The body element contains information related to the temporal and linking behavior of the document
- Parallel/sequential playback of the children
- Complex synchronization possible
- Synchronization alternatives possible
58 Hyperlinking elements
- Navigational links between elements
- Links are unidirectional and single-headed
- SMIL supports name fragment identifiers and the '#' connector (just like HTML: http://foo.com/some/path#anchor1)
- The a element, used as in HTML, associates a link with a complete media object only
- The new link (presentation) can replace the old one
- The new link (presentation) can be added to the old one
- The new link (presentation) can pause the old one
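Splitting a link target at the '#' connector is a standard URL operation; in Python it is available as urllib.parse.urldefrag. The URL below is the one from the slide.

```python
from urllib.parse import urldefrag

# Split a link target at '#' into the resource and the fragment
# identifier (the named anchor inside that resource).
target = urldefrag("http://foo.com/some/path#anchor1")
print(target.url)       # http://foo.com/some/path
print(target.fragment)  # anchor1
```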
59 Summary
- XML is HTML done right
- Widespread use in many areas: web publishing, document processing, multimedia, B2B electronic commerce
- Tools added daily
- Database connection crucial for success
60 XML links
- www.w3c.org
- http://www.software.ibm.com/xml/
- http://msdn.microsoft.com/xml/
- www.xml.org
- www.xml.com