SGML, HTML, XML: Do We Really Need All That?

1
SGML, HTML, XML: Do We Really Need All That?
  • ISMT Multimedia
  • Fall 2002
  • Dr Vojislav B. Mišic

2
Lecture Overview
  • What is a markup language?
  • HTML markup: what's good, what's wrong
  • Extensions to HTML (dHTML and style sheets, XML
    and XSL, ...)
  • XML
  • Basic elements
  • Well-formed vs. valid XML
  • Writing a DTD
  • Examples of XML

3
Markup languages
  • What is markup?
  • Text (actual contents of the document)
  • is interspersed with markings
  • Markup is related to the text
  • notes on the content
  • notes on text presentation
  • but virtually anything can be marked (remember
    Fermat's Last Theorem?)
  • A markup language allows separation of concerns:
    content vs. presentation

4
Standards for markup
  • SGML (an ISO standard that grew out of IBM's GML):
    a standardized way to write other markup
    languages (actually, a meta-language)
  • An SGML-based language is specified using a DTD
    (Document Type Definition)
  • SGML is not really a user-friendly language,
    hence its use was rather limited, even though
    software support for it does exist

5
Other markup languages
  • TeX (Knuth) is another widely used markup
    language
  • Performs extremely well for complex texts with:
  • mathematical formulas and symbols
  • cross-references
  • different typefaces
  • foreign languages

6
A TeX example
  • \begin{equation}\label{coh1}
  • \Psi (S) = \displaystyle
  • \frac{\displaystyle
  • \sum_{x \in R (S)}
  • \left( \# S_w (x) - 1 \right)}
  • {\displaystyle
  • \sum_{x \in R (S)}
  • \left( \# S - 1 \right)}
  • \end{equation}

7
HTML
  • HTML (HyperText Markup Language) is the language
    of the Internet
  • Allows platform-independent browsing
  • Text-only at first, media later
  • Hyperlinks, limited visual formatting
  • However, it is far from perfect, and is gradually
    being replaced (current version 4.01)

8
HTML markup
  • First you write the text, then add appropriate
    markup tags
  • Tags can describe logical entities
  • Headings of different levels: H1, H2, ...
  • Lists and list elements (UL, OL, LI)
  • But tags can also describe visual effects (display
    rendering)
  • Bold and italic text (B, I)
  • Font and typeface changes

9
If you make an error
  • Anything not recognized as correct HTML is
    essentially ignored
  • HTML browser just treats it as plain text and
    displays it directly
  • In this manner, users are still able to see most
    of the source, albeit without proper formatting
  • Your opinion: is this good or bad?
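
The error tolerance described on this slide can be tried out without a browser. The following is a minimal sketch, assuming Python's standard `html.parser` module as a stand-in for a lenient HTML reader; `TextCollector`, `visible_text` and the `<BLINKY>` tag are names made up for illustration. The unknown tag and the unclosed `<B>` do not stop the parser, and the text still comes through.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects the character data that survives lenient parsing."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for every run of text between tags, known or unknown.
        self.chunks.append(data)


def visible_text(markup):
    """Return the text content of markup, ignoring tags that make no sense."""
    parser = TextCollector()
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks)


# <BLINKY> is not an HTML tag and <B> is never closed; no error is raised.
broken = "<H1>Title</H1><BLINKY>Some <B>bold text"
text = visible_text(broken)
```

A real browser additionally applies whatever formatting it can recover; this sketch only shows that nothing is rejected.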

10
HTML editing
  • HTML source is ASCII and essentially
    layout-independent
  • Plain text editors can be used
  • You can add extra white space to your heart's
    content, with no effect on what is displayed by
    the browser
  • Most browsers allow you to view and save the HTML
    source of the document displayed: the quickest
    way to learn HTML
  • HTML is interpreted: editing changes are
    displayed (almost) instantly

11
HTML on the Internet
  • HTML browsers can display graphics and other
    media objects
  • Although HTML by itself provides only the most
    primitive support for multimedia
  • Tags can specify target URLs (hyperlinks)
  • Error tolerance ensures that anyone with a
    browser (any browser) can access HTML documents
  • ... all of which made HTML the language of choice
    for hypertext on the Internet

12
More HTML features
  • Visual formatting is allowed but not forced
  • you can specify a typeface, but the browser will
    substitute another one of its own choice if the
    one specified is not available
  • User can easily change the presentation
  • just resize window and select different
    fonts/sizes
  • Browser differences (IE vs. Navigator):
    actually, not very important any more

13
HTML Interactivity
  • Interactivity at first limited to hyperlinks
  • Forms introduced later (Navigator 3)
  • Form support is still limited; most often client-
    or server-side scripting is required
  • Proliferation of scripting languages
  • CGI scripts
  • JavaScript and JScript (more details later)
  • VBScript, ASP
  • Perl

14
Is HTML a Good Markup Language?
  • Logical and visual formatting capabilities
    together
  • Some people argue for cleaner separation of
    logical from visual formatting
  • Others want more author control
  • Many extensions (some proprietary)
  • Changes generally lean towards greater author
    control over document rendering: more direct
    formatting instructions are included

15
Dynamic HTML
  • Commercial term: there is no such thing as a
    dHTML standard
  • Combination of HTML with new technologies
  • Stylesheets add greater author control
  • Scripting allows improved interactivity,
    including user input
  • Even simple animations are possible
  • As always, not quite compatible extensions by
    Microsoft and Netscape

16
HTML styles
  • In standard HTML, logical markup tags (such as
    <H1>) have predefined properties for:
  • Typeface
  • Font size
  • Mode
  • Line spacing
  • Properties cannot be changed, and we cannot
    define our own tags
  • The only way is to use a (possibly way too long)
    sequence of appropriate primitive tags every
    time: not a very convenient solution

17
Stylesheets to the rescue
  • Cascading Style Sheets (CSS): cleaner separation
    of markup from actual content
  • Style: a named set of properties that define the
    presentation of a chunk of text (character,
    paragraph, ...)
  • Styles are present in text processing software
    (WinWord) but in some markup languages as well
    (TeX)
  • CSS is used with HTML, but it's not HTML,
    although browsers know how to handle them together

18
CSS Syntax
  • A CSS-compatible stylesheet contains a set of
    rules, each with a selector (name), a number of
    properties and their values
  • Rules can be:
  • Inline (within an HTML tag, in the document body)
  • Embedded (in the head of an HTML document)
  • External, in a separate file which is then linked
    or imported into an HTML document
  • Position of the rule defines the scope of its
    effect on the document

19
CSS Selectors
  • HTML selectors: the text portions of HTML tags
  • Class selectors: can be applied to any HTML tag
  • ID selectors: usually applied only once per page
    to a particular HTML tag
  • Type of HTML tag defines the scope of CSS
    properties:
  • Block level (DIV, LI, H1)
  • Inline (B, FONT, TT)
  • Replaced tags (IMG)

20
CSS Properties
  • Always of the form property: value
  • Categories of properties control:
  • Typefaces (fonts, size, mode)
  • Text (kerning, leading, alignment)
  • Lists (bullets, indentation)
  • Colors (borders, text, rules, background)
  • Margins
  • Positioning of individual elements

21
CSS Rule with a HTML selector
  • Effective redefinition of HTML tags, e.g.
    B { font: bold 18pt times, serif;
        text-decoration: underline }
  • Redefines the <B> (boldface) tag throughout the
    rest of the document
  • Don't forget to close the brace!

22
CSS Rule with a class selector
  • Independent style, applicable to any HTML tag
    .extra { font-size: 28pt }
    .huge { font-size: 48pt }
  • Class selector must be referred to within the
    HTML tag
    <B class="extra">Extra</B>
    <B class="huge">HUGE</B>

23
CSS Rule with a class selector
  • May be linked to a specific HTML tag
    p.mini { font-size: 8pt }
    p.big { font-size: 14pt }
  • Class selector may be applied to this HTML tag
    only
    <P class="mini">mini</P>
    <P class="big">BIG</P>

24
CSS Rule with an ID selector
  • Another independent style, applicable to any HTML
    tag
    #area1 { position: relative;
             margin-left: 9em; color: red }
  • ID is specified within the HTML tag
    <SPAN ID="area1"> ... </SPAN>

25
More on CSS selectors
  • Several CSS selectors may share the same
    definition, and individual selectors may get
    additional properties separately
  • CSS rules can refer to tags nested within other
    tags, e.g., P B { background: pink }
  • redefines the <B> tag only when encountered
    within the <P> tag

26
Adding CSS to your document
  • Within a style container in the document head
    <HEAD><STYLE TYPE="text/css"><!--
    CSS rules go here
    --></STYLE></HEAD>
  • HTML comment tags hide the CSS rules from non-CSS
    browsers

27
Importing CSS into your document
  • Create a separate file, stylefile.css, then write
    <HEAD><LINK REL="stylesheet" TYPE="text/css"
    HREF="stylefile.css"></HEAD>
  • Several files may be added in this manner

28
More on CSS
  • Comments go between matched pairs of /* and */,
    whether they span one line or several
  • Standard CSS does not treat // as a comment
    marker
  • A stylesheet file may import another stylesheet
    file (hence the name CSS) with the statement
    @import url(stylefile)
  • But the last rule listed wins!
  • Also beware of browser differences!
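
The "last rule listed wins" behaviour can be illustrated with a toy rule reader. This is a minimal sketch only, assuming rules of the plain form `selector { property: value; ... }`; `parse_rules` and `SHEET` are hypothetical names, and real CSS also weighs selector specificity, comments and `@import`, all of which are ignored here.

```python
import re


def parse_rules(css):
    """Return {selector: {property: value}}; later declarations override earlier ones."""
    styles = {}
    # Each match is "selector { declarations }" with no nested braces.
    for match in re.finditer(r"([^{}]+)\{([^{}]*)\}", css):
        selector = match.group(1).strip()
        properties = styles.setdefault(selector, {})
        for declaration in match.group(2).split(";"):
            if ":" in declaration:
                name, value = declaration.split(":", 1)
                properties[name.strip()] = value.strip()
    return styles


SHEET = """
B { font: bold 18pt times, serif; text-decoration: underline }
.extra { font-size: 28pt }
B { text-decoration: none }
"""

styles = parse_rules(SHEET)
```

The second `B` rule only replaces the property it mentions; the `font` setting from the first rule is kept.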

29
More CSS capabilities
  • Font selection
  • Text control
  • List properties
  • Background properties
  • Absolute and relative positioning (but this is
    very dangerous!)
  • Visibility (which probably has little use by
    itself, but it can be quite useful when changed
    through appropriate scripts)
  • Stacking (vertical) order

30
Document Object Model
  • DOM describes the structure of an HTML document
    as a hierarchy
  • Thus allowing a script written in a suitable
    language to access and manipulate only a selected
    element (or elements) within that document
  • document.images.b1.src = "button_on.gif" describes
    a path from the root or top (which is the document
    itself) to a particular element: an image file
  • Then, a script can manipulate this element (e.g.,
    hide, show, replace, move, ...) in response to
    certain events
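
The same idea of walking a document hierarchy and changing one element can be sketched outside the browser. The example below is an assumption-laden illustration using Python's standard `xml.dom.minidom`; the page content, the `image_named` helper and the second image are invented, and a browser script would reach the image through `document.images` instead.

```python
from xml.dom.minidom import parseString

# A tiny, well-formed stand-in for an HTML page with two named images.
page = parseString(
    "<html><body>"
    '<img name="b1" src="button_off.gif"/>'
    '<img name="b2" src="logo.gif"/>'
    "</body></html>"
)


def image_named(document, name):
    """Search the document tree for the <img> element with the given name."""
    for img in document.getElementsByTagName("img"):
        if img.getAttribute("name") == name:
            return img
    return None


# Swap the picture of one selected element, leaving the rest untouched.
button = image_named(page, "b1")
button.setAttribute("src", "button_on.gif")
```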

31
XML
  • eXtensible Markup Language: a simplified (easier,
    more consistent) version of SGML
  • XML-compliant languages defined with appropriate
    DTDs
  • XML parsers signal syntax errors (unlike HTML):
    use of authoring tools is implied
  • current uses (with more to follow):
  • SMIL for synchronized multimedia
  • RDF for resource description

32
What is XML?
  • A method for putting structured data in a text
    file
  • Data stored on disk can be in binary or text
    format
  • Binary formats are often more concise
  • Text format allows human inspection
  • XML is a set of rules/guidelines/conventions for
    designing text formats for such data, to produce
    files that are
  • Easy to generate and read (by a computer)
  • Unambiguous and platform-independent
  • Extensible, easy to localize/internationalize

33
XML looks like HTML but isn't HTML
  • XML makes use of
  • tags (words bracketed by '<' and '>') and
  • attributes (of the form name="value")
  • HTML specifies what each tag and attribute means
    (and often how the text between them will look in
    a browser)
  • XML uses the tags only to delimit pieces of data
    and leaves the interpretation to the application

34
XML is text, but isn't meant to be read
  • XML files are text files, but they are not made
    for human readers
  • Text format allows experts (such as programmers)
    to more easily debug applications
  • Text format allows the use of a simple text
    editor to fix a broken XML file
  • Rules for XML files are much stricter than for
    HTML
  • Applications are not allowed to try to
    second-guess the creator of a broken XML file:
    if the file is broken, they just stop and issue
    an error message

35
XML is verbose, but that is not a problem
  • XML is a text format and uses tags to delimit the
    data
  • Therefore, XML files are nearly always larger
    than comparable binary formats
  • But disk space isn't as expensive anymore as it
    used to be, and compression/decompression can be
    fast and reliable
  • Communication protocols can compress data on the
    fly, thus saving bandwidth as effectively as a
    binary format
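
The size trade-off can be measured on a small scale. The sketch below is only an illustration under stated assumptions: the `readings` data set is invented, the binary layout (a 4-byte integer plus a 4-byte float per record) is one arbitrary choice, and `zlib` stands in for whatever compression a protocol might apply on the fly.

```python
import struct
import zlib

# Hypothetical data: 1000 (id, value) records.
readings = [(i, 20.0 + (i % 7) * 0.5) for i in range(1000)]

# The same records as tagged text ...
xml_text = "<readings>" + "".join(
    f'<reading id="{i}"><value>{v}</value></reading>' for i, v in readings
) + "</readings>"
xml_bytes = xml_text.encode("utf-8")

# ... and as a fixed-width binary layout (8 bytes per record).
binary_bytes = b"".join(struct.pack("<if", i, v) for i, v in readings)

# Compressing the XML removes most of the repeated tag text.
compressed_xml = zlib.compress(xml_bytes)

sizes = {
    "xml": len(xml_bytes),
    "binary": len(binary_bytes),
    "xml+zlib": len(compressed_xml),
}
```

Printing `sizes` shows how much of the XML overhead disappears after compression; the exact numbers depend on the data.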

36
XML is good
  • XML is license-free
  • XML is platform-independent
  • XML is well-supported
  • Choosing XML is a lot like choosing SQL
  • you still have to build your own database and
    your own programs/procedures that manipulate it
  • but there are many tools available and many
    people that can help you
  • XML isn't always the best solution, but it is
    always worth considering

37
XML is a family of technologies
  • XML: the specification that defines what "tags"
    and "attributes" are
  • XLink: describes a standard way to add hyperlinks
    to an XML file
  • CSS: as applicable to XML as it is to HTML
  • XSL: an advanced language for style sheets
    (presentation and manipulation)
  • XSLT: a transformation language
  • SMIL: Synchronized Multimedia Integration Language
  • and others

38
Well-formed vs. valid XML
  • Well-formed documents comply with XML
    well-formedness constraints, which require that:
  • Elements properly nest within each other
  • Elements use other markup syntax correctly
  • XML allows you to use elements of your own
    naming: ESSAY, SECTION, PARAGRAPH, NOTE,
    IMPORTANT
  • unlike HTML, which forces all documents into a
    fixed document type

39
Writing XML One, Two
  • XML declaration: declares the nature of XML
    documents to document readers
  • <?xml version="1.0" standalone="yes"?>
  • <?xml version="1.0" standalone="no"?>
  • <?xml version="1.0" encoding="UTF-8"
    standalone="no"?>
  • Root element contains all other elements (i.e.,
    the rest of the document)
  • Root element is synonymous with your document
    type
  • Root element cannot be repeated

40
An XML example
  • <?xml version="1.0" standalone="yes"?>
    <TRIVIA>
      <MATH>
        <QUESTION>What is the square root of 25</QUESTION>
        <ANSWER>5</ANSWER>
      </MATH>
      <GENERAL>
        <QUESTION>What is the season after Summer</QUESTION>
        <ANSWER>Fall</ANSWER>
        <ANSWER>Autumn</ANSWER>
      </GENERAL>
    </TRIVIA>
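
Since XML leaves the interpretation of tags to the application, it may help to see an application read this trivia document. A minimal sketch, assuming Python's standard `xml.etree.ElementTree`; the function name `accepted_answers` is made up for illustration.

```python
import xml.etree.ElementTree as ET

TRIVIA = """<?xml version="1.0" standalone="yes"?>
<TRIVIA>
  <MATH>
    <QUESTION>What is the square root of 25</QUESTION>
    <ANSWER>5</ANSWER>
  </MATH>
  <GENERAL>
    <QUESTION>What is the season after Summer</QUESTION>
    <ANSWER>Fall</ANSWER>
    <ANSWER>Autumn</ANSWER>
  </GENERAL>
</TRIVIA>"""


def accepted_answers(xml_text):
    """Map each question to the list of answers listed for it."""
    root = ET.fromstring(xml_text)          # root element: TRIVIA
    result = {}
    for category in root:                    # MATH, GENERAL, ...
        question = category.findtext("QUESTION")
        result[question] = [a.text.strip() for a in category.findall("ANSWER")]
    return result


answers = accepted_answers(TRIVIA)
```

Nothing in the file says that ANSWER means "an acceptable answer"; that meaning lives entirely in the reading program.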

41
Rules for XML elements
  • All elements must have opening and closing (start
    and end) tags
  • <MATH> ... </MATH>
  • There are exceptions: empty-element tags like
  • <QUESTION ... />
  • Case matters: XML is case-sensitive
  • Proper tag nesting must be observed
  • You can add whitespace between elements to your
    heart's content: it is typically ignored in
    processing

42
XML Writing
  • Describe content with elements of your own naming
  • Invent a new element each time you introduce
    content that significantly differs from any
    previous
  • More elements: the greater control you will have
    later, when you use the document
  • Add attributes to elements
  • Attributes describe the content or behavior of
    elements

43
Another Example
  • <?xml version="1.0" standalone="yes"?>
    <HELP>
      <TITLE>XML Help</TITLE>
      <QUERY area="XML">
        <QUESTION>Where do I start?</QUESTION>
        <ANSWER>Start with your root element. Break
        your document down into parts, fill them in,
        repeat.</ANSWER>
      </QUERY>
      <QUERY area="XML">
        <QUESTION>Are my element names well
        chosen?</QUESTION>
      </QUERY>
    </HELP>

44
XML Writing 4
  • Parsing: checking well-formedness
  • <PRICE>57.80</PRICE>
  • <PET><CAT type="Cornish Rex">Cat nests properly
    within PET.</CAT></PET>
  • <WEATHER>Foggy (no closing tag)
  • <LEVEL>Intermediate<LEVEL> (improper tag)
  • <PASSWORD>planetB612</PASSWD> (wrong spelling)
  • <DISTANCE TYPE="KM" 120</DISTANCE> (missing closing
    bracket)
  • <CAR><engine>engine does not nest properly within
    CAR</CAR></engine> (improper nesting)
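
These well-formedness checks can be reproduced with any XML parser. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` (a non-validating parser); `is_well_formed` and `SAMPLES` are illustrative names, and the fragments are the ones from this slide with the markup characters restored.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """Return True if the parser accepts the fragment, False if it reports an error."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# Each fragment paired with the verdict the slide gives it.
SAMPLES = {
    "<PRICE>57.80</PRICE>": True,
    '<PET><CAT type="Cornish Rex">Cat nests properly within PET.</CAT></PET>': True,
    "<WEATHER>Foggy": False,                                # no closing tag
    "<LEVEL>Intermediate<LEVEL>": False,                    # improper tag
    "<PASSWORD>planetB612</PASSWD>": False,                 # wrong spelling
    '<DISTANCE TYPE="KM" 120</DISTANCE>': False,            # missing closing bracket
    "<CAR><engine>does not nest</CAR></engine>": False,     # improper nesting
}

verdicts = {fragment: is_well_formed(fragment) for fragment in SAMPLES}
```

The parser stops at the first error and does not try to repair the fragment, which is the strictness the lecture contrasts with HTML.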

45
Valid XML
  • Valid XML, unlike merely well-formed XML, requires
    a Document Type Definition
  • DTD: a set of rules that a particular document
    type must follow
  • The rules state the name and contents of each
    element, and the contexts in which a particular
    element can and must exist
  • DTD enables communication with databases
  • Valid XML documents may be accompanied by style
    sheets for proper presentation

46
What's in a DTD
  • Two essential structures: the element and the
    attribute
  • Root element contains all other elements
  • Contents of other elements defined recursively
    starting from the root, until you reach
    text-level elements, e.g.,
  • <!ELEMENT NAME CONTENT>
  • Elements may have attributes, which are defined
    within the element definition, or separately,
    e.g.,
  • <!ATTLIST ELEMENT-NAME NAME CDATA #IMPLIED>

47
Writing a DTD
  • <!ELEMENT novel (preface,chapter,biography?,criticalessay)>
  • <!ELEMENT preface (paragraph)>
  • <!ELEMENT chapter (title,paragraph,section)>
  • <!ELEMENT section (title,paragraph)>
  • <!ELEMENT biography (title,paragraph)>
  • <!ELEMENT criticalessay (title,section)>
  • <!ELEMENT paragraph (#PCDATA|keyword)*>
  • <!ELEMENT title (#PCDATA|keyword)*>
  • <!ELEMENT keyword (#PCDATA)>

48
DTD Declarations (1): Element type declaration
  • Each element type includes a name, content, and
    possibly a set of attributes
  • A document can contain many conforming elements
    of that type
  • Sequence: ordered list of components (,)
  • Choice: alternative components (|)
  • Components may be optional (?)
  • Components may be required and repeatable (+)
  • Components may be optional and repeated (*)
  • Mixed-content declarations must include #PCDATA,
    parsed character data (i.e., text), as their
    first member
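
The sequence, choice and repetition operators behave much like regular-expression operators applied to child element names. The sketch below makes that analogy concrete; it is not how a validating parser is implemented, `content_model_to_regex` and `children_match` are hypothetical helpers, mixed content with #PCDATA is not handled, and the chapter model used in the usage lines is an illustrative assumption.

```python
import re


def content_model_to_regex(model):
    """Translate a DTD element-content model into a compiled regular expression.

    Every child name becomes the token 'name ' (name plus one space), ','
    becomes plain concatenation, and '|', '?', '+', '*', '(' and ')' keep
    the meaning they already have in regular expressions.
    """
    pattern = []
    for token in re.findall(r"[A-Za-z_][\w.-]*|[(),|?*+]", model):
        if token == ",":
            continue                                  # sequence: just concatenate
        if token in "()|?*+":
            pattern.append("(?:" if token == "(" else token)
        else:
            pattern.append("(?:" + re.escape(token) + " )")
    return re.compile("".join(pattern) + r"\Z")


def children_match(model, child_names):
    """Check a list of child element names against a content model."""
    sequence = "".join(name + " " for name in child_names)
    return content_model_to_regex(model).match(sequence) is not None


# Illustrative model: one title, one or more paragraphs, any number of sections.
CHAPTER = "(title,paragraph+,section*)"
ok = children_match(CHAPTER, ["title", "paragraph", "paragraph", "section"])
bad = children_match(CHAPTER, ["title", "section"])   # no paragraph at all
```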

49
DTD Declarations (2): Attribute List Declarations
  • Much more variation here
  • String type attributes (CDATA): virtually
    unconstrained text strings
  • Enumeration attributes require a list of options
    to pick from
  • Attribute defaults:
  • #REQUIRED, required
  • #IMPLIED, optional
  • #FIXED "value", a fixed value
  • "value", a default but overridable value
  • Usage:
  • <ELEMENT-NAME NAME="value">

50
An Attribute List Example
  • <!ELEMENT MEMO (TO,FROM,SUBJECT,BODY,SIGN)>
    <!ATTLIST MEMO importance (HIGH|MEDIUM|LOW) "LOW">
    <!ELEMENT TO (#PCDATA)>
    <!ELEMENT FROM (#PCDATA)>
    <!ELEMENT SUBJECT (#PCDATA)>
    <!ELEMENT BODY (P)>
    <!ELEMENT P (#PCDATA)>
    <!ELEMENT SIGN (#PCDATA)>
    <!ATTLIST SIGN signatureFile CDATA #IMPLIED
                   email CDATA #REQUIRED>
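
What the attribute declarations in this memo example mean for a document can be sketched as a small check. This is only an illustration under assumptions: the `ATTLISTS` table is a hand-written transcription of the two ATTLIST declarations, `apply_attlist` is a hypothetical helper, and a real validating parser does considerably more.

```python
REQUIRED = object()   # stands in for #REQUIRED
IMPLIED = object()    # stands in for #IMPLIED

# attribute name -> (type, default); an enumeration is a tuple of options.
ATTLISTS = {
    "MEMO": {"importance": (("HIGH", "MEDIUM", "LOW"), "LOW")},
    "SIGN": {
        "signatureFile": ("CDATA", IMPLIED),
        "email": ("CDATA", REQUIRED),
    },
}


def apply_attlist(element_name, given):
    """Return the attributes after defaults are applied, or raise ValueError."""
    declared = ATTLISTS.get(element_name, {})
    for name in given:
        if name not in declared:
            raise ValueError(f"{element_name}: undeclared attribute {name!r}")
    result = {}
    for name, (kind, default) in declared.items():
        if name in given:
            value = given[name]
            if isinstance(kind, tuple) and value not in kind:
                raise ValueError(f"{element_name}: {name}={value!r} not in {kind}")
            result[name] = value
        elif default is REQUIRED:
            raise ValueError(f"{element_name}: missing required attribute {name!r}")
        elif default is not IMPLIED:
            result[name] = default        # overridable default value
    return result


memo_attrs = apply_attlist("MEMO", {})    # importance falls back to "LOW"
```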

51
XML Writing
  • Add an XML declaration
  • Valid XML documents must include the appropriate
    DTD
  • either as a set of internal definitions,
  • <!DOCTYPE NAME [ definitions ]>
  • or as a reference to an external DTD file,
  • <!DOCTYPE NAME SYSTEM "file">
  • or both simultaneously
  • <!DOCTYPE NAME SYSTEM "file" [ definitions ]>
  • DTD enables the parser to check validity of the
    document (errors are NOT permitted!)

52
Writing and Parsing Valid XML
  • First suggestion: use a specialized editor
  • Lots of choices, some of which are free
  • Second suggestion: use a validating parser
  • Again, lots of choices are available, mostly in
    Java, some in C, Perl, JavaScript
  • IE5 includes an XML parser (not quite up to the
    standard, yet)
  • XML interfaces to be included in standard DBMS
    systems: Oracle, DB2, MS SQL Server

53
SMIL
  • Synchronized Multimedia Integration Language
  • based on XML specification, endorsed by W3C:
    http://www.w3.org/TR/PR-smil
  • integration of a set of independent media objects
    into a synchronized presentation
  • enables authors to describe:
  • temporal behavior of a presentation
  • spatial layout of the presentation
  • hyperlinks between media objects

54
Basic elements of a SMIL specification
  • smil element can have an id attribute, and it can
    contain body and head children elements
  • head contains information not related to temporal
    behavior
  • head can contain the following children: layout,
    switch (but not both), and meta (zero or more)
  • layout determines how the elements in the body
    are positioned on an abstract rendering surface
    (audio or visual)
  • if no layout is specified, the rendering is
    implementation dependent
  • Alternative layouts specified with a switch
    element

55
Basic elements (III)
  • each element has an id and a type
  • element type specifies the layout language used
    in the layout element (default:
    text/smil-basic-layout)
  • the default type information contains region and
    root-layout elements
  • non-default type information is simply character
    data
  • SMIL basic layout is a subset of the visual
    rendering model
  • only positionable media object elements are
    controlled by the SMIL basic layout

56
A region example
  • A text element is set to a 5 pixel distance from
    the top border of the rendering window
  • <smil>
      <head>
        <layout>
          <region id="a" top="5" />
        </layout>
      </head>
      <body>
        <text region="a" .../>
      </body>
    </smil>
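
How a player might connect a media object to its region can be sketched with an ordinary XML parser. This is a rough illustration, not SMIL player behaviour: it assumes Python's standard `xml.etree.ElementTree`, the `src="caption.txt"` attribute fills in the part the slide leaves elided and is purely hypothetical, and `top_offsets` is an invented helper.

```python
import xml.etree.ElementTree as ET

SMIL = """<smil>
  <head>
    <layout>
      <region id="a" top="5" />
    </layout>
  </head>
  <body>
    <text region="a" src="caption.txt" />
  </body>
</smil>"""


def top_offsets(smil_text):
    """Map each media object's src to the top offset of the region it names."""
    root = ET.fromstring(smil_text)
    regions = {r.get("id"): r for r in root.iter("region")}
    offsets = {}
    for media in root.find("body"):
        region = regions.get(media.get("region"))
        if region is not None:
            offsets[media.get("src")] = int(region.get("top", "0"))
    return offsets


offsets = top_offsets(SMIL)
```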

57
Meta attributes
  • define properties of a document
  • each meta element specifies a single
    property/value pair
  • the list of properties is open-ended
  • authoring tools should ensure that all meta
    elements have a title with meaningful description
  • information related to temporal and linking
    behavior of the document
  • Parallel/sequential playback of the children
  • Complex synchronization possible
  • Synchronization alternatives possible

58
Hyperlinking elements
  • navigational links between elements
  • links are unidirectional and single-headed
  • SMIL supports name fragment identifiers and the
    '#' connector (just like HTML:
    http://foo.com/some/path#anchor1)
  • the a element, used as in HTML, associates a link
    with a complete media object only
  • New link (presentation) can replace the old one
  • New link (presentation) can be added to the old
    one
  • New link (presentation) can pause the old one

59
Summary
  • XML is HTML done right
  • Widespread use in many areas: web publishing,
    document processing, multimedia, B2B electronic
    commerce
  • Tools added daily
  • Database connection crucial for success

60
XML links
  • www.w3c.org
  • http://www.software.ibm.com/xml/
  • http://msdn.microsoft.com/xml/
  • www.xml.org
  • www.xml.com