Title: Languages of the Web: Background
1 Languages of the Web: Background
- David Cornforth
- School of ITEE
- University of NSW,
- Australian Defence Force Academy
2 Summary
- Lessons from the history of typesetting and markup:
  - Technology becomes redundant, so we need something that can grow or be extended
  - Documents contain data: how can we make it accessible?
  - Data and format are different things
- Could XML be the answer?
3 Why XML?
- It is extensible: you can create your own tags
- It incorporates metadata (it is self-describing)
- It separates data from formatting information (so the data can be portable)
4 Separating data from formatting
- Source data:
  - Customer Smith bought 56 units at 45.80 each
  - Customer Patel bought 34 units at 39.50 each
  - Customer Wicks bought 23 units at 47.50 each
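A minimal Python sketch of this separation, assuming the three sales records are stored as XML. The element names `sales`, `sale`, `customer`, `units` and `price` are hypothetical, chosen only for illustration. The data is parsed once, and two independent functions lay it out in different ways, so changing the presentation never touches the data.

```python
import xml.etree.ElementTree as ET

# The data only, with no formatting information.
# Element names are illustrative assumptions, not a standard vocabulary.
SALES_XML = """
<sales>
  <sale><customer>Smith</customer><units>56</units><price>45.80</price></sale>
  <sale><customer>Patel</customer><units>34</units><price>39.50</price></sale>
  <sale><customer>Wicks</customer><units>23</units><price>47.50</price></sale>
</sales>
"""

def load_sales(text):
    """Parse the XML into (customer, units, price) tuples."""
    root = ET.fromstring(text.strip())
    return [
        (s.findtext("customer"), int(s.findtext("units")), float(s.findtext("price")))
        for s in root.findall("sale")
    ]

def as_sentences(sales):
    """One formatting: the prose layout used on the slide."""
    return [f"Customer {c} bought {u} units at {p:.2f} each" for c, u, p in sales]

def as_table(sales):
    """Another formatting of the same data: fixed-width columns with a line total."""
    lines = [f"{'Customer':<10}{'Units':>6}{'Price':>8}{'Total':>10}"]
    for c, u, p in sales:
        lines.append(f"{c:<10}{u:>6}{p:>8.2f}{u * p:>10.2f}")
    return lines

if __name__ == "__main__":
    sales = load_sales(SALES_XML)
    print("\n".join(as_sentences(sales)))
    print("\n".join(as_table(sales)))
```

Adding a third layout (HTML, say) would mean writing one more function; the XML itself stays unchanged.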
5 How did we get here? Document processing: a brief history
- Writing invented around 4000 BCE: pictographic (Egyptian, Chinese)
- Alphabet invented in the eastern Mediterranean and Mesopotamia around 2000-1500 BCE
- Printing with moveable type invented around 1000 CE in China
- Made truly practical by Gutenberg, Germany, around 1450 CE
6 Markup languages: a brief history
- Pre-computer
  - Markup conventions for denoting the formatting of documents have existed since at least the start of the printing industry
- Early computer markup
  - The first computer markup conventions were based on hand markup systems
  - Commands could have parameters, e.g. <pointsize,10>, but users could not define macros
  - Why do we need macros?
7 Markup languages: a brief history
- ROFF/NROFF/TROFF (1970s)
  - A series of DEC-based line-printer/typesetter formatting systems
  - Still often used in UNIX documentation, e.g. this manual page source:

    .\" @(#)Golem1.0alpha.1 1.25 89/03/26 SMI
    .TH GOLEM1.0Alpha 1 "October 26, 1990"
    .SH NAME
    golem \- first order induction algorithm based on
    relative least general generalisation

  - Supported limited macroing
8 Markup languages: a brief history
- COMTEXT / XICS (1970s)
  - Early CSIRO / Xerox markup language, e.g.:

    <bd/cm/pt,18>GOLEM1.0Alpha<</ju/pt,12>

  - Supported limited macroing, but without parameters, e.g.:

    <hdbd/cm/pt,18>
9 Markup languages: a brief history
- SCRIBE
  - The first concerted attempt at high-level rather than layout markup
  - Aimed at academic documents: papers, theses
  - A high-level description of the document, combined with a separate style definition specifying how the document is to be laid out
10 Markup languages: a brief history
- TeX / AMSTeX
  - TeX: a highly flexible programming language built on top of a markup language
  - Basic operations are low-level, but macro packages can be defined
  - AMSTeX (for the American Mathematical Society): the first major macro package; highly sophisticated, but relatively layout-dependent
11 Example: plain TeX source
    \hrule
    \vskip 1in
    \centerline{\bf A SHORT STORY}
    \vskip 6pt
    \centerline{\sl by A. U. Thor} % !`?`?! (modified)
    \vskip .5cm
    Once upon a time, in a distant galaxy called \"O\"o\c c,
    there lived a computer named R.~J. Drofnats.

    Mr.~Drofnats---or ``R. J.,'' as he preferred to
    be called---% error has been fixed!
    was happiest when he was at work typesetting
    beautiful documents.
    \vskip 1in
    \hrule
    \vfill\eject
12 Markup languages: a brief history
- LaTeX
  - Combined many of the ideas of TeX and Scribe: a generic macro package permitting high-level document descriptions to be built on top of TeX
  - Still widely used in scientific literature
13 Example: LaTeX source
    \documentstyle[12pt]{article}
    \title{The Formation of Hierarchical Structures
    in a Pseudo-Spatial Co-Evolutionary Artificial
    Life Environment}
    \author{David Cornforth}
    \date{8 March 2005}
    \begin{document}
    \maketitle
    \section{Introduction}
    One of the key questions in the study of
    Artificial Life is to understand ``open-ended
    complexity''. That is, how do increasingly complex
    structures and behaviour arise in natural
    systems? In particular, is it possible to capture
    this phenomenon within a simulation model [1]?
    \end{document}
14 Markup languages: a brief history
- Xanadu
  - Ted Nelson, 1960 to present
  - A full-scale design for hypertext, including multi-way links, version control and licensing mechanisms
  - One of the ancestors of HTML and the WWW
  - http://www.xanadu.net/
15 Markup languages: a brief history
- Dynabook
  - Alan Kay, late 1960s to 1970s (Xerox PARC)
  - The first seriously GUI-based hypertext system
  - Clicking on links and other metaphors now embedded in the WWW
  - http://ftp.sheridanc.on.ca/randy/design.dir/software.dir/alan_kay.htm
16 Markup languages: a brief history
- HyperCard
  - Bill Atkinson, 1987
  - As much a graphical programming environment as a hypertext system
  - Nevertheless, the first widely available GUI-based hypertext environment
17 Markup languages: a brief history
- Office Document Architecture (ODA)
  - Initially an IBM internal effort; initiated as part of the ISO OSI standards effort in the early 1980s, and issued as ISO standard 8613 in 1987
  - Defines a language for describing the structure of office documents; ODIF (Office Document Interchange Format) specifies the exact format to be used for the interchange of documents between computer systems
  - Distinguishes between a document's LOGICAL structure and its LAYOUT structure: the logical structure associates the content of the document with a hierarchical tree of logical objects, whereas the layout structure associates the same content with a hierarchical tree of layout objects
  - (ISO: International Organization for Standardization)
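To make the logical/layout distinction concrete, here is a small Python sketch of two trees that share the same content leaves. It is not ODA's actual object model: the `Node` class and every label (`section`, `page 1`, `body frame` and so on) are hypothetical, chosen only to show one body of content organised two ways.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A node in either tree: a labelled object whose children are Nodes or content strings."""
    label: str
    children: list = field(default_factory=list)

# The content items exist once; both trees refer to the same objects.
title = "Quarterly report"
para1 = "Sales rose this quarter."
para2 = "Costs were flat."

# Logical structure: what each piece of content *is*.
logical = Node("document", [
    Node("title", [title]),
    Node("section", [Node("paragraph", [para1]), Node("paragraph", [para2])]),
])

# Layout structure: where each piece of content *goes*.
layout = Node("document", [
    Node("page 1", [Node("header frame", [title]), Node("body frame", [para1])]),
    Node("page 2", [Node("body frame", [para2])]),
])

def leaves(node):
    """Yield content leaves in depth-first, left-to-right order."""
    for child in node.children:
        if isinstance(child, Node):
            yield from leaves(child)
        else:
            yield child

if __name__ == "__main__":
    print(list(leaves(logical)))
    print(list(leaves(layout)))
```

Walking either tree yields the same content in the same order; only the grouping differs, which is what lets layout change (re-pagination, for instance) without touching the logical structure.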
18 Markup languages: a brief history
- Generalised Markup Language (GML)
  - Also initially an IBM internal effort, initiated by Charles Goldfarb; concentrated on the definition of logical structure, with an emphasis on language extensibility
  - Migrated out of IBM to become the ISO SGML standard
19 Markup languages: a brief history
- Standard Generalised Markup Language (SGML)
  - Became an ISO standard (ISO 8879) in 1986
  - Initially very slow to gain acceptance
  - Nevertheless the root of almost all important markup developments since that date
20 Example: SGML markup
    <chapter>
    <chaphd> SGML tags </chaphd>
    <chapabs> In this chapter, the various properties
    of SGML tags are investigated in monotonous
    detail. </chapabs>
    <chaptxt> <para> Tags are the basic markup
    elements of SGML, ..... </para>
    <fig id="tagfig">
    <figbody>
    <artwork depth="24p"> </artwork>
    </figbody>
    <figcap> A Picture of Pure Monotony </figcap>
    </fig>
    <para> Yet more monotonous detail about SGML tags
    </para>
21 Markup languages: a brief history
- Hypertext Markup Language (HTML)
  - Designed by Tim Berners-Lee at CERN in the late 1980s
  - Originally intended as a means of communication between physicists
  - Intended to be an SGML instance language, though mistakes in the original design and a poor separation of logical and layout structures meant that it was originally non-compliant
  - Led to a number of HTML-specific browsers (Lynx, Mosaic, Netscape, Internet Explorer, etc.): highly efficient, but unusable for other SGML
22 Example: HTML markup
    <HTML><HEAD>
    <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
    <TITLE>SGML</TITLE></HEAD>
    <BODY>
    <B><FONT FACE="Helvetica" SIZE=7><P ALIGN="CENTER">Further High Level Representation
    Standards</P>
    </FONT><FONT FACE="Helvetica" SIZE=6><P>DSSSL</P>
    </B></FONT><FONT FACE="Helvetica" SIZE=5><P ALIGN="JUSTIFY">(Document Style, Semantics and
    Specification Language)</P>
23 Markup languages: a brief history
- Document Style, Semantics and Specification Language (DSSSL)
  - The other half of the SGML logic/layout separation
  - A language for specifying the layout of documents based on their logical (SGML) structure; also provides a mechanism for transforming from one SGML language to another
  - Standardised in 1996 (ISO/IEC 10179)
  - Syntax and semantics heavily based on Scheme
  - But because of SGML's flexibility, actually an SGML-compliant language
24 Example: DSSSL style specification
    (element note
      (make sequence
        (make paragraph font-size: 14pt
                        font-weight: 'bold
          (literal "Warning: "))
        (make paragraph font-size: 12pt
                        font-weight: 'medium
          (process-children))
        (make paragraph font-size: 14pt
                        font-weight: 'bold
          (literal "You have been warned! "))))
25 Markup languages: a brief history
- Extensible Markup Language (XML)
  - An SGML sub-language
  - Restrictions intended to:
    - retain most of the flexibility of SGML
    - approximate the processing cost of HTML
    - not require a document type definition for tokenisation (though validation obviously requires a DTD or schema)
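A minimal sketch of tokenising without a DTD, using the non-validating pull parser in Python's standard `xml.etree.ElementTree` module. The `memo` document and its element names are invented for illustration. No DOCTYPE is present, yet every element boundary is found from the syntax alone.

```python
import xml.etree.ElementTree as ET

# A small document with no DOCTYPE and no DTD (element names are illustrative).
DOC = "<memo><to>Patel</to><from>Smith</from><body>Order shipped.</body></memo>"

def tokenise(text):
    """Return the (event, tag) pairs reported by a non-validating parse.

    Because XML requires every element to be explicitly closed, the parser
    can report each start and end boundary without consulting any DTD.
    """
    parser = ET.XMLPullParser(events=("start", "end"))
    parser.feed(text)
    events = [(event, elem.tag) for event, elem in parser.read_events()]
    parser.close()  # signals end of input; an unfinished document raises ET.ParseError
    return events

if __name__ == "__main__":
    for event, tag in tokenise(DOC):
        print(event, tag)
```

Checking that `<to>` may only appear inside `<memo>` would need a DTD or schema; merely splitting the stream into elements does not.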
26 Example: XML document
    <?xml version="1.0"?>
    <!DOCTYPE PLAY PUBLIC "-//Free Text Project//DTD Play//EN">
    <PLAY>
    <TITLE>The Tragedy of Hamlet, Prince of Denmark</TITLE>
    <fm><p>Text placed in the public domain by Moby Lexical Tools, 1992.</p>
    <p>XML version by Jon Bosak, 1996-1997.</p></fm>
    <PERSONAE>
    <TITLE>Dramatis Personae</TITLE>
    <PERSONA>CLAUDIUS, king of Denmark. </PERSONA>
    <PERSONA>HAMLET, son to the late, and nephew to
    the present king.</PERSONA>
27 Markup languages: a brief history
- Extensible Stylesheet Language (XSL)
  - The XML analogue of DSSSL
  - However, XML's inflexibility means that the syntax has to be standard XML syntax (i.e. fairly crude for a programming language)
  - Provides both layout and transformation facilities for XML
28 Example: XSL stylesheet
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:fo="http://www.w3.org/1999/XSL/Format">
    <!-- Here is the pattern part -->
    <xsl:template match="greeting">
    <!-- Here is the action part -->
    <fo:block color="red" font-size="16pt">
    <xsl:apply-templates/>
    </fo:block>
    </xsl:template>
    </xsl:stylesheet>
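The pattern/action idea behind XSL template rules can be imitated in a few lines of Python. This is an analogue, not XSLT: patterns are plain element names rather than XPath, the "action" builds an HTML-style string instead of an `fo:block`, and the names `render`, `process_children` and `RULES` are hypothetical.

```python
import xml.etree.ElementTree as ET

def render(elem, rules):
    """Apply the rule whose pattern matches elem.tag; with no match, just process the children."""
    action = rules.get(elem.tag)
    if action is not None:
        return action(elem, rules)
    return process_children(elem, rules)

def process_children(elem, rules):
    """Concatenate elem's own text with each rendered child and the text following it."""
    parts = [elem.text or ""]
    for child in elem:
        parts.append(render(child, rules))
        parts.append(child.tail or "")
    return "".join(parts)

RULES = {
    # Pattern: elements named "greeting".
    # Action: wrap the processed children in a red, 16pt span.
    "greeting": lambda e, r: (
        '<span style="color:red;font-size:16pt">' + process_children(e, r) + "</span>"
    ),
}

if __name__ == "__main__":
    doc = ET.fromstring("<letter><greeting>Hello, world</greeting></letter>")
    print(render(doc, RULES))
```

As with `xsl:apply-templates`, an element without a matching rule is not dropped: its children are processed in turn, so rules apply wherever the pattern occurs in the tree.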
29 Why XML?
- SGML provides all the flexibility needed for the web, but:
  - Its processing costs are unacceptable
  - SGML documents cannot be processed without their defining DTDs, which may be many times larger than the documents themselves, so communication costs may also be unacceptable
30 Why XML?
- HTML provides the speed and DTD-independence needed for the web, but:
  - It is too generic: its syntax cannot match the semantics of the content
  - It is thus of limited usefulness in communicating structured information
31 Why XML?
- XML provides a compromise between SGML and HTML:
  - A generic language like SGML
  - Restricted syntax (e.g. no omitted end tags, a single character set) permits unambiguous tokenisation even without a DTD
  - Restricted syntax also permits HTML-like processing speeds
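The "no omitted end tags" restriction can be checked directly with Python's standard XML parser. In this sketch the helper `is_well_formed` and the two sample strings are illustrative assumptions; the tag names echo the SGML example earlier in the deck. SGML may let a DTD imply the missing `</para>` end tags, but an XML parser has no DTD to consult, so it rejects the first form outright.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if text parses as XML, False on a syntax error."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

# SGML-style markup relying on a DTD to imply the missing </para> end tags.
OMITTED = "<chaptxt><para>First paragraph<para>Second paragraph</chaptxt>"
# The same content with every end tag written out, as XML requires.
EXPLICIT = "<chaptxt><para>First paragraph</para><para>Second paragraph</para></chaptxt>"

if __name__ == "__main__":
    print(is_well_formed(OMITTED))
    print(is_well_formed(EXPLICIT))
```

Because the explicit form leaves nothing to infer, a generic parser can tokenise any XML document the same way, which is part of what keeps processing cheap.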
32 What will XML do for us?
- Search and display of structured data
  - I.e. the obvious extension of the uses of HTML
  - With better (i.e. structured) searching capabilities
- Standardisation of data within organisations
  - The structuring capabilities of XML mean that organisations can standardise the structure of their information through schemas
- Standardisation of data between organisations
  - Entire industries can standardise on data-exchange formats, enabling new levels of e-commerce service
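A short sketch of what "structured searching" means in practice, using the limited XPath support in Python's `xml.etree.ElementTree`. It reuses a fragment of the Hamlet markup from slide 26, with the DOCTYPE line left out so the example is self-contained; `find_text` is a hypothetical helper. A plain text search for "TITLE" cannot tell the play's title from the cast list's heading, but a path query can.

```python
import xml.etree.ElementTree as ET

# Fragment of the Hamlet markup (DOCTYPE omitted to keep the example self-contained).
PLAY = """<PLAY>
<TITLE>The Tragedy of Hamlet, Prince of Denmark</TITLE>
<PERSONAE>
<TITLE>Dramatis Personae</TITLE>
<PERSONA>CLAUDIUS, king of Denmark. </PERSONA>
<PERSONA>HAMLET, son to the late, and nephew to the present king.</PERSONA>
</PERSONAE>
</PLAY>"""

def find_text(xml_text, path):
    """Return the stripped text of every element matching an ElementTree path."""
    root = ET.fromstring(xml_text)
    return [(e.text or "").strip() for e in root.findall(path)]

if __name__ == "__main__":
    print(find_text(PLAY, "TITLE"))             # only the play's own title
    print(find_text(PLAY, ".//TITLE"))          # every TITLE at any depth
    print(find_text(PLAY, "PERSONAE/PERSONA"))  # just the cast list
```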
33 What will XML do for us?
- Encapsulation of metadata
  - XML has permitted the definition of extended languages to encode structured metadata
  - The semantic web becomes a real possibility
- Standardisation of hierarchical databases
  - The standardisation of SQL led to the recent dominance of relational databases
  - The standardisation permitted by XML is leading to a major resurgence of hierarchical data storage
34 What will XML do for us?
- Reduction of middleware
  - XML solutions provide a real alternative to much of what is currently done in middleware
  - We can expect to see a massive reduction in the use of programming solutions for middleware requirements
- Enabling diverse enterprise architectures
  - XML standards can provide the glue enabling diverse specialised systems to cooperate
  - We can expect such solutions to provide serious competition to monolithic enterprise resource systems
35 What will XML do for us?
- Integration of legacy systems
  - XML solutions can provide standards for the integration of legacy systems, provided their interfaces can be modified to interact with XML streams
  - However, SGML/DSSSL may be useful to provide full functionality:
    - Because almost any structured data stream can be treated as an SGML stream, DSSSL may be used to translate it into an XML stream without any requirement to alter the system itself
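The slide proposes DSSSL for this translation step. As a rough stand-in, here is a Python sketch of the same idea, assuming a hypothetical legacy system that emits one pipe-delimited record per line. The legacy output is only read and wrapped in XML, never modified at its source. The field and element names are illustrative.

```python
import xml.etree.ElementTree as ET

# Hypothetical legacy output: one pipe-delimited record per line.
LEGACY = """Smith|56|45.80
Patel|34|39.50
Wicks|23|47.50"""

def legacy_to_xml(stream, fields=("customer", "units", "price")):
    """Wrap each delimited record in XML elements; the legacy system itself is untouched."""
    root = ET.Element("sales")
    for line in stream.splitlines():
        if not line.strip():
            continue  # skip blank lines in the legacy stream
        sale = ET.SubElement(root, "sale")
        for name, value in zip(fields, line.split("|")):
            ET.SubElement(sale, name).text = value.strip()
    return ET.tostring(root, encoding="unicode")

if __name__ == "__main__":
    print(legacy_to_xml(LEGACY))
```

The translator lives entirely outside the legacy system, so the system keeps producing its old format while downstream consumers see XML.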
36 What will XML do for us?
- Plus all the things we haven't thought of
  - Like SGML, XML is already being used for purposes far beyond the wildest imaginings of its creators
  - E.g. slideware (a replacement for PowerPoint?)
37 Questions?