Title: What is XML?
Slide 1: Outline
I. What is XML?
- XML and HTML
- Where does it fit in with other markup languages?
II. How does it work?
- Your own private language
- DTDs and schemas
- XSLT: Extensible Stylesheet Language Transformations
- XPath, XLink, XPointer, XForms
III. How will it change the web?
- Examples of XML applications
Slide 2: I. What is XML?
- XML is Extensible Markup Language
- It is a meta-language
  - It is a language used to create languages that can describe data
- It is extensible
  - Authors can define their own tags and attributes that can be easily processed and displayed across platforms
- XML became a World Wide Web Consortium (W3C) Recommendation on 2/10/98, corrected 10/6/00
  - http://www.w3.org/TR/REC-xml
Slide 3:
- Phase 1: Began 6/96, ended in the W3C XML 1.0 Recommendation, 2/98 (revised 10/00)
- Phase 2: Began 2/98
  - Working Groups developed the Recommendation Namespaces in XML (1/99) and the Recommendation for style sheet linking (6/99)
- Phase 3: Began 9/99 with unfinished work from phase 2, and ended 5/02
  - Introduced a Working Group on XML Query
  - XML Protocol Activity was launched in 9/00
- Phase 4: Began 5/02, with a focus on completing work in progress, cleaning up existing specs, and aligning them better with each other and with other W3C specifications
- http://www.w3.org/XML/Activity
Slide 4: So what's wrong with HTML?
- It's simple enough for children to use
  - This is because it is rigid and inflexible
- It does a good job representing the structure and format of documents
  - It can't tell us anything about the meaning of documents
- It can be used across platforms
  - It is rife with proprietary markup
- It can be searched
  - The inability of search engines to capture the meaning of content leads to poor performance
Slide 5: So what's right with XML?
- It should be easily usable over the Internet
  - Web servers should require minimal configuration changes to be able to serve XML documents
- It should be easy to write programs that process XML documents
  - Experimental XML software is written in Java, with some XML parsers contained in class files of a few KB
- XML documents should be human-legible and clear
  - Users of XML can create their own tags and attributes with self-explanatory names
  - An XML file should be as readable as plain text
Slide 6:
- The XML standard should be prepared quickly
- The design of XML shall be formal and concise
  - Syntax descriptions in the XML specification use a formal grammar that is concise, easy to understand, and easy to translate into code
- XML documents shall be easy to create
  - Well-formedness enables you to quickly mark up any document or translate it from HTML to XML
- Terseness in XML markup is of minimal importance
  - Clear and unambiguous syntax is always given preference over saving a few keystrokes
Slide 7: XML is used to create specialized markup languages by defining sets of tags and attributes
- It is a subset of SGML and allows generalized markup
- It is useful for storing structured data that will be published in a variety of media
- By itself, XML does not define any tags
  - You create your own tags (your own markup language)
  - CML: Chemical Markup Language
  - MathML: Mathematical Markup Language
  - ebXML: Electronic Business XML
- Properly done, XML documents can be viewed across platforms
Slide 8: XML describes data in a human-readable and machine-understandable format
- This format is intended to capture the meaning of the data
  - There is no indication of how the data are to be displayed
- It is a database-neutral and device-neutral language
  - Data marked up in XML can be targeted to different formats
  - XML can also be used to publish data on different platforms
Slide 9: Some relationships among markup languages
[Diagram relating SGML, XML, HTML, HTML 3.2, HTML 4.01, XHTML, CML, MML, ebXML, XSLT, and CSS]
Slide 10: How XML supports other Web markup languages and applications
- http://www.w3.org/XML/Activity
Slide 12: II. How does it work?
An XML document can be composed of three different files
1. The raw XML file (.xml)
- This file has the basic data marked up with XML tags
- It will contain markup that links the file to both the DTD (or schema) and the XSL stylesheet
- It must follow certain rules to be considered well formed and, when a DTD is used, valid
  - This is necessary if the document is to be displayed by a browser or processed by a parser
Slide 13: Here's a simple HTML document
    <html>
    <head>
    <title>Memo form</title>
    </head>
    <body>
    <b>4.10.01</b><br />
    <b>TO:</b> John Doe<br />
    <b>CC:</b> Jane Doe<br />
    <b>FROM:</b> Bozo T. Clown<br />
    <p>Please take note our phone number has changed.</p>
    <p>Yours in clownitude,<br /> Bozo</p>
    </body>
    </html>
Slide 14: XML reflects the structure of the data by creating tags identifying
- The type of document as a <memo>
- Its content divisions: a <header> and a <memotext>
- When it was sent: <date>
- An addressing scheme with two types of actions: <to> and <cc>
- The sender of the message as <sender>
- The name of the recipient as <name>
- The text of the memo: <memotext>
- The signature as an entity called sig (written &sig; in the document)
Slide 15: Here's the same document as an XML file
    <?xml version="1.0" standalone="no"?>
    <!DOCTYPE memo SYSTEM "http://www.site.com/dtds/memo.dtd">
    <memo>
      <header type="informative">
        <date>04.10.01</date>
        <to>
          <name>John Doe</name>
        </to>
        <cc>
          <name>Jane Doe</name>
        </cc>
        <from>
          <sender>Bozo T. Clown</sender>
        </from>
      </header>
      <memotext>Please take note our phone number has changed. &sig;</memotext>
    </memo>
- The <to>, <cc>, and <from> tags identify each role, so the labels "To", "CC", and "From" are not repeated as text
Slide 16: Rules for writing XML
- There must be a root element
- Documents must be well formed
  - Elements must be properly nested
- If a DTD is used, documents must be valid
  - Markup in the document must conform to the DTD
- Every tag must be closed
  - Empty tags are closed with a slash: <picture />
- XML is case sensitive
- All attribute values must be in quotation marks
- All entity references, apart from the five predefined ones (&lt; &gt; &amp; &apos; &quot;), must be declared in a DTD before being used in a document
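The well-formedness rules above can be tried out with any XML parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the sample strings are illustrative assumptions, not part of the original slides. It checks well-formedness only, not validity against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical samples, each breaking one rule from the slide.
GOOD = '<memo><header type="informative"><date>04.10.01</date></header></memo>'
BAD_NESTING = "<memo><header><date>04.10.01</header></date></memo>"
BAD_QUOTES = "<memo><header type=informative></header></memo>"
BAD_CASE = "<memo></Memo>"

for sample in (GOOD, BAD_NESTING, BAD_QUOTES, BAD_CASE):
    print(is_well_formed(sample), sample)
```

Only the first sample parses; the others fail on nesting, unquoted attribute values, and case mismatch respectively.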
Slide 17: 2. A Document Type Definition (DTD)
- It is a set of rules that defines the elements, attributes, entities, and other components that can be used in XML files
  - It determines how they can be used
  - It also specifies how they are logically related
- Elements in a DTD are hierarchical and nested
- DTDs can be internal (within the document) or external (.dtd extension)
- For the XML document to be valid, it must conform to the rules laid out in the DTD to which it is linked
Slide 18: DTDs have Elements
- These are the basic tags used in the markup
- One must be a root element and is the most inclusive container
  - All other elements are nested within it
- An element can be defined by using other elements
  - It can also be defined as containing text (#PCDATA)
- The sequence determines the nesting
  - Elements defined in the DTD must appear in the document
  - There is special markup that allows choice
Slide 19: The generic form of an element is
    <!ELEMENT element_name rule>
- The rule is the content model of the element
  - It specifies the nested elements used to define the main element
  - It also specifies the order in which the elements must appear
- In our example the root element is <memo>
  - It is defined in terms of <header> and <memotext>
  - It is written as
    <!ELEMENT memo (header, memotext)>
Slide 20: DTDs have Attributes
- These contain additional information associated with the element
- The information is a form of metadata
  - It is about the element rather than part of the element
- They are useful for enumerated data (e.g. product id)
- There is a small predefined set of attributes that can be used
- Attributes and their values appear in the opening tag of a paired tag (or in the unpaired tag)
Slide 21: The generic form of an attribute is
    <!ATTLIST element_name
        attribute_name attribute_type default_value
        attribute_name attribute_type default_value
        attribute_name attribute_type default_value>
- The element name is required because attributes must be attached to elements
- There is a set of attribute types that can be used to specify categories of content, for example
  - CDATA: character data (anything except markup)
  - ID: a unique value (can appear only once in a document)
  - NOTATION: names a declared notation that tells the application how to handle non-XML data (such as how to open a binary file)
Slide 22: In our example there is an attribute called type that is placed in the opening <header> tag
- The value is informative
- Assume this is one of several types of memos that could be sent
- In a DTD, it might look like this
    <!ATTLIST header type (informative | directive | scheduling) #IMPLIED>
- The | (pipe) is a separator
  - It sets a condition where only one value from the list may appear in the document markup
- A default declaration must follow the list; #IMPLIED (an assumption here) makes the attribute optional
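An enumerated attribute amounts to a membership test on the attribute's value. Here is a minimal sketch of that check in Python, assuming the type attribute sits on <header> as in the memo example; the function name header_type_is_allowed is hypothetical, and a validating parser would normally perform this check from the DTD itself.

```python
import xml.etree.ElementTree as ET

# The three values listed in the enumerated attribute declaration.
ALLOWED_TYPES = {"informative", "directive", "scheduling"}


def header_type_is_allowed(memo_xml):
    """Return True if <header type="..."> carries one of the listed values."""
    root = ET.fromstring(memo_xml)
    header = root.find("header")
    # A missing header or a missing attribute counts as "not allowed" here.
    return header is not None and header.get("type") in ALLOWED_TYPES


print(header_type_is_allowed('<memo><header type="informative"/></memo>'))
print(header_type_is_allowed('<memo><header type="urgent"/></memo>'))
```

The first call reports an allowed value; the second does not, because "urgent" is not in the list.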
Slide 23: Entities provide a type of shorthand in XML markup
- They reference text or other elements and call them when used in the DTD or document
- General entities place data into the document
  - Internal means that they are used only within the document
  - External means that they are in an external DTD and can be reused
- Parameter entities are used in the DTD
  - They can refer to another element or group of elements and can be reused in the same or different DTDs
Slide 24: The entity has the generic form
    <!ENTITY entity_name "text string">
- In the example, it appears in the DTD as
    <!ENTITY sig "Yours in clownitude, Bozo">
- In our example, we represented a text string with an entity
  - "Yours in clownitude, Bozo" was represented in the document with &sig;
- The entity is expanded when the document is parsed
- This is a convenient way to include large blocks of text that only have to be entered once
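Entity expansion at parse time can be observed directly. The sketch below uses Python's standard xml.etree.ElementTree module and, as an assumption made to keep the example self-contained, declares the sig entity in an internal DTD subset rather than in an external memo.dtd.

```python
import xml.etree.ElementTree as ET

# The entity is declared inside the document (internal subset) so that no
# external DTD file is needed for this illustration.
MEMO = """<?xml version="1.0"?>
<!DOCTYPE memo [
  <!ENTITY sig "Yours in clownitude, Bozo">
]>
<memo><memotext>Please take note our phone number has changed. &sig;</memotext></memo>"""

root = ET.fromstring(MEMO)
memotext = root.find("memotext").text

# The parser has already replaced &sig; with its declared text.
print(memotext)
```

By the time the application sees the text, the reference is gone and only the expanded string remains.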
Slide 25: Here's what a DTD (memo.dtd) would look like for this memo
    <!ENTITY sig "Yours in clownitude, Bozo">
    <!ELEMENT memo (header, memotext)>
    <!ELEMENT header (date, to, cc?, from)>
    <!-- #IMPLIED is an assumed default; it makes the attribute optional -->
    <!ATTLIST header type (informative | directive | scheduling) #IMPLIED>
    <!ELEMENT date (#PCDATA)>
    <!ELEMENT to (name)>
    <!ELEMENT name (#PCDATA)>
    <!ELEMENT cc (name)>
    <!ELEMENT from (sender)>
    <!ELEMENT sender (#PCDATA)>
    <!ELEMENT memotext (#PCDATA)>
Key
- + must appear at least once or many times
- ? may be omitted or can appear once
- * may be omitted or can appear many times
- | one or the other but only one may appear
- #PCDATA text
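A content model with sequence and occurrence indicators behaves much like a regular expression over child element names. The sketch below illustrates that idea for the header model (date, to, cc?, from); it is not how a validating parser is implemented, and the names HEADER_MODEL and matches_model are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# (date, to, cc?, from) rewritten as a regular expression over child names.
# Each name is followed by a comma so that names cannot run together.
HEADER_MODEL = re.compile(r"date,to,(cc,)?from,")


def matches_model(element, model):
    """Check the order and presence of an element's direct children."""
    child_names = "".join(child.tag + "," for child in element)
    return model.fullmatch(child_names) is not None


with_cc = ET.fromstring("<header><date/><to/><cc/><from/></header>")
without_cc = ET.fromstring("<header><date/><to/><from/></header>")
wrong_order = ET.fromstring("<header><to/><date/><from/></header>")

for header in (with_cc, without_cc, wrong_order):
    print(matches_model(header, HEADER_MODEL))
```

The first two headers satisfy the model because cc is optional; the third fails because the sequence fixes the order of date and to.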
Slide 26: Schemas
- XML Schemas are an alternative to DTDs
- DTDs are global, so an element can only be defined once
  - This is a problem if the element is used differently in two different contexts
  - Schemas allow global (the same everywhere) and local (differing by context) elements
- DTDs cannot specify the data type of an element
  - Schemas can specify data types
- DTDs are not written in XML
  - Schemas are
Slide 27: Schemas divide content into two types
- Simple types
  - These contain only text
  - In DTDs these are represented by #PCDATA content (a name, integer, date)
- Complex types
  - These elements define the structure of the document
  - Some will contain other elements
  - Some will contain elements and text
  - Some will contain only text
  - Some will be empty
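To make the idea of a simple type concrete: a typed schema constrains the text itself, which #PCDATA in a DTD cannot do. The sketch below checks a simplified version of the xsd:date lexical form (YYYY-MM-DD); as stated assumptions, it ignores the optional time zone suffix and years outside four digits, and the function name is_xsd_date is hypothetical.

```python
import re
from datetime import date

# Simplified xsd:date pattern: four-digit year, two-digit month and day.
XSD_DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def is_xsd_date(text):
    """Return True if text looks like a valid calendar date in YYYY-MM-DD form."""
    match = XSD_DATE.fullmatch(text)
    if match is None:
        return False
    year, month, day = (int(part) for part in match.groups())
    try:
        date(year, month, day)  # rejects impossible dates such as February 30
    except ValueError:
        return False
    return True


for candidate in ("2001-04-10", "04.10.01", "2001-02-30"):
    print(candidate, is_xsd_date(candidate))
```

Under a DTD all three strings are acceptable text; under a date type only the first would pass.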
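Slide 28: Here is the memo DTD as a schema
    <?xml version="1.0"?>
    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
      <xsd:element name="name" type="xsd:string"/>
      <xsd:element name="memo">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="header">
              <xsd:complexType>
                <xsd:sequence>
                  <xsd:element name="date" type="xsd:date"/>
                  <xsd:element name="to" type="recipient"/>
                  <xsd:element name="cc" type="recipient" minOccurs="0"/>
                  <xsd:element name="from">
                    <xsd:complexType>
                      <xsd:sequence>
                        <xsd:element name="sender" type="xsd:string"/>
                      </xsd:sequence>
                    </xsd:complexType>
                  </xsd:element>
                </xsd:sequence>
                <xsd:attribute name="type" type="memoType"/>
              </xsd:complexType>
            </xsd:element>
            <xsd:element name="memotext" type="xsd:string"/>
          </xsd:sequence>
        </xsd:complexType>
      </xsd:element>
      <!-- recipient and memoType are helper type names assumed for this sketch -->
      <xsd:complexType name="recipient">
        <xsd:sequence><xsd:element ref="name"/></xsd:sequence>
      </xsd:complexType>
      <xsd:simpleType name="memoType">
        <xsd:restriction base="xsd:string">
          <xsd:enumeration value="informative"/>
          <xsd:enumeration value="directive"/>
          <xsd:enumeration value="scheduling"/>
        </xsd:restriction>
      </xsd:simpleType>
    </xsd:schema>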
Slide 29: Here is how the memo calls the schema
    <?xml version="1.0"?>
    <memo xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:noNamespaceSchemaLocation="/xml/ns/memo.xsd">
      <header type="informative">
        <date>04.10.01</date>
        <to>
          <name>John Doe</name>
        </to>
        <cc>
          <name>Jane Doe</name>
        </cc>
        <from>
          <sender>Bozo T. Clown</sender>
        </from>
      </header>
      <memotext>Please take note our phone number has changed. Yours in clownitude, Bozo</memotext>
    </memo>
- A schema cannot declare entities, so the signature is written out here instead of using the sig entity
- To validate against the xsd:date type, the date would have to be written in ISO form (YYYY-MM-DD)
Slide 30: 3. An XSL stylesheet
- This file contains transformation rules that determine how the components of an XML file will be rendered and displayed in a range of formats (.xsl extension)
- With XSL-FO, specific formatting or style rules can be applied to specific components of a DTD
  - This language is not supported by any browsers yet
- With XSLT, a transformation process can be specified to convert XML documents into other formats (HTML, RTF, LaTeX, text)
  - This can be used
- An XSL stylesheet is also an XML document and must be "well formed"
Slide 31: The process begins with an XML document and an XSLT style sheet
- The XSLT processor translates both into trees
  - The XML document is the source tree
  - The XSLT style sheet is the style tree
- Trees consist of nodes
  - Root node
  - Element nodes
  - Text nodes
  - Attribute nodes
  - Processing instruction nodes
  - Namespace nodes
- The XSLT processor uses these trees to create a result tree
  - This becomes the final or result document
Slide 32: XML Memo as a source tree
    Memo
      Header
        Date
          PCDATA
        To
          Name
            PCDATA
        CC
          Name
            PCDATA
        From
          Sender
            PCDATA
      Memotext
        PCDATA
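The source tree can be reproduced by walking a parsed document. This is a minimal sketch using Python's standard xml.etree.ElementTree module; the outline helper and the inline MEMO string (with the sig entity left out so that no DTD is needed) are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# The memo from the slides, without the &sig; entity reference.
MEMO = (
    '<memo><header type="informative"><date>04.10.01</date>'
    "<to><name>John Doe</name></to><cc><name>Jane Doe</name></cc>"
    "<from><sender>Bozo T. Clown</sender></from></header>"
    "<memotext>Please take note our phone number has changed.</memotext></memo>"
)


def outline(element, depth=0):
    """Return one indented line per element node, in document order."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


tree_lines = outline(ET.fromstring(MEMO))
print("\n".join(tree_lines))
```

Only element nodes are listed here; text and attribute nodes hang off these elements in the same tree.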
Slide 33: And here's what the XSL stylesheet might look like
    <?xml version="1.0"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
      <xsl:template match="/">
        <html>
          <head><title>Memo form</title></head>
          <body>
            <xsl:apply-templates select="memo/header" />
            <!-- the sig entity is already expanded inside memotext -->
            <p><xsl:value-of select="memo/memotext" /></p>
          </body>
        </html>
      </xsl:template>
      <xsl:template match="header">
        <b><xsl:value-of select="date" /></b><br />
        <b><xsl:value-of select="to/name" /></b><br />
        <b><xsl:value-of select="cc/name" /></b><br />
        <b><xsl:value-of select="from/sender" /></b>
      </xsl:template>
    </xsl:stylesheet>
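Python's standard library has no XSLT processor, so the sketch below is a hand-written stand-in rather than an XSLT run: it builds the same kind of HTML result tree from the memo source tree, which may help show what the stylesheet asks a processor to do. The function name memo_to_html and the inline MEMO string are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# The memo from the slides, without the &sig; entity reference.
MEMO = (
    '<memo><header type="informative"><date>04.10.01</date>'
    "<to><name>John Doe</name></to><cc><name>Jane Doe</name></cc>"
    "<from><sender>Bozo T. Clown</sender></from></header>"
    "<memotext>Please take note our phone number has changed.</memotext></memo>"
)


def memo_to_html(memo):
    """Build an HTML result tree from a <memo> source tree."""
    html = ET.Element("html")
    head = ET.SubElement(html, "head")
    ET.SubElement(head, "title").text = "Memo form"
    body = ET.SubElement(html, "body")
    # Each path plays the role of a select expression in the stylesheet.
    for path in ("header/date", "header/to/name", "header/cc/name", "header/from/sender"):
        value = memo.findtext(path)
        if value is None:  # cc is optional in the DTD
            continue
        ET.SubElement(body, "b").text = value
        ET.SubElement(body, "br")
    ET.SubElement(body, "p").text = memo.findtext("memotext")
    return html


result = memo_to_html(ET.fromstring(MEMO))
html_text = ET.tostring(result, encoding="unicode")
print(html_text)
```

The source document is left untouched; a different function (or stylesheet) could target another output format from the same memo.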
Slide 34: There are other components of XML that greatly extend its power and flexibility
XPath
- This is a syntax that locates nodes in the hierarchical structure of an XML document
- It is used in XSLT
    <xsl:template match="node_name">
  - This specifies the current node
- It uses patterns; these can be repeated throughout the document
- It also uses expressions; these are context specific
- This syntax is a sophisticated shorthand to use when writing processing rules
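Path expressions can be tried without an XSLT processor. Python's xml.etree.ElementTree supports a limited subset of XPath, which is enough for a minimal sketch against the memo; the inline MEMO string and variable names are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# The memo from the slides, without the &sig; entity reference.
MEMO = (
    '<memo><header type="informative"><date>04.10.01</date>'
    "<to><name>John Doe</name></to><cc><name>Jane Doe</name></cc>"
    "<from><sender>Bozo T. Clown</sender></from></header>"
    "<memotext>Please take note our phone number has changed.</memotext></memo>"
)

root = ET.fromstring(MEMO)

# A path relative to the context node (here the root <memo> element).
recipient = root.findtext("header/to/name")

# .// searches all descendants, at any depth.
names = [element.text for element in root.findall(".//name")]

# A predicate in square brackets filters on an attribute value.
informative = root.findall("header[@type='informative']")

print(recipient, names, len(informative))
```

The same path gives different results from a different context node, which is what "context specific" means in practice.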
Slide 35: XLink
- This is the XML Linking Language
- It allows more complex types of linking
- Here's a simple link
    <logo xmlns:xlink="http://www.w3.org/1999/xlink"
          xlink:type="simple"
          xlink:href="../images/logo.gif"
          xlink:role="image"
          xlink:title="logo"
          xlink:show="embed"
          xlink:actuate="onLoad" />
  - xlink:show can also take the values replace and new
- XLink defines linksets or extended links
  - A set of files can be connected through a chain of links moving from the first to the last file in the linkset
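XLink attributes live in their own namespace, so an application reads them by namespace rather than by the xlink prefix. The sketch below shows this with Python's standard xml.etree.ElementTree module; the helper name xlink_attributes is hypothetical, and the sample element uses the XLink 1.0 namespace URI with the show and actuate values embed and onLoad.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"

LOGO = (
    '<logo xmlns:xlink="http://www.w3.org/1999/xlink" '
    'xlink:type="simple" xlink:href="../images/logo.gif" '
    'xlink:title="logo" xlink:show="embed" xlink:actuate="onLoad" />'
)


def xlink_attributes(element):
    """Return the element's XLink attributes keyed by local name."""
    prefix = "{" + XLINK_NS + "}"
    return {
        key[len(prefix):]: value
        for key, value in element.attrib.items()
        if key.startswith(prefix)
    }


link = xlink_attributes(ET.fromstring(LOGO))
print(link["type"], link["href"], link["show"], link["actuate"])
```

ElementTree stores namespaced attribute names in the form {namespace-uri}local-name, which is why the prefix chosen in the document does not matter.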
Slide 36: XPointer
- This is a syntax for linking to specific locations within XML documents
- It uses XPath expressions to define the locations
    xpointer(element_name[position()=1])
- This is appended to the end of a URL (after a #) in an XLink expression
XForms
- This is an XML application that is intended to allow more complex forms to be created in XHTML
Slide 38: III. How will it change the web?
- XML has interesting potential to change a portion of the web
- It is expected to move us closer to "write once, display anywhere" (XSLT)
- It will be an important component of the semantic web
  - Search engines that can process XML should be much more precise and return more relevant results
- It can improve business processes, particularly if professions develop their own markup languages
Slide 39: Examples of XML applications
Resource Description Framework (RDF)
- This is a framework that allows the description and interchange of metadata
- Because it is designed to be platform independent, it becomes a hub for metadata activity
- RDF provides a model for metadata, and a syntax so that independent parties can exchange it and use it
- RDF makes it possible to use multiple pieces of software to process the same metadata
- It also allows a single piece of software to process (at least in part) many different metadata vocabularies
Slide 40:
- Extensible Hypertext Markup Language (XHTML)
- Synchronized Multimedia Integration Language (SMIL)
- Mathematical Markup Language (MathML)
- Chemical Markup Language (CheML)
- Commerce Markup Language (CML)
- Electronic Business XML (ebXML)
- National Library of Medicine XML data formats
- Electronic Component Information Exchange (ECIX)
- Geography Markup Language (GML)
- Research Information Exchange Markup Language (RIXML)
- MARC to XML conversions