Title: Why XML
1. Why XML?
Problems with HTML
- HTML design:
  - HTML is intended for presenting information as Web pages.
  - HTML contains a fixed set of markup tags.
- This design is not appropriate for data:
  - Tags don't convey the meaning of the data inside them.
  - Tags are not extensible.
2. Motivation for XML
- The need for a more SGML-like markup language for representing metadata.
- The XML design:
  - Separates syntax (structural representation) from semantics, and only considers syntax.
  - Has no fixed set of markup tags: we may define our own tags, according to the information.
- The objective is to have XML as the universal format for structuring information.
- A formal specification of XML can be found at http://www.w3.org/TR/2000/REC-xml-20001006
3. Simple XML Example
<Bookstore>
  <Book ID="101">
    <Title>Introduction to XML</Title>
    <Author>John Doe</Author>
    <Date>12 June 2001</Date>
    <ISBN>121232323</ISBN>
    <Publisher>XYZ</Publisher>
  </Book>
  <Book ID="102">
    <Title>Introduction to XSL</Title>
    <Author>Foo Bar</Author>
    <Date>12 June 2001</Date>
    <ISBN>12323573</ISBN>
    <Publisher>ABC</Publisher>
  </Book>
</Bookstore>
- Make up your own tags.
- Title, Author, Date, ISBN and Publisher are sub-elements (properties) of Book.
- XML by itself is just hierarchically structured text.
  - We need some sort of grammar (for a Book, in this example) to check for correctness.
- A stylesheet is needed to define how the data will be shown.
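Because XML is just hierarchically structured text, any XML parser can read the bookstore without knowing what a Book means. A minimal sketch, assuming Python's standard `xml.etree.ElementTree` module (any XML parser would do), that loads the example and walks its hierarchy:

```python
import xml.etree.ElementTree as ET

# The Bookstore document from the example, as a string.
BOOKSTORE = """
<Bookstore>
  <Book ID="101">
    <Title>Introduction to XML</Title>
    <Author>John Doe</Author>
    <Date>12 June 2001</Date>
    <ISBN>121232323</ISBN>
    <Publisher>XYZ</Publisher>
  </Book>
  <Book ID="102">
    <Title>Introduction to XSL</Title>
    <Author>Foo Bar</Author>
    <Date>12 June 2001</Date>
    <ISBN>12323573</ISBN>
    <Publisher>ABC</Publisher>
  </Book>
</Bookstore>
"""

root = ET.fromstring(BOOKSTORE)  # root is the <Bookstore> element

# Each Book is a child of Bookstore; its properties are children of Book.
for book in root.findall("Book"):
    book_id = book.get("ID")         # attribute value
    title = book.findtext("Title")   # text of the <Title> child
    author = book.findtext("Author")
    print(book_id, title, author)
```

Running it prints one line per book, for example `101 Introduction to XML John Doe`. The parser only sees structure; nothing here checks that a Book actually has a Title, which is the job of a grammar such as a DTD.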
4. DTD: Document Type Definition
A DTD describes a class of XML documents (similar to a grammar for other languages) and thus enforces constraints on the structure of an XML document. A valid XML document is one that conforms to a given DTD.
Example DTD:
<!ELEMENT Bookstore (Book*)>
<!ELEMENT Book (Title, Author+, Date, ISBN, Publisher)>
<!ATTLIST Book ID CDATA #REQUIRED>
<!ELEMENT Title (#PCDATA)>
<!ELEMENT Author (#PCDATA)>
<!ELEMENT Date (#PCDATA)>
<!ELEMENT ISBN (#PCDATA)>
<!ELEMENT Publisher (#PCDATA)>
- Book*: the Bookstore element can have 0 or more Books.
- Sub-elements have to occur in the order specified.
- Author+ means one or more Authors.
- ID CDATA #REQUIRED: attribute ID of element Book is a required field.
- #PCDATA: Title is parsed character data, i.e. a string.
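A DTD content model behaves much like a regular expression over the sequence of child element names. The sketch below is not a real DTD validator (Python's standard library does not validate against DTDs; third-party libraries such as lxml do); the helper `book_is_valid` is hypothetical and simply hand-encodes the Book rules above:

```python
import re
import xml.etree.ElementTree as ET

# <!ELEMENT Book (Title, Author+, Date, ISBN, Publisher)> rewritten as a
# regular expression over space-separated child tag names:
# "," (sequence) becomes concatenation and "Author+" becomes "Author( Author)*".
BOOK_MODEL = re.compile(r"Title Author( Author)* Date ISBN Publisher")

def book_is_valid(book: ET.Element) -> bool:
    """Check one <Book> element against the DTD rules (illustrative sketch)."""
    # <!ATTLIST Book ID CDATA #REQUIRED>: the ID attribute must be present.
    if book.get("ID") is None:
        return False
    # Sub-elements must appear exactly in the declared order.
    child_tags = " ".join(child.tag for child in book)
    if BOOK_MODEL.fullmatch(child_tags) is None:
        return False
    # (#PCDATA): leaf elements may hold text but no child elements.
    return all(len(child) == 0 for child in book)

ok = ET.fromstring(
    '<Book ID="101"><Title>T</Title><Author>A</Author><Author>B</Author>'
    "<Date>D</Date><ISBN>1</ISBN><Publisher>P</Publisher></Book>"
)
print(book_is_valid(ok))  # True
```

Swapping Title and Author, or dropping the ID attribute, makes the check fail, mirroring what a validating parser would report.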
5. Problems with XML DTDs
- It's not XML syntax.
  - You write your XML document in one syntax and the DTD in another: inconsistent, and more work for the parsers.
- Limited set of primitive datatypes.
  - We want a set of datatypes compatible with those found in databases.
  - One of the main weaknesses of DTDs is their lack of support for data types beyond character strings (#PCDATA).
- Limited support for applying constraints.
  - DTDs support only occurrence constraints such as + (1 or more occurrences), ? (0 or 1 occurrence) and * (0 or more occurrences). There is no facility for constraints like those found in databases (enumerations, ranges, string length, etc.).
6. XML Schema
- XML Schemas are an improvement over DTDs.
- Enhanced datatypes:
  - A wider range of primitive data types, supporting those found in databases (string, boolean, decimal, integer, date, etc.).
  - You can create your own datatypes (simpleType, complexType).
- Written in the same syntax as XML documents.
- Object-oriented:
  - You can derive new type definitions from existing ones (refinement).
  - You can place further constraints on the range of data values, for example maxLength, totalDigits and fractionDigits (precision), enumeration, maxInclusive (upper bound) and minInclusive (lower bound).
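In a schema, facets like these are declared on a simple type (typically inside `xsd:restriction`), and a validator checks every value against them. A minimal sketch of what those checks amount to, assuming the facets are supplied as a plain Python dict; `check_facets` and the dict layout are hypothetical and not how a schema processor stores them:

```python
from decimal import Decimal, InvalidOperation

def check_facets(value: str, facets: dict) -> bool:
    """Apply a few XML Schema-style facets to one lexical value (sketch)."""
    if "maxLength" in facets and len(value) > facets["maxLength"]:
        return False
    if "enumeration" in facets and value not in facets["enumeration"]:
        return False
    if "minInclusive" in facets or "maxInclusive" in facets:
        try:
            number = Decimal(value)
        except InvalidOperation:
            return False  # not numeric, so a range facet cannot hold
        if not number.is_finite():
            return False  # reject NaN / Infinity
        if "minInclusive" in facets and number < Decimal(facets["minInclusive"]):
            return False
        if "maxInclusive" in facets and number > Decimal(facets["maxInclusive"]):
            return False
    return True

# Hypothetical rule: Publisher must be one of a fixed list of codes.
print(check_facets("XYZ", {"enumeration": ["XYZ", "ABC"]}))  # True
print(check_facets("150", {"minInclusive": "0", "maxInclusive": "100"}))  # False
```

Decimal is used rather than float so that range bounds compare exactly, which is closer to how xsd:decimal values behave.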
7. An important diversion: Namespaces
- What is a namespace?
  - The namespace of an element is the scope within which it (and thus its name) is valid (compare a basic block in C).
- Why do we need namespaces?
  - If elements were defined within a global scope, combining elements from multiple documents would be a problem: name collisions are hard to avoid.
  - Modularity: if a markup vocabulary exists that is well understood and has useful software available, it is better to reuse it than to reinvent it.
- Namespaces in XML
  - An XML namespace is a collection of names, identified by a URI reference.
  - Names from XML namespaces may appear as qualified names, which contain a single colon separating the name into a prefix and a local part. The prefix, which is mapped to a URI reference, selects a namespace.
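The effect of prefixes and URIs can be seen with any namespace-aware parser. A minimal sketch, assuming Python's `xml.etree.ElementTree`; the two `title` elements and their text are invented for illustration, and the URIs are just namespace names:

```python
import xml.etree.ElementTree as ET

# Two vocabularies that both define "title"; prefixes b and h are bound
# to different namespace URIs, so the names do not collide.
DOC = """
<b:Book xmlns:b="http://www.books.org"
        xmlns:h="http://www.w3.org/1999/xhtml">
  <b:title>Introduction to XML</b:title>
  <h:title>Bookstore catalogue page</h:title>
</b:Book>
"""

root = ET.fromstring(DOC)

# ElementTree expands each qualified name prefix:local into {URI}local.
for child in root:
    print(child.tag, "->", child.text)

# Lookups use a prefix-to-URI mapping chosen by the caller; the prefix
# "bk" differs from the document's "b" but selects the same namespace.
ns = {"bk": "http://www.books.org"}
print(root.find("bk:title", ns).text)  # Introduction to XML
```

Only the URI identifies the namespace; the prefix is a local alias, which is why the query can use `bk` while the document uses `b`.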
8. XML Schema example revisited
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.books.org"
            xmlns="http://www.books.org">
  <xsd:element name="Bookstore">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="Book" minOccurs="1" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="Book">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="Title" minOccurs="1" maxOccurs="1"/>
        <xsd:element ref="Author" minOccurs="1" maxOccurs="unbounded"/>
        <xsd:element ref="Date" minOccurs="1" maxOccurs="1"/>
        <xsd:element ref="ISBN" minOccurs="1" maxOccurs="1"/>
        <xsd:element ref="Publisher" minOccurs="1" maxOccurs="1"/>
      </xsd:sequence>
      <!-- Required ID attribute, as used in the example document -->
      <xsd:attribute name="ID" type="xsd:string" use="required"/>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="Title" type="xsd:string"/>
  <xsd:element name="Author" type="xsd:string"/>
  <xsd:element name="Date" type="xsd:date"/>
  <xsd:element name="ISBN" type="xsd:integer"/>
  <xsd:element name="Publisher" type="xsd:string"/>
</xsd:schema>
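- Prefix xsd refers to the XML Schema namespace.
- xmlns (with no prefix) declares the default namespace, here the same as the targetNamespace, so unprefixed names such as ref="Book" refer to elements declared in this schema.
- The Bookstore element is defined as a complex type containing a sequence of 1 or more Book elements.
- When referring to another element, use ref.
- maxOccurs="unbounded" on Author: there can be 1 or more Authors.
- The last five declarations are the element definitions. Notice the use of more meaningful data types (xsd:date, xsd:integer) instead of plain character data.
- xsd:date expects ISO 8601 values such as 2001-06-12, and because the schema has a targetNamespace, instance elements must be in the http://www.books.org namespace.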
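Since a schema is itself an XML document, an ordinary parser can read it. A minimal sketch, assuming Python's `xml.etree.ElementTree` and an abridged copy of the leaf declarations without the targetNamespace (so it applies to the un-namespaced example); `is_xsd_date`, `CHECKS` and `check_book` are hypothetical helpers, and the type checks are simplified versions of the real lexical rules:

```python
import re
import xml.etree.ElementTree as ET
from datetime import date

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Abridged copy of the leaf declarations (no targetNamespace).
SCHEMA = f"""<xsd:schema xmlns:xsd="{XSD_NS}">
  <xsd:element name="Title" type="xsd:string"/>
  <xsd:element name="Author" type="xsd:string"/>
  <xsd:element name="Date" type="xsd:date"/>
  <xsd:element name="ISBN" type="xsd:integer"/>
  <xsd:element name="Publisher" type="xsd:string"/>
</xsd:schema>"""

def is_xsd_date(text: str) -> bool:
    # Simplified: only YYYY-MM-DD (real xsd:date also allows a time zone).
    if re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", text) is None:
        return False
    try:
        date.fromisoformat(text)
    except ValueError:
        return False  # e.g. 2001-02-30
    return True

# Hypothetical checks for the built-in types used in this schema.
CHECKS = {
    "xsd:string": lambda text: True,
    "xsd:integer": lambda text: re.fullmatch(r"[+-]?[0-9]+", text) is not None,
    "xsd:date": is_xsd_date,
}

# Element name -> declared type. The "type" values stay literal strings
# such as "xsd:date"; this sketch does not resolve the prefix.
schema = ET.fromstring(SCHEMA)
types = {el.get("name"): el.get("type") for el in schema.findall(f"{{{XSD_NS}}}element")}

def check_book(book: ET.Element) -> list:
    """Return the tags of children whose text does not fit the declared type."""
    errors = []
    for child in book:
        check = CHECKS.get(types.get(child.tag), lambda text: True)
        if not check((child.text or "").strip()):
            errors.append(child.tag)
    return errors

book = ET.fromstring(
    "<Book><Title>Introduction to XML</Title><Author>John Doe</Author>"
    "<Date>12 June 2001</Date><ISBN>121232323</ISBN><Publisher>XYZ</Publisher></Book>"
)
print(check_book(book))  # ['Date']
```

The Date value from the first example is flagged because it is not in ISO form, something a DTD's #PCDATA could never detect.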
9. XSL: XML Stylesheet Language
An example:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html>
      <body>
        <table cellpadding="2" cellspacing="0" border="1" bgcolor="#FFFFD5">
          <tr>
            <th>Title</th>
            <th>Author</th>
            <th>Date</th>
            <th>ISBN</th>
          </tr>
          <xsl:for-each select="Bookstore/Book">
            <tr>
              <td><xsl:value-of select="Title"/></td>
              <td><xsl:value-of select="Author"/></td>
              <td><xsl:value-of select="Date"/></td>
              <td><xsl:value-of select="ISBN"/></td>
            </tr>
          </xsl:for-each>
        </table>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
- The xsl prefix refers to the XSL namespace.
- match="/" matches the root of the document.
- The HTML inside the template is what gets printed out on a match of the root.
- xsl:for-each goes through each Book element (inside a Bookstore element), and xsl:value-of prints out its Title, Author, Date and ISBN.
- Result: an HTML table with one row per book. Notice that some fields (here, Publisher) have been filtered out from the XML file.
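Python's standard library has no XSLT processor, so the following is not XSLT. It is a minimal sketch that performs the same transformation by hand, which may help show what each stylesheet instruction does; `bookstore_to_html` is a hypothetical name:

```python
import xml.etree.ElementTree as ET
from html import escape

COLUMNS = ["Title", "Author", "Date", "ISBN"]  # Publisher is deliberately left out

def bookstore_to_html(xml_text: str) -> str:
    """Build the stylesheet's table: a header row plus one row per Book."""
    root = ET.fromstring(xml_text)  # root is the <Bookstore> element
    rows = ["<tr>" + "".join(f"<th>{c}</th>" for c in COLUMNS) + "</tr>"]
    # Plays the role of xsl:for-each select="Bookstore/Book".
    for book in root.findall("Book"):
        # Plays the role of xsl:value-of select="Title" etc. (first match).
        cells = "".join(f"<td>{escape(book.findtext(c, ''))}</td>" for c in COLUMNS)
        rows.append(f"<tr>{cells}</tr>")
    return '<html><body><table border="1">' + "".join(rows) + "</table></body></html>"

print(bookstore_to_html(
    '<Bookstore><Book ID="101"><Title>Introduction to XML</Title>'
    "<Author>John Doe</Author><Date>12 June 2001</Date>"
    "<ISBN>121232323</ISBN><Publisher>XYZ</Publisher></Book></Bookstore>"
))
```

The stylesheet states the same mapping declaratively, which is why it can be changed without touching any program code.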
10. Tools/Software
- XML Spy: by far the most comprehensive editor. Handles XML files, DTDs, XSL files, and XSD (XML Schema). Unfortunately, only a 30-day trial version is available. http://www.xmlspy.com/download.html
- XML Notepad: Microsoft XML Notepad is a simple application for building and editing small sets of XML-based data. Freeware. http://msdn.microsoft.com/xml/notepad/download.asp
- XML Pro: a top-notch XML editor, but it doesn't include as many features as XML Spy. Shareware. http://www.vervet.com/demo.html
- You can also validate your XML files just by opening them with Internet Explorer 5.0 or above. It checks whether the XML file is well-formed and also validates it against a DTD (if one is specified in the DOCTYPE declaration).
- Some nice short tutorials on XML/XSL/DTD/XML Schemas can be found at www.w3schools.com.
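A well-formedness check like the one the browser performs can also be scripted. A minimal sketch, assuming Python's `xml.etree.ElementTree`; it checks well-formedness only and does not validate against a DTD, and `is_well_formed` is a hypothetical helper:

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as XML; no DTD or schema validation."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        print("not well-formed:", err)  # message includes line and column
        return False
    return True

print(is_well_formed("<Book><Title>XML</Title></Book>"))  # True
print(is_well_formed("<Book><Title>XML</Book></Title>"))  # prints the error, then False
```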